Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Improve performance of strings::memcpy_inlined #13330

Merged
merged 4 commits into from Nov 17, 2022
Merged

[Enhancement] Improve performance of strings::memcpy_inlined #13330

merged 4 commits into from Nov 17, 2022

Conversation

dirtysalt
Copy link
Contributor

@dirtysalt dirtysalt commented Nov 14, 2022

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues of this PR fixes :

Fixes #

Problem Summary(Required) :

This PR is to improve memcpy to accelerate block cache:

  • use overlapped move to optimize the case when size <= 16
  • use avx(256bit reg) move to optimize the case when size <= 256
  • use erms(enhanced repeat movsb/stosb) to optimize the case when size in [512KB,2MB]
  • use unrolled avx(256bit reg) move to optimze the rest cases.

The overlapped move is very useful when copying irregular size like 5,7,11bytes.

  • if we don't use overlapped move, 11 bytes will be copied in (movq, mov ax, mov al) (8+2+1)
  • but if we use overlapped move, 11 bytes can be copied by (movq, movl) (8+4)

erms version works very wll when size in [512KB, 2MB], in 1 thread or 8 threads

image

image


avx unrolled version works well when size is large. this version works better than glibc __memcpy_avx_unaligned version when thread =1, but is worse when thread = 8. so it's really difficult to write a very general good performance memcpy for all workload.

image

image


  native ssb parquet ssb io tuned memcpy memcpy(io tuned)
Q01 109 173 169 179 163
Q02 91 126 128 126 122
Q03 87 118 121 122 115
Q04 578 905 773 813 739
Q05 524 1015 883 708 654
Q06 484 774 674 688 649
Q07 766 1450 1100 1214 1073
Q08 517 926 786 802 748
Q09 505 888 754 757 716
Q10 139 182 183 177 173
Q11 1000 1567 1315 1418 1263
Q12 376 590 568 539 502
Q13 301 389 388 378 362
SUM 5477 9103 7842 7921 7279

benchmark comparison between this impl and previous impl

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
memcpy_sr/16                 2.01 ns         2.01 ns    347320268
memcpy_sr/32                 2.26 ns         2.26 ns    309545548
memcpy_sr/64                 3.66 ns         3.66 ns    191179564
memcpy_sr/128                4.61 ns         4.61 ns    151792072
memcpy_sr/256                6.84 ns         6.84 ns    102082429
memcpy_sr/512                9.67 ns         9.67 ns     72437612
memcpy_sr/1024               15.0 ns         15.0 ns     46503179
memcpy_sr/2048               29.0 ns         29.0 ns     24160369
memcpy_sr/4096               38.5 ns         38.5 ns     18135859
memcpy_sr/8192               63.9 ns         63.9 ns     10980053
memcpy_sr/16384               132 ns          132 ns      5287391
memcpy_sr/32768               781 ns          781 ns       889899
memcpy_sr/65536              1582 ns         1582 ns       450258
memcpy_sr/131072             3112 ns         3112 ns       224624
memcpy_sr/262144             6292 ns         6291 ns       111086
memcpy_sr/524288            17364 ns        17364 ns        39997
memcpy_sr/1048576           70380 ns        70378 ns        10529
memcpy_sr/2097152          256405 ns       256397 ns         2647
memcpy_sr/4194304          682078 ns       682056 ns         1021
memcpy_sr/8388608         1494222 ns      1494169 ns          469
memcpy_sr/16777216        3031167 ns      3031001 ns          231
memcpy_sr/33554432        6110517 ns      6110305 ns          115
memcpy_sr/67108864       12198513 ns     12198063 ns           58
memcpy_sr/134217728      23999964 ns     23998751 ns           29
memcpy_gutil/16              2.10 ns         2.10 ns    333638027
memcpy_gutil/32              4.68 ns         4.68 ns    149708142
memcpy_gutil/64              4.68 ns         4.68 ns    149662819
memcpy_gutil/128             4.99 ns         4.99 ns    140184350
memcpy_gutil/256             6.89 ns         6.89 ns    101552625
memcpy_gutil/512             11.9 ns         11.9 ns     58626829
memcpy_gutil/1024            21.7 ns         21.7 ns     32212684
memcpy_gutil/2048            42.0 ns         42.0 ns     16680581
memcpy_gutil/4096            84.7 ns         84.7 ns      8264949
memcpy_gutil/8192             163 ns          163 ns      4285315
memcpy_gutil/16384            361 ns          361 ns      1933720
memcpy_gutil/32768           1042 ns         1042 ns       670798
memcpy_gutil/65536           2069 ns         2069 ns       338344
memcpy_gutil/131072          4157 ns         4157 ns       165195
memcpy_gutil/262144          9016 ns         9015 ns        80531
memcpy_gutil/524288         34986 ns        34984 ns        22445
memcpy_gutil/1048576       149037 ns       149032 ns         5211
memcpy_gutil/2097152       386921 ns       386908 ns         1843
memcpy_gutil/4194304       923590 ns       923556 ns          753
memcpy_gutil/8388608      1904319 ns      1904252 ns          367
memcpy_gutil/16777216     3549960 ns      3549845 ns          196
memcpy_gutil/33554432     7838210 ns      7837956 ns           91
memcpy_gutil/67108864    15422806 ns     15422286 ns           46
memcpy_gutil/134217728   30822498 ns     30821511 ns           23

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

be/src/common/ckmemcpy.h Outdated Show resolved Hide resolved
be/src/common/ckmemcpy.h Outdated Show resolved Hide resolved
@kangkaisen
Copy link
Collaborator

Why not improve the memcpy by ourself?

@dirtysalt dirtysalt changed the title [Enhancement][WIP] POC: introducing ck memcpy [Enhancement] improve performance of inline_memcpy Nov 15, 2022
@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@dirtysalt dirtysalt changed the title [Enhancement] improve performance of inline_memcpy [Enhancement] improve performance of strings::memcpy_inlined Nov 16, 2022
@dirtysalt dirtysalt enabled auto-merge (squash) November 16, 2022 14:13
@github-actions github-actions bot removed the be-build label Nov 17, 2022
@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@dirtysalt
Copy link
Contributor Author

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Nov 17, 2022

rebase

✅ Branch has been successfully rebased

@github-actions
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@sonarcloud
Copy link

sonarcloud bot commented Nov 17, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.7% 0.7% Duplication

@dirtysalt dirtysalt changed the title [Enhancement] improve performance of strings::memcpy_inlined [Enhancement] Improve performance of strings::memcpy_inlined Nov 17, 2022
@dirtysalt
Copy link
Contributor Author

run starrocks_admit_test

@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Nov 17, 2022
@wanpengfei-git
Copy link
Collaborator

run starrocks_admit_test

@dirtysalt dirtysalt enabled auto-merge (squash) November 17, 2022 07:58
@dirtysalt dirtysalt merged commit d940e8a into StarRocks:main Nov 17, 2022
1 check passed
@github-actions github-actions bot removed Approved Ready to merge be-build labels Nov 17, 2022
@dirtysalt dirtysalt deleted the goodbox branch November 17, 2022 10:06
@stdpain stdpain self-assigned this May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants