[Enhancement] Improve performance of strings::memcpy_inlined
#13330
Conversation
Why not improve `memcpy` ourselves?
This reverts commit 412e1f81702c3e92ea84f0d4b485ed35fe51b0f1.
What type of PR is this:
Which issues does this PR fix:
Fixes #
Problem Summary (Required):
This PR improves `memcpy` to accelerate the block cache:

- The overlapped move is very useful when copying irregular sizes such as 5, 7, or 11 bytes.
- The ERMS version works very well when the size is in [512KB, 2MB], with either 1 thread or 8 threads.
- The AVX unrolled version works well when the size is large. It performs better than glibc's `__memcpy_avx_unaligned` with 1 thread, but worse with 8 threads.

So it is genuinely difficult to write one general-purpose `memcpy` that performs well for every workload. Benchmark comparison between this implementation and the previous one:
Checklist: