[Enhancement] Improve performance of `strings::memcpy_inlined` #13330

dirtysalt · 2022-11-14T03:17:27Z

What type of PR is this：

Which issues of this PR fixes ：

Fixes #

Problem Summary(Required) ：

This PR is to improve memcpy to accelerate block cache:

use overlapped move to optimize the case when size <= 16
use avx(256bit reg) move to optimize the case when size <= 256
use erms(enhanced repeat movsb/stosb) to optimize the case when size in [512KB,2MB]
use unrolled avx(256bit reg) move to optimze the rest cases.

The overlapped move is very useful when copying irregular size like 5,7,11bytes.

if we don't use overlapped move, 11 bytes will be copied in (movq, mov ax, mov al) (8+2+1)
but if we use overlapped move, 11 bytes can be copied by (movq, movl) (8+4)

erms version works very wll when size in [512KB, 2MB], in 1 thread or 8 threads

avx unrolled version works well when size is large. this version works better than glibc __memcpy_avx_unaligned version when thread =1, but is worse when thread = 8. so it's really difficult to write a very general good performance memcpy for all workload.

	native ssb	parquet ssb	io tuned	memcpy	memcpy(io tuned)
Q01	109	173	169	179	163
Q02	91	126	128	126	122
Q03	87	118	121	122	115
Q04	578	905	773	813	739
Q05	524	1015	883	708	654
Q06	484	774	674	688	649
Q07	766	1450	1100	1214	1073
Q08	517	926	786	802	748
Q09	505	888	754	757	716
Q10	139	182	183	177	173
Q11	1000	1567	1315	1418	1263
Q12	376	590	568	539	502
Q13	301	389	388	378	362
SUM	5477	9103	7842	7921	7279

benchmark comparison between this impl and previous impl

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
memcpy_sr/16                 2.01 ns         2.01 ns    347320268
memcpy_sr/32                 2.26 ns         2.26 ns    309545548
memcpy_sr/64                 3.66 ns         3.66 ns    191179564
memcpy_sr/128                4.61 ns         4.61 ns    151792072
memcpy_sr/256                6.84 ns         6.84 ns    102082429
memcpy_sr/512                9.67 ns         9.67 ns     72437612
memcpy_sr/1024               15.0 ns         15.0 ns     46503179
memcpy_sr/2048               29.0 ns         29.0 ns     24160369
memcpy_sr/4096               38.5 ns         38.5 ns     18135859
memcpy_sr/8192               63.9 ns         63.9 ns     10980053
memcpy_sr/16384               132 ns          132 ns      5287391
memcpy_sr/32768               781 ns          781 ns       889899
memcpy_sr/65536              1582 ns         1582 ns       450258
memcpy_sr/131072             3112 ns         3112 ns       224624
memcpy_sr/262144             6292 ns         6291 ns       111086
memcpy_sr/524288            17364 ns        17364 ns        39997
memcpy_sr/1048576           70380 ns        70378 ns        10529
memcpy_sr/2097152          256405 ns       256397 ns         2647
memcpy_sr/4194304          682078 ns       682056 ns         1021
memcpy_sr/8388608         1494222 ns      1494169 ns          469
memcpy_sr/16777216        3031167 ns      3031001 ns          231
memcpy_sr/33554432        6110517 ns      6110305 ns          115
memcpy_sr/67108864       12198513 ns     12198063 ns           58
memcpy_sr/134217728      23999964 ns     23998751 ns           29
memcpy_gutil/16              2.10 ns         2.10 ns    333638027
memcpy_gutil/32              4.68 ns         4.68 ns    149708142
memcpy_gutil/64              4.68 ns         4.68 ns    149662819
memcpy_gutil/128             4.99 ns         4.99 ns    140184350
memcpy_gutil/256             6.89 ns         6.89 ns    101552625
memcpy_gutil/512             11.9 ns         11.9 ns     58626829
memcpy_gutil/1024            21.7 ns         21.7 ns     32212684
memcpy_gutil/2048            42.0 ns         42.0 ns     16680581
memcpy_gutil/4096            84.7 ns         84.7 ns      8264949
memcpy_gutil/8192             163 ns          163 ns      4285315
memcpy_gutil/16384            361 ns          361 ns      1933720
memcpy_gutil/32768           1042 ns         1042 ns       670798
memcpy_gutil/65536           2069 ns         2069 ns       338344
memcpy_gutil/131072          4157 ns         4157 ns       165195
memcpy_gutil/262144          9016 ns         9015 ns        80531
memcpy_gutil/524288         34986 ns        34984 ns        22445
memcpy_gutil/1048576       149037 ns       149032 ns         5211
memcpy_gutil/2097152       386921 ns       386908 ns         1843
memcpy_gutil/4194304       923590 ns       923556 ns          753
memcpy_gutil/8388608      1904319 ns      1904252 ns          367
memcpy_gutil/16777216     3549960 ns      3549845 ns          196
memcpy_gutil/33554432     7838210 ns      7837956 ns           91
memcpy_gutil/67108864    15422806 ns     15422286 ns           46
memcpy_gutil/134217728   30822498 ns     30821511 ns           23

Checklist:

I have added test cases for my bug fix or my new feature
This pr will affect users' behaviors
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function

github-actions

clang-tidy made some suggestions

be/src/common/ckmemcpy.h

kangkaisen · 2022-11-14T05:50:44Z

Why not improve the memcpy by ourself?

be/src/common/ckmemcpy.h

github-actions · 2022-11-16T06:52:47Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2022-11-16T09:42:36Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2022-11-16T13:28:52Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2022-11-17T02:18:04Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2022-11-17T03:55:37Z

clang-tidy review says "All clean, LGTM! 👍"

dirtysalt · 2022-11-17T04:36:59Z

@Mergifyio rebase

This reverts commit 412e1f81702c3e92ea84f0d4b485ed35fe51b0f1.

mergify · 2022-11-17T04:37:47Z

rebase

✅ Branch has been successfully rebased

github-actions · 2022-11-17T04:52:24Z

clang-tidy review says "All clean, LGTM! 👍"

sonarcloud · 2022-11-17T04:53:38Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.7% Duplication

dirtysalt · 2022-11-17T07:57:16Z

run starrocks_admit_test

wanpengfei-git · 2022-11-17T07:57:29Z

run starrocks_admit_test

wanpengfei-git added the be-build label Nov 14, 2022

github-actions bot reviewed Nov 14, 2022

View reviewed changes

be/src/common/ckmemcpy.h Outdated Show resolved Hide resolved

be/src/common/ckmemcpy.h Outdated Show resolved Hide resolved

kangkaisen reviewed Nov 14, 2022

View reviewed changes

be/src/common/ckmemcpy.h Outdated Show resolved Hide resolved

dirtysalt changed the title ~~[Enhancement][WIP] POC: introducing ck memcpy~~ [Enhancement] improve performance of inline_memcpy Nov 15, 2022

wanpengfei-git added be-build and removed be-build labels Nov 16, 2022

github-actions bot removed the be-build label Nov 16, 2022

wanpengfei-git added the be-build label Nov 16, 2022

github-actions bot removed the be-build label Nov 16, 2022

wanpengfei-git added the be-build label Nov 16, 2022

dirtysalt changed the title ~~[Enhancement] improve performance of inline_memcpy~~ [Enhancement] improve performance of strings::memcpy_inlined Nov 16, 2022

dirtysalt enabled auto-merge (squash) November 16, 2022 14:13

github-actions bot removed the be-build label Nov 17, 2022

wanpengfei-git added the be-build label Nov 17, 2022

github-actions bot removed the be-build label Nov 17, 2022

wanpengfei-git added the be-build label Nov 17, 2022

dirtysalt added 4 commits November 17, 2022 04:37

new memcpy impl

c64856a

erms is not very stable

e4e1dd6

Revert "erms is not very stable"

ee545ba

This reverts commit 412e1f81702c3e92ea84f0d4b485ed35fe51b0f1.

erms works well in a specific region

bda3a28

github-actions bot removed the be-build label Nov 17, 2022

wanpengfei-git added the be-build label Nov 17, 2022

dirtysalt changed the title ~~[Enhancement] improve performance of strings::memcpy_inlined~~ [Enhancement] Improve performance of strings::memcpy_inlined Nov 17, 2022

kangkaisen approved these changes Nov 17, 2022

View reviewed changes

stephen-shelby approved these changes Nov 17, 2022

View reviewed changes

wanpengfei-git added the Approved Ready to merge label Nov 17, 2022

dirtysalt disabled auto-merge November 17, 2022 07:57

dirtysalt enabled auto-merge (squash) November 17, 2022 07:58

Youngwb approved these changes Nov 17, 2022

View reviewed changes

dirtysalt merged commit d940e8a into StarRocks:main Nov 17, 2022
1 check passed

github-actions bot removed Approved Ready to merge be-build labels Nov 17, 2022

dirtysalt deleted the goodbox branch November 17, 2022 10:06

liuyehcf mentioned this pull request May 12, 2023

[Enhancement] remove some unused memset #22744

Merged

16 tasks

stdpain self-assigned this May 12, 2023

stdpain mentioned this pull request Jan 31, 2024

Using glibc memcpy increases memory allocation performance, instead of using strings::memcpy_inlined function in Starrocks #40414

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Improve performance of `strings::memcpy_inlined` #13330

[Enhancement] Improve performance of `strings::memcpy_inlined` #13330

dirtysalt commented Nov 14, 2022 •

edited

github-actions bot left a comment

kangkaisen commented Nov 14, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 17, 2022

github-actions bot commented Nov 17, 2022

dirtysalt commented Nov 17, 2022

mergify bot commented Nov 17, 2022

github-actions bot commented Nov 17, 2022

sonarcloud bot commented Nov 17, 2022

dirtysalt commented Nov 17, 2022

wanpengfei-git commented Nov 17, 2022

[Enhancement] Improve performance of strings::memcpy_inlined #13330

[Enhancement] Improve performance of strings::memcpy_inlined #13330

Conversation

dirtysalt commented Nov 14, 2022 • edited

What type of PR is this：

Which issues of this PR fixes ：

Problem Summary(Required) ：

Checklist:

github-actions bot left a comment

Choose a reason for hiding this comment

kangkaisen commented Nov 14, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 16, 2022

github-actions bot commented Nov 17, 2022

github-actions bot commented Nov 17, 2022

dirtysalt commented Nov 17, 2022

mergify bot commented Nov 17, 2022

✅ Branch has been successfully rebased

github-actions bot commented Nov 17, 2022

sonarcloud bot commented Nov 17, 2022

dirtysalt commented Nov 17, 2022

wanpengfei-git commented Nov 17, 2022

[Enhancement] Improve performance of `strings::memcpy_inlined` #13330

[Enhancement] Improve performance of `strings::memcpy_inlined` #13330

dirtysalt commented Nov 14, 2022 •

edited