[Enhancement] Optimize a subtle inline performance problem #23300
Merged
liuyehcf changed the title from "[Enhancement] Optimize non-nullable group by case" to "[Enhancement] Optimize a subtle inline performance problem" (May 11, 2023)
liuyehcf changed the title from "[Enhancement] Optimize a subtle inline performance problem" to "[WIP][Enhancement] Optimize a subtle inline performance problem" (May 11, 2023)
liuyehcf force-pushed the performance branch 2 times, most recently from 717451c to 59c13f0 (May 12, 2023 03:02)
liuyehcf changed the title from "[WIP][Enhancement] Optimize a subtle inline performance problem" to "[Enhancement] Optimize a subtle inline performance problem" (May 12, 2023)
kangkaisen previously approved these changes (May 12, 2023)
silverbullet233 previously approved these changes (May 12, 2023)
add an ALWAYS_NOINLINE in
stdpain approved these changes (May 12, 2023)
Signed-off-by: liuyehcf <1559500551@qq.com>
kangkaisen approved these changes (May 12, 2023)
@Mergifyio backport branch-3.0
✅ Backports have been created
mergify bot pushed a commit that referenced this pull request (May 12, 2023): Signed-off-by: liuyehcf <1559500551@qq.com> (cherry picked from commit 9e03202)
wanpengfei-git pushed a commit that referenced this pull request (May 13, 2023): Signed-off-by: liuyehcf <1559500551@qq.com> (cherry picked from commit 9e03202)
Moonm3n pushed a commit to Moonm3n/starrocks that referenced this pull request (May 23, 2023): …#23300) Signed-off-by: liuyehcf <1559500551@qq.com> Signed-off-by: Moonm3n <saxonzhan@gmail.com>
numbernumberone pushed a commit to numbernumberone/starrocks that referenced this pull request (May 31, 2023): …#23300) Signed-off-by: liuyehcf <1559500551@qq.com>
numbernumberone pushed a commit to numbernumberone/starrocks that referenced this pull request (May 31, 2023): …#23300) Signed-off-by: liuyehcf <1559500551@qq.com>
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request (Jun 5, 2023): …#23300) Signed-off-by: liuyehcf <1559500551@qq.com>
What type of PR is this:
Reproduce
We can reproduce it with the following steps:
`Id` increases densely from `1` to `5000000`.
`PublishStatus` is a constant value.
Analysis
This is a subtle performance regression introduced by #13968. That PR was a pure refactoring that only extracted and reused common code, and it made no logic changes (at least not on the path exercised by the case above).
The root cause is that the function's instruction footprint is very large, while this case only executes a certain subset of its instructions. Unfortunately, some of those instructions are placed together with instructions that are never executed in this case (see the code below; this case only goes through branch 2). Instructions are loaded into the instruction cache in multiple cache-line-sized chunks, so executing the hot instructions also brings the neighbouring cold ones into the cache, which may lead to substantial cache misses.
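The effect can be illustrated with a toy model. This is a sketch under simplifying assumptions, not StarRocks code: it assumes a fully associative instruction cache with LRU replacement (real L1 instruction caches are set-associative), a capacity of 4 blocks, and hypothetical block ids. Other hot code lives in blocks 0 and 1. In a "separated" layout, branch 2's hot instructions fit in blocks 10 and 11; in an "interleaved" layout, the same instructions are spread over blocks 10 to 13, each block shared with cold branch 1 code.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <iterator>
#include <list>
#include <unordered_map>
#include <vector>

// Toy instruction cache: fully associative, LRU replacement, capacity in blocks.
// This is a simplifying assumption; real L1 instruction caches are set-associative.
class LruICache {
public:
    explicit LruICache(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Fetches one code block and returns true if it was a miss.
    bool fetch(int block) {
        auto it = pos_.find(block);
        if (it != pos_.end()) {
            // Hit: move the block to the most-recently-used end.
            order_.splice(order_.end(), order_, it->second);
            return false;
        }
        if (order_.size() == capacity_) {
            // Miss with a full cache: evict the least-recently-used block.
            pos_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(block);
        pos_[block] = std::prev(order_.end());
        return true;
    }

private:
    std::size_t capacity_;
    std::list<int> order_;  // front = least recently used, back = most recently used
    std::unordered_map<int, std::list<int>::iterator> pos_;
};

// Each iteration runs other hot code (hypothetical blocks 0 and 1) and then the
// blocks touched by the executed branch. Returns the total number of misses.
std::size_t count_misses(const std::vector<int>& branch_blocks, std::size_t capacity, int iterations) {
    LruICache cache(capacity);
    std::size_t misses = 0;
    for (int i = 0; i < iterations; ++i) {
        for (int b : {0, 1}) misses += cache.fetch(b) ? 1 : 0;
        for (int b : branch_blocks) misses += cache.fetch(b) ? 1 : 0;
    }
    return misses;
}
```

With a capacity of 4 blocks, `count_misses({10, 11}, 4, 100)` only takes the 4 compulsory misses, whereas `count_misses({10, 11, 12, 13}, 4, 100)` cycles through 6 distinct blocks and misses on every fetch (600 misses), because LRU always evicts the block that is needed next. The amount of hot code is the same in both layouts; only its placement differs.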
To illustrate this more clearly, I present a graph below (it is only my hypothesis and has not been proven):
`block4` contains instructions that belong to both branchA and branchB. Executing `instructionA1` loads `block4` from memory into the cache system. In practice, however, current processors load many blocks at a time, which means `block4`, `block5`, `block6`, and `block7` may all be loaded into the cache. The total code size of `starrocks_be` is huge, certainly larger than the total cache size (at least the L1 cache), so these extra blocks may evict hot instructions that will be executed again shortly, and re-executing those hot instructions then causes further cache misses. As a result, the processor may be kept busy refilling the instruction cache, which causes the performance degradation.
Solution
For functions that process a whole chunk of data, we can mark them as `__attribute__((noinline))` to prevent the compiler from inlining them. This keeps each function's instructions closely related, without mixing in many instructions that are never executed.
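A minimal sketch of the approach, assuming GCC or Clang. The kernel names and the sum they compute are hypothetical, and `SKETCH_NOINLINE` is a stand-in macro (the review above mentions an `ALWAYS_NOINLINE` macro, whose actual definition is not shown here). Each whole-chunk kernel is kept out of line, so the dispatcher stays small and the loop body of the path that is not taken is not copied into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in macro (hypothetical name): keeps a function out of line.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical whole-chunk kernel for a nullable column: rows with nulls[i] != 0 are skipped.
SKETCH_NOINLINE std::int64_t sum_chunk_nullable(const std::int64_t* data, const std::uint8_t* nulls,
                                                std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (nulls[i] == 0) sum += data[i];
    }
    return sum;
}

// Hypothetical whole-chunk kernel for a non-nullable column.
SKETCH_NOINLINE std::int64_t sum_chunk_non_nullable(const std::int64_t* data, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Dispatcher: chooses a kernel once per chunk. Because both kernels are noinline,
// their loop bodies are not copied into this function.
std::int64_t sum_chunk(const std::int64_t* data, const std::uint8_t* nulls, std::size_t n) {
    assert(data != nullptr || n == 0);
    return nulls != nullptr ? sum_chunk_nullable(data, nulls, n) : sum_chunk_non_nullable(data, n);
}
```

For example, `sum_chunk(data, nullptr, n)` only executes the compact non-nullable kernel. Note that `noinline` only controls inlining; where the out-of-line bodies end up in the binary is still decided by the compiler and linker.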
Experiment
Since this SQL runs very fast, we use `mysqlslap` to test it with the following settings:
`-c 1 -n 1000`
`-c 10 -n 1000`
`-c 100 -n 1000`
Note that `3.0.0 + This PR` still shows a degradation in the 10-concurrency scenario. This can be fixed by #22744, which has already been cherry-picked to branch-3.0.0, so we can ignore it here.
Perf top
Checklist:
Bugfix cherry-pick branch check: