Optimize the merge if all hashSets are singleLevel in UniqExactSet #52973
latest upstream master
Update to the master
update from upstream master branch
This is an automatic comment. The PR description does not match the template. Please edit it accordingly. The error is: Changelog entry required for category 'Performance Improvement'
Hi @nickitat, the previous PR #50748 only takes effect when the hashSets are a mix of singleLevel and twoLevel. I have found that when all the hashSets are singleLevel, they can also benefit substantially from that patch in most cases. This additional PR covers the case where every hashSet is singleLevel.
This is an automated comment for commit 4cea40b with description of existing statuses. It's updated for the latest CI running
PR ClickHouse#50748 added a new phase, `parallelizeMergePrepare`, which runs before the merge when the hashSets are neither all singleLevel nor all twoLevel. It converts every singleLevelSet to a twoLevelSet in parallel, which increases CPU utilization and QPS.

When all the hashtables are singleLevel, they can also benefit from the `parallelizeMergePrepare` optimization in most cases, provided the hashtables are not too small. By tuning the query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1` at different thread counts, we arrived at a mild threshold of 6,000.

We tested the patch with the query `SELECT COUNT(DISTINCT Title) FROM hits_v1` on a 2x80 vCPUs server. With fewer than 48 threads, the hashSets are all twoLevel or a mix of singleLevel and twoLevel. With more than 56 threads, all the hashSets are singleLevel. QPS improved by up to 2.35x.

| Threads | Opt/Base |
|---------|----------|
| 8       | 100.0%   |
| 16      | 99.4%    |
| 24      | 110.3%   |
| 32      | 99.9%    |
| 40      | 99.3%    |
| 48      | 99.8%    |
| 56      | 183.0%   |
| 64      | 234.7%   |
| 72      | 233.1%   |
| 80      | 229.9%   |
| 88      | 224.5%   |
| 96      | 229.6%   |
| 104     | 235.1%   |
| 112     | 229.5%   |
| 120     | 229.1%   |
| 128     | 217.8%   |
| 136     | 222.9%   |
| 144     | 217.8%   |
| 152     | 204.3%   |
| 160     | 203.2%   |

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
The CI error |
Hi @nickitat, thanks for reviewing the patch. Is there anything else I should do for this PR?
PR #50748 added a new phase, `parallelizeMergePrepare`, which runs before the merge when the hashSets are neither all singleLevel nor all twoLevel. It converts every singleLevelSet to a twoLevelSet in parallel, which increases CPU utilization and QPS.

When all the hashtables are singleLevel, they can also benefit from the `parallelizeMergePrepare` optimization in most cases, provided the hashtables are not too small.

That leaves the hashtable size threshold to tune. The total data set should not be too large, and the number of unique values in the column should be proportional to the size of hits_v1. The unique count must also stay very small, or we will never find the cross point within a limited number of threads. We therefore chose the data set `hits_v1` and the column `SearchPhase`. All the hashtables must be singleLevel regardless of threads_num, and the figure must show a cross point.

By tuning the query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1` at different thread counts, we arrived at a mild threshold of 6,000 (total_hashtable_size/hashtable_num). `COUNT(DISTINCT SearchPhase)` is only 132,256. Even when threads_num is very small, the hashtable size cannot reach the threshold of 100,000, so the hashtables are all singleLevel.

We tested the patch with the query `SELECT COUNT(DISTINCT Title) FROM hits_v1` on a 2x80 vCPUs server. With fewer than 48 threads, the hashSets are all twoLevel or a mix of singleLevel and twoLevel. With more than 56 threads, all the hashSets are singleLevel. QPS improved by up to 2.35x.
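The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `SetInfo`, `averageSize`, `shouldConvertBeforeMerge`, and `kAverageSizeThreshold` are hypothetical names, and the strict `>` comparison against 6,000 is an assumption, since the description only gives the threshold value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one per-thread hash set. Only the two properties
// the heuristic inspects are modelled; this is not the UniqExactSet type.
struct SetInfo
{
    std::size_t size;  // number of distinct keys in the set
    bool two_level;    // true if the set already uses the two-level layout
};

// Threshold quoted in the description (total_hashtable_size / hashtable_num).
// The constant name and the strict '>' comparison below are assumptions.
constexpr std::size_t kAverageSizeThreshold = 6000;

// Average number of keys per set, using integer division.
std::size_t averageSize(const std::vector<SetInfo> & sets)
{
    assert(!sets.empty());
    std::size_t total = 0;
    for (const SetInfo & s : sets)
        total += s.size;
    return total / sets.size();
}

// Returns true when the singleLevel sets should be converted to twoLevel
// (in parallel) before the merge.
bool shouldConvertBeforeMerge(const std::vector<SetInfo> & sets)
{
    if (sets.empty())
        return false;

    std::size_t single_level = 0;
    for (const SetInfo & s : sets)
        if (!s.two_level)
            ++single_level;

    if (single_level == 0)
        return false;  // all twoLevel: nothing to convert
    if (single_level < sets.size())
        return true;   // mixed: the case handled by #50748
    // All singleLevel: convert only if the sets are large enough on average.
    return averageSize(sets) > kAverageSizeThreshold;
}
```

With this shape, the mixed case keeps its behavior from #50748, and the all-singleLevel case opts in only when the average set size exceeds the tuned threshold, so small sets avoid the conversion cost.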
Changelog category (leave one):
- Performance Improvement
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimize the merge if all hashSets are singleLevel in UniqExactSet.