Fix sparse_hashed dict performance with sequential keys (wrong hash function) #32536
Special thanks for adding a performance test! 👍
@azat The performance test showed a significant performance degradation: https://clickhouse-test-reports.s3.yandex.net/32536/6d31a389f12837f3b8168f17217dadab4f4f4974/performance_comparison/report.html#fail1
P.S. For our own HashTables, we should use crc32c.
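For reference, crc32c is the Castagnoli variant of CRC-32; x86 CPUs with SSE4.2 compute it in hardware (`_mm_crc32_u8` / `_mm_crc32_u64` in `<nmmintrin.h>`), which is what makes it attractive as a cheap integer hash. The sketch below is a portable bitwise version, not ClickHouse's implementation, and `Crc32cHash` is a hypothetical name. It only illustrates how a CRC can serve as a hash functor for 64-bit keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. Slow but portable.
uint32_t crc32c(const unsigned char* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift right and XOR in the polynomial;
            // otherwise just shift right.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical hasher for 64-bit keys: CRC-32C over the key's 8 bytes,
// serialized little-endian so the result does not depend on host endianness.
struct Crc32cHash {
    size_t operator()(uint64_t key) const {
        unsigned char bytes[8];
        for (int i = 0; i < 8; ++i)
            bytes[i] = static_cast<unsigned char>(key >> (8 * i));
        return crc32c(bytes, sizeof(bytes));
    }
};
```

Because CRC is linear, two same-length inputs that differ only within a 32-bit window always get different checksums, so small sequential keys never collide in the 32-bit result.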
We should completely revert #27152 and remove the useless files, because we don't use the "Arcadia" build system anymore.
Yes, the problem is that DefaultHash does not work well with sequential keys, although it works significantly better if those sequential keys are accessed/inserted in random order. I'm going to update the performance test for now (to fix the original issue) and get back to this later.
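One way to look at the sequential vs. random insertion effect outside ClickHouse is a tiny timing harness. This is a hypothetical sketch: the helper names are made up, `std::unordered_map` stands in for `sparse_hash_map` (any map type with `operator[]` can be passed, including `google::sparse_hash_map` if the sparsehash headers are available), and no particular timings are implied.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical helper: inserts `keys` into `map` in the given order and
// returns the elapsed wall-clock time in microseconds.
template <typename Map>
long long time_inserts(const std::vector<uint64_t>& keys, Map& map) {
    auto start = std::chrono::steady_clock::now();
    for (uint64_t k : keys)
        map[k] = k;
    auto end = std::chrono::steady_clock::now();
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count());
}

// Sequential keys 0..n-1, in ascending order.
std::vector<uint64_t> sequential_keys(size_t n) {
    std::vector<uint64_t> keys(n);
    std::iota(keys.begin(), keys.end(), uint64_t{0});
    return keys;
}

// The same key set, permuted into a random but reproducible order.
std::vector<uint64_t> shuffled_keys(size_t n, uint64_t seed) {
    std::vector<uint64_t> keys = sequential_keys(n);
    std::mt19937_64 rng(seed);
    std::shuffle(keys.begin(), keys.end(), rng);
    return keys;
}
```

Usage: time `time_inserts(sequential_keys(n), map1)` against `time_inserts(shuffled_keys(n, seed), map2)` with the same map type and hasher; both insert exactly the same key set, so only the order differs.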
Force-pushed 9a4494a to f4fdea7.
So now the perf test shows an improvement, but the query itself is too slow (report.py has lots of heuristics that do not allow each query to run longer than ~2 seconds).
Will rework the perf test.
Force-pushed f4fdea7 to 559b01a.
Actually, it is not that easy to write a performance test for this issue that satisfies all the conditions (various timeouts for each query and for the test overall), so let's do this separately. The perf test has been removed for now.
Could you please remove the … Also, we need to figure out what is wrong with the integration tests...
Force-pushed 559b01a to f7669b3.
…unction) In ClickHouse#27152 the hash function for sparse_hash_map was changed to std::hash<>; switch it back to DefaultHash<> (ClickHouse built-in), since std::hash<> for numeric keys returns the key itself, which does not work well with sparse_hash_map. I've tried the example from ClickHouse#32480, and using a non-identity hash function fixes the performance of the sparse_hashed layout. Fixes: ClickHouse#32480. v2: Add comments for SparseHashMap.
It was added only for the Arcadia build and is used in only one place, so there is no need for a separate typedef.
Force-pushed f7669b3 to a7dc6f3.
https://github.com/ClickHouse/ClickHouse/runs/4516380764?check_suite_focus=true
Something left over from the previous run in this workspace?
Backport #32536 to 21.12: Fix sparse_hashed dict performance with sequential keys (wrong hash function)
Backport #32536 to 21.11: Fix sparse_hashed dict performance with sequential keys (wrong hash function)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix sparse_hashed dict performance with sequential keys (wrong hash function)
Detailed description / Documentation draft:
In #27152 the hash function for sparse_hash_map was changed to
std::hash<>; switch it back to DefaultHash<> (ClickHouse built-in), since
std::hash<> for numeric keys returns the key itself, which does not work
well with sparse_hash_map.
I've tried the example from #32480, and using a non-identity hash fixes the
performance of the sparse_hashed layout.
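The fix itself amounts to changing the hasher type argument of the map. A minimal sketch of the idea, under stated assumptions: `MixHash` is a hypothetical stand-in modeled on the MurmurHash3 `fmix64` finalizer (it is not ClickHouse's `DefaultHash<>`), and `std::unordered_map` stands in for `google::sparse_hash_map`, which takes the hasher as its third template parameter in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for a "real" integer hash: the MurmurHash3 fmix64
// finalizer. Each step (xor-shift, multiply by an odd constant) is
// invertible, so distinct 64-bit keys always get distinct 64-bit hashes,
// while nearby keys end up with very different bit patterns.
struct MixHash {
    size_t operator()(uint64_t x) const {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<size_t>(x);
    }
};

// The hasher is a template parameter, so switching away from std::hash<>
// is a one-line type change.
using IdentityHashedMap = std::unordered_map<uint64_t, uint64_t, std::hash<uint64_t>>;
using MixHashedMap = std::unordered_map<uint64_t, uint64_t, MixHash>;

// With libstdc++ and libc++, std::hash<uint64_t> returns the key unchanged
// (this is implementation-defined; other standard libraries may differ).
bool std_hash_is_identity(uint64_t key) {
    return std::hash<uint64_t>{}(key) == key;
}
```

Usage: `MixHashedMap m; m[42] = 1;` behaves like any other map; only the bucket placement of the keys changes, because the hash no longer passes the key's structure straight through.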
Fixes: #32480
Fixes: #27152
Backport: 20.9+