Optimize merge of uniqExact without_key #43072
}
};

for (size_t i = 0; i < thread_pool->getMaxThreads(); ++i)
This code could be a bit dangerous. If somebody passes a common background I/O pool with 1000 threads, it will add 1000 tasks. Let's add at most NUM_BUCKETS tasks.
Also, with the current implementation we can't share such a pool between multiple uniq functions.
Maybe it's better to avoid calling wait() at all and handle exceptions in every task separately. (But our ThreadPool interface is not well suited to that.)
> let's add at most NUM_BUCKETS tasks.

OK.

> Also, with the current implementation we can't share such a pool between multiple uniq functions.

Yes, we cannot merge the states of two different `uniqExact` functions simultaneously, but that shouldn't be a problem, since we aim to utilise all the threads for the current merge. If we shared the pool, client code would need to call `wait` manually before destroying the states, which is also not ideal.
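The cap discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the PR's code: `TwoLevelSet`, `tasksToSchedule`, `parallelMerge` and the value `NUM_BUCKETS = 256` are hypothetical, and plain `std::thread` stands in for the project's ThreadPool. Each task claims buckets through a shared atomic counter, so the number of tasks never needs to exceed the number of buckets, and each task catches its own exceptions so the first one can be rethrown after all tasks finish.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical bucket count; the real value is whatever the two-level hash table uses.
constexpr size_t NUM_BUCKETS = 256;

// Hypothetical stand-in for a two-level hash set: keys are sharded into fixed buckets.
struct TwoLevelSet
{
    std::array<std::unordered_set<uint64_t>, NUM_BUCKETS> buckets;

    static size_t bucketOf(uint64_t key) { return key % NUM_BUCKETS; }
    void insert(uint64_t key) { buckets[bucketOf(key)].insert(key); }

    size_t size() const
    {
        size_t total = 0;
        for (const auto & bucket : buckets)
            total += bucket.size();
        return total;
    }
};

// Never schedule more tasks than there are buckets, even if the pool is huge.
inline size_t tasksToSchedule(size_t max_threads)
{
    return std::min(std::max<size_t>(max_threads, 1), NUM_BUCKETS);
}

// Merge `src` into `dst`. Buckets with the same index are independent of each other,
// so different tasks can merge different buckets without locking.
inline void parallelMerge(TwoLevelSet & dst, const TwoLevelSet & src, size_t max_threads)
{
    std::atomic<size_t> next_bucket{0};
    std::exception_ptr first_error;
    std::mutex error_mutex;

    auto task = [&]
    {
        try
        {
            while (true)
            {
                const size_t bucket = next_bucket.fetch_add(1);
                if (bucket >= NUM_BUCKETS)
                    return;
                dst.buckets[bucket].insert(src.buckets[bucket].begin(), src.buckets[bucket].end());
            }
        }
        catch (...)
        {
            // Each task handles its own exception; only the first one is kept.
            std::lock_guard<std::mutex> lock(error_mutex);
            if (!first_error)
                first_error = std::current_exception();
        }
    };

    const size_t num_tasks = tasksToSchedule(max_threads);
    assert(num_tasks >= 1 && num_tasks <= NUM_BUCKETS);

    std::vector<std::thread> workers;
    workers.reserve(num_tasks);
    for (size_t i = 0; i < num_tasks; ++i)
        workers.emplace_back(task);
    for (auto & worker : workers)
        worker.join();

    if (first_error)
        std::rethrow_exception(first_error);
}
```

With this shape, a caller that passes a pool size of 1000 still gets at most `NUM_BUCKETS` tasks, which is the bound requested in the review.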
  * Used for partial specialization to add strings.
  */
template <typename T, typename Data>
struct OneAdder
template <typename T, typename Data, bool is_variadic = false, bool is_exact = false, bool argument_is_tuple = false>
I had an idea that we could probably put these three flags into `Data`.
It would probably make the code a bit more readable (if it is possible to do).
And maybe we can do the same with an `is_able_to_parallelize_merge` flag for those `Data` types which support it, like:
if (settings->max_threads > 1)
    return createAggregateFunctionUniq<
        ..., AggregateFunctionUniqExactData<..., true /* is_able_to_parallelize_merge */>
else
    return createAggregateFunctionUniq<
        ..., AggregateFunctionUniqExactData<..., false /* is_able_to_parallelize_merge */>
I also thought about that; it looks neater.
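A rough sketch of the suggestion above, assuming the flags become compile-time constants of the data type. All names here (`UniqExactDataSketch`, `UniqSketch`, `IUniqSketch`, `createUniqSketch`) are hypothetical and exist only for illustration; the real aggregate function classes and factory signatures are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_set>

// Hypothetical data type: the flags live on Data instead of being extra
// template parameters of every helper that works with it.
template <typename T, bool is_able_to_parallelize_merge_>
struct UniqExactDataSketch
{
    static constexpr bool is_exact = true;
    static constexpr bool is_variadic = false;
    static constexpr bool argument_is_tuple = false;
    static constexpr bool is_able_to_parallelize_merge = is_able_to_parallelize_merge_;

    std::unordered_set<T> set;
};

// Hypothetical type-erased interface returned by the factory.
struct IUniqSketch
{
    virtual ~IUniqSketch() = default;
    virtual std::string mergeStrategy() const = 0;
};

// The function template needs only Data; it reads the flags from it.
template <typename Data>
struct UniqSketch : IUniqSketch
{
    std::string mergeStrategy() const override
    {
        return Data::is_able_to_parallelize_merge ? "parallel" : "sequential";
    }
};

// The factory picks the Data specialization once, based on the thread setting.
inline std::unique_ptr<IUniqSketch> createUniqSketch(size_t max_threads)
{
    assert(max_threads > 0); // assumption of this sketch: at least one thread
    if (max_threads > 1)
        return std::make_unique<UniqSketch<UniqExactDataSketch<uint64_t, true>>>();
    else
        return std::make_unique<UniqSketch<UniqExactDataSketch<uint64_t, false>>>();
}
```

The point of the design is that code consuming `Data` asks `Data::is_exact` or `Data::is_able_to_parallelize_merge` directly, so adding another flag later does not change any template parameter list.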
src/Common/HashTable/HashTable.h
Outdated
@@ -1263,30 +1251,6 @@ class HashTable :
ptr->write(wb);
}

void writeText(DB::WriteBuffer & wb) const
I am a bit confused about why this code is removed.
It is probably unused, but it may still be helpful for debugging.
I would prefer to do this in a separate PR if possible.
Yeah, let's leave it untouched.
Generally looks ok.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Parallelized merging of `uniqExact` states for aggregation without a key, i.e. queries like `SELECT uniqExact(number) FROM table`. The improvement becomes noticeable when the number of unique keys approaches 10^6. Also, `uniq` performance is slightly optimized. This closes #4510.
x86:
arm: