
Optimize merge of uniqExact without_key #43072

Merged
merged 15 commits into from Nov 17, 2022

Conversation

nickitat
Member

@nickitat nickitat commented Nov 9, 2022

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Parallelized merging of uniqExact states for aggregation without a key, i.e. queries like SELECT uniqExact(number) FROM table. The improvement becomes noticeable when the number of unique keys approaches 10^6.
The performance of uniq is also slightly improved.
This closes #4510.
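
The core idea can be illustrated with a hedged, self-contained sketch (not the actual ClickHouse implementation; all names here are illustrative): model a uniqExact state as a fixed number of hash buckets, route each value to a bucket by its hash, and merge two states bucket-by-bucket. Since matching buckets are independent, the bucket merges can run on separate threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <unordered_set>
#include <vector>

// Number of independent sub-sets each state is split into (illustrative value).
constexpr size_t NUM_BUCKETS = 16;

struct UniqExactState
{
    std::vector<std::unordered_set<uint64_t>> buckets{NUM_BUCKETS};

    void insert(uint64_t value)
    {
        // Route every value to one bucket by its hash, so equal values always
        // land in the same bucket of every state.
        buckets[std::hash<uint64_t>{}(value) % NUM_BUCKETS].insert(value);
    }

    size_t size() const
    {
        size_t total = 0;
        for (const auto & bucket : buckets)
            total += bucket.size();
        return total;
    }
};

// Merge `rhs` into `lhs`. Bucket i of `rhs` only ever touches bucket i of
// `lhs`, so each bucket merge can run on its own thread without locking.
void parallelMerge(UniqExactState & lhs, const UniqExactState & rhs)
{
    std::vector<std::thread> workers;
    for (size_t i = 0; i < NUM_BUCKETS; ++i)
        workers.emplace_back([&lhs, &rhs, i]
        {
            lhs.buckets[i].insert(rhs.buckets[i].begin(), rhs.buckets[i].end());
        });
    for (auto & worker : workers)
        worker.join();
}
```

The real implementation uses the two-level hash table machinery and a shared thread pool rather than one thread per bucket, but the partitioning argument is the same.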


x86:

11.880 2.753 -4.315x -0.769 0.768 uniq_without_key 26 SELECT uniqExact(number) from numbers_mt(1e8)

arm:

5.777 1.077 -5.36x -0.814 0.813 uniq_without_key 26 SELECT uniqExact(number) from numbers_mt(1e8)

@nickitat nickitat added the force tests Force test ignoring fast test output. label Nov 9, 2022
@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Nov 9, 2022
@nickitat nickitat changed the title [WIP] Optimize merge of uniqExact without_key Optimize merge of uniqExact without_key Nov 11, 2022
@nickitat
Member Author

AST fuzzer (asan) — #43199

@nickitat nickitat marked this pull request as ready for review November 14, 2022 16:00
@KochetovNicolai KochetovNicolai self-assigned this Nov 15, 2022
@nickitat
Member Author

AST fuzzer (debug) - #42819
AST fuzzer (ubsan) - #42691
Integration tests (tsan) [1/4] - #43169
Stateless tests (release, s3 storage) - 01079_parallel_alter_detach_table_zookeeper doesn't look relevant

}
};

for (size_t i = 0; i < thread_pool->getMaxThreads(); ++i)
Member

This code can be a bit dangerous. If somebody shares a common background I/O pool with 1000 threads, it will add 1000 tasks. Let's add at most NUM_BUCKETS tasks.

Also, with the current implementation we can't share such a pool between multiple uniq functions.
Maybe it's better to avoid calling wait() at all and handle exceptions in every task separately. (But our ThreadPool interface is not well suited for that.)
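
The cap being suggested here can be sketched as a hypothetical helper (illustrative name, not actual ClickHouse code): scheduling is bounded by both the pool size and the number of buckets, so a huge shared pool never spawns more tasks than there is independent work.

```cpp
#include <algorithm>
#include <cstddef>

// A shared pool may expose far more threads than there are merge buckets.
// Extra tasks beyond the bucket count would have nothing to do, so cap the
// number of scheduled tasks at min(pool threads, buckets).
size_t numMergeTasks(size_t pool_max_threads, size_t num_buckets)
{
    return std::min(pool_max_threads, num_buckets);
}
```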

Member Author

let's add at most NUM_BUCKETS tasks.

ok

Also, with the current implementation we can't share such a pool between multiple uniq functions.

Yes, we cannot merge the states of two different uniqExact functions simultaneously, but that shouldn't be a problem, since we aim to utilise all threads for the current merge. If we shared the pool, client code would have to manually call wait() before destroying the states, which is also not ideal.

* Used for partial specialization to add strings.
*/
template <typename T, typename Data>
struct OneAdder
template <typename T, typename Data, bool is_variadic = false, bool is_exact = false, bool argument_is_tuple = false>
Member

I had an idea: we could probably put these 3 flags into Data.
It would probably make the code a bit more readable (if it is possible to do).

And maybe we can do the same with the is_able_to_parallelize_merge flag for those Data types which support it, like

if (settings->max_threads > 1)
            return createAggregateFunctionUniq<
                ..., AggregateFunctionUniqExactData<..., true /* is_able_to_parallelize_merge */>>
else
            return createAggregateFunctionUniq<
                ..., AggregateFunctionUniqExactData<..., false /* is_able_to_parallelize_merge */>>

Member Author

I also thought about that; it looks cleaner.

@@ -1263,30 +1251,6 @@ class HashTable :
ptr->write(wb);
}

void writeText(DB::WriteBuffer & wb) const
Member

I am a bit confused about why this code is removed.
It's probably unused, but it may still be helpful for debugging.
I would prefer to do this in a separate PR if possible.

Member Author

yeah, let's leave it untouched

Member

@KochetovNicolai KochetovNicolai left a comment


Generally looks ok.

@nickitat nickitat merged commit 7beb58b into ClickHouse:master Nov 17, 2022
nickitat added a commit that referenced this pull request Nov 17, 2022
Development

Successfully merging this pull request may close these issues.

uniqExact (= distinct count) is slow