New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use AsyncPipelineExecutor all dictionaries #55839
Conversation
This is an automated comment for commit 8bc384e with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
78848fd
to
b4851cf
Compare
SOURCE(CLICKHOUSE(QUERY `SELECT key, attr FROM t1 GROUP BY key`)) | ||
LIFETIME(MIN 0 MAX 3600) | ||
LAYOUT(COMPLEX_KEY_HASHED_ARRAY()) | ||
SETTINGS(dictionary_use_async_executor=1, max_threads=8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear how many threads will be used if I wouldn't specify max_threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a default behavior (with max_threads=0
or 'auto'
and set to the number of cpus). Let me recheck it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPD: behavior is the same as for regular query, max_threads set automatically
SOURCE(CLICKHOUSE(QUERY `SELECT key, attr FROM t1 GROUP BY key`)) | ||
LIFETIME(MIN 0 MAX 3600) | ||
LAYOUT(COMPLEX_KEY_HASHED_ARRAY()) | ||
SETTINGS(dictionary_use_async_executor=1, max_threads=8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why someone could want to not use async executor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the query itself is more lightweight that dictionary filling. To consume less query by executing source query and not to spawn additional threads. But if we will realize that it's always beneficial we may set it by default if a future.
else if (executor) | ||
return executor->pull(block); | ||
else | ||
throw Exception(ErrorCodes::LOGICAL_ERROR, "DictionaryPipelineExecutor is not initialized"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe let's do it straight in the ctor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ctor it's clear that one of executors is initialized from initializer-list. I added this branch just in case if new ctor is added and there it may be forgotten.
while (true) | ||
{ | ||
bool has_data = async_executor->pull(block); | ||
if (has_data && !block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we don't skip empty blocks when pulling from sync executor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sync executor never returns them. Actually Async shouldn't do it either. Perhaps this fix can help #55945 , but not sure it that is correct and complete fix.
const auto dict_id = StorageID::fromDictionaryConfig(config, config_prefix); | ||
|
||
auto context = copyContextAndApplySettingsFromDictionaryConfig(global_context, config, config_prefix); | ||
const auto * clickhouse_source = typeid_cast<const ClickHouseDictionarySource *>(source_ptr.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why here it is typeid_cast
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as class ClickHouseDictionarySource final
it shouldn't be any difference between typeid_cast
and dynamic_cast
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
dictionary_use_async_executor
Documentation entry for user-facing changes