-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use AsyncPipelineExecutor all dictionaries #55839
Changes from all commits
0ece6e0
b4851cf
1aa4a54
8bc384e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4746,3 +4746,18 @@ a Tuple( | |
l Nullable(String) | ||
) | ||
``` | ||
|
||
## dictionary_use_async_executor {#dictionary_use_async_executor} | ||
|
||
Execute a pipeline for reading dictionary source in several threads. It's supported only by dictionaries with local CLICKHOUSE source. | ||
|
||
You may specify it in `SETTINGS` section of dictionary definition: | ||
|
||
```sql | ||
CREATE DICTIONARY t1_dict ( key String, attr UInt64 ) | ||
PRIMARY KEY key | ||
SOURCE(CLICKHOUSE(QUERY `SELECT key, attr FROM t1 GROUP BY key`)) | ||
LIFETIME(MIN 0 MAX 3600) | ||
LAYOUT(COMPLEX_KEY_HASHED_ARRAY()) | ||
SETTINGS(dictionary_use_async_executor=1, max_threads=8); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why someone could want to not use async executor? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the query itself is more lightweight that dictionary filling. To consume less query by executing source query and not to spawn additional threads. But if we will realize that it's always beneficial we may set it by default if a future. |
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,11 +9,15 @@ | |
#include <Poco/Util/AbstractConfiguration.h> | ||
#include <Common/SettingsChanges.h> | ||
|
||
#include <Processors/Executors/PullingPipelineExecutor.h> | ||
#include <Processors/Executors/PullingAsyncPipelineExecutor.h> | ||
|
||
namespace DB | ||
{ | ||
|
||
namespace ErrorCodes | ||
{ | ||
extern const int LOGICAL_ERROR; | ||
extern const int SIZES_OF_COLUMNS_DOESNT_MATCH; | ||
} | ||
|
||
|
@@ -130,4 +134,30 @@ String TransformWithAdditionalColumns::getName() const | |
{ | ||
return "TransformWithAdditionalColumns"; | ||
} | ||
|
||
DictionaryPipelineExecutor::DictionaryPipelineExecutor(QueryPipeline & pipeline_, bool async) | ||
: async_executor(async ? std::make_unique<PullingAsyncPipelineExecutor>(pipeline_) : nullptr) | ||
, executor(async ? nullptr : std::make_unique<PullingPipelineExecutor>(pipeline_)) | ||
{} | ||
|
||
bool DictionaryPipelineExecutor::pull(Block & block) | ||
{ | ||
if (async_executor) | ||
{ | ||
while (true) | ||
{ | ||
bool has_data = async_executor->pull(block); | ||
if (has_data && !block) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why we don't skip empty blocks when pulling from sync executor? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sync executor never returns them. Actually Async shouldn't do it either. Perhaps this fix can help #55945 , but not sure it that is correct and complete fix. |
||
continue; | ||
return has_data; | ||
} | ||
} | ||
else if (executor) | ||
return executor->pull(block); | ||
else | ||
throw Exception(ErrorCodes::LOGICAL_ERROR, "DictionaryPipelineExecutor is not initialized"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe let's do it straight in the ctor? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In ctor it's clear that one of executors is initialized from initializer-list. I added this branch just in case if new ctor is added and there it may be forgotten. |
||
} | ||
|
||
DictionaryPipelineExecutor::~DictionaryPipelineExecutor() = default; | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear how many threads will be used if I wouldn't specify
max_threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a default behavior (with
max_threads=0
or'auto'
and set to the number of cpus). Let me recheck it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPD: behavior is the same as for regular query, max_threads set automatically