Add enable_add_distinct_to_in_subqueries setting to optimize distributed IN subqueries. #81908
Workflow [PR], commit [a999f29] Summary: ❌
Hi @novikd, could you take a look at this PR? Thanks.
novikd
left a comment
Overall, the idea of the optimization is good, but it should be implemented in another place. Please also provide a performance test or some measurements of the performance improvement.
Hi @novikd, just following up to kindly request a review when you have a moment. Thanks.
@novikd I've updated the code following your comments. Please take a look when you have a moment. Thanks!
novikd
left a comment
It's strange that the perf tests don't run the new test.
Do I need to make any other changes to trigger the perf test for the new test?
@novikd could you help merge the PR? Thanks!
dca840e

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
A new setting, enable_add_distinct_to_in_subqueries, has been introduced. When enabled, ClickHouse will automatically add DISTINCT to subqueries in IN clauses for distributed queries. This can significantly reduce the size of temporary tables transferred between shards and improve network efficiency.
Note: This is a trade-off. Network transfer is reduced, but additional merging (deduplication) work is required on each node. Enable this setting when network transfer is a bottleneck and the merging cost is acceptable.
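
To make the trade-off concrete, here is a rough toy model in Python. It is not ClickHouse code and does not reflect how the setting is implemented: plain lists stand in for the subquery result, and the sketch assumes that result is shipped in full to every shard. The function names, the sample rows, and the shard count are all hypothetical.

```python
# Toy model only: plain Python lists stand in for the IN-subquery result
# that would be shipped to each shard. All names and values are hypothetical.

def rows_sent(subquery_rows, shard_count, distinct):
    """Return (payload, total number of rows sent across all shards)."""
    if distinct:
        # Deduplicate while preserving first-seen order (the extra merging work).
        payload = list(dict.fromkeys(subquery_rows))
    else:
        payload = list(subquery_rows)
    return payload, len(payload) * shard_count


def filter_shard(shard_rows, payload):
    """Apply the equivalent of `x IN (payload)` to the rows of one shard."""
    lookup = set(payload)
    return [x for x in shard_rows if x in lookup]


subquery_rows = [1, 1, 2, 2, 2, 3]  # subquery result containing duplicates
shard_rows = [2, 3, 4]              # data held by one shard

plain, plain_sent = rows_sent(subquery_rows, shard_count=3, distinct=False)
dedup, dedup_sent = rows_sent(subquery_rows, shard_count=3, distinct=True)

# Fewer rows are transferred with deduplication...
print(plain_sent, dedup_sent)
# ...while the IN filter keeps exactly the same rows on the shard.
print(filter_shard(shard_rows, plain) == filter_shard(shard_rows, dedup))
```

In this model the saving grows with the share of duplicate values in the subquery result. If the values are already mostly unique, the deduplication step adds cost without reducing the payload much.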