Skip to content

Add enable_add_distinct_to_in_subqueries setting to optimize distributed IN subqueries.#81908

Merged
novikd merged 49 commits intoClickHouse:masterfrom
fhw12345:add_distinct_to_in_clause
Aug 8, 2025
Merged

Add enable_add_distinct_to_in_subqueries setting to optimize distributed IN subqueries.#81908
novikd merged 49 commits intoClickHouse:masterfrom
fhw12345:add_distinct_to_in_clause

Conversation

@fhw12345
Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

A new setting, enable_add_distinct_to_in_subqueries, has been introduced. When enabled, ClickHouse will automatically add DISTINCT to subqueries in IN clauses for distributed queries. This can significantly reduce the size of temporary tables transferred between shards and improve network efficiency.
Note: This is a trade-off—while network transfer is reduced, additional merging (deduplication) work is required on each node. Enable this setting when network transfer is a bottleneck and the merging cost is acceptable.

@novikd novikd self-assigned this Jun 16, 2025
@novikd novikd added the can be tested Allows running workflows for external contributors label Jun 16, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 16, 2025

Workflow [PR], commit [b68aebb]

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jun 16, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 16, 2025

Workflow [PR], commit [a999f29]

Summary:

job_name test_name status info comment
Stress test (amd_tsan) failure
Server died FAIL
Hung check failed, possible deadlock found (see hung_check.log) FAIL
Killed by signal (in clickhouse-server.log) FAIL
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL
Killed by signal (output files) FAIL
Found signal in gdb.log FAIL

@fhw12345
Copy link
Copy Markdown
Contributor Author

Hi @novikd could u help have a look at this PR, thanks

Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the idea of the optimization is good, but it should be implemented in another place. Please, also provide some performance test or a performance improvement mesures.

@fhw12345 fhw12345 requested a review from novikd June 19, 2025 15:33
@fhw12345
Copy link
Copy Markdown
Contributor Author

fhw12345 commented Jul 2, 2025

Hi @novikd, just following up to kindly request a review when you have a moment, thanks.

Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fhw12345
Copy link
Copy Markdown
Contributor Author

LGTM

@novikd I've updated the code following your comments, please have a check when you have a moment. Thanks!

Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's strange that the perf tests don't run the new test.

@fhw12345
Copy link
Copy Markdown
Contributor Author

It's strange that the perf tests don't run the new test.

Do I need to make any other changes to trigger the perf test for the new test?

@fhw12345 fhw12345 requested a review from novikd July 16, 2025 02:37
@fhw12345
Copy link
Copy Markdown
Contributor Author

It's strange that the perf tests don't run the new test.

And this is the perf test result in local:
image (6)

Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fhw12345
Copy link
Copy Markdown
Contributor Author

fhw12345 commented Aug 8, 2025

@novikd could you help merge the PR thanks!

@novikd
Copy link
Copy Markdown
Member

novikd commented Aug 8, 2025

PR / Stress test (amd_tsan) (pull_request)Failing after 19m

#81144

@novikd novikd added this pull request to the merge queue Aug 8, 2025
Merged via the queue into ClickHouse:master with commit dca840e Aug 8, 2025
122 of 124 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants