
Prune unused columns from ARRAY JOIN#99587

Merged
novikd merged 10 commits into master from optimize-array-join-analyzer
Mar 18, 2026

Member

@novikd novikd commented Mar 16, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix performance degradation in the analyzer. Prune unused columns from ARRAY JOIN.

Closes #74878.
Closes #91855.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Contributor

clickhouse-gh bot commented Mar 16, 2026

Workflow [PR], commit [f37e585]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-bugfix label (Pull request with bugfix, not backported by default) Mar 16, 2026
}

auto expr_it = expressions_usage.find(column_node->getColumnName());
if (expr_it == expressions_usage.end() || expr_it->second.isUsed())
Contributor

❌ This pruning changes query semantics for multi-expression ARRAY JOIN when arrays have different lengths.

Before this pass, SELECT b FROM t ARRAY JOIN a, b validates both arrays and throws on size mismatch. After dropping unused a, the same query can stop throwing and produce rows from b only.

That is an observable correctness change (exception -> success). Please preserve mismatch validation for originally declared ARRAY JOIN expressions (or gate this behavior behind a compatibility/experimental setting).
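
The behavior described in this comment can be modelled outside ClickHouse. The following is a minimal sketch, not the server's implementation: `Array`, `Rows`, `alignedArrayJoin` and the error text are hypothetical names chosen for illustration. It shows why dropping an unused array from an aligned join can turn an exception into a successful result.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for one row's array value and the joined output; not ClickHouse types.
using Array = std::vector<int>;
using Rows = std::vector<std::vector<int>>;

// Toy model of an aligned ARRAY JOIN over the arrays of a single source row:
// all arrays must have the same length, otherwise the call throws.
Rows alignedArrayJoin(const std::vector<Array> & arrays)
{
    if (arrays.empty())
        return {};

    const std::size_t size = arrays.front().size();
    for (const auto & array : arrays)
        if (array.size() != size)
            throw std::runtime_error("array sizes do not match");

    Rows rows(size);
    for (std::size_t i = 0; i < size; ++i)
        for (const auto & array : arrays)
            rows[i].push_back(array[i]);
    return rows;
}

// Usage: joining `a, b` throws on a size mismatch, while joining only `b` succeeds.
void demoAlignedArrayJoin()
{
    const Array a{1, 2};
    const Array b{10, 20, 30};

    bool threw = false;
    try
    {
        alignedArrayJoin({a, b});
    }
    catch (const std::runtime_error &)
    {
        threw = true;
    }
    assert(threw);
    assert(alignedArrayJoin(std::vector<Array>{b}).size() == 3);
}
```

In this model the mismatch check is the only place where the unused array `a` influences the outcome, which is the case the comment is concerned with.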

Member Author

No, it does not, because the optimisation checks the enable_unaligned_array_join setting:

  • If the setting is disabled, arrays of different lengths are not allowed. Thus, we can assume equal lengths and remove the unused expression.
  • If the setting is enabled, arrays of different lengths are allowed. Thus, we cannot assume equal lengths and must keep the expression.
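
The gating described in the two bullets can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the `Settings` struct, `pruneArrayJoinExpressions` and the use of plain strings for expressions are hypothetical simplifications. Only the setting name `enable_unaligned_array_join` comes from the discussion.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified settings holder; only the setting named above is modelled.
struct Settings
{
    bool enable_unaligned_array_join = false;
};

// Returns the ARRAY JOIN expressions that remain after pruning.
std::vector<std::string> pruneArrayJoinExpressions(
    const std::vector<std::string> & expressions,
    const std::unordered_set<std::string> & used,
    const Settings & settings)
{
    // With unaligned ARRAY JOIN, arrays may differ in length, so an unused
    // array can still change the number of output rows: keep everything.
    if (settings.enable_unaligned_array_join)
        return expressions;

    // Otherwise the lengths are assumed equal and unused expressions are dropped.
    std::vector<std::string> kept;
    for (const auto & expression : expressions)
        if (used.count(expression) != 0)
            kept.push_back(expression);
    return kept;
}

// Usage: `SELECT b ... ARRAY JOIN a, b` keeps only `b` unless the setting is enabled.
void demoPruneArrayJoinExpressions()
{
    const std::vector<std::string> expressions{"a", "b"};
    const std::unordered_set<std::string> used{"b"};

    Settings aligned;
    const auto pruned = pruneArrayJoinExpressions(expressions, used, aligned);
    assert(pruned.size() == 1 && pruned.front() == "b");

    Settings unaligned;
    unaligned.enable_unaligned_array_join = true;
    assert(pruneArrayJoinExpressions(expressions, used, unaligned) == expressions);
}
```

The sketch does not handle the case where no expression is used at all. What the real pass does there is not stated in this thread.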

Contributor

clickhouse-gh bot commented Mar 18, 2026

LLVM Coverage Report

Metric     Baseline  Current  Δ
Lines      83.70%    83.80%   +0.10%
Functions  23.90%    23.90%   +0.00%
Branches   76.30%    76.30%   +0.00%

PR changed-lines coverage: 89.35% (235/263, 0 noise lines excluded)
Diff coverage report
Uncovered code

Comment on lines +91 to +93
if (arguments.size() >= 2)
{
if (auto * column_node = arguments[0]->as<ColumnNode>())
Member

That does not look so nice. tupleElement should always have 2 args AFAIK.
I'd rather use arguments.at(0)
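
The suggestion relies on `std::vector::at`, which checks bounds and throws `std::out_of_range`, so a violated invariant fails loudly instead of being skipped by a size check. A minimal sketch, assuming a plain vector of strings as a stand-in for the real argument list (`Arguments` and `firstArgument` are hypothetical names):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the argument list of a function node.
using Arguments = std::vector<std::string>;

// at() throws std::out_of_range on a bad index, whereas operator[]
// with an out-of-range index is undefined behaviour.
const std::string & firstArgument(const Arguments & arguments)
{
    return arguments.at(0);
}

// Usage: a well-formed two-argument list works, an empty one throws.
void demoFirstArgument()
{
    const Arguments tuple_element_args{"tuple_column", "1"};
    assert(firstArgument(tuple_element_args) == "tuple_column");

    bool threw = false;
    try
    {
        firstArgument(Arguments{});
    }
    catch (const std::out_of_range &)
    {
        threw = true;
    }
    assert(threw);
}
```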

@KochetovNicolai KochetovNicolai self-assigned this Mar 18, 2026
@novikd novikd added this pull request to the merge queue Mar 18, 2026
Merged via the queue into master with commit af609c0 Mar 18, 2026
163 checks passed
@novikd novikd deleted the optimize-array-join-analyzer branch March 18, 2026 17:03
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Mar 18, 2026
@novikd novikd added the pr-must-backport label (Pull request should be backported intentionally. Use this label with great care!) Mar 18, 2026
@robot-ch-test-poll robot-ch-test-poll added the pr-must-backport-synced label (The `*-must-backport` labels are synced into the cloud Sync PR) Mar 18, 2026
robot-ch-test-poll added a commit that referenced this pull request Mar 18, 2026
Cherry pick #99587 to 25.8: Prune unused columns from ARRAY JOIN
robot-ch-test-poll added a commit that referenced this pull request Mar 18, 2026
Cherry pick #99587 to 25.12: Prune unused columns from ARRAY JOIN
robot-ch-test-poll added a commit that referenced this pull request Mar 18, 2026
Cherry pick #99587 to 26.1: Prune unused columns from ARRAY JOIN
robot-ch-test-poll added a commit that referenced this pull request Mar 18, 2026
Cherry pick #99587 to 26.2: Prune unused columns from ARRAY JOIN
clickhouse-gh bot added a commit that referenced this pull request Mar 19, 2026
Backport #99587 to 26.1: Prune unused columns from ARRAY JOIN
clickhouse-gh bot added a commit that referenced this pull request Mar 19, 2026
Backport #99587 to 26.2: Prune unused columns from ARRAY JOIN
Member

zlareb1 commented Mar 19, 2026

@novikd this PR has introduced a bug: #100026

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-backports-created label (Backport PRs are successfully created, it won't be processed by CI script anymore) Mar 20, 2026
nikitamikhaylov added a commit that referenced this pull request Mar 31, 2026
Backport #99587 to 25.8: Prune unused columns from ARRAY JOIN

Development

Successfully merging this pull request may close these issues.

  • Performance regression with analyzer: ARRAY JOIN with implicitly nested columns
  • Analyzer: ARRAY JOIN with Nested reads all subcolumns

6 participants