Prune unused columns from ARRAY JOIN #99587
}

auto expr_it = expressions_usage.find(column_node->getColumnName());
if (expr_it == expressions_usage.end() || expr_it->second.isUsed())
❌ This pruning changes query semantics for multi-expression ARRAY JOIN when arrays have different lengths.
Before this pass, SELECT b FROM t ARRAY JOIN a, b validates both arrays and throws on size mismatch. After dropping unused a, the same query can stop throwing and produce rows from b only.
That is an observable correctness change (exception -> success). Please preserve mismatch validation for originally declared ARRAY JOIN expressions (or gate this behavior behind a compatibility/experimental setting).
No, it does not, because the optimisation checks the enable_unaligned_array_join setting:
- If the setting is disabled, arrays of different lengths are not allowed, so the pass can assume equal lengths and remove the unused expression.
- If the setting is enabled, arrays of different lengths are allowed, so the pass cannot assume equal lengths and does not remove the usage.
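The gating described in this reply can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: `ExpressionUsage`, `pruneArrayJoinExpressions`, and the plain string expression names are hypothetical stand-ins, and only the `expressions_usage` lookup and the `enable_unaligned_array_join` setting name come from the discussion. The sketch also does not address the case where every expression turns out to be unused.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-expression usage record referenced in the diff.
struct ExpressionUsage
{
    bool used = false;
    bool isUsed() const { return used; }
};

// Returns the ARRAY JOIN expressions to keep, in their original order.
// `enable_unaligned_array_join` mirrors the setting named in the discussion.
std::vector<std::string> pruneArrayJoinExpressions(
    const std::vector<std::string> & expressions,
    const std::unordered_map<std::string, ExpressionUsage> & expressions_usage,
    bool enable_unaligned_array_join)
{
    // Arrays may differ in length when the setting is on, so equal lengths
    // cannot be assumed and nothing is pruned.
    if (enable_unaligned_array_join)
        return expressions;

    std::vector<std::string> kept;
    for (const auto & name : expressions)
    {
        auto expr_it = expressions_usage.find(name);
        // Keep expressions without a usage record (conservative choice) and used ones.
        if (expr_it == expressions_usage.end() || expr_it->second.isUsed())
            kept.push_back(name);
    }
    return kept;
}
```

For `SELECT b FROM t ARRAY JOIN a, b` with the setting disabled, this sketch would keep only `b`; with the setting enabled it would keep both `a` and `b`.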
LLVM Coverage Report
PR changed-lines coverage: 89.35% (235/263, 0 noise lines excluded)
if (arguments.size() >= 2)
{
    if (auto * column_node = arguments[0]->as<ColumnNode>())
That doesn't look very nice. tupleElement should always have 2 arguments, AFAIK.
I'd rather use arguments.at(0).
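A minimal sketch of what the suggestion amounts to, assuming the argument list behaves like a `std::vector`: `at()` is bounds-checked and throws `std::out_of_range`, so a broken "always 2 arguments" invariant surfaces as an exception instead of being silently skipped by a size check. `Node`, `NodePtr`, and `firstArgumentName` are hypothetical names for illustration, not the query-tree classes from the PR.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal node type; the real query-tree classes are not reproduced here.
struct Node
{
    std::string name;
};

using NodePtr = std::shared_ptr<Node>;

// Returns the name of the first argument in a function's argument list.
// std::vector::at performs a bounds check and throws std::out_of_range
// when the index is invalid, unlike operator[], which has undefined behaviour.
std::string firstArgumentName(const std::vector<NodePtr> & arguments)
{
    return arguments.at(0)->name;
}
```

The trade-off is that the explicit `size() >= 2` guard tolerates malformed input quietly, while `at(0)` treats it as an error worth reporting.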
Cherry pick #99587 to 25.8: Prune unused columns from ARRAY JOIN
Cherry pick #99587 to 25.12: Prune unused columns from ARRAY JOIN
Cherry pick #99587 to 26.1: Prune unused columns from ARRAY JOIN
Cherry pick #99587 to 26.2: Prune unused columns from ARRAY JOIN
Backport #99587 to 26.1: Prune unused columns from ARRAY JOIN
Backport #99587 to 26.2: Prune unused columns from ARRAY JOIN
Backport #99587 to 25.8: Prune unused columns from ARRAY JOIN
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix performance degradation in the analyzer. Prune unused columns from ARRAY JOIN.
Closes #74878.
Closes #91855.
Documentation entry for user-facing changes