Fix Not-ready Set exception when IN subquery is moved to PREWHERE #100375
alexey-milovidov wants to merge 31 commits into master
When `optimizePrewhere` moves a filter condition containing `IN (subquery)` from WHERE to PREWHERE, the set for the subquery may not have been built yet. `buildSetsForDAG` was only called in `applyFilters`, which runs before the PREWHERE optimization. After `optimizePrewhere` moves the condition, the set remains unbuilt and causes a "Not-ready Set is passed" exception during MergeTree data reading.
Fix by calling `buildSetsForDAG` in `ReadFromMergeTree::updatePrewhereInfo` so that any sets in newly-assigned PREWHERE actions are built synchronously.
Fixes #100318
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
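
The ordering problem described in this commit can be illustrated with a toy model. This is only a sketch: `ToySet`, `ToyDag`, `ToyReadStep`, and `buildSetsForToyDag` are hypothetical names invented for illustration and are not ClickHouse classes. It assumes the moved condition's set was not built by any earlier pass.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazily built IN-subquery set (not ClickHouse's FutureSet).
struct ToySet
{
    bool ready = false;
    void build() { ready = true; }
};

// Hypothetical stand-in for a filter DAG: it only tracks the sets it references.
struct ToyDag
{
    std::vector<std::shared_ptr<ToySet>> sets;
};

// Builds every set referenced by the DAG that is not ready yet.
inline void buildSetsForToyDag(ToyDag & dag)
{
    for (auto & set : dag.sets)
        if (!set->ready)
            set->build();
}

struct ToyReadStep
{
    ToyDag prewhere;

    // `build_sets` toggles the fix: build sets as soon as PREWHERE actions are assigned.
    void updatePrewhere(ToyDag moved, bool build_sets)
    {
        prewhere = std::move(moved);
        if (build_sets)
            buildSetsForToyDag(prewhere);
    }

    // Reading with an unbuilt set is the toy analogue of "Not-ready Set is passed".
    void read() const
    {
        for (const auto & set : prewhere.sets)
            if (!set->ready)
                throw std::logic_error("Not-ready Set is passed");
    }
};
```

In this model, calling `updatePrewhere(dag, false)` followed by `read()` throws, while `updatePrewhere(dag, true)` makes `read()` succeed.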
Workflow [PR], commit [0052a6b] Summary: ⏳
AI Review
Summary: This PR fixes the …
ClickHouse Rules
Final Verdict: Status: ✅ Approve
…where-not-ready-set
When `optimizePrewhere` moves a filter with `GLOBAL IN (subquery)` to PREWHERE, the synchronous `buildSetsForDAG` call in `updatePrewhereInfo` would build the set before `ReadFromRemote` had a chance to attach an external table via `setExternalTable`. This caused a LOGICAL_ERROR: "Trying to attach external table to a ready set without explicit elements".
Add `buildSetsForDAGExcludingGlobalIn` that skips sets used as arguments to `globalIn`/`globalNotIn` functions. Those sets will be built later by the pipeline-level `CreatingSetsTransform` after external tables are set up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    if (node.type == ActionsDAG::ActionType::FUNCTION && node.function_base)
    {
        auto name = node.function_base->getName();
        if (name == "globalIn" || name == "globalNotIn")
`buildSetsForDAGExcludingGlobalIn` currently skips only `globalIn`/`globalNotIn`, but GLOBAL IN can also appear as `globalNullIn`/`globalNotNullIn` (e.g. with `transform_null_in`). Those variants still require external-table attachment in `ReadFromRemote`; if built here, they can hit the same "Trying to attach external table to a ready set without explicit elements" exception.
Please extend the exclusion predicate to include the null-aware global variants too (and ideally centralize this check via `functionIsInOrGlobalInOperator` + `isGlobalInOperator`).
- Update `02967_parallel_replicas_joins_and_analyzer` reference: remove `CreatingSet` steps that no longer appear in EXPLAIN output because sets for IN subqueries are now built synchronously in `updatePrewhereInfo`.
- Update `03800_autopr_reuse_index_analysis` reference: adjust `IndexAnalysisRounds` counts that increased because the parallel replicas plan now also builds sets via `buildSetsForDAGExcludingGlobalIn`, causing additional index analysis passes for IN subqueries.
- Address review feedback: extend GLOBAL IN exclusion in `buildSetsForDAGExcludingGlobalIn` to also skip `globalNullIn` and `globalNotNullIn` (null-aware GLOBAL IN variants used with `transform_null_in`), which also require external table attachment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    SELECT ref_4.v0 FROM (
        SELECT row_number() OVER (PARTITION BY t_100318_mt.v0) AS c_1
        FROM t_100318_mt
        WHERE t_100318_mt.v2 IN (SELECT 1 FROM t_100318_log)
This regression test covers local `IN (subquery)` moved to PREWHERE, but the new logic in `buildSetsForDAGExcludingGlobalIn` is specifically about GLOBAL IN variants (`globalIn`, `globalNotIn`, `globalNullIn`, `globalNotNullIn`).
Could you add a dedicated test where GLOBAL IN is moved to PREWHERE (ideally also with `transform_null_in=1`), so we lock in the fix for the external-table attachment path and prevent regressions of "Trying to attach external table to a ready set"?
Extract `functionIsGlobalInOperator` helper in `misc.h` and use it in `buildSetsForDAGExcludingGlobalIn` instead of hardcoded string comparisons. Add a dedicated test for `GLOBAL IN` moved to PREWHERE with `transform_null_in = 1` to cover null-aware global variants (`globalNullIn`/`globalNotNullIn`). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
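
A centralized name check of the kind this commit describes might look roughly like the sketch below. This is a standalone illustration, not the helper from `misc.h`: the name `isGlobalInFunctionName` and its signature are assumptions, and only the four function names come from the review discussion above.

```cpp
#include <string_view>

// Hypothetical helper: returns true for every GLOBAL IN function variant named
// in the review, including the null-aware ones used with `transform_null_in`.
// The real helper's name and signature in ClickHouse may differ.
constexpr bool isGlobalInFunctionName(std::string_view name)
{
    return name == "globalIn" || name == "globalNotIn"
        || name == "globalNullIn" || name == "globalNotNullIn";
}
```

Keeping the list in one predicate means a caller cannot forget the null-aware variants, which is the gap the original hardcoded comparison had.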
…collision
GLOBAL IN sets are populated via external tables attached by `ReadFromRemote` and cannot be built synchronously during PREWHERE evaluation. Instead of trying to skip them in `buildSetsForDAGExcludingGlobalIn` (which leaves the sets unbuilt and causes "Not-ready Set" errors), prevent the optimizer from moving GLOBAL IN conditions to PREWHERE in the first place via `cannotBeMoved`.
Also rename `04068_global_in_subquery_prewhere` to `04070_global_in_subquery_prewhere` to avoid collision with `04068_constant_fold_union_intersect` from master.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Since `globalIn`/`globalNotIn` conditions are now prevented from being moved to PREWHERE, the WHERE step in EXPLAIN output changes from `Expression` to `Filter` for GLOBAL IN queries (the filter remains in WHERE instead of being absorbed by PREWHERE). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.
The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158
The flaky check failure is fixed in #102148; let's update the branch.
… pushdown
When `updatePrewhereInfo` built IN-subquery sets synchronously via `buildSetInplace`, the resulting `Set` had `isCreated()==true` but no explicit elements. Subsequent primary-key and skip-index analysis in `KeyCondition` / `MergeTreeIndexSet` / bloom-filter / text-index conditions uses `FutureSetFromSubquery::buildOrderedSetInplace`, which early-returns `nullptr` on a created-but-empty set, so the set cannot be used for index filtering and the table is scanned fully.
This manifested on parallel-replicas shards with `parallel_replicas_local_plan=1` and `parallel_replicas_index_analysis_only_on_coordinator=1`, where `ReadFromMergeTree::selectRangesToReadImpl` skips local index analysis (so `KeyCondition`'s ordered build never runs first) and our unordered build wins the race. Local `IN (subquery)` queries then lost PK / skip-index pruning, causing `TOO_MANY_ROWS` failures in `01583_const_column_in_set_index`, `01585_use_index_for_global_in` and inflated `ReadCompressedBytes` in `03801_autopr_input_bytes_estimation_query_with_subqueries`.
Fix: prefer `buildOrderedSetInplace` in `buildSetsForDAGExcludingGlobalIn`, falling back to `buildSetInplace` only when `use_index_for_in_with_subqueries` is disabled (in which case the ordered path returns `nullptr` without building, so we still need the unordered build to satisfy the original "Not-ready Set" fix).
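
The "ordered first, unordered as fallback" rule from this commit can be sketched with a small model. Everything here is hypothetical (`ToySet`, `ToyFutureSet`, `buildSetForPrewhere`); it only mirrors the behavior stated in the commit message: the ordered build returns null when the index setting is off or when the set was already created without explicit elements. That an already-ordered set is returned as-is is an assumption of the model.

```cpp
#include <memory>

// Hypothetical model of a set built from an IN subquery.
struct ToySet
{
    bool created = false;
    bool has_explicit_elements = false;  // in this model, required for index analysis
};

struct ToyFutureSet
{
    std::shared_ptr<ToySet> set = std::make_shared<ToySet>();

    // Unordered build: marks the set created but keeps no explicit elements.
    std::shared_ptr<ToySet> buildUnordered()
    {
        set->created = true;
        return set;
    }

    // Ordered build: returns nullptr if the index setting is disabled, or if the
    // set was already created without explicit elements.
    std::shared_ptr<ToySet> buildOrdered(bool use_index_for_in)
    {
        if (!use_index_for_in)
            return nullptr;
        if (set->created)
        {
            if (set->has_explicit_elements)
                return set;
            return nullptr;
        }
        set->created = true;
        set->has_explicit_elements = true;
        return set;
    }
};

// Prefer the ordered build; fall back to the unordered one so the set is
// always ready by the time PREWHERE is evaluated.
inline std::shared_ptr<ToySet> buildSetForPrewhere(ToyFutureSet & future_set, bool use_index_for_in)
{
    if (auto ordered = future_set.buildOrdered(use_index_for_in))
        return ordered;
    return future_set.buildUnordered();
}
```

In the model, calling `buildUnordered` first makes a later `buildOrdered(true)` return null, which corresponds to the lost index pruning; going through `buildSetForPrewhere` first leaves the set usable for a later ordered lookup.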
`04060_explain_pretty_joins_sets`: now that `buildSetsForDAGExcludingGlobalIn` also builds the non-PK side of `b IN subquery1 AND a IN subquery2`, the `CreatingSets` step has a single `ReadFromMergeTree` child, with no remaining `CreatingSet` for `subquery1` (the set is already populated).
`03800_autopr_reuse_index_analysis`: pin `query_plan_optimize_prewhere = 1` on queries 3, 4, 5. Without the pin, randomized CI runs that flip `query_plan_optimize_prewhere = 0` skip `optimizePrewhere` (and therefore the synchronous `buildSetsForDAGExcludingGlobalIn` call), producing fewer `IndexAnalysisRounds` than the reference expects.
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100375&sha=8085f6c1d87be999ed8573d5f67a2618cc1ea61e
PR: #100375
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ysis`
The earlier pin of `query_plan_optimize_prewhere = 1` (in `e519f5ee7f0`) was not enough: when CI randomization sets `optimize_move_to_prewhere = 0`, `MergeTreeWhereOptimizer::optimize` skips moving the IN-subquery to PREWHERE even with `query_plan_optimize_prewhere = 1`, so `buildSetsForDAGExcludingGlobalIn` is not invoked and `IndexAnalysisRounds` drops below the reference (3, 5, 3 → 2, 3, 2 for queries 3-5).
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100375&sha=9768b76464cee92673fc0fdec72bb9c2064b2e9f
PR: #100375
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The plan built by `considerEnablingParallelReplicas` for statistics collection (`automatic_parallel_replicas_mode=2`) runs `optimizePrewhere` without `optimizePrimaryKeyConditionAndLimit`, so `applyFilters` is skipped there. With our previous code, `updatePrewhereInfo` would still synchronously build IN-subquery sets via `buildSetsForDAGExcludingGlobalIn`, re-executing the subquery in a second `FutureSet` instance and double-counting its rows against `max_rows_to_read`.
Guard the synchronous set build behind `indexes.has_value()`, which is true only after `applyFilters` has run. The parallel replicas plan is either discarded (mode 2) or replaces the original plan and goes through `addStepsToBuildSets` later; neither path needs synchronous set building here.
Fixes failing CI tests `01585_use_index_for_global_in`, `01585_use_index_for_global_in_with_null`, and `03801_autopr_input_bytes_estimation_query_with_subqueries` reported in #100375.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
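
The effect of the `indexes.has_value()` guard can be shown with a toy model. The types below (`ToyQueryContext`, `ToyIndexes`, `ToyReadStep`) are hypothetical, and the row accounting is an assumed simplification of how rows are counted against `max_rows_to_read`; the sketch only shows why a plan that never ran `applyFilters` should not execute the subquery again.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical query-wide counter of rows read (compared against a row limit elsewhere).
struct ToyQueryContext
{
    std::size_t rows_read = 0;
};

// Placeholder for the result of index analysis.
struct ToyIndexes
{
};

struct ToyReadStep
{
    ToyQueryContext & ctx;
    std::size_t subquery_rows;          // rows the IN subquery would read when executed
    std::optional<ToyIndexes> indexes;  // filled only by applyFilters

    void applyFilters() { indexes.emplace(); }

    // With `guarded == true`, the set is built only if applyFilters has run.
    void updatePrewhere(bool guarded)
    {
        if (guarded && !indexes.has_value())
            return;
        ctx.rows_read += subquery_rows;  // executing the subquery to build the set
    }
};
```

In this model, a main plan that runs `applyFilters` and a statistics-only plan that does not will together count the subquery's rows once when guarded, and twice when unguarded.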
After commit `31e5af2c995` ("Skip set building in `updatePrewhereInfo` for
parallel replicas plan") guarded the synchronous set build behind
`indexes.has_value()`, the parallel replicas plan path (which runs with
`query_plan_optimize_primary_key = false`, so `applyFilters` is skipped)
no longer triggers the extra subquery execution. As a result,
`IndexAnalysisRounds` for queries 3, 4, 5 dropped back to the original
master values (2, 3, 2) instead of the briefly elevated (3, 5, 3) that
the unguarded set build had produced.
Revert the reference and drop the now-unnecessary per-query
`query_plan_optimize_prewhere`/`optimize_move_to_prewhere` pins — the
counts are stable across randomized settings again.
CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100375&sha=31e5af2c995eea79616460595876c00a5722c5e2
PR: #100375
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: 91.43% (32/35)
When `optimizePrewhere` moves a filter condition containing `IN (subquery)` from WHERE to PREWHERE, the set for the subquery may not have been built yet. `buildSetsForDAG` was only called in `applyFilters`, which runs before the PREWHERE optimization. After `optimizePrewhere` moves the condition via `updatePrewhereInfo`, the set remains unbuilt and causes a "Not-ready Set is passed" exception (`LOGICAL_ERROR`) during MergeTree data reading.
The fix calls `buildSetsForDAG` in `ReadFromMergeTree::updatePrewhereInfo` so that any sets in newly-assigned PREWHERE actions are built synchronously, matching the existing pattern already used in `applyFilters`.
Reproduces with a query that combines UNION ALL, a window function, and `IN (subquery)` on a MergeTree table with multiple parts: the IN condition gets moved to PREWHERE, but its set was never built.
Fixes #100318
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix "Not-ready Set" exception when a filter with `IN (subquery)` is moved to PREWHERE by the query optimizer.
Documentation entry for user-facing changes