31 commits
00f0916
Fix "Not-ready Set" exception when IN subquery is moved to PREWHERE
alexey-milovidov Mar 22, 2026
75bb6ec
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Mar 22, 2026
c72b03a
Merge branch 'master' into fix-in-subquery-prewhere-not-ready-set
alexey-milovidov Mar 23, 2026
7572716
Skip GLOBAL IN sets in `buildSetsForDAG` when building PREWHERE sets
alexey-milovidov Mar 28, 2026
1fdae6a
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Mar 28, 2026
d56437e
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Mar 29, 2026
89f66e5
Fix CI: update test references and handle null-aware GLOBAL IN variants
alexey-milovidov Mar 29, 2026
4b3f5d6
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Mar 30, 2026
9414477
Address review: centralize global IN check, add GLOBAL IN PREWHERE test
alexey-milovidov Mar 30, 2026
975768d
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Mar 30, 2026
227d460
Prevent GLOBAL IN from being moved to PREWHERE; rename test to avoid …
alexey-milovidov Mar 30, 2026
3d3306c
Update test references: GLOBAL IN no longer moved to PREWHERE
alexey-milovidov Mar 30, 2026
4b5a079
Merge branch 'master' into fix-in-subquery-prewhere-not-ready-set
alexey-milovidov Mar 31, 2026
9ea1766
Merge branch 'master' into fix-in-subquery-prewhere-not-ready-set
alexey-milovidov Apr 7, 2026
cbb06bc
Merge branch 'master' into fix-in-subquery-prewhere-not-ready-set
alexey-milovidov Apr 9, 2026
866515f
Merge branch 'master' into fix-in-subquery-prewhere-not-ready-set
alexey-milovidov Apr 10, 2026
ce16565
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 23, 2026
2f88a79
Build ordered sets in PREWHERE so KeyCondition can use them for index…
alexey-milovidov Apr 23, 2026
8085f6c
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 25, 2026
e519f5e
Update test references and pin settings for sets-built-in-PREWHERE
alexey-milovidov Apr 26, 2026
9768b76
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 27, 2026
395b7ad
Pin `optimize_move_to_prewhere = 1` in `03800_autopr_reuse_index_anal…
alexey-milovidov Apr 27, 2026
0dcddf7
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 28, 2026
bb1b0c8
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 29, 2026
8b24f30
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov Apr 30, 2026
31e5af2
Skip set building in `updatePrewhereInfo` for parallel replicas plan
alexey-milovidov Apr 30, 2026
279654e
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov May 1, 2026
a418b55
Revert `03800_autopr_reuse_index_analysis` reference to master values
alexey-milovidov May 1, 2026
8c43b95
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov May 1, 2026
ad31410
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov May 3, 2026
0052a6b
Merge remote-tracking branch 'origin/master' into fix-in-subquery-pre…
alexey-milovidov May 4, 2026
7 changes: 6 additions & 1 deletion src/Interpreters/misc.h
@@ -16,9 +16,14 @@ inline bool functionIsInOperator(const std::string & name)
return name == "in" || name == "notIn" || name == "nullIn" || name == "notNullIn";
}

+ inline bool functionIsGlobalInOperator(const std::string & name)
+ {
+ return name == "globalIn" || name == "globalNotIn" || name == "globalNullIn" || name == "globalNotNullIn";
+ }
+
inline bool functionIsInOrGlobalInOperator(const std::string & name)
{
- return functionIsInOperator(name) || name == "globalIn" || name == "globalNotIn" || name == "globalNullIn" || name == "globalNotNullIn";
+ return functionIsInOperator(name) || functionIsGlobalInOperator(name);
}

inline bool functionIsLikeOperator(const std::string & name)
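This refactor makes `functionIsInOrGlobalInOperator` delegate to the two narrower predicates, so the four GLOBAL IN spellings (including the null-aware `globalNullIn` and `globalNotNullIn`) are listed in exactly one place. The standalone sketch below reproduces the predicates with `std::string_view` arguments for illustration; `toGlobalName` and `kLocalInNames` are hypothetical helpers, not part of ClickHouse, added only to cross-check that every local variant has a matching GLOBAL spelling.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Local IN variants, same names as functionIsInOperator in misc.h.
inline bool functionIsInOperator(std::string_view name)
{
    return name == "in" || name == "notIn" || name == "nullIn" || name == "notNullIn";
}

// GLOBAL IN variants, same names as the new functionIsGlobalInOperator.
inline bool functionIsGlobalInOperator(std::string_view name)
{
    return name == "globalIn" || name == "globalNotIn" || name == "globalNullIn" || name == "globalNotNullIn";
}

// After the refactor the combined check only delegates, so each list of
// names lives in a single place.
inline bool functionIsInOrGlobalInOperator(std::string_view name)
{
    return functionIsInOperator(name) || functionIsGlobalInOperator(name);
}

// Hypothetical helper (not in ClickHouse): "notNullIn" -> "globalNotNullIn".
inline std::string toGlobalName(std::string_view local_name)
{
    assert(!local_name.empty());
    std::string result = "global";
    result += static_cast<char>(std::toupper(static_cast<unsigned char>(local_name.front())));
    result.append(local_name.substr(1));
    return result;
}

// Hypothetical list of the local variants, used to cross-check the predicates.
inline constexpr std::array<std::string_view, 4> kLocalInNames = {"in", "notIn", "nullIn", "notNullIn"};
```

Because the combined predicate is defined in terms of the other two, a variant added to `functionIsGlobalInOperator` is picked up by `functionIsInOrGlobalInOperator` and by the two new call sites in this PR (`MergeTreeWhereOptimizer::cannotBeMoved` and `buildSetsForDAGExcludingGlobalIn`).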
17 changes: 17 additions & 0 deletions src/Processors/QueryPlan/ReadFromMergeTree.cpp
@@ -2910,6 +2910,23 @@ void ReadFromMergeTree::updatePrewhereInfo(const PrewhereInfoPtr & prewhere_info
{
query_info.prewhere_info = prewhere_info_value;

/// Build sets for the new PREWHERE synchronously. PREWHERE is evaluated at the
/// storage level during data reading, before the pipeline-level CreatingSetsStep
/// has a chance to execute. If a condition with IN (subquery) was moved to PREWHERE
/// by optimizePrewhere after applyFilters already ran, the set would remain unbuilt
/// and cause a "Not-ready Set" error.
/// We must skip sets used in GLOBAL IN functions because ReadFromRemote needs to
/// attach external tables to those sets before they are built. Building them here
/// would cause "Trying to attach external table to a ready set" errors.
/// Only build sets when applyFilters has already been called for this step (indicated by
/// `indexes` being populated). The plan built by `considerEnablingParallelReplicas` for
/// statistics collection runs `optimizePrewhere` without `optimizePrimaryKeyConditionAndLimit`,
/// so `applyFilters` is skipped there and sets must not be built — the original plan's
/// `CreatingSetsStep` (added later via `addStepsToBuildSets`) handles them. Building here
/// would re-execute the IN-subquery and double-count its rows against `max_rows_to_read`.
if (query_info.prewhere_info && indexes.has_value())
VirtualColumnUtils::buildSetsForDAGExcludingGlobalIn(query_info.prewhere_info->prewhere_actions, context);

output_header = std::make_shared<const Block>(MergeTreeSelectProcessor::transformHeader(
storage_snapshot->getSampleBlockForColumns(all_column_names),
query_info.row_level_filter,
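The comment in this hunk describes two constraints: sets used by PREWHERE must be ready before reading starts, because PREWHERE runs before any pipeline-level `CreatingSetsStep`, and they should only be built here once `applyFilters` has populated `indexes`. The toy model below is a sketch of that ordering under simplifying assumptions: `ToySet` and `ToyReadStep` are hypothetical stand-ins (not ClickHouse classes), `std::optional<int>` stands in for `indexes`, and a non-empty set list stands in for a non-null `prewhere_info`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set built from an IN (subquery); not a ClickHouse class.
struct ToySet
{
    bool built = false;
    int subquery_runs = 0;  // how many times the subquery was executed

    void build()
    {
        assert(!built && "a set should be built at most once");
        built = true;
        ++subquery_runs;
    }
};

// Hypothetical stand-in for ReadFromMergeTree, reduced to the PREWHERE-set logic.
struct ToyReadStep
{
    std::optional<int> indexes;           // populated by applyFilters, like ReadFromMergeTree::indexes
    std::vector<ToySet *> prewhere_sets;  // sets referenced by the PREWHERE actions

    void applyFilters() { indexes = 0; }

    // Mirrors the guard in updatePrewhereInfo: build synchronously only if
    // there is a PREWHERE and applyFilters has already run for this step.
    void updatePrewhereInfo(std::vector<ToySet *> sets)
    {
        prewhere_sets = std::move(sets);
        if (prewhere_sets.empty() || !indexes.has_value())
            return;
        for (ToySet * set : prewhere_sets)
            if (!set->built)
                set->build();
    }

    // PREWHERE is evaluated while reading, before any pipeline-level
    // CreatingSetsStep could run, so every set must already be built.
    void read() const
    {
        for (const ToySet * set : prewhere_sets)
            if (!set->built)
                throw std::logic_error("Not-ready Set");
    }
};
```

In the toy, a step that never ran `applyFilters` (like the statistics plan built by `considerEnablingParallelReplicas`) leaves its sets untouched; that is the behavior the real guard relies on to avoid running the IN subquery a second time and double-counting its rows against `max_rows_to_read`.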
7 changes: 7 additions & 0 deletions src/Storages/MergeTree/MergeTreeWhereOptimizer.cpp
@@ -602,6 +602,13 @@ bool MergeTreeWhereOptimizer::cannotBeMoved(const RPNBuilderTreeNode & node, con
if (function_name == "arrayJoin")
return true;

/// Disallow GLOBAL IN conditions from being moved to PREWHERE.
/// GLOBAL IN sets are populated via external tables attached by `ReadFromRemote`;
/// they cannot be built synchronously during PREWHERE evaluation, which runs
/// before the pipeline-level `CreatingSetsStep` has a chance to execute.
if (functionIsGlobalInOperator(function_name))
return true;

size_t arguments_size = function_node.getArgumentsSize();
for (size_t i = 0; i < arguments_size; ++i)
{
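The new rule sits next to the existing `arrayJoin` rule. The sketch below models only those two rules on a hypothetical `ToyNode` tree (not `RPNBuilderTreeNode`). It assumes, based on the loop over arguments at the end of this hunk, that the real check also recurses into function arguments, so a GLOBAL IN nested under another function still keeps the whole condition out of PREWHERE. Any other rules in the real `cannotBeMoved` are not modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression node; function_name is empty for columns and constants.
struct ToyNode
{
    std::string function_name;
    std::vector<std::shared_ptr<ToyNode>> children;
};

bool functionIsGlobalInOperator(const std::string & name)
{
    return name == "globalIn" || name == "globalNotIn" || name == "globalNullIn" || name == "globalNotNullIn";
}

// Only the two rules visible in the hunk: arrayJoin and the GLOBAL IN variants
// keep a condition out of PREWHERE. Arguments are checked recursively (assumed).
bool cannotBeMoved(const ToyNode & node)
{
    if (node.function_name == "arrayJoin")
        return true;
    if (functionIsGlobalInOperator(node.function_name))
        return true;
    for (const auto & child : node.children)
        if (cannotBeMoved(*child))
            return true;
    return false;
}
```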
53 changes: 53 additions & 0 deletions src/Storages/VirtualColumnUtils.cpp
@@ -1,12 +1,14 @@
#include <memory>
#include <stack>
#include <unordered_set>

#include <Storages/VirtualColumnUtils.h>

#include <Core/NamesAndTypes.h>

#include <Interpreters/Context.h>
#include <Interpreters/ExpressionActions.h>
#include <Interpreters/misc.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/TreeRewriter.h>
#include <Interpreters/convertFieldToType.h>
@@ -97,6 +99,57 @@ void buildSetsForDAG(const ActionsDAG & dag, const ContextPtr & context)
buildSetsForDagImpl(dag, context, /* ordered = */ false);
}

void buildSetsForDAGExcludingGlobalIn(const ActionsDAG & dag, const ContextPtr & context)
{
/// Collect ColumnSet nodes that are arguments to globalIn/globalNotIn functions.
/// These sets must NOT be built synchronously here because ReadFromRemote needs to
/// attach external tables to them first (via setExternalTable). Building them early
/// would make the set "created" without explicit elements, causing a LOGICAL_ERROR.
std::unordered_set<const ActionsDAG::Node *> global_in_set_nodes;
for (const auto & node : dag.getNodes())
{
if (node.type == ActionsDAG::ActionType::FUNCTION && node.function_base)
{
auto name = node.function_base->getName();
if (functionIsGlobalInOperator(name))
{
/// The set is the second argument (index 1)
if (node.children.size() >= 2)
global_in_set_nodes.insert(node.children[1]);
}
}
}

for (const auto & node : dag.getNodes())
{
if (node.type == ActionsDAG::ActionType::COLUMN && !global_in_set_nodes.contains(&node))
{
const ColumnSet * column_set = checkAndGetColumnConstData<const ColumnSet>(node.column.get());
if (!column_set)
column_set = checkAndGetColumn<const ColumnSet>(node.column.get());

if (column_set)
{
auto future_set = column_set->getData();
if (!future_set->get())
{
if (auto * set_from_subquery = typeid_cast<FutureSetFromSubquery *>(future_set.get()))
{
/// Prefer ordered build so that the set retains explicit elements,
/// which `KeyCondition` and skip-index analysis require to use the set
/// for primary-key / skip-index filtering (via `buildOrderedSetInplace`).
/// If `use_index_for_in_with_subqueries` is disabled, the ordered build
/// returns `nullptr` without building; fall back to unordered so the set
/// is still ready when PREWHERE is evaluated at read time.
if (!set_from_subquery->buildOrderedSetInplace(context))
set_from_subquery->buildSetInplace(context);
}
}
}
}
}
}

void buildOrderedSetsForDAG(const ActionsDAG & dag, const ContextPtr & context)
{
buildSetsForDagImpl(dag, context, /* ordered = */ true);
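`buildSetsForDAGExcludingGlobalIn` works in two passes: it first collects the set nodes passed as the second argument of a GLOBAL IN function, then builds every other set that is not yet ready, trying the ordered build first and falling back to the unordered one. The standalone sketch below mirrors that control flow on a toy DAG. `ToySet`, `ToyNode`, `isGlobalIn`, and the `use_index_for_in_with_subqueries` parameter are hypothetical stand-ins for `FutureSetFromSubquery`, `ActionsDAG::Node`, `functionIsGlobalInOperator`, and the setting of the same name; they are not the real APIs.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for FutureSetFromSubquery.
struct ToySet
{
    bool ready = false;
    bool has_explicit_elements = false;  // what KeyCondition / skip indexes need

    // Models the ordered build reporting failure (nullptr in the real code)
    // when use_index_for_in_with_subqueries is disabled.
    bool buildOrderedSetInplace(bool use_index_for_in_with_subqueries)
    {
        if (!use_index_for_in_with_subqueries)
            return false;
        ready = true;
        has_explicit_elements = true;
        return true;
    }

    void buildSetInplace() { ready = true; }
};

enum class ActionType { INPUT, COLUMN, FUNCTION };

// Hypothetical stand-in for ActionsDAG::Node.
struct ToyNode
{
    ActionType type;
    std::string function_name;             // FUNCTION nodes only
    std::vector<const ToyNode *> children;
    ToySet * set = nullptr;                // COLUMN nodes that hold a set
};

bool isGlobalIn(const std::string & name)
{
    return name == "globalIn" || name == "globalNotIn" || name == "globalNullIn" || name == "globalNotNullIn";
}

void buildSetsExcludingGlobalIn(const std::deque<ToyNode> & nodes, bool use_index_for_in_with_subqueries)
{
    // Pass 1: the set is the second argument (index 1) of a GLOBAL IN function.
    std::unordered_set<const ToyNode *> global_in_set_nodes;
    for (const ToyNode & node : nodes)
        if (node.type == ActionType::FUNCTION && isGlobalIn(node.function_name) && node.children.size() >= 2)
            global_in_set_nodes.insert(node.children[1]);

    // Pass 2: build every other set that is not ready yet, preferring the ordered build.
    for (const ToyNode & node : nodes)
    {
        if (node.type != ActionType::COLUMN || node.set == nullptr || global_in_set_nodes.count(&node) != 0)
            continue;
        if (node.set->ready)
            continue;
        if (!node.set->buildOrderedSetInplace(use_index_for_in_with_subqueries))
            node.set->buildSetInplace();
    }
}
```

The `std::deque` is a choice made for the sketch: it keeps node addresses stable as nodes are appended, which the pointer-based exclusion set relies on.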
5 changes: 5 additions & 0 deletions src/Storages/VirtualColumnUtils.h
@@ -51,6 +51,11 @@ void filterBlockWithExpression(const ExpressionActionsPtr & actions, Block & blo
/// Builds sets used by ActionsDAG inplace.
void buildSetsForDAG(const ActionsDAG & dag, const ContextPtr & context);

/// Builds sets used by ActionsDAG inplace, but skips sets that are arguments to
/// GLOBAL IN functions (globalIn, globalNotIn, globalNullIn, globalNotNullIn).
/// Those sets need external tables set up by ReadFromRemote before they can be built.
void buildSetsForDAGExcludingGlobalIn(const ActionsDAG & dag, const ContextPtr & context);

/// Builds ordered sets used by ActionsDAG inplace.
void buildOrderedSetsForDAG(const ActionsDAG & dag, const ContextPtr & context);
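The header comment says GLOBAL IN sets need external tables from `ReadFromRemote` before they can be built. A minimal sketch of why the order matters, assuming (as the comments in `ReadFromMergeTree.cpp` and `VirtualColumnUtils.cpp` state) that attaching an external table to an already built set is rejected. `ToyGlobalInSet` and the table name `external_table_1` are hypothetical; only the error text is taken from this PR.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a GLOBAL IN set. Per the comments in this PR,
// ReadFromRemote must attach the external table before the set is built.
class ToyGlobalInSet
{
public:
    void setExternalTable(std::string table_name)
    {
        if (ready)
            throw std::logic_error("Trying to attach external table to a ready set");
        external_table = std::move(table_name);
    }

    // Building early (e.g. synchronously for PREWHERE) marks the set ready
    // even though no external table has been attached yet.
    void build() { ready = true; }

    bool isReady() const { return ready; }
    bool hasExternalTable() const { return external_table.has_value(); }

private:
    bool ready = false;
    std::optional<std::string> external_table;
};
```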

@@ -407,10 +407,6 @@ Expression
Expression
Expression
ReadFromMergeTree
- CreatingSet
- Expression
- Filter
- ReadFromSystemNumbers
Expression
Expression
ReadFromMemoryStorage
@@ -466,10 +462,6 @@ Expression
Expression
Expression
ReadFromMergeTree
- CreatingSet
- Expression
- Filter
- ReadFromSystemNumbers
Expression
Union
Expression
@@ -894,10 +886,6 @@ Expression
Expression
Expression
ReadFromMergeTree
- CreatingSet
- Expression
- Filter
- ReadFromSystemNumbers
Expression
Expression
Expression
@@ -955,10 +943,6 @@ Expression
Expression
Expression
ReadFromMergeTree
- CreatingSet
- Expression
- Filter
- ReadFromSystemNumbers
Expression
Union
Expression
@@ -312,7 +312,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -331,7 +331,7 @@ CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
Aggregating
Expression (Before GROUP BY)
- Expression ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
+ Filter ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -351,7 +351,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -364,7 +364,7 @@ CreatingSets (Create sets before main query execution)
Ranges: 0
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -384,7 +384,7 @@ CreatingSets (Create sets before main query execution)
Aggregating
Union
Expression (Before GROUP BY)
- Expression ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
+ Filter ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -400,7 +400,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -417,7 +417,7 @@ CreatingSets (Create sets before main query execution)
Aggregating
Union
Expression (Before GROUP BY)
- Expression ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
+ Filter ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -433,7 +433,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -446,7 +446,7 @@ CreatingSets (Create sets before main query execution)
Ranges: 1
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -463,7 +463,7 @@ CreatingSets (Create sets before main query execution)
Aggregating
Union
Expression (Before GROUP BY)
- Expression ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
+ Filter ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -476,7 +476,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -489,7 +489,7 @@ CreatingSets (Create sets before main query execution)
ReadFromMemoryStorage
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -1,17 +1,13 @@
3 2048
23 2048
- Prewhere filter column: globalIn(key, ) (removed)
3 2048
- Prewhere filter column: globalIn(key, ) (removed)
0 2048
1 2048
2 2048
4 2048
5 2048
- Prewhere filter column: globalNotIn(key, ) (removed)
0 2048
1 2048
2 2048
4 2048
5 2048
- Prewhere filter column: globalNotIn(key, ) (removed)
@@ -22,7 +22,7 @@ CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
Aggregating
Expression (Before GROUP BY)
- Expression ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
+ Filter ((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -44,7 +44,7 @@ CreatingSets (Create sets before main query execution)
ReadFromRemote (Read from remote replica)
CreatingSets (Create sets before main query execution)
Expression ((Project names + Projection))
- Expression ((WHERE + Change column names to column identifiers))
+ Filter ((WHERE + Change column names to column identifiers))
ReadFromMergeTree (default.tab0)
Indexes:
PrimaryKey
@@ -0,0 +1,27 @@
-- Regression test for "Not-ready Set" error when IN (subquery) condition
-- gets moved to PREWHERE by optimizePrewhere after applyFilters already ran.
-- https://github.com/ClickHouse/ClickHouse/issues/100318

CREATE TABLE t_100318_log (v0 UInt32) ENGINE = Log;
CREATE TABLE t_100318_mt (v0 UInt32, v1 UInt32, v2 DateTime, PRIMARY KEY(v1)) ENGINE = SummingMergeTree;
CREATE TABLE t_100318_rmt (v0 UInt32, v1 UInt32, PRIMARY KEY(v0)) ENGINE = ReplacingMergeTree;

INSERT INTO t_100318_mt VALUES (13, 23000, '2100-01-05');
INSERT INTO t_100318_mt VALUES (16, 26000, '2066-10-07');
INSERT INTO t_100318_rmt VALUES (91, 101000);

SELECT 1 FROM (SELECT 1 FROM t_100318_log)
WHERE EXISTS (
SELECT 1
UNION ALL
SELECT ref_4.v0 FROM (
SELECT row_number() OVER (PARTITION BY t_100318_mt.v0) AS c_1
FROM t_100318_mt
WHERE t_100318_mt.v2 IN (SELECT 1 FROM t_100318_log)
Review comment (Contributor):

This regression test covers a local IN (subquery) moved to PREWHERE, but the new logic in `buildSetsForDAGExcludingGlobalIn` is specifically about the GLOBAL IN variants (`globalIn`, `globalNotIn`, `globalNullIn`, `globalNotNullIn`).

Could you add a dedicated test where GLOBAL IN is moved to PREWHERE (ideally also with `transform_null_in = 1`), to lock in the fix for the external-table attachment path and prevent regressions of the "Trying to attach external table to a ready set" error?

) AS ref_3
INNER JOIN t_100318_rmt AS ref_4 ON (ref_3.c_1 = ref_4.v0)
);

DROP TABLE t_100318_log;
DROP TABLE t_100318_mt;
DROP TABLE t_100318_rmt;
@@ -84,18 +84,12 @@ CreatingSets (Create sets before main query execution)
Output: a, b

CreatingSets (Create sets before main query execution)
- ├──ReadFromMergeTree (default.t1)
- │ Read type: Default
- │ Parts: 1 | Granules: 1
- │ Output: a, b
- │ Prewhere filter
- │ Prewhere filter column: b IN subquery1 AND a IN subquery2
- └──CreatingSet (Create set for subquery)
- │ Set: subquery1
- └──ReadFromMergeTree (default.t2)
- Read type: Default
- Parts: 1 | Granules: 1
- Output: y
+ └──ReadFromMergeTree (default.t1)
+ Read type: Default
+ Parts: 1 | Granules: 1
+ Output: a, b
+ Prewhere filter
+ Prewhere filter column: b IN subquery1 AND a IN subquery2
--- IN with Set engine ---
Output: a, b

Empty file.