Skip to content

Fix crash when GLOBAL IN set lacks explicit elements during DIA#101178

Merged
nickitat merged 9 commits intoClickHouse:masterfrom
groeneai:fix/dia-global-in-no-explicit-set-elements
Apr 13, 2026
Merged

Fix crash when GLOBAL IN set lacks explicit elements during DIA#101178
nickitat merged 9 commits intoClickHouse:masterfrom
groeneai:fix/dia-global-in-no-explicit-set-elements

Conversation

@groeneai
Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix server crash with "Trying to attach external table to a ready set without explicit elements" when distributed index analysis encounters a GLOBAL IN predicate whose set was built without explicit elements.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Summary

When distributed index analysis (DIA) encounters a GLOBAL IN predicate whose set has been built without explicit elements (e.g. because use_index_for_in_with_subqueries_max_values was exceeded during buildOrderedSetInplace()), setExternalTable() crashes with a LOGICAL_ERROR:

Logical error: 'Trying to attach external table to a ready set without explicit elements'

This crash appeared in stress tests (STID: 4224-4913) — 5 hits across 5 distinct unrelated PRs in 30 days — where the AST fuzzer can produce queries combining GLOBAL IN on tables with DIA enabled while use_index_for_in_with_subqueries_max_values causes the set's explicit elements to be dropped (retaining only hashes).

Root Cause

The crash path:

  1. buildOrderedSetInplace() builds the IN subquery set, calling fillSetElements() first
  2. During insertion, use_index_for_in_with_subqueries_max_values is exceeded → fill_set_elements is reset to false, elements cleared — the set is created with only hashes
  3. DIA calls tryBuildAdditionalFilterAST() which finds the globalIn function
  4. It tries to create a temporary external table from the set via setExternalTable()
  5. setExternalTable() asserts hasExplicitSetElements()false → LOGICAL_ERROR crash

Fix

In tryBuildAdditionalFilterAST(), before calling setExternalTable(), check if the set is already built without explicit elements. If so, skip the GLOBAL IN predicate — the distributed filter will simply omit it. DIA still functions correctly (it just won't use this predicate for remote index analysis), maintaining correctness at a slight optimization cost.

Test Plan

  • New regression test 04071_global_in_dia_no_explicit_set_elements — reproduces the crash deterministically with GLOBAL IN + DIA + use_index_for_in_with_subqueries_max_values = 1
  • Verified both with and without enable_parallel_replicas
  • 20/20 runs pass with full randomization

🤖 Generated with Claude Code

…ed index analysis

When distributed index analysis (DIA) encounters a GLOBAL IN predicate
whose set has been built without explicit elements (e.g. because
use_index_for_in_with_subqueries_max_values was exceeded during
buildOrderedSetInplace), setExternalTable() would crash with:
"Trying to attach external table to a ready set without explicit elements"

This happened in stress tests where the AST fuzzer could produce queries
with GLOBAL IN on tables with DIA enabled, and the set had too many
elements to retain explicit values (only hashes remained).

The fix adds a check in tryBuildAdditionalFilterAST(): if the set is
already built without explicit elements, the GLOBAL IN predicate is
skipped from the distributed filter. DIA still functions correctly —
it just won't use this predicate for remote index analysis, which is
a slight optimization loss but maintains correctness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@groeneai
Copy link
Copy Markdown
Contributor Author

Pre-PR Validation Gate (Session cron:clickhouse-ci-task-worker:20260330-081500)

a) Deterministic repro? ✅ Yes — the following crashes the server every time without the fix:

CREATE TABLE t (key Int32, value Int32) ENGINE = MergeTree() ORDER BY key
SETTINGS distributed_index_analysis_min_parts_to_activate = 0, distributed_index_analysis_min_indexes_bytes_to_activate = 0;
INSERT INTO t SELECT number, number * 10 FROM numbers(10000) SETTINGS max_block_size = 1000, min_insert_block_size_rows = 1000, max_insert_threads = 1;
SELECT sum(key) FROM t WHERE key GLOBAL IN (SELECT key FROM t WHERE key > 5000)
SETTINGS distributed_index_analysis_for_non_shared_merge_tree = 1, enable_parallel_replicas = 0,
  distributed_index_analysis = 1, distributed_index_analysis_only_on_coordinator = 1,
  cluster_for_parallel_replicas = 'test_cluster_one_shard_two_replicas',
  use_index_for_in_with_subqueries_max_values = 1;

b) Root cause explained?buildOrderedSetInplace() builds the IN set with fillSetElements(), but during insertion the element count exceeds use_index_for_in_with_subqueries_max_values, causing fill_set_elements to be reset to false and elements cleared. The set is created (hashes only). DIA's tryBuildAdditionalFilterAST() then finds the globalIn function, tries to create a temporary external table via setExternalTable(), which asserts hasExplicitSetElements() — fails because only hashes remain.

c) Fix matches root cause? ✅ Yes — the fix adds a check in tryBuildAdditionalFilterAST() right before setExternalTable(). If the set is already built without explicit elements, the GLOBAL IN predicate is skipped from the distributed filter (continuing the loop, leaving res = nullptr for this node, which AND parents gracefully drop).

d) Test intent preserved? / New tests added? ✅ New regression test 04071_global_in_dia_no_explicit_set_elements added. The existing test 04052_distributed_index_analysis_in_subquery_no_quadratic is unaffected (uses in not globalIn).

e) Both directions demonstrated? ✅ Without fix: server crashes with LOGICAL_ERROR "Trying to attach external table to a ready set without explicit elements". With fix: query returns correct result 37492500. Tested 20/20 times with randomization.

@groeneai
Copy link
Copy Markdown
Contributor Author

cc @nickitat @KochetovNicolai — could you review this? It fixes a LOGICAL_ERROR crash in tryBuildAdditionalFilterAST() where distributed index analysis tries to attach an external table to a GLOBAL IN set that was built without explicit elements (only hashes) because use_index_for_in_with_subqueries_max_values was exceeded. Related to PR #100375 which fixes a different manifestation of the same underlying set-state issue.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 30, 2026

Workflow [PR], commit [a1798fe]

Summary:

job_name test_name status info comment
Stateless tests (amd_binary, flaky check) failure
00178_quantile_ddsketch FAIL cidb, issue ISSUE CREATED

AI Review

Summary

This PR prevents a LOGICAL_ERROR exception in distributed index analysis for GLOBAL IN when a prebuilt set no longer has explicit elements (e.g., due to use_index_for_in_with_subqueries_max_values). The fix is surgical and correctness-preserving: it skips only the affected distributed filter predicate instead of attempting to attach an invalid external table. The added stateless test reproduces the failure mode and validates stable query results with and without parallel replicas. I did not find correctness, safety, concurrency, or performance blockers in the proposed change.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 30, 2026
@nickitat nickitat added the can be tested Allows running workflows for external contributors label Mar 30, 2026
@nickitat
Copy link
Copy Markdown
Member

@groeneai investigate all failed tests

@groeneai
Copy link
Copy Markdown
Contributor Author

@nickitat I investigated all 5 failing tests. None are caused by this PR — all are pre-existing trunk issues appearing across many unrelated PRs:

# Test Check 30-day hits Distinct PRs Master hits
1 03668_shard_join_in_reverse_order Stateless (arm_binary) 133 93 10
2 test_parallel_replicas_custom_key_replicatedmergetree[..sipHash64(key)] Integration (amd_msan, 2/6) 19 12
3 Logical error: No available columns (STID: 3938-33a6) Stress (amd_msan) 63 52 3
4 Hung check failed, possible deadlock found Stress (arm_debug) 243 159 31
5 Logical error: Column identifier A is already registered (STID: 4697-369d) Stress (arm_msan) 62 54 8

Details:

All data from CIDB (default.checks, 30-day window). This PR's change is a 9-line guard in tryBuildAdditionalFilterAST() that only affects the GLOBAL IN + DIA code path — it cannot cause any of these unrelated failures.

@nickitat nickitat self-assigned this Apr 2, 2026
@nickitat nickitat enabled auto-merge April 2, 2026 19:54
@alexey-milovidov
Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@groeneai
Copy link
Copy Markdown
Contributor Author

groeneai commented Apr 7, 2026

Done — merged latest master (includes the SimSIMD SVE MSan fix from #101239) into the branch. CI should re-run with the fix included.

@alexey-milovidov
Copy link
Copy Markdown
Member

The MSan stress test failure (MemorySanitizer: use-of-uninitialized-value, STID 4179-5154 or 4148-3044) is a known pre-existing issue unrelated to this PR. Fix: #102158

@alexey-milovidov
Copy link
Copy Markdown
Member

The Can't adjust last granule error in CI is a known issue. The fix is in #101641

@alexey-milovidov
Copy link
Copy Markdown
Member

The Hung check failure is fixed by #102008 and #102010, let's update the branch

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 10, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 100.00% (9/9) · Uncovered code

Full report · Diff report

@nickitat nickitat added this pull request to the merge queue Apr 13, 2026
Merged via the queue into ClickHouse:master with commit 48e9a55 Apr 13, 2026
162 of 164 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 13, 2026
@clickgapai
Copy link
Copy Markdown
Contributor

Hi @groeneai @nickitat — the changelog category needs updating.

Current: Bug Fix (user-visible misbehavior in an official stable release)
Required: Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Why: This fixes a LOGICAL_ERROR exception (PreparedSets.cpp:261). Per ClickHouse guidelines, LOGICAL_ERROR must use the critical category since it causes crashes in debug builds and is considered a code invariant violation.

Could you update the category? Thanks!

@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: tryBuildAdditionalFilterAST GLOBAL IN handling in ReadFromRemote.cpp — introduced in 25.1.

Why: The GLOBAL IN support in tryBuildAdditionalFilterAST was added in commit ea5b64e (2025-01-07), first appearing in v25.1.1.4165-stable. The bug exists in all supported branches. The PR reports 5 hits across 5 distinct PRs in 30 days from stress tests. The fix is minimal (adding a guard check) with no risk of regression.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

@nickitat nickitat added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Apr 14, 2026
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Apr 14, 2026
robot-clickhouse added a commit that referenced this pull request Apr 14, 2026
robot-clickhouse added a commit that referenced this pull request Apr 14, 2026
robot-clickhouse added a commit that referenced this pull request Apr 14, 2026
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants