Skip to content

Fix wrong results for integer boolean predicates in WHERE on MergeTree#101287

Merged
nikitamikhaylov merged 3 commits intoClickHouse:masterfrom
groeneai:fix/where-large-int-boolean-predicate
Mar 31, 2026
Merged

Fix wrong results for integer boolean predicates in WHERE on MergeTree#101287
nikitamikhaylov merged 3 commits intoClickHouse:masterfrom
groeneai:fix/where-large-int-boolean-predicate

Conversation

@groeneai
Copy link
Copy Markdown
Contributor

@groeneai groeneai commented Mar 31, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix wrong query results when a large integer constant (e.g. 256, 2147483648) is used as a boolean predicate in a WHERE clause with AND on MergeTree tables. For example, SELECT count() FROM t WHERE (2147483648 > b) AND 2147483648 would incorrectly return 0 instead of matching all rows.

Summary

When a non-UInt8 integer constant is used as a boolean predicate in a WHERE clause with AND (e.g. WHERE (256 > b) AND 256), MergeTree incorrectly prunes all parts and returns 0 rows. This affects any integer value where value % 256 == 0 (256, 512, 65536, 2147483648, etc.).

Root cause: splitFilterNodeForAllowedInputs() in VirtualColumnUtils.cpp reduces AND(expr_with_column, constant) to just constant when the column is not a virtual column. Since the constant's type (e.g. UInt16) differs from the AND function's return type (UInt8), a numeric cast CAST(256, 'UInt8') is applied, truncating 256 → 0 (256 mod 256). This makes filterPartsByVirtualColumns() see always_false and prune all parts.

Fix: Replace the truncating numeric cast with notEquals(x, 0), which correctly converts any numeric value to boolean (UInt8 0/1) without truncation. Values where value % 256 == 0 now correctly evaluate to true (1).

Closes #101269
Closes #99979

🤖 Generated with Claude Code

When a large integer constant is used as a boolean predicate in a WHERE
clause with AND on a MergeTree table, e.g. `WHERE (256 > b) AND 256`,
the query incorrectly returns 0 rows instead of matching all rows.

Root cause: `splitFilterNodeForAllowedInputs()` in VirtualColumnUtils.cpp
reduces `AND(greater(256, b), 256)` to just `256` when column `b` is not
an allowed input for virtual column filtering. Since the remaining child's
type (UInt16) differs from the AND's return type (UInt8), it applies a
numeric cast: `CAST(256, 'UInt8')` = 0 (truncation). This makes
`filterPartsByVirtualColumns()` see always_false, pruning all parts.

Fix: Replace the truncating numeric cast with `notEquals(x, 0)`, which
correctly converts any numeric value to boolean (0 or 1) without
truncation. Values like 256, 512, 65536, 2147483648 now correctly
evaluate to true.

Closes ClickHouse#101269
@groeneai
Copy link
Copy Markdown
Contributor Author

Pre-PR Validation (session cron:clickhouse-ci-task-worker:20260331-001500)

a) Deterministic repro? Yes.

CREATE TABLE t (b Int8) ENGINE = MergeTree ORDER BY ();
INSERT INTO t VALUES (1), (0);
SELECT count() FROM t WHERE (256 > b) AND 256; -- returns 0 (should be 2)

Any integer where value % 256 == 0 triggers the bug (256, 512, 65536, 2147483648, etc.).

b) Root cause explained?
splitFilterNodeForAllowedInputs() in VirtualColumnUtils.cpp processes the filter DAG for virtual column analysis. For AND(greater(256, b), 256), column b is not a virtual column, so the greater(256, b) child is dropped (allow_partial_result=true). The remaining child 256_UInt16 has type UInt16, but the AND function returns UInt8. The code applies addCast(*res, UInt8) which performs numeric truncation: UInt8(256) = 0. The filterPartsByVirtualColumns() function then sees a constant-false filter and returns an empty part set, causing "Selected 0/0 parts by partition key" — all data is pruned without ever being read.

c) Fix matches root cause? Yes. Replaced addCast(res, UInt8) with notEquals(res, 0) which converts any numeric type to UInt8 (0 or 1) via truthiness check instead of truncation.

d) New test added? Yes — 04074_where_large_int_boolean_predicate.sql covers: UInt16 (256, 512), UInt32 (65536), UInt64 (2147483648), AND 0 (falsy), negative values, and floats.

e) Both directions demonstrated?

  • Without fix: SELECT count() FROM t WHERE (256 > b) AND 256 → 0 (wrong)
  • With fix: same query → 2 (correct)
  • All 20/20 randomized test runs pass

@groeneai
Copy link
Copy Markdown
Contributor Author

cc @CurtizJ @KochetovNicolai — could you review this? It fixes a wrong-results bug where splitFilterNodeForAllowedInputs() in VirtualColumnUtils truncates large integer boolean predicates (like 256, 2147483648) to UInt8(0) via numeric cast, causing MergeTree to prune all parts and return empty results.

@pufit pufit added the can be tested Allows running workflows for external contributors label Mar 31, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 31, 2026

Workflow [PR], commit [d4801eb]

Summary:

job_name test_name status info comment
Stress test (amd_asan_ubsan) failure
Hung check failed, possible deadlock found FAIL cidb, issue ISSUE EXISTS
Stress test (arm_msan) failure
Logical error: Shard number is greater than shard count: shard_num=A shard_count=B cluster=C (STID: 5066-457d) FAIL cidb IGNORED

AI Review

Summary

This PR fixes incorrect MergeTree part-pruning behavior for predicates like WHERE (256 > b) AND 256, where boolean truthiness was previously derived via truncating cast and could become false for large integer constants. The new notEquals(x, 0) conversion preserves boolean semantics for non-zero numeric constants, and the added stateless regression test covers representative large integer, zero, negative, and float cases. High-signal review found no correctness, safety, or performance regressions in the touched code.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 31, 2026
@@ -0,0 +1,31 @@
-- Tags: no-parallel
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-parallel should be avoided unless strictly required. This test uses a unique table name (t_large_int_bool) and does not rely on shared mutable global state, so it should be safe to run in parallel. Please remove the tag to keep stateless test concurrency high.

The performance-move-const-arg clang-tidy check flags std::move()
on ColumnPtr (immutable_ptr<IColumn> wrapping boost::intrusive_ptr<const IColumn>).
Copying the intrusive pointer is just an atomic refcount increment,
so the move is unnecessary. Removing it fixes the arm_tidy CI failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@groeneai
Copy link
Copy Markdown
Contributor Author

arm_tidy fix: Removed unnecessary std::move() on ColumnPtr (line 552). The performance-move-const-arg clang-tidy check fires because ColumnPtr is immutable_ptr<IColumn> wrapping boost::intrusive_ptr<const IColumn> — moving a const-pointer-managing smart pointer is no better than copying it (just an atomic refcount increment). New commit: 2bb7314.

Comment thread tests/queries/0_stateless/04074_where_large_int_boolean_predicate.sql Outdated
@davenger davenger self-assigned this Mar 31, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 100.00% (9/9) · Uncovered code

Full report · Diff report

@pufit pufit enabled auto-merge March 31, 2026 15:21
@nikitamikhaylov nikitamikhaylov added this pull request to the merge queue Mar 31, 2026
Merged via the queue into ClickHouse:master with commit b5e6e90 Mar 31, 2026
161 of 164 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 31, 2026
@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to release branches 26.3, 26.2, 26.1, 25.12, 25.11, 25.10, 25.9, 25.8, but no backport label was found.

Why: This is a P0 wrong-results bug in VirtualColumnUtils.cpp which has existed for a long time in the MergeTree part pruning path. The splitFilterNodeForAllowedInputs function and its CAST-based type conversion are present in all active release branches. Users hitting WHERE expr AND large_const patterns get silently wrong (zero) results.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

@clickgapai
Copy link
Copy Markdown
Contributor

Hi @groeneai @nikitamikhaylov @davenger — while reviewing this PR I found the following:

Happy to discuss — close anything that's wrong or already addressed.

@ClickHouse ClickHouse deleted a comment from clickgapai Apr 2, 2026
@ClickHouse ClickHouse deleted a comment from clickgapai Apr 2, 2026
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Apr 3, 2026
PR ClickHouse#101287 changed addCast(res, UInt8) to notEquals(x, 0) in the AND
single-child path of splitFilterNodeForAllowedInputs to avoid truncating
large values (e.g. CAST(256, 'UInt8') → 0).

However, for Nullable types, DataTypeNullable::getDefault() returns a
Null Field, so notEquals(x, NULL) always yields NULL (SQL three-valued
logic), incorrectly filtering out all rows/parts.  Issue ClickHouse#101433.

Fix: use removeNullable() to obtain the nested type's zero instead of
NULL.  For the special case of Nullable(Nothing) — a bare NULL literal —
fall back to the Nullable default (Null field), since Nothing has no
getDefault(); notEquals(x, NULL) correctly yields NULL → false.

The previous version of this fix (commit e9a6d5a) also incorrectly
changed the index-hint path from addCast to notEquals; that path performs
type coercion (not boolean conversion) and was never affected by the
Nullable issue.  This version leaves it unchanged.

Closes ClickHouse#101433
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

6 participants