Skip to content

Fix premature chassert(analyzed_join.oneDisjunct()) for ASOF JOIN#106194

Open
groeneai wants to merge 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-stid-2508-3d70-asof-oneDisjunct-chassert
Open

Fix premature chassert(analyzed_join.oneDisjunct()) for ASOF JOIN#106194
groeneai wants to merge 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-stid-2508-3d70-asof-oneDisjunct-chassert

Conversation

@groeneai
Copy link
Copy Markdown
Contributor

The AST fuzzer (amd_debug) reported a debug-build abort with stack
TreeRewriter::collectJoinedColumns -> abortOnFailedAssertion,
hash STID 2508-3d70, on queries shaped like:

SELECT * FROM t0 ASOF LEFT JOIN t1 ON and((t0.y > t1.y))

chassert(analyzed_join.oneDisjunct()) at TreeRewriter.cpp:759 also
asserts that the single clause is non-empty. For an ASOF JOIN whose ON
expression is a pure inequality (no equality key),
CollectJoinOnKeysVisitor records the ASOF keys into data.asof_*_key
but does NOT populate key_names_left / key_names_right. The clause stays
empty. In release builds the immediately-following any_keys_empty check
throws INVALID_JOIN_ON_EXPRESSION cleanly; the assertion was harmless
because assert was a no-op. After commit
445dafb5f098
converted assert to chassert, the assertion fires in debug builds
before the proper error can be thrown.

The shape produced by the fuzzer (and(...) around a single inequality) is
incidental — the same failure reproduces for the simpler ON t0.y > t1.y.

Fix: weaken to a count-only check, mirroring the OR branch immediately
above (line 753). Empty clauses for non-ASOF cases are still caught by
the existing any_keys_empty throw immediately below; ASOF clauses are
populated later by data.asofToJoinKeys.

Tested with clickhouse local (debug build, enable_analyzer = 0):

  • Without fix: SIGABRT (exit 134) for both ON t0.y > t1.y and
    ON and((t0.y > t1.y)).
  • With fix: both throw INVALID_JOIN_ON_EXPRESSION cleanly; proper ASOF
    JOINs with an equality + inequality continue to work.

Report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=13a9b8ea79c017e7e34c4b6065660398647e35f7&name_0=MasterCI&name_1=AST%20fuzzer%20%28amd_debug%29

No related open issue found.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Not for changelog: chassert only fires in debug / sanitizer builds; release behaviour is unchanged.

Documentation entry for user-facing changes

  • Documentation is unchanged.

The AST fuzzer (`amd_debug`) surfaced a debug-build abort with stack
`TreeRewriter::collectJoinedColumns -> abortOnFailedAssertion`,
hash `STID 2508-3d70`, on queries shaped like
`SELECT * FROM t0 ASOF LEFT JOIN t1 ON and((t0.y > t1.y))`.

Root cause: `chassert(analyzed_join.oneDisjunct())` at
`src/Interpreters/TreeRewriter.cpp:759` also asserts that the
single clause is non-empty. For an `ASOF` JOIN whose `ON`
expression is a pure inequality (no equality key),
`CollectJoinOnKeysVisitor` records the `ASOF` keys into
`data.asof_*_key` but does NOT populate `key_names_left` /
`key_names_right`. The clause stays empty. In release builds the
following `any_keys_empty` check throws
`INVALID_JOIN_ON_EXPRESSION` cleanly; the assertion was harmless
because `assert` was a no-op. After 445dafb converted
`assert` to `chassert`, the assertion fires in debug builds
before the proper error can be thrown.

The shape produced by the fuzzer (`and(...)` around a single
inequality) is incidental — the same failure reproduces for
the simpler `ON t0.y > t1.y`.

Fix: weaken to a count-only check (mirroring the OR branch at
line 753). Empty clauses for non-`ASOF` cases are still caught
by the existing `any_keys_empty` throw immediately below; ASOF
clauses are populated later by `data.asofToJoinKeys`.

Stack trace (debug build):
    0. Common/Exception.cpp:60 DB::abortOnFailedAssertion
    1. Interpreters/TreeRewriter.cpp:759 collectJoinedColumns
    2. Interpreters/TreeRewriter.cpp:1494 TreeRewriter::analyzeSelect
    3. Interpreters/InterpreterSelectQuery.cpp:821 InterpreterSelectQuery

Report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=13a9b8ea79c017e7e34c4b6065660398647e35f7&name_0=MasterCI&name_1=AST%20fuzzer%20%28amd_debug%29
@groeneai
Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (session cron:clickhouse-ci-task-worker:20260531-103000):

# Gate Status
a Deterministic repro? clickhouse local --query "...ASOF LEFT JOIN... ON t0.y > t1.y SETTINGS enable_analyzer = 0" aborts (exit 134) every run in debug. Same for ON and((t0.y > t1.y)).
b Root cause explained? chassert(analyzed_join.oneDisjunct()) requires !clauses.front().isEmpty(). For ASOF with a pure inequality, the visitor records keys into data.asof_*_key but the clause stays empty until asofToJoinKeys runs later (or any_keys_empty throws). Latent since 2021; surfaced after 445dafb5f098 converted assertchassert.
c Fix matches root cause? ✅ Weakened to count-only getClauses().size() == 1, mirroring the OR-branch (line 753). Emptiness for non-ASOF is still caught by the any_keys_empty throw immediately below.
d Test intent preserved? ✅ Structural invariant ("exactly one disjunct after else-branch") still asserted. New stateless regression test (04303_asof_join_inequality_only_old_analyzer) covers both shapes (with and without and(...)) plus the proper ASOF sanity.
e Both directions demonstrated? ✅ Without fix → SIGABRT 134, empty stdout, diff vs reference fails. With fix → exit 0, stdout matches reference exactly. Existing 00927_asof_*, 01428_nullable_asof_join still pass.
f Fix is general? ✅ Single call site; the OR-branch sibling already uses the count-only form. No symmetric paths to widen.

@groeneai
Copy link
Copy Markdown
Contributor Author

cc @vdimir @novikd — could you review this? It's a small fix to the old analyzer's TreeRewriter::collectJoinedColumns to keep a chassert from firing prematurely on ASOF joins with no equality key (AST-fuzzer-found STID 2508-3d70).

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label May 31, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 31, 2026

Workflow [PR], commit [98619d6]

Summary:

job_name test_name status info comment
Stateless tests (amd_debug, distributed plan, s3 storage, parallel) FAIL
01090_zookeeper_mutations_and_insert_quorum_long FAIL cidb
Stateless tests (amd_tsan, s3 storage, parallel, 1/2) FAIL
Logical error: Bad cast from type A to B (STID: 3336-2f87) FAIL cidb, issue
Stress test (arm_ubsan) FAIL
Logical error: Coordination mode mismatch for stream A: got B, expected C (STID: 6117-726a) FAIL cidb

AI Review

Summary

This PR weakens the old analyzer's single-disjunct assertion in TreeRewriter::collectJoinedColumns so invalid pure-inequality ASOF JOIN clauses reach the existing INVALID_JOIN_ON_EXPRESSION path instead of tripping chassert first. The focused stateless regression covers both the simple and fuzzer-shaped invalid clauses plus a valid ASOF JOIN; I did not find any review findings.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-ci label May 31, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 31, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.40% 84.40% +0.00%
Functions 91.40% 91.40% +0.00%
Branches 77.00% 77.00% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 4/4 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants