ClickHouse: add PQS/CERT/CODDTest oracles and lift type system to ADT #4

Open
fm4v wants to merge 18 commits into ClickHouse:main from fm4v:nik/clickhouse-add-pqs-cert-coddtest

fm4v commented May 15, 2026

The ClickHouse provider previously only supported the five TLP variants (Where / Distinct / GroupBy / Aggregate / Having) and NoREC, and its schema generator only emitted two column types (Int32 and String). This PR

  1. adds three more general-purpose oracles (PQS, CERT, CODDTest) — each tracking its source paper as faithfully as the existing ClickHouse expression generator allows, and
  2. lifts the type system to a recursive ADT with a capability layer so the schema generator can emit Nullable(T) and LowCardinality(T) columns, and every dispatch site that previously AssertionError'd on anything outside {Int32, String} is re-routed through the new capabilities.

1. New oracles

PQS — Pivoted Query Synthesis (Rigger & Su, OSDI '20)

The classical SQLancer PQS implementation requires every AST node to expose a Java-side getExpectedValue() that mirrors the DBMS' evaluation semantics. ClickHouse's expression AST does not provide that for most of the generated tree, and reproducing ClickHouse's coercion / NULL / arithmetic rules in Java would be open-ended. This implementation delegates rectification to the server: for each randomly-generated predicate the pivot row's values are embedded as literals in a one-row subquery and ClickHouse itself evaluates the predicate. Based on TRUE / FALSE / NULL the predicate is kept, negated, or wrapped in IS NULL. Containment is checked with INTERSECT.
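The server-side rectification step described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class name, the method names, and the string-based representation of predicates and literals are all assumptions made for the example.

```java
import java.util.List;
import java.util.StringJoiner;

// Illustrative only: class and method names are hypothetical, not the PR's.
final class PqsRectificationSketch {

    // Embeds the pivot row's values as literals in a one-row subquery so the
    // server evaluates the predicate against exactly that row.
    static String buildProbe(List<String> columns, List<String> literals, String predicate) {
        if (columns.size() != literals.size()) {
            throw new IllegalArgumentException("one literal per column expected");
        }
        StringJoiner row = new StringJoiner(", ");
        for (int i = 0; i < columns.size(); i++) {
            row.add(literals.get(i) + " AS " + columns.get(i));
        }
        return "SELECT (" + predicate + ") FROM (SELECT " + row + ")";
    }

    // probeResult is TRUE, FALSE, or null (SQL NULL). The returned predicate
    // evaluates to TRUE for the pivot row in all three cases.
    static String rectify(String predicate, Boolean probeResult) {
        if (probeResult == null) {
            return "(" + predicate + ") IS NULL";
        }
        return probeResult ? "(" + predicate + ")" : "NOT (" + predicate + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildProbe(List.of("c0", "c1"), List.of("3", "'x'"), "c0 > 5"));
        System.out.println(rectify("c0 > 5", Boolean.FALSE));
    }
}
```

Because the server decides which of the three branches applies, the Java side needs no model of ClickHouse's coercion or NULL rules; it only wraps the predicate text.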

CERT — Cardinality Estimation Restriction Testing (Ba & Rigger, ICSE '24, DOI 10.1145/3597503.3639076)

Generates a random query Q, derives a strictly more restrictive Q' from it, and asserts the cardinality restriction monotonicity property EstCard(Q', D) ≤ EstCard(Q, D). The estimate is read from EXPLAIN ESTIMATE and the query is never executed: this oracle tests the estimator, not the runtime. Mutations are one-directional and restrictive: AND-tighten the WHERE, drop an OR operand from an existing disjunction, or promote ALL → DISTINCT. A structural-similarity gate on EXPLAIN PLAN skips runs where the plan diverges enough to make the two estimates incomparable.

EXPLAIN ESTIMATE only meaningfully responds to PK/partition-key/index filters on MergeTree tables. For non-MergeTree engines or ORDER BY tuple() tables it returns an empty result; those attempts are dropped via IgnoreMeException.
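Two of the restrictive mutations named above (AND-tightening the WHERE and dropping an OR operand) can be sketched on plain strings. The class and method names here are hypothetical, and the real oracle presumably works on the expression AST rather than on text.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names are hypothetical and predicates are plain strings.
final class CertMutationSketch {

    // AND-tightening can only remove rows, so the mutated query is at least
    // as restrictive as the original. An empty WHERE is replaced outright.
    static String andTighten(String where, String extra) {
        if (where == null || where.isEmpty()) {
            return extra;
        }
        return "(" + where + ") AND (" + extra + ")";
    }

    // Dropping one operand of a disjunction also only removes rows.
    static String dropOrOperand(List<String> disjuncts, int indexToDrop) {
        if (disjuncts.size() < 2) {
            throw new IllegalArgumentException("need at least two OR operands");
        }
        List<String> kept = new ArrayList<>(disjuncts);
        kept.remove(indexToDrop);
        return "(" + String.join(") OR (", kept) + ")";
    }

    public static void main(String[] args) {
        System.out.println(andTighten("a > 1", "b < 2"));
        System.out.println(dropOrOperand(List.of("a = 1", "a = 2", "a = 3"), 1));
    }
}
```

Both helpers move in one direction only, which is what lets the oracle assert a single inequality on the estimates instead of tracking a per-mutation direction.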

CODDTest — Constant-Optimization-Driven Database System Testing (Zhang & Rigger, SIGMOD '25, DOI 10.1145/3709674)

For a query Q with a sub-expression E, builds an auxiliary A that evaluates E in isolation, reads the resulting constant V, then builds a folded query F by substituting V for E in Q. Constant folding is semantics-preserving, so any discrepancy between Q's and F's result sets is a logic bug.

All three flavors from the paper land: constant expression (Section 3.1 case 1), scalar non-correlated subquery (Section 3.1 case 2), and dependent expression with a CASE mapping (Section 3.2). The outer predicate template varies — bare comparison, AND/OR compounds, NOT — so phi passes through richer constant-folding paths than a fixed col op phi.
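For the scalar non-correlated subquery flavor, the relationship between the auxiliary query A, the original Q, and the folded F can be sketched as below. The helper names are hypothetical, and the value V is passed in as a ready-made literal string instead of being read from a live server.

```java
// Illustrative only: names are hypothetical; V is passed in as a literal string.
final class CoddTestFoldSketch {

    // Auxiliary query A: evaluates the sub-expression E in isolation.
    static String auxiliary(String aggregate, String column, String table) {
        return "SELECT " + aggregate + "(" + column + ") FROM " + table;
    }

    // Original query Q: E appears as a scalar non-correlated subquery.
    static String original(String table, String column, String op, String subquery) {
        return "SELECT * FROM " + table + " WHERE " + column + " " + op + " (" + subquery + ")";
    }

    // Folded query F: the constant V read from A replaces E.
    static String folded(String table, String column, String op, String literal) {
        return "SELECT * FROM " + table + " WHERE " + column + " " + op + " " + literal;
    }

    public static void main(String[] args) {
        String aux = auxiliary("max", "c0", "t0");
        System.out.println(aux);
        System.out.println(original("t0", "c1", "<", aux));
        // 42 stands in for the value the server would return for A.
        System.out.println(folded("t0", "c1", "<", "42"));
    }
}
```

The oracle then compares the result sets of Q and F; any difference points at the optimizer, since the two queries differ only in who performed the folding.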


2. Type-system foundation v1 (replaces the flat (ClickHouseDataType, String) model)

Plan: docs/plans/2026-05-16-001-feat-clickhouse-type-system-foundation-plan.md. Brainstorm origin: docs/brainstorms/clickhouse-type-system-foundation-requirements.md.

The previous model — ClickHouseLancerDataType wrapping a ClickHouseDataType enum + textual representation — could not encode parameters (so ClickHouseDataType.of("Decimal(9,2)") silently normalised to Decimal) and could not express wrappers like Nullable or LowCardinality. Every oracle bailed on anything outside {Int32, String} because the constant emitters, cast paths, and oracle filters all defaulted to AssertionError.

This PR introduces:

  • ClickHouseType — a sealed recursive ADT with four constructors:
    • Primitive(Kind) for atomic types (Int8…Int256, UInt8…UInt256, Float32, Float64, Bool, String, UUID, Date, Date32, IPv4, IPv6),
    • Nullable(inner) and LowCardinality(inner) with canWrap rules (no nested Nullable, no LC on Float/Bool/UUID/IPv*, etc.),
    • Unknown(raw) as a defensive fallback the reflection parser uses for any type string it does not recognise.
  • Capability predicates on every term: isNumeric(), supportsLiteralEmission(), hasNullSemantics() (true iff the outer term is Nullable).
  • ClickHouseTypeParser — a hand-written recursive-descent parser for ClickHouse type strings; recognises every primitive plus Nullable(…) and LowCardinality(…) and nested combinations, falls back to Unknown(raw) for everything else. Never throws.
  • ClickHouseUnsupportedConstant — sentinel returned by ClickHouseCast.castToInt/castToReal/castToText/isTrue/convertInternal when a coercion is not defined for the input. Propagates through cast pipelines and raises IgnoreMeException on any attempt to compare, evaluate, or consume it. This replaces every default: throw new AssertionError(...) in ClickHouseCast.
  • Wrapper-aware ClickHouseLancerDataType — adds getTypeTerm(): ClickHouseType while keeping getType(): ClickHouseDataType returning the root primitive (lossy compatibility for legacy callers; documented).
  • Feature flags: --test-nullable-types and --test-lowcardinality-types, both default true, threaded through ClickHouseLancerDataType.getRandom(state) and ClickHouseColumn.createDummy(name, table, state).
  • Constant-emission ADT dispatch: ClickHouseExpressionGenerator.generateConstant and ClickHouseSchema.getConstant now dispatch on the ADT term:
    • Primitive(kind) → per-kind emitter,
    • Nullable(inner) → with small probability emit NULL, else recurse,
    • LowCardinality(inner) → transparent, recurse,
    • Unknown → IgnoreMeException.
  • CODDTest filter migration — the aggType != Int32 && aggType != String filters at ClickHouseCODDTestOracle.java:175-177,208-210 are rewritten as an isFoldableColumnTerm(ClickHouseType) capability check that accepts Int*/UInt*/Bool/String and any Nullable/LowCardinality wrapper of those. The local baseTypeName / parseType / renderLiteral string parsers are migrated onto ClickHouseTypeParser.
  • CERT generator dispatch: generatorExprFor(ClickHouseType) composes generators:
    • Nullable(inner) wraps the inner generator with if(rand() % 10 = 0, NULL, …),
    • LowCardinality(inner) is transparent at INSERT (ClickHouse coerces),
    • Unknown throws IgnoreMeException.
  • Table-generator clause validation: PARTITION BY / ORDER BY / SAMPLE BY expression generation now validates the result and retries up to 5 times before dropping the clause:
    • ORDER BY rejects all-constant expressions ("Sorting key cannot contain constants"),
    • PARTITION BY additionally rejects Float-column references ("Floating point partition key is not supported").
  • Connection-level settings: ClickHouseProvider adds allow_suspicious_low_cardinality_types=1 to the JDBC URL when the LC flag is on, and ClickHouseTableGenerator adds allow_nullable_key=1 to the MergeTree SETTINGS clause so wrapped columns can appear in ORDER/PARTITION/SAMPLE keys.
  • Error catalog additions: ClickHouseErrors gains the v1 type-family patterns surfaced by live smoke runs: ILLEGAL_TYPE_OF_ARGUMENT, LowCardinality conversion/nested-type rejections, SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY, Partition key contains nullable columns, allow_nullable_key, CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, and "Cannot convert NULL value to non-Nullable type".
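The shape of the recursive ADT and its capability layer can be sketched with a sealed interface and records (Java 17+). Everything below is an assumption for illustration: the names TypeAdtSketch and TypeTerm, the String-valued kind, and the exact wrap rules, which implement only the restrictions listed above (no nested Nullable, no LowCardinality on Float/Bool/UUID/IPv*).

```java
// Illustrative only: a stand-in for the described ADT, not the PR's classes.
final class TypeAdtSketch {

    sealed interface TypeTerm permits Primitive, Nullable, LowCardinality, Unknown {
        String render();

        // True iff the outermost constructor is Nullable.
        default boolean hasNullSemantics() {
            return this instanceof Nullable;
        }
    }

    record Primitive(String kind) implements TypeTerm {
        public String render() { return kind; }
    }

    record Nullable(TypeTerm inner) implements TypeTerm {
        public String render() { return "Nullable(" + inner.render() + ")"; }
    }

    record LowCardinality(TypeTerm inner) implements TypeTerm {
        public String render() { return "LowCardinality(" + inner.render() + ")"; }
    }

    record Unknown(String raw) implements TypeTerm {
        public String render() { return raw; }
    }

    // Rule from the description: Nullable may not be nested directly.
    static boolean canWrapNullable(TypeTerm inner) {
        return !(inner instanceof Nullable);
    }

    // Rule from the description: no LowCardinality on Float/Bool/UUID/IPv*.
    // Rejecting non-primitive roots (nested LC, Unknown) is an extra assumption.
    static boolean canWrapLowCardinality(TypeTerm inner) {
        TypeTerm root = inner instanceof Nullable n ? n.inner() : inner;
        if (!(root instanceof Primitive p)) {
            return false;
        }
        String k = p.kind();
        return !(k.startsWith("Float") || k.equals("Bool") || k.equals("UUID") || k.startsWith("IPv"));
    }

    public static void main(String[] args) {
        TypeTerm t = new LowCardinality(new Nullable(new Primitive("String")));
        System.out.println(t.render() + " hasNullSemantics=" + t.hasNullSemantics());
    }
}
```

Sealing the interface means every dispatch site has exactly four cases to handle, which is what makes it practical to replace the earlier AssertionError defaults with explicit behavior per constructor.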

TLP three-valued logic is explicitly deferred. The third-partition composition lives in sqlancer.common.oracle.TernaryLogicPartitioningOracleBase (cross-DBMS), and the requirements doc forbids common-module edits. TLP runs with enableNullable=true but the partitions remain 2-valued; that's a v1.0.5 follow-up (see the plan).


3. Other changes

  • ClickHouseSchema.ClickHouseRowValue's constructor promoted from package-private to public so PQS in oracle/pqs/ can construct it for the base class' diagnostic logging.
  • .github/workflows/main.yml:
    • Adds the seven new test classes to the enumerated -Dtest=… list for the clickhouse CI job.
    • Drops the 18 per-DBMS jobs we don't ship changes for (citus, cockroachdb, databend, datafusion, duckdb, hive, spark, hsqldb, mariadb, materialize, mysql, oceanbase, postgres, presto, sqlite, tidb, yugabyte, doris). Keeps misc (project-wide style/PMD/Checkstyle/SpotBugs + misc unit tests + naming convention check) and clickhouse.

Verification

Unit tests (89 total, all pass) — seven new test classes covering the ADT, parser, cast extension, generation surface, CODDTest filter, CERT generator, and TableGenerator validators. CI runs them on every push.

mvn -DskipTests=true -Dspotbugs.skip=true verify passes cleanly (formatter, Checkstyle, PMD). SpotBugs is broken on JDK 25 in this repo (Unsupported class file major version 69 from the bundled ASM); affects every class, not just new code.

Live SQLancer smoke against ClickHouse 26.5.1.111, 4 threads × 10 minutes, four oracles (TLP / NoREC / PQS / CODDTest), both type flags ON:

| Metric | Value |
| --- | --- |
| Total queries | 70,299 (~117 q/s average) |
| Successful statements | 55% (typical for SQLancer's adversarial generator) |
| AssertionError count | 0 |
| Threads shut down on errors | 0 / 4 |
| Process exit code | 0 (clean self-timeout) |
| Nullable columns observed in CREATE TABLE | Nullable(String), Nullable(Int32) |
| LowCardinality columns observed | LowCardinality(Int32) |

The live smoke also caught three v1-introduced rejections mid-run; all are fixed in this PR:

  1. LowCardinality(Int32) rejected by default (SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY) — fixed by adding allow_suspicious_low_cardinality_types=1 to the JDBC URL.
  2. PARTITION/ORDER BY referencing a Nullable column rejected (ILLEGAL_COLUMN, allow_nullable_key) — fixed by adding allow_nullable_key=1 to MergeTree SETTINGS.
  3. CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN from MATERIALIZED clauses casting NULL → non-Nullable target — added to ClickHouseErrors.

The final post-fix run completed all 600 seconds with zero unhandled errors.


Downstream

The ClickHouse repo will pick these up via a Dockerfile pin bump once this lands: ClickHouse/ClickHouse#104988.

The ClickHouse provider previously supported only the five TLP variants
(Where / Distinct / GroupBy / Aggregate / Having) and NoREC. This change
adds three more general-purpose oracles to ClickHouseOracleFactory.

PQS (Pivoted Query Synthesis, Rigger & Su, OSDI 2020). The classical
SQLancer PQS implementation requires every AST node to expose a
Java-side getExpectedValue() that mirrors the DBMS' evaluation
semantics. ClickHouse's expression AST does not provide this for most
of the generated tree, and reproducing ClickHouse's coercion / NULL /
arithmetic rules in Java would be an open-ended effort. This
implementation delegates rectification to the server: for each
randomly-generated predicate the pivot row's values are embedded as
literals in a one-row subquery and ClickHouse itself evaluates the
predicate. Based on TRUE / FALSE / NULL the predicate is kept, negated,
or wrapped in IS NULL so the conjunction is guaranteed to hold for the
pivot row. Containment is checked with INTERSECT, which handles NULL
semantics correctly in ClickHouse.

CERT (Cardinality Estimation Restriction Testing). Builds a random
SELECT, mutates it through a WHERE / AND / OR / DISTINCT toggle with a
known monotonicity direction, then asserts the actual row count moves
the way the mutation predicts. A plan-similarity gate on EXPLAIN PLAN
output skips cases where the plan diverges enough that the comparison
stops being meaningful. ClickHouse doesn't surface single-number
cardinality estimates that the JDBC client can read, so this uses
actual row counts -- still catches optimizer-driven row loss such as
predicate-pushdown bugs and faulty DISTINCT dedup. JOIN / GROUP BY /
HAVING / LIMIT mutators are skipped: LIMIT isn't serialized by the
visitor, the others need richer query shapes than the existing
generator produces.

CODDTest (Cross-Optimization Decision Differential Testing). Runs the
same query twice with a random subset of optimizer flags toggled on
vs off (injected as a per-query SETTINGS clause to avoid leaking state
into neighbouring oracle runs sharing the same connection) and asserts
the two result sets are identical. The flag list is deliberately
conservative -- rewrites with high blast radius (analyzer
enable/disable, JOIN algorithm) are excluded because they tend to
surface stylistic differences rather than correctness bugs.

ClickHouseSchema.ClickHouseRowValue's constructor is promoted from
package-private to public so the PQS oracle in oracle/pqs/ can
construct it for the base class diagnostic logging.

Smoke-tested against a release ClickHouse 26.5 server, single thread,
30-second budget per oracle: PQS ~10 q/s with 94% successful
statements; CERT ~12 q/s with 98%; CODDTest ~16 q/s with 97%. No false
positives in any run. Checkstyle clean (`mvn checkstyle:check`),
naming convention check passes (`python src/check_names.py`).

fm4v added a commit to ClickHouse/ClickHouse that referenced this pull request May 15, 2026
Removes the ci/docker/sqlancer-test/overlay/ Java sources, the
Dockerfile COPY step that overlays them onto the cloned fork, and the
PQS/CERT/CODDTest entries from the TESTS array.

The three new oracles are being added to the ClickHouse fork of
SQLancer directly (ClickHouse/sqlancer#4).
Once that lands, this PR will bump the pinned fork commit and re-add
the names to TESTS.

The initial CERT and CODDTest implementations diverged from their
papers in ways that defeated the test signal:

CERT was using actual row counts from running the queries and a
bidirectional mutator framework. Per Ba and Rigger, ICSE 2024 the
property under test is `EstCard(Q', D) <= EstCard(Q, D)` -- the
*estimator's* projection, with Q' strictly more restrictive than Q,
and "CERT eschews executing queries". This rewrite:

* Reads cardinality from `EXPLAIN ESTIMATE`, summing `rows` across
  the per-table tuples it returns. The query is never executed.
* Restricts mutations to one direction. `mutateWhere`/`mutateAnd`
  always AND-tighten or introduce a WHERE; `mutateOr` drops an OR
  operand (per the paper's restrictive-OR rule) or falls back to AND;
  `mutateDistinct` only promotes ALL -> DISTINCT. All return
  `increase=false`.
* Skips runs where the estimator returns nothing (non-MergeTree
  engines, `ORDER BY tuple()`, unsupported expressions), and skips
  runs where the structural-similarity gate on `EXPLAIN PLAN` shows
  too much drift.
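The estimate handling in the bullets above (sum `rows` across the returned tuples, skip when the estimator returns nothing, otherwise check the inequality) can be sketched as follows. The names are hypothetical, and the per-table row counts are passed in as a list instead of being read from a JDBC result set.

```java
import java.util.List;
import java.util.OptionalLong;

// Illustrative only: names are hypothetical; inputs stand in for the "rows"
// column of each tuple returned by EXPLAIN ESTIMATE.
final class CertEstimateSketch {

    enum Verdict { SKIP, PASS, VIOLATION }

    // An empty list models "the estimator returned nothing".
    static OptionalLong totalEstimate(List<Long> rowsPerTable) {
        if (rowsPerTable.isEmpty()) {
            return OptionalLong.empty();
        }
        long sum = 0;
        for (long rows : rowsPerTable) {
            sum += rows;
        }
        return OptionalLong.of(sum);
    }

    // Checks EstCard(Q', D) <= EstCard(Q, D); skips when either side is missing.
    static Verdict check(List<Long> original, List<Long> restricted) {
        OptionalLong q = totalEstimate(original);
        OptionalLong qPrime = totalEstimate(restricted);
        if (q.isEmpty() || qPrime.isEmpty()) {
            return Verdict.SKIP;
        }
        return qPrime.getAsLong() <= q.getAsLong() ? Verdict.PASS : Verdict.VIOLATION;
    }

    public static void main(String[] args) {
        System.out.println(check(List.of(8192L, 100L), List.of(8192L)));
        System.out.println(check(List.of(), List.of(1L)));
    }
}
```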

CODDTest was toggling random optimizer flags via per-query `SETTINGS`
clauses and comparing results. Per Zhang and Rigger, SIGMOD 2025 the
oracle is constant-folding-driven: take a subexpression E in Q,
evaluate E to a value V via an auxiliary query A, build a folded
query F by substituting V for E, then assert results of Q and F are
identical. This rewrite implements the scalar-subquery variant
(same as DuckDBCODDTestOracle in the upstream PR sqlancer#1054):

  aux:    SELECT min(c)/max(c) FROM t                            -> V
  Q:      SELECT * FROM t WHERE col op (SELECT min/max(c) FROM t)
  F:      SELECT * FROM t WHERE col op V

Only `Int32`/`String` columns are folded since they are the only
types the existing schema generator and `ClickHouseSchema.getConstant`
support; NULL auxiliary results are skipped (NULL-propagation would
make the predicate UNKNOWN for every row and the equivalence does
not hold).

Verified locally against a release ClickHouse 26.5 server:

* CERT: ~6 q/s effective (most attempts skip because no estimate
  responds to the random mutation), 0 false positives in a 30s
  window.
* CODDTest: ~22 q/s, 96-97% successful statements, 0 false positives.

`mvn checkstyle:check` clean, `mvn package -Dmaven.test.skip=true`
succeeds.

Papers:
  CERT     https://doi.org/10.1145/3597503.3639076
  CODDTest https://doi.org/10.1145/3709674

CLAassistant commented May 16, 2026

CLA assistant check
All committers have signed the CLA.

fm4v added 8 commits May 16, 2026 11:08
Extends the PQS oracle to cover three paper elements that the initial
landing skipped, with no change to the rectification contract or the
existing single-table behavior.

* Multi-table pivot rows (Section 3.1): 1-3 random non-empty tables.
  Predicates are generated over the union of their table-qualified
  columns, the rectification probe stitches one literal-alias subquery
  per pivot table, and the rectified query uses an implicit cross-join
  with all rectified predicates in WHERE.
* Optional elaborations (Section 3.2) attached probabilistically:
  DISTINCT, GROUP BY of all pivot columns, ORDER BY. Each preserves
  the pivot row's presence in the result set by construction.
* IS NULL rectification path is now reachable: ClickHouseExpressionGenerator
  gains an opt-in allowNullLiterals flag (default false to keep existing
  oracles unchanged) which, when enabled by PQS, occasionally injects
  a NULL leaf so the probe can legitimately return SQL NULL.

Validated against ClickHouse 26.5.1.111: 4 threads x 5 minutes = 21,211
oracle queries; multi-table FROM hits 11,518 (3-table: 2,762), DISTINCT
510, GROUP BY 474, ORDER BY 511, IS NULL rectifications 154, zero
AssertionError / reportMissingPivotRow / Exception.

Empirical probe of `EXPLAIN ESTIMATE` shows it only responds to predicates
that prune MergeTree primary-key granules; LIMIT, DISTINCT, JOIN-type, and
bare GROUP BY are invariant. With the default `index_granularity=8192` and
the ~10-30 row inserts the schema generator emits, every table fits in a
single granule and the estimate cannot move regardless of restriction, so
the oracle was trivially passing every run. This commit fixes that and
also covers the only paper rule (HAVING) that moves the estimate.

* Bulk-load 50,000 rows from `numbers()` into the chosen table on each
  check if it sits below that threshold. Idempotent and bounded; tables
  already amplified are left alone. Verified by inspection: 8 of 9 test
  tables hit 50k rows on a 60s smoke run; the 9th had MATERIALIZED
  columns that legitimately rejected `INSERT SELECT`.
* Query `system.columns.is_in_primary_key` to discover the table's PK
  columns and duplicate them 4x in the column list passed to the
  expression generator. A random leaf is now ~67% likely to be a PK
  column (1 PK out of 3 cols, otherwise 33%), so a generated predicate
  much more often hits the granule pruner.
* Sometimes (25%) build Q with `GROUP BY <pk_col>` so the new HAVING
  mutator can fire. The HAVING mutator AND-tightens HAVING with a fresh
  PK-biased predicate; ClickHouse pushes PK predicates in HAVING down
  through the optimizer to the scan, making them granule-prune-capable.
  Falls back to AND-tightening WHERE when no GROUP BY is present so the
  mutator is never a no-op.
* Apply 1-3 random restriction rules per attempt (paper allows multiple).
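The ~67% figure in the PK-bias bullet follows from simple counting: with one PK column out of three and the PK column present four times, the candidate list has 4 + 2 = 6 entries, of which 4 are the PK column. A small sketch of that arithmetic, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names are hypothetical, not the PR's.
final class PkBiasSketch {

    // Repeats each PK column `copies` times so a uniform pick favors it.
    static List<String> biasTowardsPk(List<String> columns, List<String> pkColumns, int copies) {
        List<String> biased = new ArrayList<>();
        for (String column : columns) {
            int n = pkColumns.contains(column) ? copies : 1;
            for (int i = 0; i < n; i++) {
                biased.add(column);
            }
        }
        return biased;
    }

    // Probability that a uniformly chosen entry is a PK column.
    static double pkProbability(List<String> candidates, List<String> pkColumns) {
        long hits = candidates.stream().filter(pkColumns::contains).count();
        return (double) hits / candidates.size();
    }

    public static void main(String[] args) {
        List<String> columns = List.of("pk", "a", "b");
        List<String> pk = List.of("pk");
        System.out.println(pkProbability(columns, pk));                       // unbiased: 1 of 3
        System.out.println(pkProbability(biasTowardsPk(columns, pk, 4), pk)); // biased: 4 of 6
    }
}
```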

JOIN, bare GROUP BY, and LIMIT remain excluded because they are invariant
under ClickHouse's `EXPLAIN ESTIMATE` and would add no bug-finding power.
This decision is captured inline.

JDK 26 is the current release (March 2026); CI was pinned to JDK 11 and
the pom enforced `source/target=11` via the Eclipse compiler (ecj). The
latest ecj on Maven Central (3.45.0) only goes up to JDK 24, so to move
forward we switch the maven-compiler-plugin to standard javac, drop the
ecj/plexus dependencies and the `org.eclipse.jdt.core.prefs` compiler
arguments, and set `<release>26</release>`. The `.settings/` directory
remains for IDE use only.

* `maven-compiler-plugin` 3.10.1 -> 3.13.0, `<release>26</release>`,
  no compilerId override, no ecj deps.
* `maven-javadoc-plugin` 3.4.1 -> 3.11.2 and `<source>` bumped to 26.
* `.github/workflows/main.yml` and `release.yml`: every
  `java-version: '11'` -> `'26'` (27 occurrences total). Distribution
  remains Adoptium Temurin, which ships JDK 26 binaries.

Verified locally on Temurin 26.0.1: `mvn package -Dmaven.test.skip=true`
clean, sqlancer jar runs both TLPWhere and CERT oracles against
ClickHouse 26.5 with no JVM-level errors. The few remaining warnings
(`System::loadLibrary` from the ClickHouse JDBC LZ4 native, `Unsafe`
from Guava) are advisory and unrelated to this change.

Use JDK 25 (the current LTS, released Sept 2025) rather than JDK 26
(non-LTS, released March 2026). Same javac-via-maven-compiler-plugin
setup as the previous commit; just flips the source/release level
and the CI java-version.

* `pom.xml`: maven-compiler-plugin `<release>` 26 -> 25; release
  profile maven-javadoc-plugin `<source>` 26 -> 25.
* `.github/workflows/main.yml` and `release.yml`: every
  `java-version: '26'` -> `'25'` (27 occurrences). Distribution
  remains Adoptium Temurin.

Verified locally on Temurin 25.0.3: `mvn clean compile test-compile`
green; produced jar is class-file major version 69 (Java 25).

The initial implementation covered only the scalar non-correlated
subquery case (Section 3.1 case 2 of Zhang & Rigger SIGMOD '25). Extend
to follow Algorithm 1 from the paper, picking one mode uniformly per
check.

Modes:

1. Constant expression (Section 3.1 case 1, was missing). Generates a
   random column-free expression via the existing
   `ClickHouseExpressionGenerator`, evaluates it with
   `SELECT toTypeName(phi), phi`, and substitutes the literal back. The
   generator's `generateExpressionWithExpression` is seeded with a few
   typed constant leaves -- this is necessary because
   `generateExpressionWithColumns` short-circuits to a single constant
   when called with an empty column list.

2. Scalar non-correlated subquery (Section 3.1 case 2). The previous
   implementation's `min/max(col)` path, restated in the new framework.

3. Dependent expression (Section 3.2, was missing). Generates a random
   expression over one outer column k, builds a
   `SELECT DISTINCT k, phi FROM t` mapping, folds phi to a
   `CASE WHEN k = v_i THEN r_i ...` wrapped in
   `cast(..., 'expectedType')` so the folded predicate sees the same
   operand type as the original through compound predicates.
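The dependent-expression fold in mode 3 can be sketched as a function from the observed (k, phi) mapping to a CASE expression. The names below are hypothetical, keys and results are passed as pre-rendered SQL literals, and NULL keys are ignored for simplicity (a `k = NULL` arm would never match).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: names are hypothetical; keys and values are SQL literals.
final class DependentFoldSketch {

    // Builds cast(CASE WHEN k = v_i THEN r_i ... END, 'expectedType') from the
    // rows of a SELECT DISTINCT k, phi mapping, in iteration order.
    static String foldToCase(String keyColumn, Map<String, String> mapping, String expectedType) {
        if (mapping.isEmpty()) {
            throw new IllegalArgumentException("mapping must contain at least one row");
        }
        StringBuilder sql = new StringBuilder("cast(CASE");
        for (Map.Entry<String, String> row : mapping.entrySet()) {
            sql.append(" WHEN ").append(keyColumn).append(" = ").append(row.getKey())
               .append(" THEN ").append(row.getValue());
        }
        sql.append(" END, '").append(expectedType).append("')");
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("1", "10");
        mapping.put("2", "20");
        System.out.println(foldToCase("k", mapping, "Int64"));
    }
}
```

The outer cast matters because a CASE over literals may be assigned a different type than the original expression, which would change how the surrounding predicate coerces its operands.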

The outer predicate template is also varied (bare comparison, AND/OR
compounds, NOT) so phi passes through richer constant-folding paths than
the previous fixed `col op phi`.

Validated against ClickHouse 26.5.1.111: 4 threads, 5 minutes, 64,071
queries executed, 98% successful statement rate, 0 false positives.

Replace the flat (ClickHouseDataType, String) representation in
ClickHouseLancerDataType with a recursive ClickHouseType ADT (Primitive,
Nullable, LowCardinality, Unknown) plus a four-predicate capability layer.
Re-route every dispatch site that previously AssertionError'd on anything
outside {Int32, String}, add a defensive reflection parser, and extend
ClickHouseCast to cover every v1 primitive kind via a propagating
ClickHouseUnsupportedConstant sentinel.
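A defensive parser of the kind described above can be sketched in a few lines (Java 17+). This is an illustration under assumptions, not the PR's ClickHouseTypeParser: the names are hypothetical, the primitive set is a subset, and an unrecognised inner type turns the whole string into Unknown, which may differ from the real fallback granularity.

```java
import java.util.Set;

// Illustrative only: a stand-in parser, not the PR's ClickHouseTypeParser.
final class TypeParserSketch {

    sealed interface Term permits Primitive, Nullable, LowCardinality, Unknown {}
    record Primitive(String kind) implements Term {}
    record Nullable(Term inner) implements Term {}
    record LowCardinality(Term inner) implements Term {}
    record Unknown(String raw) implements Term {}

    // Subset of primitive names, for illustration only.
    private static final Set<String> PRIMITIVES = Set.of(
            "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64",
            "Float32", "Float64", "Bool", "String", "UUID", "Date", "Date32", "IPv4", "IPv6");

    // Never throws: anything unrecognised becomes Unknown(raw).
    static Term parse(String raw) {
        if (raw == null) {
            return new Unknown("");
        }
        Term parsed = parseTerm(raw.trim());
        return parsed != null ? parsed : new Unknown(raw);
    }

    // Returns null when the string is not a recognised type.
    private static Term parseTerm(String s) {
        if (PRIMITIVES.contains(s)) {
            return new Primitive(s);
        }
        Term inner = parseWrapped(s, "Nullable");
        if (inner != null) {
            return new Nullable(inner);
        }
        inner = parseWrapped(s, "LowCardinality");
        if (inner != null) {
            return new LowCardinality(inner);
        }
        return null;
    }

    // Strips "Wrapper(" and ")" and recurses on the argument.
    private static Term parseWrapped(String s, String wrapper) {
        String prefix = wrapper + "(";
        if (!s.startsWith(prefix) || !s.endsWith(")")) {
            return null;
        }
        return parseTerm(s.substring(prefix.length(), s.length() - 1).trim());
    }

    public static void main(String[] args) {
        System.out.println(parse("LowCardinality(Nullable(String))"));
        System.out.println(parse("Array(Int32)"));
    }
}
```

Returning a value for every input keeps schema reflection from aborting on a type the generator does not model yet; callers decide what to do with Unknown.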

Activates two new feature flags (--test-nullable-types,
--test-lowcardinality-types, both on by default) so the generator now emits
Nullable and LowCardinality columns. CODDTest's filter and legacy string
parser, CERT's generatorExprFor, and the table generator's PARTITION/SAMPLE/
ORDER clause emission are all rewritten to dispatch via the new capabilities.

Live SQLancer smoke against ClickHouse 26.5 (10 min, 4 oracles, 70k+
queries) surfaced three v1-introduced rejections and they are now handled:
allow_suspicious_low_cardinality_types is set on the JDBC URL when the LC
flag is on; allow_nullable_key=1 is added to MergeTree SETTINGS so wrapped
columns can participate in PARTITION/ORDER/SAMPLE; the
CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN family is added to ClickHouseErrors.

Plan and brainstorm documents that drove the implementation are included
under docs/. CI test enumeration in .github/workflows/main.yml is extended
to run the seven new test classes.

This fork only ships changes to the ClickHouse provider, so the per-DBMS
matrix in .github/workflows/main.yml was 18 jobs we never read. Removes
citus, cockroachdb, databend, datafusion, duckdb, hive, spark, hsqldb,
mariadb, materialize, mysql, oceanbase, postgres, presto, sqlite, tidb,
yugabyte, and doris.

Keeps `misc` (project-wide style/PMD/Checkstyle/SpotBugs via `mvn verify`
plus the misc unit tests and naming convention check) and `clickhouse`
(the DBMS job that exercises the type-system foundation tests).

fm4v changed the title from "ClickHouse: add PQS, CERT, and CODDTest oracles" to "ClickHouse: add PQS/CERT/CODDTest oracles and lift type system to ADT" May 17, 2026
fm4v added 7 commits May 18, 2026 10:56
Add two complementary differential-testing capabilities:

1. SEMR oracle (--oracle SEMR) picks one "should-be-result-preserving"
   ClickHouse optimizer setting from a curated list, runs the same generated
   SELECT once with the setting forced 0 and once forced 1, and fails when the
   two multisets diverge. Targets cross-configuration consistency bugs of the
   shape documented at ClickHouseTLPHavingOracle.java:42 (ClickHouse#12264).

2. --random-session-settings + --random-session-settings-budget apply a random
   subset of a curated execution-mode catalog via SET k=v on the per-database
   JDBC connection. Every other oracle (TLP*, NoREC, PQS, CERT, CODDTest)
   implicitly runs under a different setting profile for each database.

The two features are mutually exclusive in a single run (rejected at startup
with a single clear error). The catalog excludes optimizer-rewrite settings
from the randomization list to protect CERT/CODDTest invariants, and excludes
settings hardcoded by TLPHaving/TLPAggregate from both lists. Setting churn
(unknown setting, out-of-range value) is absorbed via a new expected-error
catalog so it never surfaces as an oracle failure.
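The SEMR comparison described above (same SELECT, one setting forced to 0 and to 1, results compared as multisets) can be sketched as follows. The names are hypothetical, `some_setting` is a placeholder and not a real ClickHouse setting, and forcing the value through a per-query SETTINGS clause is an assumption about the mechanism.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names are hypothetical; rows are pre-rendered strings.
final class SemrSketch {

    // Appends a per-query SETTINGS clause (assumed mechanism for forcing a value).
    static String withSetting(String select, String setting, int value) {
        return select + " SETTINGS " + setting + " = " + value;
    }

    // Counts occurrences so that row order is ignored but duplicates are not.
    static Map<String, Integer> multiset(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    static boolean diverges(List<String> resultOff, List<String> resultOn) {
        return !multiset(resultOff).equals(multiset(resultOn));
    }

    public static void main(String[] args) {
        System.out.println(withSetting("SELECT c0 FROM t0", "some_setting", 0));
        System.out.println(diverges(List.of("a", "b", "a"), List.of("a", "a", "b")));
    }
}
```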

Plan: docs/plans/2026-05-17-001-feat-clickhouse-semr-oracle-settings-randomization-plan.md

The expression generator picked column leaves and operators independently of
type, so a String column could feed an arithmetic operator and a Float column
could feed gcd/lcm/intDiv. Against ClickHouse 26.2, system.query_log showed
~96% of SQLancer failures were ILLEGAL_TYPE_OF_ARGUMENT (Code 43) from this
mismatch, with smaller contributions from NO_COMMON_TYPE join keys (386) and
typed-comparison constants (53/32).

Five mechanical fixes against the same workload (--oracle TLPDistinct
--random-session-settings true, 400 queries, seed 12345):

* generateExpressionWithColumns filters to numeric columns and the recursive
  descent stays in the numeric pool. Falls back to an Int32 constant when the
  table has only non-numeric columns.
* BINARY_FUNCTION splits into integer-only (intDiv/gcd/lcm with plain integer
  column refs) and any-numeric (max2/min2/pow with the recursive descent).
  ClickHouse promotes most math wrappers (sin, cos, sqrt, log...) to Float64,
  so the integer-only branch keeps leaves as bare column refs to stay
  integer-typed end to end. generateExpressionWithExpression also routes
  through getRandomAnyNumeric since its pre-built expression leaves are
  usually aggregate Floats.
* generateExpression(type, depth) now defaults rightLeafType to leftLeafType,
  inverting the previous "force same type with low probability" coin flip
  that produced Int32-vs-String comparisons.
* generateJoinClause enumerates (left, right) column pairs, prefers same-type,
  falls back to numeric-vs-numeric, and throws IgnoreMeException when no
  compatible key combination exists. Avoids server roundtrips for joins that
  would error with NO_COMMON_TYPE.
* Off-by-one in four column-picker call sites: getNotCachedInteger(0, size-1)
  excluded the last index; corrected to size.
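The off-by-one in the last bullet comes from passing `size-1` to a picker whose upper bound is already exclusive. The sketch below uses a hypothetical stand-in, `integerInRange`, with that assumed exclusive-upper-bound contract; it is not SQLancer's `getNotCachedInteger` itself.

```java
import java.util.List;
import java.util.Random;

// Illustrative only: integerInRange is a stand-in with an exclusive upper bound.
final class ColumnPickerSketch {

    // Returns a value in [lowerInclusive, upperExclusive).
    static int integerInRange(Random rnd, int lowerInclusive, int upperExclusive) {
        return lowerInclusive + rnd.nextInt(upperExclusive - lowerInclusive);
    }

    // Buggy form: size - 1 as an exclusive bound can never yield the last index
    // (and throws for a single-element list, since the range is empty).
    static <T> T pickBuggy(Random rnd, List<T> columns) {
        return columns.get(integerInRange(rnd, 0, columns.size() - 1));
    }

    // Fixed form: size as the exclusive bound covers every index.
    static <T> T pickFixed(Random rnd, List<T> columns) {
        return columns.get(integerInRange(rnd, 0, columns.size()));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<String> columns = List.of("c0", "c1", "c2");
        System.out.println(pickBuggy(rnd, columns)); // never "c2"
        System.out.println(pickFixed(rnd, columns));
    }
}
```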

Result: SELECT failure rate against ClickHouse 26.2.17.31 dropped from 41.6%
to 0.09% on the same seeded workload, with the remaining 4 failures being
runtime division-by-zero (out of scope for type fixes) and stray edge cases.

…atalog entry

Two infrastructure changes that benefit every ClickHouse oracle:

- Bump the CI ClickHouse image from 24.3.1.2672 to :head so wrong-result
  bugs in the active stable line surface earlier. The pin traded early
  bug detection for reproducibility; we now accept slight CI churn in exchange
  for catching regressions before they reach a tagged release.

- Add "is found in GROUP BY in query" and "(ILLEGAL_AGGREGATION)" to the
  expected-expression-error catalog. ClickHouse 26's new analyzer raises a
  different error string than the 24.x branch when a positional GROUP BY
  reference (GROUP BY 1) resolves to an aggregate SELECT-list column --
  the old "Illegal value (aggregate function) for positional argument in
  GROUP BY" pattern was the 24.x form; both must be absorbed so the
  generator's harmless aggregate-positional output doesn't surface as an
  oracle finding in 26+. Surfaced via the EET HAVING-mode regression run
  but benefits TLPHaving and any future HAVING-using oracle equally.

Add the SIGMOD '25 paper's companion to CODDTest. Where CODDTest folds a
sub-expression to its precomputed value and asserts the result is
unchanged, EET goes the inverse direction: inject an expression that
should fold to a fixed value (tautology, contradiction, or algebraic
identity) and assert the rewrite is semantics-preserving. Same target
bug class (optimizer constant-folding / short-circuit / partial-eval),
orthogonal attack axis.

Selectable via --oracle EET. Each check() picks one of four modes
uniformly:

- WHERE injection. Generate a base predicate `pred` and a random `e`;
  conjoin `pred AND (3VL-tautology over e)` and assert rows unchanged,
  or `pred AND (3VL-contradiction over e)` and assert rows empty. The
  3VL shapes are `(((e) OR NOT (e)) OR (e) IS NULL)` and
  `(((e) AND NOT (e)) AND (e) IS NOT NULL)` with binding-tight parens
  on every reference to `e` -- ClickHouse's parser binds OR looser than
  both NOT and AND, so an unparenthesized injection inside
  `pred AND ...` would parse the wrong way.

- HAVING injection. Same shapes injected into an aggregated query's
  HAVING clause. Reuses TLPHaving's
  `aggregate_functions_null_for_empty=1, enable_optimize_predicate_expression=0`
  SETTINGS suffix on both sides of the comparison to dodge ClickHouse
  issue #12264; not applying it produces false positives indistinguishable
  from EET findings.

- Expression-position rewrite. Pick a SELECT-list column `x`, probe its
  runtime type via `toTypeName`, wrap as `if(taut, x, x)`,
  `multiIf(taut, x, junk, x)`, or `CASE WHEN taut THEN x ELSE x END`
  (and the contradiction-negated form). Both arms share `x`'s type;
  the junk-branch value is `defaultValueOfTypeName(typeOfX)` -- a typed
  non-NULL default, picked because `cast(NULL, 'LowCardinality(...)')`
  is rejected at parse time (LowCardinality is not nullable). Each
  rewrite is wrapped in `cast(..., 'TypeOfX')` to neutralize the type
  widening some identities introduce.

- Algebraic identity. Type-safe substitution from a five-entry catalog
  (`ClickHouseEETIdentities`): `plus(x,0)`, `multiply(x,1)`,
  `concat(x,'')`, `coalesce(x,x)`, `if(true,x,x)`. Each entry carries a
  predicate that gates application to a safe type family. Float and
  Decimal are excluded from `plus`/`multiply` (NaN / -0.0 formatting and
  scale-coercion false positives). String only for `concat`.
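That the two 3VL shapes hold for every value of `e`, including NULL, can be checked with a small model of SQL's three-valued logic. In this sketch a Java `Boolean` of `null` stands for SQL NULL, and the class and method names are hypothetical.

```java
// Illustrative only: a tiny three-valued-logic model; null means SQL NULL.
final class EetThreeValuedSketch {

    static Boolean not(Boolean a) {
        return a == null ? null : !a;
    }

    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return true;
        }
        return (a == null || b == null) ? null : false;
    }

    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return false;
        }
        return (a == null || b == null) ? null : true;
    }

    // ((e OR NOT e) OR e IS NULL): TRUE for TRUE, FALSE, and NULL.
    static Boolean tautology(Boolean e) {
        return or(or(e, not(e)), e == null);
    }

    // ((e AND NOT e) AND e IS NOT NULL): FALSE for TRUE, FALSE, and NULL.
    static Boolean contradiction(Boolean e) {
        return and(and(e, not(e)), e != null);
    }

    // SQL text with parentheses around every reference to e.
    static String tautologySql(String e) {
        return "(((" + e + ") OR NOT (" + e + ")) OR (" + e + ") IS NULL)";
    }

    static String contradictionSql(String e) {
        return "(((" + e + ") AND NOT (" + e + ")) AND (" + e + ") IS NOT NULL)";
    }

    public static void main(String[] args) {
        for (Boolean e : new Boolean[] {Boolean.TRUE, Boolean.FALSE, null}) {
            System.out.println(e + " -> " + tautology(e) + " / " + contradiction(e));
        }
        System.out.println(tautologySql("c0 > 1"));
    }
}
```

Without the `IS NULL` and `IS NOT NULL` arms, a NULL-valued `e` would make both shapes evaluate to NULL, and the WHERE injection would silently drop rows that the base predicate keeps.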

Reuses `CODDTestBase` for failure-attribution fields; the naming
mismatch is a deliberate trade-off acknowledged in the plan rather
than mechanically duplicating six fields for the second oracle in this
family.

Validated against ClickHouse 26.5.1.111 with a 27K-query burn-in plus
the 1000-query integration test (T18_, --num-threads 1). No oracle
assertion failures. Plan in
docs/plans/2026-05-18-001-feat-clickhouse-eet-oracle-plan.md.

Paper: Zhang and Rigger, "Constant Optimization Driven Database System
Testing", SIGMOD '25 (DOI 10.1145/3709674).

Adds max_execution_time=120 to the JDBC URL. Without this cap, occasional
heavyweight random queries hit the 300s socket_timeout and produce ambiguous
client-side timeout exceptions instead of clean server-side error codes
(3 such timeouts observed in a 15-min 2026-05-18 baseline run). The
server-side cap surfaces as TIMEOUT_EXCEEDED, absorbed by the matching
"Timeout exceeded: elapsed" + "(TIMEOUT_EXCEEDED)" multi-word substrings
added to ClickHouseErrors.

Adds the implementation plan for three orthogonal query-generator additions:
aggregate combinator chains (-If, -OrNull, -OrDefault, -Distinct, -Array,
-State, -Merge, -ForEach, -Resample, -Map), set operations with explicit
ALL/DISTINCT keywords (UNION ALL/DISTINCT, INTERSECT, EXCEPT) plus a new
ClickHouseTLPSetOpOracle, and ARRAY JOIN structural plumbing (blocked on
type-system v2 for activation).

Sequenced as commit-level milestones on this branch, with per-phase yield
gates measured against a pre-Phase-A baseline. Deepened against five
reviewer agents; auto-fixes applied silently, strategic decisions
integrated based on user direction (full combinator matrix, single-PR
bundling, per-phase yield gates, EXCEPT operator coverage).

Adds compress=false to the JDBC URL. clickhouse-jdbc 0.9.6 has a defect in
its LZ4-over-chunked-HTTP decoder (ClickHouseLZ4InputStream +
ChunkedInputStream interaction) that fires
MalformedChunkCodingException: CRLF expected at end of chunk mid-response,
surfaced at the JDBC layer as SQLException: Failed to read value for column.
Observed 16 times across the 2026-05-18 15-min baseline (0.33% per-query
rate); validated server-side via clickhouse-client (native protocol) which
returns valid data for every failing query — confirming the bug is in the
driver, not in ClickHouse.

With compression off the buggy code path is bypassed entirely: the response
stream becomes the raw chunked HTTP body, no LZ4 frame parsing. Trade-off:
~3x larger responses on the wire, but SQLancer's queries are small and the
connection is loopback, so net throughput is unaffected.

Revisit when clickhouse-jdbc fixes the LZ4 decoder upstream.