fix(tesseract): multi-stage pre-aggregation usage substitution and IS NOT DISTINCT FROM by waralexrom · Pull Request #10925 · cube-js/cube

waralexrom · 2026-05-20T22:07:46Z

Summary

When a multi-stage query needs several distinct scans of the same pre-aggregation, Tesseract emits SQL with <base>__usage_N suffixes (e.g. payer_360_lives_cube_julio__usage_0). The orchestrator's suffix-aware substitution lived only inside the partitioned branch of PreAggregationPartitionRangeLoader.loadPreAggregations, so for non-partitioned rollups the suffix was never mapped to a real table name. Combined with the substitution's lack of word boundaries, this left __usage_N glued onto the substituted full table name and Cubestore failed with Table prod_pre_aggregations.<full>__usage_0 was not found.
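The failure mode described above can be sketched in a few lines. This is an illustration only, not the orchestrator's code: the table names and the helpers `naiveReplace` and `suffixAwareReplace` are hypothetical, and the sketch assumes the suffixed names can be resolved in a single pass with longer names tried first.

```typescript
// Hypothetical helpers for illustration; not the actual QueryCache code.

// Naive replacement: rewrites every occurrence of the bare base name,
// including the one embedded in `<base>__usage_0`, so the suffix survives.
function naiveReplace(sql: string, base: string, full: string): string {
  return sql.split(base).join(full);
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Suffix-aware replacement: one pass over the SQL, longest names first, so
// `<base>__usage_N` is consumed before the bare base name can match.
function suffixAwareReplace(sql: string, mapping: Record<string, string>): string {
  const pattern = Object.keys(mapping)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  return sql.replace(new RegExp(pattern, 'g'), (match) => mapping[match] ?? match);
}

const base = 'pre_aggs.orders_rollup'; // made-up table name
const full = 'pre_aggs.orders_rollup_abc123'; // made-up versioned name
const sql = `SELECT * FROM ${base}__usage_0 a JOIN ${base}__usage_1 b ON a.id = b.id`;

// The suffix stays glued to the substituted name: `..._abc123__usage_0`.
const broken = naiveReplace(sql, base, full);

// Every usage resolves to the single real table.
const fixed = suffixAwareReplace(sql, {
  [base]: full,
  [`${base}__usage_0`]: full,
  [`${base}__usage_1`]: full,
});
```

The single-pass regex matters here: replacing names one after another would let the bare base name match again inside an already substituted full name.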

This PR fixes the substitution path for non-partitioned rollups and, while at it, registers IS NOT DISTINCT FROM in the Cubestore SQL templates so multi-stage JOIN ON clauses don't balloon into OR (a IS NULL AND b IS NULL) triplets per dimension.

Changes

PreAggregationPartitionRangeLoader.ts — populate usageTargetTableNames in the non-partitioned else branch too. For non-partitioned rollups every __usage_N maps to the single result.targetTableName.
CubeStoreQuery.ts — register operators.is_not_distinct_from = 'IS NOT DISTINCT FROM' (Cubestore's SQL frontend supports it natively).
Integration test in cubejs-schema-compiler (two multi-stage branches sharing one pre-aggregation) exercising the multi-usage planner path.
Driver test in cubejs-testing-drivers (querying BigECommerce: base measure plus multi-stage over non-partitioned pre-aggregation) reproducing the production failure end-to-end against real Postgres + Cubestore. New non-partitioned CategoryFlat pre-aggregation added to driver fixtures to host the scenario.
Snapshots added for postgres, bigquery, databricks-jdbc, athena, mysql, redshift, snowflake (*-full plus export-bucket-*/encrypted-pk variants). Test is skipped for mssql (integer encoding of decimals) and clickhouse (FP artifacts) — they keep it in tesseractSkip. Non-Tesseract mode skipped everywhere (multi-stage requires Tesseract).

Testing

Verified locally that the new testing-drivers test reproduces the production failure with the unpatched cube image: Error during planning: Table prod_pre_aggregations.big_e_commerce__category_flat_external_<hashes>__usage_1 was not found.
After the fix, the same test passes against real Postgres + Cubestore (Docker compose + local cube CLI via --mode=local).
Dumped generated SQL with the new IS NOT DISTINCT FROM template — JOIN ON clauses collapse from per-dim (a = b OR (a IS NULL AND b IS NULL)) to single (a IS NOT DISTINCT FROM b) predicates; values in the snapshot unchanged.

Test plan

CI: testing-drivers full suites for postgres, bigquery, databricks-jdbc, athena, mysql, redshift, snowflake (Tesseract-enabled drivers).
CI: schema-compiler postgres integration suite, in particular PreAggregationsMultiStage describe.
Spot-check the new snapshot diffs once CI runs them with -u to catch any per-driver decimal-format discrepancies.

…e-aggregations

Tesseract emits SQL with `<base>__usage_N` suffixes when a multi-stage query needs multiple distinct scans of the same pre-aggregation (e.g. a base measure plus a multi-stage measure over it in one query). The orchestrator only populated `usageTargetTableNames` inside the partitioned branch of `PreAggregationPartitionRangeLoader.loadPreAggregations`, so for non-partitioned rollups the suffix-aware replacement never ran; `QueryCache.replacePreAggregationTableNames` then matched the bare base name inside `<base>__usage_0` and left the suffix glued onto the substituted full table name, producing `prod_pre_aggregations.<full>__usage_0 was not found` against Cubestore.

Mirror the partitioned branch for the non-partitioned path: when `usageMapping` is present on the description, map every suffix to the single `result.targetTableName`.

Coverage:
- Integration test in schema-compiler (`two multi-stage branches sharing one pre-aggregation`) that exercises the multi-usage planner path.
- Testing-drivers test `querying BigECommerce: base measure plus multi-stage over non-partitioned pre-aggregation` reproducing the production failure end-to-end against real Postgres + Cubestore.
- New `CategoryFlat` non-partitioned pre-aggregation in driver fixtures to host the scenario; test enabled in Tesseract mode for the drivers that handle Tesseract cleanly (postgres, bigquery, databricks-jdbc, athena, mysql, redshift, snowflake) and skipped for mssql/clickhouse.

Without the operator registered, Tesseract falls back to `a = b OR (a IS NULL AND b IS NULL)` for every null-safe equality in multi-stage join conditions. For a query with N dimensions this produces a JOIN ON with N such triplets — Cubestore handles it, but the SQL balloons and is unreadable in logs. Cubestore's SQL frontend supports `IS NOT DISTINCT FROM` natively, so register the operator template the same way Postgres/BigQuery/Snowflake do. Verified on the testing-drivers `base measure plus multi-stage over non-partitioned pre-aggregation` scenario: the generated JOIN ON collapses from per-dim `(= ... OR (IS NULL AND IS NULL))` clauses to single `IS NOT DISTINCT FROM` predicates and the test still passes end-to-end against real Cubestore.
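To make the size difference concrete, here is a minimal sketch of the two renderings of a null-safe join condition. The helper `nullSafeJoinCondition`, the aliases `a` and `b`, and the dimension names are hypothetical; the real rendering is driven by the adapter's SQL templates, which this sketch does not reproduce.

```typescript
// Hypothetical helper for illustration; not the Tesseract planner's code.
// `operator` stands in for a registered `is_not_distinct_from` template.
function nullSafeJoinCondition(dims: string[], operator?: string): string {
  return dims
    .map((d) =>
      operator
        ? // Operator available: one predicate per dimension.
          `(a.${d} ${operator} b.${d})`
        : // Fallback: equality plus an explicit both-NULL branch.
          `(a.${d} = b.${d} OR (a.${d} IS NULL AND b.${d} IS NULL))`,
    )
    .join(' AND ');
}

const dims = ['category', 'city']; // made-up dimension names
const fallbackSql = nullSafeJoinCondition(dims);
const compactSql = nullSafeJoinCondition(dims, 'IS NOT DISTINCT FROM');
```

Both forms treat two NULLs as equal; only the length and readability of the generated SQL differ.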

claude · 2026-05-20T22:08:04Z

Claude finished @waralexrom's task in 7m 7s

Code Review: Multi-stage pre-aggregation fix

Gather context and read changed source files
Review core fix: PreAggregationPartitionRangeLoader.ts
Review CubeStoreQuery.ts change
Review integration test in schema-compiler
Review driver test additions and fixtures
Review snapshot files
Post final review

Overall Assessment

This is a well-targeted bug fix with good test coverage. The core change is minimal and correct — filling in the non-partitioned branch that was missing usageTargetTableNames population. The IS NOT DISTINCT FROM template registration follows established patterns across the codebase. A few items to address below.

Core Fix — `PreAggregationPartitionRangeLoader.ts`

Looks good. The fix at lines 353–370 correctly mirrors the partitioned branch's behavior: for non-partitioned rollups, every __usage_N suffix simply maps to the single result.targetTableName. Key observations:

The if (result && ...) null guard is correct — loadPreAggregation() can return null (line 173 in PreAggregationLoader.ts under externalRefresh when no version entry exists).
The { ...result, usageTargetTableNames } spread cleanly extends the result without mutating it.
The downstream consumer in PreAggregations.ts:578-590 already handles the usageTargetTableNames property correctly — it maps each suffix+usage pair to a separate PreAggregationTableToTempTable entry. No changes needed there.

One minor observation: the partitioned branch (lines 302–339) iterates with Object.entries() to access both key and value, because it needs usageInfo.dateRange for partition filtering. The non-partitioned branch only needs the keys, so Object.keys() is the right choice here — well done.
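The shape of the non-partitioned branch, as described in this review, might look roughly like the sketch below. The interfaces `LoadResult` and `Description`, the function name `withUsageTargets`, and the key format of `usageMapping` are assumptions made for illustration; only the null guard, the `Object.keys()` iteration, and the `{ ...result, usageTargetTableNames }` spread come from the review text.

```typescript
// Assumed shapes for illustration; the real types live in the orchestrator.
interface LoadResult {
  targetTableName: string;
}

interface Description {
  // Keys are assumed to identify the `__usage_N` suffixes.
  usageMapping?: Record<string, unknown>;
}

type ResultWithUsages = LoadResult & {
  usageTargetTableNames?: Record<string, string>;
};

function withUsageTargets(
  result: LoadResult | null,
  description: Description,
): ResultWithUsages | null {
  // Null guard: the loader may return null, and usageMapping may be absent.
  if (result && description.usageMapping) {
    const usageTargetTableNames: Record<string, string> = {};
    // Non-partitioned rollup: every usage points at the same single table.
    for (const usage of Object.keys(description.usageMapping)) {
      usageTargetTableNames[usage] = result.targetTableName;
    }
    // The spread returns a new object instead of mutating `result`.
    return { ...result, usageTargetTableNames };
  }
  return result;
}

const loaded: LoadResult = { targetTableName: 'pre_aggs.category_flat_abc123' };
const extended = withUsageTargets(loaded, { usageMapping: { '0': {}, '1': {} } });
```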

`CubeStoreQuery.ts` — IS NOT DISTINCT FROM

Looks good. This follows the exact pattern already used by:

PostgresQuery.ts:102
BigqueryQuery.ts:369
SnowflakeQuery.ts:121

CubeStore's SQL frontend natively supports IS NOT DISTINCT FROM, so this is safe. The improvement from multi-clause OR (a IS NULL AND b IS NULL) to single IS NOT DISTINCT FROM per dimension in JOIN ON is meaningful for query readability and plan efficiency.

Integration Test — `pre-aggregations-multi-stage.test.ts`

Looks good. The test properly exercises the two-branch multi-usage scenario:

revenue_no_id_sum and revenue_doubled_no_id_sum both reduce by id, creating two distinct scans of the same pre-aggregation.
revenue_no_id_pct combines them as a ratio, triggering the multi-usage planner path.
The expected values (50% = revenue / (revenue * 2)) are mathematically correct and validate that both branches resolve correctly.
The assertions on preAggregationsDescription verify both branches target the same pre-aggregation (md_revenue_by_id).

Driver Tests — Fixture & Skip Configuration

Potential issue with the "two multi-stage branches sharing one pre-aggregation" test:

This test is added to testQueries.ts (line 1915) and will run for all drivers, but:

It is only added to the skip list in postgres.json (line 204).
No other driver fixture (athena, bigquery, databricks-jdbc, mysql, redshift, snowflake) has this test in either skip or tesseractSkip.
Snapshots for this test only exist in postgres-full.test.ts.snap — no other driver has a matching snapshot.

By contrast, the "SeveralMultiStageMeasures" test (which is also multi-stage) is present in both skip and tesseractSkip for drivers like athena, bigquery, databricks-jdbc, and redshift. If those drivers are expected to run the "two multi-stage branches" test, they need snapshots. If not, they need skip entries.

Recommendation: Either add the "two multi-stage branches" test to the skip list (and tesseractSkip where appropriate) for all non-postgres drivers, or add the corresponding snapshots. Otherwise CI will likely fail for those drivers.

Postgres Snapshot Reordering

The postgres-full.test.ts.snap diff shows the "ECommerce: count by month + order with non-UTC timezone (Asia/Kolkata)" snapshot being moved from the end of the file to its alphabetically sorted position. This is purely a sort-order change — the snapshot content is identical. Likely caused by Jest's snapshot ordering when new snapshots are inserted. This is benign.

Summary

| Area | Status | Notes |
| --- | --- | --- |
| Core bug fix | ✅ Correct | Non-partitioned branch now populates `usageTargetTableNames` |
| IS NOT DISTINCT FROM | ✅ Correct | Follows established pattern |
| Integration test | ✅ Good coverage | Validates two-branch multi-usage scenario |
| Driver test - base measure | ✅ Complete | Snapshots and skips for all drivers |
| Driver test - two branches | ⚠️ Missing skips/snapshots | Only postgres has skip entry and snapshot |
| Snapshot changes | ✅ Clean | Only additions + one benign reorder |

The core fix is solid. The main item to address is the missing skip/tesseractSkip entries (or snapshots) for the "two multi-stage branches" test across non-postgres drivers.

…ithout snapshots

The earlier-added `querying BigECommerce: two multi-stage branches sharing one pre-aggregation` test only has a snapshot in postgres-full, but was not present in the skip lists of the other driver fixtures. In Tesseract mode Jest then tries to run it against those drivers (databricks-jdbc, athena, mysql, redshift, snowflake, bigquery, clickhouse, mssql) and fails with `New snapshot was not written. The update flag must be explicitly passed to write a new snapshot.`

Add the test name to both `skip` and `tesseractSkip` for every non-postgres driver fixture (it was already in `tesseractSkip` for clickhouse/mssql via the previous commit pattern), matching how `base measure plus multi-stage over non-partitioned pre-aggregation` is handled where snapshots are missing.
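As a rough mental model of the two lists, the sketch below assumes that `skip` applies in regular mode and `tesseractSkip` applies in Tesseract mode. The `DriverFixture` interface and the `shouldRun` helper are hypothetical and only mirror the field names mentioned in this PR; the actual test harness may decide this differently.

```typescript
// Hypothetical model of a driver fixture's skip lists; the semantics are assumed.
interface DriverFixture {
  skip?: string[];
  tesseractSkip?: string[];
}

// Returns true when the named test should run for this fixture and mode.
function shouldRun(testName: string, fixture: DriverFixture, tesseract: boolean): boolean {
  const skipList = tesseract ? fixture.tesseractSkip : fixture.skip;
  return !(skipList ?? []).includes(testName);
}

const testName = 'querying BigECommerce: two multi-stage branches sharing one pre-aggregation';

// Before the commit: the test is absent from the Tesseract list, so it runs
// and fails on the missing snapshot.
const before: DriverFixture = { skip: [], tesseractSkip: [] };

// After the commit: the name is listed in both modes.
const after: DriverFixture = { skip: [testName], tesseractSkip: [testName] };

const ranBefore = shouldRun(testName, before, true);
const ranAfter = shouldRun(testName, after, true);
```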

…acts

BigQuery returns sums with trailing FP precision noise (e.g. `459.7526` → `459.75260000000003`, `-0.2355` → `-0.23549999999999993`) that the postgres NUMERIC representation does not exhibit. The snapshot for `querying BigECommerce: base measure plus multi-stage over non-partitioned pre-aggregation` was seeded from the postgres values when the test was triaged across drivers; update it to the values actually produced by BigQuery so CI passes.

Athena returns the same FP precision noise as BigQuery for the `base measure plus multi-stage over non-partitioned pre-aggregation` test. Apply the same set of value adjustments as the bigquery snap.
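For readers unfamiliar with this kind of noise, the sketch below shows the general effect with binary floating point. It is a generic illustration, not the drivers' code, and the `closeEnough` helper is hypothetical: these commits update the snapshots to the observed values rather than comparing with a tolerance.

```typescript
// Binary floating point cannot represent most decimal fractions exactly,
// so a sum can differ from the decimal result in the last digits.
const floatSum = 0.1 + 0.2;
const exactlyEqual = floatSum === 0.3; // false

// Hypothetical tolerance check, shown only to quantify how small the noise is.
function closeEnough(a: number, b: number, eps = 1e-9): boolean {
  return Math.abs(a - b) <= eps;
}

// Values quoted in the commit messages: postgres NUMERIC vs. observed output.
const sameRevenue = closeEnough(459.7526, 459.75260000000003); // true
const sameDelta = closeEnough(-0.2355, -0.23549999999999993); // true
```

Because snapshot tests compare serialized values exactly, even noise this small requires a per-driver snapshot.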

…ckhouse

ClickHouse cannot build the non-partitioned `BigECommerce.CategoryFlat` rollup. The corresponding test (`base measure plus multi-stage over non-partitioned pre-aggregation`) is already skipped in both regular and tesseract skip lists for clickhouse, so the rollup is never used. Skip the build call too to avoid failing the `must built pre-aggregations` step on clickhouse-full.

…se fixture

ClickHouse cannot build the non-partitioned `BigECommerce.CategoryFlat` rollup that was added for the multi-usage repro test. The test itself is already skipped in both regular and tesseract skip lists for clickhouse, so the rollup is dead config on this driver. Remove it so the cubestore build step does not have to materialize it.

…ranches test

A stray edit replaced one of the multi-stage measures with the regular `revenue` measure, which broke the assertion. Restore the original `revenue_no_id_sum` + `revenue_no_id_pct` pair the test was designed to exercise.

codecov · 2026-05-21T07:16:46Z

Codecov Report

❌ Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.41%. Comparing base (346eb56) to head (3beefca).
⚠️ Report is 9 commits behind head on master.

Files with missing lines	Patch %	Lines
...orchestrator/PreAggregationPartitionRangeLoader.ts	28.57%	4 Missing and 1 partial ⚠️
...bejs-schema-compiler/src/adapter/CubeStoreQuery.ts	0.00%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (346eb56) and HEAD (3beefca).

HEAD has 1 upload less than BASE

Flag BASE (346eb56) HEAD (3beefca)

cubesql 1 0

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #10925       +/-   ##
===========================================
- Coverage   78.93%   58.41%   -20.53%     
===========================================
  Files         470      216      -254     
  Lines       92862    17000    -75862     
  Branches     3449     3450        +1     
===========================================
- Hits        73304     9931    -63373     
+ Misses      19054     6564    -12490     
- Partials      504      505        +1

Flag	Coverage Δ
cube-backend	`58.41% <25.00%> (-0.03%)`	⬇️
cubesql	`?`


waralexrom added 2 commits May 20, 2026 23:52

waralexrom requested review from a team as code owners May 20, 2026 22:07

github-actions Bot added the javascript Pull requests that update Javascript code label May 20, 2026

vercel Bot deployed to Preview May 20, 2026 22:09 View deployment

waralexrom added 6 commits May 21, 2026 00:37

test(testing-drivers): align athena snapshot with observed FP artifacts

aed0888


ovr approved these changes May 21, 2026


waralexrom merged commit 81b9de0 into master May 21, 2026
119 of 120 checks passed

waralexrom deleted the tesseract-multi-stage-pre-aggr-fix branch May 21, 2026 10:26

waralexrom merged 8 commits into master from tesseract-multi-stage-pre-aggr-fix
