fix(tesseract): make pre-aggregations tests work under native SQL planner by waralexrom · Pull Request #10992 · cube-js/cube

waralexrom · 2026-06-02T09:52:14Z

Summary

Fixes the bugs that surface when running pre-aggregation integration/smoke tests under the Tesseract native SQL planner (CUBEJS_TESSERACT_PRE_AGGREGATIONS=1), so pre-aggregations are safe to enable on Tesseract. Each fix restores parity with the legacy query builder. Also turns the flag on for the Tesseract leg of CI.

Changes

HLL count_distinct_approx: render as an HLL state on build (hll_init) and merge on read — cardinality at top level (hll_cardinality_merge), merged state when feeding a further aggregation (hll_merge, e.g. rolling-window over the rollup).
KSQL dialect mirrored into sqlTemplates (consumed by the Rust renderer): backtick identifier quoting, CONCAT(...) instead of the unsupported ||, and GROUP BY by full expressions instead of column ordinals.
rollupLambda external inherited from its member rollups, so the union read renders with the external store's dialect/placeholders (?) — fixes CubeStore No field named "$1".
preAggregationId forwarded in buildSqlAndParamsRust, so the pre-aggregation partitions API / scheduled refresh isolate a single pre-agg instead of matching a same-reference sibling.
No default ORDER BY for pre-aggregation/total build queries (matches legacy defaultOrder()); the stray Sort made CubeStore reject streaming pre-aggregation builds.
Skip rollupJoin with transitive joins under the native planner (genuinely unsupported), keeping the legacy assertion for the non-Tesseract path.
CI: enable CUBEJS_TESSERACT_PRE_AGGREGATIONS on the Tesseract matrix leg (integration + smoke).

Testing

Rust: full cubesqlplanner lib green (977 tests); new sql_generation tests cover HLL build/read, multi-stage-over-rollup, and rolling-window-over-rollup state path.
JS: pre-aggregations.test.ts (PreAggregations) under Tesseract; smoke-lambda 7/7 green with CUBEJS_TESSERACT_SQL_PLANNER=1 CUBEJS_TESSERACT_PRE_AGGREGATIONS=1 (Postgres rollups, lambda + unionWithSourceData, and KSQL streaming pre-aggregations).

…re-aggregations Pre-aggregation build now serializes count_distinct_approx as an HLL state (hll_init) instead of computing an approximate distinct, and queries reading the rollup merge the stored states (hll_merge / hll_cardinality_merge) instead of re-hashing the state column. Build side: render_measure_as_state is set for pre-aggregation queries. Read side: FinalPreAggregationMeasureSqlNode wraps the matched column with a merge, keeping the merged state when the result feeds a further aggregation and taking its cardinality otherwise.

The Postgres-style mock rendered hll_merge and hll_cardinality_merge identically (both as a cardinality), so tests could not tell which branch a count_distinct_approx pre-aggregation read used. Switch the mock to CubeStore-style forms where the two differ (merge vs cardinality(merge)). Update the approx pre-agg read/build tests to the new forms and add a multi-stage case (sum over count_distinct_approx) asserting the rollup read finalizes to a cardinality, matching the non-pre-agg shape.

…gation Add a rolling-window count_distinct_approx whose pre-aggregation stores the rolling measure itself (mirrors the JS partitionedRolling case). The leaf reads the rollup's HLL state and keeps it merged (hll_merge), while the rolling-window stage finalizes the merged states to a cardinality. This exercises the state branch of FinalPreAggregationMeasureSqlNode that the other approx tests do not reach.

Tesseract renders SQL from sqlTemplates() in Rust, while KsqlQuery only overrode the JS dialect methods. Identifiers came out double-quoted and string concatenation / LIKE wildcards used the unsupported `||` operator. Override sqlTemplates() so KSQL pre-aggregations and queries built by the native planner use backtick quoting and CONCAT(...) instead of `||`.

…lanner Tesseract's rollupJoin key resolution requires each join's ON condition to reference only the two joined cubes so it can extract the stitch keys. The test_facts -> merchant_and_product_dims join is transitive (its condition references intermediate cubes), which the binary resolver rejects. This is an unsupported feature, not a regression — skip it under nativeSqlPlanner and keep the legacy assertion for the non-Tesseract path.

A rollupLambda has no `external` default in the data model, so it reached the planner as external=None -> false. The lambda-union read then rendered with the source cube's templates (Postgres $N placeholders) instead of the external store's (CubeStore ?), and CubeStore could not bind the filter parameter ("No field named $1"). Derive the lambda's external from its member rollups (external only when all members are), mirroring the legacy R.all(p => p.external, descriptions) check. This selects the CubeStore dialect/placeholders for the union read.

…dParamsRust The Rust build path dropped preAggregationId, so getSql() (used by the pre-aggregation partitions API and scheduled refresh) could not isolate a single pre-aggregation. When several rollups share identical references — e.g. a rollupLambda member and a real-time rollup — Tesseract matched the wrong one, attributing one rollup's partitions to another. Forward preAggregationId (as findPreAggregationForQuery already does) so the optimizer filters to the requested pre-aggregation.

Tesseract always applied a default ORDER BY when no explicit order was set, including for pre-aggregation build and total queries. Legacy defaultOrder() returns [] for those. The stray ORDER BY is pointless for materialization and breaks streaming pre-aggregations: CubeStore's Kafka target only accepts an unsorted projection/filter over the source, so a top-level Sort is rejected. Skip default order when pre_aggregation_query or total_query, matching legacy.

…lates ksqlDB does not support positional GROUP BY. The native planner renders GROUP BY from the shared sqlTemplates, which default to column ordinals, so a KSQL streaming pre-aggregation build (CREATE TABLE AS SELECT ... GROUP BY 1,2) was rejected: 'Non-aggregate SELECT expression(s) not part of GROUP BY'. Override statements.group_by_exprs in KsqlQuery to emit the full expressions.

Run integration and integration-smoke jobs with Tesseract pre-aggregations on whenever the Tesseract SQL planner is on, so the full matrix exercises the native pre-aggregation path.

claude · 2026-06-02T09:52:33Z

Claude finished @waralexrom's task in 2m 30s —— View job

Code Review — fix(tesseract): make pre-aggregations work under native SQL planner

Verdict: LGTM. Surgical, well-motivated fixes that move Tesseract toward parity with the legacy builder for pre-aggregations. Code-level findings: 0 high, 0 medium, 3 low / nits.

Details (todos + findings)

Checked existing review threads (none)
Read diff (git diff 35bed42..HEAD)
Reviewed JS/TS changes
Reviewed Rust changes
Reviewed CI workflow change

Per-area notes

rust/.../pre_aggregations_compiler.rs (rollupLambda external inheritance)
The new rule — external if every member rollup is external — is correct for routing, but two edge cases are worth noting:

If pre_aggrs_for_lambda were ever empty, .iter().all(...) returns true, incorrectly granting Some(true). In practice a lambda always has members, but a defensive !is_empty() && ...all(...) would make this robust.
When members mix Some(false) / None, the result falls back to static_data.external (which is None for a lambda that omits external). Today downstream code already treats None as "not external," so behaviorally fine, but worth keeping in mind if external semantics ever tighten.

rust/.../final_pre_aggregation_measure.rs + physical_plan_builder/builder.rs
The count_approx_as_state plumbing is consistent with the existing render_measure_as_state flag used elsewhere (multistage leaf, final_measure). The cardinality-vs-state branch in FinalPreAggregationMeasureSqlNode::process_node (lines ~64–80) is the correct place for the decision and the comment explains why well. Tests pin both paths (test_count_distinct_approx_pre_agg_read_merges_state, ..._rolling_pre_agg_keeps_state, ..._multistage_pre_agg_reads_cardinality) — good coverage.

rust/.../query_properties.rs (no default ORDER BY for build/total)
Correct condition (pre_aggregation_query || total_query) and the comment explains the streaming-pre-agg motivation. No regression risk to ordinary queries since the branch is gated on these flags.

packages/cubejs-ksql-driver/src/KsqlQuery.ts (sqlTemplates override)

BaseQuery.sqlTemplates() returns a fresh object each call, so mutating templates.quotes, expressions.concat_strings, etc. is safe.
The three mirrored bits (backtick quoting, CONCAT, full-expr GROUP BY) match what KsqlQuery already does in its non-template overrides (quoteIdentifier, concatStringsSql, KsqlFilter). Nit: no JS-level unit test exercises sqlTemplates() for KSQL — drift between the two code paths would be silent. Consider a small assertion in a future change.

packages/cubejs-schema-compiler/src/adapter/BaseQuery.js
preAggregationId: this.options.preAggregationId || null — straightforward forwarding; matches the PR description's isolation goal. Nit: every other passthrough on these lines preserves undefined (e.g., joinHints, ungrouped); coercing to null is harmless but slightly inconsistent.

packages/cubejs-schema-compiler/test/integration/postgres/pre-aggregations.test.ts
Wrapping the existing tests in if (getEnv('nativeSqlPlanner')) { it.skip(...) } else { it(...) } keeps coverage on the legacy path and documents the gap on Tesseract. Verbose but fine.

.github/workflows/push.yml
Setting CUBEJS_TESSERACT_PRE_AGGREGATIONS from matrix.use_tesseract_sql_planner only — i.e. the flag is enabled exactly when the planner is — is the right shape. No further matrix expansion needed for this PR.

Suggestions (low priority)

Defensive !is_empty() && pre_aggrs_for_lambda.iter().all(...) in pre_aggregations_compiler.rs lambda-external block.
Add a JS unit test asserting KsqlQuery#sqlTemplates() returns the three overrides (quotes, concat_strings, group_by_exprs).
Consider preserving undefined (or documenting the null coercion) for preAggregationId in BaseQuery.js:957 for consistency with sibling fields.

· branch: `tesseract-enable-all-pre-aggrs-test`

codecov · 2026-06-02T10:04:37Z

Codecov Report

❌ Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.81%. Comparing base (35bed42) to head (ef20929).
⚠️ Report is 9 commits behind head on master.

Files with missing lines	Patch %	Lines
...bejs-schema-compiler/src/adapter/CubeStoreQuery.ts	0.00%	2 Missing ⚠️
...s/cubejs-schema-compiler/src/adapter/MysqlQuery.ts	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10992      +/-   ##
==========================================
- Coverage   78.81%   78.81%   -0.01%     
==========================================
  Files         470      471       +1     
  Lines       93438    94328     +890     
  Branches     3466     3494      +28     
==========================================
+ Hits        73644    74345     +701     
- Misses      19291    19474     +183     
- Partials      503      509       +6

Flag	Coverage Δ
cube-backend	`58.50% <33.33%> (+0.07%)`	⬆️
cubesql	`83.32% <ø> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

A subQuery dimension's inner query is aggregated per primary key and joined back by it, so its order is meaningless. The default order it received rendered as ORDER BY + OFFSET/FETCH inside a derived table, which MSSQL rejects (Invalid usage of the option NEXT in the FETCH statement) when the subquery sits inside a pre-aggregation build (SELECT ... INTO). Legacy emits no order there. Set an empty order on the subquery's QueryProperties.

A segment projected as a pre-aggregation dimension is a boolean expression. The native planner emitted it raw, which MSSQL rejects (no boolean type). Mark such members (segment resolved via add_dimension_evaluator) and render them through a new wrap_segment_select template: MSSQL CAST(... AS BIT), CubeStore/MySQL IF(...,1,0), default unchanged.

…e-aggregation A segment is stored as a boolean column in a rollup. On read MSSQL renders it as a BIT column, which can't be used as a bare WHERE predicate. Add a wrap_segment_filter template (default passthrough, MSSQL/MySQL/CubeStore `{{ expr }} = 1`) and apply it in BaseSegment::to_sql when the filters context is reading a pre-aggregation.

claude · 2026-06-02T14:12:54Z

Claude finished @waralexrom's task in 1m 4s —— View job

Code Review — fix(tesseract): make pre-aggregations work under native SQL planner (delta since last review)

Verdict: LGTM. The three new commits (51c6b85, b2a8f24, ef20929) extend Tesseract→legacy parity for segments-as-dimensions and dimension subquery ordering. Findings on the new delta: 0 high, 0 medium, 2 nits. Earlier-review findings still apply.

Details — new commits, file-by-file

What's new since the last review (`6e0e31c..HEAD`)

51c6b85 — no default ORDER BY in dimension subquery (dimension_subquery_planner.rs)
b2a8f24 — wrap segment-as-dimension per dialect (new SegmentDimensionSqlNode, wrap_segment_select/wrap_segment_filter templates, MemberExpressionSymbol::with_is_segment, MSSQL/MySQL/CubeStore overrides)
ef20929 — when reading a pre-aggregation, compare segment to its truthy value (BaseSegment::to_sql + reading_pre_aggregation flag in FiltersContext)

Per-area notes

dimension_subquery_planner.rs:128 — .order_by(Some(vec![]))
Correct. The subquery is grouped by primary key and joined back by it, so any ordering is wasted work; the explicit empty list also prevents MSSQL choking when the derived table grows an OFFSET/FETCH. Comment captures the why well.

SegmentDimensionSqlNode (physical_plan/sql_nodes/segment_dimension.rs)
Wraps only MemberSymbol::MemberExpression whose is_segment() is true; everything else passes through. The placement in factory.rs:194 is right — only on the build path; the read path keeps RenderReferencesSqlNode because the segment is already a stored column.

MemberExpressionSymbol::with_is_segment (member_expression_symbol.rs)
Mirrors the existing with_parenthesized builder. Default false for is_segment in new(...) preserves existing behavior for all other call sites. Compiler::add_dimension_evaluator_by_path is the only producer — chained with with_parenthesized() so a segment-as-dimension is always both. Good.

BaseSegment::to_sql + FiltersContext.reading_pre_aggregation
Correctly gates on filters_ctx.reading_pre_aggregation (derived from nodes_factory.reading_pre_aggregation() = !pre_aggregation_dimensions_references.is_empty()), so live-base-table filters still emit a bare boolean. Both VisitorContext::new and ::new_with_cube_alias_prefix are updated. Nit below.

JS dialect templates

BaseQuery.js: defaults to identity ({{ expr }}) for both — passthrough for Postgres/BigQuery/Snowflake/etc. ✓
CubeStoreQuery.ts: IF({{ expr }}, 1, 0) / {{ expr }} = 1. ✓
MysqlQuery.ts: same as CubeStore. ✓
MssqlQuery.ts: CAST((CASE WHEN ... THEN 1 ELSE 0 END) AS BIT) / {{ expr }} = 1. Comments call out the BIT rationale clearly.

Nits (low priority)

BaseSegment::to_sql only checks reading_pre_aggregation — not the dialect's wrapper. For Postgres/BigQuery the wrapper renders identity, so it's a no-op string copy, but you still pay a template render per segment under a rollup read. Probably negligible; mentioning only because the symmetric build-side SegmentDimensionSqlNode likewise unconditionally calls wrap_segment_select for any segment. If template render shows up in profiles, gate on a dialect-level "needs wrap" boolean.
SegmentDimensionSqlNode activation is broad. It wraps every segment projected via the evaluate_sql_processor path, even though the bug it fixes only requires wrapping when the segment is selected/grouped (not when it appears inside a larger expression that already coerces it). For Postgres this is moot; for MSSQL the CAST(... AS BIT) becomes visible inside derived expressions where the user might have expected the raw boolean. Worth a follow-up only if someone reports it — the legacy builder behaves similarly here.

Sanity checks I ran

All new templates are defined in the mock renderer (mock_sql_templates_render.rs) so Rust tests don't NPE on lookup.
is_segment defaults to false in MemberExpressionSymbol::new, so no other producer is affected.
reading_pre_aggregation flows through both VisitorContext constructors — no path bypasses it.
No new override needed on OracleQuery/SqliteQuery/PrestodbQuery — they don't currently support pre-aggregations with segments-as-dimensions; the BaseQuery passthrough is safe.

· branch: `tesseract-enable-all-pre-aggrs-test`

waralexrom added 10 commits June 1, 2026 18:24

ci: enable CUBEJS_TESSERACT_PRE_AGGREGATIONS on the Tesseract matrix leg

6e0e31c

Run integration and integration-smoke jobs with Tesseract pre-aggregations on whenever the Tesseract SQL planner is on, so the full matrix exercises the native pre-aggregation path.

waralexrom requested review from a team as code owners June 2, 2026 09:52

github-actions Bot added rust Pull requests that update Rust code javascript Pull requests that update Javascript code data source driver labels Jun 2, 2026

vercel Bot had a problem deploying to Preview June 2, 2026 09:53 Failure

waralexrom added 3 commits June 2, 2026 14:46

vercel Bot deployed to Preview June 2, 2026 14:14 View deployment

waralexrom changed the title ~~fix(tesseract): make pre-aggregations work under native SQL planner~~ fix(tesseract): make pre-aggregations tests work under native SQL planner Jun 2, 2026

ovr approved these changes Jun 2, 2026

View reviewed changes

waralexrom merged commit 33ca484 into master Jun 2, 2026
187 of 191 checks passed

waralexrom deleted the tesseract-enable-all-pre-aggrs-test branch June 2, 2026 17:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tesseract): make pre-aggregations tests work under native SQL planner#10992

fix(tesseract): make pre-aggregations tests work under native SQL planner#10992
waralexrom merged 13 commits into
masterfrom
tesseract-enable-all-pre-aggrs-test

waralexrom commented Jun 2, 2026

Uh oh!

claude Bot commented Jun 2, 2026 •

edited

Loading

Per-area notes

Suggestions (low priority)

Uh oh!

codecov Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

claude Bot commented Jun 2, 2026 •

edited

Loading

What's new since the last review (`6e0e31c..HEAD`)

Per-area notes

Nits (low priority)

Sanity checks I ran

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

waralexrom commented Jun 2, 2026

Summary

Changes

Testing

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — fix(tesseract): make pre-aggregations work under native SQL planner

Per-area notes

Suggestions (low priority)

Uh oh!

codecov Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review — fix(tesseract): make pre-aggregations work under native SQL planner (delta since last review)

What's new since the last review (6e0e31c..HEAD)

Per-area notes

Nits (low priority)

Sanity checks I ran

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Jun 2, 2026 •

edited

Loading

codecov Bot commented Jun 2, 2026 •

edited

Loading

claude Bot commented Jun 2, 2026 •

edited

Loading

What's new since the last review (`6e0e31c..HEAD`)