fix(tesseract): make pre-aggregations tests work under native SQL planner #10992
…re-aggregations Pre-aggregation build now serializes count_distinct_approx as an HLL state (hll_init) instead of computing an approximate distinct, and queries reading the rollup merge the stored states (hll_merge / hll_cardinality_merge) instead of re-hashing the state column. Build side: render_measure_as_state is set for pre-aggregation queries. Read side: FinalPreAggregationMeasureSqlNode wraps the matched column with a merge, keeping the merged state when the result feeds a further aggregation and taking its cardinality otherwise.
The Postgres-style mock rendered hll_merge and hll_cardinality_merge identically (both as a cardinality), so tests could not tell which branch a count_distinct_approx pre-aggregation read used. Switch the mock to CubeStore-style forms where the two differ (merge vs cardinality(merge)). Update the approx pre-agg read/build tests to the new forms and add a multi-stage case (sum over count_distinct_approx) asserting the rollup read finalizes to a cardinality, matching the non-pre-agg shape.
…gation Add a rolling-window count_distinct_approx whose pre-aggregation stores the rolling measure itself (mirrors the JS partitionedRolling case). The leaf reads the rollup's HLL state and keeps it merged (hll_merge), while the rolling-window stage finalizes the merged states to a cardinality. This exercises the state branch of FinalPreAggregationMeasureSqlNode that the other approx tests do not reach.
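The two read-side branches described in these commits can be sketched roughly as follows. This is a minimal illustration that assumes the CubeStore-style forms used by the mock (`merge` vs `cardinality(merge)`); `renderApproxRollupRead`, its parameters, and the column name are hypothetical and are not the planner's actual API.

```typescript
// Hypothetical helper; function, parameter, and column names are illustrative only.
function renderApproxRollupRead(stateColumn: string, keepState: boolean): string {
  // Merging combines the HLL states stored in the rollup rows.
  const merged = `merge(${stateColumn})`;
  // Keep the merged state for a later stage, or finalize it to a number.
  return keepState ? merged : `cardinality(${merged})`;
}

// Leaf of a rolling window over the rollup: the state stays merged.
const rollingLeaf = renderApproxRollupRead("visitors_approx", true);
// Top-level read: the merged state is finalized to a cardinality.
const topLevel = renderApproxRollupRead("visitors_approx", false);
```

Because the two forms render differently, a test can assert which branch a given read took.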
Tesseract renders SQL from sqlTemplates() in Rust, while KsqlQuery only overrode the JS dialect methods. Identifiers came out double-quoted and string concatenation / LIKE wildcards used the unsupported `||` operator. Override sqlTemplates() so KSQL pre-aggregations and queries built by the native planner use backtick quoting and CONCAT(...) instead of `||`.
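As a rough illustration of the rendering difference, the sketch below quotes identifiers with backticks and builds a LIKE pattern with `CONCAT(...)` rather than `||`. The helper names are hypothetical and do not reproduce the template strings that `KsqlQuery.sqlTemplates()` actually returns; escaping of embedded backticks is left out.

```typescript
// Hypothetical helpers; they do not reproduce KsqlQuery's actual templates.
function ksqlQuoteIdentifier(name: string): string {
  // Backticks instead of double quotes. Escaping is omitted in this sketch.
  return "`" + name + "`";
}

function ksqlConcat(parts: string[]): string {
  // CONCAT(...) replaces chained `||` operators.
  return "CONCAT(" + parts.join(", ") + ")";
}

function ksqlContainsPattern(placeholder: string): string {
  // Wildcards are joined to the bound parameter through CONCAT.
  return ksqlConcat(["'%'", placeholder, "'%'"]);
}

const quotedColumn = ksqlQuoteIdentifier("order_id");
const containsPattern = ksqlContainsPattern("?");
```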
…lanner Tesseract's rollupJoin key resolution requires each join's ON condition to reference only the two joined cubes so it can extract the stitch keys. The test_facts -> merchant_and_product_dims join is transitive (its condition references intermediate cubes), which the binary resolver rejects. This is an unsupported feature, not a regression — skip it under nativeSqlPlanner and keep the legacy assertion for the non-Tesseract path.
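The binary-join requirement can be pictured as a simple membership check. The sketch below is an assumption-laden illustration, not the resolver's code: `JoinEdgeSketch`, `isBinaryJoinCondition`, and the cube name `some_intermediate_cube` are hypothetical.

```typescript
// Hypothetical shape: which cubes a join's ON condition refers to.
interface JoinEdgeSketch {
  from: string;
  to: string;
  conditionCubes: string[];
}

// A join is usable for stitch-key extraction only when its condition
// mentions nothing but the two cubes being joined.
function isBinaryJoinCondition(edge: JoinEdgeSketch): boolean {
  return edge.conditionCubes.every((cube) => cube === edge.from || cube === edge.to);
}

const transitiveJoin: JoinEdgeSketch = {
  from: "test_facts",
  to: "merchant_and_product_dims",
  // Illustrative intermediate cube; the real condition is not shown in the source.
  conditionCubes: ["test_facts", "some_intermediate_cube", "merchant_and_product_dims"],
};

const directJoin: JoinEdgeSketch = {
  from: "test_facts",
  to: "merchant_and_product_dims",
  conditionCubes: ["test_facts", "merchant_and_product_dims"],
};
```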
A rollupLambda has no `external` default in the data model, so it reached
the planner as external=None -> false. The lambda-union read then rendered
with the source cube's templates (Postgres $N placeholders) instead of the
external store's (CubeStore ?), and CubeStore could not bind the filter
parameter ("No field named $1").
Derive the lambda's external from its member rollups (external only when all
members are), mirroring the legacy R.all(p => p.external, descriptions) check.
This selects the CubeStore dialect/placeholders for the union read.
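A minimal sketch of the derivation, assuming each member rollup exposes a boolean `external` flag. The interface and function names are hypothetical; only the "external when all members are" rule and the two placeholder styles come from the commit message.

```typescript
// Hypothetical description of a member rollup.
interface RollupDescriptionSketch {
  name: string;
  external: boolean;
}

// The lambda is external only when every member rollup is external.
function lambdaIsExternal(members: RollupDescriptionSketch[]): boolean {
  return members.every((member) => member.external);
}

// The placeholder style follows the store that serves the read.
function placeholderFor(external: boolean, index: number): string {
  return external ? "?" : "$" + index;
}

const allExternal = lambdaIsExternal([
  { name: "batch", external: true },
  { name: "real_time", external: true },
]);
const mixed = lambdaIsExternal([
  { name: "batch", external: true },
  { name: "real_time", external: false },
]);
```

With `allExternal` the union read would use `?` placeholders; with `mixed` it would fall back to the source cube's `$N` style.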
…dParamsRust The Rust build path dropped preAggregationId, so getSql() (used by the pre-aggregation partitions API and scheduled refresh) could not isolate a single pre-aggregation. When several rollups share identical references — e.g. a rollupLambda member and a real-time rollup — Tesseract matched the wrong one, attributing one rollup's partitions to another. Forward preAggregationId (as findPreAggregationForQuery already does) so the optimizer filters to the requested pre-aggregation.
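The effect of forwarding the id can be sketched as a filter over candidate pre-aggregations. Everything here is hypothetical (type, function, and rollup ids); it only illustrates why identical references are ambiguous without the id.

```typescript
// Hypothetical candidate: two rollups may share the same references.
interface PreAggCandidateSketch {
  id: string;
  references: string[];
}

// Without an id every candidate stays eligible; with an id only the
// requested pre-aggregation remains.
function filterCandidates(
  candidates: PreAggCandidateSketch[],
  preAggregationId?: string
): PreAggCandidateSketch[] {
  if (preAggregationId === undefined) {
    return candidates;
  }
  return candidates.filter((candidate) => candidate.id === preAggregationId);
}

const candidates: PreAggCandidateSketch[] = [
  { id: "Orders.batch", references: ["Orders.count", "Orders.status"] },
  { id: "Orders.realTime", references: ["Orders.count", "Orders.status"] },
];

const unfilteredIds = filterCandidates(candidates).map((c) => c.id).join(",");
const isolatedIds = filterCandidates(candidates, "Orders.realTime").map((c) => c.id).join(",");
```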
Tesseract always applied a default ORDER BY when no explicit order was set, including for pre-aggregation build and total queries. Legacy defaultOrder() returns [] for those. The stray ORDER BY is pointless for materialization and breaks streaming pre-aggregations: CubeStore's Kafka target only accepts an unsorted projection/filter over the source, so a top-level Sort is rejected. Skip default order when pre_aggregation_query or total_query, matching legacy.
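The rule reduces to a short guard. The sketch below assumes two boolean flags corresponding to `pre_aggregation_query` and `total_query`; the types and the function name are hypothetical.

```typescript
// Hypothetical order item and query flags.
interface OrderItemSketch {
  id: string;
  desc: boolean;
}

interface OrderFlagsSketch {
  preAggregationQuery: boolean;
  totalQuery: boolean;
}

// Build and total queries get no default order; other queries keep the fallback.
function resolveDefaultOrder(
  flags: OrderFlagsSketch,
  fallback: OrderItemSketch[]
): OrderItemSketch[] {
  if (flags.preAggregationQuery || flags.totalQuery) {
    return [];
  }
  return fallback;
}

const fallbackOrder: OrderItemSketch[] = [{ id: "Orders.createdAt", desc: false }];
const buildOrder = resolveDefaultOrder({ preAggregationQuery: true, totalQuery: false }, fallbackOrder);
const regularOrder = resolveDefaultOrder({ preAggregationQuery: false, totalQuery: false }, fallbackOrder);
```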
…lates ksqlDB does not support positional GROUP BY. The native planner renders GROUP BY from the shared sqlTemplates, which default to column ordinals, so a KSQL streaming pre-aggregation build (CREATE TABLE AS SELECT ... GROUP BY 1,2) was rejected: 'Non-aggregate SELECT expression(s) not part of GROUP BY'. Override statements.group_by_exprs in KsqlQuery to emit the full expressions.
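To make the difference concrete, the sketch below contrasts the two GROUP BY renderings. The function names and column expressions are hypothetical, and this is not the actual `statements.group_by_exprs` template.

```typescript
// Default style: refer to SELECT columns by position.
function groupByOrdinals(exprs: string[]): string {
  return "GROUP BY " + exprs.map((_expr, index) => String(index + 1)).join(", ");
}

// KSQL style: repeat the full expressions.
function groupByExpressions(exprs: string[]): string {
  return "GROUP BY " + exprs.join(", ");
}

const dimensionExprs = ["`orders`.`status`", "`orders`.`city`"];
const positional = groupByOrdinals(dimensionExprs);
const expanded = groupByExpressions(dimensionExprs);
```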
Run integration and integration-smoke jobs with Tesseract pre-aggregations on whenever the Tesseract SQL planner is on, so the full matrix exercises the native pre-aggregation path.
Claude finished @waralexrom's task in 2m 30s
Code Review: fix(tesseract): make pre-aggregations work under native SQL planner
Verdict: LGTM. Surgical, well-motivated fixes that move Tesseract toward parity with the legacy builder for pre-aggregations. Code-level findings: 0 high, 0 medium, 3 low / nits.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #10992 +/- ##
==========================================
- Coverage 78.81% 78.81% -0.01%
==========================================
Files 470 471 +1
Lines 93438 94328 +890
Branches 3466 3494 +28
==========================================
+ Hits 73644 74345 +701
- Misses 19291 19474 +183
- Partials 503 509 +6
A subQuery dimension's inner query is aggregated per primary key and joined back by it, so its order is meaningless. The default order it received rendered as ORDER BY + OFFSET/FETCH inside a derived table, which MSSQL rejects (Invalid usage of the option NEXT in the FETCH statement) when the subquery sits inside a pre-aggregation build (SELECT ... INTO). Legacy emits no order there. Set an empty order on the subquery's QueryProperties.
A segment projected as a pre-aggregation dimension is a boolean expression. The native planner emitted it raw, which MSSQL rejects (no boolean type). Mark such members (segment resolved via add_dimension_evaluator) and render them through a new wrap_segment_select template: MSSQL CAST(... AS BIT), CubeStore/MySQL IF(...,1,0), default unchanged.
…e-aggregation
A segment is stored as a boolean column in a rollup. On read MSSQL renders it
as a BIT column, which can't be used as a bare WHERE predicate. Add a
wrap_segment_filter template (default passthrough, MSSQL/MySQL/CubeStore
`{{ expr }} = 1`) and apply it in BaseSegment::to_sql when the filters context
is reading a pre-aggregation.
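A minimal sketch of the filter wrapping, assuming a dialect tag and a flag that says whether the filter is applied while reading a pre-aggregation. The type and function names are hypothetical; only the `expr = 1` form and the passthrough default come from the commit message.

```typescript
// Hypothetical dialect tag for this sketch.
type SegmentFilterDialect = "mssql" | "mysql" | "cubestore" | "default";

// A stored segment column is compared to 1 on dialects without a bare
// boolean predicate; otherwise the expression passes through unchanged.
function wrapSegmentFilter(
  dialect: SegmentFilterDialect,
  expr: string,
  readingPreAggregation: boolean
): string {
  if (!readingPreAggregation || dialect === "default") {
    return expr;
  }
  return `${expr} = 1`;
}

const mssqlRead = wrapSegmentFilter("mssql", "rollup.is_active", true);
const defaultRead = wrapSegmentFilter("default", "rollup.is_active", true);
const sourceQuery = wrapSegmentFilter("mssql", "orders.status = 'active'", false);
```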
Claude finished @waralexrom's task in 1m 4s
Code Review: fix(tesseract): make pre-aggregations work under native SQL planner (delta since last review)
Verdict: LGTM. The three new commits (…
Summary
Fixes the bugs that surface when running pre-aggregation integration/smoke tests under the Tesseract native SQL planner (`CUBEJS_TESSERACT_PRE_AGGREGATIONS=1`), so pre-aggregations are safe to enable on Tesseract. Each fix restores parity with the legacy query builder. Also turns the flag on for the Tesseract leg of CI.

Changes
- `count_distinct_approx`: render as an HLL state on build (`hll_init`) and merge on read: cardinality at top level (`hll_cardinality_merge`), merged state when feeding a further aggregation (`hll_merge`, e.g. rolling-window over the rollup).
- KSQL `sqlTemplates` (consumed by the Rust renderer): backtick identifier quoting, `CONCAT(...)` instead of the unsupported `||`, and GROUP BY by full expressions instead of column ordinals.
- rollupLambda `external` inherited from its member rollups, so the union read renders with the external store's dialect/placeholders (`?`); fixes CubeStore `No field named "$1"`.
- `preAggregationId` forwarded in `buildSqlAndParamsRust`, so the pre-aggregation partitions API / scheduled refresh isolate a single pre-agg instead of matching a same-reference sibling.
- No default `ORDER BY` for pre-aggregation/total build queries (matches legacy `defaultOrder()`); the stray `Sort` made CubeStore reject streaming pre-aggregation builds.
- Skip `rollupJoin` with transitive joins under the native planner (genuinely unsupported), keeping the legacy assertion for the non-Tesseract path.
- CI: `CUBEJS_TESSERACT_PRE_AGGREGATIONS` turned on for the Tesseract matrix leg (integration + smoke).

Testing
- `cubesqlplanner` lib green (977 tests); new `sql_generation` tests cover HLL build/read, multi-stage-over-rollup, and rolling-window-over-rollup state path.
- `pre-aggregations.test.ts` (PreAggregations) under Tesseract; `smoke-lambda` 7/7 green with `CUBEJS_TESSERACT_SQL_PLANNER=1 CUBEJS_TESSERACT_PRE_AGGREGATIONS=1` (Postgres rollups, lambda + `unionWithSourceData`, and KSQL streaming pre-aggregations).