feat(tesseract): Support separate pre-aggregations for different multi-fact subqueries by waralexrom · Pull Request #10718 · cube-js/cube

waralexrom · 2026-04-20T12:38:21Z

Check List

Tests have been run in packages where changes have been made if available
Linter has been run for changed code
Tests for the changes have been added if not covered yet
Docs have been added / updated if required

claude · 2026-04-20T12:38:41Z

Claude finished @waralexrom's task in 2m 37s —— View job

Code Review: Separate Pre-aggregations for Multi-fact Subqueries

Gather context and understand PR changes
Review code quality and architecture
Review security considerations
Review performance implications
Review test coverage
Post comprehensive review

Summary

This PR refactors how multiplied (multi-fact) measures are planned and optimized in Tesseract. The key architectural change is unifying the CTE management for both multi-stage members and multiplied measures into a single CteState, eliminating the separate ResolveMultipliedMeasures logical node. This enables pre-aggregation matching to work independently on each multi-fact subquery (regular or multiplied) rather than requiring the entire multiplied-measures subtree to match or fail as a unit.

What changed:

Deleted ResolveMultipliedMeasures (logical + physical) — multiplied measures are now planned as individual LogicalMultiStageMember CTEs
New CteState — shared counter/accumulator for both MultiStageQueryPlanner and MultipliedMeasuresQueryPlanner
MultiStageLeafMeasure.measure → .measures (Vec) — supports multi-measure CTEs from regular measure subqueries
pre_aggregation_override on AggregateMultipliedSubquery — allows the pre-agg optimizer to short-circuit individual multiplied subqueries
Pre-agg optimizer refactored — try_rewrite_query now handles simple rewrite + multi-stage fallback in one place; new try_rewrite_schema_and_filter for AggregateMultipliedSubquery matching
New bucketing integration tests — 8 tests covering multi-stage dimensions with bucketing logic

The approach is sound and the refactoring materially simplifies the codebase (~200 net lines removed despite adding tests and new functionality).

Code Quality

Positive:

Good separation of concerns: CteState is a clean, focused abstraction
The ResolveMultipliedMeasures + ResolvedMultipliedMeasures enum removal significantly simplifies FullKeyAggregate and all three aggregate strategy processors
The pre-aggregation optimizer flows more linearly now

Issues:

try_rewrite_schema_and_filter always uses TimeShiftState::default() — rust/cubesqlplanner/cubesqlplanner/src/logical_plan/optimizers/pre_aggregation/optimizer.rs:139-143

When matching a AggregateMultipliedSubquery against pre-aggregations, try_rewrite_schema_and_filter hardcodes TimeShiftState::default() for date range extraction. This is likely correct since multiplied measure subqueries don't have time shifts, but the same hardcoding also appears in the PlanNode::AggregateMultipliedSubquery arm at line 218. Consider whether the time shifts from the enclosing context should be threaded through here.

Investigate this →
root_filter used for all multiplied subquery matching — optimizer.rs:189-191

The comment says "multiplied-measure CTEs don't carry their own filter — logically they apply the same filter as the root query." This is a key semantic assumption. If a future change adds filter pushdown into multiplied measure subqueries, this would silently become wrong. The comment is helpful but consider adding a debug assertion or at minimum ensuring this invariant is documented at the AggregateMultipliedSubquery struct level.
Duplicate code in try_rewrite_simple_query vs try_rewrite_schema_and_filter — optimizer.rs:117-131 vs optimizer.rs:139-170

These two methods share nearly identical logic (schema/filter matching → make source → build Query). The only differences are: (a) try_rewrite_simple_query copies the existing query's modifiers while try_rewrite_schema_and_filter creates empty modifiers, and (b) the date range extraction. Consider extracting a shared helper.

Fix this →
Empty data_queries left behind in strategy files — The full_join_aggregate_strategy.rs:103, inner_join_aggregate_strategy.rs:29, and keys_aggregate_strategy.rs:33 files still declare let mut data_queries = vec![]; or let mut keys_queries = vec![]; immediately before the remaining loop. Not a bug, but now the code looks like it was hastily trimmed rather than cleanly restructured. No need to fix now, but worth noting.

Security

No security concerns. This is internal query planning logic with no user-facing input parsing, no SQL injection vectors (queries are built via typed builders), and no auth/authz changes.

Performance

CTE naming is globally sequential now — With CteState shared across multi-stage and multiplied planners, CTE names are cte_0, cte_1, ... in order of planning, not grouped by type. This is fine functionally but means CTE numbering is now sensitive to planning order. If the order of multi_stage_query_planner.plan_queries() vs multiplied_measures_query_planner.plan_queries() ever changes, CTE names will shift, potentially breaking snapshot tests. This is acceptable but worth being aware of.
No performance regression expected — The refactoring doesn't add extra passes or copies. The pre_aggregation_override field adds one Option<Rc<Query>> per AggregateMultipliedSubquery which is negligible.
Pre-aggregation matching improvement — The new approach tries simple rewrite first, then falls back to multi-stage rewriting (line 95-100). This is more efficient than the old code which checked !plan.multistage_members().is_empty() upfront and went straight to the multi-stage path, potentially missing a simpler match.

Testing

Strengths:

8 new integration tests covering bucketing scenarios: simple, multistage measures, complex dimensions, cross-cube references, joins, multi-dimension, and concatenated dimensions
Tests exercise both SQL generation (build_sql) and actual execution against Postgres (try_execute_pg)
Good seed data with diverse customer revenue patterns enabling meaningful assertions

Concerns:

No test for the pre-aggregation optimization path — The new PlanNode::AggregateMultipliedSubquery arm in the optimizer (line 216-233) and try_rewrite_schema_and_filter are not directly tested. The bucketing tests validate the planning pipeline but don't exercise the pre-aggregation rewrite of multiplied subqueries. Consider adding a test that provides a pre-aggregation definition matching a multiplied measure subquery.
No negative/error tests — All tests are happy-path. Consider adding a test where pre-aggregation matching partially fails (some subqueries match, some don't) to verify the rollback behavior (line 243-250) works correctly.
cumulative measure rejection — The new guard in measure_matcher.rs:34-38 that rejects cumulative measures from pre-aggregation matching is important but not directly tested.

Minor Notes

Pre-existing typo: only_addictive should be only_additive in measure_matcher.rs — not introduced by this PR.
modifers typo in Query builder (line 125, 161) — also pre-existing, not from this PR.
The regex update in pre-aggregation-utils.ts to handle WITH in addition to SELECT is correct and necessary since multiplied measure CTEs may now produce WITH ... SELECT patterns.

Verdict

This is a well-designed architectural refactoring that simplifies the codebase while enabling more granular pre-aggregation matching. The main actionable feedback is around code duplication between the two rewrite methods and the missing test coverage for the pre-aggregation optimization of multiplied subqueries. The rest is solid.

codecov · 2026-04-20T12:49:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.78%. Comparing base (46b7106) to head (48526d0).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #10718   +/-   ##
=======================================
  Coverage   78.78%   78.78%           
=======================================
  Files         466      466           
  Lines       92050    92050           
  Branches     3378     3378           
=======================================
+ Hits        72525    72526    +1     
+ Misses      19033    19032    -1     
  Partials      492      492

Flag	Coverage Δ
cube-backend	`58.03% <ø> (ø)`
cubesql	`83.40% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…i-fact subqueries

claude · 2026-04-28T07:42:59Z

Claude finished @waralexrom's task in 2m 14s —— View job

Code Review: Separate Pre-aggregations for Multi-fact Subqueries

Gather context and understand PR changes
Review code quality and architecture
Review security considerations
Review performance implications
Review test coverage
Post comprehensive review

Summary

This PR makes a significant architectural improvement to Tesseract's query planning for multiplied (multi-fact) measures. The key change is unifying CTE management for both multi-stage members and multiplied measures into a single CteState, and removing the separate ResolveMultipliedMeasures logical/physical node. This enables the pre-aggregation optimizer to independently match each multi-fact subquery (regular or multiplied) against pre-aggregations, rather than requiring the entire multiplied-measures subtree to match or fail as a unit.

Core architectural changes:

Deleted ResolveMultipliedMeasures (logical + physical) — multiplied measures now planned as individual LogicalMultiStageMember CTEs
New CteState — shared counter/accumulator for both planners
MultiStageLeafMeasure.measure → .measures (Vec) — supports multi-measure CTEs from regular measure subqueries
pre_aggregation_override on AggregateMultipliedSubquery — lets the pre-agg optimizer short-circuit individual multiplied subqueries
Pre-agg optimizer refactored — try_rewrite_query now handles simple + multi-stage fallback; new try_rewrite_schema_and_filter for AggregateMultipliedSubquery matching
Comprehensive test coverage — 8 bucketing tests, 12+ multi-fact pre-agg tests, 3 multi-stage pre-agg tests

The refactoring is well-designed — it reduces net code (~200 lines removed despite significant new tests) while enabling genuinely new functionality.

Code Quality

Strengths:

CteState is a clean, focused abstraction that eliminates dual tracking between MultiStageQueryPlanner and MultipliedMeasuresQueryPlanner
The removal of ResolveMultipliedMeasures + the FullKeyAggregateContent::ResolvedMultipliedMeasures variant significantly simplifies all three aggregate strategy processors (full join, inner join, keys)
The pre-aggregation optimizer flows more linearly — try simple match first, then fall back to multi-stage rewriting

Issues:

Code duplication between try_rewrite_simple_query and try_rewrite_schema_and_filter — optimizer.rs:115-178

These two methods share nearly identical logic: iterate pre-aggregations, call is_schema_and_filters_match, call make_pre_aggregation_source, build a Query. The only differences are: (a) try_rewrite_simple_query copies existing query modifiers while try_rewrite_schema_and_filter creates empty modifiers, and (b) try_rewrite_schema_and_filter loops over all pre-aggregations internally while try_rewrite_simple_query is called per pre-aggregation.

A shared helper that takes schema, filter, pre-aggregation, and modifiers would reduce this duplication to a few lines.

Fix this →
TimeShiftState::default() hardcoded for multiplied subquery matching — optimizer.rs:149-153 and optimizer.rs:224-226

When matching AggregateMultipliedSubquery against pre-aggregations, try_rewrite_schema_and_filter always uses TimeShiftState::default() for date range extraction. This is likely correct since multiplied measure subqueries don't have time shifts at this level (time shifts live on the enclosing multi-stage member). However, the assumption is implicit. Consider adding a brief comment at the call site in try_rewrite_query_with_multistages (line 224) noting that this is intentional, or threading the time shift state through for correctness.
root_filter used for all multiplied subquery matching — optimizer.rs:194

The comment at line 192 explains that multiplied-measure CTEs logically apply the same filter as the root query. This is a key semantic invariant. If a future change adds filter pushdown into multiplied measure subqueries, this would silently become wrong. Consider adding a debug_assert! or documenting this invariant at the AggregateMultipliedSubquery struct level (aggregate_multiplied_subquery.rs).
Mixed external/non-external rejection could fail silently — optimizer.rs:268-277

The check for mixed external/non-external pre-aggregation usages at line 270-276 rolls back all usages if there's a mix. This is correct, but only covers usages added during the multi-stage rewrite (from saved_usages_len). If a pre-aggregation from a different optimization pass were somehow present, it wouldn't be checked. This is fine today but worth noting as a fragile boundary.
Pre-existing typos (not introduced by this PR):
- only_addictive should be only_additive (measure_matcher.rs:8,14,41)
- modifers should be modifiers (optimizer.rs:129,166)

Security

No security concerns. All changes are internal query planning logic. No user-facing input parsing, no SQL injection vectors (queries built via typed builders), and no auth/authz changes.

Performance

No performance regression expected. The refactoring doesn't add extra passes or copies. The pre_aggregation_override field adds one Option<Rc<Query>> per AggregateMultipliedSubquery, which is negligible.
Pre-aggregation matching improvement. The new approach tries simple rewrite first, then falls back to multi-stage rewriting (optimizer.rs:96-112). This is more efficient than the old code which checked !plan.multistage_members().is_empty() upfront and went straight to the multi-stage path, potentially missing a simpler match.
CTE naming is globally sequential now. With CteState shared across multi-stage and multiplied planners, CTE names are cte_0, cte_1, ... in order of planning, not grouped by type. This is fine functionally but means CTE numbering is sensitive to planning order. Snapshot tests already account for this.

Testing

Strengths:

12+ new multi-fact pre-aggregation tests (multi_fact.rs) covering: separate pre-aggs, totals, single rollup match, mixed regular + multiplied, partial match rollback, filtered variants — all with both SQL generation and Postgres execution
3 new multi-stage pre-aggregation tests (multi_stage.rs) covering time shift with leaf measures, time shift with multi-stage measures, and multiplied leaf time shift
8 new bucketing integration tests (bucketing.rs) covering simple, multi-stage measures, complex dimensions, cross-cube references, joins, multi-dimension, and concatenated dimensions
Tests exercise both build_sql_with_used_pre_aggregations (assertions on pre-agg matching) and actual execution against Postgres (try_execute_pg)
Good seed data with diverse patterns enabling meaningful assertions
The test_multi_fact_partial_match_rolls_back test (line 454) specifically validates the rollback behavior when one of multiple subqueries cannot match — this was a gap identified in the prior review and is now addressed

One observation:

The cumulative measure rejection guard in measure_matcher.rs:34-38 is not directly unit-tested (there's no test that creates a cumulative measure and verifies it returns false). The existing MeasureMatcher unit tests only cover non-cumulative measures. This is a minor gap — the behavior is covered indirectly by the integration tests.

Verdict

This is a well-executed architectural refactoring. It simplifies the codebase materially while enabling more granular pre-aggregation matching for multi-fact queries. The test coverage is thorough and addresses gaps from the prior review (rollback behavior, filtered variants). The main actionable feedback is around code duplication between the two rewrite methods and documenting the root_filter invariant. The rest is solid work.

waralexrom requested a review from a team as a code owner April 20, 2026 12:38

github-actions Bot added rust Pull requests that update Rust code javascript Pull requests that update Javascript code labels Apr 20, 2026

vercel Bot deployed to Preview April 20, 2026 12:39 View deployment

waralexrom force-pushed the tesseract-multi-stage-pre-aggregations-full-support branch from a884e18 to 64242fe Compare April 24, 2026 10:53

waralexrom requested review from a team as code owners April 24, 2026 10:53

Base automatically changed from tesseract-multi-stage-pre-aggregations-full-support to master April 27, 2026 09:02

waralexrom added 18 commits April 27, 2026 12:23

feat(tesseract): Support separate pre-aggregations for different mult…

8e2aaf5

…i-fact subqueries

in work

e05b9ed

bucketing tests

38d2eab

in work

b3c3f98

in work

ecc6fd0

multiplied measures as subqueries

7d39d2b

in work

41022c2

in work

10ac780

in work

123d9e8

in work

82ba94f

in work

f45afff

in work

28ec2fc

in work

af074f5

fmt

e839a64

in work

a814d98

in work

5db53eb

in work

2d9c529

in work

48526d0

waralexrom force-pushed the tesseract-multi-fact-separate-pre-aggregations branch from 8a7bcd9 to 48526d0 Compare April 27, 2026 10:30

vercel Bot deployed to Preview April 27, 2026 10:33 View deployment

waralexrom closed this Apr 28, 2026

waralexrom reopened this Apr 28, 2026

ovr approved these changes Apr 28, 2026

View reviewed changes

waralexrom merged commit 187ee17 into master Apr 28, 2026
231 of 234 checks passed

waralexrom deleted the tesseract-multi-fact-separate-pre-aggregations branch April 28, 2026 13:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(tesseract): Support separate pre-aggregations for different multi-fact subqueries#10718

feat(tesseract): Support separate pre-aggregations for different multi-fact subqueries#10718
waralexrom merged 18 commits intomasterfrom
tesseract-multi-fact-separate-pre-aggregations

waralexrom commented Apr 20, 2026

Uh oh!

claude Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

waralexrom commented Apr 20, 2026

Uh oh!

claude Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Separate Pre-aggregations for Multi-fact Subqueries

Summary

Code Quality

Security

Performance

Testing

Minor Notes

Verdict

Uh oh!

codecov Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Separate Pre-aggregations for Multi-fact Subqueries

Summary

Code Quality

Security

Performance

Testing

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Apr 20, 2026 •

edited

Loading

codecov Bot commented Apr 20, 2026 •

edited

Loading

claude Bot commented Apr 28, 2026 •

edited

Loading