Skip to content

fix(tesseract): Prevent raw cube refs leaking into multi-stage pre-agg queries#10948

Open
MazterQyou wants to merge 1 commit into
masterfrom
tesseract/raw-refs-leak-fix
Open

fix(tesseract): Prevent raw cube refs leaking into multi-stage pre-agg queries#10948
MazterQyou wants to merge 1 commit into
masterfrom
tesseract/raw-refs-leak-fix

Conversation

@MazterQyou
Copy link
Copy Markdown
Member

Check List

  • Tests have been run in packages where changes have been made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made

This PR fixes a bug in Tesseract where multi-stage queries hitting an external pre-aggregation could route raw dwh-table SQL to Cube Store, leading to Internal: Error during planning: Table default.orders was not found error.

…g queries

Signed-off-by: Alex Qyoun-ae <4062971+MazterQyou@users.noreply.github.com>
@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 27, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 27, 2026

Claude finished @MazterQyou's task in 2m 33s —— View job


Code Review: PR #10948 — Prevent raw cube refs leaking into multi-stage pre-agg queries

  • Gather context and read changed files
  • Review code quality and correctness
  • Check for security, performance, and edge cases
  • Assess test coverage
  • Post final review

Summary

This PR fixes a correctness bug where multi-stage queries hitting an external pre-aggregation could route raw DWH-table SQL to CubeStore, causing Table default.orders was not found errors. The fix is well-scoped across two files and targets the root cause cleanly.

Bug Analysis

The root cause was a pack/unpack asymmetry in AggregateMultipliedSubquery. When the pre-aggregation optimizer set pre_aggregation_override, the pack() method still included the dead native branches (keys_subquery, source, dimension_subqueries) in the input list. Plan walkers — specifically collect_cube_names_from_node (used to determine which cubes a query references) and the plan rewriter — would traverse these dead branches, discover raw cube.table identifiers, and inadvertently include them in a plan destined for CubeStore.

A second contributing factor was that try_rewrite_query_with_multistages had a silent fallthrough for non-FullKeyAggregate sources (i.e. LogicalJoin), which would carry raw cube references into the rewritten plan.

What the fix does

1. aggregate_multiplied_subquery.rs — Prune dead branches from pack/unpack

  • pack() now short-circuits: when pre_aggregation_override is Some, only the override is returned as an input. This prevents all plan walkers from seeing the dead branches.
  • unpack() mirrors this: fields become Option, and when override is present, only inputs[0] (the override) is read.
  • with_inputs() correctly handles the bifurcation: in the override case, native branches are cloned from self (unchanged), while the override is reconstructed from the rewritten input.
  • inputs_len() returns 1 vs 2 + dim_count accordingly.

The symmetry between pack(), unpack(), inputs_len(), and with_inputs() is maintained correctly.

2. optimizer.rs — Guard against non-FullKeyAggregate sources

The else branch that previously fell through with query.source().clone() now rolls back usages and returns Ok(None). This is the correct conservative choice — if the source isn't a FullKeyAggregate (which references CTEs by name), rewriting is unsafe.

Findings

No correctness bugs found. The fix is logically sound and well-targeted.

Positive notes:

  • The comments are appropriate — they explain the why behind non-obvious invariants (e.g., why only the override is packed, why LogicalJoin is rejected)
  • The rollback pattern (usages.truncate + usage_counter restore) is consistent with how it's used elsewhere in the same function
  • The Option fields in the unpacker are a cleaner representation than having potentially-stale values

Suggestion (non-blocking):

  • Test coverage: The PR checklist notes that tests for this specific change haven't been added. There are existing multi-stage pre-aggregation tests in tests/integration/pre_aggregations/multi_stage.rs, but none cover the specific scenario of a multiplied-measure CTE with an external pre-aggregation override where the source is a LogicalJoin. A regression test exercising that exact path would lock down this fix. Add regression test →

Overall: Looks good. The fix is correct, well-commented, and narrowly scoped. Merging without a regression test is reasonable given the existing test suite for adjacent scenarios, but adding one would strengthen confidence.

Comment on lines +45 to +61
let (keys_subquery, source, dimension_subqueries) = if pre_aggregation_override.is_some() {
(
self.keys_subquery.clone(),
self.source.clone(),
self.dimension_subqueries.clone(),
)
} else {
(
keys_subquery.unwrap().clone().into_logical_node()?,
self.source.with_plan_node(source.unwrap().clone())?,
dimension_subqueries
.unwrap()
.iter()
.map(|itm| itm.clone().into_logical_node())
.collect::<Result<Vec<_>, _>>()?,
)
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation (non-blocking): The .unwrap() calls on lines 53–54 are safe here because the else branch guarantees pre_aggregation_override is None, which means the unpacker always sets keys_subquery, source, and dimension_subqueries to Some. The logic is sound, but worth noting for future maintainers — the safety depends on the pack/unpack invariant staying in sync.

Comment on lines 273 to 277
} else {
query.source().clone()
self.usages.truncate(saved_usages_len);
self.usage_counter = saved_counter;
return Ok(None);
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The previous code fell through with query.source().clone(), silently allowing a LogicalJoin source to carry raw cube.table identifiers into the rewritten plan. The rollback + early return is the correct conservative choice — better to skip the rewrite than produce invalid SQL for CubeStore.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.98%. Comparing base (f3da859) to head (2fb0e03).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10948      +/-   ##
==========================================
- Coverage   83.57%   78.98%   -4.60%     
==========================================
  Files         254      470     +216     
  Lines       76126    93132   +17006     
  Branches        0     3454    +3454     
==========================================
+ Hits        63622    73559    +9937     
- Misses      12504    19070    +6566     
- Partials        0      503     +503     
Flag Coverage Δ
cube-backend 58.45% <ø> (?)
cubesql 83.56% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant