Skip to content

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762

Open
tlangton3 wants to merge 5 commits intocube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology
Open

fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762
tlangton3 wants to merge 5 commits intocube-js:masterfrom
tlangton3:cube/fix-tesseract-fk-aggregate-pk-tautology

Conversation

@tlangton3
Copy link
Copy Markdown
Contributor

@tlangton3 tlangton3 commented Apr 28, 2026

Summary

Filtered count measures of shape type: count + sql: "{primary_key}" + filters: [...], when reached through a 1→N edge with CUBESQL_SQL_PUSH_DOWN=true, generated SQL with a tautological FK-aggregate rejoin: LEFT JOIN <source_cube> AS <pk_alias> ON keys.pk = keys.pk. Both sides of the equality referenced the inner keys subquery, so the join cross-produced and the count over-included any row whose filter predicates were satisfied anywhere in the source table — silently incorrect and orders of magnitude slower.

The legacy JS path (BaseQuery.js::aggregateSubQuery) generates the correct keys.pk = source.pk join. The bug is exclusive to the Tesseract Rust planner.

Changes

  • rust/cubesqlplanner/cubesqlplanner/src/physical_plan_builder/processors/aggregate_multiplied_subquery.rs — bind a dedicated VisitorContext to the right-hand side of the FK-aggregate ON clause in the Cube source arm.
  • rust/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_fact.yaml — add customers.active_count (filtered count on PK) to the existing 1→N schema.
  • rust/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_fact.rs — new test_multiplied_aggregate_filtered_count_on_pk regression test with a structural tautology assertion over every ON predicate, plus the suite-standard assert_snapshot!(result) on postgres execution row data when the integration-postgres feature is on.

Implementation Details

The two arms of AggregateMultipliedSubquerySource (MeasureSubquery and Cube) both build a rejoin from a keys subselect to a source. The MeasureSubquery arm already used Expr::Reference with an explicit pk_cube_alias on the right — unaffected. The Cube arm used Expr::Member(MemberExpression::new(dim)), deferring resolution to whatever rendering context happened to be active at SQL emission.

That deferred resolution flows through RenderReferencesSqlNode (the outermost wrapper in SqlNodesFactory::default_node_processor). By the time the SELECT was emitted, the outer factory's render_references had been populated by the schema-resolution pass — and because the keys subquery (the join's root) projects the PK dimension, ReferencesBuilder::find_reference_column_for_member_in_join resolved it to (keys, <alias>). That mapping then intercepted every subsequent rendering of the PK dim, including both sides of the ON clause.

The fix: clone context_factory, register pk_cube.name() → pk_cube_alias in cube_name_references, and bind a VisitorContext snapshot to the join's right-hand side via Expr::new_member_with_context. VisitorContext::new calls default_node_processor() at construction (cloning render_references), so the bound context is genuinely a pre-pollution snapshot. The dimension then renders through AutoPrefixSqlNode, which qualifies its SQL with pk_cube_alias — preserving full SQL semantics for custom-SQL or composite PKs, not just simple sql: id cases.

The MeasureSubquery arm was the in-file template; the working JS reference is BaseQuery.js::primaryKeyJoinConditions (~lines 2478-2485).

Testing

  • cargo test --lib in rust/cubesqlplanner/: 823 tests pass locally, 0 failures, 14 ignored (Docker-dependent).
  • Pre-fix manifestation in the regression test: LEFT JOIN customers AS "..." ON (("keys"."customers__id" = "keys"."customers__id")) — tautology, both sides reference keys.
  • Post-fix: LEFT JOIN customers AS "customers_key_customers" ON (("keys"."customers__id" = "customers_key_customers".id)) — RHS correctly resolves to the source-cube alias.
  • Regression coverage: (a) the test's structural tautology scanner walks every ON predicate in the generated SQL and fails on any lhs == rhs equality (environment-independent, robust to planner formatting); (b) assert_snapshot!(result) on postgres execution row data when integration-postgres is enabled validates end-to-end correctness against a real database. SQL strings themselves are not snapshotted — that follows the existing convention in the suite (226 result-data snapshots vs 0 SQL-string snapshots), since SQL formatting is unstable across template-rendering passes while row data is stable.

Linear Ticket

N/A — upstream OSS contribution.

When a filtered count measure of shape `type: count` + `sql: "{primary_key}"` + `filters: [...]` is reached through a 1->N edge with `CUBESQL_SQL_PUSH_DOWN=true`, the FK-aggregate rejoin onto the source cube emitted `ON keys.pk = keys.pk` — both sides referencing the inner `keys` subquery. The join cross-produced and the count over-included any row whose filters were satisfied anywhere in the source table.

Root cause: the `Cube` arm of `AggregateMultipliedSubquerySource` built the right-hand side of the join condition as `Expr::Member(MemberExpression::new(dim))`, deferring resolution to whatever rendering context was active at SQL emission. By emission time, the outer factory's `render_references` had been populated by the schema-resolution pass, mapping the PK dim to `(keys, <alias>)` (since the `keys` subquery — the join root — projects it). That mapping then intercepted both sides of the `ON` clause via `RenderReferencesSqlNode`.

Fix: clone `context_factory`, register `pk_cube.name() -> pk_cube_alias` in `cube_name_references`, and bind a `VisitorContext` snapshot (pre-pollution) to the join's right-hand side via `Expr::new_member_with_context`. The dimension renders through `AutoPrefixSqlNode` against `pk_cube_alias`, preserving full SQL semantics for custom-SQL or composite primary keys. The `MeasureSubquery` arm in the same file already used an explicit reference to `pk_cube_alias` and was unaffected.

Adds a regression test exercising the bug shape (`customers.active_count` filtered count on PK, queried by `orders.status` to force the multiplied path) with both a structural tautology assertion over every `ON` predicate and an `insta` snapshot of the generated SQL.
@github-actions github-actions Bot added rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels Apr 28, 2026
The SQL string captured by `build_sql(..)` differs between local and CI
environments: when `--features integration-postgres` is on and Docker is
available, `try_execute_pg` runs a second template-rendering pass that
emits CTE-wrapped SQL; the single-pass `build_sql` output is the
inline-subquery form. Same code, different rendering pipelines reachable.

The cubesqlplanner suite convention is to snapshot postgres execution
results (226 such files in the repo), not SQL strings (1 file — this one).
Drop the SQL snapshot to follow convention. The structural tautology
scanner above stays — it's the bug-specific, environment-independent
guard. The existing `assert_snapshot!(result)` on row data continues to
verify end-to-end correctness when CI's postgres container is up.
Adds the canonical row-data snapshot the regression test needs when
`integration-postgres` is enabled. Validates that the filtered count
on a primary key produces the correct distinct-customer counts (1 for
'completed', 1 for 'pending', 0 for orders.status NULL) — pre-fix the
buggy cross-join over-counted these.
The `active_count` measure was defined with `sql: "{CUBE}.id"` and
filter `sql: "{CUBE}.name LIKE 'A%'"`. Both forms parse to
`SqlDependency::CubeRef` only — no `SqlDependency::Symbol` on the `id`
or `name` dimensions. The FK-aggregate bug only manifests when the
measure transitively pulls a primary-key dimension into the outer
factory's `render_references` (via a Symbol dep), which is what makes
the join condition's RHS misresolve to the inner `keys` subquery.

Switch to `sql: "{id}"` and filter `sql: "{name} LIKE 'A%'"` — these
are member references that produce Symbol deps on the dimensions, so
`references_builder.resolve_references_for_member` recursively
populates render_references for the PK. This matches the original
production bug shape (`sql: "{primary_key}"` with filters) and
genuinely exercises the buggy code path. With the fix in place the
join condition correctly resolves to `keys.<pk> = pk_cube_alias.<col>`;
without the fix it would collapse to `keys.<pk> = keys.<pk>`.
@tlangton3 tlangton3 marked this pull request as ready for review April 28, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr:community Contribution from Cube.js community members. rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant