fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762
Open
tlangton3 wants to merge 5 commits intocube-js:masterfrom
Open
fix(tesseract): tautological FK-aggregate join for filtered PK counts#10762tlangton3 wants to merge 5 commits intocube-js:masterfrom
tlangton3 wants to merge 5 commits intocube-js:masterfrom
Conversation
When a filtered count measure of shape `type: count` + `sql: "{primary_key}"` + `filters: [...]` is reached through a 1->N edge with `CUBESQL_SQL_PUSH_DOWN=true`, the FK-aggregate rejoin onto the source cube emitted `ON keys.pk = keys.pk` — both sides referencing the inner `keys` subquery. The join cross-produced and the count over-included any row whose filters were satisfied anywhere in the source table.
Root cause: the `Cube` arm of `AggregateMultipliedSubquerySource` built the right-hand side of the join condition as `Expr::Member(MemberExpression::new(dim))`, deferring resolution to whatever rendering context was active at SQL emission. By emission time, the outer factory's `render_references` had been populated by the schema-resolution pass, mapping the PK dim to `(keys, <alias>)` (since the `keys` subquery — the join root — projects it). That mapping then intercepted both sides of the `ON` clause via `RenderReferencesSqlNode`.
Fix: clone `context_factory`, register `pk_cube.name() -> pk_cube_alias` in `cube_name_references`, and bind a `VisitorContext` snapshot (pre-pollution) to the join's right-hand side via `Expr::new_member_with_context`. The dimension renders through `AutoPrefixSqlNode` against `pk_cube_alias`, preserving full SQL semantics for custom-SQL or composite primary keys. The `MeasureSubquery` arm in the same file already used an explicit reference to `pk_cube_alias` and was unaffected.
Adds a regression test exercising the bug shape (`customers.active_count` filtered count on PK, queried by `orders.status` to force the multiplied path) with both a structural tautology assertion over every `ON` predicate and an `insta` snapshot of the generated SQL.
The SQL string captured by `build_sql(..)` differs between local and CI environments: when `--features integration-postgres` is on and Docker is available, `try_execute_pg` runs a second template-rendering pass that emits CTE-wrapped SQL; the single-pass `build_sql` output is the inline-subquery form. Same code, different rendering pipelines reachable. The cubesqlplanner suite convention is to snapshot postgres execution results (226 such files in the repo), not SQL strings (1 file — this one). Drop the SQL snapshot to follow convention. The structural tautology scanner above stays — it's the bug-specific, environment-independent guard. The existing `assert_snapshot!(result)` on row data continues to verify end-to-end correctness when CI's postgres container is up.
Adds the canonical row-data snapshot the regression test needs when `integration-postgres` is enabled. Validates that the filtered count on a primary key produces the correct distinct-customer counts (1 for 'completed', 1 for 'pending', 0 for orders.status NULL) — pre-fix the buggy cross-join over-counted these.
The `active_count` measure was defined with `sql: "{CUBE}.id"` and
filter `sql: "{CUBE}.name LIKE 'A%'"`. Both forms parse to
`SqlDependency::CubeRef` only — no `SqlDependency::Symbol` on the `id`
or `name` dimensions. The FK-aggregate bug only manifests when the
measure transitively pulls a primary-key dimension into the outer
factory's `render_references` (via a Symbol dep), which is what makes
the join condition's RHS misresolve to the inner `keys` subquery.
Switch to `sql: "{id}"` and filter `sql: "{name} LIKE 'A%'"` — these
are member references that produce Symbol deps on the dimensions, so
`references_builder.resolve_references_for_member` recursively
populates render_references for the PK. This matches the original
production bug shape (`sql: "{primary_key}"` with filters) and
genuinely exercises the buggy code path. With the fix in place the
join condition correctly resolves to `keys.<pk> = pk_cube_alias.<col>`;
without the fix it would collapse to `keys.<pk> = keys.<pk>`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Filtered count measures of shape
type: count+sql: "{primary_key}"+filters: [...], when reached through a 1→N edge withCUBESQL_SQL_PUSH_DOWN=true, generated SQL with a tautological FK-aggregate rejoin:LEFT JOIN <source_cube> AS <pk_alias> ON keys.pk = keys.pk. Both sides of the equality referenced the innerkeyssubquery, so the join cross-produced and the count over-included any row whose filter predicates were satisfied anywhere in the source table — silently incorrect and orders of magnitude slower.The legacy JS path (
BaseQuery.js::aggregateSubQuery) generates the correctkeys.pk = source.pkjoin. The bug is exclusive to the Tesseract Rust planner.Changes
rust/cubesqlplanner/cubesqlplanner/src/physical_plan_builder/processors/aggregate_multiplied_subquery.rs— bind a dedicatedVisitorContextto the right-hand side of the FK-aggregateONclause in theCubesource arm.rust/cubesqlplanner/cubesqlplanner/src/test_fixtures/schemas/yaml_files/common/integration_multi_fact.yaml— addcustomers.active_count(filtered count on PK) to the existing 1→N schema.rust/cubesqlplanner/cubesqlplanner/src/tests/integration/multi_fact.rs— newtest_multiplied_aggregate_filtered_count_on_pkregression test with a structural tautology assertion over everyONpredicate, plus the suite-standardassert_snapshot!(result)on postgres execution row data when theintegration-postgresfeature is on.Implementation Details
The two arms of
AggregateMultipliedSubquerySource(MeasureSubqueryandCube) both build a rejoin from akeyssubselect to a source. TheMeasureSubqueryarm already usedExpr::Referencewith an explicitpk_cube_aliason the right — unaffected. TheCubearm usedExpr::Member(MemberExpression::new(dim)), deferring resolution to whatever rendering context happened to be active at SQL emission.That deferred resolution flows through
RenderReferencesSqlNode(the outermost wrapper inSqlNodesFactory::default_node_processor). By the time the SELECT was emitted, the outer factory'srender_referenceshad been populated by the schema-resolution pass — and because thekeyssubquery (the join's root) projects the PK dimension,ReferencesBuilder::find_reference_column_for_member_in_joinresolved it to(keys, <alias>). That mapping then intercepted every subsequent rendering of the PK dim, including both sides of theONclause.The fix: clone
context_factory, registerpk_cube.name() → pk_cube_aliasincube_name_references, and bind aVisitorContextsnapshot to the join's right-hand side viaExpr::new_member_with_context.VisitorContext::newcallsdefault_node_processor()at construction (cloning render_references), so the bound context is genuinely a pre-pollution snapshot. The dimension then renders throughAutoPrefixSqlNode, which qualifies its SQL withpk_cube_alias— preserving full SQL semantics for custom-SQL or composite PKs, not just simplesql: idcases.The
MeasureSubqueryarm was the in-file template; the working JS reference isBaseQuery.js::primaryKeyJoinConditions(~lines 2478-2485).Testing
cargo test --libinrust/cubesqlplanner/: 823 tests pass locally, 0 failures, 14 ignored (Docker-dependent).LEFT JOIN customers AS "..." ON (("keys"."customers__id" = "keys"."customers__id"))— tautology, both sides referencekeys.LEFT JOIN customers AS "customers_key_customers" ON (("keys"."customers__id" = "customers_key_customers".id))— RHS correctly resolves to the source-cube alias.ONpredicate in the generated SQL and fails on anylhs == rhsequality (environment-independent, robust to planner formatting); (b)assert_snapshot!(result)on postgres execution row data whenintegration-postgresis enabled validates end-to-end correctness against a real database. SQL strings themselves are not snapshotted — that follows the existing convention in the suite (226 result-data snapshots vs 0 SQL-string snapshots), since SQL formatting is unstable across template-rendering passes while row data is stable.Linear Ticket
N/A — upstream OSS contribution.