v3 SQL: dim-link-aware filter pushdown into inner subquery scopes #2170

Merged
shangyian merged 6 commits into DataJunction:main from shangyian:fix-aliases on May 23, 2026
shangyian (Collaborator) commented May 23, 2026

Summary

This PR overhauls how _resolve_pushdown_filters_for_cte decides where to inject user-supplied dimension filters within a CTE body. It closes several regressions in the original "walk every source-table reference" pass (#2166) and brings v3's pushdown placement closer to v2's behavior.

For context, filter pushdown in v3 had two known regressions compared to v2:

  1. Spark OOMs when the same source table was referenced inside a nested subquery but only the top-level reference got the partition predicate.
  2. "Column doesn't exist" errors when an outer subquery alias collided with a deeper Table alias, causing the filter to be retargeted into the wrong scope.
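To make the first regression concrete, here is a minimal Python sketch over a toy scope model. TableRef, Scope, and unfiltered_refs are hypothetical names for illustration, not DataJunction's AST classes, and filters are modeled as plain strings rather than parsed expressions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableRef:
    source: str  # physical table name
    alias: str   # alias used inside its own scope


@dataclass
class Scope:
    """Toy stand-in for a SELECT scope (hypothetical, not the DJ AST)."""
    tables: list[TableRef] = field(default_factory=list)
    subqueries: list[Scope] = field(default_factory=list)
    filters: list[str] = field(default_factory=list)


def unfiltered_refs(scope: Scope, source: str, column: str) -> list[str]:
    """Return aliases of references to `source` whose own scope has no
    filter on `<alias>.<column>`, walking into nested subqueries."""
    missing = []
    for ref in scope.tables:
        prefix = f"{ref.alias}.{column} "
        if ref.source == source and not any(f.startswith(prefix) for f in scope.filters):
            missing.append(ref.alias)
    for sub in scope.subqueries:
        missing.extend(unfiltered_refs(sub, source, column))
    return missing


# The same physical table appears at the top level and inside a subquery,
# but only the top-level reference carries the partition predicate.
inner = Scope(tables=[TableRef("events", "e2")])
outer = Scope(
    tables=[TableRef("events", "e1")],
    subqueries=[inner],
    filters=["e1.ds = '2026-05-01'"],
)

print(unfiltered_refs(outer, "events", "ds"))  # ['e2']
```

In this toy model the nested reference e2 is the one that would scan every partition, which is the shape of query the description associates with the Spark OOMs.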

Earlier attempts to fix these (the "walk every source-table reference" change) also went too far: they pushed filters into tables that incidentally shared a column name with the filter's dim but weren't actually linked to it, diverging from v2.
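The difference between matching on column name and matching on an actual dimension link can be sketched as follows. The COLUMNS and DIM_LINKS catalogs and both helper functions are assumptions made up for this example; they do not mirror DataJunction's internal data structures.

```python
# Hypothetical catalog: the columns each table exposes.
COLUMNS = {
    "orders": {"id", "customer_id"},
    "shipments": {"id", "carrier"},
}

# Hypothetical dimension links: table -> {local FK column: linked dim attribute}.
DIM_LINKS = {
    "orders": {"customer_id": "dim_customer.id"},
}


def name_match_targets(dim_attr):
    """Over-eager rule: any table with a column of the same bare name."""
    column = dim_attr.split(".")[-1]
    return sorted((table, column) for table, cols in COLUMNS.items() if column in cols)


def link_aware_targets(dim_attr):
    """Stricter rule: only tables whose dimension link points at dim_attr."""
    return sorted(
        (table, fk)
        for table, links in DIM_LINKS.items()
        for fk, target in links.items()
        if target == dim_attr
    )


print(name_match_targets("dim_customer.id"))  # [('orders', 'id'), ('shipments', 'id')]
print(link_aware_targets("dim_customer.id"))  # [('orders', 'customer_id')]
```

Under the name-matching rule, a filter on dim_customer.id would land on shipments.id and on orders.id, neither of which refers to a customer. The link-aware rule targets only orders.customer_id.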

_resolve_pushdown_filters_for_cte now does two complementary passes:

  1. Alias-substitution pass: For each alias used in the primary rewrite at target_select, find every sibling ast.Table ref to the same physical source elsewhere in the CTE body and inject a retargeted copy of the filter. This restores the v2 partition-prune-everywhere behavior for self-references.
  2. Column-aware retargeting pass: For each ast.Select scope inside the CTE body (including secondary set-op arms), build a resolver map from:
     • Dim-link entries: for each node referenced in the scope's FROM, register one entry per foreign_keys_reversed row from its dimension links. This is the only route by which filters get pushed into inner scopes.
     • Dim-self entries: when the scope's table is itself a dimension node, register its PK columns as self-pointing entries.
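As a rough illustration of the second pass, the sketch below builds a per-scope resolver map from dim-link and dim-self entries and uses it to retarget a filter. Link, Node, build_resolver, and retarget are hypothetical, and the assumed shape of foreign_keys_reversed (dimension column mapped to the FK column on the linking node) is a guess for the example, not confirmed by the PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Link:
    dimension: str
    # Assumed shape: dimension column -> FK column on the linking node.
    foreign_keys_reversed: dict[str, str]


@dataclass
class Node:
    name: str
    is_dimension: bool = False
    primary_key: list[str] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)


def build_resolver(scope_tables: dict[str, Node]) -> dict[str, str]:
    """Map dimension attributes to columns reachable in one scope.

    `scope_tables` maps each alias in the scope's FROM to its node.
    """
    resolver: dict[str, str] = {}
    for alias, node in scope_tables.items():
        # Dim-link entries: one per foreign_keys_reversed row.
        for link in node.links:
            for dim_col, fk_col in link.foreign_keys_reversed.items():
                resolver.setdefault(f"{link.dimension}.{dim_col}", f"{alias}.{fk_col}")
        # Dim-self entries: a dimension's PK columns point at themselves.
        if node.is_dimension:
            for pk in node.primary_key:
                resolver.setdefault(f"{node.name}.{pk}", f"{alias}.{pk}")
    return resolver


def retarget(dim_attr: str, predicate: str, resolver: dict[str, str]) -> str | None:
    """Rewrite a filter for this scope, or return None when nothing links to it."""
    target = resolver.get(dim_attr)
    return None if target is None else f"{target} {predicate}"


orders = Node("orders", links=[Link("dim_customer", {"id": "customer_id"})])
shipments = Node("shipments")
dim_customer = Node("dim_customer", is_dimension=True, primary_key=["id"])

print(retarget("dim_customer.id", "= 42", build_resolver({"o": orders})))        # o.customer_id = 42
print(retarget("dim_customer.id", "= 42", build_resolver({"s": shipments})))     # None
print(retarget("dim_customer.id", "= 42", build_resolver({"c": dim_customer})))  # c.id = 42
```

A scope with no matching entry gets no filter, so an unlinked table is skipped even if it happens to have a column with the same name as the dimension attribute.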

Test Plan

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

netlify Bot commented May 23, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

Name | Link
🔨 Latest commit | 6c765a4
🔍 Latest deploy log | https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/6a11da14ed406800070ffd9e

shangyian changed the title from "Fix aliasing" to "dim-link-aware filter pushdown into inner subquery scopes" May 23, 2026
shangyian changed the title from "dim-link-aware filter pushdown into inner subquery scopes" to "v3 SQL: dim-link-aware filter pushdown into inner subquery scopes" May 23, 2026
shangyian marked this pull request as ready for review May 23, 2026 16:50
shangyian merged commit fe489bd into DataJunction:main May 23, 2026
17 of 21 checks passed
shangyian deleted the fix-aliases branch May 23, 2026 16:50