DEV-1449: Fix nested-DAG multi-hop dotted dim from downstream stages#137
Conversation
…eam stages When stage 1 projects a multi-hop dotted dim (e.g. `customers.regions.name`) and a downstream stage references the same dotted path, SLayer used to emit broken SQL referencing a hybrid `__` + `.` table alias that doesn't exist in scope (e.g. `customers__regions.name` against an `s1` subquery). Fix: virtual stage models from `_query_as_model` now carry a `source_model_origin` breadcrumb (Pydantic field with `exclude=True`, so it's in-memory only). A new shared `resolve_via_stage_origin` helper in `enrichment.py` consults that breadcrumb when the join-walk fails and looks up the `__`-flattened column name on the virtual model. Two lookup candidates — ancestor-stripped and full-flat — handle both depth-1 and chained nested-DAG cases. Four cross-stage code paths gain the fallback: dimensions, time dimensions, cross-model measures (intercept before the strict join walk; emits a local re-aggregated measure), and filters (rewrites the filter SQL via direct re.sub against `resolved_sql`). Parameterized cross-model aggs fall through to the existing CTE path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an in-memory SourceModelOrigin breadcrumb on virtual SlayerModel instances, a resolve_via_stage_origin helper, interception logic to re-aggregate distributive dotted cross-stage measures locally, fallbacks for dimension/time/filter resolution to use flattened stage columns, and a comprehensive test suite validating SQL shape and execution. ChangesVirtual Stage Cross-Reference Resolution
Sequence Diagram(s)sequenceDiagram
participant OuterEngine as Outer Query Engine
participant Enr as Enrichment Resolver
participant QueryAsModel as _query_as_model
participant OriginResolver as resolve_via_stage_origin
participant SQLGen as SQL Generator
OuterEngine->>Enr: Resolve dotted ref (e.g., customers.regions.name)
Enr->>Enr: Attempt join-walk resolution
Enr->>QueryAsModel: Check model metadata (is virtual?)
QueryAsModel-->>Enr: model.source_model_origin present
Enr->>OriginResolver: resolve_via_stage_origin(model, parts)
OriginResolver->>OriginResolver: Try ancestor-stripped and full flat candidates
OriginResolver-->>Enr: Return matching Column (e.g., customers__regions__name)
Enr->>SQLGen: Emit <virtual_stage_alias>.<flat_col> in SQL
SQLGen-->>OuterEngine: SELECT s1.customers__regions__name
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_nested_dag_cross_stage_refs.py (2)
1135-1135: ⚡ Quick winMove the local
SQLiteStorageimport to the module import block.Keeping imports at top-level avoids hidden import points and aligns with the project import rule.
♻️ Proposed fix
@@ from slayer.engine.enrichment import resolve_via_stage_origin from slayer.engine.query_engine import SlayerQueryEngine +from slayer.storage.sqlite_storage import SQLiteStorage from slayer.storage.yaml_storage import YAMLStorage @@ - from slayer.storage.sqlite_storage import SQLiteStorage db_path = tmp_path / "slayer.db" storage = SQLiteStorage(db_path=str(db_path))As per coding guidelines "Place imports at the top of Python files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_nested_dag_cross_stage_refs.py` at line 1135, Move the local import "from slayer.storage.sqlite_storage import SQLiteStorage" out of the test body and place it with the module-level imports at the top of tests/test_nested_dag_cross_stage_refs.py; locate any test function or block referencing SQLiteStorage and remove the inline import, then add the same import to the file's import block so SQLiteStorage is imported once at module import time in accordance with the project's import rules.
843-843: ⚡ Quick winUse keyword arguments for
engine.executecalls.These calls pass more than one argument but use a positional first parameter. Switch to
query=...for consistency with the repo rule.♻️ Proposed fix
- resp = await engine.execute(single, dry_run=True) + resp = await engine.execute(query=single, dry_run=True) ... - resp = await engine.execute(outer, dry_run=True) + resp = await engine.execute(query=outer, dry_run=True)As per coding guidelines "Use keyword arguments for functions with more than 1 parameter in Python".
Also applies to: 1025-1025
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_nested_dag_cross_stage_refs.py` at line 843, The call to engine.execute currently passes the query as a positional argument (resp = await engine.execute(single, dry_run=True)); update it to use keyword arguments (resp = await engine.execute(query=single, dry_run=True)) so it follows the repo rule; make the same change for any other occurrences of engine.execute that pass the query positionally (e.g., where the first arg is a variable like single) to use query=... and preserve the dry_run=True keyword.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 602-642: The fast-path in _try_intercept_cross_model_as_local must
be restricted to only semantics-preserving, re-aggregatable operations: check
ref.aggregation_name against an explicit safe set (e.g.
"sum","min","max","count") before proceeding and return None for all others
(avg, count_distinct, etc.); keep the existing logic using
model.get_column(flat_with_agg), _ensure_aggregated_measure, and
known_aliases[field_name], and do not attempt to re-aggregate parameterized refs
(ref.agg_args/agg_kwargs) — if you need to special-case the "*" count-to-sum
semantic, handle that explicitly by mapping the aggregation and flat_with_agg
appropriately, otherwise fall back to the CTE path.
- Around line 951-971: The interception branch that handles dot-qualified
spec.measure_name and returns early must perform the same rename/alias
bookkeeping as the normal local-aggregate path before continuing: after
finding/creating the local_alias (via _try_intercept_cross_model_as_local) and
updating the EnrichedMeasure metadata (measures / em.label / em.type), also
apply the canonical-to-user remap and projection/alias handling that the
standard local-aggregate branch does for qfield (respecting qfield.name
override, qfield.label, qfield.type), update any rename maps used by filters and
ORDER BY, and then call _mark_user_declared(local_alias) and append to
user_projection before continue; in short, reuse the same rename/alias
bookkeeping steps the normal branch uses so qfield.name properly remaps the
internal alias to the user-facing name.
---
Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1135: Move the local import "from slayer.storage.sqlite_storage import
SQLiteStorage" out of the test body and place it with the module-level imports
at the top of tests/test_nested_dag_cross_stage_refs.py; locate any test
function or block referencing SQLiteStorage and remove the inline import, then
add the same import to the file's import block so SQLiteStorage is imported once
at module import time in accordance with the project's import rules.
- Line 843: The call to engine.execute currently passes the query as a
positional argument (resp = await engine.execute(single, dry_run=True)); update
it to use keyword arguments (resp = await engine.execute(query=single,
dry_run=True)) so it follows the repo rule; make the same change for any other
occurrences of engine.execute that pass the query positionally (e.g., where the
first arg is a variable like single) to use query=... and preserve the
dry_run=True keyword.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 77ef8bcd-edcb-4bc5-906f-b7db5105506a
📒 Files selected for processing (4)
slayer/core/models.pyslayer/engine/enrichment.pyslayer/engine/query_engine.pytests/test_nested_dag_cross_stage_refs.py
Three fixes to `_try_intercept_cross_model_as_local` and its line-896 caller in `slayer/engine/enrichment.py`: 1. Semantics gate (CodeRabbit). Gate the intercept to aggregations that re-aggregate to an equivalent result. `sum`/`min`/`max` pass through unchanged; `count` re-maps to `sum` over the inner per-group count (using `count` on stage rows would just count groups, not rows); everything else (`avg`, `count_distinct`, median, ...) returns None and falls through to the existing cross-model CTE path. 2. Candidate A — ancestor-strip (Codex). Mirror `resolve_via_stage_origin`'s Candidate A/B order inside the intercept. The source-prefixed form (`orders.customers.revenue:sum` against an `s1` wrapping `orders`) lands on `customers__revenue_sum` because `_alias_to_short` stripped the immediate ancestor on the inner stage. Build the ancestor chain from `source_model_origin`, try the ancestor-stripped flat first, then full flat. 3. Rename bookkeeping at line-896 (CodeRabbit). The intercept's early `continue` skipped the standard local-aggregate rename machinery: updating the EnrichedMeasure's name/alias, writing `known_aliases[qfield.name]`, `canonical_to_user_name[canonical_name]`, `measure_canonical_key_to_alias` provenance, and surfacing the user alias in `user_projection`. Mirror it. Tests: 6 new cases — avg / count_distinct fall through to CTE path; sum is intercepted (regression guard); source-prefixed CMM resolves via Candidate A; renamed intercept surfaces as user alias; renamed intercept filter via colon-form remaps to the rename. Updated *:count test to assert outer SUM (not COUNT) per the new semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 1414-1428: The test currently only asserts the internal canonical
alias isn't present but doesn't ensure a filter exists or that the rewritten
name is used; update the block that inspects outer_select (using where, having,
clause, clause_cols from clause.find_all(exp.Column)) to first assert that at
least one of where or having is not None (i.e., a filter exists), then assert
the forbidden internal name "customers__revenue_sum_sum" is absent, and finally
assert that the rewritten alias (e.g., column name "rev") appears in clause_cols
to confirm the rename was applied.
- Around line 1209-1220: The test currently only looks for exp.Column inside
each exp.Count (using count_call.find_all(exp.Column)) and therefore misses
COUNT(*) cases; update the COUNT guard so that for each count_call (in the
outer_select loop) you also check for exp.Star (e.g.,
count_call.find_all(exp.Star) or count_call.this is exp.Star) and treat those as
stage-row counts to reject, or otherwise inspect count_call.args to detect a
star/no-column COUNT; ensure the assert checks fail when a COUNT(*) would target
the stage (same scope as checking ("s1", "customers___count") in
outer_count_targets) by adding the star-detection to the logic that populates
outer_count_targets or by asserting directly on count_call when a star is found.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 91b7543f-b32a-45ef-a592-c86774b2cae6
📒 Files selected for processing (2)
slayer/engine/enrichment.pytests/test_nested_dag_cross_stage_refs.py
- Codex: add duplicate-canonical-qfield guard to the cross-stage intercept branch in `enrichment.py`. Without it, two qfields with the same cross-stage canonical aggregation but different `name`s would silently corrupt `user_projection`: the second call's rename mutates the first call's EnrichedMeasure alias and the projection still contains the pre-rename alias with no backing measure. Mirrors the guard the standard local-agg branch runs. New test: two intercepted qfields with different `name`s raise. - CodeRabbit: tighten the `*:count` cross-stage assertion so it also rejects `COUNT(*)` (and any other outer COUNT shape), not only `COUNT(<column>)`. The intercept now re-maps `count` to `sum`, so any outer COUNT is a regression — assert none exists at all. - CodeRabbit: tighten the renamed-CMM filter assertion to require at least one of WHERE/HAVING (so a silently-dropped filter can't pass the test). Dropped the "must reference `rev` literally" sub-check — standard SQL HAVING references the aggregation expression (`SUM(s1.customers__revenue_sum)`), not the projection alias, so pinning a literal `rev` ref would conflict with correct emission. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_nested_dag_cross_stage_refs.py (1)
1472-1472: 💤 Low valueConsider accepting both SlayerError and ValueError for consistency.
For consistency with similar validation tests in this file (lines 919, 1252, 1272), consider accepting both exception types.
Minor consistency improvement
- with pytest.raises(ValueError, match="canonicalises to the same"): + with pytest.raises((SlayerError, ValueError), match="canonicalises to the same"): await engine.execute(query=[inner, outer], dry_run=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_nested_dag_cross_stage_refs.py` at line 1472, Update the assertion in the failing test that currently uses pytest.raises(ValueError, match="canonicalises to the same") so it accepts either SlayerError or ValueError for consistency with other validation tests; change the raises check to allow both exception types (e.g., use pytest.raises((SlayerError, ValueError), match="canonicalises to the same")) and ensure SlayerError is imported or referenced where the test uses pytest.raises.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Line 1472: Update the assertion in the failing test that currently uses
pytest.raises(ValueError, match="canonicalises to the same") so it accepts
either SlayerError or ValueError for consistency with other validation tests;
change the raises check to allow both exception types (e.g., use
pytest.raises((SlayerError, ValueError), match="canonicalises to the same")) and
ensure SlayerError is imported or referenced where the test uses pytest.raises.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28a865e2-982f-47e8-ae34-c86f1d9d30a0
📒 Files selected for processing (2)
slayer/engine/enrichment.pytests/test_nested_dag_cross_stage_refs.py
- Codex: intercept now surfaces under the cross-model canonical alias (e.g. `s1.customers.revenue_sum`) instead of the doubled-sum internal form (`s1.customers__revenue_sum_sum`). Previously the intercept built the EnrichedMeasure against the flat stage column and `_ensure_aggregated_measure` produced an alias that didn't match the cross-model CTE path's shape — public result keys would change between paths, and colon-form filters / ORDER BY refs canonicalising to `customers.revenue_sum` would fail to find the intercepted measure in `known_aliases`. Always rename the intercepted measure to the cross-model canonical (or to the user-supplied `qfield.name`), and write both `target_name` and the canonical alias into `known_aliases`. - CodeRabbit nitpick: move SQLiteStorage import from inside test_sqlite_storage_roundtrip_drops_origin to the top of the file (project rule: imports at module top). - CodeRabbit nitpick: accept both (SlayerError, ValueError) in the duplicate-qfield test's pytest.raises, for consistency with other validation tests in this file. Updated the row-count test to assert the new surfaced key shape `s1.customers.revenue_sum` (matching what a regular cross-model query would produce). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/test_nested_dag_cross_stage_refs.py (2)
904-924:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
rev:sumis no longer a valid negative case.The new alias-bookkeeping contract records both the canonical and target names, and CI confirms this path no longer raises. Keeping
pytest.raises(...)here locks the suite to the pre-change behavior instead of the behavior this PR is introducing. Update this test to assert the resolved SQL/output shape instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_nested_dag_cross_stage_refs.py` around lines 904 - 924, Update test_renamed_cmm_via_rename_does_not_resolve to reflect the new alias-bookkeeping behavior: remove the pytest.raises((SlayerError, ValueError)) expectation around await engine.execute(...) and instead run engine.execute(query=[inner, outer], dry_run=True) and assert the returned/dry-run result contains the expected resolved SQL or output shape (e.g., that the outer resolves `rev:sum` to the proper cross-model measure column from s1). Locate the test by function name and adjust assertions on the engine.execute/dry_run result accordingly.
866-900:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRenamed-inner canonical alias path is still red in CI.
This test encodes the new canonical-alias contract, but CI shows
engine.execute(..., dry_run=True)still raisesValueError: Model 's1' has no join to 'customers'here. The rename-aware intercept /known_aliasespath this assertion depends on is not wired through yet, so this stack cannot merge cleanly as-is.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_nested_dag_cross_stage_refs.py` around lines 866 - 900, The dry-run path used by engine.execute (called in test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the resolver still errors with "Model 's1' has no join to 'customers'"; fix by threading the rename-aware intercept and known_aliases into the resolution flow that engine.execute(..., dry_run=True) uses (where aliases are resolved and _alias_to_short is called) so that the canonical alias mapping (customers__revenue_sum) is available during dry-run; locate the resolution/join-lookup logic invoked by engine.execute and ensure it consults the same intercept/known_aliases structure used in normal execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/engine/enrichment.py`:
- Around line 1044-1078: When renaming an intercepted measure in
_ensure_aggregated_measure (the block that sets target_name/target_alias and
updates measures, known_aliases, measure_canonical_key_to_alias,
canonical_to_user_name), preserve the existing "rename-vs-canonical collision"
guard: before applying the rename (i.e. before looping measures and updating
aliases), check whether target_name (qfield.name or canonical_name) would
collide with any existing canonical alias owned by a different measure/canonical
key; if so, reject or surface the same error as the non-intercept path (prevent
deduping onto the intercepted alias). Implement this by reusing the same
collision check logic used elsewhere (compare target_name/target_alias against
measure_canonical_key_to_alias and known_aliases) and abort/raise the same
conflict error instead of proceeding with the rename.
---
Outside diff comments:
In `@tests/test_nested_dag_cross_stage_refs.py`:
- Around line 904-924: Update test_renamed_cmm_via_rename_does_not_resolve to
reflect the new alias-bookkeeping behavior: remove the
pytest.raises((SlayerError, ValueError)) expectation around await
engine.execute(...) and instead run engine.execute(query=[inner, outer],
dry_run=True) and assert the returned/dry-run result contains the expected
resolved SQL or output shape (e.g., that the outer resolves `rev:sum` to the
proper cross-model measure column from s1). Locate the test by function name and
adjust assertions on the engine.execute/dry_run result accordingly.
- Around line 866-900: The dry-run path used by engine.execute (called in
test_renamed_cmm_colon_syntax_resolves_via_canonical_flat) does not propagate
the rename-aware intercept / known_aliases used for CROSS-MODEL measures, so the
resolver still errors with "Model 's1' has no join to 'customers'"; fix by
threading the rename-aware intercept and known_aliases into the resolution flow
that engine.execute(..., dry_run=True) uses (where aliases are resolved and
_alias_to_short is called) so that the canonical alias mapping
(customers__revenue_sum) is available during dry-run; locate the
resolution/join-lookup logic invoked by engine.execute and ensure it consults
the same intercept/known_aliases structure used in normal execution.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: edf22757-89f2-4936-8b44-9f88b77a25c2
📒 Files selected for processing (2)
slayer/engine/enrichment.pytests/test_nested_dag_cross_stage_refs.py
- python:S3776 (CRITICAL) x2: cognitive complexity in `_resolve_dimensions` (22→) and `_resolve_time_dimensions` (18→). The two functions had near-identical join-walk + stage-origin fallback logic in their dotted-ref branches. Extracted the shared flow into `_resolve_dotted_dim_with_stage_fallback`. Both call sites now delegate; complexity drops back below the 15 threshold and the fallback behaviour stays identical (cross-model CTE re-rooting still depends on the lenient fall-through when stage-origin misses). - python:S7503 (MINOR) x5: five resolver-direct unit tests in test_nested_dag_cross_stage_refs.py were marked `async` but call only the sync `resolve_via_stage_origin` helper — no awaits in the body. Dropped the `async` keyword on those 5 methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Codex (major): cross-stage intercept dup-guard keyed on raw
`spec.measure_name`, so two refs that resolve to the same flat
column via different Candidate A/B paths (e.g.
`orders.customers.revenue:sum` ancestor-strips to
`customers__revenue_sum`, and `customers.revenue:sum` full-flats
to the same column) slipped past the guard. The second call's
rename then mutated the first call's EnrichedMeasure alias and
`user_projection` was left with an orphaned reference.
Split the intercept helper into `_intercept_candidate_for_cross_model`
(pure, returns the resolved `(flat_with_agg, outer_agg)` or None)
+ `_try_intercept_cross_model_as_local` (applies the candidate).
Key the dup-guard on `("agg-intercept", flat_with_agg, outer_agg)`
so equivalent refs collide.
- CodeRabbit (major): the rename-vs-canonical collision guard from
the standard local-aggregate branch was missing on the intercept
branch. If an intercepted measure renamed to `customer_id_max`
while another query measure canonicalises to `customer_id_max`,
`_ensure_aggregated_measure`'s alias-keyed dedup would silently
collapse the second aggregate onto the first. Added the cross-
other-canonical check, mirroring the standard branch.
Two new tests pin the guards: source-prefixed + unprefixed variants
of the same cross-stage aggregate with different `name`s now raise;
an intercepted rename colliding with another canonical raises.
Round-4 CodeRabbit nitpicks (SQLiteStorage top-level import; accept
both SlayerError+ValueError in duplicate-qfield test) were stale —
already addressed in 24a5ace.
Round-4 CI failure (`lint-and-test (3.11)`) was a flake at
`Cannot install sqlglotc` (files.pythonhosted.org read timeout),
unrelated to PR code; should self-resolve on the next push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Intercepted cross-model measures without an explicit `name` were setting `EnrichedMeasure.name` to the dotted cross-model canonical (`customers.revenue_sum`). The dotted form belongs on `em.alias` (public result key, filter/ORDER BY resolution) but NOT on `em.name` — `_query_as_model` uses `em.name` as the short `Column.name` of the wrapped virtual model, and `Column.name` validation rejects dots. So a 3-stage DAG where the middle stage is an intercepted unrenamed CMM would fail when `_query_as_model` tried to wrap the middle stage. Fix: only mutate `em.name` when the rename target is a simple identifier (user-supplied `qfield.name`). When the target is the cross-model canonical (dotted), keep `em.name` as the internal flat form `_ensure_aggregated_measure` produced. Test: 3-stage DAG with an intercepted unrenamed CMM as the middle stage. Without the fix, the wrap step raised a Column.name validation error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 5's fix avoided the Column.name validation error but kept `em.name` as the doubled-sum internal canonical (`customers__revenue_sum_sum`). When that stage became the inner of a third stage, `_query_as_model` exposed the wrapped column under that doubled-sum name, so the third stage's intercept (which looks for the flat single-sum form `customers__revenue_sum` — what a single-stage cross-model query produces) MISSED and fell through to the cross-model CTE path, which virtual stages don't have. Fix: in the unrenamed case, derive `em.name` from the dotted cross-model canonical by replacing `.` with `__` — `customers.revenue_sum` → `customers__revenue_sum`. This matches `_alias_to_short`'s convention and what a single-stage produces, so the third stage's intercept finds the alias on `s2.columns`. Strengthened the 3-stage test to actually reference `customers.revenue:sum` again in stage 3 and assert the outer SELECT contains `SUM(s2.customers__revenue_sum)` — pinning that the third-stage intercept now resolves correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The filter stage-origin fallback was designed for DIMENSION refs cross-stage (e.g. `customers.regions.name = 'X'`). When the dotted ref's leaf was an aggregation canonical like `customers.revenue_sum` (from parse_filter rewriting `customers.revenue:sum > 100`), the fallback still fired and rewrote the filter to `WHERE s1.customers__revenue_sum > 100` — applying the predicate to the inner row-level values rather than HAVING-classifying it against the outer re-aggregated SUM. Fix: gate the fallback on the leaf NOT looking like an aggregation canonical (no `_<agg>` suffix matching any BUILTIN_AGGREGATIONS). Cross-model measure filters on intercepted CMMs are not auto- resolved (DEV-1445 territory); the standard strict-error fires instead of silently emitting a wrong WHERE. Test pins the gating: a filter `customers.revenue:sum > 100` on an outer query with an unrenamed intercepted CMM now raises (DEV-1445 limitation surface), rather than emitting incorrect SQL. The working rename path (covered by `test_renamed_intercept_filter_via_colon_form`) is unaffected — `canonical_to_user_name` remaps colon-form to the user alias before the filter resolver sees the dotted ref. Codex's parallel ORDER-BY observation (colon-form ORDER BY of intercepted unrenamed CMMs binds the inner stage column instead of the outer aggregate) is the same DEV-1445 territory — not auto- resolved; users must reference via the renamed alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ORDER BY by colon/canonical form on an unrenamed intercepted
cross-model measure (e.g. `order=[{"column":"customers.revenue:sum"}]`
or `customers.revenue_sum`) fell through to a non-existent
`customers.revenue_sum` bare column because
`generator._resolve_order_column`'s alias_lookup is built from
`EnrichedMeasure.name` (which the intercept's auto-rename changed
to the flat `customers__revenue_sum`), not from `known_aliases`.
Fix: register the dotted canonical name in `field_name_aliases` so
`_resolve_order_column`'s qualified-match branch
(line 1903-1906 in generator.py) resolves
`customers.revenue:sum` → projection alias `s1.customers.revenue_sum`.
Test pins the resolution: ORDER BY by colon form on an outer
intercepted CMM now references the projection alias.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dag-multi-hop-dotted-dimension-referenced-from-a
DEV-1448 (merged into main while PR #137 was in review) propagates the user-supplied `name` on a cross-model measure through to the wrapped virtual stage's column. That reverses the assumption three of my tests were built around (pre-DEV-1448: rename doesn't propagate; the canonical flat alias stays on s1.columns). Updated to reflect the new reality: - `test_renamed_cmm_colon_syntax_no_longer_resolves_after_rename` (was `…_resolves_via_canonical_flat`): the rename now replaces `customers__revenue_sum` on s1.columns with `rev`, so the outer's colon-form `customers.revenue:sum` misses the intercept candidate and falls through to the cross-model CTE path which raises on a virtual stage (no joins). DEV-1445 territory: users renamed cross-model measures must reference them by the rename downstream. - `test_renamed_cmm_via_rename_resolves` (was `…_does_not_resolve`): the rename DOES propagate, so `rev:sum` on the outer resolves as a standard local-aggregate over s1.rev → SUM(s1.rev). - `test_intercepted_rename_colliding_with_other_canonical_raises`: the collision now triggers DEV-1448's downstream-short-name guard before my round-4 intercept-canonical-collision guard fires — both are correct outcomes; widened the regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex flagged two fragile heuristics with the same root cause: the cross-stage intercept and the filter fallback were using naming-suffix patterns as proxies for "is this an aggregation canonical?", but the suffix shape is unreliable. 1. Intercept candidate could re-aggregate dim columns. If an inner stage projects a real DIMENSION whose flat name happens to look like `<path>__<measure>_<agg>` (e.g. a user-defined `revenue_sum` dim, or a column auto-flattened to that shape), the intercept's `model.get_column(flat_with_agg)` lookup matched it and silently emitted `SUM(s1.customers__revenue_sum)` even though no aggregate was projected. Fix: add `SourceModelOrigin.agg_column_names: frozenset[str]`, populated by `_query_as_model` from the inner enriched query's measures / cross_model_measures / transforms / expressions (their downstream short names). Gate the intercept on `candidate in agg_names` — only aggregation projections qualify. 2. Filter fallback rejected valid cross-stage dim filters. The suffix-based gate blocked any dotted ref whose leaf ended with a builtin aggregation suffix, so a real dim filter like `customers.revenue_sum = 'X'` (where the dim is literally named that) skipped stage-origin resolution and raised in strict mode even though `s1.customers__revenue_sum` exists. Fix: use parser provenance (`filter_synthesized_aliases` — the set of names `parse_filter` synthesized from colon syntax) to distinguish a colon-syntax-synthesized aggregate alias from a user-typed literal dim ref. Already plumbed into `resolve_filter_columns` for the bare-name check at line 2695; reuse the same signal for the dotted-ref fallback gate. Test: pin that a dim column with an `_<agg>`-shaped name does NOT get intercepted (resolve_via_stage_origin still resolves it as a plain column, but the intercept's gating on `agg_column_names` keeps it from being silently re-summed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`test_intercept_skips_dim_columns_that_look_like_aggregations` calls only the sync `resolve_via_stage_origin` helper — no awaits — so the `async` keyword was unused. Sonar S7503 minor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`agg_column_names` (the intercept's "safe to re-aggregate" gate)
included plain measures / transforms / expressions, but those have
user-supplied or canonical-derived names that can coincidentally
match a cross-model canonical-flat shape (e.g.
`{"formula": "amount:sum", "name": "customers__revenue_sum"}`).
A subsequent outer-stage `customers.revenue:sum` reference would
silently bind to this unrelated column.
Narrow the set to two reliable sources:
* Auto-derived cross-model canonical-flats (`_alias_to_short(cm.alias)`).
* Intercept-produced EnrichedMeasures, marked via a new
`EnrichedMeasure.from_cross_model_intercept` flag set in
`_try_intercept_cross_model_as_local` and at the qfield-site
intercept call.
The intercept-produced flag is what keeps 3-stage DAGs working:
the middle stage's intercept emits a local EnrichedMeasure (not a
CrossModelMeasure), and the third stage's intercept needs to
recognise that column as a safe re-aggregation source.
Test: a renamed local measure on the inner stage whose name
coincidentally matches the cross-model canonical-flat shape (e.g.
`{"formula":"amount:sum","name":"customers__revenue_sum"}`) no
longer triggers the outer-stage intercept — it falls through to
the cross-model CTE path, which raises (virtual stage has no
joins), surfacing the mismatch instead of silently re-summing
the wrong column.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The round-8 fix registered the dotted cross-model canonical in `field_name_aliases` (so ORDER BY can resolve `customers.revenue:sum` via `_resolve_order_column`'s qualified-match branch) only at the qfield-site intercept. The `_flatten_spec` / `_ensure_measure_from_spec` intercept branches (which run when the user references a cross-model measure ONLY in `order=` or in a transform/expression argument, without also declaring it as a `measures=` entry) didn't register the alias and ORDER BY fell through to a non-existent bare column. Move the registration into `_try_intercept_cross_model_as_local` itself so all three call sites get it. Test pins the order-only case (no qfield declaration): ORDER BY must not reference a bare `customers` table after the intercept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
customers.regions.name) and a downstream stage references the same path, SLayer used to emit broken SQL referencing a hybrid__+.table alias that doesn't exist in scope._query_as_modelnow carry asource_model_originbreadcrumb (exclude=True, in-memory only). A sharedresolve_via_stage_originhelper looks up the__-flattened column on the virtual model when the join-walk fails, using two lookup candidates (ancestor-stripped + full-flat) so chained 3+ stage DAGs work.resolved_sql). Parameterized cross-model aggs fall through to the existing CTE path.Repro
Test plan
tests/test_nested_dag_cross_stage_refs.pycover: reported case (2-hop), 3-hop, cross-stage time dims, cross-model measure re-aggregation (incl. row-count check), filter rewrite (SQL shape + row exclusion), order_by (SQL shape + actual sort), chained DAG Candidate A/B precedence, engineered collision, non-virtual regression, ModelExtension, stored query-backed model save/execute breadcrumb, time_shift over cross-stage time dim, serialization roundtrip drops breadcrumb (YAML + SQLite),*:countcross-model intercept, rename interaction (DEV-1443).ruff check slayer/ tests/clean.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests