Skip to content

DEV-1339: regression tests for multi-hop derived column qualification#93

Merged
ZmeiGorynych merged 4 commits into
mainfrom
egor/dev-1339-multi-hop-alias-path-doesnt-qualify-joined-model-derived
May 5, 2026
Merged

DEV-1339: regression tests for multi-hop derived column qualification#93
ZmeiGorynych merged 4 commits into
mainfrom
egor/dev-1339-multi-hop-alias-path-doesnt-qualify-joined-model-derived

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 5, 2026

Summary

DEV-1339 reported that derived columns reached via a multi-hop join path were emitted with unqualified inner refs. Investigation showed the bug is already fixed by PR #89 (DEV-1333) — the issue's exact reproduction (adapted to the real target_model/join_pairs API; the original used hypothetical alias/on fields that don't exist on ModelJoin) now produces correctly qualified SQL across every entry point.

This PR lands six regression tests in tests/test_cross_model_derived_columns.py pinning the fixed behavior so the contract can't silently regress:

  • multi-hop query dim ref → derived target column
  • multi-hop cross-model measure aggregation → derived target column
  • multi-hop filter → derived target column
  • multi-hop time-dim → derived target column
  • verbatim DEV-1339 solar_panels repro
  • diamond-path cross-model measure → derived target (asserts the non-queried diamond arm doesn't leak)

No production code changes — the existing expand_derived_refs machinery in slayer/engine/column_expansion.py already handles every scenario the issue describes.

Test plan

  • poetry run pytest tests/test_cross_model_derived_columns.py — 24/24 pass
  • poetry run pytest -m "not integration" — 1591/1591 pass
  • poetry run ruff check slayer/ tests/ — clean

Closes DEV-1339.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added extensive regression tests for multi-hop join queries with derived columns, ensuring correct qualification and aliasing across dimension selection, measure aggregation, filter predicates, and time-dimension enrichment.
    • Included repro and diamond-join scenarios to prevent cross-path identifier leakage and validate multi-hop measure SQL generation.

ZmeiGorynych and others added 2 commits May 5, 2026 10:51
DEV-1339 reported that querying a derived ``Column.sql`` reached via a
multi-hop join path leaves the inner bare base-column refs unqualified
in emitted SQL. Investigation showed the bug is already fixed by PR #89
(DEV-1333) — running the issue's reproduction now produces correctly
qualified SQL across all entry points: dim refs, cross-model measure
aggregations, filters, time dimensions, and diamond join paths.

Land six regression tests pinning the fixed behavior so the qualification
contract can't silently regress: multi-hop dim, multi-hop cross-model
measure, multi-hop filter, multi-hop time-dim, the verbatim DEV-1339
solar_panels repro (adapted from the issue's hypothetical alias/on
fields to the real target_model/join_pairs API), and a diamond-path
cross-model measure that asserts the non-queried arm of the diamond
never leaks into the measure CTE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ev-1339-multi-hop-alias-path-doesnt-qualify-joined-model-derived
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@ZmeiGorynych has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 32 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc554bfa-232a-41af-99f1-a63ae7b39757

📥 Commits

Reviewing files that changed from the base of the PR and between 6aa9ddd and d518963.

📒 Files selected for processing (1)
  • tests/test_cross_model_derived_columns.py
📝 Walkthrough

Walkthrough

Adds regression tests and a test fixture validating recursive inlining and canonical alias qualification of cross-model derived Column.sql across multi-hop joins (A→B→C), including dimension selection, measures, filters, time-dim enrichment, a solar-panels repro, and a diamond-join case.

Changes

Multi-Hop Derived Column Test Suite

Layer / File(s) Summary
Test Fixture / Data Shape
tests/test_cross_model_derived_columns.py
Adds _DEFAULT_C_COLUMNS and _save_a_b_c(...) to persist a three-model chain A→B→C where C defines derived columns (e.g., x_derived = raw_c * 2).
Query-side Dimension / Filter
tests/test_cross_model_derived_columns.py
Adds test_multihop_query_dim_to_derived_target_column and test_multihop_filter_on_derived_target_column asserting query-time expansion of B.C.x_derived is inlined and qualified to the canonical multi-hop alias (e.g., B__C) with no bare identifier leakage.
Measure Expansion / Parsing
tests/test_cross_model_derived_columns.py
Adds test_multihop_cross_model_measure_over_derived_target which generates measure SQL (e.g., B.C.x_derived:sum), verifies sqlglot parses it, and asserts qualified inlining inside aggregation (SUM(...) contains qualified raw_c * 2).
Time-dimension Enrichment
tests/test_cross_model_derived_columns.py
Adds test_multihop_time_dim_to_derived_target_column asserting time-dimension SQL that targets B.C.shifted_ts is expanded with canonical multi-hop qualifiers.
Regression & Complex Join Topologies
tests/test_cross_model_derived_columns.py
Adds test_dev_1339_solar_panels_repro (solar_panels.panel_efficiency derived from bare inner bases) and test_diamond_path_cross_model_measure_over_derived_target (diamond join) to ensure inner base references and diamond-arm aliases are correctly qualified during cross-model derived expansion.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • MotleyAI/slayer#9: Implements multi-hop join aliasing and derived column resolution logic that these tests directly validate.
  • MotleyAI/slayer#89: Introduces recursive derived reference expansion and enrichment used by the canonical alias qualification behavior under test.

Poem

🐰
Hopping through joins from A to C,
I inline the maths for all to see,
No bare names wander, aliases hold,
B__C keeps the SQL tidy and bold,
Rabbit-approved tests—safe and free.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: adding regression tests for multi-hop derived column qualification, with the ticket reference DEV-1339 providing context.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1339-multi-hop-alias-path-doesnt-qualify-joined-model-derived

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
tests/test_cross_model_derived_columns.py (3)

650-905: ⚡ Quick win

Add one parse-level assertion for a new multi-hop scenario.

These regressions mostly validate aliasing with substring checks. A malformed query shape could still satisfy those assertions, so extending the existing sqlglot.parse(...) sanity check to one representative multi-hop measure/time-dimension path would tighten the safety net for DEV-1339.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cross_model_derived_columns.py` around lines 650 - 905, Add a sql
parsing sanity check to one representative multi-hop test to catch malformed SQL
that might pass substring checks: e.g. in
test_multihop_cross_model_measure_over_derived_target (after generating sql via
_gen_sql and normalizing to norm) call sqlglot.parse(sql) or
sqlglot.parse_one(norm) to assert the SQL is syntactically valid; alternatively
do the same in test_multihop_time_dim_to_derived_target_column after sql/norm is
produced. Ensure you import sqlglot at top of the test file if missing and place
the parse assertion before the other substring/assert checks so parse failures
fail early.

659-659: ⚡ Quick win

Use keyword arguments for the new _gen_sql calls.

These new cases pass engine, query, and model positionally, which breaks the repo’s Python calling convention and makes the setup blocks harder to scan.

As per coding guidelines "Use keyword arguments for functions with more than 1 parameter".

Also applies to: 686-686, 714-714, 774-774, 818-823, 829-834, 891-891

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cross_model_derived_columns.py` at line 659, Several tests call
_gen_sql with positional args (e.g., _gen_sql(engine, query, model_a)); update
those calls to use keyword arguments to follow the project's convention: call
_gen_sql(engine=engine, query=query, model=model_a) (or model=model_b as
appropriate). Replace every positional invocation of _gen_sql in this file
(including the instances flagged around the noted ranges) so the parameters are
passed as keywords: engine=..., query=..., model=.... This applies to all
occurrences (the call at the shown line and the other listed occurrences).

730-731: ⚡ Quick win

Move these imports to module scope.

TimeGranularity and TimeDimension are static dependencies of this test module, so keeping them inside the test body works against the repo’s import convention.

As per coding guidelines "Place imports at the top of files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cross_model_derived_columns.py` around lines 730 - 731, Move the
two local imports out of the test body and place them at module scope at the top
of the test module: import TimeGranularity from slayer.core.enums and
TimeDimension from slayer.core.query so they are module-level imports; update
any existing duplicate imports to avoid redundancy and run tests to ensure no
import-time side effects.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_cross_model_derived_columns.py`:
- Around line 650-905: Add a sql parsing sanity check to one representative
multi-hop test to catch malformed SQL that might pass substring checks: e.g. in
test_multihop_cross_model_measure_over_derived_target (after generating sql via
_gen_sql and normalizing to norm) call sqlglot.parse(sql) or
sqlglot.parse_one(norm) to assert the SQL is syntactically valid; alternatively
do the same in test_multihop_time_dim_to_derived_target_column after sql/norm is
produced. Ensure you import sqlglot at top of the test file if missing and place
the parse assertion before the other substring/assert checks so parse failures
fail early.
- Line 659: Several tests call _gen_sql with positional args (e.g.,
_gen_sql(engine, query, model_a)); update those calls to use keyword arguments
to follow the project's convention: call _gen_sql(engine=engine, query=query,
model=model_a) (or model=model_b as appropriate). Replace every positional
invocation of _gen_sql in this file (including the instances flagged around the
noted ranges) so the parameters are passed as keywords: engine=..., query=...,
model=.... This applies to all occurrences (the call at the shown line and the
other listed occurrences).
- Around line 730-731: Move the two local imports out of the test body and place
them at module scope at the top of the test module: import TimeGranularity from
slayer.core.enums and TimeDimension from slayer.core.query so they are
module-level imports; update any existing duplicate imports to avoid redundancy
and run tests to ensure no import-time side effects.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5349daaa-af36-4691-a1bf-eac440c7715b

📥 Commits

Reviewing files that changed from the base of the PR and between cce8b08 and dfc942d.

📒 Files selected for processing (1)
  • tests/test_cross_model_derived_columns.py

- Parameterise ``_save_a_b_c`` with a ``c_columns`` kwarg so
  ``test_multihop_time_dim_to_derived_target_column`` can reuse the A/B
  scaffold instead of redefining all three models inline. Eliminates the
  22-line duplicate block flagged by the SonarCloud quality gate (drops
  new-duplicated-lines density below the 3% threshold).
- Add a ``sqlglot.parse(...)`` sanity assertion at the head of
  ``test_multihop_cross_model_measure_over_derived_target`` so structurally
  broken SQL short-circuits before the substring matchers — addresses the
  CodeRabbit review-summary nitpick on the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_cross_model_derived_columns.py (1)

740-741: 💤 Low value

Consider moving imports to the top of the file.

These imports are placed inside the test function. Per coding guidelines, imports should be at the top of files. However, this is common in tests for isolation when only one test needs specific imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cross_model_derived_columns.py` around lines 740 - 741, Move the
local imports for TimeGranularity and TimeDimension out of the test body and
place them at the module top so they follow the project's import guidelines;
find usages of TimeGranularity and TimeDimension in
tests/test_cross_model_derived_columns.py and replace the in-test "from
slayer.core.enums import TimeGranularity" and "from slayer.core.query import
TimeDimension" with module-level imports at the top of the file, ensuring no
name conflicts and keeping test isolation intact if any setup previously
depended on the late import.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_cross_model_derived_columns.py`:
- Around line 740-741: Move the local imports for TimeGranularity and
TimeDimension out of the test body and place them at the module top so they
follow the project's import guidelines; find usages of TimeGranularity and
TimeDimension in tests/test_cross_model_derived_columns.py and replace the
in-test "from slayer.core.enums import TimeGranularity" and "from
slayer.core.query import TimeDimension" with module-level imports at the top of
the file, ensuring no name conflicts and keeping test isolation intact if any
setup previously depended on the late import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 831f6488-34f8-4e63-be20-69548655b80e

📥 Commits

Reviewing files that changed from the base of the PR and between dfc942d and 6aa9ddd.

📒 Files selected for processing (1)
  • tests/test_cross_model_derived_columns.py

Per the project import-style rule ("place all imports at the top of the
file") and the CodeRabbit nitpick on PR #93, lift the in-function
imports out of ``test_multihop_time_dim_to_derived_target_column``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant