Skip to content

Dynamic Joins & Multi-Stage Queries#9

Merged
AivanF merged 13 commits intomainfrom
aivan/2026-04-01-Dynamic-Joins+Multi-Stage-Queries
Apr 7, 2026
Merged

Dynamic Joins & Multi-Stage Queries#9
AivanF merged 13 commits intomainfrom
aivan/2026-04-01-Dynamic-Joins+Multi-Stage-Queries

Conversation

@AivanF
Copy link
Copy Markdown
Collaborator

@AivanF AivanF commented Apr 2, 2026

Summary by CodeRabbit

  • New Features

    • Cross-model measures with dotted multi‑hop references and support for transforms (e.g., cumsum).
    • Declarative model JOINs (LEFT JOINs), model-level always‑applied filters, multistage queries, inline model extensions, and saving queries as reusable models.
  • Documentation

    • Expanded guidance on SQL expressions, cross‑model usage, query lists, model filters, and persisting queries as models.
  • Tests

    • Added integration tests for cross‑model measures, multistage queries, model creation, and circular‑reference checks.
  • Chores

    • Project version bumped to 0.2.0rc1.

@AivanF AivanF requested a review from ZmeiGorynych April 2, 2026 15:34
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 2, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36035d42-86dc-405c-9fe0-6921a98297e0

📥 Commits

Reviewing files that changed from the base of the PR and between 0b82c9e and e3e0ab3.

📒 Files selected for processing (1)
  • slayer/engine/query_engine.py

📝 Walkthrough

Walkthrough

Adds declarative model LEFT JOINs and dotted cross‑model measure syntax (multi‑hop), model-level filters, query-as-model (named/nested queries and persisting queries), ModelExtension, ingestion of join metadata, enrichment/SQL generation for cross‑model measures, an MCP tool to create models from queries, and related tests and docs.

Changes

Cohort / File(s) Summary
Documentation
/.claude/skills/slayer-models.md, /.claude/skills/slayer-query.md, CLAUDE.md, README.md, docs/concepts/formulas.md, docs/concepts/models.md, docs/concepts/queries.md, docs/concepts/terminology.md, docs/interfaces/mcp.md
Add docs for joins and filters, dotted model.measure cross‑model measures (including multi‑hop), ModelExtension, query lists / query-as-model, create_model_from_query, and update model field references and roadmap.
Core data models & query types
slayer/core/models.py, slayer/core/query.py
Add ModelJoin type; extend SlayerModel with joins, filters, source_queries, default data_source; add ModelExtension; relax SlayerQuery.model to accept str / inline models / ModelExtension and optional name.
Formula parsing
slayer/core/formula.py
AST parsing now accepts dotted attribute access (e.g., customers.avg_score) as measure references and collects dotted names without double‑counting.
Enrichment & dataclasses
slayer/engine/enriched.py, slayer/engine/query_engine.py
Add CrossModelMeasure and EnrichedQuery.cross_model_measures; enrichment resolves dotted cross‑model measures, multi‑hop dotted dimensions via join-walk, validates model filters, supports named query lists, _query_as_model, create_model_from_query, and a model resolution pipeline.
Ingestion
slayer/engine/ingestion.py
Add _generate_joins BFS over FK reverse-relationships to derive ModelJoin metadata (including composite/transitive joins) and persist joins on ingested models.
SQL generation
slayer/sql/generator.py
SQLGenerator now wraps cross‑model measures into per‑measure CTEs and LEFT JOINs, forces GROUP BY when cross‑model measures exist, adjusts computed/transform handling and pagination, and adds _generate_with_cross_model.
MCP server & interface
slayer/mcp/server.py, docs/interfaces/mcp.md
Add MCP tool create_model_from_query(name, query, description?) to validate and persist models derived from queries.
Tests
tests/test_integration.py, tests/test_integration_postgres.py
Add fixtures and tests covering cross‑model measures, missing‑join errors, transforms on cross‑model measures, multi‑stage query lists, query‑as‑model persistence/usage, multi‑hop dotted resolution, ingestion join metadata, circular join/query errors, and YAML round‑trip for join metadata.
Versioning
pyproject.toml
Bump package version to 0.2.0rc1.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryEngine
    participant Storage
    participant SQLGen
    participant Database

    Client->>QueryEngine: execute(query with cross-model measure)
    QueryEngine->>Storage: resolve primary model (name or inline)
    Storage-->>QueryEngine: SlayerModel (with joins)
    QueryEngine->>QueryEngine: _enrich() detects dotted measure → resolve target model
    QueryEngine->>Storage: get_model(targetModel)
    Storage-->>QueryEngine: target SlayerModel
    QueryEngine->>QueryEngine: build CrossModelMeasure entries
    QueryEngine->>SQLGen: generate(EnrichedQuery with cross_model_measures)
    SQLGen->>SQLGen: _generate_with_cross_model() → CTEs + LEFT JOINs
    SQLGen-->>QueryEngine: final SQL
    QueryEngine->>Database: execute SQL
    Database-->>QueryEngine: results
    QueryEngine-->>Client: SlayerResponse
Loading
sequenceDiagram
    participant Client
    participant QueryEngine
    participant Storage
    participant SQLGen
    participant Database

    Client->>QueryEngine: execute([named_inner_query, main_query])
    QueryEngine->>QueryEngine: register named_inner (name)
    QueryEngine->>QueryEngine: _query_as_model(named_inner) → virtual model
    QueryEngine->>SQLGen: generate SQL for inner
    SQLGen-->>QueryEngine: inner SQL
    QueryEngine->>Storage: (optional) save_model(virtual model)
    Storage-->>QueryEngine: persisted model name
    QueryEngine->>QueryEngine: replace main_query.model with virtual/persisted model
    QueryEngine->>QueryEngine: _enrich(main_query)
    QueryEngine->>SQLGen: generate final SQL
    QueryEngine->>Database: execute SQL
    Database-->>QueryEngine: results
    QueryEngine-->>Client: SlayerResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Filtering on computed columns #6 — Overlaps on filter/transform extraction and SQL generation paths that interact with computed/time-shift transforms; likely related to cross-model transform handling.

Suggested reviewers

  • ZmeiGorynych

Poem

🐰 I hopped through joins and dotted trails,

stitched measures from far‑off scales.
Nested queries, CTEs in line,
models born from SQL and time.
A rabbit cheers — analytics shine! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Dynamic Joins & Multi-Stage Queries' directly and clearly summarizes the two main feature additions across the changeset: explicit join declarations for cross-model relationships and multistage query execution.
Docstring Coverage ✅ Passed Docstring coverage is 88.14% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aivan/2026-04-01-Dynamic-Joins+Multi-Stage-Queries

Comment @coderabbitai help to get the list of available commands and usage tips.

@AivanF AivanF marked this pull request as draft April 2, 2026 15:46
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
slayer/core/formula.py (1)

161-164: ⚠️ Potential issue | 🟠 Major

Same gap in _parse_mixed_arithmetic for cross-model measures.

The _replace_calls inner function collects only ast.Name nodes (line 161-164). Cross-model measure references like customers.avg_score in mixed arithmetic (e.g., "cumsum(customers.avg_score) / count") would also be missed.

🐛 Proposed fix
         if isinstance(n, ast.Name):
             if n.id not in [p for p, _ in sub_transforms]:
                 measure_names.append(n.id)
             return n
+
+        if isinstance(n, ast.Attribute) and isinstance(n.value, ast.Name):
+            # Cross-model measure reference
+            full_name = f"{n.value.id}.{n.attr}"
+            measure_names.append(full_name)
+            return n
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/formula.py` around lines 161 - 164, _in _parse_mixed_arithmetic's
inner function _replace_calls, extend the current check that only handles
ast.Name to also handle ast.Attribute so cross-model measures like
customers.avg_score are captured: when encountering an ast.Attribute whose value
is an ast.Name, build the full identifier (e.g., f"{value.id}.{attr}") and
append it to measure_names if it's not in [p for p, _ in sub_transforms]; ensure
you still return the original node after recording the name. This change
preserves existing behavior for simple names (measure_names, sub_transforms) and
covers dotted cross-model references.
🧹 Nitpick comments (1)
slayer/mcp/server.py (1)

350-351: Remove redundant import.

SlayerQuery is already imported at the top of the file (line 10). The local import is unnecessary.

♻️ Proposed fix
-        from slayer.core.query import SlayerQuery as SQ
-        parsed_query = SQ.model_validate(query)
+        parsed_query = SlayerQuery.model_validate(query)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/mcp/server.py` around lines 350 - 351, Remove the redundant local
import "from slayer.core.query import SlayerQuery as SQ" and instead use the
existing top-level imported SlayerQuery symbol when validating the query (update
the parsed_query line to call the already-imported
SlayerQuery.model_validate(...) or adjust the alias to match the top-level
import); ensure only the duplicate import is removed and the parsed_query
assignment continues to call the correct SlayerQuery.model_validate function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/core/formula.py`:
- Around line 90-92: The collector and replacer miss dotted cross-model
measures: update _collect_names to also handle ast.Attribute whose .value is
ast.Name and create MeasureRef(name=f"{node.value.id}.{node.attr}"), and
likewise modify the inner _replace_calls function inside _parse_mixed_arithmetic
to detect ast.Attribute (with ast.Name value) in addition to ast.Name and
replace those dotted references the same way (producing the same MeasureRef/name
string) so cross-model measures are collected and substituted in both pure and
mixed arithmetic flows.

In `@slayer/core/models.py`:
- Around line 29-32: The ModelJoin pydantic model must validate join_pairs to
fail fast on malformed shapes: add a validator on ModelJoin.join_pairs that
ensures join_pairs is a non-empty list and that each entry is a list/sequence of
exactly two non-empty strings (e.g., ["src", "dst"]); raise a ValueError with a
clear message if any element has length != 2 or contains non-string/empty values
so bad YAML/storage is rejected before join generation uses unpacking. Ensure
the validator references the class ModelJoin and the field join_pairs so it
triggers during model parsing/validation.

In `@slayer/engine/ingestion.py`:
- Around line 195-211: The loop currently takes only the first (fk_col, tgt_col)
pair and fabricates a single source column (source_dim), which collapses
composite/transitive joins and produces aliases that the rollup builder never
emits; instead, for each ref_table iterate all entries in
reverse_refs[ref_table] whose referencing_table is in processed, build a
ModelJoin with join_pairs containing every matching pair (for each pair append
[qualified_source_column, tgt_col] rather than breaking after the first), and
stop fabricating a single collapsed column—use the actual fk_col for
referencing_table == source_table and include the full set of fk_col values for
referencing_table != source_table so composite FKs and multi-hop joins are
preserved (update the loop that constructs joins, ModelJoin(join_pairs=...), and
remove the premature break and single-source_dim logic).

In `@slayer/engine/query_engine.py`:
- Around line 377-387: _in _flatten_spec(), dotted MeasureRef instances (e.g.,
"customers.avg_score") are only handled when the spec is the entire field, so
nested uses inside change(...), last(...), arithmetic, or extracted filter
transforms still reach _ensure_measure() and fail; update _flatten_spec() to
detect and resolve MeasureRef with "." anywhere in the spec tree by reusing
_resolve_cross_model_measure (same logic used in the top-level branch), ensuring
any MeasureRef nodes encountered during flattening call
_resolve_cross_model_measure(spec_name=spec.name, field_name=field_name,
model=model, query=query, dimensions=dimensions,
time_dimensions=time_dimensions, label=field.label) and append the returned
cross-model measure to cross_model_measures instead of falling through to
_ensure_measure().
- Around line 545-555: The recursive branch resolves inner_model via
_query_as_model/storage.get_model but leaves inner_query.model pointing to the
nested SlayerQuery, which causes _enrich(query=inner_query, model=inner_model)
to build aliases from the wrong value; update inner_query so its .model is
normalized to the resolved inner_model (e.g., replace inner_query.model with
inner_model or create a shallow copy of inner_query with model=inner_model)
before calling _enrich; this ensures _enrich uses the actual Model object
returned by _query_as_model/get_model and fixes aliasing for 2+-stage
query-as-model chains.
- Around line 121-131: The code currently constructs a SlayerModel via
_query_as_model and then unconditionally calls self.storage.save_model when save
is True, which can silently overwrite existing models; update
create_model_from_query to first check for an existing model with the same name
(e.g. via self.storage.get_model(name) or a storage.exists/name lookup method)
and if one exists raise an explicit error (ValueError or a custom
DuplicateModelError) instead of calling self.storage.save_model, otherwise
proceed to save; reference the existing helpers _query_as_model, the SlayerModel
constructor, and storage.save_model when making the change.
- Around line 574-597: The _query_as_model() routine currently promotes inner
measures/transforms/expressions into the synthesized SlayerModel but omits
enriched.cross_model_measures, so cross-model columns are dropped; update
_query_as_model() to also iterate enriched.cross_model_measures (and any
analogous cross-model transforms/expressions if present) and append them to dims
as Dimension(name=..., sql=_quoted(...), type=...) and to measures with the same
aggregation helpers (e.g., add Measure(name=f"{m.name}_sum",
sql=_quoted(m.alias), type=DataType.SUM) and Measure(name=f"{m.name}_avg", ...))
so inner selects like customers.avg_score are preserved when using model=inner
or create_model_from_query().
- Around line 598-604: The synthesized SlayerModel returned in
create_model_from_query drops the inner model's time anchor causing loss of
default_time_dimension; update the SlayerModel construction to preserve the
inner_model's time metadata by copying its default_time_dimension (and, if
applicable, time_dimensions/main_time_dimension semantics) into the new
SlayerModel so outer transforms retain the original fallback for time ordering;
specifically, when building the return SlayerModel ensure you set
default_time_dimension = inner_model.default_time_dimension and preserve any
main_time_dimension or time_dimensions entries instead of omitting them.

In `@slayer/sql/generator.py`:
- Around line 500-548: The CTE (cte_sql) currently selects from the raw source
(cm.source_sql / cm.source_sql_table) and immediately aggregates after joining
the target, which ignores the main query's source-side filters and causes
duplicated weighting; change this to build the CTE from the same filtered source
slice used by the main query and dedupe the source rows by the join keys before
aggregating the target measure. Concretely: when composing from_sql for
cm.source_model_name, apply the same WHERE/date-range predicates used by the
main query (reuse whatever routine/expressions the generator uses to build model
filters for the main query), then create an intermediate source-key CTE that
SELECTs the shared dimension and shared_time_dimensions (use _resolve_sql and
_build_date_trunc as already used) and GROUPs BY the join keys (or SELECT
DISTINCT on the join keys) to de-duplicate source rows; finally LEFT JOIN that
deduped source-key CTE to the target (cm.target_model_sql /
cm.target_model_name) and then run the measure aggregation via _build_agg to
produce the final aggregated column. Ensure you update references to
cm.source_sql / cm.source_sql_table and cm.join_pairs so join_conditions use the
deduped source CTE name instead of the raw source.
- Around line 557-567: The FROM clause builder in slayer/sql/generator.py
currently skips joining CTEs when a cross-model CTE (cm) has no
shared_dimensions and shared_time_dimensions, producing invalid SQL; update the
loop that iterates cm_cte_names (the block that builds join_on_parts and appends
to from_clause) so that if join_on_parts is empty you append a "CROSS JOIN
{cte_name}" instead of nothing; keep existing LEFT JOIN logic when join_on_parts
is present and ensure you still use cm.shared_dimensions and
cm.shared_time_dimensions to build join_on_parts as before.
- Around line 465-585: The _generate_with_cross_model function currently
concatenates SQL strings; refactor it to build a sqlglot AST instead: for each
cross-model measure in enriched.cross_model_measures create an exp.CTE whose
expression is an exp.Select built from resolved column expressions (use
_resolve_sql, _build_date_trunc, _build_agg to produce exp expressions), add
GROUP BY expressions from group_parts, then create the main exp.CTE for _main
(wrapping base_sql as a parsed exp if needed), assemble the final exp.Select
selecting _main columns plus each CTE column, add exp.Join/exp.From nodes to
LEFT JOIN each cm CTE to _main on the shared dimension/time expressions, wrap
all CTEs in an exp.With and render once via .sql(dialect=self.dialect); ensure
ORDER BY/LIMIT/OFFSET are applied to the rendered Select using qualified _main
identifiers as in the current logic.

---

Outside diff comments:
In `@slayer/core/formula.py`:
- Around line 161-164: _in _parse_mixed_arithmetic's inner function
_replace_calls, extend the current check that only handles ast.Name to also
handle ast.Attribute so cross-model measures like customers.avg_score are
captured: when encountering an ast.Attribute whose value is an ast.Name, build
the full identifier (e.g., f"{value.id}.{attr}") and append it to measure_names
if it's not in [p for p, _ in sub_transforms]; ensure you still return the
original node after recording the name. This change preserves existing behavior
for simple names (measure_names, sub_transforms) and covers dotted cross-model
references.

---

Nitpick comments:
In `@slayer/mcp/server.py`:
- Around line 350-351: Remove the redundant local import "from slayer.core.query
import SlayerQuery as SQ" and instead use the existing top-level imported
SlayerQuery symbol when validating the query (update the parsed_query line to
call the already-imported SlayerQuery.model_validate(...) or adjust the alias to
match the top-level import); ensure only the duplicate import is removed and the
parsed_query assignment continues to call the correct SlayerQuery.model_validate
function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e02dbacd-1c17-4bd7-a133-0e4edf656d8b

📥 Commits

Reviewing files that changed from the base of the PR and between 147f9b5 and 123e9b7.

📒 Files selected for processing (19)
  • .claude/skills/slayer-models.md
  • .claude/skills/slayer-query.md
  • CLAUDE.md
  • README.md
  • docs/concepts/formulas.md
  • docs/concepts/models.md
  • docs/concepts/queries.md
  • docs/concepts/terminology.md
  • docs/interfaces/mcp.md
  • slayer/core/formula.py
  • slayer/core/models.py
  • slayer/core/query.py
  • slayer/engine/enriched.py
  • slayer/engine/ingestion.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_integration_postgres.py

@AivanF AivanF marked this pull request as ready for review April 6, 2026 15:34
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
slayer/core/query.py (1)

46-51: ⚠️ Potential issue | 🟡 Minor

from_string may misparse multi-hop dimension references.

The from_string method splits on the first "." to extract model.name, but this conflicts with the new multi-hop dotted dimension support (e.g., "customers.regions.name").

If called with "customers.regions.name", it returns ColumnRef(model="customers", name="regions.name"), which may not be the intended behavior for dimension references.

Consider whether this method is still appropriate for dimension parsing, or if it should only be used for model-qualified references.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/query.py` around lines 46 - 51, The current ColumnRef.from_string
misparses multi-hop dotted dimension strings like "customers.regions.name"
because it always splits on the first dot; change the logic to only treat
strings with exactly one dot as a model-qualified reference. Update from_string
so that if s.count(".") == 1 it splits into model and name (returning
cls(name=name, model=model)), otherwise return cls(name=s) so multi-hop
dimension references remain intact; keep the method name ColumnRef.from_string
and ensure behavior is documented/clear for callers that this only handles
single-dot model-qualified references.
🧹 Nitpick comments (2)
slayer/core/query.py (1)

117-118: Use Union type instead of object for better type safety.

The model: object typing loses all type information. Consider using a proper Union type for better IDE support, validation, and documentation:

+from typing import List, Optional, Union
+
+# At class level or module level:
+ModelInput = Union[str, "SlayerModel", "ModelExtension", dict]
+
 class SlayerQuery(BaseModel):
-    name: Optional[str] = None
-    model: object  # str (model name), SlayerModel (inline), or ModelExtension
+    name: Optional[str] = None
+    model: ModelInput  # str (model name), SlayerModel (inline), or ModelExtension

This enables Pydantic's discriminated union validation and provides better IDE autocomplete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/core/query.py` around lines 117 - 118, Update the type annotation of
the field `model` in slayer/core/query.py from `object` to a Union that reflects
the allowed types (e.g. Union[str, SlayerModel, ModelExtension]) so Pydantic can
perform discriminated-union validation and IDEs get proper completions; import
Union from typing (or use from __future__ import annotations and string-literal
types) and reference the concrete types `SlayerModel` and `ModelExtension` (use
forward-reference strings if they are defined later) in the `model` annotation.
tests/test_integration.py (1)

1030-1067: Query list with joins test has complex setup worth documenting.

The test creates a sub-query (customer_scores) and joins it to the main query via ModelExtension. The comment at lines 1043-1044 explains the measure naming convention (avg_score_avg) but could be clearer.

Consider adding a brief comment explaining why avg_score_avg is used (the virtual model from _query_as_model creates {measure}_avg aggregations for numeric columns from the inner query).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_integration.py` around lines 1030 - 1067, The test
test_query_list_with_joins creates a sub-query "customer_scores" and joins it
via ModelExtension, but the comment about using "avg_score_avg" is terse; update
the test's comments near the SlayerQuery/sub and ModelExtension setup to
explicitly state that the virtual model produced by _query_as_model converts
inner query numeric measures into auto-generated aggregated measures named
"{measure}_avg", so to re-average the inner avg_score you must reference
customer_scores.avg_score_avg in the main query; mention the naming pattern and
why the double "_avg" appears to make the intent clear for future readers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/concepts/queries.md`:
- Around line 25-39: Remove the incorrect `sql` and `formula` entries from the
ColumnRef table and update the examples and text to clearly distinguish between
a ColumnRef (supports name, model, label and is used in `dimensions`) and
computed fields (defined on `ModelExtension` via raw SQL or defined via the
`Field` class and used in `fields`); replace the example that shows `{"sql": ...
, "name": "tier"}` with a valid ColumnRef example (only `name`/`label`/optional
`model`) and add a short clarifying sentence that computed SQL dimensions belong
on `ModelExtension` and formula-based fields belong to `Field`/`fields`.

In `@slayer/engine/query_engine.py`:
- Line 145: The class-level mutable _resolving set on SlayerQueryEngine causes
shared state across instances; move _resolving into the instance by initializing
it in the SlayerQueryEngine.__init__ method (e.g., self._resolving = set()) and
remove or rename the class attribute so each instance tracks resolving models
independently; update any references in methods from
SlayerQueryEngine._resolving to self._resolving to avoid cross-instance
pollution and preserve circular-detection behavior.

---

Outside diff comments:
In `@slayer/core/query.py`:
- Around line 46-51: The current ColumnRef.from_string misparses multi-hop
dotted dimension strings like "customers.regions.name" because it always splits
on the first dot; change the logic to only treat strings with exactly one dot as
a model-qualified reference. Update from_string so that if s.count(".") == 1 it
splits into model and name (returning cls(name=name, model=model)), otherwise
return cls(name=s) so multi-hop dimension references remain intact; keep the
method name ColumnRef.from_string and ensure behavior is documented/clear for
callers that this only handles single-dot model-qualified references.

---

Nitpick comments:
In `@slayer/core/query.py`:
- Around line 117-118: Update the type annotation of the field `model` in
slayer/core/query.py from `object` to a Union that reflects the allowed types
(e.g. Union[str, SlayerModel, ModelExtension]) so Pydantic can perform
discriminated-union validation and IDEs get proper completions; import Union
from typing (or use from __future__ import annotations and string-literal types)
and reference the concrete types `SlayerModel` and `ModelExtension` (use
forward-reference strings if they are defined later) in the `model` annotation.

In `@tests/test_integration.py`:
- Around line 1030-1067: The test test_query_list_with_joins creates a sub-query
"customer_scores" and joins it via ModelExtension, but the comment about using
"avg_score_avg" is terse; update the test's comments near the SlayerQuery/sub
and ModelExtension setup to explicitly state that the virtual model produced by
_query_as_model converts inner query numeric measures into auto-generated
aggregated measures named "{measure}_avg", so to re-average the inner avg_score
you must reference customer_scores.avg_score_avg in the main query; mention the
naming pattern and why the double "_avg" appears to make the intent clear for
future readers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 583e9646-728c-4fc2-8548-7541c64fa5db

📥 Commits

Reviewing files that changed from the base of the PR and between 123e9b7 and 532c8b8.

📒 Files selected for processing (16)
  • .claude/skills/slayer-models.md
  • .claude/skills/slayer-query.md
  • CLAUDE.md
  • README.md
  • docs/concepts/formulas.md
  • docs/concepts/models.md
  • docs/concepts/queries.md
  • docs/concepts/terminology.md
  • slayer/core/formula.py
  • slayer/core/models.py
  • slayer/core/query.py
  • slayer/engine/ingestion.py
  • slayer/engine/query_engine.py
  • slayer/sql/generator.py
  • tests/test_integration.py
  • tests/test_integration_postgres.py
✅ Files skipped from review due to trivial changes (6)
  • docs/concepts/formulas.md
  • docs/concepts/terminology.md
  • CLAUDE.md
  • .claude/skills/slayer-query.md
  • .claude/skills/slayer-models.md
  • docs/concepts/models.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • slayer/core/formula.py
  • slayer/core/models.py
  • README.md
  • slayer/sql/generator.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
slayer/engine/query_engine.py (3)

109-144: Move imports to the top of the file.

Per coding guidelines, imports must be at the top of files. Lines 111 and 123 import ModelExtension and ModelJoin inside the function body.

♻️ Suggested fix

Add to imports at top of file:

from slayer.core.query import ModelExtension, SlayerQuery
from slayer.core.models import DatasourceConfig, Dimension, Measure, ModelJoin, SlayerModel

Then remove lines 111 and 123.

As per coding guidelines, "Imports must be at the top of files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 109 - 144, Move the in-function
imports used by _resolve_query_model to the module top: import ModelExtension
from slayer.core.query and Dimension, Measure, ModelJoin, SlayerModel (and any
other model types used) from slayer.core.models, then remove the local imports
inside _resolve_query_model (the inline imports around ModelExtension and
ModelJoin) so the function uses the top-level symbols instead.

343-347: Move imports to the top of the file.

The re module and slayer.core.formula imports are inside the _enrich method body, violating the coding guideline that imports must be at the top of files.

As per coding guidelines, "Imports must be at the top of files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 343 - 347, Move the local imports
out of the _enrich method and place them at the top of the module: import re and
the names from slayer.core.formula (ArithmeticField, MeasureRef,
MixedArithmeticField, TransformField, TIME_TRANSFORMS, parse_formula,
parse_filter) should be declared with the other top-level imports; update any
references inside _enrich to use these top-level names and remove the inline
import statements so the function only contains logic, not imports.

608-609: Move imports to the top of the file.

The ast module (imported as _ast) and slayer.core.formula imports are inside the static method body.

As per coding guidelines, "Imports must be at the top of files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 608 - 609, Move the local imports
of ast (as _ast) and from slayer.core.formula (ALL_TRANSFORMS, _preprocess_like)
out of the static method and place them at the top of the module so they follow
the "imports at top" guideline; update the static method to use the
now-module-level names (_ast, ALL_TRANSFORMS, _preprocess_like) and remove the
in-method import lines, and if doing so introduces a circular import, instead
add a short comment explaining why a local import is required and keep it as the
only exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 689-691: inner_query.model may be a SlayerModel or ModelExtension,
not just a string, so calling _resolve_model(model_name=inner_query.model, ...)
can fail; before calling _resolve_model, normalize inner_query.model into a
string model_name: if it's already a str use it, if it's a SlayerModel or
ModelExtension extract its string identifier (e.g., .name or the appropriate
model name/property on those classes), otherwise raise a clear TypeError; then
call self._resolve_model(model_name=model_name, named_queries=named_queries).

---

Nitpick comments:
In `@slayer/engine/query_engine.py`:
- Around line 109-144: Move the in-function imports used by _resolve_query_model
to the module top: import ModelExtension from slayer.core.query and Dimension,
Measure, ModelJoin, SlayerModel (and any other model types used) from
slayer.core.models, then remove the local imports inside _resolve_query_model
(the inline imports around ModelExtension and ModelJoin) so the function uses
the top-level symbols instead.
- Around line 343-347: Move the local imports out of the _enrich method and
place them at the top of the module: import re and the names from
slayer.core.formula (ArithmeticField, MeasureRef, MixedArithmeticField,
TransformField, TIME_TRANSFORMS, parse_formula, parse_filter) should be declared
with the other top-level imports; update any references inside _enrich to use
these top-level names and remove the inline import statements so the function
only contains logic, not imports.
- Around line 608-609: Move the local imports of ast (as _ast) and from
slayer.core.formula (ALL_TRANSFORMS, _preprocess_like) out of the static method
and place them at the top of the module so they follow the "imports at top"
guideline; update the static method to use the now-module-level names (_ast,
ALL_TRANSFORMS, _preprocess_like) and remove the in-method import lines, and if
doing so introduces a circular import, instead add a short comment explaining
why a local import is required and keep it as the only exception.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 222f980a-d478-4bdd-bca3-36ed19c08cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 532c8b8 and f963b5f.

📒 Files selected for processing (4)
  • CLAUDE.md
  • docs/concepts/queries.md
  • pyproject.toml
  • slayer/engine/query_engine.py
✅ Files skipped from review due to trivial changes (3)
  • pyproject.toml
  • CLAUDE.md
  • docs/concepts/queries.md

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
slayer/engine/query_engine.py (2)

688-694: ⚠️ Potential issue | 🟠 Major

Handle non-string inner_query.model before passing to _resolve_model().

_resolve_model() expects a string model_name, but inner_query.model can be a SlayerModel, ModelExtension, or dict. This path would fail or behave incorrectly for nested queries that use non-string model references.

🐛 Proposed fix
         named_queries = named_queries or {}
 
         # Resolve the inner model (checks named queries, then storage)
-        inner_model = self._resolve_model(
-            model_name=inner_query.model, named_queries=named_queries,
-        )
+        inner_model = self._resolve_query_model(
+            query_model=inner_query.model, named_queries=named_queries,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 688 - 694, inner_query.model can
be non-string (SlayerModel, ModelExtension, dict) but _resolve_model expects a
string; before calling _resolve_model and _enrich, normalize the model
reference: inspect inner_query.model and if it's already a str use it, if it has
a .name (or .model_name) attribute extract that string, if it's a dict extract
the 'name' or 'model' key, and otherwise raise a clear TypeError; pass that
normalized string as model_name to _resolve_model (and keep using the original
inner_query when calling _enrich). Ensure you adjust the code around the
_resolve_model(...) call to use the normalized model_name variable.

217-219: ⚠️ Potential issue | 🟠 Major

Duplicate model names still silently overwritten.

The previous review flagged that create_model_from_query(save=True) can silently clobber existing models, but this hasn't been addressed. A typo in the name parameter could overwrite a production model definition.

🛡️ Minimal guard
         if save:
+            existing = self.storage.get_model(name)
+            if existing is not None:
+                raise ValueError(f"Model '{name}' already exists. Use a different name or delete the existing model first.")
             self.storage.save_model(model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 217 - 219, The
create_model_from_query function currently calls self.storage.save_model(model)
when save=True which can silently overwrite existing models; modify
create_model_from_query to check for an existing model with the same name (e.g.,
via self.storage.get_model(model.name) or equivalent) before calling
self.storage.save_model, and if a model exists either raise an error or require
an explicit overwrite flag (e.g., add an overwrite parameter or reuse an
existing confirm flag) so accidental clobbers are prevented; ensure the check
references create_model_from_query, the model.name attribute, and
self.storage.save_model/get_model to locate and fix the logic.
🧹 Nitpick comments (1)
slayer/engine/query_engine.py (1)

802-804: Multi-hop measure references silently fail with confusing error.

The split(".", 1) limits cross-model measures to single-hop (e.g., customers.avg_score). A multi-hop reference like customers.regions.avg_score would split to ["customers", "regions.avg_score"], then fail with "Measure 'regions.avg_score' not found" rather than a clear "multi-hop measures not supported" message.

If multi-hop measures aren't intended to be supported, consider adding explicit validation:

🛡️ Clearer error for unsupported multi-hop
         parts = spec_name.split(".", 1)
         if len(parts) != 2:
             raise ValueError(f"Invalid cross-model measure reference: '{spec_name}'")
+        if "." in parts[1]:
+            raise ValueError(
+                f"Multi-hop measure references like '{spec_name}' are not supported. "
+                f"Only single-hop references (e.g., 'model.measure') are allowed."
+            )
         target_model_name, measure_name = parts
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/query_engine.py` around lines 802 - 804, The code currently
uses parts = spec_name.split(".", 1) which allows only a single hop and later
produces a misleading "Measure 'regions.avg_score' not found" error for
multi-hop references; update the validation around spec_name/parts to detect and
explicitly reject multi-hop references (e.g., split on "." into all parts or
check for more than one dot) and raise a clear ValueError like "Multi-hop
measures not supported: '<spec_name>'" (adjust the existing raise ValueError
call near spec_name and parts accordingly) so multi-hop usage fails with a
precise message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/engine/query_engine.py`:
- Around line 57-62: The code in query_engine.py that handles list inputs (the
isinstance(query, list) branch) builds named_queries by mapping q.name to q but
silently overwrites duplicates; update that block in the QueryEngine (or
surrounding function where queries = query and named_queries are built) to
validate uniqueness by iterating queries[:-1], tracking seen names, and if a
name is encountered twice raise a clear ValueError (or custom exception) listing
the duplicate name(s) and the offending Query objects; this prevents silent
overwrites and makes the problem immediate to the caller.
- Around line 726-732: The generated SQL uses unquoted short identifiers causing
failures for names with spaces/special chars; update the code that builds
rename_parts/wrapped_sql and the Dimension construction to quote and escape
short consistently: when creating rename_parts in the loop that builds
wrapped_sql (referencing rename_parts, wrapped_sql, and column_map) wrap the
short in proper SQL identifier quotes (e.g., double quotes with internal quotes
escaped) for the AS alias, and set Dimension(sql=...) to use the same quoted
identifier (referencing Dimension(name=short, sql=short, type=dtype));
centralize quoting into a helper (e.g., quote_identifier) and call it wherever
short is used in SQL generation to ensure consistency.

---

Duplicate comments:
In `@slayer/engine/query_engine.py`:
- Around line 688-694: inner_query.model can be non-string (SlayerModel,
ModelExtension, dict) but _resolve_model expects a string; before calling
_resolve_model and _enrich, normalize the model reference: inspect
inner_query.model and if it's already a str use it, if it has a .name (or
.model_name) attribute extract that string, if it's a dict extract the 'name' or
'model' key, and otherwise raise a clear TypeError; pass that normalized string
as model_name to _resolve_model (and keep using the original inner_query when
calling _enrich). Ensure you adjust the code around the _resolve_model(...) call
to use the normalized model_name variable.
- Around line 217-219: The create_model_from_query function currently calls
self.storage.save_model(model) when save=True which can silently overwrite
existing models; modify create_model_from_query to check for an existing model
with the same name (e.g., via self.storage.get_model(model.name) or equivalent)
before calling self.storage.save_model, and if a model exists either raise an
error or require an explicit overwrite flag (e.g., add an overwrite parameter or
reuse an existing confirm flag) so accidental clobbers are prevented; ensure the
check references create_model_from_query, the model.name attribute, and
self.storage.save_model/get_model to locate and fix the logic.

---

Nitpick comments:
In `@slayer/engine/query_engine.py`:
- Around line 802-804: The code currently uses parts = spec_name.split(".", 1)
which allows only a single hop and later produces a misleading "Measure
'regions.avg_score' not found" error for multi-hop references; update the
validation around spec_name/parts to detect and explicitly reject multi-hop
references (e.g., split on "." into all parts or check for more than one dot)
and raise a clear ValueError like "Multi-hop measures not supported:
'<spec_name>'" (adjust the existing raise ValueError call near spec_name and
parts accordingly) so multi-hop usage fails with a precise message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9c561dc-2624-4fca-a6bd-cd1e1bff2760

📥 Commits

Reviewing files that changed from the base of the PR and between f963b5f and 0b82c9e.

📒 Files selected for processing (2)
  • slayer/engine/query_engine.py
  • tests/test_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_integration.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant