
dbt Semantic Layer ingestion (with optional hidden-model import) #31

Merged
ZmeiGorynych merged 21 commits into main from egor/dev-1251-dbt-ingestion
Apr 17, 2026

Conversation


@ZmeiGorynych ZmeiGorynych commented Apr 15, 2026

Summary

Adds first-class dbt support to SLayer in two layers:

  1. Semantic-layer import — slayer import-dbt reads a dbt project's YAML and converts its semantic_models, entities, dimensions, measures, and metrics into SlayerModels + queries.yaml.
  2. Hidden-model import (opt-in) — --include-hidden-models additionally registers every regular dbt model not wrapped by a semantic_model as a hidden SlayerModel. These models are queryable by name but omitted from discovery listings, so the semantic layer stays the default surface for agents.

This is the foundation for running the dbt-llm-sl-bench benchmark against SLayer, and it closes the visibility gap for tables that exist in dbt but aren't (yet) part of the semantic layer.


Layer 1 — Semantic-layer import

New slayer/dbt/ module:

  • models.py — lightweight Pydantic v2 classes for dbt YAML objects. We don't use metricflow-semantic-interfaces because it requires a Pydantic v1 shim, has heavy transitive deps, and its parser expects a different YAML format than dbt-core produces.
  • parser.py — walks a dbt project directory, handles the semantic_models: list format, resolves ref() references.
  • entities.py — builds an entity registry across all semantic models, resolves foreign entities to explicit SLayer ModelJoin objects.
  • filters.py — converts dbt Jinja filter syntax (e.g. {{ Dimension('entity__dim') }}) to plain SLayer filter strings.
  • converter.py — full conversion pipeline:
    • Semantic models → SlayerModels (1:1)
    • Entity relationships → primary-key dimensions + explicit joins
    • Measure consolidation: multiple dbt measures with the same expr but different agg values are merged into one SLayer measure with combined allowed_aggregations.
    • Simple metrics with filters → filtered SLayer measures.
    • Derived / ratio / cumulative metrics → SlayerQuery definitions written to a separate queries.yaml.
    • Conversion and windowed-cumulative metrics emit warnings (not supported).
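
The measure-consolidation step can be sketched roughly as follows. This is a minimal illustration using plain dicts; the real converter operates on Pydantic models, and the field names (`name`, `expr`, `agg`) are assumptions based on dbt's YAML shape, not the actual implementation:

```python
from collections import defaultdict


def consolidate_measures(measures):
    """Merge dbt measures sharing an expr into one measure with combined aggregations.

    `measures` is a list of dicts with 'name', 'expr', and 'agg' keys
    (illustrative shape only).
    """
    by_expr = defaultdict(list)
    for m in measures:
        # Measures without an explicit expr implicitly reference a column named
        # after the measure, so fall back to the name as the grouping key.
        by_expr[m.get("expr") or m["name"]].append(m)

    consolidated = []
    for expr, group in by_expr.items():
        consolidated.append({
            "name": group[0]["name"],
            "sql": expr,
            "allowed_aggregations": sorted({m["agg"] for m in group}),
        })
    return consolidated
```

For example, a dbt `sum` measure and an `average` measure over the same `amount` column would collapse into a single measure whose allowed_aggregations contains both.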

CLI: slayer import-dbt <dbt_project_path> --datasource <name> [--storage <path>] [--no-strict-aggregations]


Layer 2 — Hidden-model import (new in the last commit)

Motivation: when an agent asks SLayer what tables it can query, it currently only sees tables with a semantic_model wrapper. Staging tables, unwrapped marts, and raw models are invisible even though they're sitting there in the warehouse. That's a discovery gap, not a capability gap — SLayer can already query an arbitrary table, it just didn't know the table existed.

What's added:

  • A new slayer/dbt/manifest.py module — the only place in the codebase that imports dbt-core. It exposes DBT_AVAILABLE, load_or_generate_manifest (uses an existing target/manifest.json or invokes dbt parse if dbt-core is available), and find_orphan_model_nodes / regular_models_from_manifest to turn the manifest into Pydantic DbtRegularModel instances.
  • DbtRegularModel and DbtColumnMeta added to slayer/dbt/models.py; DbtProject extended with regular_models.
  • A small refactor to slayer/engine/ingestion.py — the per-table introspection logic already used by auto-ingest is extracted as a public introspect_table_to_model helper so the new dbt pass can reuse it without duplication.
  • DbtToSlayerConverter gains two new kwargs (sa_engine, include_hidden_models, both optional). When enabled, after semantic models + metrics are converted it iterates every orphan dbt model, introspects the materialized table, wraps the result as a hidden=True SlayerModel, and overlays the manifest's model/column descriptions onto the introspected dimensions and measures. Name-collisions with semantic models are skipped (semantic wins). Introspection failures per model (table not yet materialized, permission denied, etc.) log a warning and continue.
  • slayer/cli.py adds --include-hidden-models to the import-dbt subcommand (off by default). When set, the CLI resolves the datasource, opens a SQLAlchemy engine, passes it to the converter, and disposes it in a finally. The final summary line now prints <visible> models, <hidden> hidden, <n> queries, <n> warnings.
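
The hidden-model pass boils down to a fail-soft loop along these lines. This is an illustrative sketch only: the function and field names are hypothetical, and the real code catches SQLAlchemyError from introspection rather than a bare Exception:

```python
import logging

log = logging.getLogger(__name__)


def import_hidden_models(orphan_models, introspect, existing_names):
    """Per-model fail-soft loop: skip name collisions, warn-and-continue on errors.

    `orphan_models` are dbt models without a semantic_model wrapper;
    `introspect` is a callable that inspects the materialized table.
    """
    hidden = []
    for m in orphan_models:
        if m["name"] in existing_names:
            # Name collision: the semantic model wins, skip the regular model.
            log.warning("skipping %s: a semantic model with this name exists", m["name"])
            continue
        try:
            model = introspect(m["name"])
        except Exception as exc:  # e.g. table not materialized, permission denied
            log.warning("could not introspect %s: %s", m["name"], exc)
            continue
        model["hidden"] = True
        hidden.append(model)
    return hidden
```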

Packaging: dbt-core is an optional dependency declared under a new dbt extras group (and added to all). Without it, the feature is cleanly no-op — import-dbt still works for semantic models, and the code never imports dbt unless asked to.
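
The clean no-op behavior follows from a feature-flag guard. A minimal sketch of the pattern (the top-level names mirror the PR description, but the bodies are assumptions; the real helper invokes dbt parse instead of raising when the manifest is missing):

```python
import importlib.util
import json
import os
from typing import Optional

# True only when the optional `dbt` extra (dbt-core) is installed.
DBT_AVAILABLE = importlib.util.find_spec("dbt") is not None


def load_or_generate_manifest(project_path: str) -> Optional[dict]:
    """Return the parsed manifest, or None when dbt-core is unavailable."""
    manifest_path = os.path.join(project_path, "target", "manifest.json")
    if os.path.exists(manifest_path):
        with open(manifest_path, encoding="utf-8") as f:
            return json.load(f)
    if not DBT_AVAILABLE:
        return None  # clean no-op: the caller skips the hidden-model pass
    # The real implementation would run `dbt parse` here and re-read the file.
    raise NotImplementedError("sketch only")
```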

Design decisions

  • Discovery via dbt parse, not filesystem walk — the manifest is the authoritative source of the dbt DAG, including dbt_project.yml overrides, macro-generated names, and alias resolution. A filesystem walk of .sql files would miss all of that.
  • Off by default — existing import-dbt behavior is unchanged. Users opt in with a single flag once they've installed slayer[dbt].
  • Hidden filter is discovery-only — the flag keeps these models out of slayer models list, the MCP datasource_summary, and the dimensions/measures of GET /models/{name}, but they can still be queried by name (REST, MCP query, SQL). An agent that finds the name via the MCP edit_model tool can also flip hidden to false to promote a model.
  • Metadata overlay, not overwrite — column descriptions from the manifest only fill blanks; anything SQL introspection already produced is kept.
  • Fail soft, per model — one table failing to introspect shouldn't kill the whole import.
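
The "overlay, not overwrite" rule in code terms, as a sketch (field names are illustrative, not the actual schema):

```python
def overlay_metadata(introspected: dict, manifest_meta: dict) -> dict:
    """Merge manifest descriptions into introspected metadata without overwriting.

    A value from the dbt manifest is used only when introspection left the
    field empty (None or "").
    """
    merged = dict(introspected)
    for key, value in manifest_meta.items():
        if merged.get(key) in (None, ""):
            merged[key] = value
    return merged
```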

Documentation

  • docs/dbt/dbt_import.md — full walkthrough of the semantic-layer conversion plus a new "Regular dbt Models (Hidden Import)" section covering prerequisites, usage, metadata carried over, failure semantics, and how to promote a hidden model later via MCP edit_model.
  • docs/dbt/slayer_vs_dbt.md — self-contained comparison of SLayer vs dbt expressiveness.
  • docs/concepts/ingestion.md — now opens with a three-path overview (auto-ingest, semantic-layer import, hidden dbt-model import) and links to the dbt docs.
  • docs/reference/cli.md + docs/interfaces/cli.md — documented slayer import-dbt with all flags.
  • README.md — one-bullet feature mention under "What SLayer does differently".

Test plan

  • 95 new tests across test_dbt_parser.py, test_dbt_entities.py, test_dbt_filters.py, test_dbt_converter.py, and the new test_dbt_manifest.py. All use inline YAML fixtures and mocked SQLAlchemy engines — no live database required, CI-friendly.
  • Hidden-model specific cases:
    • default off → no hidden models produced even when regular_models are populated
    • opt-in without engine → warning emitted, semantic pass unaffected
    • opt-in with engine → hidden SlayerModel produced, description overlaid
    • introspection raises SQLAlchemyError → warning, model skipped, others still convert
    • name collision with semantic model → regular model skipped (semantic wins)
    • load_or_generate_manifest correctly loads existing files, invokes dbt parse when missing, returns None when dbt-core is unavailable
  • Full test suite: 622 passed.
  • Lint: ruff check slayer/ tests/ — clean.
  • Optional extra wiring: pyproject.toml declares dbt = ["dbt-core"] and includes it in all.

Follow-ups (not in this PR)

  • End-to-end integration test against jaffle_shop_duckdb with a materialized warehouse.
  • Expose --include-hidden-models through the REST /import-dbt and MCP equivalents once the import endpoints exist.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added slayer import-dbt CLI to ingest dbt semantic_models and metrics; optional --include-hidden-models to import non-semantic models as hidden, queryable models and emit queries.yaml.
  • Improvements / Bug Fixes

    • Improved SQL generation for filtered/first/last measures (correct join placement, isolated rank/match flags, safer column detection) and richer filter column handling for enrichment/introspection.
  • Documentation

    • Added detailed dbt import guide, dbt vs SLayer comparison, and CLI docs.
  • Tests

    • Added extensive tests for parsing, conversion, manifest handling, filters, joins, and SQL regressions.

New `slayer import-dbt` command that reads dbt semantic layer YAML
definitions and converts them to SLayer models.

Key features:
- Parses dbt-core format (semantic_models: list, metrics: list)
- Resolves entity relationships (primary/foreign) to explicit SLayer joins
- Converts Jinja filter syntax to plain SLayer filter strings
- Consolidates measures with same expr but different aggs into one measure
- Simple metrics with filters become filtered SLayer measures (CASE WHEN)
- Derived/ratio/cumulative metrics generate SlayerQuery defs in queries.yaml
- Preserves labels and descriptions throughout
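
The "filtered measure (CASE WHEN)" translation above amounts to wrapping the measure expression; a sketch of the idea (the real generator additionally handles aliasing and dialect quoting, and this helper name is hypothetical):

```python
def filtered_measure_sql(expr: str, filter_sql: str) -> str:
    """Render a dbt metric filter as a filtered measure expression.

    A `sum` over `amount` filtered to `status = 'paid'` would then aggregate
    as SUM(CASE WHEN status = 'paid' THEN amount END), counting only
    matching rows.
    """
    return f"CASE WHEN {filter_sql} THEN {expr} END"
```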

Module structure: slayer/dbt/ with models, parser, entities, filters,
converter. 46 tests with inline fixtures (no external dependencies).

Also adds docs/dbt/ with comparison doc (slayer_vs_dbt.md) and detailed
conversion process doc (dbt_import.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Added end-to-end dbt Semantic Layer ingestion: a CLI command parses dbt YAML/manifest, converts semantic_models/metrics into SLayer models/queries (with entity-based joins, filter translation, measure consolidation), optionally imports non-semantic dbt models as hidden models via SQL introspection, and persists results to storage with warnings reporting.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation - dbt & CLI: docs/dbt/dbt_import.md, docs/dbt/slayer_vs_dbt.md, docs/interfaces/cli.md, docs/reference/cli.md, docs/concepts/ingestion.md, README.md | New detailed docs: import behavior, mapping rules, filter translation, limitations, CLI usage and examples. |
| CLI & Persistence: slayer/cli.py | Added import-dbt subcommand, argument parsing (--datasource, --no-strict-aggregations, --include-hidden-models), helpers for queries.yaml location, command dispatch, storage persistence, and summary/warning output. |
| dbt Schema & Parsing: slayer/dbt/models.py, slayer/dbt/parser.py | Added Pydantic schemas for dbt semantic models/metrics/regular models and a YAML parser that discovers files, normalizes entries, resolves ref() calls, and returns a DbtProject. |
| Manifest Integration: slayer/dbt/manifest.py, slayer/dbt/__init__.py | Optional dbt-core integration with load_or_generate_manifest, orphan-model detection, and conversion helpers to produce DbtRegularModel instances; feature-flag guarded. |
| Conversion Engine: slayer/dbt/converter.py, slayer/dbt/entities.py, slayer/dbt/filters.py | New converter mapping dbt semantic_models/metrics to SlayerModels/queries, consolidating measures by expr, resolving joins via EntityRegistry, converting Jinja filters to plain filter strings, and returning ConversionResult with warnings. |
| Engine Introspection & Enrichment: slayer/engine/ingestion.py, slayer/engine/enriched.py, slayer/engine/enrichment.py | Added introspect_table_to_model helper; EnrichedMeasure extended with filter_columns; enrichment updated to track structured filter column refs and derive needed joins from them. |
| SQL Generator: slayer/sql/generator.py | Adjusted ranked-subquery join injection and ROW_NUMBER handling for first/last and filtered measures; added filtered match-flag plumbing and safer identifier handling; wrapped formula terms for filtered measures. |
| Tests: tests/test_dbt_parser.py, tests/test_dbt_converter.py, tests/test_dbt_entities.py, tests/test_dbt_filters.py, tests/test_dbt_manifest.py, tests/test_sql_generator.py | Extensive tests for parsing, conversion, entity/joins, filter translation, manifest loading, SQL generation, hidden-model import behaviors, and warnings. |
| Packaging: pyproject.toml | Declared optional dependency dbt-core and new dbt extra (included in the all extra). |
| Docs Example Update: docs/examples/01_dynamic/dynamic.md | Rewrote dynamic example narrative and multistage query example. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Parser
    participant Manifest
    participant Converter
    participant Registry
    participant Introspector
    participant Storage

    User->>CLI: slayer import-dbt <project_path> --datasource <ds> [--include-hidden-models]
    CLI->>Parser: parse_dbt_project(project_path)
    Parser-->>CLI: DbtProject (semantic_models, metrics)

    alt include_hidden_models
        CLI->>Manifest: load_or_generate_manifest(project_path)
        Manifest-->>CLI: manifest or None
    end

    CLI->>Converter: DbtToSlayerConverter.convert(project, datasource, flags)
    Converter->>Registry: EntityRegistry.build(semantic_models)
    Registry-->>Converter: registry (entity→owner mappings)

    alt include_hidden_models & sa_engine provided
        Converter->>Introspector: introspect_table_to_model(...) for orphan/regular models
        Introspector-->>Converter: hidden SlayerModel(s) or warnings
    end

    Converter-->>CLI: ConversionResult (models, queries, warnings)
    CLI->>Storage: persist SlayerModel(s)
    Storage-->>CLI: persisted
    alt queries present
        CLI->>Storage: write queries.yaml
        Storage-->>CLI: persisted
    end
    CLI-->>User: print warnings and summary (visible/hidden/query counts)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • AivanF
  • whimo

Poem

🐰 Hop, hop — a YAML feast I bring,

I nibble Jinja, make filters sing.
Entities find homes and measures unite,
Hidden tables peek into the light.
Now SLayer bakes dbt dreams just right. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 49.48%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the primary change: adding dbt Semantic Layer ingestion with optional hidden-model import support. It accurately reflects the main objective of the PR. |





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (2)
docs/dbt/dbt_import.md (2)

124-128: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified for syntax highlighting.

📝 Suggested fix

````diff
-```
+```jinja
 {{ Dimension('claim_amount__has_loss_payment') }} = 1
 {{ TimeDimension('metric_time', 'day') }} >= '2024-01-01'
 {{ Entity('customer_id') }} IS NOT NULL
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/dbt/dbt_import.md around lines 124 - 128: the fenced code block containing Jinja snippets (uses Dimension, TimeDimension, Entity) lacks a language specifier; change the opening fence to ```jinja so the Jinja/templating code is syntax-highlighted and complies with markdownlint.

---

`154-163`: **Add language specifier to CLI reference code block.**

📝 Suggested fix

````diff
-```
+```text
 slayer import-dbt <dbt_project_path> [options]

 Arguments:
   dbt_project_path          Path to dbt project root or models directory

 Options:
   --datasource NAME         SLayer datasource name for imported models (required)
   --storage PATH            Storage directory for output (default: ./slayer_data)
   --no-strict-aggregations  Allow all aggregation types (don't restrict to dbt's defined agg)
 ```
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/dbt/dbt_import.md around lines 154 - 163: the fenced CLI example for the "slayer import-dbt" command is missing a language specifier; change the opening fence to ```text so the block is rendered as plain text and formatted consistently.


🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @slayer/cli.py:

  • Around line 399-405: The code writes queries.yaml to storage_path directly,
    which breaks when args.storage is a SQLite file (e.g., "slayer.db") because it
    tries to create "slayer.db/queries.yaml"; update the path logic before building
    queries_path: if storage was provided and is a file path (not a directory) use
    its directory (os.path.dirname(storage_path)) as the base; otherwise if
    storage_path is a directory use it as-is; fall back to args.models_dir or
    _STORAGE_DEFAULT as before. Apply this change around the block that references
    result.queries, storage_path, args.storage, args.models_dir and _STORAGE_DEFAULT
    so queries_path points to a directory not a file.

In @slayer/dbt/converter.py:

  • Around line 374-380: The current code uses formula.replace(ref_name, resolved)
    which can replace substrings inside other identifiers (e.g., "total" in
    "subtotal"); update the replacement to be token-aware by matching whole
    identifier tokens instead of raw substrings: find occurrences of the metric
    token (use the alias or name from m_input.alias / m_input.name) in formula as
    standalone identifiers (e.g., with a regex using word boundaries or by
    tokenizing the expression) and replace only those matches with the resolved
    value returned by self._resolve_metric_to_formula; ensure you still iterate
    metric.type_params.metrics and use ref_name when building the token match so
    other identifier parts are not altered.
  • Around line 136-139: The code forces sql to None by assigning measure_name =
    expr_key, making the expr_key != measure_name check always false; instead set
    measure_name from the measure's actual name (e.g., measure.get("name") or the
    original measure identifier) and only fall back to expr_key if no explicit name
    exists (e.g., measure_name = measure.get("name", expr_key)); keep the existing
    sql assignment logic (sql = expr_key if expr_key != measure_name else None) so
    consolidated measures retain their SQL when the measure name differs from the
    expression.

In @slayer/dbt/entities.py:

  • Around line 93-102: The current dedupe logic using seen_targets causes
    distinct FKs to the same model to collapse; in the join-building block
    (variables seen_targets, target_model_name, foreign_expr, primary_expr, joins,
    ModelJoin) change the key used for deduplication to the join signature (e.g.,
    include foreign_expr and primary_expr with target_model_name) or remove the
    dedupe entirely so you append a separate ModelJoin per foreign key; ensure each
    ModelJoin's join_pairs uses the correct foreign_expr/primary_expr pair so
    buyer_id->users.id and seller_id->users.id produce two distinct joins.

In @slayer/dbt/parser.py:

  • Around line 19-32: Clarify and enforce that _extract_ref_name only accepts
    single-argument ref('name') / ref("name"): update the docstring on
    _extract_ref_name to explicitly state only single-arg dbt ref() is supported,
    and add validation using _REF_PATTERN plus an additional broader detection regex
    (or simple search for "ref(" with commas or multiple args) to detect unsupported
    multi-argument/package-qualified forms; when such unsupported forms are found,
    raise a clear ValueError (or custom exception) from _extract_ref_name explaining
    that multi-argument/package-qualified ref() is not supported in semantic model
    YAML and must be simplified to ref('model'). Ensure references to _REF_PATTERN
    and _extract_ref_name are used so changes are easy to locate.

In @slayer/engine/enriched.py:

  • Line 58: The EnrichedMeasure currently only stores filter_sql which forces
    downstream join planning to reparse SQL; update EnrichedMeasure to retain
    structured filter metadata (e.g., the ParsedFilter instance and/or a resolved
    list of filter column identifiers) alongside filter_sql so join planning can use
    structured data; modify the construction/return points that set filter_sql (look
    for EnrichedMeasure creation sites and the variable named filter_sql) to also
    attach the ParsedFilter object or a resolved filter_columns list, and update any
    callers of EnrichedMeasure to read the new parsed metadata instead of regexing
    filter_sql.

In @slayer/engine/enrichment.py:

  • Around line 173-184: The measure-level filter resolution is incorrectly
    passing model.name to resolve_filter_columns which causes SQL to reference the
    wrong source alias; update the call in the measure filter block so
    resolve_filter_columns receives model_name_str (the same source alias used
    elsewhere) instead of model.name, keeping the
    parsed=parse_filter(measure_def.filter) and assigning filter_sql =
    resolved[0].sql unchanged; ensure you update the parameter named model_name in
    the resolve_filter_columns call to use model_name_str so CASE/WHERE clauses
    reference the actual FROM alias.

In @slayer/sql/generator.py:

  • Around line 963-964: filtered_rn_map currently uses a non-unique key
    f"{m.source_measure_name or m.name}:{m.aggregation}" which allows two filtered
    metrics with the same source/agg but different filters/time columns to clobber
    each other; change the key to include the per-measure identifier (e.g. use
    m.alias or measure.alias) so each filtered rank maps to a unique entry (update
    the key creation in the measure loop where measure_key is assigned and likewise
    adjust the other occurrence around the similar filtered_rn_map population).
  • Around line 1113-1115: The current code wraps only the {value} piece
    (col_expr) with measure.filter_sql, which alters numerators but leaves other
    formula terms (e.g., weights in weighted_avg) unfiltered; instead apply the
    filter to the entire substituted formula. Change the logic around
    measure.filter_sql/col_expr/substituted so you first build substituted =
    formula.replace("{value}", col_expr) and then, if measure.filter_sql is set,
    wrap the whole substituted expression with "CASE WHEN {measure.filter_sql} THEN
    ({substituted}) END" (referencing measure.filter_sql, col_expr, formula, and
    substituted) so all terms in the parameterized formula are filtered
    consistently.
  • Around line 232-234: The ranked subquery builder (_build_last_ranked_from) is
    only using base_from and thus omits enriched.resolved_joins so cross-model
    filters (m.filter_sql) end up in a subquery without the needed joins; update the
    flow to propagate joins into the ranked subquery: change _build_last_ranked_from
    to accept and merge enriched.resolved_joins (or accept the full enriched object)
    and apply those joins when constructing the from_clause subquery (so the
    injected joins from _generate_base still match), and adjust any code paths that
    build rn_suffix_map/filtered_rn_map to use the new from_clause with joins; also
    ensure _generate_base’s join injection logic recognizes when from_clause is a
    subquery and either injects joins into that subquery or prevents double-wrapping
    so filtered first/last filters reference valid tables.

In @tests/test_sql_generator.py:

  • Around line 1322-1328: The test's final assertion is vacuous because it ORs
    the unwanted "_last_rn " check with the already-verified "_last_rn_f0" presence;
    change the assertion to explicitly ensure the plain shared alias "_last_rn" does
    not appear in the generated SQL (i.e., replace the line using assert "_last_rn " not in sql or "_last_rn_f0" in sql with a direct negative check like assert "_last_rn" not in sql), keeping the earlier checks for "_last_rn_f0" and the
    CASE WHEN fragment intact so the test fails if the unfiltered "_last_rn" alias
    is emitted.

Nitpick comments:
In @docs/dbt/dbt_import.md:

  • Around line 124-128: The fenced code block containing Jinja snippets (uses
    Dimension, TimeDimension, Entity) lacks a language specifier; change the
    opening fence to ```jinja so the Jinja/templating code is syntax-highlighted
    and complies with markdownlint.
  • Around line 154-163: The fenced CLI example for the "slayer import-dbt"
    command is missing a language specifier; change the opening fence to ```text
    so the block is rendered as plain text and formatted consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e18cc87-13b0-4188-8755-0c4d961d2f7d

📥 Commits

Reviewing files that changed from the base of the PR and between d7bbae7 and 5606f9f.

📒 Files selected for processing (23)
  • docs/concepts/models.md
  • docs/dbt/dbt_import.md
  • docs/dbt/slayer_vs_dbt.md
  • slayer/cli.py
  • slayer/core/models.py
  • slayer/dbt/__init__.py
  • slayer/dbt/converter.py
  • slayer/dbt/entities.py
  • slayer/dbt/filters.py
  • slayer/dbt/models.py
  • slayer/dbt/parser.py
  • slayer/engine/enriched.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • slayer/sql/generator.py
  • tests/integration/test_integration.py
  • tests/test_dbt_converter.py
  • tests/test_dbt_entities.py
  • tests/test_dbt_filters.py
  • tests/test_dbt_parser.py
  • tests/test_models.py
  • tests/test_sql_generator.py

Comment thread slayer/cli.py
Comment thread slayer/dbt/converter.py Outdated
Comment thread slayer/dbt/converter.py Outdated
Comment thread slayer/dbt/entities.py Outdated
Comment thread slayer/dbt/parser.py Outdated
Comment thread slayer/engine/enrichment.py
Comment thread slayer/sql/generator.py Outdated
Comment thread slayer/sql/generator.py Outdated
Comment thread slayer/sql/generator.py
Comment thread tests/test_sql_generator.py Outdated
ZmeiGorynych and others added 2 commits April 16, 2026 10:56
Opt-in via `slayer import-dbt --include-hidden-models` (requires new
`dbt` extra). After the existing semantic-model pass, the converter walks
`target/manifest.json` (running `dbt parse` via the optional dbt-core
dep if needed), finds every regular model not referenced by a
`semantic_model`, introspects its materialized table through the shared
`introspect_table_to_model` helper, and registers the result with
`hidden=True` so it stays out of discovery surfaces but remains
queryable by name. Model/column descriptions from the manifest are
overlaid onto the introspected dims/measures. Introspection failures
and name collisions with semantic models are warned about and skipped
individually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ZmeiGorynych ZmeiGorynych changed the title from "Add dbt semantic layer ingestion module" to "dbt Semantic Layer ingestion (with optional hidden-model import)" Apr 16, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
slayer/sql/generator.py (1)

938-964: Filtered ROW_NUMBER implementation looks correct.

The approach of using ORDER BY CASE WHEN {filter} THEN 0 ELSE 1 END, {order} to push non-matching rows to the bottom of the ranking is sound. The seen_filters cache correctly deduplicates identical (filter_sql, time_col, agg) combinations.

One edge case worth noting: if two filtered measures have the same source_measure_name and aggregation but different filters, the measure_key at line 963 would collide and only one mapping would survive in filtered_rn_map. Consider whether using measure.alias (which is unique per enriched measure) as the key would be safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 938 - 964, The current filtered_rn_map
uses measure_key = f"{m.source_measure_name or m.name}:{m.aggregation}" which
can collide when two measures share source_measure_name and aggregation but have
different filters; update the mapping to use a unique per-measure identifier
(e.g., m.alias or another unique property on the EnrichedMeasure) as the key
when assigning filtered_rn_map so each enriched measure maps to its correct
ROW_NUMBER alias; adjust any downstream lookups that expect the old key to use
the new unique key (references: filtered_rn_map, measure_key, enriched.measures,
and m.alias).
slayer/dbt/manifest.py (1)

36-42: Consider explicit UTF-8 encoding for manifest loading.

The open(path) call uses the system default encoding. While JSON files are typically UTF-8, explicitly specifying the encoding improves robustness:

```diff
 def _load_manifest_file(path: str) -> Optional[dict]:
     try:
-        with open(path) as f:
+        with open(path, encoding="utf-8") as f:
             return json.load(f)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/dbt/manifest.py` around lines 36 - 42, The _load_manifest_file
function opens the manifest with the system default encoding; change the open
call in _load_manifest_file to explicitly specify UTF-8 (e.g., open(path, "r",
encoding="utf-8")) so json.load reads the file with a deterministic encoding and
avoid silent decode issues when loading the dbt manifest.
tests/test_dbt_converter.py (1)

77-77: Prefer model lookup by name over positional index in assertions

Using result.models[0] makes tests brittle to benign ordering changes in converter output. Resolving by model name keeps tests deterministic and intention-revealing.

Refactor pattern

```diff
+def _model_by_name(models, name: str):
+    return next(m for m in models if m.name == name)

-        orders = result.models[0]
+        orders = _model_by_name(result.models, "orders")
```

Also applies to: 87-87, 95-95, 104-104, 113-113, 121-121, 143-143, 168-168, 184-184, 212-212, 241-241, 352-352

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_dbt_converter.py` at line 77, Tests currently access models via
positional indices (e.g., orders = result.models[0]), which is brittle; update
each test to resolve models by their name instead of by index by locating the
model in result.models where model.name equals the expected name (for example
replace the positional lookup used for "orders" and the other occurrences at the
commented indices with a name-based lookup or a small helper that maps
result.models by .name), so assertions reference the correct model
deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_dbt_converter.py`:
- Line 147: The current test uses a loose assertion "assert m.sql == 'amount' or
m.name == 'amount'" which can pass if only the name matches; tighten it to
assert both properties match the expected value. Update the assertion for the
consolidated measure (variable m) so it validates m.sql equals "amount" and
m.name equals "amount" (or assert the tuple (m.sql, m.name) == ("amount",
"amount")) to prevent false positives in the converter test.

---

Nitpick comments:
In `@slayer/dbt/manifest.py`:
- Around line 36-42: The _load_manifest_file function opens the manifest with
the system default encoding; change the open call in _load_manifest_file to
explicitly specify UTF-8 (e.g., open(path, "r", encoding="utf-8")) so json.load
reads the file with a deterministic encoding and avoid silent decode issues when
loading the dbt manifest.

In `@slayer/sql/generator.py`:
- Around line 938-964: The current filtered_rn_map uses measure_key =
f"{m.source_measure_name or m.name}:{m.aggregation}" which can collide when two
measures share source_measure_name and aggregation but have different filters;
update the mapping to use a unique per-measure identifier (e.g., m.alias or
another unique property on the EnrichedMeasure) as the key when assigning
filtered_rn_map so each enriched measure maps to its correct ROW_NUMBER alias;
adjust any downstream lookups that expect the old key to use the new unique key
(references: filtered_rn_map, measure_key, enriched.measures, and m.alias).

In `@tests/test_dbt_converter.py`:
- Line 77: Tests currently access models via positional indices (e.g., orders =
result.models[0]), which is brittle; update each test to resolve models by their
name instead of by index by locating the model in result.models where model.name
equals the expected name (for example replace the positional lookup used for
"orders" and the other occurrences at the commented indices with a name-based
lookup or a small helper that maps result.models by .name), so assertions
reference the correct model deterministically.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5047ddfe-66bd-44ce-9a4a-3e475dbc0c3b

📥 Commits

Reviewing files that changed from the base of the PR and between 5606f9f and 8504109.

📒 Files selected for processing (17)
  • README.md
  • docs/concepts/ingestion.md
  • docs/dbt/dbt_import.md
  • docs/interfaces/cli.md
  • docs/reference/cli.md
  • pyproject.toml
  • slayer/cli.py
  • slayer/dbt/converter.py
  • slayer/dbt/manifest.py
  • slayer/dbt/models.py
  • slayer/dbt/parser.py
  • slayer/engine/ingestion.py
  • slayer/sql/generator.py
  • tests/test_dbt_converter.py
  • tests/test_dbt_manifest.py
  • tests/test_dbt_parser.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (4)
  • docs/concepts/ingestion.md
  • docs/reference/cli.md
  • README.md
  • docs/interfaces/cli.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • slayer/dbt/converter.py
  • tests/test_sql_generator.py
  • slayer/dbt/models.py

…ev-1251-dbt-ingestion

# Conflicts:
#	pyproject.toml
#	tests/test_sql_generator.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
slayer/cli.py (1)

437-442: ⚠️ Potential issue | 🟠 Major

queries.yaml still assumes --storage is a directory.

If the user imports into SQLite storage, Line 440 builds <db-file>/queries.yaml, which fails as soon as converted metrics produce query output. This was already called out on the earlier revision and still looks unresolved.

Suggested fix
     if result.queries:
         storage_path = args.storage or args.models_dir or _STORAGE_DEFAULT
+        if storage_path.endswith((".db", ".sqlite")):
+            storage_path = os.path.dirname(storage_path) or "."
         queries_path = os.path.join(storage_path, "queries.yaml")
         with open(queries_path, "w") as f:
             _yaml.dump(result.queries, f, sort_keys=False, default_flow_style=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/cli.py` around lines 437 - 442, The code assumes storage_path is a
directory when building queries_path, causing failures if args.storage points to
a file (e.g., SQLite DB). Change the logic that computes
storage_path/queries_path so that if the chosen storage (args.storage or
args.models_dir or _STORAGE_DEFAULT) is not a directory
(os.path.isdir(storage_path) is False) you use its parent directory
(os.path.dirname(storage_path) or fallback to current dir) before
os.path.join("queries.yaml"); update the block around
result.queries/queries_path/open/_yaml.dump to compute queries_path from that
normalized directory. Ensure you reference result.queries, storage_path,
args.storage, args.models_dir, _STORAGE_DEFAULT, and queries_path when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/cli.py`:
- Around line 402-428: The code calls storage.get_datasource and
storage.save_model directly although storage APIs are async; wrap those calls
with the sync bridge utility (run_sync) so coroutines are executed immediately.
Specifically, replace direct calls to storage.get_datasource(args.datasource)
and storage.save_model(model) with run_sync(storage.get_datasource,
args.datasource) and run_sync(storage.save_model, model) respectively (or a
single run_sync call that consumes the list), keeping the existing
DbtToSlayerConverter usage and sa_engine dispose logic intact.

In `@tests/test_sql_generator.py`:
- Around line 62-79: The test test_numeric_literal_measure is synchronous but
calls the async helper _generate; change the test signature to async def
test_numeric_literal_measure(...) and await the _generate(...) call (await
_generate(generator=generator, query=query, model=model)) so the coroutine
executes; also pass arguments to _generate using keyword names to follow project
conventions and ensure the assertions check the actual generated SQL from SUM(1)
and not a coroutine.

---

Duplicate comments:
In `@slayer/cli.py`:
- Around line 437-442: The code assumes storage_path is a directory when
building queries_path, causing failures if args.storage points to a file (e.g.,
SQLite DB). Change the logic that computes storage_path/queries_path so that if
the chosen storage (args.storage or args.models_dir or _STORAGE_DEFAULT) is not
a directory (os.path.isdir(storage_path) is False) you use its parent directory
(os.path.dirname(storage_path) or fallback to current dir) before
os.path.join("queries.yaml"); update the block around
result.queries/queries_path/open/_yaml.dump to compute queries_path from that
normalized directory. Ensure you reference result.queries, storage_path,
args.storage, args.models_dir, _STORAGE_DEFAULT, and queries_path when making
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5ac2d277-f2ba-4dce-a85b-d96641a8f8cd

📥 Commits

Reviewing files that changed from the base of the PR and between 8504109 and 914ffbb.

📒 Files selected for processing (3)
  • pyproject.toml
  • slayer/cli.py
  • tests/test_sql_generator.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

ZmeiGorynych and others added 8 commits April 16, 2026 14:51
Three Major correctness fixes from CodeRabbit review on PR #31:

* slayer/dbt/converter.py: consolidated dbt measures lost their SQL
  expression because measure_name was assigned to expr_key just before
  the (expr_key != measure_name) test that gated sql assignment. Now
  measure_name comes from the first dbt measure in the group, and the
  shared expr is preserved as sql.
* slayer/engine/enrichment.py: measure-level filter resolution now
  qualifies columns with model_name_str (the source alias) instead of
  model.name, matching every other call site in the function. Fixes
  filtered measures over named-query / aliased sources generating
  SQL that references a table not present in the FROM clause.
* slayer/dbt/entities.py: distinct foreign keys to the same target
  (e.g. buyer_id and seller_id both pointing at users.id) no longer
  collapse into a single ModelJoin. Dedup is now keyed by the full
  join signature (target, fk_expr, pk_expr) so each relationship is
  preserved.
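The multi-FK dedup fix can be illustrated with plain tuples (illustrative data, not the real ModelJoin objects):

```python
# Two distinct foreign keys from the same source to the same target model.
fk_refs = [
    ("users", "orders.buyer_id", "users.id"),
    ("users", "orders.seller_id", "users.id"),
]

# Old behaviour: dedup keyed on the target model alone collapses
# buyer and seller into a single join.
old_joins = {target: (target, fk, pk) for target, fk, pk in fk_refs}

# New behaviour: keying by the full join signature
# (target, fk_expr, pk_expr) keeps each relationship.
new_joins = {(target, fk, pk) for target, fk, pk in fk_refs}

assert len(old_joins) == 1
assert len(new_joins) == 2
```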

Also tightens tests/test_dbt_converter.py:147 (CodeRabbit #12) which
was masking the converter bug, and adds regression tests for the
filter-alias and multi-FK cases. Drive-by: makes test_numeric_literal_measure
async (it was silently asserting on an unawaited coroutine).

Full pytest suite: 654 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ev-1251-dbt-ingestion

# Conflicts:
#	pyproject.toml
Three Major correctness fixes + one paired test fix from CodeRabbit
review on PR #31, all in slayer/sql/generator.py around the
filtered first/last and weighted_avg machinery:

* Parameterized formula filters now wrap every row-level reference
  in CASE WHEN, not just {value}. Previously `weighted_avg` on a
  filtered measure produced SUM(CASE WHEN filter THEN value END * weight)
  / SUM(weight) — numerator filtered, denominator summing all weights.
  Now both numerator and denominator share the same filter mask, so
  the weighted_avg correctly normalizes over the filtered subset.
* filtered_rn_map is keyed by measure.alias (unique per enriched
  measure) instead of "{source_measure_name|name}:{aggregation}". Two
  filtered metrics with the same source/agg but different filters or
  time columns no longer clobber each other in the rank-column map.
* _build_last_ranked_from() now applies enriched.resolved_joins inside
  the ranked subquery. Previously cross-model filters on filtered
  first/last (e.g. customers.status = 'active') would reference a
  table not present in the subquery's FROM. The outer string-level
  join injection in _generate_base is now suppressed when the FROM
  clause is a ranked subquery to avoid duplicate joins.
* Drive-by: the unfiltered _last_rn / _first_rn columns are no longer
  emitted when only filtered first/last measures are present (caught
  by tightening the previously-vacuous test assertion in
  test_filtered_last_generates_dedicated_rn).
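The weighted_avg fix from the first bullet can be sketched as a small SQL builder (a simplified stand-in for the real generator code; names here are illustrative):

```python
from typing import Optional

def weighted_avg_sql(value: str, weight: str, filter_sql: Optional[str] = None) -> str:
    """Build a weighted-average aggregate. With a measure filter, the same
    CASE WHEN mask wraps both the numerator terms and the denominator
    weight, so the average normalizes over the filtered subset only."""
    def mask(expr: str) -> str:
        return f"CASE WHEN {filter_sql} THEN {expr} END"

    if filter_sql is None:
        return f"SUM({value} * {weight}) / SUM({weight})"
    return f"SUM({mask(f'{value} * {weight}')}) / SUM({mask(weight)})"

sql = weighted_avg_sql(value="price", weight="qty", filter_sql="status = 'paid'")
assert sql == (
    "SUM(CASE WHEN status = 'paid' THEN price * qty END) "
    "/ SUM(CASE WHEN status = 'paid' THEN qty END)"
)
```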

Also tightens tests/test_sql_generator.py:1352 (CodeRabbit #11) which
was always-True, and adds three regression tests:
test_filtered_weighted_avg_filters_both_terms,
test_two_filtered_lasts_same_source_different_filters_dont_collide,
test_filtered_last_with_cross_model_filter_carries_join.

Includes docs/dbt/slayer_vs_dbt.md (carried across the previous
poetry-lock churn).

Full pytest: 657 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
One Major + two Minor + one Nitpick from CodeRabbit review on PR #31:

* slayer/dbt/converter.py: derived-metric formula substitution now uses
  word-boundary regex (`re.sub(r"\b...\b", ...)`) instead of plain
  str.replace. A metric named `total` no longer mutates `subtotal` or
  `total_orders` elsewhere in the same expression. Replacement string
  has its backslashes escaped so any literal "\\1" in the resolved
  colon expression is treated as text, not a regex backreference.
* slayer/dbt/parser.py: _REF_PATTERN now handles all three dbt ref()
  forms — single-arg `ref('name')`, package-qualified `ref('pkg', 'name')`,
  and versioned `ref('name', v=1)`. _extract_ref_name picks the SECOND
  positional string arg when present (package-qualified case) and falls
  back to the first.
* slayer/cli.py: `slayer import-dbt --storage slayer.db` no longer tries
  to write `slayer.db/queries.yaml`. New helper _queries_dir_for_storage
  detects SQLite file paths (.db / .sqlite / .sqlite3) and writes
  queries.yaml beside the file. Also adds explicit utf-8 encoding and
  os.makedirs to be defensive about target dir existence.
* slayer/dbt/manifest.py: explicit utf-8 encoding when loading
  manifest.json (CodeRabbit nitpick N3).
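The token-aware substitution from the first bullet can be sketched like this (using a replacement callable, which sidesteps backreference handling entirely; the actual commit escapes backslashes in the replacement string instead):

```python
import re

def substitute_metric(formula: str, name: str, resolved: str) -> str:
    """Replace whole-token occurrences of a metric name only.
    The lambda replacement means any literal backslash in `resolved`
    is inserted as-is, never interpreted as a regex backreference."""
    return re.sub(rf"\b{re.escape(name)}\b", lambda _m: resolved, formula)

# Plain str.replace would also have mutated 'subtotal' and 'total_orders'.
out = substitute_metric("total + subtotal - total_orders", "total", "(a + b)")
assert out == "(a + b) + subtotal - total_orders"
```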

Adds 8 new tests: 1 token-aware metric replacement, 4 ref() forms,
3 queries-dir resolver cases.
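A rough approximation of the three ref() forms the new pattern has to cover (this regex is a simplified illustration, not the actual _REF_PATTERN):

```python
import re
from typing import Optional

# Simplified: capture everything inside ref(...), then pull out the
# positional string args; keyword args like v=1 carry no quotes and
# are ignored by the string-literal scan.
_REF_RE = re.compile(r"ref\(\s*([^)]*)\)")

def extract_ref_name(sql: str) -> Optional[str]:
    m = _REF_RE.search(sql)
    if not m:
        return None
    args = re.findall(r"['\"]([^'\"]+)['\"]", m.group(1))
    # Package-qualified form: the SECOND positional string is the model
    # name; otherwise fall back to the first.
    return args[1] if len(args) > 1 else (args[0] if args else None)

assert extract_ref_name("select * from {{ ref('orders') }}") == "orders"
assert extract_ref_name("{{ ref('pkg', 'orders') }}") == "orders"
assert extract_ref_name("{{ ref('orders', v=1) }}") == "orders"
```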

Full pytest: 665 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CodeRabbit #6 (Major). Replace regex-based join discovery with the
structured ParsedFilter data already produced by resolve_filter_columns.

Previously _resolve_joins regexed m.filter_sql with _TABLE_COL_RE to
find cross-model column references the join planner needed to satisfy.
That pattern (\b\w+\.\w+\b) matches anything that looks like
table.column — including substrings inside string literals
("description LIKE '%foo.bar%'" → spurious join to a 'foo' table),
function calls written with dots, and any other dotted token in the
rendered SQL. The result was unwanted LEFT JOINs that could change
result cardinality or query performance.
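The false positive is straightforward to demonstrate with the pattern itself:

```python
import re

_TABLE_COL_RE = re.compile(r"\b\w+\.\w+\b")

filter_sql = "description LIKE '%foo.bar%' AND customers.status = 'active'"
matches = _TABLE_COL_RE.findall(filter_sql)

# The regex cannot tell a real column reference from a dotted substring
# inside a string literal: 'foo.bar' looks like table.column and would
# have pulled a spurious join to a 'foo' table alongside the legitimate
# customers.status reference.
assert matches == ["foo.bar", "customers.status"]
```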

Changes:
* slayer/engine/enriched.py: add filter_columns: List[str] to
  EnrichedMeasure (default empty list).
* slayer/engine/enrichment.py: in _ensure_aggregated_measure, capture
  resolved[0].columns alongside resolved[0].sql when resolving the
  measure-level filter.
* slayer/engine/enrichment.py: in _resolve_joins, iterate
  m.filter_columns (qualified column names from ParsedFilter) instead
  of regexing m.filter_sql. Only real column refs feed needed_tables.

Adds test_filter_with_dotted_string_literal_does_not_pull_spurious_join
which confirms a measure filter like "url LIKE 'foo.bar%'" no longer
pulls in an unrelated 'foo' join target.

Full pytest: 666 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs/dbt/dbt_import.md: tag the Jinja filter snippet block as
  `jinja` and the CLI usage block as `text` so markdownlint stops
  flagging them and IDE/web previews get appropriate (or absent)
  syntax highlighting.
* tests/test_dbt_converter.py: replace positional result.models[0]
  with name-based lookup `next(m for m in result.models if
  m.name == "orders")` in TestBasicConversion. The project under
  test has two models so the test is order-dependent today; the
  name-based lookup is robust if the converter ever changes
  iteration order. Single-model test cases (TestMeasureConsolidation
  etc.) keep [0] since order can't matter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous Add poetry.lock commit (32a2b9a) bundled an unresolved
merge state — eight `<<<<<<< Updated upstream` / `>>>>>>> Stashed
changes` blocks left over from a `git stash pop` after pulling main.
Poetry refused to read the file (Invalid statement at line 315),
which broke CI on `poetry install -E all --with dev`.

Regenerated cleanly from the current pyproject.toml. `poetry install
-E all --with dev` succeeds locally; quick pytest sanity check on
the dbt + filtered-measure suites passes (39 tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
docs/dbt/dbt_import.md (1)

73-76: Consolidation example uses the wrong resulting measure name.

In the current converter (slayer/dbt/converter.py:121-139), consolidated measures keep the first dbt measure name and set sql to the shared expression. This example should not output name: amount; it should be name: revenue_sum (with sql: amount).

Suggested doc fix
 # SLayer:
-- name: amount
+- name: revenue_sum
   sql: amount
   allowed_aggregations: [sum, avg]
   description: "dbt measures: revenue_sum (sum), revenue_avg (average)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/dbt/dbt_import.md` around lines 73 - 76, The consolidation currently
keeps the first dbt field name and overwrites sql with the shared expression;
change the consolidation logic in slayer/dbt/converter.py (the function that
merges/consolidates measures—e.g., the convert_measures/consolidate_measures
routine around lines 121-139) so the consolidated metric uses the derived
aggregation-specific name (e.g., revenue_sum, revenue_avg) instead of the
original column name (amount) while still setting sql to the shared expression;
specifically, choose the consolidated measure name by appending the aggregation
key to the base measure name (or by selecting the original dbt measure name that
contains the aggregation suffix), set sql to the common expression, and preserve
allowed_aggregations accordingly.
tests/test_sql_generator.py (2)

1456-1456: Hoist enrich_query to module scope in these regressions.

The same symbol is imported locally in three new tests. Moving it to the top of the file keeps the imports consistent and removes duplication.

As per coding guidelines, "Place imports at the top of files".

Also applies to: 1535-1535, 1627-1627

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_sql_generator.py` at line 1456, Hoist the local import of
enrich_query to module scope by adding "from slayer.engine.enrichment import
enrich_query" to the top-level imports in tests/test_sql_generator.py, then
remove the duplicate local imports of enrich_query inside the three tests that
currently import it locally so those tests simply call enrich_query directly.

76-77: Prefer keyword args in the new _generate call.

This is the only new call site added in the hunk, so it is easy to align with the repo convention now.

As per coding guidelines, "Use keyword arguments for functions with more than 1 parameter".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_sql_generator.py` around lines 76 - 77, The new call to the helper
_generate should use keyword arguments to match repo conventions: change the
invocation that currently passes positional args to call _generate with named
parameters (e.g., _generate(generator=..., query=..., model=...)) where the
current variables are generator, query, and model; update the single call in
tests/test_sql_generator.py near SlayerQuery and Field so it reads the
keyword-argument form.
tests/test_dbt_converter.py (1)

381-398: Hoist _queries_dir_for_storage to module scope.

These three tests import the same helper locally. Moving it to the top keeps the new code aligned with the repo import rule and removes repetition.

As per coding guidelines, "Place imports at the top of files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_dbt_converter.py` around lines 381 - 398, Move the repeated local
imports of _queries_dir_for_storage to a single module-level import at the top
of the test file: add "from slayer.cli import _queries_dir_for_storage" in the
file header and remove the per-test local imports inside
test_directory_storage_path_returned_as_is,
test_sqlite_db_uses_parent_directory, and test_bare_sqlite_filename_in_cwd so
all tests use the same module-scope symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/examples/01_dynamic/dynamic.md`:
- Line 3: Fix the subject-verb agreement in the opening sentence of the doc:
change "SLayer's intended audience are **agents and humans formulating ad hoc
queries**" to "SLayer's intended audience is **agents and humans formulating ad
hoc queries**" so the noun "audience" correctly pairs with the singular verb
"is".

In `@slayer/dbt/parser.py`:
- Line 128: parse_dbt_project always triggers manifest loading because it
unconditionally calls _parse_regular_models(), and _parse_regular_models()
unconditionally calls load_or_generate_manifest(); move manifest loading behind
the "--include-hidden-models" opt-in by either (a) only calling
_parse_regular_models() from parse_dbt_project when include_hidden_models is
true, or (b) add an include_hidden_models boolean parameter to
_parse_regular_models() and inside that function only call
load_or_generate_manifest() when that parameter is true; update all callers
accordingly (including the other call-site region noted) so a plain
semantic-layer import does not invoke load_or_generate_manifest() unless the
hidden-model flag is set.

In `@slayer/engine/enrichment.py`:
- Around line 178-186: The filtered measure resolution updates filter_sql using
model_name_str via resolve_filter_columns but does not update the corresponding
EnrichedMeasure's model_name, causing mixed aliases when query.source_model !=
model.name; update the EnrichedMeasure (and any other enriched objects created
in the same block, including the similar code at 188-201) to use model_name_str
(or otherwise propagate the resolved alias) after calling resolve_filter_columns
so that filter_sql, filter_columns and the EnrichedMeasure.model_name (and any
related fields used by slayer/sql/generator.py::_build_agg()) are consistent and
reference the same alias.

In `@slayer/sql/generator.py`:
- Around line 978-987: Filtered first/last measures currently leak joined-table
aliases because _build_agg() reuses measure.filter_sql in the outer aggregate
while _build_last_ranked_from() only projects model.*, _td_*, and
_first/_last_rn*; fix by having the ranked subquery (the code building
ranked_sql when enriched.resolved_joins is present) emit a per-measure boolean
flag column (e.g., <measure>_match) computed from the original
measure.filter_sql, include that flag in the SELECT projection produced by
_build_last_ranked_from(), and update _build_agg() to test this projected
<measure>_match flag in the outer MAX(CASE ...) instead of injecting the
original filter_sql expression so no joined aliases leak out.

---

Nitpick comments:
In `@docs/dbt/dbt_import.md`:
- Around line 73-76: The consolidation currently keeps the first dbt field name
and overwrites sql with the shared expression; change the consolidation logic in
slayer/dbt/converter.py (the function that merges/consolidates measures—e.g.,
the convert_measures/consolidate_measures routine around lines 121-139) so the
consolidated metric uses the derived aggregation-specific name (e.g.,
revenue_sum, revenue_avg) instead of the original column name (amount) while
still setting sql to the shared expression; specifically, choose the
consolidated measure name by appending the aggregation key to the base measure
name (or by selecting the original dbt measure name that contains the
aggregation suffix), set sql to the common expression, and preserve
allowed_aggregations accordingly.

In `@tests/test_dbt_converter.py`:
- Around line 381-398: Move the repeated local imports of
_queries_dir_for_storage to a single module-level import at the top of the test
file: add "from slayer.cli import _queries_dir_for_storage" in the file header
and remove the per-test local imports inside
test_directory_storage_path_returned_as_is,
test_sqlite_db_uses_parent_directory, and test_bare_sqlite_filename_in_cwd so
all tests use the same module-scope symbol.

In `@tests/test_sql_generator.py`:
- Line 1456: Hoist the local import of enrich_query to module scope by adding
"from slayer.engine.enrichment import enrich_query" to the top-level imports in
tests/test_sql_generator.py, then remove the duplicate local imports of
enrich_query inside the three tests that currently import it locally so those
tests simply call enrich_query directly.
- Around line 76-77: The new call to the helper _generate should use keyword
arguments to match repo conventions: change the invocation that currently passes
positional args to call _generate with named parameters (e.g.,
_generate(generator=..., query=..., model=...)) where the current variables are
generator, query, and model; update the single call in
tests/test_sql_generator.py near SlayerQuery and Field so it reads the
keyword-argument form.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 961098a7-8bc4-4ac5-a76c-b4ff456a3f8a

📥 Commits

Reviewing files that changed from the base of the PR and between 914ffbb and 32a2b9a.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • docs/dbt/dbt_import.md
  • docs/dbt/slayer_vs_dbt.md
  • docs/examples/01_dynamic/dynamic.md
  • pyproject.toml
  • slayer/cli.py
  • slayer/dbt/converter.py
  • slayer/dbt/entities.py
  • slayer/dbt/manifest.py
  • slayer/dbt/parser.py
  • slayer/engine/enriched.py
  • slayer/engine/enrichment.py
  • slayer/sql/generator.py
  • tests/test_dbt_converter.py
  • tests/test_dbt_entities.py
  • tests/test_dbt_parser.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (3)
  • slayer/engine/enriched.py
  • tests/test_dbt_parser.py
  • docs/dbt/slayer_vs_dbt.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • slayer/cli.py
  • slayer/dbt/manifest.py
  • slayer/dbt/converter.py
  • tests/test_dbt_entities.py

ZmeiGorynych and others added 2 commits April 16, 2026 16:46
Five new CodeRabbit comments after the previous batches landed:

* slayer/cli.py: bridge async storage.get_datasource and storage.save_model
  with run_sync (B6-1, Critical). Every other storage.* call site in this
  file already does so; import-dbt was the lone exception, with the result
  that --include-hidden-models always failed the datasource lookup
  (coroutine != None) and even plain `slayer import-dbt` printed
  "Imported model: ..." without ever persisting anything.
* slayer/dbt/parser.py: add keyword-only `include_regular_models: bool =
  False` to parse_dbt_project (B6-2, Major). Previously the manifest was
  always loaded, which can invoke `dbt parse` (slow) or fail noisily when
  dbt-core isn't installed — making the "optional" hidden-model import
  path non-optional. CLI passes through args.include_hidden_models.
* slayer/engine/enrichment.py: propagate model_name_str through every
  Enriched* construction site (B6-3, Major). Batch 1 fixed filter
  resolution to use model_name_str but EnrichedMeasure.model_name,
  EnrichedQuery.model_name, and the Dimension/TimeDimension helpers
  still used model.name — mismatched aliases would render invalid SQL
  in any divergent path. In production today the two are always equal
  so this was latent; my Batch 1 regression test exposes the case.
* slayer/sql/generator.py: project a per-measure boolean match-flag
  column from the ranked subquery (B6-4, Major). _build_agg() previously
  re-emitted measure.filter_sql in the OUTER MAX(CASE WHEN ...). Batch 2
  injected joins inside the subquery so the inner ROW_NUMBER OVER
  resolved, but the outer aggregate had no scope on the joined tables.
  Now _build_last_ranked_from also projects `_match_f<idx>` columns and
  the outer MAX(CASE) tests `_match_f<idx> = 1` instead.
* docs/examples/01_dynamic/dynamic.md: subject-verb typo "audience are"
  → "audience is" (B6-5).

Plus regression tests for every real-bug fix:
* TestImportDbtCli::test_models_are_persisted_to_storage (B6-1)
* TestRegularModelsOptIn::{test_default_does_not_load_manifest,
  test_opt_in_loads_manifest} (B6-2)
* TestFilteredMeasures::test_filtered_measure_source_alias_propagates_to_generated_sql (B6-3)
* TestFilteredMeasures::test_filtered_last_outer_aggregate_uses_match_flag_not_joined_filter (B6-4)

Plus also merges in origin/main (PR #34 "Variables in filters") cleanly.

Full pytest: 681 passed (was 676 baseline post-merge). ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/examples/01_dynamic/dynamic.md (1)

6-8: Optional wording trim for repetition/concision.

You repeat “at the moment you write it” across adjacent lines, and Line 62 is a bit dense; a small edit could improve flow without changing meaning.

Also applies to: 62-62

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/examples/01_dynamic/dynamic.md` around lines 6 - 8, Trim the repeated
phrase "at the moment you write it" to improve flow: keep it once in the first
sentence and reword the following sentence to something concise (e.g., "This
post tours what that means in SLayer.") and simplify the dense sentence at Line
62 for readability by breaking it into two shorter clauses or removing redundant
qualifiers; update occurrences of "at the moment you write it" accordingly.
slayer/dbt/parser.py (1)

57-58: Make YAML discovery order deterministic across nested directories.

Line 57/58 filters hidden dirs, but unsorted dirs from os.walk can produce non-deterministic traversal order. Sorting dirs keeps parse/output order stable.

Suggested diff
-        dirs[:] = [d for d in dirs if not d.startswith(".")]
+        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/dbt/parser.py` around lines 57 - 58, The os.walk traversal in the loop
(for root, dirs, files in os.walk(directory)) currently filters hidden dirs but
leaves dirs unsorted, causing non-deterministic directory traversal; change the
in-place assignment to sort the directories (e.g., dirs[:] = sorted(d for d in
dirs if not d.startswith("."))) and also sort files (e.g., files.sort() or files
= sorted(files)) before processing so YAML file discovery order is
deterministic; update the code around the root, dirs, files loop in parser.py
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/dbt/parser.py`:
- Around line 95-97: The file opens YAML using platform-default encoding which
can break on non-ASCII dbt metadata; change the open call that reads path before
yaml.safe_load to explicitly specify UTF-8 (e.g., use open(path, "r",
encoding="utf-8") ) so yaml.safe_load(f) always receives UTF-8 text; update the
with-open invocation where path is used and leave the rest of the try/except
logic intact.
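The decoding half of that fix, in isolation (the real call site passes the resulting text straight to yaml.safe_load):

```python
def read_text_utf8(path: str) -> str:
    # Explicit encoding: platform defaults (e.g. cp1252 on Windows)
    # can mangle or reject non-ASCII dbt metadata such as accented
    # column descriptions.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```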

📥 Commits

Reviewing files that changed from the base of the PR and between 32a2b9a and 76cedac.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • docs/examples/01_dynamic/dynamic.md
  • slayer/cli.py
  • slayer/dbt/parser.py
  • slayer/engine/enrichment.py
  • slayer/sql/generator.py
  • tests/test_dbt_converter.py
  • tests/test_dbt_parser.py
  • tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_dbt_converter.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • slayer/engine/enrichment.py
  • slayer/cli.py
  • slayer/sql/generator.py
  • tests/test_sql_generator.py

ZmeiGorynych and others added 3 commits April 16, 2026 17:13
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Semantic models over regular dbt models (a SELECT rather than a physical
table) are now ingested with `SlayerModel.sql = <resolved SELECT>` instead
of `SlayerModel.sql_table = <bare name>`. Callers no longer need to
pre-materialise regular models via `dbt run` before running SLayer — the
SQL is read straight from the project's `.sql` files, Jinja refs are
recursively inlined, and the resulting subquery lives on the SlayerModel
itself. Semantic models over source tables continue to use `sql_table`
exactly as before.

Changes:
- `slayer/dbt/models.py`: `DbtRegularModel` gains `raw_code: Optional[str]`
  carrying the SQL body read from disk.
- `slayer/dbt/parser.py`: new `_collect_sql_files` that walks the project
  (skipping `target/`), keyed by filename stem. `parse_dbt_project` now
  always populates `DbtRegularModel` entries from disk; the existing
  `include_regular_models` flag still gates manifest-based orphan
  discovery, but SQL scanning is free and unconditional. Removed the
  early-return when no YAMLs are present so SQL-only projects still
  yield regular models.
- `slayer/dbt/sql_resolver.py` (NEW): pure regex-based Jinja resolver.
  Strips `{{ config(...) }}`, substitutes `{{ ref('X') }}` (handling
  package-qualified and versioned forms) either by recursively inlining
  a known regular model or by substituting the bare identifier, and
  substitutes `{{ source('s','t') }}` as `s.t`. Uses strict `\w+`
  identifier captures so ref/source arguments cannot smuggle SQL tokens;
  unrecognised Jinja is left verbatim with a warning. Cycle detection
  via a visited set, plus a `max_depth` belt-and-braces cap.
- `slayer/dbt/converter.py`: `_convert_semantic_model` builds a
  `{regular_model_name: raw_code}` map at construction time and, when
  `sm.model` matches an entry, sets `SlayerModel.sql` from the resolved
  SQL body. Warnings from the resolver are attached to the converter's
  `ConversionResult` under the semantic model's name.

Tests:
- `tests/test_dbt_sql_resolver.py` (NEW): 13 tests covering ref-to-source,
  ref-to-regular-model, transitive refs, cycle detection, package-
  qualified refs, versioned refs, `{{ config() }}` stripping,
  `{{ source() }}` substitution, injection-attempt rejection, unknown-
  macro warnings, and the max-depth cap.
- `tests/test_dbt_converter_inline_sql.py` (NEW): 4 tests covering the
  semantic-over-source fallback, the semantic-over-regular-model inline
  path, transitive inlining, and a parse-then-convert round-trip from
  a tmp_path dbt project.
- `tests/test_dbt_parser.py`: new `TestParseDbtProjectSqlFiles` class
  (3 tests) asserting `.sql` scanning populates `raw_code`, works
  without the `include_regular_models` flag, and skips `target/`.

All 701 non-integration tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
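In sketch form, the core ref/source substitution described above could look roughly like this (simplified: package-qualified and versioned refs, warning collection, and `{{ config() }}` bodies with nested parentheses are omitted; the function name mirrors but is not the actual `sql_resolver` API):

```python
import re

def resolve_jinja_refs(sql: str, models: dict[str, str],
                       _seen: frozenset = frozenset(),
                       max_depth: int = 10) -> str:
    """Substitute {{ ref('X') }} / {{ source('s','t') }} in a dbt model body.

    Known regular models are inlined recursively as bare derived tables,
    so the caller's own alias still applies (FROM (inner) c). Unknown
    refs fall back to the bare identifier. Strict \\w+ captures keep
    ref/source arguments from smuggling SQL tokens.
    """
    if max_depth <= 0:
        return sql
    # {{ config(...) }} carries no SQL and is stripped.
    sql = re.sub(r"\{\{\s*config\(.*?\)\s*\}\}", "", sql, flags=re.DOTALL)

    def replace_ref(m: re.Match) -> str:
        name = m.group(1)
        body = models.get(name)
        if body is None or name in _seen:  # unknown ref, or a cycle
            return name
        inner = resolve_jinja_refs(body, models, _seen | {name}, max_depth - 1)
        return f"({inner})"  # bare derived table, no forced alias

    sql = re.sub(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}", replace_ref, sql)
    sql = re.sub(
        r"\{\{\s*source\(\s*['\"](\w+)['\"]\s*,\s*['\"](\w+)['\"]\s*\)\s*\}\}",
        r"\1.\2", sql)
    return sql
```

The visited set (`_seen`) handles cycles and the `max_depth` parameter is the belt-and-braces cap; both cause the resolver to degrade to a bare identifier rather than loop forever.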
The resolver was emitting ``(inner) AS model_ref_sub``, which collides
with the caller's own alias in ``FROM {{ ref('claim') }} c`` →
``FROM (inner) AS claim_ref_sub c`` (invalid: two consecutive aliases).

Now emits bare ``(inner)`` — the caller's alias applies directly, as in
standard SQL derived-table syntax: ``FROM (inner) c``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ZmeiGorynych added a commit to MotleyAI/dbt-llm-sl-bench that referenced this pull request Apr 17, 2026
…trix

setup_slayer.py:
  - Remove _BRIDGE_VIEWS constant and all DuckDB view creation. Bridge
    models (claim_policy_bridge, policy_holder_policy, policy_premium_detail)
    now come through as pure SLayer models with inline SQL, courtesy of
    SLayer's new dbt-ingestion generalisation (MotleyAI/slayer#31).
  - Add --no-bridges flag: selects the `main` branch of the dbt project
    (25 base semantic models, no bridges) instead of the default
    `refresh-2025-additional-models` branch (28 models incl. bridges).
  - Fix async: bridge YAMLStorage.save_model/save_datasource/list_models
    with slayer.async_utils.run_sync (they became async coroutines in
    recent slayer).

pyproject.toml:
  - Point slayer extra at the git branch carrying the inline-SQL feature:
    git+https://github.com/MotleyAI/slayer.git@egor/dev-1251-dbt-ingestion

run_bench.py:
  - Replace the GPT-5.3-Codex 6-effort sweep with a focused comparison:
    sql + slayer strategies × gpt-5.3-codex (effort=medium) + gpt-4.1-mini
    (no effort) × 11 default questions × 5 iterations = 220 LLM calls.

tests/unit/test_setup_slayer_config.py (NEW):
  - --no-bridges selects `main`; default selects `refresh-2025-...`
  - load_csvs_into_duckdb produces tables only, zero DuckDB views
  - Empty CSV dir raises FileNotFoundError

All 122 non-integration tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
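`run_sync` above refers to slayer's own `slayer.async_utils` helper, which isn't shown here; a rough stand-in with the same shape (assuming no event loop is already running in the calling thread) would be:

```python
import asyncio
from typing import Any, Coroutine

def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    # Execute a coroutine from synchronous code. asyncio.run raises
    # RuntimeError if called while a loop is already running, so this
    # only suits top-level sync call sites like the bench setup script.
    return asyncio.run(coro)
```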
…-in-mcp-server' into egor/dev-1251-dbt-ingestion
…-in-mcp-server' into egor/dev-1251-dbt-ingestion

# Conflicts:
#	README.md
@ZmeiGorynych ZmeiGorynych merged commit f9e9604 into main Apr 17, 2026
3 checks passed