dbt Semantic Layer ingestion (with optional hidden-model import) #31

ZmeiGorynych merged 21 commits into main
Conversation
New `slayer import-dbt` command that reads dbt semantic layer YAML definitions and converts them to SLayer models. Key features:

- Parses dbt-core format (semantic_models: list, metrics: list)
- Resolves entity relationships (primary/foreign) to explicit SLayer joins
- Converts Jinja filter syntax to plain SLayer filter strings
- Consolidates measures with the same expr but different aggs into one measure
- Simple metrics with filters become filtered SLayer measures (CASE WHEN)
- Derived/ratio/cumulative metrics generate SlayerQuery defs in queries.yaml
- Preserves labels and descriptions throughout

Module structure: slayer/dbt/ with models, parser, entities, filters, converter. 46 tests with inline fixtures (no external dependencies). Also adds docs/dbt/ with a comparison doc (slayer_vs_dbt.md) and a detailed conversion-process doc (dbt_import.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
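The Jinja-filter translation listed above can be sketched as a small rewrite pass. This is an illustrative approximation only; the function name and the entity-prefix rule are assumptions, not the PR's actual code:

```python
import re

def jinja_filter_to_plain(filter_str: str) -> str:
    """Rewrite dbt semantic-layer Jinja filter calls into plain column refs,
    e.g. {{ Dimension('orders__status') }} = 'done'  ->  status = 'done'."""
    def _name(arg: str) -> str:
        # dbt qualifies dimensions as entity__dimension; keep the last part.
        return arg.split("__")[-1]

    return re.sub(
        r"\{\{\s*(?:Dimension|TimeDimension|Entity)\(\s*'([^']+)'[^)]*\)\s*\}\}",
        lambda m: _name(m.group(1)),
        filter_str,
    )
```

Extra positional arguments (such as the grain in `TimeDimension('metric_time', 'day')`) are simply discarded by the `[^)]*` in the pattern.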
> Note: Reviews paused — it looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Added end-to-end dbt Semantic Layer ingestion: a CLI command parses dbt YAML/manifest, converts semantic_models/metrics into SLayer models/queries (with entity-based joins, filter translation, measure consolidation), optionally imports non-semantic dbt models as hidden models via SQL introspection, and persists results to storage with warnings reporting.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Parser
    participant Manifest
    participant Converter
    participant Registry
    participant Introspector
    participant Storage
    User->>CLI: slayer import-dbt <project_path> --datasource <ds> [--include-hidden-models]
    CLI->>Parser: parse_dbt_project(project_path)
    Parser-->>CLI: DbtProject (semantic_models, metrics)
    alt include_hidden_models
        CLI->>Manifest: load_or_generate_manifest(project_path)
        Manifest-->>CLI: manifest or None
    end
    CLI->>Converter: DbtToSlayerConverter.convert(project, datasource, flags)
    Converter->>Registry: EntityRegistry.build(semantic_models)
    Registry-->>Converter: registry (entity→owner mappings)
    alt include_hidden_models & sa_engine provided
        Converter->>Introspector: introspect_table_to_model(...) for orphan/regular models
        Introspector-->>Converter: hidden SlayerModel(s) or warnings
    end
    Converter-->>CLI: ConversionResult (models, queries, warnings)
    CLI->>Storage: persist SlayerModel(s)
    Storage-->>CLI: persisted
    alt queries present
        CLI->>Storage: write queries.yaml
        Storage-->>CLI: persisted
    end
    CLI-->>User: print warnings and summary (visible/hidden/query counts)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 11
🧹 Nitpick comments (2)
docs/dbt/dbt_import.md (2)
124-128: Add language specifier to fenced code block. Per markdownlint, fenced code blocks should have a language specified for syntax highlighting.

📝 Suggested fix

````diff
-```
+```jinja
 {{ Dimension('claim_amount__has_loss_payment') }} = 1
 {{ TimeDimension('metric_time', 'day') }} >= '2024-01-01'
 {{ Entity('customer_id') }} IS NOT NULL
````

🤖 Prompt for AI Agents: In docs/dbt/dbt_import.md around lines 124-128, the fenced code block containing Jinja snippets (uses Dimension, TimeDimension, Entity) lacks a language specifier; update the block delimiter to include a language (change ``` to ```jinja) so the Jinja/templating code is syntax-highlighted and complies with markdownlint.

154-163: Add language specifier to CLI reference code block.

📝 Suggested fix

````diff
-```
+```text
 slayer import-dbt <dbt_project_path> [options]

 Arguments:
   dbt_project_path          Path to dbt project root or models directory

 Options:
   --datasource NAME         SLayer datasource name for imported models (required)
   --storage PATH            Storage directory for output (default: ./slayer_data)
   --no-strict-aggregations  Allow all aggregation types (don't restrict to dbt's defined agg)
````

🤖 Prompt for AI Agents: In docs/dbt/dbt_import.md around lines 154-163, the fenced CLI example for the "slayer import-dbt" command is missing a language specifier; update the opening fence to ```text so the block is rendered as plain text, ensuring proper formatting.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.
Inline comments:
In @slayer/cli.py:
- Around line 399-405: The code writes queries.yaml to storage_path directly,
which breaks when args.storage is a SQLite file (e.g., "slayer.db") because it
tries to create "slayer.db/queries.yaml"; update the path logic before building
queries_path: if storage was provided and is a file path (not a directory) use
its directory (os.path.dirname(storage_path)) as the base; otherwise if
storage_path is a directory use it as-is; fall back to args.models_dir or
_STORAGE_DEFAULT as before. Apply this change around the block that references
result.queries, storage_path, args.storage, args.models_dir and _STORAGE_DEFAULT
so queries_path points to a directory not a file.In
@slayer/dbt/converter.py:
- Around line 374-380: The current code uses formula.replace(ref_name, resolved)
which can replace substrings inside other identifiers (e.g., "total" in
"subtotal"); update the replacement to be token-aware by matching whole
identifier tokens instead of raw substrings: find occurrences of the metric
token (use the alias or name from m_input.alias / m_input.name) in formula as
standalone identifiers (e.g., with a regex using word boundaries or by
tokenizing the expression) and replace only those matches with the resolved
value returned by self._resolve_metric_to_formula; ensure you still iterate
metric.type_params.metrics and use ref_name when building the token match so
other identifier parts are not altered.
- Around line 136-139: The code forces sql to None by assigning measure_name =
expr_key, making the expr_key != measure_name check always false; instead set
measure_name from the measure's actual name (e.g., measure.get("name") or the
original measure identifier) and only fall back to expr_key if no explicit name
exists (e.g., measure_name = measure.get("name", expr_key)); keep the existing
sql assignment logic (sql = expr_key if expr_key != measure_name else None) so
consolidated measures retain their SQL when the measure name differs from the
expression.

In @slayer/dbt/entities.py:
- Around line 93-102: The current dedupe logic using seen_targets causes
distinct FKs to the same model to collapse; in the join-building block
(variables seen_targets, target_model_name, foreign_expr, primary_expr, joins,
ModelJoin) change the key used for deduplication to the join signature (e.g.,
include foreign_expr and primary_expr with target_model_name) or remove the
dedupe entirely so you append a separate ModelJoin per foreign key; ensure each
ModelJoin's join_pairs uses the correct foreign_expr/primary_expr pair so
buyer_id->users.id and seller_id->users.id produce two distinct joins.

In @slayer/dbt/parser.py:
- Around line 19-32: Clarify and enforce that _extract_ref_name only accepts
single-argument ref('name') / ref("name"): update the docstring on
_extract_ref_name to explicitly state only single-arg dbt ref() is supported,
and add validation using _REF_PATTERN plus an additional broader detection regex
(or simple search for "ref(" with commas or multiple args) to detect unsupported
multi-argument/package-qualified forms; when such unsupported forms are found,
raise a clear ValueError (or custom exception) from _extract_ref_name explaining
that multi-argument/package-qualified ref() is not supported in semantic model
YAML and must be simplified to ref('model'). Ensure references to _REF_PATTERN
and _extract_ref_name are used so changes are easy to locate.

In @slayer/engine/enriched.py:
- Line 58: The EnrichedMeasure currently only stores filter_sql which forces
downstream join planning to reparse SQL; update EnrichedMeasure to retain
structured filter metadata (e.g., the ParsedFilter instance and/or a resolved
list of filter column identifiers) alongside filter_sql so join planning can use
structured data; modify the construction/return points that set filter_sql (look
for EnrichedMeasure creation sites and the variable named filter_sql) to also
attach the ParsedFilter object or a resolved filter_columns list, and update any
callers of EnrichedMeasure to read the new parsed metadata instead of regexing
filter_sql.

In @slayer/engine/enrichment.py:
- Around line 173-184: The measure-level filter resolution is incorrectly
passing model.name to resolve_filter_columns which causes SQL to reference the
wrong source alias; update the call in the measure filter block so
resolve_filter_columns receives model_name_str (the same source alias used
elsewhere) instead of model.name, keeping the
parsed=parse_filter(measure_def.filter) and assigning filter_sql =
resolved[0].sql unchanged; ensure you update the parameter named model_name in
the resolve_filter_columns call to use model_name_str so CASE/WHERE clauses
reference the actual FROM alias.

In @slayer/sql/generator.py:
- Around line 963-964: filtered_rn_map currently uses a non-unique key
f"{m.source_measure_name or m.name}:{m.aggregation}" which allows two filtered
metrics with the same source/agg but different filters/time columns to clobber
each other; change the key to include the per-measure identifier (e.g. use
m.alias or measure.alias) so each filtered rank maps to a unique entry (update
the key creation in the measure loop where measure_key is assigned and likewise
adjust the other occurrence around the similar filtered_rn_map population).
- Around line 1113-1115: The current code wraps only the {value} piece
(col_expr) with measure.filter_sql, which alters numerators but leaves other
formula terms (e.g., weights in weighted_avg) unfiltered; instead apply the
filter to the entire substituted formula. Change the logic around
measure.filter_sql/col_expr/substituted so you first build substituted =
formula.replace("{value}", col_expr) and then, if measure.filter_sql is set,
wrap the whole substituted expression with "CASE WHEN {measure.filter_sql} THEN
({substituted}) END" (referencing measure.filter_sql, col_expr, formula, and
substituted) so all terms in the parameterized formula are filtered
consistently.
- Around line 232-234: The ranked subquery builder (_build_last_ranked_from) is
only using base_from and thus omits enriched.resolved_joins so cross-model
filters (m.filter_sql) end up in a subquery without the needed joins; update the
flow to propagate joins into the ranked subquery: change _build_last_ranked_from
to accept and merge enriched.resolved_joins (or accept the full enriched object)
and apply those joins when constructing the from_clause subquery (so the
injected joins from _generate_base still match), and adjust any code paths that
build rn_suffix_map/filtered_rn_map to use the new from_clause with joins; also
ensure _generate_base’s join injection logic recognizes when from_clause is a
subquery and either injects joins into that subquery or prevents double-wrapping
so filtered first/last filters reference valid tables.

In @tests/test_sql_generator.py:
- Around line 1322-1328: The test's final assertion is vacuous because it ORs
the unwanted "_last_rn " check with the already-verified "_last_rn_f0" presence;
change the assertion to explicitly ensure the plain shared alias "_last_rn" does
not appear in the generated SQL (i.e., replace the line using `assert "_last_rn " not in sql or "_last_rn_f0" in sql` with a direct negative check like `assert "_last_rn" not in sql`), keeping the earlier checks for "_last_rn_f0" and the
CASE WHEN fragment intact so the test fails if the unfiltered "_last_rn" alias
is emitted.
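The token-aware substitution requested for slayer/dbt/converter.py above boils down to word-boundary matching. A minimal sketch, with a hypothetical helper name:

```python
import re

def replace_metric_token(formula: str, ref_name: str, resolved: str) -> str:
    """Replace ref_name in formula only where it appears as a whole identifier,
    so a metric named 'total' does not mutate 'subtotal' or 'total_orders'."""
    # re.escape guards metric names containing regex metacharacters; escaping
    # backslashes in the replacement avoids accidental regex backreferences.
    pattern = r"\b" + re.escape(ref_name) + r"\b"
    return re.sub(pattern, resolved.replace("\\", "\\\\"), formula)
```

Note that `\b` treats `_` as a word character, so `total_orders` is correctly left alone.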
Nitpick comments:
In @docs/dbt/dbt_import.md:
- Around line 124-128: The fenced code block containing Jinja snippets (uses Dimension, TimeDimension, Entity) lacks a language specifier; update the block delimiter to include a language (change ``` to ```jinja) so the Jinja/templating code is syntax-highlighted and complies with markdownlint.
- Around line 154-163: The fenced CLI example for the "slayer import-dbt" command is missing a language specifier; update the opening fence to ```text so the block is rendered as plain text to ensure proper formatting and syntax highlighting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9e18cc87-13b0-4188-8755-0c4d961d2f7d

📒 Files selected for processing (23)
- docs/concepts/models.md
- docs/dbt/dbt_import.md
- docs/dbt/slayer_vs_dbt.md
- slayer/cli.py
- slayer/core/models.py
- slayer/dbt/__init__.py
- slayer/dbt/converter.py
- slayer/dbt/entities.py
- slayer/dbt/filters.py
- slayer/dbt/models.py
- slayer/dbt/parser.py
- slayer/engine/enriched.py
- slayer/engine/enrichment.py
- slayer/engine/query_engine.py
- slayer/mcp/server.py
- slayer/sql/generator.py
- tests/integration/test_integration.py
- tests/test_dbt_converter.py
- tests/test_dbt_entities.py
- tests/test_dbt_filters.py
- tests/test_dbt_parser.py
- tests/test_models.py
- tests/test_sql_generator.py
Opt-in via `slayer import-dbt --include-hidden-models` (requires new `dbt` extra). After the existing semantic-model pass, the converter walks `target/manifest.json` (running `dbt parse` via the optional dbt-core dep if needed), finds every regular model not referenced by a `semantic_model`, introspects its materialized table through the shared `introspect_table_to_model` helper, and registers the result with `hidden=True` so it stays out of discovery surfaces but remains queryable by name. Model/column descriptions from the manifest are overlaid onto the introspected dims/measures. Introspection failures and name collisions with semantic models are warned about and skipped individually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
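The "find every regular model not referenced by a semantic_model" step described above can be sketched against the usual shape of dbt's manifest.json; the exact node fields the PR reads are an assumption here:

```python
def find_orphan_models(manifest: dict, semantic_model_refs: set) -> list:
    """Return names of regular dbt models not referenced by any semantic_model.

    Assumes the manifest's "nodes" mapping uses dbt's usual shape, where each
    node carries a "resource_type" and a "name".
    """
    orphans = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip seeds, tests, snapshots, etc.
        if node["name"] not in semantic_model_refs:
            orphans.append(node["name"])
    return sorted(orphans)
```

Each orphan would then be introspected and registered with `hidden=True`, per the description above.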
Actionable comments posted: 1
🧹 Nitpick comments (3)
slayer/sql/generator.py (1)
938-964: Filtered ROW_NUMBER implementation looks correct. The approach of using `ORDER BY CASE WHEN {filter} THEN 0 ELSE 1 END, {order}` to push non-matching rows to the bottom of the ranking is sound. The `seen_filters` cache correctly deduplicates identical `(filter_sql, time_col, agg)` combinations.

One edge case worth noting: if two filtered measures have the same `source_measure_name` and `aggregation` but different filters, the `measure_key` at line 963 would collide and only one mapping would survive in `filtered_rn_map`. Consider whether using `measure.alias` (which is unique per enriched measure) as the key would be safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/sql/generator.py` around lines 938 - 964, The current filtered_rn_map uses measure_key = f"{m.source_measure_name or m.name}:{m.aggregation}" which can collide when two measures share source_measure_name and aggregation but have different filters; update the mapping to use a unique per-measure identifier (e.g., m.alias or another unique property on the EnrichedMeasure) as the key when assigning filtered_rn_map so each enriched measure maps to its correct ROW_NUMBER alias; adjust any downstream lookups that expect the old key to use the new unique key (references: filtered_rn_map, measure_key, enriched.measures, and m.alias).

slayer/dbt/manifest.py (1)
36-42: Consider explicit UTF-8 encoding for manifest loading. The `open(path)` call uses the system default encoding. While JSON files are typically UTF-8, explicitly specifying the encoding improves robustness:

```diff
 def _load_manifest_file(path: str) -> Optional[dict]:
     try:
-        with open(path) as f:
+        with open(path, encoding="utf-8") as f:
             return json.load(f)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/dbt/manifest.py` around lines 36 - 42, The _load_manifest_file function opens the manifest with the system default encoding; change the open call in _load_manifest_file to explicitly specify UTF-8 (e.g., open(path, "r", encoding="utf-8")) so json.load reads the file with a deterministic encoding and avoid silent decode issues when loading the dbt manifest.

tests/test_dbt_converter.py (1)
77-77: Prefer model lookup by name over positional index in assertions. Using `result.models[0]` makes tests brittle to benign ordering changes in converter output. Resolving by model name keeps tests deterministic and intention-revealing.

Refactor pattern

```diff
+def _model_by_name(models, name: str):
+    return next(m for m in models if m.name == name)

-    orders = result.models[0]
+    orders = _model_by_name(result.models, "orders")
```

Also applies to: 87-87, 95-95, 104-104, 113-113, 121-121, 143-143, 168-168, 184-184, 212-212, 241-241, 352-352
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_dbt_converter.py` at line 77, Tests currently access models via positional indices (e.g., orders = result.models[0]), which is brittle; update each test to resolve models by their name instead of by index by locating the model in result.models where model.name equals the expected name (for example replace the positional lookup used for "orders" and the other occurrences at the commented indices with a name-based lookup or a small helper that maps result.models by .name), so assertions reference the correct model deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_dbt_converter.py`:
- Line 147: The current test uses a loose assertion "assert m.sql == 'amount' or
m.name == 'amount'" which can pass if only the name matches; tighten it to
assert both properties match the expected value. Update the assertion for the
consolidated measure (variable m) so it validates m.sql equals "amount" and
m.name equals "amount" (or assert the tuple (m.sql, m.name) == ("amount",
"amount")) to prevent false positives in the converter test.
---
Nitpick comments:
In `@slayer/dbt/manifest.py`:
- Around line 36-42: The _load_manifest_file function opens the manifest with
the system default encoding; change the open call in _load_manifest_file to
explicitly specify UTF-8 (e.g., open(path, "r", encoding="utf-8")) so json.load
reads the file with a deterministic encoding and avoid silent decode issues when
loading the dbt manifest.
In `@slayer/sql/generator.py`:
- Around line 938-964: The current filtered_rn_map uses measure_key =
f"{m.source_measure_name or m.name}:{m.aggregation}" which can collide when two
measures share source_measure_name and aggregation but have different filters;
update the mapping to use a unique per-measure identifier (e.g., m.alias or
another unique property on the EnrichedMeasure) as the key when assigning
filtered_rn_map so each enriched measure maps to its correct ROW_NUMBER alias;
adjust any downstream lookups that expect the old key to use the new unique key
(references: filtered_rn_map, measure_key, enriched.measures, and m.alias).
In `@tests/test_dbt_converter.py`:
- Line 77: Tests currently access models via positional indices (e.g., orders =
result.models[0]), which is brittle; update each test to resolve models by their
name instead of by index by locating the model in result.models where model.name
equals the expected name (for example replace the positional lookup used for
"orders" and the other occurrences at the commented indices with a name-based
lookup or a small helper that maps result.models by .name), so assertions
reference the correct model deterministically.
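The ORDER BY CASE WHEN ranking scheme discussed in the slayer/sql/generator.py comments can be illustrated in plain Python: rows matching the filter sort ahead of non-matching rows, so rank 1 always lands on the newest matching row.

```python
def rank_rows(rows, predicate, time_key):
    """Mimic ROW_NUMBER() OVER (ORDER BY CASE WHEN <filter> THEN 0 ELSE 1 END,
    <time> DESC): matching rows rank first, newest first within each bucket."""
    return sorted(rows, key=lambda r: (0 if predicate(r) else 1, -r[time_key]))

rows = [
    {"day": 1, "status": "open"},
    {"day": 2, "status": "closed"},
    {"day": 3, "status": "open"},
]
ordered = rank_rows(rows, lambda r: r["status"] == "open", "day")
# ordered[0] plays the role of the rn = 1 row: the most recent row that
# satisfies the filter; non-matching rows sink to the bottom of the ranking.
```

This is only a model of the SQL semantics, not the generator's code.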
ℹ️ Review info: Configuration used: .coderabbit.yaml | Review profile: CHILL | Plan: Pro | Run ID: 5047ddfe-66bd-44ce-9a4a-3e475dbc0c3b
📒 Files selected for processing (17)
- README.md
- docs/concepts/ingestion.md
- docs/dbt/dbt_import.md
- docs/interfaces/cli.md
- docs/reference/cli.md
- pyproject.toml
- slayer/cli.py
- slayer/dbt/converter.py
- slayer/dbt/manifest.py
- slayer/dbt/models.py
- slayer/dbt/parser.py
- slayer/engine/ingestion.py
- slayer/sql/generator.py
- tests/test_dbt_converter.py
- tests/test_dbt_manifest.py
- tests/test_dbt_parser.py
- tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (4)
- docs/concepts/ingestion.md
- docs/reference/cli.md
- README.md
- docs/interfaces/cli.md
🚧 Files skipped from review as they are similar to previous changes (3)
- slayer/dbt/converter.py
- tests/test_sql_generator.py
- slayer/dbt/models.py
…ev-1251-dbt-ingestion

# Conflicts:
#	pyproject.toml
#	tests/test_sql_generator.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
slayer/cli.py (1)
437-442: ⚠️ Potential issue | 🟠 Major

`queries.yaml` still assumes `--storage` is a directory. If the user imports into SQLite storage, Line 440 builds `<db-file>/queries.yaml`, which fails as soon as converted metrics produce query output. This was already called out on the earlier revision and still looks unresolved.

Suggested fix

```diff
 if result.queries:
     storage_path = args.storage or args.models_dir or _STORAGE_DEFAULT
+    if storage_path.endswith((".db", ".sqlite")):
+        storage_path = os.path.dirname(storage_path) or "."
     queries_path = os.path.join(storage_path, "queries.yaml")
     with open(queries_path, "w") as f:
         _yaml.dump(result.queries, f, sort_keys=False, default_flow_style=False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/cli.py` around lines 437 - 442, The code assumes storage_path is a directory when building queries_path, causing failures if args.storage points to a file (e.g., SQLite DB). Change the logic that computes storage_path/queries_path so that if the chosen storage (args.storage or args.models_dir or _STORAGE_DEFAULT) is not a directory (os.path.isdir(storage_path) is False) you use its parent directory (os.path.dirname(storage_path) or fallback to current dir) before os.path.join("queries.yaml"); update the block around result.queries/queries_path/open/_yaml.dump to compute queries_path from that normalized directory. Ensure you reference result.queries, storage_path, args.storage, args.models_dir, _STORAGE_DEFAULT, and queries_path when making the change.
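The normalization both comments describe amounts to a small path helper. A sketch, under the assumption that SQLite storage is identified purely by file extension:

```python
import os

def queries_dir_for_storage(storage_path: str) -> str:
    """Return the directory queries.yaml should be written into.

    If storage_path looks like a SQLite file (.db/.sqlite/.sqlite3), write
    beside the file; otherwise treat storage_path itself as the directory.
    """
    if storage_path.endswith((".db", ".sqlite", ".sqlite3")):
        return os.path.dirname(storage_path) or "."
    return storage_path
```

A stricter variant could use `os.path.isdir()` as the review prompt suggests, at the cost of depending on filesystem state.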
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/cli.py`:
- Around line 402-428: The code calls storage.get_datasource and
storage.save_model directly although storage APIs are async; wrap those calls
with the sync bridge utility (run_sync) so coroutines are executed immediately.
Specifically, replace direct calls to storage.get_datasource(args.datasource)
and storage.save_model(model) with run_sync(storage.get_datasource,
args.datasource) and run_sync(storage.save_model, model) respectively (or a
single run_sync call that consumes the list), keeping the existing
DbtToSlayerConverter usage and sa_engine dispose logic intact.
In `@tests/test_sql_generator.py`:
- Around line 62-79: The test test_numeric_literal_measure is synchronous but
calls the async helper _generate; change the test signature to async def
test_numeric_literal_measure(...) and await the _generate(...) call (await
_generate(generator=generator, query=query, model=model)) so the coroutine
executes; also pass arguments to _generate using keyword names to follow project
conventions and ensure the assertions check the actual generated SQL from SUM(1)
and not a coroutine.
---
Duplicate comments:
In `@slayer/cli.py`:
- Around line 437-442: The code assumes storage_path is a directory when
building queries_path, causing failures if args.storage points to a file (e.g.,
SQLite DB). Change the logic that computes storage_path/queries_path so that if
the chosen storage (args.storage or args.models_dir or _STORAGE_DEFAULT) is not
a directory (os.path.isdir(storage_path) is False) you use its parent directory
(os.path.dirname(storage_path) or fallback to current dir) before
os.path.join("queries.yaml"); update the block around
result.queries/queries_path/open/_yaml.dump to compute queries_path from that
normalized directory. Ensure you reference result.queries, storage_path,
args.storage, args.models_dir, _STORAGE_DEFAULT, and queries_path when making
the change.
ℹ️ Review info: Configuration used: .coderabbit.yaml | Review profile: CHILL | Plan: Pro | Run ID: 5ac2d277-f2ba-4dce-a85b-d96641a8f8cd
📒 Files selected for processing (3)
- pyproject.toml
- slayer/cli.py
- tests/test_sql_generator.py
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
Three Major correctness fixes from CodeRabbit review on PR #31:

* slayer/dbt/converter.py: consolidated dbt measures lost their SQL expression because measure_name was assigned to expr_key just before the (expr_key != measure_name) test that gated sql assignment. Now measure_name comes from the first dbt measure in the group, and the shared expr is preserved as sql.

* slayer/engine/enrichment.py: measure-level filter resolution now qualifies columns with model_name_str (the source alias) instead of model.name, matching every other call site in the function. Fixes filtered measures over named-query / aliased sources generating SQL that references a table not present in the FROM clause.

* slayer/dbt/entities.py: distinct foreign keys to the same target (e.g. buyer_id and seller_id both pointing at users.id) no longer collapse into a single ModelJoin. Dedup is now keyed by the full join signature (target, fk_expr, pk_expr) so each relationship is preserved.

Also tightens tests/test_dbt_converter.py:147 (CodeRabbit #12) which was masking the converter bug, and adds regression tests for the filter-alias and multi-FK cases.

Drive-by: makes test_numeric_literal_measure async (it was silently coroutine-asserting).

Full pytest suite: 654 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
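The multi-FK fix described above keys deduplication on the full join signature rather than the target model alone. Roughly (data shapes illustrative, not the actual ModelJoin type):

```python
def build_joins(foreign_keys):
    """foreign_keys: iterable of (target_model, foreign_expr, primary_expr).

    Dedup by the full signature so buyer_id->users.id and seller_id->users.id
    each produce a join, while true duplicates still collapse to one."""
    seen, joins = set(), []
    for target, fk_expr, pk_expr in foreign_keys:
        sig = (target, fk_expr, pk_expr)  # full join signature, not just target
        if sig in seen:
            continue
        seen.add(sig)
        joins.append({"target": target, "on": (fk_expr, pk_expr)})
    return joins
```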
…ev-1251-dbt-ingestion

# Conflicts:
#	pyproject.toml
Three Major correctness fixes + one paired test fix from CodeRabbit review on PR #31, all in slayer/sql/generator.py around the filtered first/last and weighted_avg machinery:

* Parameterized formula filters now wrap every row-level reference in CASE WHEN, not just {value}. Previously `weighted_avg` on a filtered measure produced SUM(CASE WHEN filter THEN value END * weight) / SUM(weight) — numerator filtered, denominator summing all weights. Now both numerator and denominator share the same filter mask, so the weighted_avg correctly normalizes over the filtered subset.

* filtered_rn_map is keyed by measure.alias (unique per enriched measure) instead of "{source_measure_name|name}:{aggregation}". Two filtered metrics with the same source/agg but different filters or time columns no longer clobber each other in the rank-column map.

* _build_last_ranked_from() now applies enriched.resolved_joins inside the ranked subquery. Previously cross-model filters on filtered first/last (e.g. customers.status = 'active') would reference a table not present in the subquery's FROM. The outer string-level join injection in _generate_base is now suppressed when the from clause is a ranked subquery to avoid duplicate joins.

* Drive-by: the unfiltered _last_rn / _first_rn columns are no longer emitted when only filtered first/last measures are present (caught by tightening the previously-vacuous test assertion in test_filtered_last_generates_dedicated_rn).

Also tightens tests/test_sql_generator.py:1352 (CodeRabbit #11) which was always-True, and adds three regression tests: test_filtered_weighted_avg_filters_both_terms, test_two_filtered_lasts_same_source_different_filters_dont_collide, test_filtered_last_with_cross_model_filter_carries_join.

Includes docs/dbt/slayer_vs_dbt.md (carried across the previous poetry-lock churn).

Full pytest: 657 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
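The weighted_avg fix — masking the whole substituted formula rather than only {value} — can be sketched as follows (names illustrative, not the generator's actual code):

```python
from typing import Optional

def apply_measure_filter(formula: str, col_expr: str, filter_sql: Optional[str]) -> str:
    """Substitute {value} first, then mask the entire expression, so every
    term in the formula (value AND weight) sees the same filter."""
    substituted = formula.replace("{value}", col_expr)
    if filter_sql:
        return f"CASE WHEN {filter_sql} THEN ({substituted}) END"
    return substituted
```

With the old approach only `col_expr` was wrapped, which is exactly the numerator/denominator mismatch the commit describes.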
One Major + two Minor + one Nitpick from CodeRabbit review on PR #31:

* slayer/dbt/converter.py: derived-metric formula substitution now uses word-boundary regex (`re.sub(r"\b...\b", ...)`) instead of plain str.replace. A metric named `total` no longer mutates `subtotal` or `total_orders` elsewhere in the same expression. Replacement string has its backslashes escaped so any literal "\\1" in the resolved colon expression is treated as text, not a regex backreference.

* slayer/dbt/parser.py: _REF_PATTERN now handles all three dbt ref() forms — single-arg `ref('name')`, package-qualified `ref('pkg', 'name')`, and versioned `ref('name', v=1)`. _extract_ref_name picks the SECOND positional string arg when present (package-qualified case) and falls back to the first.

* slayer/cli.py: `slayer import-dbt --storage slayer.db` no longer tries to write `slayer.db/queries.yaml`. New helper _queries_dir_for_storage detects SQLite file paths (.db / .sqlite / .sqlite3) and writes queries.yaml beside the file. Also adds explicit utf-8 encoding and os.makedirs to be defensive about target dir existence.

* slayer/dbt/manifest.py: explicit utf-8 encoding when loading manifest.json (CodeRabbit nitpick N3).

Adds 8 new tests: 1 token-aware metric replacement, 4 ref() forms, 3 queries-dir resolver cases.

Full pytest: 665 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
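The three ref() forms could be matched along these lines; this is a sketch, not the PR's actual _REF_PATTERN:

```python
import re

_REF_ARGS = re.compile(r"ref\(\s*([^)]*)\)")
_STR = re.compile(r"""['"]([^'"]+)['"]""")

def extract_ref_name(expr: str):
    """Handle ref('name'), ref('pkg', 'name'), and ref('name', v=1):
    take the second positional string when two are present (package-qualified),
    otherwise the first."""
    m = _REF_ARGS.search(expr)
    if not m:
        return None
    # Positional string args only; keyword args like v=1 contain '='.
    parts = [p.strip() for p in m.group(1).split(",")]
    strings = [_STR.match(p).group(1) for p in parts if "=" not in p and _STR.match(p)]
    if not strings:
        return None
    return strings[1] if len(strings) == 2 else strings[0]
```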
CodeRabbit #6 (Major). Replace regex-based join discovery with the structured ParsedFilter data already produced by resolve_filter_columns.

Previously _resolve_joins regexed m.filter_sql with _TABLE_COL_RE to find cross-model column references the join planner needed to satisfy. That pattern (\b\w+\.\w+\b) matches anything that looks like table.column — including substrings inside string literals ("description LIKE '%foo.bar%'" → spurious join to a 'foo' table), function calls written with dots, and any other dotted token in the rendered SQL. The result was unwanted LEFT JOINs that could change result cardinality or query performance.

Changes:

* slayer/engine/enriched.py: add filter_columns: List[str] to EnrichedMeasure (default empty list).
* slayer/engine/enrichment.py: in _ensure_aggregated_measure, capture resolved[0].columns alongside resolved[0].sql when resolving the measure-level filter.
* slayer/engine/enrichment.py: in _resolve_joins, iterate m.filter_columns (qualified column names from ParsedFilter) instead of regexing m.filter_sql. Only real column refs feed needed_tables.

Adds test_filter_with_dotted_string_literal_does_not_pull_spurious_join which confirms a measure filter like "url LIKE 'foo.bar%'" no longer pulls in an unrelated 'foo' join target.

Full pytest: 666 passed. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
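The false positive being removed is easy to reproduce: a bare table.column regex also matches inside string literals, which is exactly what fed spurious join targets.

```python
import re

# Same shape as the _TABLE_COL_RE described in the commit message.
TABLE_COL_RE = re.compile(r"\b\w+\.\w+\b")

filter_sql = "url LIKE 'foo.bar%' AND customers.status = 'active'"
hits = TABLE_COL_RE.findall(filter_sql)
# The naive scan sees both the real column ref (customers.status) and the
# dotted token inside the string literal (foo.bar) — hence the spurious
# 'foo' join target. Structured ParsedFilter columns avoid this entirely.
```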
* docs/dbt/dbt_import.md: tag the Jinja filter snippet block as `jinja` and the CLI usage block as `text` so markdownlint stops flagging them and IDE/web previews get appropriate (or absent) syntax highlighting. * tests/test_dbt_converter.py: replace positional result.models[0] with name-based lookup `next(m for m in result.models if m.name == "orders")` in TestBasicConversion. The project under test has two models so the test is order-dependent today; the name-based lookup is robust if the converter ever changes iteration order. Single-model test cases (TestMeasureConsolidation etc.) keep [0] since order can't matter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous Add poetry.lock commit (32a2b9a) bundled an unresolved merge state — eight `<<<<<<< Updated upstream` / `>>>>>>> Stashed changes` blocks left over from a `git stash pop` after pulling main. Poetry refused to read the file (Invalid statement at line 315), which broke CI on `poetry install -E all --with dev`. Regenerated cleanly from the current pyproject.toml. `poetry install -E all --with dev` succeeds locally; quick pytest sanity check on the dbt + filtered-measure suites passes (39 tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Actionable comments posted: 4
🧹 Nitpick comments (4)
docs/dbt/dbt_import.md (1)
73-76: Consolidation example uses the wrong resulting measure name. In the current converter (slayer/dbt/converter.py:121-139), consolidated measures keep the first dbt measure name and set `sql` to the shared expression. This example should not output `name: amount`; it should be `name: revenue_sum` (with `sql: amount`).

Suggested doc fix
```diff
 # SLayer:
-- name: amount
+- name: revenue_sum
   sql: amount
   allowed_aggregations: [sum, avg]
   description: "dbt measures: revenue_sum (sum), revenue_avg (average)"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/dbt/dbt_import.md` around lines 73 - 76, The consolidation currently keeps the first dbt field name and overwrites sql with the shared expression; change the consolidation logic in slayer/dbt/converter.py (the function that merges/consolidates measures—e.g., the convert_measures/consolidate_measures routine around lines 121-139) so the consolidated metric uses the derived aggregation-specific name (e.g., revenue_sum, revenue_avg) instead of the original column name (amount) while still setting sql to the shared expression; specifically, choose the consolidated measure name by appending the aggregation key to the base measure name (or by selecting the original dbt measure name that contains the aggregation suffix), set sql to the common expression, and preserve allowed_aggregations accordingly.

tests/test_sql_generator.py (2)
1456-1456: Hoist `enrich_query` to module scope in these regressions. The same symbol is imported locally in three new tests. Moving it to the top of the file keeps the imports consistent and removes duplication.
As per coding guidelines, "Place imports at the top of files".
Also applies to: 1535-1535, 1627-1627
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sql_generator.py` at line 1456, Hoist the local import of enrich_query to module scope by adding "from slayer.engine.enrichment import enrich_query" to the top-level imports in tests/test_sql_generator.py, then remove the duplicate local imports of enrich_query inside the three tests that currently import it locally so those tests simply call enrich_query directly.
76-77: Prefer keyword args in the new `_generate` call. This is the only new call site added in the hunk, so it is easy to align with the repo convention now.
As per coding guidelines, "Use keyword arguments for functions with more than 1 parameter".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sql_generator.py` around lines 76 - 77, The new call to the helper _generate should use keyword arguments to match repo conventions: change the invocation that currently passes positional args to call _generate with named parameters (e.g., _generate(generator=..., query=..., model=...)) where the current variables are generator, query, and model; update the single call in tests/test_sql_generator.py near SlayerQuery and Field so it reads the keyword-argument form.

tests/test_dbt_converter.py (1)
381-398: Hoist `_queries_dir_for_storage` to module scope. These three tests import the same helper locally. Moving it to the top keeps the new code aligned with the repo import rule and removes repetition.
As per coding guidelines, "Place imports at the top of files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_dbt_converter.py` around lines 381 - 398, Move the repeated local imports of _queries_dir_for_storage to a single module-level import at the top of the test file: add "from slayer.cli import _queries_dir_for_storage" in the file header and remove the per-test local imports inside test_directory_storage_path_returned_as_is, test_sqlite_db_uses_parent_directory, and test_bare_sqlite_filename_in_cwd so all tests use the same module-scope symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/examples/01_dynamic/dynamic.md`:
- Line 3: Fix the subject-verb agreement in the opening sentence of the doc:
change "SLayer's intended audience are **agents and humans formulating ad hoc
queries**" to "SLayer's intended audience is **agents and humans formulating ad
hoc queries**" so the noun "audience" correctly pairs with the singular verb
"is".
In `@slayer/dbt/parser.py`:
- Line 128: parse_dbt_project always triggers manifest loading because it
unconditionally calls _parse_regular_models(), and _parse_regular_models()
unconditionally calls load_or_generate_manifest(); move manifest loading behind
the "--include-hidden-models" opt-in by either (a) only calling
_parse_regular_models() from parse_dbt_project when include_hidden_models is
true, or (b) add an include_hidden_models boolean parameter to
_parse_regular_models() and inside that function only call
load_or_generate_manifest() when that parameter is true; update all callers
accordingly (including the other call-site region noted) so a plain
semantic-layer import does not invoke load_or_generate_manifest() unless the
hidden-model flag is set.
In `@slayer/engine/enrichment.py`:
- Around line 178-186: The filtered measure resolution updates filter_sql using
model_name_str via resolve_filter_columns but does not update the corresponding
EnrichedMeasure's model_name, causing mixed aliases when query.source_model !=
model.name; update the EnrichedMeasure (and any other enriched objects created
in the same block, including the similar code at 188-201) to use model_name_str
(or otherwise propagate the resolved alias) after calling resolve_filter_columns
so that filter_sql, filter_columns and the EnrichedMeasure.model_name (and any
related fields used by slayer/sql/generator.py::_build_agg()) are consistent and
reference the same alias.
In `@slayer/sql/generator.py`:
- Around line 978-987: Filtered first/last measures currently leak joined-table
aliases because _build_agg() reuses measure.filter_sql in the outer aggregate
while _build_last_ranked_from() only projects model.*, _td_*, and
_first/_last_rn*; fix by having the ranked subquery (the code building
ranked_sql when enriched.resolved_joins is present) emit a per-measure boolean
flag column (e.g., <measure>_match) computed from the original
measure.filter_sql, include that flag in the SELECT projection produced by
_build_last_ranked_from(), and update _build_agg() to test this projected
<measure>_match flag in the outer MAX(CASE ...) instead of injecting the
original filter_sql expression so no joined aliases leak out.
---
Nitpick comments:
In `@docs/dbt/dbt_import.md`:
- Around line 73-76: The consolidation currently keeps the first dbt field name
and overwrites sql with the shared expression; change the consolidation logic in
slayer/dbt/converter.py (the function that merges/consolidates measures—e.g.,
the convert_measures/consolidate_measures routine around lines 121-139) so the
consolidated metric uses the derived aggregation-specific name (e.g.,
revenue_sum, revenue_avg) instead of the original column name (amount) while
still setting sql to the shared expression; specifically, choose the
consolidated measure name by appending the aggregation key to the base measure
name (or by selecting the original dbt measure name that contains the
aggregation suffix), set sql to the common expression, and preserve
allowed_aggregations accordingly.
In `@tests/test_dbt_converter.py`:
- Around line 381-398: Move the repeated local imports of
_queries_dir_for_storage to a single module-level import at the top of the test
file: add "from slayer.cli import _queries_dir_for_storage" in the file header
and remove the per-test local imports inside
test_directory_storage_path_returned_as_is,
test_sqlite_db_uses_parent_directory, and test_bare_sqlite_filename_in_cwd so
all tests use the same module-scope symbol.
In `@tests/test_sql_generator.py`:
- Line 1456: Hoist the local import of enrich_query to module scope by adding
"from slayer.engine.enrichment import enrich_query" to the top-level imports in
tests/test_sql_generator.py, then remove the duplicate local imports of
enrich_query inside the three tests that currently import it locally so those
tests simply call enrich_query directly.
- Around line 76-77: The new call to the helper _generate should use keyword
arguments to match repo conventions: change the invocation that currently passes
positional args to call _generate with named parameters (e.g.,
_generate(generator=..., query=..., model=...)) where the current variables are
generator, query, and model; update the single call in
tests/test_sql_generator.py near SlayerQuery and Field so it reads the
keyword-argument form.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 961098a7-8bc4-4ac5-a76c-b4ff456a3f8a
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- docs/dbt/dbt_import.md
- docs/dbt/slayer_vs_dbt.md
- docs/examples/01_dynamic/dynamic.md
- pyproject.toml
- slayer/cli.py
- slayer/dbt/converter.py
- slayer/dbt/entities.py
- slayer/dbt/manifest.py
- slayer/dbt/parser.py
- slayer/engine/enriched.py
- slayer/engine/enrichment.py
- slayer/sql/generator.py
- tests/test_dbt_converter.py
- tests/test_dbt_entities.py
- tests/test_dbt_parser.py
- tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (3)
- slayer/engine/enriched.py
- tests/test_dbt_parser.py
- docs/dbt/slayer_vs_dbt.md
🚧 Files skipped from review as they are similar to previous changes (4)
- slayer/cli.py
- slayer/dbt/manifest.py
- slayer/dbt/converter.py
- tests/test_dbt_entities.py
Five new CodeRabbit comments after the previous batches landed:
* slayer/cli.py: bridge async storage.get_datasource and storage.save_model
with run_sync (B6-1, Critical). Every other storage.* call site in this
file already does so; import-dbt was the lone exception, with the result
that --include-hidden-models always failed the datasource lookup
(coroutine != None) and even plain `slayer import-dbt` printed
"Imported model: ..." without ever persisting anything.
* slayer/dbt/parser.py: add keyword-only `include_regular_models: bool =
False` to parse_dbt_project (B6-2, Major). Previously the manifest was
always loaded, which can invoke `dbt parse` (slow) or fail noisily when
dbt-core isn't installed — making the "optional" hidden-model import
path non-optional. CLI passes through args.include_hidden_models.
* slayer/engine/enrichment.py: propagate model_name_str through every
Enriched* construction site (B6-3, Major). Batch 1 fixed filter
resolution to use model_name_str but EnrichedMeasure.model_name,
EnrichedQuery.model_name, and the Dimension/TimeDimension helpers
still used model.name — mismatched aliases would render invalid SQL
in any divergent path. In production today the two are always equal
so this was latent; my Batch 1 regression test exposes the case.
* slayer/sql/generator.py: project a per-measure boolean match-flag
column from the ranked subquery (B6-4, Major). _build_agg() previously
re-emitted measure.filter_sql in the OUTER MAX(CASE WHEN ...). Batch 2
injected joins inside the subquery so the inner ROW_NUMBER OVER
resolved, but the outer aggregate had no scope on the joined tables.
Now _build_last_ranked_from also projects `_match_f<idx>` columns and
the outer MAX(CASE) tests `_match_f<idx> = 1` instead.
* docs/examples/01_dynamic/dynamic.md: subject-verb typo "audience are"
→ "audience is" (B6-5).
Plus regression tests for every real-bug fix:
* TestImportDbtCli::test_models_are_persisted_to_storage (B6-1)
* TestRegularModelsOptIn::{test_default_does_not_load_manifest,
test_opt_in_loads_manifest} (B6-2)
* TestFilteredMeasures::test_filtered_measure_source_alias_propagates_to_generated_sql (B6-3)
* TestFilteredMeasures::test_filtered_last_outer_aggregate_uses_match_flag_not_joined_filter (B6-4)
Also merges in origin/main (PR #34 "Variables in filters") cleanly.
Full pytest: 681 passed (was 676 baseline post-merge). ruff clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
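The sync/async bridging referenced in B6-1 can be sketched like this; the real `slayer.async_utils.run_sync` may be implemented differently (this is a minimal stand-in, and `get_datasource` here is a mock, not the actual storage API):

```python
import asyncio

def run_sync(coro):
    """Minimal sketch: drive a coroutine to completion from synchronous code."""
    return asyncio.run(coro)

async def get_datasource(name: str):
    # Stand-in for an async storage call. Without a bridge like run_sync,
    # calling it returns a coroutine object, so a `result is not None` check
    # "succeeds" even though nothing was looked up or persisted.
    return {"name": name}

ds = run_sync(get_datasource("duckdb_local"))
print(ds)  # {'name': 'duckdb_local'}
```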
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/examples/01_dynamic/dynamic.md (1)
6-8: Optional wording trim for repetition/concision. You repeat "at the moment you write it" across adjacent lines, and Line 62 is a bit dense; a small edit could improve flow without changing meaning.
Also applies to: 62-62
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/examples/01_dynamic/dynamic.md` around lines 6 - 8, Trim the repeated phrase "at the moment you write it" to improve flow: keep it once in the first sentence and reword the following sentence to something concise (e.g., "This post tours what that means in SLayer.") and simplify the dense sentence at Line 62 for readability by breaking it into two shorter clauses or removing redundant qualifiers; update occurrences of "at the moment you write it" accordingly.

slayer/dbt/parser.py (1)
57-58: Make YAML discovery order deterministic across nested directories. Line 57/58 filters hidden dirs, but unsorted `dirs` from `os.walk` can produce non-deterministic traversal order. Sorting `dirs` keeps parse/output order stable.

Suggested diff
```diff
- dirs[:] = [d for d in dirs if not d.startswith(".")]
+ dirs[:] = sorted(d for d in dirs if not d.startswith("."))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/dbt/parser.py` around lines 57 - 58, The os.walk traversal in the loop (for root, dirs, files in os.walk(directory)) currently filters hidden dirs but leaves dirs unsorted, causing non-deterministic directory traversal; change the in-place assignment to sort the directories (e.g., dirs[:] = sorted(d for d in dirs if not d.startswith("."))) and also sort files (e.g., files.sort() or files = sorted(files)) before processing so YAML file discovery order is deterministic; update the code around the root, dirs, files loop in parser.py accordingly.
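The deterministic-walk suggestion amounts to something like the sketch below (`iter_yaml_files` is illustrative, not the parser's actual function):

```python
import os
import tempfile

def iter_yaml_files(directory: str):
    """Yield YAML paths in a stable order: hidden dirs pruned, names sorted."""
    for root, dirs, files in os.walk(directory):
        # The in-place assignment both prunes hidden directories and fixes
        # traversal order; sorting files fixes per-directory order.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for fname in sorted(files):
            if fname.endswith((".yml", ".yaml")):
                yield os.path.join(root, fname)

# Demo on a throwaway project layout:
with tempfile.TemporaryDirectory() as proj:
    for rel in ("b/x.yml", "a/y.yaml", ".hidden/z.yml"):
        path = os.path.join(proj, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = [os.path.relpath(p, proj) for p in iter_yaml_files(proj)]

print(found)  # hidden dir skipped, 'a' before 'b'
```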
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/dbt/parser.py`:
- Around line 95-97: The file opens YAML using platform-default encoding which
can break on non-ASCII dbt metadata; change the open call that reads path before
yaml.safe_load to explicitly specify UTF-8 (e.g., use open(path, "r",
encoding="utf-8") ) so yaml.safe_load(f) always receives UTF-8 text; update the
with-open invocation where path is used and leave the rest of the try/except
logic intact.
---
Nitpick comments:
In `@docs/examples/01_dynamic/dynamic.md`:
- Around line 6-8: Trim the repeated phrase "at the moment you write it" to
improve flow: keep it once in the first sentence and reword the following
sentence to something concise (e.g., "This post tours what that means in
SLayer.") and simplify the dense sentence at Line 62 for readability by breaking
it into two shorter clauses or removing redundant qualifiers; update occurrences
of "at the moment you write it" accordingly.
In `@slayer/dbt/parser.py`:
- Around line 57-58: The os.walk traversal in the loop (for root, dirs, files in
os.walk(directory)) currently filters hidden dirs but leaves dirs unsorted,
causing non-deterministic directory traversal; change the in-place assignment to
sort the directories (e.g., dirs[:] = sorted(d for d in dirs if not
d.startswith("."))) and also sort files (e.g., files.sort() or files =
sorted(files)) before processing so YAML file discovery order is deterministic;
update the code around the root, dirs, files loop in parser.py accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e16c7e6c-24b4-49e9-801e-126a1b564c2c
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- docs/examples/01_dynamic/dynamic.md
- slayer/cli.py
- slayer/dbt/parser.py
- slayer/engine/enrichment.py
- slayer/sql/generator.py
- tests/test_dbt_converter.py
- tests/test_dbt_parser.py
- tests/test_sql_generator.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_dbt_converter.py
🚧 Files skipped from review as they are similar to previous changes (4)
- slayer/engine/enrichment.py
- slayer/cli.py
- slayer/sql/generator.py
- tests/test_sql_generator.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Semantic models over regular dbt models (a SELECT rather than a physical
table) are now ingested with `SlayerModel.sql = <resolved SELECT>` instead
of `SlayerModel.sql_table = <bare name>`. Callers no longer need to
pre-materialise regular models via `dbt run` before running SLayer — the
SQL is read straight from the project's `.sql` files, Jinja refs are
recursively inlined, and the resulting subquery lives on the SlayerModel
itself. Semantic models over source tables continue to use `sql_table`
exactly as before.
Changes:
- `slayer/dbt/models.py`: `DbtRegularModel` gains `raw_code: Optional[str]`
carrying the SQL body read from disk.
- `slayer/dbt/parser.py`: new `_collect_sql_files` that walks the project
(skipping `target/`), keyed by filename stem. `parse_dbt_project` now
always populates `DbtRegularModel` entries from disk; the existing
`include_regular_models` flag still gates manifest-based orphan
discovery, but SQL scanning is free and unconditional. Removed the
early-return when no YAMLs are present so SQL-only projects still
yield regular models.
- `slayer/dbt/sql_resolver.py` (NEW): pure regex-based Jinja resolver.
Strips `{{ config(...) }}`, substitutes `{{ ref('X') }}` (handling
package-qualified and versioned forms) either by recursively inlining
a known regular model or by substituting the bare identifier, and
substitutes `{{ source('s','t') }}` as `s.t`. Uses strict `\w+`
identifier captures so ref/source arguments cannot smuggle SQL tokens;
unrecognised Jinja is left verbatim with a warning. Cycle detection
via a visited set, plus a `max_depth` belt-and-braces cap.
- `slayer/dbt/converter.py`: `_convert_semantic_model` builds a
`{regular_model_name: raw_code}` map at construction time and, when
`sm.model` matches an entry, sets `SlayerModel.sql` from the resolved
SQL body. Warnings from the resolver are attached to the converter's
`ConversionResult` under the semantic model's name.
Tests:
- `tests/test_dbt_sql_resolver.py` (NEW): 13 tests covering ref-to-source,
ref-to-regular-model, transitive refs, cycle detection, package-
qualified refs, versioned refs, `{{ config() }}` stripping,
`{{ source() }}` substitution, injection-attempt rejection, unknown-
macro warnings, and the max-depth cap.
- `tests/test_dbt_converter_inline_sql.py` (NEW): 4 tests covering the
semantic-over-source fallback, the semantic-over-regular-model inline
path, transitive inlining, and a parse-then-convert round-trip from
a tmp_path dbt project.
- `tests/test_dbt_parser.py`: new `TestParseDbtProjectSqlFiles` class
(3 tests) asserting `.sql` scanning populates `raw_code`, works
without the `include_regular_models` flag, and skips `target/`.
All 701 non-integration tests pass; ruff clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
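The ref/source substitution and cycle detection described above can be sketched as follows — a simplified illustration; the real `sql_resolver.py` also handles package-qualified and versioned refs and records warnings rather than raising:

```python
import re

# Single-arg ref('name') and source('schema', 'table') forms only, with the
# strict \w+ captures noted above so arguments cannot smuggle SQL tokens.
_REF_RE = re.compile(r"\{\{\s*ref\(\s*'(\w+)'\s*\)\s*\}\}")
_SOURCE_RE = re.compile(r"\{\{\s*source\(\s*'(\w+)'\s*,\s*'(\w+)'\s*\)\s*\}\}")

def resolve_sql(name: str, raw_code: dict, _seen: frozenset = frozenset()) -> str:
    if name in _seen:  # cycle guard via a visited set
        raise ValueError(f"ref cycle at {name!r}")
    sql = raw_code[name]

    def _ref(m: re.Match) -> str:
        target = m.group(1)
        if target in raw_code:
            # Inline known models as a bare derived table so the caller's
            # own alias still applies: FROM (inner) c
            return "(" + resolve_sql(target, raw_code, _seen | {name}) + ")"
        return target  # unknown ref -> bare identifier

    sql = _REF_RE.sub(_ref, sql)
    return _SOURCE_RE.sub(lambda m: f"{m.group(1)}.{m.group(2)}", sql)

raw = {
    "stg_orders": "SELECT * FROM {{ source('raw', 'orders') }}",
    "orders": "SELECT id FROM {{ ref('stg_orders') }} o",
}
print(resolve_sql("orders", raw))
# SELECT id FROM (SELECT * FROM raw.orders) o
```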
The resolver was emitting ``(inner) AS model_ref_sub``, which collides
with the caller's own alias in ``FROM {{ ref('claim') }} c`` →
``FROM (inner) AS claim_ref_sub c`` (invalid: two consecutive aliases).
Now emits bare ``(inner)`` — the caller's alias applies directly, as in
standard SQL derived-table syntax: ``FROM (inner) c``.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…trix
setup_slayer.py:
- Remove _BRIDGE_VIEWS constant and all DuckDB view creation. Bridge
models (claim_policy_bridge, policy_holder_policy, policy_premium_detail)
now come through as pure SLayer models with inline SQL, courtesy of
SLayer's new dbt-ingestion generalisation (MotleyAI/slayer#31).
- Add --no-bridges flag: selects the `main` branch of the dbt project
(25 base semantic models, no bridges) instead of the default
`refresh-2025-additional-models` branch (28 models incl. bridges).
- Fix async: bridge YAMLStorage.save_model/save_datasource/list_models
with slayer.async_utils.run_sync (they became async coroutines in
recent slayer).
pyproject.toml:
- Point slayer extra at the git branch carrying the inline-SQL feature:
git+https://github.com/MotleyAI/slayer.git@egor/dev-1251-dbt-ingestion
run_bench.py:
- Replace the GPT-5.3-Codex 6-effort sweep with a focused comparison:
sql + slayer strategies × gpt-5.3-codex (effort=medium) + gpt-4.1-mini
(no effort) × 11 default questions × 5 iterations = 220 LLM calls.
tests/unit/test_setup_slayer_config.py (NEW):
- --no-bridges selects `main`; default selects `refresh-2025-...`
- load_csvs_into_duckdb produces tables only, zero DuckDB views
- Empty CSV dir raises FileNotFoundError
All 122 non-integration tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-in-mcp-server' into egor/dev-1251-dbt-ingestion
…-in-mcp-server' into egor/dev-1251-dbt-ingestion # Conflicts: # README.md
Summary
Adds first-class dbt support to SLayer in two layers:
- `slayer import-dbt` reads a dbt project's YAML and converts its `semantic_models`, `entities`, `dimensions`, `measures`, and `metrics` into SlayerModels + `queries.yaml`.
- `--include-hidden-models` additionally registers every regular dbt model not wrapped by a `semantic_model` as a hidden SlayerModel. These models are queryable by name but omitted from discovery listings so the semantic layer stays the default surface for agents.

This is the foundation for running the dbt-llm-sl-bench benchmark against SLayer, and it closes the visibility gap for tables that exist in dbt but aren't (yet) part of the semantic layer.
Layer 1 — Semantic-layer import
New `slayer/dbt/` module:

- `models.py` — lightweight Pydantic v2 classes for dbt YAML objects. We don't use `metricflow-semantic-interfaces` because it requires a Pydantic v1 shim, has heavy transitive deps, and its parser expects a different YAML format than dbt-core produces.
- `parser.py` — walks a dbt project directory, handles the `semantic_models:` list format, resolves `ref()` references.
- `entities.py` — builds an entity registry across all semantic models, resolves foreign entities to explicit SLayer `ModelJoin` objects.
- `filters.py` — converts dbt Jinja filter syntax (`{{ Dimension('entity__dim') }}`) to plain SLayer filter strings.
- `converter.py` — full conversion pipeline: measures sharing an `expr` but with different `agg` values are merged into one SLayer measure with combined `allowed_aggregations`; derived/ratio/cumulative metrics become query definitions in `queries.yaml`.

CLI:
slayer import-dbt <dbt_project_path> --datasource <name> [--storage <path>] [--no-strict-aggregations]

Layer 2 — Hidden-model import (new in the last commit)
Motivation: when an agent asks SLayer what tables it can query, it currently only sees tables with a `semantic_model` wrapper. Staging tables, unwrapped marts, and raw models are invisible even though they're sitting there in the warehouse. That's a discovery gap, not a capability gap — SLayer can already query an arbitrary table, it just didn't know the table existed.

What's added:
- `slayer/dbt/manifest.py` module — the only place in the codebase that imports `dbt-core`. It exposes `DBT_AVAILABLE`, `load_or_generate_manifest` (uses an existing `target/manifest.json` or invokes `dbt parse` if dbt-core is available), and `find_orphan_model_nodes`/`regular_models_from_manifest` to turn the manifest into Pydantic `DbtRegularModel` instances.
- `DbtRegularModel` and `DbtColumnMeta` added to `slayer/dbt/models.py`; `DbtProject` extended with `regular_models`.
- `slayer/engine/ingestion.py` — the per-table introspection logic already used by auto-ingest is extracted as a public `introspect_table_to_model` helper so the new dbt pass can reuse it without duplication.
- `DbtToSlayerConverter` gains two new kwargs (`sa_engine`, `include_hidden_models`, both optional). When enabled, after semantic models + metrics are converted it iterates every orphan dbt model, introspects the materialized table, wraps the result as a `hidden=True` SlayerModel, and overlays the manifest's model/column descriptions onto the introspected dimensions and measures. Name collisions with semantic models are skipped (semantic wins). Introspection failures per model (table not yet materialized, permission denied, etc.) log a warning and continue.
- `slayer/cli.py` adds `--include-hidden-models` to the `import-dbt` subcommand (off by default). When set, the CLI resolves the datasource, opens a SQLAlchemy engine, passes it to the converter, and disposes it in a `finally`. The final summary line now prints `<visible> models, <hidden> hidden, <n> queries, <n> warnings`.

Packaging:
`dbt-core` is an optional dependency declared under a new `dbt` extras group (and added to `all`). Without it, the feature is a clean no-op — `import-dbt` still works for semantic models, and the code never imports dbt unless asked to.

Design decisions
- `dbt parse`, not filesystem walk — the manifest is the authoritative source of the dbt DAG, including `dbt_project.yml` overrides, macro-generated names, and alias resolution. A filesystem walk of `.sql` files would miss all of that.
- Default `import-dbt` behavior is unchanged; users opt in with a single flag once they've installed `slayer[dbt]`.
- Hidden models are omitted from `slayer models list`, the MCP `datasource_summary`, and the dimensions/measures of `GET /models/{name}`, but they can still be queried by name (REST, MCP `query`, SQL). An agent that finds the name via the MCP `edit_model` tool can also flip `hidden` to `false` to promote a model.

Documentation
- `docs/dbt/dbt_import.md` — full walkthrough of the semantic-layer conversion plus a new "Regular dbt Models (Hidden Import)" section covering prerequisites, usage, metadata carried over, failure semantics, and how to promote a hidden model later via MCP `edit_model`.
- `docs/dbt/slayer_vs_dbt.md` — self-contained comparison of SLayer vs dbt expressiveness.
- `docs/concepts/ingestion.md` — now opens with a three-path overview (auto-ingest, semantic-layer import, hidden dbt-model import) and links to the dbt docs.
- `docs/reference/cli.md` + `docs/interfaces/cli.md` — documented `slayer import-dbt` with all flags.
- `README.md` — one-bullet feature mention under "What SLayer does differently".

Test plan
- `test_dbt_parser.py`, `test_dbt_entities.py`, `test_dbt_filters.py`, `test_dbt_converter.py`, and the new `test_dbt_manifest.py`. All use inline YAML fixtures and mocked SQLAlchemy engines — no live database required, CI-friendly.
- `regular_models` are populated
- `SQLAlchemyError` → warning, model skipped, others still convert
- `load_or_generate_manifest` correctly loads existing files, invokes `dbt parse` when missing, returns `None` when dbt-core is unavailable
- `ruff check slayer/ tests/` — clean.
- `pyproject.toml` declares `dbt = ["dbt-core"]` and includes it in `all`.

Follow-ups (not in this PR)
- `jaffle_shop_duckdb` with a materialized warehouse.
- `--include-hidden-models` through the REST `/import-dbt` and MCP equivalents once the import endpoints exist.

🤖 Generated with Claude Code
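The orphan-model discovery under Layer 2 can be illustrated against the manifest's JSON shape — a hedged sketch (the real `find_orphan_model_nodes` works on dbt's `Manifest` object and also carries over descriptions and column metadata):

```python
def find_orphan_model_names(manifest: dict, semantic_model_names: set) -> list:
    """Sketch: dbt models in the manifest not wrapped by any semantic_model."""
    orphans = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # seeds, snapshots, tests, etc. are ignored
        if node["name"] in semantic_model_names:
            continue  # a semantic_model wraps this table: semantic wins
        orphans.append(node["name"])
    return sorted(orphans)

manifest = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model", "name": "orders"},
        "model.shop.stg_orders": {"resource_type": "model", "name": "stg_orders"},
        "seed.shop.countries": {"resource_type": "seed", "name": "countries"},
    }
}
print(find_orphan_model_names(manifest, {"orders"}))  # ['stg_orders']
```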
Summary by CodeRabbit
New Features
`slayer import-dbt` CLI to ingest dbt semantic_models and metrics; optional `--include-hidden-models` to import non-semantic models as hidden, queryable models and emit queries.yaml.
Documentation
Tests