Add label and filter fields to Dimension/Measure #30
Conversation
These two enhancements prepare SLayer for dbt semantic layer ingestion:

1. `label` on Dimension and Measure — a human-readable display name distinct from `name` (technical) and `description` (explanatory). Model-level labels propagate through enrichment as fallbacks when no query-time label is set. Fixed `_query_as_model` to store labels in the `label` field (it was incorrectly using `description`) and to propagate descriptions separately.
2. `filter` on Measure — a SQL condition applied before aggregation via CASE WHEN wrapping. Enables dbt-style filtered metrics without creating separate models. Supports local and cross-model (joined) filter references. SQL generation handles all aggregation types: standard (SUM/AVG/etc.), COUNT(*), first/last, and formula-based.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: adds an optional `label` to dimensions and measures and an optional `filter` to measures.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as User/Client
    participant QE as Query Engine
    participant Enrich as Enrichment Layer
    participant SQLGen as SQL Generator
    participant DB as Database
    Client->>QE: Execute query with measures (may include measure.filter)
    QE->>Enrich: Request enrichment (resolve measures/dimensions)
    Enrich->>Enrich: Parse measure.filter -> produce filter_sql
    Enrich->>Enrich: Resolve filter columns (including joins)
    Enrich->>Enrich: Propagate labels to EnrichedDimension/EnrichedMeasure
    Enrich-->>QE: Return EnrichedQuery (includes filter_sql, labels)
    QE->>SQLGen: Build SQL from EnrichedQuery
    SQLGen->>SQLGen: For each measure with filter_sql, wrap aggregation input as CASE WHEN filter_sql THEN <expr> END
    SQLGen-->>QE: Return generated SQL
    QE->>DB: Execute SQL
    DB-->>QE: Return rows
    QE->>QE: Build result.meta including labels
    QE-->>Client: Return rows + metadata
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
slayer/engine/enrichment.py (1)
555-564: ⚠️ Potential issue | 🟠 Major

Carry model labels into time dimensions too.

`_resolve_dimensions()` now falls back to `dim_def.label`, but `_resolve_time_dimensions()` still uses only `td.label`. A labeled model dimension queried via `time_dimensions=[...]` will therefore lose its label in result metadata.

Suggested fix:

```diff
- label=td.label,
+ label=td.label or (dim_def.label if dim_def else None),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/engine/enrichment.py` around lines 555 - 564, In _resolve_time_dimensions(), the label for EnrichedTimeDimension is currently set only from td.label causing loss of model-level labels; update the label expression to use td.label if present, otherwise fall back to dim_def.label (the same fallback used in _resolve_dimensions()), i.e. when constructing EnrichedTimeDimension (variables td and dim_def) set label = td.label or dim_def.label so labeled model dimensions retain their label in time_dimensions results.
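The fallback rule the comment describes can be sketched in isolation. Everything here is a hypothetical simplification: `resolve_label` and its plain-string arguments stand in for the real `EnrichedTimeDimension` construction.

```python
# Minimal sketch of the suggested fallback; the function name and plain
# string arguments are hypothetical simplifications of the enrichment code.
def resolve_label(query_label, model_label):
    # A query-time label wins; the model-level label is only a fallback.
    return query_label or model_label

print(resolve_label(None, "Order Date"))    # Order Date
print(resolve_label("Date", "Order Date"))  # Date
```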
🧹 Nitpick comments (1)
slayer/sql/generator.py (1)
999-1017: Build the new filter wrappers with sqlglot nodes, not string SQL.

These branches introduce more `CASE WHEN ...` string assembly and reparsing in the generator. That tends to be brittle around quoting and dialect differences; `exp.Case`, `exp.When`, `exp.Count`, etc. keep the AST structured end-to-end. As per coding guidelines: "Use sqlglot AST building for SQL generation, not string concatenation".

Also applies to: 1075-1082
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/sql/generator.py` around lines 999 - 1017, Replace the string-based CASE WHEN assembly with sqlglot AST nodes: where the code currently builds case_sql and reparses (the branches handling measure.filter_sql around the COUNT(*) special-case and the later wrapper when not (agg_name == "count" and measure.sql is None)), construct an exp.Case with exp.When(condition=sqlglot.parse_one(measure.filter_sql, dialect=self.dialect) or preferably parse the condition into an expression once, and use that When to wrap the existing inner node (for COUNT(*) build exp.Count(this=exp.Case(whens=[when_expr])) or for general measures wrap inner as exp.Case(whens=[exp.When(this=inner, condition=cond_expr)])). Keep using self._resolve_sql, exp.Star, exp.Column, and exp.to_identifier for the inner expression and avoid any intermediate string SQL or sqlglot.parse_one of whole CASE strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/sql/generator.py`:
- Around line 983-991: The current code applies measure.filter_sql only inside
the projection CASE (case_sql) after rn_col (like _first_rn/_last_rn) has ranked
all rows, which causes wrong rows to be returned; instead include the filter
expression as part of the ranking input so only matching rows get rank=1. Update
the logic that generates rn_col for filtered first/last (the _first_rn/_last_rn
/ ROW_NUMBER() expression) to incorporate measure.filter_sql into the ORDER BY
or a CASE inside the ORDER BY so that rows failing the filter are ordered after
matching rows (or excluded from rank=1), and then emit the simple MAX(CASE WHEN
{rn_col} = 1 THEN {measure.model_name}.{col} END) projection (remove the
measure.filter_sql from case_sql). Ensure you modify the code path that builds
rn_col and reference measure.filter_sql, rn_col, _first_rn/_last_rn and case_sql
when making the change.
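The ranking change described above can be illustrated with plain strings (illustrative only; the table and column names are invented, and the real generator builds these expressions itself):

```python
# Hypothetical "first total_amount among completed orders" per customer.
filter_sql = "orders.status = 'completed'"

# Rows failing the filter sort after matching rows, so rank 1 can only land
# on a row that satisfies the filter (whenever any matching row exists).
rn_col = (
    "ROW_NUMBER() OVER (PARTITION BY orders.customer_id ORDER BY "
    f"CASE WHEN {filter_sql} THEN 0 ELSE 1 END, orders.created_at)"
)

# The projection can then stay simple; a belt-and-braces variant would also
# keep the filter inside this CASE for groups with no matching rows at all.
projection = f"MAX(CASE WHEN {rn_col} = 1 THEN orders.total_amount END)"
print(projection)
```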
In `@tests/integration/test_integration.py`:
- Around line 1518-1542: Replace the loose assertions in
test_filtered_measure_sum so they assert exact expected values rather than range
checks: after executing SlayerQuery (source_model="orders" with
Field(formula="total_amount:sum") and Field(formula="completed_revenue:sum")),
assert the concrete numeric values for row["orders.total_amount_sum"] and
row["orders.completed_revenue_sum"] (compute from the test fixture or hard-code
the expected sums) and keep the relational check removed; apply the same change
to the other similar test block that appends Measure(name="completed_revenue",
...) and uses Field + SlayerQuery (the second test in the file that mirrors
test_filtered_measure_sum) so both tests validate exact expected totals for
total_amount_sum and completed_revenue_sum.
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7571e379-6655-45cf-865c-3335195e24b1
📒 Files selected for processing (10)
- docs/concepts/models.md
- slayer/core/models.py
- slayer/engine/enriched.py
- slayer/engine/enrichment.py
- slayer/engine/query_engine.py
- slayer/mcp/server.py
- slayer/sql/generator.py
- tests/integration/test_integration.py
- tests/test_models.py
- tests/test_sql_generator.py
@CodeRabbit review this!
Measure-level `filter` strings flow through `parse_filter()` → Python AST → SQL string, then get interpolated into CASE WHEN clauses in the generator. The existing single-quote doubling protected against standard-SQL injection, but backslashes were left unescaped — so a filter ending in `\` emitted SQL like `'a\'`, which on MySQL/ClickHouse (default backslash-escape mode) reads as an unclosed string literal, letting subsequent tokens be consumed as string content. sqlglot's tokenizer surfaces this as a TokenError (a DoS / error-leakage vector).

Fix: in `_filter_node_to_sql` and `_get_string_arg`, escape `\` → `\\` before `'` → `''` so emitted string literals round-trip safely in every supported dialect.

Also adds formal injection test suites:

- `tests/test_formula.py::TestParseFilterInjection` — parser-level coverage for DROP/UNION/comments/stacked-semicolon/unknown-function rejection, tautology acceptance, embedded-quote doubling, backslash handling in both literal and LIKE paths, identifier-break rejection, deep-nesting bounds.
- `tests/test_sql_generator.py::TestMeasureFilterInjection` — end-to-end coverage parametrised over postgres/mysql/sqlite/duckdb that round-trips generated SQL through sqlglot to verify no unclosed literals escape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
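The commit's escaping rule can be sketched as a standalone helper. The function name `escape_sql_string` is hypothetical; in the codebase the logic lives inside `_filter_node_to_sql` and `_get_string_arg`.

```python
def escape_sql_string(value: str) -> str:
    """Escape a value for embedding in a single-quoted SQL string literal.

    Sketch of the commit's fix: double backslashes first, then double single
    quotes, so the emitted literal survives both standard-SQL dialects and
    backslash-escaping ones (MySQL/ClickHouse default modes).
    """
    return value.replace("\\", "\\\\").replace("'", "''")

# A filter value ending in a backslash no longer yields an unclosed literal:
print("'" + escape_sql_string("a\\") + "'")  # 'a\\'
```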
Summary
Two model-level enhancements that prepare SLayer for dbt semantic layer ingestion:
1. `label` field on Dimension and Measure

   A human-readable display name (e.g., "Order Date", "Total Revenue"), distinct from the technical `name` and the explanatory `description`. This maps directly to dbt's `label` field on dimensions and measures.

   - Fixed `_query_as_model()` to store labels in the `label` field — previously labels were incorrectly stored as `description`, and measure labels were lost entirely
   - `_model_to_summary()` now includes labels in dimension/measure output

2. `filter` field on Measure

   A SQL condition applied before aggregation via CASE WHEN wrapping. This enables dbt-style filtered metrics (e.g., `loss_payment_amount = SUM(claim_amount) WHERE has_loss_payment = 1`) without needing to create separate models per metric.

   Key implementation details:

   - Filter columns are resolved via the `resolve_filter_columns` logic
   - Cross-model filter references resolve through joins, e.g. `"categories.type = 'electronics'"`
   - Standard aggregations (SUM/AVG/MIN/MAX): wraps the inner expression in CASE WHEN
   - COUNT(*): becomes `COUNT(CASE WHEN filter THEN 1 END)`
   - first/last: filter added to the ROW_NUMBER CASE condition
   - Formula aggregations (`weighted_avg`, custom): `{value}` placeholder wrapped before substitution
   - `_resolve_joins()` scans measure filters for join references to ensure necessary JOINs are active

Why this matters
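The per-aggregation wrapping shapes above can be sketched at the string level. The dispatch below is purely illustrative: the function name and agg keys are invented, and the real generator works on sqlglot ASTs rather than strings.

```python
def wrap_filtered(agg: str, expr: str, filter_sql: str) -> str:
    """Render one filtered-measure shape; string-level for illustration only."""
    if agg == "count_star":
        # COUNT(*) has no inner expression to wrap, so count 1 per match.
        return f"COUNT(CASE WHEN {filter_sql} THEN 1 END)"
    if agg in ("sum", "avg", "min", "max"):
        return f"{agg.upper()}(CASE WHEN {filter_sql} THEN {expr} END)"
    if agg == "formula_value":
        # What gets substituted for the {value} placeholder in a formula.
        return f"CASE WHEN {filter_sql} THEN {expr} END"
    # first/last fold the filter into their ROW_NUMBER condition instead.
    raise ValueError(f"unsupported aggregation: {agg}")

print(wrap_filtered("count_star", "*", "status = 'completed'"))
# COUNT(CASE WHEN status = 'completed' THEN 1 END)
```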
These are the prerequisite SLayer changes needed to faithfully ingest dbt semantic layer definitions. With `label`, we can preserve dbt's human-readable names. With `filter`, dbt's filtered simple metrics map directly to SLayer measures — no need for a separate "metrics" concept layer. Derived metrics become straightforward query formulas combining filtered measures.

Test plan

- Model tests (`test_models.py`)
- SQL generator tests (`test_sql_generator.py`)
- Integration tests (`test_integration.py`)
- `ruff check` clean

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- Optional `label` for dimensions and measures; labels surface in query results and summaries.
- Optional `filter` for measures to apply SQL conditions at query time (affects aggregation, ranking, and formulas); supports dotted references to joined dimensions and mixing filtered/unfiltered measures.

Documentation
Tests