
Add label and filter fields to Dimension/Measure #30

Merged
ZmeiGorynych merged 3 commits into main from egor/dev-1164-dbt
Apr 16, 2026

@ZmeiGorynych
Member

@ZmeiGorynych ZmeiGorynych commented Apr 15, 2026

Summary

Two model-level enhancements that prepare SLayer for dbt semantic layer ingestion:

1. label field on Dimension and Measure

A human-readable display name (e.g., "Order Date", "Total Revenue"), distinct from the technical name and the explanatory description. This maps directly to dbt's label field on dimensions and measures.

  • Model-level labels propagate through enrichment as fallbacks when no query-time label is set
  • Fixed _query_as_model() to store labels in the label field — previously labels were incorrectly stored as description, and measure labels were lost entirely
  • MCP _model_to_summary() now includes labels in dimension/measure output
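
The fallback rule in the first bullet can be sketched as follows. This is a minimal illustration, not SLayer's code: `DimensionDef` and `resolve_label` are hypothetical names, and the only behaviour taken from the PR is that a query-time label wins and the model-level label is used otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionDef:
    """Hypothetical stand-in for a model-level dimension definition."""
    name: str
    label: Optional[str] = None


def resolve_label(query_label: Optional[str], dim_def: Optional[DimensionDef]) -> Optional[str]:
    """Prefer the query-time label; otherwise fall back to the model-level one."""
    if query_label:
        return query_label
    return dim_def.label if dim_def else None


order_date = DimensionDef(name="order_date", label="Order Date")
print(resolve_label(None, order_date))    # Order Date
print(resolve_label("Placed", order_date))  # Placed
```

Returning `None` when neither label exists leaves it to the caller to decide whether to display the technical `name` instead.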

2. filter field on Measure

A SQL condition applied before aggregation via CASE WHEN wrapping. This enables dbt-style filtered metrics (e.g., loss_payment_amount = SUM(claim_amount) WHERE has_loss_payment = 1) without needing to create separate models per metric.

measures:
  - name: active_revenue
    sql: amount
    filter: "status = 'active'"
    # active_revenue:sum → SUM(CASE WHEN status = 'active' THEN amount END)

Key implementation details:

  • Filter resolution happens during enrichment (not SQL generation), reusing resolve_filter_columns logic
  • Supports cross-model (joined) filter references: "categories.type = 'electronics'"
  • SQL generation handles all aggregation types:
    • Standard (SUM/AVG/MIN/MAX): wraps inner expression in CASE WHEN
    • COUNT(*): becomes COUNT(CASE WHEN filter THEN 1 END)
    • first/last: filter added to the ROW_NUMBER CASE condition
    • Formula-based (weighted_avg, custom): {value} placeholder wrapped before substitution
  • _resolve_joins() scans measure filters for join references to ensure necessary JOINs are active
  • Multiple filtered and unfiltered measures can coexist in the same query and be combined in arithmetic formulas
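
As a rough illustration of the standard and COUNT(*) cases listed above, the sketch below builds the wrapped expressions as plain strings and runs them against an in-memory SQLite table. `wrap_filtered` is a hypothetical helper and the `orders` fixture is invented for the example; string building is used only to keep the sketch short and does not reflect how the generator assembles SQL.

```python
import sqlite3
from typing import Optional


def wrap_filtered(agg: str, inner_sql: Optional[str], filter_sql: Optional[str]) -> str:
    """Hypothetical helper: apply a measure filter inside the aggregate via CASE WHEN."""
    agg = agg.upper()
    if inner_sql is None:
        # A measure without its own SQL only makes sense as a row count here.
        if agg != "COUNT":
            raise ValueError("only COUNT may omit the inner expression")
        if filter_sql is None:
            return "COUNT(*)"
        return f"COUNT(CASE WHEN {filter_sql} THEN 1 END)"
    if filter_sql is None:
        return f"{agg}({inner_sql})"
    return f"{agg}(CASE WHEN {filter_sql} THEN {inner_sql} END)"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10.0, "active"), (5.0, "cancelled"), (7.5, "active")],
)

# Filtered and unfiltered measures coexist in one SELECT list.
select_list = ", ".join([
    wrap_filtered("sum", "amount", None),
    wrap_filtered("sum", "amount", "status = 'active'"),
    wrap_filtered("count", None, "status = 'active'"),
])
row = conn.execute(f"SELECT {select_list} FROM orders").fetchone()
print(row)  # (22.5, 17.5, 2)
```

Because a CASE without ELSE yields NULL for non-matching rows and aggregates skip NULLs, the filter narrows each measure independently without touching the query's WHERE clause.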

Why this matters

These are the prerequisite SLayer changes needed to faithfully ingest dbt semantic layer definitions. With label, we can preserve dbt's human-readable names. With filter, dbt's filtered simple metrics map directly to SLayer measures — no need for a separate "metrics" concept layer. Derived metrics become straightforward query formulas combining filtered measures.

Test plan

  • Unit tests: model validation for label and filter fields (test_models.py)
  • SQL generation tests: CASE WHEN for sum, avg, count(*), mixed filtered/unfiltered (test_sql_generator.py)
  • Integration tests: filtered measure sum, count, with dimensions; label propagation through enrichment (test_integration.py)
  • Full test suite: 548 passed
  • Linter: ruff check clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional label for dimensions and measures; labels surface in query results and summaries.
    • Optional filter for measures to apply SQL conditions at query time (affects aggregation, ranking, and formulas); supports dotted references to joined dimensions and mixing filtered/unfiltered measures.
  • Documentation

    • Model docs updated with examples for labels and filtered measures and how filters are applied.
  • Tests

    • Extensive integration and unit tests added, including SQL generation and injection-hardening checks.

These two enhancements prepare SLayer for dbt semantic layer ingestion:

1. `label` on Dimension and Measure — a human-readable display name
   distinct from `name` (technical) and `description` (explanatory).
   Model-level labels propagate through enrichment as fallbacks when
   no query-time label is set. Fixed _query_as_model to store labels
   in the label field (was incorrectly using description) and propagate
   descriptions separately.

2. `filter` on Measure — a SQL condition applied before aggregation via
   CASE WHEN wrapping. Enables dbt-style filtered metrics without
   creating separate models. Supports local and cross-model (joined)
   filter references. SQL generation handles all aggregation types:
   standard (SUM/AVG/etc), COUNT(*), first/last, and formula-based.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 009177f6-8661-41a0-aa8e-8dad04b45fc1

📥 Commits

Reviewing files that changed from the base of the PR and between 1ca9922 and 363df16.

📒 Files selected for processing (6)
  • slayer/core/formula.py
  • slayer/engine/enrichment.py
  • slayer/sql/generator.py
  • tests/integration/test_integration.py
  • tests/test_formula.py
  • tests/test_sql_generator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • slayer/engine/enrichment.py
  • tests/test_sql_generator.py

📝 Walkthrough

Walkthrough

Adds optional label to dimensions and measures and a measure-level filter that is parsed, resolved to filter_sql, and applied by wrapping aggregation inputs in CASE WHEN expressions; updates enrichment, SQL generation, query metadata, MCP summaries, docs, and tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation & Schema**<br>`docs/concepts/models.md` | Documents new optional label for dimensions/measures and new optional filter for measures, describing CASE WHEN application and dot-syntax references. |
| **Core Models & Validation**<br>`slayer/core/models.py` | Adds `label: Optional[str]` to Dimension and Measure; adds `filter: Optional[str]` to Measure with a field validator that applies multi-dot SQL auto-fix. |
| **Enriched Types**<br>`slayer/engine/enriched.py` | Adds `filter_sql: Optional[str]` to EnrichedMeasure. |
| **Enrichment Logic**<br>`slayer/engine/enrichment.py` | Parses/resolves measure.filter into filter_sql, propagates label to enriched dimensions/measures, and scans measure filters when discovering needed join tables. |
| **Query Engine Metadata**<br>`slayer/engine/query_engine.py` | Separates label/description lookups in column map, preserves descriptions for virtual model construction, and resolves source measure names when fetching metadata. |
| **MCP Summary Building**<br>`slayer/mcp/server.py` | Builds model summaries with explicit loops that include label, description, and filter (for measures) when present; skips hidden fields. |
| **SQL Generation**<br>`slayer/sql/generator.py` | Threads filtered ROW_NUMBER maps through generation; for measures with filter_sql, wraps aggregation inputs in `CASE WHEN <filter_sql> THEN ... END`, uses `COUNT(CASE WHEN ... THEN 1 END)` for count-without-sql, and creates per-filtered `ROW_NUMBER()` for first/last. |
| **Filter Parsing & Escaping**<br>`slayer/core/formula.py` | Adds _escape_sql_string to properly escape backslashes and single quotes; updates filter string handling to use the new escaping. |
| **Tests**<br>`tests/integration/test_integration.py`, `tests/test_models.py`, `tests/test_sql_generator.py`, `tests/test_formula.py` | Adds integration tests for filtered measures and label propagation; model tests for label/filter behavior including multi-dot auto-fix; extensive SQL-generation tests for filtered aggregations and ROW_NUMBER handling; injection/escaping tests for filters; parse-filter robustness tests. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as User/Client
    participant QE as Query Engine
    participant Enrich as Enrichment Layer
    participant SQLGen as SQL Generator
    participant DB as Database

    Client->>QE: Execute query with measures (may include measure.filter)
    QE->>Enrich: Request enrichment (resolve measures/dimensions)
    Enrich->>Enrich: Parse measure.filter -> produce filter_sql
    Enrich->>Enrich: Resolve filter columns (including joins)
    Enrich->>Enrich: Propagate labels to EnrichedDimension/EnrichedMeasure
    Enrich-->>QE: Return EnrichedQuery (includes filter_sql, labels)
    QE->>SQLGen: Build SQL from EnrichedQuery
    SQLGen->>SQLGen: For each measure with filter_sql:
    SQLGen->>SQLGen: wrap aggregation input as CASE WHEN filter_sql THEN <expr> END
    SQLGen-->>QE: Return generated SQL
    QE->>DB: Execute SQL
    DB-->>QE: Return rows
    QE->>QE: Build result.meta including labels
    QE-->>Client: Return rows + metadata

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • AivanF

Poem

🐰 In fields and measures, labels bloom like spring,

Filters whisper CASE WHEN, a neat little thing.
Rows now choose wisely which values to keep,
Enriched and tidy, the queries leap—
—🥕 From the rabbit who patches the code while you sleep.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.91% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding label fields to Dimension and Measure classes, and adding a filter field to Measure class. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
slayer/engine/enrichment.py (1)

555-564: ⚠️ Potential issue | 🟠 Major

Carry model labels into time dimensions too.

_resolve_dimensions() now falls back to dim_def.label, but _resolve_time_dimensions() still uses only td.label. A labeled model dimension queried via time_dimensions=[...] will therefore lose its label in result metadata.

Suggested fix
-                label=td.label,
+                label=td.label or (dim_def.label if dim_def else None),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/enrichment.py` around lines 555 - 564, In
_resolve_time_dimensions(), the label for EnrichedTimeDimension is currently set
only from td.label causing loss of model-level labels; update the label
expression to use td.label if present, otherwise fall back to dim_def.label (the
same fallback used in _resolve_dimensions()), i.e. when constructing
EnrichedTimeDimension (variables td and dim_def) set label = td.label or
dim_def.label so labeled model dimensions retain their label in time_dimensions
results.
🧹 Nitpick comments (1)
slayer/sql/generator.py (1)

999-1017: Build the new filter wrappers with sqlglot nodes, not string SQL.

These branches introduce more CASE WHEN ... string assembly and reparsing in the generator. That tends to be brittle around quoting and dialect differences; exp.Case, exp.When, exp.Count, etc. keep the AST structured end-to-end. As per coding guidelines "Use sqlglot AST building for SQL generation, not string concatenation".

Also applies to: 1075-1082

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/sql/generator.py` around lines 999 - 1017, Replace the string-based
CASE WHEN assembly with sqlglot AST nodes: where the code currently builds
case_sql and reparses (the branches handling measure.filter_sql around the
COUNT(*) special-case and the later wrapper when not (agg_name == "count" and
measure.sql is None)), construct an exp.Case with
exp.When(condition=sqlglot.parse_one(measure.filter_sql, dialect=self.dialect)
or preferably parse the condition into an expression once, and use that When to
wrap the existing inner node (for COUNT(*) build
exp.Count(this=exp.Case(whens=[when_expr])) or for general measures wrap inner
as exp.Case(whens=[exp.When(this=inner, condition=cond_expr)])). Keep using
self._resolve_sql, exp.Star, exp.Column, and exp.to_identifier for the inner
expression and avoid any intermediate string SQL or sqlglot.parse_one of whole
CASE strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@slayer/sql/generator.py`:
- Around line 983-991: The current code applies measure.filter_sql only inside
the projection CASE (case_sql) after rn_col (like _first_rn/_last_rn) has ranked
all rows, which causes wrong rows to be returned; instead include the filter
expression as part of the ranking input so only matching rows get rank=1. Update
the logic that generates rn_col for filtered first/last (the _first_rn/_last_rn
/ ROW_NUMBER() expression) to incorporate measure.filter_sql into the ORDER BY
or a CASE inside the ORDER BY so that rows failing the filter are ordered after
matching rows (or excluded from rank=1), and then emit the simple MAX(CASE WHEN
{rn_col} = 1 THEN {measure.model_name}.{col} END) projection (remove the
measure.filter_sql from case_sql). Ensure you modify the code path that builds
rn_col and reference measure.filter_sql, rn_col, _first_rn/_last_rn and case_sql
when making the change.

In `@tests/integration/test_integration.py`:
- Around line 1518-1542: Replace the loose assertions in
test_filtered_measure_sum so they assert exact expected values rather than range
checks: after executing SlayerQuery (source_model="orders" with
Field(formula="total_amount:sum") and Field(formula="completed_revenue:sum")),
assert the concrete numeric values for row["orders.total_amount_sum"] and
row["orders.completed_revenue_sum"] (compute from the test fixture or hard-code
the expected sums) and keep the relational check removed; apply the same change
to the other similar test block that appends Measure(name="completed_revenue",
...) and uses Field + SlayerQuery (the second test in the file that mirrors
test_filtered_measure_sum) so both tests validate exact expected totals for
total_amount_sum and completed_revenue_sum.

---

Outside diff comments:
In `@slayer/engine/enrichment.py`:
- Around line 555-564: In _resolve_time_dimensions(), the label for
EnrichedTimeDimension is currently set only from td.label causing loss of
model-level labels; update the label expression to use td.label if present,
otherwise fall back to dim_def.label (the same fallback used in
_resolve_dimensions()), i.e. when constructing EnrichedTimeDimension (variables
td and dim_def) set label = td.label or dim_def.label so labeled model
dimensions retain their label in time_dimensions results.

---

Nitpick comments:
In `@slayer/sql/generator.py`:
- Around line 999-1017: Replace the string-based CASE WHEN assembly with sqlglot
AST nodes: where the code currently builds case_sql and reparses (the branches
handling measure.filter_sql around the COUNT(*) special-case and the later
wrapper when not (agg_name == "count" and measure.sql is None)), construct an
exp.Case with exp.When(condition=sqlglot.parse_one(measure.filter_sql,
dialect=self.dialect) or preferably parse the condition into an expression once,
and use that When to wrap the existing inner node (for COUNT(*) build
exp.Count(this=exp.Case(whens=[when_expr])) or for general measures wrap inner
as exp.Case(whens=[exp.When(this=inner, condition=cond_expr)])). Keep using
self._resolve_sql, exp.Star, exp.Column, and exp.to_identifier for the inner
expression and avoid any intermediate string SQL or sqlglot.parse_one of whole
CASE strings.
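
Returning to the inline comment on filtered first/last measures in `slayer/sql/generator.py`: the sketch below reproduces the reported problem on a small SQLite table and shows one way of folding the filter into the ranking. The table, data, and the `last_amount` helper are assumptions made for the example, and it needs an SQLite build with window functions (3.25 or later). Unlike the comment's suggestion, the sketch also keeps the filter in the projection, as a guard for the case where no row matches at all.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (created_at TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2026-01-01", 100.0, "completed"),
        ("2026-01-02", 50.0, "completed"),
        ("2026-01-03", 75.0, "cancelled"),  # latest row, but fails the filter
    ],
)

FILTER = "status = 'completed'"


def last_amount(order_by: str) -> Optional[float]:
    """Hypothetical 'last amount among completed orders' using the given ranking."""
    sql = f"""
        WITH ranked AS (
            SELECT amount, status,
                   ROW_NUMBER() OVER (ORDER BY {order_by}) AS rn
            FROM orders
        )
        SELECT MAX(CASE WHEN rn = 1 AND {FILTER} THEN amount END) FROM ranked
    """
    return conn.execute(sql).fetchone()[0]


# Ranking ignores the filter: rank 1 goes to the cancelled row, so nothing survives.
naive = last_amount("created_at DESC")
# Ranking sorts matching rows first: rank 1 is the latest completed order.
fixed = last_amount(f"CASE WHEN {FILTER} THEN 0 ELSE 1 END, created_at DESC")
print(naive, fixed)  # None 50.0
```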

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7571e379-6655-45cf-865c-3335195e24b1

📥 Commits

Reviewing files that changed from the base of the PR and between d7bbae7 and 1ca9922.

📒 Files selected for processing (10)
  • docs/concepts/models.md
  • slayer/core/models.py
  • slayer/engine/enriched.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • slayer/mcp/server.py
  • slayer/sql/generator.py
  • tests/integration/test_integration.py
  • tests/test_models.py
  • tests/test_sql_generator.py

Comment thread slayer/sql/generator.py Outdated
Comment thread tests/integration/test_integration.py Outdated
@ZmeiGorynych
Member Author

@CodeRabbit review this!

Measure-level `filter` strings flow through parse_filter() → Python AST →
SQL string, then get interpolated into CASE WHEN clauses in the generator.
The existing single-quote doubling protected against standard-SQL injection,
but backslashes were left unescaped — so a filter ending in `\` emitted SQL
like `'a\'` which on MySQL/ClickHouse (default backslash-escape mode) reads
as an unclosed string literal, letting subsequent tokens be consumed as
string content. sqlglot's tokenizer surfaces this as a TokenError (DoS /
error-leakage vector).

Fix: in `_filter_node_to_sql` and `_get_string_arg`, escape `\` → `\\`
before `'` → `''` so emitted string literals round-trip safely in every
supported dialect.
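
A minimal sketch of the escaping order described in the fix, assuming nothing about the real `_escape_sql_string` beyond what the message states; `escape_sql_string` and `quote_literal` are illustrative names.

```python
def escape_sql_string(value: str) -> str:
    """Double backslashes first, then double single quotes."""
    return value.replace("\\", "\\\\").replace("'", "''")


def quote_literal(value: str) -> str:
    """Wrap an escaped value in single quotes to form a SQL string literal."""
    return "'" + escape_sql_string(value) + "'"


print(quote_literal("it's"))   # 'it''s'
print(quote_literal("a\\"))    # 'a\\'  (the trailing backslash no longer escapes the closing quote)
```

With only quote doubling, the second call would have produced `'a\'`, which is the unclosed-literal case described above for dialects that treat backslash as an escape character.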

Also adds formal injection test suites:
- tests/test_formula.py::TestParseFilterInjection — parser-level coverage
  for DROP/UNION/comments/stacked-semicolon/unknown-function rejection,
  tautology acceptance, embedded-quote doubling, backslash handling in
  both literal and LIKE paths, identifier-break rejection, deep-nesting
  bounds.
- tests/test_sql_generator.py::TestMeasureFilterInjection — end-to-end
  coverage parametrised over postgres/mysql/sqlite/duckdb that round-trips
  generated SQL through sqlglot to verify no unclosed literals escape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ZmeiGorynych ZmeiGorynych merged commit ea028d2 into main Apr 16, 2026
3 checks passed


2 participants