Add label and filter fields to Dimension/Measure #30
Conversation
These two enhancements prepare SLayer for dbt semantic layer ingestion:

1. `label` on Dimension and Measure — a human-readable display name distinct from `name` (technical) and `description` (explanatory). Model-level labels propagate through enrichment as fallbacks when no query-time label is set. Fixed `_query_as_model` to store labels in the `label` field (it was incorrectly using `description`) and to propagate descriptions separately.
2. `filter` on Measure — a SQL condition applied before aggregation via CASE WHEN wrapping. Enables dbt-style filtered metrics without creating separate models. Supports local and cross-model (joined) filter references. SQL generation handles all aggregation types: standard (SUM/AVG/etc.), COUNT(*), first/last, and formula-based.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: adds an optional `label` to dimensions and measures and an optional `filter` to measures.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as User/Client
    participant QE as Query Engine
    participant Enrich as Enrichment Layer
    participant SQLGen as SQL Generator
    participant DB as Database
    Client->>QE: Execute query with measures (may include measure.filter)
    QE->>Enrich: Request enrichment (resolve measures/dimensions)
    Enrich->>Enrich: Parse measure.filter -> produce filter_sql
    Enrich->>Enrich: Resolve filter columns (including joins)
    Enrich->>Enrich: Propagate labels to EnrichedDimension/EnrichedMeasure
    Enrich-->>QE: Return EnrichedQuery (includes filter_sql, labels)
    QE->>SQLGen: Build SQL from EnrichedQuery
    SQLGen->>SQLGen: For each measure with filter_sql, wrap aggregation input as CASE WHEN filter_sql THEN <expr> END
    SQLGen-->>QE: Return generated SQL
    QE->>DB: Execute SQL
    DB-->>QE: Return rows
    QE->>QE: Build result.meta including labels
    QE-->>Client: Return rows + metadata
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
slayer/engine/enrichment.py (1)
555-564: ⚠️ Potential issue | 🟠 Major

Carry model labels into time dimensions too.

`_resolve_dimensions()` now falls back to `dim_def.label`, but `_resolve_time_dimensions()` still uses only `td.label`. A labeled model dimension queried via `time_dimensions=[...]` will therefore lose its label in result metadata.

Suggested fix:

```diff
- label=td.label,
+ label=td.label or (dim_def.label if dim_def else None),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/engine/enrichment.py` around lines 555 - 564, In _resolve_time_dimensions(), the label for EnrichedTimeDimension is currently set only from td.label causing loss of model-level labels; update the label expression to use td.label if present, otherwise fall back to dim_def.label (the same fallback used in _resolve_dimensions()), i.e. when constructing EnrichedTimeDimension (variables td and dim_def) set label = td.label or dim_def.label so labeled model dimensions retain their label in time_dimensions results.
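The fallback rule the comment describes can be sketched in isolation. Everything here is a hypothetical simplification: `resolve_label` and its plain-string arguments stand in for the real `EnrichedTimeDimension` construction.

```python
# Minimal sketch of the suggested fallback; the function name and plain
# string arguments are hypothetical simplifications of the enrichment code.
def resolve_label(query_label, model_label):
    # A query-time label wins; the model-level label is only a fallback.
    return query_label or model_label

print(resolve_label(None, "Order Date"))    # Order Date
print(resolve_label("Date", "Order Date"))  # Date
```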
🧹 Nitpick comments (1)
slayer/sql/generator.py (1)
999-1017: Build the new filter wrappers with sqlglot nodes, not string SQL.

These branches introduce more `CASE WHEN ...` string assembly and reparsing in the generator. That tends to be brittle around quoting and dialect differences; `exp.Case`, `exp.When`, `exp.Count`, etc. keep the AST structured end-to-end. As per coding guidelines: "Use sqlglot AST building for SQL generation, not string concatenation".

Also applies to: 1075-1082
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/sql/generator.py` around lines 999 - 1017, Replace the string-based CASE WHEN assembly with sqlglot AST nodes: where the code currently builds case_sql and reparses (the branches handling measure.filter_sql around the COUNT(*) special-case and the later wrapper when not (agg_name == "count" and measure.sql is None)), construct an exp.Case with exp.When(condition=sqlglot.parse_one(measure.filter_sql, dialect=self.dialect) or preferably parse the condition into an expression once, and use that When to wrap the existing inner node (for COUNT(*) build exp.Count(this=exp.Case(whens=[when_expr])) or for general measures wrap inner as exp.Case(whens=[exp.When(this=inner, condition=cond_expr)])). Keep using self._resolve_sql, exp.Star, exp.Column, and exp.to_identifier for the inner expression and avoid any intermediate string SQL or sqlglot.parse_one of whole CASE strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/sql/generator.py`:
- Around line 983-991: The current code applies measure.filter_sql only inside
the projection CASE (case_sql) after rn_col (like _first_rn/_last_rn) has ranked
all rows, which causes wrong rows to be returned; instead include the filter
expression as part of the ranking input so only matching rows get rank=1. Update
the logic that generates rn_col for filtered first/last (the _first_rn/_last_rn
/ ROW_NUMBER() expression) to incorporate measure.filter_sql into the ORDER BY
or a CASE inside the ORDER BY so that rows failing the filter are ordered after
matching rows (or excluded from rank=1), and then emit the simple MAX(CASE WHEN
{rn_col} = 1 THEN {measure.model_name}.{col} END) projection (remove the
measure.filter_sql from case_sql). Ensure you modify the code path that builds
rn_col and reference measure.filter_sql, rn_col, _first_rn/_last_rn and case_sql
when making the change.
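The ranking change described above can be illustrated with plain strings (illustrative only; the table and column names are invented, and the real generator builds these expressions itself):

```python
# Hypothetical "first total_amount among completed orders" per customer.
filter_sql = "orders.status = 'completed'"

# Rows failing the filter sort after matching rows, so rank 1 can only land
# on a row that satisfies the filter (whenever any matching row exists).
rn_col = (
    "ROW_NUMBER() OVER (PARTITION BY orders.customer_id ORDER BY "
    f"CASE WHEN {filter_sql} THEN 0 ELSE 1 END, orders.created_at)"
)

# The projection can then stay simple; a belt-and-braces variant would also
# keep the filter inside this CASE for groups with no matching rows at all.
projection = f"MAX(CASE WHEN {rn_col} = 1 THEN orders.total_amount END)"
print(projection)
```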
In `@tests/integration/test_integration.py`:
- Around line 1518-1542: Replace the loose assertions in
test_filtered_measure_sum so they assert exact expected values rather than range
checks: after executing SlayerQuery (source_model="orders" with
Field(formula="total_amount:sum") and Field(formula="completed_revenue:sum")),
assert the concrete numeric values for row["orders.total_amount_sum"] and
row["orders.completed_revenue_sum"] (compute from the test fixture or hard-code
the expected sums) and keep the relational check removed; apply the same change
to the other similar test block that appends Measure(name="completed_revenue",
...) and uses Field + SlayerQuery (the second test in the file that mirrors
test_filtered_measure_sum) so both tests validate exact expected totals for
total_amount_sum and completed_revenue_sum.
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7571e379-6655-45cf-865c-3335195e24b1
📒 Files selected for processing (10)
- docs/concepts/models.md
- slayer/core/models.py
- slayer/engine/enriched.py
- slayer/engine/enrichment.py
- slayer/engine/query_engine.py
- slayer/mcp/server.py
- slayer/sql/generator.py
- tests/integration/test_integration.py
- tests/test_models.py
- tests/test_sql_generator.py
@CodeRabbit review this!
Measure-level `filter` strings flow through `parse_filter()` → Python AST → SQL string, then get interpolated into CASE WHEN clauses in the generator. The existing single-quote doubling protected against standard-SQL injection, but backslashes were left unescaped — so a filter ending in `\` emitted SQL like `'a\'`, which on MySQL/ClickHouse (default backslash-escape mode) reads as an unclosed string literal, letting subsequent tokens be consumed as string content. sqlglot's tokenizer surfaces this as a TokenError (a DoS / error-leakage vector).

Fix: in `_filter_node_to_sql` and `_get_string_arg`, escape `\` → `\\` before `'` → `''` so emitted string literals round-trip safely in every supported dialect.

Also adds formal injection test suites:

- `tests/test_formula.py::TestParseFilterInjection` — parser-level coverage for DROP/UNION/comments/stacked-semicolon/unknown-function rejection, tautology acceptance, embedded-quote doubling, backslash handling in both literal and LIKE paths, identifier-break rejection, deep-nesting bounds.
- `tests/test_sql_generator.py::TestMeasureFilterInjection` — end-to-end coverage parametrised over postgres/mysql/sqlite/duckdb that round-trips generated SQL through sqlglot to verify no unclosed literals escape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
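The commit's escaping rule can be sketched as a standalone helper. The function name `escape_sql_string` is hypothetical; in the codebase the logic lives inside `_filter_node_to_sql` and `_get_string_arg`.

```python
def escape_sql_string(value: str) -> str:
    """Escape a value for embedding in a single-quoted SQL string literal.

    Sketch of the commit's fix: double backslashes first, then double single
    quotes, so the emitted literal survives both standard-SQL dialects and
    backslash-escaping ones (MySQL/ClickHouse default modes).
    """
    return value.replace("\\", "\\\\").replace("'", "''")

# A filter value ending in a backslash no longer yields an unclosed literal:
print("'" + escape_sql_string("a\\") + "'")  # 'a\\'
```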
Summary
Two model-level enhancements that prepare SLayer for dbt semantic layer ingestion:
1. `label` field on Dimension and Measure

   A human-readable display name (e.g., "Order Date", "Total Revenue"), distinct from the technical `name` and the explanatory `description`. This maps directly to dbt's `label` field on dimensions and measures.

   - Fixed `_query_as_model()` to store labels in the `label` field — previously labels were incorrectly stored as `description`, and measure labels were lost entirely
   - `_model_to_summary()` now includes labels in dimension/measure output

2. `filter` field on Measure

   A SQL condition applied before aggregation via CASE WHEN wrapping. This enables dbt-style filtered metrics (e.g., `loss_payment_amount = SUM(claim_amount) WHERE has_loss_payment = 1`) without needing to create separate models per metric.

   Key implementation details:

   - Filter columns are resolved via the `resolve_filter_columns` logic
   - Cross-model filter references resolve through joins, e.g. `"categories.type = 'electronics'"`
   - Standard aggregations (SUM/AVG/MIN/MAX): wraps the inner expression in CASE WHEN
   - COUNT(*): becomes `COUNT(CASE WHEN filter THEN 1 END)`
   - first/last: filter added to the ROW_NUMBER CASE condition
   - Formula aggregations (`weighted_avg`, custom): `{value}` placeholder wrapped before substitution
   - `_resolve_joins()` scans measure filters for join references to ensure necessary JOINs are active

Why this matters
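The per-aggregation wrapping shapes above can be sketched at the string level. The dispatch below is purely illustrative: the function name and agg keys are invented, and the real generator works on sqlglot ASTs rather than strings.

```python
def wrap_filtered(agg: str, expr: str, filter_sql: str) -> str:
    """Render one filtered-measure shape; string-level for illustration only."""
    if agg == "count_star":
        # COUNT(*) has no inner expression to wrap, so count 1 per match.
        return f"COUNT(CASE WHEN {filter_sql} THEN 1 END)"
    if agg in ("sum", "avg", "min", "max"):
        return f"{agg.upper()}(CASE WHEN {filter_sql} THEN {expr} END)"
    if agg == "formula_value":
        # What gets substituted for the {value} placeholder in a formula.
        return f"CASE WHEN {filter_sql} THEN {expr} END"
    # first/last fold the filter into their ROW_NUMBER condition instead.
    raise ValueError(f"unsupported aggregation: {agg}")

print(wrap_filtered("count_star", "*", "status = 'completed'"))
# COUNT(CASE WHEN status = 'completed' THEN 1 END)
```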
These are the prerequisite SLayer changes needed to faithfully ingest dbt semantic layer definitions. With `label`, we can preserve dbt's human-readable names. With `filter`, dbt's filtered simple metrics map directly to SLayer measures — no need for a separate "metrics" concept layer. Derived metrics become straightforward query formulas combining filtered measures.

Test plan

- Model tests (`test_models.py`)
- SQL generator tests (`test_sql_generator.py`)
- Integration tests (`test_integration.py`)
- `ruff check` clean

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- Optional `label` for dimensions and measures; labels surface in query results and summaries.
- Optional `filter` for measures to apply SQL conditions at query time (affects aggregation, ranking, and formulas); supports dotted references to joined dimensions and mixing filtered/unfiltered measures.

Documentation
Tests