DEV-1378: string-hygiene operators + Mode A predicate routing#113
Conversation
…gh parse_sql_predicate Add a lowercase allowlist of scalar string functions (`lower`, `upper`, `trim`, `replace`, `substr`, `instr`, `length`, `concat`) plus the SQL `||` operator (folded to `concat(...)`) to `SlayerQuery.filters`. Fold two pre-existing issues caught while tracing the parser: 1. `parse_filter(mode="sql")` plumbing was dead code — never called from anywhere — left over from DEV-1369 round 1. Removed (~50 LOC). 2. `Column.filter` / `SlayerModel.filters` (Mode A SQL) constructed cleanly through `parse_sql_predicate` but were re-parsed via the DSL `parse_filter` at enrichment time, raising `Unknown filter function 'json_extract'` (or any other SQL function) on every query. Now routed through `parse_sql_predicate` at all four enrichment sites, with `_collect_needed_paths` taking a mode-tagged filter list so model and query filters use the right parser. Two `resolve_filter_columns` calls (model `strict=False`, query `strict=True`) replace the single combined call. Add `SQLGenerator._parse_predicate` that wraps in `SELECT 1 WHERE ...` to dodge SQLite/MySQL's `REPLACE`-as-Command parse trap (a pre-existing latent bug surfaced by the new operators). Swapped at WHERE / HAVING assembly and the windowed-CTE filter-AST parse. `||` is preprocessed to Python's `<<` (LShift) — same precedence as SQL `||` relative to comparison/boolean/arithmetic ops — and `_binop_to_sql` flattens LShift chains into n-ary `concat(...)` calls. Test coverage: 19 new `TestStringHygieneFilters` tests in `test_formula.py`; 24 dialect-translation + replace-predicate tests in `test_sql_generator.py` (5 dialects × ops); 15 `parse_sql_predicate` unit tests in new `test_sql_predicate.py`; 3 enrichment-runtime tests in `test_reference_semantics.py`; 6 SQLite integration tests covering both stages end-to-end. Updated injection tests to reflect the new Mode A contract (user-trusted SQL; sqlglot is the dialect-aware gate). 2283 unit / 93 SQLite / 33 DuckDB / 48 Postgres integration tests pass. Doc sync: `docs/concepts/references.md`, `docs/concepts/queries.md`, `CLAUDE.md`, `.claude/skills/slayer-query.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both `_parse_node` (formula → FieldSpec) and `_call_to_sql` (filter → SQL) made ad-hoc string checks against `ALL_TRANSFORMS` / `STRING_HYGIENE_OPS` / `__like__` / `__notlike__`. Extract a `_classify_call_name(name) -> CallCategory` helper as the single source of truth; each walker still decides what to do with the category. No behavior change. Same error messages, same SQL emission. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR implements DEV-1378: adds a lowercase-only string-hygiene allowlist for DSL filters, rewrites SQL ChangesString-Hygiene Filter Functions & Mode Distinction
Test Coverage for String-Hygiene Filters & SQL Mode
Sequence Diagram(s)sequenceDiagram
participant User as User Query
participant DSL as DSL Parser (parse_filter)
participant SQLP as SQL Predicate Parser (parse_sql_predicate)
participant Enr as Enrichment
participant Gen as SQLGenerator
User->>DSL: SlayerQuery.filters (DSL)
User->>SQLP: SlayerModel.filters / Column.filter (SQL fragments)
DSL->>Enr: processed DSL filters + transforms
SQLP->>Enr: parsed SQL-mode predicates
Enr->>Gen: enriched filters (WHERE/HAVING, measure.filter_sql)
Gen->>SQLP: _parse_predicate for fragments
Gen->>Enr: emits final SQL with concat(...) flattened
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_sql_generator.py (1)
3497-3497: ⚡ Quick winUse keyword args in newly added
_generate(...)calls.The new calls pass 3 positional args; this conflicts with the repo rule and makes call-order mistakes easier.
♻️ Suggested pattern
- sql = await _generate(SQLGenerator(dialect=dialect), query, orders_model) + sql = await _generate( + generator=SQLGenerator(dialect=dialect), + query=query, + model=orders_model, + )As per coding guidelines: "Use keyword arguments for functions with more than 1 parameter".
Also applies to: 6736-6737, 6757-6758, 6778-6779, 6800-6801, 6824-6825, 6845-6846
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_sql_generator.py` at line 3497, Replace positional arguments in the new _generate(...) calls with keyword arguments to follow the coding guideline; specifically change calls like _generate(SQLGenerator(dialect=dialect), query, orders_model) to use named parameters for the generator, query, and model (referencing the _generate function, SQLGenerator instantiation, and the variables dialect, query, orders_model) and apply the same change to the other occurrences listed so parameters are passed by keyword rather than position.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@slayer/core/formula.py`:
- Around line 1009-1042: The LIKE-preprocessor (_preprocess_like) only matches
bare/dotted identifiers so expressions like lower(name) like 'a%' fail; update
_preprocess_like to accept call- and parenthesized-left-hand-sides (e.g.,
function calls and parenthesis-wrapped expressions) or move LIKE handling into
the AST stage inside _filter_node_to_sql/_call_to_sql so CALL nodes are
recognized as valid LHS for LIKE/NOT LIKE; specifically, change the
pattern-matching logic in _preprocess_like to allow ast.Call-like text on the
left (functionName(...), (expr), or dotted calls) and ensure replacements
produce the same hygiene-helper form that _call_to_sql expects.
In `@slayer/engine/schema_drift.py`:
- Around line 550-560: The current _filter_refs(filter_str) only handles Mode A
SQL via parse_sql_predicate and is being reused for SlayerQuery.filters (Mode B
DSL); update the call-sites so SQL-only contexts (Column.filter,
SlayerModel.filters) continue to call _filter_refs which uses
parse_sql_predicate, and route SlayerQuery.filters through a new/alternate
DSL-aware extractor (e.g., _filter_refs_dsl or a branch in _filter_refs_on_base)
that uses the DSL parser to extract referenced columns/measures; locate usages
in _filter_refs_on_base and any callers that process SlayerQuery.filters and
switch them to the DSL extractor so validate_models() sees query-backed stage
filters correctly.
---
Nitpick comments:
In `@tests/test_sql_generator.py`:
- Line 3497: Replace positional arguments in the new _generate(...) calls with
keyword arguments to follow the coding guideline; specifically change calls like
_generate(SQLGenerator(dialect=dialect), query, orders_model) to use named
parameters for the generator, query, and model (referencing the _generate
function, SQLGenerator instantiation, and the variables dialect, query,
orders_model) and apply the same change to the other occurrences listed so
parameters are passed by keyword rather than position.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1651889d-3dc2-4c28-a0ca-5cef3e73229e
📒 Files selected for processing (13)
.claude/skills/slayer-query.mdCLAUDE.mddocs/concepts/queries.mddocs/concepts/references.mdslayer/core/formula.pyslayer/engine/enrichment.pyslayer/engine/schema_drift.pyslayer/sql/generator.pytests/integration/test_integration.pytests/test_formula.pytests/test_reference_semantics.pytests/test_sql_generator.pytests/test_sql_predicate.py
Address CodeRabbit + Sonar feedback on the DEV-1378 PR. Correctness (CodeRabbit, MAJOR): - formula.py: broaden `_LIKE_RE` so the LHS may be a hygiene call (`lower(name) like 'a%'`, `trim(email) not like '%@x'`). Three new test cases pin the behaviour. - schema_drift.py: add `_filter_refs_dsl` for `SlayerQuery.filters` (Mode B DSL). The Mode-A switch in DEV-1378 left `_filter_refs_on_base` pointing at the SQL-only extractor, so stage filters with colon syntax or transforms collapsed to `[]` and `validate_models` missed cascading drops on query-backed models. `ParsedFilter` now carries `agg_refs: List[AggregatedMeasureRef]` so schema drift can recover the underlying measure names. Regression test added. Sonar (new-code duplication): - enrichment.py: collapse the duplicated inner-alias resolution branches (change/change_pct vs other self-join transforms) into one `_resolve_inner_alias` helper inside `_flatten_spec`. - test_integration.py: extract the shared `items` SQLite setup into `_setup_items_db(tmp_path)`. - test_sql_generator.py: collapse the 47-line cross-model `last`-filter test setup into a static `_filtered_last_cross_model_sql` helper on `TestFilteredMeasures`. CodeRabbit nitpick: - test_sql_generator.py:3497 — swap positional `_generate(...)` args for keywords. 2287 unit tests pass; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/test_sql_generator.py (1)
6702-6811: ⚡ Quick winUse keyword args for
_generate(...)calls in these new tests.Lines 6702, 6723, 6744, 6766, 6790, and 6811 pass 3 positional arguments to
_generate(...). Please switch these to keyword arguments for consistency with repo rules.As per coding guidelines:
**/*.py: Use keyword arguments for functions with more than 1 parameter.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_sql_generator.py` around lines 6702 - 6811, The tests call _generate(SQLGenerator(dialect=dialect), query, orders_model) with positional args; change each call to use keyword arguments (e.g. _generate(generator=SQLGenerator(dialect=dialect), query=query, model=orders_model) or the actual parameter names used by _generate) for the calls in test_instr_translates_per_dialect, test_substr_translates_per_dialect, test_concat_via_pipe_pipe_translates_per_dialect, test_replace_in_query_filter, test_replace_in_column_filter, and the earlier invocation at the start of the file so all six occurrences pass named parameters instead of positional ones.tests/test_validate_models.py (1)
643-643: ⚡ Quick winUse keyword args in
_entry_forcall.Line 643 should pass named arguments to match repo conventions for multi-parameter calls.
Proposed change
- qb_entry = _entry_for("qb_orders_filtered", out) + qb_entry = _entry_for(model_name="qb_orders_filtered", entries=out)As per coding guidelines:
**/*.py: Use keyword arguments for functions with more than 1 parameter.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_validate_models.py` at line 643, Change the positional call to _entry_for to use keyword arguments per repo convention: replace the positional invocation _entry_for("qb_orders_filtered", out) with a call that passes explicit parameter names for the two arguments (e.g., the name for the entry key and the output variable) so the call to _entry_for uses keyword args; update the invocation near qb_entry to reference the same variables but as named parameters consistent with other usages of _entry_for.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_sql_generator.py`:
- Around line 6702-6811: The tests call _generate(SQLGenerator(dialect=dialect),
query, orders_model) with positional args; change each call to use keyword
arguments (e.g. _generate(generator=SQLGenerator(dialect=dialect), query=query,
model=orders_model) or the actual parameter names used by _generate) for the
calls in test_instr_translates_per_dialect, test_substr_translates_per_dialect,
test_concat_via_pipe_pipe_translates_per_dialect, test_replace_in_query_filter,
test_replace_in_column_filter, and the earlier invocation at the start of the
file so all six occurrences pass named parameters instead of positional ones.
In `@tests/test_validate_models.py`:
- Line 643: Change the positional call to _entry_for to use keyword arguments
per repo convention: replace the positional invocation
_entry_for("qb_orders_filtered", out) with a call that passes explicit parameter
names for the two arguments (e.g., the name for the entry key and the output
variable) so the call to _entry_for uses keyword args; update the invocation
near qb_entry to reference the same variables but as named parameters consistent
with other usages of _entry_for.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 04b33c24-63f9-48aa-94c7-d204d504adbd
📒 Files selected for processing (7)
slayer/core/formula.pyslayer/engine/enrichment.pyslayer/engine/schema_drift.pytests/integration/test_integration.pytests/test_formula.pytests/test_sql_generator.pytests/test_validate_models.py
🚧 Files skipped from review as they are similar to previous changes (3)
- slayer/engine/enrichment.py
- slayer/core/formula.py
- tests/integration/test_integration.py
CodeRabbit nitpick — six new ``_generate(...)`` calls in the
``TestStringHygieneDialectTranslation`` and ``TestReplaceFunctionInPredicate``
classes passed positional arguments. Switch to keyword args per
CLAUDE.md ("Use keyword arguments for functions with more than 1
parameter").
The remaining seven positional ``_generate(...)`` calls earlier in the
file pre-date this PR and were not flagged — left alone to keep the
diff focused.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
Closes two implementation gaps in SLayer's reference semantics (Mode A SQL vs Mode B DSL — see
docs/concepts/references.md):parse_sql_predicate. PreviouslyColumn.filterandSlayerModel.filterswere created via the SQL path but re-parsed at enrichment time via the DSL parser, so any model filter containing arbitrary SQL functions (json_extract,coalesce,CASE WHEN, …) raisedUnknown filter function 'json_extract'at runtime. The fix introduces a dedicatedslayer.sql.sql_predicate.parse_sql_predicatemodule and wires it through every enrichment site that handles Mode A filters; Mode B query filters (SlayerQuery.filters) continue through the DSL parser. Strictness is split: model filters resolve withstrict=False, query filters withstrict=True.SlayerQuery.filters(Mode B) accepts a small string-hygiene allowlist. Eight lowercase scalars —lower,upper,trim,replace,substr,instr,length,concat— plus the SQL||operator, which is rewritten toconcat(...)by a new_preprocess_concatpass. Names are lowercase only, matching SLayer's existing transform convention. sqlglot translates per-dialect at SQL-generation time. Single source of truth:STRING_HYGIENE_OPSfrozenset inslayer/core/formula.py.Also folded in: a SQL generator change to wrap user-supplied predicates in
SELECT 1 WHERE …before parsing (_parse_predicate). This dodges a sqlglot trap on SQLite/MySQL whereREPLACEat the start of a fragment is parsed as theREPLACE INTOstatement keyword, not the function call.What's in this PR
slayer/core/formula.py—STRING_HYGIENE_OPSallowlist,_preprocess_concat(||→<<→ flatconcat(...)), removed the deadparse_filter(mode='sql')branch (~50 LOC), tightened_call_to_sqlto the new allowlist, plus a_classify_call_namehelper that gives the formula and filter walkers one categorization function instead of duplicated string checks.slayer/sql/sql_predicate.py— new module.parse_sql_predicaterejects DSL constructs (colon syntax, transforms, rawOVER) up front, extracts column-shaped identifiers, and accepts arbitrary SQL function calls. No sqlglot at parse time; full dialect-aware parsing is deferred to SQL generation.slayer/engine/enrichment.py— every Mode A filter site (measure_def.filter, model-filter validation loop,_collect_needed_paths) routes throughparse_sql_predicate. Mode-tagged filter list (List[Tuple[str, str]]) threads through_resolve_joinsand_collect_needed_pathsso each parser applies the right validator.resolve_filter_columnsis now called twice with mode-appropriate strictness.slayer/sql/generator.py— new_parse_predicate(text)wraps fragments inSELECT 1 WHERE …to dodge theREPLACEstatement-keyword trap. Three call sites updated (measure filter SQL, WHERE assembly, HAVING assembly).slayer/engine/schema_drift.py— minor:_collect_needed_pathssignature accepts the mode-tagged filter list.docs/concepts/queries.mddocuments the string-hygiene allowlist and||rewrite.CLAUDE.mdand.claude/skills/slayer-query.mdupdated.TestStringHygieneFilterscases intests/test_formula.py; 15 unit tests in newtests/test_sql_predicate.py; 24 dialect-translation + REPLACE-predicate tests intests/test_sql_generator.py; 3 enrichment-runtime tests intests/test_reference_semantics.py; 6 SQLite integration tests intests/integration/test_integration.py.Scope decisions
After landing the above, I asked whether the formula parser (
parse_formula→FieldSpec) and the DSL filter parser (parse_filter→ParsedFilter) — both living inslayer/core/formula.pyand sharing preprocessing — should be partly or wholly merged. The candidate refactor was a shared expression IR with two thin emitters (one forFieldSpec, one for SQL strings).Decided in scope (and shipped here):
_classify_call_namehelper for call-name categorization (transform/hygiene/like_internal/unknown). Both walkers use it; their decision logic stays put. ~+25 / −5 LOC, no behavior change.Decided out of scope (deliberately not done):
MixedArithmeticField.sqlis built fromast.unparse(node)after AST mutation, with pre-order_t0/_t1placeholder assignment. Tests pin those exact strings; reproducing them from a Pydantic IR effectively requires re-implementing AST serialization.nullif,coalesce,ln, …) differently and correctly: formula rejects them at the root but accepts them inside arithmetic via_replace_calls_in_arith's passthrough; filter is strict. A literal IR-and-emitter implementation would silently change scalar-call acceptance rules.replace,substr,||flattening, LIKE escaping). Round-tripping through an IR is likely to shift quoting / parens / flattening somewhere subtle._BINOP_OP_MAP/_compare_op_to_sql/_compare_to_sql. These are filter-only; the formula walker has no equivalents because it usesast.unparsefor arithmetic and comparisons. There is nothing to dedup._resolve_dotted_attributeand_collect_names. The two functions look related but produce different shapes (single string vs. list with consumed-Name-node tracking); collapsing them required touching_collect_names's consumed-Name set, with no behavior win.The IR refactor remains a possible future direction if a third consumer for the parsed expression appears (e.g. a static analyzer or a non-SQL backend); until then, the two walkers + shared preprocessing layer is the right shape.
Test plan
poetry run pytest tests/test_formula.py -v— 179 passpoetry run pytest -m "not integration"— 2283 passpoetry run ruff check slayer/ tests/— cleanpoetry run pytest tests/integration/ -m integration— verify SQLite / Postgres / DuckDB suites greenjson_extract/coalesceand a Mode B query filter usinglower(name) || '_x' = 'foo_x'🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
||is rewritten to concat(...).Improvements
Documentation
Tests