DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule #156
…G slack rule

The slack FUNC_STYLE_AGG normalizer previously sourced custom aggregation names from the source model only, so `rolling_avg(customers.score)` reached the Mode-B parser unrewritten and raised UnknownFunctionError.

Both the query path (`_normalize_stage`) and the model-save path (`normalize_model` via `save_model`) now collect aggregations reachable through the join graph:
- the query path uses a sync `ResolvedSourceBundle.reachable_aggregation_names` walk over the pre-resolved bundle (scoped per stage);
- the save path reuses `_collect_reachable_agg_names` with a best-effort storage closure that swallows AmbiguousModelError and other lookup errors.

`normalize_model` gains a `custom_agg_names` param mirroring `normalize_query`, with None => fallback-to-own-aggs and frozenset() => suppress-fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
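The `None` versus `frozenset()` distinction in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions: `ToyModel` and `effective_agg_names` are hypothetical names invented here, not the helpers in `normalization.py`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model that defines its own custom aggregations."""
    name: str
    own_aggs: list[str] = field(default_factory=list)


def effective_agg_names(
    *, model: ToyModel, custom_agg_names: Optional[frozenset[str]] = None,
) -> frozenset[str]:
    """Pick the set of custom aggregation names a rewrite may recognise."""
    if custom_agg_names is None:
        # None: the caller supplied no context, so fall back to the model's own aggs.
        return frozenset(model.own_aggs)
    # An explicit set, including the empty frozenset, is used verbatim.
    return custom_agg_names


orders = ToyModel(name="orders", own_aggs=["weighted_sum"])
fallback = effective_agg_names(model=orders)
suppressed = effective_agg_names(model=orders, custom_agg_names=frozenset())
reachable = effective_agg_names(
    model=orders, custom_agg_names=frozenset({"weighted_sum", "rolling_avg"})
)
```

Testing for `is None` rather than truthiness is what keeps the empty frozenset meaningful: an empty set is falsy, so `if not custom_agg_names` would silently turn "suppress fallback" back into "fallback".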
📝 Walkthrough
Extends custom aggregation support across joined models by implementing reachability discovery through join graphs. Normalization now accepts custom aggregation names as context, enabling query-time and save-time discovery pipelines to recognize func-style aggregations defined on reachable models.
Changes
Cross-model custom aggregation reachability (DEV-1500)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
6ca070d to beb6924
…TYLE_AGG callers

The query/save normalization paths walked the join graph for custom aggregation names in beb6924, but the two quiet callers of ``func_style_agg_to_colon`` still passed source-model-only aggs:
- ``slayer.memories.resolver.extract_entities_from_query``: a saved memory whose query references a joined-model custom agg in funcstyle (e.g. ``rolling_avg(customers.score)``) lost the ``customers.score`` entity tag because the rewrite did not fire.
- ``slayer.engine.schema_drift._cascade_measures``: a ModelMeasure on ``orders`` using the same funcstyle did not cascade-drop when the joined column was removed from ``customers``.

Both paths now collect custom aggregation names reachable through the join graph: the resolver reuses the async ``_collect_reachable_agg_names`` with a best-effort storage closure (matching ``save_model``), and the cascade does a sync BFS over ``_CascadeState.models_by_name``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
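The "best-effort storage closure" idea can be sketched roughly as below. Everything here is an assumption for illustration: `TOY_MODELS`, `best_effort_lookup`, and `collect_reachable` are hypothetical names, and the real `_collect_reachable_agg_names` signature is not shown in this PR text.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Optional

# Hypothetical toy storage: model name -> (custom agg names, join targets).
Model = tuple[list[str], list[str]]
TOY_MODELS: dict[str, Model] = {
    "orders": (["weighted_sum"], ["customers", "ghost"]),  # "ghost" does not exist
    "customers": (["rolling_avg"], ["orders"]),  # cycle back to orders
}


async def best_effort_lookup(name: str) -> Optional[Model]:
    """Return the model, or None when the lookup fails for any reason."""
    try:
        return TOY_MODELS[name]
    except Exception:  # deliberately broad: discovery must not break the caller
        return None


async def collect_reachable(
    *, start: str, lookup: Callable[[str], Awaitable[Optional[Model]]],
) -> frozenset[str]:
    """Breadth-first walk over join targets, unioning custom aggregation names."""
    names: set[str] = set()
    visited: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        model = await lookup(current)
        if model is None:
            continue  # unresolvable join target: skip it and keep walking
        aggs, joins = model
        names.update(aggs)
        queue.extend(j for j in joins if j not in visited)
    return frozenset(names)


result = asyncio.run(collect_reachable(start="orders", lookup=best_effort_lookup))
```

Swallowing lookup errors inside the closure keeps the walk itself simple: a missing or ambiguous target just contributes no names, so the caller still gets every aggregation that could be resolved.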
The 4-hop save and 4-hop query tests each defined their own local _chain_model helper and inlined the same a -> b -> c -> d -> e save sequence. Lift both into module-level helpers so the tests don't duplicate the chain setup — fixes the SonarCloud duplication block the gate flagged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two new Sonar OPEN issues on PR 156 were attributed to my DEV-1500 edits:
- normalize_model (normalization.py:571): cognitive complexity 48 > 15. Split into three pure helpers: _normalize_model_measures (FUNC_STYLE on ModelMeasure.formula), _normalize_column_dot_paths (DOT_PATH on Column sql/filter), _normalize_model_filter_dot_paths (DOT_PATH on SlayerModel.filters). normalize_model now just sequences them, so the outer function's complexity drops sharply.
- save_model (query_engine.py:1582): cognitive complexity 29 > 15. The nested _resolve_join_target_for_save closure plus the gating try/except inflated the score. Lift the whole thing into a private async method _reachable_aggs_for_save(model); save_model is back to a one-liner await.

Pure refactor, no behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🧹 Nitpick comments (3)
slayer/engine/source_bundle.py (1)
93-109: ⚡ Quick win
Use `deque` for traversal to avoid O(n) queue pops.
`queue.pop(0)` is linear per iteration. Switch to `collections.deque` + `popleft()` to keep traversal cost stable on larger join graphs.
♻️ Proposed change

    +from collections import deque
    @@
    -        queue: list[SlayerModel] = [start]
    +        queue = deque([start])
             while queue:
    -            current = queue.pop(0)
    +            current = queue.popleft()
    @@
    -                queue.append(nxt)
    +                queue.append(nxt)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/source_bundle.py` around lines 93 - 109, The BFS loop currently uses a list named `queue` and calls `queue.pop(0)`, which is O(n) per pop; replace the list with a collections.deque to get O(1) popleft() operations: import deque from collections, initialize `queue` as deque([start]) (or typed as deque[SlayerModel] if using typing), and replace `queue.pop(0)` with `queue.popleft()` while keeping the rest of the logic (variables `names`, `visited`, `current`, and calls to `self.get_referenced_model`) unchanged.
slayer/engine/query_engine.py (1)
1582-1584: ⚡ Quick win
Align save-path helper usage with keyword-argument style rule.
`_reachable_aggs_for_save` is invoked positionally in `save_model`; this violates the repo's keyword-argument convention for multi-parameter Python functions.
♻️ Proposed change

    -    async def _reachable_aggs_for_save(
    -        self, model: SlayerModel,
    -    ) -> Optional[frozenset[str]]:
    +    async def _reachable_aggs_for_save(
    +        self, *, model: SlayerModel,
    +    ) -> Optional[frozenset[str]]:
    @@
    -        custom_aggs = await self._reachable_aggs_for_save(model)
    +        custom_aggs = await self._reachable_aggs_for_save(model=model)

As per coding guidelines: "`**/*.py`: Use keyword arguments for functions with more than 1 parameter".
Also applies to: 1639-1640
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/query_engine.py` around lines 1582 - 1584, The call to _reachable_aggs_for_save is being made positionally from save_model which violates the project's keyword-argument rule for multi-parameter functions; update save_model to call _reachable_aggs_for_save using explicit keyword syntax (e.g., model=...) and similarly change the other positional invocation around the second occurrence (the call referenced near lines 1639-1640) to use keyword arguments so all multi-parameter calls follow the repo convention.
slayer/engine/schema_drift.py (1)
1312-1338: 💤 Low value
Consider using `collections.deque` for the BFS queue.
Line 1326 uses `queue.pop(0)` which is O(n) for lists. While join graphs are typically small (< 10 models), using `collections.deque` with `popleft()` would be O(1) and follow standard BFS best practices.
♻️ Proposed refactor

    +from collections import deque
    +
     def _reachable_agg_names_from_state(
         *,
         start: SlayerModel,
         state: "_CascadeState",
     ) -> Set[str]:
         """..."""
         names: Set[str] = set()
         visited: Set[str] = set()
    -    queue: List[SlayerModel] = [start]
    +    queue: deque[SlayerModel] = deque([start])
         while queue:
    -        current = queue.pop(0)
    +        current = queue.popleft()
             if current.name in visited:
                 continue
             visited.add(current.name)
             if current.aggregations:
                 names.update(a.name for a in current.aggregations)
             for join in current.joins:
                 if join.target_model in visited:
                     continue
                 nxt = state.models_by_name.get(join.target_model)
                 if nxt is not None:
                     queue.append(nxt)
         return names

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/schema_drift.py` around lines 1312 - 1338, The BFS in _reachable_agg_names_from_state uses a list as queue and calls queue.pop(0) which is O(n); change the queue to a collections.deque and use deque.popleft() for O(1) pops. Update the queue initialization (replace List[SlayerModel] = [start] with a deque containing start), swap queue.pop(0) to queue.popleft(), and adjust the type hint to collections.deque or typing.Deque[SlayerModel] and add the necessary import from collections.
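To illustrate why the `deque` swap suggested in these comments is behavior-preserving, here is a small sketch that runs the same BFS with both queue types on a toy join graph. `JOINS`, `bfs_order_list`, and `bfs_order_deque` are hypothetical names used only for this illustration.

```python
from collections import deque

# Hypothetical join graph: model name -> joined model names (contains a cycle).
JOINS = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}


def bfs_order_list(start: str) -> list[str]:
    """BFS with list.pop(0): every pop shifts the remaining items, O(n)."""
    order: list[str] = []
    visited: set[str] = set()
    queue = [start]
    while queue:
        current = queue.pop(0)
        if current in visited:
            continue
        visited.add(current)
        order.append(current)
        queue.extend(JOINS.get(current, []))
    return order


def bfs_order_deque(start: str) -> list[str]:
    """The same traversal with deque.popleft(): O(1) per pop."""
    order: list[str] = []
    visited: set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        order.append(current)
        queue.extend(JOINS.get(current, []))
    return order


list_order = bfs_order_list("a")
deque_order = bfs_order_deque("a")
```

Both functions remove items from the front and append to the back, so the visit order is identical; only the cost of each front removal differs.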
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f324e4f3-6831-4a49-a132-8f7a63bbaf81
📒 Files selected for processing (10)
- slayer/engine/normalization.py
- slayer/engine/query_engine.py
- slayer/engine/schema_drift.py
- slayer/engine/source_bundle.py
- slayer/memories/resolver.py
- tests/test_entity_resolution.py
- tests/test_slack_normalization.py
- tests/test_source_bundle.py
- tests/test_sql_generator.py
- tests/test_validate_models.py
💤 Files with no reviewable changes (1)
- tests/test_sql_generator.py
93bc7cd into egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy



Summary
- `FUNC_STYLE_AGG` normalizer now recognizes custom aggregations defined on joined models: `rolling_avg(customers.score)` is rewritten to `customers.score:rolling_avg` instead of reaching the Mode-B parser unrewritten and raising `UnknownFunctionError`.
- `_normalize_stage` uses a new sync `ResolvedSourceBundle.reachable_aggregation_names(*, start)` (BFS over the pre-resolved bundle; scoped per stage).
- `save_model` does a best-effort async storage BFS via the existing `_collect_reachable_agg_names` (swallows `AmbiguousModelError` and any other lookup error) and passes the reachable set into `normalize_model`, which gains a `custom_agg_names: Optional[frozenset[str]] = None` param mirroring `normalize_query` (None → fallback to model's own aggs; explicit `frozenset()` → suppress fallback).
- Tests `test_funcstyle_custom_agg_on_joined_model` and `test_funcstyle_custom_agg_at_four_hops` have their `xfail(strict=True)` removed and now pass.

Linear: DEV-1500.
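The func-style to colon rewrite named in the summary can be approximated with a short regex sketch. This is an assumption-laden illustration only: `func_style_to_colon_sketch` and `_CALL` are hypothetical, and the pattern is far narrower than whatever the real `func_style_agg_to_colon` accepts.

```python
import re

# Matches a simple single-argument call such as rolling_avg(customers.score).
# Hypothetical pattern for illustration; nested calls are not handled.
_CALL = re.compile(r"\b([A-Za-z_]\w*)\(\s*([A-Za-z_][\w.]*)\s*\)")


def func_style_to_colon_sketch(*, formula: str, agg_names: frozenset[str]) -> str:
    """Rewrite name(arg) to arg:name, but only for known custom aggregations."""

    def _rewrite(match: re.Match) -> str:
        func, arg = match.group(1), match.group(2)
        if func not in agg_names:
            # Unknown name: leave the call untouched for the downstream parser.
            return match.group(0)
        return f"{arg}:{func}"

    return _CALL.sub(_rewrite, formula)


source_only = func_style_to_colon_sketch(
    formula="rolling_avg(customers.score)", agg_names=frozenset({"weighted_sum"})
)
with_joined = func_style_to_colon_sketch(
    formula="rolling_avg(customers.score)",
    agg_names=frozenset({"weighted_sum", "rolling_avg"}),
)
```

With only the source model's names the formula passes through unchanged, which corresponds to the failure described above; once the joined model's `rolling_avg` is in the set, the rewrite fires.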
Test plan
- `tests/test_source_bundle.py::TestReachableAggregationNames` (9 unit tests: own-aggs, joined-aggs, 4-hop, cycle, none-when-empty, scoping guard, absent target, absent+no-aggs, multi-hop union).
- `tests/test_slack_normalization.py::TestNormalizeModelCustomAggParam` (3 pure tests: param recognises joined agg, None fallback, empty-frozenset suppression).
- `tests/test_slack_normalization.py::TestJoinedCustomAggFuncStyle` (9 engine tests: query-path warning form, save-path rewrite + persistence, multi-stage named-stage positive, missing-target + own discovery still works, AmbiguousModelError spy + swallow, 4-hop save rewrite, filter-path warning, 4-hop query warning, negative multi-stage scoping).
- `poetry run pytest -m "not integration"`: 3925 passed, 6 skipped, 40 xfailed (pre-existing), no regressions.
- `poetry run ruff check slayer/ tests/`: clean.

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes