Skip to content

DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule#156

Merged
ZmeiGorynych merged 4 commits into
egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacyfrom
egor/dev-1500-slack-func_style_agg-normalization-doesnt-recognize-custom
May 31, 2026
Merged

DEV-1500: recognize joined-model custom aggregations in FUNC_STYLE_AGG slack rule#156
ZmeiGorynych merged 4 commits into
egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacyfrom
egor/dev-1500-slack-func_style_agg-normalization-doesnt-recognize-custom

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 30, 2026

Summary

  • The slack FUNC_STYLE_AGG normalizer now recognizes custom aggregations defined on joined models — rolling_avg(customers.score) is rewritten to customers.score:rolling_avg instead of reaching the Mode-B parser unrewritten and raising UnknownFunctionError.
  • Query path: _normalize_stage uses a new sync ResolvedSourceBundle.reachable_aggregation_names(*, start) (BFS over the pre-resolved bundle; scoped per stage).
  • Model-save path: save_model does a best-effort async storage BFS via the existing _collect_reachable_agg_names (swallows AmbiguousModelError and any other lookup error) and passes the reachable set into normalize_model, which gains a custom_agg_names: Optional[frozenset[str]] = None param mirroring normalize_query (None → fallback to model's own aggs; explicit frozenset() → suppress fallback).
  • The two DEV-1484 Stage-C tracking tests (test_funcstyle_custom_agg_on_joined_model and test_funcstyle_custom_agg_at_four_hops) have their xfail(strict=True) removed and now pass.

Linear: DEV-1500.

Test plan

  • tests/test_source_bundle.py::TestReachableAggregationNames (9 unit tests: own-aggs, joined-aggs, 4-hop, cycle, none-when-empty, scoping guard, absent target, absent+no-aggs, multi-hop union).
  • tests/test_slack_normalization.py::TestNormalizeModelCustomAggParam (3 pure tests: param recognises joined agg, None fallback, empty-frozenset suppression).
  • tests/test_slack_normalization.py::TestJoinedCustomAggFuncStyle (9 engine tests: query-path warning form, save-path rewrite + persistence, multi-stage named-stage positive, missing-target + own discovery still works, AmbiguousModelError spy + swallow, 4-hop save rewrite, filter-path warning, 4-hop query warning, negative multi-stage scoping).
  • poetry run pytest -m "not integration" — 3925 passed, 6 skipped, 40 xfailed (pre-existing), no regressions.
  • poetry run ruff check slayer/ tests/ — clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Custom aggregations defined on joined models are now discoverable and usable in measure formulas
    • Function-style aggregations automatically recognize and rewrite custom aggregations accessible through join paths
  • Bug Fixes

    • Improved cascade detection for schema changes affecting measures that reference aggregations from joined models

…G slack rule

The slack FUNC_STYLE_AGG normalizer previously sourced custom aggregation
names from the source model only, so `rolling_avg(customers.score)` reached
the Mode-B parser unrewritten and raised UnknownFunctionError. Both the
query path (`_normalize_stage`) and the model-save path (`normalize_model`
via `save_model`) now collect aggregations reachable through the join graph:
the query path uses a sync `ResolvedSourceBundle.reachable_aggregation_names`
walk over the pre-resolved bundle (scoped per stage), and the save path
reuses `_collect_reachable_agg_names` with a best-effort storage closure
that swallows AmbiguousModelError and other lookup errors. `normalize_model`
gains a `custom_agg_names` param mirroring `normalize_query`, with
None=>fallback-to-own-aggs and frozenset()=>suppress-fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented May 30, 2026

DEV-1500

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 30, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Extends custom aggregation support across joined models by implementing reachability discovery through join graphs. Normalization now accepts custom aggregation names as context, enabling query-time and save-time discovery pipelines to recognize func-style aggregations defined on reachable models.

Changes

Cross-model custom aggregation reachability (DEV-1500)

Layer / File(s) Summary
Reachability API foundation
slayer/engine/source_bundle.py, tests/test_source_bundle.py
ResolvedSourceBundle.reachable_aggregation_names performs BFS through join graphs to collect reachable aggregation names; tests validate multi-hop traversal, cycle termination, scoping to reachable subgraph, and handling of missing join targets.
Normalization signature and refactoring
slayer/engine/normalization.py, tests/test_slack_normalization.py
normalize_model now accepts custom_agg_names keyword-only parameter; internal helpers split Mode-A/Mode-B logic for measures, columns, and filters; tests cover parameter semantics including joined-model aggregations and fallback behavior.
Query-time entity resolution
slayer/memories/resolver.py, tests/test_entity_resolution.py
extract_entities_from_query discovers reachable aggregations via best-effort join-graph walk to recognize func-style references to joined-model custom aggregations during formula parsing and entity tagging.
Query engine integration and save-time persistence
slayer/engine/query_engine.py, tests/test_slack_normalization.py
_normalize_stage uses bundle reachability for query-time custom aggregations; new _reachable_aggs_for_save helper performs storage-backed discovery for save-time persistence, with extensive tests covering scoping, missing joins, ambiguous resolution swallowing, and unbounded 4-hop discovery.
Cascade validation for measure dependencies
slayer/engine/schema_drift.py, tests/test_validate_models.py
_cascade_measures uses reachable aggregation names to recognize func-style measure references to joined-model aggregations when computing datasource drops, with test validating cascade of dropped joined columns.
SQL generation test enablement
tests/test_sql_generator.py
Enable two previously xfailed tests: func-style custom aggregation on joined model and 4-hop reachable discovery, now treated as required passing test cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through joined-up paths,
Discovery spreads through aggregation graphs,
Custom names now bloom where joins align,
Four hops reachable, semantics shine!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: enabling FUNC_STYLE_AGG slack rule to recognize custom aggregations defined on joined models, which is the core objective of DEV-1500.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1500-slack-func_style_agg-normalization-doesnt-recognize-custom

Comment @coderabbitai help to get the list of available commands and usage tips.

@ZmeiGorynych ZmeiGorynych changed the base branch from main to egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy May 31, 2026 07:55
@ZmeiGorynych ZmeiGorynych force-pushed the egor/dev-1500-slack-func_style_agg-normalization-doesnt-recognize-custom branch from 6ca070d to beb6924 Compare May 31, 2026 08:25
ZmeiGorynych and others added 3 commits May 31, 2026 10:36
…TYLE_AGG callers

The query/save normalization paths walked the join graph for custom
aggregation names in beb6924, but the two quiet callers of
``func_style_agg_to_colon`` still passed source-model-only aggs:

- ``slayer.memories.resolver.extract_entities_from_query`` — a saved
  memory whose query references a joined-model custom agg in funcstyle
  (e.g. ``rolling_avg(customers.score)``) lost the ``customers.score``
  entity tag because the rewrite did not fire.
- ``slayer.engine.schema_drift._cascade_measures`` — a ModelMeasure on
  ``orders`` using the same funcstyle did not cascade-drop when the
  joined column was removed from ``customers``.

Both paths now collect custom aggregation names reachable through the
join graph: the resolver reuses the async ``_collect_reachable_agg_names``
with a best-effort storage closure (matching ``save_model``), and the
cascade does a sync BFS over ``_CascadeState.models_by_name``.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 4-hop save and 4-hop query tests each defined their own local
_chain_model helper and inlined the same a -> b -> c -> d -> e save
sequence. Lift both into module-level helpers so the tests don't
duplicate the chain setup — fixes the SonarCloud duplication block
the gate flagged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two new Sonar OPEN issues on PR 156 were attributed to my DEV-1500 edits:

- normalize_model (normalization.py:571) — cognitive complexity 48 > 15.
  Split into three pure helpers — _normalize_model_measures (FUNC_STYLE on
  ModelMeasure.formula), _normalize_column_dot_paths (DOT_PATH on Column
  sql/filter), _normalize_model_filter_dot_paths (DOT_PATH on
  SlayerModel.filters). normalize_model now just sequences them, so the
  outer function's complexity drops sharply.

- save_model (query_engine.py:1582) — cognitive complexity 29 > 15. The
  nested _resolve_join_target_for_save closure plus the gating try/except
  inflated the score. Lift the whole thing into a private async method
  _reachable_aggs_for_save(model); save_model is back to a one-liner await.

Pure refactor, no behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
slayer/engine/source_bundle.py (1)

93-109: ⚡ Quick win

Use deque for traversal to avoid O(n) queue pops.

queue.pop(0) is linear per iteration. Switch to collections.deque + popleft() to keep traversal cost stable on larger join graphs.

♻️ Proposed change
+from collections import deque
@@
-        queue: list[SlayerModel] = [start]
+        queue = deque([start])
         while queue:
-            current = queue.pop(0)
+            current = queue.popleft()
@@
-                    queue.append(nxt)
+                    queue.append(nxt)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/source_bundle.py` around lines 93 - 109, The BFS loop currently
uses a list named `queue` and calls `queue.pop(0)`, which is O(n) per pop;
replace the list with a collections.deque to get O(1) popleft() operations:
import deque from collections, initialize `queue` as deque([start]) (or typed as
deque[SlayerModel] if using typing), and replace `queue.pop(0)` with
`queue.popleft()` while keeping the rest of the logic (variables `names`,
`visited`, `current`, and calls to `self.get_referenced_model`) unchanged.
slayer/engine/query_engine.py (1)

1582-1584: ⚡ Quick win

Align save-path helper usage with keyword-argument style rule.

_reachable_aggs_for_save is invoked positionally in save_model; this violates the repo’s keyword-argument convention for multi-parameter Python functions.

♻️ Proposed change
-    async def _reachable_aggs_for_save(
-        self, model: SlayerModel,
-    ) -> Optional[frozenset[str]]:
+    async def _reachable_aggs_for_save(
+        self, *, model: SlayerModel,
+    ) -> Optional[frozenset[str]]:
@@
-        custom_aggs = await self._reachable_aggs_for_save(model)
+        custom_aggs = await self._reachable_aggs_for_save(model=model)

As per coding guidelines: "**/*.py: Use keyword arguments for functions with more than 1 parameter".

Also applies to: 1639-1640

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/query_engine.py` around lines 1582 - 1584, The call to
_reachable_aggs_for_save is being made positionally from save_model which
violates the project's keyword-argument rule for multi-parameter functions;
update save_model to call _reachable_aggs_for_save using explicit keyword syntax
(e.g., model=...) and similarly change the other positional invocation around
the second occurrence (the call referenced near lines 1639-1640) to use keyword
arguments so all multi-parameter calls follow the repo convention.
slayer/engine/schema_drift.py (1)

1312-1338: 💤 Low value

Consider using collections.deque for the BFS queue.

Line 1326 uses queue.pop(0) which is O(n) for lists. While join graphs are typically small (< 10 models), using collections.deque with popleft() would be O(1) and follow standard BFS best practices.

♻️ Proposed refactor
+from collections import deque
+
 def _reachable_agg_names_from_state(
     *, start: SlayerModel, state: "_CascadeState",
 ) -> Set[str]:
     """..."""
     names: Set[str] = set()
     visited: Set[str] = set()
-    queue: List[SlayerModel] = [start]
+    queue: deque[SlayerModel] = deque([start])
     while queue:
-        current = queue.pop(0)
+        current = queue.popleft()
         if current.name in visited:
             continue
         visited.add(current.name)
         if current.aggregations:
             names.update(a.name for a in current.aggregations)
         for join in current.joins:
             if join.target_model in visited:
                 continue
             nxt = state.models_by_name.get(join.target_model)
             if nxt is not None:
                 queue.append(nxt)
     return names
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/engine/schema_drift.py` around lines 1312 - 1338, The BFS in
_reachable_agg_names_from_state uses a list as queue and calls queue.pop(0)
which is O(n); change the queue to a collections.deque and use deque.popleft()
for O(1) pops. Update the queue initialization (replace List[SlayerModel] =
[start] with a deque containing start), swap queue.pop(0) to queue.popleft(),
and adjust the type hint to collections.deque or typing.Deque[SlayerModel] and
add the necessary import from collections.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@slayer/engine/query_engine.py`:
- Around line 1582-1584: The call to _reachable_aggs_for_save is being made
positionally from save_model which violates the project's keyword-argument rule
for multi-parameter functions; update save_model to call
_reachable_aggs_for_save using explicit keyword syntax (e.g., model=...) and
similarly change the other positional invocation around the second occurrence
(the call referenced near lines 1639-1640) to use keyword arguments so all
multi-parameter calls follow the repo convention.

In `@slayer/engine/schema_drift.py`:
- Around line 1312-1338: The BFS in _reachable_agg_names_from_state uses a list
as queue and calls queue.pop(0) which is O(n); change the queue to a
collections.deque and use deque.popleft() for O(1) pops. Update the queue
initialization (replace List[SlayerModel] = [start] with a deque containing
start), swap queue.pop(0) to queue.popleft(), and adjust the type hint to
collections.deque or typing.Deque[SlayerModel] and add the necessary import from
collections.

In `@slayer/engine/source_bundle.py`:
- Around line 93-109: The BFS loop currently uses a list named `queue` and calls
`queue.pop(0)`, which is O(n) per pop; replace the list with a collections.deque
to get O(1) popleft() operations: import deque from collections, initialize
`queue` as deque([start]) (or typed as deque[SlayerModel] if using typing), and
replace `queue.pop(0)` with `queue.popleft()` while keeping the rest of the
logic (variables `names`, `visited`, `current`, and calls to
`self.get_referenced_model`) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f324e4f3-6831-4a49-a132-8f7a63bbaf81

📥 Commits

Reviewing files that changed from the base of the PR and between 74ed95a and c8fbfc5.

📒 Files selected for processing (10)
  • slayer/engine/normalization.py
  • slayer/engine/query_engine.py
  • slayer/engine/schema_drift.py
  • slayer/engine/source_bundle.py
  • slayer/memories/resolver.py
  • tests/test_entity_resolution.py
  • tests/test_slack_normalization.py
  • tests/test_source_bundle.py
  • tests/test_sql_generator.py
  • tests/test_validate_models.py
💤 Files with no reviewable changes (1)
  • tests/test_sql_generator.py

@ZmeiGorynych ZmeiGorynych merged commit 93bc7cd into egor/dev-1484-dev-1452-stage-c-migrate-pre-existing-tests-off-legacy May 31, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant