linq_fold group_by: min/max/first reducers + multi-reducer fused pass #2724
Merged
Generalizes plan_group_by along two axes:
A.3 — per-reducer dispatch. `is_bucket_reducer_call` extended to 8 reducer
shapes: bare {sum, min, max, first} and inner-select {sum, min, max, first}
(`<reducer>(select(<bind>._1, <lambda>))`). New `emit_reducer_branches`
helper produces per-reducer (missInit, hitUpdate) pairs:
- min/max: direct `<` / `>` on workhorse acc types; `_::less` fallback for
non-workhorse (matches the reference at linq.das:1224,1308).
- first: miss-init only (hitUpdate null). Subsequent same-key elements are
ignored, exploiting the first-key-wins guarantee from PR #2721.
- inner-select min/max: bind the projection result to a per-element temp so
the inner body evaluates exactly once per source element — matches the
reference's `select` laziness, and avoids re-running side effects.
- first_or_default is rejected at the arity check (2 args) and cascades.
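The miss-init / hit-update split described above can be modeled outside daslang. The following is a minimal Python sketch, not the code emitted by `emit_reducer_branches`: the names `reducer_branches` and `group_reduce` are hypothetical, and the choice to evaluate the projection once per element for every reducer kind is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

MissInit = Callable[[Any], Any]
HitUpdate = Optional[Callable[[Any, Any], Any]]  # None means "no hit update"


def reducer_branches(kind: str) -> Tuple[MissInit, HitUpdate]:
    """Hypothetical model of a per-reducer (missInit, hitUpdate) pair."""
    if kind == "sum":
        return (lambda v: v, lambda acc, v: acc + v)
    if kind == "min":
        # Direct `<` compare, as for workhorse accumulator types.
        return (lambda v: v, lambda acc, v: v if v < acc else acc)
    if kind == "max":
        return (lambda v: v, lambda acc, v: v if v > acc else acc)
    if kind == "first":
        # Miss-init only: later elements with the same key are ignored.
        return (lambda v: v, None)
    raise ValueError(f"unrecognized reducer: {kind}")


def group_reduce(rows: Iterable[Any], key_fn, kind: str, project=None) -> Dict[Any, Any]:
    miss_init, hit_update = reducer_branches(kind)
    table: Dict[Any, Any] = {}
    for row in rows:
        key = key_fn(row)
        # Bind the projection to a per-element temp so it runs exactly once.
        value = project(row) if project is not None else row
        if key not in table:
            table[key] = miss_init(value)
        elif hit_update is not None:
            table[key] = hit_update(table[key], value)
    return table


rows = [("a", 3), ("b", 5), ("a", 1), ("b", 9), ("a", 7)]
calls = []


def counted_projection(row):
    calls.append(row)  # side effect used to count evaluations
    return row[1]


min_by_key = group_reduce(rows, lambda r: r[0], "min", counted_projection)
first_by_key = group_reduce(rows, lambda r: r[0], "first", lambda r: r[1])
```

In this model `min_by_key` is `{"a": 1, "b": 5}`, `first_by_key` is `{"a": 3, "b": 5}`, and `counted_projection` runs five times for five rows, which is the once-per-element property the inner-select bullet describes.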
A.4 — N+1-slot named tuples. The 2-slot recognizer (key + 1 reducer) is
replaced by `recognize_reducer_specs` returning `array<ReducerSpec>`. The
named-tuple form now accepts arbitrarily many reducer slots (key at _0,
reducers at _1.._N). The planner walks the spec array once and concatenates
per-slot missInit + hitUpdate statements into the per-element loop —
N reducers fused into one pass instead of N separate passes.
Field-access into dynamic slots (`entry._{slot}`) is built programmatically
via `mk_slot_ref` since qmacro has no dynamic-field-name splice. The
loop-body emission is now conditional on whether any reducer contributes a
hit update — first-only chains drop the else branch entirely.
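As a rough illustration of the fused pass, the Python sketch below keeps one table entry per key with the key at slot 0 and reducers at slots 1..N, and applies every slot's update inside a single loop. It is a model under stated assumptions, not the planner's output: `fused_group_by` and the `(miss_init, hit_update)` spec shape are hypothetical stand-ins for `ReducerSpec`, and the list-indexing by slot number only mimics what `mk_slot_ref` builds.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical stand-in for ReducerSpec: (miss_init, hit_update or None).
Spec = Tuple[Callable[[Any], Any], Optional[Callable[[Any, Any], Any]]]


def fused_group_by(rows, key_fn, specs: List[Spec]) -> List[tuple]:
    # Decide once whether any reducer contributes a hit update.
    any_hit = any(hit is not None for _, hit in specs)
    table = {}
    for row in rows:
        key = key_fn(row)
        entry = table.get(key)
        if entry is None:
            # Miss: key goes to slot 0, each reducer initializes its own slot.
            table[key] = [key] + [miss(row) for miss, _ in specs]
        elif any_hit:
            # Hit: update slots 1..N in the same pass; first-only chains
            # never reach this branch because any_hit is False.
            for slot, (_, hit) in enumerate(specs, start=1):
                if hit is not None:
                    entry[slot] = hit(entry[slot], row)
    return [tuple(entry) for entry in table.values()]


count_spec: Spec = (lambda r: 1, lambda acc, r: acc + 1)
sum_spec: Spec = (lambda r: r[1], lambda acc, r: acc + r[1])
max_spec: Spec = (lambda r: r[1], lambda acc, r: r[1] if r[1] > acc else acc)
first_spec: Spec = (lambda r: r[1], None)

rows = [("a", 3), ("b", 5), ("a", 1), ("b", 9), ("a", 7)]
fused = fused_group_by(rows, lambda r: r[0], [count_spec, sum_spec, max_spec])
first_only = fused_group_by(rows, lambda r: r[0], [first_spec])
```

Here `fused` holds `(key, count, sum, max)` per group after one traversal of `rows`, where an unfused plan would traverse the source three times.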
Bail paths (all cascade to tier 2):
- key not at slot 0 of the named tuple
- unrecognized reducer in any slot (e.g. `average`)
- first_or_default (2-arg, fails arity)
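The bail paths can be read as a recognizer that returns nothing when any slot fails a check, letting the caller fall through to the next tier. The Python sketch below is a simplified model: `recognize_specs`, `choose_tier`, and the `(name, argc)` slot encoding are hypothetical, and the known-reducer set contains only the four names listed in this description (the real recognizer may accept more).

```python
from typing import List, Optional, Tuple

# Assumption: only the four reducer names named in this description.
KNOWN_REDUCERS = {"sum", "min", "max", "first"}

# Hypothetical slot encoding: (call name, argument count); ("key", 0) is the key.
Slot = Tuple[str, int]


def recognize_specs(slots: List[Slot]) -> Optional[List[str]]:
    """Return reducer names for slots 1..N, or None to signal a bail."""
    if not slots or slots[0][0] != "key":
        return None  # key not at slot 0
    specs = []
    for name, argc in slots[1:]:
        if name not in KNOWN_REDUCERS:
            return None  # unrecognized reducer, e.g. average
        if argc != 1:
            return None  # wrong arity
        specs.append(name)
    return specs


def choose_tier(slots: List[Slot]) -> str:
    # A bail cascades to tier 2 instead of raising an error.
    return "tier1" if recognize_specs(slots) is not None else "tier2"


print(choose_tier([("key", 0), ("min", 1), ("first", 1)]))
print(choose_tier([("key", 0), ("average", 1)]))
```

In this simplified model `first_or_default` with two arguments already fails the name check; the description above says the real recognizer rejects it at the arity check.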
Tests:
- 18 new parity tests in test_group_by_min_max_first_fold_parity and
test_group_by_multi_reducer_fold_parity (bare, named, inner-select,
multi-reducer, empty source, reference parity)
- 5 new AST-shape tests: G5a (min workhorse direct compare), G5b (min
non-workhorse `_::less`), G6 (first no hit compare), G7 (multi-reducer
fused pass), G10 (first_or_default cascade)
Both PR-A1 baselines (groupby_count, groupby_sum) hold parity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t + multi-reducer

4 new 100K-row benchmarks (m1 SQL / m3 plain LINQ / m3f splice):
- groupby_min: 175 / 111 / 42 ns/op (2.6× over m3, 4.2× over SQL)
- groupby_max: 173 / 108 / 43 ns/op (2.5× over m3, 4.0× over SQL)
- groupby_first: n/a / 71 / 36 ns/op (2.0× over m3; no direct SQL aggregator)
- groupby_multi_reducer: 189 / 139 / 53 ns/op (3 reducers fused into 1 pass; 2.6× over m3, 3.6× over SQL)

All 6 splice variants land within ~36–53 ns/op: per-element work is bounded by the single hash op + slot mutations regardless of which reducer or how many. Multi-reducer pays ~5 ns per extra slot and still beats SQL by 3.6×.

LINQ.md: refreshed Phase status table, baseline rows, Phase 3+ subsection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
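The speedup factors quoted with the benchmark timings follow directly from the ns/op figures. A short Python check, using only the numbers reported in this commit message (the dictionary layout and the `speedups` helper are illustrative, not part of the benchmark harness):

```python
# (m1 SQL, m3 plain LINQ, m3f splice) in ns/op, copied from the commit message.
timings = {
    "groupby_min": (175, 111, 42),
    "groupby_max": (173, 108, 43),
    "groupby_first": (None, 71, 36),  # no direct SQL aggregator
    "groupby_multi_reducer": (189, 139, 53),
}


def speedups(sql, linq, splice):
    """Return (splice speedup over m3, splice speedup over SQL or None)."""
    over_m3 = round(linq / splice, 1)
    over_sql = round(sql / splice, 1) if sql is not None else None
    return over_m3, over_sql


ratios = {name: speedups(*t) for name, t in timings.items()}
for name, (over_m3, over_sql) in ratios.items():
    print(name, over_m3, over_sql)
```

Each computed pair rounds to the factor stated in the message, for example 111 / 42 ≈ 2.6 and 175 / 42 ≈ 4.2 for `groupby_min`.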
Pull request overview
This PR expands the _fold tier-1 splice for group_by[_lazy] |> select(...) in daslib/linq_fold.das to cover additional reducer shapes (min, max, first, and their inner-select forms) and to fuse multiple reducers from N-slot named-tuples into a single per-element table-update pass, closing remaining BufferGroupBy gaps from the earlier follow-ups.
Changes:
- Extend reducer recognition to 8 shapes: bare `{sum,min,max,first}` + inner-select `{sum,min,max,first}`, and add per-reducer miss-init / hit-update emission.
- Generalize named-tuple group projection handling from 2 slots to N+1 slots (key at `_0`, reducers at `_1.._N`) by planning via an `array<ReducerSpec>` and concatenating per-slot update statements into one fused loop.
- Add functional parity tests, AST-shape tests, new benchmarks (`groupby_min/max/first/multi_reducer`), and update `benchmarks/sql/LINQ.md` coverage notes.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `daslib/linq_fold.das` | Implements reducer-spec planning, new reducer recognizers, and fused multi-reducer group-by splice emission. |
| `tests/linq/test_linq_fold.das` | Adds runtime parity tests for min/max/first (bare/named/inner-select) and multi-reducer named tuples. |
| `tests/linq/test_linq_fold_ast.das` | Adds AST fingerprint tests ensuring splice selection and expected codegen characteristics (direct compare vs `_::less`, no runtime reducers, fused pass). |
| `benchmarks/sql/LINQ.md` | Updates phase coverage table and benchmark results/notes to include PR-A2 additions. |
| `benchmarks/sql/groupby_min.das` | Adds benchmark for inner-select-min group-by splice performance vs baselines. |
| `benchmarks/sql/groupby_max.das` | Adds benchmark for inner-select-max group-by splice performance vs baselines. |
| `benchmarks/sql/groupby_first.das` | Adds benchmark for bare-first group-by splice performance (no SQL baseline). |
| `benchmarks/sql/groupby_multi_reducer.das` | Adds benchmark demonstrating multi-reducer fused pass (count+sum+max) performance. |
Summary
Closes the remaining `BufferGroupBy` gaps from PR #2723's deferred-follow-up list. Builds on PR-A1's miss/hit emission split.

A.3: per-reducer dispatch. Recognizer extended to 8 reducer shapes: bare `{sum, min, max, first}` + inner-select `{sum, min, max, first}` (`<reducer>(select(<bind>._1, <lambda>))`). New `emit_reducer_branches` helper produces per-reducer (missInit, hitUpdate) pairs.
- min/max: direct `<` / `>` for workhorse acc types; `_::less` fallback for non-workhorse (matches the reference at linq.das:1224,1308).
- first: miss-init only, no hit update. Subsequent same-key elements are ignored, exploiting PR #2721's first-key-wins guarantee.
- inner-select min/max: bind the projection result to a per-element temp so the inner body evaluates exactly once per source element; this matches `select` laziness and avoids re-running side effects.
- `first_or_default(def)` is rejected by the 2-arg arity check and cascades.

A.4: N+1-slot named tuples. The 2-slot recognizer (key + 1 reducer) is replaced by `recognize_reducer_specs` returning `array<ReducerSpec>`. The named-tuple form now accepts arbitrarily many reducer slots (key at `_0`, reducers at `_1.._N`). The planner walks the spec array once and concatenates per-slot `missInit` + `hitUpdate` statements into the per-element loop, so N reducers are fused into one pass.

Field access into dynamic slots (`entry._{slot}`) is built programmatically via `mk_slot_ref` since qmacro has no dynamic-field-name splice. The per-element loop drops the `else` branch entirely for first-only chains (no hit update needed).

Bail paths (cascade to tier 2): key not at slot 0; unrecognized reducer in any slot (e.g. `average`); `first_or_default` (fails arity).

Headline (100K rows, INTERP)
All 6 splice variants land in ~36–53 ns/op: per-element work is bounded by the single hash op + slot mutations regardless of which reducer or how many. Multi-reducer pays ~5 ns per extra slot.

Test plan
- `tests/linq/test_linq_fold.das`: 257 tests pass (18 new in `test_group_by_min_max_first_fold_parity` + `test_group_by_multi_reducer_fold_parity` covering bare/named/inner-select/multi/empty/Person-fixture-parity)
- `tests/linq/test_linq_fold_ast.das`: 102 tests pass (5 new: G5a min workhorse direct compare, G5b min non-workhorse `_::less`, G6 first no hit compare, G7 multi-reducer fused pass, G10 first_or_default cascade)
- `tests/linq/test_linq_group_by.das`: 18 tests pass (regression check)
- `tests/linq/test_linq_aggregation.das`: 46 tests pass
- `tests/linq/test_linq.das`: 21 tests pass
- `test_aot -use-aot`: 257 + 102 tests pass
- `mcp__daslang__lint`
- `mcp__daslang__format_file`

Deferred follow-ups
- `average` reducer (2-slot per-key acc + post-process division)
- `where_*` / `select*` fusion into `group_by` (PR-B)
- `reverse_take` backward index loop on array sources (PR-C)

🤖 Generated with Claude Code