linq_fold group_by: min/max/first reducers + multi-reducer fused pass by borisbat · Pull Request #2724 · GaijinEntertainment/daScript

borisbat · 2026-05-19T04:52:45Z

Summary

Closes the remaining BufferGroupBy gaps from PR #2723's deferred-follow-up list. Builds on PR-A1's miss/hit emission split.

A.3 — per-reducer dispatch. Recognizer extended to 8 reducer shapes: bare {sum, min, max, first} + inner-select {sum, min, max, first} (<reducer>(select(<bind>._1, <lambda>))). New emit_reducer_branches helper produces per-reducer (missInit, hitUpdate) pairs.

min / max: direct < / > for workhorse acc types; _::less fallback for non-workhorse (matches the reference at linq.das:1224,1308).
first: miss-init only, no hit update. Subsequent same-key elements are ignored, exploiting the first-key-wins guarantee from PR #2721 (linq_fold: buffer-required splice arms: reverse, distinct, group_by).
inner-select min / max: bind the projection result to a per-element temp so the inner body evaluates exactly once per source element — matches select laziness and avoids re-running side effects.
first_or_default(def) is rejected by the arity check (it takes 2 arguments) and cascades to tier 2.
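The per-reducer (missInit, hitUpdate) split described above can be modelled outside daslang. The following is a minimal Python sketch, not the code in `daslib/linq_fold.das`: the names `reducer_branches` and `group_reduce` are hypothetical, and the model evaluates the projection once per element for every reducer, which the PR states only for inner-select min / max.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Optional, Tuple

# Hypothetical model of one (missInit, hitUpdate) pair.
MissInit = Callable[[Any], Any]
HitUpdate = Optional[Callable[[Any, Any], Any]]


def reducer_branches(name: str) -> Tuple[MissInit, HitUpdate]:
    """Return (miss_init, hit_update); hit_update is None for 'first'."""
    if name == "sum":
        return (lambda v: v, lambda acc, v: acc + v)
    if name == "min":
        # Direct '<' compare, as for workhorse accumulator types.
        return (lambda v: v, lambda acc, v: v if v < acc else acc)
    if name == "max":
        # Direct '>' compare, as for workhorse accumulator types.
        return (lambda v: v, lambda acc, v: v if v > acc else acc)
    if name == "first":
        # Miss-init only: later elements with the same key are ignored.
        return (lambda v: v, None)
    raise ValueError(f"unrecognized reducer: {name}")


def group_reduce(
    rows: Iterable[Any],
    key: Callable[[Any], Hashable],
    project: Callable[[Any], Any],
    reducer: str,
) -> Dict[Hashable, Any]:
    miss_init, hit_update = reducer_branches(reducer)
    table: Dict[Hashable, Any] = {}
    for row in rows:
        k = key(row)
        v = project(row)  # bound to a temp: evaluated once per element
        if k not in table:
            table[k] = miss_init(v)          # miss branch
        elif hit_update is not None:
            table[k] = hit_update(table[k], v)  # hit branch
    return table
```

Binding `project(row)` to `v` before the branch is what keeps a side-effecting projection from running twice when the hit branch both compares and stores the value.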

A.4 — N+1-slot named tuples. The 2-slot recognizer (key + 1 reducer) is replaced by recognize_reducer_specs returning array<ReducerSpec>. Named-tuple form now accepts arbitrarily many reducer slots (key at _0, reducers at _1.._N). The planner walks the spec array once and concatenates per-slot missInit + hitUpdate statements into the per-element loop — N reducers fused into one pass.

Field access into dynamic slots (entry._{slot}) is built programmatically via mk_slot_ref since qmacro has no dynamic-field-name splice. The per-element loop drops the else branch entirely for first-only chains (no hit update needed).
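To illustrate the fused pass, here is a hedged Python sketch of one loop updating N reducer slots per key. `ReducerSpec` is named in the PR, but the fields used here (`slot`, `project`, `miss_init`, `hit_update`) and the function `fused_group_by` are assumptions made for the example; the real planner emits daslang statements rather than calling closures.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, NamedTuple, Optional


class ReducerSpec(NamedTuple):
    # Field layout is hypothetical; only the type name comes from the PR.
    slot: int                                   # 1..N (slot 0 holds the key)
    project: Callable[[Any], Any]               # value fed to the reducer
    miss_init: Callable[[Any], Any]             # first element of a key
    hit_update: Optional[Callable[[Any, Any], Any]]  # None for 'first'


def fused_group_by(
    rows: Iterable[Any],
    key: Callable[[Any], Hashable],
    specs: List[ReducerSpec],
) -> List[tuple]:
    table: Dict[Hashable, list] = {}
    # If no reducer has a hit update (first-only chains), the else
    # branch of the loop is dropped entirely.
    any_hit = any(s.hit_update is not None for s in specs)
    for row in rows:
        k = key(row)
        entry = table.get(k)          # the single hash lookup per element
        if entry is None:
            entry = [k] + [None] * len(specs)
            for s in specs:
                entry[s.slot] = s.miss_init(s.project(row))
            table[k] = entry
        elif any_hit:
            for s in specs:
                if s.hit_update is not None:
                    entry[s.slot] = s.hit_update(entry[s.slot], s.project(row))
    # dict preserves insertion order, so groups come out in first-key order.
    return [tuple(e) for e in table.values()]


count = ReducerSpec(1, lambda r: r, lambda _: 1, lambda acc, _: acc + 1)
total = ReducerSpec(2, lambda r: r[1], lambda v: v, lambda acc, v: acc + v)
peak = ReducerSpec(3, lambda r: r[1], lambda v: v, lambda acc, v: v if v > acc else acc)
```

Calling `fused_group_by(rows, lambda r: r[0], [count, total, peak])` mirrors the count+sum+max shape of the `groupby_multi_reducer` benchmark: one table lookup per row, then one slot mutation per reducer.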

Bail paths (cascade to tier 2): key not at slot 0; unrecognized reducer in any slot (e.g. average); first_or_default (fails arity).
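The bail paths can be read as a recognizer that returns nothing when any slot fails, letting the caller fall back to tier 2. The Python sketch below is an assumed model only: slots are represented as hypothetical `(name, arg_count)` pairs, `recognize_specs` is an invented name, and reducer shapes handled outside this PR (such as count) are left out.

```python
from typing import List, Optional, Tuple

# The four reducer names this PR lists; other shapes are not modelled.
RECOGNIZED = {"sum", "min", "max", "first"}


def recognize_specs(slots: List[Tuple[str, int]]) -> Optional[List[str]]:
    """Return reducer names for slots 1..N, or None to cascade to tier 2.

    Each slot is a hypothetical (callee_name, arg_count) pair; slot 0
    must be the group key, written here as ("key", 0).
    """
    if not slots or slots[0][0] != "key":
        return None  # bail: key not at slot 0
    specs: List[str] = []
    for name, argc in slots[1:]:
        if argc != 1:
            return None  # bail: arity check, e.g. first_or_default(def)
        if name not in RECOGNIZED:
            return None  # bail: unrecognized reducer, e.g. average
        specs.append(name)
    return specs
```

A single unrecognized slot rejects the whole projection in this model, which matches the "unrecognized reducer in any slot" wording above.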

Headline (100K rows, INTERP)

Benchmark	m1 SQL (ns/op)	m3 LINQ (ns/op)	m3f splice (ns/op)	Win
groupby_min	175	111	42	2.6× over m3 / 4.2× over SQL
groupby_max	173	108	43	2.5× over m3 / 4.0× over SQL
groupby_first	—	71	36	2.0× over m3 (no direct SQL aggregator)
groupby_multi_reducer	189	139	53	2.6× over m3 / 3.6× over SQL (3 reducers fused)
groupby_sum (regression check)	175	102	36	parity — refactor preserves PR-A1 splice
groupby_count (regression check)	141	71	36	parity — refactor preserves PR-A1 splice

All 6 splice variants land in ~36–53 ns/op — per-element work bounded by the single hash op + slot mutations regardless of which reducer or how many. Multi-reducer pays ~5 ns per extra slot.
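The "Win" column can be re-derived from the three timing columns. This short Python check uses only the ns/op figures from the table above; the helper name `speedups` is made up for the example.

```python
# ns/op figures copied from the headline table: (m1 SQL, m3 LINQ, m3f splice).
# None marks the missing SQL baseline for groupby_first.
results = {
    "groupby_min": (175, 111, 42),
    "groupby_max": (173, 108, 43),
    "groupby_first": (None, 71, 36),
    "groupby_multi_reducer": (189, 139, 53),
}


def speedups(sql, linq, splice):
    """Return (splice speedup over m3 LINQ, splice speedup over m1 SQL)."""
    over_linq = round(linq / splice, 1)
    over_sql = round(sql / splice, 1) if sql is not None else None
    return over_linq, over_sql


for name, row in results.items():
    print(name, speedups(*row))
```

Each ratio is the baseline time divided by the splice time, rounded to one decimal place.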

Test plan

tests/linq/test_linq_fold.das — 257 tests pass (18 new in test_group_by_min_max_first_fold_parity + test_group_by_multi_reducer_fold_parity covering bare/named/inner-select/multi/empty/Person-fixture-parity)
tests/linq/test_linq_fold_ast.das — 102 tests pass (5 new: G5a min workhorse direct compare, G5b min non-workhorse _::less, G6 first no hit compare, G7 multi-reducer fused pass, G10 first_or_default cascade)
tests/linq/test_linq_group_by.das — 18 tests pass (regression check)
tests/linq/test_linq_aggregation.das — 46 tests pass
tests/linq/test_linq.das — 21 tests pass
AOT mode (test_aot -use-aot) — 257 + 102 tests pass
Lint clean (mcp__daslang__lint)
Format clean (mcp__daslang__format_file)
All 4 new benchmarks plus PR-A1 regression checks run cleanly

Deferred follow-ups

average reducer (2-slot per-key acc + post-process division)
Upstream where_* / select* fusion into group_by (PR-B)
reverse_take backward index loop on array sources (PR-C)
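The deferred average reducer is described as a 2-slot per-key accumulator plus a post-process division. As a rough Python illustration of that idea only (the follow-up is not implemented in this PR, and `group_average` is a hypothetical name):

```python
from typing import Any, Callable, Dict, Hashable, Iterable


def group_average(
    rows: Iterable[Any],
    key: Callable[[Any], Hashable],
    value: Callable[[Any], float],
) -> Dict[Hashable, float]:
    acc: Dict[Hashable, list] = {}  # key -> [running sum, running count]
    for row in rows:
        k = key(row)
        v = value(row)
        slot = acc.get(k)
        if slot is None:
            acc[k] = [v, 1]   # miss: initialise both accumulator slots
        else:
            slot[0] += v      # hit: update the sum slot
            slot[1] += 1      # hit: update the count slot
    # Post-process step: one division per key after the single pass.
    return {k: s / n for k, (s, n) in acc.items()}
```

The division happens once per group rather than once per element, so the per-element loop keeps the same miss/hit shape as the other reducers.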

🤖 Generated with Claude Code

Generalizes plan_group_by along two axes:

A.3: per-reducer dispatch. `is_bucket_reducer_call` extended to 8 reducer shapes: bare {sum, min, max, first} and inner-select {sum, min, max, first} (`<reducer>(select(<bind>._1, <lambda>))`). New `emit_reducer_branches` helper produces per-reducer (missInit, hitUpdate) pairs:

- min/max: direct `<` / `>` on workhorse acc types; `_::less` fallback for non-workhorse (matches the reference at linq.das:1224,1308).
- first: miss-init only (hitUpdate null). Subsequent same-key elements are ignored, exploiting the first-key-wins guarantee from PR #2721.
- inner-select min/max: bind the projection result to a per-element temp so the inner body evaluates exactly once per source element; this matches the reference's `select` laziness and avoids re-running side effects.
- first_or_default is rejected at the arity check (2 args) and cascades.

A.4: N+1-slot named tuples. The 2-slot recognizer (key + 1 reducer) is replaced by `recognize_reducer_specs` returning `array<ReducerSpec>`. The named-tuple form now accepts arbitrarily many reducer slots (key at _0, reducers at _1.._N). The planner walks the spec array once and concatenates per-slot missInit + hitUpdate statements into the per-element loop, so N reducers are fused into one pass instead of N separate passes.

Field-access into dynamic slots (`entry._{slot}`) is built programmatically via `mk_slot_ref` since qmacro has no dynamic-field-name splice. The loop-body emission is now conditional on whether any reducer contributes a hit update; first-only chains drop the else branch entirely.

Bail paths (all cascade to tier 2):

- key not at slot 0 of the named tuple
- unrecognized reducer in any slot (e.g. `average`)
- first_or_default (2-arg, fails arity)

Tests:

- 18 new parity tests in test_group_by_min_max_first_fold_parity and test_group_by_multi_reducer_fold_parity (bare, named, inner-select, multi-reducer, empty source, reference parity)
- 5 new AST-shape tests: G5a (min workhorse direct compare), G5b (min non-workhorse `_::less`), G6 (first no hit compare), G7 (multi-reducer fused pass), G10 (first_or_default cascade)

Both PR-A1 baselines (groupby_count, groupby_sum) hold parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t + multi-reducer

4 new 100K-row benchmarks (m1 SQL / m3 plain LINQ / m3f splice):

- groupby_min: 175 / 111 / 42 ns/op (2.6× over m3, 4.2× over SQL)
- groupby_max: 173 / 108 / 43 ns/op (2.5× over m3, 4.0× over SQL)
- groupby_first: — / 71 / 36 ns/op (2.0× over m3; no direct SQL aggregator)
- groupby_multi_reducer: 189 / 139 / 53 ns/op (3 reducers fused into 1 pass; 2.6× over m3, 3.6× over SQL)

All 6 splice variants land within ~36–53 ns/op: per-element work is bounded by the single hash op + slot mutations regardless of which reducer or how many. Multi-reducer pays ~5 ns per extra slot and still beats SQL by 3.6×.

LINQ.md: refreshed Phase status table, baseline rows, Phase 3+ subsection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR expands the _fold tier-1 splice for group_by[_lazy] |> select(...) in daslib/linq_fold.das to cover additional reducer shapes (min, max, first, and their inner-select forms) and to fuse multiple reducers from N-slot named-tuples into a single per-element table-update pass, closing remaining BufferGroupBy gaps from the earlier follow-ups.

Changes:

Extend reducer recognition to 8 shapes: bare {sum,min,max,first} + inner-select {sum,min,max,first} and add per-reducer miss-init / hit-update emission.
Generalize named-tuple group projection handling from 2 slots to N+1 slots (key at _0, reducers at _1.._N) by planning via an array<ReducerSpec> and concatenating per-slot update statements into one fused loop.
Add functional parity tests, AST-shape tests, new benchmarks (groupby_min/max/first/multi_reducer), and update benchmarks/sql/LINQ.md coverage notes.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

File	Description
`daslib/linq_fold.das`	Implements reducer-spec planning, new reducer recognizers, and fused multi-reducer group-by splice emission.
`tests/linq/test_linq_fold.das`	Adds runtime parity tests for min/max/first (bare/named/inner-select) and multi-reducer named tuples.
`tests/linq/test_linq_fold_ast.das`	Adds AST fingerprint tests ensuring splice selection and expected codegen characteristics (direct compare vs `_::less`, no runtime reducers, fused pass).
`benchmarks/sql/LINQ.md`	Updates phase coverage table and benchmark results/notes to include PR-A2 additions.
`benchmarks/sql/groupby_min.das`	Adds benchmark for inner-select-min group-by splice performance vs baselines.
`benchmarks/sql/groupby_max.das`	Adds benchmark for inner-select-max group-by splice performance vs baselines.
`benchmarks/sql/groupby_first.das`	Adds benchmark for bare-first group-by splice performance (no SQL baseline).
`benchmarks/sql/groupby_multi_reducer.das`	Adds benchmark demonstrating multi-reducer fused pass (count+sum+max) performance.

borisbat and others added 2 commits May 18, 2026 21:52

Copilot AI review requested due to automatic review settings May 19, 2026 04:52

Copilot started reviewing on behalf of borisbat May 19, 2026 04:53

Copilot AI reviewed May 19, 2026

borisbat mentioned this pull request May 19, 2026

linq_fold group_by: fuse upstream where_/select* into per-element loop #2725

Merged

borisbat merged commit dd85a67 into master May 19, 2026
32 checks passed

borisbat merged 2 commits into master from bbatkin/linq-fold-min-max-first-multi-reducer