linq_fold: trivial-let elision + reverse_take skip-into-tail — closes the m4 ladder by borisbat · Pull Request #2834 · GaijinEntertainment/daScript

borisbat · 2026-05-23T12:02:29Z

Two slices that close the last m4-vs-m3f outliers from #2824's bench snapshot. Wraps the m4 perf push before docs / tutorials.

Slice 1 — trivial-let elision

Closes: sum_aggregate_m4 1.3ns systemic gap vs m3f.

When _select(_.userName) peels to a single decs_tup.<field> reference, rename the chain bind directly to the corresponding iter var instead of synthesizing decs_sel_N. wrap_decs_chain skips emitting let decs_sel_N = car_price entirely; the action's acc += <iterVar> references the iter var natively.

Required extending DecsTupUsageScanner with an iter-var → user-name reverse map so bare iter-var refs still seed the pruner (previously: empty usedNames fell through to unpruned-default, defeating the elision).
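The elision and the reverse-map fix can be illustrated with a toy emitter. This is only a sketch in Python, not the daslang macro code: `emit_sum_body`, `used_fields`, and the `iter_vars` mapping are hypothetical names, and the real DecsTupUsageScanner presumably walks an AST rather than splitting strings. It shows the two splice shapes (with and without the `decs_sel_N` binding) and why the scanner needs an iter-var → user-name map once the body references the iter var directly.

```python
from typing import Dict, List, Set


def emit_sum_body(selected_field: str, iter_vars: Dict[str, str], elide: bool) -> List[str]:
    """Emit loop-body lines for a `_select(_.field).sum()` style chain.

    iter_vars maps a user-visible field name to its iter var name
    (hypothetical naming, chosen for illustration only).
    """
    iter_var = iter_vars[selected_field]
    if elide:
        # Trivial projection: the chain bind is renamed to the iter var itself.
        return [f"acc += {iter_var}"]
    # Legacy shape: synthesize an intermediate binding, then use it.
    return [f"let decs_sel_0 = {iter_var}", "acc += decs_sel_0"]


def used_fields(body: List[str], iter_vars: Dict[str, str]) -> Set[str]:
    """Recover which user fields the body reads via an iter-var -> field reverse map."""
    reverse = {var: field for field, var in iter_vars.items()}
    used: Set[str] = set()
    for line in body:
        # Crude tokenizer: good enough for the two line shapes emitted above.
        for token in line.replace("+=", " ").replace("=", " ").split():
            if token in reverse:
                used.add(reverse[token])
    return used


iter_vars = {"price": "car_price", "name": "car_name"}
elided = emit_sum_body("price", iter_vars, elide=True)
legacy = emit_sum_body("price", iter_vars, elide=False)
```

In this toy model both shapes report `{"price"}` as used, so a pruner driven by `used_fields` would still drop `car_name` after the binding is elided.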

Slice 2 — reverse_take skip-into-tail

Closes: reverse_take_m4 48 → 9.2 ns/op (5.2× win, allocs 42B → 1B).

For from_decs_template(...).reverse().take(N).to_array() with no where/select, emit a two-pass invoke:

Pass 1 — sum arch.size across archetypes (no entity load, just archetype-header walk)
Pass 2 — for_each_archetype_find skips whole archetypes whose cumulative size still fits below skip = total - N; the partial archetype uses a per-iter skip-counter; once the takeN-sized buffer fills, return true stops iteration across remaining archetypes
reverse_inplace runs on the small N buffer at end, not the full source

where/select fall through to the legacy buffer + reverse_inplace + resize emit unchanged — a where filter invalidates the size-based skip (count after filter is unknown without iterating).
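The two passes above can be sketched in Python, modelling each archetype as a plain list. This is an assumption-laden illustration rather than the emitted daslang: `reverse_take` and its variable names are hypothetical, and the `break` stands in for the `return true` that stops `for_each_archetype_find`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reverse_take(archetypes: Sequence[Sequence[T]], n: int) -> List[T]:
    """Return the last n elements of the concatenated archetypes, in reverse order."""
    # Pass 1: sum sizes only; no element is loaded.
    total = sum(len(arch) for arch in archetypes)
    actual_take = min(max(n, 0), total)
    if actual_take == 0:
        return []  # empty source or take(0): nothing to walk
    skip = total - actual_take

    buf: List[T] = []
    seen = 0  # cumulative size of the archetypes visited so far
    for arch in archetypes:
        # Whole-archetype skip: all of its elements lie before the tail.
        if seen + len(arch) <= skip:
            seen += len(arch)
            continue
        # Partial archetype: drain the remaining skips with a counter.
        skips_left = skip - seen  # zero or negative once the tail has started
        seen += len(arch)
        for item in arch:
            if skips_left > 0:
                skips_left -= 1
                continue
            buf.append(item)
        if len(buf) == actual_take:
            break  # buffer is full; stop visiting archetypes
    # Reverse only the small buffer, never the full source.
    buf.reverse()
    return buf
```

A `where` filter breaks this scheme because `total` would no longer equal the number of elements that survive the filter, so `skip` could not be computed from sizes alone.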

Bench results (INTERP, 100K rows, ns/op)

Lane	Before	After	Delta	m3f
sum_aggregate_m4	3.4	2.1	-1.3	2.1
reverse_take_m4	48.0	9.2	-38.8	0.0
select_where_sum_m4	7.5	7.5	0.0	7.5
contains_match_m4	2.1	1.4	-0.7	2.2
chained_where_m4	6.6	6.6	0.0	6.6
count_aggregate_m4	4.1	4.1	0.0	4.1

sum_aggregate_m4 now matches m3f exactly. reverse_take_m4 doesn't hit m3f's 0.0 (m3f leverages array-side R6 backward-index access, which decs can't replicate without indexed-component-array helpers), but the skip-into-tail cuts the remaining cost to just the per-element iter ticks. contains_match_m4, at 1.4 after this change, now beats m3f's 2.2.
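The Delta column and the quoted 5.2× figure follow directly from the Before/After numbers. A small Python check, using only values from the table above (the helper names are hypothetical):

```python
# (before, after) in ns/op, copied from the bench table.
rows = {
    "sum_aggregate_m4": (3.4, 2.1),
    "reverse_take_m4": (48.0, 9.2),
    "select_where_sum_m4": (7.5, 7.5),
    "contains_match_m4": (2.1, 1.4),
    "chained_where_m4": (6.6, 6.6),
    "count_aggregate_m4": (4.1, 4.1),
}


def delta(before: float, after: float) -> float:
    """Change in ns/op, rounded to the table's one-decimal precision."""
    return round(after - before, 1)


def speedup(before: float, after: float) -> float:
    """Before/after ratio, rounded to one decimal."""
    return round(before / after, 1)


deltas = {lane: delta(b, a) for lane, (b, a) in rows.items()}
reverse_take_speedup = speedup(*rows["reverse_take_m4"])
```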

Tests

test_unroll_select_sum_trivial_let_elision_splice_shape — asserts decs_sel does not appear in _select(_.val).sum() splice body
test_unroll_where_select_sum_trivial_let_elision_splice_shape — same elision after a _where filter
test_unroll5d_reverse_take_skip_into_tail_splice_shape — asserts for_each_archetype_find count + presence of decs_skips skip-counter local
test_reverse_take_multi_archetype_parity — exercises whole-archetype-skip + partial-archetype skip-counter
test_reverse_take_skip_zero_when_take_exceeds_total — actualTake = totalCount branch (early-exit doesn't fire)
test_reverse_take_empty_source — totalCount = 0 early-return before for_each_archetype_find

Verification

1388/1388 linq + 245/245 decs + 782/782 dasSQLITE green INTERP
MCP lint + CI-style lint both clean
AOT lane (and JIT) deferred to CI

Out of scope

Closing the residual reverse_take 9.2 ns/op gap vs m3f's 0.0 would require either an indexed-component-array helper (so the inner for can start at an arbitrary offset, skipping iter ticks too) or moving the skip math into decs itself. Separate slice if it ever matters in production.
select is included in the skip-into-tail bail. It only affects element shape, not count, so it could in principle stay on the fast path — but the v1 emit doesn't carry projection through the inner-for body. Easy follow-up.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR tightens the decs (m4) linq-fold unroll path to close remaining performance outliers versus the array (m3f) lane, focusing on eliminating redundant binds in trivial projections and adding a specialized fast path for reverse().take(N).to_array().

Changes:

Add “trivial-let elision” so single-field _select(_.field) projections can reuse the underlying iter var directly and avoid emitting no-op decs_sel_* bindings; extend decs tuple usage scanning to recognize bare iter-var usage so pruning still works.
Add a two-pass reverse().take(N).to_array() decs fast path that first sums archetype sizes, then skips into the tail using for_each_archetype_find + a per-iter skip counter, and reverses only the small N-sized buffer.
Add new splice-shape and parity tests covering the new behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
daslib/linq_fold.das	Implements trivial-let elision (with updated decs usage scanning) and the reverse+take skip-into-tail fast path in the decs reverse planner.
tests/linq/test_linq_from_decs.das	Adds splice-shape assertions and parity tests intended to validate the new decs unroll optimizations.


Two slices closing the final m4-vs-m3f gaps from #2824's residual outliers.

Slice 1: trivial-let elision (closes sum_aggregate_m4 1.3ns gap)

When `_select(_.userName)` peels to a single `decs_tup.<field>` reference, rename the chain bind directly to the iter var instead of synthesizing `decs_sel_N`. wrap_decs_chain skips emitting the `let decs_sel_N = car_price` binding entirely; the action's `acc += <iterVar>` references the iter var natively.

Required extending DecsTupUsageScanner with an iter-var → user-name reverse map so bare iter-var refs still seed the pruner (previously: empty usedNames fell through to unpruned-default, defeating the elision).

Slice 2: reverse_take skip-into-tail (closes reverse_take_m4 5.2× gap)

For `from_decs(...).reverse().take(N).to_array()` with no where/select, emit a two-pass invoke: pass 1 sums `arch.size` (no entity load), pass 2 uses for_each_archetype_find to skip whole archetypes whose cumulative size still fits below the skip threshold, then a per-iter skip-counter through the partial archetype, push into a takeN-sized buffer, and `return true` to stop iteration once buf is full. reverse_inplace runs on the small N buffer at end, not the full source. where/select fall through to the legacy buffer+reverse_inplace+resize emit unchanged.

Bench (INTERP, 100K rows, ns/op):
- sum_aggregate_m4 3.4 → 2.1 matches m3f (was the systemic 1.3ns gap)
- reverse_take_m4 48.0 → 9.2 5.2× win, allocs 42B → 1B
- select_where_sum_m4 7.5 → 7.5 matches m3f (elision benefits this too)
- contains_match_m4 2.1 → 1.4 beats m3f at 2.2
- chained_where_m4 6.6 → 6.6 no regression
- count_aggregate_m4 4.1 → 4.1 no regression

Tests:
- New splice-shape assertions: trivial-let elision (no decs_sel binding for `_select(_.val).sum()` and `_where(_)._select(_.val).sum()`)
- New splice-shape for skip-into-tail (for_each_archetype_find count==1, decs_skips local presence)
- New parity tests: multi-archetype reverse+take, take(N>total), empty source; covers the whole-archetype-skip + partial-archetype + early-return arms

1388/1388 linq + 245/245 decs + 782/782 dasSQLITE green INTERP. MCP + CI lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#2835 fixed the typer-pass-order #2830 that originally tripped this test on the extended_checks (linux, 64) lane. With master now containing the fix, the test compiles cleanly on all lanes.

Re-adding it covers a case the current suite missed:

from_decs_template(type<Row>)._where(_.a >= 0)._where(_.b >= 0).count()

Three chained single-field _where calls: all 3 fields read via field access, no whole-var ref. The splice must keep all 3 get_ros (no slot pruning) but elide the named-tuple bind (no decs_tup in the body, iter vars read directly).

Lesson saved to memory: not every CI lane runs every test, so "platform-specific" failures often mean "we only check this on one platform", not that the bug itself is platform-specific.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The original test created one A-only and one B-only archetype, but the query `from_decs_template(type<RevTakeMultiArchA>)` only matches A. The B archetype never enters for_each_archetype, so the cross-archetype skipping arm wasn't actually exercised.

Now creates two MATCHING archetypes: both have `rev2_id` (so both satisfy the query), but the second group also has the rev2_b_* extras, which lands it in a separate archetype class. With A1=4 + A2=5 → totalCount=9, take(3) → skip=6: A1 (size 4) skipped via the size-sum arm, A2 enters with skipsLeft=2 → drains 2, pushes 3, returns true. Exercises both the whole-archetype-skip and partial-archetype + early-exit paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
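The A1=4 + A2=5 walk-through can be reproduced with a small planner that works on archetype sizes only. This is a Python sketch of the arithmetic, not the test itself; `skip_plan` and its action labels are hypothetical.

```python
from typing import List, Tuple


def skip_plan(sizes: List[int], take_n: int) -> List[Tuple[str, int, int]]:
    """For each archetype size, report (action, drained, pushed)."""
    total = sum(sizes)
    actual_take = min(max(take_n, 0), total)
    skip = total - actual_take
    plan: List[Tuple[str, int, int]] = []
    seen = 0   # cumulative size of archetypes already accounted for
    taken = 0  # elements pushed into the buffer so far
    for size in sizes:
        if taken == actual_take:
            # Early exit already fired: this archetype is never entered.
            plan.append(("not visited", 0, 0))
            continue
        if seen + size <= skip:
            # Size-sum arm: the whole archetype lies before the tail.
            plan.append(("skipped whole", 0, 0))
            seen += size
            continue
        drained = max(skip - seen, 0)  # per-iter skip counter for this archetype
        pushed = size - drained
        plan.append(("partial" if drained > 0 else "taken whole", drained, pushed))
        seen += size
        taken += pushed
    return plan


# Sizes from the commit message: A1 = 4, A2 = 5, take(3) gives skip = 6.
plan = skip_plan([4, 5], 3)
```

For these sizes the plan is one whole-archetype skip followed by a partial archetype that drains 2 and pushes 3, matching the numbers in the commit message.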

Step 3: emit_loop_or_count_lane_decs

Reconstructs flatten_linq-shaped calls array from captures (head + range ops + term) and dispatches to existing emit_decs_* lane fns. Per-adapter state-hoist shape stays (array binds source as invoke arg, decs zero-arg with state above for_each_archetype); D1 keeps the 4 array lane fns untouched.

Step 4+5: 4 decs planners now thin pattern-table stubs

- plan_decs_unroll → plan_loop_or_count_patterns + Decs adapter
- plan_decs_order_family → plan_order_family_patterns + Decs adapter (Row 4 buffer_helper_dispatch gated by array_source; decs cascades to Row 3 fused_prefilter)
- plan_decs_reverse → plan_reverse_patterns + Decs adapter (backward-walk rows already array_source-gated)
- plan_decs_distinct → plan_distinct_patterns + Decs adapter (emit_hashtable_dedup carries per-adapter take(N) branch: Decs for_each_archetype_find, Array break)

All 4 imperative bodies hard-deleted: -1173 LOC, +492 LOC = -681 LOC net.

Skip-into-tail preserved (emit_decs_reverse_skip_into_tail): Decs `reverse |> take(N) |> to_array` fast path lifted into a dedicated emit fn that emit_reverse_buffer_inplace pre-checks before the general buffer path. Preserves PR GaijinEntertainment#2834's 5.2× perf gain on multi-archetype decs sources.

Test fixes (tests/linq/test_linq_from_decs.das): 6 splice-shape assertions updated to match unified naming:
- decs_buf → order_buf (order family) / `buf (distinct)
- decs_seen → order_seen (order family) / `seen (distinct)
- decs_best → order_best, decs_taken → `taken, decs_acc → `acc

Verification:
- mcp__daslang__lint: clean
- test_linq_from_decs (198), test_linq_fold (385), test_linq_fold_ast (228)
- all themes 2/3/6/7/8 + pattern_walker + collapse_chained_wheres
- test_queries, test_queries_comprehensive (decs core)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 23, 2026 12:02

Copilot started reviewing on behalf of borisbat May 23, 2026 12:02 View session

Copilot AI reviewed May 23, 2026

borisbat and others added 2 commits May 23, 2026 09:59

borisbat force-pushed the bbatkin/linq-fold-trivial-let-and-reverse-take branch from 836aee0 to eec6002 Compare May 23, 2026 17:02

borisbat merged commit 81823e4 into master May 23, 2026
29 checks passed

This was referenced May 23, 2026

linq_fold: bounded-heap / streaming-min for plan_decs_order_family #2837

Merged

linq_fold: PR C — SourceAdapter + 4 decs planner migrations #2885

Merged

borisbat deleted the bbatkin/linq-fold-trivial-let-and-reverse-take branch May 30, 2026 15:20

borisbat merged 3 commits into master from bbatkin/linq-fold-trivial-let-and-reverse-take
