linq_fold: trivial-let elision + reverse_take skip-into-tail — closes the m4 ladder#2834
Merged
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR tightens the decs (m4) linq-fold unroll path to close remaining performance outliers versus the array (m3f) lane, focusing on eliminating redundant binds in trivial projections and adding a specialized fast path for reverse().take(N).to_array().
Changes:
- Add “trivial-let elision” so single-field
_select(_.field)projections can reuse the underlying iter var directly and avoid emitting no-opdecs_sel_*bindings; extend decs tuple usage scanning to recognize bare iter-var usage so pruning still works. - Add a two-pass
reverse().take(N).to_array()decs fast path that first sums archetype sizes, then skips into the tail usingfor_each_archetype_find+ a per-iter skip counter, and reverses only the small N-sized buffer. - Add new splice-shape and parity tests covering the new behaviors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| daslib/linq_fold.das | Implements trivial-let elision (with updated decs usage scanning) and the reverse+take skip-into-tail fast path in the decs reverse planner. |
| tests/linq/test_linq_from_decs.das | Adds splice-shape assertions and parity tests intended to validate the new decs unroll optimizations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two slices closing the final m4-vs-m3f gaps from #2824's residual outliers. Slice 1 — trivial-let elision (closes sum_aggregate_m4 1.3ns gap): When `_select(_.userName)` peels to a single `decs_tup.<field>` reference, rename the chain bind directly to the iter var instead of synthesizing `decs_sel_N`. wrap_decs_chain skips emitting the `let decs_sel_N = car_price` binding entirely; the action's `acc += <iterVar>` references the iter var natively. Required extending DecsTupUsageScanner with an iter-var-→-user-name reverse map so bare iter-var refs still seed the pruner (previously: empty usedNames fell through to unpruned-default, defeating the elision). Slice 2 — reverse_take skip-into-tail (closes reverse_take_m4 5.2× gap): For `from_decs(...).reverse().take(N).to_array()` with no where/select, emit a two-pass invoke: pass 1 sums `arch.size` (no entity load), pass 2 uses for_each_archetype_find to skip whole archetypes whose cumulative size still fits below the skip threshold, then a per-iter skip-counter through the partial archetype, push into a takeN-sized buffer, and `return true` to stop iteration once buf is full. reverse_inplace runs on the small N buffer at end, not the full source. where/select fall through to the legacy buffer+reverse_inplace+resize emit unchanged. Bench (INTERP, 100K rows, ns/op): - sum_aggregate_m4 3.4 → 2.1 matches m3f (was the systemic 1.3ns gap) - reverse_take_m4 48.0 → 9.2 5.2× win, allocs 42B → 1B - select_where_sum_m4 7.5 → 7.5 matches m3f (elision benefits this too) - contains_match_m4 2.1 → 1.4 beats m3f at 2.2 - chained_where_m4 6.6 → 6.6 no regression - count_aggregate_m4 4.1 → 4.1 no regression Tests: - New splice-shape assertions: trivial-let elision (no decs_sel binding for `_select(_.val).sum()` and `_where(_)._select(_.val).sum()`) - New splice-shape for skip-into-tail (for_each_archetype_find count==1, decs_skips local presence) - New parity tests: multi-archetype reverse+take, take(N>total), empty source — covers the whole-archetype-skip + partial-archetype + early-return arms 1388/1388 linq + 245/245 decs + 782/782 dasSQLITE green INTERP. MCP + CI lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#2835 fixed the typer-pass-order #2830 that originally tripped this test on the extended_checks (linux, 64) lane. With master now containing the fix, the test compiles cleanly on all lanes. Re-adding it covers a case the current suite missed: from_decs_template(type<Row>)._where(_.a >= 0)._where(_.b >= 0).count() Three chained single-field _where_s — all 3 fields read via field access, no whole-var ref. The splice must keep all 3 get_ros (no slot pruning) but elide the named-tuple bind (no decs_tup in the body, iter vars read directly). Lesson saved to memory: not every CI lane runs every test, so "platform-specific" failures often mean "we only check this on one platform" — not that the bug itself is platform-specific. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
836aee0 to
eec6002
Compare
The original test created one A-only and one B-only archetype, but the query `from_decs_template(type<RevTakeMultiArchA>)` only matches A — the B archetype never enters for_each_archetype, so the cross- archetype skipping arm wasn't actually exercised. Now creates two MATCHING archetypes: both have `rev2_id` (so both satisfy the query), but the second group also has the rev2_b_* extras which lands it in a separate archetype class. With A1=4 + A2=5 → totalCount=9, take(3) → skip=6: A1 (size 4) skipped via the size-sum arm, A2 enters with skipsLeft=2 → drains 2, pushes 3, returns true. Exercises both the whole-archetype-skip and partial-archetype + early-exit paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 23, 2026
pull Bot
pushed a commit
to forksnd/daScript
that referenced
this pull request
May 26, 2026
Step 3 — emit_loop_or_count_lane_decs:
Reconstructs flatten_linq-shaped calls array from captures (head + range
ops + term) and dispatches to existing emit_decs_* lane fns. Per-adapter
state-hoist shape stays (array binds source as invoke arg, decs zero-arg
with state above for_each_archetype) — D1 keeps the 4 array lane fns
untouched.
Step 4+5 — 4 decs planners now thin pattern-table stubs:
plan_decs_unroll → plan_loop_or_count_patterns + Decs adapter
plan_decs_order_family → plan_order_family_patterns + Decs adapter
(Row 4 buffer_helper_dispatch gated by array_source;
decs cascades to Row 3 fused_prefilter)
plan_decs_reverse → plan_reverse_patterns + Decs adapter
(backward-walk rows already array_source-gated)
plan_decs_distinct → plan_distinct_patterns + Decs adapter
(emit_hashtable_dedup carries per-adapter take(N)
branch — Decs for_each_archetype_find, Array break)
All 4 imperative bodies hard-deleted: -1173 LOC, +492 LOC = -681 LOC net.
Skip-into-tail preserved (emit_decs_reverse_skip_into_tail):
Decs `reverse |> take(N) |> to_array` fast path lifted into a dedicated
emit fn that emit_reverse_buffer_inplace pre-checks before the general
buffer path. Preserves PR GaijinEntertainment#2834's 5.2× perf gain on multi-archetype
decs sources.
Test fixes (tests/linq/test_linq_from_decs.das):
6 splice-shape assertions updated to match unified naming:
decs_buf → order_buf (order family) / `buf (distinct)
decs_seen → order_seen (order family) / `seen (distinct)
decs_best → order_best, decs_taken → `taken, decs_acc → `acc
Verification:
- mcp__daslang__lint: clean
- test_linq_from_decs (198), test_linq_fold (385), test_linq_fold_ast (228)
- all themes 2/3/6/7/8 + pattern_walker + collapse_chained_wheres
- test_queries, test_queries_comprehensive (decs core)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two slices that close the last m4-vs-m3f outliers from #2824's bench snapshot. Wraps the m4 perf push before docs / tutorials.
Slice 1 — trivial-let elision
Closes:
sum_aggregate_m41.3ns systemic gap vs m3f.When
_select(_.userName)peels to a singledecs_tup.<field>reference, rename the chain bind directly to the corresponding iter var instead of synthesizingdecs_sel_N.wrap_decs_chainskips emittinglet decs_sel_N = car_priceentirely; the action'sacc += <iterVar>references the iter var natively.Required extending
DecsTupUsageScannerwith an iter-var → user-name reverse map so bare iter-var refs still seed the pruner (previously: emptyusedNamesfell through to unpruned-default, defeating the elision).Slice 2 — reverse_take skip-into-tail
Closes:
reverse_take_m448 → 9.2 ns/op (5.2× win, allocs 42B → 1B).For
from_decs_template(...).reverse().take(N).to_array()with no where/select, emit a two-pass invoke:arch.sizeacross archetypes (no entity load, just archetype-header walk)for_each_archetype_findskips whole archetypes whose cumulative size still fits belowskip = total - N; the partial archetype uses a per-iter skip-counter; once the takeN-sized buffer fills,return truestops iteration across remaining archetypesreverse_inplaceruns on the small N buffer at end, not the full sourcewhere/selectfall through to the legacy buffer +reverse_inplace+ resize emit unchanged — a where filter invalidates the size-based skip (count after filter is unknown without iterating).Bench results (INTERP, 100K rows, ns/op)
sum_aggregate_m4now matches m3f exactly.reverse_take_m4doesn't hit m3f's 0.0 (m3f leverages array-side R6 backward-index access, which decs can't replicate without indexed-component-array helpers) but the skip-into-tail cuts the remaining cost to just the per-element iter ticks.contains_match_m4beats m3f at 2.2.Tests
test_unroll_select_sum_trivial_let_elision_splice_shape— assertsdecs_seldoes not appear in_select(_.val).sum()splice bodytest_unroll_where_select_sum_trivial_let_elision_splice_shape— same elision after a_wherefiltertest_unroll5d_reverse_take_skip_into_tail_splice_shape— assertsfor_each_archetype_findcount + presence ofdecs_skipsskip-counter localtest_reverse_take_multi_archetype_parity— exercises whole-archetype-skip + partial-archetype skip-countertest_reverse_take_skip_zero_when_take_exceeds_total—actualTake = totalCountbranch (early-exit doesn't fire)test_reverse_take_empty_source—totalCount = 0early-return beforefor_each_archetype_findVerification
Out of scope
selectis included in the skip-into-tail bail. It only affects element shape, not count, so it could in principle stay on the fast path — but the v1 emit doesn't carry projection through the inner-for body. Easy follow-up.🤖 Generated with Claude Code