linq_fold: plan_decs_unroll — splice from_decs* eager bridge into for_each_archetype #2750

Merged
borisbat merged 3 commits into master from bbatkin/decs-zip-unroll-pattern3 on May 20, 2026

@borisbat

Summary

New plan_decs_unroll planner in daslib/linq_fold.das. Recognizes the post-expansion from_decs* eager-bridge shape and replaces it with a for_each_archetype walk that hoists the linq accumulator above the per-entity loop — same emission shape query() produces by hand, no to_sequence, no lambda-per-element.

Supersedes #2748 (closes upon merge). The old PR pattern-matched the bridge's inner for-loop and emitted per-terminator splices via renameVariable. This rewrite instead uses a named-tuple bind in the for-body so that fold_linq_cond can peel user lambdas naturally (_.fieldname → tup.fieldname), and it covers a different terminator set (count/long_count/sum vs count/to_array).

Coverage

Slice 1 — bare count (arch.size shortcut):

  • _fold(from_decs*(...).count()) → var acc=0; for_each_archetype $(arch) { acc += arch.size }; return acc. No per-entity walk.
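
To illustrate why the bare count needs no per-entity walk, here is a minimal sketch in Python (a model of the semantics, not daslang). The `Archetype` class and both function names are hypothetical stand-ins and are not taken from daslib.

```python
from dataclasses import dataclass, field


@dataclass
class Archetype:
    """Toy stand-in for a DECS archetype: parallel columns of component data."""
    columns: dict = field(default_factory=dict)  # column name -> list of values

    @property
    def size(self) -> int:
        # Every column of one archetype has the same length.
        return len(next(iter(self.columns.values()), []))


def count_via_eager_bridge(archetypes):
    """Unfused model: push one item per entity into a buffer, then count it."""
    res = []
    for arch in archetypes:
        for i in range(arch.size):
            res.append(i)
    return len(res)


def count_via_arch_size(archetypes):
    """Spliced model: add up the per-archetype sizes, no inner loop."""
    acc = 0
    for arch in archetypes:
        acc += arch.size
    return acc


archetypes = [
    Archetype({"pos": [1, 2, 3]}),
    Archetype({"pos": [4, 5]}),
    Archetype({"pos": []}),
]
```

Both functions return the same number, but the second does work proportional to the number of archetypes rather than the number of entities, which is consistent with the 0 ns/entity figure reported in the benchmarks.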

Slice 2 — chain-aware terminators:

  • count, long_count, sum
  • Chain ops: _where (multiple, AND'd), single _select
  • Canonical order only (where before select; chained selects bail to tier-2)

Multi-field bridges supported via named-tuple bind: var tup = (f1=iter1, f2=iter2); <chain body>. User's _.fieldname access resolves to tup.fieldname after fold_linq_cond peels with bound name = tup.
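
The named-tuple bind can be modeled roughly as follows. This is a Python sketch of the semantics only: the column names `pos` and `vel`, the `Tup` type, and both function names are assumptions made for illustration. The real splice inlines the peeled lambda bodies, whereas this model still calls Python lambdas.

```python
from collections import namedtuple

# Hypothetical archetypes: each holds parallel columns, like the get_ro sources.
ARCHETYPES = [
    {"pos": [1, 2, 3], "vel": [10, 20, 30]},
    {"pos": [4, 5], "vel": [40, 50]},
]

Tup = namedtuple("Tup", ["pos", "vel"])


def eager_then_chain(archetypes, where, select):
    """Unfused model: build the whole sequence, then filter, project, and sum."""
    res = []
    for arch in archetypes:
        for p, v in zip(arch["pos"], arch["vel"]):
            res.append(Tup(pos=p, vel=v))
    return sum(select(t) for t in res if where(t))


def fused_accumulator(archetypes, where, select):
    """Fused model: the accumulator lives above both loops and no buffer is built."""
    acc = 0
    for arch in archetypes:
        for p, v in zip(arch["pos"], arch["vel"]):
            tup = Tup(pos=p, vel=v)  # named-tuple bind at the top of the for-body
            if where(tup):           # the user's _.pos reads as tup.pos
                acc += select(tup)
    return acc


def is_odd_pos(t):
    return t.pos % 2 == 1


def double_vel(t):
    return t.vel * 2
```

Because the user's field access already goes through a named tuple, the fused body can reuse the lambda expressions with only the bound name changed, which is the point of binding `tup` rather than renaming variables.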

Bails (cascade to tier-2 eager bridge, runs correctly but slower):

  • Terminators other than count/long_count/sum: to_array, first, any, contains, min, max, average, distinct, group_by, reverse, order_by
  • After-select where, chained selects, take/skip/take_while/skip_while
  • Any shape that doesn't match the eager-bridge AST
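
The coverage and bail rules above can be summarized as a small decision function. The sketch below is a Python model, not the planner's code: `classify_chain` and its string results are hypothetical, and treating a bare `long_count` or `sum` (no chain ops) as an accumulator case is an assumption, since the description only spells out the bare `count` shortcut.

```python
SUPPORTED_TERMINATORS = {"count", "long_count", "sum"}


def classify_chain(ops):
    """Return "arch_size", "accumulator", or None (bail to the tier-2 bridge).

    `ops` lists operator names ending with the terminator,
    e.g. ["_where", "_select", "sum"].
    """
    if not ops:
        return None
    *chain, terminator = ops
    if terminator not in SUPPORTED_TERMINATORS:
        return None  # to_array, first, any, min, max, ... are out of scope here
    if not chain:
        # Assumption: only a bare count takes the arch.size shortcut.
        return "arch_size" if terminator == "count" else "accumulator"
    seen_select = False
    for op in chain:
        if op == "_where":
            if seen_select:
                return None  # after-select where bails
        elif op == "_select":
            if seen_select:
                return None  # chained selects bail
            seen_select = True
        else:
            return None  # take/skip/take_while/skip_while and anything else bail
    return "accumulator"
```

Returning `None` mirrors the safe-degradation behavior described above: an unrecognized chain is left to the eager bridge rather than partially rewritten.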

Benchmarks (interpreter, 100K entities)

| variant | ns/entity | vs master |
|---|---|---|
| m1 hand-written `query()` (ideal target) | 4 | |
| m2 `from_decs_template(...).count()` (raw) | 60 | baseline |
| m2 `_fold(from_decs_template(...).count())` | 0 | ∞× (arch.size shortcut) |
| m4 hand-written `_where(...)._select(...).sum()` raw | 202 | baseline |
| m4 `_fold(from_decs_template(...)._where(...)._select(...).sum())` BEFORE | 63 | baseline |
| m4 `_fold(from_decs_template(...)._where(...)._select(...).sum())` AFTER | 6 | 10.5× faster |

Architecture

plan_decs_unroll:

  1. Recognizer (extract_decs_bridge): pattern-matches invoke($() { var res; for_each_archetype(req, erq, $(arch) { for(iter_vars in get_ro_sources){push(named_tuple)} }); return res.to_sequence() }). Captures req hash, erq factory, arch param name, cloned inner ExprFor, per-field iter names + user names from the push tuple. Verifies to_sequence references the same res. Any mismatch returns null → tier-2 cascade runs the eager bridge unchanged.

  2. Bare count shortcut (emit_decs_count_archsize): no per-entity walk; sums arch.size per archetype.

  3. Chain-aware emission (emit_decs_accumulator):

    • Walks chain ops, peels user lambdas via fold_linq_cond(lambda, tupName) → projection / where expression in terms of the named tuple.
    • Clones the bridge's inner ExprFor (preserves iter vars + get_ro sources, reuses bridge's archName).
    • Substitutes the for-body with var tup = (n1=iter1, n2=iter2, ...); <wrapped chain body>.
    • Wraps in outer invoke($() : ResultType { var acc = init; for_each_archetype(req, erq) $(arch) { <cloned for> }; return acc }).
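
As a rough illustration of the recognizer's "match everything or return null" contract, here is a toy Python sketch over a dictionary-based AST. The node kinds, keys, and the `extract_bridge` name are all hypothetical; the real recognizer works on daslang AST nodes and checks more than this model does.

```python
def extract_bridge(block):
    """Return a dict describing the bridge, or None on any shape mismatch.

    `block` models the invoke body as a list of three statements:
    a `var` declaration, a `for_each_archetype` walk, and a return of
    `<var>.to_sequence()`.
    """
    if len(block) != 3:
        return None
    decl, walk, ret = block
    if decl.get("kind") != "var":
        return None
    if walk.get("kind") != "for_each_archetype":
        return None
    if ret.get("kind") != "return_to_sequence":
        return None
    if ret.get("source") != decl.get("name"):
        return None  # to_sequence must read the same variable that was declared
    return {
        "res": decl["name"],
        "arch": walk.get("arch"),
        "fields": list(walk.get("fields", [])),
    }


GOOD = [
    {"kind": "var", "name": "res"},
    {"kind": "for_each_archetype", "arch": "arch", "fields": ["pos", "vel"]},
    {"kind": "return_to_sequence", "source": "res"},
]

# Same layout, but the return reads a different buffer: must not match.
BAD = [
    {"kind": "var", "name": "res"},
    {"kind": "for_each_archetype", "arch": "arch", "fields": ["pos"]},
    {"kind": "return_to_sequence", "source": "other"},
]
```

A caller would treat `None` as "leave the expression alone", so a false negative only costs speed, while a false positive could change results. That asymmetry is why every check fails closed.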

Verification

  • mcp__daslang__lint daslib/linq_fold.das: clean
  • mcp__daslang__lint tests/linq/test_linq_from_decs.das: clean
  • tests/linq: 1146/1146 green (interpreter)
  • tests/decs: 239/239 green (interpreter)
  • New tests: 8 functional parity + 2 AST-shape gates confirming splice fires

Out of scope (follow-ups)

  • Slice 3: buffer terminators (to_array), early-exit (first/first_or_default/any/all/contains), min/max/average
  • Slice 4: chained selects, after-select where, take/skip/take_while/skip_while chain ops
  • Slice 5: state-table terminators (distinct, group_by, reverse, order_by)
  • AOT benchmark sweep (defer until full terminator coverage)
  • LINQ_TO_DECS.md design-doc refresh

Test plan

  • CI green across interpreter + AOT + Release/Debug matrix
  • No regression in tests/linq/test_linq_from_decs.das (existing 7 tests) or any other tests/linq + tests/decs
  • Splice-shape AST gates fire (verified locally via find_module_function_via_rtti)

🤖 Generated with Claude Code

borisbat and others added 2 commits May 19, 2026 22:52
Approach Z splice for the from_decs* eager-bridge family. Recognizes the
post-expansion `invoke($() { var res; for_each_archetype(...); return
res.to_sequence() })` shape and, for the bare-count case (no chain ops),
emits `invoke($() { var acc=0; for_each_archetype(req, erq) $(arch) {
acc += arch.size }; return acc })` — skips the per-entity walk entirely.

Slots into the planner cascade between plan_group_by and plan_zip. On any
shape mismatch (different bridge layout, chain ops present, terminator
other than bare count) the planner returns null and the tier-2 cascade runs
the eager bridge unchanged — safe degradation.

Benchmarks (interpreter, 100K entities):
- m1 hand-written for_each_archetype + arch.size:     0 ns/entity
- m2 from_decs_template.count() (eager bridge):       60 ns/entity
- m3 _fold(from_decs_template.count()) splice:        0 ns/entity  ← splice fires
- regression check: tests/linq 1138/1138, tests/decs 239/239

Subsequent slices: chain-aware terminators (sum/min/max/long_count) via
nested _fold leveraging plan_zip; buffer terminators (to_array); early-exit
(first/any/contains). Each slice incremental, each safe-degrading.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds chain-op handling for from_decs* eager-bridge splice: peels
_where + single _select between the bridge and the terminator, emits
the unrolled for_each_archetype with a named-tuple bind so the user's
`_.fieldname` chain access resolves naturally to the bridge's iter
vars without per-element lambda overhead.

Approach:
- DecsBridgeShape now carries the cloned inner ExprFor + bridge's
  iter var names + user-facing field names from the push tuple.
- emit_decs_accumulator clones the bridge's for-loop, replaces the
  push body with `var tup = (n=iter, ...); <wrapped chain body>`.
- fold_linq_cond peels user lambdas with bound name = tup, so
  `_.fieldname` becomes `tup.fieldname` (real named-tuple access,
  no rewriting required).
- Terminators: count/long_count (postfix ++), sum (+= projection;
  accumulator typed from projection._type).
- _where + _select canonical chain order (after-select where /
  chained selects bail to tier-2 — defer to follow-up).

Benchmarks (100K entities, interpreter):
- m1 hand-written query:                            4 ns/entity
- m4 _fold(from_decs_template..._where..._select..sum()) BEFORE: 63
- m4 _fold(from_decs_template..._where..._select..sum()) AFTER:   6  ← 10.5× faster
- regression check: tests/linq 1146/1146, tests/decs 239/239

Tests: 5 new functional parity + AST shape gate. Covers
select+sum, where+count, where+select+sum, long_count, single-field
+ multi-field bridges.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This PR adds a new _fold planner (plan_decs_unroll) to recognize the fully-expanded from_decs* eager-bridge AST shape and replace it with a for_each_archetype-based emission that hoists accumulator work outside the per-entity iterator/to_sequence bridge, enabling much faster count/long_count/sum (with limited _where/single _select) on DECS sources.

Changes:

  • Add plan_decs_unroll (plus recognizer + emitters) to splice the from_decs* eager bridge into an archetype walk, including an arch.size shortcut for bare count().
  • Extend LINQ-from-DECS tests with new parity checks and AST “splice fired” gates.
  • Add two new DECS benchmarks to compare eager-bridge vs _fold splice vs hand-written baselines.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `daslib/linq_fold.das` | Adds DECS eager-bridge recognition and optimized emission for count/long_count/sum, and wires the planner into the _fold cascade. |
| `tests/linq/test_linq_from_decs.das` | Adds new Slice 1/Slice 2 functional parity tests and AST-shape gating for the splice. |
| `benchmarks/decs/bench_from_decs_template_sum.das` | Adds benchmark coverage for _where + _select + sum comparing eager bridge vs _fold vs hand-written query. |
| `benchmarks/decs/bench_from_decs_count.das` | Adds benchmark coverage for bare count() including the arch.size shortcut target. |

Comment thread daslib/linq_fold.das
Comment thread tests/linq/test_linq_from_decs.das
- extract_decs_bridge now verifies the inner push() writes into the
  same `res` variable declared in stmt 0 + that push has exactly 2
  arguments. Closes false-positive surface where a user-written
  invoke could return `res.to_sequence()` but push into an unrelated
  buffer — splice would have emitted count/sum over the wrong data.
- test_linq_from_decs.das: replace the brittle `describe(expr)`
  substring needle `"for ( "` (sensitive to formatter whitespace
  changes) with a structural count_expr_for(body_expr) AST walk.
  Other needles (`to_sequence`, `for_each_archetype`, `.size`) stay
  as describe-substring since those are stable call/field names that
  the formatter doesn't reshape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@borisbat merged commit 68fe61d into master on May 20, 2026
32 checks passed
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 20, 2026
…_array

Extends Approach Z direct-inline splice (PR GaijinEntertainment#2750) to cover the remaining
terminator surface for from_decs* chains.

Slice 3a (accumulator family): min/max/average added to emit_decs_accumulator.
Match non-decs emit_accumulator_lane semantics — min/max keep a `first` flag
hoisted above outer for_each_archetype; average keeps a running sum + count
and divides via double() at end. sum/min/max/average require a scalar _select.
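
A minimal Python model of the hoisted accumulator state described here might look like the following. It is a sketch under assumptions: archetypes are plain lists of values, the function names are hypothetical, and behavior on an empty source is not modeled because the commit message does not specify it.

```python
def fused_min(archetypes, select):
    """Running minimum with a `first` flag hoisted above the archetype walk."""
    first = True
    best = None
    for arch in archetypes:
        for entity in arch:
            v = select(entity)
            if first or v < best:
                best = v
                first = False
    return best  # None on an empty source in this model only


def fused_average(archetypes, select):
    """Running sum and count; a single floating-point division at the end."""
    total = 0
    count = 0
    for arch in archetypes:
        for entity in arch:
            total += select(entity)
            count += 1
    # Raises ZeroDivisionError on an empty source; that is a property of this
    # model, not a claim about the daslang emission.
    return float(total) / float(count)


def identity(x):
    return x


SAMPLE = [[3, 1], [], [2]]
```

Keeping `first`, `total`, and `count` outside the archetype loop is what lets a single pass span several archetypes without resetting state at each boundary.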

Slice 3b (early-exit): new emit_decs_early_exit for first/first_or_default/
any/all/contains. Outer becomes for_each_archetype_find (returns bool; inner
block returns true to stop the archetype walk). any/all/contains use the
find's return value directly (all negates). first/first_or_default thread a
found flag + result via prelude/tail.
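
The early-exit shape can be modeled in Python roughly as below. The `for_each_archetype_find` here is a toy reimplementation based only on the description above (the block returns true to stop the walk), and the `fused_*` helpers are hypothetical names, not functions from daslib.

```python
def for_each_archetype_find(archetypes, block):
    """Call block(arch) per archetype; stop and return True once it returns True."""
    for arch in archetypes:
        if block(arch):
            return True
    return False


def fused_any(archetypes, pred):
    """any: the find's return value is the answer."""
    def block(arch):
        for v in arch:
            if pred(v):
                return True
        return False
    return for_each_archetype_find(archetypes, block)


def fused_all(archetypes, pred):
    """all: search for a counterexample, then negate the find's result."""
    def block(arch):
        for v in arch:
            if not pred(v):
                return True
        return False
    return not for_each_archetype_find(archetypes, block)


def fused_first_or_default(archetypes, pred, default=None):
    """first_or_default: thread a found flag and a result around the find."""
    found = False
    result = default

    def block(arch):
        nonlocal found, result
        for v in arch:
            if pred(v):
                found, result = True, v
                return True
        return False

    for_each_archetype_find(archetypes, block)
    return result if found else default


DATA = [[1, 2], [3, 4], [5]]
```

In this model the walk stops at the first archetype whose block returns true, so later archetypes are never visited once a match is found.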

Slice 3c (to_array): new emit_decs_to_array hoists `var buf` above outer
for_each_archetype and per-element push_clones the projection (or named tuple
when no _select). Dispatched via the implicit "no recognized terminator" path
since linqCalls marks to_array as skip=true.

Refactor: build_decs_tup_bind + build_decs_inner_for helpers extracted from
Slice 2's emit_decs_accumulator so the new emitters share the for-body shape.
DecsBridgeShape gains elementType (cloned from resVar._type.firstType) for
to_array / first / first_or_default when no projection is present.

Tests: 14 new functional parity + 3 AST-shape gate tests in
tests/linq/test_linq_from_decs.das. All 29 file-local tests green; 1146 linq
+ 234 decs interp tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>