
linq_fold: plan_decs_unroll Slice 5f — terminal splice for last/single/aggregate/element_at#2822

Merged
borisbat merged 2 commits into master from bbatkin/linq-fold-decs-unroll-slice5f-terminator
May 23, 2026

@borisbat (Collaborator)

Summary

Closes the 4 outliers identified in PR #2812's m4 bench snapshot. These terminators were falling through to to_sequence materialization because plan_decs_unroll only recognized first/any/all/contains/min_by/max_by plus the accumulator family. Wave 5 extends the splice path to cover the walk-all, single-return terminators plus element_at's counter-based early exit.
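To make the difference between "to_sequence materialization" and "the splice path" concrete, here is a language-neutral sketch in Python. It is not daScript and not the code linq_fold.das generates. It models a decs world as a list of archetypes, each a list of int rows, and `last_via_to_sequence` / `last_spliced` are hypothetical names. The first builds a temporary list of matches and then takes its last element. The second keeps the terminator's state outside the per-archetype loop, so nothing is materialized.

```python
from typing import Callable, List

World = List[List[int]]  # hypothetical model: one inner list of rows per archetype


def last_via_to_sequence(world: World, pred: Callable[[int], bool]) -> int:
    # Fallback shape: first materialize every matching row into a temporary list...
    seq = [v for arch in world for v in arch if pred(v)]
    if not seq:
        raise IndexError("last: no matching element")
    # ...then run the terminator over that copy.
    return seq[-1]


def last_spliced(world: World, pred: Callable[[int], bool]) -> int:
    # Spliced shape: terminator state lives outside the archetype loop and is
    # updated in place for each match, so no temporary list is built.
    found = False
    result = 0
    for arch in world:
        for v in arch:
            if pred(v):
                found = True
                result = v
    if not found:
        raise IndexError("last: no matching element")
    return result


world = [[0, 1, 2], [3, 4]]  # rows 0..4 split across two archetypes
print(last_via_to_sequence(world, lambda v: True), last_spliced(world, lambda v: True))  # prints: 4 4
```

Both shapes return the same value. The spliced one simply never allocates the intermediate list, which is the source of the per-op allocation drop reported below.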

emit_decs_walk_lane (last / last_or_default / single / single_or_default / aggregate):

  • State (found flag, retained element, accumulator) hoisted at invoke scope so it persists across archetype boundaries.
  • single_or_default uses for_each_archetype_find to exit early on the second match (it sets a stop flag and returns true from the inner lambda); the others use plain for_each_archetype unless a take/take_while in the chain forces bool-lambda dispatch.
  • aggregate mirrors emit_accumulator_lane's workhorse / non-workhorse seed branch (= vs <-) and peel-or-invoke fallback for the binary lambda.
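The hoisting and dispatch choices above can be modeled with a small Python sketch. It is hypothetical, since the real pass emits daScript AST. `for_each_archetype` and `for_each_archetype_find` here are stand-ins that only mirror the visit-all versus stop-when-true contract described in the bullets; the real decs signatures are not reproduced. `single_walk` counts matches (capped at two) instead of a boolean found flag, and it leaves the "what do single / single_or_default return on multiple matches" policy to the caller, because that rule lives in daslib/linq.das.

```python
from typing import Callable, List, Tuple

World = List[List[int]]  # hypothetical model: one inner list of rows per archetype


def for_each_archetype(world: World, fn: Callable[[List[int]], None]) -> None:
    """Stand-in for plain per-archetype iteration: visits every archetype."""
    for arch in world:
        fn(arch)


def for_each_archetype_find(world: World, fn: Callable[[List[int]], bool]) -> bool:
    """Stand-in for the early-exit variant: stops once fn returns True."""
    for arch in world:
        if fn(arch):
            return True
    return False


def single_walk(world: World, pred: Callable[[int], bool]) -> Tuple[int, int, int]:
    """Returns (matches seen, capped at 2; retained value; rows visited)."""
    found = 0      # hoisted state: survives across archetype boundaries
    value = 0
    visited = 0

    def per_archetype(arch: List[int]) -> bool:
        nonlocal found, value, visited
        for v in arch:
            visited += 1
            if pred(v):
                found += 1
                if found == 2:
                    return True   # second match: stop the whole walk
                value = v
        return False

    for_each_archetype_find(world, per_archetype)
    return found, value, visited


def aggregate(world: World, seed: int, fn: Callable[[int, int], int]) -> int:
    acc = seed  # hoisted accumulator, seeded once before the walk

    def per_archetype(arch: List[int]) -> None:
        nonlocal acc
        for v in arch:
            acc = fn(acc, v)

    for_each_archetype(world, per_archetype)
    return acc
```

With `world = [[0, 1, 2], [3, 4], [5, 6]]`, `single_walk(world, lambda v: v % 2 == 1)` stops after visiting 4 rows, while a single match (`v == 3`) still walks all 7, because "exactly one" can only be confirmed at the end.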

emit_decs_element_at (element_at / element_at_or_default):

  • idx + counter + found + result hoisted at invoke scope.
  • perElement compares cnt == idx and, on a match, writes result and returns true (for_each_archetype_find provides the counter early exit).
  • Pre-loop check: idx < 0 panics (element_at) or returns default<T> (element_at_or_default).
  • The tail treats "not found" as out of range, i.e. idx >= effective length.
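A Python model of the element_at lane described above. This is a sketch under stated assumptions, not the generated daScript. The world is again a hypothetical list of archetypes. The `0` stands in for `default<T>` with T = int, and `IndexError` stands in for the daScript panic. The outer loop plus `any(...)` plays the role of for_each_archetype_find: it stops on the first archetype whose per-element callback returns True. This short-circuit works because `any` stops consuming the generator at the first True.

```python
from typing import List

World = List[List[int]]  # hypothetical model: one inner list of rows per archetype


def element_at(world: World, idx: int, or_default: bool = False) -> int:
    zero = 0  # stands in for default<T> with T = int
    # Pre-loop check: a negative index panics, or yields the default for _or_default.
    if idx < 0:
        if or_default:
            return zero
        raise IndexError("element_at: negative index")

    cnt = 0          # hoisted counter, shared across archetype boundaries
    found = False
    result = zero

    def per_element(v: int) -> bool:
        nonlocal cnt, found, result
        if cnt == idx:
            found = True
            result = v
            return True   # match: stop the walk
        cnt += 1
        return False

    for arch in world:  # stand-in for for_each_archetype_find
        if any(per_element(v) for v in arch):
            break

    # Tail: "not found" means idx >= effective length.
    if not found:
        if or_default:
            return zero
        raise IndexError("element_at: index out of range")
    return result
```

For `[[10, 11], [12], [13, 14]]`, index 2 is found in the second archetype and the third is never touched. Index 5 exhausts all five rows and falls through to the tail.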

Bench (INTERP, 100K rows, ns/op)

| Lane | master m4 | branch m4 | speedup | vs m3f |
|---|---|---|---|---|
| last_match | 82 | 17 | 4.8× | 17 vs 5 |
| single_match | 80 | 14 | 5.7× | 14 vs 3 |
| aggregate_match | 82 | 7 | 11.7× | 7 vs 5 (≈ matches) |
| element_at_match | 34 | 0 | n/a | 0 vs 0 (matches) |

Per-op allocations drop from 84 B to 1 B across all four lanes, because the array<tuple> materialization is eliminated.
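The speedup column is simply master ns/op divided by branch ns/op. A quick Python check using only the numbers from the table shows the arithmetic. `speedup` is a hypothetical helper. element_at_match is reported as n/a because its branch time rounds to 0 ns/op, so the ratio is undefined.

```python
def speedup(master_ns: float, branch_ns: float) -> float:
    """Ratio of old to new ns/op; undefined when the new time rounds to 0."""
    if branch_ns == 0:
        raise ZeroDivisionError("branch time rounds to 0 ns/op; ratio undefined")
    return master_ns / branch_ns


BENCH_NS = {  # (master m4, branch m4) in ns/op, copied from the table
    "last_match": (82, 17),
    "single_match": (80, 14),
    "aggregate_match": (82, 7),
    "element_at_match": (34, 0),
}

for lane, (master, branch) in BENCH_NS.items():
    if branch == 0:
        print(f"{lane}: n/a (branch rounds to 0 ns/op)")
    else:
        print(f"{lane}: {speedup(master, branch):.1f}x")
```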

Test plan

  • 16 new tests in tests/linq/test_linq_from_decs.das (8 parity + 4 splice-shape gates + 2 range-interaction + 2 multi-match edge cases)
  • 185 tests in file (up from 169)
  • 1376 tests/linq suite — interp + AOT green
  • 245 tests/decs suite — interp + AOT green
  • 371 tests/ast_match + 10 tests/template + 9 tests/macro_call + 27 tests/macro_boost green
  • Non-outlier bench smoke (count_aggregate_m4) unchanged at 6 ns/op
  • Lint clean on daslib/linq_fold.das

🤖 Generated with Claude Code

…e/aggregate/element_at

Closes the 4 outliers identified in PR #2812's m4 bench snapshot. These
terminators were falling through to `to_sequence` materialization because
plan_decs_unroll only recognized first/any/all/contains/min_by/max_by plus
the accumulator family. Wave 5 extends the splice path to cover walk-all
single-return terminators + element_at counter early-exit.

emit_decs_walk_lane (last/last_or_default/single/single_or_default/aggregate):
- State (found flag, retained element, accumulator) hoisted at invoke scope
  so it persists across archetype boundaries
- single_or_default uses for_each_archetype_find for early-exit on 2nd match
  (sets stop flag, returns true from inner lambda); others use plain
  for_each_archetype unless take/take_while forces bool-lambda dispatch
- aggregate mirrors emit_accumulator_lane's workhorse/non-workhorse seed
  branch (`=` vs `<-`) and peel-or-invoke fallback for the binary lambda

emit_decs_element_at (element_at/element_at_or_default):
- idx + counter + found + result hoisted at invoke scope
- perElement compares cnt == idx, writes result + returns true on match
- Pre-loop idx<0 panic (element_at) or `return default<T>` (or_default)
- Tail handles "not found" = out-of-range = idx >= effective length

Bench (INTERP, 100K rows, ns/op):
  last_match       m4: 82 → 17   (4.8×)
  single_match     m4: 80 → 14   (5.7×)
  aggregate_match  m4: 82 → 7    (11.7×)  close to m3f at 5
  element_at_match m4: 34 → 0    (matches m3f)
Per-op allocs drop 84 B → 1 B across all four (no more array<tuple>).

Tests: 16 new in test_linq_from_decs.das (8 parity + 4 splice-shape gates +
2 range-interaction + 2 multi-match edge cases). 185 tests in file (up from
169), 1376 linq + 245 decs + 371 ast_match suites all green interp + AOT.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 23, 2026 01:10
Copilot AI left a comment

Pull request overview

Extends the from_decs unroll/splice path in daslib/linq_fold.das to cover previously non-spliced terminal operations (last*, single*, aggregate, element_at*), avoiding fallback to_sequence materialization and improving performance parity with other terminators.

Changes:

  • Add new decs splice emitters for walk-all terminators (emit_decs_walk_lane) and index-based early-exit (emit_decs_element_at).
  • Update plan_decs_unroll terminator detection to route these ops through the splice path.
  • Add parity + AST-shape gate tests for the new terminators in tests/linq/test_linq_from_decs.das.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| daslib/linq_fold.das | Adds new decs unroll emission paths for last*/single*/aggregate/element_at* and wires them into plan_decs_unroll. |
| tests/linq/test_linq_from_decs.das | Adds parity and splice-shape tests validating the new terminator splice coverage. |

Two comment threads on daslib/linq_fold.das (marked Outdated)
Copilot flagged that `last_or_default` and `single_or_default` splice their
user-supplied default via plain `let dv = arg` + `return dv`, which breaks
parity with daslib/linq.das's `static_if (typeinfo can_copy(defaultValue))
return defaultValue else return <- clone_to_move(defaultValue)` shape.
Non-copyable types (e.g. `array<int>`) fail to compile at the prelude bind.

Add two small helpers:
  emit_default_bind(name, argExpr, canCopy)  →  let / var <- clone_to_move
  emit_default_return(name, canCopy)         →  return / return <- clone_to_move
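A text-level sketch of what the two helpers choose between, in Python for illustration only. The real emit_default_bind / emit_default_return in daslib/linq_fold.das produce daScript AST, not strings. The exact spelling emitted below (`let dv = arg`, `var dv <- clone_to_move(arg)`, and so on) is an assumption based on the shapes quoted in this commit message.

```python
def emit_default_bind(name: str, arg_expr: str, can_copy: bool) -> str:
    # Copyable defaults bind by plain copy; non-copyable ones are cloned and moved.
    if can_copy:
        return f"let {name} = {arg_expr}"
    return f"var {name} <- clone_to_move({arg_expr})"


def emit_default_return(name: str, can_copy: bool) -> str:
    # Same split on the return side, matching the static_if shape in linq.das.
    if can_copy:
        return f"return {name}"
    return f"return <- clone_to_move({name})"


print(emit_default_bind("dv", "arg", False))
print(emit_default_return("dv", False))
```

Here `can_copy` is just a bool the caller passes in. In the real pass it corresponds to the `typeinfo can_copy(...)` probe quoted above, so copyable types keep the old plain shape and pay nothing extra.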

Apply at all 6 default-return sites (3 decs + 3 array). The array side has
the same pre-existing latent bug, which went unnoticed because no existing test
exercised non-copyable defaults; mirror the fix there per the PR review's parity scope.

element_at_or_default in both paths has NO user-supplied default — switch
from `return default<T>` to `var t : T; return <- t` (linq.das line 2547's
move-init shape, works for both copyable and non-copyable element types).

Out of scope: element_at's success path emits `return $i(valueName)` (plain)
which would still break for non-copyable element types — pre-existing on
both paths, separate fix needs the same can_copy probe applied to
element-type returns across first/element_at success arms.

Tests: new test_linq_fold_non_copyable_default.das with 3 sub-tests
exercising last_or_default + single_or_default with array<int> defaults.
Standalone file (not extending test_linq_fold.das) so it doesn't pull in
pre-existing PERF022 hits from concat_impl/append_impl/prepend_impl that
surface via template instantiation through test_concat. 1380 linq + 245
decs suites green interp + AOT. Bench unchanged (no-op for copyable types).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment on lines +2922 to +2934
[test]
def test_unroll_last_parity(t : T?) {
    fixture_unroll2(5)
    // vals 0..4 → last is 4
    t |> equal(target_unroll_last_fold(), 4, "last splice parity")
}

[test]
def test_unroll_last_or_default_empty(t : T?) {
    fixture_unroll2(5)
    // val>1000 → none → default -9
    t |> equal(target_unroll_last_or_default_fold(), -9, "last_or_default empty source")
}
@borisbat borisbat merged commit 464f7f8 into master May 23, 2026
30 checks passed
@borisbat borisbat deleted the bbatkin/linq-fold-decs-unroll-slice5f-terminator branch May 30, 2026 15:20

2 participants