Skip to content

linq_fold: plan_decs_unroll Slice 5c — distinct/distinct_by on decs#2785

Merged
borisbat merged 1 commit into
masterfrom
bbatkin/linq-fold-decs-unroll-slice5c
May 21, 2026
Merged

linq_fold: plan_decs_unroll Slice 5c — distinct/distinct_by on decs#2785
borisbat merged 1 commit into
masterfrom
bbatkin/linq-fold-decs-unroll-slice5c

Conversation

@borisbat
Copy link
Copy Markdown
Collaborator

Summary

New plan_decs_distinct planner mirrors plan_distinct, inserted in the cascade BEFORE plan_distinct (same rule as 5d/5e — decs bridge match is stricter). Streaming dedup via var inscope seen : table<typedecl(_::unique_key(...))> hoisted above for_each_archetype so the seen-table persists across archetypes.

Emission shapes

Buffer-required (to_array default, optionally take(N)): per-element splice computes the key, runs key_exists check, on miss does taken++ → seen|>insert → buf|>push_clone. When take(N) is present, outer iteration switches to for_each_archetype_find and the take-cap return true propagates across archetypes — true streaming early-exit. Take-limit bound to a let at outer scope so a side-effecting take(arg) evaluates exactly once.

Buffer-elided (count/long_countlength(seen) / int64(length(seen)); sum folds inline at fresh-key site via acc += <projection>): no buffer allocation, only the seen-table is materialized. count + take both present cascades to tier 2 (matches array-side plan_distinct).

Side-effecting _select(proj) upstream binds once per element to a fresh decs_pv local — key + buf push (or sum fold) share the bind, matching array-side single-eval per source element. distinct_by(key) wraps the key block in invoke(<keyBlock>, <projection-or-tup>) and the seen-table's value type is derived via _::unique_key(invoke(<keyBlock>, default<elemT>)).

Benchmarks (100K rows, INTERP)

benchmark m1 sql m3 m3f (array splice) m4 (was) m4 (now) win
distinct_count 41 44 15 97 28 3.5× over eager bridge
distinct_take 0 30 0 34 0 matches m3f (early-exit)

distinct_count lands within 2× of m3f's 15 — the gap is the Wave 4 multi-component get_ro floor (3 components participate in the inner for-loop even when the chain only reads brand). distinct_take collapses to 0 ns/op — early-exit at the 3rd distinct brand visits only ~3 source elements regardless of N=100K, matching the array-side splice.

Coverage

_select(_.brand).distinct() + to_array / count / long_count / sum / take(N) / take(0) / take(N>num_distinct), _where(_.id<8)._select(_.brand).distinct(), _distinct_by(_.brand) + to_array / count / take(N), empty-decs distinct yields empty, side-effecting take(N) arg evaluates exactly once. AST shape gates for: distinct+count (no to_sequence, no decs_buf, single key_exists, plain for_each_archetype), distinct+take (for_each_archetype_find + decs_buf + decs_seen + decs_taken counters), distinct_by+to_array (unique_key wrapping the key invocation), distinct+sum (no decs_buf, decs_acc declared+folded). +17 tests (122 → 139 in file).

Test plan

  • mcp__daslang__lint + format_file clean
  • CI lint (utils/lint/main.das) clean
  • Interp: full tests/linq/ 1327/1327 + tests/decs/ 240/240 green
  • AOT (test_aot rebuild): same 1327 + 240 green
  • Benches confirm splice fires: distinct_count 97→28 ns/op, distinct_take 34→0 ns/op
  • ast_dump --mode source on target_unroll5c_select_distinct_take2_fold shows for_each_archetype_find + take-cap return true + unique_key + single key_exists + insert+push_clone+taken++ on fresh-key
  • CI green across matrix

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 21, 2026 08:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a dedicated LINQ fold planner for distinct / distinct_by on DEC systems (decs bridge) to enable streaming-style de-duplication using a hoisted seen table that persists across archetypes, improving performance and enabling early-exit for take(N).

Changes:

  • Added plan_decs_distinct to daslib/linq_fold.das and inserted it before plan_distinct in the fold cascade.
  • Added a new Slice 5c DEC-based test suite covering parity + AST-shape gating for distinct / distinct_by with multiple terminators (to_array, count, long_count, sum, take).
  • Updated benchmark documentation to describe Slice 5c behavior and results.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/linq/test_linq_from_decs.das Adds Slice 5c tests for DEC-backed distinct / distinct_by, including AST shape gates and single-eval side-effect checks.
daslib/linq_fold.das Introduces plan_decs_distinct macro planner and wires it into the fold cascade before plan_distinct.
benchmarks/sql/M4_DECS_EXPANSION.md Documents Slice 5c planner behavior, emission shapes, and benchmark deltas.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread daslib/linq_fold.das
Comment thread daslib/linq_fold.das Outdated
New `plan_decs_distinct` planner mirrors `plan_distinct`, inserted in the
cascade BEFORE `plan_distinct` (same rule as 5d/5e — decs bridge match is
stricter). Streaming dedup via `var inscope seen : table<typedecl(_::unique_key
(...))>` hoisted above `for_each_archetype` so the seen-table persists across
archetypes. Two emit shapes:

- **Buffer-required** (`to_array` default, optionally `take(N)`): per-element
  splice computes the key, `key_exists` check, on miss does
  `taken++ → seen|>insert → buf|>push_clone`. When `take(N)` is present, outer
  iteration switches to `for_each_archetype_find` and the take-cap
  `return true` propagates across archetypes — true streaming early-exit.
  Take-limit bound to a `let` at outer scope so a side-effecting `take(arg)`
  evaluates exactly once.
- **Buffer-elided** (`count`/`long_count` → `length(seen)` /
  `int64(length(seen))`; `sum` folds inline at fresh-key site via
  `acc += <projection>`): no buffer allocation, only the seen-table is
  materialized. count + take both present cascades to tier 2 (matches array-
  side `plan_distinct`).

Side-effecting `_select(proj)` upstream binds once per element to a fresh
`decs_pv` local — key + buf push (or sum fold) share the bind, matching array-
side single-eval per source element. `distinct_by(key)` wraps the key block in
`invoke(<keyBlock>, <projection-or-tup>)` and the seen-table's value type is
derived via `_::unique_key(invoke(<keyBlock>, default<elemT>))`.

Copilot review fixes:
- **plan_distinct predicate-drop bug (pre-existing).** Adds the symmetric
  arg-count guard on the terminator pop. `plan_distinct` was popping
  `count`/`long_count`/`sum` unconditionally — so
  `each(arr).distinct().count(predicate)` silently dropped the predicate and
  emitted `length(seen)` instead of filtering. Repro returned 5 instead of 3
  for `[1,1,2,2,3,3,4,4,5,5].distinct().count(_>2)`. Fix mirrors the guard
  already in `plan_decs_distinct`: pop only when `length(args)==1`; otherwise
  cascade to tier-2 which honors the predicate via `count(seq, pred)`. +4
  parity tests (2 array, 2 decs) that fail pre-fix and pass post-fix.
- Comment wording in `plan_decs_distinct` no longer references the
  non-existent `sum(src, sel)` overload.

Benchmarks (100K rows, INTERP):

| benchmark | m1 sql | m3 | m3f | m4 (was) | m4 (now) | win |
|---|---:|---:|---:|---:|---:|---:|
| distinct_count | 41 | 44 | 15 | 97 | **28** | 3.5× |
| distinct_take  |  0 | 30 |  0 | 34 |  **0** | matches m3f |

Coverage: `_select(_.brand).distinct()` + to_array / count / long_count / sum /
take(N) / take(0) / take(N>num_distinct), `_where(_.id<8)._select(_.brand).
distinct()`, `_distinct_by(_.brand)` + to_array / count / take(N), empty-decs
distinct yields empty, side-effecting take(N) arg evaluates exactly once.
`count(pred)`/`long_count(pred)` after distinct (both array + decs sources)
must honor the predicate. AST shape gates for: distinct+count (no
to_sequence, no decs_buf, single key_exists, plain for_each_archetype),
distinct+take (for_each_archetype_find + decs_buf + decs_seen + decs_taken
counters), distinct_by+to_array (unique_key wrapping the key invocation),
distinct+sum (no decs_buf, decs_acc declared+folded). +19 tests (122 → 141
in test_linq_from_decs; +2 in test_linq_fold).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@borisbat borisbat force-pushed the bbatkin/linq-fold-decs-unroll-slice5c branch from 53c8b98 to 343f579 Compare May 21, 2026 08:45
@borisbat borisbat requested a review from Copilot May 21, 2026 08:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@borisbat borisbat merged commit 0561df6 into master May 21, 2026
32 checks passed
@borisbat borisbat deleted the bbatkin/linq-fold-decs-unroll-slice5c branch May 30, 2026 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants