Skip to content

benchmarks/sql: anti-DCE defense — accept() result in every bench run-block#2756

Closed
borisbat wants to merge 2 commits into
bbatkin/decs-unroll-slice4from
bbatkin/bench-anti-dce
Closed

benchmarks/sql: anti-DCE defense — accept() result in every bench run-block#2756
borisbat wants to merge 2 commits into
bbatkin/decs-unroll-slice4from
bbatkin/bench-anti-dce

Conversation

@borisbat
Copy link
Copy Markdown
Collaborator

@borisbat borisbat commented May 20, 2026

Closing — combining with #2757 + Copilot fixes into a single PR rebased on fresh master (bbatkin/decs-zip-unroll-pattern3 already landed via #2750).

Will reopen as a single PR covering: m4_decs_fold lane + anti-DCE accept sweep + order_by+first splice arm + fix for first()-panic-on-empty semantics flagged by #2757 review.

borisbat and others added 2 commits May 20, 2026 02:57
Adds a third benchmark target alongside m1_sql / m3_array / m3f_array_fold:
m4_decs_fold runs the same chain shape through `_fold(from_decs_template(type<DecsCar>)...)`.
Gives a tri-platform comparison (SQL vs array vs decs) under one chain spec.

Shared scaffold in _common.das: `[decs_template(prefix="car_")] DecsCar` mirroring
Car's 6 fields + `fixture_decs(n)` parallel to `fixture_array(n)`. Each benchmark
file gains a `run_m4` + `[benchmark]` wrapper.

Lambda quirks: explicit `$(c : Car)` annotations don't match the decs tuple
element type, so m4 lanes use `_select(_.field)` macro form (auto-types via
macro expansion). first_or_default_match's sentinel is a named-tuple literal
matching the iter element shape.

Skipped (Cat C — need new decs surface): indexed_lookup (eid-lookup), join_count
(decs join design), zip_dot_product (decs zip surface). Tracked in
`benchmarks/sql/M4_DECS_EXPANSION.md` with full first-sweep results matrix,
Cat A/B split, suspect-0ns-list, and Wave 2-4 plans (Cat C surface adds /
Slice 5+ splice arms / per-chain component-narrowing perf).

Wave 1 results (100K, INTERP):
- Cat A m4 beats SQL on most aggregate/filter shapes (1.5-9x), ~3-5x slower than m3f
  due to 6-component multi-iter for-loop overhead even when chain reads one field
- Cat B m4 falls to eager bridge (~100-130 ns); becomes the regression guard for
  each plan_decs_unroll splice arm as Slice 5+ lands

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-block

Sweeps 197 b|>run blocks across 51 SQL benchmarks, inserting
`b |> accept(<result_var>)` immediately after the inner let-binding. Uses
the existing `[sideeffects]` helper `accept` from dastest/testing.das:172.

**Why:** Documents intent (result must escape) and protects against future
DCE if anyone modifies the chains. ast_dump verified the calls survive
compilation: e.g. take_count m3f lowers to a full spliced invoke + accept(b,
rows) + empty/failNow guard — the chain is genuinely running.

**Finding:** This sweep was initially aimed at the 11 m3f=0 ns/op cases
suspected of being DCE'd (select_count, take_count, take_count_filtered,
take_sum_aggregate, reverse_take, skip_take, distinct_take, any_match,
element_at_match, first_match, first_or_default_match). After the sweep,
those cells still report 0 ns/op. Investigation: ast_dump shows the spliced
loop body fully expanded and the accept call alive. The zeros are real —
dastest reports total_time / n where n=100000, and a body cost ≤~100us
divides to ≤1 ns/op and rounds to 0. For take(N)+to_array shapes the divisor
is 100000 but only TAKE_N=1000 elements are processed, so the unit
underreports actual per-element cost. Not a DCE artifact; matrix is honest.

The accept guards stay regardless — cheap insurance for future bench
changes, and the convention is uniform across all four lanes (m1/m3/m3f/m4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SQL benchmark suite to make benchmark bodies harder to optimize away by ensuring each run-block “uses” its computed result via accept(...). It also expands the benchmark matrix with an m4_decs_fold lane (decs-backed) and documents the m4 expansion plan.

Changes:

  • Insert b |> accept(result) into benchmark run-blocks as an anti-DCE defense.
  • Introduce an m4_decs_fold benchmark lane across many SQL benchmarks (new fixture_decs + DecsCar template in _common.das, plus per-benchmark run_m4 + [benchmark] entrypoints).
  • Add documentation for the m4 lane expansion plan.

Reviewed changes

Copilot reviewed 53 out of 53 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
benchmarks/sql/_common.das Add decs dependency + DecsCar template and fixture_decs to support m4 lane.
benchmarks/sql/aggregate_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/all_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/any_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/average_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/bare_order_where.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/chained_where.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/contains_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/count_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/distinct_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/distinct_take.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/element_at_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/first_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/first_or_default_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_average.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_first.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_having_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_having_hidden_sum.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_max.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_min.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_multi_reducer.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_select_sum.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_sum.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/groupby_where_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/groupby_where_sum.das Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged).
benchmarks/sql/indexed_lookup.das Add accept(...) in benchmark loop.
benchmarks/sql/join_count.das Add accept(...) in benchmark loop.
benchmarks/sql/last_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/long_count_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/max_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/min_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/order_take_desc.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/reverse_take.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/select_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/select_where.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/select_where_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/select_where_order_take.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/select_where_sum.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/single_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/skip_take.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/skip_while_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/sort_first.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/sort_take.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/sum_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/sum_where.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/take_count.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/take_count_filtered.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/take_sum_aggregate.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/take_while_match.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/to_array_filter.das Add accept(...) + add m4 lane benchmark.
benchmarks/sql/zip_dot_product.das Add accept(...) in benchmark loop.
benchmarks/sql/M4_DECS_EXPANSION.md Add documentation for m4 lane expansion plan.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

let groups <- _fold(from_decs_template(type<DecsCar>)
._group_by(_.brand)
._select((Brand = _._0,
TotalPrice = _._1 |> _select(_.price) |> sum()))
._where(_.price > 500)
._group_by(_.brand)
._select((Brand = _._0,
TotalPrice = _._1 |> _select(_.price) |> sum()))
let groups <- _fold(from_decs_template(type<DecsCar>)
._group_by(_.brand)
._select((Brand = _._0,
AvgPrice = _._1 |> _select(_.price) |> average()))
let groups <- _fold(from_decs_template(type<DecsCar>)
._group_by(_.brand)
._select((Brand = _._0,
MinPrice = _._1 |> _select(_.price) |> min()))
let groups <- _fold(from_decs_template(type<DecsCar>)
._group_by(_.brand)
._select((Brand = _._0,
MaxPrice = _._1 |> _select(_.price) |> max()))
Comment on lines +66 to +67
TotalPrice = _._1 |> _select(_.price) |> sum(),
MaxPrice = _._1 |> _select(_.price) |> max()))
b |> run("m4_decs_fold/{n}", n) {
let groups <- _fold(from_decs_template(type<DecsCar>)
._group_by(_.brand)
._having(_._1 |> _select(_.price) |> sum > 50000)
Comment on lines +1 to +9
# m4_decs_fold lane — expansion plan

Adds a third benchmark target alongside `m1_sql` (SQL via `_sql`) and `m3`/`m3f` (array linq, with/without `_fold` splice). The new lane is `m4_decs_fold`:

```das
_fold(from_decs_template(type<DecsCar>)._where(...)._select(...).sum())
```

Goal: tri-platform comparison (SQL vs array vs decs) under the same chain shape, so the splice path's decs win is directly comparable to its array win.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants