benchmarks/sql: anti-DCE defense — accept() result in every bench run-block#2756
Closed
borisbat wants to merge 2 commits into
Closed
benchmarks/sql: anti-DCE defense — accept() result in every bench run-block#2756borisbat wants to merge 2 commits into
borisbat wants to merge 2 commits into
Conversation
Adds a third benchmark target alongside m1_sql / m3_array / m3f_array_fold: m4_decs_fold runs the same chain shape through `_fold(from_decs_template(type<DecsCar>)...)`. Gives a tri-platform comparison (SQL vs array vs decs) under one chain spec. Shared scaffold in _common.das: `[decs_template(prefix="car_")] DecsCar` mirroring Car's 6 fields + `fixture_decs(n)` parallel to `fixture_array(n)`. Each benchmark file gains a `run_m4` + `[benchmark]` wrapper. Lambda quirks: explicit `$(c : Car)` annotations don't match the decs tuple element type, so m4 lanes use `_select(_.field)` macro form (auto-types via macro expansion). first_or_default_match's sentinel is a named-tuple literal matching the iter element shape. Skipped (Cat C — need new decs surface): indexed_lookup (eid-lookup), join_count (decs join design), zip_dot_product (decs zip surface). Tracked in `benchmarks/sql/M4_DECS_EXPANSION.md` with full first-sweep results matrix, Cat A/B split, suspect-0ns-list, and Wave 2-4 plans (Cat C surface adds / Slice 5+ splice arms / per-chain component-narrowing perf). Wave 1 results (100K, INTERP): - Cat A m4 beats SQL on most aggregate/filter shapes (1.5-9x), ~3-5x slower than m3f due to 6-component multi-iter for-loop overhead even when chain reads one field - Cat B m4 falls to eager bridge (~100-130 ns); becomes the regression guard for each plan_decs_unroll splice arm as Slice 5+ lands Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-block Sweeps 197 b|>run blocks across 51 SQL benchmarks, inserting `b |> accept(<result_var>)` immediately after the inner let-binding. Uses the existing `[sideeffects]` helper `accept` from dastest/testing.das:172. **Why:** Documents intent (result must escape) and protects against future DCE if anyone modifies the chains. ast_dump verified the calls survive compilation: e.g. take_count m3f lowers to a full spliced invoke + accept(b, rows) + empty/failNow guard — the chain is genuinely running. **Finding:** This sweep was initially aimed at the 11 m3f=0 ns/op cases suspected of being DCE'd (select_count, take_count, take_count_filtered, take_sum_aggregate, reverse_take, skip_take, distinct_take, any_match, element_at_match, first_match, first_or_default_match). After the sweep, those cells still report 0 ns/op. Investigation: ast_dump shows the spliced loop body fully expanded and the accept call alive. The zeros are real — dastest reports total_time / n where n=100000, and a body cost ≤~100us divides to ≤1 ns/op and rounds to 0. For take(N)+to_array shapes the divisor is 100000 but only TAKE_N=1000 elements are processed, so the unit underreports actual per-element cost. Not a DCE artifact; matrix is honest. The accept guards stay regardless — cheap insurance for future bench changes, and the convention is uniform across all four lanes (m1/m3/m3f/m4). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the SQL benchmark suite to make benchmark bodies harder to optimize away by ensuring each run-block “uses” its computed result via accept(...). It also expands the benchmark matrix with an m4_decs_fold lane (decs-backed) and documents the m4 expansion plan.
Changes:
- Insert
b |> accept(result)into benchmark run-blocks as an anti-DCE defense. - Introduce an
m4_decs_foldbenchmark lane across many SQL benchmarks (newfixture_decs+DecsCartemplate in_common.das, plus per-benchmarkrun_m4+[benchmark]entrypoints). - Add documentation for the m4 lane expansion plan.
Reviewed changes
Copilot reviewed 53 out of 53 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| benchmarks/sql/_common.das | Add decs dependency + DecsCar template and fixture_decs to support m4 lane. |
| benchmarks/sql/aggregate_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/all_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/any_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/average_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/bare_order_where.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/chained_where.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/contains_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/count_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/distinct_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/distinct_take.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/element_at_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/first_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/first_or_default_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_average.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_first.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_having_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_having_hidden_sum.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_max.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_min.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_multi_reducer.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_select_sum.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_sum.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/groupby_where_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/groupby_where_sum.das | Add accept(...) + add m4 lane benchmark (note: nested _select issue flagged). |
| benchmarks/sql/indexed_lookup.das | Add accept(...) in benchmark loop. |
| benchmarks/sql/join_count.das | Add accept(...) in benchmark loop. |
| benchmarks/sql/last_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/long_count_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/max_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/min_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/order_take_desc.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/reverse_take.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/select_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/select_where.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/select_where_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/select_where_order_take.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/select_where_sum.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/single_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/skip_take.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/skip_while_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/sort_first.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/sort_take.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/sum_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/sum_where.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/take_count.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/take_count_filtered.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/take_sum_aggregate.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/take_while_match.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/to_array_filter.das | Add accept(...) + add m4 lane benchmark. |
| benchmarks/sql/zip_dot_product.das | Add accept(...) in benchmark loop. |
| benchmarks/sql/M4_DECS_EXPANSION.md | Add documentation for m4 lane expansion plan. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let groups <- _fold(from_decs_template(type<DecsCar>) | ||
| ._group_by(_.brand) | ||
| ._select((Brand = _._0, | ||
| TotalPrice = _._1 |> _select(_.price) |> sum())) |
| ._where(_.price > 500) | ||
| ._group_by(_.brand) | ||
| ._select((Brand = _._0, | ||
| TotalPrice = _._1 |> _select(_.price) |> sum())) |
| let groups <- _fold(from_decs_template(type<DecsCar>) | ||
| ._group_by(_.brand) | ||
| ._select((Brand = _._0, | ||
| AvgPrice = _._1 |> _select(_.price) |> average())) |
| let groups <- _fold(from_decs_template(type<DecsCar>) | ||
| ._group_by(_.brand) | ||
| ._select((Brand = _._0, | ||
| MinPrice = _._1 |> _select(_.price) |> min())) |
| let groups <- _fold(from_decs_template(type<DecsCar>) | ||
| ._group_by(_.brand) | ||
| ._select((Brand = _._0, | ||
| MaxPrice = _._1 |> _select(_.price) |> max())) |
Comment on lines
+66
to
+67
| TotalPrice = _._1 |> _select(_.price) |> sum(), | ||
| MaxPrice = _._1 |> _select(_.price) |> max())) |
| b |> run("m4_decs_fold/{n}", n) { | ||
| let groups <- _fold(from_decs_template(type<DecsCar>) | ||
| ._group_by(_.brand) | ||
| ._having(_._1 |> _select(_.price) |> sum > 50000) |
Comment on lines
+1
to
+9
| # m4_decs_fold lane — expansion plan | ||
|
|
||
| Adds a third benchmark target alongside `m1_sql` (SQL via `_sql`) and `m3`/`m3f` (array linq, with/without `_fold` splice). The new lane is `m4_decs_fold`: | ||
|
|
||
| ```das | ||
| _fold(from_decs_template(type<DecsCar>)._where(...)._select(...).sum()) | ||
| ``` | ||
|
|
||
| Goal: tri-platform comparison (SQL vs array vs decs) under the same chain shape, so the splice path's decs win is directly comparable to its array win. |
7 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closing — combining with #2757 + Copilot fixes into a single PR rebased on fresh master (
bbatkin/decs-zip-unroll-pattern3already landed via #2750).Will reopen as a single PR covering: m4_decs_fold lane + anti-DCE accept sweep + order_by+first splice arm + fix for first()-panic-on-empty semantics flagged by #2757 review.