sqlite_linq: expression keys in _group_by + PR #2845 review fixes #2847

Merged
borisbat merged 1 commit into master from bbatkin/sqlite-linq-groupby-expression-keys
May 24, 2026
@borisbat
Collaborator

Summary

Two improvements in one PR:

1. Expression-key support in _group_by (Session 2 of post-#2843 plan)

_group_by(_.Price % 100) (and tuples mixing field keys with expression keys) now lower to GROUP BY (("Price") % (100)). The rendered fragment is re-used verbatim in SELECT (K = _._0) and ORDER BY (_order_by(_._0)) positions, so the SQL stays a single source of truth across all three clauses.
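As a rough illustration of the SQL shape described above, the following Python sketch runs a query in which one bind-free key fragment is pasted into SELECT, GROUP BY, and ORDER BY. The `Cars` table, its rows, and the `K`/`N` aliases are assumptions made up for this example; only the fragment `(("Price") % (100))` comes from the PR text, and the full statement sqlite_linq emits may differ.

```python
import sqlite3

# Hypothetical table and rows, chosen only to illustrate the SQL shape.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Cars ("Brand" TEXT, "Price" INTEGER)')
conn.executemany(
    "INSERT INTO Cars VALUES (?, ?)",
    [("A", 105), ("B", 205), ("C", 150), ("D", 250), ("E", 399)],
)

# The key fragment is rendered once and carries no bind parameters,
# so the same string can be reused in all three clauses.
key_sql = '(("Price") % (100))'
query = (
    f"SELECT {key_sql} AS K, COUNT(*) AS N "
    f"FROM Cars GROUP BY {key_sql} ORDER BY {key_sql}"
)

rows = conn.execute(query).fetchall()
print(rows)  # [(5, 2), (50, 2), (99, 1)]
```

Because the fragment has no `?` placeholders, the statement needs no parameter list at all, which is the property that makes reuse across clauses safe.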

Mechanism: collect_group_keys calls pred_to_sql with a new q.inlineConstants mode so the bound _.X field refs render as columns and ExprConst* literals (ints, floats, strings, bools) inline as SQL literals instead of ? placeholders. The fragment carries no binds, which lets it be re-used at multiple SQL positions without bind-position bookkeeping.

Runtime values are rejected loudly: _.Col - capturedVar would need its bind re-pushed at each use site, so the analyzer rejects it with a precise diagnostic (failed_sql_macro case 25).
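The inlining mode and the rejection rule can be sketched with a toy expression renderer. This is a Python analog, not the daslang implementation: the `Field`, `Const`, `Captured`, and `BinOp` classes and the `render` function are hypothetical stand-ins for bound `_.X` refs, `ExprConst*` literals, captured variables, and `pred_to_sql`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Field:      # hypothetical stand-in for a bound `_.X` reference
    name: str


@dataclass
class Const:      # hypothetical stand-in for an ExprConst* literal
    value: Union[int, float, str, bool]


@dataclass
class Captured:   # hypothetical stand-in for a runtime value (capturedVar)
    name: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Field, Const, Captured, BinOp]


def render_literal(value):
    # bool must be checked before numbers: bool is a subclass of int.
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def render(node, binds, inline_constants):
    """Render a node to SQL; append to `binds` only when not inlining."""
    if isinstance(node, Field):
        return '"' + node.name + '"'
    if isinstance(node, Const):
        if inline_constants:
            return render_literal(node.value)
        binds.append(node.value)
        return "?"
    if isinstance(node, Captured):
        if inline_constants:
            # A reusable fragment cannot carry binds, so reject loudly.
            raise ValueError(f"runtime value '{node.name}' not allowed in group key")
        binds.append(node.name)
        return "?"
    left = render(node.left, binds, inline_constants)
    right = render(node.right, binds, inline_constants)
    return f"(({left}) {node.op} ({right}))"


binds = []
fragment = render(BinOp("%", Field("Price"), Const(100)), binds, True)
print(fragment, binds)  # (("Price") % (100)) []
```

With `inline_constants=False` the same tree renders as `(("Price") % (?))` and pushes `100` onto `binds`, which is why that form cannot be pasted into several clauses without bookkeeping.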

Order swap in the _group_by peel: recurse into the source first so q.rootType is set before pred_to_sql runs on the key (the existing field-key path didn't need this since translation was deferred to emission; expression keys do).

Bench: the m1 lane of benchmarks/sql/groupby_select_sum.das is backfilled. It uses the explicit-inner-select shape inside SUM (_._1 |> select($(c : Car) => c.price) |> sum()), while m3f/m4 keep their splice-friendly bare-sum form; both shapes emit equivalent SQL/compute. results.md is refreshed per the living-doc policy (the groupby_select_sum SQL bullet is removed from the missing-lanes notes).

2. PR #2845 review fixes (Copilot, both real)

  • peel_count_terminal error wording — said _count(predicate) and suggested _count(); the actual terminals in _sql chains are bare count() / long_count(). Underscores dropped in both the offending name and the suggested fix.
  • try_peel_distinct_by_field outer-capture silent miscompile — receiver was pinned to ExprVar but the var wasn't checked against the lambda's bound parameter. _distinct_by(capturedRow.Brand) |> count() would silently emit COUNT(DISTINCT "Brand") against the SQL source — wrong result, no diagnostic. Now extracts the lambda's arg name and rejects on mismatch (failed_sql_macro case 24).
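The outer-capture check in the second fix can be illustrated with Python's `ast` module. This is only an analog under stated assumptions: the `distinct_key_column` helper is hypothetical, and Python lambdas stand in for the daslang key lambda. The idea it shows is the one described above: read the lambda's parameter name and reject a key whose receiver is any other variable.

```python
import ast


def distinct_key_column(lambda_src):
    """Return the column name if the key is `<param>.<field>`, else raise."""
    node = ast.parse(lambda_src, mode="eval").body
    if not isinstance(node, ast.Lambda) or len(node.args.args) != 1:
        raise ValueError("expected a one-argument lambda")
    param = node.args.args[0].arg
    body = node.body
    if not (isinstance(body, ast.Attribute) and isinstance(body.value, ast.Name)):
        raise ValueError("key must be a plain field access")
    # The added check: the receiver must be the lambda's own parameter,
    # not a variable captured from the enclosing scope.
    if body.value.id != param:
        raise ValueError(
            f"receiver '{body.value.id}' is not the lambda parameter '{param}'"
        )
    return body.attr


print(distinct_key_column("lambda row: row.Brand"))  # Brand
```

Calling it with `"lambda row: capturedRow.Brand"` raises instead of returning `Brand`, which mirrors turning the silent `COUNT(DISTINCT "Brand")` miscompile into a diagnostic.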

What's in the diff

  • modules/dasSQLITE/daslib/sqlite_linq.das — new q.inlineConstants flag + ExprConst* inlining in pred_to_sql; new push_group_key helper called by collect_group_keys; _group_by peel recurses-first; analyze_grouped_projection / collect_order_keys / GROUP BY emitter all branch on groupByKeyExprs[i] != null to use the cached expression fragment. Copilot fixes folded in.
  • tests/dasSQLITE/test_32_group_by_expression_keys.das (new) — 6 tests covering SQL emission + runtime values for single expression keys, multi-key tuples mixing field + expression keys, ORDER BY on the group key, and aggregate-over-expression-key projection.
  • tests/dasSQLITE/failed_sql_macro.das — cases 24 (outer-capture distinct_by) + 25 (runtime-value expression key) added; expect 50503 bumped 21 → 23.
  • benchmarks/sql/groupby_select_sum.das — m1 lane backfilled; bench comment refreshed (drops the dated TODO, notes the lane-specific shape).
  • benchmarks/sql/results.md — INTERP + JIT tables refreshed; commit-hash header bumped; missing-lanes bullet for groupby_select_sum SQL removed.

Test plan

  • bin/daslang dastest/dastest.das -- --test tests/dasSQLITE/ — 796/796 pass (interpreted)
  • bin/test_aot dastest/dastest.das -- --test tests/dasSQLITE/ — 796/796 pass (AOT)
  • bin/daslang dastest/dastest.das -- --test tests/linq/ — 1390/1390 pass (no shared-machinery regression)
  • INTERP + JIT bench rerun across benchmarks/sql/ — 0 failures, results.md refreshed
  • mcp__daslang__lint + format_file on all touched files — clean
  • Pre-push hook (formatter + lint) — clean

🤖 Generated with Claude Code

Two improvements in this PR:

1. Expression-key support in `_group_by` (Session 2 from the post-PR
   #2843 plan). `_group_by(_.Price % 100)` (and tuples mixing field
   keys with expression keys) now lower to `GROUP BY (("Price") %
   (100))`. The rendered fragment is reused verbatim in SELECT (`K =
   _._0`) and ORDER BY (`_order_by(_._0)`) positions, so the SQL stays
   a single source of truth across all three clauses.

   Mechanism: `collect_group_keys` calls `pred_to_sql` with a new
   `q.inlineConstants` mode so the bound `_.X` field refs render as
   columns and `ExprConst*` literals (ints, floats, strings, bools)
   inline as SQL literals instead of `?` placeholders. The fragment
   carries no binds, which lets us re-use it at multiple SQL positions
   without bind-position bookkeeping. Runtime values (`_.Col -
   capturedVar`) are rejected loudly with a precise diagnostic, the same
   one exercised by `failed_sql_macro` case 25.

   Order swap in the `_group_by` peel: recurse into the source first
   so `q.rootType` is set before `pred_to_sql` runs on the key (the
   existing field-key path didn't need this since translation was
   deferred to emission; expression keys do).

   Backfills `benchmarks/sql/groupby_select_sum.das` m1 lane. The
   bench uses the explicit-inner-select shape inside SUM (`_._1 |>
   select($(c : Car) => c.price) |> sum()`) — m3f/m4 keep their
   splice-friendly bare-`sum` form; both emit equivalent SQL/compute.
   results.md refreshed per the living-doc policy, "Notes on missing
   lanes" bullet for `groupby_select_sum SQL` removed.

2. Two PR #2845 review fixes (Copilot, both real):

   - `peel_count_terminal`'s predicate-overload error mis-named the
     terminal: it said `_count(predicate)` and suggested `_count()`,
     but in `_sql` chains the bare `count()` / `long_count()` linq
     functions are the actual terminals. Drop the underscores in both
     the offending name and the suggested fix.

   - `try_peel_distinct_by_field` pinned receiver to `ExprVar` but
     didn't verify the var IS the lambda's bound parameter. So
     `_distinct_by(capturedRow.Brand) |> count()` would silently emit
     `COUNT(DISTINCT "Brand")` against the SQL source — wrong result,
     no diagnostic. Now extracts the lambda's arg name from
     `keyLambda._block.arguments[0]` and rejects when receiver name
     differs (`failed_sql_macro` case 24).

Test surface:
- `tests/dasSQLITE/test_32_group_by_expression_keys.das` — 6 tests
  covering SQL emission + runtime for single expression keys,
  multi-key tuples mixing field + expression keys, ORDER BY on the
  group key, and aggregate-over-expression-key projection.
- `failed_sql_macro.das` cases 24 + 25 added (50503:23).

Validation: 796/796 dasSQLITE (interp + AOT), 1390/1390 linq, 0 JIT
bench failures, lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 24, 2026 03:23
Contributor

Copilot AI left a comment

Pull request overview

This PR extends sqlite_linq’s _group_by lowering to support computed (expression) keys by rendering the key expression once (with inlined literals) and reusing the same SQL fragment across SELECT, GROUP BY, and ORDER BY. It also incorporates two review-driven correctness fixes around _distinct_by(... ) |> count() and count/long_count terminal error messaging, and updates tests/benchmarks accordingly.

Changes:

  • Add expression-key support for _group_by by caching a bind-free SQL fragment (with constant literals inlined) and reusing it across clauses.
  • Fix _distinct_by translation to reject outer-capture keys that would otherwise silently miscompile, and adjust count/long_count predicate error wording.
  • Add targeted tests for expression keys + update SQL benchmark lane + refresh benchmark results documentation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| modules/dasSQLITE/daslib/sqlite_linq.das | Implements expression-key handling for _group_by (including literal inlining) and folds in _distinct_by/count review fixes. |
| tests/dasSQLITE/test_32_group_by_expression_keys.das | New coverage for SQL emission and runtime correctness of expression-key group-by (single + tuple keys, order-by reuse, aggregates). |
| tests/dasSQLITE/failed_sql_macro.das | Adds new negative cases for outer-capture distinct_by keys and runtime-value expression group keys; updates expected error counts. |
| benchmarks/sql/groupby_select_sum.das | Backfills the SQL lane using the new expression-key support and updates bench commentary accordingly. |
| benchmarks/sql/results.md | Refreshes benchmark tables and removes the previously-missing-lane note for groupby_select_sum SQL. |

@borisbat borisbat merged commit c4312eb into master May 24, 2026
30 checks passed
@borisbat borisbat deleted the bbatkin/sqlite-linq-groupby-expression-keys branch May 30, 2026 15:20