Skip to content

sqlite_linq: add long_count + _distinct_by dispatch arms#2845

Merged
borisbat merged 1 commit into
masterfrom
bbatkin/sqlite-linq-distinct-by-count-and-long-count
May 24, 2026
Merged

sqlite_linq: add long_count + _distinct_by dispatch arms#2845
borisbat merged 1 commit into
masterfrom
bbatkin/sqlite-linq-distinct-by-count-and-long-count

Conversation

@borisbat
Copy link
Copy Markdown
Collaborator

Summary

Session 1 of the post-#2843 follow-up plan. Two new chain shapes now lower in _sql(...):

  • |> long_count()SELECT COUNT(*) with an int64 column binding, symmetric with the existing |> count() (int). Also recognised inside _select after _group_by so (..., N = _._1 |> long_count()) widens the per-group counter to int64.
  • |> _distinct_by(_.col) |> count() (and the long_count variant) → SELECT COUNT(DISTINCT "col"). Receiver-pinned to ExprVar so u.opt.X keys reject loudly instead of silently emitting the wrong COUNT.

A small refactor extracts peel_count_terminal so both terminals share the distinct-by recognition path.

What's in the diff

  • modules/dasSQLITE/daslib/sqlite_linq.das — new helpers try_peel_distinct_by_field + peel_count_terminal; the inline count peel block becomes two calls to the shared helper. try_translate_group_aggregate learns long_count (widens to int64).
  • tests/dasSQLITE/test_31_long_count_distinct_by_canary.das (new) — 8 tests covering SQL-text emission and runtime values for both terminals, plus the int64-grouped projection.
  • tests/dasSQLITE/failed_sql_macro.das — case 23 added: computed _distinct_by key (_.Price % 100) rejects with a precise diagnostic. expect 50503 bumped 20 → 21.
  • benchmarks/sql/distinct_by_count.das — m1 lane backfilled with _sql(... |> _distinct_by(_.brand) |> count()); the dated TODO comment is removed.
  • benchmarks/sql/long_count_aggregate.das — m1 lane now actually calls long_count() instead of count() (latent bench-bug). Accept is now c == 0l to match the int64 return.
  • benchmarks/sql/results.md — refreshed INTERP + JIT tables per the [[feedback-living-results-md]] policy; commit-hash header bumped; the "distinct_by_count SQL" bullet in "Notes on missing lanes" is removed now that the lane has a number.

Headline new cells: distinct_by_count SQL = 40.5 ns/op (INTERP) / 40.9 (JIT); long_count_aggregate SQL = 29.4 (INTERP) / 29.5 (JIT) and now genuinely measures long_count.

Test plan

  • bin/daslang dastest/dastest.das -- --test tests/dasSQLITE/ — 790/790 pass (interpreted)
  • bin/test_aot dastest/dastest.das -- --test tests/dasSQLITE/ — 790/790 pass (AOT)
  • bin/daslang dastest/dastest.das -- --test tests/linq/ — 1390/1390 pass (no shared-machinery regression)
  • INTERP + JIT bench rerun across benchmarks/sql/ — 0 failures, results.md refreshed
  • mcp__daslang__lint on all touched .das files — clean
  • mcp__daslang__format_file — all already-formatted

🤖 Generated with Claude Code

Two new chain shapes lower in `_sql(...)`:

- `|> long_count()` → `SELECT COUNT(*)` with an int64 column binding,
  symmetric with `|> count()` (int). Also recognised in the grouped
  projection arm so `_select((..., N = _._1 |> long_count()))` widens
  the per-group counter to int64.
- `|> _distinct_by(_.col) |> count()` (and the long_count variant) →
  `SELECT COUNT(DISTINCT "col")`. Receiver-pinned to ExprVar so
  `u.opt.X` keys reject loudly instead of silently emitting the wrong
  COUNT.

`peel_count_terminal` factors the shared count/long_count peel so both
share the distinct-by recognition path.

Test surface:
- `tests/dasSQLITE/test_31_long_count_distinct_by_canary.das` — 8 new
  tests covering SQL text emission + runtime values for both terminals,
  plus the int64-grouped projection.
- `failed_sql_macro.das` case 23 — computed `_distinct_by` key (e.g.
  `_.Price % 100`) rejects with a precise diagnostic.

Bench corpus:
- `benchmarks/sql/distinct_by_count.das` — m1 lane backfilled
  (`_sql(... |> _distinct_by(_.brand) |> count())`); the dated TODO
  comment is removed.
- `benchmarks/sql/long_count_aggregate.das` — m1 lane now actually
  calls `long_count()` instead of `count()`; the int64 accept matches
  m3f/m4. Latent bench-bug fixed.
- `benchmarks/sql/results.md` — refreshed INTERP + JIT tables; commit
  header bumped; the `distinct_by_count SQL` bullet in "Notes on
  missing lanes" is removed now that the lane has a number.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 24, 2026 02:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the SQLite LINQ lowering (_sql(...)) to support two additional terminal chain shapes: long_count() (returning int64) and _distinct_by(_.Field) |> count()/long_count() (lowering to COUNT(DISTINCT "col")). It also updates benches and adds canary tests to validate both SQL text emission and runtime behavior.

Changes:

  • Add _sql lowering for long_count() including group aggregate translation (_._1 |> long_count()COUNT(*) typed as int64).
  • Add _distinct_by(_.Field) |> count()/long_count() lowering to COUNT(DISTINCT "Field"), plus a shared peel helper for count/long_count.
  • Add tests and update benchmarks/results to cover the new SQL-lowered lanes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
modules/dasSQLITE/daslib/sqlite_linq.das Adds shared count/long_count terminal peeling and _distinct_byCOUNT(DISTINCT ...) support; adds long_count support in grouped aggregates.
tests/dasSQLITE/test_31_long_count_distinct_by_canary.das New canary tests for long_count and distinct_by+count/long_count (SQL text + runtime values).
tests/dasSQLITE/failed_sql_macro.das Adds a new negative case ensuring computed _distinct_by keys reject with a diagnostic.
benchmarks/sql/distinct_by_count.das Adds/reenables the SQL (m1) lane for distinct_by_count using the new lowering.
benchmarks/sql/long_count_aggregate.das Fixes the SQL lane to actually benchmark long_count() and updates the zero-check to int64.
benchmarks/sql/results.md Refreshes benchmark tables to include the new SQL lane and updated numbers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +620 to +624
if (cCall.arguments |> length != 1) {
macro_error(prog, at, "_sql: `_{linqName}(predicate)` is not yet supported — use `|> _where(predicate) |> _{linqName}()` instead")
q.hadError = true
return true
}
Comment on lines +590 to +605
var body = peel_lambda_single_return(dbCall.arguments[1])
if (body == null) {
macro_error(prog, at, "_sql: _distinct_by key lambda has unexpected shape")
q.hadError = true
return ""
}
var receiver : ExpressionPtr
var fieldName : string
if (!qmatch(body, $e(receiver).$f(fieldName)).matched
|| receiver == null || !(receiver is ExprVar)) {
macro_error(prog, at, "_sql: _distinct_by key must be `<arg>.<field>` (single column reference) — computed keys are not yet translatable to SQL")
q.hadError = true
return ""
}
distinctSource = dbCall.arguments[0]
return fieldName
@borisbat borisbat merged commit 31e4339 into master May 24, 2026
30 checks passed
borisbat added a commit that referenced this pull request May 24, 2026
…-groupby-expression-keys

sqlite_linq: expression keys in _group_by + PR #2845 review fixes
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 24, 2026
… review fixes

Two improvements this PR:

1. Expression-key support in `_group_by` (Session 2 from the post-PR
   GaijinEntertainment#2843 plan). `_group_by(_.Price % 100)` (and tuples mixing field
   keys with expression keys) now lower to `GROUP BY ((price) %
   (100))`. The rendered fragment is reused verbatim in SELECT (`K =
   _._0`) and ORDER BY (`_order_by(_._0)`) positions, so the SQL stays
   a single source of truth across all three clauses.

   Mechanism: `collect_group_keys` calls `pred_to_sql` with a new
   `q.inlineConstants` mode so the bound `_.X` field refs render as
   columns and `ExprConst*` literals (ints, floats, strings, bools)
   inline as SQL literals instead of `?` placeholders. The fragment
   carries no binds, which lets us re-use it at multiple SQL positions
   without bind-position bookkeeping. Runtime values (`_.Col -
   capturedVar`) reject loudly with a precise diagnostic — same one
   exercised by `failed_sql_macro` case 25.

   Order swap in the `_group_by` peel: recurse into the source first
   so `q.rootType` is set before `pred_to_sql` runs on the key (the
   existing field-key path didn't need this since translation was
   deferred to emission; expression keys do).

   Backfills `benchmarks/sql/groupby_select_sum.das` m1 lane. The
   bench uses the explicit-inner-select shape inside SUM (`_._1 |>
   select($(c : Car) => c.price) |> sum()`) — m3f/m4 keep their
   splice-friendly bare-`sum` form; both emit equivalent SQL/compute.
   results.md refreshed per the living-doc policy, "Notes on missing
   lanes" bullet for `groupby_select_sum SQL` removed.

2. Two PR GaijinEntertainment#2845 review fixes (Copilot, both real):

   - `peel_count_terminal`'s predicate-overload error mis-named the
     terminal: it said `_count(predicate)` and suggested `_count()`,
     but in `_sql` chains the bare `count()` / `long_count()` linq
     functions are the actual terminals. Drop the underscores in both
     the offending name and the suggested fix.

   - `try_peel_distinct_by_field` pinned receiver to `ExprVar` but
     didn't verify the var IS the lambda's bound parameter. So
     `_distinct_by(capturedRow.Brand) |> count()` would silently emit
     `COUNT(DISTINCT "Brand")` against the SQL source — wrong result,
     no diagnostic. Now extracts the lambda's arg name from
     `keyLambda._block.arguments[0]` and rejects when receiver name
     differs (`failed_sql_macro` case 24).

Test surface:
- `tests/dasSQLITE/test_32_group_by_expression_keys.das` — 6 tests
  covering SQL emission + runtime for single expression keys,
  multi-key tuples mixing field + expression keys, ORDER BY on the
  group key, and aggregate-over-expression-key projection.
- `failed_sql_macro.das` cases 24 + 25 added (50503:23).

Validation: 796/796 dasSQLITE (interp + AOT), 1390/1390 linq, 0 JIT
bench failures, lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@borisbat borisbat deleted the bbatkin/sqlite-linq-distinct-by-count-and-long-count branch May 30, 2026 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants