sqlite_linq: add long_count + _distinct_by dispatch arms by borisbat · Pull Request #2845 · GaijinEntertainment/daScript

borisbat · 2026-05-24T02:11:12Z

Summary

Session 1 of the post-#2843 follow-up plan. Two new chain shapes now lower in _sql(...):

|> long_count() → SELECT COUNT(*) with an int64 column binding, symmetric with the existing |> count() (int). Also recognised inside _select after _group_by so (..., N = _._1 |> long_count()) widens the per-group counter to int64.
|> _distinct_by(_.col) |> count() (and the long_count variant) → SELECT COUNT(DISTINCT "col"). Receiver-pinned to ExprVar so u.opt.X keys reject loudly instead of silently emitting the wrong COUNT.

A small refactor extracts peel_count_terminal so both terminals share the distinct-by recognition path.

What's in the diff

modules/dasSQLITE/daslib/sqlite_linq.das — new helpers try_peel_distinct_by_field + peel_count_terminal; the inline count peel block becomes two calls to the shared helper. try_translate_group_aggregate learns long_count (widens to int64).
tests/dasSQLITE/test_31_long_count_distinct_by_canary.das (new) — 8 tests covering SQL-text emission and runtime values for both terminals, plus the int64-grouped projection.
tests/dasSQLITE/failed_sql_macro.das — case 23 added: computed _distinct_by key (_.Price % 100) rejects with a precise diagnostic. expect 50503 bumped 20 → 21.
benchmarks/sql/distinct_by_count.das — m1 lane backfilled with _sql(... |> _distinct_by(_.brand) |> count()); the dated TODO comment is removed.
benchmarks/sql/long_count_aggregate.das — m1 lane now actually calls long_count() instead of count() (latent bench-bug). Accept is now c == 0l to match the int64 return.
benchmarks/sql/results.md — refreshed INTERP + JIT tables per the [[feedback-living-results-md]] policy; commit-hash header bumped; the "distinct_by_count SQL" bullet in "Notes on missing lanes" is removed now that the lane has a number.

Headline new cells: distinct_by_count SQL = 40.5 ns/op (INTERP) / 40.9 (JIT); long_count_aggregate SQL = 29.4 (INTERP) / 29.5 (JIT) and now genuinely measures long_count.

Test plan

bin/daslang dastest/dastest.das -- --test tests/dasSQLITE/ — 790/790 pass (interpreted)
bin/test_aot dastest/dastest.das -- --test tests/dasSQLITE/ — 790/790 pass (AOT)
bin/daslang dastest/dastest.das -- --test tests/linq/ — 1390/1390 pass (no shared-machinery regression)
INTERP + JIT bench rerun across benchmarks/sql/ — 0 failures, results.md refreshed
mcp__daslang__lint on all touched .das files — clean
mcp__daslang__format_file — all already-formatted

🤖 Generated with Claude Code

Two new chain shapes lower in `_sql(...)`: - `|> long_count()` → `SELECT COUNT(*)` with an int64 column binding, symmetric with `|> count()` (int). Also recognised in the grouped projection arm so `_select((..., N = _._1 |> long_count()))` widens the per-group counter to int64. - `|> _distinct_by(_.col) |> count()` (and the long_count variant) → `SELECT COUNT(DISTINCT "col")`. Receiver-pinned to ExprVar so `u.opt.X` keys reject loudly instead of silently emitting the wrong COUNT. `peel_count_terminal` factors the shared count/long_count peel so both share the distinct-by recognition path. Test surface: - `tests/dasSQLITE/test_31_long_count_distinct_by_canary.das` — 8 new tests covering SQL text emission + runtime values for both terminals, plus the int64-grouped projection. - `failed_sql_macro.das` case 23 — computed `_distinct_by` key (e.g. `_.Price % 100`) rejects with a precise diagnostic. Bench corpus: - `benchmarks/sql/distinct_by_count.das` — m1 lane backfilled (`_sql(... |> _distinct_by(_.brand) |> count())`); the dated TODO comment is removed. - `benchmarks/sql/long_count_aggregate.das` — m1 lane now actually calls `long_count()` instead of `count()`; the int64 accept matches m3f/m4. Latent bench-bug fixed. - `benchmarks/sql/results.md` — refreshed INTERP + JIT tables; commit header bumped; the `distinct_by_count SQL` bullet in "Notes on missing lanes" is removed now that the lane has a number. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR extends the SQLite LINQ lowering (_sql(...)) to support two additional terminal chain shapes: long_count() (returning int64) and _distinct_by(_.Field) |> count()/long_count() (lowering to COUNT(DISTINCT "col")). It also updates benches and adds canary tests to validate both SQL text emission and runtime behavior.

Changes:

Add _sql lowering for long_count() including group aggregate translation (_._1 |> long_count() → COUNT(*) typed as int64).
Add _distinct_by(_.Field) |> count()/long_count() lowering to COUNT(DISTINCT "Field"), plus a shared peel helper for count/long_count.
Add tests and update benchmarks/results to cover the new SQL-lowered lanes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
modules/dasSQLITE/daslib/sqlite_linq.das	Adds shared `count`/`long_count` terminal peeling and `_distinct_by`→`COUNT(DISTINCT ...)` support; adds `long_count` support in grouped aggregates.
tests/dasSQLITE/test_31_long_count_distinct_by_canary.das	New canary tests for long_count and distinct_by+count/long_count (SQL text + runtime values).
tests/dasSQLITE/failed_sql_macro.das	Adds a new negative case ensuring computed `_distinct_by` keys reject with a diagnostic.
benchmarks/sql/distinct_by_count.das	Adds/reenables the SQL (m1) lane for distinct_by_count using the new lowering.
benchmarks/sql/long_count_aggregate.das	Fixes the SQL lane to actually benchmark `long_count()` and updates the zero-check to int64.
benchmarks/sql/results.md	Refreshes benchmark tables to include the new SQL lane and updated numbers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    if (cCall.arguments |> length != 1) {
+        macro_error(prog, at, "_sql: `_{linqName}(predicate)` is not yet supported — use `|> _where(predicate) |> _{linqName}()` instead")
+        q.hadError = true
+        return true
+    }


+    var body = peel_lambda_single_return(dbCall.arguments[1])
+    if (body == null) {
+        macro_error(prog, at, "_sql: _distinct_by key lambda has unexpected shape")
+        q.hadError = true
+        return ""
+    }
+    var receiver : ExpressionPtr
+    var fieldName : string
+    if (!qmatch(body, $e(receiver).$f(fieldName)).matched
+            || receiver == null || !(receiver is ExprVar)) {
+        macro_error(prog, at, "_sql: _distinct_by key must be `<arg>.<field>` (single column reference) — computed keys are not yet translatable to SQL")
+        q.hadError = true
+        return ""
+    }
+    distinctSource = dbCall.arguments[0]
+    return fieldName


…-groupby-expression-keys sqlite_linq: expression keys in _group_by + PR #2845 review fixes

… review fixes Two improvements this PR: 1. Expression-key support in `_group_by` (Session 2 from the post-PR GaijinEntertainment#2843 plan). `_group_by(_.Price % 100)` (and tuples mixing field keys with expression keys) now lower to `GROUP BY ((price) % (100))`. The rendered fragment is reused verbatim in SELECT (`K = _._0`) and ORDER BY (`_order_by(_._0)`) positions, so the SQL stays a single source of truth across all three clauses. Mechanism: `collect_group_keys` calls `pred_to_sql` with a new `q.inlineConstants` mode so the bound `_.X` field refs render as columns and `ExprConst*` literals (ints, floats, strings, bools) inline as SQL literals instead of `?` placeholders. The fragment carries no binds, which lets us re-use it at multiple SQL positions without bind-position bookkeeping. Runtime values (`_.Col - capturedVar`) reject loudly with a precise diagnostic — same one exercised by `failed_sql_macro` case 25. Order swap in the `_group_by` peel: recurse into the source first so `q.rootType` is set before `pred_to_sql` runs on the key (the existing field-key path didn't need this since translation was deferred to emission; expression keys do). Backfills `benchmarks/sql/groupby_select_sum.das` m1 lane. The bench uses the explicit-inner-select shape inside SUM (`_._1 |> select($(c : Car) => c.price) |> sum()`) — m3f/m4 keep their splice-friendly bare-`sum` form; both emit equivalent SQL/compute. results.md refreshed per the living-doc policy, "Notes on missing lanes" bullet for `groupby_select_sum SQL` removed. 2. Two PR GaijinEntertainment#2845 review fixes (Copilot, both real): - `peel_count_terminal`'s predicate-overload error mis-named the terminal: it said `_count(predicate)` and suggested `_count()`, but in `_sql` chains the bare `count()` / `long_count()` linq functions are the actual terminals. Drop the underscores in both the offending name and the suggested fix. - `try_peel_distinct_by_field` pinned receiver to `ExprVar` but didn't verify the var IS the lambda's bound parameter. So `_distinct_by(capturedRow.Brand) |> count()` would silently emit `COUNT(DISTINCT "Brand")` against the SQL source — wrong result, no diagnostic. Now extracts the lambda's arg name from `keyLambda._block.arguments[0]` and rejects when receiver name differs (`failed_sql_macro` case 24). Test surface: - `tests/dasSQLITE/test_32_group_by_expression_keys.das` — 6 tests covering SQL emission + runtime for single expression keys, multi-key tuples mixing field + expression keys, ORDER BY on the group key, and aggregate-over-expression-key projection. - `failed_sql_macro.das` cases 24 + 25 added (50503:23). Validation: 796/796 dasSQLITE (interp + AOT), 1390/1390 linq, 0 JIT bench failures, lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 24, 2026 02:11

Copilot started reviewing on behalf of borisbat May 24, 2026 02:11 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

borisbat merged commit 31e4339 into master May 24, 2026
30 checks passed

borisbat mentioned this pull request May 24, 2026

sqlite_linq: expression keys in _group_by + PR #2845 review fixes #2847

Merged

6 tasks

borisbat added a commit that referenced this pull request May 24, 2026

Merge pull request #2847 from GaijinEntertainment/bbatkin/sqlite-linq…

c4312eb

…-groupby-expression-keys sqlite_linq: expression keys in _group_by + PR #2845 review fixes

borisbat deleted the bbatkin/sqlite-linq-distinct-by-count-and-long-count branch May 30, 2026 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlite_linq: add long_count + _distinct_by dispatch arms#2845

sqlite_linq: add long_count + _distinct_by dispatch arms#2845
borisbat merged 1 commit into
masterfrom
bbatkin/sqlite-linq-distinct-by-count-and-long-count

borisbat commented May 24, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

borisbat commented May 24, 2026

Summary

What's in the diff

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants