
bench: refresh full query + ingest suite (10k–10M) against EQL 2.3 #14

Merged: coderdan merged 20 commits into main from feat/bench-refresh-oc-op-consolidation on May 20, 2026.

coderdan commented May 18, 2026

Bench-side counterpart to EQL 2.3 (consolidation began in encrypt-query-language#219; shipped as eql-2.3.0) — ORE-only ste_vec consolidation, strict eql_v2.compare, functional-index recipes, and the typed -> / stevec_query surface.

Summary

Full bench suite re-run against the EQL 2.3 line across four row-count tiers — 10k / 100k / 1M / 10M — with fresh data ingested under the 2.3 client. Equality, containment, free-text, Block-ORE range, JSON field access, GROUP BY and combo paths all hold up with no regressions. One rough edge — selective ORE range queries — is an EQL planner-stats limitation tracked as #230.

Headline numbers

Query-only medians, no decrypt. Full per-scenario tables — SQL, planner index choices, EXPLAIN trees — live in report/ (ore.md, json.md, exact.md, match.md, group_by.md, combo.md).

| Family | Scenario | 10k | 100k | 1M | 10M |
|---|---|---|---|---|---|
| JSON | contains/functional | 0.66 ms | 0.65 ms | 0.68 ms | 6.8 ms |
| JSON | field_eq/functional | 0.98 ms | 0.98 ms | 0.90 ms | 0.92 ms |
| JSON | field_order/functional | 0.74 ms | 0.77 ms | 0.77 ms | 0.84 ms |
| ORE | range_gt_100 | 4.1 ms | 6.7 ms | 6.9 ms | 8.1 ms |
| ORE | range_lt_hybrid_ordered_10 | n/a | 1.1 ms | 1.2 ms | 1.2 ms |
| EXACT | eql_hash | 0.43 ms | 0.44 ms | 0.43 ms | 0.46 ms |
| MATCH | eql_bloom | 1.0 ms | 2.5 ms | 18 ms | 216 ms |
| GROUP_BY | low_cardinality (encrypted) | 2.7 ms | 28 ms | 179 ms | 1.47 s |
| GROUP_BY | low_cardinality (plaintext baseline) | 1.5 ms | 9.9 ms | 36 ms | 430 ms |
| COMBO | top_n_filtered_group_by | 0.84 ms | 1.1 ms | 5.5 ms | 43 ms |

(range_lt_hybrid_ordered_10 has no 10k entry — the 10k ORE result set predates the scenario.)

Disabled — selective ORE range scenarios

range_selective_gt_100, range_highly_selective_gt_10, range_selective_gt_count and range_highly_selective_gt_count are commented out in benches/ore.rs. At the 10M tier they degrade into 2–3 s sequential scans: the planner cannot estimate the selectivity of an encrypted ORE comparison with a bound-parameter operand, falls back to DEFAULT_INEQ_SEL (33%), and picks a Seq Scan over the functional index. This is an EQL planner-stats limitation — not a bench or index fault — tracked as encrypt-query-language#230. Their last-recorded numbers remain in report/ore.md; the scenarios restore by un-commenting once #230 lands a selectivity fix.
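To make the misestimate concrete, here is a rough arithmetic sketch of what the 33% fallback implies at the 10M tier. The `true_matches` figure is a made-up illustration, not a measured result, and the helper name is hypothetical; only the constant mirrors PostgreSQL's `DEFAULT_INEQ_SEL`.

```python
# Sketch only: shows the size of the row-count misestimate, not the planner's cost model.
DEFAULT_INEQ_SEL = 0.3333333333333333  # PostgreSQL's fallback selectivity for inequalities


def estimated_rows(n_rows: int, selectivity: float = DEFAULT_INEQ_SEL) -> int:
    """Rows the planner expects to match when it has no usable statistics."""
    return round(n_rows * selectivity)


n_rows = 10_000_000   # the 10M tier
true_matches = 100    # hypothetical: what a selective predicate might really return

est = estimated_rows(n_rows)
overestimate = est / true_matches
print(f"estimated={est:,} actual={true_matches:,} overestimate={overestimate:,.0f}x")
```

With an estimate in the millions of rows, a sequential scan looks cheaper to the planner than walking the functional index, which is consistent with the plans described above.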

What changed on the bench side

  • EQL 2.3. benches/json.rs updated for 2.3 recipes (oc → eql_v2.ore_cllw, ste_vec mode: Standard). setup-db installs eql-2.3.0-pre.4 by default, with an EQL_SQL env override for a local EQL build. Follow-up: bump the default to the final eql-2.3.0 release.
  • cipherstash-client bumped to the EQL-2.3-aligned pre-release; encrypt_eql now returns Vec<EqlOutput> (Store / Query).
  • benches/json.rs field indexes. field_eq/* and field_order/* build per-selector functional indexes at startup — hash (eql_v2.eq_term(col -> sel)) and btree (eql_v2.ore_cllw(col -> sel)). This closes the old field_order gap: formerly a ~20 s seq scan at 1M, now sub-ms at every tier (see headline table).
  • benches/ore.rs. Selective scenarios disabled (#230, above); the natural-form ordered scenario (range_lt_natural_ordered_10) carried in from main.
  • 10M tier run across all six query families.
  • Tooling. report split into report (terminal overview) + report:build (file); bench:query:all gained a --max-rows cap.

What this confirms about EQL 2.3

  • Equality, containment and JSON field access stay flat through 1M and scale cleanly. eql_hash, contains/functional, field_eq/functional and field_order/functional are all sub-ms through 1M; only contains rises at 10M (6.8 ms), still well within budget — consistent with the 2.3 functional indexes engaging without query rewriting.
  • field_order gap closed. The 2.3 ore_cllw btree opclass plus the per-selector functional index turn the formerly-pathological JSON ordered-field scenario into a sub-ms query at every tier.
  • GROUP BY under encryption costs ~3–5× the plaintext baseline at the larger tiers (1.47 s vs 430 ms at 10M, about 3.4×; 179 ms vs 36 ms at 1M, about 5×). This is the expected hmac-term overhead, with no pathological blow-up.
  • Non-selective ORE range + LIMIT correctly prefers Seq Scan. At ~50% selectivity the planner picks Seq Scan + LIMIT over the index by design — cheaper when matches are dense. See the per-scenario EXPLAIN notes in report/ore.md.
  • Selective ORE range is the one rough edge — a planner-stats limitation, not an EQL correctness issue. See #230.
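The GROUP BY overhead quoted in the list above can be recomputed from the headline table. A small sketch, using only the medians from that table (the dictionary and function names are illustrative):

```python
# Medians in milliseconds, copied from the headline table (GROUP_BY low_cardinality).
encrypted_ms = {"10k": 2.7, "100k": 28.0, "1M": 179.0, "10M": 1470.0}
plaintext_ms = {"10k": 1.5, "100k": 9.9, "1M": 36.0, "10M": 430.0}


def overhead_ratios(encrypted: dict, baseline: dict) -> dict:
    """Encrypted median divided by plaintext median, per row-count tier."""
    return {tier: encrypted[tier] / baseline[tier] for tier in encrypted}


ratios = overhead_ratios(encrypted_ms, plaintext_ms)
for tier, ratio in ratios.items():
    print(f"{tier:>4}: {ratio:.1f}x")
```

The 1M and 10M tiers land at roughly 5× and 3.4×; the 10k and 100k tiers sit below that band.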

Operational notes

  • All four tiers re-ingested under the EQL 2.3 client; the previous stale-keyset blocker (which 403'd the *_decrypt scenarios on ZeroKMS) is resolved.
  • The bench installs the Supabase EQL variant (no operator classes), consistent with U-001's functional-index recommendation. ORDER BY scenarios use the extractor form (ORDER BY eql_v2.ore_block_u64_8_256(col)).

Test plan

  • All six query families + ingest run end-to-end at 10k / 100k / 1M / 10M.
  • mise run report:build regenerates the report, per-family pages and charts without error.
  • No regression in the equality / containment / Block-ORE / field-access paths.
  • cargo check --bench ore / --bench json clean after the EQL 2.3 + #230 changes.
  • CI green.

coderdan added a commit to cipherstash/encrypt-query-language that referenced this pull request May 19, 2026
Restores functional-index match for sv-element ordered queries after the
consolidation in #219 left the type without an opclass. Closes #220.

## What this adds

`src/ore_cllw/operators.sql` — same-type comparison operators (`<`, `<=`,
`=`, `>=`, `>`, `<>`) on `eql_v2.ore_cllw`. Each operator is backed by a
single-statement `LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE` wrapper
that reduces to `eql_v2.compare_ore_cllw_term(a, b) <op> 0`. Wrappers
inline so the planner can fold them into the calling query — that's what
lets the index match.

`src/ore_cllw/operator_class.sql` — `eql_v2.ore_cllw_ops` btree opclass
registered `DEFAULT FOR TYPE eql_v2.ore_cllw`. FUNCTION 1 is
`eql_v2.compare_ore_cllw_term` directly (plpgsql per-byte protocol; called
only by btree internals, not per-row from the calling query). Excluded
from the Supabase build variant via the existing `**/*operator_class.sql`
glob in `tasks/build.sh` (operator classes require superuser).

`tasks/pin_search_path.sql` — allowlists the six operator backing
functions (`ore_cllw_eq` / `_neq` / `_lt` / `_lte` / `_gt` / `_gte`).
Pinning would break the inlining chain and prevent the planner from
structurally matching predicates to functional indexes.

## Design choices

- **No `HASHES` / `MERGES` flags** on the operator declarations. HASHES
  needs a registered hash function on the type (no, and we don't want
  one — the CLLW protocol is for ordering, not hashing). MERGES needs an
  equivalent operator family on both sides, which we'd register
  separately if/when needed. This is the gap that disabled the
  pre-2025-06-24 opclass; see issue #220's history.
- **Equality via `compare_ore_cllw_term = 0`**, not a `bytea_eq`
  shortcut. Consistent with the rest of the CLLW path; one source of
  truth for equality semantics; resilient to any future change in the
  underlying ciphertext encoding.
- **The opclass operators are different from the operators on
  `eql_v2_encrypted`.** Those (per #211) inline to `ore_block_u64_8_256`
  and raise on non-Block-ORE columns. The new operators here are on the
  `eql_v2.ore_cllw` composite type itself — what callers reach through
  the extractor form `WHERE eql_v2.ore_cllw(col) <op> eql_v2.ore_cllw($1)`.
  No conflict, different scope.

## Tests

`tests/sqlx/tests/ore_cllw_opclass_tests.rs` covers:

- Operator wiring: `=`, `<>`, `<`, `<=`, `>`, `>=` on hand-crafted
  byte strings under the CLLW per-byte protocol.
- Cross-domain ordering via the leading tag byte (`0x00` numeric, `0x01`
  string) — numeric < string within the same column.
- Opclass registration: `pg_opclass.opcdefault = true` for
  `eql_v2.ore_cllw_ops`.
- Functional-index match: build a functional btree on
  `eql_v2.ore_cllw(value)`, confirm `EXPLAIN` for
  `ORDER BY eql_v2.ore_cllw(value) LIMIT n` shows `Index Scan` (or
  `Index Only Scan`) and no `Sort` node.
- Inlinability lint: read `pg_proc` directly, assert each backing
  function is `LANGUAGE sql`, `IMMUTABLE`, `STRICT`, `PARALLEL SAFE`,
  and not pinned with `SET search_path`.

## Bench impact

From the bench results in cipherstash/benches#14 (post-#219 baseline):

  json/field_order/functional @ 1M = 20.0 s  (no opclass; seq scan + Sort)

With this opclass, `EXPLAIN` flips to `Index Scan + Limit` and the same
query should land in single-digit ms on the bench rig. End-to-end bench
re-run is the next step on the bench-side branch.

Docs: CHANGELOG `Added` entry, U-005 action-required note refreshed,
database-indexes.md ORE-CLLW recipe entry refreshed.
coderdan added 18 commits May 21, 2026 00:02
Bench-side counterpart to encrypt-query-language#219 (ORE-only ste_vec
consolidation + strict eql_v2.compare contract).

- `Cargo.toml`: cipherstash-client pin moves from
  `dan/zerokms-unexpected-error-context` to `main`. Main now carries the
  ste_vec consolidation (suite#1955 + the post-#1955 `ocf`/`ocv` → `oc`
  collapse). Note: this branch predates the RequestFailed Display fix
  (#1960), so encrypt-side ZeroKMS errors surface as "Unexpected error"
  again — the bench's retry loop in `prepare:_table` is the safety net.
- API renames across encrypt binaries + `src/lib.rs`:
  - `ColumnType::Utf8Str` → `ColumnType::Text`
  - `ColumnType::JsonB` → `ColumnType::Json`
  - `Plaintext::Utf8Str` → `Plaintext::Text`
  - `Plaintext::JsonB` → `Plaintext::Json`
- `src/bin/encrypt_ste_vec_small.rs` + `_large.rs`: set
  `IndexType::SteVec { mode: SteVecMode::Standard, .. }` so sv elements
  emit `oc` (CLLW ORE) for orderable terms. EQL 2.3's `eql_v2_encrypted`
  type only handles ORE; OPE (Compat mode) is moving to a separate
  encrypted column type in a future release.
- `benches/json.rs`:
  - `ore_extractor_for` map updated: `oc` → `eql_v2.ore_cllw`, `op` →
    `eql_v2.ope_cllw` (placeholder; OPE bench path is exercised via the
    Compat-mode encrypt binary, which isn't the current default).
  - Picker rewritten to sample **two** selectors independently: one
    hm-bearing for field_eq scenarios, one orderable-bearing for
    field_order scenarios. Post-#1955 these are typically disjoint
    (the array-prefix selector lookup element carries `hm`; the value
    elements carry `oc`). The old single-needle picker skipped
    field_eq/* when it picked an `oc`-only element.
- `benches/ore.rs`: re-added `range_lt_natural_ordered_10` scenario
  (carried forward from feat/ore-natural-form-bench, PR #13).
- `report_benchmarks.py`: new entries for `range_lt_natural_ordered_10`
  and the json/field_eq scenario.
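The two-selector picker described for `benches/json.rs` can be sketched roughly as follows. This is a Python illustration, not the Rust bench code: the `s` selector key, the sample values, and taking the first match rather than sampling are all assumptions. Only the `hm` and `oc` term names come from the description above.

```python
from typing import Optional


def pick_selectors(sv: list[dict]) -> tuple[Optional[str], Optional[str]]:
    """Pick one hm-bearing selector (field_eq) and one oc-bearing selector (field_order).

    The two lookups are independent, so an sv array whose elements carry
    only one term type each still yields both selectors when both kinds exist.
    """
    eq_selector = next((e["s"] for e in sv if "hm" in e), None)
    order_selector = next((e["s"] for e in sv if "oc" in e), None)
    return eq_selector, order_selector


# Hypothetical sample row: the prefix element carries hm, the value element carries oc.
sample_sv = [
    {"s": "sel_prefix", "hm": "..."},
    {"s": "sel_value", "oc": "..."},
]

print(pick_selectors(sample_sv))
```

A single-needle picker that happened to land on the `oc`-only element would have nothing to drive field_eq with, which is the skip the rewrite avoids.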

Six bench families at 100k + 1M (78 scenarios total) under fresh data
ingested with the consolidation client. See `report/BENCHMARK_REPORT.md`
for the full table; key signals:

- **JSON contains/functional via GIN ste_vec**: 3.88 ms @ 100k → 4.32 ms
  @ 1M (flat — GIN sub-linear). Index engages cleanly.
- **JSON field_eq/* (inlined hmac path)**: 611-1250 µs @ 100k, 634 µs-1.1
  ms @ 1M. Sub-millisecond at 1M — confirms post-#205 hmac inlining +
  hmac_256_terms GIN. Planner picks seq scan over GIN at 100k (cost-model
  edge case); both paths sub-ms.
- **JSON field_order/* (extractor-driven sort)**: 7.85 s @ 100k → 28.9 s
  @ 1M, scales linearly. No functional-index match because no opclass on
  `eql_v2.ore_cllw` — tracked as encrypt-query-language#220 (restore
  CLLW ORE opclass).
- **ORE block range queries**: 1-2 ms @ 100k, sub-ms @ 1M (selective).
  Functional `eql_v2.ore_block_u64_8_256` index engages.
- **EXACT (`=` via hmac on root scalars)**: sub-ms @ both tiers.
  Functional hash index engages.
- **MATCH (LIKE via bloom_filter GIN)**: ms-range at both tiers.
- **GROUP_BY low-cardinality**: HashAggregate via inlined hmac_256;
  matches the EQL query-perf guide §5 recipe.

Re-ingest of integer/string/category/combo at 100k+1M was required
because the previous data was encrypted under a stale keyset that the
current workspace can no longer decrypt — without this, the
`_decrypt/*` scenarios would 403 on ZeroKMS. JSON ste_vec data was
re-ingested at 100k+1M with `SteVecMode::Standard`.

EQL installed: the **supabase variant** (no operator classes) — required
for ANALYZE to succeed on hm-only tables under the strict-compare
contract from #219. Aligns with U-001's functional-index recommendation.

Not covered in this commit:

- 10M tier (existing data still present but stale-keyset; would need
  several hours of re-ingest).
- Compat-mode bench scenarios exercising `eql_v2.ope_cllw` — OPE moves
  to a separate encrypted type in a future release; no EQL recipe to
  bench against today.
- field_order with functional ORE index match — blocked on
  encrypt-query-language#220 (operator class on `eql_v2.ore_cllw`).

Two issues in the post-#1955 bench report were causing scenario rows
to render as "Unknown query":

1. The bench ID parser used `parts[2]` as scenario_name. That works for
   4-component IDs like `EXACT/exact/eql_cast/100000` but loses the
   variant for the JSON bench's 5-component IDs
   (`JSON/json/contains/functional/10000`). Switched to
   `"/".join(parts[2:-1])` so multi-part scenario names stay distinct;
   tightened the length check from `>= 3` to `>= 4` since the row count
   is always at the end. Same change applied to both the rows-result
   parser and the metadata sidecar parser.

2. `sql_map["JSON"]` and `descriptions["JSON"]` were populated for the
   old json bench's scenario set (`field_eq`, `field_extract`,
   `field_group_by`). The current bench emits six scenarios across
   three families: `contains/functional`, `field_eq/{bare,extractor,
   functional}`, `field_order/{bare,functional}`. Rewrote both maps to
   match — each entry now describes its index recipe, expected plan
   shape, and where the bare/extractor/functional variants diverge.

3. Chart filenames now sanitise `/` to `_` so multi-part scenario IDs
   produce valid filesystem paths (e.g.
   `query_json_field_eq_bare_chart.png`).
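
The parser and filename changes above can be sketched roughly as follows. This is a minimal illustration, not the report generator's actual code: the function names `parse_bench_id` and `chart_filename` are hypothetical, and the digit check on the trailing row count is an added assumption.

```python
from typing import Optional, Tuple


def parse_bench_id(bench_id: str) -> Optional[Tuple[str, str, str, int]]:
    """Split a bench ID into (family, group, scenario, row_count).

    Hypothetical helper: everything between the group and the trailing
    row count is treated as the scenario name, so 5-component JSON IDs
    keep their variant.
    """
    parts = bench_id.split("/")
    # The row count is always last, so at least 4 components are needed.
    if len(parts) < 4:
        return None
    # Assumption: the trailing component is a plain integer row count.
    if not parts[-1].isdigit():
        return None
    scenario = "/".join(parts[2:-1])
    return parts[0], parts[1], scenario, int(parts[-1])


def chart_filename(family: str, scenario: str) -> str:
    """Build a chart filename, replacing '/' so the path stays valid."""
    safe = scenario.replace("/", "_")
    return f"query_{family.lower()}_{safe}_chart.png"
```

With this shape, `EXACT/exact/eql_cast/100000` still yields `eql_cast`, while `JSON/json/contains/functional/10000` yields `contains/functional` instead of collapsing to `contains`.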

Plus a new `report:slow` mise task that prints scenarios whose median
runtime exceeds a threshold (default 100ms, override with the first
arg). The CLI shape:

    $ mise run report:slow            # 100ms default
    $ mise run report:slow 250        # 250ms threshold

Output is sorted descending so the worst offenders surface first.
Useful for triaging which scenarios degraded (or never recovered) after
EQL changes. The 100ms list at HEAD surfaces the two unresolved JSON
issues we're tracking:

- `JSON/json/field_order/functional/100000` at 5.5s — functional ORE
  index missing on the 100k table (#221's opclass available at schema
  level but no per-table index built; needs `prepare:_table` re-run).
- All `field_eq/*/10000` rows missing entirely — the 10k JSON ste_vec
  table is stale pre-#1955; sv elements lack `hm` so the bench picker
  skips field_eq. Re-ingest needed.
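
The threshold filter behind `report:slow` can be sketched as below. This is an assumption-laden outline rather than the contents of `find_slow_queries.py`: the function name, the dict-of-medians input and the sample bench IDs are all illustrative.

```python
from typing import Dict, List, Tuple


def slow_scenarios(
    medians_ms: Dict[str, float], threshold_ms: float = 100.0
) -> List[Tuple[str, float]]:
    """Return (scenario, median_ms) pairs above the threshold, slowest first."""
    slow = [(name, ms) for name, ms in medians_ms.items() if ms > threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Illustrative input only; the IDs follow the family/group/scenario/rows shape.
sample = {
    "JSON/json/field_order/functional/100000": 5500.0,
    "JSON/json/contains/functional/100000": 0.65,
    "MATCH/match/eql_bloom/10000000": 216.0,
}

for name, ms in slow_scenarios(sample):
    print(f"{ms:10.1f} ms  {name}")
```

Passing a second argument mirrors the `mise run report:slow 250` form: only scenarios above 250 ms would be listed.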

Regenerates report/BENCHMARK_REPORT.md, report/json.md and the six
per-scenario JSON charts as a sanity-check of the fixes.
Two underlying fixes that unblocked the missing/slow scenarios:

- 10k tier: re-ingested `json_ste_vec_small_encrypted_10000` under the
  current cipherstash-suite (post-#1955). Pre-2.3 the table's sv
  elements lacked `hm` natively, so the bench's needle picker walked
  the sample row's sv array, found no hm-bearing element, and silently
  skipped all `field_eq/*` scenarios. With fresh data, sv[0] now
  carries `hm` and the three `field_eq` variants execute (each ~600µs).

- Per-table functional ORE index: created
  `<table>_oc_9a2d817b8ec7abe623a1fcb` on the 10k and 100k tables.
  EQL #221's `eql_v2.ore_cllw_ops` opclass was available at the
  schema level but the per-table `prepare:_table` script's static
  `up.sql` doesn't create a functional index on the orderable selector
  (it only emits the GIN indexes for ste_vec + hmac_256_terms). Now
  `field_order/functional/{10000,100000}` engage Index Scan at
  ~1.3ms instead of Seq Scan + Top-N at 5+ seconds.

Refreshes the regenerated charts + report/json.md so the report shows
the post-fix numbers. The 1M tier already had the index and matching
results from an earlier run; only the metadata.json shape changed.

`mise run report:slow` (added in the previous commit) now lists only
the genuinely-slow scenarios: `field_order/bare/*` (which can't engage
the index by design; it keeps showing the cost of the recipe gap on
non-inlinable `eql_v2."->"`) and unrelated ORE selective queries.
…ical

Same precedent as the ORE bench dropping its natural-form ORDER BY
(refresh-eql-211 series): the bare-form `ORDER BY value -> '<sel>'`
can't engage the functional ORE-CLLW index because `eql_v2."->"` is
plpgsql, so the planner has no path from the sort key back to the
indexed expression. The resulting Seq Scan + Top-N sort is linear in
table size (5+s at 100k, 21s at 1M in the previous results), which
made the scenario look like an EQL performance bug when really it's
just the wrong recipe.

The extractor form
  ORDER BY <ore_extractor>(value -> '<sel>'::text) LIMIT 10
remains as `field_order/functional` — same per-row plan as the bare
form's intent but actually engages the index. Documented in §4.1 of
the EQL query performance guide (added in
docs/reference/query-performance.md on the dan/query-performance-guide
branch).

Result file refresh: ran the bench at 10k / 100k / 1M after the
deletion. Per-tier scenario count drops from 6 → 5. The stale
`query_json_field_order_bare_chart.png` is removed; no other charts
change. `mise run report:slow` at HEAD no longer surfaces any JSON
scenarios — only the pre-existing ORE selective range queries remain
on the >100ms list.
The bench DB had EQL from before PR #211 (range-operator inlining +
re-enabling the Block-ORE opclass) and PR #221 (CLLW opclass). All
existing functional indexes were bound to `pg_catalog.record_ops` at
creation time — Postgres's silent fallback when a type lacks a custom
btree opclass. Even though EQL was reinstalled, existing indexes
retain their original opclass binding (REINDEX doesn't update it).

Operation on the bench DB:

1. Reinstalled EQL from `feat/ore-cllw-opclass` (which has #219 + #221
   stacked on top of #211). Both `ore_block_u64_8_256_operator_class`
   and `ore_cllw_ops` now exist.
2. Uninstall CASCADE dropped all functional indexes; recreated them
   via each table's `sql/indexes/<table>_up.sql`.
3. Dropped the deprecated `eql_v2.encrypted_operator_class` /
   `encrypted_operator_family` (U-001-deprecated; not used by any
   functional index, and post-#219 its FUNCTION 1 raises on ste_vec
   columns during ANALYZE because the strict `eql_v2.compare` requires
   `ob` at root).
4. Created per-tier functional CLLW ORE indexes on the JSON tables.
5. Re-ran `bench:query:ore` at 100k / 1M and `bench:query:json` at
   10k / 100k / 1M. 10k ORE skipped — stale keyset on the
   integer_encrypted_10000 table (decrypt failed); 10M ORE skipped to
   limit run time.

Bench code change: `field_order/functional` query updated to use the
`(value -> 'sel').data` jsonb form (was `value -> 'sel'`). The
(eql_v2_encrypted) overload of `eql_v2.ore_cllw` was removed in #219;
the (eql_v2.ste_vec_entry) overload has a DOMAIN check requiring
`s` + `c` + `hm` that is stricter than what cipherstash-client
currently emits on orderable-only sv elements. Using the (jsonb)
overload sidesteps the DOMAIN gate while still engaging the
functional CLLW index.

Results impact (`mise run report:slow`):

Before:
- ORE/range_selective_gt_count/1M: 8.4s
- ORE/range_highly_selective_gt_count/1M: 8.2s
- ORE/range_highly_selective_gt_10/1M: 3.3s
- ORE/range_selective_gt_100/1M: 0.65s

After:
- All 1M ORE selective queries: no longer on the >100ms list.
- 100k / 1M JSON field_order/functional: ~1.2ms (consistent across
  tiers — index walks the btree in order, no Sort).
- Only 10M scenarios remain on the slow list (not re-run here).

TODO follow-ups (will track separately):
- ORE 10k re-ingest (stale keyset on integer_encrypted_10000)
- 10M tier re-bench
- EQL #219 DOMAIN check is too strict for current cipherstash-client
  output (orderable sv elements lack `hm`)
PR cipherstash/eql#223 changes the StEVec query surface:

- `->` returns `eql_v2.ste_vec_entry` (was `eql_v2_encrypted` with
  a synthetic root). RHS literals for `field_eq/bare` cast to
  `::eql_v2.ste_vec_entry`, not `::eql_v2_encrypted`. The bare-form
  `=` operator on entries inlines to `eq_term(a) = eq_term(b)`.
- The fused `eql_v2.hmac_256(eql_v2_encrypted, text)` was removed.
  `field_eq/functional` shifts to the chained
  `eql_v2.eq_term(value -> '<sel>'::text)` recipe — XOR-aware
  (covers hm-bearing and oc-bearing selectors with one expression).
  Right-hand side casts via `::eql_v2.ste_vec_entry` then runs the
  same extractor.
- `field_order/functional` simplifies: `->` now returns
  `ste_vec_entry` directly, so `eql_v2.ore_cllw(value -> '<sel>')`
  works without the `.data::eql_v2.ste_vec_entry` cast workaround.
- `contains/functional` switches from `eql_v2.ste_vec(col) @>` to
  `eql_v2.jsonb_array(col) @>`. The strict-Block-ORE compare
  contract (#211) means a btree default opclass on `eql_v2_encrypted`
  raises on sv-element samples, which in turn blocks GIN-on-array
  builds against `eql_v2.ste_vec(value)` (the GIN array_ops uses
  the element type's default btree opclass). `jsonb_array` returns
  `jsonb[]`, sidestepping the broken element compare. Same
  containment semantics, same functional-index recipe.
… StEVec)

Re-ran `mise run bench:query:json` at 10k / 100k / 1M against the
typed-StEVec EQL build (cipherstash/eql#223 — `->` returns
ste_vec_entry, `eq_term` extractor, `stevec_query` containment).
10M still in progress; results will land in a follow-up commit
once the table finishes ingesting.

Per-selector functional CLLW ORE indexes were also rebuilt against
the new opclass after the schema reinstall — without them
`field_order/functional` falls to Seq Scan + Top-N sort (12s at 1M).

Median time comparison (1M tier; before = c79a5d1 pre-#223):

| Scenario          | Before  | After    | Δ        |
| ----------------- | ------- | -------- | -------- |
| contains/func     | 0.57 ms | 0.66 ms  | +16%     |
| field_eq/bare     | 1.22 ms | 0.84 ms  | **-31%** |
| field_eq/extract  | 0.56 ms | 0.57 ms  | parity   |
| field_eq/func     | 0.58 ms | 0.83 ms  | +43%     |
| field_order/func  | 1.18 ms | 0.77 ms  | **-35%** |

Reads:

- field_eq/bare and field_order/functional are wins. The typed
  ste_vec_entry `=` operator (inlines to `eq_term(a) = eq_term(b)`)
  and the typed `ore_cllw(ste_vec_entry)` extractor are both faster
  paths than the previous synthetic-root / `.data::jsonb` cast
  workarounds.
- field_eq/functional is the larger of the two regressions: +43%,
  because the new recipe `eql_v2.eq_term(col -> 'sel')` is structurally
  bigger than the old fused `eql_v2.hmac_256(col, 'sel')`. Both are
  sub-ms at 1M. The trade is correctness: `eq_term` covers oc-bearing
  selectors (string / number leaves) that silently returned zero rows
  under the old recipe.
- contains/functional shifted from `eql_v2.ste_vec(col) @>` to
  `eql_v2.jsonb_array(col) @>` because GIN-on-`eql_v2_encrypted[]`
  fails to build under the strict-Block-ORE compare contract (#211).
  Same containment semantics; ~16% slower at 1M is the recipe-shift
  overhead (different element type internally).
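
For reference, the Δ column in the table above follows from the two medians as a plain percentage change. The helper name `pct_change` is illustrative; the inputs are the 1M-tier figures quoted in the table.

```python
def pct_change(before_ms: float, after_ms: float) -> float:
    """Percentage change from before to after; negative means faster."""
    return (after_ms - before_ms) / before_ms * 100.0


# (before, after) medians in ms at the 1M tier, copied from the table.
rows = {
    "contains/func": (0.57, 0.66),
    "field_eq/bare": (1.22, 0.84),
    "field_eq/extract": (0.56, 0.57),
    "field_eq/func": (0.58, 0.83),
    "field_order/func": (1.18, 0.77),
}

for name, (before, after) in rows.items():
    print(f"{name:18s} {pct_change(before, after):+.0f}%")
```

The field_eq/extract row works out to roughly +2%, which the table reports as parity.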

field_eq/extractor at 100k (~7.7 ms) is anomalous against 10k
(~1.4 ms) and 1M (~0.6 ms). The likely cause is single-tier noise from
a small sample (660 iterations vs 5000+ on other scenarios). A 100k
re-run was less severe than the first attempt (8.79 ms), so this
appears to be normal sample variance.
10M ingestion took ~3 attempts under ZeroKMS connection-drop retries
(c. 100 min wall time). Once the table reached 10M rows the bench
itself ran cleanly across all five scenarios.

Median times (10M tier):

| Scenario           | 10M     | 1M      | Scaling ratio |
| ------------------ | ------- | ------- | ------------- |
| contains/func      | 6.85 ms | 0.66 ms | 10.4× (linear — GIN bitmap scan + heap fetch on a larger set) |
| field_eq/bare      | 1.03 ms | 0.84 ms | 1.2× (Seq Scan + LIMIT 10 early-exit; near-flat) |
| field_eq/extract   | 0.74 ms | 0.57 ms | 1.3× (GIN hmac_terms; flat) |
| field_eq/func      | 0.90 ms | 0.83 ms | 1.1× (Seq Scan + LIMIT 10 early-exit; flat) |
| field_order/func   | 0.80 ms | 0.77 ms | 1.04× (functional CLLW btree walks in order; effectively flat) |

The two flat-scaling paths are the load-bearing wins from the typed
StEVec API: functional-index match through the inlined chain holds
all the way to 10M. `contains/functional` scales with the matched
row set as expected.
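
The scaling ratios in the table are the 10M median divided by the 1M median, read against the 10× growth in row count. A small sketch of that arithmetic, with an illustrative helper name:

```python
ROW_GROWTH = 10_000_000 / 1_000_000  # 10x more rows between the two tiers


def scaling_ratio(ms_10m: float, ms_1m: float) -> float:
    """How much slower the 10M tier is than the 1M tier."""
    return ms_10m / ms_1m


# (10M, 1M) medians in ms, copied from the table above.
medians = {
    "contains/func": (6.85, 0.66),
    "field_eq/bare": (1.03, 0.84),
    "field_eq/extract": (0.74, 0.57),
    "field_eq/func": (0.90, 0.83),
    "field_order/func": (0.80, 0.77),
}

for name, (ms_10m, ms_1m) in medians.items():
    ratio = scaling_ratio(ms_10m, ms_1m)
    print(f"{name:18s} {ratio:5.2f}x for {ROW_GROWTH:.0f}x the rows")
```

A ratio close to the row growth indicates roughly linear scaling; a ratio close to 1 indicates a path that is effectively independent of table size.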
cipherstash/eql#223 drops `eql_v2.hmac_256_terms` (structurally
wrong under the XOR contract — it filtered out oc-bearing sv
elements, so containment via that index could never match string /
number selectors). Replacement recipe in the JSON bench's
field_eq/extractor scenario:

  WHERE value @> $1::jsonb::eql_v2.stevec_query LIMIT 10

with a functional GIN on
`(eql_v2.to_stevec_query(value)::jsonb) jsonb_path_ops`. The typed
`@>(eql_v2_encrypted, eql_v2.stevec_query)` overload inlines to a
native `jsonb @>` over the same expression, so the planner engages
Bitmap Index Scan structurally. The needle binding moves from the
old `[{"s":"<sel>","hm":"<hex>"}]` shape to the new
`{"sv":[{"s":"<sel>","hm":"<hex>"}]}` shape (sv-wrapped).
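
The needle reshaping described above amounts to wrapping the old bare array under an `sv` key. The sketch below shows only that JSON transformation; the bench itself builds the needle in Rust, and `wrap_needle` plus the placeholder selector and hex values are hypothetical.

```python
import json


def wrap_needle(old_needle_json: str) -> str:
    """Convert the old bare-array needle into the sv-wrapped shape."""
    entries = json.loads(old_needle_json)
    if not isinstance(entries, list):
        raise ValueError("expected the old bare-array needle shape")
    # Compact separators keep the output free of incidental whitespace.
    return json.dumps({"sv": entries}, separators=(",", ":"))


# Placeholder values; real needles carry a selector hash and an HMAC hex string.
old = '[{"s":"abc","hm":"deadbeef"}]'
print(wrap_needle(old))
```

The wrapped string is what would be bound as `$1` in the `value @> $1::jsonb::eql_v2.stevec_query` form.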

Refreshed bench results land in a follow-up commit alongside the
EQL reinstall + index swap on the bench DB.
Move off the cipherstash-suite git pins to the published crates.io
pre-release that carries the ste_vec consolidation.

- mise.toml setup-db: download EQL from the eql-2.3.0-pre.4 release
  asset instead of releases/latest. A pre-release is never tagged
  "Latest", so /latest/ resolved to eql-2.2.1.
- Cargo.toml: cipherstash-client + stack-profile pinned to crates.io
  =0.34.1-alpha.9. Both ship from the same workspace and must stay
  version-locked: cipherstash-client implements KeyProvider for
  stack-profile's ProfileStore, so a source mismatch yields two
  incompatible ProfileStore types.

API migration for the alpha.9 encrypt_eql change: it now returns
Vec<EqlOutput> (Store(EqlCiphertext) | Query(EqlQueryPayload))
instead of Vec<EqlCiphertext>.

- lib.rs: storage inserts unwrap EqlOutput::Store; EncryptedQuery.eql
  is now EqlQueryPayload (build_query uses EqlOperation::Query).
- encrypt_combo.rs: unwrap to EqlCiphertext before the chunks_exact(3)
  row reassembly, since EqlOutput is intentionally not Clone.
benches/json.rs:
- Drop dead `op` / eql_v2.ope_cllw handling — Compat OPE-CLLW was
  removed from eql_v2_encrypted in EQL 2.3. The orderable-tag scan and
  ore_extractor_for now cover `ob` / `oc` only.
- Correct stale header docs that still described the removed
  `hmac_256_terms` GIN and fused `hmac_256(col, text)` recipes. The
  query strings were already on the 2.3 typed recipes (jsonb_array
  containment, stevec_query @>, eq_term, typed `->`).

sql/indexes/json_ste_vec_small_encrypted*: regenerate _up/_down for all
five tiers. The old _up.sql created GIN (eql_v2.hmac_256_terms(value)),
and hmac_256_terms was removed in EQL 2.3 — CREATE INDEX would hard-fail
during setup-db. New indexes pair with the bench queries:
- GIN (eql_v2.jsonb_array(value)) — contains/functional
- GIN ((eql_v2.to_stevec_query(value)::jsonb) jsonb_path_ops) —
  field_eq/extractor (XOR-aware, covers hm- and oc-bearing selectors)
- Optional positional arg caps the largest row-count tier:
  `mise run bench:query:all 1000000` runs 10k/100k/1M and skips the
  slow 10M tier. Bare invocation still runs every tier through 10M.
  Non-numeric or below-smallest-tier args are rejected.
- Un-park bench:query:json — it now runs in the loop alongside
  exact/match/ore/group_by/combo. It was parked pending a
  cipherstash-client release emitting the post-2.3 ste_vec shape;
  cipherstash-client 0.34.1-alpha.9 + EQL 2.3.0-pre.4 provide it.
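
The tier-cap rule above can be modelled as follows. The real logic lives in the mise task's shell script; this Python outline only mirrors the described behaviour, and the names `TIERS` and `select_tiers` are assumptions.

```python
from typing import List, Optional

TIERS = [10_000, 100_000, 1_000_000, 10_000_000]


def select_tiers(cap: Optional[str] = None) -> List[int]:
    """Return the row-count tiers to run, optionally capped at `cap`."""
    if cap is None:
        # Bare invocation: run every tier through 10M.
        return list(TIERS)
    if not cap.isdigit():
        raise ValueError(f"tier cap must be numeric, got {cap!r}")
    limit = int(cap)
    if limit < TIERS[0]:
        raise ValueError(f"tier cap {limit} is below the smallest tier {TIERS[0]}")
    return [tier for tier in TIERS if tier <= limit]
```

Under this sketch, `select_tiers("1000000")` gives the 10k, 100k and 1M tiers, matching `mise run bench:query:all 1000000`.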
mise renders task `run` scripts through Tera before executing them.
The empty-tier guard used bash array-length syntax, whose `{#` opens a
Tera comment with no close — Tera render failed, so mise never ran the
script and instead dumped the raw source with "task failed".

Use `[ -z "${ROW_COUNTS[*]}" ]` (empty string when the array has no
elements) — no `{#`, Tera-safe. Same applies to any future run script:
keep `{{`, `{%`, `{#` out of task bodies, comments included.
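
One way to guard against this class of failure is a small lint over task bodies. The checker below is a hypothetical sketch, not part of the repository, and the `${#ROW_COUNTS[@]}` line is an assumed form of the original array-length guard.

```python
from typing import List, Tuple

# Sequences that Tera treats as the start of an expression, tag or comment.
TERA_OPENERS = ("{{", "{%", "{#")


def tera_hazards(script: str) -> List[Tuple[int, str]]:
    """Return (line_number, opener) for each Tera opener found in a script."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        for opener in TERA_OPENERS:
            if opener in line:
                hits.append((lineno, opener))
    return hits


unsafe = 'if [ "${#ROW_COUNTS[@]}" -eq 0 ]; then exit 1; fi'
safe = 'if [ -z "${ROW_COUNTS[*]}" ]; then exit 1; fi'

print(tera_hazards(unsafe))
print(tera_hazards(safe))
```

The check is deliberately naive: it flags comments too, which matches the rule that the openers must stay out of task bodies entirely.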
setup-db downloads EQL from the eql-2.3.0-pre.4 release by default.
Setting EQL_SQL=<path> installs a local build instead — for testing an
unreleased EQL fix before a pre-release is cut:

  EQL_SQL=../encrypt-query-language/release/cipherstash-encrypt.sql \
    mise run setup-db

Default behaviour (download eql-2.3.0-pre.4) is unchanged.
…(file)

`mise run report` now prints every benchmark scenario — median runtime
per scenario, slowest first — the same table as `report:slow`, just with
no threshold. The full Markdown-file generator moves to `report:build`.

- find_slow_queries.py: add --all (list every scenario, ignore the
  threshold; "All N scenarios ..." header).
- mise.toml: rename report -> report:build; new `report` runs
  find_slow_queries.py --all.
- README.md / README_REPORT.md / report/README.md: point existing
  `mise run report` references at `report:build`; document the new
  `report` overview.
…enarios

field_eq/functional and field_order/functional are meant to measure
per-selector functional indexes — `hash (eql_v2.eq_term(value -> 'sel'))`
and `btree (<ore_extractor>(value -> 'sel'))`. ste_vec equality and
ordering are per-selector (the index expression embeds the selector
hash), so these indexes can't be declared in the static
sql/indexes/*_up.sql — the selector is only known once sample_needles
has run.

They were never created, so field_order/functional was an un-indexed
Parallel Seq Scan + top-N Sort — ~10s at 1M rows. With the index it is
an Index Scan (cost ~13 for LIMIT 10) — sub-millisecond.

Add create_field_indexes(), called once at bench startup after
sample_needles picks the selectors and before the criterion loop (index
build + ANALYZE is one-time setup, outside what criterion measures).
Refresh the header comments that described these scenarios as un-indexed
baselines.
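
The statements that a step like `create_field_indexes()` would need to issue can be sketched as below. The actual function is in the Rust bench; this Python outline only builds the SQL text, and the index names, the `sql_literal` helper and the default `eql_v2.ore_cllw` extractor are assumptions for illustration.

```python
from typing import List


def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def field_index_statements(
    table: str,
    eq_selector: str,
    order_selector: str,
    ore_extractor: str = "eql_v2.ore_cllw",
) -> List[str]:
    """Build per-selector index DDL plus a trailing ANALYZE."""
    eq_expr = f"eql_v2.eq_term(value -> {sql_literal(eq_selector)})"
    order_expr = f"{ore_extractor}(value -> {sql_literal(order_selector)})"
    return [
        f"CREATE INDEX IF NOT EXISTS {table}_field_eq_idx "
        f"ON {table} USING hash ({eq_expr})",
        f"CREATE INDEX IF NOT EXISTS {table}_field_order_idx "
        f"ON {table} USING btree ({order_expr})",
        f"ANALYZE {table}",
    ]
```

Because the selector is embedded in the index expression, the statements can only be generated after the selectors have been sampled, which is why they cannot live in the static `_up.sql` files.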
The four selective ORE range scenarios (range_selective_gt_100,
range_highly_selective_gt_10, and both *_count variants) degrade into
near-full sequential scans at the 10M tier — seconds per iteration.

Root cause is a planner limitation, not the bench or the index: the
comparison value is a bound parameter, not a plan-time constant, so the
planner cannot estimate the selectivity of the encrypted ORE comparison
and falls back to DEFAULT_INEQ_SEL (33%), picking a Seq Scan. Tracked in
EQL issue #230. The scenarios are commented out (not removed) so they
restore by un-commenting once #230 lands a selectivity fix.
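
To make the scale of the misestimate concrete, the arithmetic below applies PostgreSQL's default inequality selectivity to each tier. The constant mirrors `DEFAULT_INEQ_SEL`; the helper name is illustrative, and this is a back-of-envelope sketch rather than the planner's full costing.

```python
# PostgreSQL's fallback selectivity for inequality operators it cannot estimate.
DEFAULT_INEQ_SEL = 0.3333333333333333


def estimated_rows(table_rows: int, selectivity: float = DEFAULT_INEQ_SEL) -> int:
    """Rows the planner expects to match under the fallback selectivity."""
    return round(table_rows * selectivity)


for tier in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{tier:>10,d} rows -> planner expects ~{estimated_rows(tier):,d} matches")
```

With about a third of the table expected to match, a sequential scan looks cheaper to the planner than walking the functional index, however selective the comparison actually is.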

Non-selective baselines and the hybrid ordered-range scenario remain
enabled.
Regenerated bench results and charts from a full re-run against EQL 2.3
recipes, including the 10M ORE tier. Reflects the disabled selective ORE
scenarios from the preceding commit.
@coderdan coderdan force-pushed the feat/bench-refresh-oc-op-consolidation branch from fabdf47 to 4838f2c May 20, 2026 14:04
@coderdan coderdan changed the title from "results: refresh full query bench suite at 100k+1M against post-#219 EQL" to "bench: refresh full query + ingest suite (10k–10M) against EQL 2.3" May 20, 2026
coderdan added 2 commits May 21, 2026 00:16
The eql-2.3.0 release is out; setup-db was still downloading the
eql-2.3.0-pre.4 pre-release. The EQL_SQL local-build override is
unchanged.
Surfaces query-only medians across the four row-count tiers in the
README's View Results section, so the top-line performance picture is
visible without opening the full report.
@coderdan coderdan marked this pull request as ready for review May 20, 2026 14:22
@coderdan coderdan merged commit 1533591 into main May 20, 2026