bench: refresh full query + ingest suite (10k–10M) against EQL 2.3 #14
Merged
coderdan added a commit to cipherstash/encrypt-query-language that referenced this pull request (May 19, 2026):
Restores functional-index match for sv-element ordered queries after the consolidation in #219 left the type without an opclass. Closes #220.

## What this adds

`src/ore_cllw/operators.sql`: same-type comparison operators (`<`, `<=`, `=`, `>=`, `>`, `<>`) on `eql_v2.ore_cllw`. Each operator is backed by a single-statement `LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE` wrapper that reduces to `eql_v2.compare_ore_cllw_term(a, b) <op> 0`. The wrappers inline, so the planner can fold them into the calling query; that is what lets the index match.

`src/ore_cllw/operator_class.sql`: `eql_v2.ore_cllw_ops` btree opclass registered `DEFAULT FOR TYPE eql_v2.ore_cllw`. FUNCTION 1 is `eql_v2.compare_ore_cllw_term` directly (plpgsql per-byte protocol; called only by btree internals, not per-row from the calling query). Excluded from the Supabase build variant via the existing `**/*operator_class.sql` glob in `tasks/build.sh` (operator classes require superuser).

`tasks/pin_search_path.sql`: allowlists the six operator backing functions (`ore_cllw_eq` / `_neq` / `_lt` / `_lte` / `_gt` / `_gte`) so they are not pinned. Pinning would break the inlining chain and prevent the planner from structurally matching predicates to functional indexes.

## Design choices

- **No `HASHES` / `MERGES` flags** on the operator declarations. HASHES needs a registered hash function on the type (there isn't one, and we don't want one: the CLLW protocol is for ordering, not hashing). MERGES needs an equivalent operator family on both sides, which we'd register separately if/when needed. This is the gap that disabled the pre-2025-06-24 opclass; see the history in issue #220.
- **Equality via `compare_ore_cllw_term = 0`**, not a `bytea_eq` shortcut. This is consistent with the rest of the CLLW path, keeps one source of truth for equality semantics, and stays resilient to any future change in the underlying ciphertext encoding.
- **The opclass operators are different from the operators on `eql_v2_encrypted`.** Those (per #211) inline to `ore_block_u64_8_256` and raise on non-Block-ORE columns. The new operators here are on the `eql_v2.ore_cllw` composite type itself, which callers reach through the extractor form `WHERE eql_v2.ore_cllw(col) <op> eql_v2.ore_cllw($1)`. No conflict; different scope.

## Tests

`tests/sqlx/tests/ore_cllw_opclass_tests.rs` covers:

- Operator wiring: `=`, `<>`, `<`, `<=`, `>`, `>=` on hand-crafted byte strings under the CLLW per-byte protocol.
- Cross-domain ordering via the leading tag byte (`0x00` numeric, `0x01` string): numeric < string within the same column.
- Opclass registration: `pg_opclass.opcdefault = true` for `eql_v2.ore_cllw_ops`.
- Functional-index match: build a functional btree on `eql_v2.ore_cllw(value)`, then confirm that `EXPLAIN` for `ORDER BY eql_v2.ore_cllw(value) LIMIT n` shows `Index Scan` (or `Index Only Scan`) and no `Sort` node.
- Inlinability lint: read `pg_proc` directly and assert that each backing function is `LANGUAGE sql`, `IMMUTABLE`, `STRICT`, `PARALLEL SAFE`, and not pinned with `SET search_path`.

## Bench impact

From the bench results in cipherstash/benches#14 (post-#219 baseline):

`json/field_order/functional @ 1M = 20.0 s (no opclass; seq scan + Sort)`

With this opclass, `EXPLAIN` flips to `Index Scan + Limit`, and the same query should land in single-digit ms on the bench rig. An end-to-end bench re-run is the next step on the bench-side branch.

Docs: CHANGELOG `Added` entry, U-005 action-required note refreshed, database-indexes.md ORE-CLLW recipe entry refreshed.
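To make the "one comparator, six operators" design concrete, here is a minimal Python sketch. It assumes a stand-in three-way comparator (plain lexicographic byte order), which is NOT the CLLW ORE per-byte protocol; the names `compare_term`, `OPERATORS` and `apply_op` are hypothetical. It only illustrates how every operator reduces to `compare(a, b) <op> 0`, with equality routed through the same comparator, and how a leading tag byte places the numeric domain (`0x00`) before the string domain (`0x01`).

```python
import operator
from typing import Callable


def compare_term(a: bytes, b: bytes) -> int:
    """Stand-in three-way compare returning -1, 0 or 1 (like btree FUNCTION 1).

    Plain lexicographic byte order, for illustration only; this is NOT the
    CLLW ORE per-byte protocol.
    """
    return (a > b) - (a < b)


# Each SQL operator reduces to `compare(a, b) <op> 0`, mirroring wrappers
# of the form `eql_v2.compare_ore_cllw_term(a, b) <op> 0`.
OPERATORS: dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "<>": operator.ne,
}


def apply_op(op: str, a: bytes, b: bytes) -> bool:
    # All six operators share one source of truth: the three-way compare.
    return OPERATORS[op](compare_term(a, b), 0)


if __name__ == "__main__":
    numeric = b"\x00\x05"  # leading tag byte 0x00: numeric domain
    string = b"\x01\x00"   # leading tag byte 0x01: string domain
    for op in OPERATORS:
        print(op, apply_op(op, numeric, string))
```

Swapping in a real comparator would leave `apply_op` unchanged, which is the point of keeping a single source of truth for both ordering and equality.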
coderdan added five more commits to cipherstash/encrypt-query-language that referenced this pull request (three on May 19, 2026 and two on May 20, 2026), each carrying the same commit message.
Bench-side counterpart to encrypt-query-language#219 (ORE-only ste_vec
consolidation + strict eql_v2.compare contract).
- `Cargo.toml`: cipherstash-client pin moves from
`dan/zerokms-unexpected-error-context` to `main`. Main now carries the
ste_vec consolidation (suite#1955 + the post-#1955 `ocf`/`ocv` → `oc`
collapse). Note: this branch predates the RequestFailed Display fix
(#1960), so encrypt-side ZeroKMS errors surface as "Unexpected error"
again — the bench's retry loop in `prepare:_table` is the safety net.
- API renames across encrypt binaries + `src/lib.rs`:
  - `ColumnType::Utf8Str` → `ColumnType::Text`
  - `ColumnType::JsonB` → `ColumnType::Json`
  - `Plaintext::Utf8Str` → `Plaintext::Text`
  - `Plaintext::JsonB` → `Plaintext::Json`
- `src/bin/encrypt_ste_vec_small.rs` + `_large.rs`: set
`IndexType::SteVec { mode: SteVecMode::Standard, .. }` so sv elements
emit `oc` (CLLW ORE) for orderable terms. EQL 2.3's `eql_v2_encrypted`
type only handles ORE; OPE (Compat mode) is moving to a separate
encrypted column type in a future release.
- `benches/json.rs`:
  - `ore_extractor_for` map updated: `oc` → `eql_v2.ore_cllw`, `op` →
    `eql_v2.ope_cllw` (placeholder; the OPE bench path is exercised via the
    Compat-mode encrypt binary, which isn't the current default).
  - Picker rewritten to sample **two** selectors independently: one
    hm-bearing for field_eq scenarios, one orderable-bearing for
    field_order scenarios. Post-#1955 these are typically disjoint
    (the array-prefix selector lookup element carries `hm`; the value
    elements carry `oc`). The old single-needle picker skipped
    field_eq/* when it picked an `oc`-only element.
- `benches/ore.rs`: re-added `range_lt_natural_ordered_10` scenario
(carried forward from feat/ore-natural-form-bench, PR #13).
- `report_benchmarks.py`: new entries for `range_lt_natural_ordered_10`
and the json/field_eq scenario.
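The two-selector picker can be sketched in Python as below. This is not the Rust code in `benches/json.rs`; it assumes each sv element is a dict with an `s` selector plus optional `hm` / `oc` / `op` term keys (key names taken from this PR, dict shape assumed), and the function names are hypothetical. Picking the two selectors independently is what keeps the field_eq scenarios running when the orderable element carries no `hm`.

```python
from typing import Optional

# Term key on an sv element -> SQL extractor used for ORDER BY
# (mirrors the ore_extractor_for mapping described above).
ORE_EXTRACTOR_FOR = {
    "oc": "eql_v2.ore_cllw",
    "op": "eql_v2.ope_cllw",  # placeholder; OPE path is not the default
}


def pick_eq_selector(sv: list[dict]) -> Optional[str]:
    """Selector of the first hm-bearing element, for field_eq scenarios."""
    return next((e["s"] for e in sv if "hm" in e), None)


def pick_order_selector(sv: list[dict]) -> Optional[tuple[str, str]]:
    """Selector and extractor of the first orderable element, for field_order."""
    for element in sv:
        for key, extractor in ORE_EXTRACTOR_FOR.items():
            if key in element:
                return element["s"], extractor
    return None


if __name__ == "__main__":
    # Post-#1955 shape: the lookup element carries hm, value elements carry oc.
    sample_sv = [
        {"s": "sel_lookup", "hm": "<hmac>"},
        {"s": "sel_value", "oc": "<ore>"},
    ]
    print(pick_eq_selector(sample_sv))     # sel_lookup
    print(pick_order_selector(sample_sv))  # ('sel_value', 'eql_v2.ore_cllw')
```

A single-needle picker that required both terms on one element would find nothing in `sample_sv`; sampling each role separately finds both.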
Six bench families at 100k + 1M (78 scenarios total) under fresh data
ingested with the consolidation client. See `report/BENCHMARK_REPORT.md`
for the full table; key signals:
- **JSON contains/functional via GIN ste_vec**: 3.88 ms @ 100k → 4.32 ms
@ 1M (flat — GIN sub-linear). Index engages cleanly.
- **JSON field_eq/* (inlined hmac path)**: 611-1250 µs @ 100k, 634 µs-1.1
ms @ 1M. Around 1 ms or better at 1M, which confirms post-#205 hmac inlining +
hmac_256_terms GIN. Planner picks seq scan over GIN at 100k (cost-model
edge case); both paths stay at or near 1 ms.
- **JSON field_order/* (extractor-driven sort)**: 7.85 s @ 100k → 28.9 s
@ 1M, scales linearly. No functional-index match because no opclass on
`eql_v2.ore_cllw` — tracked as encrypt-query-language#220 (restore
CLLW ORE opclass).
- **ORE block range queries**: 1-2 ms @ 100k, sub-ms @ 1M (selective).
Functional `eql_v2.ore_block_u64_8_256` index engages.
- **EXACT (`=` via hmac on root scalars)**: sub-ms @ both tiers.
Functional hash index engages.
- **MATCH (LIKE via bloom_filter GIN)**: ms-range at both tiers.
- **GROUP_BY low-cardinality**: HashAggregate via inlined hmac_256;
matches the EQL query-perf guide §5 recipe.
Re-ingest of integer/string/category/combo at 100k+1M was required
because the previous data was encrypted under a stale keyset that the
current workspace can no longer decrypt; without this, the
`_decrypt/*` scenarios would fail with a 403 from ZeroKMS. JSON ste_vec
data was re-ingested at 100k+1M with `SteVecMode::Standard`.
EQL installed: the **supabase variant** (no operator classes) — required
for ANALYZE to succeed on hm-only tables under the strict-compare
contract from #219. Aligns with U-001's functional-index recommendation.
Deferred (not covered in this PR):
- 10M tier (existing data still present but stale-keyset; would need
several hours of re-ingest).
- Compat-mode bench scenarios exercising `eql_v2.ope_cllw` — OPE moves
to a separate encrypted type in a future release; no EQL recipe to
bench against today.
- field_order with functional ORE index match — blocked on
encrypt-query-language#220 (operator class on `eql_v2.ore_cllw`).
Three fixes to the post-#1955 bench report; the first two were causing
scenario rows to render as "Unknown query":
1. The bench ID parser used `parts[2]` as scenario_name. That works for
4-component IDs like `EXACT/exact/eql_cast/100000` but loses the
variant for the JSON bench's 5-component IDs
(`JSON/json/contains/functional/10000`). Switched to
`"/".join(parts[2:-1])` so multi-part scenario names stay distinct;
tightened the length check from `>= 3` to `>= 4` since the row count
is always at the end. Same change applied to both the rows-result
parser and the metadata sidecar parser.
2. `sql_map["JSON"]` and `descriptions["JSON"]` were populated for the
old json bench's scenario set (`field_eq`, `field_extract`,
`field_group_by`). The current bench emits six scenarios across
three families: `contains/functional`, `field_eq/{bare,extractor,
functional}`, `field_order/{bare,functional}`. Rewrote both maps to
match — each entry now describes its index recipe, expected plan
shape, and where the bare/extractor/functional variants diverge.
3. Chart filenames now sanitise `/` to `_` so multi-part scenario IDs
produce valid filesystem paths (e.g.
`query_json_field_eq_bare_chart.png`).
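A minimal Python sketch of fixes 1 and 3, assuming IDs of the form `<GROUP>/<bench>/<scenario...>/<rows>` as in the examples above. The helper names and the `query` filename prefix are assumptions modeled on `query_json_field_eq_bare_chart.png`, not the actual code in `report_benchmarks.py`.

```python
from typing import NamedTuple, Optional


class BenchId(NamedTuple):
    group: str     # e.g. "JSON"
    bench: str     # e.g. "json"
    scenario: str  # may span several components, e.g. "contains/functional"
    rows: int      # the row count is always the final component


def parse_bench_id(bench_id: str) -> Optional[BenchId]:
    parts = bench_id.split("/")
    # Need group, bench, at least one scenario component, and the row count.
    if len(parts) < 4:
        return None
    # Join everything between the bench name and the row count so that
    # multi-part scenario names stay distinct (parts[2] alone would drop
    # the "functional" in "contains/functional").
    return BenchId(parts[0], parts[1], "/".join(parts[2:-1]), int(parts[-1]))


def chart_filename(kind: str, bench: str, scenario: str) -> str:
    # "/" is a path separator, so flatten multi-part scenario names.
    return f"{kind}_{bench}_{scenario.replace('/', '_')}_chart.png"
```

For example, `parse_bench_id("JSON/json/contains/functional/10000")` keeps `contains/functional` as the scenario, and `chart_filename("query", "json", "field_eq/bare")` yields `query_json_field_eq_bare_chart.png`.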
Plus a new `report:slow` mise task that prints scenarios whose median
runtime exceeds a threshold (default 100ms, override with the first
arg). The CLI shape:

    $ mise run report:slow        # 100ms default
    $ mise run report:slow 250    # 250ms threshold

Output is sorted descending so the worst offenders surface first.
Useful for triaging which scenarios degraded (or never recovered) after
EQL changes. The 100ms list at HEAD surfaces the two unresolved JSON
issues we're tracking:
- `JSON/json/field_order/functional/100000` at 5.5s — functional ORE
index missing on the 100k table (#221's opclass available at schema
level but no per-table index built; needs `prepare:_table` re-run).
- All `field_eq/*/10000` rows missing entirely — the 10k JSON ste_vec
table is stale pre-#1955; sv elements lack `hm` so the bench picker
skips field_eq. Re-ingest needed.
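The `report:slow` filter boils down to a threshold-and-sort over per-scenario medians. A minimal Python sketch, assuming the medians are already loaded into a dict keyed by bench ID, in milliseconds (how the real task reads the bench output is not shown here, and the function names are hypothetical):

```python
import sys


def threshold_from_args(args: list[str], default_ms: float = 100.0) -> float:
    """The first CLI argument, if present, overrides the default (in ms)."""
    return float(args[0]) if args else default_ms


def slow_scenarios(medians_ms: dict[str, float],
                   threshold_ms: float) -> list[tuple[str, float]]:
    """Scenarios whose median exceeds the threshold, worst offenders first."""
    slow = [(name, ms) for name, ms in medians_ms.items() if ms > threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    threshold = threshold_from_args(sys.argv[1:])
    # Illustrative input shaped like the numbers quoted above.
    medians = {
        "JSON/json/field_order/functional/100000": 5500.0,
        "JSON/json/field_eq/bare/10000": 0.6,
    }
    for name, ms in slow_scenarios(medians, threshold):
        print(f"{ms:10.1f} ms  {name}")
```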
Regenerates report/BENCHMARK_REPORT.md, report/json.md and the six
per-scenario JSON charts as a sanity-check of the fixes.
Two underlying fixes that unblocked the missing/slow scenarios:
- 10k tier: re-ingested `json_ste_vec_small_encrypted_10000` under the
current cipherstash-suite (post-#1955). Pre-2.3, the table's sv
elements lacked `hm` natively, so the bench's needle picker walked
the sample row's sv array, found no hm-bearing element, and silently
skipped all `field_eq/*` scenarios. With fresh data, sv[0] now
carries `hm` and the three `field_eq` variants execute (each ~600µs).
- Per-table functional ORE index: created
`<table>_oc_9a2d817b8ec7abe623a1fcb` on the 10k and 100k tables.
EQL #221's `eql_v2.ore_cllw_ops` opclass was available at the
schema level but the per-table `prepare:_table` script's static
`up.sql` doesn't create a functional index on the orderable selector
(it only emits the GIN indexes for ste_vec + hmac_256_terms). Now
`field_order/functional/{10000,100000}` engage Index Scan at
~1.3ms instead of Seq Scan + Top-N at 5+ seconds.
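The key to this fix is that the functional index and the bench's ORDER BY use the identical expression: the planner only matches an index to a sort key it can recognize structurally. A Python sketch that derives both SQL strings from one helper, assuming the extractor form `eql_v2.ore_cllw(value -> '<sel>'::text)` used elsewhere in this PR (the exact expression changed across EQL releases) and a `<table>_oc_<selector>` index name. The helper names are hypothetical, and the selector is assumed to be a trusted hex string, so no quoting or escaping is done.

```python
def ore_order_expr(selector: str, extractor: str = "eql_v2.ore_cllw") -> str:
    # Single source for the sort key, so the index and the query cannot drift.
    # The selector is assumed to be a trusted hex string (no escaping).
    return f"{extractor}(value -> '{selector}'::text)"


def functional_index_ddl(table: str, selector: str) -> str:
    # A function-call expression needs no extra parentheses in CREATE INDEX.
    return (f"CREATE INDEX IF NOT EXISTS {table}_oc_{selector} "
            f"ON {table} ({ore_order_expr(selector)});")


def field_order_query(table: str, selector: str, limit: int = 10) -> str:
    return (f"SELECT * FROM {table} "
            f"ORDER BY {ore_order_expr(selector)} LIMIT {limit};")


if __name__ == "__main__":
    print(functional_index_ddl("json_ste_vec_small_encrypted_10000", "9a2d"))
    print(field_order_query("json_ste_vec_small_encrypted_10000", "9a2d"))
```

If the two strings drift apart (for example, one uses `value -> 'sel'` and the other `(value -> 'sel').data`), the index no longer matches and the plan falls back to Seq Scan + Top-N.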
Refreshes the regenerated charts + report/json.md so the report shows
the post-fix numbers. The 1M tier already had the index and matching
results from an earlier run; only the metadata.json shape changed.
`mise run report:slow` (added in the previous commit) now lists only
the genuinely slow scenarios: `field_order/bare/*` (which can't engage
the index by design, and keeps showing the cost of the recipe gap on
non-inlinable `eql_v2."->"`) and unrelated ORE selective queries.
…ical

Same precedent as the ORE bench dropping its natural-form ORDER BY (refresh-eql-211 series): the bare-form `ORDER BY value -> '<sel>'` can't engage the functional ORE-CLLW index because `eql_v2."->"` is plpgsql, so the planner has no path from the sort key back to the indexed expression. The resulting Seq Scan + Top-N sort is linear in table size (5+ s at 100k, 21 s at 1M in the previous results), which made the scenario look like an EQL performance bug when it is really just the wrong recipe.

The extractor form `ORDER BY <ore_extractor>(value -> '<sel>'::text) LIMIT 10` remains as `field_order/functional`: it expresses the ordering the bare form intended, but actually engages the index. Documented in §4.1 of the EQL query performance guide (added in docs/reference/query-performance.md on the dan/query-performance-guide branch).

Result file refresh: ran the bench at 10k / 100k / 1M after the deletion. Per-tier scenario count drops from 6 → 5. The stale `query_json_field_order_bare_chart.png` is removed; no other charts change. `mise run report:slow` at HEAD no longer surfaces any JSON scenarios; only the pre-existing ORE selective range queries remain on the >100ms list.
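One way to guard against this regression is to check the plan shape rather than the timing. A minimal Python sketch, assuming text-format `EXPLAIN (COSTS OFF)` output where the first row is the root node and child nodes are prefixed with `->`; the function names and the two hand-written sample plans are illustrative. It accepts a plan containing an `Index Scan` / `Index Only Scan` node and rejects any `Sort` node, which is where a Top-N heapsort shows up.

```python
def plan_nodes(explain_text: str) -> list[str]:
    """Node lines of a text-format EXPLAIN plan, with '->' markers removed.

    Assumes the first non-empty line is the root node; detail lines such as
    'Sort Key: ...' are skipped because they carry no '->' marker.
    """
    lines = [line.strip() for line in explain_text.splitlines() if line.strip()]
    if not lines:
        return []
    nodes = [lines[0]]
    nodes += [line[2:].strip() for line in lines[1:] if line.startswith("->")]
    return nodes


def uses_index_without_sort(explain_text: str) -> bool:
    nodes = plan_nodes(explain_text)
    # "Index Scan Backward" also starts with "Index Scan".
    has_index = any(n.startswith(("Index Scan", "Index Only Scan")) for n in nodes)
    has_sort = any(n.startswith(("Sort", "Incremental Sort")) for n in nodes)
    return has_index and not has_sort


# Hand-written sample plans, for illustration only.
INDEXED = """Limit
  ->  Index Scan using t_oc_sel on t"""

SEQ_SCAN_TOP_N = """Limit
  ->  Sort
        Sort Key: (eql_v2.ore_cllw((value -> 'sel'::text)))
        ->  Seq Scan on t"""

if __name__ == "__main__":
    print(uses_index_without_sort(INDEXED))         # True
    print(uses_index_without_sort(SEQ_SCAN_TOP_N))  # False
```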
The bench DB had EQL from before PR #211 (range-operator inlining + re-enabling the Block-ORE opclass) and PR #221 (CLLW opclass). All existing functional indexes were bound to `pg_catalog.record_ops` at creation time, which is Postgres's silent fallback when a type lacks a custom btree opclass. Even though EQL was reinstalled, existing indexes retain their original opclass binding (REINDEX doesn't update it).

Operation on the bench DB:

1. Reinstalled EQL from `feat/ore-cllw-opclass` (which has #219 + #221 stacked on top of #211). Both `ore_block_u64_8_256_operator_class` and `ore_cllw_ops` now exist.
2. Uninstall CASCADE dropped all functional indexes; recreated them via each table's `sql/indexes/<table>_up.sql`.
3. Dropped the deprecated `eql_v2.encrypted_operator_class` / `encrypted_operator_family` (deprecated by U-001; not used by any functional index, and post-#219 its FUNCTION 1 raises on ste_vec columns during ANALYZE because the strict `eql_v2.compare` requires `ob` at root).
4. Created per-tier functional CLLW ORE indexes on the JSON tables.
5. Re-ran `bench:query:ore` at 100k / 1M and `bench:query:json` at 10k / 100k / 1M. 10k ORE was skipped because of a stale keyset on the integer_encrypted_10000 table (decrypt failed); 10M ORE was skipped to limit run time.

Bench code change: the `field_order/functional` query now uses the `(value -> 'sel').data` jsonb form (was `value -> 'sel'`). The (eql_v2_encrypted) overload of `eql_v2.ore_cllw` was removed in #219, and the (eql_v2.ste_vec_entry) overload has a DOMAIN check requiring `s` + `c` + `hm` that is stricter than what cipherstash-client currently emits on orderable-only sv elements. Using the (jsonb) overload sidesteps the DOMAIN gate while still engaging the functional CLLW index.
Results impact (`mise run report:slow`):

Before:

- ORE/range_selective_gt_count/1M: 8.4s
- ORE/range_highly_selective_gt_count/1M: 8.2s
- ORE/range_highly_selective_gt_10/1M: 3.3s
- ORE/range_selective_gt_100/1M: 0.65s

After:

- All 1M ORE selective queries: no longer on the >100ms list.
- 100k / 1M JSON field_order/functional: ~1.2ms (consistent across tiers; the index walks the btree in order, no Sort).
- Only 10M scenarios remain on the slow list (not re-run here).

TODO follow-ups (will track separately):

- ORE 10k re-ingest (stale keyset on integer_encrypted_10000)
- 10M tier re-bench
- EQL #219 DOMAIN check is too strict for current cipherstash-client output (orderable sv elements lack `hm`)
PR cipherstash/eql#223 changes the StEVec query surface:

- `->` returns `eql_v2.ste_vec_entry` (was `eql_v2_encrypted` with a synthetic root). RHS literals for `field_eq/bare` cast to `::eql_v2.ste_vec_entry`, not `::eql_v2_encrypted`. The bare-form `=` operator on entries inlines to `eq_term(a) = eq_term(b)`.
- The fused `eql_v2.hmac_256(eql_v2_encrypted, text)` was removed. `field_eq/functional` shifts to the chained `eql_v2.eq_term(value -> '<sel>'::text)` recipe, which is XOR-aware (one expression covers both hm-bearing and oc-bearing selectors). The right-hand side casts via `::eql_v2.ste_vec_entry`, then runs the same extractor.
- `field_order/functional` simplifies: `->` now returns `ste_vec_entry` directly, so `eql_v2.ore_cllw(value -> '<sel>')` works without the `.data::eql_v2.ste_vec_entry` cast workaround.
- `contains/functional` switches from `eql_v2.ste_vec(col) @>` to `eql_v2.jsonb_array(col) @>`. Under the strict-Block-ORE compare contract (#211), a default btree opclass on `eql_v2_encrypted` raises on sv-element samples, which in turn blocks GIN-on-array builds against `eql_v2.ste_vec(value)` (the GIN `array_ops` uses the element type's default btree opclass). `jsonb_array` returns `jsonb[]`, sidestepping the broken element compare. Same containment semantics, same functional-index recipe.
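A sketch of how the post-#223 recipes might be assembled as query strings. The projection (`SELECT value`), the table name and `LIMIT 10` are assumptions for illustration; the real strings live in `benches/json.rs`. The selector is embedded as a quoted literal rather than bound as a parameter, on the assumption that the planner can only match a per-selector functional index when the query expression carries the same constant as the index expression.

```rust
/// Quote a string as a SQL literal by doubling any embedded single quotes.
fn sql_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// field_eq/functional: `eq_term` on the typed `->` result on the left, and on
/// the needle (bound as $1, cast to ste_vec_entry) on the right.
pub fn field_eq_functional(table: &str, selector: &str) -> String {
    let sel = sql_literal(selector);
    format!(
        "SELECT value FROM {table} \
         WHERE eql_v2.eq_term(value -> {sel}::text) = eql_v2.eq_term($1::eql_v2.ste_vec_entry) \
         LIMIT 10"
    )
}

/// field_order/functional: `->` already yields ste_vec_entry, so the CLLW
/// extractor applies directly with no `.data` cast.
pub fn field_order_functional(table: &str, selector: &str) -> String {
    let sel = sql_literal(selector);
    format!("SELECT value FROM {table} ORDER BY eql_v2.ore_cllw(value -> {sel}) LIMIT 10")
}
```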
… StEVec)

Re-ran `mise run bench:query:json` at 10k / 100k / 1M against the typed-StEVec EQL build (cipherstash/eql#223: `->` returns `ste_vec_entry`, `eq_term` extractor, `stevec_query` containment). 10M is still in progress; results will land in a follow-up commit once the table finishes ingesting.

Per-selector functional CLLW ORE indexes were also rebuilt against the new opclass after the schema reinstall; without them `field_order/functional` falls back to Seq Scan + Top-N sort (12s at 1M).

Median time comparison (1M tier; before = c79a5d1, pre-#223):

| Scenario          | Before  | After    | Δ        |
| ----------------- | ------- | -------- | -------- |
| contains/func     | 0.57 ms | 0.66 ms  | +16%     |
| field_eq/bare     | 1.22 ms | 0.84 ms  | **-31%** |
| field_eq/extract  | 0.56 ms | 0.57 ms  | parity   |
| field_eq/func     | 0.58 ms | 0.83 ms  | +43%     |
| field_order/func  | 1.18 ms | 0.77 ms  | **-35%** |

Reads:

- field_eq/bare and field_order/functional are wins. The typed ste_vec_entry `=` operator (inlines to `eq_term(a) = eq_term(b)`) and the typed `ore_cllw(ste_vec_entry)` extractor are both faster paths than the previous synthetic-root / `.data::jsonb` cast workarounds.
- field_eq/functional is the largest regression: +43%, because the new recipe `eql_v2.eq_term(col -> 'sel')` is structurally bigger than the old fused `eql_v2.hmac_256(col, 'sel')`. Both are sub-ms at 1M. The trade buys correctness: `eq_term` covers oc-bearing selectors (string / number leaves) that silently returned zero rows under the old recipe.
- contains/functional shifted from `eql_v2.ste_vec(col) @>` to `eql_v2.jsonb_array(col) @>` because GIN-on-`eql_v2_encrypted[]` fails to build under the strict-Block-ORE compare contract (#211). Same containment semantics; the ~16% slowdown at 1M is recipe-shift overhead (a different element type internally).

field_eq/extractor at 100k (~7.7 ms) is anomalous against 10k (~1.4 ms) and 1M (~0.6 ms): single-tier noise from a small sample (660 iterations vs 5000+ on other scenarios). Re-running 100k did not reproduce as severely as the first attempt (8.79 ms); this appears to be normal sample variance.
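The Δ column is a plain relative change, `(after - before) / before`, rounded to a whole percent. The sketch below reproduces it from the medians in the table; the 5% cut-off for reporting "parity" is an assumption for illustration, not a rule stated in the bench.

```rust
/// Relative change from `before_ms` to `after_ms`, in whole percent.
pub fn delta_pct(before_ms: f64, after_ms: f64) -> i64 {
    ((after_ms - before_ms) / before_ms * 100.0).round() as i64
}

/// Render a delta the way the comparison table does. Changes under 5%
/// (an assumed cut-off) are reported as "parity".
pub fn describe(before_ms: f64, after_ms: f64) -> String {
    let d = delta_pct(before_ms, after_ms);
    if d.abs() < 5 {
        "parity".to_string()
    } else {
        // `+` forces an explicit sign, so gains print as "+16%".
        format!("{d:+}%")
    }
}
```

For example, `describe(1.22, 0.84)` gives `-31%`, the field_eq/bare row.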
10M ingestion took ~3 attempts under ZeroKMS connection-drop retries (c. 100 min wall time). Once the table reached 10M rows, the bench itself ran cleanly across all five scenarios.

Median times (10M tier):

| Scenario           | 10M     | 1M      | Scaling ratio | Notes |
| ------------------ | ------- | ------- | ------------- | ----- |
| contains/func      | 6.85 ms | 0.66 ms | 10.4×         | Linear: GIN bitmap scan + heap fetch on a larger set |
| field_eq/bare      | 1.03 ms | 0.84 ms | 1.2×          | Seq Scan + LIMIT 10 early-exit; near-flat |
| field_eq/extract   | 0.74 ms | 0.57 ms | 1.3×          | GIN hmac_terms; flat |
| field_eq/func      | 0.90 ms | 0.83 ms | 1.1×          | Seq Scan + LIMIT 10 early-exit; flat |
| field_order/func   | 0.80 ms | 0.77 ms | 1.04×         | Functional CLLW btree walks in order; effectively flat |

The two flat-scaling paths are the load-bearing wins from the typed StEVec API: functional-index match through the inlined chain holds all the way to 10M. `contains/functional` scales with the matched row set, as expected.
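The scaling ratio is the 10M median divided by the 1M median. Because the row count grows 10× between those tiers, a ratio near 10 means runtime tracks row count and a ratio near 1 means it is effectively flat. A sketch of that reading; the `1.5` and `8.0` cut-offs in `classify` are arbitrary illustration values, not thresholds the bench uses.

```rust
/// Ratio of the 10M-tier median to the 1M-tier median for one scenario.
pub fn scaling_ratio(median_1m_ms: f64, median_10m_ms: f64) -> f64 {
    median_10m_ms / median_1m_ms
}

/// Rough label for a 1M -> 10M ratio (rows grow 10x between the tiers).
/// The cut-offs are illustrative assumptions.
pub fn classify(ratio: f64) -> &'static str {
    if ratio < 1.5 {
        "flat"
    } else if ratio >= 8.0 {
        "linear"
    } else {
        "sub-linear"
    }
}
```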
cipherstash/eql#223 drops `eql_v2.hmac_256_terms` (structurally
wrong under the XOR contract — it filtered out oc-bearing sv
elements, so containment via that index could never match string /
number selectors). Replacement recipe in the JSON bench's
field_eq/extractor scenario:

    WHERE value @> $1::jsonb::eql_v2.stevec_query LIMIT 10

with a functional GIN on
`(eql_v2.to_stevec_query(value)::jsonb) jsonb_path_ops`. The typed
`@>(eql_v2_encrypted, eql_v2.stevec_query)` overload inlines to a
native `jsonb @>` over the same expression, so the planner engages
Bitmap Index Scan structurally. The needle binding moves from the
old `[{"s":"<sel>","hm":"<hex>"}]` shape to the new
`{"sv":[{"s":"<sel>","hm":"<hex>"}]}` shape (sv-wrapped).
Refreshed bench results land in a follow-up commit alongside the
EQL reinstall + index swap on the bench DB.
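The needle change only wraps the old array under an `sv` key. A minimal sketch with hypothetical helper names; it uses plain string formatting and therefore assumes the selector and `hm` values are hex strings that need no JSON escaping (a real implementation would go through a JSON serializer).

```rust
/// Pre-#223 needle: a bare array of {s, hm} elements.
/// Assumes `selector` and `hm_hex` need no JSON escaping (hex strings).
pub fn needle_legacy(selector: &str, hm_hex: &str) -> String {
    format!(r#"[{{"s":"{selector}","hm":"{hm_hex}"}}]"#)
}

/// Post-#223 needle for `$1::jsonb::eql_v2.stevec_query`: the same array,
/// wrapped in an object under the `sv` key.
pub fn needle_stevec_query(selector: &str, hm_hex: &str) -> String {
    format!(r#"{{"sv":{}}}"#, needle_legacy(selector, hm_hex))
}
```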
Move off the cipherstash-suite git pins to the published crates.io pre-release that carries the ste_vec consolidation.

- `mise.toml` setup-db: download EQL from the eql-2.3.0-pre.4 release asset instead of releases/latest. A pre-release is never tagged "Latest", so /latest/ resolved to eql-2.2.1.
- `Cargo.toml`: cipherstash-client + stack-profile pinned to crates.io `=0.34.1-alpha.9`. Both ship from the same workspace and must stay version-locked: cipherstash-client implements `KeyProvider` for stack-profile's `ProfileStore`, so a source mismatch yields two incompatible `ProfileStore` types.

API migration for the alpha.9 `encrypt_eql` change: it now returns `Vec<EqlOutput>` (`Store(EqlCiphertext)` | `Query(EqlQueryPayload)`) instead of `Vec<EqlCiphertext>`.

- `lib.rs`: storage inserts unwrap `EqlOutput::Store`; `EncryptedQuery.eql` is now `EqlQueryPayload` (`build_query` uses `EqlOperation::Query`).
- `encrypt_combo.rs`: unwrap to `EqlCiphertext` before the `chunks_exact(3)` row reassembly, since `EqlOutput` is intentionally not `Clone`.
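The unwrap-then-reassemble order can be sketched with local stand-in types. `EqlCiphertext`, `EqlQueryPayload` and `EqlOutput` below are simplified stand-ins named after the cipherstash-client types, not the real definitions. Making the stand-in `EqlCiphertext` `Clone`, and assuming the flat list is ordered row by row with three cells per row, are assumptions of this sketch; only `EqlOutput` mirrors the stated property of not being `Clone`.

```rust
// Stand-ins named after the cipherstash-client types; NOT the real definitions.
// Assumption: the stand-in ciphertext is Clone so rows can own their cells.
#[derive(Debug, Clone, PartialEq)]
pub struct EqlCiphertext(pub String);

#[derive(Debug)]
pub struct EqlQueryPayload(pub String);

/// Mirrors the Store | Query split; deliberately not Clone.
#[derive(Debug)]
pub enum EqlOutput {
    Store(EqlCiphertext),
    Query(EqlQueryPayload),
}

/// Storage inserts expect only Store outputs; a Query output is a bug.
pub fn into_store(outputs: Vec<EqlOutput>) -> Result<Vec<EqlCiphertext>, String> {
    outputs
        .into_iter()
        .map(|o| match o {
            EqlOutput::Store(ct) => Ok(ct),
            EqlOutput::Query(_) => Err("expected EqlOutput::Store, got Query".to_string()),
        })
        .collect()
}

/// Regroup a flat list (three cells per row, row by row) into rows.
pub fn into_rows(cts: &[EqlCiphertext]) -> Result<Vec<[EqlCiphertext; 3]>, String> {
    let chunks = cts.chunks_exact(3);
    if !chunks.remainder().is_empty() {
        return Err(format!("{} ciphertexts is not a multiple of 3", cts.len()));
    }
    Ok(chunks.map(|c| [c[0].clone(), c[1].clone(), c[2].clone()]).collect())
}
```

Unwrapping first means the cloning in `into_rows` only ever touches the plain ciphertext type, never `EqlOutput`.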
`benches/json.rs`:

- Drop dead `op` / `eql_v2.ope_cllw` handling: Compat OPE-CLLW was removed from `eql_v2_encrypted` in EQL 2.3. The orderable-tag scan and `ore_extractor_for` now cover `ob` / `oc` only.
- Correct stale header docs that still described the removed `hmac_256_terms` GIN and fused `hmac_256(col, text)` recipes. The query strings were already on the 2.3 typed recipes (`jsonb_array` containment, `stevec_query @>`, `eq_term`, typed `->`).

`sql/indexes/json_ste_vec_small_encrypted*`: regenerate `_up`/`_down` for all five tiers. The old `_up.sql` created `GIN (eql_v2.hmac_256_terms(value))`, and `hmac_256_terms` was removed in EQL 2.3, so CREATE INDEX would hard-fail during setup-db. New indexes pair with the bench queries:

- `GIN (eql_v2.jsonb_array(value))`: contains/functional
- `GIN ((eql_v2.to_stevec_query(value)::jsonb) jsonb_path_ops)`: field_eq/extractor (XOR-aware, covers hm- and oc-bearing selectors)
- Optional positional arg caps the largest row-count tier: `mise run bench:query:all 1000000` runs 10k/100k/1M and skips the slow 10M tier. Bare invocation still runs every tier through 10M. Non-numeric or below-smallest-tier args are rejected.
- Un-park bench:query:json: it now runs in the loop alongside exact/match/ore/group_by/combo. It was parked pending a cipherstash-client release emitting the post-2.3 ste_vec shape; cipherstash-client 0.34.1-alpha.9 + EQL 2.3.0-pre.4 provide it.
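The tier-cap rule can be expressed as a small pure function. This is a Rust sketch of the logic only (the real guard lives in the mise task script), and `TIERS` assumes the four tiers named in this PR; `select_tiers` is a hypothetical name.

```rust
/// Row-count tiers assumed for the bench loop, smallest first.
pub const TIERS: [u64; 4] = [10_000, 100_000, 1_000_000, 10_000_000];

/// Tiers to run for an optional positional cap. No cap runs every tier;
/// a non-numeric cap or one below the smallest tier is rejected.
pub fn select_tiers(arg: Option<&str>) -> Result<Vec<u64>, String> {
    let Some(raw) = arg else {
        return Ok(TIERS.to_vec());
    };
    let cap: u64 = raw
        .parse()
        .map_err(|_| format!("row-count cap must be numeric, got {raw:?}"))?;
    if cap < TIERS[0] {
        return Err(format!("cap {cap} is below the smallest tier {}", TIERS[0]));
    }
    Ok(TIERS.iter().copied().filter(|&t| t <= cap).collect())
}
```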
mise renders task `run` scripts through Tera before executing them.
The empty-tier guard used bash array-length syntax, whose `{#` opens a
Tera comment with no close — Tera render failed, so mise never ran the
script and instead dumped the raw source with "task failed".
Use `[ -z "${ROW_COUNTS[*]}" ]` (empty string when the array has no
elements) — no `{#`, Tera-safe. Same applies to any future run script:
keep `{{`, `{%`, `{#` out of task bodies, comments included.
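The rule generalises to a simple lint: scan a task body for any of the three Tera openers before committing it. A sketch with a hypothetical function name; it is purely textual, which matches the point that comments are rendered too.

```rust
/// Tera opening delimiters: expression, statement and comment.
const TERA_OPENERS: [&str; 3] = ["{{", "{%", "{#"];

/// Every Tera opener found in a task script body, as (byte offset, opener),
/// sorted by offset. Textual scan: shell comments are checked as well.
pub fn find_tera_openers(script: &str) -> Vec<(usize, &'static str)> {
    let mut hits: Vec<(usize, &'static str)> = TERA_OPENERS
        .iter()
        .flat_map(move |&pat| script.match_indices(pat).map(move |(i, _)| (i, pat)))
        .collect();
    hits.sort_unstable();
    hits
}
```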
setup-db downloads EQL from the eql-2.3.0-pre.4 release by default.
Setting EQL_SQL=<path> installs a local build instead — for testing an
unreleased EQL fix before a pre-release is cut:

    EQL_SQL=../encrypt-query-language/release/cipherstash-encrypt.sql \
      mise run setup-db

Default behaviour (download eql-2.3.0-pre.4) is unchanged.
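The override's decision can be sketched as a pure function over the `EQL_SQL` value. `EqlSource` and `resolve_eql_source` are hypothetical names, the real logic is in the mise `setup-db` task, and treating an empty `EQL_SQL` as unset is an assumption of this sketch rather than documented behaviour.

```rust
use std::path::PathBuf;

/// Where setup-db takes the EQL install SQL from.
#[derive(Debug, PartialEq)]
pub enum EqlSource {
    /// Download the install SQL attached to this release tag.
    Release(&'static str),
    /// Install a local build from this path.
    Local(PathBuf),
}

/// Default release tag at the time of this commit.
pub const DEFAULT_EQL_RELEASE: &str = "eql-2.3.0-pre.4";

/// Resolve the source; callers would pass `std::env::var("EQL_SQL").ok()`.
/// Assumption: an empty or whitespace-only value counts as unset.
pub fn resolve_eql_source(eql_sql: Option<String>) -> EqlSource {
    match eql_sql.filter(|p| !p.trim().is_empty()) {
        Some(path) => EqlSource::Local(PathBuf::from(path)),
        None => EqlSource::Release(DEFAULT_EQL_RELEASE),
    }
}
```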
…(file)

`mise run report` now prints every benchmark scenario, median runtime per scenario, slowest first: the same table as `report:slow`, just with no threshold. The full Markdown-file generator moves to `report:build`.

- `find_slow_queries.py`: add `--all` (list every scenario, ignore the threshold; "All N scenarios ..." header).
- `mise.toml`: rename `report` -> `report:build`; new `report` runs `find_slow_queries.py --all`.
- `README.md` / `README_REPORT.md` / `report/README.md`: point existing `mise run report` references at `report:build`; document the new `report` overview.
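Both `report` and `report:slow` reduce to the same filter-and-sort over per-scenario medians. A Rust sketch of that shape (the real script is Python, `find_slow_queries.py`); the `Scenario` type and `report` function are hypothetical, and 100 ms is used in the example only because it matches the >100ms slow list mentioned earlier in this PR.

```rust
/// One scenario's median runtime from a bench run (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Scenario {
    pub name: String,
    pub median_ms: f64,
}

/// Slowest-first listing. `Some(t)` keeps only medians above `t` ms
/// (the `report:slow` shape); `None` keeps everything (the `--all` shape).
pub fn report(mut rows: Vec<Scenario>, threshold_ms: Option<f64>) -> Vec<Scenario> {
    if let Some(t) = threshold_ms {
        rows.retain(|s| s.median_ms > t);
    }
    rows.sort_by(|a, b| b.median_ms.total_cmp(&a.median_ms));
    rows
}
```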
…enarios

field_eq/functional and field_order/functional are meant to measure per-selector functional indexes: `hash (eql_v2.eq_term(value -> 'sel'))` and `btree (<ore_extractor>(value -> 'sel'))`. ste_vec equality and ordering are per-selector (the index expression embeds the selector hash), so these indexes can't be declared in the static `sql/indexes/*_up.sql`; the selector is only known once `sample_needles` has run.

They were never created, so field_order/functional was an un-indexed Parallel Seq Scan + top-N Sort: ~10s at 1M rows. With the index it is an Index Scan (cost ~13 for LIMIT 10), sub-millisecond.

Add `create_field_indexes()`, called once at bench startup after `sample_needles` picks the selectors and before the criterion loop (index build + ANALYZE is one-time setup, outside what criterion measures). Refresh the header comments that described these scenarios as un-indexed baselines.
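A sketch of the DDL a `create_field_indexes()`-style step could emit for one selector. The index-naming scheme, `IF NOT EXISTS`, and the helper names are assumptions; the extractor is passed in because the btree side depends on the selector's orderable tag (`eql_v2.ore_cllw` for `oc`).

```rust
/// Quote a string as a SQL literal by doubling any embedded single quotes.
fn sql_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// Keep only [A-Za-z0-9_] so the selector can be used in an index name.
fn ident_suffix(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect()
}

/// Hash index for equality, btree index for ordering, then ANALYZE so the
/// planner has statistics on the new expressions. Hypothetical naming.
pub fn field_index_ddl(table: &str, selector: &str, ore_extractor: &str) -> Vec<String> {
    let sel = sql_literal(selector);
    let suffix = ident_suffix(selector);
    vec![
        format!(
            "CREATE INDEX IF NOT EXISTS {table}_eq_{suffix} ON {table} \
             USING hash (eql_v2.eq_term(value -> {sel}))"
        ),
        format!(
            "CREATE INDEX IF NOT EXISTS {table}_ord_{suffix} ON {table} \
             USING btree ({ore_extractor}(value -> {sel}))"
        ),
        format!("ANALYZE {table}"),
    ]
}
```

Postgres truncates identifiers longer than 63 bytes, so with long table names and selector hashes, two generated index names could collide after truncation; a real implementation may want a shorter suffix.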
The four selective ORE range scenarios (range_selective_gt_100, range_highly_selective_gt_10, and both *_count variants) degrade into near-full sequential scans at the 10M tier — seconds per iteration. Root cause is a planner limitation, not the bench or the index: the comparison value is a bound parameter, not a plan-time constant, so the planner cannot estimate the selectivity of the encrypted ORE comparison and falls back to DEFAULT_INEQ_SEL (33%), picking a Seq Scan. Tracked in EQL issue #230. The scenarios are commented out (not removed) so they restore by un-commenting once #230 lands a selectivity fix. Non-selective baselines and the hybrid ordered-range scenario remain enabled.
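The fallback is easy to quantify. `DEFAULT_INEQ_SEL` is defined in Postgres's `src/include/utils/selfuncs.h` as one third, so for an inequality it cannot estimate, the planner assumes a third of the table matches. A sketch of the resulting row estimate (the function name is hypothetical):

```rust
/// Postgres's fallback selectivity for an inequality it cannot estimate
/// (DEFAULT_INEQ_SEL in src/include/utils/selfuncs.h).
pub const DEFAULT_INEQ_SEL: f64 = 0.3333333333333333;

/// Rows the planner assumes match `col > $1` when it has no usable estimate.
pub fn fallback_row_estimate(table_rows: u64) -> u64 {
    (table_rows as f64 * DEFAULT_INEQ_SEL).round() as u64
}
```

At 10M rows that is an assumed ~3.3M matches, a row count at which a sequential scan looks cheaper to the planner than walking the index, however few rows the selective predicate actually returns.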
Regenerated bench results and charts from a full re-run against EQL 2.3 recipes, including the 10M ORE tier. Reflects the disabled selective ORE scenarios from the preceding commit.
The eql-2.3.0 release is out; setup-db was still downloading the eql-2.3.0-pre.4 pre-release. The EQL_SQL local-build override is unchanged.
Surfaces query-only medians across the four row-count tiers in the README's View Results section, so the top-line performance picture is visible without opening the full report.
Bench-side counterpart to EQL 2.3 (consolidation began in encrypt-query-language#219; shipped as `eql-2.3.0`): ORE-only ste_vec consolidation, strict `eql_v2.compare`, functional-index recipes, and the typed `->` / `stevec_query` surface.

## Summary
Full bench suite re-run against the EQL 2.3 line across four row-count tiers — 10k / 100k / 1M / 10M — with fresh data ingested under the 2.3 client. Equality, containment, free-text, Block-ORE range, JSON field access, GROUP BY and combo paths all hold up with no regressions. One rough edge — selective ORE range queries — is an EQL planner-stats limitation tracked as #230.
## Headline numbers

Query-only medians, no decrypt. Full per-scenario tables (SQL, planner index choices, EXPLAIN trees) live in `report/` (`ore.md`, `json.md`, `exact.md`, `match.md`, `group_by.md`, `combo.md`).

(`range_lt_hybrid_ordered_10` has no 10k entry: the 10k ORE result set predates the scenario.)

## Disabled: selective ORE range scenarios
`range_selective_gt_100`, `range_highly_selective_gt_10`, `range_selective_gt_count` and `range_highly_selective_gt_count` are commented out in `benches/ore.rs`. At the 10M tier they degrade into 2–3 s sequential scans: the planner cannot estimate the selectivity of an encrypted ORE comparison with a bound-parameter operand, falls back to `DEFAULT_INEQ_SEL` (33%), and picks a Seq Scan over the functional index. This is an EQL planner-stats limitation, not a bench or index fault, tracked as encrypt-query-language#230. Their last-recorded numbers remain in `report/ore.md`; the scenarios restore by un-commenting once #230 lands a selectivity fix.

## What changed on the bench side
- `benches/json.rs` updated for 2.3 recipes (`oc` → `eql_v2.ore_cllw`, ste_vec `mode: Standard`).
- `setup-db` installs `eql-2.3.0-pre.4` by default, with an `EQL_SQL` env override for a local EQL build. Follow-up: bump the default to the final `eql-2.3.0` release.
- `cipherstash-client` bumped to the EQL-2.3-aligned pre-release; `encrypt_eql` now returns `Vec<EqlOutput>` (`Store` / `Query`).
- `benches/json.rs` field indexes. `field_eq/*` and `field_order/*` build per-selector functional indexes at startup: `hash (eql_v2.eq_term(col -> sel))` and `btree (eql_v2.ore_cllw(col -> sel))`. This closes the old `field_order` gap: formerly a ~20 s seq scan at 1M, now sub-ms at every tier (see headline table).
- `benches/ore.rs`. Selective scenarios disabled (#230, above); the natural-form ordered scenario (`range_lt_natural_ordered_10`) carried in from `main`.
- `report` split into `report` (terminal overview) + `report:build` (file); `bench:query:all` gained a `--max-rows` cap.

## What this confirms about EQL 2.3
- `eql_hash`, `contains/functional`, `field_eq/functional` and `field_order/functional` are all sub-ms through 1M; only `contains` rises at 10M (6.8 ms), still well within budget. This is consistent with the 2.3 functional indexes engaging without query rewriting.
- `field_order` gap closed. The 2.3 `ore_cllw` btree opclass plus the per-selector functional index turn the formerly pathological JSON ordered-field scenario into a sub-ms query at every tier.
- `Seq Scan + LIMIT` over the index by design: cheaper when matches are dense. See the per-scenario EXPLAIN notes in `report/ore.md`.

## Operational notes
- `*_decrypt` scenarios on ZeroKMS) is resolved.
- `ORDER BY` scenarios use the extractor form (`ORDER BY eql_v2.ore_block_u64_8_256(col)`).

## Test plan

- `mise run report:build` regenerates the report, per-family pages and charts without error.
- `cargo check --bench ore` / `--bench json` clean after the EQL 2.3 + #230 changes.