bench(ore): add natural-form ORDER BY scenario alongside hybrid by coderdan · Pull Request #13 · cipherstash/benches

coderdan · 2026-05-18T08:55:03Z

Summary

- Re-adds the `WHERE value < $1 ORDER BY value LIMIT 10` natural-form scenario to `benches/ore.rs`, paired with the existing `range_lt_hybrid_ordered_10` so the two together quantify the cost of the natural-form sort-key shortcut.
- Refreshes ORE bench results (10k / 100k / 1M) against the current EQL DEV install carrying cipherstash/encrypt-query-language#211 / EQL #218 (range-operator inlining).
- Regenerates `report/BENCHMARK_REPORT.md` + `report/ore.md` + charts. Adds `query_ore_range_lt_natural_ordered_10_chart.png`.

Why

The natural-form ORDER BY scenario was dropped from this bench in an earlier pass (the "§4 sort-key trap"). Once #211 / #218 closed the per-comparison perf cliff (each ORE-term comparison is now inlinable SQL rather than dispatched through a plpgsql eql_v2.compare() body), it was worth re-measuring to confirm whether the hybrid-form recommendation in docs/reference/query-performance.md §4 still holds.

It does. At 50% selectivity:

| Rows | natural form | hybrid form | slowdown |
|------|--------------|-------------|----------|
| 10k  | 18 ms        | 1.7 ms      | 10×      |
| 100k | 1.2 s        | 1.7 ms      | 697×     |
| 1M   | 8.85 s       | 1.3 ms      | 6988×    |
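
As a quick sanity check on the table, the slowdown column is the natural-form latency divided by the hybrid-form latency. The sketch below uses the rounded figures shown above; the published ratios were presumably computed from unrounded timings (an assumption), so the recomputed values agree only to within a few percent.

```python
# Rounded latencies from the table above, in milliseconds, plus the
# published slowdown. Assumption: the published column was derived from
# unrounded timings, so these ratios only approximate it.
ROWS = {
    "10k": (18.0, 1.7, 10),
    "100k": (1200.0, 1.7, 697),
    "1M": (8850.0, 1.3, 6988),
}


def slowdown(natural_ms: float, hybrid_ms: float) -> float:
    """Ratio of natural-form latency to hybrid-form latency."""
    return natural_ms / hybrid_ms


def relative_gap(computed: float, published: float) -> float:
    """Relative difference between a recomputed and a published ratio."""
    return abs(computed - published) / published


for tier, (natural_ms, hybrid_ms, published) in ROWS.items():
    ratio = slowdown(natural_ms, hybrid_ms)
    gap = relative_gap(ratio, published)
    print(f"{tier}: recomputed {ratio:.1f}x vs published {published}x ({gap:.1%} apart)")
```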

The natural form is no longer the 6.3 s perf cliff it was pre-#218, but the residual Top-N Sort over the bitmap-scan output still dominates at any meaningful row count — the per-comparison cost has dropped, but the work is O(n log n) over the post-WHERE set. The bench now makes this delta explicit so the perf guide §4 has a numerical anchor.
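
To see why the residual sort dominates even when each comparison is cheap, here is a toy model (not the PostgreSQL executor) that counts key comparisons for the two plan shapes: a bounded top-10 selection over an unordered post-WHERE set, versus taking the first 10 rows from input that already arrives ordered. The `CountingKey` class, the function names, and the row counts are illustrative assumptions.

```python
import heapq
from itertools import islice


class CountingKey:
    """Wraps a value and counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingKey.comparisons += 1
        return self.value < other.value


def top_n_unordered(values, n=10):
    """Natural-form analogue: every row is examined to find the n smallest."""
    CountingKey.comparisons = 0
    smallest = heapq.nsmallest(n, (CountingKey(v) for v in values))
    return [k.value for k in smallest], CountingKey.comparisons


def top_n_preordered(ordered_values, n=10):
    """Hybrid-form analogue: input is already ordered, so stop after n rows."""
    return list(islice(ordered_values, n)), 0
```

In this model the unordered path needs at least one comparison per row beyond the first ten, so its cost grows with the size of the filtered set, while the pre-ordered path does no comparisons at all regardless of input size.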

Test plan

- `cargo build --bench ore --release` succeeds
- `mise run bench:query:ore 10000` completes, includes `range_lt_natural_ordered_10` and `range_lt_hybrid_ordered_10` in `results/query/ore_rows_10000.json`
- Same at 100k and 1M
- `mise run report` regenerates without errors; ORE section in `report/BENCHMARK_REPORT.md` includes the new scenario with description text
- New chart `report/query_ore_range_lt_natural_ordered_10_chart.png` renders
- CI green
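
The results-file check in the test plan can be automated in a few lines. This is only a sketch: the JSON schema of `ore_rows_10000.json` is not shown in this PR, so the helper below makes no assumption about structure and simply searches the raw file text for the scenario names. The function name `missing_scenarios` is hypothetical.

```python
from pathlib import Path

# Scenario names taken from the test plan above.
EXPECTED = ("range_lt_natural_ordered_10", "range_lt_hybrid_ordered_10")


def missing_scenarios(results_path, expected=EXPECTED):
    """Return the expected scenario names that do not appear in the file.

    Plain substring search: it does not parse the JSON, so it cannot tell a
    scenario key from the same text appearing inside a longer name.
    """
    text = Path(results_path).read_text(encoding="utf-8")
    return [name for name in expected if name not in text]
```

Calling `missing_scenarios("results/query/ore_rows_10000.json")` after a bench run should return an empty list when both scenarios were recorded; the same call applies to the 100k and 1M result files.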

Re-adds the `WHERE value < $1 ORDER BY value LIMIT 10` scenario that was dropped from this bench in an earlier cleanup pass. It pairs with the existing `range_lt_hybrid_ordered_10` (`ORDER BY eql_v2.ore_block_u64_8_256(value)`) so the two scenarios together quantify the cost of taking the natural-form sort-key shortcut.

Both scenarios use the same predicate and `LIMIT`; only the `ORDER BY` form differs. The hybrid form's sort key matches the functional index expression syntactically, so rows stream out of the index already ordered (no Sort node). The natural form's sort key doesn't match, so the plan keeps a residual Top-N Sort over the bitmap-scan output; post-EQL #218 each comparison in that Sort is the inlined ORE-term path, but the Sort itself still scales with the size of the post-WHERE set.

Numbers at 50% selectivity:

| rows | natural | hybrid | slowdown |
|------|---------|--------|----------|
| 10k  | 18 ms   | 1.7 ms | 10×      |
| 100k | 1.2 s   | 1.7 ms | 697×     |
| 1M   | 8.85 s  | 1.3 ms | 6988×    |

The natural form is no longer the 6.3 s perf cliff it was pre-#218 (each comparison is now inlinable rather than dispatched through a plpgsql `eql_v2.compare()` body), but the residual Sort still dominates at any meaningful row count. The bench now makes that delta explicit so `docs/reference/query-performance.md` §4 has a numerical anchor.

`report_benchmarks.py` gets the new scenario in `sql_map` and `descriptions`, and the report is regenerated. Result sidecars (`ore_rows_*.json` + `ore_metadata_*.json`) are refreshed at 10k/100k/1M against the current EQL DEV install (carries #211 / EQL #218 inlining).
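
For reference, the two scenarios differ only in the sort key. The sketch below writes them out the way a `sql_map`-style lookup might hold them. The table name `bench_ore`, the `SELECT *` projection, the dictionary layout, and the helper `order_by_clause` are assumptions for illustration, not the actual contents of `report_benchmarks.py`.

```python
# Hypothetical table name and projection; only the predicate, the two
# ORDER BY forms, and LIMIT 10 come from the commit message.
TABLE = "bench_ore"
PREDICATE = "WHERE value < $1"

sql_map = {
    # Natural form: sort key does not match the functional index expression.
    "range_lt_natural_ordered_10": (
        f"SELECT * FROM {TABLE} {PREDICATE} ORDER BY value LIMIT 10"
    ),
    # Hybrid form: sort key matches the functional index expression.
    "range_lt_hybrid_ordered_10": (
        f"SELECT * FROM {TABLE} {PREDICATE} "
        "ORDER BY eql_v2.ore_block_u64_8_256(value) LIMIT 10"
    ),
}


def order_by_clause(sql: str) -> str:
    """Extract the ORDER BY clause (without LIMIT) from a query string."""
    start = sql.index("ORDER BY")
    end = sql.index(" LIMIT", start)
    return sql[start:end]
```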

Bench-side counterpart to encrypt-query-language#219 (ORE-only ste_vec consolidation + strict eql_v2.compare contract).

- `Cargo.toml`: cipherstash-client pin moves from `dan/zerokms-unexpected-error-context` to `main`. Main now carries the ste_vec consolidation (suite#1955 + the post-#1955 `ocf`/`ocv` → `oc` collapse). Note: this branch predates the RequestFailed Display fix (#1960), so encrypt-side ZeroKMS errors surface as "Unexpected error" again; the bench's retry loop in `prepare:_table` is the safety net.
- API renames across encrypt binaries + `src/lib.rs`:
  - `ColumnType::Utf8Str` → `ColumnType::Text`
  - `ColumnType::JsonB` → `ColumnType::Json`
  - `Plaintext::Utf8Str` → `Plaintext::Text`
  - `Plaintext::JsonB` → `Plaintext::Json`
- `src/bin/encrypt_ste_vec_small.rs` + `_large.rs`: set `IndexType::SteVec { mode: SteVecMode::Standard, .. }` so sv elements emit `oc` (CLLW ORE) for orderable terms. EQL 2.3's `eql_v2_encrypted` type only handles ORE; OPE (Compat mode) is moving to a separate encrypted column type in a future release.
- `benches/json.rs`:
  - `ore_extractor_for` map updated: `oc` → `eql_v2.ore_cllw`, `op` → `eql_v2.ope_cllw` (placeholder; OPE bench path is exercised via the Compat-mode encrypt binary which isn't current default).
  - Picker rewritten to sample **two** selectors independently: one hm-bearing for field_eq scenarios, one orderable-bearing for field_order scenarios. Post-#1955 these are typically disjoint (the array-prefix selector lookup element carries `hm`; the value elements carry `oc`). The old single-needle picker skipped field_eq/* when it picked an `oc`-only element.
- `benches/ore.rs`: re-added `range_lt_natural_ordered_10` scenario (carried forward from feat/ore-natural-form-bench, PR #13).
- `report_benchmarks.py`: new entries for `range_lt_natural_ordered_10` and the json/field_eq scenario.

Six bench families at 100k + 1M (78 scenarios total) under fresh data ingested with the consolidation client. See `report/BENCHMARK_REPORT.md` for the full table; key signals:

- **JSON contains/functional via GIN ste_vec**: 3.88 ms @ 100k → 4.32 ms @ 1M (flat; GIN sub-linear). Index engages cleanly.
- **JSON field_eq/* (inlined hmac path)**: 611-1250 µs @ 100k, 634 µs-1.1 ms @ 1M. Sub-millisecond at 1M; confirms post-#205 hmac inlining + hmac_256_terms GIN. Planner picks seq scan over GIN at 100k (cost-model edge case); both paths sub-ms.
- **JSON field_order/* (extractor-driven sort)**: 7.85 s @ 100k → 28.9 s @ 1M, scales linearly. No functional-index match because no opclass on `eql_v2.ore_cllw`; tracked as encrypt-query-language#220 (restore CLLW ORE opclass).
- **ORE block range queries**: 1-2 ms @ 100k, sub-ms @ 1M (selective). Functional `eql_v2.ore_block_u64_8_256` index engages.
- **EXACT (`=` via hmac on root scalars)**: sub-ms @ both tiers. Functional hash index engages.
- **MATCH (LIKE via bloom_filter GIN)**: ms-range at both tiers.
- **GROUP_BY low-cardinality**: HashAggregate via inlined hmac_256; matches the EQL query-perf guide §5 recipe.

Re-ingest of integer/string/category/combo at 100k+1M was required because the previous data was encrypted under a stale keyset that the current workspace can no longer decrypt; without this, the `_decrypt/*` scenarios would 403 on ZeroKMS.

JSON ste_vec data was re-ingested at 100k+1M with `SteVecMode::Standard`.

EQL installed: the **supabase variant** (no operator classes), required for ANALYZE to succeed on hm-only tables under the strict-compare contract from #219. Aligns with U-001's functional-index recommendation.

- 10M tier (existing data still present but stale-keyset; would need several hours of re-ingest).
- Compat-mode bench scenarios exercising `eql_v2.ope_cllw`: OPE moves to a separate encrypted type in a future release; no EQL recipe to bench against today.
- field_order with functional ORE index match: blocked on encrypt-query-language#220 (operator class on `eql_v2.ore_cllw`).
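
The two-selector picker described for `benches/json.rs` can be sketched as follows. The real picker is Rust and its code is not shown on this page; this Python sketch assumes each ste_vec element is a mapping with a selector under `s` plus optional `hm`, `oc`, or `op` term fields. That field layout and the function name `pick_selectors` are assumptions; only the `oc`/`op` extractor mapping comes from the commit message.

```python
import random

# Term field -> extractor function, as listed in the commit message.
ORE_EXTRACTOR_FOR = {"oc": "eql_v2.ore_cllw", "op": "eql_v2.ope_cllw"}


def pick_selectors(elements, rng=random):
    """Sample one hm-bearing and one orderable-bearing selector independently.

    Returns (eq_selector, order_pick), where order_pick is a
    (selector, extractor) tuple. Either part is None when no element
    qualifies, so one scenario family can be skipped without the other.
    """
    eq_candidates = [e["s"] for e in elements if "hm" in e]
    order_candidates = [
        (e["s"], extractor)
        for e in elements
        for field, extractor in ORE_EXTRACTOR_FOR.items()
        if field in e
    ]
    eq_selector = rng.choice(eq_candidates) if eq_candidates else None
    order_pick = rng.choice(order_candidates) if order_candidates else None
    return eq_selector, order_pick
```

Sampling the two roles separately means an `oc`-only element can no longer cause the field_eq scenarios to be skipped, which is the failure mode the commit message attributes to the old single-needle picker.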

coderdan mentioned this pull request May 18, 2026

bench: refresh full query + ingest suite (10k–10M) against EQL 2.3 #14

Merged

5 tasks

coderdan marked this pull request as ready for review May 19, 2026 12:33

coderdan merged commit 66ca6fa into main May 19, 2026

coderdan merged 1 commit into main from feat/ore-natural-form-bench
