bench(ore): add natural-form ORDER BY scenario alongside hybrid #13
Merged
Re-adds the `WHERE value < $1 ORDER BY value LIMIT 10` scenario that was dropped from this bench in an earlier cleanup pass. It pairs with the existing `range_lt_hybrid_ordered_10` (`ORDER BY eql_v2.ore_block_u64_8_256(value)`), so the two scenarios together quantify the cost of taking the natural-form sort-key shortcut.

Both scenarios use the same predicate and `LIMIT`; only the `ORDER BY` form differs. The hybrid form's sort key matches the functional index expression syntactically, so rows stream out of the index already ordered (no Sort node). The natural form's sort key doesn't match, so the plan keeps a residual Top-N Sort over the bitmap-scan output. Since EQL #218, each comparison in that Sort takes the inlined ORE-term path, but the Sort itself still scales with the size of the post-WHERE set.

Numbers at 50% selectivity:

| rows | natural | hybrid | slowdown |
|------|---------|--------|----------|
| 10k  | 18 ms   | 1.7 ms | 10×      |
| 100k | 1.2 s   | 1.7 ms | 697×     |
| 1M   | 8.85 s  | 1.3 ms | 6988×    |

The natural form is no longer the 6.3 s perf cliff it was before #218 (each comparison is now inlinable rather than dispatched through a plpgsql `eql_v2.compare()` body), but the residual Sort still dominates at any meaningful row count. The bench now makes that delta explicit, so `docs/reference/query-performance.md` §4 has a numerical anchor.

`report_benchmarks.py` gets the new scenario in `sql_map` and `descriptions`, and the report is regenerated. Result sidecars (`ore_rows_*.json` + `ore_metadata_*.json`) are refreshed at 10k/100k/1M against the current EQL DEV install (which carries #211 / EQL #218 inlining).
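As a quick sanity check on the slowdown column: the ratio is simply natural-form latency divided by hybrid-form latency. The sketch below recomputes it from the rounded 50%-selectivity timings above; `TIMINGS_S` and `slowdown` are hypothetical names for illustration, not part of `report_benchmarks.py`. The published ratios were presumably computed from unrounded measurements, so recomputing from the rounded figures lands close to them but not exactly on them.

```python
# Hypothetical helper: recompute natural/hybrid slowdown from the rounded
# 50%-selectivity timings quoted in the PR description (values in seconds).
TIMINGS_S = {
    "10k": (18e-3, 1.7e-3),
    "100k": (1.2, 1.7e-3),
    "1M": (8.85, 1.3e-3),
}


def slowdown(natural_s: float, hybrid_s: float) -> float:
    """Ratio of natural-form latency to hybrid-form latency."""
    if hybrid_s <= 0:
        raise ValueError("hybrid latency must be positive")
    return natural_s / hybrid_s


if __name__ == "__main__":
    for rows, (natural, hybrid) in TIMINGS_S.items():
        print(f"{rows:>5}: {slowdown(natural, hybrid):,.0f}x")
```

Against these rounded inputs it prints 11x, 706x and 6,808x, which is why small differences from the table's 10×, 697× and 6988× are expected.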
coderdan added a commit that referenced this pull request on May 20, 2026
Bench-side counterpart to encrypt-query-language#219 (ORE-only ste_vec
consolidation + strict eql_v2.compare contract).
- `Cargo.toml`: cipherstash-client pin moves from
`dan/zerokms-unexpected-error-context` to `main`. Main now carries the
ste_vec consolidation (suite#1955 + the post-#1955 `ocf`/`ocv` → `oc`
collapse). Note: this branch predates the RequestFailed Display fix
(#1960), so encrypt-side ZeroKMS errors surface as "Unexpected error"
again — the bench's retry loop in `prepare:_table` is the safety net.
- API renames across encrypt binaries + `src/lib.rs`:
  - `ColumnType::Utf8Str` → `ColumnType::Text`
  - `ColumnType::JsonB` → `ColumnType::Json`
  - `Plaintext::Utf8Str` → `Plaintext::Text`
  - `Plaintext::JsonB` → `Plaintext::Json`
- `src/bin/encrypt_ste_vec_small.rs` + `_large.rs`: set
`IndexType::SteVec { mode: SteVecMode::Standard, .. }` so sv elements
emit `oc` (CLLW ORE) for orderable terms. EQL 2.3's `eql_v2_encrypted`
type only handles ORE; OPE (Compat mode) is moving to a separate
encrypted column type in a future release.
- `benches/json.rs`:
  - `ore_extractor_for` map updated: `oc` → `eql_v2.ore_cllw`, `op` →
    `eql_v2.ope_cllw` (placeholder; the OPE bench path is exercised via the
    Compat-mode encrypt binary, which isn't the current default).
  - Picker rewritten to sample **two** selectors independently: one
    hm-bearing for field_eq scenarios, one orderable-bearing for
    field_order scenarios. Post-#1955 these are typically disjoint
    (the array-prefix selector lookup element carries `hm`; the value
    elements carry `oc`). The old single-needle picker skipped
    field_eq/* when it picked an `oc`-only element.
- `benches/ore.rs`: re-added `range_lt_natural_ordered_10` scenario
(carried forward from feat/ore-natural-form-bench, PR #13).
- `report_benchmarks.py`: new entries for `range_lt_natural_ordered_10`
and the json/field_eq scenario.
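The two-selector picker and the extractor map described above can be sketched roughly as follows. This is Python rather than the Rust in `benches/json.rs`, and it is an assumption-laden illustration: the element shape (a dict with the selector under a hypothetical `"s"` key plus whichever term keys the element carries) and the names `pick_selectors` / `ORE_EXTRACTOR_FOR` are made up here; only the `oc`/`op` → extractor mapping comes from the commit message.

```python
import random

# Term key -> SQL extractor, per the commit message (`op` is a placeholder).
ORE_EXTRACTOR_FOR = {
    "oc": "eql_v2.ore_cllw",
    "op": "eql_v2.ope_cllw",
}


def pick_selectors(elements, rng):
    """Sample an hm-bearing selector and an orderable selector independently.

    `elements` is a hypothetical list of dicts: the selector lives under "s",
    and each dict carries whichever term keys ("hm", "oc", ...) it has.
    Returns (hm_selector, (orderable_selector, extractor)); either half is
    None when no element carries the needed term, so only that family skips.
    """
    hm_pool = [e["s"] for e in elements if "hm" in e]
    ord_pool = [
        (e["s"], extractor)
        for e in elements
        for key, extractor in ORE_EXTRACTOR_FOR.items()
        if key in e
    ]
    hm_pick = rng.choice(hm_pool) if hm_pool else None
    ord_pick = rng.choice(ord_pool) if ord_pool else None
    return hm_pick, ord_pick


if __name__ == "__main__":
    # Hypothetical sample mirroring the disjoint shape: the prefix element
    # carries `hm`, the value elements carry `oc`.
    sample = [
        {"s": "prefix", "hm": "..."},
        {"s": "value_a", "oc": "..."},
        {"s": "value_b", "oc": "..."},
    ]
    print(pick_selectors(sample, random.Random(0)))
```

Drawing from two pools is what removes the old failure mode: an `oc`-only element can no longer be the single needle that leaves field_eq/* without an hm-bearing selector.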
Six bench families at 100k + 1M (78 scenarios total) under fresh data
ingested with the consolidation client. See `report/BENCHMARK_REPORT.md`
for the full table; key signals:
- **JSON contains/functional via GIN ste_vec**: 3.88 ms @ 100k → 4.32 ms
@ 1M (flat — GIN sub-linear). Index engages cleanly.
- **JSON field_eq/* (inlined hmac path)**: 611-1250 µs @ 100k, 634 µs-1.1
ms @ 1M. Sub-millisecond at 1M — confirms post-#205 hmac inlining +
hmac_256_terms GIN. Planner picks seq scan over GIN at 100k (cost-model
edge case); both paths sub-ms.
- **JSON field_order/* (extractor-driven sort)**: 7.85 s @ 100k → 28.9 s
@ 1M (roughly 3.7× for 10× the rows). No functional-index match because
there is no opclass on `eql_v2.ore_cllw`; tracked as
encrypt-query-language#220 (restore CLLW ORE opclass).
- **ORE block range queries**: 1-2 ms @ 100k, sub-ms @ 1M (selective).
Functional `eql_v2.ore_block_u64_8_256` index engages.
- **EXACT (`=` via hmac on root scalars)**: sub-ms @ both tiers.
Functional hash index engages.
- **MATCH (LIKE via bloom_filter GIN)**: ms-range at both tiers.
- **GROUP_BY low-cardinality**: HashAggregate via inlined hmac_256;
matches the EQL query-perf guide §5 recipe.
Re-ingest of integer/string/category/combo at 100k+1M was required
because the previous data was encrypted under a stale keyset that the
current workspace can no longer decrypt — without this, the
`_decrypt/*` scenarios would 403 on ZeroKMS. JSON ste_vec data was
re-ingested at 100k+1M with `SteVecMode::Standard`.
EQL installed: the **supabase variant** (no operator classes) — required
for ANALYZE to succeed on hm-only tables under the strict-compare
contract from #219. Aligns with U-001's functional-index recommendation.

Not covered in this run:

- 10M tier (existing data still present but stale-keyset; would need
several hours of re-ingest).
- Compat-mode bench scenarios exercising `eql_v2.ope_cllw` — OPE moves
to a separate encrypted type in a future release; no EQL recipe to
bench against today.
- field_order with functional ORE index match — blocked on
encrypt-query-language#220 (operator class on `eql_v2.ore_cllw`).
Summary
- Re-adds the `WHERE value < $1 ORDER BY value LIMIT 10` natural-form scenario to `benches/ore.rs`, paired with the existing `range_lt_hybrid_ordered_10` so the two together quantify the cost of the natural-form sort-key shortcut.
- Regenerated `report/BENCHMARK_REPORT.md` + `report/ore.md` + charts. Adds `query_ore_range_lt_natural_ordered_10_chart.png`.

Why
The natural-form ORDER BY scenario was dropped from this bench in an earlier pass (the "§4 sort-key trap"). Once #211 / #218 closed the per-comparison perf cliff (each ORE-term comparison is now inlinable SQL rather than dispatched through a plpgsql `eql_v2.compare()` body), it was worth re-measuring to confirm whether the hybrid-form recommendation in `docs/reference/query-performance.md` §4 still holds. It does, per the 50%-selectivity numbers above.
The natural form is no longer the 6.3 s perf cliff it was before #218, but the residual Top-N Sort over the bitmap-scan output still dominates at any meaningful row count. The per-comparison cost has dropped, but the Sort still has to consume the whole post-WHERE set (a bounded top-N heap for `LIMIT k` reads all n input rows and does up to about n log k comparisons). The bench now makes this delta explicit, so the perf guide §4 has a numerical anchor.
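To make the scaling argument concrete, the toy below counts sort-key comparisons for a bounded top-N heap over the whole input (standing in for the residual Top-N Sort) versus taking the first k rows from an already-ordered stream (standing in for the index-ordered hybrid plan). It is a sketch using Python's `heapq.nsmallest`, not PostgreSQL's executor, and it counts only Python `<` calls; in the real plan each comparison is an ORE-term comparison, so that per-comparison cost multiplies this count. `Counted`, `top_n_sort` and `index_ordered` are illustrative names.

```python
import heapq
import itertools
import random


class Counted:
    """Wraps a sort key and counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.v < other.v


def top_n_sort(rows, k):
    """Residual Top-N Sort stand-in: bounded heap over every input row."""
    Counted.comparisons = 0
    result = heapq.nsmallest(k, (Counted(v) for v in rows))
    return [c.v for c in result], Counted.comparisons


def index_ordered(rows_in_key_order, k):
    """Index-ordered stand-in: rows already arrive sorted, stop after k."""
    return list(itertools.islice(rows_in_key_order, k)), 0


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (10_000, 100_000):
        rows = [rng.random() for _ in range(n)]
        top, cmps = top_n_sort(rows, 10)
        ordered, zero = index_ordered(sorted(rows), 10)
        assert top == ordered
        print(f"n={n:>7}: top-N heap {cmps} comparisons, ordered stream {zero}")
```

For a fixed `LIMIT`, the heap count grows roughly linearly with n, because every input row is checked against the current heap maximum at least once; the ordered stream does no comparisons in this toy and reads only k rows.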
Test plan
- `cargo build --bench ore --release` succeeds
- `mise run bench:query:ore 10000` completes, includes `range_lt_natural_ordered_10` and `range_lt_hybrid_ordered_10` in `results/query/ore_rows_10000.json`
- `mise run report` regenerates without errors; ORE section in `report/BENCHMARK_REPORT.md` includes the new scenario with description text
- `report/query_ore_range_lt_natural_ordered_10_chart.png` renders