bench(ore): add natural-form ORDER BY scenario alongside hybrid #13

Merged
coderdan merged 1 commit into main from feat/ore-natural-form-bench on May 19, 2026

@coderdan coderdan commented May 18, 2026

Summary

  • Re-adds the WHERE value < $1 ORDER BY value LIMIT 10 natural-form scenario to benches/ore.rs, paired with the existing range_lt_hybrid_ordered_10 so the two together quantify the cost of the natural-form sort-key shortcut.
  • Refreshes ORE bench results (10k / 100k / 1M) against the current EQL DEV install carrying cipherstash/encrypt-query-language#211 / EQL #218 (range-operator inlining).
  • Regenerates report/BENCHMARK_REPORT.md + report/ore.md + charts. Adds query_ore_range_lt_natural_ordered_10_chart.png.

Why

The natural-form ORDER BY scenario was dropped from this bench in an earlier pass (the "§4 sort-key trap"). Once #211 / #218 closed the per-comparison perf cliff (each ORE-term comparison is now inlinable SQL rather than dispatched through a plpgsql eql_v2.compare() body), it was worth re-measuring to confirm whether the hybrid-form recommendation in docs/reference/query-performance.md §4 still holds.

It does. At 50% selectivity:

| Rows | Natural form | Hybrid form | Slowdown |
|------|--------------|-------------|----------|
| 10k  | 18 ms        | 1.7 ms      | 10×      |
| 100k | 1.2 s        | 1.7 ms      | 697×     |
| 1M   | 8.85 s       | 1.3 ms      | 6988×    |

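The natural form is no longer the 6.3 s perf cliff it was pre-#218, but the residual Top-N Sort over the bitmap-scan output still dominates at any meaningful row count: the per-comparison cost has dropped, but every row in the post-WHERE set must still be fetched and compared, so the work grows with that set (roughly O(n log k) comparisons for a bounded Top-N heap with LIMIT k) instead of staying roughly constant as in the hybrid form. The bench now makes this delta explicit so the perf guide §4 has a numerical anchor.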
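To make the scaling argument concrete, here is a minimal sketch that counts comparisons in a bounded top-N selection and contrasts it with reading the first rows from an already ordered index. It is a toy model in plain Python, not PostgreSQL's executor; `Counted`, `top_n_comparisons`, and `presorted_first_n` are hypothetical names introduced only for this illustration.

```python
import heapq
import random


class Counted:
    """Wraps a sort key and counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key


def top_n_comparisons(num_rows, limit=10, seed=0):
    """Pick the `limit` smallest keys from unsorted input, counting comparisons."""
    rng = random.Random(seed)
    rows = [Counted(rng.random()) for _ in range(num_rows)]
    Counted.comparisons = 0
    smallest = heapq.nsmallest(limit, rows)  # bounded heap of size `limit`
    return Counted.comparisons, [row.key for row in smallest]


def presorted_first_n(num_rows, limit=10, seed=0):
    """Model an ordered index: the sort happened at build time, not query time."""
    rng = random.Random(seed)
    index = sorted(rng.random() for _ in range(num_rows))
    return 0, index[:limit]  # zero query-time comparisons


if __name__ == "__main__":
    for n in (10_000, 100_000):
        count, _ = top_n_comparisons(n)
        print(f"{n} rows: {count} comparisons for top-10")
```

In this model every input row is compared at least once against the current heap top, so the comparison count grows with the number of rows that survive the predicate, while the ordered-index path does no query-time comparisons at all. In the real bench each of those comparisons is an ORE-term comparison, which is why the constant factor matters.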

Test plan

  • cargo build --bench ore --release succeeds
  • mise run bench:query:ore 10000 completes, includes range_lt_natural_ordered_10 and range_lt_hybrid_ordered_10 in results/query/ore_rows_10000.json
  • Same at 100k and 1M
  • mise run report regenerates without errors; ORE section in report/BENCHMARK_REPORT.md includes the new scenario with description text
  • New chart report/query_ore_range_lt_natural_ordered_10_chart.png renders
  • CI green
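The "includes both scenarios" check in the test plan can be scripted. The sketch below assumes nothing about the JSON schema of the results file (which this PR does not describe) and only looks for the scenario names as substrings; `missing_scenarios` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

REQUIRED_SCENARIOS = (
    "range_lt_natural_ordered_10",
    "range_lt_hybrid_ordered_10",
)


def missing_scenarios(results_path, required=REQUIRED_SCENARIOS):
    """Return the required scenario names that do not appear anywhere in the file."""
    text = Path(results_path).read_text(encoding="utf-8")
    return [name for name in required if name not in text]


if __name__ == "__main__":
    for rows in (10_000, 100_000, 1_000_000):
        path = f"results/query/ore_rows_{rows}.json"
        if Path(path).exists():
            print(path, "missing:", missing_scenarios(path))
        else:
            print(path, "not found")
```

An empty list means both scenarios were recorded for that tier. A stricter check would parse the JSON, but that requires knowing the file layout.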

Re-adds the `WHERE value < $1 ORDER BY value LIMIT 10` scenario that was
dropped from this bench in an earlier cleanup pass. It pairs with the
existing `range_lt_hybrid_ordered_10` (`ORDER BY eql_v2.ore_block_u64_8_256(value)`)
so the two scenarios together quantify the cost of taking the natural-form
sort-key shortcut.

Both scenarios use the same predicate and `LIMIT` — only the `ORDER BY`
form differs. The hybrid form's sort key matches the functional index
expression syntactically, so rows stream out of the index already ordered
(no Sort node). The natural form's sort key doesn't match, so the plan
keeps a residual Top-N Sort over the bitmap-scan output; post-EQL #218
each comparison in that Sort is the inlined ORE-term path, but the Sort
itself still scales with the size of the post-WHERE set.
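One way to confirm which plan shape a query gets is to walk the output of `EXPLAIN (FORMAT JSON)` and look for a Sort node. The sketch below operates on the object under the `"Plan"` key of that output; the two plan trees are hand-written to mirror the shapes described above and were not captured from a real run.

```python
def node_types(plan):
    """Yield every "Node Type" in a plan tree, parent before children."""
    yield plan["Node Type"]
    for child in plan.get("Plans", []):
        yield from node_types(child)


def has_sort(plan):
    """True if any node in the plan is a Sort (or Incremental Sort)."""
    return any(t in ("Sort", "Incremental Sort") for t in node_types(plan))


# Illustrative plan shapes only (assumption): natural form keeps a Sort
# above the bitmap scan, hybrid form streams from the index.
natural_plan = {
    "Node Type": "Limit",
    "Plans": [{
        "Node Type": "Sort",
        "Plans": [{
            "Node Type": "Bitmap Heap Scan",
            "Plans": [{"Node Type": "Bitmap Index Scan"}],
        }],
    }],
}
hybrid_plan = {
    "Node Type": "Limit",
    "Plans": [{"Node Type": "Index Scan"}],
}

if __name__ == "__main__":
    print("natural has Sort:", has_sort(natural_plan))
    print("hybrid has Sort:", has_sort(hybrid_plan))
```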

Numbers at 50% selectivity:

  rows     natural        hybrid     slowdown
  10k      18 ms          1.7 ms     10×
  100k     1.2 s          1.7 ms     697×
  1M       8.85 s         1.3 ms     6988×

The natural form is no longer the 6.3 s perf cliff it was pre-#218 (each
comparison is now inlinable rather than dispatched through a plpgsql
`eql_v2.compare()` body), but the residual Sort still dominates at any
meaningful row count — the bench now makes that delta explicit so
`docs/reference/query-performance.md` §4 has a numerical anchor.
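For reference, the slowdown column is just natural-form time divided by hybrid-form time. The sketch below recomputes it from the rounded timings in the table above; the results land near the reported figures but not exactly on them, presumably because the reported ratios were derived from unrounded measurements (that is an assumption, not something the PR states).

```python
# (rows, natural-form ms, hybrid-form ms), using the rounded values from the table
ROUNDED_TIMINGS = [
    ("10k", 18.0, 1.7),
    ("100k", 1200.0, 1.7),
    ("1M", 8850.0, 1.3),
]


def slowdown(natural_ms, hybrid_ms):
    """How many times slower the natural form is than the hybrid form."""
    return natural_ms / hybrid_ms


if __name__ == "__main__":
    for rows, natural_ms, hybrid_ms in ROUNDED_TIMINGS:
        print(f"{rows:>5}: {slowdown(natural_ms, hybrid_ms):,.0f}x")
```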

`report_benchmarks.py` gets the new scenario in `sql_map` and
`descriptions`, and the report is regenerated. Result sidecars
(`ore_rows_*.json` + `ore_metadata_*.json`) are refreshed at 10k/100k/1M
against the current EQL DEV install (carries #211 / EQL #218 inlining).
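As a rough sketch of what the `sql_map` and `descriptions` entries might look like: the dict shapes, the table name `bench_ore`, the `SELECT *` projection, and the description wording are all assumptions for illustration; only the scenario names, predicate, `ORDER BY` forms, and `LIMIT` come from this PR.

```python
# Hypothetical table name; the real bench table is not named in this PR.
TABLE = "bench_ore"

sql_map = {
    "range_lt_natural_ordered_10": (
        f"SELECT * FROM {TABLE} WHERE value < $1 ORDER BY value LIMIT 10"
    ),
    "range_lt_hybrid_ordered_10": (
        f"SELECT * FROM {TABLE} WHERE value < $1 "
        "ORDER BY eql_v2.ore_block_u64_8_256(value) LIMIT 10"
    ),
}

descriptions = {
    "range_lt_natural_ordered_10": (
        "Range predicate with natural-form ORDER BY; keeps a residual Top-N Sort."
    ),
    "range_lt_hybrid_ordered_10": (
        "Range predicate with ORDER BY on the functional index expression; no Sort node."
    ),
}


def order_by_clause(sql):
    """Return the text between ORDER BY and LIMIT."""
    return sql.split("ORDER BY ", 1)[1].split(" LIMIT", 1)[0]


if __name__ == "__main__":
    for name, sql in sql_map.items():
        print(f"{name}: ORDER BY {order_by_clause(sql)}")
```

Keeping the two entries side by side makes it easy to verify that only the sort key differs between the scenarios.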
@coderdan coderdan marked this pull request as ready for review May 19, 2026 12:33
@coderdan coderdan merged commit 66ca6fa into main May 19, 2026
coderdan added a commit that referenced this pull request May 20, 2026
Bench-side counterpart to encrypt-query-language#219 (ORE-only ste_vec
consolidation + strict eql_v2.compare contract).

- `Cargo.toml`: cipherstash-client pin moves from
  `dan/zerokms-unexpected-error-context` to `main`. Main now carries the
  ste_vec consolidation (suite#1955 + the post-#1955 `ocf`/`ocv` → `oc`
  collapse). Note: this branch predates the RequestFailed Display fix
  (#1960), so encrypt-side ZeroKMS errors surface as "Unexpected error"
  again — the bench's retry loop in `prepare:_table` is the safety net.
- API renames across encrypt binaries + `src/lib.rs`:
  - `ColumnType::Utf8Str` → `ColumnType::Text`
  - `ColumnType::JsonB` → `ColumnType::Json`
  - `Plaintext::Utf8Str` → `Plaintext::Text`
  - `Plaintext::JsonB` → `Plaintext::Json`
- `src/bin/encrypt_ste_vec_small.rs` + `_large.rs`: set
  `IndexType::SteVec { mode: SteVecMode::Standard, .. }` so sv elements
  emit `oc` (CLLW ORE) for orderable terms. EQL 2.3's `eql_v2_encrypted`
  type only handles ORE; OPE (Compat mode) is moving to a separate
  encrypted column type in a future release.
- `benches/json.rs`:
  - `ore_extractor_for` map updated: `oc` → `eql_v2.ore_cllw`, `op` →
    `eql_v2.ope_cllw` (placeholder; the OPE bench path is exercised via the
    Compat-mode encrypt binary, which isn't the current default).
  - Picker rewritten to sample **two** selectors independently: one
    hm-bearing for field_eq scenarios, one orderable-bearing for
    field_order scenarios. Post-#1955 these are typically disjoint
    (the array-prefix selector lookup element carries `hm`; the value
    elements carry `oc`). The old single-needle picker skipped
    field_eq/* when it picked an `oc`-only element.
- `benches/ore.rs`: re-added `range_lt_natural_ordered_10` scenario
  (carried forward from feat/ore-natural-form-bench, PR #13).
- `report_benchmarks.py`: new entries for `range_lt_natural_ordered_10`
  and the json/field_eq scenario.
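The two-selector picker described for `benches/json.rs` can be sketched as follows. This is a Python illustration of the idea, not the Rust code from the bench; the element shape (a dict with a selector under `"s"` plus optional `"hm"` / `"oc"` terms) and the function names are assumptions.

```python
import random


def pick_selector(elements, term_key, rng):
    """Pick the selector of one randomly chosen element that carries `term_key`."""
    candidates = [e["s"] for e in elements if term_key in e]
    return rng.choice(candidates) if candidates else None


def pick_selectors(elements, rng=None):
    """Sample one hm-bearing and one oc-bearing selector independently."""
    if rng is None:
        rng = random.Random()
    return {
        "field_eq": pick_selector(elements, "hm", rng),     # equality scenarios
        "field_order": pick_selector(elements, "oc", rng),  # ordering scenarios
    }


if __name__ == "__main__":
    # Made-up sample: one lookup element with `hm`, one value element with `oc`.
    sample = [
        {"s": "selector_a", "hm": "..."},
        {"s": "selector_b", "oc": "..."},
    ]
    print(pick_selectors(sample, random.Random(0)))
```

Because each scenario family draws from its own candidate list, an `oc`-only element can no longer cause the `field_eq` scenarios to be skipped; a `None` result signals that no element of the required kind exists.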

Results cover six bench families at 100k + 1M (78 scenarios total), run
against fresh data ingested with the consolidation client. See
`report/BENCHMARK_REPORT.md` for the full table; key signals:

- **JSON contains/functional via GIN ste_vec**: 3.88 ms @ 100k → 4.32 ms
  @ 1M (flat — GIN sub-linear). Index engages cleanly.
- **JSON field_eq/* (inlined hmac path)**: 611-1250 µs @ 100k,
  634 µs-1.1 ms @ 1M. At or near sub-millisecond at 1M, which confirms
  post-#205 hmac inlining + hmac_256_terms GIN. Planner picks seq scan
  over GIN at 100k (cost-model edge case); both paths stay at roughly
  1 ms or less.
- **JSON field_order/* (extractor-driven sort)**: 7.85 s @ 100k → 28.9 s
  @ 1M, about 3.7× for 10× the rows. No functional-index match because
  there is no opclass on `eql_v2.ore_cllw`; tracked as
  encrypt-query-language#220 (restore CLLW ORE opclass).
- **ORE block range queries**: 1-2 ms @ 100k, sub-ms @ 1M (selective).
  Functional `eql_v2.ore_block_u64_8_256` index engages.
- **EXACT (`=` via hmac on root scalars)**: sub-ms @ both tiers.
  Functional hash index engages.
- **MATCH (LIKE via bloom_filter GIN)**: ms-range at both tiers.
- **GROUP_BY low-cardinality**: HashAggregate via inlined hmac_256;
  matches the EQL query-perf guide §5 recipe.
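To read the tier-to-tier numbers above at a glance, it helps to compare each scenario's time growth against the 10× row growth between 100k and 1M. A minimal sketch using two of the figures quoted in the bullets (the scenario labels are shortened for the example):

```python
ROW_GROWTH = 1_000_000 / 100_000  # 10x more rows between the two tiers

# (scenario, time at 100k in ms, time at 1M in ms), copied from the bullets above
TIERS = [
    ("json contains (GIN ste_vec)", 3.88, 4.32),
    ("json field_order (extractor sort)", 7850.0, 28900.0),
]


def growth(t_100k_ms, t_1m_ms):
    """Factor by which the timing grew from the 100k tier to the 1M tier."""
    return t_1m_ms / t_100k_ms


if __name__ == "__main__":
    for name, t_100k, t_1m in TIERS:
        factor = growth(t_100k, t_1m)
        print(f"{name}: {factor:.2f}x time for {ROW_GROWTH:.0f}x rows")
```

A factor close to 1 indicates the index is doing the work; a factor that climbs toward the row growth indicates per-row work at query time.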

Re-ingest of integer/string/category/combo at 100k+1M was required
because the previous data was encrypted under a stale keyset that the
current workspace can no longer decrypt — without this, the
`_decrypt/*` scenarios would 403 on ZeroKMS. JSON ste_vec data was
re-ingested at 100k+1M with `SteVecMode::Standard`.

EQL installed: the **supabase variant** (no operator classes) — required
for ANALYZE to succeed on hm-only tables under the strict-compare
contract from #219. Aligns with U-001's functional-index recommendation.

Not covered in this run:

- 10M tier (existing data still present but stale-keyset; would need
  several hours of re-ingest).
- Compat-mode bench scenarios exercising `eql_v2.ope_cllw` — OPE moves
  to a separate encrypted type in a future release; no EQL recipe to
  bench against today.
- field_order with functional ORE index match — blocked on
  encrypt-query-language#220 (operator class on `eql_v2.ore_cllw`).