Skip to content

perf(cypher): VarMap replaces Binding HashMap for clone-per-match path (FU-006 H2)#433

Merged
coseto6125 merged 1 commit into
mainfrom
perf/cypher-binding-smallvec
May 24, 2026
Merged

perf(cypher): VarMap replaces Binding HashMap for clone-per-match path (FU-006 H2)#433
coseto6125 merged 1 commit into
mainfrom
perf/cypher-binding-smallvec

Conversation

@coseto6125
Copy link
Copy Markdown
Owner

Summary

Third and final fix from the FU-006 follow-up spike's top-3 hotspot list. PR #432 attacked the iteration count (CSR kind lookup + walk_rel closure); this PR attacks the per-iteration clone cost. They compose orthogonally.

The cypher executor's Binding row holds three name→index maps that are cloned once per matched node in exec_pattern's frontier expansion. The old HashMap<String, u32> clone was the dominant cost — 67-85% of query time per the spike — because each clone heap-allocates a fresh bucket table and per-key String.

Replace node_vars and edge_vars with VarMap: a SmallVec of (CompactString, u32) entries that stays inline for the typical 2-4 vars per cypher query. Inline storage means Binding.clone() is a fixed-size memcpy with zero heap allocations on the hot path. CompactString keys inline up to 24 bytes (every realistic cypher var name), so the per-key clone is also alloc-free. Linear scan over ≤4 entries beats HashMap lookup at this size because there is no hashing cost.

Wider Binding API kept compatible — get(key: &str) -> Option<&u32>, contains_key(key: &str) -> bool, insert(key: &str, value: u32) — identical to the HashMap surface so existing call sites just drop the now-redundant var.clone() before insert. Five sites updated.

The computed field stays HashMap<String, Value>. It carries non-Copy Value items and is populated only by WITH clauses, never during frontier expansion. Empty-HashMap clone is a 56-byte memcpy with no allocation, so the savings would not justify the Option<Box<HashMap>> ergonomics churn.

Empirical (--runs 10, freshly-rebuilt .sample_repo, same-session A/B vs main)

Phase Main H2 Δ
cypher count(*) ungrouped 61.4 ms 35.6 ms -42%
cypher decorator IN 48.8 ms 41.4 ms -15%
find (bm25) 22.1 ms 18.9 ms -14%
inspect (Class) 23.3 ms 20.3 ms -13%
impact upstream 18.3 ms 16.0 ms -13%
impact downstream 18.1 ms 16.0 ms -12%
summary 19.9 ms 17.8 ms -11%
summary --detailed 19.6 ms 17.4 ms -11%
routes 12.7 ms 11.4 ms -10%
cypher Method-Calls→Method 74.4 ms 73.3 ms -1%
cypher Class→Method 45.0 ms 47.9 ms +6% (noise)

count(*) is the headline (~26 ms saved on the 110k-Method frontier-expansion clone path); the consistent -10% to -14% across other query shapes comes from the same Binding clone being on more code paths than just exec_pattern (inspect / impact / find all build bindings during their internal traversals).

Composes with PR #432 (kind-CSR + walk_rel closure)

Query Main #432 H1+H3 This H2 Stacked expected
cypher count(*) 61.4 ms 32.4 ms (-47%) 35.6 ms (-42%) ~20 ms (-67%)
cypher decorator IN 48.8 ms 39.1 ms (-20%) 41.4 ms (-15%) ~33 ms (-33%)

H1+H3 cuts iteration count from 303k → 110k; H2 cuts per-iteration clone from 74 ns → ~5 ns. Both halve their respective axis, so combined should compound. Will validate empirically once #432 lands and this rebases on top.

Spike attribution

Spike report identified all three hotspots; this PR completes the trio:

Test plan

  • cargo test -p egent-code-plexus --tests — 1182 passed
  • cargo test -p ecp-core --tests — 268 passed
  • cargo clippy -p egent-code-plexus --tests -- -D warnings — clean
  • cargo clippy -p ecp-core --tests -- -D warnings — clean
  • Same-session A/B against main on 11-phase benchmark — zero regression, -10% to -42% wins
  • Manual cross-check that the apparent impact --baseline HEAD~1 slowdown in the bench was a bench-state confound (both main and H2 take ~1.4s when L1 needs refresh; the bench's 9 ms reading on main was from a previously-warm L1 state)

@coseto6125 coseto6125 enabled auto-merge (squash) May 24, 2026 00:11
@coseto6125 coseto6125 added the merge-queue Opt-in to Mergify merge queue label May 24, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions github-actions Bot added the ecp:risk-high ecp signal label May 24, 2026
The cypher executor's `Binding` row holds three name->index maps that are
cloned once per matched node in `exec_pattern`'s frontier expansion. The
old `HashMap<String, u32>` clone was the dominant cost — 67-85% of query
time per the FU-006 follow-up spike — because each clone heap-allocates
a fresh bucket table and per-key `String`.

Replace `node_vars` and `edge_vars` with `VarMap`: a `SmallVec` of
`(CompactString, u32)` entries that stays inline for the typical 2-4
vars per cypher query. Inline storage means `Binding.clone()` is a
fixed-size memcpy with zero heap allocations on the hot path.
`CompactString` keys inline up to 24 bytes (every realistic cypher var
name), so the per-key clone is also alloc-free. Linear scan over <=4
entries beats HashMap lookup at this size because there is no hashing
cost.

Wider Binding API kept compatible: `get(key: &str) -> Option<&u32>`,
`contains_key(key: &str) -> bool`, `insert(key: &str, value: u32)` —
identical to the HashMap surface so existing call sites only drop the
now-redundant `var.clone()` before insert. Five sites updated.

The `computed` field stays `HashMap<String, Value>` — it carries
non-Copy `Value` items and is populated only by WITH clauses, never
during frontier expansion. Empty-HashMap clone is a 56-byte memcpy
with no allocation, so the savings would not justify the
`Option<Box<HashMap>>` ergonomics churn.

Deps: ecp-core picks up direct `smallvec` and `compact_str` (both
already present transitively, so no new vendored grammar work).

## Empirical (--runs 10, freshly-rebuilt .sample_repo, same-session A/B)

  cypher count(*) ungrouped      61.4 ms  ->  35.6 ms   -42%
  find (bm25)                    22.1 ms  ->  18.9 ms   -14%
  impact upstream                18.3 ms  ->  16.0 ms   -13%
  impact downstream              18.1 ms  ->  16.0 ms   -12%
  inspect (Class)                23.3 ms  ->  20.3 ms   -13%
  summary                        19.9 ms  ->  17.8 ms   -11%
  summary --detailed             19.6 ms  ->  17.4 ms   -11%
  routes                         12.7 ms  ->  11.4 ms   -10%
  cypher decorator IN            48.8 ms  ->  41.4 ms   -15%
  cypher Class->Method            45.0 ms  ->  47.9 ms    +6% (noise)
  cypher Method-Calls->Method     74.4 ms  ->  73.3 ms    -1%

Count(*) is the headline (~26 ms saved on the 110k-Method frontier-
expansion clone path); the consistent -10% to -14% across other query
shapes comes from the same Binding clone being on more code paths than
just exec_pattern.

Tests: 1182 (egent-code-plexus) + 268 (ecp-core) all pass, clippy clean.

Composes orthogonally with PR #432's exec_pattern kind-CSR + walk_rel
closure refactor — those attack the iteration count, this attacks the
per-iteration clone cost.
@coseto6125 coseto6125 force-pushed the perf/cypher-binding-smallvec branch from d68534e to bd61b75 Compare May 24, 2026 00:29
@github-actions github-actions Bot added ecp:risk-low ecp signal and removed ecp:risk-high ecp signal labels May 24, 2026
@coseto6125 coseto6125 merged commit 64c7cfd into main May 24, 2026
18 checks passed
@coseto6125 coseto6125 deleted the perf/cypher-binding-smallvec branch May 24, 2026 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ecp:risk-low ecp signal merge-queue Opt-in to Mergify merge queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant