perf(cypher): VarMap replaces Binding HashMap for clone-per-match path (FU-006 H2) #433
Merged
The cypher executor's `Binding` row holds three name->index maps that are cloned once per matched node in `exec_pattern`'s frontier expansion. The old `HashMap<String, u32>` clone was the dominant cost (67-85% of query time per the FU-006 follow-up spike) because each clone heap-allocates a fresh bucket table and a `String` per key.

Replace `node_vars` and `edge_vars` with `VarMap`: a `SmallVec` of `(CompactString, u32)` entries that stays inline for the typical 2-4 vars per cypher query. Inline storage means `Binding.clone()` is a fixed-size memcpy with zero heap allocations on the hot path. `CompactString` keys inline up to 24 bytes (every realistic cypher var name), so the per-key clone is also alloc-free. Linear scan over <=4 entries beats HashMap lookup at this size because there is no hashing cost.

Wider Binding API kept compatible: `get(key: &str) -> Option<&u32>`, `contains_key(key: &str) -> bool`, `insert(key: &str, value: u32)`. This is identical to the HashMap surface, so existing call sites only drop the now-redundant `var.clone()` before insert. Five sites updated.

The `computed` field stays `HashMap<String, Value>`. It carries non-Copy `Value` items and is populated only by WITH clauses, never during frontier expansion. Empty-HashMap clone is a 56-byte memcpy with no allocation, so the savings would not justify the `Option<Box<HashMap>>` ergonomics churn.

Deps: ecp-core picks up direct `smallvec` and `compact_str` (both already present transitively, so no new vendored grammar work).
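To make the data-structure change concrete, here is a minimal std-only sketch of the idea. It is not the PR's `VarMap`: the real type is built on `SmallVec` and `CompactString`, while `VarMapSketch`, `InlineKey`, `INLINE_VARS` and the `bool` return from `insert` are hypothetical stand-ins chosen so the example compiles without external crates. Unlike `SmallVec`, the sketch refuses to grow past its inline capacity instead of spilling to the heap.

```rust
// Std-only illustration of an inline name->index map with linear-scan lookup.
// All names here are assumptions for the sketch, not the PR's actual code.

const INLINE_VARS: usize = 4; // assumed inline capacity ("typical 2-4 vars")
const INLINE_KEY: usize = 24; // inline key bytes, mirroring the 24-byte figure

/// Variable name stored inline; a stand-in for `CompactString`.
#[derive(Clone, Copy)]
struct InlineKey {
    len: u8,
    bytes: [u8; INLINE_KEY],
}

impl InlineKey {
    /// Returns `None` when the name does not fit inline.
    fn new(s: &str) -> Option<Self> {
        if s.len() > INLINE_KEY {
            return None;
        }
        let mut bytes = [0u8; INLINE_KEY];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Some(Self { len: s.len() as u8, bytes })
    }

    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len as usize]
    }
}

/// Fixed-size map; `Copy` means cloning is a plain byte copy with no heap work.
#[derive(Clone, Copy)]
struct VarMapSketch {
    len: usize,
    entries: [(InlineKey, u32); INLINE_VARS],
}

impl VarMapSketch {
    fn new() -> Self {
        let empty = InlineKey { len: 0, bytes: [0; INLINE_KEY] };
        Self { len: 0, entries: [(empty, 0); INLINE_VARS] }
    }

    /// Linear scan over the live entries; no hashing involved.
    fn get(&self, key: &str) -> Option<&u32> {
        self.entries[..self.len]
            .iter()
            .find(|(k, _)| k.as_bytes() == key.as_bytes())
            .map(|(_, v)| v)
    }

    fn contains_key(&self, key: &str) -> bool {
        self.get(key).is_some()
    }

    /// Overwrites an existing key, otherwise appends.
    /// Returns false if the key is too long or the map is full (sketch-only behavior).
    fn insert(&mut self, key: &str, value: u32) -> bool {
        if let Some(slot) = self.entries[..self.len]
            .iter_mut()
            .find(|(k, _)| k.as_bytes() == key.as_bytes())
        {
            slot.1 = value;
            return true;
        }
        match InlineKey::new(key) {
            Some(k) if self.len < INLINE_VARS => {
                self.entries[self.len] = (k, value);
                self.len += 1;
                true
            }
            _ => false,
        }
    }
}

/// Mimics one frontier-expansion step: copy the row, then bind one more variable.
fn example() -> (Option<u32>, Option<u32>) {
    let mut row = VarMapSketch::new();
    row.insert("m", 7); // call sites pass `&str`, so no `var.clone()` is needed
    let mut next = row; // per-match clone: a byte copy, no allocation
    next.insert("n", 42);
    (row.get("n").copied(), next.get("n").copied())
}
```

The lookup surface (`get`, `contains_key`, `insert` taking `&str`) follows the signatures listed above, which is what lets call sites keep their shape. The parent row is unaffected by bindings added to the copy, the same isolation the per-match `Binding.clone()` provides.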
## Empirical (--runs 10, freshly-rebuilt .sample_repo, same-session A/B)

| Query | Before | After | Change |
| --- | --- | --- | --- |
| cypher count(*) ungrouped | 61.4 ms | 35.6 ms | -42% |
| find (bm25) | 22.1 ms | 18.9 ms | -14% |
| impact upstream | 18.3 ms | 16.0 ms | -13% |
| impact downstream | 18.1 ms | 16.0 ms | -12% |
| inspect (Class) | 23.3 ms | 20.3 ms | -13% |
| summary | 19.9 ms | 17.8 ms | -11% |
| summary --detailed | 19.6 ms | 17.4 ms | -11% |
| routes | 12.7 ms | 11.4 ms | -10% |
| cypher decorator IN | 48.8 ms | 41.4 ms | -15% |
| cypher Class->Method | 45.0 ms | 47.9 ms | +6% (noise) |
| cypher Method-Calls->Method | 74.4 ms | 73.3 ms | -1% |

count(*) is the headline (~26 ms saved on the 110k-Method frontier-expansion clone path); the consistent -10% to -14% across other query shapes comes from the same Binding clone being on more code paths than just exec_pattern.

Tests: 1182 (egent-code-plexus) + 268 (ecp-core) all pass, clippy clean.

Composes orthogonally with PR #432's exec_pattern kind-CSR + walk_rel closure refactor: those attack the iteration count, this attacks the per-iteration clone cost.
Summary
Third and final fix from the FU-006 follow-up spike's top-3 hotspot list. PR #432 attacked the iteration count (CSR kind lookup + walk_rel closure); this PR attacks the per-iteration clone cost. They compose orthogonally.
The cypher executor's `Binding` row holds three name→index maps that are cloned once per matched node in `exec_pattern`'s frontier expansion. The old `HashMap<String, u32>` clone was the dominant cost (67-85% of query time per the spike) because each clone heap-allocates a fresh bucket table and a `String` per key.

Replace `node_vars` and `edge_vars` with `VarMap`: a `SmallVec` of `(CompactString, u32)` entries that stays inline for the typical 2-4 vars per cypher query. Inline storage means `Binding.clone()` is a fixed-size memcpy with zero heap allocations on the hot path. `CompactString` keys inline up to 24 bytes (every realistic cypher var name), so the per-key clone is also alloc-free. Linear scan over ≤4 entries beats HashMap lookup at this size because there is no hashing cost.

Wider Binding API kept compatible: `get(key: &str) -> Option<&u32>`, `contains_key(key: &str) -> bool`, `insert(key: &str, value: u32)`. This is identical to the HashMap surface, so existing call sites just drop the now-redundant `var.clone()` before insert. Five sites updated.

The `computed` field stays `HashMap<String, Value>`. It carries non-Copy `Value` items and is populated only by WITH clauses, never during frontier expansion. Empty-HashMap clone is a 56-byte memcpy with no allocation, so the savings would not justify the `Option<Box<HashMap>>` ergonomics churn.

Empirical (--runs 10, freshly-rebuilt `.sample_repo`, same-session A/B vs main)

count(*) is the headline (~26 ms saved on the 110k-Method frontier-expansion clone path); the consistent -10% to -14% across other query shapes comes from the same Binding clone being on more code paths than just `exec_pattern` (inspect / impact / find all build bindings during their internal traversals).

Composes with PR #432 (kind-CSR + walk_rel closure)
H1+H3 cut the iteration count from 303k → 110k; H2 cuts the per-iteration clone from 74 ns → ~5 ns. Each more than halves its axis (roughly 2.8x and 15x respectively), so the combined effect should compound. Will validate empirically once #432 lands and this rebases on top.
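As a back-of-envelope illustration of why the two changes multiply rather than add, the sketch below models clone cost alone as iterations times per-clone time, using only the figures quoted above. The one-clone-per-iteration model and the names `clone_cost_ms` and `scenarios` are assumptions for illustration; it does not try to reproduce the measured query timings, which include other work.

```rust
/// Modeled clone cost in milliseconds: iterations x nanoseconds per clone.
/// Assumes exactly one clone per iteration (a simplification).
fn clone_cost_ms(iterations: u64, ns_per_clone: u64) -> f64 {
    (iterations * ns_per_clone) as f64 / 1_000_000.0
}

/// The four combinations of the two fixes, using the PR's quoted figures.
fn scenarios() -> [(&'static str, f64); 4] {
    [
        ("baseline (303k x 74 ns)", clone_cost_ms(303_000, 74)),
        ("H1+H3 only (110k x 74 ns)", clone_cost_ms(110_000, 74)),
        ("H2 only (303k x 5 ns)", clone_cost_ms(303_000, 5)),
        ("H1+H3 and H2 (110k x 5 ns)", clone_cost_ms(110_000, 5)),
    ]
}
```

Under this model the clone cost goes from about 22.4 ms at baseline to about 0.55 ms with both changes, versus about 8.1 ms or 1.5 ms with either one alone. That is the sense in which the axes compound; the real combined number still needs the empirical validation mentioned above.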
Spike attribution
Spike report identified all three hotspots; this PR completes the trio:
Test plan
- `cargo test -p egent-code-plexus --tests`: 1182 passed
- `cargo test -p ecp-core --tests`: 268 passed
- `cargo clippy -p egent-code-plexus --tests -- -D warnings`: clean
- `cargo clippy -p ecp-core --tests -- -D warnings`: clean
- `impact --baseline HEAD~1` slowdown in the bench was a bench-state confound (both main and H2 take ~1.4s when L1 needs refresh; the bench's 9 ms reading on main was from a previously-warm L1 state)