
perf: fast-path exact primitive vector decode #712

Merged
lwshang merged 4 commits into master from perf-3-primitive-vector-decode on Mar 15, 2026

@sasa-tomic (Member) commented Mar 13, 2026

Overview
Reduce repeated per-element type work when decoding primitive vectors.

Requirements
Preserve vector decoding semantics, including compatibility with extra trailing arguments.

Solution
Add an exact-primitive fast path for vector elements so deserialization can skip repeated type unrolling and checks when expected and wire element types already match. Add a compatibility test covering extra-args behavior.
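A simplified sketch of where the savings come from. Everything in it is hypothetical: `Ty`, `Decoder`, `decode_vec_i16`, and the `type_checks` counter are illustrative stand-ins rather than candid's actual deserializer types, and the general path is reduced to a single per-element type check. The point it shows is that the exact-match test runs once per vector instead of once per element.

```rust
/// Hypothetical, simplified element types (not candid's real type representation).
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int16,
    Nat16,
}

impl Ty {
    /// Only fixed-width primitives qualify for the fast path.
    fn is_fixed_width_primitive(&self) -> bool {
        matches!(self, Ty::Int16 | Ty::Nat16)
    }
}

/// Toy decoder over a byte slice that counts per-element type checks.
struct Decoder<'a> {
    input: &'a [u8],
    pos: usize,
    type_checks: usize,
}

impl<'a> Decoder<'a> {
    fn new(input: &'a [u8]) -> Self {
        Decoder { input, pos: 0, type_checks: 0 }
    }

    /// Reads one little-endian i16.
    fn read_i16(&mut self) -> Result<i16, String> {
        let b = self
            .input
            .get(self.pos..self.pos + 2)
            .ok_or("unexpected end of input")?;
        let v = i16::from_le_bytes([b[0], b[1]]);
        self.pos += 2;
        Ok(v)
    }

    /// General path (simplified): re-check the element types for every element.
    fn check_element(&mut self, wire: &Ty, expected: &Ty) -> Result<(), String> {
        self.type_checks += 1;
        if wire == expected {
            Ok(())
        } else {
            Err(format!("type mismatch: wire {wire:?}, expected {expected:?}"))
        }
    }

    fn decode_vec_i16(
        &mut self,
        len: usize,
        wire_elem: &Ty,
        expected_elem: &Ty,
    ) -> Result<Vec<i16>, String> {
        let mut out = Vec::with_capacity(len);
        // Decide once, before the loop, whether the exact-match fast path applies.
        let fast = wire_elem == expected_elem && wire_elem.is_fixed_width_primitive();
        if fast {
            // Tight loop: no per-element type work at all.
            for _ in 0..len {
                out.push(self.read_i16()?);
            }
        } else {
            for _ in 0..len {
                self.check_element(wire_elem, expected_elem)?;
                out.push(self.read_i16()?);
            }
        }
        Ok(out)
    }
}
```

With both element types set to `Ty::Int16`, decoding the bytes `[1, 0, 0xff, 0xff]` yields `[1, -1]` and leaves `type_checks` at 0; with a `Ty::Nat16` wire type the sketch falls back to the checked loop and rejects the mismatch on the first element.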

Considerations
The optimization is limited to exact primitive matches and leaves the general decode path unchanged. Series-level benchmark context is tracked in #710.

Skip repeated type setup for vecs whose wire and expected element types are the same fixed-width primitive so large primitive arrays decode with less overhead.
@sasa-tomic requested a review from a team as a code owner March 13, 2026 11:39
@github-actions (bot) commented Mar 13, 2026

| Name | Max Mem (KiB) | Encode (instructions) | Decode (instructions) |
|------|---------------|-----------------------|-----------------------|
| blob | 4_224 | 4_207_487 | 2_122_432 |
| btreemap | 73_856 | 531_975_925 | 12_984_691_964 (+0.00%) |
| nns | 192 | 2_021_253 | 5_663_359 (-0.02%) |
| nns_list_proposal | 1_216 | 7_013_446 (+0.01%) | 64_312_599 (+0.20%) |
| option_list | 64 | 716_007 | 21_888_381 (+0.15%) |
| text | 6_336 | 4_204_384 | 7_877_759 |
| variant_list | 64 | 710_989 | 20_528_739 (+0.45%) |
| vec_int16 | 16_704 | 123_694_298 | 633_364_072 (-36.55%) |

  • Parser cost: 17_069_949
  • Extra args: 2_872_191 (+0.71%)
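The Max Mem column appears to be each benchmark's total `heap_increase` from the raw report expressed in KiB, using the 64 KiB WebAssembly page size: blob's 66 pages give 4_224 KiB, btreemap's 1154 pages give 73_856 KiB, and vec_int16's 261 pages give 16_704 KiB. This reading is inferred from the numbers rather than documented by the report itself; the conversion is simply:

```rust
/// WebAssembly linear memory grows in pages of 64 KiB.
const WASM_PAGE_KIB: u64 = 64;

/// Converts a `heap_increase` page count into KiB (assumed meaning of "Max Mem").
fn pages_to_kib(pages: u64) -> u64 {
    pages * WASM_PAGE_KIB
}
```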
Raw report:
---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 13.52 B (0.00%) (change within noise threshold)
    heap_increase: 1154 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (no change)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 12.98 B (0.00%) (change within noise threshold)
    heap_increase: 995 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.87 M (0.71%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 25.59 M (-0.01%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 17.07 M (no change)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.66 M (-0.02%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 71.33 M (0.18%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.01 M (0.01%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 64.31 M (0.20%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 22.61 M (0.15%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 716.01 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 21.89 M (0.15%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (no change)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 21.24 M (0.44%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 710.99 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 20.53 M (0.45%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 757.06 M (improved by 32.52%)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 123.69 M (no change)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 633.36 M (improved by 36.55%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 9 | regressed 0 | improved 1 | new 0 | unchanged 8]
    change:   [max +129.79K | p75 +32.77K | median +4 | p25 0 | min -364.90M]
    change %: [max +0.71% | p75 +0.18% | median 0.00% | p25 0.00% | min -32.52%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                   | calls |     ins |  ins Δ% |  HI |  HI Δ% | SMI |  SMI Δ% |
|--------|------------------------|-------|---------|---------|-----|--------|-----|---------|
|   -    | vec_int16              |       | 757.06M | -32.52% | 261 |  0.00% |   0 |   0.00% |
|   -    | vec_int16::2. Decoding |     1 | 633.36M | -36.55% |   0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

sasa-tomic and others added 3 commits March 13, 2026 13:06
… vec fast path

- Add Drop impl on Compound to reset primitive_vec_fast_path on drop,
  preventing fast-path state leaking on error/panic paths; reborrow
  self.de in VariantAccess methods to satisfy the compiler
- Add comment explaining why deserialize_bool lives outside primitive_impl!
- Add tests: nested Vec<Vec<i16>>, struct with vec field, mismatched
  Rust/wire type error path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
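The `Drop` impl in the first bullet follows the usual Rust RAII guard pattern: the flag is reset whenever the guard goes out of scope, including on early `?` returns and during panic unwinding. A minimal sketch, assuming hypothetical `Deserializer` and `Compound` types that model only the `primitive_vec_fast_path` flag (the real candid types carry much more state), plus a hypothetical `decode_seq` driver:

```rust
/// Hypothetical stand-in for the deserializer; only the fast-path flag is modeled.
struct Deserializer {
    primitive_vec_fast_path: bool,
}

/// Hypothetical sequence accessor that enables the fast path while it is alive.
struct Compound<'a> {
    de: &'a mut Deserializer,
}

impl<'a> Compound<'a> {
    fn new(de: &'a mut Deserializer, fast_path: bool) -> Self {
        de.primitive_vec_fast_path = fast_path;
        Compound { de }
    }

    /// Returns 1 if the element was decoded with the fast path enabled.
    fn next_element(&mut self, fail: bool) -> Result<u8, String> {
        if fail {
            return Err("decode error".to_string());
        }
        Ok(if self.de.primitive_vec_fast_path { 1 } else { 0 })
    }
}

impl Drop for Compound<'_> {
    // Runs on normal exit, on early `?` returns, and during unwinding,
    // so the flag cannot leak into whatever is decoded next.
    fn drop(&mut self) {
        self.de.primitive_vec_fast_path = false;
    }
}

fn decode_seq(de: &mut Deserializer, fail: bool) -> Result<u8, String> {
    let mut seq = Compound::new(de, true);
    let v = seq.next_element(fail)?; // the early return still drops `seq`
    Ok(v)
}
```

Because `Compound` now implements `Drop`, its fields can no longer be moved out by value (rustc error E0509), which is a likely reason the commit reborrows `self.de` (as `&mut *self.de`) in the `VariantAccess` methods.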
@lwshang merged commit d9fd40c into master Mar 15, 2026
11 checks passed
@lwshang deleted the perf-3-primitive-vector-decode branch March 15, 2026 17:13
lwshang added a commit that referenced this pull request Mar 18, 2026
When decoding a trailing/extra argument that is a primitive vector,
`deserialize_ignored_any` relies on `expect_type`/`wire_type` being set
to the element type. The fast path introduced in #712 skipped setting
these, causing `deserialize_ignored_any` to see the outer `Vec<T>` type
and attempt to decode a nested vector instead of a scalar, corrupting
the byte stream.

Fix: always set `expect_type`/`wire_type` to the element type before
calling `seed.deserialize`, and only skip `add_cost(3)` in the fast
path. The `primitive_impl!` macro checks `primitive_vec_fast_path`
before touching these types, so normal decode performance is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
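Why the element type matters when skipping: ignoring a value means consuming exactly the bytes it occupies, and that byte count depends entirely on the type the skipper is given. In the Candid binary format a `vec` is a LEB128 length followed by its elements, and an `int16` is 2 little-endian bytes. The sketch below uses a hypothetical `skip` function over a toy two-variant `Ty`, not candid's `deserialize_ignored_any`, to show how skipping each element with the outer `Vec<T>` type desynchronizes the stream.

```rust
/// Hypothetical, simplified types for illustration.
#[derive(Debug)]
enum Ty {
    Int16,
    Vec(Box<Ty>),
}

/// Reads an unsigned LEB128 integer starting at `*pos`.
fn read_leb128(bytes: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut result = 0u64;
    let mut shift = 0;
    loop {
        let b = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        result |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
        if shift >= 64 {
            return Err("leb128 overflow".to_string());
        }
    }
}

/// Skips one value of type `ty`, advancing `*pos` past exactly its bytes.
fn skip(bytes: &[u8], pos: &mut usize, ty: &Ty) -> Result<(), String> {
    match ty {
        Ty::Int16 => {
            if *pos + 2 > bytes.len() {
                return Err("unexpected end of input".to_string());
            }
            *pos += 2;
            Ok(())
        }
        Ty::Vec(elem) => {
            let len = read_leb128(bytes, pos)?;
            for _ in 0..len {
                skip(bytes, pos, elem)?; // correct: element type
            }
            Ok(())
        }
    }
}

/// Mimics the bug: every element is skipped with the outer vector type.
fn skip_vec_buggy(bytes: &[u8], pos: &mut usize, vec_ty: &Ty) -> Result<(), String> {
    let len = read_leb128(bytes, pos)?;
    for _ in 0..len {
        skip(bytes, pos, vec_ty)?; // wrong: reads a length prefix that is not there
    }
    Ok(())
}
```

For the `vec int16` value `[1, -1]` (bytes `[2, 1, 0, 0xff, 0xff]`), `skip` consumes all 5 bytes, while `skip_vec_buggy` treats the first element's bytes as a nested length and runs off the end of the input.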
lwshang added a commit that referenced this pull request Mar 18, 2026
## Summary

- Fix decoding failure when a trailing/extra argument is a primitive
vector (`vec int8/16/32/64`, `vec nat8/16/32/64`, `vec float32/64`, `vec
bool`)
- The fast-path optimization (#712) skipped setting
`expect_type`/`wire_type` per element; `deserialize_ignored_any` then
misidentified the element type and corrupted the byte stream
- Fix: always set element types before calling `seed.deserialize`,
skipping only the `add_cost(3)` call in the fast path
- Release candid 0.10.26
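The shape of the fix can be sketched as a per-element loop that always publishes the element types and makes only the cost accounting conditional. The field names `expect_type`, `wire_type`, and `primitive_vec_fast_path` and the `add_cost(3)` call come from the commit message; `State`, `decode_elements`, and the `visit` closure (standing in for `seed.deserialize`) are hypothetical simplifications, not the real candid code.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int16,
    Vec(Box<Ty>),
}

/// Hypothetical slice of deserializer state relevant to the fix.
struct State {
    expect_type: Ty,
    wire_type: Ty,
    primitive_vec_fast_path: bool,
    cost: usize,
}

impl State {
    fn add_cost(&mut self, n: usize) {
        self.cost += n;
    }
}

/// Hypothetical per-element driver; `visit` stands in for `seed.deserialize`.
fn decode_elements<F>(
    st: &mut State,
    len: usize,
    expect_elem: &Ty,
    wire_elem: &Ty,
    mut visit: F,
) -> Result<(), String>
where
    F: FnMut(&State) -> Result<(), String>,
{
    for _ in 0..len {
        // Always set the element types: a consumer such as an ignored-value
        // skipper reads them even when the fast path is active.
        st.expect_type = expect_elem.clone();
        st.wire_type = wire_elem.clone();
        if !st.primitive_vec_fast_path {
            st.add_cost(3); // only this bookkeeping is skipped on the fast path
        }
        visit(&*st)?;
    }
    Ok(())
}
```

Starting from a state whose types are still the outer `Vec<int16>`, every `visit` call now sees `int16`, and the fast path differs from the normal path only in the accumulated cost.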

## Test plan

- [ ] New regression test `primitive_vector_is_extra_args` in
`tests/compatibility_vectors.rs` covers the exact failure scenario
- [ ] All existing `compatibility_vectors` tests continue to pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
