perf: optimize decode paths for Nat/Int, primitive vecs, and strings by sasa-tomic · Pull Request #721 · dfinity/candid

sasa-tomic · 2026-03-18T09:41:31Z

Overview

Four decode-side optimizations, all behavior-preserving:

1. Nat/Int deserialization bypass: for values that fit in u64/i64, read the LEB128 encoding directly and call visitor.visit_u64/visit_i64, avoiding the BigUint/BigInt → bytes → BigUint round-trip (saves 3 allocations per value).
2. BigNum vector fast path: track decoding cost in one batch and skip per-element type cloning/checking for `Vec<Nat>`, `Vec<Int>`, and `Vec<Int>` with Nat wire type, mirroring the existing primitive vec fast path.
3. PrimitiveVecAccess with IntoDeserializer: on little-endian platforms, decode primitive vectors via a lightweight SeqAccess that reads directly from the input byte slice using serde's IntoDeserializer, bypassing the full Deserializer and Cursor overhead.
4. Borrowed string deserialization: use visit_borrowed_str instead of copying bytes, enabling zero-copy decoding for &str targets.
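
The Nat/Int bypass described above can be pictured with the std-only sketch below. It assumes Candid's wire encoding of `nat` as unsigned LEB128 and `int` as signed LEB128; the helper names `read_uleb128_u64` and `read_sleb128_i64` are hypothetical and are not candid's internals. Each returns `None` when the value does not fit (or the encoding is longer than 10 bytes or truncated), which is the point where a decoder would fall back to the BigUint/BigInt path instead of calling `visit_u64`/`visit_i64`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: decode an unsigned LEB128 value (Candid `nat`) from the
/// front of `input`. Returns `Some((value, bytes_consumed))` if it fits in a u64,
/// otherwise `None`, signalling "fall back to the BigUint path".
fn read_uleb128_u64(input: &[u8]) -> Option<(u64, usize)> {
    let mut acc: u128 = 0;
    // A u64 needs at most 10 LEB128 groups; 10 * 7 = 70 bits fit in a u128.
    for (i, &byte) in input.iter().take(10).enumerate() {
        acc |= u128::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            // Last group: accept only if the accumulated value fits in a u64.
            return u64::try_from(acc).ok().map(|v| (v, i + 1));
        }
    }
    None // truncated input or more than 10 groups
}

/// Hypothetical helper: decode a signed LEB128 value (Candid `int`) into an i64,
/// with the same `None`-means-fall-back contract (BigInt path).
fn read_sleb128_i64(input: &[u8]) -> Option<(i64, usize)> {
    let mut acc: i128 = 0;
    for (i, &byte) in input.iter().take(10).enumerate() {
        acc |= i128::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            // Bit 6 of the last group is the sign bit: sign-extend if set.
            if byte & 0x40 != 0 {
                acc |= -1i128 << (7 * (i + 1));
            }
            return i64::try_from(acc).ok().map(|v| (v, i + 1));
        }
    }
    None
}
```

Decoding into a wider accumulator and range-checking once at the end keeps the hot loop free of per-byte overflow branches; values that fail the check are rare in practice and take the slower arbitrary-precision route.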

Benchmark improvements (decode, vs previous optimized baseline):
vec_nat: 910M → 300M (-67%)
vec_nat32: 406M → 247M (-39%)
vec_nat64: 411M → 255M (-38%)
vec_int16: 411M → 251M (-39%)
btreemap: 13.3B → 11.2B (-16%)
option_list: 23M → 18M (-20%)
variant_list: 21M → 17M (-21%)

Made-with: Cursor

github-actions · 2026-03-18T09:46:27Z

| Name | Max Mem (KiB) | Encode | Decode |
|------|---------------|--------|--------|
| blob | 4_224 | 4_207_487 | 2_122_433 (0.00%) |
| btreemap | 75_456 (+2.17%) | 531_975_781 (-0.00%) | 11_105_147_772 (-14.72%) |
| nns | 192 | 2_021_253 | 5_669_058 (+0.04%) |
| nns_list_proposal | 1_216 | 7_013_836 (+0.11%) | 65_295_437 (+1.59%) |
| option_list | 128 (+100.00%) | 716_415 (+0.05%) | 17_851_091 (-18.62%) |
| text | 6_336 | 4_204_384 | 7_877_830 (0.00%) |
| variant_list | 128 (+100.00%) | 711_213 | 16_594_674 (-20.01%) |
| vec_int16 | 12_480 | 8_404_689 | 249_586_549 (-54.92%) |
| vec_nat | 11_008 (+13.91%) | 67_095_666 | 304_518_781 (-63.93%) |
| vec_nat32 | 24_768 | 16_793_297 | 243_295_382 (-55.72%) |
| vec_nat64 | 49_344 | 33_570_495 | 251_684_254 (-54.88%) |

Parser cost: 16_174_059 (-0.00%)
Extra args: 2_854_026 (+0.55%)
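
The Max Mem column appears to be each benchmark's total heap_increase from the raw report expressed in KiB: a Wasm memory page is 64 KiB, and for example blob's 66 pages give 4_224. This reading is inferred from the numbers rather than taken from canbench documentation; the helper name `pages_to_kib` is hypothetical.

```rust
/// Wasm linear memory grows in pages of 64 KiB; the raw report counts heap_increase in pages.
const WASM_PAGE_KIB: u64 = 64;

/// Convert a heap_increase page count into KiB, the unit of the Max Mem column.
fn pages_to_kib(pages: u64) -> u64 {
    pages * WASM_PAGE_KIB
}
```

Read this way, the +100.00% heap regressions for option_list and variant_list are a change from 1 page to 2 pages, i.e. one extra 64 KiB page.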


---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (0.00%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 11.64 B (improved by 14.15%)
    heap_increase: 1179 pages (regressed by 2.17%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (-0.00%) (change within noise threshold)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 11.11 B (improved by 14.72%)
    heap_increase: 1020 pages (regressed by 2.51%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.85 M (0.55%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 24.70 M (0.01%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 16.17 M (-0.00%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.67 M (0.04%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 72.31 M (1.44%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.01 M (0.11%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 65.30 M (1.59%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 18.57 M (improved by 18.03%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 716.41 K (0.05%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 17.85 M (improved by 18.62%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (0.00%) (change within noise threshold)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (0.00%) (change within noise threshold)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 17.31 M (improved by 19.35%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 711.21 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 16.59 M (improved by 20.01%)
    heap_increase: 2 pages (regressed by 100.00%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 257.99 M (improved by 54.10%)
    heap_increase: 195 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 8.40 M (no change)
    heap_increase: 130 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 249.59 M (improved by 54.92%)
    heap_increase: 65 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat
  total:
    instructions: 371.62 M (improved by 59.23%)
    heap_increase: 172 pages (regressed by 13.91%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 67.10 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 304.52 M (improved by 63.93%)
    heap_increase: 139 pages (regressed by 17.80%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat32
  total:
    instructions: 260.09 M (improved by 54.07%)
    heap_increase: 387 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 16.79 M (no change)
    heap_increase: 258 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 243.30 M (improved by 55.72%)
    heap_increase: 129 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_nat64
  total:
    instructions: 285.26 M (improved by 51.77%)
    heap_increase: 771 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 33.57 M (no change)
    heap_increase: 514 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 251.68 M (improved by 54.88%)
    heap_increase: 257 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 12 | regressed 0 | improved 7 | new 0 | unchanged 5]
    change:   [max +1.03M | p75 +415 | median -4.12M | p25 -306.18M | min -1.92B]
    change %: [max +1.44% | p75 0.00% | median -16.09% | p25 -52.34% | min -59.23%]

  heap_increase:
    status:   Regressions detected 🔴
    counts:   [total 12 | regressed 4 | improved 0 | new 0 | unchanged 8]
    change:   [max +25 | p75 +1 | median 0 | p25 0 | min 0]
    change %: [max +100.00% | p75 +5.10% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 12 | regressed 0 | improved 0 | new 0 | unchanged 12]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                      | calls |     ins |  ins Δ% |    HI |    HI Δ% | SMI |  SMI Δ% |
|--------|---------------------------|-------|---------|---------|-------|----------|-----|---------|
|  +/-   | btreemap                  |       |  11.64B | -14.15% | 1.18K |   +2.17% |   0 |   0.00% |
|  +/-   | btreemap::2. Decoding     |     1 |  11.11B | -14.72% | 1.02K |   +2.51% |   0 |   0.00% |
|  +/-   | option_list               |       |  18.57M | -18.03% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | option_list::2. Decoding  |     1 |  17.85M | -18.62% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | variant_list              |       |  17.31M | -19.35% |     2 | +100.00% |   0 |   0.00% |
|  +/-   | variant_list::2. Decoding |     1 |  16.59M | -20.01% |     2 | +100.00% |   0 |   0.00% |
|   -    | vec_nat64                 |       | 285.26M | -51.77% |   771 |    0.00% |   0 |   0.00% |
|   -    | vec_nat32                 |       | 260.09M | -54.07% |   387 |    0.00% |   0 |   0.00% |
|   -    | vec_int16                 |       | 257.99M | -54.10% |   195 |    0.00% |   0 |   0.00% |
|   -    | vec_nat64::2. Decoding    |     1 | 251.68M | -54.88% |   257 |    0.00% |   0 |   0.00% |
|   -    | vec_int16::2. Decoding    |     1 | 249.59M | -54.92% |    65 |    0.00% |   0 |   0.00% |
|   -    | vec_nat32::2. Decoding    |     1 | 243.30M | -55.72% |   129 |    0.00% |   0 |   0.00% |
|  +/-   | vec_nat                   |       | 371.62M | -59.23% |   172 |  +13.91% |   0 |   0.00% |
|  +/-   | vec_nat::2. Decoding      |     1 | 304.52M | -63.93% |   139 |  +17.80% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

Remove redundant explicit cleanup blocks after visit_seq — Compound::drop already resets both primitive_vec_fast_path and bignum_vec_fast_path on all paths (success and error). Restore the explanatory comment on the Drop impl. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
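
The cleanup argument in the commit above relies on the Drop-guard pattern: state reset in `Drop` runs on every exit path, so explicit cleanup after visiting the sequence is redundant. The sketch below is a minimal std-only illustration; `Decoder`, `Compound`, and `visit_seq` here are simplified hypothetical stand-ins that only borrow the flag names from the commit message, not candid's actual types.

```rust
/// Hypothetical decoder state carrying the two fast-path flags.
#[derive(Default)]
struct Decoder {
    primitive_vec_fast_path: bool,
    bignum_vec_fast_path: bool,
}

/// Hypothetical stand-in for Compound: borrows the decoder while a sequence
/// is being visited and resets the flags when it goes out of scope.
struct Compound<'a> {
    de: &'a mut Decoder,
}

impl<'a> Drop for Compound<'a> {
    fn drop(&mut self) {
        // Runs on normal return, on an early `return Err(..)`, and during
        // unwinding, so callers need no separate cleanup block.
        self.de.primitive_vec_fast_path = false;
        self.de.bignum_vec_fast_path = false;
    }
}

/// Enables a fast path, "decodes" a sequence, and optionally fails midway.
/// Returns whether the fast path was active while decoding.
fn visit_seq(de: &mut Decoder, fail: bool) -> Result<bool, String> {
    de.bignum_vec_fast_path = true;
    let compound = Compound { de };
    let was_fast = compound.de.bignum_vec_fast_path;
    if fail {
        return Err("element decode failed".to_string()); // guard still resets flags
    }
    Ok(was_fast)
}
```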

Previously set_position advanced by total_bytes unconditionally. Use access.offset (bytes actually consumed) so the cursor is correct if the visitor short-circuits before consuming all elements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
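
To illustrate why advancing by the bytes actually consumed matters, here is a std-only sketch of a slice-backed sequence reader for nat32 values (4 little-endian bytes each) whose consumer stops early. `PrimitiveSliceAccess` and `decode_prefix` are hypothetical; candid's real PrimitiveVecAccess goes through serde's IntoDeserializer, which this sketch does not model.

```rust
use std::io::Cursor;

/// Hypothetical lightweight accessor over a byte slice of little-endian u32s.
struct PrimitiveSliceAccess<'a> {
    bytes: &'a [u8],
    /// Bytes actually consumed so far.
    offset: usize,
}

impl<'a> PrimitiveSliceAccess<'a> {
    fn next_element(&mut self) -> Option<u32> {
        let b = self.bytes.get(self.offset..self.offset + 4)?;
        let value = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        self.offset += 4;
        Some(value)
    }
}

/// Decode a vector of `len` nat32 values at the cursor position, letting the
/// "visitor" stop after `limit` elements. The cursor is then advanced by
/// `access.offset` (bytes consumed) rather than `len * 4` (total bytes).
fn decode_prefix(cursor: &mut Cursor<&[u8]>, len: usize, limit: usize) -> Option<Vec<u32>> {
    let data: &[u8] = *cursor.get_ref();
    let start = cursor.position() as usize;
    let total_bytes = len.checked_mul(4)?;
    let slice = data.get(start..start.checked_add(total_bytes)?)?;
    let mut access = PrimitiveSliceAccess { bytes: slice, offset: 0 };
    let mut out = Vec::new();
    while out.len() < limit {
        match access.next_element() {
            Some(v) => out.push(v),
            None => break,
        }
    }
    cursor.set_position((start + access.offset) as u64);
    Some(out)
}
```

With a visitor that reads only two of four elements, the cursor ends at byte 8; advancing by total_bytes would have placed it at 16 and silently skipped unread data.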

bignum_vec_fast_path is only ever set from deserialize_seq when type information is available, so is_untyped must be false whenever the fast path is active. Add debug_assert to make this invariant explicit in both deserialize_int and deserialize_nat. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Both methods were identical after the visit_borrowed_str change. Delegate deserialize_string to deserialize_str to avoid future drift. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
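
The borrowed-string change and the delegation above can be sketched without serde. The sketch assumes Candid's `text` wire format (a ULEB128 byte length followed by UTF-8 bytes); `read_len`, `read_text_borrowed`, and `read_text_owned` are hypothetical names. The borrowed reader returns a `&str` pointing into the input (what visit_borrowed_str enables), and the owned reader simply delegates and copies once.

```rust
use std::convert::TryFrom;

/// Hypothetical ULEB128 length reader; 9 groups (63 bits) always fit in a u64.
fn read_len(input: &[u8]) -> Option<(usize, usize)> {
    let mut acc: u64 = 0;
    for (i, &byte) in input.iter().take(9).enumerate() {
        acc |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return usize::try_from(acc).ok().map(|n| (n, i + 1));
        }
    }
    None
}

/// Zero-copy read of a `text` value: the returned &str borrows from `input`.
fn read_text_borrowed<'de>(input: &'de [u8]) -> Option<(&'de str, usize)> {
    let (len, header) = read_len(input)?;
    let end = header.checked_add(len)?;
    let bytes = input.get(header..end)?;
    let text = std::str::from_utf8(bytes).ok()?; // validate, do not copy
    Some((text, end))
}

/// Owned variant delegates to the borrowed one, so the two paths cannot drift.
fn read_text_owned(input: &[u8]) -> Option<(String, usize)> {
    read_text_borrowed(input).map(|(text, used)| (text.to_owned(), used))
}
```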

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Conflict in Compound::next_element_seed: master refactored to always set expect_type/wire_type upfront and simplified the cost condition. Resolved by keeping master's unconditional type assignment while extending the is_fast check to include bignum_vec_fast_path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
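
The resolved is_fast check can be read as "when either vec fast path is active, the vector's cost was already charged in one batch, so next_element_seed skips the per-element charge". The sketch below models that idea with a hypothetical `Cost` budget, a made-up per-element cost, and a hypothetical `decode_vec`; it is an illustration of batch cost tracking, not candid's actual accounting.

```rust
/// Hypothetical decoding-cost budget.
struct Cost {
    used: u64,
    limit: u64,
}

impl Cost {
    fn charge(&mut self, amount: u64) -> Result<(), String> {
        self.used = self.used.checked_add(amount).ok_or("cost overflow")?;
        if self.used > self.limit {
            return Err(format!("cost {} exceeds limit {}", self.used, self.limit));
        }
        Ok(())
    }
}

struct SeqState {
    primitive_vec_fast_path: bool,
    bignum_vec_fast_path: bool,
}

impl SeqState {
    fn is_fast(&self) -> bool {
        self.primitive_vec_fast_path || self.bignum_vec_fast_path
    }
}

const ELEM_COST: u64 = 3; // hypothetical per-element cost

/// Returns the outcome and how many elements were decoded before it.
fn decode_vec(cost: &mut Cost, state: &SeqState, len: u64) -> (Result<(), String>, u64) {
    if state.is_fast() {
        // Fast path: charge the whole vector once, before decoding anything.
        let total = match len.checked_mul(ELEM_COST) {
            Some(t) => t,
            None => return (Err("cost overflow".to_string()), 0),
        };
        if let Err(e) = cost.charge(total) {
            return (Err(e), 0);
        }
    }
    for i in 0..len {
        if !state.is_fast() {
            // Slow path: charge each element as it is visited.
            if let Err(e) = cost.charge(ELEM_COST) {
                return (Err(e), i);
            }
        }
        // ... the element itself would be decoded here ...
    }
    (Ok(()), len)
}
```

Both paths charge the same total; the batch path just does it with one check and rejects an over-budget vector before decoding any element.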

is_untyped can be true with bignum_vec_fast_path active when deserializing IDLValue (get_value_with_type sets is_untyped=true). The LEB128 fast path is already correctly guarded by !is_untyped; the bignum fallback path works regardless because wire_type is pre-set by the vec fast path setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sasa-tomic requested a review from a team as a code owner March 18, 2026 09:41

fmt

c4acfce

lwshang and others added 6 commits March 18, 2026 10:53

refactor: deduplicate deserialize_string by delegating to deserialize_str

13948cb

fmt

c7030ad

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lwshang approved these changes Mar 18, 2026


lwshang merged commit 99ef1fa into master Mar 18, 2026
11 checks passed

lwshang deleted the sat-perf-1-decode branch March 18, 2026 15:13

lwshang merged 9 commits into master from sat-perf-1-decode
