
No length prefix in trie #452

Open
illuzen wants to merge 20 commits into main from illuzen/no-length-trie

Conversation


@illuzen illuzen commented Mar 20, 2026

this sits on top of #451

it removes the length prefixes from the trie and makes a corresponding change in trie-db to fix the cache usage.

it also uses the updated PoseidonHasher to be non-injective

@n13 n13 changed the base branch from main to illuzen/inline-trie-db March 21, 2026 04:44

n13 commented Mar 21, 2026

changed base to 451 branch


n13 commented Mar 21, 2026

Security and correctness review of PR #452 on top of #451.


PR #452 Review: Security & Correctness

This PR does three things: (A) removes length prefixes from branch children in the trie encoding, (B) forces all values to be hashed (never inlined), and (C) bypasses the node cache for value lookups to fix a stale-cache bug. It also switches account ID derivation to an injective Poseidon hash.


Critical: Zero-hash sentinel in branch decoding

In node_codec.rs decode_plan, children that are all-zero bytes are treated as a sentinel meaning "child exists but omitted from proof":

let is_zero_hash = data[range.clone()].iter().all(|&b| b == 0);
if is_zero_hash {
    children[i] = Some(NodeHandlePlan::Inline(range.start..range.start));
} else {
    children[i] = Some(NodeHandlePlan::Hash(range));
}

Concerns:

  • This conflates data with metadata. While a cryptographic hash being all-zeros is astronomically unlikely, using a magic sentinel value in a security-critical structure is fragile.
  • The empty Inline(range.start..range.start) represents a zero-length inline node. Downstream code that tries to decode this inline child will get an empty slice. If the proof verifier tries to hash this to reconstruct the root, it gets H::hash(&[]) instead of the real child hash, producing the wrong root -- unless the verifier has special handling for zero-length inlines.
  • The encoding side never produces zero hashes (it always writes real hashes from H::hash(...) or ChildReference::Hash). So this decoding path only triggers for crafted inputs or proofs. This asymmetry between encode and decode paths is a smell that should be documented or handled more explicitly.

Recommendation: Use an explicit sentinel type or separate proof format rather than overloading the hash space. At minimum, add a comment explaining which call paths reach this code and confirming the proof verifier handles zero-length inlines correctly.


Critical: Inline children silently hashed during encoding

In branch_node_nibbled:

&Some(ChildReference::Inline(inline_data, len)) => {
    let child_hash = H::hash(&inline_data.as_ref()[..len]);
    output.extend_from_slice(child_hash.as_ref());
    true
},

If an inline child is passed, it gets silently hashed. But inline children by definition are not stored separately in the DB -- they were embedded directly in the parent node. After this hashing, the decoder will see a NodeHandlePlan::Hash and try to look it up in the DB, causing an IncompleteDatabase error.

The assumption is that with FELT_ALIGNED_MAX_INLINE_VALUE = 0, inline children never occur. The debug_assert in append_substream partially enforces this, but only in debug builds:

debug_assert!(
    data.len() > MAX_INLINE_THRESHOLD,
    "Node too small ({} bytes) - would be inlined but inline children not supported",
    data.len()
);

Recommendation: Make this a hard assert! (or return an error). A debug_assert compiles out in release mode, meaning a violation would silently corrupt the trie in production rather than failing loudly.


Important: Cache bypass is correct but heavy-handed

The lookup.rs changes completely disable two caching layers:

  1. load_owned_value no longer uses cache.get_or_insert_node() -- it always goes to DB
  2. The lookup path now ignores CachedValue::ExistingHash and CachedValue::Existing, falling through to lookup_data (DB fetch)

This effectively means every value read hits the database. The stale-cache regression test (poisoned_node_value_cache_is_ignored_for_hashed_values) proves the bug existed -- a stale NodeOwned::Value in the node cache would return wrong data. The fix is correct for safety.

However, the node cache still caches trie structure nodes (branches, leaves). The issue is specifically that value nodes were being cached by their hash, but when a key's value changes, the old value hash's cached entry isn't invalidated. Since the new value has a different hash, the old entry becomes orphaned but harmless -- unless the trie happens to look up the same hash for a different reason.

The real question: With MAX_INLINE_VALUE = 0, all values are external (hashed). So every value read pays a DB round-trip. For a chain that reads System::Number on every block, this matters. Is this intended as a permanent solution or a temporary safety measure?


Important: Consensus-breaking changes

Two changes break consensus with existing chain state:

  1. FELT_ALIGNED_MAX_INLINE_VALUE: 31 → 0 changes how every trie node is encoded. Existing nodes with inline values (<=31 bytes) would become invalid under the new encoding. The state root computed after this change will differ from the previous one for identical data.

  2. PoseidonHasher::hash() → PoseidonHasher::hash_padded() changes account ID derivation for every public key. Every existing account mapping becomes stale.

These require a chain reset or coordinated migration. If this is intended (e.g., testnet reset), fine. If targeting a live chain, this needs a migration path.


Minor: All dependencies pinned to feature branches

qp-poseidon = { git = "...", branch = "illuzen/injectivity" }
qp-plonky2 = { git = "...", branch = "illuzen/new-rate" }
# ... 14 more git branch overrides

This is fine for development but these branches can be force-pushed, making builds non-reproducible. Before merging to main, these should be pinned to tags or commit SHAs.


Minor: Excessive debug logging

node_codec.rs gained many log::debug! calls that format raw trie data (hashes, partial keys, bitmap bytes). Even at debug level:

  • Format arguments are evaluated before the log level check in many log implementations
  • Could leak sensitive state data if debug logging is enabled in production
  • These look like debugging aids that should be stripped before merge

Tests: Good coverage, some cleanup needed

The regression test for cache poisoning is well-targeted. The encode/decode round-trip tests for nibble_count 30/31 and the realistic chain storage test cover the exact failure scenario. The decode_hex helper is duplicated 3 times across tests -- minor but worth extracting.


Summary

| Issue | Severity | Action |
| --- | --- | --- |
| Zero-hash sentinel in decode | Critical | Document or replace with explicit proof format |
| Inline→hash silent conversion | Critical | Upgrade debug_assert! to assert! |
| Full cache bypass | Important | Confirm this is acceptable for performance; consider targeted invalidation |
| Consensus-breaking changes | Important | Confirm chain reset is planned |
| Git branch dependencies | Minor | Pin to tags/SHAs before main merge |
| Debug logging in hot path | Minor | Remove before merge |

```rust
let preimage_felts = preimage.to_felts();
PoseidonHasher::hash_variable_length(preimage_felts)
```

```rust
let preimage_felts = unsafe_digest_bytes_to_felts(&preimage);
PoseidonHasher::hash_variable_length(preimage_felts.to_vec())
```

Shouldn't this use the same poseidon hash we use for normal addresses?


n13 commented Mar 23, 2026

PR #452 Review

Context update

  • This is a new chain, so backward compatibility is not required.
  • The previous compatibility concern is removed from this review.

Findings (ordered by severity)

Critical

  • Compact proof omission semantics appear inconsistent
    • primitives/trie-db/src/proof/generate.rs still encodes omitted hashed children as ChildReference::Inline(C::HashOut::default(), 0).
    • primitives/trie/src/node_codec.rs now encodes branch children as fixed-size hashes with no length prefix; inline children are converted to hashes.
    • The same decoder path treats all-zero child hashes as a special omitted-child sentinel and maps that to empty inline.
    • Risk: compact proof generation/verification could fail or behave unexpectedly on omitted hashed-child paths.

Medium

  • Release-build invariant protection is weak

    • primitives/trie/src/trie_stream.rs uses debug_assert! for a core invariant (data.len() > MAX_INLINE_THRESHOLD).
    • In release builds, this does not enforce the invariant.
    • Risk: invariant violations may not fail loudly in production.
  • Dependency reproducibility risk

    • Cargo.toml uses multiple branch-based git overrides for core dependencies.
    • Risk: force-pushable branches make builds less reproducible.

Low

  • Verbose debug logging in trie codec hot path
    • primitives/trie/src/node_codec.rs adds heavy debug! logging for decode/encode internals.
    • Risk: noisy logs and extra overhead in debug-enabled environments.

Removed concern

  • Backward compatibility / migration requirement
    • Removed per chain context (new chain).

Current merge stance

  • Not ready to merge until the compact proof omission path is confirmed/fixed.
  • Remaining items are medium/low and can be handled after the critical proof-path issue.


n13 commented Mar 23, 2026

PR #452 Review — No length prefix in trie

PR: #452
Local head: 11df5cf0 remove zero hash branches we don't use this

Summary

This PR removes length prefixes from branch children in the trie encoding,
forces all values to be hashed (never inlined), bypasses the node value cache
to fix a stale-cache bug, and switches account ID / wormhole address derivation
to use injective Poseidon hashing.

Context

  • New chain — backward compatibility is not required.
  • Compact proofs (generate_trie_proof / verify_trie_proof) are not used in
    production, so the proof/generate.rs omission encoding path is not a blocker.

Verdict: ready to merge

No critical issues remain after the zero-hash sentinel removal.
The items below are non-blocking but worth addressing before or shortly after merge.

Remaining findings

Medium — debug_assert! guards a core invariant in release builds

primitives/trie/src/trie_stream.rs line 203:

debug_assert!(
    data.len() > MAX_INLINE_THRESHOLD,
    "Node too small ({} bytes) - would be inlined but inline children not supported",
    data.len()
);

This compiles away in release. If the invariant is ever violated, the trie
silently corrupts instead of failing loudly. Consider upgrading to assert!.

Medium — branch-based git dependency overrides

Cargo.toml adds 14+ [patch] overrides pointing at mutable branch heads
(illuzen/injectivity, illuzen/new-rate, illuzen/non-injectivity, etc.).
Force-pushes to those branches would silently change what gets built.
Pin to tags or commit SHAs before merging to main.
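A sketch of the pinning change, assuming the usual Cargo git-dependency syntax (the rev value below is a placeholder, not a real commit):

```toml
# Before: mutable branch head, force-pushable, non-reproducible builds.
# qp-poseidon = { git = "...", branch = "illuzen/injectivity" }

# After: immutable commit SHA (placeholder shown); a tag works too.
qp-poseidon = { git = "...", rev = "0123456789abcdef0123456789abcdef01234567" }
```

A `tag = "..."` pin is also acceptable if the upstream repos cut releases, though tags can technically be moved while a full commit SHA cannot.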

Low — heavy debug logging in trie codec hot path

primitives/trie/src/node_codec.rs has ~10 log::debug! calls that format
raw hashes, bitmaps, and partial keys. These are useful during development but
add noise and minor overhead when debug logging is enabled.
Strip or gate behind a feature flag before release.

Low — inline-to-hash silent conversion in encoding

branch_node_nibbled silently hashes ChildReference::Inline children.
With MAX_INLINE_VALUE = 0 this path should never be reached in practice,
and the debug_assert! in append_substream partially guards it.
Same debug_assert! → assert! recommendation applies here.

Test coverage

  • Regression test for stale node-value cache (poisoned_node_value_cache_is_ignored_for_hashed_values) — targeted and good.
  • Round-trip encode/decode tests for nibble_count 30/31 — covers the exact failure scenario.
  • Realistic chain storage test with delta_trie_root block import simulation — good end-to-end coverage.
  • decode_hex helper is duplicated 3× across tests — minor, worth extracting.

@n13 n13 changed the base branch from illuzen/inline-trie-db to main March 24, 2026 03:14
illuzen and others added 3 commits March 24, 2026 17:59

n13 commented Mar 26, 2026

AI review - see human review below for a much shorter version


Here's my review of PR #452 "No length prefix in trie":

Overview

This PR does four things:

  1. Removes length prefixes from branch children in the trie encoding (children are now always fixed-size 32-byte hashes)
  2. Forces all values to be hashed (never inlined) by setting FELT_ALIGNED_MAX_INLINE_VALUE = 0
  3. Inlines a fork of trie-db into primitives/trie-db/ to fix the node value cache
  4. Switches account ID / wormhole address derivation to injective Poseidon hashing and a new poseidon API (hash_for_circuit, hash_to_bytes, digest_to_felts, bytes_to_felts, etc.)

It also changes PoW from double-hash to single-hash in qpow-math.


Findings

Critical: debug_assert! guarding core invariant (trie_stream.rs)

Note @n13 : I fixed this, replacing debug asserts with asserts and adding panic to impossible code paths

The old append_substream had a debug_assert! ensuring data exceeded MAX_INLINE_THRESHOLD. The new code removed the constant and always hashes, which is correct, but the broader concern remains: node_codec.rs still has debug_assert! calls at lines 261 and 317 that verify hash lengths. If these invariants are violated in release builds, the trie silently corrupts.

Recommendation: Upgrade the debug_assert! in node_codec.rs (lines 261, 317) to hard assert!:

				Some(ChildReference::Hash(h)) => {
					// Children are always 32-byte hashes (no length prefix)
					debug_assert!(h.as_ref().len() == H::LENGTH);

These are on the encoding hot path but they protect against silent trie corruption, which is worse than a panic.

Medium: Inline-to-hash silent conversion in encoding

In branch_node_nibbled, when a ChildReference::Inline is encountered, it's silently hashed:

				&Some(ChildReference::Inline(inline_data, len)) => {
					// Inline children: hash the inline data, then store the 32-byte hash
					// (no length prefix, children are always hashes now)
					let child_hash = H::hash(&inline_data.as_ref()[..len]);
					// ...
					output.extend_from_slice(child_hash.as_ref());
					true
				},

With MAX_INLINE_VALUE = 0 this path should never be reached in practice, but if it is, the decoder will see a NodeHandlePlan::Hash and try to look it up in the DB -- causing IncompleteDatabase because the inline data was never persisted. Consider adding a hard assert! or returning an error here rather than silently converting.

Medium: Heavy debug logging in trie codec hot path

node_codec.rs and trie_stream.rs accumulated ~15 log::debug! calls that format raw hashes, bitmaps, and partial keys. Even at debug level, format arguments are evaluated before the log-level check in many log implementations. This adds overhead on every trie encode/decode.

Recommendation: Strip these before merging to main, or gate them behind a cfg(debug_assertions) / feature flag. These look like development aids.

Medium: Branch-based git dependency overrides

The [patch] section pins 14+ crates to mutable branch heads (illuzen/new-rate, illuzen/injective-to-felts, illuzen/non-injectivity, illuzen/new-poseidon-api). These branches can be force-pushed, making builds non-reproducible.

Recommendation: Pin to tags or commit SHAs before merging to main.

Low: Forked trie-db divergence tracking

The entire primitives/trie-db/ directory (~8,800 lines) is a vendored fork of the upstream Parity trie-db crate. The only meaningful modification is in lookup.rs where the value cache bypass was implemented (the load_owned_value path always goes to DB for ValueOwned::Node). This is a large surface area to maintain.

Recommendation: Document what was changed and why, ideally with a comment at the top of lookup.rs or in a README. This makes future upgrades tractable. The cache bypass itself is correct -- the regression test poisoned_node_value_cache_is_ignored_for_hashed_values proves the stale-cache bug existed.

Low: PoW hashing changed from double to single hash

qpow-math changed from double Poseidon2 hashing (hash-then-hash) to single hash with double squeeze. This is a consensus-impacting change that's bundled in without being called out in the PR description. It's fine for a new chain but should be noted.

Low: decode_hex helper duplicated 3x in tests

The decode_hex function appears in three test functions in primitives/trie/src/lib.rs. Worth extracting to a test utility.


Test Coverage

Good coverage overall:

  • poisoned_node_value_cache_is_ignored_for_hashed_values -- targeted regression test for the cache bug
  • test_branch_node_nibble_count_30_31 -- covers the exact failure scenario with round-trip encode/decode
  • test_realistic_chain_storage -- end-to-end simulation with delta_trie_root block import
  • test_system_number_key_path -- reproduces the exact chain error path

Summary

| Finding | Severity | Action |
| --- | --- | --- |
| debug_assert! guards core invariants in release | Critical | Upgrade to assert! |
| Inline-to-hash silent conversion | Medium | Add hard assert or error |
| Heavy debug logging in hot path | Medium | Strip before merge |
| Branch-based git dependencies | Medium | Pin to tags/SHAs |
| Forked trie-db divergence undocumented | Low | Add documentation |
| PoW hash change not called out | Low | Note in PR description |
| decode_hex duplication in tests | Low | Extract helper |

The core approach is sound -- removing length prefixes simplifies the trie encoding, forcing all values to be hashed eliminates inline-vs-hashed ambiguity, and the cache bypass correctly fixes the stale value bug. The test coverage is solid. Main concern is hardening the debug_assert! guards before this goes to main.


n13 commented Mar 26, 2026

Human review: I fixed the assert calls; they now all fail early if things go wrong.

We need to fix the patch github dependencies but everything needs to be released for that.

The rest is minor stuff GTG ✅
