
Hybrid Merkle trees #530

Closed
Antonio95 wants to merge 19 commits into Plonky3:main from NP-Eng:antonio-cesar/hybrid

Conversation


@Antonio95 Antonio95 commented Nov 1, 2024

Joint work with @Cesar199999

Cf. issue #476

General explanation

This PR contains some important steps towards hybrid Merkle trees, by which we mean a Merkle tree where different hash functions are used at different levels. This technique can save substantial prover work while only slightly increasing (recursive) verifier work. In the simplest case, one can use a (natively) very fast, recursion-unfriendly hash function (such as Blake3) for the bottom level of the tree, while maintaining a (natively) slower, recursion-friendly hash (such as Poseidon2) for the rest of the levels. Because the bottom level accounts for half of all the prover's hashing, while only constituting a 1/height fraction of the verifier's (due to the structure of a Merkle path), the aforementioned tradeoff arises.
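As a back-of-the-envelope check on the halving claim (not part of the PR, just illustrative arithmetic): in a binary Merkle tree with 2^h leaves, the bottom level performs 2^(h-1) of the 2^h - 1 total compressions, i.e. about half, while a Merkle path contains h compressions, only one of which is at the bottom level.

```rust
// Illustrative arithmetic for the prover/verifier tradeoff: fraction of
// compressions that happen at the bottom level of a binary tree of height h.
fn prover_bottom_fraction(h: u32) -> f64 {
    let bottom = (1u64 << (h - 1)) as f64; // compressions at the bottom level
    let total = ((1u64 << h) - 1) as f64;  // all compressions in the tree
    bottom / total
}

// On a Merkle path of length h, exactly one compression is at the bottom.
fn verifier_bottom_fraction(h: u32) -> f64 {
    1.0 / h as f64
}

fn main() {
    let h = 20;
    println!("prover bottom fraction:   {:.6}", prover_bottom_fraction(h));   // ~0.5
    println!("verifier bottom fraction: {:.6}", verifier_bottom_fraction(h)); // 0.05
}
```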

The new code is meant to work as a toolbox allowing users (even ones importing the crate as a dependency) to mix hashes of their choice and try out different hybrid-hashing and node-conversion strategies (allowing them to implement the relevant traits themselves). This flexibility should enable one to tweak the tradeoff and/or apply it only when it is beneficial.

So far the design only allows hybrid hashing for the compression of tree nodes (the previous generic C: PseudoCompressionFunction), not for the digestion of matrix rows into leaves (the previous generic H: CryptographicHasher). Our reasoning is that the verifier needs to do the same amount of hashing as the prover in order to digest a row, and so the tradeoff explained above doesn't manifest as clearly in the case of the digestion function. However, implementing hybrid digestion down the line is indeed a possibility which would make the toolbox more general.

Code overview

(the code itself contains detailed information in the form of // and /// documentation)

In essence, a new structure HybridMerkleTree has been created whose constructor new is generic on C: HybridPseudoCompressionFunction<...>, as opposed to the C: PseudoCompressionFunction<...> present in the preexisting MerkleTree. The trait HybridPseudoCompressionFunction<T, const N: usize> closely mirrors PseudoCompressionFunction<T, const N: usize> but replaces the original method fn compress(&self, input: [T; N]) -> T with fn compress(&self, input: [T; N], sizes: &[usize], current_size: usize) -> T. The two extra arguments inform the hybrid compressor of which level of the Merkle tree it is being called at, as well as the heights of the matrices being committed to, so that it can decide which of its (possibly many) compression functions to use.

Note that the hybrid compressor only accepts one input type [T; N]. If its implementor uses several compressors requiring different input types under the hood (e.g. field elements and bytes), that variety is hidden away from the trait and left for the implementor to handle. This design has some tradeoffs we are happy to discuss, but one of its main advantages is that the HybridMerkleTree code is almost a carbon copy of the original MerkleTree's: it does not need to worry about differing node types because HybridPseudoCompressionFunction<T, N> only exposes one type to it. The only change introduced in HybridMerkleTree is passing the heights of the matrices to the compression function.
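To make the interface concrete, here is a minimal self-contained sketch of the trait shape described above, with a toy two-closure implementor. The closures stand in for real hashes (e.g. Blake3 and Poseidon2), and the bottom-level selection rule is our simplified reading of the description, not the actual PR code:

```rust
/// Sketch of the hybrid trait described above: `sizes` holds the heights of
/// the committed matrices and `current_size` identifies the level at which
/// the compressor is being called.
pub trait HybridPseudoCompressionFunction<T, const N: usize> {
    fn compress(&self, input: [T; N], sizes: &[usize], current_size: usize) -> T;
}

/// Toy implementor: `bottom` is used at the bottom level (identified here as
/// the tallest matrix height), `rest` everywhere else. Plain closures on u64
/// are used purely for illustration.
pub struct TwoLevel<F1, F2> {
    pub bottom: F1,
    pub rest: F2,
}

impl<T, const N: usize, F1, F2> HybridPseudoCompressionFunction<T, N> for TwoLevel<F1, F2>
where
    F1: Fn([T; N]) -> T,
    F2: Fn([T; N]) -> T,
{
    fn compress(&self, input: [T; N], sizes: &[usize], current_size: usize) -> T {
        let tallest = sizes.iter().copied().max().unwrap_or(0);
        if current_size == tallest {
            (self.bottom)(input) // stand-in for a fast, recursion-unfriendly hash
        } else {
            (self.rest)(input) // stand-in for a recursion-friendly hash
        }
    }
}

fn main() {
    let c = TwoLevel {
        bottom: |[a, b]: [u64; 2]| a ^ b,
        rest: |[a, b]: [u64; 2]| a.wrapping_add(b),
    };
    let sizes = [1 << 10, 1 << 8];
    assert_eq!(c.compress([3, 5], &sizes, 1 << 10), 6); // bottom level: xor
    assert_eq!(c.compress([3, 5], &sizes, 1 << 8), 8);  // other levels: add
    println!("ok");
}
```

The tree-construction code only ever sees the single node type `T`, which is what keeps HybridMerkleTree so close to the original MerkleTree.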

One implementor of HybridPseudoCompressionFunction is provided: SimpleHybridCompressor, which uses the aforementioned strategy of one compressor C1 at the bottom level and another one, C2, at all others. To be precise, if the highest and second-highest numbers of rows differ by a factor of 2 (when rounded up to the nearest power of 2), then C1 is used both to compress the bottom leaves and to inject the next-to-bottom leaves, which makes sense, since the number of calls to compress is the same in both operations.
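Our reading of that rounding rule, as a standalone sketch (the helper name is hypothetical, not from the PR):

```rust
/// Hypothetical helper illustrating the rule above: C1 also covers the
/// injection step when the two tallest matrix heights, rounded up to the
/// nearest power of two, differ by exactly a factor of 2.
fn c1_also_injects(heights: &[usize]) -> bool {
    let mut pows: Vec<usize> = heights.iter().map(|h| h.next_power_of_two()).collect();
    pows.sort_unstable();
    pows.reverse(); // tallest first
    match (pows.first(), pows.get(1)) {
        (Some(&h0), Some(&h1)) => h0 == 2 * h1,
        _ => false,
    }
}

fn main() {
    assert!(c1_also_injects(&[1000, 300]));  // rounds to 1024 vs 512: factor of 2
    assert!(!c1_also_injects(&[1000, 1000])); // equal heights
    assert!(!c1_also_injects(&[1000, 100]));  // rounds to 1024 vs 128
    println!("ok");
}
```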

The generics and inner workings of SimpleHybridCompressor are explained in detail in the code. Perhaps the only point deserving further attention is the generic NC: NodeConverter<..., ...> of SimpleHybridCompressor. It allows this hybrid compressor to convert back and forth between the input types of C1 and C2 in order to always receive and output the unique type present in the trait HybridPseudoCompressionFunction.

Two implementations of the node converter are provided, both for BabyBear <-> u8 conversion in the 256-bit-node case. One is NodeConverter256BabyBearBytes, which can handle any [PackedValue<Val=BabyBear>; 8] and [PackedValue<Val=u8>; 32] (with the same WIDTH); the other, UnsafeNodeConverter256BabyBearBytes, can only handle [BabyBear; 8] and [u8; 32] (with the individual types acting as implementors of PackedValue of WIDTH 1). The former implementation is forced to use PackedValue trait methods (which only provide references and therefore involve cloning) and requires transposition of the arrays of packed values, making it relatively slow. The latter relies on hard casts (aside from the modular reduction) and is hence both unsafe and lightning fast.
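To illustrate the conversion layer, here is a toy sketch: the trait shape is assumed from the description above, the converter uses plain u32 limbs rather than BabyBear, and it deliberately omits the modular-reduction concerns the real converters must handle:

```rust
/// Assumed shape of the converter interface: translate between the node
/// types of the two underlying compressors.
pub trait NodeConverter<N1, N2> {
    fn to_n2(n1: N1) -> N2;
    fn to_n1(n2: N2) -> N1;
}

/// Toy 256-bit converter between [u32; 8] and [u8; 32], in the spirit of the
/// converters described above, but using explicit little-endian
/// (de)serialisation instead of hard casts, so it stays safe.
pub struct ToyConverter256;

impl NodeConverter<[u32; 8], [u8; 32]> for ToyConverter256 {
    fn to_n2(n1: [u32; 8]) -> [u8; 32] {
        let mut out = [0u8; 32];
        for (i, limb) in n1.iter().enumerate() {
            out[4 * i..4 * i + 4].copy_from_slice(&limb.to_le_bytes());
        }
        out
    }

    fn to_n1(n2: [u8; 32]) -> [u32; 8] {
        core::array::from_fn(|i| u32::from_le_bytes(n2[4 * i..4 * i + 4].try_into().unwrap()))
    }
}

fn main() {
    let node = [1u32, 2, 3, 4, 5, 6, 7, 0x0A0B_0C0D];
    let bytes = ToyConverter256::to_n2(node);
    assert_eq!(ToyConverter256::to_n1(bytes), node); // lossless round trip
    println!("ok");
}
```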

Separately, we have added an IdentityHasher which simply pads the vector to be digested with defaults up to the desired output size (and panics if the vector is longer than that, since truncation would lead to a trivially second-preimage-weak digestion hasher). The relevance of this hasher will be made clear by the benchmark explanation below.
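A minimal sketch of that padding behaviour (the free function is hypothetical; the real IdentityHasher implements the CryptographicHasher trait):

```rust
/// Pads the input with default values up to the digest size; panics when the
/// input is longer than the digest, since truncating would make second
/// preimages trivial to find.
fn identity_digest<T: Default + Clone, const OUT: usize>(input: &[T]) -> [T; OUT] {
    assert!(
        input.len() <= OUT,
        "input longer than digest size: identity hashing would be lossy"
    );
    let mut out: [T; OUT] = core::array::from_fn(|_| T::default());
    out[..input.len()].clone_from_slice(input);
    out
}

fn main() {
    // Two elements padded into a 4-element "digest".
    assert_eq!(identity_digest::<u32, 4>(&[7, 9]), [7, 9, 0, 0]);
    println!("ok");
}
```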

What can be executed and some numbers

We have added three benchmarks to the merkle_tree crate. In them, matrices of BabyBear elements are generated randomly (both in terms of size and field elements). Then several compressor configurations are used to construct hybrid and plain Merkle trees, and the construction time is measured. In this description, "plain-4" means "the original MerkleTree with a Poseidon2 compressor throughout and packed BabyBear nodes of WIDTH 4"; "plain-1" means the same with WIDTH equal to 1; and "hybrid" means "HybridMerkleTree with Blake3 at the bottom and Poseidon2 elsewhere, with WIDTH equal to 1 in both cases".

The benchmarks were run on an Apple desktop computer with an M2 and 16GB of RAM. In one case, it became relevant to also run them on another system: a Linux laptop with an i7-1165G7 and 32GB of RAM. We refer to these machines as A and B, respectively. The benchmarks and results are as follows:

  1. hybrid_unsafe_vs_safe: This is simply meant to highlight the speed difference between the two node converters. The exact numbers aren't all that relevant, since the cost of node conversion is overshadowed by the rest of the tree construction, but e.g. a flamegraph shows that unsafe conversion has a negligible cost compared to the safe one. For that reason, only the UnsafeNodeConverter is used for the hybrid tree in all other benchmarks.

  2. hybrid_vs_plain: This benches plain-1, plain-4 and hybrid using Poseidon2 as the digest hasher in all cases. Leaf digestion takes substantially longer than compression, which highlights one case in which hybrid Merkle trees (in their current form at least) may not be so useful, that is: when matrices have many columns and digestion is the dominating cost. These are the numbers:

    |      | plain-1   | plain-4   | hybrid    |
    |------|-----------|-----------|-----------|
    | Time | 161.02 ms | 80.500 ms | 138.01 ms |

    Crucially, hybrid is not twice as fast as plain-1: even though compression itself should indeed take about half the time (because Blake3 is essentially free compared to Poseidon2), the digestion costs diminish that advantage.

  3. hybrid_vs_plain_identity_hasher: Here we repeat the same experiment as in bench 2, but we use the IdentityHasher in order to digest the leaves. Since this is essentially a free operation, digestion costs no longer muddle the compression gains brought about by the hybrid strategy. Note that this places a restriction on the matrices that form the leaves: at each level, the row length of all concatenated matrices cannot surpass 8 BabyBear elements (256 bits), since that would render identity hashing into 256 bits impossible. These are the numbers on machine A:

    |      | plain-1   | plain-4   | hybrid    |
    |------|-----------|-----------|-----------|
    | Time | 65.495 ms | 37.556 ms | 39.849 ms |

    Here the expected ~50% savings in compression time between plain-1 and hybrid shine through (of course, other smaller costs slightly alter the exact relation). Crucially, the hybrid strategy is not better than the WIDTH-4 plain one. This is likely because machine A's architecture can execute operations on PackedValues efficiently, which the hybrid strategy cannot take advantage of (cf. point 1 of "Possible further work" below).

    However, the numbers on machine B are more flattering:

    |      | plain-1   | plain-4   | hybrid    |
    |------|-----------|-----------|-----------|
    | Time | 84.697 ms | 86.283 ms | 50.023 ms |

    On this machine, whose architecture does not take advantage of PackedValues, the hybrid strategy is the clear winner.

Possible future work

The point of this draft PR is to receive some feedback on the current code, but also to gauge whether it makes sense to develop this further and, if so, in which direction. Here are some things we thought about but haven't started working on yet:

  1. It follows from the benchmarks that a very fast tree construction could be achieved if the hybrid tree could handle packed nodes of WIDTH 4. We don't have any numbers, but the obstacle to implementing this is purely Rust-based. In summary, if one wants the HybridPseudoCompressionFunction to be able to handle both individual leaves [W; DIGEST_ELEMENTS] as well as packed ones [PW; DIGEST_ELEMENTS], one needs to add generics to either the SimpleHybridCompressor structure or the trait itself. Both possibilities come with their own difficulties: in the case of the trait, one would be forced to 1) limit the trait to two compression functions (as opposed to as many as one wants, as now); and 2) fill the HybridMerkleTree code with several more generics, whereas so far it is as clean as the original MerkleTree's. There might be a simpler design/Rust solution we are missing, though.

  2. Carry over the hybrid construction to the MMCS, i.e. add methods to prove and verify paths using HybridPseudoCompressionFunctions. This should be relatively straightforward.

  3. Add more implementations and benches for other hashes present in the codebase, including the required node converters (in particular, switching the current node converters from BabyBear to arbitrary 31-bit Montgomery fields shouldn't be too hard).

  4. Hybrid digestion, as explained above. This would probably require significant generic/trait work.

N. B.: We have included a cautioning docstring about the dangers of implementing/using hybrid strategies and node conversion without thinking about the security of the resulting configuration (cf. the beginning of the file hybrid_merkle_tree.rs).


Antonio95 and others added 19 commits October 16, 2024 14:31
Co-authored-by: Cesar Descalzo <Cesar199999@users.noreply.github.com>
Co-authored-by: Antonio Mejías Gil <anmegi.95@gmail.com>
@mmagician mmagician closed this Jan 16, 2025
@mmagician mmagician reopened this Jan 16, 2025
@Nashtare Nashtare closed this Dec 1, 2025
