
fix: Flaky mutator set bug #106

Merged · 4 commits into master · Feb 13, 2024
Conversation

aszepieniec (Contributor)

The problem was located in Block::accumulate_transaction, which is invoked repeatedly by a miner as he selects which mempool transactions to include in his block. Recall that the block has only one transaction, which represents the merger of all transactions of the miner's choice.

The problem is that the block-in-preparation contains the updated mutator set accumulator. When a new transaction is aggregated, the mutator set accumulator is updated from this intermediate state, with the new transaction's addition and removal records. It is not obvious why this leads to a problem, so let's use an example:

  • The current transaction has inputs_1 and outputs_1 and the current block-in-preparation has mutator set accumulator msa_1, which is what results from applying outputs_1 and inputs_1 to the previous block's mutator set accumulator: msa_1 = msa_0.add(outputs_1).remove(inputs_1).
  • The new transaction has inputs_2 and outputs_2. The accumulate_transaction (up until now) computes the new mutator set accumulator as msa_1.add(outputs_2).remove(inputs_2).
  • As far as mutator set accumulators go, additions commute with removals. The subtle caveat to this otherwise true statement is that it requires the removal records to be in sync with the mutator set at the time they are applied. If they are not, computing the next mutator set accumulator can fail. Indeed, this is what happened.
  • The removal records of the incoming transaction are not in sync with the mutator set accumulator msa_1.
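The failure mode in the bullets above can be mimicked with a toy model. These types are illustrative stand-ins, not the real neptune-core mutator set: here a removal record simply remembers which removals had already been applied to the accumulator it was generated against, and applying it to an accumulator with a different removal set fails — mirroring "removal records out of sync".

```rust
// Toy stand-ins for the mutator set types, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct MsaToy {
    additions: Vec<u64>,
    removals: Vec<u64>,
}

#[derive(Clone)]
struct RemovalRecordToy {
    item: u64,
    // snapshot of the removals present when this record was generated
    synced_to: Vec<u64>,
}

impl MsaToy {
    fn new() -> Self {
        MsaToy { additions: vec![], removals: vec![] }
    }

    // Additions never invalidate removal records in this model
    // ("additions commute with removals").
    fn add(&self, output: u64) -> Self {
        let mut next = self.clone();
        next.additions.push(output);
        next
    }

    fn removal_record(&self, input: u64) -> RemovalRecordToy {
        RemovalRecordToy { item: input, synced_to: self.removals.clone() }
    }

    // Batch removal: every record must be in sync with `self`.
    fn remove_batch(&self, records: &[RemovalRecordToy]) -> Result<Self, String> {
        let mut next = self.clone();
        for rr in records {
            if rr.synced_to != self.removals {
                return Err(format!("record for {} is out of sync", rr.item));
            }
            next.removals.push(rr.item);
        }
        Ok(next)
    }
}

// The pre-fix order of operations: transaction 2's removal records are
// applied against the intermediate msa_1 instead of msa_0, and fail.
fn buggy_accumulate() -> Result<MsaToy, String> {
    let msa_0 = MsaToy::new();
    let rr_1 = msa_0.removal_record(11);
    let msa_1 = msa_0.add(101).remove_batch(&[rr_1])?; // fine: rr_1 is in sync
    let rr_2 = msa_0.removal_record(22); // synced to msa_0, not msa_1
    msa_1.add(102).remove_batch(&[rr_2]) // fails: out of sync
}
```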

So how do we get them in sync? We can't use the archival mutator set, because we might not be running an archival node. We can't look at the new mutator set accumulator, because we might need the chunk of the SWBF (sliding-window Bloom filter) that was slid away and is now hidden by Merkleization. We can't revert the removal of inputs_1, because we still need that slid chunk to restore the mutator set accumulator.

The solution in this PR is to modify the type signature of the function accumulate_transaction, which now takes an additional argument previous_mutator_set_accumulator. The new mutator set is computed by starting from this value and applying all addition records (outputs_1 || outputs_2) first and all removal records (inputs_1 || inputs_2) second. The halfway-in-between mutator set accumulator is ignored.
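The repaired order of operations can be sketched over the same kind of toy stand-in types (this is not the real `accumulate_transaction` signature, just its shape): start from the previous block's accumulator, apply all addition records, then apply all removal records in one batch.

```rust
// Toy stand-ins for the mutator set types, for illustration only.
#[derive(Clone)]
struct MsaToy {
    additions: Vec<u64>,
    removals: Vec<u64>,
}

#[derive(Clone)]
struct RemovalRecordToy {
    item: u64,
    synced_to: Vec<u64>, // removals present when this record was generated
}

impl MsaToy {
    fn new() -> Self {
        MsaToy { additions: vec![], removals: vec![] }
    }
    fn add(&self, output: u64) -> Self {
        let mut next = self.clone();
        next.additions.push(output);
        next
    }
    fn removal_record(&self, input: u64) -> RemovalRecordToy {
        RemovalRecordToy { item: input, synced_to: self.removals.clone() }
    }
    fn remove_batch(&self, records: &[RemovalRecordToy]) -> Result<Self, String> {
        let mut next = self.clone();
        for rr in records {
            if rr.synced_to != self.removals {
                return Err(format!("record for {} is out of sync", rr.item));
            }
            next.removals.push(rr.item);
        }
        Ok(next)
    }
}

// The fixed shape: take the PREVIOUS block's accumulator, apply all
// addition records (outputs_1 || outputs_2) first, then all removal
// records (inputs_1 || inputs_2) second. The halfway-in-between
// accumulator from the old code never needs to exist.
fn accumulate_all(
    previous_msa: &MsaToy,
    outputs: &[u64],
    removals: &[RemovalRecordToy],
) -> Result<MsaToy, String> {
    let mut msa = previous_msa.clone();
    for &o in outputs {
        msa = msa.add(o);
    }
    msa.remove_batch(removals)
}
```

Both transactions' removal records are synced to the previous accumulator, so applying them as one batch after all additions succeeds where the old interleaved order failed.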

At some point we should consider a prettier solution -- one that does not generate objects that are destined to be ignored. However, that sounds to me like an optimization with low priority and low payoff. It affects only miners (who need far more memory anyway), and it is not clear whether a prettier implementation could run any faster.

This PR also:

  1. Drops the generic type argument H from all structs and functions associated with the mutator set. Throughout this codebase we only ever use Tip5, which is itself already hidden behind the type alias Hash. The genericity dates back to the time when we needed an alternative to the relatively slow Rescue-Prime hash function.
  2. Removes return values of type Result<(),_> from functions that have no control path leading to an error. Accordingly, the catch-and-report-error clauses at those functions' call sites are removed as well.
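Item 1 can be illustrated with a small self-contained sketch. The `Tip5` and `AlgebraicHasher` here are stand-ins, not the real types from the codebase's dependencies:

```rust
// Stand-in trait and hasher for illustration; the real ones differ.
trait AlgebraicHasher {
    fn hash(data: &[u8]) -> u64;
}

struct Tip5;

impl AlgebraicHasher for Tip5 {
    fn hash(data: &[u8]) -> u64 {
        // toy stand-in, not a real hash function
        data.iter()
            .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
    }
}

// Before: every mutator-set struct dragged the generic parameter along.
struct MsaGenericToy<H: AlgebraicHasher> {
    root: u64,
    _hasher: std::marker::PhantomData<H>,
}

// After: the hasher is fixed once, behind a type alias.
type Hash = Tip5;

struct MsaToy {
    root: u64,
}

impl MsaToy {
    fn commit(&mut self, item: &[u8]) {
        self.root ^= Hash::hash(item);
    }
}
```

The alias keeps a single point of change if the hasher is ever swapped again, without threading `H` through every signature.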

Add a new method `new_pseudorandom` on `WalletSecret` that takes a
seed and produces the wallet secret, as opposed to sampling one from
`thread_rng`. This is useful for tests and conceivably in production
too.
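The pattern that commit describes — deriving the secret deterministically from a caller-supplied seed instead of sampling from `thread_rng` — can be sketched as follows. `WalletSecretToy` and the xorshift expander are illustrative stand-ins, not the real neptune-core types or PRNG:

```rust
// Illustrative stand-in for a seed-derived wallet secret.
struct WalletSecretToy {
    secret_seed: [u8; 32],
}

impl WalletSecretToy {
    // Deterministic: the same seed always yields the same secret,
    // which is what makes flaky tests reproducible.
    fn new_pseudorandom(seed: u64) -> Self {
        let mut secret_seed = [0u8; 32];
        let mut state = seed;
        for byte in secret_seed.iter_mut() {
            // xorshift-style expander as a stand-in for a real PRNG/KDF
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            *byte = (state & 0xff) as u8;
        }
        WalletSecretToy { secret_seed }
    }
}
```

A test can then hardcode the seed, so every "random" wallet it constructs is identical across runs.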
Adds a new test in `mutator_set_kernel` that is identical to the flaky
one discussed in #100, except everything is derandomized. That is to
say, every source of randomness is actually pseudorandom and the seed
is hardcoded. This test fails reliably.
This commit fixes `accumulate_transaction` so that the flaky mutator
set test does not fail anymore.

Closes #100.
The dependence on the generic type argument `H` of trait
`AlgebraicHasher` comes from having only the relatively slow
Rescue-Prime and wanting to use the much faster Blake3 for tests.
However, Tip5 is on par with Blake3 and used everywhere else in this
codebase.
@Sword-Smith (Member)

Very nice find. Well done! This bug highlights an underlying problem: we have tested the underlying MMR and mutator set implementations very thoroughly, but not how we interact with those data structures. A few solid property-based tests interacting with a mutator set would probably have revealed this bug.

@aszepieniec aszepieniec merged commit fcb306f into master Feb 13, 2024
3 checks passed
@aszepieniec aszepieniec deleted the aszepieniec/flaky-mutator-set-bug branch February 14, 2024 14:24