
fix: Flaky mutator set bug #106

Merged · 4 commits into master · Feb 13, 2024
Conversation

aszepieniec (Contributor)

The problem was located in Block::accumulate_transaction, which is invoked repeatedly by a miner as he selects which mempool transactions to include in his block. Recall that the block has only one transaction, which represents the merger of all transactions of the miner's choice.

The problem is that the block-in-preparation contains the updated mutator set accumulator. When a new transaction is aggregated, the mutator set accumulator is updated from this intermediate state, with the new transaction's addition and removal records. It is not obvious why this leads to a problem, so let's use an example:

  • The current transaction has inputs_1 and outputs_1 and the current block-in-preparation has mutator set accumulator msa_1, which is what results from applying outputs_1 and inputs_1 to the previous block's mutator set accumulator: msa_1 = msa_0.add(outputs_1).remove(inputs_1).
  • The new transaction has inputs_2 and outputs_2. The accumulate_transaction (up until now) computes the new mutator set accumulator as msa_1.add(outputs_2).remove(inputs_2).
  • As far as mutator set accumulators go, additions commute with removals. The subtle caveat to this otherwise true statement is that it requires the removal records to be in sync with the mutator set at the time they are applied. If they are not, computing the next mutator set accumulator can fail. Indeed, this is what happened.
  • The removal records of the incoming transaction are not in sync with the mutator set accumulator msa_1.
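The failure mode in the bullets above can be mimicked with a toy model. These types are illustrative stand-ins, not the real neptune-core mutator set: here a removal record simply remembers which removals had already been applied to the accumulator it was generated against, and applying it to an accumulator with a different removal set fails — mirroring "removal records out of sync".

```rust
// Toy stand-ins for the mutator set types, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct MsaToy {
    additions: Vec<u64>,
    removals: Vec<u64>,
}

#[derive(Clone)]
struct RemovalRecordToy {
    item: u64,
    // snapshot of the removals present when this record was generated
    synced_to: Vec<u64>,
}

impl MsaToy {
    fn new() -> Self {
        MsaToy { additions: vec![], removals: vec![] }
    }

    // Additions never invalidate removal records in this model
    // ("additions commute with removals").
    fn add(&self, output: u64) -> Self {
        let mut next = self.clone();
        next.additions.push(output);
        next
    }

    fn removal_record(&self, input: u64) -> RemovalRecordToy {
        RemovalRecordToy { item: input, synced_to: self.removals.clone() }
    }

    // Batch removal: every record must be in sync with `self`.
    fn remove_batch(&self, records: &[RemovalRecordToy]) -> Result<Self, String> {
        let mut next = self.clone();
        for rr in records {
            if rr.synced_to != self.removals {
                return Err(format!("record for {} is out of sync", rr.item));
            }
            next.removals.push(rr.item);
        }
        Ok(next)
    }
}

// The pre-fix order of operations: transaction 2's removal records are
// applied against the intermediate msa_1 instead of msa_0, and fail.
fn buggy_accumulate() -> Result<MsaToy, String> {
    let msa_0 = MsaToy::new();
    let rr_1 = msa_0.removal_record(11);
    let msa_1 = msa_0.add(101).remove_batch(&[rr_1])?; // fine: rr_1 is in sync
    let rr_2 = msa_0.removal_record(22); // synced to msa_0, not msa_1
    msa_1.add(102).remove_batch(&[rr_2]) // fails: out of sync
}
```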

So how do we get them in sync? We can't use the archival mutator set, because we might not be running an archival node. We can't look at the new mutator set accumulator, because we might need the chunk of the SWBF (sliding-window Bloom filter) that was slid away and is now hidden by Merkleization. We can't revert the removal of inputs_1, because we still need that slid chunk to restore the mutator set accumulator.

The solution in this PR is to modify the type signature of the function accumulate_transaction, which now takes an additional argument previous_mutator_set_accumulator. The new mutator set is computed by starting from this value and applying all addition records (outputs_1 || outputs_2) first and all removal records (inputs_1 || inputs_2) second. The halfway-in-between mutator set accumulator is ignored.
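The repaired order of operations can be sketched over the same kind of toy stand-in types (this is not the real `accumulate_transaction` signature, just its shape): start from the previous block's accumulator, apply all addition records, then apply all removal records in one batch.

```rust
// Toy stand-ins for the mutator set types, for illustration only.
#[derive(Clone)]
struct MsaToy {
    additions: Vec<u64>,
    removals: Vec<u64>,
}

#[derive(Clone)]
struct RemovalRecordToy {
    item: u64,
    synced_to: Vec<u64>, // removals present when this record was generated
}

impl MsaToy {
    fn new() -> Self {
        MsaToy { additions: vec![], removals: vec![] }
    }
    fn add(&self, output: u64) -> Self {
        let mut next = self.clone();
        next.additions.push(output);
        next
    }
    fn removal_record(&self, input: u64) -> RemovalRecordToy {
        RemovalRecordToy { item: input, synced_to: self.removals.clone() }
    }
    fn remove_batch(&self, records: &[RemovalRecordToy]) -> Result<Self, String> {
        let mut next = self.clone();
        for rr in records {
            if rr.synced_to != self.removals {
                return Err(format!("record for {} is out of sync", rr.item));
            }
            next.removals.push(rr.item);
        }
        Ok(next)
    }
}

// The fixed shape: take the PREVIOUS block's accumulator, apply all
// addition records (outputs_1 || outputs_2) first, then all removal
// records (inputs_1 || inputs_2) second. The halfway-in-between
// accumulator from the old code never needs to exist.
fn accumulate_all(
    previous_msa: &MsaToy,
    outputs: &[u64],
    removals: &[RemovalRecordToy],
) -> Result<MsaToy, String> {
    let mut msa = previous_msa.clone();
    for &o in outputs {
        msa = msa.add(o);
    }
    msa.remove_batch(removals)
}
```

Both transactions' removal records are synced to the previous accumulator, so applying them as one batch after all additions succeeds where the old interleaved order failed.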

At some point we should consider a prettier solution -- one that does not generate objects that are destined to be ignored. However, that sounds to me like an optimization with low priority and low payoff. It affects only miners (who need far more memory anyway), and it is not clear whether a prettier implementation could run any faster.

This PR also:

  1. Drops the generic type argument H from all structs and functions associated with the mutator set. Throughout this codebase we only ever use Tip5, which is itself already hidden behind the type alias Hash. The genericity dates back to the time when we needed an alternative to the relatively slow Rescue-Prime hash function.
  2. Removes return values of type Result<(),_> from functions that have no control path leading to an error. Accordingly, the catch-and-report-error clauses at those functions' call sites are removed as well.
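Item 1 can be illustrated with a small self-contained sketch. The `Tip5` and `AlgebraicHasher` here are stand-ins, not the real types from the codebase's dependencies:

```rust
// Stand-in trait and hasher for illustration; the real ones differ.
trait AlgebraicHasher {
    fn hash(data: &[u8]) -> u64;
}

struct Tip5;

impl AlgebraicHasher for Tip5 {
    fn hash(data: &[u8]) -> u64 {
        // toy stand-in, not a real hash function
        data.iter()
            .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
    }
}

// Before: every mutator-set struct dragged the generic parameter along.
struct MsaGenericToy<H: AlgebraicHasher> {
    root: u64,
    _hasher: std::marker::PhantomData<H>,
}

// After: the hasher is fixed once, behind a type alias.
type Hash = Tip5;

struct MsaToy {
    root: u64,
}

impl MsaToy {
    fn commit(&mut self, item: &[u8]) {
        self.root ^= Hash::hash(item);
    }
}
```

The alias keeps a single point of change if the hasher is ever swapped again, without threading `H` through every signature.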

Add a new method `new_pseudorandom` on `WalletSecret` that takes a
seed and produces the wallet secret, as opposed to sampling one from
`thread_rng`. This is useful for tests and conceivably in production
too.
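The pattern that commit describes — deriving the secret deterministically from a caller-supplied seed instead of sampling from `thread_rng` — can be sketched as follows. `WalletSecretToy` and the xorshift expander are illustrative stand-ins, not the real neptune-core types or PRNG:

```rust
// Illustrative stand-in for a seed-derived wallet secret.
struct WalletSecretToy {
    secret_seed: [u8; 32],
}

impl WalletSecretToy {
    // Deterministic: the same seed always yields the same secret,
    // which is what makes flaky tests reproducible.
    fn new_pseudorandom(seed: u64) -> Self {
        let mut secret_seed = [0u8; 32];
        let mut state = seed;
        for byte in secret_seed.iter_mut() {
            // xorshift-style expander as a stand-in for a real PRNG/KDF
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            *byte = (state & 0xff) as u8;
        }
        WalletSecretToy { secret_seed }
    }
}
```

A test can then hardcode the seed, so every "random" wallet it constructs is identical across runs.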
Adds a new test in `mutator_set_kernel` that is identical to the flaky
one discussed in #100, except everything is derandomized. That is to
say, every source of randomness is actually pseudorandom and the seed
is hardcoded. This test fails reliably.
This commit fixes `accumulate_transaction` so that the flaky mutator
set test does not fail anymore.

Closes #100.
The dependence on the generic type argument `H` of trait
`AlgebraicHasher` comes from having only the relatively slow
Rescue-Prime and wanting to use the much faster Blake3 for tests.
However, Tip5 is on par with Blake3 and used everywhere else in this
codebase.
@Sword-Smith (Member)

Very nice find. Well done! This bug highlights an underlying problem: we have tested the underlying MMR and mutator set implementations very thoroughly, but not how we interact with those data structures. A few solid property-based tests interacting with a mutator set would probably have revealed this bug.

@aszepieniec aszepieniec merged commit fcb306f into master Feb 13, 2024
3 checks passed
@aszepieniec aszepieniec deleted the aszepieniec/flaky-mutator-set-bug branch February 14, 2024 14:24