
Fix kernel panics from hitting hash nodes during reverts #1275

Closed
BGluth opened this issue Oct 3, 2023 · 1 comment · Fixed by #1278

Comments

BGluth (Member) commented Oct 3, 2023

Overview

It's possible to trigger a kernel panic caused by an EVM fault that reliably ends in an attempt to delete trie nodes that either don't exist or sit below hash nodes. The full path and conditions that cause this panic are a bit long, but at a high level this is what happens:

  • Executing a txn triggers a fault (the fault_exception label in the asm).
  • Triggering the fault causes a revert (any nodes that were inserted need to be deleted in order to perform the revert).
  • At some point during these deletions, we attempt to delete a key whose path hits a hash node in the trie, which triggers the kernel panic.

It's also important to note that the underlying issue that brought this to light is EIP-2929 not being implemented in Edge while it is in Plonky2 (credits to @Nashtare!). However, while there is a bug in the upstream EVM's data source (Edge), we can still determine that something is wrong within Plonky2, because:

  • Plonky2 should only delete nodes that it inserted, so the keys it deletes should always exist.
  • The one special edge case (no pun intended) where this could happen with Plonky2 being correct is not occurring here (see below).

Edge case where this isn't a Plonky2 bug

There is one scenario in which Plonky2 panics even though it has no bugs itself.

For this to be the case, two conditions must hold:

  • The upstream EVM (Edge) must be sending pre-image tries that contain hash nodes where they shouldn't be.
  • Trie node deletions must occur during normal txn execution.

Concretely, at some point during execution the trie needs to contain a subtree that looks like this:

  B
 / \
H   N

Where:

  • B --> Branch (16-ary, but empty children are not shown here).
  • H --> Hash node.
  • N --> Non-hash node.

Now, if the upstream EVM ends up hashing out nodes that it shouldn't, Plonky2 can hit a hash node while deleting during a revert. Specifically, if normal execution ends up deleting N, the branch collapses into an extension node:

E
|
H

What's important here is that H would not have been hashed out if the upstream EVM were correct. If we perform a revert at some point after this deletion occurs and attempt to restore the original branch structure, we end up traversing down into the hash node and triggering a kernel panic. Since this hash node should not exist in the first place, this isn't really a bug in Plonky2.
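
To make the failure mode concrete, here is a minimal sketch in Rust of how a deletion walk ends up erroring out (or, in the kernel's case, panicking) when it lands on a hash node. The types and the delete routine are hypothetical stand-ins for illustration, not the actual kernel code, which is written in asm:

```rust
// Hedged sketch, not the actual Plonky2 kernel code: a simplified
// hex-trie node type illustrating why a deletion walk can land on
// a hash node.
enum Node {
    Empty,
    // A "hashed out" subtrie: only its digest is known, so it cannot
    // be traversed any further.
    Hash([u8; 32]),
    Leaf { nibbles: Vec<u8>, value: Vec<u8> },
    Extension { nibbles: Vec<u8>, child: Box<Node> },
    Branch { children: [Box<Node>; 16] },
}

// Simplified deletion walk; the two Err cases below correspond to the
// kernel panics described in this issue. Extension handling and partial
// key matching are elided for brevity.
fn delete(node: Node, key: &[u8]) -> Result<Node, String> {
    match node {
        Node::Hash(_) => Err("tried to traverse below a hash node".into()),
        Node::Empty => Err("tried to delete a non-existent key".into()),
        Node::Leaf { ref nibbles, .. } if nibbles.as_slice() == key => Ok(Node::Empty),
        Node::Branch { mut children } => {
            let (first, rest) = key.split_first().ok_or("key exhausted at a branch")?;
            let idx = *first as usize;
            let child = std::mem::replace(&mut children[idx], Box::new(Node::Empty));
            children[idx] = Box::new(delete(*child, rest)?);
            // The real logic would also collapse a branch left with a single
            // remaining child into an extension node here (the B -> E case above).
            Ok(Node::Branch { children })
        }
        _ => Err("key not found".into()),
    }
}
```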

... But this isn't what is happening. In the attached EVM trace log, no deletions occur during normal txn execution, so this scenario should be impossible (unless there are other possibilities that I'm not thinking of). So it comes down to Plonky2 attempting to delete a node that doesn't exist.

Priority

While this is something that needs to be fixed, the test data I'm using doesn't normally trigger faults, so I can continue making progress without this fix. However, this is a kernel panic that shouldn't happen, so I still feel it needs to be dealt with fairly soon.

Why we maybe haven't seen this yet

I was talking to @dlubarov and he mentioned that we currently don't have any tests that exercise reverts with tries containing hash nodes. A nice starting point for debugging this might be to create one or more tests that perform reverts on tries containing hashed-out nodes.
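
As a rough illustration, such a test could look something like the sketch below, written against the toy Node/delete types from the earlier sketch rather than the real kernel test harness (whose API I won't guess at here). It builds the B/H/N trie from the diagram above and checks that a deletion descending below the hashed-out child is rejected:

```rust
// Sketch of the suggested test, using the hypothetical types from the
// earlier example rather than the actual kernel test harness.
#[test]
fn deleting_below_a_hash_node_is_rejected() {
    // Build B with a hash node H at nibble 0 and a leaf N at nibble 1,
    // mirroring the diagram in the issue description.
    let mut children: [Box<Node>; 16] = std::array::from_fn(|_| Box::new(Node::Empty));
    children[0x0] = Box::new(Node::Hash([0u8; 32]));
    children[0x1] = Box::new(Node::Leaf { nibbles: vec![], value: vec![0xff] });
    let trie = Node::Branch { children };

    // A revert that tries to delete a key under the hashed-out child must
    // surface an error instead of blindly traversing into the hash node.
    assert!(delete(trie, &[0x0, 0x7]).is_err());
}
```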

Trace files

The trace log files are huge (multiple millions of lines long), so just message me if you want access to them.

BGluth (Member, Author) commented Oct 3, 2023

Hey @wborgeaud, I realized after writing this that the branch --> extension node edge case might actually not be an issue. It might not actually hit the hash node and panic when re-inserting the nodes that were deleted, so this edge case might be irrelevant. You might have more insight into this, since I think you wrote a lot of this logic.

Regardless, even if this edge case ends up not mattering, I still think this is a bug in Plonky2.
