fix(state): Avoid panics and history tree consensus database concurrency bugs #7590

Merged: 3 commits, Sep 20, 2023

Conversation

teor2345
Collaborator

Scheduling

This PR should probably get in before the 1.3.0 release, because it fixes a panic bug.

Motivation

  1. We've been seeing "missing sprout tree" panics in CI and on local machines (but users haven't reported them yet)
  2. A similar concurrency bug exists for history trees, but it causes the state to return invalid, empty history tree data

Close #7581

Zebra Designs

These are the only column families that:

  1. Delete the old height and insert the new height
  2. Don't have code that handles the height changing between reading the height and reading the data

https://github.com/ZcashFoundation/zebra/blob/main/book/src/dev/state-db-upgrades.md#current-state-database-format

(The UTXOs get deleted, but missing UTXOs are handled correctly, and their keys are never updated.)
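The delete-then-insert pattern described above can be sketched with an in-memory map. This is a minimal illustration, not Zebra's code: `TreeByHeight`, `advance_tip`, and `racy_read` are hypothetical names, a `BTreeMap` stands in for the column family, and the writer is interleaved by hand so the outcome is deterministic.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a column family keyed by block height.
type TreeByHeight = BTreeMap<u32, &'static str>;

/// Writer step: delete the entry at the old height, then insert at the new height.
fn advance_tip(cf: &mut TreeByHeight, old: u32, new: u32, tree: &'static str) {
    cf.remove(&old);
    cf.insert(new, tree);
}

/// Simulates a reader whose two-step lookup is interrupted by a writer.
fn racy_read(cf: &mut TreeByHeight) -> Option<&'static str> {
    // Step 1: the reader learns the current tip height.
    let tip = *cf.keys().next_back()?;
    // A writer commits a new block between the reader's two steps.
    advance_tip(cf, tip, tip + 1, "tree at new tip");
    // Step 2: the reader looks up the tree at the now-stale height.
    cf.get(&tip).copied()
}

fn main() {
    let mut cf = TreeByHeight::new();
    cf.insert(100, "tree at old tip");
    // The entry the reader expected has been deleted, so the lookup fails.
    assert_eq!(racy_read(&mut cf), None);
}
```

A caller that treats the `None` as impossible would panic here, while a caller that substitutes a default would return empty data.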

Complex Code or Requirements

This fix to the tree reading code is compatible with the permanent fix in PR #7392, which changes the key format to an empty key so that the trees are overwritten rather than deleted and re-inserted.
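As a rough sketch of why an empty key helps, the two write patterns can be compared on an in-memory map. Everything here is an assumption for illustration: `ColumnFamily`, `write_height_keyed`, and `write_empty_keyed` are hypothetical names, and a `BTreeMap` replaces the real database.

```rust
use std::collections::BTreeMap;

/// Hypothetical column family with raw byte keys and values.
type ColumnFamily = BTreeMap<Vec<u8>, Vec<u8>>;

/// Height-keyed format: advancing the tip takes a delete and then an insert.
/// Returns the number of entries visible between the two steps.
fn write_height_keyed(cf: &mut ColumnFamily, old: u32, new: u32, tree: &[u8]) -> usize {
    cf.remove(&old.to_be_bytes()[..]);
    let visible_between_steps = cf.len();
    cf.insert(new.to_be_bytes().to_vec(), tree.to_vec());
    visible_between_steps
}

/// Empty-key format: every write replaces the single entry in one step.
/// Returns the previous tree, if any.
fn write_empty_keyed(cf: &mut ColumnFamily, tree: &[u8]) -> Option<Vec<u8>> {
    cf.insert(Vec::new(), tree.to_vec())
}

fn main() {
    let mut old_format = ColumnFamily::new();
    old_format.insert(100u32.to_be_bytes().to_vec(), b"tree 100".to_vec());
    // The delete-then-insert write exposes an empty column family mid-update.
    assert_eq!(write_height_keyed(&mut old_format, 100, 101, b"tree 101"), 0);

    let mut new_format = ColumnFamily::new();
    assert_eq!(write_empty_keyed(&mut new_format, b"tree 100"), None);
    // The second write overwrites the first, so there is no empty window.
    assert_eq!(
        write_empty_keyed(&mut new_format, b"tree 101"),
        Some(b"tree 100".to_vec())
    );
    assert_eq!(new_format.len(), 1);
}
```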

Solution

  • Always read the last sprout tree, regardless of the key value or format
  • Always read the last history tree, regardless of the key value or format
  • Add a RawBytes database serialization type
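The idea behind these bullets can be sketched as a single-step read of the last entry that never interprets the key. This is an assumption-laden illustration rather than the PR's implementation: `RawKey`, `ColumnFamily`, and `read_last_tree` are hypothetical names, and `RawKey` only gestures at the raw serialization type listed above.

```rust
use std::collections::BTreeMap;

/// Hypothetical wrapper that stores key bytes as-is, without parsing them.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct RawKey(Vec<u8>);

/// Hypothetical column family: raw keys mapped to serialized trees.
type ColumnFamily = BTreeMap<RawKey, Vec<u8>>;

/// Reads the value under the greatest key in one step, ignoring the key format.
fn read_last_tree(cf: &ColumnFamily) -> Option<&Vec<u8>> {
    cf.iter().next_back().map(|(_key, tree)| tree)
}

fn main() {
    // Height-keyed format: the key is the big-endian tip height.
    let mut by_height = ColumnFamily::new();
    by_height.insert(RawKey(101u32.to_be_bytes().to_vec()), b"tree".to_vec());

    // Empty-key format: the key carries no information.
    let mut by_empty_key = ColumnFamily::new();
    by_empty_key.insert(RawKey(Vec::new()), b"tree".to_vec());

    // The same read works for both formats.
    let expected = b"tree".to_vec();
    assert_eq!(read_last_tree(&by_height), Some(&expected));
    assert_eq!(read_last_tree(&by_empty_key), Some(&expected));
}
```

Because the reader never learns a height first, there is no stale key for a concurrent writer to invalidate.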

Review

This PR should probably get in before the 1.3.0 release, because it fixes a panic bug.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR say?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

Permanent format fix in PR #7392.

teor2345 added labels on Sep 20, 2023:

  • C-bug (Category: This is a bug)
  • A-consensus (Area: Consensus rule updates)
  • P-Medium ⚡
  • I-invalid-data (Zebra relies on invalid or untrusted data, or sends invalid data)
  • A-state (Area: State / database changes)
  • A-concurrency (Area: Async code, needs extra work to make it work properly.)
  • I-remote-trigger (Remote nodes can make Zebra do something bad)

teor2345 self-assigned this on Sep 20, 2023
teor2345 requested a review from a team as a code owner on September 20, 2023 04:56
teor2345 requested review from arya2 and removed the request for a team on September 20, 2023 04:56
Contributor

arya2 left a comment

Looks good!

mergify bot added a commit that referenced this pull request on Sep 20, 2023
mergify bot merged commit 2dce686 into main on Sep 20, 2023
120 checks passed
mergify bot deleted the sprout-panic branch on September 20, 2023 21:17
arya2 pushed a commit that referenced this pull request Sep 29, 2023
…ncy bugs (#7590)

* Add a RawBytes database serialization type

* Fix a history tree database concurrency bug

* Fix a sprout tree concurrency panic
upbqdn mentioned this pull request on Oct 13, 2023
38 tasks
2 participants