Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MerkleDB crash recovery #2913

Merged
merged 8 commits into from Apr 4, 2024
Merged

Conversation

StephenButtolph
Copy link
Contributor

@StephenButtolph StephenButtolph commented Apr 4, 2024

Why this should be merged

Fixes a bug that prevented the merkledb from recovering from a process crash.

How this works

  • Calling initializeRoot on a corrupt DB does not error gracefully.
  • This PR ensures that initializeRoot is never called on a corrupt DB.
  • This PR guarantees that the rootKey on disk is correct if the DB is marked as valid.

How this was tested

  • Added unit test.

@StephenButtolph StephenButtolph added bug Something isn't working merkledb labels Apr 4, 2024
@StephenButtolph StephenButtolph self-assigned this Apr 4, 2024
@StephenButtolph StephenButtolph added this to the v1.11.4 milestone Apr 4, 2024
@@ -1166,12 +1167,11 @@ func (db *merkleDB) invalidateChildrenExcept(exception *view) {
// Otherwise leave [db.root] as Nothing.
func (db *merkleDB) initializeRoot() error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this function aren't required... But I think they improve the code quality.

if err := trieDB.initializeRoot(); err != nil {
return nil, err
}
}

// add current root to history (has no changes)
trieDB.history.record(&changeSummary{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR - but should we clear the history here? If we're recovering from a crash - we end up keeping the history of the rebuild.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR - but should we clear the history here?

That seems reasonable to me. I don't think it makes sense to have the history from the rebuild available.

@@ -1166,12 +1167,11 @@ func (db *merkleDB) invalidateChildrenExcept(exception *view) {
// Otherwise leave [db.root] as Nothing.
func (db *merkleDB) initializeRoot() error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug here was that this function assumes that the DB is in a consistent state but we called it prior to repairing the DB.

@StephenButtolph StephenButtolph marked this pull request as ready for review April 4, 2024 00:50
Comment on lines +464 to +477
var (
batch = db.baseDB.NewBatch()
err error
)
// Write the root key
if db.root.IsNothing() {
err = batch.Delete(rootDBKey)
} else {
rootKey := encodeKey(db.root.Value().key)
err = batch.Put(rootDBKey, rootKey)
}
if err != nil {
return err
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must write the root key here to avoid the case where:

  1. An empty database is corrupted.
  2. Repairing an empty database doesn't update the root key
  3. Closing the now (partially) repaired database updates the clean shutdown flag (but leaves the root key as invalid).

Comment on lines -981 to -987

if db.root.IsNothing() {
return db.baseDB.Delete(rootDBKey)
}

rootKey := encodeKey(db.root.Value().key)
return db.baseDB.Put(rootDBKey, rootKey)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we write the root key during shutdown, this write is no longer needed.

if err := trieDB.initializeRoot(); err != nil {
return nil, err
}
}

// add current root to history (has no changes)
trieDB.history.record(&changeSummary{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR - but should we clear the history here?

That seems reasonable to me. I don't think it makes sense to have the history from the rebuild available.

Comment on lines +1321 to +1323
value, err := newMerkleDB.Get([]byte("is this"))
require.NoError(err)
require.Equal([]byte("hope"), value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

butterfly meme: "is this hope?"

@danlaine
Copy link
Collaborator

danlaine commented Apr 4, 2024

We should probably add an invariant in a comment that documents this:

// Invariant: If [cleanShutdownKey] has [hadCleanShutdown] the root on disk is correct.

@StephenButtolph StephenButtolph added this pull request to the merge queue Apr 4, 2024
Merged via the queue into master with commit cca7419 Apr 4, 2024
17 checks passed
@StephenButtolph StephenButtolph deleted the merkledb-crash-recovery branch April 4, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working merkledb
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants