Fix MerkleDB crash recovery #2913
@@ -1166,12 +1167,11 @@ func (db *merkleDB) invalidateChildrenExcept(exception *view) {
// Otherwise leave [db.root] as Nothing.
func (db *merkleDB) initializeRoot() error {
The changes in this function aren't required, but I think they improve the code quality.
		if err := trieDB.initializeRoot(); err != nil {
			return nil, err
		}
	}

	// add current root to history (has no changes)
	trieDB.history.record(&changeSummary{
Unrelated to this PR - but should we clear the history here? If we're recovering from a crash - we end up keeping the history of the rebuild.
> Unrelated to this PR - but should we clear the history here?
That seems reasonable to me. I don't think it makes sense to have the history from the rebuild available.
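To make the reviewers' suggestion concrete, here is a minimal Go sketch of clearing the history once a crash-recovery rebuild finishes, before the current root is recorded. changeSummary and trieHistory are simplified stand-ins, and reset and finishOpen are hypothetical helpers invented for this illustration. The real history type is only visible above through history.record and changeSummary. Following the comment, the sketch assumes the rebuild's own commits end up in the history.

```go
package main

import "fmt"

// changeSummary and trieHistory are simplified, hypothetical stand-ins for
// merkleDB's history types; only the fields this sketch needs are modeled.
type changeSummary struct {
	rootID  string
	changes map[string]string // nil means "no changes"
}

type trieHistory struct {
	maxLen  int
	entries []*changeSummary
}

// record appends a summary and evicts the oldest one once maxLen is exceeded.
func (h *trieHistory) record(c *changeSummary) {
	h.entries = append(h.entries, c)
	if len(h.entries) > h.maxLen {
		h.entries = h.entries[1:]
	}
}

// reset is a hypothetical helper that drops every recorded summary.
func (h *trieHistory) reset() {
	h.entries = nil
}

// finishOpen (hypothetical) forgets the rebuild's commits when the DB was
// repaired after a crash, then records the current root, which has no changes.
func finishOpen(h *trieHistory, rootID string, rebuilt bool) {
	if rebuilt {
		h.reset()
	}
	h.record(&changeSummary{rootID: rootID})
}

func main() {
	h := &trieHistory{maxLen: 8}
	// Assumption: the rebuild committed in chunks and each commit was recorded.
	h.record(&changeSummary{rootID: "partial-1", changes: map[string]string{"a": "1"}})
	h.record(&changeSummary{rootID: "partial-2", changes: map[string]string{"b": "2"}})

	finishOpen(h, "recovered", true)
	fmt.Println(len(h.entries), h.entries[0].rootID) // 1 recovered
}
```

On a clean open the existing history is left alone. Only the rebuilt path starts over from the recovered root.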
@@ -1166,12 +1167,11 @@ func (db *merkleDB) invalidateChildrenExcept(exception *view) {
// Otherwise leave [db.root] as Nothing.
func (db *merkleDB) initializeRoot() error {
The bug here was that this function assumes the DB is in a consistent state, but we called it before repairing the DB.
	var (
		batch = db.baseDB.NewBatch()
		err   error
	)
	// Write the root key
	if db.root.IsNothing() {
		err = batch.Delete(rootDBKey)
	} else {
		rootKey := encodeKey(db.root.Value().key)
		err = batch.Put(rootDBKey, rootKey)
	}
	if err != nil {
		return err
	}
We must write the root key here to avoid the following case:
- An empty database is corrupted.
- Repairing the empty database doesn't update the root key.
- Closing the now (partially) repaired database updates the clean shutdown flag but leaves the root key invalid.
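The net effect is that the clean shutdown flag is only ever set in the same write as a freshly computed root key. Below is a minimal Go sketch of that idea. It assumes a toy in-memory store and a hypothetical batch type loosely modeled on the baseDB.NewBatch() call above, and it treats the batch write as atomic. It replays the empty-database case from the list: the repair leaves a stale root key on disk, and close deletes that key in the same batch that sets the flag.

```go
package main

import "fmt"

// Hypothetical keys for this toy store; not merkleDB's real on-disk layout.
const (
	rootDBKey        = "root"
	cleanShutdownKey = "clean"
)

// batch buffers writes so they are applied together, loosely mirroring
// baseDB.NewBatch(); this toy assumes Write is atomic.
type batch struct {
	puts    map[string]string
	deletes []string
}

func newBatch() *batch { return &batch{puts: map[string]string{}} }

func (b *batch) Put(k, v string) { b.puts[k] = v }
func (b *batch) Delete(k string) { b.deletes = append(b.deletes, k) }

// Write applies the buffered deletes, then the buffered puts.
func (b *batch) Write(store map[string]string) {
	for _, k := range b.deletes {
		delete(store, k)
	}
	for k, v := range b.puts {
		store[k] = v
	}
}

// toyDB stands in for merkleDB; a nil root means the trie is empty (Nothing).
type toyDB struct {
	base map[string]string
	root *string
}

// close always rewrites the root key and sets the clean shutdown flag in the
// same batch, so "marked clean" implies "root key on disk is valid".
func (db *toyDB) close() {
	b := newBatch()
	if db.root == nil {
		b.Delete(rootDBKey)
	} else {
		b.Put(rootDBKey, *db.root)
	}
	b.Put(cleanShutdownKey, "")
	b.Write(db.base)
}

func main() {
	// An empty DB that crashed and was repaired: the repair computed an empty
	// root but left the stale root key on disk.
	store := map[string]string{rootDBKey: "stale"}
	db := &toyDB{base: store}

	db.close()
	_, hasRoot := store[rootDBKey]
	_, clean := store[cleanShutdownKey]
	fmt.Println(hasRoot, clean) // false true
}
```

If the process dies before the batch is written, the flag stays unset, and the next open repairs the DB again. As a result, a DB marked as cleanly shut down never carries an invalid root key.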

	if db.root.IsNothing() {
		return db.baseDB.Delete(rootDBKey)
	}

	rootKey := encodeKey(db.root.Value().key)
	return db.baseDB.Put(rootDBKey, rootKey)
Because we write the root key during shutdown, this write is no longer needed.
	value, err := newMerkleDB.Get([]byte("is this"))
	require.NoError(err)
	require.Equal([]byte("hope"), value)
butterfly meme: "is this hope?"
We should probably add an invariant in a comment that documents this:
Why this should be merged
Fixes a bug that prevented the merkledb from recovering from a process crash.
How this works
- initializeRoot on a corrupt DB does not error gracefully.
- initializeRoot is never called on a corrupt DB.
- rootKey on disk is correct if the DB is marked as valid.
How this was tested