multi: Flush block DB before UTXO DB. #2649
This adds a Flush method to the DB interface so that users of the DB type can flush the underlying database to disk as needed. The immediate use case for this is to allow for flushing dependencies. For example, the block database always needs to be flushed to disk prior to the UTXO database being flushed to disk to ensure that the UTXO database remains in a recoverable state in the event of an unclean shutdown.
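As a rough illustration of what an explicit flush on the interface makes possible, here is a minimal sketch. Only the `Flush` method name comes from the description above; the trimmed `DB` interface, the `Put` method, and the `memDB` type are hypothetical stand-ins and not the actual `database.DB` API.

```go
package main

import "fmt"

// DB is a hypothetical, heavily trimmed stand-in for the real interface.
// Only Flush is taken from the description; Put exists so there is
// something to flush.
type DB interface {
	Put(key, value string)
	Flush() error
}

// memDB buffers writes in memory and only moves them to the simulated
// disk when Flush is called.
type memDB struct {
	pending map[string]string
	disk    map[string]string
}

func newMemDB() *memDB {
	return &memDB{pending: map[string]string{}, disk: map[string]string{}}
}

// Put records a write that has not reached the simulated disk yet.
func (db *memDB) Put(key, value string) { db.pending[key] = value }

// Flush moves every pending write to the simulated disk.
func (db *memDB) Flush() error {
	for k, v := range db.pending {
		db.disk[k] = v
		delete(db.pending, k)
	}
	return nil
}

func main() {
	var db DB = newMemDB()
	db.Put("tip", "block-100")

	// The caller decides when the data must be durable.
	if err := db.Flush(); err != nil {
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println("flushed")
}
```

Exposing the flush to callers is what allows one database to be flushed on demand before another, rather than relying on each database's own internal schedule.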
This adds a flushBlockDB function to the UTXO cache. The flushBlockDB function is used to flush the block database to disk prior to flushing the UTXO cache to the UTXO database. This ensures that the block database is always at least as far along as the UTXO database which keeps the UTXO database in a recoverable state in the event of an unclean shutdown.
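The ordering described above can be sketched roughly as follows. The `flushBlockDB` name is from the description; the `utxoCache` struct layout, the `writeUtxoDB` field, and `errDiskFull` are assumptions made for this example and do not reflect the real cache implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errDiskFull is a made-up failure used to show the error path.
var errDiskFull = errors.New("disk full")

// utxoCache is a hypothetical stand-in for the real UTXO cache.
type utxoCache struct {
	// flushBlockDB flushes the block database to disk.
	flushBlockDB func() error
	// writeUtxoDB writes the cached entries to the UTXO database
	// (hypothetical name).
	writeUtxoDB func() error
}

// Flush flushes the block database first and only then writes to the
// UTXO database, so the block database is never behind the UTXO database.
func (c *utxoCache) Flush() error {
	if err := c.flushBlockDB(); err != nil {
		// Leave the UTXO database untouched if the block database
		// could not be made durable.
		return fmt.Errorf("flush block database: %w", err)
	}
	return c.writeUtxoDB()
}

func main() {
	var order []string
	cache := &utxoCache{
		flushBlockDB: func() error { order = append(order, "block"); return nil },
		writeUtxoDB:  func() error { order = append(order, "utxo"); return nil },
	}
	fmt.Println(cache.Flush(), order)

	// When the block database flush fails, the UTXO write is skipped.
	order = nil
	cache.flushBlockDB = func() error { return errDiskFull }
	fmt.Println(cache.Flush(), order)
}
```

Passing the block database flush in as a function keeps the cache decoupled from the block database type in this sketch; whether the real code is wired the same way is not stated here.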
This forces the UTXO database to flush to disk after initializing the UTXO set state for the first time. This is necessary so that if the block database is flushed, and then an unclean shutdown occurs, the UTXO cache will know where to start from when recovering on startup.
This modifies the separateUtxoDatabase upgrade to force the UTXO database to flush to disk prior to removing the UTXO set and state from the block database. This prevents a scenario where the UTXO set is removed from the block database and the block database is flushed to disk, but an unclean shutdown occurs before the UTXO database flushes to disk, leaving the UTXO database in an unrecoverable state.
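To make the failure scenario concrete, the following sketch simulates the upgrade with two in-memory stores. Everything here (`kvStore`, `crash`, `moveUtxoSet`, the `utxo:` key prefix) is hypothetical and exists only to show why the UTXO database is flushed before the old copy is removed.

```go
package main

import (
	"fmt"
	"strings"
)

// kvStore is a hypothetical store whose writes land in mem and only
// reach disk on Flush.
type kvStore struct {
	mem  map[string]string
	disk map[string]string
}

func clone(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func newKVStore(initial map[string]string) *kvStore {
	return &kvStore{mem: clone(initial), disk: clone(initial)}
}

// Flush makes the current in-memory contents durable.
func (s *kvStore) Flush() error { s.disk = clone(s.mem); return nil }

// crash simulates an unclean shutdown by discarding unflushed writes.
func (s *kvStore) crash() { s.mem = clone(s.disk) }

// moveUtxoSet copies the UTXO entries to utxoDB, flushes utxoDB, and
// only then removes the entries from blockDB.
func moveUtxoSet(blockDB, utxoDB *kvStore) error {
	for k, v := range blockDB.mem {
		if strings.HasPrefix(k, "utxo:") {
			utxoDB.mem[k] = v
		}
	}
	if err := utxoDB.Flush(); err != nil {
		return fmt.Errorf("flush UTXO database: %w", err)
	}
	for k := range blockDB.mem {
		if strings.HasPrefix(k, "utxo:") {
			delete(blockDB.mem, k)
		}
	}
	return nil
}

func main() {
	blockDB := newKVStore(map[string]string{"block:1": "header", "utxo:a": "output-a"})
	utxoDB := newKVStore(nil)
	if err := moveUtxoSet(blockDB, utxoDB); err != nil {
		fmt.Println("upgrade failed:", err)
		return
	}

	// The block database is flushed, then the process dies before the
	// UTXO database would otherwise have flushed on its own.
	_ = blockDB.Flush()
	utxoDB.crash()
	fmt.Println("UTXO entries surviving the crash:", len(utxoDB.mem))
}
```

Without the `utxoDB.Flush()` call inside `moveUtxoSet`, the simulated crash would leave the entries in neither store, which is the unrecoverable state described above.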
This looks great. I'll do some testing before approving, but I've reviewed it and everything looks proper.
I've now tested this pretty thoroughly on both Windows and Linux and it passed everything without issue.
For reference, I tested via three main methods:
- Using a USB drive and yanking the cable while it was in operation
- Using a virtual machine and removing the hard drive out from under it while it was in operation
- Killing the process with `SIGKILL` on Linux and `taskkill` on Windows
Within each of those three failure modes I tested the following conditions:
- Failure during the initial headers sync
  - Clean shutdown to force a proper flush and ensure it recovers from that point
  - Restart and allow it to download some more headers, but not all of them, and then cause each of the aforementioned failures
  - Restart and ensure it properly resumes from the aforementioned clean recovery point in the initial headers sync
- Failure during the transition from headers sync to chain sync
  - Clean shutdown during the initial headers sync to force a proper flush and ensure it recovers from that point
  - Restart and allow it to download the rest of the headers along with a bunch of the initial blocks, and then cause each of the aforementioned failures
  - Restart and ensure it properly resumes from the aforementioned clean recovery point during the initial headers sync
- Failure during the initial chain sync (block download)
  - Clean shutdown during the initial chain sync (block download) to force a proper flush and ensure it recovers from that point
  - Restart and allow it to download some more blocks, but not all of them, and then cause each of the aforementioned failures
  - Restart and ensure it properly resumes from the aforementioned clean recovery point in the initial chain sync
Finally, I also modified the code to remove the flush and tried various failure combinations per the above to prove that it did NOT recover in those cases which further proves that both the testing methodology and fix are proper.
This ensures that the block database is always at least as far along as the UTXO database which keeps the UTXO database in a recoverable state in the event of an unclean shutdown.
An overview of the changes is as follows:
- Add a `Flush` method to the `database.DB` interface so that users of the `database.DB` type can flush the underlying database to disk as needed
- Add a `flushBlockDB` function to the UTXO cache. The `flushBlockDB` function is used to flush the block database to disk prior to flushing the UTXO cache to the UTXO database
- Force the UTXO database to flush to disk after initializing the `utxoSetState` for the first time
- Modify the `separateUtxoDatabase` upgrade to force the UTXO database to flush to disk prior to removing the UTXO set and state from the block database

Fixes #2643.