multi: Flush block DB before UTXO DB. #2649

Merged: 4 commits merged on May 14, 2021

rstaudt2 (Member) commented on May 8, 2021

This ensures that the block database is always at least as far along as the UTXO database, which keeps the UTXO database in a recoverable state in the event of an unclean shutdown.
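To make the invariant concrete, here is a minimal sketch. It assumes that the durable state of each database can be reduced to a single height. The names `chainState` and `blocksToReplay` are hypothetical and are not dcrd APIs. If the UTXO database lags the block database, startup can reconnect the missing blocks. If it is ahead, the blocks it depends on are gone. The error text mirrors the symptom reported in the linked issue.

```go
package main

import "fmt"

// chainState is a hypothetical, simplified view of what survives on disk
// after an unclean shutdown: the best height durably stored in the block
// database and the height the UTXO set state claims to be current to.
type chainState struct {
	blockDBHeight   int64
	utxoStateHeight int64
}

// blocksToReplay returns the heights of the blocks that must be reconnected
// on startup to bring the UTXO set back in line with the block database.
// It fails when the UTXO database is ahead of the block database, because
// the blocks it already includes no longer exist on disk.
func blocksToReplay(s chainState) ([]int64, error) {
	if s.utxoStateHeight > s.blockDBHeight {
		return nil, fmt.Errorf("utxo state at height %d is ahead of block "+
			"database at height %d: block does not exist",
			s.utxoStateHeight, s.blockDBHeight)
	}
	var heights []int64
	for h := s.utxoStateHeight + 1; h <= s.blockDBHeight; h++ {
		heights = append(heights, h)
	}
	return heights, nil
}

func main() {
	// Block DB durable first: the UTXO set lags and can catch up.
	replay, err := blocksToReplay(chainState{blockDBHeight: 105, utxoStateHeight: 100})
	fmt.Println(replay, err) // [101 102 103 104 105] <nil>

	// UTXO DB durable first: unrecoverable.
	_, err = blocksToReplay(chainState{blockDBHeight: 100, utxoStateHeight: 105})
	fmt.Println(err != nil) // true
}
```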

An overview of the changes is as follows:

  • Add Flush to the database.DB interface
    • This adds a Flush method to the database.DB interface so that users of the database.DB type can flush the underlying database to disk as needed
  • Flush block DB before UTXO DB
    • This adds a flushBlockDB function to the UTXO cache. The flushBlockDB function is used to flush the block database to disk prior to flushing the UTXO cache to the UTXO database
    • This ensures that the block database is always at least as far along as the UTXO database, which keeps the UTXO database in a recoverable state in the event of an unclean shutdown
  • Flush UTXO DB after init utxoSetState
    • This forces the UTXO database to flush to disk after initializing the UTXO set state for the first time. This is necessary so that if the block database is flushed, and then an unclean shutdown occurs, the UTXO cache will know where to start from when recovering on startup
  • Force flush in separateUtxoDatabase upgrade
    • This modifies the separateUtxoDatabase upgrade to force the UTXO database to flush to disk prior to removing the UTXO set and state from the block database
    • This prevents a scenario where the UTXO set is removed from the block database and the block database is flushed to disk, but an unclean shutdown occurs before the UTXO database flushes to disk, leaving the UTXO database in an unrecoverable state
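The first two changes can be sketched as follows, using a stripped-down interface. The real database.DB interface has more methods. The `Flush() error` signature shown here is an assumption made for illustration. `recordingDB`, `utxoCache`, and its `flush` method are hypothetical names. Only the `flushBlockDB` name comes from the description above, and this sketch models it as an injected function.

```go
package main

import (
	"errors"
	"fmt"
)

// DB is a hypothetical, trimmed-down stand-in for database.DB showing only
// the Flush method added by this change (signature assumed).
type DB interface {
	Flush() error
}

// recordingDB is a fake database that records the order of its flushes.
type recordingDB struct {
	name string
	log  *[]string
	fail bool
}

func (d *recordingDB) Flush() error {
	if d.fail {
		return errors.New(d.name + ": flush failed")
	}
	*d.log = append(*d.log, d.name)
	return nil
}

// utxoCache is a hypothetical, simplified UTXO cache. flushBlockDB is
// injected as a function so the cache does not depend on the block
// database type itself.
type utxoCache struct {
	flushBlockDB func() error
	utxoDB       DB
}

// flush makes the block database durable before touching the UTXO
// database, so the UTXO database can never get ahead of it on disk.
func (c *utxoCache) flush() error {
	if err := c.flushBlockDB(); err != nil {
		// Leave the UTXO database alone if the block database could not be
		// made durable; the ordering invariant still holds.
		return fmt.Errorf("flushing block database: %w", err)
	}
	// Writing the cached entries into the UTXO database is elided here.
	if err := c.utxoDB.Flush(); err != nil {
		return fmt.Errorf("flushing utxo database: %w", err)
	}
	return nil
}

func main() {
	var order []string
	blockDB := &recordingDB{name: "block", log: &order}
	cache := &utxoCache{
		flushBlockDB: blockDB.Flush,
		utxoDB:       &recordingDB{name: "utxo", log: &order},
	}
	if err := cache.flush(); err != nil {
		fmt.Println(err)
	}
	fmt.Println(order) // [block utxo]
}
```

Injecting the block flush as a function keeps the cache decoupled from the block database, and a failed block flush short-circuits the UTXO flush.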

Fixes #2643.

The immediate use case for the new Flush method is to allow for flushing
dependencies.  For example, the block database always needs to be flushed
to disk before the UTXO database is flushed to disk to ensure that the
UTXO database remains in a recoverable state in the event of an unclean
shutdown.
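The separateUtxoDatabase upgrade described above relies on the same Flush method, but its dependency runs the other way. The UTXO database must be durable before the UTXO set is deleted from the block database. The following minimal sketch assumes a toy store in which an unclean shutdown discards everything written since the last Flush. `store`, `migrateUtxos`, `utxoSurvives`, and the `utxo/` key prefix are hypothetical and do not reflect the actual upgrade code or bucket layout.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical key/value store. live is what readers see now;
// durable is what survives an unclean shutdown (the state at the last Flush).
type store struct {
	live    map[string]string
	durable map[string]string
}

func newStore() *store {
	return &store{live: map[string]string{}, durable: map[string]string{}}
}

func clone(m map[string]string) map[string]string {
	c := make(map[string]string, len(m))
	for k, v := range m {
		c[k] = v
	}
	return c
}

// Flush makes the live state durable.
func (s *store) Flush() { s.durable = clone(s.live) }

// crash simulates an unclean shutdown: unflushed changes are lost.
func (s *store) crash() { s.live = clone(s.durable) }

// migrateUtxos copies every "utxo/" entry from blockDB to utxoDB and then
// removes it from blockDB. With forceFlush, utxoDB is made durable before
// anything is deleted from blockDB.
func migrateUtxos(blockDB, utxoDB *store, forceFlush bool) {
	for k, v := range blockDB.live {
		if strings.HasPrefix(k, "utxo/") {
			utxoDB.live[k] = v
		}
	}
	if forceFlush {
		utxoDB.Flush()
	}
	for k := range blockDB.live {
		if strings.HasPrefix(k, "utxo/") {
			delete(blockDB.live, k) // deleting during range is safe in Go
		}
	}
	blockDB.Flush()
}

// utxoSurvives runs the migration, simulates an unclean shutdown before
// utxoDB's own flush, and reports whether the UTXO entry still exists in
// either database.
func utxoSurvives(forceFlush bool) bool {
	blockDB, utxoDB := newStore(), newStore()
	blockDB.live["block/100"] = "block data"
	blockDB.live["utxo/abc:0"] = "utxo entry"
	blockDB.Flush()

	migrateUtxos(blockDB, utxoDB, forceFlush)
	blockDB.crash()
	utxoDB.crash()

	_, inBlockDB := blockDB.live["utxo/abc:0"]
	_, inUtxoDB := utxoDB.live["utxo/abc:0"]
	return inBlockDB || inUtxoDB
}

func main() {
	fmt.Println(utxoSurvives(false)) // false: the entry is lost
	fmt.Println(utxoSurvives(true))  // true
}
```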
davecgh (Member) left a comment


This looks great. I'll do some testing before approving, but I've reviewed it and everything looks proper.

davecgh added this to the 1.7.0 milestone on May 9, 2021
Review thread on database/ffldb/db.go: resolved
davecgh (Member) left a comment


I've now tested this pretty thoroughly on both Windows and Linux and it passed everything without issue.

For reference, I tested via three main methods:

  • Using a USB drive and yanking the cable while it was in operation
  • Using a virtual machine and removing the hard drive out from under it while it was in operation
  • Killing the process with SIGKILL on Linux and taskkill on Windows

Within each of those three failure modes I tested the following conditions:

  • Failure during the initial headers sync
    • Clean shutdown to force a proper flush and ensure it recovers from that point
    • Restart and allow it to download some more headers, but not all of them, and then cause each of the aforementioned failures
    • Restart and ensure it properly resumes from the aforementioned clean recovery point in the initial headers sync
  • Failure during the transition from headers sync to chain sync
    • Clean shutdown during the initial headers sync to force a proper flush and ensure it recovers from that point
    • Restart and allow it to download the rest of the headers along with a bunch of the initial blocks, and then cause each of the aforementioned failures
    • Restart and ensure it properly resumes from the aforementioned clean recovery point during the initial headers sync
  • Failure during the initial chain sync (block download)
    • Clean shutdown during the initial chain sync (block download) to force a proper flush and ensure it recovers from that point
    • Restart and allow it to download some more blocks, but not all of them, and then cause each of the aforementioned failures
    • Restart and ensure it properly resumes from the aforementioned clean recovery point in the initial chain sync

Finally, I also modified the code to remove the flush and tried various failure combinations per the above to prove that it did NOT recover in those cases, which further proves that both the testing methodology and the fix are proper.
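The negative check described in the last paragraph was to remove the flush and confirm that recovery fails. It can be mirrored in a toy simulation. This is a minimal sketch under several assumptions. Each database's on-disk state is reduced to a height. Both databases also flush on fixed, independent schedules. The 3-block and 7-block intervals are arbitrary. `node`, `simulate`, and `countFailures` are hypothetical and are not part of dcrd's test suite. The sketch only models the ordering logic and does not replace the physical failure testing above.

```go
package main

import "fmt"

// node is a hypothetical model of a syncing node. Each database has a live
// (in-memory) height and a durable (on-disk) height.
type node struct {
	blockLive, blockDurable int
	utxoLive, utxoDurable   int
	flushBlockFirst         bool
}

// connectBlock stores the next block and updates the cached UTXO set.
func (n *node) connectBlock() {
	n.blockLive++
	n.utxoLive++
}

// flushBlockDB makes the block database durable.
func (n *node) flushBlockDB() { n.blockDurable = n.blockLive }

// flushUtxoCache writes the cache to the UTXO database, optionally making
// the block database durable first (the behavior added by this change).
func (n *node) flushUtxoCache() {
	if n.flushBlockFirst {
		n.flushBlockDB()
	}
	n.utxoDurable = n.utxoLive
}

// recoverable reports whether the on-disk state can be recovered.
func (n *node) recoverable() bool { return n.utxoDurable <= n.blockDurable }

// simulate connects crashAt blocks, flushing the UTXO cache every utxoEvery
// blocks and the block database every blockEvery blocks, then simulates an
// unclean shutdown and reports whether the durable state is recoverable.
func simulate(flushBlockFirst bool, utxoEvery, blockEvery, crashAt int) bool {
	n := &node{flushBlockFirst: flushBlockFirst}
	for i := 1; i <= crashAt; i++ {
		n.connectBlock()
		if i%blockEvery == 0 {
			n.flushBlockDB()
		}
		if i%utxoEvery == 0 {
			n.flushUtxoCache()
		}
	}
	// Only the durable heights survive the crash.
	return n.recoverable()
}

// countFailures tries every crash point from 1 to 50 blocks.
func countFailures(flushBlockFirst bool) int {
	failures := 0
	for crashAt := 1; crashAt <= 50; crashAt++ {
		if !simulate(flushBlockFirst, 3, 7, crashAt) {
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("unrecoverable crash points with flushBlockDB:   ", countFailures(true))
	fmt.Println("unrecoverable crash points without flushBlockDB:", countFailures(false))
}
```

With the block flush in place, the UTXO height is only persisted right after the block database catches up, so no crash point is unrecoverable. Without it, any crash between a UTXO flush and the next block flush leaves the UTXO database ahead.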

davecgh merged commit 1f84364 into decred:master on May 14, 2021
rstaudt2 deleted the utxo-db-flushing-dependency branch on July 2, 2021 16:04
Linked issue: Unable to start server: block does not exist
3 participants