
Periodic automatic UTXO database backup #8037

Open
laanwj opened this Issue May 10, 2016 · 11 comments

@laanwj
Member

laanwj commented May 10, 2016

If the database becomes corrupted (especially the UTXO set; the block index is fairly fast to rebuild, but could be included), this can result in multi-hour service interruptions while reindexing from scratch. I know this shouldn't happen, but in practice it does seem to happen for some (potentially unknown) reason, on some hardware. One scenario where it can happen uncontrollably is power loss (e.g. #7233).

It would help if there was an option to periodically make database snapshots. Then when there is a corruption issue, the software can - either manually or automatically - revert to the latest snapshot and catch up from there.

This could be done in a background thread using the UTXO set cursor introduced in #7756 (here: a basic linearize-utxo utility). Most notably, with LevelDB there is no need to "stop the world" while the backup is in progress.
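For illustration, a minimal sketch of such a dump loop, assuming the `CCoinsViewCursor` API from #7756; the function name and surrounding plumbing are hypothetical, and error handling is elided. It assumes the coins cache has been flushed first, so the on-disk UTXO set is at a known block:

```cpp
// Hypothetical sketch, not the utility linked above.
#include <memory>

#include "coins.h"    // CCoinsView, CCoinsViewCursor, CCoins
#include "streams.h"  // CAutoFile

void DumpUTXOSet(CCoinsView& view, CAutoFile& file)
{
    std::unique_ptr<CCoinsViewCursor> cursor(view.Cursor());
    // The LevelDB iterator behind the cursor reads from an implicit
    // snapshot, so block processing can continue while we serialize.
    while (cursor->Valid()) {
        COutPoint key;
        CCoins coins;
        if (cursor->GetKey(key) && cursor->GetValue(coins)) {
            file << key << coins;
        }
        cursor->Next();
    }
}
```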

A UTXO state dump is about 1.2 GB as of block 408202:

```
-rw-rw-r--  1 1243427106 May 10 13:36 utxo.dat
```

It is not very compressible, though xz does a reasonable job:

```
-rw-rw-r--  1 1243427106 May 10 13:36 utxo.dat
-rw-rw-r--  1 1199339866 May 10 13:41 utxo.dat.gz
-rw-rw-r--  1  764742324 May 10 13:55 utxo.dat.xz
```

So this should be optional, for users who want to trade a few GB of disk space for increased reliability.

@sdaftuar
Member

sdaftuar commented May 10, 2016

I think this is a great idea, particularly for pruning nodes, where avoiding a reindex means avoiding redownloading all the old blocks.

@laanwj
Member Author

laanwj commented May 10, 2016

Right. For pruning there may be additional constraints (and/or a few old blocks may have to be re-downloaded, to ensure that the last 550 MB of blocks is available), but I don't see why it couldn't work in principle, and it'd still be possible to restore much faster than starting entirely from scratch.

Edit: Hm, thinking about this, that may not even be necessary. If the database backup is not too old (that is, not from before the last pruned block), it would just revisit the blocks still on disk and end up where it left off.

@jonasschnelli
Member

jonasschnelli commented May 11, 2016

Great idea!
I guess there is no easy way to dump only the differential since the last periodic dump, to speed up the dump? If the dump runs on a background thread this probably doesn't matter that much.

@sipa
Member

sipa commented May 11, 2016

@jonasschnelli The differential between two UTXO sets is called the blockchain (without the signatures) :p

@whitslack

whitslack commented May 19, 2016

I've been making periodic chainstate database backups at the file system level. Since the *.ldb files are write-once, they can be backed up using hard links. This saves disk space, as the backup can share many files with the live database. It also means the backup happens nearly instantly, only taking the time to create the new links to the inodes. In my setup I shut down the node while taking the snapshot of the database, but if this functionality were built in, presumably the database could simply be closed briefly while the snapshot is made, and the whole process wouldn't have to exit.

I guess there is some kind of equivalent of hard links on Windows. Not as elegant as the Unix way, but it should still work as well, I'd think.

@laanwj
Member Author

laanwj commented Jun 22, 2016

> Since the *.ldb files are write-once, they can be backed up using hard links.

That's a very interesting suggestion. Indeed, unlike a background copy, this requires closing the database to avoid interference, but only momentarily.

This won't work on OSes without hardlinks though, such as Windows.

@eklitzke
Member

eklitzke commented Mar 10, 2018

Are you sure it's safe to create a backup using hard links? LevelDB has additional metadata in it that maps which key ranges belong to which files, and that has to be in a consistent state with the rest of the database. The only way to do this atomically is via code in LevelDB (which could be written to use hard links, but it's not just a matter of saving all the .ldb files).

@whitslack

whitslack commented Mar 10, 2018

> Are you sure it's safe to create a backup using hard links?

Don't do it while LevelDB is running! But yes, if the application is shut down, then you can make a backup of the database using hard links. Note that some files should be copied rather than hard linked, as only the *.ldb files are write-once. The *.log file is appended to and must be backed up using a full copy. I don't know about the MANIFEST* file, so I've been backing it up using full copies as well. But the *.ldb files are never modified once fully written — they are only deleted once they are obsolete — so they're safe to back up by hard linking.
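For illustration, a minimal sketch of this recipe (hypothetical, not from the thread) using C++17 `std::filesystem`. It assumes the database is closed, hard-links only the `*.ldb` files, and copies everything else (`*.log`, `MANIFEST*`, `CURRENT`):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

void BackupChainstate(const fs::path& src, const fs::path& dst)
{
    fs::create_directories(dst);
    for (const auto& entry : fs::directory_iterator(src)) {
        const fs::path& from = entry.path();
        const fs::path to = dst / from.filename();
        if (from.extension() == ".ldb") {
            // Write-once table files: near-instant, shares the inode.
            fs::create_hard_link(from, to);
        } else {
            // Appended-to or rewritten files need a full copy.
            fs::copy_file(from, to, fs::copy_options::overwrite_existing);
        }
    }
}
```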

@jspeigner

jspeigner commented Dec 11, 2018

Any movement on this? I'm looking for a good way to back up and do a faster restore if a node goes down. It would also be nice to be able to quickly launch a new node if it's running in Docker.

@jonasschnelli
Member

jonasschnelli commented Dec 11, 2018

@jspeigner
I think the current best way, if you are on Linux/macOS, is the hardlink approach (#8037 (comment)). It requires a shutdown for backup and restore, which is cumbersome, but otherwise it's very efficient (in disk space and in time required for backup/restore).

@luke-jr
Member

luke-jr commented Feb 14, 2019

Another approach, which could be used by everyone, would be to save a hash of the UTXO set at various points. Then you don't need to store a complete copy, but can still verify a copy someone else gives you.

(Maybe there's even a way to download the copy over the network, but that risks using a lot of data before we can verify whether or not it's correct. There's also a risk that someone might use it for crazy UTXO lookups, or trusted-sync stuff.)
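For illustration, a hypothetical sketch of what computing such a digest could look like, reusing the cursor interface discussed above; the function name and serialization here are assumptions, not a proposed or agreed format:

```cpp
// Hypothetical sketch: stream the serialized UTXO set through a hash
// writer, producing a digest that a restored (or downloaded) copy can
// be checked against.
#include <memory>

#include "coins.h"
#include "hash.h"     // CHashWriter
#include "version.h"  // PROTOCOL_VERSION

uint256 HashUTXOSet(CCoinsView& view)
{
    std::unique_ptr<CCoinsViewCursor> cursor(view.Cursor());
    CHashWriter hasher(SER_GETHASH, PROTOCOL_VERSION);
    while (cursor->Valid()) {
        COutPoint key;
        CCoins coins;
        if (cursor->GetKey(key) && cursor->GetValue(coins)) {
            hasher << key << coins;
        }
        cursor->Next();
    }
    return hasher.GetHash();  // double-SHA256 of the serialized stream
}
```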
