
Ultraprune: use a pruned-txout-set database for block validation #1677

Merged
sipa merged 27 commits into bitcoin:master from sipa/ultraprune on Oct 20, 2012

Conversation

9 participants

sipa commented Aug 16, 2012

This is a rewrite of the block storage and validation engine.

Instead of blkindex.dat (a database with block tree data, and all transactions and their spendings in the active chain), it uses chain.dat (only block tree data) and coins.dat (pruned txout set). These two databases together are significantly smaller than blkindex.dat (<200 MiB), and only coins.dat is actively needed during block validation, speeding it up significantly (15 minutes for importing 185000 blocks from a local disk file).

Blocks are still stored in blk????.dat files, in the same file format, but smaller files (up to 128 MiB). To prevent excessive fragmentation, they are allocated in chunks of 16 MiB, and some statistics are kept about them. To assist with reorganisation, undo files are created (rev????.dat), which contain the data necessary to undo block connections.

Block pruning itself is not yet implemented, but this makes it trivial to do so; all that is required is deleting old block and undo files when certain thresholds are reached. Also note that this block pruning mechanism is different from the transaction pruning mechanism described by Satoshi. This one does not prevent a node from acting as a full node.

All commits result in a functional code tree, with passing unit tests. The first few add some extra classes, without changing actual semantics. "One file per block" and "Multiple blocks per file" form a refactor of the block storage mechanism, with related database changes. "Do not store hashNext on disk" only introduces a forward-incompatible change that simplifies the database layout. "Ultraprune" itself contains the switch from txindex.dat to coins.dat as validation data, and contains the majority of the changes. The remaining commits are optimizations and other improvements that do not affect compatibility.

There are a few TODOs left (see the comment below), but I'd like to give the code some exposure already.


sipa commented Aug 16, 2012

(EDITED)

List of implementation changes:

  • new database layout:
    • two LevelDB databases (coins/ and blktree/ subdirs), replacing blkindex.dat
    • separate directory (blocks/) with block data (in the usual format, but smaller files) and undo data
  • database keys are of the form (char,key) instead of (string,key) for reasons of compactness
  • there is no txid-to-diskpos index anymore, only blkid-to-diskpos and txid-to-unspent-outputs
    • this makes getrawtransaction work only on unspent outputs (and makes it slower)
      • an optional txid-to-diskpos index is planned
  • some new, very specialized serializers are added (compact variable-length integer, compact amount, compact txout); see the sketch at the end of this comment
  • block index does not store hashBlockNext anymore - this is reconstructed from hashBestBlock at startup
  • at startup, automatically reorg to the best block in blktree/blocks
  • new RPCs: gettxoutsetinfo and gettxout operate on the coins database
  • no more CTxIndex in-scope: instead, a global pcoinsTip (representing the coin db) and pblocktree (representing the blktree db)
    • intended to be moved to separate modules/classes, properly encapsulated
  • blktree database contains statistics about the block file (size, which blocks in it, times, heights, undo stats, ...)
  • blktree database contains flag per block that determines the degree of validation it had, to allow future headers-first mode
  • block files are pre-allocated (in chunks of 16 MiB; files grow to at most 128 MiB), to reduce fragmentation
  • transaction hashes are cached, and typically never calculated more than once

Included in the pullreq, but technically separate:

  • do -loadblock= and bootstrap.dat import in a separate thread
  • add check for strict DER encoding for signatures, and standard public keys
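
As promised above, a sketch of the compact variable-length integer scheme (a self-contained illustration; the PR's actual serializer is templated over stream types, so signatures here are simplified):

#include <cstdint>
#include <vector>

// Each byte carries 7 payload bits; the high bit set means "more bytes
// follow". Subtracting 1 per continuation byte makes every value have
// exactly one encoding: 0x00..0x7F take 1 byte, 0x80..0x407F take 2, etc.
void WriteVarInt(std::vector<unsigned char>& out, uint64_t n) {
    unsigned char tmp[10];
    int len = 0;
    while (true) {
        tmp[len] = (n & 0x7F) | (len ? 0x80 : 0x00);
        if (n <= 0x7F) break;
        n = (n >> 7) - 1;
        len++;
    }
    do { out.push_back(tmp[len]); } while (len--);
}

uint64_t ReadVarInt(const unsigned char*& p) {
    uint64_t n = 0;
    while (true) {
        unsigned char c = *p++;
        n = (n << 7) | (c & 0x7F);
        if (c & 0x80) n++; else return n;
    }
}

For example, 128 encodes as the two bytes 0x80 0x00 rather than 0x81 0x00, which is what the "minus one per continuation byte" rule buys: no redundant encodings, so the format stays canonical and compact.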

Diapolo commented Aug 21, 2012

@sipa One question: our current AppendBlockFile() function takes MAX_SIZE into account and starts a new block file if the space left in the current one (given the maximum allowed file size) is < MAX_SIZE. So 128 MiB files would hold at most 96 MiB of usable data, right?


sipa commented Aug 21, 2012

@Diapolo: not sure what you mean; I don't use AppendBlockFile anymore.


Diapolo commented Aug 21, 2012

@sipa I saw that and wanted to understand the change here: which condition determines whether a new block file needs to be created, where is that check in your new code, and what's the space limit?


sipa commented Aug 21, 2012

The check is in FindBlockPos in main.cpp. And a new file is created if (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE).
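
In code form, the rule is roughly this (a minimal sketch; the helper name is invented for illustration, the constant's value comes from the PR description):

static const unsigned int MAX_BLOCKFILE_SIZE = 128 * 1024 * 1024; // 128 MiB

// Roll over to a new blk????.dat file once the next block no longer fits.
bool NeedNewBlockFile(unsigned int nOldUsedSize, unsigned int nNewBlockSize) {
    return nOldUsedSize + nNewBlockSize >= MAX_BLOCKFILE_SIZE;
}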

Diapolo reviewed src/main.cpp (outdated):
CDiskBlockPos blockPos;
{
    CChainDB chaindb;
    if (!FindBlockPos(chaindb, blockPos, nBlockSize+8, nHeight, nTime))

Diapolo Aug 21, 2012

Why nBlockSize+8? Is that padding?

sipa Aug 21, 2012

4 bytes magic, 4 bytes block length; that's just the file format of blk*.dat.

Diapolo Aug 21, 2012

I'm lacking some background information here, sorry :). Is the format defined / described somewhere?

sipa Aug 21, 2012

No idea, but I wanted to retain compatibility between pre- and post-ultraprune block files, so I used the same format. That is: the files are a concatenation of (4 bytes magic, a 4-byte LE integer with the actual block size, and the block data itself).
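
In code form (an illustrative helper, not from the patch; mainnet's magic bytes are 0xf9 0xbe 0xb4 0xd9):

#include <cstdint>
#include <cstdio>
#include <vector>

// Append one record: 4 magic bytes, 4-byte LE length, then the raw block.
// The 4+4 header bytes are the "+8" discussed above.
void AppendBlockRecord(FILE* f, const unsigned char pchMessageStart[4],
                       const std::vector<unsigned char>& block) {
    fwrite(pchMessageStart, 1, 4, f);
    uint32_t n = (uint32_t)block.size();
    unsigned char size_le[4] = {
        (unsigned char)(n), (unsigned char)(n >> 8),
        (unsigned char)(n >> 16), (unsigned char)(n >> 24)
    };
    fwrite(size_le, 1, 4, f);
    fwrite(&block[0], 1, n, f);
}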

Diapolo Aug 21, 2012

I found this one and it explains what I was missing here: https://bitcointalk.org/index.php?topic=101514.0 thanks for your further explanation, too.

Why keep things compatible here? Perhaps it's the right time to optimize the internals of the block files (e.g. compression or something similar)?


luke-jr commented Aug 24, 2012

Does this break the ability to downgrade at all? (I expect it just means wasted "padding" space in the blk*.dat files?)


sipa commented Aug 27, 2012

Updated. Batch block connection now keeps a permanent cache and modifies that (instead of delaying block connection until several blocks were available, which interfered with normal network-based downloading). Also added a commit that changes the block database format, in preparation for things like parallel signature checking and an initial headers-only mode.


Diapolo commented Aug 27, 2012

@sipa By "block database format", do you mean the blocks stored in blk0000x.dat?


sipa commented Aug 27, 2012

@luke-jr how do you mean breaking the ability to downgrade? The blk000*.dat files remain exactly the same format, but the other databases are incompatible.

@Diapolo No, it uses coins.dat (the unspent txout set) and chain.dat (the block index), in addition to the blk*.dat (and rev*.dat) files. It's the format of chain.dat that changed in the last commit.


luke-jr commented Aug 27, 2012

@sipa If it interacts with downgrades in ugly ways, I'd probably not want to put it into next-test.


sipa commented Aug 27, 2012

@luke-jr Shouldn't be a problem - the filenames are all different, so you can (almost) run ultraprune and non-ultraprune together in the same datadir independently.

That said, it's likely to conflict with a lot of other stuff, so decide for yourself.


mikehearn commented Aug 30, 2012

Could you provide a squashed version of the patch somewhere, for review? It's really hard to review as is because it's just a record of how you implemented it over time.


sipa commented Aug 30, 2012


mikehearn commented Aug 31, 2012

Thanks, that looks useful.


sipa commented Aug 31, 2012

@mikehearn Seems that through rebasing I lost some comments you made earlier on the commits?

Regarding the encodings, I plan to write some text about the final format for all datastructures, but I may change a few things still.


sipa commented Sep 4, 2012

Rebased/combined with @mikehearn's LevelDB patch


sipa commented Sep 20, 2012

Rebased on 0.7, and moved the more experimental block caching and parallel signature checking to a separate branch. The code in here should be stable and can be tested.

The only things that remain to be done are automatic import of old data, and more elaborate consistency checks at startup. I think those can be done in separate pull requests though.

This branch has its own LevelDB glue, independent from (though similar to, and simpler than) the one in Mike's leveldb branch. As the coin and block indexes are only opened once, there was no need for a CDB-like wrapper and a global CDBEnv to cache database accessors. If LevelDB is merged first, I'll add reverts for most of it here.
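
For readers unfamiliar with the LevelDB API, glue of this kind boils down to very little code. A sketch (the key/value strings are placeholders, not the PR's actual serialization):

#include <cassert>
#include "leveldb/db.h"
#include "leveldb/write_batch.h"

int main() {
    leveldb::Options options;
    options.create_if_missing = true;
    leveldb::DB* db = NULL;
    leveldb::Status status = leveldb::DB::Open(options, "./coins", &db);
    assert(status.ok());

    // Updates are applied as one atomic batch: no transactions needed,
    // which is the property the block-connection logic relies on.
    leveldb::WriteBatch batch;
    batch.Put("c<txid>", "<serialized CCoins>"); // placeholder key/value
    batch.Delete("c<spent txid>");
    status = db->Write(leveldb::WriteOptions(), &batch);
    assert(status.ok());

    delete db;
    return 0;
}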


mikehearn commented Sep 20, 2012

I closed the LevelDB pull req. Let's merge it as part of this.

Note that my LevelDB branch has code that does replay the blocks with some GUI progress. It's not great, because it actually re-writes the block files in order to track the block offsets... I didn't do any deep refactoring to fix that, as I wanted it to be as easy/fast to merge as possible, and it's a one-off migration anyway. But as it's now part of ultraprune, that bridge has been crossed, so you could just re-use whatever GUI code is usable.


sipa commented Sep 21, 2012

@TheBlueMatt any way to disable the build tester here, as it seems to be incompatible with this anyway?


laanwj commented Sep 21, 2012

I've tested this a bit on the testnet. No problems found, and synchronization is super-fast.

One small comment: in your bitcoin-qt.pro, please use $(MAKE) instead of make. This prevents an annoying warning about a job server in Qt Creator.


sipa commented Sep 21, 2012

@laanwj: updated to use $(MAKE)


TheBlueMatt commented Sep 22, 2012

@sipa I'd rather not; the patch is really quite simple (http://jenkins.bluematt.me/pull-tester/files/bitcoind-comparison.patch). Afaict, it's only failing because setBlockIndexValid was added directly above hashGenesisBlock in main.cpp. Can you just move that line and see if it works?


sipa commented Sep 25, 2012

Changed the database/serialization format one more time: coins and undo data now contain the transaction version number. This may be necessary when new transaction versions are defined that influence their ability to be spent.

@TheBlueMatt ok, moved the setBlockIndexValid line in main.cpp.


mikehearn commented Sep 27, 2012

This does not build on MacOS X because there is no fdatasync on that platform.
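
One plausible portability shim (an illustration under stated assumptions, not necessarily the fix that landed): fall back to F_FULLFSYNC on Darwin, which forces data out of the drive's cache, where fdatasync() is unavailable.

#include <unistd.h>
#ifdef __APPLE__
#include <fcntl.h>
// Darwin has no fdatasync(); F_FULLFSYNC flushes the drive's write cache.
static int DataSync(int fd) { return fcntl(fd, F_FULLFSYNC); }
#else
static int DataSync(int fd) { return fdatasync(fd); }
#endif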


sipa commented Sep 28, 2012

@TheBlueMatt I wonder why it still complains?

EDIT: Oh, just out of date with master. Let's wait for the next cycle.


mikehearn commented Sep 29, 2012

I just tried to start my client based on this branch and got:

Loading block index...
Opening LevelDB in /Users/hearn/Library/Application Support/Bitcoin/blktree
Opened LevelDB successfully
Opening LevelDB in /Users/hearn/Library/Application Support/Bitcoin/coins
Opened LevelDB successfully
LoadBlockIndex(): last block file = 23
LoadBlockIndex(): last block file: CBlockFileInfo(blocks=1572, size=132444896, heights=199237..200807, time=2012-09-17..2012-09-27)
LoadBlockIndex(): hashBestChain=00000000000000e78688 height=200806 date=09/27/2012 21:08:42
Verifying last 2500 blocks at level 1
block index 36135ms
Loading wallet...
dbenv.open LogDir=/Users/hearn/Library/Application Support/Bitcoin/database ErrorFile=/Users/hearn/Library/Application Support/Bitcoin/db.log
nFileVersion = 70099
wallet 1192ms
REORGANIZE: Disconnect 1 blocks; 000000000000051dcdc2..00000000000000e78688
REORGANIZE: Connect 2 blocks; 000000000000051dcdc2..00000000000003d0a2b1


EXCEPTION: NSt8ios_base7failureE
CAutoFile::read : end of file
bitcoin in Runaway exception


mikehearn commented Sep 29, 2012

On investigation, this failure can happen with both ultralevelprune and the old BDB code; it happens when the block is not written but the database updates are, typically when power is yanked at just the wrong time.

As it is not a new failure mode, I guess it should not delay review/merge of this code.

Diapolo reviewed bitcoin-qt.pro (outdated):
@@ -90,6 +90,33 @@ contains(BITCOIN_NEED_QT_PLUGINS, 1) {
QTPLUGIN += qcncodecs qjpcodecs qtwcodecs qkrcodecs qtaccessiblewidgets
}

contains(USE_LEVELDB, -) {

Diapolo Oct 11, 2012

So this still includes legacy BDB support? That means we need to keep two code bases up to date.
Was the intention in keeping it to be able to revert? Just wanna know :).

sipa Oct 11, 2012

Yes, though the BDB version most likely doesn't compile anymore. This was converted from Mike's code, which tried to keep compatibility, but that's just an unnecessary burden.

Diapolo Oct 12, 2012

Thanks, so it would be nice to remove that burden entirely from this pull and the code. If this is a one-way ticket, there is no need to keep BDB compatibility code in.

mikehearn Oct 12, 2012

The original idea was to reduce the risk of merging the code: in case there were issues with LevelDB [on some specific platform], we don't want to hold up the release or do a potentially messy revert.

I agree it's irritating and a burden, but it'd suck if all of ultraprune ended up getting reverted due to unanticipated issues with LevelDB. Once 0.8 has been successfully rolled out to the userbase and things are quiet it could be deleted at that time?

Diapolo Oct 12, 2012

I'm fine with removing that later, as long as you / sipa keep track of it.
That whole block of commands in the pro-file looks like voodoo to me anyway :-D.


Diapolo commented Oct 11, 2012

Did anyone build this directly on Windows with MinGW? I saw there was a cross-compile Windows flag in the pro file. Perhaps I should just fetch that branch and try it in the next few days.

justmoon and others added some commits Jul 21, 2012

LevelDB glue
Database-independent glue for supporting LevelDB databases.

Based on code from earlier commits by Mike Hearn in his leveldb
branch.
Backport Win32 LevelDB env from C++0x to C++
Since the gitian mingw compiler doesn't support C++0x yet.
One file per block
Refactor of the block storage code, which now stores one file per block.
This will allow easier pruning, as blocks can be removed individually.
Preliminary undo file creation
Create files (one per block) with undo information for the transactions
in it.
Ultraprune
This switches bitcoin's transaction/block verification logic to use a
"coin database", which contains all unredeemed transaction output scripts,
amounts and heights.

The name ultraprune comes from the fact that instead of a full transaction
index, we only (need to) keep an index with unspent outputs. For now, the
blocks themselves are kept as usual, although they are only necessary for
serving, rescanning and reorganizing.

The basic datastructures are CCoins (representing the coins of a single
transaction), and CCoinsView (representing a state of the coins database).
There are several implementations of CCoinsView: a dummy one, one backed by
the coins database (coins.dat), one backed by the memory pool, and one
that adds a cache on top of another. FetchInputs, ConnectInputs, ConnectBlock,
DisconnectBlock, ... now operate on a generic CCoinsView.

The block switching logic now builds a single cached CCoinsView with
changes to be committed to the database before any changes are made.
This means no uncommitted changes are ever read from the database, and
should ease the transition to another database layer which does not
support transactions (but does support atomic writes), like LevelDB.

For the getrawtransaction() RPC call, access to a txid-to-disk index
would be preferable. As this index is not necessary or even useful
for any other part of the implementation, it is not provided. Instead,
getrawtransaction() uses the coin database to find the block height,
and then scans that block to find the requested transaction. This is
slow, but should suffice for debug purposes.
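
A rough sketch of the layering this commit message describes (stand-in types; member and method names only approximate the PR's actual classes):

#include <stdint.h>
#include <map>
#include <string>
#include <vector>

typedef std::string uint256;  // stand-in for the real 256-bit hash type
struct CTxOut { int64_t nValue; std::vector<unsigned char> scriptPubKey; };

// Unspent outputs of one transaction, plus metadata needed for validation.
struct CCoins {
    std::vector<CTxOut> vout; // spent entries are pruned/marked empty
    int nHeight;              // height of the block containing the tx
    int nVersion;             // transaction version
};

// Abstract view of the coin state; validation code only sees this.
class CCoinsView {
public:
    virtual bool GetCoins(const uint256& txid, CCoins& coins) = 0;
    virtual bool SetCoins(const uint256& txid, const CCoins& coins) = 0;
    virtual ~CCoinsView() {}
};

// Cache layer: accumulates changes in memory and commits them to the
// backing view (e.g. the coins.dat-backed one) in a single batch.
class CCoinsViewCache : public CCoinsView {
    CCoinsView& base;
    std::map<uint256, CCoins> cacheCoins;
public:
    CCoinsViewCache(CCoinsView& baseIn) : base(baseIn) {}
    bool GetCoins(const uint256& txid, CCoins& coins) {
        std::map<uint256, CCoins>::iterator it = cacheCoins.find(txid);
        if (it != cacheCoins.end()) { coins = it->second; return true; }
        return base.GetCoins(txid, coins);
    }
    bool SetCoins(const uint256& txid, const CCoins& coins) {
        cacheCoins[txid] = coins; return true; // flushed to base later
    }
};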
Batch block connection during IBD
During the initial block download (or -loadblock), delay connection
of new blocks a bit, and perform them in a single action. This reduces
the load on the database engine, as subsequent blocks often update an
earlier block's transaction already.
Transaction hash caching
Use CBlock's vMerkleTree to cache transaction hashes, and pass them
along as argument in more function calls. During initial block download,
this results in every transaction's hash to be only computed once.
Direct CCoins references
To prevent excessive copying of CCoins in and out of the CCoinsView
implementations, introduce a GetCoins() function in CCoinsViewCache
which returns a direct reference. The block validation and connection
logic is updated to require caching CCoinsViews, and exploits the
GetCoins() function heavily.
Automatically reorganize at startup to best known block
Given that the block tree database (chain.dat) and the active chain
database (coins.dat) are entirely separate now, it becomes legal to
swap one with another instance without affecting the other.

This commit introduces a check in the startup code that detects the
presence of a better chain in chain.dat that has not been activated
yet, and does so efficiently (in batch, while reusing the blk???.dat
files).
Multiple blocks per file
Change the block storage layer again, this time with multiple blocks
per file, tracked by txindex.dat database entries. The file
format is exactly the same as the earlier blk00001.dat, but with
smaller files (128 MiB for now).

The database entries track how many bytes each block file already
uses, how many blocks are in it, which range of heights is present
and which range of dates.
Pre-allocate block and undo files in chunks
Introduce an AllocateFileRange() function in util, which wipes or
at least allocates a given range of a file. It can be overridden
by more efficient OS-dependent versions if necessary.

Block and undo files are now allocated in chunks of 16 and 1 MiB,
respectively.
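
A generic fallback along the lines the message suggests (a sketch; OS-specific versions such as posix_fallocate could override it):

#include <algorithm>
#include <cstdio>

// Extend [offset, offset+length) of the file with zero bytes, so later
// block writes land in already-allocated space instead of growing the
// file a few KiB at a time (which is what causes fragmentation).
void AllocateFileRange(FILE* file, unsigned int offset, unsigned int length) {
    static const char buf[65536] = {0};
    fseek(file, offset, SEEK_SET);
    while (length > 0) {
        unsigned int now = std::min<unsigned int>(sizeof(buf), length);
        fwrite(buf, 1, now, file);
        length -= now;
    }
}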
Prepare database format for multi-stage block processing
This commit adds a status field and a transaction counter to the block
indexes.
LevelDB block and coin databases
Split off CBlockTreeDB and CCoinsViewDB into txdb-*.{cpp,h} files,
implemented by either LevelDB or BDB.

Based on code from earlier commits by Mike Hearn in his leveldb
branch.
Add LevelDB MemEnv support
Support LevelDB memory-backed environments, and use them in unit tests.
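
Using the memory-backed environment looks roughly like this (a sketch; the memenv helper ships in leveldb's helpers/ directory):

#include <cassert>
#include "leveldb/db.h"
#include "helpers/memenv/memenv.h"

int main() {
    // All database files live in RAM: ideal for unit tests, since nothing
    // touches the disk and nothing needs cleaning up afterwards.
    leveldb::Env* penv = leveldb::NewMemEnv(leveldb::Env::Default());
    leveldb::Options options;
    options.env = penv;
    options.create_if_missing = true;
    leveldb::DB* db = NULL;
    leveldb::Status status = leveldb::DB::Open(options, "test", &db);
    assert(status.ok());
    delete db;
    delete penv;
    return 0;
}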

gmaxwell commented Oct 20, 2012

ACK. This appears ready for integration.

sipa added a commit that referenced this pull request Oct 20, 2012

Merge pull request #1677 from sipa/ultraprune
Ultraprune: use a pruned-txout-set database for block validation

sipa merged commit cf9b49f into bitcoin:master on Oct 20, 2012

laudney pushed a commit to reddcoin-project/reddcoin that referenced this pull request Mar 19, 2014

Merge pull request bitcoin#1677 from sipa/ultraprune
Ultraprune: use a pruned-txout-set database for block validation

luke-jr referenced this pull request in laanwj/bitcoin Mar 24, 2014

Prevent empty transactions from being added to vtxPrev
CWalletTx::AddSupportingTransactions() was adding empty transaction
to vtxPrev in some cases. Skip over these.

Part one of the solution to bitcoin#3190. This prevents invalid vtxPrev from entering the wallet, but does not stop existing ones from being transmitted.
entering the wallet, but not current ones being transmitted.