Use non-atomic flushing with block replay #10148
Conversation
This badly needs testing, but I'm not sure how to simulate crashes in the middle of flushing (I've manually verified this patch can recover from failure by introducing an
Cool!
I'll get to that :)
Rebased, fixed a bug, and added a commit that allows simulating crashes after partial flushes.
src/txdb.cpp
Outdated
static FastRandomContext rng;
if (rng.rand32() % crash_simulate == 0) {
    LogPrintf("Simulating a crash. Goodbye.");
    sync();
This sync should be optional at least (not a realistic crash otherwise)
I've just removed the sync. It seems not needed for testing (I've done reindexes with hundreds of crashes in between).
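For reference, the crash-simulation idea under discussion can be sketched as follows. The names (ShouldSimulateCrash, MaybeCrash, the fixed RNG seed) are illustrative stand-ins for FastRandomContext and the -dbcrashratio plumbing, not the patch's actual code:

```cpp
#include <cstdio>
#include <cstdlib>
#include <random>

// Deterministic stand-in for FastRandomContext (seed chosen arbitrarily).
static std::mt19937 g_rng{12345};

// With crash_ratio == 0 the simulation is disabled; otherwise each call
// triggers with probability roughly 1/crash_ratio.
bool ShouldSimulateCrash(unsigned crash_ratio) {
    if (crash_ratio == 0) return false;
    return g_rng() % crash_ratio == 0;
}

void MaybeCrash(unsigned crash_ratio) {
    if (ShouldSimulateCrash(crash_ratio)) {
        std::fprintf(stderr, "Simulating a crash. Goodbye.\n");
        std::_Exit(1); // no sync(), no destructors: mimic a hard crash
    }
}
```

Note that after removing the sync() call, on-disk state at the crash point is whatever LevelDB happened to persist, which is what makes the simulated crash realistic.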
contrib/devtools/check-doc.py is unhappy that you added new arguments without asking for permission from the argument gods.
src/txdb.cpp
Outdated
bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
CDBBatch batch(db);
size_t count = 0;
size_t changed = 0;
size_t batch_size = (size_t)GetArg("-dbbatchsize", nDefaultDbBatchSize) << 20;
If we have memory usage that is some multiple of this, perhaps the argument should be in the form of the actual usage rather than the batch size?
Well the relevant constraint is the memory usage peak from allocating the batch, which depends on the batch memory usage, not dbcache memory usage. Also, I don't think anyone will need to change this property (except for tests, where it's very useful to get much more frequent partial flushes).
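The size-capped batching being discussed can be sketched like this; FakeBatch, WriteAll, and the flush counting are hypothetical scaffolding, with only the size_estimate formula and the -dbbatchsize threshold idea taken from the patch:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy batch that tracks an estimated serialized size, as CDBBatch does.
struct FakeBatch {
    size_t size_estimate = 0;
    void Put(const std::string& k, const std::string& v) {
        size_estimate += 3 + k.size() + v.size(); // rough LevelDB record overhead
    }
    void Clear() { size_estimate = 0; }
};

// Write all entries, flushing whenever the estimate crosses the limit.
// Returns the number of (partial plus final) flushes performed.
size_t WriteAll(const std::vector<std::pair<std::string, std::string>>& entries,
                size_t batch_size_limit) {
    FakeBatch batch;
    size_t flushes = 0;
    for (const auto& e : entries) {
        batch.Put(e.first, e.second);
        if (batch.size_estimate > batch_size_limit) {
            // In the real code, the batch would be written to LevelDB here.
            batch.Clear();
            ++flushes;
        }
    }
    if (batch.size_estimate > 0) ++flushes; // final flush of the remainder
    return flushes;
}
```

This illustrates why the memory peak is bounded by the batch size rather than the dbcache size: the batch is cleared each time the threshold is crossed.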
src/dbwrapper.h
Outdated
@@ -75,6 +83,7 @@ class CDBBatch
leveldb::Slice slValue(ssValue.data(), ssValue.size());
batch.Put(slKey, slValue);
size_estimate += 3 + slKey.size() + slValue.size();
Can you add comments as to why 3 (and, below, 2) bytes are overhead here?
Done
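The constants can be explained roughly as follows: a LevelDB WriteBatch record is a 1-byte tag plus varint32-encoded lengths, and for small keys and values each varint is a single byte, so a Put costs about 3 bytes of overhead and a Delete about 2. These helpers are an illustrative restatement of that estimate, not the patch's code:

```cpp
#include <cstddef>

// Approximate WriteBatch overhead for a Put record:
// 1 byte tag + ~1 byte key-length varint + ~1 byte value-length varint.
size_t EstimatePutSize(size_t key_size, size_t value_size) {
    return 3 + key_size + value_size;
}

// Approximate overhead for an Erase (Delete) record:
// 1 byte tag + ~1 byte key-length varint.
size_t EstimateEraseSize(size_t key_size) {
    return 2 + key_size;
}
```

The estimate undercounts for keys or values longer than 127 bytes (where the varint grows), but for a size threshold measured in MiB that imprecision is harmless.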
src/init.cpp
Outdated
strLoadError = _("Unable to replay blocks. You will need to rebuild the database using -reindex.");
break;
}
pcoinsTip->SetBestBlock(pcoinsdbview->GetBestBlock()); // TODO: only initialize pcoinsTip after ReplayBlocks
Hmm, yes. Were you intending to do this after this PR? Can you just delete it and re-create it here? I feel like it may make sense to move the chainActive.Tip-setting from LoadBlockIndexDB to after this point.
Speaking of which, did you mean to add a PruneBlockIndexCandidates() to ReplayBlocks ala LoadBlockIndexDB?
Were you intending to do this after this PR?
Yes, I tried doing it inside the PR, but doing it properly requires a bit more shuffling around and refactoring, which I'd prefer to keep for later.
Speaking of which, did you mean to add a PruneBlockIndexCandidates() to ReplayBlocks ala LoadBlockIndexDB?
ReplayBlocks doesn't touch the block index, so I don't think that would have any effect.
The PruneBlockIndexCandidates in LoadBlockIndexDB uses chainActive.Tip(), so I assumed it may need to be re-run with the new tip (though likely not a bug without it, just a should-do). I'm ok with cleaning this stuff up in a followup PR, but it seems less than ideal as-is right now.
src/validation.cpp
Outdated
auto pindexUpto = mapBlockIndex[hashUpto];
int nHeight = 1; // Skip the genesis block
if (mapBlockIndex.count(hashBest)) {
Should we also fail here if hashBest has been written (i.e. is non-IsNull) but isn't in mapBlockIndex?
I think that would be caught by other code we already have, but I've added it here.
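The suggested guard can be sketched as below, with std::string standing in for uint256 and a plain map standing in for mapBlockIndex; CheckBestBlockKnown is a hypothetical name for illustration:

```cpp
#include <map>
#include <string>

struct BlockIndex {}; // stand-in for CBlockIndex

// Fail (return false) only when the chainstate's best block hash is set
// (non-null) but unknown to the block index. A null/empty hash means a
// fresh database, which is fine.
bool CheckBestBlockKnown(const std::string& hashBest,
                         const std::map<std::string, BlockIndex>& blockIndex) {
    if (hashBest.empty()) return true;     // null hash: nothing written yet
    return blockIndex.count(hashBest) > 0; // otherwise it must be indexed
}
```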
src/validation.cpp
Outdated
inputs.ModifyCoins(txin.prevout.hash)->Spend(txin.prevout.n);
}
}
inputs.ModifyNewCoins(tx->GetHash(), tx->IsCoinBase())->FromTx(*tx, nHeight);
Based on the comment above ModifyNewCoins I do not believe this works, we may need something new to capture the "maybe not fresh, but definitely fully overwrite in any case" case.
Nice catch, fixed.
src/validation.cpp
Outdated
ReplayBlock(block, cache, pindex->nHeight);
}
cache.SetBestBlock(hashUpto);
chainActive.SetTip(pindexUpto);
It seems super weird to be acting entirely on non-globals, and then suddenly set a global here.
Good point, fixed.
Addressed some of @TheBlueMatt's comments.
src/coins.h
Outdated
@@ -315,6 +315,9 @@ class CCoinsView
//! Retrieve the block hash whose state this CCoinsView currently represents
virtual uint256 GetBestBlock() const;
//! Retrieve the block hash up to which changes are included
s/which changes/which some changes/?
src/validation.cpp
Outdated
if (mapBlockIndex.count(hashBest)) {
auto pindexBest = mapBlockIndex[hashBest];
if (pindexUpto->GetAncestor(pindexBest->nHeight) != pindexBest) {
return error("ReplayBlocks(): chainstate tip does not derive from final boundary");
I believe we'll hit this if we ever crash during a disconnect? Seems kinda annoying to not support disconnect.
Good catch. ReplayBlocks should learn to deal with a reorg.
src/txdb.cpp
Outdated
bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
CDBBatch batch(db);
size_t count = 0;
size_t changed = 0;
size_t batch_size = (size_t)GetArg("-dbbatchsize", nDefaultDbBatchSize) << 20;
if (!hashBlock.IsNull()) {
batch.Write(DB_BEST_BLOCK_UPTO, hashBlock);
It seems to me this API ties us to only doing batches per-block, and never across long chains of actions (or at least not across multiple reorgs). Consider the case where you disconnect A to get to B, then disconnect B to get to C, then connect D. There is no way to encode that everything from disconnecting B must be replayed to ensure there are no leftover entries from it, I believe. This is likely OK, but should likely be documented somewhere to ensure we don't end up adding a multi-reorg-flush bug later on.
That's a great point. I hadn't considered that in the case of a reorg the set of partially written changes may include things from multiple branches. It seems solvable by allowing the 'upto' blocks to be a list of tip hashes, and then at start time choose which ones to undo and which ones to replay. I think that's a problem for later, but it makes sense to have a comment about it.
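The 'list of tip hashes' idea can be modelled with a toy chainstate; the field names and record layout here are assumptions for illustration, not the patch's on-disk format:

```cpp
#include <string>
#include <vector>

// Toy model of the chainstate's head-blocks record: while a flush is in
// progress, the record holds the tips whose changes may be partially
// written (potentially from multiple branches after a reorg).
struct Chainstate {
    std::vector<std::string> head_blocks; // empty when no flush is pending
    std::string best_block;

    void BeginFlush(const std::string& old_tip, const std::string& new_tip) {
        head_blocks = {new_tip, old_tip}; // written atomically, first
    }
    void EndFlush(const std::string& new_tip) {
        head_blocks.clear();  // removed only once all batches have landed
        best_block = new_tip;
    }
    // A non-empty record at startup means we crashed mid-flush and must
    // decide, per head, whether to undo or replay its blocks.
    bool NeedsReplay() const { return !head_blocks.empty(); }
};
```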
Updated to deal with reorganizations. The disk format and recovery code can now also deal with multiple partially written branches. That functionality is not needed yet, but means we can switch to different partial flushing strategies later without breaking compatibility with older versions.
src/init.cpp
Outdated
strLoadError = _("Unable to replay blocks. You will need to rebuild the database using -reindex-chainstate.");
break;
}
pcoinsTip->SetBestBlock(pcoinsdbview->GetBestBlock()); // TODO: only initialize pcoinsTip after ReplayBlocks
I think you can do this for (almost) free now. See TheBlueMatt@747b766, though if you don't want to take it here I'll just PR it afterwards.
src/validation.cpp
Outdated
if (blockUndo.vtxundo.size() + 1 != block.vtx.size())
return error("RollbackBlock(): block and undo data inconsistent");
for (size_t i = 0; i < block.vtx.size(); ++i) {
Doesn't this need to be in reverse order (like in DisconnectBlock; maybe you should just go ahead and add an option to DisconnectBlock to ignore errors, in a previous commit, to make it easier to review)?
Good catch, fixed.
I'll try to do the merging in an extra commit.
src/validation.cpp
Outdated
@@ -1505,18 +1505,15 @@ bool ApplyTxInUndo(const CTxInUndo& undo, CCoinsViewCache& view, const COutPoint
CCoinsModifier coins = view.ModifyCoins(out.hash);
if (undo.nHeight != 0) {
// undo data contains height: this is the last output of the prevout tx being spent
if (!coins->IsPruned())
fClean = fClean && error("%s: undo data overwriting existing transaction", __func__);
if (!coins->IsPruned()) fClean = false;
Would be nice to not lose the error messages by adding a flag for printing errors. I think you need the flag either way for the next line, as I don't think you can run the Clear() if we're re-undoing a tx.
There is no change in behavior here; the error case doesn't cause a return from the function. I believe the new (and existing) code is fine: if a failure is detected, the caller (VerifyDB or DisconnectBlock) won't flush the changed view to the level below, ignoring the resulting inconsistent state.
If you insist, I'll add a flag to ignore the error messages, but (perhaps in a separate PR) I think we should get rid of these error() calls and instead return some error code. Reporting of these things doesn't belong here.
I just find the errors useful to keep around; my real concern is the Clear() one line down, which I believe is an actual bug for the new usage in RollbackBlock.
I'm not convinced the Clear() is wrong - it does mean we're passing over a state where the outputs for that TX were all fully spent. However, this is hard to reason about, and you may well be right. Furthermore, it seems that Clear() has no purpose. In the 'clean' case, the output is already pruned, so the Clear() is a no-op. In the other case it doesn't matter. I'm removing it.
Hmm, you may be right. Indeed, however, hard to reason about.
src/validation.cpp
Outdated
std::set<const CBlockIndex*, CBlockIndexWorkComparator> vpindexRollback;
for (size_t i = 1; i < pindexHeads.size(); ++i) {
const CBlockIndex *pindexHead = pindexHeads[i];
while (fGenesis ? pindexHead != nullptr : pindexHead != pindexFork) {
I'm not convinced this is right. What if you connect both A and B, in the simple case? Now you'll disconnect B before you re-connect A and then re-connect B? Is that necessary?
Note that i starts at 1 in the loop, skipping the branch that leads to the new tip. I've added a comment to clarify.
Indeed, but that only means you won't do the disconnect-then-reconnect thing for one block (which I suppose may be fine for this PR), but you will do it if you have two back-to-back blocks in the list (or am I confused?).
No. Let's say you have a chain A<-B<-C that was being flushed (meaning the old tip was A, you crashed in the middle of writing the changes for B and C, with C the intended new tip). In this case, at recovery time, GetBlockHeads() will return [C,A]. pindexFork will be A. The loop above will only process A, but because A is already the fork point, nothing is added to the disconnect set.
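The disconnect-set computation described here can be sketched with a parent-pointer toy block index; BlocksToDisconnect is a hypothetical helper, not the PR's code:

```cpp
#include <vector>

struct BlockIndex {
    const BlockIndex* parent = nullptr; // stand-in for CBlockIndex::pprev
};

// Walk an old head back to the fork point, collecting the blocks whose
// changes must be rolled back. If the head IS the fork point (the A<-B<-C
// example, where the old tip A is returned as a head), nothing is collected.
std::vector<const BlockIndex*> BlocksToDisconnect(const BlockIndex* head,
                                                  const BlockIndex* fork) {
    std::vector<const BlockIndex*> out;
    while (head != fork) {
        out.push_back(head); // newest first, matching disconnect order
        head = head->parent;
    }
    return out;
}
```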
Yes, OK, it wasn't clear to me what GetBlockHeads() should be returning there; Flush seemed to indicate something different.
Thanks for the detailed review, @TheBlueMatt.
src/txdb.cpp
Outdated
std::vector<uint256> heads = GetHeadBlocks();
// Construct a set with all existing heads, excluding the new tip.
std::set<uint256> setHeads(heads.begin(), heads.end());
if (setHeads.empty() || !tip.IsNull()) setHeads.insert(tip);
Can tip be null? I would have expected it to be at least genesis.
Tip can be null when it's the first write ever to the database.
I am surprised; I always thought the coinbase of the genesis block was not added to the database. This would mean that having it null to mean "before processing the genesis" is the same as enforcing the best block to be at least the genesis block.
src/validation.cpp
Outdated
// Find last common ancestor of all heads.
for (const uint256& hash : hashHeads) {
if (hash.IsNull()) {
I am confused about this fGenesis variable. I guess this is related to my other comment on 02dfa4a#r111875593.
I would not expect pindexFork to be null, as the fork point should be the genesis block.
I've added a lengthy comment to clarify.
This requires that we not access pcoinsTip in InitBlockIndex's FlushStateToDisk (so we just skip it until later in AppInitMain) or call LoadChainTip in LoadBlockIndex (there is already a call later in AppInitMain, after ReplayBlocks, so skipping it there is fine). Includes some simplifications by Suhas Daftuar and Pieter Wuille.
Rebased, and squashed the last two commits.
Travis where art thou
Adds new functional test, dbcrash.py, which uses -dbcrashratio to exercise the logic for recovering from a crash during chainstate flush. dbcrash.py is added to the extended tests, as it may take ~10 minutes to run.

Use _Exit() instead of exit() for crash simulation. This eliminates stderr output such as:

terminate called without an active exception

or

Assertion failed: (!pthread_mutex_destroy(&m)), function ~recursive_mutex, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 104.

Eliminating the stderr output on crash simulation allows testing with test_runner.py, which reports a test as failed if stderr is produced.
re-ACK 176c021
Tested ACK 176c021
176c021 [qa] Test non-atomic chainstate writes (Suhas Daftuar) d6af06d Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo) eaca1b7 Random db flush crash simulator (Pieter Wuille) 0580ee0 Adapt memory usage estimation for flushing (Pieter Wuille) 013a56a Non-atomic flushing using the blockchain as replay journal (Pieter Wuille) b3a279c [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille) Tree-SHA512: 47ccc62303f9075c44d2a914be75bd6969ff881a857a2ff1227f05ec7def6f4c71c46680c5a28cb150c814999526797dc05cf2701fde1369c06169f46eccddee
utACK-sans-tests once the fixes for the init-order bugs here go through in #10758.
posthumous utACK-sans-tests and modulo some of the same bugs @TheBlueMatt found and fixed in #10758. I'll review that now.
aab15d7 ReplayBlocks: use find instead of brackets operator to access the element. (furszy) e898353 [Refactoring] Use const CBlockIndex* where appropriate (random-zebra) c76fa04 qa: Extract rpc_timewait as test param (furszy) 0f832e3 shutdown: Stop threads before resetting ptrs (MarcoFalke) 67aebbf http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) e24c710 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b8f7364 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) 7d68769 rpc: further constrain the libevent workaround (Cory Fields) 75af065 rpc: work-around an upstream libevent bug (Cory Fields) 50e5833 Always return true if AppInitMain got to the end (Matt Corallo) bd70dcc [qa] Test non-atomic chainstate writes (furszy) 8f04970 Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo) 93f2b15 Random db flush crash simulator (Pieter Wuille) 72f3b17 Adapt memory usage estimation for flushing (Pieter Wuille) 8540113 Non-atomic flushing using the blockchain as replay journal (Pieter Wuille) 8d6625f [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille)

Pull request description:
> This patch adds an extra "head blocks" record to the chainstate, which gives the range of blocks for which writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during flush, and we roll back/roll forward blocks inside of it to get a consistent tip on disk before proceeding.
> If a flush completes successfully, the resulting database is compatible with previous versions. If the node crashes in the middle of a flush, a version of the code with this patch is needed for recovery.

An adaptation of the following PRs, with further modifications to the feature_dbcrash.py test to be up-to-date with upstream and solve RPC-related bugs.
* bitcoin#10148.
* Increase RPC wait time.
* bitcoin#11831
* bitcoin#11593
* bitcoin#12366
* bitcoin#13837
* bitcoin#13894

ACKs for top commit: random-zebra: ACK aab15d7 Fuzzbawls: ACK aab15d7
Tree-SHA512: 898806746f581a9eb8deb0155c558481abf4454c6f3b3c3ad505c557938d0700fe9796e98e36492286ae869378647072c3ad77ad65e9dd7075108ff96469ade1
This implements an alternative solution to the flush-time memory usage peak, suggested by @gmaxwell.
Instead of relying on using atomic batch writes in LevelDB for the chainstate, we rely on the fact that we have an external log of updates to it already (called the blockchain).
This patch adds an extra "head blocks" record to the chainstate, which gives the range of blocks for which writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during flush, and we roll back/roll forward blocks inside of it to get a consistent tip on disk before proceeding.
If a flush completes successfully, the resulting database is compatible with previous versions (down to 0.8). If the node crashes in the middle of a flush, a version of the code with this patch is needed for recovery.
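The flush sequence described above can be sketched against a toy key/value store; the key names, the crash_after_step hook, and the single-string heads encoding are illustrative assumptions, not the actual disk format:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy key/value store standing in for the chainstate LevelDB.
using KV = std::map<std::string, std::string>;

// Step 1: write the head-blocks record. Step 2: write dirty entries (the
// real code does this in ~16 MiB batches). Step 3: erase the record and
// commit the new best block. Only step boundaries are atomic; a crash
// between them leaves the record behind, signalling replay at startup.
// crash_after_step lets a caller stop mid-flush to see what a crash leaves.
void Flush(KV& db, const std::vector<std::string>& dirty_keys,
           const std::string& old_tip, const std::string& new_tip,
           int crash_after_step = 99) {
    db["head_blocks"] = new_tip + "," + old_tip;
    if (crash_after_step == 1) return;

    for (const auto& k : dirty_keys) db["coin/" + k] = "updated";
    if (crash_after_step == 2) return;

    db.erase("head_blocks");
    db["best_block"] = new_tip;
}

bool NeedsReplay(const KV& db) { return db.count("head_blocks") > 0; }
```

A completed flush leaves no trace of the record, which is why the resulting database stays readable by versions that know nothing about it.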