
Use non-atomic flushing with block replay #10148

Merged 6 commits into bitcoin:master on Jun 28, 2017

Conversation


@sipa sipa commented Apr 4, 2017

This implements an alternative solution to the flush-time memory usage peak, suggested by @gmaxwell.

Instead of relying on using atomic batch writes in LevelDB for the chainstate, we rely on the fact that we have an external log of updates to it already (called the blockchain).

This patch adds an extra "head blocks" record to the chainstate, which gives the range of blocks for which writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during a flush, and we roll back/roll forward blocks inside of it to get a consistent tip on disk before proceeding.

If a flush completes successfully, the resulting database is compatible with previous versions (down to 0.8). If the node crashes in the middle of a flush, a version of the code with this patch is needed to recover.
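The flush protocol described above can be sketched as follows. This is an illustrative in-memory model with hypothetical names (`FakeDB`, `FlushWithHeadRecord`), not the PR's actual API; the real code writes the record and batches through LevelDB via CDBWrapper:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the chainstate database.
struct FakeDB {
    std::map<std::string, std::string> data;
};

// Flush dirty entries in small batches, bracketed by a "head blocks" record.
// If we crash after writing the record but before erasing it, startup can
// detect the incomplete flush and replay blocks to reach a consistent tip.
// crash_after_batch simulates a crash after that batch (-1 = no crash).
bool FlushWithHeadRecord(FakeDB& db,
                         const std::vector<std::pair<std::string, std::string>>& dirty,
                         const std::string& old_tip, const std::string& new_tip,
                         int crash_after_batch = -1)
{
    // 1) Mark the flush as in progress: record the affected block range.
    db.data["head_blocks"] = old_tip + ":" + new_tip;
    // 2) Write dirty entries in batches (16 MiB in the real patch; 2 entries here).
    int batch_no = 0;
    for (size_t i = 0; i < dirty.size(); i += 2, ++batch_no) {
        for (size_t j = i; j < std::min(i + 2, dirty.size()); ++j)
            db.data[dirty[j].first] = dirty[j].second;
        if (batch_no == crash_after_batch) return false; // simulated crash
    }
    // 3) Flush completed: remove the marker and advance the best block.
    db.data.erase("head_blocks");
    db.data["best_block"] = new_tip;
    return true;
}

// At startup, a surviving head_blocks record means the previous flush was
// interrupted and block replay is required.
bool NeedsReplay(const FakeDB& db) { return db.data.count("head_blocks") > 0; }
```

A completed flush leaves no marker (so older versions can read the database), while an interrupted one leaves the marker, which is exactly the startup signal for replay.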


sipa commented Apr 4, 2017

This badly needs testing, but I'm not sure how to simulate crashes in the middle of flushing (I've manually verified this patch can recover from failure by introducing an exit(0) in the middle of the flush code).


laanwj commented Apr 4, 2017

Cool!

but I'm not sure how to simulate crashes in the middle of flushing

I'll get to that :)


sipa commented Apr 5, 2017

Rebased, fixed a bug, and added a commit that allows simulating crashes after partial flushes.

src/txdb.cpp Outdated
static FastRandomContext rng;
if (rng.rand32() % crash_simulate == 0) {
LogPrintf("Simulating a crash. Goodbye.");
sync();

@laanwj laanwj Apr 5, 2017


This sync should be optional at least (not a realistic crash otherwise)

Member Author


I've just removed the sync. It doesn't seem needed for testing (I've done reindexes with hundreds of crashes in between).


gmaxwell commented Apr 5, 2017

contrib/devtools/check-doc.py is unhappy that you added new arguments without asking for permission from the argument gods.

src/txdb.cpp Outdated
bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
CDBBatch batch(db);
size_t count = 0;
size_t changed = 0;
size_t batch_size = (size_t)GetArg("-dbbatchsize", nDefaultDbBatchSize) << 20;
Contributor


If we have memory usage that is some multiple of this, perhaps the argument should be in the form of the actual usage rather than the batch size?

Member Author


Well the relevant constraint is the memory usage peak from allocating the batch, which depends on the batch memory usage, not dbcache memory usage. Also, I don't think anyone will need to change this property (except for tests, where it's very useful to get much more frequent partial flushes).
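The batching behavior being discussed can be sketched like this. `Batch` and `BatchedFlush` are toy names, not the real `CDBBatch`/`CCoinsViewDB::BatchWrite`; the point is that `-dbbatchsize` appears to be given in MiB (hence the `<< 20` shift to bytes in the quoted diff), and a batch is written out whenever its estimated size crosses that threshold:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy batch that tracks an estimated serialized size, mirroring the
// size_estimate accounting in the patch (3 bytes of per-Put overhead).
struct Batch {
    size_t size_estimate = 0;
    size_t writes = 0; // how many times we flushed to disk
    std::vector<std::pair<std::string, std::string>> pending;

    void Put(const std::string& key, const std::string& value) {
        pending.push_back({key, value});
        size_estimate += 3 + key.size() + value.size();
    }
    void WriteOut() {
        if (pending.empty()) return;
        ++writes;
        pending.clear();
        size_estimate = 0;
    }
};

// Write all dirty entries, flushing whenever the batch exceeds the
// configured size. Returns the number of disk writes performed.
size_t BatchedFlush(const std::vector<std::pair<std::string, std::string>>& dirty,
                    size_t batch_size_mib)
{
    const size_t threshold = batch_size_mib << 20; // MiB -> bytes
    Batch batch;
    for (const auto& entry : dirty) {
        batch.Put(entry.first, entry.second);
        if (batch.size_estimate > threshold) batch.WriteOut();
    }
    batch.WriteOut(); // final partial batch
    return batch.writes;
}
```

This also shows why a tiny `-dbbatchsize` is useful in tests, as sipa notes: it forces many partial flushes, each of which is a potential crash point.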

src/dbwrapper.h Outdated
@@ -75,6 +83,7 @@ class CDBBatch
leveldb::Slice slValue(ssValue.data(), ssValue.size());

batch.Put(slKey, slValue);
size_estimate += 3 + slKey.size() + slValue.size();
Contributor


Can you add comments as to why 3 (and, below, 2) bytes are overhead here?

Member Author


Done
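For context on the constants being asked about: LevelDB's WriteBatch serializes each Put as a 1-byte record tag, a varint key length, the key bytes, a varint value length, and the value bytes. For lengths under 128, each varint fits in one byte, so the fixed overhead is 3 bytes per Put and 2 per Erase (which has no value). A sketch of the estimate under that small-length assumption (function names are illustrative):

```cpp
#include <cstddef>

// Per-record overhead in a LevelDB WriteBatch, assuming key/value lengths
// under 128 bytes so each varint length prefix is a single byte:
//   Put:   1 (tag) + 1 (key varint) + key + 1 (value varint) + value
//   Erase: 1 (tag) + 1 (key varint) + key
size_t EstimatePut(size_t key_size, size_t value_size) {
    return 3 + key_size + value_size;
}
size_t EstimateErase(size_t key_size) {
    return 2 + key_size;
}
```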

src/init.cpp Outdated
strLoadError = _("Unable to replay blocks. You will need to rebuild the databse using -reindex.");
break;
}
pcoinsTip->SetBestBlock(pcoinsdbview->GetBestBlock()); // TODO: only initialize pcoinsTip after ReplayBlocks
Contributor


Hmm, yes. Were you intending to do this after this PR? Can you just delete it and re-create it here? I feel like it may make sense to move the chainActive.Tip-setting from LoadBlockIndexDB to after this point.

Speaking of which, did you mean to add a PruneBlockIndexCandidates() to ReplayBlocks ala LoadBlockIndexDB?

Member Author


Were you intending to do this after this PR?

Yes, I tried doing it inside the PR, but doing it properly requires a bit more shuffling around and refactoring, which I'd prefer to keep for later.

Speaking of which, did you mean to add a PruneBlockIndexCandidates() to ReplayBlocks ala LoadBlockIndexDB?

ReplayBlocks doesn't touch the block index, so I don't think that would have any effect.

Contributor


The PruneBlockIndexCandidates in LoadBlockIndexDB uses chainActive.Tip(), so I assumed it may need to be re-run with the new tip (though likely not a bug without it, just a should-do). I'm ok with cleaning this stuff up in a followup PR, but it seems less than ideal as-is right now.

auto pindexUpto = mapBlockIndex[hashUpto];

int nHeight = 1; // Skip the genesis block
if (mapBlockIndex.count(hashBest)) {
Contributor


Should we also fail here if hashBest has been written (i.e. is non-IsNull) but isn't in mapBlockIndex?

Member Author


I think that would be caught by other code we already have, but I've added it here.

inputs.ModifyCoins(txin.prevout.hash)->Spend(txin.prevout.n);
}
}
inputs.ModifyNewCoins(tx->GetHash(), tx->IsCoinBase())->FromTx(*tx, nHeight);
Contributor


Based on the comment above ModifyNewCoins I do not believe this works; we may need something new to capture the "maybe not fresh, but definitely fully overwrite in any case" case.

Member Author


Nice catch, fixed.

ReplayBlock(block, cache, pindex->nHeight);
}
cache.SetBestBlock(hashUpto);
chainActive.SetTip(pindexUpto);
Contributor


It seems super weird to be acting entirely on non-globals, and then suddenly set a global here.

Member Author


Good point, fixed.

@sipa sipa force-pushed the non_atomic_flush branch 2 times, most recently from 22bb19a to 16fc013 Compare April 12, 2017 10:40

sipa commented Apr 12, 2017

Addressed some of @TheBlueMatt's comments.

src/coins.h Outdated
@@ -315,6 +315,9 @@ class CCoinsView
//! Retrieve the block hash whose state this CCoinsView currently represents
virtual uint256 GetBestBlock() const;

//! Retrieve the block hash up to which changes are included
Contributor


s/which changes/which some changes/?

if (mapBlockIndex.count(hashBest)) {
auto pindexBest = mapBlockIndex[hashBest];
if (pindexUpto->GetAncestor(pindexBest->nHeight) != pindexBest) {
return error("ReplayBlocks(): chainstate tip does not derive from final boundary");
Contributor


I believe we'll hit this if we ever crash during a disconnect? Seems kinda annoying to not support disconnect.

Member Author


Good catch. ReplayBlocks should learn to deal with a reorg.

src/txdb.cpp Outdated
bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
CDBBatch batch(db);
size_t count = 0;
size_t changed = 0;
size_t batch_size = (size_t)GetArg("-dbbatchsize", nDefaultDbBatchSize) << 20;
if (!hashBlock.IsNull()) {
batch.Write(DB_BEST_BLOCK_UPTO, hashBlock);
Contributor


It seems to me this API ties us to only doing batches per-block, and never across long chains of actions (or at least not across multiple reorgs). Consider the case where you disconnect A to get to B, then disconnect B to get to C, then connect D. There is no way to encode that you need to ensure everything from disconnecting B must be replayed to ensure there are no leftover entries from that, I believe. This is likely OK, but should be documented somewhere to ensure we don't end up adding a multi-reorg-flush bug later on.

Member Author


That's a great point. I hadn't considered that in the case of a reorg the set of partially written changes may include things from multiple branches. It seems solvable by allowing the 'upto' blocks to be a list of tip hashes, and then at start time choose which ones to undo and which ones to replay. I think that's a problem for later, but it makes sense to have a comment about it.

@sipa sipa force-pushed the non_atomic_flush branch 2 times, most recently from 87d5f62 to 00c29e7 Compare April 16, 2017 14:16

sipa commented Apr 16, 2017

Updated to deal with reorganizations. The disk format and recovery code can now also deal with multiple partially written branches. That functionality is not needed yet, but means we can switch to different partial flushing strategies later without breaking compatibility with older versions.

src/init.cpp Outdated
strLoadError = _("Unable to replay blocks. You will need to rebuild the database using -reindex-chainstate.");
break;
}
pcoinsTip->SetBestBlock(pcoinsdbview->GetBestBlock()); // TODO: only initialize pcoinsTip after ReplayBlocks
Contributor


I think you can do this for (almost) free now. See TheBlueMatt@747b766, though if you don't want to take it here I'll just PR it afterwards.

if (blockUndo.vtxundo.size() + 1 != block.vtx.size())
return error("RollbackBlock(): block and undo data inconsistent");

for (size_t i = 0; i < block.vtx.size(); ++i) {
Contributor


Doesn't this need to be in reverse order (like in DisconnectBlock)? Maybe you should just go ahead and add an option to DisconnectBlock to ignore errors, in a previous commit, to make it easier to review?

Member Author


Good catch, fixed.

I'll try to do the merging in an extra commit.
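The reverse-order point above can be demonstrated with a toy coin set (hypothetical `Tx`/`UndoBlock` types, not the real CBlock/CBlockUndo): a later transaction in a block may spend an output created earlier in the same block, so undoing must peel transactions off back-to-front:

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model: each tx spends one outpoint and creates one.
struct Tx {
    std::string spends;
    std::string creates;
};

void ApplyTx(std::set<std::string>& utxo, const Tx& tx) {
    utxo.erase(tx.spends);
    utxo.insert(tx.creates);
}

void UndoTx(std::set<std::string>& utxo, const Tx& tx) {
    utxo.erase(tx.creates);
    utxo.insert(tx.spends);
}

// Disconnect in reverse creation order, as DisconnectBlock does. Undoing
// forward instead would restore an output that a later (not yet undone) tx
// in the same block had spent, corrupting the coin set.
void UndoBlock(std::set<std::string>& utxo, const std::vector<Tx>& block) {
    for (size_t i = block.size(); i-- > 0; ) UndoTx(utxo, block[i]);
}
```

With a block containing tx1 (spends "a", creates "b") and tx2 (spends "b", creates "c"), reverse undo restores {"a"}; a forward loop would instead end with a stray "b" in the set.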

@@ -1505,18 +1505,15 @@ bool ApplyTxInUndo(const CTxInUndo& undo, CCoinsViewCache& view, const COutPoint
CCoinsModifier coins = view.ModifyCoins(out.hash);
if (undo.nHeight != 0) {
// undo data contains height: this is the last output of the prevout tx being spent
if (!coins->IsPruned())
fClean = fClean && error("%s: undo data overwriting existing transaction", __func__);
if (!coins->IsPruned()) fClean = false;
Contributor


Would be nice to not lose the error messages by adding a flag for printing errors. I think you need the flag either way for the next line, as I don't think you can run the Clear() if we're re-undoing a tx.

Member Author


There is no change in behavior here; the error case doesn't cause a return from the function. I believe the new (and existing) code is fine: if a failure is detected, the caller (VerifyDB or DisconnectBlock) won't flush the changes view to the level below, ignoring the resulting inconsistent state.

If you insist, I'll add a flag to ignore the error messages, but (perhaps in a separate PR) I think we should get rid of these error() calls and instead return some error code. Reporting of these things doesn't belong here.

Contributor


I just find the errors useful to keep around, my real concern is the Clear() one line down, which I believe is an actual bug for the new usage in RollbackBlock.

Member Author


I'm not convinced the Clear() is wrong - it does mean we're passing over a state where the outputs for that TX were all fully spent. However, this is hard to reason about, and you may well be right. Furthermore, it seems that Clear() has no purpose. In the 'clean' case, the output is already pruned, so the Clear() is a no-op. In the other case it doesn't matter. I'm removing it.

Contributor


Hmm, you may be right. Indeed, however, hard to reason about.

std::set<const CBlockIndex*, CBlockIndexWorkComparator> vpindexRollback;
for (size_t i = 1; i < pindexHeads.size(); ++i) {
const CBlockIndex *pindexHead = pindexHeads[i];
while (fGenesis ? pindexHead != nullptr : pindexHead != pindexFork) {
Contributor


I'm not convinced this is right. What if you connect both A and B, in the simple case? Now you'll disconnect B before you re-connect A and then re-connect B? Is that necessary?

Member Author


Note that i starts at 1 in the loop, skipping the branch that leads to the new tip. I've added a comment to clarify.

Contributor


Indeed, but that only means you won't do the disconnect-then-reconnect thing for one block (which I suppose may be fine for this PR), but you will do it if you have two back-to-back blocks in the list (or am I confused?).


@sipa sipa Apr 17, 2017


No. Let's say you have a chain A<-B<-C that was being flushed (meaning the old tip was A, you crashed in the middle of writing the changes for B and C, with C the intended new tip). In this case, at recovery time, GetBlockHeads() will return [C,A]. pindexFork will be A. The loop above will only process A, but because A is already the fork point, nothing is added to the disconnect set.
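The walk-through above can be sketched with a toy block index (an illustrative `Index` struct standing in for CBlockIndex; the PR does move a real LastCommonAncestor helper into chain.h). With heads [C, A] the fork point is A itself, so nothing is disconnected and only B and C are rolled forward:

```cpp
// Toy block index: height plus a pointer to the previous block.
struct Index {
    int height;
    const Index* prev;
};

// Walk back to the ancestor at height h (or null past genesis).
const Index* Ancestor(const Index* p, int h) {
    while (p != nullptr && p->height > h) p = p->prev;
    return p;
}

// Find the last common ancestor of two chain tips: equalize heights,
// then step both back until they meet.
const Index* LastCommonAncestor(const Index* a, const Index* b) {
    if (a->height > b->height) a = Ancestor(a, b->height);
    else if (b->height > a->height) b = Ancestor(b, a->height);
    while (a != b) { a = a->prev; b = b->prev; }
    return a;
}

// Blocks on head's branch that are not shared with the fork point; at
// recovery these are the blocks to disconnect (for stale heads) or to
// roll forward (for the intended new tip).
int BranchLength(const Index* head, const Index* fork) {
    int n = 0;
    while (head != fork) { head = head->prev; ++n; }
    return n;
}
```

For the chain genesis<-A<-B<-C, LastCommonAncestor(C, A) is A, BranchLength(A, A) is 0 (nothing to disconnect), and BranchLength(C, A) is 2 (B and C to replay), matching sipa's description.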

Contributor


Yes, OK, it wasn't clear to me what GetBlockHeads() should be returning there; Flush seemed to indicate something different.


@sipa sipa left a comment


Thanks for the detailed review, @TheBlueMatt.


@sipa sipa force-pushed the non_atomic_flush branch 2 times, most recently from 8bdc5cd to 1c41a6e Compare April 17, 2017 19:57
@sipa sipa changed the title Use non-atomic flushing with block replay [WIP] Use non-atomic flushing with block replay Apr 17, 2017
src/txdb.cpp Outdated
std::vector<uint256> heads = GetHeadBlocks();
// Construct a set with all existing heads, excluding the new tip.
std::set<uint256> setHeads(heads.begin(), heads.end());
if (setHeads.empty() || !tip.IsNull()) setHeads.insert(tip);
Contributor


Can tip be null? I would have expected it to be at least genesis.

Member Author


Tip can be null when it's the first write ever to the database.


@NicolasDorier NicolasDorier Apr 20, 2017


I am surprised; I always thought the coinbase of the genesis block was not added to the database. This would mean that having it null to mean "before processing the genesis" is the same as enforcing the best block to be at least the genesis block.


// Find last common ancestor of all heads.
for (const uint256& hash : hashHeads) {
if (hash.IsNull()) {
Contributor


I am confused about this fGenesis variable. I guess this is related to my other comment on 02dfa4a#r111875593.

I would not expect pindexFork to be null, as the fork point should be at least the genesis block.


@sipa sipa Apr 18, 2017


I've added a lengthy comment to clarify.

sipa and others added 3 commits June 26, 2017 10:45
This requires that we not access pcoinsTip in InitBlockIndex's
FlushStateToDisk (so we just skip it until later in AppInitMain)
and skip the LoadChainTip in LoadBlockIndex (there is already one
later in AppInitMain, after ReplayBlocks, so skipping it there is
fine).

Includes some simplifications by Suhas Daftuar and Pieter Wuille.

sipa commented Jun 26, 2017

Rebased, and squashed the last two commits.


sipa commented Jun 26, 2017

Travis where art thou

Adds new functional test, dbcrash.py, which uses -dbcrashratio to exercise the
logic for recovering from a crash during chainstate flush.

dbcrash.py is added to the extended tests, as it may take ~10 minutes to run

Use _Exit() instead of exit() for crash simulation

This eliminates stderr output such as:
    terminate called without an active exception
or
    Assertion failed: (!pthread_mutex_destroy(&m)), function ~recursive_mutex, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 104.

Eliminating the stderr output on crash simulation allows testing with
test_runner.py, which reports a test as failed if stderr is produced.
@sdaftuar
Member

re-ACK 176c021


laanwj commented Jun 28, 2017

Tested ACK 176c021

@laanwj laanwj merged commit 176c021 into bitcoin:master Jun 28, 2017
laanwj added a commit that referenced this pull request Jun 28, 2017
176c021 [qa] Test non-atomic chainstate writes (Suhas Daftuar)
d6af06d Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo)
eaca1b7 Random db flush crash simulator (Pieter Wuille)
0580ee0 Adapt memory usage estimation for flushing (Pieter Wuille)
013a56a Non-atomic flushing using the blockchain as replay journal (Pieter Wuille)
b3a279c [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille)

Tree-SHA512: 47ccc62303f9075c44d2a914be75bd6969ff881a857a2ff1227f05ec7def6f4c71c46680c5a28cb150c814999526797dc05cf2701fde1369c06169f46eccddee
@TheBlueMatt
Contributor

utACK-sans-tests once the fixes for the init-order bugs here go through in #10758.


morcos commented Jul 20, 2017

posthumous utACK-sans-tests and modulo some of the same bugs @TheBlueMatt found and fixed in #10758. I'll review that now.

@jnewbery jnewbery mentioned this pull request Jul 31, 2017
PastaPastaPasta pushed commits to PastaPastaPasta/dash that referenced this pull request six times between Jul 6 and Aug 6, 2019, each with the same commit list and Tree-SHA512 as above.
barrystyle pushed a commit to PACGlobalOfficial/PAC that referenced this pull request Jan 22, 2020, with the same commit list.
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Feb 21, 2021
aab15d7 ReplayBlocks: use find instead of brackets operator to access to the element. (furszy)
e898353 [Refactoring] Use const CBlockIndex* where appropriate (random-zebra)
c76fa04 qa: Extract rpc_timewait as test param (furszy)
0f832e3 shutdown: Stop threads before resetting ptrs (MarcoFalke)
67aebbf http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan)
e24c710 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan)
b8f7364 http: Join worker threads before deleting work queue (Wladimir J. van der Laan)
7d68769 rpc: further constrain the libevent workaround (Cory Fields)
75af065 rpc: work-around an upstream libevent bug (Cory Fields)
50e5833 Always return true if AppInitMain got to the end (Matt Corallo)
bd70dcc [qa] Test non-atomic chainstate writes (furszy)
8f04970 Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo)
93f2b15 Random db flush crash simulator (Pieter Wuille)
72f3b17 Adapt memory usage estimation for flushing (Pieter Wuille)
8540113 Non-atomic flushing using the blockchain as replay journal (Pieter Wuille)
8d6625f [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille)

Pull request description:

  > This patch adds an extra "head blocks" to the chainstate, which gives the range of blocks for which writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during flush, and we rollback/roll forward blocks inside of it to get a consistent tip on disk before proceeding.

  > If a flush completes successfully, the resulting database is compatible with previous versions. If the node crashes in the middle of a flush, a version of the code with this patch is needed to recover.

  An adaptation of the following PRs with further modifications to the `feature_dbcrash.py` test to be up-to-date with upstream and solve RPC related bugs.

  * bitcoin#10148.
  * Increase RPC wait time.
  * bitcoin#11831
  * bitcoin#11593
  * bitcoin#12366
  * bitcoin#13837
  * bitcoin#13894

ACKs for top commit:
  random-zebra:
    ACK aab15d7
  Fuzzbawls:
    ACK aab15d7

Tree-SHA512: 898806746f581a9eb8deb0155c558481abf4454c6f3b3c3ad505c557938d0700fe9796e98e36492286ae869378647072c3ad77ad65e9dd7075108ff96469ade1
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022