Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block ActivateBestChain to empty validationinterface queue #11824

Merged
merged 7 commits into from Dec 29, 2017

Conversation

TheBlueMatt
Copy link
Contributor

This should fix #11822.

It ended up bigger than I hoped for, but its not too gnarly. Note that "
Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.


CBlockIndex *pindexMostWork = nullptr;
CBlockIndex *pindexNewTip = nullptr;
int nStopAtHeight = gArgs.GetArg("-stopatheight", DEFAULT_STOPATHEIGHT);
do {
boost::this_thread::interruption_point();

if (GetMainSignals().CallbacksPending() > 10) {
Copy link
Contributor

@dcousens dcousens Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be parameterized? Or at least a configurable constant?
Is there any advantage to tweaking it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During normal (read: non-IBD/reindex) usage, this should virtually never reach 10. During IBD/reindex, your memory is probably better speint in more dbcache than storing blocks you recently connected to eventually give to your wallet. I dont think more knobs is helpful here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11824 (comment)

In commit "Block ActivateBestChain to empty validationinterface queue"

During normal (read: non-IBD/reindex) usage, this should virtually never reach 10. During IBD/reindex, your memory is probably better speint in more dbcache than storing blocks you recently connected to eventually give to your wallet.

Please write this in a code comment. (How is anybody supposed to know this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to introduce a named constant for this instead.

Copy link
Member

@achow101 achow101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK-ish

This definitely is fixing the problem I was seeing, although I have one question below.


const CInv &inv = *it;
{
while (it != pfrom->vRecvGetData.end() && (it->type == MSG_TX || it->type == MSG_WITNESS_TX)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that this expects to see transactions first and then blocks. But can't a getdata message have the types in any order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are allowed to (in fact always have) not finish the full queue before returning, and we do for blocks (always did). A peer shouldn't be able to make us spin giving them 100 blocks before processing any other peers' requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now. I didn't see the break for when the type was a block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11824 (comment)

In commit "Block ActivateBestChain to empty validationinterface queue"

Please add a code comment here saying this, I had exactly the same question as achow.

@achow101
Copy link
Member

achow101 commented Dec 5, 2017

tACK 3113573c847c76943c065b72d3e7a9edc643fed1

Copy link
Contributor

@dcousens dcousens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once-over ACK

@jamesob
Copy link
Member

jamesob commented Dec 5, 2017

tACK 3113573

I've been running -reindex for over twelve hours and haven't seen an OOM:

job@ali:~$ sudo egrep "Kill.*bitcoind" /var/log/syslog || echo "cool"
cool

@jamesob
Copy link
Member

jamesob commented Dec 5, 2017

Just hit another OOM, but also realized I was on an outdated version of this branch (HEAD was 151b17da1 Block ActivateBestChain to empty validationinterface queue). @TheBlueMatt do you recall what changed between 151b17da1 and current HEAD and if it would explain the difference between another OOM or not?

In any case, I'm pulling down the tip of this branch and starting another -reindex.

@TheBlueMatt
Copy link
Contributor Author

@jamesob No, the only differences there were in test_bitcoin.cpp, so that should have had no affect.

@achow101
Copy link
Member

achow101 commented Dec 5, 2017

I haven't seen any OOMs and my node is almost fully reindexed. Granted it did go down sometime last night as it ran out of disk space. It's been running for ~6 hours now and was running for ~14 hours before then.

@TheBlueMatt
Copy link
Contributor Author

Well this definitely fixes a bug...just did a reindex to 150k with 0.15, 0.15.1 master and this branch, peak memory usage on this branch and 0.15 and 0.15.1 were all about the same, master was 3x higher...I'll look more into our memory footprint and see what else is hiding.

@jamesob
Copy link
Member

jamesob commented Dec 6, 2017

tACK (again) 3113573

Definitely hitting fewer OOMs reindexing on this branch than I was a week ago. :)

@jamesob
Copy link
Member

jamesob commented Dec 8, 2017

My bitcoind process has been running for 37 hours on this branch; memory usage has held steady through and after a partial reindex.

@TheBlueMatt
Copy link
Contributor Author

Rebased.

@laanwj laanwj added this to the 0.16.0 milestone Dec 14, 2017
This should (marginally) speed up validationinterface queue
draining by avoiding a cs_main lock in one client.
@TheBlueMatt
Copy link
Contributor Author

Rebased.

@jonasschnelli
Copy link
Contributor

Haven't looked at the code, but can confirm that a -reindex OOM/leak issue (#11822) was fixed once I merged in that PR.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK da8903142f807be9295797c0e65f228230ffd467. Locking changes look good, and nice to break up ProcessGetData.


const CInv &inv = *it;
{
while (it != pfrom->vRecvGetData.end() && (it->type == MSG_TX || it->type == MSG_WITNESS_TX)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11824 (comment)

In commit "Block ActivateBestChain to empty validationinterface queue"

Please add a code comment here saying this, I had exactly the same question as achow.

@@ -1038,180 +1038,193 @@ static void RelayAddress(const CAddress& addr, bool fReachable, CConnman* connma
connman->ForEachNodeThen(std::move(sortfunc), std::move(pushfunc));
}

void static ProcessGetData(CNode* pfrom, const Consensus::Params& consensusParams, CConnman* connman, const std::atomic<bool>& interruptMsgProc)
// returns true if we need to call ActivateBestChain before responding
bool static ProcessGetBlockData(CNode* pfrom, const Consensus::Params& consensusParams, const CInv& inv, CConnman* connman, const std::atomic<bool>& interruptMsgProc, bool best_chain_active)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Avoid cs_main in net_processing ActivateBestChain calls"

This change would be easier to understand if ProcessGetBlockData refactoring happened in a different commit than than the cs_main ActivateBestChain unlock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I think.

@@ -1038,180 +1038,193 @@ static void RelayAddress(const CAddress& addr, bool fReachable, CConnman* connma
connman->ForEachNodeThen(std::move(sortfunc), std::move(pushfunc));
}

void static ProcessGetData(CNode* pfrom, const Consensus::Params& consensusParams, CConnman* connman, const std::atomic<bool>& interruptMsgProc)
// returns true if we need to call ActivateBestChain before responding
bool static ProcessGetBlockData(CNode* pfrom, const Consensus::Params& consensusParams, const CInv& inv, CConnman* connman, const std::atomic<bool>& interruptMsgProc, bool best_chain_active)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Avoid cs_main in net_processing ActivateBestChain calls"

best_chain_active argument is unused, should remove or use or explain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, was a bug, fixed in the refactor.

// before ActivateBestChain but after AcceptBlock).
// In this case, we need to run ActivateBestChain prior to checking the relay
// conditions below.
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Avoid cs_main in net_processing ActivateBestChain calls"

This is pretty convoluted. If the point is just to release cs_main before calling ActivateBestChain, why not just use LEAVE_CRITICAL_SECTION / ENTER_CRITICAL_SECTION to release it? Is the problem just lack of a reverse_lock RAII helper?

More importantly, if the point is to release cs_main before calling ActivateBestChain, you really need to say this in a code comment, because without looking through code history, I don't think this would be clear at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, refactored a bit, though LEAVE_CRITICAL_SECTION also isnt sufficient as you need to do the mapBlockIndex lookup again, which I went ahead and did.

BOOST_CHECK(it != wallet->mapWallet.end());
blocktx = CMutableTransaction(*it->second.tx);
}
CreateAndProcessBlock({CMutableTransaction(blocktx)}, GetScriptForRawPubKey(coinbaseKey.GetPubKey()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Require no cs_main lock for ProcessNewBlock/ActivateBestChain"

Maybe simplify this using mapWallet.at() instead of .find() and BOOST_CHECK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


CBlockIndex *pindexMostWork = nullptr;
CBlockIndex *pindexNewTip = nullptr;
int nStopAtHeight = gArgs.GetArg("-stopatheight", DEFAULT_STOPATHEIGHT);
do {
boost::this_thread::interruption_point();

if (GetMainSignals().CallbacksPending() > 10) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11824 (comment)

In commit "Block ActivateBestChain to empty validationinterface queue"

During normal (read: non-IBD/reindex) usage, this should virtually never reach 10. During IBD/reindex, your memory is probably better speint in more dbcache than storing blocks you recently connected to eventually give to your wallet.

Please write this in a code comment. (How is anybody supposed to know this?)

CallFunctionInValidationInterfaceQueue([&promise] {
promise.set_value();
});
promise.get_future().wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Block ActivateBestChain to empty validationinterface queue"

Could add a CValidationInterface::WaitForPendingCallbacks() method to wrap this up since this is also needed in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, in a new commit.

@Damballahwedo
Copy link

Damballahwedo commented Dec 24, 2017 via email

@TheBlueMatt TheBlueMatt force-pushed the 2017-12-11822-debug branch 3 times, most recently from 13473b6 to af5a003 Compare December 24, 2017 18:31
Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK af5a003d3cfdd3f3a26dfd7c2db5dde347b4ab3c (with a few non-blocking nits)

@@ -218,6 +217,8 @@ BOOST_AUTO_TEST_CASE(CreateNewBlock_validity)
for (unsigned int i = 0; i < sizeof(blockinfo)/sizeof(*blockinfo); ++i)
{
CBlock *pblock = &pblocktemplate->block; // pointer for convenience
{
LOCK(cs_main);
pblock->nVersion = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation?

@@ -627,10 +627,15 @@ class ListCoinsTestingSetup : public TestChain100Setup
BOOST_CHECK(wallet->CreateTransaction({recipient}, wtx, reservekey, fee, changePos, error, dummy));
CValidationState state;
BOOST_CHECK(wallet->CommitTransaction(wtx, reservekey, nullptr, state));
CMutableTransaction blocktx;
{
LOCK(wallet->cs_wallet);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation?


CBlockIndex *pindexMostWork = nullptr;
CBlockIndex *pindexNewTip = nullptr;
int nStopAtHeight = gArgs.GetArg("-stopatheight", DEFAULT_STOPATHEIGHT);
do {
boost::this_thread::interruption_point();

if (GetMainSignals().CallbacksPending() > 10) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to introduce a named constant for this instead.

* });
* promise.get_future().wait();
*/
void BlockOnValidationInterfaceQueueDrain();
Copy link
Member

@sipa sipa Dec 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, it's not actually waiting until a drain, as new events can be added to the queue before the callback added in this function is executed.

It's more a synchronization point (guaranteeing that at least all pre-existing events are executed first).

@sipa
Copy link
Member

sipa commented Dec 29, 2017

utACK 97d2b09 (and also some experimental evidence that this indeed removes the runaway memory usage).

Confirmed only indentation/naming changes since af5a003.

@sipa sipa merged commit 97d2b09 into bitcoin:master Dec 29, 2017
sipa added a commit that referenced this pull request Dec 29, 2017
97d2b09 Add helper to wait for validation interface queue to catch up (Matt Corallo)
3613749 Block ActivateBestChain to empty validationinterface queue (Matt Corallo)
5a933ce Add an interface to get the queue depth out of CValidationInterface (Matt Corallo)
a99b76f Require no cs_main lock for ProcessNewBlock/ActivateBestChain (Matt Corallo)
a734896 Avoid cs_main in net_processing ActivateBestChain calls (Matt Corallo)
66aa1d5 Refactor ProcessGetData in anticipation of avoiding cs_main for ABC (Matt Corallo)
818075a Create new mutex for orphans, no cs_main in PLV::BlockConnected (Matt Corallo)

Pull request description:

  This should fix #11822.

  It ended up bigger than I hoped for, but its not too gnarly. Note that "
  Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.

Tree-SHA512: 1127688545926f6099449dca6a4e6609eefc3abbd72f1c66e03d32bd8c7b31e82097d8307822cfd1dec0321703579cfdd82069cab6e17b1024e75eac694122cb
@Sjors
Copy link
Member

Sjors commented Dec 29, 2017

I'm about to try this as well, will let you know if I still see extreme memory use during a reindex.

UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Mar 25, 2020
…face queue

97d2b09 Add helper to wait for validation interface queue to catch up (Matt Corallo)
3613749 Block ActivateBestChain to empty validationinterface queue (Matt Corallo)
5a933ce Add an interface to get the queue depth out of CValidationInterface (Matt Corallo)
a99b76f Require no cs_main lock for ProcessNewBlock/ActivateBestChain (Matt Corallo)
a734896 Avoid cs_main in net_processing ActivateBestChain calls (Matt Corallo)
66aa1d5 Refactor ProcessGetData in anticipation of avoiding cs_main for ABC (Matt Corallo)
818075a Create new mutex for orphans, no cs_main in PLV::BlockConnected (Matt Corallo)

Pull request description:

  This should fix bitcoin#11822.

  It ended up bigger than I hoped for, but its not too gnarly. Note that "
  Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.

Tree-SHA512: 1127688545926f6099449dca6a4e6609eefc3abbd72f1c66e03d32bd8c7b31e82097d8307822cfd1dec0321703579cfdd82069cab6e17b1024e75eac694122cb
codablock pushed a commit to codablock/dash that referenced this pull request Mar 25, 2020
…face queue

97d2b09 Add helper to wait for validation interface queue to catch up (Matt Corallo)
3613749 Block ActivateBestChain to empty validationinterface queue (Matt Corallo)
5a933ce Add an interface to get the queue depth out of CValidationInterface (Matt Corallo)
a99b76f Require no cs_main lock for ProcessNewBlock/ActivateBestChain (Matt Corallo)
a734896 Avoid cs_main in net_processing ActivateBestChain calls (Matt Corallo)
66aa1d5 Refactor ProcessGetData in anticipation of avoiding cs_main for ABC (Matt Corallo)
818075a Create new mutex for orphans, no cs_main in PLV::BlockConnected (Matt Corallo)

Pull request description:

  This should fix bitcoin#11822.

  It ended up bigger than I hoped for, but its not too gnarly. Note that "
  Require no cs_main lock for ProcessNewBlock/ActivateBestChain" is mostly pure code-movement.

Tree-SHA512: 1127688545926f6099449dca6a4e6609eefc3abbd72f1c66e03d32bd8c7b31e82097d8307822cfd1dec0321703579cfdd82069cab6e17b1024e75eac694122cb
codablock added a commit to codablock/dash that referenced this pull request Mar 25, 2020
codablock added a commit to dashpay/dash that referenced this pull request Mar 26, 2020
Merge remaining bits of bitcoin#11824: Block ActivateBestChain to empty validationinterface queue
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Feb 26, 2021
…ead (without cs_main)

3d11027 wallet:CreateCoinStake, solving IsSpent() missing cs_main lock. (furszy)
046386b IsNoteSaplingChange: Add missing cs_wallet lock. (furszy)
ded2e8e feature_dbcrash.py using blockmaxsize instead of blockmaxweight that we currently don't support. (furszy)
00cc6ec dumpwallet: Add missing BlockUntilSyncedToCurrentChain (furszy)
bfd9a15 test: sapling_fillblock.py sync mempool every 200 transactions instead of only at the end. (furszy)
53497f0 Validation: DisconnectTip doesn't need to force a flush to disk. (furszy)
f8cd371 [Miner] Sync wallet state before try to solve proof of stake. (furszy)
3ace13b qa: Fix some tests to work on native windows (furszy)
65cf7e1 don't attempt mempool entry for wallet transactions on startup if already in mempool (instagibbs)
756d0fa Handle rename failure in DumpMempool(...) by using RenameOver(...) return value (practicalswift)
1423dba [bugfix] save feeDelta instead of priorityDelta in DumpMempool (Alex Morcos)
d97ace9 [Test] notes_double_spend: sync wallet before check balances. (furszy)
1ed753f Fix wallet_tests.cpp, missing fInMempool flag set. (furszy)
815667d unit test framework: missing scheduler service loop start added. (furszy)
de3c7ae fix wallet_upgrade.py test, wasn't counting the coinbase script. (furszy)
e6770c8 fixing invalid wallet_dump.py, generated PoW blocks use a P2PKH coinbase script that now is properly marked as used inside the wallet. (furszy)
4ed7024 fix invalid numbers in wallet_labels.py (furszy)
b9249c5 Miner: generate RPC, fix coinbase script key not marked as used (furszy)
296c956 wallet: guard null m_last_block_processed (furszy)
0dfebf4 sapling_rpc_wallet_tests: remove unneeded cs_main and cs_wallet locks. (furszy)
c3a281c fix mempool_persist.py dump issue, missing sync with validation interface. (furszy)
67c754a qa: Sync with validationinterface queue in sync_mempools (MarcoFalke)
596056c [validation] Do not check for double spent zerocoins. (furszy)
0c4642c Add helper to wait for validation interface queue to catch up (Matt Corallo)
cc91d44 Block ActivateBestChain to empty validationinterface queue (Matt Corallo)
0c68e2f Add an interface to get the queue depth out of CValidationInterface (Matt Corallo)
31c7974 Decouple block processing cs_main lock from the rest of inv get data requests (furszy)
da7c0f7 Refactor ProcessGetData avoiding to lock cs_main for its entire time. (furszy)
10efbe5 net_processing: making PushTierTwoGetDataRequest return a bool in case of pushing the message. (furszy)
51dea23 net_processing move-only: decouple tier two get data request into its own function. (furszy)
1c9fe10 RPC: listunspent remove redundant wallet check (furszy)
4d927b0 Add a dev notes document describing the new wallet RPC blocking (Matt Corallo)
5f521fd Give ZMQ consistent order with UpdatedBlockTip on scheduler thread (Matt Corallo)
7d05997 Fix wallet RPC race by waiting for callbacks in sendrawtransaction (Matt Corallo)
c7ab490 Also call other wallet notify callbacks in scheduler thread (Matt Corallo)
31a8790 Use callbacks to cache whether wallet transactions are in mempool (Matt Corallo)
f6df6e4 Add calls to CWallet::BlockUntilSyncedToCurrentChain() in RPCs (furszy)
24a3ce4 Add CWallet::BlockUntilSyncedToCurrentChain() (Matt Corallo)
40ed4c4 Add CallFunctionInQueue to wait on validation interface queue drain (Matt Corallo)
268be9c Call TransactionRemovedFromMempool in the CScheduler thread (Matt Corallo)
1fa0d70 Add a CValidationInterface::TransactionRemovedFromMempool (Matt Corallo)

Pull request description:

  Concluding with the validation <--> wallet asynchronous signal processing work started in #2082, #2118, #2150, #2192, #2195.

  Effectively moving every validation interface callback to a background thread without locking `cs_main` for its entire process (each handler can now request `cs_main` lock only when/if they need it).

  This has a direct performance improvement on the synchronization time (which i haven't measured yet because there is one/two more PRs over the wallet and GUI areas, probably large as well, on top of this one and #2201 that should boost the sync time a lot more).

  Containing the following changes:

  * Adaptations coming from bitcoin#10286.
  * Adaptations coming from bitcoin#11824 (this one is different for us, take the base idea when you review it). Essentially solves a severe memory leak introduced previously in 10286 and improves `cs_main` lock acquisitions as well.
  * net_processing: decouple and refactor tier two inv data request processing.
  * bitcoin#12206

ACKs for top commit:
  random-zebra:
    Great job! 🍻 ACK 3d11027
  Fuzzbawls:
    ACK 3d11027

Tree-SHA512: 60a25604fb8a3ad0553ccb074aed99c1b3c6f8a765b40c1b43f25412373cbd2a9e4f0f413d45cf694bd62e48512c936099ffb7a0d23a1b97576cb33283ca05ac
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Severe memory leak on current master