
Unbounded growth of scheduler queue #14289

Open
sipa opened this issue Sep 21, 2018 · 14 comments

@sipa
Member

commented Sep 21, 2018

In validation.cpp, both ConnectTip and DisconnectTip invoke a GetMainSignals()... callback, which schedules a future event through the CScheduler interface.

These functions are called in 3 places:

  • ActivateBestChainStep
  • InvalidateBlock
  • RewindBlockIndex

The first of these 3 prevents the scheduler queue from growing unboundedly by limiting its size to 10 in ActivateBestChain. The other two, however, do not, and @Sjors discovered that a giant invalidateblock RPC call will in fact blow up the node's memory usage (which turns out to be caused by the queued events holding all disconnected blocks in memory).
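The size-limiting behaviour described above can be sketched roughly as follows. This is a purely illustrative single-threaded model, not Bitcoin Core's actual CScheduler; all names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Hypothetical sketch: a callback queue whose producer drains it whenever
// it grows past a fixed bound, mimicking the size-10 limit described for
// the ActivateBestChain path.
class BoundedCallbackQueue {
public:
    explicit BoundedCallbackQueue(std::size_t limit) : m_limit(limit) {}

    // Schedule a callback; if the queue is over the bound, run queued
    // callbacks until it is back under the limit. NOTE: the caller must
    // not hold locks the callbacks also need, or this can deadlock.
    void Schedule(std::function<void()> cb) {
        m_queue.push(std::move(cb));
        while (m_queue.size() > m_limit) DrainOne();
    }

    std::size_t Size() const { return m_queue.size(); }

    void DrainAll() {
        while (!m_queue.empty()) DrainOne();
    }

private:
    void DrainOne() {
        auto cb = std::move(m_queue.front());
        m_queue.pop();
        cb();  // run the oldest pending callback
    }

    std::size_t m_limit;
    std::queue<std::function<void()>> m_queue;
};
```

Without such a bound at every producer (InvalidateBlock, RewindBlockIndex), each disconnected block's event simply accumulates in the queue.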

I believe several improvements are necessary.

  1. (short term symptom fix) If this issue also appears for RewindBlockIndex, we must fix it before 0.17, as it may prevent nodes running pre-0.13 code from upgrading. I think this is easy, as that function is called prior to normal node operation, so it can easily be reworked to release cs_main intermittently and drain the queue.

  2. (long term fix) The use of background queues is a means to increase parallelism, not a way to break lock dependencies (as @gmaxwell pointed out privately to me, whenever a background queue is used to break a dependency, attempts to limit its size can turn into a deadlock). One approach, I think, is to have a debug mode where the background scheduler immediately runs all scheduled actions synchronously, forcing all add-to-queue calls to happen without locks held. Once everything works in such a mode, we can safely add actual queue size limits, as opposed to ad-hoc checks to prevent the queue from growing too large.

  3. (orthogonal memory usage reduction) It's unfortunate that our DisconnectBlock events hold a shared pointer to the block being disconnected. Ideally there would be a shared layer that keeps only the last few blocks in memory, and releases and reloads them from disk when needed otherwise. This wouldn't fix the unbounded growth issue above, but it would reduce the impact. It's also independently useful for reducing memory usage during very large reorgs (where we can't run callbacks until the entire reorg completes).
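The "synchronous debug mode" idea from point 2 might look something like this. Again an illustrative sketch, not Bitcoin Core's scheduler; the class and method names are invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Hypothetical sketch: in debug mode, every scheduled action runs
// immediately at the call site instead of being queued. Any call site that
// schedules while holding a lock the action itself needs then deadlocks
// (or fails a lock assertion) right away during testing, instead of the
// dependency being hidden behind the queue.
class DebugScheduler {
public:
    explicit DebugScheduler(bool run_synchronously)
        : m_synchronous(run_synchronously) {}

    void Schedule(std::function<void()> action) {
        if (m_synchronous) {
            action();  // debug mode: run inline, the queue never grows
        } else {
            m_queue.push(std::move(action));
        }
    }

    // Normal mode: a service thread would call this in a loop.
    void RunQueued() {
        while (!m_queue.empty()) {
            auto action = std::move(m_queue.front());
            m_queue.pop();
            action();
        }
    }

    std::size_t Pending() const { return m_queue.size(); }

private:
    const bool m_synchronous;
    std::queue<std::function<void()>> m_queue;
};
```

Once all tests pass with the synchronous mode enabled, no producer can be relying on the queue to break a lock dependency, and a hard size limit becomes safe to enforce.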

@MarcoFalke MarcoFalke added this to the 0.17.0 milestone Sep 21, 2018

@MarcoFalke

Member

commented Sep 21, 2018

See also "Unbounded reorg memory usage" #9027 about resource usage of blocks kept in memory and the wallet lagging behind too much.

@TheBlueMatt

Contributor

commented Sep 22, 2018

Are you sure (1) is a regression?

The discussion in (2) isn't entirely true - if you actually fully break lockdeps (which I think is a long-term goal here, especially across validation/wallet, and like is being worked towards in #12934), that isn't an issue. It's only when you have circular deps and cs_main that it becomes a problem.

(3) has been discussed a number of times and even had a TODO for it at some point, though I can't find it anymore. Also appears to be a dup of #9027.

@Sjors

Member

commented Sep 22, 2018

Even if it's not a regression, the deeper SegWit activation is buried, the more strain is put on RewindBlockIndex. Perhaps at some depth the node should not automatically upgrade, but prompt the user whether they want to do that, or whether they prefer to delete and resync (probably faster), or even ignore SegWit until some later moment.

@gmaxwell

Member

commented Sep 22, 2018

It's a regression. At least before, a walletless, txindexless node could happily invalidate down to 0 without causing extreme memory usage. It doesn't now. The memory usage never goes down, even after returning back to the tip. 64 GB of usage is hit after invalidating only 40k blocks, too.

The discussion in (2) isn't entirely true - if you actually fully break lockdeps

Sipa wrote "whenever a background queue is used to break a dependency".

The use of this to break locking dependencies is just wrong. It should not be used to break locking dependencies. Any uses of the queues for the exclusive purpose of breaking locking dependencies should probably be removed, because by definition every one of those cases results in "unbounded" memory usage. (Technically there may be a "bound", but the resulting bound is impossible to reason about without considering the total behaviour of every lock touched by every piece of code that takes the lock(s) in question -- e.g. the entire codebase.) Effectively, proving that the memory usage isn't unbounded is equivalent to proving that multi-use contention of the lock without the unbounded queue couldn't deadlock.
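The circular-wait argument above can be made concrete with a tiny wait-for-graph model: the producer (holding the lock) waits for queue space, while the consumer draining the queue waits for the lock. This is purely illustrative; none of these names exist in Bitcoin Core:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// "X waits for Y" is a directed edge. A bounded queue used to break a
// lock dependency produces producer -> consumer (waiting for queue space)
// and consumer -> producer (waiting for the lock): a cycle, i.e. deadlock.
using WaitForGraph = std::map<std::string, std::set<std::string>>;

static bool HasCycleFrom(const WaitForGraph& g, const std::string& node,
                         std::set<std::string>& visiting) {
    if (!visiting.insert(node).second) return true;  // revisited: cycle
    auto it = g.find(node);
    if (it != g.end()) {
        for (const auto& next : it->second) {
            if (HasCycleFrom(g, next, visiting)) return true;
        }
    }
    visiting.erase(node);
    return false;
}

bool HasDeadlock(const WaitForGraph& g) {
    for (const auto& entry : g) {
        std::set<std::string> visiting;
        if (HasCycleFrom(g, entry.first, visiting)) return true;
    }
    return false;
}
```

With an unbounded queue the producer never waits, so the producer->consumer edge disappears and the cycle (but not the memory growth) goes away -- which is exactly the trade the comment above objects to.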

Also, as we've learned elsewhere the queue is actually pretty slow. So it may not be a suitable tool for improving performance through concurrency in many applications. It certainly shouldn't be deployed as such without benchmarks that show that it actually improves performance.

@Sjors

Member

commented Sep 24, 2018

I'm syncing a v0.13.0 node now on AWS (i3.2xlarge) and plan to upgrade to v0.17.0rc4 once it's near or at the tip, to see how memory behaves. (update: syncing it again because I turned it off without realizing the storage is ephemeral)

@Sjors

Member

commented Sep 25, 2018

After syncing v0.13 I upgraded to v0.17.0rc. It first upgraded the UTXO database as expected and then started rolling back blocks as expected.

Memory usage blows up... I have a copy of the v0.13.0 database, so I'll try with 0.16 later and can try fixes.

Cache 1.5 GB, RAM 12.9 GB at height 533K:
[screenshot]

Cache 2.4 GB, RAM 20 GB at height 527K:
[screenshot]

It doesn't stop gracefully either:
[screenshot]

@Sjors

Member

commented Sep 26, 2018

Using a backup of the fully synced v0.13.0 database, I tried v0.16.3. It also first upgraded the UTXO database as expected and then started rolling back blocks as expected.

Cache 1.4 GB, RAM 11.2 GB at height 533K:
[screenshot]

@Sjors

Member

commented Sep 26, 2018

v0.15.2 does not have this bug. It rolled back to 533K with only 3.2 GB of RAM usage (it loaded the full UTXO database during the database upgrade). It's still pretty slow, so I don't think automatic rollback is a great user experience.

@MarcoFalke

Member

commented Sep 26, 2018

Using a backup of the fully synced v0.13.0 database, I tried v0.16.3. It also first upgraded the UTXO database as expected and then started rolling back blocks as expected.
Cache 1.4 GB, RAM 11.2 GB at height 533K:

Is that with wallet and txindex disabled?

@Sjors

Member

commented Sep 26, 2018

No wallet, no indexes.

@MarcoFalke

Member

commented Sep 26, 2018

Which weakly indicates that this is not a regression in 0.17.0.

@Sjors

Member

commented Sep 26, 2018

@MarcoFalke it's a regression in 0.16, based on the above tests. See also IRC.

@MarcoFalke

Member

commented Sep 30, 2018

Moved to 0.17.1, as this is documented as a known issue in the release notes: https://github.com/bitcoin-core/bitcoin-devwiki/wiki/0.17.0-Release-notes/_history

@laanwj

Member

commented Mar 4, 2019

It's not realistic to solve this entirely before the 0.18 release, and also not required as this is not a regression for 0.18—so I'm moving the milestone. Any fixes in this regard should of course be backported to the 0.18 branch.

@laanwj laanwj modified the milestones: 0.18.0, 0.19.0 Mar 4, 2019

laanwj added a commit that referenced this issue Mar 7, 2019

Merge #15402: Granular invalidateblock and RewindBlockIndex
519b0bc Make last disconnected block BLOCK_FAILED_VALID, even when aborted (Pieter Wuille)
8d22041 Optimization: don't add txn back to mempool after 10 invalidates (Pieter Wuille)
9ce9c37 Prevent callback overruns in InvalidateBlock and RewindBlockIndex (Pieter Wuille)
9bb32eb Release cs_main during InvalidateBlock iterations (Pieter Wuille)
9b1ff5c Call InvalidateBlock without cs_main held (Pieter Wuille)
241b2c7 Make RewindBlockIndex interruptible (Pieter Wuille)
880ce7d Call RewindBlockIndex without cs_main held (Pieter Wuille)
436f7d7 Release cs_main during RewindBlockIndex operation (Pieter Wuille)
1d34287 Merge the disconnection and erasing loops in RewindBlockIndex (Pieter Wuille)
32b2696 Move erasure of non-active blocks to a separate loop in RewindBlockIndex (Pieter Wuille)
9d6dcc5 Abstract EraseBlockData out of RewindBlockIndex (Pieter Wuille)

Pull request description:

  This PR makes a number of improvements to the InvalidateBlock (`invalidateblock` RPC) and RewindBlockIndex functions, primarily around breaking up their long-term cs_main holding. In addition:
  * They're made safely interruptible (`bitcoind` can be shut down, and no progress in either will be lost, though if incomplete, `invalidateblock` won't continue after restart and will need to be called again)
  * The validation queue is prevented from overflowing (meaning `invalidateblock` on a very old block will not drive bitcoind OOM) (see #14289).
  * `invalidateblock` won't bother to move transactions back into the mempool after 10 blocks (optimization).

  This is not an optimal solution, as we're relying on the scheduler call sites to make sure the scheduler doesn't overflow. Ideally, the scheduler would guarantee this directly, but that needs a few further changes (moving the signal emissions out of cs_main) to prevent deadlocks.
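  The call-site pattern this describes -- release cs_main, then drain the validation queue if it has grown too long before processing the next block -- might be sketched like this (ValidationQueue and LimitQueueSize are illustrative names, not the actual Bitcoin Core API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Hypothetical stand-in for the validation callback queue.
struct ValidationQueue {
    std::queue<std::function<void()>> pending;

    std::size_t CallbacksPending() const { return pending.size(); }

    void Drain() {
        while (!pending.empty()) {
            auto cb = std::move(pending.front());
            pending.pop();
            cb();
        }
    }
};

// Call with cs_main NOT held: if the queue is too long, synchronously
// drain it before the caller re-takes cs_main for the next iteration.
void LimitQueueSize(ValidationQueue& q, std::size_t max_pending = 10) {
    if (q.CallbacksPending() > max_pending) q.Drain();
}
```

  Because the drain happens without cs_main held, the consumer can always make progress, which is what makes the bound safe here where a naive in-lock bound would deadlock.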

  I have manually tested the `invalidateblock` changes (including interrupting, and running with -checkblockindex and -checkmempool), but haven't tried the rewinding (which is probably becoming increasingly unnecessary, as very few pre-0.13.1 nodes remain that would care to upgrade).

Tree-SHA512: 692e42758bd3d3efc2eb701984a8cb5db25fbeee32e7575df0183a00d0c2c30fdf72ce64c7625c32ad8c8bdc56313da72a7471658faeb0d39eefe39c4b8b8474