
Prune more aggressively during IBD #12404

Closed

Conversation

@Sjors (Member) commented Feb 10, 2018

Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.

During IBD we now prune based on the worst-case size of the remaining blocks, but no further than the minimum prune size of 550 MB.

Using MAX_BLOCK_SERIALIZED_SIZE is complete overkill on testnet and usually too high on mainnet. It doesn't take the SegWit activation block into account either. This causes the node to be pruned further than strictly needed after IBD, and it also makes testing more difficult. One improvement could be to use a moving average of actual block sizes, or a hard-coded educated guess, but there's something to be said for keeping this simple.
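
Roughly, the intent is the following (a minimal sketch only, not the actual patch; the IBD flag, the header-height plumbing and names such as EffectivePruneTarget and MIN_PRUNE_TARGET are illustrative assumptions):

// Sketch: while in IBD, prune down far enough that the blocks we still expect
// to download fit under the prune target even in the worst case, so pruning
// (and the chainstate flush it forces) only has to happen rarely.
#include <cstdint>

static const uint64_t MAX_BLOCK_SERIALIZED_SIZE = 4000000;     // as in consensus/consensus.h
static const uint64_t MIN_PRUNE_TARGET = 550ull * 1024 * 1024; // never prune below ~550 MiB

uint64_t EffectivePruneTarget(uint64_t nPruneTarget, bool fIsIBD,
                              int nTipHeight, int nBestHeaderHeight)
{
    if (!fIsIBD) return nPruneTarget;
    // Worst case remaining block space: every block left to download is a full block.
    uint64_t nRemainingBlocks = nBestHeaderHeight > nTipHeight ? uint64_t(nBestHeaderHeight - nTipHeight) : 0;
    uint64_t nWorstCaseRemaining = nRemainingBlocks * MAX_BLOCK_SERIALIZED_SIZE;
    // Reserve that space now, but never prune below the 550 MiB minimum.
    if (nPruneTarget <= nWorstCaseRemaining + MIN_PRUNE_TARGET) return MIN_PRUNE_TARGET;
    return nPruneTarget - nWorstCaseRemaining;
}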

@Sjors (Member, Author) commented Feb 10, 2018

@fanquake this probably also needs the "Block storage" label.

@Sjors force-pushed the 2018/02/ibd_prune_extra branch 2 times, most recently from c3eea61 to a9e8bb4 on February 10, 2018 15:03
@esotericnonsense (Contributor) commented Feb 19, 2018

Untested ACK, would kill off #11658 and #11359.

I personally don't think the details of how much we over- or under-prune here are that important, given that the long-term solution is to fix the cache so that it doesn't require a complete flush. Basically any change here will speed up IBD with pruning by a large amount.

@Sjors force-pushed the 2018/02/ibd_prune_extra branch 3 times, most recently from d49bab4 to 86bef23 on February 19, 2018 11:37
@@ -3570,6 +3570,9 @@ static void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfte

unsigned int nLastBlockWeCanPrune = chainActive.Tip()->nHeight - MIN_BLOCKS_TO_KEEP;
uint64_t nCurrentUsage = CalculateCurrentUsage();
// Worst case remaining block space:
Review comment (Member):

Seems like this ought to be using best case...?

Reply (Member, Author):

Best case would be empty blocks. That would lead to a lot of flushes.

@morcos (Member) commented Mar 5, 2018

ACK 86bef23

@eklitzke (Contributor):

utACK 86bef23e6550cdcf989ae6ac22dbbc45bbf613e4

@Sjors (Member, Author) commented Mar 12, 2018

Rebased due to release notes change.

@Sjors (Member, Author) commented Mar 26, 2018

Rebased due to release notes change.

@Sjors (Member, Author) commented Mar 26, 2018

p2p_leak.py failure on Travis seems a bit random (and passes on my local machine)...

@eklitzke (Contributor):

utACK 82efbf1e8ac67ad9d04cba9b64cb79ece86209f8

@luke-jr (Member) commented Mar 31, 2018

Before merging, please remove the name and PR reference from the commit message, so it doesn't ping us every time someone adds it to their random fork.

@bitcoin deleted a comment from neverstopthegrind1 on Apr 1, 2018
@Sjors (Member, Author) commented Apr 3, 2018

@luke-jr will do. Should I also remove it from the PR description, since that also ends up in the merge commit message? Or do those merge commits rarely make it into upstream work because commits are cherry-picked?

@Sjors (Member, Author) commented Apr 3, 2018

Done. Also: rebased for release notes.

@Sjors (Member, Author) commented May 15, 2018

Rebased so I can do some benchmarking.

@Sjors (Member, Author) commented May 20, 2018

I've been racing AWS instances for the past few days, using master, #11658 (rebased on master) and this PR. I use a t2.micro with 1 vCPU, 1 GiB RAM and 20 GB storage. I set prune to 10 GB, dbcache=300 and maxmempool=5.

After 72 hours master is currently at block 341909, @luke-jr's branch is at 364905 and mine is at 360719.

I enabled T2 Unlimited to prevent CPU throttling, although it doesn't seem to be CPU bound beyond the first 100-200K blocks (that will change after the assumed valid block):
[screenshot, 2018-05-20]

I tried higher values for dbcache but that led to out of memory crashes (sometimes during a cache flush) and once even to a machine freezing. I didn't try adding swap to prevent these crashes; I'm not sure how to manage that properly, i.e. in a way that too much swap usage doesn't end up cancelling the benefits of these caches.

I'll leave them running for a bit. So far it seems clear that merging either of these PRs would be quite helpful on low-end machines, but which one is better is less clear. It probably depends on the choices for dbcache and prune, and my guess is that machines with more RAM would benefit from pruning as aggressively as possible to minimize the number of cache flushes (but beyond ~8 GB of RAM it wouldn't matter, because it would never flush).

I just started three t2.medium instances with 2 vCPU, using dbcache=3000.

@Sjors (Member, Author) commented May 20, 2018

To clarify, is IsInitialBlockDownload() something that only happens once in the lifetime of a node, or is this also true if it needs to do a large catch-up? If the latter, there is a case to be made for conservative pruning (or putting aggressive pruning behind a config flag).

When you run something like c-lightning against a pruned bitcoind node, it's constantly processing blocks as they come in. A large prune event could mess with this process if for some reason the other application isn't completely caught up. This is less of a problem for the initial sync if that other application doesn't care about pre-history. E.g. c-lightning doesn't need history before its wallet creation block, so the trick there is to wait to launch it until bitcoind finishes IBD, and then keep them both running.

But there may be other applications that need to track some sort of index all the way through IBD where it's important they don't lose sync.

@Sjors (Member, Author) commented May 21, 2018

After a little under 24 hours the t2.medium instances:

  • master: 458269
  • 10% pruning: 471894
  • this PR: 396051

Notice how this PR so far seems to perform worse than master (on this instance and with these settings; it's still better than master on the t2.micro instance). I'll keep an eye on it. Maybe it has something to do with the large dbcache? Because of the more frequent prune events, the dbcache doesn't grow as much on master and the 10% prune branch. See IRC. Paging @eklitzke.

It would be nice to have a script that parses debug.log and spits out a CSV file with block height and cache size. Scrolling through the log I notice that on master the cache mostly stays below 200 MB, on the 10% pruning branch it stays below 1 GB and usually under 500 MB, whereas in this PR it grows up to 2 GB.

@Sjors (Member, Author) commented May 21, 2018

echo "height, cache" > cache.csv
cat ~/.bitcoin/debug.log | grep UpdateTip | awk -F'[ =M]'  '{print $7", " $19 }' >> cache.csv
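# (assumption: with the 0.16-era UpdateTip log format, splitting on ' ', '=' and 'M' puts the block height in field 7 and the cache size in MiB in field 19)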

[plot: block height vs. cache size, 2018-05-21]

I'll update the plots later.

Source data and Thunderplot file: plot.zip

@Sjors (Member, Author) commented May 25, 2018

This extracts block height, cache size and a unix timestamp from the log:

cat prune300_master.log | grep UpdateTip | gawk -F'[ =M]'  '{print $7", " $19", " gsub(/[-T:Z]/," ") ", " mktime($1 " " $2 " " $3 " " $4 " " $5 " " $6)  }' >> prune300_master.csv
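# (same field assumptions as above; gsub() strips the -T:Z separators from the log timestamp so mktime($1..$6) can build a unix epoch; its return value, the substitution count, comes along as an extra column)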

IBD duration with dbcache=300MB:

[plot: IBD duration, dbcache=300]

The vertical axis is in days. They ran for more than a week but didn't finish. The 10% prune strategy (green line) was the fastest, master the slowest.

Cache usage:

[plot: cache usage, dbcache=300]

Both strategies use more cache than master, but don't differ much for such a small dbcache.

IBD duration with dbcache=3000MB:
[plot: IBD duration, dbcache=3000]

The 10% pruning strategy (green) was the only one that finished IBD before I gave up. This PR (red line) is dramatically slower than even master.

Cache usage:
[plot: cache usage, dbcache=3000]

Both strategies use more cache than master. Aggressive pruning uses way more cache, but for some reason that seems to make things worse.

@n1bor commented May 31, 2018

FYI, on AWS I've been running a node with 4 GB RAM and sc1 disks (very cheap: $0.025 per GB-month) with txindex on, and it keeps up fine (i.e. it does not use its burst allowance). It's used by a lightning node, so it gets a reasonable number of RPC requests. sc1 is useless for IBD, but you can use SSD initially and then switch to sc1 with the click of a button once IBD is done!

@Sjors (Member, Author) commented May 31, 2018

@n1bor for my own project on AWS, I also use the strategy of doing IBD on a fast machine (i3.2xlarge). Anything with > 10 GB RAM to prevent the cache from flushing and a disk big enough to avoid pruning (doesn't have to be SSD). The bigger disk is ephemeral and goes away after you downgrade to a regular instance type, so I do a manual prune at the end and copy the data to the persistent disk.

I'll look into Cold HDD (sc1) Volumes for that project, because c-lightning doesn't fully love pruned nodes yet, but that's off topic...

But you don't have that luxury on a physical machine like the various [Raspberry / Orange / Nano] Pi's out there. So it would be quite useful if IBD performed better on constrained machines. Also, even at that price point pruning still saves $5 a month in storage at the current chain size.

@Sjors (Member, Author) commented Jun 9, 2018

Zooming in a little bit, this branch started dramatically slowing down compared to master around block 375000, which is around the September 2015 UTXO set explosion:

[chart: UTXO set size, showing the September 2015 explosion]

Perhaps the performance of read or write operations involving CCoinsCacheEntry entries with the DIRTY flag dramatically decreases when the cache is larger than ~1 GB? That would explain why higher-frequency pruning, which generally keeps the cache below 300 MB in that period, doesn't slow down. At the same time it would explain why 10 GB of cache, where all entries are FRESH, doesn't slow down either. Does that even seem plausible?

I could test this by deliberately interrupting IBD on master on a non-pruned node at roughly the same blocks where this branch flushed the cache: 244000, 298000, 332000, 355000, 373000, 388000, 401000, 414000, 424000, 436000, 446000, 455000 (the last two being the range where master starts slowing down).

@Sjors (Member, Author) commented Jun 10, 2018

@n1bor commented Jun 10, 2018

@Sjors not sure if you ever saw this: https://gist.github.com/n1bor/d5b0330a9addb0bf5e0f869518883522
Feels to me that time spent on IBD for pruned nodes would be better spent on a chainstate-only-download type of solution. A factor of 50x speed-up. But it needs a softfork, so maybe not!

@sipa (Member) commented Jun 10, 2018

@n1bor That seems orthogonal. Synchronizing from chainstate is a very interesting idea, but it's also a completely different security model (trusting that the historical chain has correct commitments rather than computing things yourself).

@n1bor commented Jun 10, 2018

My take is we have, in order of "goodness":

  1. Full Node
  2. Pruned Full Node
  3. Chain-State Downloaded Full Node with soft-fork to commit chainstate to headers. (what my post was about)
  4. SPV
  5. Web-Wallets

Currently Core only offers 1 & 2.

I just think that if Core offered 3, it would reduce the number of users relying on web wallets/SPV, which can only be a good thing.

@sipa (Member) commented Jun 10, 2018

I agree that would be a good thing, but it in no way changes the fact that we should have a performant implementation for those who do not want to rely on trusting such hypothetical commitments (which this issue is about).

Also, this is not the place to discuss changes to the Bitcoin protocol.

@Sjors (Member, Author) commented Jun 11, 2018

I launched two new t2.medium nodes on AWS, running Ubuntu 16.04, 2 vCPUs (uncapped), 4 GB RAM, no swap. I set prune=10000, dbcache=3000 and maxmempool=5 on both, like I did earlier. The blue lines are current master, the orange line is this PR rebased on master.

Again, this branch slows down dramatically quite early on; this time I captured some metrics:

[graphs: cpu, network, blue_is_master]

There are prune events at 15:28 (block 244388, cache 929 MB), 15:38 (297332, cache 1024 MB), 15:48 (331446, cache 1235 MB), 15:58 (354941, cache 1279 MB) and 16:11 (373100, cache 2951 MB). The last two are right before and after the network activity drops.

Note how this branch has dramatically more read throughput.

I'll try spinning up a node with dbcache=1000

I ran the same configuration on my iMac (which has 4 CPUs and a USB 3.1 Gen 2 external SSD) and didn't get any noticeable performance difference between these two branches (2 hours 20 minutes to run from block 360,000 to 480,000).

@Sjors (Member, Author) commented Jun 11, 2018

I don't see LogPrint(BCLog::PRUNE, "Prune: target=%dMiB actual=%dMiB diff=%dMiB max_prune_height=%d removed %d blk/rev pairs\n", ...) appear in the logs, not even for master. That category isn't disabled by default, is it?

Trying to figure out what could explain the extra disk read activity. Does anything related to pruning happen in a separate thread that we don't wait for (before the next UpdateTip can happen)?

@Sjors (Member, Author) commented Jun 12, 2018

Running this branch with dbcache=1000 doesn't cause the same high read disk activity:
[graph: disk activity with dbcache=1000]

It's still running, so I don't know if it's faster than master or the 10% prune strategy, but at least it doesn't suffer a slowdown like it did with dbcache=3000.

@Sjors (Member, Author) commented Jun 14, 2018

The thick line shows this PR with dbcache set to 1000. It no longer shows the performance hit you see with dbcache=3000 and it's faster than master, but not necessarily faster than the 10% pruning strategy.

[plot: IBD progress comparison, 2018-06-14]

Closing this in favor of #11658, since the benefit seems small and an unexplained massive performance hit needs... explaining :-)

@Sjors closed this Jun 14, 2018
@ajtowns (Contributor) commented Jul 12, 2018

FWIW, one effect I'm seeing that might cause the difference between dbcache 3000 vs 1000 is that when the cache is flushed, it takes a little while (and presumably 3x as long with 3x as large a dbcache), during which the block download queues pretty much empty, and then after the cache is flushed, the queues take a while to even out and get back up to the same download speed.

@bitcoin locked as resolved and limited conversation to collaborators on Sep 8, 2021