
validation: Warm coins cache during prevalidation to connect blocks faster #19271

Closed
andrewtoth wants to merge 5 commits from the warm-coinscache branch

Conversation

@andrewtoth (Contributor) commented Jun 14, 2020

Retrieving coins from disk is a significant source of ConnectBlock latency. This PR increases ConnectBlock speed by retrieving coins from disk on a separate thread before ConnectBlock is reached. When a block is passed into ProcessNewBlock, it is immediately warmed before prevalidation checks begin.

Benchmarking IBD with -prune=10000 and default -dbcache from blocks 600000-633000 gives a 7.1% increase in speed. Since this is only really useful when blocks take a long time to prevalidate, benchmarks from genesis only give a 2.3% increase. However, this should keep increasing as more large blocks get mined, so it will be even more useful in the future. Of course this will be less useful with high -dbcache values.
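
In rough outline, the warming step amounts to something like the following (a minimal sketch with simplified names, not the literal patch; the actual PR runs this on a dedicated thread with the cancellation and locking discussed below):

```cpp
// Minimal sketch of the warming step (illustrative, not the literal patch).
// Touching each prevout via AccessCoin() forces the coin to be read from disk
// into the in-memory cache, so ConnectBlock later finds it already cached.
// Assumes the usual Bitcoin Core headers (primitives/block.h, coins.h).
static void WarmCoins(const CBlock& block, CCoinsViewCache& cache)
{
    for (const CTransactionRef& tx : block.vtx) {
        if (tx->IsCoinBase()) continue;     // coinbase inputs have no prevouts to fetch
        for (const CTxIn& txin : tx->vin) {
            cache.AccessCoin(txin.prevout); // loads from the backing view (disk) if not cached
        }
    }
}
```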

@andrewtoth (Contributor, Author)

Big thanks to @JeremyRubin for giving me this idea.

@JeremyRubin (Contributor)

Concept ACK and lite CR-ACK, the approach seems to make sense.

Very nice work! Did this help with the variance from before?

I don't want to go overkill but I wonder if there's a nicer way to communicate the start warming/stop warming instructions (e.g. not just tweaking flags so there's some sort of interface). I also wonder if it makes sense to just always completely finish warming the cache or not? Aborting warming early still means the work has to happen in serial (at some point) with the main thread. Is there ever a circumstance where the warmer gets the cache so full that it evicts its own warmed entries? Can we detect this and just stop then?

@DrahtBot (Contributor) commented Jun 14, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@naumenkogs (Member)

Concept ACK.
I think speeding up connecting blocks is always a good idea.

I see you're introducing a new thread. That might be an issue. Last time I tried that it didn't work out well: see the discussion here and an issue I created here.
Maybe you want to express your opinion there or during the IRC meeting, as it's likely that you'll face the same issue with this PR.

@andrewtoth (Contributor, Author)

> Did this help with the variance from before?

@JeremyRubin my personal benchmarking bandwidth is rather limited. I'm trying to get another machine set up to do more extensive benchmarking on this. My numbers were done on a 12 virtual core CPU with internal SSD. I think this might have a wide variance of performance increase depending on CPU and disk types. However, the results I had were very promising so I submitted this for concept approval. I will come back with some more numbers soon(ish).

> I don't want to go overkill but I wonder if there's a nicer way to communicate the start warming/stop warming instructions (e.g. not just tweaking flags so there's some sort of interface).

I thought I had done a pretty good job of this by wrapping it in WarmBlock/CancelWarmingBlock methods. I'll think some more on this. I'm open to suggestions.

> I also wonder if it makes sense to just always completely finish warming the cache or not? Aborting warming early still means the work has to happen in serial (at some point) with the main thread.

My benchmarks showed this to consistently hurt performance. I also tried locking/unlocking inside ConnectBlock only where the coins were actually accessed, but that was also less performant and would have made this diff a lot more complex.

> Is there ever a circumstance where the warmer gets the cache so full that it evicts its own warmed entries? Can we detect this and just stop then?

I don't think this can occur. The cache is flushed only after ConnectBlock in ConnectTip, so the effect is the same whether the coins are warmed in ConnectBlock or right before it.

@andrewtoth (Contributor, Author)

@naumenkogs I don't think your issue applies here. The work this thread needs to do is on the order of milliseconds, but it also needs to be immediately available for there to be any benefit. If it had to wait some milliseconds for scheduling it would be useless.

@JeremyRubin (Contributor) left a comment

Yeah, you have done a decent job with it!

Concretely it would be my preference to have a separate object for the warming thread (which is owned by the chainstate) so that the inner fields of the warming logic are encapsulated away from the chainstate. That way it's more explicit what the coin warmer can access, and its state variables can only be accessed through the interface. I would then also hide the variables from the CoinsTip function and access them through an interface. This is just a style preference so feel free to ignore -- some people also feel that objects are too heavyweight, but in particular I feel that an object to own/manage a thread is a good abstraction as it helps make clear which fields are protected by m_cs_warm_block (also check out some of the locking annotations used elsewhere in the code; they would help with this patch).

src/validation.cpp — inline review comment (outdated, resolved)
@luke-jr (Member) commented Jun 14, 2020

Concept ACK

@jnewbery (Contributor)

This is the kind of change that immediately scares me. It's adding complexity in a consensus-critical area, without great benefit to the critical metrics:

  • IBD: a 2.3% speedup is nice, but not worth the complexity/risk
  • Block propagation: I doubt this will have any impact since we relay compact blocks before calling ActivateBestChain(), which is where the coins are needed.

So I'd tend to concept NACK this kind of change. It seems to go in the opposite direction from what we'd want, which is to make validation-critical code as simple as possible.

@andrewtoth (Contributor, Author) commented Jun 17, 2020

@jnewbery I can definitely appreciate that perspective! However, I think concentrating on the 2.3% speedup is a pessimistic take on this patch. It gives a 7.1% speedup for recent blocks, which are what we are likely to continue to see in the future. While a user only has to perform IBD from genesis once, they will have to sync recent blocks every time they turn on the node. I think optimizing the common user flow of using the node, then turning off for days/weeks/months until it's used again will prove to be a big benefit. I also think I can find a way to speed up IBD from genesis somewhat, but am still experimenting and will get more concrete numbers together. As I said in the description, this will only become more beneficial as the block chain grows.

I also don't think this is that invasive of a patch. It's not changing any logic with any of the actual validation code. It is just looping through the inputs of a block before the actual validation occurs. This seems like an easy win for a bump of 7%.

You are correct that this doesn't improve block propagation. I didn't claim that it did. However, block propagation is only the first step for a block to be useful to a user; it still has to be connected after it is propagated. Increasing propagation to miners isn't as useful if it still takes a long time to validate the block afterwards. Miners (and all users, really) want to have the block connected as soon as possible. This can reduce forks by allowing miners to start mining on top of the block faster, and improve miner revenue by more quickly determining the optimal block template to mine on.

@jnewbery (Contributor)

> I think optimizing the common user flow of using the node, then turning off for days/weeks/months until it's used again will prove to be a big benefit.

This isn't a usage pattern that we should be optimizing for:

  • it happens very infrequently (at most once every few days/weeks/months if the user really does stop-start their node frequently)
  • I'd argue that for this scenario, a 7.1% speedup isn't that meaningful: if a node takes one hour to resync from cold with this PR, I'm probably OK waiting one hour and four minutes without it.

I've had a quick read through the implementation and I don't see how this can work. Your ThreadWarmCoinsCache thread is taking cs_main, grabbing a pointer to the coins tip, then dropping cs_main and continuing to write to the cache object. Meanwhile, the message handler thread or any other thread could also be accessing that same cache and reading/writing to it. Surely cs_main needs to be held for as long as you have the coins view cache?
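
Schematically, the pattern being questioned is something like this (a paraphrase of the description above, not the literal diff; names and signatures are illustrative):

```cpp
// Paraphrase of the concern, not the actual code in this branch: a pointer to
// the shared cache is taken under cs_main, cs_main is released, and the warmer
// keeps writing to the cache. This is only safe if every other accessor of the
// cache is kept away by some other mechanism while warming is in progress.
void CChainState::ThreadWarmCoinsCache(const std::shared_ptr<const CBlock>& block)
{
    CCoinsViewCache* cache{nullptr};
    {
        LOCK(cs_main);
        cache = &CoinsTip();                 // pointer grabbed under cs_main...
    }
    for (const CTransactionRef& tx : block->vtx) {
        if (tx->IsCoinBase()) continue;
        for (const CTxIn& txin : tx->vin) {
            cache->AccessCoin(txin.prevout); // ...but writes continue after cs_main is dropped
        }
    }
}
```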

@andrewtoth (Contributor, Author) commented Jun 17, 2020

> Meanwhile, the message handler thread or any other thread could also be accessing that same cache and reading/writing to it. Surely cs_main needs to be held for as long as you have the coins view cache?

@jnewbery any other thread accessing CoinsTip must hold cs_main first. Calling CoinsTip will cancel the current block warming and acquire the warming lock, and warming of another block can't start until cs_main is taken again. I can make this clearer with refactoring, as per the comment here.
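
A sketch of that interplay, assuming the m_cs_warm_block mutex and CancelWarmingBlock method mentioned elsewhere in this thread (simplified; the real diff may differ):

```cpp
// Simplified sketch of the behaviour described above, not the literal diff.
// Callers of CoinsTip() must already hold cs_main. Cancelling the warmer and
// then taking the warming mutex guarantees the warming thread is no longer
// touching the cache before the caller uses it; the warmer also cannot start
// on a new block without taking cs_main itself.
CCoinsViewCache& CChainState::CoinsTip()
{
    AssertLockHeld(cs_main);
    CancelWarmingBlock();      // ask the warming thread to stop early
    LOCK(m_cs_warm_block);     // wait until it has actually released the cache
    return *m_coins_views->m_cacheview; // upstream member name; may differ in this branch
}
```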

@JeremyRubin (Contributor)

I think the approach can be made clearly safe. While out of the propagation path for compact blocks, this is still in the path for mining, so there is a motivating factor IMO.

Repeated uncontended locking is sufficiently cheap that I'd be curious to see the performance difference if the warming thread just tries to grab the lock before every read/write and receives an atomic flag from the main thread to quit warming at a certain point. This would make the algorithm more 'obviously correct' and I think would have negligible performance impact as the lock won't be contended.
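
That handshake could look roughly like this (illustrative names only, not code from this branch; it assumes Bitcoin Core's Mutex/LOCK from sync.h):

```cpp
// Illustrative only. The warmer takes the (normally uncontended) mutex around
// every cache access and checks an atomic stop flag; the main thread reclaims
// the cache by setting the flag and taking the mutex once.
#include <atomic>

Mutex g_warm_mutex;
std::atomic<bool> g_stop_warming{false};

void WarmUnderLock(const CBlock& block, CCoinsViewCache& cache)
{
    for (const CTransactionRef& tx : block.vtx) {
        if (tx->IsCoinBase()) continue;
        for (const CTxIn& txin : tx->vin) {
            if (g_stop_warming.load(std::memory_order_relaxed)) return;
            LOCK(g_warm_mutex);                 // cheap while uncontended
            cache.AccessCoin(txin.prevout);     // each access happens under the lock
        }
    }
}

void StopWarmingAndTakeCache()
{
    g_stop_warming = true; // warmer exits at its next flag check
    LOCK(g_warm_mutex);    // once acquired, the warmer is guaranteed off the cache
}
```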

I'd also be curious if we can start the warming even earlier. E.g., when we receive a compact block we can immediately warm coins we can reconstruct. But that's a lot of complexity for a marginal gain over the current approach.

@andrewtoth (Contributor, Author) commented Jun 22, 2020

I had originally been benchmarking by connecting to a single peer on the local network, so the node processes and connects each block serially. When connecting to multiple remote peers, the node will process new blocks in parallel and then connect them all later. When benchmarking a true IBD with multiple remote peers, the node gets a lot more time to warm the blocks as they come in.

Doing a full IBD with -prune=10000 -stopatheight=635000 on the same machine as before, I get an increase in speed of 39% (548 minutes in master, 337 minutes in this branch) and total block connect time speed increase of 7% (14563.66s (22.93ms/blk) in master, 13571.83s (21.37ms/blk) in this branch). Total time warming coins was 4294.83s.

Doing a full IBD to block 635000 with no pruning on an older quad core machine with SSD, I get a speed increase in total time of 12% (28 hours in master, 24.75 hours in this branch) and total connect time speed increase of 16% (67425.83s (106.18ms/blk) in master, 56600.08s (89.13ms/blk) in this branch). Total time warming coins was 22229.61s.

Of course there is a lot of variance due to network latency and quality of peers, and I will continue benchmarking with different configurations. However, I think these numbers warrant adding some more complexity to the validation logic as well as a new dedicated thread.

This also means that what I said earlier, that warmed coins can't be evicted before block connection, is not true. The cache can be filled and the coins evicted before the block is connected. I'll work on updating this so it won't happen.

I'm still working on refactoring the locking interface, but I wanted to share my findings. I also pushed a small update to cancel warming a block when a new block is processed before the previous one is connected, so it will work efficiently with multiple remote peers in case anyone else wants to try benchmarking this.

@JeremyRubin (Contributor)

@andrewtoth I wanted to check in here and see how your progress is going? Anything in particular you feel stuck on?

@andrewtoth andrewtoth force-pushed the warm-coinscache branch 2 times, most recently from 9483525 to f2d161b Compare July 18, 2020 22:07
@andrewtoth (Contributor, Author)

@JeremyRubin Thanks for checking in, as well as for all your help and encouragement!

I've pushed my latest. I've taken your suggestion and moved the thread handling to a new class BlockWarmer, and added an instance of it as a member variable of CChainState. Also added locking annotations. I think it makes it a lot more clear what is being changed with the validation logic. I can also write unit tests for it now.
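
For reference, an interface along those lines might look roughly like this (an illustrative sketch only; the class in this branch may be shaped differently, and the lock annotations assume Bitcoin Core's threadsafety.h macros and sync.h types):

```cpp
// Illustrative sketch of a BlockWarmer-style interface, not the code in this
// branch. All warming state lives behind one object owned by CChainState, and
// the annotations document which fields m_cs_warm_block protects.
#include <atomic>
#include <memory>
#include <thread>

class BlockWarmer
{
public:
    explicit BlockWarmer(CCoinsViewCache& cache) : m_cache(cache) {}

    //! Start pre-fetching the inputs of `block` on the background thread.
    void WarmBlock(std::shared_ptr<const CBlock> block) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

    //! Stop any in-progress warming and wait until the cache is no longer being touched.
    void CancelWarmingBlock();

private:
    void ThreadWarmCoinsCache(); // background loop: AccessCoin() each prevout under m_cs_warm_block

    CCoinsViewCache& m_cache;
    Mutex m_cs_warm_block;
    std::atomic<bool> m_cancel{false};
    std::shared_ptr<const CBlock> m_block GUARDED_BY(m_cs_warm_block);
    std::thread m_thread;
};
```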

As for what I'm feeling stuck on: it's difficult to get a consistent benchmark on this. It's also very time consuming, since I don't have a bunch of machines to keep running this on. It's consistently faster doing IBD, but the speedup varies widely, from above 20% down to low single digits. I think I might have to set up something on the local network to pretend to be multiple peers. I'm also thinking I should write a benchmark for the time to connect a block after receiving it in ProcessNewBlock. I think I need to demonstrate objectively that there is a speedup here worth the added complexity.

Also, since we download blocks out of order, I'm not sure whether we could warm a block that won't be processed until after the next flush, in which case the warming work would be wasted. Not sure if it's possible to detect that.

@andrewtoth (Contributor, Author)

Looks like I have to figure out some locking issues as well.

@andrewtoth andrewtoth marked this pull request as draft July 18, 2020 23:08
@andrewtoth (Contributor, Author)

Converted to draft until I figure out all the issues.

@andrewtoth (Contributor, Author)

Unfortunately I don't think I'll be able to keep working on this for the near future. Going to close for now.

@andrewtoth andrewtoth closed this Aug 7, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

7 participants