
validation: Warm coins cache during prevalidation to connect blocks faster #19271

Closed
andrewtoth wants to merge 5 commits from the warm-coinscache branch

Conversation

@andrewtoth (Contributor) commented Jun 14, 2020

Retrieving coins from disk is a significant source of ConnectBlock latency. This PR increases ConnectBlock speed by retrieving coins from disk on a separate thread before ConnectBlock is reached. When a block is passed into ProcessNewBlock, it is immediately warmed before prevalidation checks begin.

Benchmarking IBD with -prune=10000 and default -dbcache from blocks 600000-633000 gives a 7.1% increase in speed. Since this is only really useful when blocks take a long time to prevalidate, benchmarks from genesis only give a 2.3% increase. However, this should keep increasing as more large blocks get mined, so it will be even more useful in the future. Of course this will be less useful with high -dbcache values.
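
In rough outline, the warming step amounts to something like the following (a minimal sketch with simplified names, not the literal patch; the actual PR runs this on a dedicated thread with the cancellation and locking discussed below):

```cpp
// Minimal sketch of the warming step (illustrative, not the literal patch).
// Touching each prevout via AccessCoin() forces the coin to be read from disk
// into the in-memory cache, so ConnectBlock later finds it already cached.
// Assumes the usual Bitcoin Core headers (primitives/block.h, coins.h).
static void WarmCoins(const CBlock& block, CCoinsViewCache& cache)
{
    for (const CTransactionRef& tx : block.vtx) {
        if (tx->IsCoinBase()) continue;     // coinbase inputs have no prevouts to fetch
        for (const CTxIn& txin : tx->vin) {
            cache.AccessCoin(txin.prevout); // loads from the backing view (disk) if not cached
        }
    }
}
```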

@andrewtoth (Contributor, Author)

Big thanks to @JeremyRubin for giving me this idea.

@JeremyRubin (Contributor)

Concept ACK and lite CR-ACK, the approach seems to make sense.

Very nice work! Did this help with the variance from before?

I don't want to go overkill but I wonder if there's a nicer way to communicate the start warming/stop warming instructions (e.g. not just tweaking flags so there's some sort of interface). I also wonder if it makes sense to just always completely finish warming the cache or not? Aborting warming early still means the work has to happen in serial (at some point) with the main thread. Is there ever a circumstance where the warmer gets the cache so full that it evicts its own warmed entries? Can we detect this and just stop then?

@DrahtBot (Contributor) commented Jun 14, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@naumenkogs (Member)

Concept ACK.
I think speeding up connecting blocks is always a good idea.

I see you're introducing a new thread. That might be an issue. Last time I tried that it didn't work out well: see the discussion here and an issue I created here.
Maybe you want to express your opinion there or during the IRC meeting, as it's likely that you'll face the same issue with this PR.

@andrewtoth (Contributor, Author)

> Did this help with the variance from before?

@JeremyRubin my personal benchmarking bandwidth is rather limited. I'm trying to get another machine set up to do more extensive benchmarking on this. My numbers were done on a 12 virtual core CPU with internal SSD. I think this might have a wide variance of performance increase depending on CPU and disk types. However, the results I had were very promising so I submitted this for concept approval. I will come back with some more numbers soon(ish).

> I don't want to go overkill but I wonder if there's a nicer way to communicate the start warming/stop warming instructions (e.g. not just tweaking flags so there's some sort of interface).

I thought I had done a pretty good job of this by wrapping it in WarmBlock/CancelWarmingBlock methods. I'll think some more on this. I'm open to suggestions.

> I also wonder if it makes sense to just always completely finish warming the cache or not? Aborting warming early still means the work has to happen in serial (at some point) with the main thread.

My benchmarks showed this to consistently hurt performance. I also tried locking/unlocking inside ConnectBlock only where the coins were actually accessed, but that was also less performant and would have made this diff a lot more complex.

> Is there ever a circumstance where the warmer gets the cache so full that it evicts its own warmed entries? Can we detect this and just stop then?

I don't think this can occur. The cache is flushed only after ConnectBlock in ConnectTip, so the effect is the same whether the coins are warmed in ConnectBlock or right before it.

@andrewtoth (Contributor, Author)

@naumenkogs I don't think your issue applies here. The work this thread needs to do is on the order of milliseconds, but it also needs to be immediately available for there to be any benefit. If it had to wait some milliseconds for scheduling it would be useless.

@JeremyRubin (Contributor) left a comment

Yeah, you have done a decent job with it!

Concretely it would be my preference to have a separate object for the warming thread (which is owned by the chainstate) so that the inner fields of the warming logic are encapsulated away from the chainstate. That way it's more explicit what the coin warmer can access, and its state variables can only be accessed through the interface. I would then also hide the variables from the CoinsTip function and access them through an interface. This is just a style preference so feel free to ignore -- some people also feel that objects are too heavyweight, but in particular I feel that an object to own/manage a thread is a good abstraction as it helps make clear which fields are protected by m_cs_warm_block (also check out some of the locking annotations used elsewhere in the code; they would help with this patch).

src/validation.cpp — inline review comment (outdated, resolved)
@luke-jr (Member) commented Jun 14, 2020

Concept ACK

@jnewbery (Contributor)

This is the kind of change that immediately scares me. It's adding complexity in a consensus-critical area, without great benefit to the critical metrics:

  • IBD: a 2.3% speedup is nice, but not worth the complexity/risk
  • Block propagation: I doubt this will have any impact since we relay compact blocks before calling ActivateBestChain(), which is where the coins are needed.

So I'd tend to concept NACK this kind of change. It seems to go in the opposite direction from what we'd want, which is to make validation-critical code as simple as possible.

@andrewtoth (Contributor, Author) commented Jun 17, 2020

@jnewbery I can definitely appreciate that perspective! However, I think concentrating on the 2.3% speedup is a pessimistic take on this patch. It gives a 7.1% speedup for recent blocks, which are what we are likely to continue to see in the future. While a user only has to perform IBD from genesis once, they will have to sync recent blocks every time they turn on the node. I think optimizing the common user flow of using the node, then turning off for days/weeks/months until it's used again will prove to be a big benefit. I also think I can find a way to speed up IBD from genesis somewhat, but am still experimenting and will get more concrete numbers together. As I said in the description, this will only become more beneficial as the block chain grows.

I also don't think this is that invasive of a patch. It's not changing any logic with any of the actual validation code. It is just looping through the inputs of a block before the actual validation occurs. This seems like an easy win for a bump of 7%.

You are correct that this doesn't improve block propagation. I didn't claim that it did. However, block propagation is only the first step for a block to be useful to a user; it still has to be connected after it is propagated. Increasing propagation to miners isn't as useful if it still takes a long time to validate the block afterwards. Miners (and all users, really) want to have the block connected as soon as possible. This can reduce forks by allowing miners to start mining on top of the block faster, and improve miner revenue by more quickly determining the optimal block template to mine on.

@jnewbery (Contributor)

> I think optimizing the common user flow of using the node, then turning off for days/weeks/months until it's used again will prove to be a big benefit.

This isn't a usage pattern that we should be optimizing for:

  • it happens very infrequently (at most once every few days/weeks/months if the user really does stop-start their node frequently)
  • I'd argue that for this scenario, a 7.1% speedup isn't that meaningful: if a node takes one hour to resync from cold with this PR, I'm probably OK waiting one hour and four minutes without it.

I've had a quick read through the implementation and I don't see how this can work. Your ThreadWarmCoinsCache thread is taking cs_main, grabbing a pointer to the coins tip, then dropping cs_main and continuing to write to the cache object. Meanwhile, the message handler thread or any other thread could also be accessing that same cache and reading/writing to it. Surely cs_main needs to be held for as long as you have the coins view cache?
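
Schematically, the pattern being questioned is something like this (a paraphrase of the description above, not the literal diff; names and signatures are illustrative):

```cpp
// Paraphrase of the concern, not the actual code in this branch: a pointer to
// the shared cache is taken under cs_main, cs_main is released, and the warmer
// keeps writing to the cache. This is only safe if every other accessor of the
// cache is kept away by some other mechanism while warming is in progress.
void CChainState::ThreadWarmCoinsCache(const std::shared_ptr<const CBlock>& block)
{
    CCoinsViewCache* cache{nullptr};
    {
        LOCK(cs_main);
        cache = &CoinsTip();                 // pointer grabbed under cs_main...
    }
    for (const CTransactionRef& tx : block->vtx) {
        if (tx->IsCoinBase()) continue;
        for (const CTxIn& txin : tx->vin) {
            cache->AccessCoin(txin.prevout); // ...but writes continue after cs_main is dropped
        }
    }
}
```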

@andrewtoth (Contributor, Author) commented Jun 17, 2020

> Meanwhile, the message handler thread or any other thread could also be accessing that same cache and reading/writing to it. Surely cs_main needs to be held for as long as you have the coins view cache?

@jnewbery any other thread accessing CoinsTip must hold cs_main first. Calling CoinsTip will cancel the current block warming and acquire the warming lock, and warming of another block can't start until cs_main is taken again. I can make this clearer with refactoring, as per the comment here.
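
A sketch of that interplay, assuming the m_cs_warm_block mutex and CancelWarmingBlock method mentioned elsewhere in this thread (simplified; the real diff may differ):

```cpp
// Simplified sketch of the behaviour described above, not the literal diff.
// Callers of CoinsTip() must already hold cs_main. Cancelling the warmer and
// then taking the warming mutex guarantees the warming thread is no longer
// touching the cache before the caller uses it; the warmer also cannot start
// on a new block without taking cs_main itself.
CCoinsViewCache& CChainState::CoinsTip()
{
    AssertLockHeld(cs_main);
    CancelWarmingBlock();      // ask the warming thread to stop early
    LOCK(m_cs_warm_block);     // wait until it has actually released the cache
    return *m_coins_views->m_cacheview; // upstream member name; may differ in this branch
}
```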

@JeremyRubin (Contributor)

I think the approach can be made clearly safe. While out of the propagation path for compact blocks, this is still in the path for mining, so there is a motivating factor IMO.

Repeated uncontended locking is sufficiently cheap that I'd be curious to see the performance difference if the warming thread just tries to grab the lock before every read/write and receives an atomic flag from the main thread to quit warming at a certain point. This would make the algorithm more 'obviously correct' and I think would have negligible performance impact as the lock won't be contended.
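
That handshake could look roughly like this (illustrative names only, not code from this branch; it assumes Bitcoin Core's Mutex/LOCK from sync.h):

```cpp
// Illustrative only. The warmer takes the (normally uncontended) mutex around
// every cache access and checks an atomic stop flag; the main thread reclaims
// the cache by setting the flag and taking the mutex once.
#include <atomic>

Mutex g_warm_mutex;
std::atomic<bool> g_stop_warming{false};

void WarmUnderLock(const CBlock& block, CCoinsViewCache& cache)
{
    for (const CTransactionRef& tx : block.vtx) {
        if (tx->IsCoinBase()) continue;
        for (const CTxIn& txin : tx->vin) {
            if (g_stop_warming.load(std::memory_order_relaxed)) return;
            LOCK(g_warm_mutex);                 // cheap while uncontended
            cache.AccessCoin(txin.prevout);     // each access happens under the lock
        }
    }
}

void StopWarmingAndTakeCache()
{
    g_stop_warming = true; // warmer exits at its next flag check
    LOCK(g_warm_mutex);    // once acquired, the warmer is guaranteed off the cache
}
```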

I'd also be curious if we can start the warming even earlier. E.g., when we receive a compact block we can immediately warm coins we can reconstruct. But that's a lot of complexity for a marginal gain over the current approach.

@andrewtoth (Contributor, Author) commented Jun 22, 2020

I had originally been benchmarking by connecting to a single peer on the local network, so the node processes and connects each block serially. When connecting to multiple remote peers, the node will process new blocks in parallel and then connect them all later. When benchmarking a true IBD with multiple remote peers, the node gets a lot more time to warm the blocks as they come in.

Doing a full IBD with -prune=10000 -stopatheight=635000 on the same machine as before, I get an increase in speed of 39% (548 minutes in master, 337 minutes in this branch) and total block connect time speed increase of 7% (14563.66s (22.93ms/blk) in master, 13571.83s (21.37ms/blk) in this branch). Total time warming coins was 4294.83s.

Doing a full IBD to block 635000 with no pruning on an older quad core machine with SSD, I get a speed increase in total time of 12% (28 hours in master, 24.75 hours in this branch) and total connect time speed increase of 16% (67425.83s (106.18ms/blk) in master, 56600.08s (89.13ms/blk) in this branch). Total time warming coins was 22229.61s.

Of course there is a lot of variance due to network latency and quality of peers, and I will continue benchmarking with different configurations. However, I think these numbers warrant adding some more complexity to the validation logic as well as a new dedicated thread.

This also means that what I said earlier, that warmed coins can't be evicted before block connection, is not true. The cache can be filled and the coins evicted before the block is connected. I'll work on updating this so it won't happen.

I'm still working on refactoring the locking interface, but I wanted to share my findings. I also pushed a small update to cancel warming a block when a new block is processed before the previous one is connected, so it will work efficiently with multiple remote peers in case anyone else wants to try benchmarking this.

@JeremyRubin (Contributor)

@andrewtoth I wanted to check in here and see how your progress is going? Anything in particular you feel stuck on?

@andrewtoth andrewtoth force-pushed the warm-coinscache branch 2 times, most recently from 9483525 to f2d161b Compare July 18, 2020 22:07
@andrewtoth (Contributor, Author)

@JeremyRubin Thanks for checking in, as well as for all your help and encouragement!

I've pushed my latest. I've taken your suggestion and moved the thread handling to a new class BlockWarmer, and added an instance of it as a member variable of CChainState. Also added locking annotations. I think it makes it a lot more clear what is being changed with the validation logic. I can also write unit tests for it now.
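
For reference, an interface along those lines might look roughly like this (an illustrative sketch only; the class in this branch may be shaped differently, and the lock annotations assume Bitcoin Core's threadsafety.h macros and sync.h types):

```cpp
// Illustrative sketch of a BlockWarmer-style interface, not the code in this
// branch. All warming state lives behind one object owned by CChainState, and
// the annotations document which fields m_cs_warm_block protects.
#include <atomic>
#include <memory>
#include <thread>

class BlockWarmer
{
public:
    explicit BlockWarmer(CCoinsViewCache& cache) : m_cache(cache) {}

    //! Start pre-fetching the inputs of `block` on the background thread.
    void WarmBlock(std::shared_ptr<const CBlock> block) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

    //! Stop any in-progress warming and wait until the cache is no longer being touched.
    void CancelWarmingBlock();

private:
    void ThreadWarmCoinsCache(); // background loop: AccessCoin() each prevout under m_cs_warm_block

    CCoinsViewCache& m_cache;
    Mutex m_cs_warm_block;
    std::atomic<bool> m_cancel{false};
    std::shared_ptr<const CBlock> m_block GUARDED_BY(m_cs_warm_block);
    std::thread m_thread;
};
```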

As for what I'm feeling stuck on: it's difficult to get a consistent benchmark on this. It's also very time consuming, since I don't have a bunch of machines to keep running this on. It's consistently faster doing IBD, but the speedup varies widely, from above 20% down to low single digits. I think I might have to set up something on the local network to pretend to be multiple peers. I'm also thinking I should write a benchmark for the time to connect a block after receiving it in ProcessNewBlock. I think I need to demonstrate objectively that there is a speedup here worth the added complexity.

Also, since we download blocks out of order, I'm not sure whether we could warm a block that won't be processed until after the next flush, in which case the warming work would be wasted. Not sure if it's possible to detect that.

@andrewtoth (Contributor, Author)

Looks like I have to figure out some locking issues as well.

@andrewtoth andrewtoth marked this pull request as draft July 18, 2020 23:08
@andrewtoth (Contributor, Author)

Converted to draft until I figure out all the issues.

@andrewtoth (Contributor, Author)

Unfortunately I don't think I'll be able to keep working on this for the near future. Going to close for now.

@andrewtoth andrewtoth closed this Aug 7, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

7 participants