2019 08 16 process header optimization #701

Christewart · 2019-08-20T14:23:46Z

This is part of #653

Previously, we were inserting block headers after every tip validation instead of batching them. The goal of this PR is to batch these headers to insert them into the database all at once, to reduce database latency in our tip validation process.

8ba9626 (current master)

[info] ScalaTest
[info] Run completed in 2 minutes.
[info] Total number of tests run: 27
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

31d9e31

[info] ScalaTest
[info] Run completed in 2 minutes, 5 seconds.
[info] Total number of tests run: 27
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 27, Failed 0, Errors 0, Passed 27
[success] Total time: 128 s, completed Aug 20, 2019 9:16:11 AM

As the tests indicate, it seems like we have slightly regressed in at least running the test suite. I don't think this will be true when syncing the real chain though. Our chain tests sync insert roughly 10,000 headers, which is a relatively small sample size to a real chain.

It also looks like @nkohen found that the optimal insert size is not 2,000 -- which is the default number a node will send to us one the network at one time -- but has used batch sizes of 500 in our ChainUnitTest

chain/src/main/scala/org/bitcoins/chain/blockchain/Blockchain.scala

chain/src/main/scala/org/bitcoins/chain/blockchain/BlockchainUpdate.scala

chain/src/main/scala/org/bitcoins/chain/blockchain/ConnectTipResult.scala

…der()

…), this decouples 'BlockchainUpdate' and Blockchain.connectTip(). This was needed because it doesn't make sense to have a Vector[BlockHeaderDb] returned from Blockchain.connectTip() as we are validating one header. However, to execute batch inserts, we need to accumulate headers somewhere. I choose to do this in BlockchainUpdate with the 'successfulHeaders' field.

…s out extraneous noise in another test case

Christewart · 2019-08-21T13:01:13Z

With #708 in master the results for running the test case that is most similar to header processing is this one.

1619d5d

sbt:bitcoin-s-chain-test> testOnly *ChainHandler* -- -z "be able to process and fetch real headers from mainnet"
[info] BitcoindChainHandlerViaZmqTest:
[info] BitcoindChainHandlerViaZmq
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must be able to process and fetch real headers from mainnet
[info] ScalaTest
[info] Run completed in 41 seconds, 476 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 43 s, completed Aug 21, 2019 7:14:12 AM
sbt:bitcoin-s-chain-test> testOnly *ChainHandler* -- -z "be able to process and fetch real headers from mainnet"
[info] BitcoindChainHandlerViaZmqTest:
[info] BitcoindChainHandlerViaZmq
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must be able to process and fetch real headers from mainnet
[info] ScalaTest
[info] Run completed in 37 seconds, 110 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 38 s, completed Aug 21, 2019 7:14:54 AM

and this branch (9c3ea41)

[info] BitcoindChainHandlerViaZmqTest:
[info] BitcoindChainHandlerViaZmq
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must be able to process and fetch real headers from mainnet
[info] ScalaTest
[info] Run completed in 42 seconds, 312 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 67 s, completed Aug 21, 2019 7:16:30 AM
sbt:bitcoin-s-chain-test> testOnly *ChainHandler* -- -z "be able to process and fetch real headers from mainnet"
[info] BitcoindChainHandlerViaZmqTest:
[info] BitcoindChainHandlerViaZmq
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must be able to process and fetch real headers from mainnet
[info] ScalaTest
[info] Run completed in 39 seconds, 206 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 40 s, completed Aug 21, 2019 7:17:22 AM

Christewart · 2019-08-21T13:05:37Z

So I pushed up another commit (b206d8d) that creates a specific test case to call processHeaders() and then assert we have the best hash after processing headers. This cuts out a lot of extraneous stuff that was being timed in my other results for benchmarking this PR.

Here is current master with my new test case (1619d5d)

[info] ChainHandlerTest:
[info] ChainHandler
[info] - must benchmark ChainHandler.processHeaders()
[info] ScalaTest
[info] Run completed in 17 seconds, 26 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 19 s, completed Aug 21, 2019 7:55:21 AM
sbt:bitcoin-s-chain-test> testOnly *ChainHandlerTest* -- -z "benchmark ChainHandler.processHeaders()"
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must benchmark ChainHandler.processHeaders()
[info] ScalaTest
[info] Run completed in 17 seconds, 629 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 19 s, completed Aug 21, 2019 7:55:47 AM

and here is the results from b206d8d

[info] ChainHandlerTest:
[info] ChainHandler
[info] - must benchmark ChainHandler.processHeaders()
[info] ScalaTest
[info] Run completed in 5 seconds, 751 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 27 s, completed Aug 21, 2019 7:57:16 AM
sbt:bitcoin-s-chain-test> testOnly *ChainHandlerTest* -- -z "benchmark ChainHandler.processHeaders()"
[info] ChainHandlerTest:
[info] ChainHandler
[info] - must benchmark ChainHandler.processHeaders()
[info] ScalaTest
[info] Run completed in 5 seconds, 650 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 7 s, completed Aug 21, 2019 7:57:27 AM

As you can see there is a pretty significant time reduction here from 17 seconds to 5 seconds.

Here is the method that was being called in previous measurements, and as you can see it doesn't even call processHeaders()

chain/src/main/scala/org/bitcoins/chain/blockchain/Blockchain.scala

nkohen · 2019-08-21T21:01:43Z

chain/src/main/scala/org/bitcoins/chain/blockchain/ChainHandler.scala

-          }
-        }
+    val headersToBeCreated = {
+      blockchainUpdates.map(_.successfulHeaders).flatten.distinct.toVector


also map and flatten to flatMap

chain/src/main/scala/org/bitcoins/chain/blockchain/ConnectTipResult.scala

rorp

👍

* WIP: implement processHeaders() in ChainHandler instead of processHeader() * Create ConnectTipResult which is returned from Blockchain.connectTip(), this decouples 'BlockchainUpdate' and Blockchain.connectTip(). This was needed because it doesn't make sense to have a Vector[BlockHeaderDb] returned from Blockchain.connectTip() as we are validating one header. However, to execute batch inserts, we need to accumulate headers somewhere. I choose to do this in BlockchainUpdate with the 'successfulHeaders' field. * make chain-verification log level INFO * Add unit test for benchmarking ChainHandler.processHeaders() that cuts out extraneous noise in another test case * Address code review

Christewart requested review from torkelrogstad and nkohen August 20, 2019 14:26

Christewart added the chain work on the chain project label Aug 20, 2019

torkelrogstad reviewed Aug 21, 2019

View reviewed changes

Christewart mentioned this pull request Aug 21, 2019

Use transactionally when inserting data into our database with createAll #708

Merged

Christewart added 3 commits August 21, 2019 06:33

WIP: implement processHeaders() in ChainHandler instead of processHea…

a3a15d0

…der()

make chain-verification log level INFO

9c3ea41

Christewart force-pushed the 2019-08-16-process-header-optimization branch from 31d9e31 to 9c3ea41 Compare August 21, 2019 12:06

Add unit test for benchmarking ChainHandler.processHeaders() that cut…

b206d8d

…s out extraneous noise in another test case

nkohen reviewed Aug 21, 2019

View reviewed changes

Address code review

2a7f5a7

Christewart requested a review from rorp August 22, 2019 18:17

rorp approved these changes Aug 22, 2019

View reviewed changes

torkelrogstad approved these changes Aug 23, 2019

View reviewed changes

Christewart merged commit ab170d0 into bitcoin-s:master Aug 23, 2019

Christewart mentioned this pull request Aug 23, 2019

Initial BIP157 support #695

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2019 08 16 process header optimization #701

2019 08 16 process header optimization #701

Christewart commented Aug 20, 2019 •

edited

Christewart commented Aug 21, 2019

Christewart commented Aug 21, 2019 •

edited

nkohen Aug 21, 2019

rorp left a comment

2019 08 16 process header optimization #701

2019 08 16 process header optimization #701

Conversation

Christewart commented Aug 20, 2019 • edited

Christewart commented Aug 21, 2019

Christewart commented Aug 21, 2019 • edited

nkohen Aug 21, 2019

Choose a reason for hiding this comment

rorp left a comment

Choose a reason for hiding this comment

Christewart commented Aug 20, 2019 •

edited

Christewart commented Aug 21, 2019 •

edited