Parallel script verification #2060

Merged
merged 5 commits into from Jan 18, 2013

6 participants

@sipa
Bitcoin member
sipa commented Dec 2, 2012
  • During block verification (when parallelism is requested), script check actions are stored instead of being executed immediately.
  • After each transaction is processed, its signature actions are pushed to a CScriptCheckQueue, which maintains a queue and some synchronization mechanism.
  • Two or more threads (if enabled) process elements from this queue, and signal the waiting block verification code when they are done.
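
The mechanism in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this pull request: it uses `std::thread` rather than Boost, hands out one check at a time instead of batches, and the names `CheckQueueSketch`, `Add` and `Wait` are invented for the example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for CScriptCheckQueue (not the PR's code).
class CheckQueueSketch {
public:
    using Check = std::function<bool()>;

    // Start nWorkers background threads; the master joins them in Wait().
    explicit CheckQueueSketch(unsigned int nWorkers) {
        for (unsigned int i = 0; i < nWorkers; i++)
            workers.emplace_back([this] { Loop(false); });
    }

    ~CheckQueueSketch() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            fQuit = true;
        }
        condWorker.notify_all();
        for (std::thread& t : workers) t.join();
    }

    // Store a check instead of executing it immediately.
    void Add(Check check) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            queue.push_back(std::move(check));
            nTodo++;
        }
        condWorker.notify_one();
    }

    // Called by the master thread: help process the queue until it is
    // drained, then report whether every check since the last Wait() passed.
    bool Wait() { return Loop(true); }

private:
    bool Loop(bool fMaster) {
        std::condition_variable& cond = fMaster ? condMaster : condWorker;
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            while (queue.empty()) {
                if (fMaster && nTodo == 0) {
                    bool fRet = fAllOk;
                    fAllOk = true; // reset for the next block
                    return fRet;
                }
                if (!fMaster && fQuit) return false;
                cond.wait(lock);
            }
            Check check = std::move(queue.back());
            queue.pop_back();
            lock.unlock();
            bool fOk = check(); // run the check without holding the lock
            lock.lock();
            if (!fOk) fAllOk = false;
            if (--nTodo == 0) condMaster.notify_one();
        }
    }

    std::mutex mtx;
    std::condition_variable condWorker; // idle workers sleep here
    std::condition_variable condMaster; // master sleeps here while checks are in flight
    std::vector<Check> queue;
    unsigned int nTodo = 0; // queued plus in-flight checks
    bool fAllOk = true;
    bool fQuit = false;
    std::vector<std::thread> workers;
};
```

In this sketch the block-processing thread would call `Add` once per stored check and `Wait` once per block, so the block is only accepted after every queued check has run.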

As cs_main is held the entire time, and all verification must be finished before the block continues processing, this does not reach the best possible performance. It is a less drastic change than some more advanced mechanisms (like doing verification out-of-band entirely, and rolling back blocks when a failure is detected).

This feature is enabled through the -par=N flag.

Depends on #2058 and #2059.

@sipa
Bitcoin member
sipa commented Dec 2, 2012

Benchmark result: on my system (an i7-2670QM), a reindex of the first 210000 blocks, with script verification enabled everywhere, and -dbcache=900:

  • HEAD: 3h22m
  • -par=4: 1h14m

With -par=4, CPU usage is around 350% (though the first ~100000 blocks cause lower CPU usage)

@Diapolo Diapolo commented on the diff Dec 3, 2012
src/init.cpp
@@ -579,6 +588,11 @@ bool AppInit2()
if (fDaemon)
fprintf(stdout, "Bitcoin server starting\n");
+ if (nScriptCheckThreads) {
@Diapolo
Diapolo added a note Dec 3, 2012

With -par=1, would this cause no thread to be spawned for verification, and so match the current behaviour?

@sipa
Bitcoin member
sipa added a note Dec 3, 2012

If nScriptCheckThreads == 0, there is some special code that just runs the script validation inline, instead of pushing it to queues.

nScriptCheckThreads == 1 shouldn't ever happen - there's some code that turns it into 0 if set to 1.

If nScriptCheckThreads is higher, nScriptCheckThreads-1 actual separate threads are started. When the main block processing thread is done with its normal tasks, it temporarily joins the worker thread pool, becoming the Nth worker, so there are always N threads working.
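
A rough sketch of the mapping described in this note, written as two pure functions so it can be checked in isolation. The function names and the constant name are hypothetical. The value 64 comes from a diff later in this thread, and clamping to it is an assumption, since the statement following that comparison is not shown.

```cpp
// Hypothetical name for the limit; a later diff in this thread compares against 64.
static const int MAX_SCRIPTCHECK_THREADS_SKETCH = 64;

// nPar:   value of -par (0 means autodetect).
// nCores: detected core count (0 if it could not be determined).
// Returns the effective thread count, where 0 means "run script checks inline".
int EffectiveScriptCheckThreads(int nPar, int nCores) {
    int n = (nPar == 0) ? nCores : nPar;
    if (n <= 1)
        return 0; // a single thread is the same as no parallelism
    if (n > MAX_SCRIPTCHECK_THREADS_SKETCH)
        return MAX_SCRIPTCHECK_THREADS_SKETCH; // assumption: clamp to the limit
    return n;
}

// Number of separate threads to start: the block-processing thread itself
// acts as the Nth worker, so one fewer thread is spawned.
int WorkerThreadsToSpawn(int nEffective) {
    return nEffective > 1 ? nEffective - 1 : 0;
}
```

For example, `EffectiveScriptCheckThreads(0, 4)` yields 4 here, so three worker threads would be started.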

@sipa
Bitcoin member
sipa commented Dec 4, 2012
  • cleaned up the code
  • moved the job queue implementation to checkqueue.h
  • added comments
  • enabled by default (-par=0 autodetects)
@Diapolo Diapolo and 1 other commented on an outdated diff Dec 4, 2012
src/checkqueue.h
+ // Whether we're shutting down.
+ bool fQuit;
+
+ // The maximum number of elements to be processed in one batch
+ unsigned int nBatchSize;
+
+ // Internal function that does bulk of the verification work.
+ bool Loop(bool fMaster = false) {
+ boost::condition_variable &cond = fMaster ? condMaster : condWorker;
+ std::vector<T> vChecks;
+ vChecks.reserve(nBatchSize);
+ nTotal++;
+ unsigned int nNow = 0;
+ bool fOk = true;
+ do {
+ {
@Diapolo
Diapolo added a note Dec 4, 2012

Nit: Small indentation glitch.

@sipa
Bitcoin member
sipa added a note Dec 4, 2012

How so? Indentation is 4 spaces...

@Diapolo
Diapolo added a note Dec 4, 2012

You are right, it's fine ... just looked weird because of the do { above.

@Diapolo Diapolo and 1 other commented on an outdated diff Dec 4, 2012
src/init.cpp
@@ -483,6 +488,14 @@ bool AppInit2()
// ********************************************************* Step 3: parameter-to-internal-flags
fDebug = GetBoolArg("-debug");
+ fBenchmark = GetBoolArg("-benchmark");
+ nScriptCheckThreads = GetArg("-par", 0);
+ if (nScriptCheckThreads == 0)
+ nScriptCheckThreads = boost::thread::hardware_concurrency();
+ if (nScriptCheckThreads <= 1)
+ nScriptCheckThreads = 0;
+ if (nScriptCheckThreads > 64)
@Diapolo
Diapolo added a note Dec 4, 2012

This could be an else if.

@sipa
Bitcoin member
sipa added a note Dec 4, 2012

Indeed.

@Diapolo Diapolo commented on the diff Dec 4, 2012
src/checkqueue.h
@@ -0,0 +1,155 @@
@Diapolo
Diapolo added a note Dec 4, 2012

Can you include checkqueue.h in bitcoin-qt.pro, so that it is visible in the Qt IDE?

@sipa
Bitcoin member
sipa added a note Dec 4, 2012

Ok.

@Diapolo
Diapolo commented Dec 4, 2012

I love your comments, great work here. I still need to try out the code though :).

@laanwj laanwj commented on an outdated diff Dec 6, 2012
src/init.cpp
@@ -481,6 +486,16 @@ bool AppInit2()
// ********************************************************* Step 3: parameter-to-internal-flags
fDebug = GetBoolArg("-debug");
+ fBenchmark = GetBoolArg("-benchmark");
+
+ // -par=0 means autodetect, but nScriptCheckThreads==0 means no concurrency
+ nScriptCheckThreads = GetArg("-par", 0);
+ if (nScriptCheckThreads == 0)
+ nScriptCheckThreads = boost::thread::hardware_concurrency();
+ if (nScriptCheckThreads <= 1)
+ nScriptCheckThreads = 0;
+ else if (nScriptCheckThreads > 64)
@laanwj
Bitcoin member
laanwj added a note Dec 6, 2012

Please make this (arbitrary?) limit of 64 a constant instead of a magic number.

@laanwj
Bitcoin member
laanwj commented Dec 6, 2012

Nice!

@sipa
Bitcoin member
sipa commented Dec 6, 2012

I've been doing some benchmarks, and it seems that contention on the (single) lock protecting the queue makes the overhead go rather high when using too many threads. At least extrapolating from what I see on my system, more than 8 or 16 threads will probably cause significantly degraded performance. Switching to a per-thread queue is probably better, with jobs assigned to the queues in a round-robin way, or something more intelligent.
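
The per-thread queue idea is only a suggestion at this point and is not part of this pull request. A minimal sketch of what round-robin assignment could look like, with all names hypothetical: each job goes to queue `i % nThreads`, and each thread then works through its own queue without touching a shared lock.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: distribute jobs round-robin over per-thread queues,
// run every queue on its own thread, and return true only if all jobs pass.
bool RunRoundRobinSketch(const std::vector<std::function<bool()>>& jobs,
                         unsigned int nThreads) {
    if (nThreads == 0) nThreads = 1;

    // One private queue per thread; job i goes to queue i % nThreads.
    std::vector<std::vector<std::function<bool()>>> queues(nThreads);
    for (std::size_t i = 0; i < jobs.size(); i++)
        queues[i % nThreads].push_back(jobs[i]);

    // One result slot per thread (char rather than bool, so that each
    // thread writes to its own distinct memory location).
    std::vector<char> ok(nThreads, 1);
    std::vector<std::thread> threads;
    for (unsigned int t = 0; t < nThreads; t++) {
        threads.emplace_back([&queues, &ok, t] {
            for (const auto& job : queues[t])
                if (!job()) ok[t] = 0;
        });
    }
    for (std::thread& th : threads) th.join();

    for (char c : ok)
        if (!c) return false;
    return true;
}
```

The trade-off is that a plain round-robin split does no load balancing: if jobs differ a lot in cost, some threads finish early and sit idle, which is presumably where "something more intelligent" would come in.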

That said, rebuilding the coindb from scratch (-dbcache=1000, -par=12, with #2061 and #2062, script checks only after block 193k) takes 13m51s on a hexacore E5-1650 @ 3.2GHz.

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/8f706026e6dee8e38cca0d17acbfc75107d2dcba for binaries and test log.

@sipa
Bitcoin member
sipa commented Dec 8, 2012

Changes:

  • Access to the script check queue is now piped through a RAII CScriptCheckQueueControl, which guarantees the queue is fully processed before continuing
  • Print the number of threads used in debug.log
  • Don't store block validation results in the signature cache (only mempool transactions are stored), but still read from it. This allows multiple threads to read the cache simultaneously.
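
The RAII idea in the first item above can be sketched as a small guard object whose destructor drains the queue if the caller has not already done so. This is an illustration only: `QueueControlSketch` and `InlineQueueSketch` are hypothetical names, and the stand-in queue simply runs its checks inside `Wait()` instead of using threads.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in queue: stores checks and runs them all in Wait().
class InlineQueueSketch {
public:
    void Add(std::function<bool()> check) { vChecks.push_back(std::move(check)); }
    bool Wait() {
        bool fOk = true;
        for (auto& check : vChecks)
            if (!check()) fOk = false;
        vChecks.clear();
        return fOk;
    }
    bool Empty() const { return vChecks.empty(); }

private:
    std::vector<std::function<bool()>> vChecks;
};

// Hypothetical RAII guard: all access to the queue goes through it, and its
// destructor guarantees the queue is fully processed before the scope ends.
template <typename Queue>
class QueueControlSketch {
public:
    explicit QueueControlSketch(Queue& queueIn) : queue(queueIn), fDone(true) {}
    QueueControlSketch(const QueueControlSketch&) = delete;
    QueueControlSketch& operator=(const QueueControlSketch&) = delete;

    void Add(std::function<bool()> check) {
        queue.Add(std::move(check));
        fDone = false; // there is pending work again
    }

    bool Wait() {
        bool fRet = queue.Wait();
        fDone = true;
        return fRet;
    }

    ~QueueControlSketch() {
        if (!fDone) Wait(); // e.g. after an early return in the caller
    }

private:
    Queue& queue;
    bool fDone;
};
```

With a guard like this, an early return from the block connection code cannot leave checks queued that still refer to data from the abandoned block.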
@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/5c713c9daa1128d407d9c483d1abae9bde6d48ad for binaries and test log.

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/2f3ae3eebd979c1c4c7f43d9cfbe95f61db93ec6 for binaries and test log.

@gmaxwell

Just a comment on negative testing results:

I've been running loops of par inside valgrind on fuzzed blockchains with an instrumented copy of Bitcoin that disables most of the block validity tests (so that the fuzzing doesn't cause the chain to be rejected). In 1000 runs, no errors so far, but I did trigger invalid memory accesses after about 100 runs on this code prior to the RAII CScriptCheckQueueControl added in the last patch.

@sipa
Bitcoin member
sipa commented Dec 19, 2012

Given that any non-trivial code has at least one bug (see http://www.murphys-laws.com/murphy/murphy-computer.html), this is indeed bad news :(

sipa added some commits Sep 8, 2012
@sipa sipa Move VerifySignature to main f113620
@sipa sipa Add CScriptCheck: a closure representing a script check 2800ce7
@sipa sipa Remove CheckSig_mode and move logic out of CheckInputs() 1d70f4b
@sipa sipa Parallelize script verification
* During block verification (when parallelism is requested), script
  check actions are stored instead of being executed immediately.
* After every processed transactions, its signature actions are
  pushed to a CScriptCheckQueue, which maintains a queue and some
  synchronization mechanism.
* Two or more threads (if enabled) start processing elements from
  this queue,
* When the block connection code is finished processing transactions,
  it joins the worker pool until the queue is empty.

As cs_main is held the entire time, and all verification must be
finished before the block continues processing, this does not reach
the best possible performance. It is a less drastic change than
some more advanced mechanisms (like doing verification out-of-band
entirely, and rolling back blocks when a failure is detected).

The -par=N flag controls the number of threads (1-16). 0 means auto,
and is the default.
f9cae83
@sipa sipa Remove contention on signature cache during block validation
Since block validation happens in parallel, multiple threads may be
accessing the signature cache simultaneously. To prevent contention:
* Turn the signature cache lock into a shared mutex
* Make reading from the cache only acquire a shared lock
* Let block validations not store their results in the cache
ef0f422
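
The three points of that last commit message can be illustrated with a small reader/writer-locked cache. This is a sketch under assumptions, not the PR's code: it uses C++17 `std::shared_mutex` instead of Boost's shared mutex, keys entries by a plain string, and the names `SignatureCacheSketch` and `CheckWithCacheSketch` are hypothetical.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <string>

// Hypothetical cache of known-valid signatures, keyed by a string.
class SignatureCacheSketch {
public:
    // Readers take a shared lock, so many threads can look up at once.
    bool Get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mtx);
        return setValid.count(key) != 0;
    }

    // Writers take an exclusive lock.
    void Set(const std::string& key) {
        std::unique_lock<std::shared_mutex> lock(mtx);
        setValid.insert(key);
    }

private:
    mutable std::shared_mutex mtx;
    std::set<std::string> setValid;
};

// fStore would be true for mempool transactions and false during block
// validation, so that block validation only ever reads the cache.
bool CheckWithCacheSketch(SignatureCacheSketch& cache, const std::string& key,
                          bool fStore,
                          const std::function<bool(const std::string&)>& verify) {
    if (cache.Get(key))
        return true; // cache hit: skip the expensive verification
    if (!verify(key))
        return false;
    if (fStore)
        cache.Set(key);
    return true;
}
```

Because block validation passes `fStore = false` in this sketch, the parallel checker threads only contend for the shared lock.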
@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/ef0f422519de4a3ce47d923e5f8f90cd12349f3e for binaries and test log.

@gavinandresen
Bitcoin member

ACK.

Benchmark results on my mac, testing by doing a fresh sync of the -testnet blockchain pulled over the LAN:

Without this pull:
32-bit compile: 270 seconds
64-bit compile: 180 seconds

With this pull:
64-bit, 4-CPU : 125 seconds

@gavinandresen gavinandresen merged commit 0e31ae9 into bitcoin:master Jan 18, 2013
@sipa sipa deleted the sipa:parallel branch May 3, 2013