Acquire CCheckQueue's lock to avoid race condition #5721

sdaftuar · 2015-01-28T19:48:44Z

This fixes a potential race condition in the CCheckQueueControl constructor,
which was looking directly at data in CCheckQueue without acquiring its lock.

Even though only one CCheckQueueControl exists at a time, one of the
CCheckQueue threads may have completed work but not yet updated nIdle or
released its lock, so looking at that variable without acquiring the lock first is not
safe.

Fixes #5703.
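
As a rough illustration of the idea (not the actual patch), the check can live in a public accessor that takes the queue's own mutex before reading the counters. Everything below is a simplified, hypothetical stand-in: MiniCheckQueue, MiniCheckQueueControl, WorkerStarted and WorkerWentIdle are made-up names, and std::mutex is used here in place of the boost primitives that checkqueue.h uses.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified stand-in for CCheckQueue. The real class is a
// template with worker threads and boost synchronization primitives.
class MiniCheckQueue
{
private:
    mutable std::mutex mutex; // guards every member below
    int nIdle = 0;            // workers currently waiting for work
    int nTotal = 0;           // workers started
    unsigned int nTodo = 0;   // checks queued or in flight
    bool fAllOk = true;       // no check has failed so far

public:
    // Illustrative bookkeeping hooks, each taking the lock before writing.
    void WorkerStarted() { std::lock_guard<std::mutex> lock(mutex); ++nTotal; }
    void WorkerWentIdle() { std::lock_guard<std::mutex> lock(mutex); ++nIdle; }

    // Read the guarded members only while holding the mutex.
    bool IsIdle() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return nTotal == nIdle && nTodo == 0 && fAllOk;
    }
};

// Hypothetical stand-in for CCheckQueueControl: it uses only the public,
// locked accessor instead of reading the queue's members directly.
class MiniCheckQueueControl
{
private:
    MiniCheckQueue* pqueue;

public:
    explicit MiniCheckQueueControl(MiniCheckQueue* pqueueIn) : pqueue(pqueueIn)
    {
        if (pqueue != nullptr) {
            bool fIdle = pqueue->IsIdle();
            assert(fIdle);
            (void)fIdle; // avoids an unused-variable warning if asserts are compiled out
        }
    }
};
```

Because the members are private and there is no friend declaration in this sketch, the control object has no way to read nIdle without going through the lock.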

sipa · 2015-01-28T20:02:32Z

src/checkqueue.h

-            assert(pqueue->nTotal == pqueue->nIdle);
-            assert(pqueue->nTodo == 0);
-            assert(pqueue->fAllOk == true);
+            assert(pqueue->IsIdle());


Can you avoid having an effectful statement inside an assert? We currently require assertions to be enabled (NDEBUG must not be defined), but that may not remain the case.

Good point -- fixed so that the function gets called outside the assert (commit squashed).
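
To illustrate the concern in general terms: when NDEBUG is defined, the assert macro expands to nothing, so any call placed inside it is never made. The sketch below uses a hypothetical Probe type with a call counter to make that visible; it is not code from this pull.

```cpp
#include <cassert>

// Hypothetical helper: Check() has a side effect (it counts its calls),
// standing in for a function that takes a lock or mutates state.
struct Probe {
    int calls = 0;
    bool Check()
    {
        ++calls;
        return true;
    }
};

// Risky pattern: if NDEBUG is defined, the whole expression is removed
// and Check() is never called.
inline void CheckInsideAssert(Probe& probe)
{
    assert(probe.Check());
}

// Safer pattern: the call always happens; only the comparison against the
// stored result depends on whether assertions are enabled.
inline void CheckOutsideAssert(Probe& probe)
{
    bool fOk = probe.Check();
    assert(fOk);
    (void)fOk; // avoids an unused-variable warning if asserts are compiled out
}
```

With the second form, behaviour is the same whether or not assertions are compiled in, which is the property being asked for here.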

wtogami · 2015-01-29T20:24:45Z

0.10?

sipa · 2015-01-30T04:28:45Z

Untested ACK (after squash), including 0.10.

TheBlueMatt · 2015-02-02T19:02:36Z

utACK, after squash

sdaftuar · 2015-02-02T19:36:01Z

Now that I've eliminated the empty commit I used to bump travis (after it initially failed for reasons I think are unrelated to my pull), travis is again showing the initial failed run. Is there a better way for me to handle this in the future?

theuni · 2015-02-02T21:19:51Z

@sdaftuar creating a new commit on top forces travis to test that commit. But when you pop it back off, it's already built that exact revision, so it doesn't bother trying again.

Rather than that, just re-commit the change in some way that generates a new commit hash. Edit the commit message somewhat, git format-patch -1 + git am, something like that. Then force-push to overwrite the old one.

But of course, the above only applies if the test failure really was a fluke!

theuni · 2015-02-02T21:38:15Z

CCheckQueue should be able to drop its friendship with CCheckQueueControl after this change (sorry CCheckQueueControl...)

That should keep something like this from happening again, since the member vars would be guarded.
utACK after that.

ghost · 2015-02-03T00:39:42Z

utACK

The following was observed on testnet in one of the initial syncs at height 224873:

bitcoind: checkqueue.h:183: CCheckQueueControl::CCheckQueueControl(CCheckQueue*) [with T = CScriptCheck]: Assertion `pqueue->nTotal == pqueue->nIdle' failed.

laanwj · 2015-02-03T07:57:08Z

Weird, it failed travis again.

jonasschnelli · 2015-02-03T09:51:59Z

Travis reports during make check (a.k.a src/test/test_bitcoin):

2015-01-28 20:27:11 Unlocked: cs_Shutdown  init.cpp:141
make[2]: *** [check-local] Error 1

It looks like test_bitcoin crashes at init.cpp:141 (Shutdown())
This line is TRY_LOCK(cs_Shutdown, lockShutdown); (https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L141)

And this pull introduces a mutex.
I tend to NACK unless this is sorted out.

This fixes a potential race condition in the CCheckQueueControl constructor, which was looking directly at data in CCheckQueue without acquiring its lock. Remove the now-unnecessary friendship for CCheckQueueControl.

sdaftuar · 2015-02-03T15:24:34Z

To clarify -- this pull doesn't introduce a new mutex; it acquires a lock on the existing mutex that is already used to synchronize access to the member variables in CCheckQueue.

The travis failure was unrelated to the pull; the travis log (https://travis-ci.org/bitcoin/bitcoin/builds/48672519) shows the java comparison tool test failed with:

08:27:09 14 BitcoindComparisonTool$1.onPeerDisconnected: bitcoind node disconnected!

What appears below that is the end of the debug log from bitcoind after the comparison tool failure, which shows a clean shutdown for a bitcoind that is compiled with DEBUG_LOCKORDER and run with -debug (i.e., the message @jonasschnelli pasted is a normal one, not indicative of the failure).

Note also that this is not test_bitcoin which failed; the test_bitcoin unit tests passed:

$ if [ "$RUN_TESTS" = "true" ]; then make check; fi
Making check in src
make[1]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
make[2]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
make  check-TESTS check-local
make[3]: Entering directory `/home/travis/build/bitcoin/bitcoin/bitcoin-x86_64-unknown-linux-gnu/src'
Running 137 test cases...
*** No errors detected
PASS: test/test_bitcoin
=============
1 test passed
=============

I tested my original pull by compiling with DEBUG_LOCKORDER and doing a -reindex on testnet, and by verifying that it solved the race condition in the original issue (which I exacerbated in testing by adding a usleep(500000) before nIdle++ at checkqueue.h:101).
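
The effect of that artificial delay can be sketched in isolation. The example below is a hypothetical miniature, not the real CCheckQueue: DelayedIdleQueue, WorkerFinish, WorkerHoldsLock and RunDelayedIdleScenario are invented names, std::thread and std::mutex stand in for the boost equivalents, and a 50 ms sleep plays the role of the usleep. It assumes the worker still holds the mutex while it is slow to record itself as idle, so a reader that takes the lock first cannot observe the stale count.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical miniature of the queue's idle bookkeeping.
class DelayedIdleQueue
{
private:
    std::mutex mutex; // guards nIdle
    int nIdle = 0;
    int nTotal = 1;   // a single worker in this sketch
    std::atomic<bool> fWorkerHoldsLock{false};

public:
    // Simulates a worker that has finished its work but is slow to record
    // itself as idle. The sleep happens while the mutex is held.
    void WorkerFinish(std::chrono::milliseconds delay)
    {
        std::lock_guard<std::mutex> lock(mutex);
        fWorkerHoldsLock = true;
        std::this_thread::sleep_for(delay);
        ++nIdle;
    }

    bool WorkerHoldsLock() const { return fWorkerHoldsLock; }

    // Locked read: blocks until the worker has released the mutex, so it
    // can only see nIdle after the increment.
    bool IsIdle()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return nTotal == nIdle;
    }
};

// Runs the scenario once and returns what the locked check observed.
inline bool RunDelayedIdleScenario()
{
    DelayedIdleQueue queue;
    std::thread worker([&queue] { queue.WorkerFinish(std::chrono::milliseconds(50)); });

    // Wait until the worker is inside its critical section.
    while (!queue.WorkerHoldsLock())
        std::this_thread::yield();

    bool fIdle = queue.IsIdle(); // blocks until the worker unlocks
    assert(fIdle);
    worker.join();
    return fIdle;
}
```

An unlocked read of nIdle at the same point would be a data race and could still see 0, which corresponds to the assertion failure reported in the original issue.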

At any rate I just made the change suggested by @theuni (removing the friend class designation for CCheckQueueControl), which is going to bump travis again.

theuni · 2015-02-03T21:55:23Z

I've tested this a good bit locally, since although it doesn't look like it should be able to break anything, the random travis failure on the DEBUG_LOCKORDER test is rather scary.

I've re-run the comparison tool test dozens of times now, with the exact same conditions (built from depends, NO_QT=1 NO_UPNP=1 DEBUG=1, configured with --enable-glibc-back-compat CPPFLAGS=-DDEBUG_LOCKORDER).

No problems here.

Also, notice that in the Travis failure, we never even get a "bitcoind connected". Looks to me like the failure really was a fluke.

TheBlueMatt · 2015-02-03T23:07:15Z

I'd bet this is related to #5433 (comment) (probably the same issue), try with TheBlueMatt@e21269e merged in.

sdaftuar · 2015-02-06T15:46:09Z

@TheBlueMatt Just to clarify, is there anything more to be done on this pull? Since travis ran cleanly with the current code, I thought I'd just keep your workaround in mind if that travis issue comes up again, but not necessarily change anything now.

cf008ac Acquire CCheckQueue's lock to avoid race condition (Suhas Daftuar)

This fixes a potential race condition in the CCheckQueueControl constructor, which was looking directly at data in CCheckQueue without acquiring its lock. Remove the now-unnecessary friendship for CCheckQueueControl.

Rebased-From: cf008ac
Github-Pull: #5721

This fixes a potential race condition in the CCheckQueueControl constructor, which was looking directly at data in CCheckQueue without acquiring its lock. Remove the now-unnecessary friendship for CCheckQueueControl.

Rebased-From: cf008ac
Github-Pull: bitcoin#5721
(cherry picked from commit d148f62)

sipa reviewed Jan 28, 2015

sdaftuar force-pushed the fix-checkqueue-race branch from 54e9638 to 425eb1c on January 28, 2015, 20:17

laanwj added this to the 0.10.0 milestone Jan 30, 2015

laanwj added the Bug label Jan 30, 2015

sdaftuar force-pushed the fix-checkqueue-race branch from ea2a009 to 425eb1c on February 2, 2015, 19:28

laanwj removed this from the 0.10.0 milestone Feb 3, 2015

Acquire CCheckQueue's lock to avoid race condition

cf008ac

This fixes a potential race condition in the CCheckQueueControl constructor, which was looking directly at data in CCheckQueue without acquiring its lock. Remove the now-unnecessary friendship for CCheckQueueControl.

sdaftuar force-pushed the fix-checkqueue-race branch from 425eb1c to cf008ac on February 3, 2015, 15:24

laanwj merged commit cf008ac into bitcoin:master Feb 6, 2015

laanwj added a commit that referenced this pull request Feb 6, 2015

Merge pull request #5721

fb6140b

cf008ac Acquire CCheckQueue's lock to avoid race condition (Suhas Daftuar)

dr-mr-space-monkey mentioned this pull request Mar 27, 2021

Race condition in CCheckQueueControl ShorelineCrypto/cheetahcoin#6

Closed

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
