Thread / shutdown cleanup #2357

Merged
merged 7 commits into from Apr 4, 2013

Projects

None yet

8 participants

@gavinandresen
Member

The shutdown code has bothered me for a long time, with fShutdown and vnThreadsRunning and a bunch of ad-hoc Sleeps().

Reports of shutdown-on-exit bugs in 0.8 prompted me to clean it all up.

This pull reworks the way we keep track of threads, using boost::thread_groups or boost::thread pointers.

Instead of polling for a global fShutdown flag, threads now just MilliSleep(), wait on boost mutexes, or periodically call boost::this_thread::interruption_point(), and are interrupted using boost's thread interruption mechanisms.

All of the cleanup removes 400 lines of code, and, I hope, makes the code easier to follow.

Tested on OSX extensively, and compiled/tested lightly on Windows XP.

@gavinandresen
Member

Rebased, and added -rpcthreads to --help output.

But the SIGTERM handler is not working properly, which is why the pull tester is upset...

@Diapolo
Diapolo commented Mar 19, 2013

I really like the reduced complexity of the code, I didn't test it out yet though.

@gavinandresen
Member

I can't reproduce the pull-tester issue in an Ubuntu VM, but I did fix a compiler warning.

@Diapolo
Diapolo commented Mar 20, 2013

Let's see what pull-tester is doing now, seems the old entries are gone and I wanted to also take a look :).

@sipa sipa and 3 others commented on an outdated diff Mar 21, 2013
src/init.cpp
@@ -299,6 +304,7 @@ bool static Bind(const CService &addr, unsigned int flags) {
" -rpcport=<port> " + _("Listen for JSON-RPC connections on <port> (default: 8332 or testnet: 18332)") + "\n" +
" -rpcallowip=<ip> " + _("Allow JSON-RPC connections from specified IP address") + "\n" +
" -rpcconnect=<ip> " + _("Send commands to node running on <ip> (default: 127.0.0.1)") + "\n" +
+ " -rpcthreads=<n> " + _("Use this mean threads to service RPC calls (default: 4)") + "\n" +
@sipa
sipa Mar 21, 2013 Member

mean?

@Diapolo
Diapolo Mar 21, 2013

^^ I vote for Set the number of threads to service RPC calls (1-16, 0=auto, default: 0) and we should perhaps consider the same routine as you added for the -par switch then.

// -rpcthreads=0 means autodetect, but nRPCThreads==0 means no concurrency
    nRPCThreads = GetArg("-rpcthreads", 0);
    if (nRPCThreads == 0)
        nRPCThreads = boost::thread::hardware_concurrency();
    if (nRPCThreads <= 1)
        nRPCThreads = 0;
    else if (nRPCThreads > MAX_RPC_THREADS)
        nRPCThreads = MAX_RPC_THREADS;
@gavinandresen
gavinandresen Mar 21, 2013 Member

No.

More RPC threads is not better, because RPC calls are normally single-threaded anyway due to lock contention.

@Diapolo
Diapolo Mar 21, 2013

Okay, didn't know that, but the message should be in a similar format and some checks for min / max would be also nice IMHO.

@rebroad
rebroad Apr 1, 2013 Contributor

"Use this mean threads" doesn't appear to me to be English.

@jgarzik
Member
jgarzik commented Mar 22, 2013

ACK (with some inline minor nits mentioned)

@sipa
Member
sipa commented Mar 23, 2013

I really like this change (except the polling to check whether someone interrupted... i wonder if one can call a condition variable's notify in a signal handler...). I haven't gone through the code in detail, but I see nothing terrible.

I've tested both bitcoind and bitcoin-qt under valgrind, with several ways of causing exit. All seems good.

@denis2342

I tested it under OSX 10.8.3 with a command line build of bitcoind and it does not stop when I send the "stop" rpc command.

@gavinandresen
Member

There is definitely a bug with the signal handler / RPC stop on OSX at least, if I try to stop shortly after startup the "stop" get swallowed and further stops have no effect.

@gavinandresen
Member

Fixed two issues, and made one improvement:

  1. Running -daemon, the shutdown detection thread was being started before the fork(). Oops. I moved the -daemon code to run earlier, in AppInit() (which has the added benefit of simplifying the code a bit).
  2. A request to shutdown during AppInit2() startup could get lost. I fixed that by making the detect shutdown thread run and repeatedly interrupt the main thread group until it, itself, is interrupted.

Improvement:

Simplified the Qt code by using a QTimer to look for fRequestShutdown getting set, instead of the more complicated boost signal --> slot --> Qt signal --> slot.

@denis2342

on freebsd I needed to the add boost_timer lib to the linker

@denis2342

right now I got this while closing with "stop":

Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp

@gavinandresen
Member

@denis2342 : when you say "right now", do you mean with this patch or without it?

@denis2342

yes, with the patch.

sorry for the delay, I am on the road.

denis

Sent from my iPhone

On 29.03.2013, at 18:29, Gavin Andresen notifications@github.com wrote:

@denis2342 : when you say "right now", do you mean with this patch or without it?


Reply to this email directly or view it on GitHub.

@laanwj laanwj commented on the diff Mar 31, 2013
src/walletdb.cpp
{
- Sleep(500);
+ MilliSleep(500);
@laanwj
laanwj Mar 31, 2013 Member

It looks like the Sleep calls were directly replaced by MilliSleep calls. Sleep, however, takes seconds. Was it intended to multiply with a factor 1000 too?

@sipa
Member
sipa commented Mar 31, 2013

@laanwj No, Sleep() (as opposed to sleep()) was an own function in util.cpp that takes milliseconds as argument. I'm in favor of renaming it to MilliSleep() to reduce confusion.

@laanwj
Member
laanwj commented Mar 31, 2013

Yes, agreed. This is certainly an improvement in naming, then.

@gavinandresen
Member

Squashed a few commits, rebased, and changed unsigned constants from 'u' to 'U'.

I'm still puzzled as to why shutdown isn't happening immediately on the pull-tester machine, I can't reproduce that problem on either my Mac or a debian VM. I'll debug directly on the pull-tester machine next.

@denis2342

got it again with the latest changes while using "stop":

./bitcoind stop
Bitcoin server stopping
Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 105.

@gavinandresen
Member

@denis2342 can you run under a debugger and get a call stack for that assertion? I'm guessing something is being called after main() exits, but I'm just guessing...

@gavinandresen
Member

Sigh. That took way too long to figure out.

Boost 1.40 (running on pull-tester) versus 1.53 (running on my machine) difference; boost 1.40's thread_group cannot interrupt_all() if another thread is waiting on join_all().

I'll rework the code to be boost 1.40-compatible.

gavinandresen added some commits Mar 7, 2013
@gavinandresen gavinandresen Shutdown cleanup prep-work
Create a boost::thread_group object at the qt/bitcoind main-loop level
that will hold pointers to all the main-loop threads.

This will replace the vnThreadsRunning[] array.

For testing, ported the BitcoinMiner threads to use its
own boost::thread_group.
c8c2fbe
@gavinandresen gavinandresen Fix signed/unsigned comparison warnings 87b9931
@gavinandresen gavinandresen Rename util.h Sleep --> MilliSleep
Two reasons for this change:
1. Need to always use boost::thread's sleep, even on Windows, so the
sleeps can be interrupted (prior code used Windows' built-in Sleep).

2. I always forgot what units the old Sleep took.
1b43bf0
@Diapolo
Diapolo commented Apr 3, 2013

@gavinandresen Why are we supporting such an ancient Boost version anyway? I don't get that :).

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/723035bb6839c5d65bfee96d501a8c54814778e3 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@gavinandresen
Member

I'm going to pull this, I suspect @denis2342 is seeing the same intermittent crash-at-shutdown problems that we saw before on OSX (or are you seeing them on freebsd?)

@gavinandresen gavinandresen merged commit a0a437c into bitcoin:master Apr 4, 2013
@gavinandresen gavinandresen deleted the gavinandresen:shutdowncleanup branch Apr 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment