Thread / shutdown cleanup #2357

gavinandresen · 2013-03-11T22:07:00Z

The shutdown code has bothered me for a long time, with fShutdown and vnThreadsRunning and a bunch of ad-hoc Sleeps().

Reports of shutdown-on-exit bugs in 0.8 prompted me to clean it all up.

This pull reworks the way we keep track of threads, using boost::thread_groups or boost::thread pointers.

Instead of polling for a global fShutdown flag, threads now just MilliSleep(), wait on boost mutexes, or periodically call boost::this_thread::interruption_point(), and are interrupted using boost's thread interruption mechanisms.

All of the cleanup removes 400 lines of code, and, I hope, makes the code easier to follow.

Tested on OSX extensively, and compiled/tested lightly on Windows XP.

gavinandresen · 2013-03-13T15:36:01Z

Rebased, and added -rpcthreads to --help output.

But the SIGTERM handler is not working properly, which is why the pull tester is upset...

Diapolo · 2013-03-19T22:40:18Z

I really like the reduced complexity of the code, I didn't test it out yet though.

gavinandresen · 2013-03-20T20:10:05Z

I can't reproduce the pull-tester issue in an Ubuntu VM, but I did fix a compiler warning.

Diapolo · 2013-03-20T21:16:37Z

Let's see what pull-tester is doing now, seems the old entries are gone and I wanted to also take a look :).

sipa · 2013-03-21T13:59:30Z

src/init.cpp

@@ -299,6 +304,7 @@ bool static Bind(const CService &addr, unsigned int flags) {
        "  -rpcport=<port>        " + _("Listen for JSON-RPC connections on <port> (default: 8332 or testnet: 18332)") + "\n" +
        "  -rpcallowip=<ip>       " + _("Allow JSON-RPC connections from specified IP address") + "\n" +
        "  -rpcconnect=<ip>       " + _("Send commands to node running on <ip> (default: 127.0.0.1)") + "\n" +
+        "  -rpcthreads=<n>        " + _("Use this mean threads to service RPC calls (default: 4)") + "\n" +


^^ I vote for Set the number of threads to service RPC calls (1-16, 0=auto, default: 0) and we should perhaps consider the same routine as you added for the -par switch then.

// -rpcthreads=0 means autodetect, but nRPCThreads==0 means no concurrency nRPCThreads = GetArg("-rpcthreads", 0); if (nRPCThreads == 0) nRPCThreads = boost::thread::hardware_concurrency(); if (nRPCThreads <= 1) nRPCThreads = 0; else if (nRPCThreads > MAX_RPC_THREADS) nRPCThreads = MAX_RPC_THREADS;

No.

More RPC threads is not better, because RPC calls are normally single-threaded anyway due to lock contention.

Okay, didn't know that, but the message should be in a similar format and some checks for min / max would be also nice IMHO.

"Use this mean threads" doesn't appear to me to be English.

jgarzik · 2013-03-22T17:58:07Z

ACK (with some inline minor nits mentioned)

sipa · 2013-03-23T00:35:46Z

I really like this change (except the polling to check whether someone interrupted... i wonder if one can call a condition variable's notify in a signal handler...). I haven't gone through the code in detail, but I see nothing terrible.

I've tested both bitcoind and bitcoin-qt under valgrind, with several ways of causing exit. All seems good.

denis2342 · 2013-03-23T03:14:48Z

I tested it under OSX 10.8.3 with a command line build of bitcoind and it does not stop when I send the "stop" rpc command.

gavinandresen · 2013-03-23T17:55:40Z

There is definitely a bug with the signal handler / RPC stop on OSX at least, if I try to stop shortly after startup the "stop" get swallowed and further stops have no effect.

gavinandresen · 2013-03-24T00:22:00Z

Fixed two issues, and made one improvement:

Running -daemon, the shutdown detection thread was being started before the fork(). Oops. I moved the -daemon code to run earlier, in AppInit() (which has the added benefit of simplifying the code a bit).
A request to shutdown during AppInit2() startup could get lost. I fixed that by making the detect shutdown thread run and repeatedly interrupt the main thread group until it, itself, is interrupted.

Improvement:

Simplified the Qt code by using a QTimer to look for fRequestShutdown getting set, instead of the more complicated boost signal --> slot --> Qt signal --> slot.

denis2342 · 2013-03-29T10:39:23Z

on freebsd I needed to the add boost_timer lib to the linker

denis2342 · 2013-03-29T16:19:57Z

right now I got this while closing with "stop":

Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp

gavinandresen · 2013-03-29T17:28:54Z

@denis2342 : when you say "right now", do you mean with this patch or without it?

denis2342 · 2013-03-30T10:21:41Z

yes, with the patch.

sorry for the delay, I am on the road.

denis

Sent from my iPhone

On 29.03.2013, at 18:29, Gavin Andresen notifications@github.com wrote:

@denis2342 : when you say "right now", do you mean with this patch or without it?

—
Reply to this email directly or view it on GitHub.

laanwj · 2013-03-31T11:05:51Z

src/walletdb.cpp

    {
-        Sleep(500);
+        MilliSleep(500);


It looks like the Sleep calls were directly replaced by MilliSleep calls. Sleep, however, takes seconds. Was it intended to multiply with a factor 1000 too?

sipa · 2013-03-31T11:11:20Z

@laanwj No, Sleep() (as opposed to sleep()) was an own function in util.cpp that takes milliseconds as argument. I'm in favor of renaming it to MilliSleep() to reduce confusion.

laanwj · 2013-03-31T11:36:40Z

Yes, agreed. This is certainly an improvement in naming, then.

gavinandresen · 2013-04-01T21:52:15Z

Squashed a few commits, rebased, and changed unsigned constants from 'u' to 'U'.

I'm still puzzled as to why shutdown isn't happening immediately on the pull-tester machine, I can't reproduce that problem on either my Mac or a debian VM. I'll debug directly on the pull-tester machine next.

denis2342 · 2013-04-02T14:39:53Z

got it again with the latest changes while using "stop":

./bitcoind stop
Bitcoin server stopping
Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 105.

gavinandresen · 2013-04-02T16:02:14Z

@denis2342 can you run under a debugger and get a call stack for that assertion? I'm guessing something is being called after main() exits, but I'm just guessing...

gavinandresen · 2013-04-03T17:30:03Z

Sigh. That took way too long to figure out.

Boost 1.40 (running on pull-tester) versus 1.53 (running on my machine) difference; boost 1.40's thread_group cannot interrupt_all() if another thread is waiting on join_all().

I'll rework the code to be boost 1.40-compatible.

Create a boost::thread_group object at the qt/bitcoind main-loop level that will hold pointers to all the main-loop threads. This will replace the vnThreadsRunning[] array. For testing, ported the BitcoinMiner threads to use its own boost::thread_group.

Two reasons for this change: 1. Need to always use boost::thread's sleep, even on Windows, so the sleeps can be interrupted (prior code used Windows' built-in Sleep). 2. I always forgot what units the old Sleep took.

Diapolo · 2013-04-03T20:41:27Z

@gavinandresen Why are we supporting such an ancient Boost version anyway? I don't get that :).

BitcoinPullTester · 2013-04-04T00:37:08Z

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/723035bb6839c5d65bfee96d501a8c54814778e3 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

gavinandresen · 2013-04-04T01:24:54Z

I'm going to pull this, I suspect @denis2342 is seeing the same intermittent crash-at-shutdown problems that we saw before on OSX (or are you seeing them on freebsd?)

Thread / shutdown cleanup

gavinandresen mentioned this pull request Mar 11, 2013

Rework the clients thread priority handling #2199

Closed

sipa reviewed Mar 21, 2013
View reviewed changes

laanwj reviewed Mar 31, 2013
View reviewed changes

gavinandresen added 3 commits April 3, 2013 14:04

Fix signed/unsigned comparison warnings

87b9931

Shutdown cleanup prep-work

c8c2fbe

Create a boost::thread_group object at the qt/bitcoind main-loop level that will hold pointers to all the main-loop threads. This will replace the vnThreadsRunning[] array. For testing, ported the BitcoinMiner threads to use its own boost::thread_group.

Rename util.h Sleep --> MilliSleep

1b43bf0

Two reasons for this change: 1. Need to always use boost::thread's sleep, even on Windows, so the sleeps can be interrupted (prior code used Windows' built-in Sleep). 2. I always forgot what units the old Sleep took.

gavinandresen added 4 commits April 3, 2013 19:57

LoopForever and ThreadTrace helpers

72f14d2

Port Thread* methods to boost::thread_group

21eb5ad

Clean up shutdown process

b31499e

Have Qt poll for shutdown requested, the QT way.

723035b

gavinandresen added a commit that referenced this pull request Apr 4, 2013

Merge pull request #2357 from gavinandresen/shutdowncleanup

a0a437c

Thread / shutdown cleanup

gavinandresen merged commit a0a437c into bitcoin:master Apr 4, 2013

gavinandresen deleted the shutdowncleanup branch April 4, 2013 01:25

laudney pushed a commit to reddcoin-project/reddcoin-3.10 that referenced this pull request Mar 19, 2014

Merge pull request bitcoin#2357 from gavinandresen/shutdowncleanup

282a094

Thread / shutdown cleanup

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread / shutdown cleanup #2357

Thread / shutdown cleanup #2357

gavinandresen commented Mar 11, 2013

gavinandresen commented Mar 13, 2013

Diapolo commented Mar 19, 2013

gavinandresen commented Mar 20, 2013

Diapolo commented Mar 20, 2013

sipa Mar 21, 2013

Diapolo Mar 21, 2013

gavinandresen Mar 21, 2013

Diapolo Mar 21, 2013

rebroad Apr 1, 2013

jgarzik commented Mar 22, 2013

sipa commented Mar 23, 2013

denis2342 commented Mar 23, 2013

gavinandresen commented Mar 23, 2013

gavinandresen commented Mar 24, 2013

denis2342 commented Mar 29, 2013

denis2342 commented Mar 29, 2013

gavinandresen commented Mar 29, 2013

denis2342 commented Mar 30, 2013

laanwj Mar 31, 2013

sipa commented Mar 31, 2013

laanwj commented Mar 31, 2013

gavinandresen commented Apr 1, 2013

denis2342 commented Apr 2, 2013

gavinandresen commented Apr 2, 2013

gavinandresen commented Apr 3, 2013

Diapolo commented Apr 3, 2013

BitcoinPullTester commented Apr 4, 2013

gavinandresen commented Apr 4, 2013

Thread / shutdown cleanup #2357

Thread / shutdown cleanup #2357

Conversation

gavinandresen commented Mar 11, 2013

gavinandresen commented Mar 13, 2013

Diapolo commented Mar 19, 2013

gavinandresen commented Mar 20, 2013

Diapolo commented Mar 20, 2013

sipa Mar 21, 2013

Choose a reason for hiding this comment

Diapolo Mar 21, 2013

Choose a reason for hiding this comment

gavinandresen Mar 21, 2013

Choose a reason for hiding this comment

Diapolo Mar 21, 2013

Choose a reason for hiding this comment

rebroad Apr 1, 2013

Choose a reason for hiding this comment

jgarzik commented Mar 22, 2013

sipa commented Mar 23, 2013

denis2342 commented Mar 23, 2013

gavinandresen commented Mar 23, 2013

gavinandresen commented Mar 24, 2013

denis2342 commented Mar 29, 2013

denis2342 commented Mar 29, 2013

gavinandresen commented Mar 29, 2013

denis2342 commented Mar 30, 2013

laanwj Mar 31, 2013

Choose a reason for hiding this comment

sipa commented Mar 31, 2013

laanwj commented Mar 31, 2013

gavinandresen commented Apr 1, 2013

denis2342 commented Apr 2, 2013

gavinandresen commented Apr 2, 2013

gavinandresen commented Apr 3, 2013

Diapolo commented Apr 3, 2013

BitcoinPullTester commented Apr 4, 2013

gavinandresen commented Apr 4, 2013