Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread / shutdown cleanup #2357

Merged
merged 7 commits into from Apr 4, 2013

Conversation

gavinandresen
Copy link
Contributor

The shutdown code has bothered me for a long time, with fShutdown and vnThreadsRunning and a bunch of ad-hoc Sleeps().

Reports of shutdown-on-exit bugs in 0.8 prompted me to clean it all up.

This pull reworks the way we keep track of threads, using boost::thread_groups or boost::thread pointers.

Instead of polling for a global fShutdown flag, threads now just MilliSleep(), wait on boost mutexes, or periodically call boost::this_thread::interruption_point(), and are interrupted using boost's thread interruption mechanisms.

All of the cleanup removes 400 lines of code, and, I hope, makes the code easier to follow.

Tested on OSX extensively, and compiled/tested lightly on Windows XP.

@gavinandresen
Copy link
Contributor Author

Rebased, and added -rpcthreads to --help output.

But the SIGTERM handler is not working properly, which is why the pull tester is upset...

@Diapolo
Copy link

Diapolo commented Mar 19, 2013

I really like the reduced complexity of the code, I didn't test it out yet though.

@gavinandresen
Copy link
Contributor Author

I can't reproduce the pull-tester issue in an Ubuntu VM, but I did fix a compiler warning.

@Diapolo
Copy link

Diapolo commented Mar 20, 2013

Let's see what pull-tester is doing now, seems the old entries are gone and I wanted to also take a look :).

@@ -299,6 +304,7 @@ bool static Bind(const CService &addr, unsigned int flags) {
" -rpcport=<port> " + _("Listen for JSON-RPC connections on <port> (default: 8332 or testnet: 18332)") + "\n" +
" -rpcallowip=<ip> " + _("Allow JSON-RPC connections from specified IP address") + "\n" +
" -rpcconnect=<ip> " + _("Send commands to node running on <ip> (default: 127.0.0.1)") + "\n" +
" -rpcthreads=<n> " + _("Use this mean threads to service RPC calls (default: 4)") + "\n" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mean?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ I vote for Set the number of threads to service RPC calls (1-16, 0=auto, default: 0) and we should perhaps consider the same routine as you added for the -par switch then.

// -rpcthreads=0 means autodetect, but nRPCThreads==0 means no concurrency
    nRPCThreads = GetArg("-rpcthreads", 0);
    if (nRPCThreads == 0)
        nRPCThreads = boost::thread::hardware_concurrency();
    if (nRPCThreads <= 1)
        nRPCThreads = 0;
    else if (nRPCThreads > MAX_RPC_THREADS)
        nRPCThreads = MAX_RPC_THREADS;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

More RPC threads is not better, because RPC calls are normally single-threaded anyway due to lock contention.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, didn't know that, but the message should be in a similar format and some checks for min / max would be also nice IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Use this mean threads" doesn't appear to me to be English.

@jgarzik
Copy link
Contributor

jgarzik commented Mar 22, 2013

ACK (with some inline minor nits mentioned)

@sipa
Copy link
Member

sipa commented Mar 23, 2013

I really like this change (except the polling to check whether someone interrupted... i wonder if one can call a condition variable's notify in a signal handler...). I haven't gone through the code in detail, but I see nothing terrible.

I've tested both bitcoind and bitcoin-qt under valgrind, with several ways of causing exit. All seems good.

@denis2342
Copy link

I tested it under OSX 10.8.3 with a command line build of bitcoind and it does not stop when I send the "stop" rpc command.

@gavinandresen
Copy link
Contributor Author

There is definitely a bug with the signal handler / RPC stop on OSX at least, if I try to stop shortly after startup the "stop" get swallowed and further stops have no effect.

@gavinandresen
Copy link
Contributor Author

Fixed two issues, and made one improvement:

  1. Running -daemon, the shutdown detection thread was being started before the fork(). Oops. I moved the -daemon code to run earlier, in AppInit() (which has the added benefit of simplifying the code a bit).
  2. A request to shutdown during AppInit2() startup could get lost. I fixed that by making the detect shutdown thread run and repeatedly interrupt the main thread group until it, itself, is interrupted.

Improvement:

Simplified the Qt code by using a QTimer to look for fRequestShutdown getting set, instead of the more complicated boost signal --> slot --> Qt signal --> slot.

@denis2342
Copy link

on freebsd I needed to the add boost_timer lib to the linker

@denis2342
Copy link

right now I got this while closing with "stop":

Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp

@gavinandresen
Copy link
Contributor Author

@denis2342 : when you say "right now", do you mean with this patch or without it?

@denis2342
Copy link

yes, with the patch.

sorry for the delay, I am on the road.

denis

Sent from my iPhone

On 29.03.2013, at 18:29, Gavin Andresen notifications@github.com wrote:

@denis2342 : when you say "right now", do you mean with this patch or without it?


Reply to this email directly or view it on GitHub.

{
Sleep(500);
MilliSleep(500);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the Sleep calls were directly replaced by MilliSleep calls. Sleep, however, takes seconds. Was it intended to multiply with a factor 1000 too?

@sipa
Copy link
Member

sipa commented Mar 31, 2013

@laanwj No, Sleep() (as opposed to sleep()) was an own function in util.cpp that takes milliseconds as argument. I'm in favor of renaming it to MilliSleep() to reduce confusion.

@laanwj
Copy link
Member

laanwj commented Mar 31, 2013

Yes, agreed. This is certainly an improvement in naming, then.

@gavinandresen
Copy link
Contributor Author

Squashed a few commits, rebased, and changed unsigned constants from 'u' to 'U'.

I'm still puzzled as to why shutdown isn't happening immediately on the pull-tester machine, I can't reproduce that problem on either my Mac or a debian VM. I'll debug directly on the pull-tester machine next.

@denis2342
Copy link

got it again with the latest changes while using "stop":

./bitcoind stop
Bitcoin server stopping
Assertion failed: (!pthread_mutex_lock(&m)), function lock, file /usr/local/include/boost/thread/pthread/recursive_mutex.hpp, line 105.

@gavinandresen
Copy link
Contributor Author

@denis2342 can you run under a debugger and get a call stack for that assertion? I'm guessing something is being called after main() exits, but I'm just guessing...

@gavinandresen
Copy link
Contributor Author

Sigh. That took way too long to figure out.

Boost 1.40 (running on pull-tester) versus 1.53 (running on my machine) difference; boost 1.40's thread_group cannot interrupt_all() if another thread is waiting on join_all().

I'll rework the code to be boost 1.40-compatible.

Create a boost::thread_group object at the qt/bitcoind main-loop level
that will hold pointers to all the main-loop threads.

This will replace the vnThreadsRunning[] array.

For testing, ported the BitcoinMiner threads to use its
own boost::thread_group.
Two reasons for this change:
1. Need to always use boost::thread's sleep, even on Windows, so the
sleeps can be interrupted (prior code used Windows' built-in Sleep).

2. I always forgot what units the old Sleep took.
@Diapolo
Copy link

Diapolo commented Apr 3, 2013

@gavinandresen Why are we supporting such an ancient Boost version anyway? I don't get that :).

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/723035bb6839c5d65bfee96d501a8c54814778e3 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@gavinandresen
Copy link
Contributor Author

I'm going to pull this, I suspect @denis2342 is seeing the same intermittent crash-at-shutdown problems that we saw before on OSX (or are you seeing them on freebsd?)

gavinandresen added a commit that referenced this pull request Apr 4, 2013
@gavinandresen gavinandresen merged commit a0a437c into bitcoin:master Apr 4, 2013
@gavinandresen gavinandresen deleted the shutdowncleanup branch April 4, 2013 01:25
laudney pushed a commit to reddcoin-project/reddcoin-3.10 that referenced this pull request Mar 19, 2014
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants