Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: drop boost::thread_group #9289

Merged
merged 7 commits into from Jan 4, 2017

Conversation

@theuni
Copy link
Member

commented Dec 6, 2016

(Split out and rewritten chunk of #8631. I'll rebase that on top of this post-merge and reopen.)

This is a prerequisite for async network handling. As-is, we don't have enough control over the shutdown process to be able to deal with async connecting and send/recv.

In particular, we need to be sure that message processing has shut down before forcing all networking down, otherwise we run the risk of trying to process a node's messages during its destruction. This is not a problem now because the current sync model works fine with all threads being interrupted at the same time.

And if that's not a satisfactory reason for the change, it also gets rid of a nice chunk of boost threading (and simplifies the rebase of #8631 :)

To accomplish this, we need to:

  • Set a flag to interrupt all net threads
  • notify all blocked condvars
  • replace MilliSleep (since they're boost interruption points)
  • teach init to Interrupt/Stop, similar to some of the other subsystems.

The MilliSleeps are replaced with condvars that check for the thread's interrupt flag. Since the flags are atomic, there's no need for real locking, so these condvars use a dummy CNullLock.

With all of that done, we may as well switch away from boost threads, since we're no longer dependent on interruption.

@TheBlueMatt 3f3f0b4 is the awkward change I mentioned on IRC. The global will be cleaned up when we move processing into a class as a next step (something like theuni@3f598db)

@laanwj

This comment has been minimized.

Copy link
Member

commented Dec 6, 2016

Concept ACK

@sipa

This comment has been minimized.

Copy link
Member

commented Dec 6, 2016

Would it make sense to introduce a utility function like MilliSleep which takes a reference to a condition variable and an atomic flag? The pattern that replaces the MilliSleep calls is pretty complicated and repeats several times.

@fanquake

This comment has been minimized.

Copy link
Member

commented Dec 7, 2016

Concept ACK

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 8, 2016

@sipa Sure, will do.

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 9, 2016

@sipa done. I'm not sure if it's generic enough to add to utiltime, happy to move it back to net if that makes more sense.

Also fixed up the TODO that I was mistaken about.

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 16, 2016

Just noticed that this PR would've broken interruptible proxy recvs, so I pushed a commit to fix that. The proxy code will soon be dumped anyway, so I'm ok with the hackish global there for now.

Also squashed down the header fix while I was at it.

@TheBlueMatt
Copy link
Contributor

left a comment

Does each thread need its own atomic_flag? We currently dont use it and I'm not sure I envision a scenario where you want to interrupt only some of the threads. Would simplify the code a decent chunk, I think.

I super dont like using atomic_flags and resetting them to dont-wake every time you check them...I think its fine in this PR, but it just seems like a bad idea generally. atomic_bool should be just fine, too :).

@JeremyRubin

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They're used in "clear triggered mode", so there is no need to reset them after every check. The other usage ("test_and_set triggered mode") would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@JeremyRubin

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@TheBlueMatt ok I was responding to "atomic_flags and resetting them to dont-wake every time you check them.", not sure what you meant by that if not my earlier interpretation.

I agree with you on the semantics being off if your worry is future code breaking it. We should either make it throw an Interruption, or call std::terminate. Changing it to an atomic_bool will not have any effect on reducing likelihood future code changes don't break intended behavior. Implemented here for comparison theuni/bitcoin@connman-threads...JeremyRubin:connman-threads The only negative of this approach is that local cleanup can't be handled as neatly.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@JeremyRubin well my note was that if you use an atomic_bool instead of a flag (which means you can check without resetting the state to the dont-exit-now state) then even if there is a bug the worst that happens is you will exit the thread at the next check.

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@JeremyRubin sadly I'm not sure about the execution propagation from within condition_variable...cppreference seems to indicate it was cleaned up in C++14, but its unclear if its guaranteed to propagation from the predicate or if it just "may".

@JeremyRubin

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

Thanks for clarifying the kind of bug you were imagining, I had a different type of bug in mind (there being no code flow which can cause a termination, rather than being catchable on the next iteration).

You're absolutely right on the "may" for exceptions thrown from wait_for. Semantics should be fixed now (moved InterruptibleSleep into interruption_point and made the throwing external to wait_for).

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2016

@JeremyRubin ehh, I mean yes regaring wrappers, no regarding exceptions - I think its generally accepted that using exceptions in C++ is bad, and I'm not sure we want to add any more uses than we already have.

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 22, 2016

@JeremyRubin and @TheBlueMatt Thanks for having a look.

After a debate (probably more than necessary), and running it by a third party, I think I'll concede and just use an atomic bool instead. The convincing argument was that a bool allows us to deviate from the current "got an interrupt, bail from thread!", and lets us do quick on-thread teardown post-interruption. It will need to stay in check though, our usages of Interrupt() assume that it won't block.

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 22, 2016

@TheBlueMatt I pushed up a new attempt to https://github.com/theuni/bitcoin/commits/connman-threads-redo. I think you'll like that much better.

I need to split a few non-obvious things into different commits, but mind having a quick look at the concept?

I borrowed @JeremyRubin's wrapper idea but implemented it a bit differently: theuni@7b4d8f6#diff-09404f8fdd1f69609d329cb0b1f1252e

The usage is (imo) pretty straightforward: theuni@7b4d8f6#diff-9a82240fe7dfe86564178691cc57f2f1L1660

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 22, 2016

@theuni Yup, really like that version.

@theuni theuni force-pushed the theuni:connman-threads branch 2 times, most recently to e0bdb1d Dec 22, 2016

@TheBlueMatt
Copy link
Contributor

left a comment

Why did the socket handler startup need to move up? Did I miss a bug that woule be cause by it not or did you just prefer it that way? (dont care, just asking to make sure I'm aware)

There is still a MilliSleep call in ThreadSocketHandler. Is that intentional?

#include <condition_variable>
#include <atomic>

class CThreadInterrupt

This comment has been minimized.

Copy link
@TheBlueMatt

TheBlueMatt Dec 22, 2016

Contributor

Generally, I think it might make sense to just rename this "InterruptibleSleeper" and have it own the mutex and condition_variable. Then you dont have to have its owner own memory it points to and should still be able to keep +/- the same API.

This comment has been minimized.

Copy link
@theuni

theuni Dec 22, 2016

Author Member

Sometimes you'll want a real interruptible condvar, though. See theuni@da68e3c#diff-9a82240fe7dfe86564178691cc57f2f1R1887.

The alternative would be to add wait_for/wait_until/notify_one/notify_all to CThreadInterrupt, then just forward them to the condvar. But for that, you'd still need to expose the mutex.

This comment has been minimized.

Copy link
@theuni

theuni Dec 22, 2016

Author Member

I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks.

* @returns false if the sleep was interrupted, true otherwise
*/
template <typename Duration>
bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt)

This comment has been minimized.

Copy link
@TheBlueMatt

TheBlueMatt Dec 22, 2016

Contributor

Can we just move this into a member function of the class?

Also, does it need to be a template? I think if you set Duration to something that the various durations can be converted into it should do so intelligently. Would mean less stuff to compile in headers.

This comment has been minimized.

Copy link
@theuni

theuni Dec 22, 2016

Author Member

This started out as a member function, but interrupt.sleep_for() felt awkward conceptually.

We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function.

{
public:
CThreadInterrupt(std::condition_variable& condIn, std::mutex& mutIn) : cond(condIn), mut(mutIn), val(false) {}
void reset()

This comment has been minimized.

Copy link
@TheBlueMatt

TheBlueMatt Dec 22, 2016

Contributor

Please no new implementations-in-headers. I know its mostly single-line stuff but we're fighting to reduce compile-time memory usage, so every little bit helps. :)

This comment has been minimized.

Copy link
@theuni

theuni Dec 22, 2016

Author Member

Sure. Though some of these need to be inlined. I figured someone would yell if I added a .cpp with 2 functions :p

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 22, 2016

Socket handler moved up just so that there's a consistent start/stop order.

Thanks for pointing out the MilliSleep, definitely not intentional!

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2016

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2016

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 23, 2016

That would require the wakeup condition to be atomic, no?

@TheBlueMatt

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2016

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 24, 2016

@TheBlueMatt Pushed a round of changes after last night's IRC discussion. I hope we can compromise on this approach.

@fanquake fanquake added this to the 0.14.0 milestone Dec 29, 2016

@theuni theuni force-pushed the theuni:connman-threads branch Dec 29, 2016

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 29, 2016

rebased at @TheBlueMatt's request.

@@ -0,0 +1,41 @@
// Copyright (c) 2009-2010 Satoshi Nakamoto

This comment has been minimized.

Copy link
@sipa

sipa Dec 30, 2016

Member

Satoshi did not write this... as far as I know at least? Nor was it written in 2009.

This comment has been minimized.

Copy link
@theuni

theuni Dec 30, 2016

Author Member

Caught me!

messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100));
if (fSleep) {
std::unique_lock<std::mutex> lock(mutexMsgProc);
condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));

This comment has been minimized.

Copy link
@sipa

sipa Dec 30, 2016

Member

It seems that mutexMsgProc is only used for locking flagInterruptMsgProc and waiting, making it very similar to CThreadInterrupt. If you'd add a wake() method to that class, which breaks from sleeps, but doesn't set the interrupt point, wouldn't you be able to replace the explicit mutex/flag/condvar with a CThreadInterrupt>

This comment has been minimized.

Copy link
@theuni

theuni Dec 30, 2016

Author Member

I tried to make CThreadInterrupt do this at first (see some of the back+forth with @TheBlueMatt) above. Your suggestion works fine here, I think, but it's not generic due to the lack of accounting for wake conditions. I think it could be done by having wait_until accept an optional predicate, which would be used internally as "while !internal_wake && !pred()".

The compromise was to only use it for plain sleeps. But it's bounced back and forth enough times that I really have no preference. If you'd prefer that, I'm happy to make the change.

This comment has been minimized.

Copy link
@theuni

theuni Dec 30, 2016

Author Member

Whoops, that's right, that didn't work because it would force the internal mutex to be exposed so the caller could actually set the wake condition.

This comment has been minimized.

Copy link
@sipa

sipa Dec 30, 2016

Member

Ok, I see. Let's think about generalizing this later. No need to hold this PR up.

src/net.cpp Outdated

void CConnman::Stop()
{
LogPrintf("%s\n",__func__);

This comment has been minimized.

Copy link
@sipa

sipa Dec 30, 2016

Member

Leftover debug statement?

This comment has been minimized.

Copy link
@theuni

theuni Dec 30, 2016

Author Member

Will remove.

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 30, 2016

Pushed a commit for @sipa's nits. I've been running my dev branches on top of this for several days, killing bitcoind in all kinds of rude ways while debugging, and it has continued to shutdown reliably.

@sipa
Copy link
Member

left a comment

utACK

messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100));
if (fSleep) {
std::unique_lock<std::mutex> lock(mutexMsgProc);
condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));

This comment has been minimized.

Copy link
@sipa

sipa Dec 30, 2016

Member

Ok, I see. Let's think about generalizing this later. No need to hold this PR up.

@theuni theuni force-pushed the theuni:connman-threads branch Dec 30, 2016

@theuni

This comment has been minimized.

Copy link
Member Author

commented Dec 30, 2016

@sipa I figured you'd have more nits :)

Went ahead and squashed.

@sipa

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

Needs rebase.

theuni added 7 commits Dec 27, 2016
net: a few small cleanups before replacing boost threads
- Drop the interruption point directly after the pnode allocation. This would
    be leaky if hit.
- Rearrange thread creation so that the socket handler comes first
net: make net interruptible
Also now that net threads are interruptible, switch them to use std
threads/binds/mutexes/condvars.
net: remove thread_interrupted catch
This is now a std::thread, so there's no hope of catching a boost interruption
point.

@theuni theuni force-pushed the theuni:connman-threads branch to 67ee4ec Jan 3, 2017

@theuni

This comment has been minimized.

Copy link
Member Author

commented Jan 4, 2017

Done.

@laanwj

This comment has been minimized.

Copy link
Member

commented Jan 4, 2017

lightly-tested ACK 67ee4ec

@laanwj laanwj merged commit 67ee4ec into bitcoin:master Jan 4, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
laanwj added a commit that referenced this pull request Jan 4, 2017
Merge #9289: net: drop boost::thread_group
67ee4ec net: misc header cleanups (Cory Fields)
8b3159e net: make proxy receives interruptible (Cory Fields)
5cb0fce net: remove thread_interrupted catch (Cory Fields)
d3d7056 net: make net processing interruptible (Cory Fields)
0985052 net: make net interruptible (Cory Fields)
799df91 net: add CThreadInterrupt and InterruptibleSleep (Cory Fields)
7325b15 net: a few small cleanups before replacing boost threads (Cory Fields)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.