Optimize thread pool implementation: #4172
First, thanks for simplifying this unnecessarily complicated part of the codebase. Several years back I spent some time fixing a corner case when the number of threads in `Workers` decreased. That bug, and the time spent fixing and testing it, was pointless since the feature was gratuitous. Less complexity means fewer bugs. So this pull request is a win.
However, as it stands, this pull request makes the full suite of unit tests take more than twice as long to execute. Not good. It's fixable, however.
I have a number of suggestions for what I think are improvements for `Workers`. Most of them are naming and tidiness. But using `std::thread::join()` rather than `detach()` is critical to not losing the shutdown performance. You can see all of those accumulated `Workers` changes here: scottschurr@0c345a0
There are also some less important suggestions I made for `JobQueue`. You can see those implemented here: scottschurr@9ce5347
Feel free to cherry-pick those commits if you would like.
So having looked at your proposed changes, I'm not sure I'm sold. I dislike the need for a […]
One person's cake is another person's poison. Indeed, another approach would be to see if a […]
So, I'll think about this some more.
Very nice code cleanups. We need to fix the slow-downs, but from a code complexity POV this is a nice win.
I have a concern over changing the behavior of uncaught exceptions; I'm not totally against the change, but I don't think it's a small change either.
src/ripple/core/impl/Job.cpp (outdated)

    @@ -61,7 +61,8 @@ Job::queue_time() const
     void
     Job::doJob()
     {
    -    beast::setCurrentThreadName("doJob: " + mName);
    +    beast::setCurrentThreadName(beast::getCurrentThreadName() + ": " + mName);
This works because we reset the thread name elsewhere when running a new task. Still, this seems fragile and we could get thread names that grow without bound. Consider resetting the thread name immediately after the job finishes running.
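To make the suggestion concrete, here is a minimal sketch (not code from this PR) of saving the worker's base name and restoring it once the job completes, so names cannot grow without bound:

```cpp
// Sketch only: restore the thread's original name after the job runs.
void
Job::doJob()
{
    auto const baseName = beast::getCurrentThreadName();
    beast::setCurrentThreadName(baseName + ": " + mName);

    // ... execute the job's work function here ...

    // Reset the name so the next job does not keep appending suffixes.
    beast::setCurrentThreadName(baseName);
}
```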
src/ripple/core/impl/JobQueue.cpp (outdated)

    JobQueue::uncaughtException(unsigned int instance, std::exception_ptr eptr)
    {
        try
        {
            if (eptr)
                std::rethrow_exception(eptr);
        }
        catch (std::exception const& e)
        {
            JLOG(m_journal.fatal())
                << beast::getCurrentThreadName()
                << ": Exception caught during task processing: " << e.what();
        }
        catch (...)
        {
            JLOG(m_journal.fatal())
                << beast::getCurrentThreadName()
                << ": Unknown exception caught during task processing.";
        }
    }
This is a large behavior change that probably should get its own patch and not be folded into an unrelated change. There are times when we purposely crash the app on unhandled exceptions, because we don't want to continue in an insane state. We've discussed making changes like this in the past, and have always opted to keep the same behavior.
I'm not necessarily against making this change, but at minimum we need to make this change much more visible.
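For illustration only (this is not what the PR does), one way to keep the historical terminate-on-unhandled-exception behavior while still logging would be to rethrow after logging, so the exception keeps propagating and the process still aborts:

```cpp
// Sketch: log for diagnostics, then rethrow so the old crash-on-unhandled
// behavior is preserved instead of silently continuing.
void
JobQueue::uncaughtException(unsigned int instance, std::exception_ptr eptr)
{
    try
    {
        if (eptr)
            std::rethrow_exception(eptr);
    }
    catch (std::exception const& e)
    {
        JLOG(m_journal.fatal())
            << beast::getCurrentThreadName()
            << ": Exception caught during task processing: " << e.what();
        throw;  // propagate; with no handler above, std::terminate is called
    }
}
```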
src/ripple/core/impl/Workers.cpp (outdated)

    -    perf::PerfLog* perfLog,
    -    std::string const& threadNames,
    -    int numberOfThreads)
    +Workers::Workers(Callback& callback, std::string name, unsigned int count)
Pass `name` by `const&`.
src/ripple/core/impl/Workers.cpp (outdated)

            threads_--;
        },
        count);
    t.detach();
I'd probably keep a collection of threads in the object rather than `detach` here.
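A self-contained sketch of the alternative being suggested (the class and member names here are hypothetical, not taken from the PR):

```cpp
#include <thread>
#include <vector>

// Hypothetical sketch: keep the thread handles so shutdown can join() each
// worker instead of detaching them and waiting for a counter to drain.
class ThreadOwner
{
public:
    explicit ThreadOwner(unsigned int count)
    {
        for (unsigned int i = 0; i < count; ++i)
            threads_.emplace_back([] { /* worker loop would run here */ });
    }

    ~ThreadOwner()
    {
        for (auto& t : threads_)
            if (t.joinable())
                t.join();
    }

private:
    std::vector<std::thread> threads_;
};
```

Owning and joining the threads mirrors the `std::thread::join()` approach mentioned earlier in the review.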
@scottschurr, @seelabs: I thought about this and actually wrote the version with the `std::vector<std::thread>`, and something about it rubbed me the wrong way. I guess it's because actually holding onto the handle doesn't really serve a purpose.
Unless you're vetoing the current version (and if you feel strongly about it, please do!), I'd prefer to stick with the version as is.
@seelabs writes:
I hear what you're saying; is this behavior change appropriate for this PR? Let's take a step back. My position is that relying on this "hidden" behavior (i.e. that the job queue doesn't catch exceptions and that […]). So at some point we're going to need to fix this. The way I see it is that: […]
I think that terminating is reasonable because, as you said, it's hard for the job queue to properly handle exceptions. So I would suggest that we leave the new […]
@nbougalis Re: catching the exception (I wish github would let us respond to top level comments): That's a reasonable position - my real concern was putting this behavior change in without giving it the attention it deserves (because it looks innocuous and is part of an optimization patch). And now it's getting attention, so I'm happy. We have two behaviors that need to be accounted for: […]
Unfortunately, it's really hard to audit the code for this type of behavior. There are no good tools that I know of that let me look at where an exception is thrown and find where it might be caught, or find the call paths where it won't be caught (and if anyone knows of such a tool I'd LOVE to hear about it - I made a feature request to rtags a while back, but no love so far: Andersbakken/rtags#1383). Without an audit we have to decide to either terminate where we shouldn't have, or continue in an insane state where we shouldn't have. Neither choice is good. The advantage of terminating when we shouldn't have is we will be notified and we can patch these issues. If we paper them over we'll never fix them. On the other hand, these issues can be a potential source of DOS attacks on a server. Bottom line: I vote to terminate, but either choice is reasonable and I'm fine with whatever you decide. One minor note: I don't think marking a function […]
Just FYI, the Refactor JobQueue commit message has some junk tagged onto the end. You may wish to clean that up.
Identified a few more changes for you to consider.
FWIW, I timed unit tests with your change which uses `std::detach` and `std::yield`. I also timed unit tests with an implementation that replaces those with `std::join`. The unit test timings came out comparable. So, although I think the `std::join` is more maintainable, I'm okay with your approach using `std::detach` and `std::yield`.
I tried out the most recent version of these changes on macOS. I see two things that need to be addressed:
The stack trace points at JobQueue.cpp line 265: https://github.com/XRPLF/rippled/pull/4172/files#diff-85e882e68cc55760882821eba100a325cad49b5bce66e21047bd861b3a7e6fedR262-R264 The good news is that the exception message got out. The bad news is that since we caught the exception the trace to the actual source of the exception is lost. The […] I removed the […]
The issue that @scottschurr encountered had to do with timing, but it did lead me to discover an implementation issue that could cause a worker thread to sleep despite the fact that work was waiting. I've reworked the code to address this and now leverage more C++20 functionality, since the PR to enable this was merged into […]. I know it's annoying to have to do this, but please re-review the first commit (that changes the […])
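As background on the kind of C++20 facility that can help with this class of bug (this sketch is illustrative and is not the PR's actual implementation), a counting semaphore can track pending work so a worker never blocks while the queue is non-empty:

```cpp
#include <mutex>
#include <queue>
#include <semaphore>

// Illustrative work queue: every push releases the semaphore once, so a
// worker calling pop() returns immediately whenever work is already queued.
class WorkQueue
{
public:
    void
    push(int task)
    {
        {
            std::lock_guard lock(mutex_);
            tasks_.push(task);
        }
        pending_.release();
    }

    int
    pop()
    {
        pending_.acquire();  // blocks only when nothing is pending
        std::lock_guard lock(mutex_);
        int task = tasks_.front();
        tasks_.pop();
        return task;
    }

private:
    std::mutex mutex_;
    std::queue<int> tasks_;
    std::counting_semaphore<> pending_{0};
};
```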
With the most recent code I consistently get the following failure on the JobQueue unit test: […]
I'm running macOS 12.5.1, Apple clang version 13.1.6 (clang-1316.0.21.2.5).
Yeah, that was just sloppy of me. It's been fixed and the ugly-looking code in the unit tests has been removed.
    -    // The number of jobs currently in processTask()
    -    int m_processCount;
    +    int activeThreads_ = 0;
I'm trying to understand the semantic meaning of variable names like activeThreads_, coroutines and workers. In parallel computing, these terms are used in a loosely interchangeable fashion. There are a lot of comments about the different methods of the Coro class, but can we give a definition of these terms too?
My understanding is that: Coroutines (objects of the type Coro) are function pointers which are queued for execution in order of their respective priorities. (I know that this is not an accurate description of the Coro class, but something like this would be very helpful).
I left a number of comments, many of which are nits. However I think I spotted a couple of thread race conditions that should be fixed. There's also the `for` loop in Coroutine_test.cpp that runs `thread_specific_storage` 1000 times. I'm pretty sure we don't want that to happen every time anyone runs the unit tests.
The most recent version of this pull request fails the clang-format check. The top-most commit on this branch will fix that: https://github.com/scottschurr/rippled/commits/nik-jqt-clang-format Other than the clang-format problems, is this branch ready to be re-reviewed? Or is there more work to be done?
@scottschurr Thanks for the patience, the comments and the […]. I know there's a couple of […]
Ping...
@thejohnfreeman @seelabs @scottschurr - looks like this is ready for (re)review. If you won't be able to have an initial look at this within 2 weeks (and provide some feedback), please comment here to let us know.
@intelliot, I'm not ignoring you, but I have another higher priority code review I'm working on. I can't predict whether or not I'll get to this one in the coming two weeks.
This isn't a high priority item, imo. I'm sure Scott et al are focused on business priorities and that's fine; this can wait.
I should start out by saying that I'm really sorry that this pull request has been sitting stagnant for so long. It's hard to have code that you've put effort into just sit and be ignored for such a long time.
At a request from @intelliot I moved this pull request to the top of my review queue. I was requested to only verify that the changes I had flagged had been made. So I did that. But even with that cursory look I stumbled across a place that sure looks like it is missing a mutex lock.
So I can say that it looks like all of the important review comments I had made earlier have been addressed. But the most recent rework contains enough significant changes that it probably should have a full review. I'm fine with either me or someone else doing that additional review. But I'm not ready to give this code a thumbs up yet.
        return {};

    -    return std::make_unique<LoadEvent>(iter->second.load(), name, true);
    +    return {jobData_[t].load().sample(), std::move(name), true};
Pretty much all of the other accesses to `jobData_` are done with `mutex_` locked. I have the uneasy sensation that there's a lock missing here.
I spent some more time spelunking through the code today. As far as I can tell it's okay that there is no lock here. Sorry for the false alarm. My instinct tells me that it would be good to leave a comment for why that lock is not needed. But that's not essential.
In my defense, I think it's hard to review code using a type that requires locking only some of the time. But I also know that part of the design predates your modifications, @nbougalis.
src/ripple/core/impl/Workers.h (outdated)

    -    A number of initial threads may be optionally specified. The
    +    The number of initial threads must be specified. The
Nit: The comment about "The default is to create one thread per CPU" is still present. There can be no default, since the `count` is required by the signature.
@scottschurr, no worries. I know this was (a) a large PR; (b) not a priority; and (c) you're swamped. I appreciate you taking another look. Re: your comment about an apparently missing lock in this bit of code: […]
You say: […]
Here's my (preliminary) rationale for why a lock isn't missing: The […] The […]
And […] Given that, I think that it's fine as is, although it's been a long time since I've even looked at this code and I will have to re-check to be sure. I'll take a closer look when I rebase this.
I dug into the details regarding the comment I left in a previous review. It looks like my suspicions were unfounded. As far as I can tell no lock is needed at that location.
Since I don't anticipate spending more time on this review, and I don't want to be responsible for holding up this pull request any longer, I'm giving the thumbs up.
The removal of `beast::LockFreeStack` and `semaphore`, `constexpr`-izing `jobTypes`, and the massive simplification of `Workers` are all good steps forward for this code base. Thanks for the contribution, @nbougalis.
👍
The existing thread pool code uses several layers of indirection, which use a custom lock-free stack, and offers functionality that supports features that are never used (e.g. the ability to dynamically adjust the number of threads in the pool). This refactoring aims to simplify the code, making it easier to reason about what is happening (although lock-free multi-threaded code is always tricky), and to reduce the latency of the thread pool internals.
This commit cleans up and modernizes the JobQueue but does not change the queueing logic. It focuses on simplifying the code by eliminating awkward code constructs, like "invalid jobs" and the need for default constructors. It leverages modern C++ to initialize tables and data structures at compile time and replaces `std::map` instances with directly indexed arrays. Lastly, it restructures the load tracking infrastructure and reduces the need for dynamic memory allocations by supporting move semantics and value types.
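As a rough illustration of the compile-time, directly indexed table idea described above (the enum and fields below are invented for the example and are not the PR's actual types):

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical: a constexpr table indexed directly by an enum value,
// replacing a std::map that would otherwise be populated at runtime.
enum class JobKind : std::size_t { heartbeat, ledgerRequest, transaction, count };

struct JobInfo
{
    std::string_view name;
    int priority;
};

inline constexpr std::array<JobInfo, static_cast<std::size_t>(JobKind::count)>
    jobTable{{
        {"heartbeat", 0},
        {"ledgerRequest", 1},
        {"transaction", 2},
    }};

constexpr JobInfo const&
lookup(JobKind k)
{
    return jobTable[static_cast<std::size_t>(k)];
}
```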
The existing code attempted to restrict the instantiation of `Coro` to only a subset of helper functions by using the `Coro_create_t` helper structure. But the structure was public, which limited the effectiveness of this method. This commit uses a private type, fixing the issue.
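The fix described is essentially the passkey idiom: the tag type is private, so only code the class itself exposes can construct instances. A generic sketch (names invented for illustration, not the PR's code):

```cpp
#include <memory>

// Hypothetical illustration of restricting construction with a private tag.
class Task
{
    struct PrivateTag
    {
        explicit PrivateTag() = default;
    };

public:
    // Public so std::make_shared can call it, but outside code cannot name
    // PrivateTag and therefore cannot construct a Task directly.
    explicit Task(PrivateTag)
    {
    }

    static std::shared_ptr<Task>
    create()
    {
        return std::make_shared<Task>(PrivateTag{});
    }
};
```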
@nbougalis if this PR is ready for review and merging (in your opinion), then please avoid force-pushing to the branch. Best practices would require that any changes to the PR after a review should invalidate that review. Also, when a branch is force-pushed or any changes are made to it, please comment on the PR to explain why the push / change was made.
How can I avoid force-pushing if I rebase a branch to resolve merge conflicts with upstream?
@nbougalis Instead of rebasing, just do a merge commit ([…])
@thejohnfreeman what's the status of your review of this?
@seelabs will you be able to review this?
@intelliot I won't be able to review this week for sure
src/ripple/core/JobQueue.h (outdated)

        @note Calling this when the coroutine can result in undefined
              behavior if (a) the coroutine has finished executing by
              using 'return' instead of a call to 'yield()'; or (b)
              the coroutine is either executing or scheduled for execution
              and a call to to yield().
@note Calling `post` can result in undefined
behavior if (a) the coroutine has finished execution by
returning instead of calling `yield()`; or (b)
the coroutine is already executing or scheduled for execution.
        Once the job starts running, the coroutine execution context is
        set up and execution begins either at the start of the coroutine
        or, if yield() was previous called, at the statement after that
previous -> previously
        LogicError(
            beast::getCurrentThreadName() +
            ": Uncaught exception handler invoked with no exception_ptr");
I'm assuming we're talking about the exception pointed to by `eptr`. For this function to be called, that exception must have already been caught to be captured in `eptr`. If what you're saying is true, then the stack trace is already lost when this function is entered. Are we sure that is true? Or maybe I'm misunderstanding.
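For context, a std::exception_ptr is typically populated with std::current_exception() from inside a catch block (or via std::make_exception_ptr), so by the time it is handed to any handler the stack has already been unwound. A small standalone example (not from the PR):

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

int
main()
{
    std::exception_ptr eptr;

    try
    {
        throw std::runtime_error("boom");
    }
    catch (...)
    {
        // The exception is already caught here; the stack back to the throw
        // site has been unwound before the pointer is ever passed along.
        eptr = std::current_exception();
    }

    try
    {
        if (eptr)
            std::rethrow_exception(eptr);
    }
    catch (std::exception const& e)
    {
        std::cout << "handled later: " << e.what() << '\n';
    }
}
```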
    -    // See if we can reuse a paused worker
    -    Worker* worker = m_paused.pop_front();
    +    std::thread th(
    +        [this, name](unsigned int instance) {
Just a preference, but why capture `this` and `name` but not `instance` / `count`? Could pass them all as parameters and turn this lambda into a function pointer. No biggie, just wondering how the distinction was made.
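A minimal sketch of the suggestion (hypothetical names; the owning object is omitted for brevity): pass the values as thread arguments so the entry point needs no captures at all:

```cpp
#include <iostream>
#include <string>
#include <thread>

// Hypothetical: a capture-free entry point that receives everything it
// needs as explicit arguments.
static void
workerEntry(std::string name, unsigned int instance)
{
    std::cout << name << " #" << instance << " running\n";
}

int
main()
{
    std::thread th(&workerEntry, std::string("worker"), 0u);
    th.join();
}
```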
            without a corresponding yield.
        /** Resume execution of a suspended coroutine on the current thread.

            The coroutine will continues execution from where it last left,
continues -> continue
@nbougalis - does this PR still make sense to merge? If so, please consider the above comments and update the branch to be up-to-date with […]