task thread worker is busy waiting when tasks queue is empty #1

Closed
wuqiong opened this issue Mar 10, 2021 · 14 comments


wuqiong commented Mar 10, 2021

So the tasks queue should be blocking.

bshoshany self-assigned this Mar 10, 2021
bshoshany (Owner) commented

Sorry, I'm not exactly sure what you mean. Is this a bug report or a feature request? Can you please provide more details?


corot commented Apr 6, 2021

I suppose @wuqiong means that the pool consumes a full core for every thread in the pool, even before it is given any tasks.
I verified with 1, 12, and the default (system limit) number of threads, and got ~100% CPU usage in the first case and ~500% in the other two.

bshoshany (Owner) commented Apr 7, 2021

I see, thanks for the clarification.

Each thread is assigned the worker function worker(), which continuously tries to pop tasks out of the queue. If it finds a task, then that task is executed, but if there is no task currently in the queue, then the worker function calls std::this_thread::yield().

The precise behaviour of std::this_thread::yield() depends on the OS and implementation, but it is supposed to give up the thread's remaining time slice so that other threads can run, and then have the thread rescheduled later.

So let's say you have N cores and you're running a program using a thread pool of N threads, with no other programs running at the same time. Even if your program just sits and waits, without submitting any tasks to the queue, you should still see all N cores being utilized (I get about 25% utilization per core on my system) - that's just each thread's worker function continuously checking the task queue.

However, if there are other multi-threaded applications running at the same time, then the OS will let them use those cores while the pool's threads are yielding. So the cores are not being taken over by the thread pool, they are actually shared with other programs. Therefore I do not think this is a problem, it is actually the intended behaviour - but please let me know if you think otherwise.
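
For reference, a minimal sketch of what such a worker loop looks like; pop_task and running are placeholders for illustration, not necessarily the library's exact code:

// assumes #include <functional> and #include <thread>
void worker()
{
    while (running)
    {
        std::function<void()> task;
        if (pop_task(task)) // placeholder: try to pop a task from the queue
        {
            task(); // execute the task
        }
        else
        {
            // No task available: give up the rest of this time slice
            // so that other threads can run.
            std::this_thread::yield();
        }
    }
}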

bshoshany (Owner) commented

@wuqiong and @corot: I decided to look into possible alternatives to std::this_thread::yield(), and I found a simple solution which eliminates the high CPU usage while the workers are idling.

In the new version (v1.2), which I committed today, the worker function by default sleeps for a duration given by the public member variable sleep_duration (in microseconds) before checking the queue again. The default value is 1000 microseconds, which I found to be optimal on my system in terms of both CPU usage and performance, but your own optimal value may be different. Setting sleep_duration to 0 restores the original behaviour of calling std::this_thread::yield() instead of sleeping.
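
For example, tuning it from user code looks roughly like this (a sketch, assuming the class is the library's thread_pool defined in thread_pool.hpp; the value 500 is just an illustration):

#include "thread_pool.hpp"

thread_pool pool;          // the worker threads start idling immediately
pool.sleep_duration = 500; // e.g. check the queue every 500 microseconds while idle
pool.sleep_duration = 0;   // or set to 0 to revert to std::this_thread::yield()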

The interesting thing is that in my benchmarks, sleeping for 1000 microseconds actually resulted in moderately improved performance compared both to lower values of sleep_duration and to std::this_thread::yield(). I believe this is because the workers check the queue - which is a costly process - less frequently.

Therefore, the overall performance of the thread pool actually improved thanks to the feedback you provided in this issue - thanks! I am closing this issue now, since the problem of high CPU usage has been solved in v1.2. However, you are welcome to open a new issue if you find any problems or shortcomings with the new method.


teopiaz commented Aug 18, 2021

I think it is possible to remove the sleep and the yield by using a condition variable and a mutex:
- call cv.notify_one() after you push a task,
- call cv.wait() in the worker loop before popping (you need to check running again after waking up),
- call notify_all() in the destructor.

while (running)
{
    std::unique_lock<std::mutex> lk(run_mutex_);
    // Wait until the pool is shutting down or a task is available.
    // The predicate guards against spurious wakeups and against
    // notifications sent before the wait started.
    // (tasks_ stands for whatever member holds the queued tasks.)
    queue_cv_.wait(lk, [&] { return !running || !tasks_.empty(); });
    if (!running)
    {
        return;
    }
    std::function<void()> task;
    [...]
}
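
And a minimal sketch of the corresponding producer side, using the same hypothetical names (run_mutex_, queue_cv_, tasks_):

// When submitting a task:
{
    const std::scoped_lock lock(run_mutex_);
    tasks_.push(std::move(task));
}
queue_cv_.notify_one(); // wake one idle worker

// In the destructor, after setting running = false under the same mutex:
queue_cv_.notify_all(); // wake all workers so they can observe the shutdown flag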

bshoshany (Owner) commented

Thanks for the suggestion @teopiaz! :) This has previously been suggested in #12. However, the code people suggested there runs into a deadlock, and in any case does not seem to improve performance.

Furthermore, I'm not sure there's a clear benefit to making this change in the first place. The implementations of std::condition_variable::wait() themselves most likely also use sleep and/or yield to facilitate the actual waiting, but that means the performance would be implementation-dependent. By employing my own method for waiting, I ensure that the thread pool behaves the same way on all platforms (at least in this particular respect).

And lastly, I think it's actually a good thing that the sleep duration is a variable that can be changed, because it allows further optimization by tuning it to just the right value for a particular system. Again, if I used std::condition_variable::wait(), no such tuning would be possible.


simsong commented Aug 23, 2021

> Furthermore, I'm not sure there's a clear benefit to making this change in the first place. The implementations of std::condition_variable::wait() themselves most likely also use sleep and/or yield to facilitate the actual waiting, but that means the performance would be implementation-dependent. By employing my own method for waiting, I ensure that the thread pool behaves the same way on all platforms (at least in this particular respect).

I think that the current system works fine. However, it was my understanding that some operating systems provide condition variables that are not implemented with sleeps or spin locks, but are instead implemented by descheduling the waiting thread and only moving it back to a run state when it is notified. Are you sure that std::condition_variable is implemented with spin locks?

bshoshany (Owner) commented

I'm definitely not sure about that. I looked it up and could not find information on exactly how condition variables are implemented in any operating system, so I made a guess.

I did some more research now, and I found this article, which shows that the author gets 0% CPU utilization with condition variables vs. 100% CPU utilization with a waiting loop.

However, in my thread pool, even though the workers do employ a loop, there is virtually zero CPU utilization while they're waiting, as long as sleep_duration is not 0 (tested on both Windows and Linux). Therefore, I am still not convinced using std::condition_variable will provide any noticeable improvement.

That said, I did add to my TODO list a task to try to implement the thread pool using std::condition_variable just to see how it compares to sleeping, but it will have to wait until I find the time to do it.

Also, FYI, I do plan to eventually convert the thread pool to using semaphores, which might work better, but will require C++20. So I will probably post that as a separate header file (or a separate repository altogether). Again, I just need to find the time to work on it... :\
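
For what it's worth, the semaphore idea could look roughly like this in C++20 (a sketch only, not the planned implementation; tasks_available is a hypothetical member):

#include <semaphore>

std::counting_semaphore<> tasks_available{0}; // counts tasks waiting in the queue

// Submitter, after pushing a task onto the queue:
tasks_available.release(); // wake one worker

// Worker, instead of sleeping or yielding:
tasks_available.acquire(); // blocks until at least one task has been pushed

// On shutdown, release() once per worker so blocked workers can exit.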


simsong commented Aug 24, 2021

> However, in my thread pool, even though the workers do employ a loop, there is virtually zero CPU utilization while they're waiting, as long as sleep_duration is not 0 (tested on both Windows and Linux). Therefore, I am still not convinced using std::condition_variable will provide any noticeable improvement.

If you set sleep_duration to 1 second, and it takes a microsecond to check the mutex, then you are losing one millionth of the clock on checking. If you reduce sleep_duration to 100 microseconds, then the overhead is 1%. If you make sleep_duration 10 microseconds, your overhead is 10%, and you'll start to get memory contention as well, because your threads are not bound to individual cores, so the mutex needs to move from one L1 cache to another.
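
Stated as a rough formula (using the same numbers as above, with t_check ≈ 1 µs for one check of the queue):

overhead ≈ t_check / sleep_duration
         ≈ 1 µs / 1 s    ≈ 0.0001%
         ≈ 1 µs / 100 µs = 1%
         ≈ 1 µs / 10 µs  = 10%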

What you have works and is pretty simple. Here is a thread pool that uses condition variables:

And here is a source code implementation of pthread_cond_wait.c:

It turns out that C++11 has std::condition_variable. Using it would cause minor increases in complexity.

So what's the advantage of using them? None in your application, because you don't care about latency. But another user of thread-pool might have 4 threads, one for each core, with events coming in that they want handled within a microsecond, without the threads polling every microsecond. That user will be frustrated that your system polls rather than using condition variables.

bshoshany (Owner) commented

Thanks for the explanation. I see your point and I will move this task to a higher priority. But still, it will take some time, since I have a lot of stuff to do before the fall term starts. I will use std::condition_variable, of course, since that is the most portable way to do it.


simsong commented Aug 24, 2021

Thanks. There is no hurry; your system works well in its current form for my application.

sketch34 commented

Great-looking lib! But yes, latency is very important to me (high-performance network server), so I want to avoid busy waits. It would be great to have condition variables as an option in this lib. sleep() can be somewhat of an anti-pattern in multithreaded systems; it is a heavyweight solution. This old but gold article sheds some light: In praise of idleness.

bshoshany (Owner) commented

@sketch34 thanks! You'll be happy to know that a draft for a version with condition variables has been suggested by another user (see #23). I want to do some tests on my end to make sure it doesn't have any issues, perhaps make some other changes, and then update the documentation. But it might take a few weeks since I'm a bit busy nowadays (the academic year just started). I'm also planning a C++20 version using semaphores, but that will come later. Stay tuned! :)

bshoshany (Owner) commented

Update: v3.0.0, to be released in the next few days, will incorporate the changes discussed here.
