Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threadpool create/destroy deadlock [BUG] #107

Closed
alugowski opened this issue May 10, 2023 · 7 comments
Closed

Threadpool create/destroy deadlock [BUG] #107

alugowski opened this issue May 10, 2023 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@alugowski
Copy link

Simple reproducible deadlock:

#include <iostream>
#include "BS_thread_pool_light.hpp"

void func() {
  BS::thread_pool_light pool;
  pool.wait_for_tasks();
}

int main() {
  for (int i = 0; i < 1e6; ++i) {
    func();
    std::cout << i << " ";
    std::cout.flush();
  }
  std::cout << "done" << std::endl;
  return 0;
}

Sometimes this deadlocks at iteration 100, sometimes 5000, but it always does.

@alugowski alugowski added the bug Something isn't working label May 10, 2023
@alugowski
Copy link
Author

This is probably the same bug as #93 but easily reproducible.

@bshoshany
Copy link
Owner

Thanks, I was able to reproduce the deadlock with your code. I will look into it.

As a side note, the main benefit of the thread pool is that you only create it once and then reuse it, otherwise you're creating and destroying the threads all over again each time, which significantly hurts performance. Creating and destroying the pool 5000 times in a row is not something that should ever happen in real-life applications.

I noticed this in your fast_matrix_market library too - in write_body_threads() you recreate the pool every time it's called, but if this function is called multiple times, it would be better to create the pool just once and reuse it.

To be clear, I definitely want to fix this bug because the thread pool should be "mathematically" free of deadlocks, but I'm not sure it's very relevant for real-life applications.

@alugowski
Copy link
Author

Thanks, I was able to reproduce the deadlock with your code. I will look into it.

🎉

Creating and destroying the pool 5000 times in a row is not something that should ever happen in real-life applications.

Again, the 5000 is to make the problem reproducible. Consider it a blessing. I can think of already nearly a half dozen people with this issue, most have tracked it down and some even created beautiful reproducible examples. Would you really rather run our user code instead of a 20-line program? Same problem exists in both places.

If you insist then remove the loop and loop the entire program instead. A program that freezes once in a thousand runtimes is a support nightmare. You lose users for stuff like that.

I noticed this in your fast_matrix_market library too - in write_body_threads() you recreate the pool every time it's called, but if this function is called multiple times, it would be better to create the pool just once and reuse it.

I understand that is the classical thinking, but it really does not apply to that library. Have you measured the threadpool creation/destruction time? On my machine it's less than 1/10 of a millisecond. Compared to something that takes several orders of magnitude longer, and is also called literally once or twice in the entire runtime of another very long process. Amdahl's Law says that this is not where we should be focusing our efforts.

What you're asking is to keep track of a threadpool across multiple API calls. All ways of doing that are far worse than a millisecond penalty.

I'm not sure it's very relevant for real-life applications.

It is extremely relevant. This attitude is worrying.

@bshoshany
Copy link
Owner

bshoshany commented May 10, 2023

@alugowski, the reason for my "worrying attitude" is that thousands of users - the vast majority - are using this library in the way that was originally intended, creating the thread pool only once, and therefore will never encounter this bug. You are using it in a way that I did not anticipate, but I understand now that you have tested various approaches and found that this is what works best for your software, so that makes sense and it convinces me that this bug is indeed relevant for real-life applications. I assure you that I will definitely try to fix this bug, meanwhile please be patient :)

@alugowski
Copy link
Author

creating the thread pool only once, and therefore will never encounter this bug.

Ok, I'll play along. Do you agree that this follows your advice?

#include "BS_thread_pool_light.hpp"

int main() {
  BS::thread_pool_light pool;
  return 0;
}

Compiled with:

g++-12 -std=c++17 single.cpp -o single

Then just execute it a bunch of times:

#!/bin/bash

for i in {1..10000}
do
	./single
	echo "$i"
done

I tried this 10 times and it made it above 1000 once.

Dude, all your users are experiencing this. They just don't know why their programs are occasionally freezing. The only thing your advice changes is that the freeze will be at process exit time where it's less likely to be noticed and even harder to debug.

@bshoshany
Copy link
Owner

Thank you for this useful example! I wasn't convinced until now, but this example convinced me that this problem applies to all of the users, and I will move it to the highest priority.

@bshoshany
Copy link
Owner

Closed as resolved by PR #108 (will be included in v3.4.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants