OpenBLAS : Program will terminate because you tried to start too many threads. #1735
Comments
If you call OpenBLAS from multiple threads, you need the OpenMP or the single-threaded version. Check Makefile.rule for more options. |
If you call multi-threaded BLAS from many threads at once, you lose cache efficiency. |
This appears to be a (still poorly understood) bug introduced with the rewrite (and speedup) of the thread initialization code in 0.3.1 - see #1704 and #1641. A workaround is simply to increase the value of MAX_ALLOCATING_THREADS, but to achieve some kind of final solution it would be very useful to get an idea how many (blas-calling) threads an affected program is/was trying to start. |
Didn't #1704 more or less end up as a resource leak, where the calling thread exits and a new one takes its place all the time? |
Not sure I'd call that a resource leak, it just so happens that there is a fixed size array for thread pointers in the new code where old entries apparently never get reused. #1641 already had musings on making this thing dynamically allocated. |
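To make the "old entries never get reused" point concrete, here is a toy model in Python. It is not OpenBLAS code: the class `FixedSlotTable`, the helper `run_short_lived_workers` and the capacity of 8 are all made up for illustration, and only the name MAX_ALLOCATING_THREADS comes from the discussion above. It shows how a program that never has more than one worker alive at a time can still run out of slots.

```python
import threading

MAX_ALLOCATING_THREADS = 8  # illustrative value only, not the OpenBLAS default


class FixedSlotTable:
    """Toy model: one slot per thread that ever registers; slots are never freed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []  # logical worker ids, in order of first registration
        self.lock = threading.Lock()

    def register(self, worker_id):
        with self.lock:
            if worker_id in self.slots:
                return self.slots.index(worker_id)
            if len(self.slots) >= self.capacity:
                raise RuntimeError("too many threads")
            self.slots.append(worker_id)
            return len(self.slots) - 1


def run_short_lived_workers(table, count):
    """Start `count` threads one after another; each registers once and exits."""
    errors = []

    def worker(worker_id):
        try:
            table.register(worker_id)
        except RuntimeError as exc:
            errors.append(str(exc))

    for worker_id in range(count):
        thread = threading.Thread(target=worker, args=(worker_id,))
        thread.start()
        thread.join()  # at most one worker thread is alive at any time
    return errors


table = FixedSlotTable(MAX_ALLOCATING_THREADS)
errors = run_short_lived_workers(table, 10)
print(len(table.slots), len(errors))
```

In this model the ninth and tenth workers fail even though every earlier worker has already exited, which is the pattern a dynamically sized or per-thread table would avoid.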
The same issue has been reported in Fedora so I'm also interested in a speedy fix to this. |
Unfortunately that bugzilla entry is mostly useless, as the person who wrote it does not appear to know what the code he uses actually does internally (or how many threads he will eventually need). For a crude "fix", you could try replacing
in line 512 of driver/others/memory.c with some big constant, such as |
If one isn't using OpenBLAS API, directly, how can you troubleshoot this error message? I looked, but didn't see any runtime env-based debug options I could try. I cannot even tell where the crash occurs since I don't get a coredump. I see in the FAQ that "If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading." Is this a new conflict? If not, that is not my problem because it broke immediately after update to 0.3.1-2. But, using OPENBLAS_NUM_THREADS=1 prevents the crash, so, can you tell if that is related or not? Thanks for your help. |
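For a Python program that only reaches OpenBLAS through other modules, the OPENBLAS_NUM_THREADS=1 workaround mentioned here can also be applied from inside the script. This is a minimal sketch, assuming the variable is read when the OpenBLAS library is first loaded, so it has to be set before the first import of numpy, pandas or matplotlib. The helper name `limit_openblas_threads` is hypothetical.

```python
import os


def limit_openblas_threads(count=1):
    """Set OPENBLAS_NUM_THREADS for this process and return the value set."""
    value = str(count)
    os.environ["OPENBLAS_NUM_THREADS"] = value
    return value


limit_openblas_threads(1)
# Only after this point would the script import numpy, pandas or matplotlib.
```

Setting the variable in the shell before starting the interpreter has the same effect and avoids any dependence on import order.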
@PorcelainMouse the problem is an unexpected and unintended consequence of recent changes in OpenBLAS that were made to speed up the thread initialization. There appears to be a certain kind of |
The right way to fix this is to do what I suggested before, which is to have the allocation tracking be dynamic, per thread, rather than globally allocated and assuming a certain number of threads. I might have some time tomorrow to make that happen (since I'm the one who originally caused this). |
Thanks much! I think I understand. I think I can monitor threads, somehow, though I don't remember how. I'm worried I will not be able to catch every single thread if I have to poll, for example. But, I'll see. My code is Python. So, while I'm intimately familiar with every line, I'm also not using OpenBLAS directly; some module I'm using is using it. Numpy, pandas, and matplotlib could all be using it, since all three depend on the OpenBLAS package. This is an odd situation. It's pretty clear to me that Python's use of OpenBLAS assumes this old behavior. I cannot imagine they think it's okay for everyone who uses matplotlib to set this OpenBLAS environment variable. I wonder if it is better to build OpenBLAS with OpenMP and make that a dependency for distribution packages? The FAQ seems to be saying that is safer for distro packages since their users cannot change their code, in general. (Hmm, I suppose I could srpm it and try that.) One more thing: I'm suspicious about the algorithm you describe since it uses the build machine core count, which is likely to be one or two (i.e. a virtual machine) but that is atypical for any modern device, even a phone. I know you know that, so maybe I misunderstand, but I'm just thinking out loud and still a bit confused. I'm more than happy to troubleshoot! I can reproduce the crash at will (well, the code runs for 4 minutes before crashing, so I hardly have an MWE; that's why I don't know exactly where it crashes), so that's a resource for you. Let me know if you can think of some instrumentation I can use from high up in the stack. I have no idea how to probe this. |
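On the question of monitoring threads from Python without polling, one possible sketch is to wrap `threading.Thread.start` so every thread started through the `threading` module is counted, including short-lived ones. This is an assumption-laden probe rather than a complete answer: it cannot see native threads created inside C extensions, so the Linux-only `os_thread_count` helper reads the kernel's current count from /proc/self/status as a cross-check. Both helper names are hypothetical.

```python
import threading

_started = 0
_count_lock = threading.Lock()
_original_start = threading.Thread.start


def _counting_start(self, *args, **kwargs):
    """Count the thread, then start it exactly as the original method would."""
    global _started
    with _count_lock:
        _started += 1
    return _original_start(self, *args, **kwargs)


# Install the wrapper; every later Thread.start() call is counted.
threading.Thread.start = _counting_start


def threads_started():
    """Total number of Python threads started since the wrapper was installed."""
    with _count_lock:
        return _started


def os_thread_count():
    """Return the kernel's thread count for this process on Linux, else None."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None
```

Because the counter records threads ever started rather than threads currently alive, it gives the number that matters if slots for exited threads are never reused. Installing the wrapper at the very top of the main script, before other imports, would catch threads started by imported modules as well.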
The new behaviour is just an oversight, but the old code it replaced was not without problems either. The reliance on the core count of the build system is actually just a fallback for when NUM_THREADS is not set at build time. (Actually with the old code you would get a very similar message and subsequent crash if you happened to exceed that limit, just the circumstances that triggered it were different.) |
Perhaps matplotlib is creating a new thread which does a single OpenBLAS call for every point (line or whatever object) it is drawing? I wonder if it would be possible to come up with a very small python/matplotlib script that shows this behaviour. |
I'll try my best to get a clean repro with debug information, but it might be a while before I have time for it. :( What I will say (which I think jibes with some other comments here) is that I was running into another (more mysterious) issue with the older version of OpenBLAS, and I highly suspect that it was due to a similar underlying limitation. I disagree with the design philosophy that puts a limit on the number of user threads in the program that links to OpenBLAS, but it is at least better to error with a clear message than fail silently / unpredictably. |
You should be very careful when setting |
You should certainly not set NUM_THREADS that high unless you have a very big computer. The workaround suggested above was to change the value of MAX_ALLOCATING_THREADS in file memory.c which is just an array of pointers. |
Anyway I have now committed a change to develop (and "soon" 0.3.3) that reverts to the old version of memory.c unless OpenBLAS is built with -DUSE_TLS. (And in the latter case, the arbitrary limitation on the number of threads should be gone - I just do not want to make this code the default just yet as I just hacked around bugs in oon3m0oo's latest PR. Hopefully these remaining issues can be resolved soon.) |
0.3.3 is released now, reverting to the old code until we get to the bottom of this. |
From #1761 it appears I made a mistake though, as the new USE_TLS option is on by default in 0.3.3 when you build with plain make - it needs to be commented out in Makefile.rule to actually get the 0.3.0 version of memory.c |
@susilehtola the situation in 0.3.3 is as follows - with the latest iteration of the TLS code active (USE_TLS set to 1 at compile time), the "too many threads" problem is probably solved, but the fix is a mix of oon3m0oo's PR #1739 and my attempts at fixing new bugs in that PR. With cmake, the TLS version of memory.c is not used by default - this was my intention to again provide a stable basis after so many and diverse projects were affected by the decision to include this new code. Unfortunately I merged the wrong version of Makefile.rule, which made USE_TLS=1 the default for pure With #1765 merged, the situation is now cleaned up on the develop branch, in that USE_TLS is not set by default no matter which build system is used, leaving it up to the decision of the user or package maintainer to activate it based on their own testing. The same PR also ensures that the old |
@martin-frbg right. I guess I should then add |
@susilehtola with stock 0.3.3 you would actually need to remove or comment out the USE_TLS=1 line in Makefile.rule as I had continued the bad tradition in OpenBLAS of making a variable look like a Boolean while only checking if it is defined at all. |
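The "looks like a Boolean but is only checked for being defined" pitfall can be illustrated with a small Python analogy. This is not the Makefile logic itself: the dictionary `settings` and both helper functions are hypothetical and only mimic the two kinds of check.

```python
def enabled_if_defined(settings, name):
    """Mimics a defined-only check: any value, even "0", switches the option on."""
    return name in settings


def enabled_if_true(settings, name):
    """Mimics a real Boolean check: only the value "1" switches the option on."""
    return settings.get(name, "0") == "1"


settings = {"USE_TLS": "0"}  # someone trying to turn the option off

print(enabled_if_defined(settings, "USE_TLS"))  # True: the option stays on
print(enabled_if_true(settings, "USE_TLS"))     # False: the option is off
```

Under a defined-only check, the only way to switch the option off is to remove the entry entirely, which matches the advice to remove or comment out the line.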
Closing as 0.3.4 is released with both this fix for my USE_TLS blunder in 0.3.3 and brada4's bumping the default number of buffers to 50. |
Maybe there are too many CPUs. Changing \site-packages\joblib\externals\loky\backend\context.py as follows can solve it: os_cpu_count = min(os.cpu_count() or 1, 12) and cpu_count_user = min(_cpu_count_user(os_cpu_count), 12) |
I recently built OpenBLAS 0.3.2 on Ubuntu 16 and am running into this error.
OpenBLAS : Program will terminate because you tried to start too many threads.
My program is using a library that allocates many threads for various reasons. Most of the threads are sleeping most of the time, so there are no performance issues, but I guess that the large number of threads is too much...
Does linking to OpenBLAS really limit the max number of threads that the program creates, even if those threads are not making blas calls? Is there a way around this?
FWIW, I have done a lot of searching and reading on the topic, but still can't quite figure out how to solve this... Some notes:
- USE_OPENMP=1, but that doesn't seem to fix the problem.
- The OPENBLAS_NUM_THREADS=1 environment variable, which seemed to make no difference.
- openblas_set_num_threads(1) at runtime, which seemed to make some difference, but eventually I got the same error.
- USE_THREAD=0. I think this prevents the error, but I haven't found clear documentation as to what the implications are. Can I still call blas functions in multiple threads safely (on unrelated data of course)? Does blas do everything in a single thread in this case or in whatever thread it's called from?
Thanks much.