OpenBLAS : Program will terminate because you tried to start too many threads. #1735

logidelic · 2018-08-13T14:49:49Z

I recently built OpenBLAS 0.3.2 on Ubuntu 16 and am running into this error.

OpenBLAS : Program will terminate because you tried to start too many threads.

My program is using a library that allocates many threads for various reasons. Most of the threads are sleeping most of the time, so there are no performance issues, but I guess that the large number of threads is too much...

Does linking to OpenBLAS really limit the max number of threads that the program creates, even if those threads are not making blas calls? Is there a way around this?

FWIW, I have done a lot of searching and reading on the topic, but still can't quite figure out how to solve this... Some notes:

I was not getting this error with a previous version of OpenBLAS (whatever is in the Ubuntu repo)
I also tried building with USE_OPENMP=1 but that doesn't seem to fix the problem.
I tried with OPENBLAS_NUM_THREADS=1 environment variable which seemed to make no difference
I tried calling openblas_set_num_threads(1) at runtime which seemed to make some difference, but eventually I got the same error
I tried building with USE_THREAD=0. I think this prevents the error but I haven't found clear documentation as to what the implications are. Can I still call blas functions in multiple threads safely (on unrelated data of course)? Does blas do everything in a single thread in this case or in whatever thread it's called from?

Thanks much.

brada4 · 2018-08-13T17:16:08Z

If you call OpenBLAS from multiple threads you need the OpenMP build or the single-threaded build. Check Makefile.rule for more options.

brada4 · 2018-08-13T17:18:10Z

If you call multi-threaded blas from many threads at once you lose cache efficiency

martin-frbg · 2018-08-13T19:54:38Z

This appears to be a (still poorly understood) bug introduced with the rewrite (and speedup) of the thread initialization code in 0.3.1 - see #1704 and #1641. A workaround is simply to increase the value of MAX_ALLOCATING_THREADS, but to achieve some kind of final solution it would be very useful to get an idea how many (blas-calling) threads an affected program is/was trying to start.

brada4 · 2018-08-14T10:52:08Z

#1704 sort of ended in resource leak where calling thread exits and new comes in place all the time?
@logidelic can you add some cout() where your code creates / exits a thread?

martin-frbg · 2018-08-14T11:01:18Z

Not sure I'd call that a resource leak, it just so happens that there is a fixed size array for thread pointers in the new code where old entries apparently never get reused. #1641 already had musings on making this thing dynamically allocated.
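To make the failure mode described above concrete, here is a toy model in Python. It is not OpenBLAS code: `FixedSlotTable` and `run_short_lived_threads` are hypothetical names, and the capacity of 8 is arbitrary. The sketch only assumes what the comment states, namely a fixed-size table whose entries are never reused after a thread exits.

```python
import threading


class FixedSlotTable:
    """Toy stand-in for a fixed-size array of per-thread pointers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []              # entries are appended and never removed
        self.lock = threading.Lock()

    def register(self, thread_id):
        with self.lock:
            if len(self.slots) >= self.capacity:
                raise RuntimeError("too many threads")
            self.slots.append(thread_id)


def run_short_lived_threads(table, count):
    """Start `count` threads one after another; each registers and exits."""
    errors = []

    def worker():
        try:
            table.register(threading.get_ident())
        except RuntimeError as exc:
            errors.append(str(exc))

    for _ in range(count):
        t = threading.Thread(target=worker)
        t.start()
        t.join()                     # at most one worker is alive at a time
    return errors


table = FixedSlotTable(capacity=8)
errors = run_short_lived_threads(table, count=10)
print(len(table.slots), len(errors))
```

Only one worker is ever alive at a time, yet the table still fills up, because what counts is the number of threads created over the lifetime of the program rather than the number running concurrently.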

susilehtola · 2018-08-14T21:18:38Z

The same issue has been reported in Fedora
https://bugzilla.redhat.com/show_bug.cgi?id=1615803

so I'm also interested in a speedy fix to this.

martin-frbg · 2018-08-14T21:38:23Z

Unfortunately that bugzilla entry is mostly useless, as the person who wrote it does not appear to know what the code he uses actually does internally (or how many threads he will eventually need). For a crude "fix", you could try replacing

#  define MAX_ALLOCATING_THREADS MAX_CPU_NUMBER * 2 * MAX_PARALLEL_NUMBER * 2

in line 512 of driver/others/memory.c with some big constant, such as

#define MAX_ALLOCATING_THREADS 4096

Another choice would be to revert the entire file to its 0.3.0 state...
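For a rough feel of how small the default limit can be, the arithmetic of the macro quoted above can be sketched in Python. The function name is hypothetical, and the default of 1 for `max_parallel_number` is an assumption; it is chosen because a later comment in this thread describes the limit as four times the core count.

```python
def max_allocating_threads(max_cpu_number, max_parallel_number=1):
    # Same arithmetic as the macro:
    # MAX_CPU_NUMBER * 2 * MAX_PARALLEL_NUMBER * 2
    # max_parallel_number=1 is an assumption, not a value taken from the source.
    return max_cpu_number * 2 * max_parallel_number * 2


for cores in (2, 4, 64):
    print(cores, max_allocating_threads(cores))
```

Under that assumption, a library built on a two-core machine would allow only 8 allocating threads, which a program that keeps creating short-lived threads can exceed quickly.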

PorcelainMouse · 2018-08-15T00:54:24Z

If one isn't using OpenBLAS API, directly, how can you troubleshoot this error message? I looked, but didn't see any runtime env-based debug options I could try. I cannot even tell where the crash occurs since I don't get a coredump.

I see in the FAQ that "If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading." Is this a new conflict? If not, that is not my problem because it broke immediately after update to 0.3.1-2. But, using OPENBLAS_NUM_THREADS=1 prevents the crash, so, can you tell if that is related or not?

Thanks for your help.

martin-frbg · 2018-08-15T07:20:54Z

@PorcelainMouse the problem is an unexpected and unintended consequence of recent changes in OpenBLAS that were made to speed up the thread initialization. There appears to be a certain kind of workload, or application behaviour, where the assumptions about the maximum number of threads to expect do not hold. Unfortunately there is no short-term solution via environment options other than setting OPENBLAS_NUM_THREADS=1.
To fix this, it would be very useful to get an understanding of what your program does, and a rough estimate of how many threads it creates. (It seems the assumptions were that at most four times the number of cpu cores on the machine where OpenBLAS was compiled would get used, and that threads would typically persist for the lifetime of the program. From #1704 we now know that some "deep learning" context may use many more threads over the lifetime of a program, which used to work with 0.3.0 as that used a global memory pool rather than thread-local storage.) If all else fails, 0.3.3 will switch back to the old, slower method. @oon3m0oo
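For Python users who reach OpenBLAS only through numpy, pandas or matplotlib, the environment variable can also be set from inside the script, provided that happens before the library is loaded. The sketch below uses a hypothetical helper name, `pin_openblas_threads`. In this thread the variable is reported to prevent the crash in one case and to make no difference in another, so treat it as something to try rather than a guaranteed fix.

```python
import os


def pin_openblas_threads(n=1):
    """Set OPENBLAS_NUM_THREADS unless the user already chose a value."""
    os.environ.setdefault("OPENBLAS_NUM_THREADS", str(n))
    return os.environ["OPENBLAS_NUM_THREADS"]


# Call this before the first import of numpy, pandas or matplotlib,
# because OpenBLAS reads the variable when the library is initialized.
pin_openblas_threads(1)
# import numpy as np   # left commented so the sketch has no third-party dependency
```

Using `setdefault` keeps a value exported in the shell from being silently overridden by the script.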

oon3m0oo · 2018-08-15T08:01:55Z

The right way to fix this is to do what I suggested before, which is to have the allocation tracking be dynamic, per thread, rather than globally allocated and assuming a certain number of threads. I might have some time tomorrow to make that happen (since I'm the one who originally caused this).

PorcelainMouse · 2018-08-15T08:17:09Z

Thanks much! I think I understand. I think I can monitor threads, somehow, though I don't remember. I'm worried I will not be able to catch every single thread if I have to poll, for example. But, I'll see.

My code is python. So, while I'm intimately familiar with every line, I'm also not using OpenBLAS; some module I'm using is using it. Numpy, pandas, and matplotlib could all be using it, since all three depend on OpenBLAS package.

This is an odd situation. It's pretty clear to me that python's use of OpenBLAS assumes this old behavior. I cannot imagine they think it's okay for every one who uses matplotlib to set this OpenBLAS environment variable. I wonder if it is better to build OpenBLAS with OpenMP and make that a dependency for distribution packages? The FAQ seems to be saying that is safer for distro packages since their users cannot change their code, in general. (Hmm, I suppose I could srpm it and try that.)

One more thing: I'm suspicious about the algorithm you describe since it uses the build machine core count, which is likely to be one or two (i.e. a virtual machine) but that is atypical for any modern device, even phone. I know you know that, so maybe I misunderstand, but I'm just thinking out loud and still a bit confused.

I'm more than happy to troubleshoot! I can recur the crash at will--well the code runs for 4 minutes before crashing, so I hardly have MWE; that's why I don't know exactly where it crashes--so that's a resource for you. Let me know if you can think of some instrumentation I can use from high up in the stack. I have no idea how to probe this.
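One possible piece of instrumentation from the Python level is to log the process's thread count at interesting points. The sketch below is a Linux-oriented assumption, and `native_thread_count` and `log_thread_count` are hypothetical helper names. It reads the `Threads:` field from `/proc/self/status`, which also counts threads started by native libraries, and falls back to `threading.active_count()` elsewhere.

```python
import threading


def native_thread_count():
    """Return the number of OS threads in this process.

    On Linux this reads the "Threads:" field of /proc/self/status, which
    includes threads started by native libraries. Elsewhere it falls back
    to threading.active_count(), which only sees Python-level threads.
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return threading.active_count()


def log_thread_count(label):
    count = native_thread_count()
    print(f"{label}: {count} threads")
    return count


before = log_thread_count("start")
```

This is a snapshot of live threads only. As feared above, polling misses threads that start and exit between two samples, so it gives a lower bound on how many threads the program creates over its lifetime.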

martin-frbg · 2018-08-15T09:15:23Z

The new behaviour is just an oversight, but the old code it replaced was not without problems either. The reliance on the core count of the build system is actually just a fallback for when NUM_THREADS is not set at build time. (Actually with the old code you would get a very similar message and subsequent crash if you happened to exceed that limit, just the circumstances that triggered it were different.)
What worries me most is just how many programs have come to depend on OpenBLAS, and will be affected if a distro habitually updates to the latest release as soon as it becomes available. There is no sizable organisation or permanent developer community behind OpenBLAS at the moment, so not all regressions will be caught in time before they make it into a release.

martin-frbg · 2018-08-15T10:15:15Z

Perhaps matplotlib is creating a new thread which does a single OpenBLAS call for every point (line or whatever object) it is drawing? I wonder if it would be possible to come up with a very small python/matplotlib script that shows this behaviour.

logidelic · 2018-08-15T13:25:14Z

I'll try my best to get a clean repro with debug information, but it might be a while before I have time for it. :(

What I will say (which I think jibes with some other comments here) is that I was running into another (more mysterious) issue with the older version of OpenBLAS, and I highly suspect that it was due to a similar underlying limitation. I disagree with the design philosophy that puts a limit on the number of user threads in the program that links to OpenBLAS, but it is at least better to error with a clear message than fail silently / unpredictably.

sscherfke · 2018-08-28T06:51:13Z

You should be very careful when setting NUM_THREADS=4096, b/c it can result in a huge memory “leak” (depending on your code).

martin-frbg · 2018-08-28T07:02:00Z

You should certainly not set NUM_THREADS that high unless you have a very big computer. The workaround suggested above was to change the value of MAX_ALLOCATING_THREADS in file memory.c which is just an array of pointers.

martin-frbg · 2018-08-28T07:32:28Z

Anyway I have now committed a change to develop (and "soon" 0.3.3) that reverts to the old version of memory.c unless OpenBLAS is built with -DUSE_TLS. (And in the latter case, the arbitrary limitation on the number of threads should be gone - I just do not want to make this code the default just yet as I just hacked around bugs in oon3m0oo's latest PR. Hopefully these remaining issues can be resolved soon.)

martin-frbg · 2018-08-31T11:07:29Z

0.3.3 is released now, reverting to the old code until we get to the bottom of this.

martin-frbg · 2018-09-14T06:33:24Z

From #1761 it appears I made a mistake though, as the new USE_TLS option is on by default in 0.3.3 when you build with plain make - it needs to be commented out in Makefile.rule to actually get the 0.3.0 version of memory.c

martin-frbg · 2018-09-19T20:52:53Z

@susilehtola the situation in 0.3.3 is as follows - with the latest iteration of the TLS code active (USE_TLS set to 1 at compile time), the "too many threads" problem is probably solved, but the fix is a mix of oon3m0oo's PR #1739 and my attempts at fixing new bugs in that PR. With cmake, the TLS version of memory.c is not used by default - this was my intention to again provide a stable basis after so many and diverse projects were affected by the decision to include this new code. Unfortunately I merged the wrong version of Makefile.rule, which made USE_TLS=1 the default for pure make builds (where 0.3.3 should still perform much better than 0.3.2).

With #1765 merged, the situation is now cleaned up on the develop branch, in that USE_TLS is not set by default no matter which build system is used, leaving it up to the decision of the user or package maintainer to activate it based on their own testing. The same PR also ensures that the old code is always used for non-threaded builds, which takes care of the recent issue #1761.
There will be a 0.3.4 once I have time to do more than damage control again.

susilehtola · 2018-09-20T06:14:29Z

@martin-frbg right. I guess I should then add USE_TLS=0 to the Fedora builds of 0.3.3?

martin-frbg · 2018-09-20T06:25:43Z

@susilehtola with stock 0.3.3 you would actually need to remove or comment out the USE_TLS=1 line in Makefile.rule as I had continued the bad tradition in OpenBLAS of making a variable look like a Boolean while only checking if it is defined at all. ☹️
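The pitfall described above, a variable that looks like a Boolean but is only tested for being defined, can be illustrated with a small Python analogy. The two function names are hypothetical, and the dictionary stands in for the set of build variables. This is not how the Makefile is written, only a model of the two kinds of check.

```python
def enabled_if_defined(settings, name):
    """Defined-only check: any value counts, even "0"."""
    return name in settings


def enabled_if_true(settings, name):
    """Boolean-style check: only the value "1" enables the option."""
    return settings.get(name) == "1"


for settings in ({"USE_TLS": "1"}, {"USE_TLS": "0"}, {}):
    print(settings,
          enabled_if_defined(settings, "USE_TLS"),
          enabled_if_true(settings, "USE_TLS"))
```

With a defined-only check, setting the variable to 0 still turns the option on, which is why the line has to be removed or commented out instead.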

martin-frbg · 2018-12-02T22:56:14Z

Closing as 0.3.4 is released with both this fix for my USE_TLS blunder in 0.3.3 and brada4's bumping the default number of buffers to 50.

linuxl7 · 2023-04-28T08:58:00Z

Maybe there are too many CPUs. Changing these two lines in \site-packages\joblib\externals\loky\backend\context.py can solve it:

os_cpu_count = min(os.cpu_count() or 1,12)

cpu_count_user = min(_cpu_count_user(os_cpu_count),12)
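A less invasive variant of the same idea, which avoids editing files under site-packages, may be to cap the CPU count through the environment. The sketch assumes that the installed loky honours the `LOKY_MAX_CPU_COUNT` environment variable, which is worth verifying against your joblib version. `cap_loky_cpu_count` is a hypothetical helper, and the limit of 12 simply mirrors the value used above.

```python
import os


def cap_loky_cpu_count(limit=12):
    """Ask loky (joblib's process backend) to report at most `limit` CPUs."""
    detected = os.cpu_count() or 1
    capped = min(detected, limit)
    # Assumption: loky reads LOKY_MAX_CPU_COUNT when it counts CPUs.
    os.environ["LOKY_MAX_CPU_COUNT"] = str(capped)
    return capped


# Run this before joblib starts its workers.
cap_loky_cpu_count(12)
```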

oon3m0oo mentioned this issue Aug 16, 2018

Support arbitrary numbers of threads for memory allocation. #1739

Open

martin-frbg mentioned this issue Aug 31, 2018

Performance improved a lot in the last 2 weeks #1624

Closed

martin-frbg added this to the 0.3.4 milestone Aug 31, 2018

roy7 mentioned this issue Sep 12, 2018

Tree search batch support leela-zero/leela-zero#1803

Closed

zhreshold mentioned this issue Sep 14, 2018

1.3.0 release pre-built package contains a buggy openblas version. Causing gluon data loader with large num_workers to crash apache/mxnet#12567

Closed

davydden mentioned this issue Oct 9, 2018

openblas: fix experimental USE_TLS makefile option spack/spack#9474

Merged

dpo mentioned this issue Nov 18, 2018

openblas: patch to avoid segfault Homebrew/homebrew-core#34253

Closed

martin-frbg closed this as completed Dec 2, 2018

aroffringa mentioned this issue Feb 20, 2019

ddecal segmentation fault, too many open threads lofar-astron/DP3#126

Closed

3fon3fonov mentioned this issue Oct 9, 2019

OpenBLAS : Program will terminate because you tried to start too many threads. 3fon3fonov/exostriker#29

Closed
