Many threads used when getting a variable from a file #248
Comments
Using libgcc-ng<7 seems to fix this for now. I am not sure why libgcc 7 is doing this.
|
@zshaheen can you close this if things work? |
@dnadeau4 I plan on doing that when the actual versions get released and I test everything again, both in the examples I gave, and e3sm_diags (manually and in its test suite). |
This is a
|
@dnadeau4 is there a numpy webpage that documents the mpi/multithread change since ~1.12? If yes, want to drop that ref here in perpetuity? |
@dnadeau4 bad news, this problem is not fixed. I have set the
And here are the libraries in this env:
And it seems this is not controlling threads, with a new thread added every 3 secs that the script runs:
The details above were copied from #264, as that issue is solved by the |
I'm still getting blocked logins because I keep hitting the per-user thread limit of 1024, so I'm currently unable to use CDAT 8.0. I'll roll back to 2.12 so I can continue working. |
@dnadeau4 oh no, bad news this is also happening with a
This is really bad
@zshaheen have you recently updated your conda install? I'm starting to think that it's something to do with that |
Ok painful.. The env below is using numpy 1.14.2 py27hdbf6ddf_1 (default conda) and the issue is all due to the behaviour of
And the thread count after a ~minute - I also note a CPU load of 4300% upon start up (Ref #264):
And log_anonymously =
And the thread count after a ~minute:
And for completeness, here is the output of |
@dnadeau4 @zshaheen @doutriaux1 the problem with Closing this issue as it's resolved, but a new |
@durack1 Okay, that makes sense... For our stuff, we set the following environment variables before each run to ensure that all is okay:
import os
# Must be done before any CDAT library is called.
if 'UVCDAT_ANONYMOUS_LOG' not in os.environ:
    os.environ['UVCDAT_ANONYMOUS_LOG'] = 'no'
# Used by numpy, causes too many threads to spawn otherwise.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
|
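If editing every script is not practical, the same two thread-limit variables can be applied from the outside by launching the script in a child process with a modified environment. This is only a minimal sketch: `build_limited_env` and `run_limited` are hypothetical helper names, not part of CDAT or numpy, and it assumes the numerical library reads these variables at import time, as the workaround above implies.

```python
import os
import subprocess
import sys

# Variable names and values taken from the workaround in this thread.
THREAD_LIMITS = {
    "OPENBLAS_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1",
}


def build_limited_env(base=None):
    """Return a copy of an environment mapping with the thread limits applied."""
    env = dict(os.environ if base is None else base)
    env.update(THREAD_LIMITS)
    return env


def run_limited(args):
    """Run a command (hypothetical helper) with the limits set in its environment."""
    return subprocess.run(args, env=build_limited_env(), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python run_limited.py test_threads.py
    run_limited([sys.executable] + sys.argv[1:])
```

Because the limits are set in the child's environment before Python starts, they are in place before any library is imported, which sidesteps the ordering requirement noted in the comment above.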
@zshaheen this should just work out of the box, rather than having custom environment variables to be generated. The behaviour of |
@zshaheen note that if you say |
@dnadeau4 Oh wow, that shouldn't really happen 😞 |
It's this issue: E3SM-Project/e3sm_diags#156
NOTE: I confirmed twice that this only appears with CDAT 8 (cdms 3.0) and not CDAT 2.12 (cdms 2.12).
Here’s how to recreate the issue.
I recommend running it on a machine you don’t normally work on.
That way, when you look at the processes/threads, you’ll see only a few under your username instead of hundreds.
Three files were tested (two are attached, the other is clt.nc), all of which cause the error.
Create an env with just cdms:
Decide if you want to use htop (better, automatically updates every second) or manually query ps every x seconds.
a. If you use htop, you can install it like so:
conda install htop -c conda-forge
Run the test_threads.py script. You can easily choose which file to run.
While this is running (it runs for 30 seconds), query the threads being run.
a. If you’re using htop:
If the error doesn’t show, toggle the viewing of threads with SHIFT + h
b. If you want to use ps:
You have to rerun it manually each time you want to update the list.
You should see a large number of python test_threads.py entries. Each of these is a thread spawned from one process.
Look at the image below to see what the error should look like.
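As an alternative to watching htop or rerunning ps by hand, the thread count of the running process can be polled from a second terminal. The sketch below is Linux-only and assumes the process ID of the test script is known; it reads the `Threads:` field from `/proc/<pid>/status`. The function names and the one-second, 30-sample defaults are illustrative choices, not part of the original test setup.

```python
import sys
import time


def parse_thread_count(status_text):
    """Extract the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def count_os_threads(pid):
    """Return the number of OS threads for a process (Linux only)."""
    with open("/proc/{}/status".format(pid)) as handle:
        return parse_thread_count(handle.read())


def watch(pid, interval=1.0, samples=30):
    """Print a timestamp and the thread count once per interval."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), count_os_threads(pid))
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python watch_threads.py <pid of test_threads.py>
    watch(int(sys.argv[1]))
```

A count that keeps climbing while the script reads a single variable would match the behaviour described in this issue; a count that stays flat suggests the thread limits are being honoured.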
Test script and test files: test_threads.tar.gz