Many threads used when getting a variable from a file #248

Closed
zshaheen opened this issue May 24, 2018 · 14 comments

@zshaheen
Contributor

It's this issue: E3SM-Project/e3sm_diags#156

NOTE: I confirmed twice that this only appears with CDAT 8 (cdms 3.0) and not CDAT 2.12 (cdms 2.12).

Here’s how to recreate the issue.

I recommend you run it on a machine that’s not your machine.
This way, when you look at the processes/threads, it’ll only show a few instead of hundreds of them under your username.

Three files were tested (two are attached, the other is clt.nc), all of which cause the error.

  1. Create an env with just cdms:

    conda create -n cdms_3.0_py2 -c conda-forge -c cdat cdms2 python=2
    
  2. Decide if you want to use htop (better, automatically updates every second) or manually query ps every x seconds.
    a. If you use htop, you can install it like so: conda install htop -c conda-forge

  3. Run the test_threads.py. You can easily choose which file to run.

    python test_threads.py
    
  4. While this is running (it runs for 30 seconds), query the threads being run.
    a. If you’re using htop:

    htop -u <your_username>
    

    If the error doesn’t show, toggle the viewing of threads with SHIFT + h

    b. If you want to use ps:

    ps -T -u <your_username>
    

    You have to rerun it manually whenever you want to update the list.

  5. You should see a large number of python test_threads.py entries. Each of these is a thread spawned from one process.
    Look at the image below to see what the error looks like.

[Image: error]

Test script and test files: test_threads.tar.gz
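
As a complement to watching htop or ps by hand, the thread count of a process can also be read programmatically. The following is a minimal sketch, assuming a Linux machine where /proc is available; the helper name count_native_threads is hypothetical and is not part of test_threads.py or cdms2.

```python
import os


def count_native_threads(pid=None):
    """Return the number of OS-level threads in a process (Linux only).

    Reads the "Threads:" field of /proc/<pid>/status. This counts every
    native thread, including ones started by compiled libraries, which
    threading.active_count() does not see.
    """
    target = "self" if pid is None else str(pid)
    with open(os.path.join("/proc", target, "status")) as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no Threads: field found for {}".format(target))


if __name__ == "__main__":
    print("native threads in this process: {}".format(count_native_threads()))
```

Calling it before and after opening a file and reading a variable would show whether the count grows, without needing a second terminal.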

@dnadeau4
Contributor

dnadeau4 commented Jun 5, 2018

using libgcc-ng<7 seems to fix this for now. I am not sure why libgcc 7 is doing this.

conda create -n cdms_3.0_py2 -c conda-forge -c cdat cdms2 python=2 "libgcc-ng<7"

@dnadeau4
Contributor

@zshaheen can you close this if things work?

@zshaheen
Contributor Author

@dnadeau4 I plan on doing that when the actual versions get released and I test everything again, both in the examples I gave, and e3sm_diags (manually and in its test suite).

@dnadeau4
Contributor

This is a numpy issue when using openblas. You need to set one of these environment variables.
If you got numpy from the "anaconda" channel, then you don't have openblas and you need to set the OMP_NUM_THREADS environment variable.

  • export OPENBLAS_NUM_THREADS=1
  • export OMP_NUM_THREADS=1
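
The two exports above can also be applied to a single run from Python, without changing the calling shell. This is only a sketch: the helper name run_single_threaded is hypothetical, and it shows nothing more than how the variables reach a child process.

```python
import os
import subprocess
import sys


def run_single_threaded(args):
    """Run a command with both thread-count variables set to 1."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["OPENBLAS_NUM_THREADS"] = "1"
    env["OMP_NUM_THREADS"] = "1"
    return subprocess.run(
        args,
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


if __name__ == "__main__":
    # The child only echoes the two variables back to confirm they arrived.
    child = (
        "import os; "
        "print(os.environ['OPENBLAS_NUM_THREADS'], os.environ['OMP_NUM_THREADS'])"
    )
    result = run_single_threaded([sys.executable, "-c", child])
    print(result.stdout.strip())
```

In practice the argument list would be something like [sys.executable, "test_threads.py"].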

@durack1
Member

durack1 commented Aug 14, 2018

@dnadeau4 is there a numpy webpage that documents the mpi/multithread change since ~1.12? If yes, want to drop that ref here in perpetuity?

@durack1
Member

durack1 commented Aug 15, 2018

@dnadeau4 bad news, this problem is not fixed. I have set the envs:

(cdat80py2) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9352]> env | grep NUM_THREADS
OPENBLAS_NUM_THREADS=1
OMP_NUM_THREADS=1

And here are the libraries in this env:

(cdat80py2) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9291]> 
ls anaconda2/envs/cdat80py2/conda-meta/*cdms*
anaconda2/envs/cdat80py2/conda-meta/cdms2-3.0.1-py27h6091dcd_1.json
anaconda2/envs/cdat80py2/conda-meta/libcdms-3.0.1-h9ac9557_2.json

And it seems this is not controlling threads, with a new thread added every 3 secs that the script runs:

top - 11:22:25 up 20 days, 21:19, 34 users,  load average: 2.19, 2.23, 2.32
Tasks:  87 total,   1 running,  86 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.6%us,  1.3%sy,  0.0%ni, 96.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  231325120k total, 229465760k used,  1859360k free,  5111920k buffers
Swap:        0k total,        0k used,        0k free, 196023468k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                    
133509 durack1   20   0 32.7g 200m  23m R 88.4  0.1   4:10.45 python make_ohcFromArgo.py
133551 durack1   20   0 32.7g 200m  23m S 42.2  0.1   0:01.76 python make_ohcFromArgo.py
133515 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:06.27 python make_ohcFromArgo.py
133519 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:06.10 python make_ohcFromArgo.py
133521 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:06.15 python make_ohcFromArgo.py
133523 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:06.19 python make_ohcFromArgo.py
133533 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:06.31 python make_ohcFromArgo.py
133547 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:07.13 python make_ohcFromArgo.py
133549 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:07.15 python make_ohcFromArgo.py
133553 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133555 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133557 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133562 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133564 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133566 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133568 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133570 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133572 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133574 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133576 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133579 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133581 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133583 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133585 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133587 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133589 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133591 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133593 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133595 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133597 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133599 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133601 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133603 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133605 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133607 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133609 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133611 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133613 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133615 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133617 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133619 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133621 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133623 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133625 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133627 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133629 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133631 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133633 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133635 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133637 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133639 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133641 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133643 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133645 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133647 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133649 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py
133651 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133653 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133655 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133657 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133659 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133664 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133666 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133668 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133670 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133672 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133674 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133676 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133678 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133680 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133682 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133684 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133686 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133688 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133690 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133692 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133696 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133698 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133700 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133704 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133707 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133709 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133711 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133715 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133720 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133722 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                 
133724 durack1   20   0 32.7g 200m  23m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py

The details above were copied from #264, as that issue is solved by the env variables being set

@durack1 durack1 reopened this Aug 15, 2018
@durack1
Member

durack1 commented Aug 15, 2018

I'm still getting blocked logins because I keep hitting the per-user thread limit of 1024, so I'm currently unable to use CDAT 8.0. I'll roll back to 2.12 so I can continue working.
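
For anyone wanting to check their own limit, here is a minimal sketch. It assumes the limit mentioned above corresponds to RLIMIT_NPROC (the value `ulimit -u` reports), which on Linux counts threads as well as processes; the thread does not confirm which mechanism enforces the 1024 limit on this machine, and the helper name user_task_limit is hypothetical.

```python
import resource


def user_task_limit():
    """Return the (soft, hard) RLIMIT_NPROC values for the current process.

    A value equal to resource.RLIM_INFINITY means no limit is set.
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    soft, hard = user_task_limit()
    print("soft limit: {}, hard limit: {}".format(soft, hard))
```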

@durack1
Member

durack1 commented Aug 15, 2018

@dnadeau4 oh no, bad news: this is also happening with a uvcdat 2.12 environment

(uvcdat2120) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9313]> env | grep NUM_THREADS
OPENBLAS_NUM_THREADS=1
OMP_NUM_THREADS=1
(uvcdat2120) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9313]> 
ls anaconda2/envs/uvcdat2120/conda-meta/*cdms*
anaconda2/envs/uvcdat2120/conda-meta/cdms2-2.12-np113py27_0.json
anaconda2/envs/uvcdat2120/conda-meta/libcdms-2.12-0.json

This is really bad

top - 11:34:39 up 20 days, 21:31, 34 users,  load average: 2.59, 1.78, 1.78
Tasks:  75 total,   2 running,  73 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  1.7%sy,  0.0%ni, 96.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  231325120k total, 229539064k used,  1786056k free,  5112704k buffers
Swap:        0k total,        0k used,        0k free, 196093364k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                       
137392 durack1   20   0 32.1g 195m  19m R 65.3  0.1   3:48.83 python make_ohcFromArgo.py                                                                                    
137418 durack1   20   0 32.1g 195m  19m S 33.0  0.1   0:08.27 python make_ohcFromArgo.py                                                                                    
137420 durack1   20   0 32.1g 195m  19m R 27.1  0.1   0:00.84 python make_ohcFromArgo.py                                                                                    
137400 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:07.39 python make_ohcFromArgo.py                                                                                    
137402 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:07.52 python make_ohcFromArgo.py                                                                                    
137404 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:07.37 python make_ohcFromArgo.py                                                                                    
137406 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:08.45 python make_ohcFromArgo.py                                                                                    
137416 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:07.33 python make_ohcFromArgo.py                                                                                    
137425 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137427 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137429 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137431 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137433 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137435 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137437 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137439 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137441 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137443 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137445 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137447 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137449 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137451 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137453 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137455 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137457 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137459 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137461 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137463 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137466 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137478 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137480 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137482 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137485 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137487 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137489 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137491 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137493 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137495 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137497 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137499 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137501 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137503 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137505 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137507 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137509 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137511 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137513 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137515 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137517 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137519 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137521 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137523 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137525 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137527 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137529 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137531 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137533 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137535 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137537 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137539 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137541 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137543 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137545 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137547 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137549 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137551 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137553 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137555 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137557 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137559 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137561 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137563 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137565 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137567 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py                                                                                    
137569 durack1   20   0 32.1g 195m  19m S  0.0  0.1   0:00.00 python make_ohcFromArgo.py

@zshaheen have you recently updated your conda install? I'm starting to think that it's something to do with that

@durack1
Member

durack1 commented Aug 15, 2018

Ok, painful. The env below is using numpy 1.14.2 py27hdbf6ddf_1 (default conda), and the issue is entirely due to the behaviour of cdat_info when log_anonymously = true:

(cdat80py2) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9224]>
more ~/.uvcdat/.anonymouslog
{
  "log_anonymously": true,
  "last_version_check": [
    "",
    "8.0"
  ],
  "last_time_checked": 1534367206.884015
}

And the thread count after a ~minute - I also note a CPU load of 4300% upon start up (Ref #264):

top - 14:03:36 up 21 days, 0 min, 34 users,  load average: 2.31, 1.18, 0.67
Tasks:  25 total,   1 running,  24 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.5%us,  3.6%sy,  0.0%ni, 94.8%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  231325120k total, 228147356k used,  3177764k free,  5139360k buffers
Swap:        0k total,        0k used,        0k free, 194621712k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                  
 10469 durack1   20   0 24.2g 180m  22m R 97.4  0.1   1:14.64 python                                                   
 10608 durack1   20   0 24.2g 180m  22m D 12.0  0.1   0:00.36 python                                                   
 10610 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10613 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10616 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10618 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10621 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10623 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10625 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10627 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10629 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10631 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10633 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10635 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10637 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10639 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10641 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10643 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10645 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10647 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10649 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10651 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10653 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10655 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python                                                   
 10657 durack1   20   0 24.2g 180m  22m S  0.0  0.1   0:00.00 python 

And with log_anonymously = false:

(cdat80py2) duro@ocean:[180606_PaperPlots_UpperDeepWarming]:[9225]>
more ~/.uvcdat/.anonymouslog
{
  "log_anonymously": false,
  "last_version_check": [
    "",
    "8.0"
  ],
  "last_time_checked": 1534367252.604961
}

And the thread count after a ~minute:

top - 14:09:02 up 21 days, 6 min, 34 users,  load average: 1.40, 1.69, 1.06
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.1%us,  1.7%sy,  0.0%ni, 97.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  231325120k total, 228520972k used,  2804148k free,  5143648k buffers
Swap:        0k total,        0k used,        0k free, 195410568k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                  
 13607 durack1   20   0  640m 128m  21m R 99.8  0.1   0:25.71 python

And for completeness, here is the output of conda list:
cdat8py2-condaList.txt
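
To check which of the two cases above applies on a given account without opening the file by hand, something like the following could be used. It is a sketch that assumes the file is plain JSON with a "log_anonymously" key, as in the listings above; the helper name read_log_anonymously is hypothetical and is not a cdat_info function.

```python
import json
import os


def read_log_anonymously(path=None):
    """Return the stored "log_anonymously" flag, or None if no file exists."""
    if path is None:
        path = os.path.expanduser("~/.uvcdat/.anonymouslog")
    if not os.path.isfile(path):
        return None
    with open(path) as handle:
        return json.load(handle).get("log_anonymously")


if __name__ == "__main__":
    print("log_anonymously: {}".format(read_log_anonymously()))
```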

@durack1
Member

durack1 commented Aug 15, 2018

@dnadeau4 @zshaheen @doutriaux1 the problem with cdat_info should be fixed; these rogue threads that don't clean themselves up are really causing problems

Closing this issue as it's resolved, but a new cdat_info issue should be opened

@durack1 durack1 closed this as completed Aug 15, 2018
@zshaheen
Contributor Author

@durack1 Okay, that makes sense...

For our stuff, we set the following environment variables before each run, ensuring that all is okay.

import os
# Must be done before any CDAT library is called.
if 'UVCDAT_ANONYMOUS_LOG' not in os.environ:
    os.environ['UVCDAT_ANONYMOUS_LOG'] = 'no'
# Used by numpy, causes too many threads to spawn otherwise.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

@durack1
Member

durack1 commented Aug 15, 2018

@zshaheen this should just work out of the box, rather than having custom environment variables to be generated. The behaviour of cdat_info needs real work. I created CDAT/cdat#2213 to attempt to deal with this issue

@dnadeau4
Contributor

@zshaheen note that if you answer yes/no once, the file ~/.uvcdat/.anonymouslog is created. That file supersedes your environment variable until it is deleted.
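
Going by the description above, a script that relies on UVCDAT_ANONYMOUS_LOG may want to remove a stale answer file before importing any CDAT library. A minimal sketch, with a hypothetical helper name; whether the variable is then honoured depends on cdat_info behaving as described in this thread.

```python
import os


def remove_anonymous_log(path=None):
    """Delete the stored answer file; return True if a file was removed."""
    if path is None:
        path = os.path.expanduser("~/.uvcdat/.anonymouslog")
    try:
        os.remove(path)
    except FileNotFoundError:
        # Nothing stored, so there is nothing that could override the variable.
        return False
    return True
```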

@zshaheen
Contributor Author

@dnadeau4 Oh wow, that shouldn't really happen 😞
