CPU resources, htop shows all 12 cores being used for example_1.py #72
Comments
I ran example_1.py on an Ubuntu machine which has no GPU card installed and saw 8 "python examples/example_1.py" processes.
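Process counts like the "8 processes" above can be checked programmatically rather than eyeballed in htop. A minimal stdlib sketch (Linux `/proc` walk; the helper name `count_procs` is made up for illustration, and psutil offers the same more portably):

```python
import os

def count_procs(needle: str) -> int:
    """Count running processes whose command line contains `needle`."""
    if not os.path.isdir("/proc"):  # non-Linux fallback
        return 0
    hits = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            # /proc/<pid>/cmdline is NUL-separated argv
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited while we were scanning
        if needle in cmdline:
            hits += 1
    return hits

print(count_procs("examples/example_1.py"))
```

Running this while the example trains would report how many matching processes exist at that moment.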
Hi! Sorry it took a while to reply. My first guess as to what's happening here is that we're seeing some multithreading from inside PyTorch (such as for BLAS), although I'm not sure why that would be so busy if you're using the GPU. Try putting `torch.set_num_threads(1)` at the start of the script. The Atari env is single-threaded, and example_1.py only runs one Python process, so the rest of the code should really only use 1 core.
Thanks @astooke! I did another test where I took example_5 and made one change, replacing [...]. Let me now do some testing with your suggestion of changing the num threads.

Update 1: I was going to do some experimentation, but saw @codelast put some new stuff below.
Update 2: Actually, I might have gotten confused in [...].
Update 3: Figured out my prior question.
@astooke I tried using torch.set_num_threads(1) at the very beginning of the main function in example_1.py. That did reduce the process count, to 3 (still not one), and I found that the %CPU usage of the 3 processes varies greatly (via the "htop" command): the highest is about 95%, the 2nd is about 20%, and the lowest is only about 0.6%. They seem to be doing similar jobs, but process 1 spends a lot of time in "forward", process 2 spends a lot of time in "backward", and there is no obvious pattern for process 3.
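Besides calling `torch.set_num_threads(1)` inside the script, the BLAS/OpenMP thread pools that PyTorch uses can also be capped via environment variables, provided they are set before torch (or numpy) is imported. A sketch, assuming an OpenMP/MKL-backed build; the variable names are the standard OpenMP/MKL knobs, and the value "1" is just the single-thread case discussed here:

```python
import os

# Cap BLAS/OpenMP thread pools; must happen before `import torch`
# (the pools are sized at library load time).
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# Importing torch after this point would give a process whose
# intra-op thread pool is limited to a single thread.
```

Setting these in the shell (`OMP_NUM_THREADS=1 python examples/example_1.py`) has the same effect without editing the script.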
Well, it's interesting to see the separate processes! I don't know what's going on, but it looks possible this is still something happening inside PyTorch. So within rlpyt you can develop as a serial program.
I've also noticed sometimes that multi-threading in PyTorch doesn't fully obey the cpu affinity set with psutil, but it seems that using `torch.set_num_threads(1)` [...]. Good call on not needing the cuda_idx arg; that might just be a feature of the example? In MinibatchRlBase.startup() it uses [...].
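On the affinity side, here is a minimal sketch of pinning a process to specific cores using only the standard library (Linux-only; psutil's `Process.cpu_affinity()` does the same across more platforms). The two-core subset is an arbitrary example, not a value rlpyt chooses:

```python
import os

# Cores this process is currently allowed to run on (Linux-only API).
allowed = sorted(os.sched_getaffinity(0))

# Pin to a subset, e.g. the first two allowed logical cores.
subset = set(allowed[:2])
os.sched_setaffinity(0, subset)

# The kernel enforces this for the process's threads, though as noted
# above, library thread pools sized before pinning may still oversubscribe
# the chosen cores rather than spread onto others.
print(sorted(os.sched_getaffinity(0)))
```

Child processes forked after the call inherit the affinity mask, which is one way to keep a multi-process run inside a fixed core budget.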
Just wondering, @astooke, was there any update on understanding how to control CPU resources? I'm asking because my code has been using more CPUs than specified, and that has caused some machines and/or scripts to crash or hang.
I haven't tried with newer versions of PyTorch since 1.2, but my experience is still that using `torch.set_num_threads(1)` keeps the CPU usage under control.
Hope that helps! And if that doesn't work, please let us know; that would be a surprising problem.
I just ran some tests today, and indeed using `torch.set_num_threads(1)` limits the CPU usage as expected.
I'll close this for now, and in experiments I will just put `torch.set_num_threads(1)` at the top of the script.
Thanks for the great library! I am running some tests to benchmark performance. Since I hope to be running these using many CPUs, I want to understand how many CPUs a script will consume. I installed the repository as of today and ran `python examples/example_1.py`.
I'm running this on a machine with a single GPU, and an i7-8700k CPU with 12 logical cores. I assume I'm not using the GPU in the above command.
In a separate tab, `htop` regularly shows all 12 CPUs on my machine being used.
When I run [...], I don't generally see all 12 CPUs being used, but I may get something like 6 CPUs in use.
Just wondering: the documentation of `example_1.py` says [...]. However, I assumed "one python process = one core". Perhaps this is not the right way to think about it. Is there a way to roughly estimate how many CPUs (or "cores"; I use the terms interchangeably) will be used for a given training run?
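One way to frame the estimate: since a single Python process can occupy many cores through library threads, "one process = one core" under-counts; a rough upper bound is processes × intra-op threads per process, capped at the machine's logical core count. A stdlib-only sketch, where `n_processes` and `threads_per_process` are illustrative placeholders rather than values read from rlpyt:

```python
import os

logical_cores = os.cpu_count()  # e.g. 12 on an i7-8700k

# Placeholder inputs for the estimate (set per your run):
n_processes = 1                       # e.g. one sampler/optimizer process
threads_per_process = logical_cores   # uncapped BLAS pools default to all cores

# Rough upper bound on cores a training run can occupy.
estimate = min(logical_cores, n_processes * threads_per_process)
print(estimate)
```

With `torch.set_num_threads(1)`, `threads_per_process` drops to roughly 1 and the estimate collapses to about the number of processes, which matches the behavior reported earlier in this thread.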