Multi-GPU PyTorch example freezes docker containers #1010
Comments
Hello @RenaudWasTaken, long time no see! Waiting for your awesomeness haha ;=)
Seems to be working with the latest pytorch image. Can you maybe provide an strace log?
Hi @RenaudWasTaken, I hope you are doing well :)
Your Dockerfile also appears to be incorrect; when I try to run it, torch is not installed:
Can you provide a container that reproduces the issue?
@Dubrzr Hello, sir. I am having the same problem. Did you fix it? I have been searching Google for two days for a solution, but no luck so far.
@itsnickyang I reproduced the error outside of Docker containers: the process freezes and is not killable. I created an issue in the PyTorch repo (pytorch/pytorch#24081), but I don't know whether this is a PyTorch or an NVIDIA problem.
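A hung process that ignores SIGKILL is typically stuck in uninterruptible sleep (the "D" state). A quick, generic way to confirm this (a diagnostic sketch, not taken from the original thread) is to check the process state with `ps`:

```shell
# Show the state of a process: a "D" in the STAT column means
# uninterruptible sleep, which is why SIGKILL has no effect.
# $$ (the current shell) is used here as a stand-in; replace it
# with the PID of the frozen Python process.
ps -o pid,stat,wchan:32,cmd -p $$
```

The `wchan` column shows the kernel function the process is blocked in, which is often the most useful clue when the hang is in a driver.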
So closing as not a container issue :D ! |
1. Issue or feature description
We have 8x P40 GPUs, all mounted inside multiple Docker containers running JupyterLab via nvidia-docker2.
When one person tries to use multiple GPUs for machine learning, it freezes all Docker containers on the machine. The affected containers cannot be restarted. I don't know whether this only concerns containers that mount NVIDIA GPUs, but I believe so.
We tried to restart the Docker daemon, but it cannot exit.
The only solution is to reboot the machine.
2. Steps to reproduce the issue
Inside a Docker container with multiple GPUs (here multiple P40), run the following Python3 code:
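The original Python snippet is not preserved in this copy of the issue. A minimal multi-GPU PyTorch script of the kind described (an illustrative reconstruction, not the reporter's exact code; the model and tensor shapes are hypothetical) might look like:

```python
import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # A tiny fully connected model, just enough to exercise the GPUs.
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))


def main() -> None:
    model = make_model()
    x = torch.randn(32, 64)
    if torch.cuda.device_count() > 1:
        # Replicating the model across all visible GPUs is the kind of
        # step that reportedly hangs on the 8x P40 machine.
        model = nn.DataParallel(model).cuda()
        x = x.cuda()
    out = model(x)
    print(out.shape)


if __name__ == "__main__":
    main()
```

On a single-GPU or CPU-only machine this runs to completion; the freeze described above only manifests when the model is replicated across multiple GPUs.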
3. Information to attach (optional if deemed irrelevant)