nvidia-docker fails on EC2 after restarting instance #137
Comments
We ran into it as well. For some reason, after the VM restarts the kernel can change slightly and nouveau gets loaded by default (in the initramfs). The best way I know of to fix it:
1. Upgrade the machine
2. Blacklist nouveau
3. Reboot (just in case) and reinstall the drivers with DKMS

From now on it should be fine. I will probably update the doc when I know for sure what's happening.
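A minimal sketch of those steps for an Ubuntu-based instance (the blacklist file name and the driver package version are assumptions; use whatever matches your AMI and driver):

```sh
# Upgrade so the installed kernel matches the one that boots next time
sudo apt-get update && sudo apt-get -y dist-upgrade

# Blacklist nouveau and rebuild the initramfs so it is not loaded at boot
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u

# Reboot (just in case), then reinstall the driver with DKMS so it is
# rebuilt automatically for future kernels (package name is an assumption)
sudo reboot
# ...after the reboot:
sudo apt-get install -y --reinstall nvidia-367
```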
Yeah, created a couple of machines and blacklisting nouveau works. 👍 on adding it to the docs.
Is it correct that this issue is closed? It is not documented in https://github.com/NVIDIA/nvidia-docker/wiki/Deploy-on-Amazon-EC2.
@3XX0 ping^ Thanks for your help!
The documentation is right, but depending on the AMI used you might want to restart the instance after creating it. For example, some Ubuntu AMIs have been snapshotted with a running kernel different from the one that will be used at the next reboot.
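One way to check whether you are in that situation is to compare the running kernel against the newest one installed on disk (a rough sketch; the `/boot/vmlinuz-*` naming is Ubuntu-specific):

```sh
# Kernel currently running
uname -r

# Newest kernel installed on disk; if it differs from `uname -r`,
# the next reboot will switch kernels and the driver must be rebuilt
ls /boot/vmlinuz-* | sort -V | tail -n 1
```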
@christinakayastha I elaborated a bit on the installation for a base AMI ami-40d28157 (Ubuntu server 16.04 LTS) here:
ahhh gotcha, thanks a ton!
I added a
I'm following the instructions on how to Deploy on Amazon EC2. Right after the GPU instance creation I test:

```
nvidia-docker run --rm nvidia/cuda nvidia-smi
```

and everything works fine. Then I stop the instance:

```
docker-machine stop aws01
```

start it again:

```
docker-machine start aws01
```

and test again. This time it fails. Is this expected behavior?
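When it fails after the restart, a quick way to confirm the cause described in the comments above (a sketch; assumes the docker-machine name `aws01` from the commands above):

```sh
# Check which GPU kernel module is loaded on the restarted instance;
# nouveau present (or nvidia absent) means the driver must be
# reinstalled against the new kernel
docker-machine ssh aws01 "lsmod | grep -E 'nouveau|nvidia'"
```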