Running the SCNN docker on AWS #11

Open
bhargav253 opened this issue Jun 13, 2019 · 3 comments

Comments

@bhargav253

I am trying to run the Docker container on an AWS p2 instance with one Tesla K80 card.
The default configuration of the instance is:
CUDA: 10.1
Driver: 418.67

I tried following the instructions at this link to manually install a different NVIDIA driver version:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html

But I am unable to install the specified driver version, "367.57".
The wget command in those instructions fails, saying that this driver version is unavailable for the Tesla series.

When I try to run the container with the default driver version, it fails on a CUDA call and complains about the driver version.

Have you tried running it recently on AWS? Have you faced similar issues?

@bhargav253
Author

I keep getting this error when I run nvidia-smi to check the GPU after launching the container:

root@c107a0693ba7:~/scnn# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

The Docker container is unable to use the underlying GPU on my AWS instance. It complains about a driver version mismatch.
I am unable to roll back the NVIDIA driver on my instance to exactly match the version mentioned.
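
For anyone trying to confirm the same mismatch: NVML reports "Driver/library version mismatch" when the user-space NVIDIA library it loads does not match the kernel module that is loaded on the host. Below is a minimal Python sketch that compares the two from inside the container. It assumes the kernel module version is readable from /proc/driver/nvidia/version and that the container ships a versioned libnvidia-ml.so.X.Y file in one of a few common library directories. The directory list and the function names are my own assumptions, not anything taken from the SCNN image.

```python
import re
from pathlib import Path

# Assumed (hypothetical) locations where a versioned libnvidia-ml may live.
LIB_DIRS = ["/usr/lib", "/usr/lib64", "/usr/lib/x86_64-linux-gnu",
            "/usr/local/nvidia/lib64"]

KERNEL_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")
# Requires at least one dot, so the "libnvidia-ml.so.1" symlink is skipped.
LIB_RE = re.compile(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$")


def kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = KERNEL_RE.search(proc_text)
    return match.group(1) if match else None


def library_versions(filenames):
    """Collect version suffixes from file names like libnvidia-ml.so.367.57."""
    versions = set()
    for name in filenames:
        match = LIB_RE.search(name)
        if match:
            versions.add(match.group(1))
    return versions


def diagnose(proc_text, filenames):
    """Return a one-line summary of whether kernel module and library agree."""
    kernel = kernel_module_version(proc_text)
    libs = library_versions(filenames)
    if kernel is None:
        return "no NVIDIA kernel module version found"
    if not libs:
        return f"kernel module {kernel}, no versioned libnvidia-ml found"
    if libs == {kernel}:
        return f"match: {kernel}"
    return f"mismatch: kernel module {kernel}, libraries {sorted(libs)}"


if __name__ == "__main__":
    proc = Path("/proc/driver/nvidia/version")
    proc_text = proc.read_text() if proc.exists() else ""
    names = [p.name
             for d in LIB_DIRS if Path(d).is_dir()
             for p in Path(d).glob("libnvidia-ml.so.*")]
    print(diagnose(proc_text, names))
```

If this reports a mismatch, the image most likely bundles its own driver libraries, and those would need to match the host driver (418.67 on my instance) for nvidia-smi to work inside the container.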

Apparently there are container best practices, like those in the link below, that describe how to avoid these exact driver incompatibility issues when composing Docker containers.

https://hackernoon.com/docker-compose-gpu-tensorflow-%EF%B8%8F-a0e2011d36

@cooperlab
Contributor

We are going to re-package this to avoid the driver/library conflicts. It won't happen immediately but is on our short list of things to do.

@pranjalvaidya

Any update on this matter? We have been really interested in using the model; however, we've experienced some issues related to libraries and drivers.

@pooyam pooyam closed this as completed Feb 27, 2020
@pooyam pooyam reopened this Feb 27, 2020