This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Problem: Docker image uses CUDA 7.5, and host system driver 367.44 requires CUDA 8 #237

Closed
thommiano opened this issue Nov 3, 2016 · 8 comments

Comments

@thommiano

thommiano commented Nov 3, 2016

I'm trying to run an image that uses CUDA 7.5, but my host driver is 367.44 for a GTX 1070. I'm getting the error Value 'sm_61' is not defined for option 'gpu-architecture', which suggests that my host driver is incompatible with the CUDA version in the image. I was using nvidia-docker rather than plain docker to run this image, but it still returned the error.

I thought nvidia-docker was supposed to solve this problem? Am I doing something wrong (e.g., do I need a driver compatible with CUDA 7.5 in the image), or is this simply not possible, and I will need a CUDA 7.5-compatible driver on my host?

Specifically, I'm trying to run the alexjc/neural-doodle:gpu image. https://github.com/alexjc/neural-doodle/issues/96
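For reference, the failure can be reproduced with just the published image; the error line below is paraphrased from the report above, so treat this as a sketch rather than verbatim console output:

    docker pull alexjc/neural-doodle:gpu
    nvidia-docker run alexjc/neural-doodle:gpu
    # fails while Theano compiles kernels inside the container:
    # nvcc fatal : Value 'sm_61' is not defined for option 'gpu-architecture'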

@3XX0
Member

3XX0 commented Nov 3, 2016

Pascal support is only available starting with CUDA 8.0.
Using the (default) CUDA 8.0 image should fix your issue.

@thommiano
Author

Ok. So does that mean I won't be able to run an image that uses anything below CUDA 8.0?

@flx42
Member

flx42 commented Nov 3, 2016

Not necessarily, it depends on how the project was compiled and which libraries it's using.

In CUDA we have an assembly language called PTX. If your app was compiled with CUDA 7.5 but with all of its CUDA code bundled as PTX, then at runtime the code will be JIT-compiled by the NVIDIA driver for your new architecture. If your code was compiled to binary code only (e.g. sm_52), then it's not forward compatible.

cuDNN, however, doesn't ship PTX for most of its algorithms, so in that case it won't work.
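To illustrate the distinction (an aside, not part of the original comment): nvcc's -gencode flag controls whether the binary embeds SASS for a specific GPU, PTX that the driver can JIT for newer GPUs, or both. kernel.cu here is a placeholder source file.

    # SASS for sm_52 only: will not run on a Pascal (sm_61) GPU
    nvcc -gencode arch=compute_52,code=sm_52 kernel.cu -o app

    # SASS for sm_52 plus PTX for compute_52: the driver can JIT the PTX
    # for sm_61, so the binary stays forward compatible
    nvcc -gencode arch=compute_52,code=sm_52 \
         -gencode arch=compute_52,code=compute_52 kernel.cu -o app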

Anyway, I think your problem is just that you aren't using the nvidia-docker wrapper when launching the GPU app, that's all.

@flx42
Member

flx42 commented Nov 3, 2016

Sorry, I read too fast. You did try with nvidia-docker, apparently. The problem is that Theano does its own JIT compilation at runtime: it detects that you have an sm_61 GPU, so it calls nvcc with sm_61.
This problem is specific to Theano. With other ML frameworks, you can specify at build time which architectures to compile against.
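The underlying failure is easy to reproduce without Theano, since CUDA 7.5's nvcc simply does not know the sm_61 target (a sketch; t.cu is a throwaway file):

    # inside the CUDA 7.5 container
    nvcc --version                  # reports release 7.5
    echo '__global__ void k() {}' > t.cu
    nvcc -arch=sm_61 -c t.cu        # nvcc fatal : Value 'sm_61' is not defined for option 'gpu-architecture'
    # the same command succeeds with CUDA 8.0, which added Pascal (sm_60/sm_61) support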

@thommiano
Author

thommiano commented Nov 3, 2016

Ok, thanks for the feedback. Do you have any suggestions for the best way to move forward? From what I've read on similar problems (and as @3XX0 suggests) it seems like updating the image to CUDA 8 might work.

I haven't done this before... would I just edit the Docker files (update to CUDA 8 in install-cuda-drivers-ubuntu-14.04.sh and to cuDNN 5 in docker-gpu.df) and create a new image?

@flx42
Member

flx42 commented Nov 3, 2016

Yes, try to modify the first line to FROM nvidia/cuda:8.0-cudnn5-devel
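Assuming docker-gpu.df is the GPU Dockerfile mentioned above, the edit and rebuild could look roughly like this (the image tag is arbitrary):

    # point the GPU Dockerfile at the CUDA 8.0 + cuDNN 5 base image
    sed -i '1s|.*|FROM nvidia/cuda:8.0-cudnn5-devel|' docker-gpu.df

    # rebuild and run through the NVIDIA wrapper
    docker build -t neural-doodle:gpu-cuda8 -f docker-gpu.df .
    nvidia-docker run neural-doodle:gpu-cuda8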

@thommiano
Author

thommiano commented Nov 3, 2016

I cloned the source via git clone from GitHub, rather than doing it through Docker, because I couldn't quite figure out whether there was a way to do that. I made the following changes:

I then pushed it up to my remote, built the Docker image, and pushed that to Docker Hub.

Now when I run nvidia-docker run socraticdatum/neural-doodle I get the following:

Neural Doodle for semantic style transfer.
  - Using device `gpu` for processing the images.
Traceback (most recent call last):
  File "doodle.py", line 657, in <module>
    generator = NeuralGenerator()
  File "doodle.py", line 234, in __init__
    self.style_img_original, self.style_map_original = self.load_images('style', args.style)
  File "doodle.py", line 288, in load_images
    basename, _ = os.path.splitext(filename)
  File "/usr/lib/python3.4/posixpath.py", line 122, in splitext
    return genericpath._splitext(p, sep, None, extsep)
  File "/usr/lib/python3.4/genericpath.py", line 118, in _splitext
    sepIndex = p.rfind(sep)
AttributeError: 'NoneType' object has no attribute 'rfind'

I think this is at least an improvement, because it's now saying Using device `gpu` for processing the images, but I'm not quite sure how to fix the rfind problem. Any ideas on this?

Seems like a Python issue now, so perhaps more appropriate to pose this question somewhere else.
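For what it's worth (not part of the original thread): the traceback shows load_images('style', args.style) receiving None, i.e. os.path.splitext(None), which suggests the container was started without the script's image arguments. If doodle.py follows the flags implied by the traceback and the project README, an invocation might look like the sketch below; the mount point and file names are placeholders.

    # mount a host directory with the style image and pass it to doodle.py
    nvidia-docker run -v "$(pwd)/samples:/nd/samples" socraticdatum/neural-doodle \
        --style /nd/samples/style.png --output /nd/samples/output.png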

@flx42
Member

flx42 commented Nov 7, 2016

Yeah, it's clearly a Python issue now :) Closing this.

@flx42 flx42 closed this as completed Nov 7, 2016