tensorflow-gpu #380

Closed
sathiez opened this issue Jan 11, 2019 · 13 comments

Comments

@sathiez

sathiez commented Jan 11, 2019

By default the s2i wrapper uses the CPU build of TensorFlow. How can I use the GPU build instead?

@ukclivecox
Contributor

We don't have a particular example showing how to include the appropriate GPU version of the TensorFlow libraries. However, did you not manage to get it working as discussed in #215?

@sathiez
Author

sathiez commented Jan 11, 2019

Nope. I used the seldonio/core-python-wrapper:0.7 wrapper before.

@ukclivecox
Contributor

You should be able to use the steps described in #215 to create a custom assemble script in s2i, which will let you install whatever libraries you need.

@sathiez
Author

sathiez commented Jan 13, 2019

How do I set an environment variable in s2i? If I set it in .s2i/environment, or in the assemble script with "export", it is not reflected in the built image.

@sathiez
Author

sathiez commented Jan 13, 2019

The environment variable is now working fine, but when I run TensorFlow inside the s2i image I get "no NVIDIA GPU device is present: /dev/nvidia0 does not exist", whereas I can successfully build and run TensorFlow in a normal Docker image using nvidia as the runtime.

@ukclivecox
Contributor

Are you sure you have access to the NVIDIA drivers from the container, e.g. for GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers

@ukclivecox
Contributor

or for example: https://github.com/NVIDIA/nvidia-docker

@sathiez
Author

sathiez commented Jan 13, 2019

Actually, when I run the s2i-built Docker image with "--runtime nvidia", the NVIDIA drivers are not made available in the container, but I can run other Docker images with "--runtime nvidia" successfully.
I believe the Seldon Core wrapper is built on Debian stretch, and my host OS is Ubuntu 18.04 with NVIDIA driver 390.87. Could this cause an error?
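
For readers debugging the same symptom, a minimal sketch of a check that can be run inside the container to see what the runtime actually exposed. It assumes the NVIDIA container runtime, which, as far as I know, decides what to inject based on the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables; those names and the helper `gpu_container_report` are not from this thread and are used here for illustration only.

```python
import glob
import os


def gpu_container_report(environ=None, dev_dir="/dev"):
    """Collect GPU-related facts visible from inside the current container.

    `gpu_container_report` is a hypothetical helper for this sketch.
    The NVIDIA_* variable names are an assumption about how the NVIDIA
    container runtime is configured, not something stated in the thread.
    """
    environ = os.environ if environ is None else environ
    return {
        # Unset (None) would mean the image never asked the runtime for GPUs.
        "NVIDIA_VISIBLE_DEVICES": environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        # Where the dynamic loader will look for libcuda and the CUDA libraries.
        "LD_LIBRARY_PATH": environ.get("LD_LIBRARY_PATH"),
        # Device nodes such as /dev/nvidia0; an empty list matches the
        # "/dev/nvidia0 does not exist" message quoted above.
        "device_nodes": sorted(glob.glob(os.path.join(dev_dir, "nvidia*"))),
    }


if __name__ == "__main__":
    for key, value in gpu_container_report().items():
        print(f"{key}: {value}")
```

Running the same script in the s2i-built image and in an image that works with the nvidia runtime, then comparing the two outputs, may show whether the difference lies in the image's environment rather than in the driver version.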

@sathiez sathiez closed this as completed Jan 13, 2019
@sathiez sathiez reopened this Jan 13, 2019
@sathiez
Author

sathiez commented Jan 15, 2019

Does the seldonio/seldon-core-s2i-python3:0.4 image support the nvidia-390 driver? If not, which driver version does it support that is compatible with CUDA 9.0?

@ukclivecox
Contributor

You may need to replace the CPU TensorFlow library with the GPU one.
What issues are you having exactly?

@sathiez
Author

sathiez commented Jan 15, 2019

I already have TensorFlow GPU in my Seldon image. My issue is this: TensorFlow GPU depends on CUDA 9.0, which in turn depends on the nvidia-390 driver, and my host system has the nvidia-390 driver. When I build my TensorFlow model with seldonio/seldon-core-s2i-python3:0.4 using s2i and run the image, I get the following error while importing TensorFlow:

2019-01-14 10:58:31.828997: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-14 10:58:31.834895: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: UNKNOWN ERROR (-1)
2019-01-14 10:58:31.834941: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: 7a52b07f170d
2019-01-14 10:58:31.834949: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: 7a52b07f170d
2019-01-14 10:58:31.835018: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2019-01-14 10:58:31.835050: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.87.0

This error is probably due to an NVIDIA driver version mismatch. So, as asked in #380 (comment), which version of the NVIDIA driver does seldonio/seldon-core-s2i-python3:0.4 support?
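
The log above reports both a failed `cuInit` and that `libcuda.so` could not be found. A rough sketch of how the same two conditions could be checked without TensorFlow, using only the standard library `ctypes` module, follows. The helper name `probe_libcuda` and the fallback file name `libcuda.so.1` are assumptions for illustration; `cuInit` and `cuDriverGetVersion` are functions of the CUDA driver API.

```python
import ctypes
import ctypes.util


def probe_libcuda():
    """Try to load the CUDA driver library and initialise it.

    `probe_libcuda` is a hypothetical helper for this sketch. It only
    reports what it finds and never raises when the library is missing.
    """
    result = {"library": None, "loaded": False, "cuInit": None, "driver_version": None}

    # find_library returns a file name such as "libcuda.so.1", or None when
    # the loader cannot locate it; the fallback name is an assumption.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    result["library"] = name

    try:
        lib = ctypes.CDLL(name)
    except OSError:
        # Same situation as "unable to find libcuda.so" in the log above.
        return result

    result["loaded"] = True
    # cuInit(0) returns 0 (CUDA_SUCCESS) when the driver initialises.
    result["cuInit"] = lib.cuInit(0)
    if result["cuInit"] == 0:
        version = ctypes.c_int(0)
        if lib.cuDriverGetVersion(ctypes.byref(version)) == 0:
            result["driver_version"] = version.value
    return result


if __name__ == "__main__":
    print(probe_libcuda())
```

If `loaded` comes back `False` inside the container, the driver library is not visible to the process at all, which would point at how the container was started or configured rather than at a specific driver version.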

@ukclivecox
Contributor

There should be no restrictions on the version of the NVIDIA driver.

@sathiez
Author

sathiez commented Jan 16, 2019

Thank you, your comment was useful. At least I now know this may not be the issue.

@sathiez sathiez closed this as completed Jan 16, 2019
agrski pushed a commit that referenced this issue Dec 2, 2022
* intial namespaced mode operator work

* remove namespace from yamls

* regenerate with hodometer changes

* first draft to remove namespace seldon-mesh

* remove namespace from resources

* separate helm chart for servers

* rerun notebooks

* update docs

* remove namespace added by controller gen