This repository has been archived by the owner on Jul 10, 2023. It is now read-only.

Intel GPUs is not working in openvisual cloud #56

Closed
Gsarg18 opened this issue Feb 12, 2021 · 14 comments

Gsarg18 commented Feb 12, 2021

I tried an Intel® UHD Graphics 630 (CFL GT2) GPU with an Intel® Core™ i7-8700 CPU @ 3.20GHz × 12 on video-analytics-serving (after changing the Docker base image from Xeon to XeonE3), and it works. But when I tried to run the Smart City sample on the GPU, it does not work. I also raised the issue here: OpenVisualCloud/Smart-City-Sample#736
I also tried changing the VA Serving version from 0.3.0-alpha to 0.3.1.1-alpha in the Smart City sample, and it still does not work.

Gsarg18 changed the title from "Intel GPU is not working in openvisual cloud" to "Intel GPUs is not working in openvisual cloud" on Feb 12, 2021

nnshah1 commented Feb 12, 2021

@Gsarg18 Can you post the VA Serving log? You can increase the log level with an environment variable (LOG_LEVEL=DEBUG). The Open Visual Cloud Dockerfiles use different driver versions than the default VA Serving container, so if standalone VA Serving works and the Open Visual Cloud base image does not, I suspect a difference in dependencies.
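
For a container started directly with docker run, the log level could be raised roughly as follows. This is a minimal sketch: the image name is a placeholder (an assumption, not a name from this thread), and only the LOG_LEVEL=DEBUG variable comes from the comment above.

```shell
# Placeholder image name (an assumption): substitute the VA Serving image you built.
IMAGE="${IMAGE:-va-serving-example}"

# -e passes LOG_LEVEL=DEBUG into the container environment.
CMD="docker run --rm -e LOG_LEVEL=DEBUG $IMAGE"

# Print the command for review; run it yourself once the image name is right.
echo "$CMD"
```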

Gsarg18 commented Feb 15, 2021

I have attached the VA Serving log:
gpu_pipeline.log

nnshah1 commented Feb 15, 2021

@Gsarg18 As confirmation: did you run the same VA Serving image (built on the XeonE3 base image from the Open Visual Cloud Dockerfiles) outside Open Visual Cloud, and is it working?

Or did you build a VA Serving image from the VA Serving GitHub repository?

If it is the second, then I strongly suspect the dependencies in the base image.

Can you provide the build command and output you used to create the VA Serving image?

whbruce commented Feb 17, 2021

Please give the output of the following. No output means that GPU cannot be detected.

$ docker run -it --device /dev/dri  --entrypoint /bin/bash openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l"

Gsarg18 commented Feb 19, 2021

This is the output of above command:

Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) Gen9 HD Graphics NEO

whbruce commented Feb 19, 2021

Thanks for the quick response.

  1. The clinfo output shows that the container can access the GPU. This is good news!
  2. Your docker log does not show any errors. Can you clarify what you mean by "not working"?
  3. Note that GPU inference takes ~30s to respond to the first request.
  4. Please answer @nnshah1's question: how did you build the VA Serving container?
  5. Please update to the latest VA Serving version, v0.4.1.

Gsarg18 commented Feb 19, 2021

Sorry for the late response, @nnshah1.
I ran VA Serving on the GPU after replacing the Open Visual Cloud Xeon base image with the XeonE3 image (./docker/build.sh --base openvisualcloud/xeone3-ubuntu1804-analytics-gst), and it works. Then I tried to run the same XeonE3 image in the Smart City sample with the latest VA Serving version, and it does not work.

@whbruce Logs of Smart City with GPU and VA Serving v0.3.1.1-alpha:
GPU_error

nnshah1 commented Feb 20, 2021

I believe I understand what might be happening:

Docker swarm does not support the 'device' option or privileged mode. To get this in swarm you have to use a special container image with Docker runtime client support that can launch a container with privileges. This is how the VCAC-A deployment scripts are set up. Within the analytics folder you can find run-container.sh in the vcac-a subfolder.

This would explain why the same image works as expected when run with the Video Analytics Serving run scripts, as those also use docker run directly.

TL;DR: you will need to create and run a container launcher within swarm to access the iGPU hardware.
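
The launcher idea can be sketched as a small script that a swarm service runs in place of the analytics container itself. This is an illustration only, not the repository's run-container.sh: the image name and the DRY_RUN switch are hypothetical, and it assumes the launcher container has the Docker CLI installed and the host's /var/run/docker.sock mounted.

```shell
# Hypothetical placeholder for the analytics image the launcher should start.
ANALYTICS_IMAGE="${ANALYTICS_IMAGE:-analytics-example}"
# Hypothetical switch: default to printing the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# A plain "docker run" accepts --device, so the launched container sees /dev/dri.
LAUNCH="docker run --rm --device /dev/dri $ANALYTICS_IMAGE"

if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $LAUNCH"
else
    $LAUNCH
fi
```

Because the launcher talks to the host's Docker daemon, the analytics container it starts is a sibling on the host rather than a task managed by swarm.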

Gsarg18 commented Feb 22, 2021

@nnshah1, we are using a Kubernetes deployment, not Docker swarm. How do we make these changes in Kubernetes?

nnshah1 commented Feb 22, 2021

@Gsarg18, for Kubernetes, I believe you can designate a pod as "privileged". You should be able to deploy the analytics container as a privileged pod. https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers

@xwu2git, In order to run the analytics container on VCAC-A (with access to GPU) within Kubernetes do we use privileged pods or do we use the same technique as in docker swarm (i.e. a container that launches another container?)

xwu2git commented Feb 22, 2021

For GPUs, you can either use a privileged pod or install the GPU device plugins.
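
The privileged-pod option could look roughly like the manifest written below. This is a minimal sketch, not the Smart City sample's actual deployment: the pod name and file name are placeholders, and the image and the clinfo check are borrowed from the earlier comment in this thread.

```shell
# Write a minimal Pod manifest that runs one privileged container.
# Assumptions: pod name and file name are placeholders chosen for illustration.
cat > analytics-gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: analytics-gpu
spec:
  restartPolicy: Never
  containers:
    - name: analytics
      image: openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10
      # Same GPU visibility check as the docker run command earlier in the thread.
      command: ["clinfo", "-l"]
      securityContext:
        # A privileged container can see host devices, including /dev/dri.
        privileged: true
EOF

# To try it on a cluster:
#   kubectl apply -f analytics-gpu-pod.yaml
#   kubectl logs analytics-gpu
```

With the device-plugin route, the container would presumably request a GPU resource from the plugin (for the Intel GPU plugin, gpu.intel.com/i915) instead of setting privileged: true.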

Gsarg18 commented Feb 23, 2021

@nnshah1 and @xwu2git Thanks for the suggestion about making the analytics pod privileged; we will try it and let you know.
One more clarification: VCAC and GPU are two different issues. Here we are only concerned with running Smart City on the GPU. VCAC is tracked in a different thread: OpenVisualCloud/Smart-City-Sample#741

Thanks

Gsarg18 commented Feb 24, 2021

Thank you @nnshah1 @xwu2git @whbruce
We made the changes you suggested, and now smart-city-sample is working on the GPU with a Kubernetes deployment.

nnshah1 commented Feb 24, 2021

Thanks for the update! This is great news! Can you briefly describe the change in setup so we can capture it for anyone else running into the same issue?

nnshah1 closed this as completed Mar 11, 2021