
nvidia-docker 1 can run OpenGL applications; nvidia-docker 2 can't #534

Closed
pjreed opened this issue Nov 15, 2017 · 32 comments

Comments

@pjreed

pjreed commented Nov 15, 2017

When using nvidia-docker 1, I can run applications that use OpenGL in a guest and they will display in the host environment. When trying to run the same application in a similarly-configured container with nvidia-docker 2, I always get this error:

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  35
  Current serial number in output stream:  37

Running nvidia-smi works on the host as well as in containers using either version of nvidia-docker and always produces appropriate output.

I've attached a couple of Dockerfiles and scripts that demonstrate the issue. run-nvidia-docker-1.sh uses Dockerfile.1 to pull nvidia/cuda:8.0-devel-ubuntu16.04, installs mesa-utils, and then uses nvidia-docker to launch a container that maps all of the necessary volumes and then runs glxgears. When I have nvidia-docker 1 installed, it works and glxgears displays as expected. When I completely remove nvidia-docker 1, install 2, purge all existing Docker images and volumes, and try again, I get the above error.

run-nvidia-docker-2.sh is similar, but it uses the stock ubuntu:16.04 image and adds --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all parameters, so I would expect it to work with nvidia-docker 2. It also produces the same error as above.
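
The invocation in run-nvidia-docker-2.sh is roughly along these lines (a sketch; the exact scripts are in the attached tarball, and the image tag here is just a placeholder for whatever the script builds and runs):

docker run -it --rm \
    --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    x11-test-2 \
    glxgears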

Test files: x11-test.tar.gz

Host computer specs:
OS: Ubuntu Linux 16.04
CPU: Intel(R) Xeon(R) CPU E5-1650 v3
Output from nvidia-smi (which works on the host and also in containers using either version of nvidia-docker):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12                 Driver Version: 387.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2000        Off  | 00000000:02:00.0  On |                  N/A |
| 32%   49C    P0    N/A /  N/A |    899MiB /  1996MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2000        Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   37C    P8    N/A /  N/A |     12MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
@3XX0
Member

3XX0 commented Nov 15, 2017

OpenGL is not supported at the moment and there is no plan to support GLX in the near future (same as 1.0). OpenGL+EGL, however, is on the roadmap and will be supported. We will update #11 once we publish it.

If you are an NGC subscriber and need GLX for your workflow, I suggest you fill out a feature request.

@3XX0 3XX0 closed this as completed Nov 15, 2017
@BenBlumer

It might be helpful to note this at the beginning of the README for nvidia-docker 2. Even though it's totally fair that it's not implemented yet, it would be good to know, since this is critical for a lot of users (especially the ROS community and those following the ROS community's work in using the host's X server).

@orwel1984

orwel1984 commented Dec 21, 2017

I am facing the same problem. It would be helpful to know that such an error can come up with nvidia-docker 2 so we don't update.
( But please do support OpenGL asap too. Thanks )

@potiuk

potiuk commented Dec 31, 2017

Same problem. We rely on OpenGL apps running in containers. That's the only issue stopping us from switching to nvidia-docker 2.

@myyan92

myyan92 commented Jan 9, 2018

Same problem here trying to use ROS in Docker. Will revert to nvidia-docker 1.

@flx42
Member

flx42 commented Jan 11, 2018

Please try our new OpenGL beta images based on libglvnd: https://hub.docker.com/r/nvidia/opengl/
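
For example, a quick smoke test could look like this (a sketch, assuming a local X server that the container is allowed to connect to, e.g. after xhost +local:root):

docker run -it --rm \
    --runtime=nvidia \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    nvidia/opengl:1.0-glvnd-devel-ubuntu16.04 \
    bash -c "apt-get update && apt-get install -y mesa-utils && glxgears"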

@oursland

Hi @flx42, the OpenGL beta images appear to solve the problem for me in my initial tests.

Will this functionality get rolled into the CUDA docker images?

@flx42
Member

flx42 commented Jan 15, 2018

@oursland We will have a CUDA + OpenGL official image soon (probably called nvidia/cudagl).

@AndreiBarsan

AndreiBarsan commented Feb 20, 2018

@flx42 I found the newly released cudagl images here: https://gitlab.com/nvidia/cudagl

That is awesome! However, it is unclear to me how to customize them so that I can work with CUDA 8, as I do not have access to CUDA 9 at this time.

The descriptions say something like 9.1-devel, 9.1-devel-ubuntu16.04 (9.1/devel/Dockerfile) + (1.0-glvnd/devel/Dockerfile), but it is unclear to me how to "compose" the OpenGL and CUDA images in my own Dockerfile, e.g., to use glvnd and CUDA 8 instead of CUDA 9. (I am not very familiar with Docker.)

Would something like that even be possible?

Thank you!

@flx42
Member

flx42 commented Feb 20, 2018

Hello @AndreiBarsan, we won't publish images with OpenGL and CUDA 8.0. If you need this use case, you can either do FROM nvidia/cuda:8.0-devel and then install libglvnd like we do in our Dockerfile:
https://gitlab.com/nvidia/opengl/tree/ubuntu16.04
But that's probably a bit challenging.

Instead, you should do FROM nvidia/opengl:1.0-glvnd-devel and then add CUDA 8.0 manually:
https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/runtime/Dockerfile
https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/Dockerfile
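
A minimal sketch of that second approach (the repository key, URL, and metapackage name below are assumptions taken from the linked CUDA 8.0 Dockerfiles; verify them before use):

FROM nvidia/opengl:1.0-glvnd-devel-ubuntu16.04

# Add the CUDA network repository and install the CUDA 8.0 toolkit.
RUN apt-get update && apt-get install -y --no-install-recommends \
        gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get update && apt-get install -y --no-install-recommends \
        cuda-toolkit-8-0 && \
    rm -rf /var/lib/apt/lists/*

ENV PATH /usr/local/cuda-8.0/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/cuda-8.0/lib64:${LD_LIBRARY_PATH}

# Request compute/utility capabilities in addition to the graphics capability
# already requested by the base image (assumed; check the base image's env).
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES},compute,utility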

@AndreiBarsan

@flx42 Thank you very much for the quick response and the tips. I will try that!

@lindwaltz

For the sake of anyone else who comes here looking, I've created a ros-indigo-desktop-full-nvidia example at https://hub.docker.com/r/lindwaltz/ros-indigo-desktop-full-nvidia/ that adds the necessary libraries on top of the osrf image (OpenGL, libglvnd, CUDA 8). It should be easy to reproduce for other ROS flavors.

@flx42
Member

flx42 commented Mar 1, 2018

Thanks @lindwaltz !

@ruffsl Do you think we could get ROS images based on our recently released (finally!) OpenGL images:
https://hub.docker.com/r/nvidia/opengl/
https://hub.docker.com/r/nvidia/cudagl/
Let me know what you think, or if you need any help.

@mash-graz

@flx42

Let me know what you think, or if you need any help.

Your images work really great!

I used them to wrap DaVinci Resolve in Docker containers (see: https://gitlab.com/mash-graz/resolve) and the result works surprisingly well!
It's really impressive that nvidia-docker is already able to handle quite challenging tasks like this.

I only have to note one significant issue: your opengl and cudagl images use a GLVND installation, but CentOS, which is the base of one of your images, doesn't support GLVND yet! That's a really unpleasant source of all kinds of nasty GLX-related troubles. Even the simplest tools additionally installed from CentOS's main repository, like the glxinfo command, will no longer work. In the end, I had to prepare two different images: one for pure NVIDIA use and one for mixed CUDA + Mesa DRI Intel iGPU use. But that's nothing to criticize; I just write it down because others may face similar issues.

@flx42
Member

flx42 commented Apr 18, 2018

Thanks for the feedback @mash-graz!
Don't hesitate to file a bug report for the CentOS images if you want me to take a look.
You can do it here, or on GitLab: https://gitlab.com/nvidia/opengl/

@potiuk

potiuk commented Jun 20, 2018

Thanks! I used your images as a starting point and it worked for us as well. One observation: as long as you keep the old /usr/local/nvidia/lib and /usr/local/nvidia/lib64 entries in LD_LIBRARY_PATH, the image also seems to work fine when started via nvidia-docker 1 (not sure if keeping the old entries is even needed).
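
For anyone reproducing this, the lines in question look something like the following in a Dockerfile (a sketch; /usr/local/nvidia/lib and /usr/local/nvidia/lib64 are the paths where the old nvidia-docker 1 volume plugin mounts the driver libraries):

ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}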

@kuz

kuz commented Jun 28, 2018

An additional question here. I am able to run glxgears on the host's X server from a Docker image based on nvidia/opengl:1.0-glvnd-devel-ubuntu16.04. Now I'd like to be able to run it (well, actually Gazebo) inside a https://github.com/fcwu/docker-ubuntu-vnc-desktop container. But I am getting

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:   151 (GLX)
  Minor opcode of failed request:   3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  25
  Current serial number in output stream:  26

Is it possible to achieve that, or are there some inherent limitations?

@flx42
Member

flx42 commented Jun 28, 2018

@ruffsl probably has an image for this already.

@ruffsl
Contributor

ruffsl commented Jun 28, 2018

@kuz I'm not sure how you launched your container there, but I posted a minimal GLX-Gears example with nvidia-docker here: #136 (comment)

With respect to Gazebo, we have some examples of using Gazebo with nvidia-docker1 at osrf/car_demo, and I have a WIP PR for updating it to use nvidia-docker2 here: osrf/car_demo#40

This is really old and dated, but here is a rabbit hole about getting the Gazebo server running in Docker on a headless AWS server: https://github.com/ruffsl/gazebo_docker_demos/tree/master/aws

@nathantsoi

Thanks @ruffsl. Does anyone know if there are cudagl images for arm64v8/ubuntu:xenial-20180123? E.g. https://github.com/open-horizon/cogwerx-jetson-tx2/blob/master/Dockerfile.cudabase with OpenGL.

Critically, I don't see libglvnd0 for arm64: https://launchpad.net/ubuntu/xenial/arm64?text=libglvnd0

I have tried building it to no avail: #136 (comment)

@kuz

kuz commented Jul 4, 2018

@ruffsl, yep, I've seen those examples before, but they rely on an X server running on the host machine. I am trying to create a self-contained image where Gazebo utilizes the server's GPUs and there is a web interface to VNC with LXDE, so that a remote user can access VNC via a browser, see the whole LXDE desktop, and run gzclient, rviz, and more, all from a tablet if they wish. I guess the NVIDIA VirtualGL examples are a step in the right direction; I just hoped maybe there is an easier way or that someone had already done it.

Is it even possible?

@kuz

kuz commented Jul 17, 2018

To answer my own question: yes, it is possible, see https://github.com/willkessler/nvidia-docker-novnc

@tsly123

tsly123 commented Oct 24, 2018

Hi,

My code (when running on a local PC) pops up a CUDA OpenGL post-processing window and a Super Triangle Calculator window for rendering. However, nvidia-docker (on a Linux server) is not able to show these two windows. The error is:
freeglut (c): ERROR: Internal error <FBConfig with necessary capabilities not found> in function fgOpenWindow
It seems the OpenGL stack in the Docker image is missing some libraries.
I've been trying to find a way around this for weeks but still haven't found a solution. I've tried nvidia/opengl, nvidia/cudagl, and VNC remote.

Is there any way to solve this?
Thank you.

tsly

@MannyKayy

Any plans for cudagl with cudnn support?

@machinekoder

Has anyone managed to get ROS working with nvidia-docker? Would be great if you could share steps.

@NVIDIA Will there also be a stretch example?

@ruffsl
Contributor

ruffsl commented Nov 5, 2018

@machinekoder, yep. Our uni lab uses ROS and nvidia-docker extensively so we can quickly onboard new students, share common ML development environments, and isolate simulation cluster resources.

See this ROS wiki for more details:
http://wiki.ros.org/docker/Tutorials/Hardware%20Acceleration#nvidia-docker2

Note that it's a lot easier with Ubuntu 18.04 and up. Related:
#136 (comment)
osrf/docker_templates#33

@koenlek

koenlek commented Nov 5, 2018

@machinekoder: We use the following simple trick to add OpenGL support for nvidia-docker2 to any Docker image.

Create this Dockerfile:

FROM <some_image>
# e.g. FROM osrf/ros:kinetic-desktop-full

# optional, if the default user is not "root", you might need to switch to root here and at the end of the script to the original user again.
# e.g.
# USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
        pkg-config \
        libxau-dev \
        libxdmcp-dev \
        libxcb1-dev \
        libxext-dev \
        libx11-dev && \
    rm -rf /var/lib/apt/lists/*

# replace with other Ubuntu version if desired
# see: https://hub.docker.com/r/nvidia/opengl/
COPY --from=nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04 \
  /usr/local/lib/x86_64-linux-gnu \
  /usr/local/lib/x86_64-linux-gnu

# replace with other Ubuntu version if desired
# see: https://hub.docker.com/r/nvidia/opengl/
COPY --from=nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04 \
  /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json \
  /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json

RUN echo '/usr/local/lib/x86_64-linux-gnu' >> /etc/ld.so.conf.d/glvnd.conf && \
    ldconfig && \
    echo '/usr/local/$LIB/libGL.so.1' >> /etc/ld.so.preload && \
    echo '/usr/local/$LIB/libEGL.so.1' >> /etc/ld.so.preload

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES \
    ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES \
    ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics

# USER original_user

Then build it:

docker build --tag <image_name>:<img_tag>-nvidia .
# e.g.:
docker build --tag osrf/ros:kinetic-desktop-full-nvidia .

Now run it e.g. with:

docker run -it \
    --env="DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    -env="XAUTHORITY=$XAUTH" \
    --volume="$XAUTH:$XAUTH" \
    --runtime=nvidia \
    osrf/ros:kinetic-desktop-full-nvidia \
    bash
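
Note that the run command assumes XAUTH already points to a populated Xauthority file; one common way to prepare it (a sketch, details depend on your X authentication setup) is:

XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -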

One catch is that this patched image will only work with NVIDIA, so no longer with Intel graphics acceleration...

@machinekoder

@koenlek Thanks.
@ruffsl Thanks for the pointers to the ROS wiki.

I just tried building the 16.04 example with a debian:stretch base and it works!

We are using this setup with ROS as well. So far I'm the first nvidia user thanks to a recent hardware upgrade.

@machinekoder

@koenlek I created a fork of the opengl repo for Debian Stretch and added your usage instructions: https://github.com/machinekoder/nvidia-opengl-docker Works like a charm!

@qdin

qdin commented Apr 14, 2019

OpenGL is not supported at the moment and there is no plan to support GLX in the near future (same as 1.0). OpenGL+EGL, however, is on the roadmap and will be supported. We will update #11 once we publish it.

I realize things have changed since then, but I’m still a bit confused about some of the capabilities, terminology, and implementation of GPU-based containers, in particular with respect to EGL. The most popular one seems to be nvidia-docker.

So a few questions:

  1. In the FAQ (https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions), ‘Is OpenGL supported?’ is answered with “Yes, EGL is supported.” My question is: why are they used interchangeably? From my understanding, EGL + GLSL ES is a subset of OpenGL + GLSL, is used over a different context type (EGL vs. OpenGL context), and though usage is similar, generally demands conscious knowledge of its limitations.
  2. Why would EGL be implemented first? Isn’t OpenGL way more popular?
  3. Why would cudagl have to be a different Docker image altogether? Are there any drawbacks to using cudagl?

Also, if there is a better/more appropriate place to post this question, I'd appreciate the heads-up!

@hcv1027

hcv1027 commented Apr 17, 2019

I've tried two Docker images; both of them show the error message.

  1. docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
  2. Followed @koenlek 's Dockerfile sample to build my docker image.

The error message looks like this:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused "process_linux.go:407: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --graphics --pid=15868 /var/lib/docker/aufs/mnt/6bee740d7fe7096a7bdb52c84b4251fe09f314aab8fff22c720ee04ae8de8be8]\\nnvidia-container-cli: ldcache error: process /sbin/ldconfig terminated with signal 11\\n\""": unknown.

If I just type the command nvidia-smi, it shows the message below:
nvidia-smi

Are there any suggestions for me to try?

@Seanmatthews

@hcv1027 At least for CUDA 9.x, I had success with the following setup: https://github.com/Seanmatthews/ros-docker-gazebo
