Jetson: libcudnn_adv_infer_static_v8.a: file exists: unknown error #274

Open
7 of 8 tasks
ben-xD opened this issue Jan 28, 2022 · 2 comments

Comments


ben-xD commented Jan 28, 2022

Problem

Support for Jetson platforms has been in beta for more than a year. Unfortunately, the following simple command does not work on my Jetson. Fortunately, it is very easy to reproduce. Just run this:

docker run --runtime nvidia -it nvcr.io/nvidia/tensorrt:21.12-py3

Error

You will get:

$ docker run --runtime nvidia -it nvcr.io/nvidia/tensorrt:21.12-py3
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/63a5dc0a46e4f12d052b60007e09f18b5bb773903c054915cf6f843392531b40/merged/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer_static_v8.a: file exists: unknown.
ERRO[0006] error waiting for container: context canceled

And let me pick out the juiciest part: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/id/merged/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer_static_v8.a: file exists: unknown

My details:

Running `nvidia-container-cli -k -d /dev/tty info`
-- WARNING, the following logs are for debugging purposes only --

I0128 16:30:01.821611 9072 nvc.c:281] initializing library context (version=0.10.0+jetpack, build=61f57bcdf7aa6e73d9a348a7e36ec9fd94128fb2)
I0128 16:30:01.821757 9072 nvc.c:255] using root /
I0128 16:30:01.821803 9072 nvc.c:256] using ldcache /etc/ld.so.cache
I0128 16:30:01.821874 9072 nvc.c:257] using unprivileged user 1002:1002
I0128 16:30:01.822601 9073 driver.c:134] starting driver service
I0128 16:30:01.831030 9072 driver.c:231] driver service terminated with signal 15
nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
`jetson_release -v`
 - NVIDIA Jetson AGX Xavier [16GB]
   * Jetpack 4.5.1 [L4T 32.5.2]
   * NV Power Mode: MAXN - Type: 0
   * jetson_stats.service: active
 - Board info:
   * Type: AGX Xavier [16GB]
   * SOC Family: tegra194 - ID:25
   * Module: P2888-0001 - Board: P2822-0000
   * Code Name: galen
   * CUDA GPU architecture (ARCH_BIN): 7.2
   * Serial Number: 1421021087906
 - Libraries:
   * CUDA: 10.2.89
   * cuDNN: 8.0.0.180
   * TensorRT: 7.1.3.0
   * Visionworks: 1.6.0.501
   * OpenCV: NOT_INSTALLED compiled CUDA: NO
   * VPI: ii libnvvpi1 1.0.15 arm64 NVIDIA Vision Programming Interface library
   * Vulkan: 1.2.70
 - jetson-stats:
   * Version 3.1.2
   * Works on Python 3.6.9

My uname -a:

Linux desktopPC 4.9.201-tegra #1 SMP PREEMPT Wed May 5 09:32:25 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux
`docker version`
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~18.04.3
 Built:             Mon Nov  1 01:04:31 2021
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~18.04.3
  Built:            Fri Oct 22 00:57:37 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~18.04.1
  GitCommit:
 runc:
  Version:          1.0.1-0ubuntu2~18.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:
My `dpkg -l '*nvidia*'`
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                 Version                 Architecture            Description
+++-====================================-=======================-=======================-=============================================================================
un  libgldispatch0-nvidia                <none>                  <none>                  (no description available)
ii  libnvidia-container-tools            1.7.0-1                 arm64                   NVIDIA container runtime library (command-line tools)
ii  libnvidia-container0:arm64           0.10.0+jetpack          arm64                   NVIDIA container runtime library
ii  libnvidia-container1:arm64           1.7.0-1                 arm64                   NVIDIA container runtime library
un  nvidia-304                           <none>                  <none>                  (no description available)
un  nvidia-340                           <none>                  <none>                  (no description available)
un  nvidia-384                           <none>                  <none>                  (no description available)
un  nvidia-common                        <none>                  <none>                  (no description available)
ii  nvidia-container-csv-cuda            10.2.89-1               arm64                   Jetpack CUDA CSV file
ii  nvidia-container-csv-cudnn           8.0.0.180-1+cuda10.2    arm64                   Jetpack CUDNN CSV file
ii  nvidia-container-csv-tensorrt        7.1.3.0-1+cuda10.2      arm64                   Jetpack TensorRT CSV file
ii  nvidia-container-csv-visionworks     1.6.0.501               arm64                   Jetpack VisionWorks CSV file
un  nvidia-container-runtime             <none>                  <none>                  (no description available)
un  nvidia-container-runtime-hook        <none>                  <none>                  (no description available)
ii  nvidia-container-toolkit             1.7.0-1                 arm64                   NVIDIA container runtime hook
un  nvidia-cuda-dev                      <none>                  <none>                  (no description available)
un  nvidia-docker                        <none>                  <none>                  (no description available)
ii  nvidia-docker2                       2.8.0-1                 all                     nvidia-docker CLI wrapper
ii  nvidia-l4t-3d-core                   32.5.2-20210709090156   arm64                   NVIDIA GL EGL Package
ii  nvidia-l4t-apt-source                32.5.2-20210709090156   arm64                   NVIDIA L4T apt source list debian package
ii  nvidia-l4t-bootloader                32.5.2-20210709090156   arm64                   NVIDIA Bootloader Package
ii  nvidia-l4t-camera                    32.5.2-20210709090156   arm64                   NVIDIA Camera Package
un  nvidia-l4t-ccp-t186ref               <none>                  <none>                  (no description available)
ii  nvidia-l4t-configs                   32.5.2-20210709090156   arm64                   NVIDIA configs debian package
ii  nvidia-l4t-core                      32.5.2-20210709090156   arm64                   NVIDIA Core Package
ii  nvidia-l4t-cuda                      32.5.2-20210709090156   arm64                   NVIDIA CUDA Package
ii  nvidia-l4t-firmware                  32.5.2-20210709090156   arm64                   NVIDIA Firmware Package
ii  nvidia-l4t-graphics-demos            32.5.2-20210709090156   arm64                   NVIDIA graphics demo applications
ii  nvidia-l4t-gstreamer                 32.5.2-20210709090156   arm64                   NVIDIA GST Application files
ii  nvidia-l4t-init                      32.5.2-20210709090156   arm64                   NVIDIA Init debian package
ii  nvidia-l4t-initrd                    32.5.2-20210709090156   arm64                   NVIDIA initrd debian package
ii  nvidia-l4t-jetson-io                 32.5.2-20210709090156   arm64                   NVIDIA Jetson.IO debian package
ii  nvidia-l4t-jetson-multimedia-api     32.5.2-20210709090156   arm64                   NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support
ii  nvidia-l4t-kernel                    4.9.201-tegra-32.5.2-20 arm64                   NVIDIA Kernel Package
ii  nvidia-l4t-kernel-dtbs               4.9.201-tegra-32.5.2-20 arm64                   NVIDIA Kernel DTB Package
ii  nvidia-l4t-kernel-headers            4.9.201-tegra-32.5.2-20 arm64                   NVIDIA Linux Tegra Kernel Headers Package
ii  nvidia-l4t-libvulkan                 32.5.2-20210709090156   arm64                   NVIDIA Vulkan Loader Package
ii  nvidia-l4t-multimedia                32.5.2-20210709090156   arm64                   NVIDIA Multimedia Package
ii  nvidia-l4t-multimedia-utils          32.5.2-20210709090156   arm64                   NVIDIA Multimedia Package
ii  nvidia-l4t-oem-config                32.5.2-20210709090156   arm64                   NVIDIA OEM-Config Package
ii  nvidia-l4t-tools                     32.5.2-20210709090156   arm64                   NVIDIA Public Test Tools Package
ii  nvidia-l4t-wayland                   32.5.2-20210709090156   arm64                   NVIDIA Wayland Package
ii  nvidia-l4t-weston                    32.5.2-20210709090156   arm64                   NVIDIA Weston Package
ii  nvidia-l4t-x11                       32.5.2-20210709090156   arm64                   NVIDIA X11 Package
ii  nvidia-l4t-xusb-firmware             32.5.2-20210709090156   arm64                   NVIDIA USB Firmware Package
un  nvidia-libopencl1-dev                <none>                  <none>                  (no description available)
un  nvidia-prime                         <none>                  <none>                  (no description available)
`nvidia-container-cli -V`
cli-version: 1.7.0
lib-version: 0.10.0+jetpack
build date: 2021-11-30T19:53+00:00
build revision: f37bb387ad05f6e501069d99e4135a97289faf1f
build compiler: aarch64-linux-gnu-gcc-7 7.5.0
build platform: aarch64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
`/var/log/nvidia-container-runtime.log` **logs**
2022/01/28 18:33:07 Using bundle directory: /run/containerd/io.containerd.runtime.v2.task/moby/9b8ee90ce1ae157106936445c2c429ab33f6293d4a2f9e1c400b60917d597a97
2022/01/28 18:33:07 Using OCI specification file path: /run/containerd/io.containerd.runtime.v2.task/moby/9b8ee90ce1ae157106936445c2c429ab33f6293d4a2f9e1c400b60917d597a97/config.json
2022/01/28 18:33:07 Looking for runtime binary 'docker-runc'
2022/01/28 18:33:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2022/01/28 18:33:07 Looking for runtime binary 'runc'
2022/01/28 18:33:07 Found runtime binary '/usr/sbin/runc'
2022/01/28 18:33:07 Running nvidia-container-runtime

2022/01/28 18:33:07 'create' command detected; modification required
2022/01/28 18:33:07 prestart hook path: /usr/bin/nvidia-container-runtime-hook

2022/01/28 18:33:07 Forwarding command to runtime
2022/01/28 18:33:07 Using bundle directory: 
2022/01/28 18:33:07 Using OCI specification file path: config.json
2022/01/28 18:33:07 Looking for runtime binary 'docker-runc'
2022/01/28 18:33:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2022/01/28 18:33:07 Looking for runtime binary 'runc'
2022/01/28 18:33:07 Found runtime binary '/usr/sbin/runc'
2022/01/28 18:33:07 Running nvidia-container-runtime

2022/01/28 18:33:07 No modification required
2022/01/28 18:33:07 Forwarding command to runtime
`dmesg`
[782180.143499] docker0: port 1(veth7284103) entered blocking state
[782180.143505] docker0: port 1(veth7284103) entered disabled state
[782180.143995] device veth7284103 entered promiscuous mode
[782180.153901] IPv6: ADDRCONF(NETDEV_UP): veth7284103: link is not ready
[782180.579633] eth0: renamed from veth7260057
[782180.602680] IPv6: ADDRCONF(NETDEV_CHANGE): veth7284103: link becomes ready
[782180.603100] docker0: port 1(veth7284103) entered blocking state
[782180.603108] docker0: port 1(veth7284103) entered forwarding state
[782185.791137] docker0: port 1(veth7284103) entered disabled state
[782185.791615] veth7260057: renamed from eth0
[782185.854805] docker0: port 1(veth7284103) entered disabled state
[782185.864577] device veth7284103 left promiscuous mode
[782185.864587] docker0: port 1(veth7284103) entered disabled state

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
  • Kernel version from uname -a
  • Any relevant kernel output lines from dmesg
  • 💣 Driver information from nvidia-smi -a - this is not available on Jetson
  • Docker version from docker version
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  • NVIDIA container library version from nvidia-container-cli -V
  • NVIDIA container library logs (see troubleshooting)

Things I found

  • I found nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected - I'm not sure why no devices are found. It is an NVIDIA Jetson AGX Xavier [16GB] - Jetpack 4.5.1 [L4T 32.5.2]
@ben-xD ben-xD changed the title Beta support for containers on Jetson platforms (AArch64): file exists: unknown Jetson: file exists: unknown error Jan 28, 2022
@ben-xD ben-xD changed the title Jetson: file exists: unknown error Jetson: executable file not found in $PATH: <nil>: unknown error Jan 28, 2022
@ben-xD ben-xD changed the title Jetson: executable file not found in $PATH: <nil>: unknown error Jetson: libcudnn_adv_infer_static_v8.a: file exists: unknown error Jan 28, 2022
Author

ben-xD commented Jan 28, 2022

I had come across NVIDIA/nvidia-docker#825 (comment):

as part of v2 we prevent the container from starting if you have the NVIDIA driver

I am not sure if my issue is related to this. Perhaps @RenaudWasTaken would know? Thanks in advance :)

Contributor

klueska commented Jan 29, 2022

In general, only the containers packaged for l4t (in this case l4t-tensorrt) are designed to work on Jetson machines, e.g.:

docker run --runtime nvidia -it nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime

This is because these containers rely on the host to inject all CUDA and other support files into the container at runtime instead of bundling them inside the container image (keeping the container images themselves relatively small). The error you are seeing occurs because the container stack is attempting to inject a file at runtime that is already bundled in the container image.
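The collision described above can be pictured as a set intersection between the paths the host would inject and the paths already present in the image. The following is a minimal sketch of that idea, not the container stack's actual logic. The function name `find_collisions` and its two input files are hypothetical, and how those lists are produced (for instance from the files installed by the `nvidia-container-csv-*` packages listed earlier) is an assumption.

```shell
# Hypothetical helper: print every path that appears in both lists.
#   $1: text file, one path per line, that the host would inject (assumed input)
#   $2: text file, one path per line, already present in the image (assumed input)
find_collisions() {
  # -F: treat patterns as fixed strings, -x: match whole lines only,
  # -f: read the patterns (host paths) from the file given in $1.
  grep -Fxf "$1" "$2"
}
```

Any path printed by `find_collisions host.txt image.txt` would be a candidate for the "file exists" mount error, since the runtime would try to create a file that the image already ships.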

That said, you may be able to leverage a new feature of the container support for Jetson that limits the set of files ultimately injected into a container to only the base l4t files. That way you can run any container built for arm that just needs these base files in order to run.

You can do this by setting the following environment variable when you start the container.

NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only

i.e.

$ docker run --runtime nvidia -it -e NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only nvcr.io/nvidia/tensorrt:21.12-py3
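If the same script has to start both l4t images and generic arm images, the choice between the two modes could be sketched as below. This is only an illustration: the function name `jetpack_mount_flags` is hypothetical, and deciding by whether the image name contains `/l4t-` is an assumed heuristic, not something the container stack does itself.

```shell
# Hypothetical helper: print the extra docker flags for a given image name.
# Assumption: images whose repository name starts with "l4t-" expect the
# full set of host files, so no extra flag is printed for them.
jetpack_mount_flags() {
  case "$1" in
    */l4t-*) ;;  # l4t image: keep the default host mounts
    *) printf '%s\n' '-e NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only' ;;
  esac
}
```

It would be used as `docker run --runtime nvidia -it $(jetpack_mount_flags "$image") "$image"`, with the command substitution left unquoted so the flag and its value are passed as two separate arguments.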

@elezar elezar transferred this issue from NVIDIA/nvidia-docker Jan 22, 2024