This repository has been archived by the owner on Oct 27, 2023. It is now read-only.

CUDA libraries not being mounted into container #173

Closed
rgov opened this issue Aug 17, 2022 · 6 comments

@rgov

rgov commented Aug 17, 2022

I am using a Jetson Nano freshly flashed with the latest SD card image, which contains L4T r32.7.2. I installed the latest Docker Engine and configured the libnvidia-container apt repository following these instructions, which state:

As of NVIDIA Container Toolkit 1.7.0 (nvidia-docker2 >= 2.8.0) support for Jetson platforms is included for Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04 distributions. This means that the installation instructions provided for these distributions are expected to work on Jetson devices.

Then I installed the nvidia-docker2 package, rebooted, and finally ran:

sudo docker run --rm -it --runtime nvidia nvcr.io/nvidia/l4t-base:r32.7.1 cat /usr/local/cuda-10.2/version.txt

I expected this to output CUDA Version 10.2.300 because the file exists on the host and, as I understand it, the entry (dir, /usr/local/cuda-10.2) in /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv tells the runtime to mount that directory into the container.

Instead, cat reports that the file is missing from the container, and many of the CUDA libraries are missing as well (libcublas.so in particular).
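For anyone reproducing this, it can help to first confirm that the paths listed in a CSV file really exist on the host. The following is a minimal sketch, not part of the toolkit: check_csv_entries is a hypothetical helper name, and it assumes each line has the "type, path" shape of the entry quoted above.

```shell
# Hypothetical helper: for each "type, path" entry in a mount-spec CSV,
# report whether the path exists on the host.
check_csv_entries() {
    csv_file=$1
    # Split each line at the first comma; everything after it is the path.
    # The "|| [ -n ... ]" part handles a final line without a trailing newline.
    while IFS=, read -r entry_type entry_path || [ -n "$entry_type" ]; do
        # Remove whitespace from the type and trim it around the path.
        entry_type=$(printf '%s' "$entry_type" | tr -d '[:space:]')
        entry_path=$(printf '%s' "$entry_path" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip blank lines.
        [ -z "$entry_type" ] && continue
        if [ -e "$entry_path" ]; then
            printf 'present %s %s\n' "$entry_type" "$entry_path"
        else
            printf 'missing %s %s\n' "$entry_type" "$entry_path"
        fi
    done < "$csv_file"
}
```

Usage: check_csv_entries /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv. A "present" line only shows that the host side is in place; it says nothing about whether the runtime processes that CSV file.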

Info about packages installed
cuda-command-line-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
cuda-compiler-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
cuda-cudart-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-cudart-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-cuobjdump-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-cupti-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-cupti-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-documentation-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-driver-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-gdb-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-libraries-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
cuda-libraries-dev-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
cuda-memcheck-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvcc-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvdisasm-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvgraph-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvgraph-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvml-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvprof-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvprune-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvrtc-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvrtc-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-nvtx-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-repo-l4t-10-2-local/now 10.2.460-1 arm64 [installed,local]
cuda-samples-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
cuda-toolkit-10-2/unknown,stable,now 10.2.460-1 arm64 [installed]
cuda-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
cuda-visual-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
libnvidia-container-tools/bionic,now 1.10.0-1 arm64 [installed]
libnvidia-container0/bionic,now 0.11.0+jetpack arm64 [installed]
libnvidia-container1/bionic,now 1.10.0-1 arm64 [installed]
nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 8.2.1.8-1+cuda10.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-runtime/bionic,now 3.10.0-1 all [installed]
nvidia-container-toolkit/bionic,now 1.10.0-1 arm64 [installed]
nvidia-docker2/bionic,now 2.11.0-1 all [installed]
nvidia-l4t-3d-core/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-apt-source/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-bootloader/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-camera/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-configs/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-core/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-cuda/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-firmware/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-gputools/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-graphics-demos/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-gstreamer/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-init/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-initrd/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-jetson-io/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-jetson-multimedia-api/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-kernel/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-kernel-dtbs/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-kernel-headers/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-libvulkan/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-multimedia/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-multimedia-utils/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-oem-config/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-tools/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-wayland/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-weston/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-x11/stable,now 32.7.2-20220420143418 arm64 [installed]
nvidia-l4t-xusb-firmware/stable,now 32.7.2-20220420143418 arm64 [installed]

/etc/apt/sources.list.d/cuda-l4t-10-2-local.list
----------------
deb file:///var/cuda-repo-l4t-10-2-local /


/etc/apt/sources.list.d/docker.list
----------------
deb [arch=arm64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu   bionic stable


/etc/apt/sources.list.d/nvidia-container-toolkit.list
----------------
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /


/etc/apt/sources.list.d/nvidia-l4t-apt-source.list
----------------
# SPDX-FileCopyrightText: Copyright (c) 2019-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.

deb https://repo.download.nvidia.com/jetson/common r32.7 main
deb https://repo.download.nvidia.com/jetson/t210 r32.7 main


/etc/apt/sources.list.d/visionworks-repo.list
----------------
deb-src file:///var/visionworks-repo /
deb file:///var/visionworks-repo /


/etc/apt/sources.list.d/visionworks-sfm-repo.list
----------------
deb-src file:///var/visionworks-sfm-repo /
deb file:///var/visionworks-sfm-repo /


/etc/apt/sources.list.d/visionworks-tracking-repo.list
----------------
deb-src file:///var/visionworks-tracking-repo /
deb file:///var/visionworks-tracking-repo /

Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:19 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:00:46 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.7
  GitCommit:        0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
 nvidia:
  Version:          1.1.3
  GitCommit:        v1.1.3-0-g6724737
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
@klueska
Contributor

klueska commented Aug 17, 2022

Starting with v0.11.0+jetpack of libnvidia-container, only the l4t.csv, devices.csv, and drivers.csv files are processed by default.

Please see the release notes:
https://github.com/NVIDIA/libnvidia-container/releases/tag/v0.11.0%2Bjetpack

If you want to process all CSV files you either need to:

  1. Downgrade to v0.10.0+jetpack of libnvidia-container; OR
  2. Set the NVIDIA_REQUIRE_JETPACK="csv-mounts=all" environment variable when launching your container
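Option 2 could look roughly like the sketch below, which reuses the image and runtime flag from the original report. run_with_all_csv_mounts and the DOCKER override are hypothetical names added only so the command can be dry-run; the variable itself is set on the container with -e.

```shell
# Hypothetical wrapper: start a container with all CSV files processed.
# DOCKER may be overridden (e.g. DOCKER=echo or DOCKER="sudo docker");
# that override is an illustration, not a Docker feature.
run_with_all_csv_mounts() {
    image=$1
    shift
    # -e sets the variable in the container's environment, which is where
    # the NVIDIA runtime is expected to read it.
    ${DOCKER:-docker} run --rm --runtime nvidia \
        -e NVIDIA_REQUIRE_JETPACK="csv-mounts=all" \
        "$image" "$@"
}
```

Usage: run_with_all_csv_mounts nvcr.io/nvidia/l4t-base:r32.7.1 cat /usr/local/cuda-10.2/version.txt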

@rgov
Author

rgov commented Aug 17, 2022

Interesting. Thanks for the quick response.

I see that the l4t-base container README says it no longer includes CUDA from the host system, but only "[s]tarting with the r34.1 release (JetPack 5.0 Developer Preview)". This change therefore seems to break the expectation that older releases will have the host's CUDA.

Would it be possible to release a new version to the < r34.1 repositories that restores the previous behavior so that existing software continues to run and apt upgrade does not break anything?

Is there a place to call out this change more prominently than the libnvidia-container release notes? There are dozens of Nvidia packages and it seems unlikely users are reading release notes for all of them.

@rgov
Author

rgov commented Aug 17, 2022

Actually, I'm still struggling. I ran apt install libnvidia-container0=0.10.0+jetpack and rebooted, but I have the same issue.

I also tried downgrading to libnvidia-container1=1.8.1-1, because v1.9.0 has a similar-sounding release note, but that ends up uninstalling nvidia-container-runtime and some other packages.

@klueska
Contributor

klueska commented Aug 17, 2022

You would need to undo installing libnvidia-container from the upstream repo and instead use the repos that ship with JetPack. By following the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html you bypassed the "bundled" packages that ship with JetPack and provide the guarantees it promises.
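One way to start undoing the upstream repo, sketched under assumptions: disable_apt_source is a hypothetical helper that comments out the active deb lines in a source list (such as the nvidia-container-toolkit.list shown in the report) and keeps a .bak copy. It only covers disabling the repository, not reinstalling packages.

```shell
# Hypothetical helper: comment out every active "deb" / "deb-src" line in an
# apt source list so apt stops using that repository. A .bak copy is kept.
disable_apt_source() {
    list_file=$1
    cp "$list_file" "$list_file.bak"
    # Lines that are already commented out do not start with "deb" and are
    # left unchanged.
    sed 's/^deb/#deb/' "$list_file.bak" > "$list_file"
}
```

Usage (as root): disable_apt_source /etc/apt/sources.list.d/nvidia-container-toolkit.list, followed by apt update. Reinstalling the JetPack-bundled package versions is a separate step that is not shown here, since the exact versions depend on the L4T release.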

@rgov
Author

rgov commented Aug 17, 2022

(I figured out how to set the environment variable: it has to be passed to the container with docker run -e NVIDIA_REQUIRE_JETPACK="csv-mounts=all", not set in the parent shell.)

@elezar
Member

elezar commented Oct 20, 2023

Closing this as it seems that the issue was resolved. If this is not the case, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit.
