
Question: CUDA version support #258

Open
rahulswa08 opened this issue Jun 27, 2023 · 15 comments

Comments

@rahulswa08

Hi @dusty-nv ,
I'm currently using ros:noetic-pytorch-l4t-r34.1.1 base image on Jetson AGX Orin 32GB with cuda version 11.4 installed. However I need cuda version 11.8 in my docker, for this do I need to upgrade cuda on Jetson? Or can I perform upgrade to 11.8- cuda on this image?

@dusty-nv
Owner

Hi @rahulswa08, on JetPack 5, CUDA/cuDNN/TensorRT/etc. are installed inside the container (unlike JetPack 4, where they get mounted into the container from the host device by the NVIDIA runtime). So you would just perform the upgrade inside the container. I haven't tried changing the CUDA version before, though.

@rahulswa08
Author

Thanks @dusty-nv. Since the container has its own CUDA, I tried upgrading CUDA inside the container using the instructions provided here:

Please ensure your device is configured per the [CUDA Tegra Setup Documentation](https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#upgradable-package-for-jetson).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo dpkg -i cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo cp /var/cuda-tegra-repo-ubuntu2004-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

But when I perform the upgrade, I face the following issue at the last step (sudo apt-get -y install cuda):

The following packages have unmet dependencies:
cuda : Depends: cuda-11.8 (>= 11.8) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I'm not sure why I'm facing this. I'm able to perform the upgrade on the Jetson itself by following the same steps, but not inside the container.

Am I doing anything wrong here, or is this a limitation?

Could you help me solve this issue?

Thanks!!

@dusty-nv
Owner

dusty-nv commented Jul 7, 2023

@rahulswa08 can you try installing the cuda-11.8 package instead of cuda? Or maybe try the --only-upgrade flag to apt-get? I haven't upgraded CUDA in the containers before.

@rahulswa08
Author

rahulswa08 commented Jul 7, 2023

I have tried installing cuda-11.8, but it leads to some other dependency, and that leads to another; I'm unable to resolve them by installing them manually. I haven't tried the --only-upgrade option.

@dusty-nv dusty-nv closed this as completed Jul 7, 2023
@dusty-nv
Owner

dusty-nv commented Jul 7, 2023

If --only-upgrade doesn't work and you are unable to resolve the dependencies, you could try uninstalling the previous CUDA from the container first. Or it may be cleaner for you to just start with l4t-base, then install your desired CUDA Toolkit etc. on top of that, then PyTorch, and so on.

@dusty-nv dusty-nv reopened this Jul 7, 2023
@hillct

hillct commented Nov 4, 2023

I've encountered the same issues, starting from each of nvcr.io/nvidia/l4t-cuda:11.4.19-devel, nvcr.io/nvidia/l4t-cuda:11.4.19-runtime, nvcr.io/nvidia/l4t-base:35.4.1, nvcr.io/nvidia/l4t-base:35.3.1, and nvcr.io/nvidia/l4t-base:35.2.1, when following the documented procedure found here: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=aarch64-jetson&Compilation=Native&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

Having tested both the network and local repo methodologies: the network repo seems to be targeted toward the multi-platform CUDA images (for example https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags), as everything is cross-dependent on cuda-12.2 packages (essentially a documentation issue for the above webpage). But when pinned to CUDA 11.8, the behavior is the same as with the local repo methodology, wherein you get circular dependencies among the various CUDA packages at 11.8. So far I've not tested the various force or ignore-dependencies approaches, as they would inevitably lead to unstable images. Certainly the preferred approach would be to resolve the underlying circular dependency issue.
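For reference, pinning apt to the 11.8 packages from the network repo can be sketched with a preferences entry along these lines (the file name and package-name globs here are illustrative, not taken from NVIDIA's documentation):

```
# /etc/apt/preferences.d/cuda-11-8-pin  (hypothetical file name)
# Prefer 11.8.* versions of the CUDA packages over the 12.x ones
# that the network repo resolves to by default.
Package: cuda*
Pin: version 11.8*
Pin-Priority: 991
```

Even with a pin like this in place, the install still fails further down the dependency chain, as described below.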

@hillct

hillct commented Nov 4, 2023

As it turns out, the dependency tree ends at the unresolvable dependency on nvidia-l4t-core, which is a board support package meant for the host hardware, not containers. The dependency itself seems to be a holdover from the JetPack 4.5.x days, when CUDA was meant to run outside the containers. The issue might be resolvable by correcting and rebuilding cuda-compat-11-8.

For reference, the (consolidated) tree looks like this:

# apt-get install cuda-11.8
 cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
# apt-get install cuda-runtime-11-8
cuda-runtime-11-8 : Depends: cuda-compat-11-8 (>= 11.8.31339915) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
# apt-get install cuda-compat-11-8
 cuda-compat-11-8 : PreDepends: nvidia-l4t-core but it is not installable
E: Unable to correct problems, you have held broken packages.

Further discussion of this issue related to nvidia-l4t-core (while not directly on point) can be found here: https://forums.developer.nvidia.com/t/installing-nvidia-l4t-core-package-in-a-docker-layer/153412

@johnnynunez
Contributor

> As it turns out, the dependency tree ends at the unresolvable dependency on nvidia-l4t-core, which is a board support package meant for the host hardware, not containers. The dependency itself seems to be a holdover from the JetPack 4.5.x days, when CUDA was meant to run outside the containers. The issue might be resolvable by correcting and rebuilding cuda-compat-11-8.
>
> For reference, the (consolidated) tree looks like this:
>
> # apt-get install cuda-11.8
>  cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
> E: Unable to correct problems, you have held broken packages.
> # apt-get install cuda-runtime-11-8
> cuda-runtime-11-8 : Depends: cuda-compat-11-8 (>= 11.8.31339915) but it is not going to be installed
> E: Unable to correct problems, you have held broken packages.
> # apt-get install cuda-compat-11-8
>  cuda-compat-11-8 : PreDepends: nvidia-l4t-core but it is not installable
> E: Unable to correct problems, you have held broken packages.
>
> Further discussion of this issue related to nvidia-l4t-core (while not directly on point) can be found here: https://forums.developer.nvidia.com/t/installing-nvidia-l4t-core-package-in-a-docker-layer/153412

https://hackmd.io/ZmWQz8azTdWNVoCc9Bf3QA
If not, wait for JetPack 6 at the end of the month.

@hillct

hillct commented Nov 5, 2023

> https://hackmd.io/ZmWQz8azTdWNVoCc9Bf3QA

@johnnynunez Congratulations on your article, but it doesn't seem to address the issue at hand - that being deploying CUDA 11.8 INSIDE a container.

> If not, wait for JetPack 6 at the end of the month.

I'm also a bit baffled by your assertion that the release of Jetpack 6 might include recompilation and correction of the dependency flaw, especially since no such recompilation was completed as part of the Jetpack 5.x roadmap. If you have information that this differs for the 6.0 release, please share that documented roadmap.

@johnnynunez
Contributor

johnnynunez commented Nov 6, 2023

> https://hackmd.io/ZmWQz8azTdWNVoCc9Bf3QA
>
> @johnnynunez Congratulations on your article, but it doesn't seem to address the issue at hand - that being deploying CUDA 11.8 INSIDE a container.
>
> If not, wait for JetPack 6 at the end of the month.
>
> I'm also a bit baffled by your assertion that the release of Jetpack 6 might include recompilation and correction of the dependency flaw, especially since no such recompilation was completed as part of the Jetpack 5.x roadmap. If you have information that this differs for the 6.0 release, please share that documented roadmap.

Only @dusty-nv or @tokk-nv can confirm some things here.

  1. The problem is that the driver is old.
  2. CUDA 12.3 is not compatible with Jetson.
  3. If you upgrade, the problem still exists, because libraries like cuDNN are closed-source and only available as pre-compiled downloads, and for Jetson there are no download URLs for the latest cuDNN version.

So we can only wait for JetPack 6.0, because:

  1. Linux is decoupled from JetPack (you can install any distro).
  2. You can install any Linux kernel.
  3. JetPack 6 comes with CUDA 12.2.

I don't work at NVIDIA, but I think NVIDIA's idea is to treat the Jetson as if it were a discrete GPU: you would be able to install the open DKMS kernel modules and get precompiled builds of cuDNN and other libraries as a matter of course, like other ARM-based devices such as Grace Hopper.

@dusty-nv
Owner

dusty-nv commented Nov 6, 2023

@johnnynunez @hillct here is another thread to keep an eye on: https://forums.developer.nvidia.com/t/use-cuda-12-2-in-a-container/271600

@dusty-nv
Owner

dusty-nv commented Nov 9, 2023

OK, I found a workaround for this: manually extract the cuda-compat deb inside the container, and then install the cuda-toolkit or cuda-libraries package instead (only cuda and cuda-runtime depend on cuda-compat/nvidia-l4t-core):

#
# sudo docker build --network=host --tag cuda:12.2 .
# sudo docker run --runtime nvidia -it --rm --network host cuda:12.2 cuda-samples/bin/aarch64/linux/release/deviceQuery
#
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
            wget \
            git \
            binutils \
            xz-utils \
            ca-certificates \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

# download the CUDA Toolkit local installer
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin -O /etc/apt/preferences.d/cuda-repository-pin-600 && \
    wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-tegra-repo-ubuntu2004-12-2-local_12.2.2-1_arm64.deb && \
    dpkg -i cuda-tegra-repo-*.deb && \
    rm cuda-tegra-repo-*.deb 

# add the signed keys
RUN cp /var/cuda-tegra-repo-*/cuda-tegra-*-keyring.gpg /usr/share/keyrings/

# manually extract cuda-compat
RUN mkdir /var/cuda-compat && \
    cd /var/cuda-compat && \
    ar x ../cuda-tegra-repo-*/cuda-compat-*.deb && \
    tar xvf data.tar.xz -C / && \
    rm -rf /var/cuda-compat
    
# install cuda-toolkit (doesn't depend on cuda-compat/nvidia-l4t-core)
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
            cuda-toolkit-* \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean
 
# environment variables 
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all

ENV CUDA_HOME="/usr/local/cuda"
ENV PATH="/usr/local/cuda/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda/compat:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
   
# build cuda samples
RUN git clone --branch=v12.2 https://github.com/NVIDIA/cuda-samples && \
    cd cuda-samples/Samples/1_Utilities/deviceQuery && \
    make

WORKDIR /

Tried this on a board running JetPack 5.1.2 / L4T R35.4.1, which did not have CUDA 12.2 installed outside the container - and it worked (YMMV)

@0Unkn0wn

Thank you very much @dusty-nv it worked on the first try, and no problems were encountered.

0Unkn0wn added a commit to 0Unkn0wn/JetsonOrinNano-Jax-ROS2 that referenced this issue Nov 10, 2023

Added a Dockerfile with the CUDA 12 fix taken from dusty-nv/jetson-containers#258 (comment), added two scripts for building and running the CUDA 12 container, and added a JAX performance Jupyter script to test the container and see the difference in performance.
@hillct

hillct commented Nov 11, 2023

Just for completeness, in case others come across this issue in the future: the alternate approach is to force the installation of the dependency, as in this example. Note that you can specify CUDA=11-8 or CUDA=12-2 to get the desired result at build time.

ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:35.3.1
FROM ${BASE_IMAGE} as base
ARG DEBIAN_FRONTEND=noninteractive
ARG sm=87
# skip setting USE_DISTRIBUTED if you want to enable the OpenMPI backend
# (Dockerfiles don't support inline comments, so this note is on its own line)
ARG USE_DISTRIBUTED=1
ARG USE_QNNPACK=0
ARG CUDA=11-8
# nvidia-l4t-core is a dependency for the rest
# of the packages, and is designed to be installed directly
# on the target device. This because it parses /proc/device-tree
# in the deb's .preinst script. Looks like we can bypass it though:
RUN \
    echo "deb https://repo.download.nvidia.com/jetson/common r35.3 main" >> /etc/apt/sources.list && \
    echo "deb https://repo.download.nvidia.com/jetson/t194 r35.3 main" >> /etc/apt/sources.list && \
    apt-key adv --fetch-key http://repo.download.nvidia.com/jetson/jetson-ota-public.asc && \
    mkdir -p /opt/nvidia/l4t-packages/ && \
    touch /opt/nvidia/l4t-packages/.nv-l4t-disable-boot-fw-update-in-preinstall && \
    rm -f /etc/ld.so.conf.d/nvidia-tegra.conf && apt-get update && \
    apt-get install -y --no-install-recommends nvidia-l4t-core && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && apt-get update && apt-get install -y --no-install-recommends cuda-${CUDA} && \
    apt-get -y upgrade &&  apt-get clean && rm -rf /var/lib/apt/lists/* cuda-keyring_1.0-1_all.deb

I've not yet done a comparison of the final images, but given the methodology, it's likely that @dusty-nv's approach would be slower to build (owing to the large download requirement) but of similar final size.

@Vektor284

Hello,

I have a board running JetPack 5.1.4 / L4T R35.4.1. I am working on a project that requires Python 3.9 and CUDA 12.2. I can get @dusty-nv's solution working, and I can get PyTorch installed. However, when I check for the presence of the GPU using torch.cuda.is_available(), it returns None. The same is true when the setup script checks for $CUDA_HOME. Some of my dependencies require these to compile.

So far, my steps have been to create the container image as indicated by dusty and then use this image to create the container with Python 3.9 and the rest of my project.

Any help and or advice is greatly appreciated.
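A quick, framework-agnostic first step is to check whether the CUDA environment variables and toolkit paths are actually visible in the environment where the build runs; an empty CUDA_HOME often just means the ENV lines from the base Dockerfile didn't carry into the derived image or shell. This is a sketch with a hypothetical helper, not part of any library:

```python
import os
import shutil


def cuda_diagnostics():
    """Collect basic signals about whether the CUDA toolkit is visible
    in this environment. (Hypothetical helper, not part of any library.)"""
    return {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),            # None if unset
        "nvcc": shutil.which("nvcc"),                        # None if not on PATH
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        "cuda_dir_exists": os.path.isdir("/usr/local/cuda"),
    }


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If CUDA_HOME or nvcc come back empty inside your derived container, re-exporting the ENV lines from the Dockerfile above (CUDA_HOME, PATH, LD_LIBRARY_PATH) in your own image is worth trying before debugging torch itself.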
