Metadata issue with the CUDA repositories [CDN] #7
Comments
Hi @justinokamoto and @Angel-Popa |
I rolled back the repository metadata from the last posting, the signatures (i.e. I have scheduled a re-posting job to run in a few hours and will verify the intermittent issue is resolved. I'll close the issue pending that verification for all affected repos. Please let me know if you continue to see any errors, thank you! |
The metadata for each repo passes |
The new Release file seems to be erroring for us:
Did something go amiss in fixing this issue? |
Re-opening, there are several reports in the NVIDIA Developer forums |
Did this get fixed? |
Any update on this? Still getting the error. |
Any update on this error? I'm facing similar issues while running the commands below:
error response: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/./libcudnn8_8.3.2.44-1+cuda11.5_amd64.deb Hash Sum mismatch |
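For anyone narrowing down a "Hash Sum mismatch" like the one above, here is a minimal Python sketch of the kind of comparison apt performs: it hashes a downloaded .deb and checks the size and digest against the `Size` and `SHA256` fields of the matching stanza in the repository's Packages index. The helper names (`parse_packages`, `deb_matches_index`) and the sample stanza are hypothetical and are not taken from the NVIDIA repository.

```python
import hashlib


def parse_packages(text):
    """Parse Debian Packages index text into a list of {field: value} dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Continuation lines start with whitespace; this sketch skips them.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def deb_matches_index(deb_bytes, stanza):
    """Return True if the size and SHA256 of deb_bytes match the stanza."""
    digest = hashlib.sha256(deb_bytes).hexdigest()
    return (len(deb_bytes) == int(stanza["Size"])
            and digest == stanza["SHA256"].lower())


# Made-up payload standing in for a downloaded .deb file (assumption).
payload = b"example package contents"
sample_index = (
    "Package: example-pkg\n"
    "Version: 1.0-1\n"
    "Filename: ./example-pkg_1.0-1_amd64.deb\n"
    f"Size: {len(payload)}\n"
    f"SHA256: {hashlib.sha256(payload).hexdigest()}\n"
)
stanza = parse_packages(sample_index)[0]
print(deb_matches_index(payload, stanza))         # file agrees with the index
print(deb_matches_index(payload + b"x", stanza))  # stale or corrupted file
```

If the check fails for a freshly downloaded file, the Packages index and the .deb being served are likely out of sync, which is worth reporting together with the timestamp of the download.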
Getting the same thing. Have tried everything and no luck. |
Sorry I missed the email notification @xmalina-aibuild / @xmalina, @Nrohlable Based on the timestamp and the
$ podman run -it ubuntu:20.04 /bin/bash -c "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y wget sudo ca-certificates gnupg; bash"
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-get update
$ apt-cache policy libcudnn8
libcudnn8:
Installed: (none)
Candidate: 8.5.0.96-1+cuda11.7
Version table:
8.5.0.96-1+cuda11.7 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
8.4.1.50-1+cuda11.6 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
8.4.0.27-1+cuda11.6 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
8.3.3.40-1+cuda11.5 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages
$ sudo apt-get install --verbose-versions libcudnn8=8.3.2.44-1+cuda11.5
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
libcudnn8 (8.3.2.44-1+cuda11.5)
0 upgraded, 1 newly installed, 0 to remove and 9 not upgraded.
Need to get 423 MB of archives.
After this operation, 1270 MB of additional disk space will be used.
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 libcudnn8 8.3.2.44-1+cuda11.5 [423 MB]
Fetched 423 MB in 5s (91.9 MB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libcudnn8.
(Reading database ... 4885 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.3.2.44-1+cuda11.5_amd64.deb ...
Unpacking libcudnn8 (8.3.2.44-1+cuda11.5) ...
Setting up libcudnn8 (8.3.2.44-1+cuda11.5) ...
|
Hey @kmittman, sorry for the delayed response. I'm using the Docker Hub image tag tensorflow/tensorflow:2.8.0-gpu to build an image for computer vision applications and training. I was able to update the CUDA version from 11.2 to 11.7, as you can see here: (Reading database ... 19687 files and directories currently installed.) But when I checked at runtime during training, TensorFlow was still running on CUDA 11.2, and I'm receiving this error: Node: 'model/conv2d/Conv2D' This seems to be a GPU issue, since training appears to run just fine on the CPU. Note: the error pops up while using the MTCNN library for face detection, which I'm using only for inference. |
Hi @Nrohlable Looking at the TensorFlow 2.8.0-gpu Docker image tag you mentioned, it was last updated 8 months ago, on February 2nd. However, we performed a GPG key rotation to a new public key on ~ April 28th. All of the packages in the NVIDIA repository, including the CUDA 11.2 packages, were re-signed using the new GPG key. The My suggestion would be one of the following
|
Hi @kmittman, 2.8.1-gpu tensorflow image is working fine without any issues in my case. Thanks for your help, really appreciate it |
Hey @kmittman, I hope you could help me with this as well. Siamese on CPU : 18/899 [..............................] - ETA: 4:41:43 - loss: 0.8343 It seems both of them take exactly the same time, which shouldn't be the case. [name: "/device:CPU:0" These are the commands I'm running at training time to make sure it is working on the GPU: from tensorflow.python.client import device_lib As discussed earlier, I'm using TensorFlow 2.8.1-gpu for training on the GPU. |
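As a rough way to confirm whether TensorFlow sees a GPU at all, the following Python sketch uses the public `tf.config.list_physical_devices` API instead of the internal `device_lib` module mentioned above. The helper name `visible_gpus` is hypothetical, and the sketch assumes TensorFlow 2.1 or later; it returns `None` when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with .name and .device_type attributes.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("TensorFlow found no GPU; training will run on the CPU")
else:
    print("TensorFlow GPUs:", gpus)
```

An empty list would be consistent with CPU and "GPU" runs taking the same time, but it does not by itself say whether the cause is the driver, the CUDA libraries, or the container runtime.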
Closing this repository issue. Please follow up with the TensorFlow team. |
Reporting metadata issues with the CUDA repositories
Getting "File has unexpected size" issues when running apt update. This seems to be a known issue with NVIDIA CDNs, as NVIDIA mentions it here.
Please provide the following information in your comment:
When was the Release (Debian) or repomd.xml (RPM) file last modified ?
The Linux distro and architecture. If cross-compiling or containerized, please mention that.
This is occurring within the latest Docker image nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu18.04.
Which NVIDIA repositories do you have enabled ?
Do your .list / .repo files contain URLs using HTTP (port 80) or HTTPS (port 443) ?
Which geographic region is the machine located in ?
Seattle area
Which CDN edge node are you hitting ?
Not sure :/
Any other relevant environmental conditions (i.e. a specific Docker container image) ?
nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu18.04
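Two of the questions in the template above (when the Release file was last modified, and whether the .list files use HTTP or HTTPS) can be answered with a short script. The following Python sketch is one possible way to do it, assuming one-line-style .list entries; the helper names `repo_schemes` and `last_modified` and the sample file contents are illustrative assumptions, not part of any NVIDIA tooling.

```python
import urllib.request
from urllib.parse import urlsplit


def repo_schemes(list_text):
    """Map each repository URI in one-line-style .list content to its scheme."""
    schemes = {}
    for raw in list_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        tokens = line.split()
        if not tokens or tokens[0] not in ("deb", "deb-src"):
            continue
        rest = tokens[1:]
        # Skip an optional "[option=value ...]" group before the URI.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest.pop(0)
            rest = rest[1:]
        if rest:
            schemes[rest[0]] = urlsplit(rest[0]).scheme
    return schemes


def last_modified(url, timeout=10):
    """Return the Last-Modified header of url using an HTTP HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


# Illustrative .list contents (assumption), not copied from a real system.
sample = """\
# example entries
deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /
deb http://archive.ubuntu.com/ubuntu focal main
"""
for uri, scheme in repo_schemes(sample).items():
    print(scheme, uri)
```

`last_modified` is defined but not called here because it needs network access; passing it the URL of a repository's Release file should return the timestamp the template asks for, provided the server sends a Last-Modified header.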