Pulling nvcr.io images fails with an unexpected commit digest error #1441

Closed
yankay opened this issue Dec 21, 2023 · 3 comments

Comments

@yankay
Member

yankay commented Dec 21, 2023

When creating Pods with Kubernetes while installing the GPU Operator (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/), the kubelet logs the following error:

Dec 21 16:38:57 kay113-gpu kubelet[93644]: E1221 16:38:57.353228   93644 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"driver-validation\" with ErrImagePull: \"rpc error: code = FailedPrecondition desc = failed to pull and unpack image \\\"nvcr.io/nvidia/cloud-native/gpu-operator-validator:v23.9.1\\\": failed commit on ref \\\"layer-sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3\\\": unexpected commit digest sha256:d3fb406557f48b8a28ac40a19c3a24dd99519ba1beb0e4d3e8343d863e3a495b, expected sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3: failed precondition\"" pod="gpu-operator/nvidia-container-toolkit-daemonset-rvbmj" podUID="9bea6827-f2db-4d67-9a6d-e2bf8925b52d"

The error can be reproduced with nerdctl pull nvcr.io/nvidia/cloud-native/gpu-operator-validator:v23.9.1:

FATA[0003] failed commit on ref "layer-sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3": commit failed: unexpected commit digest sha256:5dc2908c530dc73acc888ac402047c4978155775ac28fe041cbcac81080cab26, expected sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3: failed precondition

The following commands fix it:

rm -rf /var/lib/containerd/io.containerd.content.v1.content/ingest/*
ctr -n k8s.io i pull nvcr.io/nvidia/cloud-native/gpu-operator-validator:v23.9.1

This is the same issue as containerd/containerd#3974.

Contributor

Hi @yankay,
Thanks for your feedback!
We will follow up as soon as possible.


@yankay yankay changed the title from "Pulling nvcr.io images fails" to "Pulling nvcr.io images fails with an unexpected commit digest error" Dec 21, 2023
@wzshiming
Member

x.x.x.x - - [21/Dec/2023:08:39:46 +0000] "GET /v2/nvcr.io/nvidia/cloud-native/gpu-operator-validator/blobs/sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3 HTTP/1.1" 500 0 "https://nvcr.m.daocloud.io/v2/nvidia/cloud-native/gpu-operator-validator/blobs/sha256:b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3?ns=nvcr.io" "containerd/1.7.10+unknown" "-"
2023/12/21 08:39:46 failed to request nvcr.io nvidia/cloud-native/gpu-operator-validator Get "https://ngc.download.nvidia.com/containers/registry//docker/registry/v2/blobs/sha256/b4/b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3/data?ak-token=exp=1703149174~acl=/containers/registry/docker/registry/v2/blobs/sha256/b4/b4e744f5f131fb2db0dd7649806f286ecaa3fcda18dc9a4245d83e902100ccb3/data*~hmac=xxxxx": stream error: stream ID 219; INTERNAL_ERROR; received from peer

It looks like the nvcr.io server is actively dropping the connection. I'll track down the cause.

@wzshiming
Member

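This is now fixed. The cause turned out to be that the CDN used by nvcr.io does not support HTTP/2 (h2) connection reuse, so connection reuse is now disabled for nvcr.io only.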
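For anyone running a similar proxy, here is a rough sketch of what disabling connection reuse for a single registry could look like in Go. It is an assumption about the general approach, not the actual change made in this project: `transportFor` and `noReuseRegistries` are hypothetical names, and `docker.io` appears only as a contrasting example. It relies on the standard `http.Transport.DisableKeepAlives` field, which makes each connection serve a single request.

```go
package main

import (
	"fmt"
	"net/http"
)

// noReuseRegistries (hypothetical) lists upstream registries whose CDN is
// assumed to misbehave when a connection is reused across requests.
var noReuseRegistries = map[string]bool{
	"nvcr.io": true,
}

// transportFor (hypothetical helper) clones the default transport and turns
// off keep-alives for registries listed in noReuseRegistries, so every
// request to them, including followed redirects, gets a fresh connection.
func transportFor(registry string) *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.DisableKeepAlives = noReuseRegistries[registry]
	return t
}

func main() {
	for _, registry := range []string{"nvcr.io", "docker.io"} {
		client := &http.Client{Transport: transportFor(registry)}
		t := client.Transport.(*http.Transport)
		fmt.Printf("%s: DisableKeepAlives=%v\n", registry, t.DisableKeepAlives)
	}
}
```

Selecting the transport per registry rather than per request host means redirects to the CDN host go through the same no-reuse transport. A real proxy would likely build each transport once and cache it instead of cloning on every call.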

/close
