nvidia/k8s-device-plugin:1.0.0-beta3 image is missing from docker hub #140

Closed
jucrouzet opened this issue Oct 9, 2019 · 8 comments

@jucrouzet

$ docker pull nvidia/k8s-device-plugin:1.0.0-beta3
Error response from daemon: manifest for nvidia/k8s-device-plugin:1.0.0-beta3 not found: manifest unknown: manifest unknown

@jucrouzet
Author

When I build the image myself, I get:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8414d7]

goroutine 1 [running]:
main.getDevices(0xc4200c3e38, 0x1, 0x1)
        /go/src/nvidia-device-plugin/nvidia.go:49 +0x177
main.main()
        /go/src/nvidia-device-plugin/main.go:42 +0x111

@RenaudWasTaken
Contributor

I just restarted the pipeline, the image should be there in a few minutes or so.
@klueska it looks like the nvml code segfaults here:

ID: int64(*(d.CPUAffinity)),

@rockrush
Contributor

rockrush commented Oct 9, 2019

@jucrouzet @RenaudWasTaken Pull request #141 should fix the problem (hopefully); please try it and verify.

@klueska
Contributor

klueska commented Oct 9, 2019

@RenaudWasTaken I left a comment on #141

If d.CPUAffinity is nil, then we should simply not set the Topology field of the device at all.

When is it the case that d.CPUAffinity is not set, such that this occurs?
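
For illustration, the nil guard described above might look roughly like the following. This is only a minimal sketch, not the actual change in #141: the struct types (`nvmlDevice`, `pluginDevice`, `topologyInfo`, `numaNode`) and the helper `buildDevice` are hypothetical stand-ins, and the pointer type of `CPUAffinity` is assumed.

```go
package main

import "fmt"

// nvmlDevice is a hypothetical stand-in for the device struct returned by
// the NVML bindings. The *uint type of CPUAffinity is an assumption.
type nvmlDevice struct {
	UUID        string
	CPUAffinity *uint // nil when no affinity is reported
}

// The types below are hypothetical stand-ins for the device plugin API types.
type numaNode struct{ ID int64 }

type topologyInfo struct{ Nodes []*numaNode }

type pluginDevice struct {
	ID       string
	Topology *topologyInfo // left nil when affinity is unknown
}

// buildDevice converts an NVML device into a plugin device. It only sets
// Topology when CPUAffinity is non-nil, so the pointer is never
// dereferenced blindly.
func buildDevice(d *nvmlDevice) *pluginDevice {
	dev := &pluginDevice{ID: d.UUID}
	if d.CPUAffinity != nil {
		dev.Topology = &topologyInfo{
			Nodes: []*numaNode{{ID: int64(*d.CPUAffinity)}},
		}
	}
	return dev
}

func main() {
	affinity := uint(1)
	withAffinity := buildDevice(&nvmlDevice{UUID: "GPU-a", CPUAffinity: &affinity})
	withoutAffinity := buildDevice(&nvmlDevice{UUID: "GPU-b"})

	fmt.Println(withAffinity.Topology != nil)    // topology set
	fmt.Println(withoutAffinity.Topology == nil) // topology omitted, no panic
}
```

With a guard like this, a device that reports no CPU affinity is still advertised, just without topology information, instead of crashing the plugin at startup.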

@zplizzi

zplizzi commented Oct 11, 2019

I'm getting the same error.

I installed the plugin as recommended in the main README.md with:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta3/nvidia-device-plugin.yml

Here are the logs for the pod running on an AWS p3.2xlarge instance:

% kubectl logs -n kube-system nvidia-device-plugin-daemonset-4h5pl
2019/10/11 19:06:30 Loading NVML
2019/10/11 19:06:30 Fetching devices.
2019/10/11 19:06:30 Shutdown of NVML returned: <nil>
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8414d7]

goroutine 1 [running]:
main.getDevices(0xc42006fe38, 0x1, 0x1)
        /go/src/nvidia-device-plugin/nvidia.go:49 +0x177
main.main()
        /go/src/nvidia-device-plugin/main.go:42 +0x111

Is there a recommended version to install instead that should work?

@zplizzi

zplizzi commented Oct 11, 2019

Dropping back to the previous version seems to work. I uninstalled the broken version with

kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system

and installed the previous version (1.0.0-beta instead of 1.0.0-beta3) with

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml

@klueska
Contributor

klueska commented Oct 14, 2019

This issue should be fixed by #141. However, no release including this fix has been made yet.

@klueska
Contributor

klueska commented Oct 15, 2019


5 participants