Suggestion: Self-diagnostic command #179

Open
rgov opened this issue Dec 17, 2022 · 1 comment

Comments


rgov commented Dec 17, 2022

My experience with the NVIDIA Docker integration across two PCs and a few Jetson devices has been bumpy, and the error messages are often fairly inscrutable. I've had a working system break a few days later because of an unattended upgrade.

It would probably help reduce the number of support requests if there were a self-diagnostic script that could look for common misconfiguration issues and/or format a bug report with all the relevant info for the user. The Homebrew project does this (brew doctor) and it turns out to be convenient for both users and project admins.

Over 1,500 issues have been filed to this repo and 1 in 5 mention some "Error response from daemon" message, like this one that doesn't give me enough information to remedy the situation.

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown.

This problem appears to be related to the permissions of /dev/nvidia* when VirtualGL is set up on the host (see the linked solution comment). This is an example of something that would be really easy for a script to detect but takes a bit of digging for the user to solve. (Also, it hopefully wouldn't be that hard to say something more useful than "unknown" in this error message.)
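
To illustrate how simple such a check could be, here is a minimal Python sketch of one diagnostic. It is not part of any NVIDIA tool. The function names `can_read_write` and `find_inaccessible_nodes` are hypothetical, and it assumes that the relevant failure is a user lacking read/write access to the character devices matching /dev/nvidia*. It only inspects classic Unix mode bits for the invoking user, which may not be the user the container hook actually runs as, and it ignores ACLs.

```python
import glob
import os
import stat


def can_read_write(mode, owner_uid, owner_gid, uid, gids):
    """Classic Unix check: pick the owner, group, or other bits, then test r+w."""
    if uid == 0:
        return True  # root bypasses plain mode bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


def find_inaccessible_nodes(pattern="/dev/nvidia*", uid=None, gids=None):
    """Return (path, mode string) for device nodes the user cannot read and write."""
    if uid is None:
        uid = os.geteuid()
    if gids is None:
        gids = set(os.getgroups()) | {os.getegid()}
    else:
        gids = set(gids)

    problems = []
    for path in sorted(glob.glob(pattern)):
        try:
            st = os.stat(path)
        except OSError:
            continue  # node vanished or is a broken link; skip it
        if not stat.S_ISCHR(st.st_mode):
            continue  # only character devices are of interest here
        if not can_read_write(st.st_mode, st.st_uid, st.st_gid, uid, gids):
            problems.append((path, stat.filemode(st.st_mode)))
    return problems


if __name__ == "__main__":
    for path, mode in find_inaccessible_nodes():
        print(f"warning: {path} ({mode}) is not readable and writable by this user")
```

A real doctor-style command would presumably run many checks like this and print a suggested fix next to each warning, instead of leaving the user with "unknown".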

elezar transferred this issue from NVIDIA/nvidia-docker on Dec 5, 2023

leobenkel commented

Could you give more details on how you solved it? I followed your link but still can't make it work.
