[NOT URGENT] Test upgraded NVIDIA driver in gpu-2-6 #331
Comments
Are you sure you mean gpu-2-8? nvidia-smi shows four of the GTX 680.
The GTX 980 cards are in gpu-2-6. Please confirm with nvidia-smi as well just to make sure we offline the correct node.
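For anyone wanting to script the check, here is a minimal sketch of listing card models and driver versions on a node. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu=name,driver_version` and `--format=csv,noheader` options; the helper names `parse_gpu_query` and `query_gpus` are hypothetical, made up for this example.

```python
import subprocess


def parse_gpu_query(text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the card name cannot break parsing.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        rows.append((name, driver))
    return rows


def query_gpus():
    """Run nvidia-smi on the local node and return a list of (name, driver) tuples."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_query(result.stdout)
```

Calling `query_gpus()` on each candidate node and printing the result should make it obvious which one holds the GTX 980 cards before anything is taken offline.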
Snippet from nodes file as well to show the card types in that group of nodes:
Yep,
I've placed a reservation on the GPU resources on
No GPU activity was seen. Updated driver.
I still have the reservation in place on the GPUs, however. Do you wish to test manually first in case a rollback is desired?
Yes, will test in the morning (Frankfurt time). Thanks!
No prob. Reservation left in place for GPUs. Batch jobs are not impacted.
gpu-2-6 is drained, per discussions elsewhere. Did that driver update work out? I can re-add it to the batch queue and re-issue the GPU-only reservation if desired.
My apologies for not having much time to further debug. There appears to
OK. I put the node back in the pool for batch work but stuck a 10 day reservation on the GPUs. Hope that is reasonable.
I believe I need to renew the GPU reservation on this node. Done for another 10 days.
Thanks. We're still chasing this down, and have replicated the issue on a local dev box. It seems to be 980-specific and related to driver versions. @pgrinaway and @steven-albanese have been investigating on the local dev box.
Fun! Noted.
We are having some trouble using the FAH client application on gpu-2-8 (where the new GTX-980 cards were installed), and the advice we have received is to upgrade the 352.39 driver to 355.11 or later. Would it be possible to drain this node of GPU jobs and test the upgrade when feasible? I believe the 355.11 driver is available here: http://www.nvidia.com/download/driverResults.aspx/90393/en-us
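The "355.11 or later" condition can be checked mechanically. The following is only a sketch, assuming driver versions are dotted numeric strings such as `352.39`; the function names `parse_driver_version` and `needs_upgrade` are hypothetical, and the `355.11` threshold is simply the one quoted in the request above.

```python
def parse_driver_version(version):
    """Turn a dotted version string such as '352.39' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(installed, minimum="355.11"):
    """Return True if the installed driver is older than the minimum version."""
    # Tuples compare element by element, so 352.39 < 355.11 < 361.28.
    return parse_driver_version(installed) < parse_driver_version(minimum)
```

Comparing integer tuples rather than raw strings or floats avoids mis-ordering versions whose minor parts differ in length.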