refactor: increase nvidia-smi timeout and relax gpu validation timeout to 5 minutes #2911

Merged
merged 2 commits into official/v20230306
Mar 22, 2023

alexeldeib
Contributor

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none

ganeshkumarashok and others added 2 commits March 22, 2023 17:38
NVIDIA GPUs take 1-3 seconds per card to load/initialize.
For VM sizes with many GPUs, this can take more than 25 seconds
(our old timeout). We raised it to 60 seconds, but that value was
not chosen for any real reason. We raise it arbitrarily to 5 minutes
here to avoid any tail latency issues while we work toward
a more stable/performant fix.

Additionally, there is little reason for this to fail provisioning;
sophisticated customers can fix it themselves post-creation.
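
To illustrate the reasoning in the commit message, here is a minimal Python sketch, not the code changed in this PR. The helper names (`worst_case_init_seconds`, `validate_gpu`), the 3-seconds-per-card worst case, and the example GPU counts are assumptions made for illustration; only the 25-second and 5-minute timeouts come from the description above.

```python
import subprocess
import sys

OLD_TIMEOUT_S = 25    # old timeout mentioned in the commit message
NEW_TIMEOUT_S = 300   # 5 minutes, the relaxed timeout


def worst_case_init_seconds(gpu_count, per_card_s=3.0):
    """Rough upper bound on init time, assuming cards initialize one after another.

    per_card_s=3.0 is the upper end of the 1-3 s range quoted above (assumption).
    """
    return gpu_count * per_card_s


def validate_gpu(cmd, timeout_s=NEW_TIMEOUT_S):
    """Run a validation command with a timeout (hypothetical helper).

    Returns True on exit code 0. A timeout, a missing binary, or a nonzero
    exit code returns False instead of raising, so the caller can log the
    problem and continue rather than fail provisioning.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        print(f"validation timed out after {timeout_s}s: {cmd}", file=sys.stderr)
        return False
    except OSError as err:
        print(f"validation could not start: {err}", file=sys.stderr)
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical GPU counts: 8 cards fit within 25 s at 3 s each, 16 do not.
    for count in (8, 16):
        estimate = worst_case_init_seconds(count)
        print(count, "GPUs:", estimate, "s; exceeds old timeout:", estimate > OLD_TIMEOUT_S)

    # Non-fatal check: warn and carry on if the command fails or times out.
    if not validate_gpu(["nvidia-smi"]):
        print("GPU validation failed; continuing", file=sys.stderr)
```

Returning a boolean rather than raising mirrors the second paragraph of the commit message: a slow or failed check is reported, but the decision to abort is left to the caller.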
@alexeldeib alexeldeib merged commit fea5ef0 into official/v20230306 Mar 22, 2023
5 participants