On some of our heavily loaded ML nodes I see intermittent failures in running the nvidia-smi command. The error (here on ml8, which has been overly busy since the end of vacation) is
which is reported either when the command times out or when it exits on SIGTERM. It would be useful to distinguish the two cases, so that's one tweak to implement, but a timeout is the likely cause here. Possibly the timeout should be longer, or the default of 2s should be overridable on the command line so that sonar can adapt more easily to its environment.
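
To make the two proposed tweaks concrete, here is a minimal Python sketch, not sonar's actual code. It runs a command with a configurable timeout and reports a timeout separately from a signal exit. The names `run_with_timeout`, `parse_args`, and the `--gpu-timeout` flag are hypothetical, chosen only for illustration, and the sketch assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import argparse
import signal
import subprocess


def run_with_timeout(argv, timeout_s=2.0):
    """Run argv and classify the outcome (hypothetical helper).

    Returns (kind, detail), where kind is "ok", "timeout", "signal", or "failed".
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return ("timeout", f"no result within {timeout_s}s")
    if proc.returncode < 0:
        # POSIX only: return code -N means the child was terminated by signal N.
        signum = -proc.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number without a named constant (e.g. a real-time signal).
            name = f"signal {signum}"
        return ("signal", name)
    if proc.returncode != 0:
        return ("failed", f"exit code {proc.returncode}")
    return ("ok", proc.stdout)


def parse_args(args=None):
    """Parse a hypothetical --gpu-timeout override; the default matches the 2s above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gpu-timeout",
        type=float,
        default=2.0,
        help="seconds to wait for the GPU query command (hypothetical flag)",
    )
    return parser.parse_args(args)
```

Under these assumptions, a caller would pass `parse_args().gpu_timeout` as `timeout_s` when invoking `nvidia-smi`, and log the returned kind so that a slow node (`"timeout"`) can be told apart from a terminated process (`"signal"`).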