I am currently seeing some issues with the Flux scheduler that don't crop up with the others. The machine appears to struggle with lots of fast-running jobs, possibly losing track of resource availability, so that throughput eventually grinds to a halt. This looks to be related to resource tracking being split between ATS and the Flux handler?
Additionally, the time argument to `flux mini run` is causing some issues. It requires large overestimates of job allocation times; otherwise there is a race condition where ATS thinks jobs are still remaining, but Flux won't schedule any of them because the time requested exceeds the remaining allocation time.
This is tested with a project-specific ATS wrapper on a Flux-scheduled cluster (not bootstrapped within Slurm, etc.).
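To make the time-limit problem concrete, here is a minimal sketch of the check I would expect to happen somewhere before jobs are handed to Flux. It is not taken from ATS or Flux: `PendingJob`, `split_by_remaining_time`, and the `remaining_s` value are hypothetical names for illustration, and the assumption that a job is schedulable only when its requested time is at most the remaining allocation time is my reading of the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    # Hypothetical record for a job the test driver still wants to run.
    name: str
    time_limit_s: float  # time that would be requested for this job


def split_by_remaining_time(jobs, remaining_s):
    """Split pending jobs into those that still fit and those that never will.

    Assumption: a job can only be scheduled if its requested time limit
    does not exceed the time left in the allocation.
    """
    runnable = [j for j in jobs if j.time_limit_s <= remaining_s]
    stranded = [j for j in jobs if j.time_limit_s > remaining_s]
    return runnable, stranded


if __name__ == "__main__":
    jobs = [PendingJob("fast_test", 60), PendingJob("padded_test", 1800)]
    runnable, stranded = split_by_remaining_time(jobs, remaining_s=600)
    print("can still run:", [j.name for j in runnable])
    print("will never be scheduled:", [j.name for j in stranded])
```

With a check like this, the driver could report the stranded jobs and exit instead of waiting on jobs that will not start.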
Jeremy