We should redesign the Slurm batch system so that it can ask Slurm for just the finished jobs. Then we wouldn't need to do O(n) work to find one finished job when n jobs are running, and finding n finished jobs wouldn't take O(n^2) work.

Originally posted by @adamnovak in #2323 (comment)

In the linked issue, this is limiting the maximum number of running jobs to below what the cluster can support: past a certain point we spend so much time polling jobs that we have no time left to handle the finished ones or submit new ones.
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1317
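To make the batched approach concrete, here is a minimal sketch of what a single-call poll could look like. It assumes `sacct` is on the path; `poll_finished_jobs` is a hypothetical helper, not Toil's actual batch system code. The point is that one `sacct` invocation covers every tracked job, so the cost of each polling pass no longer grows with a separate Slurm round-trip per job:

```python
import subprocess

def poll_finished_jobs(job_ids):
    """Ask Slurm for the state of all tracked jobs in one sacct call.

    One batched query replaces per-job polling, so each polling pass is a
    single subprocess call no matter how many jobs are running.
    (Hypothetical sketch, not Toil's actual batch system code.)
    """
    out = subprocess.run(
        ["sacct",
         "-j", ",".join(str(j) for j in job_ids),  # all tracked jobs at once
         "--format=JobID,State,ExitCode",
         "--noheader", "--parsable2"],
        check=True, capture_output=True, text=True,
    ).stdout
    finished = {}
    for line in out.splitlines():
        job_id, state, exit_code = line.split("|")[:3]
        if "." in job_id:
            continue  # skip per-step records like "123.batch"; keep top-level jobs
        # State can read "CANCELLED by <uid>", so compare only the first word.
        if state.split()[0] in ("COMPLETED", "FAILED", "TIMEOUT", "CANCELLED"):
            finished[job_id] = int(exit_code.split(":")[0])
    return finished
```

A further refinement along the same lines would be to filter server-side with `sacct --state=...` plus a start-time window, so Slurm only returns jobs that have actually finished since the last pass.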
It'd be pretty hacky, but one could imagine mostly removing Slurm polling from the loop. For example, every job could append an "I'm done" line to a common log, which could itself be tailed. You could still poll Slurm directly every few minutes to look for stuck jobs.
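A rough sketch of that idea, with hypothetical names throughout (the `DONE_LOG` path, the `follow_done_log` helper, and the wrapper-script convention are all assumptions, not anything Toil does today):

```python
import time

# Hypothetical path on a filesystem shared by the leader and the workers.
DONE_LOG = "/shared/fs/toil-done.log"

# Each job's wrapper script would end with something like:
#   echo "$SLURM_JOB_ID $exit_code" >> /shared/fs/toil-done.log

def follow_done_log(path=DONE_LOG, poll_interval=1.0):
    """Yield (job_id, exit_code) pairs as jobs append their "I'm done" lines.

    This is the tail(1)-style loop: the leader reads new lines as they
    arrive instead of asking Slurm about every running job on every pass.
    Jobs that crash or get killed never write their line, which is why the
    periodic direct poll of Slurm is still needed as a backstop.
    """
    with open(path) as log:
        while True:
            line = log.readline()
            if not line:
                time.sleep(poll_interval)  # nothing new yet; wait and retry
                continue
            job_id, exit_code = line.split()
            yield job_id, int(exit_code)
```

A real implementation would also have to worry about making concurrent appends from many nodes safe on the shared filesystem, and about truncating or rotating the log; this sketch ignores both.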