Stop polling all jobs in Slurm to find the done ones #4431

Closed
adamnovak opened this issue Mar 30, 2023 · 1 comment · Fixed by #4471
adamnovak commented Mar 30, 2023

We should redesign the Slurm batch system so that it can ask Slurm for just the finished jobs. Then we wouldn't need to do O(n) work to find one finished job when n jobs are running, and finding n finished jobs wouldn't cost O(n^2) work.

Originally posted by @adamnovak in #2323 (comment)

As described in the linked issue, this is limiting the maximum number of running jobs to below what the cluster can support: at a certain point we spend so much time polling jobs that we have no time left to handle the finished ones or submit new ones.
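One direction for this (a sketch only, not Toil's actual batch system code) is to keep a set of submitted job IDs and make a single `sacct` call filtered by terminal states, so Slurm itself returns only the finished jobs. The `sacct` flags below (`--jobs`, `--state`, `--parsable2`, `--noheader`, `--format`) are standard Slurm options; the helper names are hypothetical:

```python
import subprocess

# Terminal Slurm state codes: COMPLETED, FAILED, CANCELLED, TIMEOUT,
# OUT_OF_MEMORY.
TERMINAL_STATES = "CD,F,CA,TO,OOM"


def parse_finished(sacct_output: str, tracked_ids: set) -> dict:
    """Parse `sacct --parsable2 --noheader --format=JobID,State` output,
    returning {job_id: state} for tracked jobs."""
    finished = {}
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        # sacct also emits rows for job steps like "123.batch"; skip those.
        if "." in job_id:
            continue
        # CANCELLED may appear as "CANCELLED by <uid>"; keep the first word.
        state = state.split()[0] if state else state
        if job_id in tracked_ids:
            finished[job_id] = state
    return finished


def poll_finished(tracked_ids: set) -> dict:
    """One query proportional to the number of finished jobs, instead of
    one query (or one scan) per tracked job."""
    out = subprocess.run(
        ["sacct", "--noheader", "--parsable2", "--format=JobID,State",
         "--jobs", ",".join(sorted(tracked_ids)),
         "--state", TERMINAL_STATES],
        capture_output=True, text=True, check=True).stdout
    return parse_finished(out, tracked_ids)
```

The batch system would then remove the returned IDs from its tracked set, so each poll only pays for jobs that actually finished since the last one.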

Issue is synchronized with this Jira Story
Issue Number: TOIL-1317

@michaelkarlcoleman

It'd be pretty hacky, but one could imagine mostly removing SLURM polling from the loop. For example, perhaps have all jobs append an "I'm done" line to a common log, which could itself be tailed. You could still poll SLURM directly every few minutes to look for stuck jobs.
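A minimal sketch of that idea, with hypothetical names: each finishing job appends a `done <jobid>` line to a shared log, and the leader remembers its byte offset between polls, so each check reads only the newly appended lines rather than scanning all jobs:

```python
import os


class DoneLogTailer:
    """Incrementally read 'done <jobid>' lines appended to a shared log.

    Each poll() resumes from the previous offset, so the cost is
    proportional to newly finished jobs, not to all running jobs.
    """

    def __init__(self, path: str):
        self.path = path
        self.offset = 0  # byte offset of the next unread data

    def poll(self) -> list:
        """Return job IDs recorded as done since the previous poll."""
        done = []
        if not os.path.exists(self.path):
            return done
        with open(self.path, "rb") as log:
            log.seek(self.offset)
            chunk = log.read().decode("utf-8")
        # Only consume complete lines; a partial trailing line from an
        # in-progress append is left for the next poll.
        complete, sep, _partial = chunk.rpartition("\n")
        if not sep:
            return done
        self.offset += len(complete.encode("utf-8")) + 1
        for line in complete.splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[0] == "done":
                done.append(parts[1])
        return done
```

Handling partial lines matters because an appending job may be mid-write when the leader polls; stopping at the last newline keeps each record intact. The separate slow poll of SLURM for stuck jobs would catch jobs that die without ever writing their line.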
