consider queueing to multiple partitions where possible #36

Open
rueberger opened this issue Aug 31, 2022 · 1 comment

Comments

@rueberger
Contributor

cc: brainsss1

I've noticed that jobs queue either to trc or to normal. You might consider queueing to both normal and trc where possible (i.e. for time limits <= 2 days), and to owners for short-running or checkpointable jobs (although requeuing may require some modification to the control flow logic in preprocess.py).
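To make the suggestion concrete, here is a rough sketch of how the partition choice could look. It relies on `sbatch --partition` accepting a comma-separated list, in which case Slurm starts the job in whichever listed partition can run it first. The function names (`choose_partitions`, `sbatch_command`) and the 2-hour cutoff for "short-running" are my own placeholders, not anything that exists in brainsss today, and the real thresholds would need checking against the cluster's limits.

```python
from datetime import timedelta

# From the suggestion above: jobs of up to 2 days can go to both normal and trc.
SHARED_LIMIT = timedelta(days=2)
# Hypothetical cutoff for "short-running"; no number is given in this issue.
SHORT_JOB_LIMIT = timedelta(hours=2)


def choose_partitions(time_limit, checkpointable=False):
    """Return a comma-separated partition list for sbatch --partition."""
    partitions = ["trc"]  # the group partition is always a candidate
    if time_limit <= SHARED_LIMIT:
        partitions.insert(0, "normal")
    if checkpointable or time_limit <= SHORT_JOB_LIMIT:
        # owners jobs can be preempted, so only send jobs that tolerate it
        partitions.append("owners")
    return ",".join(partitions)


def sbatch_command(script, time_limit, checkpointable=False):
    """Build an sbatch argument list; --time is given in whole minutes."""
    minutes = int(time_limit.total_seconds() // 60)
    return [
        "sbatch",
        f"--partition={choose_partitions(time_limit, checkpointable)}",
        f"--time={minutes}",
        script,
    ]
```

With this, a one-day job would be submitted with `--partition=normal,trc`, while a three-day job would stay on `trc` only. Handling preemption on owners (the requeue logic in preprocess.py) is not covered here.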

In general this would just be a convenience to shorten queue times and load-balance, with one exception: submissions to normal are limited when the total number of CPUs in use by a group exceeds 512. Group partitions and owners are unaffected by CPU limits.

For instance, in this scenario yandan's moco jobs won't execute until a number of other jobs finish, but they could execute immediately on trc.

[Screenshot: Screen Shot 2022-08-30 at 4 59 18 PM]

Weird policy, but according to Killian: "Owner groups are expected to mainly submit jobs to their own partition, as well as to the owners partition that offers them a very large pool of resources for free."

@poldrack
Contributor

poldrack commented Sep 1, 2022 via email
