What high priority scheduler issues remain to be resolved? #197

Closed
jchodera opened this issue Dec 17, 2014 · 10 comments

Comments

@jchodera
Member

We have some time left with Adaptive Computing to resolve additional issues.

Besides the scheduler segfault issue (#172), what other high priority issues do we have?

@tatarsky
Contributor

I have the list of open bugs, which are:

- qdel -t doesn't work (#167); a workaround exists, however
- the cuda_visible_devices bug (workaround in place, but it would be nice to get it fixed in code)
- I am not clear on #138 and its status

I've suggested a review of fairshare configs as the most "common" item that is referenced.

@tatarsky
Contributor

Ah, and #186 may be a bug but it may not be. I'm not sure on that one either.

@jchodera
Member Author

#147 also lists a number of open issues in addition to these:

  • Fairshare needs to take group membership into account
  • Short (<1h) jobs in the active queue (interactive jobs) should also be able to suspend batch jobs in order to find a slot more quickly.
  • Jobs in the active queue should run with CPU overcommitment. They are mostly idle, so they don't need a full CPU allocated. The number of interactive jobs allowed per user can be increased at that point.

@tatarsky
Contributor

Yep, I've already got those in the collection with them.

@tatarsky
Contributor

Fairshare, BTW, does take group membership into account. So I'd like a further explanation or an example of where it is believed not to.

@jchodera
Member Author

We can probably also address #102 to optimize scheduling order.

@jchodera
Member Author

OK, I think @tatarsky has the best handle on the order of priority here at this point. I'll write back to Rob and let him know you have a good idea of what order to tackle these things.

Thanks!

@tatarsky
Contributor

Yes, I wanted to basically comment out "ENABLEHIGHTHROUGHPUT TRUE" on that one and see if it improves or degrades the situation. But I'm trying to only make one mod at a time ;)
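For illustration only, the "comment out one setting and watch" step could be sketched roughly like this. The helper name `comment_out_directive` and the sample config text are made up for the example; this is not the cluster's actual config, and the `OTHERSETTING` line is a placeholder rather than a real parameter.

```python
import re
from typing import Tuple


def comment_out_directive(config_text: str, name: str) -> Tuple[str, int]:
    """Prefix every active line that starts with `name` with '# '.

    Returns the new text and the number of lines changed. Lines that are
    already commented out are left untouched. (Hypothetical helper.)
    """
    # Match optional leading spaces/tabs, then the directive name as a whole word.
    pattern = re.compile(rf"^([ \t]*)({re.escape(name)}\b.*)$", re.MULTILINE)
    return pattern.subn(r"\1# \2", config_text)


# Made-up sample text; OTHERSETTING is a placeholder, not a real parameter.
sample = (
    "OTHERSETTING 1\n"
    "ENABLEHIGHTHROUGHPUT TRUE\n"
    "# ENABLEHIGHTHROUGHPUT FALSE\n"
)

updated, changed = comment_out_directive(sample, "ENABLEHIGHTHROUGHPUT")
print(updated)
print("lines changed:", changed)
```

Returning the count makes it easy to confirm that exactly one line was touched before the change goes live, which fits the one-mod-at-a-time approach.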

@tatarsky
Contributor

We are reviewing items and may make some one-at-a-time changes at some point. If we do, they will be announced with a Git issue so that any issues with the mod can be tracked. This is unrelated to the core dump matter, which remains under investigation.

@tatarsky
Contributor

All items in this ticket are duplicated in other Git tickets or resolved.
