Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: commands (TensorBoards, notebooks, etc.) should not be preempted [DET-4157] #1346

Merged
merged 3 commits into from Sep 30, 2020

Conversation

mackrorysd
Copy link
Member

Description

Commands (TensorBoards, Notebooks), being interactive, would cause more obvious disruption to an end-user if preempted.

Implementing this requires changes in 2 aspects of the scheduler. First simply not sending the ReleaseResources message to such tasks (and closely related: commands will also ignore this unless something changes CanTerminate to true, and nothing does right now). Second, the scheduler will also take this into account when allocating slots so that a group cannot lose a slot it needs for non-preemptible tasks, so the system cannot oversubscribe itself.

Test Plan

I've added unit tests that reproduced the original problem, including subtleties discovered during implementation. Some other unit tests needed to have CanTerminate added to their task definitions since it defaults to false, which triggers the new behavior.

Commentary

This patch prevents these commands from being preempted using the existing CanTerminate flag. Because of Go's zero-value initialization, this means all tasks are now non-premptible by default. This feels backwards - I would actually prefer to replace this with a field call NonPreemptible (depending on how reviewers feel about it) and invert the logic.

I also made garbage collection non-preemptible - I'm not sure if that's desirable or not.

Checklist

  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.

@shiyuann
Copy link
Contributor

@mackrorysd I have left one comment. Also, can you also run E2E GPU tests for this PR before merging it?

@mackrorysd mackrorysd changed the title fix: commands (TensorBoards, notebooks, etc.) should not be prempted [DET-4157] fix: commands (TensorBoards, notebooks, etc.) should not be preempted [DET-4157] Sep 29, 2020
@mackrorysd mackrorysd assigned shiyuann and unassigned mackrorysd Sep 29, 2020
Copy link
Contributor

@shiyuann shiyuann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one last comment. Otherwise, it looks very good!

master/internal/command/command.go Show resolved Hide resolved
@shiyuann shiyuann assigned mackrorysd and unassigned shiyuann Sep 30, 2020
Copy link
Contributor

@shiyuann shiyuann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 ship it!

@mackrorysd mackrorysd merged commit 38b0c95 into determined-ai:master Sep 30, 2020
@mackrorysd mackrorysd deleted the det-4157 branch May 6, 2021 21:21
@dannysauer dannysauer added this to the 0.13.6 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants