Add termination reason and message to the runner API #2204

Merged
r4victor merged 7 commits into master from issue_2202_runner_termination_reason
Jan 21, 2025

@r4victor (Collaborator) commented Jan 21, 2025

Closes #2202
Closes #1701

This PR:

  • Adds termination_reason and termination_message to the runner API.
  • Returns "Max duration exceeded" in termination_message when the job is terminated because its max duration was exceeded.
  • Adds the max_duration_exceeded termination reason but does not use it yet, for backward compatibility with old clients.
  • Makes the shim and runner return termination reasons in lowercase (i.e., the enum values) for consistency with the server API.
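
To make the first and last points concrete, here is a minimal Python sketch of a server-side consumer of a runner state event carrying the two new fields. The `JobStateEvent` dataclass, its `state` field, the `parse_event` and `to_enum` helpers, and the enum subset are hypothetical illustrations, not dstack's actual code. The point is that lowercase reasons can be looked up directly by enum value.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminationReason(str, Enum):
    # Hypothetical subset; only max_duration_exceeded is named in this PR.
    CONTAINER_EXITED_WITH_ERROR = "container_exited_with_error"
    MAX_DURATION_EXCEEDED = "max_duration_exceeded"


@dataclass
class JobStateEvent:
    # Hypothetical shape of a runner API state event with the new fields.
    state: str
    termination_reason: Optional[str] = None
    termination_message: Optional[str] = None


def parse_event(raw: str) -> JobStateEvent:
    data = json.loads(raw)
    return JobStateEvent(
        state=data["state"],
        termination_reason=data.get("termination_reason"),
        termination_message=data.get("termination_message"),
    )


def to_enum(reason: Optional[str]) -> Optional[TerminationReason]:
    # Lowercase values from the runner match the enum values directly,
    # so the server needs no case conversion.
    return TerminationReason(reason) if reason is not None else None


# The "failed" state string is a hypothetical placeholder.
event = parse_event(
    '{"state": "failed", "termination_reason": "container_exited_with_error",'
    ' "termination_message": "Max duration exceeded"}'
)
```

In this sketch, an uppercase reason such as `"CONTAINER_EXITED_WITH_ERROR"` would make `to_enum` raise `ValueError`, which is the kind of mismatch that returning lowercase values avoids.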

The CLI error when the max duration is exceeded:

```
Installing extension 'ms-toolsai.jupyter'...
Run failed with error code CONTAINER_EXITED_WITH_ERROR.
Error: Max duration exceeded
Check CLI, server, and run logs for more details.
```
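
A rough sketch of how a client could render this output from the run's error code and the new termination_message. `format_run_failure` is a hypothetical helper, not the dstack CLI's code, and treating the error code as the uppercased lowercase reason is an assumption based only on the output above.

```python
from typing import Optional


def format_run_failure(error_code: str, termination_message: Optional[str]) -> str:
    # Hypothetical helper mirroring the CLI output above.
    lines = [f"Run failed with error code {error_code.upper()}."]
    if termination_message:
        # The new termination_message surfaces as a separate "Error:" line.
        lines.append(f"Error: {termination_message}")
    lines.append("Check CLI, server, and run logs for more details.")
    return "\n".join(lines)


print(format_run_failure("container_exited_with_error", "Max duration exceeded"))
```

When no message is set, the sketch simply omits the "Error:" line, so older runners that do not send termination_message still produce a sensible error.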

@r4victor r4victor changed the title from "Issue 2202 runner termination reason" to "Add termination reason and message to the runner API" Jan 21, 2025
@r4victor r4victor requested a review from un-def January 21, 2025 06:31
The shim may expect any termination reason from the server
@r4victor (Collaborator, Author)

I also tried switching the shim to an enum for the termination reason, but I don't think it's feasible. The shim can accept a termination reason from the server, and enumerating the possible values set by the server would make backward compatibility challenging.
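
The compatibility concern can be illustrated with a small Python sketch; the shim is not written in Python, so this only demonstrates the trade-off. `KnownReason`, `accept_as_enum`, `accept_as_string`, and every reason value other than max_duration_exceeded are hypothetical. Decoding into a closed enum rejects a value introduced by a newer server, while a plain string passes it through.

```python
from enum import Enum


class KnownReason(str, Enum):
    # Hypothetical closed set of reasons known when the shim was built.
    DONE_BY_RUNNER = "done_by_runner"
    MAX_DURATION_EXCEEDED = "max_duration_exceeded"


def accept_as_enum(reason: str) -> KnownReason:
    # Raises ValueError for any value added to the server after this build.
    return KnownReason(reason)


def accept_as_string(reason: str) -> str:
    # Passes any server-provided value through unchanged.
    return reason


new_reason = "some_future_reason"  # hypothetical value from a newer server
print(accept_as_string(new_reason))
try:
    accept_as_enum(new_reason)
except ValueError:
    print("enum-based decoding rejects the new reason")
```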

@un-def (Collaborator) commented Jan 21, 2025

Also here:

```python
reason = job_model.termination_reason.name
```

Use `.value` instead of `.name` to get a lowercase literal.
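
The difference, shown on a minimal stand-in enum (the one-member `TerminationReason` below is hypothetical; the real enum has more members): `.name` yields the uppercase member name, while `.value` yields the lowercase literal that the shim expects.

```python
from enum import Enum


class TerminationReason(str, Enum):
    # Hypothetical single member for illustration.
    MAX_DURATION_EXCEEDED = "max_duration_exceeded"


reason = TerminationReason.MAX_DURATION_EXCEEDED
print(reason.name)   # MAX_DURATION_EXCEEDED (uppercase member name)
print(reason.value)  # max_duration_exceeded (lowercase literal)
```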

@r4victor r4victor merged commit 6d93ecc into master Jan 21, 2025
@r4victor r4victor deleted the issue_2202_runner_termination_reason branch January 21, 2025 10:02
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Feb 9, 2025
* Introduce TerminationReason and JobState types

* Handle runner API not available when stopping

Possibly relevant for the local runner when the runner container or shim has been stopped

* Set max duration exceeded in termination message

* Add max_duration_exceeded termination reason

* Update shim OpenAPI spec

* Revert using TerminationReason enum in shim

The shim may expect any termination reason from the server

* Send termination_reason.value to shim
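
The "Handle runner API not available when stopping" commit can be sketched as a best-effort stop. This is a hypothetical Python illustration: `stop_job` and the injected `send_stop` callable are not dstack functions, and treating `ConnectionError` as "the runner is already gone" is an assumption about what handling means here.

```python
from typing import Callable


def stop_job(send_stop: Callable[[], None]) -> bool:
    """Ask the runner to stop; return True if the request reached it.

    Hypothetical sketch: if the runner API is unreachable (for example,
    the runner container or shim was already stopped), the stop is
    treated as best-effort instead of failing.
    """
    try:
        send_stop()
        return True
    except ConnectionError:
        # Nothing is listening anymore; let termination proceed.
        return False


def unreachable() -> None:
    # ConnectionRefusedError is a subclass of ConnectionError.
    raise ConnectionRefusedError("runner API not available")


print(stop_job(lambda: None))  # True
print(stop_job(unreachable))   # False
```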
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Mar 4, 2025
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Mar 5, 2025

Development

Successfully merging this pull request may close these issues:

  • Add run termination reason to the runner API
  • [Feature]: Use a dedicated status when exceeding max_duration

2 participants