Feature request: cancel job from runner (self-hosted) #128844
Unanswered
carlcsaposs-canonical asked this question in Actions
Replies: 0 comments
Select Topic Area
Product Feedback
Body
For self-hosted runners on spot instances, we would like to gracefully cancel any running jobs during spot eviction (30-second notice on Azure).
Currently, sending SIGINT or SIGTERM to the runner stops the runner almost immediately and does not run steps using if: always() or if: cancelled() and does not run the post-job script (ACTIONS_RUNNER_HOOK_JOB_COMPLETED).
Ideally, the behavior would be the same as a job-level timeout (i.e. job timeout-minutes exceeded): the current step gets SIGINT, then SIGTERM after 7.5 seconds, then SIGKILL after 2.5 seconds. After that, steps with always() or cancelled() would run.
It should be possible to trigger this cancellation from the host machine of the runner.
Current workaround
Currently, we are manually parsing the Worker log to get the process ID of the currently running step and then sending SIGINT, SIGTERM, and SIGKILL to that process ID (SIGINT immediately, SIGTERM 7.5 seconds later, and SIGKILL 2.5 seconds after that). However, this allows steps with if: failure() (and without if: cancelled()) to run, which is the same thing that happens when a step timeout-minutes is reached.
Source code of workaround:
https://github.com/canonical/self-hosted-runner-provisioner-azure/blob/8a6332f770cfb8dd9061070a2ba4f28c84c76755/cli/azure_runner_provisioner_cli/listen_for_spot_eviction_event.py#L11-L26
https://github.com/canonical/self-hosted-runner-provisioner-azure/blob/8a6332f770cfb8dd9061070a2ba4f28c84c76755/cli/azure_runner_provisioner_cli/listen_for_spot_eviction_event.py#L97-L117
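For readers who do not want to follow the links, the signal escalation described above can be sketched roughly as follows. This is a simplified illustration, not a copy of the linked code: the names cancel_step, process_exists, and ESCALATION are hypothetical, the step's process ID is taken as an input (parsing it out of the Worker log is not shown), and returning early once the process has exited is an assumption made for this sketch.

```python
import os
import signal
import time

# Delays from the post: SIGINT immediately, SIGTERM 7.5 s later,
# SIGKILL 2.5 s after that. Each entry is (signal, seconds to wait afterwards).
ESCALATION = (
    (signal.SIGINT, 7.5),
    (signal.SIGTERM, 2.5),
    (signal.SIGKILL, 0.0),
)


def process_exists(pid):
    """Return True if a process with this PID exists (signal 0 only checks)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def cancel_step(pid, send=os.kill, exists=process_exists, sleep=time.sleep, poll=0.1):
    """Escalate signals against pid and return the signals actually sent.

    Assumption for this sketch: stop escalating as soon as the process is gone.
    """
    sent = []
    for sig, grace in ESCALATION:
        if not exists(pid):
            break
        try:
            send(pid, sig)
        except ProcessLookupError:
            break  # the process exited between the check and the signal
        sent.append(sig)
        # Wait up to `grace` seconds, leaving early once the process exits.
        waited = 0.0
        while waited < grace and exists(pid):
            sleep(poll)
            waited += poll
    return sent
```

Calling cancel_step(step_pid) from the eviction handler would reproduce the timing above, where step_pid is whatever process ID was extracted from the Worker log. As noted, this only signals the step process, so the run is treated like a step failure rather than a job cancellation.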