Description
Describe the enhancement
I would like to be able to tie a specific workflow job to a specific ephemeral runner via labels.
I imagine this working as a generated or optional label that carries a unique identifier such as the job ID.
- A workflow job event comes in with id `12345678`
- I create a VM, and instruct it to have the label `12345678`
- After 2-3 seconds, the runner starts on the VM
- The runner picks up job `12345678`
- The job is completed, the runner exits and the VM shuts down, the ephemeral runner is removed from the org.
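As a rough illustration of the first two steps, here is a minimal Python sketch that turns a `workflow_job` webhook payload into a VM name and label set. `RunnerSpec` and `runner_spec_for` are hypothetical names, the `runner-<id>` naming simply follows the example used in this issue, and the job-ID label only has an effect if the runner honours it, which is exactly the feature being requested.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RunnerSpec:
    """What a provisioner would need to boot one single-use runner VM."""
    vm_name: str
    labels: Tuple[str, ...]


def runner_spec_for(event: dict) -> Optional[RunnerSpec]:
    """Return a spec for a queued job, or None for any other action."""
    if event.get("action") != "queued":
        return None
    job_id = str(event["workflow_job"]["id"])
    # Assumption: the job ID doubles as a label that pins the runner to
    # this one job. That pinning is the requested behaviour, not an
    # existing runner feature.
    return RunnerSpec(vm_name=f"runner-{job_id}", labels=("self-hosted", job_id))
```

The actual VM creation and runner registration are left out on purpose, since they depend on the cloud provider and on how the registration token is obtained.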
Alternative flow using cancellation
- Two workflow job events come in with ids `1234` and `5678`
- Two VMs are created with these IDs, and the corresponding labels
- Before the runner starts, the user cancels job `1234`
- I delete the VM with the name `runner-1234`
- After 2-3 seconds, the runner starts on the `5678` VM
- The runner picks up job `5678`
- The job is completed, the runner exits and the VM shuts down, the ephemeral runner is removed from the org.
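The cancellation flow could be handled by a small tracker keyed by job ID. The sketch below is only an in-memory illustration: `VmPool` is a hypothetical name, it returns a description of the action instead of calling a real cloud API, and it assumes a cancelled job arrives as a `workflow_job` event with action `completed` and conclusion `cancelled`.

```python
class VmPool:
    """Tracks one VM per workflow job, keyed by job ID (in-memory sketch)."""

    def __init__(self):
        self.vms = {}  # job ID -> VM name

    def handle(self, event):
        """Apply one workflow_job event and return the action taken."""
        job_id = event["workflow_job"]["id"]
        action = event["action"]
        if action == "queued":
            # One VM per job, named after the job it is meant to run.
            self.vms[job_id] = f"runner-{job_id}"
            return f"create {self.vms[job_id]}"
        if action == "completed" and job_id in self.vms:
            # Covers both finished and cancelled jobs: the VM belongs to
            # exactly this job, so it is always safe to delete.
            return f"delete {self.vms.pop(job_id)}"
        return "ignore"


pool = VmPool()
pool.handle({"action": "queued", "workflow_job": {"id": 1234}})
pool.handle({"action": "queued", "workflow_job": {"id": 5678}})
pool.handle({"action": "completed",
             "workflow_job": {"id": 1234, "conclusion": "cancelled"}})
# Only the VM for job 5678 is still tracked at this point.
```

This bookkeeping is only safe if the runner on `runner-1234` can never pick up a job other than `1234`; without that guarantee, deleting the VM could kill an unrelated job.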
Why is this necessary?
When creating ephemeral runners on individual VMs, there is no deterministic way to manage the lifecycle of each VM. If you create 5 VMs because a workflow run has 5 jobs, and the workflow is then cancelled, you can't safely remove any of those VMs, because a build from another repo could already have started running jobs on them.
Similarly, suppose a workflow run has 10 jobs and 10 VMs are created; one job starts and the user cancels the other 9. How do you know which VMs to remove?
Code Snippet
Additional information
NOTE: if the feature request has been agreed upon then the assignee will create an ADR. See docs/adrs/README.md
This may be useful to the recommended Philips solution linked from the official GitHub docs (https://github.com/philips-labs/terraform-aws-github-runner/issues/1853); they also seem to have run into similar issues with managing lifecycle.
If a runner can be created and set to run only the job with a given job ID, lifecycle management becomes much easier.