Create backoff mechanism for failed runners and allow re-creation of failed ephemeral runners #4059

nikola-jokic · 2025-04-25T16:42:29Z

…failed ephemeral runners

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

adamcharnock · 2025-05-02T10:29:22Z

This would be a big help to us. We're having some issues with our KubeVirt runners that we need to resolve, but this would help us out until we can nail down the precise problem.

rr-krupesh-savaliya · 2025-05-13T13:24:21Z

Could we please prioritize this PR? We are currently experiencing the same problem as described in #2721, and it is impacting our workflow.

@nikola-jokic @mumoshu @rentziass @toast-gear

nikola-jokic · 2025-05-13T13:29:15Z

Hey @rr-krupesh-savaliya,

I just want to say that, depending on the root cause of the failures on your cluster, you may still want to inspect and fix them. This change is only a measure that allows self-healing; the underlying problems should still be fixed so you get all the benefits of autoscaling.

Link- · 2025-05-14T12:33:31Z

controllers/actions.github.com/ephemeralrunner_controller_test.go

-				}
-				return updated.Status.Reason, nil
-			}, ephemeralRunnerTimeout, ephemeralRunnerInterval).Should(BeEquivalentTo("TooManyPodFailures"), "Reason should be TooManyPodFailures")
+			for i := range 5 {


Do we have coverage for partial failures? Do we want to add one more test case to cover the behaviour for fewer than 5 failures?

nikola-jokic · 2025-05-14

There is a test that covers it on eviction, which largely covers the re-creation case, and we have the case that checks exit code 0 when the runner exists in the service. I feel these tests cover the fewer-than-5-failures scenario.

Link-

We already discussed this at length; I left a non-blocking comment.

nikola-jokic added 2 commits April 25, 2025 18:41

Create backoff mechanism for failed runners and allow re-creation of …

3c62dba

…failed ephemeral runners

fix test

850d112

nikola-jokic marked this pull request as ready for review April 28, 2025 10:15

Copilot AI review requested due to automatic review settings April 28, 2025 10:15

nikola-jokic requested review from mumoshu, toast-gear, rentziass and a team as code owners April 28, 2025 10:15

Copilot AI reviewed Apr 28, 2025

nikola-jokic added the gha-runner-scale-set label Apr 28, 2025

nikola-jokic assigned Link- Apr 28, 2025

Link- reviewed May 14, 2025

Link- approved these changes May 14, 2025

nikola-jokic merged commit cae7efa into master May 14, 2025
17 checks passed

nikola-jokic deleted the nikola-jokic/retry-ephemeral-runner branch May 14, 2025 13:38

kahirokunn mentioned this pull request May 15, 2025

I want to disable "Pod has failed to start more than 5 times" #2721

Closed
