
actions-runner min Pod is '0'. #4073


Closed
oliverpark999 opened this issue May 7, 2025 · 8 comments
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)

Comments


oliverpark999 commented May 7, 2025


Controller Version

0.11.0

Deployment Method

ArgoCD

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Initial ARC deployment
2. After configuring, at least one Runner Pod was running for a while
3. After a few days, the number of Runner Pods became 0
4. During this time there were no new builds. If there are no builds for a few days, should the Runner Pod count drop to 0? That doesn't match my understanding.
5. After the Runner Pod count reached 0, a new build creates a Runner Pod, but it is terminated again as soon as the build completes.

Describe the bug

I set the actions-runner min/max pod count as below.

maxRunners: 10
minRunners: 1

At first, at least one Runner Pod was always running, but at some point no Runner Pod is visible, and a Runner Pod is only created when a build runs.
The version I'm using is 0.11.0 and I configured it with Helm.
Why is that?

Describe the expected behavior

There must be at least one Runner Pod running.

Additional Context

## maxRunners is the max number of runners the autoscaling runner set will scale up to.
maxRunners: 50

## minRunners is the min number of idle runners. The target number of runners created will be
## calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 1

## A self-signed CA certificate for communication with the GitHub server can be
## provided using a config map key selector. If `runnerMountPath` is set, for
## each runner pod ARC will:
## - create a `github-server-tls-cert` volume containing the certificate
##   specified in `certificateFrom`
## - mount that volume on path `runnerMountPath`/{certificate name}
## - set NODE_EXTRA_CA_CERTS environment variable to that same path
## - set RUNNER_UPDATE_CA_CERTS environment variable to "1" (as of version
##   2.303.0 this will instruct the runner to reload certificates on the host)
##
## If any of the above had already been set by the user in the runner pod
## template, ARC will observe those and not overwrite them.
## Example configuration:
#
# githubServerTLS:
#   certificateFrom:
#     configMapKeyRef:
#       name: config-map-name
#       key: ca.crt
#   runnerMountPath: /usr/local/share/ca-certificates/

## Container mode is an object that provides out-of-box configuration
## for dind and kubernetes mode. Template will be modified as documented under the
## template object.
##
## If any customization is required for dind or kubernetes mode, containerMode should remain
## empty, and configuration should be applied to the template.
containerMode:
  type: "dind" ## type can be set to dind or kubernetes
#   ## the following is required when containerMode.type=kubernetes
#   kubernetesModeWorkVolumeClaim:
#     accessModes: ["ReadWriteOnce"]
#     # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
#     storageClassName: "dynamic-blob-storage"
#     resources:
#       requests:
#         storage: 1Gi
#

Controller Logs

2025-05-07T00:24:09Z    INFO    EphemeralRunner Deleted the runner pod  {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:24:09Z    INFO    EphemeralRunner Cleaning up the runner jitconfig secret {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:24:09Z    INFO    EphemeralRunner Deleting the jitconfig secret   {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:24:09Z    INFO    EphemeralRunner Deleted jitconfig secret        {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:24:09Z    INFO    EphemeralRunner Removing finalizer      {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:24:09Z    INFO    EphemeralRunnerSet      Ephemeral runner counts {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "pending": 0, "running":0, "finished": 0, "failed": 1, "deleting": 0}
2025-05-07T00:24:09Z    INFO    EphemeralRunnerSet      Scaling comparison      {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "current": 1, "desired":1}
2025-05-07T00:24:09Z    INFO    EphemeralRunner Successfully removed finalizer after cleanup    {"version": "0.11.0", "ephemeralrunner": {"name":"foobar-6bxd8-runner-dd8w2","namespace":"arc-runners"}}
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Ephemeral runner counts {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "pending": 0, "running":0, "finished": 0, "failed": 1, "deleting": 0}
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Scaling comparison      {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "current": 1, "desired":3}
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Creating new ephemeral runners (scale up)       {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "count": 2}
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Creating new ephemeral runner   {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "progress": 1, "total": 2}
2025-05-07T00:33:58Z    INFO    KubeAPIWarningLogger    unknown field "spec.metadata.creationTimestamp"
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Created new ephemeral runner    {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "runner": "foobar-6bxd8-runner-4hkxj"}
2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Creating new ephemeral runner   {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "progress": 2, "total": 2}
2025-05-07T00:33:58Z    INFO    AutoscalingRunnerSet    Find existing ephemeral runner set      {"version": "0.11.0", "autoscalingrunnerset": {"name":"foobar","namespace":"arc-runners"}, "name": "foobar-6bxd8", "specHash": "59f76f6b8d"}

Runner Pod Logs

This is the log of the listener Pod.

2025-05-07T00:27:31Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 0, "decision": 1, "min": 1, "max": 10, "currentRunnerCount": 1, "jobsCompleted": 0}
2025-05-07T00:27:31Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
@oliverpark999 added the bug (Something isn't working), needs triage (Requires review from the maintainers), and gha-runner-scale-set (Related to the gha-runner-scale-set mode) labels on May 7, 2025
Contributor

github-actions bot commented May 7, 2025

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic
Collaborator

Hey @oliverpark999,

Your runner has failed more than 5 times, so it was marked as failed. You can see this in the log line:

2025-05-07T00:33:58Z    INFO    EphemeralRunnerSet      Ephemeral runner counts {"version": "0.11.0", "ephemeralrunnerset": {"name":"foobar-6bxd8","namespace":"arc-runners"}, "pending": 0, "running":0, "finished": 0, "failed": 1, "deleting": 0}

Failed runners are counted when calculating how much we need to scale. This is expected behavior, so I'm going to close the issue. You should investigate the reason for the failure.


DingGGu commented May 12, 2025

Hi @nikola-jokic ,

This issue needs to be discussed rather than just closed.

The only "automatic" solution for this problem is to delete the failed EphemeralRunner via a CronJob:
There's a good example in the recent discussion: #3300

Our runners also failed at the same time as everyone else's, and the scale set didn't recover until we manually deleted the failed EphemeralRunner.
I still need to look into why the runner didn't run, but since it happened at the same time for everyone, it appears to be a GitHub-side issue where the EphemeralRunner isn't registered to the GitHub RunnerGroup.
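The CronJob-style cleanup described above can be sketched roughly as follows. This is an illustrative sketch, not an official ARC tool: it assumes the EphemeralRunner custom resource reports a `Failed` phase in `.status.phase`, which you should verify against your ARC version before relying on it.

```shell
#!/bin/sh
# Sketch: periodically delete failed EphemeralRunners so the scale set
# can replace them. Assumes .status.phase carries the value "Failed"
# for failed runners (verify on your ARC version first).
NS=arc-runners

# Collect the names of ephemeral runners whose status phase is Failed.
failed=$(kubectl get ephemeralrunners -n "$NS" \
  -o jsonpath='{range .items[?(@.status.phase=="Failed")]}{.metadata.name}{"\n"}{end}')

# Delete each one; the controller will create replacements as needed.
for runner in $failed; do
  kubectl delete ephemeralrunner -n "$NS" "$runner"
done
```

Run from a Kubernetes CronJob with a ServiceAccount that has get/list/delete permissions on the `ephemeralrunners` resource in the runner namespace.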

@nikola-jokic
Collaborator

Hey @DingGGu,

It might have been a back-end issue, but ARC is still working as expected. By design, 5 failed attempts cause the ephemeral runner to reach the failed state, where it is left for further inspection. There are ongoing efforts to add self-recovery for failed ephemeral runners, which should be released in 0.12.0. Regardless, from ARC's perspective, 5 failed attempts leave the ephemeral runner pod in the failed state, which is exactly what happened.

We try to keep issues related to ARC itself here; that is why I closed it.


DingGGu commented May 12, 2025

Hi @nikola-jokic,

Is there any way to find a hint as to why the EphemeralRunner failed 5 times?

I can't find any related event in the EphemeralRunner or log of arc-gha-rs-controller or listener.

@nikola-jokic
Collaborator

It is usually in the reason field under the ephemeral runner's status. Otherwise, you probably need to inspect the kubelet log. We copy the pod termination reason to the ephemeral runner's status reason, so if it is not present on the pod, the ephemeral runner will not have it either.
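For reference, the status reason mentioned above can be read with kubectl; the runner name below is taken from the logs earlier in this thread, so substitute one of your own (the exact status fields may vary by ARC version).

```shell
#!/bin/sh
# Inspect why an ephemeral runner was marked failed.
NS=arc-runners
RUNNER=foobar-6bxd8-runner-dd8w2   # example name from this thread's logs

# Read the termination reason/message copied into the runner's status.
kubectl get ephemeralrunner -n "$NS" "$RUNNER" \
  -o jsonpath='{.status.reason}{"\n"}{.status.message}{"\n"}'

# If those fields are empty, describe the resource and check its events.
kubectl describe ephemeralrunner -n "$NS" "$RUNNER"
```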

@talsh-oasis

Why was this issue closed?
We still get this error for ghcr.io/actions/actions-runner:2.323.0

@talsh-oasis

Hi @nikola-jokic,

Is there any way to find a hint as to why the EphemeralRunner failed 5 times?

I can't find any related event in the EphemeralRunner or log of arc-gha-rs-controller or listener.

Did you find any solution?

4 participants