Scale Set Listener Stops Responding #3204
Comments
Hey @jameshounshell, is it possible that there are network interruptions causing problems? In version 0.8.0 we added client timeouts to the retriable client, which could help determine whether there are any network issues. Just keep in mind that we are missing metrics on the newest listener app (they will be part of 0.8.2; to enable metrics in the meantime, you can use the old listener app as described here). |
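To illustrate what client timeouts buy you here, below is a generic Go sketch (not ARC's retriable client; the function name and all durations are illustrative assumptions): with explicit timeouts configured, a stalled or dropped connection surfaces as an error that can be logged and retried instead of hanging a long-poll indefinitely.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newTimeoutClient builds an *http.Client with explicit connection and
// request timeouts. All durations below are illustrative, not ARC's values.
func newTimeoutClient() *http.Client {
	return &http.Client{
		// Upper bound on a whole request/response round trip.
		Timeout: 5 * time.Minute,
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   30 * time.Second, // TCP connect timeout
				KeepAlive: 30 * time.Second,
			}).DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 1 * time.Minute, // server must start responding within this window
			IdleConnTimeout:       90 * time.Second,
		},
	}
}

func main() {
	client := newTimeoutClient()
	resp, err := client.Get("https://api.github.com")
	if err != nil {
		// A network interruption shows up here instead of a silent hang.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```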
We can sit tight and wait for 0.8.2 and report back on this issue. |
I am seeing this exact same problem with our listener pod. I can also confirm it appears to be related to the token refresh. It's always the last thing logged by the pod before it stops responding. |
Hey @jasonluck, could you also please post the log so we can inspect it? It is always better to have more evidence of an issue to help us find the root cause. |
I am going to close this issue until we hear back from you. Please let us know if the issue is resolved; if not, we can always re-open it. |
Now that we've updated to the new version, here are the logs that I collected this morning:
|
@nikola-jokic Please re-open this ticket :) |
We are on 0.8.1 (with the LISTENER_ENTRYPOINT workaround), and I think we saw this today: one of our four listeners just stopped responding. The controller finalised and removed its remaining runner pods as they finished their jobs, but didn't produce any errors or otherwise appear to notice. About 45 minutes later a node scale-down evicted the controller, and it all came back to life after that. We also saw a token refresh just before this happened. Listener logs:
We have been using the current version for about a month without seeing it until today, but we are on EKS and upgraded from 1.28 to 1.29 yesterday. If it happens again I may downgrade and see if it goes away. |
@RalphSleighK This is the behavior we saw with 0.8.1 as well. If left as it is, it seems to recover after about an hour of being brain dead. Otherwise it takes us deleting the pod to make it happy again. |
@ccureau What version of K8s are you using out of curiosity? We've just upgraded to 1.29, wondering if it's a coincidence that we saw this issue post-upgrade. |
@angusjellis We're on 1.27.10 in a GKE cluster. We did an upgrade from 1.26 to 1.27 while this was happening without any change in listener behavior for what it's worth. |
We got a new error this morning... unsure what to make of it:
|
With the latest version, this happens once/twice a day (k8s v1.28.5). |
Same here with GKE |
Re-opening this one for #3204 (comment) |
Also seeing this here, with 0.9.1. The listener seems to be running, but we don't see any new jobs being created in the cluster. What's frustrating is that the log level is already debug by default, and there's no way to get any further detail from the listener to see what might be happening. I did take a quick look at the listener queue-handling code, and while I'm not positive that I'm correct, or that this possible problem is even related, I wonder if we have a goroutine leak here: when a queue message is requested and read, if any of the error paths are followed, we defer the closing of the response body. EDIT: I missed the full response body buffering in the same code path, so this may not be an issue after all. |
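For anyone unfamiliar with the pattern being discussed, here is a generic Go sketch (not the listener's actual code; the function name and message shape are invented for illustration) of draining and closing an HTTP response body on every path, which is what keeps a long-poll loop from leaking connections or leaving readers stuck:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// getMessage is a generic sketch of a long-poll read. The important part is
// that every path — success and error alike — both drains and closes the
// response body, so the transport can reuse the connection and no reader
// is left behind.
func getMessage(client *http.Client, url string) (map[string]any, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err // no body to close when the request itself fails
	}
	defer func() {
		io.Copy(io.Discard, resp.Body) // drain anything left unread
		resp.Body.Close()
	}()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("unexpected status %d: %s", resp.StatusCode, body)
	}

	var msg map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&msg); err != nil {
		return nil, fmt.Errorf("decoding message: %w", err)
	}
	return msg, nil
}

func main() {
	msg, err := getMessage(http.DefaultClient, "https://example.com/messages/next")
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	fmt.Println("got message:", msg)
}
```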
@nikola-jokic yes, that sounds very much like it! Thanks :) |
I've applied the change. So far, one day has passed, and I am waiting for more statistical data to accumulate. |
Hey @verdel, when you can, please provide the log from when the listener restarts. Since this has been an issue with the older version of the listener, it may be related to the Actions server. In the 0.9.1 release, we added a trace ID, which should help us track down the issue. |
I'll send the logs a bit later. We are staying on version 0.9.0 as we are waiting for a fix that will be added in this PR. |
Lately, I've been seeing the following message in the listener.
We use runners that are rarely launched (no more than once or twice a day), and such a message appears during the first launch of the day. Today, after I added a restart policy, there have been no messages so far. I will continue to observe and provide additional information when it becomes available. |
Hey everyone, I believe we found the root cause of this issue, and it is now fixed in 0.9.2. I will leave this issue open for now until someone confirms they are no longer seeing these problems with the newest release. |
I think this issue is safe to close now since there are no reports of listener crashes in the latest release. Thank you all for providing more information, and please let us know if you experience any issues like this again! |
I am having this issue on 0.9.3 |
Seeing this in 0.10.0 |
We believe we also may be seeing this; can we re-open this issue, @nikola-jokic? In our case, the logs we see are as follows: a bunch of "job assigned" messages, followed by the listener computing that zero new runners should be deployed, and then a token refresh a few minutes later. Perhaps related? We are running on the scale sets controller.
|
Hey @niodice, I'm hesitant to reopen this issue since the one you submitted is a separate issue from the one originally reported. Could you please share the workflow run URL here or via support so we can investigate it further? |
Checks
Controller Version
0.7.0
Deployment Method
Helm
Checks
To Reproduce
Describe the bug
Symptom:
Diagnostics:
Another symptom I've noticed: When the controller pod is updated (for instance, between 0.7.0 and 0.8.0) two inconsistencies happen:
A separate instance where we observed the issue
Currently running listener gha-runner-scale-set-controller:0.7.0
No logs were being generated; we had found a workflow job stuck in a pending state (from the listener logs).
10 runners were online, but the runner group was offline when looking at our org's Actions runner group settings (URL like this: https://github.com/organizations/bigcorp/settings/actions/runner-groups/9).
Verified listener metrics had flatlined, for example:

Then the queued job being watched was assigned with no intervention.
Below is the section of logs where the listener had stopped and then started working again with no intervention.
Describe the expected behavior
Ideally the listener would never stop responding.
Additional Context
The only additional thing we tried was using the "opt out" button on our GitHub App's advanced features page. This was kind of a Hail Mary since we saw the logs related to refreshing the token. It seems to have helped, but maybe that's just a fluke.
Controller Logs
Runner Pod Logs