scaleset listener's config is not deleted on cleanup #4029

Open
hsmade opened this issue Apr 9, 2025 · 8 comments · May be fixed by #4033
Labels: bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode)

Comments


hsmade commented Apr 9, 2025


Controller Version

0.11.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

- have a scaleset with a GHA token
- expire the token
- the scaleset listener pod crashes (expected)
- the controller deletes the listener pod
- the listener config secret is kept in place, with an old copy of the secret
- a new listener pod is created with the new secret as its context, but because the config secret already exists it is not replaced
- the listener pod crashes again, because the token in the config is still invalid

Describe the bug

The referenced code is never reached (I don't see the logged line in my logs), so the lines after it never run either; those appear to be in charge of deleting the config secret.

This results in the following when the token gets refreshed / replaced in the original secret (configured via autoscalingrunnerset.spec.githubConfigSecret):

$ kubectl -n github-runners get secret test-github-runners-55655b45-listener-config  -o jsonpath="{.data['config\.json']}"|base64 -d|jq .token | md5sum 
c4e76cd6a3e556f2348fa05590baf4a7  -
$ kubectl -n github-runners get secret test-github-runners-55655b45-listener  -o jsonpath={.data.github_token}|base64 -d|md5sum 
91ff6fda9734754b5fed52613bb15d13  -

Describe the expected behavior

I would expect that if I replace the token in the secret configured in autoscalingrunnerset.spec.githubConfigSecret, the listener pod will eventually run with the new token.

Additional Context

None.

Controller Logs

https://gist.github.com/hsmade/9f21b38f9680b7c3a42f690ecb6ddf65

Runner Pod Logs

https://gist.github.com/hsmade/82d14b70299d878322e420c605733760
hsmade added the bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), and needs triage (Requires review from the maintainers) labels on Apr 9, 2025
Contributor

github-actions bot commented Apr 9, 2025

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

Author

hsmade commented Apr 9, 2025

Maybe this is happening because the listener pod gets recreated by the reconciler between the return on line 292 and the next execution? So it never matches line 293 (case kerrors.IsNotFound(err))?

Author

hsmade commented Apr 11, 2025

Hmm no, it doesn't even get there. What I think happens is that the pod terminates because the container terminates and restartPolicy is set to Never. The Reconcile loop sees that, deletes the pod, and recreates it on the next loop. But because the config secret isn't deleted, it doesn't get recreated, so it still holds the old values.
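
For illustration, a minimal hypothetical sketch of the behaviour being described (not the actual actions-runner-controller code; names are made up): the config secret is only created when it is missing, so stale data survives pod recreation.

// Hypothetical sketch, not the actual controller code: illustrates why a
// config secret that already exists is never refreshed.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureListenerConfigSecret creates the listener config secret only when it
// is missing. If the pod crashed because the token inside an existing config
// secret expired, this path never refreshes it: the secret is found, so the
// recreated pod reuses the stale token.
func ensureListenerConfigSecret(ctx context.Context, c client.Client, desired *corev1.Secret) error {
	existing := &corev1.Secret{}
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), existing)
	switch {
	case err == nil:
		// Secret already exists: nothing is done, even if its data is stale.
		return nil
	case kerrors.IsNotFound(err):
		// Only a missing secret triggers creation with fresh data.
		return c.Create(ctx, desired)
	default:
		return err
	}
}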

@nikola-jokic
Collaborator

Hey @hsmade, if you want to update the token, you should probably update it for the autoscaling runner set as well. But if this secret gets updated, we should probably re-create the AutoscalingListener resource, which would re-create the configuration, the listener pod, the mirror secret, etc.

nikola-jokic removed the needs triage (Requires review from the maintainers) label on Apr 11, 2025
hsmade added a commit to hsmade/actions-runner-controller that referenced this issue Apr 11, 2025
This should fix actions#4029.
When the github token gets updated, and the old one expires, the listener pod dies. Only to be recreated with the existing listener config that holds the now expired token.
This PR fixes that by deleting the config secret as well, so the reconciler recreates it with the updated token.
hsmade linked a pull request on Apr 11, 2025 that will close this issue
Author

hsmade commented Apr 11, 2025

Hi!

The autoscaling runner set has:

spec:
  githubConfigSecret: github-auth

This is the secret that is being updated, but the change is not reflected in the listener.
What I find curious is that when the listener pod terminates, it gets deleted by the reconciler, only to be recreated with the exact same data, including the config secret, which isn't updated.
The mirror secret, on the other hand, is updated automatically and correctly.

What I suggest is that when the listener pod gets deleted, its config secret gets deleted as well.
This actually does fix the issue I'm seeing (tested this in a running environment).

PR #4033 is what I tested in my environment, and it seems to fix the issue.
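
As a rough sketch of what the suggested cleanup could look like (illustrative only, with made-up names; the actual change is in PR #4033): deleting the config secret together with the pod forces the next reconcile to rebuild it from the current token.

// Hypothetical sketch of the suggested cleanup; not the actual PR #4033 diff.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupListener deletes both the crashed listener pod and its config
// secret, so the next reconcile loop recreates both from the current token.
func cleanupListener(ctx context.Context, c client.Client, pod *corev1.Pod, configSecret *corev1.Secret) error {
	if err := c.Delete(ctx, pod); err != nil && !kerrors.IsNotFound(err) {
		return err
	}
	// Without this, the old config (and its expired token) would be reused.
	if err := c.Delete(ctx, configSecret); err != nil && !kerrors.IsNotFound(err) {
		return err
	}
	return nil
}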

@nikola-jokic
Collaborator

Hey @hsmade,

Thank you for raising the PR, but it doesn't fix the root of the problem. The listener can exit for various reasons, and re-creating the secret on every crash would put unnecessary pressure on the API server.
I think the appropriate fix is to track the spec hash, including the secret hash; a change to either would re-create the entire AutoscalingListener instance, which would also mirror the secret again.
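
As a rough illustration of the hash-tracking idea (hypothetical code, using a stand-in sha256 hash rather than the project's own hash helper): compute one hash over the listener spec plus the secret data, store it on the AutoscalingListener, and re-create the resource whenever the stored value no longer matches.

// Hypothetical sketch: a combined hash over the listener spec and the GitHub
// config secret data. A token rotation changes the hash even when the spec
// itself is unchanged, so the reconciler can re-create the AutoscalingListener
// (and with it the config and mirror secrets) on mismatch.
package sketch

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"sort"
)

func computeListenerHash(spec any, secretData map[string][]byte) (string, error) {
	h := sha256.New()

	specJSON, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	h.Write(specJSON)

	// Hash the secret data in a deterministic key order.
	keys := make([]string, 0, len(secretData))
	for k := range secretData {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(secretData[k])
	}

	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

The reconciler would compare this value against a hash stored on the existing AutoscalingListener (for example in a label) and delete / re-create the resource on mismatch.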

Would you be interested in testing this change?

Author

hsmade commented Apr 14, 2025

The code does seem to know when the mirror secret is out of sync. Maybe that should invalidate the config secrets as well:

	mirrorSecretDataHash := mirrorSecret.Labels["secret-data-hash"]
	secretDataHash := hash.ComputeTemplateHash(secret.Data)
	if mirrorSecretDataHash != secretDataHash {
		log.Info("Updating mirror listener secret for the listener pod", "mirrorSecretDataHash", mirrorSecretDataHash, "secretDataHash", secretDataHash)
		return r.updateSecretsForListener(ctx, secret, mirrorSecret, log)
	}

I'm confused as to why updateSecretsForListener doesn't (re)create the config secret, only the mirror secret.
What's the purpose of having this mirror secret, actually?
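
For illustration, a hypothetical sketch of what refreshing the config secret in that same sync step could look like (not the project's actual updateSecretsForListener; the config.json / token / github_token keys are taken from the kubectl output earlier in this issue):

// Hypothetical sketch, not the actual controller code: when the source secret
// has changed, rewrite the token inside the listener's config.json instead of
// leaving the stale value in place.
package sketch

import (
	"context"
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func refreshListenerConfig(ctx context.Context, c client.Client, source, config *corev1.Secret) error {
	var cfg map[string]any
	if err := json.Unmarshal(config.Data["config.json"], &cfg); err != nil {
		return err
	}
	// Replace the stale token with the current one from the source secret.
	cfg["token"] = string(source.Data["github_token"])

	updated, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	config.Data["config.json"] = updated
	return c.Update(ctx, config)
}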

@nikola-jokic
Collaborator

Hey @hsmade,

I was playing around with the fix, and I was wondering: if you simply update the values.yaml file and re-apply the new secret with it, wouldn't that solve your problem? Instead of updating the secret directly, you would update it through the values.yaml file. We already have mechanisms to detect when the values.yaml hash changes, so that should solve your problem.
