Runner created but no pods linked on GHES on Canary from 13/03/2022 #1223
Comments
@Eriwyr Wow! Thanks a lot for testing canary. Re your issue, I see. That error is usually emitted when your PAT or GitHub App does not have the required permission. We've made no GitHub-API-related changes other than adding a transparent cache layer, and I did verify that the layer does not affect authentication. So I hope this is just a simple permission misconfiguration on either your PAT or App. Thanks in advance for your cooperation!
Thank you very much for your answer @mumoshu. We use a GitHub App for auth, and we have checked its permissions. Everything works correctly if we switch back to, for example, a canary from 10 days ago, or to the last stable release. The permissions of the GitHub App are those specified in the documentation. You can see from the logs that within the same second the log says OK and then 403. To confirm that it was not a permissions problem, we also performed a manual curl with the GitHub App's credentials, and that works correctly. Thank you in advance for your help!
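A manual check of that kind might look roughly like this (a sketch, not the exact command used: the GHES host, org name, and token variable are placeholders):

```sh
# Sketch of a manual permission check (all names are placeholders):
# request a runner registration token from the GHES API using a GitHub App
# installation token. A 201 response suggests the App's permissions are
# sufficient; a 403 points at the App configuration rather than ARC.
curl -i -X POST \
  -H "Authorization: Bearer ${INSTALLATION_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://ghes.example.com/api/v3/orgs/my-org/actions/runners/registration-token"
```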
@Eriwyr Thank you! All I can say is there may be an edge case where our new cache layer breaks some authenticated HTTP requests 🤔 Would you mind raising the log level and sharing what you see? Here's how you can set the log level, and here's the implementation of the API req/res logging; you might find it helpful for telling which messages come from that log level.
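As a sketch, assuming your ARC chart version exposes a logLevel value that is passed through as the controller's --log-level flag (check your chart's values.yaml), you would bump it via Helm values along these lines; the level itself is a placeholder:

```yaml
# Hedged sketch of Helm values for raising the controller's verbosity.
# Substitute the level suggested in the comment above.
logLevel: "<suggested-level>"
```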
Note that I was able to successfully use the canary version of ARC with my GitHub App and organizational runners. So at least it isn't totally broken.
Beware that it might print sensitive information, like tokens contained in the request headers for GitHub API access!
Hi, colleague here. Using the suggested log level:
Some more details on our environment
Seeing as there were multiple stacktraces.
@Meroje Which stacktraces did you get? 🤔 How did you enable the log level?
How did it ever work?! Our Helm chart doesn't have a namespace selector in the webhook config, which should be required to deploy one ARC per namespace. Without it, it can be nondeterministic which mutating webhook (including PodTokenInjector) comes into play for a runner pod. Suppose you have ARC instance 1 and instance 2: instance 2's webhook might come into play for a runner pod that is managed by instance 1.
To me, this seems to indicate that you are actually using the same mutatingwebhookconfig name for all the ARC instances across namespaces, right? 🤔
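A quick way to verify that hypothesis: MutatingWebhookConfiguration is cluster-scoped, so two releases templating the same name would collide. Something like the following could confirm it (a sketch):

```sh
# List mutating webhook configurations cluster-wide; with one ARC per
# namespace you would expect one uniquely named entry per instance.
kubectl get mutatingwebhookconfigurations

# Inspect a given configuration for a namespaceSelector
# (none is present in the chart discussed here):
kubectl get mutatingwebhookconfiguration <name> -o yaml | grep -B1 -A3 namespaceSelector
```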
I copied your exact snippet, which gets
The event would rather get sent to all matching webhooks; I guess ARC v0.21 did not care about some failures as long as one webhook was successful. Shall I send you a PR to add the namespace selector? It would look like this:

```diff
--- a/charts/actions-runner-controller/templates/webhook_configs.yaml
+++ b/charts/actions-runner-controller/templates/webhook_configs.yaml
@@ -12,6 +12,11 @@ metadata:
 webhooks:
 - admissionReviewVersions:
   - v1beta1
+  {{- if .Values.scope.singleNamespace }}
+  namespaceSelector:
+    matchLabels:
+      name: {{ .Values.scope.watchNamespace }}
+  {{- end }}
   clientConfig:
   {{- if .Values.admissionWebHooks.caBundle }}
   caBundle: {{ quote .Values.admissionWebHooks.caBundle }}
```
Also, I guess the lack of a namespace selector explains why we didn't see logs from the mutating webhook.
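For illustration, with the values referenced in the diff set (e.g. scope.singleNamespace=true and scope.watchNamespace=my-ns, where my-ns is a placeholder), the patched template would render roughly this webhook fragment. Note the design choice: it relies on the target namespace carrying a name=<watchNamespace> label.

```yaml
# Approximate rendered output of the patched template (a sketch):
webhooks:
- admissionReviewVersions:
  - v1beta1
  namespaceSelector:
    matchLabels:
      name: my-ns  # the namespace must be labeled name=my-ns for this to match
```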
Thank you for confirming!
I had a lightbulb moment when I was about to sleep: it might be that your duplicated mutating webhooks across namespaces have never worked.

Which custom resource are you using to deploy runners, RunnerSet or RunnerDeployment? Before ARC 0.22.0, only RunnerSet-managed runner pods were subject to the mutating webhook, so regardless of whether the mutating webhook was actually working, RunnerDeployment just worked. Since 0.22.0, it doesn't: every runner pod, whether managed by a RunnerSet or a RunnerDeployment, is subject to the mutating webhook, so if your webhook config/server is broken or misconfigured somehow, it may result in nasty errors. In your case, your mutating webhook wasn't correctly namespaced in terms of a namespaceSelector. That said:
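A quick way to check which kind is in use (assuming ARC's CRDs are installed under these plural resource names):

```sh
# Whichever of these returns items is the resource kind you are running.
kubectl get runnerdeployments --all-namespaces
kubectl get runnersets --all-namespaces
```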
Please! Your change definitely looks good and aligns well with my analysis above.
Yes, that makes sense to me too.
Still not seeing roundtrips from the mutatingwebhook to
Yes, we only ever used RunnerDeployments
Thanks for confirming the change in behaviour.
Will do :)
Describe the bug
We are trying to use the latest canary image of the actions-runner-controller in our GHES cluster to take advantage of the latest fixes. (We are really eager to see the 0.22 release :) )
We noticed the following problem: when launching jobs, we can see objects of kind Runner appearing in the cluster, but no pods are launched, and errors are logged on the runner controller.
To Reproduce
Use the actions-runner-controller:canary image on GHES (a values sketch follows).
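Pinning the canary tag through the chart's image values might look like this (a sketch; the repository shown is the commonly published one, so verify against your values.yaml):

```yaml
# Sketch of pinning the controller to the canary image via Helm values.
image:
  repository: summerwind/actions-runner-controller
  tag: canary
```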
Environment: