Too many GitHub's API requests reach the rate limit #206
Comments
I think a main contributor to this problem is the number of times the Runner Reconciler updates the registration token and then re-queues the reconcile job. Would you mind confirming the number of times
Updates to the Runner resource are not evaluated immediately but are placed on the controller-runtime workqueue, so the token expiration date is not updated right away. As a result, the following function returns false https://git.blendlabs.com/blend/actions-runner-controller/blob/184538e3ee1f85477884d29bc6dc33454f072f5d/controllers/runner_controller.go#L265 and the reconcile function returns Requeue: true in the runner reconcile func below:
I wonder if there is a good way to delay the requeue - one option is to add time.Sleep(X * time.Second) before returning the requeue result.
Hi @eb-trigo - can you try running with the latest release? The PR above should substantially reduce the total number of API requests made to GitHub.
@ZacharyBenamram I'm using the latest images for the controller and for the runners (DinD runner image). I'm still regularly banned from the GitHub API. My controller gets banned within 10 or 20 minutes, it depends... Do you think there's a solution other than the PR about
Hey @theobolo - another step you can take is to use the alternate HPA scheme PercentageRunnersBusy. This considerably reduces the number of API calls made to GitHub. Instead of iterating through each repository, then checking each workflow, and subsequently each job - the new HPA scheme only makes one call to GitHub to get the number of busy runners. This should reduce the total number of requests you are making. I would keep --sync-period at 1m to begin testing with. If you set sync-period too high, the HPA becomes useless.
@ZacharyBenamram Thanks for your answer, I'm going to give it a try right now :)
@ZacharyBenamram I'm certainly missing something, but the controller is printing this log and the autoscaling is not working:
I updated my CRDs to use the latest modifications, btw
@theobolo - are you able to add your organization to the runnerdeployment spec? And can you also provide your runner deployment + horizontal runner autoscaler specs here? |
OK, it works when I add Organization to my runnerdeployment spec; a repository-scoped RunnerDeployment does not seem to work. I can't use both within the RunnerDeployment (as expected). Here is my RunnerDeployment + Horizontal which is working now:
And this one, which is not:
I'll push the testing a little further tonight, I'm off for the moment. Thanks for your quick answer @ZacharyBenamram
@ZacharyBenamram I did try with this config &
Unfortunately I ended up being rate limited after maybe 30 minutes, more or less. BTW, I did succeed in running 10 parallel workers for 1 hour with 5-minute jobs on each... which is quite disappointing xD I need to set up Prometheus stats and Grafana to track the number of API requests during the scale-up process and while workers cycle through jobs. Has nobody here tried this before? I'm really surprised, I thought it was pretty common to run more than 10 parallel Actions runners, even more so when it's self-hosted on a kube cluster ;D PS: I also tried using a GitHub App instead of a GitHub PAT for authentication, but without any improvement :/
Honestly, I've never tried that with a low sync-period. My strategy so far has been to use as long a sync-period as possible to avoid rate limiting. Also, I think the API's rate limit is too strict for our use case. But I'm open to suggestions or data to help us make the controller even more efficient.
Also- could you run
At least, |
Hello everyone,
I ran a lot of tests yesterday, and I found that there is a big problem when you forget the
The Autoscaler is working, but it consistently tries to scale down to 1 worker, which is the number of replicas expected by the ReplicaSet...
This is really misleading because it kind of works, but it causes a lot of trouble and a lot of unnecessary API requests, because runners are scaled up and then instantly scaled down to 1 replica. It leads to some runners in the "Offline" state, with some skipped jobs on the GitHub Actions side, etc.
I suspect that some of these issues, #62 | #77, are related to this, because it's exactly the behaviour that I've seen on my cluster: some pods are "Completed", and the GitHub Actions UI shows them as "Offline". I'm not 100% sure that's related...
Without "replicas: 1", I didn't experience this case: Offline runners, etc. So don't use
On the API request limitation side, fixing this gave me a lot more room to use small
I did hit the limit by using 50 workers in parallel with a
I'll use your tip @mumoshu to track API calls and find the sweet spot between max replicas and sync-period. I'm very attracted by this controller as a way to minimize infrastructure cost with a fast and responsive HorizontalRunnerAutoscaler, but for the moment we need to deal with the GitHub API limitations...
I'll give you more insights soon; my goal is to achieve 100 workers in parallel with a decent
See ya
Not sure what's happening, but all of a sudden I'm getting high API call usage even with no HPAs installed. I have the controller manager installed on a single cluster and usage goes up by about 10-20 API calls every few seconds or so. This all started after a botched deployment of a second HPA which created about 2k runners in our Org, but even after we removed that installation and revoked the token, it's still eating through our rate limit. Edit: Looks like it was due to the 2k offline runners created in our Org. The reconciliation process seemed to be checking all those runners page by page and eating up our limit. I've deleted them all using the REST API and usage is a lot lower now.
Outside of changing the GitHub API behavior or playing around with syncing settings, any thoughts about having the controller register itself as a GitHub App so it can take advantage of higher server-to-server rate limiting?
Genuinely ignorant question; there might be a reason this is using PATs |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi,
After deploying ~20 runners/pods, I noticed that about half an hour after they were up and running, an error appeared inside the controller-manager container saying I had reached the GitHub App rate limit of 5,000 requests per hour.
Why is the actions-runner-controller making so many concurrent requests to GitHub in such a short time?
Is it possible to reduce this significantly?
These are the settings I added to avoid this issue but they did not help:
And --sync-period=10m