-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce "No runner matching the specified labels was found" errors while runner replacement #392
Merged
mumoshu
merged 4 commits into
master
from
runnerreplicaset-replacement-perserving-available-runners
Mar 16, 2021
Merged
Reduce "No runner matching the specified labels was found" errors while runner replacement #392
mumoshu
merged 4 commits into
master
from
runnerreplicaset-replacement-perserving-available-runners
Mar 16, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ion timeout check Setting a short sync-period on actions-runner-controller had resulted in a series of paginated list-runners GitHub API calls for each sync on every idle runner. This was not necessarily. The original reason we wanted to call the GitHub API here was to determine if the runner needs to be recreated due to the registration timeout. A registration timeout happens only after 10 minutes after the pod creation so we do not need to check earlier than that. Since this change, the controller checks for the registration timeout only after approx 10 minutes regardless of how short the sync period is.
…le runner replacement We occasionally encountered those errors while the underlying RunnerReplicaSet is being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out to be due to that the RunnerDeployment controller was waiting for the runner pod becomes `Running`, intead of the new replacement runner to have registered to GitHub. This fixes that, by trying to Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.
…ve resync Each runner resource is reconciled on any update in its spec or status, e.g. runner pod status update, and even runner status change triggered by the runner controller itself. As result, several ListRunners GitHub API have been called until single runner resource gets ready. Ideally it should be called only once or twice. This updates the runner controller and runner status resource to reduce the number of calls to twice in most cases. RunnerStatus has been obtained the new `LastRegistrationCheckTime` field to enable the controller to do so. I have also tweaked timings and contents of log messages emitted while a RunnerDeployment is being updated and RunnerReplicaSets and Runners are replaced. It should be more readable and less scary.
mumoshu
deleted the
runnerreplicaset-replacement-perserving-available-runners
branch
March 16, 2021 01:52
mumoshu
added a commit
that referenced
this pull request
Mar 18, 2021
… \"null\"` error Follow-up for #392
mumoshu
added a commit
that referenced
this pull request
Mar 19, 2021
Since #392, the runner controller could have taken unexpectedly long time until it finally notices that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute(hard-coded). More concretely, let's say you had configured a long sync-period of like 10m, the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding in how requeueing work.
mumoshu
added a commit
that referenced
this pull request
Mar 19, 2021
Since #392, the runner controller could have taken unexpectedly long time until it finally notices that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute(hard-coded). More concretely, let's say you had configured a long sync-period of like 10m, the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding in how requeueing work.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We occasionally encountered those errors while the underlying RunnerReplicaSet is being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out to be due to that the RunnerDeployment controller was waiting for the runner pod becomes
Running
, intead of the new replacement runner to have registered to GitHub. This fixes that, by trying to Runner.Status.Phase toRunning
only after the runner in the runner pod appears to be registered.A side-effect of this change is that runner controller would call more "ListRunners" GitHub Actions API. I've reviewed and improved the runner controller code and Runner CRD to make make the number of calls minimum. In most cases, ListRunners should be called only twice for each runner creation.