Manage runner with label #355

tapih · 2021-02-25T07:58:06Z

This PR fixes #330 .

RunnerDeployment manages RunnerReplicaSet with label selector
RunnerReplicaSet manages Runners with label selector
HorizontalRunnerAutoscler lists Runners by labels when PercentageRunnersBusy is specified

I manually checked that HorizontalRunnerAutoscler works properly on the PercentageRunnersBusy mode when multiple RunnerDeployment exist in the same namespace.

mumoshu · 2021-02-25T11:08:10Z

controllers/runnerdeployment_controller.go

@@ -270,6 +316,12 @@ func (r *RunnerDeploymentReconciler) newRunnerReplicaSet(rd v1alpha1.RunnerDeplo

 	newRSTemplate.Labels = labels

+	if rd.Spec.Selector == nil {


This seems like a breaking change and I'd like to make it backward-compatible as much as possible.

Sorry if I'm missing something or I have any misconception - but can we use labels to compose the default rd.Spec.Selector here? As far as remember, labels is inherited to RunnerReplicaSet.ObjectMeta.Labels and and Runner.ObjectMeta.Labels, which might be enough for resolving the original issue.

As far as it is backward compatible. I think I'm fine to add selector and matchLabels, matchExpressions. But it isn't strictly needed for the original issue, right?

I've tried to express the above as code in 62fcd5f. Does it make sense to you?

62fcd5f basically looks good, but it's not completely backward-compatiblem, IMO.

Let's say users upgrades the controller while RunnerReplicaSets already exist.
The new controller cannot find the existing RunnerReplicaSets, because it looks for RunnerReplicaSets by labels but the existing RunnerReplicaSets are not labeled.
I guess that new RunnerReplicaSets will be created after the controller is upgraded.

I found 2 ways to avoid that.

The controller lists RunnerReplicaSets both with labels and ownerReference and makes a set of both.

The controller lists RunnerReplicaSets first, and if they are not labeled, it labels RunnerReplicaSets first.
Then, it lists RunnerReplicaSets both with labels again.

These are temporal. and you can delete them when you are forced to break backward compatibility.

@tapih Thanks! Could you double-check my idea?

The new controller cannot find the existing RunnerReplicaSets, because it looks for RunnerReplicaSets by labels but the existing RunnerReplicaSets are not labeled.
I guess that new RunnerReplicaSets will be created after the controller is upgraded.

Would this be about the updated controller results in recreating RunnerReplicaSet due to the replicasets' template hash change by the addition of the selector? As long as the controller update doesn't result in requiring the user to manually update K8s resources other than the controller's, it looks fine to me. WDYT?

My motivation behind 62fcd5f was to keep backward-compatibility in the sense of the RunnerDeployment API / avoid forcing users to manually update their RunnerDeployment resources to include Spec.Template.Selector.

desiredRS would change due to the addition of the selector, but these lines in the current code does not get the existing RunnerReplicaSets in the first place because it now matches RunnerReplicaSets with labels but RunnerReplicaSets are not labeled before the controller is updated, IIUC.
I understand your motivation here and agree with you, so we should do what I suggested here to avoid this problem, .

What do you think about it? Thank you.

RunnerReplicaSet manages Runner with labels?

Thanks to your change, RunnerReplicaSet already seems to manage Runner with labels. After 9d75e30, the controller even works correctly for RunnerReplicaSet with/without Spec.Selector.

If so, I'm gonna make another small fix for HRA to work when no selector is given.

Are you willing to add something like the owner-reference check similar to 9d75e30 the below?

https://github.com/summerwind/actions-runner-controller/blob/9d75e30c9cd39b9f46aca5afd3b56caec1a99d27/controllers/runnerreplicaset_controller.go#L77-L81

@tapih Just a friendly reminder for the case that you missed my comment 😃

Sorry, I took a day off yesterday...

I took this way to fix the HRA bug because I thought we should basically use labels to have RunnerDeployment manage Runners to make the logic as symmetric as possible with k8s's Deployment
At the same time, I undestood that you didn't want to break the compatibility, so I made some patches not to break the compatibility afterwards.

However, now the code is getting more and more complex, and I'm being strongly concerned that this would make it unnecessaliry harder to maintain this project later.
I'm really sorry for taking your time so much, but I would like to suggest taking the following steps:

Fix the HRA bug without using labels

Merge the PR which has the RunnerDeployment manage Runners with labels when you are forced to break the compatibility later

I'm willing to make PRs again. What do you think about it?

@tapih Just to be clear, I was actually inclined to add selector to RunnerDeployment and RunnerReplicaSet, despite that it requires some work to make it backward-compatible.

Not only it helps us shape RunnerDeplomyet as much as similar as K8s Deployment, I found that selector itself can be a good/common language that allows the third-party(user) to easily find a specific set of runners managed by a runnerreplicaset/runnerdeployment.

Without selector, one had to read actions-runner-controller code to see what information can be used to correlate a runner deployment/replicaset to the corresponding runner(s).

So we now seem to have another benefit that we weren't aware of, for adding selector.

In addition to that-

Fix the HRA bug without using labels

(I may have been pretty confused, but) if you literally meant to NOT use labels, I'm not sure how we can achieve that. If you meant to not use selectors, I'm still unsure how easy it is.

HRA needs to find the latest runners managed by the targeted RunnerDeployment anyway.

Just following owner-references from HRA, RD, RS, and Runners would be not enough.

For HRA-RD, RS-R it's fine, as there might be only 1-to-1 relationship among them. However for RD, there may be two or more RS. They have different runner-template-hash label values and that label has been already there for RunnerReplicaSet, so using it for filtering RunnerReplicaSet seems natural.

We also need to determine which runner-template-hash value is one for the latest runner spec. Instead of relying on Selector, we can probably modify the HRA controller to call the same hash computing function on the RD.Spec.Template as the RD controller for that. But doing so would make a tight coupling between RD and HRA controllers. Wouldn't it be easier to add/use Selector then, in terms of design/implementation complexity?

WDYT?

You may have thought that it's really up to me to decide which direction we should go, but I'd really appreciate your feedback!

controllers/runnerdeployment_controller_test.go

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

…ward compatibility

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

…fore v0.17.0

mumoshu · 2021-03-04T01:12:07Z

It seems like we get validation failure list:\nspec.selector in body must be of type object: \"null\"" errors on Spec.Selector for existing RunnerReplicaSet resources and RunnerReplicaSet update errors due to some bug. Added 2e0eab0 for the fix.

…mplate hash change

mumoshu · 2021-03-04T06:54:08Z

controllers/autoscaling.go

+		ctx,
+		&runnerList,
+		client.InNamespace(rd.Namespace),
+		client.MatchingLabelsSelector{Selector: selector},


As I have roughly mentioned in #355 (comment), this might be incorrect.

We don't include runner-template-hash into account here, so at least this can result in runners from multiple RunnerReplicaSet to match. It would double numRunners while replacing RunnerReplicaSet due to RD.Spec.Template change.

Probably we could just use rd.Spec.Replicas (that defaults to 1 if missing) instead of numRunners := len(runnerList.Items), while keeping this line as-is.

Well, it might be out-of-scope- the above issue should have been there before this PR.

mumoshu · 2021-03-05T01:05:06Z

Actually, I'm unsure if we really need matchExpressions. Can we just remove it from the spec, so that the amount of change becomes a bit smaller?

mumoshu

@tapih Let me merge this anyway so that everyone can participate in testing!

To be clear, I'm pretty confident this is working as expected.

But still wondering if we can remove matchExpression for simplicity.

I'll shortly fix #355 (comment). It's clearly out of the scope of this PR.

Other than those, I'm not aware of any other enhancements we need on top of this right now.

If you have any comments/opinions/feedbacks/etc, it's more than welcome!

Addresses #355 (comment)

mumoshu · 2021-03-05T01:28:32Z

controllers/runnerdeployment_controller.go

@@ -143,6 +144,28 @@ func (r *RunnerDeploymentReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
 		return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
 	}

+	if !reflect.DeepEqual(newestSet.Spec.Selector, desiredRS.Spec.Selector) {


AFAIK, and to be clear, Selector is immutable in the standard K8s deployment. But we'd prefer making this mutable for our use-case

PercentageRunnerBusy seems to have regressed since #355 due to that RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label for the default runner label and the selector, which avoids querying with empty selector. Ref #377 (comment)

mumoshu reviewed Feb 25, 2021

View reviewed changes

controllers/runnerdeployment_controller_test.go Outdated Show resolved Hide resolved

tapih and others added 15 commits March 2, 2021 13:51

Update RunnerDeploymentSpec to have Selector field

053de07

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Update RunnerReplicaSetSpec to have Selector field

a93e8ab

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Add CloneSelectorAndAddLabel to add Selector field

225cd3a

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Fix tests

2f922a7

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Use label to find RunnerReplicaSet/Runner

6e90110

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

Update controller-gen versions in CRD

0b37e4b

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Update autoscaler to list Pods with labels

7c57fa5

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Add debug log

e119af9

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Modify RunnerDeployment tests

813b9fe

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

Modify RunnerReplicaset test

1058d8d

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

Modify integration test

ef0bd4a

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Use RunnerDeployment Template Labels as the default selector for back…

d9b0d70

…ward compatibility

Fix labeling

7eca735

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Update func in Eventually to return (int, error)

757a5cc

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

Update RunnerDeployment controller not to use label selector

9f8220c

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

tapih force-pushed the manage-runner-with-label branch from 8de84f9 to 9f8220c Compare March 2, 2021 05:12

mumoshu added 3 commits March 2, 2021 18:14

Fix potential replicaset controller breakage on replicaset created be…

9d75e30

…fore v0.17.0

Fix errors on existing runner replica sets

2e0eab0

Ensure RunnerReplicaSet Spec Selector addition does not break controller

a409a9e

Ensure RunnerDeployment Template.Spec.Labels change does result in te…

d6b3475

…mplate hash change

mumoshu force-pushed the manage-runner-with-label branch from 151e3b7 to d6b3475 Compare March 4, 2021 01:22

mumoshu reviewed Mar 4, 2021

View reviewed changes

Fix comment

555bd1a

mumoshu approved these changes Mar 5, 2021

View reviewed changes

mumoshu merged commit 11e58fc into actions:master Mar 5, 2021

mumoshu added a commit that referenced this pull request Mar 5, 2021

Fix possible flapping autoscale on runner update

727d0a5

Addresses #355 (comment)

mumoshu mentioned this pull request Mar 5, 2021

Fix possible flapping autoscale on runner update #371

Merged

mumoshu added a commit that referenced this pull request Mar 5, 2021

Fix possible flapping autoscale on runner update (#371)

4fa5315

Addresses #355 (comment)

mumoshu reviewed Mar 5, 2021

View reviewed changes

callum-tait-pbx mentioned this pull request Mar 5, 2021

chore: bumping helm chart #372

Merged

binoue mentioned this pull request Mar 8, 2021

Fix the runner controller to manage Runner by labels cybozu-go/neco#1428

Closed

4 tasks

mumoshu mentioned this pull request Mar 9, 2021

docs: fix typos and add PR link #379

Merged

This was referenced Mar 11, 2021

Fix PercentageRunnersBusy scaling not working #386

Merged

[Github Webhook HRA] Not able to get it working... #377

Closed

tapih mentioned this pull request Mar 22, 2021

Implement github-actions-controller at minimal cybozu-go/meows#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Manage runner with label #355

Manage runner with label #355

tapih commented Feb 25, 2021 •

edited

mumoshu Feb 25, 2021 •

edited

mumoshu Feb 26, 2021

tapih Feb 26, 2021

mumoshu Mar 1, 2021 •

edited

tapih Mar 1, 2021

mumoshu Mar 3, 2021

mumoshu Mar 3, 2021

tapih Mar 4, 2021

tapih Mar 4, 2021

mumoshu Mar 4, 2021 •

edited

mumoshu commented Mar 4, 2021

mumoshu Mar 4, 2021 •

edited

mumoshu Mar 4, 2021

mumoshu Mar 5, 2021

mumoshu commented Mar 5, 2021

mumoshu left a comment

mumoshu Mar 5, 2021

		@@ -270,6 +316,12 @@ func (r *RunnerDeploymentReconciler) newRunnerReplicaSet(rd v1alpha1.RunnerDeplo

		newRSTemplate.Labels = labels

		if rd.Spec.Selector == nil {

Manage runner with label #355

Manage runner with label #355

Conversation

tapih commented Feb 25, 2021 • edited

mumoshu Feb 25, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mumoshu Mar 1, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mumoshu Mar 4, 2021 • edited

Choose a reason for hiding this comment

mumoshu commented Mar 4, 2021

mumoshu Mar 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mumoshu commented Mar 5, 2021

mumoshu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tapih commented Feb 25, 2021 •

edited

mumoshu Feb 25, 2021 •

edited

mumoshu Mar 1, 2021 •

edited

mumoshu Mar 4, 2021 •

edited

mumoshu Mar 4, 2021 •

edited