Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage runner with label #355

Merged
merged 20 commits into from
Mar 5, 2021
Merged

Conversation

tapih
Copy link
Contributor

@tapih tapih commented Feb 25, 2021

This PR fixes #330 .

  • RunnerDeployment manages RunnerReplicaSet with label selector
  • RunnerReplicaSet manages Runners with label selector
  • HorizontalRunnerAutoscler lists Runners by labels when PercentageRunnersBusy is specified

I manually checked that HorizontalRunnerAutoscler works properly on the PercentageRunnersBusy mode when multiple RunnerDeployment exist in the same namespace.

@@ -270,6 +316,12 @@ func (r *RunnerDeploymentReconciler) newRunnerReplicaSet(rd v1alpha1.RunnerDeplo

newRSTemplate.Labels = labels

if rd.Spec.Selector == nil {
Copy link
Collaborator

@mumoshu mumoshu Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a breaking change and I'd like to make it backward-compatible as much as possible.

Sorry if I'm missing something or I have any misconception - but can we use labels to compose the default rd.Spec.Selector here? As far as remember, labels is inherited to RunnerReplicaSet.ObjectMeta.Labels and and Runner.ObjectMeta.Labels, which might be enough for resolving the original issue.

As far as it is backward compatible. I think I'm fine to add selector and matchLabels, matchExpressions. But it isn't strictly needed for the original issue, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to express the above as code in 62fcd5f. Does it make sense to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

62fcd5f basically looks good, but it's not completely backward-compatiblem, IMO.

Let's say users upgrades the controller while RunnerReplicaSets already exist.
The new controller cannot find the existing RunnerReplicaSets, because it looks for RunnerReplicaSets by labels but the existing RunnerReplicaSets are not labeled.
I guess that new RunnerReplicaSets will be created after the controller is upgraded.

I found 2 ways to avoid that.

  • The controller lists RunnerReplicaSets both with labels and ownerReference and makes a set of both.
  • The controller lists RunnerReplicaSets first, and if they are not labeled, it labels RunnerReplicaSets first.
    Then, it lists RunnerReplicaSets both with labels again.

These are temporal. and you can delete them when you are forced to break backward compatibility.

Copy link
Collaborator

@mumoshu mumoshu Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tapih Thanks! Could you double-check my idea?

The new controller cannot find the existing RunnerReplicaSets, because it looks for RunnerReplicaSets by labels but the existing RunnerReplicaSets are not labeled.
I guess that new RunnerReplicaSets will be created after the controller is upgraded.

Would this be about the updated controller results in recreating RunnerReplicaSet due to the replicasets' template hash change by the addition of the selector? As long as the controller update doesn't result in requiring the user to manually update K8s resources other than the controller's, it looks fine to me. WDYT?

My motivation behind 62fcd5f was to keep backward-compatibility in the sense of the RunnerDeployment API / avoid forcing users to manually update their RunnerDeployment resources to include Spec.Template.Selector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

desiredRS would change due to the addition of the selector, but these lines in the current code does not get the existing RunnerReplicaSets in the first place because it now matches RunnerReplicaSets with labels but RunnerReplicaSets are not labeled before the controller is updated, IIUC.
I understand your motivation here and agree with you, so we should do what I suggested here to avoid this problem, .

What do you think about it? Thank you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RunnerReplicaSet manages Runner with labels?

Thanks to your change, RunnerReplicaSet already seems to manage Runner with labels. After 9d75e30, the controller even works correctly for RunnerReplicaSet with/without Spec.Selector.

If so, I'm gonna make another small fix for HRA to work when no selector is given.

Are you willing to add something like the owner-reference check similar to 9d75e30 the below?

https://github.com/summerwind/actions-runner-controller/blob/9d75e30c9cd39b9f46aca5afd3b56caec1a99d27/controllers/runnerreplicaset_controller.go#L77-L81

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tapih Just a friendly reminder for the case that you missed my comment 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I took a day off yesterday...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this way to fix the HRA bug because I thought we should basically use labels to have RunnerDeployment manage Runners to make the logic as symmetric as possible with k8s's Deployment
At the same time, I undestood that you didn't want to break the compatibility, so I made some patches not to break the compatibility afterwards.

However, now the code is getting more and more complex, and I'm being strongly concerned that this would make it unnecessaliry harder to maintain this project later.
I'm really sorry for taking your time so much, but I would like to suggest taking the following steps:

  1. Fix the HRA bug without using labels
  2. Merge the PR which has the RunnerDeployment manage Runners with labels when you are forced to break the compatibility later

I'm willing to make PRs again. What do you think about it?

Copy link
Collaborator

@mumoshu mumoshu Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tapih Just to be clear, I was actually inclined to add selector to RunnerDeployment and RunnerReplicaSet, despite that it requires some work to make it backward-compatible.

Not only it helps us shape RunnerDeplomyet as much as similar as K8s Deployment, I found that selector itself can be a good/common language that allows the third-party(user) to easily find a specific set of runners managed by a runnerreplicaset/runnerdeployment.

Without selector, one had to read actions-runner-controller code to see what information can be used to correlate a runner deployment/replicaset to the corresponding runner(s).

So we now seem to have another benefit that we weren't aware of, for adding selector.

In addition to that-

Fix the HRA bug without using labels

(I may have been pretty confused, but) if you literally meant to NOT use labels, I'm not sure how we can achieve that. If you meant to not use selectors, I'm still unsure how easy it is.

HRA needs to find the latest runners managed by the targeted RunnerDeployment anyway.

Just following owner-references from HRA, RD, RS, and Runners would be not enough.

For HRA-RD, RS-R it's fine, as there might be only 1-to-1 relationship among them. However for RD, there may be two or more RS. They have different runner-template-hash label values and that label has been already there for RunnerReplicaSet, so using it for filtering RunnerReplicaSet seems natural.

We also need to determine which runner-template-hash value is one for the latest runner spec. Instead of relying on Selector, we can probably modify the HRA controller to call the same hash computing function on the RD.Spec.Template as the RD controller for that. But doing so would make a tight coupling between RD and HRA controllers. Wouldn't it be easier to add/use Selector then, in terms of design/implementation complexity?

WDYT?

You may have thought that it's really up to me to decide which direction we should go, but I'd really appreciate your feedback!

tapih and others added 15 commits March 2, 2021 13:51
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: binoue <banji-inoue@cybozu.co.jp>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: binoue <banji-inoue@cybozu.co.jp>
Signed-off-by: binoue <banji-inoue@cybozu.co.jp>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>
@mumoshu
Copy link
Collaborator

mumoshu commented Mar 4, 2021

It seems like we get validation failure list:\nspec.selector in body must be of type object: \"null\"" errors on Spec.Selector for existing RunnerReplicaSet resources and RunnerReplicaSet update errors due to some bug. Added 2e0eab0 for the fix.

ctx,
&runnerList,
client.InNamespace(rd.Namespace),
client.MatchingLabelsSelector{Selector: selector},
Copy link
Collaborator

@mumoshu mumoshu Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have roughly mentioned in #355 (comment), this might be incorrect.

We don't include runner-template-hash into account here, so at least this can result in runners from multiple RunnerReplicaSet to match. It would double numRunners while replacing RunnerReplicaSet due to RD.Spec.Template change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we could just use rd.Spec.Replicas (that defaults to 1 if missing) instead of numRunners := len(runnerList.Items), while keeping this line as-is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it might be out-of-scope- the above issue should have been there before this PR.

@mumoshu
Copy link
Collaborator

mumoshu commented Mar 5, 2021

Actually, I'm unsure if we really need matchExpressions. Can we just remove it from the spec, so that the amount of change becomes a bit smaller?

Copy link
Collaborator

@mumoshu mumoshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tapih Let me merge this anyway so that everyone can participate in testing!

To be clear, I'm pretty confident this is working as expected.

But still wondering if we can remove matchExpression for simplicity.

I'll shortly fix #355 (comment). It's clearly out of the scope of this PR.

Other than those, I'm not aware of any other enhancements we need on top of this right now.

If you have any comments/opinions/feedbacks/etc, it's more than welcome!

@mumoshu mumoshu merged commit 11e58fc into actions:master Mar 5, 2021
mumoshu added a commit that referenced this pull request Mar 5, 2021
@@ -143,6 +144,28 @@ func (r *RunnerDeploymentReconciler) Reconcile(req ctrl.Request) (ctrl.Result, e
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
}

if !reflect.DeepEqual(newestSet.Spec.Selector, desiredRS.Spec.Selector) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, and to be clear, Selector is immutable in the standard K8s deployment. But we'd prefer making this mutable for our use-case

mumoshu added a commit that referenced this pull request Mar 11, 2021
PercentageRunnerBusy seems to have regressed since #355 due to that RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label for the default runner label and the selector, which avoids querying with empty selector.

Ref #377 (comment)
mumoshu added a commit that referenced this pull request Mar 11, 2021
PercentageRunnerBusy seems to have regressed since #355 due to that RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label for the default runner label and the selector, which avoids querying with empty selector.

Ref #377 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Autoscaling controller should not calculate desired replicas including out-of-scaleTargetRef RunnerDeployment
3 participants