
start breaking up controller manager into two pieces #34273

Merged
merged 3 commits on Dec 23, 2016

Conversation

wlan0
Member

@wlan0 wlan0 commented Oct 6, 2016

This PR addresses: kubernetes/enhancements#88

This commit starts breaking the controller manager into two pieces, namely:

  1. a cloud provider dependent piece
  2. a cloud provider agnostic piece

The controller manager has the following control loops:

  • nodeController
  • volumeController
  • routeController
  • serviceController
  • replicationController
  • endpointController
  • resourceQuotaController
  • namespaceController
  • deploymentController, etc.

Among the above controller loops,

  • nodeController
  • volumeController
  • routeController
  • serviceController

are cloud provider dependent. As kubernetes has evolved tremendously, it has become difficult for the different cloud providers (currently 8) to make changes and iterate quickly. Moreover, the cloud providers are constrained by the kubernetes build/release lifecycle. This commit is the first step in moving towards a kubernetes code base where cloud-provider-specific code moves out of the core repository and is maintained by the cloud providers themselves.

I have added a new cloud provider called "external", which signals the controller-manager that the cloud-provider-specific loops are being run by another controller. I have added these changes in such a way that the existing cloud providers are not affected. This change is completely backwards compatible, and does not require any changes to the way kubernetes is run today.
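
To make that gating concrete, here is a minimal sketch of the idea; the constant name and function are illustrative, not the exact code in this PR:

    package app

    import (
        "github.com/golang/glog"

        "k8s.io/kubernetes/pkg/cloudprovider"
    )

    // externalCloudProvider is an illustrative name for the new provider value;
    // the real constant/flag handling lives in the controller-manager code.
    const externalCloudProvider = "external"

    // startCloudDependentControllers sketches the gating idea: when
    // --cloud-provider=external, the core controller-manager skips the
    // cloud-dependent loops and leaves them to an external controller-manager.
    func startCloudDependentControllers(cloudProviderName string, cloud cloudprovider.Interface) {
        if cloudProviderName == externalCloudProvider {
            glog.Info("cloud provider is external; skipping cloud-dependent controller loops")
            return
        }
        // Existing behaviour: run nodeController, routeController,
        // serviceController, etc. in-process against `cloud`.
        _ = cloud
    }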

Finally, along with the controller-manager, the kubelet also has cloud-provider specific code, and that will be addressed in a different commit/issue.

@alena1108 @ibuildthecloud @thockin @dchen1107

Special notes for your reviewer:

@thockin - I'm making this WIP PR to ensure that I don't stray too far from everyone's view of how we should make this change. As you can see, only one controller, namely the nodeController, can be disabled with the --cloud-provider=external flag at the moment. I'm working on cleaning up the rancher-controller-manager that I wrote to test this.

Secondly, I'd like to use this PR to address cloud-provider-specific code in the kubelet and api-server.

Kubelet
Kubelet uses provider-specific code for node registration and for checking node status. I thought of two ways to divide the kubelet:

  • We could start a cloud provider specific kubelet on each host as a part of kubernetes, and this cloud-specific-kubelet does node registration and node-status checks.
  • Create a kubelet plugin for each provider, which will be started by kubelet as a long running service. This plugin can be packaged as a binary.

I'm leaning towards the first option. That way, kubelet does not have to manage another process, and we can offload the process management of the cloud-provider-specific-kubelet to something like systemd.
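
For reference, the kubelet's cloud touchpoints during node registration boil down to a couple of read-only calls; a rough sketch against the cloudprovider.Instances interface of that era (method names approximate):

    package kubelet

    import (
        "github.com/golang/glog"

        "k8s.io/kubernetes/pkg/cloudprovider"
        "k8s.io/kubernetes/pkg/types"
    )

    // logCloudNodeInfo sketches the cloud touchpoints used during node
    // registration: the instance ID and the node addresses both come from the
    // provider's Instances() sub-interface.
    func logCloudNodeInfo(cloud cloudprovider.Interface, nodeName types.NodeName) {
        instances, ok := cloud.Instances()
        if !ok {
            glog.Warning("cloud provider does not expose an Instances() interface")
            return
        }
        if id, err := instances.InstanceID(nodeName); err == nil {
            glog.Infof("cloud instance ID for %s: %s", nodeName, id)
        }
        if addrs, err := instances.NodeAddresses(nodeName); err == nil {
            glog.Infof("cloud-reported addresses for %s: %v", nodeName, addrs)
        }
    }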

@dchen1107 @thockin what do you think?

Kube-apiserver

Kube-apiserver uses provider specific code for distributing ssh keys to all the nodes of a cluster. Do you have any suggestions about how to address this?

Release note:



@k8s-ci-robot
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line.

Regular contributors should join the org to skip this step.

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Oct 6, 2016
@bgrant0607
Member

@gmarek What do you think about the idea of splitting the node controller into cloud-dependent and cloud-independent controllers?

@bgrant0607 bgrant0607 assigned thockin and bgrant0607 and unassigned mikedanese Oct 6, 2016
@bgrant0607
Member

@wlan0 What are the uses in the kubelet?

Where is the ssh-key-distribution code?

@bgrant0607
Member

ok to test

@bgrant0607
Member

ssh key distribution:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L118

            if instances, supported := cloud.Instances(); supported {
                installSSH = instances.AddSSHKeyToAllInstances
            }

@bgrant0607
Member

My thought is that static information provided by the cloud provider (id, zone, etc.) could be retrieved from a file generated via cloud-init or something similar.

I believe cloud-provider-specific volume logic in Kubelet is deprecated, though @saad-ali could confirm or deny that.
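
As an illustration of the cloud-init suggestion above, a node component could read static cloud facts from a file like this; the path and schema are purely hypothetical:

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // nodeCloudInfo is a hypothetical schema that a cloud-init step could write
    // out at boot, letting node components read static cloud facts without
    // linking any provider code.
    type nodeCloudInfo struct {
        ProviderID   string `json:"providerID"` // e.g. "aws:///us-west-2a/i-0abc123"
        Zone         string `json:"zone"`
        Region       string `json:"region"`
        InstanceType string `json:"instanceType"`
    }

    func main() {
        // Path and schema are illustrative only.
        f, err := os.Open("/etc/kubernetes/cloud-node-info.json")
        if err != nil {
            fmt.Fprintln(os.Stderr, "no cloud info file:", err)
            os.Exit(1)
        }
        defer f.Close()

        var info nodeCloudInfo
        if err := json.NewDecoder(f).Decode(&info); err != nil {
            fmt.Fprintln(os.Stderr, "bad cloud info file:", err)
            os.Exit(1)
        }
        fmt.Printf("provider ID %q, zone %q, type %q\n", info.ProviderID, info.Zone, info.InstanceType)
    }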

@saad-ali
Member

saad-ali commented Oct 7, 2016

I believe cloud-provider-specific volume logic in Kubelet is deprecated, though @saad-ali could confirm or deny that.

Kubelet by default doesn't do attach/detach (i.e. cloud volume operations), but it retains the ability to do so (enabled via a flag) to support headless operation and thus the dependency still exists.

@gmarek
Contributor

gmarek commented Oct 7, 2016

@bgrant0607 - NodeController uses the cloud provider read-only, and only for the most basic operation (listing instances). The sole reason for having a cloud provider at all is to be able to quickly delete Nodes that no longer exist.

IIUC the reason for this PR is to make it easier to iterate on cloud-provider-specific implementations. I think that depending on very basic cloud provider operations (in this case, listing instances) should not cause too much trouble for cloud-provider-interface implementers, so keeping the NodeController as it is should be fine. That being said, we can create a separate controller that handles the removal and strip this functionality out of the NodeController, thus making it cloud-agnostic, if that proves not to be the case.
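
For context, the read-only usage gmarek describes amounts to a single existence check against the provider; a minimal sketch against the late-2016 cloudprovider interfaces:

    package node

    import (
        "k8s.io/kubernetes/pkg/cloudprovider"
        "k8s.io/kubernetes/pkg/types"
    )

    // nodeExistsInCloud sketches the single read-only call described above: the
    // NodeController only asks the provider whether an instance still exists, so
    // it can delete Node objects whose backing VM is gone.
    func nodeExistsInCloud(cloud cloudprovider.Interface, nodeName types.NodeName) (bool, error) {
        instances, ok := cloud.Instances()
        if !ok {
            // No instances API; nothing to reconcile against.
            return true, nil
        }
        if _, err := instances.ExternalID(nodeName); err != nil {
            if err == cloudprovider.InstanceNotFound {
                return false, nil // VM is gone; the Node object can be removed
            }
            return false, err
        }
        return true, nil
    }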

@wlan0
Member Author

wlan0 commented Oct 7, 2016

@bgrant0607 - As @saad-ali mentioned, there is a cloud provider dependency for kubelet.

@gmarek Thanks for the response. I'll leave nodecontroller as it is.

@bgrant0607 I'm pushing to propose this as a feature to kubernetes-1.5, and basing the proposal off of this issue - kubernetes/enhancements#88

Should I make you the assignee for the feature proposal?

@wlan0
Member Author

wlan0 commented Oct 8, 2016

@bgrant0607 @thockin - Here's a reference to the external controller manager that I'm using for the rancher cloud provider - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

The cloud provider is linked here (standard for all cloud providers) - https://github.com/wlan0/kubernetes/blob/to_show/cmd/external-controller-manager/controller-manager.go#L35

The cloud provider needs to satisfy this interface - https://github.com/wlan0/kubernetes/blob/to_show/cmd/external-controller-manager/rancher/rancher.go#L766

Everything else is a simple copy+paste of existing code.
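
For readers following the links above, the interface in question is cloudprovider.Interface from pkg/cloudprovider; a stub sketch of what a provider has to implement, written against the late-2016 method set (other releases differ):

    package rancher

    import (
        "k8s.io/kubernetes/pkg/cloudprovider"
    )

    // CloudProvider is a stub showing the shape an out-of-tree provider has to
    // fill in: it must satisfy cloudprovider.Interface, returning its own
    // implementations of the sub-interfaces (or false for unsupported ones).
    type CloudProvider struct{}

    // The assertion below is tied to the late-2016 method set of
    // cloudprovider.Interface; later releases changed the interface.
    var _ cloudprovider.Interface = &CloudProvider{}

    func (c *CloudProvider) ProviderName() string { return "rancher" }

    func (c *CloudProvider) Instances() (cloudprovider.Instances, bool)       { return nil, false }
    func (c *CloudProvider) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
    func (c *CloudProvider) Zones() (cloudprovider.Zones, bool)               { return nil, false }
    func (c *CloudProvider) Clusters() (cloudprovider.Clusters, bool)         { return nil, false }
    func (c *CloudProvider) Routes() (cloudprovider.Routes, bool)             { return nil, false }

    func (c *CloudProvider) ScrubDNS(nameservers, searches []string) ([]string, []string) {
        return nameservers, searches
    }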

@wlan0
Member Author

wlan0 commented Oct 12, 2016

@bgrant0607 - I managed to successfully run the service and route controller loops, along with the node controller loop. I also tested these and they work just fine.

It's here - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

I'll be trying to run the attach_detach controller loop in the cloud specific controller-manager now, and as @thockin suggested, this is likely where I'll encounter challenges.

Also, the deadline for kubernetes-1.5 feature proposal has passed, so we don't have to worry about pushing for a feature for now.

@thockin
Member

thockin commented Oct 13, 2016

I do think we should aim for a 100% cloud-neutral controller-manager.

I told you kubelet would be the hardest, didn't I? :) Breaking kubelet into two processes is going to run counter to all the simplified UX work. We'll need to think really hard about that. @mikedanese

Volume attach/detach is part of the volumes API, so it should already be a little insulated. I would leave this for near last.

I guess what I want to see most is an exploration of all the controller-manager controllers to see what other things we have not thought of that will be extra hard...

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit 7124f222e16c2f07c5c64b8d8064b23472a157df. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@wlan0
Member Author

wlan0 commented Oct 13, 2016

I told you kubelet would be the hardest, didn't I? :) Breaking kubelet into two processes is going to run counter to all the simplified UX work. We'll need to think really hard about that. @mikedanese

@thockin @mikedanese Is the UX simplification work mentioned above kubernetes/enhancements#11?

I guess what I want to see most is an exploration of all the controller-manager controllers to see what other things we have not thought of that will be extra hard...

Do you mean concerns arising from running these controllers within different controller-managers?

For example, when the nodeController and serviceController were in different controller-managers, pod creation took longer. The problem disappeared when both were in the same controller-manager.

I could investigate these and other controllers to identify locking issues, potential deadlocks, leader election, performance issues etc.

@thockin
Member

thockin commented Oct 13, 2016

Yes, exactly. But not just what happens when they run separately: actually walk through the main controller loops and think about anything that will be extra hard to disentangle. I am SURE something is lurking that will make you say "oh damn, that's tricky" - I'd rather try to find it sooner.

Or maybe the best way forward is just to try. Shoot for a really hacky prototype level of quality, and see how hard it is. Maybe this really will be just a refactoring problem, and not need any architectural decisions. I doubt it, but maybe... :)


@gmarek
Contributor

gmarek commented Oct 14, 2016

@thockin - do you think we should split removing Nodes that are no longer visible in the cloud provider out of the NodeController, to make it cloud-agnostic?

@thockin
Member

thockin commented Oct 19, 2016

yes, I'd like to totally kill off cloud provider, completely.

@gmarek
Contributor

gmarek commented Oct 19, 2016

OK, SG @thockin - please file an issue and prioritize accordingly.

@thockin
Member

thockin commented Oct 19, 2016

One step at a time. Let's let this PR make some progress before we all jump in :)


@wlan0
Member Author

wlan0 commented Oct 21, 2016

@thockin - I have prototype-quality code where I disentangled the cloud-provider-specific controllers from the rest (except volumes).

In my design, I made the decision to run entire controller loops in the provider-specific binary, rather than only the minute parts that rely on the cloud provider. Drilling down further into the controller loops would make them significantly harder to disentangle, due to the tight coupling of the cloud-provider-specific code with the rest of the data structures/functions within a controller.

This approach makes the implementation really simple. There are four controllers that rely on cloud-provider-specific code: the node controller, service controller, route controller and attach-detach controller (volumes, which I have left for the very last). I created copies of each of these controllers and bundled them together into one binary. This binary registers itself as a controller and runs the cloud-specific controller loops with the user-agent "external-controller-manager".

After analyzing the cloud-specific controllers, this is what I found:

  1. All of the informers get events from the REST API; similarly, all writes to resources are done directly through the REST API. There is no local state maintained, so there is no issue with running another set of controller loops as another binary.
  2. Leader election ID for the new binary - the new binary will require its own leader election endpoint name. I just called it "external-controller-manager" in my prototype. As long as this is done, there should be no issue setting up an HA external-controller-manager (see the sketch after this list).
  3. Event broadcast/sink framework - this framework also talks to the REST API. The shared instance is only used by the attach_detach_controller to send events to the api-server; the other controllers create their own instances of the broadcaster. I can't spot any risk of concurrency/disentanglement issues with this part of the system.
  4. It increases the number of queries made to check the state of resources on the server. In the previous model all of the controllers shared one informer, and now there will be another informer for the new binary making requests to the server. The impact on performance is not noticeable for small deployments; I haven't tested beyond that.
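
On point 2, here is a minimal sketch of the separate lock, written against today's client-go leader-election helpers (the 2016 code used an in-tree package with an Endpoints-based lock); the lock name and identity are illustrative:

    package main

    import (
        "context"
        "time"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/leaderelection"
        "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    // runWithLeaderElection shows the one decision point 2 calls out: the new
    // binary locks under its own name ("external-controller-manager") instead
    // of "kube-controller-manager", so both managers can run HA side by side.
    func runWithLeaderElection(cfg *rest.Config, id string, runControllers func(ctx context.Context)) error {
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            return err
        }
        lock, err := resourcelock.New(
            resourcelock.LeasesResourceLock,
            "kube-system",
            "external-controller-manager", // distinct lock name for the new binary
            client.CoreV1(),
            client.CoordinationV1(),
            resourcelock.ResourceLockConfig{Identity: id},
        )
        if err != nil {
            return err
        }
        leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
            Lock:          lock,
            LeaseDuration: 15 * time.Second,
            RenewDeadline: 10 * time.Second,
            RetryPeriod:   2 * time.Second,
            Callbacks: leaderelection.LeaderCallbacks{
                OnStartedLeading: runControllers,
                OnStoppedLeading: func() { /* lost the lease; stop controllers */ },
            },
        })
        return nil
    }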

To me, the self-contained, modular nature of the various subsystems makes this look straightforward. But if there is something missing, please do let me know. I would love to investigate further.

I haven't analyzed the rest of the controllers. I'll probably uncover more issues once I start breaking up kubelet and the volume controllers. Will sync back here once I do that.

Here's my external-controller-manager - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

@luxas
Member

luxas commented Dec 18, 2016

@davidopp Sure! We won't touch the normal controller-manager this release @wlan0, will we?

@davidopp Are there any cloud-specific things to take into account in #34825?
I guess we will have to update the forked cloud variant of the NodeController later if the logic changes a lot, but I don't think this will affect the controller-manager yet.

Thanks for the heads-up though!

@wlan0
Member Author

wlan0 commented Dec 18, 2016

@davidopp @luxas - In this and the next release, we have planned to leave the controller manager as it is. In the releases after that, we'll deprecate the --cloud-provider flag and subsequently remove it from the controller manager.

@liggitt
Member

liggitt commented Dec 19, 2016

cc @deads2k

Member

@thockin thockin left a comment

LGTM

}

// ResyncPeriod computes the time interval a shared informer waits before resyncing with the api server
func ResyncPeriod(s *options.CloudControllerManagerServer) func() time.Duration {
Member

could be private ?
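
For context, this helper follows the kube-controller-manager pattern of returning a jittered multiple of --min-resync-period so that shared informers do not all relist at once; a self-contained sketch with a simplified options struct (the real one is options.CloudControllerManagerServer):

    package app

    import (
        "math/rand"
        "time"
    )

    // cloudControllerOptions stands in for the real CloudControllerManagerServer
    // options struct; only the field used here is shown.
    type cloudControllerOptions struct {
        MinResyncPeriod time.Duration
    }

    // resyncPeriod returns a jittered interval in [MinResyncPeriod, 2*MinResyncPeriod)
    // so that the shared informers do not all relist from the apiserver at once.
    // Lower-cased per the review comment above.
    func resyncPeriod(s *cloudControllerOptions) func() time.Duration {
        return func() time.Duration {
            factor := rand.Float64() + 1
            return time.Duration(float64(s.MinResyncPeriod.Nanoseconds()) * factor)
        }
    }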

@thockin thockin added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 22, 2016
@luxas
Member

luxas commented Dec 22, 2016

Yay!
Thanks @thockin for that christmas 🎁 :)

ClientConfig: kubeconfig,
}
var clientBuilder controller.ControllerClientBuilder
if len(s.ServiceAccountKeyFile) > 0 && s.UseServiceAccountCredentials {
Member

This controller doesn't need ServiceAccountKeyFile, and should probably just use the UseServiceAccountCredentials flag to determine this... the goal here is to run each controller loop with a distinct identity so its permissions can be scoped to its job.
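
To illustrate the per-controller identity goal, a sketch of how each loop would request its own scoped client from the builder (controller names and the surrounding wiring are illustrative, not this PR's code):

    package app

    import (
        "k8s.io/kubernetes/pkg/controller"
    )

    // startCloudControllerLoops illustrates the goal described above: each loop
    // asks the builder for a client under its own identity, so RBAC can scope
    // that controller's permissions to its job.
    func startCloudControllerLoops(clientBuilder controller.ControllerClientBuilder) {
        nodeClient := clientBuilder.ClientOrDie("cloud-node-controller")
        serviceClient := clientBuilder.ClientOrDie("service-controller")
        routeClient := clientBuilder.ClientOrDie("route-controller")

        // ... construct and run each controller with its scoped client ...
        _, _, _ = nodeClient, serviceClient, routeClient
    }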

@thockin
Member

thockin commented Dec 22, 2016

AI to @wlan0: I tagged this as needing a relnote. Anyone who does a cluster setup needs to know this is coming.

@justinsb @luxas FYI in case we want to optionally enable this (certainly not by default, and maybe not at all).

We need a doc that explains "This exists, it is pre-alpha, but here's how you can manually test it"

@thockin thockin added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 22, 2016
@thockin
Member

thockin commented Dec 22, 2016

After discussion, ignore the relnote. We do not want mere mortals using this yet.

@luxas
Member

luxas commented Dec 22, 2016

wlan0 added 3 commits December 22, 2016 14:37
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2016
@thockin thockin added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2016
@k8s-ci-robot
Contributor

Jenkins Bazel Build failed for commit 75da310. Full PR test history.

The magic incantation to run this job again is @k8s-bot bazel test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@wlan0
Member Author

wlan0 commented Dec 22, 2016

The bazel build failure is a flake.

./hack/verify-bazel.sh script succeeds.
./hack/update-bazel.sh does not modify any files.

There are other PRs at the moment which are failing due to the bazel failure. Last successful bazel build was 5 hrs ago.

@thockin
Member

thockin commented Dec 23, 2016

@mikedanese @ixdy re bazel failures.

@wlan0 no way am I hand-merging this one :)

@wlan0
Member Author

wlan0 commented Dec 23, 2016

@k8s-bot bazel test this

@k8s-ci-robot
Contributor

@wlan0: you can't request testing unless you are a kubernetes member.

In response to this comment:

@k8s-bot bazel test this

If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.

@luxas
Member

luxas commented Dec 23, 2016

@k8s-bot bazel test this

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 39093, 34273)
