
start breaking up controller manager into two pieces #34273

Merged
merged 3 commits on Dec 23, 2016

Conversation

wlan0
Member

@wlan0 wlan0 commented Oct 6, 2016

This PR addresses: kubernetes/enhancements#88

This commit starts breaking the controller manager into two pieces, namely:

  1. a cloud provider dependent piece
  2. a cloud provider agnostic piece

The controller manager has the following control loops:

  • nodeController
  • volumeController
  • routeController
  • serviceController
  • replicationController
  • endpointController
  • resourceQuotaController
  • namespaceController
  • deploymentController, etc.

Among the above controller loops,

  • nodeController
  • volumeController
  • routeController
  • serviceController

are cloud provider dependent. As kubernetes has evolved tremendously, it has become difficult for the different cloud providers (currently 8) to make changes and iterate quickly. Moreover, the cloud providers are constrained by the kubernetes build/release lifecycle. This commit is the first step in moving towards a kubernetes code base where cloud-provider-specific code moves out of the core repository and is maintained by the cloud providers themselves.

I have added a new cloud provider called "external", which signals the controller-manager that the cloud-provider-specific loops are being run by another controller. I have added these changes in such a way that the existing cloud providers are not affected. This change is completely backwards compatible, and does not require any changes to the way kubernetes is run today.
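
To make that gating concrete, here is a minimal sketch of the idea; the constant name and function are illustrative, not the exact code in this PR:

    package app

    import (
        "github.com/golang/glog"

        "k8s.io/kubernetes/pkg/cloudprovider"
    )

    // externalCloudProvider is an illustrative name for the new provider value;
    // the real constant/flag handling lives in the controller-manager code.
    const externalCloudProvider = "external"

    // startCloudDependentControllers sketches the gating idea: when
    // --cloud-provider=external, the core controller-manager skips the
    // cloud-dependent loops and leaves them to an external controller-manager.
    func startCloudDependentControllers(cloudProviderName string, cloud cloudprovider.Interface) {
        if cloudProviderName == externalCloudProvider {
            glog.Info("cloud provider is external; skipping cloud-dependent controller loops")
            return
        }
        // Existing behaviour: run nodeController, routeController,
        // serviceController, etc. in-process against `cloud`.
        _ = cloud
    }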

Finally, along with the controller-manager, the kubelet also has cloud-provider specific code, and that will be addressed in a different commit/issue.

@alena1108 @ibuildthecloud @thockin @dchen1107

Special notes for your reviewer:

@thockin - I'm making this WIP PR to ensure that I don't stray too far from everyone's view of how we should make this change. As you can see, only one controller, namely the nodeController, can be disabled with the --cloud-provider=external flag at the moment. I'm working on cleaning up the rancher-controller-manager that I wrote to test this.

Secondly, I'd like to use this PR to address cloud-provider-specific code in the kubelet and api-server.

Kubelet
Kubelet uses provider-specific code for node registration and for checking node status. I thought of two ways to divide the kubelet:

  • We could start a cloud provider specific kubelet on each host as a part of kubernetes, and this cloud-specific-kubelet does node registration and node-status checks.
  • Create a kubelet plugin for each provider, which will be started by kubelet as a long running service. This plugin can be packaged as a binary.

I'm leaning towards the first option. That way, kubelet does not have to manage another process, and we can offload the process management of the cloud-provider-specific-kubelet to something like systemd.
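
For reference, the kubelet's cloud touchpoints during node registration boil down to a couple of read-only calls; a rough sketch against the cloudprovider.Instances interface of that era (method names approximate):

    package kubelet

    import (
        "github.com/golang/glog"

        "k8s.io/kubernetes/pkg/cloudprovider"
        "k8s.io/kubernetes/pkg/types"
    )

    // logCloudNodeInfo sketches the cloud touchpoints used during node
    // registration: the instance ID and the node addresses both come from the
    // provider's Instances() sub-interface.
    func logCloudNodeInfo(cloud cloudprovider.Interface, nodeName types.NodeName) {
        instances, ok := cloud.Instances()
        if !ok {
            glog.Warning("cloud provider does not expose an Instances() interface")
            return
        }
        if id, err := instances.InstanceID(nodeName); err == nil {
            glog.Infof("cloud instance ID for %s: %s", nodeName, id)
        }
        if addrs, err := instances.NodeAddresses(nodeName); err == nil {
            glog.Infof("cloud-reported addresses for %s: %v", nodeName, addrs)
        }
    }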

@dchen1107 @thockin what do you think?

Kube-apiserver

Kube-apiserver uses provider specific code for distributing ssh keys to all the nodes of a cluster. Do you have any suggestions about how to address this?

Release note:



@k8s-ci-robot
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line.

Regular contributors should join the org to skip this step.

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Oct 6, 2016
@bgrant0607
Member

@gmarek What do you think about the idea of splitting the node controller into cloud-dependent and cloud-independent controllers?

@bgrant0607 bgrant0607 assigned thockin and bgrant0607 and unassigned mikedanese Oct 6, 2016
@bgrant0607
Member

@wlan0 What are the uses in the kubelet?

Where is the ssh-key-distribution code?

@bgrant0607
Member

ok to test

@bgrant0607
Member

ssh key distribution:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L118

            if instances, supported := cloud.Instances(); supported {
                installSSH = instances.AddSSHKeyToAllInstances
            }

@bgrant0607
Member

My thought is that static information provided by the cloud provider (id, zone, etc.) could be retrieved from a file generated via cloud-init or something similar.

I believe cloud-provider-specific volume logic in Kubelet is deprecated, though @saad-ali could confirm or deny that.
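
As an illustration of the cloud-init suggestion above, a node component could read static cloud facts from a file like this; the path and schema are purely hypothetical:

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // nodeCloudInfo is a hypothetical schema that a cloud-init step could write
    // out at boot, letting node components read static cloud facts without
    // linking any provider code.
    type nodeCloudInfo struct {
        ProviderID   string `json:"providerID"` // e.g. "aws:///us-west-2a/i-0abc123"
        Zone         string `json:"zone"`
        Region       string `json:"region"`
        InstanceType string `json:"instanceType"`
    }

    func main() {
        // Path and schema are illustrative only.
        f, err := os.Open("/etc/kubernetes/cloud-node-info.json")
        if err != nil {
            fmt.Fprintln(os.Stderr, "no cloud info file:", err)
            os.Exit(1)
        }
        defer f.Close()

        var info nodeCloudInfo
        if err := json.NewDecoder(f).Decode(&info); err != nil {
            fmt.Fprintln(os.Stderr, "bad cloud info file:", err)
            os.Exit(1)
        }
        fmt.Printf("provider ID %q, zone %q, type %q\n", info.ProviderID, info.Zone, info.InstanceType)
    }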

@saad-ali
Member

saad-ali commented Oct 7, 2016

I believe cloud-provider-specific volume logic in Kubelet is deprecated, though @saad-ali could confirm or deny that.

Kubelet by default doesn't do attach/detach (i.e. cloud volume operations), but it retains the ability to do so (enabled via a flag) to support headless operation and thus the dependency still exists.

@gmarek
Contributor

gmarek commented Oct 7, 2016

@bgrant0607 - NodeController uses the cloud provider read-only, and only for the most basic operation (listing instances). The sole reason for having a cloud provider at all is to be able to quickly delete Nodes that no longer exist.

IIUC the reason for this PR is to make it easier to iterate on cloud-provider-specific implementations. I think that depending on very basic cloud provider operations (in this case, listing instances) should not cause too much trouble for cloud-provider-interface implementers, so keeping the NodeController as it is should be fine. That being said, we can create a separate controller that handles the removal and strip this functionality out of the NodeController, thus making it cloud-agnostic, if that proves not to be the case.
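
For context, the read-only usage gmarek describes amounts to a single existence check against the provider; a minimal sketch against the late-2016 cloudprovider interfaces:

    package node

    import (
        "k8s.io/kubernetes/pkg/cloudprovider"
        "k8s.io/kubernetes/pkg/types"
    )

    // nodeExistsInCloud sketches the single read-only call described above: the
    // NodeController only asks the provider whether an instance still exists, so
    // it can delete Node objects whose backing VM is gone.
    func nodeExistsInCloud(cloud cloudprovider.Interface, nodeName types.NodeName) (bool, error) {
        instances, ok := cloud.Instances()
        if !ok {
            // No instances API; nothing to reconcile against.
            return true, nil
        }
        if _, err := instances.ExternalID(nodeName); err != nil {
            if err == cloudprovider.InstanceNotFound {
                return false, nil // VM is gone; the Node object can be removed
            }
            return false, err
        }
        return true, nil
    }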

@wlan0
Member Author

wlan0 commented Oct 7, 2016

@bgrant0607 - As @saad-ali mentioned, there is a cloud provider dependency for kubelet.

@gmarek Thanks for the response. I'll leave nodecontroller as it is.

@bgrant0607 I'm pushing to propose this as a feature to kubernetes-1.5, and basing the proposal off of this issue - kubernetes/enhancements#88

Should I make you the assignee for the feature proposal?

@wlan0
Member Author

wlan0 commented Oct 8, 2016

@bgrant0607 @thockin - Here's a reference to the external controller manager that I'm using for the rancher cloud provider - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

The cloud provider is linked here (standard for all cloud providers) - https://github.com/wlan0/kubernetes/blob/to_show/cmd/external-controller-manager/controller-manager.go#L35

The cloud provider needs to satisfy this interface - https://github.com/wlan0/kubernetes/blob/to_show/cmd/external-controller-manager/rancher/rancher.go#L766

Everything else is a simple copy+paste of existing code.
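
For readers following the links above, the interface in question is cloudprovider.Interface from pkg/cloudprovider; a stub sketch of what a provider has to implement, written against the late-2016 method set (other releases differ):

    package rancher

    import (
        "k8s.io/kubernetes/pkg/cloudprovider"
    )

    // CloudProvider is a stub showing the shape an out-of-tree provider has to
    // fill in: it must satisfy cloudprovider.Interface, returning its own
    // implementations of the sub-interfaces (or false for unsupported ones).
    type CloudProvider struct{}

    // The assertion below is tied to the late-2016 method set of
    // cloudprovider.Interface; later releases changed the interface.
    var _ cloudprovider.Interface = &CloudProvider{}

    func (c *CloudProvider) ProviderName() string { return "rancher" }

    func (c *CloudProvider) Instances() (cloudprovider.Instances, bool)       { return nil, false }
    func (c *CloudProvider) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
    func (c *CloudProvider) Zones() (cloudprovider.Zones, bool)               { return nil, false }
    func (c *CloudProvider) Clusters() (cloudprovider.Clusters, bool)         { return nil, false }
    func (c *CloudProvider) Routes() (cloudprovider.Routes, bool)             { return nil, false }

    func (c *CloudProvider) ScrubDNS(nameservers, searches []string) ([]string, []string) {
        return nameservers, searches
    }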

@wlan0
Member Author

wlan0 commented Oct 12, 2016

@bgrant0607 - I managed to successfully run the service and route controller loops, along with the node controller loop. I also tested these and they work just fine.

It's here - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

I'll be trying to run the attach_detach controller loop in the cloud specific controller-manager now, and as @thockin suggested, this is likely where I'll encounter challenges.

Also, the deadline for kubernetes-1.5 feature proposal has passed, so we don't have to worry about pushing for a feature for now.

@thockin
Member

thockin commented Oct 13, 2016

I do think we should aim for a 100% cloud-neutral controller-manager.

I told you kubelet would be the hardest, didn't I? :) Breaking kubelet into two processes is going to run counter to all the simplified UX work. We'll need to think really hard about that. @mikedanese

Volume attach/detach is part of the volumes API, so it should already be a little insulated. I would leave this for near last.

I guess what I want to see most is an exploration of all the controller-manager controllers to see what other things we have not thought of that will be extra hard...

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit 7124f222e16c2f07c5c64b8d8064b23472a157df. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@wlan0
Member Author

wlan0 commented Oct 13, 2016

I told you kubelet would be the hardest, didn't I? :) Breaking kubelet into two processes is going to run counter to all the simplified UX work. We'll need to think really hard about that. @mikedanese

@thockin @mikedanese Is the UX simplification work mentioned above kubernetes/enhancements#11?

I guess what I want to see most is an exploration of all the controller-manager controllers to see what other things we have not thought of that will be extra hard...

Do you mean concerns arising from running these controllers within different controller-managers?

For example, when the nodeController and serviceController were in different controller-managers, pod creation took longer. The problem disappeared when both were in the same controller-manager.

I could investigate these and other controllers to identify locking issues, potential deadlocks, leader election, performance issues etc.

@thockin
Member

thockin commented Oct 13, 2016

Yes, exactly. But not just what happens when they run separately: actually walk through the main controller loops and think about anything that will be extra hard to disentangle. I am SURE something is lurking that will make you say "oh damn, that's tricky" - I'd rather try to find it sooner.

Or maybe the best way forward is just to try. Shoot for a really hacky prototype level of quality, and see how hard it is. Maybe this really will be just a refactoring problem, and not need any architectural decisions. I doubt it, but maybe... :)


@gmarek
Contributor

gmarek commented Oct 14, 2016

@thockin - do you think we should split removing Nodes that are no longer visible in the cloud provider out of the NodeController, to make it cloud-agnostic?

@thockin
Member

thockin commented Oct 19, 2016

yes, I'd like to totally kill off cloud provider, completely.

@gmarek
Contributor

gmarek commented Oct 19, 2016

OK, SG @thockin - please file an issue and prioritize accordingly.

@thockin
Member

thockin commented Oct 19, 2016

One step at a time. Let's let this PR make some progress before we all jump in :)


@wlan0
Member Author

wlan0 commented Oct 21, 2016

@thockin - I have prototype-quality code where I disentangled the cloud-provider-specific controllers from the rest (except volumes).

In my design, I made the decision to run entire controller loops in the provider-specific binary, rather than only the minute parts that rely on the cloud provider. Drilling down further into the controller loops would make them significantly harder to disentangle, due to the tight coupling of the cloud-provider-specific code with the rest of the data structures/functions within a controller.

This approach makes the implementation really simple. There are four controllers that rely on cloud-provider-specific code: the node controller, service controller, route controller and attach-detach controller (volumes, which I have left for the very last). I created copies of each of these controllers and bundled them together into one binary. This binary registers itself as a controller and runs the cloud-specific controller loops with the user-agent "external-controller-manager".

After analyzing the cloud-specific controllers, this is what I found:

  1. All of the informers get events from the REST API; similarly, all writes to resources are done directly through the REST API. There is no local state maintained, so there is no issue with running another set of controller loops as another binary.
  2. Leader election ID for the new binary - the new binary will require its own leader election endpoint name. I just called it "external-controller-manager" in my prototype. As long as this is done, there should be no issue setting up an HA external-controller-manager (see the sketch after this list).
  3. Event broadcast/sink framework - this framework also talks to the REST API. The shared instance is only used by the attach_detach_controller to send events to the api-server; the other controllers create their own instances of the broadcaster. I can't spot any risk of concurrency/disentanglement issues with this part of the system.
  4. It increases the number of queries made to check the state of resources on the server. In the previous model all of the controllers shared one informer, and now there will be another informer for the new binary making requests to the server. The impact on performance is not noticeable for small deployments; I haven't tested beyond that.
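
On point 2, here is a minimal sketch of the separate lock, written against today's client-go leader-election helpers (the 2016 code used an in-tree package with an Endpoints-based lock); the lock name and identity are illustrative:

    package main

    import (
        "context"
        "time"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/leaderelection"
        "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    // runWithLeaderElection shows the one decision point 2 calls out: the new
    // binary locks under its own name ("external-controller-manager") instead
    // of "kube-controller-manager", so both managers can run HA side by side.
    func runWithLeaderElection(cfg *rest.Config, id string, runControllers func(ctx context.Context)) error {
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            return err
        }
        lock, err := resourcelock.New(
            resourcelock.LeasesResourceLock,
            "kube-system",
            "external-controller-manager", // distinct lock name for the new binary
            client.CoreV1(),
            client.CoordinationV1(),
            resourcelock.ResourceLockConfig{Identity: id},
        )
        if err != nil {
            return err
        }
        leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
            Lock:          lock,
            LeaseDuration: 15 * time.Second,
            RenewDeadline: 10 * time.Second,
            RetryPeriod:   2 * time.Second,
            Callbacks: leaderelection.LeaderCallbacks{
                OnStartedLeading: runControllers,
                OnStoppedLeading: func() { /* lost the lease; stop controllers */ },
            },
        })
        return nil
    }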

To me, the self-contained, modular nature of the various subsystems makes this look straightforward. But if there is something missing, please do let me know. I would love to investigate further.

I haven't analyzed the rest of the controllers. I'll probably uncover more issues once I start breaking up kubelet and the volume controllers. Will sync back here once I do that.

Here's my external-controller-manager - https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager

@luxas
Member

luxas commented Dec 18, 2016

@davidopp Sure! We won't touch the normal controller-manager this release @wlan0, will we?

@davidopp Are there any cloud-specific things to take into account in #34825?
I guess we will have to update the forked cloud variant of the NodeController later if the logic changes a lot, but I don't think this will affect the controller-manager yet.

Thanks for the heads-up though!

@wlan0
Member Author

wlan0 commented Dec 18, 2016

@davidopp @luxas - In this and the next release, we have planned to leave the controller manager as it is. In the releases after that, we'll deprecate the --cloud-provider flag and subsequently remove it from the controller manager.

@liggitt
Member

liggitt commented Dec 19, 2016

cc @deads2k

Member

@thockin thockin left a comment

LGTM

}

// ResyncPeriod computes the time interval a shared informer waits before resyncing with the api server
func ResyncPeriod(s *options.CloudControllerManagerServer) func() time.Duration {
Member

could be private ?
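
For context, this helper follows the kube-controller-manager pattern of returning a jittered multiple of --min-resync-period so that shared informers do not all relist at once; a self-contained sketch with a simplified options struct (the real one is options.CloudControllerManagerServer):

    package app

    import (
        "math/rand"
        "time"
    )

    // cloudControllerOptions stands in for the real CloudControllerManagerServer
    // options struct; only the field used here is shown.
    type cloudControllerOptions struct {
        MinResyncPeriod time.Duration
    }

    // resyncPeriod returns a jittered interval in [MinResyncPeriod, 2*MinResyncPeriod)
    // so that the shared informers do not all relist from the apiserver at once.
    // Lower-cased per the review comment above.
    func resyncPeriod(s *cloudControllerOptions) func() time.Duration {
        return func() time.Duration {
            factor := rand.Float64() + 1
            return time.Duration(float64(s.MinResyncPeriod.Nanoseconds()) * factor)
        }
    }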

@thockin thockin added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 22, 2016
@luxas
Member

luxas commented Dec 22, 2016

Yay!
Thanks @thockin for that christmas 🎁 :)

ClientConfig: kubeconfig,
}
var clientBuilder controller.ControllerClientBuilder
if len(s.ServiceAccountKeyFile) > 0 && s.UseServiceAccountCredentials {
Member

This controller doesn't need ServiceAccountKeyFile, and should probably just use the UseServiceAccountCredentials flag to determine this... the goal here is to run each controller loop with a distinct identity so its permissions can be scoped to its job.
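
To illustrate the per-controller identity goal, a sketch of how each loop would request its own scoped client from the builder (controller names and the surrounding wiring are illustrative, not this PR's code):

    package app

    import (
        "k8s.io/kubernetes/pkg/controller"
    )

    // startCloudControllerLoops illustrates the goal described above: each loop
    // asks the builder for a client under its own identity, so RBAC can scope
    // that controller's permissions to its job.
    func startCloudControllerLoops(clientBuilder controller.ControllerClientBuilder) {
        nodeClient := clientBuilder.ClientOrDie("cloud-node-controller")
        serviceClient := clientBuilder.ClientOrDie("service-controller")
        routeClient := clientBuilder.ClientOrDie("route-controller")

        // ... construct and run each controller with its scoped client ...
        _, _, _ = nodeClient, serviceClient, routeClient
    }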

@thockin
Member

thockin commented Dec 22, 2016

AI to @wlan0: I tagged this as needing a relnote. Anyone who does a cluster setup needs to know this is coming.

@justinsb @luxas FYI in case we want to optionally enable this (certainly not by default, and maybe not at all).

We need a doc that explains "This exists, it is pre-alpha, but here's how you can manually test it"

@thockin thockin added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 22, 2016
@thockin
Member

thockin commented Dec 22, 2016

After discussion, ignore the relnote. We do not want mere mortals using this yet.

@luxas
Member

luxas commented Dec 22, 2016

wlan0 added 3 commits December 22, 2016 14:37
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2016
@thockin thockin added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2016
@k8s-ci-robot
Contributor

Jenkins Bazel Build failed for commit 75da310. Full PR test history.

The magic incantation to run this job again is @k8s-bot bazel test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@wlan0
Member Author

wlan0 commented Dec 22, 2016

The bazel build failure is a flake.

./hack/verify-bazel.sh script succeeds.
./hack/update-bazel.sh does not modify any files.

There are other PRs at the moment which are failing due to the bazel failure. Last successful bazel build was 5 hrs ago.

@thockin
Member

thockin commented Dec 23, 2016

@mikedanese @ixdy re bazel failures.

@wlan0 no way am I hand-merging this one :)

@wlan0
Member Author

wlan0 commented Dec 23, 2016

@k8s-bot bazel test this

@k8s-ci-robot
Contributor

@wlan0: you can't request testing unless you are a kubernetes member.

In response to this comment:

@k8s-bot bazel test this

If you have questions or suggestions related to this bot's behavior, please file an issue against the kubernetes/test-infra repository.

@luxas
Member

luxas commented Dec 23, 2016

@k8s-bot bazel test this

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 39093, 34273)
