New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --cgroup_parent flag to Kubelet to set the parent cgroup for pods #7277
Conversation
@@ -511,6 +511,9 @@ func (dm *DockerManager) runContainer(pod *api.Pod, container *api.Container, op | |||
if len(opts.DNSSearch) > 0 { | |||
hc.DNSSearch = opts.DNSSearch | |||
} | |||
if len(opts.CgroupParent) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this option will be silently ignored with older versions of docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I tested with Docker 1.4.1. Containers just launch under the default /docker
cgroup.
Overall looks good, only big issue is handling older versions of Docker. |
@vmarmol see above, older versions of Docker silently ignore it. |
/cc @jonboulle |
Updated naming according to commends |
IpcMode: ipcMode, | ||
NetMode: netMode, | ||
IpcMode: ipcMode, | ||
CgroupRoot: kl.cgroupRoot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are introducing this, should we also introduce the pod-level cgroup?
/kubernetes/pod-uid/container-id
And do we want to distinguish pods that asked for limits from pods that didnt? And is that CPU, memory, or the tuple of both?
/kubernetes/t1/pod-uid/container-id
Let's also put thought into how to future-proof this. Changing this schema has been an unending source of pain internally.
Also, let's carefully consider how important this is before 1.0. It's fun, but we want to be careful about what we undertake in the near-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin I think the pod-level cgroup should be added in a separate PR, and probably post-1.0 since it needs some discussion and is orthogonal to this change. In its current form the PR is a minimal change that we need for k8s on Mesos and is backward (and forward) compatible.
The only other question is, should the kubelet resource container live under the cgroup root as well? I'd say yes, to help with resource accounting. So the hierarchy could look like:
/kubernetes/kubelet
/kubernetes/pods/pod-uid/container-id
...
If there is agreement I can make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 towards moving kubelet to be under a configurable root cgroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do cgroups for multiple QoS tiers and for pods in subsequent PRs. I don't think we need much more discussion on them, but we will need an (additive) API change, to add resources at the pod level.
Given where we are, I don't think we need to be that rigorous about changing the hierarchy. If we evacuate the node when we update the kubelet, which we need to be able to do for kernel upgrades, we'll be fine.
As for Kubelet, we're working on making it run as a container. Eventually I'd like to even run it as a pod of the highest QoS tier, but putting it into a custom cgroup root seems fine for now.
@thockin AFAIK this feature right now is primarily to allow the Kubelet to run well in Mesos where all containers have to be created within the allocation given to the Kubelet. This mechanism should be generic enough to support the future additions we need. I see us in the future:
For most users, +1 on not for v1.0, but this PR shouldn't move us further from those goals. |
LGTM, only that one nit on the naming. @thockin let me know if you have any strong feelings against this. From your comment, it seems like we're okay to go. |
+1 to @vmarmol's comments. This PR should help provide resource isolation in mesos. |
Thanks for the speedy and thorough reviews folks, we're pretty excited for this one to land 👍 |
+1 @vmarmol |
LGTM. Will merge once tests pass! |
Add --cgroup_parent flag to Kubelet to set the parent cgroup for pods
Implements #6991.
cc @bgrant0607 @davidopp @jdef @ConnorDoyle