Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external workload: Run cilium-agent in host cgroup namespace #572

Merged
merged 1 commit into from
Oct 25, 2021

Conversation

wazir-ahmed
Copy link
Contributor

@wazir-ahmed wazir-ahmed commented Oct 7, 2021

Recent versions of Docker create a new cgroup namespace for each container.
Add Docker CLI option --cgroupns=host to the external workload installation
script (generated by cilium clustermesh vm install) so that the cilium
agent will run in host cgroup namespace and service load-balancing will work
as expected in external workloads.

Fixes: #569

Signed-off-by: Wazir Ahmed wazir@accuknox.com

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the install script seem sane to me, but I have no context to the various issues we've had with cgroup (v2) mounts, will have to defer to @aditighag for that. For example, is it possible or likely that the added cgroupv2 filesystem would not be available in the VM? Is there a minimum kernel version needed, etc.? If yes then maybe the script should the added mount to fail and not add the new docker mount option if that happens.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there existing dependencies on the mount command in the script? If not, can we check if the host has utilities like mount installed, and log a warning if they are not? See cilium/cilium#16815 for more details.

@wazir-ahmed
Copy link
Contributor Author

@aditighag Currently, there is no existing dependency on the mount command in the script. But the script also uses other utils like tee, readlink, grep, systemctl and service.

Is it okay if I log and exit when mount fails?

@aditighag
Copy link
Member

@aditighag Currently, there is no existing dependency on the mount command in the script. But the script also uses other utils like tee, readlink, grep, systemctl and service.

Is it okay if I log and exit when mount fails?

Ok, let's do that then.

@wazir-ahmed
Copy link
Contributor Author

@aditighag I have modified the script to log and exit when mount fails.

@aditighag
Copy link
Member

aditighag commented Oct 18, 2021

External workloads workflow has successfully completed so marking it ready for merge.

Edit : Checking if a review from sig-clustermesh is required.

@aditighag aditighag added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 18, 2021
@ldelossa
Copy link
Contributor

ldelossa commented Oct 19, 2021

@wazir-ahmed

if I'm understanding this change correctly, you are mounting the host's cgroup tree into the agent container, so the agent container can see the entire tree of cgroup namespaces created by docker?

sanity check me: what occurs when this script runs on a host without cgroup_v2 enabled?

@wazir-ahmed
Copy link
Contributor Author

wazir-ahmed commented Oct 19, 2021

@ldelossa

if I'm understanding this change correctly, you are mounting the host's cgroup tree into the agent container, so the agent container can see the entire tree of cgroup namespaces created by docker?

  • The agent attaches BPF_CGROUP_* type programs to provide service load-balancing feature. These programs needs to be attached to host's root cgroup, in order for service-lb to work properly.
  • Since new versions of docker creates new cgroup namespaces for each containers, the agent running in the container will not be aware of host's root cgroup. Currently the agent attaches the service-lb programs to the container cgroup (thinking that it is the root cgroup and not aware that it is running inside a new cgroup namespace).
  • If we mount the host's root cgroup into the agent container at the default path (/run/cilium/cgroupv2), then agent will use that path and attach the service-lb program to the cgroup mounted on that path.

sanity check me: what occurs when this script runs on a host without cgroup_v2 enabled?

BPF_CGROUP_* type programs only work with cgroup2. So I think the following will happen (not sure though)

  • The bpf/init.sh script will fail and the agent will exit because datapath initialization failed.

@ldelossa
Copy link
Contributor

@wazir-ahmed - this all sounds good.

The other approach would be to start the container with the host's "mount" name space, like we do with the net namespace.

I think your approach is fine, and I will approve just after checking that init.sh does either fail or fall back to something safe. However did we consider sharing the host's mount namespace as well ?

@wazir-ahmed
Copy link
Contributor Author

wazir-ahmed commented Oct 19, 2021

@ldelossa - We can use the docker option --cgroupns=host to share the host cgroup namespace with the container. I have tried it, it works.

Initially, I mentioned both the approaches in #569. But then, I followed the mount approach since it is similar to how we run the agent-container in cluster nodes - cilium/cilium#16259

@ldelossa
Copy link
Contributor

@wazir-ahmed okay, I'm cool with the mount approach if we don't need any other facilities of the host's filesystem its probably best to just bind mount what we need.

@aditighag
Copy link
Member

The other approach would be to start the container with the host's "mount" name space, like we do with the net namespace.
I think your approach is fine, and I will approve just after checking that init.sh does either fail or fall back to something safe. >However did we consider sharing the host's mount namespace as well ?

@ldelossa Sharing host's mount namespace (or bind mount) isn't going to address the issue. The namespace in question here is cgroup.

@ldelossa - We can use the docker option --cgroupns=host to share the host cgroup namespace with the container. I have tried it, it works.

On second thoughts, let's go ahead with this approach so that we don't have to run the additional mount commands. This option will obviously only work with Docker container runtime. But I just checked, and we already pass a bunch of other Docker options while creating the cilium container in the script. The end result for both the options is same - cilium container is started in the underlying host's cgroup namespace. Sorry, I should've checked this earlier.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per last comment, let's go ahead with the Docker CLI option.

Edit : Please add a note in the code explaining why we are passing that option.

@ldelossa
Copy link
Contributor

ldelossa commented Oct 19, 2021

Thank @aditighag i had it in my head that the root cgroup fs on the host can view all children cgroups created by docker, thus giving access to the hosts cgroup fs from the agent would accomplish the goal.

@aditighag
Copy link
Member

Thank @aditighag i had it in my head that the root cgroup fs on the host can view all children cgroups created by docker, thus giving access to the hosts cgroup fs from the agent would accomplish the goal.

That would require cgroup2 fs be already mounted on the host.

@ldelossa
Copy link
Contributor

Ah okay, and in this case it is not?

@ldelossa ldelossa closed this Oct 19, 2021
@ldelossa ldelossa reopened this Oct 19, 2021
@ldelossa ldelossa temporarily deployed to ci October 19, 2021 21:53 Inactive
@wazir-ahmed wazir-ahmed temporarily deployed to ci October 20, 2021 07:10 Inactive
@wazir-ahmed wazir-ahmed changed the title external workload: Mount host's root cgroup in the cilium-agent container external workload: Run cilium-agent in host cgroup namespace Oct 20, 2021
@wazir-ahmed
Copy link
Contributor Author

@aditighag I have added the Docker CLI option --cgroupns=host and removed the old changes.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

clustermesh/clustermesh.go Outdated Show resolved Hide resolved
Recent versions of Docker create a new cgroup namespace for each container.
Add Docker CLI option `--cgroupns=host` to the external workload installation
script (generated by `cilium clustermesh vm install`) so that the cilium
agent will run in host cgroup namespace and service load-balancing will work
as expected in external workloads.

Fixes: cilium#569

Signed-off-by: Wazir Ahmed <wazir@accuknox.com>
@aditighag
Copy link
Member

Multicluster test failed for the new commit. AFAICS, the test shouldn't have failed even though the changes are related. Will run it locally to see if it's a flake.

@tklauser
Copy link
Member

The multicluster test failure looks like a the known issue #399. I've re-triggered the test to validate.

@aditighag
Copy link
Member

Re-run also failed. /cc @tklauser

@wazir-ahmed Although the test failure is not related, can you check the minimum Docker CLI version that supports the --cgroupns option? It'll be good document this.

@aditighag
Copy link
Member

Marking the PR as ready for merge since the test failure is a known flake.

@aditighag aditighag added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 25, 2021
@tklauser tklauser merged commit 09f1524 into cilium:master Oct 25, 2021
@wazir-ahmed
Copy link
Contributor Author

wazir-ahmed commented Oct 26, 2021

can you check the minimum Docker CLI version that supports the --cgroupns option? It'll be good document this.

@aditighag - The CLI reference document mentions that --cgroupns option is available in Docker Engine API version 1.41+ which is only supported in the latest Docker 20.10 as per API version matrix. So the minimum version should be 20.10

@aditighag
Copy link
Member

@wazir-ahmed
Copy link
Contributor Author

@aditighag Raised a PR for documenting the Docker requirement - cilium/cilium#17726

wazir-ahmed added a commit to wazir-ahmed/cilium that referenced this pull request Nov 2, 2021
Related to cilium/cilium-cli#572

Signed-off-by: Wazir Ahmed <wazir@accuknox.com>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Nov 4, 2021
Related to cilium/cilium-cli#572

Signed-off-by: Wazir Ahmed <wazir@accuknox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Service load-balancing not working in external workloads
6 participants