Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blocking for EnableIngressController in validation #26744

Merged

Conversation

rickysumho
Copy link
Contributor

@rickysumho rickysumho commented Jul 10, 2023

We want to block any configurations that enable ingress controllers when using delegated IPAM. Cilium allocates its own IP address for sending and differentiating ingress traffic, which is not possible with delegated IPAM. This change will provide a clearer error message for this. The behavior remains the same with or without these changes as cilium-agent still fails on startup when allocating an IP address.

Here is the current error message:

Defaulted container "cilium-agent" out of: cilium-agent, install-cni-binaries (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), systemd-networkd-overrides (init), block-wireserver (init)
...
...
...
level=info msg="  Local IPv4 addresses:" subsys=daemon
level=info msg="  - 10.224.0.4" subsys=daemon
level=fatal msg="Error while creating daemon" error="unable to allocate ingress IPs: Operation not supported, see https://cilium.link/ipam-range-full" subsys=daemon
level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/secret.go:89: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" at the cluster scope" subsys=klog
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/watchers/secret.go:89: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" at the cluster scope" subsys=k8s

We should also add a note that delegated IPAM and ingress controller are incompatible in the Prerequisites section of the ingress controller docs .

Below is the background information in Slack behind this change.

Will Daly
We were doing some testing yesterday with Cilium delegated IPAM and enable-ingress-controller=true in Cilium 1.12. Noticed that cilium-agent failed on startup trying to allocate an ingress IP here:

cilium/daemon/cmd/ipam.go

Lines 349 to 352 in 30c501c

result, err = d.ipam.AllocateNextFamilyWithoutSyncUpstream(ipam.IPv4, "ingress")
if err != nil {
return fmt.Errorf("unable to allocate ingress IPs: %s, see https://cilium.link/ipam-range-full", err)
}

Delegated IPAM uses the no-op IP allocator, so I'm not too surprised that allocating an IP fails.
However, I'm curious why Cilium's ingress controller needs to allocate an IP in cilium-agent. I see that the IP is used in pkg/envoy and pkg/nodediscovery, but I'm not really sure how these interact with the ingress controller.
Could someone help me understand why cilium-agent allocates an ingress IP when the ingress controller is enabled? Thanks!

Joe Stringer
Enabling ingress causes Cilium to introduce an L7 proxy in the path for traffic coming into the cluster towards pods inside the cluster. This L7 proxy terminates the TCP connections, so therefore all traffic flowing through the ingress resource between the external peer and the internal peer must have each connection terminated at that proxy.

So we end up with two connections, External Peer to Ingress IP (let's shorten Ext <-> Ing), then from the proxy towards the backend pod (let's shorten Ing <-> Pod)

Broadly there's a couple of ways to handle this TCP termination: Either to attempt transparent proxying, ie the Ing<->Pod connection uses the External IP as the source, or to opaquely proxy, ie the Ing<->Pod connection uses an IP associated with the Ingress node.

For transparent proxying, when the proxy handles the Ing<->Pod connection, the proxy sets the source IP address to the same as the original source IP from the incoming connection. So for the Ing<->Pod connection, the traffic is actually using the IPs of Ext<->Pod. The tricky part here is that for replies back from the Pod, Cilium needs to force that the Pod->Ext 5tuple traffic is actually redirected to the Ingress. Taking this a little further, the Ingress node may be different from the Pod node, so this can get very complicated very quickly. Cilium must track the fact that these specific connections are different from any other connections that go between the Pod and the external IP, in order to force the traffic to be sent back through the proxy (so that the TCP connections / windows / etc line up).

The alternative is to use an IP associated with the node. One option here is that Cilium just uses the Node's IP address in order to initiate the backend connection. While this works from a connectivity perspective, this also makes it difficult to enforce policies differently between traffic that actually originates from the node vs. traffic that is proxied through the node. In both cases, the connection to the Pod backend is actually just initiated from a process on the intermediate node (either local process or the proxy, which is a local process).

So, in order to simplify policy enforcement for traffic coming from an Ingress, Cilium prefers to allocate a brand new IP address specifically for Ingress traffic, and then use this specific IP address when implementing the Ingress proxy logic. When traffic arrives at the backend node for the target Pod, Cilium can then easily differentiate traffic that came into the cluster via the Ingress (because it uses the Ingress IP) vs. traffic that came directly from that particular node.

Prevent Cilium from running with Delegated IPAM at the same time as Ingress

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 10, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jul 10, 2023
@wedaly
Copy link
Contributor

wedaly commented Jul 10, 2023

could you please add a commit message describing the change? (see step 5 in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#submitting-a-pull-request)

The msg you have at the start of this PR should work "We want to block any configurations that enable ingress controllers when using delegated IPAM...This change will provide a clearer error message for this." Maybe also put examples of the previous error ("unable to allocate ingress IPs: Operation not supported, see https://cilium.link/ipam-range-full") and the new error messages.

You can edit the commit msg using git commit --amend -s, then force-push to this branch with git push upstream <branch> -f

@wedaly
Copy link
Contributor

wedaly commented Jul 10, 2023

Suggest that we also add a note in the "prerequisites" section of the ingress controller docs saying that delegated IPAM and ingress controller are incompatible: https://github.com/cilium/cilium/blob/main/Documentation/network/servicemesh/ingress.rst#prerequisites

// for differentiating ingress traffic, which is not possible with delegated IPAM
if c.EnableIngressController {
return fmt.Errorf("--%s must be disabled with --%s=%s", EnableIngressController, IPAM, ipamOption.IPAMDelegatedPlugin)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also block c.EnableEnvoyConfig? I see that Cilium checks that flag when deciding whether to allocate an ingress IP here:

if option.Config.EnableEnvoyConfig {

(From these docs, I believe EnableIngressController implies EnableEnvoyConfig, but not the other way around.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, if envoy is the issue here we should block on that as it is used in many more cases than just ingress (gateway api/ l7 proxy)

@rickysumho rickysumho force-pushed the pr/block-delegated-ipam-ingress-controller branch from 8c2e2c1 to f5ecfcc Compare July 11, 2023 18:37
@maintainer-s-little-helper
Copy link

Commit 5e0b23960b8969d6bbc98e9b7a65bf0bc7dc074e does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 11, 2023
@rickysumho rickysumho force-pushed the pr/block-delegated-ipam-ingress-controller branch from 5e0b239 to 263a825 Compare July 11, 2023 23:17
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 11, 2023
@rickysumho rickysumho marked this pull request as ready for review July 11, 2023 23:22
@rickysumho rickysumho requested review from a team as code owners July 11, 2023 23:22
Copy link
Member

@asauber asauber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind setting the same author name and email on both commits?

pkg/option/config.go Outdated Show resolved Hide resolved
@rickysumho rickysumho force-pushed the pr/block-delegated-ipam-ingress-controller branch from 263a825 to a0b7dfa Compare July 17, 2023 02:25
@rickysumho rickysumho requested review from a team as code owners July 17, 2023 02:25
Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent changes lgtm 👍 see a couple CI failures that look like they could be flakes. @rickysumho could you please take a look at the CI failures using this guide? https://docs.cilium.io/en/stable/contributing/testing/ci/#ci-failure-triage

@wedaly
Copy link
Contributor

wedaly commented Jul 26, 2023

I see that there were some CI changes merged recently in #27002
Maybe we should rebase this branch onto main and rerun tests? I searched for some of the failing tests and didn't find any open GH issues, and I don't want to open new issues for things that might be fixed in the updated CI.

We want to block any configurations that enable ingress controllers when using delegated IPAM.
Cilium allocates its own IP address for sending and differentiating ingress traffic, which is
not possible with delegated IPAM. This change will provide a clearer error message for this.

Signed-off-by: Ricky Ho <horicky78@gmail.com>
@rickysumho rickysumho force-pushed the pr/block-delegated-ipam-ingress-controller branch from d89d4ce to c9d17df Compare July 26, 2023 15:55
@rickysumho
Copy link
Contributor Author

Hello, could we rerun tests to see if we are still failing the tests?

Copy link
Member

@meyskens meyskens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the config change

@meyskens
Copy link
Member

/test

@wedaly
Copy link
Contributor

wedaly commented Jul 26, 2023

Only two failures on the last run, which I think are flakes. Created these two issues:

#27101
#27102

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement! This seems straightforward and we probably want to backport it to 1.13 and 1.14 to prevent starting in this invalid mode.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 31, 2023
@joestringer joestringer added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed release-note/misc This PR makes changes that have no direct user impact. labels Jul 31, 2023
@joestringer joestringer merged commit fded1ab into cilium:main Jul 31, 2023
58 checks passed
@sayboras sayboras mentioned this pull request Aug 3, 2023
11 tasks
@sayboras sayboras added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Aug 3, 2023
@sayboras sayboras mentioned this pull request Aug 3, 2023
4 tasks
@sayboras sayboras added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ipam Impacts IP address management functionality. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/ipam IP address management, including cloud IPAM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet