New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add blocking for EnableIngressController in validation #26744
Add blocking for EnableIngressController in validation #26744
Conversation
could you please add a commit message describing the change? (see step 5 in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#submitting-a-pull-request) The msg you have at the start of this PR should work "We want to block any configurations that enable ingress controllers when using delegated IPAM...This change will provide a clearer error message for this." Maybe also put examples of the previous error ("unable to allocate ingress IPs: Operation not supported, see https://cilium.link/ipam-range-full") and the new error messages. You can edit the commit msg using |
Suggest that we also add a note in the "prerequisites" section of the ingress controller docs saying that delegated IPAM and ingress controller are incompatible: https://github.com/cilium/cilium/blob/main/Documentation/network/servicemesh/ingress.rst#prerequisites |
// for differentiating ingress traffic, which is not possible with delegated IPAM | ||
if c.EnableIngressController { | ||
return fmt.Errorf("--%s must be disabled with --%s=%s", EnableIngressController, IPAM, ipamOption.IPAMDelegatedPlugin) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also block c.EnableEnvoyConfig
? I see that Cilium checks that flag when deciding whether to allocate an ingress IP here:
Line 312 in f3c921b
if option.Config.EnableEnvoyConfig { |
(From these docs, I believe EnableIngressController implies EnableEnvoyConfig, but not the other way around.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, if envoy is the issue here we should block on that as it is used in many more cases than just ingress (gateway api/ l7 proxy)
8c2e2c1
to
f5ecfcc
Compare
Commit 5e0b23960b8969d6bbc98e9b7a65bf0bc7dc074e does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
5e0b239
to
263a825
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind setting the same author name and email on both commits?
263a825
to
a0b7dfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agent changes lgtm 👍 see a couple CI failures that look like they could be flakes. @rickysumho could you please take a look at the CI failures using this guide? https://docs.cilium.io/en/stable/contributing/testing/ci/#ci-failure-triage
I see that there were some CI changes merged recently in #27002 |
We want to block any configurations that enable ingress controllers when using delegated IPAM. Cilium allocates its own IP address for sending and differentiating ingress traffic, which is not possible with delegated IPAM. This change will provide a clearer error message for this. Signed-off-by: Ricky Ho <horicky78@gmail.com>
d89d4ce
to
c9d17df
Compare
Hello, could we rerun tests to see if we are still failing the tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the config change
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement! This seems straightforward and we probably want to backport it to 1.13 and 1.14 to prevent starting in this invalid mode.
We want to block any configurations that enable ingress controllers when using delegated IPAM. Cilium allocates its own IP address for sending and differentiating ingress traffic, which is not possible with delegated IPAM. This change will provide a clearer error message for this. The behavior remains the same with or without these changes as cilium-agent still fails on startup when allocating an IP address.
Here is the current error message:
We should also add a note that delegated IPAM and ingress controller are incompatible in the Prerequisites section of the ingress controller docs .
Below is the background information in Slack behind this change.
Will Daly
We were doing some testing yesterday with Cilium delegated IPAM and
enable-ingress-controller=true
in Cilium 1.12. Noticed that cilium-agent failed on startup trying to allocate an ingress IP here:cilium/daemon/cmd/ipam.go
Lines 349 to 352 in 30c501c
Delegated IPAM uses the no-op IP allocator, so I'm not too surprised that allocating an IP fails.
However, I'm curious why Cilium's ingress controller needs to allocate an IP in cilium-agent. I see that the IP is used in pkg/envoy and pkg/nodediscovery, but I'm not really sure how these interact with the ingress controller.
Could someone help me understand why cilium-agent allocates an ingress IP when the ingress controller is enabled? Thanks!
Joe Stringer
Enabling ingress causes Cilium to introduce an L7 proxy in the path for traffic coming into the cluster towards pods inside the cluster. This L7 proxy terminates the TCP connections, so therefore all traffic flowing through the ingress resource between the external peer and the internal peer must have each connection terminated at that proxy.
So we end up with two connections, External Peer to Ingress IP (let's shorten Ext <-> Ing), then from the proxy towards the backend pod (let's shorten Ing <-> Pod)
Broadly there's a couple of ways to handle this TCP termination: Either to attempt transparent proxying, ie the Ing<->Pod connection uses the External IP as the source, or to opaquely proxy, ie the Ing<->Pod connection uses an IP associated with the Ingress node.
For transparent proxying, when the proxy handles the Ing<->Pod connection, the proxy sets the source IP address to the same as the original source IP from the incoming connection. So for the Ing<->Pod connection, the traffic is actually using the IPs of Ext<->Pod. The tricky part here is that for replies back from the Pod, Cilium needs to force that the Pod->Ext 5tuple traffic is actually redirected to the Ingress. Taking this a little further, the Ingress node may be different from the Pod node, so this can get very complicated very quickly. Cilium must track the fact that these specific connections are different from any other connections that go between the Pod and the external IP, in order to force the traffic to be sent back through the proxy (so that the TCP connections / windows / etc line up).
The alternative is to use an IP associated with the node. One option here is that Cilium just uses the Node's IP address in order to initiate the backend connection. While this works from a connectivity perspective, this also makes it difficult to enforce policies differently between traffic that actually originates from the node vs. traffic that is proxied through the node. In both cases, the connection to the Pod backend is actually just initiated from a process on the intermediate node (either local process or the proxy, which is a local process).
So, in order to simplify policy enforcement for traffic coming from an Ingress, Cilium prefers to allocate a brand new IP address specifically for Ingress traffic, and then use this specific IP address when implementing the Ingress proxy logic. When traffic arrives at the backend node for the target Pod, Cilium can then easily differentiate traffic that came into the cluster via the Ingress (because it uses the Ingress IP) vs. traffic that came directly from that particular node.