Proxy redirect issue when running Cilium on top of Calico (CNI-Chaining) #12454
Comments
Additionally, applying L7 rules seems to have no effect at all. This has also been reproduced by the community using the AWS CNI.
What I think is going on here: Calico uses mark fields that conflict with Cilium's and changes mark values as packets are sent to the stack. For pod-to-pod traffic in the non-chaining case, Cilium uses a BPF redirect and skips the stack when it knows the pod is local. In the chaining case, however, we use the stack's routing table, so the packet goes to the stack. In non-L7 cases we already avoid relying on the mark value in the Calico chaining case and force ingress on the veth to do an extra lookup, because of the mark mangling done by Calico. In the L7 case, though, we redirect to Envoy through the stack and still rely on the mark value to hit the Cilium NOTRACK rules in iptables. I observed in a similar test that in this chaining case the NOTRACK rules are not being hit and the packets sent to Envoy are dropped by the stack. The reason we get a miss on the NOTRACK side is that a Calico rule (in front of the Cilium rules) is also being hit, which mangles the mark value using Calico's logic and conflicts with Cilium's usage. To fix this we likely need to put the NOTRACK rule in front of the Calico rules. I think this is OK because the Cilium forward to Envoy is an internal implementation detail of Cilium and should avoid any routing/policy logic in use by Calico. I also expect the TPROXY improvements in a future release to resolve this, because we won't need the iptables logic in the Envoy redirect path.
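For anyone debugging this, a rough way to see the ordering problem described above is to dump the raw table and check whether Calico's mark-mangling rules run before the Cilium rules. The snippet below is only a sketch: the mark value and rule placement are illustrative, not Cilium's actual proxy mark or the final fix.

```sh
# Dump the raw table and check rule order in PREROUTING: if a cali-* rule that
# rewrites the packet mark comes before the Cilium jump (e.g. CILIUM_PRE_raw),
# proxy-bound packets will miss the NOTRACK rule described above.
iptables-save -t raw

# Illustrative only: insert a NOTRACK rule at position 1 so it is evaluated
# before any Calico rules. 0x200/0xf00 is a placeholder for the proxy mark
# Cilium uses internally, not a value to copy verbatim.
iptables -t raw -I PREROUTING 1 -m mark --mark 0x200/0xf00 -j CT --notrack
```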
See this PR for the TPROXY changes mentioned above: #11279
@joestringer Can we close this issue since the TPROXY change was merged in 1.9? |
@aditighag I didn't specifically validate that #11279 fixed this issue. I think that additional development work will be needed to fully resolve this. |
@aditighag @joestringer I'll have a look at whether, and to what extent, this fix solves or mitigates the issue.
@brandshaide @joestringer We started using Cilium with AWS CNI chaining. Can we mention more about this limitation in the docs? Or should it currently work?
@sergeyshevch Sure, do you have a suggestion where to document this? We currently have this documented at the top of pages like this: https://docs.cilium.io/en/stable/gettingstarted/cni-chaining-aws-cni/.
@joestringer I guess it's not totally clear what isn't working. Will all L7 rules fail with CNI chaining, or only some cases? It might also be good to mention all related issues.
Note that there is also this section in the docs https://docs.cilium.io/en/stable/gettingstarted/cni-chaining-calico/#calico that enumerates specific issues. |
@sergeyshevch yes, basically all features listed in those sections are untested and are likely to be broken. Someone would need to pick up the work to implement the integrations necessary for it to work. Currently the docs link to the primary known issue which is likely to affect all CNI chaining setups, even though the specific details may vary from plugin to plugin. |
I have tested this in a kind Kubernetes cluster (cgroup v2 only) with CNI chaining (Calico), and it looks like the same problem exists with Cilium v1.12.0-rc1. :)
Is anyone planning on picking up this work? Chaining Cilium to the AWS VPC CNI is a compelling proposition for us, but losing Layer 7 policy support makes it a tougher sell.
We are chaining Cilium 1.12 to the AWS VPC CNI and L7 policies (seem to) work fine. Edit: My assessment is anecdotal... I haven't (yet) run |
This issue has been automatically marked as stale because it has not had recent activity.
This issue has not seen any activity since it was marked stale. |
We are chaining Cilium 1.14 to the AWS VPC CNI and L7 policies work fine.
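For reference, this is roughly how one could check that L7 enforcement really goes through the proxy in a chained setup. It is only a sketch: the policy name, the `app: echo` selector, port, and path are made-up examples, not taken from this issue.

```sh
# Apply a minimal HTTP (L7) policy; the selector, port, and path are hypothetical.
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-allow-healthz
spec:
  endpointSelector:
    matchLabels:
      app: echo
  ingress:
  - toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/healthz"
EOF

# With the proxy redirect working, GET /healthz succeeds while other requests
# are rejected by the proxy (HTTP 403) instead of simply timing out.
```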
I'm using CNI chaining on AWS and L7 policies work until a security group is attached, at which point traffic is black-holed somewhere after the egress proxy.
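In case it helps to narrow down where the traffic disappears, here is a sketch of how one might watch for drops on the node hosting the affected pod; it assumes a default install with the Cilium agent daemonset in kube-system.

```sh
# Watch drop events reported by the Cilium agent (pick the agent pod running
# on the node that hosts the affected workload).
kubectl -n kube-system exec -ti ds/cilium -- cilium monitor --type drop

# If Hubble is enabled and the CLI is shipped in the agent image, dropped
# flows can also be filtered by verdict.
kubectl -n kube-system exec -ti ds/cilium -- hubble observe --verdict DROPPED
```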
How did you do it? |
This issue has been automatically marked as stale because it has not had recent activity.
This issue has not seen any activity since it was marked stale. |
Bug report

**General Information**

- `pod-to-external-fqdn-allow-google-cnp` is failing when running Cilium on top of Calico using CNI chaining
- Environment: minikube (1.8.2) and RKE

**How to reproduce the issue**

1. `minikube start --network-plugin=cni --memory=4096`
2. Set up CNI chaining as described in our documentation

**Expected behaviour**
All connectivity-check pods are up and running.

**Actual behaviour**

`pod-to-external-fqdn-allow-google-cnp` is failing and falls into a `CrashLoopBackOff`. When the (correct) CNP is deleted, traffic is routed again, i.e. cURLing google.com now succeeds. The redirects are OK, and no `ERROR` and/or `WARN` logs were identified at the agent.
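A rough sketch of the check that surfaces this failure; the manifest URL follows the Cilium connectivity-check examples and may differ between releases, so treat the exact path as an assumption.

```sh
# Deploy the connectivity-check workloads (branch/path may vary per release).
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.8/examples/kubernetes/connectivity-check/connectivity-check.yaml

# In the failing setup this pod keeps restarting instead of becoming Ready.
kubectl get pods | grep pod-to-external-fqdn-allow-google-cnp
kubectl logs deploy/pod-to-external-fqdn-allow-google-cnp
```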
@pchaigno: the issue appears to be related to a conflict on packet marks between Cilium and Calico that prevents proxy redirects from working properly.