-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#18246 v1.10 backport #18379
#18246 v1.10 backport #18379
Conversation
/test-backport-1.10 Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment |
/test-1.16-netnext |
This PR will need to be rebased as I've just merged another backport PR in v1.10. |
[ upstream commit e3dca63 ] to make it more explicit their purpose Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit cdfc30d ] In the context of egress gateway, when traffic is leaving the cluster we need to check twice if it is a match for an egress NAT policy: * first time in handle_ipv4_from_lxc(), on the node where the client pod is running (to determine if it should be forwarded to a gateway node) * second time in snat_v4_needed(), on the actual gateway node (to determine if it should be SNATed) Currently the 2 checks are slightly diverging wrt how traffic destined to outside the cluster is identified: * in the first case we use is_cluster_destination(), which uses the information stored on the ipcache and EP maps * in the second case we just rely on the IPV4_SNAT_EXCLUSION_DST_CIDR The issue with the IPV4_SNAT_EXCLUSION_DST_CIDR logic is that we may incorrectly exclude from egress gw SNAT traffic that is supposed to be SNATed: case in point an EKS environment where the primary VPC is shared between the cluster and some other EC2 nodes that don't belong to the cluster. To fix this, this commit changes the snat_v4_needed() logic to match the one we use in handle_ipv4_from_lxc() and executes it before the IPV4_SNAT_EXCLUSION_DST_CIDR check. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit a38aaba ] In snat_v4_neeeded(), one of the conditions to determine if a packet should be SNATed with an egress IP is: !local_ep || is_cluster_destination() The intent of the first check (!local_ep) was to that tells us that traffic was redirected by an egress gateway policy to a different node to be masqueraded, but in practice it's not needed: as long as the packet is destined to outside the cluster, is not reply traffic and it's matched by an egress NAT policy, it should be SNATed with the egress gw IP (moreover we should not assume that there's no local EP since it's possible that the node where the client pod is running is the same node that will act as egress gateway). Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
466c13e
to
8fe4113
Compare
/test-gke |
CI status:
and I believe the GKE failure is unrelated to these changes, as the same 2 tests are failing on another v1.10 backport PR #18438 which has totally unrelated changes |
K8sUpdates is failing but it's also skipped :thisisfine: It's basically failing in the initial setup, just before being skipped. It's not counted as a failure by Jenkins. The K8sChaosTest failure seems legitimate, though not related to this pull request AFAICT. We should open an issue for it if there isn't already one. |
👍
already logged #18439 from the other failing PR 👍 |
Once this PR is merged, you can update the PR labels via: