Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: egressgw: sync logic to determine if destination is outside cluster #18246

Merged
merged 3 commits into from
Jan 5, 2022

Conversation

jibi
Copy link
Member

@jibi jibi commented Dec 13, 2021

See individual commits

@jibi jibi added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.10 feature/egress-gateway Impacts the egress IP gateway feature. labels Dec 13, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.7 Dec 13, 2021
@jibi jibi force-pushed the pr/jibi/egressgw-fix-snat-exclusion branch 3 times, most recently from c9dcd86 to b2e8f29 Compare December 13, 2021 14:01
@jibi
Copy link
Member Author

jibi commented Dec 13, 2021

/test

@jibi jibi requested a review from kkourt December 13, 2021 15:13
@jibi jibi requested a review from pchaigno December 13, 2021 15:13
@jibi jibi marked this pull request as ready for review December 13, 2021 15:14
@jibi jibi requested a review from a team December 13, 2021 15:14
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Gilberto, LGTM.

As we discussed offline, I think we should describe what IPV4_SNAT_EXCLUSION_DST_CIDR and, subsequently, --ipv4-native-routing-cidr mean. My understanding was that the above CIDR included node IPs and only those, which at least in the EKS environment you tried out was not the case. IMO, this can be done in a followup PR but we should do it soon before it gets forgotten since there was a lot of time wasted on this based on different understandings of what above macro meant.

Based on our offline discussion IPV4_SNAT_EXCLUSION_DST_CIDR seems to be: "a CIDR that does not require SNAT (unless traffic is going through egress gw)". We should add an appropriate line in @pchaigno's list here: #15379 (comment).

to make it more explicit their purpose

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/jibi/egressgw-fix-snat-exclusion branch from b2e8f29 to 99b4c9a Compare January 4, 2022 10:21
@jibi
Copy link
Member Author

jibi commented Jan 4, 2022

Thanks Gilberto, LGTM.

As we discussed offline, I think we should describe what IPV4_SNAT_EXCLUSION_DST_CIDR and, subsequently, --ipv4-native-routing-cidr mean. My understanding was that the above CIDR included node IPs and only those, which at least in the EKS environment you tried out was not the case. IMO, this can be done in a followup PR but we should do it soon before it gets forgotten since there was a lot of time wasted on this based on different understandings of what above macro meant.

Based on our offline discussion IPV4_SNAT_EXCLUSION_DST_CIDR seems to be: "a CIDR that does not require SNAT (unless traffic is going through egress gw)". We should add an appropriate line in @pchaigno's list here: #15379 (comment).

Yep, I'll fill a PR with these changes now 👍

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 4, 2022
@jibi
Copy link
Member Author

jibi commented Jan 4, 2022

Marking as ready to merge. No need to rerun full CI as I just added an additional comment and amended a commit message (and before CI was green)

In the context of egress gateway, when traffic is leaving the cluster we
need to check twice if it is a match for an egress NAT policy:

* first time in handle_ipv4_from_lxc(), on the node where the client pod is
  running (to determine if it should be forwarded to a gateway node)
* second time in snat_v4_needed(), on the actual gateway node (to
  determine if it should be SNATed)

Currently the 2 checks are slightly diverging wrt how traffic destined
to outside the cluster is identified:

* in the first case we use is_cluster_destination(), which uses the
  information stored on the ipcache and EP maps
* in the second case we just rely on the IPV4_SNAT_EXCLUSION_DST_CIDR

The issue with the IPV4_SNAT_EXCLUSION_DST_CIDR logic is that we may
incorrectly exclude from egress gw SNAT traffic that is supposed to be
SNATed: case in point an EKS environment where the primary VPC is shared
between the cluster and some other EC2 nodes that don't belong to the
cluster.

To fix this, this commit changes the snat_v4_needed() logic to match the
one we use in handle_ipv4_from_lxc() and executes it before the
IPV4_SNAT_EXCLUSION_DST_CIDR check.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
In snat_v4_neeeded(), one of the conditions to determine if a packet
should be SNATed with an egress IP is:

    !local_ep || is_cluster_destination()

The intent of the first check (!local_ep) was to that tells us that
traffic was redirected by an egress gateway policy to a different node
to be masqueraded, but in practice it's not needed: as long as the
packet is destined to outside the cluster, is not reply traffic and it's
matched by an egress NAT policy, it should be SNATed with the egress gw
IP (moreover we should not assume that there's no local EP since it's
possible that the node where the client pod is running is the same node
that will act as egress gateway).

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/jibi/egressgw-fix-snat-exclusion branch from 99b4c9a to 6fc9f33 Compare January 4, 2022 10:25
@christarazi christarazi merged commit a38aaba into master Jan 5, 2022
@christarazi christarazi deleted the pr/jibi/egressgw-fix-snat-exclusion branch January 5, 2022 00:19
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.7 Jan 5, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.7 Jan 5, 2022
@jibi
Copy link
Member Author

jibi commented Jan 5, 2022

(taking care of v1.10 backport as there may be a few conflicts here)

@jibi jibi mentioned this pull request Jan 5, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.7 Jan 12, 2022
@joestringer joestringer added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. feature/egress-gateway Impacts the egress IP gateway feature. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.10.7
Backport done to v1.10
Development

Successfully merging this pull request may close these issues.

None yet

5 participants