#18246 v1.10 backport #18379

jibi · 2022-01-05T10:09:17Z

bpf: egressgw: sync logic to determine if destination is outside cluster #18246 -- bpf: egressgw: sync logic to determine if destination is outside cluster (@jibi)

Once this PR is merged, you can update the PR labels via:

$ for pr in 18246; do contrib/backporting/set-labels.py $pr done 1.10; done

jibi · 2022-01-05T10:09:42Z

/test-backport-1.10

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running

Failure Output

FAIL: Found 1 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

jibi · 2022-01-05T12:39:49Z

/test-1.16-netnext

pchaigno · 2022-01-10T09:58:23Z

This PR will need to be rebased as I've just merged another backport PR in v1.10.

[ upstream commit e3dca63 ] to make it more explicit their purpose Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit cdfc30d ] In the context of egress gateway, when traffic is leaving the cluster we need to check twice if it is a match for an egress NAT policy: * first time in handle_ipv4_from_lxc(), on the node where the client pod is running (to determine if it should be forwarded to a gateway node) * second time in snat_v4_needed(), on the actual gateway node (to determine if it should be SNATed) Currently the 2 checks are slightly diverging wrt how traffic destined to outside the cluster is identified: * in the first case we use is_cluster_destination(), which uses the information stored on the ipcache and EP maps * in the second case we just rely on the IPV4_SNAT_EXCLUSION_DST_CIDR The issue with the IPV4_SNAT_EXCLUSION_DST_CIDR logic is that we may incorrectly exclude from egress gw SNAT traffic that is supposed to be SNATed: case in point an EKS environment where the primary VPC is shared between the cluster and some other EC2 nodes that don't belong to the cluster. To fix this, this commit changes the snat_v4_needed() logic to match the one we use in handle_ipv4_from_lxc() and executes it before the IPV4_SNAT_EXCLUSION_DST_CIDR check. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit a38aaba ] In snat_v4_neeeded(), one of the conditions to determine if a packet should be SNATed with an egress IP is: !local_ep || is_cluster_destination() The intent of the first check (!local_ep) was to that tells us that traffic was redirected by an egress gateway policy to a different node to be masqueraded, but in practice it's not needed: as long as the packet is destined to outside the cluster, is not reply traffic and it's matched by an egress NAT policy, it should be SNATed with the egress gw IP (moreover we should not assume that there's no local EP since it's possible that the node where the client pod is running is the same node that will act as egress gateway). Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

jibi · 2022-01-11T09:58:24Z

/test-gke

jibi · 2022-01-11T13:15:12Z

CI status:

ci-l4lb-1.10 is failing because test: bump l4lb Vagrantfile kind to 0.11.1 #18370 has not been backported yet to v1.10
test-gke has 2 failing tests:
- K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running
- K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

and I believe the GKE failure is unrelated to these changes, as the same 2 tests are failing on another v1.10 backport PR #18438 which has totally unrelated changes

pchaigno · 2022-01-11T13:21:05Z

test-gke has 2 failing tests:
* `K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running`
* `K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master`
and I believe the GKE failure is unrelated to these changes, as the same 2 tests are failing on another v1.10 backport PR #18438 which has totally unrelated changes

K8sUpdates is failing but it's also skipped :thisisfine: It's basically failing in the initial setup, just before being skipped. It's not counted as a failure by Jenkins.

The K8sChaosTest failure seems legitimate, though not related to this pull request AFAICT. We should open an issue for it if there isn't already one.

jibi · 2022-01-11T13:24:30Z

K8sUpdates is failing but it's also skipped :thisisfine: It's basically failing in the initial setup, just before being skipped. It's not counted as a failure by Jenkins.

👍

The K8sChaosTest failure seems legitimate, though not related to this pull request AFAICT. We should open an issue for it if there isn't already one.

already logged #18439 from the other failing PR 👍

jibi added kind/backports This PR provides functionality previously merged into master. backport/1.10 labels Jan 5, 2022

jibi requested a review from kkourt January 5, 2022 10:09

jibi requested a review from a team as a code owner January 5, 2022 10:09

maintainer-s-little-helper bot assigned kkourt Jan 5, 2022

pchaigno added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 10, 2022

jibi added 3 commits January 10, 2022 11:02

bpf: rename ep and info to local_ep and remote_ep in snat_v4_needed

b90b0a5

[ upstream commit e3dca63 ] to make it more explicit their purpose Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

jibi force-pushed the pr/jibi/v1.10-egressgw-fix-snat-exclusion branch from 466c13e to 8fe4113 Compare January 10, 2022 10:03

jibi removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 10, 2022

kkourt approved these changes Jan 10, 2022

View reviewed changes

maintainer-s-little-helper bot unassigned kkourt Jan 10, 2022

jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 11, 2022

pchaigno merged commit bf2f83d into v1.10 Jan 11, 2022

pchaigno deleted the pr/jibi/v1.10-egressgw-fix-snat-exclusion branch January 11, 2022 13:27

joestringer mentioned this pull request Jan 18, 2022

Prepare for release v1.10.7 #18534

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#18246 v1.10 backport #18379

#18246 v1.10 backport #18379

jibi commented Jan 5, 2022

jibi commented Jan 5, 2022 •

edited by maintainer-s-little-helper bot

Test Name

Failure Output

jibi commented Jan 5, 2022

pchaigno commented Jan 10, 2022

jibi commented Jan 11, 2022

jibi commented Jan 11, 2022

pchaigno commented Jan 11, 2022

jibi commented Jan 11, 2022

#18246 v1.10 backport #18379

#18246 v1.10 backport #18379

Conversation

jibi commented Jan 5, 2022

jibi commented Jan 5, 2022 • edited by maintainer-s-little-helper bot

Test Name

Failure Output

jibi commented Jan 5, 2022

pchaigno commented Jan 10, 2022

jibi commented Jan 11, 2022

jibi commented Jan 11, 2022

pchaigno commented Jan 11, 2022

jibi commented Jan 11, 2022

jibi commented Jan 5, 2022 •

edited by maintainer-s-little-helper bot