Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#18246 v1.10 backport #18379

Merged
merged 3 commits into from
Jan 11, 2022
Merged

Conversation

jibi
Copy link
Member

@jibi jibi commented Jan 5, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18246; do contrib/backporting/set-labels.py $pr done 1.10; done

@jibi jibi added kind/backports This PR provides functionality previously merged into master. backport/1.10 labels Jan 5, 2022
@jibi jibi requested a review from kkourt January 5, 2022 10:09
@jibi jibi requested a review from a team as a code owner January 5, 2022 10:09
@jibi
Copy link
Member Author

jibi commented Jan 5, 2022

/test-backport-1.10

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running

Failure Output

FAIL: Found 1 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@jibi
Copy link
Member Author

jibi commented Jan 5, 2022

/test-1.16-netnext

@pchaigno pchaigno added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 10, 2022
@pchaigno
Copy link
Member

This PR will need to be rebased as I've just merged another backport PR in v1.10.

[ upstream commit e3dca63 ]

to make it more explicit their purpose

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit cdfc30d ]

In the context of egress gateway, when traffic is leaving the cluster we
need to check twice if it is a match for an egress NAT policy:

* first time in handle_ipv4_from_lxc(), on the node where the client pod is
  running (to determine if it should be forwarded to a gateway node)
* second time in snat_v4_needed(), on the actual gateway node (to
  determine if it should be SNATed)

Currently the 2 checks are slightly diverging wrt how traffic destined
to outside the cluster is identified:

* in the first case we use is_cluster_destination(), which uses the
  information stored on the ipcache and EP maps
* in the second case we just rely on the IPV4_SNAT_EXCLUSION_DST_CIDR

The issue with the IPV4_SNAT_EXCLUSION_DST_CIDR logic is that we may
incorrectly exclude from egress gw SNAT traffic that is supposed to be
SNATed: case in point an EKS environment where the primary VPC is shared
between the cluster and some other EC2 nodes that don't belong to the
cluster.

To fix this, this commit changes the snat_v4_needed() logic to match the
one we use in handle_ipv4_from_lxc() and executes it before the
IPV4_SNAT_EXCLUSION_DST_CIDR check.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit a38aaba ]

In snat_v4_neeeded(), one of the conditions to determine if a packet
should be SNATed with an egress IP is:

    !local_ep || is_cluster_destination()

The intent of the first check (!local_ep) was to that tells us that
traffic was redirected by an egress gateway policy to a different node
to be masqueraded, but in practice it's not needed: as long as the
packet is destined to outside the cluster, is not reply traffic and it's
matched by an egress NAT policy, it should be SNATed with the egress gw
IP (moreover we should not assume that there's no local EP since it's
possible that the node where the client pod is running is the same node
that will act as egress gateway).

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/jibi/v1.10-egressgw-fix-snat-exclusion branch from 466c13e to 8fe4113 Compare January 10, 2022 10:03
@jibi jibi removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 10, 2022
@jibi
Copy link
Member Author

jibi commented Jan 11, 2022

/test-gke

@jibi
Copy link
Member Author

jibi commented Jan 11, 2022

CI status:

  • ci-l4lb-1.10 is failing because test: bump l4lb Vagrantfile kind to 0.11.1 #18370 has not been backported yet to v1.10
  • test-gke has 2 failing tests:
    • K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running
    • K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

and I believe the GKE failure is unrelated to these changes, as the same 2 tests are failing on another v1.10 backport PR #18438 which has totally unrelated changes

@pchaigno
Copy link
Member

  • test-gke has 2 failing tests:

    * `K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running`
    * `K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master`
    

and I believe the GKE failure is unrelated to these changes, as the same 2 tests are failing on another v1.10 backport PR #18438 which has totally unrelated changes

K8sUpdates is failing but it's also skipped :thisisfine: It's basically failing in the initial setup, just before being skipped. It's not counted as a failure by Jenkins.

The K8sChaosTest failure seems legitimate, though not related to this pull request AFAICT. We should open an issue for it if there isn't already one.

@jibi
Copy link
Member Author

jibi commented Jan 11, 2022

K8sUpdates is failing but it's also skipped :thisisfine: It's basically failing in the initial setup, just before being skipped. It's not counted as a failure by Jenkins.

👍

The K8sChaosTest failure seems legitimate, though not related to this pull request AFAICT. We should open an issue for it if there isn't already one.

already logged #18439 from the other failing PR 👍

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 11, 2022
@pchaigno pchaigno merged commit bf2f83d into v1.10 Jan 11, 2022
@pchaigno pchaigno deleted the pr/jibi/v1.10-egressgw-fix-snat-exclusion branch January 11, 2022 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants