Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

egress gw exclude internal traffic #17639

Merged
merged 7 commits into from Nov 5, 2021

Conversation

kkourt
Copy link
Contributor

@kkourt kkourt commented Oct 19, 2021

This PR aims to do fix an issue where egress gw policy is incorrectly applied to internal cluster traffic, using a patch originally from @anfernee. In addition to that, it refactors the function snat_v4_needed so that it is easier to follow. Since this is brittle code, the refactoring is split into small patches so that they are easy to review.

Fixes: #16147

fix incorrect application of egress gateway policy to internal cluster traffic.
require  a 5.2 kernel or later for the egress gateway policy feature.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 19, 2021
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from 9429eee to a09c1eb Compare October 19, 2021 10:45
@kkourt kkourt added kind/tech-debt Technical debt kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Oct 19, 2021
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from a09c1eb to 684d609 Compare October 19, 2021 11:13
@kkourt kkourt added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Oct 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 19, 2021
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from 684d609 to 7fa3986 Compare October 19, 2021 11:17
@kkourt
Copy link
Contributor Author

kkourt commented Oct 19, 2021

/test

@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch 2 times, most recently from 2802a56 to b501b39 Compare October 20, 2021 19:47
@kkourt
Copy link
Contributor Author

kkourt commented Oct 20, 2021

/test-only --focus="K8sEgressGateway" --kernel_version="net-next"

@kkourt
Copy link
Contributor Author

kkourt commented Oct 20, 2021

/test

@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from b501b39 to 20de38f Compare October 21, 2021 07:04
@kkourt
Copy link
Contributor Author

kkourt commented Oct 21, 2021

/test-only --focus="K8sEgressGateway" --kernel_version="net-next"

@kkourt kkourt marked this pull request as ready for review October 21, 2021 17:04
@kkourt kkourt requested a review from a team October 21, 2021 17:04
@kkourt kkourt requested review from a team as code owners October 21, 2021 17:04
@kkourt kkourt requested a review from pchaigno October 21, 2021 17:08
@kkourt
Copy link
Contributor Author

kkourt commented Oct 21, 2021

/test

@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from e0969ba to c03e7a4 Compare October 21, 2021 17:24
This is a preparation patch (1/2) for subsequent changes. The patch
intends to maintain the code behavior.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This is a preparation patch for (2/2) subsequent changes. The patch
intends to maintain the code behavior.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
datapath code checks the egress gateway map to determine whether the packet
should be forwarded to the egress gateway and whether the packet should
be SNAT-ed with an appropriate egress IP.

In-cluster traffic should be, however, excluded from these checks.
In-cluster traffic includes both pod-to-pod and pod-to-node. The purpose
of this patch is to distinguish between cluster-local versus
clluster-egress traffic, and apply the egress map check only in the
latter case.

There are two check points:
 - in bpf_lxc we check whether the packet should be forwarded to
 the egress gateway (currently, via a tunnel)
 - in bpf_host (in the gateway node) we check whether the packet should
 be SNATed

This patch adds the following tests to predicate checking the egress gw
map.

In bpf_lxc, we test whether:
 - there is tunnel_endpoint in ipcache, which means the destination is a
 remote endpoint
 - the destination id is REMOTE_NODE_ID or HOST_ID, which means that the
 destination is the host or a remote node
 - the destination matches a local endpoint
and in all above cases we skip checking the egress policy map.

In bpf_host, we only perform the egress map check if:
 - the source address does not match a local endpoint
 - the remote address is not a remote node

One of the things that we considered was using WORLD_ID to distinguish
egress traffic. However, as Paul pointed out, egress traffic might
not have a WORLD_ID because it might match CIDR policies (toCIDR rules).

Fix issue:16147

Signed-off-by: Yongkun Gui <ygui@google.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from 9d95669 to 6c14d49 Compare November 4, 2021 13:54
@kkourt
Copy link
Contributor Author

kkourt commented Nov 4, 2021

It does make reviewing a bit easier because we now know what was the intended change in the logic. You had similar descriptions in previous versions of the commit history :-)

Added something close to Paul's suggestions (even though, I had to truncate them). Thanks!

@kkourt
Copy link
Contributor Author

kkourt commented Nov 4, 2021

/test

After datapath modifications in this PR, 4.19 kernel is unable to verify
bpf_lxc:

>  => Loading bpf_lxc.c:from-container...
>  ...
>
> Prog section '2/7' rejected: Argument list too long (7)!
>  - Type:         3
>  - Attach Type:  0
>  - Instructions: 4130 (34 over limit)
>  - License:      GPL

Introduce a requirement for egress gateway to run on 5.2 or later
kernels that support a larger number of instructions.

In addition to the daemon, modify testing and build files accordingly.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from 6c14d49 to f199656 Compare November 4, 2021 14:37
@kkourt
Copy link
Contributor Author

kkourt commented Nov 4, 2021

In an offline discussion, @pchaigno and @brb pointed out that for the fix to work, enable-remote-node-identity should be set in the agent. Added a patch to check this when agent is initialized.

@kkourt
Copy link
Contributor Author

kkourt commented Nov 4, 2021

/test

Egress gateway datapath code depends on REMOTE_NODE_ID to distinguish
between cluster-local and cluster-egress traffic.

Currently, egress gateway depends on BPF masquerade, which also requires
remote-node-identity. However, this might change in the future so we add
a check for the above.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Currently, we print the policy map after the policy is removed. This
means that the map will always be empty.

test used go's defer to remove the egress gateway policy. We modify this
to use an AfterAll block for removing this policy, which in turn, allow
us to use an AfterFailed block to print the maps.

Fixes: 25f32bc

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add an additional CiliumEgressNATPolicy that will match every packet and
forward it into 1.1.1.1. This allows us to check that internal traffic
will not be checked using the egress gw policy.

This was originally from:
c9485e4

Signed-off-by: Yongkun Gui <ygui@google.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt force-pushed the pr/kkourt/egress-gw-exclude-internal-traffic branch from f199656 to ce1be03 Compare November 4, 2021 14:47
@kkourt
Copy link
Contributor Author

kkourt commented Nov 4, 2021

/test

K8s-1.21-kernel-4.19 seems like an instance of #15455

Job 'Cilium-PR-K8s-1.21-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.~~

Test Name

K8sDatapathConfig Host firewall With native routing

Failure Output

FAIL: Failed to reach 192.168.36.11:80 from testclient-25lrg

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.19 so I can create a new GitHub issue to track it.

GKE failure seems like an instance of #17307

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.Test Name
K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications

Failure Output

FAIL: Pods are not ready in time: timed out waiting for pods with filter  to be ready: 4m0s timeout expired

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@kkourt
Copy link
Contributor Author

kkourt commented Nov 5, 2021

Since this addresses a bug reported by many users, I believe we had enough discussion and reviews to merge it. All test failures can be attributed to flakes: #17639 (comment). Marking ready-to-merge.

@kkourt kkourt added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 5, 2021
@pchaigno pchaigno merged commit 81f7608 into master Nov 5, 2021
@pchaigno pchaigno deleted the pr/kkourt/egress-gw-exclude-internal-traffic branch November 5, 2021 08:19
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport done to v1.10 in 1.10.6 Nov 23, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport done to v1.10 in 1.10.6 Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/egress-gateway Impacts the egress IP gateway feature. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.10.6
Backport done to v1.10
Development

Successfully merging this pull request may close these issues.

Egress for destination CIDR 0.0.0.0/0 blocks all network connections