egress gw exclude internal traffic #17639
/test
/test-only --focus="K8sEgressGateway" --kernel_version="net-next"
/test
/test-only --focus="K8sEgressGateway" --kernel_version="net-next"
/test
This is a preparation patch (1/2) for subsequent changes. The patch is intended to preserve the existing code behavior.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

This is a preparation patch (2/2) for subsequent changes. The patch is intended to preserve the existing code behavior.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
The datapath code checks the egress gateway map to determine whether a packet should be forwarded to the egress gateway and whether it should be SNATed with an appropriate egress IP. In-cluster traffic, however, should be excluded from these checks. In-cluster traffic includes both pod-to-pod and pod-to-node traffic.

The purpose of this patch is to distinguish between cluster-local and cluster-egress traffic, and to apply the egress map check only in the latter case. There are two check points:
- in bpf_lxc, we check whether the packet should be forwarded to the egress gateway (currently, via a tunnel)
- in bpf_host (on the gateway node), we check whether the packet should be SNATed

This patch adds the following tests as preconditions for checking the egress gw map.

In bpf_lxc, we test whether:
- there is a tunnel_endpoint in ipcache, which means the destination is a remote endpoint
- the destination id is REMOTE_NODE_ID or HOST_ID, which means that the destination is the host or a remote node
- the destination matches a local endpoint

In all of the above cases, we skip checking the egress policy map.

In bpf_host, we only perform the egress map check if:
- the source address does not match a local endpoint
- the remote address is not a remote node

One of the things that we considered was using WORLD_ID to distinguish egress traffic. However, as Paul pointed out, egress traffic might not have a WORLD_ID because it might match CIDR policies (toCIDR rules).

Fixes: #16147

Signed-off-by: Yongkun Gui <ygui@google.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
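For readers who want the two predicates from the commit message above in one place, here is a minimal sketch in plain C. It is a model of the decision logic only, not the actual BPF code: the struct, the function names, and the boolean parameters are hypothetical, and the numeric identity values are placeholders for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved identities. The numeric values are placeholders for this sketch. */
enum { HOST_ID = 1, WORLD_ID = 2, REMOTE_NODE_ID = 6 };

/* Hypothetical, simplified result of an ipcache lookup for the destination. */
struct dst_info {
    uint32_t sec_identity;    /* security identity of the destination */
    uint32_t tunnel_endpoint; /* non-zero: destination is a remote endpoint */
};

/*
 * bpf_lxc side: decide whether the egress gateway policy map should be
 * consulted at all. `dst` is NULL when ipcache has no entry.
 */
static bool lxc_should_check_egress_policy(const struct dst_info *dst,
                                           bool dst_is_local_endpoint)
{
    /* Destination is an endpoint on this node: cluster-local. */
    if (dst_is_local_endpoint)
        return false;

    if (dst != NULL) {
        /* A tunnel endpoint means the destination is a remote endpoint. */
        if (dst->tunnel_endpoint != 0)
            return false;
        /* Destination is the host or a remote node: pod-to-node traffic. */
        if (dst->sec_identity == HOST_ID ||
            dst->sec_identity == REMOTE_NODE_ID)
            return false;
    }

    /* Everything else is treated as cluster-egress traffic. */
    return true;
}

/*
 * bpf_host side (gateway node): decide whether the egress map should be
 * checked to pick an SNAT address.
 */
static bool host_should_check_egress_snat(bool src_is_local_endpoint,
                                          bool dst_is_remote_node)
{
    return !src_is_local_endpoint && !dst_is_remote_node;
}
```

Note that neither predicate keys on WORLD_ID, in line with the point about toCIDR rules: a destination that matched a CIDR policy carries a different identity but still falls through to the egress check.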
Added something close to Paul's suggestions (even though I had to truncate them). Thanks!
/test
After the datapath modifications in this PR, the 4.19 kernel is unable to verify bpf_lxc:

> => Loading bpf_lxc.c:from-container...
> ...
>
> Prog section '2/7' rejected: Argument list too long (7)!
> - Type: 3
> - Attach Type: 0
> - Instructions: 4130 (34 over limit)
> - License: GPL

Introduce a requirement for egress gateway to run on 5.2 or later kernels, which support a larger number of instructions. In addition to the daemon, modify testing and build files accordingly.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
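The log above implies a limit of 4096 instructions (4130 minus the 34 over). As a rough illustration of the kind of gate this commit describes, the sketch below compares a kernel release string against 5.2. The function names are hypothetical and the string-based comparison is an assumption made for this example; the actual daemon may detect kernel support differently.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse "major.minor" from a release string such as "4.19.57" or
 * "5.10.0-8-amd64" and compare it with the wanted version.
 * Returns false if the string cannot be parsed.
 */
static bool kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (release == NULL || sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;

    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Hypothetical gate: egress gateway is only allowed on 5.2 or later. */
static bool egress_gateway_supported(const char *release)
{
    return kernel_at_least(release, 5, 2);
}
```

With this model, `egress_gateway_supported("4.19.57")` is false and `egress_gateway_supported("5.10.0-8-amd64")` is true.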
/test
The egress gateway datapath code depends on REMOTE_NODE_ID to distinguish between cluster-local and cluster-egress traffic. Currently, egress gateway depends on BPF masquerade, which also requires remote-node-identity. However, this might change in the future, so we add an explicit check for the remote-node-identity requirement.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
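The dependency described above amounts to a simple option validation. A minimal sketch follows, with a hypothetical options struct and function name that are not taken from the daemon code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of daemon options relevant to this check. */
struct daemon_opts {
    bool enable_egress_gateway;
    bool enable_remote_node_identity;
};

/*
 * Returns NULL when the options are consistent, or a message describing
 * the problem. The check is independent of BPF masquerade, so it keeps
 * holding even if egress gateway stops depending on it.
 */
static const char *validate_egress_gateway_opts(const struct daemon_opts *o)
{
    if (o->enable_egress_gateway && !o->enable_remote_node_identity)
        return "egress gateway requires remote-node identity to be enabled";
    return NULL;
}
```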
Currently, we print the policy map after the policy is removed. This means that the map will always be empty. The test used Go's defer to remove the egress gateway policy. We modify this to use an AfterAll block for removing the policy, which in turn allows us to use an AfterFailed block to print the maps.

Fixes: 25f32bc

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Add an additional CiliumEgressNATPolicy that matches every packet and forwards it to 1.1.1.1. This allows us to check that internal traffic is not subjected to the egress gw policy.

This was originally from: c9485e4

Signed-off-by: Yongkun Gui <ygui@google.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
/test
K8s-1.21-kernel-4.19 seems like an instance of #15455
GKE failure seems like an instance of #17307
Since this addresses a bug reported by many users, I believe we had enough discussion and reviews to merge it. All test failures can be attributed to flakes: #17639 (comment). Marking ready-to-merge.
This PR aims to fix an issue where the egress gw policy is incorrectly applied to internal cluster traffic, using a patch originally from @anfernee. In addition, it refactors the function snat_v4_needed so that it is easier to follow. Since this is brittle code, the refactoring is split into small patches so that they are easy to review.

Fixes: #16147