bpf: Inter-cluster SNAT with ClusterIP global service #24212
Introduce an egress side of the inter-cluster SNAT. The essential parts are as follows.
1. We carry around the ClusterID with mark (for lxc -> overlay redirect) and skb->cb (for tailcalls).
2. We recognize the inter-cluster communication using the ClusterID. If the ClusterID is not 0 and not the same as local ClusterID, then it's inter-cluster communication.
3. To track SNAT mapping, we use per-cluster SNAT map.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
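The check described in step 2 can be sketched in plain C as follows. This is a minimal illustration, not the code from this PR: `LOCAL_CLUSTER_ID` and `is_inter_cluster` are hypothetical names, and the value 1 is an arbitrary placeholder for the local ClusterID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder for the local cluster's ID. In a real datapath
 * this would come from configuration, not a hard-coded constant. */
#define LOCAL_CLUSTER_ID 1

/* Returns true when cluster_id refers to a remote cluster.
 * A ClusterID of 0 means no ClusterID was carried with the packet, and a
 * value equal to the local ClusterID means intra-cluster traffic. */
static inline bool is_inter_cluster(uint32_t cluster_id)
{
	return cluster_id != 0 && cluster_id != LOCAL_CLUSTER_ID;
}
```

With the placeholder above, `is_inter_cluster(2)` is true, while `is_inter_cluster(0)` and `is_inter_cluster(1)` are false.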
This commit implements the receive path of the cluster mesh with overlapping PodCIDR on the service backend side (not a revSNAT path). When a packet comes from the tunnel and goes to a local endpoint, propagate the information that the packet came from the tunnel to the handle_policy call. It finally reaches ct_create for the new connection, which records from_tunnel in the conntrack entry, so that we can correctly put packets back into the tunnel on the return path. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
In the reply path of the inter-cluster communication, we look up the conntrack entry created on the request path. If the conntrack entry has from_tunnel set and the destination IP address is a remote-node IP (we can recognize it by the identity from ipcache), we set tunnel_endpoint to the destination IP address (the remote-node IP, which should also be a remote tunnel endpoint) and redirect the packet to the tunnel device. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
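The reply-path decision described above can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the datapath code: `struct ct_state_sketch`, `REMOTE_NODE_IDENTITY_SKETCH`, and `reply_via_tunnel` are hypothetical names, the identity value is a placeholder, and the ipcache lookup is assumed to have already produced `dst_identity`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the reserved remote-node identity (an assumption here). */
#define REMOTE_NODE_IDENTITY_SKETCH 6

/* Hypothetical, reduced view of the conntrack state consulted on reply. */
struct ct_state_sketch {
	bool from_tunnel; /* set when the request arrived over the tunnel */
};

/* Decides whether a reply packet should be sent back through the tunnel.
 * On success, stores the destination IP as the tunnel endpoint and
 * returns true; otherwise leaves *tunnel_endpoint untouched. */
static inline bool reply_via_tunnel(const struct ct_state_sketch *ct,
				    uint32_t dst_identity, uint32_t dst_ip,
				    uint32_t *tunnel_endpoint)
{
	if (!ct->from_tunnel)
		return false;
	if (dst_identity != REMOTE_NODE_IDENTITY_SKETCH)
		return false;

	/* The remote-node IP doubles as the remote tunnel endpoint. */
	*tunnel_endpoint = dst_ip;
	return true;
}
```

A caller would redirect the packet to the tunnel device only when this returns true, and fall through to the regular reply handling otherwise.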
Introduce an ingress side of the inter-cluster SNAT. First, we obtain the ClusterID from the security identity and check whether the packet is from a remote cluster. After that, we check the destination IP address, and if it is IPV4_INTER_CLUSTER_SNAT, we perform revSNAT. We separate the revSNAT logic into a tailcall since we hit the complexity limit on kernel 4.19. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
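The two ingress checks can be sketched as follows. This is an illustration only: it assumes the ClusterID occupies the 8 bits above a 16-bit local identity, and every name and constant below (`IDENTITY_LEN_SKETCH`, `CLUSTER_ID_MAX_SKETCH`, `LOCAL_CLUSTER_ID_SKETCH`, `INTER_CLUSTER_SNAT_IP_SKETCH`, and both functions) is a hypothetical stand-in rather than something taken from this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed identity layout: low 16 bits local identity, next 8 bits ClusterID. */
#define IDENTITY_LEN_SKETCH 16
#define CLUSTER_ID_MAX_SKETCH 255

/* Placeholder values standing in for datapath configuration. */
#define LOCAL_CLUSTER_ID_SKETCH 1
#define INTER_CLUSTER_SNAT_IP_SKETCH 0x0a0000feU /* 10.0.0.254, arbitrary */

/* Extracts the ClusterID from a security identity under the assumed layout. */
static inline uint32_t cluster_id_from_identity(uint32_t identity)
{
	return (identity >> IDENTITY_LEN_SKETCH) & CLUSTER_ID_MAX_SKETCH;
}

/* Returns true when the packet comes from a remote cluster and is addressed
 * to the inter-cluster SNAT IP, i.e. when revSNAT should be attempted. */
static inline bool needs_inter_cluster_rev_snat(uint32_t src_identity,
						uint32_t dst_ip)
{
	uint32_t cluster_id = cluster_id_from_identity(src_identity);
	bool from_remote = cluster_id != 0 &&
			   cluster_id != LOCAL_CLUSTER_ID_SKETCH;

	return from_remote && dst_ip == INTER_CLUSTER_SNAT_IP_SKETCH;
}
```

In the actual datapath the revSNAT itself runs in a separate tailcall, so a predicate like this would only decide whether to make that call.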
Transfer the ClusterID using metadata from the overlay, call into handle_policy, and look up the per-cluster conntrack map with ingress direction. It should have the entry we created on the egress path. Thus, we do revDNAT for the service, bypass ingress policy, and reach the client Pod. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Initialize per-cluster CT/SNAT map when BPF_TEST macro is defined. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Added four new test files to test TCP three-way handshake with inter-cluster SNAT communication. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Disable coverage report for some tests affected by Coverbee's bug. cilium/coverbee#7 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
@julianwiedmann I resolved your comments. Wrt the 32bit |
Now it's ready to merge @nathanjsweet. Please give me 🍏 and if I win the race, please mark this PR |
Hubble protobuf changes LGTM, will let @nathanjsweet decide on the DropReason
enum value race between this PR and #23890.
LGTM for API changes.
@@ -850,7 +861,8 @@ struct ct_entry {
dsr:1,
from_l7lb:1, /* Connection is originated from an L7 LB proxy */
auth_required:1,
reserved:6;
from_tunnel:1, /* Connection is over tunnel */
@YutaroHayakawa I noticed that we are missing some visibility information when we are dumping the CT map. Shouldn't we add a "FromTunnel" in here?
Implement very basic inter-cluster SNAT logic with ClusterIP global service. All the features are hidden behind the conditional macro, so it shouldn't affect the existing code path.
The datapath changes are split into small pieces. Here's a quick guide to how commits are organized.
Client/Egress
indicates which code path you're in.