Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Inter-cluster SNAT with ClusterIP global service #24212

Conversation

YutaroHayakawa
Copy link
Member

@YutaroHayakawa YutaroHayakawa commented Mar 7, 2023

Implement very basic inter-cluster SNAT logic with ClusterIP global service. All the features are hidden behind the conditional macro, so it shouldn't affect the existing code path.

The datapath changes are split into small pieces. Here's a quick guide to how commits are organized.

  • f1ed5fd - 3b451b4 are preparatory commits (define macros, drop reason, etc...)
  • 36f0789 - c1f2237 are commits for actual datapath logic. The commits are ordered by packet flow, so if you read the code commit by commit, you'll automatically follow the packet flow. Labels in commit titles such as Client/Egress indicates which code path you're in.
  • b5996c2 - 9a274db are tests.
bpf: Inter-cluster SNAT with ClusterIP global service

@YutaroHayakawa YutaroHayakawa added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels Mar 7, 2023
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch 2 times, most recently from 6dd5a01 to 8bcaac1 Compare March 7, 2023 13:42
@YutaroHayakawa YutaroHayakawa reopened this Mar 7, 2023
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from 8bcaac1 to f4a939c Compare March 7, 2023 13:48
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from f4a939c to b656676 Compare March 8, 2023 04:20
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from b656676 to baa2a01 Compare March 8, 2023 07:27
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

/test-1.26-net-next

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from baa2a01 to 5fa9097 Compare March 9, 2023 08:07
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

/test-1.26-net-next

@YutaroHayakawa
Copy link
Member Author

/test-1.16-4.19

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from 5d76501 to 1b30eed Compare March 10, 2023 09:08
@YutaroHayakawa
Copy link
Member Author

/test-1.16-4.19

@YutaroHayakawa
Copy link
Member Author

/ci-verifier

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from 1b30eed to 26e9ce0 Compare March 11, 2023 01:45
@YutaroHayakawa
Copy link
Member Author

/test-1.16-4.19

@YutaroHayakawa
Copy link
Member Author

/test-only --focus="K8sDatapathVerifier"

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from a53fba2 to 74d8fc4 Compare March 11, 2023 02:38
@YutaroHayakawa
Copy link
Member Author

/test-only --focus="K8sDatapathVerifier"

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from 74d8fc4 to 3277a0a Compare March 11, 2023 02:53
@YutaroHayakawa
Copy link
Member Author

/test-only --focus="K8sDatapathVerifier"

@YutaroHayakawa
Copy link
Member Author

/test-1.16-4.19

@YutaroHayakawa
Copy link
Member Author

/test

Introduce an egress side of the inter-cluster SNAT. The essential parts
are as follows.

1. We carry around the ClusterID with mark (for lxc -> overlay redirect)
   and skb->cb (for tailcalls).
2. We recognize the inter-cluster communication using the ClusterID. If
   the ClusterID is not 0 and not the same as local ClusterID, then it's
   inter-cluster communication.
3. To track SNAT mapping, we use per-cluster SNAT map.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
This commit implements the receive path of the cluster mesh with
overlapping PodCIDR on service backend side (not a revSNAT path).

When the packet comes from tunnel and go to local endpoints, propergate
information that the packet comes from tunnel to handle_policy call. It
finally reaches to the ct_create for the new connection and records
from_tunnel into conntrack entry. So that we can correctly put packets
back to tunnel on return path.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
In the reply path of the inter-cluster communication, we lookup the
conntrack entry created on the request path. If the conntrack entry
has from_tunnel set and if the destination IP address is remote-node
IP (we can recognize it by the identity from ipcache), we set
tunnel_endpoint to destination IP address (remote-node IP, which should
also be a remote tunnel endpoint) and redirect packet to the tunnel
device.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Introduce an ingress side of the inter-cluster SNAT. First, we obtain
ClusterID from security identity and check if the packet is from the
remote cluster. After that, we check the destination IP address and if
the IP address is IPV4_INTER_CLUSTER_SNAT, we'll perform revSNAT.

We separate the revSNAT logic into tailcall since we hit the complexity
limit in the kernel 4.19.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Transfer ClusterID using metadata from overlay, call into handle_policy
and lookup per-cluster conntrack map with ingress direction. It should
have the entry we created on the egress path. Thus, we do revDNAT for
service, bypass ingress policy and reach to the client Pod.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Initialize per-cluster CT/SNAT map when BPF_TEST macro is defined.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Added four new test files to test TCP three-way handshake with
inter-cluster SNAT communication.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Disable coverage report for some tests affected by Coverbee's bug.

cilium/coverbee#7

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/oss/clustermesh-overlapping-podcidr/datapath branch from c6b9537 to 9a274db Compare March 22, 2023 07:12
@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented Mar 22, 2023

ConformanceGKE: #22368
ConformanceGatewayAPI: #24217

@YutaroHayakawa
Copy link
Member Author

/ci-gke

@YutaroHayakawa
Copy link
Member Author

/gateway-api-conformance-test

2 similar comments
@YutaroHayakawa
Copy link
Member Author

/gateway-api-conformance-test

@YutaroHayakawa
Copy link
Member Author

/gateway-api-conformance-test

@YutaroHayakawa
Copy link
Member Author

@julianwiedmann I resolved your comments. Wrt the 32bit cluster_id, as you suggested, I changed my code to generally treat cluster_id as a __u32 value and cast it to __u8 when needed (e.g., When we lookup ipcache).

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented Mar 22, 2023

Now it's ready to merge @nathanjsweet. Please give me 🍏 and if I win the race, please mark this PR ready-to-merge. Note that I added one more drop reason DROP_INVALID_CLUSTER_ID since you last saw it (should be a trivial change).

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hubble protobuf changes LGTM, will let @nathanjsweet decide on the DropReason enum value race between this PR and #23890.

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for API changes.

@YutaroHayakawa YutaroHayakawa added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 22, 2023
@youngnick youngnick merged commit f7ed911 into cilium:master Mar 22, 2023
42 checks passed
@youngnick youngnick mentioned this pull request Mar 22, 2023
7 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 24, 2023
@@ -850,7 +861,8 @@ struct ct_entry {
dsr:1,
from_l7lb:1, /* Connection is originated from an L7 LB proxy */
auth_required:1,
reserved:6;
from_tunnel:1, /* Connection is over tunnel */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YutaroHayakawa I noticed that we are missing some visibility information when we are dumping the CT map. Shouldn't we add a "FromTunnel" in here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants