cilium: initial xdp-based nodeport implementation #10877

Merged
merged 4 commits into from Apr 8, 2020

Conversation

borkmann
Member

@borkmann borkmann commented Apr 7, 2020

See commits.

XDP-based NodePort LB handling for BPF-based DSR, SNAT and Hybrid mode. 

@borkmann borkmann added pending-review sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/major This PR introduces major new functionality to Cilium. labels Apr 7, 2020
@borkmann borkmann requested review from brb and a team April 7, 2020 14:33
@borkmann borkmann requested review from a team as code owners April 7, 2020 14:33
@borkmann borkmann requested a review from a team April 7, 2020 14:33
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Apr 7, 2020
@borkmann
Member Author

borkmann commented Apr 7, 2020

test-docs-please

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-me-please

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-me-please

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-docs-please

@coveralls

coveralls commented Apr 7, 2020

Coverage decreased (-0.06%) to 46.924% when pulling 65a1351 on pr/xdp-nodeport-rev into 418500b on master.

@borkmann borkmann requested a review from a team as a code owner April 7, 2020 17:12
@borkmann
Member Author

borkmann commented Apr 7, 2020

test-me-please

@borkmann
Member Author

borkmann commented Apr 7, 2020

(docs green, travis green, 4.19 green in prior run)

@borkmann borkmann force-pushed the pr/xdp-nodeport-rev branch 2 times, most recently from d0655d0 to 8966b22 Compare April 7, 2020 20:05
@borkmann
Member Author

borkmann commented Apr 7, 2020

test-me-please

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-focus K8sDatapathConfig.*

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-focus K8sDatapathConfig.*

@maintainer-s-little-helper

Commit 3d9ce09ddd31ef3ca29dccb40270310e5f9f19b2 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 7, 2020
@borkmann
Member Author

borkmann commented Apr 7, 2020

test-focus K8sDatapathConfig.*

@borkmann
Member Author

borkmann commented Apr 7, 2020

test-focus K8sDatapathConfig.*

@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*

Member

@joestringer joestringer left a comment

Just some minor nits below, this is awesome:)

Documentation/gettingstarted/kubeproxy-free.rst (outdated, resolved)
Documentation/gettingstarted/kubeproxy-free.rst (outdated, resolved)
install/kubernetes/cilium/values.yaml (resolved)
bpf/include/bpf/ctx/xdp.h (resolved)
@@ -292,7 +292,10 @@ function xdp_load()
SEC=$6
CIDR_MAP=$7

bpf_compile $IN $OUT obj "$OPTS"
NODE_MAC=$(ip link show $DEV | grep ether | awk '{print $2}')
NODE_MAC="{.addr=$(mac2array $NODE_MAC)}"
Member

NODE_MAC is already defined in the endpoint's headers, can we reuse that one instead of extending the init.sh script? It's easier to debug & introspect that way (not to mention reduces the number of things we'll need to change to eventually get rid of init.sh)

Member Author

We already have it for bpf_load, but I agree that generally, it might be good. Perhaps that would also make the code generation easier in Martynas' PR. Noting for follow-up.

xdp_load $XDP_DEV $XDP_MODE "$COPTS" bpf_prefilter.c bpf_prefilter.o from-netdev $CIDR_MAP
COPTS="-DSECLABEL=${ID_WORLD} -DCALLS_MAP=cilium_calls_xdp"
if [ "$NODE_PORT" = "true" ]; then
COPTS="${COPTS} -DLB_L3 -DLB_L4 -DDISABLE_LOOPBACK_LB"
Member

Maybe a follow-up cleanup but I wonder if we should move towards having a dedicated xdp_config.h header, ideally with as much code sharing with netdev_config.h as possible.

Member Author

Yeah, agree, any special casing for XDP should be avoided so we can have better integration with the rest of the code.

bpf/lib/nodeport.h (resolved)

static __always_inline int check_v4_lb(struct __ctx_buff *ctx)
{
ep_tail_call(ctx, CILIUM_CALL_IPV4_FROM_LXC);
Member

Do we need the tail calls here? (even if we don't, we can always follow up after this PR..)

Member Author

From inside the nodeport handling we basically recirculate back to here (CILIUM_CALL_IPV4_FROM_LXC). Avoiding the tail call would be nice, but means we somehow need to get rid of the tail call chain inside our nodeport code. I can check if it's feasible in a follow-up.

Implement a minimal ctx_adjust_room() handler so that we can start
to use NodePort DSR in XDP. The implementation is currently kept to
a bare minimum, specifically to handle the DSR case (which is the only
one in our code base right now). If we need other support in the future,
we can always extend it at the cost of verifier complexity.

Note, the skb-based implementation uses skb->protocol to figure out
where to insert room. In XDP, we don't have such a facility, but given
we do this in the code base in only two locations, i) from v4 DSR and
ii) from v6 DSR, I've used the len_diff as an indicator for the proto
here. A bit of a hack, but given it's a constant, it allows for code
optimization: the compiler can compile out either case and therefore
reduce complexity. If we start using this in more locations, we need to
parse the protocol from the eth header via xdp->data manually.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
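
A minimal user-space sketch of the len_diff trick described in the commit message above, not the actual Cilium code. It picks the insertion point purely from a constant len_diff and then opens a gap behind the L2/L3 headers by moving them into spare headroom. The header sizes, the DSR room constants and the names room_offset() and adjust_room() are assumptions for illustration, and IPv4 options are assumed absent. In a real XDP program the headroom would presumably be claimed through the bpf_xdp_adjust_head() helper rather than a plain buffer offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define ETH_HLEN     14
#define IPV4_HLEN    20  /* IPv4 header, assuming no options */
#define IPV6_HLEN    40
#define DSR_V4_ROOM  8   /* assumed size of the IPv4 DSR option */
#define DSR_V6_ROOM  24  /* assumed size of the IPv6 DSR extension header */

/* Map the (compile-time constant) len_diff to the offset at which room is
 * inserted. With a constant argument the compiler can drop the unused case.
 */
static inline int room_offset(int len_diff)
{
	switch (len_diff) {
	case DSR_V4_ROOM:
		return ETH_HLEN + IPV4_HLEN;
	case DSR_V6_ROOM:
		return ETH_HLEN + IPV6_HLEN;
	default:
		return -1; /* unsupported: caller must parse the protocol itself */
	}
}

/* Grow the packet by len_diff bytes right after the L3 header. The packet
 * occupies buf[*data_off .. data_end); the bytes before *data_off are spare
 * headroom. Returns 0 on success, -1 on error.
 */
static int adjust_room(unsigned char *buf, size_t *data_off, size_t data_end,
		       int len_diff)
{
	int off = room_offset(len_diff);
	size_t new_off;

	assert(buf != NULL && data_off != NULL);
	if (off < 0)
		return -1;
	if ((size_t)len_diff > *data_off || *data_off + (size_t)off > data_end)
		return -1; /* not enough headroom, or packet too short */

	new_off = *data_off - (size_t)len_diff;
	/* Move eth + IP headers forward into the headroom ... */
	memmove(buf + new_off, buf + *data_off, (size_t)off);
	/* ... and zero the gap that now sits between L3 header and payload. */
	memset(buf + new_off + off, 0, (size_t)len_diff);
	*data_off = new_off;
	return 0;
}
```

Any len_diff other than the two known constants is rejected here, which mirrors the caveat in the commit message that further call sites would need real protocol parsing.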
@maintainer-s-little-helper

Commit b518ebe66956a2af25e8e4df6c6dc85d5358c584 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*
(test-focus green when taking out daemon related code)

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 8, 2020
@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*

@borkmann borkmann force-pushed the pr/xdp-nodeport-rev branch 2 times, most recently from dc41469 to af16d23 Compare April 8, 2020 10:31
@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*

@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*
(green up to cilium: add daemon node-port-acceleration option)

@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*

@borkmann
Member Author

borkmann commented Apr 8, 2020

test-focus K8sDatapathConfig.*
(green now)

Rework the XDP prefilter early drop to now also include NodePort service
handling. The idea is that after the prefilter phase, we can accelerate
the case where the NodePort(/ExternalIP) backend is remote and we are
forced to push the packet back out again via SNAT or DSR.

Instead of first crafting an skb and pushing it through the GRO engine
and packet taps to eventually end up at tc ingress of the phy device, we
can do all this right in the driver's XDP layer at a much higher
per-packet rate. This is the case where XDP can really help to scale up
the backends. XDP won't be able to help much for the case where the
backend is local to the node. When we're hitting cases where we need to
punt to remote backends, we can currently do so at most at the rate that
tc ingress can handle, so moving this down into XDP will raise that
limit. At the same time, it also creates less burden given that unneeded
layers are bypassed.

All major 10/40G NICs these days support XDP in the upstream kernel. This
is particularly useful, e.g., in cloud-based environments with SR-IOV
networking or on bare metal machines in general.

This work piggy-backs on the entire datapath conversion that has been done
for Cilium's BPF code in order to be generic for skb and xdp.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
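
A rough sketch of the decision flow the commit message above describes, under stated assumptions: the prefilter drop comes first, NodePort traffic with a remote backend is handled and sent back out at the XDP layer, and everything else is passed up the stack to the regular tc ingress path. The struct pkt_info fields, the verdict names and xdp_nodeport_verdict() are hypothetical and only model the control flow; they are not taken from the Cilium sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts, loosely modelled on drop / pass / transmit. */
enum verdict {
	VERDICT_DROP, /* denied by the prefilter */
	VERDICT_PASS, /* hand over to the stack (tc ingress path) */
	VERDICT_TX,   /* rewritten (SNAT or DSR) and pushed back out */
};

/* Hypothetical per-packet summary of the lookups done so far. */
struct pkt_info {
	bool denied_by_prefilter;
	bool is_nodeport_service;
	bool backend_is_remote;
};

static enum verdict xdp_nodeport_verdict(const struct pkt_info *p)
{
	assert(p != NULL);

	/* Early drop stays the first step, as in the existing prefilter. */
	if (p->denied_by_prefilter)
		return VERDICT_DROP;

	/* Only the remote-backend case benefits from staying in XDP. */
	if (p->is_nodeport_service && p->backend_is_remote)
		return VERDICT_TX;

	/* Local backends and non-service traffic take the normal path. */
	return VERDICT_PASS;
}
```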
This adds the necessary daemon-only parts to enable xdp-based nodeport.
We add a daemon config flag, so we can opt into the
--node-port-acceleration={none,generic,driver} setting in order to enable
NodePort via XDP. The default is 'none', which will then use the regular
tc ingress based mechanism. So far it's opt-in, and at some point we can
think about integrating it into the kube-proxy-free probe mode.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
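
To illustrate the three flag values named in the commit message above, here is a small sketch of how such a setting could be validated. The daemon itself is written in Go; this is only a C model of the semantics, and enum accel_mode, parse_accel_mode() and use_xdp_nodeport() are hypothetical names. Treating an unset value as 'none' is an assumption based on the stated default.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum accel_mode {
	ACCEL_NONE,    /* default: regular tc ingress based NodePort */
	ACCEL_GENERIC, /* XDP in generic mode */
	ACCEL_DRIVER,  /* XDP in driver mode */
	ACCEL_INVALID, /* anything else is rejected */
};

/* Parse the flag value; an unset value is assumed to mean the default. */
static enum accel_mode parse_accel_mode(const char *s)
{
	if (s == NULL || strcmp(s, "none") == 0)
		return ACCEL_NONE;
	if (strcmp(s, "generic") == 0)
		return ACCEL_GENERIC;
	if (strcmp(s, "driver") == 0)
		return ACCEL_DRIVER;
	return ACCEL_INVALID;
}

/* Callers are expected to reject ACCEL_INVALID before asking this. */
static bool use_xdp_nodeport(enum accel_mode m)
{
	assert(m != ACCEL_INVALID);
	return m == ACCEL_GENERIC || m == ACCEL_DRIVER;
}
```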
Add an initial paragraph on configuring nodeport XDP acceleration to
the kube-proxy-free guide as well as Helm support.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Member Author

borkmann commented Apr 8, 2020

test-me-please

Member

@joestringer joestringer left a comment

Looks good overall, discussed that we can follow up on the remaining items.

@borkmann borkmann merged commit 4b3f591 into master Apr 8, 2020
1.8.0 automation moved this from In progress to Merged Apr 8, 2020
@borkmann borkmann deleted the pr/xdp-nodeport-rev branch April 8, 2020 17:05
option.Config.NodePortAcceleration != option.NodePortAccelerationNone {
if option.Config.XDPDevice != "undefined" &&
option.Config.XDPDevice != option.Config.Device {
log.Fatalf("Cannot set NodePort acceleration device: mismatch between Prefilter device %s and NodePort device %s",
Member

One extra fixup later, the check here is option.Config.Device but the message says "Prefilter" (which I would expect to be option.Config.DevicePreFilter).

Labels
release-note/major This PR introduces major new functionality to Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.8.0
  
Merged

3 participants