XDP not work well with Docker veth #3077

Open
hittingnote opened this Issue Mar 9, 2018 · 4 comments

hittingnote commented Mar 9, 2018

I want to load an XDP program onto a container's veth.
First, I created two containers:
docker run -d --name host1 -h host1 exp:v1.0, and
docker run -d --name host2 -h host2 exp:v1.0,
where exp:v1.0 is an image I built from training/webapp. host1 was then bound to veth02b9ec2 with IP address 172.17.0.2, and host2 to vethf2e33b9 with 172.17.0.3.

My XDP program is as follows:
#include <linux/bpf.h>
#include "bpf_helpers.h"

#define u32 unsigned int
#define u16 unsigned short
#define u64 unsigned long long

SEC("test_xdp") int test_xdp_main(struct xdp_md *ctx)
{
	return XDP_DROP;
}

char license[] SEC("license") = "GPL";

which means all the arriving packets will be dropped.

Then, I compiled it with clang and loaded it to veth02b9ec2:
clang -O2 -Wall -target bpf -c test_xdp.c -o test_xdp.o, and
ip link set dev veth02b9ec2 xdp obj test_xdp.o section test_xdp verbose.
The XDP program loaded successfully on veth02b9ec2.

But when I pinged 172.17.0.3 from host1 (whose IP is 172.17.0.2), XDP didn't seem to work at all. However, when I loaded the same XDP program on docker0 and pinged 172.17.0.1, or an address outside the container network, from host1, the XDP program worked as expected and ping failed.

My iproute2 version is ss180129, clang version is 4.0.0 and docker version is 1.12.6. I hope someone can help me out.

borkmann commented Mar 9, 2018

If I understand you correctly, you have two Docker containers, host1 and host2. Inside host1 sits veth02b9ec2 with 172.17.0.2; inside host2, vethf2e33b9 with 172.17.0.3. You attach the drop-all XDP program to veth02b9ec2 in host1 and ping 172.17.0.2 -> 172.17.0.3. All correct? Now you are wondering why, even though you've installed the XDP prog on veth02b9ec2, nothing gets dropped.

Two reasons: i) XDP only enforces on ingress, so you're much better off attaching the XDP program to the host-facing veth device of host1 if you want to enforce policy on host1's egress. ii) While this might work for ping, there are cases where generic XDP fails today, in particular when the skb is cloned, since the original use case for generic XDP was to mimic native XDP as run by drivers for physical NICs. That means the latter will fail on veth devices for TCP, since TCP clones its packets. This can be fixed, although at a bigger performance penalty.

The other option you of course have is to use tc/BPF on the veth devices instead (as Cilium does): you can attach on ingress or egress via sch_clsact and cls_bpf with a very similar dropper program. See also the BPF/XDP reference guide in Cilium's docs.
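For completeness, a minimal cls_bpf dropper of the kind described above might look as follows (a sketch only; the section name cls_drop and the file name tc_drop.o are my own choices, not from this thread):

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>

#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

/* Drop every packet this cls_bpf classifier sees; TC_ACT_SHOT is
 * tc's counterpart to XDP_DROP. */
SEC("cls_drop")
int cls_drop_main(struct __sk_buff *skb)
{
	return TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";
```

Compiled the same way as the XDP program (clang -O2 -Wall -target bpf -c tc_drop.c -o tc_drop.o), it can then be attached via sch_clsact, e.g.:
tc qdisc add dev veth02b9ec2 clsact, and
tc filter add dev veth02b9ec2 ingress bpf da obj tc_drop.o sec cls_drop,
and likewise on egress by replacing ingress with egress.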

fruffy commented Oct 17, 2018

I am hijacking this issue a bit since it still seems relevant.
@borkmann
I am facing a similar problem when deploying TCP programs in my virtual prototyping environment. How can it be fixed and how high is the potential performance penalty? Are there any dirty workarounds or does it require a patch?
Unfortunately, tc/BPF is less of an option because I am primarily interested in testing XDP's packet modification/AF_XDP features.

Also are there any other kernel protocols where the skb is cloned beforehand? Interestingly, SCTP seems to work fine with XDP and virtual interfaces.

Thanks!

borkmann commented Oct 17, 2018

@fruffy here's something hacky that should unblock it, although it's very inefficient, given it would call into pskb_expand_head() twice when the skb is cloned and non-linear. AFAIK UDP generally doesn't clone, but TCP is the main one that does.

diff --git a/net/core/dev.c b/net/core/dev.c
index 93243479..28dd13b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4280,11 +4280,8 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
       int hlen, off;
       u32 mac_len;

-       /* Reinjected packets coming from act_mirred or similar should
-        * not get XDP generic processing.
-        */
-       if (skb_cloned(skb) || skb_is_tc_redirected(skb))
-               return XDP_PASS;
+       if (skb_unclone(skb, GFP_ATOMIC))
+               return XDP_DROP;

       /* XDP packets must be linear and must have sufficient headroom
        * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also

fruffy commented Oct 18, 2018

That's great, thanks a lot for the patch! Since this is just for verification, I'm not too worried about efficiency.
