Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: open code mark_is_from_proxy to fix proxy error #2162

Merged
merged 2 commits into from
Nov 29, 2017
Merged

Conversation

jrfastab
Copy link
Contributor

On 4.14 kernel with llvm 4.0 or greater the current implementation of
mark_is_from_proxy() is returning zero always regardless of actual
mark field.

The result is "ingress" policy tests are failing. This can be seen
with the 10-proxy.sh test, giving an "Internal Server Error" (error
code 500). This is because if we do not mark the skb correctly it
will be sent to the proxy repeatedly if the configuration/packet
would have checked the skip bit and avoided forwarding to the proxy.

At the moment its unclear to me why the original version is tripping
up, but kicking through the testing cycle now. This should not change
the logic any AFAICT but still it fixes the issue.

Some ideas, arithmatic overflow (not seeing it now though),
compiler bug, artifact of my bare metal ubuntu setup, ...

Also for what its worth I like the open-coded version better anyways.
We probably don't need an inline function for a single caller.

Fixes: f89a22c ("proxy: Distinguish between proxy and other local processes")
Signed-off-by: John Fastabend john.fastabend@gmail.com

Proxy ingress policy fix for LLVM 4.0 and greater. Resolves return code 500 'Internal Error' seen with some policies and traffic patterns.

On 4.14 kernel with llvm 4.0 or greater the current implementation of
mark_is_from_proxy() is returning zero _always_ regardless of actual
mark field.

The result is "ingress" policy tests are failing. This can be seen
with the 10-proxy.sh test, giving an "Internal Server Error" (error
code 500). This is because if we do not mark the skb correctly it
will be sent to the proxy repeatedly if the configuration/packet
would have checked the skip bit and avoided forwarding to the proxy.

At the moment its unclear to me why the original version is tripping
up, but kicking through the testing cycle now. This should not change
the logic any AFAICT but still it fixes the issue.

Some ideas, arithmatic overflow (not seeing it now though),
compiler bug, artifact of my bare metal ubuntu setup, ...

Also for what its worth I like the open-coded version better anyways.
We probably don't need an inline function for a single caller.

Fixes: f89a22c ("proxy: Distinguish between proxy and other local processes")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@jrfastab jrfastab requested review from tgraf, a team and borkmann November 29, 2017 06:47
@jrfastab
Copy link
Contributor Author

some relevant snippets from new objdump:

LBB5_89:
     456:       r1 = *(u32 *)(r6 + 8)
     457:       r3 = 4294901760 ll
     459:       r2 = r1
     460:       r2 &= r3
     461:       r3 = 4277862400 ll
     463:       *(u64 *)(r10 - 88) = r5
     464:       if r2 == r3 goto 6
     465:       r3 = 4277796864 ll
     467:       if r2 != r3 goto 5
     468:       r2 = *(u32 *)(r6 + 44)
     469:       r2 |= 1
     470:       *(u32 *)(r6 + 44) = r2

versus old

LBB5_89:
     461:       r1 = *(u32 *)(r6 + 8)
     462:       r3 = 4294901760 ll
     464:       r2 = r1
     465:       r2 &= r3
     466:       r3 = 1
     467:       r4 = 1
     468:       if r2 != -17104896 goto 1
     469:       r4 = 0

LBB5_91:
     470:       if r2 == -17170432 goto 1
     471:       r3 = 0

LBB5_93:
     472:       r3 ^= r4
     473:       r3 &= 1
     474:       if r3 != 0 goto 9
     475:       r1 &= 65535
     476:       r3 = 4277796864 ll
     478:       r5 = r1
     479:       if r2 != r3 goto 4
     480:       r2 = *(u32 *)(r6 + 44)
     481:       r2 |= 1
     482:       *(u32 *)(r6 + 44) = r2
     483:       r5 = r1

Copy link
Contributor

@raybejjani raybejjani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the agent (daemon/bpf.sha) change only, since I don't know enough about the rest.

Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good. Let me investigate the issue a bit, it feels there may be a disconnect between LLVM and kernel runtime on this wrt signed extension.

@borkmann
Copy link
Member

borkmann commented Nov 29, 2017

Below are the semantics of the kernel with regards to the issue. I think they are fine, imho. I would see the bug on LLVM side. I'm wondering for the patch, if you move the test to __u64 and the constants to ULL, would LLVM still tend to generate such code?

From 14e649e9b979c63779f4394e6ade1412a146328a Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 29 Nov 2017 15:07:36 +0100
Subject: [PATCH] bpf: test

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 lib/test_bpf.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index aa8812a..fc464cc 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6066,6 +6066,110 @@ static struct bpf_test tests[] = {
 		{},
 		{ {0x1, 0x42 } },
 	},
+	/* Checking interpreter vs JIT wrt signed extended imms. */
+	{
+		"JNE signed compare, test 1",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R1, 0xfefbbc12),
+			BPF_ALU32_IMM(BPF_MOV, R3, 0xffff0000),
+			BPF_MOV64_REG(R2, R1),
+			BPF_ALU64_REG(BPF_AND, R2, R3),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_IMM(BPF_JNE, R2, -17104896, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 1 } },
+	},
+	{
+		"JNE signed compare, test 2",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R1, 0xfefbbc12),
+			BPF_ALU32_IMM(BPF_MOV, R3, 0xffff0000),
+			BPF_MOV64_REG(R2, R1),
+			BPF_ALU64_REG(BPF_AND, R2, R3),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_IMM(BPF_JNE, R2, 0xfefb0000, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 1 } },
+	},
+	{
+		"JNE signed compare, test 3",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R1, 0xfefbbc12),
+			BPF_ALU32_IMM(BPF_MOV, R3, 0xffff0000),
+			BPF_ALU32_IMM(BPF_MOV, R4, 0xfefb0000),
+			BPF_MOV64_REG(R2, R1),
+			BPF_ALU64_REG(BPF_AND, R2, R3),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_REG(BPF_JNE, R2, R4, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 2 } },
+	},
+	{
+		"JNE signed compare, test 4",
+		.u.insns_int = {
+			BPF_LD_IMM64(R1, -17104896),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_IMM(BPF_JNE, R1, -17104896, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 2 } },
+	},
+	{
+		"JNE signed compare, test 5",
+		.u.insns_int = {
+			BPF_LD_IMM64(R1, 0xfefb0000),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_IMM(BPF_JNE, R1, 0xfefb0000, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 1 } },
+	},
+	{
+		"JNE signed compare, test 6",
+		.u.insns_int = {
+			BPF_LD_IMM64(R1, 0x7efb0000),
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_JMP_IMM(BPF_JNE, R1, 0x7efb0000, 1),
+			BPF_ALU32_IMM(BPF_MOV, R0, 2),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 2 } },
+	},
+	{
+		"JNE signed compare, test 7",
+		.u.insns = {
+			BPF_STMT(BPF_LD | BPF_IMM, 0xffff0000),
+			BPF_STMT(BPF_MISC | BPF_TAX, 0),
+			BPF_STMT(BPF_LD | BPF_IMM, 0xfefbbc12),
+			BPF_STMT(BPF_ALU | BPF_AND | BPF_X, 0),
+			BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0xfefb0000, 1, 0),
+			BPF_STMT(BPF_RET | BPF_K, 1),
+			BPF_STMT(BPF_RET | BPF_K, 2),
+		},
+		CLASSIC | FLAG_NO_DATA,
+		{},
+		{ { 0, 2 } },
+	},
 };
 
 static struct net_device dev;
-- 
2.9.5

@borkmann
Copy link
Member

Would it help anything in LLVM behaviour if the constants are defined as unsigned, meaning, would LLVM detect that they would have to be loaded into a register instead since signed extension cannot be applied here?

#define MARK_MAGIC_PROXY_INGRESS   (0xFEFAU << 16)
#define MARK_MAGIC_PROXY_EGRESS    (0xFEFBU << 16)

@tgraf tgraf changed the title RFC: cilium: bpf: open code mark_is_from_proxy to fix proxy error bpf: open code mark_is_from_proxy to fix proxy error Nov 29, 2017
@tgraf tgraf merged commit a37d511 into master Nov 29, 2017
@tgraf tgraf deleted the fix-proxy-500-error branch November 29, 2017 16:00
@yonghong-song
Copy link

The problem is caused by several reg/imm comparison
insns like r2 != -17104896.

through the first few insns, r2 will hold a value like
0xfefb0000 or 0xfefa0000, and two negative immeidates
are actually also 0xfefb0000 and 0xfefa0000 in 2-complement
form.

However, since imm is 32-bit signed integer, when comparison
happens, it is promoted to 64-bit as 0xfffffffffefb0000
and 0xfffffffffefa0000, and then compared against the
register r2. This will always result in not equal, although
users expect it matched in cases where top-32 bits are ignored.

This looks like a compiler issue as we should not use EQ/NE_ri
as the immediate sign bit is 1.

In LLVM, we already only put i64immSExt32 into insn imm field.
def i64immSExt32 : PatLeaf<(i64 imm),
[{return isInt<32>(N->getSExtValue()); }]>;

isInt<32> will check:
template <> constexpr inline bool isInt<32>(int64_t x) {
return static_cast<int32_t>(x) == x;
}

So it does matter for how you define your constants.

I tried the mimic the exact original program, but the compiler
refuses to use EQ_ri for immediate 0xfefa0000. If I choose
the immediate 0xefa0000, the compiler is able to use EQ_ri.
It seems more complex code is required to reproduce the issue.

$ cat test.c
#define MARK_MAGIC_PROXY_INGRESS        (0xFEFA << 16)
#define MARK_MAGIC_PROXY_EGRESS         (0xFEFB << 16)
#define MARK_MAGIC_PROXY_MASK           (0xFFFF << 16)
#define SOURCE_INGRESS_PROXY            1
#define SOURCE_EGRESS_PROXY             2

struct sk_buff {
        unsigned int mark;
        unsigned int tc_index;
};

static int mark_is_from_proxy(struct sk_buff *skb)
{
        int magic = skb->mark & MARK_MAGIC_PROXY_MASK;

        if (magic == MARK_MAGIC_PROXY_INGRESS)
                return SOURCE_INGRESS_PROXY;
        else if (magic == MARK_MAGIC_PROXY_EGRESS)
                return SOURCE_EGRESS_PROXY;
        return 0;
}

void proxy(struct sk_buff *skb, unsigned int *addr)
{
        int ret = mark_is_from_proxy(skb);
        if (ret) {
                *addr = 1;
                if (ret == SOURCE_INGRESS_PROXY)
                        skb->tc_index = 1;
        }
}

Could you tell me how to reproduce the issue? Following insns in https://github.com/cilium/cilium/blob/master/CONTRIBUTING.md to build?
Thanks!

@jrfastab
Copy link
Contributor Author

jrfastab commented Nov 30, 2017

Thanks for the reply. Following the guide should get you the source code via,

go get -d github.com/cilium/cilium
cd $GOPATH/src/github.com/cilium/cilium

If you don't have a go build environment you will need that. Then once you have the source doing a standard 'make' in the ciliium directory should work. In './bpf/' directory there will be './bpf/bpf_netdev..o' (after the make) which is the object file in question. Although in order to see the issue you will need to add a FROM_HOST define. I usually just modify ./bpf/Makefile for this. After doing this there will be a compile error. The following is the diff I used to resolve it,

diff --git a/bpf/Makefile b/bpf/Makefile
index 1bb5b30..db42641 100644
--- a/bpf/Makefile
+++ b/bpf/Makefile
@@ -5,7 +5,7 @@ FLAGS := -Iinclude -I. -D__NR_CPUS__=$(shell nproc) -O2
 CLANG_FLAGS :=  ${FLAGS} -target bpf -emit-llvm
 # eBPF verifier enforces unaligned access checks where necessary, so don't
 # let clang complain too early.
-CLANG_FLAGS += -Wall -Werror -Wno-address-of-packed-member -Wno-unknown-warning-option
+CLANG_FLAGS += -Wall -Werror -Wno-address-of-packed-member -Wno-unknown-warning-option -DFROM_HOST
 LLC_FLAGS   := -march=bpf -mcpu=probe -filetype=obj
 
 BPF = bpf_lxc.o bpf_netdev.o bpf_overlay.o bpf_lb.o bpf_xdp.o
diff --git a/bpf/bpf_netdev.c b/bpf/bpf_netdev.c
index eabec67..87c5e97 100644
--- a/bpf/bpf_netdev.c
+++ b/bpf/bpf_netdev.c
@@ -148,7 +148,7 @@ static inline int rewrite_dmac_to_host(struct __sk_buff *skb)
        /* When attached to cilium_host, we rewrite the DMAC to the mac of
         * cilium_host (peer) to ensure the packet is being considered to be
         * addressed to the host (PACKET_HOST) */diff --git a/bpf/Makefile b/bpf/Makefile
index 1bb5b30..db42641 100644
--- a/bpf/Makefile
+++ b/bpf/Makefile
@@ -5,7 +5,7 @@ FLAGS := -Iinclude -I. -D__NR_CPUS__=$(shell nproc) -O2
 CLANG_FLAGS :=  ${FLAGS} -target bpf -emit-llvm
 # eBPF verifier enforces unaligned access checks where necessary, so don't
 # let clang complain too early.
-CLANG_FLAGS += -Wall -Werror -Wno-address-of-packed-member -Wno-unknown-warning-option
+CLANG_FLAGS += -Wall -Werror -Wno-address-of-packed-member -Wno-unknown-warning-option -DFROM_HOST
 LLC_FLAGS   := -march=bpf -mcpu=probe -filetype=obj
 
 BPF = bpf_lxc.o bpf_netdev.o bpf_overlay.o bpf_lb.o bpf_xdp.o
diff --git a/bpf/bpf_netdev.c b/bpf/bpf_netdev.c
index eabec67..87c5e97 100644
--- a/bpf/bpf_netdev.c
+++ b/bpf/bpf_netdev.c
@@ -148,7 +148,7 @@ static inline int rewrite_dmac_to_host(struct __sk_buff *skb)
        /* When attached to cilium_host, we rewrite the DMAC to the mac of
         * cilium_host (peer) to ensure the packet is being considered to be
         * addressed to the host (PACKET_HOST) */
-       union macaddr cilium_net_mac = CILIUM_NET_MAC;
+       union macaddr cilium_net_mac = {.addr = {0}}; //CILIUM_NET_MAC;
 
        /* Rewrite to destination MAC of cilium_net (remote peer) */
        if (eth_store_daddr(skb, (__u8 *) &cilium_net_mac.addr, 0) < 0)

-       union macaddr cilium_net_mac = CILIUM_NET_MAC;
+       union macaddr cilium_net_mac = {.addr = {0}}; //CILIUM_NET_MAC;
 
        /* Rewrite to destination MAC of cilium_net (remote peer) */
        if (eth_store_daddr(skb, (__u8 *) &cilium_net_mac.addr, 0) < 0)

:
Building with the above produces the error in bpf_netdev.o for me.

@jrfastab
Copy link
Contributor Author

Also let me know I can try patches and/or provide debug info if needed.

@yonghong-song
Copy link

Thanks, @jrfastab, I am able to reproduce the issue with 5.0, but not latest 6.0. After debugging 5.0 and then cross-check 6.0, I found the bug has been fixed. The fix commit is:

commit e53750e1e086f4e234f52041072e688abd79e22b
Author: Yonghong Song <yhs@fb.com>
Date:   Mon Oct 16 04:14:53 2017 +0000

    bpf: fix bug on silently truncating 64-bit immediate
    
    We came across an llvm bug when compiling some testcases that 64-bit
    immediates are silently truncated into 32-bit and then packed into
    BPF_JMP | BPF_K encoding.  This caused comparison with wrong value.
    
    This bug looks to be introduced by r308080.  The Select_Ri pattern is
    supposed to be lowered into J*_Ri while the latter only support 32-bit
    immediate encoding, therefore Select_Ri should have similar immediate
    predicate check as what J*_Ri are doing.
    
    Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Reviewed-by: Yonghong Song <yhs@fb.com>

This patch also prevents negative integer from using Select_Ri format since the
negative value does not satisfy:

template <> constexpr inline bool isInt<32>(int64_t x) {
return static_cast<int32_t>(x) == x;
}

where the compiler considers (0xFEFA << 16) as a value with 32-bit width and when it converts to int64, its value is 0xFEFA0000.

@borkmann
Copy link
Member

borkmann commented Dec 1, 2017

Thanks @yonghong-song for looking into this!

Do you see a reliable workaround to avoid silently running into this bug for clang versions < 6.0? Does this exist from the beginning of BPF backend or only clang 5.0 affected?

Do you see a chance to get this fix into 5.0.1, since so far only clang 5.0.0 has been released?

@jrfastab
Copy link
Contributor Author

jrfastab commented Dec 1, 2017 via email

@yonghong-song
Copy link

The reliable way to fix the issue is to force the constant in the register through volatile keyword like below:

static inline int __inline__ mark_is_from_proxy(struct __sk_buff *skb)
{
        int magic = skb->mark & MARK_MAGIC_PROXY_MASK;
        volatile int ingress_magic = MARK_MAGIC_PROXY_INGRESS;
        volatile int egress_magic = MARK_MAGIC_PROXY_EGRESS;

        if (magic == ingress_magic)
                return SOURCE_INGRESS_PROXY;
        else if (magic == egress_magic)
                return SOURCE_EGRESS_PROXY;
        return 0;
}

You only need to do this when constant is greater than 0x7FFFFFFF.

This bug only affects 5.0. It does seem 5.0.1 is still open for business. This bug does look really bad and I will try to push into 5.0.1.

@yonghong-song
Copy link

pushed into llvm stable_50 release. It should appear in 5.0.1 llvm release.

@jrfastab
Copy link
Contributor Author

jrfastab commented Dec 4, 2017 via email

@tgraf tgraf added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Dec 4, 2017
@borkmann
Copy link
Member

borkmann commented Dec 4, 2017

Thanks a lot @yonghong-song!

@borkmann
Copy link
Member

borkmann commented Dec 4, 2017

@jrfastab I guess it would be better to switch to using volatile then for the constants, maybe with a comment pointing to here.

fengguang pushed a commit to 0day-ci/linux that referenced this pull request Jan 21, 2018
Add a couple of test cases for interpreter and JIT that are
related to an issue we faced some time ago in Cilium [1],
which is fixed in LLVM with commit e53750e1e086 ("bpf: fix
bug on silently truncating 64-bit immediate").

Test cases were run-time checking kernel to behave as intended
which should also provide some guidance for current or new
JITs in case they should trip over this. Added for cBPF and
eBPF.

  [1] cilium/cilium#2162

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants