datapath: Improved BPF testing framework #20017

dylandreimerink · 2022-05-31T08:53:09Z

This PR adds a new BPF testing framework for datapath code. This
framework will allow us to write tests in C which will be executed,
as eBPF programs in the kernel. This allows us to test our code against
the actual kernel facilities and helper calls as well as confirming that
code will pass the verifier.

This framework differs from other related works in the sense that we do
the test setup, execution and verification all in C/eBPF code instead
of having to do half the work in userspace and the other half in eBPF.
A lightweight loader program is used to automatically detect test
programs in ELF files in a given directory, load, and execute them using
the BPF_PROG_TEST_RUN feature. The test results are passed back to the
loader which will convert them into sub-tests within the golang testing
framework to allow for easy integration into existing tooling.

Doing setup in C allows us to easily mock out code with #define
preprocessor tricks, replace tailcalls with stubs/code, and leverage
conditional testing depending on which flags are set / features enabled.
It also decouples datapath testing from the agent allowing us to verify
that the eBPF programs work as expected separately from the agent /
userspace.

Each ELF file can contain multiple CHECK's each of which will become
a separate program which can fail(test_fail/test_fail_now()) or pass
(default if not failed). Tests can log messages with test_log("msg")
for debugging purposes or as fail message(test_fatal("msg") logs and
test_fail_now()). These messages can include parameters analogous to
bpf_trace_printk, for example:
test_log("Expected 123, found: %llu", some_var). Or for compact
checks the assert(some_var == 123) style can be used which will
report the file and line of failed asserts and stop the test.

These check programs can also define sub-tests with a TEST("name", {
block which can pass/fail independently from the parent test. The appeal
is that such sub-tests run after each other in the exact same context
and can for example test different aspects of the same test setup(ctx,
map state).

Both CHECK and TEST can be skipped by calling test_skip()/
test_skip_now(), to mark a test as skipped, for example if applies to
a feature which is disabled or depends on features which are not
available in the current kernel version. Skipped tests are explicitly
marked as such instead of silently ignored. Tests can use #ifdef
preprocessor statements to determine to skip or not.

Tests that do not require tailcalls can do setup, execution and
pass/fail checking all in the CHECK program. Tests that do need to
tests across multiple tailcalls need to use a SETUP program in
addition, in which case the setup and execution will happen in the
SETUP program which will exit after the last tailcall is done. The
loader will run any SETUP program before the CHECK program of the
same name, the CHECK program will be invoked with the result context
of the SETUP program, the result number of the SETUP program will
be pretended to the context(4 bytes), this allows the CHECK program
to verify the return value, context and map state and determine a pass/
fail result.

Fixes: #18480

.github/workflows/lint-bpf-checks.yaml

bpf/tests/bpf_tests/common.h

test/bpf_tests/bpf_test.go

joestringer

It looked like others focused on the details of the test progs themselves, so I focused more on the build systems and general bits that were different from the way that other go tests work in the tree.

I explicitly didn't review the actual test content, because if you delete the full files and then add full files, it's much more difficult to review the diffs. If you're able to combine these in the same commit with minimal diff, that would make review easier for those pieces.

bpf/tests/bpf_tests/Makefile

joestringer · 2022-06-03T18:39:04Z

test/bpf_tests/bpf_test.go

@@ -0,0 +1,443 @@
+// SPDX-License-Identifier: Apache-2.0


Similar feedback here, we don't need yet another directory (test/bpf, bpf/tests, test/bpf_tests, bpf/tests/bpf_tests) to confuse people even more about where the code should live :-)

IIRC the rationale behind test/bpf/ vs. bpf/tests is the language distinction of Go vs. C, but if we have a reasonable way to satisfy whatever tooling is analyzing these directories then my preference would actually be to just put them all in the same directory. If the tooling doesn't like them being in the same directory (eg go tooling complaining about the C files) then we can keep two separate dirs.

Currently the distinction seems to be that /bpf is for C code and the rest for Go. You can't put Go and C files in the same directory since Go will automatically attempt to compile .c files as CGO, so any Go code would have to live in a separate directory anyway. Having 1 Go file in among all the C code felt strange, but I agree that it would be better to co-locate code that works together, perhaps not just for the tests but other parts of cilium as well

Ah I see. Maybe we should move the unit-test.c (and elf-demo.c?) into bpf/ ..?

elf-demo.c I'm less sure about, IIRC that's actually used for the pkg/elf Go testing. Could potentially even move that one under something like test/elf.

Well anyways this is not such a big deal, we can always move the files later.

test/bpf_tests/bpf_test.go

test/bpf/Makefile

test/Makefile

joestringer · 2022-06-03T19:38:54Z

test/bpf_tests/bpf_test.go

+				// Give the global log buf some time to empty
+				time.Sleep(50 * time.Millisecond)


🤔 I'm always suspicious about sleeps, is there a way we can flush the log or something instead?

cilium/ebpf currently doesn't provide this, though I agree it should, and I believe it is possible, I will see if I can fix that. Would it be acceptable to leave it like this and fix it as soon as a flush feature has been upstreamed in cilium/ebpf? (if that takes longer than this PR)

Maybe a question first, what's the consequences if 50 Milliseconds is not enough time to flush the log? Can we easily detect that problem and react to it?

I guess the condition we are looking for is:

Did the test complete

Are we blocking

But we can't simply use a select to check if we are blocked, since it uses epoll. It would be great if we the library had a "check without blocking" function, but it doesn't. So to really fix it we need to update cilium/ebpf. Worst case if the reader does take longer than 50ms, is that we lose some debug information while in verbose mode, so I personally feel like this doesn't have to be a blocking issue as long as we improve this in a follow-up.

test/bpf_tests/trf.proto

bpf/tests/xdp_nodeport_lb4_test.c

test/bpf_tests/bpf_test.go

This commit adds a new BPF testing framework for datapath code. This framework will allow us to write unit tests in C which will be executed, as eBPF programs in the kernel. This allows us to test our code against the actual kernel facilities and helper calls as well as confirming that code will pass the verifier. This framework differs from other related works in the sense that we do the test setup, execution and verification all in C/eBPF code instead of having to do half the work in userspace and the other half in eBPF. A lightweight loader program is used to automatically detect test programs in ELF files in a given directory, load, and execute them using the `BPF_PROG_TEST_RUN` feature. The test results are passed back to the loader which will convert them into sub-tests within the golang testing framework to allow for easy integration into existing tooling. Doing setup in C allows us to easily mock out code with `#define` preprocessor tricks, replace tailcalls with stubs/code, and leverage conditional testing depending on which flags are set / features enabled. It also decouples datapath testing from the agent allowing us to verify that the eBPF programs work as expected separately from the agent / userspace. Each ELF file can contain multiple `CHECK`'s each of which will become a separate program which can fail(`test_fail`/`test_fail_now()`) or pass (default if not failed). Tests can log messages with `test_log("msg")` for debugging purposes or as fail message(`test_fatal("msg")` logs and `test_fail_now()`). These messages can include parameters analogous to `bpf_trace_printk`, for example: `test_log("Expected 123, found: %llu", some_var)`. Or for compact checks the `assert(some_var == 123)` style can be used which will report the file and line of failed asserts and stop the test. These check programs can also define sub-tests with a `TEST("name", {` block which can pass/fail independently from the parent test. The appeal is that such sub-tests run after each other in the exact same context and can for example test different aspects of the same test setup(ctx, map state). Both `CHECK` and `TEST` can be skipped by calling `test_skip()`/ `test_skip_now()`, to mark a test as skipped, for example if applies to a feature which is disabled or depends on features which are not available in the current kernel version. Skipped tests are explicitly marked as such instead of silently ignored. Tests can use `#ifdef` preprocessor statements to determine to skip or not. Tests that do not require tailcalls can do setup, execution and pass/fail checking all in the `CHECK` program. Tests that do need to tests across multiple tailcalls need to use a `SETUP` program in addition, in which case the setup and execution will happen in the `SETUP` program which will exit after the last tailcall is done. The loader will run any `SETUP` program before the `CHECK` program of the same name, the `CHECK` program will be invoked with the result context of the `SETUP` program, the result number of the `SETUP` program will be pretended to the context(4 bytes), this allows the `CHECK` program to verify the return value, context and map state and determine a pass/ fail result. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

The `memcpy` and `memcmp` builtin replacements did work for uneven values other than 1. This commit updates both builtin replacements so they do. This is quite important since it isn't uncommon to copy strings with an uneven amount of chars in the new test framework. Additionally, this commit extends the max size of values passed into `memcmp` from 32 bytes to 72 bytes which is required for some unit tests to compare packets which are larger than 32 bytes. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commit ports most existing unit tests in their different forms to the new eBPF unit test framework. This removes the need for a mocking framework in userspace and replaces the existing BPF_PROG_TEST_RUN based framework in the CI. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

The const Coccinelle script attempts to detect all function arguments that should be marked as const. To that end, for each function argument, we skip all cases that could lead to that argument being modified. One corner case we need to detect is when we modify an array field in a struct (`func(..., x->field, ...)`). In that case, we can't easily distinguish between a pointer field and an array field. In the former case, the struct is not modified; in the latter case, it is. Since we only have a few cases with array fields in the codebase, we list those manually. Until recently, that was only the addr field in some struct (holding an IPv6). We now need to add the fields for MAC addresses. Signed-off-by: Paul Chaignon <paul@cilium.io>

dylandreimerink · 2022-06-09T13:32:38Z

I have gotten the CI as far as I can take it. The only remaining complaint from checkpatch is about the macros used:

CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
  #379: FILE: bpf/tests/common.h:28:
  +#define __bpf_log_arg1(ptr, arg) *(ptr++) = MKR_LOG_ARG; *(__u64 *)(ptr) = arg; ptr += sizeof(__u64)
  
  CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ptr' may be better as '(ptr)' to avoid precedence issues
  #379: FILE: bpf/tests/common.h:28:
  +#define __bpf_log_arg1(ptr, arg) *(ptr++) = MKR_LOG_ARG; *(__u64 *)(ptr) = arg; ptr += sizeof(__u64)

But I can't really "fix" these without altering/breaking the behavior of the macros

This commits adds a new page to the "For developers/Testing" section. The page covers how to run and create BPF tests within the framework that was added in cilium#20017 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

This commits adds a new page to the "For developers/Testing" section. The page covers how to run and create BPF tests within the framework that was added in #20017 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 31, 2022

dylandreimerink force-pushed the feature/bpf-unit-tests branch 4 times, most recently from 27d561b to decad16 Compare May 31, 2022 09:15

dylandreimerink changed the title ~~Improved BPF testing framework~~ datapath: Improved BPF testing framework May 31, 2022

dylandreimerink force-pushed the feature/bpf-unit-tests branch 5 times, most recently from c08ceaa to 51ed754 Compare May 31, 2022 17:47

dylandreimerink added area/CI Continuous Integration testing issue or flake sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. labels May 31, 2022

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 31, 2022

dylandreimerink marked this pull request as ready for review June 1, 2022 08:14

dylandreimerink requested a review from a team June 1, 2022 08:14

dylandreimerink requested review from a team as code owners June 1, 2022 08:14

dylandreimerink requested review from YutaroHayakawa, nebril, ti-mo, pchaigno and kkourt June 1, 2022 08:14

dylandreimerink mentioned this pull request Jun 1, 2022

Datapath testing using BPF_PROG_TEST_RUN #14990

Open

joestringer reviewed Jun 2, 2022

View reviewed changes

.github/workflows/lint-bpf-checks.yaml Show resolved Hide resolved

joamaki reviewed Jun 3, 2022

View reviewed changes

joestringer reviewed Jun 3, 2022

View reviewed changes

dylandreimerink force-pushed the feature/bpf-unit-tests branch 2 times, most recently from 078f03f to 88b0cfa Compare June 8, 2022 11:28

brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 8, 2022

YutaroHayakawa reviewed Jun 9, 2022

View reviewed changes

bpf/tests/xdp_nodeport_lb4_test.c Outdated Show resolved Hide resolved

YutaroHayakawa reviewed Jun 9, 2022

View reviewed changes

bpf/tests/xdp_nodeport_lb4_test.c Outdated Show resolved Hide resolved

YutaroHayakawa reviewed Jun 9, 2022

View reviewed changes

test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved

brb removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 9, 2022

dylandreimerink force-pushed the feature/bpf-unit-tests branch 2 times, most recently from ebfa850 to e36559f Compare June 9, 2022 10:48

dylandreimerink and others added 4 commits June 9, 2022 12:52

dylandreimerink force-pushed the feature/bpf-unit-tests branch from e36559f to 3c42068 Compare June 9, 2022 10:52

brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 9, 2022

nebril merged commit ab9f69e into cilium:master Jun 10, 2022

dylandreimerink mentioned this pull request Jun 10, 2022

docs: Add developers guide page about BPF testing framework #20165

Merged

aanm mentioned this pull request Jun 22, 2022

Prepare for release v1.12.0-rc3 #20279

Merged

sahid mentioned this pull request Nov 4, 2022

[SNAT] add "need to frag" ICMP support #18414

Merged

christarazi mentioned this pull request Jan 4, 2023

CODEOWNERS: Cover test/bpf_tests by sig-datapath #22928

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datapath: Improved BPF testing framework #20017

datapath: Improved BPF testing framework #20017

dylandreimerink commented May 31, 2022 •

edited

joestringer left a comment

joestringer Jun 3, 2022 •

edited

dylandreimerink Jun 3, 2022

joestringer Jun 7, 2022

joestringer Jun 3, 2022 •

edited

dylandreimerink Jun 3, 2022

joestringer Jun 7, 2022

dylandreimerink Jun 7, 2022 •

edited

dylandreimerink commented Jun 9, 2022

		// Give the global log buf some time to empty
		time.Sleep(50 * time.Millisecond)

datapath: Improved BPF testing framework #20017

datapath: Improved BPF testing framework #20017

Conversation

dylandreimerink commented May 31, 2022 • edited

joestringer left a comment

Choose a reason for hiding this comment

joestringer Jun 3, 2022 • edited

Choose a reason for hiding this comment

dylandreimerink Jun 3, 2022

Choose a reason for hiding this comment

joestringer Jun 7, 2022

Choose a reason for hiding this comment

joestringer Jun 3, 2022 • edited

Choose a reason for hiding this comment

dylandreimerink Jun 3, 2022

Choose a reason for hiding this comment

joestringer Jun 7, 2022

Choose a reason for hiding this comment

dylandreimerink Jun 7, 2022 • edited

Choose a reason for hiding this comment

dylandreimerink commented Jun 9, 2022

dylandreimerink commented May 31, 2022 •

edited

joestringer Jun 3, 2022 •

edited

joestringer Jun 3, 2022 •

edited

dylandreimerink Jun 7, 2022 •

edited