Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath: Improved BPF testing framework #20017

Merged
merged 4 commits into from
Jun 10, 2022

Conversation

dylandreimerink
Copy link
Member

@dylandreimerink dylandreimerink commented May 31, 2022

This PR adds a new BPF testing framework for datapath code. This
framework will allow us to write tests in C which will be executed,
as eBPF programs in the kernel. This allows us to test our code against
the actual kernel facilities and helper calls as well as confirming that
code will pass the verifier.

This framework differs from other related works in the sense that we do
the test setup, execution and verification all in C/eBPF code instead
of having to do half the work in userspace and the other half in eBPF.
A lightweight loader program is used to automatically detect test
programs in ELF files in a given directory, load, and execute them using
the BPF_PROG_TEST_RUN feature. The test results are passed back to the
loader which will convert them into sub-tests within the golang testing
framework to allow for easy integration into existing tooling.

Doing setup in C allows us to easily mock out code with #define
preprocessor tricks, replace tailcalls with stubs/code, and leverage
conditional testing depending on which flags are set / features enabled.
It also decouples datapath testing from the agent allowing us to verify
that the eBPF programs work as expected separately from the agent /
userspace.

Each ELF file can contain multiple CHECK's each of which will become
a separate program which can fail(test_fail/test_fail_now()) or pass
(default if not failed). Tests can log messages with test_log("msg")
for debugging purposes or as fail message(test_fatal("msg") logs and
test_fail_now()). These messages can include parameters analogous to
bpf_trace_printk, for example:
test_log("Expected 123, found: %llu", some_var). Or for compact
checks the assert(some_var == 123) style can be used which will
report the file and line of failed asserts and stop the test.

These check programs can also define sub-tests with a TEST("name", {
block which can pass/fail independently from the parent test. The appeal
is that such sub-tests run after each other in the exact same context
and can for example test different aspects of the same test setup(ctx,
map state).

Both CHECK and TEST can be skipped by calling test_skip()/
test_skip_now(), to mark a test as skipped, for example if applies to
a feature which is disabled or depends on features which are not
available in the current kernel version. Skipped tests are explicitly
marked as such instead of silently ignored. Tests can use #ifdef
preprocessor statements to determine to skip or not.

Tests that do not require tailcalls can do setup, execution and
pass/fail checking all in the CHECK program. Tests that do need to
tests across multiple tailcalls need to use a SETUP program in
addition, in which case the setup and execution will happen in the
SETUP program which will exit after the last tailcall is done. The
loader will run any SETUP program before the CHECK program of the
same name, the CHECK program will be invoked with the result context
of the SETUP program, the result number of the SETUP program will
be pretended to the context(4 bytes), this allows the CHECK program
to verify the return value, context and map state and determine a pass/
fail result.

Fixes: #18480

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 31, 2022
@dylandreimerink dylandreimerink force-pushed the feature/bpf-unit-tests branch 4 times, most recently from 27d561b to decad16 Compare May 31, 2022 09:15
@dylandreimerink dylandreimerink changed the title Improved BPF testing framework datapath: Improved BPF testing framework May 31, 2022
@dylandreimerink dylandreimerink force-pushed the feature/bpf-unit-tests branch 5 times, most recently from c08ceaa to 51ed754 Compare May 31, 2022 17:47
@dylandreimerink dylandreimerink added area/CI Continuous Integration testing issue or flake sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. labels May 31, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 31, 2022
@dylandreimerink dylandreimerink marked this pull request as ready for review June 1, 2022 08:14
@dylandreimerink dylandreimerink requested a review from a team June 1, 2022 08:14
@dylandreimerink dylandreimerink requested review from a team as code owners June 1, 2022 08:14
bpf/tests/bpf_tests/common.h Outdated Show resolved Hide resolved
test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved
test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved
test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved
test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved
test/bpf_tests/bpf_test.go Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like others focused on the details of the test progs themselves, so I focused more on the build systems and general bits that were different from the way that other go tests work in the tree.

I explicitly didn't review the actual test content, because if you delete the full files and then add full files, it's much more difficult to review the diffs. If you're able to combine these in the same commit with minimal diff, that would make review easier for those pieces.

bpf/tests/bpf_tests/Makefile Outdated Show resolved Hide resolved
@@ -0,0 +1,443 @@
// SPDX-License-Identifier: Apache-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar feedback here, we don't need yet another directory (test/bpf, bpf/tests, test/bpf_tests, bpf/tests/bpf_tests) to confuse people even more about where the code should live :-)

IIRC the rationale behind test/bpf/ vs. bpf/tests is the language distinction of Go vs. C, but if we have a reasonable way to satisfy whatever tooling is analyzing these directories then my preference would actually be to just put them all in the same directory. If the tooling doesn't like them being in the same directory (eg go tooling complaining about the C files) then we can keep two separate dirs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the distinction seems to be that /bpf is for C code and the rest for Go. You can't put Go and C files in the same directory since Go will automatically attempt to compile .c files as CGO, so any Go code would have to live in a separate directory anyway. Having 1 Go file in among all the C code felt strange, but I agree that it would be better to co-locate code that works together, perhaps not just for the tests but other parts of cilium as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Maybe we should move the unit-test.c (and elf-demo.c?) into bpf/ ..?

elf-demo.c I'm less sure about, IIRC that's actually used for the pkg/elf Go testing. Could potentially even move that one under something like test/elf.

Well anyways this is not such a big deal, we can always move the files later.

test/bpf_tests/bpf_test.go Show resolved Hide resolved
test/bpf_tests/bpf_test.go Outdated Show resolved Hide resolved
test/bpf/Makefile Show resolved Hide resolved
test/Makefile Outdated Show resolved Hide resolved
test/Makefile Outdated Show resolved Hide resolved
test/Makefile Outdated Show resolved Hide resolved
Comment on lines 197 to 198
// Give the global log buf some time to empty
time.Sleep(50 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm always suspicious about sleeps, is there a way we can flush the log or something instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium/ebpf currently doesn't provide this, though I agree it should, and I believe it is possible, I will see if I can fix that. Would it be acceptable to leave it like this and fix it as soon as a flush feature has been upstreamed in cilium/ebpf? (if that takes longer than this PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a question first, what's the consequences if 50 Milliseconds is not enough time to flush the log? Can we easily detect that problem and react to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the condition we are looking for is:

  1. Did the test complete
  2. Are we blocking

But we can't simply use a select to check if we are blocked, since it uses epoll. It would be great if we the library had a "check without blocking" function, but it doesn't. So to really fix it we need to update cilium/ebpf. Worst case if the reader does take longer than 50ms, is that we lose some debug information while in verbose mode, so I personally feel like this doesn't have to be a blocking issue as long as we improve this in a follow-up.

test/bpf_tests/trf.proto Show resolved Hide resolved
@dylandreimerink dylandreimerink force-pushed the feature/bpf-unit-tests branch 2 times, most recently from 078f03f to 88b0cfa Compare June 8, 2022 11:28
@brb brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 8, 2022
@brb brb removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 9, 2022
@dylandreimerink dylandreimerink force-pushed the feature/bpf-unit-tests branch 2 times, most recently from ebfa850 to e36559f Compare June 9, 2022 10:48
dylandreimerink and others added 4 commits June 9, 2022 12:52
This commit adds a new BPF testing framework for datapath code. This
framework will allow us to write unit tests in C which will be executed,
as eBPF programs in the kernel. This allows us to test our code against
the actual kernel facilities and helper calls as well as confirming that
code will pass the verifier.

This framework differs from other related works in the sense that we do
the test setup, execution and verification all in C/eBPF code instead
of having to do half the work in userspace and the other half in eBPF.
A lightweight loader program is used to automatically detect test
programs in ELF files in a given directory, load, and execute them using
the `BPF_PROG_TEST_RUN` feature. The test results are passed back to the
loader which will convert them into sub-tests within the golang testing
framework to allow for easy integration into existing tooling.

Doing setup in C allows us to easily mock out code with `#define`
preprocessor tricks, replace tailcalls with stubs/code, and leverage
conditional testing depending on which flags are set / features enabled.
It also decouples datapath testing from the agent allowing us to verify
that the eBPF programs work as expected separately from the agent /
userspace.

Each ELF file can contain multiple `CHECK`'s each of which will become
a separate program which can fail(`test_fail`/`test_fail_now()`) or pass
(default if not failed). Tests can log messages with `test_log("msg")`
for debugging purposes or as fail message(`test_fatal("msg")` logs and
`test_fail_now()`). These messages can include parameters analogous to
`bpf_trace_printk`, for example: `test_log("Expected 123, found: %llu",
some_var)`. Or for compact checks the `assert(some_var == 123)` style
can be used which will report the file and line of failed asserts and
stop the test.

These check programs can also define sub-tests with a `TEST("name", {`
block which can pass/fail independently from the parent test. The appeal
is that such sub-tests run after each other in the exact same context
and can for example test different aspects of the same test setup(ctx,
map state).

Both `CHECK` and `TEST` can be skipped by calling `test_skip()`/
`test_skip_now()`, to mark a test as skipped, for example if applies to
a feature which is disabled or depends on features which are not
available in the current kernel version. Skipped tests are explicitly
marked as such instead of silently ignored. Tests can use `#ifdef`
preprocessor statements to determine to skip or not.

Tests that do not require tailcalls can do setup, execution and
pass/fail checking all in the `CHECK` program. Tests that do need to
tests across multiple tailcalls need to use a `SETUP` program in
addition, in which case the setup and execution will happen in the
`SETUP` program which will exit after the last tailcall is done. The
loader will run any `SETUP` program before the `CHECK` program of the
same name, the `CHECK` program will be invoked with the result context
of the `SETUP` program, the result number of the `SETUP` program will
be pretended to the context(4 bytes), this allows the `CHECK` program
to verify the return value, context and map state and determine a pass/
fail result.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
The `memcpy` and `memcmp` builtin replacements did work for uneven
values other than 1. This commit updates both builtin replacements so
they do. This is quite important since it isn't uncommon to copy strings
with an uneven amount of chars in the new test framework.

Additionally, this commit extends the max size of values passed into
`memcmp` from 32 bytes to 72 bytes which is required for some unit tests
to compare packets which are larger than 32 bytes.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit ports most existing unit tests in their different forms to
the new eBPF unit test framework. This removes the need for a mocking
framework in userspace and replaces the existing BPF_PROG_TEST_RUN based
framework in the CI.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
The const Coccinelle script attempts to detect all function arguments
that should be marked as const. To that end, for each function argument,
we skip all cases that could lead to that argument being modified.

One corner case we need to detect is when we modify an array field in a
struct (`func(..., x->field, ...)`). In that case, we can't easily
distinguish between a pointer field and an array field. In the former
case, the struct is not modified; in the latter case, it is. Since we
only have a few cases with array fields in the codebase, we list those
manually.

Until recently, that was only the addr field in some struct (holding an
IPv6). We now need to add the fields for MAC addresses.

Signed-off-by: Paul Chaignon <paul@cilium.io>
@dylandreimerink
Copy link
Member Author

I have gotten the CI as far as I can take it. The only remaining complaint from checkpatch is about the macros used:

CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
  #379: FILE: bpf/tests/common.h:28:
  +#define __bpf_log_arg1(ptr, arg) *(ptr++) = MKR_LOG_ARG; *(__u64 *)(ptr) = arg; ptr += sizeof(__u64)
  
  CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ptr' may be better as '(ptr)' to avoid precedence issues
  #379: FILE: bpf/tests/common.h:28:
  +#define __bpf_log_arg1(ptr, arg) *(ptr++) = MKR_LOG_ARG; *(__u64 *)(ptr) = arg; ptr += sizeof(__u64)

But I can't really "fix" these without altering/breaking the behavior of the macros

@brb brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 9, 2022
@nebril nebril merged commit ab9f69e into cilium:master Jun 10, 2022
dylandreimerink added a commit to dylandreimerink/cilium that referenced this pull request Jun 10, 2022
This commits adds a new page to the "For developers/Testing" section.
The page covers how to run and create BPF tests within the framework
that was added in cilium#20017

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink added a commit to dylandreimerink/cilium that referenced this pull request Jun 17, 2022
This commits adds a new page to the "For developers/Testing" section.
The page covers how to run and create BPF tests within the framework
that was added in cilium#20017

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
kkourt pushed a commit that referenced this pull request Jun 20, 2022
This commits adds a new page to the "For developers/Testing" section.
The page covers how to run and create BPF tests within the framework
that was added in #20017

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve BPF datapath unit testing
9 participants