Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: verifier-test.sh always skips cgroup programs #17125

Open
pchaigno opened this issue Aug 10, 2021 · 12 comments
Open

CI: verifier-test.sh always skips cgroup programs #17125

pchaigno opened this issue Aug 10, 2021 · 12 comments
Labels
area/CI Continuous Integration testing issue or flake kind/bug/CI This is a bug in the testing code. pinned These issues are not marked stale by our issue bot.

Comments

@pchaigno
Copy link
Member

Commit 696012f changed the section names for the cgroup BPF programs (e.g., from from-sock4 to connect4), but didn't change the corresponding code in verifier-test.sh:

if check_macro CGROUP_INET4_CONNECT; then
prog_types["from-sock4"]="sockaddr"
attach_types["from-sock4"]="connect4"
fi
if check_macro CGROUP_INET6_CONNECT; then
prog_types["from-sock6"]="sockaddr"
attach_types["from-sock6"]="connect6"
fi
if check_macro UDP4_RECVMSG; then
prog_types["snd-sock4"]="sockaddr"
prog_types["rcv-sock4"]="sockaddr"
attach_types["snd-sock4"]="sendmsg4"
attach_types["rcv-sock4"]="recvmsg4"
fi
if check_macro UDP6_RECVMSG; then
prog_types["snd-sock6"]="sockaddr"
prog_types["rcv-sock6"]="sockaddr"
attach_types["snd-sock6"]="sendmsg6"
attach_types["rcv-sock6"]="recvmsg6"
fi

As a result we always skip those programs in the K8sVerifier test. Maybe we should also find a way to error out when skipping something that should be tested.

/cc @borkmann

@pchaigno pchaigno added kind/bug/CI This is a bug in the testing code. area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Aug 10, 2021
@borkmann
Copy link
Member

Sigh, good catch, I was not aware we had these; we should definitely find a way to error.

@joestringer
Copy link
Member

joestringer commented Aug 10, 2021

I think the reason we don't error out right now is that depending on kernel support, it's expected that we skip the CG tests (why error out if you can't run those tests?). That's the first case in the load_cg() conditional checks:

if [ "${prog_types[$section]}" == "" ]; then
echo "=> Skipping ${p}.c:$section"
continue

@pchaigno
Copy link
Member Author

Is there a way to error out other than passing the KERNEL version in the script? I was hoping to avoid that, but I don't really see another solution :-(

@joestringer
Copy link
Member

Maybe bpftool feature probe | grep ...CGROUP... for the most recent cgroup BPF feature we rely on?

@pchaigno
Copy link
Member Author

What do you mean? We already rely on bpftool feature probe:

if check_macro CGROUP_INET4_CONNECT; then

@joestringer
Copy link
Member

Assuming that UDP6_RECVMSG is the most recent cgroups bpf hook we use in the script, I was thinking something like this:

$ git di
diff --git a/test/bpf/verifier-test.sh b/test/bpf/verifier-test.sh
index 582e2229c9d5..ae07e846ecbd 100755
--- a/test/bpf/verifier-test.sh
+++ b/test/bpf/verifier-test.sh
@@ -191,6 +191,11 @@ function load_cg {
                ELF_SECTIONS="$(readelf -S $DIR/${p}.o)"
                for section in $(get_section ${p}.c); do
                        if [ "${prog_types[$section]}" == "" ]; then
+                               if check_macro UDP6_RECVMSG; then
+                                       echo "=> CGroups BPF programs should be supported, but this script does not understand how to load ${p} (section $section)." 2>&1
+                                       echo "   Please fix me 🥺." 2>&1
+                                       exit 1
+                               fi
                                echo "=> Skipping ${p}.c:$section"
                                continue
                        elif [ "${prog_types[$section]}" == "SKIP" ]; then

@pchaigno
Copy link
Member Author

Ah, I see. That's a bit specific to our current issue, but it's probably enough 👍

@joestringer
Copy link
Member

Yeah true. Maybe something more reliable would be to verify that there are no "Skipping" lines in the output when running this test on net-next...?

@pchaigno
Copy link
Member Author

Ah, of course! 🤦 We can easily do that from K8sVerifier, where there's already per-version code.

@aditighag
Copy link
Member

I'm just curious - why don't we run bpftool cgroup show ... to verify that all the expected programs were loaded on net-next?

@joestringer joestringer removed the ci/flake This is a known failure that occurs in the tree. Please investigate me! label Sep 8, 2021
@joestringer
Copy link
Member

@aditighag I think the question is how we determine the "all the expected programs" from your question above. If we need to manually maintain a list of programs somewhere (eg to check whether they are present in bpftool cgroup show), then that list can still get out of sync compared to other parts of the codebase.

From that perspective, the right approach to me is one step more abstract: Assemble a list of all possible programs from the tree programmatically, and ensure that every one of them is loaded. If any of the programs in the tree are not loaded, then the script needs fixing and it should quit out. The script should be doing 90% of this already, it just prints "Skipping ..." if it cannot load a program. The missing piece is that we don't force the test to fail in that case.

@github-actions

This comment was marked as outdated.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 9, 2022
@pchaigno pchaigno added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Jul 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake kind/bug/CI This is a bug in the testing code. pinned These issues are not marked stale by our issue bot.
Projects
None yet
Development

No branches or pull requests

4 participants