New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move datapath verifier tests into GH actions workflow #22754
Conversation
c5ebc5c
to
cf9e089
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! Will ACK once it's ready for the review 🚀
8052aee
to
0096b0a
Compare
0096b0a
to
e9709af
Compare
0eb0553
to
83ec61b
Compare
Datapath verifier GHA workflow passed with test commit: https://github.com/cilium/cilium/actions/runs/3950809920 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming the last commit is removed.
b1d92f9
to
17288a3
Compare
Based on K8sDatapathVerifier. This allows to run the datapath verifier tests outside of a k8s cluster as part of the datapath conformance workflow in GitHub actions. Note that this test no longer covers kernel version 4.9 which K8sDatapathVerifier did. However, support for that kernel version is dropped in the upcoming Cilium 1.13 release. Suggested-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
The 4.19 image used by the GHA datapath verifier workflow currently leads to verifier issues on that version. The k8s-based tests use an older 4.19 version which does not have these issues. Thus, keep the k8s-based test for now but restrict it to 4.19. Signed-off-by: Tobias Klauser <tobias@cilium.io>
17288a3
to
f5d244a
Compare
New datapath verifier workflow passed on all kernel versions with test commit: https://github.com/cilium/cilium/actions/runs/3995585967/jobs/6854636517 Removing test commit and running K8sDatapathVerifier test to check 393e100 |
f5d244a
to
393e100
Compare
/test --focus="K8sDatapathVerifier" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍕 🥳
/test-only --focus="K8sDatapathVerifier" |
Focussed run of K8sDatapathVerifier passed as well: https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/508/ This is the only bit of existing infrastructure modified by this PR, thus there is no need to run full CI. Marking as ready to merge given that reviews are in and the newly added workflow passed with a test commit: #22754 (comment) |
Paul reported that verifier complexity tests were broken on bpf-next for a while due to the GH action testing the wrong datapath configurations for some months. This slipped in via #22754. Paul says: As part of the verifier tests, we first compile the datapath then attempt to load it. We do that for a set of kernels and datapath configurations. We compile e.g. bpf_lxc with: make -C bpf KERNEL=[kernel] MAX_LXC_OPTIONS=[config] That method of passing the two environment variables however doesn't seem to work in the CI for some reason. The datapath config used for the compilation ends up empty and the default one is used instead. That means the verifier test has been covering a subset of the expected datapath configurations. The 4.19 kernel is not affected because it was kept in the previous K8sVerifier ginkgo test. Specifically, bpf-next fails with an invalid size of register fill error from the verifier: [...] 578: R0=scalar() 578: (79) r6 = *(u64 *)(r10 -288) ; R6_w=ctx(off=0,imm=0) R10=fp0 579: (b4) w9 = -141 ; R9_w=4294967155 580: (c6) if w0 s< 0x0 goto pc+2320 ; R0=scalar(smax=9223372034707292159,umax=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min=0,u32_max=2147483647) ; if (no_neigh) { 581: (61) r1 = *(u32 *)(r10 -292) ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 582: (16) if w1 == 0x0 goto pc+2292 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; dmac = nh_params.nh_family == AF_INET ? 583: (18) r1 = 0xffff8c030542d400 ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0) 585: (61) r2 = *(u32 *)(r10 -304) invalid size of register fill processed 7447 insns (limit 1000000) max_states_per_insn 5 total_states 426 peak_states 420 mark_read 117 Maxim built a small reproducer: 1) Setup LVH: # mkdir -v /tmp/lvh # docker run -v /tmp/lvh:/mnt/images quay.io/lvh-images/complexity-test:bpf-next-main cp /data/images/complexity-test_bpf-next.qcow2.zst /mnt/images/ # zstd -d /tmp/lvh/complexity-test_bpf-next.qcow2.zst -o /tmp/datapath-bpf-complexity.qcow2 # rm -vf /tmp/lvh/complexity-test_bpf-next.qcow2.zst # rmdir -v /tmp/lvh # ./lvh run --image /tmp/datapath-bpf-complexity.qcow2 --host-mount "$YOUR_CILIUM_REPOSITORY_DIR" 2) Inside the VM: # cd /host # make -C bpf KERNEL=netnext MAX_HOST_OPTIONS='-DSKIP_DEBUG=1 -DENABLE_IPV4=1 -DENABLE_IPV6=1 -DENABLE_ROUTING=1 -DPOLICY_VERDICT_NOTIFY=1 -DALLOW_ICMP_FRAG_NEEDED=1 -DENABLE_IDENTITY_MARK=1 -DMONITOR_AGGREGATION=3 -DCT_REPORT_FLAGS=0x0002 -DENABLE_HOST_FIREWALL=1 -DENABLE_ICMP_RULE=1 -DENABLE_CUSTOM_CALLS=1 -DENABLE_SRV6=1 -DHAVE_LPM_TRIE_MAP_TYPE=1 -DHAVE_LRU_HASH_MAP_TYPE=1 -DENABLE_MASQUERADE=1 -DENABLE_SRC_RANGE_CHECK=1 -DENABLE_NODEPORT=1 -DENABLE_NODEPORT_ACCELERATION=1 -DENABLE_SESSION_AFFINITY=1 -DENABLE_DSR_ICMP_ERRORS=1 -DENABLE_DSR=1 -DENABLE_DSR_HYBRID=1 -DENABLE_IPV4_FRAGMENTS=1 -DENABLE_TPROXY=1 -DENABLE_HOST_ROUTING=1 -DENABLE_BANDWIDTH_MANAGER=1 -DETH_HLEN=0 -DENCAP_IFINDEX=1 -DTUNNEL_MODE=1 -DENABLE_EGRESS_GATEWAY=1 -DENABLE_VTEP=1 -DENABLE_SCTP=1' bpf_host.o # env TC_PROGS="" XDP_PROGS="" CG_PROGS="" TC_PROGS="bpf_host" ./test/bpf/verifier-test.sh The above verifier log with nh_params.nh_family == AF_INET test seems to make little sense. It looks like the *(u32 *)(r10 -304) is about fetching the nh_family, but the verifier thinks that the data on stack was a spilled pointer. We can actually fix this by using fib_params->l.family directly which would also simplify the code a bit. For the next hop, the fib lookup helper updates the field depending on whether the next hop is a v4 or v6 address. Related: #24335 Reported-by: Paul Chaignon <paul@cilium.io> Co-developed-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Paul reported that verifier complexity tests were broken on bpf-next for a while due to the GH action testing the wrong datapath configurations for some months. This slipped in via #22754. Paul says: As part of the verifier tests, we first compile the datapath then attempt to load it. We do that for a set of kernels and datapath configurations. We compile e.g. bpf_lxc with: make -C bpf KERNEL=[kernel] MAX_LXC_OPTIONS=[config] That method of passing the two environment variables however doesn't seem to work in the CI for some reason. The datapath config used for the compilation ends up empty and the default one is used instead. That means the verifier test has been covering a subset of the expected datapath configurations. The 4.19 kernel is not affected because it was kept in the previous K8sVerifier ginkgo test. Specifically, bpf-next fails with an invalid size of register fill error from the verifier: [...] 578: R0=scalar() 578: (79) r6 = *(u64 *)(r10 -288) ; R6_w=ctx(off=0,imm=0) R10=fp0 579: (b4) w9 = -141 ; R9_w=4294967155 580: (c6) if w0 s< 0x0 goto pc+2320 ; R0=scalar(smax=9223372034707292159,umax=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min=0,u32_max=2147483647) ; if (no_neigh) { 581: (61) r1 = *(u32 *)(r10 -292) ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 582: (16) if w1 == 0x0 goto pc+2292 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; dmac = nh_params.nh_family == AF_INET ? 583: (18) r1 = 0xffff8c030542d400 ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0) 585: (61) r2 = *(u32 *)(r10 -304) invalid size of register fill processed 7447 insns (limit 1000000) max_states_per_insn 5 total_states 426 peak_states 420 mark_read 117 Maxim built a small reproducer: 1) Setup LVH: # mkdir -v /tmp/lvh # docker run -v /tmp/lvh:/mnt/images quay.io/lvh-images/complexity-test:bpf-next-main cp /data/images/complexity-test_bpf-next.qcow2.zst /mnt/images/ # zstd -d /tmp/lvh/complexity-test_bpf-next.qcow2.zst -o /tmp/datapath-bpf-complexity.qcow2 # rm -vf /tmp/lvh/complexity-test_bpf-next.qcow2.zst # rmdir -v /tmp/lvh # ./lvh run --image /tmp/datapath-bpf-complexity.qcow2 --host-mount "$YOUR_CILIUM_REPOSITORY_DIR" 2) Inside the VM: # cd /host # make -C bpf KERNEL=netnext MAX_HOST_OPTIONS='-DSKIP_DEBUG=1 -DENABLE_IPV4=1 -DENABLE_IPV6=1 -DENABLE_ROUTING=1 -DPOLICY_VERDICT_NOTIFY=1 -DALLOW_ICMP_FRAG_NEEDED=1 -DENABLE_IDENTITY_MARK=1 -DMONITOR_AGGREGATION=3 -DCT_REPORT_FLAGS=0x0002 -DENABLE_HOST_FIREWALL=1 -DENABLE_ICMP_RULE=1 -DENABLE_CUSTOM_CALLS=1 -DENABLE_SRV6=1 -DHAVE_LPM_TRIE_MAP_TYPE=1 -DHAVE_LRU_HASH_MAP_TYPE=1 -DENABLE_MASQUERADE=1 -DENABLE_SRC_RANGE_CHECK=1 -DENABLE_NODEPORT=1 -DENABLE_NODEPORT_ACCELERATION=1 -DENABLE_SESSION_AFFINITY=1 -DENABLE_DSR_ICMP_ERRORS=1 -DENABLE_DSR=1 -DENABLE_DSR_HYBRID=1 -DENABLE_IPV4_FRAGMENTS=1 -DENABLE_TPROXY=1 -DENABLE_HOST_ROUTING=1 -DENABLE_BANDWIDTH_MANAGER=1 -DETH_HLEN=0 -DENCAP_IFINDEX=1 -DTUNNEL_MODE=1 -DENABLE_EGRESS_GATEWAY=1 -DENABLE_VTEP=1 -DENABLE_SCTP=1' bpf_host.o # env TC_PROGS="" XDP_PROGS="" CG_PROGS="" TC_PROGS="bpf_host" ./test/bpf/verifier-test.sh The above verifier log with nh_params.nh_family == AF_INET test seems to make little sense. It looks like the *(u32 *)(r10 -304) is about fetching the nh_family, but the verifier thinks that the data on stack was a spilled pointer. We can actually fix this by using fib_params->l.family directly which would also simplify the code a bit. For the next hop, the fib lookup helper updates the field depending on whether the next hop is a v4 or v6 address. Related: #24335 Reported-by: Paul Chaignon <paul@cilium.io> Co-developed-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
No description provided.