Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move datapath verifier tests into GH actions workflow #22754

Merged
merged 3 commits into from Jan 24, 2023

Conversation

tklauser
Copy link
Member

No description provided.

@tklauser tklauser added dont-merge/preview-only Only for preview or testing, don't merge it. release-note/ci This PR makes changes to the CI. ci/hyperjump labels Dec 15, 2022
@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch 3 times, most recently from c5ebc5c to cf9e089 Compare December 15, 2022 16:09
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Will ACK once it's ready for the review 🚀

@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch 4 times, most recently from 8052aee to 0096b0a Compare December 15, 2022 19:03
@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch from 0096b0a to e9709af Compare December 15, 2022 19:26
@pchaigno pchaigno marked this pull request as ready for review December 15, 2022 21:56
@pchaigno pchaigno requested review from a team as code owners December 15, 2022 21:56
@pchaigno pchaigno marked this pull request as draft December 15, 2022 21:57
@tklauser tklauser removed the dont-merge/preview-only Only for preview or testing, don't merge it. label Jan 18, 2023
@tklauser tklauser marked this pull request as ready for review January 18, 2023 15:34
@tklauser tklauser requested review from brb and pchaigno January 18, 2023 15:34
@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch from 0eb0553 to 83ec61b Compare January 18, 2023 16:19
@tklauser
Copy link
Member Author

tklauser commented Jan 18, 2023

Datapath verifier GHA workflow passed with test commit: https://github.com/cilium/cilium/actions/runs/3950809920

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the last commit is removed.

test/verifier/verifier_test.go Show resolved Hide resolved
test/verifier/verifier_test.go Outdated Show resolved Hide resolved
test/verifier/verifier_test.go Outdated Show resolved Hide resolved
@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch 2 times, most recently from b1d92f9 to 17288a3 Compare January 24, 2023 10:30
Based on K8sDatapathVerifier.

This allows to run the datapath verifier tests outside of a k8s cluster
as part of the datapath conformance workflow in GitHub actions.

Note that this test no longer covers kernel version 4.9 which
K8sDatapathVerifier did. However, support for that kernel version is
dropped in the upcoming Cilium 1.13 release.

Suggested-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
The 4.19 image used by the GHA datapath verifier workflow currently leads to
verifier issues on that version. The k8s-based tests use an older 4.19
version which does not have these issues. Thus, keep the k8s-based test
for now but restrict it to 4.19.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch from 17288a3 to f5d244a Compare January 24, 2023 11:11
@tklauser
Copy link
Member Author

tklauser commented Jan 24, 2023

New datapath verifier workflow passed on all kernel versions with test commit: https://github.com/cilium/cilium/actions/runs/3995585967/jobs/6854636517

Removing test commit and running K8sDatapathVerifier test to check 393e100

@tklauser tklauser force-pushed the pr/tklauser/convert-verifier-tests branch from f5d244a to 393e100 Compare January 24, 2023 11:24
@tklauser
Copy link
Member Author

/test --focus="K8sDatapathVerifier"

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍕 🥳

@tklauser
Copy link
Member Author

/test-only --focus="K8sDatapathVerifier"

@tklauser
Copy link
Member Author

tklauser commented Jan 24, 2023

Focussed run of K8sDatapathVerifier passed as well: https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/508/ This is the only bit of existing infrastructure modified by this PR, thus there is no need to run full CI.

Marking as ready to merge given that reviews are in and the newly added workflow passed with a test commit: #22754 (comment)

@tklauser tklauser added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 24, 2023
@ldelossa ldelossa merged commit 5008dbc into master Jan 24, 2023
@ldelossa ldelossa deleted the pr/tklauser/convert-verifier-tests branch January 24, 2023 15:07
borkmann added a commit that referenced this pull request Mar 16, 2023
Paul reported that verifier complexity tests were broken on bpf-next for
a while due to the GH action testing the wrong datapath configurations for
some months. This slipped in via #22754. Paul says:

  As part of the verifier tests, we first compile the datapath then
  attempt to load it. We do that for a set of kernels and datapath
  configurations. We compile e.g. bpf_lxc with:

    make -C bpf KERNEL=[kernel] MAX_LXC_OPTIONS=[config]

  That method of passing the two environment variables however doesn't
  seem to work in the CI for some reason. The datapath config used for
  the compilation ends up empty and the default one is used instead.

  That means the verifier test has been covering a subset of the expected
  datapath configurations. The 4.19 kernel is not affected because it was
  kept in the previous K8sVerifier ginkgo test.

Specifically, bpf-next fails with an invalid size of register fill error
from the verifier:

  [...]
  578: R0=scalar()
  578: (79) r6 = *(u64 *)(r10 -288)     ; R6_w=ctx(off=0,imm=0) R10=fp0
  579: (b4) w9 = -141                   ; R9_w=4294967155
  580: (c6) if w0 s< 0x0 goto pc+2320   ; R0=scalar(smax=9223372034707292159,umax=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min=0,u32_max=2147483647)
  ; if (no_neigh) {
  581: (61) r1 = *(u32 *)(r10 -292)     ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
  582: (16) if w1 == 0x0 goto pc+2292   ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  ; dmac = nh_params.nh_family == AF_INET ?
  583: (18) r1 = 0xffff8c030542d400     ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
  585: (61) r2 = *(u32 *)(r10 -304)
  invalid size of register fill
  processed 7447 insns (limit 1000000) max_states_per_insn 5 total_states 426 peak_states 420 mark_read 117

Maxim built a small reproducer:

1) Setup LVH:

  # mkdir -v /tmp/lvh
  # docker run -v /tmp/lvh:/mnt/images quay.io/lvh-images/complexity-test:bpf-next-main cp /data/images/complexity-test_bpf-next.qcow2.zst /mnt/images/
  # zstd -d /tmp/lvh/complexity-test_bpf-next.qcow2.zst -o /tmp/datapath-bpf-complexity.qcow2
  # rm -vf /tmp/lvh/complexity-test_bpf-next.qcow2.zst
  # rmdir -v /tmp/lvh
  # ./lvh run --image /tmp/datapath-bpf-complexity.qcow2 --host-mount "$YOUR_CILIUM_REPOSITORY_DIR"

2) Inside the VM:

  # cd /host
  # make -C bpf KERNEL=netnext MAX_HOST_OPTIONS='-DSKIP_DEBUG=1 -DENABLE_IPV4=1 -DENABLE_IPV6=1 -DENABLE_ROUTING=1 -DPOLICY_VERDICT_NOTIFY=1 -DALLOW_ICMP_FRAG_NEEDED=1 -DENABLE_IDENTITY_MARK=1 -DMONITOR_AGGREGATION=3 -DCT_REPORT_FLAGS=0x0002 -DENABLE_HOST_FIREWALL=1 -DENABLE_ICMP_RULE=1 -DENABLE_CUSTOM_CALLS=1 -DENABLE_SRV6=1 -DHAVE_LPM_TRIE_MAP_TYPE=1 -DHAVE_LRU_HASH_MAP_TYPE=1 -DENABLE_MASQUERADE=1 -DENABLE_SRC_RANGE_CHECK=1 -DENABLE_NODEPORT=1 -DENABLE_NODEPORT_ACCELERATION=1 -DENABLE_SESSION_AFFINITY=1 -DENABLE_DSR_ICMP_ERRORS=1 -DENABLE_DSR=1 -DENABLE_DSR_HYBRID=1 -DENABLE_IPV4_FRAGMENTS=1 -DENABLE_TPROXY=1 -DENABLE_HOST_ROUTING=1 -DENABLE_BANDWIDTH_MANAGER=1 -DETH_HLEN=0 -DENCAP_IFINDEX=1 -DTUNNEL_MODE=1 -DENABLE_EGRESS_GATEWAY=1 -DENABLE_VTEP=1 -DENABLE_SCTP=1' bpf_host.o
  # env TC_PROGS="" XDP_PROGS="" CG_PROGS="" TC_PROGS="bpf_host" ./test/bpf/verifier-test.sh

The above verifier log with nh_params.nh_family == AF_INET test seems to
make little sense. It looks like the *(u32 *)(r10 -304) is about fetching
the nh_family, but the verifier thinks that the data on stack was a spilled
pointer. We can actually fix this by using fib_params->l.family directly
which would also simplify the code a bit. For the next hop, the fib lookup
helper updates the field depending on whether the next hop is a v4 or v6
address.

Related: #24335
Reported-by: Paul Chaignon <paul@cilium.io>
Co-developed-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann added a commit that referenced this pull request Mar 16, 2023
Paul reported that verifier complexity tests were broken on bpf-next for
a while due to the GH action testing the wrong datapath configurations for
some months. This slipped in via #22754. Paul says:

  As part of the verifier tests, we first compile the datapath then
  attempt to load it. We do that for a set of kernels and datapath
  configurations. We compile e.g. bpf_lxc with:

    make -C bpf KERNEL=[kernel] MAX_LXC_OPTIONS=[config]

  That method of passing the two environment variables however doesn't
  seem to work in the CI for some reason. The datapath config used for
  the compilation ends up empty and the default one is used instead.

  That means the verifier test has been covering a subset of the expected
  datapath configurations. The 4.19 kernel is not affected because it was
  kept in the previous K8sVerifier ginkgo test.

Specifically, bpf-next fails with an invalid size of register fill error
from the verifier:

  [...]
  578: R0=scalar()
  578: (79) r6 = *(u64 *)(r10 -288)     ; R6_w=ctx(off=0,imm=0) R10=fp0
  579: (b4) w9 = -141                   ; R9_w=4294967155
  580: (c6) if w0 s< 0x0 goto pc+2320   ; R0=scalar(smax=9223372034707292159,umax=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min=0,u32_max=2147483647)
  ; if (no_neigh) {
  581: (61) r1 = *(u32 *)(r10 -292)     ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
  582: (16) if w1 == 0x0 goto pc+2292   ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  ; dmac = nh_params.nh_family == AF_INET ?
  583: (18) r1 = 0xffff8c030542d400     ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
  585: (61) r2 = *(u32 *)(r10 -304)
  invalid size of register fill
  processed 7447 insns (limit 1000000) max_states_per_insn 5 total_states 426 peak_states 420 mark_read 117

Maxim built a small reproducer:

1) Setup LVH:

  # mkdir -v /tmp/lvh
  # docker run -v /tmp/lvh:/mnt/images quay.io/lvh-images/complexity-test:bpf-next-main cp /data/images/complexity-test_bpf-next.qcow2.zst /mnt/images/
  # zstd -d /tmp/lvh/complexity-test_bpf-next.qcow2.zst -o /tmp/datapath-bpf-complexity.qcow2
  # rm -vf /tmp/lvh/complexity-test_bpf-next.qcow2.zst
  # rmdir -v /tmp/lvh
  # ./lvh run --image /tmp/datapath-bpf-complexity.qcow2 --host-mount "$YOUR_CILIUM_REPOSITORY_DIR"

2) Inside the VM:

  # cd /host
  # make -C bpf KERNEL=netnext MAX_HOST_OPTIONS='-DSKIP_DEBUG=1 -DENABLE_IPV4=1 -DENABLE_IPV6=1 -DENABLE_ROUTING=1 -DPOLICY_VERDICT_NOTIFY=1 -DALLOW_ICMP_FRAG_NEEDED=1 -DENABLE_IDENTITY_MARK=1 -DMONITOR_AGGREGATION=3 -DCT_REPORT_FLAGS=0x0002 -DENABLE_HOST_FIREWALL=1 -DENABLE_ICMP_RULE=1 -DENABLE_CUSTOM_CALLS=1 -DENABLE_SRV6=1 -DHAVE_LPM_TRIE_MAP_TYPE=1 -DHAVE_LRU_HASH_MAP_TYPE=1 -DENABLE_MASQUERADE=1 -DENABLE_SRC_RANGE_CHECK=1 -DENABLE_NODEPORT=1 -DENABLE_NODEPORT_ACCELERATION=1 -DENABLE_SESSION_AFFINITY=1 -DENABLE_DSR_ICMP_ERRORS=1 -DENABLE_DSR=1 -DENABLE_DSR_HYBRID=1 -DENABLE_IPV4_FRAGMENTS=1 -DENABLE_TPROXY=1 -DENABLE_HOST_ROUTING=1 -DENABLE_BANDWIDTH_MANAGER=1 -DETH_HLEN=0 -DENCAP_IFINDEX=1 -DTUNNEL_MODE=1 -DENABLE_EGRESS_GATEWAY=1 -DENABLE_VTEP=1 -DENABLE_SCTP=1' bpf_host.o
  # env TC_PROGS="" XDP_PROGS="" CG_PROGS="" TC_PROGS="bpf_host" ./test/bpf/verifier-test.sh

The above verifier log with nh_params.nh_family == AF_INET test seems to
make little sense. It looks like the *(u32 *)(r10 -304) is about fetching
the nh_family, but the verifier thinks that the data on stack was a spilled
pointer. We can actually fix this by using fib_params->l.family directly
which would also simplify the code a bit. For the next hop, the fib lookup
helper updates the field depending on whether the next hop is a v4 or v6
address.

Related: #24335
Reported-by: Paul Chaignon <paul@cilium.io>
Co-developed-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/hyperjump ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants