Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Verifier tests: Keep generated object files and logs on test failure #25862

Merged
merged 3 commits into from Jun 7, 2023

Conversation

qmonnet
Copy link
Member

@qmonnet qmonnet commented Jun 2, 2023

The CI workflow for testing Cilium's programs against the verifier is supposed to upload its log files and the generated files on failures. However, we lose some of this data, because:

  • Multiple jobs (for different kernel versions) upload their log and object files to the same artifact, overwriting previous uploads in the process, so that we only keep the files uploaded by the last job to complete. We can address that by using different artifact names for the different jobs.

  • The test itself runs make -C bpf/ clean before testing the programs of each source file, thus removing the object files previously generated. Instead, we can clean up the directory just once before running the tests, not between two tests.

  • At last, each source file is compiled multiple times with different sets of options, resulting in only the last version of this object file being kept. The solution implemented here consists in renaming these object files (on test failure) to make sure we keep them around.

@qmonnet qmonnet added area/CI Continuous Integration testing issue or flake sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/ci This PR makes changes to the CI. labels Jun 2, 2023
@qmonnet qmonnet requested review from a team as code owners June 2, 2023 13:09
@qmonnet
Copy link
Member Author

qmonnet commented Jun 2, 2023

Tested on another PR, for example on this run.

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! One nit inline in case a respin is needed.

test/verifier/verifier_test.go Outdated Show resolved Hide resolved
@qmonnet qmonnet force-pushed the pr/ci-verifier-keep-objfiles branch from 6068539 to 8178f17 Compare June 2, 2023 14:43
@ti-mo
Copy link
Contributor

ti-mo commented Jun 2, 2023

Let's give all .o's a unique name instead and include it as part of the name of the subtest. I think renaming is confusing, and it'd be nice to have all objs in the zip for debugging.

@qmonnet
Copy link
Member Author

qmonnet commented Jun 2, 2023

@ti-mo: I don't mind adding all .o's to the artifacts. But that still means renaming them, right? Only we rename them all, and earlier, instead of doing it only on failure.

We could try to generate them with the right name from the beginning, but given we're executing a call to make to compile them, it will be quite involved if we have to update the Makefile to support that?

Copy link
Contributor

@viktor-kurchenko viktor-kurchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@qmonnet qmonnet force-pushed the pr/ci-verifier-keep-objfiles branch from 8178f17 to cb94261 Compare June 5, 2023 14:43
@qmonnet
Copy link
Member Author

qmonnet commented Jun 5, 2023

Updated to rename all object files regardless of exit status. I checked (locally) that all object files are preserved. @ti-mo does this correspond to what you had in mind?

@ti-mo
Copy link
Contributor

ti-mo commented Jun 6, 2023

Updated to rename all object files regardless of exit status. I checked (locally) that all object files are preserved. @ti-mo does this correspond to what you had in mind?

Yes, thanks! The make invocation is indeed what deterred me from doing the renaming at the time. We can start using Cilium's Go clang pipeline here at some point for more flexibility. In the meantime, this will do!

@ti-mo
Copy link
Contributor

ti-mo commented Jun 6, 2023

Not sure why ci-verifier is not starting since the last push was 18 hours ago.

@ti-mo
Copy link
Contributor

ti-mo commented Jun 6, 2023

/test

@qmonnet
Copy link
Member Author

qmonnet commented Jun 6, 2023

/test-runtime

@qmonnet qmonnet force-pushed the pr/ci-verifier-keep-objfiles branch from cb94261 to 2ef102c Compare June 6, 2023 13:18
@qmonnet
Copy link
Member Author

qmonnet commented Jun 6, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Iptables Skip conntrack for pod traffic

Failure Output

FAIL: Found 2 k8s-app=cilium logs matching list of errors that must be investigated:

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/505/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

The verifier tests include a run of "make -C bpf/ clean" prior to
testing the programs in each BPF source files. While this sounds like a
sane practice, this means that at the end of the run, we're left with
only one object file under bpf/, the one used by the last tests
(namely: bpf_sock.o). While the object files are trivial to rebuild
locally, this prevents the CI workflow to find and upload the relevant
object files for instances where the verifier's logs are not enough to
debug.

Let's run instead a single "make -C bpf/ clean" on startup.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
For a given source file, we build the object multiple times, with a
different set of build options. This results in object files being lost
on failure. To address this, we rename the object files to include the
iteration number, so that they're still present when we upload to the CI
artifact.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
In the tests-datapath-verifier workflow, all jobs generated from the
matrix (for running the tests with different kernel versions) upload
files to the same artifact on failure. This means that a job may
overwrite the upload from a previous job, and in the end we only get the
uploads from the last job to complete. If several jobs fail with
different errors, then we lose useful debugging information.

Let's have the jobs upload their files to separate artifacts, named
after the kernel version in use.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet qmonnet force-pushed the pr/ci-verifier-keep-objfiles branch from 2ef102c to bda8d4d Compare June 7, 2023 08:32
@qmonnet
Copy link
Member Author

qmonnet commented Jun 7, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' hit: #25958 (89.87% similarity)

@qmonnet
Copy link
Member Author

qmonnet commented Jun 7, 2023

Runtime tests on Jenkins and GitHub are currently broken due to #25968, and net-next on Jenkins hit #25958 - both unrelated to the current PR.

None of these workflows are affected by the change in this PR. I'm marking as ready to merge.

@qmonnet qmonnet added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 7, 2023
@dylandreimerink dylandreimerink merged commit e3580a0 into cilium:main Jun 7, 2023
59 of 62 checks passed
@qmonnet qmonnet deleted the pr/ci-verifier-keep-objfiles branch June 7, 2023 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants