New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run all ginkgo tests on GitHub actions #25713
Conversation
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/225/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
a7ed947
to
dcf3ade
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only looked at the contributing change, which is kind.sh - printing the version is fine :)
dcf3ade
to
3e67a76
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, some questions below.
### | ||
kafka: "K8sKafkaPolicyTest" | ||
# K8sKafkaPolicyTest Kafka Policy Tests KafkaPolicies | ||
### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comments with test names will be a pain to maintain, do we really need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is really useful to get an quick idea oh what tests are being covered by the focus regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I think they will become obsolete very quickly as people will forget to add them when creating new tests, even though we're not supposed to it still happens xD
case ${{ matrix.k8s-version }} in | ||
1.27) | ||
kube_image="kindest/node:v1.27.1@sha256:b7d12ed662b873bd8510879c1846e87c7e676a79fefc93e17b2a52989d3ff42b" | ||
kernel="bpf-next-20230526.105339@sha256:4133d4e09b1e86ac175df8d899873180281bb4220dc43e2566c47b0241637411" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason for handling kind and kernel versions like this instead of with matrix
env variables? I think the less bash processing the better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Having the kernel and kube_image as part of the matrix it so would be considered a two new dimensions in the matrix and increase the number of combinations to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can this not be handled via include
, such that the matrix does not have 2 dimensions but the proper kernel env variables are inserted for the proper K8s version? 馃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, also one of my first approaches :D After working with it for a couple days it was really frustrating to maintain the workflow https://github.com/cilium/cilium/actions/runs/4573014881/workflow#L175-L497
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @nbusseneau about this. I've tested out @nbusseneau proposal here and it works really well.
Unfortunately, GitHub workflow files lack a GitHub context that allows access to the job name during their execution. This presents a significant challenge when generating the JUnit filename because it is generated based on the job name and matrix name. On datastudio, we retrieve the job name for the GitHub API, which typically appears as something like "name": "E2E Test (1.26, datapath-bgp-custom-lrp)"
. However, the JUnit filename incorporates all matrix dimensions and looks like this: E2E Test (1.26, datapath-bgp-custom-lrp, 5.4-20230526.105339@sha256:523ff0c81ed9be73a2d0d02a64c72655225efdd32e35220f530ca3eff3eada3c).xml
. This discrepancy makes it exceedingly difficult to establish a clear correlation between the JUnit filename and the job name. We need to know which JUnit file was generated by which GitHub job so that we can store it properly on datastudio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we set up the filenames for sysdump and JUnit with ${{ join(matrix.*, '-') }}
:
./cilium-cli sysdump --output-filename cilium-sysdump-${{ join(matrix.*, '-') }}-final || true
...
junit_filename="${{ env.job_name }} (${{ join(matrix.*, ', ') }}).xml"
To get back to the same behaviour as initially proposed in the PR, we can use ${{ matrix.k8s-version }}-${{ matrix.focus }}
:
./cilium-cli sysdump --output-filename cilium-sysdump-${{ matrix.k8s-version }}-${{ matrix.focus }}-final || true
...
junit_filename="${{ env.job_name }} (${{ matrix.k8s-version }}-${{ matrix.focus }}).xml"
case ${{ matrix.focus }} in | ||
agent-chaos) | ||
focus="${{ env.agent-chaos }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a similar fashion, instead of matrix
+ env
glued together with this bash processing, wouldn't it be more convenient to have matrix.focus
handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bash hack is simply to give a better UX in terms of maintaining the page of focus that we run and visualizing the page of test results. I've tried that in the beginning and they looked like this https://github.com/cilium/cilium/actions/runs/4495666151
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, then I would take same approach as suggested in the discussion above with include
.
cid=$(docker create quay.io/cilium/cilium-cli-ci:latest ls) | ||
docker cp $cid:/usr/local/bin/cilium ./cilium-cli | ||
docker rm $cid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this :D
Any reason to use latest
? We typically pin the CLI version in other workflows, and it's then handled by Renovate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not on me, I'm simply copy pasting what other workflows are doing 馃槄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair 馃槃
c493d50
to
428dfaf
Compare
Nice work! I think the overall approach here where we roll out the new logic, get at least a week's worth of feedback, then disable the old approach is a reasonable way to mitigate the risks associated with migrating this infrastructure. I glanced through the PR at a high level and it made sense to me, but I didn't go into any depth on specific changes. I note that during the discussion during the community meeting, we raised that Jenkins will continue to run on older branches. This is fine for now. It may be worth exploring the backport to 1.12 and maybe 1.11, but probably not 1.10. Given the upcoming 1.14 release, that could provide a path to migrate entirely off Jenkins definitely this year if not in a couple of months. As for the missing datapath tests, will you file issues to follow up on the reduction of coverage?
I think that having focus groups would be helpful to be able to easily narrow down groups of tests. I glanced at the four datapath groups that you listed in the PR description, and I couldn't tell the difference at first look. I think that failing tests may result in similar assumptions amongst readers unless we add some sort of numeric identifier for the groups. I'd be in favour of reintroducing that. I'll note my approval here though I'll skip the GitHub codeowners review since I didn't look closely at the implementation. |
Not all cluster have the specific node names but can have the labels to designated which node is which. We should use ExecInHostNetNSByLabel to execute some commands inside a particular host by its label and not by their name. Signed-off-by: Andr茅 Martins <andre@cilium.io>
This The deploymentManager variable was not initialized on this file which was causing a panic to happen if the CI was running focusing on particular tests of this file. stack trace of panic: github.com/cilium/cilium/test/ginkgo-ext.beforeEach.func1.1() /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:443 +0x47 panic({0x2318820, 0x3ecde60}) /opt/hostedtoolcache/go/1.20.4/x64/src/runtime/panic.go:884 +0x213 github.com/cilium/cilium/test/k8s.DeployCiliumOptionsAndDNS(0x0, {0xc000580360?, 0x1000002a53f01?}, 0xc00206cc78?) /home/runner/work/cilium/cilium/test/k8s/assertion_helpers.go:204 +0x27 github.com/cilium/cilium/test/helpers.(*DeploymentManager).DeployCilium(0xc000291950, 0xc00193cf00?, 0x2704c57?) /home/runner/work/cilium/cilium/test/helpers/manifest.go:305 +0x3c github.com/cilium/cilium/test/k8s.prepareHostPolicyEnforcement(0xc000964690) /home/runner/work/cilium/cilium/test/k8s/datapath_configuration.go:581 +0xa5 github.com/cilium/cilium/test/k8s.glob..func20.7.23.1() /home/runner/work/cilium/cilium/test/k8s/services.go:628 +0xc6 github.com/cilium/cilium/test/ginkgo-ext.BeforeAll.func1() /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:195 +0x69 github.com/cilium/cilium/test/ginkgo-ext.beforeEach.func1() /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:447 +0xe6 github.com/cilium/cilium/test/ginkgo-ext.applyAdvice.func1({0x3f5e378, 0x0, 0x0}) /home/runner/work/cilium/cilium/test/ginkgo-ext/scopes.go:575 +0x8a github.com/cilium/cilium/test.Test(0xc000031040) /home/runner/work/cilium/cilium/test/test_suite_test.go:112 +0x484 testing.tRunner(0xc000031040, 0x2825e08) /opt/hostedtoolcache/go/1.20.4/x64/src/testing/testing.go:1576 +0x10b created by testing.(*T).Run /opt/hostedtoolcache/go/1.20.4/x64/src/testing/testing.go:1629 +0x3ea Signed-off-by: Andr茅 Martins <andre@cilium.io>
Some cluster configurations might not support any IPv6 connectivity which makes some tests to fail. They were failing because cilium health probe was unable to connect to the IPv6 IP addresses on Kind with Kubernetes <= v1.19, which is the environments that we will disable IPv6 on Cilium. Signed-off-by: Andr茅 Martins <andre@cilium.io>
Some cluster don't have IPv6 connectivity to the outside would which makes some tests to fail. To workaround this issue we should only run the IPv6 connectivity to the outside by simulating with the help of a node, when available, that is not running Cilium. Otherwise we will skip the IPv6 connectivity test entirely. Signed-off-by: Andr茅 Martins <andre@cilium.io>
428dfaf
to
90dbc0e
Compare
/test |
90dbc0e
to
02251a0
Compare
/test |
02251a0
to
71618a3
Compare
/test |
This commit adds the ability to run the ginkgo tests on GH actions. Running the ginkgo tests on GH actions will allow us to a better parallelization of the test runs and smaller queues which results in faster test feedback loop for developers. The changes performed in the test/ directory were necessary so that we can setup the infrastructure in GitHub actions. Unfortunately GitHub does not have IPv6 connectivity and some tests had to be modified to take that into account by simulating an external host with IPv6 support. Also, because kind doesn't mount "/proc/sys/net/core/default_qdisc" we can't test bandwidth manager tests. Signed-off-by: Andr茅 Martins <andre@cilium.io>
71618a3
to
6375c4d
Compare
Last tests were done with the commit from main branch afb6b2b executed in here: https://github.com/cilium/cilium/actions/runs/5176732237 All tests were passing, I had to made some changes into the workflow file that doesn't require a new Merging. Thank for the comments / reviews. |
Description
The current CI setup runs all ginkgo tests on Jenkins. Some of the runtime tests were already transferred into Github actions #25516 and this PR is moving the missing pieces.
Testing
Example of some runs:
Pros vs Cons
Pros 馃帀
Cons 馃槥
/proc/sys/net/core/default_qdisc
from the host:Notes
Re-introduce the
f-XX
prefix?The #25516 had a prefix (
f-XX
) for "focus" matrix dimension but it was removedbecause it didn't make sense. However, it might make sense to introduce them in this PR because it's hard to navigate
through the web UI between jobs that have the same. For example:
datapath-egress-tun-disabled-endpoint-routes- disabled
,datapath-egress-tun-disabled-endpoint-routes-enabled
,datapath-egress-tun-vxlan-endpoint-routes- disabled
anddatapath-egress-tun-vxlan-endpoint-routes-enabled
.