-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci-multicluster: Fix post-test information gathering #16712
ci-multicluster: Fix post-test information gathering #16712
Conversation
This fixes two issues with the ci-multicluster log gathering: 1. `cilium clustermesh status` can unfortunately block even when not using `--wait`. See cilium/cilium-cli#384 This causes a failing job to be cancelled, thus causing any post-failiure steps to be skipped as well. This commit works around the issue by manually adding a timeout to the post-failiure `cilium clustermes status` invocation. 2. Sysdumps were not collected for both clusters. This is fixed by manually switching to each cluster context using `kubectl`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
I have successfully tested this in #16705 - the artificial failure caused the correct double sysdump to be created: https://github.com/cilium/cilium/pull/16705/checks?check_run_id=2954156565 Artifacts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM. Do you want to make changes in cilium-cli
(https://github.com/cilium/cilium-cli/blob/daed305be87be2ba609a077d766ddd8b404c5623/.github/workflows/multicluster.yaml#L131-L138) accordingly for retrieving sysdump for both clusters? Otherwise I'll make a note of doing that later.
We should probably also make a note of removing the workarounds once we bump up the CLI version to a release with the fix for cilium/cilium-cli#384. |
Oh, good catch. I will open a PR on cilium/cilium-cli as well then. |
This fixes two issues with the ci-multicluster log gathering:
cilium clustermesh status
can unfortunately block even whennot using
--wait
.See
cilium clustermesh status
blocks even when not using--wait
cilium-cli#384This causes a failing job to be cancelled, thus causing any
post-failiure steps to be skipped as well.
This commit works around the issue by manually adding a timeout to
the post-failiure
cilium clustermes status
invocation.Sysdumps were not collected for both clusters. This is fixed by
manually switching to each cluster context using
kubectl
.