Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci-multicluster: Fix post-test information gathering #16712

Conversation

gandro
Copy link
Member

@gandro gandro commented Jun 30, 2021

This fixes two issues with the ci-multicluster log gathering:

  1. cilium clustermesh status can unfortunately block even when
    not using --wait.
    See cilium clustermesh status blocks even when not using --wait cilium-cli#384
    This causes a failing job to be cancelled, thus causing any
    post-failiure steps to be skipped as well.
    This commit works around the issue by manually adding a timeout to
    the post-failiure cilium clustermes status invocation.

  2. Sysdumps were not collected for both clusters. This is fixed by
    manually switching to each cluster context using kubectl.

This fixes two issues with the ci-multicluster log gathering:

  1. `cilium clustermesh status` can unfortunately block even when
     not using `--wait`. See cilium/cilium-cli#384
     This causes a failing job to be cancelled, thus causing any
     post-failiure steps to be skipped as well.
     This commit works around the issue by manually adding a timeout to
     the post-failiure `cilium clustermes status` invocation.

  2. Sysdumps were not collected for both clusters. This is fixed by
     manually switching to each cluster context using `kubectl`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro added area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/ci This PR makes changes to the CI. labels Jun 30, 2021
@gandro gandro requested review from a team as code owners June 30, 2021 16:16
@gandro
Copy link
Member Author

gandro commented Jun 30, 2021

I have successfully tested this in #16705 - the artificial failure caused the correct double sysdump to be created: https://github.com/cilium/cilium/pull/16705/checks?check_run_id=2954156565

Artifacts:

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM. Do you want to make changes in cilium-cli (https://github.com/cilium/cilium-cli/blob/daed305be87be2ba609a077d766ddd8b404c5623/.github/workflows/multicluster.yaml#L131-L138) accordingly for retrieving sysdump for both clusters? Otherwise I'll make a note of doing that later.

@nbusseneau
Copy link
Member

We should probably also make a note of removing the workarounds once we bump up the CLI version to a release with the fix for cilium/cilium-cli#384.

@aanm aanm merged commit 2641808 into cilium:master Jul 2, 2021
@gandro
Copy link
Member Author

gandro commented Jul 5, 2021

Thanks LGTM. Do you want to make changes in cilium-cli (https://github.com/cilium/cilium-cli/blob/daed305be87be2ba609a077d766ddd8b404c5623/.github/workflows/multicluster.yaml#L131-L138) accordingly for retrieving sysdump for both clusters? Otherwise I'll make a note of doing that later.

Oh, good catch. I will open a PR on cilium/cilium-cli as well then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants