.github: Capture hubble flows when smoke test fails #16968
b18ab9a to 58d9858
The deployment pieces of this definitely make sense to me, hubble-relay will help us get flows from the entire cluster.
The part where we gather hubble flows outside of sysdump seems broken to me: sysdump should be gathering everything we need all at once rather than requiring a separate additional step. I believe that @tklauser and/or @bmcustodio may be looking into this particular aspect for the new sysdump tool though so maybe we can just go with this for now.
One minor comment below (applies to both workflows).
Were you able to test this already? I see that it is opened from your own branch, and therefore these workflow changes are not being tested in the runs that GitHub reports on this PR. There was a PSA during the last community meeting during the testing section around what's necessary to test workflow changes.
@joestringer I was able to test these changes: I forced the workflow to fail to make sure that the hubble flows were captured and included in the sysdump. I think this is in line with the community meeting PSA, as it says to use […]
Re: the sysdump tool including this already: I looked into that and saw that it wasn't implemented yet.
58d9858 to 8a42973
Hmm, true, that's what it says. I guess that since we limit permissions to "read-only", GitHub only runs workflows for contributors who have had PRs merged, and these are only smoke tests running stuff locally in the workflow, these particular tests are OK to run under […]
There's probably no point in holding back on getting this information in these CI tests since the work is already done, but this shouldn't stop us from working on getting these flows in sysdumps as well.
8a42973 to fdbf49a
The […] Since we are using […]
I removed @nebril from assignment since @nbusseneau provided the ci-structure review. It looks like the tests are failing, though, so this is not ready-to-merge.
f4993c5 to aed400f
LGTM except the last commit
3c0d9f2 to ed4e9b9
Moving to draft until we figure out why […]
ed4e9b9 to ebc8e6e
This is useful when debugging the conformance tests to understand why pod connectivity may be broken.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
ebc8e6e to d71d14b
- name: Run conformance test (e.g. connectivity check without external 1.1.1.1 and www.google.com)
  run: |
    kubectl apply -f ${{ env.CONFORMANCE_TEMPLATE }}
    kubectl wait --for=condition=Available --all deployment --timeout=${{ env.TIMEOUT }}

- name: Capture cilium-sysdump
  if: ${{ failure() }}
  # The following is needed to prevent hubble from receiving an empty
  # file (EOF) on stdin and displaying no flows.
  shell: 'script -q -e -c "bash --noprofile --norc -eo pipefail {0}"'
This was the change that was required to resolve #16968 (comment).
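For readers unfamiliar with the stdin issue mentioned in the workflow comment, here is a minimal sketch of how a process can tell what kind of standard input it was given. The helper name `describe_stdin` is hypothetical and is not part of the workflow; the sketch only illustrates the general mechanism (a non-terminal stdin that yields EOF immediately), not how hubble itself decides what to read.

```shell
# Hypothetical helper: report how the current process sees its stdin.
describe_stdin() {
  if [ -t 0 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# Redirecting stdin from an empty source mimics a non-interactive runner:
# the process has no terminal attached and reads EOF straight away.
redirected="$(describe_stdin < /dev/null)"
echo "stdin from /dev/null: $redirected"

# Zero bytes available means a reader hits EOF on the first read.
bytes="$(wc -c < /dev/null | tr -d ' ')"
echo "bytes readable: $bytes"
```

Because `script` runs its command inside a pseudo-terminal, the same helper would be expected to report a terminal when invoked through the `script -q -e -c ...` wrapper shown in the diff, which is presumably why the wrapper avoids the empty-stdin behavior.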
🤯
All reviews are in, the change has been validated manually by forcing the test to fail and inspecting the sysdump, and the relevant jobs are passing now. Marking ready to merge.
If the `hubble observe` command fails for whatever reason, we don't want to prevent the collection of the sysdump.

Fixes: 0cbb855 (".github: Capture hubble flows when smoke test fails")
Fixes: cilium#16968
Signed-off-by: Chris Tarazi <chris@isovalent.com>
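The idea in this follow-up commit can be sketched as below. This is not the actual workflow change: `capture_flows` and `collect_sysdump` are hypothetical stand-ins for the flow capture and sysdump steps, and the sketch only assumes that the step runs with errexit enabled, as the `bash ... -eo pipefail` shell in the diff suggests.

```shell
set -e  # abort on the first unhandled failure, similar to the workflow's bash -e

# Hypothetical stand-in for the flow capture; always fails to simulate an error.
capture_flows() {
  return 1
}

# Hypothetical stand-in for the sysdump collection.
collect_sysdump() {
  echo "sysdump collected"
}

# A bare `capture_flows` call would stop the script here under `set -e`.
# Handling the failure with `||` lets the remaining commands run.
flows_status="ok"
capture_flows || flows_status="failed"

sysdump_result="$(collect_sysdump)"
echo "flows: $flows_status, $sysdump_result"
```

Recording the failure in a variable rather than discarding it with `|| true` keeps the outcome visible in the logs, which may help when working out why no flows were captured.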
This is useful when debugging to understand why pod connectivity may be
broken.
Opened while working on #16608