Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.github: Capture hubble flows when smoke test fails #16968

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Jul 21, 2021

This is useful when debugging to understand why pod connectivity may be
broken.

Opened while working on #16608

@christarazi christarazi added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. kind/enhancement This would improve or streamline existing functionality. labels Jul 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 21, 2021
@christarazi christarazi added the area/CI-improvement Topic or proposal to improve the Continuous Integration workflow label Jul 21, 2021
@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch 11 times, most recently from b18ab9a to 58d9858 Compare July 23, 2021 06:44
@christarazi christarazi marked this pull request as ready for review July 23, 2021 06:44
@christarazi christarazi requested review from a team as code owners July 23, 2021 06:44
@christarazi christarazi requested a review from nebril July 23, 2021 06:44
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deployment pieces of this definitely make sense to me, hubble-relay will help us get flows from the entire cluster.

The part where we gather hubble flows outside of sysdump seems broken to me: sysdump should be gathering everything we need all at once rather than requiring a separate additional step. I believe that @tklauser and/or @bmcustodio may be looking into this particular aspect for the new sysdump tool though so maybe we can just go with this for now.

One minor comment below (applies to both workflows).

Were you able to test this already? I see that it is opened from your own branch, and therefore these workflow changes are not being tested in the runs that GitHub reports on this PR. There was a PSA during the last community meeting during the testing section around what's necessary to test workflow changes.

.github/workflows/tests-smoke.yaml Outdated Show resolved Hide resolved
@christarazi
Copy link
Member Author

christarazi commented Jul 23, 2021

@joestringer I was definitely able to test these new changes as I forced it to fail to make sure that capturing hubble flows worked and included in the sysdump. And I think this is in line with the community meeting PSA as it says to use pull_request trigger , which these smoke tests are under :).

Re: the sysdump tool including this already. I looked into that and saw that it wasn't implemented yet.

@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch from 58d9858 to 8a42973 Compare July 23, 2021 17:35
@joestringer
Copy link
Member

Hmm, true, that's what it says. I guess given that we limit permissions to "read-only" and that github only runs workflows for contributors who have had PRs merged, and that these are only smoke tests running stuff locally in the workflow, that's why these particular tests are OK to run under pull_request and run whatever logic is submitted in PRs from external branches.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably no point in holding back on getting this information in these CI tests since the work is already done, but this shouldn't stop us from working on getting these flows in sysdumps as well.

@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch from 8a42973 to fdbf49a Compare July 23, 2021 18:15
@nbusseneau
Copy link
Member

nbusseneau commented Jul 23, 2021

Hmm, true, that's what it says. I guess given that we limit permissions to "read-only" and that github only runs workflows for contributors who have had PRs merged, and that these are only smoke tests running stuff locally in the workflow, that's why these particular tests are OK to run under pull_request and run whatever logic is submitted in PRs from external branches.

The read-only scope applies to the GITHUB_TOKEN and "only" forbids a potential attacker getting hold of the token to read the repository, and not write to it (i.e. no malicious git push). However it does not forbid the workflow from reading the repo's secrets.

Since we are using pull_request here, the workflow itself runs in the context of the PR branch. If the PR branch comes from a fork, the workflow only has access to the fork's secrets, not cilium repo's secrets, so we are good :)

@joestringer joestringer removed the request for review from nebril July 23, 2021 18:21
@joestringer
Copy link
Member

I removed @nebril from assignment since @nbusseneau provided ci-structure review. Looks like the tests are failing though so not ready-to-merge.

@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch 2 times, most recently from f4993c5 to aed400f Compare July 23, 2021 19:55
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the last commit

@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch 2 times, most recently from 3c0d9f2 to ed4e9b9 Compare July 23, 2021 21:22
@christarazi
Copy link
Member Author

Moving to draft until we figure out why hubble observe is not printing any flows even though hubble status indicates that it's connected (as well as the hubble relay logs in the sysdump): https://github.com/cilium/cilium/runs/3147403555?check_suite_focus=true#step:10:29

@christarazi christarazi marked this pull request as draft July 23, 2021 23:44
@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch from ed4e9b9 to ebc8e6e Compare July 28, 2021 22:46
This is useful when debugging the conformance tests to understand why
pod connectivity may be broken.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/hubble-flows-smoke-test branch from ebc8e6e to d71d14b Compare July 29, 2021 16:43
- name: Run conformance test (e.g. connectivity check without external 1.1.1.1 and www.google.com)
run: |
kubectl apply -f ${{ env.CONFORMANCE_TEMPLATE }}
kubectl wait --for=condition=Available --all deployment --timeout=${{ env.TIMEOUT }}

- name: Capture cilium-sysdump
if: ${{ failure() }}
# The following is needed to prevent hubble from receiving an empty
# file (EOF) on stdin and displaying no flows.
shell: 'script -q -e -c "bash --noprofile --norc -eo pipefail {0}"'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the change that was required to resolve #16968 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

@christarazi christarazi marked this pull request as ready for review July 29, 2021 16:44
@christarazi
Copy link
Member Author

All reviews are in and the change has been validated manually by forcing it to fail and inspecting the sysdump, and the relevant jobs are passing now. Marking ready to merge.

@christarazi christarazi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 29, 2021
@nathanjsweet nathanjsweet merged commit 0cbb855 into cilium:master Aug 2, 2021
@christarazi christarazi deleted the pr/christarazi/hubble-flows-smoke-test branch August 2, 2021 17:20
christarazi added a commit to christarazi/cilium that referenced this pull request Aug 2, 2021
If the `hubble observe` command fails for whatever reason, we don't want
to prevent the collection of the sysdump.

Fixes: 0cbb855 (".github: Capture hubble flows when smoke test
fails")
Fixes: cilium#16968

Signed-off-by: Chris Tarazi <chris@isovalent.com>
joestringer pushed a commit that referenced this pull request Aug 4, 2021
If the `hubble observe` command fails for whatever reason, we don't want
to prevent the collection of the sysdump.

Fixes: 0cbb855 (".github: Capture hubble flows when smoke test
fails")
Fixes: #16968

Signed-off-by: Chris Tarazi <chris@isovalent.com>
krishgobinath pushed a commit to krishgobinath/cilium that referenced this pull request Oct 20, 2021
If the `hubble observe` command fails for whatever reason, we don't want
to prevent the collection of the sysdump.

Fixes: 0cbb855 (".github: Capture hubble flows when smoke test
fails")
Fixes: cilium#16968

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/CI-improvement Topic or proposal to improve the Continuous Integration workflow kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants