-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workflows/datapath: Fix always-passing step #24918
Conversation
91052b9
to
b0723c9
Compare
Support for flag --keep-registry was removed in commit 2975c75 ("contrib/kind: no longer create local docker registry") because it is no longer necessary. Because the ConformanceDatapath workflow was broken and always passing, it was not noticed that this workflow was still using the flag. Fixes: 2975c75 ("contrib/kind: no longer create local docker registry") Signed-off-by: Paul Chaignon <paul@cilium.io>
Flags -it to 'kubectl exec' are unnecessary given the command being executed is not an interactive one and couldn't be anyway. Worse, the -i flag causes this workflow step to exit immediately with a zero error code. The subsequent commands, including the actual connectivity tests, are therefore silently skipped. That was causing the ConformanceDatapath workflow to seemingly always succeed its connectivity tests. This commit fixes it. I was able to reproduce a similar behavior locally: $ cat test.sh #!/bin/bash echo before kubectl -n kube-system -c cilium-agent exec -i ds/cilium -- echo test echo after $ ./test.sh & [4] 13155 $ before [4]+ Stopped ./test.sh If removing -i, then all of 'before', 'test', and 'after' are displayed. It's unclear why this kubectl flag causes this behavior, but we should fix nonetheless. Fixes: edef93f ("gh/workflows: Split ci-dp encrypt tests into separate matrix configs") Signed-off-by: Paul Chaignon <paul@cilium.io>
b0723c9
to
375d617
Compare
|
||
./cilium-cli connectivity test --datapath --collect-sysdump-on-failure \ | ||
--sysdump-output-filename "cilium-sysdump-${{ matrix.name }}-<ts>" | ||
./cilium-cli connectivity test --collect-sysdump-on-failure \ | ||
--sysdump-output-filename "cilium-sysdump-${{ matrix.name }}-<ts>" | ||
./contrib/scripts/kind-down.sh --keep-registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to your PR, but why do we need to kill Kind? The runner won't be reused by any other test run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. Could it be a leftover from when the workflow was running connectivity tests twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely. We can remove it to save a few secs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing 🤯 🎉 Any idea how to prevent such issues in the future?
By considering any tests that passes 30 times in a row as broken 😸 No real idea. It's tricky to detect always-passing bugs :-/ |
@@ -428,7 +428,7 @@ jobs: | |||
./cilium-cli install ${{ steps.vars.outputs.cilium_install_defaults }} | |||
|
|||
./cilium-cli status --wait | |||
kubectl -n kube-system exec -it daemonset/cilium -- cilium status | |||
kubectl -n kube-system exec daemonset/cilium -- cilium status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I wonder why it exits with 0 though. Does it actually run the connectivity test now? :)
In your case in test.sh
nothing is displayed, because you put your script to background, and as it wants the stdin, it waits in the background under sigstop
. (Do a fg
in bash and it will display you the test
string and the script will terminate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your case in
test.sh
nothing is displayed, because you put your script to background, and as it wants the stdin, it waits in the background undersigstop
. (Do afg
in bash and it will display you thetest
string and the script will terminate.)
Yep. What I'm not entirely clear on is why when doing fg
"after" doesn't show.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does for me
$ ./z.sh &
[1] 49590
$ before
[1]+ Stopped ./z.sh
$ fg
./z.sh
test
after
Anyway, if this fixes the issue, then fine.
We have reviews from the datapath team. Missing reviews from the CI but this isn't changing the structure of the file or the way we run the workflow (though it would be nice to get their opinion on why it may have failed). Merging to avoid having other regressions sneak in. |
First commit fixes an incorrect flag in the ConformanceDatapath workflow. Second commit fixes an always-passing step in the same workflow. See commits for details.
Passing run with test commit: https://github.com/cilium/cilium/actions/runs/4730060326/jobs/8393311157.
Fixes: #24296.