Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Retrieve the private interface in an Eventually #16990

Merged
merged 1 commit into from Aug 4, 2021

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Jul 23, 2021

Upon sysdump inspection of
#16479, it's possible that the
failures in retrieving the private interface are time-sensative. As
#16479 (comment)
points out, the output of ip a actually showed that the interface the
test was looking for existed, so it's very likely a timing issue. This
commit attempts to fix this flake by retrying until we are able to
extract out the interface.

Fixes: #16479

@christarazi christarazi added area/CI Continuous Integration testing issue or flake area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/ci This PR makes changes to the CI. kind/enhancement This would improve or streamline existing functionality. labels Jul 23, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 23, 2021
Upon sysdump inspection of
cilium#16479, it's possible that the
failures in retrieving the private interface are time-sensative. As
cilium#16479 (comment)
points out, the output of `ip a` actually showed that the interface the
test was looking for existed, so it's very likely a timing issue. This
commit attempts to fix this flake by retrying until we are able to
extract out the interface.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi
Copy link
Member Author

test-1.16-netnext

@christarazi
Copy link
Member Author

test-1.20-4.19

@christarazi
Copy link
Member Author

test-1.19-5.4

@christarazi christarazi marked this pull request as ready for review July 25, 2021 22:56
@christarazi christarazi requested a review from a team July 25, 2021 22:56
@christarazi christarazi requested a review from a team as a code owner July 25, 2021 22:56
@christarazi christarazi removed the request for review from jrfastab July 25, 2021 22:57
@pchaigno
Copy link
Member

Do we have any idea why this would be happening? When the test fails, it's usually not the first test to run, so the enp0s8 interface has been set up for a while already.

@christarazi
Copy link
Member Author

christarazi commented Jul 26, 2021

@pchaigno It might have something to do with Cilium being re-installed and it re-configuring the devices. I tried reproducing locally but I couldn't even when forcing the relevant previous test to run as #16479 (comment) pointed out. I can't think of any other explanation because the test actually spits out ip a when we fail to find the private iface and that error msg contains enp0s8.

Actually another thing that I just thought about is that it could be netlink being unreliable. It's possible that the first call to ip a led to an empty response back hence enp0s8 missing. An Eventually should cover both cases. I don't think this is a Cilium bug in any way, so I don't think we are papering over anything.

@pchaigno
Copy link
Member

Actually another thing that I just thought about is that it could be netlink being unreliable. It's possible that the first call to ip a led to an empty response back hence enp0s8 missing.

That's what I had in mind. Ideally, we'd print, in the error message, the exact ip a output we've used. I just couldn't figure out a clean way to do that at the time :-/

An Eventually should cover both cases. I don't think this is a Cilium bug in any way, so I don't think we are papering over anything.

Yep, you're probably right.

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@nbusseneau
Copy link
Member

Does this PR really need a review from @cilium/bpf?

@pchaigno
Copy link
Member

Does this PR really need a review from @cilium/bpf?

No, but we can't remove the team review request, so Chris removed the reviewer. I guess it can be merged now if tests are passing.

@nbusseneau
Copy link
Member

I'll let Chris mark as ready-to-merge if he feels it's ready, otherwise I suggest we run test-1.19-5.4 a few times since it's the only job I've seen fail with this so far.

@christarazi
Copy link
Member Author

test-1.19-5.4

@christarazi
Copy link
Member Author

test-1.19-5.4

@aanm aanm merged commit 4d20ec9 into cilium:master Aug 4, 2021
@christarazi christarazi deleted the pr/christarazi/fix-16479 branch August 4, 2021 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/CI-improvement Topic or proposal to improve the Continuous Integration workflow kind/enhancement This would improve or streamline existing functionality. release-note/ci This PR makes changes to the CI.
Projects
None yet
7 participants