CI: RuntimePrivilegedUnitTests: arping failed in TestArpPingHandling #14125

Closed
pchaigno opened this issue Nov 23, 2020 · 12 comments · Fixed by #14501
Labels
area/CI: Continuous Integration testing issue or flake
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
needs/triage: This issue requires triaging to establish severity and next steps.

Comments

@pchaigno
Member

https://jenkins.cilium.io/job/Cilium-PR-Runtime-4.9/2657/testReport/junit/(root)/Suite-runtime/RuntimePrivilegedUnitTests_Run_Tests/
372c1e59_RuntimePrivilegedUnitTests_Run_Tests.zip

level=error msg="arping failed" error=timeout interface=veth0 ipAddr=9.9.9.250 subsys=linux-datapath
node_linux_test.go:1038:
 c.Assert(found, check.Equals, true)
... obtained bool = false
... expected bool = true

START: node_linux_test.go:133: linuxPrivilegedIPv4OnlyTestSuite.TearDownTest
PASS: node_linux_test.go:133: linuxPrivilegedIPv4OnlyTestSuite.TearDownTest	0.052s

FAIL: node_linux_test.go:950: linuxPrivilegedIPv4OnlyTestSuite.TestArpPingHandling

pchaigno added the area/CI and ci/flake labels Nov 23, 2020
brb self-assigned this Nov 23, 2020
@brb
Member

brb commented Nov 23, 2020

The test was recently introduced by #14070.

brb added the needs/triage label Nov 23, 2020
pchaigno added this to To Do (1.9 - Daily Flakes) in CI Force Nov 26, 2020
@pchaigno
Member Author

I bumped this to daily flakes since I'm seeing it several times per day, most recently in #14113.

@brb
Member

brb commented Nov 27, 2020

#14201 might fix this.

@tklauser
Member

@aanm
Member

aanm commented Dec 8, 2020

@brb still happening

@brb
Member

brb commented Dec 9, 2020

Managed to reproduce it locally by running the test in a loop; continuing the investigation.

@brb
Member

brb commented Dec 9, 2020

Switching the arping implementation to #13112 didn't resolve the issue. Also, tcpdump shows that the arping response, which the failure log reports as timed out, is actually received by the host.

@brb
Member

brb commented Dec 21, 2020

Tried running the tests under strace for a whole day; no luck reproducing the issue.

@ungureanuvladvictor
Member

brb assigned aanm Jan 4, 2021
CI Force automation moved this from To Do (1.9 - Daily Flakes) to Fixed / Done Jan 4, 2021
brb added a commit that referenced this issue Jan 13, 2021
It has been observed that sometimes arping fails with "i/o timeout".
Further investigation [1] has shown that this happens due to the kernel
not sending packets. Therefore, to mitigate the issue, try multiple times
to send the request if the timeout error is encountered.

[1]: #14125 (comment)

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit that referenced this issue Jan 14, 2021
aanm pushed a commit that referenced this issue Jan 18, 2021
brb added a commit to brb/arping that referenced this issue Jan 18, 2021
It has been observed that sometimes arping fails with "i/o timeout".
Further investigation [1] has shown that this happens due to the kernel
not sending packets. Therefore, to mitigate the issue, try multiple times
to send the request if the timeout error is encountered.

[1]: cilium/cilium#14125 (comment)

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit to brb/arping that referenced this issue Jan 18, 2021
brb added a commit to brb/arping that referenced this issue Jan 18, 2021
brb added a commit to brb/arping that referenced this issue Jan 18, 2021
aanm pushed a commit to cilium/arping that referenced this issue Jan 18, 2021
brb added a commit that referenced this issue Apr 28, 2021
[ upstream commit 2284c76 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit that referenced this issue Apr 28, 2021
[ upstream commit 2284c76 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
tklauser added a commit that referenced this issue Apr 28, 2021
[ upstream commit 2284c76 ]

Co-authored-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
rolinh pushed a commit that referenced this issue Apr 29, 2021
[ upstream commit 2284c76 ]

Co-authored-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
rolinh pushed a commit that referenced this issue Apr 29, 2021
[ upstream commit 2284c76 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>