Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Fix graceful termination test flake #18050

Merged

Conversation

aditighag
Copy link
Member

@aditighag aditighag commented Nov 30, 2021

The graceful termination test apps [1] are updated to fix a flake. Specifically, added read and write deadlines while making socket calls on the server side. This way the server doesn't block on the socket calls when SIGTERM event is received on termination.

While at it, also updated the test logic to validate that connectivity between client and server is intact at least for the configured terminationGracePeriodInSeconds duration.

Ran the test locally a few times, and it passed every time.

[1] https://github.com/cilium/graceful-termination-test-apps

Signed-off-by: Aditi Ghag aditi@cilium.io

Fixes: #18045

@aditighag aditighag added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Nov 30, 2021
@aditighag aditighag requested a review from a team as a code owner November 30, 2021 02:45
@aditighag aditighag requested review from a team, pchaigno and joamaki November 30, 2021 02:45
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.11.0 Nov 30, 2021
@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.Checks graceful termination of service endpoints" --k8s_version=1.22 --kernel_version="419"

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/360/console.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.Checks graceful termination of service endpoints" --k8s_version=1.22 --kernel_version="419"

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/361/console.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.Checks graceful termination of service endpoints" --k8s_version=1.22 --kernel_version="419"

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/362/console.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.Checks graceful termination of service endpoints" --k8s_version=1.22 --kernel_version="419"

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/363/console.

@aditighag
Copy link
Member Author

4/4 Jenkins runs passed.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-1.22-4.19

Edit - Failed in vagrant bootstrap -

restarting portmap and nfs-kernel-server services
23:27:40  Failed in branch Copy code and boot vms

Need to check if the test failed to destroy VMs as it never got this stage - https://github.com/cilium/cilium/blob/master/test/vagrant-ci-start.sh#L16.

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests NodePort

Failure Output

FAIL: Request from testclient-s6k78 pod to service http://[fd03::a9d6]:10080 failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

@borkmann borkmann added the priority/high This is considered vital to an upcoming release. label Nov 30, 2021
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-1.22-4.19

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-K8s-1.22-kernel-4.19/34/.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.*" --k8s_version=1.22 --kernel_version="419"

Edit : Running the entire Services test suite to expedite the process. The changes are limited to a test case anyway.

Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/365/console.

@aditighag
Copy link
Member Author

aditighag commented Nov 30, 2021

test-only --focus="K8sServicesTest.*" --k8s_version=1.22 --kernel_version="419"

Edit : Passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/366/console.

@joestringer
Copy link
Member

FYI the failing test K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests NodePort should now be quarantined based on #18051 due to the high incidence of recent failures in the tree. You would need to rebase to pick up the quarantining of that test.

test/k8sT/Services.go Outdated Show resolved Hide resolved
@aditighag aditighag force-pushed the pr/aditighag/fix-graceful-term-test branch from ee744c4 to 5b1e83a Compare December 1, 2021 00:20
@aditighag
Copy link
Member Author

aditighag commented Dec 1, 2021

test-only --focus="K8sServicesTest.*" --k8s_version=1.22 --kernel_version="419"

Edit :

Still waiting to schedule task
‘Jenkins’ doesn’t have label ‘baremetal’; ‘jenkins-node-gke-fixed-1’ doesn’t have label ‘baremetal’; ‘node-emerging-perch’ is offline; ‘node-poetic-lizard’ is offline; ‘node-romantic-humpback’ is offline
Aborted by unknown
[Pipeline] // node
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] sh
Error when executing always post condition:
org.jenkinsci.plugins.workflow.steps.MissingContextVariableException: Required context class hudson.FilePath is missing
Perhaps you forgot to surround the code with a step that provides this, such as: node
	at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:266)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:260)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:176)

Posted in the #testing slack channel.

@aditighag
Copy link
Member Author

aditighag commented Dec 1, 2021

The latest push contains rebased commits along with the reverted commit that quarantined the test (this PR fixes the test).

@aditighag
Copy link
Member Author

aditighag commented Dec 1, 2021

@aditighag aditighag added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 1, 2021
@pchaigno pchaigno removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 1, 2021
@pchaigno
Copy link
Member

pchaigno commented Dec 1, 2021

Removing ready-to-merge label as @cilium/ci-structure wasn't covered with an approving review. I'll review now.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test improvement looks good to me with just one nit on error message. As per the previous discussions, I however think we should remove the second commit for now.

test/k8sT/Services.go Show resolved Hide resolved
test/k8sT/Services.go Outdated Show resolved Hide resolved
test/k8sT/Services.go Outdated Show resolved Hide resolved
@aditighag aditighag force-pushed the pr/aditighag/fix-graceful-term-test branch from 5b1e83a to 988e7ec Compare December 2, 2021 02:50
This reverts commit cbbea39

Signed-off-by: Aditi Ghag <aditi@cilium.io>
The graceful termination test apps [1] are updated to make the
test logic to fix flakes. Specifically, added read and write
deadlines while making socket calls on the server side. This
way the server doesn't block on the socket calls when `SIGTERM`
event is received on termination.

While at it, also updated the test logic to validate that
connectivity between client and server is intact at least
for the configured `terminationGracePeriodInSeconds` duration.

[1] https://github.com/cilium/graceful-termination-test-apps

Signed-off-by: Aditi Ghag <aditi@cilium.io>
The library function provides the same functionality.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag aditighag force-pushed the pr/aditighag/fix-graceful-term-test branch from 988e7ec to cf0334b Compare December 2, 2021 02:50
@aditighag
Copy link
Member Author

aditighag commented Dec 2, 2021

test-only --focus="K8sServicesTest.*" --k8s_version=1.22 --kernel_version="419"

Passed, hit #18072 -

19:16:49  Summarizing 2 Failures:
19:16:49  
19:16:49  [Fail] K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with vxlan [It] Tests with secondary NodePort device 
19:16:49  /home/jenkins/workspace/Cilium-PR-Tests-Kernel-Focus/src/github.com/cilium/cilium/test/k8sT/service_helpers.go:617
19:16:49  
19:16:49  [Fail] K8sServicesTest Checks service across nodes Tests NodePort BPF [It] Tests with secondary NodePort device 
19:16:49  /home/jenkins/workspace/Cilium-PR-Tests-Kernel-Focus/src/github.com/cilium/cilium/test/k8sT/service_helpers.go:617

https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/371/console

@aditighag
Copy link
Member Author

aditighag commented Dec 2, 2021

test-only --focus="K8sServicesTest.*" --k8s_version=1.22 --kernel_version="419"

Edit : Passed, hit #18072 again

https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/372/

@aditighag
Copy link
Member Author

test-only --focus="K8sServicesTest.*" --k8s_version=1.23 --kernel_version="49"

@borkmann borkmann merged commit 1987b67 into cilium:master Dec 2, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.11.0 Dec 2, 2021
@qmonnet qmonnet mentioned this pull request Dec 2, 2021
18 tasks
@nathanjsweet nathanjsweet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Dec 2, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.0 Dec 2, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.0 Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. priority/high This is considered vital to an upcoming release. release-note/ci This PR makes changes to the CI.
Projects
No open projects
1.11.0
Backport done to v1.11
9 participants