Make LRP restore test logic robust and optimized #16194

aditighag · 2021-05-18T00:58:51Z

The goal of the test is to check that, once the original svc entry is restored upon LRP removal, curl requests to a clusterIP svc endpoint are redirected to both backends. The current test logic expects the same backend to be selected for all the client pods simultaneously, which can lengthen the test duration. This doesn't seem right, since backend selection is not deterministic. More importantly, we only need each client pod to hit both backends at least once.

Flip the order in which we loop over backends and client pods: loop over client pods first, then make curl calls until we hit both backends on each client pod. Also, keep state about which backends have already been tested successfully, in order to avoid duplicate curl calls and make the test logic deterministic.
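For readers without the diff at hand, the reordered loop can be sketched roughly as follows. This is a minimal illustration, not the code in test/k8sT/Services.go: `curlFunc`, `waitForAllBackends`, and the `maxAttempts` bound are hypothetical names and assumptions introduced here, and the real test uses the Cilium test helpers to run curl from inside the client pods.

```go
package main

import "fmt"

// curlFunc is a hypothetical stand-in for the test helper that curls the
// service from a client pod and reports which backend answered.
type curlFunc func(clientPod string) (backend string, err error)

// waitForAllBackends loops over client pods first. For each pod it keeps
// curling until every expected backend has answered at least once, or until
// maxAttempts (an assumed bound) is exhausted.
func waitForAllBackends(clients, backends []string, curl curlFunc, maxAttempts int) error {
	for _, client := range clients {
		// Per-client state: backends that have not answered this pod yet.
		pending := make(map[string]struct{}, len(backends))
		for _, be := range backends {
			pending[be] = struct{}{}
		}
		// Stop curling as soon as nothing is pending, so no extra calls are made.
		for attempt := 0; attempt < maxAttempts && len(pending) > 0; attempt++ {
			be, err := curl(client)
			if err != nil {
				return fmt.Errorf("curl from %s failed: %w", client, err)
			}
			delete(pending, be)
		}
		if len(pending) > 0 {
			return fmt.Errorf("%s: %d backend(s) never selected after %d attempts",
				client, len(pending), maxAttempts)
		}
	}
	return nil
}

func main() {
	// Fake curl that alternates between two backends for each client.
	backends := []string{"be1", "be2"}
	calls := map[string]int{}
	fake := func(client string) (string, error) {
		calls[client]++
		return backends[calls[client]%len(backends)], nil
	}
	err := waitForAllBackends([]string{"client-a", "client-b"}, backends, fake, 10)
	fmt.Println("all backends reached:", err == nil)
}
```

Tracking the pending set per client is what lets the loop exit early: a pod that has already seen both backends makes no further curl calls, and no pod has to wait for the others to land on the same backend.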

More details - #16154 (comment).

Deferred to a follow-up PR: the LRP test case that checks whether traffic only goes to the local backend doesn't seem reliable, since the curl request made by the test may have avoided the remote backend purely by chance. I think the reliable way to validate correctness is to check for a LocalRedirect service entry and its corresponding backends in the cilium service list.
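As a rough illustration of that follow-up idea, the check could compare an already-parsed service list against the expected local backends. Everything below is an assumption for the sake of the sketch: `serviceEntry` is a hypothetical, simplified view of one service list row, `hasLocalRedirect` is not an existing helper, and how the list is obtained and parsed from the agent is deliberately left out.

```go
package main

import "fmt"

// serviceEntry is a hypothetical, simplified view of one row of the
// service list; the real output format is not modelled here.
type serviceEntry struct {
	Frontend string
	Type     string
	Backends []string
}

// hasLocalRedirect reports whether entries contains a LocalRedirect service
// for the given frontend whose backends match the expected set.
func hasLocalRedirect(entries []serviceEntry, frontend string, want []string) bool {
	for _, e := range entries {
		if e.Frontend != frontend || e.Type != "LocalRedirect" {
			continue
		}
		if len(e.Backends) != len(want) {
			return false
		}
		seen := make(map[string]bool, len(e.Backends))
		for _, b := range e.Backends {
			seen[b] = true
		}
		for _, w := range want {
			if !seen[w] {
				return false
			}
		}
		return true
	}
	return false
}

func main() {
	// Made-up addresses, for illustration only.
	entries := []serviceEntry{
		{Frontend: "10.0.0.10:80", Type: "LocalRedirect", Backends: []string{"10.1.0.5:80"}},
	}
	fmt.Println(hasLocalRedirect(entries, "10.0.0.10:80", []string{"10.1.0.5:80"}))
}
```

Unlike a curl-based check, this does not depend on which backend happens to be selected for a given request.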

Fixes: #16154

aditighag · 2021-05-18T01:00:27Z

test-only --focus="K8sServicesTest.* LRP" --kernel_version="net-next"

aditighag · 2021-05-18T20:57:15Z

LRP focused tests passed. Marking it as ready for review. /cc @Weil0ng

aanm · 2021-05-19T01:47:31Z

Marking for backport, as the test failed in backport PR #16210.

test/k8sT/Services.go

Weil0ng

LGTM! Thanks for the fix, I think this is the correct test logic as you described.

nbusseneau

As discussed in the community meeting, rationale looks sensible to me. Thanks!

aditighag

@joamaki Thanks for the review, PTAL.

test/k8sT/Services.go

aditighag · 2021-05-21T06:07:39Z

test-only --focus="K8sServicesTest.* LRP" --kernel_version="net-next"

aditighag · 2021-05-21T16:34:05Z

Test failures -

1.16-netnext : https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/576/console

Setting status of 07392becc9d9752966c6bbf97d50391be02efea3 to FAILURE with url https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/576/ and message: 'Build finished. '
Using context: k8s-1.16-kernel-netnext (test-1.16-netnext)
Global Slack Notifier try posting to slack. However some error occurred
TeamDomain :cilium
Channel :#testing
Message :

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)

1.20-net-next failure looks legit. Marking the PR as draft for now.

aditighag · 2021-05-21T16:35:03Z

Focused test runs passed - https://jenkins.cilium.io/job/Cilium-PR-Tests-Kernel-Focus/219/testReport/Suite-k8s-1/20/.

pchaigno · 2021-05-26T19:21:58Z

> I'll push a new commit to address Paul's comment about incorrect "Fixes" commit SHA mentioned in the 1st commit.

According to the logs, you also rebased, so let's retrigger the e2e tests 😞

pchaigno · 2021-05-26T19:22:22Z

test-gke

pchaigno · 2021-05-26T19:22:29Z

test-1.20-4.19

pchaigno · 2021-05-26T19:22:33Z

test-1.16-netnext

aditighag · 2021-05-26T19:34:06Z

> > I'll push a new commit to address Paul's comment about incorrect "Fixes" commit SHA mentioned in the 1st commit.
>
> According to the logs, you also rebased, so let's retrigger the e2e tests 😞

@pchaigno Why do we need to trigger the e2e tests? The changes are localized to only LRP test cases, and there were no merge conflicts.

pchaigno · 2021-05-26T20:55:11Z

Because other changes included in the rebase could cause the test to fail (e.g., changes to the agent). The only case where we can merge without restarting tests is when the diff doesn't include any changes that are tested (e.g., only code comment or commit description changes). That's not the case with a rebase.

aditighag · 2021-05-27T16:26:04Z

Relevant tests have passed. Marking it as ready for merge again.

aditighag added the release-note/ci label May 18, 2021

aditighag requested a review from a team as a code owner May 18, 2021 00:58

aditighag requested review from a team, nbusseneau and joamaki May 18, 2021 00:58

maintainer-s-little-helper bot assigned nbusseneau and joamaki May 18, 2021

aditighag marked this pull request as draft May 18, 2021 00:59

aditighag requested a review from Weil0ng May 18, 2021 01:00

maintainer-s-little-helper bot assigned Weil0ng May 18, 2021

aditighag mentioned this pull request May 18, 2021

CI: K8sServicesTest Checks local redirect policy LRP restores service when removed #16154

Closed

aditighag marked this pull request as ready for review May 18, 2021 20:46

aanm added the needs-backport/1.10 label May 19, 2021

joamaki requested changes May 19, 2021

test/k8sT/Services.go: three review threads (two outdated), all resolved

Weil0ng approved these changes May 19, 2021


maintainer-s-little-helper bot unassigned Weil0ng May 19, 2021

nbusseneau approved these changes May 19, 2021


maintainer-s-little-helper bot unassigned nbusseneau May 19, 2021

aditighag commented May 21, 2021

test/k8sT/Services.go: two review threads (both outdated), resolved

aditighag force-pushed the pr/lrp-test-robust branch from e285745 to 07392be May 21, 2021 06:07

joamaki approved these changes May 21, 2021


maintainer-s-little-helper bot unassigned joamaki May 21, 2021

aditighag marked this pull request as draft May 21, 2021 16:35

aditighag force-pushed the pr/lrp-test-robust branch from 07392be to 33e6658 May 25, 2021 16:53

michi-covalent approved these changes May 26, 2021


pchaigno removed the ready-to-merge label May 26, 2021

pchaigno approved these changes May 26, 2021


aditighag added ready-to-merge and needs-backport/1.9 labels May 27, 2021

maintainer-s-little-helper bot added this to Needs backport from master in 1.9.8 May 27, 2021

aanm added this to Needs backport from master in 1.9.9 May 27, 2021

aanm removed this from Needs backport from master in 1.9.8 May 27, 2021

aanm merged commit 6a3e846 into cilium:master May 28, 2021

aditighag mentioned this pull request May 28, 2021

CI: Make LRP connectivity test case reliable #16348

Open

qmonnet mentioned this pull request Jun 1, 2021

v1.10 backports 2021-06-01 #16384

Merged

23 tasks

qmonnet added backport-pending/1.10 and removed needs-backport/1.10 labels Jun 1, 2021

qmonnet mentioned this pull request Jun 2, 2021

v1.9 backports 2021-06-02 #16394

Merged

6 tasks

qmonnet added backport-pending/1.9 and removed needs-backport/1.9 labels Jun 2, 2021

aanm added backport-done/1.10 and removed backport-pending/1.10 labels Jun 4, 2021

aanm mentioned this pull request Jun 16, 2021

Prepare for release v1.10.1 #16544

Merged

joestringer moved this from Needs backport from master to Backport done to v1.9 in 1.9.9 Jul 19, 2021

joestringer mentioned this pull request Jul 19, 2021

Prepare for release v1.9.9 #16932

Merged
