Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make LRP restore test logic robust and optimized #16194

Merged
merged 2 commits into from May 28, 2021

Conversation

aditighag
Copy link
Member

@aditighag aditighag commented May 18, 2021

The goal of the test is to check if curl to a clusterIP svc endpoint is redirected to both the backends when the original svc entry is restored upon LRP removal. The current test logic expects the same backend should be selected for all the pod clients simultaneously, and this can lengthen test duration. This doesn't seem right since backend selection is not exactly deterministic. More importantly, we only need both backends to be selected at least once for all the client pods.

Flip the order in which we loop over backends and client pods. Loop over client pods first, and then making curl calls to until we hit both the backends on each of the client pods. Also, keep state about which backends have been successfully tested in order to avoid making some of the duplicate curl calls, and make the test logic deterministic.

More details - #16154 (comment).

Deferred to a follow-up PR - Looking at the LRP test case where we check if traffic only goes to the local backend, it doesn't seem reliable since it's possible that the curl request that the test made wasn't redirected to the remote backend by chance. I think the reliable way to validate the correctness is to check for a LocalRedirect service entry and its corresponding backends in the cilium service list .

Fixes: #16154

@aditighag aditighag added the release-note/ci This PR makes changes to the CI. label May 18, 2021
@aditighag aditighag requested a review from a team as a code owner May 18, 2021 00:58
@aditighag aditighag requested review from a team, nbusseneau and joamaki May 18, 2021 00:58
@aditighag aditighag marked this pull request as draft May 18, 2021 00:59
@aditighag
Copy link
Member Author

test-only --focus="K8sServicesTest.* LRP" --kernel_version="net-next"

@aditighag
Copy link
Member Author

LRP focused tests passed. Marking it as ready for review. /cc @Weil0ng

@aanm
Copy link
Member

aanm commented May 19, 2021

marking for backport as it failed in one backport PR #16210

test/k8sT/Services.go Show resolved Hide resolved
test/k8sT/Services.go Outdated Show resolved Hide resolved
test/k8sT/Services.go Outdated Show resolved Hide resolved
Copy link
Contributor

@Weil0ng Weil0ng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix, I think this is the correct test logic as you described.

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the community meeting, rationale looks sensible to me. Thanks!

Copy link
Member Author

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joamaki Thanks for the review, PTAL.

test/k8sT/Services.go Outdated Show resolved Hide resolved
test/k8sT/Services.go Outdated Show resolved Hide resolved
@aditighag
Copy link
Member Author

test-only --focus="K8sServicesTest.* LRP" --kernel_version="net-next"

@aditighag
Copy link
Member Author

Test failures -

1.16-netnext : https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/576/console

Setting status of 07392becc9d9752966c6bbf97d50391be02efea3 to FAILURE with url https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/576/ and message: 'Build finished. '
Using context: k8s-1.16-kernel-netnext (test-1.16-netnext)
Global Slack Notifier try posting to slack. However some error occurred
TeamDomain :cilium
Channel :#testing
Message :

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)

1.20-net-next failure looks legit. Marking the PR as draft for now.

@aditighag
Copy link
Member Author

@aditighag aditighag marked this pull request as draft May 21, 2021 16:35
@pchaigno
Copy link
Member

I'll push a new commit to address Paul's comment about incorrect "Fixes" commit SHA mentioned in the 1st commit.

According to the logs, you also rebased, so let's retrigger the e2e tests 😞

@pchaigno pchaigno removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 26, 2021
@pchaigno
Copy link
Member

test-gke

@pchaigno
Copy link
Member

test-1.20-4.19

@pchaigno
Copy link
Member

test-1.16-netnext

@aditighag
Copy link
Member Author

aditighag commented May 26, 2021

I'll push a new commit to address Paul's comment about incorrect "Fixes" commit SHA mentioned in the 1st commit.

According to the logs, you also rebased, so let's retrigger the e2e tests 😞

@pchaigno Why do we need to trigger the e2e tests? The changes are localized to only LRP test cases, and there were no merge conflicts.

@pchaigno
Copy link
Member

Because other changes included in the rebase could cause the test to fail (e.g., changes to the agent). The only case where we can merge without restarting tests is when the diff doesn't include any changes that is tested (e.g., only code comment or commit description changes). That's not the case with a rebase.

@aditighag
Copy link
Member Author

Relevant tests have passed. Marking it as ready for merge again.

@aditighag aditighag added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. needs-backport/1.9 labels May 27, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.8 May 27, 2021
@aanm aanm added this to Needs backport from master in 1.9.9 May 27, 2021
@aanm aanm removed this from Needs backport from master in 1.9.8 May 27, 2021
@aanm aanm merged commit 6a3e846 into cilium:master May 28, 2021
@qmonnet qmonnet mentioned this pull request Jun 1, 2021
23 tasks
@qmonnet qmonnet mentioned this pull request Jun 2, 2021
6 tasks
@joestringer joestringer moved this from Needs backport from master to Backport done to v1.9 in 1.9.9 Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
No open projects
1.9.9
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

CI: K8sServicesTest Checks local redirect policy LRP restores service when removed
8 participants