-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.14 Backports 2024-04-22 #32104
Merged
Merged
v1.14 Backports 2024-04-22 #32104
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jschwinger233
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
labels
Apr 22, 2024
giorio94
approved these changes
Apr 22, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PRs look good, thanks!
jschwinger233
force-pushed
the
pr/v1.14-backport-2024-04-22-01-00
branch
from
April 22, 2024 07:38
2f000df
to
435188c
Compare
/test-backport-1.14 |
marseel
approved these changes
Apr 22, 2024
jrajahalme
approved these changes
Apr 22, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commits LGTM
[ upstream commit 100e625 ] This prevents possible shenanigans caused by search domains possibly configured on the runner, and propagated to the pods. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 804d5f0 ] When number of concurrent dns requests was moderately high, there was a chance that some of the gorutines would get stuck waiting for response. Contains fix from cilium/dns#10 Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit c869e6c ] Do not return an error from xds server when the context is cancelled, as this is part of normal operation, and we test for this in server_e2e_test. This resolves a test flake: panic: Fail in goroutine after has completed Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 3cde59c ] The main test goroutine might be completed before checks on the server goroutine are completed, hence cause the below panic issue. This commit is to defer the streamDone channel close to make sure the error check on the stream server is done before returning from the test. We keep the time check on the wait in the end of each test to not stall the tests in case the stream server fails to exit. Panic error ``` panic: Fail in goroutine after Test/ServerSuite/TestRequestStaleNonce has completed ``` Testing was done as per below: ``` $ go test -count 500 -run Test/ServerSuite/TestRequestStaleNonce ./pkg/envoy/xds/... ok github.com/cilium/cilium/pkg/envoy/xds 250.866s ``` Fixes: #31855 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit df6afbd ] [ backporter's note: moved changes from pkg/envoy/xds/stream_test.go to pkg/envoy/xds/stream.go as v1.14 doesn't have the former file ] Return io.EOF if test channel was closed, rather than returning a nil request. This mimics the behavior of generated gRPC code, which never returns nil request with a nil error. This resolves a test flake with this error logs: time="2024-04-16T08:46:23+02:00" level=error msg="received nil request from xDS stream; stopping xDS stream handling" subsys=xds xdsClientNode="node0~10.0.0.0~node0~bar" xdsStreamID=1 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit d3c1fee ] Speed up tests by eliminating CacheUpdateDelay, as it is generally not needed. When needed, replace with IsCompletedInTimeChecker that waits for upto MaxCompletionDuration before returning, in contrast with IsCompletedChecker that only returns the current state without any wait. This change makes the server_e2e_test tests run >1000x faster. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 4efe9dd ] TestRequestStaleNonce test code was written with the assumption that no response would be reveived for a request with a stale nonce, and a second SendRequest was done right after with the correct nonce value. This caused two responses to be returned, and the first one could have been with the old version of the resources. Remove this duplicate SendRequest. This resolves test flakes like this: --- FAIL: Test/ServerSuite/TestRequestStaleNonce (0.00s) server_e2e_test.go:784: ... response *discoveryv3.DiscoveryResponse = &discoveryv3.DiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), VersionInfo:"3", Resources:[]*anypb.Any{(*anypb.Any)(0x40003a63c0), (*anypb.Any)(0x40003a6410)}, Canary:false, TypeUrl:"type.googleapis.com/envoy.config.v3.DummyConfiguration", Nonce:"3", ControlPlane:(*corev3.ControlPlane)(nil)} ("version_info:\"3\" resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource1\"}} resources:{[type.googleapis.com/envoy.config.route.v3.RouteConfiguration]:{name:\"resource0\"}} type_url:\"type.googleapis.com/envoy.config.v3.DummyConfiguration\" nonce:\"3\"") ... VersionInfo string = "4" ... Resources []protoreflect.ProtoMessage = []protoreflect.ProtoMessage{(*routev3.RouteConfiguration)(0xe45380)} ... Canary bool = false ... TypeUrl string = "type.googleapis.com/envoy.config.v3.DummyConfiguration" Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 75d144c ] Stream timeout is a duration we use in tests to make sure the stream does not stall for too long. In production we do not have such a timeout at all, and in fact the requests are long-lived and responses are only sent when there is something (new) to send. Test stream timeout was 2 seconds, and it would occasionally cause a test flake, especially if debug logging is enabled. This seems to happen due to goroutine scheduling, and for this reason debug logging should not be on for these tests. Bump the test stream timeout to 4 seconds to further reduce the chance of a test flake due to it. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 350e9d3 ] [ backporter's note: minor conflicts in .github/workflows/tests-clustermesh-upgrade.yaml ] The goal being to slow down the rollout process, to better highlight possible connection disruption occurring in the meanwhile. At the same time, this also reduces the overall CPU load caused by datapath recompilation, which is a possible additional cause for connection disruption flakiness. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit 7d2505e ] [ backporter's note: minor conflicts in .github/workflows/tests-clustermesh-upgrade.yaml ] The default IPAM mode is cluster-pool, which gets automatically overwritten by the Cilium CLI to kubernetes when running on kind. However, the default helm value gets restored upon upgrade due to --reset-values, causing confusion and possible issues. Hence, let's explicitly configure it to kubernetes, to prevent changes. Similarly, let's configure a single replica for the operator. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit a0d7d37 ] So that it gets actually executed. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gray Liang <gray.liang@isovalent.com>
[ upstream commit 1e28a10 ] [ backporter's note: minor conflicts in .github/workflows/tests-clustermesh-upgrade.yaml ] Hubble relay is not deployed in this workflow, hence it doesn't make sense to wait for the image availability. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
[ upstream commit 0c211e1 ] [ backporter's note: minor conflicts in .github/workflows/tests-clustermesh-upgrade.yaml ] As it simplifies troubleshooting possible connection disruptions. However, let's configure monitor aggregation to medium (i.e., the maximum, and default value) to avoid the performance penalty due to the relatively high traffic load. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
jschwinger233
force-pushed
the
pr/v1.14-backport-2024-04-22-01-00
branch
from
April 23, 2024 03:57
435188c
to
285a502
Compare
/test-backport-1.14 |
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Apr 23, 2024
aanm
approved these changes
Apr 23, 2024
maintainer-s-little-helper
bot
removed
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Apr 23, 2024
ti-mo
approved these changes
Apr 23, 2024
This was referenced Apr 29, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Once this PR is merged, a GitHub action will update the labels of these PRs: