New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.11 Backports 2023-04-20 #25011
v1.11 Backports 2023-04-20 #25011
Conversation
[ upstream commit 82d5adc ] When the quiet mode was enabled, the $(CLANG) var would previously have a '@' at the start, which caused errors while attempting to make in this directory because it would be run in the context of a shell rather than directly as a make instruction. Move the $(QUIET) to the start of individual make instructions to resolve this compilation failure. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 29fe753 ] When TESTPKGS only contains unprivileged tests, the PRIV_TEST_PKGS_EVAL evaluation previously filtered down to an empty list of packages that should be tested, and would pass this empty list to dirname, which then reports: dirname: missing operand Try 'dirname --help' for more information. This could happen multiple times during evaluation of the Makefile, and littered the output with no meaning. This could occur even if the privileged tests are not the target being run. Fix this by always adding "." to the list, which evaluates to the root directory of the repository. This causes dirname to succeed. Then, we can filter this root directory back out since there are no privileged tests at this level of the repository. This finally quietens the error. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 9f7e24f ] Pass the verbosity parameters --quiet V=0 to quieten Travis output. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 7f9e0f9 ] The travis logs are frequently polluted with >10K lines of docker pull and build output. While this helps to track the ongoing progress of docker builds that take a long time, it's mostly useless output that developers must scroll past in order to see the useful output. Quieten that output in Travis to just the trigger of building the image plus the final summary that docker outputs. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 9bb669b ] Etcd quorum checks are falsely reported as failing even though connection to etcd is intact. This can cause health checks to fail in both the agent and the operator. This happens due to a deadlock in pkg/kvstore/etcd after a prolonged downtime of etcd. Status check errors are being sent into a channel for the purpose of recreating kvstore connections in clustermesh. However when clustermesh is not used, messages from this channel are never read. The channel uses a buffer of size 128. After etcd has been down long enough to generate 128 errors, we enter a deadlock state. Agent / operator will continue to report etcd quorum failures and inturn health check failures until they're restarted. statusChecker() -> isConnectedAndHasQuorum() -> waitForInitLock() -> goroutine -> for -> ( initLockSucceeded <- err ) -> chan initLockSucceeded returned -> Block on receiving messages from initLockSucceeded channel -> e.statusCheckErrors <- e.latestErrorStatus [Blocked after 128 entries] Blocked goroutines captured from cilium 1.10 operator: goroutine 3309 [chan send, 13456 minutes]: github.com/cilium/cilium/pkg/kvstore.(*etcdClient).statusChecker(0xc00017db30) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:1171 +0x75a created by github.com/cilium/cilium/pkg/kvstore.connectEtcdClient /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:801 +0x679 goroutine 7838665 [chan send, 13505 minutes]: g.com/c/cilium/pkg/kvstore.(*etcdClient).waitForInitLock.func1(-,-,-,-) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:433 +0x449 created by github.com/cilium/cilium/pkg/kvstore.(*etcdClient).waitForInitLock /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:425 +0x7f Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit a6d0142 ] This commit marks the CiliumEndpointSlice feature as beta (as per the documentation) in the agent flag description. This is necessary because users don't always read the full documentation before turning agent flags on. While at it, change the flag description to match the wording of other flags. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 7398de6 ] Currently, the wireguard subsystem in the cilium agent caches information about the known peers by node name only. This can lead to conflicts in case of clustermesh, if nodes in different clusters have the same name, causing in turn connectivity issues. Hence, let's switch to identify peers by full name (i.e., cluster-name/node-name) to ensure uniqueness. This modification does not introduce issues during upgrades, since the node ID is not propagated to the datapath. Fixes: cilium#24227 Reported-by: @oulinbao <oulinbao@163.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
/test-backport-1.11 Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.9/2822/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Job 'Cilium-PR-K8s-1.17-kernel-4.9' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.17-kernel-4.9/1315/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/2589/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Job 'Cilium-PR-K8s-1.20-kernel-4.9' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.9/1819/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Job 'Cilium-PR-K8s-1.19-kernel-4.9' hit: #25099 (93.06% similarity) Job 'Cilium-PR-K8s-1.17-kernel-4.9' hit: #25100 (94.45% similarity) Job 'Cilium-PR-K8s-1.21-kernel-4.9' has 1 failure but they might be new flake since it also hit 1 known flake: #25101 (93.46% similarity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PR looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mine too. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for #20031
LGTM for my commit. Thanks. |
I would personally advocate for the first solution. |
👍 I think it's fair to say that Cilium branch CI should test the intersection of {k8s releases that were out at the time of cilium release} and {k8s releases supported by the cloud now}. The logical consequence is that if those groups become disjoint then we no longer need to run such tests, as you propose. We could of course opt to add additional support, but I think it's in everyone's best interest to encourage newer software in cases like this rather than provide more reasons for folks to stick on the older releases. |
Cool, I will disable AKS testing on 1.11 then! |
/test-1.16-netnext |
/mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 👍 created #25098 |
/mlh new-flake Cilium-PR-K8s-1.17-kernel-4.9 👍 created #25099 |
/mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 👍 created #25100 |
/mlh new-flake Cilium-PR-K8s-1.20-kernel-4.9 👍 created #25101 |
/test-1.17-4.9 |
/test-1.19-4.9 |
/test-1.20-4.9 |
/test-1.21-4.9 |
Please find below the test failure summary: test-1.17-4.9
test-1.19-4.9
test-1.21-4.9 |
/test-1.21-4.9 Re-run this test suite to confirm if there is any issue or just flake. |
Once this PR is merged, you can update the PR labels via:
or with