-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.11 backports 2021-12-10 #18232
v1.11 backports 2021-12-10 #18232
Conversation
[ upstream commit 3bd4ad6 ] We use path filters in the CodeQL workflow to avoid running it for unrelated changes. We're however missing the workflow file itself in the path filters. As a result, the CodeQL workflow isn't run when the GitHub Actions it uses are updated by dependabot. This commit fixes it. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit fdfd78d ] The microk8s target previously relied on an env var declared at the top-level of the Makefile, which will conflict with an upcoming commit. Move it into a target-specific variable. This also means we can remove the redundant tagging operation. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit b12ecd1 ] Add a new kind-image target, mirroring the microk8s target for building and tagging the image. To use: $ make kind <install Cilium> $ make kind-image $ kubectl -n kube-system set image daemonset/cilium cilium-agent=localhost:5000/cilium/cilium-dev:local Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 444c58c ] It seems that in some cases (e.g. for RCs) the minor version reported by `kubectl version --client -o json` contains trailing non-numeric characters, e.g. ``` { "clientVersion": { "major": "1", "minor": "23+", "gitVersion": "v1.23.0-rc.0", "gitCommit": "a117a51aa497a516106a3115963437218493c8d2", "gitTreeState": "clean", "buildDate": "2021-11-24T05:31:49Z", "goVersion": "go1.17.3", "compiler": "gc", "platform": "linux/amd64" } } ``` This breaks the check where the reported version is compared against the current k8s env version, leading to `kubectl` being downloaded by each test and creating flakes such as ``` cmd: "curl --output /tmp/kubectl/1.23/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.23.0-rc.0/bin/linux/amd64/kubectl && chmod +x /tmp/kubectl/1.23/kubectl" exitCode: 23 duration: 90.516369ms stdout: err: exit status 23 stderr: % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed ^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Warning: Failed to create the file /tmp/kubectl/1.23/kubectl: Text file busy ^M 0 44.4M 0 1177 0 0 14530 0 0:53:25 --:--:-- 0:53:25 14530 curl: (23) Failed writing body (0 != 1177) FAIL: failed to ensure kubectl version: failed to download kubectl ``` Where `kubectl` from another test is still in use and attempted to be overwritten by the current test. Fix this by stripping any trailing non-numeric characters from the k8s minor version for the purpose of version comparison. Fixes: 75fbebb ("test/helpers: use rc.0 as the default version of kubectl") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 6c169f6 ] Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 4b16aee ] Calculate MaximumAllocationIdentity during module initialization is not enough as user specified 'ClusterID' hasn't been parsed during this phase. This patch fixed the problem by recalculating the max identity during runtime initialization when `ClusterID > 0`. Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f4d59d1 ] Minikube 1.12.0 or later is required to use the --cni flag [1]. 1 - kubernetes/minikube@9e95435 Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 11c17f0 ] This reverts commit ca4ed8d ("test/Services: Quarantine 'Checks service on same node'"). The original intention was to quarantine the test from cilium#17919. However, ca4ed8d quarantined the /wrong/ test case which was not the failing one since the code in mentioned commit only covers the IPv6 one and in the test the IPv4 one was failing. Meanwhile, the IPv4 test for 'Checks service on same node' was still left active on master, and looking at CI dashboard from last few days on master there has been no failure related to any of the 'Checks service on same node' tests. Closes: cilium#17919 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 018be31 ] Alexander reports in GitHub issue 18023 that establishing a connection via an FQDN policy, then modifying that FQDN policy, would cause subsequent traffic to the FQDN to be dropped, even if the new policy still allowed the same traffic via a toFQDN statement. This was caused by overzealous release of CIDR identities while generating a new policy. Although the policy calculation itself keeps all selectorcache entries alive during the policy generation phase (see cachedSelectorPolicy.setPolicy() ), after the new policy is inserted into the PolicyCache, the distillery package would clean up the old policy. As part of that cleanup, it would call into the individual selector to call the RemoveSelectors() function. The previous implementation of this logic unintentionally released the underlying identities any time a user of a selector was released, rather than only releasing the underlying identities when the number of users reached zero and the selector itself would be released. This meant that rather than the selectorcache retaining references to the underlying identities when a policy was updated, instead the references would be released (and all corresponding BPF resources cleaned up) at the end of the process. This then triggered subsequent connectivity outages. Fix it by only releasing the identity references once the cached selector itself is removed from the SelectorCache. Fixes: f559cf1 ("selectorcache: Release identities on selector removal") Reported-by: Alexander Block <ablock84@gmail.com> Suggested-by: Jarno Rajahalme <jarno@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 80ff05a ] Test that when FQDN policy is updated to a new policy which still selects the same old FQDN destination, connectivity continues to work. Validated to fail without the previous commit: K8sFQDNTest /home/joe/git/cilium/test/ginkgo-ext/scopes.go:473 Validate that FQDN policy continues to work after being updated [It] /home/joe/git/cilium/test/ginkgo-ext/scopes.go:527 Can't connect to to a valid target when it should work Expected command: kubectl exec -n default app2-58757b7dd5-rh7dd -- curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 --retry 5 http://vagrant-cache.ci.cilium.io -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}', Transfer '%{time_starttransfer}', total '%{time_total}'" To succeed, but it failed: Exitcode: 28 Err: exit status 28 Stdout: time-> DNS: '0.000016()', Connect: '0.000000',Transfer '0.000000', total '5.000415' Stderr: command terminated with exit code 28 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 09f5b56 ] Currently there's a potential 15 sec delay in updating ciliumnode CRD after IP allocation to a pod, meanwhile the operator can determine that a node has excess IPs and release the IP causing pod connectivity issues. A new operator flag `excess-ip-release-delay` is added to control how long operator should wait before considering an IP for release(defaults to 180 secs). This is done to better handle IP reuse during rollouts. Operator and agent use a new map in cilium node status .status.ipam.release-ips to exchange information during the handshake. Fixes: cilium#13412 Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 5a4dd80 ] After the handshake is complete and the operator is done releasing an IP, CiliumNode status (release-ips) and spec (pool) are updated in two consecutive requests. There's a tiny window between the two updates where the entry is removed from .status.ipam.release-ips but the IP is still present in spec.ipam.pool. It was possible that the IP could be allocated between these requests. This commit introduces a new state called released to deal with this. Now agent removes the entry from release-ips only when the IP was removed from .spec.ipam.pool as well. Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 82d4422 ] Occasionally the cilium-operator will run into a transient issue where it cannot get/update/release the leaselock with K8s that it uses to adjudicate its leader election. This error message is part and parcel of this failure and can be ignored. cf. cilium#16402 Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 337b2c9 ] If the developer set the imagePullPolicy to Always, then kind would always end up pulling the image version from the localhost:5000 registry instead of using the version we sideload in the final step here. Avoid this by unconditionally pushing the image to the local registry. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 1305bab ] Commit f665152 ("test: Use stable tags instead of :latest") switched most use of image tags to stable tags, but omitted one occurrence of the zookeeper image in the runtime kafka tests. Switch it to the stable docker.io/cilium/zookeeper:1.0 as well. Fixes: f665152 ("test: Use stable tags instead of :latest") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 09e14df ] This commits adds a workflow parameter to allow for an developer-defined image suffix. It's useful if the developers don't want to use the default "beta" prefix for the docker image repository. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 7d74863 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 701967f ] The variable was not enclosed fully in speechmarks, observed by seeing failures in `make -C install/kubernetes` while backporting this to older branches. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f584267 ] Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
/test-backport-1.11 Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
runtime failure is legitimate -
|
Both Travis CI and runtime tests hit issues due to |
1.22 / 4.19 hit #18138. |
[ upstream commit 2bede03 ] [ Backport notes: regenerated generated files on the release branch using `make -C install/kubernetes` and `make -C Documentation update-helm-values` ] Generate the helm values docs during the release process. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f542724 ] Fixes: cilium#18204 Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
8b47ae9
to
0e6dd25
Compare
/test-runtime Retriggering only runtime as all previous CI passed. |
Runtime test passed, Travis CI hit DockerHub pull limits. I've retriggered Travis CI out of habit, but can be merged afterwards. |
Travis CI passed, marking as ready to merge. |
Once this PR is merged, you can update the PR labels via: