v1.11 backports 2021-11-30 #18076
Conversation
[ upstream commit dea1343 ] CC: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit cbbea39 ] CC: Aditi Ghag <aditi@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit f77a8d8 ] CC: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 0fc1188 ] CC: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 02fa124 ] Previously, when router IPs were available from both the filesystem and the CiliumNode resource, we missed the fallback to the CiliumNode IP if the IP from the FS was outside the provided CIDR range. In other words, we returned early because the FS IP did not belong to the CIDR, without checking whether the IP from the CiliumNode was a valid fallback. This commit adds the missing case logic and also adds more documentation to the function. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
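For illustration, a minimal Go sketch of the fixed fallback logic; the function name `restoreRouterIP` and its signature are assumptions for this example, not Cilium's actual restoration code:

```go
package main

import (
	"fmt"
	"net"
)

// restoreRouterIP sketches the fixed decision: prefer the router IP restored
// from the filesystem, but if it falls outside the allocation CIDR, fall back
// to the IP from the CiliumNode resource instead of giving up early.
func restoreRouterIP(fromFS, fromK8s net.IP, cidr *net.IPNet) net.IP {
	if fromFS != nil && cidr.Contains(fromFS) {
		return fromFS
	}
	// Before the fix, the logic effectively returned early here; the fix
	// adds this fallback check on the CiliumNode IP.
	if fromK8s != nil && cidr.Contains(fromK8s) {
		return fromK8s
	}
	return nil // no valid router IP to restore
}

func main() {
	_, cidr, _ := net.ParseCIDR("10.0.0.0/24")
	fs := net.ParseIP("192.168.1.1") // stale FS IP, outside the CIDR
	k8s := net.ParseIP("10.0.0.42")  // CiliumNode IP, valid fallback
	fmt.Println(restoreRouterIP(fs, k8s, cidr)) // prints 10.0.0.42
}
```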
[ upstream commit fcd0039 ] In the previous commit (referenced below), we forgot to remove the old router IPs from the actual interface (`cilium_host`). This caused connectivity issues in user environments where the discarded, stale IPs were reassigned to pods, causing the ipcache entries for those IPs to have `remote-node` identity. To fix this, we remove all IPs from the `cilium_host` interface that weren't restored during the router IP restoration process. This step correctly finalizes the restoration process for router IPs. Fixes: ff63b07 ("daemon, node: Fix faulty router IP restoration logic") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
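A rough idea of what that cleanup step looks like, sketched with the `github.com/vishvananda/netlink` package; the helper name and structure are assumptions, not the actual Cilium implementation:

```go
package restore

import (
	"net"

	"github.com/vishvananda/netlink"
)

// removeStaleRouterIPs deletes every address on cilium_host that was not part
// of the restored set, so stale IPs cannot later be handed out to pods and
// end up with remote-node identity in the ipcache.
func removeStaleRouterIPs(restored []net.IP) error {
	link, err := netlink.LinkByName("cilium_host")
	if err != nil {
		return err
	}
	addrs, err := netlink.AddrList(link, netlink.FAMILY_ALL)
	if err != nil {
		return err
	}
	for _, addr := range addrs {
		keep := false
		for _, ip := range restored {
			if addr.IP.Equal(ip) {
				keep = true
				break
			}
		}
		if !keep {
			// Address was not restored: remove it from the interface.
			if err := netlink.AddrDel(link, &addr); err != nil {
				return err
			}
		}
	}
	return nil
}
```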
[ upstream commit 420028f ] With this new workflow, developers will be able to release beta features built on top of an existing release. The workflow to create a new beta image is as follows: 1. Push a branch into Cilium's repository with the name `feature/<stable-branch>/<feature-name>`, where `<stable-branch>` is the branch the feature is based on and `<feature-name>` is the name of the feature being released. 2. Trigger the workflow by going to [1], use the workflow from the `feature/<stable-branch>/<feature-name>` branch, and provide an image tag name. The tag name should be in the format `vX.Y.Z-<feature-name>`, where `vX.Y.Z` is the version the branch is built on and `<feature-name>` is the name of the feature. 3. Ping one of the maintainers or anyone from the cilium-build team to approve the build and release process of this feature. [1] https://github.com/cilium/cilium/actions/workflows/build-images-beta.yaml Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
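To make the naming conventions concrete, a small hypothetical Go check of the branch and tag formats; the regular expressions are illustrative assumptions, not validation performed by the workflow itself:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for the conventions described above, e.g.
// feature/v1.11/my-feature and v1.11.0-my-feature.
var (
	branchRe = regexp.MustCompile(`^feature/v\d+\.\d+/[a-z0-9-]+$`)
	tagRe    = regexp.MustCompile(`^v\d+\.\d+\.\d+-[a-z0-9-]+$`)
)

func main() {
	fmt.Println(branchRe.MatchString("feature/v1.11/my-feature")) // true
	fmt.Println(tagRe.MatchString("v1.11.0-my-feature"))          // true
}
```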
[ upstream commit c640c71 ] To determine cluster health, Cilium exposes an HTTP server both on each node and on the artificial health endpoint running on each node. The same port is used for both HTTP servers; it can be configured via `cluster-health-port` (introduced in cilium#16926) and defaults to 4240. This commit fixes a bug where the port specified by `cluster-health-port` was not passed to the Cilium health endpoint responder, which meant that `cilium-health-responder` was always listening on the default port instead of the one configured by the user, while the probe tried to connect via `cluster-health-port`. This resulted in the cluster being reported as unhealthy whenever `cluster-health-port` was set to a non-default value (which is the case for our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
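A minimal Go sketch of the bug class, assuming a simplified probe and responder (none of these names come from the Cilium codebase): both sides must read the same configured port, otherwise the probe dials a port nobody listens on and the node is reported unhealthy:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Assumed non-default value taken from configuration, analogous to
// cluster-health-port.
const clusterHealthPort = 9940

func startResponder(port int) {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// The fix amounts to passing the configured port here instead of a
	// hard-coded default like 4240.
	go http.ListenAndServe(fmt.Sprintf(":%d", port), mux)
}

func probe(ip string, port int) error {
	client := &http.Client{Timeout: time.Second}
	_, err := client.Get(fmt.Sprintf("http://%s:%d/hello", ip, port))
	return err
}

func main() {
	startResponder(clusterHealthPort)
	time.Sleep(100 * time.Millisecond) // give the listener time to start
	// Prints <nil> when responder and probe agree on the port; a
	// "connection refused" error when they do not.
	fmt.Println(probe("127.0.0.1", clusterHealthPort))
}
```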
[ upstream commit cfd9da2 ] This sets a custom value for `cluster-health-port` in the K8sHealth test suite, to ensure we support setting a custom health port (e.g. used in OpenShift, which we do not test in our CI at the moment). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 6334f98 ] This is a cleanup commit with no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 8986930 ] The graceful termination test apps [1] are updated to fix flakes in the test logic. Specifically, read and write deadlines were added to the socket calls on the server side, so the server doesn't block on those calls when the `SIGTERM` event is received on termination. While at it, the test logic was also updated to validate that connectivity between the client and server stays intact at least for the configured `terminationGracePeriodSeconds` duration. [1] https://github.com/cilium/graceful-termination-test-apps Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
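The server-side pattern could look roughly like the following Go sketch (assumed details such as the port and echo behavior are illustrative, not taken from the test apps): deadlines keep `Read`/`Write` from blocking forever, so the loop can notice a pending `SIGTERM` and exit within the grace period.

```go
package main

import (
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)

	ln, err := net.Listen("tcp", ":8081")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 1024)
	for {
		select {
		case <-sigCh:
			return // SIGTERM received: stop serving gracefully
		default:
		}
		// Without a deadline, Read would block indefinitely and the
		// SIGTERM check above would never run.
		conn.SetReadDeadline(time.Now().Add(time.Second))
		n, err := conn.Read(buf)
		if err != nil {
			if ne, ok := err.(net.Error); ok && ne.Timeout() {
				continue // deadline expired; re-check for signals
			}
			return
		}
		conn.SetWriteDeadline(time.Now().Add(time.Second))
		conn.Write(buf[:n]) // echo back to keep the connectivity check alive
	}
}
```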
[ upstream commit 1987b67 ] The library function provides the same functionality. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
e3456f3 to 00b4927
/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR.
Thank you!
LGTM
All other Jenkins failures are instances of #18060, which was fixed in #18087 but not backported to v1.11 yet. I haven't looked into the details of the Conformance issues yet.
ConformanceAKS is hitting #18107. Unfortunately, part of the issue there is that the AKS workflow tears down the cluster before gathering state from it, so it's difficult to ascertain what went wrong. It's also hitting an error I haven't seen before:
I double-checked the Jenkins builds and agree with @qmonnet's assessment above. This is good to merge; the next batch of v1.11 backports should return stability to the tree.
(Note: I ran `make -C install/kubernetes` and `make -C Documentation update-helm-values` and committed the resulting changes.)

- test/contrib: Bump CoreDNS version to 1.8.3 #18018 (@brb)
  Removed because of CI: k8s v1.19 or older: Kubernetes DNS did not become ready in time (all tests) #18086
- `cilium_host` #17762 (@christarazi)

Once this PR is merged, you can update the PR labels via: