v1.11 backports 2021-11-30 #18076

qmonnet · 2021-11-30T23:36:38Z

Once this PR is merged, you can update the PR labels via:

$ for pr in 18033 17909 17958 18017 18031 18026 17789 18057 18059 18048 18068 18006 18051 17762 18052 18061 18050; do contrib/backporting/set-labels.py $pr done 1.11; done

[ upstream commit dea1343 ] CC: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit cbbea39 ] CC: Aditi Ghag <aditi@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit f77a8d8 ] CC: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 0fc1188 ] CC: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 02fa124 ] Previously, when router IPs from both the filesystem (FS) and the CiliumNode resource were available, we missed a fallback to the CiliumNode IP if the IP from the FS was outside the provided CIDR range. In other words, we returned early with the error that the FS IP does not belong to the CIDR, without checking whether the IP from the CiliumNode was a valid fallback. This commit adds the missing case logic and also adds more documentation to the function. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit fcd0039 ] In the previous commit (referenced below), we forgot to remove the old router IPs from the actual interface (`cilium_host`). This caused connectivity issues in user environments where the discarded, stale IPs were reassigned to pods, causing the ipcache entries for those IPs to have `remote-node` identity. To fix this, we remove all IPs from the `cilium_host` interface that weren't restored during the router IP restoration process. This step correctly finalizes the restoration process for router IPs. Fixes: ff63b07 ("daemon, node: Fix faulty router IP restoration logic") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 420028f ] With this new workflow, developers will be able to release beta features that are created on top of an existing release. The workflow to create a new beta image is as follows:

1. Push a branch into Cilium's repository with the name `feature/<stable-branch>/<feature-name>`, where `<stable-branch>` represents the branch on which the feature is based and `<feature-name>` represents the name of the feature being released.
2. Trigger the workflow by going to [1], use the workflow from the `feature/<stable-branch>/<feature-name>` branch and write an image tag name. The tag name should be in the format `vX.Y.Z-<feature-name>`, where `vX.Y.Z` is the version on which the branch is built and `<feature-name>` is the name of the feature.
3. Ping one of the maintainers or anyone from the cilium-build team to approve the build and release process of this feature.

[1] https://github.com/cilium/cilium/actions/workflows/build-images-beta.yaml

Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
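
The naming conventions in this commit message can be checked mechanically. The sketch below is only an illustration: `checkBetaRef` is a hypothetical helper, the regular expressions are one reading of the formats described above, and the requirement that the feature name in the tag matches the one in the branch is an assumption. Nothing here is claimed to be enforced by the actual workflow.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// feature/<stable-branch>/<feature-name>
	branchRe = regexp.MustCompile(`^feature/([^/]+)/([^/]+)$`)
	// vX.Y.Z-<feature-name>
	tagRe = regexp.MustCompile(`^v\d+\.\d+\.\d+-(.+)$`)
)

// checkBetaRef is a hypothetical helper that validates a branch name and an
// image tag against the formats described in the commit message.
func checkBetaRef(branch, tag string) error {
	b := branchRe.FindStringSubmatch(branch)
	if b == nil {
		return fmt.Errorf("branch %q is not of the form feature/<stable-branch>/<feature-name>", branch)
	}
	t := tagRe.FindStringSubmatch(tag)
	if t == nil {
		return fmt.Errorf("tag %q is not of the form vX.Y.Z-<feature-name>", tag)
	}
	// Assumption: both names refer to the same feature.
	if b[2] != t[1] {
		return fmt.Errorf("feature name %q in branch does not match %q in tag", b[2], t[1])
	}
	return nil
}

func main() {
	// Made-up branch and tag names for illustration.
	fmt.Println(checkBetaRef("feature/v1.11/my-feature", "v1.11.0-my-feature"))
	fmt.Println(checkBetaRef("feature/v1.11/my-feature", "v1.11.0-other"))
}
```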

[ upstream commit c640c71 ] To determine cluster health, Cilium exposes an HTTP server both on each node and on the artificial health endpoint running on each node. The port used for this HTTP server is the same in both cases; it can be configured via `cluster-health-port` (introduced in cilium#16926) and defaults to 4240. This commit fixes a bug where the port specified by `cluster-health-port` was not passed to the Cilium health endpoint responder. As a result, `cilium-health-responder` always listened on the default port instead of the one configured by the user, while the probe tried to connect via `cluster-health-port`. This resulted in the cluster being reported as unhealthy whenever `cluster-health-port` was set to a non-default value (which is the case for our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```

Fixes: e624868 ("health: Add a flag to set HTTP port") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
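
The essence of the bug is that the responder and the prober must derive their port from the same configured value. The following sketch shows that idea only; the function names are hypothetical and do not correspond to Cilium's code. The default of 4240 and the `/hello` path are taken from the commit message and log output above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultHealthPort is the default stated in the commit message.
const defaultHealthPort = 4240

// responderListenAddr is a hypothetical helper returning the address the
// health responder should listen on for a configured port (0 means unset).
func responderListenAddr(port int) string {
	if port == 0 {
		port = defaultHealthPort
	}
	return net.JoinHostPort("", strconv.Itoa(port))
}

// probeURL is a hypothetical helper returning the URL the prober connects
// to. It must use the same port as the responder.
func probeURL(endpointIP string, port int) string {
	if port == 0 {
		port = defaultHealthPort
	}
	return "http://" + net.JoinHostPort(endpointIP, strconv.Itoa(port)) + "/hello"
}

func main() {
	clusterHealthPort := 9940 // non-default value, as in the log above
	fmt.Println(responderListenAddr(clusterHealthPort))
	fmt.Println(probeURL("10.128.11.73", clusterHealthPort))
}
```

Before the fix, the situation corresponded to the responder side always behaving as if the port were unset, while the prober used the configured value.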

[ upstream commit cfd9da2 ] This sets a custom value for `cluster-health-port` in the K8sHealth test suite, to ensure we support setting a custom health port (e.g. used in OpenShift, which we do not test in our CI at the moment). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 6334f98 ] This is a cleanup commit with no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 32b5bb2 ] This reverts commit cbbea39 Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 8986930 ] The graceful termination test apps [1] are updated to fix flakes in the test logic. Specifically, read and write deadlines were added to the socket calls on the server side. This way, the server doesn't block on the socket calls when a `SIGTERM` event is received on termination. While at it, the test logic was also updated to validate that connectivity between client and server stays intact for at least the configured `terminationGracePeriodInSeconds` duration. [1] https://github.com/cilium/graceful-termination-test-apps Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 1987b67 ] The library function provides the same functionality. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

qmonnet · 2021-12-02T10:43:27Z

I removed #18018 (because of #18086) and added #18050 to this PR.

qmonnet · 2021-12-02T10:55:06Z

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with secondary NodePort device

Failure Output

FAIL: Request from testclient-jfkmb pod to service http://[fd04::11]:30704 failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

aditighag

Thank you!

aditighag · 2021-12-02T18:10:48Z

docs: correct ec2 modify net iface action #17958 (@austince)
docs: add registry (quay.io/) for pre-loading images for kind #18017 (@adamzhoul)
Prometheus lint errors in operator metrics #17789

LGTM

qmonnet · 2021-12-02T18:11:51Z

k8s-1.22-kernel-4.19 is similar to #17895. Most issues of this kind should be fixed by #18051 (by quarantining tests) that we backport in the current PR, but this one hits 'Tests with secondary NodePort device', which is not quarantined.

All other Jenkins issues are #18060, fixed in #18087, not backported to v1.11 yet.

I haven't looked into the details of the Conformance issues yet.

joestringer · 2021-12-02T22:51:07Z

ConformanceAKS is hitting #18107. Unfortunately, part of the issue there is also that the AKS workflow is tearing down the cluster before gathering state from the cluster, so it's difficult to ascertain what went wrong. It's also hitting an error I haven't seen before: `failed to inject labels into ipcache: local identity allocator uninitialized`. This seems to be clearly trying to warn that the agent is not yet initialized, possibly because of an issue in the workflows? Manual validation of v1.11.0-rc3 via #17926 seemed to indicate that AKS is working fine.
ConformanceEKS hit #16938 which I've just submitted a PR to disable to avoid hitting that flake.

joestringer · 2021-12-02T23:11:48Z

I double-checked the Jenkins builds and agree with @qmonnet's assessment above. This is good to merge; the next batch of v1.11 backports should return stability to the tree.

qmonnet requested a review from a team November 30, 2021 23:36

qmonnet requested a review from a team as a code owner November 30, 2021 23:36

qmonnet requested review from a team as code owners November 30, 2021 23:36

qmonnet requested review from a team November 30, 2021 23:36

qmonnet requested review from borkmann, tklauser, kkourt, aanm, geakstr, brb, joestringer, Weil0ng, kaworu, pchaigno, christarazi and gandro November 30, 2021 23:36

maintainer-s-little-helper bot added the backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. label Nov 30, 2021

joestringer and others added 13 commits December 2, 2021 10:34

test/Services: Quarantine 'Tests with direct routing'

b4fa529

[ upstream commit dea1343 ] CC: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

test/Services: Quarantine 'Checks graceful termination'

0ba8143

[ upstream commit cbbea39 ] CC: Aditi Ghag <aditi@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

test/Services: Quarantine 'IPv6 masquerading across K8s nodes'

cbe925c

[ upstream commit f77a8d8 ] CC: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

test/DatapathConfiguration: Quarantine 'Encapsulation'

ba1483b

[ upstream commit 0fc1188 ] CC: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

health: Use signal.NotifyContext

1cb17e4

[ upstream commit 6334f98 ] This is a cleanup commit with no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

Revert "test/Services: Quarantine 'Checks graceful termination'"

7347ba1

[ upstream commit 32b5bb2 ] This reverts commit cbbea39 Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

test: Replace WaitUntilMatch with Eventually

00b4927

[ upstream commit 1987b67 ] The library function provides the same functionality. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

qmonnet force-pushed the pr/v1.11-backport-2021-11-30 branch from e3456f3 to 00b4927 Compare December 2, 2021 10:39

qmonnet requested a review from aditighag December 2, 2021 10:47

maintainer-s-little-helper bot assigned aditighag Dec 2, 2021

kkourt approved these changes Dec 2, 2021

maintainer-s-little-helper bot unassigned kkourt Dec 2, 2021

aditighag approved these changes Dec 2, 2021

maintainer-s-little-helper bot unassigned aditighag Dec 2, 2021

joestringer approved these changes Dec 2, 2021

maintainer-s-little-helper bot unassigned joestringer Dec 2, 2021

joestringer merged commit 12bb19b into cilium:v1.11 Dec 2, 2021

qmonnet deleted the pr/v1.11-backport-2021-11-30 branch December 2, 2021 23:12
