v1.11 backports 2021-11-30 #18076

Merged
joestringer merged 28 commits into cilium:v1.11 from pr/v1.11-backport-2021-11-30 on Dec 2, 2021

qmonnet
Member

@qmonnet qmonnet commented Nov 30, 2021

Once this PR is merged, you can update the PR labels via:

$ for pr in 18033 17909 17958 18017 18031 18026 17789 18057 18059 18048 18068 18006 18051 17762 18052 18061 18050; do contrib/backporting/set-labels.py $pr done 1.11; done

@qmonnet qmonnet requested a review from a team November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team as a code owner November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team November 30, 2021 23:36
@qmonnet qmonnet requested review from a team as code owners November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team as a code owner November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team November 30, 2021 23:36
@qmonnet qmonnet requested a review from a team as a code owner November 30, 2021 23:36
@qmonnet qmonnet requested review from a team November 30, 2021 23:36
@qmonnet qmonnet requested review from a team as code owners November 30, 2021 23:36
@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. label Nov 30, 2021
joestringer and others added 13 commits December 2, 2021 10:34
[ upstream commit dea1343 ]

CC: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit cbbea39 ]

CC: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit f77a8d8 ]

CC: Deepesh Pathak <deepshpathak@gmail.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 0fc1188 ]

CC: Thomas Graf <thomas@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 02fa124 ]

Previously, when router IPs were available from both the filesystem
(FS) and the CiliumNode resource, we missed a fallback to the CiliumNode
IP if the IP from the FS was outside the provided CIDR range. In other
words, we returned early, reporting that the FS IP does not belong to
the CIDR, without checking whether the IP from the CiliumNode was a
valid fallback.

This commit adds the missing case logic and also adds more documentation
to the function.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit fcd0039 ]

In the previous commit (referenced below), we forgot to remove the old
router IPs from the actual interface (`cilium_host`). This caused
connectivity issues in user environments where the discarded, stale IPs
were reassigned to pods, causing the ipcache entries for those IPs to
have `remote-node` identity.

To fix this, we remove all IPs from the `cilium_host` interface that
weren't restored during the router IP restoration process. This step
correctly finalizes the restoration process for router IPs.

Fixes: ff63b07 ("daemon, node: Fix faulty router IP restoration
logic")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 420028f ]

With this new workflow, developers will be able to release beta features
that are created on top of an existing release.

The workflow to create a new beta image is as follows:

1. Push a branch into Cilium's repository with the name:
   `feature/<stable-branch>/<feature-name>` where `<stable-branch>`
   represents the branch on which the feature is based and
   `<feature-name>` represents the name of the feature being released.
2. Trigger the workflow by going to [1], selecting the workflow from
   the `feature/<stable-branch>/<feature-name>` branch and writing an
   image tag name.
   The tag name should be in the format `vX.Y.Z-<feature-name>` where
   `vX.Y.Z` is the version on which the branch is built, and
   `<feature-name>` the name of the feature.
3. Ping one of the maintainers or anyone from the cilium-build team to
   approve the build and release process of this feature.

[1] https://github.com/cilium/cilium/actions/workflows/build-images-beta.yaml
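
The naming conventions from steps 1 and 2 can be checked with a small sketch like the one below. The helper `checkBetaNames` and the example names are hypothetical, and requiring the feature name in the tag to match the one in the branch is an assumption, not something the workflow is known to enforce.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// feature/<stable-branch>/<feature-name>
	branchRe = regexp.MustCompile(`^feature/([^/]+)/([^/]+)$`)
	// vX.Y.Z-<feature-name>
	tagRe = regexp.MustCompile(`^v\d+\.\d+\.\d+-(.+)$`)
)

// checkBetaNames is a hypothetical validator for the branch and tag formats.
func checkBetaNames(branch, tag string) error {
	b := branchRe.FindStringSubmatch(branch)
	if b == nil {
		return fmt.Errorf("branch %q is not feature/<stable-branch>/<feature-name>", branch)
	}
	t := tagRe.FindStringSubmatch(tag)
	if t == nil {
		return fmt.Errorf("tag %q is not vX.Y.Z-<feature-name>", tag)
	}
	// Assumption: the feature name in the tag should match the branch.
	if b[2] != t[1] {
		return fmt.Errorf("feature name mismatch: branch has %q, tag has %q", b[2], t[1])
	}
	return nil
}

func main() {
	fmt.Println(checkBetaNames("feature/v1.11/my-feature", "v1.11.0-my-feature"))
	fmt.Println(checkBetaNames("my-feature", "v1.11.0-my-feature"))
}
```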

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit c640c71 ]

To determine cluster health, Cilium exposes an HTTP server both on each
node, as well as on the artificial health endpoint running on each node.
The port used for this HTTP server is the same and can be configured via
`cluster-health-port` (introduced in cilium#16926) and defaults to 4240.

This commit fixes a bug where the port specified by
`cluster-health-port` was not passed to the Cilium health endpoint
responder, which meant that `cilium-health-responder` was always
listening on the default port instead of the one configured by the user,
while the probe tried to connect via `cluster-health-port`. This
resulted in the cluster being reported as unhealthy whenever
`cluster-health-port` was set to a non-default value (which is the case
in our OpenShift OLM for v1.11):

```
Nodes:
  gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost):
    Host connectivity to 10.0.128.2:
      ICMP to stack:   OK, RTT=634.746µs
      HTTP to agent:   OK, RTT=228.066µs
    Endpoint connectivity to 10.128.11.73:
      ICMP to stack:   OK, RTT=666.83µs
      HTTP to agent:   Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused
```
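
The mismatch can be illustrated with a small sketch in which the responder derives its listen address from the configured port instead of always using the default. `listenAddr` is a hypothetical helper, not the responder's real code; only the default of 4240 and the values in the log above come from the commit message.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Default health port as stated in the commit message.
const defaultHealthPort = 4240

// listenAddr is a hypothetical helper: it builds the responder's listen
// address from the configured port, using the default only when unset (0).
func listenAddr(configuredPort int) string {
	port := configuredPort
	if port == 0 {
		port = defaultHealthPort
	}
	return net.JoinHostPort("", strconv.Itoa(port))
}

func main() {
	// With cluster-health-port set to 9940, responder and probe must agree.
	fmt.Println("responder listens on", listenAddr(9940))
	fmt.Println("probe connects to", net.JoinHostPort("10.128.11.73", strconv.Itoa(9940)))
}
```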

Fixes: e624868 ("health: Add a flag to set HTTP port")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit cfd9da2 ]

This sets a custom value for `cluster-health-port` in the K8sHealth test
suite, to ensure we support setting a custom health port (e.g. used in
OpenShift, which we do not test in our CI at the moment).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 6334f98 ]

This is a cleanup commit with no functional change.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 32b5bb2 ]

This reverts commit cbbea39

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 8986930 ]

The graceful termination test apps [1] are updated to fix flakes in
the test logic. Specifically, read and write deadlines are added to
the socket calls on the server side. This way the server doesn't
block on the socket calls when a `SIGTERM` event is received on
termination.

While at it, also update the test logic to validate that
connectivity between client and server stays intact for at least
the configured `terminationGracePeriodInSeconds` duration.

[1] https://github.com/cilium/graceful-termination-test-apps

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 1987b67 ]

The library function provides the same functionality.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet qmonnet force-pushed the pr/v1.11-backport-2021-11-30 branch from e3456f3 to 00b4927 Compare December 2, 2021 10:39
@qmonnet
Member Author

qmonnet commented Dec 2, 2021

I removed #18018 (because of #18086) and added #18050 to this PR.

@qmonnet
Member Author

qmonnet commented Dec 2, 2021

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with secondary NodePort device

Failure Output

FAIL: Request from testclient-jfkmb pod to service http://[fd04::11]:30704 failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

Member

@aditighag aditighag left a comment

Thank you!

@aditighag
Member

docs: correct ec2 modify net iface action #17958 (@austince)
docs: add registry (quay.io/) for pre-loading images for kind #18017 (@adamzhoul)
Prometheus lint errors in operator metrics #17789

LGTM

@qmonnet
Member Author

qmonnet commented Dec 2, 2021

The k8s-1.22-kernel-4.19 failure is similar to #17895. Most issues of this kind should be fixed by #18051 (which quarantines the affected tests), backported in the current PR, but this one hits "Tests with secondary NodePort device", which is not quarantined.

All other Jenkins failures are #18060, which is fixed in #18087 but not backported to v1.11 yet.

I haven't looked into the details of the Conformance issues yet.

@joestringer
Member

ConformanceAKS is hitting #18107. Unfortunately, part of the issue there is that the AKS workflow tears down the cluster before gathering state from it, so it's difficult to ascertain what went wrong. It's also hitting an error I haven't seen before: `failed to inject labels into ipcache: local identity allocator uninitialized`. This clearly seems to warn that the agent is not yet initialized, possibly because of an issue in the workflows? Manual validation of v1.11.0-rc3 via #17926 seemed to indicate that AKS is working fine.
ConformanceEKS hit #16938 which I've just submitted a PR to disable to avoid hitting that flake.

@joestringer
Member

I double-checked the Jenkins builds and agree with @qmonnet's assessment above. This is good to merge; the next batch of v1.11 backports should return stability to the tree.

@joestringer joestringer merged commit 12bb19b into cilium:v1.11 Dec 2, 2021
@qmonnet qmonnet deleted the pr/v1.11-backport-2021-11-30 branch December 2, 2021 23:12
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.