Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.14 Backports 2023-07-10 #26734

Merged
merged 26 commits into from Jul 11, 2023
Merged

v1.14 Backports 2023-07-10 #26734

merged 26 commits into from Jul 11, 2023

Conversation

jibi
Copy link
Member

@jibi jibi commented Jul 10, 2023

Once this PR is merged, you can update the PR labels via:

for pr in 26607 26608 26577 26615 26642 26590 26668 26614 26606 22587 26662 26671 26484 25217 26457 26675 26692 26715 26667; do contrib/backporting/set-labels.py $pr done 1.14; done

or with

make add-labels BRANCH=v1.14 ISSUES=26607,26608,26577,26615,26642,26590,26668,26614,26606,22587,26662,26671,26484,25217,26457,26675,26692,26715,26667

michi-covalent and others added 26 commits July 10, 2023 11:38
[ upstream commit b9f467a ]

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit c426dc8 ]

- Only document using the Helm mode. Time to deprecate the classic mode.
- Remove "clustermesh enable" and "clustermesh connect" output samples.
  These outputs might change in the future, and I think output samples
  from the subsequent "clustermesh status" commands give users enough
  context to confirm that "clustermesh enable" and "clustermesh connect"
  commands succeeded.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 2e2fc63 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1a4e7d4 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a0366e3 ]

Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 31592e7 ]

This commit optimizes the logoutput by given insights when an
authentication request is skipped due to a pending auth request. Without
this log entry, it might be quite irritating why so many auth requests
are logged without further insights.

In addition, a log output is added when the pending auth is cleared.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 8a5f1c6 ]

The CRD spec of CiliumBGPPeeringPolicy states

```
NodeSelector selects a group of nodes where this BGP Peering Policy
applies. If empty / nil this policy applies to all nodes.
```

However, our current implementation treats empty / nil selector as an
error. This commit fixes it.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 90cfb26 ]

This fixes a bug where Cilium agent would release CIDRs it considered
unused because the endpoints using them weren't restored yet. We fix
this by not releasing any CIDRs before endpoint restoration has
finished.

Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a361b2d ]

9f5a82a ("clustermesh: use custom dialer for service resolution")
configured a custom dialer to perform service resolution when the etcd
address refers to a local service (mainly in the kvstoremesh case).

Yet, the custom dialer function outputs quite verbose log messages, which
can be misleading when the address does not point to a local service (as
it is correct that the address cannot be parsed). One example being:

level=error msg="Unable to parse etcd service URL"
  error="parse \"172.19.0.2:2379\": first path segment in URL cannot contain colon"

Hence, let's silence them, as they do more harm than good in this specific
situation. We still print (at debug level) the outcome of the translation,
so that we know to whom we are actually connecting.

Fixes: 9f5a82a ("clustermesh: use custom dialer for service resolution")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit ad76862 ]

Currently, the custom dialer ignores the context it is passed to. Let's
switch to using DialContext, so that we properly propagate it. This
mimics also the default behavior when no custom dialer is configured [1].

[1] vendor/google.golang.org/grpc/internal/transport/http2_client.go:179

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit ba52226 ]

- Consistently use the --set flag.
- Replace --helm-auto-gen-values with --dry-run-helm-values.
- Set kubeProxyReplacement to "true" instead of "strict".
- Add a section in the upgrade guide.
- Add a warning in cilium-cli installation instructions to highlight
  that you need to upgrade cilium-cli to v0.15.0 or later.

Ref: #26430

Co-authored-by: Marco Iorio <marco.iorio@isovalent.com>
Co-authored-by: Ryan Drew <ryan.drew@isovalent.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit af4d761 ]

This addresses a bug where deleting a completed pod would wind up
deleting the service map entry for the corresponding running pod on
the same node, due to the hostPort mapping key being the same for
the old and the new pod.

Ideally we want to validate whether the completed pod "owns" the
host port service, before deleting it, thus preventing breakage of
host port connectivity for any running pods with the same service
as frontend.

This commit adds such a validation.

Testing-
Automated (Control Plane):
This fix is captured by a control plane test case that does the
following:
1. Create a hostport pod, and terminate it's running containers to mark
   it as "Completed".
2. Create another hostport pod using the same port as the "Completed" pod.
3. Delete the "Completed" pod, and verify that the hostport service has
   not been deleted in the Datapath.

Manual Testing -
1. Add the GracefulNodeShutdown in the kubelet config on all nodes by
   modifying the configuration in `/var/lib/kubelet/config.yaml`
   ```
    featureGates:
      GracefulNodeShutdown: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
   ```
2. Run `sudo systemctl restart kubelet` on each node to apply the kubelet
   config change
3. Deploy an nginx web server with hostPort set, as well as a
   nodeSelector, so pods get scheduled on the same node after node
   restarts.
   ```
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: nginx-deployment
   spec:
     selector:
       matchLabels:
         app: nginx
    replicas: 1
    template:
      metadata:
        labels:
          app: nginx
      spec:
        nodeSelector:
	  kubernetes.io/hostname: <node-name>
	containers:
	- name: nginx
          image: nginx:1.14.2
          ports:
          - containerPort: 80
            hostPort: 8081
   ```
4. Run `systemctl reboot` on the worker node to restart the machine.
5. After reboot spot the old pod in `Completed state`, while the new pod
   is `Running`.
   ```
   $ kubectl get pods
   NAME                                READY   STATUS      RESTARTS    AGE
   nginx-deployment-645797c867-8p2hp   0/1     Completed   0           13m
   nginx-deployment-645797c867-dx2m8   1/1     Running     0           4m2s
   ```
6. `curl nodeIP:hostPort` successfully get the result.
7. Manually deleted the old pod which is in Completed state.
   ```
   $ kubectl delete pod/nginx-deployment-645797c867-8p2hp
   ```
8. Redo the `curl nodeIP:hostPort`, and successfully get the result again.    // hostPort service has been preserved.

Signed-off-by: Yash Shetty <yashshetty@google.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 751fc71 ]

This commit adds a check and documentation that kvstore-based identity
allocation mode is currently not supported. This is due to #26621. We
intend to fix that in Cilium v1.15.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 2141d67 ]

It was missed in e674820 ("Update k8s tests and libraries to v1.27.2").

Fixes: e674820 ("Update k8s tests and libraries to v1.27.2")
Fixes: #25602
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b48c7d4 ]

Fixes: e674820 ("Update k8s tests and libraries to v1.27.2")
Fixes: #25602
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 83a9066 ]

The k8s Node watcher receives events related to the local node from the
Resource[T] framework Events() method.

The Resource[T] framework starts a single informer for each type,
alongside a workqueue for each new subscriber that calls the Events()
method. The workqueue is used to accumulate the keys related to the
objects that are added and/or updated.

When Store() is called before Events(), the informer is started, but no
queue exists to receive the events yet. Only when the Events() is
called, the framework iterates over all the keys in the store to send
the related updates to the new subscriber. Doing that, events like an
ADD followed by one or more UPDATE might be coalesced in a single ADD
event carrying the last available version of the object.

What this means for the consumers of the NodeChain subscription
mechanism, is that discarding the ADD events with an empty handler might
lead to missing update events too.  In order to avoid this, all the
NodeChain subscribers like the CiliumNodeUpdater and the EndpointManager
should have a ADD handler that mimics what their UPDATE handler is
already doing.

Fixes: #26082

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1bbe80c ]

Since OnUpdateNode is actually implemented for the endpointmanager,
remove the stale commit that said the opposite.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a33f234 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit cb7f447 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 16ef64b ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 9fb24de ]

when the manager fails to resolve the identity of an endpoint to a set
of labels, instead of discarding the endpoint update, enqueue the event
and retry identity labels resolution following the same exponential
backoff logic used by the identity allocator

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a1007ef ]

inside getEndpointMetadata as that's where the rest of the checks for
the received CiliumEndpoint event live

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit d3b3c7a ]

Document what is already known in the community for debugging CPU
bottlenecks and memory leaks within Cilium.

Fixes: #23004
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 03bb181 ]

Ref: https://github.com/cilium/cilium-cli#experimental-helm-installation-mode
Ref: #25156

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1f35baf ]

Both of these workflows use binaries that are built in CI making use of
various vendored dependencies, so run them as well on PRs only changing
vendor/.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 93da6ae ]

Currently sysdumps do not contain the L2 responder map dumps.
This commit adds these so that we can debug issues with the
L2 announcements if they come up in the future.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Jul 10, 2023
@jibi jibi requested review from brb and mhofstetter July 10, 2023 11:39
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for my changes, thanks Jibi!

@jibi
Copy link
Member Author

jibi commented Jul 10, 2023

/test-backport-1.14

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my PR, thanks!

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits look good. Thanks!

Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

Copy link
Contributor

@yasz24 yasz24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for #22587

@jibi jibi marked this pull request as ready for review July 10, 2023 15:45
@jibi jibi requested review from a team as code owners July 10, 2023 15:45
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My change looks good. Thanks!

@jibi
Copy link
Member Author

jibi commented Jul 11, 2023

couple of trivial docs/bugtool PRs left, marking as ready

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 11, 2023
@julianwiedmann julianwiedmann merged commit 2d32db8 into v1.14 Jul 11, 2023
178 checks passed
@julianwiedmann julianwiedmann deleted the pr/v1.14-backport-2023-07-10 branch July 11, 2023 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet