New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.14 Backports 2023-07-10 #26734
v1.14 Backports 2023-07-10 #26734
Conversation
[ upstream commit b9f467a ] Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit c426dc8 ] - Only document using the Helm mode. Time to deprecate the classic mode. - Remove "clustermesh enable" and "clustermesh connect" output samples. These outputs might change in the future, and I think output samples from the subsequent "clustermesh status" commands give users enough context to confirm that "clustermesh enable" and "clustermesh connect" commands succeeded. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 2e2fc63 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1a4e7d4 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a0366e3 ] Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 31592e7 ] This commit optimizes the logoutput by given insights when an authentication request is skipped due to a pending auth request. Without this log entry, it might be quite irritating why so many auth requests are logged without further insights. In addition, a log output is added when the pending auth is cleared. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 8a5f1c6 ] The CRD spec of CiliumBGPPeeringPolicy states ``` NodeSelector selects a group of nodes where this BGP Peering Policy applies. If empty / nil this policy applies to all nodes. ``` However, our current implementation treats empty / nil selector as an error. This commit fixes it. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 90cfb26 ] This fixes a bug where Cilium agent would release CIDRs it considered unused because the endpoints using them weren't restored yet. We fix this by not releasing any CIDRs before endpoint restoration has finished. Co-authored-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tobias Klauser <tobias@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a361b2d ] 9f5a82a ("clustermesh: use custom dialer for service resolution") configured a custom dialer to perform service resolution when the etcd address refers to a local service (mainly in the kvstoremesh case). Yet, the custom dialer function outputs quite verbose log messages, which can be misleading when the address does not point to a local service (as it is correct that the address cannot be parsed). One example being: level=error msg="Unable to parse etcd service URL" error="parse \"172.19.0.2:2379\": first path segment in URL cannot contain colon" Hence, let's silence them, as they do more harm than good in this specific situation. We still print (at debug level) the outcome of the translation, so that we know to whom we are actually connecting. Fixes: 9f5a82a ("clustermesh: use custom dialer for service resolution") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit ad76862 ] Currently, the custom dialer ignores the context it is passed to. Let's switch to using DialContext, so that we properly propagate it. This mimics also the default behavior when no custom dialer is configured [1]. [1] vendor/google.golang.org/grpc/internal/transport/http2_client.go:179 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit ba52226 ] - Consistently use the --set flag. - Replace --helm-auto-gen-values with --dry-run-helm-values. - Set kubeProxyReplacement to "true" instead of "strict". - Add a section in the upgrade guide. - Add a warning in cilium-cli installation instructions to highlight that you need to upgrade cilium-cli to v0.15.0 or later. Ref: #26430 Co-authored-by: Marco Iorio <marco.iorio@isovalent.com> Co-authored-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit af4d761 ] This addresses a bug where deleting a completed pod would wind up deleting the service map entry for the corresponding running pod on the same node, due to the hostPort mapping key being the same for the old and the new pod. Ideally we want to validate whether the completed pod "owns" the host port service, before deleting it, thus preventing breakage of host port connectivity for any running pods with the same service as frontend. This commit adds such a validation. Testing- Automated (Control Plane): This fix is captured by a control plane test case that does the following: 1. Create a hostport pod, and terminate it's running containers to mark it as "Completed". 2. Create another hostport pod using the same port as the "Completed" pod. 3. Delete the "Completed" pod, and verify that the hostport service has not been deleted in the Datapath. Manual Testing - 1. Add the GracefulNodeShutdown in the kubelet config on all nodes by modifying the configuration in `/var/lib/kubelet/config.yaml` ``` featureGates: GracefulNodeShutdown: true shutdownGracePeriod: 30s shutdownGracePeriodCriticalPods: 10s ``` 2. Run `sudo systemctl restart kubelet` on each node to apply the kubelet config change 3. Deploy an nginx web server with hostPort set, as well as a nodeSelector, so pods get scheduled on the same node after node restarts. ``` apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment spec: selector: matchLabels: app: nginx replicas: 1 template: metadata: labels: app: nginx spec: nodeSelector: kubernetes.io/hostname: <node-name> containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 hostPort: 8081 ``` 4. Run `systemctl reboot` on the worker node to restart the machine. 5. After reboot spot the old pod in `Completed state`, while the new pod is `Running`. ``` $ kubectl get pods NAME READY STATUS RESTARTS AGE nginx-deployment-645797c867-8p2hp 0/1 Completed 0 13m nginx-deployment-645797c867-dx2m8 1/1 Running 0 4m2s ``` 6. `curl nodeIP:hostPort` successfully get the result. 7. Manually deleted the old pod which is in Completed state. ``` $ kubectl delete pod/nginx-deployment-645797c867-8p2hp ``` 8. Redo the `curl nodeIP:hostPort`, and successfully get the result again. // hostPort service has been preserved. Signed-off-by: Yash Shetty <yashshetty@google.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 751fc71 ] This commit adds a check and documentation that kvstore-based identity allocation mode is currently not supported. This is due to #26621. We intend to fix that in Cilium v1.15. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 83a9066 ] The k8s Node watcher receives events related to the local node from the Resource[T] framework Events() method. The Resource[T] framework starts a single informer for each type, alongside a workqueue for each new subscriber that calls the Events() method. The workqueue is used to accumulate the keys related to the objects that are added and/or updated. When Store() is called before Events(), the informer is started, but no queue exists to receive the events yet. Only when the Events() is called, the framework iterates over all the keys in the store to send the related updates to the new subscriber. Doing that, events like an ADD followed by one or more UPDATE might be coalesced in a single ADD event carrying the last available version of the object. What this means for the consumers of the NodeChain subscription mechanism, is that discarding the ADD events with an empty handler might lead to missing update events too. In order to avoid this, all the NodeChain subscribers like the CiliumNodeUpdater and the EndpointManager should have a ADD handler that mimics what their UPDATE handler is already doing. Fixes: #26082 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1bbe80c ] Since OnUpdateNode is actually implemented for the endpointmanager, remove the stale commit that said the opposite. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a33f234 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit cb7f447 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 16ef64b ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 9fb24de ] when the manager fails to resolve the identity of an endpoint to a set of labels, instead of discarding the endpoint update, enqueue the event and retry identity labels resolution following the same exponential backoff logic used by the identity allocator Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit a1007ef ] inside getEndpointMetadata as that's where the rest of the checks for the received CiliumEndpoint event live Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 03bb181 ] Ref: https://github.com/cilium/cilium-cli#experimental-helm-installation-mode Ref: #25156 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 1f35baf ] Both of these workflows use binaries that are built in CI making use of various vendored dependencies, so run them as well on PRs only changing vendor/. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 93da6ae ] Currently sysdumps do not contain the L2 responder map dumps. This commit adds these so that we can debug issues with the L2 announcements if they come up in the future. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for my changes, thanks Jibi!
/test-backport-1.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my PR, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commits look good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for #22587
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My change looks good. Thanks!
couple of trivial docs/bugtool PRs left, marking as ready |
Once this PR is merged, you can update the PR labels via:
or with