v1.15 Backports 2024-03-24 #31568
Conversation
[ upstream commit 9939fa2 ] Before this patch, Hubble would wrongly report known traffic direction and reply status when IPSec was enabled.

Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit fbe78c4 ] The default service CIDR of AKS clusters is 10.0.0.0/16 [1]. Unfortunately, we don't set a pod CIDR for clusterpool IPAM, and hence use Cilium's default of 10.0.0.0/8, which overlaps. This can lead to "fun" situations in which e.g. the kube-dns service ClusterIP is the same as the hubble-relay pod IP, or similar shenanigans. This usually breaks the cluster utterly. The fix is relatively straightforward: set a pod CIDR for Cilium which does not overlap with the defaults of AKS. We chose 192.168.0.0/16 as this is what is recommended in [2].

[1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
[2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Fixes: fbf3d38 (ci: add AKS workflow)

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
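The overlap described above is easy to verify. The following is a hypothetical illustration (not Cilium code) showing why the AKS default service CIDR 10.0.0.0/16 collides with Cilium's default clusterpool pod CIDR 10.0.0.0/8, while the recommended 192.168.0.0/16 does not:

```python
# Illustrative sketch: check whether two CIDR ranges share any addresses.
import ipaddress

def cidrs_overlap(a: str, b: str) -> bool:
    """Return True if the two CIDR ranges overlap."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

aks_service_cidr = "10.0.0.0/16"         # AKS default service CIDR [1]
cilium_default_pod_cidr = "10.0.0.0/8"   # Cilium clusterpool default
recommended_pod_cidr = "192.168.0.0/16"  # value chosen by this backport [2]

print(cidrs_overlap(aks_service_cidr, cilium_default_pod_cidr))  # True
print(cidrs_overlap(aks_service_cidr, recommended_pod_cidr))     # False
```

Since 10.0.0.0/16 is entirely contained in 10.0.0.0/8, any pod allocated out of the default clusterpool range can collide with a service ClusterIP, which is exactly the kube-dns/hubble-relay scenario described in the commit message.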
/test-backport-1.15
Thanks!
Thanks Tam!
My patch LGTM, thanks @sayboras!
Thanks!
AFAIK @glrf is on PTO for the week - might want to drop his PR for now if there are issues
[ upstream commit 7162892 ] This change fixes an issue with the PeerManager that can lead to Relay being unable to reconnect to some or all peers when the client certificate expires or the Certificate Authority is replaced.

Before this change, when the client certificate changed, we did not redial or update the existing gRPC ClientConns. When the old certificate becomes invalid (expired, revoked, or issued by a replaced CA), the connection eventually fails with a certificate error. However, the gRPC ClientConn is not closed; it treats the certificate error as a transient failure and retries connecting with the old credentials indefinitely. In most cases this causes the Relay health checks to fail, so Relay restarts and successfully reconnects to all peers. However, if a new peer joins between the certificate being updated and the connections failing, Relay may keep on running in a degraded state.

This issue was introduced by #28595. Before that change, Relay aggressively closed and re-dialed ClientConns on any error, mitigating this problem.

We fix this issue by wrapping the provided gRPC transport credentials and updating the TLS configuration whenever a new TLS connection is established. This means every TLS connection will use up-to-date certificates and gRPC ClientConns will be able to recover when their certificate changes.

Fixes: aca4d42 ("hubble/relay: Fix connection leak when reconnecting to peer service")

Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Alexandre Perrin <alex@isovalent.com>
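The essence of the fix can be sketched in a few lines. This is a hedged Python illustration, not the actual Go/gRPC implementation: the broken pattern snapshots TLS material once at dial time, while the fixed pattern asks a credential provider for the current material on every handshake, so certificate rotation is picked up without restarting:

```python
# Illustrative sketch (names are hypothetical, not Cilium APIs).

class CredentialProvider:
    """Holds the latest certificate; rotation replaces it in place."""
    def __init__(self, cert: str) -> None:
        self._cert = cert

    def rotate(self, new_cert: str) -> None:
        self._cert = new_cert

    def current(self) -> str:
        return self._cert

class ReconnectingClient:
    """Re-reads credentials on every handshake, so rotation is picked up."""
    def __init__(self, provider: CredentialProvider) -> None:
        # Keep the provider, not a snapshot of its certificate. The buggy
        # behaviour corresponds to caching provider.current() here and
        # retrying with that stale value forever.
        self._provider = provider

    def handshake(self) -> str:
        return f"handshake with {self._provider.current()}"

provider = CredentialProvider("cert-v1")
client = ReconnectingClient(provider)
print(client.handshake())   # handshake with cert-v1
provider.rotate("cert-v2")  # certificate renewed / CA replaced
print(client.handshake())   # handshake with cert-v2 -- no restart needed
```

In the actual patch this indirection lives in a wrapper around the gRPC transport credentials, so existing ClientConns transparently use fresh TLS configuration on each new TLS connection.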
… port [ upstream commit d3b19d6 ] Currently, listing the load-balancing configuration doesn't display the L7LB Proxy Port for services of type `l7-load-balancer`.

```
cilium-dbg bpf lb list
SERVICE ADDRESS      BACKEND ADDRESS (REVNAT_ID) (SLOT)
...
10.96.193.7:443      0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer]
```

The only way of retrieving the L7LB proxy port is to list the frontends (`cilium-dbg bpf lb list --frontends`) and manually convert the backend id (union type) to the L7LB proxy port.

Therefore, this commit adds the L7LB proxy port to the output of `cilium-dbg bpf lb list` if the service is of type L7 LoadBalancer. The `--frontends` subcommand still displays the unmapped backend id.

```
cilium-dbg bpf lb list
SERVICE ADDRESS      BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.0.1:443        172.18.0.3:6443 (1) (1)
                     0.0.0.0:0 (1) (0) [ClusterIP, non-routable]
10.96.252.10:443     172.18.0.2:4244 (22) (1)
                     0.0.0.0:0 (22) (0) [ClusterIP, InternalLocal, non-routable]
10.96.155.44:80      0.0.0.0:0 (14) (0) [ClusterIP, non-routable]
                     10.244.1.211:80 (14) (1)
172.18.0.2:32646     0.0.0.0:0 (33) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.193.7:443      0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.122.45:80      10.244.1.250:80 (26) (1)
                     0.0.0.0:0 (26) (0) [ClusterIP, non-routable]
10.96.102.137:80     0.0.0.0:0 (23) (0) [ClusterIP, non-routable]
                     10.244.1.126:4245 (23) (1)
10.96.108.180:443    0.0.0.0:0 (17) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
172.18.255.1:80      0.0.0.0:0 (25) (0) [LoadBalancer, l7-load-balancer] (L7LB Proxy Port: 17731)
0.0.0.0:32646        0.0.0.0:0 (34) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
0.0.0.0:31012        0.0.0.0:0 (21) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
```

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
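A practical consequence of the new output format is that consumers no longer need to decode the backend-id union: the proxy port can be read straight off the listing. Here is a hypothetical helper (not part of cilium-dbg) that extracts the per-frontend L7LB proxy ports from output shaped like the sample above:

```python
# Hypothetical parser for the new `cilium-dbg bpf lb list` output format.
import re

def l7lb_proxy_ports(output: str) -> dict:
    """Map each L7 load-balanced frontend address to its proxy port."""
    ports = {}
    current_service = None
    for line in output.splitlines():
        # A service line starts with a frontend address in column one;
        # backend continuation lines are indented.
        m = re.match(r"^(\S+:\d+)\s", line)
        if m:
            current_service = m.group(1)
        p = re.search(r"L7LB Proxy Port: (\d+)", line)
        if p and current_service:
            ports[current_service] = int(p.group(1))
    return ports

sample = """10.96.193.7:443   0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
172.18.255.1:80   0.0.0.0:0 (25) (0) [LoadBalancer, l7-load-balancer] (L7LB Proxy Port: 17731)
10.96.122.45:80   10.244.1.250:80 (26) (1)
"""
print(l7lb_proxy_ports(sample))
# {'10.96.193.7:443': 15735, '172.18.255.1:80': 17731}
```

Services without the `l7-load-balancer` flag simply never match the proxy-port pattern and are skipped.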
[ upstream commit f8fb8d1 ] Replaced docker.io with quay.io, pinned to the current latest, as the Docker source is deprecated.

Signed-off-by: loomkoom <loomkoom@hotmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit 59a01a8 ] Implement the following initial metrics for BGP Control Plane:

1. `cilium_bgp_control_plane_session_state`: gauge that shows session state per vrouter/neighbor. Established (1) or Not Established (0).
2. `cilium_bgp_control_plane_advertised_routes`: gauge that shows the number of advertised routes per vrouter/neighbor/afi/safi.
3. `cilium_bgp_control_plane_received_routes`: gauge that shows the number of received routes per vrouter/neighbor/afi/safi.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
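The label structure of these gauges can be sketched as follows. This is an illustrative Python model, not the actual Cilium/Prometheus Go code: it only shows how the session-state gauge is keyed by vrouter/neighbor while the route-count gauges add afi/safi labels:

```python
# Minimal labeled-gauge model (names mirror the commit; values are examples).

class Gauge:
    """A named gauge: maps a label-value tuple to a numeric value."""
    def __init__(self, name: str, labels: tuple) -> None:
        self.name = name
        self.labels = labels
        self.values = {}

    def set(self, value: float, **labelvals) -> None:
        key = tuple(labelvals[l] for l in self.labels)
        self.values[key] = value

session_state = Gauge(
    "cilium_bgp_control_plane_session_state",
    ("vrouter", "neighbor"),
)
advertised_routes = Gauge(
    "cilium_bgp_control_plane_advertised_routes",
    ("vrouter", "neighbor", "afi", "safi"),
)

# Example: session to a hypothetical neighbor 10.0.0.1 is Established (1),
# with two IPv4 unicast routes advertised.
session_state.set(1, vrouter="65001", neighbor="10.0.0.1")
advertised_routes.set(2, vrouter="65001", neighbor="10.0.0.1",
                      afi="ipv4", safi="unicast")
print(session_state.values)      # {('65001', '10.0.0.1'): 1}
print(advertised_routes.values)  # {('65001', '10.0.0.1', 'ipv4', 'unicast'): 2}
```

`cilium_bgp_control_plane_received_routes` carries the same vrouter/neighbor/afi/safi label set as the advertised-routes gauge.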
Compare: c3eda6d to 145be9c
/test-backport-1.15
Most of the reviews are in, CI is all green, marking this ready to merge.
Once this PR is merged, a GitHub action will update the labels of these PRs: