v1.15 Backports 2024-03-24 #31568

Merged
merged 6 commits into from Mar 25, 2024

@sayboras sayboras commented Mar 24, 2024

Once this PR is merged, a GitHub action will update the labels of these PRs:

 31211 31504 31376 31503 31373 31469

kaworu and others added 2 commits March 24, 2024 19:00
[ upstream commit 9939fa2 ]

Before this patch, Hubble would wrongly report known traffic direction
and reply status when IPSec was enabled.

Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit fbe78c4 ]

The default service CIDR of AKS clusters is 10.0.0.0/16 [1].
Unfortunately, we don't set a pod CIDR for clusterpool IPAM, and hence
use Cilium's default of 10.0.0.0/8, which overlaps. This can
lead to "fun" situations in which e.g. the kube-dns service ClusterIP is
the same as the hubble-relay pod IP, or similar shenanigans. This
usually breaks the cluster utterly.

The fix is relatively straightforward: set a pod CIDR for Cilium which
does not overlap with the AKS defaults. We chose 192.168.0.0/16 as this
is what is recommended in [2].
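
The overlap described above can be checked mechanically. The following is a minimal sketch using Go's standard `net/netip` package; the helper name `overlaps` is hypothetical and not part of Cilium or the CI workflow. It compares the AKS default service CIDR against both the old and the new pod CIDR.

```go
package main

import (
	"fmt"
	"net/netip"
)

// overlaps is a hypothetical helper that reports whether two CIDR
// strings share at least one address.
func overlaps(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, err
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, err
	}
	return pa.Overlaps(pb), nil
}

func main() {
	// AKS default service CIDR, as stated in the commit message.
	const aksServiceCIDR = "10.0.0.0/16"

	// Old default pod CIDR first, then the CIDR chosen by this fix.
	for _, podCIDR := range []string{"10.0.0.0/8", "192.168.0.0/16"} {
		ok, err := overlaps(aksServiceCIDR, podCIDR)
		if err != nil {
			fmt.Println("invalid CIDR:", err)
			continue
		}
		fmt.Printf("%s overlaps %s: %t\n", podCIDR, aksServiceCIDR, ok)
	}
}
```

With the values from the commit message, the first pair overlaps and the second does not.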

[1]: https://learn.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-with-system-assigned-managed-identities
[2]: https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#option-1-assign-ip-addresses-from-an-overlay-network

Fixes: fbf3d38 (ci: add AKS workflow)

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras sayboras added kind/backports This PR provides functionality previously merged into master. backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. labels Mar 24, 2024
@sayboras

/test-backport-1.15

@sayboras sayboras marked this pull request as ready for review March 24, 2024 10:01
@sayboras sayboras requested review from a team as code owners March 24, 2024 10:01
@sayboras sayboras requested review from aanm and tklauser March 24, 2024 10:01

@YutaroHayakawa YutaroHayakawa left a comment

Thanks!


@mhofstetter mhofstetter left a comment

Thanks Tam!


@kaworu kaworu left a comment

My patch LGTM, thanks @sayboras!


@bimmlerd bimmlerd left a comment

Thanks!

@bimmlerd

AFAIK @glrf is on PTO for the week - might want to drop his PR for now if there's issues

glrf and others added 4 commits March 25, 2024 11:29
[ upstream commit 7162892 ]

This change fixes an issue with the PeerManager that can lead to Relay
being unable to reconnect to some or all peers when the client
certificate expires or the Certificate Authority is replaced.

Before this change, when the client certificate changed, we did not
redial or update the existing gRPC ClientConns. When the old certificate
becomes invalid (expired, signed by a replaced CA, or revoked), the
connection will eventually fail with a certificate error.
However, the gRPC ClientConn is not closed, but treats the certificate
error as a transient failure and will retry connecting with the old
credentials indefinitely.
In most cases this will cause the relay health checks to fail. Relay
will restart and successfully reconnect to all peers. However, if a new
peer joins between the certificate being updated and the connections
failing, Relay may keep on running in a degraded state.

This issue was introduced by #28595. Before that change, Relay
aggressively closed and re-dialed ClientConns on any error, mitigating
this problem.

We fix this issue by wrapping the provided gRPC transport credentials
and updating the TLS configuration whenever a new TLS connection is
established. This means every TLS connection will use up-to-date
certificates and gRPC ClientConns will be able to recover when their
certificate changes.
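
The idea of building the TLS configuration per connection can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from the upstream commit: `certStore`, `handshakeConfig`, `placeholderCert`, and `leafLabel` are hypothetical names, and the real change wraps gRPC transport credentials rather than calling a helper directly. The point is only that the configuration is derived at handshake time, so a certificate rotated after the first connection is picked up by the next one.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

// certStore is a hypothetical holder for the most recently loaded
// client certificate (for example, refreshed by a file watcher).
type certStore struct {
	mu   sync.RWMutex
	cert tls.Certificate
}

func (s *certStore) Set(c tls.Certificate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cert = c
}

func (s *certStore) Get() tls.Certificate {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cert
}

// handshakeConfig builds the TLS configuration for ONE new connection.
// It is meant to be called on every handshake, so it always reads the
// current certificate instead of one captured at dial time.
func handshakeConfig(base *tls.Config, store *certStore, serverName string) *tls.Config {
	if base == nil {
		base = &tls.Config{}
	}
	cfg := base.Clone()
	cfg.ServerName = serverName
	cfg.Certificates = []tls.Certificate{store.Get()}
	return cfg
}

// placeholderCert wraps a label as a fake certificate chain. The bytes
// are NOT valid DER; they only make the rotation visible in the demo.
func placeholderCert(label string) tls.Certificate {
	return tls.Certificate{Certificate: [][]byte{[]byte(label)}}
}

// leafLabel returns the label stored by placeholderCert.
func leafLabel(cfg *tls.Config) string {
	return string(cfg.Certificates[0].Certificate[0])
}

func main() {
	store := &certStore{}
	base := &tls.Config{MinVersion: tls.VersionTLS13}

	store.Set(placeholderCert("old-cert"))
	first := handshakeConfig(base, store, "peer.example")

	// The certificate is rotated while the client keeps running.
	store.Set(placeholderCert("new-cert"))
	second := handshakeConfig(base, store, "peer.example")

	fmt.Println(leafLabel(first))
	fmt.Println(leafLabel(second))
}
```

A configuration captured once at dial time would keep presenting `old-cert` on every retry, which is the failure mode described above.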

Fixes: aca4d42 ("hubble/relay: Fix connection leak when reconnecting to peer service")

Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Alexandre Perrin <alex@isovalent.com>
… port

[ upstream commit d3b19d6 ]

Currently, listing the load-balancing configuration doesn't display the
L7LB Proxy Port for services of type `l7-load-balancer`.

```
cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
...
10.96.193.7:443     0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer]
```

The only way of retrieving the L7LB proxy port is to list the frontends
(`cilium-dbg bpf lb list --frontends`) and manually convert the backend id
(union type) to the L7LB proxy port.

Therefore, this commit adds the L7LB proxy port to the output of `cilium-dbg bpf lb list`
if the service is of type L7 LoadBalancer. The `--frontends` flag still displays the
unmapped backend id.

```
cilium-dbg bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.0.1:443       172.18.0.3:6443 (1) (1)
                    0.0.0.0:0 (1) (0) [ClusterIP, non-routable]
10.96.252.10:443    172.18.0.2:4244 (22) (1)
                    0.0.0.0:0 (22) (0) [ClusterIP, InternalLocal, non-routable]
10.96.155.44:80     0.0.0.0:0 (14) (0) [ClusterIP, non-routable]
                    10.244.1.211:80 (14) (1)
172.18.0.2:32646    0.0.0.0:0 (33) (0) [NodePort, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.193.7:443     0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
10.96.122.45:80     10.244.1.250:80 (26) (1)
                    0.0.0.0:0 (26) (0) [ClusterIP, non-routable]
10.96.102.137:80    0.0.0.0:0 (23) (0) [ClusterIP, non-routable]
                    10.244.1.126:4245 (23) (1)
10.96.108.180:443   0.0.0.0:0 (17) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
172.18.255.1:80     0.0.0.0:0 (25) (0) [LoadBalancer, l7-load-balancer] (L7LB Proxy Port: 17731)
0.0.0.0:32646       0.0.0.0:0 (34) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)
0.0.0.0:31012       0.0.0.0:0 (21) (0) [NodePort, non-routable, l7-load-balancer] (L7LB Proxy Port: 17731)
```
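
For scripts that consume this output, the new suffix can be extracted with a small parser. The sketch below is an assumption-laden example, not part of `cilium-dbg`: the function name `proxyPort` and the regular expression are hypothetical, and the line format is taken only from the sample output above, so it may need adjusting if the format changes.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// l7lbPortRe matches the "(L7LB Proxy Port: N)" suffix shown in the
// sample output. The pattern is an assumption based on that sample.
var l7lbPortRe = regexp.MustCompile(`\(L7LB Proxy Port: (\d+)\)\s*$`)

// proxyPort extracts the L7LB proxy port from one output line.
// The boolean is false when the line carries no proxy port.
func proxyPort(line string) (uint16, bool) {
	m := l7lbPortRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	n, err := strconv.ParseUint(m[1], 10, 16)
	if err != nil {
		return 0, false
	}
	return uint16(n), true
}

func main() {
	// Two lines copied from the sample output above.
	lines := []string{
		"10.96.193.7:443     0.0.0.0:0 (30) (0) [ClusterIP, non-routable, l7-load-balancer] (L7LB Proxy Port: 15735)",
		"10.96.0.1:443       172.18.0.3:6443 (1) (1)",
	}
	for _, l := range lines {
		if port, ok := proxyPort(l); ok {
			fmt.Println("proxy port:", port)
		} else {
			fmt.Println("no proxy port on this line")
		}
	}
}
```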

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit f8fb8d1 ]

Replaced docker.io with quay.io, pinned to the current latest version.
The Docker source is deprecated.

Signed-off-by: loomkoom <loomkoom@hotmail.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
[ upstream commit 59a01a8 ]

Implement the following initial metrics for the BGP Control Plane.

1. cilium_bgp_control_plane_session_state

Gauge that shows session state per vrouter/neighbor. Established (1) or
Not Established (0).

2. cilium_bgp_control_plane_advertised_routes

Gauge that shows the number of advertised routes per
vrouter/neighbor/afi/safi.

3. cilium_bgp_control_plane_received_routes

Gauge that shows the number of received routes per
vrouter/neighbor/afi/safi.
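
To make the session-state encoding concrete, here is a rough sketch of how one sample could look in the Prometheus text exposition format. It does not use the Prometheus client library or Cilium's metrics code. The helper names, the label names `vrouter` and `neighbor`, and the example values are assumptions for illustration; only the metric name and the 1/0 encoding come from the description above.

```go
package main

import "fmt"

// sessionStateValue maps a BGP session state name to the gauge value
// described above: 1 for Established, 0 for anything else.
func sessionStateValue(state string) int {
	if state == "Established" {
		return 1
	}
	return 0
}

// sessionStateSample renders one sample line in Prometheus text
// exposition format. The label names are assumed, not confirmed.
func sessionStateSample(vrouter, neighbor, state string) string {
	return fmt.Sprintf(
		"cilium_bgp_control_plane_session_state{vrouter=%q,neighbor=%q} %d",
		vrouter, neighbor, sessionStateValue(state),
	)
}

func main() {
	// Hypothetical virtual router ASN and neighbor address.
	fmt.Println(sessionStateSample("65001", "10.0.0.2", "Established"))
	fmt.Println(sessionStateSample("65001", "10.0.0.3", "Active"))
}
```

Encoding the state as 1 or 0 keeps alerting simple: a rule can fire whenever the gauge for a given neighbor is 0.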

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
@kaworu kaworu force-pushed the pr/v1.15-backport-2024-03-24-07-00 branch from c3eda6d to 145be9c Compare March 25, 2024 10:29
@sayboras

/test-backport-1.15

@sayboras

Most of the reviews are in, CI is all green, marking this ready to merge.

@sayboras sayboras added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 25, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 25, 2024
@aanm aanm merged commit b15c342 into v1.15 Mar 25, 2024
219 checks passed
@aanm aanm deleted the pr/v1.15-backport-2024-03-24-07-00 branch March 25, 2024 13:29
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
8 participants