
[v1.15] Prevent Cilium agents from incorrectly restarting an etcd watch against a different clustermesh-apiserver instance. #32005

Merged 2 commits on Apr 18, 2024

Commits on Apr 16, 2024

  1. ClusterMesh/helm: support multiple replicas

    [ upstream commit df3c02f ]
    
    [ backporter's notes: dropped the session affinity changes, and
      backported only the introduction of the unique cluster id which,
      together with the interceptors backported as part of the next
      commit, prevents Cilium agents from incorrectly restarting an
      etcd watch against a different clustermesh-apiserver instance. ]
    
    This commit makes changes to the helm templates for
    clustermesh-apiserver to support deploying multiple replicas.
    
    - Use a unique cluster id for etcd:
    
    Each replica of the clustermesh-apiserver deploys its own discrete etcd
    cluster. Utilize the K8s downward API to provide the Pod UID to the
    etcd cluster as an initial cluster token, so that each instance has a
    unique cluster ID. This is necessary to distinguish connections to
    multiple clustermesh-apiserver Pods using the same K8s Service.
    
    - Use session affinity for the clustermesh-apiserver Service
    
    Session affinity ensures that connections from a client are passed to
    the same service backend each time. This will allow a Cilium Agent or
    KVStoreMesh instance to maintain a connection to the same backend for
    both long-living, streaming connections, such as watches on the kv
    store, and short, single-response connections, such as checking the
    status of a cluster. However, this can be unreliable if the L3/L4
    load balancer in use does not also implement sticky sessions to direct
    connections from a particular client to the same cluster node.
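
    The downward-API approach described above can be sketched as a fragment
    of the clustermesh-apiserver pod spec. This is an illustrative sketch,
    not the actual Helm template; the env var name and etcd argument wiring
    are assumptions, while `fieldRef: metadata.uid` and etcd's
    `--initial-cluster-token` flag are standard:

    ```yaml
    # Expose the Pod UID via the Kubernetes downward API and feed it to
    # etcd as the initial cluster token, so each replica's embedded etcd
    # reports a unique cluster ID. (Hypothetical fragment.)
    env:
      - name: POD_UID
        valueFrom:
          fieldRef:
            fieldPath: metadata.uid
    args:
      - --initial-cluster-token=$(POD_UID)
    ```

    Because the cluster token differs per Pod, clients can detect when a
    Service connection lands on a different replica than before.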
    
    Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
    Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
    thorn3r authored and giorio94 committed Apr 16, 2024
    Commit cce1cc0
  2. ClusterMesh: validate etcd cluster ID

    [ upstream commit 174e721 ]
    
    [ backporter's notes: backported a stripped-down version of the upstream
      commit, introducing only the interceptors, since it also fixes a bug
      occurring in a single clustermesh-apiserver configuration (during
      rollouts) by preventing Cilium agents from incorrectly restarting an
      etcd watch against a different clustermesh-apiserver instance. ]
    
    In a configuration where there are multiple replicas of the
    clustermesh-apiserver, each Pod runs its own etcd instance with a unique
    cluster ID. This commit adds a `clusterLock` type, which is a wrapper
    around a uint64 that can only be set once. `clusterLock` is used to
    create gRPC unary and stream interceptors that are provided to the etcd
    client to intercept and validate the cluster ID in the header of all
    responses from the etcd server.
    
    If the client receives a response from a different cluster, the
    connection is terminated and restarted. This is designed to prevent
    accepting responses from another cluster and potentially missing events
    or retaining invalid data.
    
    Since the addition of the interceptors allows quick detection of a
    failover event, we no longer need to rely on endpoint status checks to
    determine if the connection is healthy. Additionally, since service session
    affinity can be unreliable, the status checks could trigger a false
    failover event and cause a connection restart. To allow creating etcd
    clients for ClusterMesh that do not perform endpoint status checks, the
    option NoEndpointStatusChecks was added to ExtraOptions.
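
    The set-once wrapper described above can be sketched roughly as follows.
    This is a minimal illustration of the idea, not the actual Cilium
    implementation; the type and method names are assumptions. In the real
    change, a check like this runs inside gRPC unary and stream interceptors
    against the `ClusterId` field of each etcd response header:

    ```go
    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    // clusterLock wraps a uint64 that can only be set once: the first
    // observed etcd cluster ID is recorded, and every later response must
    // carry the same ID. (Illustrative sketch.)
    type clusterLock struct {
    	id atomic.Uint64
    }

    // validate pins the cluster ID on first use and rejects any response
    // from a different etcd cluster, signalling that the connection should
    // be terminated and restarted.
    func (c *clusterLock) validate(clusterID uint64) error {
    	c.id.CompareAndSwap(0, clusterID)
    	if got := c.id.Load(); got != clusterID {
    		return fmt.Errorf("unexpected etcd cluster ID %d (expected %d)", clusterID, got)
    	}
    	return nil
    }

    func main() {
    	var cl clusterLock
    	fmt.Println(cl.validate(42)) // first response pins the ID
    	fmt.Println(cl.validate(42)) // same cluster: accepted
    	fmt.Println(cl.validate(7))  // different cluster: rejected
    }
    ```

    Detecting a mismatch this way is immediate (on the very next response),
    which is why the separate endpoint status checks become unnecessary.
    
    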
    
    Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
    Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
    thorn3r authored and giorio94 committed Apr 16, 2024
    Commit 4f8a3a6