
[v1.15] Prevent Cilium agents from incorrectly restarting an etcd watch against a different clustermesh-apiserver instance. #32005

Merged 2 commits on Apr 18, 2024

Commits on Apr 16, 2024

  1. ClusterMesh/helm: support multiple replicas

    [ upstream commit df3c02f ]
    
    [ backporter's notes: dropped the session affinity changes, and
      backported only the introduction of the unique cluster id which,
      together with the interceptors backported as part of the next
      commit, prevents Cilium agents from incorrectly restarting an
      etcd watch against a different clustermesh-apiserver instance. ]
    
    This commit makes changes to the helm templates for
    clustermesh-apiserver to support deploying multiple replicas.
    
    - Use a unique cluster id for etcd:
    
    Each replica of the clustermesh-apiserver deploys its own discrete etcd
    cluster. Utilize the K8s downward API to provide the Pod UID to the
    etcd cluster as an initial cluster token, so that each instance has a
    unique cluster ID. This is necessary to distinguish connections to
    multiple clustermesh-apiserver Pods using the same K8s Service.
    
    - Use session affinity for the clustermesh-apiserver Service
    
    Session affinity ensures that connections from a client are passed to
    the same service backend each time. This will allow a Cilium Agent or
    KVStoreMesh instance to maintain a connection to the same backend for
    both long-living, streaming connections, such as watches on the kv
    store, and short, single-response connections, such as checking the
    status of a cluster. However, this can be unreliable if the L3/L4
    load balancer in use does not also implement sticky sessions to direct
    connections from a particular client to the same cluster node.
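
    The downward-API approach described above can be sketched as a fragment
    of the clustermesh-apiserver pod spec. This is an illustrative sketch,
    not the actual Helm template; the env var name and etcd argument wiring
    are assumptions, while `fieldRef: metadata.uid` and etcd's
    `--initial-cluster-token` flag are standard:

    ```yaml
    # Expose the Pod UID via the Kubernetes downward API and feed it to
    # etcd as the initial cluster token, so each replica's embedded etcd
    # reports a unique cluster ID. (Hypothetical fragment.)
    env:
      - name: POD_UID
        valueFrom:
          fieldRef:
            fieldPath: metadata.uid
    args:
      - --initial-cluster-token=$(POD_UID)
    ```

    Because the cluster token differs per Pod, clients can detect when a
    Service connection lands on a different replica than before.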
    
    Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
    Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
    thorn3r authored and giorio94 committed Apr 16, 2024
    Commit cce1cc0
  2. ClusterMesh: validate etcd cluster ID

    [ upstream commit 174e721 ]
    
    [ backporter's notes: backported a stripped-down version of the upstream
      commit, introducing only the interceptors, since it also fixes a bug
      occurring in a single clustermesh-apiserver configuration (during
      rollouts) by preventing Cilium agents from incorrectly restarting an
      etcd watch against a different clustermesh-apiserver instance. ]
    
    In a configuration where there are multiple replicas of the
    clustermesh-apiserver, each Pod runs its own etcd instance with a unique
    cluster ID. This commit adds a `clusterLock` type, which is a wrapper
    around a uint64 that can only be set once. `clusterLock` is used to
    create gRPC unary and stream interceptors that are provided to the etcd
    client to intercept and validate the cluster ID in the header of all
    responses from the etcd server.
    
    If the client receives a response from a different cluster, the
    connection is terminated and restarted. This is designed to prevent
    accepting responses from another cluster and potentially missing events
    or retaining invalid data.
    
    Since the addition of the interceptors allows quick detection of a
    failover event, we no longer need to rely on endpoint status checks to
    determine if the connection is healthy. Additionally, since service session
    affinity can be unreliable, the status checks could trigger a false
    failover event and cause a connection restart. To allow creating etcd
    clients for ClusterMesh that do not perform endpoint status checks, the
    option NoEndpointStatusChecks was added to ExtraOptions.
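
    The set-once wrapper described above can be sketched roughly as follows.
    This is a minimal illustration of the idea, not the actual Cilium
    implementation; the type and method names are assumptions. In the real
    change, a check like this runs inside gRPC unary and stream interceptors
    against the `ClusterId` field of each etcd response header:

    ```go
    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    // clusterLock wraps a uint64 that can only be set once: the first
    // observed etcd cluster ID is recorded, and every later response must
    // carry the same ID. (Illustrative sketch.)
    type clusterLock struct {
    	id atomic.Uint64
    }

    // validate pins the cluster ID on first use and rejects any response
    // from a different etcd cluster, signalling that the connection should
    // be terminated and restarted.
    func (c *clusterLock) validate(clusterID uint64) error {
    	c.id.CompareAndSwap(0, clusterID)
    	if got := c.id.Load(); got != clusterID {
    		return fmt.Errorf("unexpected etcd cluster ID %d (expected %d)", clusterID, got)
    	}
    	return nil
    }

    func main() {
    	var cl clusterLock
    	fmt.Println(cl.validate(42)) // first response pins the ID
    	fmt.Println(cl.validate(42)) // same cluster: accepted
    	fmt.Println(cl.validate(7))  // different cluster: rejected
    }
    ```

    Detecting a mismatch this way is immediate (on the very next response),
    which is why the separate endpoint status checks become unnecessary.
    
    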
    
    Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
    Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
    thorn3r authored and giorio94 committed Apr 16, 2024
    Commit 4f8a3a6