Skip to content

Etcd backed membership

lucyge2022 edited this page Aug 2, 2024 · 4 revisions

Architecture

Etcd served as distributed system coordinator in alluxio Architecture, currently it provides the membership module to manage alluxio Dora cluster worker membership.

Topology of deployment:

        Etcd managing one Alluxio cluster                   Etcd managing multiple Alluxio clusters
            --------------                                         --------------        
           | Etcd cluster |                                       | Etcd cluster |
            --------------                                         --------------
           /       |       \                                     /                \
      ------------------------------                 ------------------------     -----------------------
     |     Alluxio Dora cluster 1   |               | Alluxio Dora cluster 1 |   | Alluxio Dora cluster 2 |
     | Worker | Worker | Worker |...|               | worker | worker ...    |   | worker | worker ...    |
      ------------------------------                 ------------------------     ------------------------

Etcd specs

Currently only supporting etcd v3 API. And the etcd v3 authentication is not supported.

Registration

Initial cluster, Deploy and Spinup

  • [StartEtcd] make sure etcd up and running..

  • [StartWorkers], start 100 workers

each worker will join the membership, it will first register itself to the path on etcd in the format of:

/DHT/<cluster_name>/AUTHORIZED/<worker_id>  
// the cluster_name could be configured thru: alluxio.cluster.name` otherwise "DefaultAlluxioCluster" is used.
// the worker_id is a auto-generated unique identifier of each worker
// the value stored in this path is the worker info such as network addresses/port etc.

Note that this path is a permanent key-value pair registered on etcd. It's an idempotent operation at time of the starting up the worker process.

  • [StartServiceDiscovery]

After registration, worker will do service registry on etcd for service discovery to report its liveliness. It will create an ephemeral key path which binds an etcd lease with a fixed TTL (currently hard-coded to 2sec), and doing keep-alive calls to keep renewing the lease to report its liveliness. If the worker process crash or there's network partition happening on this worker, this path will be deleted by etcd after TTL passed.

/ServiceDiscovery/<cluster_name>/<worker_id>

Either client or worker, when requesting etcd with prefix of "/ServiceDiscovery/<cluster_name>/" will get a list of living workers, comparing against the registered worker, we can get a list of failed workers.

Bootstrapping a new worker

Simply start the worker with the configured the etcd endpoints, it will follow the steps of Registration and StartingServicediscovery registry, and automatically join the cluster membership.

Decommissioning

For ETCD typed membership, The workflow to do decommissioning is:

  1. bring the target worker down.
  2. use cli to delete the node permanently from the etcd registration. bin/alluxio process remove-worker <worker_id>

e.g. bin/alluxio process remove-worker worker-46af7238-f09b-4db6-986e-152d9516a2da

Detailed implementation:

Registration:

The registration happens with a transaction (Txn) on etcd, the transaction will only happen when the registration key path's version is 0 (first time creation). Since the current worker_id [=] md5(WorkerNetAddress.dumpMainInfo() string), therefore for a same set of values of fields contained in dumpMainInfo is considered a same worker. If same such configured worker is going to register at the same time onto etcd, only one will succeed in completing the transaction.

Refer to etcd doc: https://etcd.io/docs/v3.4/learning/api/#transaction

ServiceDiscovery:

After registration, Worker first creates a new lease on etcd with a fix TTL, and uses Txn to create the ephemeral key path with the lease only when the version is 0. We used a StreamObserver to act on LeaseKeepAliveResponse when creating a keep-alive client on this ephemeral key path, when onError or onCompleted is called to notify the end of the keep-alive client lifecycle, we will mark this worker entity needs reconnection. A background periodic task will check the entity, and recreate the lease and the ephemeral key-value path.

Reference: https://etcd.io/docs/v3.4/learning/api/#lease-api

Retrieve membership infos

Range query with a prefix of the registration key path (/DHT/<cluster_name>/AUTHORIZED/) or ServiceDiscovery (/ServiceDiscovery/<cluster_name>/) could give full view / live view of the worker members.

High availability guarantee

As etcd guarantees high availability by automatically switching leadership in case leader nodes down. Alluxio Worker will automatically detect the new endpoint to connect to and keep the service discovery heartbeating ongoing.

The details about deploying etcd backed alluxio cluster, check this wiki: https://github.com/Alluxio/alluxio/wiki/Membership-Module

Code gateway: https://github.com/Alluxio/alluxio/blob/main/dora/core/common/src/main/java/alluxio/membership/EtcdMembershipManager.java