Pulsar Failure Domain & Anti-affinity namespace group #840
Motivation
While there are tremendous operational benefits to collapsing domains under a single umbrella, managing the impact of failures becomes more difficult when they do occur. It is therefore beneficial to divide a single domain into multiple virtual domains that serve as failure domains and increase system availability when one of the domains goes offline.
In Pulsar, each cluster will have multiple pre-configured failure domains, and each failure domain is a logical region that contains a set of brokers. Creating multiple failure domains in a cluster is useful in several scenarios:
Deploy a patch or release to specific domain:
Sometimes a change or patch needs to be deployed for only a set of clients (e.g. a critical or high-traffic namespace that requires immediate attention or a bug fix).
Rollout a new release domain by domain:
A new release can be rolled out domain by domain, which ensures that the other domains remain available while the current domain goes through maintenance or a deployment rollout.
Support anti-affinity namespace group:
Sometimes an application has multiple namespaces and wants one of them to be available at all times to avoid downtime. In this case, we can distribute such namespaces across different failure domains. This feature is discussed in detail in the next section.
Therefore, we want to introduce "Failure domain" and "Anti-Affinity namespace" in Pulsar.
Proposal
Pulsar failure domain
A Pulsar failure domain is a logical domain under a Pulsar cluster. Each logical domain contains a pre-configured list of brokers. Pulsar will provide an admin API to create logical domains under a cluster and to register a list of brokers under each of them.
How to create domain and register brokers
The broker will store the domain configuration in global-zookeeper at the path:
/admin/clusters/<my-cluster>/domains/<domain-name>
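As a rough illustration of the storage layout, the sketch below builds the ZooKeeper path shown above and a possible payload for a domain. The helper names `domain_path` and `domain_payload` are hypothetical, and the JSON shape (an object with a `brokers` list) is an assumption; the proposal does not specify the serialized format.

```python
import json


def domain_path(cluster: str, domain: str) -> str:
    """Return the path under which a domain's configuration is stored."""
    return f"/admin/clusters/{cluster}/domains/{domain}"


def domain_payload(brokers) -> str:
    """Serialize the broker list of a domain.

    Assumption: the payload is a JSON object with a de-duplicated,
    sorted "brokers" list. The actual format is not given in the proposal.
    """
    return json.dumps({"brokers": sorted(set(brokers))})


if __name__ == "__main__":
    print(domain_path("my-cluster", "domain-1"))
    print(domain_payload(["broker-2:8080", "broker-1:8080"]))
```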
Admin api for domain:
Anti-affinity namespace group
Sometimes an application has multiple namespaces and we want one of them to be available at all times to avoid downtime. In this case, these namespaces should be owned by different failure domains and different brokers. Then, if one failure domain is down (due to a release rollout or broker restarts), only the namespaces owned by that failure domain are disrupted, and the namespaces owned by other domains remain available without any impact.
Such namespaces have anti-affinity to each other, and together they form an anti-affinity group: the load-balancer should try to place the namespaces of the group in different failure domains. If there are more anti-affinity namespaces than failure domains, the load-balancer distributes the namespaces evenly across all domains, and each domain in turn distributes its namespaces evenly across all of its brokers.
For instance in figure 1:
[Figure 1: anti-affinity namespace distribution across failure domains]
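The even distribution described above can be sketched as a two-level round-robin: first across domains, then across the brokers inside each domain. This is a minimal illustration, not the load-balancer's actual algorithm; the function name `distribute` and the input shapes are assumptions made for the example.

```python
from collections import defaultdict


def distribute(namespaces, domains):
    """Assign each namespace to a (domain, broker) pair.

    namespaces: list of namespace names in one anti-affinity group.
    domains: dict mapping domain name -> non-empty list of brokers.
    """
    placement = {}
    placed_in_domain = defaultdict(int)  # namespaces placed so far per domain
    domain_names = sorted(domains)
    for i, ns in enumerate(namespaces):
        # Level 1: rotate over domains so counts differ by at most one.
        domain = domain_names[i % len(domain_names)]
        brokers = domains[domain]
        # Level 2: rotate over the brokers of the chosen domain.
        broker = brokers[placed_in_domain[domain] % len(brokers)]
        placed_in_domain[domain] += 1
        placement[ns] = (domain, broker)
    return placement


if __name__ == "__main__":
    domains = {"domain-1": ["b1", "b2"], "domain-2": ["b3", "b4"]}
    for ns, target in distribute(["ns1", "ns2", "ns3", "ns4"], domains).items():
        print(ns, target)
```

With four namespaces, two domains, and two brokers per domain, each domain receives two namespaces and each broker receives one.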
Broker changes:
Namespace policies
To describe anti-affinity between namespaces, we have to bind them under one anti-affinity group, which indicates that all namespaces in the group have anti-affinity to each other. Therefore, we will introduce a new field "antiAffinityGroup" under the namespace policies.
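To make the new field concrete, here is a small sketch of a policies object that carries only `antiAffinityGroup`, together with a lookup of all namespaces sharing a group. The class `NamespacePolicies` and the helper `namespaces_in_group` are illustrative stand-ins, not Pulsar's real types; actual namespace policies hold many more settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NamespacePolicies:
    # Only the field introduced by this proposal is modeled here.
    # None means the namespace is not part of any anti-affinity group.
    antiAffinityGroup: Optional[str] = None


def namespaces_in_group(policies: Dict[str, NamespacePolicies], group: str) -> List[str]:
    """Return the namespaces whose policies name the given group."""
    return sorted(ns for ns, p in policies.items() if p.antiAffinityGroup == group)


if __name__ == "__main__":
    policies = {
        "tenant/ns1": NamespacePolicies(antiAffinityGroup="app-group"),
        "tenant/ns2": NamespacePolicies(antiAffinityGroup="app-group"),
        "tenant/ns3": NamespacePolicies(),
    }
    print(namespaces_in_group(policies, "app-group"))
```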
Load-balancer
While assigning namespace-bundle ownership, the load-balancer will first check the anti-affinity group name of the namespace. If one exists, the load-balancer will get the list of all namespaces that belong to the same anti-affinity group and will try to place them in different failure domains.
As described earlier, the load-balancer makes a best effort to distribute such namespaces across different failure domains, but it gives no guarantee when there are more anti-affinity namespaces than failure domains.
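One way to picture this best-effort placement is as a filter over candidate brokers at assignment time: prefer brokers in the domains that currently own the fewest namespaces of the group, then prefer the least-used brokers among those. The sketch below works under that assumption; `filter_candidates` and its inputs are hypothetical and do not reflect the load-balancer's real interfaces.

```python
from collections import Counter


def filter_candidates(candidates, broker_domain, group_owners):
    """Narrow candidate brokers for a namespace in an anti-affinity group.

    candidates: iterable of brokers eligible to own the bundle.
    broker_domain: dict mapping broker -> failure domain.
    group_owners: dict mapping already-placed namespaces of the same
        group -> owning broker. It is only read, never modified.
    """
    candidates = list(candidates)
    if not candidates:
        return []
    # How many group namespaces each domain and each broker already owns.
    domain_load = Counter(broker_domain[b] for b in group_owners.values())
    broker_load = Counter(group_owners.values())
    # Keep brokers in the least-loaded domains.
    min_domain = min(domain_load[broker_domain[b]] for b in candidates)
    in_domain = [b for b in candidates if domain_load[broker_domain[b]] == min_domain]
    # Among those, keep the least-loaded brokers.
    min_broker = min(broker_load[b] for b in in_domain)
    return sorted(b for b in in_domain if broker_load[b] == min_broker)


if __name__ == "__main__":
    broker_domain = {"b1": "d1", "b2": "d1", "b3": "d2", "b4": "d2"}
    print(filter_candidates(broker_domain, broker_domain, {"ns1": "b1"}))
```

When every domain already owns the same number of group namespaces, the filter falls back to spreading across brokers, which matches the lack of a guarantee once namespaces outnumber domains.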
If we add a new namespace to an existing anti-affinity group, the load-balancer will not unload already loaded namespace bundles, but it will make sure that new lookup requests take this change into account.