---
title: Cluster Topology Patterns
summary: An illustration of common topology patterns.
toc: true
---
This page covers common cluster topology patterns, with setup examples and performance considerations.
When selecting a pattern for your cluster, take the following into consideration:

- The function of a CockroachDB leaseholder.
- The impact of the leaseholder on read and write activity.
- Whether leaseholders are local to the readers and writers within a datacenter.
- Whether leaseholder migration among datacenters is minimized by using the partitioning feature.
- Whether the application is designed to use the partitioning feature.
- Whether the `--locality` flag is set properly on each node to enable follow-the-workload.
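As a sketch of how `--locality` might be set, each node is started with a flag describing its location. The addresses, store path, and locality tier values below are placeholders, not a prescribed layout:

```shell
# Hypothetical example: start a node in the West datacenter with its
# locality declared, so follow-the-workload and partitioning can use it.
# All addresses and tier values here are placeholders.
cockroach start \
  --insecure \
  --store=/mnt/cockroach \
  --listen-addr=node1.west.example.com:26257 \
  --join=node1.west.example.com,node1.central.example.com,node1.east.example.com \
  --locality=region=west,datacenter=west-1
```

Locality tiers are ordered key=value pairs, from broadest (e.g., region) to narrowest (e.g., datacenter).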
Before you select a candidate pattern for your cluster, use the following broad patterns as a starting point and consider trade-offs.
{{site.data.alerts.callout_info}} This page does not factor in hardware differences. {{site.data.alerts.end}}
A local deployment is a single-datacenter deployment. The network latency among the nodes is expected to be uniform, around 1ms.
In the diagrams below:
- `App` is an application that accesses CockroachDB.
- `HA-Proxy` is a software-based load balancer.
- `1`, `2`, and `3` each represent a CockroachDB node.
- The nodes are all running in a single datacenter.
Normal operating mode:
```
      App
       |
   HA-Proxy
     / | \
    /  |  \
   1------2------3
    \___________/
```
If node `1` goes down, the database and app are still fully operational:

```
      App
       |
   HA-Proxy
     / | \
    /  |  \
   x------2------3
    \___________/
```
If node `2` is down, the database and app are still fully operational:

```
      App
       |
   HA-Proxy
     / | \
    /  |  \
   1------x------3
    \___________/
```
If node `3` is down, the database and app are still fully operational:

```
      App
       |
   HA-Proxy
     / | \
    /  |  \
   1------2------x
    \___________/
```
Three or more nodes are recommended to provide high availability, share the load, and spread capacity. Dynamically scaling from three nodes to four, four to five, and so on is supported; there are no constraints on node increments.
The diagram below depicts each node as a letter (i.e., `A`, `B`, `C`, `D`, `E`):

```
A------C          A-----C          A-----C
 \    /   online  |     |  online  | \ / |
  \  /   ------>  |     |  ------> |  E  |
   \/             |     |          | / \ |
   B              B-----D          B-----D
```
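Scaling out amounts to starting another node that joins the existing cluster. A hedged sketch (addresses and store path are placeholders):

```shell
# Hypothetical: add node E to an existing four-node cluster.
# The new node only needs --join to point at nodes already in the
# cluster; replicas are rebalanced onto it automatically.
cockroach start \
  --insecure \
  --store=/mnt/cockroach \
  --listen-addr=node-e.example.com:26257 \
  --join=node-a.example.com,node-b.example.com,node-c.example.com
```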
The sample patterns in this section represent multi-region clusters and show a broad placement of datacenters as `West`, `East`, and `Central`. The latency numbers (e.g., `60ms`) represent the network round-trip from one datacenter to another.
The diagram below depicts an asymmetrical setup where `Central` is closer to `West` than to `East`. This configuration provides better write latency to the write workloads in `West` and `Central`.
```
 A---C               A---C
  \ /                 \ /
   B                   B
  West ---80ms--- East
     \            /
     20ms      60ms
       \        /
        \      /
         \    /
        Central
         A---C
          \ /
           B
```
In this pattern:

- Each region defines an availability zone, and three or more regions are recommended.
- As with the local topology, more regions can be added dynamically.
- A homogeneous configuration among the regions is recommended for simplified operations.
- For sophisticated workloads, each region can have a different node count and node specification. This heterogeneous configuration may better handle region-specific concurrency and load characteristics.
- The cluster can survive a single datacenter failure.
- The network latency among the regions is expected to be proportional to the distance between them.
A modern multi-tier architecture is simplified as `App` and `LB` layers in the diagram below:
```
            Clients
               |
             GSLB
               |
    +----------+----------+
  West      Central      East
    |          |           |
   App        App         App
   ---        ---         ---
   LB         LB          LB
    |          |           |
  West --- Central --- East
      \                /
       \ CockroachDB  /
        --------------
```
- A client connects to a geographically close app server via `GSLB`.
- The app servers connect to one of the CockroachDB nodes within their geography via a local load balancer (`LB`).
- The configuration for the software-based load balancer (`HAProxy`), located on the app server, is provided by CockroachDB. A network-based load balancer can also be used.
- The cluster can survive a single datacenter failure.
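CockroachDB can generate the HAProxy configuration for the local nodes. As an illustrative sketch (the host address is a placeholder):

```shell
# Generate an haproxy.cfg covering the cluster's nodes.
# Run against any live node; the host address is a placeholder.
cockroach gen haproxy \
  --insecure \
  --host=node1.west.example.com
```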
When locality is enabled, `haproxy` should be set up to load balance across the CockroachDB nodes within the same locality as the app servers first:

- The `West` app servers should connect to the `West` CockroachDB servers.
- The `Central` app servers should connect to the `Central` CockroachDB servers.
- The `East` app servers should connect to the `East` CockroachDB servers.
If all of the nodes for a preferred locality are down, then the app will try databases in other localities.
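One way to express this preference in HAProxy is to mark the non-local nodes as `backup` servers, so they are only used when every local node is down. This is a hand-written sketch for the `West` app servers, not the generated file; server names and addresses are hypothetical:

```
listen cockroach
    bind :26257
    mode tcp
    balance roundrobin
    # Local (West) nodes are preferred.
    server west-1    west-1.example.com:26257    check
    server west-2    west-2.example.com:26257    check
    server west-3    west-3.example.com:26257    check
    # Other localities are only tried if all West nodes are down.
    server central-1 central-1.example.com:26257 check backup
    server east-1    east-1.example.com:26257    check backup
```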
Some applications have high-performance requirements. In the diagram below, `NJ` and `NY` depict two separate datacenters that are connected by a high-bandwidth, low-latency network:
```
NJ ---1ms--- NY
  \         /
  20ms   20ms
    \     /
     \   /
    Central
     /   \
    /     \
  20ms   20ms
  /         \
CA ---1ms--- NV
```
In this pattern:

- `NJ` and `NY` have the performance characteristics of the local topology, with the benefit of a zero-RPO and near-zero-RTO disaster recovery SLA.
- `CA` and `NV` are set up with a similar network capability.
- The `Central` region serves as the quorum.
- The cluster can survive a single datacenter failure.
The global pattern connects multiple regional clusters together to form a single database that is globally distributed. Transactions are globally consistent.
```
West-----East      West-------East
  \       /          \         /
   \Asia /            \Europe /
    \   /              \     /
     \ /                \   /
    Central            Central

      Asia------Europe
        \        /
         \      /
          \    /
        Americas

      West---------East
        \           /
         \Americas /
          \       /
          Central
```
- The cluster can survive a single datacenter failure.
The sample patterns in this section assume the use of the geo-partitioning feature, with `West`, `East`, and `Central` indicating a broad placement of datacenters. The latency numbers (e.g., `60ms`) represent the network round-trip from one datacenter to another.
| Topology           | West                   | Central                | East                   |
|--------------------|------------------------|------------------------|------------------------|
| Symmetrical        | Read local, Write 60ms | Read local, Write 60ms | Read local, Write 60ms |
| Dual East          | Read local, Write 60ms |                        | Read local, Write 5ms  |
| Dual East and West | Read local, Write 5ms  |                        | Read local, Write 5ms  |
During normal operations:
```
 App              App
   \              /
   West ---60ms--- East
     \            /
     60ms      60ms
       \        /
        \      /
        Central
           |
          App
```
- Tables are partitioned at the row level by locality.
- Rows with the `West` partition have their leaseholder in the `West` datacenter.
- Rows with the `Central` partition have their leaseholder in the `Central` datacenter.
- Rows with the `East` partition have their leaseholder in the `East` datacenter.
- Replicas are evenly distributed among the three datacenters.
- The cluster can survive a single datacenter failure.
- Reads respond in a few milliseconds.
- Writes respond in 60ms.
- Latency between datacenters is symmetrical.
- `West` `App` servers connect to the `West` CockroachDB nodes.
- `Central` `App` servers connect to the `Central` CockroachDB nodes.
- `East` `App` servers connect to the `East` CockroachDB nodes.
- Abbreviated startup flag for each datacenter:

  ```
  --loc=Region=East
  --loc=Region=Central
  --loc=Region=West
  ```
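Assuming a table with a `region` column (the table and column names here are hypothetical), the row-level partitioning and leaseholder placement described above could be sketched as follows. Note this is an illustrative sketch, not a complete setup:

```shell
# Hypothetical sketch: partition a table by a region column, then prefer
# each partition's leaseholder in the matching datacenter while leaving
# the replicas spread across the three datacenters.
cockroach sql --insecure -e "
ALTER TABLE users PARTITION BY LIST (region) (
  PARTITION west VALUES IN ('west'),
  PARTITION central VALUES IN ('central'),
  PARTITION east VALUES IN ('east')
);
ALTER PARTITION west OF TABLE users CONFIGURE ZONE USING
  num_replicas = 3, lease_preferences = '[[+Region=West]]';
ALTER PARTITION central OF TABLE users CONFIGURE ZONE USING
  num_replicas = 3, lease_preferences = '[[+Region=Central]]';
ALTER PARTITION east OF TABLE users CONFIGURE ZONE USING
  num_replicas = 3, lease_preferences = '[[+Region=East]]';"
```

The constraint keys (e.g., `Region`) must match the locality tier keys used in the startup flags.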
During normal operations:
```
 App              App
   \              /
   West ---60ms--- East1
     \             |
      \           5ms
      60ms         |
        \_______East2
```
- Rows with the `West` partition will have their leaseholder in the `West` datacenter.
- Rows with the `East` partition will have their leaseholder in the `East1` datacenter.
- The 3 replicas will be evenly distributed among the three datacenters.
- The cluster can survive a single datacenter failure.
- Readers can expect a response time of a couple of milliseconds.
- `East` writers can expect a 5ms response time.
- `West` writers can expect a 60ms response time.
- `West` `App` servers connect to the `West` CockroachDB nodes.
- `East` `App` servers connect to the `East1` CockroachDB nodes.
- Abbreviated startup flag for each datacenter:

  ```
  --loc=Region=East,DC=1
  --loc=Region=East,DC=2
  --loc=Region=West
  ```
During normal operations:
```
 App                 App
   \                 /
  West1 ---60ms--- East1
    |    \       /   |
   5ms    60ms      5ms
    |    /       \   |
  West2 ---60ms--- East2
```
- Rows with the `West` partition will have their leaseholder in the `West1` datacenter.
- Rows with the `East` partition will have their leaseholder in the `East1` datacenter.
- West partitions will have 1 replica each in `West1` and `West2`, and 1 replica in `East1` or `East2`.
- East partitions will have 1 replica each in `East1` and `East2`, and 1 replica in `West1` or `West2`.
- The cluster can survive a single datacenter failure.
- Readers can expect a response time of a couple of milliseconds.
- Writers can expect a 5ms response time.
- `West` `App` servers connect to the `West` CockroachDB nodes.
- `East` `App` servers connect to the `East` CockroachDB nodes.
- Abbreviated startup flag for each datacenter:

  ```
  --loc=Region=West,DC=1
  --loc=Region=West,DC=2
  --loc=Region=East,DC=1
  --loc=Region=East,DC=2
  ```
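The replica placement above (two local replicas plus one remote) could be sketched with per-replica zone constraints. As before, the table name is hypothetical and this is only an illustrative sketch:

```shell
# Hypothetical sketch: keep 2 of the 3 replicas for each partition in its
# home region (the cluster spreads them across that region's two DCs),
# with the leaseholder preferred in the home region as well.
cockroach sql --insecure -e "
ALTER PARTITION west OF TABLE users CONFIGURE ZONE USING
  num_replicas = 3,
  constraints = '{+Region=West: 2}',
  lease_preferences = '[[+Region=West]]';
ALTER PARTITION east OF TABLE users CONFIGURE ZONE USING
  num_replicas = 3,
  constraints = '{+Region=East: 2}',
  lease_preferences = '[[+Region=East]]';"
```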