---
title: Cluster Topology Patterns
summary: Common cluster topology patterns with setup examples and performance considerations.
toc: true
---
This page covers common cluster topology patterns with setup examples, as well as the benefits and trade-offs of each pattern. Before you select a candidate pattern for your cluster, use the following broad patterns as a starting point and weigh their trade-offs.
When selecting a pattern for your cluster, take the following into consideration:

- The function of a CockroachDB leaseholder
- The impact of the leaseholder on read and write activity
- Whether the leaseholders are local to readers and writers within the datacenter
- The `--locality` flag must be set properly on each node to enable follow-the-workload
- Leaseholder migration among the datacenters can be minimized by using partitioning, an Enterprise feature
- Whether the application is designed to use the partitioning feature
{{site.data.alerts.callout_info}} This page does not factor in hardware differences. {{site.data.alerts.end}}
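As an illustration of the `--locality` point above, a node's startup command might look like the following. The hostnames, region, and datacenter names here are hypothetical, not prescribed values:

```shell
# Hypothetical example: a node declares its locality at startup so that
# follow-the-workload and partitioning can place leaseholders sensibly.
cockroach start \
  --locality=region=us-east,datacenter=us-east-1 \
  --store=node1 \
  --listen-addr=node1.example.com:26257 \
  --join=node1.example.com:26257,node2.example.com:26257,node3.example.com:26257
```

The locality tiers are ordered from most to least inclusive (e.g., region, then datacenter), and every node should use the same tier keys.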
This first example is of a single datacenter cluster, i.e., a local deployment. This pattern is a common starting point for smaller organizations that may not have the resources (or need) to worry about a datacenter failure, but still want to take advantage of CockroachDB's high availability. The cluster is self-hosted, with each node on a different machine within the same datacenter. The network latency among the nodes is expected to be uniform, around 1ms.
For the diagram above:
Configuration
- `App` is an application that accesses CockroachDB
- `Load Balancer` is a software-based load balancer
- The 3 nodes are all running in a single datacenter
- All CockroachDB nodes communicate with each other
- The cluster uses the default replication factor of 3 (represented by `r1`, `r2`, `r3`)
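A deployment like this might be started as follows. This is an illustrative sketch using an insecure local cluster with arbitrary ports; in the diagram's deployment, each node would run on its own machine:

```shell
# Illustrative 3-node cluster (insecure mode, for demonstration only).
# Each node lists the same --join addresses so they form one cluster.
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 \
  --http-addr=localhost:8080 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 \
  --http-addr=localhost:8081 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node3 --listen-addr=localhost:26259 \
  --http-addr=localhost:8082 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background

# One-time initialization of the new cluster.
cockroach init --insecure --host=localhost:26257
```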
Availability expectations
- The cluster can survive 1 node failure because a majority of replicas (2/3) remains available. It will not survive a datacenter failure.
Performance expectations
- The network latency among the nodes is expected to be the same, sub-millisecond.
While the basic local deployment takes advantage of CockroachDB's high availability, shares the load, and spreads capacity, scaling out from 3 nodes to 4 or 5 nodes has additional benefits:

- There is more room to increase the replication factor, which increases resiliency against the failure of more than one node.
- Because there are more nodes, you can increase throughput, add storage, etc.
There are no constraints on node increments.
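For example, after scaling out to 5 nodes, the replication factor could be raised so the cluster tolerates two simultaneous node failures. This is a sketch against a hypothetical local cluster; the host address is illustrative:

```shell
# Raise the default replication factor from 3 to 5 via a zone configuration.
# With 5 replicas, a majority (3/5) survives the loss of any 2 nodes.
cockroach sql --insecure --host=localhost:26257 \
  -e "ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;"
```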
Once an organization begins to grow, a datacenter outage isn't acceptable and a cluster needs to be available all of the time. This is where a single-region cluster with multiple datacenters is useful. For example, an organization can do a cloud deployment across multiple datacenters within the same geographical region.
For the diagram above:
Configuration
- `App` is an application that accesses CockroachDB
- `Load Balancer` is a software-based load balancer
- The 3 nodes are each in a different datacenter, all located in the `us-east` region
- All CockroachDB nodes communicate with each other
- The cluster uses the default replication factor of 3 (represented by `r1`, `r2`, `r3`)
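For replicas to be diversified across the three datacenters, each node needs to declare which datacenter it is in. A sketch of the relevant startup flags, with hypothetical datacenter names:

```shell
# Illustrative: one node per datacenter, all in the us-east region.
# With distinct datacenter tiers, CockroachDB spreads the 3 replicas
# of each range across the 3 datacenters.
cockroach start --locality=region=us-east,datacenter=us-east-1 ...  # datacenter 1
cockroach start --locality=region=us-east,datacenter=us-east-2 ...  # datacenter 2
cockroach start --locality=region=us-east,datacenter=us-east-3 ...  # datacenter 3
```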
Availability expectations
- The cluster can withstand a datacenter failure.
Performance expectations
- The network latency among the nodes is expected to be the same, sub-millisecond.
For even more resiliency, use a multi-region cluster. A multi-region cluster comprises multiple datacenters in different regions (e.g., `East`, `West`), each with multiple nodes. CockroachDB automatically tries to diversify replica placement across localities (i.e., place a replica in each region). With this setup, many organizations also transition to using different cloud providers (one provider per region).
In this example, the cluster has an asymmetrical setup, where `Central` is closer to `West` than to `East`. This configuration provides better write latency for write workloads in `West` and `Central` because of the lower latency between them (versus writing in `East`). This assumes you are not using zone configurations.
For this example:
Configuration
- Nodes are spread across 3 regions within a country (`West`, `East`, `Central`)
- A software-based load balancer directs traffic to any of the regions' nodes at random
- Every region has 3 datacenters
- All CockroachDB nodes communicate with each other
- Similar to the local topology, more regions can be added dynamically
- A homogenous configuration among the regions for simplified operations is recommended
- For sophisticated workloads, each region can have a different node count and node specification. This heterogeneous configuration can better handle region-specific concurrency and load characteristics.
When locality is enabled, the load balancer should be set up to prefer the database nodes within the same locality as the app servers:
- The `West` app servers should connect to the `West` CockroachDB servers
- The `Central` app servers should connect to the `Central` CockroachDB servers
- The `East` app servers should connect to the `East` CockroachDB servers
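One way to express this routing is one load balancer configuration per region, listing only that region's nodes. A hypothetical HAProxy fragment for the `West` region (hostnames are illustrative):

```shell
# Hypothetical haproxy.cfg fragment for the West region's load balancer:
# West app servers connect here and are routed only to West nodes.
cat > haproxy-west.cfg <<'EOF'
listen cockroach-west
    bind :26257
    mode tcp
    balance roundrobin
    server west-node1 west-node1.example.com:26257 check
    server west-node2 west-node2.example.com:26257 check
    server west-node3 west-node3.example.com:26257 check
EOF
```

Failover to other localities when all local nodes are down would be handled at the app or GSLB layer, per the availability expectations below.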
Availability expectations
- If all of the nodes in a preferred locality are down, the app will try databases in other localities.
- The cluster can withstand a datacenter failure.
- In general, a multi-region deployment can help protect against natural disasters.
Performance expectations
- The latency numbers (e.g., `60ms`) in the first diagram represent the network round-trip from one datacenter to another.
- Follow-the-workload keeps performance fast where the load is, so you do not pay cross-country latency on reads.
- Write latencies will not be faster than the slowest round-trip required to achieve quorum between two regions.
While the basic pattern for a multi-region cluster can help protect against regional failures, there will be high latency due to cross-country roundtrips. This is not ideal for organizations who have users spread out across the country. For any multi-region cluster, partitioning should be used to keep data close to the users who access it.
This setup uses a modern multi-tier architecture, which is simplified to global server load balancer (`GSLB`), `App`, and `Load Balancer` layers in the diagram below:
Configuration
- Nodes are spread across 3 regions within a country (`West`, `East`, `Central`)
- A client connects to a geographically close app server via `GSLB`
- Inside each region, an app server connects to one of the CockroachDB nodes within its geography through a software-based load balancer
- Every region has 3 datacenters
- All CockroachDB nodes communicate with each other
- Tables are partitioned at the row level by locality:
  - Rows in the `West` partition have their leaseholder in the `West` datacenter
  - Rows in the `Central` partition have their leaseholder in the `Central` datacenter
  - Rows in the `East` partition have their leaseholder in the `East` datacenter
- Replicas are evenly distributed among the three datacenters
- Abbreviated startup flag for each datacenter: `--loc=Region=East`, `--loc=Region=Central`, `--loc=Region=West`
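Row-level partitioning like this can be sketched as follows. The table, column, and locality names here are hypothetical, and partitioning requires an Enterprise license; the partition column must also be a prefix of the table's primary key:

```shell
# Hypothetical schema: partition a "users" table by its "region" column,
# then pin each partition's replicas (and hence its leaseholder, via
# follow-the-workload) to the matching locality.
cockroach sql --insecure --host=localhost:26257 -e "
  ALTER TABLE users PARTITION BY LIST (region) (
    PARTITION west VALUES IN ('west'),
    PARTITION central VALUES IN ('central'),
    PARTITION east VALUES IN ('east')
  );
  ALTER PARTITION west OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=West]';
  ALTER PARTITION central OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=Central]';
  ALTER PARTITION east OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=East]';
"
```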
Availability expectations
- Can survive a single datacenter failure, since a majority of the replicas will remain available.
Performance expectations
- Reads respond in a few milliseconds.
- Writes respond in 60ms.
- Symmetrical latency between datacenters.
Application expectations
- `West` `App` servers connect to the `West` CockroachDB nodes.
- `Central` `App` servers connect to the `Central` CockroachDB nodes.
- `East` `App` servers connect to the `East` CockroachDB nodes.