---
title: Cluster Topology Patterns
summary: Common cluster topology patterns with setup examples and performance considerations.
toc: true
---

This page covers common cluster topology patterns, with setup examples and the benefits and trade-offs of each pattern. Use these broad patterns as a starting point when selecting a topology for your cluster, and weigh their trade-offs against your requirements.

## Considerations

When selecting a pattern for your cluster, consider the following:

- The function of the CockroachDB leaseholder
- The impact of leaseholder placement on read and write activity
- Whether leaseholders are local to the readers and writers within each datacenter
- That the --locality flag must be set properly on each node to enable follow-the-workload (see the sketch after this list)
- That leaseholder migration among datacenters can be minimized by using partitioning, an Enterprise feature
- Whether the application is designed to use partitioning
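For example, here is a minimal sketch of setting `--locality` at node startup. The hostnames and tier values are illustrative, not prescriptive:

```shell
# Tag the node with its location; the keys are arbitrary, but they must
# be ordered from most to least inclusive and be consistent across nodes.
cockroach start \
  --certs-dir=certs \
  --locality=region=us-east,datacenter=us-east-1 \
  --join=node1.example.com,node2.example.com,node3.example.com
```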

{{site.data.alerts.callout_info}} This page does not factor in hardware differences. {{site.data.alerts.end}}

## Single datacenter clusters

### Basic pattern for a single datacenter cluster

This first example is a single datacenter cluster, i.e., a local deployment. This pattern is a common starting point for smaller organizations that may not have the resources (or the need) to worry about a datacenter failure, but still want to take advantage of CockroachDB's high availability. The cluster is self-hosted, with each node on a different machine within the same datacenter.

Local deployment

For the diagram above:

#### Configuration

- App is an application that accesses CockroachDB
- Load Balancer is a software-based load balancer
- The 3 nodes are all running in a single datacenter
- All CockroachDB nodes communicate with each other
- The cluster is using the default replication factor of 3 (represented by r1, r2, r3)

#### Availability expectations

- The cluster can survive 1 node failure because a majority of replicas (2/3) remains available. It will not survive a datacenter failure.

#### Performance expectations

- The network latency among the nodes is expected to be the same: sub-millisecond.
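As a sketch, a basic 3-node local cluster like this could be brought up as follows; the hostnames are illustrative:

```shell
# Run on each machine, substituting its own --advertise-addr; the --join
# list is identical on every node so that they form one cluster.
cockroach start \
  --certs-dir=certs \
  --advertise-addr=node1.example.com \
  --join=node1.example.com,node2.example.com,node3.example.com

# Then initialize the cluster once, from any one machine.
cockroach init --certs-dir=certs --host=node1.example.com
```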

### More resilient local deployment

While the basic local deployment takes advantage of CockroachDB's high availability, shares the load, and spreads capacity, scaling the cluster out from 3 nodes to 4 or 5 has additional benefits:

- There is more room to increase the replication factor, which increases resiliency against the failure of more than one node (see the sketch below).
- Because there are more nodes, you can increase throughput, add storage, and so on.

There are no constraints on node increments.
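As a sketch, once the cluster has 5 nodes, the replication factor could be raised from the default 3 to 5 through a zone configuration:

```shell
# Keep 5 replicas of every range in the default zone, so the cluster
# can survive 2 simultaneous node failures instead of 1.
cockroach sql --certs-dir=certs --host=node1.example.com \
  --execute="ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;"
```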

Resilient local deployment

## Single-region clusters

### Single-region, multiple datacenters cluster

Once an organization begins to grow, a datacenter outage is no longer acceptable: the cluster needs to be available all of the time. This is where a single-region cluster with multiple datacenters is useful. For example, an organization can deploy in the cloud across multiple datacenters within the same geographic region.

Single region multiple datacenters

For the diagram above:

#### Configuration

- App is an application that accesses CockroachDB
- Load Balancer is a software-based load balancer
- The 3 nodes are each in a different datacenter, all located in the us-east region
- All CockroachDB nodes communicate with each other
- The cluster is using the default replication factor of 3 (represented by r1, r2, r3)

#### Availability expectations

- The cluster can withstand a datacenter failure, since a majority of replicas (2/3) remains available in the other two datacenters.

#### Performance expectations

- The network latency among the nodes is expected to be the same: sub-millisecond.
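As a sketch, the per-datacenter `--locality` flags for this pattern might look like the following; the tier values and hostnames are illustrative:

```shell
# Each node advertises its datacenter so CockroachDB spreads the
# 3 replicas across the 3 datacenters (one per datacenter).
cockroach start --certs-dir=certs \
  --locality=region=us-east,datacenter=us-east-1 \
  --join=node1.example.com,node2.example.com,node3.example.com
# Repeat on the other nodes with datacenter=us-east-2 and
# datacenter=us-east-3.
```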

## Multi-region clusters

### Basic pattern for a multi-region cluster

For even more resiliency, use a multi-region cluster. A multi-region cluster comprises multiple datacenters in different regions (e.g., East, West), each with multiple nodes. CockroachDB automatically tries to diversify replica placement across localities (i.e., place a replica in each region). With this setup, many organizations also transition to using different cloud providers (one provider per region).

In this example, the cluster has an asymmetrical setup, where Central is closer to the West than to the East. This configuration provides better write latency for write workloads in the West and Central, because the quorum latency between those two regions is lower than it is when writing to the East. This assumes you are not using zone configurations.

Basic pattern for multi-region

For this example:

#### Configuration

- Nodes are spread across 3 regions within a country (West, East, Central)
- A software-based load balancer directs traffic to any of the regions' nodes at random
- Every region has 3 datacenters
- All CockroachDB nodes communicate with each other
- Similar to the local topology, more regions can be added dynamically
- A homogeneous configuration across regions is recommended for simplified operations
- For sophisticated workloads, each region can have a different node count and node specification. Such a heterogeneous configuration can better handle region-specific concurrency and load characteristics.

When locality is enabled, the load balancer should be set up to prefer the database nodes within the same locality as the app servers:

- The West app servers should connect to the West CockroachDB servers
- The Central app servers should connect to the Central CockroachDB servers
- The East app servers should connect to the East CockroachDB servers
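One way to bootstrap such a load balancer setup is CockroachDB's built-in HAProxy config generator; you would then run one HAProxy instance per region and trim its server list to that region's nodes. The hostname below is illustrative:

```shell
# Generate a starting haproxy.cfg from the cluster's view of its nodes;
# edit the generated server list down to the local region before use.
cockroach gen haproxy --certs-dir=certs --host=west-node1.example.com
```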

#### Availability expectations

If all of the nodes in the preferred locality are down, the app will try databases in other localities. Because replicas are diversified across regions, the cluster can withstand a datacenter failure, and in general a multi-region cluster helps protect against natural disasters.

#### Performance expectations

- The latency numbers in the first diagram (e.g., 60ms) represent the network round-trip from one datacenter to another.
- Follow-the-workload keeps performance fast where the load is, so you do not pay cross-country latency on reads.
- Write latencies can be no faster than the round-trip to the nearest region that completes a quorum, as the worked example below shows.
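To make that bound concrete with illustrative numbers: suppose the West–Central round trip is 20ms and the West–East round trip is 60ms. A write to a range whose leaseholder is in the West commits once a majority of its 3 replicas acknowledge, i.e., the leaseholder plus the nearest other replica, so it completes in roughly 20ms (West–Central) rather than 60ms (West–East).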

### More performant multi-region cluster

While the basic multi-region pattern can help protect against regional failures, cross-country round trips impose high latency. This is not ideal for organizations whose users are spread out across the country. For any multi-region cluster, partitioning should be used to keep data close to the users who access it.

Multi-region partition

This setup uses a modern multi-tier architecture, simplified to global server load balancer (GSLB), App, and Load Balancer layers in the diagram below:

Multi-tier architecture

#### Configuration

- Nodes are spread across 3 regions within a country (West, East, Central)
- A client connects to a geographically close app server via the GSLB
- Inside each region, app servers connect to one of the CockroachDB nodes within their region through a software-based load balancer
- Every region has 3 datacenters
- All CockroachDB nodes communicate with each other
- Tables are partitioned at the row level by locality (see the SQL sketch after this list):
  - Rows in the West partition have their leaseholder in the West datacenter
  - Rows in the Central partition have their leaseholder in the Central datacenter
  - Rows in the East partition have their leaseholder in the East datacenter
- Replicas are evenly distributed among the three datacenters
- Abbreviated --locality startup flag for the nodes in each region:

      --locality=region=west
      --locality=region=central
      --locality=region=east

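A sketch of the row-level partitioning described above, using an illustrative `users` table. Partitioning and lease preferences are Enterprise features, and the table, column, and partition names here are assumptions for the example:

```shell
# Partition the table by a region column, then pin each partition's
# replicas and leaseholder to its home region (west shown; repeat the
# ALTER PARTITION statement for central and east).
cockroach sql --certs-dir=certs --host=west-node1.example.com --execute="
CREATE TABLE users (
    region STRING NOT NULL,
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    name STRING,
    PRIMARY KEY (region, id)
) PARTITION BY LIST (region) (
    PARTITION west VALUES IN ('west'),
    PARTITION central VALUES IN ('central'),
    PARTITION east VALUES IN ('east')
);
ALTER PARTITION west OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=west]',
    lease_preferences = '[[+region=west]]';
"
```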

#### Availability expectations

- The cluster can survive a single datacenter failure, since a majority of the replicas remains available.

#### Performance expectations

- Reads respond in a few milliseconds.
- Writes respond in 60ms.
- Latency between datacenters is symmetrical.

#### Application expectations

- West App servers connect to the West CockroachDB nodes.
- Central App servers connect to the Central CockroachDB nodes.
- East App servers connect to the East CockroachDB nodes.
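For example, a West app server's connection (hostname and flags are illustrative) would target the West load balancer rather than a remote one:

```shell
# Connect through the region-local load balancer, keeping reads and
# writes on West-partitioned rows within the region.
cockroach sql --certs-dir=certs --host=west-lb.example.com --port=26257
```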

## Anti-patterns

In contrast to the patterns above, avoid the following:

- Deploying across only 2 datacenters: with the default replication factor of 3, one datacenter must hold 2 replicas, so losing that datacenter loses a majority of replicas.
- Leaving the --locality flag unset in a multi-datacenter or multi-region cluster: without it, CockroachDB cannot diversify replica placement across localities or use follow-the-workload.
- Skipping partitioning in a multi-region cluster with geographically distributed users: reads and writes then pay cross-country latency.