---
title: Cluster Topology Patterns
summary: Common cluster topology patterns with setup examples and performance considerations.
toc: true
---

This page covers common cluster topology patterns, along with setup examples and the benefits and trade-offs of each pattern. Use the following broad patterns as a starting point when selecting a topology for your cluster, and weigh their trade-offs against your requirements.

Considerations

Before selecting a pattern:

  • Review the recommendations and requirements in our Production Checklist.
  • Review the CockroachDB architecture. It's especially important to understand how data is stored in ranges, how ranges are replicated, and how one replica in each range serves as the "leaseholder" that coordinates all read and write requests for that range. See our Performance Tuning tutorial for more details about these important concepts and for simple read and write examples.
  • Learn about the concept of locality, which makes CockroachDB aware of the location of nodes and able to intelligently balance replicas across localities. Locality is also a prerequisite for the follow-the-workload feature and for enterprise partitioning.
  • Learn about follower reads, an enterprise feature that reduces latency for read queries by letting the closest replica serve the read request, at the expense of reading data that is not guaranteed to be up to date. (A minimal example follows this list.)
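
A minimal sketch of a follower read, assuming a hypothetical orders table and an insecure cluster (in CockroachDB 19.1 the timestamp function is named experimental_follower_read_timestamp(); later versions rename it to follower_read_timestamp()):

    # Sketch: request a follower read against a hypothetical "orders" table.
    # The closest replica serves the read, so the data may be slightly stale.
    cockroach sql --insecure --host=<address-of-any-node> \
      --execute="SELECT id, status FROM orders AS OF SYSTEM TIME experimental_follower_read_timestamp();"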

{{site.data.alerts.callout_info}} This page does not factor in hardware differences. {{site.data.alerts.end}}

Single-region clusters

Single datacenter, basic pattern

This first example is a single-datacenter cluster, with each node on a different machine, per our basic topology recommendations. This pattern is a common starting point for smaller organizations that may not have the resources (or the need) to worry about a datacenter failure but still want to take advantage of CockroachDB's high availability.

Local deployment

For the diagram above:

Configuration

  • App is an application that accesses CockroachDB.
  • Load Balancer is a software-based load balancer.
  • Leaseholders are denoted by a dashed line.
  • The 3 nodes are all running in a single datacenter.
  • The cluster is using the default replication factor of 3 (represented by 3 blocks of the same color). Each range (e.g., r1) has 3 replicas, with each replica on a different node.
  • All CockroachDB nodes communicate with each other. (A sample startup sequence for this topology follows this list.)
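
As a rough sketch of how this topology is started (insecure mode, with hypothetical hostnames node1, node2, and node3), each machine runs one cockroach start command and the cluster is then initialized once:

    # Sketch: on each machine, start a node and point it at the same join list.
    # Run this on node1, then repeat on node2 and node3, changing only
    # --advertise-addr (on older versions this flag is --advertise-host).
    cockroach start --insecure \
      --store=/mnt/cockroach-data \
      --advertise-addr=node1 \
      --join=node1,node2,node3

    # From any machine, initialize the cluster exactly once:
    cockroach init --insecure --host=node1

For the software-based load balancer in front of the nodes, cockroach gen haproxy can generate a starting HAProxy configuration.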

Availability expectations

  • With the default replication factor of 3, the cluster can tolerate 1 node failure. In such a case, all ranges still have 2 replicas on live nodes and, thus, a majority.

Performance expectations

  • The network latency among the nodes is expected to be sub-millisecond.

Single datacenter, more performant and/or resilient

While the basic single-datacenter deployment takes advantage of CockroachDB's high availability, shares the load, and spreads capacity across nodes, scaling out the cluster provides further benefits:

  • Performance: Adding nodes for more processing power and/or storage typically increases throughput. For example, with 5 nodes and a replication factor of 3, each range still has 3 replicas, with each replica on a different node, so each node holds fewer replicas overall, leaving additional storage and bandwidth available.
  • Resiliency: There will be more room to increase the replication factor, which increases resiliency against the failure of more than one node. For example, with 5 nodes and a replication factor of 5, each range has 5 replicas, with each replica on a different node. In this case, even with 2 nodes down, each range retains a majority of its replicas (3/5).

You can add nodes in any increment; there are no constraints on node increments. A sketch of raising the replication factor after scaling out follows.
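
For example, after scaling from 3 to 5 nodes, the replication factor for all data can be raised to 5 with a zone configuration change along these lines (a sketch; the connection details are hypothetical):

    # Sketch: raise the replication factor for the default zone from 3 to 5.
    # This applies to all data that has no more specific zone configuration.
    cockroach sql --insecure --host=<address-of-any-node> \
      --execute="ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;"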

Resilient local deployment

Multi-region clusters

Multiple regions, basic pattern

Once an organization begins to grow, a datacenter outage is no longer acceptable and the cluster needs to be available all of the time. This is where a multi-region cluster is useful. A multi-region cluster is composed of multiple datacenters in different regions (e.g., us-east, us-west), each with multiple nodes. CockroachDB automatically tries to diversify replica placement across localities (i.e., place a replica in each region). This setup can be used when your application is not SLA-sensitive or when write performance is not a primary concern. With this cluster pattern, many organizations also consider using multiple cloud providers (one provider per region).

In this example, the cluster has an asymmetrical setup, where Central is closer to the West than to the East. This configuration provides better write latency for write workloads in the West and Central, because the round-trip between those two regions is shorter than the round-trip to the East.

Basic pattern for multi-region

Each region has 2 nodes across 2 datacenters, and the cluster does not use partitioning:

Basic pattern for multi-region

For this example:

Configuration

  • App is an application that accesses CockroachDB.

  • Load Balancers are software-based load balancers that direct traffic to each of the regions' nodes at random.

  • Leaseholders are denoted by a dashed line.

  • 6 nodes are spread across 3 regions (us-west, us-central, us-east) within a country (us).

  • Every region has 2 nodes across 2 datacenters (e.g., us-west-a, us-west-b). Note that most cloud providers only have 2 datacenters per region. Each node is started with the --locality flag to identify which region it is in:

    --locality=region=us-west,datacenter=us-west-a
    --locality=region=us-west,datacenter=us-west-b
    --locality=region=us-central,datacenter=us-central-a
    --locality=region=us-central,datacenter=us-central-b
    --locality=region=us-east,datacenter=us-east-a
    --locality=region=us-east,datacenter=us-east-b
    
  • The cluster is using a replication factor of 5 (represented by 5 blocks of the same color). Each range (e.g., r1) has 5 replicas, with each replica on a different node.

  • All CockroachDB nodes communicate with each other.

  • Similar to the single-datacenter topology, more regions can be added dynamically. (Sample startup commands that use the --locality flags above follow this list.)
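
Putting these flags together, a sketch of the startup command for the node in us-west-a looks like the following (insecure mode, hypothetical hostnames); the other 5 nodes differ only in their --locality and --advertise-addr values:

    # Sketch: start the node in datacenter us-west-a. The other 5 nodes use the
    # same --join list but their own --locality and --advertise-addr values.
    cockroach start --insecure \
      --store=/mnt/cockroach-data \
      --locality=region=us-west,datacenter=us-west-a \
      --advertise-addr=node-west-a \
      --join=node-west-a,node-central-a,node-east-a

    # Because this pattern uses a replication factor of 5 rather than the
    # default of 3, raise it once the cluster is initialized:
    cockroach sql --insecure --host=node-west-a \
      --execute="ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;"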

Availability expectations

  • If all of the nodes for a preferred locality are down, then the app will try datacenters in other localities.
  • The cluster can withstand a datacenter failure without losing a region because there are 2 nodes in each region.
  • The cluster can withstand a regional failure because even with 2 nodes down, each range retains a majority of its replicas (3/5). In general, a multi-region deployment helps protect against natural disasters.

Performance expectations

  • The latency numbers (e.g., 60ms) in the first diagram represent network round-trip from one datacenter to another.
  • Follow-the-workload will reduce read latency by moving leaseholders toward the region with the most active workload.
  • Write latency is determined by the time needed to reach a quorum, which in this setup means a round-trip to at least one other region.

Multiple regions, more performant (with partitioning)

While the basic pattern for a multi-region cluster can help protect against datacenter and regional failures, it suffers from high latency due to cross-country round-trips. This is not ideal for organizations that have users spread out across the country (or the world). For any multi-region cluster, partitioning should be used to keep data close to the users who access it.

In this example, a table is partitioned by a column indicating the region where a customer is located (e.g., the table has a city column, and the values LA, SF, and SD are partitioned to the us-west region). Zone configurations are then used to keep the replicas and leaseholders for each partition in the datacenter closest to those customers.
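
As a sketch of that partitioning step (the customers table is hypothetical, and only the LA, SF, and SD values come from the example above; the other city codes are placeholders), the table is partitioned by list on the city column, which must be a prefix of the table's primary key:

    # Sketch (enterprise feature): partition a hypothetical "customers" table
    # by city so that West Coast rows form the us_west partition.
    cockroach sql --insecure --host=<address-of-any-node> --execute="
      ALTER TABLE customers PARTITION BY LIST (city) (
        PARTITION us_west    VALUES IN ('LA', 'SF', 'SD'),
        PARTITION us_central VALUES IN ('CHI', 'MSP'),
        PARTITION us_east    VALUES IN ('NYC', 'BOS'),
        PARTITION DEFAULT    VALUES IN (default)
      );"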

This setup uses a modern multi-tier architecture, which is simplified to global server load balancer (GSLB), App, and Load Balancer layers in the diagram below:

Partitioned multi-region

Configuration

A multi-region cluster with partitioning has a similar setup as the basic multi-region pattern:

  • 9 nodes are spread across 3 regions (us-west, us-central, us-east) within a country (us).

  • A client connects to a geographically close app server via the GSLB.

  • Inside each region, an app server connects to one of the CockroachDB nodes within the region through a software-based load balancer.

  • Every region has 3 nodes across 2 datacenters (e.g., us-west-a, us-west-b). Note that most cloud providers only have 2 datacenters per region. Each node is started with the --locality flag to identify which region it is in:

    --locality=region=us-west,datacenter=us-west-a
    --locality=region=us-west,datacenter=us-west-b
    --locality=region=us-central,datacenter=us-central-a
    --locality=region=us-central,datacenter=us-central-b
    --locality=region=us-east,datacenter=us-east-a
    --locality=region=us-east,datacenter=us-east-b
    
  • The cluster is using a replication factor of 3 (represented by the 3 blocks of the same color). Each range (e.g., r1) has a prefix (w- for West, c- for Central, e- for East), which denotes the partition that is replicated.

  • Leaseholders are denoted by a dashed line. Using zone configurations, leaseholders can be pinned (represented by the x) to a datacenter close to the users.

  • All CockroachDB nodes communicate with each other.

However, to make the cluster more performant, you need to add partitioning (an enterprise-only feature). In this example (a sample zone configuration follows the list):

  • Tables are partitioned at the row level by locality.
  • Partition replicas are distributed among the 3 nodes within each region.
  • Rows with the region=us-west partition have their leaseholder constrained to a us-west-b datacenter.
  • Rows with the region=us-central partition have their leaseholder constrained to a us-central-a datacenter.
  • Rows with the region=us-east partition have their leaseholder constrained to a us-east-b datacenter.
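
A sketch of those zone configurations, reusing the hypothetical customers table from the earlier partitioning sketch:

    # Sketch: keep each partition's replicas in its home region and pin the
    # leaseholder to the datacenter listed above.
    cockroach sql --insecure --host=<address-of-any-node> --execute="
      ALTER PARTITION us_west OF TABLE customers CONFIGURE ZONE USING
        num_replicas = 3,
        constraints = '[+region=us-west]',
        lease_preferences = '[[+datacenter=us-west-b]]';

      ALTER PARTITION us_central OF TABLE customers CONFIGURE ZONE USING
        num_replicas = 3,
        constraints = '[+region=us-central]',
        lease_preferences = '[[+datacenter=us-central-a]]';

      ALTER PARTITION us_east OF TABLE customers CONFIGURE ZONE USING
        num_replicas = 3,
        constraints = '[+region=us-east]',
        lease_preferences = '[[+datacenter=us-east-b]]';"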

Availability expectations

  • If a datacenter with 1 replica is lost, the cluster will not lose a region because there is a majority of replicas (2/3) in the region's other datacenter.
  • If a datacenter with 2 replicas is lost, the cluster will lose that region (i.e., its data is unavailable) because only 1 replica remains in the region's other datacenter. For more information, see the Locality-resilience tradeoff section of the Define Table Partitions doc. (A query for checking replica placement follows this list.)
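
To check which nodes hold the replicas and leaseholder for each partition, and therefore what a given datacenter failure would take away, a query along these lines can be used (the customers table is hypothetical; on versions before CockroachDB 19.1 the statement is SHOW EXPERIMENTAL_RANGES):

    # Sketch: inspect replica and leaseholder placement for the table.
    # Cross-reference the node IDs in the output with each node's --locality
    # values to confirm that every partition lives in its home region.
    cockroach sql --insecure --host=<address-of-any-node> \
      --execute="SHOW RANGES FROM TABLE customers;"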

Performance expectations

  • Reads respond in 2-4 milliseconds.
  • Writes respond in 2-4 milliseconds.
  • Symmetrical latency between datacenters.

Anti-patterns

Anti-patterns are commonly used patterns that are ineffective or risky. Consider the following when choosing a cluster pattern:

  • Do not deploy to 2 datacenters. A cluster across 2 datacenters is not protected against datacenter failure and can lead to a split-brain scenario. To get the resiliency benefits of CockroachDB, it is best practice to deploy your cluster across 3 or more datacenters.
  • Do not deploy to regions with high network latency (e.g., us-west, asia, and europe) without using partitioning.
  • The cluster's replication factor does not need to be the same as the number of nodes in the cluster. In fact, as you scale your cluster, you should add nodes (while keeping the replication factor at 5, for example) to improve performance. This is shown in the Single datacenter, more performant and/or resilient section.