From 759ddae6127049231ec270d51cb655caa4109a22 Mon Sep 17 00:00:00 2001 From: Jesse Seldess Date: Thu, 1 Sep 2016 14:11:05 -0400 Subject: [PATCH 1/3] zone config updates --- beta-20160602.md | 2 +- configure-replication-zones.md | 317 +++++++++++++++++++++-------- recommended-production-settings.md | 12 +- start-a-node.md | 2 +- 4 files changed, 245 insertions(+), 88 deletions(-) diff --git a/beta-20160602.md b/beta-20160602.md index 1b1df46b9ff..7994db8c052 100644 --- a/beta-20160602.md +++ b/beta-20160602.md @@ -48,7 +48,7 @@ Get future release notes emailed to you: - Added a tutorial on [building an Application with CockroachDB and SQLAlchemy](https://www.cockroachlabs.com/blog/building-application-cockroachdb-sqlalchemy-2/) - Added docs on [how CockroachDB handles `NULL` values](null-handling.html) in various contexts. [#333](https://github.com/cockroachdb/docs/pull/333) - Improved guidance on [Contributing to CockroachDB docs](https://github.com/cockroachdb/docs/blob/gh-pages/CONTRIBUTING.md). [#344](https://github.com/cockroachdb/docs/pull/344) -- Improved [zone configuration examples](configure-replication-zones.html#examples). [#327](https://github.com/cockroachdb/docs/pull/327) +- Improved [zone configuration examples](configure-replication-zones.html#basic-examples). [#327](https://github.com/cockroachdb/docs/pull/327) ### Contributors diff --git a/configure-replication-zones.md b/configure-replication-zones.md index 96fc6b6dab1..5e1a450d1fc 100644 --- a/configure-replication-zones.md +++ b/configure-replication-zones.md @@ -41,20 +41,43 @@ constraints: [comma-separated constraint list] Field | Description ------|------------ `range_min_bytes` | Not yet implemented. -`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range surpasses this size, CockroachDB will spit it into two ranges.

**Default:** `67108864` (64MiB) +`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.

**Default:** `67108864` (64MiB) `ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also known as [Time Travel Queries](select.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could cause the server to run out of memory or lead to other problems.
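For example, to raise the garbage collection window to three days (259200 seconds), a minimal sketch using the standard-input pattern shown in the examples below (assuming you are adjusting the default zone) would be: `echo 'gc: {ttlseconds: 259200}' | cockroach zone set .default -f -`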

**Default:** `86400` (24 hours) `num_replicas` | The number of replicas in the zone.

**Default:** `3` -`constraints` | A comma-separated list of attributes from nodes and/or stores where replicas should be located.

Node-level and store-level attributes are arbitrary strings specified when starting a node. You must match these strings exactly here in order for replication to work as you intend, so be sure to check carefully. See [Start a Node](start-a-node.html) for more details about node and store attributes.

**Default:** No constraints, with CockroachDB locating each replica on a unique node +`constraints` | A comma-separated list of positive, required, and/or prohibited constraints influencing the location of replicas. See [Replication Constraints](#replication-constraints) for more details.

**Default:** No constraints, with CockroachDB locating each replica on a unique rack, if possible. -### Node/Replica Recommendations +### Replication Constraints + +The location of replicas is based on the interplay between descriptive attributes assigned to nodes when they are started and constraints set in zone configurations. + +{{site.data.alerts.callout_success}}For demonstrations of how to set node attributes and replication constraints in different scenarios, see Scenario-based Examples below.{{site.data.alerts.end}} + +#### Descriptive Attributes Assigned to Nodes -When running a cluster with more than one node, each replica will be on a different node and a majority of replicas must remain available for the cluster to make progress. Therefore: +When starting a node with the [`cockroach start`](start-a-node.html) command, you can assign the following types of descriptive attributes: -- When running a cluster with more than one node, you should run at least three to ensure that a majority of replicas (2/3) remains available when a node goes down. +Attribute Type | Description +---------------|------------ +**Node Locality** | Using the `--locality` flag, you can assign arbitrary key-value pairs that describe the locality of the node. Locality might include cloud provider, country, region, datacenter, rack, etc.

CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order determining the priority. The keys themselves and the order of key-value pairs must be the same on all nodes, for example:

`--locality=cloud=gce,country=us,region=east,datacenter=us-east-1,rack=10`
`--locality=cloud=aws,country=us,region=west,datacenter=us-west-1,rack=12` +**Node Capability** | Using the `--attrs` flag, you can specify node capability, which might include specialized hardware or number of cores, for example:

`--attrs=gpu:x16c` +**Store Type/Capability** | Using the `attrs` field of the `--store` flag, you can specify disk type or capability, for example:

`--store=path=/mnt/ssd01,attrs=ssd`
`--store=path=/mnt/hda1,attrs=hdd:7200rpm` -- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a quorum (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. +#### Constraints in Replication Zones -- When replicating across datacenters, you should use datacenters on a single continent to ensure peformance (cross-continent scenarios will be better supported in the future). If the average network round-trip latency between your datacenters is greater than 200ms, you should adjust the [`raft-tick-interval`](start-a-node.html#flags) flag on each node. +The node- and store-level descriptive attributes mentioned above can be used as the following types of constraints in replication zones to influence the location of replicas. However, note the following general guidance: + +- When locality is the only consideration for replication, it's recommended to set locality on nodes without specifying any constraints in zone configurations. In the absence of constraints, CockroachDB attempts to spread replicas evenly across the cluster based on locality. +- When additional or different constraints are needed, positive constraints are generally sufficient. Required and prohibited constraints are useful in special situations where, for example, data must or must not be stored in a specific country. + +Constraint Type | Description | Syntax +----------------|-------------|------- +**Positive** | Replicas will be placed on nodes/stores with as many matching attributes as possible. When there are no matching nodes/stores with capacity, replicas will be added wherever there is capacity.

For example, `constraints: [ssd, datacenter=us-west-1a]` would cause replicas to be located on different `ssd` drives in the `us-west-1a` datacenter. When there's not sufficient capacity for a new replica on an `ssd` drive in the datacenter, the replica would get added on an available `ssd` drive elsewhere. When there's not sufficient capacity for a new replica on any `ssd` drive, the replica would get added on any other drive with capacity. | `[ssd]` +**Required** | Replicas **must** be placed on nodes/stores with matching attributes. When there are no matching nodes/stores with capacity, new replicas will not be added.

For example, `constraints: [+datacenter=us-west-1a]` would force replicas to be located on different racks in the `us-west-1a` datacenter. When there's not sufficient capacity for a new replica on a unique rack in the datacenter, the replica would get added on a rack already storing a replica. When there's not sufficient capacity for a new replica on any rack in the datacenter, the replica would not get added. | `[+ssd]` +**Prohibited** | Replicas **must not** be placed on nodes/stores with matching attributes. When there are no alternate nodes/stores with capacity, new replicas will not be added.

For example, `constraints: [-mem, -us-west-1a]` would force replicas to be located on-disk on different racks outside the `us-west-1a` datacenter. When there's not sufficient on-disk capacity on a unique rack outside the datacenter, the replica would get added on a rack already storing a replica. When there's sufficient capacity for a new replica only in the `us-west-1a` datacenter, the replica would not get added. | `[-ssd]` + +### Node/Replica Recommendations + +See [Cluster Topology](recommended-production-settings.html#cluster-topology) recommendations for production deployments. ## Subcommands @@ -121,12 +144,9 @@ Flag | Description `--url` | The connection URL. If you use this flag, do not set any other connection flags.

For insecure connections, the URL format is:
`--url=postgresql://<user>@<host>:<port>/<database>?sslmode=disable`

For secure connections, the URL format is:
`--url=postgresql://<user>@<host>:<port>/<database>`
with the following parameters in the query string:
`sslcert=<path to client certificate>`
`sslkey=<path to client key>`
`sslmode=verify-full`
`sslrootcert=<path to CA certificate>`
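Putting these together, a hypothetical secure URL (the database name and certificate paths here are placeholders, not output from a real cluster) might look like: `--url=postgresql://root@localhost:26257/bank?sslcert=certs/root.cert&sslkey=certs/root.key&sslmode=verify-full&sslrootcert=certs/ca.cert`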

**Env Variable:** `COCKROACH_URL` `--user`
`-u` | The user connecting to the database. Currently, only the `root` user can configure replication zones.

**Env Variable:** `COCKROACH_USER`
**Default:** `root` -## Examples +## Basic Examples -- [View the Default Replication Zone](#view-the-default-replication-zone) -- [Edit the Default Replication Zone](#edit-the-default-replication-zone) -- [Create a Replication Zone for a Database](#create-a-replication-zone-for-a-database) -- [Create a Replication Zone for a Table](#create-a-replication-zone-for-a-table) +These examples focus on the basic approach and syntax for working with zone configuration. For examples demonstrating how to use constraints, see [Scenario-based Examples](#scenario-based-examples). ### View the Default Replication Zone @@ -150,35 +170,175 @@ constraints: [] ### Edit the Default Replication Zone -Let's say you want to run a three-node cluster across three datacenters, two on the US east coast and one on the US west coast. You want data replicated three times by default, with each replica stored on a specific node in a specific datacenter. +To edit the default replication zone, create a YAML file with changes, and use the `cockroach zone set .default -f ` command with appropriate flags: -1. Start each node with the relevant datacenter location specified in the `--attrs` field: +~~~ shell +$ cat default_update.yaml +~~~ - ~~~ shell - # Start node in first US east coast datacenter: - $ cockroach start --host=node1-hostname --attrs=us-east-1a +~~~ +num_replicas: 5 +~~~ - # Start node in second US east coast datacenter: - $ cockroach start --host=node2-hostname --attrs=us-east-1b --join=node1-hostname:27257 +~~~ shell +$ cockroach zone set .default -f default_update.yaml +~~~ + +~~~ +range_min_bytes: 1048576 +range_max_bytes: 67108864 +gc: + ttlseconds: 86400 +num_replicas: 5 +constraints: [] +~~~ - # Start node in US west coast datacenter: - $ cockroach start --host=node3-hostname --attrs=us-west-1a --join=node1-hostname:27257 +Alternately, you can pass the YAML content via the standard input: + +~~~ shell +$ echo 'num_replicas: 5' | cockroach zone set .default -f - +~~~ + +### Create a Replication Zone for a Database + +To control replication for a specific database, create a YAML file with changes, and use the `cockroach zone set -f ` command with appropriate flags: + +~~~ shell +$ cat database_zone.yaml +~~~ + +~~~ +num_replicas: 5 +~~~ + +~~~ shell +$ cockroach zone set db1 -f database_zone.yaml +~~~ + +~~~ +range_min_bytes: 1048576 +range_max_bytes: 67108864 +gc: + ttlseconds: 86400 +num_replicas: 5 +constraints: [] +~~~ + +Alternately, you can pass the YAML content via the standard input: + +~~~ shell +$ echo 'num_replicas: 5' | cockroach zone set db1 -f - +~~~ + +### Create a Replication Zone for a Table + +To control replication for a specific table, create a YAML file with changes, and use the `cockroach zone set -f ` command with appropriate flags: + +~~~ shell +$ cat table_zone.yaml +~~~ + +~~~ +num_replicas: 7 +~~~ + +~~~ shell +$ cockroach zone set db1.t1 -f table_zone.yaml +~~~ + +~~~ +range_min_bytes: 1048576 +range_max_bytes: 67108864 +gc: + ttlseconds: 86400 +num_replicas: 7 +constraints: [] +~~~ + +Alternately, you can pass the YAML content via the standard input: + +~~~ shell +$ echo 'num_replicas: 7' | cockroach zone set db1.t1 -f - +~~~ + +## Scenario-based Examples + +### Even Replication Across Datacenters + +**Scenario:** +- You have 6 nodes across 3 datacenters, 2 nodes in each datacenter. +- You want data replicated 3 times, with replicas balanced evenly across all three datacenters. 
+**Approach:** + +Start each node with its datacenter location specified in the `--locality` flag: + +~~~ shell +# Start the two nodes in datacenter 1: +$ cockroach start --host=<node1 hostname> --locality=datacenter=us-1 \ +--insecure +$ cockroach start --host=<node2 hostname> --locality=datacenter=us-1 \ +--join=<node1 hostname>:27257 --insecure + +# Start the two nodes in datacenter 2: +$ cockroach start --host=<node3 hostname> --locality=datacenter=us-2 \ +--join=<node1 hostname>:27257 --insecure +$ cockroach start --host=<node4 hostname> --locality=datacenter=us-2 \ +--join=<node1 hostname>:27257 --insecure + +# Start the two nodes in datacenter 3: +$ cockroach start --host=<node5 hostname> --locality=datacenter=us-3 \ +--join=<node1 hostname>:27257 --insecure +$ cockroach start --host=<node6 hostname> --locality=datacenter=us-3 \ +--join=<node1 hostname>:27257 --insecure +~~~ + +There's no need to make zone configuration changes; by default, the cluster is configured to replicate data three times, and even without explicit constraints, the cluster will aim to diversify replicas across node localities. + +### Multiple Applications Writing to Different Databases + +**Scenario:** +- You have 2 independent applications connected to the same CockroachDB cluster, each application using a distinct database. +- You have 6 nodes across 2 datacenters, 3 nodes in each datacenter. +- You want the data for application 1 to be replicated 5 times, with replicas evenly balanced across both datacenters. +- You want the data for application 2 to be replicated 3 times, with all replicas in a single datacenter. + +**Approach:** + +1. Start each node with its datacenter location specified in the `--locality` flag: + + ~~~ shell + # Start the three nodes in datacenter 1: + $ cockroach start --host=<node1 hostname> --locality=datacenter=us-1 \ + --insecure + $ cockroach start --host=<node2 hostname> --locality=datacenter=us-1 \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node3 hostname> --locality=datacenter=us-1 \ + --join=<node1 hostname>:27257 --insecure + + # Start the three nodes in datacenter 2: + $ cockroach start --host=<node4 hostname> --locality=datacenter=us-2 \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node5 hostname> --locality=datacenter=us-2 \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node6 hostname> --locality=datacenter=us-2 \ + --join=<node1 hostname>:27257 --insecure ~~~ -2. Create a YAML file with the node attributes listed as constraints: +2. Configure a replication zone for the database used by application 1: ~~~ shell - $ cat default_update.yaml + # Create a YAML file with the replica count set to 5: + $ cat app1_zone.yaml ~~~ ~~~ - constraints: [us-west-1a, us-east-1b, us-west-1a] + num_replicas: 5 ~~~ -3. Use the file to update the default zone configuration: - ~~~ shell - $ cockroach zone set .default -f default_update.yaml + # Apply the replication zone to the database used by application 1: + $ cockroach zone set app1_db -f app1_zone.yaml ~~~ ~~~ @@ -186,47 +346,26 @@ Let's say you want to run a three-node cluster across three datacenters, two on range_max_bytes: 67108864 gc: ttlseconds: 86400 - num_replicas: 3 - constraints: [us-west-1a, us-east-1b, us-west-1a] + num_replicas: 5 + constraints: [] ~~~ - Alternately, you can pass the YAML content via the standard input: - - ~~~ shell - $ echo 'constraints: [us-west-1a, us-east-1b, us-west-1a]' | cockroach zone set .default -f - - ~~~ -### Create a Replication Zone for a Database -Let's say you want to run a cluster across five nodes, three of which have ssd storage devices. You want data in the `bank` database replicated to these ssd devices. -1. 
When starting the three nodes that have ssd storage, specify `ssd` as an attribute of the stores, and when starting the other two nodes, leave the attribute out: + Nothing else is necessary for application 1's data. Since all nodes specify their datacenter locality, the cluster will aim to balance the data in the database used by application 1 between datacenters 1 and 2. - ~~~ shell - # Start nodes with ssd storage: - $ cockroach start --insecure --host=node1-hostname --store=path=node1-data,attrs=ssd - $ cockroach start --insecure --host=node2-hostname --store=path=node2-data,attrs=ssd --join=node1-hostname:27257 - $ cockroach start --insecure --host=node3-hostname --store=path=node3-data,attrs=ssd --join=node1-hostname:27257 - - # Start nodes without ssd storage: - $ cockroach start --insecure --host=node4-hostname --store=path=node4-data --join=node1-hostname:27257 - $ cockroach start --insecure --host=node5-hostname --store=path=node5-data --join=node1-hostname:27257 - ~~~ -2. Create a YAML file with the `ssd` attribute listed as a constraint: +3. Configure a replication zone for the database used by application 2: ~~~ shell - $ cat bank_zone.yaml + # Create a YAML file with 1 datacenter as a required constraint: + $ cat app2_zone.yaml ~~~ ~~~ - constraints: [ssd] + constraints: [+datacenter=us-2] ~~~ -3. Use the file to update the zone configuration for the `bank` database: - ~~~ shell - $ cockroach zone set bank -f bank_zone.yaml + # Apply the replication zone to the database used by application 2: + $ cockroach zone set app2_db -f app2_zone.yaml ~~~ ~~~ @@ -235,46 +374,58 @@ Let's say you want to run a cluster across five nodes, three of which have ssd s gc: ttlseconds: 86400 num_replicas: 3 - constraints: [ssd] + constraints: [+datacenter=us-2] ~~~ - Alternately, you can pass the YAML content via the standard input: + The required constraint will force application 2's data to be replicated only within the `us-2` datacenter. - ~~~ shell - $ echo 'constraints: [ssd]' | cockroach zone set bank -f - - ~~~ +### Stricter Replication for a Specific Table -### Create a Replication Zone for a Table +**Scenario:** +- You have 7 nodes, 5 with SSD drives and 2 with HDD drives. +- You want data replicated 3 times by default. +- Speed and availability are important for a specific table that is queried very frequently, however, so you want the data in that table to be replicated 5 times, preferably on nodes with SSD drives. -Let's say you want to run a cluster across five nodes, three of which have ssd storage devices. You want data in the `bank.accounts` table replicated to these ssd devices. +**Approach:** -1. When starting the three nodes that have ssd storage, specify `ssd` as an attribute of the stores, and when starting the other two nodes, leave the attribute out: +1. 
Start each node with `ssd` or `hdd` specified as store attributes: ~~~ shell - # Start nodes with ssd storage: - $ cockroach start --insecure --host=node1-hostname --store=path=node1-data,attrs=ssd - $ cockroach start --insecure --host=node2-hostname --store=path=node2-data,attrs=ssd --join=node1-hostname:27257 - $ cockroach start --insecure --host=node3-hostname --store=path=node3-data,attrs=ssd --join=node1-hostname:27257 - - # Start nodes without ssd storage: - $ cockroach start --insecure --host=node4-hostname --store=path=node4-data --join=node1-hostname:27257 - $ cockroach start --insecure --host=node5-hostname --store=path=node5-data --join=node1-hostname:27257 + # Start the 5 nodes with SSD storage: + $ cockroach start --host=<node1 hostname> --store=path=node1,attrs=ssd \ + --insecure + $ cockroach start --host=<node2 hostname> --store=path=node2,attrs=ssd \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node3 hostname> --store=path=node3,attrs=ssd \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node4 hostname> --store=path=node4,attrs=ssd \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node5 hostname> --store=path=node5,attrs=ssd \ + --join=<node1 hostname>:27257 --insecure + + # Start the 2 nodes with HDD storage: + $ cockroach start --host=<node6 hostname> --store=path=node6,attrs=hdd \ + --join=<node1 hostname>:27257 --insecure + $ cockroach start --host=<node7 hostname> --store=path=node7,attrs=hdd \ + --join=<node1 hostname>:27257 --insecure ~~~ -2. Create a YAML file with the `ssd` attribute listed as a constraint: +2. Configure a replication zone for the table that must be replicated more strictly: ~~~ shell - $ cat accounts_zone.yaml + # Create a YAML file with the replica count set to 5 + # and the ssd attribute as a positive constraint: + $ cat table_zone.yaml ~~~ ~~~ + num_replicas: 5 constraints: [ssd] ~~~ -3. Use the file to update the zone configuration for the `bank` database: - ~~~ shell - $ cockroach zone set bank.accounts -f accounts_zone.yaml + # Apply the replication zone to the table: + $ cockroach zone set db.important_table -f table_zone.yaml ~~~ ~~~ @@ -282,16 +433,12 @@ Let's say you want to run a cluster across five nodes, three of which have ssd s range_max_bytes: 67108864 gc: ttlseconds: 86400 - num_replicas: 3 + num_replicas: 5 constraints: [ssd] ~~~ - Alternately, you can pass the YAML content via the standard input: - - ~~~ shell - $ echo 'constraints: [ssd]' | cockroach zone set bank.accounts -f - - ~~~ + Data in the table will be replicated 5 times, and the positive constraint will place data in the table on nodes with `ssd` drives whenever possible. ## See Also -[Other Cockroach Commands](cockroach-commands.html) \ No newline at end of file +[Other Cockroach Commands](cockroach-commands.html) diff --git a/recommended-production-settings.md b/recommended-production-settings.md index 282cc912582..60ccf85b2e5 100644 --- a/recommended-production-settings.md +++ b/recommended-production-settings.md @@ -11,10 +11,20 @@ This page provides recommended settings for production deployments. ## Cluster Topology -- Run one node per machine. Since CockroachDB [replicates](configure-replication-zones.html) across nodes, running more than one node per machine increases the risk of data unavailability if a machine fails. +When running a cluster with more than one node, each replica will be on a different node and a majority of replicas must remain available for the cluster to make progress. Therefore: + +- Run at least three nodes to ensure that a majority of replicas (2/3) remains available when a node goes down. + +- Run one node per machine. 
Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data unavailability if a machine fails. - If a machine has multiple disks or SSDs, it's better to run one node with multiple `--store` flags instead of one node per disk, because this informs CockroachDB about the relationship between the stores and ensures that data will be replicated across different machines instead of being assigned to different disks of the same machine. For more details about stores, see [Start a Node](start-a-node.html). +- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a quorum (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. + +- When replicating across datacenters, you should use datacenters on a single continent to ensure peformance (cross-continent scenarios will be better supported in the future). + +For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html). + ## Clock Synchronization CockroachDB needs moderately accurate time to preserve data consistency, so it's important to run [NTP](http://www.ntp.org/) or other clock synchronization software on each machine. If clocks drift too far apart, nodes will self-terminate, but this mechanism is not fail-safe and data consistency guarantees may be lost. diff --git a/start-a-node.md b/start-a-node.md index 2512c4549ef..a558259c357 100644 --- a/start-a-node.md +++ b/start-a-node.md @@ -42,7 +42,7 @@ Flag | Description `--locality` | Not yet implemented. `--max-sql-memory` | The total size for storage of temporary data for SQL clients, including prepared queries and intermediate data rows during query execution. This can be in any bytes-based unit, for example:

`--max-sql-memory=1000000000 ----> 1000000000 bytes`
`--max-sql-memory=1GB ----> 1000000000 bytes`
`--max-sql-memory=1GiB ----> 1073741824 bytes`

**Default:** 25% of total system memory (excluding swap), or 512MiB if the memory size cannot be determined `--port`
`-p` | The port to bind to for internal and client communication.

**Env Variable:** `COCKROACH_PORT`
**Default:** `26257` -`--raft-tick-interval` | CockroachDB uses the [Raft consensus algorithm](https://raft.github.io/) to replicate data consistently according to your [replication zone configuration](configure-replication-zones.html). For each replica group, an elected leader heartbeats its followers and keeps their logs replicated. When followers fail to receive heartbeats, a new leader is elected.

This flag sets the interval at which the replica leader heartbeats followers. For high-latency deployments, set this flag to a value greater than the average latency between your nodes. Also, this flag should be set identically on all nodes in the cluster.

**Default:** 200ms +`--raft-tick-interval` | CockroachDB uses the [Raft consensus algorithm](https://raft.github.io/) to replicate data consistently according to your [replication zone configuration](configure-replication-zones.html). For each replica group, an elected leader heartbeats its followers and keeps their logs replicated. When followers fail to receive heartbeats, a new leader is elected.

This flag is factored into defining the interval at which replica leaders heartbeat followers. It is not recommended to change the default, but if changed, it must be changed identically on all nodes in the cluster.

**Default:** 200ms `--store`
`-s` | The file path to a storage device and, optionally, store attributes and maximum size. When using multiple storage devices for a node, this flag must be specified separately for each device, for example:

`--store=/mnt/ssd01 --store=/mnt/ssd02`

For more details, see [`store`](#store) below. #### `store` From 27cea7e2a4c5e5ef04c6207bf8b537c7ff11bb0f Mon Sep 17 00:00:00 2001 From: Jesse Seldess Date: Thu, 26 Jan 2017 11:07:48 -0500 Subject: [PATCH 2/3] revise language about cross-continent deployment; fixes #1028 --- recommended-production-settings.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recommended-production-settings.md b/recommended-production-settings.md index 60ccf85b2e5..30bb680aadd 100644 --- a/recommended-production-settings.md +++ b/recommended-production-settings.md @@ -21,7 +21,7 @@ When running a cluster with more than one node, each replica will be on a differ - Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a quorum (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. -- When replicating across datacenters, you should use datacenters on a single continent to ensure peformance (cross-continent scenarios will be better supported in the future). +- When replicating across datacenters, it's recommended to use datacenters on a single continent to ensure performance. Inter-continent scenarios will improve in performance soon. For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html). From 740b36a556eea1312d0eda81b1eaef5f52711a52 Mon Sep 17 00:00:00 2001 From: Jesse Seldess Date: Thu, 26 Jan 2017 11:31:39 -0500 Subject: [PATCH 3/3] revisions --- beta-20170126.md | 2 +- configure-replication-zones.md | 26 +++++++++++++------------- recommended-production-settings.md | 10 ++++------ start-a-node.md | 8 ++++---- 4 files changed, 22 insertions(+), 24 deletions(-) diff --git a/beta-20170126.md b/beta-20170126.md index 10a576167be..ce42d1ac0b2 100644 --- a/beta-20170126.md +++ b/beta-20170126.md @@ -73,7 +73,7 @@ Get future release notes emailed to you: - It is now possible to drop a table with a self-referential foreign key without the `CASCADE` modifier. [#12958](https://github.com/cockroachdb/cockroach/pull/12958) - Additional data consistency checks have been temporarily enabled. [#12994](https://github.com/cockroachdb/cockroach/pull/12994) - Fixed a crash when retryable errors are returned inside subqueries. [#13028](https://github.com/cockroachdb/cockroach/pull/13028) -- Node ID allocation is now retried if it fails when a node first starts. [#123107](https://github.com/cockroachdb/cockroach/pull/123107) +- Node ID allocation is now retried if it fails when a node first starts. [#13107](https://github.com/cockroachdb/cockroach/pull/13107) ### Performance Improvements diff --git a/configure-replication-zones.md b/configure-replication-zones.md index 5e1a450d1fc..b7fa76e73e1 100644 --- a/configure-replication-zones.md +++ b/configure-replication-zones.md @@ -5,7 +5,7 @@ keywords: ttl, time to live, availability zone toc: false --- -In CockroachDB, you use **replication zones** to control the number and location of replicas for specific sets of data. Initially, there is a single, default replication zone for the entire cluster. You can adjust this default zone as well as add zones for individual databases and tables as needed. 
For example, you might use the default zone to replicate most data in a cluster normally within a single datacenter, while creating a specific zone to more highly replicate a certain database or table across multiple datacenters and geographies. +In CockroachDB, you use **replication zones** to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium. Initially, there is a single, default replication zone for the entire cluster. You can adjust this default zone as well as add zones for individual databases and tables as needed. For example, you might use the default zone to replicate most data in a cluster normally within a single datacenter, while creating a specific zone to more highly replicate a certain database or table across multiple datacenters and geographies. This page explains how replication zones work and how to use the `cockroach zone` [command](cockroach-commands.html) to configure them. @@ -44,11 +44,11 @@ Field | Description `range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.

**Default:** `67108864` (64MiB) `ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also known as [Time Travel Queries](select.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 64MiB; such oversized ranges could cause the server to run out of memory or lead to other problems.
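For example, to raise the garbage collection window to three days (259200 seconds), a minimal sketch using the standard-input pattern shown in the examples below (assuming you are adjusting the default zone) would be: `echo 'gc: {ttlseconds: 259200}' | cockroach zone set .default -f -`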

**Default:** `86400` (24 hours) `num_replicas` | The number of replicas in the zone.

**Default:** `3` -`constraints` | A comma-separated list of positive, required, and/or prohibited constraints influencing the location of replicas. See [Replication Constraints](#replication-constraints) for more details.

**Default:** No constraints, with CockroachDB locating each replica on a unique rack, if possible. +`constraints` | A comma-separated list of positive, required, and/or prohibited constraints influencing the location of replicas. See [Replication Constraints](#replication-constraints) for more details.

**Default:** No constraints, with CockroachDB locating each replica on a unique node, if possible. ### Replication Constraints -The location of replicas is based on the interplay between descriptive attributes assigned to nodes when they are started and constraints set in zone configurations. +The location of replicas, both when they are first added and when they are rebalanced to maintain cluster equilibrium, is based on the interplay between descriptive attributes assigned to nodes and constraints set in zone configurations. {{site.data.alerts.callout_success}}For demonstrations of how to set node attributes and replication constraints in different scenarios, see Scenario-based Examples below.{{site.data.alerts.end}} @@ -58,22 +58,22 @@ When starting a node with the [`cockroach start`](start-a-node.html) command, yo Attribute Type | Description ---------------|------------ -**Node Locality** | Using the `--locality` flag, you can assign arbitrary key-value pairs that describe the locality of the node. Locality might include cloud provider, country, region, datacenter, rack, etc.

CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order determining the priority. The keys themselves and the order of key-value pairs must be the same on all nodes, for example:

`--locality=cloud=gce,country=us,region=east,datacenter=us-east-1,rack=10`
`--locality=cloud=aws,country=us,region=west,datacenter=us-west-1,rack=12` -**Node Capability** | Using the `--attrs` flag, you can specify node capability, which might include specialized hardware or number of cores, for example:

`--attrs=gpu:x16c` +**Node Locality** | Using the `--locality` flag, you can assign arbitrary key-value pairs that describe the locality of the node. Locality might include country, region, datacenter, rack, etc.

CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order determining the priority. The keys themselves and the order of key-value pairs must be the same on all nodes, for example:

`--locality=region=east,datacenter=us-east-1`
`--locality=region=west,datacenter=us-west-1` +**Node Capability** | Using the `--attrs` flag, you can specify node capability, which might include specialized hardware or number of cores, for example:

`--attrs=ram:64gb` **Store Type/Capability** | Using the `attrs` field of the `--store` flag, you can specify disk type or capability, for example:

`--store=path=/mnt/ssd01,attrs=ssd`
`--store=path=/mnt/hda1,attrs=hdd:7200rpm` #### Constraints in Replication Zones -The node- and store-level descriptive attributes mentioned above can be used as the following types of constraints in replication zones to influence the location of replicas. However, note the following general guidance: +The node-level and store-level descriptive attributes mentioned above can be used as the following types of constraints in replication zones to influence the location of replicas. However, note the following general guidance: - When locality is the only consideration for replication, it's recommended to set locality on nodes without specifying any constraints in zone configurations. In the absence of constraints, CockroachDB attempts to spread replicas evenly across the cluster based on locality. -- When additional or different constraints are needed, positive constraints are generally sufficient. Required and prohibited constraints are useful in special situations where, for example, data must or must not be stored in a specific country. +- When additional or different constraints are needed, positive constraints are generally sufficient. Required and prohibited constraints are useful in special situations where, for example, data must or must not be stored in a specific country or on a specific type of machine. Constraint Type | Description | Syntax ----------------|-------------|------- -**Positive** | Replicas will be placed on nodes/stores with as many matching attributes as possible. When there are no matching nodes/stores with capacity, replicas will be added wherever there is capacity.

For example, `constraints: [ssd, datacenter=us-west-1a]` would cause replicas to be located on different `ssd` drives in the `us-west-1a` datacenter. When there's not sufficient capacity for a new replica on an `ssd` drive in the datacenter, the replica would get added on an available `ssd` drive elsewhere. When there's not sufficient capacity for a new replica on any `ssd` drive, the replica would get added on any other drive with capacity. | `[ssd]` -**Required** | Replicas **must** be placed on nodes/stores with matching attributes. When there are no matching nodes/stores with capacity, new replicas will not be added.

For example, `constraints: [+datacenter=us-west-1a]` would force replicas to be located on different racks in the `us-west-1a` datacenter. When there's not sufficient capacity for a new replica on a unique rack in the datacenter, the replica would get added on a rack already storing a replica. When there's not sufficient capacity for a new replica on any rack in the datacenter, the replica would not get added. | `[+ssd]` -**Prohibited** | Replicas **must not** be placed on nodes/stores with matching attributes. When there are no alternate nodes/stores with capacity, new replicas will not be added.

For example, `constraints: [-mem, -us-west-1a]` would force replicas to be located on-disk on different racks outside the `us-west-1a` datacenter. When there's not sufficient on-disk capacity on a unique rack outside the datacenter, the replica would get added on a rack already storing a replica. When there's sufficient capacity for a new replica only in the `us-west-1a` datacenter, the replica would not get added. | `[-ssd]` +**Positive** | When placing replicas, the cluster will prefer nodes/stores with as many matching attributes as possible. When there are no matching nodes/stores with capacity, replicas will be placed wherever there is capacity. | `[ssd]` +**Required** | When placing replicas, the cluster will only consider nodes/stores with matching attributes. When there are no matching nodes/stores with capacity, new replicas will not be added. | `[+ssd]` +**Prohibited** | When placing replicas, the cluster will ignore nodes/stores with matching attributes. When there are no alternate nodes/stores with capacity, new replicas will not be added. | `[-ssd]` ### Node/Replica Recommendations @@ -170,7 +170,7 @@ constraints: [] ### Edit the Default Replication Zone -To edit the default replication zone, create a YAML file with changes, and use the `cockroach zone set .default -f ` command with appropriate flags: +To edit the default replication zone, create a YAML file defining only the values you want to change (other values will not be affected), and use the `cockroach zone set .default -f ` command with appropriate flags: ~~~ shell $ cat default_update.yaml @@ -201,7 +201,7 @@ $ echo 'num_replicas: 5' | cockroach zone set .default -f - ### Create a Replication Zone for a Database -To control replication for a specific database, create a YAML file with changes, and use the `cockroach zone set -f ` command with appropriate flags: +To control replication for a specific database, create a YAML file defining only the values you want to change (other values will not be affected), and use the `cockroach zone set -f ` command with appropriate flags: ~~~ shell $ cat database_zone.yaml @@ -232,7 +232,7 @@ $ echo 'num_replicas: 5' | cockroach zone set db1 -f - ### Create a Replication Zone for a Table -To control replication for a specific table, create a YAML file with changes, and use the `cockroach zone set -f ` command with appropriate flags: +To control replication for a specific table, create a YAML file defining only the values you want to change (other values will not be affected), and use the `cockroach zone set -f ` command with appropriate flags: ~~~ shell $ cat table_zone.yaml diff --git a/recommended-production-settings.md b/recommended-production-settings.md index 30bb680aadd..d9b9c34292c 100644 --- a/recommended-production-settings.md +++ b/recommended-production-settings.md @@ -13,15 +13,13 @@ This page provides recommended settings for production deployments. When running a cluster with more than one node, each replica will be on a different node and a majority of replicas must remain available for the cluster to make progress. Therefore: -- Run at least three nodes to ensure that a majority of replicas (2/3) remains available when a node goes down. +- Run at least three nodes to ensure that a majority of replicas (2/3) remains available if a node fails. -- Run one node per machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data unavailability if a machine fails. +- Run each node on a separate machine. 
Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple `--store` flags and not one node per disk. For more details about stores, see [Start a Node](start-a-node.html). -- If a machine has multiple disks or SSDs, it's better to run one node with multiple `--store` flags instead of one node per disk, because this informs CockroachDB about the relationship between the stores and ensures that data will be replicated across different machines instead of being assigned to different disks of the same machine. For more details about stores, see [Start a Node](start-a-node.html). +- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a majority (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. -- Configurations with odd numbers of replicas are more robust than those with even numbers. Clusters of three and four nodes can each tolerate one node failure and still reach a quorum (2/3 and 3/4 respectively), so the fourth replica doesn't add any extra fault-tolerance. To survive two simultaneous failures, you must have five replicas. - -- When replicating across datacenters, it's recommended to use datacenters on a single continent to ensure performance. Inter-continent scenarios will improve in performance soon. +- When replicating across datacenters, it's recommended to use datacenters on a single continent to ensure performance (inter-continent scenarios will improve in performance soon). Also, to ensure even replication across datacenters, it's recommended to specify datacenter as a `--locality` at the node-level (see this [example](configure-replication-zones.html#even-replication-across-datacenters) for more details). For details about controlling the number and location of replicas, see [Configure Replication Zones](configure-replication-zones.html). diff --git a/start-a-node.md b/start-a-node.md index a558259c357..d38874fb72b 100644 --- a/start-a-node.md +++ b/start-a-node.md @@ -28,7 +28,7 @@ The `start` command supports the following flags, as well as [logging flags](coc Flag | Description -----|----------- `--advertise-host` | The hostname or IP address to advertise to other CockroachDB nodes. If it is a hostname, it must be resolvable from all nodes; if it is an IP address, it must be routable from all nodes.

When this flag is not set, the node advertises the address in the `--host` flag. -`--attrs` | Arbitrary strings, separated by colons, relating to node-level attributes such as topography or machine capability. These can be used to influence the location of data replicas. See [Configure Replication Zones](configure-replication-zones.html) for full details.

Topography might include datacenter designation (e.g., `us-west-1a`, `us-west-1b`, `us-east-1c`). Machine capabilities might include specialized hardware or number of cores (e.g., `gpu`, `x16c`). The relative geographic proximity of two nodes is inferred from the common prefix of the attributes list, so topographic attributes should be specified first and in the same order for all nodes, for example:

`--attrs=us-west-1b:gpu` +`--attrs` | Arbitrary strings, separated by colons, specifying node capability, which might include specialized hardware or number of cores, for example:

`--attrs=ram:64gb`

These can be used to influence the location of data replicas. See [Configure Replication Zones](configure-replication-zones.html#replication-constraints) for full details. `--background` | Set this to start the node in the background. This is better than appending `&` to the command because control is returned to the shell only once the node is ready to accept requests. `--cache` | The total size for caches, shared evenly if there are multiple storage devices. This can be in any bytes-based unit, for example:

`--cache=1000000000 ----> 1000000000 bytes`
`--cache=1GB ----> 1000000000 bytes`
`--cache=1GiB ----> 1073741824 bytes`

**Default:** 25% of total system memory (excluding swap), or 512MiB if the memory size cannot be determined `--ca-cert` | The path to the [CA certificate](create-security-certificates.html). This flag is required to start a secure node.

**Env Variable:** `COCKROACH_CA_CERT` @@ -39,10 +39,10 @@ Flag | Description `--insecure` | Set this only if the cluster is insecure and running on multiple machines.

If the cluster is insecure and local, leave this out. If the cluster is secure, leave this out and set the `--ca-cert`, `--cert`, and `-key` flags.

**Env Variable:** `COCKROACH_INSECURE` `--join`
`-j` | The address for connecting the node to an existing cluster. When starting the first node, leave this flag out. When starting subsequent nodes, set this flag to the address of any existing node.

Optionally, you can specify the addresses of multiple existing nodes as a comma-separated list, using multiple `--join` flags, or using a combination of these approaches, for example:

`--join=localhost:1234,localhost:2345`
`--join=localhost:1234 --join=localhost:2345`
`--join=localhost:1234,localhost:2345 --join=localhost:3456` `--key` | The path to the [node key](create-security-certificates.html) protecting the node certificate. This flag is required to start a secure node. -`--locality` | Not yet implemented. +`--locality` | Arbitrary key-value pairs that describe the locality of the node. Locality might include country, region, datacenter, rack, etc.

CockroachDB attempts to spread replicas evenly across the cluster based on locality, with the order determining the priority. The keys themselves and the order of key-value pairs must be the same on all nodes, for example:

`--locality=region=east,datacenter=us-east-1`
`--locality=region=west,datacenter=us-west-1`

These can be used to influence the location of data replicas. See [Configure Replication Zones](configure-replication-zones.html#replication-constraints) for full details. `--max-sql-memory` | The total size for storage of temporary data for SQL clients, including prepared queries and intermediate data rows during query execution. This can be in any bytes-based unit, for example:

`--max-sql-memory=1000000000 ----> 1000000000 bytes`
`--max-sql-memory=1GB ----> 1000000000 bytes`
`--max-sql-memory=1GiB ----> 1073741824 bytes`

**Default:** 25% of total system memory (excluding swap), or 512MiB if the memory size cannot be determined `--port`
`-p` | The port to bind to for internal and client communication.

**Env Variable:** `COCKROACH_PORT`
**Default:** `26257` -`--raft-tick-interval` | CockroachDB uses the [Raft consensus algorithm](https://raft.github.io/) to replicate data consistently according to your [replication zone configuration](configure-replication-zones.html). For each replica group, an elected leader heartbeats its followers and keeps their logs replicated. When followers fail to receive heartbeats, a new leader is elected.

This flag is factored into defining the interval at which replica leaders heartbeat followers. It is not recommended to change the default, but if changed, it must be changed identically on all nodes in the cluster.

**Default:** 200ms +`--raft-tick-interval` | CockroachDB uses the [Raft consensus algorithm](https://raft.github.io/) to replicate data consistently according to your [replication zone configuration](configure-replication-zones.html). For each replica group, an elected leader heartbeats its followers and keeps their logs replicated. When followers fail to receive heartbeats, a new leader is elected.

This flag is factored into defining the interval at which replica leaders heartbeat followers. It is not recommended to change the default, but if changed, every node in the cluster must be stopped and restarted with the identical value.

**Default:** 200ms `--store`
`-s` | The file path to a storage device and, optionally, store attributes and maximum size. When using multiple storage devices for a node, this flag must be specified separately for each device, for example:

`--store=/mnt/ssd01 --store=/mnt/ssd02`

For more details, see [`store`](#store) below. #### `store` @@ -52,7 +52,7 @@ The `store` flag supports the following fields. Note that commas are used to sep Field | Description ------|------------ `path` | The file path to the storage device. When not setting `attr` or `size`, the `path` field label can be left out:

`--store=/mnt/ssd01`

When either of those fields is set, however, the `path` field label must be used:

`--store=path=/mnt/ssd01,size=20GB`

**Default:** `cockroach-data` -`attrs` | Arbitrary strings, separated by colons, relating to store-level attributes such as disk type or capabilities. These can be used to influence the location of data replicas. See [Configure Replication Zones](configure-replication-zones.html) for full details.

In most cases, node-level attributes are preferable to store-level attributes, but this field can be used to match capabilities for storage of individual databases or tables. For example, an OLTP database would probably want to allocate space for its tables only on solid state devices, whereas append-only time series might prefer cheaper spinning drives. Typical attributes include whether the store is flash (`ssd`), spinny disk (`hdd`), or in-memory (`mem`), as well as speeds and other specs, for example:

`--store=path=/mnt/hda1,attrs=hdd:7200rpm` +`attrs` | Arbitrary strings, separated by colons, specifying disk type or capability. These can be used to influence the location of data replicas. See [Configure Replication Zones](configure-replication-zones.html#replication-constraints) for full details.

In most cases, node-level `--locality` or `--attrs` are preferable to store-level attributes, but this field can be used to match capabilities for storage of individual databases or tables. For example, an OLTP database would probably want to allocate space for its tables only on solid state devices, whereas append-only time series might prefer cheaper spinning drives. Typical attributes include whether the store is flash (`ssd`), spinny disk (`hdd`), or in-memory (`mem`), as well as speeds and other specs, for example:

`--store=path=/mnt/hda1,attrs=hdd:7200rpm` `size` | The maximum size allocated to the node. When this size is reached, CockroachDB attempts to rebalance data to other nodes with available capacity. When there's no capacity elsewhere, this limit will be exceeded. Also, data may be written to the node faster than the cluster can rebalance it away; in this case, as long as capacity is available elsewhere, CockroachDB will gradually rebalance data down to the store limit.

The `size` can be specified either in a bytes-based unit or as a percentage of hard drive space, for example:

`--store=path=/mnt/ssd01,size=10000000000 ----> 10000000000 bytes`
`--store=path=/mnt/ssd01,size=20GB ----> 20000000000 bytes`
`--store=path=/mnt/ssd01,size=20GiB ----> 21474836480 bytes`
`--store=path=/mnt/ssd01,size=0.02TiB ----> 21474836480 bytes`
`--store=path=/mnt/ssd01,size=20% ----> 20% of available space`
`--store=path=/mnt/ssd01,size=0.2 ----> 20% of available space`
`--store=path=/mnt/ssd01,size=.2 ----> 20% of available space`

**Default:** 100%

For an in-memory store, the `size` field is required and must be set to the true maximum bytes or percentage of available memory. In addition, an extra `type` field must be set to `mem`, and the `path` field must be left out, for example:

`--store=type=mem,size=20GB`
`--store=type=mem,size=90%`
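To see how the `store` fields combine, here is a hypothetical start command (the hostnames and paths are placeholders, not a tested configuration) for a node with one SSD-backed store capped at 80% of its disk plus a small in-memory store:

~~~ shell
# A sketch only: an SSD store using path, attrs, and size together,
# plus a separate in-memory store declared with type=mem.
$ cockroach start --host=<node2 hostname> --join=<node1 hostname>:27257 --insecure \
--store=path=/mnt/ssd01,attrs=ssd,size=80% \
--store=type=mem,size=2GiB
~~~

## Standard Output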