From d2a72901fe3f664650c58a1f94fd59ca99efe0d3 Mon Sep 17 00:00:00 2001 From: Jesse Seldess Date: Tue, 2 Jul 2019 17:24:42 -0400 Subject: [PATCH] Document ballast file as production best practice Fixes #4932. --- _includes/sidebar-data-v19.1.json | 6 ++++ _includes/sidebar-data-v19.2.json | 6 ++++ _includes/v19.1/misc/debug-subcommands.md | 3 ++ _includes/v19.2/misc/debug-subcommands.md | 3 ++ v19.1/cockroach-commands.md | 1 + v19.1/debug-ballast.md | 43 +++++++++++++++++++++++ v19.1/debug-encryption-active-key.md | 4 +++ v19.1/debug-merge-logs.md | 4 +-- v19.1/debug-zip.md | 9 +++-- v19.1/operational-faqs.md | 9 ++++- v19.1/recommended-production-settings.md | 2 ++ v19.2/cockroach-commands.md | 1 + v19.2/debug-ballast.md | 43 +++++++++++++++++++++++ v19.2/debug-encryption-active-key.md | 4 +++ v19.2/debug-merge-logs.md | 4 +-- v19.2/debug-zip.md | 9 +++-- v19.2/operational-faqs.md | 9 ++++- v19.2/recommended-production-settings.md | 2 ++ 18 files changed, 144 insertions(+), 18 deletions(-) create mode 100644 _includes/v19.1/misc/debug-subcommands.md create mode 100644 _includes/v19.2/misc/debug-subcommands.md create mode 100644 v19.1/debug-ballast.md create mode 100644 v19.2/debug-ballast.md diff --git a/_includes/sidebar-data-v19.1.json b/_includes/sidebar-data-v19.1.json index c88a3fe74e5..f03cbfde883 100644 --- a/_includes/sidebar-data-v19.1.json +++ b/_includes/sidebar-data-v19.1.json @@ -1588,6 +1588,12 @@ "/${VERSION}/cockroach-demo.html" ] }, + { + "title": "cockroach debug ballast", + "urls": [ + "/${VERSION}/debug-ballast.html" + ] + }, { "title": "cockroach debug encryption-active-key", "urls": [ diff --git a/_includes/sidebar-data-v19.2.json b/_includes/sidebar-data-v19.2.json index 86371716099..28aafe3e7d7 100644 --- a/_includes/sidebar-data-v19.2.json +++ b/_includes/sidebar-data-v19.2.json @@ -1588,6 +1588,12 @@ "/${VERSION}/cockroach-demo.html" ] }, + { + "title": "cockroach debug ballast", + "urls": [ + "/${VERSION}/debug-ballast.html" + ] + }, { 
"title": "cockroach debug encryption-active-key", "urls": [ diff --git a/_includes/v19.1/misc/debug-subcommands.md b/_includes/v19.1/misc/debug-subcommands.md new file mode 100644 index 00000000000..7a10e4a7fba --- /dev/null +++ b/_includes/v19.1/misc/debug-subcommands.md @@ -0,0 +1,3 @@ +While the `cockroach debug` command has a few subcommands, users are expected to use only the [`zip`](debug-zip.html), [`encryption-active-key`](debug-encryption-active-key.html), [`merge-logs`](debug-merge-logs.html), and [`ballast`](debug-ballast.html) subcommands. + +The other `debug` subcommands are useful only to CockroachDB's developers and contributors. diff --git a/_includes/v19.2/misc/debug-subcommands.md b/_includes/v19.2/misc/debug-subcommands.md new file mode 100644 index 00000000000..7a10e4a7fba --- /dev/null +++ b/_includes/v19.2/misc/debug-subcommands.md @@ -0,0 +1,3 @@ +While the `cockroach debug` command has a few subcommands, users are expected to use only the [`zip`](debug-zip.html), [`encryption-active-key`](debug-encryption-active-key.html), [`merge-logs`](debug-merge-logs.html), and [`ballast`](debug-ballast.html) subcommands. + +The other `debug` subcommands are useful only to CockroachDB's developers and contributors. diff --git a/v19.1/cockroach-commands.md b/v19.1/cockroach-commands.md index faf615e68db..8f2f2f3bdb7 100644 --- a/v19.1/cockroach-commands.md +++ b/v19.1/cockroach-commands.md @@ -26,6 +26,7 @@ Command | Usage [`cockroach demo`](cockroach-demo.html) | Start a temporary, in-memory, single-node CockroachDB cluster, and open an interactive SQL shell to it. [`cockroach gen`](generate-cockroachdb-resources.html) | Generate manpages, a bash completion file, example SQL data, or an HAProxy configuration file for a running cluster. [`cockroach version`](view-version-details.html) | Output CockroachDB version details. 
+[`cockroach debug ballast`](debug-ballast.html) | Create a large, unused file in a node's storage directory that you can delete if the node runs out of disk space. [`cockroach debug encryption-active-key`](debug-encryption-active-key.html) | View the encryption algorithm and store key. [`cockroach debug zip`](debug-zip.html) | Generate a `.zip` file that can help Cockroach Labs troubleshoot issues with your cluster. [`cockroach debug merge-logs`](debug-merge-logs.html) | Merge multiple log files from different machines into a single stream. diff --git a/v19.1/debug-ballast.md b/v19.1/debug-ballast.md new file mode 100644 index 00000000000..7b3fb02473c --- /dev/null +++ b/v19.1/debug-ballast.md @@ -0,0 +1,43 @@ +--- +title: Create a Ballast File +summary: Create a large, unused file in a node's storage directory that you can delete if the node runs out of disk space. +toc: true +--- + +The `debug ballast` [command](cockroach-commands.html) creates a large, unused file that you can place in a node's storage directory. In the case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node. + +- In addition to placing a ballast file in each node's storage directory, it is important to actively [monitor remaining disk space](monitoring-and-alerting.html#events-to-alert-on). +- Ballast files may be created in many ways, including the standard `dd` command. `cockroach debug ballast` uses the `fallocate` system call when available, so it will be faster than `dd`. + +## Subcommands + +{% include {{ page.version.version }}/misc/debug-subcommands.md %} + +## Synopsis + +~~~ shell +# Create a ballast file: +$ cockroach debug ballast [path to ballast file] [flags] + +# View help: +$ cockroach debug ballast --help +~~~ + +## Flags + +Flag | Description +-----|----------- +`--size`
`-z` | The amount of space to fill, or to leave available, in a node's storage directory via a ballast file. Positive values equal the size of the ballast file. Negative values equal the amount of space to leave after creating the ballast file. This can be a percentage (notated as a decimal or with %) or any bytes-based unit, for example:

`--size=1000000000 ----> 1000000000 bytes`
`--size=1GiB ----> 1073741824 bytes`
`--size=5% ----> 5% of available space`
`--size=0.05 ----> 5% of available space`
`--size=.05 ----> 5% of available space` + +## Example + +{% include copy-clipboard.html %} +~~~ shell +$ cockroach debug ballast cockroach-data/ballast.txt --size=1GiB +~~~ + +## See also + +- [Other Cockroach Commands](cockroach-commands.html) +- [Troubleshooting Overview](troubleshooting-overview.html) +- [Production Checklist](recommended-production-settings.html) diff --git a/v19.1/debug-encryption-active-key.md b/v19.1/debug-encryption-active-key.md index c42c5f3e3bd..737379ee931 100644 --- a/v19.1/debug-encryption-active-key.md +++ b/v19.1/debug-encryption-active-key.md @@ -6,6 +6,10 @@ toc: true The `debug encryption-active-key` [command](cockroach-commands.html) displays the encryption algorithm and store key for an encrypted store. +## Subcommands + +{% include {{ page.version.version }}/misc/debug-subcommands.md %} + ## Synopsis ~~~ shell diff --git a/v19.1/debug-merge-logs.md b/v19.1/debug-merge-logs.md index 727ece57cb6..048ce626cbc 100644 --- a/v19.1/debug-merge-logs.md +++ b/v19.1/debug-merge-logs.md @@ -12,9 +12,7 @@ The file produced by `cockroach debug merge-log` can contain highly sensitive, u ## Subcommands -While the `cockroach debug` command has a few subcommands, users are expected to only use the [`encryption-active-key`](debug-encryption-active-key.html), [`zip`](debug-zip.html), and `debug-merge` subcommands. - -`debug`'s other subcommands are useful only to CockroachDB's developers and contributors. +{% include {{ page.version.version }}/misc/debug-subcommands.md %} ## Synopsis diff --git a/v19.1/debug-zip.md b/v19.1/debug-zip.md index 7bafd7baba7..2ef093dd52e 100644 --- a/v19.1/debug-zip.md +++ b/v19.1/debug-zip.md @@ -24,8 +24,9 @@ The `debug zip` [command](cockroach-commands.html) connects to your cluster and Additionally, you can run the [`debug merge-logs`](debug-merge-logs.html) command to merge the collected logs in one file, making it easier to parse them to locate an issue with your cluster. 
-{{site.data.alerts.callout_danger}}The file produced by cockroach debug zip can contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery.{{site.data.alerts.end}} - +{{site.data.alerts.callout_danger}} +The file produced by `cockroach debug zip` can contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery. +{{site.data.alerts.end}} ## Details @@ -49,9 +50,7 @@ You can locate logs in the unarchived file's `debug/nodes/[node dir]/logs` direc ## Subcommands -While the `cockroach debug` command has a few subcommands, users are expected to only use the [`encryption-active-key`](debug-encryption-active-key.html), `zip`, and [`debug-merge`](debug-merge-logs.html) subcommands. - -`debug`'s other subcommands are useful only to CockroachDB's developers and contributors. +{% include {{ page.version.version }}/misc/debug-subcommands.md %} ## Synopsis diff --git a/v19.1/operational-faqs.md b/v19.1/operational-faqs.md index 5f227a540b9..4b7ed5a3ef0 100644 --- a/v19.1/operational-faqs.md +++ b/v19.1/operational-faqs.md @@ -5,7 +5,6 @@ toc: true toc_not_nested: true --- - ## Why is my process hanging when I try to start it in the background? The first question that needs to be asked is whether or not you have previously @@ -96,6 +95,14 @@ If you want all existing timeseries data to be deleted, change the `timeseries.s > SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '0s'; ~~~ +## What happens when a node runs out of disk space? + +When a node runs out of disk space, it shuts down and cannot be restarted until space is freed up. 
To prepare for this case, place a [ballast file](debug-ballast.html) in each node's storage directory that can be deleted to free up enough space to be able to restart the node. If you did not create a ballast file, look for other files that can be deleted, such as log files. + +{{site.data.alerts.callout_info}} +In addition to using ballast files, it is important to actively [monitor remaining disk space](monitoring-and-alerting.html#events-to-alert-on). +{{site.data.alerts.end}} + ## Why would increasing the number of nodes not result in more operations per second? If queries operate on different data, then increasing the number diff --git a/v19.1/recommended-production-settings.md b/v19.1/recommended-production-settings.md index d62a32fa785..b8447f4af9e 100644 --- a/v19.1/recommended-production-settings.md +++ b/v19.1/recommended-production-settings.md @@ -79,6 +79,8 @@ Nodes should have sufficient CPU, RAM, network, and storage capacity to handle y - To calculate IOPS, use [sysbench](https://github.com/akopytov/sysbench). If IOPS decrease, add more nodes to your cluster to increase IOPS. +- Place a [ballast file](debug-ballast.html) in each node's storage directory. In the unlikely case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node. + - Use [zone configs](configure-replication-zones.html) to increase the replication factor from 3 (the default) to 5 (across at least 5 nodes). This is especially recommended if you are using local disks with no RAID protection rather than a cloud provider's network-attached disks that are often replicated under the hood, because local disks have a greater risk of failure. 
You can do this for the [entire cluster](configure-replication-zones.html#edit-the-default-replication-zone) or for specific [databases](configure-replication-zones.html#create-a-replication-zone-for-a-database), [tables](configure-replication-zones.html#create-a-replication-zone-for-a-table), or [rows](configure-replication-zones.html#create-a-replication-zone-for-a-table-or-secondary-index-partition) (enterprise-only). diff --git a/v19.2/cockroach-commands.md b/v19.2/cockroach-commands.md index faf615e68db..8f2f2f3bdb7 100644 --- a/v19.2/cockroach-commands.md +++ b/v19.2/cockroach-commands.md @@ -26,6 +26,7 @@ Command | Usage [`cockroach demo`](cockroach-demo.html) | Start a temporary, in-memory, single-node CockroachDB cluster, and open an interactive SQL shell to it. [`cockroach gen`](generate-cockroachdb-resources.html) | Generate manpages, a bash completion file, example SQL data, or an HAProxy configuration file for a running cluster. [`cockroach version`](view-version-details.html) | Output CockroachDB version details. +[`cockroach debug ballast`](debug-ballast.html) | Create a large, unused file in a node's storage directory that you can delete if the node runs out of disk space. [`cockroach debug encryption-active-key`](debug-encryption-active-key.html) | View the encryption algorithm and store key. [`cockroach debug zip`](debug-zip.html) | Generate a `.zip` file that can help Cockroach Labs troubleshoot issues with your cluster. [`cockroach debug merge-logs`](debug-merge-logs.html) | Merge multiple log files from different machines into a single stream. diff --git a/v19.2/debug-ballast.md b/v19.2/debug-ballast.md new file mode 100644 index 00000000000..7b3fb02473c --- /dev/null +++ b/v19.2/debug-ballast.md @@ -0,0 +1,43 @@ +--- +title: Create a Ballast File +summary: Create a large, unused file in a node's storage directory that you can delete if the node runs out of disk space. 
+toc: true +--- + +The `debug ballast` [command](cockroach-commands.html) creates a large, unused file that you can place in a node's storage directory. In the case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node. + +- In addition to placing a ballast file in each node's storage directory, it is important to actively [monitor remaining disk space](monitoring-and-alerting.html#events-to-alert-on). +- Ballast files may be created in many ways, including the standard `dd` command. `cockroach debug ballast` uses the `fallocate` system call when available, so it will be faster than `dd`. + +## Subcommands + +{% include {{ page.version.version }}/misc/debug-subcommands.md %} + +## Synopsis + +~~~ shell +# Create a ballast file: +$ cockroach debug ballast [path to ballast file] [flags] + +# View help: +$ cockroach debug ballast --help +~~~ + +## Flags + +Flag | Description +-----|----------- +`--size`
`-z` | The amount of space to fill, or to leave available, in a node's storage directory via a ballast file. Positive values equal the size of the ballast file. Negative values equal the amount of space to leave after creating the ballast file. This can be a percentage (notated as a decimal or with %) or any bytes-based unit, for example:

`--size=1000000000 ----> 1000000000 bytes`
`--size=1GiB ----> 1073741824 bytes`
`--size=5% ----> 5% of available space`
`--size=0.05 ----> 5% of available space`
`--size=.05 ----> 5% of available space` + +## Example + +{% include copy-clipboard.html %} +~~~ shell +$ cockroach debug ballast cockroach-data/ballast.txt --size=1GiB +~~~ + +## See also + +- [Other Cockroach Commands](cockroach-commands.html) +- [Troubleshooting Overview](troubleshooting-overview.html) +- [Production Checklist](recommended-production-settings.html) diff --git a/v19.2/debug-encryption-active-key.md b/v19.2/debug-encryption-active-key.md index c42c5f3e3bd..26a957747fe 100644 --- a/v19.2/debug-encryption-active-key.md +++ b/v19.2/debug-encryption-active-key.md @@ -12,6 +12,10 @@ The `debug encryption-active-key` [command](cockroach-commands.html) displays th $ cockroach debug encryption-active-key [path specified by the store flag] ~~~ +## Subcommands + +{% include {{ page.version.version }}/misc/debug-subcommands.md %} + ## Example Start a node with encryption-at-rest enabled: diff --git a/v19.2/debug-merge-logs.md b/v19.2/debug-merge-logs.md index 727ece57cb6..048ce626cbc 100644 --- a/v19.2/debug-merge-logs.md +++ b/v19.2/debug-merge-logs.md @@ -12,9 +12,7 @@ The file produced by `cockroach debug merge-log` can contain highly sensitive, u ## Subcommands -While the `cockroach debug` command has a few subcommands, users are expected to only use the [`encryption-active-key`](debug-encryption-active-key.html), [`zip`](debug-zip.html), and `debug-merge` subcommands. - -`debug`'s other subcommands are useful only to CockroachDB's developers and contributors. 
+{% include {{ page.version.version }}/misc/debug-subcommands.md %} ## Synopsis diff --git a/v19.2/debug-zip.md b/v19.2/debug-zip.md index 7bafd7baba7..2ef093dd52e 100644 --- a/v19.2/debug-zip.md +++ b/v19.2/debug-zip.md @@ -24,8 +24,9 @@ The `debug zip` [command](cockroach-commands.html) connects to your cluster and Additionally, you can run the [`debug merge-logs`](debug-merge-logs.html) command to merge the collected logs in one file, making it easier to parse them to locate an issue with your cluster. -{{site.data.alerts.callout_danger}}The file produced by cockroach debug zip can contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery.{{site.data.alerts.end}} - +{{site.data.alerts.callout_danger}} +The file produced by `cockroach debug zip` can contain highly sensitive, unanonymized information, such as usernames, hashed passwords, and possibly your table's data. You should share this data only with Cockroach Labs developers and only after determining the most secure method of delivery. +{{site.data.alerts.end}} ## Details @@ -49,9 +50,7 @@ You can locate logs in the unarchived file's `debug/nodes/[node dir]/logs` direc ## Subcommands -While the `cockroach debug` command has a few subcommands, users are expected to only use the [`encryption-active-key`](debug-encryption-active-key.html), `zip`, and [`debug-merge`](debug-merge-logs.html) subcommands. - -`debug`'s other subcommands are useful only to CockroachDB's developers and contributors. 
+{% include {{ page.version.version }}/misc/debug-subcommands.md %} ## Synopsis diff --git a/v19.2/operational-faqs.md b/v19.2/operational-faqs.md index 5f227a540b9..4b7ed5a3ef0 100644 --- a/v19.2/operational-faqs.md +++ b/v19.2/operational-faqs.md @@ -5,7 +5,6 @@ toc: true toc_not_nested: true --- - ## Why is my process hanging when I try to start it in the background? The first question that needs to be asked is whether or not you have previously @@ -96,6 +95,14 @@ If you want all existing timeseries data to be deleted, change the `timeseries.s > SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '0s'; ~~~ +## What happens when a node runs out of disk space? + +When a node runs out of disk space, it shuts down and cannot be restarted until space is freed up. To prepare for this case, place a [ballast file](debug-ballast.html) in each node's storage directory that can be deleted to free up enough space to be able to restart the node. If you did not create a ballast file, look for other files that can be deleted, such as log files. + +{{site.data.alerts.callout_info}} +In addition to using ballast files, it is important to actively [monitor remaining disk space](monitoring-and-alerting.html#events-to-alert-on). +{{site.data.alerts.end}} + ## Why would increasing the number of nodes not result in more operations per second? If queries operate on different data, then increasing the number diff --git a/v19.2/recommended-production-settings.md b/v19.2/recommended-production-settings.md index d62a32fa785..b8447f4af9e 100644 --- a/v19.2/recommended-production-settings.md +++ b/v19.2/recommended-production-settings.md @@ -79,6 +79,8 @@ Nodes should have sufficient CPU, RAM, network, and storage capacity to handle y - To calculate IOPS, use [sysbench](https://github.com/akopytov/sysbench). If IOPS decrease, add more nodes to your cluster to increase IOPS. +- Place a [ballast file](debug-ballast.html) in each node's storage directory. 
In the unlikely case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node. + - Use [zone configs](configure-replication-zones.html) to increase the replication factor from 3 (the default) to 5 (across at least 5 nodes). This is especially recommended if you are using local disks with no RAID protection rather than a cloud provider's network-attached disks that are often replicated under the hood, because local disks have a greater risk of failure. You can do this for the [entire cluster](configure-replication-zones.html#edit-the-default-replication-zone) or for specific [databases](configure-replication-zones.html#create-a-replication-zone-for-a-database), [tables](configure-replication-zones.html#create-a-replication-zone-for-a-table), or [rows](configure-replication-zones.html#create-a-replication-zone-for-a-table-or-secondary-index-partition) (enterprise-only).
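
The ballast workflow this patch documents (reserve space up front, delete the file when a node runs out of disk) can be sketched with the standard `dd` command, which the new `debug-ballast.md` page names as an alternative way to create ballast files. This is a minimal sketch under stated assumptions, not how `cockroach debug ballast` is implemented: the helper names `make_ballast` and `release_ballast` are hypothetical, sizes are given in whole MiB for simplicity, and `dd` writes zeros rather than using `fallocate`, so it will likely be slower.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the ballast-file workflow with plain dd.
# These are NOT part of CockroachDB; names and arguments are assumptions.

# make_ballast PATH SIZE_MIB
# Create a file of SIZE_MIB mebibytes filled with zeros at PATH.
# Refuses to overwrite an existing file so an existing ballast is never truncated.
make_ballast() {
    ballast_path=$1
    size_mib=$2
    if [ -e "$ballast_path" ]; then
        echo "refusing to overwrite existing file: $ballast_path" >&2
        return 1
    fi
    # bs is given in bytes (1 MiB) so the call works with both GNU and BSD dd.
    dd if=/dev/zero of="$ballast_path" bs=1048576 count="$size_mib" 2>/dev/null
}

# release_ballast PATH
# Delete the ballast file to free the reserved space before restarting the node.
release_ballast() {
    rm -f -- "$1"
}
```

On a real node, the command added by this patch is the more direct route, for example `cockroach debug ballast cockroach-data/ballast.txt --size=1GiB`. Deleting the resulting file plays the role of `release_ballast` above.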