⚠️ roachprod is an internal tool for creating and testing CockroachDB clusters. Use at your own risk! ⚠️


  1. Make sure you have gcloud installed and configured (gcloud auth list to check, gcloud auth login to authenticate). You may want to update old installations (gcloud components update).
  2. Build a local binary of roachprod: make bin/roachprod
  3. Add $PWD/bin to your PATH so you can run roachprod from the root directory of cockroach.
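The setup steps above can be sketched as a small shell snippet. This is only an illustration: `path_prepend` and `have_tool` are hypothetical helpers, not part of roachprod, and the snippet assumes it is run from the root directory of cockroach.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":${PATH}:" in
    *":$1:"*) ;;                    # already present, leave PATH unchanged
    *) PATH="$1:${PATH}" ;;
  esac
  export PATH
}

# Hypothetical helper: succeed if the named tool can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Step 3: make bin/ under the current directory (assumed to be the
# cockroach root) visible to the shell.
path_prepend "${PWD}/bin"

# Steps 1 and 2: warn, but do not abort, if a prerequisite is missing.
for tool in gcloud roachprod; do
  have_tool "${tool}" || echo "warning: ${tool} not found on PATH" >&2
done
```

Calling `path_prepend` repeatedly is harmless, so the snippet could be sourced from a shell profile without growing PATH on every login.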


  • By default, clusters are created in the cockroach-ephemeral GCE project. Use the --gce-project flag or GCE_PROJECT environment variable to create clusters in a different GCE project. Note that the lifetime functionality requires roachprod gc --gce-project=<name> to be run periodically (e.g. via a cronjob). This is only provided out-of-the-box for the cockroach-ephemeral project.
  • Anyone can connect to any port on VMs in cockroach-ephemeral. DO NOT STORE SENSITIVE DATA.
  • Cluster names are prefixed with the name of the user creating them. For example, when user marc runs roachprod create test, the marc-test cluster is created.
  • VMs have a default lifetime of 12 hours (changeable with the --lifetime flag).
  • Default settings create 4 VMs (-n 4) with 4 CPUs, 15GB memory (--machine-type=n1-standard-4), and local SSDs (--local-ssd).
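For a project other than cockroach-ephemeral, the periodic roachprod gc run mentioned above has to be scheduled by hand. A minimal sketch, assuming an hourly schedule and a placeholder project name: `gc_cron_line` is a hypothetical helper that only prints a crontab entry, and the entry assumes roachprod is on the PATH that cron uses.

```shell
# Hypothetical helper: print a crontab entry that runs roachprod gc at the
# top of every hour for the given GCE project. The hourly schedule is an
# assumption, not a documented requirement.
gc_cron_line() {
  printf '0 * * * * roachprod gc --gce-project=%s\n' "$1"
}

# Print the entry for a placeholder project name.
gc_cron_line my-project
```

The printed line can be reviewed and then added with crontab -e.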

Cluster quick-start using roachprod

# Create a cluster with 4 nodes and local SSD. The last node is used as a
# load generator for some tests. Note that the cluster name must always begin
# with your username.
export CLUSTER="${USER}-test"
roachprod create ${CLUSTER} -n 4 --local-ssd

# Add gcloud SSH key. Optional for most commands, but some require it.
ssh-add ~/.ssh/google_compute_engine

# Stage binaries.
roachprod stage ${CLUSTER} workload
roachprod stage ${CLUSTER} release v2.0.5

# ...or using roachprod directly (e.g., for your locally-built binary).
build/ mkrelease
roachprod put ${CLUSTER} cockroach-linux-2.6.32-gnu-amd64 cockroach

# Start a cluster.
roachprod start ${CLUSTER}

# Check the admin UI.
roachprod admin --open ${CLUSTER}:1

# Run a workload.
roachprod run ${CLUSTER}:4 -- ./workload init kv
roachprod run ${CLUSTER}:4 -- ./workload run kv --read-percent=0 --splits=1000 --concurrency=384 --duration=5m

# Open a SQL connection to the first node.
roachprod sql ${CLUSTER}:1

# Extend lifetime by another 6 hours.
roachprod extend ${CLUSTER} --lifetime=6h

# Destroy the cluster.
roachprod destroy ${CLUSTER}

Command reference

Warning: this reference is incomplete. Be prepared to refer to the CLI help text and the source code.

Create a cluster

$ roachprod create foo
Creating cluster marc-foo with 3 nodes
marc-foo: 23h59m42s remaining
  marc-foo-0000   []
  marc-foo-0001   []
  marc-foo-0002   []

Choosing a Provider

Use the --clouds flag to set which cloud provider(s) to use. For example:

$ roachprod create foo --clouds gce,aws

Node Distribution Options

Two kinds of flags interact to place nodes either in a single zone or across geographically distributed zones:

  • --geo
  • the --[provider]-zones flags (--gce-zones, --aws-zones, --azure-locations)

Here's what to expect when the options are combined:

  • If neither is set: all nodes are placed within one of the provider's default zones
  • --geo only: nodes are spread across the provider's default zones
  • --[provider]-zones or --geo --[provider]-zones: nodes are spread across all the specified zones
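The combinations above reduce to a short decision: explicit zones win, then --geo, then the single-zone default. The sketch below only restates that rule; `placement` is a hypothetical function that prints a description and does not call roachprod.

```shell
# Hypothetical helper mirroring the rules above.
#   $1: "true" if --geo is passed, anything else otherwise
#   $2: value given to a --[provider]-zones flag, empty if unset
placement() {
  geo="$1"
  zones="$2"
  if [ -n "${zones}" ]; then
    # Explicit zones take effect with or without --geo.
    echo "spread across specified zones: ${zones}"
  elif [ "${geo}" = "true" ]; then
    echo "spread across provider default zones"
  else
    echo "all in one provider default zone"
  fi
}
```

For instance, `placement true ""` describes the --geo only case, while `placement false "zone-a,zone-b"` (placeholder zone names) describes the zones-only case.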

Interact using crl-prod tools

roachprod populates hosts files in ~/.roachprod/hosts. These are used by crl-prod tools to map clusters to node addresses.

$ crl-ssh marc-foo all df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /

Interact using roachprod directly

# Add ssh-key
$ ssh-add ~/.ssh/google_compute_engine

$ roachprod status marc-foo
marc-foo: status 3/3
   1: not running
   2: not running
   3: not running

SSH into hosts

roachprod uses gcloud to sync the list of hostnames to ~/.ssh/config and set up keys.

$ ssh

List clusters

$ roachprod list
marc-foo: 23h58m27s remaining

Destroy cluster

$ roachprod destroy marc-foo
Destroying cluster marc-foo with 3 nodes

See roachprod help <command> for further details.

Future improvements

  • Bigger loadgen VM (last instance)

  • Simplify creating test metadata and then running a series of tests using roachprod <cluster> test <dir1> <dir2> .... Perhaps something like roachprod prepare <test> <binary>.

  • Automatically detect stalled tests and restart tests upon unexpected failures. Detection of stalled tests could be done by noticing zero output for a period of time.

  • Detect crashed cockroach nodes.

