Readme updates #5511

Merged
merged 11 commits into from
Mar 24, 2016
Changes from 10 commits
256 changes: 97 additions & 159 deletions README.md
@@ -5,195 +5,137 @@

## A Scalable, Survivable, Strongly-Consistent SQL Database

**Table of Contents**

- [What is CockroachDB](#what-is-cockroachdb)
- [Status](#status)
- [Running CockroachDB Locally](#running-cockroachdb-locally)
- [Deploying CockroachDB in the cloud](#deploying-cockroachdb-in-the-cloud)
- [Running a multi-node cluster](#running-a-multi-node-cluster)
- [Getting in touch and contributing](#get-in-touch)
- [What is CockroachDB?](#what-is-cockroachdb)
- [Quickstart](#quickstart)
- [Client Drivers](#client-drivers)
- [Deployment](#deployment)
- [Get In Touch](#get-in-touch)
- [Contributing](#contributing)
- [Talks](#talks)
- [Design](#design) and [Datastore Goal Articulation](#datastore-goal-articulation)
- [Architecture](#architecture) and [Client Architecture](#client-architecture)
- [Design](#design)

## What is CockroachDB
## What is CockroachDB?

CockroachDB is a distributed SQL database built on top of a transactional and consistent key:value store. The primary design goals are support for ACID transactions, horizontal scalability, and survivability, hence the name. CockroachDB implements a Raft consensus algorithm for consistency. It aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. CockroachDB nodes (RoachNodes) are symmetric; a design goal is homogeneous deployment (one binary) with minimal configuration.
CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It **scales** horizontally; **survives** disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports **strongly-consistent** ACID transactions; and provides a familiar **SQL** API for structuring, manipulating, and querying data.

For more details, see our [FAQ](https://www.cockroachlabs.com/docs/frequently-asked-questions.html), [documentation](https://www.cockroachlabs.com/docs), and [design overview](#design).

## Status

CockroachDB is currently in alpha. See our
[Roadmap](https://github.com/cockroachdb/cockroach/issues/2132) and
[Issues](https://github.com/cockroachdb/cockroach/issues) for a list of features planned or in development.

## Running CockroachDB Locally

### Environment Setup

#### Native (read: without Docker)

* set up the dev environment (see [CONTRIBUTING.md](CONTRIBUTING.md))
* `make build`

#### Using Docker

Install Docker! On OSX ([official docs](https://docs.docker.com/engine/installation/mac/#from-your-shell)):
```bash
# install docker and docker-machine:
$ brew install docker docker-machine
# install VirtualBox:
$ brew cask install virtualbox
# create the VM (this will also start it):
$ docker-machine create --driver virtualbox default
# if the VM exists but isn't running, start it:
$ docker-machine start default
# set up the environment for the docker client:
$ eval $(docker-machine env default)
```
Other operating systems will have a similar set of commands. Please check Docker's documentation for more info.

Pull the CockroachDB Docker image and drop into a shell within it:
```bash
docker pull cockroachdb/cockroach
docker run -p 26257:26257 -p 8080:8080 -t -i cockroachdb/cockroach shell
# root@82cb657cdc42:/cockroach#
```
## Quickstart

1. [Install CockroachDB](https://www.cockroachlabs.com/docs/install-cockroachdb.html).

2. [Start a local cluster](https://www.cockroachlabs.com/docs/start-a-local-cluster.html) with three nodes running on different ports:

```shell
$ ./cockroach start --insecure &
$ ./cockroach start --insecure --store=node2-data --port=26258 --http-port=8081 --join=localhost:26257 &
Contributor

The first node is in cockroach-data. We should either pass --store=node1-data to the first node or use cockroach-data2 and cockroach-data3 for the others.

The binary should be on your path after installation, not in the current directory, so these commands should start with cockroach instead of ./cockroach.

Should the quickstart be three nodes or just one, with the multi-node setup left for the full docs? I'm not sure.

Contributor Author

Good point about the store directories. I'll fix that.

As for using cockroach instead of ./cockroach, it's added to your path only if you do go get (build from source), right? All the other more likely methods (download binary, homebrew) won't automatically add it to the path. So it seems safest to not assume anything about the path and just use ./cockroach, no?

For one vs. three nodes, it feels to me valuable to demonstrate just how easy it is to add nodes on different ports, but I can change if you and others feel strongly about this.

Contributor

brew adds it to your path; the other methods we currently document generally don't do so automatically (`make install` from source might, depending on your setup). However, I think we should encourage people doing a manual install to put the binary on their path instead of running with a data directory underneath where they untarred the binary.

In the future when we offer "real" packages for apt-get or similar, they'll put the binary on the path.

$ ./cockroach start --insecure --store=node3-data --port=26259 --http-port=8082 --join=localhost:26257 &
```

3. [Start the built-in SQL client](https://www.cockroachlabs.com/docs/use-the-built-in-sql-client.html) as an interactive shell:

```shell
$ ./cockroach sql --insecure
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
```

4. Run some [CockroachDB SQL statements](https://www.cockroachlabs.com/docs/learn-cockroachdb-sql.html):

```shell
root@:26257> CREATE DATABASE bank;
CREATE DATABASE

### Bootstrap and talk to a single node

Note: If you’re using Docker as described above, run all the commands described below in the container’s shell.
root@:26257> SET DATABASE = bank;
SET

Setting up Cockroach is easy, but starting a test node is even easier. All it takes is running:
root@:26257> CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);
CREATE TABLE

```bash
./cockroach start --insecure &
```
root@:26257> INSERT INTO accounts VALUES (1234, DECIMAL '10000.50');
INSERT 1

Verify that you're up and running by visiting the cluster UI. If you're running
without Docker (or on Linux), you'll find it at
[localhost:8080](http://localhost:8080); for OSX under Docker, things are a
little more complicated and you need to run `docker-machine ip default` to get
the correct address (but the port is the same).
root@:26257> SELECT * FROM accounts;
+------+----------+
| id | balance |
+------+----------+
| 1234 | 10000.50 |
+------+----------+
```

##### Built-in client
4. Checkout the admin UI by pointing your browser to `http://<localhost>:26257`.
Contributor

The UI is currently at :8080, not :26257.

Contributor Author

Good catch, Ben. Thanks.


Now let's talk to this node. The easiest way to do that is to use the `cockroach` binary - it comes with a built-in sql client:
5. Learn how to [secure a cluster](https://www.cockroachlabs.com/docs/secure-a-cluster.html).
Member

How about making this less daunting: "CockroachDB makes it easy to [secure a cluster]"

Contributor Author

Yep, fixing.


```bash
./cockroach sql --insecure
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
192.168.99.100:26257> show databases;
+----------+
| Database |
+----------+
| system |
+----------+
192.168.99.100:26257> SET database = system;
OK
192.168.99.100:26257> show tables;
+------------+
| Table |
+------------+
| descriptor |
| eventlog |
| lease |
| namespace |
| rangelog |
| reporting |
| users |
| zones |
+------------+
```
## Client Drivers

Check out `./cockroach help` to see all available commands.
CockroachDB supports the PostgreSQL wire protocol, so you can use any available PostgreSQL client drivers to connect from various languages. For recommended drivers that we've tested, see [Install Client Drivers](https://www.cockroachlabs.com/docs/install-client-drivers.html).
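
As a rough illustration of what speaking the PostgreSQL wire protocol means in practice, a PostgreSQL driver is typically pointed at a node with an ordinary PostgreSQL connection URL. The sketch below only builds such a URL and does not connect to anything. The helper name `insecure_url` is hypothetical, and the defaults (user `root`, port `26257`, database `bank`, `sslmode=disable`) are assumptions carried over from the insecure quickstart, not a documented recommendation.

```python
from urllib.parse import urlencode


def insecure_url(host="localhost", port=26257, user="root", database="bank"):
    """Build a PostgreSQL-style connection URL for an insecure local node.

    All defaults are assumptions based on the quickstart: the first node
    listens on port 26257 and `--insecure` means no TLS is used.
    """
    query = urlencode({"sslmode": "disable"})
    return f"postgresql://{user}@{host}:{port}/{database}?{query}"


# Print the URL for the first quickstart node.
print(insecure_url())
```

The resulting string can usually be handed to a PostgreSQL driver's connect call. Check the linked driver page for the drivers that have actually been tested.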

## Deployment

## Deploying CockroachDB in the cloud
- [Manual](https://www.cockroachlabs.com/docs/manual-deployment.html) - Steps to deploy a CockroachDB cluster manually on multiple machines.

For a sample configuration to run an insecure CockroachDB cluster on AWS using [Terraform](https://terraform.io/),
see [cloud deployment](https://github.com/cockroachdb/cockroach/tree/master/cloud/aws).
- [Cloud](https://github.com/cockroachdb/cockroach/tree/master/cloud/aws) - A sample configuration to run an insecure CockroachDB cluster on AWS using [Terraform](https://terraform.io/).

## Running a multi-node cluster
## Get In Touch

We'll set up a three-node cluster below.
When you see a bug or have improvements to suggest, please open an [issue](https://github.com/cockroachdb/cockroach/issues).

The code assumes that `$NODE{1,2,3}` are the host names of the three nodes in the cluster.
For development-related questions and anything else, there are two easy ways to get in touch:

```bash
# Create certificates
./cockroach cert create-ca
./cockroach cert create-node 127.0.0.1 ::1 localhost $NODE1 $NODE2 $NODE3
./cockroach cert create-client root
# Distribute certificates
for n in $NODE1 $NODE2 $NODE3; do
scp -r certs ${n}:certs
done
```

Now, on node 1, initialize the cluster (this example uses `/data`; yours may vary):

```bash
./cockroach start --store=/data1
```

Then, add nodes 2, 3, etc. to the cluster by specifying the `--join` flag to connect to any already-joined node.

```bash
./cockroach start --store=/data2 --join=${NODE1}:26257
```

Verify that the cluster is connected on the web UI by directing your browser at
```
https://<any_node>:8080
```
- [Join us on Gitter](https://gitter.im/cockroachdb/cockroach). This is the best, most immediate way to connect with CockroachDB engineers.
- [Post to our Developer mailing list](https://groups.google.com/forum/#!forum/cockroach-db). Please join first or your messages may be held back for moderation.

## Get in touch
## Contributing

We spend almost all of our time here on GitHub, and use the [issue
tracker](https://github.com/cockroachdb/cockroach/issues) for
bug reports.
We're an open source project and welcome contributions.

For development related questions and anything else, message our mailing list at [cockroach-db@googlegroups.com](https://groups.google.com/forum/#!forum/cockroach-db). We recommend joining before posting, or your messages may be held back for moderation.
1. See [CONTRIBUTING.md](https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md) to get your local environment set up.

### Contributing
2. Take a look at our [open issues](https://github.com/cockroachdb/cockroach/issues/), in particular those with the [helpwanted label](https://github.com/cockroachdb/cockroach/labels/helpwanted).

We're an Open Source project and welcome contributions.
See [CONTRIBUTING.md](https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md) to get your local environment set up.
Once that's done, take a look at our [open issues](https://github.com/cockroachdb/cockroach/issues/), in particular those with the [helpwanted label](https://github.com/cockroachdb/cockroach/labels/helpwanted), and follow our [code reviews](https://github.com/cockroachdb/cockroach/pulls/) to learn about our style and conventions.
3. Review our [style guide](https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md#style-guide) and follow our [code reviews](https://github.com/cockroachdb/cockroach/pulls) to learn about our style and conventions.

4. Make your changes according to our [code review workflow](https://github.com/cockroachdb/cockroach/blob/master/CONTRIBUTING.md#code-review-workflow).

## Talks

* [Venue: Annual RocksDB meetup at Facebook HQ](https://www.youtube.com/watch?v=-ij2OiDTxz0), by [Spencer Kimball](https://github.com/spencerkimball) on (12/02/2015), 21min.<br />
CockroachDB's MVCC model.
* [Venue: Code Driven NYC](https://www.youtube.com/watch?v=tV-WXM2IJ3U), by [Spencer Kimball](https://github.com/spencerkimball) on (10/28/2015), 30min.<br />
Architecture & Overview.
* [Venue: Golang UK Conference 2015](https://www.youtube.com/watch?v=33oqpLmQ3LE), by [Ben Darnell](https://github.com/bdarnell) on (08/21/2015), 52min.<br />
* [Venue: Data Driven NYC](https://youtu.be/TA-Jw78Ms_4), by [Spencer Kimball](https://github.com/spencerkimball) on (06/16/2015), 23min.<br />
A short, less technical presentation of CockroachDB.
* [Venue: NY Enterprise Technology Meetup](https://www.youtube.com/watch?v=SXAEZlpsHNE), by [Tobias Schottdorf](https://github.com/tschottdorf) on (06/10/2015), 15min.<br />
A short, non-technical talk with a small cluster survivability demo.
* [Venue: CoreOS Fest](https://www.youtube.com/watch?v=LI7uaaYeYmQ), by [Spencer Kimball](https://github.com/spencerkimball) on (05/27/2015), 25min.<br />
An introduction to the goals and design of CockroachDB. The recommended talk to watch if all you have time for is one.
* [Venue: The Go Devroom FOSDEM 2015](https://www.youtube.com/watch?v=ndKj77VW2eM&index=2&list=PLtLJO5JKE5YDK74RZm67xfwaDgeCj7oqb), by [Tobias Schottdorf](https://github.com/tschottdorf) on (03/04/2015), 45min.<br />
The most technical talk given thus far, going through the implementation of transactions in some detail.
- 12/2/2015: [Annual RocksDB meetup at Facebook HQ](https://www.youtube.com/watch?v=-ij2OiDTxz0), by [Spencer Kimball](https://github.com/spencerkimball), 21min
Member

Ugh this talk is terrible. I think the best one is probably the Data Driven one still for a good general overview.

+1 for the Data Driven talk. That's the first public presentation on Cockroach I attended.

CockroachDB's MVCC model.

### Older talks
- 10/28/2015: [Code Driven NYC](https://www.youtube.com/watch?v=tV-WXM2IJ3U), by [Spencer Kimball](https://github.com/spencerkimball), 30min
Architecture & overview.

* [Venue: The NoSQL User Group Cologne](https://www.youtube.com/watch?v=jI3LiKhqN0E), by [Tobias Schottdorf](https://github.com/tschottdorf) on (11/5/2014), 1h25min.
* [Venue: Yelp!](https://www.youtube.com/watch?feature=youtu.be&v=MEAuFgsmND0), by [Spencer Kimball](https://github.com/spencerkimball) on (9/5/2014), 1h.
- 8/21/2015: [Golang UK Conference 2015](https://www.youtube.com/watch?v=33oqpLmQ3LE), by [Ben Darnell](https://github.com/bdarnell), 52min

- 6/16/2015: [Data Driven NYC](https://youtu.be/TA-Jw78Ms_4), by [Spencer Kimball](https://github.com/spencerkimball), 23min
A short, less technical presentation of CockroachDB.

## Design
- 6/10/2015: [NY Enterprise Technology Meetup](https://www.youtube.com/watch?v=SXAEZlpsHNE), by [Tobias Schottdorf](https://github.com/tschottdorf), 15min
A short, non-technical talk with a small cluster survivability demo.

- 5/27/2015: [CoreOS Fest](https://www.youtube.com/watch?v=LI7uaaYeYmQ), by [Spencer Kimball](https://github.com/spencerkimball), 25min
An introduction to the goals and design of CockroachDB. **Recommended** if you have time for only one talk.
Contributor

Is this still the most recommended talk to start with? One of the more recent talks would probably be better - maybe Spencer at Code Driven on 10/28/15?

Contributor Author

Spencer marked the Data Driven one as the best, so I'll mark that as recommended, or maybe just create a section with both Data Driven and Code Driven as recommended starting points.


- 3/4/2015: [The Go Devroom FOSDEM 2015](https://www.youtube.com/watch?v=ndKj77VW2eM&index=2&list=PLtLJO5JKE5YDK74RZm67xfwaDgeCj7oqb), by [Tobias Schottdorf](https://github.com/tschottdorf), 45min
The most technical talk given thus far, going through the implementation of transactions in some detail.

This is an overview. For an in depth discussion of the design, see the [design doc](https://github.com/cockroachdb/cockroach/blob/master/docs/design.md).
- 11/5/2014: [The NoSQL User Group Cologne](https://www.youtube.com/watch?v=jI3LiKhqN0E), by [Tobias Schottdorf](https://github.com/tschottdorf), 1h 25min

For a quick design overview, see the [CockroachDB tech talk slides](https://docs.google.com/presentation/d/1tPPhnpJ3UwyYMe4MT8jhqCrE9ZNrUMqsvXAbd97DZ2E/edit#slide=id.p)
or watch a [presentation](#talks).
- 9/5/2014: [Yelp!](https://www.youtube.com/watch?feature=youtu.be&v=MEAuFgsmND0), by [Spencer Kimball](https://github.com/spencerkimball), 1h

## Design

This is an overview. For an in-depth discussion of the design and architecture, see the full [design doc](https://github.com/cockroachdb/cockroach/blob/master/docs/design.md). For another quick design overview, see the [CockroachDB tech talk slides](https://docs.google.com/presentation/d/1tPPhnpJ3UwyYMe4MT8jhqCrE9ZNrUMqsvXAbd97DZ2E/edit#slide=id.p).

### Overview
CockroachDB is a distributed SQL database built on top of a transactional and consistent key:value store. The primary design goals are support for ACID transactions, horizontal scalability and survivability, hence the name. CockroachDB implements a Raft consensus algorithm for consistency. It aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. CockroachDB nodes (RoachNodes) are symmetric; a design goal is homogeneous deployment (one binary) with minimal configuration.

CockroachDB implements a single, monolithic sorted map from key to value
@@ -207,9 +149,9 @@ total byte size within a globally configurable min/max size
interval. Range sizes default to target 64M in order to facilitate
quick splits and merges and to distribute load at hotspots within a
key range. Range replicas are intended to be located in disparate
datacenters for survivability (e.g. { US-East, US-West, Japan }, {
Ireland, US-East, US-West}, { Ireland, US-East, US-West, Japan,
Australia }).
datacenters for survivability (e.g. `{ US-East, US-West, Japan }`, `{
Ireland, US-East, US-West}` , `{ Ireland, US-East, US-West, Japan,
Australia }`).
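
To make the idea of a sorted key space cut into size-bounded ranges more concrete, here is a toy, single-process sketch. It is not CockroachDB's implementation: the class name `RangeMap`, the byte-size accounting, and the split-at-median policy are all assumptions made for illustration. Replication and merges are left out entirely.

```python
import bisect


class RangeMap:
    """Toy model: one sorted key space divided into contiguous ranges."""

    def __init__(self, max_bytes=64 * 1024 * 1024):
        self.max_bytes = max_bytes  # assumed split threshold (64M target)
        self.starts = [""]          # sorted start key of each range
        self.data = [{}]            # per-range key -> value storage

    def _index(self, key):
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def _size(self, i):
        # Hypothetical size accounting: key length plus value length.
        return sum(len(k) + len(v) for k, v in self.data[i].items())

    def put(self, key, value):
        i = self._index(key)
        self.data[i][key] = value
        if self._size(i) > self.max_bytes:
            self._split(i)

    def get(self, key):
        return self.data[self._index(key)].get(key)

    def _split(self, i):
        keys = sorted(self.data[i])
        if len(keys) < 2:
            return  # a single key cannot be split further
        mid = keys[len(keys) // 2]  # the median key starts the new range
        left = {k: v for k, v in self.data[i].items() if k < mid}
        right = {k: v for k, v in self.data[i].items() if k >= mid}
        self.data[i] = left
        self.starts.insert(i + 1, mid)
        self.data.insert(i + 1, right)


# Tiny threshold so that a split happens after three small writes.
m = RangeMap(max_bytes=10)
for k, v in [("a", "1111"), ("b", "2222"), ("c", "3333")]:
    m.put(k, v)
print(m.starts)
```

Lookups stay a binary search over range start keys no matter how many splits occur. That property is what lets one logical map be spread over many ranges.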

Single mutations to ranges are mediated via an instance of a
distributed consensus algorithm to ensure consistency. We’ve chosen to
@@ -241,16 +183,12 @@ performance and/or availability. Unlike Spanner, zones are monolithic
and don’t allow movement of fine grained data on the level of entity
groups.

A [Megastore][4]-like message queue mechanism is also provided to 1)
efficiently sideline updates which can tolerate asynchronous execution
and 2) provide an integrated message queuing system for asynchronous
communication between distributed system components.

#### SQL - NoSQL - NewSQL Capabilities

![SQL - NoSQL - NewSQL Capabilities](/resource/doc/sql-nosql-newsql.png?raw=true)

## Datastore Goal Articulation

### Datastore Goal Articulation

There are other important axes involved in data-stores which are less
well understood and/or explained. There is lots of cross-dependency,
@@ -317,7 +255,7 @@ write-optimized (HBase, Cassandra, SQLite3/LSM, CockroachDB).

![Read vs. Write Optimization Spectrum](/resource/doc/read-vs-write.png?raw=true)

## Architecture
### Architecture

CockroachDB implements a layered architecture, with various
subdirectories implementing layers as appropriate. The highest level of
@@ -343,7 +281,7 @@ replicas.

![Range Architecture Blowup](/resource/doc/architecture-blowup.png?raw=true)

## Client Architecture
### Client Architecture

RoachNodes serve client traffic using a fully-featured SQL API which accepts requests as either application/x-protobuf or
application/json. Client implementations consist of an HTTP sender