
OSCON 2008, Session 1: CouchDB

RDBMS

  • CouchDB is not a relational database.
  • With an RDBMS, you usually design a schema up front.

What is CouchDB

  • Stores Documents (individual data records)
  • No Schema!
  • Columns containing NULL don’t make sense
    • "My Business card doesn’t have ‘Fax Number:’ and then NULL"
  • Natural data behavior

Documents

  • Store your document data in a JSON string.
  • The speaker noted that Ruby data structures map to JSON almost directly, with no translation needed.
  • XML Sucks.

Short example:

  {
    "_id":"223BDCD",
    "_rev":"834BC",
    "age":54,
    "name":"Darth Vader",
  ...
  }
  • The revision (_rev) lets you fetch a document, write a new copy, and save it as the document's latest revision. This lets you turn back time per document (the equivalent of a database row)!
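
To make the fetch, modify, save cycle concrete, here is a minimal sketch of a revision check in Python. It is a toy in-memory model, not CouchDB's implementation: the ToyStore class, the ConflictError exception, and the random revision tokens are all assumptions made for illustration.

```python
import uuid


class ConflictError(Exception):
    """Raised when a write carries a stale or missing _rev."""


class ToyStore:
    """In-memory stand-in for a document store (hypothetical, for illustration)."""

    def __init__(self):
        self.docs = {}  # maps _id -> latest stored document

    def save(self, doc):
        doc_id = doc["_id"]
        current = self.docs.get(doc_id)
        # An update must name the revision it was based on.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc_id)
        # Store a copy under a fresh revision token (random here, by assumption).
        stored = dict(doc, _rev=uuid.uuid4().hex)
        self.docs[doc_id] = stored
        return stored["_rev"]

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored document directly.
        return dict(self.docs[doc_id])
```

A caller would get() a document, change a field, and save() it back; a second writer still holding the old _rev would get a ConflictError instead of silently overwriting the newer copy.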

How do I talk to it?

HTTP REST API

  • Create: HTTP PUT /db/docid
  • Read: HTTP GET /db/docid
  • Update: HTTP PUT /db/docid (the body must include the document's current _rev)
  • Delete: HTTP DELETE /db/docid
  • The ID does not have to be generated by the user. Just don’t provide one. If you provide one, it has to be a string.
  • JSON doesn't deal with binary data, so you have to Base64-encode it. There is apparently some other way to handle it.
  • If the JSON is not well-formed, the call results in an error. Beyond that, there is no mechanism for enforcing rules on writes.
  • Type integrity checking: They don’t care.
  • Is there a way to get a document using something other than an id? Yes.
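
The calls above can be sketched in Python using only the standard library. This is a rough illustration rather than an official client: the server address, the "people" database name used in the usage note, and the helper names build_request and encode_binary are assumptions. The requests are only built, not sent.

```python
import base64
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumed address of a local server


def build_request(method, db, doc_id=None, body=None):
    """Build (but do not send) an HTTP request for a database or document."""
    url = f"{BASE_URL}/{db}" + (f"/{doc_id}" if doc_id else "")
    # Documents travel as JSON, so the body is serialized before sending.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


def encode_binary(raw):
    """Base64-encode bytes so they can be stored inside a JSON string."""
    return base64.b64encode(raw).decode("ascii")
```

For example, build_request("PUT", "people", "223BDCD", {"name": "Darth Vader", "age": 54}) yields a request that could be passed to urllib.request.urlopen against a running server.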

There are 2 more features that make Couch really cool:

Views

  • Filter, Collate, Aggregate
  • Powered by map/reduce! (They improved it a little bit!)
  • Views are built incrementally and on demand. Reduction is optional.
  • Sends diffs around to sync db data. VERY FAST!
  • No write penalty with views.
  • The view is simply the result of a map/reduce function stored in a btree.

Example: Tag Cloud

  • We have a db full of tagged documents
  • We must know how often each tag appears
  • Use map/reduce!
  • Works well since it’s in Erlang, which can be massively parallel
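
The tag cloud idea can be sketched as a map step and a reduce step. CouchDB views are not written in Python, so treat this purely as an illustration of the technique: the "tags" field name and the function names are assumptions, and everything runs in a single process.

```python
from collections import defaultdict


def map_tags(doc):
    """Map step: emit a (tag, 1) pair for every tag on a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    """Reduce step: sum the emitted counts for one key."""
    return sum(values)


def tag_cloud(docs):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_tags(doc):
            grouped[key].append(value)
    # Sorting by key mirrors the ordered storage the notes describe for views.
    return {key: reduce_counts(values) for key, values in sorted(grouped.items())}
```

Because the map step looks at one document at a time, the work for different documents is independent, which is what makes this style of computation easy to parallelize.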

Replication

  • CouchDB was originally designed for offline replication of your database.
  • Replication works a lot like rsync
  • They don't use auto_increment IDs, which would collide between replicas.
  • Full new revisions of documents, not partial changes.
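
As a rough picture of the rsync comparison, the sketch below copies only the documents whose revision differs on the target, and copies each one whole. It is a toy model under stated assumptions: databases are plain dicts keyed by _id, the replicate function is hypothetical, and conflicts are ignored entirely.

```python
def replicate(source, target):
    """Copy every document whose revision differs from the target's copy.

    Toy model: both databases are dicts of _id -> document, and
    conflicting edits are not detected or resolved.
    """
    copied = []
    for doc_id, doc in source.items():
        existing = target.get(doc_id)
        if existing is None or existing["_rev"] != doc["_rev"]:
            # Send the whole new revision, not a field-level diff.
            target[doc_id] = dict(doc)
            copied.append(doc_id)
    return copied
```

Running it a second time with no changes on the source copies nothing, which is the behavior the rsync analogy suggests.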

Built for the future

  • Written in Erlang.
  • Non-locking MVCC and ACID compliant data store
    • No locking of the data store ever
  • Damien Katz invented it. He self-funded full-time development for 2 years.
  • Now it’s backed by IBM.