Description: Store billions of photos easily and efficiently.
Clone URL: git://github.com/kr/cubby.git
Keith Rarick (author)
Thu Aug 13 16:22:25 -0700 2009
commit  2162109cd01930c064010d9bfd018322a037a5a0
tree    bb8084a2bf1b2648b8fa798a78e388bb581f6d9f
parent  89e9b3efd621a9db188dc592ccea3e7f889ecfc1
cubby /
name age message
file .gitignore
file COPYING Wed May 27 06:00:29 -0700 2009 Initial commit. [Keith Rarick]
file CUT-LICENSE.TXT Wed May 27 06:00:29 -0700 2009 Initial commit. [Keith Rarick]
file HACKING.md Wed Jun 03 06:48:32 -0700 2009 Automatically update the version as necessary. [Keith Rarick]
file Makefile.am
file README.md Thu Jun 04 01:46:54 -0700 2009 Add a blurb about reliability. [Keith Rarick]
file arr.c Fri Jul 10 23:26:47 -0700 2009 Send LINK packets. [Keith Rarick]
file arr.h Mon Jul 27 19:29:14 -0700 2009 Make a place to store known nodes. [Keith Rarick]
file autogen Wed May 27 06:00:29 -0700 2009 Initial commit. [Keith Rarick]
file blah.sh Fri Jul 10 23:26:47 -0700 2009 Send LINK packets. [Keith Rarick]
file blob.c Mon Jun 01 07:59:16 -0700 2009 Read and initialize bundles at startup. [Keith Rarick]
file blob.h Wed Aug 05 17:39:55 -0700 2009 Allocate space for a checksum. [Keith Rarick]
file bundle.c
file bundle.h
file check-arr.c Fri Jul 10 23:26:47 -0700 2009 Send LINK packets. [Keith Rarick]
file check-bundle.c Sun Aug 09 16:05:19 -0700 2009 Only one root key per peer. [Keith Rarick]
file check-cpkt.c Sun Aug 09 16:05:19 -0700 2009 Only one root key per peer. [Keith Rarick]
file check-dirent.c Sun Aug 09 15:22:33 -0700 2009 Update tests. [Keith Rarick]
file check-heap.c Thu Jun 04 10:42:26 -0700 2009 Store the pool of available regions in a min heap. [Keith Rarick]
file check-http.py Thu Aug 13 03:51:15 -0700 2009 Tests for basic HTTP functions. [Keith Rarick]
file check-key.c Sat Jul 11 00:18:33 -0700 2009 Format keys properly. [Keith Rarick]
file check-manager.c Sun Aug 09 15:22:33 -0700 2009 Update tests. [Keith Rarick]
file check-node.c Mon Aug 10 18:53:56 -0700 2009 Don't allow nodes to have a null peer. [Keith Rarick]
file check-peer.c Wed Jul 08 18:40:27 -0700 2009 Each peer gets a key. [Keith Rarick]
file check-region.c Thu Jun 04 04:41:39 -0700 2009 Refactor and add more tests. [Keith Rarick]
file check-sanity.py
file check-sha512.c Wed Jun 03 05:52:52 -0700 2009 SHA-512. [Keith Rarick]
file check-sparr.c Mon Jul 20 15:36:03 -0700 2009 More 64-bit cleanup. [Keith Rarick]
file check-spht.c Sun Aug 09 15:22:33 -0700 2009 Update tests. [Keith Rarick]
file check-util.c Sun Aug 09 15:28:49 -0700 2009 Make a usec type. [Keith Rarick]
file check-version Wed Jun 03 06:48:32 -0700 2009 Automatically update the version as necessary. [Keith Rarick]
file check.py
file configure.ac Mon Jul 20 14:34:18 -0700 2009 Relax the autoconf requirement. [Keith Rarick]
file cpkt.c Mon Aug 10 19:50:15 -0700 2009 Remove a special case. [Keith Rarick]
file cpkt.h Mon Aug 10 19:50:15 -0700 2009 Remove a special case. [Keith Rarick]
file cubbyd.c
file cut.c Thu Jun 25 16:41:46 -0700 2009 Make ASSERT take format and varargs, like printf. [Keith Rarick]
file cut.h Wed Jul 01 13:08:28 -0700 2009 Put a LF after each assert message. [Keith Rarick]
file cutgen.c Wed Jul 08 18:36:44 -0700 2009 Don't let cutgen depend on util.[ch] [Keith Rarick]
file dirent.c Sat Aug 08 12:31:01 -0700 2009 Add a rank field to dirents. [Keith Rarick]
file dirent.h Sat Aug 08 12:31:01 -0700 2009 Add a rank field to dirents. [Keith Rarick]
directory doc/ Sat Aug 08 12:27:27 -0700 2009 Part of the new unified linking and rebalancing. [Keith Rarick]
file heap.c Thu Jun 04 10:42:26 -0700 2009 Store the pool of available regions in a min heap. [Keith Rarick]
file heap.h Thu Jun 04 10:42:26 -0700 2009 Store the pool of available regions in a min heap. [Keith Rarick]
file http.c
file http.h
file key.c Sat Jul 11 00:18:33 -0700 2009 Format keys properly. [Keith Rarick]
file key.h Sat Jul 11 19:28:45 -0700 2009 Handle LINKED packets on the receiving end. [Keith Rarick]
directory m4/ Sat Jun 13 11:38:01 -0700 2009 Use a more appropriate name. [Keith Rarick]
file manager.c
file manager.h Mon Aug 10 19:15:43 -0700 2009 Remove this other deprecated function. [Keith Rarick]
file net.c
file net.h Wed Jul 08 16:49:19 -0700 2009 CCP send and receive and most of bootstrapping. [Keith Rarick]
file node.c
file node.h Mon Aug 10 18:53:56 -0700 2009 Don't allow nodes to have a null peer. [Keith Rarick]
file peer.c Mon Aug 10 19:50:15 -0700 2009 Remove a special case. [Keith Rarick]
file peer.h Mon Aug 10 18:37:22 -0700 2009 Treat the local process as a peer. [Keith Rarick]
file prot.c
file prot.h
file region.c Sat Aug 08 12:31:01 -0700 2009 Add a rank field to dirents. [Keith Rarick]
file region.h Mon Jul 20 16:10:01 -0700 2009 Avoid bogus regions by checking a magic number. [Keith Rarick]
file reslink Tue Jun 02 08:58:01 -0700 2009 Link arbitrary files. [Keith Rarick]
file root.html Tue Jun 02 08:58:01 -0700 2009 Link arbitrary files. [Keith Rarick]
file sha512.c Fri Jun 05 07:13:05 -0700 2009 Fix compilation with no optimizations. [Keith Rarick]
file sha512.h Wed Jun 03 05:52:52 -0700 2009 SHA-512. [Keith Rarick]
file sparr.c Wed Jun 10 10:58:53 -0700 2009 Fix some memory leaks. Thanks valgrind! [Keith Rarick]
file sparr.h Wed Jun 10 10:58:53 -0700 2009 Fix some memory leaks. Thanks valgrind! [Keith Rarick]
file spht.c Mon Jun 01 02:30:44 -0700 2009 Simpler hashtable interface. [Keith Rarick]
file spht.h Wed Jun 03 11:26:59 -0700 2009 Receive and retrieve files. [Keith Rarick]
file testhelpers.py
file util.c
file util.h
README.md

Cubby Readme

Store billions of photos easily and efficiently.

Cubby is a distributed key-value store for static blobs. It is simple, fast, scalable, and tuned for the datacenter.

It is ideal for storing thumbnails, photos, videos, and other medium to large files that never change.

Easy to Use

Cubby automatically manages new nodes, failures, and routing.

To add more storage, you just turn on a new box and fire up cubbyd. No configuration is necessary. If a node fails, Cubby will re-replicate files to maintain your desired level of redundancy.

Reliable

Unlike similar systems, Cubby doesn't have a centralized "manager" or "metadata server" to fail. File data and metadata are distributed evenly and redundantly among all Cubby nodes. If one node fails, directory lookups and file retrieval continue to work with the remaining nodes.
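To make the "no central manager" idea concrete, here is a minimal sketch of one way every node could agree on where a blob's replicas live without asking a metadata server. It uses rendezvous (highest-random-weight) hashing as a stand-in; this README does not say which placement scheme Cubby actually uses, and the names `fnv1a`, `score`, and `pick_replicas` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: NOT Cubby's actual placement code. */

/* 64-bit FNV-1a over a NUL-terminated string, continuing from hash h.
   A production system would likely want a stronger hash. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;          /* FNV 64-bit prime */
    }
    return h;
}

/* Score of one (node, key) pair. Any node can compute this locally. */
static uint64_t score(const char *node, const char *key)
{
    uint64_t h = fnv1a(14695981039346656037ULL, node); /* FNV offset basis */
    h = fnv1a(h, "/");
    return fnv1a(h, key);
}

/* Choose up to `want` distinct nodes for `key`, highest score first.
   Writes node indices into `out` and returns how many were chosen. */
static size_t pick_replicas(const char *const nodes[], size_t n,
                            const char *key, size_t *out, size_t want)
{
    size_t chosen = 0;
    while (chosen < want && chosen < n) {
        size_t best = n;                /* n means "none found yet" */
        uint64_t best_score = 0;
        for (size_t i = 0; i < n; i++) {
            int taken = 0;
            for (size_t j = 0; j < chosen; j++)
                if (out[j] == i)
                    taken = 1;
            if (taken)
                continue;
            uint64_t s = score(nodes[i], key);
            if (best == n || s > best_score) {
                best = i;
                best_score = s;
            }
        }
        out[chosen++] = best;
    }
    return chosen;
}
```

Because the result depends only on the node list and the key, every node that knows the same peers computes the same answer. If one of the chosen nodes fails, the surviving choices keep their relative order, so only the lost replica needs to be re-created elsewhere.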

Scalable

Cubby is designed to scale to hundreds of nodes and petabytes of storage.

In the future, if there is demand, we may extend Cubby to use a Distributed Hash Table for routing requests. This would allow the system to scale to millions of nodes and exabytes of storage.

Fast

Cubby stores blobs in an efficient, packed format. Each file read or write costs at most one disk seek.

This is similar to the technique used in Varnish and Haystack.
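As a rough illustration of the packed-format idea, the sketch below appends blobs to one large file and keeps each blob's offset and length in memory, so a read is an in-memory lookup followed by a single positioned read. It is an assumption-laden toy, not Cubby's on-disk layout: the `pack` and `blob_loc` structures, the fixed-size linear index, and the function names are all hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for pread/pwrite */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: NOT Cubby's actual storage format. */

#define MAX_BLOBS 1024

/* Where one blob lives inside the packed file. */
struct blob_loc {
    char key[32];
    off_t off;
    size_t len;
};

struct pack {
    int fd;
    off_t end;                      /* next free offset in the file */
    size_t n;                       /* number of index entries in use */
    struct blob_loc idx[MAX_BLOBS]; /* in-memory index (linear, for brevity) */
};

/* Create an empty pack file. The index lives only in memory here,
   so the sketch truncates any existing file. Returns 0 or -1. */
static int pack_open(struct pack *p, const char *path)
{
    memset(p, 0, sizeof *p);
    p->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    return p->fd < 0 ? -1 : 0;
}

/* Append a blob at the end of the file and remember its location. */
static int pack_put(struct pack *p, const char *key,
                    const void *data, size_t len)
{
    if (p->n == MAX_BLOBS || strlen(key) >= sizeof p->idx[0].key)
        return -1;
    ssize_t w = pwrite(p->fd, data, len, p->end);
    if (w < 0 || (size_t)w != len)
        return -1;
    struct blob_loc *b = &p->idx[p->n++];
    strcpy(b->key, key);
    b->off = p->end;
    b->len = len;
    p->end += (off_t)len;
    return 0;
}

/* Find the blob in memory, then fetch it with one positioned read.
   Returns the number of bytes read, or -1 if missing or too large. */
static ssize_t pack_get(const struct pack *p, const char *key,
                        void *buf, size_t cap)
{
    for (size_t i = 0; i < p->n; i++) {
        const struct blob_loc *b = &p->idx[i];
        if (strcmp(b->key, key) == 0) {
            if (b->len > cap)
                return -1;
            return pread(p->fd, buf, b->len, b->off);
        }
    }
    return -1;
}
```

A caller would `pack_open` a file, `pack_put` each blob once, and `pack_get` it by key later. Since blobs never change, nothing is rewritten in place, and no per-file filesystem metadata has to be read before the data itself.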

Tuned for the Datacenter

We assume the local network is fast, but (of course) we do not assume that the network or hardware is reliable.

We assume that Cubby nodes are responsible citizens and that clients and peers are never malicious.

We also assume you have plenty of read caching in front of your Cubby cluster. Thus our performance priorities are, roughly in order: minimizing write latency, minimizing jitter in write latency, and maximizing throughput. Read performance usually takes a back seat.

These assumptions enable design decisions that provide better performance.

Get Involved

We're just getting started, so there's no mailing list yet. Please send any questions or comments to Keith Rarick <kr@xph.us>.