public
Description: Ruby Document Database
Homepage: http://rddb.rubyforge.org/
Clone URL: git://github.com/aeden/rddb.git
rddb /
name age message
file .ignore Fri Oct 26 06:10:27 -0700 2007 initial import [aeden]
file LICENSE Fri Oct 26 08:35:33 -0700 2007 added license [aeden]
file README Sat Dec 15 20:14:00 -0800 2007 fixed small typo [aeden]
file Rakefile Sun Nov 11 21:20:05 -0800 2007 more test cleanup [aeden]
directory bin/ Sun Oct 28 15:56:53 -0700 2007 added vendor directory [aeden]
directory example/ Sat Nov 03 10:55:42 -0700 2007 added support and tests for defining a custom m... [aeden]
directory lib/ Sat Dec 15 20:18:30 -0800 2007 added cacheable module [aeden]
directory test/ Sat Nov 17 05:26:41 -0800 2007 removed the line to read the view from a file..... [aeden]
README
RDDB is a Ruby Document-oriented Database. It is inspired by CouchDB and the notion that you insert documents into the 
database and then define views for querying. Views are defined as blocks which are used to select the documents and the 
attributes in the documents that are relevant to your needs. Views are materialized to enhance performance.

== Features

* Documents are simply collections of name/value pairs.
* Views can be defined with Ruby code, mapping from a document to any other data structure, such as a String, Array or 
Hash.
* A reduce block can be defined to reduce the initial mapped data from a view.
* Views can be materialized to improve query performance.
* Datastores are pluggable. Current implementations are RAM, partitioned files and Amazon S3.
* Viewstores are pluggable. Current implementations are RAM, file system and Amazon S3.
* Materialization stores are pluggable. Current implementations are RAM, file system and Amazon S3.
* Distributed materialization may work, but it's going to be rewritten.

== Examples

This first example creates three documents and then queries for the names in the documents.

  # First create an database object
  database = Rddb::Database.new
  
  # Put some documents into it
  database << {:name => 'John', :income => 35000}
  database << {:name => 'Bob', :income => 40000}
  database << {:name => 'Jim', :income => 37000}
  
  # Create a view that will return the names
  database.create_view('names') do |document, args|
    document.name
  end
  
  # The result of querying will return an array of names
  assert_equal ['John','Bob','Jim'], database.query('names')
  
In addition it is possible to specify a block to reduce the result set:

  database.create_view('total income') do |document, args|
    document.income
  end.reduce_with do |results|
    results.inject { |memo,value| memo + value }
  end
  assert_equal 112000, database.query('total income')
  
Here's another example, this time averaging the incomes:

  database.create_view('average income') do |document, args|
    document.income
  end.reduce_with do |results|
    results.inject { |memo,value| memo + value } / results.length
  end
  assert_equal 37333, database.query('average income')
  
You can also implement grouping:

  database.create_view('total income by profession') do |document, args|
    [document.profession, document.income]
  end.reduce_with do |results|
    returning Hash.new do |reduced|
      results.each do |result|
        reduced[result[0]] ||= 0
        reduced[result[0]] += result[1]
      end
    end
  end
  
  results = database.query('total income by profession')
  assert_equal 77000, results['Plumber']
  assert_equal 35000, results['Carpenter']
  
Here is an example of sorting:

  database.create_view('sorted by income') do |document, args|
    {
      :name => document.name,
      :income => document.income,
      :profession => document.profession
    }
  end.reduce_with do |results|
    results.sort { |a,b| a[:income] <=> b[:income] }
  end
  
  assert_equal [
    {:income=>35000, :profession=>"Carpenter", :name=>"John"},
    {:income=>37000, :profession=>"Plumber", :name=>"Jim"},
    {:income=>40000, :profession=>"Plumber", :name=>"Bob"}
  ], database.query('sorted by income')
  
Finding distinct values:

  database.create_view('distinct professions') do |document, args|
    document.profession
  end.reduce_with do |results|
    returning Array.new do |reduced|
      results.each do |result|
        reduced << result unless reduced.include?(result)
      end
    end
  end
  
  assert_equal ['Carpenter','Plumber'], database.query('distinct professions')
  
You can also indicate that a view is materialized by passing a block to +materialize_if+:

  database.create_view('names') do |document, args|
    document.name
  end.materialize_if do |document|
    document.attribute?(:name)
  end
  
And this works with reduce as well:

  database.create_view('names') do |document, args|
    document.name
  end.reduce_with do |results|
    results.sort
  end.materialize_if do |document|
    document.attribute?(:name)
  end
  
You can indicate that materialization is required with the following:

  database.create_view('names') do |document, args|
    document.name
  end.reduce_with do |results|
    results.sort
  end.materialize_if do |document|
    document.attribute?(:name)
  end.require_materialization
  
Since materialization can be a threaded or distributed process it may be a non-blocking operation. Therefore, if you 
specify that materialization is required then a Rddb::ViewNotYetMaterialized error will be raised if you attempt to 
query the view prior to completion of materialization. Normally materialization will occur when a document is added to 
the database, however you can also force materialization by calling <code>database.refresh_views</code> at any time. 
Usually you'll want to call this method after adding a new materialized view with <code>database.create_view</code>.

== Document Stores

There are three document stores provided with Rddb: RamDocumentStore, PartitionedFileDocumentStore and S3DocumentStore. 
By default the RamDocumentStore will be used, however either the partitioned file document store or the S3 document 
store should be used if you require your data to be persistent. The following example shows how to use the 
PartitionedFileDocumentStore

  document_store = Rddb::DocumentStore::PartitionedFileDocumentStore
  database = Rddb::Database.new(:document_store => document_store)
  
The partitioned file document store accepts three options:

* <tt>:basedir</tt>: The base directory for data storage
* <tt>:partition_strategy</tt>: A Proc that defines partitioning
* <tt>:cache</tt>: Provide a cache implementation (such as a Hash) to enable caching. If not specified then caching will 
not be used.

The S3 document store accepts the same arguments as above.

The :partition_strategy option can be used to provide a prefix for partitioning depending on what data is in the 
document. For example, if you have users then you might want to partition on the first three letters of their username:


  ps = Proc.new { |document| document.username[0..2] }
  document_store = Rddb::DocumentStore::PartitionedFileDatastore.new(
    :partition_strategy => ps
  )

Whatever string is returned from the :partition_strategy Proc will be used as the partition name.

The cache option can be used to implement in-memory caching of documents. You can pass any sort of Hash, including a 
basic Hash or a more complex cache implementation (for example an LRU cache) as long as it has the same interface as 
Hash.

Right now the partitioned file document store and S3 both use Marshal to dump and load data. This will most likely 
change in the future to something faster and more compact.

== Materialization Stores

There are three materialization stores currently provided with Rddb: RamMaterializationStore,  
FilesystemMaterializationStore and S3MaterializationStore.

In both the FilesystemMaterializationStore and the S3MaterializationStore the view will be materialized to the storage 
system so that queries can be answered very quickly without recalculating the view. The following example shows the file 
system version:

  m_store = Rddb::MaterializationStore::FilesystemMaterializationStore.new
  database.create_view('names sorted by last name', :materialization_store => m_store) do |document, args|
    {
      :first_name => document.first_name,
      :last_name => document.last_name
    }
  end.reduce_with do |results|
    results.sort { |a,b| a[:last_name] <=> b[:last_name] }
  end.materialize_if do |document|
    true
  end
  
== REST Interface

== Workers

== Gotchas

Types matter. You should be sure that values which will be aggregated (for example, reduced with an inject) are inserted 
into the database using a numeric type. Calling inject on an array with 200,000 Strings is not a pleasant experience. 
;-)

== RubyForge

This project is hosted on RubyForge at http://rubyforge.org/projects/rddb. Instructions for accessing the Subversion 
repository: http://rubyforge.org/scm/?group_id=4688.