This repository has been archived by the owner on May 10, 2018. It is now read-only.
Ruby Document Database
aeden/rddb
RDDB is a Ruby Document-oriented Database. It is inspired by CouchDB and the notion that you insert documents into the database and then define views for querying. Views are defined as blocks which are used to select the documents and the attributes in the documents that are relevant to your needs. Views are materialized to enhance performance.

== Features

* Documents are simply collections of name/value pairs.
* Views can be defined with Ruby code, mapping from a document to any other data structure, such as a String, Array or Hash.
* A reduce block can be defined to reduce the initial mapped data from a view.
* Views can be materialized to improve query performance.
* Datastores are pluggable. Current implementations are RAM, partitioned files and Amazon S3.
* Viewstores are pluggable. Current implementations are RAM, file system and Amazon S3.
* Materialization stores are pluggable. Current implementations are RAM, file system and Amazon S3.
* Distributed materialization may work, but it's going to be rewritten.

== Examples

This first example creates three documents and then queries for the names in the documents.

  # First create a database object
  database = Rddb::Database.new

  # Put some documents into it
  database << {:name => 'John', :income => 35000}
  database << {:name => 'Bob', :income => 40000}
  database << {:name => 'Jim', :income => 37000}

  # Create a view that will return the names
  database.create_view('names') do |document, args|
    document.name
  end

  # The result of querying will return an array of names
  assert_equal ['John','Bob','Jim'], database.query('names')

In addition, it is possible to specify a block to reduce the result set:

  database.create_view('total income') do |document, args|
    document.income
  end.reduce_with do |results|
    results.inject { |memo,value| memo + value }
  end
  assert_equal 112000, database.query('total income')

Here's another example, this time averaging the incomes:

  database.create_view('average income') do |document, args|
    document.income
  end.reduce_with do |results|
    results.inject { |memo,value| memo + value } / results.length
  end
  assert_equal 37333, database.query('average income')

You can also implement grouping:

  database.create_view('total income by profession') do |document, args|
    [document.profession, document.income]
  end.reduce_with do |results|
    returning Hash.new do |reduced|
      results.each do |result|
        reduced[result[0]] ||= 0
        reduced[result[0]] += result[1]
      end
    end
  end
  results = database.query('total income by profession')
  assert_equal 77000, results['Plumber']
  assert_equal 35000, results['Carpenter']

Here is an example of sorting:

  database.create_view('sorted by income') do |document, args|
    {
      :name => document.name,
      :income => document.income,
      :profession => document.profession
    }
  end.reduce_with do |results|
    results.sort { |a,b| a[:income] <=> b[:income] }
  end
  assert_equal [
    {:income=>35000, :profession=>"Carpenter", :name=>"John"},
    {:income=>37000, :profession=>"Plumber", :name=>"Jim"},
    {:income=>40000, :profession=>"Plumber", :name=>"Bob"}
  ], database.query('sorted by income')

Finding distinct values:

  database.create_view('distinct professions') do |document, args|
    document.profession
  end.reduce_with do |results|
    returning Array.new do |reduced|
      results.each do |result|
        reduced << result unless reduced.include?(result)
      end
    end
  end
  assert_equal ['Carpenter','Plumber'], database.query('distinct professions')

You can also indicate that a view is materialized by passing a block to +materialize_if+:

  database.create_view('names') do |document, args|
    document.name
  end.materialize_if do |document|
    document.attribute?(:name)
  end

And this works with reduce as well:

  database.create_view('names') do |document, args|
    document.name
  end.reduce_with do |results|
    results.sort
  end.materialize_if do |document|
    document.attribute?(:name)
  end

You can indicate that materialization is required with the following:

  database.create_view('names') do |document, args|
    document.name
  end.reduce_with do |results|
    results.sort
  end.materialize_if do |document|
    document.attribute?(:name)
  end.require_materialization

Since materialization can be a threaded or distributed process, it may be a non-blocking operation. Therefore, if you specify that materialization is required, a Rddb::ViewNotYetMaterialized error will be raised if you attempt to query the view before materialization has completed.

Normally materialization occurs when a document is added to the database. However, you can also force materialization by calling <code>database.refresh_views</code> at any time. Usually you'll want to call this method after adding a new materialized view with <code>database.create_view</code>.

== Document Stores

There are three document stores provided with Rddb: RamDocumentStore, PartitionedFileDocumentStore and S3DocumentStore. By default the RamDocumentStore is used. Use either the partitioned file document store or the S3 document store if you require your data to be persistent.

The following example shows how to use the PartitionedFileDocumentStore:

  document_store = Rddb::DocumentStore::PartitionedFileDocumentStore.new
  database = Rddb::Database.new(:document_store => document_store)

The partitioned file document store accepts three options:

* <tt>:basedir</tt>: The base directory for data storage
* <tt>:partition_strategy</tt>: A Proc that defines partitioning
* <tt>:cache</tt>: Provide a cache implementation (such as a Hash) to enable caching. If not specified then caching will not be used.

The S3 document store accepts the same arguments as above.

The <tt>:partition_strategy</tt> option can be used to provide a prefix for partitioning depending on what data is in the document. For example, if you have users then you might want to partition on the first three letters of their username:

  ps = Proc.new { |document| document.username[0..2] }
  document_store = Rddb::DocumentStore::PartitionedFileDocumentStore.new(
    :partition_strategy => ps
  )

Whatever string is returned from the <tt>:partition_strategy</tt> Proc will be used as the partition name.

The <tt>:cache</tt> option can be used to implement in-memory caching of documents. You can pass a basic Hash or a more complex cache implementation (for example an LRU cache), as long as it has the same interface as Hash.

Right now the partitioned file document store and the S3 document store both use Marshal to dump and load data. This will most likely change in the future to something faster and more compact.

== Materialization Stores

There are three materialization stores currently provided with Rddb: RamMaterializationStore, FilesystemMaterializationStore and S3MaterializationStore. In both the FilesystemMaterializationStore and the S3MaterializationStore the view will be materialized to the storage system so that queries can be answered very quickly without recalculating the view.
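
To make the idea of materializing a view to storage more concrete, here is a minimal plain-Ruby sketch. It does not use Rddb and is not how Rddb implements its stores: the class name +SketchFileMaterializer+, its methods and the file-naming scheme are all assumptions made for illustration. It computes a view once, writes the result to disk with Marshal, and answers later queries from the file without touching the documents again.

```ruby
require 'tmpdir'

# Hypothetical helper, NOT part of Rddb: computes a view once, stores the
# result on disk with Marshal, and reads it back on later queries.
class SketchFileMaterializer
  def initialize(basedir)
    @basedir = basedir
  end

  # Run the map block over every document and write the results to a file.
  def materialize(view_name, documents, &map)
    results = documents.map(&map)
    File.binwrite(path_for(view_name), Marshal.dump(results))
    results
  end

  # Answer a query from the stored file, without recalculating the view.
  def query(view_name)
    Marshal.load(File.binread(path_for(view_name)))
  end

  private

  # The file-name scheme is an assumption made for this sketch.
  def path_for(view_name)
    File.join(@basedir, "#{view_name.gsub(/\W+/, '_')}.view")
  end
end

documents = [
  { :name => 'John', :income => 35000 },
  { :name => 'Bob',  :income => 40000 },
  { :name => 'Jim',  :income => 37000 }
]

materializer = SketchFileMaterializer.new(Dir.mktmpdir('rddb_sketch'))
materializer.materialize('names', documents) { |document| document[:name] }

# Served from the file on disk; the documents array is not consulted.
names = materializer.query('names') # => ["John", "Bob", "Jim"]
```

The trade-off shown here is the usual one for materialization: writes do extra work so that reads become a single file load. A real store would also need to refresh the file when documents change, which this sketch leaves out.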

The following example shows the file system version:

  m_store = Rddb::MaterializationStore::FilesystemMaterializationStore.new
  database.create_view('names sorted by last name', :materialization_store => m_store) do |document, args|
    {
      :first_name => document.first_name,
      :last_name => document.last_name
    }
  end.reduce_with do |results|
    results.sort { |a,b| a[:last_name] <=> b[:last_name] }
  end.materialize_if do |document|
    true
  end

== REST Interface

== Workers

== Gotchas

Types matter. You should be sure that values which will be aggregated (for example, reduced with an inject) are inserted into the database using a numeric type. Calling inject on an array with 200,000 Strings is not a pleasant experience. ;-)

== RubyForge

This project is hosted on RubyForge at http://rubyforge.org/projects/rddb.

Instructions for accessing the Subversion repository: http://rubyforge.org/scm/?group_id=4688.