public
Description: ActiveRecord plugin that provides the underlying framework to connect to multiple databases at the instance level
Homepage:
Clone URL: git://github.com/brennandunn/sharded_database.git
name age message
file .gitignore Wed Jan 21 09:06:02 -0800 2009 Ignore autotest dir. Signed-off-by: Brennan Du... [metaskills]
file README.rdoc Thu Jan 22 06:13:38 -0800 2009 Added more usage examples. [brennandunn]
file Rakefile Wed Jan 21 09:06:03 -0800 2009 Create new DB names scoped to the project. Add ... [metaskills]
file init.rb Sun Jan 11 19:47:44 -0800 2009 Fix typo from mass replace [decisiv]
directory lib/ Thu Feb 12 12:08:06 -0800 2009 Version bump [brennandunn]
file sharded_database.gemspec Thu Feb 12 12:08:06 -0800 2009 Version bump [brennandunn]
directory test/ Thu Feb 12 17:22:37 -0800 2009 Expanded test coverage for has_many associations [brennandunn]
file uninstall.rb Tue Dec 30 14:06:29 -0800 2008 Initial import [brennandunn]
README.rdoc

sharded_database

Overview

This ActiveRecord plugin is the end result of a real world problem - First, how to aggregate table data from across multiple databases for the purpose of pagination / searching. Second, how to transparently have one production instance have read/write access to a set of aggregated records pulled from a global view.

The first step was solved by creating a SQL view to pool all production records together for a given databse. This SQL view resides on a central database accessible by all instances of our application. From there, we query against this view to return a set of Aggregate* records. The second step, which this plugin handles, is proxying to the original database/record for the ‘aggregate’ instance and transparently interacting with the returned aggregate record as if it were the original object.

Uses

Currently, sharded_database works by reflecting on a set of returned records, and taking the result of #sharded_connection_klass method to delegate the connection to - for each instance. The path we chose had us creating a view that would return aggregate records, with an added ‘source’ column specifying the originating database. We then rigged our #sharded_connection_klass to return an abstract connection model depending on the value. Other ideas for implementation include an after_create callback that writes the source database / ID to a centralized table.

Example and Usage

Setup

Setup your Aggregate model, inheriting from a class that establishes a connection to your global datastore.

  class AggregateFoo < ActiveRecord::Base
    include ShardedDatabase::Aggregate
    source_class 'Foo'

    def sharded_connection_klass
      "Connection::#{source.classify}".constantize
    end
  end

The constant returned by #sharded_connection_klass is expected to be an subclass of ActiveRecord::Base and respond to #connection. It is suggested that you use the preserve_attributes class method in the aggregate class and use that attribute to augment and/or find the constant somehow.

As of 0.2.0, the class method #source_class is required - sharded_database no longer guesses for the correct source class!

Loading Aggregate Records

Assuming that AggregateFoo is a model that is bound to your aggregate view/table, you can now #find against this model as you would any other ActiveRecord class.

  AggregateFoo.all   # => [#<Foo(Connection::One) ..>, #<Foo(Connection::Two) ..>, #<Foo(Connection::One) ..>]

Updating an attribute on the first record in the above array would update the corresponding record located in the database that Connection::One connects to.

Associations

Associations are also taken into account, and any associations that are defined on a Foo model will be sourced correctly. I have only tested this one level deep, though.

Accessing Non-Proxyable AggregateFoo Objects

Easily done. Simply add a :aggregate_proxy option set to false to your finders.

  AggreateFoo.all(:aggregate_proxy => false)

Working without an AggregateFoo table

In the event that you need to load a Foo object directly, and you may or may not know the appropriate connection at runtime, sharded_database is able to apply itself to an ActiveRecord model directly.

This is done by accepting a :connection key in the options hash for a #find method. The value of :connection can either be an abstract ActiveRecord class, a Proc, or a Symbol, which will attempt to call a method of the same name on the class doing the finding. The latter two options are also supplied the #find arguments, which is useful when wanting to dynamically determine a connection.

Setup

Any class that includes ShardedDatabase::Aggregate and defines a source_class will have that class be given this functionality. To add manually:

  class Bar < ActiveRecord::Base
    include ShardedDatabase::ModelWithConnection
  end

Usage

As mentioned above, there are three acceptable values for the :connection key. Borrowing heavily from ActionController’s callback setup, acceptable values can either be an ActiveRecord class, or a Proc or class method name (represented as a symbol) that will ultimately return an ActiveRecord class. Both the Proc and method will be passed a splat of the attributes passed to #find.

The method implementation will be demonstrated in the Example, but the other implementations are as follows:

Supplying a class

  Bar.find(:first, :order => 'created_at asc', :connection => BarConnection)

Supplying a Proc

  Bar.find(123, :connection => lambda { |*args| (args.first % 2) == 1 ? Connection::One : Connection::Two })

…Which would use Connection::One, as % 2 of 123 is 1.

Example

A real world situation would be something akin to Flickr. Data is sharded over N number of databases, and the criteria sent to your finder method would be used to connect to the right datastore.

Imagine you have a photo sharing application, and due to the amount of photos managed, you’re forced to shard the ‘photos’ database across 3 database servers. To do this with sharded_database, you’d do something like:

  class Photo < ActiveRecord::Base

    def self.determine_database(*attrs)   # [:all, { :conditions => { :name => 'mountain' } }]
      case first_letter = attrs.last[:conditions][:name].first[0,1]  # get the first letter
      when a..f   : Connection::AToF
      when g..p   : Connection::GToP
      when q..z   : Connection::QtoZ
    end

  end

  Photo.all(:conditions => { :name => 'mountain' }, :connection => :determine_database)

Which would return any photo named ‘mountain’ against the database Connection::GToP is connected to.

TODO

  • Apply associations to objects found using the :connection key
  • Fix association application to regenerate a proper reflection/association

Copyright &copy; 2008, Brennan Dunn, Decisiv Inc. Released under the MIT license.