Skip to content

brennandunn/sharded_database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sharded_database

Overview

This ActiveRecord plugin is the end result of a real world problem - First, how to aggregate table data from across multiple databases for the purpose of pagination / searching. Second, how to transparently have one production instance have read/write access to a set of aggregated records pulled from a global view.

The first step was solved by creating a SQL view to pool all production records together for a given databse. This SQL view resides on a central database accessible by all instances of our application. From there, we query against this view to return a set of Aggregate* records. The second step, which this plugin handles, is proxying to the original database/record for the ‘aggregate’ instance and transparently interacting with the returned aggregate record as if it were the original object.

Uses

Currently, sharded_database works by reflecting on a set of returned records, and taking the result of #sharded_connection_klass method to delegate the connection to - for each instance. The path we chose had us creating a view that would return aggregate records, with an added ‘source’ column specifying the originating database. We then rigged our #sharded_connection_klass to return an abstract connection model depending on the value. Other ideas for implementation include an after_create callback that writes the source database / ID to a centralized table.

Example and Usage

Setup

Setup your Aggregate model, inheriting from a class that establishes a connection to your global datastore.

class AggregateFoo < ActiveRecord::Base
  include ShardedDatabase::Aggregate
  source_class 'Foo'

  def sharded_connection_klass
    "Connection::#{source.classify}".constantize
  end
end

The constant returned by #sharded_connection_klass is expected to be an subclass of ActiveRecord::Base and respond to #connection. It is suggested that you use the preserve_attributes class method in the aggregate class and use that attribute to augment and/or find the constant somehow.

As of 0.2.0, the class method #source_class is required - sharded_database no longer guesses for the correct source class!

Loading Aggregate Records

Assuming that AggregateFoo is a model that is bound to your aggregate view/table, you can now #find against this model as you would any other ActiveRecord class.

AggregateFoo.all   # => [#<Foo(Connection::One) ..>, #<Foo(Connection::Two) ..>, #<Foo(Connection::One) ..>]

Updating an attribute on the first record in the above array would update the corresponding record located in the database that Connection::One connects to.

Associations

Associations are also taken into account, and any associations that are defined on a Foo model will be sourced correctly. I have only tested this one level deep, though.

Accessing Non-Proxyable AggregateFoo Objects

Easily done. Simply add a :aggregate_proxy option set to false to your finders.

AggreateFoo.all(:aggregate_proxy => false)

Working without an AggregateFoo table

In the event that you need to load a Foo object directly, and you may or may not know the appropriate connection at runtime, sharded_database is able to apply itself to an ActiveRecord model directly.

This is done by accepting a :connection key in the options hash for a #find method. The value of :connection can either be an abstract ActiveRecord class, a Proc, or a Symbol, which will attempt to call a method of the same name on the class doing the finding. The latter two options are also supplied the #find arguments, which is useful when wanting to dynamically determine a connection.

Setup

Any class that includes ShardedDatabase::Aggregate and defines a source_class will have that class be given this functionality. To add manually:

class Bar < ActiveRecord::Base
  include ShardedDatabase::ModelWithConnection
end

Usage

As mentioned above, there are three acceptable values for the :connection key. Borrowing heavily from ActionController’s callback setup, acceptable values can either be an ActiveRecord class, or a Proc or class method name (represented as a symbol) that will ultimately return an ActiveRecord class. Both the Proc and method will be passed a splat of the attributes passed to #find.

The method implementation will be demonstrated in the Example, but the other implementations are as follows:

Supplying a class

Bar.find(:first, :order => 'created_at asc', :connection => BarConnection)

Supplying a Proc

Bar.find(123, :connection => lambda { |*args| (args.first % 2) == 1 ? Connection::One : Connection::Two })

…Which would use Connection::One, as % 2 of 123 is 1.

Example

A real world situation would be something akin to Flickr. Data is sharded over N number of databases, and the criteria sent to your finder method would be used to connect to the right datastore.

Imagine you have a photo sharing application, and due to the amount of photos managed, you’re forced to shard the ‘photos’ database across 3 database servers. To do this with sharded_database, you’d do something like:

class Photo < ActiveRecord::Base

  def self.determine_database(*attrs)   # [:all, { :conditions => { :name => 'mountain' } }]
    case first_letter = attrs.last[:conditions][:name].first[0,1]  # get the first letter
    when a..f   : Connection::AToF
    when g..p   : Connection::GToP
    when q..z   : Connection::QtoZ
  end

end

Photo.all(:conditions => { :name => 'mountain' }, :connection => :determine_database)

Which would return any photo named ‘mountain’ against the database Connection::GToP is connected to.

TODO

  • Apply associations to objects found using the :connection key

  • Fix association application to regenerate a proper reflection/association

Copyright &copy; 2008, Brennan Dunn, Decisiv Inc. Released under the MIT license.

About

ActiveRecord plugin that provides the underlying framework to connect to multiple databases at the instance level

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages