Description: A full featured solution for database replication under a Rails application layer
Homepage: http://www.akitaonrails.com
Clone URL: git://github.com/akitaonrails/acts_as_replica.git
Fabio Akita (author)
Thu Apr 03 09:49:51 -0700 2008
README
== ActsAsReplica
 
This plugin is meant to be used in offline-client scenarios where the same Rails
app is deployed on both the clients and the main server, for instance together
with a 3rd party solution such as Joyent's Slingshot, Rails2Ext and so forth.

Bear in mind that this is not a one-package-solves-all kind of solution. It
assumes the scenario of multiple offline clients and one master server. It doesn't
replace heavy industrial level message queues or database level merge replication.
It also doesn't support master-less distributed peer-to-peer replication. Only
N-clients-1-master is supported for now.

Clients can input data offline. This data is recorded in a local sqlite3 file.
A client can then connect to the server to pull more recent data from it and push
its new data back to it.

== Background Job

This solution relies on a background job and batch control. The Rails app can
trigger the execution of a background job that actually performs the replication
procedure. The plugin generator creates a sample SyncsController and views
that you can tailor to your needs. In the background (a 'system' call on *nix and
Process.create on Win32) it starts a script/runner process that calls
lib/daemons/replicator.rb. The sample controller reads the log file written by
that process to give the user on-screen feedback via Ajax calls.
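
The launch-and-poll idea above can be sketched with the Ruby standard library
alone. This is only an illustration, not the plugin's code: the helper names
start_background_job and read_progress are hypothetical, Process.spawn stands in
for the 'system' / Process.create calls, and a tiny inline script stands in for
lib/daemons/replicator.rb.

```ruby
require "rbconfig"
require "tmpdir"

# Hypothetical helper: run a script in a separate Ruby process and send its
# output to log_path. The plugin itself launches script/runner instead.
def start_background_job(log_path, script)
  Process.spawn(RbConfig.ruby, "-e", script, out: log_path, err: [:child, :out])
end

# Hypothetical helper: the lines a controller action could hand back to an
# Ajax poller. Returns an empty list while no log file exists yet.
def read_progress(log_path)
  File.exist?(log_path) ? File.readlines(log_path, chomp: true) : []
end

log_path = File.join(Dir.mktmpdir, "replicator.log")
pid = start_background_job(log_path, 'puts "pulling"; puts "pushing"; puts "done"')
Process.wait(pid) # a real controller would return at once and poll the log
puts read_progress(log_path)
```

Writing progress to a plain log file keeps the web request and the replication
process decoupled: the request returns immediately and later requests only read
the file.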

== Dependencies

- gem install uuidtools
- gem install fastercsv

Win32Utils is also required on Windows.

== Installation

./script/generate replicator

== Project Assumptions

This plugin follows several assumptions:

- Every replicable table has to have a surrogate UUID-based primary key.
  This is done to avoid any possible primary key conflict between
  the clients and the server. Yes, I could use integer ranges for each client but
  this would add unnecessary overhead to the process. I could also have made some
  man-in-the-middle controller that would transact ids back and forth, but this
  would be even more unnecessary. UUIDs are fast, simple and reliable.

- The app has to have a User class with a singleton 'current_user' method.
  The app has to make sure User.current_user always contains something (usually
  with the before_filter method in the controller to get the currently logged
  in user). Just define 'acts_as_auditor' in the User model for this.

- The primary key of the User model also has to be a UUID, and it also has to
  have a secondary UUID (column named GUID) that has to be available at the 
  RemoteClient model in the server. It means that the server doesn't need to
  have a full User table with all the offline clients if it doesn't want to
  (this may make the deployment process easier). And finally, this User model
  also has to have a last_synced integer column to record the latest replicated
  transaction log entry.

- Every replicable table has to have UserStamps (created_by, created_at, updated_by,
  updated_at) because this plugin uses this data to know how to track them. So,
  it's not optional. The detail is that the created_by and updated_by columns
  will hold the UUID primary key of the User.

- The client can be behind an HTTP proxy, use an SSL connection, and the web server
  can request basic authentication credentials. Configuration can be kept in the
  config/syncable.yml file. Be careful though: it supports the same
  infrastructure as Net::HTTP, so Windows-based servers probably need more testing
  as they are usually not standards compliant. Refer to the SyncSetting model for
  details. This table will contain only ONE SINGLE ROW for each client machine.
  Be careful not to duplicate settings because each setting has
  a specific UUID bound to the machine. This ID is important because it is used to
  uniquely identify each client app that replicates back to the server.

- It doesn't use XML for the payload packages for 2 reasons: first of all, I don't
  personally like XML for data transfer. Second of all, YAML is lighter weight,
  supported natively by all Ruby and Rails objects, and easily human readable.
  One can make an adapter later, as this is only a matter of marshalling. So it
  may not be very easy to place message brokers in between the client and server.
  But as I said, this is a very opinionated piece of software made for my own use.
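
The first assumptions (UUID primary keys, a class-level current_user and user
stamps) can be illustrated in plain Ruby. This is a minimal sketch, not the
plugin's implementation: ReplicableRecord is a hypothetical stand-in for an
ActiveRecord model, and SecureRandom.uuid from the standard library is used
here in place of the uuidtools gem.

```ruby
require "securerandom"

# Stand-in for the app's User model with a class-level current_user.
class User
  class << self
    attr_accessor :current_user
  end

  attr_reader :id

  def initialize
    @id = SecureRandom.uuid # UUID primary key, as the plugin assumes
  end
end

# Hypothetical record showing a UUID key plus the four user stamp columns.
class ReplicableRecord
  attr_reader :id, :created_by, :created_at, :updated_by, :updated_at

  def initialize
    @id = SecureRandom.uuid
    @created_by = @updated_by = User.current_user.id
    @created_at = @updated_at = Time.now
  end

  # Record who changed the row last and when.
  def touch
    @updated_by = User.current_user.id
    @updated_at = Time.now
  end
end

User.current_user = User.new
record = ReplicableRecord.new
puts record.id
puts record.created_by == User.current_user.id
```

Because every key is generated locally, two clients can create rows while
offline without ever coordinating id ranges with the server.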

== Basic Workflow (started through /syncs/perform_sync in the client)

(1) The client initiates a handshake process:

GET /syncs/handshake.yaml

(2) The server creates an internal session and sends back a cookie ID
    (session ID), a hashed challenge key and its own machine ID (UUID).

(3) The client has to look up its internal user's GUID and create a
    response to the challenge:

POST /syncs/handshake.yaml?client_id=&challenge_response=

(4) The server has the user's GUID mapped in the RemoteClient table so it
    can compare the received response with its own. When the server receives
    new data from the client, it looks for a corresponding entry in the
    RemoteMachine table. Each user can be bound to many machines, each having
    its own machine UUID. That way the user can choose to work in any client
    app installed on any machine and still be able to replicate data reliably.
    Each RemoteMachine records the latest executed transaction log entry, so
    it knows where to restart the next time.

(5) Now, the client requests the most recent data from the server. It has to
    look for the last_synced column in its own User table.

POST /syncs/down.yaml?for_when=9999

(6) Server calls Replica.down internally and looks for all new data since the
    'for_when' integer received that was not created by the logged in user. It
    sends back an ActsAsReplica::Structs::SyncPayload package encoded as YAML.

(7) Client calls Replica.up internally to record the new data. If everything goes
    fine, it records the latest last_synced transaction entry ID in the User table.

(8) Client calls Replica.down internally, using the latest recorded transaction entry
    and the machine ID obtained from the server during the handshake described above.
    It retrieves the newest data it has created offline and also creates an
    ActsAsReplica::Structs::SyncPayload package that it posts to the server in
    YAML format:

POST /syncs/up.yaml?syncs=<YAML::Object>

(9) Server calls Replica.up internally and processes the received package. If
    everything goes fine, it updates the last_synced column in the
    RemoteMachine table for this particular logged in user/machine.

(10) Client compiles the results page with everything that happened in this transaction.
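
The pull half of the workflow (the server selecting data newer than 'for_when'
and the client applying it) can be sketched as below. This is a rough analogue
under stated assumptions, not the plugin's Replica.down / Replica.up: LogEntry,
down and up are hypothetical names, the transaction log is a plain array, and
the payload is a YAML list of hashes rather than an
ActsAsReplica::Structs::SyncPayload.

```ruby
require "yaml"

# Hypothetical transaction log entry: an increasing integer id, the UUID of
# the user who made the change, and the changed row's attributes.
LogEntry = Struct.new(:id, :user_id, :attributes)

# Select entries newer than for_when that the requesting user did not create,
# and serialize them as YAML.
def down(log, for_when, requesting_user_id)
  entries = log.select { |e| e.id > for_when && e.user_id != requesting_user_id }
  YAML.dump(entries.map { |e| { "id" => e.id, "user_id" => e.user_id, "attributes" => e.attributes } })
end

# Apply a payload to a local store (a hash keyed by row id) and return the
# new last_synced value.
def up(store, payload_yaml, last_synced)
  entries = YAML.safe_load(payload_yaml)
  entries.each { |e| store[e["attributes"]["id"]] = e["attributes"] }
  entries.map { |e| e["id"] }.push(last_synced).max
end

server_log = [
  LogEntry.new(1, "user-a", { "id" => "r1", "status" => "open" }),
  LogEntry.new(2, "user-b", { "id" => "r2", "status" => "open" }),
  LogEntry.new(3, "user-a", { "id" => "r3", "status" => "closed" })
]
client_store = {}
payload = down(server_log, 1, "user-b")
last_synced = up(client_store, payload, 1)
puts last_synced
puts client_store.keys.inspect
```

The push half works the same way in the other direction, with the client
calling the selection step and the server applying the payload.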

== FIRST LOGIN

When a brand new desktop stand-alone installation is done, the database is probably
empty. But the user has to log into the server. So we have a bootstrap problem:
how to log in if the local database is void of any user to do so?

You have to integrate a "first login" procedure into your authentication system. The
user is prompted for a username/password. The authentication proceeds with a local
verification. If it fails then it checks connectivity and then queries the server:

(1) POST /syncs/handshake.yaml?username=XXX&password=YYY

Ideally this is done through an SSL connection so the password is never disclosed
over a plain text only protocol (further cryptography could help).

(2) The server queries its own local database. If it confirms the credentials, it
sends back a YAML serialized array containing [@user, @revision]. This revision is
for SVN upgrading integration (see lib/daemons/upgrade.rb).

(3) The local call will automatically receive the server's serialized User object
and properly persist it locally. Now you can authenticate the user and
automatically start a replication/upgrade procedure as described in the previous section.
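
The local-then-remote fallback above could look roughly like this. It is a
minimal sketch with assumed names: first_login is hypothetical, local_users is
a plain hash standing in for the local User table, and remote_login is any
callable that performs the handshake request and returns the server's YAML
reply, or nil when the server rejects the credentials or cannot be reached.
Passwords are compared in plain text only to keep the example short.

```ruby
require "yaml"

# Hypothetical first-login flow: try the local store first, then the server.
def first_login(local_users, login, password, remote_login)
  user = local_users[login]
  return user if user && user["password"] == password # local verification

  reply = remote_login.call(login, password)
  return nil unless reply

  # The server reply is a serialized [user, revision] array.
  user, _revision = YAML.safe_load(reply)
  local_users[login] = user # persist the server's user locally
  user
end

# Stand-in for the server side of the handshake (assumed credentials).
server = lambda do |login, password|
  ok = login == "admin" && password == "admin"
  ok ? YAML.dump([{ "login" => "admin", "password" => "admin" }, 42]) : nil
end

local_users = {}
puts first_login(local_users, "admin", "admin", server).inspect
puts local_users.key?("admin")
```

After the first successful call the user exists locally, so later logins no
longer need the server to be reachable.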

== INITIAL TESTS

As this involves at least two peers, we have to load up at least two mongrel 
processes. In this particular test, we'll use the development and production
environments at once as a testbed for a simple scenario.

(1) First, every time we want to test the whole scenario, we have to clean the
databases. Migrations are already set to correctly populate both
environments. So, from the shell:

rm db/*.sql*; rake db:migrate RAILS_ENV=development; rake db:migrate RAILS_ENV=production

(2) Now, we start 2 mongrel processes in 2 different shells:

./script/server -p 3000 -e development
  or ./script/runner '@logged_user=User.find_by_login("admin").id; load "lib/daemons/replicator.rb"'

./script/server -p 3001 -e production

(3) Now, login with username 'admin', password 'admin' at:

http://localhost:3000/users/login

(4) Then manually type this URL:

http://localhost:3000/syncs/perform_sync

(5) The call above simulates a client starting synchronization with a server. If
everything went fine, we can open the ./script/console [environment] of each
and check that the totals for ReturnOrder.count and Batch.count are the same in both
environments. The browser should display something similar to this:

Perform Syncing Results:

./script/runner 'puts ReturnOrder.count; puts Batch.count' -e development
./script/runner 'puts ReturnOrder.count; puts Batch.count' -e production

The results should be exactly the same.