Add rails to transitioning section (#332)

citusdata · Apr 19, 2017 · 212b8a3
1 parent 81c9dfb
commit 212b8a3

Showing 3 changed files with 222 additions and 0 deletions.
diff --git a/images/rails-ref-app.png b/images/rails-ref-app.png
diff --git a/index.rst b/index.rst
@@ -47,6 +47,7 @@ topics.
    :caption: Transitioning to Citus
 
    migration/transitioning.rst
+   migration/rails.rst
 
 .. toctree::
    :caption: Performance

diff --git a/migration/transitioning.rst b/migration/transitioning.rst
@@ -136,6 +136,227 @@ When joining tables make sure to filter by tenant id. For instance here is how t
      AND l.store_id='8c69aa0d-3f13-4440-86ca-443566c1fc75'
      AND p.store_id='8c69aa0d-3f13-4440-86ca-443566c1fc75'
 
+App Migration (Ruby on Rails)
+-----------------------------
+
+Above, we discussed the framework-agnostic database changes required
+for using Citus in the multi-tenant use case. This section investigates
+specifically how to migrate multi-tenant Rails applications to a
+Citus storage backend. We'll use the `activerecord-multi-tenant
+<https://github.com/citusdata/activerecord-multi-tenant>`__ Ruby gem for
+easier scale-out.
+
+This Ruby gem has evolved from our experience working with customers
+scaling out their multi-tenant apps. It patches some restrictions
+that ActiveRecord and Rails currently have when it comes to automatic
+query building. It is based on the excellent `acts\_as\_tenant
+<https://github.com/ErwinM/acts_as_tenant>`__ library, and extends it
+for the particular use-case of a distributed multi-tenant database like
+Citus.
+
+Preparing to scale out a multi-tenant application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Initially you’ll often start out with all tenants placed on a single
+database node, using a framework like Ruby on Rails and ActiveRecord
+to load the data for a given tenant whenever you serve a web request
+that returns that tenant’s data.
+
+ActiveRecord makes a few assumptions about the data storage that limit
+your scale-out options. In particular, ActiveRecord introduces a pattern
+where you normalize data and split it into many distinct models each
+identified by a single ``id`` column, with multiple ``belongs_to``
+relationships that tie objects back to a tenant or customer:
+
+.. code-block:: ruby
+
+  # typical pattern with multiple belongs_to relationships
+
+  class Customer < ActiveRecord::Base
+    has_many :sites
+  end
+  class Site < ActiveRecord::Base
+    belongs_to :customer
+    has_many :page_views
+  end
+  class PageView < ActiveRecord::Base
+    belongs_to :site
+  end
+
+The tricky thing with this pattern is that in order to find all page
+views for a customer, you'll have to query for all of a customer's sites
+first. This becomes a problem once you start sharding data, and in
+particular when you run UPDATE or DELETE queries on nested models like
+page views in this example.
+
+There are a few steps you can take today to make scaling out easier in
+the future:
+
+**1. Introduce a column for the tenant\_id on every record that belongs
+to a tenant**
+
+In order to scale out a multi-tenant model, it’s essential that you can
+locate all records that belong to a tenant quickly. The easiest way to
+achieve this is to add a ``tenant_id`` column (or ``customer_id``
+column, etc.) to every object that belongs to a tenant, and to backfill
+your existing data so that this column is set correctly.
+
+When you move to a distributed multi-tenant database like Citus in the
+future, adding this column will be a required step. If you have already
+done it, you can simply COPY over your data without any additional
+data modification.
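+
+As a rough illustration of this step, the sketch below backfills a
+``customer_id`` onto page views by looking it up through each page
+view's site. Plain Ruby hashes stand in for table rows, and the
+``backfill_customer_id`` helper and the sample data are hypothetical.
+In a real application you would do this with a migration and an
+``UPDATE`` statement instead:
+
+.. code-block:: ruby
+
+  # Toy rows standing in for the sites and page_views tables
+  # (hypothetical sample data).
+  sites = [
+    { id: 1, customer_id: 10 },
+    { id: 2, customer_id: 20 }
+  ]
+  page_views = [
+    { id: 100, site_id: 1 },
+    { id: 101, site_id: 2 },
+    { id: 102, site_id: 1 }
+  ]
+
+  # Hypothetical helper: copy the owning customer's id onto each page view.
+  def backfill_customer_id(page_views, sites)
+    customer_by_site = sites.map { |s| [s[:id], s[:customer_id]] }.to_h
+    page_views.map do |pv|
+      pv.merge(customer_id: customer_by_site.fetch(pv[:site_id]))
+    end
+  end
+
+  backfilled = backfill_customer_id(page_views, sites)
+
+  # After the backfill, a customer's page views can be selected directly,
+  # without consulting the sites table first.
+  views_for_customer_10 = backfilled.select { |pv| pv[:customer_id] == 10 }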
+
+**2. Use UNIQUE constraints which include the tenant\_id**
+
+Unique constraints on values will present a problem in any distributed
+system, since it’s difficult to make sure that no two nodes accept the
+same unique value.
+
+In many cases, you can work around this problem by adding the tenant\_id
+to the constraint, effectively making objects unique inside a given
+tenant, but not guaranteeing this beyond that tenant.
+
+For example, Rails by default creates a primary key that only includes
+the ``id`` of the record:
+
+::
+
+  Indexes:
+      "page_views_pkey" PRIMARY KEY, btree (id)
+
+You should modify that primary key to also include the tenant\_id:
+
+.. code-block:: sql
+
+  ALTER TABLE page_views DROP CONSTRAINT page_views_pkey;
+  ALTER TABLE page_views ADD PRIMARY KEY(id, customer_id);
+
+An exception to this rule might be an email or username column on a
+users table (unless you give each tenant their own login page). Such
+values have to be unique across all tenants, which is why, once you
+scale out, we typically recommend splitting these tables out from your
+distributed tables and placing them as local tables on the Citus
+coordinator node.
+
+**3. Include the tenant\_id in all queries, even when you can locate an
+object using its own object\_id**
+
+The easiest way to run a typical SQL query in a distributed system
+without restrictions is to always access data that lives on a single
+node, determined by the tenant you are accessing.
+
+For this reason, once you use a distributed system like Citus, we
+recommend you always specify both the tenant\_id and an object’s own ID
+for queries, so the coordinator can locate your data quickly, and can
+route the query to a single shard - instead of going to each shard in
+the system individually and asking the shard whether it knows the given
+object\_id.
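+
+The difference can be pictured with a small toy model in plain Ruby.
+This is only a sketch and not how Citus places or routes data
+internally: the CRC32-modulo placement rule, the shard count, and the
+``shard_for``, ``find_with_tenant`` and ``find_without_tenant`` helpers
+are all assumptions made for illustration:
+
+.. code-block:: ruby
+
+  require "zlib"
+
+  SHARD_COUNT = 4 # hypothetical number of shards
+
+  # Hypothetical placement rule: hash the tenant id to pick a shard.
+  def shard_for(tenant_id)
+    Zlib.crc32(tenant_id.to_s) % SHARD_COUNT
+  end
+
+  # Build toy shards; every row is stored on its tenant's shard.
+  shards = Array.new(SHARD_COUNT) { [] }
+  rows = [
+    { id: 1, customer_id: 10 },
+    { id: 2, customer_id: 20 },
+    { id: 3, customer_id: 10 }
+  ]
+  rows.each { |row| shards[shard_for(row[:customer_id])] << row }
+
+  # With the tenant id, exactly one shard has to be consulted.
+  # Returns the matching row and the number of shards asked.
+  def find_with_tenant(shards, customer_id, id)
+    shard = shards[shard_for(customer_id)]
+    row = shard.find { |r| r[:id] == id && r[:customer_id] == customer_id }
+    [row, 1]
+  end
+
+  # Without the tenant id, every shard has to be asked for the object.
+  def find_without_tenant(shards, id)
+    matches = shards.flat_map { |shard| shard.select { |r| r[:id] == id } }
+    [matches.first, shards.size]
+  end
+
+Both lookups return the same row, but the tenant-scoped one touches a
+single shard while the other has to ask all of them.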
+
+Updating the Rails Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can get started by adding ``gem 'activerecord-multi-tenant'``
+to your Gemfile, running ``bundle install``, and then annotating your
+ActiveRecord models like this:
+
+.. code-block:: ruby
+
+  class PageView < ActiveRecord::Base
+    multi_tenant :customer
+    # ...
+  end
+
+In this case ``customer`` is the tenant model, and your ``page_views``
+table needs to have a ``customer_id`` column that references the
+customer the page view belongs to.
+
+The `activerecord-multi-tenant
+<https://github.com/citusdata/activerecord-multi-tenant>`__ gem aims to
+make it easier to implement the above data changes in a typical Rails
+application.
+
+As mentioned in the beginning, by adding ``multi_tenant :customer``
+annotations to your models, the library automatically takes care of
+including the tenant\_id with all queries.
+
+In order for that to work, you’ll always need to specify which tenant
+you are accessing, either by specifying it on a per-request basis:
+
+.. code-block:: ruby
+
+  class ApplicationController < ActionController::Base
+    # Opt-into the "set_current_tenant" controller helpers by specifying this:
+    set_current_tenant_through_filter
+
+    before_action :set_customer_as_tenant
+
+    def set_customer_as_tenant
+      customer = Customer.find(session[:current_customer_id])
+      set_current_tenant(customer) # Set the tenant
+    end
+  end
+
+Or by wrapping your code in a block, e.g. for background and maintenance
+tasks:
+
+.. code-block:: ruby
+
+  customer = Customer.find(session[:current_customer_id])
+  # ...
+  MultiTenant.with(customer) do
+    site = Site.find(params[:site_id])
+
+    # Modifications automatically include tenant_id
+    site.update! last_accessed_at: Time.now
+
+    # Queries also include tenant_id automatically
+    site.page_views.count
+  end
+
+Once you are ready to use a distributed multi-tenant database like
+Citus, all you need is a few adjustments to your migrations, and you're
+good to go:
+
+.. code-block:: ruby
+
+  class InitialTables < ActiveRecord::Migration
+    def up
+      create_table :page_views, partition_key: :customer_id do |t|
+        t.references :customer, null: false
+        t.references :site, null: false
+
+        t.text :url, null: false
+        # ...
+        t.timestamps null: false
+      end
+      create_distributed_table :page_views, :customer_id
+    end
+
+    def down
+      drop_table :page_views
+    end
+  end
+
+Note the ``partition_key: :customer_id``, something that's
+added to Rails' ``create_table`` by our library, which ensures
+that the primary key includes the tenant\_id column, as well as
+``create_distributed_table`` which enables Citus to scale out the data
+to multiple nodes.
+
+Example Application
+~~~~~~~~~~~~~~~~~~~
+
+If you are interested in a more complete
+example, check out our `reference app
+<https://github.com/citusdata/citus-example-ad-analytics>`__ that
+showcases a simplified sample SaaS application for ad analytics.
+
+.. image:: ../images/rails-ref-app.png
+
+As you can see in the screenshot, most data is associated with the
+currently logged-in customer. Even though this is complex analytical
+data, all of it is accessed in the context of a single customer or
+tenant.
+
 Real-Time Analytics Data Model
 ==============================