Highlight create_distributed_function (#892)

citusdata · Jan 10, 2020 · cdb1272 · cdb1272
1 parent 2bef61b
commit cdb1272
Show file tree

Hide file tree

Showing 2 changed files with 48 additions and 5 deletions.
diff --git a/get_started/concepts.rst b/get_started/concepts.rst
@@ -69,6 +69,8 @@ When you use Citus, the coordinator node you connect to and interact with is a r
 
 Creating standard PostgreSQL tables is easy because it's the default. It’s what you get when you run CREATE TABLE. In almost every Citus deployment we see standard PostgreSQL tables co-existing with distributed and reference tables. Indeed, Citus itself uses local tables to hold cluster metadata, as mentioned earlier.
 
+.. _shards:
+
 Shards
 ------
 

diff --git a/get_started/tutorial_multi_tenant.rst b/get_started/tutorial_multi_tenant.rst
@@ -107,6 +107,9 @@ In this case, we will shard all the tables on the :code:`company_id`.
 
 .. code-block:: sql
 
+    -- before distributing tables, enable some extra features
+    SET citus.replication_model = 'streaming';
+
     SELECT create_distributed_table('companies', 'id');
     SELECT create_distributed_table('campaigns', 'company_id');
     SELECT create_distributed_table('ads', 'company_id');
@@ -147,17 +150,55 @@ If you want to double the budget for all the campaigns of a company, you can run
     WHERE company_id = 5;
 
 Another example of such an operation would be to run transactions which span multiple tables. Let's
-say you want to delete a campaign and all its associated ads, you could do it atomically by running.
+say you want to delete a campaign and all its associated ads, you could do it atomically by running:
 
 .. code-block:: sql
 
     BEGIN;
-    DELETE from campaigns where id = 46 AND company_id = 5;
-    DELETE from ads where campaign_id = 46 AND company_id = 5;
+    DELETE FROM campaigns WHERE id = 46 AND company_id = 5;
+    DELETE FROM ads WHERE campaign_id = 46 AND company_id = 5;
     COMMIT;
 
-Other than transactional operations, you can also run analytics queries on this data using standard SQL.
-One interesting query for a company to run would be to see details about its campaigns with maximum budget.
+Each statement in a transactions causes roundtrips between the coordinator and
+workers in Citus.  For multi-tenant workloads, it's more efficient to run
+transactions in distributed functions. The efficiency gains become more
+apparent for larger transactions, but we can use the small transaction above as
+an example.
+
+First create a function that does the deletions:
+
+.. code-block:: postgres
+
+    CREATE OR REPLACE FUNCTION
+      delete_campaign(company_id int, campaign_id int)
+    RETURNS void LANGUAGE plpgsql AS $fn$
+    BEGIN
+      DELETE FROM campaigns
+       WHERE id = $2 AND campaigns.company_id = $1;
+      DELETE FROM ads
+       WHERE ads.campaign_id = $2 AND ads.company_id = $1;
+    END;
+    $fn$;
+
+Next use :ref:`create_distributed_function` to instruct Citus to run the
+function directly on workers rather than on the coordinator. It will run the
+function on whatever worker holds the :ref:`shards` for tables ``ads`` and
+``campaigns`` corresponding to the value ``company_id``. (This feature
+requires the ``citus.replication_model`` change we made earlier.)
+
+.. code-block:: sql
+
+    SELECT create_distributed_function(
+      'delete_campaign(int, int)', 'company_id',
+      colocate_with := 'campaigns'
+    );
+
+    -- you can run the function as usual
+    SELECT delete_campaign(5, 46);
+
+Besides transactional operations, you can also run analytics queries using
+standard SQL.  One interesting query for a company to run would be to see
+details about its campaigns with maximum budget.
 
 .. code-block:: sql