Skip to content
Browse files

Add Sharding documentation from doctrine/Shards

  • Loading branch information...
1 parent f0ccb4d commit 9b2fbbae4d9b99cdd3ade7bd64bb970c504a3afa @beberlei beberlei committed Jun 25, 2012
Showing with 894 additions and 0 deletions.
  1. +2 −0 en/index.rst
  2. +473 −0 en/reference/sharding.rst
  3. +419 −0 en/reference/sharding_azure_tutorial.rst
View
2 en/index.rst
@@ -24,6 +24,8 @@ Contents:
reference/schema-representation
reference/events
reference/security
+ reference/sharding
+ reference/sharding_azure_tutorial
reference/supporting-other-databases
reference/portability
reference/known-vendor-issues
View
473 en/reference/sharding.rst
@@ -0,0 +1,473 @@
+Sharding
+========
+
+Starting with 2.3 Doctrine DBAL contains some functionality to simplify the
+development of horizontally sharded applications. In this first release it
+contains a ``ShardManager`` interface. This interface allows to programatically
+select a shard to send queries to. At the moment there are no functionalities
+yet to dynamically pick a shard based on ID, query or database row yet. That
+means the sharding extension is primarily suited for:
+
+1. multi-tenant applications or
+2. applications with completly separated datasets (example: weather data).
+
+Both kind of application will work with both DBAL and ORM.
+
+.. note::
+
+ Horizontal sharding is an evasive archicture that will affect your application code and using this
+ extension to Doctrine will not make it work "magically".
+
+You have to understand and integrate the following drawbacks:
+
+1. Pre-generation of IDs that are unique across all shards required.
+2. No transaction support across shards.
+3. No foreign key support across shards (meaning no "real" relations).
+4. Very complex (or impossible) to query aggregates across shards.
+5. Denormalization: Composite keys required where normalized non-sharded db schemas don't need them.
+6. Schema Operations have to be done on all shards.
+
+The primary questions in a sharding architecture are:
+
+* Where is my data located?
+* Where should I save this new data to find it later?
+
+To answer these questions you generally have to craft a function that will tell
+you for a given ID, on which shard the data for this ID is located. To simplify
+this approach you will generally just pick a table which is the root of a set of
+related data and decide for the IDs of this table. All the related data that
+belong to this table are saved on the same shard.
+
+Take for example a multi-user blog application with the following tables:
+
+* Blog [id, name]
+* Post [id, blog_id, subject, body, author_id]
+* Comment [id, post_id, comment, author_id]
+* User [id, username]
+
+A sensible sharding architecture will split the application by blog. That means
+all the data for a particular blog will be on a single shard and scaling is
+done by putting the amound of blogs on many different database servers.
+
+Now users can post and comment on different blogs that reside on different
+shards. This makes the database schema above slightly tricky, because both
+`author_id` columns cannot have foreign keys to `User (id)`. Instead the User
+table is located in an entirely different "dimension" of the application in
+terms of the sharding architecture.
+
+To simplify working with this kind of multi-dimensional database schema, you
+can replace the author_ids with something more "meaningful", for example the
+e-mail address of the users if that is always known. The "user" table can then
+be seperated from the database schema above and put on a second horizontally
+scaled sharding architecture.
+
+As you can see, even with just the four tables above, sharding actually becomes
+quite complex to think about.
+
+The rest of this section discusses Doctrine sharding functionality in technical
+detail.
+
+ID Generation
+-------------
+
+To solve the issue of unique ID-generation across all shards are several
+approaches you should evaluate:
+
+Use GUID/UUIDs
+~~~~~~~~~~~~~~
+
+The most simple ID-generation mechanism for sharding are
+universally unique identifiers. These are 16-byte
+(128-bit) numbers that are guaranteed to be unique across different servers.
+You can `read up on UUIDs on Wikipedia
+<http://en.wikipedia.org/wiki/Universally_unique_identifier>`_.
+
+The drawback of UUIDs is the segmentation they cause on indexes. Because UUIDs
+are not sequentially generated, they can have negative impact on index access
+performance. Additionally they are much bigger
+than numerical primary keys (which are normally 4-bytes in length).
+
+At the moment Doctrine DBAL drivers MySQL and SQL Server support the generation
+of UUID/GUIDs. You can use the following bit of code to generate them across
+platforms:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\DriverManager;
+
+ $conn = DriverManager::getConnection(/**..**/);
+ $guid = $conn->fetchColumn('SELECT ' . $conn->getDatabasePlatform()->getGuidExpression());
+
+ $conn->insert("my_table", array("id" => $guid, "foo" => "bar"));
+
+In your application you should hide this details in Id-Generation services:
+
+.. code-block:: php
+
+ <?php
+ namespace MyApplication;
+
+ class IdGenerationService
+ {
+ private $conn;
+
+ public function generateCustomerId()
+ {
+ return $this->conn->fetchColumn('SELECT ' .
+ $this->conn->getDatabasePlatform()->getGuidExpression()
+ );
+ }
+ }
+
+A good starting point to read up on GUIDs (vs numerical ids) is this blog post
+`Coding Horror: Primary Keys: IDs vs GUIDs
+<http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html>`_.
+
+Table Generator
+~~~~~~~~~~~~~~~
+
+In some scenarios there is no way around a numerical, automatically
+incrementing id. The way Auto incrementing IDs are implemented in MySQL and SQL
+Server however is completly unsuitable for sharding. Remember in a sharding
+architecture you have to know where the row for a specific ID is located and
+IDs have to be globally unique across all servers. Auto-Increment Primary Keys
+are missing both properties.
+
+To get around this issue you can use the so-called "table-generator" strategy.
+In this case you define a single database that is responsible for the
+generation of auto-incremented ids. You create a table on this database and
+through the use of locking create new sequential ids.
+
+There are three important drawbacks to this strategy:
+
+1. Single point of failure
+2. Bottleneck when application is write-heavy
+3. A second independent database connection is needed to guarantee transaction
+ safety.
+
+If you can live with this drawbacks then you can use table-generation with the
+following code in Doctrine:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\DriverManager;
+ use Doctrine\DBAL\Id\TableGenerator;
+
+ $conn = DriverManager::getConnection(/**..**); // connection 1
+
+ // creating the TableGenerator automatically opens a second connection.
+ $tableGenerator = new TableGenerator($conn, "sequences_tbl_name");
+
+ $id1 = $tableGenerator->nextValue("sequence_name1");
+ $id2 = $tableGenerator->nextValue("sequence_name2");
+
+The table generator obviously needs a table to work. The schema of this table
+is described in the ``TableGenerator`` class-docblock. Alternatively you
+can use the ``Doctrine\DBAL\Id\TableGeneratorSchemaVisitor`` and apply it to your
+``Doctrine\DBAL\Schema\Schema`` instance. It will automatically add the required
+sequence table.
+
+Natural Identifiers
+~~~~~~~~~~~~~~~~~~~
+
+Sometimes you are lucky and your application data-model comes with a natural
+id. This is mostly the case for applications who get their IDs generated
+somewhere else (exogeneous ID-generation) or that work with temporal data. In
+that case you can just define the natural primary key and shard your
+application based on this data.
+
+Transactions
+------------
+
+Transactions in sharding can only work for data that is located on a single
+shard. If you need transactions in your sharding architecture then you have to
+make sure that the data updated during a transaction is located on a single
+shard.
+
+Foreign Keys
+------------
+
+Since you cannot create foreign keys between remote database servers, in a
+sharding architecture you should put the data on a shard that belongs to each
+other. But even if you can isolate most of the rows on a single shard there may
+exist relations between tables that exist on different shards. In this case
+your application should be aware of the potential inconsistencies and handle
+them graciously.
+
+Complex Queries
+---------------
+
+GROUP BY, DISTINCT and ORDER BY are clauses that cannot be easily used in a
+sharding architecture. If you have to execute these queries against multiple
+shards then you cannot just append the different results to each other.
+
+You have to be aware of this problem and design your queries accordingly or
+shard the data in a way that you never have to query multiple shards to
+calculate a result.
+
+ShardManager Interface
+----------------------
+
+The central API of the sharding extension is the ``ShardManager`` interface.
+It contains two different groups of functions with regard to sharding.
+
+First, it contains the Shard Selection API. You can pick a shard based on a
+so-called "distribution-value" or reset the connection to the "global" shard,
+a necessary database that often contains heavily cached, sharding independent
+data such as meta tables or the "user/tenant" table.
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\DriverManager;
+ use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
+
+ $conn = DriverManager::getConnection(array(
+ 'sharding' => array(
+ 'federationName' => 'my_database',
+ 'distributionKey' => 'customer_id',
+ )
+ ));
+ $shardManager = new SQLAzureShardManager($conn);
+
+ $currentCustomerId = 1234;
+ $shardManager->selectShard($currentCustomerId);
+ // all queries after this call hit the shard
+ // where customer with id 1234 is on.
+
+ $shardManager->selectGlobal();
+ // the global database is selected.
+
+To access the currently selected distribution value use the following API
+method:
+
+.. code-block:: php
+
+ <?php
+ $value = $shardManager->getCurrentDistributionValue();
+
+The shard manager will prevent you switching shards when a transaction is open.
+This is especially important when using sharding with the ORM. Because the ORM
+uses a single transaction during the flush-operation this means that you can
+only ever use one ``EntityManager`` with data from a single shard.
+
+The second API is the "fan-out" query API. This allows you to execute queries against
+ALL shards. The order of the results of this operation is undefined, that means
+your query has to return the data in a way that works for the application, or
+you have to sort the data in the application.
+
+.. code-block:: php
+
+ <?php
+ $sql = "SELECT * FROM customers";
+ $rows = $shardManager->queryAll($sql, $params);
+
+
+Schema Operations: SchemaSynchronizer Interface
+-----------------------------------------------
+
+Schema Operations in a sharding architecture are tricky. You have to perform
+them on all databases instances (shards) at the same time. Also Doctrine
+has problems with this in particular as you cannot generate an SQL file with
+changes on any development machine anymore and apply this on production. The
+required changes depend on the amount of shards.
+
+To allow the Doctrine Schema API operations on a sharding architecture we
+performed a refactored from code inside ORM ``Doctrine\ORM\Tools\SchemaTool``
+class and extracted the code for operations on Schema instances into a new
+``Doctrine\Shards\DBAL\SchemaSynchronizer`` interface.
+
+Every sharding implementation can implement this interface and allow schema
+operations to take part on multiple shards.
+
+SQL Azure Federations
+---------------------
+
+Doctrine Shards ships with a custom implementation for Microsoft SQL
+Azure. The Azure platform provides a native sharding functionality. In SQL
+Azure the sharding functionality is called Federations. This
+functionality applies the following restrictions (in line with the ones listed
+above):
+
+1. IDENTITY columns are not allowed on sharded tables (federated tables)
+2. Each table may only have exactly one clustered index and this index has to
+ have the distribution key/sharding-id as one column.
+3. Every unique index (or primary key) has to contain the
+ distribution-key/sharding-id.
+
+Especially the requirements 2 and 3 prevent normalized database schemas. You
+have to put the distribution key on every sharded table, which can affect your
+application code quite a bit. This may lead to the creation of composite keys
+where you normally wouldn't need them.
+
+The benefit of SQL Azure Federations is that they implement all the
+shard-picking logic on the server. You only have to make use of the ``USE
+FEDERATION`` statement. You don't have to maintain a list of all the shards
+inside your application and more importantly, resizing shards is done
+transparently on the server.
+
+Features of SQL Azure are:
+
+- Central server to log into federations architecture. No need to know all
+ connection details of all shards.
+- Database level operation to split shards, taking away the tediousness of this
+ operation for application developers.
+- A global tablespace that can contain global data to all shards.
+- One or many different federations (this library only supports working with
+ one)
+- Sharded or non-sharded tables inside federations
+- Allows filtering SELECT queries on the database based on the selected
+ sharding key value. This allows to implement sharded Multi-Tenant Apps very easily.
+
+To setup an SQL Azure ShardManager use the following code:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\DriverManager;
+ use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
+
+ $conn = DriverManager::getConnection(array(
+ 'dbname' => 'my_database',
+ 'host' => 'tcp:dbname.windows.net',
+ 'user' => 'user@dbname',
+ 'password' => 'XXX',
+ 'sharding' => array(
+ 'federationName' => 'my_federation',
+ 'distributionKey' => 'customer_id',
+ 'distributionType' => 'integer',
+ )
+ ));
+ $shardManager = new SQLAzureShardManager($conn);
+
+Currently you are limited to one federation in your application.
+
+You can inspect all the currently known shards on SQL Azure using the
+``ShardManager#getShards()`` function:
+
+.. code-block:: php
+
+ <?php
+ foreach ($shardManager->getShards() as $shard) {
+ echo $shard['id'] . " " . $shard['rangeLow'] . " - " . $shard['rangeHigh'];
+ }
+
+Schema Operations
+~~~~~~~~~~~~~~~~~
+
+Schema Operations on SQL Azure Federations are possible with the
+``SQLAzureSchemaSynchronizer``. You can instantiate this from your code:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\Shards\DBAL\SQLAzure\SQLAzureSchemaSynchronizer;
+
+ $synchronizer = new SQLAzureSchemaSynchronizer($conn, $shardManager);
+
+You can use the API such as ``createSchema($schema)`` then and it will be
+distributed across all shards. The assumptions are:
+
+- Using ``SchemaSynchronizer#createSchema()`` assumes the database is empty.
+ The federation is created during this operation.
+- Using ``SchemaSynchronizer#updateSchema()`` assumes the database and the
+ federation exists. All shards of the federation are iterated and update is
+ applied to all shards consecutively.
+
+For a schema with tables in the global or federated sub-schema you have to use
+the Schema API to mark tables:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\Schema\Schema;
+
+ $schema = new Schema();
+
+ // no options set, this table will be on the federation root
+ $users = $schema->createTable('Users');
+ //...
+
+ // marked as sharded, but no distribution column given:
+ // non-federated table inside the federation
+ $products = $schema->createTable('Products');
+ $products->addOption('azure.federated', true);
+ //...
+
+ // shared + distribution column:
+ // federated table
+ $customers = $schema->createTable('Customers');
+ $customers->addColumn('CustomerID', 'integer');
+ //...
+ $customers->addOption('azure.federated', true);
+ $customers->addOption('azure.federatedOnColumnName', 'CustomerID');
+
+SQLAzure Filtering
+~~~~~~~~~~~~~~~~~~
+
+SQL Azure comes with a powerful filtering feature, that allows you to
+automatically implement a multi-tenant application for a formerly single-tenant
+application. The restriction to make this work is that your application does not work with
+IDENTITY columns.
+
+Normally when you select a shard using ``ShardManager#selectShard()`` any query
+executed against this shard will return data from ALL the tenants located on
+this shard. With the "FILTERING=ON" flag on the ``USE FEDERATION`` query
+however SQL Azure can automatically filter all SELECT queries with the chosen
+distribution value. Additionally you can automatically set the currently
+selected distribution value in every INSERT statement using a function for this
+value as the ``DEFAULT`` part of the column. If you are using GUIDs for every
+row then UPDATE and DELETE statements using only GUIDs will work out perfectly
+as well, as they are by definition for unique rows. This feature allows you to
+build multi-tenant applications, even though they were not originally designed
+that way.
+
+To enable filtering you can use the
+``SQLAzureShardManager#setFilteringEnabled()`` method. This method is not part
+of the interface. You can also set a default value for filtering by passing it
+as the "sharding.filteringEnabled" parameter to
+``DriverManager#getConnection()``.
+
+Generic SQL Sharding Support
+----------------------------
+
+Besides the custom SQL Azure support there is a generic implementation that
+works with all database drivers. It requires to specify all database
+connections and will switch between the different connections under the hood
+when using the ``ShardManager`` API. This is also the biggest drawback of this
+approach, since fan-out queries need to connect to all databases in a single
+request.
+
+See the configuration for a sample sharding connection:
+
+.. code-block:: php
+
+ <?php
+ use Doctrine\DBAL\DriverManager;
+
+ $conn = DriverManager::getConnection(array(
+ 'wrapperClass' => 'Doctrine\Shards\DBAL\PoolingShardConnection',
+ 'driver' => 'pdo_sqlite',
+ 'global' => array('memory' => true),
+ 'shards' => array(
+ array('id' => 1, 'memory' => true),
+ array('id' => 2, 'memory' => true),
+ ),
+ 'shardChoser' => 'Doctrine\Shards\DBAL\ShardChoser\MultiTenantShardChoser',
+ ));
+
+You have to configure the following options:
+
+- 'wrapperClass' - Selecting the PoolingShardConnection as above.
+- 'global' - An array of database parameters that is used for connecting to the
+ global database.
+- 'shards' - An array of of shard database parameters. You have to specifiy an
+ 'id' parameter for each of the shard configurations.
+- 'shardChoser' - Implementation of the
+ ``Doctrine\Shards\DBAL\ShardChoser\ShardChoser`` interface.
+
+The Shard Choser interface maps the distribution value to a shard-id. This
+gives you the freedom to implement your own strategy for sharding the data
+horizontally.
View
419 en/reference/sharding_azure_tutorial.rst
@@ -0,0 +1,419 @@
+SQLAzure Sharding Tutorial
+==========================
+
+This tutorial builds upon the `Brian Swans tutorial
+<http://blogs.msdn.com/b/silverlining/archive/2012/01/18/using-sql-azure-federations-via-php.aspx>`_
+on SQLAzure Sharding and turns all the examples into examples using the Doctrine Sharding support.
+
+It introduces SQL Azure Sharding, which is an abstraction layer in SQL Azure to
+support sharding. Many features for sharding are implemented on the database
+level, which makes it much easier to work with than generic sharding
+implementations.
+
+For this tutorial you need an Azure account. You don't need to deploy the code
+on Azure, you can run it from your own machine against the remote database.
+
+.. note::
+
+ You can look at the code from the 'examples/sharding' directory.
+
+Install Doctrine
+----------------
+
+For this tutorial we will install Doctrine and the Sharding Extension through
+`Composer <http://getcomposer.org>`_ which is the easiest way to install
+Doctrine. Composer is a new package manager for PHP. Download the
+``composer.phar`` from their website and put it into a newly created folder for
+this tutorial. Now create a ``composer.json`` file in this project root with
+the following content:
+
+ {
+ "require": {
+ "doctrine/dbal": "2.2.2",
+ "doctrine/shards": "0.2"
+ }
+ }
+
+Open up the commandline and switch to your tutorial root directory, then call
+``php composer.phar install``. It will grab the code and install it into the
+``vendor`` subdirectory of your project. It also creates an autoloader, so that
+we don't have to care about this.
+
+Setup Connection
+----------------
+
+The first thing to start with is setting up Doctrine and the database connection:
+
+.. code-block:: php
+
+ <?php
+ // bootstrap.php
+ use Doctrine\DBAL\DriverManager;
+ use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
+
+ require_once "vendor/autoload.php";
+
+ $conn = DriverManager::getConnection(array(
+ 'driver' => 'pdo_sqlsrv',
+ 'dbname' => 'SalesDB',
+ 'host' => 'tcp:dbname.windows.net',
+ 'user' => 'user@dbname',
+ 'password' => 'XXX',
+ 'platform' => new \Doctrine\DBAL\Platforms\SQLAzurePlatform(),
+ 'driverOptions' => array('MultipleActiveResultSets' => false),
+ 'sharding' => array(
+ 'federationName' => 'Orders_Federation',
+ 'distributionKey' => 'CustId',
+ 'distributionType' => 'integer',
+ )
+ ));
+
+ $shardManager = new SQLAzureShardManager($conn);
+
+Create Database
+---------------
+
+Create a new database using the Azure/SQL Azure management console.
+
+Create Schema
+-------------
+
+Doctrine has a powerful schema API. We don't need to use low-level DDL
+statements to generate the database schema. Instead you can use an Object-Oriented API
+to create the database schema and then have Doctrine turn it into DDL
+statements.
+
+We will recreate Brians example schema with Doctrine DBAL. Instead of having to
+create federations and schema seperately as in his example, Doctrine will do it
+all in one step:
+
+.. code-block:: php
+
+ <?php
+ // create_schema.php
+ use Doctrine\DBAL\Schema\Schema;
+ use Doctrine\Shards\DBAL\SQLAzure\SQLAzureSchemaSynchronizer;
+
+ require_once 'bootstrap.php';
+
+ $schema = new Schema();
+
+ $products = $schema->createTable('Products');
+ $products->addColumn('ProductID', 'integer');
+ $products->addColumn('SupplierID', 'integer');
+ $products->addColumn('ProductName', 'string');
+ $products->addColumn('Price', 'decimal', array('scale' => 2, 'precision' => 12));
+ $products->setPrimaryKey(array('ProductID'));
+ $products->addOption('azure.federated', true);
+
+ $customers = $schema->createTable('Customers');
+ $customers->addColumn('CustomerID', 'integer');
+ $customers->addColumn('CompanyName', 'string');
+ $customers->addColumn('FirstName', 'string');
+ $customers->addColumn('LastName', 'string');
+ $customers->setPrimaryKey(array('CustomerID'));
+ $customers->addOption('azure.federated', true);
+ $customers->addOption('azure.federatedOnColumnName', 'CustomerID');
+
+ $orders = $schema->createTable('Orders');
+ $orders->addColumn('CustomerID', 'integer');
+ $orders->addColumn('OrderID', 'integer');
+ $orders->addColumn('OrderDate', 'datetime');
+ $orders->setPrimaryKey(array('CustomerID', 'OrderID'));
+ $orders->addOption('azure.federated', true);
+ $orders->addOption('azure.federatedOnColumnName', 'CustomerID');
+
+ $orderItems = $schema->createTable('OrderItems');
+ $orderItems->addColumn('CustomerID', 'integer');
+ $orderItems->addColumn('OrderID', 'integer');
+ $orderItems->addColumn('ProductID', 'integer');
+ $orderItems->addColumn('Quantity', 'integer');
+ $orderItems->setPrimaryKey(array('CustomerID', 'OrderID', 'ProductID'));
+ $orderItems->addOption('azure.federated', true);
+ $orderItems->addOption('azure.federatedOnColumnName', 'CustomerID');
+
+ // Create the Schema + Federation:
+ $synchronizer = new SQLAzureSchemaSynchronizer($conn, $shardManager);
+ $synchronizer->createSchema($schema);
+
+ // Or jut look at the SQL:
+ echo implode("\n", $synchronizer->getCreateSchema($schema));
+
+View Federation Members
+-----------------------
+
+To see how many shard instances (called Federation Members) your SQLAzure database currently has
+you can ask the ``ShardManager`` to enumerate all shards:
+
+.. code-block:: php
+
+ <?php
+ // view_federation_members.php
+ require_once "bootstrap.php";
+
+ $shards = $shardManager->getShards();
+ foreach ($shards as $shard) {
+ print_r($shard);
+ }
+
+Insert Data
+-----------
+
+Now we want to insert some test data into the database to see the behavior when
+we split the shards. We use the same test data as Brian, but use the Doctrine
+API to insert them. To insert data into federated tables we have to select the
+shard we want to put the data into. We can use the ShardManager to execute this
+operation for us:
+
+.. code-block:: php
+
+ <?php
+ // insert_data.php
+ require_once "bootstrap.php";
+
+ $shardManager->selectShard(0);
+
+ $conn->insert("Products", array(
+ "ProductID" => 386,
+ "SupplierID" => 1001,
+ "ProductName" => 'Titanium Extension Bracket Left Hand',
+ "Price" => 5.25,
+ ));
+ $conn->insert("Products", array(
+ "ProductID" => 387,
+ "SupplierID" => 1001,
+ "ProductName" => 'Titanium Extension Bracket Right Hand',
+ "Price" => 5.25,
+ ));
+ $conn->insert("Products", array(
+ "ProductID" => 388,
+ "SupplierID" => 1001,
+ "ProductName" => 'Fusion Generator Module 5 kV',
+ "Price" => 10.50,
+ ));
+ $conn->insert("Products", array(
+ "ProductID" => 388,
+ "SupplierID" => 1001,
+ "ProductName" => 'Bypass Filter 400 MHz Low Pass',
+ "Price" => 10.50,
+ ));
+
+ $conn->insert("Customers", array(
+ 'CustomerID' => 10,
+ 'CompanyName' => 'Van Nuys',
+ 'FirstName' => 'Catherine',
+ 'LastName' => 'Abel',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 20,
+ 'CompanyName' => 'Abercrombie',
+ 'FirstName' => 'Kim',
+ 'LastName' => 'Branch',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 30,
+ 'CompanyName' => 'Contoso',
+ 'FirstName' => 'Frances',
+ 'LastName' => 'Adams',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 40,
+ 'CompanyName' => 'A. Datum Corporation',
+ 'FirstName' => 'Mark',
+ 'LastName' => 'Harrington',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 50,
+ 'CompanyName' => 'Adventure Works',
+ 'FirstName' => 'Keith',
+ 'LastName' => 'Harris',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 60,
+ 'CompanyName' => 'Alpine Ski House',
+ 'FirstName' => 'Wilson',
+ 'LastName' => 'Pais',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 70,
+ 'CompanyName' => 'Baldwin Museum of Science',
+ 'FirstName' => 'Roger',
+ 'LastName' => 'Harui',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 80,
+ 'CompanyName' => 'Blue Yonder Airlines',
+ 'FirstName' => 'Pilar',
+ 'LastName' => 'Pinilla',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 90,
+ 'CompanyName' => 'City Power & Light',
+ 'FirstName' => 'Kari',
+ 'LastName' => 'Hensien',
+ ));
+ $conn->insert("Customers", array(
+ 'CustomerID' => 100,
+ 'CompanyName' => 'Coho Winery',
+ 'FirstName' => 'Peter',
+ 'LastName' => 'Brehm',
+ ));
+
+ $conn->executeUpdate("DECLARE @orderId INT
+
+ DECLARE @customerId INT
+
+ SET @orderId = 10
+ SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Hensien' and FirstName = 'Kari'
+
+ INSERT INTO Orders (CustomerId, OrderId, OrderDate)
+ VALUES (@customerId, @orderId, GetDate())
+
+ INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
+ VALUES (@customerId, @orderId, 388, 4)
+
+ SET @orderId = 20
+ SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Harui' and FirstName = 'Roger'
+
+ INSERT INTO Orders (CustomerId, OrderId, OrderDate)
+ VALUES (@customerId, @orderId, GetDate())
+
+ INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
+ VALUES (@customerId, @orderId, 389, 2)
+
+ SET @orderId = 30
+ SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Brehm' and FirstName = 'Peter'
+
+ INSERT INTO Orders (CustomerId, OrderId, OrderDate)
+ VALUES (@customerId, @orderId, GetDate())
+
+ INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
+ VALUES (@customerId, @orderId, 387, 3)
+
+ SET @orderId = 40
+ SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Pais' and FirstName = 'Wilson'
+
+ INSERT INTO Orders (CustomerId, OrderId, OrderDate)
+ VALUES (@customerId, @orderId, GetDate())
+
+ INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
+ VALUES (@customerId, @orderId, 388, 1)"
+ );
+
+This puts the data into the currently only existing federation member. We
+selected that federation member by picking 0 as distribution value, which is by
+definition part of the only existing federation.
+
+Split Federation
+----------------
+
+Now lets split the federation, creating a second federation member. SQL Azure
+will automatically redistribute the data into the two federations after you
+executed this command.
+
+.. code-block:: php
+
+ <?php
+ // split_federation.php
+ require_once 'bootstrap.php';
+
+ $shardManager->splitFederation(60);
+
+This little script uses the shard manager with a special method only existing
+on the SQL AZure implementation ``splitFederation``. It accepts a value at
+at which the split is executed.
+
+If you reexecute the ``view_federation_members.php`` script you can now see
+that there are two federation members instead of just one as before. You can
+see with the ``rangeLow`` and ``rangeHigh`` parameters what customers and
+related entries are now served by which federation.
+
+Inserting Data after Split
+--------------------------
+
+Now after we splitted the data we now have to make sure to be connected to the
+right federation before inserting data. Lets add a new customer with ID 55 and
+have him create an order.
+
+.. code-block:: php
+
+ <?php
+ // insert_data_aftersplit.php
+ require_once 'bootstrap.php';
+
+ $newCustomerId = 55;
+
+ $shardManager->selectShard($newCustomerId);
+
+ $conn->insert("Customers", array(
+ "CustomerID" => $newCustomerId,
+ "CompanyName" => "Microsoft",
+ "FirstName" => "Brian",
+ "LastName" => "Swan",
+ ));
+
+ $conn->insert("Orders", array(
+ "CustomerID" => 55,
+ "OrderID" => 37,
+ "OrderDate" => date('Y-m-d H:i:s'),
+ ));
+
+ $conn->insert("OrderItems", array(
+ "CustomerID" => 55,
+ "OrderID" => 37,
+ "ProductID" => 387,
+ "Quantity" => 1,
+ ));
+
+As you can see its very important to pick the right distribution key in your
+sharded application. Otherwise you have to switch the shards very often, which
+is not really easy to work with. If you pick the sharding key right then it
+should be possible to select the shard only once per request for the major
+number of use-cases.
+
+Fan-out the queries accross multiple shards should only be necessary for a
+small number of queries, because these kind of queries are complex.
+
+Querying data with filtering off
+--------------------------------
+
+To access the data you have to pick a shard again and then start selecting data
+from it.
+
+.. code-block:: php
+
+ <?php
+ // query_filtering_off.php
+ require_once "bootstrap.php";
+
+ $shardManager->selectShard(0);
+
+ $data = $conn->fetchAll('SELECT * FROM Customers');
+ print_r($data);
+
+This returns all customers from the shard with distribution value 0. This will
+be all customers with id 10 to less than 60, since we split federations at 60.
+
+Querying data with filtering on
+-------------------------------
+
+One special feature of SQL Azure is the possibility to database level filtering
+based on the sharding distribution values. This means that SQL Azure will add
+WHERE clauses with distributionkey=current distribution value conditions to
+each distribution key.
+
+.. code-block:: php
+
+ <?php
+ // query_filtering_on.php
+ require_once "bootstrap.php";
+
+ $shardManager->setFilteringEnabled(true);
+ $shardManager->selectShard(55);
+
+ $data = $conn->fetchAll('SELECT * FROM Customers');
+ print_r($data);
+
+Now you only get the customer with id = 55. The same holds for queries on the
+``Orders`` and ``OrderItems`` table, which are restricted by customer id = 55.
+

0 comments on commit 9b2fbba

Please sign in to comment.
Something went wrong with that request. Please try again.