symmetric-assemble/src/docbook/configuration.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="configuration"
         xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xi="http://www.w3.org/2001/XInclude"
         xmlns:svg="http://www.w3.org/2000/svg"
         xmlns:ns="http://docbook.org/ns/docbook"
         xmlns:mml="http://www.w3.org/1998/Math/MathML"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <title>Configuration</title>

  <para><xref linkend="planning" xrefstyle="select: label"/> introduced
  numerous concepts and the analysis and design needed to create an
  implementation of SymmetricDS. This chapter re-visits each analysis step and
  documents how to turn a SymmetricDS design into reality through
  configuration of the various SymmetricDS tables. In addition, several
  advanced configuration options, not presented previously, will also be
  covered.</para>

  <section id="configuration-node-properties">
    <title>Node Properties</title>

    <para>To get a SymmetricDS node running, it needs to be given an identity
    and it needs to know how to connect to the database it will be
    synchronizing. The preferred way to configure a SymmetricDS engine is to
    create a properties file in the engines directory. The SymmetricDS server
    will create an engine for each properties file found in the engines
    directory. When started up, SymmetricDS reads the synchronization
    configuration and state from the database. If the configuration tables are
    missing, they are created automatically (auto creation can be disabled).
    Basic configuration is performed by inserting rows into the following
    tables (the complete data model is defined in <xref linkend="data-model"/>).
    <itemizedlist>
        <listitem>
          <para><xref linkend="table_node_group" xrefstyle="table"/> -
          specifies the tiers that exist in a SymmetricDS network</para>
        </listitem>

        <listitem>
          <para><xref linkend="table_node_group_link" xrefstyle="table"/> -
          links two node groups together for synchronization</para>
        </listitem>

        <listitem>
          <para><xref linkend="table_channel" xrefstyle="table"/> - grouping
          and priority of synchronizations</para>
        </listitem>

        <listitem>
          <para><xref linkend="table_trigger" xrefstyle="table"/> - specifies
          tables, channels, and conditions for which changes in the database
          should be captured</para>
        </listitem>

        <listitem>
          <para><xref linkend="table_router" xrefstyle="table"/> - specifies
          the routers defined for synchronization, along with other routing
          details</para>
        </listitem>

        <listitem>
          <para><xref linkend="table_trigger_router" xrefstyle="table"/> -
          provides mappings of routers and triggers</para>
        </listitem>
      </itemizedlist></para>

    <para>During start up, triggers are verified against the database, and
    database triggers are installed on tables that require data changes to be
    captured. The Route, Pull and Push Jobs begin running to synchronize
    changes with other nodes.</para>

    <para>Each node requires properties that allow it to connect to a database
    and register with a parent node. Properties are configured in a file named
    <code>xxxxx.properties</code> that is placed in the engines directory of
    the SymmetricDS install. The file is usually named according to the
    engine.name, but it is not a requirement.</para>

    <para>To give a node its identity, the following properties are required.
    Any other properties found in <code>conf/symmetric.properties</code> can
    be overridden for a specific engine in an engine's properties file. If the
    properties are changed in <code>conf/symmetric.properties</code> they will
    take effect across all engines deployed to the server. Note that you can
    use the variable <literal>$(hostName)</literal> to represent the host name
    of the machine when defining these properties (for example,
    external.id=$(hostName) ).</para>

    <variablelist>
      <varlistentry>
        <term>
          <command>engine.name</command>
        </term>

        <listitem>
          <para>This is an arbitrary name that is used to access a specific
          engine using an HTTP URL. Each node configured in the engines
          directory must have a unique engine name. The engine name is also
          used for the domain name of registered JMX beans.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>group.id</command>
        </term>

        <listitem>
          <para>The node group that this node is a member of. Synchronization
          is specified between node groups, which means you only need to
          specify it once for multiple nodes in the same group.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>external.id</command>
        </term>

        <listitem>
          <para>The external id for this node has meaning to the user and
          provides integration into the system where it is deployed. For
          example, it might be a retail store number or a region number. The
          external id can be used in expressions for conditional and subset
          data synchronization. Behind the scenes, each node has a unique
          sequence number for tracking synchronization events. That makes it
          possible to assign the same external id to multiple nodes, if
          desired.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>sync.url</command>
        </term>

        <listitem>
          <para>The URL where this node can be contacted for synchronization.
          At startup and during each heartbeat, the node updates its entry in
          the database with this URL. The sync url is of the format:
          <code>http://{hostname}:{port}/{webcontext}/sync/{engine.name}</code>.</para>

          <para>The {webcontext} is blank for a standalone deployment. It will
          typically be the name of the war file for an application server
          deployment.</para>

          <para>The {engine.name} can be left blank if there is only one
          engine deployed in a SymmetricDS server.</para>
        </listitem>
      </varlistentry>
    </variablelist>

    <para>When a new node is first started, it has no information about
    synchronizing. It contacts the registration server in order to join the
    network and receive its configuration. The configuration for all nodes is
    stored on the registration server, and the URL must be specified in the
    following property:</para>

    <variablelist>
      <varlistentry>
        <term>
          <command>registration.url</command>
        </term>

        <listitem>
          <para>The URL where this node can connect for registration to
          receive its configuration. The registration server is part of
          SymmetricDS and is enabled as part of the deployment. This is
          typically equal to the value of the sync.url of the registration
          server.</para>
        </listitem>
      </varlistentry>
    </variablelist>

    <important>
      <para>Note that a <emphasis>registration server node</emphasis> is
      defined as one whose <literal>registration.url</literal> is either (a)
      blank, or (b) identical to its <literal>sync.url</literal>.</para>
    </important>
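
    <para>As a minimal sketch of the rule above, a registration server's
    engine properties file might simply leave
    <literal>registration.url</literal> blank. The file name, engine name,
    host name, and port shown here are hypothetical values chosen for
    illustration. <programlisting>
# engines/corp-000.properties (hypothetical file name and values)
engine.name=corp-000
group.id=corp
external.id=00000
sync.url=http://corp.example.com:8080/sync/corp-000
# A blank registration.url marks this node as the registration server
registration.url=</programlisting></para>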

    <para>For a deployment where the database connection pool should be
    created using a JDBC driver, set the following properties:</para>

    <variablelist>
      <varlistentry>
        <term>
          <command>db.driver</command>
        </term>

        <listitem>
          <para>The class name of the JDBC driver.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>db.url</command>
        </term>

        <listitem>
          <para>The JDBC URL used to connect to the database.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>db.user</command>
        </term>

        <listitem>
          <para>The database username, which is used to login, create, and
          update SymmetricDS tables.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>
          <command>db.password</command>
        </term>

        <listitem>
          <para>The password for the database user.</para>
        </listitem>
      </varlistentry>
    </variablelist>
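
    <para>Putting the properties above together, an engine file for a store
    node might look like the following sketch. The file name, host names,
    port, database name, and credentials are hypothetical, as is the
    registration server engine named corp-000. The PostgreSQL driver is only
    an example; substitute the driver class and JDBC URL for your database.
    <programlisting>
# engines/store-001.properties (hypothetical file name and values)
engine.name=store-001
group.id=store
external.id=001
sync.url=http://store001.example.com:8080/sync/store-001
# Typically the sync.url of the registration server
registration.url=http://corp.example.com:8080/sync/corp-000

# JDBC connection pool settings (example driver and URL)
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost/store001
db.user=symmetric
db.password=secret</programlisting></para>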
  </section>

  <section id="configuration-node">
    <title>Node</title>

    <para>A <emphasis>node</emphasis>, a single instance of SymmetricDS, is
    defined in the <xref linkend="table_node" xrefstyle="table"/> table. Two
    other tables also play a direct role in defining a node. The first is
    <xref linkend="table_node_identity" xrefstyle="table"/>. The
    <emphasis>only</emphasis> row in this table is inserted in the database
    when the node first <emphasis>registers</emphasis> with a parent node. In
    the case of a root node, the row is entered by the user. The row is used
    by a node instance to determine its node identity.</para>

    <para>The following SQL statements set up a top-level registration server
    as a node identified as "00000" in the "corp" node group. <programlisting>
insert into SYM_NODE 
  (node_id, node_group_id, external_id, sync_enabled)
values
  ('00000', 'corp', '00000', 1);

insert into SYM_NODE_IDENTITY values ('00000');</programlisting></para>

    <para>The second table, <xref linkend="table_node_security"
    xrefstyle="table"/> has rows created for each <emphasis>child</emphasis>
    node that registers with the node, assuming auto-registration is enabled.
    If auto registration is not enabled, you must create a row in <xref
    linkend="table_node" xrefstyle="table"/> and <xref
    linkend="table_node_security" xrefstyle="table"/> for the node to be able
    to register. You can also, with this table, manually cause a node to
    re-register or do a re-initial load by setting the corresponding columns
    in the table itself. Registration is discussed in more detail in <xref
    linkend="configuration-registration"/>.</para>
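
    <para>As a sketch of the manual approach, the statements below flag an
    existing child node for re-registration or for a new initial load. The
    column names <literal>registration_enabled</literal> and
    <literal>initial_load_enabled</literal> and the node id '001' are
    assumptions made for illustration; verify them against <xref
    linkend="table_node_security" xrefstyle="table"/> for your version.
    <programlisting>
-- Assumed column names; '001' is a hypothetical node id.
-- Allow node 001 to register again
update SYM_NODE_SECURITY
   set registration_enabled = 1
 where node_id = '001';

-- Request a new initial load for node 001
update SYM_NODE_SECURITY
   set initial_load_enabled = 1
 where node_id = '001';</programlisting></para>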
  </section>

  <section id="configuration-node-group">
    <title>Node Group</title>

    <para>Node Groups are straightforward to configure and are defined in the
    <xref linkend="table_node_group" xrefstyle="table"/> table. The following
    SQL statements would create node groups for "corp" and "store" based on
    our retail store example. <programlisting>
insert into SYM_NODE_GROUP 
  (node_group_id, description)
values
  ('store', 'A retail store node');

insert into SYM_NODE_GROUP 
  (node_group_id, description)
values
  ('corp', 'A corporate node');</programlisting></para>
  </section>

  <section id="configuration-node-group-link">
    <title>Node Group Link</title>

    <para>Similarly, Node Group links are established using a data event
    action of 'P' for Push and 'W' for Pull ("wait"). The following SQL
    statements link the "corp" and "store" node groups for synchronization.
    It configures the "store" nodes to push their data changes to the "corp"
    nodes, and the "corp" nodes to send changes to "store" nodes by waiting
    for a pull. <programlisting>
insert into SYM_NODE_GROUP_LINK
  (source_node_group_id, target_node_group_id, data_event_action)
values
  ('store', 'corp', 'P');

insert into SYM_NODE_GROUP_LINK
  (source_node_group_id, target_node_group_id, data_event_action)
values
  ('corp', 'store', 'W');</programlisting></para>

    <para>A node group link can be configured to use the same node group as
    the source and the target. This configuration allows a node group to sync
    with every other node in its group.</para>

    <para>A third link action, 'R' for "Route Only", is available when you
    want to associate a router with a link that will not move data. This
    action type might be useful when using an XML publishing router or an
    audit table changes router.</para>
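
    <para>The two variations described above might be configured as in the
    following sketch. The choice of node groups is hypothetical, and the
    column names are assumed to be <literal>source_node_group_id</literal>
    and <literal>target_node_group_id</literal>, matching the corresponding
    columns on the router table. <programlisting>
-- Hypothetical: let "store" nodes push changes to other nodes in their own group
insert into SYM_NODE_GROUP_LINK
  (source_node_group_id, target_node_group_id, data_event_action)
values
  ('store', 'store', 'P');

-- Hypothetical: a route-only link on "corp" for a router that does not move data
insert into SYM_NODE_GROUP_LINK
  (source_node_group_id, target_node_group_id, data_event_action)
values
  ('corp', 'corp', 'R');</programlisting></para>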
  </section>

  <section id="configuration-channel">
    <title>Channel</title>

    <para>By categorizing data into channels and assigning them to <xref
    linkend="table_trigger" xrefstyle="table"/>s, the user gains more control
    and visibility into the flow of data. In addition, SymmetricDS allows for
    synchronization to be enabled, suspended, or scheduled by channels as
    well. The frequency of synchronization and the order in which data is
    synchronized are also controlled at the channel level.</para>

    <para>The following SQL statements set up channels for a retail store. An
    "item" channel includes data for items and their prices, while a
    "sale_transaction" channel includes data for ringing sales at a register.
    <programlisting>
insert into SYM_CHANNEL 
  (channel_id, processing_order, max_batch_size, max_batch_to_send, 
   extract_period_millis, batch_algorithm, enabled, description)
values
  ('item', 10, 1000, 10,  0, 'default', 1, 'Item and pricing data');

insert into SYM_CHANNEL 
  (channel_id, processing_order, max_batch_size, max_batch_to_send, 
   extract_period_millis, batch_algorithm, enabled, description)
values
  ('sale_transaction', 1, 1000, 10,  60000, 'transactional', 1, 
   'retail sale transactions from register');</programlisting></para>

    <para>Batching is the grouping of data, by channel, to be transferred and
    committed at the client together. There are three different out-of-the-box
    batching algorithms which may be configured in the batch_algorithm column
    on channel. <variablelist>
        <varlistentry>
          <term>
            <command>default</command>
          </term>

          <listitem>
            <para>All changes that happen in a transaction are guaranteed to
            be batched together. Multiple transactions will be batched and
            committed together until there is no more data to be sent or the
            max_batch_size is reached.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>
            <command>transactional</command>
          </term>

          <listitem>
            <para>Batches will map directly to database transactions. If there
            are many small database transactions, then there will be many
            batches. The max_batch_size column has no effect.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>
            <command>nontransactional</command>
          </term>

          <listitem>
            <para>Multiple transactions will be batched and committed together
            until there is no more data to be sent or the max_batch_size is
            reached. The batch will be cut off at the max_batch_size
            regardless of whether it is in the middle of a transaction.</para>
          </listitem>
        </varlistentry>
      </variablelist></para>
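
    <para>For instance, a channel whose data does not need to respect
    transaction boundaries could select the nontransactional algorithm. The
    "audit_log" channel and its values below are hypothetical and serve only
    to illustrate the column. <programlisting>
-- Hypothetical channel: batches are cut off at 500 data events,
-- even if that falls in the middle of a transaction
insert into SYM_CHANNEL 
  (channel_id, processing_order, max_batch_size, max_batch_to_send, 
   extract_period_millis, batch_algorithm, enabled, description)
values
  ('audit_log', 20, 500, 10, 0, 'nontransactional', 1, 'Audit log data');</programlisting></para>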

    <para>If a channel contains <emphasis>only</emphasis> tables that will be
    synchronized in one direction and data is routed to all the nodes in
    the target node groups, then batching on the channel can be optimized to
    share batches across nodes. This is an important feature when data needs
    to be routed to thousands of nodes. When this mode is detected, you will
    see batches created in <xref linkend="table_outgoing_batch"
    xrefstyle="table"/> with the <literal>common_flag</literal> set to
    1.</para>

    <para>There are also several size-related parameters that can be set by
    channel. They include: <variablelist>
        <varlistentry>
          <term>
            <command>max_batch_size</command>
          </term>

          <listitem>
            <para>Specifies the maximum number of data events to process
            within a batch for this channel.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>
            <command>max_batch_to_send</command>
          </term>

          <listitem>
            <para>Specifies the maximum number of batches to send for a given
            channel during a 'synchronization' between two nodes. A
            'synchronization' is equivalent to a push or a pull. For example,
            if there are 12 batches ready to be sent for a channel and
            max_batch_to_send is equal to 10, then only the first 10 batches
            will be sent even though 12 batches are ready.</para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>
            <command>max_data_to_route</command>
          </term>

          <listitem>
            <para>Specifies the maximum number of data rows to route for a
            channel at a time.</para>
          </listitem>
        </varlistentry>
      </variablelist></para>
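
    <para>As an illustration, the size-related columns of the "item" channel
    defined earlier could be adjusted with a statement like the following.
    The specific numbers are hypothetical and not tuning recommendations.
    <programlisting>
-- Hypothetical values: smaller batches, more batches per push or pull,
-- and a cap on the number of rows routed at a time
update SYM_CHANNEL
   set max_batch_size = 500,
       max_batch_to_send = 20,
       max_data_to_route = 50000
 where channel_id = 'item';</programlisting></para>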

    <para>Based on your particular synchronization requirements, you can also
    specify whether old, new, and primary key data should be read and included
    during routing for a given channel. These are controlled by the columns
    use_old_data_to_route, use_row_data_to_route, and use_pk_data_to_route,
    respectively. By default, they are all 1 (true).</para>
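
    <para>For example, if no router on a channel refers to old column values,
    reading the old data during routing could be switched off. This is only a
    sketch; whether it is appropriate depends on the routers you have
    configured. <programlisting>
-- Assumes no router on the "item" channel compares against old column values
update SYM_CHANNEL
   set use_old_data_to_route = 0
 where channel_id = 'item';</programlisting></para>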

    <para>Finally, if data on a particular channel contains big lobs, you can
    set the column contains_big_lob to 1 (true) to provide SymmetricDS the
    hint that the channel contains big lobs. Some databases have shortcuts
    that SymmetricDS can take advantage of if it knows that the lob columns in
    <xref linkend="table_data" xrefstyle="table"/> aren't going to contain
    large lobs. The definition of how large a 'big' lob is varies from
    database to database.</para>
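
    <para>A sketch of setting the hint, assuming for illustration that the
    tables on the "item" channel include large image columns:
    <programlisting>
-- Hypothetical: tell SymmetricDS that the "item" channel carries big lobs
update SYM_CHANNEL
   set contains_big_lob = 1
 where channel_id = 'item';</programlisting></para>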
  </section>

  <section id="configuration-triggers-and-routers">
    <title>Triggers, Routers, and Trigger / Routers Mappings</title>

    <para>In order to synchronize data, you must define at least one trigger, at least one router,
    and provide at least one link between the two (known as a trigger-router). 
    </para>

    <section id="configuration-trigger">
      <title>Trigger</title>

      <para>SymmetricDS captures synchronization data using database triggers.
      SymmetricDS' Triggers are defined in the <xref linkend="table_trigger"
      xrefstyle="table"/> table. Each record is used by SymmetricDS when
      generating database triggers. Database triggers are only generated when
      a trigger is associated with a <xref linkend="table_router"
      xrefstyle="table"/> whose <literal>source_node_group_id</literal>
      matches the node group id of the current node.</para>

      <para>The <literal>source_table_name</literal> may contain the asterisk
      ('*') wildcard character so that one <xref linkend="table_trigger"
      xrefstyle="table"/> table entry can define synchronization for many
      tables. System tables and any tables that start with the SymmetricDS
      table prefix will be excluded. A list of wildcard tokens can also be
      supplied. If there are multiple tokens, they should be delimited with a
      comma. A wildcard token can also start with a bang ('!') to indicate an
      exclusive match. Tokens are always evaluated from left to right. When a
      table match is made, the table is either added to or removed from the
      list of tables. If another trigger already exists for a table, then that
      table is not included in the wildcard match (the explicitly defined
      trigger entry takes precedence).</para>
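
      <para>As a sketch of the wildcard rules above, the following statement
      would capture every table whose name starts with "sale_" except a table
      named "sale_archive". The table names and trigger id are hypothetical,
      and the statement assumes the "sale_transaction" channel already exists.
      <programlisting>
-- Hypothetical tables: match sale_* first, then remove sale_archive
-- (tokens are comma-delimited and evaluated left to right)
insert into SYM_TRIGGER 
    (trigger_id,source_table_name,channel_id,last_update_time,create_time)
  values
    ('sale_tables', 'sale_*,!sale_archive', 'sale_transaction', 
     current_timestamp, current_timestamp);
</programlisting></para>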

      <para>When determining whether a data change has occurred, by default
      the triggers will record a change even if the data was updated to the
      same value(s) it already held. For example, a data change will
      be captured if an update of one column in a row updated the value to the
      same value it already was. There is a global property,
      <literal>trigger.update.capture.changed.data.only.enabled</literal>
      (false by default), that allows you to override this behavior. When set
      to true, SymmetricDS will only capture a change if the data has truly
      changed (i.e., when the new column data is not equal to the old column
      data).</para>

      <important>
        <para>The property
        <literal>trigger.update.capture.changed.data.only.enabled</literal> is
        currently only supported in the MySQL, DB2 and Oracle
        dialects.</para>
      </important>
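
      <para>Like other properties, this one could be set in
      <code>conf/symmetric.properties</code> for all engines or in a single
      engine's properties file. A minimal sketch: <programlisting>
# Capture an update only when at least one column value actually changed
trigger.update.capture.changed.data.only.enabled=true</programlisting></para>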

      <para>The following SQL statement defines a trigger that will capture
      data for a table named "item" whenever data is inserted, updated, or
      deleted. The trigger is assigned to a channel also called 'item'.
      <programlisting>
insert into SYM_TRIGGER 
    (trigger_id,source_table_name,channel_id,last_update_time,create_time)
  values
    ('item', 'item', 'item', current_timestamp, current_timestamp);
</programlisting></para>

      <important>
        <para>Note that many databases allow for multiple triggers of the same
        type to be defined. Each database defines the order in which the
        triggers fire differently. If you have additional triggers beyond
        those SymmetricDS installs on your table, please consult your database
        documentation to determine if there will be issues with the ordering
        of the triggers.</para>
      </important>


    <section id="configuration-trigger-lobs">
    <title>Large Objects</title>
      <para>Two lob-related settings are also available on <xref
      linkend="table_trigger" xrefstyle="table"/>: <variablelist>
          <varlistentry>
            <term>
              <command>use_stream_lobs</command>
            </term>

            <listitem>
              <para>Specifies whether to capture lob data as the trigger is
              firing or to stream lob columns from the source tables using
              callbacks during extraction. A value of 1 streams lob data from
              the source via callback; with a value of 0, lob data is captured
              by the trigger.</para>
            </listitem>
          </varlistentry>

          <varlistentry>
            <term>
              <command>use_capture_lobs</command>
            </term>

            <listitem>
              <para>Provides a hint as to whether this trigger will capture
              big lob data. If set to 1, every effort will be made, both
              during data capture in the trigger and during data selection
              for the initial load, to use lob facilities to extract and
              store data in the database.</para>
            </listitem>
          </varlistentry>
        </variablelist></para>
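
      <para>For example, the "item" trigger defined earlier could be switched
      to streaming lobs at extraction time with a statement along these lines
      (a sketch, not a recommendation for every table): <programlisting>
-- Stream lob columns from the source table during extraction
-- instead of capturing them when the trigger fires
update SYM_TRIGGER
   set use_stream_lobs = 1
 where trigger_id = 'item';</programlisting></para>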
        </section>
        
            <section id="configuration-trigger-external-select">
                <title>External Select</title>
            
                <para>Occasionally, you may find that you need to capture and save away a piece of data present in another table when a trigger is firing.
                This data is typically needed for
                the purposes of determining where to 'route' the data to once routing takes place.  Each trigger definition contains an optional
                <literal>external_select</literal> field which can be used to specify the data to be captured.  
                Once captured, this data is available during routing in <xref
                linkend="table_data" xrefstyle="table"/>'s <literal>external_data</literal> field.
                For these cases, place a SQL select statement which returns the data item you need for routing in <literal>external_select</literal>.
                An example of the use of external select can be found in <xref
                linkend="configuration-routing-external-select"/>.</para>
            </section>
    </section>

    <section id="configuration-router">
      <title>Router</title>

      <para>Routers provided in the base implementation currently include:
      <itemizedlist>
          <listitem>Default Router - a router that sends all data to all nodes
          that belong to the target node group defined in the
          router.</listitem>

          <listitem>Column Match Router - a router that compares old or new
          column values to a constant value or the value of a node's
          external_id or node_id.</listitem>

          <listitem>Lookup Router - a router which can be configured to
          determine routing based on an existing or ancillary table
          specifically for the purpose of routing data.</listitem>

          <listitem>Subselect Router - a router that executes a SQL expression
          against the database to select nodes to route to. This SQL
          expression can be passed values of old and new column
          values.</listitem>

          <listitem>Scripted Router - a router that executes a Bean Shell
          script expression in order to select nodes to route to. The script
          can use the old and new column values.</listitem>

          <listitem>XML Publishing Router - a router that publishes data
          changes directly to a messaging solution instead of transmitting
          changes to registered nodes. This router must be configured manually
          in XML as an extension point.</listitem>
          
          <listitem>Audit Table Router - a router that inserts into an automatically created audit table.  It records captured changes
          to tables that it is linked to.
          </listitem>
        </itemizedlist> The mapping between the set of triggers and set of
      routers is many-to-many. This means that one trigger can capture changes
      and route to multiple locations. It also means that one router can be
      defined and associated with many different triggers.</para>

      <section id="configuration-default-router">
        <title>Default Router</title>

        <para>The simplest router is a router that sends all the data that is
        captured by its associated triggers to all the nodes that belong to
        the target node group defined in the router. A router is defined as a
        row in the <xref linkend="table_router" xrefstyle="table"/> table. It
        is then linked to triggers in the <xref linkend="table_trigger_router"
        xrefstyle="table"/> table.</para>

        <para>The following SQL statement defines a router that will send data
        from the 'corp' group to the 'store' group. <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, 
    create_time, last_update_time)
values
  ('corp-2-store','corp', 'store', current_timestamp, current_timestamp);

</programlisting></para>

        <para>The following SQL statement maps the 'corp-2-store' router to
        the item trigger. <programlisting>
insert into SYM_TRIGGER_ROUTER 
  (trigger_id, router_id, initial_load_order,  create_time, last_update_time)
values
  ('item', 'corp-2-store', 1, current_timestamp, current_timestamp);

</programlisting></para>
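
        <para>Because the mapping between triggers and routers is
        many-to-many, the same trigger can be linked to a second router. The
        following sketch adds a hypothetical 'store-2-corp' router and maps
        the item trigger to it as well, so that item changes captured at a
        store would be sent to the 'corp' group. <programlisting>
-- Hypothetical second router in the opposite direction
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, 
    create_time, last_update_time)
values
  ('store-2-corp','store', 'corp', current_timestamp, current_timestamp);

-- The same 'item' trigger is now mapped to two routers
insert into SYM_TRIGGER_ROUTER 
  (trigger_id, router_id, initial_load_order,  create_time, last_update_time)
values
  ('item', 'store-2-corp', 1, current_timestamp, current_timestamp);
</programlisting></para>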
      </section>

      <section id="configuration-column-match-router">
        <title>Column Match Router</title>

        <para>Sometimes data must be routed based on the current value or the
        old value of a column in the table that is being routed. Column
        routers are configured by setting the
        <literal>router_type</literal> column on the <xref
        linkend="table_router" xrefstyle="table"/> table to
        <literal>column</literal> and setting the
        <literal>router_expression</literal> column to an equality expression
        that represents the expected value of the column.</para>

        <para>The first part of the expression is always the column name. The
        column name should always be defined in upper case. The upper case
        column name prefixed by OLD_ can be used for a comparison being done
        with the old column data value.</para>

        <para>The second part of the expression can be a constant value, a
        token that represents another column, or a token that represents some
        other SymmetricDS concept. Token values always begin with a colon
        (:).</para>

        <para>Consider a table that needs to be routed to all nodes in the
        target group only when a status column is set to 'READY TO SEND'. The
        following SQL statement will insert a column router to accomplish
        that. <programlisting>
insert into SYM_ROUTER 
(router_id, source_node_group_id, target_node_group_id, router_type, 
 router_expression, create_time, last_update_time)
values
('corp-2-store-ok','corp', 'store', 'column', 
 'STATUS=READY TO SEND', current_timestamp, current_timestamp);

</programlisting></para>

        <para>Consider a table that needs to be routed to all nodes in the
        target group only when a status column changes values. The following
        SQL statement will insert a column router to accomplish that. Note the
        use of OLD_STATUS, where the OLD_ prefix gives access to the old
        column value. <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-status','corp', 'store', 'column', 
    'STATUS!=:OLD_STATUS', current_timestamp, current_timestamp);

</programlisting></para>

        <para>Consider a table that needs to be routed to only nodes in the
        target group whose STORE_ID column matches the external id of a node.
        The following SQL statement will insert a column router to accomplish
        that. <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-id','corp', 'store', 'column', 
    'STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);

</programlisting> Attributes on a <xref linkend="table_node"
        xrefstyle="table"/> that can be referenced with tokens include:
        <itemizedlist>
            <listitem>:NODE_ID</listitem>

            <listitem>:EXTERNAL_ID</listitem>

            <listitem>:NODE_GROUP_ID</listitem>
          </itemizedlist>
          Captured EXTERNAL_DATA is also available for routing as a virtual column.
          </para>

        <para>Consider a table that needs to be routed to a redirect node
        defined by its external id in the <xref
        linkend="table_registration_redirect" xrefstyle="table"/> table. The
        following SQL statement will insert a column router to accomplish
        that. <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-redirect','corp', 'store', 'column', 
    'STORE_ID=:REDIRECT_NODE', current_timestamp, current_timestamp);
</programlisting></para>

        <para>More than one column may be configured in a router_expression.
        When more than one column is configured, all matches are added to the
        list of nodes to route to. The following is an example where the
        STORE_ID column may contain the STORE_ID to route to or the constant
        of ALL which indicates that all nodes should receive the update.
        <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-multiple-matches','corp', 'store', 'column', 
   'STORE_ID=ALL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting></para>

        <para>The NULL keyword may be used to check if a column is null. If
        the column is null, then data will be routed to all nodes that qualify
        for the update. The following is an example where the STORE_ID column
        is used to route to a set of nodes that have a STORE_ID equal to their
        EXTERNAL_ID, or to all nodes if the STORE_ID is null. <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-null-match','corp', 'store', 'column', 
   'STORE_ID=NULL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting></para>
      </section>

      <section id="configuration-lookup-table-router">
        <title>Lookup Table Router</title>

        <para>A lookup table may contain the id of the node where data needs
        to be routed. This could be an existing table or an ancillary table
        that is added specifically for the purpose of routing data. Lookup
        table routers are configured by setting the
        <literal>router_type</literal> column on the <xref
        linkend="table_router" xrefstyle="table"/> table to
        <literal>lookuptable</literal> and setting a list of configuration
        parameters in the <literal>router_expression</literal> column.</para>

        <para>Each of the following configuration parameters is required.
        <variablelist>
            <varlistentry>
              <term>
                <command>LOOKUP_TABLE</command>
              </term>

              <listitem>
                <para>This is the name of the lookup table.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term>
                <command>KEY_COLUMN</command>
              </term>

              <listitem>
                <para>This is the name of the column on the table that is
                being routed. It will be used as a key into the lookup
                table.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term>
                <command>LOOKUP_KEY_COLUMN</command>
              </term>

              <listitem>
                <para>This is the name of the column that is the key on the
                lookup table.</para>
              </listitem>
            </varlistentry>

            <varlistentry>
              <term>
                <command>EXTERNAL_ID_COLUMN</command>
              </term>

              <listitem>
                <para>This is the name of the column that contains the
                external_id of the node to route to on the lookup
                table.</para>
              </listitem>
            </varlistentry>
          </variablelist></para>

        <para>Note that the lookup table will be read into memory and cached
        for the duration of a routing pass for a single channel.</para>

        <para>Consider a table that needs to be routed to a specific store,
        but the data in the changing table only contains brand information. In
        this case, the STORE table may be used as a lookup table.
        <programlisting>
insert into SYM_ROUTER 
(router_id, source_node_group_id, target_node_group_id, router_type, 
 router_expression, create_time, last_update_time)
values
('corp-2-store-ok','corp', 'store', 'lookuptable', 
 'LOOKUP_TABLE=STORE
KEY_COLUMN=BRAND_ID
LOOKUP_KEY_COLUMN=BRAND_ID
EXTERNAL_ID_COLUMN=STORE_ID', current_timestamp, current_timestamp);

</programlisting></para>
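
        <para>To make the mapping concrete, the following is a minimal sketch
        of what the STORE lookup table in the example above might contain. The
        table definition, column types, and sample rows are assumptions made
        purely for illustration; they are not part of the SymmetricDS
        configuration. <programlisting>
-- Hypothetical lookup table; column types are assumed for illustration.
create table STORE (
  STORE_ID varchar(50) not null primary key,
  BRAND_ID varchar(50) not null
);

-- Hypothetical sample rows: two stores carry brand B1, one carries B2.
insert into STORE (STORE_ID, BRAND_ID) values ('001', 'B1');
insert into STORE (STORE_ID, BRAND_ID) values ('002', 'B1');
insert into STORE (STORE_ID, BRAND_ID) values ('003', 'B2');
</programlisting> With rows like these, a change to a routed row whose
        BRAND_ID is B1 would be expected to route to the nodes whose external
        ids are 001 and 002.</para>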
      </section>

      <section id="configuration-subselect-router">
        <title>Subselect Router</title>

        <para>Sometimes routing decisions need to be made based on data that
        is not in the current row being synchronized. A 'subselect' router can be used
        in these cases. A 'subselect' router is configured with a <literal>router_expression</literal> containing
        SQL that is used to select the node ids that
        need to be routed to. Column tokens can be used in the SQL expression and
        will be replaced with row column data. The overhead of using this
        router type is high because the 'subselect' statement runs for each
        row that is routed. It should not be used for tables that have a lot
        of rows that are updated. It also has the disadvantage that if the
        data being relied on to determine the node id has been deleted before
        routing takes place, then no results will be returned and
        routing will not happen.</para>
        <para>The <literal>router_expression</literal> you specify is appended to the
        following SQL statement in order to select the node ids:
        <programlisting>select c.node_id from sym_node c where 
  c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and ...
</programlisting></para>

        <para>As you can see, you have access to information about the node currently under consideration for routing
        through the 'c' alias, for example <literal>c.external_id</literal>.
        There are two node-related tokens you can use in your expression:
            <itemizedlist>
            <listitem>:NODE_GROUP_ID</listitem>
            <listitem>:EXTERNAL_DATA</listitem>
          </itemizedlist>
        Column names representing data for the row in question are prefixed with a colon as well, for example
        <literal>:EMPLOYEE_ID</literal> or <literal>:OLD_EMPLOYEE_ID</literal>. Here, the OLD_ prefix indicates the value before
        the change in cases where the old data has been captured.
        </para><para>
        As an example, consider the case where an Order table and an OrderLineItem table need to be routed to a
        specific store. The Order table has a column named order_id and
        STORE_ID. A store node has an external_id that is equal to the
        STORE_ID on the Order table. OrderLineItem, however, only has a
        foreign key to its Order of order_id. To route OrderLineItems to the
        same nodes that the Order will be routed to, we need to reference the
        master Order record.</para>

        <para>There are two possible ways to solve this in
        SymmetricDS. One is to configure a 'subselect' router_type on the
        <xref linkend="table_router" xrefstyle="table"/> table, as shown below. The other
        approach is to use an <literal>external_select</literal> to capture the data via a trigger for use in
        a column match router, as demonstrated in <xref
        linkend="configuration-routing-external-select" />.
        </para>

 <para>Our solution utilizing subselect compares the external id
 of the current node with the store id from the Order table where the order id matches
 the order id of the current row being routed:
        <programlisting>
insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store','corp', 'store', 'subselect', 
    'c.external_id in (select STORE_ID from order where order_id=:ORDER_ID)', 
    current_timestamp, current_timestamp);
</programlisting></para>

<para>Note that in this example the parent row in Order must still exist at the moment of routing for the
child rows (OrderLineItem) to route, since the select statement is run when routing occurs, not when the change data is first captured.
</para>

     </section>

      <section id="configuration-scripted-router">
        <title>Scripted Router</title>

        <para>When more flexibility is needed in the logic to choose the nodes
        to route to, a scripted router may be used. The currently
        available scripting language is Bean Shell. Bean Shell is a Java-like
        scripting language. Documentation for the Bean Shell scripting
        language can be found at <ulink
        url="http://www.beanshell.org/">http://www.beanshell.org</ulink>.</para>

        <para>The router_type for a Bean Shell scripted router is 'bsh'. The
        router_expression is a valid Bean Shell script that does one of the following: <itemizedlist>
            <listitem>adds node ids to the <code>targetNodes</code> collection
            which is bound to the script</listitem>

            <listitem>returns a new collection of node ids</listitem>

            <listitem>returns a single node id</listitem>

            <listitem>returns true to indicate that all nodes should be routed
            or returns false to indicate that no nodes should be
            routed</listitem>
          </itemizedlist> Also bound to the script evaluation is a list of
        <code>nodes</code>. The list of <code>nodes</code> is a list of
        eligible <code>org.jumpmind.symmetric.model.Node</code> objects. The
        current data column values and the old data column values are bound to
        the script evaluation as Java object representations of the column
        data. The columns are bound using the uppercase names of the columns.
        Old values are bound to uppercase representations that are prefixed
        with 'OLD_'.</para>

        <para>If you need access to any of the SymmetricDS services, then the
        instance of <code>org.jumpmind.symmetric.ISymmetricEngine</code> is
        accessible via the bound <code>engine</code> variable.</para>

        <para>In the following example, the node_id is a combination of
        STORE_ID and WORKSTATION_NUMBER, both of which are columns on the
        table that is being routed. <programlisting>

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-bsh','corp', 'store', 'bsh', 
    'targetNodes.add(STORE_ID + "-" + WORKSTATION_NUMBER);', 
    current_timestamp, current_timestamp);
</programlisting></para>

        <para>The same could also be accomplished by simply returning the node
        id. The last line of a bsh script is always the return value.
        <programlisting>

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-bsh','corp', 'store', 'bsh', 
    'STORE_ID + "-" + WORKSTATION_NUMBER', 
    current_timestamp, current_timestamp);
</programlisting></para>

        <para>The following example will synchronize to all nodes if the FLAG
        column has changed, otherwise no nodes will be synchronized. Note that
        here we make use of OLD_, which provides access to the old column
        value. <programlisting>

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-flag-changed','corp', 'store', 'bsh', 
    'FLAG != null &amp;&amp; !FLAG.equals(OLD_FLAG)', 
    current_timestamp, current_timestamp);
</programlisting></para>

        <para>The next example shows a script that iterates over each eligible
        node and checks to see if the trimmed value of the column named
        STATION equals the external_id. <programlisting>

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-trimmed-station','corp', 'store', 'bsh', 
    'for (org.jumpmind.symmetric.model.Node node : nodes) {
         if (STATION != null &amp;&amp; node.getExternalId().equals(STATION.trim())) {
             targetNodes.add(node.getNodeId());
         }
      }', 
    current_timestamp, current_timestamp);
</programlisting></para>
      </section>
      
      <section id="configuration-audit-table-router">
        <title>Audit Table Router</title>

        <para>This router audits captured data by recording the change in an audit table
        that the router creates and keeps up to date (as long as <code>auto.config.database</code> is
        set to true). The router creates a table with the same name as the table for which
        data was captured, with the suffix _AUDIT. It contains all of the same columns
        as the original table with the same data types, except that each column is nullable and has no default
        value.</para>

        <para>Three extra "AUDIT" columns are added to the table: <itemizedlist>
            <listitem>AUDIT_ID - the primary key of the table.</listitem>
            <listitem>AUDIT_TIME - the time at which the change occurred.</listitem>
            <listitem>AUDIT_EVENT - the DML type that happened to the row.</listitem>
          </itemizedlist> </para>

        <para>The following is an example of an audit router: <programlisting>

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    create_time, last_update_time)
values
  ('audit_at_corp','corp', 'local', 'audit', current_timestamp, current_timestamp);
</programlisting></para>

      <para>Because the audit router isn't capturing data for a specific node in the system, but still has to be
      associated with a node_group_link, a new link action of type 'R' has been introduced. The 'R' stands for 'only routes to'.</para>
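
      <para>As a sketch, the link needed by the audit router example above
      might be configured as follows. This assumes that a node group named
      'local' has already been defined and that the link action is stored in
      the <literal>data_event_action</literal> column of the node group link
      table. <programlisting>
-- Assumes the 'corp' and 'local' node groups already exist.
insert into SYM_NODE_GROUP_LINK
  (source_node_group_id, target_node_group_id, data_event_action)
values
  ('corp', 'local', 'R');
</programlisting></para>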
      </section>  
      
      
    <section id="configuration-routing-external-select">
              <title>Utilizing External Select when Routing</title>
      

<para>There may be times when you wish to route based on a piece of data that exists in
a table other than the one being routed. One approach, first mentioned in
<xref linkend="configuration-subselect-router"/>,
is to utilize an <literal>external_select</literal> to save away data in <literal>external_data</literal>, which can then
be referenced during routing.
</para>
<para>
Reconsider subselect's Order / OrderLineItem example (found in <xref linkend="configuration-subselect-router"/>), where
routing for the line item is accomplished by linking to the "header" Order row.  As an alternate way of solving the problem, 
we will now use External Select combined with a column match router.  
</para><para>In this version of the solution, the STORE_ID is captured from the
        Order table in the EXTERNAL_DATA column when the trigger fires.  The
        router is configured to route based on the captured EXTERNAL_DATA to
        all nodes whose external id matches the captured external data.
        <programlisting>insert into SYM_TRIGGER 
  (trigger_id,source_table_name,channel_id,external_select,
    last_update_time,create_time)
values
  ('orderlineitem', 'orderlineitem', 'orderlineitem','select STORE_ID 
    from order where order_id=$(curTriggerValue).$(curColumnPrefix)order_id',
    current_timestamp, current_timestamp);

insert into SYM_ROUTER 
  (router_id, source_node_group_id, target_node_group_id, router_type, 
    router_expression, create_time, last_update_time)
values
  ('corp-2-store-ext','corp', 'store', 'column', 
    'EXTERNAL_DATA=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting></para>

        <para>Note the syntax $(curTriggerValue).$(curColumnPrefix). This
        translates into "OLD_" or "NEW_" based on the DML type being run. In
        the case of Insert or Update, it's NEW_. For Delete, it's OLD_ (since
        there is no new data). In this way, you can access the DML-appropriate
        value for your select statement.</para>
     
       <para>The advantage of this approach over the 'subselect' approach is that it guards against the (somewhat unlikely)
        possibility that the master Order table row might have been deleted before routing has taken place. The external select solution
        is also a bit more efficient
        than the 'subselect' approach, although the triggers produced do run
        the extra external_select SQL inline with application database
        updates.</para>
        
    </section>
    
    </section>

    <section id="configuration-trigger-router">
      <title>Trigger / Router Mappings</title>

    <para>The <xref
      linkend="table_trigger_router" xrefstyle="table"/> table is used to define
      which specific combinations of triggers and routers are needed for your configuration.  The relationship between
      triggers and routers is many-to-many, so this table serves as the join table to define which combinations are valid, as well
      as to define settings available at the trigger-router level of granularity.
    </para>
      <para>Three important controls can be configured for a specific Trigger /
      Router combination: Enabled, Initial Loads and Ping Back. The parameters for these
      can be found in the Trigger / Router mapping table, <xref
      linkend="table_trigger_router" xrefstyle="table"/>.</para>

      <section id="configuration-trigger-router-enabled">
        <title>Enable / disable trigger router</title>
        
        <para>
        Each individual trigger-router combination can be disabled or enabled if needed.  By default, a trigger-router combination is enabled,
        but if you wish to define a trigger-router combination before it becomes active, you can set
        the <literal>enabled</literal> flag to 0.  This will cause the trigger-router mapping to be sent to all nodes, but the trigger-router
        mapping will not be considered active or enabled for the purposes of capturing data changes or routing.
        </para>
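
        <para>For example, a mapping might be disabled with an update such as
        the following sketch. The trigger and router ids are hypothetical
        placeholders. <programlisting>
-- 'sale_transaction' and 'corp-2-store' are hypothetical ids.
update SYM_TRIGGER_ROUTER
  set enabled = 0, last_update_time = current_timestamp
where trigger_id = 'sale_transaction' and router_id = 'corp-2-store';
</programlisting> Setting <literal>enabled</literal> back to 1 would
        re-activate the mapping.</para>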
    </section>
      <section id="configuration-initial-load">
        
        <title>Initial Loads</title>

        <para>An initial load is the process of seeding tables at a target
        node with data from its parent node. When a node connects and data is
        extracted, after it is registered and if an initial load was
        requested, each table that is configured to synchronize to the target
        node group will be given a reload event in the order defined by the
        end user. A SQL statement is run against each table to get the data
        load that will be streamed to the target node. The selected data is
        filtered through the configured router for the table being loaded. If
        the data set is going to be large, then SQL criteria can optionally be
        provided to pare down the data that is selected out of the
        database.</para>

        <para>An initial load cannot occur until after a node is registered.
        An initial load is requested by setting the
        <literal>initial_load_enabled</literal> column on <xref
        linkend="table_node_security" xrefstyle="table"/> to
        <emphasis>1</emphasis> on the row for the target node in the parent
        node's database.  You can configure SymmetricDS to automatically perform an initial load
        when a node registers by setting the parameter <literal>auto.reload</literal> to true. 
        Regardless of how the initial load is initiated, the next time the source node routes data, reload
        batches will be inserted. At the same time reload batches are
        inserted, all previously pending batches for the node are marked as
        successfully sent.</para>
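
        <para>A minimal sketch of requesting an initial load by hand, run
        against the parent node's database, might look like the following. The
        node id '001' is a hypothetical placeholder. <programlisting>
-- '001' is a hypothetical node_id for the target node.
update SYM_NODE_SECURITY
  set initial_load_enabled = 1
where node_id = '001';
</programlisting></para>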

        <important>
          <para>Note that if the parent node that a node is registering with
          is <emphasis>not</emphasis> a registration server node (as can
          happen with a registration redirect or certain non-tree structure
          node configurations) the parent node's <xref
          linkend="table_node_security" xrefstyle="table"/> entry must exist
          at the parent node and have a non-null value for column
          <literal>initial_load_time</literal>. Nodes can't be registered to
          non-registration-server nodes without this value being set one way
          or another (i.e., manually, or as a result of an initial load
          occurring at the parent node).</para>
        </important>

        <para>SymmetricDS recognizes that an initial load has completed when
        the <literal>initial_load_time</literal> column on the target node is
        set to a non-null value.</para>

        <para>An initial load is accomplished by inserting reload batches in a
        defined order according to the <literal>initial_load_order</literal>
        column on <xref linkend="table_trigger_router" xrefstyle="table"/>. If
        the <literal>initial_load_order</literal> column contains a negative
        value the associated table will <emphasis>NOT</emphasis> be loaded. If
        the <literal>initial_load_order</literal> column contains the same
        value for multiple tables, SymmetricDS will attempt to order the
        tables according to foreign key constraints. If there are cyclical
        constraints, then foreign keys might need to be turned off or the
        initial load will need to be manually configured based on knowledge of
        how the data is structured.</para>
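
        <para>As an illustration, the following sketch excludes one table from
        the initial load and places another early in the load order. The
        trigger and router ids are hypothetical, and the second statement
        assumes that lower values are loaded first. <programlisting>
-- Negative order: this table is skipped during the initial load.
update SYM_TRIGGER_ROUTER
  set initial_load_order = -1, last_update_time = current_timestamp
where trigger_id = 'audit_log' and router_id = 'corp-2-store';

-- Low order value: assumed to load before tables with higher values.
update SYM_TRIGGER_ROUTER
  set initial_load_order = 1, last_update_time = current_timestamp
where trigger_id = 'item' and router_id = 'corp-2-store';
</programlisting></para>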

        <para>Initial load data is always queried from the source database
        table. All data is passed through the configured router to filter out
        data that might not be targeted at a node.</para>

        <section id="configuration-initial-load-options">
            <title>Target table prep for initial load</title>
             <para>There are several parameters that can be used to specify what, if anything, should be
             done to the table on the target database just prior to loading the data.  Note that the parameters below
             specify the desired behavior for all tables in the initial load, not just one.
             </para>
             <itemizedlist>
             <listitem><literal>initial.load.delete.first / initial.load.delete.first.sql</literal>
             <para>By default, an initial load will not delete existing rows from a target table
             before loading the data.  If a delete is desired, the parameter
             <literal>initial.load.delete.first</literal> can be set to true.  If true,
             the command found in <literal>initial.load.delete.first.sql</literal> will be run on each table prior to loading the data.
             The default value for <literal>initial.load.delete.first.sql</literal> is <literal>delete from %s</literal>,
             but could be changed if needed.
                Note that additional reload batches are created, in the correct order, to achieve the delete.
             </para>
             </listitem>
             <listitem><literal>initial.load.create.first</literal>
              <para>
              By default, an initial load will not create the table on the target if it doesn't already exist.
              If the desired behavior is to create the table on the target if it is not present,
              set the parameter <literal>initial.load.create.first</literal> to true. SymmetricDS will
              attempt to create the table and indexes on the target database before doing the initial load. (Additional batches
              are created to represent the table schema).
              </para>
            </listitem>
            </itemizedlist>
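
             <para>One way these parameters might be set is sketched below. It
             assumes that parameters can be stored in a SYM_PARAMETER table
             with the column names shown, and that 'ALL' acts as a wildcard
             for the external id and node group; none of this is confirmed in
             this section, so check it against the data model before use.
             <programlisting>
-- Assumed table and column names; 'ALL' is assumed to match every node.
insert into SYM_PARAMETER
  (external_id, node_group_id, param_key, param_value,
    create_time, last_update_time)
values
  ('ALL', 'ALL', 'initial.load.delete.first', 'true',
    current_timestamp, current_timestamp);

insert into SYM_PARAMETER
  (external_id, node_group_id, param_key, param_value,
    create_time, last_update_time)
values
  ('ALL', 'ALL', 'initial.load.create.first', 'true',
    current_timestamp, current_timestamp);
</programlisting></para>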
        </section>
      <section id="configuration-initial-load-select">
        <title>Loading subsets of data</title>
        
        <para>An efficient way to select a subset of data from a table for an
        initial load is to provide an <literal>initial_load_select</literal>
        clause on <xref linkend="table_trigger_router" xrefstyle="table"/>.
        This clause, if present, is applied as a <literal>where</literal>
        clause to the SQL used to select the data to be loaded. The clause may
        use "t" as an alias for the table being loaded, if needed. The
        <literal>$(externalId)</literal> token can be used for subsetting the
        data in the where clause.</para>
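
        <para>As a sketch, a clause that limits the load to rows belonging to
        the node being loaded might be configured as follows. The trigger id,
        router id, and STORE_ID column are hypothetical, and the quoting
        around the token assumes that the column holds character data.
        <programlisting>
-- Hypothetical ids and column; single quotes are doubled inside the SQL string.
update SYM_TRIGGER_ROUTER
  set initial_load_select = 't.STORE_ID = ''$(externalId)''',
      last_update_time = current_timestamp
where trigger_id = 'sale_transaction' and router_id = 'corp-2-store';
</programlisting></para>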

        <para>In cases where routing is done using a feature like <xref
        linkend="configuration-subselect-router">Subselect Router</xref>, an
        <literal>initial_load_select</literal> clause matching the subselect's
        criteria would be a more efficient approach. Some routers will check
        to see if the <literal>initial_load_select</literal> clause is
        provided, and they will <emphasis>not</emphasis> execute assuming that
        the more optimal path is using the
        <literal>initial_load_select</literal> statement.</para>

        <para>One example of the use of an initial load select would be if you
        wished to only load data created more recently than the start of year
        2011. Say, for example, the column <literal>created_time</literal>
        contains the creation date. Your
        <literal>initial_load_select</literal> would read
        <literal>created_time &gt; {ts '2011-01-01 00:00:00.0000'}</literal>
        (using whatever timestamp format works for your database). This then
        gets applied as a <literal>where</literal> clause when selecting data
        from the table.</para>
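
        <para>The date-based case above could be stored in the configuration
        with an update along these lines. This is only a sketch: the trigger
        and router ids are hypothetical, and the plain timestamp literal is an
        assumption that may need adjusting for your database.
        <programlisting>
-- Hypothetical ids; adjust the timestamp literal to suit your database.
update SYM_TRIGGER_ROUTER
  set initial_load_select = 't.created_time &gt; ''2011-01-01 00:00:00''',
      last_update_time = current_timestamp
where trigger_id = 'sale_transaction' and router_id = 'corp-2-store';
</programlisting></para>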

        <important>
          <para>When providing an <literal>initial_load_select</literal> be
          sure to test out the criteria against production data in a query
          browser. Do an explain plan to make sure you are properly using
          indexes.</para>
        </important>
      </section>
      
       <section id="configuration-initial-load-reverse">
        <title>Reverse Initial Loads</title>
        <para>
        The default behavior for initial loads is to load data from the registration server or parent node, to a client node.
        Occasionally, there may be a need to do a one-time initial load of data in the opposite or "reverse" direction, namely from a client
        node to the registration node.  To achieve this, set the parameter <literal>auto.reload.reverse</literal> to be true, <emphasis>but only for the specific
        node group representing the client nodes</emphasis>.  This will cause a one time reverse load of data, for tables configured with non-negative initial load orders, to be 
        batched at the point when registration of the client node is occurring.  These batches are then sent to the parent or registration node.
        This capability might be needed, for example, if there is data already present in the client that doesn't exist in the parent but needs to.        
        </para>     
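
        <para>A sketch of scoping this parameter to a single node group is
        shown below. It assumes that parameters can be stored in a
        SYM_PARAMETER table with the column names shown, that 'ALL' matches
        any external id, and that the client nodes belong to a node group
        named 'store'; treat all of these as assumptions to verify.
        <programlisting>
-- Assumed table and column names; 'store' is the assumed client node group.
insert into SYM_PARAMETER
  (external_id, node_group_id, param_key, param_value,
    create_time, last_update_time)
values
  ('ALL', 'store', 'auto.reload.reverse', 'true',
    current_timestamp, current_timestamp);
</programlisting></para>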
     </section>
 </section>
      <section id="configuration-dead-triggers">
        <title>Dead Triggers</title>

        <para>Occasionally the decision of what data to load initially results
        in additional triggers. These triggers, known as <emphasis>Dead
        Triggers</emphasis>, are configured such that they do not capture any
        data changes. In other words, the <literal>sync_on_insert</literal>,
        <literal>sync_on_update</literal>, and
        <literal>sync_on_delete</literal> properties for the Trigger are all
        set to false. However, since the Trigger is specified, it
        <emphasis>will</emphasis> be included in the initial load of data for
        target Nodes.</para>

        <para>Why might you need a Dead Trigger? A dead Trigger might be used
        to load a read-only lookup table, for example. It could also be used
        to load a table that needs to be populated with example or default data.
        Another use is a recovery load of data for tables that have a single
        direction of synchronization. For example, a retail store records
        sales transactions that synchronize in one direction by trickling back
        to the central office. If the retail store needs to recover all the
        sales transactions from the central office, they can be sent as part
        of an initial load from the central office by setting up dead Triggers
        that "sync" in that direction.</para>

        <para>The following SQL statement sets up a non-syncing dead Trigger
        that sends the <literal>sale_transaction</literal> table to the
        "store" Node Group from the "corp" Node Group during an initial load.
        <programlisting>

insert into sym_trigger (TRIGGER_ID,SOURCE_CATALOG_NAME,
  SOURCE_SCHEMA_NAME,SOURCE_TABLE_NAME,CHANNEL_ID,
  SYNC_ON_UPDATE,SYNC_ON_INSERT,SYNC_ON_DELETE,
  SYNC_ON_INCOMING_BATCH,NAME_FOR_UPDATE_TRIGGER,
  NAME_FOR_INSERT_TRIGGER,NAME_FOR_DELETE_TRIGGER,
  SYNC_ON_UPDATE_CONDITION,SYNC_ON_INSERT_CONDITION,
  SYNC_ON_DELETE_CONDITION,EXTERNAL_SELECT,
  TX_ID_EXPRESSION,EXCLUDED_COLUMN_NAMES,
  CREATE_TIME,LAST_UPDATE_BY,LAST_UPDATE_TIME) 
  values ('SALE_TRANSACTION_DEAD',null,null,
  'SALE_TRANSACTION','transaction',
  0,0,0,0,null,null,null,null,null,null,null,null,null,
  current_timestamp,'demo',current_timestamp);

insert into sym_router (ROUTER_ID,TARGET_CATALOG_NAME,TARGET_SCHEMA_NAME,
  TARGET_TABLE_NAME,SOURCE_NODE_GROUP_ID,TARGET_NODE_GROUP_ID,ROUTER_TYPE,
  ROUTER_EXPRESSION,SYNC_ON_UPDATE,SYNC_ON_INSERT,SYNC_ON_DELETE,
  CREATE_TIME,LAST_UPDATE_BY,LAST_UPDATE_TIME) 
  values ('CORP_2_STORE',null,null,null,
  'corp','store',null,null,1,1,1,
  current_timestamp,'demo',current_timestamp);
   
insert into sym_trigger_router (TRIGGER_ID,ROUTER_ID,INITIAL_LOAD_ORDER,
  INITIAL_LOAD_SELECT,CREATE_TIME,LAST_UPDATE_BY,LAST_UPDATE_TIME) 
  values ('SALE_TRANSACTION_DEAD','CORP_2_STORE',100,null,
   current_timestamp,'demo',current_timestamp);
   </programlisting></para>
      </section>

      <section id="configuration-trigger-router-ping-back">
        <title>Enabling "Ping Back"</title>

        <para>As discussed in <xref
        linkend="defining-data-changes-trigger-routers-ping-back"/>,
        SymmetricDS, by default, avoids circular data changes. When a trigger
        fires as a result of SymmetricDS itself (such as the case when sync on
        incoming batch is set), it records the originating source node of the
        data change in <literal>source_node_id</literal>. During routing, if
        routing results in sending the data back to the originating source
        node, the data is not routed by default. If instead you wish to route
        the data back to the originating node, you can set the
        <literal>ping_back_enabled</literal> column for the particular
        trigger / router combination. This will cause the router to "ping" the
        data back to the originating node when it usually would not.</para>
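
        <para>As a sketch, ping back might be turned on for one mapping with
        an update like the following. The trigger and router ids are
        hypothetical, and the value 1 assumes the column is a numeric flag
        like the other flags in this chapter. <programlisting>
-- Hypothetical ids; 1 is assumed to mean "enabled".
update SYM_TRIGGER_ROUTER
  set ping_back_enabled = 1, last_update_time = current_timestamp
where trigger_id = 'sale_transaction' and router_id = 'corp-2-store';
</programlisting></para>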
      </section>
    </section>
  </section>

  <section id="configuration-registration">
    <title>Opening Registration</title>

    <para>Node registration is the act of setting up a new <xref
    linkend="table_node" xrefstyle="table"/> and <xref
    linkend="table_node_security" xrefstyle="table"/> so that when the new
    node is brought online it is allowed to join the system. Nodes are only
    allowed to register if rows exist for the node and the
    <literal>registration_enabled</literal> flag is set to 1. If the
    <literal>auto.registration</literal> SymmetricDS property is set to true,
    then when a node attempts to register, if registration has not already
    occurred, the node will automatically be registered.</para>

    <para>SymmetricDS allows you to have multiple nodes with the same
    <literal>external_id</literal>. Out of the box, openRegistration will open
    a new registration if a registration already exists for a node with the
    same external_id. A new registration means a new node with a new
    <literal>node_id</literal> and the same <literal>external_id</literal>
    will be created. If you want to re-register the same node you can use the
    <literal>reOpenRegistration()</literal> JMX method which takes a
    <literal>node_id</literal> as an argument.</para>
  </section>

  <section id="transform-data">
    <title>Transforming Data</title>

    <para>As of SymmetricDS 2.4, synchronized data can be transformed
    by way of configuration (previously, for most cases a
    custom data loader would need to have been written). This transformation
    can take place on a source node or on a target node, as the data is being
    loaded or extracted. With this feature you can, for example:</para>

    <itemizedlist>
      <listitem>
        <para>Copy a column from a source table to two (or more) target table
        columns,</para>
      </listitem>

      <listitem>
        <para>Merge columns from two or more source tables into a single row
        in a target table,</para>
      </listitem>

      <listitem>
        <para>Insert constants in columns in target tables based on source
        data synchronizations,</para>
      </listitem>

      <listitem>
        <para>Insert multiple rows of data into a single target table based on
        one change in a source table,</para>
      </listitem>

      <listitem>
        <para>Apply a Bean Shell script to achieve a custom transform when
        loading into the target database.</para>
      </listitem>
    </itemizedlist>

    <para>These transformations can take place either on the target or on the
    source, and as data is either being extracted or loaded. In either case,
    the transformation is initiated by the existence of a source
    synchronization trigger. The source trigger creates the synchronization
    data, while the transformation configuration decides what to do with the
    synchronization data as it is either being extracted from the source or
    loaded into the target. You have the flexibility of defining different
    transformation behavior depending on whether the source change that
    triggered the synchronization was an Insert, Update, or Delete. In the
    case of Delete, you even have options on what exactly to do on the target
    side, be it a delete of a row, setting columns to specific values, or
    absolutely nothing at all.</para>

    <para>A few key concepts are important to keep in mind to understand how
    SymmetricDS performs transformations. The first concept is that of the
    "source operation" or "source DML type", which is the type of operation
    that occurred to generate the synchronization data in the first place
    (i.e., an insert, a delete, or an update). Your transformations can be
    configured to act differently based on the source DML type, if desired.
    When transforming, by default the DML action taken on the target matches
    that of the action taken on the row in the source (although this behavior
    can be altered through configuration if needed). If the source DML type is
    an Insert, for example, the resulting transformation DML(s) will be
    Insert(s).</para>

    <para>Another important concept is the way in which transforms are
    applied. Each source operation may map to one or more transforms and
    result in one or more operations on the target tables. Each of these
    target operations is performed as an independent operation, in sequence,
    and must be "complete" from a SQL perspective. In other words, you must define
    columns for the transformation that are sufficient to fill in any primary
    key or other required data in the target table if the source operation was
    an Insert, for example.</para>

    <para>Finally, please note that the transformation engine relies on a
    source trigger / router existing to supply the source data for the
    transformation. The transform configuration will never be used if no
    trigger / router combination is defined for the source table and target
    node group.</para>

    <section id="transform-data-tables">
      <title>Transform Configuration Tables</title>

      <para>SymmetricDS stores its transformation configuration in two
      configuration tables, <xref linkend="table_transform_table"
      xrefstyle="table"/> and <xref linkend="table_transform_column"
      xrefstyle="table"/>. Defining a transformation involves configuration in
      both tables, with the first table defining which source and destination
      tables are involved, and the second defining the columns involved in the
      transformation and the behavior of the data for those columns. We will
      explain the various options available in both tables and the various
      pre-defined transformation types.<!--  and then end with a series of examples.--></para>

      <para>To define a transformation, you will first define the source table
      and target table that apply to a particular transformation. The source
      and target tables, along with a unique identifier (the transform_id
      column) are defined in <xref linkend="table_transform_table"
      xrefstyle="table"/>. In addition, you will specify the
      source_node_group_id and target_node_group_id to which the transform
      will apply, along with whether the transform should occur on the Extract
      step or the Load step (transform_point). All of these values are
      required.</para>

      <para>Four additional configuration settings are also defined at the
      source-target table level: the order of the transformations, the column
      policy, the behavior when deleting, and whether an update should always
      be attempted first. More specifically, <itemizedlist>
          <listitem>transform_order: For a single source operation that is
          mapped to a transformation, there could be more than one target
          operation that takes place. You may control the order in which the
          target operations are applied through a configuration parameter
          defined for each source-target table combination. This might be
          important, for example, if the foreign key relationships on the
          target tables require you to execute the transformations in a
          particular order.</listitem>

          <listitem>column_policy: Indicates whether unspecified columns are
          passed through or if all columns must be explicitly defined. The
          options include: <itemizedlist>
              <listitem>SPECIFIED - Indicates that only the transform columns
              that are defined will be the ones that end up as part of the
              transformation.</listitem>

              <listitem>IMPLIED - Indicates that columns from the source that
              are not specified are passed through to the target. This is
              useful if you just want to map a table from one name to another
              or from one schema to another. It is also useful if you want to
              transform a table, but also want to pass it through. You would
              define an implied transform from the source to the target and
              would not have to configure each column.</listitem>
            </itemizedlist></listitem>

          <listitem>delete_action: When a source operation of Delete takes
          place, there are three possible ways to handle the transformation at
          the target. The options include: <itemizedlist>
              <listitem>NONE - The delete results in no target
              changes.</listitem>

              <listitem>DEL_ROW - The delete results in a delete of the row as
              specified by the pk columns defined in the transformation
              configuration.</listitem>

              <listitem>UPDATE_COL - The delete results in an Update operation
              on the target which updates the specific rows and columns based
              on the defined transformation.</listitem>
            </itemizedlist></listitem>

          <listitem>update_first: This option overrides the default behavior
          for an Insert operation. Instead of attempting the Insert first,
          SymmetricDS will always perform an Update first and then fall back
          to an Insert if that fails. Note that, by default, fall back logic
          <emphasis>always</emphasis> applies for Inserts and Updates. Here,
          all you are specifying is whether to always do an Update first,
          which can have performance benefits in certain
          situations.</listitem>
        </itemizedlist></para>
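
      <para>The table-level settings described above might be combined as in
      the following sketch. The transform id, node group ids, and table names
      are hypothetical, and the <literal>source_table_name</literal> and
      <literal>target_table_name</literal> column names are assumed here
      rather than taken from the text above. Columns not discussed in this
      section (for example, catalog, schema, and audit columns) are omitted
      and may need values in your version; consult the data model before
      running it. <programlisting>
-- Hypothetical transform: rows of ITEM from the 'store' group are
-- transformed into ITEM_AUDIT as they are loaded at the 'corp' group.
insert into sym_transform_table
   (transform_id, source_node_group_id, target_node_group_id, transform_point,
    source_table_name, target_table_name,
    transform_order, column_policy, delete_action, update_first)
values ('item_to_item_audit', 'store', 'corp', 'LOAD',
    'ITEM', 'ITEM_AUDIT',
    1, 'SPECIFIED', 'DEL_ROW', 0);
</programlisting></para>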

      <para>For each transformation defined in <xref
      linkend="table_transform_table" xrefstyle="table"/>, the columns to be
      transformed (and how they are transformed) are defined in <xref
      linkend="table_transform_column" xrefstyle="table"/>. This column-level
      table typically has several rows for each transformation id, each of
      which defines the source column name, the target column name, as well as
      the following details: <itemizedlist>
          <listitem>include_on: Defines whether this entry applies to source
          operations of Insert (I), Update (U), or Delete (D), or any source
          operation.</listitem>

          <listitem>pk: Indicates that this mapping is used to define the
          "primary key" for identifying the target row(s) (which may or may
          not be the true primary key of the target table). This is used to
          define the "where" clause when an Update or Delete on the target is
          occurring. At least one row marked as a pk should be present for
          each transform_id.</listitem>

          <listitem>transform_type, transform_expression: Specifies how the
          data is modified, if at all. The available transform types are
          discussed below, and the default is 'copy', which just copies the
          data from source to target.</listitem>

          <listitem>transform_order: In the event there is more than one
          column to transform, this defines the relative order in which the
          transformations are applied.</listitem>
        </itemizedlist></para>
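
      <para>As an illustration of the column-level settings, the following
      sketch maps two columns for a hypothetical transform named
      <literal>item_to_item_audit</literal>. The column names
      <literal>source_column_name</literal> and
      <literal>target_column_name</literal>, and the use of
      <literal>'*'</literal> in include_on to mean "any source operation", are
      assumptions that should be checked against the data model for your
      version. <programlisting>
-- Key mapping: identifies the target row for Updates and Deletes (pk = 1)
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', '*', 'ITEM_ID', 'ITEM_ID',
    1, 'copy', null, 1);

-- Plain copy of NAME into a differently named target column
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', '*', 'ITEM_NAME', 'NAME',
    0, 'copy', null, 2);
</programlisting></para>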
    </section>

    <section id="transform-data-types">
      <title>Transformation Types</title>

      <para>There are several pre-defined transform types available in
      SymmetricDS. Additional ones can be defined by creating and configuring
      an extension point which implements the <code>IColumnTransform</code>
      interface. The pre-defined transform types include the following (the
      transform_type entry is shown in parentheses): <itemizedlist>
          <listitem>Copy Column Transform ('copy'): This transformation type
          copies the source column value to the target column. This is the
          default behavior.</listitem>

          <listitem>Constant Transform ('const'): This transformation type
          allows you to map a constant value to the given target column. The
          constant itself is placed in transform_expression.</listitem>

          <listitem>Variable Transform ('variable'): This transformation type
          allows you to map a built-in dynamic variable to the given target
          column. The variable name is placed in transform_expression. The
          following variables are available: <code>system_date</code> is the
          current system date, <code>system_timestamp</code> is the current
          system date and time, <code>source_node_id</code> is the node id of
          the source, <code>target_node_id</code> is the node id of the
          target, and <code>null</code> is a null value.</listitem>

          <listitem>Additive Transform ('additive'): This transformation type
          is used for numeric data. It computes the change between the old and
          new values on the source and then adds the change to the existing
          value in the target column. That is, target = target + multiplier *
          (source_new - source_old), where multiplier is a constant found in
          the transform_expression (default is 1 if not specified). For
          example, if the source column changed from a 2 to a 4, the target
          column is currently 10, and the multiplier is 3, the effect of the
          transform will be to change the target column to a value of 16 (
          10+3*(4-2) =&gt; 16 ). Note that, in the case of deletes, the new
          column value is considered 0 for the purposes of the
          calculation.</listitem>

          <listitem>Substring Transform ('substr'): This transformation
          computes a substring of the source column data and uses the
          substring as the target column value. The transform_expression can
          be a single integer (<code>n</code>, the beginning index), or a pair
          of comma-separated integers (<code>n,m</code> - the beginning and
          ending index). The transform behaves as the Java substring function
          would using the specified values in transform_expression.</listitem>

          <listitem>Multiplier Transform ('multiply'): This transformation
          allows for the creation of multiple rows in the target table based
          on the transform_expression. This transform type can only be used on
          a primary key column. The transform_expression is a SQL statement
          that returns the list to be used to create the multiple
          targets.</listitem>

          <listitem>Lookup Transform ('lookup'): This transformation
          determines the target column value by using a query, contained in
          transform_expression, to look up the value in another table. The
          query must return a single row, and the first column of the query is
          used as the value. Your query references source column names by
          prefixing them with a colon (e.g., :MY_COLUMN).</listitem>

          <listitem>Shell Script Transform ('bsh'): This transformation allows
          you to provide a Bean Shell script in transform_expression and
          executes the script at the time of transformation. Some variables
          are provided to the script: <code>COLUMN_NAME</code> is a variable
          for a source column in the row, where the variable name is the
          column name in uppercase; <code>currentValue</code> is the value of
          the current source column; <code>oldValue</code> is the old value of
          the source column for an updated row; <code>sqlTemplate</code> is a
          <code>org.jumpmind.db.sql.ISqlTemplate</code> object for querying or
          updating the database; <code>channelId</code> is a reference to the
          channel on which the transformation is happening;
          <code>sourceNode</code> is a
          <code>org.jumpmind.symmetric.model.Node</code> object that
          represents the node from where the data came;
          <code>targetNode</code> is a
          <code>org.jumpmind.symmetric.model.Node</code> object that
          represents the node where the data is being loaded.</listitem>

          <listitem>Identity Transform ('identity'): This transformation
          allows you to insert into an identity column by computing a new
          identity, not copying the actual identity value from the
          source.</listitem>
        </itemizedlist></para>
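
      <para>The following sketch shows how a few of these transform types
      might be configured for a hypothetical transform named
      <literal>item_to_item_audit</literal>. All table and column names are
      invented for illustration. The sketch also assumes that
      <literal>source_column_name</literal> may be left null when the value
      does not come from a source column, and that <literal>'*'</literal> in
      include_on means "any source operation". <programlisting>
-- const: always write the literal STORE into SOURCE_SYSTEM
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', '*', 'SOURCE_SYSTEM', null,
    0, 'const', 'STORE', 3);

-- variable: stamp the row with the current system date and time
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', '*', 'LOADED_AT', null,
    0, 'variable', 'system_timestamp', 4);

-- additive, multiplier 3: a source change from 2 to 4 adds 3 * (4 - 2) = 6
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', 'U', 'TOTAL_QTY', 'QTY',
    0, 'additive', '3', 5);

-- lookup: resolve CATEGORY_ID to a name using a single-row query
insert into sym_transform_column
   (transform_id, include_on, target_column_name, source_column_name,
    pk, transform_type, transform_expression, transform_order)
values ('item_to_item_audit', '*', 'CATEGORY_NAME', 'CATEGORY_ID',
    0, 'lookup',
    'select name from category where category_id = :CATEGORY_ID', 6);
</programlisting></para>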
    </section>

    <!--  <section id="transform-data-examples">
        <title>Transformation Examples</title>
            <para>
            To be done.
            </para>        
    </section> -->
 </section>
    <section id="data-load-filter">
      <title>Data Load Filters</title>

      <para>As of SymmetricDS 3.1, SymmetricDS can take actions upon the load
      of certain data via configurable load filters. This configurable option
      is in addition to the existing option of writing a class that implements
      <xref linkend="extensions-data-loader-filter" xrefstyle="table"/>. A
      configurable load filter watches for specific data that is being loaded
      and then takes action based on the load of that data.</para>

      <para>You select the data to act on by specifying a source and target
      node group (data extracted from the source node group and loaded into
      the target node group), and a target catalog, schema and table name. You
      can decide to take action on rows that are inserted, updated and/or
      deleted, and can further narrow down which rows of the target table to
      act on by specifying additional criteria in the bean shell script that
      is executed in response to the loaded data. As an example, old and new
      values for the row of data being loaded are available in the bean shell
      script, so you can act on rows with a certain column value in the old
      or new data.</para>

      <para>The action taken is based on a bean shell script that you can
      provide as part of the configuration. Actions can be taken at different
      points in the load process including before write, after write, at batch
      complete, at batch commit and/or at batch rollback.</para>

    <section id="data-load-filter-config">
      <title>Load Filter Configuration Table</title>

      <para>SymmetricDS stores its load filter configuration in a single table
      called <xref linkend="table_load_filter" xrefstyle="table"/>. The load filter
      table allows you to specify the following: <itemizedlist>
          <listitem>Load Filter Type ('load_filter_type'): The type of load
          filter. Today only bean shell is supported ('BSH'), but SQL scripts
          may be added in a future release.</listitem>

          <listitem>Source Node Group ('source_node_group_id'): The source
          node group for which you would like to watch for changes.</listitem>

          <listitem>Target Node Group ('target_node_group_id'): The target
          node group for which you would like to watch for changes. The source
          and target node groups are used together to identify the node group
          link for which you would like to watch for changes (e.g., when the
          Server node group sends data to a Client node group).</listitem>

          <listitem>Target Catalog ('target_catalog_name'): The name of the
          target catalog for which you would like to watch for
          changes.</listitem>

          <listitem>Target Schema ('target_schema_name'): The name of the
          target schema for which you would like to watch for
          changes.</listitem>

          <listitem>Target Table ('target_table_name'): The name of the target
          table for which you would like to watch for changes. The target
          catalog, target schema and target table name are used together to
          fully qualify the table for which you would like to watch for
          changes.</listitem>

          <listitem>Filter on Update ('filter_on_update'): Determines whether
          the load filter takes action (executes) on a database update
          statement.</listitem>

          <listitem>Filter on Insert ('filter_on_insert'): Determines whether
          the load filter takes action (executes) on a database insert
          statement.</listitem>

          <listitem>Filter on Delete ('filter_on_delete'): Determines whether
          the load filter takes action (executes) on a database delete
          statement.</listitem>

          <listitem>Before Write Script ('before_write_script'): The script to
          execute before the database write occurs.</listitem>

          <listitem>After Write Script ('after_write_script'): The script to
          execute after the database write occurs.</listitem>

          <listitem>Batch Complete Script ('batch_complete_script'): The
          script to execute after the entire batch completes.</listitem>

          <listitem>Batch Commit Script ('batch_commit_script'): The script to
          execute after the entire batch is committed.</listitem>

          <listitem>Batch Rollback Script ('batch_rollback_script'): The
          script to execute if the batch rolls back.</listitem>

          <listitem>Handle Error Script ('handle_error_script'): A script to
          execute if data cannot be processed.</listitem>

          <listitem>Load Filter Order ('load_filter_order'): The order in
          which load filters should execute if there are multiple scripts
          pertaining to the same source and target data.</listitem>
        </itemizedlist></para>
    </section>

    <section id="data-load-filter-variables">
      <title>Variables Available to Data Load Filters</title>

      <para>As part of the bean shell load filters, SymmetricDS provides
      certain variables for use in the bean shell script. Those variables
      include: <itemizedlist>
          <listitem>Symmetric Engine ('ENGINE'): The Symmetric engine
          object.</listitem>

          <listitem>Source Values ('&lt;COLUMN_NAME&gt;'): The source values
          for the row being inserted, updated or deleted.</listitem>

          <listitem>Old Values ('OLD_&lt;COLUMN_NAME&gt;'): The old values for
          the row being inserted, updated or deleted.</listitem>

          <listitem>Data Context ('CONTEXT'): The data context object for the
          data being inserted, updated or deleted.</listitem>

          <listitem>Table Data ('TABLE'): The table object for the table being
          inserted, updated or deleted.</listitem>
        </itemizedlist></para>
    </section>

    <section id="data-load-filter-examples">
      <title>Data Load Filter Example</title>

      <para>The following is an example of a load filter that watches a table
      named TABLE_TO_WATCH being loaded from the Client Node Group to the
      Server Node Group for inserts or updates, and performs an initial load
      of a table named TABLE_TO_RELOAD for rows whose KEY_FIELD is equal to
      the KEY_FIELD column of the row loaded into TABLE_TO_WATCH.
      <programlisting>
insert into sym_load_filter 
   (LOAD_FILTER_ID, LOAD_FILTER_TYPE, SOURCE_NODE_GROUP_ID, TARGET_NODE_GROUP_ID,
    TARGET_CATALOG_NAME, TARGET_SCHEMA_NAME, TARGET_TABLE_NAME, 
    FILTER_ON_UPDATE, FILTER_ON_INSERT, FILTER_ON_DELETE, 
    BEFORE_WRITE_SCRIPT, AFTER_WRITE_SCRIPT, BATCH_COMPLETE_SCRIPT, 
    BATCH_COMMIT_SCRIPT, BATCH_ROLLBACK_SCRIPT, HANDLE_ERROR_SCRIPT, 
    CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME,
    LOAD_FILTER_ORDER, FAIL_ON_ERROR)
values ('TABLE_TO_RELOAD','BSH','Client','Server',NULL,NULL,
    'TABLE_TO_WATCH',1,1,0,null,
    'engine.getDataService().reloadTable(context.getBatch().getSourceNodeId(), 
    table.getCatalog(), table.getSchema(),
    "TABLE_TO_RELOAD","KEY_FIELD=''" + KEY_FIELD + "''");'
    ,null,null,null,null,sysdate,'userid',sysdate,1,1);

</programlisting></para>
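
      <para>As a variation on the insert above, the following sketch acts
      only on rows whose new data carries a particular column value, by
      placing the extra criteria in the bean shell script itself. The
      <literal>STATUS</literal> column, the <literal>SHIPPED</literal> value,
      and the load filter id are hypothetical, and the sketch assumes that
      column values are exposed to the script as strings. <programlisting>
insert into sym_load_filter 
   (LOAD_FILTER_ID, LOAD_FILTER_TYPE, SOURCE_NODE_GROUP_ID, TARGET_NODE_GROUP_ID,
    TARGET_CATALOG_NAME, TARGET_SCHEMA_NAME, TARGET_TABLE_NAME, 
    FILTER_ON_UPDATE, FILTER_ON_INSERT, FILTER_ON_DELETE, 
    BEFORE_WRITE_SCRIPT, AFTER_WRITE_SCRIPT, BATCH_COMPLETE_SCRIPT, 
    BATCH_COMMIT_SCRIPT, BATCH_ROLLBACK_SCRIPT, HANDLE_ERROR_SCRIPT, 
    CREATE_TIME, LAST_UPDATE_BY, LAST_UPDATE_TIME,
    LOAD_FILTER_ORDER, FAIL_ON_ERROR)
values ('RELOAD_ON_SHIPPED','BSH','Client','Server',NULL,NULL,
    'TABLE_TO_WATCH',1,1,0,null,
    'if ("SHIPPED".equals(STATUS)) {
       engine.getDataService().reloadTable(context.getBatch().getSourceNodeId(), 
         table.getCatalog(), table.getSchema(),
         "TABLE_TO_RELOAD","KEY_FIELD=''" + KEY_FIELD + "''");
     }'
    ,null,null,null,null,sysdate,'userid',sysdate,2,1);
</programlisting>Writing the comparison as
      <literal>"SHIPPED".equals(STATUS)</literal> keeps the script from
      failing when the loaded value is null.</para>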
    </section>
  </section>

  <xi:include href="conflicts.xml"/>
</chapter>