<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="configuration"
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:ns="http://docbook.org/ns/docbook"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xmlns:html="http://www.w3.org/1999/xhtml">
<title>Configuration</title>
<para><xref linkend="planning" xrefstyle="select: label"/> introduced
numerous concepts and the analysis and design needed to create an
implementation of SymmetricDS. This chapter re-visits each analysis step and
documents how to turn a SymmetricDS design into reality through
configuration of the various SymmetricDS tables. In addition, several
advanced configuration options, not presented previously, will also be
covered.</para>
<section id="configuration-node-properties">
<title>Node Properties</title>
<para>To get a SymmetricDS node running, it needs to be given an identity
and it needs to know how to connect to the database it will be
synchronizing. The preferred way to configure a SymmetricDS engine is to
create a properties file in the engines directory. The SymmetricDS server
will create an engine for each properties file found in the engines
directory. When started up, SymmetricDS reads the synchronization
configuration and state from the database. If the configuration tables are
missing, they are created automatically (auto creation can be disabled).
Basic configuration is performed by inserting rows into the following tables
(the complete data model is defined in <xref linkend="data-model"/>).
<itemizedlist>
<listitem>
<para><xref linkend="table_node_group" xrefstyle="table"/> -
specifies the tiers that exist in a SymmetricDS network</para>
</listitem>
<listitem>
<para><xref linkend="table_node_group_link" xrefstyle="table"/> -
links two node groups together for synchronization</para>
</listitem>
<listitem>
<para><xref linkend="table_channel" xrefstyle="table"/> - grouping
and priority of synchronizations</para>
</listitem>
<listitem>
<para><xref linkend="table_trigger" xrefstyle="table"/> - specifies
tables, channels, and conditions for which changes in the database
should be captured</para>
</listitem>
<listitem>
<para><xref linkend="table_router" xrefstyle="table"/> - specifies
the routers defined for synchronization, along with other routing
details</para>
</listitem>
<listitem>
<para><xref linkend="table_trigger_router" xrefstyle="table"/> -
provides mappings of routers and triggers</para>
</listitem>
</itemizedlist></para>
<para>During start up, triggers are verified against the database, and
database triggers are installed on tables that require data changes to be
captured. The Route, Pull and Push Jobs begin running to synchronize
changes with other nodes.</para>
<para>Each node requires properties that allow it to connect to a database
and register with a parent node. Properties are configured in a file named
<code>xxxxx.properties</code> that is placed in the engines directory of
the SymmetricDS install. The file is usually named according to the
engine.name, but it is not a requirement.</para>
<para>To give a node its identity, the following properties are required.
Any other properties found in <code>conf/symmetric.properties</code> can
be overridden for a specific engine in an engine's properties file. If the
properties are changed in <code>conf/symmetric.properties</code> they will
take effect across all engines deployed to the server. Note that you can
use the variable <literal>$(hostName)</literal> to represent the host name
of the machine when defining these properties (for example,
<literal>external.id=$(hostName)</literal>).</para>
<variablelist>
<varlistentry>
<term>
<command>engine.name</command>
</term>
<listitem>
<para>This is an arbitrary name that is used to access a specific
engine using an HTTP URL. Each node configured in the engines
directory must have a unique engine name. The engine name is also
used for the domain name of registered JMX beans.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>group.id</command>
</term>
<listitem>
<para>The node group that this node is a member of. Synchronization
is specified between node groups, which means you only need to
specify it once for multiple nodes in the same group.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>external.id</command>
</term>
<listitem>
<para>The external id for this node has meaning to the user and
provides integration into the system where it is deployed. For
example, it might be a retail store number or a region number. The
external id can be used in expressions for conditional and subset
data synchronization. Behind the scenes, each node has a unique
sequence number for tracking synchronization events. That makes it
possible to assign the same external id to multiple nodes, if
desired.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>sync.url</command>
</term>
<listitem>
<para>The URL where this node can be contacted for synchronization.
At startup and during each heartbeat, the node updates its entry in
the database with this URL. The sync url is of the format:
<code>http://{hostname}:{port}/{webcontext}/sync/{engine.name}</code>.</para>
<para>The {webcontext} is blank for a standalone deployment. It will
typically be the name of the war file for an application server
deployment.</para>
<para>The {engine.name} can be left blank if there is only one
engine deployed in a SymmetricDS server.</para>
</listitem>
</varlistentry>
</variablelist>
<para>When a new node is first started, it has no information about
synchronizing. It contacts the registration server in order to join the
network and receive its configuration. The configuration for all nodes is
stored on the registration server, and the URL must be specified in the
following property:</para>
<variablelist>
<varlistentry>
<term>
<command>registration.url</command>
</term>
<listitem>
<para>The URL where this node can connect for registration to
receive its configuration. The registration server is part of
SymmetricDS and is enabled as part of the deployment. This is
typically equal to the value of the sync.url of the registration
server.</para>
</listitem>
</varlistentry>
</variablelist>
<important>
<para>Note that a <emphasis>registration server node</emphasis> is
defined as one whose <literal>registration.url</literal> is either (a)
blank, or (b) identical to its <literal>sync.url</literal>.</para>
</important>
<para>For a deployment where the database connection pool should be
created using a JDBC driver, set the following properties:</para>
<variablelist>
<varlistentry>
<term>
<command>db.driver</command>
</term>
<listitem>
<para>The class name of the JDBC driver.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.url</command>
</term>
<listitem>
<para>The JDBC URL used to connect to the database.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.user</command>
</term>
<listitem>
<para>The database username, which is used to login, create, and
update SymmetricDS tables.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.password</command>
</term>
<listitem>
<para>The password for the database user.</para>
</listitem>
</varlistentry>
</variablelist>
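<para>As an illustration, the properties described in this section might
be combined in a single engine file such as the following sketch for a
store node. Every value is an assumption chosen for the example (the file
name <code>store-001.properties</code>, the host names, the port, the
engine name of the registration server, and the PostgreSQL driver and
URL); substitute the values for your own deployment. <programlisting>
# engines/store-001.properties (hypothetical file name)
engine.name=store-001
group.id=store
external.id=001

# Where other nodes can reach this node (standalone, so no web context)
sync.url=http://store001.example.com:8080/sync/store-001

# Typically the sync.url of the registration server (assumed "corp" node)
registration.url=http://corp.example.com:8080/sync/corp-000

# JDBC connection pool settings
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost/store001
db.user=symmetric
db.password=secret</programlisting></para>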
</section>
<section id="configuration-node">
<title>Node</title>
<para>A <emphasis>node</emphasis>, a single instance of SymmetricDS, is
defined in the <xref linkend="table_node" xrefstyle="table"/> table. Two
other tables play a direct role in defining a node, as well. The first is
<xref linkend="table_node_identity" xrefstyle="table"/>. The
<emphasis>only</emphasis> row in this table is inserted in the database
when the node first <emphasis>registers</emphasis> with a parent node. In
the case of a root node, the row is entered by the user. The row is used
by a node instance to determine its node identity.</para>
<para>The following SQL statements set up a top-level registration server
as a node identified as "00000" in the "corp" node group. <programlisting>
insert into SYM_NODE
(node_id, node_group_id, external_id, sync_enabled)
values
('00000', 'corp', '00000', 1);
insert into SYM_NODE_IDENTITY values ('00000');</programlisting></para>
<para>The second table, <xref linkend="table_node_security"
xrefstyle="table"/>, has rows created for each <emphasis>child</emphasis>
node that registers with the node, assuming auto-registration is enabled.
If auto registration is not enabled, you must create a row in <xref
linkend="table_node" xrefstyle="table"/> and <xref
linkend="table_node_security" xrefstyle="table"/> for the node to be able
to register. You can also, with this table, manually cause a node to
re-register or do a re-initial load by setting the corresponding columns
in the table itself. Registration is discussed in more detail in <xref
linkend="configuration-registration"/>.</para>
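<para>For example, a re-initial load or a re-registration of one child
node might be requested with updates like the following sketch. The column
names <literal>initial_load_enabled</literal> and
<literal>registration_enabled</literal> are assumed to match the
definition of <xref linkend="table_node_security" xrefstyle="table"/> in
your version, and the node id '001' is hypothetical; check the data model
before relying on them. <programlisting>
-- ask node 001 to receive a fresh initial load
update SYM_NODE_SECURITY
set initial_load_enabled = 1
where node_id = '001';

-- allow node 001 to register again
update SYM_NODE_SECURITY
set registration_enabled = 1
where node_id = '001';</programlisting></para>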
</section>
<section id="configuration-node-group">
<title>Node Group</title>
<para>Node Groups are straightforward to configure and are defined in the
<xref linkend="table_node_group" xrefstyle="table"/> table. The following
SQL statements would create node groups for "corp" and "store" based on
our retail store example. <programlisting>
insert into SYM_NODE_GROUP
(node_group_id, description)
values
('store', 'A retail store node');
insert into SYM_NODE_GROUP
(node_group_id, description)
values
('corp', 'A corporate node');</programlisting></para>
</section>
<section id="configuration-node-group-link">
<title>Node Group Link</title>
<para>Similarly, Node Group links are established using a data event
action of 'P' for Push and 'W' for Pull ("wait"). The following SQL
statements link the "corp" and "store" node groups for synchronization.
They configure the "store" nodes to push their data changes to the "corp"
nodes, and the "corp" nodes to send changes to "store" nodes by waiting
for a pull. <programlisting>
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('store', 'corp', 'P');
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('corp', 'store', 'W');</programlisting></para>
<para>A node group link can be configured to use the same node group as
the source and the target. This configuration allows a node group to sync
with every other node in its group.</para>
<para>A third type of link action of 'R' for 'Route Only' exists if you want to associate a router with a link that will not move the data.
This action type might be useful when using an XML publishing router or an audit table changes router.</para>
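<para>As a sketch of the same-group case described above, the following
statement would link the "store" node group to itself so that store nodes
push changes to the other nodes in their group. The choice of 'P' rather
than 'W' is only an assumption for illustration. <programlisting>
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('store', 'store', 'P');</programlisting></para>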
</section>
<section id="configuration-channel">
<title>Channel</title>
<para>By categorizing data into channels and assigning them to <xref
linkend="table_trigger" xrefstyle="table"/>s, the user gains more control
and visibility into the flow of data. In addition, SymmetricDS allows for
synchronization to be enabled, suspended, or scheduled by channels as
well. The frequency of synchronization and order that data gets
synchronized is also controlled at the channel level.</para>
<para>The following SQL statements set up channels for a retail store. An
"item" channel includes data for items and their prices, while a
"sale_transaction" channel includes data for ringing sales at a register.
<programlisting>
insert into SYM_CHANNEL
(channel_id, processing_order, max_batch_size, max_batch_to_send,
extract_period_millis, batch_algorithm, enabled, description)
values
('item', 10, 1000, 10, 0, 'default', 1, 'Item and pricing data');
insert into SYM_CHANNEL
(channel_id, processing_order, max_batch_size, max_batch_to_send,
extract_period_millis, batch_algorithm, enabled, description)
values
('sale_transaction', 1, 1000, 10, 60000, 'transactional', 1,
'retail sale transactions from register');</programlisting></para>
<para>Batching is the grouping of data, by channel, to be transferred and
committed at the client together. There are three different out-of-the-box
batching algorithms which may be configured in the batch_algorithm column
on channel. <variablelist>
<varlistentry>
<term>
<command>default</command>
</term>
<listitem>
<para>All changes that happen in a transaction are guaranteed to
be batched together. Multiple transactions will be batched and
committed together until there is no more data to be sent or the
max_batch_size is reached.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>transactional</command>
</term>
<listitem>
<para>Batches will map directly to database transactions. If there
are many small database transactions, then there will be many
batches. The max_batch_size column has no effect.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>nontransactional</command>
</term>
<listitem>
<para>Multiple transactions will be batched and committed together
until there is no more data to be sent or the max_batch_size is
reached. The batch will be cut off at the max_batch_size
regardless of whether it is in the middle of a transaction.</para>
</listitem>
</varlistentry>
</variablelist></para>
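<para>To make the difference concrete, consider a hypothetical channel
with max_batch_size set to 1000 and three pending transactions that
changed 600, 600, and 300 rows, in that order. Reading the descriptions
above literally, the algorithms would be expected to batch them roughly as
follows (the exact cut-off points may differ between versions):
<itemizedlist>
<listitem>
<para><literal>default</literal> - two batches of 1200 and 300 rows.
The limit is reached partway through the second transaction, but the
batch is not closed until that transaction is complete.</para>
</listitem>
<listitem>
<para><literal>transactional</literal> - three batches of 600, 600,
and 300 rows, one per transaction.</para>
</listitem>
<listitem>
<para><literal>nontransactional</literal> - two batches of 1000 and
500 rows. The second transaction is split across the two
batches.</para>
</listitem>
</itemizedlist></para>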
<para>If a channel contains <emphasis>only</emphasis> tables that will be
synchronized in one direction and data is routed to all the nodes in
the target node groups, then batching on the channel can be optimized to
share batches across nodes. This is an important feature when data needs
to be routed to thousands of nodes. When this mode is detected, you will
see batches created in <xref linkend="table_outgoing_batch"
xrefstyle="table"/> with the <literal>common_flag</literal> set to
1.</para>
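<para>One way to check whether common batches are being created is to
query for the flag directly. This is a minimal sketch; the column names
other than <literal>common_flag</literal> are assumed from the table
definition. <programlisting>
select batch_id, node_id, channel_id, status
from SYM_OUTGOING_BATCH
where common_flag = 1;</programlisting></para>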
<para>There are also several size-related parameters that can be set by
channel. They include: <variablelist>
<varlistentry>
<term>
<command>max_batch_size</command>
</term>
<listitem>
<para>Specifies the maximum number of data events to process
within a batch for this channel.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>max_batch_to_send</command>
</term>
<listitem>
<para>Specifies the maximum number of batches to send for a given
channel during a 'synchronization' between two nodes. A
'synchronization' is equivalent to a push or a pull. For example,
if there are 12 batches ready to be sent for a channel and
max_batch_to_send is equal to 10, then only the first 10 batches
will be sent even though 12 batches are ready.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>max_data_to_route</command>
</term>
<listitem>
<para>Specifies the maximum number of data rows to route for a
channel at a time.</para>
</listitem>
</varlistentry>
</variablelist></para>
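<para>For instance, the limits on the "item" channel defined earlier could
be adjusted with an update such as the following. The values are arbitrary
and serve only to illustrate the columns. <programlisting>
update SYM_CHANNEL
set max_batch_size = 5000,
    max_batch_to_send = 20,
    max_data_to_route = 100000
where channel_id = 'item';</programlisting></para>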
<para>Based on your particular synchronization requirements, you can also
specify whether old, new, and primary key data should be read and included
during routing for a given channel. These are controlled by the columns
use_old_data_to_route, use_row_data_to_route, and use_pk_data_to_route,
respectively. By default, they are all 1 (true).</para>
<para>Finally, if data on a particular channel contains big lobs, you can
set the column contains_big_lob to 1 (true) to provide SymmetricDS the
hint that the channel contains big lobs. Some databases have shortcuts
that SymmetricDS can take advantage of if it knows that the lob columns in
<xref linkend="table_data" xrefstyle="table"/> aren't going to contain
large lobs. The definition of how large a 'big' lob is varies from
database to database.</para>
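<para>A sketch of how the routing and lob flags described above might be
set on the "sale_transaction" channel follows. It assumes, purely for
illustration, that routing on this channel never needs the old row values
and that the channel carries large lob columns. <programlisting>
update SYM_CHANNEL
set use_old_data_to_route = 0,
    contains_big_lob = 1
where channel_id = 'sale_transaction';</programlisting></para>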
</section>
<section id="configuration-triggers-and-routers">
<title>Triggers, Routers, and Trigger / Routers Mappings</title>
<para>In order to synchronize data, you must define at least one trigger, at least one router,
and provide at least one link between the two (known as a trigger-router).
</para>
<section id="configuration-trigger">
<title>Trigger</title>
<para>SymmetricDS captures synchronization data using database triggers.
SymmetricDS' Triggers are defined in the <xref linkend="table_trigger"
xrefstyle="table"/> table. Each record is used by SymmetricDS when
generating database triggers. Database triggers are only generated when
a trigger is associated with a <xref linkend="table_router"
xrefstyle="table"/> whose <literal>source_node_group_id</literal>
matches the node group id of the current node.</para>
<para>The <literal>source_table_name</literal> may contain the asterisk
('*') wildcard character so that one <xref linkend="table_trigger"
xrefstyle="table"/> table entry can define synchronization for many
tables. System tables and any tables that start with the SymmetricDS
table prefix will be excluded. A list of wildcard tokens can also be
supplied. If there are multiple tokens, they should be delimited with a
comma. A wildcard token can also start with a bang ('!') to indicate an
exclusive match. Tokens are always evaluated from left to right. When a
table match is made, the table is either added to or removed from the
list of tables. If another trigger already exists for a table, then that
table is not included in the wildcard match (the explicitly defined
trigger entry takes precedence).</para>
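<para>As a sketch of the wildcard syntax, the following trigger would
capture changes for every table whose name starts with "sale_" except a
table named "sale_archive". The table names and the trigger id are
hypothetical. <programlisting>
insert into SYM_TRIGGER
(trigger_id,source_table_name,channel_id,last_update_time,create_time)
values
('sale_tables', 'sale_*,!sale_archive', 'sale_transaction',
current_timestamp, current_timestamp);</programlisting></para>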
<para>When determining whether a data change has occurred or not, by
default the triggers will record a change even if the data was updated to
the same value(s) it already had. For example, a data change will
be captured if an update of one column in a row updated the value to the
same value it already was. There is a global property,
<literal>trigger.update.capture.changed.data.only.enabled</literal>
(false by default), that allows you to override this behavior. When set
to true, SymmetricDS will only capture a change if the data has truly
changed (i.e., when the new column data is not equal to the old column
data).</para>
<important>The property
<literal>trigger.update.capture.changed.data.only.enabled</literal> is
currently only supported in the MySQL, DB2 and Oracle
dialects.</important>
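<para>For example, the property could be enabled for all engines by adding
the following line to <code>conf/symmetric.properties</code>, or for a
single engine by adding it to that engine's properties file.
<programlisting>
trigger.update.capture.changed.data.only.enabled=true</programlisting></para>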
<para>The following SQL statement defines a trigger that will capture
data for a table named "item" whenever data is inserted, updated, or
deleted. The trigger is assigned to a channel also called 'item'.
<programlisting>
insert into SYM_TRIGGER
(trigger_id,source_table_name,channel_id,last_update_time,create_time)
values
('item', 'item', 'item', current_timestamp, current_timestamp);
</programlisting></para>
<important>
<para>Note that many databases allow for multiple triggers of the same
type to be defined. Each database defines the order in which the
triggers fire differently. If you have additional triggers beyond
those SymmetricDS installs on your table, please consult your database
documentation to determine if there will be issues with the ordering
of the triggers.</para>
</important>
<section id="configuration-trigger-lobs">
<title>Large Objects</title>
<para>Two lob-related settings are also available on <xref
linkend="table_trigger" xrefstyle="table"/>: <variablelist>
<varlistentry>
<term>
<command>use_stream_lobs</command>
</term>
<listitem>
<para>Specifies whether to capture lob data as the trigger is
firing or to stream lob columns from the source tables using
callbacks during extraction. A value of 1 indicates to stream
from the source via callback; a value of 0 indicates that lob
data is captured by the trigger.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>use_capture_lobs</command>
</term>
<listitem>
<para>Provides a hint as to whether this trigger will capture
big lobs data. If set to 1 every effort will be made during data
capture in trigger and during data selection for initial load to
use lob facilities to extract and store data in the
database.</para>
</listitem>
</varlistentry>
</variablelist></para>
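<para>For example, a trigger with the id 'item' could be switched to
stream its lob columns at extraction time with an update like the
following sketch. <programlisting>
update SYM_TRIGGER
set use_stream_lobs = 1
where trigger_id = 'item';</programlisting></para>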
</section>
<section id="configuration-trigger-external-select">
<title>External Select</title>
<para>Occasionally, you may find that you need to capture and save away a piece of data present in another table when a trigger is firing.
This data is typically needed for
the purposes of determining where to 'route' the data to once routing takes place. Each trigger definition contains an optional
<literal>external_select</literal> field which can be used to specify the data to be captured.
Once captured, this data is available during routing in <xref
linkend="table_data" xrefstyle="table"/>'s <literal>external_data</literal> field.
For these cases, place a SQL select statement which returns the data item you need for routing in <literal>external_select</literal>.
An example of the use of external select can be found in <xref
linkend="configuration-routing-external-select"/>.</para>
</section>
</section>
<section id="configuration-router">
<title>Router</title>
<para>Routers provided in the base implementation currently include:
<itemizedlist>
<listitem>Default Router - a router that sends all data to all nodes
that belong to the target node group defined in the
router.</listitem>
<listitem>Column Match Router - a router that compares old or new
column values to a constant value or the value of a node's
external_id or node_id.</listitem>
<listitem>Lookup Router - a router which can be configured to
determine routing based on an existing or ancillary table
specifically for the purpose of routing data.</listitem>
<listitem>Subselect Router - a router that executes a SQL expression
against the database to select nodes to route to. This SQL
expression can be passed values of old and new column
values.</listitem>
<listitem>Scripted Router - a router that executes a Bean Shell
script expression in order to select nodes to route to. The script
can use the old and new column values.</listitem>
<listitem>Xml Publishing Router - a router that publishes data
changes directly to a messaging solution instead of transmitting
changes to registered nodes. This router must be configured manually
in XML as an extension point.</listitem>
<listitem>Audit Table Router - a router that inserts into an automatically created audit table. It records captured changes
to tables that it is linked to.
</listitem>
</itemizedlist> The mapping between the set of triggers and set of
routers is many-to-many. This means that one trigger can capture changes
and route to multiple locations. It also means that one router can be
defined and associated with many different triggers.</para>
<section id="configuration-default-router">
<title>Default Router</title>
<para>The simplest router is a router that sends all the data that is
captured by its associated triggers to all the nodes that belong to
the target node group defined in the router. A router is defined as a
row in the <xref linkend="table_router" xrefstyle="table"/> table. It
is then linked to triggers in the <xref linkend="table_trigger_router"
xrefstyle="table"/> table.</para>
<para>The following SQL statement defines a router that will send data
from the 'corp' group to the 'store' group. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id,
create_time, last_update_time)
values
('corp-2-store','corp', 'store', current_timestamp, current_timestamp);
</programlisting></para>
<para>The following SQL statement maps the 'corp-2-store' router to
the item trigger. <programlisting>
insert into SYM_TRIGGER_ROUTER
(trigger_id, router_id, initial_load_order, create_time, last_update_time)
values
('item', 'corp-2-store', 1, current_timestamp, current_timestamp);
</programlisting></para>
</section>
<section id="configuration-column-match-router">
<title>Column Match Router</title>
<para>Sometimes requirements call for data to be routed
based on the current value or the old value of a column in the table
that is being routed. Column routers are configured by setting the
<literal>router_type</literal> column on the <xref
linkend="table_router" xrefstyle="table"/> table to
<literal>column</literal> and setting the
<literal>router_expression</literal> column to an equality expression
that represents the expected value of the column.</para>
<para>The first part of the expression is always the column name. The
column name should always be defined in upper case. The upper case
column name prefixed by OLD_ can be used for a comparison being done
with the old column data value.</para>
<para>The second part of the expression can be a constant value, a
token that represents another column, or a token that represents some
other SymmetricDS concept. Token values always begin with a colon
(:).</para>
<para>Consider a table that needs to be routed to all nodes in the
target group only when a status column is set to 'READY TO SEND'. The
following SQL statement will insert a column router to accomplish
that. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-ok','corp', 'store', 'column',
'STATUS=READY TO SEND', current_timestamp, current_timestamp);
</programlisting></para>
<para>Consider a table that needs to be routed to all nodes in the
target group only when a status column changes values. The following
SQL statement will insert a column router to accomplish that. Note the
use of OLD_STATUS, where the OLD_ prefix gives access to the old
column value. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-status','corp', 'store', 'column',
'STATUS!=:OLD_STATUS', current_timestamp, current_timestamp);
</programlisting></para>
<para>Consider a table that needs to be routed to only nodes in the
target group whose STORE_ID column matches the external id of a node.
The following SQL statement will insert a column router to accomplish
that. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-id','corp', 'store', 'column',
'STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting> Attributes on a <xref linkend="table_node"
xrefstyle="table"/> that can be referenced with tokens include:
<itemizedlist>
<listitem>:NODE_ID</listitem>
<listitem>:EXTERNAL_ID</listitem>
<listitem>:NODE_GROUP_ID</listitem>
</itemizedlist>
Captured EXTERNAL_DATA is also available for routing as a virtual column.
</para>
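<para>As a sketch of routing on captured external data, a column router
might compare EXTERNAL_DATA to each node's external id. The router id is
hypothetical, and the example assumes that the trigger's
<literal>external_select</literal> returns a store id.
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-ext','corp', 'store', 'column',
'EXTERNAL_DATA=:EXTERNAL_ID', current_timestamp, current_timestamp);</programlisting></para>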
<para>Consider a table that needs to be routed to a redirect node
defined by its external id in the <xref
linkend="table_registration_redirect" xrefstyle="table"/> table. The
following SQL statement will insert a column router to accomplish
that. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-redirect','corp', 'store', 'column',
'STORE_ID=:REDIRECT_NODE', current_timestamp, current_timestamp);
</programlisting></para>
<para>More than one column may be configured in a router_expression.
When more than one column is configured, all matches are added to the
list of nodes to route to. The following is an example where the
STORE_ID column may contain the STORE_ID to route to or the constant
of ALL which indicates that all nodes should receive the update.
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-multiple-matches','corp', 'store', 'column',
'STORE_ID=ALL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting></para>
<para>The NULL keyword may be used to check if a column is null. If
the column is null, then data will be routed to all nodes who qualify
for the update. The following is an example where the STORE_ID column
is used to route to a set of nodes who have a STORE_ID equal to their
EXTERNAL_ID, or to all nodes if the STORE_ID is null. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-multiple-matches','corp', 'store', 'column',
'STORE_ID=NULL or STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
</programlisting></para>
</section>
<section id="configuration-lookup-table-router">
<title>Lookup Table Router</title>
<para>A lookup table may contain the id of the node where data needs
to be routed. This could be an existing table or an ancillary table
that is added specifically for the purpose of routing data. Lookup
table routers are configured by setting the
<literal>router_type</literal> column on the <xref
linkend="table_router" xrefstyle="table"/> table to
<literal>lookuptable</literal> and setting a list of configuration
parameters in the <literal>router_expression</literal> column.</para>
<para>All of the following configuration parameters are required.
<variablelist>
<varlistentry>
<term>
<command>LOOKUP_TABLE</command>
</term>
<listitem>
<para>This is the name of the lookup table.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>KEY_COLUMN</command>
</term>
<listitem>
<para>This is the name of the column on the table that is
being routed. It will be used as a key into the lookup
table.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>LOOKUP_KEY_COLUMN</command>
</term>
<listitem>
<para>This is the name of the column that is the key on the
lookup table.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>EXTERNAL_ID_COLUMN</command>
</term>
<listitem>
<para>This is the name of the column that contains the
external_id of the node to route to on the lookup
table.</para>
</listitem>
</varlistentry>
</variablelist></para>
<para>Note that the lookup table will be read into memory and cached
for the duration of a routing pass for a single channel.</para>
<para>Consider a table that needs to be routed to a specific store,
but the data in the changing table only contains brand information. In
this case, the STORE table may be used as a lookup table.
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-ok','corp', 'store', 'lookuptable',
'LOOKUP_TABLE=STORE
KEY_COLUMN=BRAND_ID
LOOKUP_KEY_COLUMN=BRAND_ID
EXTERNAL_ID_COLUMN=STORE_ID', current_timestamp, current_timestamp);
</programlisting></para>
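<para>To make the lookup concrete, the sketch below assumes a
hypothetical STORE table that has the two columns named in the router
above. The table definition, column types, and rows are illustrative
only. <programlisting>
-- Hypothetical lookup table referenced by LOOKUP_TABLE=STORE
create table STORE (
  STORE_ID varchar(50) not null primary key,  -- EXTERNAL_ID_COLUMN
  BRAND_ID varchar(50) not null               -- LOOKUP_KEY_COLUMN
);
insert into STORE (STORE_ID, BRAND_ID) values ('001', 'A');
insert into STORE (STORE_ID, BRAND_ID) values ('002', 'A');
insert into STORE (STORE_ID, BRAND_ID) values ('003', 'B');
</programlisting> With this data, a captured row whose BRAND_ID is 'A'
would be expected to route to the nodes with external ids 001 and 002,
and a row whose BRAND_ID is 'B' to the node with external id
003.</para>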
</section>
<section id="configuration-subselect-router">
<title>Subselect Router</title>
<para>Sometimes routing decisions need to be made based on data that
is not in the current row being synchronized. A 'subselect' router can be used
in these cases. A 'subselect' router is configured with a <literal>router_expression</literal> that is
used in a SQL select statement which returns a result set of the node ids that
need to be routed to. Column tokens can be used in the SQL expression and
will be replaced with row column data. The overhead of using this
router type is high because the 'subselect' statement runs for each
row that is routed. It should not be used for tables with a high volume
of updated rows. It also has the disadvantage that if the
data being relied on to determine the node id has been deleted before
routing takes place, then no results will be returned and the row
will not be routed.</para>
<para>The <literal>router_expression</literal> you specify is appended to the
following SQL statement in order to select the node ids:
<programlisting>select c.node_id from sym_node c where
c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and ...
</programlisting>
</para>
<para>As you can see, you have access to information about the node currently under consideration for routing
through the 'c' alias, for example <literal>c.external_id</literal>.
There are two node-related tokens you can use in your expression:
<itemizedlist>
<listitem>:NODE_GROUP_ID</listitem>
<listitem>:EXTERNAL_DATA</listitem>
</itemizedlist>
Column names representing data for the row in question are prefixed with a colon as well, for example
<literal>:EMPLOYEE_ID</literal> or <literal>:OLD_EMPLOYEE_ID</literal>. Here, the OLD_ prefix indicates the value before
the change in cases where the old data has been captured.
</para><para>
For example, consider the case where an Order table and an OrderLineItem table need to be routed to a
specific store. The Order table has a column named order_id and
STORE_ID. A store node has an external_id that is equal to the
STORE_ID on the Order table. OrderLineItem, however, only has a
foreign key to its Order of order_id. To route OrderLineItems to the
same nodes that the Order will be routed to, we need to reference the
master Order record.</para>
<para>There are two possible ways to solve this in
SymmetricDS. One is to configure a 'subselect' router_type on the
<xref linkend="table_router" xrefstyle="table"/> table, shown below. The other
approach is to use an <literal>external_select</literal> to capture the data via a trigger for use in
a column match router, demonstrated in <xref
linkend="configuration-routing-external-select" />.
</para>
<para>Our solution utilizing subselect compares the external id
of the current node with the store id from the Order table where the order id matches
the order id of the current row being routed:
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store','corp', 'store', 'subselect',
'c.external_id in (select STORE_ID from order where order_id=:ORDER_ID)',
current_timestamp, current_timestamp);
</programlisting></para>
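<para>Putting the two pieces together, the statement that is effectively
run for each routed row would look roughly like the following. This is a
sketch that assumes the expression is appended verbatim to the base
statement shown earlier. <programlisting>
select c.node_id from sym_node c where
c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and
c.external_id in (select STORE_ID from order where order_id=:ORDER_ID)
</programlisting> Here <literal>:ORDER_ID</literal> is replaced with the
order_id of the OrderLineItem row being routed.</para>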
<para>Note that in this example the parent row in Order must still exist at the moment of routing for the
child rows (OrderLineItem) to route, since the select statement is run when routing occurs, not when the change data is first captured.
</para>
</section>
<section id="configuration-scripted-router">
<title>Scripted Router</title>
<para>When more flexibility is needed in the logic to choose the nodes
to route to, then a scripted router may be used. The currently
available scripting language is Bean Shell. Bean Shell is a Java-like
scripting language. Documentation for the Bean Shell scripting
language can be found at <ulink
url="http://www.beanshell.org/">http://www.beanshell.org</ulink>.</para>
<para>The router_type for a Bean Shell scripted router is 'bsh'. The
router_expression is a valid Bean Shell script that does one of the following: <itemizedlist>
<listitem>adds node ids to the <code>targetNodes</code> collection
which is bound to the script</listitem>
<listitem>returns a new collection of node ids</listitem>
<listitem>returns a single node id</listitem>
<listitem>returns true to indicate that all nodes should be routed
or returns false to indicate that no nodes should be
routed</listitem>
</itemizedlist> Also bound to the script evaluation is a list of
<code>nodes</code>. The list of <code>nodes</code> is a list of
eligible <code>org.jumpmind.symmetric.model.Node</code> objects. The
current data column values and the old data column values are bound to
the script evaluation as Java object representations of the column
data. The columns are bound using the uppercase names of the columns.
Old values are bound to uppercase representations that are prefixed
with 'OLD_'.</para>
<para>If you need access to any of the SymmetricDS services, then the
instance of <code>org.jumpmind.symmetric.ISymmetricEngine</code> is
accessible via the bound <code>engine</code> variable.</para>
<para>In the following example, the node_id is a combination of
STORE_ID and WORKSTATION_NUMBER, both of which are columns on the
table that is being routed. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-bsh','corp', 'store', 'bsh',
'targetNodes.add(STORE_ID + "-" + WORKSTATION_NUMBER);',
current_timestamp, current_timestamp);
</programlisting></para>
<para>The same could also be accomplished by simply returning the node
id. The last line of a bsh script is always the return value.
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-bsh','corp', 'store', 'bsh',
'STORE_ID + "-" + WORKSTATION_NUMBER',
current_timestamp, current_timestamp);
</programlisting></para>
<para>The following example will synchronize to all nodes if the FLAG
column has changed, otherwise no nodes will be synchronized. Note that
here we make use of OLD_, which provides access to the old column
value. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-flag-changed','corp', 'store', 'bsh',
'FLAG != null &amp;&amp; !FLAG.equals(OLD_FLAG)',
current_timestamp, current_timestamp);
</programlisting></para>
<para>The next example shows a script that iterates over each eligible
node and checks to see if the trimmed value of the column named
STATION equals the external_id. <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-trimmed-station','corp', 'store', 'bsh',
'for (org.jumpmind.symmetric.model.Node node : nodes) {
if (STATION != null &amp;&amp; node.getExternalId().equals(STATION.trim())) {
targetNodes.add(node.getNodeId());
}
}',
current_timestamp, current_timestamp);
</programlisting></para>
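<para>Returning a new collection of node ids is one of the options listed
above. A minimal sketch follows, assuming a hypothetical REGION column
on the routed table whose string value is a prefix of the external ids
that should receive the row. The router_id is illustrative as well.
<programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-by-region','corp', 'store', 'bsh',
'java.util.List ids = new java.util.ArrayList();
if (REGION != null) {
  for (org.jumpmind.symmetric.model.Node node : nodes) {
    if (node.getExternalId().startsWith(REGION)) {
      ids.add(node.getNodeId());
    }
  }
}
return ids;',
current_timestamp, current_timestamp);
</programlisting></para>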
</section>
<section id="configuration-audit-table-router">
<title>Audit Table Router</title>
<para>This router audits captured data by recording the change in an audit table
that the router creates and keeps up to date (as long as <code>auto.config.database</code> is
set to true). The router creates a table named the same as the table for which
data was captured, with the suffix _AUDIT. It will contain all of the same columns
as the original table with the same data types, except that each column is nullable with no default
value. </para>
<para>Three extra "AUDIT" columns are added to the table: <itemizedlist>
<listitem>AUDIT_ID - the primary key of the table.</listitem>
<listitem>AUDIT_TIME - the time at which the change occurred.</listitem>
<listitem>AUDIT_EVENT - the DML type that happened to the row.</listitem>
</itemizedlist> </para>
<para>The following is an example of an audit router: <programlisting>
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
create_time, last_update_time)
values
('audit_at_corp','corp', 'local', 'audit', current_timestamp, current_timestamp);
</programlisting></para>
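<para>Once changes have been captured, the audit table can be queried
like any other table. The sketch below assumes a hypothetical ITEM table
is being audited, so the router would maintain a table named ITEM_AUDIT.
The ITEM_ID column is likewise illustrative. <programlisting>
select AUDIT_ID, AUDIT_TIME, AUDIT_EVENT, ITEM_ID
from ITEM_AUDIT
order by AUDIT_TIME;
</programlisting></para>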