<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="basic-configuration" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg" xmlns:ns="http://docbook.org/ns/docbook"
xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml">
<title>Basic Configuration</title>
<para>
To get a SymmetricDS node running, it needs to be given an identity and it needs to know how
to connect to the database it will be synchronizing. A typical way to specify this is to place properties
in the symmetric.properties file. When started up, SymmetricDS reads the configuration
and state from the database. If the configuration tables are missing, they are created
automatically (auto creation can be disabled). Basic configuration is described by inserting into the following tables:
<itemizedlist>
<listitem>
<para><xref linkend="node_group" xrefstyle="select: title page"/> - specify the tiers that exist in a topology</para>
</listitem>
<listitem>
<para><xref linkend="node_group_link" xrefstyle="select: title page"/> - link two node groups together for synchronization</para>
</listitem>
<listitem>
<para><xref linkend="channel" xrefstyle="select: title page"/> - grouping to control synchronizations</para>
</listitem>
<listitem>
<para><xref linkend="trigger" xrefstyle="select: title page"/> - specify tables, channels, and conditions for which changes in the database should be captured</para>
</listitem>
<listitem>
<para><xref linkend="router" xrefstyle="select: title page"/> - specify the <xref linkend="node_group_link" xrefstyle="select: title page"/> to be used for synchronization, along with other routing details</para>
</listitem>
<listitem>
<para><xref linkend="trigger_router" xrefstyle="select: title page"/> - map triggers to routers</para>
</listitem>
</itemizedlist>
During start up, triggers are verified against the database, and database triggers
are installed on tables that require data changes to be captured. The route job, pull job and push job
begin running to synchronize changes with other nodes.
</para>
<section id="basic-properties">
<title>Basic Properties</title>
<para>
Each node requires properties that allow it to connect to a database and register
with a parent node. To give a node its identity, the following properties are used:
</para>
<variablelist>
<varlistentry>
<term>
<command>group.id</command>
</term>
<listitem>
<para>
The node group that this node is a member of. Synchronization is specified
between node groups, which means you only need to specify it once for
multiple nodes in the same group.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>external.id</command>
</term>
<listitem>
<para>
The external id for this node has meaning to the user and provides
integration into the system where it is deployed. For example, it might be a
retail store number or a region number. The external id can be used in
expressions for conditional and subset data synchronization. Behind the
scenes, each node has a unique sequence number for tracking synchronization
events. That makes it possible to assign the same external id to multiple
nodes, if desired.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>sync.url</command>
</term>
<listitem>
<para>
The URL where this node can be contacted for synchronization.
At startup and during each heartbeat, the node updates its entry in
the database with this URL.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
When a new node is first started, it has no information about synchronizing. It
contacts the registration server in order to join the network and receive its
configuration. The configuration for all nodes is stored on the registration server, and
the URL must be specified in the following property:
</para>
<variablelist>
<varlistentry>
<term>
<command>registration.url</command>
</term>
<listitem>
<para>
The URL where this node can connect for registration to receive its
configuration. The registration server is part of SymmetricDS and is enabled
as part of the deployment.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
When deploying to an application server, it is common for database connection pools
to be found in the Java naming directory (JNDI). In this case, set the following property:
</para>
<variablelist>
<varlistentry>
<term>
<command>db.jndi.name</command>
</term>
<listitem>
<para>
The name of the database connection pool to use, which is registered in the JNDI
directory tree of the application server. It is recommended that this DataSource is
NOT transactional, because SymmetricDS will handle its own transactions.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
For a deployment where the database connection pool should be created using a JDBC driver,
set the following properties:
</para>
<variablelist>
<varlistentry>
<term>
<command>db.driver</command>
</term>
<listitem>
<para>
The class name of the JDBC driver.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.url</command>
</term>
<listitem>
<para>
The JDBC URL used to connect to the database.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.user</command>
</term>
<listitem>
<para>
The database username, which is used to log in and to create and update SymmetricDS tables.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>db.password</command>
</term>
<listitem>
<para>
The password for the database user.
</para>
</listitem>
</varlistentry>
</variablelist>
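<para>
As a rough illustration, a symmetric.properties file for a store node that creates its own
connection pool might combine the properties above as follows. Every value shown (host names,
ports, URL paths, database name, JDBC driver, and credentials) is a hypothetical placeholder
rather than a default shipped with SymmetricDS, so substitute the values for your own deployment.
<programlisting>
<![CDATA[# Identity of this node (placeholder values)
group.id=store
external.id=001
sync.url=http://store001.example.com:8080/sync

# Where this node registers to receive its configuration (placeholder URL)
registration.url=http://corp.example.com:8080/sync

# JDBC connection pool (placeholder driver, URL, and credentials)
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://localhost:5432/store001
db.user=symmetric
db.password=secret]]></programlisting>
</para>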
</section>
<section id="node-group">
<title>Node Group</title>
<para>
Each node must belong to a node group, a collection of one or more nodes.
A common use of node groups is to describe a level in a hierarchy of data synchronization.
For example, at a retail store chain, there might be a few nodes that belong to "corp", which
sync with hundreds of nodes that belong to "store", which sync with thousands of nodes that
belong to "register".
</para>
<para>
The following SQL statements would create node groups for "corp" and "store".
<programlisting>
<![CDATA[insert into SYM_NODE_GROUP
(node_group_id, description)
values
('store', 'A retail store node');
insert into SYM_NODE_GROUP
(node_group_id, description)
values
('corp', 'A corporate node');]]></programlisting>
</para>
</section>
<section id="basic-node-group-link">
<title>Node Group Link</title>
<para>
To establish synchronization between nodes, two node groups are linked together. The direction
of synchronization is determined by specifying a source and target node group.
If synchronization should occur in both directions, then two links are created in opposite
directions. The target node group receives data changes by either push or pull methods.
A push method causes the source node group to connect to the target, while a pull method
causes it to wait for the target to connect to it.
</para>
<para>
The following SQL statements link the "corp" and "store" node groups for synchronization.
It configures the "store" nodes to push their data changes to the "corp" nodes,
and the "corp" nodes to send changes to "store" nodes by waiting for a pull.
<programlisting>
<![CDATA[insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('store', 'corp', 'P');
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('corp', 'store', 'W');]]></programlisting>
</para>
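<para>
The same pattern extends to further tiers. As a sketch, assuming a "register" node group is also
wanted (it appears in the retail example but is not created by the earlier statements), the
following statements would create it and link it to "store" in both directions. The description
text and the choice of push and wait-for-pull directions are illustrative assumptions.
<programlisting>
<![CDATA[insert into SYM_NODE_GROUP
(node_group_id, description)
values
('register', 'A register node');
-- registers push their changes up to the store
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('register', 'store', 'P');
-- the store waits for registers to pull changes
insert into SYM_NODE_GROUP_LINK
(source_node_group, target_node_group, data_event_action)
values
('store', 'register', 'W');]]></programlisting>
</para>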
</section>
<section id="basic-node">
<title>Node</title>
<para>
Each instance of SymmetricDS is a node that can be uniquely identified.
The node has a unique identifier used by the system, and the user provides an external identifier
for context in the local system. In most common cases, the two identifiers are the same.
The registration process generates and sends the identity and password to the node, along
with its synchronization configuration. The top-level registration server must
have its identity provided by the user since it has no parent to contact.
</para>
<para>
The following SQL statements set up a top-level registration server as a node identified
as "00000" in the "corp" node group.
<programlisting>
<![CDATA[insert into SYM_NODE
(node_id, node_group_id, external_id, sync_enabled)
values
('00000', 'corp', '00000', 1);
insert into SYM_NODE_IDENTITY values ('00000');]]></programlisting>
</para>
</section>
<section id="basic-channel">
<title>Channel</title>
<para>
Data changes in the database are captured in the order that they occur, which is preserved
when synchronizing to other nodes. Some data may need priority for synchronization despite
the normal order of events. Tables can be grouped together in <link linkend="channel">channels</link>.
Channels provide a processing order for groups of tables, a limit on the
amount of data that will be batched together, and isolation from errors in other channels.
By categorizing data into channels and assigning them to <link linkend="trigger">triggers</link>, the user gains more control and visibility into
the flow of data.
</para>
<para>
The following SQL statements set up channels for a retail store. An "item" channel includes
data for items and their prices, while a "sale_transaction" channel includes data for ringing
sales at a register.
<programlisting>
<![CDATA[insert into SYM_CHANNEL
(channel_id, processing_order, max_batch_size, max_batch_to_send,
extract_period_millis, batch_algorithm, enabled, description)
values
('item', 10, 1000, 10, 0, 'default', 1, 'Item and pricing data');
insert into SYM_CHANNEL
(channel_id, processing_order, max_batch_size, max_batch_to_send,
extract_period_millis, batch_algorithm, enabled, description)
values
('sale_transaction', 1, 1000, 10, 60000, 'transactional', 1,
'retail sale transactions from register');]]></programlisting>
</para>
<para>
Batching is the grouping of data, by channel, to be transferred and committed at
the client together. There are three different out-of-the-box batching algorithms which
may be configured in the batch_algorithm column on channel.
<variablelist>
<varlistentry>
<term>
<command>default</command>
</term>
<listitem>
<para>
All changes that happen in a transaction are guaranteed to be batched
together. Multiple transactions will be batched and committed together
until there is no more data to be sent or the max_batch_size is reached.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>transactional</command>
</term>
<listitem>
<para>
Batches will map directly to database transactions. If there are many
small database transactions, then there will be many batches. The max_batch_size
column has no effect.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<command>nontransactional</command>
</term>
<listitem>
<para>
Multiple transactions will be batched and committed together
until there is no more data to be sent or the max_batch_size is reached.
The batch will be cut off at the max_batch_size regardless of whether
it is in the middle of a transaction.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
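<para>
The earlier channel examples use the default and transactional algorithms. For completeness, a
channel using the nontransactional algorithm could be defined along the same lines. The
"audit_log" channel name and the sizing values below are hypothetical and chosen only for
illustration.
<programlisting>
<![CDATA[insert into SYM_CHANNEL
(channel_id, processing_order, max_batch_size, max_batch_to_send,
extract_period_millis, batch_algorithm, enabled, description)
values
('audit_log', 20, 500, 10, 0, 'nontransactional', 1,
'Hypothetical high-volume log data');]]></programlisting>
With this setting, a batch is cut at 500 rows even if that splits a database transaction across
two batches.
</para>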
</section>
<section id="basic-trigger">
<title>Trigger</title>
<para>
SymmetricDS captures synchronization data using database triggers. Triggers are defined in the <link linkend="trigger">trigger</link> table.
Each record is used by SymmetricDS when generating database triggers. Database triggers are only generated when a trigger
is associated with a <link linkend="router">router</link> whose source_node_group_id matches the node group id of the current node.
</para>
<para>
The following SQL statement defines a trigger that will capture data for a table named "item"
whenever data is inserted, updated, or deleted. The trigger is assigned to a channel also called 'item'.
<programlisting>
<![CDATA[insert into SYM_TRIGGER
(trigger_id,source_table_name,channel_id,last_update_time,create_time)
values
('item', 'item', 'item', current_timestamp, current_timestamp);
]]></programlisting>
</para>
</section>
<section id="basic-route">
<title>Simple Router</title>
<para>
The simplest router is a router that sends all the data that is captured by its
associated triggers to all the nodes that belong to the target node group defined
in the router. A router is defined as a row in the <link linkend="router">router</link> table.
It is then linked to triggers in the <link linkend="trigger_router">trigger router</link> table.
</para>
<para>
The following SQL statement defines a router that will send data from the 'corp' group to the 'store' group.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id,
create_time, last_update_time)
values
('corp-2-store','corp', 'store',
current_timestamp, current_timestamp);
]]></programlisting>
</para>
<para>
The following SQL statement maps the 'corp-2-store' router to the item trigger.
<programlisting>
<![CDATA[insert into SYM_TRIGGER_ROUTER
(trigger_id, router_id, initial_load_order, create_time, last_update_time)
values
('item', 'corp-2-store', 1, current_timestamp, current_timestamp);
]]></programlisting>
</para>
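<para>
Data can flow in the opposite direction in the same way. The following sketch assumes that a
table named "sale_transaction" exists (a hypothetical table name) and that the
"sale_transaction" channel and the "store" to "corp" node group link from the earlier sections
are in place. It defines a trigger, a router from 'store' to 'corp', and the mapping between
them. The identifiers 'sale_transaction' and 'store-2-corp' are chosen for illustration.
<programlisting>
<![CDATA[insert into SYM_TRIGGER
(trigger_id,source_table_name,channel_id,last_update_time,create_time)
values
('sale_transaction', 'sale_transaction', 'sale_transaction',
current_timestamp, current_timestamp);
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id,
create_time, last_update_time)
values
('store-2-corp','store', 'corp',
current_timestamp, current_timestamp);
insert into SYM_TRIGGER_ROUTER
(trigger_id, router_id, initial_load_order, create_time, last_update_time)
values
('sale_transaction', 'store-2-corp', 1, current_timestamp, current_timestamp);
]]></programlisting>
</para>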
</section>
<section id="basic-route-column-match">
<title>Column Router</title>
<para>
Sometimes data must be routed based on the current value or the old value of a
column in the table that is being routed. Column routers are configured by setting the router_type column on the <link linkend="router">router</link> table
to 'column' and setting the router_expression column to an equality expression that represents
the expected value of the column.
</para>
<para>
The first part of the expression is always the column name. The column name should always be defined in upper case.
The upper case column name prefixed by OLD_ can be used for a comparison being done with the old column data value.
</para>
<para>
The second part of the expression can be a constant value, a token that represents another column, or a token
that represents some other SymmetricDS concept. Token values always begin with a colon (:).
</para>
<para>
Consider a table that needs to be routed to all nodes in the target group only when a status column is set to 'OK'. The following
SQL statement will insert a column router to accomplish that.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-ok','corp', 'store', 'column',
'STATUS=OK', current_timestamp, current_timestamp);
]]></programlisting>
</para>
<para>
Consider a table that needs to be routed to all nodes in the target group only when a status column changes values. The following
SQL statement will insert a column router to accomplish that.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-status','corp', 'store', 'column',
'STATUS!=:OLD_STATUS', current_timestamp, current_timestamp);
]]></programlisting>
</para>
<para>
Consider a table that needs to be routed only to those nodes in the target group whose external id matches the table's STORE_ID column. The following
SQL statement will insert a column router to accomplish that.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-id','corp', 'store', 'column',
'STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
]]></programlisting>
Attributes on <link linkend="node">node</link> that can be referenced with tokens include:
<itemizedlist>
<listitem><para>NODE_ID</para></listitem>
<listitem><para>EXTERNAL_ID</para></listitem>
<listitem><para>NODE_GROUP_ID</para></listitem>
</itemizedlist>
</para>
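<para>
The other node tokens are used the same way. As a sketch, assume a table that carries a
hypothetical TARGET_NODE_ID column holding the node_id of the node that should receive each row.
A column router comparing that column with the :NODE_ID token might then look like this. The
router_id and the column name are illustrative assumptions.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-node-id','corp', 'store', 'column',
'TARGET_NODE_ID=:NODE_ID', current_timestamp, current_timestamp);
]]></programlisting>
</para>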
<para>
Consider a table that needs to be routed to a redirect node defined by its external_id in the <link linkend="registration_redirect">registration redirect</link> table. The following
SQL statement will insert a column router to accomplish that.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-redirect','corp', 'store', 'column',
'STORE_ID=:REDIRECT_NODE', current_timestamp, current_timestamp);
]]></programlisting>
</para>
<para>
More than one column may be configured in a router_expression. When more than one column is configured, all matches are added to the list of nodes to route to. The following is
an example where the STORE_ID column may contain the STORE_ID to route to or the constant of ALL which indicates that all nodes should receive the update.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-multiple-matches','corp', 'store', 'column',
'STORE_ID=ALL
STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
]]></programlisting>
</para>
<para>
The NULL keyword may be used to check if a column is null. If the column is null, then data will be routed to all nodes that qualify for the update. The following is an example
where the STORE_ID column is used to route to a set of nodes that have a STORE_ID equal to their EXTERNAL_ID, or to all nodes if the STORE_ID is null.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-multiple-matches','corp', 'store', 'column',
'STORE_ID=NULL
STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
]]></programlisting>
</para>
</section>
<section id="basic-route-relational">
<title>Relational Router</title>
<para>
Sometimes routing decisions need to be made based on data that is not in the current row being synchronized. Consider an
example where an Order table and an OrderLineItem table need to be routed to a specific store. The Order table has columns
named order_id and STORE_ID. A store node has an external_id that is equal to the STORE_ID on the Order table. OrderLineItem,
however, only has a foreign key, order_id, to its Order. To route OrderLineItems to the same nodes that the Order will be routed
to, we need to reference the master Order record.
</para>
<para>
There are two possible ways to route the OrderLineItem in SymmetricDS. One is to configure a 'subselect' router_type on the <link linkend="router">router</link> table
and the other is to configure an external_select on the <link linkend="trigger">trigger</link> table.
</para>
<para>
A 'subselect' is configured with a router_expression that is a SQL select statement which returns a result set of the node_ids that need to be routed to. Column tokens can
be used in the SQL expression and will be replaced with row column data. The overhead of using this router type is high because the 'subselect' statement runs for each row
that is routed. It should not be used for tables that have a lot of rows that are updated. It also has the disadvantage that if the Order master record is deleted,
then no results would be returned and routing would not happen. The router_expression is appended to the following
SQL statement in order to select the node ids.
<programlisting>
<![CDATA[
select c.node_id from sym_node c where
c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and
]]></programlisting>
</para>
<para>
Returning to the Order example, the following SQL statement inserts a 'subselect' router that routes each OrderLineItem row to the nodes
whose external_id matches the STORE_ID of its parent Order record.
<programlisting>
<![CDATA[insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store','corp', 'store', 'subselect',
'c.external_id in
(select STORE_ID from order where order_id=:ORDER_ID)',
current_timestamp, current_timestamp);
]]></programlisting>
</para>
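<para>
Putting the two pieces together, the router_expression above is appended to the base statement
shown earlier, so the query run for each routed row would look roughly like the following. This
is a sketch of the combined SQL for illustration, not output captured from SymmetricDS.
<programlisting>
<![CDATA[
select c.node_id from sym_node c where
c.node_group_id=:NODE_GROUP_ID and c.sync_enabled=1 and
c.external_id in
(select STORE_ID from order where order_id=:ORDER_ID)
]]></programlisting>
</para>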
<para>
Alternatively, when using an external_select on the <link linkend="trigger">trigger</link> table, data is captured in the EXTERNAL_DATA column of the <link linkend="data">data</link> table at the time a trigger
fires. The EXTERNAL_DATA can then be used for routing by using a router_type of 'column'. The advantage of this approach is that it is very unlikely that the master Order table
will have been deleted at the time any DML occurs on the OrderLineItem table. It is also a bit more efficient than the 'subselect' approach, although the triggers produced do run
the extra external_select inline with application database updates.
</para>
<para>
In the following example, the STORE_ID is captured from the Order table in the EXTERNAL_DATA column. EXTERNAL_DATA is always available for routing as a virtual column in a 'column'
router. The router is configured to route based on the captured EXTERNAL_DATA to all nodes whose external_id matches. Note that other supported node attribute tokens can also be
used for routing.
<programlisting>
<![CDATA[
insert into SYM_TRIGGER
(trigger_id,source_table_name,channel_id,external_select,
last_update_time,create_time)
values
('orderlineitem', 'orderlineitem', 'orderlineitem','select STORE_ID
from order where order_id=$(curTriggerValue).$(curColumnPrefix)order_id',
current_timestamp, current_timestamp);
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-ext','corp', 'store', 'column',
'EXTERNAL_DATA=:EXTERNAL_ID', current_timestamp, current_timestamp);
]]></programlisting>
</para>
</section>
<section id="basic-route-scripted">
<title>Scripted Router</title>
<para>
When more flexibility is needed in the logic to choose the nodes to route to, a Bean Shell router may be used. Bean Shell is a Java-like scripting language. Documentation
for the Bean Shell scripting language can be found at <ulink url="http://www.beanshell.org/">http://www.beanshell.org</ulink>.
</para>
<para>
The router_type for a Bean Shell router is 'bsh'. The router_expression is a valid Bean Shell script that adds to the 'targetNodes' collection which is bound to the script
evaluation. The script should add the node_ids that should be routed to 'targetNodes' as Strings. Also bound to the script evaluation is a list of 'nodes'. The list of 'nodes' is
a list of eligible Node objects. The current data column values and the old data column values are bound to the script evaluation as Java object representations of the column data.
The columns are bound using the uppercase names of the columns. Old values are bound to uppercase representations that are prefixed with 'OLD_'.
</para>
<para>
In the following example, the node_id is a combination of STORE_ID and WORKSTATION_NUMBER, both of which are columns on the table that is being routed.
<programlisting>
<![CDATA[
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-bsh','corp', 'store', 'bsh',
'targetNodes.add(STORE_ID + "-" + WORKSTATION_NUMBER);',
current_timestamp, current_timestamp);
]]></programlisting>
</para>
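<para>
A script can also walk the bound 'nodes' list instead of building a node_id directly. The
following is a minimal sketch that adds every eligible node whose external id equals the row's
STORE_ID. It assumes that the Node objects expose getExternalId() and getNodeId() accessors and
that STORE_ID is bound as a String. Neither assumption is confirmed by this chapter, and the
router_id is illustrative.
<programlisting>
<![CDATA[
insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, router_type,
router_expression, create_time, last_update_time)
values
('corp-2-store-bsh-nodes','corp', 'store', 'bsh',
'for (node : nodes) {
if (STORE_ID != null && STORE_ID.equals(node.getExternalId())) {
targetNodes.add(node.getNodeId());
}
}',
current_timestamp, current_timestamp);
]]></programlisting>
The script uses only double quotes or none at all, so it can sit inside the single-quoted SQL
string without escaping.
</para>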
</section>
</chapter>