<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="introduction" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:svg="http://www.w3.org/2000/svg" xmlns:ns="http://docbook.org/ns/docbook"
xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml">
<title>Introduction</title>
<section id="intro">
<title>Introduction</title>
<para>This User's Guide introduces both basic and advanced concepts in the configuration
of SymmetricDS. By the end of this chapter, you will have a better understanding of SymmetricDS' capabilities and
many of its basic concepts. This chapter also includes a hands-on, step-by-step tutorial meant to demonstrate some of
the potential applications for SymmetricDS.</para>
<section id="definition">
<title>What is SymmetricDS?</title>
<para> SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and
bi-directional synchronization. It uses web and database technologies to replicate tables between relational
databases in near real time. The software was designed to scale for a large number of databases, work across
low-bandwidth connections, and withstand periods of network outage.</para>
<para>
The software can be installed as a standalone process, as a web application in a Java application server, or it
can be embedded into another Java application. A single installation of SymmetricDS attached to a target database
is called a <emphasis>node</emphasis>. A node is initialized by a properties file and is configured by inserting configuration data into a series of
database tables. SymmetricDS then creates database triggers on the application tables to be synchronized so that database
events are captured for delivery to other SymmetricDS nodes.
</para>
<para>
In most databases, the transaction id is also captured by the database triggers so that the insert, update, and delete
events can be replicated transactionally via the transport layer to other nodes. The transport layer is typically a CSV protocol over HTTP.
</para>
<para>
SymmetricDS supports synchronization across different database platforms through the concept of database
<emphasis>dialects</emphasis>. A database dialect is an abstraction layer that SymmetricDS interacts with to insulate the main synchronization
logic from database-specific implementation details.
</para>
<para>
SymmetricDS is extensible through extension points. Extension points are custom or reusable Java code that is
configured via XML. They hook into key points in the life cycle of a synchronization so that custom
behavior can be injected, such as publishing data to other sources, transforming data,
and taking different actions based on the content or status of a synchronization.
</para>
</section>
<section id="background">
<title>Background</title>
<para>The idea of SymmetricDS was born from a real-world need. Several years ago, several of the original developers were
implementing a commercial Point of Sale (POS) system for a large retailer. The development team came to the
conclusion that the software available for trickling back transactions to the general office did not meet the
project needs. The list of problems in the requirements made finding the ideal solution difficult:</para>
<itemizedlist>
<listitem>
<para> Sending and receiving data with 2000 stores during peak holiday loads.</para>
</listitem>
<listitem>
<para> Supporting one database platform at the store and another at general office.</para>
</listitem>
<listitem>
<para> Synchronizing some data in one direction, and other data in both directions.</para>
</listitem>
<listitem>
<para> Filtering out sensitive data and re-routing it to a protected database.</para>
</listitem>
<listitem>
<para> Preparing the store database with an initial load of data from general office.</para>
</listitem>
</itemizedlist>
<para> The team ultimately created a custom solution that met the requirements and made the project successful. From
this initial challenge came the knowledge and experience that SymmetricDS benefits from today.</para>
</section>
<section>
<title>SymmetricDS Features</title>
<para>At a high level, SymmetricDS comes with a number of features that you are likely to need or want when doing data
synchronization. A majority of these features were created as a direct result of real-world use of SymmetricDS in
production settings.</para>
<section id="notification">
<title>Notification Schemes</title>
<para>
After a change to the database is recorded, the SymmetricDS nodes interested in the change are notified. Change
notification is configured to perform a
<emphasis>push</emphasis>
(trickle-back) or a
<emphasis>pull</emphasis>
(trickle-poll) of data. When several nodes target their changes to a central node, it is efficient to push the
changes instead of waiting for the central node to pull from each source node. When network configuration protects
a node with a firewall, a pull configuration allows the node to receive data changes that might otherwise be
blocked using push. By default, change notification occurs once a minute; the frequency is configurable.
</para>
</section>
<section id="bi-sync">
<title>Two-Way Table Synchronization</title>
<para> In practice, some data may require synchronization in just one direction. For example, a retail store sends its sales transactions to a
central office, and the central office sends its stock items and pricing to the store. Other data may synchronize in both
directions. For example, the retail store sends the central office an inventory document, and the central office
updates the document status, which is sent back to the store. SymmetricDS supports bi-directional or two-way table synchronization
and avoids getting into update loops by only recording data changes outside of synchronization.</para>
</section>
<section id="data-channels">
<title>Data Channels</title>
<para> SymmetricDS supports the concept of <emphasis>channels</emphasis> of data.
Data synchronization is defined at the table (or table subset) level, and each managed table can be assigned to a
<emphasis>channel</emphasis> that helps control the flow of data. A channel is a category of data that can be enabled, prioritized and
synchronized independently of other channels. For example, in a retail environment, users may be waiting for
inventory documents to update while a promotional sale event updates a large number of items. If processed in
order, the item updates would delay the inventory updates even though the data is unrelated. By assigning item
table changes to the "item" channel and inventory table changes to the "inventory" channel, the changes are
processed independently so inventory can get through.</para>
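<para>As a rough sketch of the retail example above, the two channels and the assignment of tables to them might be
configured with rows like the following. The table and column names assume the default <literal>sym_</literal> table
prefix and the SymmetricDS 2.x configuration schema. The channel ids, table names, and batch sizes are hypothetical,
so verify them against the data model for your release before use.</para>
<programlisting><![CDATA[
-- Hypothetical channels for the retail example. A lower processing_order
-- is assumed here to mean the channel is processed earlier.
insert into sym_channel (channel_id, processing_order, max_batch_size, enabled, description)
values ('inventory', 1, 1000, 1, 'Inventory documents that users are waiting on');

insert into sym_channel (channel_id, processing_order, max_batch_size, enabled, description)
values ('item', 2, 1000, 1, 'Item and pricing updates, often high volume');

-- Assign each captured table to a channel through its trigger configuration.
-- The table names inventory_document and item are illustrative.
insert into sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
values ('inventory_document', 'inventory_document', 'inventory', current_timestamp, current_timestamp);

insert into sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
values ('item', 'item', 'item', current_timestamp, current_timestamp);
]]></programlisting>
<para>With a configuration along these lines, a large run of item changes queues up on the "item" channel without
holding back the batches on the "inventory" channel.</para>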
</section>
<section id="transactions">
<title>Transaction Awareness</title>
<para> Many databases provide a unique transaction identifier associated with the rows that are committed together as a transaction.
SymmetricDS stores the transaction identifier along with the data that changed so it can play back the transaction exactly
as it occurred originally. This means the target database maintains the same integrity as its source. Support for
transaction identification is documented in the appendix of this guide.</para>
</section>
<section id="plugins">
<title>Data Filtering and Rerouting</title>
<para>
Using SymmetricDS, data can be filtered as it is recorded, extracted, and loaded.
<itemizedlist>
<listitem>
<para>
Data routing is accomplished by assigning a router type to a <xref linkend="table.router" /> configuration.
Routers are responsible for identifying the target nodes to which captured changes should be delivered. Custom
routers can be provided by writing a class that implements <literal>IDataRouter</literal>.
</para>
</listitem>
<listitem>
<para>
As data changes are loaded in the target database, a class implementing
<literal>IDataLoaderFilter</literal>
can change the data in a column or route it somewhere else. One possible use might be to route credit
card data to a secure database and blank it out as it loads into a centralized sales database. The
filter can also prevent data from reaching the database altogether, effectively replacing the default
data loading.
</para>
</listitem>
<listitem>
<para>
Columns can be excluded from synchronization so they are never recorded when the table is changed. As
data changes are loaded into the target database, a class implementing
<literal>IColumnFilter</literal>
can remove a column from the synchronization altogether. For example, an employee table may be
synchronized to a retail store database, but the employee's password is synchronized only on the
initial insert.
</para>
</listitem>
<listitem>
<para>
As data changes are extracted from the source database, a class implementing the
<literal>IExtractorListener</literal>
interface is called to filter data or route it somewhere else. By default, SymmetricDS provides a
handler that transforms and streams data as CSV. Optionally, an alternate implementation may be
provided to take some other action on the extracted data.
</para>
</listitem>
</itemizedlist>
</para>
</section>
<section id="transports">
<title>HTTP Transport</title>
<para>
By default, SymmetricDS uses web-based HTTP in a style called Representational State Transfer (REST) that is
lightweight and easy to manage. A series of filters are also provided to enforce authentication and to restrict
the number of simultaneous synchronization streams. The
<literal>ITransportManager</literal>
interface allows other transports to be implemented.
</para>
</section>
<section id="jmx">
<title>Remote Management</title>
<para> Administration functions are exposed through Java Management Extensions (JMX) that can be accessed from the
Java JConsole or through an application server. Functions include opening registration, reloading data, purging
old data, and viewing batches. A number of configuration and runtime properties are available to be viewed as
well.</para>
<para> SymmetricDS also provides functionality to send SQL events through the same synchronization mechanism that is
used to send data. The data payload can be any SQL statement. The event is processed and acknowledged just like
any other event type.</para>
</section>
</section>
</section>
<section id="requirements">
<title>System Requirements</title>
<para> SymmetricDS is written in Java 5 and requires a Java SE Runtime Environment (JRE) or Java SE Development Kit (JDK)
version 5.0 or above.</para>
<para> Any database with trigger technology and a JDBC driver has the potential to run SymmetricDS. The database is
abstracted through a Database Dialect in order to support specific features of each database. The following Database
Dialects have been included with this release:</para>
<itemizedlist>
<listitem>
<para>MySQL version 5.0.2 and above</para>
</listitem>
<listitem>
<para>Oracle version 8.1.7 and above</para>
</listitem>
<listitem>
<para>PostgreSQL version 8.2.5 and above</para>
</listitem>
<listitem>
<para>SQL Server 2005</para>
</listitem>
<listitem>
<para>HSQLDB 1.8</para>
</listitem>
<listitem>
<para>H2 1.x</para>
</listitem>
<listitem>
<para>Apache Derby 10.3.2.1 and above</para>
</listitem>
<listitem>
<para>IBM DB2 9.5</para>
</listitem>
<listitem>
<para>Firebird 2.0 and above</para>
</listitem>
</itemizedlist>
<para>
See the appendix
<link linkend="databases">Database Notes</link>
for compatibility notes and other details for your specific database.
</para>
</section>
<section id="whats-new">
<title>What's new in SymmetricDS 2.x</title>
<para> SymmetricDS 2.x builds upon the existing SymmetricDS 1.x software base and incorporates a number of
architectural changes and performance improvements. If you are brand new to SymmetricDS, you can safely skip this
section. If you have used SymmetricDS 1.x in the past, this section summarizes the key differences you will
encounter when moving to 2.x.</para>
<para>The first significant architectural change involves SymmetricDS' use of triggers. In 1.x, triggers capture and
record both data changes and the nodes to which the changes must be applied as row inserts into the data_event
table. Thus, the number of row inserts grows linearly with the number of client nodes, which becomes a
performance issue as the number of nodes increases. The problem is compounded when
synchronizing nodes update the same data_event table as part of the batching process while the row inserts are
being created.</para>
<para>In SymmetricDS 2.0, triggers capture only data changes, not the node-specific details. The node-specific
row-inserts are replaced with a new routing mechanism that does both the routing and the batching of data on one
thread. This means we have eliminated the real-time inserts into data_event by applications using synchronized
tables and thus database performance will improve. We have also eliminated the database contention on data_event
since the router job is the only thread inserting data into that table. The only other access to the data_event
table is from selects by synchronizing nodes.</para>
<para>
As a result of these changes, we gain the following benefits:
<itemizedlist>
<listitem><para>Synchronizing client nodes will spend less time connected to a server node.</para></listitem>
<listitem><para>Applications updating database tables that are being synchronized to a large number of nodes will
not degrade in performance as more nodes are added.</para></listitem>
<listitem><para>There should be almost no database contention on the data_event table, as there could be in 1.x.</para>
</listitem>
</itemizedlist>
</para>
<para>
Because routing no longer takes place in the SymmetricDS triggers, a new mechanism for routing was needed. In
SymmetricDS 1.x, the node_select expression was used for specifying the desired data routing. It was a SQL
expression that qualified the insert into data_event from the SymmetricDS triggers. In SymmetricDS 2.x there is a
new extension point called the data router. Data routers are configured in the router table with a router_type and
a router_expression. We will be providing several different routers that will serve the majority of users' routing
needs, but the framework is in place for a SymmetricDS programmer to develop domain- or application-specific
routers.
</para>
<para>
Routers that are provided include:
<itemizedlist>
<listitem><para>Column Match Router - a router that compares old or new column values to a constant value or the
value of a node's external_id or node_id.</para></listitem>
<listitem><para>Sub-select Router - a router that executes a SQL expression against the database to select nodes to
route to. This SQL expression can be passed the old and new column values.</para></listitem>
<listitem><para>Bean Shell Router - a router that executes a BSH expression in order to select nodes to route to.
The BSH expression can use the old and new column values.</para></listitem>
<listitem><para>XML Publishing Router - a router that publishes data changes directly to a messaging solution instead
of transmitting changes to registered nodes.</para></listitem>
</itemizedlist>
</para>
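<para>For illustration, a Column Match Router that delivers each change only to the node whose external_id equals the
row's STORE_ID column might be registered with a row like the one below. This is a sketch rather than a definitive
configuration: it assumes the default <literal>sym_</literal> table prefix, and the router id, the
<literal>corp</literal> and <literal>store</literal> node groups, and the STORE_ID column are hypothetical. The
<literal>column</literal> router type and the <literal>:EXTERNAL_ID</literal> token should be confirmed against the
router documentation for your release.</para>
<programlisting><![CDATA[
-- Hypothetical router: changes captured at 'corp' are sent only to the
-- 'store' node whose external_id matches the changed row's STORE_ID value.
insert into sym_router (router_id, source_node_group_id, target_node_group_id,
                        router_type, router_expression, create_time, last_update_time)
values ('corp-2-store-by-id', 'corp', 'store',
        'column', 'STORE_ID=:EXTERNAL_ID', current_timestamp, current_timestamp);
]]></programlisting>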
<para>Since the routing and capturing of data are now performed with two separate mechanisms, the two concepts have
been separated into separate configuration tables in the database, with a join table (trigger_router) specifying
the relationships between routing (router) and capturing of data (trigger). This solves a long-standing issue with
databases that allow only one trigger per table. On those database platforms, we can now route data in
multiple directions since we only require one SymmetricDS trigger to capture data. This also helps performance in
those scenarios, since we only capture the data once instead of once per routing instance.</para>
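<para>A minimal sketch of this separation is shown below: one trigger configuration captures changes to a table once,
and two rows in the join table link it to two routers. The names assume the default <literal>sym_</literal> table
prefix. The <literal>customer</literal> table and channel and the <literal>corp-2-store</literal> and
<literal>store-2-corp</literal> router ids are hypothetical and are assumed to have been defined already.</para>
<programlisting><![CDATA[
-- Capture changes to the (hypothetical) customer table once.
insert into sym_trigger (trigger_id, source_table_name, channel_id, last_update_time, create_time)
values ('customer', 'customer', 'customer', current_timestamp, current_timestamp);

-- Route the same captured changes in two directions by linking the
-- single trigger to two existing routers.
insert into sym_trigger_router (trigger_id, router_id, initial_load_order, last_update_time, create_time)
values ('customer', 'corp-2-store', 100, current_timestamp, current_timestamp);

insert into sym_trigger_router (trigger_id, router_id, initial_load_order, last_update_time, create_time)
values ('customer', 'store-2-corp', 100, current_timestamp, current_timestamp);
]]></programlisting>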
<para>
As part of the new routing job, we have introduced another new extension point to allow more flexibility in the
way data events get batched. A batch is the unit by which captured data is sent and committed on target nodes. In
SymmetricDS 2.x, batching is configured on the channel configuration table. This provides additional flexibility
for batching:
<itemizedlist>
<listitem><para>Batching can have the traditional SymmetricDS 1.x behavior of batching up to a max batch size, but
never splitting a database transaction across batches.</para></listitem>
<listitem><para>Batching can be completely tied to a database transaction, with one batch per database transaction.</para>
</listitem>
<listitem><para>Batching can ignore database transactions altogether and always batch based on a max batch size.</para>
</listitem>
</itemizedlist>
</para>
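<para>The three behaviors above might be selected per channel as sketched below. The
<literal>batch_algorithm</literal> column and its values (<literal>default</literal>,
<literal>transactional</literal>, and <literal>nontransactional</literal>) are stated here as assumptions about the
2.x channel table and should be checked against the data model for your release. The channel ids and batch sizes are
hypothetical, and the default <literal>sym_</literal> table prefix is assumed.</para>
<programlisting><![CDATA[
-- 1.x-style behavior: fill batches up to max_batch_size without
-- splitting a database transaction across batches.
insert into sym_channel (channel_id, processing_order, max_batch_size, batch_algorithm, enabled, description)
values ('sale', 1, 1000, 'default', 1, 'Sales data, batched across whole transactions');

-- One batch per database transaction.
insert into sym_channel (channel_id, processing_order, max_batch_size, batch_algorithm, enabled, description)
values ('payment', 2, 1000, 'transactional', 1, 'Payments, one batch per transaction');

-- Ignore transaction boundaries and cut batches purely on max_batch_size.
insert into sym_channel (channel_id, processing_order, max_batch_size, batch_algorithm, enabled, description)
values ('bulk_item', 3, 5000, 'nontransactional', 1, 'Bulk item loads, batched by size only');
]]></programlisting>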
<para> Another significant change to note in SymmetricDS 2.x is the removal of the incoming and outgoing batch history
tables. This change was made because it was found that over 95% of the time the statistics the end user truly
wanted to see were those for the most recent synchronization attempt, not to mention that the outgoing batch
history table was difficult to query. The most valuable information in the batch history tables, the batch
statistics, has been moved over to the batch tables. The statistics in the batch tables now always represent the
latest synchronization attempt.</para>
</section>
<xi:include href="tutorial.xml" />
</chapter>