/
outgoing-batches.ad
204 lines (149 loc) · 9.97 KB
/
outgoing-batches.ad
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
==== Extract Frequency By Channel
The pull and push frequency cannot be adjusted by channel. If you want to adjust the frequency that outgoing batches for a specific channel are sent, you have two options:
. Batches are extracted by channel at an interval controlled by the `extract_period_millis` in the <<Channels>> settings.
The `last_extract_time` is always recorded, by channel, on the <<NODE_CHANNEL_CTL>> table for the host node's id. When the Pull and Push Job run, if the extract
period has not passed according to the last extract time, then the channel will be skipped for this run. If the `extract_period_millis` is set to zero, data extraction
will happen every time the jobs run.
+
. SymmetricDS provides the ability to configure windows of time when synchronization is allowed. This is done using the
<<NODE_GROUP_CHANNEL_WND>> table. A list of allowed time windows can be specified for a node group and a channel.
If one or more windows exist, then data will only be extracted and transported if the time of day falls within the window of
time specified. The configured times are always for the target node's local time. If the `start_time` is greater than the
`end_time`, then the window crosses over to the next day.
==== Outgoing Sync Status
The status of outgoing synchronization can be queried at the source database.
The following query will show outgoing synchronization failures by node:
[source, sql]
----
select count(*), node_id from sym_outgoing_batch
where error_flag=1 group by node_id;
----
The following query will show the number of data rows that have *not* been delivered to target nodes:
[source, sql]
----
select sum(data_event_count), node_id from sym_outgoing_batch
where status != 'OK' group by node_id;
----
The following queries summed together give an approximation of the number of rows that have *not* been routed:
[source, sql]
----
select sum(end_id-start_id) from sym_data_gap
where start_id < (select max(start_id) from sym_data_gap);
select count(*) from sym_data
where data_id >= (select max(start_id) from sym_data_gap);
----
==== Outgoing Batch Errors
By design, whenever SymmetricDS encounters an issue with synchronization, the batch containing the error is marked as being in
an error state, and _all subsequent batches on the same channel to the same node are not synchronized until the batch error is resolved_.
SymmetricDS will retry the batch in error until the situation creating the error is resolved (or the data for the batch itself is changed). If the
error is caused by network or database failures, then the error might eventually resolve itself when the network or database failures are resolved.
Analyzing and resolving issues can take place on the outgoing or <<_incoming_batch_errors,incoming>> side. The techniques for analysis are slightly different in
the two cases, however, due to the fact that the node with outgoing batch data also has the data and data events associated with the batch in
the database. On the incoming node, however, all that is available is the incoming batch header and data present in an incoming error table.
===== Analyzing the Issue
The first step in analyzing the cause of a failed batch is to locate information about the data in the batch.
ifdef::pro[]
To show only batches in error, select the status of _Error_:
image::images/manage/manage-outgoing-batches-error.png[]
In order to view the error message, click the *bar chart* link in the details column of the outgoing batch row of interest.
image::images/manage/manage-outgoing-batches-error-show.png[]
In order to view the data row in error, click the *Sent Data* tab on the captured batch data screen. The row that is in error will be red.
You can view the data that changed at the bottom of the dialog.
image::images/manage/manage-outgoing-batches-error-data.png[]
The error message and the data should give the user clues as to why the batch failed.
endif::pro[]
ifndef::pro[]
To locate batches in error, run the following SQL query:
[source, sql]
----
select * from sym_outgoing_batch where error_flag=1;
----
Several useful pieces of information are available from this query:
* The batch number of the failed batch, available in column `BATCH_ID`.
* The node to which the batch is being sent, available in column `NODE_ID`.
* The channel to which the batch belongs, available in column `CHANNEL_ID`.
All subsequent batches on this channel to this node will be held until the error condition is resolved.
* The specific data id in the batch which is causing the failure, available in column `FAILED_DATA_ID`.
* Any SQL message, SQL State, and SQL Codes being returned during the synchronization attempt, available in columns `SQL_MESSAGE`,
`SQL_STATE`, and `SQL_CODE`, respectively.
NOTE: Using the `error_flag` on the batch table, as shown above, is more reliable than using the
`status` column. The status column can change from 'ER' to a different status temporarily as
the batch is retried.
NOTE: The query above will also show you any recent batches that
were originally in error and were changed to be manually skipped. See the end of <<Outgoing Batches>> for more details.
To get a full picture of the batch, you can query for information representing the complete
list of all data changes associated with the failed batch by joining
<<DATA>> and <<DATA_EVENT>>, such as:
[source, sql]
----
select * from sym_data where data_id in
(select data_id from sym_data_event where batch_id='XXXXXX');
----
where XXXXXX is the batch id of the failing batch.
This query returns a wealth of information about each data change in a batch, including:
* The table involved in each data change, available in column `TABLE_NAME`,
* The event type (Update [U], Insert [I], or Delete [D]), available in column `EVENT_TYPE`,
* A comma separated list of the new data and (optionally) the old data, available in columns `ROW_DATA` and
`OLD_DATA`, respectively.
* The primary key data, available in column `PK_DATA`
* The channel id, trigger history information, transaction id if available, and other information.
More importantly, if you narrow your query to just the failed data id you can determine the exact data change that is causing the failure:
[source, sql]
----
select * from sym_data where data_id in
(select failed_data_id from sym_outgoing_batch where batch_id='XXXXX'
and node_id='YYYYY');
----
where XXXXXX is the batch id and YYYYY is the node id of the batch that is failing.
The queries above usually yield enough information to be able to determine why a particular batch is failing.
endif::pro[]
Common reasons a batch might fail include:
* The schema at the destination has a column that is not nullable yet the source
has the column defined as nullable and a data change was sent with the column as null.
* A foreign key constraint at the destination is preventing an insertion or update, which could be caused from
data being deleted at the destination or the foreign key constraint is not in place at the source.
* The data size of a column on the destination is smaller than the data size in the source, and data that
is too large for the destination has been synced.
===== Resolving the Issue
Once you have decided upon the cause of the issue, you'll have to decide the best course of action to fix the issue. If, for example,
the problem is due to a database schema mismatch, one possible solution would be to alter the destination database
in such a way that the SQL error no longer occurs. Whatever approach you take to remedy the issue, once you have
made the change, on the next push or pull SymmetricDS will retry the batch
and the channel's data will start flowing again.
If you have instead decided that the batch itself is wrong, or does not need synchronized, or you wish to remove a
particular data change from a batch, you do have the option of changing the data associated with the batch directly.
WARNING: Be cautious when using the following two approaches to resolve synchronization issues. By far, the
best approach to solving a synchronization error is to resolve what is truly causing the
error at the destination database. Skipping a batch or removing a data id as discussed below should be your
solution of last resort, since doing so results in differences between the source and destination databases.
Now that you've read the warning, if you _still_ want to change the batch
data itself, you do have several options, including:
ifndef::pro[]
* Causing SymmetricDS to skip the batch completely. This is accomplished by setting the
batch's status to 'IG' (ignore), as in:
+
[source, sql]
----
update sym_outgoing_batch set status='IG' where batch_id='XXXXXX'
----
+
where XXXXXX is the failing batch. On the next pull or push, SymmetricDS will skip this batch since
it now thinks the batch has already been synchronized. Note that you can still distinguish between successful
batches and ones that you've artificially marked as 'OK', since the `error_flag` column on
the failed batch will still be set to '1' (in error).
endif::pro[]
ifdef::pro[]
* Causing SymmetricDS to skip the batch completely can be accomplished from the user interface. Select the batch that should be
ignored. Click *Change* and select *Ignore Batch*.
+
image::images/manage/manage-outgoing-batches-error-ignore.png[]
endif::pro[]
* Removing the failing data id from the batch by deleting the corresponding row in <<DATA_EVENT>>.
Eliminating the data id from the list of data ids in the batch will cause future synchronization attempts
of the batch to no longer include that particular data change as part of the batch. For example:
+
[source, sql]
delete from sym_data_event where batch_id='XXXXXX' and data_id='YYYYYY'
where XXXXXX is the failing batch and YYYYYY is the data id to longer be included in the batch.
+
After modifying the batch you will have to clear the <<Staging Area>> manually or wait for the staged version of the batch to timeout and clear itself.