DBZ-947 SQL Server snapshot.locking.modes overhaul #38

grzegorz8 · 2018-10-24T13:28:08Z

https://issues.jboss.org/browse/DBZ-947

Exclusive mode: In order to provide a consistent snapshot, exclusive locks
are taken on all monitored tables during entire snapshot process duration.

Snapshot mode: The initial snapshot is executed in snapshot transaction
isolation level. This guarantees consistent snapshot as long as DDL
statements are not executed. In addition, neither table locks nor
row-level locks are acquired.

Repeatable read mode: The initial snapshot is executed in repeatable read
transaction level. Since phantom reads can occur, it does not fully
guarantee consistency.

None mode: Neither table locks nor row-level locks are taken. This way
other transactions are not affected by initial snapshot process.
However, snapshot consistency is not guaranteed. In addition, DDL
statements must not be executed during the snapshot.

grzegorz8 · 2018-10-24T13:28:53Z

Related pull request: #36

grzegorz8 · 2018-10-24T13:31:06Z

...server/src/main/java/io/debezium/connector/sqlserver/SqlServerSnapshotChangeEventSource.java

-        }
-        LOGGER.info("Executing schema locking");
+        } else if (connectorConfig.getSnapshotLockingMode() == SnapshotLockingMode.EXCLUSIVE) {
+            LOGGER.info("Executing schema locking");


Taking locks here is misleading, since it suggests they are taken only for schema snapshot. I believe we should introduce yet another template method.

Yes, that's a good point. Can you file a JIRA issue so we can track it separately?

Hum, no, I think I'm taking that back. Before this PR, these locks should indeed only have been held while taking the schema snapshot, by virtue of a savepoint. Which makes me wonder why you removed it here?

Rollback to savepoint makes no sense.
In both SNAPSHOT and R_R mode no locks are taken - no rollback needed.
For EXCLUSIVE, we want to keep locks until the end of transaction - rollback must not be executed.

I meant that taking exclusive lock in lockTablesForSchemaSnapshot is misleading.

For EXCLUSIVE, we want to keep locks until the end of transaction - rollback must not be executed.

I think the original idea was to keep the lock only for the time of taking the schema snapshot. At least that was my motivation for the Oracle connector, which is where @jpechane copied this from. Then the actual data snapshot would be taken using R_R semantics, which indeed seems to be missing still, when looking at the master branch. @jpechane Correct me please, if I'm wrong.

Yes, on master in EXCLUSIVE data snapshot is taken in database default isolation level.

I suggested to keep the locks during data snapshot as well due to this scenario: https://issues.jboss.org/browse/DBZ-947?focusedCommentId=13651038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13651038

grzegorz8 · 2018-10-26T21:05:13Z

To be honest, even having seen the logs, I still have no idea why the build failed.

jpechane · 2018-10-29T14:06:31Z

@grzegorz8 Unfortunately the build here will fail till the related PR gets merged.

jpechane · 2018-10-31T12:51:15Z

@grzegorz8 There is a conflict now after #36 was applied :-(

grzegorz8 · 2018-10-31T13:35:52Z

@jpechane Rebase done.

gunnarmorling · 2018-11-06T12:25:04Z

Hey @grzegorz8, where are we with this one? On first thought I'm having doubts of using READ_COMITTED for snaphotting, as it may lead to concurrent changes being missed. What's the motivation for adding this in addition to REPEATABLE_READ?

grzegorz8 · 2018-11-06T12:34:56Z

Hey @grzegorz8, where are we with this one? On first thought I'm having doubts of using READ_COMITTED for snaphotting, as it may lead to concurrent changes being missed. What's the motivation for adding this in addition to REPEATABLE_READ?

I'm waiting for review and suggestions how to structure the code (e.g. do we need to introduce a new template method for taking locks for the entire snapshot, not only for schema snapshot?).

Regarding isolation level, I know that Debezium tries to deliver consistent snapshot, which in fact is feasible with neither READ COMMITTED nor REPEATABLE READ in case of SQL Server. I presented an example for RR here: https://issues.jboss.org/browse/DBZ-947?focusedCommentId=13651038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13651038
My concern was the fact that "none" mode in current form is misleading. It blocks other transactions from making updates, since in repeatable read isolation level row shared locks are taken until the end of transaction. So I concluded, that if we cannot deliver consistency, we should at least deliver truly "none" locking mode with "eventual consistency".

In addition, I encountered cases when blocking other transactions is unacceptable for DB owner, so read committed fits this case well.

grzegorz8 · 2018-11-12T07:20:37Z

@jpechane @gunnarmorling The PR has been updated according to the suggestions here https://issues.jboss.org/browse/DBZ-947?focusedCommentId=13658596&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13658596

gunnarmorling

@grzegorz8 A few more comments inline. My key questions are:

Do why need to keep NONE at all?
Should CONSISTENT be renamed to REPEATABLE_READ and be the new default?

My answers would be "no", "yes" and "yes" :)

...nector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnectorConfig.java

...server/src/main/java/io/debezium/connector/sqlserver/SqlServerSnapshotChangeEventSource.java

gunnarmorling · 2018-11-13T10:03:31Z

...server/src/main/java/io/debezium/connector/sqlserver/SqlServerSnapshotChangeEventSource.java

-        }
-        LOGGER.info("Executing schema locking");
+        } else if (connectorConfig.getSnapshotLockingMode() == SnapshotLockingMode.EXCLUSIVE) {
+            LOGGER.info("Executing schema locking");


Yes, that's a good point. Can you file a JIRA issue so we can track it separately?

debezium-connector-sqlserver/src/test/java/io/debezium/connector/sqlserver/SnapshotIT.java

grzegorz8 · 2018-11-13T10:19:33Z

@gunnarmorling

Do why need to keep NONE at all?

I'm not insisting. I'll remove it.

Should CONSISTENT be renamed to REPEATABLE_READ and be the new default?

I'll do that.

gunnarmorling · 2018-11-13T10:36:56Z

I'm not insisting. I'll remove it.

Was just talking to @jpechane a bit, and also re-read your comment above about REPEATABLE READ keeping locks. So I can see why someone wanted to use READ COMITTED, provided they are aware of the implications. So we can keep it, I'd just suggest to rename it into READ_COMMITTED then. And then it'd be a good idea to rename the option and enum into snapshot.isolation.mode.

grzegorz8 · 2018-12-01T21:09:43Z

@gunnarmorling I believe I have solved all your suggestions (except Can you file a JIRA issue so we can track it separately?). Are there any issues with this PR?

gunnarmorling · 2018-12-01T21:23:14Z

Hey @grzegorz8, nope, no issues I think, I only was busy with some conference travel and other work recently. I'll get back to this one next week. Unless anything else pops up, the change will be part of the next 0.9.x release. Sorry for the delay and thanks your patience.

grzegorz8 · 2018-12-08T17:16:58Z

Unfortunately, I haven't understood deep enough how read committed works before. It DOES take shared locks, but on very short periods. Deadlocks might occur, though.

org.apache.kafka.connect.errors.ConnectException: Snapshotting of table XXX.XXX failed
       at io.debezium.relational.HistorizedRelationalSnapshotChangeEventSource.createDataEventsForTable(HistorizedRelationalSnapshotChangeEventSource.java:315)
       at io.debezium.relational.HistorizedRelationalSnapshotChangeEventSource.createDataEvents(HistorizedRelationalSnapshotChangeEventSource.java:259)
       at io.debezium.relational.HistorizedRelationalSnapshotChangeEventSource.execute(HistorizedRelationalSnapshotChangeEventSource.java:132)
       at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:77)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 2316) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
       at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:259)
       at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:6388)
       at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1647)
       at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:988)
       at io.debezium.relational.HistorizedRelationalSnapshotChangeEventSource.createDataEventsForTable(HistorizedRelationalSnapshotChangeEventSource.java:287)
       ... 8 more

I need to rethink usefulness of none mode :(

jpechane · 2018-12-13T07:22:26Z

I woonder how deadlock is possible - is it because we are snapshotting two tables and another TX is accessing those two tables but in the opposite order?

grzegorz8 · 2018-12-13T14:04:35Z

@jpechane In this case I was fetching one huge table only (it failed after ~4 hours). I've read that while reading table in read committed a read lock or either individual row or a page is acquired. I'm trying now to track down the exact scenario.

gunnarmorling · 2018-12-17T15:44:42Z

Hey @grzegorz8 and @jpechane, it seems this one still needs some more work? Anything specific we should discuss?

jpechane · 2018-12-18T05:14:04Z

@gunnarmorling Yeah, I'd still give it a couple of more rounds. If the locks are on raw/page level then even on table level if a concurrent process will read two rows in different order than Debezium we can end in a deadlock.

gunnarmorling · 2018-12-18T08:20:01Z

Ok, thanks for the heads-up. I'll assign the issue to 0.9-next then.

grzegorz8 · 2018-12-18T19:06:09Z

@gunnarmorling I doubt we can do anything about deadlocks. We can either decide to get rid of read_committed support or we can leave it and document it properly.

If we leave it, we should document tradeoffs between consistency and locking of each mode and mention that the best choice is SNAPSHOT isolation level and if it's not an option for a user, the user should accept disadvantages of other isolation level modes.

jpechane · 2018-12-20T05:22:55Z

Hi, could you please look at locking hints (https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-2017) - maybe TABLOCK would help to mitigate the issue. Also as you mentioned, in thius conetext lociknig mode NONE might mave sense again.

grzegorz8 · 2019-01-11T19:28:50Z

@gunnarmorling @jpechane Let's try to conclude what to do with this :)

Now I think isolation level read_committed makes no sense. a) It is unreliable (deadlocks). b) adding a table lock contradicts with what I hoped to achieve with this isolation level. read_committed with table lock brings no benefit comparing to repeatable_read.
I'd propose to keep the name snapshot.isolation.mode with values exclusive, snapshot, repeatable_read.
What was additionally fixed here:
a) rollback to a savepoint was removed. In exclusive mode we need to keep lock until the end of transaction. For snapshot and repeatable_read rollback to a savepoint has no effect in fact.
b) Once the snapshot is finished, the default isolation level (read committed) is set.

The property is replaced with snapshot.isolation.mode. Exclusive mode: In order to provide a consistent snapshot, exclusive locks are taken on all monitored tables during entire snapshot process duration. Snapshot mode: The initial snapshot is executed in snapshot transaction isolation level. This guarantees consistent snapshot as long as DDL statements are not executed. In addition, neither table locks nor row-level locks are acquired. Repeatable read mode: The initial snapshot is executed in repeatable read transaction level. Since phantom reads can occur, it does not fully guarantee consistency.

jpechane · 2019-01-14T12:36:02Z

@grzegorz8 @gunnarmorling Based up the current observations I think dropping read_committed is fine, I'd still provide none for edge cases. Naming is fine for me.

grzegorz8 · 2019-01-14T13:21:51Z

@jpechane How should none look like? I believe repeatable_read has the same semantics as the old none.

jpechane · 2019-01-14T13:23:33Z

@grzegorz8 none would be read _uncomitted.

grzegorz8 · 2019-01-14T13:24:34Z

@jpechane Ok, I'll add that. Should I name it none or read_uncommitted?

grzegorz8 · 2019-01-22T15:07:17Z

@jpechane @gunnarmorling How about this PR. If the overall idea of the PR is not good for you, let me create separate tasks (and PRs) for bugs fixes here:
a) rollback to a savepoint was removed. In exclusive mode we need to keep lock until the end of transaction. For snapshot and repeatable_read rollback to a savepoint has no effect in fact.
b) Once the snapshot is finished, the default isolation level (read committed) is set.

jpechane · 2019-01-23T06:19:50Z

@grzegorz8 Nobody is saying that the overall idea is not good. I understand this takes a little bit long already but as you've pointed before - not everything works as we all expected. To speed things up how baout having a chat or hangout session where we agree upon final solution? I think it is better than pushing back and forth. @gunnarmorling Do you agree?

grzegorz8 · 2019-01-23T06:49:18Z

@jpechane I understand this is rather low-priority task if we consider other improvements, bugs or upcoming release candidate.
A quick chat is fine for me. We can schedule it at your convenience, once you have more free time :)

gunnarmorling · 2019-01-23T07:05:18Z

Yes, good idea with having a chat, so to agree on the way forward. And sorry for the delay, there's just (too) much going on. Would Friday morning, e.g. 10:00 a.m. CET, work for the two of you?

jpechane · 2019-01-23T07:35:04Z

@gunnarmorling Fine for me!

grzegorz8 · 2019-01-23T08:55:43Z

Could it be 10:30, please?

gunnarmorling · 2019-01-23T09:17:32Z

Yes, of course. I'll send an invite with a Google Hangouts link.

grzegorz8 · 2019-01-23T09:21:57Z

@gunnarmorling Thanks!

gunnarmorling · 2019-01-23T10:14:13Z

Sent an invite to your Google address.

gunnarmorling · 2019-01-24T07:08:38Z

@grzegorz8 Can you close this PR and open a new one against the main repository (after tomorrow's call, I reckon)? In the course of moving the SQL Server connector out of "Incubating" state, we've moved the connector over there. Thanks!

…on_loops_after_rollback_1 to master Squashed commit of the following: commit 123bcc651acef2a31ab81dc07a337bf996857833 Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 16 13:14:45 2020 -0700 DSOPS-101, mining session loops after rollback commit 4343d84e1477fb0ad15730fd48ff587ae49aea2a Merge: 996c0c0f 0536eab Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 16 13:13:49 2020 -0700 Merge branch 'master' into DSOPS-101_session_loops_after_rollback_1 commit 0536eab Merge: 76f9f80 e30cfbd Author: AndreyIg <gnyiny@gmail.com> Date: Thu Mar 12 13:31:51 2020 -0700 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 996c0c0ff4c7989eb2469a8c2486de6f80e44484 Author: AndreyIg <gnyiny@gmail.com> Date: Thu Mar 12 13:11:16 2020 -0700 DSOPS-101_session_loops_after_rollback commit f49ae0bb1e205f7dce13c1ea342262cdd0b57ee8 Merge: 59b65a1b 76f9f80 Author: AndreyIg <gnyiny@gmail.com> Date: Thu Mar 12 13:04:47 2020 -0700 Merge branch 'master' into DSOPS-101_session_loops_after_rollback commit 76f9f80 Merge: 77e567e af6f8a3 Author: AndreyIg <gnyiny@gmail.com> Date: Wed Mar 11 12:33:09 2020 -0700 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 59b65a1ba27436e75774593d35a76a1379b5e830 Merge: ddd3c186 77e567e Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 9 12:08:39 2020 -0700 Merge branch 'master' into DSOPS-101_session_loops_after_rollback commit 77e567e Merge: 8e3d922 0585b2b Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 9 12:07:58 2020 -0700 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit ddd3c1867ca7fd5aea6f0d54c8b431e8bc6648f1 Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 9 11:16:49 2020 -0700 DSOPS-101, mining session loops after rollback commit 8e3d922 Merge: a98bb75 d4bc528 Author: AndreyIg <gnyiny@gmail.com> Date: Sat Mar 7 04:26:18 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit a98bb75 Merge: c78c368 a23eb5a Author: AndreyIg <gnyiny@gmail.com> Date: Sat Mar 7 04:12:31 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit c78c368 Merge: 90bcc19 4619fcd Author: AndreyIg <gnyiny@gmail.com> Date: Fri Mar 6 06:52:42 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 90bcc19 Merge: b5d1ea7 3e3aeea Author: AndreyIg <gnyiny@gmail.com> Date: Mon Mar 2 14:31:07 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit b5d1ea7 Merge: 9686041 51f0dcb Author: AndreyIg <gnyiny@gmail.com> Date: Wed Feb 26 17:17:38 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 9686041 Merge: 926c648 4996a49 Author: AndreyIg <gnyiny@gmail.com> Date: Wed Feb 26 12:02:35 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 926c648 Merge: 92140a3 829206c Author: AndreyIg <gnyiny@gmail.com> Date: Wed Feb 26 10:49:29 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 92140a3 Merge: 9fa48df 15a6c6c Author: AndreyIg <gnyiny@gmail.com> Date: Tue Feb 25 18:14:58 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 9fa48df Merge: d3da472 27eb9af Author: AndreyIg <gnyiny@gmail.com> Date: Fri Feb 14 16:11:29 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit d3da472 Merge: 86f3f65 081731f Author: AndreyIg <gnyiny@gmail.com> Date: Mon Feb 3 16:18:33 2020 -0800 Merge branch 'master' of http://git.navis.lan/scm/n4fra/debezium commit 86f3f65 Author: AndreyIg <gnyiny@gmail.com> Date: Mon Feb 3 16:02:43 2020 -0800 DSCON-117, DBConnector exception while incremental loading - revert This reverts commit c3a6023. ... and 19 more commits

grzegorz8 commented Oct 24, 2018

View reviewed changes

grzegorz8 force-pushed the DBZ-947 branch from 667897e to 4c7596e Compare October 31, 2018 13:27

grzegorz8 force-pushed the DBZ-947 branch from 4c7596e to 93fba1e Compare November 8, 2018 12:21

gunnarmorling reviewed Nov 13, 2018

View reviewed changes

grzegorz8 force-pushed the DBZ-947 branch from 93fba1e to a71e12c Compare November 13, 2018 15:09

grzegorz8 force-pushed the DBZ-947 branch from a71e12c to d3fc622 Compare December 1, 2018 21:03

grzegorz8 force-pushed the DBZ-947 branch from d3fc622 to 0535bcd Compare December 4, 2018 10:48

grzegorz8 force-pushed the DBZ-947 branch from cb32188 to 99716d3 Compare December 18, 2018 06:07

DBZ-947 Remove read_committed shapshot.isolation.mode

c8c6edd

grzegorz8 force-pushed the DBZ-947 branch from 99716d3 to c8c6edd Compare January 11, 2019 19:37

DBZ-947 Add read_uncommitted snapshot isolation mode

ef95733

grzegorz8 closed this Jan 25, 2019

DBZ-947 SQL Server snapshot.locking.modes overhaul #38

DBZ-947 SQL Server snapshot.locking.modes overhaul #38

Conversation

grzegorz8 commented Oct 24, 2018 • edited by gunnarmorling Loading

grzegorz8 commented Oct 24, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

grzegorz8 commented Oct 26, 2018 • edited Loading

jpechane commented Oct 29, 2018

jpechane commented Oct 31, 2018

grzegorz8 commented Oct 31, 2018

gunnarmorling commented Nov 6, 2018

grzegorz8 commented Nov 6, 2018 • edited Loading

grzegorz8 commented Nov 12, 2018

gunnarmorling left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

grzegorz8 commented Nov 13, 2018 • edited Loading

gunnarmorling commented Nov 13, 2018

grzegorz8 commented Dec 1, 2018

gunnarmorling commented Dec 1, 2018

grzegorz8 commented Dec 8, 2018 • edited Loading

jpechane commented Dec 13, 2018

grzegorz8 commented Dec 13, 2018

gunnarmorling commented Dec 17, 2018

jpechane commented Dec 18, 2018

gunnarmorling commented Dec 18, 2018

grzegorz8 commented Dec 18, 2018

jpechane commented Dec 20, 2018

grzegorz8 commented Jan 11, 2019

jpechane commented Jan 14, 2019

grzegorz8 commented Jan 14, 2019

jpechane commented Jan 14, 2019

grzegorz8 commented Jan 14, 2019

grzegorz8 commented Jan 22, 2019

jpechane commented Jan 23, 2019

grzegorz8 commented Jan 23, 2019

gunnarmorling commented Jan 23, 2019

jpechane commented Jan 23, 2019

grzegorz8 commented Jan 23, 2019

gunnarmorling commented Jan 23, 2019 via email

grzegorz8 commented Jan 23, 2019

gunnarmorling commented Jan 23, 2019

gunnarmorling commented Jan 24, 2019

grzegorz8 commented Oct 24, 2018 •

edited by gunnarmorling

Loading

grzegorz8 commented Oct 26, 2018 •

edited

Loading

grzegorz8 commented Nov 6, 2018 •

edited

Loading

grzegorz8 commented Nov 13, 2018 •

edited

Loading

grzegorz8 commented Dec 8, 2018 •

edited

Loading