Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-5423 Add test case to support table rename for SQLServer #3738

Merged
merged 4 commits into from Nov 14, 2022

Conversation

ani-sha
Copy link
Member

@ani-sha ani-sha commented Jul 25, 2022

@ani-sha ani-sha requested a review from Naros July 25, 2022 08:45
Copy link
Member

@Naros Naros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ani-sha, I wonder since we are wanting to verify all aspects around a rename, could you adjust the test case to also check the contents of the schema history topic as well?

@Naros
Copy link
Member

Naros commented Jul 26, 2022

Hi @ani-sha I left a comment on the Jira issue. I've added a small change to your test case to account for what I described in the Jira to highlight that the behavior isn't exactly as straightforward as it appears to be described in the issue. You'll notice that adding the restart & verifying that the table name doesn't change in the later events fails the topic-name assumption.

@ani-sha
Copy link
Member Author

ani-sha commented Jul 27, 2022

@Naros Yes I went through the comments. But what should actually be the behavior in such a case?

@Naros
Copy link
Member

Naros commented Jul 28, 2022

Hi @ani-sha so I think the actual behavior should mimic what we see for other connectors, once the table is renamed, the old table name is effectively non-existent and we should always emit changes based on that new name. I believe what we need to understand is:

  1. Why does the connector emit events under the old, non-existent table name
  2. What about restarting the connector triggers the change in the behavior

Perhaps there is something cached somewhere in the scope of (1) that gets lost when the connector is restarted in (2). In either case, could you explain why the connector works the way it does first, and then we discuss what and how should the code be changed.

@ani-sha
Copy link
Member Author

ani-sha commented Jul 29, 2022

Thanks for the input @Naros. I am looking into this.

@Naros
Copy link
Member

Naros commented Aug 2, 2022

So @ani-sha SQL Server relies on a SqlServerStreamingExecutionContext that maintains an atomic reference to an array of SqlServerChangeTable instances. The reason we observe this behavior is we resolve the source table id from this instance when we process changes, even if the table's name has been renamed. And since this array is dynamically populated on the first execution of the streaming loop, this explains why we see a difference in behavior when the connector gets restarted.

What we need to do is to determine the right moment when the source table changes and call ChangeTable#setSourceTable. I'll continue to look and see where this is most appropriatel.

@Naros
Copy link
Member

Naros commented Aug 2, 2022

So we don't specifically mention anything about sp_rename in the documentation from what I can tell, but given the fact that the operation is not tracked/audited by SQL Server, it can be very difficult to support. There are a couple of use cases to think about.

First, we have a table called account. The connector is currently capturing this table, but the connector is stopped briefly. While the connector is stopped, the table continues to have records changed and is renamed using sp_rename. When the connector is restarted, all changes even if they occurred while the table was named account will be emitted to account_new.

Second, we again have this table called account and as the connector emits changes, the table is renamed using sp_rename. As there is no mechanism to notify the connector of this rename, we continue to emit changes under the account topic until the connector is restarted at which point the table references that are cached get populated from the existing database names.

What I am not yet sure of (but I'm very skeptical its possible) is to connect the dots with whatever the object metadata is in the database. For example, if the database object has an id that remains the same after the rename, we could use this to detect the rename by querying the database information schema on each iteration. This does mean that the emission of events may not align precisely to when the rename occurred, but given the first scenario we already have this problem as-is.

The alternative is quite simply outline that sp_rename isn't supported and we document what the procedure should be for users who wish to rename tables but keep the same capture instance. Maybe with signal support being available, maybe the only way forward is to capture all pending changes for old table, issue sp_rename, followed by a signal to notify the connector of the table rename, followed by enabling changes on the new table.

But what is clear, regardless, is that the behavior of the connector to continue to emit changes under the old table name is not intended and is in fact what I would consider a bug, but one that we need to find an elegant way to fix.

@Naros
Copy link
Member

Naros commented Aug 2, 2022

I can confirm that sp_rename does in fact leave the object_id untouched:

1> select name, object_id, type, type_desc from sys.all_objects where name = 'test';
2> go
name                                                                                                                             object_id   type type_desc                                                   
-------------------------------------------------------------------------------------------------------------------------------- ----------- ---- ------------------------------------------------------------
test                                                                                                                               581577110 U    USER_TABLE                                                  

(1 rows affected)
1> EXEC sp_rename 'test', 'test_new';
2> go
Caution: Changing any part of an object name could break scripts and stored procedures.
1> select name, object_id, type, type_desc from sys.all_objects where name = 'test_new';
2> go
name                                                                                                                             object_id   type type_desc                                                   
-------------------------------------------------------------------------------------------------------------------------------- ----------- ---- ------------------------------------------------------------
test_new                                                                                                                           581577110 U    USER_TABLE                                                  

@ani-sha ani-sha requested a review from Naros August 16, 2022 12:48
@Naros Naros force-pushed the DBZ-5423-sqlserver-tests branch 2 times, most recently from 82ed876 to a218c81 Compare August 18, 2022 20:06
@ani-sha ani-sha added the waits for other PR Issues that can only be merged once another PR has been merged label Aug 26, 2022
@jpechane jpechane merged commit c595fb9 into debezium:main Nov 14, 2022
@jpechane
Copy link
Contributor

@ani-sha @Naros Applied, thanks a lot!

@ani-sha ani-sha deleted the DBZ-5423-sqlserver-tests branch April 20, 2023 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waits for other PR Issues that can only be merged once another PR has been merged
Projects
None yet
3 participants