
Remove table and clean up redundant state info when scanning newly added table #1134

Merged
ruanhang1993 merged 2 commits into apache:master from qidian99:remove_table_20220429 on Jun 14, 2023

@qidian99
Contributor

As mentioned in issue #913, when a task is stopped and its configuration is changed, redundant table information may remain in the states of SplitAssigner and SourceReader.

The functionality of scanning newly added tables will remove and clean up table info in the previous savepoint according to the new configuration.

The changes proposed in this PR are as follows:

In MySqlSplitAssigner, apply the filter from the new configuration and remove redundant table info.
In MySqlSourceReader, sieve out splits that do not match the new filter.
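
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `StateCleanupSketch`, `Split`, `retainCapturedTables`, and `retainCapturedSplits` are hypothetical names, tables are identified by plain strings, and the new configuration's table filter is modeled as a `Predicate<String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the actual connector code.
public class StateCleanupSketch {

    /** Hypothetical stand-in for a restored split; only the owning table matters here. */
    public static final class Split {
        public final String splitId;
        public final String tableId;

        public Split(String splitId, String tableId) {
            this.splitId = splitId;
            this.tableId = tableId;
        }
    }

    /** Assigner side: keep only restored table ids that the new filter still captures. */
    public static List<String> retainCapturedTables(
            List<String> restoredTables, Predicate<String> newFilter) {
        List<String> kept = new ArrayList<>();
        for (String tableId : restoredTables) {
            if (newFilter.test(tableId)) {
                kept.add(tableId);
            }
        }
        return kept;
    }

    /** Reader side: sieve out restored splits whose table no longer matches the new filter. */
    public static List<Split> retainCapturedSplits(
            List<Split> restoredSplits, Predicate<String> newFilter) {
        List<Split> kept = new ArrayList<>();
        for (Split split : restoredSplits) {
            if (newFilter.test(split.tableId)) {
                kept.add(split);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Restored state still mentions db.b, but the new configuration captures only db.a.
        Predicate<String> newFilter = tableId -> tableId.equals("db.a");
        List<String> tables = retainCapturedTables(List.of("db.a", "db.b"), newFilter);
        List<Split> splits = retainCapturedSplits(
                List.of(new Split("db.a:0", "db.a"), new Split("db.b:0", "db.b")), newFilter);
        System.out.println(tables + " " + splits.size());
    }
}
```

Both helpers return new lists rather than mutating the restored state, so in this sketch the filtering could be applied once at restore time before any split is assigned or read.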

@qidian99 qidian99 force-pushed the remove_table_20220429 branch from 65ac40c to cce5830 Compare April 29, 2022 04:08
@zmzeng

zmzeng commented Dec 8, 2022

Hi @qidian99, I wonder why your PR hasn't been merged into master.
I recently ran into the same issue as you did, so this PR is quite necessary for us.

@zmzeng zmzeng mentioned this pull request Dec 12, 2022
13 tasks
@ruanhang1993
Contributor

@leonardBang Do you have time to review this PR?

@ruanhang1993
Contributor

Hi, @qidian99. Could you rebase onto the master branch? Thanks~

@ruanhang1993 ruanhang1993 force-pushed the remove_table_20220429 branch from cce5830 to d76935b Compare June 13, 2023 07:53
Contributor

@leonardBang leonardBang left a comment

Thanks @qidian99 and @ruanhang1993 for the contribution. I left some comments.

@ruanhang1993 ruanhang1993 requested a review from leonardBang June 13, 2023 11:56
Contributor

@leonardBang leonardBang left a comment

Thanks @ruanhang1993 for the update, LGTM

@zhangkunjie

1. I configured a pipeline task with a YAML configuration file to synchronize MySQL tables A and B to StarRocks.
2. I used the flink savepoint command to save the running task's state as a _metadata file.
3. I modified the YAML configuration file by commenting out table B, leaving only table A, and then restored the task from the savepoint taken in step 2. The task now synchronizes only table A, as expected.
4. I then used the flink savepoint command again to save the running task's state as a new _metadata file. On inspection, the information for table B, which had already been removed, still exists in this _metadata. Why does the removed table B's information still exist in the _metadata?
