[mysql] Optimize pure binlog phase check logic to improve performance #1392

Merged


Contributor

@lzshlzsh lzshlzsh commented Jul 22, 2022

Pure binlog phase performance optimization
In our online business, the binlog phase of Flink jobs could not keep up with demand. Performance analysis showed that the bottleneck in this phase is the comparison of binlog offsets. For each binlog record, mysql-cdc has to check whether the record's offset is after the end of the full snapshot phase (the max split high watermark). If so, the table is in the pure binlog phase and the record can be emitted downstream directly. Comparing every record's offset with the max split high watermark is very CPU-intensive and has become the performance bottleneck (see the following figure).
Further analysis of mysql-cdc's internal logic shows that, for each table, the incremental synchronization state no longer changes once the table has entered the pure binlog phase. It is therefore sufficient to keep one flag per table that records whether the table has entered the pure binlog phase. This avoids the binlog offset comparison for every record in the pure binlog phase and improves the performance of incremental data synchronization.
[image: performance analysis figure]
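
The per-table flag idea described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `PureBinlogPhaseChecker`, the simplified `Offset` type, the method name, and the "at or after" comparison are all assumptions made for the example. It also assumes that binlog records for a table arrive in offset order, so that once one record has passed the high watermark every later record does too.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are taken from mysql-cdc. */
final class PureBinlogPhaseChecker {

    /** Simplified binlog offset (assumption): ordered by file index, then position. */
    static final class Offset implements Comparable<Offset> {
        final int fileIndex;
        final long position;

        Offset(int fileIndex, long position) {
            this.fileIndex = fileIndex;
            this.position = position;
        }

        @Override
        public int compareTo(Offset other) {
            int byFile = Integer.compare(fileIndex, other.fileIndex);
            return byFile != 0 ? byFile : Long.compare(position, other.position);
        }
    }

    // Per-table max split high watermark, i.e. the end of the snapshot phase.
    private final Map<String, Offset> maxSplitHighWatermarks;
    // Tables already known to be in the pure binlog phase (the cached flag).
    private final Set<String> pureBinlogPhaseTables = new HashSet<>();
    // Counts offset comparisons, only to make the saving visible in this sketch.
    int comparisons = 0;

    PureBinlogPhaseChecker(Map<String, Offset> maxSplitHighWatermarks) {
        this.maxSplitHighWatermarks = new HashMap<>(maxSplitHighWatermarks);
    }

    boolean hasEnteredPureBinlogPhase(String tableId, Offset recordOffset) {
        // Fast path: the flag is set, so no offset comparison is needed.
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true;
        }
        Offset highWatermark = maxSplitHighWatermarks.get(tableId);
        if (highWatermark == null) {
            return false; // unknown table: no watermark to compare against
        }
        comparisons++;
        // Assumption: "at or after" the high watermark counts as pure binlog phase.
        if (recordOffset.compareTo(highWatermark) >= 0) {
            pureBinlogPhaseTables.add(tableId); // remember, never compare again
            return true;
        }
        return false;
    }
}
```

In this sketch the offset comparison runs only until a table first crosses its high watermark; every later record for that table costs a single set lookup.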

Performance improvement
In a test on an actual online workload (a single Flink job synchronizing more than 180 tables, many of them with a large number of fields), performance improved by 3 times (from 5k/s to 20k/s), which meets our business's real-time synchronization needs.

@lzshlzsh
Contributor Author

@leonardBang Could you take a look and check whether there is any problem with this optimization?

@leonardBang leonardBang self-requested a review July 28, 2022 12:42
Contributor

@leonardBang leonardBang left a comment

Thanks @lzshlzsh for the great work. I like your detailed analysis, and the optimization makes sense to me. LGTM.

2 participants