Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL CDC sync fails because starting binlog position not found in DB #6425

Closed
sherifnada opened this issue Sep 24, 2021 · 4 comments · Fixed by #9514
Closed

MySQL CDC sync fails because starting binlog position not found in DB #6425

sherifnada opened this issue Sep 24, 2021 · 4 comments · Fixed by #9514

Comments

@sherifnada
Copy link
Contributor

sherifnada commented Sep 24, 2021

Enviroment

Source: MySQL 0.4.4
Destination: Snowflake 0.3.14
EC2 Docker Airbyte instance 0.29.17-alpha

Current Behavior

context dump from #5870

logs-23-0.txt

looking back at these logs, they actually start with the error

The connector is trying to read binlog starting at binlog file 'mysql-bin-changelog.162732', pos=834769, skipping 2 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.

but the job continues ahead and appears to "succeed" whilst still causing a retry

Screen Shot 2021-09-14 at 11 16 30 am

attached sync timestamps

  1. see sync frequency. it is manual, so i ran it once (full refresh) and then again 2 days later. i doubt the sync waited too long as that "successful but retried" sync took 8 minutes.
  2. only one CDC source for one connection. (in a seperate airbyte instance i have the same CDC source, with the same account but that connection is disabled and this new instance, with full-refresh sync on 09/10 worked fine)

separately. on point 2. I do have multiple connections with the same CDC source to the same destination. is this not allowed? I run these two multiple connections at the same time and they appeared to work as expected although I run into the "hanging" issue, where it doesnt actually COPY the data after reading it.

Expected Behavior

Presumably we should either:

  1. Reset the whole database and start fresh on the binlog, or
  2. Somehow find out which table updates we missed (or refresh only the relevant tables). IDK how this would work.
@sherifnada sherifnada added type/bug Something isn't working area/connectors Connector related issues cdc labels Sep 24, 2021
@danieldiamond
Copy link
Contributor

@sherifnada just adding some thoughts here regarding the specific issue of waiting long periods of time between syncs. I ran into this issue and was going to post in the slack group but was curious what was considered "expected" behaviour. Upon reviewing debezium docs as well as airbyte:

We read data from the log up to the time that the sync was started. We do not treat CDC sources as infinite streaming sources. You should ensure that your schedule for running these syncs is frequent enough to consume the logs that are generated.

source

I thought that's fair. The docs could potentially be more explicit but they do state that syncs should be frequent. If a user is implementing CDC, I would presume they understanding what that means i.e. reading bin logs, and thus they should understand that bin logs are not persistent and that sync should be frequent enough to consume the logs before they're dropped.

@mkhokh-33
Copy link
Contributor

mkhokh-33 commented Dec 7, 2021

CDC use binlog files to apply snapshot changes on db. binlog files have a retention policy, time of living, during that time mysql server keeps them alive and available for use.
This time can be configured

  1. by user with certain privileges. See https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_expire_logs_seconds.
  2. there is possibility to remove binlogs manually as well - https://dev.mysql.com/doc/refman/8.0/en/purge-binary-logs.html

User which is going to configure cdc for mysql database should be sure that:

  • binlogs will not be deleted before each bd sync - retention policy should be properly configured.
  • binlogs should not be deleted manually.

To make reset from our side during a db sync might cause some issues:

  • we don't know how user is using database and whether there are some down stream systems that might using it, if we reset data all records from db will be deleted for sometime.
  • if the size of the data is huge it will take time to sync it all again.
  • user might even never realize that cdc is not working as expected.

This kind of critical error that we can not handle because binlogs and db settings should be properly configured.

Instead of reset data on our side I suggest:

  1. inform the user that the critical error is happened and the incremental update is impossible. Provide description that problem is related to binlog and propose to reset data and make db sync again manually thus user will be aware about the issue and we as well provide with instructions how to deal with it.
  2. add description to documentation about mysql binlogs and their retention time and ask users to check that connections are configured the way that they sync data before binlogs are deleted.

@sherifnada what do you think?

@alexandr-shegeda
Copy link
Contributor

@sherifnada please take a look at this comment #6425 (comment)

@sherifnada
Copy link
Contributor Author

@mkhokh-33 your suggestions sound good, let's go with those!

@mkhokh-33 mkhokh-33 moved this from On hold to Implementation in progress in GL Roadmap Dec 21, 2021
@mkhokh-33 mkhokh-33 moved this from Implementation in progress to Internal review in GL Roadmap Jan 17, 2022
@mkhokh-33 mkhokh-33 moved this from Internal review to Airbyte review in GL Roadmap Feb 1, 2022
@igrankova igrankova moved this to Implementation in progress in GL Roadmap Feb 2, 2022
@oustynova oustynova moved this from Implementation in progress to In review (internal) in GL Roadmap Feb 3, 2022
@mkhokh-33 mkhokh-33 moved this from In review (internal) to In review (Airbyte) in GL Roadmap Feb 3, 2022
mkhokh-33 added a commit that referenced this issue Feb 10, 2022
…6425 (#9514)

* Check binlog position on mysql server before run sync job, add error description into log

* fix MySqlStrictEncryptSourceAcceptanceTest

* fix formatting

* fix review comments

* added java docs and fixed few minor comments

* fix formatting

* update versions

* update source_specs.yaml
@mkhokh-33 mkhokh-33 moved this from In review (Airbyte) to Done in GL Roadmap Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants