Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres oldest lsn doesn't update when there is no new records #6230

Closed
mohammad-bolt opened this issue Sep 17, 2021 · 7 comments
Closed

Postgres oldest lsn doesn't update when there is no new records #6230

mohammad-bolt opened this issue Sep 17, 2021 · 7 comments
Assignees
Labels
area/connectors Connector related issues cdc connectors/source/postgres type/bug Something isn't working

Comments

@mohammad-bolt
Copy link
Contributor

mohammad-bolt commented Sep 17, 2021

Enviroment

  • Airbyte version: example is 0.27.0-alpha
  • Deployment: kubernetes
  • Source Connector and version: postgress 0.3.11
  • Destination Connector and version: bigquery
  • Severity: High
  • Step where error happened: Sync job

Current Behavior

Tell us what happens.

I noticed when in incremental mode, postgress source doesn’t update postgre server when there is no new data.
that means postgress doesn’t empty its logs in this case. has anybody else noticed this? I ran this

SELECT 
    slot_name, 
    plugin,
    pg_size_pretty(pg_current_wal_lsn()-restart_lsn) as replication_lag,
    restart_lsn
FROM pg_replication_slots;

and confirmed.
the table in questions has only 3 rows with no updates.

Expected Behavior

In order to prevent binlogs from piling up on the DB, the connector should always update its read position in the binlog to the latest available, even if none of the updates are from tables we care about.

Logs

logs-273-0.txt.gz

@mohammad-bolt mohammad-bolt added the type/bug Something isn't working label Sep 17, 2021
@marcosmarxm marcosmarxm added the area/connectors Connector related issues label Sep 18, 2021
@sherifnada sherifnada added the cdc label Sep 24, 2021
@sherifnada sherifnada changed the title postgress oldest lsn doesn't update when there is no new records Postgres oldest lsn doesn't update when there is no new records Sep 24, 2021
@sherifnada
Copy link
Contributor

sherifnada commented Sep 24, 2021

@mohammad-bolt thanks for creating this - we've heard this from other users and will prioritize shortly!

One question though: I am assuming this table is one of many in the DB but it's the only one selected for replication with Airbyte? And the other tables are receiving updates? (hence the binlog piling up)

@smartpierre
Copy link

I confirm I had the same issue on my side.

@angelied
Copy link

I am also experiencing this in my Postgres setup -- I have to manually insert data so that the oldest lsn updates.

@etsybaev
Copy link
Contributor

etsybaev commented May 6, 2022

Hi @mohammad-bolt, @smartpierre, @angelied.

Is this issue still reproducible? Could you please provide a little bit more information about the steps to reproduce, actual and expected results?
Thanks!

@etsybaev
Copy link
Contributor

etsybaev commented May 8, 2022

Just tried to check with CDC, but can't get wats wrong. Kidly ask to add steps to reproduce, actual and expected results. Thanks!
One of my tests was:
Updated record with id=11 to set value "1111" instead of old record "1".
Selection_159.png
So only one change was in DB since previous sync and it was an Update DB change.

Then tried to replicate one more time (expectedly one record has been updated):
Selection_160.png

Checking the result:
Selection_162.png

Then did nothing with DB and tried to sync again. expectedly no reconds have been replicated, and no fails:
Selection_163.png

So it seems like it works from the end-user perspective. The investigation above and others (do I not mention here do decrease captures count) shows that CDC replication seems to work successfully. We do not handle\iteract with system fields like "lsn" (etc) by our own. This is done by DB itself and by Debezium as a third party CDC client tool.

@DoNotPanicUA
Copy link
Contributor

Hi @angelied @smartpierre @mohammad-bolt
Please correct me if I'm wrong because I'm not so good at the Postgres binlogs. You have multiply tables, but you sync only one (for example).
Your table is not updated for a long period of time, but other tables are. As a result, minimal required lsn for your consumer is relevant for your non-updated table. And your binlog contains all changes of all tables after your lsn.
Have you tried to change your Postgres configuration if my perception is correct? Because solution from our side looks like a workaround which might lead to some other issues.

@etsybaev
Copy link
Contributor

Hi there. I'm closing this ticket since I haven't got any replies and do not see any issues from the end-user perspective. Please feel free to reopen, but kindly ask to add clear steps to reproduce, actual and expected behavior. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues cdc connectors/source/postgres type/bug Something isn't working
Projects
Status: Scoping in progress
Development

No branches or pull requests

10 participants