Some rows are not replicated onto final table (pg source and bigquery destination) #7590

Closed
rezmuh opened this issue Nov 3, 2021 · 3 comments

rezmuh commented Nov 3, 2021

Environment

  • Airbyte version: 0.30.23-alpha
  • OS Version / Instance: GKE 1.19
  • Deployment: Kubernetes
  • Source Connector and version: postgres 0.3.11
  • Destination Connector and version: Bigquery 0.5.0
  • Severity: High
  • Step where error happened: Sync job

Current Behavior

I'm replicating Postgres from CloudSQL to BigQuery using logical replication with pgoutput. The sync mode is incremental + dedup + history, and I sync every 30 mins.

Every day, some rows are not propagated to the final table. For example, I have a final order_items table. If I search the order_items_scd table for a specific order_id, I see 12 rows. But if I query the final order_items table, only 6 rows are returned.

What I found was that the rows which were not carried over to the final order_items table have _airbyte_end_at set to a UTC timestamp and their _airbyte_active_row field set to 0. However, these rows are still active and appear in the source database.
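
To make the symptom easier to check, here is a minimal sketch that scans rows exported from order_items_scd and lists the primary keys whose history contains no active row at all. It assumes the final table only keeps rows with _airbyte_active_row = 1, which matches what is described above but is not confirmed here. The `id` primary key column, the function name, and the sample data are hypothetical and only for illustration.

```python
from collections import defaultdict


def keys_without_active_row(scd_rows, key="id"):
    """Return primary keys whose SCD history has no row with _airbyte_active_row == 1.

    `key` is a hypothetical primary key column name; adjust it to the real one.
    """
    has_active = defaultdict(bool)
    for row in scd_rows:
        # Remember whether any version of this key is still flagged active.
        has_active[row[key]] |= row["_airbyte_active_row"] == 1
    return sorted(k for k, active in has_active.items() if not active)


# Made-up sample rows shaped like an export of order_items_scd.
sample = [
    {"id": 1, "order_id": 100, "_airbyte_end_at": None, "_airbyte_active_row": 1},
    {"id": 2, "order_id": 100, "_airbyte_end_at": "2021-11-02T10:00:00Z", "_airbyte_active_row": 0},
    {"id": 2, "order_id": 100, "_airbyte_end_at": "2021-11-02T10:30:00Z", "_airbyte_active_row": 0},
]

# Keys listed here would be absent from the final table under the assumption above.
print(keys_without_active_row(sample))
```

Any key this reports that still exists in the source database is a candidate for the missing rows described above.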

If I do a full refresh, the missing rows appear in the final order_items table. This happens every day (we miss rows from 20+ order IDs in one day).

Expected Behavior

Those rows should appear in the final order_items table.

Steps to Reproduce

  1. Set pg logical replication with pgoutput
  2. sync mode: incremental + dedup + history
  3. sync every 15 mins
  4. Within 24 hours, there will be some missing rows.

Are you willing to submit a PR?

no

Reference

Initially raised on: https://airbytehq.slack.com/archives/C01MFR03D5W/p1635771024396100

@rezmuh rezmuh added the type/bug Something isn't working label Nov 3, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Nov 15, 2021

rezmuh commented Nov 15, 2021

@sherifnada is there any additional info I need to provide so that we can fix this?

@sashaNeshcheret
Contributor

Could not reproduce the issue: the number of records in Postgres is equal to the number in BigQuery. @rezmuh, can you check whether it is still reproducible and, please, share the logs?

@sashaNeshcheret
Contributor

Closed the issue as not reproducible.
