🐛 Destination bigquery: Properly fix per-stream state handling #29498

Merged
edgao merged 20 commits into master from edgao/fix_state_handling_correctly on Aug 17, 2023

@edgao edgao commented Aug 16, 2023

Partial solution for #29479. This PR will only publish destination-bigquery.

Based on #29420, but with the hacky handling removed and null-namespace handling added.

Releasing before #29478. This will break checkpointing for connections that use the destination-default namespace or a custom namespace, and/or a non-empty stream name prefix. That's OK, because not checkpointing is better than checkpointing incorrectly.

Testing: I ran a faker -> bigquery gcs sync locally, configured to use the destination default namespace and an arst_ stream name prefix, and watched the flush/state ack log messages. The sync ran on a custom platform build that overwrites the state messages with the correct namespace/stream prefix.

  • Observed that once we started flushing the arst_purchases stream, we stopped emitting states for arst_users.
  • Observed that the namespace was correctly null on the state messages, even though the destination was actually writing to the edgao_test_gcs_1s1t_disabled dataset.
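The two observations above can be modelled with a small sketch. This is not the connector's actual code (the connector is Java); `StreamKey` and `PerStreamStateTracker` are hypothetical names used only for illustration. The assumption is that a state message for a stream may be acknowledged only once every record of that same stream received before it has been flushed, and that the stream is keyed exactly as the state message names it, so a null namespace stays null regardless of which dataset the destination writes to.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamKey:
    """Identifies a stream exactly as the state message names it (hypothetical type)."""
    namespace: Optional[str]  # None is kept as-is, never replaced by the resolved dataset
    name: str


class PerStreamStateTracker:
    """Hypothetical tracker: a state is released only after its own stream is flushed."""

    def __init__(self):
        self._received = defaultdict(int)   # records received so far, per stream
        self._flushed = defaultdict(int)    # records durably written so far, per stream
        self._pending = defaultdict(deque)  # (records_required, state) pairs, per stream

    def on_record(self, key):
        self._received[key] += 1

    def on_state(self, key, state):
        # The state covers every record of this stream received before it.
        self._pending[key].append((self._received[key], state))

    def on_flush(self, key, count):
        """Mark `count` records of `key` as written; return the states now safe to emit."""
        self._flushed[key] += count
        ready = []
        queue = self._pending[key]
        while queue and queue[0][0] <= self._flushed[key]:
            ready.append((key, queue.popleft()[1]))
        return ready
```

Under this model, flushing `StreamKey(None, "arst_purchases")` only ever returns states for arst_purchases; pending arst_users states stay queued until arst_users itself is flushed.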

The last arst_users state had updated_at 2023-08-16T22:27:20+00:00; I queried the raw table at that moment and verified that the latest record's updated_at was >= that value.
At the end of the sync, after we flushed all remaining data, we correctly emitted a final state message for every stream.
The final state message for arst_users had updated_at 2023-08-16T22:28:00+00:00, which is now reflected in the raw table.
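The manual check described above (the newest record in the raw table must be at or past the cursor of the last acknowledged state) can be written as a one-line comparison. This is a minimal sketch; `state_is_covered` is a hypothetical helper, and it assumes both cursors are ISO 8601 strings like the updated_at values quoted in this description.

```python
from datetime import datetime


def state_is_covered(state_cursor: str, latest_record_cursor: str) -> bool:
    """Return True if the newest written record is at or past the acknowledged state cursor."""
    return datetime.fromisoformat(latest_record_cursor) >= datetime.fromisoformat(state_cursor)
```

For example, with the state cursor 2023-08-16T22:27:20+00:00, any raw-table maximum at or after that instant passes the check, while an earlier maximum would indicate a state acknowledged before its records were written.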

@octavia-squidington-iii octavia-squidington-iii added the area/connectors, area/documentation, and connectors/destination/bigquery labels Aug 16, 2023

github-actions bot commented Aug 16, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@edgao edgao changed the title from "Edgao/fix state handling correctly" to "Destination bigquery: Properly fix per-stream state handling" Aug 16, 2023
@edgao edgao changed the title from "Destination bigquery: Properly fix per-stream state handling" to "🐛 Destination bigquery: Properly fix per-stream state handling" Aug 16, 2023
@edgao edgao marked this pull request as ready for review August 16, 2023 22:41
@edgao edgao requested review from a team as code owners August 16, 2023 22:41
@edgao edgao enabled auto-merge (squash) August 17, 2023 14:41
@octavia-squidington-iii

destination-bigquery test report (commit 0b3694b56a) - ✅

⏲️ Total pipeline duration: 32mn12s

Step Result
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build destination-bigquery docker image for platform linux/x86_64
Build airbyte/normalization:dev
./gradlew :airbyte-integrations:connectors:destination-bigquery:integrationTest

Please note that tests only run on PRs that are ready for review. Set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:

airbyte-ci connectors --name=destination-bigquery test

@edgao edgao merged commit f003a06 into master Aug 17, 2023
21 checks passed
@edgao edgao deleted the edgao/fix_state_handling_correctly branch August 17, 2023 15:30
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023