🐛 Destination bigquery: Properly fix per-stream state handling #29498
Conversation
Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
destination-bigquery test report (commit
Step | Result |
---|---|
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Build airbyte/normalization:dev | ✅ |
./gradlew :airbyte-integrations:connectors:destination-bigquery:integrationTest | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PRs that are ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool using the following command:
airbyte-ci connectors --name=destination-bigquery test
Partial solution for #29479. This PR will only publish destination-bigquery.

Based on #29420, but with the hacky handling removed, and with null-namespace handling added.

Releasing before #29478. This will break checkpointing for connections with a destination-default/custom namespace and/or a non-empty stream name prefix. That's OK, because not checkpointing is better than incorrectly checkpointing.
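To illustrate why a destination-default namespace or a stream name prefix breaks checkpointing, here is a minimal sketch (the class and function names are illustrative, not Airbyte's actual API): the platform tracks streams by their source-side descriptor, while the destination writes under a transformed descriptor, so state messages keyed by the destination-side identity can never be matched back.

```python
# Hypothetical sketch -- StreamDescriptor and destination_id are illustrative
# names, not Airbyte's real API.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StreamDescriptor:
    namespace: Optional[str]  # None when the source sets no namespace
    name: str

def destination_id(desc: StreamDescriptor, default_namespace: str,
                   prefix: str) -> StreamDescriptor:
    """Where the destination actually writes: fall back to the
    destination-default namespace and apply the stream name prefix."""
    return StreamDescriptor(desc.namespace or default_namespace,
                            prefix + desc.name)

# The platform knows the stream as (None, "users"); the destination writes
# to a different descriptor. If state is emitted keyed by the
# destination-side descriptor, the platform cannot ack it and
# checkpointing silently breaks.
source_desc = StreamDescriptor(None, "users")
dest_desc = destination_id(source_desc, "edgao_test_gcs_1s1t_disabled", "arst_")
assert dest_desc == StreamDescriptor("edgao_test_gcs_1s1t_disabled",
                                     "arst_users")
assert dest_desc != source_desc  # the mismatch this PR works around
```

Keying emitted state by the original source-side descriptor (the approach this PR moves toward) avoids the mismatch.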
Testing: I ran a faker -> bigquery (GCS) sync locally and watched the flush/state ack log messages (configured to use the destination default namespace + an `arst_` stream name prefix). Ran using a custom platform build that overwrites the state messages with the correct namespace/stream prefix.

After flushing the `arst_purchases` stream, we stopped emitting states for `arst_users`. The state messages had namespace `null`, even though the destination was actually writing to the `edgao_test_gcs_1s1t_disabled` dataset.

The last `arst_users` state had `updated_at` = `2023-08-16T22:27:20+00:00`; I queried the raw table at that moment and verified that the latest record was >= that `updated_at` value.

At the end of the sync, after we flushed all remaining data, we correctly emitted a final state message for every stream. The final state message for `arst_users` had `2023-08-16T22:28:00+00:00` for `updated_at`, which is now reflected in the raw table:
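The verification described above, that a checkpoint is only safe once the raw table already holds a record at least as new as the state's cursor, can be sketched as follows (the function name and the idea of passing raw-table timestamps as a list are illustrative, not part of the PR):

```python
# Hypothetical sketch of the manual check performed during testing:
# a state's cursor should never be ahead of what was actually persisted.
from datetime import datetime

def state_is_safe(state_cursor: str, raw_updated_ats: list[str]) -> bool:
    """True if the raw table contains at least one record whose
    updated_at is >= the cursor the state message claims."""
    cursor = datetime.fromisoformat(state_cursor)
    return any(datetime.fromisoformat(ts) >= cursor for ts in raw_updated_ats)

# The final arst_users state claimed updated_at 2023-08-16T22:28:00+00:00;
# the raw table's newest record must be at least that new.
assert state_is_safe("2023-08-16T22:28:00+00:00",
                     ["2023-08-16T22:27:20+00:00",
                      "2023-08-16T22:28:00+00:00"])
```

Emitting a state whose cursor is newer than any persisted record would be the "incorrect checkpointing" this PR trades away in favor of not checkpointing at all.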