Destination bigquery: reduce commit frequency in GCS staging mode #32112
This reverts commit 89995dd.
...dk/core/src/main/java/io/airbyte/cdk/integrations/destination_async/DetectStreamToFlush.java
...-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryAsyncFlush.java
👍 thanks to all the testing, and the change to 200mb files makes sense.
I think this is a slow-rollout scenario: does this also work well with wide datasets, many streams, etc.? It's probably OK, but please make a dev image and test in some (our?) workspace first.
- useLocalCdk = false
+ useLocalCdk = true
don't forget to revert this
sounds good. I'll do our workspace + the workspace in https://github.com/airbytehq/oncall/issues/3468
this has been running stably for the last day. releasing cdk + merging.
/publish-java-cdk
comment was addressed + reviewer is pto
more context in https://airbytehq-team.slack.com/archives/C05H8UCNCK0/p1698964771455149?thread_ts=1698939985.620589&cid=C05H8UCNCK0 and https://github.com/airbytehq/oncall/issues/3468
I did some manual tests with source-faker and source-postgres. In both cases, this resulted in less-frequent flushes without affecting memory usage. (admittedly these were both one-stream syncs :/ but my understanding is that the async framework is supposed to detect memory pressure anyway, so.... 🤷)
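For anyone skimming the thread, here is a rough sketch of the trade-off being discussed: flush less often by waiting for a larger batch, but still flush early under memory pressure. This is not the actual `DetectStreamToFlush` logic. The class name, method, and the 0.8 pressure ratio are all hypothetical; only the 200 MB figure comes from the review above.

```java
public class FlushDecisionSketch {
    // Assumed target size for one staged file; the review mentions 200 MB files.
    public static final long TARGET_BATCH_BYTES = 200L * 1024 * 1024;
    // Hypothetical cutoff: flush early once this fraction of the memory budget is in use.
    public static final double MEMORY_PRESSURE_RATIO = 0.8;

    // Returns true when a stream should be flushed: either the queued batch has
    // reached the target size, or overall memory usage is above the cutoff.
    public static boolean shouldFlush(long queuedBytes, long usedMemoryBytes, long memoryLimitBytes) {
        boolean batchIsFull = queuedBytes >= TARGET_BATCH_BYTES;
        boolean underMemoryPressure = usedMemoryBytes >= MEMORY_PRESSURE_RATIO * memoryLimitBytes;
        return batchIsFull || underMemoryPressure;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Small batch, plenty of memory: keep buffering.
        System.out.println(shouldFlush(50 * mb, 100 * mb, 1024 * mb));  // false
        // Batch reached the target size: flush.
        System.out.println(shouldFlush(200 * mb, 300 * mb, 1024 * mb)); // true
        // Small batch, but memory is nearly exhausted: flush anyway.
        System.out.println(shouldFlush(50 * mb, 900 * mb, 1024 * mb));  // true
    }
}
```

A larger target means fewer, bigger commits per stream; the memory check is what should keep many-stream or wide-row syncs from buffering too much, which is the case the manual tests above did not cover.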