New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Destinations V2 - T&D Streams in parallel #30020
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supplyAsync makes me nervous, otherwise I think this looks right. Had a question around correctness wrt to the final dropStage / T+D interaction, plus a few nits.
...ping/src/main/java/io/airbyte/integrations/base/destination/typing_deduping/FutureUtils.java
Show resolved
Hide resolved
.../main/java/io/airbyte/integrations/base/destination/typing_deduping/DefaultTyperDeduper.java
Outdated
Show resolved
Hide resolved
...-jdbc/src/main/java/io/airbyte/integrations/destination/staging/GeneralStagingFunctions.java
Outdated
Show resolved
Hide resolved
...-jdbc/src/main/java/io/airbyte/integrations/destination/staging/GeneralStagingFunctions.java
Outdated
Show resolved
Hide resolved
This reverts commit aa0a901ad1bb334e54d8bbc7da14c83f74b22150.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
previous commit timed out on bigquery integration tests. I think we need to call |
:doh: good catch, thanks @edgao |
This comment was marked as outdated.
This comment was marked as outdated.
@@ -104,6 +104,7 @@ public static OnCloseFunction onCloseFunction(final JdbcDatabase database, | |||
// After moving data from staging area to the target table (airybte_raw) clean up the staging | |||
// area (if user configured) | |||
log.info("Cleaning up destination started for {} streams", writeConfigs.size()); | |||
typerDeduper.typeAndDedupe(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benmoriceau / @tryangul fyi we're tweaking the T+D interface again 😅 . There's now a typeAndDedupe()
method, which runs T+D on all streams, along with a new cleanup
method (under the hood, it shuts down a thread pool, and is mandatory to run at the end of a sync to allow the process to exit)
This comment was marked as outdated.
This comment was marked as outdated.
destination-bigquery test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
destination-snowflake test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
destination-snowflake test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
destination-bigquery test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
ran a test to dest-bigquery with 11 streams and got this error during table setup:
... which I had to go into the bq cli for, because |
destination-bigquery test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
destination-snowflake test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
snowflake prerelease publish running here https://github.com/airbytehq/airbyte/actions/runs/6090550929 |
destination-snowflake test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
destination-bigquery test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ❌ |
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
destination-bigquery test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-bigquery docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-bigquery/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
destination-snowflake test report (commit
|
Step | Result |
---|---|
Java Connector Unit Tests | ✅ |
Build connector tar | ✅ |
Build destination-snowflake docker image for platform linux/x86_64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate airbyte-integrations/connectors/destination-snowflake/metadata.yaml | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
We were running typing and deduping operations per stream sequentially. This change makes them execute concurrently.
@evantahler @edgao @cynthiaxyin : One thing I have not had time to track down is why snowflake does not show queries in the logs despite seemingly doing the tasks.
In my testing with source faker this was a little bit faster, the snowflake snyc that was taking 45-49 seconds took 30-37 seconds and the bigquery sync that was taking 1m40-1m45 seconds was taking 1m30 seconds
I have not yet set up a test with many streams, which is what exacerbates the sequential T&D issue - will set this up tomorrow.
UPDATE
After testing changes locally for BigQuery GCS Staging, I noticed a minimal increase, 25min -> 20 minutes on 30 streams. By removing incremental typing and deduping I saw a significant speed improvement 25min -> 11 min. This convinces me that we should hold off on incremental typing and deduping until we have a better story around it.
While testing snowflake, I encountered several OOM errors, however the syncs do eventually succeed. I saw the same behavior with the non parallel T&D so I'm inclined to think this is an existing bug.
I created a platform PR which allows us to configure the number of threads used by typing and deduping, but it will fallback to 8 if nothing is provided.
SECOND UPDATE
The tests for Standard Inserts were failing after this change, so I've removed this for standard inserts. SI is already slower than GCS staging so I think this is acceptable