🐛 Redshift Destination: copies only 1 csv stream part from S3 #10646
Comments
Having the same experience: verified that only one CSV file was loaded into Redshift by querying the STL_LOAD_COMMITS table.
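For reference, a minimal sketch of that kind of check, assuming the Redshift JDBC driver is on the classpath; the cluster endpoint, credentials, and filename filter below are placeholders. STL_LOAD_COMMITS records one row per file committed by a COPY, so a multi-part load should show several staging filenames:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StlLoadCommitsCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder endpoint/credentials; replace with your own cluster details.
    String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
    try (Connection conn = DriverManager.getConnection(url, "awsuser", "password");
         Statement stmt = conn.createStatement();
         // One row per file committed by COPY; a healthy multi-part load lists every staging CSV.
         ResultSet rs = stmt.executeQuery(
             "SELECT query, TRIM(filename) AS filename, lines_scanned "
                 + "FROM stl_load_commits "
                 + "WHERE filename LIKE '%shortened_urls%' "   // placeholder stream name
                 + "ORDER BY curtime DESC LIMIT 20")) {
      while (rs.next()) {
        System.out.printf("query=%d file=%s rows=%d%n",
            rs.getLong("query"), rs.getString("filename"), rs.getLong("lines_scanned"));
      }
    }
  }
}
```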
My experience in #11158: Airbyte processed 88,769 emitted records into 2 CSV files in staging S3: one with 41,147 records and one with 47,622 records. Only one of those two files was written to Redshift, as both the _airbyte_raw_shortened_urls table and the shortened_urls table contain only 41,147 records. If "Purge Staging Files and Tables" is set to True on the Redshift destination, both CSV files are deleted afterwards; I set it to False to check the record counts in each CSV, and voilà. Airbyte version: 0.35.54-alpha
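A quick way to confirm how many staging parts actually exist before they are purged, assuming the AWS SDK for Java v2 with default credentials; the bucket name and prefix are placeholders, since the real staging path depends on the destination configuration:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

public class StagingFileCheck {
  public static void main(String[] args) {
    // Placeholder bucket/prefix; point these at the bucket configured on the destination.
    String bucket = "my-airbyte-staging-bucket";
    String prefix = "airbyte/shortened_urls/";
    try (S3Client s3 = S3Client.create()) {
      // List every staging object under the prefix; more than one CSV part here but only one
      // row in STL_LOAD_COMMITS matches the behaviour described in this issue.
      s3.listObjectsV2Paginator(ListObjectsV2Request.builder()
              .bucket(bucket).prefix(prefix).build())
          .contents()
          .forEach((S3Object obj) ->
              System.out.printf("%s  %d bytes%n", obj.key(), obj.size()));
    }
  }
}
```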
Having the exact same issue but on the Snowflake destination. Thought it was because I upgraded to 0.35.55-alpha, but even after a downgrade back to 0.35.31-alpha the issue still persists. At the end of a successful run Airbyte throws an error.
I'm not sure whether @hellmund's observation in the initial comment that this is caused by #9920 is correct or not. I WILL say that I rolled back to …
Fixed in destination-redshift 0.3.28
Environment
Current Behavior
Expected Behavior
Logs
Logging and the S3 bucket show that the multiple CSV staging files contain the full data. The log output doesn't mention copying individual files, only that "RedshiftStreamCopier(copyStagingFileToTemporaryTable) Copy to tmp table complete."
Steps to Reproduce
Are you willing to submit a PR?
No, as I don't have context on the surrounding recent changes.
Additional observations: this seems to be related to changes in #9920 affecting the entries in "stagingWritersByFile"
airbyte/airbyte-integrations/connectors/destination-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/s3/S3StreamCopier.java
Lines 130 to 141 in 2157b47
"stagingWritersByFile" gets used in RedshiftStreamCopier to generate the manifest of which csv files to copy to Redshift
airbyte/airbyte-integrations/connectors/destination-redshift/src/main/java/io/airbyte/integrations/destination/redshift/RedshiftStreamCopier.java
Line 124 in 9fe804a
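To make the suspected failure mode concrete, here is a simplified, hypothetical sketch (not the actual Airbyte classes, and the names and shapes below are assumptions for illustration only): the manifest handed to Redshift's COPY ... MANIFEST lists exactly the files tracked per stream, so if only one staging file ever makes it into that collection, the manifest, and therefore the load, covers a single CSV part even though S3 holds several.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified illustration of a per-file staging map feeding a COPY manifest.
public class ManifestSketch {

  // Stand-in for "stagingWritersByFile": one entry per staging CSV part.
  // In the real connector the values are CSV writers; a String placeholder is used here.
  private final Map<String, String> stagingWritersByFile = new LinkedHashMap<>();

  void registerStagingFile(String s3Key) {
    // If something upstream keys or overwrites this map so that only the last part
    // survives, everything below still "succeeds", just with partial data.
    stagingWritersByFile.put(s3Key, "writer-for-" + s3Key);
  }

  // The manifest lists exactly the files present in the map: one entry per key.
  String generateManifest(String bucket) {
    List<String> entries = stagingWritersByFile.keySet().stream()
        .map(key -> String.format(
            "{\"url\": \"s3://%s/%s\", \"mandatory\": true}", bucket, key))
        .collect(Collectors.toList());
    return "{\"entries\": [" + String.join(", ", entries) + "]}";
  }

  public static void main(String[] args) {
    ManifestSketch sketch = new ManifestSketch();
    sketch.registerStagingFile("staging/shortened_urls/part-0.csv");
    sketch.registerStagingFile("staging/shortened_urls/part-1.csv");
    // With both parts tracked, the manifest references two files; the behaviour reported
    // above is consistent with only one file being tracked at this point.
    System.out.println(sketch.generateManifest("my-airbyte-staging-bucket"));
  }
}
```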
Another user experienced similar issues with the Snowflake destination using the S3 intermediary: https://airbytehq.slack.com/archives/C01MFR03D5W/p1645735115553539?thread_ts=1645734289.113269&cid=C01MFR03D5W
Temporary work-around: disable the S3 bucket + COPY option for the Redshift destination and fall back to the less efficient batch INSERT instead.