snowflake s3 copy & redshift s3 refactor #2921

Merged on Apr 26, 2021 (22 commits)

Contributor

@jrhizor jrhizor commented Apr 16, 2021

This PR provides the ability to create streaming writes to a file and manage the issuance of copy commands for the destination.

It introduces the concept of a SwitchingDestination and adds shared code for copying files to a staging environment (such as S3), so the same logic can be reused across destinations.
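As a rough illustration of the switching idea, here is a minimal sketch. It is a guess at the general shape, not the class added in this PR: `SimpleDestination`, `SwitchingDestinationSketch`, the `LoadMode` enum, and the `s3_bucket_name` config key are all hypothetical names chosen for the example.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, stripped-down stand-in for a destination connector.
interface SimpleDestination {
  String write(String rec);
}

// Picks one delegate based on the config and forwards every call to it.
class SwitchingDestinationSketch<T extends Enum<T>> implements SimpleDestination {
  private final SimpleDestination delegate;

  SwitchingDestinationSketch(Map<String, String> config,
                             Function<Map<String, String>, T> configToType,
                             Map<T, SimpleDestination> typeToDestination) {
    T type = configToType.apply(config);
    SimpleDestination chosen = typeToDestination.get(type);
    if (chosen == null) {
      throw new IllegalArgumentException("No destination registered for " + type);
    }
    this.delegate = chosen;
  }

  @Override
  public String write(String rec) {
    return delegate.write(rec);
  }
}

class SwitchingDestinationDemo {
  enum LoadMode { INSERT, COPY_S3 }

  // Assumed rule: take the S3 copy path only when a bucket is configured.
  static LoadMode modeFor(Map<String, String> config) {
    return config.containsKey("s3_bucket_name") ? LoadMode.COPY_S3 : LoadMode.INSERT;
  }

  static SimpleDestination build(Map<String, String> config) {
    Map<LoadMode, SimpleDestination> byMode = new EnumMap<>(LoadMode.class);
    byMode.put(LoadMode.INSERT, rec -> "INSERT " + rec);
    byMode.put(LoadMode.COPY_S3, rec -> "STAGE " + rec);
    return new SwitchingDestinationSketch<LoadMode>(config, SwitchingDestinationDemo::modeFor, byMode);
  }

  public static void main(String[] args) {
    System.out.println(build(Map.of()).write("r1"));
    System.out.println(build(Map.of("s3_bucket_name", "my-bucket")).write("r2"));
  }
}
```

The point of the sketch is only that the choice between load strategies is made once, from config, and everything else talks to a single destination interface.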

Recommended reading order:

  • SnowflakeDestination
  • airbyte-integrations/connectors/destination-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/copy/*
  • airbyte-integrations/connectors/destination-snowflake/src/main/java/io/airbyte/integrations/destination/snowflake/*
  • Redshift-related code

Remaining:

  • remove merge conflicts
  • clean up tests
  • add integration test
  • redo manual testing

@jrhizor jrhizor changed the title snowflake s3 copy snowflake s3 copy & redshift s3 refactor Apr 20, 2021
@jrhizor jrhizor marked this pull request as ready for review April 20, 2021 15:39
@cgardens cgardens self-requested a review April 20, 2021 20:04
Contributor

@cgardens cgardens left a comment

I think I followed what's going on here. I think you're adding some solid abstractions to keep the file upload approach lean for future dbs.

I'm requesting changes because there are a few places where I think we can improve the clarity of how things are working. The main things are:

  1. the responsibilities / lifecycle of the copier--in other words all of the steps that are involved
  2. understanding the delegate concept as used in the copier.

if you plan to address all of these things in the refactor you're working on, just lmk and i can approve this one and we can figure out the clarity stuff in the next PR. i definitely want to take another look when the clarity stuff is addressed. that all being said, i think the fundamental approach here is spot on.

@jrhizor
Contributor Author

jrhizor commented Apr 23, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/778927460
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/778927460

@jrhizor
Contributor Author

jrhizor commented Apr 23, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/778940999
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/778940999

@cgardens cgardens self-requested a review April 23, 2021 21:05
Contributor

@cgardens cgardens left a comment

nice! much clearer now.

I had a few more comments about structure but I don't want to block. Happy to talk about them more if they are interesting.

*
* @return the SQL queries necessary to merge or the empty string if there was a failure
*/
String copyToTmpTableAndPrepMergeToFinalTable(boolean hasFailed) throws Exception;
Contributor

i'm still not in love with this method. would it make sense to split it up? looking at the impl it kinda looks like that's how you think about it anyway.

  • closeWriter / closerConsumer
  • copyDataToTmpTable
  • generateMergeStatement

dunno. i might be off course here.

Contributor

From a developer perspective I really don't know what I should do with the function.

Also because it has so many responsibilities that can fail, all the error handling will have to be done in every implementation instead of in the consumer of that interface.
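To make the concern concrete, here is a hedged sketch of the alternative: each step only throws, and the consumer handles failure once for every implementation. The step names follow the split proposed earlier in this thread; `StepCopier`, `ToyCopier`, `runAll`, and the `removeStagingFiles` cleanup hook are hypothetical and not taken from this PR.

```java
import java.util.List;

// Hypothetical copier: one method per step, and implementations just throw on failure.
interface StepCopier {
  void closeWriter() throws Exception;
  void copyDataToTmpTable() throws Exception;
  String generateMergeStatement() throws Exception;
  void removeStagingFiles(); // assumed cleanup hook, not from this PR
}

// Toy implementation that can be told to fail during the copy step.
class ToyCopier implements StepCopier {
  final String table;
  final boolean failOnCopy;
  boolean cleaned = false;

  ToyCopier(String table, boolean failOnCopy) {
    this.table = table;
    this.failOnCopy = failOnCopy;
  }

  @Override public void closeWriter() {}

  @Override public void copyDataToTmpTable() throws Exception {
    if (failOnCopy) {
      throw new Exception("copy failed for " + table);
    }
  }

  @Override public String generateMergeStatement() {
    return "MERGE " + table + ";"; // placeholder text, not real SQL
  }

  @Override public void removeStagingFiles() {
    cleaned = true;
  }
}

class CentralErrorHandlingDemo {
  // The consumer owns the try/catch/finally, so no implementation repeats it.
  static String runAll(List<? extends StepCopier> copiers) {
    StringBuilder merge = new StringBuilder();
    try {
      for (StepCopier copier : copiers) {
        copier.closeWriter();
        copier.copyDataToTmpTable();
        merge.append(copier.generateMergeStatement());
      }
      return merge.toString();
    } catch (Exception e) {
      // Mirrors the documented contract: empty string if there was a failure.
      return "";
    } finally {
      for (StepCopier copier : copiers) {
        copier.removeStagingFiles();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(runAll(List.of(new ToyCopier("a", false), new ToyCopier("b", false))));
    System.out.println(runAll(List.of(new ToyCopier("a", false), new ToyCopier("b", true))).isEmpty());
  }
}
```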

import io.airbyte.protocol.models.DestinationSyncMode;
import java.sql.SQLException;

public class SnowflakeS3StreamCopier extends S3StreamCopier {
Contributor

maybe purely a style thing, but what about just doing a CopyFromStorageToTable iface and use that for this part as opposed to extending S3StreamCopier? Similar to JdbcStreamingQueryConfiguration. just a thought.
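A minimal sketch of what that composition-based alternative might look like, under stated assumptions: only the interface name `CopyFromStorageToTable` comes from the comment above. Its method signature, the `ComposedS3Copier` class, and the abbreviated statement strings (no credentials or file format options) are invented for illustration.

```java
// Hypothetical single-method interface for the warehouse-specific copy step.
@FunctionalInterface
interface CopyFromStorageToTable {
  String copyStatement(String stagingPath, String schema, String table);
}

// One shared copier; the warehouse-specific part is injected instead of inherited.
class ComposedS3Copier {
  private final String stagingPath;
  private final CopyFromStorageToTable copyStrategy;

  ComposedS3Copier(String bucket, String objectKey, CopyFromStorageToTable copyStrategy) {
    this.stagingPath = "s3://" + bucket + "/" + objectKey;
    this.copyStrategy = copyStrategy;
  }

  // Returns the statement rather than executing it, to keep the sketch database-free.
  String copyStagingFileToTemporaryTable(String schema, String tmpTable) {
    return copyStrategy.copyStatement(stagingPath, schema, tmpTable);
  }
}

class CompositionDemo {
  // Abbreviated, illustrative statements; real ones need credentials and format options.
  static final CopyFromStorageToTable SNOWFLAKE_STYLE =
      (path, schema, table) -> "COPY INTO " + schema + "." + table + " FROM '" + path + "'";
  static final CopyFromStorageToTable REDSHIFT_STYLE =
      (path, schema, table) -> "COPY " + schema + "." + table + " FROM '" + path + "'";

  public static void main(String[] args) {
    var snowflake = new ComposedS3Copier("my-bucket", "users.csv", SNOWFLAKE_STYLE);
    var redshift = new ComposedS3Copier("my-bucket", "users.csv", REDSHIFT_STYLE);
    System.out.println(snowflake.copyStagingFileToTemporaryTable("public", "tmp_users"));
    System.out.println(redshift.copyStagingFileToTemporaryTable("public", "tmp_users"));
  }
}
```

The trade-off is mostly stylistic, as the comment says: subclassing keeps one type per warehouse, while the injected interface keeps a single copier class and isolates only the statement that differs.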

@@ -9,6 +9,10 @@ application {
}

dependencies {
implementation 'org.apache.commons:commons-lang3:3.11'
implementation 'org.apache.commons:commons-csv:1.4'
implementation 'com.github.alexmojaki:s3-stream-upload:2.2.2'
Contributor

How reliable is that library?

Contributor Author

It's a pretty lightweight library and is regularly updated: https://github.com/alexmojaki/s3-stream-upload
However, there aren't many public usages: https://github.com/search?q=alexmojaki+%22s3-stream-upload%22&type=code

I only used this since it was already part of @davinchia's Redshift implementation. I imagine if we do run into problems with it we can rip it out easily, so I'm not planning on changing it to something internal preemptively.

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

I changed the interface to move all of the control flow to CopyConsumer.closeAsOneTransaction:

      for (var copier : streamCopiers) {
        copier.closeStagingUploader(hasFailed);

        if (!hasFailed) {
          copier.createTemporaryTable();
          copier.copyStagingFileToTemporaryTable();
          copier.createDestinationSchema();
          var destTableName = copier.createDestinationTable();
          var mergeQuery = copier.generateMergeStatement(destTableName);
          mergeCopiersToFinalTableQuery.append(mergeQuery);
        }
      }

Which I think reads a lot better.
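For anyone who wants to try the loop above outside the connector, here is a small self-contained sketch with a stub copier. The method names are taken from the loop; the interface name `StreamCopierSketch`, the return types, the omission of checked exceptions, and the placeholder merge text are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Method names come from the loop above; signatures are guesses for illustration.
interface StreamCopierSketch {
  void closeStagingUploader(boolean hasFailed);
  void createTemporaryTable();
  void copyStagingFileToTemporaryTable();
  void createDestinationSchema();
  String createDestinationTable();
  String generateMergeStatement(String destTableName);
}

// Stub that records each call instead of touching S3 or a warehouse.
class RecordingCopier implements StreamCopierSketch {
  final List<String> calls = new ArrayList<>();
  private final String table;

  RecordingCopier(String table) {
    this.table = table;
  }

  @Override public void closeStagingUploader(boolean hasFailed) { calls.add("closeStagingUploader"); }
  @Override public void createTemporaryTable() { calls.add("createTemporaryTable"); }
  @Override public void copyStagingFileToTemporaryTable() { calls.add("copyStagingFileToTemporaryTable"); }
  @Override public void createDestinationSchema() { calls.add("createDestinationSchema"); }

  @Override public String createDestinationTable() {
    calls.add("createDestinationTable");
    return table;
  }

  @Override public String generateMergeStatement(String destTableName) {
    calls.add("generateMergeStatement");
    return "MERGE INTO " + destTableName + ";"; // placeholder text, not real SQL
  }
}

class CloseFlowDemo {
  // Same control flow as the loop above, returning the accumulated merge text.
  static String close(List<? extends StreamCopierSketch> streamCopiers, boolean hasFailed) {
    var mergeCopiersToFinalTableQuery = new StringBuilder();
    for (var copier : streamCopiers) {
      copier.closeStagingUploader(hasFailed);

      if (!hasFailed) {
        copier.createTemporaryTable();
        copier.copyStagingFileToTemporaryTable();
        copier.createDestinationSchema();
        var destTableName = copier.createDestinationTable();
        var mergeQuery = copier.generateMergeStatement(destTableName);
        mergeCopiersToFinalTableQuery.append(mergeQuery);
      }
    }
    return mergeCopiersToFinalTableQuery.toString();
  }

  public static void main(String[] args) {
    var copier = new RecordingCopier("users");
    System.out.println(close(List.of(copier), false));
    System.out.println(copier.calls);
  }
}
```

With `hasFailed` set to true, the stub records only `closeStagingUploader`, which is the behavior the loop is meant to guarantee.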

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786019268
❌ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786019268

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786020239
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786020239

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786106199
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786106199

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786108236
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786108236

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786151392
❌ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786151392

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/publish connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786158453
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786158453

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786158939
❌ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786158939

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/publish connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786199510
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/786199510

@jrhizor
Contributor Author

jrhizor commented Apr 26, 2021

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786199961
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/786199961
