
Snowflake destination: Support internal staging #4370

Closed
sherifnada opened this issue Jun 28, 2021 · 6 comments · Fixed by #8253 or #8528

Comments

@sherifnada
Contributor

sherifnada commented Jun 28, 2021

Tell us about the problem you're trying to solve

Records can be inserted into Snowflake using standard SQL INSERT statements, or by first copying the data to blob storage (e.g. Azure, GCS, S3, or a Snowflake internal stage) and then using the COPY command to load that data into Snowflake. The staging approach is an order of magnitude faster and should almost always be used, except for the smallest data volumes or proofs of concept.

The Snowflake destination currently supports staging to S3. It should also support internal staging, as described here: https://docs.snowflake.com/en/user-guide/data-load-considerations-stage.html
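To make the speed difference concrete, here is a minimal Java sketch of what the INSERT-based path amounts to: building one parameterized multi-row INSERT per buffered batch. The class and method names are hypothetical, not Airbyte's actual code.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the "Standard" (INSERT-based) loading path.
// Each buffered batch becomes one multi-row INSERT, executed over JDBC.
public class InsertLoadSketch {

    // Builds a parameterized multi-row INSERT for `rows` records of `cols` columns,
    // e.g. "INSERT INTO T VALUES (?, ?), (?, ?)" for rows=2, cols=2.
    public static String multiRowInsert(String table, int rows, int cols) {
        StringJoiner tuple = new StringJoiner(", ", "(", ")");
        for (int c = 0; c < cols; c++) tuple.add("?");
        StringJoiner values = new StringJoiner(", ");
        for (int r = 0; r < rows; r++) values.add(tuple.toString());
        return "INSERT INTO " + table + " VALUES " + values;
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert("MY_TABLE", 2, 3));
    }
}
```

Every batch still pays per-statement round-trip and parsing costs, which is why this path falls over at scale compared to a bulk COPY from a stage.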

Describe the solution you’d like

Support internal staging in snowflake.

Acceptance criteria

  1. Internal staging is supported
  2. All docs are updated with explicit documentation of needed DB user permissions
  3. If no backwards-incompatible permission changes are introduced, remove the INSERT mode in a backwards-compatible way. With internal staging available, there is essentially no reason to use inserts.
@sherifnada sherifnada added type/enhancement New feature or request area/connectors Connector related issues lang/java labels Jun 28, 2021
@marcosmarxm
Member

From a Slack conversation, thanks to Nick Akincilar:
The workload should be:

  • CREATE STAGE @TEMP_STAGE_NAME
  • PUT file://local/some_folder/* @TEMP_STAGE_NAME — the JDBC driver will upload the files itself when it executes the PUT command; there is no need to program the upload.
  • COPY FROM @TEMP_STAGE_NAME
  • DROP @TEMP_STAGE_NAME
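The four-step workload above can be sketched as the ordered SQL statements a connector would run over a Snowflake JDBC connection. This is an illustrative sketch only: the stage, table, and file-path names are hypothetical, and the exact statement forms (TEMPORARY stage, COPY INTO) are an assumption based on Snowflake's documented syntax, not Airbyte's actual implementation.

```java
import java.util.List;

// Hypothetical sketch of the internal-staging workload described above.
public class InternalStageLoadSketch {

    // Returns the SQL statements to execute, in order, over a Snowflake
    // JDBC connection. The Snowflake JDBC driver performs the local file
    // upload itself when it sees a PUT statement.
    public static List<String> statements(String stage, String table, String localGlob) {
        return List.of(
            "CREATE TEMPORARY STAGE IF NOT EXISTS " + stage,
            "PUT file://" + localGlob + " @" + stage,   // driver uploads the local files
            "COPY INTO " + table + " FROM @" + stage,   // bulk load from the stage
            "DROP STAGE IF EXISTS " + stage             // clean up
        );
    }

    public static void main(String[] args) {
        statements("AIRBYTE_TMP_STAGE", "MY_TABLE", "/tmp/airbyte/*.csv")
            .forEach(System.out::println);
    }
}
```

Using a TEMPORARY stage means Snowflake drops it automatically at session end even if the explicit DROP is skipped, which makes cleanup robust against connector crashes.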

@fredpinchon

The direct INSERT method is far too slow beyond a couple hundred records. Beyond the technical arguments presented above (that is, Snowflake's own recommendations), many teams don't have access to S3 buckets, or simply can't be bothered to manage one and go through the security loops and red-tape hassles. A Snowflake internal stage method is critical for the adoption of Airbyte by Snowflake accounts...

@bashyroger

We experience the same problem, and even worse: when loading significant data sets to Snowflake via Airbyte using the INSERT statement method, Snowflake appears to enter a sort of DDoS-protection mode: it drops the connection from Airbyte and we have to retry.

As we want to use Airbyte to sync complete databases to Snowflake, and as we are dealing with sensitive PII data that we are not allowed to store in a Snowflake external stage / cloud bucket, we simply need to be able to load to a Snowflake internal stage for Airbyte to be of any use to us...

@sherifnada
Contributor Author

Thanks for the context @bashyroger -- we're upping the priority on this and will tackle it very soon.

@VitaliiMaltsev VitaliiMaltsev self-assigned this Nov 18, 2021
@VitaliiMaltsev VitaliiMaltsev linked a pull request Nov 30, 2021 that will close this issue
@tuliren
Contributor

tuliren commented Dec 6, 2021

@VitaliiMaltsev, actually we cannot remove the Standard loading method, because there are existing syncs that use it:

destination - 2021-12-06 08:34:08 ERROR LineGobbler(voidCall):82 - Exception in thread "main" java.lang.Exception: Verification error(s) occurred for WRITE. Errors: [$.loading_method: should be valid to one and only one of the schemas ] 
destination - 2021-12-06 08:34:08 ERROR LineGobbler(voidCall):82 - 	at io.airbyte.integrations.base.IntegrationRunner.validateConfig(IntegrationRunner.java:158)
destination - 2021-12-06 08:34:08 ERROR LineGobbler(voidCall):82 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125)
destination - 2021-12-06 08:34:08 ERROR LineGobbler(voidCall):82 - 	at io.airbyte.integrations.destination.snowflake.SnowflakeDestination.main(SnowflakeDestination.java:72)

We need to either add this method back, or patch the connector so that the Standard loading method defaults to Internal Staging.
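The validation failure above comes from legacy configs whose loading_method no longer matches the spec. A minimal sketch of the backward-compatible option, with hypothetical class and enum names (the real connector's spec handling differs):

```java
// Hypothetical sketch of backward-compatible loading_method handling:
// legacy configs with "Standard" keep working instead of failing spec
// validation deep inside IntegrationRunner.validateConfig.
public class LoadingMethodCompat {

    public enum Method { STANDARD, INTERNAL_STAGING }

    // Accept both the legacy and the new option; reject anything else
    // with an explicit error message.
    public static Method resolve(String configured) {
        switch (configured) {
            case "Standard":         return Method.STANDARD;
            case "Internal Staging": return Method.INTERNAL_STAGING;
            default:
                throw new IllegalArgumentException("Unknown loading_method: " + configured);
        }
    }
}
```

Keeping "Standard" as a recognized value is the option that was ultimately taken in #8528; the alternative would be to map it to INTERNAL_STAGING here instead.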

@tuliren tuliren reopened this Dec 6, 2021
@VitaliiMaltsev VitaliiMaltsev linked a pull request Dec 6, 2021 that will close this issue
@VitaliiMaltsev
Contributor

@tuliren I restored the Standard loading method in #8528. Please review.
