
Error fetching chunk from Snowflake #6246

Closed
chandininekkantti opened this issue Sep 18, 2021 · 7 comments · Fixed by #9567

Comments

@chandininekkantti

chandininekkantti commented Sep 18, 2021

Environment

  • Airbyte version: 0.29.19-alpha
  • OS Version / Instance: GCP c2-standard-16 (16 vCPUs, 64 GB memory)
  • Deployment: Docker
  • Source Connector and version: airbyte/source-snowflake 0.1.1
  • Destination Connector and version: airbyte/destination-bigquery 0.4.0
  • Severity: Critical
  • Step where error happened: Sync job

Current Behavior

Standard Inserts are enabled with a 15 MB chunk size. However, I see the following errors after about 6 hours of Airbyte running.
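
For context, a minimal sketch (not Airbyte's actual code) of what a 15 MB chunk size means when loading via the Google BigQuery Java client: rows are streamed through a write channel that uploads its buffer in chunks of the configured size. Dataset and table names below are placeholders.

```java
// Illustrative only: setting a 15 MB upload chunk size on a BigQuery write
// channel with the Google Cloud Java client. Dataset/table are placeholders.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.google.cloud.WriteChannel;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;

public class ChunkedBigQueryWrite {
    public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        WriteChannelConfiguration config = WriteChannelConfiguration
                .newBuilder(TableId.of("my_dataset", "my_table")) // placeholders
                .setFormatOptions(FormatOptions.json())           // newline-delimited JSON
                .build();
        try (WriteChannel writer = bigquery.writer(config)) {
            writer.setChunkSize(15 * 1024 * 1024); // flush uploads in 15 MB chunks
            writer.write(ByteBuffer.wrap(
                    "{\"id\": 1}\n".getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```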

Expected Behavior

Data should be mirrored into BigQuery, but the sync fails after mirroring a few tables.

Logs

The following is just a snapshot of the error. I see the same error for almost 30 tables.

LOG
2021-09-15 19:10:52 INFO  2021-09-15 19:10:51 INFO i.a.i.s.r.AbstractRelationalDbSource(lambda$createReadIterator$6):315 - {} - Reading stream SAT_BDV_TEST. Records read: 43390000
2021-09-15 19:10:52 INFO  Records read: 69834000
2021-09-15 19:10:53 INFO  Records read: 69835000
2021-09-15 19:10:53 INFO  Records read: 69836000
2021-09-15 19:10:54 INFO  Records read: 69837000
2021-09-15 19:10:54 INFO  Records read: 69838000
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:54 PM net.snowflake.client.jdbc.RestRequest execute
2021-09-15 19:10:55 ERROR SEVERE: Error response: HTTP Response code: 403, request: GET https://sfc-aus-ds1-5-customer-stage.s3.ap-southeast-2.amazonaws.com/50iy-s-auss7828/results/019ef6f0-3200-8140-0000-6f710018c1be_0/main/data_0_4_136?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1631733039&Signature=**** HTTP/1.1
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:55 PM net.snowflake.client.jdbc.SnowflakeChunkDownloader$2 getInputStream
2021-09-15 19:10:55 ERROR SEVERE: Error fetching chunk from: https://sfc-aus-ds1-5-customer-stage.s3.ap-southeast-2.amazonaws.com/50iy-s-auss7828/results/019ef6f0-3200-8140-0000-6f710018c1be_0/main/data_0_4_136?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1631733039&Signature=****
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:55 PM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2021-09-15 19:10:55 ERROR SEVERE: Response status line reason: Forbidden
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:55 PM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2021-09-15 19:10:55 ERROR SEVERE: Response content: <?xml version="1.0" encoding="UTF-8"?>
2021-09-15 19:10:55 ERROR <Error><Code>AccessDenied</Code><Message>Request has expired</Message><Expires>2021-09-15T19:10:39Z</Expires><ServerTime>2021-09-15T19:10:55Z</ServerTime><RequestId>CBSG46H7P0PG82JC</RequestId><HostId>HrGRp2B5ufiSfgRNpGmJ914gxkTaqtD5bgKkkogYVTuamW6THiABokv1J8hQt1PT6sdNzabuDRM=</HostId></Error>
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:55 PM net.snowflake.client.jdbc.RestRequest execute
2021-09-15 19:10:55 ERROR SEVERE: Error response: HTTP Response code: 403, request: GET https://sfc-aus-ds1-5-customer-stage.s3.ap-southeast-2.amazonaws.com/50iy-s-auss7828/results/019ef6f0-3200-8140-0000-6f710018c1be_0/main/data_0_4_136?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1631733039&Signature=**** HTTP/1.1
2021-09-15 19:10:55 INFO  Records read: 69839000
2021-09-15 19:10:55 ERROR Sep 15, 2021 7:10:55 PM net.snowflake.client.jdbc.SnowflakeChunkDownloader$2 getInputStream
2021-09-15 19:10:55 ERROR SEVERE: Error fetching chunk from:

Steps to Reproduce

  1. Choose BigQuery as the destination
  2. Use Standard Inserts with a 15 MB chunk size and transfer a 1 GB table from Snowflake
  3. Run the job

Are you willing to submit a PR?

No

@chandininekkantti chandininekkantti added the type/bug Something isn't working label Sep 18, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Sep 24, 2021
@sherifnada
Contributor

@chandininekkantti could you share the full log?

Also, is it an option to use bulk inserts on BigQuery? We recommend them over standard INSERTs for production workloads.

@chandininekkantti
Author

chandininekkantti commented Sep 28, 2021

Can I email the logs or upload them to a secure private location? Do you mean using GCS staging (bulk inserts)? Yes, we have tried that option too, with the same result: #6245

@chandininekkantti
Author

@sherifnada - here are the latest logs with the same outcome: https://airbytehq.slack.com/archives/C01MFR03D5W/p1637320569249100

@alafanechere
Contributor

According to the logs shared by @chandininekkantti, here's what I understand:

  • The Snowflake source connector reads ~264126000 records.
  • The Snowflake JDBC client then issues an HTTP GET to fetch the next result chunk. The request targets an AWS presigned URL that has already expired (these links are only valid for a limited time), so the AWS backend responds with a 403. A sketch of the mechanism follows.
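
For illustration only (Snowflake mints the real URLs server-side): with the AWS SDK for Java v2, a presigned GET URL is created with a fixed signature lifetime. Bucket, key, and duration below are placeholders. Once that lifetime elapses, S3 answers any GET on the URL with exactly the 403 AccessDenied / "Request has expired" body captured in the logs above.

```java
// Hypothetical sketch of how an S3 presigned GET URL gets its expiry.
import java.time.Duration;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;
import software.amazon.awssdk.services.s3.presigner.model.PresignedGetObjectRequest;

public class PresignSketch {
    public static void main(String[] args) {
        try (S3Presigner presigner = S3Presigner.create()) {
            GetObjectRequest get = GetObjectRequest.builder()
                    .bucket("customer-stage-bucket")     // placeholder bucket
                    .key("results/data_0_4_136")         // placeholder key
                    .build();
            GetObjectPresignRequest presignRequest = GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofHours(6)) // URL is valid only this long
                    .getObjectRequest(get)
                    .build();
            PresignedGetObjectRequest presigned = presigner.presignGetObject(presignRequest);
            // After signatureDuration passes, a GET on this URL returns HTTP 403 with
            // <Code>AccessDenied</Code><Message>Request has expired</Message>.
            System.out.println(presigned.url());
        }
    }
}
```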

@chandininekkantti
Author

chandininekkantti commented Nov 19, 2021

Thanks for your reply.

What I don't understand is when the presigned URL is generated. At the start of the session, i.e. when reading the first record?

So how can this be avoided? Can I load data from a large table in batches? For example, extract and load records 1-10000000, and so on (see the sketch below). Do I have to do it as part of a custom transformation in dbt? I have a feeling this is a very common issue, and I'm wondering how others are tackling it.
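
One way to batch at the SQL level, assuming the table has a monotonically increasing unique key (the `ID` column and table name below are hypothetical): each batch is a fresh query, so its result chunks and their presigned URLs only need to live for the duration of that batch. This is an illustration of the idea, not something source-snowflake 0.1.1 exposes.

```java
// Hypothetical keyset-pagination sketch against Snowflake via JDBC.
// Table, columns, account, and credentials are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchedExtract {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/";
        try (Connection conn = DriverManager.getConnection(url, "USER", "PASSWORD")) {
            long lastId = 0L;
            final int batchSize = 1_000_000;
            while (true) {
                int rows = 0;
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT ID, PAYLOAD FROM MY_DB.MY_SCHEMA.SAT_BDV_TEST "
                                + "WHERE ID > ? ORDER BY ID LIMIT " + batchSize)) {
                    ps.setLong(1, lastId);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastId = rs.getLong("ID"); // advance the keyset cursor
                            rows++;
                            // ... hand the record to the destination here ...
                        }
                    }
                }
                if (rows == 0) break; // table exhausted
            }
        }
    }
}
```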

I have already contacted Snowflake about the issue, and they suggested keeping the session active, which I have already done (see the connection sketch below).
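
For reference, that suggestion maps to Snowflake's `CLIENT_SESSION_KEEP_ALIVE` JDBC parameter; a minimal sketch follows (account and credentials are placeholders). Note the heartbeat refreshes the session token, not the presigned S3 URLs of an already-running result set, which may be why it didn't help here.

```java
// Minimal sketch: enabling Snowflake session keep-alive on a JDBC connection.
// Account URL and credentials are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class KeepAliveConnection {
    public static Connection open() throws SQLException {
        Properties props = new Properties();
        props.put("user", "USER");
        props.put("password", "PASSWORD");
        // Periodic heartbeat keeps the session token from expiring; it does not
        // renew presigned chunk URLs of a result set that is already streaming.
        props.put("CLIENT_SESSION_KEEP_ALIVE", "true");
        return DriverManager.getConnection(
                "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
    }
}
```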

@sherifnada
Contributor

sherifnada commented Dec 3, 2021

Moved into our sprint. I suspect this is going to require some investigation to repro, or research into Snowflake.

@prashantgolash

prashantgolash commented Jul 25, 2022

Is there any update on this issue? It has been marked as resolved, but is it actually fixed yet?
@cgardens @alafanechere @sherifnada
