Read S3 with R 3.5 / CURL 3.2 #237

Closed
dinosupervisor opened this issue Jun 11, 2018 · 10 comments

@dinosupervisor

dinosupervisor commented Jun 11, 2018

After switching to R 3.5 (64-bit), using RStudio under Windows, I can no longer read objects from S3 buckets. This may be due to the updated curl 3.2 package that loads with R 3.5. If I revert to R 3.4.3 / curl 3.1, it works again.

Put your code here:

## load package
library("aws.s3")

## code
Sys.setenv("AWS_ACCESS_KEY_ID" = "REMOVED",
           "AWS_SECRET_ACCESS_KEY" = "REMOVED",
           "AWS_DEFAULT_REGION" = "us-east-1")

bucket_name <- "REMOVED"

bucket <- get_bucket(bucket = bucket_name)

object <- save_object(bucket[[1]][["Key"]],
                      file = tempfile(),
                      bucket = bucket_name,
                      check_region = FALSE,
                      verbose = TRUE)

## works with 3.4.3

## using verbose = T, the output looks as follows:

# Checking for credentials in user-supplied values
# Checking for credentials in Environment Variables
# Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
# Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
# Using Environment Variable 'AWS_DEFAULT_REGION' for AWS Region ('us-east-1')
# S3 Request URL: https://s3.amazonaws.com/REMOVED.zip
# Executing request with AWS credentials
# Checking for credentials in user-supplied values
# Using user-supplied value for AWS Access Key ID
# Using user-supplied value for AWS Secret Access Key
# Using user-supplied value for AWS Region ('us-east-1')
# Checking for credentials in user-supplied values
# Using user-supplied value for AWS Secret Access Key
# Using user-supplied value for AWS Region ('us-east-1')
# Parsing AWS API response
# Success: (200) OK

## session info for your system   
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    
# LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] aws.s3_0.3.12

# loaded via a namespace (and not attached):
# [1] httr_1.3.1          compiler_3.4.3      R6_2.2.2            tools_3.4.3         base64enc_0.1-3     curl_3.1           
# [7] yaml_2.1.18         Rcpp_0.12.16        aws.signature_0.4.1 xml2_1.2.0          digest_0.6.15




## but it does not work under R 3.5 / CURL 3.2
# R version 3.5.0 (2018-04-23)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
# LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] aws.s3_0.3.12

# loaded via a namespace (and not attached):
# [1] httr_1.3.1          compiler_3.5.0      R6_2.2.2            tools_3.5.0         base64enc_0.1-3     curl_3.2           
# [7] yaml_2.1.18         Rcpp_0.12.16        aws.signature_0.4.1 xml2_1.2.0          digest_0.6.15   

## which gives the following error

# Checking for credentials in user-supplied values
# Checking for credentials in Environment Variables
# Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
# Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
# Using Environment Variable 'AWS_DEFAULT_REGION' for AWS Region ('us-east-1')
# S3 Request URL: https://s3.amazonaws.com/REMOVED.zip
# Executing request with AWS credentials
# Checking for credentials in user-supplied values
# Using user-supplied value for AWS Access Key ID
# Using user-supplied value for AWS Secret Access Key
# Using user-supplied value for AWS Region ('us-east-1')
# Checking for credentials in user-supplied values
# Using user-supplied value for AWS Secret Access Key
# Using user-supplied value for AWS Region ('us-east-1')
# Error in curl::curl_fetch_disk(url, x$path, handle = handle) : 
#   Unrecognized content encoding type. libcurl understands deflate, gzip content encodings.

I hope you can find the cause here, so that we have a future-proof solution.
Kind regards

@leeper
Member

leeper commented Jun 11, 2018

I haven't seen this before. Can you give me the output of head_object(bucket[[1]][["Key"]]) (redacting anything that looks confidential)? I'm wondering if it's something about the specific file.

@leeper
Member

leeper commented Jun 11, 2018

Actually, related to that, does get_object() work for this file?

@dinosupervisor
Author

dinosupervisor commented Jun 11, 2018

head_object(bucket[[1000]][["Key"]],
+ file = tempfile(),
+ bucket = bucket_name,
+ check_region = F,
+ verbose = T)
Checking for credentials in user-supplied values
Checking for credentials in Environment Variables
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
Using Environment Variable 'AWS_DEFAULT_REGION' for AWS Region ('us-east-1')
S3 Request URL: https://s3.amazonaws.com/REMOVED.../REMOVED-events-10-2018-03-28-20-07-29-d93136e6-e751-4747-ba34-a11c4792bc0b.zip
Executing request with AWS credentials
Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
Checking for credentials in user-supplied values
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
[1] TRUE
attr(,"x-amz-id-2")
[1] "SO4XszQwDv12rflUsNiNTIZ5Y53pKaHHe09gXrdHonhy1tWhGI239wNmLZ7jFyBiXZUq71943/o="
attr(,"x-amz-request-id")
[1] "B890999CEB3BDAD6"
attr(,"date")
[1] "Mon, 11 Jun 2018 10:46:01 GMT"
attr(,"last-modified")
[1] "Wed, 28 Mar 2018 20:22:33 GMT"
attr(,"etag")
[1] "\"8fe21e20845be24b4e51029a5114c842\""
attr(,"content-encoding")
[1] "zip"
attr(,"accept-ranges")
[1] "bytes"
attr(,"content-type")
[1] "application/octet-stream"
attr(,"content-length")
[1] "3364"
attr(,"server")
[1] "AmazonS3"
attr(,"class")
[1] "HEAD"

@dinosupervisor
Author

dinosupervisor commented Jun 11, 2018

get_object() does this:

object <- get_object(bucket[[1]][["Key"]],
                     file = tempfile(),
                     bucket = bucket_name,
                     check_region = FALSE,
                     verbose = TRUE)

Checking for credentials in user-supplied values
Checking for credentials in Environment Variables
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
Using Environment Variable 'AWS_DEFAULT_REGION' for AWS Region ('us-east-1')
S3 Request URL: https://s3.amazonaws.com/REMOVED/REMOVED-10-2018-03-01-07-33-30-7e2a3b57-6942-4a27-8ff0-faa6e1340d6f.zip
Executing request with AWS credentials
Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
Checking for credentials in user-supplied values
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
Error in curl::curl_fetch_memory(url, handle = handle) :
Unrecognized content encoding type. libcurl understands deflate, gzip content encodings.

@leeper
Member

leeper commented Jun 11, 2018

Thanks. One more request, can you wrap the entire call in httr::with_verbose() and copy over the output (again, redacting anything confidential)? I'm interested in the request and response headers for the call.

@dinosupervisor
Author

dinosupervisor commented Jun 11, 2018

sure thing,

httr::with_verbose( head_object(bucket[[1]][["Key"]], file = tempfile(), bucket = bucket_name, check_region = F, verbose = T) )

gives

Checking for credentials in user-supplied values
Checking for credentials in Environment Variables
Using Environment Variable 'AWS_ACCESS_KEY_ID' for AWS Access Key ID
Using Environment Variable 'AWS_SECRET_ACCESS_KEY' for AWS Secret Access Key
Using Environment Variable 'AWS_DEFAULT_REGION' for AWS Region ('us-east-1')
S3 Request URL: https://s3.amazonaws.com/REMOVED/2018/03/01/07/REMOVED-10-2018-03-01-07-33-30-7e2a3b57-6942-4a27-8ff0-faa6e1340d6f.zip
Executing request with AWS credentials
Checking for credentials in user-supplied values
Using user-supplied value for AWS Access Key ID
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
Checking for credentials in user-supplied values
Using user-supplied value for AWS Secret Access Key
Using user-supplied value for AWS Region ('us-east-1')
-> HEAD /REMOVED/2018/03/01/07/REMOVED-10-2018-03-01-07-33-30-7e2a3b57-6942-4a27-8ff0-faa6e1340d6f.zip HTTP/1.1
-> Host: s3.amazonaws.com
-> User-Agent: libcurl/7.59.0 r-curl/3.2 httr/1.3.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> x-amz-date: 20180611T144838Z
-> x-amz-content-sha256: REMOVED
-> Authorization: AWS4-HMAC-SHA256 Credential=REMOVED/20180611/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-date, Signature=REMOVED
->
<- HTTP/1.1 200 OK
<- x-amz-id-2: REMOVED
<- x-amz-request-id: REMOVED
<- Date: Mon, 11 Jun 2018 14:48:42 GMT
<- Last-Modified: Thu, 01 Mar 2018 07:48:37 GMT
<- ETag: REMOVED
<- Content-Encoding: zip
<- Accept-Ranges: bytes
<- Content-Type: application/octet-stream
<- Content-Length: 630
<- Server: AmazonS3
<-
[1] TRUE
attr(,"x-amz-id-2")
[1] REMOVED
attr(,"x-amz-request-id")
[1] REMOVED
attr(,"date")
[1] "Mon, 11 Jun 2018 14:48:42 GMT"
attr(,"last-modified")
[1] "Thu, 01 Mar 2018 07:48:37 GMT"
attr(,"etag")
[1] REMOVED
attr(,"content-encoding")
[1] "zip"
attr(,"accept-ranges")
[1] "bytes"
attr(,"content-type")
[1] "application/octet-stream"
attr(,"content-length")
[1] "630"
attr(,"server")
[1] "AmazonS3"
attr(,"class")
[1] "HEAD"

@leeper
Member

leeper commented Jun 11, 2018

What's the file extension on the object key?

@dinosupervisor
Author

it's a zip file.

@dinosupervisor
Author

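Our server devs will upload .gz files from now on, which is supported and works.
Note that I can download the existing zip files after removing the "content-encoding" metadata entry (which was set to "zip") from the objects in the S3 bucket. "zip" is not among the content encodings the libcurl error message lists as understood (deflate, gzip), which explains the failure.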
@leeper sorry for the false alarm and thank you so much for helping to find the problem.
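
For anyone hitting the same error, here is a minimal sketch of a pre-flight check on the `content-encoding` attribute that `head_object()` returned earlier in this thread. The helper name `has_decodable_encoding()` and the `decodable_encodings` vector are hypothetical and not part of aws.s3. The list of accepted encodings is taken from the error message above, plus `identity` as an assumption. The stand-in objects only mimic the shape of a `head_object()` result so the sketch runs without AWS access.

```r
# Encodings that the libcurl error message in this thread lists as understood.
# "identity" (no transformation) is added here as an assumption.
decodable_encodings <- c("gzip", "deflate", "identity")

# Hypothetical helper (not part of aws.s3): takes a head_object()-style result,
# i.e. a logical carrying the response headers as attributes, and reports
# whether every declared content coding is in the list above.
has_decodable_encoding <- function(head_result, known = decodable_encodings) {
  encoding <- attr(head_result, "content-encoding")
  if (is.null(encoding)) {
    return(TRUE)  # no Content-Encoding header, so nothing to decode
  }
  # The header may hold a comma-separated list, e.g. "gzip, deflate".
  codings <- tolower(trimws(strsplit(encoding, ",", fixed = TRUE)[[1]]))
  all(codings %in% known)
}

# Stand-ins for head_object() results, so the sketch runs offline.
zip_object   <- structure(TRUE, "content-encoding" = "zip")
gzip_object  <- structure(TRUE, "content-encoding" = "gzip")
plain_object <- structure(TRUE, "content-type" = "application/octet-stream")

has_decodable_encoding(zip_object)    # FALSE
has_decodable_encoding(gzip_object)   # TRUE
has_decodable_encoding(plain_object)  # TRUE
```

With real data, the result of `head_object(bucket[[1]][["Key"]], bucket = bucket_name)` could presumably be passed in directly. A `FALSE` would suggest fixing the object's metadata before calling `save_object()` or `get_object()`.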

@leeper
Member

leeper commented Jun 13, 2018

Okay, sorry that sounds a bit frustrating, but glad you got it figured out.

leeper closed this as completed Jun 13, 2018