memDecompress error #59

Closed · achekroud opened this issue Jul 31, 2016 · 22 comments

@achekroud

Hi team

I may be working in a very strange use case; I'm not sure. Feel free to disregard this if so.

I'm working in RStudio Server hosted on an EC2 instance in AWS (Amazon Linux, R 3.2.2, RStudio Server 0.99.465). I'm trying to use the aws.s3 package to access an .rds file that is in an S3 bucket. The file is approximately 200 MB on disk as an RDS. The EC2 instance is an m4.2xlarge, so there should be around 32 GB of RAM available.

The bucket is called "chek1", and get_bucket("chek1") works fine.

However, when I do:
> s3readRDS(object = "SAMHDA/RAWdata/vcat.08-14.rds", bucket = "chek1")

I get the following cryptic error message:

Error in memDecompress(from = as.vector(r), type = "gzip") : 
  internal error -3 in memDecompress(2)

I'm not sure what's going on here. Does anyone have any ideas or workarounds? I really liked the look and feel of this package and was pretty surprised to get tripped up by this. Googling the error message only returns a random conversation from 2012 between Hadley Wickham and Brian Ripley.

Adam

@leeper leeper added the question label Aug 1, 2016
@leeper
Member

leeper commented Aug 1, 2016

This looks like it might be a bug. Are you able to get the object as a raw vector using get_object(object = "SAMHDA/RAWdata/vcat.08-14.rds", bucket = "chek1")?

@achekroud
Author

Yeah, the command executes. I wasn't sure what the output means, though.

@leeper
Member

leeper commented Sep 7, 2016

I am unable to reproduce this. Given that you can read the file using get_object(), this is probably an issue with the file rather than with this package. I'm closing for now. Feel free to open a new issue or follow up here if you continue to experience issues.

@leeper leeper closed this as completed Sep 7, 2016
@yasminlucero

yasminlucero commented Oct 6, 2016

I had this exact behavior as well. Notably, the RDS that failed was a large file (85 MB); s3readRDS worked fine on a small file (1 KB). Oh, and I verified that I can read the file via other means (an s3fs mount), so there is no reason to expect that the file is corrupt.

big.test <- s3readRDS(object = "bigtest.RDS", bucket = "grv-myexamplebucket")

Error in memDecompress(from = as.vector(r), type = "gzip") : 
  internal error -3 in memDecompress(2)

big.test.raw <- get_object(object = "bigtest.RDS", bucket = "grv-myexamplebucket")

  attr(big.test.raw, 'content-type')
[1] "application/octet-stream"
  attr(big.test.raw, 'content-length')
[1] "88697837"

I haven't figured out yet how to parse the raw object.

The error is raised around line 6045 of https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/main/connections.c
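
For reference, a sketch of one way to parse the raw vector by hand follows; the object and bucket names are the ones from the output above, and which variant works depends on how the file was originally written.

library(aws.s3)

# Fetch the object as a raw vector (this step works, per the output above)
big.test.raw <- get_object(object = "bigtest.RDS", bucket = "grv-myexamplebucket")

# If the file was written by base::saveRDS() (gzip-compressed by default),
# reading it through a gzip-aware connection should recover the object
con <- gzcon(rawConnection(big.test.raw))
big.test <- readRDS(con)
close(con)

# If instead it was written as serialize() output compressed with memCompress(),
# this variant may work
big.test <- unserialize(memDecompress(big.test.raw, type = "gzip"))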

@leeper leeper reopened this Oct 7, 2016
@vicmayrink

vicmayrink commented Jan 16, 2017

I'm experiencing exactly the same issue. Did you find any solution?

@mjpdenver

Likewise, I get the same response as yasminlucero when trying to read an RDS file.

Thanks

@fanghaolei

fanghaolei commented Apr 12, 2017

I'm experiencing the exact same issue. It appears to me that this memDecompress error only occurs when I first sync an .rds file to a bucket via the AWS CLI and then try to download it with s3readRDS().

Thanks!

@ieaves

ieaves commented Apr 18, 2017

I have no idea if this is related to the issue everyone else is seeing, but in my use case s3saveRDS requires headers = list("x-amz-server-side-encryption" = "AES256"), like so:

s3saveRDS(my_object, bucket=my_bucket, object=my_file_name, headers=list("x-amz-server-side-encryption" = "AES256"))

However, attempting to use s3readRDS with the same headers results in the cryptic memDecompress error.

Removing the headers from the read call, i.e. s3readRDS(bucket = my_bucket, object = my_file_name), allowed me to load from S3 successfully.

@leonawicz

I am experiencing the same issue with package version aws.s3_0.2.2.

First I tried to use s3readRDS on .rds files I had previously uploaded to an AWS S3 bucket using the S3 web GUI uploader. This gives the same memDecompress error noted above. I can always read the raw vector with get_object.

The second way I did this was to use put_object to upload .rds files to my bucket. Trying to load such a file with s3readRDS results in the same error.

The third way I tried was to upload .rds files to my bucket strictly using the s3saveRDS wrapper. Only if uploaded in this manner can I then subsequently load .rds files using s3readRDS.

I am not sure what is different about these files based on the method of upload. I was hopeful that at least the second approach, using put_object on local .rds files, would be a solution, because it is analogous to the approach I have to use for uploading .RData files: put_object directly instead of s3save (see issue #128).

For the time being, it seems that uploading strictly via s3saveRDS will avoid the read errors with s3readRDS. Not ideal, but it is working for me. And at least at a glance (I haven't fully tested) doing so fortunately does not appear to lead to the file size bloat seen in the above-referenced issue.
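
Concretely, the workaround amounts to something like the following sketch; the bucket and object names are placeholders:

library(aws.s3)

# Upload: let s3saveRDS() handle serialization and compression itself...
s3saveRDS(my_data, object = "my_data.rds", bucket = "my-bucket")

# ...and read it back with the matching wrapper
my_data2 <- s3readRDS(object = "my_data.rds", bucket = "my-bucket")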

Regards,
Matt

@leeper
Member

leeper commented Apr 23, 2017

@leonawicz Can you give this a try on the latest version from GitHub?

@leeper leeper closed this as completed in cfa7541 Apr 23, 2017
@leonawicz

leonawicz commented Apr 24, 2017

I can confirm that with the latest GitHub version (aws.s3_0.2.4) I can load an object into R via s3readRDS regardless of which of the three upload methods I'd previously used: uploading an R object directly with s3saveRDS, uploading a previously saved (via base saveRDS) local .rds file with put_object, or uploading a previously saved .rds file using the AWS web GUI uploader.

@Serenthia
Contributor

FYI, this change has meant that I can't read any binary files I previously saved to S3 with the old method, which is a breaking change as far as I'm concerned.

Re-uploading them with the new s3saveRDS method means they can then be read; however, I can't do this for thousands of past files...

@leeper
Member

leeper commented Apr 24, 2017

@Serenthia what error do you get when trying to read a previously uploaded RDS?

@leonawicz

@leeper I also noticed just now that I can no longer read .rds files uploaded with the previous package version. I had to delete them all from AWS and re-upload them before I could read them with the newer package version's s3readRDS. The error is:

Error in readRDS(tmp) : unknown input format

This occurs when trying to read older .rds files; newer ones are fine. It seems the file that gets created somehow depends on the aws.s3 package version. Hopefully it was a bug unique to the old version? I'm unsure why, when reading an .rds file with s3readRDS, it would matter how the file was created and uploaded to AWS, but for some reason the package version the file was made with seems to matter.

@leeper leeper reopened this Apr 24, 2017
@Serenthia
Contributor

Can confirm that that's the same behaviour and error message that I'm experiencing. Thanks for the reopen!

@leeper
Member

leeper commented Apr 25, 2017

Okay, I think I've tracked this down to being a decompression issue. Just to confirm that you're experiencing it the same way (@Serenthia, @leonawicz), if you do this for one of the older files:

o <- get_object("s3://yourbucket/yourobject")
unserialize(memDecompress(o, "gzip"))

Do you get back what you expect?

@Serenthia
Contributor

@leeper Yes: using that, I can successfully read a file that returns the unknown input format error under readRDS.

leeper added a commit that referenced this issue Apr 25, 2017
@leeper
Member

leeper commented Apr 25, 2017

Okay, I've tracked this down: the previous behavior was a bug (specifically, serialize() sets xdr = TRUE by default, i.e. writes big-endian, which is basically never what we want). The current behavior is correct and more consistent with using saveRDS() and readRDS() directly.

However, because it would be annoying to figure this out for a given file, s3readRDS() now tries to read the file and, if that fails, tries to unserialize it, so it should work on both older (incorrect) and newer files.
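
In rough pseudocode, the fallback looks something like this (a sketch of the approach, not the package's exact code; s3read_fallback is a made-up name):

s3read_fallback <- function(object, bucket, ...) {
  r <- get_object(object = object, bucket = bucket, ...)
  tryCatch(
    # Newer files: written the same way saveRDS() writes them
    readRDS(gzcon(rawConnection(r))),
    # Older files from the buggy behavior: gzip-compressed serialize() output
    error = function(e) unserialize(memDecompress(r, type = "gzip"))
  )
}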

Let me know if not and I'll continue to patch.

@Serenthia
Contributor

Thanks! 0.2.5 looks perfect 👍

@leeper leeper closed this as completed Apr 26, 2017
@drorata
Contributor

drorata commented Nov 4, 2019

What about non-RDS files? I'm failing to load a compressed JSON file from S3.

@drorata
Contributor

drorata commented Nov 4, 2019

I found the workaround to be something like:

library(aws.s3)
library(magrittr)

read_gzip_json_from_s3_to_df <- function(path) {
  #' Read a single gzipped JSON file from an S3 location into a data frame
  #'
  #' The compressed JSON should contain a single object per line,
  #' with no commas or array structure wrapping the objects.
  #'
  #' @param path S3 location of an object, e.g. s3://my-bucket/some/folders/file.json.gz
  path %>%
    get_object() %>%
    rawConnection() %>%
    gzcon() %>%
    jsonlite::stream_in() %>%
    jsonlite::flatten()
}
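
Called, for example, as:

df <- read_gzip_json_from_s3_to_df("s3://my-bucket/some/folders/file.json.gz")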

@dmaupin12

dmaupin12 commented Dec 2, 2021

I just had this happen on a fairly large dataset as well. The following code is how I upload to S3. Is there a better way to do this that would avoid the error in the future?

tmp <- tempfile()
saveRDS(full_data, tmp)
put_object(tmp, object = paste0(s3_path, "full_data.rds"), show_progress = TRUE, multipart = TRUE)
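
Based on the earlier comments in this thread, one alternative would be to skip the temp file and let s3saveRDS() do the serialization. This is only a sketch: "my-bucket" is a placeholder, and whether it helps with a multipart-sized object is untested here.

library(aws.s3)

# Serialize and upload in one step; s3_path is the same prefix used above,
# and the bucket argument may be redundant if s3_path already names the bucket
s3saveRDS(full_data, object = paste0(s3_path, "full_data.rds"), bucket = "my-bucket")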
