memDecompress error #59
This looks like it might be a bug. Are you able to get the object as a raw vector using get_object()?
Yeah, the command executes. I wasn't sure what the output does or means, though.
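For reference, the raw vector that get_object returns can be inspected like this (a minimal sketch; the bucket and object names are placeholders):

```r
library(aws.s3)

o <- get_object("s3://my-bucket/my-object.rds")
class(o)   # "raw"
length(o)  # size of the object in bytes
head(o)    # first few bytes; gzip data begins with 1f 8b
```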
I am unable to reproduce this. Given that you can read the file using …
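One way to check whether the downloaded bytes match a known-good copy of the file on disk (a sketch; the paths are placeholders):

```r
library(aws.s3)

local <- "vcat.08-14.rds"  # hypothetical path to a local copy that readRDS can open
o <- get_object("s3://my-bucket/vcat.08-14.rds")
raw_local <- readBin(local, what = "raw", n = file.size(local))
identical(raw_local, o)  # FALSE would suggest the bytes differ from the original
```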
I had this exact behavior as well. Notably, the RDS that failed was a large file (85 MB), while s3readRDS worked fine on a small file (1 KB). I also verified that I can read the file via other means (an s3fs mount), so there is no reason to expect that the file is corrupt.
I haven't figured out yet how to parse the raw object. The error is around line 6045: https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/main/connections.c
I'm experiencing exactly the same issue. Did you find any solution?
Likewise - I get the same response as yasminlucero when trying to read an RDS file. Thanks
I'm experiencing the exact same issue. It appears to me that this … Thanks!
I have no idea if this is related to the issue everyone else is seeing, but in my use case s3saveRDS requires certain headers; however, attempting to use s3readRDS with the same headers results in the cryptic memDecompress error. Removing the headers from the s3readRDS call like … (see the sketch below).
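A minimal sketch of the pattern described, assuming extra headers are forwarded through ... to put_object as in the package's other functions; the actual headers in this use case were not shown, so the encryption header below is hypothetical:

```r
library(aws.s3)

# Hypothetical header; the original comment's headers were not shown.
enc_headers <- list(`x-amz-server-side-encryption` = "AES256")

# Saving with extra headers:
s3saveRDS(mtcars, object = "mtcars.rds", bucket = "my-bucket",
          headers = enc_headers)

# Reading back without passing the headers:
d <- s3readRDS(object = "mtcars.rds", bucket = "my-bucket")
```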
I am experiencing the same issue with package version …

First I tried to use … The second way I did this was to use … The third way I tried was to upload rds files to my bucket strictly using the …

I am not sure what is different about these files based on the method of upload. I was hopeful that at least the second approach using … For the time being, it seems that uploading strictly via …

Regards,
@leonawicz Can you give this a try on the latest version from GitHub?
I can confirm with the latest GitHub version …
FYI, this change has meant that I can't read any binary files I previously saved to S3 with the old method, which is a breaking change as far as I'm concerned. Re-uploading them with the new s3saveRDS method means they can then be read, but I can't do this for thousands of past files...
@Serenthia What error do you get when trying to read a previously uploaded RDS?
@leeper I also noticed just now that I could no longer read .rds files uploaded with the previous package version. I had to delete them all from AWS and re-upload them before I could read them with the newer package version.

This occurs when trying to read older .rds files; newer ones are fine. It seems the file created was somehow dependent on the aws.s3 package version. Hopefully it was a bug unique to the old version? I'm unsure why when reading a .rds file with …
Can confirm that that's the same behaviour and error message that I'm experiencing. Thanks for the reopen!
Okay, I think I've tracked this down to being a decompression issue. Just to confirm that you're experiencing it the same way (@Serenthia, @leonawicz), if you do this for one of the older files:

```r
o <- get_object("s3://yourbucket/yourobject")
unserialize(memDecompress(o, "gzip"))
```

do you get back what you expect?
@leeper Yes - using that, I can successfully read a file that otherwise returns the memDecompress error.
Okay, I've tracked this down to the previous behavior being a bug (specifically, …). However, because it would be annoying to figure this out for a given file, … Let me know if not and I'll continue to patch.
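For anyone curious, here is a minimal sketch (not the package's actual patch) of the kind of fallback being discussed: checking for the gzip magic bytes before deciding how to decode the downloaded object. The function name and arguments are hypothetical.

```r
library(aws.s3)

# Sketch only: assumes the object is either a gzip-compressed or an
# uncompressed serialized R object.
read_rds_flexibly <- function(object, bucket) {
  o <- get_object(object, bucket = bucket)
  # gzip streams always begin with the magic bytes 0x1f 0x8b
  if (length(o) >= 2 && o[1] == as.raw(0x1f) && o[2] == as.raw(0x8b)) {
    unserialize(memDecompress(o, type = "gzip"))
  } else {
    unserialize(o)
  }
}
```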
Thanks! 0.2.5 looks perfect 👍
What about non-RDS files? I fail to load a compressed JSON from S3. |
I found the workaround to be something like:

```r
library(aws.s3)
library(magrittr)

read_gzip_json_from_s3_to_df <- function(path) {
  #' Read a single gzipped JSON file from an S3 location into a data frame
  #'
  #' The compressed JSON should contain a single object per line,
  #' with no commas or array structure wrapping the objects.
  #'
  #' @param path S3 location of an object, e.g. s3://my-bucket/some/folders/file.json.gz
  path %>%
    get_object() %>%
    rawConnection() %>%
    gzcon() %>%
    jsonlite::stream_in() %>%
    jsonlite::flatten()
}
```
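Hypothetical usage of the workaround above (the path is a placeholder):

```r
df <- read_gzip_json_from_s3_to_df("s3://my-bucket/some/folders/file.json.gz")
str(df)
```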
I just had this happen on a fairly large dataset as well. The following code is how I upload to the server; is there a better way to do this to avoid this happening in the future?

…
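For comparison, a minimal round trip using the package's own helpers, which keeps the save and read paths symmetric (the bucket name is a placeholder):

```r
library(aws.s3)

fit <- lm(mpg ~ wt, data = mtcars)
s3saveRDS(fit, object = "fit.rds", bucket = "my-bucket")
fit2 <- s3readRDS(object = "fit.rds", bucket = "my-bucket")
all.equal(fit, fit2)  # should be TRUE if the round trip is lossless
```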
Hi team
I may be working in a very strange use case; I'm not sure. Feel free to disregard this if so.
I'm working in RStudio Server (version 0.99.465) hosted on an EC2 instance in AWS (Amazon Linux, R 3.2.2). I'm trying to use the aws.s3 package to access an .rds file that is in an S3 bucket. The file is approximately 200 MB on disk as an RDS. The EC2 instance is an m4.2xlarge, so there should be around 32 GB of RAM available.
The bucket is called "chek1", and

```r
get_bucket("chek1")
```

works fine. However, when I do:

```r
s3readRDS(object = "SAMHDA/RAWdata/vcat.08-14.rds", bucket = "chek1")
```

I get the following cryptic error message:

…
I'm not sure what's going on here. Does anyone have any ideas or workarounds? I really liked the look and feel of this package, and was pretty surprised to get tripped up by this. Googling the error message only returns a random conversation from 2012 between Hadley Wickham and Brian Ripley, lol.
Adam