Load .RDS files directly into environment gcs_get_object? #146
Comments
Yes, you can supply a custom parse function to load the object directly into R. You would want something like readRDS(). All the downloads write to disk at least temporarily, so it's not more efficient, but it is a lot more convenient :)
Hi Mark, thanks for the quick response. This must be what I'm not quite understanding, because when I run:
I get an error that the parsing failed. Thanks!
Sorry, I thought this would be simpler, but the raw RDS response is harder to deal with than I expected. The best I can come up with is a wrapper that saves to disk and then loads the file, which does what I thought it should do:
my_parse <- function(obj){
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  suppressMessages(gcs_get_object(obj, saveToDisk = tmp))
  readRDS(tmp)
}
obj <- my_parse("gs://bucket_name/obj.RDS")
I will look at whether this can be improved :)
Rich Fergie found the right functions for parsing RDS for you without needing to save to disk: https://twitter.com/RichardFergie/status/1385531335423447040
f <- function(obj) {
  readRDS(gzcon(rawConnection(httr::content(obj))))
}
gcs_get_object("obj.rds", parseFunction = f)
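To see why the gzcon(rawConnection(...)) chain works, here is a minimal sketch that needs no bucket. It assumes the response body is the raw bytes of a gzip-compressed RDS file (the saveRDS() default). The names parse_rds_raw and rds_bytes are illustrative only and are not part of googleCloudStorageR.

```r
# Stand-in for the bytes that httr::content(obj) would return:
# write an object with saveRDS() and read the file back as a raw vector.
tmp <- tempfile(fileext = ".rds")
saveRDS(mtcars, tmp)  # gzip-compressed by default
rds_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Hypothetical helper: wrap the raw vector in a connection, layer gzip
# decompression on top, and let readRDS() unserialize from it.
parse_rds_raw <- function(bytes) {
  con <- gzcon(rawConnection(bytes))
  on.exit(close(con))
  readRDS(con)
}

restored <- parse_rds_raw(rds_bytes)
identical(restored, mtcars)
```

The object never touches disk on the read side; the temporary file here only exists to produce realistic bytes for the demonstration.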
I added the function as a helper as it looked useful, so for the GitHub version you can use:
gcs_get_object("obj.rds", parseFunction = gcs_parse_rds)
See
Hey Mark, really appreciate your help on this! Unfortunately I'm still getting errors when I try it myself, although the errors differ depending on whether I use the GitHub branch or the CRAN version. Using the GitHub master branch and running the code below results in the following error:
If I revert to the CRAN version and use the custom parse function itself from the global environment, I get the following error messages:
For reference, the RDS object that I'm testing this with is 2.4GB. I'm also including sessionInfo below in case it's helpful! Thanks again so much for all your help on this and the quick response!!
Ok cool, seems your RDS is a special case compared to mine ;) May I ask if the RDS files you are using are "old", in that they were created before R 3.5? They changed the format type in that release, just trying to eliminate it as a cause.
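One way to check which serialization format a file uses is base R's infoRDS(), available since R 3.5.0. The sketch below writes a version 2 and a version 3 file locally purely for illustration; in practice you would point infoRDS() at the downloaded file. The variable names are hypothetical.

```r
old_file <- tempfile(fileext = ".rds")
new_file <- tempfile(fileext = ".rds")

saveRDS(iris, old_file, version = 2)  # the default format before R 3.6.0
saveRDS(iris, new_file, version = 3)  # the format introduced in R 3.5.0

# infoRDS() reads only the header, so it is cheap even for large files.
old_info <- infoRDS(old_file)
new_info <- infoRDS(new_file)

old_info$version
new_info$version

unlink(c(old_file, new_file))
```

The returned list also carries writer_version, which shows which R version wrote the file.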
Could you also issue
And I guess writing to disk should work ok?
my_parse <- function(obj){
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  suppressMessages(gcs_get_object(obj, saveToDisk = tmp))
  readRDS(tmp)
}
obj <- my_parse("gs://bucket_name/obj.RDS")
It may be that 2.4GB is just too big for R to decompress
FYI: for me, this works with a 10.2GB .RDS file that is saved without compression (with readr::write_rds). So the file size per se, at least, is not the issue. Thanks for implementing this very convenient parser function! |
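That an uncompressed file parses through the same chain is consistent with how gzcon() is documented: with its default allowNonCompressed = TRUE, input without a gzip header is passed through unchanged. A small local sketch, using base saveRDS(compress = FALSE) as a stand-in for readr::write_rds, which is an assumption about equivalence for this purpose:

```r
tmp <- tempfile(fileext = ".rds")
saveRDS(airquality, tmp, compress = FALSE)
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# An uncompressed RDS file in the default binary format starts with the
# bytes "X\n" rather than the gzip magic number 1f 8b.
header <- raw_bytes[1:2]

# Same chain as for compressed files; gzcon() passes the data through.
con <- gzcon(rawConnection(raw_bytes))
restored <- readRDS(con)
close(con)

identical(restored, airquality)
```

This only shows that compression is not required; it says nothing about memory limits for multi-gigabyte objects.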
Thanks @LukasWallrich, good to know. I think then @samuel-marsh's RDS file must have something unique about it - if it is downloaded locally trying to debug where the
Somewhat unrelated: this strategy also works for parsing UTF-16LE CSV files, which I haven't managed to do by just using
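As an illustration of a custom parser for UTF-16LE CSV data, here is one possible sketch. It is not necessarily what the commenter used: it decodes the raw bytes with iconv() rather than reading from a rawConnection, and parse_utf16le_csv is a hypothetical name. It assumes the bytes carry no byte order mark.

```r
# Stand-in for the raw response body: a small CSV encoded as UTF-16LE.
csv_text <- "name,value\nalpha,1\nbeta,2\n"
csv_bytes <- iconv(csv_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Hypothetical parser: decode the bytes to UTF-8 text, then parse as CSV.
# iconv() accepts a list of raw vectors as input.
parse_utf16le_csv <- function(bytes) {
  txt <- iconv(list(bytes), from = "UTF-16LE", to = "UTF-8")
  read.csv(text = txt, stringsAsFactors = FALSE)
}

df <- parse_utf16le_csv(csv_bytes)
df
```

With googleCloudStorageR, the same idea could presumably be passed as parseFunction = function(obj) parse_utf16le_csv(httr::content(obj, as = "raw")), though that wiring is untested here.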
I forgot to put here that If there are other useful parsing functions I'd be glad to put them in. |
@MarkEdmondson1234 - I think you might have meant to type Thank you so much for your contributions! googleCloudStorageR and googleCloudRunner are incredibly useful tools. |
Ah yes, that is it: gcs_ vs gce_. It gets confusing sometimes working on the packages at the same time ;) Glad they are helpful!
Hi,
This might be a naive question and I might be missing something, but I'm wondering if there is a way to load a file saved as .RDS from a GCP bucket directly into the local R environment without saving it to disk first.
I have been trying this with objects created with the single-cell analysis package Seurat, which creates S4 class objects (for more info on the Seurat object format, see https://github.com/mojaveazure/seurat-object and https://github.com/satijalab/seurat/wiki).
When I run:
It loads into the environment as a "Raw" file that is then unreadable by Seurat. If I add saveToDisk = "obj.RDS" and then subsequently read it into R with readRDS (or the wrapper read_rds), it works just fine and is readable by Seurat.
I'm wondering whether there is an additional parameter I am missing that would allow this or, if not, whether this is a feature that could be added.
Thanks!
Sam