s3save/save significant file size inconsistency #128
Comments
@leonawicz Thanks for this report. I don't know how to explain this given that I've created a new branch that replaces all the internal usage of Can you give it a try and let me know how it affects your use case?

I can confirm with version If there is a way to achieve the same using The only other thing I am thinking about regarding the current

You can specify an object key with whatever extension you want. The tempfile gets purged immediately, so it would only be seen during some kind of debugging.

Oops. Right, sorry, I mixed up file and object when reading the

And to respond substantively, I've been reading (no time for full benchmarks, unfortunately) and it seems that the I/O cost of writing to disk is probably minimal, as rawConnections can apparently be somewhat slow. They also lead to modification of defaults for some functions, so my inclination is to stick with the new behavior. Closing (for now).
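The tradeoff weighed in the comment above — writing to disk versus building the payload in memory — can be sketched locally. This is not `aws.s3`'s actual code; base R's `serialize()` stands in for an in-memory approach (its output is uncompressed), while `save()` gzip-compresses by default:

```r
# Compare an object serialized in memory (uncompressed) with the same
# object written to a gzip-compressed .RData file by save().
x <- rep(rnorm(100), 1000)          # repetitive data that compresses well

raw_size <- length(serialize(x, connection = NULL))  # in-memory, no compression

f <- tempfile(fileext = ".RData")
save(x, file = f)                   # compress = TRUE (gzip) by default
disk_size <- file.size(f)

cat("in-memory serialization:", raw_size, "bytes\n")
cat("save() to disk:         ", disk_size, "bytes\n")
unlink(f)                           # purge the tempfile immediately, as s3save would
```

For data with any redundancy, the gzip-compressed file is substantially smaller than the raw serialized vector, which is the size gap reported in this issue.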
Hi,

This is a great package, easy to use, and seems to be the most promising of the existing packages. I am curious whether I am missing something or whether this could be addressed to make the package better.

I noticed that when using `s3save`, it sends a relatively uncompressed raw data file to AWS. While the .RData file it generates behaves exactly the same if downloaded and then loaded into R with the base function `load`, the file is much larger than if I were to save the same objects to a local .RData file using `save`. The latter is much more compressed.

While this in no way affects function, and `s3save` appears to be an analog of `save` on the surface, it produces much larger files than `save`. This also defeats the purpose of rapid file retrieval over the internet from AWS (e.g., in Shiny apps where a number of files are preferably stored externally and loaded on demand rather than hardcoded into the app).

An easy way around this is to use `save` to write a local .RData file and then use `put_object` to send the more compressed version of the R workspace file to AWS. For me, this made the package usable: I could retrieve ~1.8 MB files in about one second with `s3load`, rather than ~15 MB files containing identical objects, which took about 12 seconds to retrieve with `s3load` (far too slow for serving apps, for example). It was the perfect use case to highlight why I would have to take this latter approach.

What I am wondering (assuming I'm not missing some existing option) is: wouldn't it be preferable to increase the consistency between `s3save` and `save` in this regard? The method would not have to be a replacement, but perhaps an option (the default?) would be for `s3save` to simply create a local temporary .RData file behind the scenes using `save` and then upload that file to AWS via `put_object`. That would most accurately reproduce the behavior of `save` while avoiding unnecessary file-size expansion for remote storage/retrieval.

I am using `aws.s3_0.2.2` from GitHub.

Regards,
Matt
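The workaround described above can be sketched as follows. The bucket name `"my-bucket"` and object key `"dat.RData"` are placeholders, and the `aws.s3` calls are commented out since they require AWS credentials; `put_object` and `s3load` are the package functions named in this issue:

```r
# Workaround sketch: write the .RData file locally with save()
# (gzip-compressed by default), then upload that file as-is.
dat <- data.frame(x = rep(letters, 10000), y = rnorm(260000))

f <- tempfile(fileext = ".RData")
save(dat, file = f)  # compressed on disk

# library(aws.s3)
# put_object(file = f, object = "dat.RData", bucket = "my-bucket")
#
# # Later, retrieve and load it exactly as with s3save()-created objects:
# s3load(object = "dat.RData", bucket = "my-bucket")

cat("upload size:", file.size(f), "bytes\n")  # the compressed size crosses the wire
```

Because `put_object` ships the file bytes unchanged, the object retrieved by `s3load` is the same compressed .RData that `save` produced locally.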