Pipeline outputs saved as RDS are significantly larger than they should be #56

Open
husamrahman opened this issue Mar 22, 2019 · 3 comments

@husamrahman

I have an object that's created by a function. In RStudio, the object itself is around 3GB in memory. When I save it as an RDS, the file size is ~800MB. When I run the same function as part of the pipeline, extract that specific object from the pipeline output, and save it, the RDS size becomes 5GB+. It seems that some additional dependencies are being saved along with the object in the RDS. Do I need to do something specific to remove the additional metadata?

Walkthrough:

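# Direct call (file names here are illustrative)
x <- Result_From_Some_Function()
saveRDS(x, "x_direct.rds")    # ~800MB on disk

# Same object extracted from the pipeline output
x <- output_from_pipeline@result$f1
saveRDS(x, "x_pipeline.rds")  # 5GB+ on disk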

@naren1991
Collaborator

This should not be the case. Within the R session, the object stored as output from the pipeline is exactly the same as the one returned by a direct call to the function. There might be minor differences when saving to RDS, but definitely not of the order you have mentioned.

I verified the same with a simple example of a plot on iris data, using object.size() to check the size of the object in memory.
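A minimal sketch of that kind of check, extended to also report the size on disk, since `object.size()` only measures the in-memory object. The helper name `compare_sizes` is hypothetical and not part of any package; `iris` stands in for the real object.

```r
# Hypothetical helper: report the in-memory size and the saved RDS size
# of an object, so the two can be compared side by side.
compare_sizes <- function(obj) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))  # remove the temporary file when done
  saveRDS(obj, path)
  c(
    in_memory_bytes = as.numeric(object.size(obj)),
    rds_bytes       = file.size(path)
  )
}

# Example on a small built-in data set
compare_sizes(iris)
```

Running the same helper on the directly created object and on the one extracted from the pipeline output would show whether the two differ in memory, on disk, or both.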

Could you share a reproducible example where you are facing this issue? Does this happen for the outputs of all functions, or only for this specific function? What is the class of the object returned by the function where you see this issue?

@husamrahman
Author

The specific object being generated is a random forest model built using the randomForestSRC package. I tested this multiple times and can reproduce it every time. Are there potential conflicts with this package?

@naren1991
Collaborator

This helps. There are no known conflicts, but this might be a problem specific to this object. I will investigate further.
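One thing that may be worth checking during that investigation, offered as an unconfirmed possibility rather than a diagnosis: in R, closures and formulas carry a reference to the environment they were created in, and serialization writes that environment out (unless it is the global environment, a package, or a namespace), while `object.size()` does not count it. The sketch below uses made-up names (`make_closure`, `big`) and does not involve the pipeline or randomForestSRC; it only illustrates the general effect.

```r
# Illustration only: a function created inside another function keeps
# that function's local environment alive, including large locals.
make_closure <- function() {
  big <- numeric(1e6)  # roughly 8 MB of local data
  function() NULL      # the returned closure's environment holds `big`
}

f_local  <- make_closure()  # environment: the call frame of make_closure()
f_global <- function() NULL # environment: the global environment

# serialize(x, NULL) returns the raw bytes that saveRDS() would write
# (before compression), so its length approximates the saved size.
size_local  <- length(serialize(f_local, NULL))
size_global <- length(serialize(f_global, NULL))

size_local > size_global  # the captured environment is serialized too
```

If the model object contains a formula or function whose environment differs between the direct call and the pipeline run, that could account for a size difference on disk that `object.size()` does not show.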
