fridge is a simple approach to managing data analysis projects in
fridge assumes that data analysis projects are managed using a
hierarchical folder structure. Helper functions are kept seperated
from the code that specifies the final models etc., in a
Objects that need long computation times to be created are stored as
binary files (
*.rda) in a
cache directory and are reused if needed
rather than recreated each time the analyses are run.
install.packages("devtools") library(devtools) install_github("cszang/fridge")
lib directory inside the main project folder, and populate with auxilliary code needed for e.g. munging data from various sources, creating special plots, ...
To create cached objects, use
freeze() or its infix operator
%<f-% for assignment and caching in one step:
freeze("a_big_sum", sum(1:2000)) # the same as a_big_sum %<f-% sum(1:2000)
This will assign
sum(1:2000) to a new object
a_big_sum, and cache
it for later use. The next time the exact same call is made from the
same or another script, the assignment is not evaluated, but the
cached object is loaded. It will also store the checksum of the
expression used to create the object and issue a warning when the
expression has changed from the cached version of the object.
To "forget" an assignment including cached data, use
Compare checksums during assigning read in file
To assure that the object created while reading in data is identical to previous runs of the analysis, the SHA1 sum of the object can be stored and compared to the previous version. If the SHA1 of the current object diverges from the SHA1 sum of the same object from a previous run, a warning is issued and the old SHA1 sum is retained. After adapting the code to the new version of the data, delete the SHA1 sum corresponding to the object and re-create the object.
This is probably most helpful when your data is too big for Git(Hub), Git LFS is not an option, or the same big data files are used for multiple projects (e.g., large climate data sets of several GB in size).
e %<c-% read_table("some_file.txt")
request checks for the existance of an object in the current
session. If the object is not defined, it will try to load it into the
session from the project's
All cached objects can be deleted in one go using
empty_cache(). This will, however, not "forget" the objects in the
Loading library functions
Auxilliary code from the
lib directory can easily be sourced into
the R session using
load_lib(), which can either load all files from
lib folder (default), or only specified ones, which have to be
only referenced to by their base name without extension.