makefile type behavior #11

Open
piccolbo opened this Issue Sep 13, 2011 · 1 comment

Collaborator

piccolbo commented Sep 13, 2011

In very large computations, it's important to avoid recomputing what has not changed since the last run. It would be nice to support this in rmr or an add-on package. The steps I envision are:

1. Compute a unique filename from the args of a mapreduce job to use as the output path.
2. Add logic to check the modification dates of inputs and outputs (or use CRCs if dates are unreliable) and trigger re-computation only as needed.
3. Define an input type or a package option that would trigger this behavior in a complex job without having to modify the job.

For example, we have a linear.least.squares function that triggers multiple jobs and was written before this makefile-like feature was even discussed. Can we switch on the makefile-like behavior without any internal modification? One possibility is a decorator-style solution; see addMemoization in the R.cache package for a similar approach.
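To make the three steps concrete, here is a minimal, language-neutral sketch written in Python rather than R. It is not rmr or R.cache code: `output_path_for`, `is_stale`, `makefile_like`, and the `job(inputs, output, **args)` calling convention are all hypothetical names chosen for illustration, and it assumes inputs and outputs are local files with trustworthy modification times (a CRC comparison could replace `is_stale` where they are not).

```python
import functools
import hashlib
import os


def output_path_for(job_name, args, cache_dir):
    """Step 1: derive a deterministic output path from the job name and its args."""
    key = repr((job_name, sorted(args.items()))).encode("utf-8")
    return os.path.join(cache_dir, hashlib.sha1(key).hexdigest())


def is_stale(inputs, output):
    """Step 2: the output is stale if it is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def makefile_like(cache_dir):
    """Step 3: a decorator that adds the behavior without editing the job body.

    The wrapped job is assumed (hypothetically) to have the signature
    job(inputs, output, **args) and to write its result to `output`.
    """
    def decorate(job):
        @functools.wraps(job)
        def wrapper(inputs, **args):
            # The inputs are part of the key, so different inputs get different outputs.
            key_args = dict(args, inputs=tuple(inputs))
            output = output_path_for(job.__name__, key_args, cache_dir)
            if is_stale(inputs, output):
                job(inputs, output, **args)  # recompute only when needed
            return output
        return wrapper
    return decorate
```

A job decorated with `@makefile_like(cache_dir=...)` is then called as `job(inputs, **args)` and returns the output path, running the underlying computation only when the output is missing or out of date. A multi-job function such as linear.least.squares would benefit only if each job it launches were wrapped this way, which is why a package-level option may be the more practical switch.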

Collaborator

piccolbo commented Oct 17, 2011

See also what Hadoopy is doing by leveraging Oozie features.
