Marc Limotte edited this page May 8, 2013 · 9 revisions


Lemur is a tool to launch hadoop jobs locally or on EMR, based on a configuration file, referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions (aka hooks) and zero or more "steps". A step is Amazon's name for a task or job submitted to the cluster. Lemur reads your jobdef, at the end of your jobdef, you execute (fire! ...) to make things happen. Also keep in mind that the jobdef is an interpreted clj file, so you can insert arbitrary Clojure code to be executed anywhere in the file (but see Jobdef Hooks for a better way).


Start Here Marc's blog announcement

A sample jobdef using Hadoop Streaming