Guinea Pig is (yet another) workflow language for Hadoop. For more information, including a tutorial, see: http://curtis.ml.cmu.edu/w/courses/index.php/Guinea_Pig
As the name suggests, Guinea Pig is similar to Pig, with some important differences.
Guinea Pig is pure Python, and embedded in Python, so there's less new stuff to learn.
Guinea Pig is simple. Programs use only ten pre-defined classes (like Join and Flatten), and the full implementation is less than 1500 non-comment-source lines.
Guinea Pig programs can be executed incrementally, and you can inspect and/or re-use partially constructed outputs - similar to the way that you might use make to implement a workflow.
Guinea Pig programs can be executed with or without a Hadoop backend, so you can use it for smaller-to-medium sized workflows, and then migrate these easily to a cluster.