Guinea Pig is (yet another) workflow language for Hadoop. For more information, including a tutorial, see:

As the name suggests, Guinea Pig is similar to Pig, with some important differences.

  • Guinea Pig is pure Python, and embedded in Python, so there's less new stuff to learn.

  • Guinea Pig is simple. Programs use only ten predefined classes (like Join and Flatten), and the full implementation is fewer than 1500 non-comment source lines.

  • Guinea Pig programs can be executed incrementally, and you can inspect or reuse partially constructed outputs, similar to the way you might use make to implement a workflow.

  • Guinea Pig programs can be executed with or without a Hadoop backend, so you can use it for small-to-medium-sized workflows and then migrate them easily to a cluster.
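
To make the make-like, incremental style concrete, here is a minimal sketch in plain Python. It does not use Guinea Pig's API. The helper `materialize`, the file names, and the toy corpus are all hypothetical, chosen only to illustrate the idea: each step writes its output to a file, and a step whose output already exists is skipped and its stored result reused.

```python
import os
import tempfile


def materialize(path, build):
    """Write the lines produced by build() to path, unless path already exists.

    Returns True if the step was (re)built and False if the stored output
    was reused. This is a hypothetical helper, not part of Guinea Pig.
    """
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        for line in build():
            f.write(line + "\n")
    return True


def read_lines(path):
    """Read a stored intermediate output back as a list of lines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical working directory and toy input.
workdir = tempfile.mkdtemp()
tokens_path = os.path.join(workdir, "tokens.txt")
counts_path = os.path.join(workdir, "counts.txt")
corpus = ["a rose is a rose", "is a rose"]


def build_tokens():
    # Step 1: flatten each line into one token per output line.
    for line in corpus:
        for tok in line.split():
            yield tok


def build_counts():
    # Step 2: group the stored tokens and count each one.
    counts = {}
    for tok in read_lines(tokens_path):
        counts[tok] = counts.get(tok, 0) + 1
    for tok in sorted(counts):
        yield f"{tok}\t{counts[tok]}"


# First run builds both steps; the second run reuses the stored outputs.
first_run = [materialize(tokens_path, build_tokens),
             materialize(counts_path, build_counts)]
second_run = [materialize(tokens_path, build_tokens),
              materialize(counts_path, build_counts)]
```

Because every intermediate result is an ordinary file, you can open `tokens.txt` to inspect it, or delete only `counts.txt` to force that one step to be rebuilt.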

