Pure python PIG-like language


GuineaPig

Guinea Pig is (yet another) workflow language for Hadoop. For more information, including a tutorial, see: http://curtis.ml.cmu.edu/w/courses/index.php/Guinea_Pig

As the name suggests, Guinea Pig is similar to Pig, with some important differences.

  • Guinea Pig is pure Python, and embedded in Python, so there's less new stuff to learn.

  • Guinea Pig is simple. Programs use only ten predefined classes (like Join and Flatten), and the full implementation is fewer than 1,500 non-comment source lines.

  • Guinea Pig programs can be executed incrementally, and you can inspect or reuse partially constructed outputs, much as you might use make to implement a workflow.

  • Guinea Pig programs can be executed with or without a Hadoop backend, so you can use Guinea Pig for small-to-medium-sized workflows and then migrate them easily to a cluster.
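
To make the make-like, incremental style concrete, here is a minimal sketch in plain Python. It does not use the Guinea Pig API: the helper names `materialize` and `run_word_count`, and the file names `tokens.txt` and `counts.txt`, are hypothetical and chosen only for illustration. Each step writes its output to a file, and a step is rebuilt only when its output file is missing, so intermediate results can be inspected and reused.

```python
import os
from collections import Counter


def materialize(path, build):
    """Return the lines stored at `path`, calling `build()` only if the file is missing."""
    if not os.path.exists(path):
        with open(path, "w") as f:
            for line in build():
                f.write(line + "\n")
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def run_word_count(workdir, corpus, log):
    """Two-step word count over `corpus` (a list of strings).

    `log` is a list that records the name of every step that was actually rebuilt.
    """
    tokens_path = os.path.join(workdir, "tokens.txt")  # hypothetical file name
    counts_path = os.path.join(workdir, "counts.txt")  # hypothetical file name

    def build_tokens():
        # Step 1: flatten each line into lowercase tokens.
        log.append("tokens")
        return [word.lower() for line in corpus for word in line.split()]

    def build_counts():
        # Step 2: group identical tokens and count them, reusing step 1 if cached.
        log.append("counts")
        tokens = materialize(tokens_path, build_tokens)
        counts = Counter(tokens)
        return ["%s\t%d" % (word, n) for word, n in sorted(counts.items())]

    return materialize(counts_path, build_counts)
```

Calling `run_word_count` twice on the same working directory rebuilds nothing the second time. Deleting only `counts.txt` causes the counting step to rerun while the cached `tokens.txt` is reused, which is the behavior the make analogy describes.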