A repository that explores interacting with Hadoop through Python using IPython Notebooks.



An example of creating MapReduce programs in Python using Hadoop and Pig.
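
One common way to run Python MapReduce code on Hadoop is Hadoop Streaming, which runs any executable as the mapper or reducer: input lines arrive on stdin, output is read back as tab-separated key/value lines, and the framework sorts mapper output by key before the reducer sees it. The word-count sketch below assumes that approach (it is not necessarily how this repository's notebooks do it), and the function names are hypothetical. `simulate_job` stands in for the shuffle/sort step with `sorted()` so the logic can be tested locally without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word.lower(), 1)


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each key; records must arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def simulate_job(lines):
    """Local stand-in for map -> shuffle/sort -> reduce, useful before submitting a job."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

For example, `simulate_job(["the cat", "The dog"])` yields `["cat\t1", "dog\t1", "the\t2"]`. In a real Streaming job the mapper and reducer would be separate scripts that write `map_lines(sys.stdin)` and `reduce_lines(sys.stdin)` to stdout, passed to the streaming jar through its `-mapper` and `-reducer` options.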

To check out my presentation, click here.

### Installation on Windows

Download and install VirtualBox.

Get the Cloudera QuickStart VM. It comes with Hadoop, Pig, Hive, HBase, and ZooKeeper preinstalled on Red Hat Linux. Start the VM and open a terminal.

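Run the following commands as root to build and install Python 2.7. Using `make altinstall` installs it as `python2.7` alongside the system's default `python` instead of replacing it: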

    yum groupinstall "Development tools"
    yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel

    wget http://python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2
    tar xf Python-2.7.5.tar.bz2
    cd Python-2.7.5
    ./configure --prefix=/usr/local
    make && make altinstall
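
The `-devel` packages installed above let the build compile Python's optional C extension modules such as `zlib`, `bz2`, `ssl`, `sqlite3`, `readline`, `curses`, and `_tkinter`. If a header package is missing, the build usually still finishes but quietly skips that module. A small check script, run with `python2.7`, can confirm they were built. This is a sketch: the helper name is hypothetical and the package-to-module mapping is the usual one, not something stated in this README.

```python
import importlib

# Optional stdlib modules mapped to the yum -devel package each one needs at build time.
OPTIONAL_MODULES = {
    "zlib": "zlib-devel",
    "bz2": "bzip2-devel",
    "ssl": "openssl-devel",
    "curses": "ncurses-devel",
    "sqlite3": "sqlite-devel",
    "readline": "readline-devel",
    "_tkinter": "tk-devel",
}


def missing_modules(names):
    """Return the module names that cannot be imported, preserving the given order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(sorted(OPTIONAL_MODULES)):
        # A missing module usually means its -devel package was absent at compile time.
        print("missing %s (install %s, then rebuild)" % (name, OPTIONAL_MODULES[name]))
```

If every module was built, the script prints nothing.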

Enter the following commands to install distribute:

    wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.45.tar.gz
    tar xf distribute-0.6.45.tar.gz
    cd distribute-0.6.45
    python2.7 setup.py install
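
To confirm that distribute registered itself with the new interpreter, one option is to query `pkg_resources`, the module distribute installs alongside `setuptools`. The helper below is an illustrative sketch (its name is hypothetical); run it with `python2.7`.

```python
def installed_version(project_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        import pkg_resources  # shipped with distribute (and with setuptools)
    except ImportError:
        # Without pkg_resources there is nothing to query.
        return None
    try:
        return pkg_resources.get_distribution(project_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # After the steps above this should report 0.6.45; None means the install did not register.
    print("distribute version: %s" % installed_version("distribute"))
```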

A good tutorial on this process is located here.

### Name Etymology

"Karung snake" is a slang term for the elephant trunk snake. Because these projects focus on interacting with Hadoop (whose mascot is an elephant) through Python, the name seemed like a fitting choice.