A repository that explores interacting with Hadoop through Python with IPython Notebooks.
Karung.ipynb
LICENSE
README.md

Karung

An example of creating MapReduce programs in Python using Hadoop and Pig.
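To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. It is not taken from the notebook in this repository; the function names (`map_words`, `reduce_counts`, `run_local`) and the sample input are hypothetical, and the whole pipeline runs in one process rather than on a cluster. It only illustrates the map, sort, and reduce phases that Hadoop would otherwise coordinate.

```python
import itertools


def map_words(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(sorted_pairs):
    """Reduce phase: sum the counts for each word.

    The pairs must already be sorted by word, which is what the
    shuffle/sort step of a real Hadoop job provides to the reducer.
    """
    by_word = itertools.groupby(sorted_pairs, key=lambda pair: pair[0])
    for word, group in by_word:
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce locally (no Hadoop involved)."""
    return list(reduce_counts(sorted(map_words(lines))))


# Hypothetical sample input, for illustration only.
sample = ["the elephant and the snake", "the snake"]
counts = run_local(sample)
```

With Hadoop Streaming, the mapper and reducer would typically live in separate scripts that read lines from standard input and write tab-separated key/value lines to standard output; the sketch keeps them as functions so the logic can be tried without a cluster.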

To check out my presentation, click here.

Installation

On Windows.

Get and install VirtualBox.

Get the Cloudera QuickStart VM. It comes with Hadoop, Pig, Hive, HBase, and ZooKeeper installed on RedHat Linux. Start your VM and open a terminal.

Execute the following commands as root to install Python 2.7:

yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel

wget http://python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2
tar xf Python-2.7.5.tar.bz2
cd Python-2.7.5
./configure 
make && make altinstall

Enter the following commands to install distribute:

wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.45.tar.gz
tar xf distribute-0.6.45.tar.gz
cd distribute-0.6.45
python2.7 setup.py install

A good tutorial on the process is located here.
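After both installs finish, a quick check such as the sketch below can confirm that the new interpreter runs and that the `setuptools` package (which distribute provides) can be imported. The helper name `describe_environment` is hypothetical, and the script assumes it is started with the freshly built interpreter, for example `python2.7 check_env.py`, where `check_env.py` is an assumed file name.

```python
import sys


def describe_environment():
    """Report the interpreter version and the importable setuptools version.

    The "setuptools" entry is None when the package cannot be imported,
    and "unknown" when it imports but exposes no __version__ attribute.
    """
    try:
        import setuptools
        setuptools_version = getattr(setuptools, "__version__", "unknown")
    except ImportError:
        setuptools_version = None
    return {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "setuptools": setuptools_version,
    }


info = describe_environment()
sys.stdout.write(
    "Python %s, setuptools %s\n" % (info["python"], info["setuptools"])
)
```

If the reported Python version is not 2.7.5, the script was most likely run with the system interpreter instead of the one installed by `make altinstall`.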

Name Etymology

The Karung Snake is a slang term for the Elephant Trunk Snake. Because these projects focus on interacting with Hadoop (whose mascot is an elephant) through Python (named like a snake), it seemed like a good choice.
