Description: Python module that allows you to easily write and run Hadoop programs.
Homepage: http://last.fm/dumbo
Clone URL: git://github.com/klbostee/dumbo.git
Klaas Bosteels (author)
Wed Dec 10 12:25:36 -0800 2008
commit  6880858ef7335d3af3075faa6781ad54e90e9c9c
tree    b0b3155a1bec119e559ca59370a629852cccfc69
parent  c0866f5ca102af5a9a3fe63e681542efd2f14974
DESCRIPTION
"""""""""""

Originally, Dumbo was just a simple Python module that made writing 
and running Streaming programs very easy, but now it also consists 
of some helper code in Java. More generally, Dumbo can be considered
to be a convenient Python API for writing MapReduce programs.


INSTALLATION
""""""""""""

The Java code gets built together with the rest of Hadoop when the 
"dumbo/" directory is put in Hadoop's "src/contrib/", and the Python 
module can be installed by running

sudo ant -f build-pymod.xml install_pymod

in the "src/contrib/dumbo" directory. If the "dumbo/" directory is a
subdirectory of Hadoop's "src/contrib/", the -f option can be omitted:

sudo ant install_pymod


USAGE
"""""

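Copy the example input file to the DFS:

/usr/local/hadoop/bin/hadoop dfs -put examples/brian.txt brian.txt

Run the word count example on it, replacing "/path/to/hadoop" with the
Hadoop installation directory ("/usr/local/hadoop" in the command above):

python examples/wordcount.py -hadoop /path/to/hadoop \
-file excludes.txt -input brian.txt -output brian-wc

Write the results to a local text file:

python -m dumbo cat brian-wc > brian-wc.txt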
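
The contents of "examples/wordcount.py" are not reproduced in this README.
As a rough illustration of what such a program computes, the sketch below
defines a word count mapper and reducer as plain Python generators and runs
them through a small in-memory driver. It does not use Dumbo or Hadoop at
all. The EXCLUDES set is a hypothetical stand-in for "excludes.txt", and
run_locally is a helper invented for this example. With Dumbo installed,
functions of this shape would presumably be handed to the module's own entry
point instead (dumbo.run(mapper, reducer) is assumed here, not confirmed by
this README).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for the words listed in excludes.txt.
EXCLUDES = frozenset(["the", "a", "of"])


def mapper(key, value):
    """Emit (word, 1) for every non-excluded word in one line of input."""
    for word in value.split():
        if word not in EXCLUDES:
            yield word, 1


def reducer(key, values):
    """Sum the counts collected for one word."""
    yield key, sum(values)


def run_locally(lines):
    """Simulate map, sort and reduce in memory (illustration only)."""
    mapped = []
    for offset, line in enumerate(lines):
        mapped.extend(mapper(offset, line))
    # Sorting by word plays the role of the shuffle between map and reduce.
    mapped.sort(key=itemgetter(0))
    results = []
    for word, pairs in groupby(mapped, key=itemgetter(0)):
        results.extend(reducer(word, (count for _, count in pairs)))
    return results


if __name__ == "__main__":
    sample = ["always look on the bright side of life",
              "the bright side of death"]
    for word, count in run_locally(sample):
        print("%s\t%d" % (word, count))
```

The driver keeps everything in one process, so it is only suitable for
checking the mapper and reducer logic on small inputs before submitting a
real job.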


MORE INFO
"""""""""

http://github.com/klbostee/dumbo/wikis