Description: Python module that allows you to easily write and run Hadoop programs.
Homepage: http://last.fm/dumbo
Clone URL: git://github.com/klbostee/dumbo.git
Klaas Bosteels (author)
Tue Dec 30 10:51:37 -0800 2008
DESCRIPTION
"""""""""""

Originally, Dumbo was just a simple Python module that made writing
and running Hadoop Streaming programs very easy, but it now also
includes some helper code in Java. More generally, Dumbo can be
considered a convenient Python API for writing MapReduce programs.
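
To illustrate the MapReduce programming model Dumbo exposes, here is a
minimal word-count sketch. It assumes the generator-based convention of
a mapper(key, value) and a reducer(key, values) that yield key/value
pairs. The helper run_locally is hypothetical and not part of Dumbo: it
only simulates the map, sort and reduce phases in plain Python so the
logic can be tried without a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: byte offset of the line (unused); value: one line of text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # key: a word; values: an iterable of partial counts for that word
    yield key, sum(values)


def run_locally(lines, mapper, reducer):
    """Hypothetical helper (not part of Dumbo): simulate map, sort, reduce."""
    # Map phase: feed (offset, line) pairs to the mapper, as text input would
    intermediate = []
    offset = 0
    for line in lines:
        intermediate.extend(mapper(offset, line))
        offset += len(line) + 1
    # Sort phase: order by key so equal keys end up next to each other
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: hand each key and its grouped values to the reducer
    output = []
    for key, pairs in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in pairs)))
    return output


if __name__ == "__main__":
    # On a cluster, a Dumbo program would instead end with something like
    #   import dumbo
    #   dumbo.run(mapper, reducer, combiner=reducer)
    # (assumed to match the Dumbo API of this release).
    for word, count in run_locally(["the quick brown fox", "the lazy dog"],
                                   mapper, reducer):
        print(word + "\t" + str(count))
```

Because summing is associative and commutative, the same reducer can
also serve as a combiner that pre-aggregates counts on the map side.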


INSTALLATION
""""""""""""

Dumbo is built together with the rest of Hadoop once the "dumbo/"
directory has been placed in Hadoop's "src/contrib/" directory. More
precisely, running "ant package" in Hadoop's root directory should
then generate a "build/hadoop-*/contrib/dumbo/" directory.


USAGE
"""""

contrib/dumbo/bin/put examples/brian.txt brian.txt

contrib/dumbo/bin/start examples/wordcount.py \
-input brian.txt -output brian-wc -inputformat text

contrib/dumbo/bin/cat brian-wc > brian-wc.txt
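
The file written by the last command can be post-processed with plain
Python. The sketch below assumes that each line of brian-wc.txt holds a
word, a tab and its count (the default key/value layout of Streaming
text output). The helpers read_counts and top_words are hypothetical,
not part of Dumbo, and the sample lines are made up.

```python
import io


def read_counts(lines):
    """Hypothetical helper: parse 'word<TAB>count' lines into a dict."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts, n):
    """Hypothetical helper: the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Made-up sample lines; for the real output use
    #   with open("brian-wc.txt") as f: counts = read_counts(f)
    sample = io.StringIO("always\t3\nlook\t2\nbright\t2\nside\t2\n")
    counts = read_counts(sample)
    print(top_words(counts, 2))  # [('always', 3), ('bright', 2)]
```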


MORE INFO
"""""""""

http://github.com/klbostee/dumbo/wikis