Running programs
Local run:

    python program.py -input input.txt -output output.txt
    python -m dumbo cat output.txt | more

Distributed run on Hadoop:

    python program.py -hadoop <path to local hadoop> \
        -input <DFS input path> -output <DFS output path> [<options>]
    python -m dumbo cat <DFS output path> -hadoop <path to local hadoop> | more

Options (see also the Hadoop streaming page and wiki):
- -input <additional DFS input path>
- -python <python command to use on nodes> (“python” by default)
- -name <job name> (“program.py” by default)
- -nummaptasks <number>
- -numreducetasks <number> (no sorting or reducing will take place if this is 0)
- -priority <priority value>
- -libjar <path to jar> (this jar gets put in the class path)
- -libegg <path to egg> (this egg gets put in the Python path)
- -file <local file> (this file will be put in the dir where the python program gets executed)
- -cacheFile hdfs://<host>:<fs_port>/<path to file>#<link name> (a link “<link name>” to the given file will be in the dir)
- -cacheArchive hdfs://<host>:<fs_port>/<path to jar>#<link name> (link points to dir that contains files from given jar)
- -inputformat <name of an InputFormat class> (“TextInputFormat” by default)
- -cmdenv <env var name>=<value>
- -jobconf <property name>=<value>
- -fake yes (fake run, only prints the underlying shell commands but does not actually execute them)
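
When a job takes several of these options, it can help to assemble the command line in a small script rather than typing it by hand. The sketch below is only an illustration: `build_dumbo_command` is a hypothetical helper (not part of Dumbo), and the Hadoop path, DFS paths, and job name are made-up placeholders. It uses only the flags listed above, and adds `-fake yes` so that running the result would only print the underlying shell commands.

```python
import shlex


def build_dumbo_command(program, hadoop, inputs, output, options=None, python="python"):
    """Assemble the argument list for a distributed run (hypothetical helper).

    `options` is a list of (flag, value) pairs, e.g. ("numreducetasks", 4),
    using the option names from the list above without the leading dash.
    """
    if not inputs:
        raise ValueError("at least one input path is required")
    argv = [python, program, "-hadoop", hadoop]
    # Each DFS input path gets its own -input flag.
    for path in inputs:
        argv += ["-input", path]
    argv += ["-output", output]
    # Remaining options are passed through as "-<flag> <value>".
    for flag, value in (options or []):
        argv += ["-" + flag, str(value)]
    return argv


if __name__ == "__main__":
    # All paths and names here are placeholders, not real locations.
    argv = build_dumbo_command(
        "program.py",
        hadoop="/usr/local/hadoop",
        inputs=["logs/day1", "logs/day2"],
        output="counts",
        options=[("numreducetasks", 4), ("name", "wordcount"), ("fake", "yes")],
    )
    # Print a shell-safe command line for inspection or copy-pasting.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the options as a list of pairs preserves their order and allows repeated flags such as multiple `-file` or `-jobconf` entries; dropping the `("fake", "yes")` pair would turn the preview into a real run.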
Last edited by klbostee, 6 days ago