Commit

Added sphinx docs for PyPi. Some renames and hiding of methods so that the docs would make more sense.
David Marin committed Oct 21, 2010
1 parent ea048f9 commit bd46f0a
Showing 49 changed files with 2,290 additions and 899 deletions.
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
mrjob
=====

mrjob is a Python package that helps you write and run Hadoop Streaming jobs.

mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.

Some important features:

* Run jobs on EMR, your own Hadoop cluster, or locally (for testing)
* Write multi-step jobs (one map-reduce step feeds into the next)
* Duplicate your production environment inside Hadoop
* Upload your source tree and put it in your job's `$PYTHONPATH`
* Run `make` and other setup scripts
* Set environment variables (e.g. `$TZ`)
* Easily install Python packages from tarballs (EMR only)
* Setup handled transparently by the `mrjob.conf` config file
* Automatically interpret error logs from EMR
* SSH tunnel to the Hadoop job tracker on EMR
* Zero setup on Hadoop (no need to install mrjob on your Hadoop cluster)
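
The idea behind multi-step jobs can be shown without Hadoop. The pure-Python sketch below simulates the MapReduce data flow: map, then sort and group by key, then reduce. It chains two steps so that the first step's output becomes the second step's input. Step 1 counts words, and step 2 picks the most frequent word. This is not mrjob's API. `run_step`, `run_steps` and the mapper/reducer functions are hypothetical names chosen for illustration.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def run_step(records, mapper, reducer):
    """Simulate one map-reduce step: map, sort/group by key, reduce."""
    # Map phase: every (key, value) record may emit any number of pairs.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Shuffle/sort phase: all values for one key end up at the same reducer.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (value for _, value in group)))
    return output


def run_steps(records, steps):
    """Feed each step's output into the next step, as a multi-step job does."""
    for mapper, reducer in steps:
        records = run_step(records, mapper, reducer)
    return records


# Step 1: count words.
def count_words_mapper(_, line):
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def sum_counts_reducer(word, counts):
    # Re-key each total under one constant key so that step 2 sees them together.
    yield "max", (sum(counts), word)


# Step 2: pick the most frequent word.
def identity_mapper(key, value):
    yield key, value


def max_reducer(_, count_word_pairs):
    yield max(count_word_pairs)


if __name__ == "__main__":
    lines = [(None, "the cat"), (None, "The dog")]
    steps = [(count_words_mapper, sum_counts_reducer), (identity_mapper, max_reducer)]
    print(run_steps(lines, steps))  # [(2, 'the')]
```

Real Hadoop runs many mappers and reducers in parallel and sorts serialized keys. The sketch keeps only the data flow between steps.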

Installation
============
`python setup.py install`

Works out of the box with your own Hadoop cluster (just set `$HADOOP_HOME`).

Minimal EMR setup:

* Create an Amazon Web Services account: <http://aws.amazon.com/>
* Sign up for Elastic MapReduce: <http://aws.amazon.com/elasticmapreduce/>
* Get your access and secret keys (go to <http://aws.amazon.com/account/> and
  click on "Security Credentials") and set the environment variables
  `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY` accordingly
* Create at least one S3 bucket in the "US Standard" region to use for logs
  and scratch space: <https://console.aws.amazon.com/s3/home>

mrjob also works in other AWS regions (e.g. Asia), but you'll need to set up
`mrjob.conf` first; see "Advanced Configuration" below.
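
Before a first run, it can help to check that the environment variables named above are actually set. The sketch below relies only on what this README says. The `hadoop` runner needs `$HADOOP_HOME`, and the `emr` runner needs `$AWS_ACCESS_KEY_ID` and `$AWS_SECRET_ACCESS_KEY`. `missing_env_vars` is a hypothetical helper and is not part of mrjob.

```python
import os

# Which variables each runner needs, as described in this README.
# Assumption: this mapping is our reading of the setup notes, not mrjob's own list.
REQUIRED_ENV = {
    "hadoop": ["HADOOP_HOME"],
    "emr": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def missing_env_vars(runner, environ=None):
    """Return the variables this README asks for that are unset or empty.

    Hypothetical helper, not part of mrjob. Runners with no entry in
    REQUIRED_ENV (e.g. "local") get an empty list.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_ENV.get(runner, []) if not environ.get(name)]


if __name__ == "__main__":
    for runner in ("hadoop", "emr"):
        missing = missing_env_vars(runner)
        if missing:
            print(runner + ": please set " + ", ".join("$" + name for name in missing))
        else:
            print(runner + ": required variables are set")
```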


Try it out!
===========
```sh
# locally
python mrjob/examples/mr_word_freq_count.py README.md > counts
# on EMR
python mrjob/examples/mr_word_freq_count.py README.md -r emr > counts
# on your Hadoop cluster
python mrjob/examples/mr_word_freq_count.py README.md -r hadoop > counts
```
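
Roughly, a word-frequency job maps each line to `(word, 1)` pairs and sums the counts for each word. The sketch below simulates this in plain Python. It includes a combiner that pre-sums the counts from each mapper's share of the input before the reduce. This is an illustration under assumptions, not the code of `mrjob/examples/mr_word_freq_count.py`. The tokenizing regex, the lowercasing, and the names `mapper`, `combiner` and `word_freq_count` are our own choices.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Emit (word, 1) for every word on one input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def combiner(pairs):
    """Pre-sum one mapper's pairs so fewer of them reach the reducers."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def word_freq_count(splits):
    """Simulate the job; each split is the list of lines one mapper reads."""
    totals = Counter()
    for lines in splits:
        mapped = (pair for line in lines for pair in mapper(line))
        # Reduce: add each split's partial counts into the per-word totals.
        for word, count in combiner(mapped):
            totals[word] += count
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    splits = [["The cat sat", "on the mat"], ["the end"]]
    for word, count in word_freq_count(splits).items():
        print(f"{word}\t{count}")
```

The sketch prints tab-separated word/count lines for readability. The exact encoding of the real `counts` file depends on the job's output protocol.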


Advanced Configuration
======================
To run in other AWS regions, upload your source tree, run `make`, or use
other advanced mrjob features, you'll need to set up `mrjob.conf`. mrjob looks
for its conf file in:

* `~/.mrjob`
* `mrjob.conf` anywhere in your `$PYTHONPATH`
* `/etc/mrjob.conf`

See `mrjob.conf.example` for more information.
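
The lookup described above amounts to checking a short list of candidate paths. The sketch below makes two assumptions. First, the locations are tried in the order listed. Second, "anywhere in your `$PYTHONPATH`" means a file named `mrjob.conf` inside any directory on `$PYTHONPATH`. `candidate_conf_paths` and `find_mrjob_conf` are hypothetical helpers, not mrjob functions. The `exists` parameter lets you test the logic without touching the filesystem.

```python
import os


def candidate_conf_paths(environ):
    """List mrjob.conf locations in the order this README gives them."""
    paths = []
    home = environ.get("HOME")
    if home:
        paths.append(os.path.join(home, ".mrjob"))
    # One candidate per non-empty $PYTHONPATH entry (entries joined by os.pathsep).
    for directory in environ.get("PYTHONPATH", "").split(os.pathsep):
        if directory:
            paths.append(os.path.join(directory, "mrjob.conf"))
    paths.append("/etc/mrjob.conf")
    return paths


def find_mrjob_conf(environ=None, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    if environ is None:
        environ = os.environ
    for path in candidate_conf_paths(environ):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_mrjob_conf() or "no mrjob.conf found")
```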


Links
=====

* source: <http://github.com/Yelp/mrjob>
* documentation: <http://packages.python.org/mrjob/>
* Hadoop MapReduce: <http://hadoop.apache.org/mapreduce/>
* Elastic MapReduce: <http://aws.amazon.com/documentation/elasticmapreduce/>
70 changes: 0 additions & 70 deletions README.txt

This file was deleted.

3 changes: 3 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,3 @@
_build
_static
_templates
130 changes: 130 additions & 0 deletions docs/Makefile
@@ -0,0 +1,130 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build

# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/mrjob.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/mrjob.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/mrjob"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/mrjob"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	make -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

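The least obvious line in this Makefile is `ALLSPHINXOPTS`, which uses a computed variable name. `$(PAPEROPT_$(PAPER))` expands to `$(PAPEROPT_a4)` or `$(PAPEROPT_letter)` when `PAPER=a4` or `PAPER=letter`. For any other value it expands to nothing, because the resulting variable is undefined. The Python sketch below mirrors how each target assembles its `sphinx-build` command line. `sphinx_build_command` is a hypothetical helper for illustration and is not part of Sphinx.

```python
# Mirrors PAPEROPT_a4 / PAPEROPT_letter from the Makefile.
PAPEROPTS = {
    "a4": ["-D", "latex_paper_size=a4"],
    "letter": ["-D", "latex_paper_size=letter"],
}


def sphinx_build_command(builder, sphinxopts="", paper="", builddir="_build",
                         sphinxbuild="sphinx-build"):
    """Return the argv that `make <builder>` runs (hypothetical helper)."""
    # ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
    allopts = ["-d", builddir + "/doctrees"]
    # $(PAPEROPT_$(PAPER)) names an undefined variable, and so expands to
    # nothing, unless PAPER is exactly "a4" or "letter".
    allopts += PAPEROPTS.get(paper, [])
    allopts += sphinxopts.split()
    allopts.append(".")
    # $(SPHINXBUILD) -b <builder> $(ALLSPHINXOPTS) $(BUILDDIR)/<builder>
    return [sphinxbuild, "-b", builder] + allopts + [builddir + "/" + builder]


if __name__ == "__main__":
    print(" ".join(sphinx_build_command("latex", paper="a4")))
```

For example, `make latex PAPER=a4` corresponds to `sphinx_build_command("latex", paper="a4")`, which prints `sphinx-build -b latex -d _build/doctrees -D latex_paper_size=a4 . _build/latex`.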