Signed-off-by: Brandyn A. White <bwhite@dappervision.com>
Brandyn A. White committed Jul 30, 2012
1 parent 67ce134 commit d500999
Showing 4 changed files with 18 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/flow.rst
@@ -1,5 +1,6 @@
Hadoopy Flow: Automatic Job-Level Parallelization (Experimental)
================================================================
Hadoopy flow is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_flow. It is under active development.

Once you get past the wordcount examples and have a few scripts you use regularly, the next level of complexity is managing a workflow of jobs. The simplest way to do this is to put a few sequential launch statements in a Python script and run it. This is fine for simple workflows, but you miss out on two abilities: re-executing a previous workflow while reusing its existing outputs (e.g., when tweaking one job in a flow), and running independent jobs in a flow in parallel. I've had some fairly complex flows, and previously the best solution I could find was using Oozie_ with an XML file several thousand lines long. Once set up, it ran jobs in parallel and re-executed the workflow by skipping previously completed nodes; however, it is another server you have to set up, and writing that XML file takes a lot of the fun out of using Python in the first place (it could be more code than your actual task). While Hadoopy is fully compatible with Oozie, Oozie certainly seems lacking for the kind of short-turnaround scripts most users want to write.
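
As a rough illustration of those two abilities (not how hadoopy_flow itself works, which this page does not describe), the sketch below treats a flow as a set of jobs that each read some input paths and write one output path. Jobs whose output already exists are skipped, and every job whose inputs are all available is launched at the same time. The job-spec layout, ``run_flow``, and the ``launch`` callback are hypothetical; in a real flow ``launch`` might wrap ``hadoopy.launch_frozen`` and existence checks would go to HDFS (e.g. via ``hadoopy.exists``), but here both are stubbed so the sketch runs locally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(jobs, existing, launch, max_workers=4):
    """Run a flow of jobs, skipping finished ones and parallelizing the rest.

    jobs: dict mapping job name -> (list of input paths, output path)
    existing: set of paths that already exist (raw inputs plus any outputs
        left over from a previous run)
    launch: callable(name, inputs, output) that runs one job to completion;
        hypothetical, e.g. a wrapper around a real Hadoop launch call
    Returns the names of the jobs that were actually launched, in order.
    """
    available = set(existing)
    # Skip jobs whose output is already present (re-execution by reuse).
    pending = {name: spec for name, spec in jobs.items() if spec[1] not in available}
    launched = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A job is ready once every one of its inputs exists.
            ready = [name for name, (inputs, _) in pending.items()
                     if all(path in available for path in inputs)]
            if not ready:
                raise ValueError('unsatisfiable inputs for: %s' % sorted(pending))
            # Ready jobs do not depend on each other, so launch them together.
            futures = [(name, pool.submit(launch, name, *pending[name])) for name in ready]
            for name, future in futures:
                future.result()  # wait for the job; re-raises if it failed
                available.add(pending.pop(name)[1])
                launched.append(name)
    return launched


# Example flow (hypothetical paths): one tokenizer feeding two
# independent jobs whose outputs are combined into a report.
jobs = {
    'tokens': (['raw'], 'tokens'),
    'counts': (['tokens'], 'counts'),
    'bigrams': (['tokens'], 'bigrams'),
    'report': (['counts', 'bigrams'], 'report'),
}
```

Starting from only ``raw``, this launches ``tokens``, then ``counts`` and ``bigrams`` together, then ``report``; re-running after ``tokens`` and ``counts`` exist launches only ``bigrams`` and ``report``. Launching in waves keeps the sketch short, but it leaves slots idle while the slowest job of a wave finishes; a scheduler that reacts to each individual completion would keep the cluster busier.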

2 changes: 1 addition & 1 deletion doc/hbase.rst
@@ -1,3 +1,3 @@
HBase Integration (Experimental)
================================
Preliminary HBase support is available at http://github.com/bwhite/hadoopy_hbase. It is under active development.
Hadoopy HBase support is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_hbase. It is under active development.
2 changes: 1 addition & 1 deletion doc/helper.rst
@@ -1,3 +1,3 @@
Hadoopy Helper: Useful Hadoopy tools (Experimental)
===================================================
Hadoopy Helper is available at http://github.com/bwhite/hadoopy_helper. It is under active development.
Hadoopy Helper support is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_helper. It is under active development.
15 changes: 15 additions & 0 deletions doc/projects.rst
@@ -3,6 +3,21 @@ Example Projects

Compute Vector Statistics
-------------------------
Compute statistics for different vector groups. The first job is a basic implementation; the second uses the "in-mapper combining" design pattern, aggregating within each mapper before emitting, to minimize the amount of data sent to the reducer(s) during the shuffle phase.

.. raw:: html

   <script src="https://gist.github.com/3204235.js?file=vector_stats.py"></script>

.. raw:: html

   <script src="https://gist.github.com/3204235.js?file=vector_stats_imc.py"></script>


Below is the driver and test script for the above jobs.

.. raw:: html

   <script src="https://gist.github.com/3204235.js?file=driver"></script>
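
To make the difference between the two jobs concrete, here is a minimal local sketch; it is not the code in the gists above, which is not reproduced on this page. It assumes each input record is a ``(group, vector)`` pair with the vector as a list of floats, computes per-group count, mean, and (population) variance from running sums, and replaces Hadoop's shuffle with an in-memory group-by so the two mappers can be compared directly. All function names here are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def merge(a, b):
    """Combine two partial statistics (count, sums, sums of squares)."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def partial(vector):
    """Partial statistics for a single vector."""
    return (1, list(vector), [x * x for x in vector])


def basic_map(records):
    """Basic mapper: one (group, partial) pair per input vector."""
    for group, vector in records:
        yield group, partial(vector)


def imc_map(records):
    """In-mapper combining: fold vectors into a per-group accumulator held
    in memory, then emit one pair per group after the input is consumed."""
    acc = {}
    for group, vector in records:
        p = partial(vector)
        acc[group] = merge(acc[group], p) if group in acc else p
    yield from acc.items()


def reduce_stats(partials):
    """Reducer: merge partials, then derive count, mean, and variance."""
    count, sums, sq_sums = reduce(merge, partials)
    mean = [s / count for s in sums]
    var = [q / count - m * m for q, m in zip(sq_sums, mean)]
    return count, mean, var


def run_local(mapper, records):
    """Simulate map -> shuffle -> reduce; also count the shuffled pairs."""
    shuffled = defaultdict(list)
    pairs = 0
    for group, value in mapper(records):
        shuffled[group].append(value)
        pairs += 1
    return {g: reduce_stats(v) for g, v in shuffled.items()}, pairs
```

The reducer is the same for both mappers because the partial statistics (count, sum, sum of squares) merge associatively; only the number of pairs crossing the shuffle changes, from one per input vector to one per group per mapper. In hadoopy the same idea would typically be written as a mapper class that accumulates in its ``map`` method and emits from a ``close`` method; treat that as an assumption and check the hadoopy documentation. Computing variance as the mean of squares minus the squared mean is simple but can lose precision when values are large relative to their spread.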



Resize Images
