Updated

Signed-off-by: Brandyn A. White <bwhite@dappervision.com>
Commit d500999aee49e454efb7ce40f85be4b899a839c1 (1 parent: 67ce134), committed by @bwhite on Jul 30, 2012.
18 additions and 2 deletions in 4 files:
  1. +1 −0 doc/flow.rst
  2. +1 −1 doc/hbase.rst
  3. +1 −1 doc/helper.rst
  4. +15 −0 doc/projects.rst
1 doc/flow.rst
@@ -1,5 +1,6 @@
Hadoopy Flow: Automatic Job-Level Parallelization (Experimental)
================================================================
+Hadoopy Flow is experimental and is maintained separately at https://github.com/bwhite/hadoopy_flow. It is under active development.
Once you get past the wordcount examples and have a few scripts you use regularly, the next level of complexity is managing a workflow of jobs. The simplest approach is to put a few sequential launch statements in a Python script and run it. This is fine for simple workflows, but you miss out on two abilities: re-executing a previous workflow while re-using its outputs (e.g., when tweaking one job in a flow) and running the independent jobs of a flow in parallel. I've had some fairly complex flows, and previously the best solution I could find was Oozie_ with a several-thousand-line XML file. Once set up, it ran jobs in parallel and re-executed the workflow by skipping previously completed nodes; however, it is another server you have to set up, and writing that XML file takes a lot of the fun out of using Python in the first place (it could be more code than your actual task). While Hadoopy is fully compatible with Oozie, Oozie certainly seems lacking for the kind of short-turnaround scripts most users want to write.
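The two abilities described above (re-using existing outputs and running independent jobs in parallel) can be illustrated with a small local scheduler. This is a hypothetical sketch, not Hadoopy Flow's implementation: `run_flow` and the `(input_paths, output_path, func)` job-table layout are assumptions, each job is a plain callable standing in for a Hadoopy launch, and an "output path" is just a string in a set rather than a real HDFS path.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(jobs, existing, max_workers=4):
    """Run a dependency-ordered flow of jobs (hypothetical helper).

    jobs: dict mapping job name -> (input_paths, output_path, func).
    existing: paths already present, e.g. outputs left by a previous run.
    Returns the names of the jobs that actually ran, in wave order.
    """
    available = set(existing)
    pending = dict(jobs)
    ran = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Re-use outputs: a job whose output already exists is skipped.
            for name in [n for n, job in pending.items() if job[1] in available]:
                del pending[name]
            # A job is ready once every one of its inputs exists.
            ready = [n for n, job in pending.items()
                     if all(path in available for path in job[0])]
            if not ready:
                if pending:
                    raise ValueError('unsatisfiable inputs for: %s' % sorted(pending))
                break
            # Launch all ready jobs in parallel and wait for the whole wave.
            futures = [(name, pool.submit(pending[name][2])) for name in ready]
            for name, future in futures:
                future.result()  # re-raises an exception from a failed job
                available.add(pending.pop(name)[1])
                ran.append(name)
    return ran
```

With jobs `count` and `stats` both reading `raw` and a `join` job reading their outputs, the first call runs `count` and `stats` together, then `join`; calling it again with `{'raw', 'counts', 'stats'}` as `existing` runs only `join`, which is the "tweak one job and re-run" case. This wave-based design is the simplest correct one; a production scheduler would start each job as soon as its own inputs exist instead of waiting for the whole wave to finish.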
2 doc/hbase.rst
@@ -1,3 +1,3 @@
HBase Integration (Experimental)
================================
-Preliminary HBase support is available at http://github.com/bwhite/hadoopy_hbase. It is under active development.
+Hadoopy HBase support is experimental and is maintained separately at https://github.com/bwhite/hadoopy_hbase. It is under active development.
2 doc/helper.rst
@@ -1,3 +1,3 @@
Hadoopy Helper: Useful Hadoopy tools (Experimental)
===================================================
-Hadoopy Helper is available at http://github.com/bwhite/hadoopy_helper. It is under active development.
+Hadoopy Helper is experimental and is maintained separately at https://github.com/bwhite/hadoopy_helper. It is under active development.
15 doc/projects.rst
@@ -3,6 +3,21 @@ Example Projects
Compute Vector Statistics
-------------------------
+Compute statistics for different groups of vectors. The first job is a basic implementation; the second uses the in-mapper combining design pattern to minimize the amount of data sent to the reducer(s) during the shuffle phase.
+.. raw:: html
+
+ <script src="https://gist.github.com/3204235.js?file=vector_stats.py"></script>
+
+.. raw:: html
+
+ <script src="https://gist.github.com/3204235.js?file=vector_stats_imc.py"></script>
+
+Below is the driver and test script for the above jobs.
+
+.. raw:: html
+
+ <script src="https://gist.github.com/3204235.js?file=driver"></script>
+
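To make the difference between the two jobs concrete, here is a minimal pure-Python sketch of per-group vector statistics (count, mean, variance) with a basic mapper and an in-mapper combining mapper. It is not the code in the gist: the class names, the `(count, sum, sum_of_squares)` partial-record layout, and the `run_local` simulator of the map, shuffle, and reduce phases are assumptions, and vectors are plain lists instead of NumPy arrays. The class shape with `map` and `close` loosely mirrors Hadoopy's class-style mappers, but check the Hadoopy documentation before running it on a cluster.

```python
from collections import defaultdict


def _add(a, b):
    """Element-wise sum of two equal-length lists."""
    return [x + y for x, y in zip(a, b)]


class BasicMapper(object):
    """Emits one (count, sum, sum_of_squares) record per input vector."""

    def map(self, group, vec):
        yield group, (1, list(vec), [x * x for x in vec])


class InMapperCombiner(object):
    """Keeps one partial record per group in memory and emits them in close()."""

    def __init__(self):
        self.partials = {}

    def map(self, group, vec):
        sq = [x * x for x in vec]
        if group in self.partials:
            n, s, ss = self.partials[group]
            self.partials[group] = (n + 1, _add(s, vec), _add(ss, sq))
        else:
            self.partials[group] = (1, list(vec), sq)
        return ()  # nothing is emitted per input record

    def close(self):
        for group, partial in self.partials.items():
            yield group, partial


def reducer(group, partials):
    """Merges partial records; emits count, mean, and population variance."""
    n, s, ss = 0, None, None
    for pn, ps, pss in partials:
        n += pn
        s = ps if s is None else _add(s, ps)
        ss = pss if ss is None else _add(ss, pss)
    mean = [x / float(n) for x in s]
    var = [q / float(n) - m * m for q, m in zip(ss, mean)]
    yield group, {'count': n, 'mean': mean, 'var': var}


def run_local(mapper_cls, records):
    """Simulates map -> shuffle -> reduce; returns (results, records_shuffled)."""
    mapper = mapper_cls()
    emitted = []
    for key, value in records:
        emitted.extend(mapper.map(key, value))
    if hasattr(mapper, 'close'):
        emitted.extend(mapper.close())
    shuffle = defaultdict(list)
    for key, value in emitted:
        shuffle[key].append(value)
    results = {}
    for key in sorted(shuffle):
        for out_key, out_value in reducer(key, shuffle[key]):
            results[out_key] = out_value
    return results, len(emitted)
```

With the records `[('a', [1.0, 2.0]), ('a', [3.0, 4.0]), ('b', [5.0, 5.0])]`, both mappers produce the same statistics, but the basic mapper sends three records through the shuffle while the in-mapper combiner sends two, one per group; with many records per group the savings grow accordingly. The trade-off is that the combiner holds one partial record per group in mapper memory, and the `E[x^2] - E[x]^2` variance formula used here can lose precision for large values.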
Resize Images
