diff --git a/doc/flow.rst b/doc/flow.rst
index 7ee1a5c..e707e39 100644
--- a/doc/flow.rst
+++ b/doc/flow.rst
@@ -1,5 +1,6 @@
 Hadoopy Flow: Automatic Job-Level Parallization (Experimental)
 ==============================================================
+Hadoopy flow is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_flow. It is under active development.
 Once you get past the wordcount examples and you have a few scripts you use regularly, the next level of complexity is managing a workflow of jobs. The simplest way of doing this is to put a few sequential launch statements in a python script and run it. This is fine for simple workflows but you miss out on two abilities: re-execution of previous workflows by re-using outputs (e.g., when tweaking one job in a flow) and parallel execution of jobs in a flow.
 
 I've had some fairly complex flows and previously the best solution I could find was using Oozie_ with a several thousand line XML file. Once setup, this ran jobs in parallel and re-execute the workflow by skipping previous nodes; however, it is another server you have to setup and making that XML file takes a lot of the fun out of using Python in the first place (it could be more code than your actual task). While Hadoopy is fully compatible with Oozie, it certainly seems lacking for the kind of short turn-around scripts most users want to make.
diff --git a/doc/hbase.rst b/doc/hbase.rst
index 6d9dcd1..21c321b 100644
--- a/doc/hbase.rst
+++ b/doc/hbase.rst
@@ -1,3 +1,3 @@
 HBase Integration (Experimental)
 ================================
-Preliminary HBase support is available at http://github.com/bwhite/hadoopy_hbase. It is under active development.
+Hadoopy HBase support is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_hbase. It is under active development.
diff --git a/doc/helper.rst b/doc/helper.rst
index 2810c82..49e6b89 100644
--- a/doc/helper.rst
+++ b/doc/helper.rst
@@ -1,3 +1,3 @@
 Hadoopy Helper: Useful Hadoopy tools (Experimental)
 ===================================================
-Hadoopy Helper is available at http://github.com/bwhite/hadoopy_helper. It is under active development.
+Hadoopy Helper support is experimental and is maintained out of branch at https://github.com/bwhite/hadoopy_helper. It is under active development.
diff --git a/doc/projects.rst b/doc/projects.rst
index 0a9f5c6..e1fed4e 100644
--- a/doc/projects.rst
+++ b/doc/projects.rst
@@ -3,6 +3,21 @@ Example Projects
 Compute Vector Statistics
 -------------------------
+Compute statistics for different vector groups. The first job is a basic implementation, the second uses the "In-mapper" combine design pattern to minimize the amount of data sent to the reducer(s) during the shuffle phase.
+.. raw:: html
+
+
+
+.. raw:: html
+
+
+
+
+Below is the driver and test script for the above jobs.
+.. raw:: html
+
+
+
 Resize Images