Pangool-Flow is an experimental module on top of Pangool (http://pangool.net) that adds automatic flow building and management, parallel execution, and high-level constructs.

Pangool-flow

Pangool-flow is an experimental module on top of Pangool (http://pangool.net).

Pangool-flow adds to Pangool:

  • Chaining Pangool MapReduce Jobs and executing the resulting flow.
  • Parallel execution of Jobs in a Flow.
  • High-level constructs called "Ops" (operations), similar to those in Cascading.

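The following is a minimal, self-contained Java sketch of what chaining Jobs and running independent Jobs in parallel means in general. It is not the Pangool-flow API: `FlowSketch`, `Step`, `plan`, `run` and the step names are hypothetical, and each `Runnable` only stands in for a Pangool MapReduce Job. Steps are grouped into "waves" so that every step whose inputs are already available runs alongside the others in a thread pool.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch, NOT the Pangool-flow API. Each Step stands in for a
// MapReduce Job; "inputs" names the steps whose output it consumes.
public class FlowSketch {

    record Step(String name, List<String> inputs, Runnable job) {}

    // Groups steps into waves: each step in a wave depends only on steps in
    // earlier waves, so the steps inside one wave can run in parallel.
    static List<List<Step>> plan(List<Step> steps) {
        Map<String, Step> byName = new LinkedHashMap<>();
        for (Step s : steps) byName.put(s.name(), s);
        Set<String> done = new HashSet<>();
        List<List<Step>> waves = new ArrayList<>();
        while (done.size() < byName.size()) {
            List<Step> wave = new ArrayList<>();
            for (Step s : byName.values()) {
                if (!done.contains(s.name()) && done.containsAll(s.inputs())) {
                    wave.add(s);
                }
            }
            if (wave.isEmpty()) {
                throw new IllegalStateException("Cycle or missing input in flow");
            }
            // Mark the wave as done only after collecting it, so no step in a
            // wave depends on another step of the same wave.
            for (Step s : wave) done.add(s.name());
            waves.add(wave);
        }
        return waves;
    }

    // Runs the waves in order; all steps of a wave are submitted together.
    static void run(List<Step> steps, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (List<Step> wave : plan(steps)) {
                List<Future<?>> running = new ArrayList<>();
                for (Step s : wave) running.add(pool.submit(s.job()));
                for (Future<?> f : running) f.get(); // waits; rethrows job failures
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Step> flow = List.of(
            new Step("countWords", List.of(), () -> System.out.println("job countWords")),
            new Step("countUrls", List.of(), () -> System.out.println("job countUrls")),
            new Step("join", List.of("countWords", "countUrls"),
                     () -> System.out.println("job join")));
        for (List<Step> wave : plan(flow)) {
            System.out.println(wave.stream().map(Step::name).toList());
        }
        run(flow, 2);
    }
}
```

In this example `countWords` and `countUrls` have no inputs, so they land in the same wave and may run concurrently; `join` waits until both have finished. Pangool-flow's real scheduling may differ.
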
What distinguishes Pangool-flow from other flow-based APIs such as Pig or Cascading is that it is built around the MapReduce abstraction: each step in the flow is a MapReduceStep. Because every MapReduceStep can be tuned individually as much as needed, using Pangool-flow involves no trade-off in flexibility.
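
To illustrate per-step tuning in general terms, here is a small hypothetical Java sketch (not the Pangool-flow API) in which each step carries its own job settings layered over flow-wide defaults, so one step can be tuned without affecting the others. `StepTuningSketch`, `StepConfig`, `effective` and the step names are invented for illustration; the keys `mapreduce.job.reduces` and `mapreduce.map.output.compress` are standard Hadoop property names, but how Pangool-flow actually exposes them is an assumption.

```java
import java.util.*;

// Hypothetical sketch, NOT the Pangool-flow API: each step keeps its own
// job-level overrides on top of defaults shared by the whole flow.
public class StepTuningSketch {

    record StepConfig(String name, Map<String, String> overrides) {}

    // Effective settings for one step: flow defaults first, then the step's
    // own overrides replace any matching keys.
    static Map<String, String> effective(Map<String, String> defaults, StepConfig step) {
        Map<String, String> merged = new TreeMap<>(defaults);
        merged.putAll(step.overrides());
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of(
            "mapreduce.job.reduces", "10",
            "mapreduce.map.output.compress", "true");
        // A light step keeps the defaults; a heavy join asks for more reducers.
        StepConfig light = new StepConfig("dedup", Map.of());
        StepConfig heavy = new StepConfig("join", Map.of("mapreduce.job.reduces", "200"));
        System.out.println(light.name() + " -> " + effective(defaults, light));
        System.out.println(heavy.name() + " -> " + effective(defaults, heavy));
    }
}
```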