Cascading for the Impatient, Part 3

The goal is to expand on our Word Count example in Cascading, and show how to write a custom Operation.

We'll keep building on this example until we have a MapReduce implementation of TF-IDF.
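
As a rough illustration of the kind of per-token logic a custom Operation might wrap, here is a minimal sketch in plain Java with no Cascading dependency. The class `TokenScrubSketch` and its methods `scrub` and `countWords` are hypothetical names, not part of this sample app, and the choice to trim and lowercase each token is an assumption made for the example. In Cascading itself, logic like this would typically live in a class that extends `BaseOperation` and implements `Function`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the sample app.
final class TokenScrubSketch {
    private TokenScrubSketch() {}

    // Normalize one token: strip surrounding whitespace and lowercase it.
    // (Assumed scrubbing rules; a real operation may do more or less.)
    static String scrub(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    // Split text on whitespace, scrub each token, skip empties, count occurrences.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : text.split("\\s+")) {
            String token = scrub(raw);
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Rain rain  SHADOW rain"));
    }
}
```

Scrubbing before counting means variants such as `Rain` and `rain` collapse into one key. This sketch does not strip punctuation, so `rain,` and `rain` would still be counted separately.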

More detailed background information and step-by-step documentation is provided at https://github.com/cascading/impatient/wiki

Build Instructions

To generate an IntelliJ project, use:

gradle ideaModule

To build the sample app from the command line, use:

gradle clean jar

Before running this sample app, be sure to set your HADOOP_HOME environment variable. Then clear the output directory and run the app on a desktop/laptop with Apache Hadoop in standalone mode:

rm -rf output
hadoop jar ./build/libs/impatient.jar data/rain.txt output/wc

To view the results:

more output/wc/part-00000
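
If you would rather inspect the results programmatically, the following is a minimal sketch in plain Java. It assumes each line of `output/wc/part-00000` holds a token and a count separated by a tab, possibly preceded by a header line; that layout is an assumption, not something this README confirms. The class `WordCountOutputSketch` and its `parse` method are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the sample app.
final class WordCountOutputSketch {
    private WordCountOutputSketch() {}

    // Parse "token<TAB>count" lines, keeping the order in which they appear.
    // Lines without exactly two fields, or with a non-numeric count
    // (for example a header row), are skipped.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                continue;
            }
            try {
                counts.put(fields[0], Long.parseLong(fields[1].trim()));
            } catch (NumberFormatException e) {
                // Assumed to be a header or malformed row; ignore it.
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Default path matches the output location used in the run command above.
        Path path = Paths.get(args.length > 0 ? args[0] : "output/wc/part-00000");
        parse(Files.readAllLines(path))
                .forEach((token, count) -> System.out.println(token + " -> " + count));
    }
}
```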

To run the Pig version of the script, make sure PIG_HOME is set and run:

rm -rf output
mkdir -p dot
pig -p docPath=./data/rain.txt -p wcPath=./output/wc ./src/scripts/wc.pig

An example log captured from a successful build and run is at https://gist.github.com/3021655

For more discussion, see the cascading-user email forum.

Stay tuned for the next installments of our Cascading for the Impatient series.
