source examples to support the "Cascading for the Impatient" blog post series
Latest commit 6155f85 Aug 22, 2016 Mark Castillo committed with fs111 Update to use Cascading 3.1.0
docs bug fix thanks to Will DelHagen Apr 5, 2013
etc fix s3 upload location Jun 19, 2015
gradle/wrapper set gradle wrapper to use gradle 2.2 Jul 14, 2015
impatient-docs/src
part1 Update to use Cascading 3.1.0 Aug 30, 2016
part2 Update to use Cascading 3.1.0 Aug 30, 2016
part3
part4 Update to use Cascading 3.1.0 Aug 30, 2016
part5 Update to use Cascading 3.1.0 Aug 30, 2016
part6 Update to use Cascading 3.1.0 Aug 30, 2016
.gitignore
README.md Update to use Cascading 3.1.0 Aug 30, 2016
build.gradle Update to use Cascading 3.1.0 Aug 30, 2016
gradlew Adding Gradle Wrapper so that users do not need to install Gradle Jul 14, 2015
gradlew.bat
settings.gradle update copyright May 19, 2015

README.md

Cascading for the Impatient

Welcome to Cascading for the Impatient, a tutorial for Cascading 3.1.x to get you started. Quickly. Like, yesterday.

This set of progressive coding examples starts with a simple file copy and builds up to a MapReduce implementation of the TF-IDF algorithm.

You can read the full series here: http://docs.cascading.org/impatient/

If you have a question or run into any problems, send an email to the cascading-user list.

Part 1

  • Implements the simplest possible Cascading app
  • Copies each TSV line from source tap to sink tap
  • Takes about a dozen lines of code
  • Physical plan: 1 Mapper

Part 2

  • Implements a simple example of WordCount
  • Uses a regex to split the input text lines into a token stream
  • Generates a DOT file, to show the Cascading flow graphically
  • Physical plan: 1 Mapper, 1 Reducer
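
To make the tokenize-then-count idea concrete without pulling in any dependencies, here is a minimal plain-Java sketch. It is not the Cascading API and not the tutorial's code: the class name `WordCountSketch` and the delimiter pattern are assumptions chosen for illustration. The regex split plays the role of the mapper-side tokenizer, and the per-token sum plays the role of the reducer-side count.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class WordCountSketch {
    // Assumed delimiter set: whitespace, brackets, parentheses, commas, periods.
    private static final Pattern DELIMITERS = Pattern.compile("[\\s\\[\\](),.]+");

    // Splits each line into tokens and counts how often each token occurs.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : DELIMITERS.split(line)) {
                if (token.isEmpty()) {
                    continue; // a line that starts with a delimiter yields one empty token
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {again=1, rain=2, shadow=1, the=1}
        System.out.println(count(Arrays.asList("rain shadow", "the rain, again.")));
    }
}
```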

Part 3

  • Uses a custom Function to scrub the token stream
  • Discusses when to use standard Operations vs. creating custom ones
  • Physical plan: 1 Mapper, 1 Reducer
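
The scrubbing step can be pictured with a small plain-Java sketch. This is an illustration only, not the tutorial's custom Function: the class name `ScrubSketch` and the specific rules (trim, lowercase, drop empty tokens) are assumptions about what "scrubbing" a token stream typically involves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ScrubSketch {
    // Normalizes a single token: strips surrounding whitespace and lowercases it.
    static String scrub(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    // Scrubs every token and drops the ones that end up empty.
    static List<String> scrubAll(List<String> tokens) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            String scrubbed = scrub(token);
            if (!scrubbed.isEmpty()) {
                cleaned.add(scrubbed);
            }
        }
        return cleaned;
    }
}
```

In a Cascading flow, logic like this would sit inside a custom operation applied to each tuple. A plain method is used here only to keep the example self-contained.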

Part 4

  • Shows how to use a HashJoin on two pipes
  • Filters a list of stop words out of the token stream
  • Physical plan: 1 Mapper, 1 Reducer
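
The stop-word filter follows the usual hash-join pattern: hold the small data set (the stop list) in memory and stream the large one (the tokens) past it. The sketch below shows that pattern in plain Java rather than with the Cascading `HashJoin` pipe. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordFilterSketch {
    // Keeps only the tokens that have no match in the in-memory stop list.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) { // hash lookup against the small side
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(List.of("a", "the", "of"));
        // prints [rain, shadow]
        System.out.println(removeStopWords(List.of("the", "rain", "shadow"), stopWords));
    }
}
```

Because the stop list fits in memory, each token needs only one hash lookup and the token stream never has to be sorted or grouped for the join itself.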

Part 5

  • Calculates TF-IDF using an ExpressionFunction
  • Shows how to use CountBy, SumBy, and CoGroup
  • Physical plan: 10 Mappers, 8 Reducers
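
As a rough guide to what the flow computes, the sketch below scores terms with a smoothed TF-IDF variant, tf × ln(n_docs / (1 + df)). It is plain Java, not Cascading code, and the exact expression is an assumption: the tutorial's ExpressionFunction may use a different form. The class name `TfIdfSketch` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TfIdfSketch {
    // Assumed smoothed variant: tf * ln(nDocs / (1 + df)).
    static double tfIdf(long tfCount, long dfCount, long nDocs) {
        return tfCount * Math.log((double) nDocs / (1.0 + dfCount));
    }

    // Returns docId -> (term -> score) for a corpus given as docId -> tokens.
    static Map<String, Map<String, Double>> score(Map<String, List<String>> docs) {
        // Document frequency: in how many documents each term appears.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : docs.values()) {
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> scores = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            // Term frequency: how often each term appears in this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc.getValue()) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> perDoc = new TreeMap<>();
            for (Map.Entry<String, Integer> t : tf.entrySet()) {
                perDoc.put(t.getKey(), tfIdf(t.getValue(), df.get(t.getKey()), docs.size()));
            }
            scores.put(doc.getKey(), perDoc);
        }
        return scores;
    }
}
```

The two counting passes correspond loosely to the per-key aggregations that CountBy and SumBy provide, and the final lookup of `df` for each term corresponds to joining the counts together, which is the job CoGroup does in a distributed flow.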

Part 6

  • Includes unit tests in the build
  • Shows how to use other TDD features: checkpoints, assertions, traps, debug
  • Physical plan: 11 Mappers, 8 Reducers

Part 7

This example is currently not implemented.

Part 8

  • Shows Scalding equivalents of the previous Cascading examples