Mr. D. Patterns
Err uh... MapReduce Design Patterns.
As I work through the MapReduce Design Patterns book I need a place to stash my source code. This is it.
I stayed moderately true to the examples, with some re-arrangement here and there. Most notably, MRDPUtils#transformXmlToMap calls StringEscapeUtils#unescapeHtml itself, so the mappers that need unescaped values no longer have to do it separately.
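To illustrate that re-arrangement, here is a minimal sketch of a row parser that unescapes attribute values as it builds the map. It is not the code in this repository. The class name XmlRowSketch and the helper unescapeBasicHtml are hypothetical; the helper is a tiny stand-in for StringEscapeUtils#unescapeHtml so the sketch runs without commons-lang on the classpath, and it only knows five entities. The split-on-quotes parsing assumes each input line is a single `<row ... />` element, as in the StackOverflow dump.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the MRDPUtils class from this repo.
class XmlRowSketch {

    // Stand-in for StringEscapeUtils#unescapeHtml (assumption: only these
    // five entities matter). "&amp;" is replaced last so that an escaped
    // ampersand is not unescaped twice.
    static String unescapeBasicHtml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    // Parses one <row attr="value" ... /> line into a map and unescapes
    // every value here, so callers never have to.
    static Map<String, String> transformXmlToMap(String xml) {
        Map<String, String> map = new HashMap<>();
        String row = xml.trim();
        if (!row.startsWith("<row ") || !row.endsWith("/>")) {
            return map; // not a data row (e.g. the XML header); return empty
        }
        // Drop the leading "<row " and the trailing "/>".
        String body = row.substring(5, row.length() - 2);
        // Splitting on quotes alternates between `name=` and value tokens.
        String[] tokens = body.split("\"");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String key = tokens[i].trim();
            if (key.endsWith("=")) {
                key = key.substring(0, key.length() - 1);
            }
            map.put(key, unescapeBasicHtml(tokens[i + 1]));
        }
        return map;
    }

    public static void main(String[] args) {
        String line = "<row Id=\"1\" Body=\"&lt;p&gt;a &amp; b&lt;/p&gt;\" />";
        System.out.println(transformXmlToMap(line));
    }
}
```

With the unescaping done inside the parser, a mapper can read map.get("Body") and work with the markup directly.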
$ mvn package
I've placed a bunch of scripts in the ./bin/ directory. These make a few terrible assumptions about your environment. You can change ./bin/env.sh to be more accommodating.
- There is a $HADOOP_HOME, even though it's deprecated
- $DATADIR is mapped to
- You have the CC data dump from StackOverflow (I used 2009 because it's smallish; you should be able to use any year)
- The launch scripts assume single node mode
Make sure Hadoop is running ($HADOOP_HOME/bin/start-all.sh) and execute the script of your choice.