Sameer Adhikari:
This repo contains material I created for a three-day course that I offered at Intel in July 2013.
I have included the code and data for the examples I used in class.
The data for the patent jobs is not included because it exceeds GitHub's size limits.
The code requires access to a Hadoop cluster where you can run streaming jobs.
Some examples were also run on Amazon Web Services Elastic MapReduce without changing the code that ran on the local cluster.
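
For readers new to streaming jobs, here is a minimal sketch of what one can look like. It is a generic word count, not one of the course examples: the function names and the `map`/`reduce` command-line argument are hypothetical choices made for illustration. It assumes only the usual streaming contract, in which the mapper and the reducer read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one tab-separated (word, 1) record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: the script runs only when given "map" or "reduce".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as `wordcount.py`, the script can be checked without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`. On a cluster, the same two commands would be supplied through the `-mapper` and `-reducer` options of the Hadoop streaming jar.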