# MapReduce Using `MRJob`

## Job Posting Dataset

The sample dataset we will use (`data/job-data/job-data-2018-09-08-00-00-37.txt`) contains job postings from one of the US job search websites. Each row of the file is a JSON document representing one job posting record.

The example below shows a sample job posting from the data file. The record has been formatted with 4-space indentation for readability; in the real file, each record is stored as a JSON document on a single row.

*Example: JSON document of a job posting record*

```
{
    "industry": "Information Technology",
    "datePosted": "2018-09-07",
    "salaryCurrency": "USD",
    "validThrough": "2018-10-07",
    "empId": 671932,
    "jobLocation": {
        "geo": {
            "latitude": "37.7623",
            "@type": "GeoCoordinates",
            "longitude": "-122.4145"
        },
        "@type": "Place",
        "address": {
            "postalCode": "94110-2042",
            "addressLocality": "San Francisco",
            "@type": "PostalAddress",
            "addressRegion": "CA",
            "addressCountry": {
                "@type": "Country",
                "name": "US"
            }
        }
    },
    "estimatedSalary": {
        "@type": "MonetaryAmount", 
        "currency": "USD", 
        "value": {
            "maxValue": "202000", 
            "@type": "QuantitativeValue", 
            "unitText": "YEAR", 
            "minValue": "146000"
        }
    }, 
    "description": "<div><em>Generate insights and impact from data</em><em>.</em></div>\n<br/>\n<div>\n<div>We're looking for data scientists to join the Analytics team who are excited about applying their analytical skills to understand our users and influence decision making. If you are naturally data curious, excited about deriving insights from data, and motivated by having impact on the business, we want to hear from you.</div><br/>\n\n<div><strong>You will:</strong></div><div>\n\n\n<ul>\n<li>Work closely with product and business teams to identify important questions and answer them with data.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Apply statistical and econometric models on large datasets to: i) measure results and outcomes, ii) identify causal impact and attribution, iii) predict future performance of users or products.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Design, analyze, and interpret the results of experiments.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Drive the collection of new data and the refinement of existing data sources.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Create analyses that tell a \"story\" focused on insights, not just data.</li>\n</ul>\n\n</div><br/>\n\n<div><strong>We're looking for someone with:</strong></div><div>\n\n\n<ul>\n<li>3+ years experience working with and analyzing large data sets to solve problems.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>A PhD or MS in a quantitative field (e.g., Economics, Statistics, Eng, Natural Sciences, CS).</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Expert knowledge of a scientific computing language (such as R or Python) and SQL.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Strong knowledge of statistics and experimental design.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Ability to communicate results clearly and a focus on driving impact.</li>\n</ul>\n\n</div><br/>\n\n<div><strong>Nice to 
haves:</strong></div><div>\n\n\n<ul>\n<li>Prior experience with data-distributed tools (Scalding, Hadoop, Pig, etc).</li>\n</ul>\n\n</div><br/>\n\n<div><strong>You should include these in your application:</strong></div><div>\n\n\n<ul>\n<li>Resume and LinkedIn profile.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Description of the most interesting data analysis you've done, key findings, and its impact.</li>\n</ul>\n\n</div><br/>\n\n<div>\n\n\n<ul>\n<li>Link to or attachment of code you've written related to data analysis.</li>\n</ul>\n\n</div>\n</div>\n<br/>", 
    "hiringOrganization": {
        "@type": "Organization", 
        "sameAs": "www.stripe.com", 
        "name": "Stripe"
    },
    "@type": "JobPosting", 
    "jobId": 2280174543, 
    "@context": "http://schema.org", 
    "employmentType": "FULL_TIME", 
    "occupationalCategory": [
        "15-1111.00", 
        "Computer and Information Research Scientists"
    ], 
    "title": "Data Scientist"
}
```
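
As a rough illustration of how one such row can be handled in plain Python, the sketch below parses an abbreviated copy of the sample record with the standard `json` module and reads a few nested fields. The `line` string and the `salary_range` helper are made up for this example and are not part of the dataset tooling.

```python
import json

# Abbreviated copy of the sample record, stored on one line as in the data file.
line = ('{"jobId": 2280174543, "title": "Data Scientist", '
        '"estimatedSalary": {"value": {"minValue": "146000", "maxValue": "202000"}}, '
        '"jobLocation": {"address": {"addressLocality": "San Francisco", '
        '"addressRegion": "CA"}}}')


def salary_range(record):
    """Return (min, max) salary as floats, or None if the fields are missing or malformed."""
    try:
        value = record['estimatedSalary']['value']
        return float(value['minValue']), float(value['maxValue'])
    except (KeyError, TypeError, ValueError):
        return None


record = json.loads(line)
city = record['jobLocation']['address']['addressLocality']
print(record['jobId'], city, salary_range(record))
```

Note that the salary bounds are stored as strings in the data, so they need an explicit conversion before any numeric comparison.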

## 1. Protocols For Input & Output

`mrjob` assumes that all data is newline-delimited bytes. Each job has an *input protocol*, an *internal protocol*, and an *output protocol*. These protocols can be changed by overriding the class attributes `INPUT_PROTOCOL`, `INTERNAL_PROTOCOL`, and `OUTPUT_PROTOCOL`, respectively.

The default *input* protocol is `RawValueProtocol`, which just reads in a line as a `str`.
The default *output* and *internal* protocols are both `JSONProtocol`, which reads and writes each key and value as JSON strings separated by a tab character.

`JSONValueProtocol` encodes the value as JSON and discards the key (the key is read in as `None`). To load the job posting dataset, we can set `INPUT_PROTOCOL = JSONValueProtocol`, which automatically loads each input row as a Python `dict` object.

For more information, see [Protocols](https://pythonhosted.org/mrjob/guides/writing-mrjobs.html#job-protocols).
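
To make the key/value handling concrete, here is a standard-library sketch that mimics the behaviour described above. The functions `read_json_value`, `write_json` and `read_json` are hypothetical stand-ins written for illustration; they are not `mrjob`'s implementation and they ignore encoding details.

```python
import json


def read_json_value(line):
    """Mimic a value-only JSON protocol: the whole line is the value, the key is None."""
    return None, json.loads(line)


def write_json(key, value):
    """Mimic a key/value JSON protocol: JSON-encoded key and value joined by a tab."""
    return json.dumps(key) + '\t' + json.dumps(value)


def read_json(line):
    """Inverse of write_json: split on the first tab and decode both halves."""
    raw_key, raw_value = line.split('\t', 1)
    return json.loads(raw_key), json.loads(raw_value)


key, value = read_json_value('{"jobId": 1, "title": "Data Scientist"}')
encoded = write_json(value['jobId'], value['title'])
print(key, encoded.split('\t'))
print(read_json(encoded))
```

Splitting on the first tab is safe in this sketch because `json.dumps` escapes tab characters inside strings, so a raw tab can only be the separator.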

**Example**: The script below loads the data into `MRTest.mapper` and generates key-value pairs where the keys are `jobId:int` and the values are `jobLocation:dict`; these are then written to the output files as JSON documents. Note that no `MRTest.reducer` is provided. Jobs of this type are sometimes called *map-only* jobs.

In [1]:
%%file mr-jobs/1_protocols.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

class MRTest(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol

    def mapper(self, _, value):
        yield value.get('jobId', None), value.get('jobLocation', None)

        
if __name__ == '__main__':
    MRTest.run()

Overwriting mr-jobs/1_protocols.py


Test locally:

In [2]:
!python3 mr-jobs/1_protocols.py ../data/job-data/ --output-dir mr-output

No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/1_protocols.hadoop.20180914.183728.393372
job output is in mr-output
Removing temp directory /tmp/1_protocols.hadoop.20180914.183728.393372...


Run on your Hadoop cluster:

In [3]:
!hdfs dfs -rm -r hdfs:///user/hadoop/mr-output

rm: `hdfs:///user/hadoop/mr-output': No such file or directory


In [4]:
!python3 mr-jobs/1_protocols.py -r hadoop \
hdfs:///user/hadoop/job-data/ \
--output-dir mr-output/

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop-2.8.4/bin...
Found hadoop binary: /usr/local/hadoop-2.8.4/bin/hadoop
Using Hadoop version 2.8.4
Looking for Hadoop streaming jar in /usr/local/hadoop-2.8.4...
Found Hadoop streaming jar: /usr/local/hadoop-2.8.4/share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar
Creating temp directory /tmp/1_protocols.hadoop.20180914.183752.023497
Copying local files to hdfs:///user/hadoop/tmp/mrjob/1_protocols.hadoop.20180914.183752.023497/files/...
Running step 1 of 1...
  packageJobJar: [/tmp/hadoop-unjar4781646709125021108/] [] /tmp/streamjob3101220794703745066.jar tmpDir=null
  Connecting to ResourceManager at /0.0.0.0:8032
  Connecting to ResourceManager at /0.0.0.0:8032
  Total input files to process : 1
  number of splits:2
  Submitting tokens for job: job_1536772908775_0010
  Submitted application application_1536772908775_0010
  The url to track the job: ht

## 2. Filtering

Key points:

- The filtering pattern finds a subset of the data and (usually) does not change the actual records.
  - We can set `OUTPUT_PROTOCOL = JSONValueProtocol` to ignore the key field for each record in the output.
- Filtering usually doesn't need a reducer when each record is evaluated individually and the decision does not depend on other records.
- Filtering often serves as a building block for other patterns.

Applications:

- Data cleaning
- Events tracking
- Records matching
- Random sampling
- Dataset splitting
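
The filtering pattern described above can be sketched without Hadoop as a generator that passes matching records through unchanged. The names `filter_records` and `has_salary` and the three toy records are assumptions made for this illustration.

```python
def filter_records(records, predicate):
    """Yield records unchanged when predicate(record) is true (a map-only filter)."""
    for record in records:
        if predicate(record):
            yield record


def has_salary(record):
    """Example data-cleaning predicate: keep records that carry a salary estimate."""
    return 'estimatedSalary' in record


jobs = [
    {'jobId': 1, 'title': 'Data Scientist', 'estimatedSalary': {}},
    {'jobId': 2, 'title': 'Nurse'},
    {'jobId': 3, 'title': 'Senior Data Scientist', 'estimatedSalary': {}},
]
kept = list(filter_records(jobs, has_salary))
print([job['jobId'] for job in kept])
```

Because the predicate looks at one record at a time, the same logic can run independently on every input split, which is why no reducer is needed.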

### 2.1 Simple Filtering

Simple filtering is often used for tasks such as data cleaning, event tracking, and outlier removal.

**Example**: Find all jobs with titles relevant to *Data Scientist*.

In [5]:
%%file mr-jobs/2.1_simple_filtering.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

class MRSimpleFiltering(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol
    OUTPUT_PROTOCOL = JSONValueProtocol
    
    def mapper(self, _, value):
        # `or ''` guards against records whose title is missing or null
        title = (value.get('title') or '').lower()
        if 'data scientist' in title:
            yield None, value
        
if __name__ == '__main__':
    MRSimpleFiltering.run()

Overwriting mr-jobs/2.1_simple_filtering.py


Test locally:

In [6]:
!python3 mr-jobs/2.1_simple_filtering.py ../data/job-data/* --output-dir mr-output

No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/2.hadoop.20180914.183939.999036
job output is in mr-output
Removing temp directory /tmp/2.hadoop.20180914.183939.999036...


Run on your Hadoop cluster:

In [9]:
!hdfs dfs -rm -r hdfs:///user/hadoop/mr-output

Deleted hdfs:///user/hadoop/mr-output


In [10]:
!python3 mr-jobs/2.1_simple_filtering.py \
-r hadoop hdfs:///user/hadoop/job-data/ \
--output-dir mr-output/

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop-2.8.4/bin...
Found hadoop binary: /usr/local/hadoop-2.8.4/bin/hadoop
Using Hadoop version 2.8.4
Looking for Hadoop streaming jar in /usr/local/hadoop-2.8.4...
Found Hadoop streaming jar: /usr/local/hadoop-2.8.4/share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar
Creating temp directory /tmp/2.hadoop.20180914.184104.444184
Copying local files to hdfs:///user/hadoop/tmp/mrjob/2.hadoop.20180914.184104.444184/files/...
Running step 1 of 1...
  packageJobJar: [/tmp/hadoop-unjar4384047019595041235/] [] /tmp/streamjob7867418860001505140.jar tmpDir=null
  Connecting to ResourceManager at /0.0.0.0:8032
  Connecting to ResourceManager at /0.0.0.0:8032
  Total input files to process : 1
  number of splits:2
  Submitting tokens for job: job_1536772908775_0012
  Submitted application application_1536772908775_0012
  The url to track the job: http://653a9ad8c076:80

### 2.2 Random Sampling

The random sampling pattern creates a (usually much smaller) subset of a larger dataset for quick exploration. Each record should have an equal probability of being selected.

If reproducibility is not required, we can use a random function, e.g. `random.uniform(a, b)` in Python, to do the work.
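
A minimal single-process sketch of this idea follows. The `random_sample` helper and its optional `seed` parameter are assumptions added for illustration; a fixed seed makes this sketch repeatable, but it does not by itself make a distributed job reproducible, since records are spread over several mapper processes.

```python
import random


def random_sample(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`."""
    if not 0 <= fraction <= 1:
        raise ValueError('Invalid fraction value')
    rng = random.Random(seed)  # seed=None gives a different sample on each run
    return [r for r in records if rng.random() < fraction]


records = list(range(10000))
subset = random_sample(records, fraction=0.1, seed=42)
print(len(subset) / len(records))  # close to 0.1, but generally not exactly 0.1
```

Since every record is kept or dropped independently, the size of the subset varies around `fraction * len(records)` rather than matching it exactly.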

To pass a `fraction` argument to our `MRJob` class, we override `MRJob.configure_args()` and call `MRJob.add_passthru_arg()` inside it.

**Example**: Create a random subset with 10% of the full dataset.

In [109]:
%%file mr-jobs/2.2_random_sampling.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

import random


class MRRandomSampling(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol
    OUTPUT_PROTOCOL = JSONValueProtocol
    
    def configure_args(self):
        super().configure_args()
        self.add_passthru_arg('-f', '--fraction', type=float)
        
    def mapper_init(self):
        fraction = self.options.fraction
        # --fraction has no default, so reject a missing value as well
        if fraction is None or not 0 <= fraction <= 1:
            raise ValueError('Invalid fraction value')
        
    def mapper(self, _, value):
        # keep each record independently with probability `fraction`
        if random.uniform(0, 1) < self.options.fraction:
            yield None, value
    
        
if __name__ == '__main__':
    MRRandomSampling.run()

Overwriting mr-jobs/2.2_random_sampling.py


Test locally:

In [112]:
!python3 mr-jobs/2.2_random_sampling.py ../data/job-data/ --output-dir mr-output/ --fraction .1

No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/2.hadoop.20180914.202619.288833
job output is in mr-output/
Removing temp directory /tmp/2.hadoop.20180914.202619.288833...


Run on your Hadoop cluster:

In [113]:
!hdfs dfs -rm -r hdfs:///user/hadoop/mr-output

Deleted hdfs:///user/hadoop/mr-output


In [114]:
!python3 mr-jobs/2.2_random_sampling.py \
-r hadoop hdfs:///user/hadoop/job-data/ \
--output-dir mr-output/ --fraction .1

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop-2.8.4/bin...
Found hadoop binary: /usr/local/hadoop-2.8.4/bin/hadoop
Using Hadoop version 2.8.4
Looking for Hadoop streaming jar in /usr/local/hadoop-2.8.4...
Found Hadoop streaming jar: /usr/local/hadoop-2.8.4/share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar
Creating temp directory /tmp/2.hadoop.20180914.202623.769842
Copying local files to hdfs:///user/hadoop/tmp/mrjob/2.hadoop.20180914.202623.769842/files/...
Running step 1 of 1...
  packageJobJar: [/tmp/hadoop-unjar1457984388549201326/] [] /tmp/streamjob3508481496319194522.jar tmpDir=null
  Connecting to ResourceManager at /0.0.0.0:8032
  Connecting to ResourceManager at /0.0.0.0:8032
  Total input files to process : 1
  number of splits:2
  Submitting tokens for job: job_1536772908775_0017
  Submitted application application_1536772908775_0017
  The url to track the job: http://653a9ad8c076:80

### 2.3 Data Splitting

For machine learning modeling, we usually divide the data set into two non-overlapping subsets:
- training set — a subset to train a model.
- test set — a subset to test the trained model.

To split the dataset into two such subsets, we need to make sure that:
- each record can only be selected into one of the two datasets
- sampling is reproducible

The `sample` function below returns either `True` or `False` based on the key and fraction:
1. Split the fraction into a *numerator* and a *denominator*, e.g. 0.125 -> 125/1000.
2. Calculate the hash value of the key. Here we use MD5, a widely used hash function that produces a 128-bit hash value.
3. Calculate the hash value modulo the *denominator*; if the result is less than the *numerator*, return `True`, otherwise return `False`.

Note: if you just want to randomly sample the dataset, then a simple random number generator will work.
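
Step 1 can be worked through on its own. The sketch below uses a hypothetical helper, `to_ratio`, to show how `decimal.Decimal.as_tuple()` exposes the digits and exponent of a fraction; going through `str(fraction)` avoids the binary floating-point artefacts that `Decimal(0.3)` would otherwise carry.

```python
import decimal


def to_ratio(fraction):
    """Split a decimal fraction into (numerator, denominator) via its decimal digits."""
    parts = decimal.Decimal(str(fraction)).as_tuple()
    # e.g. 0.125 -> digits (1, 2, 5) and exponent -3
    numer = int(''.join(str(d) for d in parts.digits))
    denom = 10 ** (-parts.exponent)
    return numer, denom


for fraction in (0.125, 0.3, 0.25):
    print(fraction, '->', to_ratio(fraction))
```

The ratio is not reduced (0.25 becomes 25/100 rather than 1/4), which is fine here because only the comparison `hash % denominator < numerator` matters.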

In [103]:
import decimal
import hashlib

def sample(key, fraction):
    if fraction > 1 or fraction < 0:
        raise ValueError('Invalid fraction value')
    # calculate numerator and denominator
    frac = decimal.Decimal(str(fraction)).as_tuple()
    numer = sum([v*10**i for i, v in enumerate(frac.digits[::-1])])
    denom = 10**(-frac.exponent)
    # calculate hash value using md5
    hash_val = hashlib.md5(str(key).encode()).hexdigest()
    return (int(hash_val, 16) % denom) < numer

In [104]:
# test the function with the code below
N = 1000
print(sum([sample(i, fraction=0.25) for i in range(N)]))

259
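
As a further check, the sketch below repeats the same hashing logic (so that it runs on its own) and verifies the two requirements stated earlier: the decision for a key never changes between calls, and every key lands in exactly one subset. The variable names `test_keys` and `train_keys` are chosen for this illustration.

```python
import decimal
import hashlib


def sample(key, fraction):
    """Same hashing logic as above, repeated so this block is self-contained."""
    frac = decimal.Decimal(str(fraction)).as_tuple()
    numer = int(''.join(str(d) for d in frac.digits))
    denom = 10 ** (-frac.exponent)
    hash_val = hashlib.md5(str(key).encode()).hexdigest()
    return (int(hash_val, 16) % denom) < numer


keys = range(1000)
test_keys = {k for k in keys if sample(k, 0.3)}
train_keys = {k for k in keys if not sample(k, 0.3)}

# Repeating the call gives the same answer, so the split is reproducible.
assert test_keys == {k for k in keys if sample(k, 0.3)}
# Every key lands in exactly one of the two subsets.
assert test_keys.isdisjoint(train_keys)
assert len(test_keys) + len(train_keys) == len(keys)
print(len(test_keys), len(train_keys))
```

The observed test share is only approximately 30%, because it depends on how the hash values of the particular keys happen to fall.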


**Example**: Randomly split a single dataset into a training set (70%) and a test set (30%).

In [143]:
%%file mr-jobs/2.3_train_test_splitting.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

import decimal
import hashlib

class MRTrainTestSplit(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol
    OUTPUT_PROTOCOL = JSONValueProtocol
    
    def configure_args(self):
        super().configure_args()
        self.add_passthru_arg('-s', '--split')
        self.add_passthru_arg('-t', '--test_size', type=float)
        
    def mapper_init(self):
        if self.options.split not in ('train', 'test'):
            raise ValueError('Invalid split value')
        if self.options.test_size > 1 or self.options.test_size < 0:
            raise ValueError('Invalid test size')
        
    def mapper(self, _, value):
        key = value.get('jobId', 0)
        include = self._sample(key=key, fraction=self.options.test_size)
        # `include` is True for test records; XOR flips the choice for --split train
        if include ^ (self.options.split == 'train'):
            yield None, value
    
    def _sample(self, key, fraction=1):
        frac = decimal.Decimal(str(fraction)).as_tuple()
        numer = sum([v*10**i for i, v in enumerate(frac.digits[::-1])])
        denom = 10**(-frac.exponent)
        hash_val = hashlib.md5(str(key).encode()).hexdigest()
        return (int(hash_val, 16) % denom) < numer
    
        
if __name__ == '__main__':
    MRTrainTestSplit.run()

Overwriting mr-jobs/2.3_train_test_splitting.py


Test locally:

In [148]:
!python3 mr-jobs/2.3_train_test_splitting.py ../data/job-data/ \
--output-dir mr-output/train \
--test_size 0.3 \
--split train \
&& python3 mr-jobs/2.3_train_test_splitting.py ../data/job-data/ \
--output-dir mr-output/test \
--test_size 0.3 \
--split test

No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/2.hadoop.20180914.205605.794433
job output is in mr-output/train
Removing temp directory /tmp/2.hadoop.20180914.205605.794433...
No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/2.hadoop.20180914.205606.327740
job output is in mr-output/test
Removing temp directory /tmp/2.hadoop.20180914.205606.327740...


Run on your Hadoop cluster:

In [149]:
!hdfs dfs -rm -r hdfs:///user/hadoop/mr-output

Deleted hdfs:///user/hadoop/mr-output


**Example**: Create a reproducible train/test split.

In [150]:
!python3 mr-jobs/2.3_train_test_splitting.py \
-r hadoop hdfs:///user/hadoop/job-data/ \
    --output-dir mr-output/train \
    --test_size 0.3 \
    --split train \
&& python3 mr-jobs/2.3_train_test_splitting.py \
-r hadoop hdfs:///user/hadoop/job-data/ \
    --output-dir mr-output/test \
    --test_size 0.3 \
    --split test

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop-2.8.4/bin...
Found hadoop binary: /usr/local/hadoop-2.8.4/bin/hadoop
Using Hadoop version 2.8.4
Looking for Hadoop streaming jar in /usr/local/hadoop-2.8.4...
Found Hadoop streaming jar: /usr/local/hadoop-2.8.4/share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar
Creating temp directory /tmp/2.hadoop.20180914.205751.274286
Copying local files to hdfs:///user/hadoop/tmp/mrjob/2.hadoop.20180914.205751.274286/files/...
Running step 1 of 1...
  packageJobJar: [/tmp/hadoop-unjar8384516427279074643/] [] /tmp/streamjob1112618257944548065.jar tmpDir=null
  Connecting to ResourceManager at /0.0.0.0:8032
  Connecting to ResourceManager at /0.0.0.0:8032
  Total input files to process : 1
  number of splits:2
  Submitting tokens for job: job_1536772908775_0018
  Submitted application application_1536772908775_0018
  The url to track the job: http://653a9ad8c076:80

## 3. Top N Pattern

The top N pattern retrieves the N highest-ranked items of a dataset. It differs from the previous patterns in that you know how many records you want in the end, no matter what the input size is.
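
One common way to keep only N items while scanning an arbitrarily large input is a bounded min-heap. The sketch below shows the idea with the standard `heapq` module; the `top_n` helper and the sample `(max, min)` salary pairs are invented for illustration.

```python
import heapq


def top_n(values, n):
    """Return the n largest values, largest first, using a heap of at most n items."""
    if n < 1:
        raise ValueError('Invalid top_n value')
    heap = []  # min-heap: heap[0] is the smallest of the current top n
    for value in values:
        if len(heap) < n:
            heapq.heappush(heap, value)
        else:
            # Push the new value, then drop the smallest of the n + 1 candidates.
            heapq.heappushpop(heap, value)
    return sorted(heap, reverse=True)


salaries = [(202000.0, 146000.0), (90000.0, 60000.0),
            (150000.0, 120000.0), (310000.0, 250000.0)]
print(top_n(salaries, 2))
```

Memory use stays proportional to N rather than to the input size, and tuples are compared element by element, so pairs are ranked by maximum salary first and minimum salary second.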

### 3.1 Top N Values

In [151]:
%%file mr-jobs/3.1_top_n_value.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

import heapq

class MRTopNValue(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol
    OUTPUT_PROTOCOL = JSONValueProtocol
        
    def configure_args(self):
        super().configure_args()
        self.add_passthru_arg('-n', '--top_n', type=int)
        
    def mapper(self, _, value):
        try:
            max_ = float(value['estimatedSalary']['value']['maxValue'])
            min_ = float(value['estimatedSalary']['value']['minValue'])
        except (KeyError, TypeError, ValueError):
            # skip records with missing, null, or non-numeric salary fields
            pass
        else:
            yield _, (max_, min_)
    
    def reducer_init(self):
        if self.options.top_n < 1:
            raise ValueError('Invalid top_n value')
        self.top_n = []
        
    def reducer(self, _, values):
        for value in values:
            if len(self.top_n) < self.options.top_n:
                heapq.heappush(self.top_n, value)
            else:
                heapq.heappushpop(self.top_n, value)
                
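    def reducer_final(self):
        # the heap is only partially ordered, so sort before emitting (largest first)
        for value in sorted(self.top_n, reverse=True):
            yield None, value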


if __name__ == '__main__':
    MRTopNValue.run()

Overwriting mr-jobs/3.1_top_n_value.py


Test locally:

In [158]:
!python3 mr-jobs/3.1_top_n_value.py ../data/job-data/ --output-dir mr-output --top_n 100

No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/3.hadoop.20180914.210728.290722
job output is in mr-output
Removing temp directory /tmp/3.hadoop.20180914.210728.290722...


Run on your Hadoop cluster:

In [160]:
!hdfs dfs -rm -r hdfs:///user/hadoop/mr-output

Deleted hdfs:///user/hadoop/mr-output


In [161]:
!python3 mr-jobs/3.1_top_n_value.py \
-r hadoop hdfs:///user/hadoop/job-data/ \
    --output-dir mr-output/ --top_n 100

No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop-2.8.4/bin...
Found hadoop binary: /usr/local/hadoop-2.8.4/bin/hadoop
Using Hadoop version 2.8.4
Looking for Hadoop streaming jar in /usr/local/hadoop-2.8.4...
Found Hadoop streaming jar: /usr/local/hadoop-2.8.4/share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar
Creating temp directory /tmp/3.hadoop.20180914.210816.817767
Copying local files to hdfs:///user/hadoop/tmp/mrjob/3.hadoop.20180914.210816.817767/files/...
Running step 1 of 1...
  packageJobJar: [/tmp/hadoop-unjar745154766757230548/] [] /tmp/streamjob4231415285271400797.jar tmpDir=null
  Connecting to ResourceManager at /0.0.0.0:8032
  Connecting to ResourceManager at /0.0.0.0:8032
  Total input files to process : 1
  number of splits:2
  Submitting tokens for job: job_1536772908775_0021
  Submitted application application_1536772908775_0021
  The url to track the job: http://653a9ad8c076:808

In [None]:
%%file mr-jobs/3.2_top_n_job.py
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol

import os
import json

class MRTopNJob(MRJob):
    
    INPUT_PROTOCOL = JSONValueProtocol
    OUTPUT_PROTOCOL = JSONValueProtocol
        
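    def configure_args(self):
        super().configure_args()
        self.add_passthru_arg('-n', '--top_n', type=int)
        # Assumption: --cache_dir is an illustrative option (not in the original
        # draft) naming a local directory with the output of 3.1_top_n_value.py.
        self.add_passthru_arg('--cache_dir', default='mr-output')
    
    def mapper_init(self):
        # Load the top-N (max, min) salary pairs; works for local runs only,
        # since the directory must be readable from every mapper.
        self.cache_dir = self.options.cache_dir
        self.cache = set()
        for cache_file in os.listdir(self.cache_dir):
            if cache_file.startswith('part-'):
                with open(os.path.join(self.cache_dir, cache_file), 'r') as f:
                    for line in f:
                        self.cache.add(tuple(json.loads(line)))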
        
    def mapper(self, _, value):
        try:
            max_ = float(value['estimatedSalary']['value']['maxValue'])
            min_ = float(value['estimatedSalary']['value']['minValue'])
        except (KeyError, TypeError, ValueError):
            # skip records with missing, null, or non-numeric salary fields
            pass
        else:
            if (max_, min_) in self.cache:
                yield _, value

if __name__ == '__main__':
    MRTopNJob.run()

In [None]:
!python3 mr-jobs/3.2_top_n_job.py ../data/job-data/job-data-2018-09-08-00-00-37.txt --output-dir mr-output-jobs

In [None]:
import json

tuple(json.loads('[1,2]'))

In [None]:
import os

os.getcwd()

In [None]:
os.listdir('mr-output/')

In [None]:
import requests

In [132]:
True ^ False

True