# <u><p style="text-align: center;">MapReduce</p></u>

### Learning goals  
Students will be able to:  
*	Explain how map and reduce can be combined to operate on data

### Background

`Map` and `reduce` operations can be combined into a data-transformation workflow called **mapReduce**. As the name suggests, a function is first mapped over the data, and the mapped results are then combined by a `reduce` operation. The map step treats each element independently, and the reduce step can combine partial results in any grouping as long as the reducing function is associative. This is why the workflow parallelizes well and is suitable for working with large amounts of data.
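
To make the parallelization argument concrete, here is a minimal sketch. The names `square`, `add`, `numbers` and `chunks` are illustrative assumptions, and the chunks are processed one after another here rather than by real parallel workers. It shows only that mapping and reducing each chunk separately, then reducing the partial results, gives the same answer as a single pass:

```python
from functools import reduce

def square(x):
    return x * x

def add(x, y):
    return x + y  # addition is associative, so partial results can be combined freely

numbers = [1, 2, 3, 4, 5, 6]

# Single pass: map every element, then reduce the mapped values.
single_pass_total = reduce(add, map(square, numbers))

# Chunked pass: each chunk could, in principle, be handled by a separate worker.
chunks = [numbers[0:2], numbers[2:4], numbers[4:6]]
partial_totals = [reduce(add, map(square, chunk)) for chunk in chunks]
chunked_total = reduce(add, partial_totals)

print(single_pass_total, chunked_total)  # 91 91
```

The two totals agree because `add` is associative. A reducing function without that property, such as subtraction, would not in general give the same result when the data is split into chunks.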

### Code examples

#### Example 1:
Different datasets often represent dates in different ways. A common representation stores dates as `dd-mm-yyyy`. In our first example, we use mapReduce to process dates in this format and find the most recent year.

For this we have a list of dates:

In [None]:
dates = ['9-1-2004',
        '19-5-1986',
        '27-10-2018',
        '5-4-2021',
        '16-8-1936']

and a function that extracts the year from our date:

In [None]:
def extract_year(date):
    return date[-4:] #retain the last 4 characters of the date string

print(extract_year('27-10-2018'))

so we can proceed with the `map` and `reduce` operations:

In [None]:
from functools import reduce

years = list(map(extract_year, dates))
most_recent_year = reduce(max, years)
print(most_recent_year)
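
Note that `extract_year` returns strings, so `max` compares them character by character. That works here because every year has four digits, but the string `'987'` would compare as greater than `'2021'`. A possible variant, sketched below with the hypothetical name `extract_year_as_int`, converts each year to an integer before reducing:

```python
from functools import reduce

dates = ['9-1-2004',
         '19-5-1986',
         '27-10-2018',
         '5-4-2021',
         '16-8-1936']

def extract_year_as_int(date):
    # take the text after the last '-' and convert it to an integer
    return int(date.split('-')[-1])

years_as_int = list(map(extract_year_as_int, dates))
most_recent_year_int = reduce(max, years_as_int)
print(most_recent_year_int)  # 2021
```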

#### Example 2:
Next, we have gait length measurements for some chickens, which we want to convert from millimeters to meters. After the conversion, we want to find the chicken with the longest gait.

We have the chicken entries:

In [None]:
chickens = [{'name': 'Elliot', 'gait_length': 50 },
            {'name': 'Susanna', 'gait_length': 30 },
            {'name': 'John', 'gait_length': 10 },
            {'name': 'Jane', 'gait_length': 70},
            {'name': 'Dolores', 'gait_length': 40}]

a function that converts the gait from millimeters to meters for a chicken entry:

In [None]:
def mm_to_m(chicken):
    # build a new entry so that the original one is left unchanged
    return {'name': chicken['name'], 'gait_length': chicken['gait_length'] / 1000}

print(mm_to_m({'name': 'Elliot', 'gait_length': 50 }))

a function that compares the gait of two chicken entries:

In [None]:
def max_gait(chicken_1, chicken_2):
    
    if chicken_1['gait_length'] > chicken_2['gait_length']:
        longest_gait_chicken = chicken_1
    else:
        longest_gait_chicken = chicken_2
    
    return longest_gait_chicken

print(max_gait({'name': 'Susanna', 'gait_length': 30 }, {'name': 'Dolores', 'gait_length': 40}))

and the map and reduce operations:

In [None]:
converted_entries = list(map(mm_to_m, chickens))
longest_gait_chicken = reduce(max_gait, converted_entries)
print(longest_gait_chicken)
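
As a cross-check, the pairwise comparison used in the reduce step can be compared against Python's built-in `max` with a `key` function. This is only a sketch to illustrate what the reduction computes; the names `via_reduce` and `via_max` are illustrative assumptions:

```python
from functools import reduce

chickens = [{'name': 'Elliot', 'gait_length': 50},
            {'name': 'Susanna', 'gait_length': 30},
            {'name': 'John', 'gait_length': 10},
            {'name': 'Jane', 'gait_length': 70},
            {'name': 'Dolores', 'gait_length': 40}]

def mm_to_m(chicken):
    return {'name': chicken['name'], 'gait_length': chicken['gait_length'] / 1000}

def max_gait(chicken_1, chicken_2):
    # keep whichever entry has the longer gait (the second one on a tie)
    if chicken_1['gait_length'] > chicken_2['gait_length']:
        return chicken_1
    return chicken_2

converted_entries = list(map(mm_to_m, chickens))

via_reduce = reduce(max_gait, converted_entries)
via_max = max(converted_entries, key=lambda chicken: chicken['gait_length'])

print(via_reduce == via_max)  # True
```

The two approaches agree here because no two chickens share the same gait length. On a tie, `max_gait` keeps the later entry while the built-in `max` keeps the earlier one.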

#### Example 3:
Next, we are given the gross weight of some potato crates, and we want to calculate the net weight of the potatoes as well as the average net weight:

In [None]:
gross_weights = [30, 32, 28, 28.7, 31.2] #kg

To calculate the net weight we need to subtract the crate weight from each gross weight:

In [None]:
def calculate_net_weight(gross_weight):
    return gross_weight - 1  # subtract the crate weight, which we assume to be 1 kg

net_weights = list(map(calculate_net_weight, gross_weights))

and then calculate the average net weight:

In [None]:
def add(x, y):
    return x + y

total_net_weight = reduce(add, net_weights)
average_net_weight = total_net_weight / len(net_weights)
print(average_net_weight)
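
The average above relies on `len(net_weights)`, which requires the whole list to be available in one place. One possible alternative, sketched here with the hypothetical helpers `to_net_and_count` and `add_pairs`, maps each gross weight to a `(net_weight, 1)` pair and reduces the pairs, so the total and the count are accumulated together:

```python
from functools import reduce

gross_weights = [30, 32, 28, 28.7, 31.2]  # kg
CRATE_WEIGHT = 1  # kg, the same assumption as in the example above

def to_net_and_count(gross_weight):
    # map step: one (net weight, count) pair per crate
    return (gross_weight - CRATE_WEIGHT, 1)

def add_pairs(pair_1, pair_2):
    # reduce step: add the weights and the counts component-wise
    return (pair_1[0] + pair_2[0], pair_1[1] + pair_2[1])

total_net, crate_count = reduce(add_pairs, map(to_net_and_count, gross_weights))
average_from_pairs = total_net / crate_count
print(average_from_pairs)
```

Because `add_pairs` is associative, partial `(total, count)` pairs computed on separate chunks of the data could be combined in the same way.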

<span style="display:none" id="question1">W3sicXVlc3Rpb24iOiAiV2hpY2ggb2YgdGhlIGZvbGxvd2luZyBzdGF0ZW1lbnRzIGFyZSBjb3JyZWN0IGZvciBtYXBSZWR1Y2U/IiwgInR5cGUiOiAibXVsdGlwbGVfY2hvaWNlIiwgImFuc3dlcnMiOiBbeyJjb2RlIjogIml0IGJyZWFrcyBkb3duIHRoZSAgICAgIHdvcmtsb2FkIGludG8gc21hbGxlciAgIHBpZWNlcyBvZiB3b3JrIHRoYXQgY2FuIGJlIG9wZXJhdGVkIG9uIGluICAgICAgIHBhcmFsbGVsIiwgImNvcnJlY3QiOiB0cnVlfSwgeyJjb2RlIjogIml0IGNhbm5vdCBiZSB1c2VkIGZvciAgIHNtYWxsIGRhdGFzZXRzIiwgImNvcnJlY3QiOiBmYWxzZSwgImZlZWRiYWNrIjogIkl0IGlzIHN1aXRhYmxlIGZvciBib3RoIHNtYWxsIGFuZCBiaWcgZGF0YXNldHMifSwgeyJjb2RlIjogIml0IGlzIHN1aXRhYmxlIG9ubHkgZm9yIG51bWVyaWNhbCBkYXRhc2V0cyIsICJjb3JyZWN0IjogZmFsc2UsICJmZWVkYmFjayI6ICJJdCBpcyBub3QgbGltaXRlZCB0byBudW1lcmljYWwgZGF0YSJ9XX1d</span>

## Quiz

#### Q1:

In [None]:
from jupyterquiz import display_quiz

display_quiz("#question1")

### Further reading

* https://en.wikipedia.org/wiki/MapReduce