# <u><p style="text-align: center;">Reduce</p></u>

### Learning goals  
Students will be able to:  
* Explain how the reduce operation works 
* Recognize reduce operations in Python
* Gain hands-on experience with applying a reduce function to a dataset in examples

### Background

In the previous notebook, we saw how to use the `map` operation to work elementwise with collections of data. `Map` is suitable for transforming each element of a collection <u>individually</u>, but it cannot be applied when the required transformation involves more than one element of the same collection. For those cases, we have a technique called `reduce`. `Reduce` is used to combine all the elements of a collection into a single result. The function `reduce` takes two elements at a time, applies the function that was given as argument, and carries the result over as the first input of the next step, where it is combined with the next element of the collection.
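
To make the carry-over concrete, here is a minimal sketch of what a two-argument `reduce` does step by step. The helper name `simple_reduce` is hypothetical and only serves as an illustration; it is not how Python's own `functools.reduce` is implemented.

```python
from functools import reduce

def simple_reduce(function, collection):
    """Illustrative re-implementation of reduce (assumes a non-empty collection)."""
    iterator = iter(collection)
    result = next(iterator)              # start with the first element
    for element in iterator:
        # combine the carried-over result with the next element
        result = function(result, element)
    return result

numbers = [1, 2, 3, 4]
total = simple_reduce(lambda a, b: a + b, numbers)   # ((1 + 2) + 3) + 4
print(total)

# The sketch should agree with the real reduce on this input
print(total == reduce(lambda a, b: a + b, numbers))
```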

#### <u>A requirement of `reduce`</u>

Similarly to `map`, `reduce` operations can be distributed over multiple cores/machines in order to perform calculations faster. To ensure that the results are consistent between repeated evaluations, the operation implemented by the function that is passed to `reduce` should be **associative**. Associative operations are those where the result will always be the same, no matter how we group the elements of the collection when we apply the function. Below are some associative operations (suitable for reduce):

$$ (2 + 3) + 4 = 2 + (3 + 4) $$
$$ (10 \times 2) \times 5 = 10 \times (2 \times 5) $$

And non-associative operations (not suitable for reduce):

$$ (2 - 3) - 4 \neq 2 - (3 - 4) $$
$$ (10 \div 2) \div 5 \neq 10 \div (2 \div 5) $$
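
The same point can be checked in code. The sketch below reduces a list once from left to right and once in two separately reduced halves, which roughly mimics two machines each handling part of the data. The helper `reduce_in_two_halves` and the example numbers are assumptions made for this illustration.

```python
from functools import reduce
import operator

numbers = [10, 2, 5, 4]

def reduce_in_two_halves(function, data):
    """Reduce each half separately, then combine the two partial results."""
    middle = len(data) // 2
    left = reduce(function, data[:middle])
    right = reduce(function, data[middle:])
    return function(left, right)

# Addition is associative: the grouping does not change the answer
sequential_sum = reduce(operator.add, numbers)
grouped_sum = reduce_in_two_halves(operator.add, numbers)
print(sequential_sum, grouped_sum)

# Subtraction is not associative: the grouping changes the answer
sequential_diff = reduce(operator.sub, numbers)           # ((10 - 2) - 5) - 4
grouped_diff = reduce_in_two_halves(operator.sub, numbers)  # (10 - 2) - (5 - 4)
print(sequential_diff, grouped_diff)
```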

### Code examples

In this section we are going to see examples of how to apply the `reduce` function: 

***Example 1*** shows how to use `reduce` to find the highest temperature in a list.   
***Example 2*** demonstrates how `reduce` can be used to find the southernmost latitude/longitude coordinate.   
***Example 3*** applies `reduce` to calculate the average weight of chickens in a barn.  

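The syntax of `reduce` is `reduce(function, data)`: the first argument is the function that combines two elements, and the second argument is the collection of data to be reduced.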

#### Example 1: Determine maximum temperature in a list
Let's say that we want to find the highest daily temperature in the town of Wageningen in a week during January. We have data that contain the average temperature for all the days in that week:

In [4]:
temperatures = [10, 7, 11, 13, 9, 10, 8]    # in Celsius

We need to find the maximum of this list. This can be achieved with a reduce operation over `temperatures`. We pass two arguments on to the `reduce` function: the function (`max` in our case) and the data (our list `temperatures`). The function `reduce` finds what we need: one single result, which is the maximum temperature in the list.

In [5]:
from functools import reduce

result = reduce(max, temperatures)
print(result)

13
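
To see which value `reduce` carries over at each step, we can look at the running maximum. This is a small sketch using `itertools.accumulate` from the standard library, which applies the same two-argument function but keeps every intermediate result; the last value of the list is what `reduce` returns.

```python
from itertools import accumulate

temperatures = [10, 7, 11, 13, 9, 10, 8]    # in Celsius

# Each entry is the maximum of all temperatures seen so far
running_maximum = list(accumulate(temperatures, max))
print(running_maximum)
```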


#### Example 2: Find southernmost coordinates
In a similar manner, we can find the southernmost point in location data that contain the latitude and longitude of each location. First we have to define a function that compares two pairs of coordinates. We could write a single function that compares all the coordinates at once, but this would probably be inefficient. Instead, we write the function to compare two pairs of coordinates and then supply it to `reduce`. In this way, a big data application would be able to divide the work over multiple computers and combine the results later.

As we see below, our function `get_southern_pair` takes two locations as arguments. Each location is a pair of coordinates of the form [latitude, longitude]. It doesn't matter which pair is the first argument and which is the second: comparing latitudes is *commutative* (the order of the two arguments does not change which latitude wins) as well as *associative* (the grouping of the comparisons does not matter), so any order is valid. The southernmost location is the one with the lowest latitude. The function takes the first coordinate from each pair (`pair[0]`, which is the latitude) and returns the location (the pair) that lies further south:

In [3]:
def get_southern_pair(pair1, pair2):
    
    if pair1[0] < pair2[0]:
        southern_pair = pair1
    else:
        southern_pair = pair2
    
    return southern_pair 

coordinate_1 = [35.632291, -293.326864]
coordinate_2 = [50.869759, 7.869558]

southern_pair = get_southern_pair(coordinate_1, coordinate_2)

print(southern_pair)

[35.632291, -293.326864]


Now we can use `reduce` to apply our function `get_southern_pair` to our data which is a list of coordinates:

In [4]:
# coordinates are formatted as [latitude, longitude]
coordinates = [[62.235184, -144.026314],
               [35.632291, -293.326864],
               [50.869759, 7.869558],
               [35.682291, -249.642216],
               [63.728073, 98.231107]]

southernmost_pair = reduce(get_southern_pair, coordinates)
print(southernmost_pair)

[35.632291, -293.326864]
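
The idea of dividing the work and combining the results later can be sketched on a single machine. In the example below, the split into `chunk_a` and `chunk_b` is a hypothetical stand-in for two workers; a real big data framework would handle the distribution itself. The comparison function is a compact rewrite of `get_southern_pair` with the same behaviour.

```python
from functools import reduce

def get_southern_pair(pair1, pair2):
    # Same comparison as before: the pair with the lower latitude wins
    return pair1 if pair1[0] < pair2[0] else pair2

coordinates = [[62.235184, -144.026314],
               [35.632291, -293.326864],
               [50.869759, 7.869558],
               [35.682291, -249.642216],
               [63.728073, 98.231107]]

# Hypothetical split of the data over two "workers"
chunk_a = coordinates[:2]
chunk_b = coordinates[2:]

# Each worker reduces its own chunk to one partial result
partial_a = reduce(get_southern_pair, chunk_a)
partial_b = reduce(get_southern_pair, chunk_b)

# The partial results are combined with the same function
combined = get_southern_pair(partial_a, partial_b)
print(combined)
```

Because the comparison is associative, the combined result matches the one obtained by reducing the whole list in one go.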


#### Example 3: Average weight of chickens
In a similar way, we could calculate the average weight of broiler chickens in a barn. We first calculate the sum of the chicken weights by passing our own `add` function to `reduce`. Then we divide the sum by the number of chickens that we have (the length of the weights collection). To find the length of the collection we use the `len` function.

In [7]:
def add(number_1, number_2):
    return number_1 + number_2

chicken_weights = [2.1, 1.9, 2.8, 2.2, 2.5, 2.3, 1.9] #broilers, kg

total_sum = reduce(add, chicken_weights)
average_weight = total_sum / len(chicken_weights)

print(average_weight)

2.242857142857143
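
Writing our own `add` function is not strictly necessary. As a sketch of an alternative, the standard library already offers `operator.add`, and the built-in `sum` performs the same kind of reduction. The two totals may differ by a tiny floating-point rounding error depending on the Python version, so they are compared here with `math.isclose` rather than `==`.

```python
from functools import reduce
import math
import operator

chicken_weights = [2.1, 1.9, 2.8, 2.2, 2.5, 2.3, 1.9]  # broilers, kg

# operator.add(a, b) is equivalent to a + b
total_with_reduce = reduce(operator.add, chicken_weights)
total_with_sum = sum(chicken_weights)

average_weight = total_with_reduce / len(chicken_weights)

print(math.isclose(total_with_reduce, total_with_sum))
print(round(average_weight, 2))
```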


<span style="display:none" id="question1">W3sicXVlc3Rpb24iOiAiV2hpY2ggb2YgdGhlIGZvbGxvd2luZyBzdGF0ZW1lbnRzIGFyZSB0cnVlPyIsICJ0eXBlIjogIm1hbnlfY2hvaWNlIiwgImFuc3dlcnMiOiBbeyJjb2RlIjogIidSZWR1Y2UnIGFuZCAnbWFwJyBjYW4gIGJlIHVzZWQgaW50ZXJjaGFuZ2VhYmx5IiwgImNvcnJlY3QiOiBmYWxzZSwgImZlZWRiYWNrIjogIk1hcCBpcyBub3Qgc3VpdGFibGUgZm9yIG9wZXJhdGlvbnMgaW52b2x2aW5nIG11bHRpcGxlIGVsZW1lbnRzIG9mIGEgY29sbGVjdGlvbiJ9LCB7ImNvZGUiOiAiVGhlIGZ1bmN0aW9uIHByb3ZpZGVkIHRvICdyZWR1Y2UnIHNob3VsZCBiZSBhc3NvLWNpYXRpdmUiLCAiY29ycmVjdCI6IHRydWV9LCB7ImNvZGUiOiAiJ1JlZHVjZScgYXBwbGllcyBhIGZ1bmMtdGlvbiB0byBlYWNoIGluZGl2aWR1YWwgZWxlbWVudCBpbiBhIGNvbGxlY3Rpb24iLCAiY29ycmVjdCI6IGZhbHNlLCAiZmVlZGJhY2siOiAiSXQgZG9lcyBpdCBpbiBwYWlycyBvZiBlbGVtZW50cyJ9LCB7ImNvZGUiOiAiJ1JlZHVjZScgYXBwbGllcyBhIGZ1bmMtdGlvbiB0byBhIHBhaXIgb2YgZWxlLSAgIG1lbnRzLCB0YWtlcyB0aGUgcmVzdWx0IGFuZCBwYXNzZXMgaXQgdG8gdGhlICAgIGZ1bmN0aW9uIHdpdGggdGhlIG5leHQgIGVsZW1lbnQgb2YgdGhlICAgICAgICAgIGNvbGxlY3Rpb24iLCAiY29ycmVjdCI6IHRydWV9XX1d</span>

### Quiz

#### Q1:

In [1]:
from jupyterquiz import display_quiz

display_quiz("#question1")




### More advanced examples

In this section we are going to see an example with a more advanced version of `reduce`. Feel free to skip this example as it is not necessary to understand the rest of the course.

#### Example A1: The optional third argument of `reduce`
Until now we saw that `reduce` can be supplied with a function and a data collection. There is also a third optional parameter which acts as an initializer for the reduce operation. 

Without supplying a third argument, `reduce` will take the first two elements of the data collection to perform its first operation. On the other hand, if a third argument is supplied, `reduce` will use the third argument as the first argument of the supplied function, and the first element of the data collection as the second argument of the supplied function.

Below is an example where we use the third parameter of `reduce` as an offset for the addition operation:

In [8]:
#Here we use the `add` function of example 3
print(reduce(add, [1,2,3,4,5])) #two arguments
print(reduce(add, [1,2,3,4,5], 100)) #three arguments

15
115
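
The third argument also matters when the collection is empty. The short sketch below reuses the `add` function of example 3 (repeated here so the cell runs on its own): with an initializer, `reduce` returns the initializer itself, and without one it has nothing to start from and raises a `TypeError`.

```python
from functools import reduce

def add(number_1, number_2):
    return number_1 + number_2

empty = []

# With an initializer, an empty collection simply returns the initializer
print(reduce(add, empty, 0))

# Without an initializer there is no first element to start from
try:
    reduce(add, empty)
except TypeError as error:
    print("TypeError:", error)
```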


The third parameter can also be a data structure (e.g. list) which could be used as a container for our results:

In [9]:
result = reduce(lambda container, value: container + [value], [6,7,8,9,10], [1,2,3,4,5])
print(result)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


Here the list `[1,2,3,4,5]` was used as a container and then one by one the elements of the list `[6,7,8,9,10]` were added to it.

### Further reading
[The reduce function](https://en.wikipedia.org/wiki/Fold_(higher-order_function))  
[Reduce in Python](https://docs.python.org/3/library/functools.html#functools.reduce)