## Map-Reduce

- map(func, iterable) - applies the same function to every element of an iterable, leaving the input unchanged. The output can be cast to a list, to see the whole thing
- returns a lazy iterator (a `map` object), not a list
- Let's watch this video: https://www.youtube.com/watch?v=cKlnR-CB3tk&t=137s
- if a dict is passed in, the map function will iterate over the dict's keys; pass `d.items()` to get (key, value) pairs
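
A minimal sketch of how `map()` behaves, assuming a small made-up dict called `prices` (it is not part of the original notes). The `map` object stays lazy until it is cast to a list, and mapping over a dict visits its keys unless `.items()` is passed explicitly.

```python
nums = [1, 2, 3]
doubled = map(lambda x: x * 2, nums)   # nothing is computed yet
doubled_list = list(doubled)           # cast to a list to see every element
print(doubled_list)                    # [2, 4, 6]
print(nums)                            # [1, 2, 3] - the input is untouched

# `prices` is a hypothetical example dict
prices = {'apple': 3, 'pear': 5}
keys_upper = list(map(str.upper, prices))  # iterating a dict yields its keys
pairs = list(map(lambda kv: (kv[0], kv[1] * 2), prices.items()))
print(keys_upper)                      # ['APPLE', 'PEAR']
print(pairs)                           # [('apple', 6), ('pear', 10)]
```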


- filter(condition, iterable) --> lazy iterator over only the items that meet the condition (wrap it in list() to get a new list)
- the condition can be written as a lambda function
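
As a small illustration of `filter()`, the sketch below keeps only the even numbers from a made-up list and compares the result with the equivalent list comprehension. The variable names are assumptions chosen for this example.

```python
nums = [1, 2, 3, 4, 5, 6]

evens = filter(lambda x: x % 2 == 0, nums)  # lazy, like map()
evens_list = list(evens)
print(evens_list)                           # [2, 4, 6]

# the same selection written as a list comprehension
evens_comp = [x for x in nums if x % 2 == 0]
print(evens_comp)                           # [2, 4, 6]
```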



- reduce - FOR PAIRS
- applies the same two-argument operation cumulatively across the elements of a list
- uses the result of one operation as the first param for the next operation
- result --> the single value left after applying the function pair by pair, from left to right
- returns that value itself, not a list containing just that one element
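
One way to picture what `reduce` does is to write the loop by hand. `my_reduce` below is a hypothetical stand-in written for these notes (it ignores the optional initial value), compared against `functools.reduce` on a made-up list.

```python
from functools import reduce

def my_reduce(func, items):
    """Hypothetical stand-in for functools.reduce (no initial value)."""
    iterator = iter(items)
    result = next(iterator)          # start with the first element
    for item in iterator:
        result = func(result, item)  # previous result becomes the first param
    return result

ls = [10, 1, 2, 3]
subtract = lambda x, y: x - y

manual = my_reduce(subtract, ls)
builtin = reduce(subtract, ls)
print(manual, builtin)               # 4 4
```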

Reduce example:

```
ls = [m, n, p, q]
f = lambda x, y: x - y

res = reduce(f, ls)

res = [r = f(m, n), p, q]
res = [s = f(r, p), q]
res = [t = f(s, q)]
res = t = m - n - p - q

```

In [3]:
f = lambda x: x * x

# def f(x):
#     return x * x

print(f(5))

25


In [6]:
# example from above
from functools import reduce


ls = [5, 2, 3, 0]
f = lambda x, y: x - y

res = reduce(f, ls)

res

0

In [17]:
# Calculating MSE using map and reduce
y_true = [-52, -54, -31, -16]
y_pred = [-38.25, -38.25, -38.25, -38.25]
# map step: squared error for each (true, predicted) pair
squared_errors = list(map(lambda pair: (pair[0] - pair[1]) ** 2, zip(y_true, y_pred)))
# reduce step: add the squared errors together
summation = reduce(lambda x, y: x + y, squared_errors)
result = summation / len(y_true)
result

## Why use Map() and Reduce()? Doesn't It Violate "Simpler > Complex"? 🤔

```
map(func, list) 

ls = [m, n, p, q]
f = lambda x, y: x - y

res = reduce(f, ls)

res = [r = f(m, n), p, q]
res = [s = f(r, p), q]
res = [t = f(s, q)]
res = t = m - n - p - q
```

Faster time!

The operations in a map can run in parallel, because the elements of the output list can be computed simultaneously: their results are independent of each other, and their inputs are different and stored in different locations. (Python's built-in `map()` still runs sequentially; the speedup comes from parallel implementations of the same pattern.)

This way we can move faster through our data, as long as the machine has multiple CORES across which it can distribute (aka "partition") its computation as it calculates the elements of the output list.

These functions come in handy for DS/ML workloads that require lots of computation over many iterations, because they let you spread that work across the available resources!
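
The partitioning idea can be sketched with the standard library. This is only an illustration under stated assumptions: `chunked` and `sum_of_squares` are hypothetical helpers, and a thread pool is used to keep the example short. In CPython, CPU-bound work usually needs separate processes (for example `concurrent.futures.ProcessPoolExecutor`) to get a real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(items, size):
    """Split a list into consecutive partitions of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def sum_of_squares(chunk):
    """Map step for one partition: independent of every other partition."""
    return sum(x * x for x in chunk)

nums = list(range(1, 11))
partitions = chunked(nums, 3)

# each partition is handed to a worker; results come back in input order
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, partitions))

# reduce step: combine the partial results into one value
total = reduce(lambda x, y: x + y, partial_sums)
print(partial_sums, total)  # [14, 77, 194, 100] 385
```

Because no partition depends on another, the map step could run in any order or on any worker; only the final reduce needs all the partial results.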

## Using Map-Reduce to Create a Histogram of Word Type Frequency

In [56]:
words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']

mapping = list(map(lambda x: {x: 1}, words))

In [54]:
for item in mapping:
    print(item)

{'Deer': 1}
{'Bear': 1}
{'River': 1}
{'Car': 1}
{'Car': 1}
{'River': 1}
{'Deer': 1}
{'Car': 1}
{'Bear': 1}


In [57]:
def increment_word_count(histogram, key_val):
    '''Merge a one-entry dict such as {'Car': 1} into the running histogram.'''
    for key, val in key_val.items():
        histogram[key] = histogram.get(key, 0) + val
    return histogram

# start from an empty dict so the dicts inside `mapping` are not modified
histogram = reduce(increment_word_count, mapping, {})
histogram

{'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}

In [61]:
from collections import Counter

words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']
histogram = Counter(words)
dict(histogram)

{'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}

In [64]:
# Map reduce technique

words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']
mapping = list(map(lambda x: {x: 1}, words))

def func(x, y):
    return dict(Counter(x) + Counter(y))

histogram = reduce(func, mapping)
histogram

{'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}

# Lambda

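- a simple one-line function: the body is a single expression
- uses the lambda keyword
- def and return are implicit
- can take any number of args (map passes 1 per iterable, reduce passes 2)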
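
A short sketch of lambdas with different numbers of arguments. All names here are made up for illustration.

```python
no_args = lambda: 'hello'
one_arg = lambda x: x * x
three_args = lambda a, b, c: a + b + c
with_default = lambda x, power=2: x ** power  # default values work too

print(no_args())            # hello
print(one_arg(5))           # 25
print(three_args(1, 2, 3))  # 6
print(with_default(3))      # 9
print(with_default(3, 3))   # 27
```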

## Two ways to apply a function to the elements of a list

In [6]:
nums = [1, 2, 3, 4, 5, 6]
nums_squared = list(map(lambda x: x * x, nums))
print(nums_squared)

nums_squared = [x * x for x in nums] 
print(nums_squared)

[1, 4, 9, 16, 25, 36]
[1, 4, 9, 16, 25, 36]


In [2]:
words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']

mapping = map(lambda x : (x, 1), words)

for i in mapping:
    print(i)

('Deer', 1)
('Bear', 1)
('River', 1)
('Car', 1)
('Car', 1)
('River', 1)
('Deer', 1)
('Car', 1)
('Bear', 1)


In [4]:
# prints nothing: the map iterator was already consumed by the loop above
for i in mapping:
    print(i)

In [11]:
mapping = map(lambda x : {x: 1}, words)

for i in mapping:
    print(i)

{'Deer': 1}
{'Bear': 1}
{'River': 1}
{'Car': 1}
{'Car': 1}
{'River': 1}
{'Deer': 1}
{'Car': 1}
{'Bear': 1}


## map() behaves like a generator (lazy, single-use) unless we wrap it in list(map())

In [7]:
n = list(map(lambda char: dict([[char, 1]]), 'testing yeah it works'))
n

[{'t': 1},
 {'e': 1},
 {'s': 1},
 {'t': 1},
 {'i': 1},
 {'n': 1},
 {'g': 1},
 {' ': 1},
 {'y': 1},
 {'e': 1},
 {'a': 1},
 {'h': 1},
 {' ': 1},
 {'i': 1},
 {'t': 1},
 {' ': 1},
 {'w': 1},
 {'o': 1},
 {'r': 1},
 {'k': 1},
 {'s': 1}]
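
The single-use behaviour can be seen directly in a small sketch (the three-word list is a shortened, made-up version of the one used earlier): once a `map` object has been consumed, iterating it again yields nothing.

```python
words = ['Deer', 'Bear', 'River']
mapping = map(lambda x: (x, 1), words)

first = next(mapping)   # pull one item on demand
rest = list(mapping)    # consume everything that is left
again = list(mapping)   # the iterator is now exhausted

print(first)            # ('Deer', 1)
print(rest)             # [('Bear', 1), ('River', 1)]
print(again)            # []

# materialise once with list() if the results are needed more than once
reusable = list(map(lambda x: (x, 1), words))
print(len(reusable), len(reusable))  # 3 3
```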

## Reduce

In [3]:
from functools import reduce

print(reduce(lambda x, y: x & y, [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))

# & = intersection operator, when given two sets as operands

{3}


In [7]:
print(reduce(lambda x, y: x + y, [1, 2, 4]))

# [1, 2, 4]
# [3, 4]
# 7

7
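
`reduce` also accepts an optional third argument, an initial value. The sketch below shows how it changes the starting point and why it matters for empty inputs. The name `empty_error` is an assumption made for this example.

```python
from functools import reduce

add = lambda x, y: x + y

with_start = reduce(add, [1, 2, 4], 10)  # starts from 10 instead of the first element
empty_ok = reduce(add, [], 0)            # the initial value is returned as-is
print(with_start)                        # 17
print(empty_ok)                          # 0

empty_error = None
try:
    reduce(add, [])                      # no elements and no initial value
except TypeError as err:
    empty_error = type(err).__name__
print(empty_error)                       # TypeError
```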


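## Counting how many times each word appears in a list

What if you had to do this on a dataset of 1 billion words?

Using functions like `map()` and `reduce()` will serve you well here, because the words can be split into chunks that are counted independently and then merged!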

In [8]:
from collections import Counter

words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']

mapping = map(lambda x : {x: 1}, words)

def fn(x, y):
    return dict(Counter(x) + Counter(y))

reduce(fn, mapping)

{'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}

In [65]:
def fn_reducer(i, j):
    '''Same as above, without using Counter'''
    for k in j:
        i[k] = i.get(k, 0) + j.get(k, 0)
    return i

mapping = map(lambda x: {x: 1}, words)
reduce(fn_reducer, mapping)

{'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}

In [21]:
reduce(fn, [{'a': 1}, {'a': 1}, {'b': 1}])

{'a': 2, 'b': 1}

In [16]:
Counter({'a': 1}) + Counter({'a': 1})

Counter({'a': 2})

In [12]:
print(fn({'a': 1},  {'a': 1}))

{'a': 2}


In [13]:
print(fn({'a': 1},  {'b': 1}))

{'a': 1, 'b': 1}
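
For a much larger word list, the same idea could be applied to chunks rather than single words. The sketch below is an assumption about how that might look (three chunks of three words, chosen arbitrarily): each chunk is counted on its own, and the per-chunk counts are then merged.

```python
from collections import Counter
from functools import reduce

words = ['Deer', 'Bear', 'River', 'Car', 'Car', 'River', 'Deer', 'Car', 'Bear']

# hypothetical partitioning: three chunks of three words each
chunks = [words[i:i + 3] for i in range(0, len(words), 3)]

# map step: count each chunk independently of the others
partial_counts = list(map(Counter, chunks))

# reduce step: merge the per-chunk counts into one histogram
histogram = dict(reduce(lambda x, y: x + y, partial_counts))
print(histogram)  # {'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}
```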


## *args

In [1]:
def intersection(*args):
    '''Return the elements common to every list passed in.'''
    result = set(args[0])
    for other in args[1:]:
        result &= set(other)
    return list(result)
  
print(intersection(['a', 'b'], ['a', 'c'], ['a', 'b']))

['a']
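
The same intersection can also be expressed with `map` and `reduce`, reusing the `&` set operator from the Reduce section. `intersection_reduce` is a hypothetical name for this variant.

```python
from functools import reduce

def intersection_reduce(*args):
    """Hypothetical variant of intersection() built on map and reduce."""
    # map step: turn every list into a set; reduce step: intersect them pairwise
    return list(reduce(lambda x, y: x & y, map(set, args)))

print(intersection_reduce(['a', 'b'], ['a', 'c'], ['a', 'b']))  # ['a']
```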


# Up Next

Soon we're going to cover PySpark, one of the premier platforms for Big Data.
So why is knowing about `map()` and `reduce()` useful?

Because the same map and reduce operations are core building blocks of PySpark, which applies them to data partitioned across many machines!