In [1]:
import re
import numpy as np
import pandas as pd


# Map, Filter & Reduce

Let's recap the differences between the two programming paradigms we've seen so far:

**Imperative Paradigm**
- The program is a series of **instructions** that modify a **state**:

```python
x = 0
for i in range(10):
    x = (x + i)*2
print(x)
```
- The *variable* `x` is the state of our program, which is modified through the `for` loop.

- One of the simplest forms of programming, typical of older programming languages (C, Fortran and COBOL, for example).

**Functional Programming**
- There is no state: the program defines functions which are applied over the input.
```python
def somar_2(x):
    return x + 2

def mult_4(x):
    return x * 4

saida = somar_2(mult_4(somar_2(entrada)))
```
- In the functional paradigm, functions are variables.
- Originated in the late 1950s with LISP and is present today in many data-oriented languages such as R, Julia and (in part) Python.
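
The functional snippet above leaves `entrada` undefined. A minimal runnable sketch, assuming a hypothetical input value of `1` that is not part of the original example:

```python
def somar_2(x):
    return x + 2


def mult_4(x):
    return x * 4


# Hypothetical input value, chosen only for illustration.
entrada = 1

# No variable is mutated: the result is built by composing function calls.
saida = somar_2(mult_4(somar_2(entrada)))
print(saida)  # (1 + 2) * 4 + 2 = 14
```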

## Functions are variables

In [2]:
soma_1 = lambda x: x + 1


In [8]:
soma_1(soma_1(soma_1(10)))


13

In [9]:
def soma_1_c(x):
    return x + 1


In [10]:
soma_1_c


<function __main__.soma_1_c(x)>

In [11]:
soma_1(soma_1_c(10))


12

We can use this to build new functions out of existing ones:

In [12]:
somar_n = lambda x, n: x + n


In [15]:
soma_1 = lambda x: somar_n(x, 8)


In [16]:
soma_1(10)


18
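
The cell above fixes `n` by wrapping `somar_n` in a lambda. A function can also build and return such a function for any `n`. The sketch below is one possible way to do it; `make_adder`, `add_8` and `add_1` are hypothetical names that do not appear in the original notebook:

```python
def make_adder(n):
    """Return a new function that adds n to its argument."""

    def adder(x):
        # n is remembered from the enclosing call (a closure).
        return x + n

    return adder


add_8 = make_adder(8)
add_1 = make_adder(1)

print(add_8(10))         # 18
print(add_1(add_1(10)))  # 12
```

Each call to `make_adder` produces an independent function, so `add_8` and `add_1` do not interfere with each other.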

## The `map` concept

One of the key concepts in functional programming is **mapping**: applying a function to the elements of a set, list or other iterable. 

In [17]:
lista_exemplo = [10, 12, 34, 23, 2, 6, 7]


In [23]:
def div_2(x):
    return x / 2


A simple call of `div_2(lista_exemplo)` will not work!

In [24]:
[div_2(item) for item in lista_exemplo]


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

`div_2` expects a number as its argument, but `lista_exemplo` is a list!

We could create an empty list and use a loop to iterate over `lista_exemplo`:

In [25]:
new_list = []

for item in lista_exemplo:
    new_list.append(div_2(item))

new_list


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

Another way is to use a list comprehension, one of the tools in the functional programming toolbox:

In [26]:
[div_2(item) for item in lista_exemplo]


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

A third way is using `map()`:

In [28]:
map(div_2, lista_exemplo)

<map at 0x1a6013b95b0>

In [29]:
for i in map(div_2, lista_exemplo):
    print(i)


5.0
6.0
17.0
11.5
1.0
3.0
3.5


The results of `map()` are **lazy**: they are not computed when you call the function, only when you actually consume them!

In [30]:
list(map(div_2, lista_exemplo))


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]
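
To make the laziness visible, the sketch below records every call to the mapped function. `div_2_logged` and `calls` are hypothetical helpers introduced only for this illustration:

```python
calls = []


def div_2_logged(x):
    calls.append(x)  # record each time the function actually runs
    return x / 2


lazy = map(div_2_logged, [10, 12, 34])
calls_before = len(calls)     # nothing has been computed yet

first = next(lazy)            # computes only the first element
calls_after_one = len(calls)

rest = list(lazy)             # computes the remaining elements
calls_at_end = len(calls)

print(calls_before, calls_after_one, calls_at_end)  # 0 1 3
print(first, rest)                                  # 5.0 [6.0, 17.0]
```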

An *interesting* behavior of **lazy** iterators is that they become **empty as you iterate over their elements**:

In [43]:
resultado_map = map(div_2, lista_exemplo)

In [44]:
for i in resultado_map:
    print(i)


5.0
6.0
17.0
11.5
1.0
3.0
3.5


In [40]:
# Once a map has been iterated it is exhausted, so it cannot be iterated again.
list(resultado_map)


[]

In [45]:
resultado_map = map(div_2, lista_exemplo)


In [46]:
list(resultado_map)


[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

### Lazy evaluation

Lazy evaluation is an important concept in Big Data: it saves memory and CPU by performing computations only **when they are needed**.
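
As a rough illustration of the memory saving, the sketch below maps over a very large `range` but only ever computes the five values that are requested. The size `10**12` and the use of `itertools.islice` are choices made for this example, not something taken from the original notebook:

```python
from itertools import islice

# Building the map is instant: no square is computed at this point.
squares = map(lambda x: x * x, range(10**12))

# Only the first five elements are ever evaluated.
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]
```

Calling `list(squares)` directly would try to materialize every element, which is exactly what lazy evaluation lets us avoid.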

In [47]:
lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]
pattern = r"[0-9]{2}"


In [48]:
lista_dds = list(map(lambda x: re.findall(pattern, str(x))[0], lista_telefones))
print(lista_dds)


['19', '21', '34', '91', '19', '19', '34']


In [51]:
for ddd in map(lambda x: "".join(re.findall(pattern, str(x)))[:2], lista_telefones):
    print(ddd)


19
21
34
91
19
19
34


## Filtering with `filter()`

The second important part of the functional paradigm is the `filter()` function: it lets us filter the elements of an iterable using a function that returns boolean values. Like `map()`, `filter()` evaluates an iterable lazily and returns only the elements for which the applied function returns `True`.

Let's continue our example with a list of phone numbers and a function that extracts the DDD (area code):

In [52]:
lista_telefones = [
    19999571559,
    "(21) 2412-0107",
    "(34) 99762-1166",
    "91-4002-8282",
    "(19) 3542-1820",
    "(19) 3561-9525",
    "(34) 3333-5802",
]


def extrair_ddd(telefone):
    """
    Receives a phone number and returns its DDD (area code)

    telefone (str or int): Phone number whose first two numeric digits are the DDD
    """
    pattern = r"[0-9]{2}"
    return "".join(re.findall(pattern, str(telefone)))[:2]
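
The extraction above assumes every entry contains at least two consecutive digits. A hedged variant that returns `None` otherwise could look like the sketch below; `safe_ddd` and the sample list `phones` are hypothetical and not part of the original notebook:

```python
import re


def safe_ddd(phone):
    """Return the first two consecutive digits of `phone`, or None if absent."""
    match = re.search(r"[0-9]{2}", str(phone))
    return match.group(0) if match else None


phones = [19999571559, "(21) 2412-0107", "91-4002-8282", "no number"]
ddds = list(map(safe_ddd, phones))
print(ddds)  # ['19', '21', '91', None]
```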


In [53]:
map_19 = filter(lambda x: extrair_ddd(x) == "19", lista_telefones)
for i in map_19:
    print(i)


19999571559
(19) 3542-1820
(19) 3561-9525


In [54]:
lista_ddd_19 = list(
    filter(lambda x: extrair_ddd(x) == "19", lista_telefones)
)
print(lista_ddd_19)


[19999571559, '(19) 3542-1820', '(19) 3561-9525']


In [55]:
filtro_19 = filter(lambda x: extrair_ddd(x) == "19", lista_telefones)
for telefone in filtro_19:
    print(telefone)


19999571559
(19) 3542-1820
(19) 3561-9525


Both `map()` and `filter()` are similar to list comprehensions; the main difference is that they are evaluated *lazily*, while a list comprehension builds the whole list right away!

In [61]:
[telefone for telefone in lista_telefones if extrair_ddd(telefone) != "19"]


['(21) 2412-0107', '(34) 99762-1166', '91-4002-8282', '(34) 3333-5802']
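
The correspondence can be sketched side by side. The example below uses a small list of numbers and a hypothetical threshold (`n > 6`) chosen only for illustration; it also shows a generator expression, which is the lazy counterpart of a list comprehension:

```python
numbers = [10, 12, 34, 23, 2, 6, 7]

# Eager: the list comprehension builds the full list immediately.
eager = [n / 2 for n in numbers if n > 6]

# Lazy: map() and filter() compute items only when consumed.
lazy_map_filter = map(lambda n: n / 2, filter(lambda n: n > 6, numbers))

# Lazy: a generator expression with the same logic.
lazy_generator = (n / 2 for n in numbers if n > 6)

print(eager)  # [5.0, 6.0, 17.0, 11.5, 3.5]

from_map_filter = list(lazy_map_filter)
from_generator = list(lazy_generator)
print(from_map_filter == eager, from_generator == eager)  # True True
```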

## Aggregating iterables with `reduce()`

The function `reduce()` implements an `accumulator`. Let's see how this works with the simple function `sum_two_elements(a, b)`:

```python
def sum_two_elements(a, b):
    return a + b
```

Now, let's use `reduce()` to *reduce* our list through summing:

```python
from functools import reduce

reduce(sum_two_elements, [1, 4, 6, 8])
```

```python
# No initial value is given, so the first element seeds the accumulator
a = 1  # accumulator (first element of the list)
b = 4  # value
a + b  # 5, so the accumulator receives this cumulative sum

a = 5  # accumulator
b = 6  # value
a + b  # 11

a = 11  # accumulator
b = 8  # value
a + b  # 19

# reduce() returns 19
```
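
The accumulator idea can also be written as a plain loop. The function `my_reduce` below is a simplified, hypothetical re-implementation for illustration only; unlike the real `functools.reduce`, it cannot take `None` as an initial value and does not raise `TypeError` on an empty iterable:

```python
from functools import reduce


def my_reduce(function, iterable, initial=None):
    """Simplified sketch of how an accumulator-based reduce can work."""
    iterator = iter(iterable)
    # Without an initial value, the first element seeds the accumulator.
    accumulator = next(iterator) if initial is None else initial
    for value in iterator:
        accumulator = function(accumulator, value)
    return accumulator


total = my_reduce(lambda a, b: a + b, [1, 4, 6, 8])
total_from_10 = my_reduce(lambda a, b: a + b, [1, 4, 6, 8], 10)

print(total)          # 19
print(total_from_10)  # 29
print(total == reduce(lambda a, b: a + b, [1, 4, 6, 8]))  # True
```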

In [62]:
from functools import reduce


### Example 1: Numbers

In [63]:
def somar_ab(a, b):
    print(f"a={a}, b={b}")
    return a + b


In [64]:
lista_numeros = [1, 4, 6, 8]
reduce(somar_ab, lista_numeros)


a=1, b=4
a=5, b=6
a=11, b=8


19

In [66]:
def comp_ab(x, y):
    print(f"x={x}, y={y}")
    if x > y:
        return x
    else:
        return y


reduce(comp_ab, [2, 10, 25, 1, -10, 13, 40, 20])


x=2, y=10
x=10, y=25
x=25, y=1
x=25, y=-10
x=25, y=13
x=25, y=40
x=40, y=20


40

### Example 2: Strings

In [67]:
lista_letras = ["P", "e", "d", "r", "o"]


In [68]:
reduce(lambda x, y: x + y, lista_letras)


'Pedro'

Let's use `reduce()` to select the longest string in a list:

In [73]:
lista_nomes = ["Amapá", "Roraima", "Pará", "Piauí", "Maranhão"]
reduce(lambda x, y: x if len(x) > len(y) else y, lista_nomes)

'Maranhão'
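
For common reductions Python already ships built-ins that express the same idea more directly. The sketch below compares the `reduce()` version with `max(..., key=len)`; note that the two may differ when several strings share the maximum length (this `reduce()` keeps the last one, `max()` keeps the first):

```python
from functools import reduce

lista_nomes = ["Amapá", "Roraima", "Pará", "Piauí", "Maranhão"]

longest_reduce = reduce(lambda x, y: x if len(x) > len(y) else y, lista_nomes)
longest_builtin = max(lista_nomes, key=len)

print(longest_reduce, longest_builtin)  # Maranhão Maranhão
```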

### Example 3: Chaining Map, Filter & Reduce

In [75]:
list_tuples = [(12, 119), (-12, 43), (28, 39), (12, 21), (-14, 43)]

In [89]:
map_prod = map(lambda x: x[0] * x[1], list_tuples)

In [90]:
filt_neg = filter(lambda x: x > 0, map_prod)

In [91]:
smallest = reduce(lambda x, y: x if x < y else y, filt_neg)
print(smallest)

252


In [76]:
map_prod = map(lambda x: x[0] * x[1], list_tuples)
filt_neg = filter(lambda x: x > 0, map_prod)
smallest = reduce(lambda x, y: x if x < y else y, filt_neg)

print(smallest)

252
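
Because `map()` and `filter()` return iterators, the three steps can also be nested into a single expression without intermediate variables. The sketch below is one possible formulation, with an equivalent version based on the built-in `min()` and a generator expression for comparison:

```python
from functools import reduce

list_tuples = [(12, 119), (-12, 43), (28, 39), (12, 21), (-14, 43)]

# map -> filter -> reduce, nested in one lazy pipeline.
smallest = reduce(
    lambda x, y: x if x < y else y,
    filter(lambda p: p > 0, map(lambda t: t[0] * t[1], list_tuples)),
)

# Same result with a generator expression and the built-in min().
smallest_builtin = min(a * b for a, b in list_tuples if a * b > 0)

print(smallest, smallest_builtin)  # 252 252
```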
