In [2]:
import tqdm
import numpy as np
import pandas as pd


# Functional Paradigm Intro

Which other paradigms have we seen so far?

> <b> Procedural Programming </b>
- A program is a sequence of procedures executed step by step.
- Side effects (changing state) are at its core.

> <b> Object-Oriented Programming </b>
- Instructions are grouped into objects, each holding its own state.

> <b> Functional Programming </b>
- Ideally, no shared state exists: a program is just a series of functions being evaluated.
- Functions avoid side effects.
- The result depends entirely on the input, as in math, where <code>f(x) = y</code>.
- Functions are values too, so you can <b>pass functions as arguments</b>. This helps a lot.
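
To make the "no side effects" point concrete, here is a minimal sketch contrasting an impure function with a pure one. The names `counter`, `impure_increment`, and `pure_increment` are hypothetical and only serve this illustration.

```python
# Impure: the result depends on (and changes) state outside the function.
counter = 0

def impure_increment():
    global counter
    counter += 1
    return counter

# Pure: the result depends only on the argument, and nothing else is modified.
def pure_increment(value):
    return value + 1

first_call = impure_increment()
second_call = impure_increment()   # same call, different result

pure_first = pure_increment(0)
pure_second = pure_increment(0)    # same input, same result

print(first_call, second_call, pure_first, pure_second)  # 1 2 1 1
```

The pure version is easier to test and to pass around, because calling it never changes anything else in the program.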


In [3]:
def add_one(x):
    return x + 1

In [4]:
x = 2

In [9]:
banana=add_one

In [10]:
banana

<function __main__.add_one(x)>

In [5]:
# functions can be treated as values, just like variables (!)
# add_one is just a name bound to the function object

f = add_one

In [6]:
f

<function __main__.add_one(x)>

In [7]:
add_one(10)

11

In [11]:
# f refers to add_one, so calling f(10) is the same as calling add_one(10)

variavel = f(10)

In [13]:
type(variavel)

int

In [14]:
def add_two(x):
    return x + 2

In [15]:
# so, if a function can be treated like a variable,
# can it be passed as an argument like any other variable? YES!

def add_any(f, x):
    return f(x)

In [16]:
add_any(add_one, 3)

4

In [19]:
4()

TypeError: 'int' object is not callable

In [18]:
add_any(4, 3)

TypeError: 'int' object is not callable
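
Passing functions around also works in the other direction: a function can build and return another function. The sketch below uses two hypothetical helpers, `make_adder` and `apply_twice`, that are not part of the notebook.

```python
def make_adder(n):
    """Return a new function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder

def apply_twice(f, x):
    """Call f on x, then call f again on the result."""
    return f(f(x))

add_five = make_adder(5)
print(add_five(10))               # 15
print(apply_twice(add_five, 0))   # 10
```

As with `add_any`, the argument `f` must be callable; passing an integer would raise the same `TypeError`.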

# Function definition

```python
def function_name(arg1):
    something = arg1 + 10
    return something
```

# Mapping concept

In [20]:
# Simple list 
example_list = [10, 12, 34, 23, 2, 6, 7]

In [21]:
# define a function that performs any operation: 

def half(x):
    return x/2

## How to apply that function to all elements of this list?

In [22]:
# you can't simply call it on the whole list:

half(example_list)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [23]:
# using a for loop
new_list = []

for item in example_list:
    new_list.append(half(item))
    
new_list 

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [24]:
# using list comprehensions
[half(item) for item in example_list]

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [25]:
# using mapping:

map(half, example_list)

# mapping a function onto a list is conceptually equivalent to:

# [half(10), half(12), half(34), half(23), half(2), half(6), half(7)]

<map at 0x217fa960088>

`map` is `lazy`. When you run `map(function, my_list)`, nothing is executed yet: it only stores the function and the iterable. The results are computed when you iterate over the map object, for example with `list()` or a `for` loop.
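
One consequence worth knowing is that a map object can be consumed only once. The following is a small sketch; `numbers`, `halves`, `first_pass`, and `second_pass` are names chosen for this illustration.

```python
def half(x):
    return x / 2

numbers = [10, 12, 34]
halves = map(half, numbers)    # nothing has been computed yet

first_pass = list(halves)      # consumes the iterator
second_pass = list(halves)     # already exhausted

print(first_pass)    # [5.0, 6.0, 17.0]
print(second_pass)   # []
```

If the results are needed more than once, store them in a list (or call `map` again).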

In [30]:
tuple(map(half, example_list))

TypeError: 'int' object is not callable

# Lazy evaluation

Functional programming supports the idea of not computing every result up front.

Functions such as `map` return only a lazy `python object` (an iterator). Nothing has been calculated yet. The values are computed as soon as you request them.
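
To see that the calls really happen on demand, the sketch below records every call in a list. The helper `noisy_half` and the list `calls` are hypothetical and exist only for this demonstration.

```python
calls = []

def noisy_half(x):
    calls.append(x)   # remember that the function was called with x
    return x / 2

lazy = map(noisy_half, [10, 12, 34])
calls_before = list(calls)       # still empty: no call has happened yet

first = next(lazy)               # computes only the first item
calls_after_one = list(calls)

rest = list(lazy)                # computes the remaining items

print(calls_before)     # []
print(first)            # 5.0
print(calls_after_one)  # [10]
print(rest)             # [6.0, 17.0]
```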

In [31]:
map(half, example_list)

<map at 0x217fa96e0c8>

In [32]:
list(map(half, example_list))

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [33]:
for item in map(half, example_list):
    print(item)

5.0
6.0
17.0
11.5
1.0
3.0
3.5


In [None]:
set(map(half, example_list))

# Filter

`filter` removes elements from a list (or any iterable, that is, anything you can loop over) using a function that returns `True` or `False` for each item. Like `map`, `filter` returns a lazy `python object`; when you ask for the results, it keeps only the items for which your function returned `True`.
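
As a small aside, `filter` also accepts `None` in place of the function, in which case it keeps the items that are truthy. The sketch below compares that with an explicit predicate; `values` and `is_truthy` are hypothetical names.

```python
values = [0, 3, "", "text", None, 7]

# With None as the function, filter keeps the items that are truthy.
truthy = list(filter(None, values))

# An equivalent predicate written out explicitly.
def is_truthy(item):
    return bool(item)

same_result = list(filter(is_truthy, values))

print(truthy)        # [3, 'text', 7]
print(same_result)   # [3, 'text', 7]
```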

In [34]:
def check_if_even(x):
    """Return True if x is even, else return False."""
    return x % 2 == 0

In [38]:
check_if_even(example_list[0])

True

In [35]:
example_list

[10, 12, 34, 23, 2, 6, 7]

In [36]:
filter(check_if_even, example_list)

<filter at 0x217fa95d088>

In [37]:
list(filter(check_if_even, example_list))

[10, 12, 34, 2, 6]

In [39]:
[item for item in example_list if item % 2 == 0]

[10, 12, 34, 2, 6]

# Reduce

Reduce brings the idea of an `accumulator`. Imagine a function that adds a pair of arguments. `reduce` (from the `functools` module) treats the first argument of your function as the `accumulator` and walks through the iterable, repeatedly applying the function to the accumulator and the next item.

For example, take the list `[1, 4, 6, 8]`.

If you apply the following function:
```python
def sum_two_elements(a,b):
    return a+b
```

through `reduce`, as in
```python
reduce( sum_two_elements, [1,4,6,8] )
```

The steps it will perform are shown below. Since no initial value is given, the first item of the list seeds the accumulator:
```python
# step 1
a = 1      # accumulator (seeded with the first item)
b = 4      # value
a = a + b  # 5: the accumulator receives the cumulative sum

# step 2
b = 6      # value
a = a + b  # 11

# step 3
b = 8      # value
a = a + b  # 19, the value that reduce returns
```
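
Conceptually, `reduce` is a loop that carries an accumulator. Without an initial value, the first item of the iterable seeds the accumulator; an optional third argument provides an explicit starting value instead. The function `my_reduce` below is a simplified, hypothetical re-implementation for illustration, not the actual source of `functools.reduce`.

```python
from functools import reduce

_MISSING = object()  # sentinel meaning "no initial value was given"

def my_reduce(function, iterable, initial=_MISSING):
    """Simplified sketch of what functools.reduce does."""
    iterator = iter(iterable)
    if initial is _MISSING:
        try:
            accumulator = next(iterator)   # the first item seeds the accumulator
        except StopIteration:
            raise TypeError("my_reduce() of empty iterable with no initial value")
    else:
        accumulator = initial
    for value in iterator:
        accumulator = function(accumulator, value)
    return accumulator

def sum_two_elements(a, b):
    return a + b

print(my_reduce(sum_two_elements, [1, 4, 6, 8]))       # 19
print(my_reduce(sum_two_elements, [1, 4, 6, 8], 100))  # 119
print(reduce(sum_two_elements, [1, 4, 6, 8], 100))     # 119
```

Because the function is always called with two arguments, a one-argument function cannot be used with `reduce`.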

In [40]:
from functools import reduce

In [47]:
def div_two(x):
    return x/2

In [48]:
reduce(div_two, [1,4,6,8])

TypeError: div_two() takes 1 positional argument but 2 were given

In [51]:
def sum_two_elements(a,b):
    print(f'a = {a}, b={b}')
    return a+b

In [52]:
reduce( sum_two_elements, [1,4,6,8])

19

In [53]:
reduce( sum_two_elements, ['Raiana ','Rocha ', 'Oliveira '])

'Raiana Rocha Oliveira '

In [54]:
''.join(['Raiana ','Rocha ', 'Oliveira '])

'Raiana Rocha Oliveira '

In [55]:
def my_sum(acc, value):
    print(acc, value)
    if acc % 2 == 0:
        return_value = acc+value
    else:
        return_value = acc

    return return_value

In [56]:
example_list

[10, 12, 34, 23, 2, 6, 7]

In [57]:
# keep adding items until the running sum becomes odd
reduce(my_sum, example_list)

10 12
22 34
56 23
79 2
79 6
79 7


79

In [None]:
example_list2 = ['a','b', 'c', 'd']

In [None]:
def my_sum(a,b):
    return a + b

In [None]:
# raises TypeError: sum() starts from 0 and cannot add an int to a str
sum(example_list2)

In [None]:
reduce(my_sum, example_list2)

# Mapping on Pandas

> <code> df['col_name'].apply() </code>

In [58]:
n = 100

In [113]:
df = pd.DataFrame(np.random.random(n), columns=['number'])

In [90]:
df

| | number |
|---|---|
| 0 | 0.691114 |
| 1 | 0.087984 |
| 2 | 0.097756 |
| 3 | 0.079281 |
| 4 | 0.506913 |
| ... | ... |
| 95 | 0.353329 |
| 96 | 0.444927 |
| 97 | 0.717166 |
| 98 | 0.577464 |


In [91]:
def division(x,divisor):
    return x/divisor

In [93]:
df['half']=df['number'].apply(division,divisor=2)

In [94]:
df['quarter']=df['number'].apply(division,divisor=4)

In [95]:
df['double']=df['number'].apply(division,divisor=0.5)

In [96]:
df.head()

| | number | half | quarter | double |
|---|---|---|---|---|
| 0 | 0.691114 | 0.345557 | 0.172778 | 1.382227 |
| 1 | 0.087984 | 0.043992 | 0.021996 | 0.175969 |
| 2 | 0.097756 | 0.048878 | 0.024439 | 0.195512 |
| 3 | 0.079281 | 0.039641 | 0.01982 | 0.158563 |
| 4 | 0.506913 | 0.253457 | 0.126728 | 1.013826 |


In [69]:
def coluna(x):
    return x[0]

In [62]:
greater_than_half(df['number'])

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [121]:
df.loc[0,:]['number']

0.8692040912898408

In [117]:
def greater_than_half(row):
    print(row)
    if row['number'] > 0.5:
        return 'Banana'
    else:
        return 'Uvas'

In [118]:
df.apply(greater_than_half,axis=1)

number    0.869204
Name: 0, dtype: float64
number    0.046142
Name: 1, dtype: float64
number    0.360175
Name: 2, dtype: float64
number    0.05455
Name: 3, dtype: float64
number    0.486504
Name: 4, dtype: float64
number    0.101407
Name: 5, dtype: float64
number    0.380095
Name: 6, dtype: float64
number    0.970919
Name: 7, dtype: float64
number    0.525903
Name: 8, dtype: float64
number    0.914289
Name: 9, dtype: float64
number    0.069123
Name: 10, dtype: float64
number    0.018014
Name: 11, dtype: float64
number    0.650186
Name: 12, dtype: float64
number    0.013314
Name: 13, dtype: float64
number    0.161989
Name: 14, dtype: float64
number    0.177034
Name: 15, dtype: float64
number    0.121002
Name: 16, dtype: float64
number    0.800614
Name: 17, dtype: float64
number    0.962862
Name: 18, dtype: float64
number    0.687634
Name: 19, dtype: float64
number    0.951382
Name: 20, dtype: float64
number    0.982118
Name: 21, dtype: float64
number    0.057054
Name: 22, dtype: float64

0     Banana
1       Uvas
2       Uvas
3       Uvas
4       Uvas
       ...  
95    Banana
96    Banana
97    Banana
98    Banana
99      Uvas
Length: 100, dtype: object

In [70]:
df.apply(coluna)

number    0.253704
dtype: float64

> Pandas Series have both `map` and `apply`. The most used, though, is the `apply` method. 
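
A function passed to `Series.apply` or `Series.map` receives one value at a time, not a row and not the whole Series. The sketch below uses a hypothetical scalar function `label` on a small hand-made Series to show both methods side by side.

```python
import pandas as pd

scores = pd.Series([0.2, 0.7, 0.5], name="number")

def label(value):
    # Receives one scalar at a time, not the whole Series.
    return "Banana" if value > 0.5 else "Uvas"

with_apply = scores.apply(label)
with_map = scores.map(label)

# Series.map additionally accepts a dict as a lookup table.
translated = with_map.map({"Banana": "banana", "Uvas": "grapes"})

print(with_apply.tolist())   # ['Uvas', 'Banana', 'Uvas']
print(translated.tolist())   # ['grapes', 'banana', 'grapes']
```

For a plain function the two methods give the same result here; `apply` also forwards extra arguments to the function.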

In [71]:
df['is_greater_than_half'] = df['number'].apply(greater_than_half)

In [72]:
df

| | number | is_greater_than_half |
|---|---|---|
| 0 | 0.253704 | Uvas |
| 1 | 0.588665 | Banana |
| 2 | 0.296651 | Uvas |
| 3 | 0.267273 | Uvas |
| 4 | 0.921240 | Banana |
| ... | ... | ... |
| 95 | 0.269564 | Uvas |
| 96 | 0.723982 | Banana |
| 97 | 0.988884 | Banana |
| 98 | 0.894270 | Banana |


---

In [73]:
import re

In [74]:
names = ['andre', 'Andre', 'André','ANDRE','ANDRÉ', 'Joao','Carlos', 'Maria', 'Jose']
df = pd.DataFrame(np.random.choice(names, n), columns=['names'])
df

| | names |
|---|---|
| 0 | andre |
| 1 | ANDRE |
| 2 | andre |
| 3 | Andre |
| 4 | Maria |
| ... | ... |
| 95 | Joao |
| 96 | André |
| 97 | André |
| 98 | Joao |


In [75]:
df['names'].value_counts()

Andre     19
Jose      17
Joao      13
Maria     11
André     11
ANDRE     10
ANDRÉ      9
andre      8
Carlos     2
Name: names, dtype: int64

In [None]:
# Task: replace every spelling variant of my name with 'Andre'

In [76]:
def change_names(name):
    return re.sub('[Aa][Nn][Dd][Rr][EeÉé]', 'Andre', name)

In [77]:
change_names(df['names'])

TypeError: expected string or bytes-like object

In [78]:
df['names'] = df['names'].apply(change_names)
df['names']

0     Andre
1     Andre
2     Andre
3     Andre
4     Maria
      ...  
95     Joao
96    Andre
97    Andre
98     Joao
99    Andre
Name: names, Length: 100, dtype: object

In [79]:
df['names'].value_counts()

Andre     57
Jose      17
Joao      13
Maria     11
Carlos     2
Name: names, dtype: int64
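
The same normalization can be written more compactly with a case-insensitive pattern. The sketch below is one possible alternative, not the notebook's method: it assumes a small hand-made Series, a hypothetical helper `normalize`, and the `flags` and `regex` parameters of `Series.str.replace`.

```python
import re
import pandas as pd

names = pd.Series(["andre", "ANDRÉ", "André", "Maria", "Joao"])

# Element-wise, with plain `re` and Series.apply.
def normalize(name):
    return re.sub(r"andr[eé]", "Andre", name, flags=re.IGNORECASE)

via_apply = names.apply(normalize)

# Vectorized, with the pandas string accessor.
via_str = names.str.replace(r"andr[eé]", "Andre", flags=re.IGNORECASE, regex=True)

print(via_apply.tolist())  # ['Andre', 'Andre', 'Andre', 'Maria', 'Joao']
```

With `re.IGNORECASE`, the class `[eé]` also matches the uppercase `E` and `É`, so the five character classes of the original pattern are no longer needed.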

# Apply functions with arguments

In [81]:
def my_replace(x, index):
    """
    If index = 0, returns the name
    If index = 1, returns the profession
    """
    return x.replace('_',' ').split()[index]

In [82]:
example_df = pd.DataFrame({'names': ['Andre_LT','Matheus_TA','Joao_Student','Jose_Student']})

In [84]:
my_replace('Matheus_TA', 1)

'TA'

In [86]:
example_df['profissao'] = example_df['names'].apply(my_replace, index=1)
example_df['nome'] = example_df['names'].apply(my_replace, index=0)

In [87]:
example_df.head()

| | names | profissao | nome |
|---|---|---|---|
| 0 | Andre_LT | LT | Andre |
| 1 | Matheus_TA | TA | Matheus |
| 2 | Joao_Student | Student | Joao |
| 3 | Jose_Student | Student | Jose |
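
Extra arguments can reach the applied function in more than one way. The sketch below shows three options that should give the same result: keyword arguments, the `args` tuple of `Series.apply`, and `functools.partial` combined with `Series.map`. The Series `people` is a hypothetical stand-in for the `names` column.

```python
from functools import partial
import pandas as pd

def my_replace(x, index):
    return x.replace("_", " ").split()[index]

people = pd.Series(["Andre_LT", "Matheus_TA"])

by_keyword = people.apply(my_replace, index=1)           # forwarded as keyword argument
by_position = people.apply(my_replace, args=(1,))        # forwarded as positional argument
by_partial = people.map(partial(my_replace, index=1))    # argument fixed ahead of time

print(by_keyword.tolist())   # ['LT', 'TA']
```

`partial` is handy when the method you call, such as `Series.map` or the built-in `map`, does not forward extra arguments itself.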

# Apply in axis = 1

When you call `apply` on a pandas DataFrame with `axis=1`, your function receives one row at a time (as a Series), so it can read several columns of that row.
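
The difference between the two axes can be sketched on a tiny, hypothetical DataFrame called `table`: with the default `axis=0` the function sees whole columns, while with `axis=1` it sees whole rows.

```python
import pandas as pd

table = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function is called once per column.
column_totals = table.apply(lambda column: column.sum())

# axis=1: the function is called once per row; fields are read by column name.
row_totals = table.apply(lambda row: row["a"] + row["b"], axis=1)

print(column_totals.to_dict())  # {'a': 6, 'b': 60}
print(row_totals.tolist())      # [11, 22, 33]
```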

In [123]:
df = pd.DataFrame()
df['type'] = example_df['names'].apply(my_replace, index=1)
df['name'] = example_df['names'].apply(my_replace, index=0)
df['score'] = [6, 7, 8, 7]

In [124]:
df

| | type | name | score |
|---|---|---|---|
| 0 | LT | Andre | 6 |
| 1 | TA | Matheus | 7 |
| 2 | Student | Joao | 8 |
| 3 | Student | Jose | 7 |


In [131]:
df.loc[:,'type']

0         LT
1         TA
2    Student
3    Student
Name: type, dtype: object

In [103]:
def has_passed(row):
    if row['type'] == 'Student':
        if row['score'] > 7:
            return 'pass'
        else:
            return 'fail'
    else:
        return 'invalid'      

In [132]:
df.apply(has_passed, axis=1)

0    invalid
1    invalid
2       pass
3       fail
dtype: object

In [104]:
df['passed']=df.apply(has_passed, axis=1)

In [105]:
df.head()

| | type | name | score | passed |
|---|---|---|---|---|
| 0 | LT | Andre | 6 | invalid |
| 1 | TA | Matheus | 7 | invalid |
| 2 | Student | Joao | 8 | pass |
| 3 | Student | Jose | 7 | fail |
