I have this dataframe and I want to apply a function to every element:

In [14]:
import pandas as pd
import numpy as np

In [5]:
last_names = pd.Series(['Rivera','Panizo','Barton-Henry','Sampietro','Fragoso','Talbot','Bayer','Suñol','Daukantaite',
'Delgadillo','Roig','Ferrer','Vidal','Casamajor','Martínez','Sang'])

In [6]:
names = pd.Series(['Claudia','Eva','Kelsey','Pau','Álvaro',
'Adrien','Jordi','David ','Vilma','Jessie Lee','Roger ','Adrià','Anna','Alejandro','Nacho','Chun'])

In [11]:
points = pd.Series([10,31,54,72,84,22,44,76,48,87,25,66,39,51,24,69])


In [12]:
data_bootcamp = pd.DataFrame({'names': names, 'last_names': last_names, 'points': points })

In [13]:
data_bootcamp

Unnamed: 0,names,last_names,points
0,Claudia,Rivera,10
1,Eva,Panizo,31
2,Kelsey,Barton-Henry,54
3,Pau,Sampietro,72
4,Álvaro,Fragoso,84
5,Adrien,Talbot,22
6,Jordi,Bayer,44
7,David,Suñol,76
8,Vilma,Daukantaite,48
9,Jessie Lee,Delgadillo,87


Let's say I want to subtract 1 point from everyone because we were all late. How could I do that? One option is to iterate through each row and subtract one, but there are some useful functions that do this for us!

# Higher-order functions

## Go to the slides!

We want to apply the same function to every element of an iterable structure.

ITERABLE? Lists, tuples, dictionaries, and sets are all iterable objects: anything you can loop over with a for loop.
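
To make "iterable" concrete, here is a minimal sketch that asks Python whether an object can be iterated by calling the built-in iter() on it. The helper name is_iterable and the sample values are made up for illustration; they are not part of the lesson's code.

```python
def is_iterable(obj):
    """Hypothetical helper: True if the built-in iter() accepts obj."""
    try:
        iter(obj)
        return True
    except TypeError:
        # iter() raises TypeError for objects that cannot be looped over
        return False

# A list, a tuple, a dict, a set, a string, and a plain integer
samples = [[1, 2], (1, 2), {"a": 1}, {1, 2}, "text", 42]
checks = [is_iterable(s) for s in samples]
print(checks)  # [True, True, True, True, True, False]
```

Strings are iterable too (character by character), while a single number is not.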

Show them the new calculator function and tell them that map is the key. This makes explicit what goes into the function and what comes out. Don't mention yet that the list() call is needed; what a map object is gets explained next.

In [26]:
def new_calculator(function, iterable):
    result = map(function, iterable)
    return list(result)

In [27]:
l = [10, 12, 34, 23]

In [28]:
def half(x):
    return x / 2

In [29]:
new_calculator(half, l)

[5.0, 6.0, 17.0, 11.5]

In [30]:
def my_wierd_function(x):
    # Note: in Python, ^ is bitwise XOR, not exponentiation (that would be **)
    return ((x + 2) * (x^3) ^ x )/(7 - 2*x)

In [32]:
new_calculator(my_wierd_function, l)

[-7.846153846153846,
 -13.058823529411764,
 -18.983606557377048,
 -12.384615384615385]

Let's see what the map function returns on its own:

In [6]:
l = [10, 12, 34, 23]
map(half, l)

<map at 0x1089f40f0>

A map object is an iterator, so we can loop over it directly without converting it into a list. If we want to see or index all the elements at once, we need to convert it into a list (or a tuple or a set).

In [7]:
list(map(half, l))

[5.0, 6.0, 17.0, 11.5]

In [8]:
for i in map(half, l):
    print(i)

5.0
6.0
17.0
11.5
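
One consequence of map returning an iterator is that it can be consumed only once. A minimal sketch, with half and l redefined so the cell runs on its own (the names m, first_pass and second_pass are just for illustration):

```python
def half(x):
    return x / 2

l = [10, 12, 34, 23]

m = map(half, l)        # nothing is computed yet; map is lazy
first_pass = list(m)    # consumes the iterator
second_pass = list(m)   # the iterator is already exhausted

print(first_pass)   # [5.0, 6.0, 17.0, 11.5]
print(second_pass)  # []
```

If the results are needed more than once, store the list rather than the map object.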


## Filter
The structure is the same, but instead of returning every element with a function applied, we return only the elements for which the function returns True.

In [20]:
def is_odd(x):
    return x % 2 == 1

In [23]:
list(filter(is_odd(x), l))


NameError: name 'x' is not defined

Here is_odd(x) calls the function (and x does not exist) instead of passing the function itself. We need some way to pass a function directly in that line... Does anyone know how to do it?

In [24]:
list(filter(lambda x: x % 2 == 1, l))


[23]

In [25]:
lambda x: x % 2 == 1

<function __main__.<lambda>(x)>

In [26]:
list(filter(lambda x: x % 2 == 0, l))


[10, 12, 34]
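
A lambda is one way to pass a function inline. As a sketch of the alternative, the earlier NameError also goes away if we pass the function name without parentheses, so that filter receives the function object rather than the result of calling it (by_name and by_lambda are illustrative names):

```python
def is_odd(x):
    return x % 2 == 1

l = [10, 12, 34, 23]

# Pass the function object itself: is_odd, not is_odd(x)
by_name = list(filter(is_odd, l))

# The equivalent anonymous function
by_lambda = list(filter(lambda x: x % 2 == 1, l))

print(by_name)    # [23]
print(by_lambda)  # [23]
```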

QUESTION: what will the next cell return?

In [35]:
list(filter(lambda x: x % 2 == 0, map(half,l)))


[6.0]
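
For comparison, the same filter-after-map pipeline can be sketched with list comprehensions, which some readers find easier to follow. The intermediate names halves and even_halves are assumptions made for this example:

```python
l = [10, 12, 34, 23]

# Step 1: the map(half, l) part
halves = [x / 2 for x in l]

# Step 2: the filter(lambda x: x % 2 == 0, ...) part
even_halves = [h for h in halves if h % 2 == 0]

print(halves)       # [5.0, 6.0, 17.0, 11.5]
print(even_halves)  # [6.0]
```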

## Reduce
We want to summarize the elements: we don't want to keep each of them, we want to combine them into a single value. For example, we want to summarize 7 numbers into one. Do we know any functions like this? (sum, mean, etc.)

It is not a built-in function, so we need to import it from functools.

In [27]:
from functools import reduce


In [28]:
reduce(lambda a, b: a + b, l)

79

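So reduce applies the function cumulatively: it combines the first two elements, then combines that result with the next element, and so on until a single value is left.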
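
To see the intermediate steps, one option is itertools.accumulate, which yields every partial result instead of only the last one. This is a sketch for illustration; the variable names, the max-like lambda, and the empty-list case are not from the lesson.

```python
from functools import reduce
from itertools import accumulate

l = [10, 12, 34, 23]

# Every partial sum that reduce passes through on the way to 79
running = list(accumulate(l, lambda a, b: a + b))

# Only the final value
total = reduce(lambda a, b: a + b, l)

# Any two-argument function works, e.g. keeping the larger value
largest = reduce(lambda a, b: a if a > b else b, l)

# The optional third argument is the starting value; it also makes
# reduce safe on an empty iterable
empty_total = reduce(lambda a, b: a + b, [], 0)

print(running)      # [10, 22, 56, 79]
print(total)        # 79
print(largest)      # 34
print(empty_total)  # 0
```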

# What about pandas?
Sometimes we need to apply the same function to every element in our dataframe, and these kinds of functions are very useful for that.

In [17]:
data_bootcamp.points.apply(half)

0      5.0
1     15.5
2     27.0
3     36.0
4     42.0
5     11.0
6     22.0
7     38.0
8     24.0
9     43.5
10    12.5
11    33.0
12    19.5
13    25.5
14    12.0
15    34.5
Name: points, dtype: float64

In [18]:
data_bootcamp.points = data_bootcamp.points.apply(half)

In [19]:
data_bootcamp

Unnamed: 0,names,last_names,points
0,Claudia,Rivera,5.0
1,Eva,Panizo,15.5
2,Kelsey,Barton-Henry,27.0
3,Pau,Sampietro,36.0
4,Álvaro,Fragoso,42.0
5,Adrien,Talbot,11.0
6,Jordi,Bayer,22.0
7,David,Suñol,38.0
8,Vilma,Daukantaite,24.0
9,Jessie Lee,Delgadillo,43.5


In [20]:
data_bootcamp.points = data_bootcamp.points.apply(lambda x: x - 1)

In [21]:
data_bootcamp

Unnamed: 0,names,last_names,points
0,Claudia,Rivera,4.0
1,Eva,Panizo,14.5
2,Kelsey,Barton-Henry,26.0
3,Pau,Sampietro,35.0
4,Álvaro,Fragoso,41.0
5,Adrien,Talbot,10.0
6,Jordi,Bayer,21.0
7,David,Suñol,37.0
8,Vilma,Daukantaite,23.0
9,Jessie Lee,Delgadillo,42.5
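
For simple arithmetic like this, pandas also lets us operate on the whole column at once. The sketch below uses only the first four point values from the table above (the Series here is a shortened stand-in, not the full data_bootcamp column) and checks that both approaches agree:

```python
import pandas as pd

def half(x):
    return x / 2

# First four values of the points column, for illustration only
points = pd.Series([10, 31, 54, 72])

# Element by element, as in the cells above
with_apply = points.apply(half).apply(lambda x: x - 1)

# The same arithmetic applied to the whole Series at once
vectorized = points / 2 - 1

print(vectorized.tolist())            # [4.0, 14.5, 26.0, 35.0]
print(with_apply.equals(vectorized))  # True
```

apply remains the tool to reach for when the function cannot be written as plain arithmetic on the column.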


## Applying a function to the whole dataframe

In [36]:
df = pd.DataFrame(np.random.randn(4, 3), columns=['a', 'b', 'c'])
df

Unnamed: 0,a,b,c
0,0.173438,-0.855186,1.882033
1,0.254967,-0.171282,-0.451012
2,1.177518,0.309236,0.514519
3,0.063711,1.298072,-1.113455


In [37]:
df = df.apply(lambda x: x - 1)

In [38]:
df

Unnamed: 0,a,b,c
0,-0.826562,-1.855186,0.882033
1,-0.745033,-1.171282,-1.451012
2,0.177518,-0.690764,-0.485481
3,-0.936289,0.298072,-2.113455
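
Note that DataFrame.apply does not call the lambda once per cell: by default it passes each column to the function as a Series, and x - 1 works because subtraction on a Series is element-wise. A small sketch with a fixed, made-up dataframe (instead of random numbers) shows this:

```python
import pandas as pd

# Small fixed dataframe so the results are reproducible
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# What does the function actually receive? One Series per column.
received = df.apply(lambda col: type(col).__name__)

# Column-wise apply and plain arithmetic give the same dataframe
shifted_apply = df.apply(lambda col: col - 1)
shifted_direct = df - 1

print(received.tolist())                     # ['Series', 'Series']
print(shifted_apply.equals(shifted_direct))  # True
```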
