# MATHEMATICAL FUNCTIONS

In [11]:
import numpy as np
import pandas as pd
df_london = pd.read_csv('files/london_merged.csv')
df_london.head(5)

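|   | timestamp | cnt | t1 | t2 | hum | wind_speed | weather_code | is_holiday | is_weekend | season |
|---|-----------|-----|----|----|-----|------------|--------------|------------|------------|--------|
| 0 | 2015-01-04 00:00:00 | 182 | 3.0 | 2.0 | 93.0 | 6.0 | 3.0 | 0.0 | 1.0 | 3.0 |
| 1 | 2015-01-04 01:00:00 | 138 | 3.0 | 2.5 | 93.0 | 5.0 | 1.0 | 0.0 | 1.0 | 3.0 |
| 2 | 2015-01-04 02:00:00 | 134 | 2.5 | 2.5 | 96.5 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 |
| 3 | 2015-01-04 03:00:00 | 72 | 2.0 | 2.0 | 100.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 |
| 4 | 2015-01-04 04:00:00 | 47 | 2.0 | 0.0 | 93.0 | 6.5 | 1.0 | 0.0 | 1.0 | 3.0 |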


## Preprocessing

In [12]:
df_london.dtypes

timestamp        object
cnt               int64
t1              float64
t2              float64
hum             float64
wind_speed      float64
weather_code    float64
is_holiday      float64
is_weekend      float64
season          float64
dtype: object

Convert the *timestamp* column to a datetime type

In [13]:
df_london['timestamp'] = pd.to_datetime(df_london['timestamp'])
print(df_london['timestamp'].dtype)
df_london['timestamp'].dtype

datetime64[ns]


dtype('<M8[ns]')
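As an alternative to converting the column after loading, `read_csv` can parse it while reading the file through its `parse_dates` parameter. The sketch below is not what the notebook does; it assumes a small in-memory CSV (`csv_text`, a hypothetical name) holding two rows copied from the head shown earlier, in place of `files/london_merged.csv`.

```python
import io

import pandas as pd

# Two rows copied from the head() output above; a stand-in for the real CSV file.
csv_text = """timestamp,cnt,t1
2015-01-04 00:00:00,182,3.0
2015-01-04 01:00:00,138,3.0
"""

# parse_dates converts the listed columns to datetime as the file is read.
df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])
print(df_sample.dtypes)
```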

Create a column with the hour of each event

In [14]:
df_london['hour'] = df_london['timestamp'].dt.hour
df_london['hour']

0         0
1         1
2         2
3         3
4         4
         ..
17409    19
17410    20
17411    21
17412    22
17413    23
Name: hour, Length: 17414, dtype: int64
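The `.dt` accessor used for `hour` exposes other calendar fields as well. A small sketch on three timestamps from the start of the data; `dayofweek` and `month` are extra fields not used in this notebook. 2015-01-04 was a Sunday (`dayofweek` 6), which is consistent with `is_weekend = 1.0` in the first rows.

```python
import pandas as pd

# Three timestamps taken from the start of the dataset shown above.
ts = pd.Series(pd.to_datetime([
    '2015-01-04 00:00:00',
    '2015-01-04 01:00:00',
    '2015-01-04 02:00:00',
]))

parts = pd.DataFrame({
    'hour': ts.dt.hour,            # 0-23
    'dayofweek': ts.dt.dayofweek,  # Monday=0 ... Sunday=6
    'month': ts.dt.month,          # 1-12
})
print(parts)
```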

A new DataFrame containing only the numeric variables (every column except *timestamp*)

In [15]:
df = df_london.iloc[:, 1:]  # drop the first column (timestamp)
df.head(2)

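|   | cnt | t1 | t2 | hum | wind_speed | weather_code | is_holiday | is_weekend | season | hour |
|---|-----|----|----|-----|------------|--------------|------------|------------|--------|------|
| 0 | 182 | 3.0 | 2.0 | 93.0 | 6.0 | 3.0 | 0.0 | 1.0 | 3.0 | 0 |
| 1 | 138 | 3.0 | 2.5 | 93.0 | 5.0 | 1.0 | 0.0 | 1.0 | 3.0 | 1 |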
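`iloc[:, 1:]` works here only because *timestamp* happens to be the first column. A sketch of a position-independent alternative using `select_dtypes`, on a hypothetical toy frame (`toy`) that mimics `df_london`:

```python
import pandas as pd

# Hypothetical toy frame mimicking df_london: one datetime column plus numeric columns.
toy = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-01-04 00:00:00', '2015-01-04 01:00:00']),
    'cnt': [182, 138],
    't1': [3.0, 3.0],
    'hour': [0, 1],
})

# Select columns by dtype rather than by position; datetime columns are not 'number'.
numeric = toy.select_dtypes(include='number')
print(numeric.columns.tolist())
```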


## Mathematical functions

### Basic functions

In [16]:
np.sin(df['wind_speed']**2)+10

0         9.008221
1         9.867648
2        10.000000
3        10.000000
4         9.013013
           ...    
17409    10.279387
17410    10.923470
17411     9.114047
17412    10.936473
17413    10.193503
Name: wind_speed, Length: 17414, dtype: float64
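NumPy functions such as `np.sin` act element-wise on a Series and return a Series that keeps the original index and name, which is why the output above is still labelled `wind_speed`. A sketch on the first three wind speeds (6.0, 5.0, 0.0), given hypothetical labels 10 to 12 to show the index being carried through:

```python
import numpy as np
import pandas as pd

# First wind_speed values from the dataset, with non-default labels.
wind = pd.Series([6.0, 5.0, 0.0], index=[10, 11, 12], name='wind_speed')

result = np.sin(wind**2) + 10  # element-wise: sin(x^2) + 10
print(result)
```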

### Operations between columns

In [17]:
df['t1'] - df['t2']

0        1.0
1        0.5
2        0.0
3        0.0
4        2.0
        ... 
17409    4.0
17410    4.0
17411    4.0
17412    4.0
17413    4.0
Length: 17414, dtype: float64

In [18]:
df['t1'].iloc[::2] - df['t2']

0        1.0
1        NaN
2        0.0
3        NaN
4        2.0
        ... 
17409    NaN
17410    4.0
17411    NaN
17412    4.0
17413    NaN
Length: 17414, dtype: float64
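The same behaviour on a four-row toy example (hypothetical Series built from the first *t1* and *t2* values): pandas matches rows by index label, not by position, so the result covers the union of both indexes and is NaN wherever one side has no value.

```python
import pandas as pd

# Toy columns with the default 0..3 index (first t1/t2 readings).
t1 = pd.Series([3.0, 3.0, 2.5, 2.0])
t2 = pd.Series([2.0, 2.5, 2.5, 2.0])

even_t1 = t1.iloc[::2]  # keeps labels 0 and 2 only
diff = even_t1 - t2     # aligned on labels, not on positions
print(diff)
```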

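The NaN values appear because pandas aligns the two Series on their index labels: `iloc[::2]` keeps only the even labels, so the odd labels have no *t1* value. Using the `sub` method instead of the `-` operator allows extra arguments; for example, `fill_value` replaces the missing value (here with 1000) before subtracting.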

In [19]:
df['t1'].iloc[::2].sub(df['t2'], fill_value=1000)

0          1.0
1        997.5
2          0.0
3        998.0
4          2.0
         ...  
17409    999.0
17410      4.0
17411    998.5
17412      4.0
17413    999.0
Length: 17414, dtype: float64
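`fill_value` is only used when one side of a pair is missing; if both sides are missing, the result stays NaN. The other arithmetic methods (`add`, `mul`, `div`) accept the same parameter. A small sketch with hypothetical values mirroring the cell above:

```python
import numpy as np
import pandas as pd

t1_even = pd.Series([3.0, 2.5], index=[0, 2])
t2 = pd.Series([2.0, 2.5, 2.5], index=[0, 1, 2])

# Label 1 is missing from t1_even, so 1000 stands in for it: 1000 - 2.5 = 997.5
filled = t1_even.sub(t2, fill_value=1000)

# When a label is missing (NaN) on both sides, fill_value is not applied.
both_missing = pd.Series([np.nan], index=[5]).sub(pd.Series([np.nan], index=[5]), fill_value=0)

print(filled)
print(both_missing)
```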

## User-defined functions

In [20]:
def fun(x, a=1, b=0):
    return x**2 + a*x + b

In [26]:
print(fun(10, 20, -100))
print(fun(10, a=20, b=-100))

200
200


In [29]:
# Equivalent call with positional extra arguments: df['hour'].apply(fun, args=(20, -100))
df['hour'].apply(fun, a=20, b=-100)

0       -100
1        -79
2        -56
3        -31
4         -4
        ... 
17409    641
17410    700
17411    761
17412    824
17413    889
Name: hour, Length: 17414, dtype: int64
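Because `fun` uses only arithmetic operators, it also accepts a whole Series directly, which avoids calling Python once per row. The sketch below uses a hypothetical `hours` Series with values 0 to 23 in place of `df['hour']` and checks that the keyword form, the `args=` tuple form from the commented line, and the direct call agree.

```python
import pandas as pd


def fun(x, a=1, b=0):
    return x**2 + a*x + b


hours = pd.Series(range(24), name='hour')  # stand-in for df['hour']

via_kwargs = hours.apply(fun, a=20, b=-100)   # extra keyword arguments
via_args = hours.apply(fun, args=(20, -100))  # extra positional arguments
vectorized = fun(hours, a=20, b=-100)         # whole Series at once, no apply

print(vectorized.head())
```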

### Using lambda functions

In [32]:
df['hour'].apply(lambda x: x+0.5)

0         0.5
1         1.5
2         2.5
3         3.5
4         4.5
         ... 
17409    19.5
17410    20.5
17411    21.5
17412    22.5
17413    23.5
Name: hour, Length: 17414, dtype: float64

Lambda functions applied to each column

In [33]:
df.apply(lambda x: x.mean())

cnt             1143.101642
t1                12.468091
t2                11.520836
hum               72.324954
wind_speed        15.913063
weather_code       2.722752
is_holiday         0.022051
is_weekend         0.285403
season             1.492075
hour              11.513265
dtype: float64
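With the default `axis=0`, `apply` hands each column to the lambda as a Series, so the result has one value per column. For a plain mean, the built-in `DataFrame.mean` gives the same result. A sketch on the first five rows of *cnt* and *t1*:

```python
import pandas as pd

# First five rows of two columns from the dataset.
sample = pd.DataFrame({
    'cnt': [182, 138, 134, 72, 47],
    't1': [3.0, 3.0, 2.5, 2.0, 2.0],
})

# The lambda receives each column as a Series (axis=0 is the default).
via_apply = sample.apply(lambda col: col.mean())
builtin = sample.mean()
print(via_apply)
```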

Lambda functions applied to each row

In [37]:
df.apply(lambda x: x.mean(), axis='columns')

0         29.30
1         24.75
2         24.25
3         18.40
4         15.75
          ...  
17409    117.30
17410     67.60
17411     47.45
17412     35.90
17413     27.10
Length: 17414, dtype: float64
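With `axis='columns'` (equivalently `axis=1`) the lambda receives each row as a Series instead. Checking the first value of the output by hand: the ten numbers in row 0 sum to 293, so their mean is 29.3. The sketch reproduces this on that single row and compares it with the built-in `mean(axis=1)`.

```python
import pandas as pd

# First row of df shown earlier (cnt ... hour), as a one-row frame.
row = pd.DataFrame(
    [[182, 3.0, 2.0, 93.0, 6.0, 3.0, 0.0, 1.0, 3.0, 0]],
    columns=['cnt', 't1', 't2', 'hum', 'wind_speed',
             'weather_code', 'is_holiday', 'is_weekend', 'season', 'hour'],
)

by_name = row.apply(lambda x: x.mean(), axis='columns')
by_number = row.apply(lambda x: x.mean(), axis=1)  # axis=1 is the same as 'columns'
builtin = row.mean(axis=1)
print(by_name)
```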

In [38]:
df.apply(lambda x: x['t1'] - x['t2'], axis='columns')

0        1.0
1        0.5
2        0.0
3        0.0
4        2.0
        ... 
17409    4.0
17410    4.0
17411    4.0
17412    4.0
17413    4.0
Length: 17414, dtype: float64
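This row-wise lambda gives the same values as the column subtraction `df['t1'] - df['t2']` from the earlier cell, but it calls Python once per row, so on the 17,414 rows here the vectorized form is usually the faster choice. A check on the first five rows:

```python
import pandas as pd

# First five t1/t2 readings from the dataset.
sample = pd.DataFrame({
    't1': [3.0, 3.0, 2.5, 2.0, 2.0],
    't2': [2.0, 2.5, 2.5, 2.0, 0.0],
})

row_wise = sample.apply(lambda x: x['t1'] - x['t2'], axis='columns')  # one Python call per row
vectorized = sample['t1'] - sample['t2']                               # single column operation
print(vectorized)
```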

Lambda functions applied to every value of the DataFrame

In [39]:
# applymap is deprecated since pandas 2.1 in favour of DataFrame.map
df.applymap(lambda x: x*10)

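|   | cnt | t1 | t2 | hum | wind_speed | weather_code | is_holiday | is_weekend | season | hour |
|---|-----|----|----|-----|------------|--------------|------------|------------|--------|------|
| 0 | 1820 | 30.0 | 20.0 | 930.0 | 60.0 | 30.0 | 0.0 | 10.0 | 30.0 | 0 |
| 1 | 1380 | 30.0 | 25.0 | 930.0 | 50.0 | 10.0 | 0.0 | 10.0 | 30.0 | 10 |
| 2 | 1340 | 25.0 | 25.0 | 965.0 | 0.0 | 10.0 | 0.0 | 10.0 | 30.0 | 20 |
| 3 | 720 | 20.0 | 20.0 | 1000.0 | 0.0 | 10.0 | 0.0 | 10.0 | 30.0 | 30 |
| 4 | 470 | 20.0 | 0.0 | 930.0 | 65.0 | 10.0 | 0.0 | 10.0 | 30.0 | 40 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17409 | 10420 | 50.0 | 10.0 | 810.0 | 190.0 | 30.0 | 0.0 | 0.0 | 30.0 | 190 |
| 17410 | 5410 | 50.0 | 10.0 | 810.0 | 210.0 | 40.0 | 0.0 | 0.0 | 30.0 | 200 |
| 17411 | 3370 | 55.0 | 15.0 | 785.0 | 240.0 | 40.0 | 0.0 | 0.0 | 30.0 | 210 |
| 17412 | 2240 | 55.0 | 15.0 | 760.0 | 230.0 | 40.0 | 0.0 | 0.0 | 30.0 | 220 |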
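For element-wise arithmetic like this, multiplying the whole frame gives the same result without one Python call per cell. The sketch compares both on a hypothetical two-row frame; it calls `DataFrame.map` when it is available (pandas 2.1 and later) and falls back to `applymap` otherwise.

```python
import pandas as pd

# Hypothetical two-row frame with an int, a float and an int column.
sample = pd.DataFrame({'cnt': [182, 138], 't1': [3.0, 3.0], 'hour': [0, 1]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    elementwise = sample.map(lambda x: x * 10)
else:
    elementwise = sample.applymap(lambda x: x * 10)

scaled = sample * 10  # same result for plain arithmetic, computed column by column
print(elementwise)
```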
