<img src="https://www.aseafi.es/wp-content/uploads/2020/09/Afi-Escuela.png" alt="Drawing" width="300"/>

# Pandas


**Verónica Ruiz Méndez - September 2022**

Introduction to Machine Learning course - Afi Escuela de Finanzas

*Author*: [Verónica Ruiz Méndez](https://www.linkedin.com/in/veronica-ruiz-mendez/)

The basic building block of the <b>pandas</b> library, as with <b>numpy</b>, is the set of data structures it provides.<br/>
Here there are two related data structures, each with its own behavior:<br/>
<ul>
<li><b>Series:</b> for one-dimensional data.</li>
<li><b>DataFrame:</b> for tabular data.</li>
</ul>

They closely resemble the structures R offers: (named) vectors and data.frame.
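
To make the relationship between the two structures concrete, the following minimal sketch builds a small DataFrame from two Series and checks that each column is itself a Series sharing the DataFrame's index. The names `prices`, `units`, and `table` and the data are illustrative assumptions, not part of the course material.

```python
import pandas as pd

# Two one-dimensional Series with the same labels (illustrative data)
prices = pd.Series([1.5, 2.0, 0.75], index=['apple', 'pear', 'plum'])
units = pd.Series([10, 4, 20], index=['apple', 'pear', 'plum'])

# A DataFrame can be built from a dict of Series: one column per Series
table = pd.DataFrame({'price': prices, 'units': units})

# Selecting a column gives back a Series that shares the DataFrame's index
column = table['price']
print(type(column).__name__)             # Series
print(column.index.equals(table.index))  # True
```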

In [None]:
import pandas as pd
import numpy as np

### Series

In [None]:
# Series from an ndarray
array = np.array([2, 4, 6, 8, 10])
serie = pd.Series(array)
serie

0     2
1     4
2     6
3     8
4    10
dtype: int64

In [None]:
serie.values

array([ 2,  4,  6,  8, 10])

In [None]:
serie.index

RangeIndex(start=0, stop=5, step=1)

In [None]:
serie = pd.Series(array, index=['a', 'b', 'c', 'd', 'e'])
serie

a     2
b     4
c     6
d     8
e    10
dtype: int64

In [None]:
serie.values

array([ 2,  4,  6,  8, 10])

In [None]:
serie.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
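
Once a Series carries its own labels, elements can be reached by label or by position, and arithmetic between two Series aligns on labels rather than on order. The sketch below illustrates this with the same values as above; the second Series, `other`, is a hypothetical addition for the example.

```python
import numpy as np
import pandas as pd

serie = pd.Series(np.array([2, 4, 6, 8, 10]), index=['a', 'b', 'c', 'd', 'e'])

by_label = serie.loc['b']          # label-based access: 4
by_position = serie.iloc[1]        # position-based access: 4
subset = serie.loc[['a', 'e']]     # a list of labels returns a Series
label_slice = serie.loc['b':'d']   # label slices include both endpoints

# Arithmetic aligns on labels, not on position
other = pd.Series([1, 1], index=['e', 'a'])
total = serie + other              # labels missing from one operand give NaN
print(total)
```

Because `b`, `c`, and `d` have no counterpart in `other`, their results are `NaN` and the result is stored as floating point.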

### DataFrames

In [None]:
# DataFrame from an ndarray, with labels for rows and columns
dataframe = pd.DataFrame(np.arange(49).reshape(7, 7), 
                         index=['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7'], 
                         columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'])
dataframe

    c1  c2  c3  c4  c5  c6  c7
f1   0   1   2   3   4   5   6
f2   7   8   9  10  11  12  13
f3  14  15  16  17  18  19  20
f4  21  22  23  24  25  26  27
f5  28  29  30  31  32  33  34
f6  35  36  37  38  39  40  41
f7  42  43  44  45  46  47  48


In [None]:
dataframe.values

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

In [None]:
dataframe.index

Index(['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7'], dtype='object')

In [None]:
dataframe.columns

Index(['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'], dtype='object')

In [None]:
dataframe.head()

    c1  c2  c3  c4  c5  c6  c7
f1   0   1   2   3   4   5   6
f2   7   8   9  10  11  12  13
f3  14  15  16  17  18  19  20
f4  21  22  23  24  25  26  27
f5  28  29  30  31  32  33  34


In [None]:
dataframe.head(6)

    c1  c2  c3  c4  c5  c6  c7
f1   0   1   2   3   4   5   6
f2   7   8   9  10  11  12  13
f3  14  15  16  17  18  19  20
f4  21  22  23  24  25  26  27
f5  28  29  30  31  32  33  34
f6  35  36  37  38  39  40  41


In [None]:
dataframe.tail()

    c1  c2  c3  c4  c5  c6  c7
f3  14  15  16  17  18  19  20
f4  21  22  23  24  25  26  27
f5  28  29  30  31  32  33  34
f6  35  36  37  38  39  40  41
f7  42  43  44  45  46  47  48


In [None]:
dataframe.tail(3)

    c1  c2  c3  c4  c5  c6  c7
f5  28  29  30  31  32  33  34
f6  35  36  37  38  39  40  41
f7  42  43  44  45  46  47  48


#### Indexing DataFrames

In [None]:
dataframe['c1']

f1     0
f2     7
f3    14
f4    21
f5    28
f6    35
f7    42
Name: c1, dtype: int64

In [None]:
dataframe.loc['f1']

c1    0
c2    1
c3    2
c4    3
c5    4
c6    5
c7    6
Name: f1, dtype: int64
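
The two cells above select a whole column with `[]` and a whole row with `.loc`. As a hedged sketch of how these generalize, the example below rebuilds the same 7x7 DataFrame and selects a single cell by label and by position, a rectangular block, and the rows that satisfy a condition. The variable names `cell`, `same_cell`, `block`, and `mask_rows` are illustrative.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame(np.arange(49).reshape(7, 7),
                         index=[f'f{i}' for i in range(1, 8)],
                         columns=[f'c{i}' for i in range(1, 8)])

# Single cell: by labels with .loc, by integer position with .iloc
cell = dataframe.loc['f2', 'c3']        # 9
same_cell = dataframe.iloc[1, 2]        # 9

# Rectangular block: a label slice of rows (inclusive) and a list of columns
block = dataframe.loc['f1':'f3', ['c1', 'c7']]

# Boolean mask: keep only the rows where column c1 is greater than 20
mask_rows = dataframe[dataframe['c1'] > 20]
print(list(mask_rows.index))            # ['f4', 'f5', 'f6', 'f7']
```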

#### Adding/removing columns

In [None]:
dataframe['new'] = [100, 100, 100, 100, 100, 100, 100]

In [None]:
dataframe

    c1  c2  c3  c4  c5  c6  c7  new
f1   0   1   2   3   4   5   6  100
f2   7   8   9  10  11  12  13  100
f3  14  15  16  17  18  19  20  100
f4  21  22  23  24  25  26  27  100
f5  28  29  30  31  32  33  34  100
f6  35  36  37  38  39  40  41  100
f7  42  43  44  45  46  47  48  100
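
A new column does not have to be given as a full list. As a small sketch on a hypothetical 2x3 DataFrame, assigning a scalar repeats it on every row, and a column can also be computed from existing ones (the column name `c1_plus_c2` is illustrative).

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame(np.arange(6).reshape(2, 3),
                         index=['f1', 'f2'],
                         columns=['c1', 'c2', 'c3'])

# A scalar is broadcast to every row
dataframe['new'] = 100

# A column derived element-wise from two existing columns
dataframe['c1_plus_c2'] = dataframe['c1'] + dataframe['c2']
print(dataframe)
```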


In [None]:
# Drops the row labeled 'f1' (axis=0 by default) and returns a new DataFrame
dataframe.drop('f1')

    c1  c2  c3  c4  c5  c6  c7  new
f2   7   8   9  10  11  12  13  100
f3  14  15  16  17  18  19  20  100
f4  21  22  23  24  25  26  27  100
f5  28  29  30  31  32  33  34  100
f6  35  36  37  38  39  40  41  100
f7  42  43  44  45  46  47  48  100


In [None]:
# axis=1 drops a column instead of a row
dataframe.drop('c1', axis=1)

    c2  c3  c4  c5  c6  c7  new
f1   1   2   3   4   5   6  100
f2   8   9  10  11  12  13  100
f3  15  16  17  18  19  20  100
f4  22  23  24  25  26  27  100
f5  29  30  31  32  33  34  100
f6  36  37  38  39  40  41  100
f7  43  44  45  46  47  48  100
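
Note that the outputs above still show row `f1` and column `c1` in later cells: `drop` returns a new DataFrame and leaves the original untouched. The following sketch, on a hypothetical 3x3 DataFrame, shows this and one way to keep the result, by reassigning it.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame(np.arange(9).reshape(3, 3),
                         index=['f1', 'f2', 'f3'],
                         columns=['c1', 'c2', 'c3'])

# drop returns a copy without the column; the original is unchanged
dropped = dataframe.drop('c1', axis=1)
print(list(dataframe.columns))   # ['c1', 'c2', 'c3']
print(list(dropped.columns))     # ['c2', 'c3']

# Keyword form: name rows and columns explicitly instead of using axis
dropped_kw = dataframe.drop(index=['f1'], columns=['c1'])

# To keep the change, reassign the result
dataframe = dataframe.drop(columns='c1')
```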


#### Operations on DataFrames

Transposing DataFrames

In [None]:
dataframe.T

      f1   f2   f3   f4   f5   f6   f7
c1     0    7   14   21   28   35   42
c2     1    8   15   22   29   36   43
c3     2    9   16   23   30   37   44
c4     3   10   17   24   31   38   45
c5     4   11   18   25   32   39   46
c6     5   12   19   26   33   40   47
c7     6   13   20   27   34   41   48
new  100  100  100  100  100  100  100


Operations on DataFrames (column by column, by default)

In [None]:
# Sum each column (axis=0, the default), using the DataFrame's own method
dataframe.sum()

c1     147
c2     154
c3     161
c4     168
c5     175
c6     182
c7     189
new    700
dtype: int64

In [None]:
# Sum each row
dataframe.sum(axis=1)

f1    121
f2    170
f3    219
f4    268
f5    317
f6    366
f7    415
dtype: int64