# Introduction to Pandas



[*Pandas*](http://pandas.pydata.org/) is a Python library for analyzing data structured in rows and columns. It is very popular for data manipulation, analysis, and preprocessing, which makes it well suited to preparing and exploring input data. *Pandas* also integrates well with other data science libraries, so machine learning algorithms commonly take as input its two main structures: `DataFrames` and `Series`.




# Pandas objects

In [None]:
import numpy as np
import pandas as pd

## Series

A ``Series`` is the pandas counterpart of a one-dimensional NumPy array.

It can be created from a list or an array.

In [None]:
c = ['rojo', 'verde', 'azul', 'amarillo']
colores = pd.Series(c)
colores

0        rojo
1       verde
2        azul
3    amarillo
dtype: object

In [None]:
p = np.array([0.2, 0.4, 0.8, 1])
porcentajes = pd.Series(p)
porcentajes

0    0.2
1    0.4
2    0.8
3    1.0
dtype: float64

A Series consists of values and an index, which can be accessed through the ``values`` and ``index`` attributes.

The values are a NumPy array:

In [None]:
porcentajes.values

array([0.2, 0.4, 0.8, 1. ])

In [None]:
colores.values

array(['rojo', 'verde', 'azul', 'amarillo'], dtype=object)

In [None]:
porcentajes.index

RangeIndex(start=0, stop=4, step=1)

As with a NumPy array, we can use brackets to get a single element or a subset of elements:

In [None]:
porcentajes[1]

0.4

In [None]:
porcentajes[1:3]

1    0.4
2    0.8
dtype: float64
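With the default `RangeIndex`, labels and positions coincide, so `porcentajes[1]` looks just like NumPy indexing. When the index is made of other integers, the two can differ: on an integer index, plain brackets use the label. The explicit accessors `.loc` (by label) and `.iloc` (by position) remove the ambiguity. A minimal sketch; the index `[3, 2, 1, 0]` is hypothetical, chosen so that labels and positions disagree:

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions
s = pd.Series([0.2, 0.4, 0.8, 1.0], index=[3, 2, 1, 0])

by_label = s.loc[1]       # element whose label is 1 (third element)
by_position = s.iloc[1]   # second element
plain = s[1]              # on an integer index, brackets use the label
first_two = s.iloc[0:2]   # positions 0 and 1 (end position excluded)
print(by_label, by_position, plain)
print(first_two)
```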

Unlike NumPy arrays, ``Series`` objects let us define an index associated with the values, and the index labels can be of any type, for example:

In [None]:
nivel_color = pd.Series(p,        # values: the object p (a list or an array)
                        index=c)  # index: the object c (a list or an array)
nivel_color

rojo        0.2
verde       0.4
azul        0.8
amarillo    1.0
dtype: float64

In [None]:
nivel_color.values

array([0.2, 0.4, 0.8, 1. ])

Besides accessing the data by the position of each element, we can also access it through the associated index label:

In [None]:
nivel_color['azul']         # reminiscent of a dictionary lookup

0.8

In [None]:
nivel_color.iloc[2]         # by position; plain nivel_color[2] on a label index is deprecated in recent pandas

0.8

Thus, a Series combines the strengths of NumPy arrays with those of dictionaries.
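A short sketch of both sides of that combination, reusing the `nivel_color` data from above: dictionary-style operations (`in`, `keys()`) work on the index labels, while NumPy-style operations (arithmetic, boolean masks) work on the values without loops.

```python
import pandas as pd

nivel_color = pd.Series([0.2, 0.4, 0.8, 1.0],
                        index=['rojo', 'verde', 'azul', 'amarillo'])

# Dictionary-like behavior: membership and keys refer to the index
has_azul = 'azul' in nivel_color     # True
has_value = 0.8 in nivel_color       # False: `in` checks labels, not values
keys = list(nivel_color.keys())      # same labels as nivel_color.index

# Array-like behavior: vectorized arithmetic and boolean masks
as_percent = nivel_color * 100
high = nivel_color[nivel_color > 0.5]
print(as_percent)
print(high)
```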



In [None]:
type(nivel_color.values)

numpy.ndarray

In [None]:
# dictionary with the 2020 population of selected states

poblacion_dict = {'Aguascalientes': 1425607,
                  'Guanajuato': 6166934,
                  'Jalisco': 8348151,
                  'CDMX': 9209944,
                  'Querétaro': 2368467}

pob = pd.Series(poblacion_dict)
pob

Aguascalientes    1425607
Guanajuato        6166934
Jalisco           8348151
CDMX              9209944
Querétaro         2368467
dtype: int64

In [None]:
pob['CDMX']

9209944

An added advantage is that we can select a range of the data by label (note that the end label is included):

In [None]:
pob['Aguascalientes':'CDMX']

Aguascalientes    1425607
Guanajuato        6166934
Jalisco           8348151
CDMX              9209944
dtype: int64
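The inclusive end is a key difference from positional slicing, where the end is excluded as in plain Python. A sketch with the same population data, comparing `.loc` (labels) and `.iloc` (positions):

```python
import pandas as pd

pob = pd.Series({'Aguascalientes': 1425607, 'Guanajuato': 6166934,
                 'Jalisco': 8348151, 'CDMX': 9209944, 'Querétaro': 2368467})

by_label = pob.loc['Aguascalientes':'Jalisco']   # end label included: 3 items
by_position = pob.iloc[0:2]                      # end position excluded: 2 items
print(by_label)
print(by_position)
```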

In [None]:
pob.iloc[3]

9209944

which cannot be done with dictionaries (in Python 3.12 and later, where slices are hashable, the error is a `KeyError` instead of a `TypeError`):

In [None]:
poblacion_dict['Aguascalientes':'CDMX']  

TypeError: unhashable type: 'slice'

### More Series examples

In [None]:
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

In [None]:
pd.Series(14, index=[150, 220, 380])

150    14
220    14
380    14
dtype: int64

In [None]:
pd.Series({2:'a', 1:'b', 3:'c'})

2    a
1    b
3    c
dtype: object

##  DataFrame 

Like a ``Series``, a ``DataFrame`` combines the qualities of a NumPy matrix (a two-dimensional array) with those of dictionaries, and adds further advantages:

In [None]:
# male population for the same states
hombres_dic = {'Guanajuato': 2996454, 'Aguascalientes': 696683,
               'Jalisco': 4098455, 'CDMX': 4404927, 'Querétaro': 1156820}

In [None]:
pd.DataFrame({'total': poblacion_dict, 'hombres': hombres_dic})  # from two dictionaries

,total,hombres
Aguascalientes,1425607,696683
Guanajuato,6166934,2996454
Jalisco,8348151,4098455
CDMX,9209944,4404927
Querétaro,2368467,1156820
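Note that `hombres_dic` lists the states in a different order than `poblacion_dict`, yet each value lands in the right row: when a DataFrame is built from dictionaries, pandas aligns the values by key. A small sketch with hypothetical keys `'A'`, `'B'`, `'C'` shows that a key missing from one dictionary becomes `NaN` (which also turns that column into floats):

```python
import pandas as pd

total = {'A': 10, 'B': 20, 'C': 30}   # hypothetical data
hombres = {'B': 9, 'A': 4}            # different order, 'C' missing

df = pd.DataFrame({'total': total, 'hombres': hombres})
print(df)
```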


We save the DataFrame under the name `poblacion`:

In [None]:
poblacion = pd.DataFrame({'total': poblacion_dict,'hombres': hombres_dic}) 
poblacion

,total,hombres
Aguascalientes,1425607,696683
Guanajuato,6166934,2996454
Jalisco,8348151,4098455
CDMX,9209944,4404927
Querétaro,2368467,1156820


Besides the ``values`` and ``index`` attributes, a ``DataFrame`` has a ``columns`` attribute, an index that holds the label of each column.

In [None]:
poblacion.values   

array([[1425607,  696683],
       [6166934, 2996454],
       [8348151, 4098455],
       [9209944, 4404927],
       [2368467, 1156820]])

In [None]:
poblacion.index

Index(['Aguascalientes', 'Guanajuato', 'Jalisco', 'CDMX', 'Querétaro'], dtype='object')

In [None]:
poblacion.columns

Index(['total', 'hombres'], dtype='object')

A DataFrame is also a kind of dictionary in which the column labels act as keys:


In [None]:
poblacion['total']

Aguascalientes    1425607
Guanajuato        6166934
Jalisco           8348151
CDMX              9209944
Querétaro         2368467
Name: total, dtype: int64

but the row labels do not (unlike in a Series):

In [None]:
poblacion['Guanajuato']

KeyError: 'Guanajuato'
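To select a row by its label, use `.loc` (covered in the section on access by label). The `in` operator reflects the same dictionary-like behavior, since it checks the column labels. A reduced sketch with two of the states:

```python
import pandas as pd

poblacion = pd.DataFrame({'total': {'Guanajuato': 6166934, 'Jalisco': 8348151},
                          'hombres': {'Guanajuato': 2996454, 'Jalisco': 4098455}})

print('total' in poblacion)        # True: `in` checks the column labels
print('Guanajuato' in poblacion)   # False: row labels are not keys
row = poblacion.loc['Guanajuato']  # rows are selected with .loc
print(row)
```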

As with dictionaries, we can add new elements (in this case, Series), and as in NumPy, we can operate on whole columns without resorting to `for` loops:

In [None]:
poblacion['mujeres'] = poblacion['total'] - poblacion['hombres']
poblacion

,total,hombres,mujeres
Aguascalientes,1425607,696683,728924
Guanajuato,6166934,2996454,3170480
Jalisco,8348151,4098455,4249696
CDMX,9209944,4404927,4805017
Querétaro,2368467,1156820,1211647
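The assignment above modifies `poblacion` in place. If you would rather keep the original DataFrame unchanged, `DataFrame.assign` returns a new DataFrame with the added column. A sketch on a two-row subset of the table above:

```python
import pandas as pd

poblacion = pd.DataFrame({'total': [1425607, 6166934],
                          'hombres': [696683, 2996454]},
                         index=['Aguascalientes', 'Guanajuato'])

# assign returns a new DataFrame; poblacion itself is left unchanged
con_mujeres = poblacion.assign(mujeres=poblacion['total'] - poblacion['hombres'])
print(con_mujeres)
```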


### Creating DataFrames from different objects



#### From a Series

In [None]:
pob

Aguascalientes    1425607
Guanajuato        6166934
Jalisco           8348151
CDMX              9209944
Querétaro         2368467
dtype: int64

In [None]:
pd.DataFrame(pob, columns=['poblacion'])

,poblacion
Aguascalientes,1425607
Guanajuato,6166934
Jalisco,8348151
CDMX,9209944
Querétaro,2368467
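The Series method `to_frame` gives the same one-column result; which form to use is a matter of taste. A sketch with two of the states:

```python
import pandas as pd

pob = pd.Series({'Aguascalientes': 1425607, 'CDMX': 9209944})

df1 = pd.DataFrame(pob, columns=['poblacion'])
df2 = pob.to_frame(name='poblacion')    # same result, via a Series method
print(df2)
```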


#### From a dictionary

In [None]:
data = {'Edades': [30, 21, 18, 20], 'Sexo': ['H', 'H', 'M', 'H']}
pd.DataFrame(data)

,Edades,Sexo
0,30,H
1,21,H
2,18,M
3,20,H


#### From a NumPy array


In [None]:
pd.DataFrame(np.random.rand(5, 3),
             columns=['columna1', 'columna2','columna3'],
             index=['a', 'b', 'c', 'd', 'e']
             )

,columna1,columna2,columna3
a,0.878813,0.972173,0.610622
b,0.961671,0.461368,0.176108
c,0.937657,0.623051,0.783592
d,0.392929,0.787558,0.467425
e,0.985761,0.722678,0.576336
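Because `np.random.rand` draws new numbers on every run, the values above will be different each time. If reproducible output is wanted, one option is NumPy's `default_rng` with a fixed seed; the seed `42` below is arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # fixed seed: same numbers on every run
df = pd.DataFrame(rng.random((5, 3)),
                  columns=['columna1', 'columna2', 'columna3'],
                  index=['a', 'b', 'c', 'd', 'e'])
print(df)
```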


## Reading CSV files as DataFrames

In [None]:
# Read the data with pd.read_csv

titanic = pd.read_csv( "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")
type(titanic)

pandas.core.frame.DataFrame

In [None]:
titanic

,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.2500
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.9250
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1000
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.0500
...,...,...,...,...,...,...,...,...
882,0,2,Rev. Juozas Montvila,male,27.0,0,0,13.0000
883,1,1,Miss. Margaret Edith Graham,female,19.0,0,0,30.0000
884,0,3,Miss. Catherine Helen Johnston,female,7.0,1,2,23.4500
885,1,1,Mr. Karl Howell Behr,male,26.0,0,0,30.0000


In [None]:
titanic = titanic.reset_index()

In [None]:
titanic = titanic.set_index(["Name"])
titanic.head(10)

Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mr. Owen Harris Braund,0,0,3,male,22.0,1,0,7.25
Mrs. John Bradley (Florence Briggs Thayer) Cumings,1,1,1,female,38.0,1,0,71.2833
Miss. Laina Heikkinen,2,1,3,female,26.0,0,0,7.925
Mrs. Jacques Heath (Lily May Peel) Futrelle,3,1,1,female,35.0,1,0,53.1
Mr. William Henry Allen,4,0,3,male,35.0,0,0,8.05
Mr. James Moran,5,0,3,male,27.0,0,0,8.4583
Mr. Timothy J McCarthy,6,0,1,male,54.0,0,0,51.8625
Master. Gosta Leonard Palsson,7,0,3,male,2.0,3,1,21.075
Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,8,1,3,female,27.0,0,2,11.1333
Mrs. Nicholas (Adele Achem) Nasser,9,1,2,female,14.0,1,0,30.0708
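The two cells above first move the default numeric index into a column called `index` (`reset_index`) and then make `Name` the row index (`set_index`). A sketch with a tiny in-memory CSV (hypothetical rows, read through `io.StringIO` so no download is needed) reproduces those steps and shows the `index_col` parameter of `read_csv`, which sets the index while reading and skips the extra `index` column:

```python
import io
import pandas as pd

csv_text = ("Survived,Pclass,Name,Age\n"
            "0,3,Mr. A,22.0\n"
            "1,1,Mrs. B,38.0\n")

df = pd.read_csv(io.StringIO(csv_text))
df = df.reset_index()        # old RangeIndex becomes a column called 'index'
df = df.set_index('Name')    # 'Name' becomes the row index
print(df)

# Alternative: set the index while reading
df_direct = pd.read_csv(io.StringIO(csv_text), index_col='Name')
print(df_direct)
```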


In [None]:
titanic.columns


Index(['index', 'Survived', 'Pclass', 'Sex', 'Age', 'Siblings/Spouses Aboard',
       'Parents/Children Aboard', 'Fare'],
      dtype='object')

In [None]:
titanic.index

Index(['Mr. Owen Harris Braund',
       'Mrs. John Bradley (Florence Briggs Thayer) Cumings',
       'Miss. Laina Heikkinen', 'Mrs. Jacques Heath (Lily May Peel) Futrelle',
       'Mr. William Henry Allen', 'Mr. James Moran', 'Mr. Timothy J McCarthy',
       'Master. Gosta Leonard Palsson',
       'Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson',
       'Mrs. Nicholas (Adele Achem) Nasser',
       ...
       'Mr. Johann Markun', 'Miss. Gerda Ulrika Dahlberg',
       'Mr. Frederick James Banfield', 'Mr. Henry Jr Sutehall',
       'Mrs. William (Margaret Norton) Rice', 'Rev. Juozas Montvila',
       'Miss. Margaret Edith Graham', 'Miss. Catherine Helen Johnston',
       'Mr. Karl Howell Behr', 'Mr. Patrick Dooley'],
      dtype='object', name='Name', length=887)

In [None]:
# A single column of the DataFrame is a Series

edad = titanic['Age']
type(edad)

pandas.core.series.Series

In [None]:
edad

Name
Mr. Owen Harris Braund                                22.0
Mrs. John Bradley (Florence Briggs Thayer) Cumings    38.0
Miss. Laina Heikkinen                                 26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle           35.0
Mr. William Henry Allen                               35.0
                                                      ... 
Rev. Juozas Montvila                                  27.0
Miss. Margaret Edith Graham                           19.0
Miss. Catherine Helen Johnston                         7.0
Mr. Karl Howell Behr                                  26.0
Mr. Patrick Dooley                                    32.0
Name: Age, Length: 887, dtype: float64

## Functions, methods and attributes


In [None]:
titanic.info()  # prints the number of rows and columns, the column names,
                # the number of non-null values and the dtype of each column

<class 'pandas.core.frame.DataFrame'>
Index: 887 entries, Mr. Owen Harris Braund to Mr. Patrick Dooley
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   index                    887 non-null    int64  
 1   Survived                 887 non-null    int64  
 2   Pclass                   887 non-null    int64  
 3   Sex                      887 non-null    object 
 4   Age                      887 non-null    float64
 5   Siblings/Spouses Aboard  887 non-null    int64  
 6   Parents/Children Aboard  887 non-null    int64  
 7   Fare                     887 non-null    float64
dtypes: float64(2), int64(5), object(1)
memory usage: 94.7+ KB


In [None]:
titanic.describe()  

,index,Survived,Pclass,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
count,887.0,887.0,887.0,887.0,887.0,887.0,887.0
mean,443.0,0.385569,2.305524,29.471443,0.525366,0.383315,32.30542
std,256.199141,0.487004,0.836662,14.121908,1.104669,0.807466,49.78204
min,0.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,221.5,0.0,2.0,20.25,0.0,0.0,7.925
50%,443.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,664.5,1.0,3.0,38.0,1.0,0.0,31.1375
max,886.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [None]:
# get the number of rows and columns from the shape attribute
print('dimensiones: ', titanic.shape)
print('\nrenglones: ', titanic.shape[0])
print('columnas:  ', titanic.shape[1])

dimensiones:  (887, 8)

renglones:  887
columnas:   8


In [None]:
type(titanic.shape)

tuple

In [None]:
# number of distinct values in each column
titanic.nunique()  

index                      887
Survived                     2
Pclass                       3
Sex                          2
Age                         89
Siblings/Spouses Aboard      7
Parents/Children Aboard      7
Fare                       248
dtype: int64

In [None]:
titanic.columns

Index(['index', 'Survived', 'Pclass', 'Sex', 'Age', 'Siblings/Spouses Aboard',
       'Parents/Children Aboard', 'Fare'],
      dtype='object')

To see which values a column takes and how often each one occurs, we use ``value_counts()``, a Series method:

In [None]:
titanic['Sex'].value_counts()

male      573
female    314
Name: Sex, dtype: int64
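When proportions are more useful than counts, `value_counts(normalize=True)` divides each count by the total. A sketch on a small hypothetical column:

```python
import pandas as pd

sexo = pd.Series(['male', 'female', 'male', 'male'])   # hypothetical values
counts = sexo.value_counts()                  # male: 3, female: 1
shares = sexo.value_counts(normalize=True)    # male: 0.75, female: 0.25
print(counts)
print(shares)
```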

To do the same for every column of the DataFrame, we can iterate over the columns:

In [None]:
for column in titanic.columns:
    print(titanic[column].value_counts(),'\n')   
    

0      1
596    1
585    1
586    1
587    1
      ..
299    1
300    1
301    1
302    1
886    1
Name: index, Length: 887, dtype: int64 

0    545
1    342
Name: Survived, dtype: int64 

3    487
1    216
2    184
Name: Pclass, dtype: int64 

male      573
female    314
Name: Sex, dtype: int64 

22.00    39
28.00    37
18.00    36
21.00    34
24.00    34
         ..
0.92      1
23.50     1
36.50     1
55.50     1
74.00     1
Name: Age, Length: 89, dtype: int64 

0    604
1    209
2     28
4     18
3     16
8      7
5      5
Name: Siblings/Spouses Aboard, dtype: int64 

0    674
1    118
2     80
5      5
3      5
4      4
6      1
Name: Parents/Children Aboard, dtype: int64 

8.0500     43
13.0000    42
7.8958     36
7.7500     33
26.0000    31
           ..
35.0000     1
28.5000     1
6.2375      1
14.0000     1
10.5167     1
Name: Fare, Length: 248, dtype: int64 



## Methods that summarize information in a single value

These methods produce a single value that summarizes the data they are applied to. Unless told otherwise, pandas summarizes the values of each column of a DataFrame; to summarize the values of each row instead, pass `axis=1` inside the parentheses.
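A sketch of both directions on a small hypothetical DataFrame. Numeric summaries need numeric data, so the text column `Sex` is excluded, either by selecting the numeric columns with `select_dtypes` or by passing `numeric_only=True`:

```python
import pandas as pd

# Hypothetical data with one non-numeric column
df = pd.DataFrame({'Age': [22.0, 38.0, 26.0],
                   'Fare': [7.25, 71.2833, 7.925],
                   'Sex': ['male', 'female', 'female']})

numeric = df.select_dtypes('number')
col_sums = numeric.sum()                # one value per column (axis=0, the default)
row_sums = numeric.sum(axis=1)          # one value per row
col_means = df.mean(numeric_only=True)  # skips the text column 'Sex'
print(col_sums, row_sums, col_means, sep='\n\n')
```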



In [None]:
titanic.mean(numeric_only=True)  # numeric columns only; 'Sex' holds text

index                      443.000000
Survived                     0.385569
Pclass                       2.305524
Age                         29.471443
Siblings/Spouses Aboard      0.525366
Parents/Children Aboard      0.383315
Fare                        32.305420
dtype: float64

In [None]:
titanic.count()

index                      887
Survived                   887
Pclass                     887
Sex                        887
Age                        887
Siblings/Spouses Aboard    887
Parents/Children Aboard    887
Fare                       887
dtype: int64

In [None]:
titanic.isnull()

Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mr. Owen Harris Braund,False,False,False,False,False,False,False,False
Mrs. John Bradley (Florence Briggs Thayer) Cumings,False,False,False,False,False,False,False,False
Miss. Laina Heikkinen,False,False,False,False,False,False,False,False
Mrs. Jacques Heath (Lily May Peel) Futrelle,False,False,False,False,False,False,False,False
Mr. William Henry Allen,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...
Rev. Juozas Montvila,False,False,False,False,False,False,False,False
Miss. Margaret Edith Graham,False,False,False,False,False,False,False,False
Miss. Catherine Helen Johnston,False,False,False,False,False,False,False,False
Mr. Karl Howell Behr,False,False,False,False,False,False,False,False


In [None]:
titanic.isnull().sum()

index                      0
Survived                   0
Pclass                     0
Sex                        0
Age                        0
Siblings/Spouses Aboard    0
Parents/Children Aboard    0
Fare                       0
dtype: int64

In [None]:
print('\n',titanic.isnull().sum(axis=1))


 Name
Mr. Owen Harris Braund                                0
Mrs. John Bradley (Florence Briggs Thayer) Cumings    0
Miss. Laina Heikkinen                                 0
Mrs. Jacques Heath (Lily May Peel) Futrelle           0
Mr. William Henry Allen                               0
                                                     ..
Rev. Juozas Montvila                                  0
Miss. Margaret Edith Graham                           0
Miss. Catherine Helen Johnston                        0
Mr. Karl Howell Behr                                  0
Mr. Patrick Dooley                                    0
Length: 887, dtype: int64


## Accessing data by label or position

Accessing data by column label

The following expressions are equivalent:
- titanic['Age'] 
- titanic.Age  

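In the second form, each column is treated as an attribute of the DataFrame. This only works when the column name is a valid Python identifier (for example, not `'Siblings/Spouses Aboard'`) and does not clash with an existing DataFrame attribute or method.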
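A sketch of those limits on a hypothetical DataFrame: `Age` works both ways, but a column called `count` is hidden behind the `DataFrame.count` method, and a name with spaces and a slash can only be reached with brackets.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22.0, 38.0],
                   'count': [1, 2],
                   'Siblings/Spouses Aboard': [1, 0]})

same = df.Age.equals(df['Age'])        # True: valid identifier, no clash
is_method = callable(df.count)         # True: df.count is the DataFrame method
col_count = df['count']                # brackets always reach the column
sibs = df['Siblings/Spouses Aboard']   # name with spaces: brackets only
```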

In [None]:
print(type(titanic['Age']), '\n')
titanic['Age'] 

<class 'pandas.core.series.Series'> 



Name
Mr. Owen Harris Braund                                22.0
Mrs. John Bradley (Florence Briggs Thayer) Cumings    38.0
Miss. Laina Heikkinen                                 26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle           35.0
Mr. William Henry Allen                               35.0
                                                      ... 
Rev. Juozas Montvila                                  27.0
Miss. Margaret Edith Graham                           19.0
Miss. Catherine Helen Johnston                         7.0
Mr. Karl Howell Behr                                  26.0
Mr. Patrick Dooley                                    32.0
Name: Age, Length: 887, dtype: float64

In [None]:
print(type(titanic.Age), '\n')
titanic.Age

<class 'pandas.core.series.Series'> 



Name
Mr. Owen Harris Braund                                22.0
Mrs. John Bradley (Florence Briggs Thayer) Cumings    38.0
Miss. Laina Heikkinen                                 26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle           35.0
Mr. William Henry Allen                               35.0
                                                      ... 
Rev. Juozas Montvila                                  27.0
Miss. Margaret Edith Graham                           19.0
Miss. Catherine Helen Johnston                         7.0
Mr. Karl Howell Behr                                  26.0
Mr. Patrick Dooley                                    32.0
Name: Age, Length: 887, dtype: float64

We can select two or more columns at once by grouping them in a list:

In [None]:
titanic[['Survived','Sex','Age']]

Name,Survived,Sex,Age
Mr. Owen Harris Braund,0,male,22.0
Mrs. John Bradley (Florence Briggs Thayer) Cumings,1,female,38.0
Miss. Laina Heikkinen,1,female,26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle,1,female,35.0
Mr. William Henry Allen,0,male,35.0
...,...,...,...
Rev. Juozas Montvila,0,male,27.0
Miss. Margaret Edith Graham,1,female,19.0
Miss. Catherine Helen Johnston,0,female,7.0
Mr. Karl Howell Behr,1,male,26.0
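The type of the result depends on the brackets: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one column. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Survived': [0, 1], 'Age': [22.0, 38.0]})

one = df['Age']       # single label -> Series
many = df[['Age']]    # list of labels -> DataFrame, even with one column
print(type(one).__name__, type(many).__name__)
```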


For rows, as we said earlier, the index cannot be used as if it were a dictionary key:

In [None]:
titanic.index

Index(['Mr. Owen Harris Braund',
       'Mrs. John Bradley (Florence Briggs Thayer) Cumings',
       'Miss. Laina Heikkinen', 'Mrs. Jacques Heath (Lily May Peel) Futrelle',
       'Mr. William Henry Allen', 'Mr. James Moran', 'Mr. Timothy J McCarthy',
       'Master. Gosta Leonard Palsson',
       'Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson',
       'Mrs. Nicholas (Adele Achem) Nasser',
       ...
       'Mr. Johann Markun', 'Miss. Gerda Ulrika Dahlberg',
       'Mr. Frederick James Banfield', 'Mr. Henry Jr Sutehall',
       'Mrs. William (Margaret Norton) Rice', 'Rev. Juozas Montvila',
       'Miss. Margaret Edith Graham', 'Miss. Catherine Helen Johnston',
       'Mr. Karl Howell Behr', 'Mr. Patrick Dooley'],
      dtype='object', name='Name', length=887)

In [None]:
titanic.iloc[0:15,3:]

Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mr. Owen Harris Braund,male,22.0,1,0,7.25
Mrs. John Bradley (Florence Briggs Thayer) Cumings,female,38.0,1,0,71.2833
Miss. Laina Heikkinen,female,26.0,0,0,7.925
Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
Mr. William Henry Allen,male,35.0,0,0,8.05
Mr. James Moran,male,27.0,0,0,8.4583
Mr. Timothy J McCarthy,male,54.0,0,0,51.8625
Master. Gosta Leonard Palsson,male,2.0,3,1,21.075
Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,female,27.0,0,2,11.1333
Mrs. Nicholas (Adele Achem) Nasser,female,14.0,1,0,30.0708


### The **`loc[]`** property

Use **`df.loc[rows, columns]`** to access the data by label.

We can use this property to select a single row:

In [None]:
titanic.loc['Miss. Laina Heikkinen']  

index                           2
Survived                        1
Pclass                          3
Sex                        female
Age                          26.0
Siblings/Spouses Aboard         0
Parents/Children Aboard         0
Fare                        7.925
Name: Miss. Laina Heikkinen, dtype: object

or several rows, grouped in a list:

In [None]:
titanic.loc[['Miss. Laina Heikkinen','Mr. Timothy J McCarthy']]  

Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Miss. Laina Heikkinen,2,1,3,female,26.0,0,0,7.925
Mr. Timothy J McCarthy,6,0,1,male,54.0,0,0,51.8625


or a group of rows and columns:

In [None]:
titanic.loc[['Miss. Laina Heikkinen','Mr. Timothy J McCarthy'], ['Survived','Age']]

Name,Survived,Age
Miss. Laina Heikkinen,1,26.0
Mr. Timothy J McCarthy,0,54.0


or use *slicing*; with ``loc``, both the start and the end labels are included:

In [None]:
titanic.loc['Miss. Laina Heikkinen':'Mr. Timothy J McCarthy', 'Survived':'Age']  

Name,Survived,Pclass,Sex,Age
Miss. Laina Heikkinen,1,3,female,26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle,1,1,female,35.0
Mr. William Henry Allen,0,3,male,35.0
Mr. James Moran,0,3,male,27.0
Mr. Timothy J McCarthy,0,1,male,54.0


In [None]:
titanic.loc[:,'Age']

Name
Mr. Owen Harris Braund                                22.0
Mrs. John Bradley (Florence Briggs Thayer) Cumings    38.0
Miss. Laina Heikkinen                                 26.0
Mrs. Jacques Heath (Lily May Peel) Futrelle           35.0
Mr. William Henry Allen                               35.0
                                                      ... 
Rev. Juozas Montvila                                  27.0
Miss. Margaret Edith Graham                           19.0
Miss. Catherine Helen Johnston                         7.0
Mr. Karl Howell Behr                                  26.0
Mr. Patrick Dooley                                    32.0
Name: Age, Length: 887, dtype: float64
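`loc` also accepts a boolean Series as the row selector, so a condition on the rows and a selection of columns can be combined in one step. A sketch on hypothetical rows (the names are made up):

```python
import pandas as pd

df = pd.DataFrame({'Survived': [0, 1, 0], 'Age': [22.0, 38.0, 54.0],
                   'Fare': [7.25, 71.28, 51.86]},
                  index=pd.Index(['Mr. A', 'Mrs. B', 'Mr. C'], name='Name'))

# rows chosen by a boolean mask, columns chosen by label
older = df.loc[df['Age'] > 30, ['Survived', 'Age']]
print(older)
```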

### The **`iloc[]`** property for access by position


Use **`df.iloc[row, column]`** to select values by their integer position.

In [None]:
titanic.iloc[15:22,3:5]  

Name,Sex,Age
Mrs. (Mary D Kingcome) Hewlett,female,55.0
Master. Eugene Rice,male,2.0
Mr. Charles Eugene Williams,male,23.0
Mrs. Julius (Emelia Maria Vandemoortele) Vander Planke,female,31.0
Mrs. Fatima Masselmani,female,22.0
Mr. Joseph J Fynney,male,35.0
Mr. Lawrence Beesley,male,34.0


In [None]:
titanic.iloc[[15,22],[3,5]]  

Name,Sex,Siblings/Spouses Aboard
Mrs. (Mary D Kingcome) Hewlett,female,0
Miss. Anna McGowan,female,0
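Unlike `loc`, slices in `iloc` follow Python rules: the end position is excluded, and negative positions count from the end. A sketch on hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male'],
                   'Age': [22.0, 38.0, 26.0, 35.0]},
                  index=['Mr. A', 'Mrs. B', 'Miss C', 'Mr. D'])

block = df.iloc[1:3, 0:1]   # rows 1 and 2, column 0 (end positions excluded)
last_row = df.iloc[-1]      # negative positions count from the end
cell = df.iloc[0, 1]        # single value: row 0, column 1
print(block)
print(last_row)
print(cell)
```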


## Filtering data

In [None]:
titanic[titanic['Survived']==1]

Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mrs. John Bradley (Florence Briggs Thayer) Cumings,1,1,1,female,38.0,1,0,71.2833
Miss. Laina Heikkinen,2,1,3,female,26.0,0,0,7.9250
Mrs. Jacques Heath (Lily May Peel) Futrelle,3,1,1,female,35.0,1,0,53.1000
Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,8,1,3,female,27.0,0,2,11.1333
Mrs. Nicholas (Adele Achem) Nasser,9,1,2,female,14.0,1,0,30.0708
...,...,...,...,...,...,...,...,...
Miss. Adele Kiamie Najib,871,1,3,female,15.0,0,0,7.2250
Mrs. Thomas Jr (Lily Alexenia Wilson) Potter,875,1,1,female,56.0,0,1,83.1583
Mrs. William (Imanita Parrish Hall) Shelley,876,1,2,female,25.0,0,1,26.0000
Miss. Margaret Edith Graham,883,1,1,female,19.0,0,0,30.0000


In [None]:
filtro = titanic['Pclass']==3

tercera = titanic[filtro]
tercera.head(10)

Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mr. Owen Harris Braund,0,0,3,male,22.0,1,0,7.25
Miss. Laina Heikkinen,2,1,3,female,26.0,0,0,7.925
Mr. William Henry Allen,4,0,3,male,35.0,0,0,8.05
Mr. James Moran,5,0,3,male,27.0,0,0,8.4583
Master. Gosta Leonard Palsson,7,0,3,male,2.0,3,1,21.075
Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,8,1,3,female,27.0,0,2,11.1333
Miss. Marguerite Rut Sandstrom,10,1,3,female,4.0,1,1,16.7
Mr. William Henry Saundercock,12,0,3,male,20.0,0,0,8.05
Mr. Anders Johan Andersson,13,0,3,male,39.0,1,5,31.275
Miss. Hulda Amanda Adolfina Vestrom,14,0,3,female,14.0,0,0,7.8542


In [None]:
# Store the subset of rows that meets the condition
menores10anios = titanic[titanic['Age']<10]   
print(menores10anios.shape)
menores10anios.head()

(71, 8)


Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Master. Gosta Leonard Palsson,7,0,3,male,2.0,3,1,21.075
Miss. Marguerite Rut Sandstrom,10,1,3,female,4.0,1,1,16.7
Master. Eugene Rice,16,0,3,male,2.0,4,1,29.125
Miss. Torborg Danira Palsson,24,0,3,female,8.0,3,1,21.075
Miss. Simonne Marie Anne Andree Laroche,42,1,2,female,3.0,1,2,41.5792


In [None]:
hombres = titanic[titanic['Sex']=='male']
print(hombres.shape)
hombres.tail()

(573, 8)


Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mr. Frederick James Banfield,879,0,2,male,28.0,0,0,10.5
Mr. Henry Jr Sutehall,880,0,3,male,25.0,0,0,7.05
Rev. Juozas Montvila,882,0,2,male,27.0,0,0,13.0
Mr. Karl Howell Behr,885,1,1,male,26.0,0,0,30.0
Mr. Patrick Dooley,886,0,3,male,32.0,0,0,7.75


In [None]:
# Subset of female first-class passengers
filtro1 = titanic['Pclass']==1
filtro2 = titanic['Sex']=='female'

MujeresPrimera = titanic[filtro1 & filtro2]

print(MujeresPrimera.shape)
MujeresPrimera.head()

(94, 8)


Name,index,Survived,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
Mrs. John Bradley (Florence Briggs Thayer) Cumings,1,1,1,female,38.0,1,0,71.2833
Mrs. Jacques Heath (Lily May Peel) Futrelle,3,1,1,female,35.0,1,0,53.1
Miss. Elizabeth Bonnell,11,1,1,female,58.0,0,0,26.55
Mrs. William Augustus (Marie Eugenie) Spencer,31,1,1,female,48.0,1,0,146.5208
Mrs. Henry Sleeper (Myna Haxtun) Harper,51,1,1,female,49.0,1,0,76.7292
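Each filter above is a boolean Series aligned with the rows. When conditions are combined in a single expression, every comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `==` in Python. `DataFrame.query` is an alternative that takes the condition as a string. A sketch on hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({'Pclass': [1, 3, 1, 2],
                   'Sex': ['female', 'female', 'male', 'female']})

mask = (df['Pclass'] == 1) & (df['Sex'] == 'female')   # parentheses required
print(mask.dtype)

first_women = df[mask]
not_first = df[~(df['Pclass'] == 1)]                    # ~ negates a mask
first_or_second = df[(df['Pclass'] == 1) | (df['Pclass'] == 2)]
via_query = df.query("Pclass == 1 and Sex == 'female'")
```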


In [None]:
a = titanic[filtro1]

## Recommendations for programmers who are starting out


1. Write and run a single line of code to explore your data.
2. Check that it works by looking at the output of that line.
3. Assign the result to a variable.
4. In the same cell, display the first rows of the resulting DataFrame or Series (for example, with `.head()`), depending on the operation you performed.
5. Continue in the next cell.