#**PANDAS**


This library supports:

**Data import and analysis** <p>

It is built on top of NumPy and integrates with Matplotlib for plotting. <p>

It is fast and efficient across several data structures; the two main ones are Series and DataFrame.

It provides:

*   Reshaping
*   Grouping
*   Merging
*   Hierarchical indexing
*   Time series handling

Documentation: [Pandas](https://pandas.pydata.org/)

It is one of the most widely used libraries for data analysis in fields such as finance, neuroscience, economics, and statistics. <p>
To use pandas, import it first:


```
import pandas as pd
```



# Creating, Reading, and Writing

##Creating data
The main pandas objects are the **DataFrame** and the **Series**.

###DataFrame
A DataFrame is a table: a two-dimensional array of entries, each holding a value. Every entry belongs to one row (or record) and one column. <p>
The `pd.DataFrame()` constructor accepts a dictionary. The most common form maps each key to a list, where the keys become the column names and each list holds that column's entries.
For example:

In [None]:
import pandas as pd

pd.DataFrame ({'Si': [50,21], 'No':[131,2]})

Unnamed: 0,Si,No
0,50,131
1,21,2


The entries can be of different data types, for example strings:

In [None]:
pd.DataFrame ({'Chacha': ['Me gusta el pescado', 'Como croquetas'],
               'Polita': ['Me gusta la carne', 'Como tocino']})

Unnamed: 0,Chacha,Polita
0,Me gusta el pescado,Me gusta la carne
1,Como croquetas,Como tocino


By default the row labels start at 0 and increase by one. To assign other labels, use the `index` parameter.

In [None]:
pd.DataFrame ({'Chacha': ['Me gusta el pescado', 'Como croquetas'],
               'Polita': ['Me gusta la carne', 'Como tocino']},
              index =['Preferencia', 'Alimento'])

Unnamed: 0,Chacha,Polita
Preferencia,Me gusta el pescado,Me gusta la carne
Alimento,Como croquetas,Como tocino


###Series
A Series is a sequence of values. By analogy, if a DataFrame is a table, a Series is a list.

In [None]:
pd.Series ([0,5,10,15,20,25,30])

0     0
1     5
2    10
3    15
4    20
5    25
6    30
dtype: int64

A Series is essentially a single column of a DataFrame. <p>
It has no column name, but the Series as a whole can be given a name with the `name` parameter, and its rows can be labeled with `index`.

In [None]:
import pandas as pd

pd.Series ([5, 10, 15], index = ['Año 1', 'Año 2', 'Año 3'],
           name ='Ventas')

Año 1     5
Año 2    10
Año 3    15
Name: Ventas, dtype: int64
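The relationship between the two objects can be checked directly. The following is a minimal sketch with a made-up sales table (the `Costos` column and all values are assumptions used only for illustration): selecting one column of a DataFrame returns a Series whose name is the column label.

```python
import pandas as pd

# Hypothetical sales table, used only for illustration
sales = pd.DataFrame({'Ventas': [5, 10, 15], 'Costos': [3, 6, 9]},
                     index=['Año 1', 'Año 2', 'Año 3'])

column = sales['Ventas']      # selecting one column returns a Series
print(type(column).__name__)  # Series
print(column.name)            # Ventas
```

The Series keeps the row labels of the DataFrame it came from.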

##Reading files

Most of the time you work with data that already exists. <p>
Data can be stored in many forms and formats. <p>
One of the most common is the CSV (*Comma-Separated Values*) file, which looks like this:



```
Product A,Product B,Product C
10,20,30
15,25,35
1,2,3
```

In other words, a table of values separated by commas.<p>
The function that reads such data into a DataFrame is:



```
pd.read_csv ()
```
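As a quick way to try the function without a file on disk, the sketch below parses a small CSV text held in memory. The use of `io.StringIO` and the variable names are choices made for this illustration; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# A small CSV document held in a string instead of a file
csv_text = """Product A,Product B,Product C
10,20,30
15,25,35
1,2,3
"""

# read_csv accepts a path or any file-like object
products = pd.read_csv(io.StringIO(csv_text))
print(products.shape)  # (3, 3)
```

The first line of the text becomes the column names and each following line becomes one row.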



##Exercise:
Upload the file https://www.kaggle.com/sudalairajkumar/covid19-in-italy?select=covid19_italy_region.csv to the session storage.

In [None]:
from google.colab import files
import pandas as pd
import io

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  

Saving covid19_italy_region.csv to covid19_italy_region (5).csv
User uploaded file "covid19_italy_region.csv" with length 648719 bytes


In [None]:
data = pd.read_csv (io.BytesIO(uploaded['covid19_italy_region.csv']))  
data

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.250850,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6022,6022,2020-12-06T17:00:00,ITA,19,Sicilia,38.115697,13.362357,1367,213,1580,38166,39746,1022,29984,1759,71489,692062.0
6023,6023,2020-12-06T17:00:00,ITA,9,Toscana,43.769231,11.255889,1360,252,1612,27587,29199,753,76331,2867,108397,983103.0
6024,6024,2020-12-06T17:00:00,ITA,10,Umbria,43.106758,12.388247,332,60,392,5673,6065,234,18619,460,25144,231538.0
6025,6025,2020-12-06T17:00:00,ITA,2,Valle d'Aosta,45.737503,7.320149,102,8,110,877,987,34,5406,333,6726,34644.0


The `shape` attribute gives the dimensions of the DataFrame as (rows, columns).

In [None]:
data.shape

(6027, 17)

To inspect the contents of a DataFrame, use `head` to show the first rows or `tail` to show the last ones. Both show 5 rows by default and accept the number of rows as an argument.

In [None]:
data.head (10)

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
5,5,2020-02-24T18:00:00,ITA,6,Friuli Venezia Giulia,45.649435,13.768136,0,0,0,0,0,0,0,0,0,
6,6,2020-02-24T18:00:00,ITA,12,Lazio,41.89277,12.483667,1,1,2,0,2,2,1,0,3,
7,7,2020-02-24T18:00:00,ITA,7,Liguria,44.411493,8.932699,0,0,0,0,0,0,0,0,0,
8,8,2020-02-24T18:00:00,ITA,3,Lombardia,45.466794,9.190347,76,19,95,71,166,166,0,6,172,
9,9,2020-02-24T18:00:00,ITA,11,Marche,43.61676,13.518875,0,0,0,0,0,0,0,0,0,


In [None]:
data.tail (7)

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
6020,6020,2020-12-06T17:00:00,ITA,16,Puglia,41.125596,16.867367,1613,203,1816,44018,45834,1789,16795,1712,64341,567857.0
6021,6021,2020-12-06T17:00:00,ITA,20,Sardegna,39.215312,9.110616,616,64,680,14280,14960,293,8695,531,24186,333552.0
6022,6022,2020-12-06T17:00:00,ITA,19,Sicilia,38.115697,13.362357,1367,213,1580,38166,39746,1022,29984,1759,71489,692062.0
6023,6023,2020-12-06T17:00:00,ITA,9,Toscana,43.769231,11.255889,1360,252,1612,27587,29199,753,76331,2867,108397,983103.0
6024,6024,2020-12-06T17:00:00,ITA,10,Umbria,43.106758,12.388247,332,60,392,5673,6065,234,18619,460,25144,231538.0
6025,6025,2020-12-06T17:00:00,ITA,2,Valle d'Aosta,45.737503,7.320149,102,8,110,877,987,34,5406,333,6726,34644.0
6026,6026,2020-12-06T17:00:00,ITA,5,Veneto,45.434905,12.338452,2508,308,2816,73988,76804,3444,84235,4210,165249,1090932.0


`pd.read_csv` accepts dozens of optional parameters. For example, `index_col=0` uses the first column of the file as the index instead of creating a new one:

In [None]:
covidItalia = pd.read_csv ('/content/covid19_italy_region.csv', index_col=0)
covidItalia.head()

SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,


In [None]:
covidItalia.iloc [0:5, :]

SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,


In [None]:
covidItalia.loc [0:5, :]

SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
5,2020-02-24T18:00:00,ITA,6,Friuli Venezia Giulia,45.649435,13.768136,0,0,0,0,0,0,0,0,0,


##Exercises

1. Create a DataFrame `frutas` that matches the following table:

.   | Manzanas | Peras
---|----------|---------
2020 |  35|  21
2021 |  42|  34

In [2]:
import pandas as pd

# Each key is a column; each list holds that column's values for 2020 and 2021
frutas = pd.DataFrame ({'Manzanas': [35, 42],
               'Peras': [21, 34]},
              index =['2020', '2021'])
frutas

In [3]:
# Alternative construction from a list of rows, one inner list per row
pd.DataFrame ([[35, 21], [42, 34]], columns= ['Manzanas', 'Peras'],
              index =['2020', '2021'])

2. Create a Series of ingredients like the following:

. | .
---|-----
Harina | 4 tazas
Leche  | 1 taza
Huevos | 2 piezas
Spam   | 1 lata

named `Pastel`.

In [None]:
# The ingredient names are the index labels and the quantities are the values
ingredientes = pd.Series(['4 tazas', '1 taza', '2 piezas', '1 lata'],
                         index = ['Harina', 'Leche', 'Huevos', 'Spam'],
                         name = 'Pastel')


ingredientes

3. Read the dataset [winemag-data-130k-v2.csv](https://www.kaggle.com/christopheiv/winemagdata130k) into a DataFrame.<p>
Display the first 7 rows and the last 5.

Run the following snippet to create and display the `animales` DataFrame:

In [None]:
animales = pd.DataFrame({'Gatos': [12, 20], 'Perros': [22, 19]}, index=['Año 1', 'Año 2'])
animales

4. Save the DataFrame to disk in a file named "gatosYperros.csv".

In [None]:
animales.to_csv('gatosYperros.csv')
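To see what `to_csv` writes and how it reads back, here is a minimal sketch that writes to an in-memory buffer instead of a file. The buffer and the name `restored` are assumptions made for this illustration. Because `to_csv` stores the row labels as the first column, `index_col=0` is passed when reading so that they become the index again.

```python
import io
import pandas as pd

animales = pd.DataFrame({'Gatos': [12, 20], 'Perros': [22, 19]},
                        index=['Año 1', 'Año 2'])

buffer = io.StringIO()
animales.to_csv(buffer)   # same call as with a file name, but into memory
buffer.seek(0)            # rewind so the buffer can be read from the start

# The first column of the written CSV holds the row labels
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Without `index_col=0` the labels would come back as an ordinary unnamed column and a new 0, 1, ... index would be created.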


#Indexing, Selecting, and Assigning

One of the main tasks when working with data is selecting what is relevant.

Upload the file https://www.kaggle.com/sudalairajkumar/covid19-in-italy?select=covid19_italy_region.csv to the session storage.

In [None]:
import pandas as pd
import io
from google.colab import files
uploaded = files.upload ()
data = pd.read_csv(io.BytesIO(uploaded['covid19_italy_region.csv']))
data

Saving covid19_italy_region.csv to covid19_italy_region (6).csv


Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.250850,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6022,6022,2020-12-06T17:00:00,ITA,19,Sicilia,38.115697,13.362357,1367,213,1580,38166,39746,1022,29984,1759,71489,692062.0
6023,6023,2020-12-06T17:00:00,ITA,9,Toscana,43.769231,11.255889,1360,252,1612,27587,29199,753,76331,2867,108397,983103.0
6024,6024,2020-12-06T17:00:00,ITA,10,Umbria,43.106758,12.388247,332,60,392,5673,6065,234,18619,460,25144,231538.0
6025,6025,2020-12-06T17:00:00,ITA,2,Valle d'Aosta,45.737503,7.320149,102,8,110,877,987,34,5406,333,6726,34644.0


In [None]:
# show at most 15 rows
pd.set_option('display.max_rows', 15)

In [None]:
data

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.250850,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6022,6022,2020-12-06T17:00:00,ITA,19,Sicilia,38.115697,13.362357,1367,213,1580,38166,39746,1022,29984,1759,71489,692062.0
6023,6023,2020-12-06T17:00:00,ITA,9,Toscana,43.769231,11.255889,1360,252,1612,27587,29199,753,76331,2867,108397,983103.0
6024,6024,2020-12-06T17:00:00,ITA,10,Umbria,43.106758,12.388247,332,60,392,5673,6065,234,18619,460,25144,231538.0
6025,6025,2020-12-06T17:00:00,ITA,2,Valle d'Aosta,45.737503,7.320149,102,8,110,877,987,34,5406,333,6726,34644.0


## Basic access

Python objects expose their data through attributes. pandas does something similar: each column of a DataFrame can be accessed as an attribute.

In [None]:
data.RegionName

0              Abruzzo
1           Basilicata
2             Calabria
3             Campania
4       Emilia-Romagna
             ...      
6022           Sicilia
6023           Toscana
6024            Umbria
6025     Valle d'Aosta
6026            Veneto
Name: RegionName, Length: 6027, dtype: object

A Python dictionary is accessed with the `[]` operator, and pandas offers that form too:

In [None]:
data ['RegionName']

0              Abruzzo
1           Basilicata
2             Calabria
3             Campania
4       Emilia-Romagna
             ...      
6022           Sicilia
6023           Toscana
6024            Umbria
6025     Valle d'Aosta
6026            Veneto
Name: RegionName, Length: 6027, dtype: object

These two forms of access are equivalent. However, the `.` operator cannot be used when the column name contains spaces. <p>
To access a single value, apply the `[]` operator again:

In [None]:
data ['RegionName'][0]

'Abruzzo'

In [None]:
data ['RegionName'][10]

'Molise'
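The limitation of the `.` operator is easy to reproduce. The sketch below uses a tiny made-up table (the `cars` name and its values are assumptions for illustration) with one simple column name and one that contains a space.

```python
import pandas as pd

cars = pd.DataFrame({'Brand': ['Maruti', 'Tata'],
                     'Kms Driven': [50000, 60000]})

print(cars.Brand[0])          # attribute access works for simple names
print(cars['Kms Driven'][0])  # a name with a space needs brackets
```

Bracket access is also the safer choice when a column name clashes with an existing DataFrame method such as `count`.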

##Indexing in pandas

The indexing operator and attribute selection work as they do elsewhere in the Python ecosystem. <p>
pandas also has its own accessors, **loc** and **iloc**.<p>
Both take the row first and the column second, the opposite of the `data[column][row]` access shown above.
>* **iloc** selects by integer position
>* **loc** selects by label


Selecting the first row of the DataFrame:

In [None]:
data.iloc[0]

SNo                                     0
Date                  2020-02-24T18:00:00
Country                               ITA
RegionCode                             13
RegionName                        Abruzzo
                             ...         
NewPositiveCases                        0
Recovered                               0
Deaths                                  0
TotalPositiveCases                      0
TestsPerformed                        NaN
Name: 0, Length: 17, dtype: object


As in plain Python, the `:` operator on its own means "everything". <p>
Selecting the first column:

In [None]:
data.iloc[:,0]

0          0
1          1
2          2
3          3
4          4
        ... 
6022    6022
6023    6023
6024    6024
6025    6025
6026    6026
Name: SNo, Length: 6027, dtype: int64

In [None]:
# What output does this produce?
data.iloc[0:5,:]

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,


*Exercise:* Select the second and third entries using `iloc`.

In [None]:
data.iloc [1:3, 0]

1    1
2    2
Name: SNo, dtype: int64

You can also pass a list of positions:

In [None]:
data.iloc [[0,2,4,6], 0]

0    0
2    2
4    4
6    6
Name: SNo, dtype: int64

In [None]:
data.iloc [[0,2,4,6], [0,1,2,8]]

Unnamed: 0,SNo,Date,Country,IntensiveCarePatients
0,0,2020-02-24T18:00:00,ITA,0
2,2,2020-02-24T18:00:00,ITA,0
4,4,2020-02-24T18:00:00,ITA,2
6,6,2020-02-24T18:00:00,ITA,1


Negative positions count from the end, as in Python lists.

In [None]:
# last 5 rows
data.iloc [-5: ]

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
6022,6022,2020-12-06T17:00:00,ITA,19,Sicilia,38.115697,13.362357,1367,213,1580,38166,39746,1022,29984,1759,71489,692062.0
6023,6023,2020-12-06T17:00:00,ITA,9,Toscana,43.769231,11.255889,1360,252,1612,27587,29199,753,76331,2867,108397,983103.0
6024,6024,2020-12-06T17:00:00,ITA,10,Umbria,43.106758,12.388247,332,60,392,5673,6065,234,18619,460,25144,231538.0
6025,6025,2020-12-06T17:00:00,ITA,2,Valle d'Aosta,45.737503,7.320149,102,8,110,877,987,34,5406,333,6726,34644.0
6026,6026,2020-12-06T17:00:00,ITA,5,Veneto,45.434905,12.338452,2508,308,2816,73988,76804,3444,84235,4210,165249,1090932.0


##Label-based selection with **loc**

You can pass the row label or the column name when selecting. A `loc` slice includes the last element of the range, unlike an `iloc` slice. <p>
`loc` also accepts a boolean Series as a row mask, whereas `iloc` only accepts a plain boolean array or list.

In [None]:
# The first entry
data.loc[0,'SNo']

0
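The difference between the two accessors is easier to see when the labels are not integers. This is a minimal sketch on a made-up Series (the names `s`, `by_position`, `by_label` and `mask` are assumptions for illustration).

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_position = s.iloc[0:2]   # positions 0 and 1; the stop position is excluded
by_label = s.loc['a':'c']   # labels 'a' through 'c'; the stop label is included

print(by_position.tolist())  # [10, 20]
print(by_label.tolist())     # [10, 20, 30]

# A boolean Series works directly with loc; iloc needs the underlying array
mask = s > 15
print(s.loc[mask].tolist())              # [20, 30, 40]
print(s.iloc[mask.to_numpy()].tolist())  # [20, 30, 40]
```

With the default 0, 1, 2, ... index the labels and the positions coincide, which is why `data.loc[0:5]` returns six rows while `data.iloc[0:5]` returns five.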

*Exercise*: Select the second and third entries using `loc`.



In [None]:
data.loc [1:2, ['SNo']]


Unnamed: 0,SNo
1,1
2,2


In [None]:
data.loc[2:3]['SNo']


2    2
3    3
Name: SNo, dtype: int64

In [None]:
# What does the following statement do?
data.loc[0:5,:]

Unnamed: 0,SNo,Date,Country,RegionCode,RegionName,Latitude,Longitude,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement,CurrentPositiveCases,NewPositiveCases,Recovered,Deaths,TotalPositiveCases,TestsPerformed
0,0,2020-02-24T18:00:00,ITA,13,Abruzzo,42.351222,13.398438,0,0,0,0,0,0,0,0,0,
1,1,2020-02-24T18:00:00,ITA,17,Basilicata,40.639471,15.805148,0,0,0,0,0,0,0,0,0,
2,2,2020-02-24T18:00:00,ITA,18,Calabria,38.905976,16.594402,0,0,0,0,0,0,0,0,0,
3,3,2020-02-24T18:00:00,ITA,15,Campania,40.839566,14.25085,0,0,0,0,0,0,0,0,0,
4,4,2020-02-24T18:00:00,ITA,8,Emilia-Romagna,44.494367,11.341721,10,2,12,6,18,18,0,0,18,
5,5,2020-02-24T18:00:00,ITA,6,Friuli Venezia Giulia,45.649435,13.768136,0,0,0,0,0,0,0,0,0,


In [None]:
data.index

RangeIndex(start=0, stop=6027, step=1)

In [None]:
data.loc[:5,"HospitalizedPatients"]

0     0
1     0
2     0
3     0
4    10
5     0
Name: HospitalizedPatients, dtype: int64

In [None]:
data.loc[:5,["HospitalizedPatients","IntensiveCarePatients", "TotalHospitalizedPatients",	"HomeConfinement"]]

Unnamed: 0,HospitalizedPatients,IntensiveCarePatients,TotalHospitalizedPatients,HomeConfinement
0,0,0,0,0
1,0,0,0,0
2,0,0,0,0
3,0,0,0,0
4,10,2,12,6
5,0,0,0,0


In [None]:
data["IntensiveCarePatients"]

0         0
1         0
2         0
3         0
4         2
       ... 
6022    213
6023    252
6024     60
6025      8
6026    308
Name: IntensiveCarePatients, Length: 6027, dtype: int64

In [None]:
data[["TotalHospitalizedPatients",	"HomeConfinement"]]

Unnamed: 0,TotalHospitalizedPatients,HomeConfinement
0,0,0
1,0,0
2,0,0
3,0,0
4,12,6
...,...,...
6022,1580,38166
6023,1612,27587
6024,392,5673
6025,110,877


You often need to filter data according to certain conditions.

In [None]:
# Create a DataFrame
data = pd.DataFrame({'Brand' : ['Maruti', 'Hyundai', 'Tata',
                                'Mahindra', 'Maruti', 'Hyundai',
                                'Renault', 'Tata', 'Maruti'],
                     'Year' : [2012, 2014, 2011, 2015, 2012, 
                               2016, 2014, 2018, 2019],
                     'Kms Driven' : [50000, 30000, 60000, 
                                     25000, 10000, 46000, 
                                     31000, 15000, 12000],
                     'City' : ['Gurgaon', 'Delhi', 'Mumbai', 
                               'Delhi', 'Mumbai', 'Delhi', 
                               'Mumbai','Chennai', 'Ghaziabad'],
                     'Mileage' :  [28, 27, 25, 26, 28, 
                                   29, 24, 21, 24]})
   
# display the DataFrame
display(data)

Unnamed: 0,Brand,Year,Kms Driven,City,Mileage
0,Maruti,2012,50000,Gurgaon,28
1,Hyundai,2014,30000,Delhi,27
2,Tata,2011,60000,Mumbai,25
3,Mahindra,2015,25000,Delhi,26
4,Maruti,2012,10000,Mumbai,28
5,Hyundai,2016,46000,Delhi,29
6,Renault,2014,31000,Mumbai,24
7,Tata,2018,15000,Chennai,21
8,Maruti,2019,12000,Ghaziabad,24


In [None]:
# select the 'Maruti' cars with Mileage > 25
display (data.loc[(data.Brand == 'Maruti') & (data.Mileage > 25)])

Unnamed: 0,Brand,Year,Kms Driven,City,Mileage
0,Maruti,2012,50000,Gurgaon,28
4,Maruti,2012,10000,Mumbai,28
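Conditions can also be combined with `|` (or) and negated with `~` (not). The sketch below uses a reduced, made-up version of the cars table. Each comparison is wrapped in parentheses because `&`, `|` and `~` bind more tightly than `==` and `<`.

```python
import pandas as pd

cars = pd.DataFrame({'Brand': ['Maruti', 'Hyundai', 'Tata', 'Maruti'],
                     'Year': [2012, 2014, 2011, 2019],
                     'Mileage': [28, 27, 25, 24]})

# OR: Maruti cars, or any car from before 2012
either = cars.loc[(cars.Brand == 'Maruti') | (cars.Year < 2012)]

# NOT: everything except Maruti
not_maruti = cars.loc[~(cars.Brand == 'Maruti')]

print(either.index.tolist())      # [0, 2, 3]
print(not_maruti.index.tolist())  # [1, 2]
```

The Python keywords `and`, `or` and `not` do not work element by element on a Series, so the symbolic operators are required here.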


In [None]:
# Set Mileage to 55 for the rows where Year < 2015
data.loc [(data.Year < 2015), ['Mileage']] = 55
display (data)

Unnamed: 0,Brand,Year,Kms Driven,City,Mileage
0,Maruti,2012,50000,Gurgaon,55
1,Hyundai,2014,30000,Delhi,55
2,Tata,2011,60000,Mumbai,55
3,Mahindra,2015,25000,Delhi,26
4,Maruti,2012,10000,Mumbai,55
5,Hyundai,2016,46000,Delhi,29
6,Renault,2014,31000,Mumbai,55
7,Tata,2018,15000,Chennai,21
8,Maruti,2019,12000,Ghaziabad,24


pandas has some built-in conditional selectors, such as `isin`, `isnull`, and `notnull`.

In [None]:
data.loc [data.City.isin(['Delhi', 'Mumbai'])]

Unnamed: 0,Brand,Year,Kms Driven,City,Mileage
1,Hyundai,2014,30000,Delhi,55
2,Tata,2011,60000,Mumbai,55
3,Mahindra,2015,25000,Delhi,26
4,Maruti,2012,10000,Mumbai,55
5,Hyundai,2016,46000,Delhi,29
6,Renault,2014,31000,Mumbai,55
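`isnull` and `notnull` follow the same pattern as `isin`: each returns a boolean Series that can be passed to `loc`. A minimal sketch with a made-up table that has one missing city (names and values are assumptions for illustration):

```python
import pandas as pd

cars = pd.DataFrame({'Brand': ['Maruti', 'Hyundai', 'Tata'],
                     'City': ['Delhi', None, 'Mumbai']})

missing_city = cars.loc[cars.City.isnull()]   # rows where City is missing
known_city = cars.loc[cars.City.notnull()]    # rows where City has a value

print(missing_city.Brand.tolist())  # ['Hyundai']
print(known_city.Brand.tolist())    # ['Maruti', 'Tata']
```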


##Exercises

Replace each placeholder (`---` or `____________`) with one or more statements that solve the exercise.

1. Create a DataFrame from the dataset [winemag-data-130k-v2.csv](https://www.kaggle.com/zynicide/wine-reviews)

In [None]:
from google.colab import files

uploaded = files.upload()

In [None]:
uploaded = files.upload ()
reviews = ---

In [None]:
reviews.head ()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


2. Select the `description` column of `reviews` and assign the result to the variable `desc`.

In [None]:
desc = ____________

3. Select the first value of the `description` column of `reviews` and assign it to the variable `primeraDescripción`.

In [None]:
primeraDescripción = ____________

4. Select the first row of `reviews` and assign it to the variable `primerRenglón`.

In [None]:
primerRenglón = ____________

5. Select the first 10 values of the `description` column of `reviews` and assign them to the variable `primerasDescripciones`.

In [None]:
primerasDescripciones = ____________

6. Select the records with `index labels 1, 2, 3, 5 and 8` and assign the result to the variable `algunasRevisiones`.

In [None]:
algunasRevisiones = ____________

7. Create a variable `df` containing the `columns country, province, region_1 and region_2` and the records with `index labels 0, 1, 10 and 100`.

In [None]:
df = ____________

8. Create a variable `df` containing the columns `country` and `variety` of the first 100 records.

In [None]:
df = ____________

9. Create a DataFrame `italianWines` containing the `reviews` of wines made in `Italy`.

In [None]:
italianWines = ____________

10. Create a DataFrame `topWinesOceania` with all the reviews of wines from `Australia` and `New Zealand` that have at least `95 points`.

In [None]:
topWinesOceania = ____________

#Functions and Maps

Sometimes you need to apply operations to the data.

1. Create a DataFrame from the dataset [winemag-data-130k-v2.csv](https://www.kaggle.com/zynicide/wine-reviews)

In [None]:
import pandas as pd
from google.colab import files
uploaded = files.upload ()
import io
reviews = pd.read_csv(io.BytesIO(uploaded['winemag-data-130k-v2.csv']))
reviews

Saving winemag-data-130k-v2.csv to winemag-data-130k-v2 (1).csv


Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


In [None]:
# `uploaded` is keyed by file name, not by path; to read from the session storage, pass the path to read_csv
reviews = pd.read_csv('/content/winemag-data-130k-v2.csv')
reviews

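##Summary functions

pandas offers several functions that give an overview of the data. One of them is `describe()`, which generates a high-level summary of a column. Its output depends on the data type: numeric columns get the count, mean, standard deviation and quartiles, while text columns get the count, the number of unique values, the most frequent value and its frequency.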

In [None]:
reviews.points.describe ()

count    129971.000000
mean         88.447138
std           3.039730
min          80.000000
25%          86.000000
50%          88.000000
75%          91.000000
max         100.000000
Name: points, dtype: float64

In [None]:
reviews.taster_name.describe ()

count         103727
unique            19
top       Roger Voss
freq           25514
Name: taster_name, dtype: object
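The two kinds of summary can be compared on a table small enough to check by hand. The sketch below uses a made-up miniature of the reviews table (three rows, values chosen for illustration only).

```python
import pandas as pd

# Hypothetical miniature of the reviews table
mini = pd.DataFrame({'points': [87, 90, 93],
                     'taster_name': ['Roger Voss', 'Paul Gregutt', 'Roger Voss']})

numeric_summary = mini.points.describe()     # count, mean, std, min, quartiles, max
text_summary = mini.taster_name.describe()   # count, unique, top, freq

print(numeric_summary['mean'])  # 90.0
print(text_summary['top'])      # Roger Voss
```

Each summary is itself a Series, so a single statistic can be read by its label.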

If you need a simple summary statistic for a column or a Series, pandas almost certainly has a function for it. Some of these methods are:


* pandas.DataFrame.corr: correlation
* pandas.DataFrame.count: number of non-null values
* pandas.DataFrame.max: largest value
* pandas.DataFrame.min: smallest value
* pandas.DataFrame.median: median
* pandas.DataFrame.std: standard deviation
* pandas.DataFrame.mean: mean



In [None]:
reviews.points.mean ()

88.44713820775404

In [None]:
# Unique values
reviews.taster_name.unique ()

array(['Kerin O’Keefe', 'Roger Voss', 'Paul Gregutt',
       'Alexander Peartree', 'Michael Schachner', 'Anna Lee C. Iijima',
       'Virginie Boone', 'Matt Kettmann', nan, 'Sean P. Sullivan',
       'Jim Gordon', 'Joe Czerwinski', 'Anne Krebiehl\xa0MW',
       'Lauren Buzzeo', 'Mike DeSimone', 'Jeff Jenssen',
       'Susan Kostrzewa', 'Carrie Dykes', 'Fiona Adams',
       'Christina Pickard'], dtype=object)

In [None]:
# Unique values and how often each one occurs
reviews.taster_name.value_counts ()

Roger Voss            25514
Michael Schachner     15134
Kerin O’Keefe         10776
Virginie Boone         9537
Paul Gregutt           9532
Matt Kettmann          6332
Joe Czerwinski         5147
Sean P. Sullivan       4966
Anna Lee C. Iijima     4415
Jim Gordon             4177
Anne Krebiehl MW       3685
Lauren Buzzeo          1835
Susan Kostrzewa        1085
Mike DeSimone           514
Jeff Jenssen            491
Alexander Peartree      415
Carrie Dykes            139
Fiona Adams              27
Christina Pickard         6
Name: taster_name, dtype: int64

In [None]:
# For all the (numeric) columns
reviews.describe()

Unnamed: 0.1,Unnamed: 0,points,price
count,129971.0,129971.0,120975.0
mean,64985.0,88.447138,35.363389
std,37519.540256,3.03973,41.022218
min,0.0,80.0,4.0
25%,32492.5,86.0,17.0
50%,64985.0,88.0,25.0
75%,97477.5,91.0,42.0
max,129970.0,100.0,3300.0


##Maps

A map is a function that takes one set of values and "maps" them to another set of values. In data science we often need to create new representations of existing data or transform data into a new format. <p>
Two mapping methods:
> * map (), which takes a function and returns a new Series where every value has been transformed
> * apply (), which transforms a whole DataFrame by calling a function on each row or column

Both `map ()` and `apply ()` return new, transformed `Series` and `DataFrames`; the originals are left unchanged.
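
The following sketch illustrates both methods on a small made-up DataFrame (the `sample` frame and the names `doubled` and `ratio` are assumptions used only for this illustration). It also checks that the original column is untouched afterwards.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({'points': [80, 90, 100], 'price': [10.0, 20.0, 30.0]})

# map() works element by element on a Series and returns a new Series
doubled = sample.points.map(lambda p: p * 2)

# apply() with axis='columns' calls the function once per row
ratio = sample.apply(lambda row: row.points / row.price, axis='columns')

print(doubled.tolist())
print(ratio.tolist())
print(sample.points.tolist())  # the original column is unchanged
```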

Lambda functions follow the same syntax as any other Python function, except that they have no name when they are defined and they consist of a single expression written on one line.


```
lambda argumento(s): expresión
```

Example:


```
# regular function
def nombre (x):
  return x+x

# lambda function
lambda x: x+x
```

Use them when the operation has simple logic (a single expression) and is easy to understand, or when the function is used only once.<p>
Avoid them when you have nested conditionals or functions that need more than one line of code.



In [None]:
#simple example
df = pd.DataFrame ( {
    'nombre': ['Luke', 'Lorelai', 'Rory', 'Logan'],
    'papel': ['papá', 'mamá', 'hija', 'novio'],
    'añoNac': [1976, 1984, 2000, 1999],
})
df

Unnamed: 0,nombre,papel,añoNac
0,Luke,papá,1976
1,Lorelai,mamá,1984
2,Rory,hija,2000
3,Logan,novio,1999


In [None]:
df['edad'] = df['añoNac'].apply (lambda x: 2021-x)
df

Unnamed: 0,nombre,papel,añoNac,edad
0,Luke,papá,1976,45
1,Lorelai,mamá,1984,37
2,Rory,hija,2000,21
3,Logan,novio,1999,22


Using map () to re-center the wine scores around their mean:

In [None]:
reviewsPointsMean = reviews.points.mean ()
reviewsPointsMean

88.44713820775404

In [None]:
reviews.points.map (lambda p: p-reviewsPointsMean)

0        -1.447138
1        -1.447138
2        -1.447138
3        -1.447138
4        -1.447138
            ...   
129966    1.552862
129967    1.552862
129968    1.552862
129969    1.552862
129970    1.552862
Name: points, Length: 129971, dtype: float64

apply () is the equivalent of map () when you want to transform a whole DataFrame by calling a custom function on each row. <p>


In [None]:
def remeanPoints (ren):
  ren.points = ren.points-reviewsPointsMean
  return ren

reviews.apply (remeanPoints, axis = 'columns')

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,-1.447138,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,-1.447138,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,-1.447138,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,-1.447138,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,-1.447138,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,1.552862,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,1.552862,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,1.552862,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,1.552862,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


In [None]:
#The original DataFrame is unchanged
reviews.head (1)

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia


Another way to re-center the points around their mean:

In [None]:
reviewsPointsMean = reviews.points.mean ()
reviews.points - reviewsPointsMean

Pandas also knows what to do when an operation is performed between Series of equal length. For example, an easy way to combine the country and the region is:

In [None]:
reviews.country + ' - ' + reviews.region_1

0                     Italy - Etna
1                              NaN
2           US - Willamette Valley
3         US - Lake Michigan Shore
4           US - Willamette Valley
                    ...           
129966                         NaN
129967                 US - Oregon
129968             France - Alsace
129969             France - Alsace
129970             France - Alsace
Length: 129971, dtype: object

These operators are faster than map () or apply (). All the standard operators (>, <, ==, etc.) work this way, but map () and apply () are more flexible and allow more advanced operations, such as applying conditional logic.
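
A minimal sketch of the operator-based style, on made-up data (the `sample` frame is an assumption for illustration). It shows that subtracting a scalar with an operator gives the same values as `map ()`, that comparison operators return a boolean Series, and that a missing value in either operand of a string concatenation produces NaN unless it is filled first.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'country': ['Italy', 'US', 'France'],
    'region_1': ['Etna', None, 'Alsace'],
    'points': [87, 90, 95],
})

mean_points = sample.points.mean()

# Same computation written with an operator and with map()
centered_vec = sample.points - mean_points
centered_map = sample.points.map(lambda p: p - mean_points)
print((centered_vec - centered_map).abs().max())  # largest difference between both results

# Comparison operators also work element by element
high = sample.points >= 90
print(high.tolist())

# Filling the missing region first avoids a NaN in the combined label
label = sample.country + ' - ' + sample.region_1.fillna('Unknown')
print(label.tolist())
```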

# **Exercises**

Replace ------ with one or more statements to solve each exercise.

1. What is the median of the points column in the reviews DataFrame?

In [None]:
median_points = ---
median_points

88.0

2. Which countries are represented in the dataset? (The answer must not contain duplicates.)

In [None]:
países =---
países

array(['Italy', 'Portugal', 'US', 'Spain', 'France', 'Germany',
       'Argentina', 'Chile', 'Australia', 'Austria', 'South Africa',
       'New Zealand', 'Israel', 'Hungary', 'Greece', 'Romania', 'Mexico',
       'Canada', nan, 'Turkey', 'Czech Republic', 'Slovenia',
       'Luxembourg', 'Croatia', 'Georgia', 'Uruguay', 'England',
       'Lebanon', 'Serbia', 'Brazil', 'Moldova', 'Morocco', 'Peru',
       'India', 'Bulgaria', 'Cyprus', 'Armenia', 'Switzerland',
       'Bosnia and Herzegovina', 'Ukraine', 'Slovakia', 'Macedonia',
       'China', 'Egypt'], dtype=object)

3. How often does each country appear in the dataset? Create a Series that maps each country to its number of reviews.

In [None]:
revPorPaís = ---
revPorPaís

US                        54504
France                    22093
Italy                     19540
Spain                      6645
Portugal                   5691
Chile                      4472
Argentina                  3800
Austria                    3345
Australia                  2329
Germany                    2165
New Zealand                1419
South Africa               1401
Israel                      505
Greece                      466
Canada                      257
Hungary                     146
Bulgaria                    141
Romania                     120
Uruguay                     109
Turkey                       90
Slovenia                     87
Georgia                      86
England                      74
Croatia                      73
Mexico                       70
Moldova                      59
Brazil                       52
Lebanon                      35
Morocco                      28
Peru                         16
Ukraine                      14
Czech Re

4. Create a variable centPrecio containing a version of the price column with the mean price subtracted.

In [None]:
centPrecio =---
centPrecio

0               NaN
1        -20.363389
2        -21.363389
3        -22.363389
4         29.636611
            ...    
129966    -7.363389
129967    39.636611
129968    -5.363389
129969    -3.363389
129970   -14.363389
Name: price, Length: 129971, dtype: float64

In [None]:
centPrecio =---
reviews.price.map (lambda p: p-centPrecio)

0               NaN
1        -20.363389
2        -21.363389
3        -22.363389
4         29.636611
            ...    
129966    -7.363389
129967    39.636611
129968    -5.363389
129969    -3.363389
129970   -14.363389
Name: price, Length: 129971, dtype: float64

In [None]:
reviews

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


5. Which wine has the best points-to-price ratio in the dataset?<p>
Hint: you can use the idxmax method.

In [None]:
#just to see the column names
reviews.head (3)

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm


In [None]:
i = ---).idxmax ()
mejorV = ---
mejorV

'Bandit NV Merlot (California)'

6. Count how many times the words 'tropical' and 'fruity' each appear in the description column of the dataset (that is, in how many descriptions each word occurs).

In [None]:
t = reviews.description.map (---in d).---
f = reviews.description.map (lambda d : --- in d).sum ()
cuentas = pd.Series ([t,f], index=['tropical', 'fruity'])
cuentas

tropical    3607
fruity      9090
dtype: int64

7. Replace the points system with a simpler star rating. A score of 95 or more gets 3 stars, a score of at least 85 but below 95 gets 2 stars, and everything else gets 1 star. In addition, any wine from France gets 3 stars.

In [None]:
def estrellas (r):
  # The dataset stores country names in English, so compare against 'France'
  if (r.country == 'France') or (r.points >= 95): return 3
  if r.points >= 85: return 2
  return 1

estrellasRatings = reviews.apply (estrellas, axis = 'columns')
estrellasRatings

0         2
1         2
2         2
3         2
4         2
         ..
129966    2
129967    2
129968    3
129969    3
129970    3
Length: 129971, dtype: int64


#Grouping and sorting

A `groupby` operation involves one or more of the following steps on the original object:


*   Splitting the object into groups
*   Applying a function to each group
*   Combining the results

Once the data has been split into groups, some functionality is usually applied to each one, such as:


*   Aggregation: computing a statistical summary
*   Transformation: performing some operation on the group
*   Filtration: discarding data according to some condition
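
The three kinds of operation listed above can be sketched as follows. The `sample` frame is made-up data used only for illustration; `mean ()`, `transform ()` and `filter ()` are the pandas GroupBy methods chosen here to stand for aggregation, transformation and filtration.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'country': ['US', 'US', 'France', 'France', 'Chile'],
    'price': [10.0, 30.0, 20.0, 40.0, 15.0],
})

grouped = sample.groupby('country')

# Aggregation: one summary value per group
mean_price = grouped.price.mean()

# Transformation: the result has the same length as the original data
centered = grouped.price.transform(lambda s: s - s.mean())

# Filtration: keep only the groups that have more than one row
multi = grouped.filter(lambda g: len(g) > 1)

print(mean_price)
print(centered.tolist())
print(multi)
```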






In [None]:
import pandas as pd
from google.colab import files
uploaded = files.upload ()
import io
reviews = pd.read_csv(io.BytesIO(uploaded['winemag-data-130k-v2.csv']))
reviews

Saving winemag-data-130k-v2.csv to winemag-data-130k-v2.csv


Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


Example: `groupby ()` creates groups of rows that share the same value in points. For each group, the points column is selected and its entries are counted. The `value_counts ()` function is a shortcut for this same operation.

In [None]:
display (reviews.groupby ('points').points.count ())

points
80       397
81       692
82      1836
83      3025
84      6480
85      9530
86     12600
87     16933
88     17207
89     12226
90     15410
91     11359
92      9613
93      6489
94      3758
95      1535
96       523
97       229
98        77
99        33
100       19
Name: points, dtype: int64
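
To make the relation between `groupby ()` with `count ()` and `value_counts ()` concrete, here is a small sketch on made-up scores (the `sample` frame is an assumption). `value_counts ()` orders by frequency by default, so `sort_index ()` is added to get the same order as the grouped result.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({'points': [87, 90, 87, 88, 90, 87]})

by_group = sample.groupby('points').points.count()
by_counts = sample.points.value_counts().sort_index()

print(by_group.tolist())
print(by_counts.tolist())
```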

Any summary function can be used. For example, to get the cheapest wine in each points category:

In [None]:
display (reviews.groupby ('points').price.min ())

points
80      5.0
81      5.0
82      4.0
83      4.0
84      4.0
85      4.0
86      4.0
87      5.0
88      6.0
89      7.0
90      8.0
91      7.0
92     11.0
93     12.0
94     13.0
95     20.0
96     20.0
97     35.0
98     50.0
99     44.0
100    80.0
Name: price, dtype: float64

Each group can be thought of as a slice of the DataFrame that contains only the rows with matching values. That slice is accessible through the `apply ()` method, so the data can be manipulated in whatever way is needed. For example, selecting the title of the first wine reviewed from each winery (`'winery'`):

In [None]:
display (reviews.groupby ('winery').apply (lambda df : df.title.iloc[0]))

winery
1+1=3                                     1+1=3 NV Rosé Sparkling (Cava)
10 Knots                            10 Knots 2010 Viognier (Paso Robles)
100 Percent Wine              100 Percent Wine 2015 Moscato (California)
1000 Stories           1000 Stories 2013 Bourbon Barrel Aged Zinfande...
1070 Green                  1070 Green 2011 Sauvignon Blanc (Rutherford)
                                             ...                        
Órale                       Órale 2011 Cabronita Red (Santa Ynez Valley)
Öko                    Öko 2013 Made With Organically Grown Grapes Ma...
Ökonomierat Rebholz    Ökonomierat Rebholz 2007 Von Rotliegenden Spät...
àMaurice               àMaurice 2013 Fred Estate Syrah (Walla Walla V...
Štoka                                    Štoka 2009 Izbrani Teran (Kras)
Length: 16757, dtype: object
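
A possible alternative for this particular case, sketched on made-up data: selecting the column before grouping and using the built-in `first ()` method. This is offered as an assumption about a convenient equivalent, not as the method used in this notebook; note that `first ()` returns the first non-null entry of each group, so the two approaches can differ when titles are missing.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'winery': ['B', 'A', 'B', 'A'],
    'title': ['B 2010 Red', 'A 2012 White', 'B 2011 Rosé', 'A 2013 Red'],
})

# First title per winery using apply() on the grouped column
first_by_apply = sample.groupby('winery').title.apply(lambda s: s.iloc[0])

# Same result here using the built-in first()
first_builtin = sample.groupby('winery').title.first()

print(first_by_apply)
print(first_builtin)
```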

You can group by more than one column. For example, the best wine by country and province:

In [None]:
reviews.groupby(['country', 'province']).apply (lambda df : df.loc[df.points.idxmax()])

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
country,province,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Argentina,Mendoza Province,82754,Argentina,"If the color doesn't tell the full story, the ...",Nicasia Vineyard,97,120.0,Mendoza Province,Mendoza,,Michael Schachner,@wineschach,Bodega Catena Zapata 2006 Nicasia Vineyard Mal...,Malbec,Bodega Catena Zapata
Argentina,Other,78303,Argentina,"Take note, this could be the best wine Colomé ...",Reserva,95,90.0,Other,Salta,,Michael Schachner,@wineschach,Colomé 2010 Reserva Malbec (Salta),Malbec,Colomé
Armenia,Armenia,66146,Armenia,"Deep salmon in color, this wine offers a bouqu...",Estate Bottled,88,15.0,Armenia,,,Mike DeSimone,@worldwineguys,Van Ardi 2015 Estate Bottled Rosé (Armenia),Rosé,Van Ardi
Australia,Australia Other,37882,Australia,Writes the book on how to make a wine filled w...,Sarah's Blend,93,15.0,Australia Other,South Eastern Australia,,,,Marquis Philips 2000 Sarah's Blend Red (South ...,Red Blend,Marquis Philips
Australia,New South Wales,85337,Australia,De Bortoli's Noble One is as good as ever in 2...,Noble One Bortytis,94,32.0,New South Wales,New South Wales,,Joe Czerwinski,@JoeCz,De Bortoli 2007 Noble One Bortytis Semillon (N...,Sémillon,De Bortoli
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,Juanico,9133,Uruguay,This mature Bordeaux-style blend is earthy on ...,Preludio Barrel Select Lote N 77,90,45.0,Juanico,,,Michael Schachner,@wineschach,Familia Deicas 2004 Preludio Barrel Select Lot...,Red Blend,Familia Deicas
Uruguay,Montevideo,15750,Uruguay,"A rich, heady bouquet offers aromas of blackbe...",Monte Vide Eu Tannat-Merlot-Tempranillo,91,60.0,Montevideo,,,Michael Schachner,@wineschach,Bouza 2015 Monte Vide Eu Tannat-Merlot-Tempran...,Red Blend,Bouza
Uruguay,Progreso,93103,Uruguay,"Rusty in color but deep and complex in nature,...",Etxe Oneko Fortified Sweet Red,90,46.0,Progreso,,,Michael Schachner,@wineschach,Pisano 2007 Etxe Oneko Fortified Sweet Red Tan...,Tannat,Pisano
Uruguay,San Jose,39898,Uruguay,"Baked, sweet, heavy aromas turn earthy with ti...",El Preciado Gran Reserva,87,50.0,San Jose,,,Michael Schachner,@wineschach,Castillo Viejo 2005 El Preciado Gran Reserva R...,Red Blend,Castillo Viejo


agg () runs several different functions on the grouped data at the same time. For example, to generate a simple statistical summary of the dataset:

In [None]:
reviews.groupby (['country']).price.agg ([len, min, max])

Unnamed: 0_level_0,len,min,max
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Argentina,3800.0,4.0,230.0
Armenia,2.0,14.0,15.0
Australia,2329.0,5.0,850.0
Austria,3345.0,7.0,1100.0
Bosnia and Herzegovina,2.0,12.0,13.0
Brazil,52.0,10.0,60.0
Bulgaria,141.0,8.0,100.0
Canada,257.0,12.0,120.0
Chile,4472.0,5.0,400.0
China,1.0,18.0,18.0
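
One detail worth checking in a summary like this is how missing prices are counted. The sketch below uses made-up data (the `sample` frame and the column names `n_priced` and `cheapest` are assumptions) and passes function names as strings: `'size'` counts every row in the group, while `'count'` only counts non-null prices. It also shows named aggregation as one way to choose the output column names.

```python
import pandas as pd

# Hypothetical sample data with some missing prices
sample = pd.DataFrame({
    'country': ['US', 'US', 'Chile', 'Chile', 'Chile'],
    'price': [10.0, None, 5.0, 8.0, None],
})

# Several aggregations at once, given by name
summary = sample.groupby('country').price.agg(['size', 'count', 'min', 'max'])
print(summary)

# Named aggregation: output_name=(column, function)
named = sample.groupby('country').agg(
    n_priced=('price', 'count'),
    cheapest=('price', 'min'),
)
print(named)
```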


## Multi-indexes

In [None]:
países = reviews.groupby(['country', 'province']).description.agg([len])
países

Unnamed: 0_level_0,Unnamed: 1_level_0,len
country,province,Unnamed: 2_level_1
Argentina,Mendoza Province,3264
Argentina,Other,536
Armenia,Armenia,2
Australia,Australia Other,245
Australia,New South Wales,85
...,...,...
Uruguay,Juanico,12
Uruguay,Montevideo,11
Uruguay,Progreso,11
Uruguay,San Jose,3


In [None]:
mi = países.index
type(mi)

pandas.core.indexes.multi.MultiIndex
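
A multi-index has several levels, and values can be retrieved by giving one label per level. The following is a minimal sketch on made-up data (the `sample` frame is an assumption; the `'count'` aggregation is used here in place of `len`).

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'country': ['Argentina', 'Argentina', 'Uruguay', 'Argentina'],
    'province': ['Mendoza Province', 'Other', 'Montevideo', 'Mendoza Province'],
    'description': ['a', 'b', 'c', 'd'],
})

counts = sample.groupby(['country', 'province']).description.agg(['count'])

print(counts.index.nlevels)                         # number of index levels
print(counts.loc[('Argentina', 'Other'), 'count'])  # one label per level
print(counts.loc['Argentina'])                      # all provinces of one country
```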

To convert a multi-index back to a regular index:

In [None]:
países.reset_index ()

Unnamed: 0,country,province,len
0,Argentina,Mendoza Province,3264
1,Argentina,Other,536
2,Armenia,Armenia,2
3,Australia,Australia Other,245
4,Australia,New South Wales,85
...,...,...,...
420,Uruguay,Juanico,12
421,Uruguay,Montevideo,11
422,Uruguay,Progreso,11
423,Uruguay,San Jose,3


In [None]:
type (países)

pandas.core.frame.DataFrame

#Sorting

Grouped results are returned in index order, not in value order. To sort the rows by their data instead, use the `sort_values ()` method.



In [None]:
países = países.reset_index ()
países.sort_values (by = 'len')

Unnamed: 0,country,province,len
179,Greece,Muscat of Kefallonian,1
192,Greece,Sterea Ellada,1
194,Greece,Thraki,1
354,South Africa,Paardeberg,1
40,Brazil,Serra do Sudeste,1
...,...,...,...
409,US,Oregon,5373
227,Italy,Tuscany,5897
118,France,Bordeaux,5941
415,US,Washington,8639


In [None]:
#for descending order
países.sort_values (by = 'len', ascending = False)

`sort_index ()` takes the same arguments and uses the same default order.

In [None]:
países.sort_index ()

Unnamed: 0,country,province,len
0,Argentina,Mendoza Province,3264
1,Argentina,Other,536
2,Armenia,Armenia,2
3,Australia,Australia Other,245
4,Australia,New South Wales,85
...,...,...,...
420,Uruguay,Juanico,12
421,Uruguay,Montevideo,11
422,Uruguay,Progreso,11
423,Uruguay,San Jose,3


You can also sort by more than one column at a time:

In [None]:
países.sort_values(by = ['country', 'len'])

Unnamed: 0,country,province,len
1,Argentina,Other,536
0,Argentina,Mendoza Province,3264
2,Armenia,Armenia,2
6,Australia,Tasmania,42
4,Australia,New South Wales,85
...,...,...,...
421,Uruguay,Montevideo,11
422,Uruguay,Progreso,11
420,Uruguay,Juanico,12
424,Uruguay,Uruguay,24
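
When sorting by several columns, `ascending` can also be given as a list with one entry per column. A minimal sketch on made-up values (the `sample` frame and its numbers are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'country': ['US', 'Italy', 'US', 'Italy'],
    'province': ['P1', 'P2', 'P3', 'P4'],
    'len': [50, 70, 90, 10],
})

# Countries in ascending order; within each country, largest count first
ordered = sample.sort_values(by=['country', 'len'], ascending=[True, False])
print(ordered)
```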


#Data types

The data type of a column in a DataFrame or a Series is known as its **dtype**. It indicates how pandas stores the data internally; for example, int64 is a 64-bit integer. Strings do not have a dtype of their own here and are stored with the `object` dtype.

In [None]:
reviews.price.dtype

dtype('float64')

In [None]:
reviews.dtypes

Unnamed: 0                 int64
country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
dtype: object

In [None]:
#Convert one type to another
reviews.points.astype ('float64')

0         87.0
1         87.0
2         87.0
3         87.0
4         87.0
          ... 
129966    90.0
129967    90.0
129968    90.0
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64

In [None]:
#The index has its own dtype too
reviews.index.dtype

dtype('int64')

#Missing values

Missing entries are given the value NaN ("Not a Number"). For technical reasons, NaN values are always of dtype `float64`. To select them (or everything except them), use `pd.isnull ()` or `pd.notnull ()`.

In [None]:
reviews[pd.isnull (reviews.country)]

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
913,913,,"Amber in color, this wine has aromas of peach ...",Asureti Valley,87,30.0,,,,Mike DeSimone,@worldwineguys,Gotsa Family Wines 2014 Asureti Valley Chinuri,Chinuri,Gotsa Family Wines
3131,3131,,"Soft, fruity and juicy, this is a pleasant, si...",Partager,83,,,,,Roger Voss,@vossroger,Barton & Guestier NV Partager Red,Red Blend,Barton & Guestier
4243,4243,,"Violet-red in color, this semisweet wine has a...",Red Naturally Semi-Sweet,88,18.0,,,,Mike DeSimone,@worldwineguys,Kakhetia Traditional Winemaking 2012 Red Natur...,Ojaleshi,Kakhetia Traditional Winemaking
9509,9509,,This mouthwatering blend starts with a nose of...,Theopetra Malagouzia-Assyrtiko,92,28.0,,,,Susan Kostrzewa,@suskostrzewa,Tsililis 2015 Theopetra Malagouzia-Assyrtiko W...,White Blend,Tsililis
9750,9750,,This orange-style wine has a cloudy yellow-gol...,Orange Nikolaevo Vineyard,89,28.0,,,,Jeff Jenssen,@worldwineguys,Ross-idi 2015 Orange Nikolaevo Vineyard Chardo...,Chardonnay,Ross-idi
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124176,124176,,This Swiss red blend is composed of four varie...,Les Romaines,90,30.0,,,,Jeff Jenssen,@worldwineguys,Les Frères Dutruy 2014 Les Romaines Red,Red Blend,Les Frères Dutruy
129407,129407,,Dry spicy aromas of dusty plum and tomato add ...,Reserve,89,22.0,,,,Michael Schachner,@wineschach,El Capricho 2015 Reserve Cabernet Sauvignon,Cabernet Sauvignon,El Capricho
129408,129408,,El Capricho is one of Uruguay's more consisten...,Reserve,89,22.0,,,,Michael Schachner,@wineschach,El Capricho 2015 Reserve Tempranillo,Tempranillo,El Capricho
129590,129590,,"A blend of 60% Syrah, 30% Cabernet Sauvignon a...",Shah,90,30.0,,,,Mike DeSimone,@worldwineguys,Büyülübağ 2012 Shah Red,Red Blend,Büyülübağ
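
The link between NaN and `float64` can be seen in a small sketch (the two Series below are made-up examples): a numeric Series built from integers becomes `float64` as soon as one entry is missing.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])         # no missing values
with_gap = pd.Series([1, 2, None])  # one missing value

print(ints.dtype)
print(with_gap.dtype)

# Boolean masks for missing and present values
print(pd.isnull(with_gap).tolist())
print(with_gap[pd.notnull(with_gap)].tolist())
```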


Missing values often need to be replaced; `fillna ()` does this.

In [None]:
#replace NaN with Unknown
reviews.region_2.fillna ("Unknown")
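
Since the cell above shows no output, here is a minimal sketch of what `fillna ()` does, on a made-up Series (the `region` values are assumptions for illustration). As with the other methods in this notebook, a new Series is returned and the original keeps its gaps.

```python
import pandas as pd

# Hypothetical sample data with missing entries
region = pd.Series(['Willamette Valley', None, 'Oregon Other', None])

filled = region.fillna('Unknown')

print(filled.tolist())
print(region.isnull().sum())  # the original Series still has missing values
```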

In [None]:
#Replace one value with another
#Example: @kerinokeefe to @kerino
reviews.taster_twitter_handle.replace ("@kerinokeefe","@kerino")

0             @kerino
1          @vossroger
2         @paulgwine 
3                 NaN
4         @paulgwine 
             ...     
129966            NaN
129967    @paulgwine 
129968     @vossroger
129969     @vossroger
129970     @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object

#Renaming

rename () changes the names of index entries and/or columns.

In [None]:
reviews.rename(columns={'points': 'score'})

Unnamed: 0.1,Unnamed: 0,country,description,designation,score,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


In [None]:
#rename () can also change index values
reviews.rename(index = {0: 'primeraEntrada', 1: 'segundaEntrada'})

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
primeraEntrada,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
segundaEntrada,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


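To rename index values, `set_index ()` is usually more convenient. In addition, the row index and the column index can each have their own name, which `rename_axis ()` changes: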

In [None]:
reviews.rename_axis("wines", axis='rows').rename_axis('fields', axis='columns')

fields,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
wines,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
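
The three renaming tools mentioned in this section can be compared side by side in a short sketch. The `sample` frame is made-up data for illustration; each call returns a new DataFrame and leaves `sample` unchanged.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    'title': ['Wine A', 'Wine B'],
    'points': [87, 90],
})

# rename(): change a column label
renamed = sample.rename(columns={'points': 'score'})

# set_index(): use an existing column as the row index
by_title = sample.set_index('title')

# rename_axis(): give names to the row index and the column index
named_axes = sample.rename_axis('wines', axis='rows').rename_axis('fields', axis='columns')

print(list(renamed.columns))
print(list(by_title.index))
print(named_axes.index.name, named_axes.columns.name)
```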
