# Class 10: Multi-Indexes, Pivoting, Melt, Concatenation, and Combining Multiple Data Sources


## Class Objectives

- Understand how to combine different data sources using concatenation (`concat`) and `merge`.
- Go deeper into the use of pandas multi-indexes.
- Reshape data using `melt` and `pivot`.


## Today's Datasets

### Better Life Index

Once again, we will keep using the OECD Better Life Index data:


<img src="https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/oecd.png" alt="OECD Better life index"/>


http://www.oecdbetterlifeindex.org/

https://stats.oecd.org/index.aspx?DataSetCode=BLI

The index covers 11 topics considered essential for the well-being of the population. Each topic contains one or more indicators.

| Topic | Indicator (English) | Indicator (Spanish) | Unit | Description |
|---|---|---|---|---|
| Housing 🏠 | Dwellings without basic facilities | Vivienda sin Instalaciones Básicas | Percentage | Percentage of people living in dwellings without an indoor flushing toilet, latest available year |
|  | Housing expenditure | Gastos en Vivienda | Percentage | Share of housing costs in households' adjusted net income, latest available year |
|  | Rooms per person | Habitaciones por Persona | Ratio | Average number of rooms shared per person in a dwelling, latest available year |
| Income 💰 | Household net adjusted disposable income | Ingreso Familiar Disponible | US Dollar | Average amount of money a household earns per year, after taxes, latest available year |
|  | Household net wealth | Patrimonio Neto Familiar | US Dollar | Average total value of a household's financial assets (savings, stocks) minus its liabilities (loans), latest available year |
| Jobs ⚙️ | Labour market insecurity | Seguridad en el Empleo | Percentage | Expected loss of earnings when someone becomes unemployed, latest available year |
|  | Employment rate | Tasa de Empleo | Percentage | Percentage of people aged 15 to 64 currently in paid employment, latest available year |
|  | Long-term unemployment rate | Tasa de Desempleo a Largo Plazo | Percentage | Percentage of people aged 15 to 64 who are not working but have been actively looking for a job for more than a year, latest available year |
|  | Personal earnings | Ingresos Personales | US Dollar | Average annual earnings per full-time employee, latest available year |
| Community 🧑‍🤝‍🧑   | Quality of support network  | Calidad del Apoyo Social | Percentage | Percentage of people with friends or relatives they can rely on in case of need |
| Education 📚 | Educational attainment | Nivel de Educación | Percentage | Percentage of people aged 25 to 64 who have completed at least upper secondary education, latest available year |
|  | Student skills | Competencias de estudiantes en matemáticas, lectura y ciencias | Average score | Average performance of 15-year-old students according to PISA (Programme for International Student Assessment) |
|  | Years in education  | Años de Educación | Years | Average duration of formal education in which a five-year-old child can expect to enrol during their lifetime |
| Environment 🌳 | Air pollution | Contaminación del Aire | Micrograms per cubic meter | Average concentration of particulate matter (PM2.5) in cities with more than 100,000 inhabitants, measured in micrograms per cubic meter, latest available year |
|  | Water quality | Calidad del Agua | Percentage | Percentage of people who report being satisfied with the quality of local water |
| Civic Engagement 🗳️  | Stakeholder engagement for developing regulations | Participación de los interesados en la elaboración de regulaciones | Average score | Level of government transparency when drafting regulations, latest available year |
|  | Voter turnout | Participación electoral | Percentage | Percentage of registered voters who voted in recent elections, latest available year |
| Health ⚕️ | Life expectancy | Esperanza de vida | Years | Average number of years a person can expect to live, latest available year |
|  | Self-reported health | Salud según informan las personas | Percentage | Percentage of people who report their health as "good or very good", latest available year |
| Life Satisfaction ✨ | Life satisfaction | Satisfacción ante la vida | Average score | Average self-evaluation of life satisfaction, on a scale from 0 to 10 |
| Safety 🌃 | Feeling safe walking alone at night | Sentimiento de seguridad al caminar solos por la noche | Percentage | Percentage of people who report feeling safe when walking alone at night  |
|  | Homicide rate | Tasa de homicidios | Ratio | Average number of reported homicides per 100,000 people, latest available year |
| Work-Life Balance 🧘 | Employees working very long hours | Empleados que trabajan muchas horas | Percentage | Percentage of employees who work more than fifty hours a week on average, latest available year |
|  | Time devoted to leisure and personal care | Tiempo destinado al ocio y el cuidado personal | Hours | Average number of hours per day devoted to leisure and personal care, including sleeping and eating |

In [14]:
import pandas as pd

bli_df = pd.read_excel("dataset.xlsx", header=1, index_col=0)
bli_df.head()

Unnamed: 0_level_0,Dwellings without basic facilities,Housing expenditure,Rooms per person,Household net adjusted disposable income,Household net wealth,Labour market insecurity,Employment rate,Long-term unemployment rate,Personal earnings,Quality of support network,...,Water quality,Stakeholder engagement for developing regulations,Voter turnout,Life expectancy,Self-reported health,Life satisfaction,Feeling safe walking alone at night,Homicide rate,Employees working very long hours,Time devoted to leisure and personal care
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Australia,,20.0,,32759.0,427064.0,5.4,73,1.31,49126.0,95,...,93,2.7,91,82.5,85.0,7.3,63.5,1.1,13.04,14.35
Austria,0.9,21.0,1.6,33541.0,308325.0,3.5,72,1.84,50349.0,92,...,92,1.3,80,81.7,70.0,7.1,80.6,0.5,6.66,14.55
Belgium,1.9,21.0,2.2,30364.0,386006.0,3.7,63,3.54,49675.0,91,...,84,2.0,89,81.5,74.0,6.9,70.1,1.0,4.75,15.7
Canada,0.2,22.0,2.6,30854.0,423849.0,6.0,73,0.77,47622.0,93,...,91,2.9,68,81.9,88.0,7.4,82.2,1.3,3.69,14.56
Chile,9.4,18.0,1.2,,100967.0,8.7,63,,25879.0,85,...,71,1.3,47,79.9,57.0,6.5,47.9,4.2,9.72,


### Global Temperatures Dataset

![wbg_climate](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/wbg_climate.png)


https://climateknowledgeportal.worldbank.org/download-data

In [2]:
temp_df = pd.read_csv("./resources/temperature.csv")
temp_df

Unnamed: 0,Temperature,Year,Month,Country,ISO3
0,-0.03110,1991,Jan,Afghanistan,AFG
1,1.43654,1991,Feb,Afghanistan,AFG
2,6.88685,1991,Mar,Afghanistan,AFG
3,12.93970,1991,Apr,Afghanistan,AFG
4,17.07550,1991,May,Afghanistan,AFG
...,...,...,...,...,...
59899,26.09480,2016,Aug,Venezuela,VEN
59900,26.22090,2016,Sep,Venezuela,VEN
59901,26.62850,2016,Oct,Venezuela,VEN
59902,26.27680,2016,Nov,Venezuela,VEN


In [3]:
temp_df.shape

(59904, 5)

----

## 1.- Concatenation


> According to Wikipedia: *It is the operation by which two characters are joined to form a character string. Two strings, or a character and a string, can also be concatenated to form a longer string*. Example:

In [4]:
a = "Hola "

b = "a todos 🤗"


a + b

'Hola a todos 🤗'

The general idea of concatenation is to join 2 or more `DataFrames` by rows or by columns.

<div align='center'>
    <img src='https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/concat.png' width=900/>
</div>


Every concatenation aligns the `DataFrames` on their labels: on the index when joining columns, and on the column names when stacking rows.
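As a minimal sketch of that alignment, the toy frames below (hypothetical data, not part of the class datasets) have indexes that only partly overlap. The point is that `pd.concat` matches by label, not by position:

```python
import pandas as pd

# Toy frames (hypothetical data) whose indexes only partly overlap.
left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [10, 20, 30]}, index=["z", "y", "w"])

# axis=1 places the frames side by side and matches rows by index label.
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# axis=0 stacks the frames and matches columns by name.
stacked = pd.concat([left, right], axis=0)
print(stacked)
```

Labels that exist in only one frame are kept, and the missing cells are filled with `NaN`.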

### 1.1 Case 1: Concatenating Rows

When concatenating by rows (`axis=0`), each `DataFrame` is appended below the previous one and keeps its original index labels.

<div align='center'>
    <img src='https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/merging_concat_basic.png' width=500/>
</div>

In [5]:
bli_df.head(5)

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
0,Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,...,95.25,,87.25,2.7,411.2,14.35,91.0,92.666667,20.966667,57395.91947
1,Austria,16.0,0.9,85.0,6.59,72.0,80.7,0.466667,33541.0,308325.0,...,92.0,1.6,70.6,1.3,492.8,14.53,80.0,92.0,17.0,51525.04643
2,Belgium,15.0,1.9,77.0,4.703333,63.333333,70.266667,1.033333,30364.0,386006.0,...,92.0,2.2,73.6,2.0,503.8,15.663333,89.0,83.666667,19.3,47491.32326
3,Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,...,89.25,,,2.2,398.2,,79.0,73.0,16.166667,9001.234249
4,Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,...,93.25,2.6,87.8,2.9,523.2,14.553333,68.0,91.0,17.333333,46313.17137


In [8]:
sudamerica_df = bli_df.loc[
    bli_df["Country"].isin(
        [
            "Chile",
            "Brazil",
            "Colombia",
        ]
    ),
    :,
]

sudamerica_df

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
3,Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,...,89.25,,,2.2,398.2,,79.0,73.0,16.166667,9001.234249
5,Chile,16.0,9.4,65.0,9.316667,62.666667,48.0,4.2,,100967.0,...,84.6,1.2,57.0,1.3,443.8,,47.0,71.0,17.5,15924.79424
6,Colombia,10.0,23.9,54.0,26.006667,67.0,44.566667,25.0,,,...,89.0,1.2,,1.4,412.8,,53.0,74.666667,14.1,6718.585324


In [9]:
norteamerica_df = bli_df[
    bli_df["Country"].isin(
        [
            "Canada",
            "United States",
            "Mexico",
        ]
    )
]

norteamerica_df

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
4,Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,...,93.25,2.6,87.8,2.9,523.2,14.553333,68.0,91.0,17.333333,46313.17137
24,Mexico,16.0,25.5,37.666667,27.28,61.666667,41.966667,18.633333,,,...,81.6,1.0,65.666667,3.2,416.0,,63.0,67.666667,15.166667,9673.443674
40,United States,10.0,0.1,90.666667,10.99,70.0,73.9,5.5,45284.0,632100.0,...,91.75,2.4,86.8,3.1,489.4,14.44,65.0,82.666667,17.2,62996.66482


In [10]:
oceania_df = bli_df.loc[bli_df["Country"].isin(["New Zealand", "Australia"]), :]
oceania_df

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
0,Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,...,95.25,,87.25,2.7,411.2,14.35,91.0,92.666667,20.966667,57395.91947
26,New Zealand,5.0,,78.666667,15.036667,77.0,66.266667,1.3,,388514.0,...,96.25,2.4,89.25,2.5,506.2,14.883333,80.0,89.0,17.7,42949.93058


To run the concatenation, we call the function `pd.concat` on a list with the `DataFrames` to concatenate.

In [19]:
df_concatenado_filas = pd.concat(
    [sudamerica_df, norteamerica_df, oceania_df],
    axis=0,
)
df_concatenado_filas

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
3,Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,...,89.25,,,2.2,398.2,,79.0,73.0,16.166667,9001.234249
5,Chile,16.0,9.4,65.0,9.316667,62.666667,48.0,4.2,,100967.0,...,84.6,1.2,57.0,1.3,443.8,,47.0,71.0,17.5,15924.79424
6,Colombia,10.0,23.9,54.0,26.006667,67.0,44.566667,25.0,,,...,89.0,1.2,,1.4,412.8,,53.0,74.666667,14.1,6718.585324
4,Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,...,93.25,2.6,87.8,2.9,523.2,14.553333,68.0,91.0,17.333333,46313.17137
24,Mexico,16.0,25.5,37.666667,27.28,61.666667,41.966667,18.633333,,,...,81.6,1.0,65.666667,3.2,416.0,,63.0,67.666667,15.166667,9673.443674
40,United States,10.0,0.1,90.666667,10.99,70.0,73.9,5.5,45284.0,632100.0,...,91.75,2.4,86.8,3.1,489.4,14.44,65.0,82.666667,17.2,62996.66482
0,Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,...,95.25,,87.25,2.7,411.2,14.35,91.0,92.666667,20.966667,57395.91947
26,New Zealand,5.0,,78.666667,15.036667,77.0,66.266667,1.3,,388514.0,...,96.25,2.4,89.25,2.5,506.2,14.883333,80.0,89.0,17.7,42949.93058


### 1.2 Case 2: Concatenating Columns

In this case (`axis=1`), the `DataFrames` are placed side by side and their rows are matched by index label.


<div align='center'>
    <img src='https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/merging_concat_mixed_ndim.png' width=700/>
</div>

In [20]:
env_df = bli_df.loc[:, ["Country", "Air pollution", "Water quality"]]
env_df.head()

Unnamed: 0,Country,Air pollution,Water quality
0,Australia,5.0,92.666667
1,Austria,16.0,92.0
2,Belgium,15.0,83.666667
3,Brazil,10.0,73.0
4,Canada,7.0,91.0


In [21]:
health_df = bli_df.loc[
    :, ["Country", "Self-reported health", "Life expectancy", "Life satisfaction"]
]

health_df.head()

Unnamed: 0,Country,Self-reported health,Life expectancy,Life satisfaction
0,Australia,87.25,82.5,7.35
1,Austria,70.6,81.7,7.225
2,Belgium,73.6,81.5,7.0
3,Brazil,,74.766667,6.4
4,Canada,87.8,81.866667,7.425


In [22]:
env_heatlh_df = pd.concat([env_df, health_df], axis=1)
env_heatlh_df.head()

Unnamed: 0,Country,Air pollution,Water quality,Country.1,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,Australia,87.25,82.5,7.35
1,Austria,16.0,92.0,Austria,70.6,81.7,7.225
2,Belgium,15.0,83.666667,Belgium,73.6,81.5,7.0
3,Brazil,10.0,73.0,Brazil,,74.766667,6.4
4,Canada,7.0,91.0,Canada,87.8,81.866667,7.425


Note: the rows are still matched by their index. As a result, a column that exists in both inputs appears twice in the resulting `DataFrame`, as happened above with `Country`.

> **Note**: To keep the exercises simple, we will keep only one `Country` column.

In [23]:
env_heatlh_df.columns

Index(['Country', 'Air pollution', 'Water quality', 'Country',
       'Self-reported health', 'Life expectancy', 'Life satisfaction'],
      dtype='object')

In [25]:
~env_heatlh_df.columns.duplicated()

array([ True,  True,  True, False,  True,  True,  True])

In [26]:
env_heatlh_df = env_heatlh_df.loc[:, ~env_heatlh_df.columns.duplicated()]
env_heatlh_df.head()

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,87.25,82.5,7.35
1,Austria,16.0,92.0,70.6,81.7,7.225
2,Belgium,15.0,83.666667,73.6,81.5,7.0
3,Brazil,10.0,73.0,,74.766667,6.4
4,Canada,7.0,91.0,87.8,81.866667,7.425
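An alternative to dropping the duplicated column afterwards is to move `Country` into the index before concatenating. This is a minimal sketch on toy stand-ins for `env_df` and `health_df` (the values are hypothetical), assuming each country appears only once per frame:

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for env_df and health_df.
env = pd.DataFrame({"Country": ["Chile", "Canada"], "Air pollution": [16.0, 7.0]})
health = pd.DataFrame({"Country": ["Canada", "Chile"], "Life expectancy": [81.9, 79.9]})

# With Country as the index, concat matches rows by country name,
# so the key is not duplicated as a column.
combined = pd.concat(
    [env.set_index("Country"), health.set_index("Country")],
    axis=1,
).reset_index()
print(combined)
```

Note that the rows are matched correctly even though the two toy frames list the countries in a different order.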


### 1.3 One `DataFrame` has fewer rows than the other

In this case, pandas fills the cells of the rows that have no counterpart with `np.nan`.

<div align='center'>
    <img src='https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/merging_concat_axis1.png' width=800/>
</div>

In [27]:
env_df_reducido = env_df[0:7]
env_df_reducido

Unnamed: 0,Country,Air pollution,Water quality
0,Australia,5.0,92.666667
1,Austria,16.0,92.0
2,Belgium,15.0,83.666667
3,Brazil,10.0,73.0
4,Canada,7.0,91.0
5,Chile,16.0,71.0
6,Colombia,10.0,74.666667


In [28]:
health_df

Unnamed: 0,Country,Self-reported health,Life expectancy,Life satisfaction
0,Australia,87.25,82.5,7.35
1,Austria,70.6,81.7,7.225
2,Belgium,73.6,81.5,7.0
3,Brazil,,74.766667,6.4
4,Canada,87.8,81.866667,7.425
5,Chile,57.0,79.9,6.48
6,Colombia,,76.233333,6.266667
7,Czech Republic,60.8,79.1,6.7
8,Denmark,72.6,80.9,7.65
9,Estonia,54.4,77.766667,5.78


In [41]:
pd.concat([env_df_reducido, health_df], axis=1).head(15)

Unnamed: 0,Country,Air pollution,Water quality,Country.1,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,Australia,87.25,82.5,7.35
1,Austria,16.0,92.0,Austria,70.6,81.7,7.225
2,Belgium,15.0,83.666667,Belgium,73.6,81.5,7.0
3,Brazil,10.0,73.0,Brazil,,74.766667,6.4
4,Canada,7.0,91.0,Canada,87.8,81.866667,7.425
5,Chile,16.0,71.0,Chile,57.0,79.9,6.48
6,Colombia,10.0,74.666667,Colombia,,76.233333,6.266667
7,,,,Czech Republic,60.8,79.1,6.7
8,,,,Denmark,72.6,80.9,7.65
9,,,,Estonia,54.4,77.766667,5.78
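The `NaN` filling above comes from the default `join="outer"` of `pd.concat`. As a minimal sketch on toy frames (hypothetical values), `join="inner"` keeps only the index labels present in every input instead:

```python
import pandas as pd

# Toy frames (hypothetical values): `short` has fewer rows than `full`.
short = pd.DataFrame({"Air pollution": [5.0, 16.0]}, index=[0, 1])
full = pd.DataFrame({"Life expectancy": [82.5, 81.7, 81.5]}, index=[0, 1, 2])

# Default join="outer": keeps every index label and fills the gaps with NaN.
outer = pd.concat([short, full], axis=1)

# join="inner": keeps only the index labels present in every frame.
inner = pd.concat([short, full], axis=1, join="inner")

print(outer)
print(inner)
```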


### 1.4 Wrong `axis` parameter

In [30]:
# concatenate correctly, just like in the previous example
pd.concat([env_df.head(), health_df.head()], axis=1)

Unnamed: 0,Country,Air pollution,Water quality,Country.1,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,Australia,87.25,82.5,7.35
1,Austria,16.0,92.0,Austria,70.6,81.7,7.225
2,Belgium,15.0,83.666667,Belgium,73.6,81.5,7.0
3,Brazil,10.0,73.0,Brazil,,74.766667,6.4
4,Canada,7.0,91.0,Canada,87.8,81.866667,7.425


> **Question ❓**: What happens if we get the `axis` parameter wrong?


In [32]:
env_df.head()

Unnamed: 0,Country,Air pollution,Water quality
0,Australia,5.0,92.666667
1,Austria,16.0,92.0
2,Belgium,15.0,83.666667
3,Brazil,10.0,73.0
4,Canada,7.0,91.0


In [33]:
health_df.head()

Unnamed: 0,Country,Self-reported health,Life expectancy,Life satisfaction
0,Australia,87.25,82.5,7.35
1,Austria,70.6,81.7,7.225
2,Belgium,73.6,81.5,7.0
3,Brazil,,74.766667,6.4
4,Canada,87.8,81.866667,7.425


In [31]:
pd.concat([env_df.head(), health_df.head()], axis=0)

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,,,
1,Austria,16.0,92.0,,,
2,Belgium,15.0,83.666667,,,
3,Brazil,10.0,73.0,,,
4,Canada,7.0,91.0,,,
0,Australia,,,87.25,82.5,7.35
1,Austria,,,70.6,81.7,7.225
2,Belgium,,,73.6,81.5,7.0
3,Brazil,,,,74.766667,6.4
4,Canada,,,87.8,81.866667,7.425


In addition, when the index labels are not meaningful for the stacked rows, you can set `ignore_index` to `True` so that the result gets a fresh index.

In [None]:
pd.concat([env_df.head(), health_df.head()], ignore_index=True)
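A minimal sketch of the effect on the index, using toy frames (hypothetical values) that both carry the default `0..n-1` index:

```python
import pandas as pd

# Toy frames (hypothetical values) that both use the default 0..n-1 index.
first = pd.DataFrame({"Country": ["Chile", "Brazil"]})
second = pd.DataFrame({"Country": ["Canada", "Mexico"]})

kept = pd.concat([first, second])                      # repeated labels
fresh = pd.concat([first, second], ignore_index=True)  # renumbered labels

print(kept.index.tolist())
print(fresh.index.tolist())
```

Without `ignore_index`, the repeated labels mean that `kept.loc[0]` returns two rows, which is rarely what one wants.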

---

## 2.- Aggregations: Brief review and preparation of new data

Recall that we can aggregate data by group and compute statistics on each group.

The process consists of three steps:

1. Split.
2. Apply the function.
3. Combine.

<div align='center'>
    <img src='https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/group_by.png' width=900/>
</div>
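The three steps can be sketched by hand on a toy temperature table (hypothetical values). This is only an illustration of the idea, not how pandas implements `groupby` internally:

```python
import pandas as pd

# Toy temperature table (hypothetical values).
temps = pd.DataFrame(
    {
        "Country": ["Chile", "Chile", "Canada", "Canada"],
        "Temperature": [8.0, 10.0, -6.0, -4.0],
    }
)

# 1. Split: one sub-table per country.
groups = {country: sub for country, sub in temps.groupby("Country")}

# 2. Apply: compute the statistic on each piece.
means = {country: sub["Temperature"].mean() for country, sub in groups.items()}

# 3. Combine: gather the per-group results into a single object.
manual = pd.Series(means, name="Temperature").sort_index()

# groupby performs the same three steps in one call.
direct = temps.groupby("Country")["Temperature"].mean()
print(manual)
print(direct)
```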

In [34]:
temp_df.head(10)

Unnamed: 0,Temperature,Year,Month,Country,ISO3
0,-0.0311,1991,Jan,Afghanistan,AFG
1,1.43654,1991,Feb,Afghanistan,AFG
2,6.88685,1991,Mar,Afghanistan,AFG
3,12.9397,1991,Apr,Afghanistan,AFG
4,17.0755,1991,May,Afghanistan,AFG
5,23.0777,1991,Jun,Afghanistan,AFG
6,25.571,1991,Jul,Afghanistan,AFG
7,23.9673,1991,Aug,Afghanistan,AFG
8,19.38,1991,Sep,Afghanistan,AFG
9,12.8779,1991,Oct,Afghanistan,AFG


In [51]:
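# numeric_only=True skips the text columns (Month, ISO3); pandas 2.x raises a TypeError without it
prom = temp_df.groupby("Country").mean(numeric_only=True)
prom = prom.drop(columns=["Year"])
prom = prom.rename(columns={'Temperature': 'promedio'})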
prom

Unnamed: 0_level_0,promedio
Country,Unnamed: 1_level_1
Afghanistan,13.545609
Albania,12.106435
Algeria,23.439610
Andorra,11.953746
Angola,22.133815
...,...
United States,7.617000
Uruguay,17.941855
Uzbekistan,13.158793
Vanuatu,24.123163


In [52]:
# numeric_only=True skips the text columns (Month, ISO3)
std = temp_df.groupby("Country").std(numeric_only=True).drop(columns=["Year"])
std = std.rename(columns={'Temperature': 'std'})
std

Unnamed: 0_level_0,std
Country,Unnamed: 1_level_1
Afghanistan,8.695203
Albania,7.101392
Algeria,7.473115
Andorra,6.014903
Angola,1.710757
...,...
United States,9.077944
Uruguay,4.505184
Uzbekistan,10.837303
Vanuatu,1.204321


> **Question ❓**: How could we use `concat` to combine the means and the standard deviations?

In [53]:
df_unido = pd.concat([prom, std], axis=1)
df_unido

Unnamed: 0_level_0,promedio,std
Country,Unnamed: 1_level_1,Unnamed: 2_level_1
Afghanistan,13.545609,8.695203
Albania,12.106435,7.101392
Algeria,23.439610,7.473115
Andorra,11.953746,6.014903
Angola,22.133815,1.710757
...,...,...
United States,7.617000,9.077944
Uruguay,17.941855,4.505184
Uzbekistan,13.158793,10.837303
Vanuatu,24.123163,1.204321


In [54]:
t_agg_df = pd.concat([prom, std], axis=1)

### 2.1 `agg`

Recall the `agg` method, which aggregates data by group using one or more operations:

In [57]:
t_agg_df = temp_df.groupby("Country").agg(
    {"Temperature": ["mean", "std"]}
)
t_agg_df

Unnamed: 0_level_0,Temperature,Temperature
Unnamed: 0_level_1,mean,std
Country,Unnamed: 1_level_2,Unnamed: 2_level_2
Afghanistan,13.545609,8.695203
Albania,12.106435,7.101392
Algeria,23.439610,7.473115
Andorra,11.953746,6.014903
Angola,22.133815,1.710757
...,...,...
United States,7.617000,9.077944
Uruguay,17.941855,4.505184
Uzbekistan,13.158793,10.837303
Vanuatu,24.123163,1.204321


As we saw earlier, this returns a multi-index on the columns:

In [None]:
t_agg_df.columns

We will apply a small didactic _fix_: we drop the multi-index from the columns and move `Country` from the row index back into a regular column, so the dataset is ready for what comes next.

In [58]:
t_agg_df = t_agg_df.droplevel(0, axis=1).reset_index()
t_agg_df.columns = ["Country", "t_mean", "t_std"]
t_agg_df

Unnamed: 0,Country,t_mean,t_std
0,Afghanistan,13.545609,8.695203
1,Albania,12.106435,7.101392
2,Algeria,23.439610,7.473115
3,Andorra,11.953746,6.014903
4,Angola,22.133815,1.710757
...,...,...,...
187,United States,7.617000,9.077944
188,Uruguay,17.941855,4.505184
189,Uzbekistan,13.158793,10.837303
190,Vanuatu,24.123163,1.204321
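The same flat layout can also be obtained directly with named aggregation, which avoids the column multi-index altogether. A minimal sketch on a toy table (hypothetical values), reusing the column names `t_mean` and `t_std` from above:

```python
import pandas as pd

# Toy temperature table (hypothetical values).
temps = pd.DataFrame(
    {
        "Country": ["Chile", "Chile", "Canada", "Canada"],
        "Temperature": [8.0, 10.0, -6.0, -4.0],
    }
)

# Named aggregation: output_name=(input_column, function).
flat = (
    temps.groupby("Country")
    .agg(t_mean=("Temperature", "mean"), t_std=("Temperature", "std"))
    .reset_index()
)
print(flat)
```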


In [59]:
env_heatlh_df.head(10)

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction
0,Australia,5.0,92.666667,87.25,82.5,7.35
1,Austria,16.0,92.0,70.6,81.7,7.225
2,Belgium,15.0,83.666667,73.6,81.5,7.0
3,Brazil,10.0,73.0,,74.766667,6.4
4,Canada,7.0,91.0,87.8,81.866667,7.425
5,Chile,16.0,71.0,57.0,79.9,6.48
6,Colombia,10.0,74.666667,,76.233333,6.266667
7,Czech Republic,20.0,86.666667,60.8,79.1,6.7
8,Denmark,9.0,95.0,72.6,80.9,7.65
9,Estonia,8.0,84.0,54.4,77.766667,5.78


In [60]:
t_agg_df.head(10)

Unnamed: 0,Country,t_mean,t_std
0,Afghanistan,13.545609,8.695203
1,Albania,12.106435,7.101392
2,Algeria,23.43961,7.473115
3,Andorra,11.953746,6.014903
4,Angola,22.133815,1.710757
5,Antigua and Barbuda,26.156215,1.002084
6,Argentina,14.568881,4.813378
7,Armenia,7.803382,9.697264
8,Australia,21.966571,4.881637
9,Austria,7.202215,6.993245


### Misaligned DataFrames

> **Question ❓**: What will happen if we combine the *Better Life Index* dataset with the temperatures dataset using `concat`?

In [61]:
df_concat = pd.concat([env_heatlh_df, t_agg_df], axis=1)
df_concat

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,Country.1,t_mean,t_std
0,Australia,5.0,92.666667,87.25,82.500000,7.350,Afghanistan,13.545609,8.695203
1,Austria,16.0,92.000000,70.60,81.700000,7.225,Albania,12.106435,7.101392
2,Belgium,15.0,83.666667,73.60,81.500000,7.000,Algeria,23.439610,7.473115
3,Brazil,10.0,73.000000,,74.766667,6.400,Andorra,11.953746,6.014903
4,Canada,7.0,91.000000,87.80,81.866667,7.425,Angola,22.133815,1.710757
...,...,...,...,...,...,...,...,...,...
187,,,,,,,United States,7.617000,9.077944
188,,,,,,,Uruguay,17.941855,4.505184
189,,,,,,,Uzbekistan,13.158793,10.837303
190,,,,,,,Vanuatu,24.123163,1.204321


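It looks like that did not work very well: `concat` matched the rows by their numeric index, not by country, so Australia ended up next to Afghanistan.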



---

## 3.- Merge / Combining on a common identifier

It is a way of combining two `DataFrames` in which the values of a column serve as a common identifier for joining the remaining values:

![The idea behind merge](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/merge.png)


It is equivalent to the `JOIN` statements of SQL. There are several types; `pandas` implements 5 of them through the `how` parameter of `pd.merge`: `inner`, `left`, `right`, `outer`, and `cross`.

In the following examples we will join the OECD dataset and the aggregated temperature dataset using the different types of merge. It is very helpful to think of each merge as an operation on sets of keys.
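To make the set analogy concrete, here is a minimal sketch on toy frames (hypothetical values). It assumes the key is unique on each side, in which case the keys of each merge match a set expression:

```python
import pandas as pd

# Toy frames (hypothetical values) with unique keys on each side.
left = pd.DataFrame({"Country": ["Chile", "Canada", "Korea"], "x": [1, 2, 3]})
right = pd.DataFrame(
    {"Country": ["Chile", "Canada", "Angola", "Albania"], "y": [10, 20, 30, 40]}
)

left_keys, right_keys = set(left["Country"]), set(right["Country"])

# Set expression that each merge type corresponds to.
expected = {
    "inner": left_keys & right_keys,  # intersection
    "left": left_keys,                # everything in left
    "right": right_keys,              # everything in right
    "outer": left_keys | right_keys,  # union
}

for how, keys in expected.items():
    merged = pd.merge(left, right, on="Country", how=how)
    print(how, len(merged), set(merged["Country"]) == keys)
```

When a key is repeated on either side, a merge produces one row per matching pair, so the row counts no longer equal the set sizes.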

In [62]:
# Countries in the first dataset
bli_df["Country"].unique()

array(['Australia', 'Austria', 'Belgium', 'Brazil', 'Canada', 'Chile',
       'Colombia', 'Czech Republic', 'Denmark', 'Estonia', 'Finland',
       'France', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland',
       'Israel', 'Italy', 'Japan', 'Korea', 'Latvia', 'Lithuania',
       'Luxembourg', 'Mexico', 'Netherlands', 'New Zealand', 'Norway',
       'OECD - Total', 'Poland', 'Portugal', 'Russia', 'Slovak Republic',
       'Slovenia', 'South Africa', 'Spain', 'Sweden', 'Switzerland',
       'Turkey', 'United Kingdom', 'United States'], dtype=object)

In [None]:
t_agg_df

In [63]:
# Countries in the second dataset
t_agg_df["Country"].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
       'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan',
       'Bolivia', 'BosniaandHerzegovina', 'Botswana', 'Brazil', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Cape Verde', 'Central African Republic', 'Chad',
       'Chile', 'China', 'Colombia', 'Comoros', 'Costa Rica',
       "Coted'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'CzechRepublic',
       'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica',
       'DominicanRepublic', 'Ecuador', 'Egypt', 'ElSalvador',
       'EquatorialGuinea', 'Eritrea', 'Estonia', 'Ethiopia',
       'FaroeIslands ', 'FederatedStatesofMicronesia', 'Fiji', 'Finland',
       'France', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana',
       'Greece', 'Greenland', 'Grenada', 'Guatemal

---

### Inner

Combines only the rows whose key appears in both tables and discards everything else.

![Inner](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/inner.png)

In [68]:
env_heatlh_df.shape

(41, 6)

In [66]:
pd.merge(
    left=env_heatlh_df,
    right=t_agg_df,
    left_on="Country",
    right_on="Country",
    how="inner",
    sort=True,
)

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std
0,Australia,5.0,92.666667,87.25,82.5,7.35,21.966571,4.881637
1,Austria,16.0,92.0,70.6,81.7,7.225,7.202215,6.993245
2,Belgium,15.0,83.666667,73.6,81.5,7.0,10.573462,5.739911
3,Brazil,10.0,73.0,,74.766667,6.4,25.504852,0.962928
4,Canada,7.0,91.0,87.8,81.866667,7.425,-5.963874,12.528132
5,Chile,16.0,71.0,57.0,79.9,6.48,8.454478,3.126838
6,Colombia,10.0,74.666667,,76.233333,6.266667,24.734349,0.568972
7,Denmark,9.0,95.0,72.6,80.9,7.65,8.678213,6.136734
8,Estonia,8.0,84.0,54.4,77.766667,5.78,6.166056,8.199942
9,Finland,6.0,95.0,70.2,81.5,7.66,2.402188,9.278735


### Left Merge

![Left Merge](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/left.png)

Keeps every row of the `left` dataset. Rows with a match in `right` receive its values; rows without a match get `NaN` in the columns that come from `right`.

In [69]:
pd.merge(
    left=env_heatlh_df,
    right=t_agg_df,
    on="Country",
    how="left",
    sort=True,
)

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std
0,Australia,5.0,92.666667,87.25,82.5,7.35,21.966571,4.881637
1,Austria,16.0,92.0,70.6,81.7,7.225,7.202215,6.993245
2,Belgium,15.0,83.666667,73.6,81.5,7.0,10.573462,5.739911
3,Brazil,10.0,73.0,,74.766667,6.4,25.504852,0.962928
4,Canada,7.0,91.0,87.8,81.866667,7.425,-5.963874,12.528132
5,Chile,16.0,71.0,57.0,79.9,6.48,8.454478,3.126838
6,Colombia,10.0,74.666667,,76.233333,6.266667,24.734349,0.568972
7,Czech Republic,20.0,86.666667,60.8,79.1,6.7,,
8,Denmark,9.0,95.0,72.6,80.9,7.65,8.678213,6.136734
9,Estonia,8.0,84.0,54.4,77.766667,5.78,6.166056,8.199942


### Right Merge

Keeps every row of the `right` dataset. Rows with a match in `left` receive its values; rows without a match get `NaN` in the columns that come from `left`.

![Right Merge](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/right.png)

In [70]:
pd.merge(
    left=env_heatlh_df,
    right=t_agg_df,
    on="Country",
    how="right",
    sort=True,
)

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std
0,Afghanistan,,,,,,13.545609,8.695203
1,Albania,,,,,,12.106435,7.101392
2,Algeria,,,,,,23.439610,7.473115
3,Andorra,,,,,,11.953746,6.014903
4,Angola,,,,,,22.133815,1.710757
...,...,...,...,...,...,...,...,...
187,United States,10.0,82.666667,86.8,78.6,7.0,7.617000,9.077944
188,Uruguay,,,,,,17.941855,4.505184
189,Uzbekistan,,,,,,13.158793,10.837303
190,Vanuatu,,,,,,24.123163,1.204321


### Outer

Keeps every row from both datasets. Matching rows are combined, and rows without a match are kept on their own, with `NaN` in the columns of the other dataset.

![Outer Join](https://raw.githubusercontent.com/MDS7202/MDS7202/main/recursos/2023-01/10-Pandas3/outer.png)

In [71]:
outer_merged_df = pd.merge(
    left=env_heatlh_df,
    right=t_agg_df,
    on="Country",
    sort=True,
    how="outer",
)
outer_merged_df

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std
0,Afghanistan,,,,,,13.545609,8.695203
1,Albania,,,,,,12.106435,7.101392
2,Algeria,,,,,,23.439610,7.473115
3,Andorra,,,,,,11.953746,6.014903
4,Angola,,,,,,22.133815,1.710757
...,...,...,...,...,...,...,...,...
191,United States,10.0,82.666667,86.8,78.6,7.0,7.617000,9.077944
192,Uruguay,,,,,,17.941855,4.505184
193,Uzbekistan,,,,,,13.158793,10.837303
194,Vanuatu,,,,,,24.123163,1.204321


In [None]:
outer_merged_df[outer_merged_df["Country"] == "OECD - Total"]

#### With Indicator
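With `indicator=True`, `pd.merge` adds a `_merge` column that records where each row came from: `left_only`, `right_only`, or `both`. A minimal sketch on toy frames (hypothetical values) that counts each category:

```python
import pandas as pd

# Toy frames (hypothetical values).
left = pd.DataFrame({"Country": ["Chile", "Canada", "Korea"], "x": [1, 2, 3]})
right = pd.DataFrame(
    {"Country": ["Chile", "Canada", "Angola", "Albania"], "y": [10, 20, 30, 40]}
)

merged = pd.merge(left, right, on="Country", how="outer", indicator=True)

# How many rows matched, and how many came from only one side.
counts = merged["_merge"].value_counts()
print(counts)
```

Counting the categories is a quick way to check how well the two sources line up before inspecting individual rows.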

In [78]:
outer_merged_df = pd.merge(
    left=env_heatlh_df,
    right=t_agg_df,
    on="Country",
    how="outer",
    sort=True,
    indicator=True,
)
outer_merged_df

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std,_merge
0,Afghanistan,,,,,,13.545609,8.695203,right_only
1,Albania,,,,,,12.106435,7.101392,right_only
2,Algeria,,,,,,23.439610,7.473115,right_only
3,Andorra,,,,,,11.953746,6.014903,right_only
4,Angola,,,,,,22.133815,1.710757,right_only
...,...,...,...,...,...,...,...,...,...
191,United States,10.0,82.666667,86.8,78.6,7.0,7.617000,9.077944,both
192,Uruguay,,,,,,17.941855,4.505184,right_only
193,Uzbekistan,,,,,,13.158793,10.837303,right_only
194,Vanuatu,,,,,,24.123163,1.204321,right_only


In [73]:
outer_merged_df[outer_merged_df["_merge"] == "left_only"]

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std,_merge
43,Czech Republic,20.0,86.666667,60.8,79.1,6.7,,,left_only
91,Korea,28.0,76.0,33.0,82.366667,5.866667,,,left_only
131,OECD - Total,14.0,81.0,69.2,80.2,6.48,,,left_only
157,Slovak Republic,21.0,84.666667,68.6,77.266667,6.425,,,left_only


In [74]:
outer_merged_df[outer_merged_df["_merge"] == "both"]

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std,_merge
8,Australia,5.0,92.666667,87.25,82.5,7.35,21.966571,4.881637,both
9,Austria,16.0,92.0,70.6,81.7,7.225,7.202215,6.993245,both
16,Belgium,15.0,83.666667,73.6,81.5,7.0,10.573462,5.739911,both
23,Brazil,10.0,73.0,,74.766667,6.4,25.504852,0.962928,both
30,Canada,7.0,91.0,87.8,81.866667,7.425,-5.963874,12.528132,both
34,Chile,16.0,71.0,57.0,79.9,6.48,8.454478,3.126838,both
36,Colombia,10.0,74.666667,,76.233333,6.266667,24.734349,0.568972,both
46,Denmark,9.0,95.0,72.6,80.9,7.65,8.678213,6.136734,both
55,Estonia,8.0,84.0,54.4,77.766667,5.78,6.166056,8.199942,both
60,Finland,6.0,95.0,70.2,81.5,7.66,2.402188,9.278735,both


In [76]:
outer_merged_df[outer_merged_df["_merge"] == "right_only"]

Unnamed: 0,Country,Air pollution,Water quality,Self-reported health,Life expectancy,Life satisfaction,t_mean,t_std,_merge
0,Afghanistan,,,,,,13.545609,8.695203,right_only
1,Albania,,,,,,12.106435,7.101392,right_only
2,Algeria,,,,,,23.439610,7.473115,right_only
3,Andorra,,,,,,11.953746,6.014903,right_only
4,Angola,,,,,,22.133815,1.710757,right_only
...,...,...,...,...,...,...,...,...,...
189,United Arab Emirates,,,,,,27.557650,5.628118,right_only
192,Uruguay,,,,,,17.941855,4.505184,right_only
193,Uzbekistan,,,,,,13.158793,10.837303,right_only
194,Vanuatu,,,,,,24.123163,1.204321,right_only
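
The `_merge` column can also be summarized instead of filtered one label at a time. The following is a minimal sketch on two toy tables (the names `left` and `right` and their values are illustrative stand-ins, not the course data):

```python
import pandas as pd

# Toy stand-ins for the two tables; the values are illustrative only.
left = pd.DataFrame(
    {"Country": ["Canada", "Chile", "Korea"], "Air pollution": [7.0, 16.0, 28.0]}
)
right = pd.DataFrame(
    {"Country": ["Angola", "Canada", "Chile"], "t_mean": [22.1, -6.0, 8.5]}
)

merged = pd.merge(left, right, on="Country", how="outer", sort=True, indicator=True)

# Count how many rows came from each side; the keys are the `_merge` labels.
merge_counts = merged["_merge"].astype(str).value_counts().to_dict()

# Countries found only in the left table.
left_only = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()

print(merge_counts)
print(left_only)
```

Rows tagged `left_only` or `right_only` are often worth a second look, since a key that is spelled differently in each source ends up unmatched.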


---

## 4.- Transposing Data

Transposing simply swaps the rows and the columns.

In [81]:
bli_df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,31,32,33,34,35,36,37,38,39,40
Country,Australia,Austria,Belgium,Brazil,Canada,Chile,Colombia,Czech Republic,Denmark,Estonia,...,Russia,Slovak Republic,Slovenia,South Africa,Spain,Sweden,Switzerland,Turkey,United Kingdom,United States
Air pollution,5.0,16.0,15.0,10.0,7.0,16.0,10.0,20.0,9.0,8.0,...,15.0,21.0,16.0,22.0,11.0,6.0,15.0,20.0,11.0,10.0
Dwellings without basic facilities,,0.9,1.9,6.7,0.2,9.4,23.9,0.7,0.5,7.0,...,14.8,1.2,0.4,37.0,0.1,0.0,0.1,8.0,0.3,0.1
Educational attainment,81.0,85.0,77.0,49.0,91.333333,65.0,54.0,93.666667,81.0,88.666667,...,94.0,91.333333,88.0,73.333333,59.0,83.0,87.666667,39.0,81.0,90.666667
Employees working very long hours,12.84,6.59,4.703333,7.006667,3.673333,9.316667,26.006667,5.496667,2.316667,2.436667,...,0.14,4.073333,4.333333,17.84,3.963333,1.066667,0.37,31.043333,12.123333,10.99
Employment rate,73.0,72.0,63.333333,61.0,73.333333,62.666667,67.0,73.666667,74.0,74.0,...,70.333333,66.0,69.333333,43.333333,62.333333,76.666667,79.666667,51.666667,75.0,70.0
Feeling safe walking alone at night,64.133333,80.7,70.266667,35.866667,82.5,48.0,44.566667,72.533333,83.566667,69.633333,...,53.466667,63.7,86.166667,36.333333,82.166667,75.566667,85.333333,59.833333,77.766667,73.9
Homicide rate,1.1,0.466667,1.033333,27.0,1.266667,4.2,25.0,0.5,0.6,3.166667,...,9.933333,0.8,0.6,14.0,0.6,0.9,0.6,1.366667,0.166667,5.5
Household net adjusted disposable income,32759.0,33541.0,30364.0,,30854.0,,,21453.0,29606.0,19697.0,...,,20474.0,20820.0,,23999.0,31287.0,37466.0,,28715.0,45284.0
Household net wealth,427064.0,308325.0,386006.0,,423849.0,100967.0,,,118637.0,159373.0,...,,119696.0,203044.0,,373548.0,,,,548392.0,632100.0


---

## 5.- Pivoting Data


The dataset we used last class is relatively tidy.

In [86]:
bli_df.head(5).iloc[:, 0:20]

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,Housing expenditure,Labour market insecurity,Life expectancy,Life satisfaction,Long-term unemployment rate,Personal earnings,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations
0,Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,20.0,5.922,82.5,7.35,1.306667,49126.0,95.25,,87.25,2.7
1,Austria,16.0,0.9,85.0,6.59,72.0,80.7,0.466667,33541.0,308325.0,21.0,4.076,81.7,7.225,1.83,50349.0,92.0,1.6,70.6,1.3
2,Belgium,15.0,1.9,77.0,4.703333,63.333333,70.266667,1.033333,30364.0,386006.0,21.0,4.052,81.5,7.0,3.533333,49675.0,92.0,2.2,73.6,2.0
3,Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,,,74.766667,6.4,,,89.25,,,2.2
4,Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,22.0,7.048,81.866667,7.425,0.763333,47622.0,93.25,2.6,87.8,2.9


However, it originally had the following structure:

In [83]:
dataset_original = pd.read_csv("./resources/bli_original.csv", keep_default_na=False)
dataset_original

Unnamed: 0,Continent,Country,INDICATOR,Indicator,MEASURE,Measure,INEQUALITY,Inequality,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,Value,Flag Codes,Flags
0,OC,Australia,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,5.40,,
1,EU,Austria,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.50,,
2,EU,Belgium,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.70,,
3,,Canada,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,6.00,,
4,EU,Czech Republic,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.10,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2364,EU,Lithuania,WL_EWLH,Employees working very long hours,L,Value,TOT,Total,PC,Percentage,0,Units,,,0.54,,
2365,SA,Colombia,WL_EWLH,Employees working very long hours,L,Value,MN,Men,PC,Percentage,0,Units,,,32.09,,
2366,EU,Lithuania,WL_EWLH,Employees working very long hours,L,Value,MN,Men,PC,Percentage,0,Units,,,0.67,,
2367,SA,Colombia,WL_EWLH,Employees working very long hours,L,Value,WMN,Women,PC,Percentage,0,Units,,,19.37,,
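
A note on `keep_default_na=False` in the call above: by default `read_csv` treats the text `NA` as a missing value, which would presumably erase a continent code such as `NA` (the multi-index examples later select rows with the label `"NA"`). The sketch below illustrates the difference on a tiny in-memory CSV; the CSV text is made up for the illustration.

```python
import io

import pandas as pd

# Tiny made-up CSV where "NA" is a legitimate continent code.
csv_text = "Continent,Country\nNA,Canada\nSA,Chile\n"

# Default parsing: the string "NA" is interpreted as a missing value.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False the string is kept as written.
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default_df["Continent"].isna().tolist())
print(kept_df["Continent"].tolist())
```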


In [85]:
dataset_original.loc[:, ["Country", "Indicator", "Value"]].head(20)

Unnamed: 0,Country,Indicator,Value
0,Australia,Labour market insecurity,5.4
1,Austria,Labour market insecurity,3.5
2,Belgium,Labour market insecurity,3.7
3,Canada,Labour market insecurity,6.0
4,Czech Republic,Labour market insecurity,3.1
5,Denmark,Labour market insecurity,4.2
6,Finland,Labour market insecurity,3.9
7,France,Labour market insecurity,7.6
8,Germany,Labour market insecurity,2.7
9,Greece,Labour market insecurity,29.8


In [90]:
dataset_original.loc[
    :, ["Continent", "Country", "Indicator", "Unit", "Value"]
].sort_values("Country").head(20)

Unnamed: 0,Continent,Country,Indicator,Unit,Value
0,OC,Australia,Labour market insecurity,Percentage,5.4
1636,OC,Australia,Self-reported health,Percentage,85.0
1673,OC,Australia,Self-reported health,Percentage,85.0
1710,OC,Australia,Self-reported health,Percentage,85.0
454,OC,Australia,Household net wealth,US Dollar,427064.0
1747,OC,Australia,Self-reported health,Percentage,94.0
425,OC,Australia,Household net adjusted disposable income,US Dollar,32759.0
1813,OC,Australia,Life satisfaction,Average score,7.3
1852,OC,Australia,Life satisfaction,Average score,7.2
350,OC,Australia,Feeling safe walking alone at night,Percentage,48.8


In [None]:
dataset_original.shape

Each row of this dataset holds a single observation: a country, an indicator, and the value of that indicator. This layout is known as **long** format.

**Pivoting**

To convert it to the **wide** format we have been working with, we need to pivot the table:

In [None]:
dataset_original.head(3)

![Pivot](./resources/pivot.png)

In [87]:
dataset_original["Indicator"].unique()

array(['Labour market insecurity',
       'Stakeholder engagement for developing regulations',
       'Dwellings without basic facilities', 'Housing expenditure',
       'Feeling safe walking alone at night', 'Rooms per person',
       'Household net adjusted disposable income', 'Household net wealth',
       'Employment rate', 'Long-term unemployment rate',
       'Personal earnings', 'Quality of support network',
       'Educational attainment', 'Student skills', 'Years in education',
       'Air pollution', 'Water quality', 'Voter turnout',
       'Life expectancy', 'Self-reported health', 'Life satisfaction',
       'Homicide rate', 'Employees working very long hours',
       'Time devoted to leisure and personal care'], dtype=object)

In [88]:
dataset_original.head().loc[:, ["Country", "Indicator", "Value"]]

Unnamed: 0,Country,Indicator,Value
0,Australia,Labour market insecurity,5.4
1,Austria,Labour market insecurity,3.5
2,Belgium,Labour market insecurity,3.7
3,Canada,Labour market insecurity,6.0
4,Czech Republic,Labour market insecurity,3.1


> **Exercise ✏️**: Pivot the original table of OECD data.


In [89]:
pd.pivot_table(dataset_original, index='Country', columns='Indicator', values='Value').head(5)

Indicator,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,Housing expenditure,...,Personal earnings,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,20.0,...,49126.0,95.25,,87.25,2.7,411.2,14.35,91.0,92.666667,20.966667
Austria,16.0,0.9,85.0,6.59,72.0,80.7,0.466667,33541.0,308325.0,21.0,...,50349.0,92.0,1.6,70.6,1.3,492.8,14.53,80.0,92.0,17.0
Belgium,15.0,1.9,77.0,4.703333,63.333333,70.266667,1.033333,30364.0,386006.0,21.0,...,49675.0,92.0,2.2,73.6,2.0,503.8,15.663333,89.0,83.666667,19.3
Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,,...,,89.25,,,2.2,398.2,,79.0,73.0,16.166667
Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,22.0,...,47622.0,93.25,2.6,87.8,2.9,523.2,14.553333,68.0,91.0,17.333333
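
One detail of the call above is worth spelling out: `pd.pivot_table` aggregates repeated (`Country`, `Indicator`) pairs, by default with the mean. The long table holds several rows per pair, which is why Australia's `Self-reported health` appears as 87.25 in the wide table although the rows listed earlier read 85, 85, 85 and 94. The sketch below reproduces this on a toy long table (the Austria row is an illustrative single value) and contrasts it with `DataFrame.pivot`, which does not aggregate:

```python
import pandas as pd

# Toy long table: four Australia rows as listed earlier, one illustrative Austria row.
long_df = pd.DataFrame(
    {
        "Country": ["Australia"] * 4 + ["Austria"],
        "Indicator": ["Self-reported health"] * 5,
        "Value": [85.0, 85.0, 85.0, 94.0, 70.6],
    }
)

# pivot_table averages the duplicated (Country, Indicator) pairs.
wide = pd.pivot_table(long_df, index="Country", columns="Indicator", values="Value")
print(wide)

# pivot refuses to reshape when a pair is repeated.
try:
    long_df.pivot(index="Country", columns="Indicator", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(pivot_failed)
```

A different aggregation can be requested through the `aggfunc` parameter of `pd.pivot_table` when the mean is not the right summary.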


---

## 6.- Multi-Indexes

So far we have only worked with `DataFrames` that have a single level of rows and columns. However, it is also possible to add more levels to both the index and the columns.
This is known as a multi-index.

In [94]:
dataset_original.loc[:, ["Continent", "Country", "Indicator", "Unit", "Value"]].head()

Unnamed: 0,Continent,Country,Indicator,Unit,Value
0,OC,Australia,Labour market insecurity,Percentage,5.4
1,EU,Austria,Labour market insecurity,Percentage,3.5
2,EU,Belgium,Labour market insecurity,Percentage,3.7
3,,Canada,Labour market insecurity,Percentage,6.0
4,EU,Czech Republic,Labour market insecurity,Percentage,3.1


To add column levels, during the pivot we specify that both `Unit` and `Indicator` become levels of the columns and, at the same time, that both `Continent` and `Country` become levels of the row index.

The result can be seen in the following `DataFrame`:

In [95]:
dataset_multindex = pd.pivot_table(
    dataset_original,
    index=["Continent", "Country"],
    columns=["Unit", "Indicator"],
    values="Value",
)
dataset_multindex

Unnamed: 0_level_0,Unit,Average score,Average score,Average score,Hours,Micrograms per cubic metre,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Ratio,Ratio,US Dollar,US Dollar,US Dollar,Years,Years
Unnamed: 0_level_1,Indicator,Life satisfaction,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,...,Self-reported health,Voter turnout,Water quality,Homicide rate,Rooms per person,Household net adjusted disposable income,Household net wealth,Personal earnings,Life expectancy,Years in education
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
AF,South Africa,4.7,,,14.77,22.0,37.0,73.333333,17.84,43.333333,36.333333,...,,73.0,67.0,14.0,,,,,57.5,
AS,Israel,7.225,2.5,470.0,,21.0,,87.333333,15.273333,69.0,69.966667,...,86.75,72.0,67.0,1.833333,1.2,,,35067.0,82.466667,15.633333
AS,Japan,5.9,1.4,528.8,,14.0,6.4,,,75.0,72.8,...,34.8,53.0,87.0,0.2,1.9,29798.0,305878.0,40863.0,84.066667,16.366667
AS,Korea,5.866667,2.9,520.0,14.583333,28.0,2.5,87.666667,,66.666667,66.666667,...,33.0,77.0,76.0,1.0,1.5,21882.0,285980.0,35191.0,82.366667,17.266667
EU,Austria,7.225,1.3,492.8,14.53,16.0,0.9,85.0,6.59,72.0,80.7,...,70.6,80.0,92.0,0.466667,1.6,33541.0,308325.0,50349.0,81.7,17.0
EU,Belgium,7.0,2.0,503.8,15.663333,15.0,1.9,77.0,4.703333,63.333333,70.266667,...,73.6,89.0,83.666667,1.033333,2.2,30364.0,386006.0,49675.0,81.5,19.3
EU,Czech Republic,6.7,1.6,492.2,,20.0,0.7,93.666667,5.496667,73.666667,72.533333,...,60.8,61.0,86.666667,0.5,1.4,21453.0,,25372.0,79.1,17.9
EU,Denmark,7.65,2.0,505.2,15.873333,9.0,0.5,81.0,2.316667,74.0,83.566667,...,72.6,86.0,95.0,0.6,1.9,29606.0,118637.0,51466.0,80.9,19.5
EU,Estonia,5.78,2.7,525.8,14.886667,8.0,7.0,88.666667,2.436667,74.0,69.633333,...,54.4,64.0,84.0,3.166667,1.6,19697.0,159373.0,24336.0,77.766667,17.7
EU,Finland,7.66,2.2,523.6,15.163333,6.0,0.5,88.0,3.816667,70.333333,85.266667,...,70.2,67.0,95.0,1.266667,1.9,29943.0,200827.0,42964.0,81.5,19.833333


In [96]:
dataset_multindex.index

MultiIndex([(  'AF',    'South Africa'),
            (  'AS',          'Israel'),
            (  'AS',           'Japan'),
            (  'AS',           'Korea'),
            (  'EU',         'Austria'),
            (  'EU',         'Belgium'),
            (  'EU',  'Czech Republic'),
            (  'EU',         'Denmark'),
            (  'EU',         'Estonia'),
            (  'EU',         'Finland'),
            (  'EU',          'France'),
            (  'EU',         'Germany'),
            (  'EU',          'Greece'),
            (  'EU',         'Hungary'),
            (  'EU',         'Iceland'),
            (  'EU',         'Ireland'),
            (  'EU',           'Italy'),
            (  'EU',          'Latvia'),
            (  'EU',       'Lithuania'),
            (  'EU',      'Luxembourg'),
            (  'EU',     'Netherlands'),
            (  'EU',          'Norway'),
            (  'EU',          'Poland'),
            (  'EU',        'Portugal'),
            (  '

Note that the columns are also an index!

In [97]:
dataset_multindex.columns.values

array([('Average score', 'Life satisfaction'),
       ('Average score', 'Stakeholder engagement for developing regulations'),
       ('Average score', 'Student skills'),
       ('Hours', 'Time devoted to leisure and personal care'),
       ('Micrograms per cubic metre', 'Air pollution'),
       ('Percentage', 'Dwellings without basic facilities'),
       ('Percentage', 'Educational attainment'),
       ('Percentage', 'Employees working very long hours'),
       ('Percentage', 'Employment rate'),
       ('Percentage', 'Feeling safe walking alone at night'),
       ('Percentage', 'Housing expenditure'),
       ('Percentage', 'Labour market insecurity'),
       ('Percentage', 'Long-term unemployment rate'),
       ('Percentage', 'Quality of support network'),
       ('Percentage', 'Self-reported health'),
       ('Percentage', 'Voter turnout'), ('Percentage', 'Water quality'),
       ('Ratio', 'Homicide rate'), ('Ratio', 'Rooms per person'),
       ('US Dollar', 'Household net adjusted di

We can access the labels of each level using `get_level_values`:

In [98]:
dataset_multindex.columns.get_level_values(0)

Index(['Average score', 'Average score', 'Average score', 'Hours',
       'Micrograms per cubic metre', 'Percentage', 'Percentage', 'Percentage',
       'Percentage', 'Percentage', 'Percentage', 'Percentage', 'Percentage',
       'Percentage', 'Percentage', 'Percentage', 'Percentage', 'Ratio',
       'Ratio', 'US Dollar', 'US Dollar', 'US Dollar', 'Years', 'Years'],
      dtype='object', name='Unit')

In [99]:
dataset_multindex.columns.get_level_values(1)

Index(['Life satisfaction',
       'Stakeholder engagement for developing regulations', 'Student skills',
       'Time devoted to leisure and personal care', 'Air pollution',
       'Dwellings without basic facilities', 'Educational attainment',
       'Employees working very long hours', 'Employment rate',
       'Feeling safe walking alone at night', 'Housing expenditure',
       'Labour market insecurity', 'Long-term unemployment rate',
       'Quality of support network', 'Self-reported health', 'Voter turnout',
       'Water quality', 'Homicide rate', 'Rooms per person',
       'Household net adjusted disposable income', 'Household net wealth',
       'Personal earnings', 'Life expectancy', 'Years in education'],
      dtype='object', name='Indicator')

The same works for each level of the row index:

In [None]:
dataset_multindex.index.get_level_values(0)

In [None]:
dataset_multindex.index.get_level_values(1)
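
Levels can also be addressed by name instead of by position. The sketch below builds a small multi-indexed table the same way as above, via `pd.pivot_table`, from a toy long table (values rounded and illustrative) and inspects its levels:

```python
import pandas as pd

# Toy long table with the same column names as the original dataset.
long_df = pd.DataFrame(
    {
        "Continent": ["SA", "SA", "EU", "EU"],
        "Country": ["Chile", "Chile", "Austria", "Austria"],
        "Unit": ["Percentage", "Years", "Percentage", "Years"],
        "Indicator": ["Employment rate", "Life expectancy"] * 2,
        "Value": [62.7, 79.9, 72.0, 81.7],
    }
)

toy = pd.pivot_table(
    long_df,
    index=["Continent", "Country"],
    columns=["Unit", "Indicator"],
    values="Value",
)

# Names and number of levels on each axis.
row_names = list(toy.index.names)
n_col_levels = toy.columns.nlevels

# get_level_values accepts a level name as well as a position.
units = toy.columns.get_level_values("Unit").tolist()
countries = toy.index.get_level_values("Country").tolist()

print(row_names, n_col_levels)
print(units, countries)
```

Using names keeps the code readable if the order of the levels changes later.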

### Accessing Multi-Indexes

> **Exercise ✏️**: Select the row that contains Chile.

In [101]:
dataset_multindex

Unnamed: 0_level_0,Unit,Average score,Average score,Average score,Hours,Micrograms per cubic metre,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Ratio,Ratio,US Dollar,US Dollar,US Dollar,Years,Years
Unnamed: 0_level_1,Indicator,Life satisfaction,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,...,Self-reported health,Voter turnout,Water quality,Homicide rate,Rooms per person,Household net adjusted disposable income,Household net wealth,Personal earnings,Life expectancy,Years in education
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
AF,South Africa,4.7,,,14.77,22.0,37.0,73.333333,17.84,43.333333,36.333333,...,,73.0,67.0,14.0,,,,,57.5,
AS,Israel,7.225,2.5,470.0,,21.0,,87.333333,15.273333,69.0,69.966667,...,86.75,72.0,67.0,1.833333,1.2,,,35067.0,82.466667,15.633333
AS,Japan,5.9,1.4,528.8,,14.0,6.4,,,75.0,72.8,...,34.8,53.0,87.0,0.2,1.9,29798.0,305878.0,40863.0,84.066667,16.366667
AS,Korea,5.866667,2.9,520.0,14.583333,28.0,2.5,87.666667,,66.666667,66.666667,...,33.0,77.0,76.0,1.0,1.5,21882.0,285980.0,35191.0,82.366667,17.266667
EU,Austria,7.225,1.3,492.8,14.53,16.0,0.9,85.0,6.59,72.0,80.7,...,70.6,80.0,92.0,0.466667,1.6,33541.0,308325.0,50349.0,81.7,17.0
EU,Belgium,7.0,2.0,503.8,15.663333,15.0,1.9,77.0,4.703333,63.333333,70.266667,...,73.6,89.0,83.666667,1.033333,2.2,30364.0,386006.0,49675.0,81.5,19.3
EU,Czech Republic,6.7,1.6,492.2,,20.0,0.7,93.666667,5.496667,73.666667,72.533333,...,60.8,61.0,86.666667,0.5,1.4,21453.0,,25372.0,79.1,17.9
EU,Denmark,7.65,2.0,505.2,15.873333,9.0,0.5,81.0,2.316667,74.0,83.566667,...,72.6,86.0,95.0,0.6,1.9,29606.0,118637.0,51466.0,80.9,19.5
EU,Estonia,5.78,2.7,525.8,14.886667,8.0,7.0,88.666667,2.436667,74.0,69.633333,...,54.4,64.0,84.0,3.166667,1.6,19697.0,159373.0,24336.0,77.766667,17.7
EU,Finland,7.66,2.2,523.6,15.163333,6.0,0.5,88.0,3.816667,70.333333,85.266667,...,70.2,67.0,95.0,1.266667,1.9,29943.0,200827.0,42964.0,81.5,19.833333


In [103]:
dataset_multindex.loc[[('SA', 'Chile')], :]

Unnamed: 0_level_0,Unit,Average score,Average score,Average score,Hours,Micrograms per cubic metre,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Ratio,Ratio,US Dollar,US Dollar,US Dollar,Years,Years
Unnamed: 0_level_1,Indicator,Life satisfaction,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,...,Self-reported health,Voter turnout,Water quality,Homicide rate,Rooms per person,Household net adjusted disposable income,Household net wealth,Personal earnings,Life expectancy,Years in education
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
SA,Chile,6.48,1.3,443.8,,16.0,9.4,65.0,9.316667,62.666667,48.0,...,57.0,47.0,71.0,4.2,1.2,,100967.0,25879.0,79.9,17.5
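
The selection above requires knowing that Chile sits under `SA`. When only the inner label is known, `DataFrame.xs` is one possible alternative, since it selects on a single named level. A minimal sketch on a toy table (values are illustrative):

```python
import pandas as pd

# Toy table with a two-level row index.
rows = pd.MultiIndex.from_tuples(
    [("AS", "Israel"), ("EU", "Austria"), ("SA", "Chile")],
    names=["Continent", "Country"],
)
toy = pd.DataFrame({"Life expectancy": [82.5, 81.7, 79.9]}, index=rows)

# Select by the Country level only; the matched level is dropped by default.
chile = toy.xs("Chile", level="Country")

# Keep both levels in the result.
chile_full = toy.xs("Chile", level="Country", drop_level=False)

print(chile)
print(chile_full)
```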


> **Exercise ✏️**: Select the columns of the indicators measured as percentages.

In [106]:
dataset_multindex.loc[:, ['Percentage']]

Unnamed: 0_level_0,Unit,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage,Percentage
Unnamed: 0_level_1,Indicator,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Housing expenditure,Labour market insecurity,Long-term unemployment rate,Quality of support network,Self-reported health,Voter turnout,Water quality
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
AF,South Africa,37.0,73.333333,17.84,43.333333,36.333333,18.0,,16.643333,88.0,,73.0,67.0
AS,Israel,,87.333333,15.273333,69.0,69.966667,,4.2,0.486667,88.5,86.75,72.0,67.0
AS,Japan,6.4,,,75.0,72.8,22.0,2.22,0.996667,88.75,34.8,53.0,87.0
AS,Korea,2.5,87.666667,,66.666667,66.666667,15.0,2.302,0.05,78.333333,33.0,77.0,76.0
EU,Austria,0.9,85.0,6.59,72.0,80.7,21.0,4.076,1.83,92.0,70.6,80.0,92.0
EU,Belgium,1.9,77.0,4.703333,63.333333,70.266667,21.0,4.052,3.533333,92.0,73.6,89.0,83.666667
EU,Czech Republic,0.7,93.666667,5.496667,73.666667,72.533333,24.0,7.208,1.056667,91.75,60.8,61.0,86.666667
EU,Denmark,0.5,81.0,2.316667,74.0,83.566667,23.0,4.606,1.313333,95.5,72.6,86.0,95.0
EU,Estonia,7.0,88.666667,2.436667,74.0,69.633333,17.0,4.392,1.916667,91.2,54.4,64.0,84.0
EU,Finland,0.5,88.0,3.816667,70.333333,85.266667,23.0,4.508,2.123333,94.8,70.2,67.0,95.0


> **Exercise ✏️**: Select the column that contains Life expectancy.

In [110]:
dataset_multindex.columns.values

array([('Average score', 'Life satisfaction'),
       ('Average score', 'Stakeholder engagement for developing regulations'),
       ('Average score', 'Student skills'),
       ('Hours', 'Time devoted to leisure and personal care'),
       ('Micrograms per cubic metre', 'Air pollution'),
       ('Percentage', 'Dwellings without basic facilities'),
       ('Percentage', 'Educational attainment'),
       ('Percentage', 'Employees working very long hours'),
       ('Percentage', 'Employment rate'),
       ('Percentage', 'Feeling safe walking alone at night'),
       ('Percentage', 'Housing expenditure'),
       ('Percentage', 'Labour market insecurity'),
       ('Percentage', 'Long-term unemployment rate'),
       ('Percentage', 'Quality of support network'),
       ('Percentage', 'Self-reported health'),
       ('Percentage', 'Voter turnout'), ('Percentage', 'Water quality'),
       ('Ratio', 'Homicide rate'), ('Ratio', 'Rooms per person'),
       ('US Dollar', 'Household net adjusted di

In [111]:
dataset_multindex.loc[:, [('Years', 'Life expectancy')]]

Unnamed: 0_level_0,Unit,Years
Unnamed: 0_level_1,Indicator,Life expectancy
Continent,Country,Unnamed: 2_level_2
AF,South Africa,57.5
AS,Israel,82.466667
AS,Japan,84.066667
AS,Korea,82.366667
EU,Austria,81.7
EU,Belgium,81.5
EU,Czech Republic,79.1
EU,Denmark,80.9
EU,Estonia,77.766667
EU,Finland,81.5


> **Exercise ✏️**: Select the row that contains Chile and the column that contains Life expectancy.

In [None]:
dataset_multindex.loc[...]
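
One possible approach, sketched on a toy table rather than on `dataset_multindex` (the values are rounded and illustrative), is to pass one full tuple per axis to `.loc`. Wrapping the tuples in lists returns a `DataFrame`; passing them bare returns the single value:

```python
import pandas as pd

# Toy table with two levels on each axis.
rows = pd.MultiIndex.from_tuples(
    [("EU", "Austria"), ("SA", "Chile")], names=["Continent", "Country"]
)
cols = pd.MultiIndex.from_tuples(
    [("Percentage", "Employment rate"), ("Years", "Life expectancy")],
    names=["Unit", "Indicator"],
)
toy = pd.DataFrame([[72.0, 81.7], [62.7, 79.9]], index=rows, columns=cols)

# Lists of tuples keep the result as a 1x1 DataFrame.
as_frame = toy.loc[[("SA", "Chile")], [("Years", "Life expectancy")]]

# Bare tuples return the scalar stored in that cell.
as_scalar = toy.loc[("SA", "Chile"), ("Years", "Life expectancy")]

print(as_frame)
print(as_scalar)
```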

> **Question ❓**: How can I request both `Housing expenditure` and `Employment rate` at the same time?

In [None]:
dataset_multindex.head()

In [112]:
dataset_multindex.loc[
    :, [
        ("Percentage", "Housing expenditure"), 
        ("Percentage", "Employment rate")
    ]
]

Unnamed: 0_level_0,Unit,Percentage,Percentage
Unnamed: 0_level_1,Indicator,Housing expenditure,Employment rate
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2
AF,South Africa,18.0,43.333333
AS,Israel,,69.0
AS,Japan,22.0,75.0
AS,Korea,15.0,66.666667
EU,Austria,21.0,72.0
EU,Belgium,21.0,63.333333
EU,Czech Republic,24.0,73.666667
EU,Denmark,23.0,74.0
EU,Estonia,17.0,74.0
EU,Finland,23.0,70.333333


> **Question ❓:** Can this be abbreviated?

In [113]:
dataset_multindex.loc[:, [("Percentage", ["Housing expenditure", "Employment rate"])]]

  return array(a, dtype, copy=False, order=order)


TypeError: unhashable type: 'list'

This becomes a problem when we select many columns, since we have to write out one tuple per column.

#### Option: `IndexSlice`

`IndexSlice` solves this problem by allowing us to select more than one label per level:

In [115]:
idx = pd.IndexSlice
idx

<pandas.core.indexing._IndexSlice at 0x7f6698b08640>

In [114]:
idx = pd.IndexSlice

dataset_multindex.loc[
    :, idx["Percentage", ["Employees working very long hours", "Housing expenditure"]]
]

Unnamed: 0_level_0,Unit,Percentage,Percentage
Unnamed: 0_level_1,Indicator,Employees working very long hours,Housing expenditure
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2
AF,South Africa,17.84,18.0
AS,Israel,15.273333,
AS,Japan,,22.0
AS,Korea,,15.0
EU,Austria,6.59,21.0
EU,Belgium,4.703333,21.0
EU,Czech Republic,5.496667,24.0
EU,Denmark,2.316667,23.0
EU,Estonia,2.436667,17.0
EU,Finland,3.816667,23.0


The following extends the previous example to select the countries of North and South America.

In [None]:
dataset_multindex.loc[
    idx[["NA", "SA"]],
    idx["Percentage", ["Employees working very long hours", "Housing expenditure"]],
]

You can even request more than one label at every level:

In [116]:
dataset_multindex.loc[
    :,
    idx[
        ["Hours", "Percentage"],
        [
            "Time devoted to leisure and personal care",
            "Employees working very long hours",
            "Housing expenditure",
        ],
    ],
]

Unnamed: 0_level_0,Unit,Hours,Percentage,Percentage
Unnamed: 0_level_1,Indicator,Time devoted to leisure and personal care,Employees working very long hours,Housing expenditure
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
AF,South Africa,14.77,17.84,18.0
AS,Israel,,15.273333,
AS,Japan,,,22.0
AS,Korea,14.583333,,15.0
EU,Austria,14.53,6.59,21.0
EU,Belgium,15.663333,4.703333,21.0
EU,Czech Republic,,5.496667,24.0
EU,Denmark,15.873333,2.316667,23.0
EU,Estonia,14.886667,2.436667,17.0
EU,Finland,15.163333,3.816667,23.0


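For these indicators, this is the same as indexing the first level with `:` (that is, selecting everything at that level). The example below does so for the columns and also restricts the rows to Chile and Israel: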

In [118]:
dataset_multindex.loc[
    idx[: ,['Chile', 'Israel']],
    idx[
        :,
        [
            "Time devoted to leisure and personal care",
            "Employees working very long hours",
            "Housing expenditure",
        ],
    ],
]

Unnamed: 0_level_0,Unit,Hours,Percentage,Percentage
Unnamed: 0_level_1,Indicator,Time devoted to leisure and personal care,Employees working very long hours,Housing expenditure
Continent,Country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
AS,Israel,,15.273333,
SA,Chile,,9.316667,18.0
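
For intuition on why `IndexSlice` works where the list of tuples failed: indexing `pd.IndexSlice` simply returns the key it was given, so `idx[a, b]` is a plain tuple with one entry per level, and `.loc` interprets that tuple level by level. The earlier attempt instead placed the tuple inside a list, where every element must be a single hashable label. A minimal sketch on a toy table (values are illustrative):

```python
import pandas as pd

idx = pd.IndexSlice

# IndexSlice only builds the key: here a tuple with one entry per level.
key = idx["Percentage", ["Employment rate", "Housing expenditure"]]
print(key)

# Toy table with two column levels, sorted by label.
cols = pd.MultiIndex.from_tuples(
    [
        ("Hours", "Time devoted to leisure and personal care"),
        ("Percentage", "Employment rate"),
        ("Percentage", "Housing expenditure"),
        ("Years", "Life expectancy"),
    ],
    names=["Unit", "Indicator"],
)
toy = pd.DataFrame([[14.5, 72.0, 21.0, 81.7]], index=["Austria"], columns=cols)

# Both spellings pass the same tuple to .loc.
with_idx = toy.loc[:, key]
with_tuple = toy.loc[:, ("Percentage", ["Employment rate", "Housing expenditure"])]

print(with_idx.equals(with_tuple))
```

The main convenience of `IndexSlice` is the slice syntax: `idx[:, "x"]` is easier to read than `(slice(None), "x")`.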


### `droplevel`

The `droplevel` method removes one level from a multi-index, for either rows or columns.
It takes the level (counting from 0, from the outermost level inward) and the axis (`axis`) as parameters:

In [None]:
dataset_multindex.head()

In [None]:
dataset_multindex.droplevel(0, axis=0).head()

In [None]:
dataset_multindex.droplevel(1, axis=0).head()

In [None]:
dataset_multindex.droplevel(0, axis=1).head()

Note that these calls return new DataFrames and leave the original untouched. To keep the result, assign it to a variable (for example, overwriting the original `DataFrame`).
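
A short sketch of that behaviour on a toy table (values are illustrative); it also shows that the level can be given by name:

```python
import pandas as pd

# Toy table with two levels on each axis.
rows = pd.MultiIndex.from_tuples(
    [("EU", "Austria"), ("SA", "Chile")], names=["Continent", "Country"]
)
cols = pd.MultiIndex.from_tuples(
    [("Percentage", "Employment rate"), ("Years", "Life expectancy")],
    names=["Unit", "Indicator"],
)
toy = pd.DataFrame([[72.0, 81.7], [62.7, 79.9]], index=rows, columns=cols)

# The result must be assigned; `toy` itself keeps both levels.
by_country = toy.droplevel("Continent", axis=0)
by_indicator = toy.droplevel(0, axis=1)

print(toy.index.nlevels, by_country.index.nlevels)
print(by_indicator.columns.tolist())
```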

---

## 7.- Melt

Melting is the inverse of pivoting:

![Melt](./resources/melt.png)

In this case we go back to something similar to the original format of the dataset:

In [91]:
bli_df.head()

Unnamed: 0,Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,...,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,GPD per capita (2018)
0,Australia,5.0,,81.0,12.84,73.0,64.133333,1.1,32759.0,427064.0,...,95.25,,87.25,2.7,411.2,14.35,91.0,92.666667,20.966667,57395.91947
1,Austria,16.0,0.9,85.0,6.59,72.0,80.7,0.466667,33541.0,308325.0,...,92.0,1.6,70.6,1.3,492.8,14.53,80.0,92.0,17.0,51525.04643
2,Belgium,15.0,1.9,77.0,4.703333,63.333333,70.266667,1.033333,30364.0,386006.0,...,92.0,2.2,73.6,2.0,503.8,15.663333,89.0,83.666667,19.3,47491.32326
3,Brazil,10.0,6.7,49.0,7.006667,61.0,35.866667,27.0,,,...,89.25,,,2.2,398.2,,79.0,73.0,16.166667,9001.234249
4,Canada,7.0,0.2,91.333333,3.673333,73.333333,82.5,1.266667,30854.0,423849.0,...,93.25,2.6,87.8,2.9,523.2,14.553333,68.0,91.0,17.333333,46313.17137


In [92]:
bli_df.melt(id_vars=["Country"])

Unnamed: 0,Country,variable,value
0,Australia,Air pollution,5.000000
1,Austria,Air pollution,16.000000
2,Belgium,Air pollution,15.000000
3,Brazil,Air pollution,10.000000
4,Canada,Air pollution,7.000000
...,...,...,...
1020,Sweden,GPD per capita (2018),54589.060390
1021,Switzerland,GPD per capita (2018),82818.108160
1022,Turkey,GPD per capita (2018),9370.176355
1023,United Kingdom,GPD per capita (2018),43043.227820
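
The generic `variable` and `value` column names can be replaced through the `var_name` and `value_name` arguments of `melt`. The sketch below, on a toy wide table with illustrative values, names them like the original dataset and then pivots back to check that the round trip recovers the wide layout:

```python
import pandas as pd

# Toy wide table; the values are illustrative only.
wide = pd.DataFrame(
    {
        "Country": ["Chile", "Canada"],
        "Air pollution": [16.0, 7.0],
        "Water quality": [71.0, 91.0],
    }
)

# Wide to long, naming the two new columns explicitly.
long_df = wide.melt(id_vars=["Country"], var_name="Indicator", value_name="Value")
print(long_df)

# Long back to wide; each (Country, Indicator) pair is unique, so pivot is enough.
back = long_df.pivot(index="Country", columns="Indicator", values="Value")
print(back)
```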


With the `value_vars` argument we can restrict `melt` to only some of the columns.
The same result can also be obtained by first selecting those columns with a simple `.loc` indexer.

In [93]:
bli_df.melt(id_vars=["Country"], value_vars=["Air pollution", "Water quality"])

Unnamed: 0,Country,variable,value
0,Australia,Air pollution,5.000000
1,Austria,Air pollution,16.000000
2,Belgium,Air pollution,15.000000
3,Brazil,Air pollution,10.000000
4,Canada,Air pollution,7.000000
...,...,...,...
77,Sweden,Water quality,96.000000
78,Switzerland,Water quality,95.333333
79,Turkey,Water quality,65.000000
80,United Kingdom,Water quality,83.666667
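
The equivalence between `value_vars` and a prior `.loc` selection can be checked directly. A minimal sketch on a toy table (values are illustrative):

```python
import pandas as pd

# Toy wide table with one column that should be left out of the melt.
wide = pd.DataFrame(
    {
        "Country": ["Chile", "Canada"],
        "Air pollution": [16.0, 7.0],
        "Water quality": [71.0, 91.0],
        "Life expectancy": [79.9, 81.9],
    }
)
selected = ["Air pollution", "Water quality"]

# Option 1: let melt pick the columns.
with_value_vars = wide.melt(id_vars=["Country"], value_vars=selected)

# Option 2: select the columns first, then melt everything that is left.
with_loc = wide.loc[:, ["Country"] + selected].melt(id_vars=["Country"])

print(with_value_vars.equals(with_loc))
```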
