# IIC2433 - Semester project


This notebook was created in Google Colab. On that platform, the following cell must be run to force an update of the plotly library; otherwise, the cells in the map visualization section will not run correctly.

In [None]:
pip install -U plotly



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from functools import reduce
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

## 1. Problem definition and impact


We want to answer the question: "Which communes behave similarly in their response to COVID-19?" To do so, we use data describing the evolution of, and the response to, COVID-19 in each commune of Chile. The data come from the [repository of the Ministry of Science](https://github.com/MinCiencia/Datos-COVID19).

Answering this question can be useful to the authorities of the Ministry of Health and the Ministry of Science, who use the evolution of cases per commune and the incidence rate to decide whether a commune should change phase in the Paso a Paso plan.

Our hypothesis is that communes can be grouped by similar behavior in their COVID-19 figures. Besides total cases and incidence rates, other factors could be considered, such as testing coverage and the number of tests from active case finding.

These groups of similar behavior could serve as the territorial units to which the sanitary measures of the Paso a Paso plan are applied, so that each commune receives the measures best suited to its historical behavior.

We will run the analysis on all communes in the country, and separately on the communes of the Metropolitan Region only.

## 2. Data

We download the COVID-19 data. They are available in a GitHub gist, but they originally come from a [repository of the Ministry of Science](https://github.com/MinCiencia/Datos-COVID19). The links below describe each dataset used:

- [Total cases per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto1)
- [Mobility in Chile per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto33)
- [Deaths per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto38)
- [RT-PCR test positivity rate per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto65)
- [Testing coverage per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto66)
- [Number of tests from active case finding (BAC) per commune](https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto64)

The CSV files used can be found at the URLs in the following cell:


In [None]:
url_total_cases = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto1/Covid-19.csv'
url_mobility = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto33/IndiceDeMovilidad-IM.csv'
url_deceased = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto38/CasosFallecidosPorComuna.csv'
url_PCR_positivity = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto65/PositividadPorComuna.csv'
url_test_coverage = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto66/CoberturaPorComuna.csv'
url_BAC = 'https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto64/BACPorComuna.csv'

## 3. Preprocessing

We load each CSV file into a dataframe

In [None]:
df_total_cases = pd.read_csv(url_total_cases)
df_mobility = pd.read_csv(url_mobility)
df_deceased = pd.read_csv(url_deceased)
df_PCR_positivity = pd.read_csv(url_PCR_positivity)
df_test_coverage = pd.read_csv(url_test_coverage)
df_BAC = pd.read_csv(url_BAC)

We define functions that group the date columns by month, replacing them with monthly averages

In [None]:
def create_regular_expressions():
  # Build one pattern per month of 2020 and 2021, e.g. '2020-03-.*'
  years = ['2020', '2021']
  months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
  regex_expressions = []
  for year in years:
    for month in months:
      regex_expressions.append(year + '-' + month + '-.*')
  return regex_expressions

def group_date_columns(df, column_name):
  reg_expressions = create_regular_expressions()
  # Work on an explicit copy so that adding columns does not trigger SettingWithCopyWarning
  new_df = df[['Region', 'Comuna', 'Codigo comuna', 'Poblacion']].copy()

  for exp in reg_expressions:
    # Create a new column with the mean of all date columns in that month
    new_df[column_name + '_' + exp[:7]] = df.filter(regex=exp).mean(axis=1)

  # Drop the months for which we have no data
  new_df.drop(columns=[f"{column_name}_2020-01", f"{column_name}_2020-02", f"{column_name}_2021-11", f"{column_name}_2021-12"], inplace=True)
  # Fill the remaining missing averages with zeros
  new_df.fillna(0, inplace=True)
  return new_df
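
To make the monthly grouping concrete, here is a minimal sketch on a made-up dataframe (the `toy` frame, its values, and the `cases_` prefix are illustrative assumptions, not the real data). It relies on `DataFrame.filter(regex=...)` selecting the columns whose names match a month pattern, followed by a row-wise mean:

```python
import pandas as pd

# Hypothetical toy data: two communes and three date columns spanning two months
toy = pd.DataFrame({
    "Comuna": ["A", "B"],
    "2020-03-30": [2.0, 10.0],
    "2020-03-31": [4.0, 20.0],
    "2020-04-01": [6.0, 30.0],
})

# filter(regex=...) keeps only the columns whose name matches the month pattern;
# mean(axis=1) then averages those columns for each row (commune)
toy_monthly = pd.DataFrame({
    "Comuna": toy["Comuna"],
    "cases_2020-03": toy.filter(regex="2020-03-.*").mean(axis=1),
    "cases_2020-04": toy.filter(regex="2020-04-.*").mean(axis=1),
})
print(toy_monthly)
```

Each commune ends up with one value per month instead of one value per reporting date, which is what makes the six sources comparable even though they are published on different dates.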

We group the date columns of each dataframe


In [None]:
df_total_cases_info = group_date_columns(df_total_cases, 'total_cases')
df_mobility_info = group_date_columns(df_mobility, 'mobility')
df_deceased_info = group_date_columns(df_deceased, 'deceased')
df_PCR_positivity_info = group_date_columns(df_PCR_positivity, 'PRC_positivity')
df_test_coverage_info = group_date_columns(df_test_coverage, 'test_coverage')
df_BAC_info = group_date_columns(df_BAC, 'BAC')



We merge the dataframes by commune to obtain a single dataframe with all the information


In [None]:
dfs = [df_total_cases_info, df_mobility_info, df_deceased_info, df_PCR_positivity_info, df_test_coverage_info, df_BAC_info]
df_merged = reduce(lambda left,right: pd.merge(left, right, on=['Codigo comuna'], how='inner'), dfs)
df_merged

Unnamed: 0,Region_x,Comuna_x,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Region_y,Comuna_y,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,mobility_2020-12,mobility_2021-01,mobility_2021-02,mobility_2021-03,...,test_coverage_2020-06,test_coverage_2020-07,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,test_coverage_2021-08,test_coverage_2021-09,test_coverage_2021-10,Region_y.1,Comuna_y.1,Poblacion_y.1,BAC_2020-03,BAC_2020-04,BAC_2020-05,BAC_2020-06,BAC_2020-07,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07,BAC_2021-08,BAC_2021-09,BAC_2021-10
0,Arica y Parinacota,Arica,15101.0,247552.0,6.0,112.909091,406.888889,1155.875,3159.444444,6042.000000,7737.250,9294.222222,10332.222222,10842.125,11984.444444,14315.000,17066.000000,20570.111111,23859.000000,26726.375,28812.000000,29463.666667,29905.000,30671.444444,Arica y Parinacota,Arica,247552.0,5.394165,4.092249,4.283068,4.973705,5.049779,5.235622,5.868555,5.983744,6.312275,0.0,0.0,0.0,0.0,...,0.0,0.0,95.35,95.825,94.800,94.66,94.400,93.675,95.625,94.36,92.075,93.78,88.825,90.650,0.0,0.0,0.0,Arica y Parinacota,Arica,247552.0,0.0,0.0,0.0,0.0,0.0,60.60,66.975,64.700,73.38,78.525,81.650,83.025,84.74,85.300,87.16,86.425,87.700,0.0,0.0,0.0
1,Arica y Parinacota,Camarones,15102.0,1233.0,0.0,0.000000,0.000000,0.000,8.666667,24.333333,26.375,27.888889,28.000000,27.500,28.555556,30.375,39.666667,50.777778,57.222222,61.375,65.444444,67.444444,68.500,70.888889,Arica y Parinacota,Camarones,1233.0,19.541699,7.755541,6.572208,8.568462,9.214640,9.920844,10.699744,7.944169,5.325128,0.0,0.0,0.0,0.0,...,0.0,0.0,93.75,91.150,95.225,96.40,97.725,93.425,100.000,96.68,96.750,99.62,96.650,97.450,0.0,0.0,0.0,Arica y Parinacota,Camarones,1233.0,0.0,0.0,0.0,0.0,0.0,87.50,61.975,96.900,95.42,95.625,99.225,98.450,94.50,96.150,95.04,95.950,96.875,0.0,0.0,0.0
2,Arica y Parinacota,General Lagos,15202.0,810.0,0.0,0.000000,0.000000,0.000,0.333333,35.222222,63.500,64.000000,64.000000,64.000,64.000000,64.250,65.000000,69.666667,78.444444,79.500,86.888889,87.000000,88.000,88.000000,Arica y Parinacota,General Lagos,810.0,1.352493,0.990023,0.953350,0.926154,0.796030,0.758809,1.180513,0.720099,0.811795,0.0,0.0,0.0,0.0,...,0.0,0.0,100.00,100.000,91.675,100.00,75.000,75.000,75.000,53.34,67.150,97.50,90.825,94.225,0.0,0.0,0.0,Arica y Parinacota,General Lagos,810.0,0.0,0.0,0.0,0.0,0.0,25.00,87.775,66.675,80.00,41.650,75.000,58.350,60.00,61.050,97.14,96.075,97.050,0.0,0.0,0.0
3,Arica y Parinacota,Putre,15201.0,2515.0,0.0,0.000000,0.000000,4.500,27.333333,50.444444,60.500,70.222222,72.444444,73.000,81.000000,117.000,128.333333,156.888889,169.555556,182.625,195.000000,200.777778,207.625,214.000000,Arica y Parinacota,Putre,2515.0,3.936189,2.406134,2.915727,3.678952,3.983215,3.855582,4.908582,5.249672,5.750949,0.0,0.0,0.0,0.0,...,0.0,0.0,99.00,99.250,98.300,97.26,98.825,87.675,99.025,99.36,98.750,93.70,90.325,89.725,0.0,0.0,0.0,Arica y Parinacota,Putre,2515.0,0.0,0.0,0.0,0.0,0.0,89.45,94.000,93.900,97.48,98.275,95.150,82.875,94.46,87.275,95.94,90.325,87.525,0.0,0.0,0.0
4,Tarapacá,Alto Hospicio,1107.0,129999.0,0.0,17.000000,338.666667,1433.500,2250.000000,2946.000000,3571.875,4006.666667,4394.888889,4856.500,6370.777778,8229.625,9518.888889,11301.000000,13167.222222,14330.250,15055.888889,15577.333333,15848.000,16063.444444,Tarapacá,Alto Hospicio,129999.0,5.578910,4.598310,4.560372,4.267630,4.776936,5.534276,6.442598,6.629997,6.612584,0.0,0.0,0.0,0.0,...,0.0,0.0,94.40,95.525,92.775,95.26,90.800,89.150,97.800,95.74,97.000,95.74,92.450,90.800,0.0,0.0,0.0,Tarapaca,Alto Hospicio,129999.0,0.0,0.0,0.0,0.0,0.0,74.30,80.625,76.400,79.96,77.450,74.250,79.725,74.78,78.675,82.96,84.025,86.075,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,Magallanes,Punta Arenas,12101.0,141984.0,29.0,357.454545,871.333333,1128.750,1458.555556,2383.444444,5609.250,9908.333333,12041.333333,13850.000,15823.333333,17914.500,19381.222222,20902.222222,23277.444444,24438.375,24914.222222,25057.777778,25148.625,25178.222222,Magallanes y de la Antártica Chilena,Punta Arenas,141984.0,5.898374,3.599590,4.357672,5.160808,6.025074,5.543681,6.000764,5.383628,5.763862,0.0,0.0,0.0,0.0,...,0.0,0.0,97.00,92.075,95.500,97.44,97.325,92.400,96.875,97.32,97.950,95.38,92.975,93.850,0.0,0.0,0.0,Magallanes y la Antartica,Punta Arenas,141984.0,0.0,0.0,0.0,0.0,0.0,32.05,33.875,46.575,65.06,63.750,58.700,68.175,74.62,70.525,77.42,81.725,81.650,0.0,0.0,0.0
340,Magallanes,Rio Verde,12103.0,211.0,0.0,0.000000,0.000000,0.000,0.000000,0.000000,0.500,1.777778,2.888889,3.000,4.666667,7.000,8.777778,10.222222,12.666667,13.000,16.888889,17.000000,17.000,17.000000,Magallanes y de la Antártica Chilena,Río Verde,211.0,6.101632,4.277943,4.331606,4.348359,4.013204,3.209761,4.007599,4.067859,4.397927,0.0,0.0,0.0,0.0,...,0.0,0.0,100.00,37.500,66.675,40.00,50.000,50.000,75.000,40.00,75.000,57.50,50.000,50.000,0.0,0.0,0.0,Magallanes y la Antartica,Rio Verde,211.0,0.0,0.0,0.0,0.0,0.0,50.00,50.000,25.000,20.00,0.000,8.325,58.325,70.00,62.500,52.50,25.000,79.175,0.0,0.0,0.0
341,Magallanes,San Gregorio,12104.0,681.0,0.0,0.000000,1.555556,2.000,2.000000,2.000000,2.000,15.666667,26.333333,30.000,48.555556,51.375,61.555556,66.111111,69.000000,70.375,73.444444,74.000000,74.000,74.000000,Magallanes y de la Antártica Chilena,San Gregorio,681.0,5.976834,3.548821,3.626463,3.738194,3.565583,3.084164,3.976961,3.557203,4.407598,0.0,0.0,0.0,0.0,...,0.0,0.0,100.00,100.000,98.875,100.00,75.000,100.000,75.000,80.00,100.000,60.00,50.000,100.000,0.0,0.0,0.0,Magallanes y la Antartica,San Gregorio,681.0,0.0,0.0,0.0,0.0,0.0,83.35,55.075,73.800,37.24,45.825,94.650,90.000,19.86,54.175,70.00,41.675,70.825,0.0,0.0,0.0
342,Magallanes,Timaukel,12303.0,282.0,0.0,0.000000,0.000000,0.000,0.000000,0.000000,0.375,13.333333,20.222222,20.375,23.222222,35.750,38.555556,39.000000,41.555556,42.000,42.000000,42.000000,42.000,42.000000,Magallanes y de la Antártica Chilena,Timaukel,282.0,0.463772,0.209580,0.236634,0.222611,0.209790,0.186330,0.222145,0.204827,0.305594,0.0,0.0,0.0,0.0,...,0.0,0.0,50.00,28.400,99.175,80.00,75.000,100.000,100.000,76.36,75.000,98.66,100.000,68.750,0.0,0.0,0.0,Magallanes y la Antartica,Timaukel,282.0,0.0,0.0,0.0,0.0,0.0,50.00,33.325,81.475,77.50,71.875,88.750,83.825,43.34,70.825,86.68,75.000,44.225,0.0,0.0,0.0
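
As an illustration of how `reduce` chains the pairwise merges, here is a small sketch with three made-up frames (the frames `a`, `b`, `c` and their values are assumptions for the example). With `how='inner'`, a commune code survives only if it is present in every frame:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames sharing the key column 'Codigo comuna'
a = pd.DataFrame({"Codigo comuna": [1, 2, 3], "cases": [10, 20, 30]})
b = pd.DataFrame({"Codigo comuna": [1, 2], "deceased": [1, 2]})
c = pd.DataFrame({"Codigo comuna": [2, 3, 1], "mobility": [0.5, 0.7, 0.9]})

# reduce applies the merge cumulatively: merge(merge(a, b), c)
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Codigo comuna", how="inner"),
    [a, b, c],
)
print(merged)
```

Code 3 is missing from `b`, so it is dropped from the result. The same thing would happen to any commune absent from one of the six real sources.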


We create the dataframe that will hold all the information and remove the duplicated region and commune columns.

In [None]:
df_master = df_merged
df_master.drop(columns=['Region_y', 'Comuna_y'], inplace=True)
# Make repeated column names unique by appending .1, .2, ... to the repeats,
# without relying on pandas' private ParserBase._maybe_dedup_names helper
seen = {}
dedup_names = []
for name in df_master.columns:
  count = seen.get(name, 0)
  dedup_names.append(name if count == 0 else f"{name}.{count}")
  seen[name] = count + 1
df_master.columns = dedup_names
df_master.drop(columns=['Region_x.1', 'Region_x.2', 'Comuna_x.1', 'Comuna_x.2'], inplace=True)
df_master.rename(columns={'Region_x': 'Region', 'Comuna_x': 'Comuna'}, inplace=True)
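
A possible alternative, sketched here as an assumption and not as the approach this notebook takes, is to avoid the duplicated columns in the first place. Only the first frame keeps its identifying columns, and the others drop them before merging. This also sidesteps the differently spelled region names visible in the merged output (for example `Tarapacá` and `Tarapaca`). The three small frames below are made up for illustration:

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature versions of three per-commune dataframes
cases = pd.DataFrame({"Region": ["R1", "R2"], "Comuna": ["A", "B"],
                      "Codigo comuna": [1, 2], "Poblacion": [100, 200],
                      "total_cases_2020-03": [1.0, 2.0]})
deaths = pd.DataFrame({"Region": ["R1", "R2"], "Comuna": ["A", "B"],
                       "Codigo comuna": [1, 2], "Poblacion": [100, 200],
                       "deceased_2020-06": [0.0, 1.0]})
bac = pd.DataFrame({"Region": ["R1", "R2"], "Comuna": ["A", "B"],
                    "Codigo comuna": [1, 2], "Poblacion": [100, 200],
                    "BAC_2020-08": [50.0, 60.0]})

id_cols = ["Region", "Comuna", "Poblacion"]
frames = [cases, deaths, bac]
# Keep the identifying columns only in the first frame
slim = [frames[0]] + [frame.drop(columns=id_cols) for frame in frames[1:]]
merged_clean = reduce(
    lambda left, right: pd.merge(left, right, on="Codigo comuna", how="inner"),
    slim,
)
print(merged_clean.columns.tolist())
```

With this variant no `_x`/`_y` suffixes appear, so no renaming or de-duplication step is needed afterwards.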

In [None]:
# Check that no commune name starts with "Desconocido" (unknown) any more
df_merged[df_merged['Comuna'].str.startswith('Desconocido')]

Unnamed: 0,Region,Comuna,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,mobility_2020-12,mobility_2021-01,mobility_2021-02,mobility_2021-03,mobility_2021-04,mobility_2021-05,...,test_coverage_2020-04,test_coverage_2020-05,test_coverage_2020-06,test_coverage_2020-07,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,test_coverage_2021-08,test_coverage_2021-09,test_coverage_2021-10,Poblacion_y.2,BAC_2020-03,BAC_2020-04,BAC_2020-05,BAC_2020-06,BAC_2020-07,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07,BAC_2021-08,BAC_2021-09,BAC_2021-10


In [None]:
# There are no null values
rows_with_NaN = df_merged[df_merged.isnull().any(axis=1)]
rows_with_NaN

Unnamed: 0,Region,Comuna,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,mobility_2020-12,mobility_2021-01,mobility_2021-02,mobility_2021-03,mobility_2021-04,mobility_2021-05,...,test_coverage_2020-04,test_coverage_2020-05,test_coverage_2020-06,test_coverage_2020-07,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,test_coverage_2021-08,test_coverage_2021-09,test_coverage_2021-10,Poblacion_y.2,BAC_2020-03,BAC_2020-04,BAC_2020-05,BAC_2020-06,BAC_2020-07,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07,BAC_2021-08,BAC_2021-09,BAC_2021-10


In [None]:
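# Turn zeros into NaN so that columns containing only zeros (months without data) can be dropped
df_2 = df_master.replace(0,np.nan).dropna(axis=1,how="all")
# Restore the zeros in the columns that were kept
df_3 = df_2.replace(np.nan,0).dropna(axis=1,how="all")
df_master = df_3
# Note: drop() returns a new dataframe and the result is not assigned here, so 'Poblacion_y.1' stays in df_master
df_master.drop(columns=['Poblacion_y.1'])
df_master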

Unnamed: 0,Region,Comuna,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
0,Arica y Parinacota,Arica,15101.0,247552.0,6.0,112.909091,406.888889,1155.875,3159.444444,6042.000000,7737.250,9294.222222,10332.222222,10842.125,11984.444444,14315.000,17066.000000,20570.111111,23859.000000,26726.375,28812.000000,29463.666667,29905.000,30671.444444,247552.0,5.394165,4.092249,4.283068,4.973705,5.049779,5.235622,5.868555,5.983744,6.312275,247552.0,13.4,36.111111,89.000000,133.375,166.777778,...,247552.0,11.55,7.625,7.200,3.48,2.550,5.650,7.050,6.34,6.950,5.68,5.800,3.575,1.7,247552.0,95.35,95.825,94.800,94.66,94.400,93.675,95.625,94.36,92.075,93.78,88.825,90.650,247552.0,60.60,66.975,64.700,73.38,78.525,81.650,83.025,84.74,85.300,87.16,86.425,87.700
1,Arica y Parinacota,Camarones,15102.0,1233.0,0.0,0.000000,0.000000,0.000,8.666667,24.333333,26.375,27.888889,28.000000,27.500,28.555556,30.375,39.666667,50.777778,57.222222,61.375,65.444444,67.444444,68.500,70.888889,1233.0,19.541699,7.755541,6.572208,8.568462,9.214640,9.920844,10.699744,7.944169,5.325128,1233.0,0.0,0.000000,0.888889,1.000,1.000000,...,1233.0,2.40,12.500,0.375,0.00,0.000,1.425,1.675,2.50,3.825,1.92,2.675,1.550,0.0,1233.0,93.75,91.150,95.225,96.40,97.725,93.425,100.000,96.68,96.750,99.62,96.650,97.450,1233.0,87.50,61.975,96.900,95.42,95.625,99.225,98.450,94.50,96.150,95.04,95.950,96.875
2,Arica y Parinacota,General Lagos,15202.0,810.0,0.0,0.000000,0.000000,0.000,0.333333,35.222222,63.500,64.000000,64.000000,64.000,64.000000,64.250,65.000000,69.666667,78.444444,79.500,86.888889,87.000000,88.000,88.000000,810.0,1.352493,0.990023,0.953350,0.926154,0.796030,0.758809,1.180513,0.720099,0.811795,810.0,0.0,0.000000,0.222222,1.000,1.000000,...,810.0,25.00,5.275,0.000,0.00,0.000,0.000,8.325,0.00,18.275,12.50,3.675,8.825,0.0,810.0,100.00,100.000,91.675,100.00,75.000,75.000,75.000,53.34,67.150,97.50,90.825,94.225,810.0,25.00,87.775,66.675,80.00,41.650,75.000,58.350,60.00,61.050,97.14,96.075,97.050
3,Arica y Parinacota,Putre,15201.0,2515.0,0.0,0.000000,0.000000,4.500,27.333333,50.444444,60.500,70.222222,72.444444,73.000,81.000000,117.000,128.333333,156.888889,169.555556,182.625,195.000000,200.777778,207.625,214.000000,2515.0,3.936189,2.406134,2.915727,3.678952,3.983215,3.855582,4.908582,5.249672,5.750949,2515.0,0.0,0.000000,0.000000,0.000,0.000000,...,2515.0,6.20,4.825,1.275,0.24,0.000,3.350,7.300,2.34,7.675,1.74,2.400,0.975,20.0,2515.0,99.00,99.250,98.300,97.26,98.825,87.675,99.025,99.36,98.750,93.70,90.325,89.725,2515.0,89.45,94.000,93.900,97.48,98.275,95.150,82.875,94.46,87.275,95.94,90.325,87.525
4,Tarapacá,Alto Hospicio,1107.0,129999.0,0.0,17.000000,338.666667,1433.500,2250.000000,2946.000000,3571.875,4006.666667,4394.888889,4856.500,6370.777778,8229.625,9518.888889,11301.000000,13167.222222,14330.250,15055.888889,15577.333333,15848.000,16063.444444,129999.0,5.578910,4.598310,4.560372,4.267630,4.776936,5.534276,6.442598,6.629997,6.612584,129999.0,18.2,35.444444,51.111111,55.625,57.333333,...,129999.0,9.40,4.150,3.675,3.60,7.625,15.725,11.225,10.34,11.950,7.40,4.175,2.000,1.1,129999.0,94.40,95.525,92.775,95.26,90.800,89.150,97.800,95.74,97.000,95.74,92.450,90.800,129999.0,74.30,80.625,76.400,79.96,77.450,74.250,79.725,74.78,78.675,82.96,84.025,86.075
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,Magallanes,Punta Arenas,12101.0,141984.0,29.0,357.454545,871.333333,1128.750,1458.555556,2383.444444,5609.250,9908.333333,12041.333333,13850.000,15823.333333,17914.500,19381.222222,20902.222222,23277.444444,24438.375,24914.222222,25057.777778,25148.625,25178.222222,141984.0,5.898374,3.599590,4.357672,5.160808,6.025074,5.543681,6.000764,5.383628,5.763862,141984.0,17.2,26.333333,32.222222,59.500,108.555556,...,141984.0,21.50,28.625,22.125,11.82,11.700,14.150,11.900,7.54,10.150,8.86,4.050,1.375,0.1,141984.0,97.00,92.075,95.500,97.44,97.325,92.400,96.875,97.32,97.950,95.38,92.975,93.850,141984.0,32.05,33.875,46.575,65.06,63.750,58.700,68.175,74.62,70.525,77.42,81.725,81.650
340,Magallanes,Rio Verde,12103.0,211.0,0.0,0.000000,0.000000,0.000,0.000000,0.000000,0.500,1.777778,2.888889,3.000,4.666667,7.000,8.777778,10.222222,12.666667,13.000,16.888889,17.000000,17.000,17.000000,211.0,6.101632,4.277943,4.331606,4.348359,4.013204,3.209761,4.007599,4.067859,4.397927,211.0,0.0,0.000000,0.000000,0.000,0.000000,...,211.0,0.00,0.000,37.500,0.00,25.000,0.000,14.575,10.00,37.500,5.00,0.000,20.825,0.0,211.0,100.00,37.500,66.675,40.00,50.000,50.000,75.000,40.00,75.000,57.50,50.000,50.000,211.0,50.00,50.000,25.000,20.00,0.000,8.325,58.325,70.00,62.500,52.50,25.000,79.175
341,Magallanes,San Gregorio,12104.0,681.0,0.0,0.000000,1.555556,2.000,2.000000,2.000000,2.000,15.666667,26.333333,30.000,48.555556,51.375,61.555556,66.111111,69.000000,70.375,73.444444,74.000000,74.000,74.000000,681.0,5.976834,3.548821,3.626463,3.738194,3.565583,3.084164,3.976961,3.557203,4.407598,681.0,0.0,0.000000,0.000000,0.000,0.000000,...,681.0,0.00,6.250,30.300,23.96,25.000,41.975,25.000,48.14,33.350,0.00,33.325,16.675,0.0,681.0,100.00,100.000,98.875,100.00,75.000,100.000,75.000,80.00,100.000,60.00,50.000,100.000,681.0,83.35,55.075,73.800,37.24,45.825,94.650,90.000,19.86,54.175,70.00,41.675,70.825
342,Magallanes,Timaukel,12303.0,282.0,0.0,0.000000,0.000000,0.000,0.000000,0.000000,0.375,13.333333,20.222222,20.375,23.222222,35.750,38.555556,39.000000,41.555556,42.000,42.000000,42.000000,42.000,42.000000,282.0,0.463772,0.209580,0.236634,0.222611,0.209790,0.186330,0.222145,0.204827,0.305594,282.0,0.0,0.000000,0.000000,0.000,0.000000,...,282.0,0.00,16.675,23.575,0.00,3.125,4.175,15.175,10.00,0.000,13.34,0.000,0.000,0.0,282.0,50.00,28.400,99.175,80.00,75.000,100.000,100.000,76.36,75.000,98.66,100.000,68.750,282.0,50.00,33.325,81.475,77.50,71.875,88.750,83.825,43.34,70.825,86.68,75.000,44.225
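
The zero-to-NaN round trip used above can be checked on a made-up frame (the `toy` data is an assumption for illustration). Columns that contain only zeros disappear, while zeros inside other columns are preserved:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'b' holds only zeros, the others hold some zeros
toy = pd.DataFrame({"a": [0.0, 1.5], "b": [0.0, 0.0], "c": [2.0, 0.0]})

# Zeros become NaN, all-NaN columns are dropped, remaining NaN go back to zero
cleaned = toy.replace(0, np.nan).dropna(axis=1, how="all").fillna(0)
print(cleaned)
```

One caveat of this trick is that it cannot tell a month without data apart from a month in which every commune genuinely reported zero. Both are removed.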


We compute summary statistics of the data to be used

In [None]:
df_master.describe()

Unnamed: 0,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,deceased_2020-11,deceased_2020-12,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
count,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,...,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0
mean,9019.200581,56562.534884,5.630814,22.123943,130.935724,536.087209,991.599483,1206.832687,1391.996366,1561.179587,1714.636951,1874.363735,2155.018411,2528.626817,2996.215116,3736.562339,4423.124677,5113.384448,5496.968023,5632.138243,5697.401163,5783.443475,56562.534884,6.216846,4.986945,5.041564,5.079379,5.601002,6.17427,6.968629,7.377583,7.965244,56562.534884,12.139535,24.450258,31.379199,36.031613,40.271964,44.909561,48.387718,...,56562.534884,7.006977,6.262282,6.111846,4.788488,6.381613,9.588953,10.003198,11.417791,13.243823,12.435174,11.100581,5.656323,1.393605,56562.534884,89.125581,91.92064,93.405233,93.733488,93.622238,91.98561,94.560465,94.411628,93.462573,91.032267,88.718387,87.5875,56562.534884,54.92907,56.267151,58.826744,59.988721,61.715698,60.549419,64.497529,63.5075,64.697674,67.26436,66.104506,68.369985
std,3823.451166,89102.350327,18.997477,56.790865,339.454712,1399.86082,2377.910427,2684.032176,2912.474765,3105.640476,3271.538333,3465.479206,3829.183794,4333.177828,4984.805655,6117.032714,7188.307547,8242.588237,8798.899973,8997.492915,9103.872267,9253.3907,89102.350327,3.44893,2.798961,2.704601,2.685504,3.024791,3.465312,4.144899,4.556876,4.925763,89102.350327,32.731078,62.661557,76.203552,83.763135,90.50396,97.407013,102.028276,...,89102.350327,5.688672,4.349964,6.611647,3.955982,5.076887,5.594283,4.967489,5.662282,6.078371,5.985183,5.13014,3.410303,1.952681,89102.350327,14.403157,9.32606,7.548798,7.097548,7.196332,6.907376,7.462136,9.633144,8.445428,9.819404,11.590342,12.066665,89102.350327,19.765269,16.153351,16.416445,15.305237,15.899684,16.283119,15.080041,14.41485,14.573193,14.15872,14.193924,14.482872
min,1101.0,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,211.0,0.00452,0.001375,0.000204,0.001055,0.001225,0.002144,0.002004,0.001123,0.002532,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,211.0,0.0,25.0,25.0,40.0,25.0,25.0,0.0,0.0,0.0,36.4,25.0,8.1,211.0,0.0,0.0,0.0,0.0,0.0,8.325,0.0,19.86,0.0,24.88,16.45,8.325
25%,6108.75,10013.75,0.0,0.272727,3.194444,16.0625,38.861111,60.472222,95.40625,133.305556,167.638889,202.15625,257.666667,309.34375,424.083333,568.277778,717.638889,821.125,901.194444,913.972222,922.125,936.694444,10013.75,4.119785,3.311728,3.442023,3.516485,3.764414,4.210003,4.581216,4.583536,4.782316,10013.75,0.0,0.0,0.444444,1.0,1.527778,2.0,2.90625,...,10013.75,2.95,3.61875,2.775,2.055,3.125,5.81875,6.9375,8.03,9.29375,8.39,7.78125,3.5,0.0,10013.75,87.175,90.41875,91.96875,92.465,92.175,89.89375,93.14375,94.295,91.94375,88.62,83.91875,83.05625,10013.75,42.8125,45.73125,47.71875,50.295,52.15,49.9875,54.89375,54.895,56.925,59.1,57.66875,61.0125
50%,8312.5,20147.0,0.0,2.909091,11.111111,50.4375,117.055556,174.888889,243.3125,329.055556,388.277778,438.75,542.5,698.3125,920.666667,1212.444444,1419.888889,1706.4375,1861.166667,1932.444444,1962.6875,1991.222222,20147.0,5.843537,4.696964,4.93273,5.01632,5.474358,5.869026,6.430196,6.722275,7.309826,20147.0,0.4,1.888889,3.0,4.0625,5.555556,6.888889,8.1875,...,20147.0,6.15,5.3625,4.625,3.65,4.9875,8.3,9.2,10.66,12.6125,12.02,10.625,4.975,0.8,20147.0,93.75,94.2375,95.4625,95.42,95.4875,93.2625,95.95,96.41,96.4125,95.03,92.8875,91.275,20147.0,57.0,57.575,59.65,60.04,62.2875,62.3125,66.9125,63.87,66.225,68.98,66.6625,69.8125
75%,13103.25,56106.5,0.0,13.977273,51.861111,196.0,475.722222,670.5,972.1875,1234.5,1393.083333,1530.0,1724.805556,2304.78125,2871.527778,3458.833333,4154.777778,4798.59375,5289.111111,5455.305556,5524.5,5607.166667,56106.5,7.923251,6.429449,6.582111,6.680965,7.346519,8.136359,9.02504,9.721312,10.379841,56106.5,3.2,7.333333,12.916667,16.09375,20.138889,26.0,28.9375,...,56106.5,9.4,8.35,7.6625,7.185,9.08125,12.73125,12.25625,14.36,16.75625,16.125,13.85625,7.18125,2.0,56106.5,97.1625,96.98125,97.40625,97.42,97.50625,95.925,97.65625,97.865,97.83125,97.285,96.625,96.04375,56106.5,68.05,67.85,70.90625,70.715,73.39375,72.18125,74.675,73.42,74.6,77.365,75.9875,78.5125
max,16305.0,645909.0,181.0,470.090909,2841.222222,12220.5,21040.555556,23505.222222,25339.125,26725.777778,27891.666667,29272.5,31411.555556,33915.5,37799.0,46720.222222,57028.888889,66958.0,71810.444444,73442.777778,74324.5,75436.888889,645909.0,26.047198,23.668003,23.930282,23.351062,26.377843,31.691633,38.979188,38.43873,40.366021,645909.0,232.6,470.666667,577.333333,643.625,707.444444,761.888889,795.375,...,645909.0,29.65,28.625,87.5,24.48,31.975,41.975,29.15,48.14,42.875,43.14,33.325,22.6,20.0,645909.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,645909.0,91.85,94.0,100.0,97.48,98.275,99.225,98.45,96.14,100.0,100.0,98.375,97.05


We divide the case and death columns by each comuna's population.

In [None]:
# Columns holding absolute counts (cases and deaths) are converted to per-capita rates
count_columns = [c for c in df_master.columns if 'cases' in c or 'deceased' in c]
# Row-wise division by the population of each comuna
df_master[count_columns] = df_master[count_columns].div(df_master['Poblacion_x'], axis=0)
  

In [None]:
df_master.describe()

Unnamed: 0,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,deceased_2020-11,deceased_2020-12,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
count,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,...,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0,344.0
mean,9019.200581,56562.534884,4.4e-05,0.000262,0.001262,0.004946,0.010249,0.013716,0.016946,0.020497,0.023716,0.026974,0.032751,0.040773,0.049799,0.062727,0.075091,0.087499,0.094899,0.097482,0.098596,0.099907,56562.534884,6.216846,4.986945,5.041564,5.079379,5.601002,6.17427,6.968629,7.377583,7.965244,56562.534884,0.000104,0.000221,0.000309,0.000377,0.000438,0.000507,0.000562,...,56562.534884,7.006977,6.262282,6.111846,4.788488,6.381613,9.588953,10.003198,11.417791,13.243823,12.435174,11.100581,5.656323,1.393605,56562.534884,89.125581,91.92064,93.405233,93.733488,93.622238,91.98561,94.560465,94.411628,93.462573,91.032267,88.718387,87.5875,56562.534884,54.92907,56.267151,58.826744,59.988721,61.715698,60.549419,64.497529,63.5075,64.697674,67.26436,66.104506,68.369985
std,3823.451166,89102.350327,0.000168,0.000509,0.001857,0.006436,0.011243,0.013129,0.014052,0.014426,0.015005,0.016038,0.018009,0.020697,0.022007,0.023481,0.025782,0.027579,0.028581,0.029037,0.029158,0.029103,89102.350327,3.44893,2.798961,2.704601,2.685504,3.024791,3.465312,4.144899,4.556876,4.925763,89102.350327,0.000201,0.000348,0.000419,0.000456,0.000477,0.000498,0.000509,...,89102.350327,5.688672,4.349964,6.611647,3.955982,5.076887,5.594283,4.967489,5.662282,6.078371,5.985183,5.13014,3.410303,1.952681,89102.350327,14.403157,9.32606,7.548798,7.097548,7.196332,6.907376,7.462136,9.633144,8.445428,9.819404,11.590342,12.066665,89102.350327,19.765269,16.153351,16.416445,15.305237,15.899684,16.283119,15.080041,14.41485,14.573193,14.15872,14.193924,14.482872
min,1101.0,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000968,0.000968,0.000968,0.000968,0.000968,0.000968,0.000968,0.000968,0.000968,0.000968,211.0,0.00452,0.001375,0.000204,0.001055,0.001225,0.002144,0.002004,0.001123,0.002532,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,211.0,0.0,25.0,25.0,40.0,25.0,25.0,0.0,0.0,0.0,36.4,25.0,8.1,211.0,0.0,0.0,0.0,0.0,0.0,8.325,0.0,19.86,0.0,24.88,16.45,8.325
25%,6108.75,10013.75,0.0,1.5e-05,0.000209,0.001149,0.003056,0.005107,0.007819,0.011383,0.014464,0.017364,0.022032,0.02833,0.035727,0.04727,0.058333,0.069509,0.076941,0.079015,0.080311,0.081679,10013.75,4.119785,3.311728,3.442023,3.516485,3.764414,4.210003,4.581216,4.583536,4.782316,10013.75,0.0,0.0,2.8e-05,6.4e-05,0.000109,0.00016,0.000223,...,10013.75,2.95,3.61875,2.775,2.055,3.125,5.81875,6.9375,8.03,9.29375,8.39,7.78125,3.5,0.0,10013.75,87.175,90.41875,91.96875,92.465,92.175,89.89375,93.14375,94.295,91.94375,88.62,83.91875,83.05625,10013.75,42.8125,45.73125,47.71875,50.295,52.15,49.9875,54.89375,54.895,56.925,59.1,57.66875,61.0125
50%,8312.5,20147.0,0.0,0.000116,0.000566,0.002453,0.006315,0.009987,0.013463,0.017113,0.019844,0.023176,0.029169,0.037407,0.045358,0.058377,0.071825,0.08474,0.091894,0.095013,0.09571,0.096999,20147.0,5.843537,4.696964,4.93273,5.01632,5.474358,5.869026,6.430196,6.722275,7.309826,20147.0,1.9e-05,8.5e-05,0.000157,0.000218,0.000283,0.000356,0.000403,...,20147.0,6.15,5.3625,4.625,3.65,4.9875,8.3,9.2,10.66,12.6125,12.02,10.625,4.975,0.8,20147.0,93.75,94.2375,95.4625,95.42,95.4875,93.2625,95.95,96.41,96.4125,95.03,92.8875,91.275,20147.0,57.0,57.575,59.65,60.04,62.2875,62.3125,66.9125,63.87,66.225,68.98,66.6625,69.8125
75%,13103.25,56106.5,0.0,0.000339,0.001361,0.004905,0.011747,0.016822,0.020855,0.025206,0.028884,0.032637,0.038951,0.047919,0.060073,0.074285,0.088493,0.101741,0.108294,0.111116,0.111991,0.113563,56106.5,7.923251,6.429449,6.582111,6.680965,7.346519,8.136359,9.02504,9.721312,10.379841,56106.5,9.6e-05,0.000226,0.000371,0.00048,0.000608,0.000693,0.000763,...,56106.5,9.4,8.35,7.6625,7.185,9.08125,12.73125,12.25625,14.36,16.75625,16.125,13.85625,7.18125,2.0,56106.5,97.1625,96.98125,97.40625,97.42,97.50625,95.925,97.65625,97.865,97.83125,97.285,96.625,96.04375,56106.5,68.05,67.85,70.90625,70.715,73.39375,72.18125,74.675,73.42,74.6,77.365,75.9875,78.5125
max,16305.0,645909.0,0.002521,0.007014,0.012882,0.033412,0.054201,0.082462,0.083624,0.083624,0.084808,0.105037,0.140864,0.155995,0.168096,0.180198,0.203996,0.227404,0.232884,0.234907,0.235388,0.235542,645909.0,26.047198,23.668003,23.930282,23.351062,26.377843,31.691633,38.979188,38.43873,40.366021,645909.0,0.001309,0.001623,0.002006,0.002201,0.00225,0.002348,0.002448,...,645909.0,29.65,28.625,87.5,24.48,31.975,41.975,29.15,48.14,42.875,43.14,33.325,22.6,20.0,645909.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,645909.0,91.85,94.0,100.0,97.48,98.275,99.225,98.45,96.14,100.0,100.0,98.375,97.05


The final dataframe has 90 columns and 344 rows.

To get a sense of how the data is distributed, we will show the evolution of total cases.

We build the dataframe for the analysis of the Metropolitan Region.

In [None]:
df_master_metropolitana = df_master[df_master['Region'] == 'Metropolitana']
df_master_metropolitana

Unnamed: 0,Region,Comuna,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
82,Metropolitana,Alhue,13502.0,7405.0,0.0,0.0,0.00015,0.004254,0.019281,0.033596,0.04578,0.050416,0.054333,0.055689,0.056313,0.057984,0.063306,0.079286,0.088214,0.097367,0.104524,0.10706,0.107529,0.10889,7405.0,0.109454,0.063551,0.058146,0.052381,0.062935,0.08191,0.093184,0.136668,0.083333,7405.0,8.1e-05,0.000345,0.00051,0.000625,0.00075,...,7405.0,10.3,11.0,4.625,2.5,0.85,3.375,7.325,11.58,11.875,5.96,7.2,4.775,4.0,7405.0,81.55,85.85,92.025,95.7,87.25,86.0,93.775,94.78,89.1,59.26,62.0,60.7,7405.0,80.05,75.525,71.725,69.5,76.2,67.775,67.625,63.08,74.475,68.34,76.725,79.425
83,Metropolitana,Buin,13402.0,109641.0,6.4e-05,0.000307,0.002203,0.00998,0.01908,0.022795,0.0258,0.028353,0.030001,0.0316,0.034385,0.038397,0.044763,0.056371,0.06869,0.085428,0.094621,0.097603,0.099428,0.101029,109641.0,6.959855,6.230245,6.065094,5.666719,6.023731,6.857551,7.575264,8.360444,9.627637,109641.0,0.000296,0.000592,0.000736,0.000832,0.000954,...,109641.0,7.35,6.075,3.975,3.34,4.625,6.75,9.5,11.76,13.45,11.52,12.05,4.775,2.1,109641.0,86.1,91.875,90.975,90.3,92.975,87.1,92.475,90.22,89.6,85.62,81.75,86.925,109641.0,49.9,56.475,58.125,50.68,55.925,57.05,59.45,55.86,56.975,65.22,61.6,61.675
84,Metropolitana,Calera de Tango,13403.0,28525.0,0.00021,0.000357,0.001317,0.007778,0.018027,0.020769,0.022805,0.023722,0.024739,0.026144,0.027882,0.030324,0.035653,0.049126,0.062421,0.073659,0.081726,0.08446,0.086157,0.088628,28525.0,8.838029,9.207986,6.843134,7.025447,8.467747,9.127492,10.03902,11.6984,11.735486,28525.0,4.2e-05,0.00021,0.000304,0.000386,0.000448,...,28525.0,6.15,3.425,2.35,2.32,2.7,4.0,3.4,8.94,14.9,12.18,11.525,7.65,2.1,28525.0,92.65,91.625,89.875,90.72,90.8,89.1,89.975,94.44,94.9,88.3,91.1,92.35,28525.0,64.4,42.425,49.425,41.62,49.325,53.525,54.75,51.28,57.15,58.72,55.2,52.175
85,Metropolitana,Cerrillos,13102.0,88956.0,0.0,0.000368,0.005242,0.018165,0.031235,0.035029,0.037145,0.038851,0.040019,0.041515,0.043821,0.046745,0.052874,0.062317,0.072513,0.08568,0.092345,0.094291,0.095744,0.097475,88956.0,10.947413,8.566077,8.891126,7.689902,11.039027,10.875165,12.618182,12.836347,13.125003,88956.0,0.000339,0.000719,0.000909,0.001067,0.001194,...,88956.0,3.95,4.175,3.575,1.94,3.85,6.0,6.225,9.46,9.55,11.92,11.925,5.725,2.1,88956.0,91.85,93.4,95.0,94.24,93.525,93.85,95.25,86.92,75.15,70.9,69.8,71.3,88956.0,66.95,64.125,63.8,67.18,69.825,65.475,73.725,68.34,73.875,73.3,71.45,71.325
86,Metropolitana,Cerro Navia,13103.0,142465.0,0.000133,0.000452,0.004031,0.022329,0.041567,0.047063,0.050347,0.052117,0.053638,0.055715,0.058978,0.063185,0.068166,0.081549,0.099385,0.116235,0.124004,0.126818,0.128165,0.130411,142465.0,8.191836,7.652349,7.054088,7.473678,9.427247,10.309975,11.694886,12.043715,12.483961,142465.0,0.000807,0.001505,0.001807,0.001946,0.002047,...,142465.0,6.7,4.425,2.575,4.34,4.75,7.675,7.65,9.36,16.725,18.48,15.025,6.975,3.9,142465.0,94.65,89.6,92.9,90.76,92.05,93.425,93.25,93.98,93.2,87.5,83.825,79.725,142465.0,63.0,56.275,61.075,60.46,60.2,60.55,64.2,63.9,68.65,64.36,63.875,64.95
87,Metropolitana,Colina,13301.0,180353.0,0.000177,0.000395,0.002931,0.014367,0.023842,0.025865,0.027667,0.028936,0.029891,0.030911,0.032958,0.035779,0.040057,0.049031,0.059528,0.074627,0.082576,0.084842,0.085863,0.087398,180353.0,8.148372,7.623792,7.414511,7.376062,8.434201,9.863956,10.617423,11.515919,12.296341,180353.0,0.000252,0.000518,0.000628,0.000702,0.000754,...,180353.0,7.8,6.45,3.75,2.2,2.8,5.95,5.65,7.32,10.05,11.42,12.4,4.35,1.9,180353.0,88.15,86.9,87.35,85.82,86.125,86.325,92.6,91.8,74.0,74.64,72.175,71.05,180353.0,30.35,41.675,46.975,45.8,44.525,50.775,66.35,58.04,62.45,68.72,68.1,76.5
88,Metropolitana,Conchali,13104.0,139195.0,0.000101,0.000405,0.005506,0.024281,0.041046,0.045193,0.047894,0.049679,0.051237,0.052997,0.055394,0.058521,0.063293,0.074265,0.08534,0.096866,0.102829,0.10518,0.106119,0.108129,139195.0,8.843485,7.319656,7.158501,7.063871,7.816682,8.657791,10.427388,11.585823,11.781001,139195.0,0.00075,0.00138,0.001656,0.001814,0.001933,...,139195.0,8.15,7.6,4.8,4.3,5.275,8.325,8.45,11.42,18.3,16.68,14.3,8.55,3.3,139195.0,91.8,92.125,90.45,90.88,87.7,87.65,91.375,90.94,88.65,83.82,80.225,77.325,139195.0,42.1,32.725,35.9,35.32,37.825,49.55,45.375,43.88,43.375,46.98,42.0,44.975
89,Metropolitana,Curacavi,13503.0,36430.0,0.000137,0.000304,0.00219,0.013056,0.021981,0.025397,0.029028,0.030387,0.031885,0.034817,0.038436,0.041141,0.046689,0.061229,0.073587,0.090924,0.099936,0.104419,0.106564,0.109254,36430.0,6.989391,6.058972,6.321804,5.911031,7.249672,7.958609,9.014086,10.059219,11.383893,36430.0,0.000165,0.000369,0.000479,0.000594,0.000674,...,36430.0,4.35,3.25,0.925,2.14,3.925,7.575,4.45,9.3,14.125,12.66,14.2,8.3,3.2,36430.0,96.95,96.7,96.95,97.42,97.275,93.625,91.325,94.7,83.125,78.92,77.0,78.525,36430.0,79.2,60.95,68.825,84.28,83.625,63.325,76.775,67.54,69.0,68.24,62.85,65.525
90,Metropolitana,El Bosque,13105.0,172000.0,0.000105,0.000498,0.004073,0.018553,0.034078,0.038052,0.040358,0.041978,0.043818,0.045946,0.049185,0.053729,0.06062,0.075824,0.094063,0.112586,0.121343,0.123974,0.126208,0.128353,172000.0,6.00205,5.197559,5.262118,5.530618,5.549853,6.137735,7.007011,7.422888,7.381564,172000.0,0.00047,0.001032,0.001254,0.001408,0.001533,...,172000.0,6.3,5.225,3.975,4.44,5.125,7.3,8.55,11.18,16.925,17.64,14.925,7.125,3.5,172000.0,92.45,92.4,90.425,90.84,92.325,89.175,94.475,91.66,92.15,91.06,91.3,91.325,172000.0,45.2,38.725,44.125,42.76,52.2,58.1,60.325,52.76,54.0,50.46,49.275,48.9
91,Metropolitana,El Monte,13602.0,40014.0,0.0,0.000214,0.001294,0.01074,0.02606,0.030806,0.032704,0.034988,0.036204,0.037156,0.038967,0.041257,0.045692,0.058896,0.075046,0.090846,0.097938,0.100262,0.10089,0.103097,40014.0,8.404257,7.714192,8.167683,7.05287,7.219785,8.638955,10.771235,11.125179,12.247379,40014.0,0.000305,0.000869,0.00105,0.001112,0.001227,...,40014.0,3.05,4.375,4.925,3.72,3.475,4.825,5.45,9.46,18.5,20.6,14.725,7.175,3.2,40014.0,96.9,93.525,96.25,94.36,93.75,90.475,95.55,91.54,92.925,82.14,80.05,74.3,40014.0,71.5,68.875,62.025,62.56,67.025,59.7,69.8,68.06,66.9,63.1,65.125,67.4


In [None]:
df_master_metropolitana.describe()

Unnamed: 0,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,deceased_2020-11,deceased_2020-12,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
count,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,...,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0
mean,13238.076923,156251.384615,0.000125,0.000425,0.003798,0.016969,0.030101,0.03415,0.036716,0.038459,0.04004,0.041696,0.044314,0.04759,0.053122,0.064885,0.07719,0.090995,0.097956,0.100448,0.10193,0.103897,156251.384615,8.710154,7.191718,6.928389,6.936707,7.770475,8.648672,10.127783,11.097427,11.767577,156251.384615,0.000435,0.000846,0.001024,0.001123,0.001206,0.001294,0.001354,...,156251.384615,5.819231,4.875481,3.799038,3.183846,4.078365,6.420673,6.963462,10.306538,15.104808,15.323846,13.995673,6.427404,2.713462,156251.384615,89.820192,91.415865,90.664423,90.687308,90.254327,89.159615,92.384135,93.176154,89.991346,83.346923,80.986538,80.151442,156251.384615,56.924038,52.404327,53.074038,53.460385,55.522596,53.476923,61.153846,56.294231,57.430288,58.879615,57.268269,60.921154
std,178.051309,135806.80634,0.000164,0.000243,0.002019,0.006922,0.009813,0.010428,0.010949,0.011228,0.011388,0.011539,0.011771,0.012151,0.01251,0.014427,0.017259,0.019918,0.021056,0.021259,0.021308,0.021428,135806.80634,3.037806,1.90119,1.720868,1.744386,2.0323,2.385721,2.944095,3.242271,3.569993,135806.80634,0.000242,0.00039,0.000449,0.000479,0.000499,0.000529,0.000556,...,135806.80634,1.984453,1.789611,1.155878,1.062797,1.160133,1.497937,1.717695,2.173183,3.737499,4.705878,3.54915,1.890471,1.107555,135806.80634,4.680385,3.003328,3.164534,2.970794,3.592284,3.38297,2.218726,2.424215,5.089103,9.654211,10.312418,10.06728,135806.80634,13.422291,10.212656,8.779173,10.60743,10.587024,10.864891,9.542637,9.337091,10.199205,11.259521,11.968484,11.571947
min,13101.0,7405.0,0.0,0.0,0.00015,0.004254,0.014433,0.016143,0.017267,0.018366,0.019513,0.021203,0.024697,0.027624,0.031742,0.037677,0.041989,0.049077,0.052479,0.054515,0.055934,0.057763,7405.0,0.109454,0.063551,0.058146,0.052381,0.062935,0.08191,0.093184,0.136668,0.083333,7405.0,4.2e-05,0.00021,0.000304,0.000335,0.000344,0.000465,0.000502,...,7405.0,1.65,1.525,0.725,0.88,0.85,3.375,3.175,5.3,7.025,5.96,4.95,1.575,0.5,7405.0,74.75,84.175,82.475,83.28,81.025,81.825,87.2,86.92,74.0,51.9,47.175,48.025,7405.0,22.5,32.2,35.275,26.4,35.5,24.9,34.725,31.62,33.8,32.3,27.925,35.35
25%,13113.75,85574.0,2.9e-05,0.000304,0.0022,0.01153,0.021699,0.025859,0.028595,0.030178,0.031643,0.033352,0.035308,0.038479,0.044363,0.053501,0.065533,0.078855,0.084582,0.086851,0.088593,0.090884,85574.0,6.900643,6.056337,6.056931,5.692372,6.527092,7.236665,8.331991,9.5267,9.559879,85574.0,0.000253,0.000531,0.000633,0.000696,0.000776,0.000839,0.000882,...,85574.0,4.675,3.7375,3.1875,2.315,3.33125,5.26875,5.93125,9.09,12.78125,12.115,12.01875,5.125,2.0,85574.0,87.25,90.05,88.4375,88.6,87.7375,86.6625,90.88125,91.645,88.2875,79.69,75.975,76.74375,85574.0,48.975,43.99375,46.54375,46.155,48.375,46.6875,54.50625,50.47,50.55,50.82,48.9375,50.80625
50%,13126.5,123316.5,8.4e-05,0.000391,0.003997,0.017824,0.029819,0.033688,0.036139,0.037841,0.03941,0.041228,0.043425,0.046688,0.052189,0.063791,0.077202,0.092711,0.099235,0.101346,0.1031,0.104958,123316.5,8.461129,7.387917,6.949806,7.038886,7.759739,8.589764,10.060699,11.106644,11.660779,123316.5,0.000383,0.000778,0.00098,0.001073,0.001127,0.001188,0.001229,...,123316.5,5.575,4.4875,3.8125,3.01,4.1125,6.825,7.225,10.87,15.375,15.88,14.2375,6.5375,2.9,123316.5,91.1,91.8125,91.2125,90.85,91.15,89.2625,92.3375,93.42,91.0875,85.43,83.925,81.625,123316.5,57.3,52.275,53.4,51.87,55.1,55.6,64.3375,59.52,59.9875,62.14,58.575,61.625
75%,13401.25,182598.5,0.000135,0.000502,0.004852,0.021814,0.036805,0.041047,0.044945,0.046859,0.048509,0.050343,0.05306,0.056662,0.061504,0.074491,0.088408,0.103076,0.110215,0.112606,0.113887,0.116082,182598.5,9.952754,8.061564,7.723573,7.704663,8.88247,9.817689,11.218674,12.100922,12.666522,182598.5,0.000604,0.001099,0.001259,0.001393,0.001498,0.00165,0.001728,...,182598.5,7.3625,5.78125,4.5125,4.11,4.69375,7.5375,8.11875,11.945,18.28125,18.435,17.1,7.65,3.3,182598.5,92.7625,93.425,92.9,92.435,92.95625,91.7625,94.03125,94.9,93.24375,90.07,88.1125,87.93125,182598.5,65.0625,60.3125,58.85625,60.355,60.7125,60.89375,67.23125,63.27,64.3375,67.22,66.8625,70.70625
max,13605.0,645909.0,0.000858,0.001245,0.009097,0.033412,0.053568,0.057717,0.060099,0.063132,0.064754,0.066674,0.070092,0.074368,0.081409,0.099397,0.119188,0.13868,0.149109,0.152288,0.153651,0.155631,645909.0,18.047714,11.249104,10.849707,10.784057,12.334766,14.74831,17.611436,19.967112,21.676257,645909.0,0.001001,0.001623,0.001951,0.002114,0.002215,0.002339,0.002448,...,645909.0,10.7,11.0,6.825,5.48,7.55,8.9,10.0,14.2,24.85,24.66,19.575,10.9,4.9,645909.0,96.95,97.6,96.95,97.42,97.275,96.875,98.025,98.2,97.325,96.28,93.25,92.35,645909.0,86.05,75.525,71.725,84.28,83.625,71.05,76.775,69.36,74.475,78.08,77.5,80.15


This dataframe also has 90 columns, but only 52 rows, one for each comuna of the Metropolitan Region.

Since the next section reduces dimensionality and then clusters, we need a suitable number of clusters for KMeans. To that end, we define the following functions, which choose it according to the silhouette score.

In [None]:
from sklearn.metrics import silhouette_score

def best_silhouette_score_clusters(X, k=30):
    """Return (n_clusters, score) with the highest silhouette score for 2..k clusters."""
    j = 0
    max_score = -1
    for i in range(2, k + 1):
        kmeans = KMeans(n_clusters=i)
        pca_clusters = kmeans.fit_predict(X)
        score = silhouette_score(X, pca_clusters)
        if score > max_score:
            max_score = score
            j = i

    return (j, max_score)

def top_silhouette_score_clusters(X, k=30):
    """Return the 4 best (n_clusters, score) pairs, sorted by descending score."""
    clusters = []
    for i in range(2, k + 1):
        kmeans = KMeans(n_clusters=i)
        pca_clusters = kmeans.fit_predict(X)
        score = silhouette_score(X, pca_clusters)
        clusters.append((i, score))

    return sorted(clusters, key=lambda x: -x[1])[:4]
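
Because `KMeans` starts from a random initialization, repeated calls can return slightly different scores (the two calls on the 26-dimensional data later in this notebook report 0.3076 and 0.3043 for 2 clusters). A minimal sketch of a seeded variant follows. The name `silhouette_by_k`, the seed `random_state=0`, the value `n_init=10`, and the synthetic `make_blobs` data are illustrative assumptions and are not part of the original analysis.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_max=10, random_state=0):
    """Return [(k, score)] sorted from best to worst silhouette score."""
    scores = []
    for k in range(2, k_max + 1):
        # A fixed random_state makes the initialization, and so the score, repeatable.
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = kmeans.fit_predict(X)
        scores.append((k, silhouette_score(X, labels)))
    return sorted(scores, key=lambda item: -item[1])


# Synthetic data with three well-separated groups (illustrative only).
X_demo, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
ranking = silhouette_by_k(X_demo, k_max=6)
best_k, best_score = ranking[0]
print(best_k, round(best_score, 3))
```

With a fixed seed, calling the function twice on the same data yields the same ranking, which makes the reported cluster counts easier to compare across cells.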

## 4. Country-level analysis

We define the X that we will use from here on. We keep only the numeric columns of the dataframe.

In [None]:
X, y = df_master.drop(columns=['Comuna', 'Region']), df_master[['Region']]
X

Unnamed: 0,Codigo comuna,Poblacion_x,total_cases_2020-03,total_cases_2020-04,total_cases_2020-05,total_cases_2020-06,total_cases_2020-07,total_cases_2020-08,total_cases_2020-09,total_cases_2020-10,total_cases_2020-11,total_cases_2020-12,total_cases_2021-01,total_cases_2021-02,total_cases_2021-03,total_cases_2021-04,total_cases_2021-05,total_cases_2021-06,total_cases_2021-07,total_cases_2021-08,total_cases_2021-09,total_cases_2021-10,Poblacion_y,mobility_2020-03,mobility_2020-04,mobility_2020-05,mobility_2020-06,mobility_2020-07,mobility_2020-08,mobility_2020-09,mobility_2020-10,mobility_2020-11,Poblacion_x.1,deceased_2020-06,deceased_2020-07,deceased_2020-08,deceased_2020-09,deceased_2020-10,deceased_2020-11,deceased_2020-12,...,Poblacion_y.1,PRC_positivity_2020-08,PRC_positivity_2020-09,PRC_positivity_2020-10,PRC_positivity_2020-11,PRC_positivity_2020-12,PRC_positivity_2021-01,PRC_positivity_2021-02,PRC_positivity_2021-03,PRC_positivity_2021-04,PRC_positivity_2021-05,PRC_positivity_2021-06,PRC_positivity_2021-07,PRC_positivity_2021-10,Poblacion_x.2,test_coverage_2020-08,test_coverage_2020-09,test_coverage_2020-10,test_coverage_2020-11,test_coverage_2020-12,test_coverage_2021-01,test_coverage_2021-02,test_coverage_2021-03,test_coverage_2021-04,test_coverage_2021-05,test_coverage_2021-06,test_coverage_2021-07,Poblacion_y.2,BAC_2020-08,BAC_2020-09,BAC_2020-10,BAC_2020-11,BAC_2020-12,BAC_2021-01,BAC_2021-02,BAC_2021-03,BAC_2021-04,BAC_2021-05,BAC_2021-06,BAC_2021-07
0,15101.0,247552.0,0.000024,0.000456,0.001644,0.004669,0.012763,0.024407,0.031255,0.037545,0.041738,0.043797,0.048412,0.057826,0.068939,0.083094,0.096380,0.107963,0.116388,0.119020,0.120803,0.123899,247552.0,5.394165,4.092249,4.283068,4.973705,5.049779,5.235622,5.868555,5.983744,6.312275,247552.0,0.000054,0.000146,0.000360,0.000539,0.000674,0.000816,0.000896,...,247552.0,11.55,7.625,7.200,3.48,2.550,5.650,7.050,6.34,6.950,5.68,5.800,3.575,1.7,247552.0,95.35,95.825,94.800,94.66,94.400,93.675,95.625,94.36,92.075,93.78,88.825,90.650,247552.0,60.60,66.975,64.700,73.38,78.525,81.650,83.025,84.74,85.300,87.16,86.425,87.700
1,15102.0,1233.0,0.000000,0.000000,0.000000,0.000000,0.007029,0.019735,0.021391,0.022619,0.022709,0.022303,0.023159,0.024635,0.032171,0.041182,0.046409,0.049777,0.053077,0.054699,0.055556,0.057493,1233.0,19.541699,7.755541,6.572208,8.568462,9.214640,9.920844,10.699744,7.944169,5.325128,1233.0,0.000000,0.000000,0.000721,0.000811,0.000811,0.000811,0.001115,...,1233.0,2.40,12.500,0.375,0.00,0.000,1.425,1.675,2.50,3.825,1.92,2.675,1.550,0.0,1233.0,93.75,91.150,95.225,96.40,97.725,93.425,100.000,96.68,96.750,99.62,96.650,97.450,1233.0,87.50,61.975,96.900,95.42,95.625,99.225,98.450,94.50,96.150,95.04,95.950,96.875
2,15202.0,810.0,0.000000,0.000000,0.000000,0.000000,0.000412,0.043484,0.078395,0.079012,0.079012,0.079012,0.079012,0.079321,0.080247,0.086008,0.096845,0.098148,0.107270,0.107407,0.108642,0.108642,810.0,1.352493,0.990023,0.953350,0.926154,0.796030,0.758809,1.180513,0.720099,0.811795,810.0,0.000000,0.000000,0.000274,0.001235,0.001235,0.001235,0.001235,...,810.0,25.00,5.275,0.000,0.00,0.000,0.000,8.325,0.00,18.275,12.50,3.675,8.825,0.0,810.0,100.00,100.000,91.675,100.00,75.000,75.000,75.000,53.34,67.150,97.50,90.825,94.225,810.0,25.00,87.775,66.675,80.00,41.650,75.000,58.350,60.00,61.050,97.14,96.075,97.050
3,15201.0,2515.0,0.000000,0.000000,0.000000,0.001789,0.010868,0.020057,0.024056,0.027921,0.028805,0.029026,0.032207,0.046521,0.051027,0.062381,0.067418,0.072614,0.077535,0.079832,0.082555,0.085089,2515.0,3.936189,2.406134,2.915727,3.678952,3.983215,3.855582,4.908582,5.249672,5.750949,2515.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,2515.0,6.20,4.825,1.275,0.24,0.000,3.350,7.300,2.34,7.675,1.74,2.400,0.975,20.0,2515.0,99.00,99.250,98.300,97.26,98.825,87.675,99.025,99.36,98.750,93.70,90.325,89.725,2515.0,89.45,94.000,93.900,97.48,98.275,95.150,82.875,94.46,87.275,95.94,90.325,87.525
4,1107.0,129999.0,0.000000,0.000131,0.002605,0.011027,0.017308,0.022662,0.027476,0.030821,0.033807,0.037358,0.049006,0.063305,0.073223,0.086931,0.101287,0.110234,0.115815,0.119827,0.121909,0.123566,129999.0,5.578910,4.598310,4.560372,4.267630,4.776936,5.534276,6.442598,6.629997,6.612584,129999.0,0.000140,0.000273,0.000393,0.000428,0.000441,0.000475,0.000531,...,129999.0,9.40,4.150,3.675,3.60,7.625,15.725,11.225,10.34,11.950,7.40,4.175,2.000,1.1,129999.0,94.40,95.525,92.775,95.26,90.800,89.150,97.800,95.74,97.000,95.74,92.450,90.800,129999.0,74.30,80.625,76.400,79.96,77.450,74.250,79.725,74.78,78.675,82.96,84.025,86.075
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,12101.0,141984.0,0.000204,0.002518,0.006137,0.007950,0.010273,0.016787,0.039506,0.069785,0.084808,0.097546,0.111444,0.126173,0.136503,0.147215,0.163944,0.172121,0.175472,0.176483,0.177123,0.177331,141984.0,5.898374,3.599590,4.357672,5.160808,6.025074,5.543681,6.000764,5.383628,5.763862,141984.0,0.000121,0.000185,0.000227,0.000419,0.000765,0.001121,0.001390,...,141984.0,21.50,28.625,22.125,11.82,11.700,14.150,11.900,7.54,10.150,8.86,4.050,1.375,0.1,141984.0,97.00,92.075,95.500,97.44,97.325,92.400,96.875,97.32,97.950,95.38,92.975,93.850,141984.0,32.05,33.875,46.575,65.06,63.750,58.700,68.175,74.62,70.525,77.42,81.725,81.650
340,12103.0,211.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.002370,0.008425,0.013691,0.014218,0.022117,0.033175,0.041601,0.048447,0.060032,0.061611,0.080042,0.080569,0.080569,0.080569,211.0,6.101632,4.277943,4.331606,4.348359,4.013204,3.209761,4.007599,4.067859,4.397927,211.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,211.0,0.00,0.000,37.500,0.00,25.000,0.000,14.575,10.00,37.500,5.00,0.000,20.825,0.0,211.0,100.00,37.500,66.675,40.00,50.000,50.000,75.000,40.00,75.000,57.50,50.000,50.000,211.0,50.00,50.000,25.000,20.00,0.000,8.325,58.325,70.00,62.500,52.50,25.000,79.175
341,12104.0,681.0,0.000000,0.000000,0.002284,0.002937,0.002937,0.002937,0.002937,0.023005,0.038669,0.044053,0.071300,0.075441,0.090390,0.097079,0.101322,0.103341,0.107848,0.108664,0.108664,0.108664,681.0,5.976834,3.548821,3.626463,3.738194,3.565583,3.084164,3.976961,3.557203,4.407598,681.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,681.0,0.00,6.250,30.300,23.96,25.000,41.975,25.000,48.14,33.350,0.00,33.325,16.675,0.0,681.0,100.00,100.000,98.875,100.00,75.000,100.000,75.000,80.00,100.000,60.00,50.000,100.000,681.0,83.35,55.075,73.800,37.24,45.825,94.650,90.000,19.86,54.175,70.00,41.675,70.825
342,12303.0,282.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.001330,0.047281,0.071710,0.072252,0.082348,0.126773,0.136722,0.138298,0.147360,0.148936,0.148936,0.148936,0.148936,0.148936,282.0,0.463772,0.209580,0.236634,0.222611,0.209790,0.186330,0.222145,0.204827,0.305594,282.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,282.0,0.00,16.675,23.575,0.00,3.125,4.175,15.175,10.00,0.000,13.34,0.000,0.000,0.0,282.0,50.00,28.400,99.175,80.00,75.000,100.000,100.000,76.36,75.000,98.66,100.000,68.750,282.0,50.00,33.325,81.475,77.50,71.875,88.750,83.825,43.34,70.825,86.68,75.000,44.225


### 4.1 Using PCA
First, we reduce the data to two dimensions with PCA so that we can plot it.

In [None]:
# Standardize the features
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)

# Reduce the data to 2 dimensions
pca_2 = PCA(n_components=2)

# Store the dataset in its reduced, 2-dimensional version
X_PCA = pca_2.fit_transform(X_std)
pca_2.explained_variance_ratio_

array([0.26347949, 0.14232497])
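
The two ratios above can be added to see how much of the total variance the 2-D projection retains. The short sketch below only redoes that arithmetic with the printed values; the variable names are illustrative.

```python
import numpy as np

# Explained variance ratios printed by pca_2.explained_variance_ratio_ above
ratios = np.array([0.26347949, 0.14232497])

# Fraction of the total variance kept by the first two components
retained = ratios.sum()
print(f"{retained:.1%}")  # prints 40.6%
```

About 41% of the variance survives the projection, so the 2-D plot is a rough view of the data rather than a faithful one.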

We try clustering with **KMeans**.

In [None]:
best_silhouette_score_clusters(X_PCA)

(2, 0.5092307149511924)

Below are the 4 best silhouette scores in two dimensions.

In [None]:
top_silhouette_score_clusters(X_PCA)

[(2, 0.5092307149511924),
 (4, 0.3937331960139819),
 (29, 0.3768170872758968),
 (3, 0.37622815764458145)]

In [None]:
# the best score is obtained with 2 clusters for KMeans

kmeans2 = KMeans(n_clusters=2)
pca_clusters2 = kmeans2.fit_predict(X_PCA)
pca_clusters2

array([0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

We plot the result

In [None]:
df_pca_kmeans = pd.concat([pd.DataFrame(X_PCA), pd.DataFrame(pca_clusters2), pd.DataFrame(y), df_master['Comuna']], axis=1)
# Assign the column names directly; set_axis(..., inplace=True) was removed in pandas 2.0
df_pca_kmeans.columns = ['dim1', 'dim2', 'cluster', 'region', 'comuna']
df_pca_kmeans

Unnamed: 0,dim1,dim2,cluster,region,comuna
0,3.791244,1.045015,0,Arica y Parinacota,Arica
1,-2.488876,7.377346,1,Arica y Parinacota,Camarones
2,3.461578,-1.568894,1,Arica y Parinacota,General Lagos
3,-5.888086,3.277928,1,Arica y Parinacota,Putre
4,1.597426,-0.009610,1,Tarapacá,Alto Hospicio
...,...,...,...,...,...
339,10.273062,-6.972609,0,Magallanes,Punta Arenas
340,-5.685371,-1.163658,1,Magallanes,Rio Verde
341,-0.374341,-8.813634,1,Magallanes,San Gregorio
342,-2.982572,-6.918509,1,Magallanes,Timaukel


In [None]:
px.scatter(df_pca_kmeans, x='dim1', y='dim2', hover_name='comuna', hover_data=['region'], color='cluster')

Next we use PCA keeping 95% of the variance

In [None]:
# Standardize the features
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)

# Reduce the data while keeping 95% of the variance
pca = PCA(n_components=0.95)

# Store the dataset in its reduced form
X_95 = pca.fit_transform(X_std)

print(X_95.shape)
pca.explained_variance_ratio_


(344, 26)


array([0.26347949, 0.14232497, 0.10887649, 0.08733959, 0.06175616,
       0.05044536, 0.02906438, 0.0242371 , 0.02267624, 0.02054941,
       0.01843255, 0.0151571 , 0.01341475, 0.01125614, 0.01072001,
       0.00907667, 0.00821724, 0.00810274, 0.00773285, 0.00674831,
       0.00637339, 0.00564579, 0.00533025, 0.00483152, 0.00456709,
       0.00420306])

We keep 26 dimensions in order to retain 0.95 of the variance
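To see where the figure of 26 comes from, the ratios printed above can be accumulated until the running total passes 0.95. This is a minimal sketch using the printed values. It assumes the usual reading of a fractional `n_components`, namely the smallest number of components whose cumulative explained variance exceeds the threshold.

```python
import numpy as np

# Explained-variance ratios copied from the output above.
ratios = np.array([
    0.26347949, 0.14232497, 0.10887649, 0.08733959, 0.06175616,
    0.05044536, 0.02906438, 0.0242371, 0.02267624, 0.02054941,
    0.01843255, 0.0151571, 0.01341475, 0.01125614, 0.01072001,
    0.00907667, 0.00821724, 0.00810274, 0.00773285, 0.00674831,
    0.00637339, 0.00564579, 0.00533025, 0.00483152, 0.00456709,
    0.00420306,
])

cumulative = np.cumsum(ratios)

# Index of the first component at which the running total passes 0.95.
n_needed = int(np.argmax(cumulative > 0.95)) + 1
print(n_needed, round(float(cumulative[-1]), 4))
```

With 25 components the running total is still just under 0.95, which is why the 26th component is kept.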

In [None]:
best_silhouette_score_clusters(X_95)

(2, 0.30759808190623933)

In [None]:
top_silhouette_score_clusters(X_95)

[(2, 0.30434703046986034),
 (3, 0.18634479601206275),
 (7, 0.15691386930129497),
 (6, 0.15585546022116004)]

In [None]:
# we keep 2 clusters, which gives the best score
kmeans95_2 = KMeans(n_clusters=2)
pca_clusters95 = kmeans95_2.fit_predict(X_95)
pca_clusters95

array([1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [None]:
df_pca_kmeans = pd.concat([pd.DataFrame(X_95), pd.DataFrame(pca_clusters95), pd.DataFrame(y), df_master['Comuna']], axis=1)
# Assign the column names directly; set_axis(..., inplace=True) was removed in pandas 2.0
df_pca_kmeans.columns = ["dim" + str(i + 1) for i in range(X_95.shape[1])] + ["cluster", "region", "comuna"]
df_pca_kmeans

Unnamed: 0,dim1,dim2,dim3,dim4,dim5,dim6,dim7,dim8,dim9,dim10,dim11,dim12,dim13,dim14,dim15,dim16,dim17,dim18,dim19,dim20,dim21,dim22,dim23,dim24,dim25,dim26,cluster,region,comuna
0,3.791244,1.045015,4.827509,0.153688,2.415301,-4.086410,-1.462939,1.053183,-0.621954,-0.724493,-1.379022,-0.675999,0.511723,-1.072265,0.485404,-0.496382,-0.716943,-0.336419,0.809532,-0.664559,0.350160,0.334162,0.031284,-0.207332,-0.017903,-0.630117,1,Arica y Parinacota,Arica
1,-2.488876,7.377346,4.603310,-3.153702,-0.381580,1.336256,-0.657055,1.345265,0.753632,-0.338354,-0.658674,-1.406805,1.004445,0.418419,1.363651,-1.738852,-0.276854,0.726979,0.272120,-1.113382,0.617174,0.809547,-1.053395,-0.075286,0.300894,-0.346065,0,Arica y Parinacota,Camarones
2,3.461578,-1.568894,6.747169,4.880477,-1.642631,2.734141,-0.662934,5.478762,-3.112027,0.349144,-1.412711,-0.597535,-3.169556,-1.790869,0.504117,-1.268729,1.919848,-2.213039,3.073732,-1.132644,4.165269,-1.907237,-2.235339,-1.833865,0.706854,-1.525068,1,Arica y Parinacota,General Lagos
3,-5.888086,3.277928,6.714476,-0.867313,0.418745,-0.630275,3.394784,0.512527,-4.594230,2.624517,0.586544,-0.031673,1.461945,-0.195299,5.319621,5.648732,-2.964351,-0.478549,-0.569389,-0.237919,0.413622,-0.510394,-0.877907,-1.295501,0.791764,-0.450021,0,Arica y Parinacota,Putre
4,1.597426,-0.009610,4.753089,-0.703149,1.935615,-1.523864,-0.541209,0.481721,-0.821568,-0.066375,1.397743,1.411990,-0.173672,0.157417,-0.639341,0.908479,1.462530,-0.121350,0.156273,0.809967,-0.223023,-0.182820,0.733881,0.087594,0.251120,0.107219,0,Tarapacá,Alto Hospicio
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,10.273062,-6.972609,5.888839,-0.553500,5.823288,-0.090982,0.152468,6.115914,2.665581,-1.222764,-1.082659,-2.476628,-0.467674,-1.025520,1.055387,-0.565864,-1.231420,1.049412,-1.477531,0.311472,-0.155929,0.438807,0.212943,-0.238604,0.207750,-1.640068,1,Magallanes,Punta Arenas
340,-5.685371,-1.163658,-3.759476,14.439950,2.853279,7.033972,-2.810457,-3.519434,1.169673,1.659513,4.127862,2.969640,-1.802705,-3.568389,-0.571717,0.189724,-1.489954,-1.362535,1.778057,-2.810278,-0.569346,4.932097,-3.109756,-0.335214,0.015524,-0.161431,0,Magallanes,Rio Verde
341,-0.374341,-8.813634,-1.724352,0.601071,2.902164,2.482078,-2.889376,-4.213532,-0.735019,0.747446,8.393277,-4.829788,-1.198157,3.143081,2.770892,-1.291923,-0.748544,-2.278950,1.649253,-0.371051,2.299651,0.586745,3.197067,2.858697,2.172585,2.276877,0,Magallanes,San Gregorio
342,-2.982572,-6.918509,7.132091,5.821234,4.780349,2.795865,0.534261,4.107066,-0.443919,0.984347,-1.683475,3.084406,3.500245,5.717506,0.681563,-1.535317,-1.817559,1.925017,-2.008647,-2.187552,0.540737,1.978299,-0.482669,-0.607747,-1.413251,2.713391,0,Magallanes,Timaukel


## 5. Metropolitan Region Analysis

In [None]:
X_metrop, y_metrop = df_master_metropolitana.drop(columns=['Comuna', 'Region']), df_master_metropolitana[['Comuna']]

### 5.1 Using PCA

In [None]:
# Standardize the features
std_scaler1 = StandardScaler()
X_metrop_std = std_scaler1.fit_transform(X_metrop)

# Reduce the data to 2 dimensions so that it can be plotted
pca_2 = PCA(n_components=2)

# Store the dataset in its reduced 2-D form
X_metrop_PCA = pca_2.fit_transform(X_metrop_std)
pca_2.explained_variance_ratio_

array([0.38969806, 0.16337092])

Next we try clustering with **kmeans**

In [None]:
best_silhouette_score_clusters(X_metrop_PCA)

(3, 0.4747927951565718)

In [None]:
top_silhouette_score_clusters(X_metrop_PCA)

[(3, 0.4747927951565718),
 (16, 0.46074067717936246),
 (2, 0.4602819145316686),
 (18, 0.4492940638390651)]

In [None]:
# We choose 3 clusters because that gives the best score

kmeans_3 = KMeans(n_clusters=3)
pca_clusters_metrop = kmeans_3.fit_predict(X_metrop_PCA)
pca_clusters_metrop

array([0, 0, 0, 0, 1, 2, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 2, 0, 2, 2,
       1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 2, 0, 0, 2], dtype=int32)

We plot the result

In [None]:
df_pca_kmeans_metrop = pd.concat([pd.DataFrame(X_metrop_PCA), pd.DataFrame(pca_clusters_metrop), y_metrop.reset_index()['Comuna']], axis=1)
# Assign the column names directly; set_axis(..., inplace=True) was removed in pandas 2.0
df_pca_kmeans_metrop.columns = ['dim1', 'dim2', 'cluster', 'comuna']

In [None]:
px.scatter(df_pca_kmeans_metrop, x='dim1', y='dim2', hover_name='comuna', color='cluster')

Now we compute PCA keeping 0.95 of the variance

In [None]:
# Standardize the features (stored in X_metrop_std so the national X_std is not overwritten)
std_scaler = StandardScaler()
X_metrop_std = std_scaler.fit_transform(X_metrop)

# Reduce the data while keeping 95% of the variance
pca_95 = PCA(n_components=0.95)

# Store the dataset in its reduced form
X_metrop_PCA95 = pca_95.fit_transform(X_metrop_std)


print(X_metrop_PCA95.shape)
pca_95.explained_variance_ratio_

(52, 15)


array([0.38969806, 0.16337092, 0.11085838, 0.06883086, 0.04798883,
       0.04437071, 0.03110676, 0.02020835, 0.01546496, 0.01472199,
       0.01081111, 0.01001903, 0.00860689, 0.00780103, 0.00625493])

To retain 0.95 of the variance we keep 15 dimensions

In [None]:
best_silhouette_score_clusters(X_metrop_PCA95)

(3, 0.2513083807118407)

In [None]:
top_silhouette_score_clusters(X_metrop_PCA95)

[(2, 0.2509995974297051),
 (3, 0.2434473813324963),
 (4, 0.20401941788951025),
 (6, 0.17804420706984858)]
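The two cells above report slightly different scores for the same data (for k=3, 0.2513 in one call and 0.2434 in the other), and the best k changes from 3 to 2. One plausible cause, assuming the helper functions fit `KMeans` without a fixed seed, is the random centroid initialization. The sketch below, on synthetic data with a hypothetical `fit_labels` helper, shows that passing the same `random_state` makes repeated fits return the same labels.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a PCA-reduced matrix; not the notebook's data.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 5))


def fit_labels(X, k, seed):
    """Fit KMeans with an explicit seed and return the cluster labels."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)


labels_a = fit_labels(X_demo, 4, seed=0)
labels_b = fit_labels(X_demo, 4, seed=0)

# Same data, same k, same seed: the two label arrays can be compared directly.
print(np.array_equal(labels_a, labels_b))
```

Fixing the seed does not make one partition more correct than another. It only makes the scores and cluster labels repeatable between runs, which helps when the top candidates are as close as they are here.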

In [None]:
# we use 4 clusters because the score is good and it matches the implementation of the government's Paso a Paso plan
kmeans_4 = KMeans(n_clusters=4)
pca_clusters_metrop = kmeans_4.fit_predict(X_metrop_PCA95)
pca_clusters_metrop

array([0, 0, 0, 0, 1, 3, 1, 0, 1, 0, 0, 3, 1, 0, 1, 3, 1, 1, 2, 3, 2, 2,
       1, 1, 0, 3, 0, 0, 2, 0, 0, 1, 0, 1, 0, 2, 3, 3, 3, 1, 1, 1, 3, 1,
       1, 1, 0, 1, 3, 0, 3, 2], dtype=int32)

In [None]:
df_pca_kmeans_metrop = pd.concat([pd.DataFrame(X_metrop_PCA95), pd.DataFrame(pca_clusters_metrop), y_metrop.reset_index()['Comuna']], axis=1)
# Assign the column names directly; set_axis(..., inplace=True) was removed in pandas 2.0
df_pca_kmeans_metrop.columns = ["dim" + str(i + 1) for i in range(X_metrop_PCA95.shape[1])] + ["cluster", "comuna"]

## 6. Map Visualization

### 6.1 Metropolitan Region

In [None]:
# Add the comuna code column to the result of applying PCA and KMeans to the data
df_pca_kmeans_metrop['CodigoComuna'] = df_master_metropolitana['Codigo comuna'].reset_index(drop=True).astype(int)
# Convert the cluster labels to strings so that plotly treats them as discrete categories
df_pca_kmeans_metrop['cluster'] = df_pca_kmeans_metrop['cluster'].astype(str)

In [None]:
rm_geojson = 'https://raw.githubusercontent.com/caracena/chile-geojson/master/13.geojson'
fig = px.choropleth_mapbox(df_pca_kmeans_metrop, geojson=rm_geojson, locations='CodigoComuna', color='cluster',
                           hover_data=["comuna"],
                           mapbox_style="carto-positron",
                           featureidkey="properties.cod_comuna",
                           zoom=7, center = {"lat": -33.4489, "lon": -70.6693})

fig.show()
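The map relies on each value in `CodigoComuna` matching a feature's `properties.cod_comuna` in the GeoJSON, with the same type on both sides. A small check like the one below can flag rows that would have no polygon to color. The helper `unmatched_codes` and the tiny hand-made GeoJSON (including the made-up code 13999) are assumptions for illustration. To run it against the real file, the GeoJSON would first have to be downloaded and parsed into a dict.

```python
import pandas as pd


def unmatched_codes(df, geojson, column="CodigoComuna", prop="cod_comuna"):
    """Return the codes in df[column] with no feature whose properties[prop] matches."""
    feature_ids = {feature["properties"][prop] for feature in geojson["features"]}
    return sorted(set(df[column].tolist()) - feature_ids)


# Tiny hand-made GeoJSON and frame, for illustration only.
demo_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"cod_comuna": 13101}, "geometry": None},
        {"type": "Feature", "properties": {"cod_comuna": 13102}, "geometry": None},
    ],
}
demo_df = pd.DataFrame(
    {"CodigoComuna": [13101, 13102, 13999], "cluster": ["0", "1", "1"]}
)

missing = unmatched_codes(demo_df, demo_geojson)
print(missing)  # codes that would not be matched to any polygon
```

An empty list means every row has a polygon. A type mismatch, for example string codes in the GeoJSON and integers in the frame, would show up here as every code being reported.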

### 6.2 All Comunas of Chile

In [None]:
# Add the comuna code column to the result of applying PCA and KMeans to the data
df_pca_kmeans['CodigoComuna'] = df_master['Codigo comuna'].reset_index(drop=True).astype(int)
# Convert the cluster labels to strings so that plotly treats them as discrete categories
df_pca_kmeans['cluster'] = df_pca_kmeans['cluster'].astype(str)
df_pca_kmeans

Unnamed: 0,dim1,dim2,dim3,dim4,dim5,dim6,dim7,dim8,dim9,dim10,dim11,dim12,dim13,dim14,dim15,dim16,dim17,dim18,dim19,dim20,dim21,dim22,dim23,dim24,dim25,dim26,cluster,region,comuna,CodigoComuna
0,3.791244,1.045015,4.827509,0.153688,2.415301,-4.086410,-1.462939,1.053183,-0.621954,-0.724493,-1.379022,-0.675999,0.511723,-1.072265,0.485404,-0.496382,-0.716943,-0.336419,0.809532,-0.664559,0.350160,0.334162,0.031284,-0.207332,-0.017903,-0.630117,1,Arica y Parinacota,Arica,15101
1,-2.488876,7.377346,4.603310,-3.153702,-0.381580,1.336256,-0.657055,1.345265,0.753632,-0.338354,-0.658674,-1.406805,1.004445,0.418419,1.363651,-1.738852,-0.276854,0.726979,0.272120,-1.113382,0.617174,0.809547,-1.053395,-0.075286,0.300894,-0.346065,0,Arica y Parinacota,Camarones,15102
2,3.461578,-1.568894,6.747169,4.880477,-1.642631,2.734141,-0.662934,5.478762,-3.112027,0.349144,-1.412711,-0.597535,-3.169556,-1.790869,0.504117,-1.268729,1.919848,-2.213039,3.073732,-1.132644,4.165269,-1.907237,-2.235339,-1.833865,0.706854,-1.525068,1,Arica y Parinacota,General Lagos,15202
3,-5.888086,3.277928,6.714476,-0.867313,0.418745,-0.630275,3.394784,0.512527,-4.594230,2.624517,0.586544,-0.031673,1.461945,-0.195299,5.319621,5.648732,-2.964351,-0.478549,-0.569389,-0.237919,0.413622,-0.510394,-0.877907,-1.295501,0.791764,-0.450021,0,Arica y Parinacota,Putre,15201
4,1.597426,-0.009610,4.753089,-0.703149,1.935615,-1.523864,-0.541209,0.481721,-0.821568,-0.066375,1.397743,1.411990,-0.173672,0.157417,-0.639341,0.908479,1.462530,-0.121350,0.156273,0.809967,-0.223023,-0.182820,0.733881,0.087594,0.251120,0.107219,0,Tarapacá,Alto Hospicio,1107
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
339,10.273062,-6.972609,5.888839,-0.553500,5.823288,-0.090982,0.152468,6.115914,2.665581,-1.222764,-1.082659,-2.476628,-0.467674,-1.025520,1.055387,-0.565864,-1.231420,1.049412,-1.477531,0.311472,-0.155929,0.438807,0.212943,-0.238604,0.207750,-1.640068,1,Magallanes,Punta Arenas,12101
340,-5.685371,-1.163658,-3.759476,14.439950,2.853279,7.033972,-2.810457,-3.519434,1.169673,1.659513,4.127862,2.969640,-1.802705,-3.568389,-0.571717,0.189724,-1.489954,-1.362535,1.778057,-2.810278,-0.569346,4.932097,-3.109756,-0.335214,0.015524,-0.161431,0,Magallanes,Rio Verde,12103
341,-0.374341,-8.813634,-1.724352,0.601071,2.902164,2.482078,-2.889376,-4.213532,-0.735019,0.747446,8.393277,-4.829788,-1.198157,3.143081,2.770892,-1.291923,-0.748544,-2.278950,1.649253,-0.371051,2.299651,0.586745,3.197067,2.858697,2.172585,2.276877,0,Magallanes,San Gregorio,12104
342,-2.982572,-6.918509,7.132091,5.821234,4.780349,2.795865,0.534261,4.107066,-0.443919,0.984347,-1.683475,3.084406,3.500245,5.717506,0.681563,-1.535317,-1.817559,1.925017,-2.008647,-2.187552,0.540737,1.978299,-0.482669,-0.607747,-1.413251,2.713391,0,Magallanes,Timaukel,12303


In [None]:
chile_geojson = 'https://gist.githubusercontent.com/MarceloSaldias/8123ad8d4a78737123703159850ef8d0/raw/8cf696b51d1176aa1f5a4539eedfaf8a8c184259/All_Comunas.geojson'
fig = px.choropleth_mapbox(df_pca_kmeans, geojson=chile_geojson, locations='CodigoComuna', color='cluster',
                           hover_data=["comuna", "region"],
                           mapbox_style="carto-positron",
                           featureidkey="properties.cod_comuna",
                           zoom=3.5, center = {"lat": -40, "lon": -70.6693})

fig.update_layout(
    autosize=False,
    width=700,
    height=1000,
    margin=dict(
        l=50,
        r=50,
        b=20,
        t=20,
        pad=1
    ),
)

fig.show()

## 7. Conclusions

From the results, the analysis appears to agree with what we expected. When we run the analysis on all comunas, those outside the Metropolitan Region are plotted very close together and fall in the same cluster or in neighboring clusters. Within the Metropolitan Region, the clusters line up with socioeconomic level, population size, and location. For example, Providencia, La Reina, Lo Barnechea, Las Condes, Vitacura, and Ñuñoa fall in the same `Kmeans` cluster, while rural comunas such as Curacaví, Isla de Maipo, Paine, Talagante, and Pirque, among others, fall in a different one.
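One way to inspect this kind of grouping is to list the comunas that belong to each cluster. The sketch below does this on a toy frame with the same `comuna` and `cluster` columns as `df_pca_kmeans_metrop`. The cluster labels in it are placeholders chosen for the example, not the notebook's results.

```python
import pandas as pd

# Toy frame with the same 'comuna' and 'cluster' columns as df_pca_kmeans_metrop.
# The labels below are placeholders, not the notebook's results.
demo = pd.DataFrame({
    "comuna": ["Providencia", "Las Condes", "Vitacura", "Paine", "Pirque"],
    "cluster": ["0", "0", "0", "1", "1"],
})

# Collect the comunas that fall in each cluster.
members = demo.groupby("cluster")["comuna"].apply(list).to_dict()
for cluster, comunas in members.items():
    print(cluster, comunas)
```

Running the same `groupby` on the real frame would give the full membership of each cluster, which can then be compared against socioeconomic or population data.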


### What can be done with this model?

First, it can be used to study measures, whether preventive or reactive, in a few comunas that are representative of each cluster, analyze how they work, and on that basis decide whether to apply them to the rest of the comunas in the cluster.

In addition, although most comunas are currently in phase 4, this grouping could eventually help in choosing better preventive measures as new variants appear or for other diseases such as influenza. Note that the model was trained on COVID-19 data, so for other diseases or variants that may behave differently it is a good initial approximation, but it should be recalibrated over time with the relevant data.

### Possible applications

Given the relevant data, the same approach could be applied to any use case related to the behavior of comunas, for example election results or PTU scores, among others.


