# Deliverable 1
Optimization 2021-1

Group 17

Import the required libraries:

In [3]:
import json
import pandas as pd

## 1. Data download

The data were obtained through [this query](https://services3.arcgis.com/YLNLlpguFsVmgseb/arcgis/rest/services/Dosis_por_comuna_puntos/FeatureServer/0/query?f=json&where=1=1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=ORD_REG%20asc&resultOffset=0&resultRecordCount=32000&resultType=standard&cacheHint=true) to the servers of Chile's Ministerio de Bienes Nacionales.
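
To make the query easier to read, the sketch below rebuilds it from its individual parameters. The endpoint and every parameter value are copied from the link above; the names `BASE_URL`, `params`, and `query_url` are only illustrative. Because `urlencode` percent-encodes some characters (for example `*` and `=`), the result should be an equivalent request, not a character-for-character copy of the original URL.

```python
from urllib.parse import urlencode

# Endpoint copied from the query link above.
BASE_URL = (
    "https://services3.arcgis.com/YLNLlpguFsVmgseb/arcgis/rest/services/"
    "Dosis_por_comuna_puntos/FeatureServer/0/query"
)

# Parameters copied from the same link, one per key.
params = {
    "f": "json",                    # response format
    "where": "1=1",                 # no row filter: return every record
    "returnGeometry": "false",      # attributes only
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",               # all attribute columns
    "orderByFields": "ORD_REG asc",
    "resultOffset": 0,
    "resultRecordCount": 32000,
    "resultType": "standard",
    "cacheHint": "true",
}

query_url = f"{BASE_URL}?{urlencode(params)}"
print(query_url)
```

The notebook itself does not call this endpoint directly; it downloads a saved copy of the response, as the next cell shows.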

In [4]:
!wget https://www.dropbox.com/s/lg8grwkyurfxrpl/vacunas.json

--2021-04-17 21:45:51--  https://www.dropbox.com/s/lg8grwkyurfxrpl/vacunas.json
Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:601a:18::a27d:712
Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/lg8grwkyurfxrpl/vacunas.json [following]
--2021-04-17 21:45:51--  https://www.dropbox.com/s/raw/lg8grwkyurfxrpl/vacunas.json
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc8b1d779d18cdbd47006b4910a6.dl.dropboxusercontent.com/cd/0/inline/BM0Jw9XGOXVtXqhUotzCNAHpslWZT_s101sPsBYLcey7nExchXMBPyuNxgcFfG6ndXvkw_aFBLI-obyeZM5cWNBxQFxMKgqGyz6JvhfnRAVoAi_IzenEt-8Inb5smyh3UCVh7nxT92bfXuZ-_3XKdIMZ/file# [following]
--2021-04-17 21:45:51--  https://uc8b1d779d18cdbd47006b4910a6.dl.dropboxusercontent.com/cd/0/inline/BM0Jw9XGOXVtXqhUotzCNAHpslWZT_s101sPsBYLcey7nExchXMBPyuNxgcFfG6ndXvkw_aFBLI-obyeZM5cWNBxQF

Load the JSON data:

In [5]:
with open('vacunas.json', 'r') as file:
    data = file.read()

vaccines = json.loads(data)

In [6]:
#print(json.dumps(vaccines, sort_keys=True, indent=4))

In [7]:
len(vaccines['features'])

346

In [8]:
vaccines['features'][0]['attributes']

{'COMUNA': 'Arica',
 'CUT_COM': '15101',
 'CUT_COM_EN': 15101,
 'CUT_PROV': '151',
 'CUT_REG': '15',
 'CUT_REG_EN': 15,
 'DOSIS_1': 78951,
 'DOSIS_2': 52357,
 'DOSIS_TOT': 131308,
 'FID': 321,
 'F_Reporte': 1616630400000,
 'ID_DOSIS': 1,
 'ORD_COM': 1,
 'ORD_REG': 1,
 'POB_OBJ': 188465,
 'POB_TOTAL': 250795,
 'PROVINCIA': 'Arica',
 'REGION': 'Arica y Parinacota'}

## 2. Preprocessing

The data were converted to a `pd.DataFrame`. Then, the communes that would not be included in the model were removed.

Finally, columns of interest with the vaccination rates were added.

In [9]:
def pandification(my_json):
    data = []
    for commune in my_json['features']:
        row = commune['attributes']
        data.append(row)
    return pd.DataFrame(data)
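
As a possible alternative, the same conversion can be written as a single list comprehension. This is only a sketch: `features_to_frame` is a hypothetical name, and the toy payload mimics the `features` / `attributes` structure printed above, using a few values copied from the dataset.

```python
import pandas as pd

def features_to_frame(payload):
    """Build a DataFrame with one row per feature's 'attributes' dict."""
    return pd.DataFrame([feature["attributes"] for feature in payload["features"]])

# Toy payload with the same nesting as vacunas.json (only three fields kept).
sample = {
    "features": [
        {"attributes": {"COMUNA": "Arica", "DOSIS_1": 78951, "POB_OBJ": 188465}},
        {"attributes": {"COMUNA": "Camarones", "DOSIS_1": 453, "POB_OBJ": 1025}},
    ]
}

sample_df = features_to_frame(sample)
print(sample_df)
```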

In [10]:
df = pandification(vaccines)

df.head()

| | FID | CUT_REG | CUT_PROV | CUT_COM | REGION | PROVINCIA | COMUNA | ID_DOSIS | CUT_REG_EN | POB_OBJ | POB_TOTAL | DOSIS_1 | DOSIS_2 | DOSIS_TOT | F_Reporte | ORD_REG | ORD_COM | CUT_COM_EN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 321 | 15 | 151 | 15101 | Arica y Parinacota | Arica | Arica | 1 | 15 | 188465 | 250795 | 78951 | 52357 | 131308 | 1616630400000 | 1 | 1 | 15101 |
| 1 | 322 | 15 | 151 | 15102 | Arica y Parinacota | Arica | Camarones | 2 | 15 | 1025 | 1239 | 453 | 341 | 794 | 1616630400000 | 1 | 2 | 15102 |
| 2 | 323 | 15 | 152 | 15201 | Arica y Parinacota | Parinacota | Putre | 4 | 15 | 2080 | 2536 | 617 | 330 | 947 | 1616630400000 | 1 | 4 | 15201 |
| 3 | 324 | 15 | 152 | 15202 | Arica y Parinacota | Parinacota | General Lagos | 3 | 15 | 673 | 810 | 94 | 48 | 142 | 1616630400000 | 1 | 3 | 15202 |
| 4 | 1 | 1 | 11 | 1101 | Tarapacá | Iquique | Iquique | 9 | 1 | 173861 | 227127 | 85331 | 54322 | 139653 | 1616630400000 | 2 | 9 | 1101 |


Remove the communes located in extreme zones:

In [11]:
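# Communes in extreme zones, excluded from the model.
extreme_zone = ['Isla de Pascua', 'Juan Fernández', 'Antártica', 'Cabo de Hornos', 'Porvenir',
                'Primavera', 'Timaukel', 'Laguna Blanca', 'Punta Arenas',
                'Río Verde', 'San Gregorio', 'Natales', 'Torres del Paine']

def get_index(df, elements):
    """Return the index label of the first row matching each commune name."""
    index = []
    for e in elements:
        # Raises IndexError if a commune name is not present in df.
        index.append(df[df['COMUNA'] == e].index[0])
    return index

extreme_index = get_index(df, extreme_zone)
# drop=True avoids keeping the old index as an extra 'index' column.
df = df.drop(extreme_index, axis=0).reset_index(drop=True)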
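
The same filtering step could also be expressed with a boolean mask. The sketch below is an assumption-laden variant, not the notebook's method: `drop_communes` is a hypothetical helper, the toy `DOSIS_1` values are placeholders, and unlike `get_index` it removes every row whose name matches (not only the first) and reports missing names explicitly.

```python
import pandas as pd

def drop_communes(frame, names):
    """Return a copy of `frame` without the rows whose COMUNA is in `names`."""
    missing = sorted(set(names) - set(frame["COMUNA"]))
    if missing:
        # Fail loudly on a misspelled or absent commune name.
        raise KeyError(f"Communes not found: {missing}")
    # Keep the rows NOT in `names`, then renumber the index from 0.
    return frame[~frame["COMUNA"].isin(names)].reset_index(drop=True)

# Toy frame; the DOSIS_1 values are placeholders, not real data.
toy = pd.DataFrame({"COMUNA": ["Arica", "Porvenir", "Iquique"], "DOSIS_1": [1, 2, 3]})
filtered = drop_communes(toy, ["Porvenir"])
print(filtered)
```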

Definition of the vaccination rates (as fractions of the target population `POB_OBJ`):

In [12]:
df['P_DOSIS_1'] = df['DOSIS_1'] / df['POB_OBJ']
df['P_DOSIS_2'] = df['DOSIS_2'] / df['POB_OBJ']
df['P_DOSIS']   = df['DOSIS_TOT'] / (2*df['POB_OBJ'])

main_columns = ['REGION', 'COMUNA', 'P_DOSIS_1', 'P_DOSIS_2', 'P_DOSIS']
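
As a quick sanity check of these definitions, the sketch below recomputes the three rates by hand for Arica, using the values from the record printed in section 1. Dividing `DOSIS_TOT` by `2 * POB_OBJ` compares the doses administered with the two doses each person in the target population would need; in this row `DOSIS_TOT` equals `DOSIS_1 + DOSIS_2`, so `P_DOSIS` is simply the average of the other two rates.

```python
# Values for Arica, copied from the attributes printed in section 1.
dosis_1, dosis_2, dosis_tot, pob_obj = 78951, 52357, 131308, 188465

p_dosis_1 = dosis_1 / pob_obj            # share of the target population with a first dose
p_dosis_2 = dosis_2 / pob_obj            # share of the target population with a second dose
p_dosis = dosis_tot / (2 * pob_obj)      # doses given / doses needed (two per person)

# In this row the total is the sum of both doses, so P_DOSIS is the average rate.
assert dosis_tot == dosis_1 + dosis_2
assert abs(p_dosis - (p_dosis_1 + p_dosis_2) / 2) < 1e-12

print(round(p_dosis_1, 6), round(p_dosis_2, 6), round(p_dosis, 6))
# → 0.418916 0.277808 0.348362
```

These figures agree with the Arica row listed in section 3.2.1.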

## 3. Data analysis

### 3.1. Overall

Across all communes, the distribution of the overall vaccination rate is summarized as follows:

In [13]:
df.P_DOSIS.describe()

count    333.000000
mean       0.434413
std        0.109190
min        0.105498
25%        0.374616
50%        0.419665
75%        0.479351
max        1.024230
Name: P_DOSIS, dtype: float64

In [14]:
mean_dosis = df.P_DOSIS.mean()

# Mean absolute deviation of P_DOSIS from its mean
abs(df.P_DOSIS - mean_dosis).mean()

0.07589593629325789
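
The quantity computed above is the mean absolute deviation, that is, the average of |x_i - mean(x)| over all communes. The sketch below reproduces the calculation without pandas on a small subset, the four `P_DOSIS` values of Arica y Parinacota listed in section 3.2.1; the variable names are illustrative only.

```python
from statistics import fmean

# P_DOSIS of the four communes of Arica y Parinacota (see section 3.2.1).
rates = [0.348362, 0.387317, 0.227644, 0.105498]

mean_rate = fmean(rates)
# Average distance of each value from the mean, ignoring the sign.
mad = fmean([abs(r - mean_rate) for r in rates])

print(f"mean={mean_rate:.6f}  MAD={mad:.6f}")
```

Unlike the standard deviation, this measure does not square the differences, so a single extreme commune weighs less on the result.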

#### 3.1.1 By commune

The communes with the lowest first-dose vaccination rates are:

In [15]:
df[main_columns].sort_values('P_DOSIS_1').head()

| | REGION | COMUNA | P_DOSIS_1 | P_DOSIS_2 | P_DOSIS |
|---|---|---|---|---|---|
| 3 | Arica y Parinacota | General Lagos | 0.139673 | 0.071322 | 0.105498 |
| 8 | Tarapacá | Colchane | 0.178713 | 0.125496 | 0.152105 |
| 248 | Biobío | Alto Biobío | 0.218723 | 0.132391 | 0.175557 |
| 169 | Maule | Maule | 0.247713 | 0.145638 | 0.196676 |
| 5 | Tarapacá | Alto Hospicio | 0.264641 | 0.1347 | 0.19967 |


The communes with the lowest second-dose vaccination rates are:

In [16]:
df[main_columns].sort_values('P_DOSIS_2').head()

| | REGION | COMUNA | P_DOSIS_1 | P_DOSIS_2 | P_DOSIS |
|---|---|---|---|---|---|
| 3 | Arica y Parinacota | General Lagos | 0.139673 | 0.071322 | 0.105498 |
| 8 | Tarapacá | Colchane | 0.178713 | 0.125496 | 0.152105 |
| 248 | Biobío | Alto Biobío | 0.218723 | 0.132391 | 0.175557 |
| 5 | Tarapacá | Alto Hospicio | 0.264641 | 0.1347 | 0.19967 |
| 169 | Maule | Maule | 0.247713 | 0.145638 | 0.196676 |


The communes with the lowest overall vaccination rates are:

In [17]:
df[main_columns].sort_values('P_DOSIS').head()

| | REGION | COMUNA | P_DOSIS_1 | P_DOSIS_2 | P_DOSIS |
|---|---|---|---|---|---|
| 3 | Arica y Parinacota | General Lagos | 0.139673 | 0.071322 | 0.105498 |
| 8 | Tarapacá | Colchane | 0.178713 | 0.125496 | 0.152105 |
| 248 | Biobío | Alto Biobío | 0.218723 | 0.132391 | 0.175557 |
| 169 | Maule | Maule | 0.247713 | 0.145638 | 0.196676 |
| 5 | Tarapacá | Alto Hospicio | 0.264641 | 0.1347 | 0.19967 |
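
The three rankings above all follow the sort-then-`head` pattern. An equivalent shortcut, shown here only as a sketch on a toy frame built from four `P_DOSIS` values reported in this notebook, is `DataFrame.nsmallest`, which returns the `n` rows with the lowest values in a column.

```python
import pandas as pd

# Toy frame with four communes and their overall rate, copied from this notebook.
toy = pd.DataFrame({
    "COMUNA": ["Arica", "General Lagos", "Camarones", "Colchane"],
    "P_DOSIS": [0.348362, 0.105498, 0.387317, 0.152105],
})

# Same rows as toy.sort_values("P_DOSIS").head(2).
lowest = toy.nsmallest(2, "P_DOSIS")
print(lowest)
```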


#### 3.1.2 By region

Distribution of the data within each region, **sorted by mean**:

In [18]:
df.groupby('REGION')['P_DOSIS'].describe().sort_values('mean')

| REGION | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Arica y Parinacota | 4.0 | 0.267205 | 0.127446 | 0.105498 | 0.197108 | 0.288003 | 0.358101 | 0.387317 |
| Tarapacá | 7.0 | 0.304897 | 0.097784 | 0.152105 | 0.242915 | 0.326004 | 0.38436 | 0.401623 |
| Metropolitana de Santiago | 52.0 | 0.380648 | 0.109491 | 0.230557 | 0.326644 | 0.363233 | 0.395392 | 0.765867 |
| Atacama | 9.0 | 0.382987 | 0.042195 | 0.345406 | 0.353526 | 0.379956 | 0.386723 | 0.483904 |
| Antofagasta | 9.0 | 0.399522 | 0.118548 | 0.301856 | 0.341575 | 0.358308 | 0.374521 | 0.65413 |
| Coquimbo | 15.0 | 0.414622 | 0.055177 | 0.336937 | 0.374436 | 0.410027 | 0.450537 | 0.504119 |
| Los Lagos | 30.0 | 0.424971 | 0.05943 | 0.328107 | 0.377881 | 0.416627 | 0.457804 | 0.559768 |
| Biobío | 33.0 | 0.427915 | 0.082095 | 0.175557 | 0.390536 | 0.432981 | 0.478436 | 0.567233 |
| La Araucanía | 32.0 | 0.432565 | 0.064136 | 0.276229 | 0.388762 | 0.434587 | 0.475951 | 0.545525 |
| Los Ríos | 12.0 | 0.432608 | 0.033572 | 0.3879 | 0.412287 | 0.427322 | 0.45069 | 0.508818 |
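
To illustrate how the per-region summary is built, the sketch below groups a toy frame by `REGION` and aggregates `P_DOSIS`. The toy data are an assumption made for brevity: all four communes of Arica y Parinacota, but only two of Tarapacá's seven (Colchane and Alto Hospicio), so the Tarapacá figures will not match the table above. `agg` is used here instead of `describe` simply to keep the output narrow.

```python
import pandas as pd

# Toy subset of the data: P_DOSIS values copied from this notebook's outputs.
toy = pd.DataFrame({
    "REGION": ["Arica y Parinacota"] * 4 + ["Tarapacá"] * 2,
    "P_DOSIS": [0.348362, 0.387317, 0.227644, 0.105498, 0.152105, 0.199670],
})

# One row per region; std is the sample standard deviation (ddof=1), as in describe().
summary = (
    toy.groupby("REGION")["P_DOSIS"]
       .agg(["count", "mean", "std"])
       .sort_values("mean")
)
print(summary)
```

For Arica y Parinacota, the resulting mean and standard deviation should agree with the first row of the table above.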


Distribution of the data within each region, **sorted by standard deviation** (descending):

In [19]:
df.groupby('REGION')['P_DOSIS'].describe().sort_values('std', ascending=False)

| REGION | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Valparaíso | 36.0 | 0.504575 | 0.174932 | 0.309068 | 0.39808 | 0.432119 | 0.52728 | 1.02423 |
| Arica y Parinacota | 4.0 | 0.267205 | 0.127446 | 0.105498 | 0.197108 | 0.288003 | 0.358101 | 0.387317 |
| Antofagasta | 9.0 | 0.399522 | 0.118548 | 0.301856 | 0.341575 | 0.358308 | 0.374521 | 0.65413 |
| Metropolitana de Santiago | 52.0 | 0.380648 | 0.109491 | 0.230557 | 0.326644 | 0.363233 | 0.395392 | 0.765867 |
| Libertador General Bernardo O'Higgins | 33.0 | 0.465311 | 0.107614 | 0.271541 | 0.406086 | 0.440528 | 0.496212 | 0.760847 |
| Tarapacá | 7.0 | 0.304897 | 0.097784 | 0.152105 | 0.242915 | 0.326004 | 0.38436 | 0.401623 |
| Maule | 30.0 | 0.45076 | 0.093735 | 0.196676 | 0.413339 | 0.433636 | 0.468156 | 0.736425 |
| Biobío | 33.0 | 0.427915 | 0.082095 | 0.175557 | 0.390536 | 0.432981 | 0.478436 | 0.567233 |
| Ñuble | 21.0 | 0.513849 | 0.069102 | 0.364946 | 0.473769 | 0.521306 | 0.562744 | 0.629592 |
| Aysén del General Carlos Ibáñez del Campo | 10.0 | 0.466341 | 0.064701 | 0.367206 | 0.4391 | 0.463427 | 0.514913 | 0.56391 |


### 3.2 Particular cases

#### 3.2.1 Arica y Parinacota

Summary statistics for this region:

In [20]:
df[df.REGION == 'Arica y Parinacota'].P_DOSIS.describe()

count    4.000000
mean     0.267205
std      0.127446
min      0.105498
25%      0.197108
50%      0.288003
75%      0.358101
max      0.387317
Name: P_DOSIS, dtype: float64

Vaccination rates of the communes in the region:

In [21]:
df[df.REGION == 'Arica y Parinacota'][main_columns]

| | REGION | COMUNA | P_DOSIS_1 | P_DOSIS_2 | P_DOSIS |
|---|---|---|---|---|---|
| 0 | Arica y Parinacota | Arica | 0.418916 | 0.277808 | 0.348362 |
| 1 | Arica y Parinacota | Camarones | 0.441951 | 0.332683 | 0.387317 |
| 2 | Arica y Parinacota | Putre | 0.296635 | 0.158654 | 0.227644 |
| 3 | Arica y Parinacota | General Lagos | 0.139673 | 0.071322 | 0.105498 |


#### 3.2.2 Metropolitana de Santiago

Summary statistics for this region:

In [22]:
df[df.REGION == 'Metropolitana de Santiago'].P_DOSIS.describe()

count    52.000000
mean      0.380648
std       0.109491
min       0.230557
25%       0.326644
50%       0.363233
75%       0.395392
max       0.765867
Name: P_DOSIS, dtype: float64

Vaccination rates of the communes in the region, **sorted by overall vaccination rate**:

In [23]:
df[df.REGION == 'Metropolitana de Santiago'][main_columns].sort_values('P_DOSIS')

| | REGION | COMUNA | P_DOSIS_1 | P_DOSIS_2 | P_DOSIS |
|---|---|---|---|---|---|
| 116 | Metropolitana de Santiago | Lampa | 0.301393 | 0.159721 | 0.230557 |
| 85 | Metropolitana de Santiago | Estación Central | 0.310396 | 0.19976 | 0.255078 |
| 104 | Metropolitana de Santiago | Quilicura | 0.345231 | 0.174172 | 0.259702 |
| 82 | Metropolitana de Santiago | Cerro Navia | 0.337651 | 0.198592 | 0.268122 |
| 91 | Metropolitana de Santiago | La Pintana | 0.336054 | 0.210612 | 0.273333 |
| 112 | Metropolitana de Santiago | Puente Alto | 0.336467 | 0.212121 | 0.274294 |
| 118 | Metropolitana de Santiago | San Bernardo | 0.350454 | 0.205426 | 0.27794 |
| 107 | Metropolitana de Santiago | Renca | 0.358034 | 0.200947 | 0.279491 |
| 115 | Metropolitana de Santiago | Colina | 0.370313 | 0.199664 | 0.284989 |
| 101 | Metropolitana de Santiago | Peñalolén | 0.390494 | 0.242296 | 0.316395 |
