# Comorbidities of Mexican States

This notebook uses the dataframe of comorbidity prevalence for states (admin1) and municipalities (admin2), derived from the 2018 Mexican Census: [INEGI](https://www.inegi.org.mx/investigacion/pohd/2018/#Tabulados).

In [1]:
import pandas as pd

In [2]:
df = pd.read_excel('../data/a_peq_prev_2018.xlsx',skiprows = range(0, 2))

Only the relevant data is kept: the first 12,450 rows and the first 9 columns.

In [3]:
df = df.iloc[0:12450,0:9].copy()

The columns are renamed to make them easier to interpret.

In [4]:
df.rename(columns = {'Identificador único del municipio':'cve_ent','Entidad federativa': 'state', 'Clave de entidad federativa': 'CVE_ENT','Municipio o delegación': 'municipality', 
                     'Porcentaje de población de 20 años y más con obesidad.': 'pct_pop_obesity',
                     'Porcentaje de población de 20 años y más con diagnóstico previo de hipertensión.':'pct_pop_hypertension',
                     'Porcentaje de población de 20 años y más con diagnóstico previo de diabetes.':'pct_pop_diabetes'}, inplace=True)
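
One thing worth checking after a rename like this: `DataFrame.rename` ignores mapping keys that match no column, so a header that differs by a single character stays unchanged without any error. The sketch below uses a toy frame with hypothetical headers (one has a trailing space added on purpose) to list the keys that would not be applied.

```python
import pandas as pd

# Toy frame; the second header carries a trailing space on purpose (hypothetical quirk).
toy = pd.DataFrame({
    "Entidad federativa": ["Aguascalientes"],
    "Municipio o delegación ": ["Total"],
})

mapping = {
    "Entidad federativa": "state",
    "Municipio o delegación": "municipality",
}

# Mapping keys that do not match any existing column name.
missing = [old for old in mapping if old not in toy.columns]

# rename() applies the keys it finds and silently skips the rest.
renamed = toy.rename(columns=mapping)

print(missing)
print(list(renamed.columns))
```

An empty `missing` list means every key in the mapping matched a column.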

The national-level rows are removed from the dataframe.

In [5]:
df.query("state != 'Estados Unidos Mexicanos'",inplace=True)

The municipality code is converted to an integer for compatibility with the other dataframes it will be merged with.

In [6]:
df['cve_ent'] = df['cve_ent'].astype(int)

The state code is converted to a string and its last two characters are dropped (presumably the trailing `.0` left by a float column), for compatibility with the other dataframes it will be merged with.

In [7]:
df['CVE_ENT'] = df['CVE_ENT'].astype(str).str[:-2]
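
The slice `.str[:-2]` in the cell above drops the last two characters of each code's string form. The sketch below uses made-up values and assumes the column was read as floats, so that each string ends in `.0`. It compares the slice with a conversion through `int`, and shows what the slice would do to a column that is already integer.

```python
import pandas as pd

# Toy state codes stored as floats (assumption about how the column was read).
codes = pd.Series([1.0, 9.0, 32.0])

# Approach used in the notebook: "1.0" -> "1", "32.0" -> "32".
sliced = codes.astype(str).str[:-2]

# Alternative: go through int first, so no characters are cut by position.
via_int = codes.astype(int).astype(str)

# If the column were already integer, the slice would remove real digits.
int_codes = pd.Series([1, 9, 32])
sliced_ints = int_codes.astype(str).str[:-2]

print(sliced.tolist())
print(via_int.tolist())
print(sliced_ints.tolist())
```

Both approaches agree on float input. The conversion through `int` does not depend on how many characters the suffix has, but it raises an error on missing values.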

The columns of interest are selected.

In [8]:
df = df[['cve_ent', 'CVE_ENT', 'state','municipality', 'Estimador', 'pct_pop_obesity', 'pct_pop_hypertension','pct_pop_diabetes']].copy()
df

Unnamed: 0,cve_ent,CVE_ENT,state,municipality,Estimador,pct_pop_obesity,pct_pop_hypertension,pct_pop_diabetes
5,1000,1,Aguascalientes,Total,Valor,32.593387,14.700566,7.556478
6,1000,1,Aguascalientes,Total,Error estándar,2.246356,0.970934,0.823857
7,1000,1,Aguascalientes,Total,Límite inferior de confianza,28.898460,13.103521,6.201354
8,1000,1,Aguascalientes,Total,Límite superior de confianza,36.288314,16.297611,8.911602
9,1000,1,Aguascalientes,Total,Coeficiente de variación,6.892061,6.604742,7.556478
...,...,...,...,...,...,...,...,...
12445,32058,32,Zacatecas,Santa María de la Paz,Valor,25.364136,21.230514,11.442311
12446,32058,32,Zacatecas,Santa María de la Paz,Error estándar,4.175374,2.587923,1.051674
12447,32058,32,Zacatecas,Santa María de la Paz,Límite inferior de confianza,18.496258,16.973759,9.712460
12448,32058,32,Zacatecas,Santa María de la Paz,Límite superior de confianza,32.232015,25.487269,13.172161
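
In the rows displayed above, the municipality identifier appears to be the state code followed by three digits (`1000` with state `1`, `32058` with state `32`). Assuming that pattern holds for the whole table, a quick consistency check between the two columns could look like this sketch on a toy frame:

```python
import pandas as pd

# Toy rows that mirror the pattern seen in the output above.
toy = pd.DataFrame({
    "cve_ent": [1000, 1001, 32058],   # integer municipality identifier
    "CVE_ENT": ["1", "1", "32"],      # state code stored as a string
})

# Integer division by 1000 drops the last three digits of the identifier.
derived_state = (toy["cve_ent"] // 1000).astype(str)

# True only if every derived code matches the stored state code.
consistent = bool((derived_state == toy["CVE_ENT"]).all())
print(consistent)
```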


## Process of obtaining state data

Only the state-level rows are kept, that is, the rows whose municipality is 'Total'.

In [9]:
dfStates = df.query("municipality == 'Total'").copy()

Only the point estimates (`Valor`) of the state data are kept.

In [10]:
dfStates.query("Estimador == 'Valor'",inplace=True)
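
The two filters above can also be written as a single `query` expression. The sketch below uses a toy frame with made-up percentages to check that both forms select the same rows.

```python
import pandas as pd

# Toy frame with the same column names as the notebook; values are made up.
toy = pd.DataFrame({
    "municipality": ["Total", "Total", "Aguascalientes", "Aguascalientes"],
    "Estimador": ["Valor", "Error estándar", "Valor", "Error estándar"],
    "pct_pop_obesity": [32.6, 2.2, 31.5, 1.9],
})

# Two-step filtering, as in the cells above.
two_step = toy.query("municipality == 'Total'").copy()
two_step.query("Estimador == 'Valor'", inplace=True)

# Equivalent single expression combining both conditions.
one_step = toy.query("municipality == 'Total' and Estimador == 'Valor'")

print(two_step.equals(one_step))
```

Keeping the steps separate, as the notebook does, makes each intermediate result easier to inspect.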

The columns of interest are selected from the state data.

In [11]:
dfStates = dfStates[[ 'CVE_ENT', 'state', 'pct_pop_obesity', 'pct_pop_hypertension','pct_pop_diabetes']].copy()

The state code column is renamed for compatibility with other dataframes.

In [12]:
dfStates.rename(columns={'CVE_ENT':'cve_ent'},inplace=True)
dfStates

Unnamed: 0,cve_ent,state,pct_pop_obesity,pct_pop_hypertension,pct_pop_diabetes
5,1,Aguascalientes,32.593387,14.700566,7.556478
65,2,Baja California,48.366995,21.007712,9.974848
95,3,Baja California Sur,42.849118,16.536911,8.36975
125,4,Campeche,44.903811,26.108154,14.007917
185,5,Coahuila de Zaragoza,37.606715,22.397551,12.343806
380,6,Colima,43.19011,17.222931,10.827326
435,7,Chiapas,28.954615,16.232804,7.790363
1030,8,Chihuahua,38.667202,22.602201,9.250058
1370,9,Ciudad de México,36.3398,20.207474,12.665536
1455,10,Durango,37.582702,20.249738,10.900251


The state comorbidity data is saved.

In [13]:
dfStates.to_csv('../data/week3_comorbidities_states.csv',index=False)
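
CSV files do not store column types, so a code column written as strings may be read back as integers. The sketch below writes a toy frame to an in-memory buffer (a stand-in for the real file) and shows how passing `dtype` to `read_csv` keeps the codes as strings.

```python
import io

import pandas as pd

# Toy state table with string codes, as produced earlier in the notebook.
states = pd.DataFrame({
    "cve_ent": ["1", "2"],
    "state": ["Aguascalientes", "Baja California"],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
states.to_csv(buffer, index=False)

# Default read: pandas infers an integer type for the code column.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Explicit dtype: the codes stay as strings.
buffer.seek(0)
string_read = pd.read_csv(buffer, dtype={"cve_ent": str})

print(default_read["cve_ent"].dtype)
print(string_read["cve_ent"].tolist())
```

Whichever type is chosen, it needs to match the key column of the dataframe the file is later merged with.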

## Process of obtaining municipality data

All municipality-level rows are kept, that is, the rows whose municipality is not 'Total'.

In [14]:
dfMun = df.query("municipality != 'Total'").copy()

Only the point estimates (`Valor`) of the municipality data are kept.

In [15]:
dfMun.query("Estimador == 'Valor'",inplace=True)

The columns of interest are selected from the municipality data.

In [16]:
dfMun = dfMun[[ 'cve_ent', 'municipality', 'pct_pop_obesity', 'pct_pop_hypertension','pct_pop_diabetes']].copy()
dfMun

Unnamed: 0,cve_ent,municipality,pct_pop_obesity,pct_pop_hypertension,pct_pop_diabetes
10,1001,Aguascalientes,31.486541,14.942242,7.495861
15,1002,Asientos,32.282284,15.320425,7.953634
20,1003,Calvillo,40.004293,13.751906,9.172624
25,1004,Cosío,32.596450,16.431493,7.383116
30,1005,Jesús María,34.731715,12.356755,6.745819
...,...,...,...,...,...
12425,32054,Villa Hidalgo,31.075955,16.920279,9.895491
12430,32055,Villanueva,30.526273,21.746589,12.050064
12435,32056,Zacatecas,36.822130,20.010453,11.814172
12440,32057,Trancoso,32.477646,15.941678,8.089966


The municipality comorbidity data is saved.

In [17]:
dfMun.to_csv('../data/week3_comorbidities_municipalities.csv',index=False)