<a href="https://colab.research.google.com/github/acoiman/pdt/blob/main/asthma_mortality/notebooks/colab/02_Asthma_Mortality_PP_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Asthma Mortality Data Preprocessing (Part 2)

In Part 2 of the asthma mortality data preprocessing, we perform feature engineering: we calculate the asthma mortality rate per 100,000 inhabitants, aggregated by department.

## Load libraries
Libraries required for the analysis will be loaded.

In [None]:
# DataFrame libraries
import pandas as pd
from pandas.api.types import CategoricalDtype
import geopandas as gpd

# numpy
import numpy as np

# Google Drive libraries
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload


# other libraries
import csv
from itables import init_notebook_mode
import webbrowser

In [None]:
# Change directory to the work folder (the Docker container starts in /home/jovyan/)
%cd work

## Modifying 2001 census data

Argentina's 2001 population data, aggregated by department, were obtained from the website [poblaciones.org](https://poblaciones.org/). The data were displayed in QGIS, and only the total population column was kept.

This dataset will be modified below. All polygons in Buenos Aires City will be dissolved because, according to the Department of Health Statistics (DEIS), "in the case of Buenos Aires city, we treated it as a single geographic unit. The subdivision of Buenos Aires City was not homogeneous throughout the requested data period." Consequently, Buenos Aires City will be considered as a single department.

In [None]:
# Read a shapefile into a GeoDataFrame.
gdf= gpd.read_file("pdt/asthma_mortality/data/shp/censo_2001.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the DataFrame 'gdf'.
gdf.head()

In [None]:
# Generate information about the dataframe.
gdf.info()

In [None]:
# Filter a GeoDataFrame to select rows where the 'DPTO' column starts with '02'.
departamentos_02 = gdf[gdf['DPTO'].str.startswith('02')]

# Merge the geometries of all rows in 'departamentos_02' into a single polygon.
poligono_union = departamentos_02.union_all()

# Calculate the sum of the population in the 'A_2001' column of the 'departamentos_02' dataframe.
suma_poblacion = departamentos_02['A_2001'].sum()

# Update the 'geometry' and 'A_2001' columns in a GeoDataFrame.
gdf.loc[departamentos_02.index, 'geometry'] = poligono_union

# Update the 'A_2001' column in the 'gdf' DataFrame at the specified index locations with the calculated 'suma_poblacion' value.
gdf.loc[departamentos_02.index, 'A_2001'] = suma_poblacion

# Filter out rows from a DataFrame where the 'DPTO' column starts with '02'.
gdf = gdf[~gdf['DPTO'].str.startswith('02')]

# Create a new GeoDataFrame row with specified columns and values
new_row = gpd.GeoDataFrame({'DPTO': '02000', 'geometry': poligono_union, 'A_2001': suma_poblacion}, index=[0], crs=gdf.crs)

# Concatenate a DataFrame (new_row) to another DataFrame (gdf) while ignoring the index.
gdf = pd.concat([gdf, new_row], ignore_index=True)

In [None]:
# Rename the column 'DPTO' to 'IDDPTO' in the DataFrame gdf.
gdf = gdf.rename(columns={'DPTO': 'IDDPTO'})

In [None]:
# Filter polygons where 'IDDPTO' starts with '02'
selected_polygons = gdf[gdf['IDDPTO'].str.startswith('02')]

# Print or further process the selected polygons
selected_polygons


In [None]:
# Save the GeoDataFrame to a shapefile at the specified path.
gdf.to_file("pdt/asthma_mortality/data/shp/censo_2001_modified.shp")

Since the dataset resulting from the union of the CABA polygons presents topological inconsistencies (holes), it was cleaned with the Delete Holes tool in QGIS.

In [None]:
# Read the cleaned shapefile containing the 2001 census population by department.
gdf2001= gpd.read_file("pdt/asthma_mortality/data/shp/censo_2001_clean.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the gdf2001 DataFrame
init_notebook_mode(all_interactive=False)
gdf2001.head()

In [None]:
# Return the length of the object `gdf2001`.
len(gdf2001)

## Modifying SHP census 2010-2023


Argentina's projected population data for 2010-2023, aggregated by department, were obtained from the website [poblaciones.org](https://poblaciones.org/).


This dataset will be modified below. All polygons in Buenos Aires City will be dissolved because, according to the Department of Health Statistics (DEIS), "in the case of Buenos Aires city, we treated it as a single geographic unit. The subdivision of Buenos Aires City was not homogeneous throughout the requested period." Consequently, Buenos Aires City will be considered as a single department.

In [None]:
# Read a shapefile into a GeoDataFrame.
gdf= gpd.read_file("pdt/asthma_mortality/data/shp/proyecciones_departamento_2010-2025.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame to get an overview of the data structure and content
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Display information about the GeoDataFrame, such as column data types and non-null counts
gdf.info()

In [None]:
# Display the column names of the GeoDataFrame
gdf.columns

In [None]:
# Drop unnecessary columns from the GeoDataFrame 'gdf'
gdf = gdf.drop(columns=['CODPROV', 'PROVINCIA', 'CODDPTO', 'DEPARTAMEN','DEPARTAM_1', 'DEPARTAM_2',
                        'DEPARTAM_3', 'DEPARTAM_4', 'DEPARTAM_5', 'A_2024', 'A_2025'])

In [None]:
# Display the column names of the GeoDataFrame
gdf.columns

In [None]:
# Filter the GeoDataFrame to include only rows where the 'IDDPTO' column starts with '02'
departamentos_02 = gdf[gdf['IDDPTO'].str.startswith('02')]

# Perform a unary union operation on the geometries in the 'departamentos_02' GeoDataFrame
poligono_union = departamentos_02.union_all()

# Calculate the sum of population for each year from 2010 to 2023
suma_poblacion2010 = departamentos_02['A_2010'].sum()
suma_poblacion2011 = departamentos_02['A_2011'].sum()
suma_poblacion2012 = departamentos_02['A_2012'].sum()
suma_poblacion2013 = departamentos_02['A_2013'].sum()
suma_poblacion2014 = departamentos_02['A_2014'].sum()
suma_poblacion2015 = departamentos_02['A_2015'].sum()
suma_poblacion2016 = departamentos_02['A_2016'].sum()
suma_poblacion2017 = departamentos_02['A_2017'].sum()
suma_poblacion2018 = departamentos_02['A_2018'].sum()
suma_poblacion2019 = departamentos_02['A_2019'].sum()
suma_poblacion2020 = departamentos_02['A_2020'].sum()
suma_poblacion2021 = departamentos_02['A_2021'].sum()
suma_poblacion2022 = departamentos_02['A_2022'].sum()
suma_poblacion2023 = departamentos_02['A_2023'].sum()

# Update the geometry and population data for the specified indices
gdf.loc[departamentos_02.index, 'geometry'] = poligono_union
gdf.loc[departamentos_02.index, 'A_2010'] = suma_poblacion2010
gdf.loc[departamentos_02.index, 'A_2011'] = suma_poblacion2011
gdf.loc[departamentos_02.index, 'A_2012'] = suma_poblacion2012
gdf.loc[departamentos_02.index, 'A_2013'] = suma_poblacion2013
gdf.loc[departamentos_02.index, 'A_2014'] = suma_poblacion2014
gdf.loc[departamentos_02.index, 'A_2015'] = suma_poblacion2015
gdf.loc[departamentos_02.index, 'A_2016'] = suma_poblacion2016
gdf.loc[departamentos_02.index, 'A_2017'] = suma_poblacion2017
gdf.loc[departamentos_02.index, 'A_2018'] = suma_poblacion2018
gdf.loc[departamentos_02.index, 'A_2019'] = suma_poblacion2019
gdf.loc[departamentos_02.index, 'A_2020'] = suma_poblacion2020
gdf.loc[departamentos_02.index, 'A_2021'] = suma_poblacion2021
gdf.loc[departamentos_02.index, 'A_2022'] = suma_poblacion2022
gdf.loc[departamentos_02.index, 'A_2023'] = suma_poblacion2023

# Filter out rows where 'IDDPTO' starts with '02'
gdf = gdf[~gdf['IDDPTO'].str.startswith('02')]

# Creating a new GeoDataFrame row with population data and geometry
new_row = gpd.GeoDataFrame({'IDDPTO': '02000', 'geometry': poligono_union, 'A_2010': suma_poblacion2010,
                            'A_2011': suma_poblacion2011,
                            'A_2012': suma_poblacion2012,
                            'A_2013': suma_poblacion2013,
                            'A_2014': suma_poblacion2014,
                            'A_2015': suma_poblacion2015,
                            'A_2016': suma_poblacion2016,
                            'A_2017': suma_poblacion2017,
                            'A_2018': suma_poblacion2018,
                            'A_2019': suma_poblacion2019,
                            'A_2020': suma_poblacion2020,
                            'A_2021': suma_poblacion2021,
                            'A_2022': suma_poblacion2022,
                            'A_2023': suma_poblacion2023,
                            'IDPROV': '02',
                            'PROV': 'CABA',
                            'DPTO': 'CABA',
                            }, index=[0], crs=gdf.crs)

# Adding a new row to the GeoDataFrame with population data and geometry
gdf = pd.concat([gdf, new_row], ignore_index=True)
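The fourteen per-year sums above follow one pattern, so they could also be computed in a single step. The sketch below is only an illustration on a toy table with made-up numbers and two year columns; in the notebook the column list would presumably be `[f"A_{y}" for y in range(2010, 2024)]`.

```python
import pandas as pd

# Toy stand-in for `departamentos_02` (made-up numbers, two years only)
departamentos_02 = pd.DataFrame({
    "IDDPTO": ["02001", "02002", "02003"],
    "A_2010": [100, 200, 300],
    "A_2011": [110, 210, 310],
})

# Every population column shares the "A_" prefix
year_cols = [c for c in departamentos_02.columns if c.startswith("A_")]

# One sum per year column, returned as a Series indexed by column name
sums = departamentos_02[year_cols].sum()

# Attributes of the single dissolved row (the geometry would be added separately)
new_row = {"IDDPTO": "02000", **sums.to_dict()}
print(new_row)
```

Building the row from `sums.to_dict()` keeps the column list in one place, which makes it harder to mistype a year.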

In [None]:
# Saving the modified GeoDataFrame to a shapefile
gdf.to_file("pdt/asthma_mortality/data/shp/proyecciones_departamento_2010-2023_modified.shp")

Since the dataset resulting from the union of the CABA polygons has topological inconsistencies (holes), it was cleaned with the Delete Holes tool in QGIS.

In [None]:
# This line reads a shapefile containing projections for departments from 2010 to 2023
gdf2010_2023= gpd.read_file("pdt/asthma_mortality/data/shp/proyecciones_departamento_2010-2023_clean.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame to inspect its structure
init_notebook_mode(all_interactive=True)
gdf2010_2023.head()

In [None]:
# Get the length of the GeoDataFrame
len(gdf2010_2023)

## Merging data from 2001 and 2010-2023 census

In [None]:
# Merging the 2010-2023 GeoDataFrame with the 2001 GeoDataFrame on the 'IDDPTO' column using a left join
gdf01_1023 = gdf2010_2023.merge(gdf2001, on='IDDPTO', how='left')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf01_1023.head()

In [None]:
# Display the columns of the GeoDataFrame
gdf01_1023.columns

In [None]:
# Reordering the columns in the DataFrame to match the specified order
new_column_order = ['IDPROV', 'PROV', 'IDDPTO', 'DPTO', 'A_2001', 'A_2010', 'A_2011', 'A_2012', 'A_2013', 'A_2014', 'A_2015',
                    'A_2016', 'A_2017', 'A_2018', 'A_2019', 'A_2020', 'A_2021', 'A_2022', 'A_2023', 'geometry_x']
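gdf01_1023 = gdf01_1023[new_column_order]
gdf01_1023 = gdf01_1023.rename(columns={'geometry_x': 'geometry'})
# Make sure the renamed column is registered as the active geometry before saving
gdf01_1023 = gpd.GeoDataFrame(gdf01_1023, geometry='geometry', crs=gdf2010_2023.crs)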


In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf01_1023.head()

In [None]:
# Display information about the GeoDataFrame, such as column data types and non-null counts
gdf01_1023.info()

In [None]:
# Save the GeoDataFrame to a shapefile in the specified directory
gdf01_1023.to_file("pdt/asthma_mortality/data/shp/proyecciones_departamento_2001-2010_2023_01.shp")

## Compute the population by department between 2002 and 2009


We have the number of inhabitants per department for 2001 and for 2010 to 2023. The population per department between 2002 and 2009 will be estimated in two stages: i) the population of each province is projected using the average annual growth rate per thousand inhabitants (TCAM); ii) the population of each department is then estimated from the projected provincial population using the relative increase method. The resulting dataset will show the population by department between 2001 and 2022.

In [None]:
# This line reads a shapefile into a GeoDataFrame using Geopandas
gdf= gpd.read_file("pdt/asthma_mortality/data/shp/proyecciones_departamento_2001-2010_2023_01.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Adding new columns to the GeoDataFrame with specified data types
gdf['PP_2001'] = pd.Series(dtype='Int64')
gdf['PP_2010'] = pd.Series(dtype='Int64')
gdf['TCAM_0110'] = np.nan

In [None]:
# Extract unique province IDs from the 'IDPROV' column in the GeoDataFrame
lcp = sorted(list(gdf['IDPROV'].unique()))
print(lcp)
len(lcp)

Next, we build lists with the population of each province in 2001 and 2010, together with the TCAM for that period, in the same order as the sorted province IDs in `lcp`. These data were obtained from INDEC (https://www.indec.gob.ar/ftp/nuevaweb/cuadros/7/sesd_01a01.xls).

In [None]:
# list with the population of each province for the year 2001
lpp2001 = [2995805, 14211087, 338168, 3154833, 942870, 995192, 427404, 1177747, 492221, 619929, 307500, 297149, 1611091, 973225, 489997, 575043, 1090600, 630793, 375865, 199381, 3102849, 812609, 1359114, 102498   ]

In [None]:
# list with the population of each province for the year 2010
lpp2010 = [3038430, 15771581, 378977, 3384649, 1021242, 1083740, 515203, 1259903, 553528, 685870, 32815, 343765, 1780854, 1117121, 573881, 650511, 1243386, 698476, 445477, 276407, 3269134, 883684, 1494358, 132116]

In [None]:
# list with the TCAM of each province (2001-2010)
ltcam0110 = [1.6, 11.7, 12.8, 7.9, 9, 9.6, 21.1, 7.6, 13.2, 11.4, 7.3, 16.4, 11.3, 15.5, 17.8, 13.9, 14.8, 11.5, 19.2, 37.2, 5.8, 9.4, 10.7, 28.8]
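Because these lists are typed by hand, a quick consistency check can catch transcription slips. The sketch below is only an illustration: it takes three entries copied from positions 0, 1 and 10 of the lists above, grows the 2001 figure at the listed TCAM, and flags entries whose 2010 figure is far from the result. The 9-year span and the 5 % tolerance are assumptions chosen for the example.

```python
# (list position, 2001 population, 2010 population, TCAM) copied from the lists above
rows = [
    (0, 2995805, 3038430, 1.6),
    (1, 14211087, 15771581, 11.7),
    (10, 307500, 32815, 7.3),
]

YEARS = 9         # assumption: 2001 -> 2010 treated as 9 whole years
TOLERANCE = 0.05  # assumption: flag relative gaps above 5 %

suspect = []
for pos, p2001, p2010, rate in rows:
    # 2010 population implied by the 2001 figure and the growth rate
    expected = p2001 * (1 + rate / 1000) ** YEARS
    if abs(p2010 - expected) / expected > TOLERANCE:
        suspect.append(pos)

print(suspect)  # positions whose 2010 value disagrees with the growth rate
```

With these inputs only position 10 is flagged: its 2010 figure has one digit fewer than its 2001 figure, so it is worth re-checking against the INDEC table.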

In [None]:
# Iterate through the list of province IDs (lcp) and update the GeoDataFrame (gdf) with population data and growth rates
for i in range(len(lcp)):
  gdf.loc[gdf['IDPROV'] == lcp[i], 'PP_2001'] = lpp2001[i]
  gdf.loc[gdf['IDPROV'] == lcp[i], 'PP_2010'] = lpp2010[i]
  gdf.loc[gdf['IDPROV'] == lcp[i], 'TCAM_0110'] = ltcam0110[i]
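The same assignment can be written without an explicit loop by turning each list into a lookup table keyed by province ID. This is a sketch on a toy frame, not the notebook's data: it uses only the first three values of `lpp2001` and assumes the first three sorted province IDs are `"02"`, `"06"` and `"10"`.

```python
import pandas as pd

# First three entries only; the province IDs are assumed for the example
lcp = ["02", "06", "10"]
lpp2001 = [2995805, 14211087, 338168]

# Toy stand-in for `gdf`: one row per department, several per province
df = pd.DataFrame({"IDPROV": ["06", "02", "06", "10"]})

# Map each department's province ID to the provincial population
df["PP_2001"] = df["IDPROV"].map(dict(zip(lcp, lpp2001)))
print(df)
```

An ID that is missing from the lookup table yields `NaN`, which makes a mismatch between the lists and the data easy to spot.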

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

To calculate the projected population of each province per year between 2002 and 2009, the following equation will be used:

\begin{align}
TCAM = \left(\sqrt[t]{Pf/Pi}-1\right) \times 1000
\end{align}

* TCAM: average annual growth rate per thousand inhabitants
* Pf: population at the end of the period
* Pi: population at the beginning of the period.
* t: magnitude of the given period expressed in years.

Source: https://www.argentina.gob.ar/interior/renaper/estadistica-de-poblacion/crecimiento-poblacional-2001-2010-y-2010-2022


Solving the equation for Pf gives:

\begin{align}
Pf = Pi \times \left(1+\frac{TCAM}{1000}\right)^{t}
\end{align}

This calculation is only needed for 2002 to 2009; from 2010 onwards, the projected population by province and department is available at [poblaciones.org](https://poblaciones.org/).
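As a quick check of the formula, the sketch below applies it to the first province in the lists above (2,995,805 inhabitants in 2001, TCAM of 1.6) and then recovers the rate from the projected value. The function names `tcam` and `project` are illustrative and do not appear elsewhere in the notebook.

```python
def tcam(pi: float, pf: float, t: float) -> float:
    """Average annual growth rate per thousand inhabitants over t years."""
    return ((pf / pi) ** (1 / t) - 1) * 1000


def project(pi: float, rate: float, t: float) -> float:
    """Population after t years, growing at `rate` per thousand per year."""
    return pi * (1 + rate / 1000) ** t


# One year of growth from the 2001 figure, truncated to an integer
pp_2002 = int(project(2995805, 1.6, 1))
print(pp_2002)

# Round trip: the rate recovered from an 8-year projection matches the input
recovered = tcam(2995805, project(2995805, 1.6, 8), 8)
print(round(recovered, 6))
```

The exponent applies to the whole growth factor `(1 + rate / 1000)`, which is what makes the two functions inverses of each other.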

In [None]:
# Calculate the projected population by province for each year between 2002 and 2009
#2002
gdf['PP_2002'] = (pow(((gdf.TCAM_0110/1000)+1), 1)*gdf.PP_2001).astype(int)

#2003
gdf['PP_2003'] = (pow(((gdf.TCAM_0110/1000)+1), 2)*gdf.PP_2001).astype(int)

#2004
gdf['PP_2004'] = (pow(((gdf.TCAM_0110/1000)+1), 3)*gdf.PP_2001).astype(int)

#2005
gdf['PP_2005'] = (pow(((gdf.TCAM_0110/1000)+1), 4)*gdf.PP_2001).astype(int)

#2006
gdf['PP_2006'] = (pow(((gdf.TCAM_0110/1000)+1), 5)*gdf.PP_2001).astype(int)

#2007
gdf['PP_2007'] = (pow(((gdf.TCAM_0110/1000)+1), 6)*gdf.PP_2001).astype(int)

#2008
gdf['PP_2008'] = (pow(((gdf.TCAM_0110/1000)+1), 7)*gdf.PP_2001).astype(int)

#2009
gdf['PP_2009'] = (pow(((gdf.TCAM_0110/1000)+1), 8)*gdf.PP_2001).astype(int)

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

The relative increase method will be used to calculate the projected population of each department per year between 2002 and 2009 through the following equation:

\begin{align}
P^{t}_{i} = a_{i} \times P^{t}_{T}+b_{i}
\end{align}

Where:
\begin{align}
P^{t}_{i} = \text{population of the smaller area i in year t}
\\
P^{t}_{T} = \text{population of the larger area T in year t}
\end{align}

The proportional increase coefficient of the smaller area (a) in relation to the increase in population of the larger area is equal to:

\begin{align}
{a_{i} = \frac{P^{1}_{i}-P^{0}_{i}}{P^{1}_{T}-P^{0}_{T}}}
\end{align}

The linear correlation coefficient (b) is calculated according to the following equation:

\begin{align}
{b_{i} = \frac{P^{1}_{i}+P^{0}_{i}-a_{i} \times (P^{1}_{T}+P^{0}_{T})}{2}}
\end{align}

Sources: https://biblioteca.indec.gob.ar/bases/minde/4si20_34.pdf,

https://files.alapop.org/alap/Serie-E-Investigaciones/N2/Capitulos/Capitulo4_Estimaciones&Proyecciones.pdf
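A small worked example may help to see how the two coefficients behave. The sketch below uses made-up populations for a smaller area i and a larger area T at two census dates; `relative_increase` and `estimate` are illustrative names, not part of the notebook.

```python
def relative_increase(p0_i, p1_i, p0_t, p1_t):
    """Return (a_i, b_i) from the census counts of the smaller area i
    and the larger area T at the start (0) and end (1) of the period."""
    a = (p1_i - p0_i) / (p1_t - p0_t)
    b = (p1_i + p0_i - a * (p1_t + p0_t)) / 2
    return a, b


def estimate(a, b, p_t):
    """Population of the smaller area for a given larger-area population."""
    return a * p_t + b


# Made-up figures for illustration only
a, b = relative_increase(p0_i=1000, p1_i=1200, p0_t=10000, p1_t=11000)

# The fitted line reproduces both census counts of the smaller area
print(estimate(a, b, 10000), estimate(a, b, 11000))

# Intermediate year in which the larger area has 10,500 inhabitants
print(estimate(a, b, 10500))
```

Here the smaller area absorbs a fifth of the larger area's growth (`a = 0.2`), so the estimate moves linearly between the two census counts. The method is undefined when the larger area's population does not change between the two dates, because the denominator of `a` is then zero.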

In [None]:
# proportional increase coefficient of the smaller area
gdf['a'] = (gdf['A_2010']-gdf['A_2001'])/(gdf['PP_2010']-gdf['PP_2001'])

In [None]:
# linear correlation coefficient (b)
gdf['b'] = ((gdf['A_2010']+gdf['A_2001'])-(gdf['a']*(gdf['PP_2010']+gdf['PP_2001'])))/2

In [None]:
# Calculate the population of the smaller areas (departments) between 2002 and 2009
# 2002
gdf['A_2002'] = (gdf['a'] * gdf['PP_2002'] + gdf['b']).astype(int)

# 2003
gdf['A_2003'] = (gdf['a'] * gdf['PP_2003'] + gdf['b']).astype(int)

# 2004
gdf['A_2004'] = (gdf['a'] * gdf['PP_2004'] + gdf['b']).astype(int)

# 2005
gdf['A_2005'] = (gdf['a'] * gdf['PP_2005'] + gdf['b']).astype(int)

# 2006
gdf['A_2006'] = (gdf['a'] * gdf['PP_2006'] + gdf['b']).astype(int)

# 2007
gdf['A_2007'] = (gdf['a'] * gdf['PP_2007'] + gdf['b']).astype(int)

# 2008
gdf['A_2008'] = (gdf['a'] * gdf['PP_2008'] + gdf['b']).astype(int)

# 2009
gdf['A_2009'] = (gdf['a'] * gdf['PP_2009'] + gdf['b']).astype(int)

In [None]:
# Display the columns of the GeoDataFrame
gdf.columns

In [None]:
# Selecting specific columns from the GeoDataFrame
gdf = gdf[['IDPROV', 'PROV', 'IDDPTO', 'DPTO', 'A_2001', 'A_2002', 'A_2003', 'A_2004', 'A_2005', 'A_2006','A_2007', 'A_2008', 'A_2009', 'A_2010', 'A_2011',\
           'A_2012', 'A_2013', 'A_2014', 'A_2015', 'A_2016', 'A_2017', 'A_2018','A_2019', 'A_2020', 'A_2021', 'A_2022', 'geometry']]

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/censo_2001_2022.shp", encoding='utf-8')

## Calculating the adjusted asthma mortality rate

Using population and asthma mortality data by department, we will calculate the adjusted asthma mortality rate per 100,000 inhabitants for each year of study (2001-2022).

For 2011, 2014, 2018, 2021 and 2022, the geographic codes of some departments must be modified for the following reasons:

* Lezama District, Province of Buenos Aires: Law 14,087, passed by the Chamber of Deputies of the Province of Buenos Aires (22-12-2009), created a new municipality called Lezama from part of the territory of the Chascomús District. Since the geographic boundaries of the Lezama District were not available at the time of the 2010 census, the corresponding information is included within the Chascomús District.

* Regarding the assignment of geographic codes, the code used in the 2010 census for the Chascomús District (code 217) has been replaced by the new code 218. The Lezama District is assigned code 466.
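One way to apply such a recoding is a lookup table from the new codes to the code used by the population layer. The sketch below is an assumption about the direction of the change, not the notebook's confirmed method: it supposes that `IDDPTO` is the two-digit province code (06 for Buenos Aires) followed by the three-digit district code, and that deaths registered under 218 or 466 should be folded back into 217 so that they match the 2010 geometry. `RECODE` and `harmonise_code` are hypothetical names.

```python
# Assumed mapping: new district codes -> code kept in the population layer
RECODE = {
    "06218": "06217",  # Chascomús, new code -> 2010 census code
    "06466": "06217",  # Lezama -> counted within Chascomús
}


def harmonise_code(iddpto: str) -> str:
    """Return the code used for joining; unknown codes pass through unchanged."""
    return RECODE.get(iddpto, iddpto)


codes = ["06218", "06466", "02000"]
print([harmonise_code(c) for c in codes])
```

If the population layer already uses the new codes, the mapping would have to be reversed or dropped.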


### 2001

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2001
df2001 = df[df['ANIO'] == 2001]

In [None]:
# Display the first few rows of the DataFrame for the year 2001
init_notebook_mode(all_interactive=False)
df2001.head()

In [None]:
# The following code selects only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2001
df2001 = df2001[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2001
init_notebook_mode(all_interactive=False)
df2001.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2001 DataFrame
df2001['IDDPTO'] = df2001['IDDPTO'].apply(append_zero_IDDPTO)
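Python's built-in `str.zfill` offers a compact alternative for the same padding step. The sketch below is not what the notebook runs, and `pad_iddpto` is an illustrative name. Unlike the lambda above, which adds exactly one zero, it always pads to five characters.

```python
def pad_iddpto(value) -> str:
    """Convert a department code to a zero-padded, five-character string."""
    return str(value).zfill(5)


print(pad_iddpto(2007))    # four-digit integer code gains a leading zero
print(pad_iddpto(14021))   # five-digit code is returned unchanged
print(pad_iddpto("6007"))  # string input works the same way
```

On a pandas column the equivalent is `df['IDDPTO'].astype(str).str.zfill(5)`.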

In [None]:
# Display the first few rows of the DataFrame for the year 2001
init_notebook_mode(all_interactive=False)
df2001.head()

In [None]:
# Get the number of rows in the DataFrame `df2001`
len(df2001)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2001_2 = df2001.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2001' in the dataframe df2001_2
df2001_2 = df2001_2.rename(columns={'CANTIDAD': 'C_2001'})

In [None]:
# Display the first few rows of the DataFrame for the year 2001
init_notebook_mode(all_interactive=False)
df2001_2.head()

In [None]:
# Get the number of rows in the DataFrame df2001_2
len(df2001_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/censo_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2001_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2001_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2001' column with 0 and convert the column to integer type
gdf['C_2001'] = gdf['C_2001'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2001'] = round((gdf['C_2001']/gdf['A_2001'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2001' column
init_notebook_mode(all_interactive=False)
gdf['CA_2001'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
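The steps used for 2001 are the same for every year, so they could be gathered into one function and called in a loop. The following is a minimal sketch on toy tables with made-up numbers, assuming the column names used in this notebook (`ANIO`, `IDDPTO`, `CANTIDAD`, `A_<year>`); `add_year_rate` is a hypothetical helper, and the geometry column is left out for brevity.

```python
import pandas as pd


def add_year_rate(pop: pd.DataFrame, deaths: pd.DataFrame, year: int) -> pd.DataFrame:
    """Add the death count (C_<year>) and the rate per 100,000 (CA_<year>)."""
    counts = (
        deaths.loc[deaths["ANIO"] == year]
        # Zero-pad department codes so they match the population table
        .assign(IDDPTO=lambda d: d["IDDPTO"].astype(str).str.zfill(5))
        .groupby("IDDPTO", as_index=False)["CANTIDAD"].sum()
        .rename(columns={"CANTIDAD": f"C_{year}"})
    )
    out = pop.merge(counts, on="IDDPTO", how="left")
    # Departments without recorded deaths get a count of zero
    out[f"C_{year}"] = out[f"C_{year}"].fillna(0).astype(int)
    out[f"CA_{year}"] = (out[f"C_{year}"] / out[f"A_{year}"] * 100000).round(2)
    return out


# Toy inputs (made-up numbers)
pop = pd.DataFrame({"IDDPTO": ["02000", "06007"], "A_2001": [200000, 50000]})
deaths = pd.DataFrame({
    "ANIO": [2001, 2001, 2002],
    "IDDPTO": [2000, 2000, 6007],
    "CANTIDAD": [3, 1, 5],
})

result = add_year_rate(pop, deaths, 2001)
print(result)
```

Called once per year on the same table, this would also allow the shapefile to be written a single time at the end instead of being read and saved for each year.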

### 2002

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2002
df2002 = df[df['ANIO'] == 2002]

In [None]:
# Display the first few rows of the DataFrame for the year 2002
init_notebook_mode(all_interactive=False)
df2002.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2002
df2002 = df2002[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2002
init_notebook_mode(all_interactive=False)
df2002.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2002 DataFrame
df2002['IDDPTO'] = df2002['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2002
init_notebook_mode(all_interactive=False)
df2002.head()

In [None]:
# Get the number of rows in the DataFrame `df2002`
len(df2002)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2002_2 = df2002.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2002' in the dataframe df2002_2
df2002_2 = df2002_2.rename(columns={'CANTIDAD': 'C_2002'})

In [None]:
# Display the first few rows of the DataFrame for the year 2002
init_notebook_mode(all_interactive=False)
df2002_2.head()

In [None]:
# Get the number of rows in the DataFrame df2002_2
len(df2002_2)

In [None]:
# Read the shapefile into a GeoDataFrame with UTF-8 encoding
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2002_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2002_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2002' column with 0 and convert the column to integer type
gdf['C_2002'] = gdf['C_2002'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2002'] = round((gdf['C_2002']/gdf['A_2002'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2002' column
init_notebook_mode(all_interactive=False)
gdf['CA_2002'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2003

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2003
df2003 = df[df['ANIO'] == 2003]

In [None]:
# Display the first few rows of the DataFrame for the year 2003
init_notebook_mode(all_interactive=False)
df2003.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2003
df2003 = df2003[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2003
init_notebook_mode(all_interactive=False)
df2003.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2003 DataFrame
df2003['IDDPTO'] = df2003['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2003
init_notebook_mode(all_interactive=False)
df2003.head()

In [None]:
# Get the number of rows in the DataFrame `df2003`
len(df2003)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2003_2 = df2003.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2003' in the dataframe df2003_2
df2003_2 = df2003_2.rename(columns={'CANTIDAD': 'C_2003'})

In [None]:
# Display the first few rows of the DataFrame for the year 2003
init_notebook_mode(all_interactive=False)
df2003_2.head()

In [None]:
# Get the number of rows in the DataFrame df2003_2
len(df2003_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2003_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2003_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2003' column with 0 and convert the column to integer type
gdf['C_2003'] = gdf['C_2003'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2003'] = round((gdf['C_2003']/gdf['A_2003'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2003' column
init_notebook_mode(all_interactive=False)
gdf['CA_2003'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2004

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2004
df2004 = df[df['ANIO'] == 2004]

In [None]:
# Display the first few rows of the DataFrame for the year 2004
init_notebook_mode(all_interactive=False)
df2004.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2004
df2004 = df2004[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2004
init_notebook_mode(all_interactive=False)
df2004.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2004 DataFrame
df2004['IDDPTO'] = df2004['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2004
init_notebook_mode(all_interactive=False)
df2004.head()

In [None]:
# Get the number of rows in the DataFrame `df2004`
len(df2004)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2004_2 = df2004.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2004' in the dataframe df2004_2
df2004_2 = df2004_2.rename(columns={'CANTIDAD': 'C_2004'})

In [None]:
# Display the first few rows of the DataFrame for the year 2004
init_notebook_mode(all_interactive=False)
df2004_2.head()

In [None]:
# Get the number of rows in the DataFrame df2004_2
len(df2004_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2004_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2004_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2004' column with 0 and convert the column to integer type
gdf['C_2004'] = gdf['C_2004'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2004'] = round((gdf['C_2004']/gdf['A_2004'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2004' column
init_notebook_mode(all_interactive=False)
gdf['CA_2004'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2005

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2005
df2005 = df[df['ANIO'] == 2005]

In [None]:
# Display the first few rows of the DataFrame for the year 2005
init_notebook_mode(all_interactive=False)
df2005.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2005
df2005 = df2005[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2005
init_notebook_mode(all_interactive=False)
df2005.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2005 DataFrame
df2005['IDDPTO'] = df2005['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2005
init_notebook_mode(all_interactive=False)
df2005.head()

In [None]:
# Get the number of rows in the DataFrame `df2005`
len(df2005)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2005_2 = df2005.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2005' in the dataframe df2005_2
df2005_2 = df2005_2.rename(columns={'CANTIDAD': 'C_2005'})

In [None]:
# Display the first few rows of the DataFrame for the year 2005
init_notebook_mode(all_interactive=False)
df2005_2.head()

In [None]:
# Get the number of rows in the DataFrame df2005_2
len(df2005_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2005_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2005_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2005' column with 0 and convert the column to integer type
gdf['C_2005'] = gdf['C_2005'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2005'] = round((gdf['C_2005']/gdf['A_2005'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2005' column
init_notebook_mode(all_interactive=False)
gdf['CA_2005'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
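
A left join keeps only the department codes already present in the GeoDataFrame, so deaths recorded under a code with no matching polygon are dropped without any message. The sketch below shows one possible way to list such codes using the `indicator` option of `merge`. The two small tables are invented stand-ins for `gdf` and `df2005_2`.

```python
import pandas as pd

# Toy stand-ins (invented): `left` plays the role of gdf, `right` of df2005_2
left = pd.DataFrame({"IDDPTO": ["06007", "06014"], "A_2005": [1000, 2000]})
right = pd.DataFrame({"IDDPTO": ["06007", "06999"], "C_2005": [3, 1]})

# An outer merge with indicator=True adds a "_merge" column labelling each row's origin
check = left.merge(right, on="IDDPTO", how="outer", indicator=True)

# Codes present in the counts but absent from the geometry table
unmatched = check.loc[check["_merge"] == "right_only", "IDDPTO"].tolist()
print(unmatched)
```

An empty list would suggest that every count found a department to attach to.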

### 2006

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2006
df2006 = df[df['ANIO'] == 2006]

In [None]:
# Display the first few rows of the DataFrame for the year 2006
init_notebook_mode(all_interactive=False)
df2006.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2006
df2006 = df2006[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2006
init_notebook_mode(all_interactive=False)
df2006.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2006 DataFrame
df2006['IDDPTO'] = df2006['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2006
init_notebook_mode(all_interactive=False)
df2006.head()

In [None]:
# Get the number of rows in the DataFrame `df2006`
len(df2006)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2006_2 = df2006.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2006' in the dataframe df2006_2
df2006_2 = df2006_2.rename(columns={'CANTIDAD': 'C_2006'})

In [None]:
# Display the first few rows of the DataFrame for the year 2006
init_notebook_mode(all_interactive=False)
df2006_2.head()

In [None]:
# Get the number of rows in the DataFrame df2006_2
len(df2006_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2006_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2006_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2006' column with 0 and convert the column to integer type
gdf['C_2006'] = gdf['C_2006'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2006'] = round((gdf['C_2006']/gdf['A_2006'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2006' column
init_notebook_mode(all_interactive=False)
gdf['CA_2006'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
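
The rate cells divide the death count by the `A_<year>` column and scale by 100,000. In pandas, a zero denominator produces `inf` instead of raising an error, which can distort `describe()`. The sketch below, on invented numbers, shows one way to turn those values into missing data. It assumes `A_2006` holds the population used as denominator.

```python
import numpy as np
import pandas as pd

# Invented values; A_2006 is assumed to be the population denominator
toy = pd.DataFrame({"C_2006": [2, 0, 1], "A_2006": [50000, 12000, 0]})

# Deaths per 100,000 inhabitants
rate = (toy["C_2006"] / toy["A_2006"]) * 100000

# Replace the infinities produced by a zero population with NaN, then round
rate = rate.replace([np.inf, -np.inf], np.nan).round(2)
print(rate)
```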

### 2007

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2007
df2007 = df[df['ANIO'] == 2007]

In [None]:
# Display the first few rows of the DataFrame for the year 2007
init_notebook_mode(all_interactive=False)
df2007.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2007
df2007 = df2007[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2007
init_notebook_mode(all_interactive=False)
df2007.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2007 DataFrame
df2007['IDDPTO'] = df2007['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2007
init_notebook_mode(all_interactive=False)
df2007.head()

In [None]:
# Get the number of rows in the DataFrame `df2007`
len(df2007)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2007_2 = df2007.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2007' in the dataframe df2007_2
df2007_2 = df2007_2.rename(columns={'CANTIDAD': 'C_2007'})

In [None]:
# Display the first few rows of the DataFrame for the year 2007
init_notebook_mode(all_interactive=False)
df2007_2.head()

In [None]:
# Get the number of rows in the DataFrame df2007_2
len(df2007_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2007_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2007_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2007' column with 0 and convert the column to integer type
gdf['C_2007'] = gdf['C_2007'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2007'] = round((gdf['C_2007']/gdf['A_2007'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2007' column
init_notebook_mode(all_interactive=False)
gdf['CA_2007'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
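
The leading-zero step in each year's section adds a single "0" to codes of 4 or fewer characters. An alternative, shown here only as a sketch, is to pad every code to a fixed width with `str.zfill`. The helper name `pad_iddpto` is hypothetical, and the width of 5 is an assumption based on codes such as '06217' that appear in this notebook.

```python
def pad_iddpto(code, width=5):
    """Left-pad a department code with zeros up to `width` characters."""
    return str(code).zfill(width)

print(pad_iddpto(6007))   # four digits gain one zero
print(pad_iddpto(10014))  # five digits are left unchanged
print(pad_iddpto(607))    # three digits gain two zeros
```

The two approaches differ for codes shorter than 4 characters: the lambda would return a 4-character string for `607`, while `zfill(5)` always returns 5 characters.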

### 2008

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2008
df2008 = df[df['ANIO'] == 2008]

In [None]:
# Display the first few rows of the DataFrame for the year 2008
init_notebook_mode(all_interactive=False)
df2008.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2008
df2008 = df2008[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2008
init_notebook_mode(all_interactive=False)
df2008.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2008 DataFrame
df2008['IDDPTO'] = df2008['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2008
init_notebook_mode(all_interactive=False)
df2008.head()

In [None]:
# Get the number of rows in the DataFrame `df2008`
len(df2008)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2008_2 = df2008.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2008' in the dataframe df2008_2
df2008_2 = df2008_2.rename(columns={'CANTIDAD': 'C_2008'})

In [None]:
# Display the first few rows of the DataFrame for the year 2008
init_notebook_mode(all_interactive=False)
df2008_2.head()

In [None]:
# Get the number of rows in the DataFrame df2008_2
len(df2008_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2008_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2008_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2008' column with 0 and convert the column to integer type
gdf['C_2008'] = gdf['C_2008'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2008'] = round((gdf['C_2008']/gdf['A_2008'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2008' column
init_notebook_mode(all_interactive=True)
gdf['CA_2008'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
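
The yearly DataFrames are created by filtering `df` and are later modified in place (the `IDDPTO` column is overwritten). Depending on the pandas version, assigning to a filtered subset can raise a `SettingWithCopyWarning`. A possible way to avoid the ambiguity, sketched on invented rows, is to take an explicit copy when filtering.

```python
import pandas as pd

# Toy rows, invented for illustration
df = pd.DataFrame({"ANIO": [2008, 2009], "IDDPTO": [6007, 6014], "CANTIDAD": [2, 1]})

# .copy() makes the subset independent of df, so later assignments are unambiguous
df2008 = df.loc[df["ANIO"] == 2008, ["ANIO", "IDDPTO", "CANTIDAD"]].copy()

# Overwrite the column on the copy; the original df is not touched
df2008["IDDPTO"] = df2008["IDDPTO"].astype(str).str.zfill(5)
print(df2008)
```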

### 2009

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2009
df2009 = df[df['ANIO'] == 2009]

In [None]:
# Display the first few rows of the DataFrame for the year 2009
init_notebook_mode(all_interactive=False)
df2009.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2009
df2009 = df2009[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2009
init_notebook_mode(all_interactive=False)
df2009.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2009 DataFrame
df2009['IDDPTO'] = df2009['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2009
init_notebook_mode(all_interactive=False)
df2009.head()

In [None]:
# Get the number of rows in the DataFrame `df2009`
len(df2009)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2009_2 = df2009.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2009' in the dataframe df2009_2
df2009_2 = df2009_2.rename(columns={'CANTIDAD': 'C_2009'})

In [None]:
# Display the first few rows of the DataFrame for the year 2009
init_notebook_mode(all_interactive=False)
df2009_2.head()

In [None]:
# Get the number of rows in the DataFrame df2009_2
len(df2009_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2009_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2009_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2009' column with 0 and convert the column to integer type
gdf['C_2009'] = gdf['C_2009'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2009'] = round((gdf['C_2009']/gdf['A_2009'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2009' column
init_notebook_mode(all_interactive=False)
gdf['CA_2009'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2010

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2010
df2010 = df[df['ANIO'] == 2010]

In [None]:
# Display the first few rows of the DataFrame for the year 2010
init_notebook_mode(all_interactive=False)
df2010.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2010
df2010 = df2010[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2010
init_notebook_mode(all_interactive=False)
df2010.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2010 DataFrame
df2010['IDDPTO'] = df2010['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2010
init_notebook_mode(all_interactive=False)
df2010.head()

In [None]:
# Get the number of rows in the DataFrame `df2010`
len(df2010)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2010_2 = df2010.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2010' in the dataframe df2010_2
df2010_2 = df2010_2.rename(columns={'CANTIDAD': 'C_2010'})

In [None]:
# Display the first few rows of the DataFrame for the year 2010
init_notebook_mode(all_interactive=False)
df2010_2.head()

In [None]:
# Get the number of rows in the DataFrame df2010_2
len(df2010_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2010_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2010_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2010' column with 0 and convert the column to integer type
gdf['C_2010'] = gdf['C_2010'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2010'] = round((gdf['C_2010']/gdf['A_2010'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2010' column
init_notebook_mode(all_interactive=False)
gdf['CA_2010'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
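
After the left join and `fillna(0)`, the total of the new count column should equal the total of the yearly counts. A difference means some deaths were recorded under a code that has no row in the GeoDataFrame. The following is a minimal sketch of that check on invented tables; `base` stands in for `gdf` and `counts` for `df2010_2`.

```python
import pandas as pd

# Invented stand-ins: one code in `counts` has no match in `base`
counts = pd.DataFrame({"IDDPTO": ["06007", "06999"], "C_2010": [3, 1]})
base = pd.DataFrame({"IDDPTO": ["06007", "06014"]})

# Same merge and fill steps as in the notebook
merged = base.merge(counts, on="IDDPTO", how="left")
merged["C_2010"] = merged["C_2010"].fillna(0).astype(int)

# Deaths present in the yearly counts but missing after the merge
lost = int(counts["C_2010"].sum() - merged["C_2010"].sum())
print(lost)
```

A value of 0 would indicate that no deaths were dropped by the join.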

### 2011

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2011
df2011 = df[df['ANIO'] == 2011]

In [None]:
# Display the first few rows of the DataFrame for the year 2011
init_notebook_mode(all_interactive=False)
df2011.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2011
df2011 = df2011[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2011
init_notebook_mode(all_interactive=False)
df2011.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2011 DataFrame
df2011['IDDPTO'] = df2011['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2011
init_notebook_mode(all_interactive=False)
df2011.head()

In [None]:
# Get the number of rows in the DataFrame `df2011`
len(df2011)

In [None]:
# Find rows where IDDPTO is equal to '06466' in df2011
lezama_data = df2011[df2011['IDDPTO'] == '06466']

# Display the data for Lezama
init_notebook_mode(all_interactive=False)
lezama_data

In [None]:
# Change the 'IDDPTO' value from '06466' to '06217'
df2011.loc[df2011['IDDPTO'] == '06466', 'IDDPTO'] = '06217'

In [None]:
# Verify the recode: no rows with IDDPTO '06466' should remain in df2011
lezama_data = df2011[df2011['IDDPTO'] == '06466']

# Display the result (expected to be empty)
init_notebook_mode(all_interactive=False)
lezama_data

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2011_2 = df2011.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2011' in the dataframe df2011_2
df2011_2 = df2011_2.rename(columns={'CANTIDAD': 'C_2011'})

In [None]:
# Display the first few rows of the DataFrame for the year 2011
init_notebook_mode(all_interactive=False)
df2011_2.head()

In [None]:
# Get the number of rows in the DataFrame df2011_2
len(df2011_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2011_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2011_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2011' column with 0 and convert the column to integer type
gdf['C_2011'] = gdf['C_2011'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2011'] = round((gdf['C_2011']/gdf['A_2011'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2011' column
init_notebook_mode(all_interactive=True)
gdf['CA_2011'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2012

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2012
df2012 = df[df['ANIO'] == 2012]

In [None]:
# Display the first few rows of the DataFrame for the year 2012
init_notebook_mode(all_interactive=False)
df2012.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2012
df2012 = df2012[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2012
init_notebook_mode(all_interactive=False)
df2012.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2012 DataFrame
df2012['IDDPTO'] = df2012['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2012
init_notebook_mode(all_interactive=False)
df2012.head()

In [None]:
# Get the number of rows in the DataFrame `df2012`
len(df2012)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2012_2 = df2012.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2012' in the dataframe df2012_2
df2012_2 = df2012_2.rename(columns={'CANTIDAD': 'C_2012'})

In [None]:
# Display the first few rows of the DataFrame for the year 2012
init_notebook_mode(all_interactive=False)
df2012_2.head()

In [None]:
# Get the number of rows in the DataFrame df2012_2
len(df2012_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2012_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2012_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2012' column with 0 and convert the column to integer type
gdf['C_2012'] = gdf['C_2012'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2012'] = round((gdf['C_2012']/gdf['A_2012'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2012' column
init_notebook_mode(all_interactive=False)
gdf['CA_2012'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2013

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2013
df2013 = df[df['ANIO'] == 2013]

In [None]:
# Display the first few rows of the DataFrame for the year 2013
init_notebook_mode(all_interactive=False)
df2013.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2013
df2013 = df2013[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2013
init_notebook_mode(all_interactive=False)
df2013.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2013 DataFrame
df2013['IDDPTO'] = df2013['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2013
init_notebook_mode(all_interactive=False)
df2013.head()

In [None]:
# Get the number of rows in the DataFrame `df2013`
len(df2013)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2013_2 = df2013.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2013' in the dataframe df2013_2
df2013_2 = df2013_2.rename(columns={'CANTIDAD': 'C_2013'})

In [None]:
# Display the first few rows of the DataFrame for the year 2013
init_notebook_mode(all_interactive=False)
df2013_2.head()

In [None]:
# Get the number of rows in the DataFrame df2013_2
len(df2013_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2013_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2013_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2013' column with 0 and convert the column to integer type
gdf['C_2013'] = gdf['C_2013'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2013'] = round((gdf['C_2013']/gdf['A_2013'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2013' column
init_notebook_mode(all_interactive=True)
gdf['CA_2013'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2014

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2014
df2014 = df[df['ANIO'] == 2014]

In [None]:
# Display the first few rows of the DataFrame for the year 2014
init_notebook_mode(all_interactive=False)
df2014.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2014
df2014 = df2014[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2014
init_notebook_mode(all_interactive=False)
df2014.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2014 DataFrame
df2014['IDDPTO'] = df2014['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2014
init_notebook_mode(all_interactive=False)
df2014.head()

In [None]:
# Get the number of rows in the DataFrame `df2014`
len(df2014)

In [None]:
# Find rows where IDDPTO is equal to '06218'
chascomus_data = df2014[df2014['IDDPTO'] == '06218']

# Display the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Change the 'IDDPTO' value from '06218' to '06217'
df2014.loc[df2014['IDDPTO'] == '06218', 'IDDPTO'] = '06217'

In [None]:
# Verify the recode: no rows with IDDPTO '06218' should remain in df2014
chascomus_data = df2014[df2014['IDDPTO'] == '06218']

# Display the result (expected to be empty)
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2014_2 = df2014.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2014' in the dataframe df2014_2
df2014_2 = df2014_2.rename(columns={'CANTIDAD': 'C_2014'})

In [None]:
# Display the first few rows of the DataFrame for the year 2014
init_notebook_mode(all_interactive=False)
df2014_2.head()

In [None]:
# Get the number of rows in the DataFrame df2014_2
len(df2014_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2014_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2014_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2014' column with 0 and convert the column to integer type
gdf['C_2014'] = gdf['C_2014'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# calculate the adjusted mortality rate (100,000 inhabitants)
gdf['CA_2014'] = round((gdf['C_2014']/gdf['A_2014'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2014' column
init_notebook_mode(all_interactive=False)
gdf['CA_2014'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
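
Two department codes are recoded to '06217' in this notebook ('06466' in the 2011 section and '06218' in the 2014 section). One possible way to keep such recodes in a single place is a mapping applied with `Series.replace`. The dictionary name `RECODE` and the toy rows are hypothetical, and the sketch does not claim that these are the only codes that need recoding.

```python
import pandas as pd

# Recodes taken from the 2011 and 2014 sections of this notebook
RECODE = {"06466": "06217", "06218": "06217"}

# Invented rows for illustration
toy = pd.DataFrame({
    "IDDPTO": ["06466", "06218", "06217", "06007"],
    "CANTIDAD": [1, 2, 3, 4],
})

# Replace old codes with the target code, then aggregate as usual
toy["IDDPTO"] = toy["IDDPTO"].replace(RECODE)
merged = toy.groupby("IDDPTO")["CANTIDAD"].sum()
print(merged)
```

Applying the mapping before `groupby` lets the recoded rows be summed together with the rows already stored under '06217'.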

### 2015

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2015
df2015 = df[df['ANIO'] == 2015]

In [None]:
# Display the first few rows of the DataFrame for the year 2015
init_notebook_mode(all_interactive=False)
df2015.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD' from the DataFrame df2015
df2015 = df2015[['ANIO', 'IDDPTO', 'CANTIDAD']]

In [None]:
# Display the first few rows of the DataFrame for the year 2015
init_notebook_mode(all_interactive=False)
df2015.head()

In [None]:
# Prefix a leading zero when IDDPTO has 4 or fewer characters; always return a string
def append_zero_IDDPTO(IDDPTO):
    code = str(IDDPTO)  # convert to string before checking the length
    return "0" + code if len(code) <= 4 else code

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2015 DataFrame
df2015['IDDPTO'] = df2015['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2015
init_notebook_mode(all_interactive=False)
df2015.head()

In [None]:
# Get the number of rows in the DataFrame `df2015`
len(df2015)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2015_2 = df2015.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2015' in the dataframe df2015_2
df2015_2 = df2015_2.rename(columns={'CANTIDAD': 'C_2015'})

In [None]:
# Display the first few rows of the DataFrame for the year 2015
init_notebook_mode(all_interactive=False)
df2015_2.head()

In [None]:
# Get the number of rows in the DataFrame df2015_2
len(df2015_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2015_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2015_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2015' column with 0 and convert the column to integer type
gdf['C_2015'] = gdf['C_2015'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2015'] = round((gdf['C_2015']/gdf['A_2015'])*100000, 2)
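
As a worked example of the rate formula, the sketch below applies it to three toy rows. It assumes that the 'A_2015' column holds the population used as the denominator, which this section does not state explicitly. It also guards against a zero denominator, a case the notebook cell does not handle, by turning it into a missing value instead of an infinite rate.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "C_2015": [2, 0, 1],          # deaths per department
    "A_2015": [50000, 12000, 0],  # denominator, assumed here to be population
})

# Replace a zero denominator with NaN so the rate is missing rather than infinite
denominator = toy["A_2015"].replace(0, np.nan)

# Deaths divided by population, scaled to 100,000 inhabitants
toy["CA_2015"] = (toy["C_2015"] / denominator * 100000).round(2)
print(toy)
```

The first row gives 2 / 50,000 × 100,000 = 4.0 deaths per 100,000 inhabitants.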

In [None]:
# Display summary statistics for the 'CA_2015' column
init_notebook_mode(all_interactive=False)
gdf['CA_2015'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2016

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2016
df2016 = df[df['ANIO'] == 2016]

In [None]:
# Display the first few rows of the DataFrame for the year 2016
init_notebook_mode(all_interactive=False)
df2016.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2016 = df2016[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2016
init_notebook_mode(all_interactive=False)
df2016.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2016 DataFrame
df2016['IDDPTO'] = df2016['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2016
init_notebook_mode(all_interactive=False)
df2016.head()

In [None]:
# Get the number of rows in the DataFrame `df2016`
len(df2016)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2016_2 = df2016.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2016' in the dataframe df2016_2
df2016_2 = df2016_2.rename(columns={'CANTIDAD': 'C_2016'})

In [None]:
# Display the first few rows of the DataFrame for the year 2016
init_notebook_mode(all_interactive=False)
df2016_2.head()

In [None]:
# Get the number of rows in the DataFrame df2016_2
len(df2016_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2016_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2016_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2016' column with 0 and convert the column to integer type
gdf['C_2016'] = gdf['C_2016'].fillna(0).astype(int)
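
The fill-and-cast step is needed because departments with no recorded deaths come out of the left join as NaN, which forces the count column to a floating-point type. A minimal sketch on toy data follows. The frames `left` and `right` are hypothetical stand-ins for `gdf` and the yearly counts.

```python
import pandas as pd

left = pd.DataFrame({"IDDPTO": ["06007", "06014"]})
right = pd.DataFrame({"IDDPTO": ["06007"], "C_2016": [3]})

# '06014' has no match, so its count is NaN and the column becomes float
merged = left.merge(right, on="IDDPTO", how="left")
dtype_after_merge = merged["C_2016"].dtype

# Treat "no record" as zero deaths and restore an integer type
merged["C_2016"] = merged["C_2016"].fillna(0).astype(int)
print(dtype_after_merge, merged["C_2016"].dtype)
```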

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2016'] = round((gdf['C_2016']/gdf['A_2016'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2016' column
init_notebook_mode(all_interactive=False)
gdf['CA_2016'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2017

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2017
df2017 = df[df['ANIO'] == 2017]

In [None]:
# Display the first few rows of the DataFrame for the year 2017
init_notebook_mode(all_interactive=False)
df2017.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2017 = df2017[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2017
init_notebook_mode(all_interactive=False)
df2017.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2017 DataFrame
df2017['IDDPTO'] = df2017['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2017
init_notebook_mode(all_interactive=False)
df2017.head()

In [None]:
# Get the number of rows in the DataFrame `df2017`
len(df2017)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2017_2 = df2017.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2017' in the dataframe df2017_2
df2017_2 = df2017_2.rename(columns={'CANTIDAD': 'C_2017'})

In [None]:
# Display the first few rows of the DataFrame for the year 2017
init_notebook_mode(all_interactive=False)
df2017_2.head()

In [None]:
# Get the number of rows in the DataFrame df2017_2
len(df2017_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2017_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2017_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2017' column with 0 and convert the column to integer type
gdf['C_2017'] = gdf['C_2017'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2017'] = round((gdf['C_2017']/gdf['A_2017'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2017' column
init_notebook_mode(all_interactive=False)
gdf['CA_2017'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2018

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2018
df2018 = df[df['ANIO'] == 2018]

In [None]:
# Display the first few rows of the DataFrame for the year 2018
init_notebook_mode(all_interactive=False)
df2018.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2018 = df2018[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2018
init_notebook_mode(all_interactive=False)
df2018.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2018 DataFrame
df2018['IDDPTO'] = df2018['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2018
init_notebook_mode(all_interactive=False)
df2018.head()

In [None]:
# Get the number of rows in the DataFrame df2018
len(df2018)

In [None]:
# Find rows where IDDPTO is equal to '06218'
chascomus_data = df2018[df2018['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Change the 'IDDPTO' value from '06218' to '06217'
df2018.loc[df2018['IDDPTO'] == '06218', 'IDDPTO'] = '06217'
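
The same recode can be written with `Series.replace` and a mapping, which keeps the code pairs in one place if more than one year needs them. This is a sketch on toy data. The constant name `RECODE` is hypothetical, and only the pair '06218' to '06217' comes from the notebook.

```python
import pandas as pd

# Hypothetical mapping of codes to recode; only this pair appears in the notebook
RECODE = {"06218": "06217"}

toy = pd.DataFrame({
    "IDDPTO": ["06218", "06007", "06218"],
    "CANTIDAD": [1, 2, 1],
})

# Codes not listed in the mapping are left unchanged
toy["IDDPTO"] = toy["IDDPTO"].replace(RECODE)

# After the recode, both former '06218' rows are summed under '06217'
totals = toy.groupby("IDDPTO")["CANTIDAD"].sum().reset_index()
print(totals)
```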

In [None]:
# Verify that no rows with IDDPTO '06218' remain after the recode
chascomus_data = df2018[df2018['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2018_2 = df2018.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2018' in the dataframe df2018_2
df2018_2 = df2018_2.rename(columns={'CANTIDAD': 'C_2018'})

In [None]:
# Display the first few rows of the DataFrame for the year 2018
init_notebook_mode(all_interactive=False)
df2018_2.head()

In [None]:
# Get the number of rows in the DataFrame df2018_2
len(df2018_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2018_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2018_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2018' column with 0 and convert the column to integer type
gdf['C_2018'] = gdf['C_2018'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2018'] = round((gdf['C_2018']/gdf['A_2018'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2018' column
init_notebook_mode(all_interactive=False)
gdf['CA_2018'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2019

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2019
df2019 = df[df['ANIO'] == 2019]

In [None]:
# Display the first few rows of the DataFrame for the year 2019
init_notebook_mode(all_interactive=False)
df2019.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2019 = df2019[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2019
init_notebook_mode(all_interactive=False)
df2019.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2019 DataFrame
df2019['IDDPTO'] = df2019['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2019
init_notebook_mode(all_interactive=False)
df2019.head()

In [None]:
# Get the number of rows in the DataFrame `df2019`
len(df2019)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2019_2 = df2019.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2019' in the dataframe df2019_2
df2019_2 = df2019_2.rename(columns={'CANTIDAD': 'C_2019'})

In [None]:
# Display the first few rows of the DataFrame for the year 2019
init_notebook_mode(all_interactive=False)
df2019_2.head()

In [None]:
# Get the number of rows in the DataFrame df2019_2
len(df2019_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2019_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2019_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2019' column with 0 and convert the column to integer type
gdf['C_2019'] = gdf['C_2019'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2019'] = round((gdf['C_2019']/gdf['A_2019'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2019' column
init_notebook_mode(all_interactive=False)
gdf['CA_2019'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2020

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2020
df2020 = df[df['ANIO'] == 2020]

In [None]:
# Display the first few rows of the DataFrame for the year 2020
init_notebook_mode(all_interactive=False)
df2020.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2020 = df2020[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2020
init_notebook_mode(all_interactive=False)
df2020.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2020 DataFrame
df2020['IDDPTO'] = df2020['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2020
init_notebook_mode(all_interactive=False)
df2020.head()

In [None]:
# Get the number of rows in the DataFrame `df2020`
len(df2020)

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2020_2 = df2020.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2020' in the dataframe df2020_2
df2020_2 = df2020_2.rename(columns={'CANTIDAD': 'C_2020'})

In [None]:
# Display the first few rows of the DataFrame for the year 2020
init_notebook_mode(all_interactive=False)
df2020_2.head()

In [None]:
# Get the number of rows in the DataFrame df2020_2
len(df2020_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2020_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2020_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2020' column with 0 and convert the column to integer type
gdf['C_2020'] = gdf['C_2020'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2020'] = round((gdf['C_2020']/gdf['A_2020'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2020' column
init_notebook_mode(all_interactive=False)
gdf['CA_2020'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2021

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2021
df2021 = df[df['ANIO'] == 2021]

In [None]:
# Display the first few rows of the DataFrame for the year 2021
init_notebook_mode(all_interactive=False)
df2021.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2021 = df2021[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2021
init_notebook_mode(all_interactive=False)
df2021.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2021 DataFrame
df2021['IDDPTO'] = df2021['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2021
init_notebook_mode(all_interactive=False)
df2021.head()

In [None]:
# Get the number of rows in the DataFrame `df2021`
len(df2021)

In [None]:
# Find rows where IDDPTO is equal to '06218'
chascomus_data = df2021[df2021['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Change the 'IDDPTO' value from '06218' to '06217'
df2021.loc[df2021['IDDPTO'] == '06218', 'IDDPTO'] = '06217'

In [None]:
# Verify that no rows with IDDPTO '06218' remain after the recode
chascomus_data = df2021[df2021['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2021_2 = df2021.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2021' in the dataframe df2021_2
df2021_2 = df2021_2.rename(columns={'CANTIDAD': 'C_2021'})

In [None]:
# Display the first few rows of the DataFrame for the year 2021
init_notebook_mode(all_interactive=False)
df2021_2.head()

In [None]:
# Get the number of rows in the DataFrame df2021_2
len(df2021_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2021_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2021_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2021' column with 0 and convert the column to integer type
gdf['C_2021'] = gdf['C_2021'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2021'] = round((gdf['C_2021']/gdf['A_2021'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2021' column
init_notebook_mode(all_interactive=False)
gdf['CA_2021'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

### 2022

In [None]:
# Read the CSV file into a DataFrame
df = pd.read_csv("pdt/asthma_mortality/data/csv/def_asma_2001_2022_clean_02.csv")

In [None]:
# Filter the dataframe to include only rows where the 'ANIO' column equals 2022
df2022 = df[df['ANIO'] == 2022]

In [None]:
# Display the first few rows of the DataFrame for the year 2022
init_notebook_mode(all_interactive=False)
df2022.head()

In [None]:
# Select only the columns 'ANIO', 'IDDPTO', and 'CANTIDAD'; .copy() avoids a SettingWithCopyWarning when 'IDDPTO' is reassigned below
df2022 = df2022[['ANIO', 'IDDPTO', 'CANTIDAD']].copy()

In [None]:
# Display the first few rows of the DataFrame for the year 2022
init_notebook_mode(all_interactive=False)
df2022.head()

In [None]:
# Add a leading zero to IDDPTO if its length is 4 or less, otherwise return it as a string
append_zero_IDDPTO = lambda IDDPTO: "0" + str(IDDPTO) if len(str(IDDPTO)) <= 4 else str(IDDPTO) # Convert IDDPTO to string before checking the length and in the output

In [None]:
# Apply the append_zero_IDDPTO function to the 'IDDPTO' column of the df2022 DataFrame
df2022['IDDPTO'] = df2022['IDDPTO'].apply(append_zero_IDDPTO)

In [None]:
# Display the first few rows of the DataFrame for the year 2022
init_notebook_mode(all_interactive=False)
df2022.head()

In [None]:
# Get the number of rows in the DataFrame `df2022`
len(df2022)

In [None]:
# Find rows where IDDPTO is equal to '06218'
chascomus_data = df2022[df2022['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Change the 'IDDPTO' value from '06218' to '06217'
df2022.loc[df2022['IDDPTO'] == '06218', 'IDDPTO'] = '06217'

In [None]:
# Verify that no rows with IDDPTO '06218' remain after the recode
chascomus_data = df2022[df2022['IDDPTO'] == '06218']

# Print the data
init_notebook_mode(all_interactive=False)
chascomus_data

In [None]:
# Group the dataframe by the 'IDDPTO' column, sum the 'CANTIDAD' values for each group, and reset the index
df2022_2 = df2022.groupby('IDDPTO')['CANTIDAD'].sum().reset_index()

In [None]:
# Rename the 'CANTIDAD' column to 'C_2022' in the dataframe df2022_2
df2022_2 = df2022_2.rename(columns={'CANTIDAD': 'C_2022'})

In [None]:
# Display the first few rows of the DataFrame for the year 2022
init_notebook_mode(all_interactive=False)
df2022_2.head()

In [None]:
# Get the number of rows in the DataFrame df2022_2
len(df2022_2)

In [None]:
# Load the shapefile using geopandas
gdf = gpd.read_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')

In [None]:
# Display the first few rows of the GeoDataFrame
init_notebook_mode(all_interactive=True)
gdf.head()

In [None]:
# Merge the GeoDataFrame with df2022_2 on the 'IDDPTO' column using a left join
gdf = gdf.merge(df2022_2, on='IDDPTO', how='left')

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Fill missing values in the 'C_2022' column with 0 and convert the column to integer type
gdf['C_2022'] = gdf['C_2022'].fillna(0).astype(int)

In [None]:
# Display information about the GeoDataFrame
gdf.info()

In [None]:
# Calculate the adjusted mortality rate per 100,000 inhabitants, rounded to two decimals
gdf['CA_2022'] = round((gdf['C_2022']/gdf['A_2022'])*100000, 2)

In [None]:
# Display summary statistics for the 'CA_2022' column
init_notebook_mode(all_interactive=False)
gdf['CA_2022'].describe()

In [None]:
# Save the GeoDataFrame to a shapefile with UTF-8 encoding
gdf.to_file("pdt/asthma_mortality/data/shp/tma_2001_2022.shp", encoding='utf-8')
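
The per-year cells above repeat the same steps with only the year changing. As a hedged sketch, they could be wrapped in two small functions and run in a loop. The function names `yearly_counts` and `add_year_rate` are hypothetical, and the example uses plain pandas DataFrames with toy data in place of the CSV and the shapefile. Two simplifying assumptions differ from the cells above: codes are padded with `str.zfill(5)` rather than the single-zero lambda, and the '06218' to '06217' recode is applied to every year rather than only to the years where the notebook applies it.

```python
import pandas as pd


def yearly_counts(df, year):
    """Return deaths per department for one year as columns IDDPTO and C_<year>."""
    sub = df.loc[df["ANIO"] == year, ["IDDPTO", "CANTIDAD"]].copy()
    # Assumption: department codes are five characters wide
    sub["IDDPTO"] = sub["IDDPTO"].astype(str).str.zfill(5)
    # Assumption: the recode used in some years is harmless in the others
    sub["IDDPTO"] = sub["IDDPTO"].replace({"06218": "06217"})
    counts = sub.groupby("IDDPTO")["CANTIDAD"].sum().reset_index()
    return counts.rename(columns={"CANTIDAD": f"C_{year}"})


def add_year_rate(table, df, year):
    """Left-join one year's counts onto `table` and add the rate per 100,000."""
    out = table.merge(yearly_counts(df, year), on="IDDPTO", how="left")
    out[f"C_{year}"] = out[f"C_{year}"].fillna(0).astype(int)
    out[f"CA_{year}"] = (out[f"C_{year}"] / out[f"A_{year}"] * 100000).round(2)
    return out


# Toy death records and a toy attribute table with denominator columns
deaths = pd.DataFrame({
    "ANIO": [2021, 2021, 2022, 2022],
    "IDDPTO": [6218, 6007, 6007, 6007],
    "CANTIDAD": [1, 2, 1, 3],
})
table = pd.DataFrame({
    "IDDPTO": ["06007", "06217"],
    "A_2021": [100000, 50000],
    "A_2022": [100000, 50000],
})

for year in (2021, 2022):
    table = add_year_rate(table, deaths, year)
print(table)
```

With a GeoDataFrame in place of `table`, the same loop would read the shapefile once before the loop and write it once after, instead of once per year.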