# Choropleth map with markers of COVID-19 cases in Ecuador

This notebook is a guide to creating a choropleth map of Ecuador based on data published by Ecuador's public health authority (Ministerio de Salud Pública, MSP).
<br>
__Note:__ To fully understand this notebook, you need basic knowledge of pandas dataframes and the folium Python library.

## Installation of the libraries that are going to be used

In [1]:
!pip install folium
!pip install pandas



## Import libraries

__Note:__ os and webbrowser are part of the Python standard library, so there is no need to install them.

In [2]:
import pandas as pd
import folium
import os 
import webbrowser

## Reading Base_covid19_msp_23abr2020.csv file using pandas

This file contains a list of all the COVID-19 tests performed in Ecuador up to April 23rd, 2020.
<br>
__sep=';'__ sets the separator character to ";" instead of the default ",".
<br>
__header='infer'__ tells pandas to take the column names from the first row of the .csv file.
<br>
__encoding='ISO-8859-1'__ tells pandas how to decode the file. ISO-8859-1 (Latin-1) is a common encoding for files that contain Spanish characters such as "ñ".


In [3]:
df_COVID_EC = pd.read_csv('Base_covid19_msp_23abr2020.csv', sep=';', header='infer', encoding='ISO-8859-1')
df_COVID_EC.head()

Unnamed: 0,anio,cod_prov_res,prov_residencia,cod_cant_res,cant_residencia,sexo,edad,tipo_edad,grupo_edad,inicio_sintomas,diagnostico,condicion_paciente,clasificacion_caso,total_muestras
0,2020,'12,LOS RIOS,'1201,BABAHOYO,Femenino,71,Año(s),mas de 65,2020-02-15,"U071 Enfermedad Respiratoria Aguda (U07.1, 201...",Muerto,Confirmado,1
1,2020,'12,LOS RIOS,'1201,BABAHOYO,Femenino,66,Año(s),mas de 65,2020-02-27,"U071 Enfermedad Respiratoria Aguda (U07.1, 201...",Vivo,Confirmado,1
2,2020,'12,LOS RIOS,'1201,BABAHOYO,Masculino,36,Año(s),de 20 a 49 años,2020-02-26,"U071 Enfermedad Respiratoria Aguda (U07.1, 201...",Vivo,Confirmado,1
3,2020,'12,LOS RIOS,'1201,BABAHOYO,Femenino,33,Año(s),de 20 a 49 años,2020-02-27,"U071 Enfermedad Respiratoria Aguda (U07.1, 201...",Vivo,Confirmado,1
4,2020,'12,LOS RIOS,'1201,BABAHOYO,Masculino,68,Año(s),mas de 65,2020-02-26,"U071 Enfermedad Respiratoria Aguda (U07.1, 201...",Vivo,Confirmado,1
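
To see what the `sep` and `encoding` arguments do in isolation, here is a minimal sketch that reads a tiny in-memory file instead of the real one. The two-row sample and the name `sample` are made up for illustration; only the `read_csv` arguments come from the cell above.

```python
import io
import pandas as pd

# Hypothetical two-row sample, encoded as ISO-8859-1 like the MSP file is assumed to be.
raw = "prov_residencia;total_muestras\nCAÑAR;1\nAZUAY;1\n".encode("ISO-8859-1")

# Same arguments as the real call: ';' separator, header from the first row, Latin-1 decoding.
sample = pd.read_csv(io.BytesIO(raw), sep=";", header="infer", encoding="ISO-8859-1")

print(sample.columns.tolist())
print(sample["prov_residencia"].tolist())
```

If the separator were left at its default, each line would be read as a single column, and decoding these bytes as UTF-8 would fail on the "Ñ".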


## Understanding the loaded Dataframe

In [4]:
#Shows the column headers of the dataframe
df_COVID_EC.columns

Index(['anio', 'cod_prov_res', 'prov_residencia', 'cod_cant_res',
       'cant_residencia', 'sexo', 'edad', 'tipo_edad', 'grupo_edad',
       'inicio_sintomas', 'diagnostico', 'condicion_paciente',
       'clasificacion_caso', 'total_muestras'],
      dtype='object')

In [5]:
#Shows the column headers of the dataframe and the data type they represent.
df_COVID_EC.dtypes

anio                   int64
cod_prov_res          object
prov_residencia       object
cod_cant_res          object
cant_residencia       object
sexo                  object
edad                   int64
tipo_edad             object
grupo_edad            object
inicio_sintomas       object
diagnostico           object
condicion_paciente    object
clasificacion_caso    object
total_muestras         int64
dtype: object

In [6]:
# Show summary statistics of the columns that contain numeric data
df_COVID_EC.describe()

Unnamed: 0,anio,edad,total_muestras
count,35448.0,35448.0,35448.0
mean,2020.0,43.21248,1.0
std,0.0,16.658176,0.0
min,2020.0,0.0,1.0
25%,2020.0,31.0,1.0
50%,2020.0,40.0,1.0
75%,2020.0,55.0,1.0
max,2020.0,120.0,1.0


## Data pre-processing

As mentioned before, the .csv file contains a list of all the COVID-19 tests performed in Ecuador up to April 23rd, 2020, where each row represents a test performed on one person.
<br>
The aim of this section is to build a dataframe that contains the number of positive COVID-19 cases per province, so that they can be represented on the map.


In [7]:
# Keep only the rows whose test result is positive ('Confirmado').
se_provs = df_COVID_EC[df_COVID_EC['clasificacion_caso']=='Confirmado']
# Group the positive tests by province and count the rows in each group.
se_provs = se_provs.groupby(['prov_residencia']).count()
# Keep only the column of interest for the map ('total_muestras').
# Selecting a single column returns a pandas Series, hence the name se_provs.
se_provs = se_provs.loc[:, 'total_muestras']

# Create an empty data frame
df_provs = pd.DataFrame()
# Add the columns to the new dataframe from the pandas series "se_provs"
df_provs['prov_residencia'] = se_provs.index
df_provs['total_muestras'] = se_provs.values

df_provs

Unnamed: 0,prov_residencia,total_muestras
0,AZUAY,285
1,BOLIVAR,66
2,CARCHI,38
3,CAÑAR,166
4,CHIMBORAZO,130
5,COTOPAXI,68
6,EL ORO,300
7,ESMERALDAS,115
8,GALAPAGOS,54
9,GUAYAS,7502
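
The filter, group, and count steps above can also be written as one chain. The following is a sketch on a made-up miniature dataframe, not the author's method; the values "Descartado" and "Sospechoso" are assumed for illustration and may not match the labels used in the real file.

```python
import pandas as pd

# Hypothetical miniature of the MSP data: one row per test.
sample = pd.DataFrame({
    "prov_residencia": ["AZUAY", "AZUAY", "GUAYAS", "GUAYAS", "GUAYAS"],
    "clasificacion_caso": ["Confirmado", "Descartado", "Confirmado",
                           "Confirmado", "Sospechoso"],
    "total_muestras": [1, 1, 1, 1, 1],
})

# Keep the confirmed rows, count rows per province, and return a two-column dataframe.
confirmed = sample[sample["clasificacion_caso"] == "Confirmado"]
counts = (
    confirmed.groupby("prov_residencia")
    .size()
    .reset_index(name="total_muestras")
)
print(counts)
```

Using `size()` counts rows directly, so there is no need to pick a column afterwards, and `reset_index(name=...)` produces the dataframe without building it column by column.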


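As shown above, there is a province named "CAÑAR", which contains the non-ASCII character "ñ". The province names in this dataframe must match the names in the other files exactly, and non-ASCII characters are a common source of encoding mismatches, so "CAÑAR" will be replaced with "CANAR".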

In [8]:
df_provs.replace({'CAÑAR': 'CANAR'}, inplace=True)
df_provs

Unnamed: 0,prov_residencia,total_muestras
0,AZUAY,285
1,BOLIVAR,66
2,CARCHI,38
3,CANAR,166
4,CHIMBORAZO,130
5,COTOPAXI,68
6,EL ORO,300
7,ESMERALDAS,115
8,GALAPAGOS,54
9,GUAYAS,7502
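
Replacing one name by hand works here because only one province is affected. A more general option, shown as a sketch with a hypothetical helper `strip_accents`, is to remove diacritics from any string with the standard `unicodedata` module.

```python
import unicodedata

def strip_accents(text):
    """Return text with diacritical marks removed (hypothetical helper)."""
    # NFKD splits "Ñ" into "N" plus a combining tilde; the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("CAÑAR"))
print(strip_accents("AZUAY"))
```

With pandas, such a function could be applied to a whole column, for example `df_provs['prov_residencia'].map(strip_accents)`.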


## Map object creation and adding the choropleth layer

Create the map and then add the choropleth layer using the folium library.
<br>
To create the choropleth map we need to read the provs-ec.json file, which contains the boundaries of each province of Ecuador.
<br>
__geo_data__ contains the file path to the .json file.
<br>
__data__ contains the dataframe that you are going to use.
<br>
__columns__ contains the dataframe columns that will be read. The first column is the one whose values match the data contained in the .json file; the second column holds the values used to color the map.
<br>
__key_on__ contains the path, inside each feature of the .json file, to the values we want to match with the first column of the dataframe.
<br>
__threshold_scale__ has a default value, but you can redefine it. The only condition is that every value from the dataframe falls within the range covered by the threshold scale. In recent folium releases this parameter is named __bins__, and threshold_scale is deprecated.
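
The `key_on` path can be read as a chain of dictionary lookups inside each GeoJSON feature. The sketch below uses a made-up, heavily trimmed stand-in for provs-ec.json (no geometry, two provinces, uppercase names assumed) and a hypothetical helper `resolve` to list dataframe names that have no matching polygon.

```python
import json

# Hypothetical stand-in for provs-ec.json; the real file also carries geometry.
geojson_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"dpa_despro": "AZUAY"}, "geometry": null},
   {"type": "Feature", "properties": {"dpa_despro": "CANAR"}, "geometry": null}
 ]}
"""
geo = json.loads(geojson_text)

def resolve(feature, key_on):
    """Follow a path such as 'feature.properties.dpa_despro' inside one feature."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value

geo_names = {resolve(f, "feature.properties.dpa_despro") for f in geo["features"]}
df_names = {"AZUAY", "CANAR", "GUAYAS"}  # stand-in for df_provs['prov_residencia']

# Names present in the dataframe but missing from the GeoJSON would not be colored.
missing = sorted(df_names - geo_names)
print(missing)
```

Running a check like this against the real file before drawing the map helps catch spelling differences such as "CAÑAR" versus "CANAR".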


In [9]:
# Locate map in Ecuador
ec_map = folium.Map(location=[-1.831239, -78.183403], zoom_start=7, tiles='cartodbpositron') 

ec_geo = r'provs-ec.json' # geojson file
threshold_scale = [1, 50, 150, 350, 1500, 7503]

# generate choropleth map using COVID-19 dataframe
folium.Choropleth(
    geo_data=ec_geo,
    data=df_provs,
    columns=['prov_residencia', 'total_muestras'],
    key_on='feature.properties.dpa_despro',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='COVID-19 en Ecuador',
    threshold_scale=threshold_scale,
    reset=True,
).add_to(ec_map)

ec_map
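
Since the threshold scale must cover every value in the dataframe, it can be worth checking this before building the map. The following sketch uses the ten case counts displayed earlier. The helper names are hypothetical, and the half-open interval convention in `bin_index` is an assumption for illustration, not a description of folium's internals.

```python
import bisect

threshold_scale = [1, 50, 150, 350, 1500, 7503]
# Case counts copied from the ten rows of df_provs displayed earlier.
cases = [285, 66, 38, 166, 130, 68, 300, 115, 54, 7502]

def scale_covers(values, scale):
    """True if every value lies between the first and last threshold."""
    return scale[0] <= min(values) and max(values) <= scale[-1]

def bin_index(value, scale):
    """Index i of the interval [scale[i], scale[i+1]) that contains value."""
    return min(bisect.bisect_right(scale, value) - 1, len(scale) - 2)

covered = scale_covers(cases, threshold_scale)
bins = [bin_index(c, threshold_scale) for c in cases]
print(covered)
print(bins)
```

The last threshold is 7503 because the largest count shown is 7502; a smaller upper bound would leave that province outside the scale.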

Now the choropleth map is created, but let's put markers on it containing information about the exact number of cases in each province.

## Load markers data, using pandas.read_csv

To place the markers we need the __loc_provs.csv__ file, which contains the location of each province capital; that is where the markers will be placed.

In [10]:
df_loc_provs = pd.read_csv('loc_provs.csv', sep=',', header='infer', encoding='ISO-8859-1')
df_loc_provs

Unnamed: 0,city,lat,lng,country,iso2,admin,capital,population,population_proper
0,Guayaquil,-2.205842,-79.907948,Ecuador,EC,Guayas,admin,2514000.0,1952029.0
1,Quito,-0.194568,-78.493005,Ecuador,EC,Pichincha,primary,1701000.0,1399814.0
2,Cuenca,-2.900545,-79.004527,Ecuador,EC,Azuay,admin,286878.0,276964.0
3,Ambato,-1.252369,-78.622781,Ecuador,EC,Tungurahua,admin,281425.0,154369.0
4,Portoviejo,-1.054579,-80.454448,Ecuador,EC,Manabi,admin,213601.0,170326.0
5,Machala,-3.258608,-79.960535,Ecuador,EC,El Oro,admin,213034.0,198123.0
6,Esmeraldas,0.969112,-79.652018,Ecuador,EC,Esmeraldas,admin,173101.0,95630.0
7,Riobamba,-1.670984,-78.647124,Ecuador,EC,Chimborazo,admin,172464.0,124478.0
8,Ibarra,0.35171,-78.122333,Ecuador,EC,Imbabura,admin,146741.0,108666.0
9,Loja,-4.005579,-79.202235,Ecuador,EC,Loja,admin,126368.0,117796.0


The same situation occurs with the "Cañar" province: it is necessary to replace that name with "Canar".

In [11]:
df_loc_provs.replace({'Cañar': 'Canar'}, inplace=True)
df_loc_provs

Unnamed: 0,city,lat,lng,country,iso2,admin,capital,population,population_proper
0,Guayaquil,-2.205842,-79.907948,Ecuador,EC,Guayas,admin,2514000.0,1952029.0
1,Quito,-0.194568,-78.493005,Ecuador,EC,Pichincha,primary,1701000.0,1399814.0
2,Cuenca,-2.900545,-79.004527,Ecuador,EC,Azuay,admin,286878.0,276964.0
3,Ambato,-1.252369,-78.622781,Ecuador,EC,Tungurahua,admin,281425.0,154369.0
4,Portoviejo,-1.054579,-80.454448,Ecuador,EC,Manabi,admin,213601.0,170326.0
5,Machala,-3.258608,-79.960535,Ecuador,EC,El Oro,admin,213034.0,198123.0
6,Esmeraldas,0.969112,-79.652018,Ecuador,EC,Esmeraldas,admin,173101.0,95630.0
7,Riobamba,-1.670984,-78.647124,Ecuador,EC,Chimborazo,admin,172464.0,124478.0
8,Ibarra,0.35171,-78.122333,Ecuador,EC,Imbabura,admin,146741.0,108666.0
9,Loja,-4.005579,-79.202235,Ecuador,EC,Loja,admin,126368.0,117796.0


## Understanding the loaded Dataframe

In [12]:
df_loc_provs.columns

Index(['city', 'lat', 'lng', 'country', 'iso2', 'admin', 'capital',
       'population', 'population_proper'],
      dtype='object')

In [13]:
df_loc_provs.dtypes

city                  object
lat                  float64
lng                  float64
country               object
iso2                  object
admin                 object
capital               object
population           float64
population_proper    float64
dtype: object

## Creating markers for each province

To create the markers we have to iterate over the two loaded dataframes: we need the name of the province and its number of cases from the first dataframe, and the latitude and longitude of that same province from the second one. A nested for loop is used for this.
<br>
The Marker constructor receives a list containing [lat, lng] and a popup argument, which is the string shown when the marker is clicked.
<br>
__Note:__ To iterate over several lists at the same time, the built-in __zip()__ function is used.


In [14]:
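# Walk through every province capital (latitude, longitude, province name)...
for lat, lng, label in zip(df_loc_provs.lat, df_loc_provs.lng, df_loc_provs.admin):
    # ...and look for the matching province in the case counts.
    for prov, casos in zip(df_provs.prov_residencia, df_provs.total_muestras):
        if prov == label.upper():
            # Popup text, e.g. "Azuay 285 casos"
            cad = label + ' ' + str(casos) + ' casos'
            folium.Marker([lat, lng], popup=cad).add_to(ec_map)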

ec_map
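
The nested loop compares every capital with every province. As an alternative sketch, not the approach used in this notebook, the two tables can be joined once with `merge` and the popup labels built from the result. The two small dataframes below reuse values shown in the outputs above; the names `loc`, `cases`, `merged` and `labels` are illustrative.

```python
import pandas as pd

# Two rows taken from the df_loc_provs and df_provs outputs shown earlier.
loc = pd.DataFrame({
    "admin": ["Guayas", "Azuay"],
    "lat": [-2.205842, -2.900545],
    "lng": [-79.907948, -79.004527],
})
cases = pd.DataFrame({
    "prov_residencia": ["AZUAY", "GUAYAS"],
    "total_muestras": [285, 7502],
})

# Build a common key in upper case, then join the two tables on it.
loc["prov_residencia"] = loc["admin"].str.upper()
merged = loc.merge(cases, on="prov_residencia", how="inner")

# One popup label per matched province, in the same format as the loop above.
labels = [f"{row.admin} {row.total_muestras} casos"
          for row in merged.itertuples(index=False)]
print(labels)
```

Each row of `merged` then carries everything a marker needs (`lat`, `lng` and the label), so a single loop over it is enough to add the markers.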

## Display map if you are not using jupyter notebook

If you are using another Python IDE or a text editor, you won't have the inline display that Jupyter Notebook provides. In that case, save the map as an .html file and then open it using the webbrowser library.

In [15]:
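# Save the map as an .html file and open it in the default web browser.
# Adjust the path below to a folder that exists on your machine.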
filepath = 'C:/Users/Hugoa/Notebooks/Map_Ecuador.html'
ec_map.save(filepath)
webbrowser.open('file://' + filepath)

True
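
The path in the cell above is specific to one machine. As a sketch of a more portable variant, the standard `pathlib` module can build an absolute `file://` address for a file in the current working directory. The helper name `map_uri` is hypothetical.

```python
from pathlib import Path

def map_uri(filename):
    """Return an absolute file:// URI for a file in the working directory."""
    # resolve() makes the path absolute; as_uri() formats it for a browser.
    return Path(filename).resolve().as_uri()

uri = map_uri("Map_Ecuador.html")
print(uri)
```

After calling `ec_map.save("Map_Ecuador.html")`, the resulting `uri` can be passed to `webbrowser.open(uri)`.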