# Explaining the COVID-19 cases in Montreal

## Table Of Contents

* [1. Introduction](#item1)
* [2. Data Acquisition](#item2)
    * [2.1 Data Sources](#item2_1)
    * [2.2 Data Cleaning](#item2_2)
    * [2.3 Data Wrangling](#item2_3)
* [3. Methodology](#item3)
* [4. Results](#item4)
* [5. Discussion](#item5)
* [6. Conclusion](#item6)

## 1. Introduction <a class="anchor" id="item1"></a>

The coronavirus disease, otherwise known as COVID-19, has disrupted societies worldwide and brought entire industries to a halt.

In Canada, however, not all provinces have been affected equally, and Quebec tops the list with the highest number of cases. From the beginning of the pandemic, it seemed likely that Montreal would be the main hot spot of the province but, as weeks went by, it also became apparent that some areas of the island were more affected than others.

In this project, we will analyze the historical case data and use it to cluster the boroughs of the island of Montreal, with the goal of identifying the areas that are most at risk of a spike in the near future. Furthermore, we will leverage the Foursquare API to determine the availability of healthcare locations on the island and, hopefully, explain the high number of confirmed cases in the most affected boroughs.

## 2. Data Acquisition <a class="anchor" id="item2"></a>

### 2.1 Data Sources <a class="anchor" id="item2_1"></a>

Most of the data is available at the <a href="https://santemontreal.qc.ca/en/public/coronavirus-covid-19/situation-of-the-coronavirus-covid-19-in-montreal">Santé Montréal website</a>. There are multiple data sets available on their website but, for this report, we are mainly interested in the table titled **"Numbers of confirmed cases and deaths by borough or linked city"**, which can also be downloaded in CSV format.

However, location information had to be sourced from the <a href="http://donnees.ville.montreal.qc.ca/dataset/polygones-arrondissements">Ville de Montréal website</a>, where it can be found by querying **"Limite administrative de l'agglomération de Montréal"**. For our report, we picked the GeoJSON format which, apart from the usual polygon coordinates, contains properties describing the area, perimeter and type of each borough.

Finally, we are going to complement these data with information from Foursquare's search API about venues whose categories are related to healthcare institutions, such as hospitals, emergency rooms and medical centers. The complete list of possible venue categories is available <a href="https://developer.foursquare.com/docs/build-with-foursquare/categories/">here</a>.
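As a rough illustration of the kind of request involved, the sketch below builds a venue search URL without sending it. It assumes the Foursquare v2 `venues/search` endpoint and its usual query parameters; the function name `build_search_url`, the credentials and the category IDs are placeholders and must be replaced with real values taken from the category list linked above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the Foursquare v2 venue search endpoint.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, category_ids, client_id, client_secret,
                     radius_m=1000, limit=50, version="20200601"):
    """Build (but do not send) a venue search request around one point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                            # API version date (YYYYMMDD)
        "ll": f"{lat},{lng}",                    # latitude,longitude of the search centre
        "intent": "browse",                      # search within the given radius
        "radius": radius_m,                      # search radius in metres
        "categoryId": ",".join(category_ids),    # comma-separated category IDs
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and category IDs (hypothetical values).
url = build_search_url(
    45.5017, -73.5673,  # approximate coordinates of downtown Montreal
    ["HOSPITAL_CATEGORY_ID", "MEDICAL_CENTER_CATEGORY_ID"],
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
)
query = parse_qs(urlsplit(url).query)
```

In practice, one such request could be issued per borough, using a point inside the borough as the search centre; that choice is an assumption of this sketch, not something fixed by the data sources.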

### 2.2 Data Cleaning <a class="anchor" id="item2_2"></a>

Although the data on the Santé Montréal website is presented in English and French, the downloadable CSV appears to be only in French. For convenience and ease of integration, we decided that it would be simpler to clean and maintain the CSV in code. The first cleanup step was translating the headers. Next, we removed the noise from the table body; this included removing the comma from values above 1000, the ***"<"*** from some very small values, and the ***"*"*** from values with a footnote at the bottom of the table. Furthermore, the value ***"n.p."*** was replaced by 0. To finalize the cleanup process, ***NaN*** values were converted to 0 as well. The justification can be found in the footer of the table on the website:

<blockquote>
* Because of the small number of reported cases in relation with the total population, the precision of the rate value isn't optimal and should therefore be interpreted with caution

n.p. Because of the very small number of reported cases in relation with the total population, the precision of the rate value is considered too low to be published
</blockquote>
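The cleanup steps described in this section can be sketched with pandas roughly as follows. The French column names, the English header mapping and the sample rows are all hypothetical; they only mimic the kinds of values mentioned above and are not the real figures.

```python
import pandas as pd

# Hypothetical rows mimicking the raw table; names and values are illustrative.
raw = pd.DataFrame({
    "Arrondissement": ["Borough A", "Borough B", "Borough C", "Borough D"],
    "Cas confirmés": ["1,234", "<5", "12", None],
    "Taux pour 100 000": ["1,050.5", "n.p.", "*60.2", None],
})

# Assumed French-to-English header mapping.
HEADERS = {
    "Arrondissement": "borough",
    "Cas confirmés": "confirmed_cases",
    "Taux pour 100 000": "confirmed_rate",
}
NUMERIC_COLUMNS = ["confirmed_cases", "confirmed_rate"]


def clean_table(df: pd.DataFrame) -> pd.DataFrame:
    """Translate headers and strip the noise from the numeric columns."""
    out = df.rename(columns=HEADERS)
    for col in NUMERIC_COLUMNS:
        values = (
            out[col]
            .str.replace(r"[,<*]", "", regex=True)  # drop thousands comma, "<" and "*"
            .str.strip()
            .replace("n.p.", "0")                   # unpublished rates become 0
        )
        # Anything still missing or unparsable becomes NaN, then 0.
        out[col] = pd.to_numeric(values, errors="coerce").fillna(0.0).astype(float)
    return out


cleaned = clean_table(raw)
print(cleaned)
```

Note that replacing ***"n.p."*** and missing values with 0 makes them indistinguishable from true zeros afterwards, so any later step that divides by the rate needs to guard against it.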

### 2.3 Data Wrangling <a class="anchor" id="item2_3"></a>

Once the cleanup was complete, we could proceed to transforming some of the data. To allow meaningful calculations, we first cast all the numerical fields to float. We then inferred the population of each borough by dividing the number of confirmed cases by the confirmed rate per 100,000 people and multiplying the result by 100,000. This gave us a very close approximation of the real population numbers as reported by the last Canadian census, in 2016. Unfortunately, the Senneville borough did not have a value for the confirmed rate, so its population could not be calculated; the actual value was entered manually for that particular case.
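A minimal sketch of the population estimate is shown below. The column names, the sample values and the manual figure for Senneville are placeholders chosen for illustration; the real census value would have to be substituted.

```python
import pandas as pd

# Hypothetical cleaned values; column names are assumptions.
cases = pd.DataFrame({
    "borough": ["Borough A", "Borough B", "Senneville"],
    "confirmed_cases": [500.0, 100.0, 3.0],
    "confirmed_rate": [250.0, 400.0, 0.0],  # cases per 100,000 people
})

# Placeholder only: replace with the actual census population.
MANUAL_POPULATION = {"Senneville": 1000.0}


def infer_population(df: pd.DataFrame, manual: dict) -> pd.DataFrame:
    """Estimate population from case counts and rates per 100,000 people."""
    out = df.copy()
    # rate = cases / population * 100,000  =>  population = cases / rate * 100,000
    # A rate of 0 means "not published", so mask it to avoid dividing by zero.
    rate = out["confirmed_rate"].where(out["confirmed_rate"] > 0)
    out["population"] = out["confirmed_cases"] / rate * 100_000
    # Fill the boroughs without a usable rate from the manual table.
    out["population"] = out["population"].fillna(out["borough"].map(manual))
    return out


with_population = infer_population(cases, MANUAL_POPULATION)
print(with_population[["borough", "population"]])
```

Because the published rates are rounded, the inferred populations are approximations rather than exact census counts.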

The next step was to merge this dataframe with some useful information found in the GeoJSON file from the Ville de Montréal website, which we also intended to use for plotting the map. At this point, some borough names had to be adjusted so that the datasets could be joined. Once the names matched, we merged the area and the one-hot encoding of the borough types (either Municipality or Linked city) into the former dataset. Now, having the area and population of each borough, we could also compute its population density.
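The merge described above could look something like the following sketch. The GeoJSON property names (`NOM`, `TYPE`, `AIRE`), the name correction and all values are hypothetical, and the area is assumed to be expressed in square metres.

```python
import pandas as pd

# Hypothetical case data with inferred populations.
cases = pd.DataFrame({
    "borough": ["Borough A", "Borough B"],
    "population": [200_000.0, 25_000.0],
})

# Hypothetical GeoJSON fragment; property names are assumptions.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"NOM": "Borough-A", "TYPE": "Municipality", "AIRE": 20_000_000.0}},
        {"type": "Feature", "geometry": None,
         "properties": {"NOM": "Borough B", "TYPE": "Linked city", "AIRE": 5_000_000.0}},
    ],
}

# Flatten the feature properties into a dataframe.
props = pd.DataFrame([feature["properties"] for feature in geojson["features"]])
props = props.rename(columns={"NOM": "borough", "TYPE": "type", "AIRE": "area_m2"})

# Adjust the names that differ between the two sources (hypothetical mapping).
props["borough"] = props["borough"].replace({"Borough-A": "Borough A"})

# One-hot encode the borough type and keep only the columns needed for the join.
type_dummies = pd.get_dummies(props["type"], dtype=int)
geo = pd.concat([props[["borough", "area_m2"]], type_dummies], axis=1)

# validate= raises if a borough name is duplicated on either side.
merged = cases.merge(geo, on="borough", how="left", validate="one_to_one")

# Population density in people per square kilometre.
merged["density_per_km2"] = merged["population"] / (merged["area_m2"] / 1e6)
print(merged)
```

Using a left join keeps every borough from the case data; any row whose area is still missing after the merge points to a name that has not yet been reconciled.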

## 3. Methodology <a class="anchor" id="item3"></a>

TBD

## 4. Results <a class="anchor" id="item4"></a>

TBD

## 5. Discussion <a class="anchor" id="item5"></a>

TBD

## 6. Conclusion <a class="anchor" id="item6"></a>

TBD