# Project Title: Analysis of COVID cases in Ontario, CA
Owned by Asghar Sadeghi, PhD, Dec. 2020

## Week 4 - Part A: Statement of the Problem

In this study, we plan to analyze the COVID-19 cases in Ontario province. Currently, we observe the second wave of pandemic and the government plans to impose additional restrictions on different counties. Given the venues in the neighborhood of a medical center and the number of confirmed cases, we try to cluster similar cities/counties in terms of COVID behaviour and consult the government to make decisions accordingly.
In the first section we analyze the data, then we try to map it, later the neighborhoud venues will be extraced using FourSquare, and as the final step the data will be clustered to 5 different categories to restrict the interactions.
The first audience of this study would be the Government of Ontario and the Mayers and City councils, but the citizens also could be the second audience.

### Import libraries
The required libraries are installed or called in this section.

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c anaconda beautifulsoup4 --yes
from bs4 import BeautifulSoup

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# we are using the inline backend
%matplotlib inline 
import matplotlib as mpl
import matplotlib.pyplot as plt
import datetime

import os
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import seaborn as sns

# Handle date time conversions between pandas and matplotlib
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Use white grid plot background from seaborn
sns.set(font_scale=1.5, style="whitegrid")

%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')

print('Libraries imported. - Confirmed')

Libraries imported. - Confirmed


## Week 4 - Part B: Data Section
The data is live COVID data that is being posted on the "https://data.ontario.ca". It includes all reported cases since the start of pandemic.
The columns for each case are the exact episode date, age grouped by decade, gender, outcome (recovered, active, and death), Reporting PHU ID, PHU name, postal code, latitude and longitude.
Furthermore, using the Foursquare website, the venues near each public health unit (PHU) is extracted to find a relation between venues and the number of confirmed cases.
Data URL: https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv

In [2]:
links={'COVID_Cases_Ontario':'https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv'}

In [3]:
csv_path = links["COVID_Cases_Ontario"]
df_covid = pd.read_csv(csv_path)
df_covid.drop(['Case_Reported_Date', 'Test_Reported_Date', 'Specimen_Date', 'Case_AcquisitionInfo', 'Outbreak_Related', 'Reporting_PHU_Address','Reporting_PHU_City','Reporting_PHU_Website'], axis=1, inplace=True)

In [4]:
df_covid['Age_Group'].replace('UNKNOWN', np.nan)
df_covid.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)
print ('After cleaning, number of confirmed cases are:', df_covid.shape[0])

After cleaning, number of confirmed cases are: 119915


Table 1: Daily COVID-19 Cases, Ontario Province

In [5]:
df_covid['Accurate_Episode_Date'] = pd.to_datetime(df_covid['Accurate_Episode_Date'])
df_covid.head()

Unnamed: 0,Row_ID,Accurate_Episode_Date,Age_Group,Client_Gender,Outcome1,Reporting_PHU_ID,Reporting_PHU,Reporting_PHU_Postal_Code,Reporting_PHU_Latitude,Reporting_PHU_Longitude
0,1,2020-11-18,60s,FEMALE,Not Resolved,4913,Southwestern Public Health,N5P 1G9,42.777804,-81.151156
1,2,2020-10-30,50s,MALE,Resolved,4913,Southwestern Public Health,N5P 1G9,42.777804,-81.151156
2,3,2020-10-28,50s,MALE,Resolved,4913,Southwestern Public Health,N5P 1G9,42.777804,-81.151156
3,4,2020-11-17,50s,MALE,Resolved,2234,Haldimand-Norfolk Health Unit,N3Y 4N5,42.847825,-80.303815
4,5,2020-11-06,60s,MALE,Resolved,2227,Brant County Health Unit,N3R 1G7,43.151811,-80.274374


Table 2: Grouped number of confirmed cases for each PHU

In [6]:
df_covid_count = pd.DataFrame(df_covid.groupby(['Reporting_PHU'])['Row_ID'].count())
df_covid_count = df_covid_count.reset_index()
df_covid_count.head()

Unnamed: 0,Reporting_PHU,Row_ID
0,Algoma Public Health Unit,61
1,Brant County Health Unit,608
2,Chatham-Kent Health Unit,510
3,Durham Region Health Department,4636
4,Eastern Ontario Health Unit,853


In [7]:
df_covid_loc = pd.DataFrame(df_covid.groupby(['Reporting_PHU','Reporting_PHU_Postal_Code'],as_index=True).mean())
df_covid_loc = df_covid_loc.reset_index()
df_covid_loc.drop(['Row_ID'], axis=1, inplace=True)

In [8]:
df_covid_final=pd.merge(df_covid_count,df_covid_loc)
df_covid_final['Norm_Cases']=df_covid_final['Row_ID']*100/df_covid_final['Row_ID'].sum()

## Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = '5CI5CO1LEUXWI12VXCOO1O0UUHPD5RHARRWPV02DQULDLVNM' # your Foursquare ID
CLIENT_SECRET = '0NCZVFFNU3WGX0VVX4UXFBSSKSUKVEYJWG1ZB22LO3SQO4MK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

#print('Your credentails:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

## Nearby Venues attached to each PHU

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Reporting_PHU', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [11]:
# type your answer here
Ontario_venues = getNearbyVenues(names = df_covid_final['Reporting_PHU'],
                                   latitudes = df_covid_final['Reporting_PHU_Latitude'],
                                   longitudes = df_covid_final['Reporting_PHU_Longitude']
                                 )


Table 3: Venues around each Public Health Unit (PHU)

In [12]:
print(Ontario_venues.shape)
Ontario_venues.head()

(2355, 7)


Unnamed: 0,Reporting_PHU,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Algoma Public Health Unit,46.532373,-84.314836,Shogun Sushi,46.530801,-84.319091,Sushi Restaurant
1,Algoma Public Health Unit,46.532373,-84.314836,Burger Don,46.52258,-84.319638,Burger Joint
2,Algoma Public Health Unit,46.532373,-84.314836,Fratellis,46.542842,-84.318774,Italian Restaurant
3,Algoma Public Health Unit,46.532373,-84.314836,YMCA,46.521494,-84.316275,Gym / Fitness Center
4,Algoma Public Health Unit,46.532373,-84.314836,North 82,46.527674,-84.319183,Steakhouse
