# Incidence of COVID-19 and Neighborhood Activity

The world is currently in the grip of a global pandemic caused by SARS-CoV-2. Countries, and public health departments in particular, have been fairly open about infection data in order to give the public the information necessary to make behavioral choices that protect themselves and their communities. We might wonder, though, whether these data are being consumed and acted on at the individual level. The proposed project will explore this question using location-tagged COVID-19 infection data paired with Foursquare trending venue data. We ask: is the level of COVID-19 infection in a neighborhood correlated with the kinds of activity we observe in that neighborhood? For instance, if we find that the level of activity in indoor spaces like grocery stores is negatively correlated with COVID-19 levels in a neighborhood, this suggests that residents are aware of the danger in their community and are avoiding enclosed spaces. If we see a positive correlation, we have evidence (though not proof!) of a causal link: the high activity itself could help explain the high incidence of disease.

The results of this study will help public health stakeholders calibrate their public outreach. For instance, if we find a positive correlation or no correlation, the public health department may choose to take specific interventions (e.g., increased targeted advertising) so that residents are aware of the heightened risk in their neighborhood. If we find a negative correlation between COVID-19 case count and activity, then stakeholders will have some evidence that the current strategy is working.


## Introducing the data

We will explore the incidence of COVID-19 and neighborhood activity in the city of Toronto. The Toronto public health department publishes comprehensive data about COVID-19, including the number of positive cases by neighborhood (see https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/). We will take the raw COVID-19 cases by neighborhood data and pair it with trending venue data using the Foursquare API (documented here: https://developer.foursquare.com/docs/api-reference/venues/trending). We can then ask a variety of questions that get at the larger question at hand. For instance, we could form a binary feature encoding whether the highest trending location in a neighborhood is an indoor space (like a restaurant or grocery store). We could then fit a logistic regression model to see whether COVID-19 case count predicts whether a neighborhood has trending indoor spaces.
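
As a rough illustration of the logistic regression idea, the sketch below fits a model on made-up data. The column name `top_venue_is_indoor`, the toy values, and the use of scikit-learn are all assumptions for illustration; none of them come from the Toronto or Foursquare data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one row per neighborhood. The values are invented
# purely to show the mechanics of the model, not real observations.
toy = pd.DataFrame({
    "Case Count": [12, 19, 20, 35, 61, 68, 120, 158, 231, 286],
    # 1 if the top trending venue is an indoor space, 0 otherwise (assumed feature)
    "top_venue_is_indoor": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
})

X = toy[["Case Count"]]          # single predictor: neighborhood case count
y = toy["top_venue_is_indoor"]   # binary outcome

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Sign of the coefficient: negative means higher case counts go with a lower
# probability that the top trending venue is indoors.
coef = model.coef_[0][0]

# Predicted probability of an indoor top venue at a low and a high case count.
new_points = pd.DataFrame({"Case Count": [20, 250]})
p_low, p_high = model.predict_proba(new_points)[:, 1]
print(coef, p_low, p_high)
```

In this toy setup the fitted coefficient is negative, which is the pattern the negative-correlation scenario would produce. With the real data, the sign and size of the coefficient are exactly what we want to learn.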

### Constructing a Toronto case count dataset
We retrieved the COVID-19 case counts by neighborhood for June 5th, 2020, shown below. We need to add geospatial data to this set, which is what we will do in this section. Readers not interested in the process of data construction can skip to the end of the section to see the resulting dataset.

In [109]:
import pandas as pd

covid_df = pd.read_csv('toronto_covid_by_neighborhood_6-5-20.csv')
covid_df.head(20)

Unnamed: 0,Neighbourhood Name,Case Count
0,Yorkdale-Glen Park,120
1,York University Heights,286
2,Yonge-St.Clair,19
3,Yonge-Eglinton,12
4,Wychwood,68
5,Woodbine-Lumsden,19
6,Woodbine Corridor,17
7,Woburn,231
8,Willowridge-Martingrove-Richview,61
9,Willowdale West,35


We need to pair these data with geospatial coordinates in order to query the Foursquare API. To do this, we start by scraping Wikipedia for the postal codes of each borough and neighborhood, and then cross-reference the postal codes with a CSV of postal code coordinates.

In [110]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df = dfs[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


We have to do a bit of data cleaning in order to join these data with our COVID-19 neighborhood data.

In [111]:
# Toss rows with unassigned Borough
df = df[df.Borough != 'Not assigned']

# Explode df so that Neighborhoods under the same postal code get their own row.

df = df.assign(Neighborhood=df.Neighborhood.str.split(',')).explode('Neighborhood')
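
One thing to watch when splitting on a bare comma: entries such as "Regent Park, Harbourfront" keep the space after the comma, so every neighborhood after the first in a postal code starts with a blank and will not match a name-based join. The sketch below shows the effect on a two-row toy frame (the frame itself is illustrative) and one possible fix using `str.strip()`.

```python
import pandas as pd

# Toy frame mirroring the structure of the scraped Wikipedia table.
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M5A"],
    "Neighborhood": ["Parkwoods", "Regent Park, Harbourfront"],
})

# Same split-and-explode pattern as in the cell above.
exploded = toy.assign(
    Neighborhood=toy["Neighborhood"].str.split(",")
).explode("Neighborhood")

# Names after the first keep the space that followed the comma.
raw_names = exploded["Neighborhood"].tolist()

# Stripping surrounding whitespace makes the names safe to join on.
exploded["Neighborhood"] = exploded["Neighborhood"].str.strip()
clean_names = exploded["Neighborhood"].tolist()

print(raw_names)
print(clean_names)
```

Applying the same strip to the real frame could change which rows survive the join, so the outputs shown in this notebook would need to be regenerated.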

In [112]:
# Join covid data frame and our neighborhood data on Neighborhood
covid_df.rename(columns={"Neighbourhood Name":"Neighborhood"}, inplace=True)

df = df.join(covid_df.set_index('Neighborhood'), on='Neighborhood')

In [113]:
df = df.dropna()
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Case Count
3,M4A,North York,Victoria Village,37.0
4,M5A,Downtown Toronto,Regent Park,23.0
9,M1B,Scarborough,Malvern,158.0
23,M6C,York,Humewood-Cedarvale,20.0
27,M1E,Scarborough,Guildwood,101.0
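
Because `dropna()` silently discards every neighborhood that found no case count, it can be useful to check how many rows matched before dropping them. The sketch below is one way to do that with `merge(..., indicator=True)` on a small toy frame. The three rows are taken from the tables above, with Parkwoods standing in for a name that has no match.

```python
import pandas as pd

# Toy versions of the two tables being joined.
neighborhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park"],
})
cases = pd.DataFrame({
    "Neighborhood": ["Victoria Village", "Regent Park"],
    "Case Count": [37, 23],
})

# A left merge keeps every scraped neighborhood; the _merge column records
# whether a matching case-count row was found ("both") or not ("left_only").
merged = neighborhoods.merge(cases, on="Neighborhood", how="left", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
match_rate = (merged["_merge"] == "both").mean()

print(unmatched)
print(match_rate)
```

A low match rate would suggest that the Wikipedia names and the public health names follow different conventions, which matters for how representative the final dataset is.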


We combine our case count data by neighborhood with spatial coordinates, which will allow us to plot our data and cross-reference trending venues on Foursquare.

In [114]:
df2 = pd.read_csv("Geospatial_Coordinates.csv")

In [115]:
df = df.join(df2.set_index('Postal Code'), on='Postal Code')
df.to_csv('casecount_latlong.csv', encoding='utf-8', index=False)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Case Count,Latitude,Longitude
3,M4A,North York,Victoria Village,37.0,43.725882,-79.315572
4,M5A,Downtown Toronto,Regent Park,23.0,43.65426,-79.360636
9,M1B,Scarborough,Malvern,158.0,43.806686,-79.194353
23,M6C,York,Humewood-Cedarvale,20.0,43.693781,-79.428191
27,M1E,Scarborough,Guildwood,101.0,43.763573,-79.188711
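
Since coordinates are attached per postal code, neighborhoods that share a postal code end up with identical latitude and longitude. The sketch below shows how such rows could be spotted and collapsed into one query point per postal code before calling a location-based API. The toy frame is illustrative, and the Harbourfront row is hypothetical.

```python
import pandas as pd

# Toy frame: two neighborhoods share postal code M5A and therefore coordinates.
toy = pd.DataFrame({
    "Postal Code": ["M4A", "M5A", "M5A"],
    "Neighborhood": ["Victoria Village", "Regent Park", "Harbourfront"],
    "Latitude": [43.725882, 43.65426, 43.65426],
    "Longitude": [-79.315572, -79.360636, -79.360636],
})

# Rows whose coordinates also appear on another row.
shared = toy[toy.duplicated(subset=["Latitude", "Longitude"], keep=False)]
shared_names = shared["Neighborhood"].tolist()

# One query point per postal code avoids sending the same location twice.
query_points = toy.drop_duplicates(subset=["Postal Code"])[
    ["Postal Code", "Latitude", "Longitude"]
]

print(shared_names)
print(query_points)
```

Whether to collapse such rows or keep them separate depends on how case counts are later attributed to query results. This sketch only makes the overlap visible.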


Now that we have our final case count dataset, we can extend it with data about activity in those neighborhoods using Foursquare trending venue data.

### Adding Foursquare trending venue data

Foursquare provides an API endpoint that returns the venues near a location with the most people currently checked in. We want to grab those venues and consult their type. This will let us know what kinds of venues are currently popular in particular Toronto neighborhoods. We will cross-reference these data with the COVID-19 data we constructed above.

First, we set our Foursquare credentials.

In [169]:
CLIENT_ID = 'CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: CLIENT_ID
CLIENT_SECRET: CLIENT_SECRET
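
With credentials in place, a request to the trending endpoint could look roughly like the sketch below. The helper names (`build_trending_url`, `venue_categories`, `fetch_trending`), the default `radius` and `limit` values, and the shape of `sample_payload` are assumptions based on the v2 API documentation linked earlier. The sample venue is invented, so check the real response before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CLIENT_ID = 'CLIENT_ID'          # placeholder, replace with your Foursquare ID
CLIENT_SECRET = 'CLIENT_SECRET'  # placeholder, replace with your Foursquare Secret
VERSION = '20180605'             # Foursquare API version

TRENDING_URL = 'https://api.foursquare.com/v2/venues/trending'


def build_trending_url(lat, lng, radius=500, limit=10):
    """Build the request URL for trending venues near (lat, lng)."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,   # meters (assumed default)
        'limit': limit,     # number of venues to return (assumed default)
    }
    return TRENDING_URL + '?' + urlencode(params)


def venue_categories(payload):
    """Return (venue name, first category name) pairs from a parsed response."""
    venues = payload.get('response', {}).get('venues', [])
    return [
        (v.get('name'), v['categories'][0]['name'] if v.get('categories') else None)
        for v in venues
    ]


def fetch_trending(lat, lng):
    """Call the API and return venue/category pairs (needs real credentials)."""
    with urlopen(build_trending_url(lat, lng)) as resp:
        return venue_categories(json.load(resp))


# Invented payload used only to exercise the parser without a network call.
sample_payload = {
    'response': {
        'venues': [
            {'name': 'Example Grocer', 'categories': [{'name': 'Grocery Store'}]},
        ]
    }
}

url = build_trending_url(43.725882, -79.315572)
pairs = venue_categories(sample_payload)
print(pairs)
```

Looping `fetch_trending` over the latitude and longitude columns of the case count dataset would give, for each neighborhood, the categories needed to build an indoor/outdoor feature.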


## Methodology

In [None]:
import folium

map_toronto = folium.Map(location=[43.753259, -79.329656], zoom_start=10)

# add one circle marker per neighborhood, with a "Neighborhood, Borough" popup
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto