# Incidence of COVID-19 and Neighborhood Activity

The world is currently in the grip of a global pandemic of SARS-CoV-2. Countries, and public health departments in particular, have been fairly open about infection data in order to give the public the information necessary to make behavioral choices that protect themselves and their communities. We might wonder, though, whether this data is being consumed and acted on at the individual level. The proposed project will explore this question using location-tagged COVID-19 infection data paired with Foursquare trending venue data. We ask: is the level of COVID-19 infection in a neighborhood correlated with the kinds of activity we observe in that neighborhood? For instance, if we find that the level of activity in indoor spaces like grocery stores is negatively correlated with COVID-19 levels in a neighborhood, it suggests that residents are aware of the danger in their community and are avoiding enclosed spaces. If we see a positive correlation, we have evidence (though not proof!) for some causal link; that is, the high activity itself could be helping to explain the high incidence of disease.
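As a rough illustration of the sign-based reading above, the sketch below computes a Pearson correlation between neighborhood case counts and a measure of indoor activity. The numbers, the `indoor_activity` measure, and the `threshold` cutoff are all hypothetical placeholders, not project data, and a rank-based measure may turn out to suit the real data better.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def interpret(r, threshold=0.1):
    """Map a correlation to the three cases discussed in the text.

    `threshold` is an arbitrary illustrative cutoff, not a significance test.
    """
    if r <= -threshold:
        return "negative"  # consistent with residents avoiding indoor spaces
    if r >= threshold:
        return "positive"  # evidence (not proof) that activity tracks incidence
    return "none"


# Placeholder values for five neighborhoods (made up for illustration).
case_counts = [10, 20, 30, 40, 50]
indoor_activity = [9, 7, 6, 4, 1]

r = pearson_r(case_counts, indoor_activity)
print(interpret(r))
```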

The results of this study will help public health stakeholders calibrate their public outreach. For instance, if we find a positive correlation or no correlation, the public health department may choose to undertake specific interventions (e.g., increased targeted advertising) so that residents are aware of the heightened risk in their neighborhood. If we find a negative correlation between COVID-19 case count and activity, then stakeholders will have some indication that the current strategy is working.


## Introducing the data

We will explore the incidence of COVID-19 and neighborhood activity in the city of Toronto. The Toronto public health department publishes comprehensive data about COVID-19, including the number of positive cases by neighborhood (see https://www.toronto.ca/home/covid-19/covid-19-latest-city-of-toronto-news/covid-19-status-of-cases-in-toronto/). We will take the raw COVID-19 case counts by neighborhood and pair them with trending venue data from the Foursquare API (documented here: https://developer.foursquare.com/docs/api-reference/venues/trending). We can then ask a variety of narrower questions that get at the larger question at hand. For instance, we could form a binary feature encoding whether the highest trending location in a neighborhood is an indoor space (like a restaurant or grocery store). We could then fit a logistic regression model to see whether COVID-19 case count predicts whether a neighborhood has trending indoor spaces.
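The binary-feature idea above can be sketched as follows. This is a minimal, standard-library-only illustration: the `INDOOR_CATEGORIES` set, the category names, and all data values are hypothetical placeholders, and the hand-rolled gradient-descent fit stands in for a library implementation (for example scikit-learn or statsmodels) that the real analysis would more likely use.

```python
from math import exp, sqrt

# Hypothetical set of venue category names treated as indoor spaces.
INDOOR_CATEGORIES = {"Restaurant", "Grocery Store", "Gym", "Coffee Shop"}


def is_indoor(category):
    """Binary feature: 1 if the top trending venue category is indoor, else 0."""
    return 1 if category in INDOOR_CATEGORIES else 0


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Placeholder data for eight neighborhoods (made up for illustration).
case_counts = [10, 20, 30, 40, 50, 60, 70, 80]
top_category = ["Restaurant", "Grocery Store", "Gym", "Park",
                "Coffee Shop", "Park", "Trail", "Park"]

y = [is_indoor(c) for c in top_category]
z = standardize(case_counts)
b0, b1 = fit_logistic(z, y)

# A negative slope means higher case counts go with a lower chance that the
# top trending venue is indoor; a positive slope means the opposite.
print("slope sign:", "negative" if b1 < 0 else "non-negative")
```

Standardizing the case counts keeps the gradient steps well scaled; the sign of `b1` is what maps onto the negative or positive association discussed earlier.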

### Appendix

We have already retrieved the COVID-19 case counts by neighborhood for June 5th, 2020, which we show below. We will continue to retrieve that data at intervals, which will also allow us to look at possible trends over time.

```python
import pandas as pd

# Load the June 5th, 2020 snapshot of case counts by neighbourhood.
df = pd.read_csv('toronto_covid_by_neighborhood_6-5-20.csv')
df.head()
```

| Unnamed: 0 | Neighbourhood Name | Case Count |
|---|---|---|
| 0 | Yorkdale-Glen Park | 120 |
| 1 | York University Heights | 286 |
| 2 | Yonge-St.Clair | 19 |
| 3 | Yonge-Eglinton | 12 |
| 4 | Wychwood | 68 |
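Once several snapshots have been retrieved, they could be stacked and compared along the lines of the sketch below. It assumes later files share the `Neighbourhood Name` and `Case Count` columns of the June 5th file; the second date, the neighbourhood labels, and all values here are hypothetical placeholders built in memory rather than read from disk.

```python
import pandas as pd

# Placeholder snapshots standing in for CSVs retrieved on different days.
# In practice each frame would come from pd.read_csv on that day's file.
snapshots = {
    "2020-06-05": pd.DataFrame({
        "Neighbourhood Name": ["Neighbourhood A", "Neighbourhood B"],
        "Case Count": [10, 20],
    }),
    "2020-06-12": pd.DataFrame({
        "Neighbourhood Name": ["Neighbourhood A", "Neighbourhood B"],
        "Case Count": [15, 22],
    }),
}

# Tag each snapshot with its retrieval date and stack into one long table.
frames = [snap.assign(Date=pd.to_datetime(date)) for date, snap in snapshots.items()]
long_df = pd.concat(frames, ignore_index=True)

# One row per neighbourhood, one column per date (columns sorted by date).
wide = long_df.pivot(index="Neighbourhood Name", columns="Date", values="Case Count")

# Change in case count between the first and last snapshot.
change = wide.iloc[:, -1] - wide.iloc[:, 0]
print(change)
```

Keeping the data in long form with an explicit `Date` column makes it straightforward to append new snapshots as they are retrieved.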
