# The effects of COVID-19 on crime rates in Vancouver

Group project proposal:

Sean Lee, Neil Li, Tracy Wang, Wendi Zhong

## Introduction

Before the pandemic, our teammate Neil had experienced no crime more serious than perhaps public drunkenness, but since the pandemic started he has been subjected to two separate attempts at grand theft auto and one shooting. This makes us wonder: is this simply a streak of bad luck, or is there a genuine correlation between these crimes and the pandemic?

There are two ways of looking at this question. Research (Nivette et al., 2021) has shown that crime rates decreased as lockdowns forced people to stay in their homes, but it can also be argued that the economic downturn (Munywoki, 2020) could lead more people to commit crimes. The pandemic could therefore plausibly push crime in either direction.

### Research Question:

<b>Has COVID-19 affected the frequency and severity of crimes?</b>
    
### Hypothesis:

$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$
    
$\mu_1$: the average total crimes committed before the outbreak

$\mu_2$: the average total crimes committed after the outbreak

In [None]:
library(tidyverse)
library(datateachr)
library(repr)
library(digest)
library(infer)
library(grid)

## Dataset Info:

The dataset is downloaded from "[Vancouver Crime Data](https://geodash.vpd.ca/opendata/)", an open dataset provided by the Vancouver Police Department. We selected the file that lists all the crimes committed in every neighbourhood in Vancouver since 2003.

In [None]:
crime_data <- read.csv("crimedata_csv_AllNeighbourhoods_AllYears.csv")
head(crime_data)

Because we want the crime data to reflect the difference between the years leading up to the pandemic and the years during and after it, we filter the data to include only the years from 2017 onwards. We also keep only January through October of each year, since the 2022 data do not yet include November. Finally, we only need the columns containing the type of crime and the year in which it was committed.

In [None]:
crime_data_processed <- crime_data %>%
    filter(YEAR >= 2017, MONTH <= 10) %>%
    select(TYPE, YEAR)

head(crime_data_processed)

In [None]:
# take a single sample with size 2000 from population

set.seed(2190)

crime_sample <- crime_data_processed %>%
    rep_sample_n(size = 2000, replace = FALSE) %>%
    mutate(Pandemic = ifelse(YEAR < 2020, "Before", "After"))
head(crime_sample)

We first visualize the overall spread of crime over the six years by taking a sample of size 2000 and drawing 1000 bootstrap resamples from it to see the overall distribution of the difference in crime counts before and after the pandemic.

In [None]:
# create 1000 bootstrap resamples of size 2000 and, for each one, compute the difference between the
# number of crimes committed before the pandemic (YEAR < 2020) and after it

set.seed(2190)
bootstrap_sample <- crime_sample %>%
    rep_sample_n(size = 2000, reps = 1000, replace = TRUE) %>%
    group_by(replicate, Pandemic) %>%
    summarize(n = n()) %>%
    pivot_wider(names_from = Pandemic, values_from = n) %>%
    mutate(diff = Before - After)
    
head(bootstrap_sample)



In [None]:
# visualize the bootstrap distribution
bootstrap_sampling_distribution <- bootstrap_sample %>%
    ggplot(aes(x = diff)) +
    geom_histogram(binwidth = 10) +
    xlab("Difference in Crimes Committed Before and After Pandemic") +
    ggtitle("Bootstrap Distribution")

bootstrap_sampling_distribution

In [None]:
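# obtain a 95% percentile confidence interval for the difference
ci <- bootstrap_sample %>%
    ungroup() %>%
    select(stat = diff) %>%   # keep only the bootstrap statistic, named `stat` as in infer's calculate() output
    get_ci(level = 0.95, type = "percentile")
ci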
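To make the percentile method less of a black box, here is a minimal base-R sketch of what it computes: the 2.5th and 97.5th percentiles of the bootstrap statistics. The vector `labels` is a simulated stand-in for the `Pandemic` column (an assumption for illustration, not the VPD data), and `count_diff` is a hypothetical helper.

```r
set.seed(2190)

# Simulated stand-in for the Pandemic column of a sample of size 2000
# (illustrative only, NOT the real crime sample)
labels <- sample(c("Before", "After"), size = 2000, replace = TRUE)

# Difference in counts (Before - After) for one vector of labels
count_diff <- function(x) sum(x == "Before") - sum(x == "After")

# 1000 bootstrap resamples, each the same size as the original sample
boot_diffs <- replicate(1000, count_diff(sample(labels, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap statistics
percentile_ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
percentile_ci
```

This should agree closely with what `get_ci(type = "percentile")` reports when given the same bootstrap statistics.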

Because this is a large dataset, we can afford to draw large samples, and with large samples the central limit theorem lets us approximate the sampling distribution of the difference as normal and compute a confidence interval analytically, as an alternative to bootstrapping.

In [None]:
#Visualize the bootstrap distribution with 95% confidence interval

ci_plot <- bootstrap_sample %>%
    ggplot(aes(x = diff)) +
    geom_histogram(binwidth = 10, colour = "white", fill = "grey") +
    annotate("rect", xmin = ci$lower_ci, xmax = ci$upper_ci, ymin = 0, ymax = Inf,
             fill = "deepskyblue",
             alpha = 0.3) +
    xlab("Difference in Crimes Committed Before and After Pandemic") +
    ggtitle("Bootstrap Distribution with 95% Confidence Interval")
ci_plot

In [None]:
# calculate mean and standard deviation on the difference between the total amount of crime before and after the 
# pandemic using the central limit theorem and obtain a 95% confidence interval from this
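One possible way to fill in this step is sketched below. It assumes the sample of size $n$ is small relative to the population, so the number of "Before" crimes $B$ is approximately binomial. The difference in counts is then $B - (n - B) = 2B - n$, with standard error $2\sqrt{n\hat{p}(1-\hat{p})}$, where $\hat{p} = B/n$. The function name `clt_diff_ci` and the counts passed to it are hypothetical, not results from the data.

```r
# CLT-based confidence interval for the difference in counts (Before - After).
# Assumes a binomial model for the "Before" count in a sample of size n.
clt_diff_ci <- function(n_before, n_after, level = 0.95) {
  n <- n_before + n_after
  p_hat <- n_before / n                      # sample proportion of "Before" crimes
  diff <- n_before - n_after                 # equals 2 * n_before - n
  se <- 2 * sqrt(n * p_hat * (1 - p_hat))    # SE of 2B - n under the binomial model
  z <- qnorm(1 - (1 - level) / 2)            # normal critical value
  c(estimate = diff, lower_ci = diff - z * se, upper_ci = diff + z * se)
}

# Made-up counts for illustration only
clt_diff_ci(n_before = 1040, n_after = 960)
```

With the notebook's objects, the two counts could be taken from `sum(crime_sample$Pandemic == "Before")` and `sum(crime_sample$Pandemic == "After")`, and the result compared against the bootstrap interval.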

## Methods


Our next step is to carry out the hypothesis test stated above. In future work, we also plan to test how different kinds of crimes have been affected.
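As a rough illustration of what such a test could look like, the sketch below runs a two-sided permutation test on the difference in mean monthly crime totals. The choice of the month as the unit of observation is an assumption, and the counts are simulated with `rpois` rather than taken from the VPD data, so the resulting p-value says nothing about the real question.

```r
set.seed(2190)

# Simulated monthly totals (NOT real VPD figures): 30 months before and
# 30 months after the outbreak (January to October of three years each)
before <- rpois(30, lambda = 3000)
after  <- rpois(30, lambda = 2800)

# Observed test statistic: difference in mean monthly totals
obs_diff <- mean(before) - mean(after)

# Under H0 the Before/After labels are exchangeable, so shuffle the pooled
# counts and recompute the difference many times
pooled <- c(before, after)
perm_diffs <- replicate(5000, {
  shuffled <- sample(pooled)
  mean(shuffled[1:30]) - mean(shuffled[31:60])
})

# Two-sided p-value: share of shuffled differences at least as extreme
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
p_value
```

The same steps could be expressed with `infer`, which the notebook already loads. Base R is used here only to keep the sketch self-contained.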

## References:

Ferguson, E. (2015). Crime and punishment vocabulary with pronunciation. IELTS Liz. Retrieved October 31, 2022, from https://ieltsliz.com/crime-and-punishment-vocabulary/ 

Munywoki, G. (2020). Economic effects of novel coronavirus (COVID – 19) on the global economy. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3719130 

Vancouver Police Department. (n.d.). Crime data download. VPD Open Data. Retrieved October 31, 2022, from https://geodash.vpd.ca/opendata/

Nivette, A. E., Zahnow, R., Aguilar, R., et al. (2021). A global analysis of the impact of COVID-19 stay-at-home restrictions on crime. Nature Human Behaviour, 5, 868–877. https://doi.org/10.1038/s41562-021-01139-z