# The effects of COVID-19 on crime rates in Vancouver

Group project proposal

Neil Li, Tracy Wang, Wendi Zhong

## Introduction

Before the pandemic, our teammate Neil had experienced no crime more serious than, perhaps, public drunkenness. Since the pandemic started, however, he has been subjected to two separate attempts at grand theft auto and one shooting. This makes us wonder: is this simply a streak of bad luck, or is there a genuine correlation between these crimes and the pandemic?

This is not a completely unfounded idea: although research has shown 

### Research Question:

<b>Has COVID-19 affected the frequency and severity of crimes in Vancouver?</b>

```r
library(tidyverse)  # data wrangling (dplyr, tidyr) and plotting (ggplot2)
library(datateachr)
library(repr)
library(digest)
library(infer)      # rep_sample_n() for repeated sampling
library(grid)
```

```
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
```


## Dataset Info:

The dataset was downloaded from [Vancouver Crime Data](https://geodash.vpd.ca/opendata/), an open dataset provided by the Vancouver Police Department. We selected the file that lists all crimes committed in every neighbourhood of Vancouver since 2003.

```r
# Read the VPD export (all neighbourhoods, all years) and preview the first rows
crime_data <- read.csv("crimedata_csv_AllNeighbourhoods_AllYears.csv")
head(crime_data)
```

| | TYPE | YEAR | MONTH | DAY | HOUR | MINUTE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | Break and Enter Commercial | 2012 | 12 | 14 | 8 | 52 | | Oakridge | 491285.0 | 5453433 |
| 2 | Break and Enter Commercial | 2019 | 3 | 7 | 2 | 6 | 10XX SITKA SQ | Fairview | 490613.0 | 5457110 |
| 3 | Break and Enter Commercial | 2019 | 8 | 27 | 4 | 12 | 10XX ALBERNI ST | West End | 491007.8 | 5459174 |
| 4 | Break and Enter Commercial | 2021 | 4 | 26 | 4 | 44 | 10XX ALBERNI ST | West End | 491007.8 | 5459174 |
| 5 | Break and Enter Commercial | 2014 | 8 | 8 | 5 | 13 | 10XX ALBERNI ST | West End | 491015.9 | 5459166 |
| 6 | Break and Enter Commercial | 2020 | 7 | 28 | 19 | 12 | 10XX ALBERNI ST | West End | 491015.9 | 5459166 |


To make the data more representative of the difference between the years leading up to the pandemic and the years during and after it, we filter the data to include only crimes from 2017 onwards. We also keep only January to October of each year, since 2022 has not had a November yet; this way every year is compared over the same months. Finally, we only need two columns: the type of crime (`TYPE`) and the year it was committed (`YEAR`).

```r
# Keep crimes from 2017 onwards, January to October only, and the two columns we need
crime_data_processed <- crime_data %>%
    filter(YEAR >= 2017, MONTH <= 10) %>%
    select(TYPE, YEAR)

head(crime_data_processed)
```

| | TYPE | YEAR |
|---|---|---|
| | `<chr>` | `<int>` |
| 1 | Break and Enter Commercial | 2019 |
| 2 | Break and Enter Commercial | 2019 |
| 3 | Break and Enter Commercial | 2021 |
| 4 | Break and Enter Commercial | 2020 |
| 5 | Break and Enter Commercial | 2022 |
| 6 | Break and Enter Commercial | 2022 |


We first decided to visualize the overall spread of crime over the six years (2017 to 2022) by taking a sample of size 3000 and bootstrapping 1000 samples from it to see the overall

```r
# Planned: create bootstrap samples of the difference in the proportion of crimes committed
# before the pandemic (YEAR < 2020) and after it, and obtain a 95% confidence interval from this

# Count the number of crimes recorded in each year (January to October)
crime_data_processed %>%
    group_by(YEAR) %>%
    summarize(n = n())
```


| YEAR | n |
|---|---|
| `<int>` | `<int>` |
| 2017 | 35533 |
| 2018 | 36855 |
| 2019 | 40206 |
| 2020 | 32051 |
| 2021 | 26891 |
| 2022 | 28600 |
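The bootstrap procedure described above can be sketched in base R. This is a minimal sketch, not our final analysis: it assumes a year-only stand-in population rebuilt from the yearly counts in the table above (so it ignores crime type), draws one sample of 3000 crimes, resamples it 1000 times with replacement, and takes the 2.5th and 97.5th percentiles of the bootstrap proportions of crimes committed before the pandemic (YEAR < 2020). The object names (`population_years`, `boot_props`, `boot_ci`) are illustrative.

```r
# Stand-in population: one entry per recorded crime, holding only its year
# (counts per year for January to October, 2017 to 2022)
population_years <- rep(2017:2022, times = c(35533, 36855, 40206, 32051, 26891, 28600))

set.seed(2190)

# One sample of size 3000, drawn without replacement from the population
crime_sample <- sample(population_years, size = 3000)

# 1000 bootstrap resamples (with replacement, same size as the sample);
# for each one, record the proportion of crimes committed before 2020
boot_props <- replicate(1000, {
    resample <- sample(crime_sample, size = length(crime_sample), replace = TRUE)
    mean(resample < 2020)
})

# Percentile 95% confidence interval for the pre-pandemic proportion
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
boot_ci
```

The percentile method is used here because it needs no distributional assumptions; the same loop could record a difference in proportions instead of a single proportion.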


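Because the dataset is large, we can afford to draw many large samples from it. With samples this large, the central limit theorem tells us that sample statistics such as means and proportions are approximately normally distributed, which lets us build confidence intervals from them.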
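As a minimal sketch of the central limit theorem idea, assuming the same year-only stand-in population rebuilt from the yearly counts: for one sample of size $n$, the proportion $\hat{p}$ of crimes committed before 2020 is approximately normal with standard error $\sqrt{\hat{p}(1-\hat{p})/n}$, giving the 95% interval $\hat{p} \pm 1.96 \cdot \mathrm{SE}$. The sample size of 2000 matches the size used for the repeated samples in this proposal; the names `p_hat`, `se` and `clt_ci` are illustrative.

```r
# Stand-in population of crime years rebuilt from the yearly counts (2017 to 2022)
population_years <- rep(2017:2022, times = c(35533, 36855, 40206, 32051, 26891, 28600))

set.seed(2190)
n <- 2000
crime_sample <- sample(population_years, size = n)

# Sample proportion of crimes committed before the pandemic (YEAR < 2020)
p_hat <- mean(crime_sample < 2020)

# CLT: p_hat is approximately normal with this standard error
se <- sqrt(p_hat * (1 - p_hat) / n)

# A two-sided 95% interval uses the 0.975 standard-normal quantile (about 1.96)
clt_ci <- c(lower = p_hat - qnorm(0.975) * se,
            upper = p_hat + qnorm(0.975) * se)
clt_ci
```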

```r
# Planned: calculate the mean and standard deviation of the difference between the total
# amount of crime before and after the pandemic using the central limit theorem, and
# obtain a 95% confidence interval from this
```
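Written out, the interval planned in this step is the usual large-sample (CLT) interval for a difference in means. As notation for this sketch, let $\bar{x}$, $s$ and $n$ be the mean, standard deviation and number of yearly crime totals in each group, with "before" covering 2017 to 2019 and "after" covering 2020 to 2022, and let $z_{0.975} \approx 1.96$ be the 97.5th percentile of the standard normal distribution (`qnorm(0.975)` in R):

$$
\left(\bar{x}_{\text{before}} - \bar{x}_{\text{after}}\right) \pm z_{0.975}\sqrt{\frac{s_{\text{before}}^2}{n_{\text{before}}} + \frac{s_{\text{after}}^2}{n_{\text{after}}}}
$$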

```r
# Take 1000 samples, each of size 2000, from the population (no replacement within a sample),
# and label each crime as occurring before (YEAR < 2020) or after the start of the pandemic
set.seed(2190)

crime_1000_samples <- crime_data_processed %>%
    rep_sample_n(size = 2000, reps = 1000, replace = FALSE) %>%
    mutate(Pandemic = ifelse(YEAR < 2020, "Before", "After"))

head(crime_1000_samples)
```

| replicate | TYPE | YEAR | Pandemic |
|---|---|---|---|
| `<int>` | `<chr>` | `<int>` | `<chr>` |
| 1 | Theft from Vehicle | 2018 | Before |
| 1 | Theft from Vehicle | 2018 | Before |
| 1 | Theft from Vehicle | 2018 | Before |
| 1 | Other Theft | 2022 | After |
| 1 | Mischief | 2021 | After |
| 1 | Theft of Bicycle | 2017 | Before |
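Each of the 1000 replicates yields one estimate, so together they approximate the sampling distribution of that estimate. The sketch below does the equivalent in base R rather than with the infer pipeline, assuming the year-only stand-in population rebuilt from the yearly counts shown earlier: it computes the proportion of "Before" crimes in each of 1000 samples of size 2000 and compares the spread of those proportions with the CLT standard error $\sqrt{p(1-p)/n}$. The names `before_props` and `clt_se` are illustrative.

```r
# Stand-in population of crime years rebuilt from the yearly counts (2017 to 2022)
population_years <- rep(2017:2022, times = c(35533, 36855, 40206, 32051, 26891, 28600))

set.seed(2190)
n <- 2000

# Like rep_sample_n(size = 2000, reps = 1000, replace = FALSE): 1000 samples drawn
# without replacement, keeping the proportion committed before 2020 (Pandemic == "Before")
before_props <- replicate(1000, mean(sample(population_years, size = n) < 2020))

# True population proportion and the CLT standard error for samples of size n
p <- mean(population_years < 2020)
clt_se <- sqrt(p * (1 - p) / n)

# The empirical spread of the 1000 sample proportions should be close to clt_se
c(empirical_sd = sd(before_props), clt_se = clt_se)
```

Because each sample of 2000 is only about 1% of the population, sampling without replacement barely shrinks the spread, so the CLT formula without a finite-population correction is a close match.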


```r
# Draft: CLT-based 95% confidence interval for the difference in the mean yearly
# crime count before (2017-2019) and after (2020-2022) the start of the pandemic
number_of_crime <- crime_data_processed %>%
    group_by(YEAR) %>%
    summarize(n = n())

before <- number_of_crime %>% filter(YEAR < 2020)
after <- number_of_crime %>% filter(YEAR >= 2020)

before_mean <- mean(before$n)
before_sd <- sd(before$n)
b_n <- nrow(before)

after_mean <- mean(after$n)
after_sd <- sd(after$n)
a_n <- nrow(after)

# Standard error of the difference in means; each group has only 3 yearly totals,
# so the normal approximation here is rough
diff_se <- sqrt(before_sd^2 / b_n + after_sd^2 / a_n)

# qnorm(0.975) is the critical value for a two-sided 95% interval
crime_clt_ci <- tibble(
    lower_ci = before_mean - after_mean - qnorm(0.975) * diff_se,
    upper_ci = before_mean - after_mean + qnorm(0.975) * diff_se
)
crime_clt_ci
```


## Methods


(To be completed: describe our plans for hypothesis testing, and our future plans to test how different kinds of crime have been affected.)
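One hypothesis test we could use for individual crime types is a permutation test. The sketch below is a hypothetical helper, not a finished method: for a chosen crime type, it compares the share of crimes of that type from 2020 onwards with the share before 2020, and estimates a two-sided p-value by shuffling the Before/After labels. The function name `perm_test_type_share`, its arguments, and the toy data in the example are made up for illustration; in practice it would be applied to the `TYPE` and `YEAR` columns of `crime_data_processed`.

```r
# Hypothetical helper: permutation test for a change in the share of one crime type
# between the pre-pandemic (YEAR < 2020) and pandemic-onward (YEAR >= 2020) periods
perm_test_type_share <- function(type, year, target_type, reps = 1000) {
    is_target <- type == target_type
    period <- ifelse(year < 2020, "Before", "After")

    # Test statistic: share of target_type after minus share before
    share_diff <- function(labels) {
        mean(is_target[labels == "After"]) - mean(is_target[labels == "Before"])
    }

    observed <- share_diff(period)

    # Null distribution: shuffling the period labels breaks any link between
    # period and crime type while keeping both group sizes fixed
    null_diffs <- replicate(reps, share_diff(sample(period)))

    # Two-sided p-value: how often a shuffled difference is at least as extreme
    p_value <- mean(abs(null_diffs) >= abs(observed))

    list(observed = observed, p_value = p_value)
}

# Toy example with made-up data: Mischief is half of all crimes in both periods
toy_type <- rep(c("Mischief", "Other Theft"), times = 4)
toy_year <- rep(c(2018, 2018, 2021, 2021), times = 2)
set.seed(2190)
perm_test_type_share(toy_type, toy_year, target_type = "Mischief")
```

In the toy example the share of Mischief is identical in both periods, so the observed difference is 0 and every shuffled difference is at least as extreme, giving a p-value of 1.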

## References:

Ferguson, E. (2015). *Crime and punishment vocabulary with pronunciation*. IELTS Liz. Retrieved October 31, 2022, from https://ieltsliz.com/crime-and-punishment-vocabulary/

Nivette, A. E., Zahnow, R., Aguilar, R., et al. (2021). A global analysis of the impact of COVID-19 stay-at-home restrictions on crime. *Nature Human Behaviour*, 5, 868–877. https://doi.org/10.1038/s41562-021-01139-z

Vancouver Police Department. (n.d.). *Crime data download*. VPD Open Data. Retrieved October 31, 2022, from https://geodash.vpd.ca/opendata/