In [2]:
# Run this cell before continuing.
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()    masks stats::filter()
✖ dplyr::lag()       masks stats::lag()
✖ lubridate::stamp() masks cowplot::stamp()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


# Examining Correlations Between Location and Crime Rates

## Introduction
---

Crime rates on the east side of Vancouver have reportedly surged in recent years, which prompts the question of just how elevated they are compared with the rest of the city.

Our dataset contains crime data for 2022 in Vancouver. For each entry, we have recorded the type of crime, and we are specifically focusing on three crime types: "Break and Enter Residential/Other," "Break and Enter Commercial," and "Mischief." We also have information about the date and time the crime occurred, the address, neighborhood, and geographic coordinates.

The question we are trying to answer is: is the crime rate on the east side of Vancouver twice that of the west side? East and west are defined by the "X" coordinate: crimes with "X" values less than 491017.47 are categorized as the "west" side, while values greater than or equal to this threshold are considered the "east" side.

In this study, we created a new column, REGION, which is used to find the proportion of crimes in the east and in the west. This approach allows us to focus on the geographical aspect of crime patterns for "Break and Enter Residential/Other," "Break and Enter Commercial," and "Mischief" incidents.

Downtown Vancouver and Hastings Street have frequently been associated with higher rates of criminality, and according to many news reports they are among the most hazardous locations in Vancouver. However, if socioeconomic inequality, poverty, or other social factors associated with crime are more pronounced on the west side of Vancouver, they could contribute to an elevated crime rate there compared to the east side. Our null hypothesis is that the crime rate in the east is not 2 times the crime rate in the west. Our alternative hypothesis is that the crime rate in the east is 2 times the crime rate in the west.

### Parameters of Interest:

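* Location Parameter (Proportion): 

Let p_east be the proportion of crimes that occur on the east side. If the east has twice as many crimes as the west, then p_east = 2/3 ≈ 0.67, so the hypotheses are:

H0: p_east <= 0.67

H1: p_east > 0.67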

* Scale Parameter (Standard Deviation):

The standard deviation will be a secondary parameter of interest. We'll calculate the standard deviation for each crime type in both west and east sides of Vancouver. This parameter is important because it allows us to assess the variability or dispersion of crime occurrences within each group. It can help us understand how consistent or variable crime patterns are in each area.
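
The link between "twice as many crimes" and the 0.67 threshold, and the spread of a sample proportion, can be sketched in a few lines of R. The helper names `ratio_to_proportion` and `proportion_se` are hypothetical, introduced only for illustration, and the sample size of 1000 is an arbitrary placeholder rather than a value from our data.

```r
# Hypothetical helper: if the east has `ratio` times as many crimes as the west,
# the east's share of all crimes is ratio / (ratio + 1).
ratio_to_proportion <- function(ratio) {
  ratio / (ratio + 1)
}

# Hypothetical helper: standard error of a sample proportion p_hat from n observations.
proportion_se <- function(p_hat, n) {
  sqrt(p_hat * (1 - p_hat) / n)
}

p_null <- ratio_to_proportion(2)  # 2/3, rounded to 0.67 in the hypotheses
print(p_null)

# n = 1000 is an arbitrary illustrative sample size, not a value from the dataset
print(proportion_se(p_null, n = 1000))
```

The standard error shrinks as the sample size grows, so crime types with more recorded incidents give a tighter estimate of the east-side proportion.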

## Preliminary Results
---

In [3]:
# setwd("/home/jovyan/work/project")

In [4]:
crime_data <- read_csv("https://raw.githubusercontent.com/aradsab/Stat-201-Project/main/crimedata_csv_AllNeighbourhoods_2022.csv")

# Clean the data: drop rows with missing values and rows whose X coordinate is 0
crime_data <- crime_data[complete.cases(crime_data), ]
crime_data <- subset(crime_data, X != 0)

# Label each crime by region: X >= 491017.47 is East, anything smaller is West
crime_data <- crime_data %>%
  mutate(REGION = ifelse(X >= 491017.47, "East", "West"))

head(crime_data)

Rows: 34281 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): TYPE, HUNDRED_BLOCK, NEIGHBOURHOOD
dbl (7): YEAR, MONTH, DAY, HOUR, MINUTE, X, Y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y,REGION
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>
Break and Enter Commercial,2022,1,3,16,19,10XX ALBERNI ST,West End,491036.1,5459146,East
Break and Enter Commercial,2022,6,17,5,16,10XX ALBERNI ST,West End,491067.3,5459115,East
Break and Enter Commercial,2022,3,15,5,14,10XX ALBERNI ST,West End,491102.2,5459092,East
Break and Enter Commercial,2022,3,19,6,42,10XX ALBERNI ST,West End,491102.2,5459092,East
Break and Enter Commercial,2022,2,23,23,0,10XX BALFOUR AVE,Shaughnessy,490699.8,5455444,West
Break and Enter Commercial,2022,2,25,10,15,10XX BALFOUR AVE,Shaughnessy,490699.8,5455444,West


In [14]:
crime_X <- crime_data[c("REGION","TYPE","X")]
head(crime_X)

REGION,TYPE,X
<chr>,<chr>,<dbl>
East,Break and Enter Commercial,491036.1
East,Break and Enter Commercial,491067.3
East,Break and Enter Commercial,491102.2
East,Break and Enter Commercial,491102.2
West,Break and Enter Commercial,490699.8
West,Break and Enter Commercial,490699.8


In [30]:


# Split the data by crime type
crime_res <- crime_X[crime_X$TYPE == "Break and Enter Residential/Other", ]
crime_com <- crime_X[crime_X$TYPE == "Break and Enter Commercial", ]
crime_mis <- crime_X[crime_X$TYPE == "Mischief", ]

# For each crime type, count crimes per region and convert the counts to proportions
crime_res_summary <- crime_res %>% group_by(REGION) %>% summarise(total_rows = n()) %>%
    mutate(proportion = total_rows / sum(total_rows))
crime_com_summary <- crime_com %>% group_by(REGION) %>% summarise(total_rows = n()) %>%
    mutate(proportion = total_rows / sum(total_rows))
crime_mis_summary <- crime_mis %>% group_by(REGION) %>% summarise(total_rows = n()) %>%
    mutate(proportion = total_rows / sum(total_rows))

# Display the three summaries in order: residential, commercial, mischief
tibble(crime_res_summary)
tibble(crime_com_summary)
tibble(crime_mis_summary)


REGION,total_rows,proportion
<chr>,<int>,<dbl>
East,801,0.6327014
West,465,0.3672986


REGION,total_rows,proportion
<chr>,<int>,<dbl>
East,1344,0.6770781
West,641,0.3229219


REGION,total_rows,proportion
<chr>,<int>,<dbl>
East,4349,0.7746705
West,1265,0.2253295
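
The per-region proportions above can also be cross-checked with a single cross-tabulation. The following is a minimal base R sketch that assumes a small made-up data frame, `toy_crimes` (hypothetical, not the real dataset), with the same `TYPE` and `REGION` columns as `crime_X`.

```r
# Hypothetical toy data with the same TYPE and REGION columns as crime_X
toy_crimes <- data.frame(
  TYPE   = c("Mischief", "Mischief", "Mischief", "Mischief",
             "Break and Enter Commercial", "Break and Enter Commercial"),
  REGION = c("East", "East", "East", "West", "East", "West")
)

# Cross-tabulate crime type by region, then convert counts to row-wise proportions
counts <- table(toy_crimes$TYPE, toy_crimes$REGION)
proportions <- prop.table(counts, margin = 1)  # each row (crime type) sums to 1

print(counts)
print(proportions)
```

Replacing `toy_crimes` with `crime_X` should give one row per crime type, which makes it easy to compare the east-side share across types at a glance.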


## Methods: Plan

---

In [65]:
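res_prop_east <- crime_res_summary$proportion[1]  # East is the first row of the summary
n_res <- sum(crime_res_summary$total_rows)        # total residential break-and-enters (1266)
confidence_level <- 0.95

# Calculate the standard error of the sample proportion
se <- sqrt(res_prop_east * (1 - res_prop_east) / n_res)

population_proportion_null <- 0.67  # Hypothesized population proportion under the null hypothesis

# Calculate the z value: distance of the observed proportion from the null value, in standard errors
z_value <- (res_prop_east - population_proportion_null) / se

# z_value * se simplifies to the observed difference (res_prop_east - population_proportion_null)
margin_of_error_res <- z_value * se

# Lower-tail probability P(Z <= z_value)
p_value <- pnorm(z_value)
print(p_value)
print(margin_of_error_res)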

[1] 0.002952944
[1] -0.03729858


In [66]:
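com_prop_east <- crime_com_summary$proportion[1]  # East is the first row of the summary
n_com <- sum(crime_com_summary$total_rows)        # total commercial break-and-enters (1985)

# Calculate the standard error of the sample proportion
se <- sqrt(com_prop_east * (1 - com_prop_east) / n_com)

# Calculate the z value
z_value <- (com_prop_east - population_proportion_null) / se

# z_value * se simplifies to the observed difference (com_prop_east - population_proportion_null)
margin_of_error_com <- z_value * se

# Lower-tail probability P(Z <= z_value)
p_value <- pnorm(z_value)
print(p_value)
print(margin_of_error_com)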

[1] 0.7499766
[1] 0.007078086


In [67]:
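mis_prop_east <- crime_mis_summary$proportion[1]  # East is the first row of the summary
n_mis <- sum(crime_mis_summary$total_rows)        # total mischief incidents (5614)

# Calculate the standard error of the sample proportion
se <- sqrt(mis_prop_east * (1 - mis_prop_east) / n_mis)

# Calculate the z value
z_value <- (mis_prop_east - population_proportion_null) / se

# z_value * se simplifies to the observed difference (mis_prop_east - population_proportion_null)
margin_of_error_mis <- z_value * se

# Lower-tail probability P(Z <= z_value)
p_value <- pnorm(z_value)
print(p_value)
print(margin_of_error_mis)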

[1] 1
[1] 0.1046705
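
For comparison, here is a sketch of how an upper-tail one-proportion z-test and a conventional 95% margin of error could be computed for the residential break-and-enter counts shown in Preliminary Results (801 of 1266 in the East). This is an illustrative alternative, not the calculation used in the cells above: it assumes the standard error is evaluated at the null value, and every variable name is chosen for this sketch only.

```r
# Counts taken from the residential break-and-enter summary in Preliminary Results
x_east <- 801    # crimes in the East
n_total <- 1266  # all residential break-and-enters
p_null <- 0.67   # hypothesized proportion under the null

p_hat <- x_east / n_total

# Standard error under the null hypothesis (uses p_null rather than p_hat)
se_null <- sqrt(p_null * (1 - p_null) / n_total)
z_null <- (p_hat - p_null) / se_null

# Upper-tail p-value, matching an alternative of the form p_east > 0.67
p_value_upper <- pnorm(z_null, lower.tail = FALSE)

# 95% margin of error for p_hat: critical value times the estimated standard error
se_hat <- sqrt(p_hat * (1 - p_hat) / n_total)
margin_of_error <- qnorm(0.975) * se_hat

print(p_value_upper)
print(c(lower = p_hat - margin_of_error, upper = p_hat + margin_of_error))

# Built-in equivalent of the z-test (without continuity correction)
builtin <- prop.test(x_east, n_total, p = p_null,
                     alternative = "greater", correct = FALSE)
print(builtin$p.value)
```

The manual upper-tail p-value and the one returned by `prop.test()` should agree, which is a quick check that the direction of the test matches the stated alternative.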



For our hypothesis test, the null hypothesis is that the crime rate on the east side of Vancouver is not twice that of the west side, and the alternative is that it is, as stated in the Introduction. We will use a 5% significance level (95% confidence). We plan to use bootstrapping rather than the asymptotic method, because the asymptotic method relies on the sampling distribution of the sample proportion being approximately normal. If the p-value is less than 0.05 after conducting bootstrapping, we will reject the null hypothesis.
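
A simulation-based version of this plan can be sketched in base R. The sketch rebuilds the residential break-and-enter sample from the counts in Preliminary Results (801 East, 465 West); the seed, the number of replicates, and all variable names are assumptions chosen for illustration. It uses a bootstrap percentile interval for the east-side proportion and, for the p-value, draws sample proportions from the null value of 0.67 rather than resampling the data.

```r
set.seed(201)  # arbitrary seed so the sketch is reproducible
reps <- 2000   # arbitrary number of replicates

# Rebuild the sample of REGION labels from the observed counts
region <- c(rep("East", 801), rep("West", 465))
n <- length(region)
p_hat <- mean(region == "East")

# Bootstrap: resample with replacement and record the East proportion each time
boot_props <- replicate(
  reps,
  mean(sample(region, size = n, replace = TRUE) == "East")
)
boot_ci <- quantile(boot_props, c(0.025, 0.975))  # 95% percentile interval

# Null distribution: sample proportions when the true East proportion is 0.67
p_null <- 0.67
null_props <- rbinom(reps, size = n, prob = p_null) / n

# Upper-tail p-value: share of simulated null proportions at least as large as p_hat
p_value <- mean(null_props >= p_hat)

print(boot_ci)
print(p_value)
```

The same steps could be repeated for the commercial and mischief counts by changing the two numbers used to build `region`.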

* What do you expect to find?

We expect to find support for our alternative hypothesis. Crime will be most common in the east area (X >= 491017.47), which includes downtown, and this trend will stay consistent across different types of crime and across different neighbourhoods within Vancouver.
 
* What impact could such findings have?

Understanding the relationship between location and crime can tell law enforcement agencies when and where to allocate police resources efficiently. It may also lead to more focused and efficient policing techniques, reducing crime rates during peak periods.

* What future questions could this lead to?

Beyond location, what other factors (e.g., weather, events, or tourism) could influence crime rates? Our research could be extended with those variables to examine why the east side (including downtown) has a significantly higher crime rate than other areas.

How do economic factors, such as income levels, education, and employment opportunities, differ between the west and east sides of Vancouver, and to what extent do these disparities correlate with crime rates?

## References
---

Staysafevancouver. (2023, August 31). Vancouver crime rate: Areas to avoid. Stay Safe Vancouver. https://www.staysafevancouver.com/post/vancouver-crime-rate 

West side rated safer than East Side neighbourhoods: Vancouver police survey. The Georgia Straight. (2019, February 21). https://www.straight.com/news/1203621/west-side-rated-safer-east-side-neighbourhoods-vancouver-police-survey 
