# Understanding Crime Rates in London, England

## Project Overview

In this project, I aim to analyze and understand the crime rate in London, England, with a particular focus on violent crimes. The primary objective is to explore the patterns of violent crime rates and identify when they are most likely to occur throughout the year.

## Dataset Description

The dataset used in this analysis includes all crimes (both violent and non-violent) reported in London between 2008 and 2016. However, the dataset does not explicitly categorize crimes as violent or non-violent. Therefore, during the preprocessing phase, we will need to address this limitation by defining and separating violent crimes from non-violent ones.

Additionally, we will need to account for the months when daylight saving time (DST) is in effect, as this could influence the patterns of crime.

## Hypothesis

My initial hypothesis is that violent crime rates tend to increase during periods when daylight saving time is not in effect. This is because the nights are longer during these periods, potentially providing more opportunities for violent crimes to occur under the cover of darkness.

## Next Steps

1. **Data Preprocessing**: 
   - Categorize crimes as violent or non-violent.
   - Identify and label the months when daylight saving time is in effect.

2. **Exploratory Data Analysis (EDA)**:
   - Analyze crime trends over the years.
   - Compare crime rates during DST and non-DST periods.

3. **Statistical Analysis**:
   - Test the hypothesis by comparing violent crime rates during DST and non-DST periods.

4. **Visualization**:
   - Create visualizations to illustrate crime patterns and support findings.

By the end of this notebook, I hope to provide insights into the relationship between daylight saving time and violent crime rates in London.

## 1. Loading the Data
We start by loading the dataset and examining its structure.

In [1]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from statsmodels.stats.proportion import proportions_ztest
import statsmodels.api as sm

%matplotlib inline

In [4]:
# read csv and import data
df = pd.read_csv('./data/london_crime.csv')

# make a copy
crime_df = df.copy()

# Display the first few rows
df.head()

Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month
0,E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11
1,E01001646,Greenwich,Violence Against the Person,Other violence,0,2016,11
2,E01000677,Bromley,Violence Against the Person,Other violence,0,2015,5
3,E01003774,Redbridge,Burglary,Burglary in Other Buildings,0,2016,3
4,E01004563,Wandsworth,Robbery,Personal Property,0,2008,6


In [16]:

print(f"There are a total of {crime_df.shape[0]} rows in the dataset.")

There are a total of 13490604 rows in the dataset.


In [17]:
crime_df.borough.value_counts()

borough
Croydon                   602100
Barnet                    572832
Ealing                    549396
Bromley                   523908
Lambeth                   519048
Enfield                   511164
Wandsworth                498636
Brent                     490644
Lewisham                  485136
Southwark                 483300
Newham                    471420
Redbridge                 445716
Hillingdon                442584
Greenwich                 421200
Hackney                   417744
Haringey                  413856
Tower Hamlets             412128
Waltham Forest            406296
Havering                  399600
Hounslow                  395928
Bexley                    385668
Camden                    378432
Westminster               366660
Harrow                    365688
Islington                 359208
Merton                    339876
Hammersmith and Fulham    328752
Sutton                    322488
Barking and Dagenham      311040
Richmond upon Thames      304128
Ke

Croydon contributes the most records to the dataset, followed by Barnet and Ealing. Note that `value_counts()` counts rows, not offences: the number of offences in each row is stored in the `value` column, which is 0 for many rows, so these figures show how many records each borough has rather than how much crime occurred there. Interestingly, the City of London has the fewest records, which is somewhat surprising given that I expected the city center to rank higher. Its count is nearly one-third that of the second-lowest borough, Kingston upon Thames.
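
As a sketch of how borough totals could be computed from the `value` column instead of from row counts, the snippet below uses a tiny invented frame (`toy` and its numbers are illustrative assumptions, not values from the dataset):

```python
import pandas as pd

# Toy stand-in for crime_df; rows and counts are invented for illustration.
toy = pd.DataFrame({
    "borough": ["Croydon", "Croydon", "Westminster", "Westminster", "Westminster"],
    "value":   [0, 2, 5, 0, 3],
})

# Number of rows per borough (what value_counts reports).
rows_per_borough = toy["borough"].value_counts()

# Number of recorded offences per borough (sum of the value column).
crimes_per_borough = (
    toy.groupby("borough")["value"].sum().sort_values(ascending=False)
)

print(rows_per_borough)
print(crimes_per_borough)
```

On the real data, the same `groupby(...)["value"].sum()` call on `crime_df` would rank boroughs by recorded offences rather than by number of records.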

## 2. Data Overview
Let's get a sense of the dataset's size, columns, and missing values.


In [18]:
crime_df.month.value_counts().sort_index()

month
1     1124217
2     1124217
3     1124217
4     1124217
5     1124217
6     1124217
7     1124217
8     1124217
9     1124217
10    1124217
11    1124217
12    1124217
Name: count, dtype: int64

In [19]:
crime_df.value.min(), crime_df.value.max()

(0, 309)

In [20]:
crime_df.isnull().sum()

lsoa_code          0
borough            0
major_category     0
minor_category     0
value              0
year               0
month              0
Violent            0
daylight_saving    0
dtype: int64

In [21]:
crime_df.duplicated().sum()

0

Fortunately, the dataset contains no duplicates or null values. That said, even if duplicates were present, removing them might not be appropriate, as the same type of crime could occur in the same location and within the same month.
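
If duplicates did turn up, one cautious approach is to inspect them before dropping anything. The sketch below uses invented rows (`toy` is a hypothetical stand-in for `crime_df`) and `keep=False` so that every copy of a repeated row is shown:

```python
import pandas as pd

# Invented rows; the first two are exact copies of each other.
toy = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000001", "E01000002"],
    "minor_category": ["Harassment", "Harassment", "Harassment"],
    "year": [2015, 2015, 2015],
    "month": [5, 5, 5],
    "value": [1, 1, 0],
})

# Count rows that repeat an earlier row exactly.
n_duplicates = int(toy.duplicated().sum())

# Show every copy of the repeated rows so they can be reviewed by hand.
to_review = toy[toy.duplicated(keep=False)]

print(n_duplicates)
print(to_review)
```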

## 3. Crime Distribution by Category
We need to categorize crimes as violent or non-violent. For this analysis, we will define violent crimes as those involving physical harm (e.g., assault, robbery). Let's create a new column to classify crimes.

In [22]:

crime_df.major_category.unique()

array(['Burglary', 'Violence Against the Person', 'Robbery',
       'Theft and Handling', 'Criminal Damage', 'Drugs',
       'Fraud or Forgery', 'Other Notifiable Offences', 'Sexual Offences'],
      dtype=object)

In [23]:
crime_df.minor_category.unique()

array(['Burglary in Other Buildings', 'Other violence',
       'Personal Property', 'Other Theft', 'Offensive Weapon',
       'Criminal Damage To Other Building', 'Theft/Taking of Pedal Cycle',
       'Motor Vehicle Interference & Tampering',
       'Theft/Taking Of Motor Vehicle', 'Wounding/GBH',
       'Other Theft Person', 'Common Assault', 'Theft From Shops',
       'Possession Of Drugs', 'Harassment', 'Handling Stolen Goods',
       'Criminal Damage To Dwelling', 'Burglary in a Dwelling',
       'Criminal Damage To Motor Vehicle', 'Other Criminal Damage',
       'Counted per Victim', 'Going Equipped', 'Other Fraud & Forgery',
       'Assault with Injury', 'Drug Trafficking', 'Other Drugs',
       'Business Property', 'Other Notifiable', 'Other Sexual',
       'Theft From Motor Vehicle', 'Rape', 'Murder'], dtype=object)

In [24]:
# checking the different types of violent crimes
crime_df[crime_df.major_category == 'Violence Against the Person'].minor_category.value_counts()

minor_category
Common Assault         522180
Harassment             522072
Assault with Injury    521856
Wounding/GBH           519372
Other violence         512028
Offensive Weapon       481896
Murder                  92340
Name: count, dtype: int64

In [25]:
crime_df[crime_df.major_category == 'Other Notifiable Offences'].minor_category.value_counts()

minor_category
Other Notifiable    519696
Going Equipped      256608
Name: count, dtype: int64

In [26]:
crime_df[crime_df.major_category == 'Robbery'].minor_category.value_counts()

minor_category
Personal Property    520668
Business Property    418716
Name: count, dtype: int64

In [27]:
crime_df[crime_df.major_category == 'Burglary'].minor_category.value_counts()

minor_category
Burglary in Other Buildings    522072
Burglary in a Dwelling         521532
Name: count, dtype: int64

In [28]:
crime_df[crime_df.major_category == 'Sexual Offences'].minor_category.value_counts()

minor_category
Other Sexual    81108
Rape            27000
Name: count, dtype: int64

In [29]:
crime_df[crime_df.major_category == 'Theft and Handling'].minor_category.value_counts()

minor_category
Other Theft                               522180
Theft From Motor Vehicle                  522180
Theft/Taking Of Motor Vehicle             522072
Motor Vehicle Interference & Tampering    520452
Other Theft Person                        519480
Theft/Taking of Pedal Cycle               516996
Handling Stolen Goods                     426168
Theft From Shops                          416772
Name: count, dtype: int64

In [30]:
crime_df[crime_df.major_category == 'Criminal Damage'].minor_category.value_counts()

minor_category
Criminal Damage To Motor Vehicle     521964
Other Criminal Damage                521856
Criminal Damage To Dwelling          521424
Criminal Damage To Other Building    503928
Name: count, dtype: int64
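
The per-category cells above could also be condensed into a single grouped count. A minimal sketch on invented rows (`toy` is a hypothetical stand-in for `crime_df`):

```python
import pandas as pd

# Invented rows using category names from the dataset.
toy = pd.DataFrame({
    "major_category": ["Robbery", "Robbery", "Robbery", "Burglary", "Burglary"],
    "minor_category": [
        "Personal Property", "Personal Property", "Business Property",
        "Burglary in a Dwelling", "Burglary in Other Buildings",
    ],
})

# One row count per (major, minor) pair, indexed by both levels.
minor_counts = toy.groupby(["major_category", "minor_category"]).size()

print(minor_counts)
```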

## 4. Temporal Analysis
Next, we analyze crime trends over time, focusing on monthly and yearly patterns. To enhance our analysis, we introduce additional features to the dataset, such as identifying violent crimes and accounting for daylight saving months. Below is the code used to create these new columns:

In [34]:
def create_columns(df):
    """Add binary `violent` and `daylight_saving` columns to df (in place) and return it."""
    # Major categories treated as violent in this analysis
    violent_crimes = ['Burglary', 'Violence Against the Person', 'Robbery', 'Sexual Offences']
    # Months flagged as daylight saving: March (3) through November (11).
    # Note: UK clocks actually change on the last Sunday of March and of October,
    # so this month-level flag is only an approximation.
    daylight_saving_months = list(range(3, 12))

    # Encode both columns as binary values (0 and 1)
    df['violent'] = df['major_category'].isin(violent_crimes).astype(int)
    df['daylight_saving'] = df['month'].isin(daylight_saving_months).astype(int)

    return df

In [35]:
create_columns(crime_df)

Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month,Violent,daylight_saving,violent
0,E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11,0,1,1
1,E01001646,Greenwich,Violence Against the Person,Other violence,0,2016,11,0,1,1
2,E01000677,Bromley,Violence Against the Person,Other violence,0,2015,5,0,1,1
3,E01003774,Redbridge,Burglary,Burglary in Other Buildings,0,2016,3,0,1,1
4,E01004563,Wandsworth,Robbery,Personal Property,0,2008,6,1,1,1
...,...,...,...,...,...,...,...,...,...,...
13490599,E01000504,Brent,Criminal Damage,Criminal Damage To Dwelling,0,2015,2,0,0,0
13490600,E01002504,Hillingdon,Robbery,Personal Property,1,2015,6,1,1,1
13490601,E01004165,Sutton,Burglary,Burglary in a Dwelling,0,2011,2,0,0,1
13490602,E01001134,Croydon,Robbery,Business Property,0,2011,5,1,1,1


## 5. Probability

In [36]:
violent_prob_mean = crime_df['violent'].mean()
print(f"Probability of violent crime regardless of daylight saving status: {violent_prob_mean}.")

Probability of violent crime regardless of daylight saving status: 0.3901115176162609.


In [37]:
violent_isdaylight = crime_df[crime_df.daylight_saving == 1]['violent'].mean()
print(f"Probability of violent crime during daylight saving: {violent_isdaylight}.")

Probability of violent crime during daylight saving: 0.3901115176162609.


In [38]:
violent_notdaylight = crime_df[crime_df.daylight_saving == 0]['violent'].mean()
print(f"Probability of violent crime not during daylight saving: {violent_notdaylight}.")

Probability of violent crime not during daylight saving: 0.3901115176162609.


In [39]:
print(f"Probability of a crime occurring when daylight saving is not in effect: \
{crime_df.daylight_saving.value_counts()[0]/len(crime_df)}.")

Probability of a crime occurring when daylight saving is not in effect: 0.25.


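Based on the probabilities calculated above, there is no evidence that the proportion of violent crimes rises when daylight saving is not in effect: the proportion is about 0.39 in both periods. Note, however, that these figures are proportions of rows rather than of recorded offences. Because every month contributes the same number of rows (1,124,217), row-based proportions are likely to be identical by construction, so weighting each row by its `value` would give a more meaningful comparison.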
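
A minimal sketch of an offence-weighted comparison, assuming `value` holds the number of offences per row. The frame `toy`, its numbers, and the helper column `violent_value` are invented for illustration and say nothing about the real result:

```python
import pandas as pd

# Invented rows: each group has the same mix of violent / non-violent rows,
# but different offence counts in the value column.
toy = pd.DataFrame({
    "violent":         [1, 0, 1, 0, 1, 0, 1, 0],
    "daylight_saving": [1, 1, 1, 1, 0, 0, 0, 0],
    "value":           [2, 6, 1, 3, 4, 2, 2, 0],
})

# Row-based share of violent records per group (ignores value).
row_share = toy.groupby("daylight_saving")["violent"].mean()

# Offence-weighted share: violent offences divided by all offences per group.
toy["violent_value"] = toy["violent"] * toy["value"]
totals = toy.groupby("daylight_saving")[["violent_value", "value"]].sum()
weighted_share = totals["violent_value"] / totals["value"]

print(row_share)
print(weighted_share)
```

In this toy frame the row-based shares are equal in both groups while the weighted shares differ, which is why the two measures should not be treated as interchangeable.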

## 6. Hypothesis Testing


Initially, we hypothesized that violent crime rates are higher when daylight saving is not in effect. We therefore formulate the following hypotheses, using a Type I error rate (significance level) of 0.05.

**Null Hypothesis:** The violent crime rate when daylight saving is not in effect is less than or equal to the rate when it is in effect, i.e. their difference is at most 0.

\[H_0 : P_{\text{no daylight}} - P_{\text{daylight}} \leq 0\]

**Alternative Hypothesis:** The violent crime rate when daylight saving is not in effect is higher than the rate when it is in effect, i.e. their difference is greater than 0.

\[H_1 : P_{\text{no daylight}} - P_{\text{daylight}} > 0\]

First, let us restate our earlier findings. Under the null hypothesis, we assume that \(P_{\text{no daylight}}\) and \(P_{\text{daylight}}\) are equal, with both set to the overall violent crime rate regardless of whether daylight saving is in effect.
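
One way to test these hypotheses is a one-sided, pooled two-proportion z-test. The sketch below writes the arithmetic out by hand so each step is visible. The function name and the counts passed to it are hypothetical. The `proportions_ztest` function imported from statsmodels earlier, called with `alternative='larger'`, should perform a comparable test.

```python
import math

def one_sided_two_proportion_ztest(count_a, n_a, count_b, n_b):
    """Pooled z-test of H0: p_a - p_b <= 0 against H1: p_a - p_b > 0."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Under H0 both groups share one rate, estimated by pooling the samples.
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability of a standard normal: P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: group a = no daylight saving, group b = daylight saving.
z_stat, p_val = one_sided_two_proportion_ztest(390, 1000, 1170, 3000)
print(z_stat, p_val)
```

With equal sample proportions, as in these made-up counts, the statistic is 0 and the one-sided p-value is 0.5, so the null hypothesis would not be rejected at the 0.05 level.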