# Violent Crime - Contextual analysis
Before conducting the statistical analysis, it is worth getting an overview of serious violent crime rates, including knife crime rates, across England and Wales. We want to understand both the rates at a point in time (the end of financial year 2019, i.e. March 2020) and their trends over time. This shows the similarities and differences between serious violent crime rates and trends in London and those in the rest of the country.

We consider data up to the end of financial year 2019 because it is the last year before the Covid pandemic, during which lockdowns would have altered crime rates and trends.

In this notebook we analyse the following:
- Knife crime rates and trends across England and Wales by police force area
- Serious violent crime across England and Wales by police force area
    - including a reconciliation between knife crime and serious violent crime in England and Wales
- Serious violent crime in London, by Community Safety Partnership area
    - including a reconciliation between serious violent crime and knife crime in London.
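Both reconciliations amount to joining knife crime totals to serious violent crime totals and expressing one as a share of the other. A minimal sketch, assuming both tables have already been aggregated to hypothetical columns `force_name`, `year` and `total` (the knife crime table in this notebook uses `Force Name`, so it would need renaming first). The helper name is our own and the usage numbers are invented toy values, not results:

```python
import pandas as pd


def knife_share_of_svc(knife: pd.DataFrame, svc: pd.DataFrame) -> pd.DataFrame:
    """Knife crime as a percentage of serious violent crime, per force and year.

    Assumes both inputs have columns 'force_name', 'year' and 'total'.
    """
    merged = knife.merge(
        svc,
        on=["force_name", "year"],
        how="inner",
        suffixes=("_knife", "_svc"),
        validate="one_to_one",  # raise if either table has duplicate force/year rows
    )
    merged["knife_pct_of_svc"] = 100 * merged["total_knife"] / merged["total_svc"]
    return merged


# Toy usage (invented numbers):
# knife = pd.DataFrame({"force_name": ["A"], "year": [2019], "total": [50.0]})
# svc = pd.DataFrame({"force_name": ["A"], "year": [2019], "total": [1000.0]})
# knife_share_of_svc(knife, svc)["knife_pct_of_svc"]  # 5.0
```

The `validate` check is there because a duplicated force/year row on either side would otherwise be silently multiplied by the join.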

The output from this notebook will be a contextual crime analysis for England and Wales, together with a baseline for serious violent crime in London. Later notebooks will reconcile this London baseline with the serious violent crime data produced at the finer-grained LSOA level.

For reference, the analysis conducted by the Greater London Authority (GLA) (04d Appendix B - Data pack 2018.pptx, located at https://data.london.gov.uk/dataset/a-public-health-approach-to-serious-youth-violence) defined serious violent crime as crime in the following categories:
- Violence with injury
- Violence without injury
- Robbery
- Sexual Offences

The GLA also defines youth violent crime as crime in the same categories where the victim is aged between 1 and 24.
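As an illustration of how these two definitions could be applied to record-level data, the sketch below flags GLA-style serious violent crime and youth violence. The column names `category` and `victim_age` and the function name are hypothetical; the Home Office tables used in this notebook are aggregated counts without victim age, so this is not part of our pipeline:

```python
import pandas as pd

# Offence categories in the GLA definition of serious violent crime (as listed above)
GLA_SVC_CATEGORIES = {
    "Violence with injury",
    "Violence without injury",
    "Robbery",
    "Sexual Offences",
}


def flag_gla_serious_violence(df: pd.DataFrame) -> pd.DataFrame:
    """Add boolean columns 'is_svc' and 'is_youth_svc'.

    Assumes hypothetical columns 'category' (offence category) and
    'victim_age' (age in years).
    """
    out = df.copy()
    out["is_svc"] = out["category"].isin(GLA_SVC_CATEGORIES)
    # between() is inclusive at both ends, so ages 1 and 24 both count as youth
    out["is_youth_svc"] = out["is_svc"] & out["victim_age"].between(1, 24)
    return out
```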

## 1. England and Wales - Knife Crime
The link between knife crime and serious violent crime matters because the two terms are often used interchangeably, even though ONS estimates put knife crime at just 6-7% of all serious violent crime (see Appendix A). Knife crime nevertheless attracts significant media and political attention, which shapes people's opinions of serious violent crime as a whole, so understanding knife crime rates is an important part of our contextual analysis.

The ONS (and the Home Office, which provides the ONS with crime data) defines knife crimes as 'selected police recorded offences involving a knife or sharp instrument. Knives or sharp instruments are taken to be involved in an offence if they are used to stab or cut, or as a threat'. The 'selected offences' included in their analysis are:
- Attempted murder
- Threats to kill
- Assault with injury and assault with intent to cause serious harm
- Robbery
- Rape
- Sexual assault
- Homicide

They also list the subcategories that roll up into these high-level offences, which lets us cross-reference the crime codes used by the ONS and Home Office with those used in the GLA analysis. Appendix B below gives a detailed cross-reference between these codes.
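The Home Office tables store these codes as strings such as `34A, 34B`, `1, 4.1, 4.10, 4.2` and `19C-H` (see the knife crime data below). To cross-reference them against the GLA categories it helps to expand each string into individual codes. A minimal sketch with a function name of our own; we assume a letter range like `19C-H` means 19C through 19H inclusive:

```python
import re


def expand_offence_codes(codes) -> list[str]:
    """Expand a Home Office 'Offence codes' value into individual code strings.

    Handles comma-separated lists ("34A, 34B") and letter ranges ("19C-H"),
    which are assumed to mean 19C, 19D, ..., 19H. Numbers are converted to str.
    """
    expanded = []
    for part in str(codes).split(","):
        part = part.strip()
        match = re.fullmatch(r"(\d+)([A-Z])-([A-Z])", part)
        if match:
            number, first, last = match.groups()
            expanded.extend(f"{number}{chr(c)}" for c in range(ord(first), ord(last) + 1))
        elif part:
            expanded.append(part)
    return expanded


# expand_offence_codes("19C-H") -> ['19C', '19D', '19E', '19F', '19G', '19H']
```

Keeping the codes as strings matters: read as numbers, `4.10` would collapse to `4.1` and become indistinguishable from the separate code 4.1.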

To analyse knife crime we use the following sources:
- official government (Home Office) statistics from https://www.gov.uk/government/statistics/police-recorded-crime-open-data-tables, provided in the dataset 'Offences involving knives or sharp instruments open data year ending March 2009 onwards'
- a second source of knife crime data, used purely to obtain population figures by police force area so that we can pro-rate crime by head of population

## 2. England and Wales - Serious Violent Crime
Having analysed the knife crime data, we will then repeat the process but for all serious violent crimes. This is because the GLA analysis we aim to verify assesses serious violent crime as a whole rather than purely knife crime.

For serious violent crime across England and Wales we also use Home Office data, available at https://www.gov.uk/government/statistics/police-recorded-crime-open-data-tables. The specific file is:
- Police recorded crime open data Police Force Area tables from year ending March 2013 onwards


## 3. London - Serious Violent Crime
The Home Office also provides crime data at Community Safety Partnership (borough) level, and the final part of our contextual analysis reviews London borough crime rates using data from https://www.gov.uk/government/statistics/police-recorded-crime-open-data-tables, contained in 'Police recorded crime Community Safety Partnership open data tables, from year ending March 2016 to year ending March 2020'.

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import altair as alt
import seaborn as sns
import statsmodels.api as sm
import scipy.stats as stats
import numpy as np

## 1. England and Wales - Knife Crime

### Get population data by police force areas
We first load the parliamentary briefing data (sourced from ONS), which gives knife crime totals and rates per 100,000 population by police force and year. From these we can derive each police force area's population, which we need to convert the Home Office knife crime counts into rates per 100,000 population.
- In the ONS/Parliament data the years run from April to March, so the 2018 data covers April 2018 to March 2019 and the 2019 data covers April 2019 to March 2020. This matches the financial years used in the Home Office data.

In [2]:
numeric_column_names = ['2012_total', '2012_per_100K', '2013_total', '2013_per_100K',
                        '2014_total', '2014_per_100K', '2015_total', '2015_per_100K',
                        '2016_total', '2016_per_100K', '2017_total', '2017_per_100K',
                        '2018_total', '2018_per_100K', '2019_total', '2019_per_100K']

all_column_names = ['PoliceForce'] + numeric_column_names

# raw string so the Windows backslashes are not parsed as escape sequences
parl_knife_crime = pd.read_excel(r".\DataSources\England and Wales Crime Data\knifecrime_parliamentary_briefing.xlsx",
                                 sheet_name='KnifeCrime', skiprows=6,
                                 usecols="C,Q,R,T,U,W,X,Z,AA,AC,AD,AF,AG,AI,AJ,AL,AM",
                                 names=all_column_names)

print("Shape before removing NaNs: " + str(parl_knife_crime.shape))
parl_knife_crime = parl_knife_crime.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(parl_knife_crime.shape))

# now convert all data columns to numeric; values that cannot be parsed become NaN
parl_knife_crime[numeric_column_names] = parl_knife_crime[numeric_column_names].apply(pd.to_numeric, errors='coerce')

# Now check for NaN values

nan_values_ = parl_knife_crime[parl_knife_crime.isna().any(axis=1)]
print(nan_values_.shape)
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (99, 17)
Shape after removing NaNs: (68, 17)
(18, 17)

fields with nan_values



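| | PoliceForce | 2012_total | 2012_per_100K | 2013_total | 2013_per_100K | 2014_total | 2014_per_100K | 2015_total | 2015_per_100K | 2016_total | 2016_per_100K | 2017_total | 2017_per_100K | 2018_total | 2018_per_100K | 2019_total | 2019_per_100K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Greater Manchester | 1587.0 | 58.070182 | 1634.0 | 59.789967 | 1757.0 | 64.29068 | 1791.0 | 65.53478 | 1655.0 | 60.558381 | 1953.0 | 69.779906 | 3620.0 | 129.341146 | NaN | NaN |
| 38 | City of London | 9.0 | NaN | 19.0 | NaN | 8.0 | NaN | 13.0 | NaN | 17.0 | NaN | 26.0 | NaN | 60.0 | NaN | 31.0 | NaN |
| 40 | London | 11373.0 | NaN | 10083.0 | NaN | 9688.0 | 113.352737 | 9751.0 | 114.089858 | 12078.0 | 141.316511 | 14721.0 | 167.847295 | 14902.0 | 168.86119 | 15928.0 | 178.630522 |
| 62 | British Transport Police | 93.0 | NaN | 72.0 | NaN | 86.0 | NaN | 94.0 | NaN | 56.0 | NaN | 97.0 | NaN | 200.0 | NaN | 225.0 | NaN |
| 67 | Notes: | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 68 | 1. Police recorded crime data are not designat... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 69 | 2. Police recorded knife and sharp instrument ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 70 | 3. In this table 'offences involving a knife' ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 71 | 4. Greater Manchester Police reviewed their re... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 72 | 5. Data from Bedfordshire, Cambridgeshire and ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |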


### Comment
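The explanatory notes at the bottom of the sheet have no rates, so we strip them out by removing rows where 2012_per_100K is NaN. As the table above shows, this also removes the City of London, London and British Transport Police rows. Regional subtotal rows such as North East Region do have rates and so remain, but they do not match any of the force names used in the later merges.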
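To see exactly what this filter discards before committing to it, the rows can be split rather than dropped and the removed `PoliceForce` labels inspected. A small sketch with a hypothetical helper name; the column names match those defined above:

```python
import pandas as pd


def split_on_missing_rate(df: pd.DataFrame, rate_col: str = "2012_per_100K"):
    """Return (kept, dropped): rows with and without a value in `rate_col`."""
    mask = df[rate_col].notna()
    return df[mask].copy(), df[~mask].copy()


# Usage against the notebook's data:
# kept, dropped = split_on_missing_rate(parl_knife_crime)
# print(dropped['PoliceForce'].tolist())
```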

In [3]:
print("Shape before removing NaNs: " + str(parl_knife_crime.shape))
parl_knife_crime = parl_knife_crime[parl_knife_crime['2012_per_100K'].notna()].copy()
print("Shape after removing NaNs: " + str(parl_knife_crime.shape))

datatypes = parl_knife_crime.dtypes
print(datatypes)

Shape before removing NaNs: (68, 17)
Shape after removing NaNs: (51, 17)
PoliceForce       object
2012_total       float64
2012_per_100K    float64
2013_total       float64
2013_per_100K    float64
2014_total       float64
2014_per_100K    float64
2015_total       float64
2015_per_100K    float64
2016_total       float64
2016_per_100K    float64
2017_total       float64
2017_per_100K    float64
2018_total       float64
2018_per_100K    float64
2019_total       float64
2019_per_100K    float64
dtype: object


### Derive the population by police force by year
We will use this later to pro-rate the Home Office knife crime data.
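Because the parliamentary table gives both a count and a rate per 100,000, the population follows from rearranging the rate formula: rate = 100,000 * total / population, so population = 100,000 * total / rate. The sketch below (hypothetical helper name) applies this to a single pair of values and returns NaN for a missing or zero rate; Durham's 2012 figures (142 offences at 22.789279 per 100K) reproduce the derived population of about 623,100 shown in the table below:

```python
import math


def population_from_rate(total: float, rate_per_100k: float) -> float:
    """Population implied by a count and its rate per 100,000.

    Returns NaN when the rate is missing or zero, since no population can be derived.
    """
    if rate_per_100k is None or math.isnan(rate_per_100k) or rate_per_100k == 0:
        return float("nan")
    return 100_000 * total / rate_per_100k


# population_from_rate(142.0, 22.789279) is approximately 623100
```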

In [4]:
parl_knife_crime_copy = parl_knife_crime.copy()
# population = 100,000 * total / rate per 100K, for each year 2012-2019
for year in range(2012, 2020):
    parl_knife_crime_copy[f'{year}_population'] = (100000 * parl_knife_crime_copy[f'{year}_total']
                                                   / parl_knife_crime_copy[f'{year}_per_100K'])
parl_knife_crime_copy.head()

| | PoliceForce | 2012_total | 2012_per_100K | 2013_total | 2013_per_100K | 2014_total | 2014_per_100K | 2015_total | 2015_per_100K | 2016_total | ... | 2019_total | 2019_per_100K | 2012_population | 2013_population | 2014_population | 2015_population | 2016_population | 2017_population | 2018_population | 2019_population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Durham | 142.0 | 22.789279 | 131.0 | 21.023913 | 119.0 | 19.098058 | 163.0 | 26.159525 | 175.0 | ... | 165.0 | 26.043886 | 623100.0 | 623100.0 | 623100.0 | 623100.0 | 623100.0 | 630000.0 | 630000.0 | 633546.0 |
| 1 | Northumberland | 313.0 | 21.816408 | 347.0 | 24.186241 | 427.0 | 29.76232 | 526.0 | 36.662717 | 461.0 | ... | 795.0 | 54.577471 | 1434700.0 | 1434700.0 | 1434700.0 | 1434700.0 | 1434700.0 | 1448500.0 | 1448600.0 | 1456645.0 |
| 2 | North East Region | 650.0 | 24.821476 | 678.0 | 25.890709 | 830.0 | 31.695116 | 1035.0 | 39.523428 | 1003.0 | ... | 1481.0 | 55.720493 | 2618700.0 | 2618700.0 | 2618700.0 | 2618700.0 | 2618700.0 | 2644600.0 | 2644600.0 | 2657909.0 |
| 4 | Cheshire | 204.0 | 19.630485 | 242.0 | 23.287144 | 212.0 | 20.400308 | 247.0 | 23.768283 | 275.0 | ... | 401.0 | 37.856224 | 1039200.0 | 1039200.0 | 1039200.0 | 1039200.0 | 1039200.0 | 1054100.0 | 1054100.0 | 1059271.0 |
| 5 | Cumbria | 100.0 | 20.084354 | 104.0 | 20.887728 | 99.0 | 19.883511 | 120.0 | 24.101225 | 120.0 | ... | 203.0 | 40.690496 | 497900.0 | 497900.0 | 497900.0 | 497900.0 | 497900.0 | 498400.0 | 498400.0 | 498888.0 |


In [5]:
police_force_population = pd.melt(parl_knife_crime_copy, id_vars=['PoliceForce'], 
                        value_vars=['2012_population', 
                                    '2013_population',
                                    '2014_population',
                                    '2015_population',
                                    '2016_population',
                                    '2017_population',
                                    '2018_population',
                                    '2019_population'], var_name = 'year', value_name='population')

police_force_population['year'] = police_force_population['year'].str[:4].astype(int)
police_force_population.head()

| | PoliceForce | year | population |
|---|---|---|---|
| 0 | Durham | 2012 | 623100.0 |
| 1 | Northumberland | 2012 | 1434700.0 |
| 2 | North East Region | 2012 | 2618700.0 |
| 3 | Cheshire | 2012 | 1039200.0 |
| 4 | Cumbria | 2012 | 497900.0 |


### Home Office Knife Crime Data
Now we can load the Home Office Knife Crime data and then merge it with the parliamentary data to create a single dataset, joined on PoliceForce and year.

Data for 12 forces (Derbyshire, Dyfed-Powys, Greater Manchester, Leicestershire, Lincolnshire, Merseyside, Metropolitan, Northamptonshire, South Wales, South Yorkshire, West Midlands and West Yorkshire Police) are based on a new methodology, the National Data Quality Improvement Service (NDQIS), for identifying whether an offence involved a knife or sharp instrument. These forces also supplied data on revised coverage and guidance for the collection. Because data for these 12 forces are not comparable with those for the non-NDQIS forces, we restrict our comparison to these 12 forces. They are, however, the most relevant forces for comparison with the Metropolitan Police, so this is suitable for our purposes.
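Since the comparison rests on exactly these 12 forces, a missing or differently spelled force name (for example `Metropolitan Police` in one file and `Metropolitan` in another) would silently drop a force from every later table. A small guard with a hypothetical helper name that returns any expected names absent from the data; the notebook currently checks this by eye from the printed unique force names:

```python
import pandas as pd


def missing_forces(df: pd.DataFrame, forces: list[str], col: str = "Force Name") -> list[str]:
    """Sorted list of expected force names that do not appear in df[col]."""
    return sorted(set(forces) - set(df[col].dropna().unique()))


# Usage against the notebook's data:
# absent = missing_forces(knife_crime, required_forces)
# if absent:
#     raise ValueError(f"Forces not found: {absent}")
```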

In [6]:
knife_crime_all = pd.read_excel(r".\DataSources\England and Wales Crime Data\knifecrime_2012-onwards.xlsx", sheet_name='2012_2020')

print("Shape before removing NaNs: " + str(knife_crime_all.shape))
knife_crime_all = knife_crime_all.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(knife_crime_all.shape))

# Now check for NaN values
nan_values = knife_crime_all[knife_crime_all.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values

Shape before removing NaNs: (10780, 6)
Shape after removing NaNs: (10780, 6)

fields with nan_values



| | Financial Year | Financial Quarter | Force Name | Offence codes | Offence description | Force Offences |
|---|---|---|---|---|---|---|
| 7423 | 2018/19 | Q1 | Bedfordshire | 19C-H | Rape | NaN |
| 7425 | 2018/19 | Q1 | Bedfordshire | 17, 20 | Sexual assault | NaN |
| 10573 | 2020/21 | Q1 | Staffordshire | 19C-H | Rape | NaN |


In [7]:
# the NaNs found above (Bedfordshire, Staffordshire) are outside the forces we analyse, so they can be ignored

required_forces = ['Metropolitan Police', 'West Midlands', 'Greater Manchester', 'Merseyside', 'Derbyshire', 
                   'Dyfed-Powys', 'Leicestershire', 'Lincolnshire', 'Northamptonshire', 'South Wales', 
                   'South Yorkshire', 'West Yorkshire']
knife_crime = knife_crime_all[knife_crime_all['Force Name'].isin(required_forces)].copy().reset_index()

print(knife_crime['Force Name'].unique())
print(knife_crime.shape)
knife_crime.head()

['Derbyshire' 'Dyfed-Powys' 'Greater Manchester' 'Leicestershire'
 'Lincolnshire' 'Merseyside' 'Metropolitan Police' 'Northamptonshire'
 'South Wales' 'South Yorkshire' 'West Midlands' 'West Yorkshire']
(2940, 7)


| | index | Financial Year | Financial Quarter | Force Name | Offence codes | Offence description | Force Offences |
|---|---|---|---|---|---|---|---|
| 0 | 224 | 2012/13 | Q1 | Derbyshire | 2 | Attempted murder | 0.0 |
| 1 | 225 | 2012/13 | Q1 | Derbyshire | 1, 4.1, 4.10, 4.2 | Homicide | 0.0 |
| 2 | 226 | 2012/13 | Q1 | Derbyshire | 17, 20 | Sexual assault | 0.0 |
| 3 | 227 | 2012/13 | Q1 | Derbyshire | 19C-H | Rape | 0.0 |
| 4 | 228 | 2012/13 | Q1 | Derbyshire | 34A, 34B | Robbery | 22.0 |


### Data transformation

#### Dates
The data is presented by financial year and quarter. Home Office (and ONS) financial years start in April, so Q1 runs from April to June, Q2 from July to September, Q3 from October to December and Q4 from January to March. We aggregate the quarters into annual totals, labelling each financial year by the calendar year in which it starts (so 2012/13 becomes 2012).
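If a quarterly time series is wanted later, the quarter mapping described above can be expressed as a small helper returning the first calendar day of each financial quarter. A sketch with a hypothetical function name; it accepts both the `Q1` style used in the knife crime tables and the integer quarters used in the police force area tables:

```python
from datetime import date


def financial_quarter_start(financial_year: str, quarter) -> date:
    """First calendar day of a Home Office financial quarter.

    financial_year: a label such as "2018/19" (the year starts in April).
    quarter: "Q1".."Q4" or 1..4.
    Q1 = Apr-Jun, Q2 = Jul-Sep, Q3 = Oct-Dec, Q4 = Jan-Mar of the following year.
    """
    start_year = int(financial_year[:4])
    q = int(str(quarter).upper().lstrip("Q"))
    if q not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter!r}")
    month = 4 + 3 * (q - 1)             # 4, 7, 10 or 13
    year = start_year + (month - 1) // 12
    month = (month - 1) % 12 + 1        # wrap month 13 round to January
    return date(year, month, 1)


# financial_quarter_start("2018/19", "Q4") -> date(2019, 1, 1)
```

pandas also offers fiscal quarterly periods (`freq="Q-MAR"`), but those label a financial year by the year in which it ends, the opposite of the start-year convention used in this notebook.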

#### Crimes
We group all offence types together into a single knife crime total per force and year.

In [8]:
knife_crime['year'] = knife_crime['Financial Year'].str[:4].astype(int)

print(knife_crime.shape)
knife_crime.head()

(2940, 8)


| | index | Financial Year | Financial Quarter | Force Name | Offence codes | Offence description | Force Offences | year |
|---|---|---|---|---|---|---|---|---|
| 0 | 224 | 2012/13 | Q1 | Derbyshire | 2 | Attempted murder | 0.0 | 2012 |
| 1 | 225 | 2012/13 | Q1 | Derbyshire | 1, 4.1, 4.10, 4.2 | Homicide | 0.0 | 2012 |
| 2 | 226 | 2012/13 | Q1 | Derbyshire | 17, 20 | Sexual assault | 0.0 | 2012 |
| 3 | 227 | 2012/13 | Q1 | Derbyshire | 19C-H | Rape | 0.0 | 2012 |
| 4 | 228 | 2012/13 | Q1 | Derbyshire | 34A, 34B | Robbery | 22.0 | 2012 |


In [9]:
# total offences (all knife crime offence types) per force and year
knife_crime_totals = (knife_crime.groupby(["Force Name", "year"])['Force Offences']
                      .sum()
                      .reset_index(name='total'))
print(knife_crime_totals.shape)
knife_crime_totals.head()

(108, 3)


| | Force Name | year | total |
|---|---|---|---|
| 0 | Derbyshire | 2012 | 287.0 |
| 1 | Derbyshire | 2013 | 344.0 |
| 2 | Derbyshire | 2014 | 297.0 |
| 3 | Derbyshire | 2015 | 347.0 |
| 4 | Derbyshire | 2016 | 395.0 |


### Now merge with population data

In [10]:
knife_crime_merged = pd.merge(knife_crime_totals, police_force_population,  how='left', left_on=['Force Name','year'], right_on = ['PoliceForce','year'])
print(knife_crime_merged.shape)
knife_crime_merged.head()

(108, 5)


| | Force Name | year | total | PoliceForce | population |
|---|---|---|---|---|---|
| 0 | Derbyshire | 2012 | 287.0 | Derbyshire | 1032300.0 |
| 1 | Derbyshire | 2013 | 344.0 | Derbyshire | 1032300.0 |
| 2 | Derbyshire | 2014 | 297.0 | Derbyshire | 1032300.0 |
| 3 | Derbyshire | 2015 | 347.0 | Derbyshire | 1032300.0 |
| 4 | Derbyshire | 2016 | 395.0 | Derbyshire | 1032300.0 |


### Comments on merged dataset
The 2020 data (financial year 2020/21) is not representative because of the pandemic, and the parliamentary data has no 2020 population figure, so we drop it. The parliamentary data also has no 2019 rate for Greater Manchester, so no 2019 population can be derived for it; instead we set Greater Manchester's population for every year to 2,629,400, its 2011 population, taken from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/116249/population-police-force.csv/preview
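Both gaps (financial year 2020, and Greater Manchester in 2019) show up as missing populations after the left join. A sketch of a merge helper, with a hypothetical name, that lists such force/year pairs explicitly rather than relying on spotting NaNs in the output; `validate="many_to_one"` also makes pandas raise if the population table contains duplicate force/year rows:

```python
import pandas as pd


def merge_with_population(crime: pd.DataFrame, population: pd.DataFrame):
    """Left-join population onto crime totals.

    Returns (merged, missing), where missing lists the (Force Name, year)
    pairs that ended up without a population value.
    """
    merged = crime.merge(
        population,
        how="left",
        left_on=["Force Name", "year"],
        right_on=["PoliceForce", "year"],
        validate="many_to_one",
    )
    no_pop = merged.loc[merged["population"].isna(), ["Force Name", "year"]]
    missing = list(no_pop.itertuples(index=False, name=None))
    return merged, missing


# Usage against the notebook's data:
# knife_crime_merged, missing = merge_with_population(knife_crime_totals, police_force_population)
# print(missing)
```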

In [11]:
knife_crime_clean = knife_crime_merged[knife_crime_merged.year < 2020].copy()
knife_crime_clean.loc[knife_crime_clean['Force Name'] == 'Greater Manchester', "population"] = 2629400
knife_crime_clean['crime_per_100K_homeoffice'] = (100000 * knife_crime_clean.total) / knife_crime_clean.population

print(knife_crime_clean.shape)
knife_crime_clean.head()

(96, 6)


| | Force Name | year | total | PoliceForce | population | crime_per_100K_homeoffice |
|---|---|---|---|---|---|---|
| 0 | Derbyshire | 2012 | 287.0 | Derbyshire | 1032300.0 | 27.801996 |
| 1 | Derbyshire | 2013 | 344.0 | Derbyshire | 1032300.0 | 33.323646 |
| 2 | Derbyshire | 2014 | 297.0 | Derbyshire | 1032300.0 | 28.770706 |
| 3 | Derbyshire | 2015 | 347.0 | Derbyshire | 1032300.0 | 33.614259 |
| 4 | Derbyshire | 2016 | 395.0 | Derbyshire | 1032300.0 | 38.264071 |


In [12]:
# label only the 2019 point of each line, so the force name is drawn at the end of its line in the chart
knife_crime_rates = knife_crime_clean.copy()
knife_crime_rates['label'] = ""
knife_crime_rates.loc[knife_crime_rates.year == 2019, 'label'] = knife_crime_rates['Force Name']

knife_crime_rates.head()

| | Force Name | year | total | PoliceForce | population | crime_per_100K_homeoffice | label |
|---|---|---|---|---|---|---|---|
| 0 | Derbyshire | 2012 | 287.0 | Derbyshire | 1032300.0 | 27.801996 | |
| 1 | Derbyshire | 2013 | 344.0 | Derbyshire | 1032300.0 | 33.323646 | |
| 2 | Derbyshire | 2014 | 297.0 | Derbyshire | 1032300.0 | 28.770706 | |
| 3 | Derbyshire | 2015 | 347.0 | Derbyshire | 1032300.0 | 33.614259 | |
| 4 | Derbyshire | 2016 | 395.0 | Derbyshire | 1032300.0 | 38.264071 | |


In [13]:
dom = required_forces
# highlight the first three forces (Metropolitan Police, West Midlands, Greater Manchester) and grey out the other nine
rng = ["rgb(0,0,255)", "rgb(0,255,0)", "rgb(255,0,0)"] + ["rgb(220,220,220)"] * 9

line = alt.Chart(knife_crime_rates, title='Knife crimes/100,000 by Police Force (Home Office): 2012 to 2019').mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(title=None, ticks=False, labelAngle=0)),
    y=alt.Y('crime_per_100K_homeoffice:Q', axis=alt.Axis(title='knife crimes per 100,000 (annualised)', ticks=False, values=[25, 50, 75, 100, 125, 150, 175])),
    color=alt.Color('Force Name:N', scale=alt.Scale(domain=dom, range=rng), legend=None)
)

text = line.mark_text(
    align='left',
    baseline='middle',
    dx=10  # Nudges text to the right so it doesn't sit on top of the line
).encode(
    text='label:N'
)

(line + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0
).properties(
    width=600,
    height=400)

### Percentage increase between 2017 and 2019
These years were chosen to capture the change in knife crime between the election of the Mayor of London and the start of the Covid pandemic. We start 6 months after his election, because it is not reasonable to expect his involvement to have had any effect in his first 6 months.
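The measure used is the relative change in each force's rate, 100 * (rate_2019 - rate_2017) / rate_2017. A minimal sketch with a hypothetical helper name, checked against the Metropolitan Police (156.441158 to 165.011588, about +5.48%) and South Yorkshire (127.170949 to 115.473606, about -9.20%) rows of the table below:

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from `start` to `end`: 100 * (end - start) / start."""
    if start == 0:
        raise ZeroDivisionError("percentage change is undefined when start is 0")
    return 100 * (end - start) / start


# pct_change(156.441158, 165.011588) is approximately 5.478
```

Because this is a ratio of two rates for the same force, the result does not depend on whether the rates are per 100,000 or per 1,000. Note that the later serious violent crime cell computes the same quantity without the factor of 100, i.e. as a fraction.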

In [14]:
print(knife_crime_rates.shape)
knife_crime_rates.head()

(96, 7)


| | Force Name | year | total | PoliceForce | population | crime_per_100K_homeoffice | label |
|---|---|---|---|---|---|---|---|
| 0 | Derbyshire | 2012 | 287.0 | Derbyshire | 1032300.0 | 27.801996 | |
| 1 | Derbyshire | 2013 | 344.0 | Derbyshire | 1032300.0 | 33.323646 | |
| 2 | Derbyshire | 2014 | 297.0 | Derbyshire | 1032300.0 | 28.770706 | |
| 3 | Derbyshire | 2015 | 347.0 | Derbyshire | 1032300.0 | 33.614259 | |
| 4 | Derbyshire | 2016 | 395.0 | Derbyshire | 1032300.0 | 38.264071 | |


In [15]:
knife_crime_rates_2017_2019 = knife_crime_rates[(knife_crime_rates.year == 2017) | 
                                               (knife_crime_rates.year == 2019)].copy()

knife_crime_home_wide = knife_crime_rates_2017_2019.pivot_table(index=['PoliceForce'], 
                                    columns='year', values='crime_per_100K_homeoffice').reset_index()

knife_crime_home_wide['pct_change'] = 100 * (knife_crime_home_wide[2019] - knife_crime_home_wide[2017]) / knife_crime_home_wide[2017]

knife_crime_home_wide

| year | PoliceForce | 2017 | 2019 | pct_change |
|---|---|---|---|---|
| 0 | Derbyshire | 46.885595 | 81.077284 | 72.925785 |
| 1 | Dyfed-Powys | 24.9613 | 34.551849 | 38.421671 |
| 2 | Greater Manchester | 74.2755 | 121.472579 | 63.543267 |
| 3 | Leicestershire | 65.731167 | 78.029022 | 18.709321 |
| 4 | Lincolnshire | 38.205538 | 56.097048 | 46.829625 |
| 5 | Merseyside | 66.91135 | 99.78462 | 49.129588 |
| 6 | Metropolitan Police | 156.441158 | 165.011588 | 5.478373 |
| 7 | Northamptonshire | 68.132758 | 95.90408 | 40.760602 |
| 8 | South Wales | 43.337108 | 60.55711 | 39.735005 |
| 9 | South Yorkshire | 127.170949 | 115.473606 | -9.198125 |


In [16]:
rng = ["rgb(0,0,255)", "rgb(0,255,0)", "rgb(255,0,0)", "rgb(220,220,220)", "rgb(220,220,220)", 
      "rgb(220,220,220)", "rgb(220,220,220)", "rgb(220,220,220)", "rgb(220,220,220)", "rgb(220,220,220)", 
      "rgb(220,220,220)", "rgb(220,220,220)"]

bars = alt.Chart(knife_crime_home_wide[['PoliceForce', 'pct_change']], title='Percent change in knife crime between 2017 and 2019 (Home Office)').mark_bar().encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False, values=[0, 40, 80])),
    y=alt.Y('PoliceForce:N', axis=alt.Axis(title=None, ticks=False), sort='-x'),  
    color=alt.Color('PoliceForce:N', scale=alt.Scale(domain=dom, range=rng), legend=None)
)

text = alt.Chart(knife_crime_home_wide[['PoliceForce', 'pct_change']]).mark_text(
    align='center',
    baseline='middle',
    color='black',
    dx=20  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False)),
    y=alt.Y('PoliceForce:N', axis=alt.Axis(title=None, ticks=False), sort='-x'),
    text=alt.Text('pct_change:Q', format=',.3r')
)

(bars + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0).properties(height=400, width=600)

### Knife Crime Commentary
These plots show that, at the end of 2019, West Midlands had the highest knife crime rate per 100K of the 12 forces compared, and London (the Metropolitan Police) had the second highest.

When looking at trends between 2012 and 2019, note that crime recording procedures changed around 2016, and much of the increase between 2012 and 2016 reflects better crime recording rather than more crime. More interestingly for our later analysis, knife crime in London rose far less between 2017 and 2019 than in most other forces (5.5% for the Metropolitan Police, against, for example, 73% in Derbyshire and 64% in Greater Manchester).

## 2. Serious Violent Crime - England and Wales
We now repeat the analysis for the same offence categories, but this time counting all serious violent crime rather than only offences involving a knife. We use the offence categories identified previously to extract the crime types of interest, and we do not load data from before 2017 because crime recording practices changed in 2016.
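The three yearly sheets are loaded below with three near-identical cells. As an alternative sketch, `pd.read_excel` accepts a list of sheet names and then returns a dict of DataFrames keyed by sheet name, which can be stacked with a year column in one step. The helper names are our own; the sheet names are the ones used in the cells below:

```python
import pandas as pd

# Sheet names used in this notebook's police force area workbook
SVC_SHEETS = ["2017-18", "2018-19", "2019-20"]


def sheet_to_year(sheet_name: str) -> int:
    """'2017-18' -> 2017, matching the start-year convention used elsewhere."""
    return int(sheet_name[:4])


def stack_sheets(sheets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Drop all-empty rows from each sheet, tag rows with the start year, and concatenate."""
    frames = [df.dropna(how="all").assign(year=sheet_to_year(name))
              for name, df in sheets.items()]
    return pd.concat(frames, ignore_index=True)


# Usage (needs the workbook locally):
# sheets = pd.read_excel(path_to_workbook, sheet_name=SVC_SHEETS)
# pfa_all = stack_sheets(sheets)
```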

### First specify the offence types based on those used within Knife Crime (see Appendix B for further detail)
It is important to note that the categories included in the Home Office Knife Crime statistics do not include 'Violence without Injury' whereas the GLA analysis does include these crimes. We have decided to use the Home Office definition because this increases our ability to reconcile with official Home Office data, and helps us maintain a level of independence from the GLA analysis we are trying to verify.
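Because rows are selected by exact match on `Offence Description`, any description in the offence type lists defined in this section that is spelled differently in the Home Office file would silently contribute nothing. In the extract shown later, offences with no occurrences still appear as rows with a count of 0, so an unmatched description more likely signals a naming mismatch than a genuinely absent crime, although we have not verified this for every year. A sketch with a hypothetical helper name that filters and reports unmatched descriptions:

```python
import pandas as pd


def filter_offences(df: pd.DataFrame, offence_types: list[str],
                    col: str = "Offence Description"):
    """Return (filtered rows, sorted list of requested descriptions never seen in df[col])."""
    unmatched = sorted(set(offence_types) - set(df[col].dropna().unique()))
    if unmatched:
        print(f"Warning: {len(unmatched)} offence type(s) not found: {unmatched}")
    return df[df[col].isin(offence_types)].copy(), unmatched


# Usage against the notebook's data:
# pfa_2019_svc, not_found = filter_offences(pfa_2019, offence_types)
```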

In [17]:
sexual_offence_types = ['Rape of a female aged 16 and over',
               'Rape of a female child under 13',
               'Rape of a female child under 16',
               'Rape of a male aged 16 and over',
               'Rape of a male child under 13',
               'Rape of a male child under 16',
               'Sexual assault on a female aged 13 and over',
               'Sexual assault on a female child under 13',
               'Sexual assault on a male aged 13 and over',
               'Sexual assault on a male child under 13']

violence_robbery_offence_types = ['Assault with injury',
               'Assault with injury on a constable',
               'Assault with intent to cause serious harm',
               'Attempted murder',
               'Racially or religiously aggravated assault with injury',
               'Threats to kill',
               'Corporate manslaughter',
               'Infanticide',
               'Manslaughter',
               'Murder',
               'Robbery of personal property',
               'Robbery of business property',
               ]

offence_types = sexual_offence_types + violence_robbery_offence_types

In [18]:
# pfa = police force area
pfa_2017 = pd.read_excel(r".\DataSources\England and Wales Crime Data\prc-pfa-mar2013-onwards-tables-130521.xlsx", sheet_name='2017-18')

print("Shape before removing NaNs: " + str(pfa_2017.shape))
pfa_2017 = pfa_2017.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(pfa_2017.shape))

# Now check for NaN values
nan_values_ = pfa_2017[pfa_2017.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (23068, 8)
Shape after removing NaNs: (23068, 8)

fields with nan_values



| | Financial Year | Financial Quarter | Force Name | Offence Description | Offence Group | Offence Subgroup | Offence Code | Number of Offences |
|---|---|---|---|---|---|---|---|---|


In [19]:
# pfa = police force area
pfa_2018 = pd.read_excel(r".\DataSources\England and Wales Crime Data\prc-pfa-mar2013-onwards-tables-130521.xlsx", sheet_name='2018-19')

print("Shape before removing NaNs: " + str(pfa_2018.shape))
pfa_2018 = pfa_2018.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(pfa_2018.shape))

# Now check for NaN values
nan_values_ = pfa_2018[pfa_2018.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (23068, 8)
Shape after removing NaNs: (23068, 8)

fields with nan_values



| | Financial Year | Financial Quarter | Force Name | Offence Description | Offence Group | Offence Subgroup | Offence Code | Number of Offences |
|---|---|---|---|---|---|---|---|---|


In [20]:
# pfa = police force area
pfa_2019 = pd.read_excel(r".\DataSources\England and Wales Crime Data\prc-pfa-mar2013-onwards-tables-130521.xlsx", sheet_name='2019-20')

print("Shape before removing NaNs: " + str(pfa_2019.shape))
pfa_2019 = pfa_2019.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(pfa_2019.shape))

# Now check for NaN values
nan_values_ = pfa_2019[pfa_2019.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (23420, 8)
Shape after removing NaNs: (23420, 8)

fields with nan_values



| | Financial Year | Financial Quarter | Force Name | Offence Description | Offence Group | Offence Subgroup | Offence Code | Number of Offences |
|---|---|---|---|---|---|---|---|---|


In [21]:
pfa_2019.head()

| | Financial Year | Financial Quarter | Force Name | Offence Description | Offence Group | Offence Subgroup | Offence Code | Number of Offences |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019/20 | 1 | Action Fraud | Fraud offences recorded by Action Fraud | Fraud offences | Fraud: Action Fraud | AF | 84746.0 |
| 1 | 2019/20 | 1 | Avon & Somerset | Absconding from lawful custody | Miscellaneous crimes | Miscellaneous crimes | 80 | 9.0 |
| 2 | 2019/20 | 1 | Avon & Somerset | Abuse of children through sexual exploitation | Sexual offences | Other sexual offences | 71 | 5.0 |
| 3 | 2019/20 | 1 | Avon & Somerset | Abuse of position of trust of a sexual nature | Sexual offences | Other sexual offences | 73 | 0.0 |
| 4 | 2019/20 | 1 | Avon & Somerset | Aggravated Burglary Business and Community | Theft offences | Non-domestic burglary | 31A | 2.0 |


### Filter to the required forces and serious violence offences, then aggregate by year

In [22]:
def get_aggregate_pol_force_data(df, year):
    # keep only the 12 comparable forces and the serious violence offence types
    pol_row_data = df[(df['Force Name'].isin(required_forces)) &
                      (df['Offence Description'].isin(offence_types))]

    # total offences per force for this financial year
    police_totals = (pol_row_data.groupby('Force Name')['Number of Offences']
                     .sum()
                     .reset_index(name='total')
                     .rename(columns={'Force Name': 'force_name'}))
    police_totals['year'] = year

    return police_totals

pfa_2017_agg = get_aggregate_pol_force_data(pfa_2017, 2017)
pfa_2018_agg = get_aggregate_pol_force_data(pfa_2018, 2018)
pfa_2019_agg = get_aggregate_pol_force_data(pfa_2019, 2019)

In [23]:
police_force_crime_ = pd.concat([pfa_2017_agg, pfa_2018_agg, pfa_2019_agg])

# save a copy at this stage so later runs can reload it and skip the very slow Excel load
police_force_crime_.to_csv(r'.\DataSources\England and Wales Crime Data\police_force_serious_violence.csv', index=False)

### Load police force data directly from the transformed file
We previously saved a transformed file containing only the serious violent crime data, so we can load that instead of the original workbook.
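The save-then-reload pattern used here can be wrapped in a small helper that only builds the table when the cached CSV is absent. A sketch with hypothetical names; note that a CSV round trip does not preserve every dtype (categorical or datetime columns, for example), which is harmless for this table of force names, totals and years:

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_build(cache_path, build: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached CSV if it exists; otherwise build the table, save it and return it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return pd.read_csv(cache_path)
    df = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache_path, index=False)
    return df


# Usage (build function is hypothetical):
# police_force_crime = load_or_build(
#     Path("DataSources") / "England and Wales Crime Data" / "police_force_serious_violence.csv",
#     build=lambda: pd.concat([pfa_2017_agg, pfa_2018_agg, pfa_2019_agg]))
```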

In [24]:
police_force_crime = pd.read_csv(r'.\DataSources\England and Wales Crime Data\police_force_serious_violence.csv')

print(police_force_crime.shape)

police_force_crime.head(13)

(36, 3)


| | force_name | total | year |
|---|---|---|---|
| 0 | Derbyshire | 10142.0 | 2017 |
| 1 | Dyfed-Powys | 4333.0 | 2017 |
| 2 | Greater Manchester | 43636.0 | 2017 |
| 3 | Leicestershire | 10666.0 | 2017 |
| 4 | Lincolnshire | 5852.0 | 2017 |
| 5 | Merseyside | 17878.0 | 2017 |
| 6 | Metropolitan Police | 130103.0 | 2017 |
| 7 | Northamptonshire | 9307.0 | 2017 |
| 8 | South Wales | 14978.0 | 2017 |
| 9 | South Yorkshire | 19212.0 | 2017 |


### Merge in population totals so that crime can be pro-rated by head of population

In [25]:
police_force_crime = pd.merge(police_force_crime, police_force_population,  how='left', left_on=['force_name','year'], right_on = ['PoliceForce','year'])
print(police_force_crime.shape)
police_force_crime.head()

(36, 5)


| | force_name | total | year | PoliceForce | population |
|---|---|---|---|---|---|
| 0 | Derbyshire | 10142.0 | 2017 | Derbyshire | 1032300.0 |
| 1 | Dyfed-Powys | 4333.0 | 2017 | Dyfed-Powys | 516800.0 |
| 2 | Greater Manchester | 43636.0 | 2017 | Greater Manchester | 2798800.0 |
| 3 | Leicestershire | 10666.0 | 2017 | Leicestershire | 1083200.0 |
| 4 | Lincolnshire | 5852.0 | 2017 | Lincolnshire | 751200.0 |


In [85]:
police_force_crime['total_per_1000'] = 1000 * police_force_crime.total / police_force_crime.population
mean_2019 = police_force_crime['total_per_1000'][police_force_crime.year == 2019].mean()
police_force_crime['delta_from_2019_mean'] = 0.0  # float, so the 2019 deltas below can be assigned without a dtype change
police_force_crime.loc[police_force_crime.year == 2019, 'delta_from_2019_mean'] = police_force_crime.total_per_1000 - mean_2019

# I want to add labels for the graphs
police_force_crime['label'] = ""
police_force_crime.loc[police_force_crime.year == 2019, 'label'] = police_force_crime.force_name

police_force_crime.loc[(police_force_crime.year == 2018) & (police_force_crime.force_name == 'Greater Manchester'), 'label'] = 'Greater Manchester (no 2019 data)'


police_force_crime.head(20)

Unnamed: 0,force_name,total,year,PoliceForce,population,total_per_1000,delta_from_2019_mean,label
0,Derbyshire,10142.0,2017,Derbyshire,1032300.0,9.824663,0.0,
1,Dyfed-Powys,4333.0,2017,Dyfed-Powys,516800.0,8.384288,0.0,
2,Greater Manchester,43636.0,2017,Greater Manchester,2798800.0,15.590968,0.0,
3,Leicestershire,10666.0,2017,Leicestershire,1083200.0,9.84675,0.0,
4,Lincolnshire,5852.0,2017,Lincolnshire,751200.0,7.790202,0.0,
5,Merseyside,17878.0,2017,Merseyside,1416800.0,12.618577,0.0,
6,Metropolitan Police,130103.0,2017,Metropolitan Police,8762400.0,14.847873,0.0,
7,Northamptonshire,9307.0,2017,Northamptonshire,741200.0,12.556665,0.0,
8,South Wales,14978.0,2017,South Wales,1324500.0,11.308418,0.0,
9,South Yorkshire,19212.0,2017,South Yorkshire,1393400.0,13.787857,0.0,
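As a quick check on the rate calculation: the per-1,000 figure is simply offences divided by resident population, multiplied by 1,000. A minimal sketch using two 2017 rows copied from the output above; the helper name `add_rate_per_1000` is hypothetical and not part of the notebook.

```python
import pandas as pd

def add_rate_per_1000(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with offences per 1,000 residents (needs 'total' and 'population')."""
    out = df.copy()
    out["total_per_1000"] = 1000 * out["total"] / out["population"]
    return out

# Two rows copied from the Police Force Area output above (2017)
sample = pd.DataFrame({
    "force_name": ["Derbyshire", "Dyfed-Powys"],
    "total": [10142.0, 4333.0],
    "population": [1032300.0, 516800.0],
})

rates = add_rate_per_1000(sample)
print(rates[["force_name", "total_per_1000"]])
```

Returning a copy keeps the input frame unchanged, which makes the check easy to rerun.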


In [86]:
line = alt.Chart(police_force_crime, title='Serious violent crimes/1000, by Police Force Area: 2017 to 2019').mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(title=None, ticks=False, labelAngle=0)),
    y=alt.Y('total_per_1000:Q', axis=alt.Axis(title='Serious violent crimes per 1000 (annualised)', ticks=False, values=[0, 10, 20, 30, 40, 50])),
    color=alt.Color('force_name:N', legend=None)
)

text = line.mark_text(
    align='left',
    baseline='middle',
    dx=10  # Nudges text to the right so it doesn't sit on top of the line
).encode(
    text='label:N'
)

(line + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0
).properties(
    width=600,
    height=1000)

In [60]:
police_force_crime_2017_2019 = police_force_crime[(police_force_crime.year == 2017) | 
                                               (police_force_crime.year == 2019)].copy()

police_force_crime_wide = police_force_crime_2017_2019.pivot_table(index=['force_name'], 
                                    columns='year', values='total_per_1000').reset_index()

police_force_crime_wide['pct_change'] = (police_force_crime_wide[2019] - police_force_crime_wide[2017]) / police_force_crime_wide[2017]

police_force_crime_wide.sort_values(by='pct_change', ascending=False)

year,force_name,2017,2019,pct_change
4,Lincolnshire,7.790202,12.019851,0.542945
0,Derbyshire,9.824663,14.032826,0.428326
10,West Midlands,14.142275,17.077213,0.207529
1,Dyfed-Powys,8.384288,9.97564,0.189802
3,Leicestershire,9.84675,11.461942,0.164033
7,Northamptonshire,12.556665,14.415039,0.147999
5,Merseyside,12.618577,13.5946,0.077348
9,South Yorkshire,13.787857,14.590304,0.0582
11,West Yorkshire,16.566537,17.509592,0.056925
6,Metropolitan Police,14.847873,15.338043,0.033013
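The `pct_change` column is the change in the rate per 1,000 relative to the 2017 rate, so population growth between the two years is already netted out. It is stored as a fraction here, whereas the London CSP cell further down multiplies by 100. A small sketch reproducing two rows of the output above, with an absolute change column added (an illustration only, not a column the notebook uses):

```python
import pandas as pd

# 2017 and 2019 rates per 1,000, copied from the pivoted output above
wide = pd.DataFrame({
    "force_name": ["Lincolnshire", "Metropolitan Police"],
    2017: [7.790202, 14.847873],
    2019: [12.019851, 15.338043],
})

# Relative change in the rate, as a fraction of the 2017 rate (multiply by 100 for %)
wide["pct_change"] = (wide[2019] - wide[2017]) / wide[2017]
# Absolute change in offences per 1,000 residents, for comparison
wide["abs_change"] = wide[2019] - wide[2017]
print(wide.sort_values("pct_change", ascending=False))
```

Forces that start from a low base, such as Lincolnshire, can show large percentage increases for a modest absolute change, so the two columns are worth reading together.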


In [59]:
bars = alt.Chart(police_force_crime_wide[['force_name', 'pct_change']], title='Percent change in serious violence between 2017 and 2019 (Home Office)').mark_bar(opacity=0.6, color='firebrick').encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False, values=[0, 0.25, 0.5, 1.0], format=',.0%')),
    y=alt.Y('force_name:N', axis=alt.Axis(title=None, ticks=False), sort='-x'),
    color=alt.Color('force_name:N', scale=alt.Scale(domain=dom, range=rng), legend=None)
)

text = alt.Chart(police_force_crime_wide[['force_name', 'pct_change']]).mark_text(
    align='center',
    baseline='middle',
    color='black',
    dx=-20  # Nudges text to the left so it sits inside the end of the bar
).encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False)),
    y=alt.Y('force_name:N', axis=alt.Axis(title=None, ticks=False), sort='-x'),
    text=alt.Text('pct_change:Q', format=',.1%')
)

(bars + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0).properties(height=300, width=600)

### Now deduce knife crime as a proportion of all serious violent crime
We want to check whether it is approximately 6-7%, in line with the ONS estimates (see Appendix A).

In [62]:
knife_by_force = knife_crime_rates[['Force Name', 'total']][knife_crime_rates.year == 2019]
serious_by_force = police_force_crime[['force_name', 'total']][police_force_crime.year == 2019]

merged_by_force = pd.merge(knife_by_force, serious_by_force,  how='left', left_on='Force Name', right_on = 'force_name')
merged_by_force['knife_vs_serious'] = merged_by_force.total_x / merged_by_force.total_y
merged_by_force.sort_values(by='knife_vs_serious', ascending=True)

Unnamed: 0,Force Name,total_x,force_name,total_y,knife_vs_serious
1,Dyfed-Powys,179.0,Dyfed-Powys,5168.0,0.034636
4,Lincolnshire,424.0,Lincolnshire,9085.0,0.04667
8,South Wales,806.0,South Wales,15538.0,0.051873
0,Derbyshire,854.0,Derbyshire,14781.0,0.057777
11,West Yorkshire,2529.0,West Yorkshire,40626.0,0.062251
7,Northamptonshire,717.0,Northamptonshire,10777.0,0.066531
3,Leicestershire,853.0,Leicestershire,12530.0,0.068077
5,Merseyside,1420.0,Merseyside,19346.0,0.0734
2,Greater Manchester,3194.0,Greater Manchester,40471.0,0.078921
9,South Yorkshire,1620.0,South Yorkshire,20469.0,0.079144


In [64]:
print("Proportion of knife crimes versus all serious violent crimes across top 12 police forces = {:.2%}\n".format(merged_by_force.total_x.sum() / merged_by_force.total_y.sum()))
    

Proportion of knife crimes versus all serious violent crimes across top 12 police forces = 8.61%
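The 8.61% above is a ratio of sums (total knife crime over total serious violent crime), so larger forces weigh more heavily than they would in a simple average of the per-force shares. A small sketch of the difference, using two forces copied from the merged output above:

```python
import pandas as pd

# Knife crime (total_x) and serious violent crime (total_y) in 2019, from the output above
forces = pd.DataFrame({
    "force_name": ["Dyfed-Powys", "Greater Manchester"],
    "knife": [179.0, 3194.0],
    "serious": [5168.0, 40471.0],
})
forces["share"] = forces["knife"] / forces["serious"]

pooled = forces["knife"].sum() / forces["serious"].sum()  # ratio of sums, as in the cell above
unweighted = forces["share"].mean()                        # plain mean of per-force shares
print(f"pooled = {pooled:.2%}, unweighted mean = {unweighted:.2%}")
```

Here the larger force has the higher share, so the pooled figure sits above the unweighted mean.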



## Comments
- We are fairly happy with these results. The ONS estimates that knife crime makes up 6-7% of serious violent crime, while we get between 3.5% and 10% by police force and around 8.6% overall. This is in the right ballpark, particularly as the ONS estimate covers all forces whereas our data covers only the 12 forces that use the new reporting system. We are therefore fairly comfortable that the crime categories we are using are reasonable.

## Conclusions from England and Wales context
London has the 2nd highest (not the highest!) knife crime rate per head of population, and its rate of increase in knife crime between 2017 and 2019 is far lower than that of most of the other top 12 police force areas. For example, London's knife crime rate increased by 5.5%, whereas Merseyside's increased by nearly 50% and others by even more. We see a similar picture for serious violent crime: London has the 4th highest SVC rate but a far lower increase between 2017 and 2019.

These data suggest that London is dealing with knife and serious violent crime better than most other major police forces in England and Wales. This might partly be because London recorded crime more completely in 2017 and hence saw less of an increase than other forces, but even if that were the case, its overall crime rates are still not the highest in England and Wales once pro-rated by population size.
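The 5.5% figure for London is the change in the knife crime rate per head, not in the raw count; because London's population grew over the period, the count rose faster. A short check using the Metropolitan Police figures shown in the knife crime table later in this notebook:

```python
# Metropolitan Police knife crime, from the Home Office table later in this notebook
rate_2017, rate_2019 = 156.441158, 165.011588  # offences per 100,000 residents
count_2017, count_2019 = 13708, 14685          # raw offence counts

rate_change = (rate_2019 - rate_2017) / rate_2017
count_change = (count_2019 - count_2017) / count_2017
print(f"rate change = {rate_change:.1%}, count change = {count_change:.1%}")
```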

## 3. Serious Violent Crime - London
We now analyse serious violent crime in London using the police recorded crime Community Safety Partnership (CSP) open data tables, which cover the year ending March 2016 to the year ending December 2020.

Within this file, we use the crime subcategories described above. These are based on the crime categories used in the Home Office knife crime data (see Appendix B) and were chosen specifically to link the official knife crime data with the categories referred to in the Greater London Authority crime statistics.

### First load the crime data
#### These are very large files and take a while to load, so either load them from scratch by running the cells below:
- The data for each year is kept in a different tab, so load one tab at a time, extract the London CSPs for 2017/18, 2018/19 and 2019/20, and then merge

#### Or, skip these cells and load from a file containing the previously transformed data by jumping to <b> "Load London Crime Data directly from transformed file" </b>

In [32]:
csp_2017 = pd.read_excel("./DataSources/England and Wales Crime Data/crime_2015_2021.xlsx", sheet_name='2017_18')

print("Shape before removing NaNs: " + str(csp_2017.shape))
csp_2017 = csp_2017.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(csp_2017.shape))

# Now check for NaN values
nan_values_ = csp_2017[csp_2017.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_


Shape before removing NaNs: (186282, 9)
Shape after removing NaNs: (186282, 9)

fields with nan_values



Unnamed: 0,Financial Year,Financial Quarter,Force Name,CSP Name,Offence Description,Offence Group,Offence Subgroup,Offence Code,Number of Offences


In [33]:
csp_2018 = pd.read_excel("./DataSources/England and Wales Crime Data/crime_2015_2021.xlsx", sheet_name='2018_19')

print("Shape before removing NaNs: " + str(csp_2018.shape))
csp_2018 = csp_2018.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(csp_2018.shape))

# Now check for NaN values
nan_values_ = csp_2018[csp_2018.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (185496, 9)
Shape after removing NaNs: (185496, 9)

fields with nan_values



Unnamed: 0,Financial Year,Financial Quarter,Force Name,CSP Name,Offence Description,Offence Group,Offence Subgroup,Offence Code,Number of Offences


In [34]:
csp_2019 = pd.read_excel("./DataSources/England and Wales Crime Data/crime_2015_2021.xlsx", sheet_name='2019_20')

print("Shape before removing NaNs: " + str(csp_2019.shape))
csp_2019 = csp_2019.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(csp_2019.shape))

# Now check for NaN values
nan_values_ = csp_2019[csp_2019.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (188195, 9)
Shape after removing NaNs: (188195, 9)

fields with nan_values



Unnamed: 0,Financial Year,Financial Quarter,Force Name,CSP Name,Offence Description,Offence Group,Offence Subgroup,Offence Code,Number of Offences
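The three cells above repeat the same steps for each tab. A sketch of the same clean-up written as a loop, assuming the tabs are read in one call: `pd.read_excel` returns a dict of DataFrames keyed by sheet name when `sheet_name` is a list. `clean_sheets` is a hypothetical helper, and the toy frames stand in for the workbook so the sketch runs on its own.

```python
import pandas as pd

def clean_sheets(sheets: dict) -> dict:
    """Drop fully empty rows from each sheet and report rows that still contain NaNs."""
    cleaned = {}
    for name, df in sheets.items():
        before = df.shape
        df = df.dropna(how="all")  # drops a row only when every column is NaN
        n_partial = int(df.isna().any(axis=1).sum())
        print(f"{name}: {before} -> {df.shape}, rows with some NaN: {n_partial}")
        cleaned[name] = df
    return cleaned

# With the real workbook this would be:
# sheets = pd.read_excel("./DataSources/England and Wales Crime Data/crime_2015_2021.xlsx",
#                        sheet_name=["2017_18", "2018_19", "2019_20"])
# Toy stand-in so the sketch runs without the workbook:
sheets = {
    "2017_18": pd.DataFrame({"CSP Name": ["Barnet", None], "Number of Offences": [5, None]}),
    "2018_19": pd.DataFrame({"CSP Name": ["Brent", "Bexley"], "Number of Offences": [3, None]}),
}
cleaned = clean_sheets(sheets)
```

Reading every tab in one call is convenient but holds all of them in memory at once; if memory is tight, the loop could instead read and filter one tab per iteration.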


In [35]:
# Work on copies so the slow Excel load doesn't have to be repeated if a later step goes wrong
csp_2017_cp = csp_2017.copy()
csp_2018_cp = csp_2018.copy()
csp_2019_cp = csp_2019.copy()

### Filter on London CSPs and specific crime types

Note that we create two output files: one containing all serious violent crimes, and one containing serious violent crimes excluding sexual offences. The second file is needed because the Metropolitan Police LSOA crime data does not include sexual offences, and we want to be able to reconcile the two data sources.
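For each CSP, the all-crimes total minus the total excluding sexual offences is simply the sexual offence count. A toy sketch of that split; the offence labels and counts below are made up for illustration and are not the `offence_types` / `violence_robbery_offence_types` lists the notebook actually uses.

```python
import pandas as pd

# Hypothetical offence labels (the real offence descriptions are more granular)
VIOLENCE_ROBBERY = ["Violence with injury", "Violence without injury", "Robbery"]
SEXUAL = ["Rape", "Other sexual offences"]

# Toy CSP rows in the same column layout as the Home Office tables
rows = pd.DataFrame({
    "CSP Name": ["Barnet", "Barnet", "Barnet", "Brent", "Brent"],
    "Offence Description": ["Robbery", "Rape", "Violence with injury",
                            "Robbery", "Other sexual offences"],
    "Number of Offences": [10, 2, 7, 4, 1],
})

def totals(df, offences):
    """Sum offences per CSP for the given offence descriptions."""
    selected = df[df["Offence Description"].isin(offences)]
    return selected.groupby("CSP Name")["Number of Offences"].sum()

all_svc = totals(rows, VIOLENCE_ROBBERY + SEXUAL)
# reindex so a CSP with only sexual offences would still appear, with 0
not_sex = totals(rows, VIOLENCE_ROBBERY).reindex(all_svc.index, fill_value=0)

# The gap is the sexual offence count, which the Met LSOA data does not contain
sexual_only = all_svc - not_sex
print(sexual_only.to_dict())
```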

In [36]:
london_csp_names = ['Barking and Dagenham',
                    'Barnet',
                    'Bexley',
                    'Brent',
                    'Bromley',
                    'Camden',
                    'Croydon',
                    'Ealing',
                    'Enfield',
                    'Greenwich',
                    'Hackney',
                    'Hammersmith and Fulham',
                    'Haringey',
                    'Harrow',
                    'Havering',
                    'Hillingdon',
                    'Hounslow',
                    'Islington',
                    'Kensington and Chelsea',
                    'Kingston upon Thames',
                    'Lambeth',
                    'Lewisham',
                    'Merton',
                    'Newham',
                    'Redbridge',
                    'Richmond upon Thames',
                    'Southwark',
                    'Sutton',
                    'Tower Hamlets',
                    'Waltham Forest',
                    'Wandsworth',
                    'Westminster']

In [37]:
def get_aggregate_data(df, year):
    """Aggregate one financial year's CSP data for London.

    Returns (all serious violent crime, serious violent crime excluding sexual offences),
    each with one row per CSP and columns csp_name, total, year.
    """
    in_london = df['CSP Name'].isin(london_csp_names)

    def totals_for(offences):
        # Sum the number of offences per London CSP for the given offence descriptions
        totals = (df[in_london & df['Offence Description'].isin(offences)]
                  .groupby('CSP Name', as_index=False)['Number of Offences'].sum()
                  .rename(columns={'CSP Name': 'csp_name', 'Number of Offences': 'total'}))
        totals['year'] = year
        return totals

    return totals_for(offence_types), totals_for(violence_robbery_offence_types)

london_csp_2017, london_csp_2017_notsex = get_aggregate_data(csp_2017_cp, 2017)
london_csp_2018, london_csp_2018_notsex = get_aggregate_data(csp_2018_cp, 2018)
london_csp_2019, london_csp_2019_notsex = get_aggregate_data(csp_2019_cp, 2019)

In [38]:
london_csp = pd.concat([london_csp_2017, london_csp_2018, london_csp_2019])
london_csp_not_sex = pd.concat([london_csp_2017_notsex, london_csp_2018_notsex, london_csp_2019_notsex])

# Save the aggregated data so later runs can reload it from here and skip the slow Excel load
london_csp.to_csv('./DataSources/England and Wales Crime Data/london_csp.csv', index=False)
london_csp_not_sex.to_csv('./DataSources/England and Wales Crime Data/london_csp_notsex.csv', index=False)

# now delete all references to the files to clear memory
del london_csp_2017
del london_csp_2018
del london_csp_2019
del london_csp
del csp_2017_cp
del csp_2018_cp
del csp_2019_cp
del csp_2017
del csp_2018
del csp_2019

## Load London Crime Data directly from transformed file
We previously saved a transformed file containing only the London data, so we can load that instead of the original data.

In [39]:
london_csp = pd.read_csv('./DataSources/England and Wales Crime Data/london_csp.csv')

print(london_csp.shape)

london_csp.head()

(96, 3)


Unnamed: 0,csp_name,total,year
0,Barking and Dagenham,3617,2017
1,Barnet,3575,2017
2,Bexley,2346,2017
3,Brent,5463,2017
4,Bromley,3308,2017


In [40]:
# I want to add labels for the graphs
london_csp['label'] = ""
london_csp.loc[london_csp.year == 2019, 'label'] = london_csp.csp_name

### Get population numbers per Community Safety Partnership
Use the ONS recorded crime data by Community Safety Partnership dataset to get the population of each CSP as at mid-2019:
https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/recordedcrimedatabycommunitysafetypartnershiparea

In [41]:
csp_populations = pd.read_excel("./DataSources/England and Wales Crime Data/CSP_population_2019.xlsx", 
                                 sheet_name='Table C5', skiprows=6, 
                                 usecols = "D, G", 
                                 names=['csp_name', 'population'])

print("Shape before removing NaNs: " + str(csp_populations.shape))
csp_populations = csp_populations.dropna(how='all') # only drops a row when every column is NA
print("Shape after removing NaNs: " + str(csp_populations.shape))

# now convert the population column to numeric (non-numeric entries become NaN)
csp_populations['population'] = pd.to_numeric(csp_populations['population'], errors='coerce')

# Now check for NaN values

nan_values_ = csp_populations[csp_populations.isna().any(axis=1)]
print("\nfields with nan_values\n")
nan_values_

Shape before removing NaNs: (400, 2)
Shape after removing NaNs: (392, 2)

fields with nan_values



Unnamed: 0,csp_name,population
5,Unassigned Avon and Somerset,
6,,675000.0
10,Unassigned Bedfordshire,
11,,855800.0
18,Unassigned Cambridgeshire,
...,...,...
373,,2928600.0
379,Unassigned West Midlands,
382,,2332500.0
387,Unassigned West Yorkshire,


In [42]:
csp_populations = csp_populations.dropna(how='any') # drops a row when any column is NA
print(csp_populations.shape)
csp_populations.head()

(301, 2)


Unnamed: 0,csp_name,population
0,Bath and North East Somerset,193300.0
1,"Bristol, City of",463400.0
2,North Somerset,215100.0
3,Somerset,562200.0
4,South Gloucestershire,285100.0
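The two `dropna` calls behave differently: `how='all'` removes a row only when every column is missing, while `how='any'` removes it when at least one column is missing. That is why the second call removes both the "Unassigned" rows (no population) and the rows that have a population but no CSP name, which appear to be force-level subtotals. A toy sketch mimicking those row types:

```python
import numpy as np
import pandas as pd

# Toy rows mimicking Table C5: a CSP row, an 'Unassigned' row with no population,
# a row with a population but no CSP name, and a fully empty row
pop = pd.DataFrame({
    "csp_name": ["Bath and North East Somerset", "Unassigned Avon and Somerset", np.nan, np.nan],
    "population": [193300.0, np.nan, 675000.0, np.nan],
})

kept_all = pop.dropna(how="all")  # drops only the fully empty row
kept_any = pop.dropna(how="any")  # keeps only rows where every column is populated
print(len(kept_all), len(kept_any))
```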


In [43]:
london_csp = pd.merge(london_csp, csp_populations, how='inner', left_on='csp_name', right_on = 'csp_name')
london_csp

Unnamed: 0,csp_name,total,year,label,population
0,Barking and Dagenham,3617,2017,,212900.0
1,Barking and Dagenham,3607,2018,,212900.0
2,Barking and Dagenham,3897,2019,Barking and Dagenham,212900.0
3,Barnet,3575,2017,,395900.0
4,Barnet,3863,2018,,395900.0
...,...,...,...,...,...
91,Wandsworth,3987,2018,,329700.0
92,Wandsworth,3827,2019,Wandsworth,329700.0
93,Westminster,7097,2017,,261300.0
94,Westminster,8523,2018,,261300.0
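Because this is an inner merge, any CSP whose name is spelt differently in the two sources would disappear without an error; checking that the result still has 96 rows (32 CSPs x 3 years) guards against that. A sketch of a more explicit check using `indicator=True`, with a made-up trailing-space mismatch:

```python
import pandas as pd

# Toy data: the second name carries a trailing space, a typical cross-source mismatch
crime = pd.DataFrame({"csp_name": ["Barnet", "Brent "], "total": [4322, 5093]})
pops = pd.DataFrame({"csp_name": ["Barnet", "Brent"], "population": [395900.0, 329800.0]})

# An inner merge silently drops the mismatched row
inner = pd.merge(crime, pops, how="inner", on="csp_name")

# A left merge with indicator=True adds a '_merge' column flagging rows without a match
check = pd.merge(crime, pops, how="left", on="csp_name", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "csp_name"].tolist()
print(len(inner), unmatched)
```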


In [44]:
london_csp['total_per_1000'] = 1000 * london_csp.total / london_csp.population
mean_2019 = london_csp['total_per_1000'][london_csp.year == 2019].mean()
london_csp['delta_from_2019_mean'] = 0.0  # float column, so the 2019 deltas below fit without a dtype change
london_csp.loc[london_csp.year == 2019, 'delta_from_2019_mean'] = london_csp.total_per_1000 - mean_2019

london_csp.head(20)

Unnamed: 0,csp_name,total,year,label,population,total_per_1000,delta_from_2019_mean
0,Barking and Dagenham,3617,2017,,212900.0,16.989197,0.0
1,Barking and Dagenham,3607,2018,,212900.0,16.942226,0.0
2,Barking and Dagenham,3897,2019,Barking and Dagenham,212900.0,18.304368,3.084653
3,Barnet,3575,2017,,395900.0,9.030058,0.0
4,Barnet,3863,2018,,395900.0,9.757515,0.0
5,Barnet,4322,2019,Barnet,395900.0,10.916898,-4.302817
6,Bexley,2346,2017,,248300.0,9.448248,0.0
7,Bexley,2389,2018,,248300.0,9.621426,0.0
8,Bexley,2708,2019,Bexley,248300.0,10.906162,-4.313553
9,Brent,5463,2017,,329800.0,16.564585,0.0
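Note that `mean_2019` is an unweighted average of the CSP rates, which generally differs from London's overall rate (total offences divided by total population). A sketch of the same delta computed with `groupby(...).transform('mean')`, keeping the notebook's convention of populating it only for 2019; the rows are copied from the output above, so the mean here covers only these three 2019 rows.

```python
import pandas as pd

# A few rates per 1,000 from the London output above
df = pd.DataFrame({
    "csp_name": ["Barking and Dagenham", "Barnet", "Bexley", "Barnet"],
    "year": [2019, 2019, 2019, 2017],
    "total_per_1000": [18.304368, 10.916898, 10.906162, 9.030058],
})

# Mean rate within each year, broadcast back to every row of that year
year_mean = df.groupby("year")["total_per_1000"].transform("mean")
# Delta from the year mean for 2019 rows only; 0.0 elsewhere (as in the notebook)
df["delta_from_2019_mean"] = (df["total_per_1000"] - year_mean).where(df["year"] == 2019, 0.0)
print(df)
```

Dropping the `.where(...)` would give each year its own delta from that year's mean, if that were ever needed.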


In [45]:
line = alt.Chart(london_csp, title='Serious violent crimes/1000, by Community Safety Partnership: 2017 to 2019').mark_line().encode(
    x=alt.X('year:O', axis=alt.Axis(title=None, ticks=False, labelAngle=0)),
    y=alt.Y('total_per_1000:Q', axis=alt.Axis(title='Serious violent crimes per 1000 (annualised)', ticks=False, values=[0, 10, 20, 30, 40, 50])),
    color=alt.Color('csp_name:N', legend=None)
)

text = line.mark_text(
    align='left',
    baseline='middle',
    dx=10  # Nudges text to the right so it doesn't sit on top of the line
).encode(
    text='label:N'
)

(line + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0
).properties(
    width=600,
    height=1000)

In [46]:
london_rates_2017_2019 = london_csp[(london_csp.year == 2017) | 
                                               (london_csp.year == 2019)].copy()

london_crime_wide = london_rates_2017_2019.pivot_table(index=['csp_name'], 
                                    columns='year', values='total_per_1000').reset_index()

london_crime_wide['pct_change'] = 100 * (london_crime_wide[2019] - london_crime_wide[2017]) / london_crime_wide[2017]

london_crime_wide

year,csp_name,2017,2019,pct_change
0,Barking and Dagenham,16.989197,18.304368,7.741222
1,Barnet,9.030058,10.916898,20.895105
2,Bexley,9.448248,10.906162,15.43052
3,Brent,16.564585,15.442693,-6.772835
4,Bromley,9.95486,9.656937,-2.992745
5,Camden,20.451852,18.103704,-11.481347
6,Croydon,14.150504,14.753039,4.258041
7,Ealing,13.241662,14.075483,6.296951
8,Enfield,13.034751,15.79988,21.213514
9,Greenwich,14.161167,15.894408,12.239392


In [47]:
bars = alt.Chart(london_crime_wide[['csp_name', 'pct_change']], title='Percent change in serious violent crime between 2017 and 2019 (Home Office)').mark_bar(opacity=0.6, color='firebrick').encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False, values=[0, 40, 80])),
    y=alt.Y('csp_name:N', axis=alt.Axis(title=None, ticks=False), sort='-x')
)

text = alt.Chart(london_crime_wide[['csp_name', 'pct_change']]).mark_text(
    align='center',
    baseline='middle',
    color='black',
    dx=20  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    x=alt.X('pct_change:Q', axis=alt.Axis(title='% change', ticks=False)),
    y=alt.Y('csp_name:N', axis=alt.Axis(title=None, ticks=False), sort='-x'),
    text=alt.Text('pct_change:Q', format=',.3r')
)

(bars + text).configure_axis(
    grid=False,
    domain=False
).configure_view(
    strokeWidth=0).properties(height=600, width=600)

## Choropleths - Serious Violent Crime by Community Safety Partnership

In [48]:
import geopandas as gpd

london_csp_map = gpd.read_file("./DataSources/England and Wales Crime Data/CSP Shapefile/Transformed/Community_Safety_Partnerships_(December_2020)_EW_BFC.shp") # shapefile: a GIS format holding the CSP boundary polygons (viewable in e.g. QGIS)
# The geometry coordinates are longitude/latitude (see the output below), so label them as WGS 84 (EPSG:4326) rather than British National Grid (EPSG:27700)
london_csp_map = london_csp_map.set_crs("EPSG:4326", allow_override=True)

london_csp_map.head()

Unnamed: 0,OBJECTID,CSP20CD,CSP20NM,BNG_E,BNG_N,LONG,LAT,Shape__Are,Shape__Len,geometry
0,1,E22000001,Bath and North East Somerset,366217,161999,-2.48654,51.35604,351123200.0,141684.814206,"POLYGON ((-2.29462 51.42880, -2.29511 51.42741..."
1,2,E22000002,"Bristol, City of",359990,174846,-2.57742,51.47115,109666800.0,108825.818391,"MULTIPOLYGON (((-2.68438 51.48080, -2.68487 51..."
2,3,E22000003,North Somerset,347614,166718,-2.75438,51.39706,374637200.0,154770.379665,"MULTIPOLYGON (((-3.10601 51.34143, -3.10554 51..."
3,4,E22000006,South Gloucestershire,367559,183198,-2.46922,51.54673,497050900.0,142552.758508,"POLYGON ((-2.29164 51.59370, -2.28986 51.59320..."
4,5,E22000009,Bedford,505721,256463,-0.45463,52.19628,476408300.0,131721.268581,"POLYGON ((-0.24981 52.18436, -0.25128 52.18473..."


In [49]:
# now load centroids so we can position our text
engwales_csp_centroids= pd.read_csv("./DataSources/England and Wales Crime Data/CSP Shapefile/Transformed/london_csp_centroids.csv")
london_csp_centroids = engwales_csp_centroids[['CSP20CD', 'cx', 'cy']][engwales_csp_centroids.CSP20NM.isin(london_csp_names)].copy()
london_csp_centroids.head()

Unnamed: 0,CSP20CD,cx,cy
140,E22000186,0.133935,51.545345
141,E22000187,-0.210009,51.616027
142,E22000188,0.140338,51.458803
143,E22000189,-0.26782,51.558549
144,E22000190,0.051524,51.371997


In [50]:
london_csp_map = pd.merge(london_csp_map, london_csp_centroids, left_on='CSP20CD', right_on='CSP20CD', how = 'inner')
london_csp_map.head()

Unnamed: 0,OBJECTID,CSP20CD,CSP20NM,BNG_E,BNG_N,LONG,LAT,Shape__Are,Shape__Len,geometry,cx,cy
0,141,E22000186,Barking and Dagenham,547757,185111,0.129479,51.54555,36101390.0,40936.55762,"MULTIPOLYGON (((0.07493 51.52973, 0.07465 51.5...",0.133935,51.545345
1,142,E22000187,Barnet,523473,191752,-0.21819,51.61107,86766700.0,50937.778177,"POLYGON ((-0.18199 51.66868, -0.18592 51.66284...",-0.210009,51.616027
2,143,E22000188,Bexley,549202,175434,0.146212,51.45822,60578410.0,47959.661576,"MULTIPOLYGON (((0.20127 51.48016, 0.20178 51.4...",0.140338,51.458803
3,144,E22000189,Brent,519615,186465,-0.27568,51.56438,43236370.0,38441.556644,"POLYGON ((-0.21351 51.55519, -0.21194 51.55370...",-0.26782,51.558549
4,145,E22000190,Bromley,542036,165707,0.039246,51.37266,150132500.0,76226.263933,"POLYGON ((0.07533 51.43199, 0.08114 51.43065, ...",0.051524,51.371997


In [51]:
london_geo_1 = pd.merge(london_csp_map, london_csp[london_csp.year == 2019], left_on='CSP20NM', right_on='csp_name', how = 'inner')

print(london_geo_1.shape)
london_geo_1.head()

(32, 19)


Unnamed: 0,OBJECTID,CSP20CD,CSP20NM,BNG_E,BNG_N,LONG,LAT,Shape__Are,Shape__Len,geometry,cx,cy,csp_name,total,year,label,population,total_per_1000,delta_from_2019_mean
0,141,E22000186,Barking and Dagenham,547757,185111,0.129479,51.54555,36101390.0,40936.55762,"MULTIPOLYGON (((0.07493 51.52973, 0.07465 51.5...",0.133935,51.545345,Barking and Dagenham,3897,2019,Barking and Dagenham,212900.0,18.304368,3.084653
1,142,E22000187,Barnet,523473,191752,-0.21819,51.61107,86766700.0,50937.778177,"POLYGON ((-0.18199 51.66868, -0.18592 51.66284...",-0.210009,51.616027,Barnet,4322,2019,Barnet,395900.0,10.916898,-4.302817
2,143,E22000188,Bexley,549202,175434,0.146212,51.45822,60578410.0,47959.661576,"MULTIPOLYGON (((0.20127 51.48016, 0.20178 51.4...",0.140338,51.458803,Bexley,2708,2019,Bexley,248300.0,10.906162,-4.313553
3,144,E22000189,Brent,519615,186465,-0.27568,51.56438,43236370.0,38441.556644,"POLYGON ((-0.21351 51.55519, -0.21194 51.55370...",-0.26782,51.558549,Brent,5093,2019,Brent,329800.0,15.442693,0.222977
4,145,E22000190,Bromley,542036,165707,0.039246,51.37266,150132500.0,76226.263933,"POLYGON ((0.07533 51.43199, 0.08114 51.43065, ...",0.051524,51.371997,Bromley,3209,2019,Bromley,332300.0,9.656937,-5.562779


In [52]:
data_geo_1 = alt.InlineData(values = london_geo_1.to_json(), #geopandas to geojson string
                       format = alt.DataFormat(property='features',type='json'))

chart_1 = alt.Chart(data_geo_1).mark_geoshape(strokeWidth=1,stroke='lightgray',strokeOpacity=0.2
).encode(
    color=alt.Color('properties.delta_from_2019_mean:Q', scale = alt.Scale(scheme='lightgreyred')),
    tooltip=['properties.csp_name:N', 'properties.total_per_1000:Q', 'properties.delta_from_2019_mean:Q']
).properties(
    projection={'type': 'identity','reflectY': True},
    width=800,
    height=450,
    title='Serious violent crimes per 1000 (2019): difference from the mean CSP rate'
)

text = alt.Chart(data_geo_1).mark_text(
    align='center',
    baseline='middle',
    fontSize=10, 
    fontWeight=100,
    opacity=0.6,
    color='black').encode(
    longitude='properties.LONG:Q',
    latitude='properties.LAT:Q',
    text=alt.Text('properties.csp_name:N')
)

chart_1 + text
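`GeoDataFrame.to_json()` produces a GeoJSON FeatureCollection in which each row becomes a feature, with the non-geometry columns nested under `properties`. That is why the encodings above refer to fields such as `properties.delta_from_2019_mean`, and why `alt.DataFormat(property='features')` is needed to point Altair at the features array. A standard-library sketch of that structure, with a single hypothetical, heavily simplified feature:

```python
import json

# Hypothetical, simplified FeatureCollection of the shape GeoDataFrame.to_json() produces
geojson_str = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"csp_name": "Barnet", "total_per_1000": 10.916898},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-0.2, 51.6], [-0.1, 51.6], [-0.1, 51.7], [-0.2, 51.6]]],
            },
        }
    ],
})

parsed = json.loads(geojson_str)
features = parsed["features"]  # the array alt.DataFormat(property='features') selects
first = features[0]
# Attribute columns live under "properties", hence field names like 'properties.total_per_1000'
print(first["properties"]["csp_name"], first["properties"]["total_per_1000"])
```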

In [53]:
london_geo_2 = pd.merge(london_csp_map, london_crime_wide, left_on='CSP20NM', right_on='csp_name', how = 'inner')

In [54]:
data_geo_2 = alt.InlineData(values = london_geo_2.to_json(), #geopandas to geojson string
                       format = alt.DataFormat(property='features',type='json'))

chart_2 = alt.Chart(data_geo_2).mark_geoshape(strokeWidth=1,stroke='lightgray',strokeOpacity=0.2
).encode(
    color=alt.Color('properties.pct_change:Q', scale = alt.Scale(scheme='lightgreyred')),
    tooltip=['properties.csp_name:N', 'properties.pct_change:Q']
).properties(
    projection={'type': 'identity','reflectY': True},
    width=800,
    height=450,
    title='Serious violent crime: percent change between 2017 and 2019'
)

chart_2 + text

## Comments
- <b> Westminster </b> is off the scale, both in violent crime per 1000 population and in how much violent crime increased between 2017 and 2019. This CSP contains a significant proportion of London's night-time economy (Leicester Square, Piccadilly), and such places are known to attract violent crime in the early hours.
- The next highest violent crime rates per 1000 in 2019 were in Haringey, Hackney, Southwark and Islington.
- Many of the areas suffering the biggest increases in violent crime between 2017 and 2019 were on London's outskirts, e.g. Enfield, Barnet and Harrow.

## Reconcile with London-wide Knife Crime Data
- The final check is to derive the proportion of knife crime in London relative to all serious violent crime, to see whether it is consistent with our earlier findings.

In [55]:
knife_crime_rates[knife_crime_rates.PoliceForce == 'Metropolitan Police']

Unnamed: 0,Force Name,year,total,PoliceForce,population,crime_per_100K_homeoffice,label
54,Metropolitan Police,2012,10618.0,Metropolitan Police,8538700.0,124.351482,
55,Metropolitan Police,2013,9382.0,Metropolitan Police,8538700.0,109.876211,
56,Metropolitan Police,2014,9016.0,Metropolitan Police,8538700.0,105.589844,
57,Metropolitan Police,2015,9072.0,Metropolitan Police,8538700.0,106.245681,
58,Metropolitan Police,2016,11213.0,Metropolitan Police,8538700.0,131.319756,
59,Metropolitan Police,2017,13708.0,Metropolitan Police,8762400.0,156.441158,
60,Metropolitan Police,2018,13819.0,Metropolitan Police,8817300.0,156.725982,
61,Metropolitan Police,2019,14685.0,Metropolitan Police,8899375.0,165.011588,Metropolitan Police


In [56]:
london_csp_aggregates = london_csp.groupby('year', as_index=False)['total'].sum()

london_csp_aggregates

Unnamed: 0,year,total
0,2017,130003
1,2018,133225
2,2019,136347


In [83]:
print("Proportion of knife crimes versus all serious violent crimes for the Metropolitan Police = {:.2%}\n".format(14685/136347))
    

Proportion of knife crimes versus all serious violent crimes for the Metropolitan Police = 10.77%



## Conclusions
The Home Office recorded 14,685 knife crime offences for the Metropolitan Police in 2019, while aggregating serious violent crime across all of London's Community Safety Partnerships gives 136,347 offences, so knife crime accounts for approximately 10.8% of all serious violent crime in London. This 10.8% is very close to the figure we obtained for the Metropolitan Police when we compared knife crime and serious violent crime at police force area level, which shows that the Community Safety Partnership data is consistent with the police force area data. And while 10.8% is higher than the 6-7% estimated by the ONS, it is in the right ballpark, so we can have some confidence in this data as a reasonable baseline for comparison with the London violent crime data we will shortly produce at LSOA level.
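The headline figure is a straightforward ratio of the two 2019 totals. A short sketch using the numbers quoted above, with the ONS range from Appendix A for comparison:

```python
# Figures quoted in the conclusion above
met_knife_2019 = 14685    # Home Office knife crime offences, Metropolitan Police
london_svc_2019 = 136347  # serious violent crime summed over London's 32 CSPs
ons_range = (0.06, 0.07)  # ONS estimate of the knife crime share (Appendix A)

share = met_knife_2019 / london_svc_2019
print(f"knife share = {share:.2%}; {share / ons_range[1]:.2f}x the top of the ONS range")
```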

# Appendix A - Knife crime as a proportion of Serious Violent Crime
The ONS publishes data on the proportion of serious violent crimes that can also be considered knife crimes; this has been between 6-7% in recent years. The data can be found in 'Table 3a' of the following dataset:
https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/crimeinenglandandwalesotherrelatedtables


<img src="ONS_proportion_of_knives.png">

# Appendix B - Crime Code Cross-Reference
The following table was used to cross-reference the crime codes used in the knife crime and wider serious violent crime datasets, and to map them back to the crime codes used in the GLA analysis.

It is important to note that the categories included in the Home Office Knife Crime statistics do not include 'Violence without Injury' whereas the GLA analysis does include these crimes.

<img src="crime code cross reference.png">