# Women's imprisonment rates
## Criminal Justice Statistics Police Force Area: Offences resulting in custody

#### Importing pandas library and reading in data

In [1]:
import pandas as pd
df = pd.read_csv('../data/interim/PFA_2009-21_women_cust_comm_sus.csv')

In [2]:
df.head()

,year,pfa,sex,age_group,offence,outcome,sentence_len,freq
0,2009,Avon and Somerset,Female,Young adults,Violence against the person,Community sentence,,2
1,2009,Avon and Somerset,Female,Young adults,Violence against the person,Suspended sentence,,1
2,2009,Avon and Somerset,Female,Young adults,Violence against the person,Suspended sentence,,1
3,2009,Avon and Somerset,Female,Young adults,Public order offences,Community sentence,,1
4,2009,Avon and Somerset,Female,Young adults,Miscellaneous crimes against society,Community sentence,,1


#### Filtering data for custodial sentences and test year of 2019

In [5]:
filt = df['outcome'] == 'Immediate custody'
filt2 = df['year'] == 2019
df2 = df[filt & filt2]

In [7]:
df2.head()

,year,pfa,sex,age_group,offence,outcome,sentence_len,freq
194384,2019,Hampshire,Female,Adults,Violence against the person,Immediate custody,Life sentence,1
194385,2019,Lancashire,Female,Adults,Violence against the person,Immediate custody,Life sentence,1
194386,2019,Metropolitan Police,Female,Adults,Violence against the person,Immediate custody,Life sentence,1
194387,2019,Lancashire,Female,Adults,Violence against the person,Immediate custody,Life sentence,1
194388,2019,Lancashire,Female,Adults,Violence against the person,Immediate custody,Life sentence,1
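
The filter above combines two boolean masks with `&`. As a minimal sketch of how that works, the example below applies the same kind of filter to a small invented table (the `toy` frame and its values are assumptions for illustration, not rows from the real file) and checks that `DataFrame.query` selects the same rows.

```python
import pandas as pd

# Toy data standing in for the real file (values invented for illustration)
toy = pd.DataFrame({
    'year': [2018, 2019, 2019, 2019],
    'pfa': ['Derbyshire', 'Derbyshire', 'Dorset', 'Dorset'],
    'outcome': ['Immediate custody', 'Immediate custody',
                'Community sentence', 'Immediate custody'],
    'freq': [3, 2, 1, 4],
})

# Each comparison yields a boolean Series; & combines them element-wise
is_custody = toy['outcome'] == 'Immediate custody'
is_2019 = toy['year'] == 2019
by_mask = toy[is_custody & is_2019]

# The same selection written as a query string
by_query = toy.query('outcome == "Immediate custody" and year == 2019')

# Both approaches should keep the same rows
print(by_mask.equals(by_query))
```

Named masks are handy when the same condition is reused later; `query` is more compact for one-off checks.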


### Grouping by PFA and offence for local factsheets

In [9]:
df3 = df2.groupby(['pfa', 'offence'], as_index=False)['freq'].sum()
df3

,pfa,offence,freq
0,Avon and Somerset,Criminal damage and arson,2
1,Avon and Somerset,Drug offences,16
2,Avon and Somerset,Fraud Offences,3
3,Avon and Somerset,Miscellaneous crimes against society,12
4,Avon and Somerset,Possession of weapons,11
...,...,...,...
443,Wiltshire,Robbery,1
444,Wiltshire,Sexual offences,1
445,Wiltshire,Summary non-motoring,4
446,Wiltshire,Theft Offences,23
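
To illustrate what the grouping step does, here is a small sketch on invented data (the `toy` frame is an assumption, not taken from the dataset). Rows that share the same `pfa` and `offence` are collapsed into one row and their `freq` values are added together, so the overall total is unchanged.

```python
import pandas as pd

# Toy data: two rows share the same pfa/offence pair (values invented)
toy = pd.DataFrame({
    'pfa': ['Derbyshire', 'Derbyshire', 'Derbyshire', 'Dorset'],
    'offence': ['Theft Offences', 'Theft Offences', 'Robbery', 'Theft Offences'],
    'freq': [5, 2, 1, 3],
})

# as_index=False keeps pfa and offence as ordinary columns
totals = toy.groupby(['pfa', 'offence'], as_index=False)['freq'].sum()
print(totals)

# Grouping should not change the overall number of sentences
print(totals['freq'].sum() == toy['freq'].sum())
```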


### Checking results

#### Running a query for a PFA

In [10]:
df3.query('pfa == "Derbyshire"')

,pfa,offence,freq
65,Derbyshire,Drug offences,12
66,Derbyshire,Fraud Offences,5
67,Derbyshire,Miscellaneous crimes against society,6
68,Derbyshire,Possession of weapons,2
69,Derbyshire,Public order offences,5
70,Derbyshire,Robbery,2
71,Derbyshire,Summary motoring,3
72,Derbyshire,Summary non-motoring,11
73,Derbyshire,Theft Offences,57
74,Derbyshire,Violence against the person,20


#### Checking that I get the expected result when I crosstab. I need to use the `values` and `aggfunc` arguments to sum the frequency; otherwise crosstab counts rows rather than summing the frequency.

In [15]:
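# Passing the string 'sum' avoids the FutureWarning recent pandas versions raise for the builtin sum
pd.crosstab(index=df3['pfa'], columns=df3['offence'], values=df3['freq'], aggfunc='sum')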

pfa,Criminal damage and arson,Drug offences,Fraud Offences,Miscellaneous crimes against society,Not known,Possession of weapons,Public order offences,Robbery,Sexual offences,Summary motoring,Summary non-motoring,Theft Offences,Violence against the person
Avon and Somerset,2.0,16.0,3.0,12.0,,11.0,5.0,3.0,2.0,3.0,19.0,62.0,13.0
Bedfordshire,,,,3.0,,1.0,,1.0,,1.0,2.0,16.0,7.0
Cambridgeshire,,2.0,7.0,4.0,1.0,4.0,3.0,4.0,1.0,3.0,13.0,31.0,16.0
Cheshire,2.0,18.0,6.0,3.0,,2.0,9.0,2.0,1.0,7.0,13.0,71.0,15.0
Cleveland,,3.0,2.0,10.0,1.0,3.0,1.0,11.0,2.0,2.0,5.0,44.0,14.0
Cumbria,,15.0,1.0,1.0,,2.0,10.0,1.0,,1.0,4.0,28.0,9.0
Derbyshire,,12.0,5.0,6.0,,2.0,5.0,2.0,,3.0,11.0,57.0,20.0
Devon and Cornwall,2.0,8.0,9.0,3.0,,6.0,7.0,2.0,1.0,2.0,13.0,33.0,20.0
Dorset,,5.0,3.0,5.0,,2.0,4.0,,2.0,,8.0,23.0,9.0
Durham,,1.0,,4.0,1.0,2.0,2.0,2.0,,1.0,4.0,18.0,6.0
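
The difference between counting rows and summing frequencies can be seen in a small sketch. The `toy` frame below is an illustrative assumption that borrows three figures from the Derbyshire and Dorset rows above; it is not the full dataset.

```python
import pandas as pd

# Three pre-aggregated rows, one per pfa/offence pair
toy = pd.DataFrame({
    'pfa': ['Derbyshire', 'Derbyshire', 'Dorset'],
    'offence': ['Theft Offences', 'Robbery', 'Theft Offences'],
    'freq': [57, 2, 23],
})

# Without values/aggfunc, crosstab counts how many rows fall in each cell
counts = pd.crosstab(index=toy['pfa'], columns=toy['offence'])

# With values and aggfunc, each cell holds the summed frequency instead
sums = pd.crosstab(index=toy['pfa'], columns=toy['offence'],
                   values=toy['freq'], aggfunc='sum')

print(counts)
print(sums)
```

In the count table every populated cell is 1 and the empty Dorset/Robbery cell is 0; in the summed table the populated cells carry the frequencies and the empty cell is `NaN`, which matches the blanks in the output above.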


#### Using the `normalize` argument to calculate proportions by index (row), and rounding to 3 d.p. to minimise the impact of banker's rounding.

In [26]:
# normalize='index' divides each cell by its row total, so each PFA's row sums to 1
pd.crosstab(index=df3['pfa'], columns=df3['offence'], values=df3['freq'], aggfunc='sum', normalize='index').round(3)

pfa,Criminal damage and arson,Drug offences,Fraud Offences,Miscellaneous crimes against society,Not known,Possession of weapons,Public order offences,Robbery,Sexual offences,Summary motoring,Summary non-motoring,Theft Offences,Violence against the person
Avon and Somerset,0.013,0.106,0.02,0.079,0.0,0.073,0.033,0.02,0.013,0.02,0.126,0.411,0.086
Bedfordshire,0.0,0.0,0.0,0.097,0.0,0.032,0.0,0.032,0.0,0.032,0.065,0.516,0.226
Cambridgeshire,0.0,0.022,0.079,0.045,0.011,0.045,0.034,0.045,0.011,0.034,0.146,0.348,0.18
Cheshire,0.013,0.121,0.04,0.02,0.0,0.013,0.06,0.013,0.007,0.047,0.087,0.477,0.101
Cleveland,0.0,0.031,0.02,0.102,0.01,0.031,0.01,0.112,0.02,0.02,0.051,0.449,0.143
Cumbria,0.0,0.208,0.014,0.014,0.0,0.028,0.139,0.014,0.0,0.014,0.056,0.389,0.125
Derbyshire,0.0,0.098,0.041,0.049,0.0,0.016,0.041,0.016,0.0,0.024,0.089,0.463,0.163
Devon and Cornwall,0.019,0.075,0.085,0.028,0.0,0.057,0.066,0.019,0.009,0.019,0.123,0.311,0.189
Dorset,0.0,0.082,0.049,0.082,0.0,0.033,0.066,0.0,0.033,0.0,0.131,0.377,0.148
Durham,0.0,0.024,0.0,0.098,0.024,0.049,0.049,0.049,0.0,0.024,0.098,0.439,0.146
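
As a rough check on what `normalize='index'` computes, the sketch below compares it with dividing each row by its own total by hand. The `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Toy data (values invented); each PFA is missing one offence category
toy = pd.DataFrame({
    'pfa': ['Derbyshire', 'Derbyshire', 'Dorset', 'Dorset'],
    'offence': ['Theft Offences', 'Robbery', 'Theft Offences', 'Drug offences'],
    'freq': [6, 2, 3, 1],
})

# Row proportions straight from crosstab
shares = pd.crosstab(index=toy['pfa'], columns=toy['offence'],
                     values=toy['freq'], aggfunc='sum', normalize='index')

# The same calculation by hand: fill empty cells with 0, divide by row totals
table = pd.crosstab(index=toy['pfa'], columns=toy['offence'],
                    values=toy['freq'], aggfunc='sum').fillna(0)
manual = table.div(table.sum(axis=1), axis=0)

print(shares)
# Each row of proportions should add up to 1 before rounding
print(shares.sum(axis=1))
```

After `.round(3)` the rows may no longer sum to exactly 1, so a check like this is best run on the unrounded table.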


#### Saving to variable

In [28]:
df4 = pd.crosstab(index=df3['pfa'], columns=df3['offence'], values=df3['freq'], aggfunc='sum', normalize='index').round(3)

#### Outputting to CSV

In [31]:
df4.to_csv('../data/processed/PFA_2019_offences.csv')
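
A quick way to confirm that the export round-trips is to read the file back with the PFA column as the index. The sketch below uses an in-memory buffer and a small invented table (`shares` is a hypothetical stand-in for `df4`) rather than the real output path.

```python
import io
import pandas as pd

# Hypothetical stand-in for df4: row proportions indexed by pfa
shares = pd.DataFrame(
    {'Robbery': [0.25, 0.0], 'Theft Offences': [0.75, 1.0]},
    index=pd.Index(['Derbyshire', 'Dorset'], name='pfa'),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
shares.to_csv(buffer)
buffer.seek(0)

# index_col restores pfa as the index rather than an ordinary column
restored = pd.read_csv(buffer, index_col='pfa')
print(restored.equals(shares))
```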