# Women's imprisonment rates
## Criminal Justice Statistics Police Force Area: Filtering by custodial sentences and offence

In [1]:
import pandas as pd

In [25]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
from src.data.processing import filter_custody_offences, filter_years

In [27]:
df = filter_custody_offences.load_data()

2025-08-01 15:58:10,284 - INFO - Loading interim data for custody offences...
2025-08-01 15:58:10,992 - INFO - Loaded data from data/interim/women_cust_comm_sus.csv
2025-08-01 15:58:11,362 - INFO - Filtering data for custodial sentences...


In [28]:
df

Unnamed: 0,year,pfa,sex,age_group,offence,specific_offence,outcome,sentence_len,freq
0,2010,Avon and Somerset,Female,Adults,Drug offences,Unlawful importation - Class A,Immediate Custody,More than 12 months and up to and including 18...,1
16,2010,Avon and Somerset,Female,Young adults,Violence against the person,Murder,Immediate Custody,Life sentence,1
24,2010,Avon and Somerset,Female,Adults,Theft offences,Theft from Shops,Immediate Custody,More than 1 month and up to and including 2 mo...,1
26,2010,Avon and Somerset,Female,Adults,Miscellaneous crimes against society,Perverting the Course of Justice - indictable ...,Immediate Custody,More than 18 months and up to and including 2 ...,2
30,2010,Avon and Somerset,Female,Young adults,Drug offences,"Production, supply and possession with intent ...",Immediate Custody,More than 6 months and up to and including 9 m...,1
...,...,...,...,...,...,...,...,...,...
330974,2024,Wiltshire,Female,Adults,Sexual offences,Sexual assault of a female child under 13,Immediate Custody,More than 12 months and up to and including 18...,1
330979,2024,Wiltshire,Female,Adults,Theft offences,Burglary in a Building Other than a Dwelling -...,Immediate Custody,More than 1 month and up to and including 2 mo...,1
330982,2024,Wiltshire,Female,Adults,Drug offences,Possession of a controlled drug - Class A,Immediate Custody,More than 2 months and up to and including 3 m...,1
330988,2024,Wiltshire,Female,Adults,Theft offences,Theft from Shops,Immediate Custody,Up to and including 1 month,1


In [6]:
max_year = df["year"].max()

df_processed = (
    df
    .pipe(filter_years.get_year, year_from=max_year)
    .pipe(filter_custody_offences.group_by_pfa_and_offence)
)

2025-08-01 14:13:42,265 - INFO - Filtering data from 2024 onwards
2025-08-01 14:13:42,267 - INFO - Grouping data by PFA, year, offence and specific offence...


In [7]:
df_processed

Unnamed: 0,pfa,year,offence,specific_offence,freq
0,Avon and Somerset,2024,Criminal damage and arson,Arson endangering life,2
1,Avon and Somerset,2024,Criminal damage and arson,Other Criminal Damage,1
2,Avon and Somerset,2024,Drug offences,Possession of a controlled drug - Class A,2
3,Avon and Somerset,2024,Drug offences,"Production, supply and possession with intent ...",6
4,Avon and Somerset,2024,Drug offences,"Production, supply and possession with intent ...",1
...,...,...,...,...,...
1436,Wiltshire,2024,Theft offences,Burglary in a Dwelling - triable either way,1
1437,Wiltshire,2024,Theft offences,Theft from Shops,9
1438,Wiltshire,2024,Theft offences,Theft from the Person of Another,2
1439,Wiltshire,2024,Violence against the person,Assault occasioning actual bodily harm,1


## Examining other implementations of the sunburst chart to direct data processing steps

In [10]:
import plotly.graph_objects as go

fig = go.Figure(go.Sunburst(
    labels=["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parents=["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"],
    values=[10, 14, 12, 10, 2, 6, 6, 4, 4],
    texttemplate="%{label} <b>%{percentRoot: .0%}</b>",
    hovertemplate="<b>%{label}</b><br>%{percentParent: .0%} of %{parent}<extra></extra>",
))
# Update layout for tight margin
# See https://plotly.com/python/creating-and-updating-figures/
fig.update_layout(margin=dict(t=0, l=0, r=0, b=0))

fig.show()

This is helpful: it suggests that I don't need to calculate the proportions of offences in the data processing step for the sunburst chart. I can pass the raw counts as they are, and Plotly will derive the percentages for me, as long as the parent-child relationships are correctly defined.
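
To make that concrete, here is a minimal sketch of the arithmetic I believe Plotly does behind `percentRoot` and `percentParent` for the example above. It assumes the default `branchvalues="remainder"` behaviour, where a sector's size is its own value plus everything beneath it. The helper `sector_total` is my own illustration, not part of Plotly.

```python
from collections import defaultdict

labels = ["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"]
parents = ["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"]
values = [10, 14, 12, 10, 2, 6, 6, 4, 4]

own = dict(zip(labels, values))
parent_of = dict(zip(labels, parents))

# Map each parent label to the labels directly beneath it.
children = defaultdict(list)
for label, parent in zip(labels, parents):
    children[parent].append(label)


def sector_total(label):
    """Own value plus the totals of all descendants (assumed "remainder" convention)."""
    return own[label] + sum(sector_total(child) for child in children[label])


totals = {label: sector_total(label) for label in labels}
root = children[""][0]  # the single label whose parent is the empty string

percent_root = {label: totals[label] / totals[root] for label in labels}
percent_parent = {
    label: totals[label] / totals[parent_of[label]]
    for label in labels
    if parent_of[label]  # the root has no parent to compare against
}

for label in labels:
    print(f"{label:<6} total={totals[label]:>3}  share of root={percent_root[label]:.0%}")
```

The point is only that these shares follow from the counts and the hierarchy, so nothing needs to be precomputed in the processing step.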

## Using Plotly example code as a guide to engineer the data processing steps for the sunburst chart

In [65]:
import plotly.graph_objects as go

In [None]:
fig = go.Figure(go.Sunburst(
    labels=["All offences", "Theft", "Other", "Fraud", "Summary", "Drugs", "VAP", "Assault EW"],
    parents=["", "All offences", "All offences", "Other", "Other", "All offences", "All offences", "VAP"],
    values=[38, 14, 12, 10, 2, 6, 6, 4],
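    sort=False,  # keep the order given in `labels` instead of sorting by size
    branchvalues='total',  # each parent's value already includes its children
    texttemplate="%{label} <b>%{percentRoot: .0%}</b>",
    hovertemplate="<b>%{label}</b><br>%{percentParent: .0%} of %{parent}<extra></extra>",
    hoverinfo='label+percent parent',  # redundant here: hovertemplate overrides hoverinfo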
    insidetextorientation='radial',
    domain_column=0,
    domain_row=0
))

fig.update_layout(margin=dict(t=0, l=0, r=0, b=0))

fig.show()

Taking the dummy data from the Plotly example, I can create a DataFrame that mimics the structure of the data used in the sunburst chart. This will help me understand how to structure my own data for the sunburst chart.

In [None]:
d = {
    'offence': ["All offences", "Theft", "Other", "Fraud", "Summary", "Drugs", "VAP", "Assault EW"],
    'parent': ["", "All offences", "All offences", "Other", "Other", "All offences", "All offences", "VAP"],
    'freq': [38, 14, 12, 10, 2, 6, 6, 4],
}

test_data = pd.DataFrame(data=d)
test_data

Unnamed: 0,offence,parent,freq
0,All offences,,38
1,Theft,All offences,14
2,Other,All offences,12
3,Fraud,Other,10
4,Summary,Other,2
5,Drugs,All offences,6
6,VAP,All offences,6
7,Assault EW,VAP,4
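
With `branchvalues='total'`, each parent's `freq` has to cover the sum of its children; as far as I understand, Plotly may not draw the trace if a parent is smaller than its children. A rough check on the dummy data, using a hypothetical helper `find_overfull_parents` that is not part of the project code:

```python
import pandas as pd

test_data = pd.DataFrame({
    "offence": ["All offences", "Theft", "Other", "Fraud", "Summary", "Drugs", "VAP", "Assault EW"],
    "parent": ["", "All offences", "All offences", "Other", "Other", "All offences", "All offences", "VAP"],
    "freq": [38, 14, 12, 10, 2, 6, 6, 4],
})


def find_overfull_parents(df: pd.DataFrame) -> pd.DataFrame:
    """Return parents whose children sum to more than the parent's own freq."""
    # Sum the children under each parent label, ignoring the root's empty parent.
    child_sums = (
        df[df["parent"] != ""]
        .groupby("parent")["freq"]
        .sum()
        .rename("children_freq")
    )
    # Line each parent's own freq up against the sum of its children.
    merged = df.set_index("offence")[["freq"]].join(child_sums, how="inner")
    return merged[merged["children_freq"] > merged["freq"]]


problems = find_overfull_parents(test_data)
print("Every parent covers its children." if problems.empty else problems)
```

A parent may be larger than the sum of its children (VAP is 6 against 4 for Assault EW); only the reverse is flagged.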


## Returning to data processing

In [None]:
df = (
    filter_custody_offences.load_data()
    .pipe(filter_custody_offences.process_data)
    )
df

2025-08-06 09:53:57,437 - INFO - Loading interim data for custody offences...
2025-08-06 09:53:59,156 - INFO - Loaded data from data/interim/women_cust_comm_sus.csv
2025-08-06 09:53:59,635 - INFO - Filtering data for custodial sentences...
2025-08-06 09:53:59,685 - INFO - Running test processing...
2025-08-06 09:53:59,694 - INFO - Filtering data from 2024 onwards
2025-08-06 09:53:59,706 - INFO - Grouping data by PFA, year, offence and specific offence...
2025-08-06 09:53:59,799 - INFO - Creating filter for highlighted offence groups...
2025-08-06 09:53:59,808 - INFO - Setting parent column for offence groups...
2025-08-06 09:53:59,827 - INFO - Setting plot order for offences...
2025-08-06 09:53:59,846 - INFO - Test processing function executed successfully.


Unnamed: 0,pfa,year,offence,specific_offence,freq,parent,plot_order
0,Avon and Somerset,2024,Criminal damage and arson,Arson endangering life,2,All other offences,0.0
1,Avon and Somerset,2024,Criminal damage and arson,Other Criminal Damage,1,All other offences,0.0
2,Avon and Somerset,2024,Drug offences,Possession of a controlled drug - Class A,2,All offences,2.0
3,Avon and Somerset,2024,Drug offences,"Production, supply and possession with intent ...",6,All offences,2.0
4,Avon and Somerset,2024,Drug offences,"Production, supply and possession with intent ...",1,All offences,2.0
...,...,...,...,...,...,...,...
1436,Wiltshire,2024,Theft offences,Burglary in a Dwelling - triable either way,1,All offences,1.0
1437,Wiltshire,2024,Theft offences,Theft from Shops,9,All offences,1.0
1438,Wiltshire,2024,Theft offences,Theft from the Person of Another,2,All offences,1.0
1439,Wiltshire,2024,Violence against the person,Assault occasioning actual bodily harm,1,All offences,3.0


I need to group by offence while keeping 'Assault of an emergency worker' as a separate offence, with VAP (violence against the person) as its parent.

One approach: store those rows in a separate variable, then concat them with the original DataFrame.

In [47]:
ew_df = filter_custody_offences.extract_assault_of_emergency_worker(df)
ew_df

2025-08-06 10:24:14,785 - INFO - Extracting 'Assault of an emergency worker' offence...


Unnamed: 0,pfa,year,offence,specific_offence,freq,parent,plot_order
27,Avon and Somerset,2024,Assault of an emergency worker,Assault of an emergency worker,25,Violence against the person,3.0
72,Cambridgeshire,2024,Assault of an emergency worker,Assault of an emergency worker,6,Violence against the person,3.0
102,Cheshire,2024,Assault of an emergency worker,Assault of an emergency worker,17,Violence against the person,3.0
137,Cleveland,2024,Assault of an emergency worker,Assault of an emergency worker,19,Violence against the person,3.0
164,Cumbria,2024,Assault of an emergency worker,Assault of an emergency worker,5,Violence against the person,3.0
190,Derbyshire,2024,Assault of an emergency worker,Assault of an emergency worker,13,Violence against the person,3.0
227,Devon and Cornwall,2024,Assault of an emergency worker,Assault of an emergency worker,12,Violence against the person,3.0
254,Dorset,2024,Assault of an emergency worker,Assault of an emergency worker,3,Violence against the person,3.0
280,Durham,2024,Assault of an emergency worker,Assault of an emergency worker,4,Violence against the person,3.0
289,Dyfed Powys,2024,Assault of an emergency worker,Assault of an emergency worker,5,Violence against the person,3.0


Great, that seems to have worked. Let's further develop the test processing function so that it performs the following steps:
1. Extract the assault of an emergency worker offence.
2. Group `df` by PFA and offence (again!) to collapse all of the specific offences into a single offence.
3. Add the assault of an emergency worker offence back into `df`.
4. Set the plot order for the offences.
5. Drop the `specific_offence` column.
6. Set the `parent` column values.
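
The steps above can be sketched on a toy DataFrame. This is not the code in `filter_custody_offences`: the function name `sketch_process`, the `PLOT_ORDER` mapping and the toy values are all assumptions for illustration. It also assumes the emergency worker rows sit inside the violence against the person group in the raw data, so they stay in that group's total.

```python
import pandas as pd

# Assumed names and values for illustration only.
EW = "Assault of an emergency worker"
VAP = "Violence against the person"
PLOT_ORDER = {"Theft offences": 1.0, "Drug offences": 2.0, VAP: 3.0}


def sketch_process(df: pd.DataFrame) -> pd.DataFrame:
    keys = ["pfa", "year", "offence"]
    # 1. Extract the emergency worker rows and relabel their offence group.
    ew = df[df["specific_offence"] == EW].assign(offence=EW)
    ew = ew.groupby(keys, as_index=False)["freq"].sum()
    # 2. Collapse specific offences into their offence group.
    #    Summing only `freq` also covers step 5: `specific_offence` is gone.
    grouped = df.groupby(keys, as_index=False)["freq"].sum()
    # 3. Add the emergency worker rows back in.
    out = pd.concat([ew, grouped], ignore_index=True)
    # 4. Highlighted groups get a fixed position; everything else gets 0.
    out["plot_order"] = out["offence"].map(PLOT_ORDER).fillna(0.0)
    # 6. Set the parent for each row.
    out["parent"] = "All other offences"
    out.loc[out["offence"].isin(list(PLOT_ORDER)), "parent"] = "All offences"
    out.loc[out["offence"] == EW, "parent"] = VAP
    return out


# Toy input: made-up frequencies, not taken from the real data.
toy = pd.DataFrame({
    "pfa": ["Wiltshire"] * 5,
    "year": [2024] * 5,
    "offence": ["Theft offences", "Theft offences", VAP, VAP, "Fraud offences"],
    "specific_offence": [
        "Theft from Shops",
        "Theft from the Person of Another",
        "Assault occasioning actual bodily harm",
        EW,
        "Fraud by false representation",
    ],
    "freq": [9, 2, 1, 4, 4],
})

result = sketch_process(toy)
print(result)
```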

In [None]:
df = (
    filter_custody_offences.load_data()
    .pipe(filter_custody_offences.process_data)
    )
df

2025-08-06 10:44:29,531 - INFO - Loading interim data for custody offences...
2025-08-06 10:44:30,131 - INFO - Loaded data from data/interim/women_cust_comm_sus.csv
2025-08-06 10:44:30,485 - INFO - Filtering data for custodial sentences...
2025-08-06 10:44:30,494 - INFO - Running test processing...
2025-08-06 10:44:30,495 - INFO - Filtering data from 2024 onwards
2025-08-06 10:44:30,499 - INFO - Grouping data by pfa, year, offence, specific_offence and summing frequencies...
2025-08-06 10:44:30,512 - INFO - Extracting 'Assault of an emergency worker' offence...
2025-08-06 10:44:30,514 - INFO - Grouping data by pfa, year, offence and summing frequencies...
2025-08-06 10:44:30,519 - INFO - Adding 'Assault of an emergency worker' offence to the main DataFrame...
2025-08-06 10:44:30,527 - INFO - Setting plot order for offences...
2025-08-06 10:44:30,530 - INFO - Creating filter for highlighted offence groups...
2025-08-06 10:44:30,531 - INFO - Setting parent column for offence groups...
20

Unnamed: 0,pfa,year,offence,freq,plot_order,parent
0,Gloucestershire,2024,Assault of an emergency worker,1,0.0,Violence against the person
1,West Mercia,2024,Assault of an emergency worker,1,0.0,Violence against the person
2,Northamptonshire,2024,Assault of an emergency worker,2,0.0,Violence against the person
3,Dorset,2024,Assault of an emergency worker,3,0.0,Violence against the person
4,Leicestershire,2024,Assault of an emergency worker,3,0.0,Violence against the person
...,...,...,...,...,...,...
463,Lancashire,2024,Violence against the person,52,3.0,All offences
464,West Midlands,2024,Violence against the person,55,3.0,All offences
465,South Wales,2024,Violence against the person,58,3.0,All offences
466,West Yorkshire,2024,Violence against the person,77,3.0,All offences


In [62]:
wiltshire_df = df.query("pfa == 'Wiltshire'").sort_values(by='freq', ascending=False)
wiltshire_df

Unnamed: 0,pfa,year,offence,freq,plot_order,parent
387,Wiltshire,2024,Theft offences,13,1.0,All offences
429,Wiltshire,2024,Violence against the person,5,3.0,All offences
12,Wiltshire,2024,Assault of an emergency worker,4,0.0,Violence against the person
131,Wiltshire,2024,Fraud offences,4,0.0,All other offences
302,Wiltshire,2024,Sexual offences,2,0.0,All other offences
225,Wiltshire,2024,Public order offences,2,0.0,All other offences
69,Wiltshire,2024,Drug offences,1,2.0,All offences
46,Wiltshire,2024,Criminal damage and arson,1,0.0,All other offences
145,Wiltshire,2024,Miscellaneous crimes against society,1,0.0,All other offences
187,Wiltshire,2024,Possession of weapons,1,0.0,All other offences


Finally, let's ensure that the `All other offences` subtotal is calculated. I can do this by masking out the highlighted offences, summing the 'freq' column for the remaining rows, and then appending a new row for 'All other offences' with that total frequency.

In [63]:
def create_all_offences_group(pfa_df: pd.DataFrame) -> pd.DataFrame:
    """
    This function masks out the highlighted offences and sums the remaining offences
    into a single group. It then appends a new row to the DataFrame for "All other offences"
    with the sum of their frequencies.
    """
    mask_filter = ~filter_custody_offences.filter_offences(pfa_df)  # Filter out highlighted offences
    pfa_df = pd.concat([
        pfa_df,
        pd.DataFrame.from_records([{
            'pfa': pfa_df['pfa'].iloc[0],
            'year': pfa_df['year'].iloc[0],
            'offence': "All other offences",
            'freq': pfa_df.loc[mask_filter, 'freq'].sum(),
            'parent': "All offences",
            'plot_order': 0
        }])
    ], ignore_index=True).sort_values(by=['plot_order', 'freq'], ascending=True)
    return pfa_df

In [66]:
test_data = create_all_offences_group(wiltshire_df)
test_data

2025-08-06 11:19:24,421 - INFO - Creating filter for highlighted offence groups...


Unnamed: 0,pfa,year,offence,freq,plot_order,parent
7,Wiltshire,2024,Criminal damage and arson,1,0.0,All other offences
8,Wiltshire,2024,Miscellaneous crimes against society,1,0.0,All other offences
9,Wiltshire,2024,Possession of weapons,1,0.0,All other offences
10,Wiltshire,2024,Summary non-motoring,1,0.0,All other offences
11,Wiltshire,2024,Summary motoring,1,0.0,All other offences
4,Wiltshire,2024,Sexual offences,2,0.0,All other offences
5,Wiltshire,2024,Public order offences,2,0.0,All other offences
2,Wiltshire,2024,Assault of an emergency worker,4,0.0,Violence against the person
3,Wiltshire,2024,Fraud offences,4,0.0,All other offences
12,Wiltshire,2024,All other offences,17,0.0,All offences
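
A quick sanity check on the subtotal, using only the rows printed above: compare the new 'All other offences' row with the sum of the rows that name it as their parent.

```python
import pandas as pd

# Rows copied from the output above (the plot_order 0 rows only).
shown = pd.DataFrame({
    "offence": [
        "Criminal damage and arson",
        "Miscellaneous crimes against society",
        "Possession of weapons",
        "Summary non-motoring",
        "Summary motoring",
        "Sexual offences",
        "Public order offences",
        "Assault of an emergency worker",
        "Fraud offences",
        "All other offences",
    ],
    "freq": [1, 1, 1, 1, 1, 2, 2, 4, 4, 17],
    "parent": ["All other offences"] * 7
    + ["Violence against the person", "All other offences", "All offences"],
})

subtotal = shown.loc[shown["offence"] == "All other offences", "freq"].item()
children = shown.loc[shown["parent"] == "All other offences", "freq"].sum()
print(f"subtotal={subtotal}, children={children}, gap={subtotal - children}")
```

The gap matches the frequency on the 'Assault of an emergency worker' row, so one possibility is that the mask also picks up that row, even though its parent is 'Violence against the person'. Confirming this needs a look at `filter_offences`, which isn't shown here.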


## Testing with Plotly

In [67]:
fig = go.Figure(go.Sunburst(
    labels=test_data['offence'],
    parents=test_data['parent'],
    values=test_data['freq'],
    sort=False,
    branchvalues='total',
    texttemplate="%{label} <b>%{percentRoot: .0%}</b>",
    hovertemplate="<b>%{label}</b><br>%{percentParent: .0%} of %{parent}<extra></extra>",
    hoverinfo='label+percent parent',
    insidetextorientation='radial',
    domain_column=0,
    domain_row=0
))

fig.update_layout(margin=dict(t=0, l=0, r=0, b=0))

fig.show()

Great, that worked. The next step is to run `filter_custody_offences.py` to confirm that the data processing steps are implemented correctly, and then test with the production code in `custody_offences.py`.

## Testing `custody_offences.py`

In [68]:
from src.visualization import custody_offences

In [106]:
custody_offences.test_chart(pfa='Wiltshire')

2025-08-06 13:18:56,964 - INFO - Loaded data from data/processed/PFA_custodial_sentences_by_offence_2024_FINAL.csv


2025-08-06 13:18:56,972 - INFO - Creating filter for highlighted offence groups...
2025-08-06 13:18:56,996 - INFO - Adding chart annotations...
2025-08-06 13:18:57,006 - INFO - Setting source annotation...
2025-08-06 13:18:57,009 - INFO - Annotations added successfully.


In [None]:
custody_offences.test_chart(pfa='Northamptonshire')

2025-08-06 13:44:36,290 - INFO - Loaded data from data/processed/PFA_custodial_sentences_by_offence_2024_FINAL.csv


2025-08-06 13:44:36,305 - INFO - Creating filter for highlighted offence groups...
2025-08-06 13:44:36,334 - INFO - Adding chart annotations...
2025-08-06 13:44:36,341 - INFO - Setting source annotation...
2025-08-06 13:44:36,343 - INFO - Annotations added successfully.
2025-08-06 13:44:36,344 - INFO - Saving chart...
2025-08-06 13:44:39,295 - INFO - Figure saved to reports/figures/custody_offences/Northamptonshire.pdf


In [102]:
test_chart = custody_offences.test_chart()
test_chart

2025-08-06 13:16:03,942 - INFO - Loaded data from data/processed/PFA_custodial_sentences_by_offence_2024_FINAL.csv
2025-08-06 13:16:03,947 - INFO - Creating filter for highlighted offence groups...


2025-08-06 13:16:03,968 - INFO - Adding chart annotations...
2025-08-06 13:16:03,975 - INFO - Setting source annotation...
2025-08-06 13:16:03,977 - INFO - Annotations added successfully.
