## Cleaner demo

Changes to the Cleaner module should only be pushed to main if the code below runs without errors.

The Cleaner class is primarily responsible for correcting spelling errors in PFD reports. It also standardises coroner names into _Initial. LastName_ format, which we've used to assist with coroner-level filtering.
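
To make the target format concrete, here is a minimal sketch of what _Initial. LastName_ means for a plain "FirstName LastName" string. The helper `to_initial_lastname` and the pattern `INITIAL_LASTNAME` are hypothetical names used only for illustration. This is not how Cleaner works internally (it relies on the LLM client), and the naive split would not cope with titles or multi-part surnames.

```python
import re

def to_initial_lastname(full_name: str) -> str:
    """Hypothetical helper: 'Lydia Brown' -> 'L. Brown'."""
    parts = full_name.split()
    if len(parts) < 2:
        # Nothing to abbreviate; return the input unchanged
        return full_name.strip()
    # First letter of the first token, then the last token as the surname
    return f"{parts[0][0].upper()}. {parts[-1]}"

# Pattern for the target format: one capital initial, a full stop, a space, a surname
INITIAL_LASTNAME = re.compile(r"^[A-Z]\. \S.*$")

examples = ["Lydia Brown", "Xavier Mooyaart", "Heath Westerman"]
formatted = [to_initial_lastname(name) for name in examples]
print(formatted)  # ['L. Brown', 'X. Mooyaart', 'H. Westerman']
```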

In [None]:
from pfd_toolkit import Cleaner, LLM
from dotenv import load_dotenv
import os
import pandas as pd

# Read unclean / directly scraped reports from file
unclean_reports = pd.read_csv('../data/testreports.csv')

# Get API key
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Set up API client
llm_client = LLM(api_key=openai_api_key,
                 model="gpt-4.1-mini")

# Run cleaner (below, we use minimal parameters. In practice, the user can 'turn off' cleaning for a given column)
cleaner = Cleaner(
    llm=llm_client,
    reports=unclean_reports)


cleaned_reports = cleaner.clean_reports()

Processing Fields: 100%|██████████| 6/6 [02:01<00:00, 20.26s/it]


In [5]:
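cleaned_reports.head()

# Save without the index so that re-importing the file does not add another 'Unnamed: 0' column
cleaned_reports.to_csv('../data/testreports_cleaned.csv', index=False)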

Below, we can see the output of our cleaning instance:

In [6]:
cleaner.cleaned_reports

Unnamed: 0.1,Unnamed: 0,URL,ID,Date,CoronerName,Area,Receiver,InvestigationAndInquest,CircumstancesOfDeath,MattersOfConcern
0,0,https://www.judiciary.uk/prevention-of-future-...,2025-0140,2025-02-01,L. Brown,West London,Revon Healthcare,On 18 December 2023 I commenced an investigati...,James was found deceased in his room at Surbit...,(1) During the inquest the court was advised t...
1,1,https://www.judiciary.uk/prevention-of-future-...,2025-0136,2025-11-03,S. Ridge,Surrey,HMPPS,N/A: Not found,During the course of the inquest the court hea...,a Probation staff are not always aware of or h...
2,2,https://www.judiciary.uk/prevention-of-future-...,2025-0121,2025-04-03,N. Walker,"Hampshire, Portsmouth and Southampton",National Institute for Health and Care Excelle...,On 19th September 2023 an investigation was co...,Chloe Elizabeth Burgess was found deceased at ...,The inquest heard evidence that the potential ...
3,3,https://www.judiciary.uk/prevention-of-future-...,2025-0115,2025-02-28,A. Cox,Cornwall and the Isles of Scilly,MP; Secretary of State for Health and Social Care,"On 27 February 2025, I concluded a four-day ju...",Despite appropriate treatment by paramedics an...,Delay in ambulance response attributable to de...
4,4,https://www.judiciary.uk/prevention-of-future-...,2025-0114,2025-02-28,A. Cox,Cornwall and the Isles of Scilly,Chief Constable Devon & Cornwall Constabulary;...,"On 27 February 2025, I concluded a four-day ju...",Mr Campbell had a history of recreational drug...,1) Delays in ambulance attendance. I have writ...
5,5,https://www.judiciary.uk/prevention-of-future-...,2025-0113,2025-02-28,H. Westerman,"Shropshire, Telford and Wrekin",NHS England; Chief Executive of Shrewsbury and...,"On 12 July 2023 Mr Ellery, H.M. Senior Coroner...",Mr Green was admitted to The Royal Shrewsbury ...,(1) Once any patient at The Royal Shrewsbury H...
6,6,https://www.judiciary.uk/prevention-of-future-...,2025-0110,2025-02-27,R. Middleton,Dorset,The Home Office,"On the 13th June 2024, an investigation was co...",Mr Leatham-Prosser had started misusing ketami...,N/A: Not found
7,7,https://www.judiciary.uk/prevention-of-future-...,2025-0057,2025-01-31,J. Turner,"West Sussex, Brighton and Hove",Ministry of Defence,On 01 November 2023 I commenced an investigati...,Mr Taylor had rapidly fallen into drug addicti...,When found to have taken illicit drugs months ...
8,8,https://www.judiciary.uk/prevention-of-future-...,2025-0055,2025-01-31,N. Parsley,Suffolk,Secretary of State Department of Health and So...,On 13th May 2024 I commenced an investigation ...,Kim Robinson's death was recognised at 05:16 o...,1. Following Kim's tragic death the GP who had...
9,9,https://www.judiciary.uk/prevention-of-future-...,2025-0048,2025-01-24,X. Mooyaart,Inner South London,NHS England,On 1 July 2021 an investigation into the death...,Mr Marriage had a longstanding diagnosis of id...,(1) There are cohorts of patients who are medi...
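
One way to make "runs without issue" more concrete is to follow the run with a few automated checks on the cleaned output. The sketch below is an assumption about what such checks could look like, not part of `pfd_toolkit`: `check_cleaned`, `raw_demo` and `clean_demo` are hypothetical names, and the expectations (same row count, same columns, coroner names in _Initial. LastName_ format) are inferred from the output shown above.

```python
import pandas as pd

def check_cleaned(unclean: pd.DataFrame, cleaned: pd.DataFrame) -> list[str]:
    """Hypothetical smoke test: return a list of problems (empty means all checks passed)."""
    problems = []
    # Cleaning should edit cell contents, not add or drop reports
    if len(cleaned) != len(unclean):
        problems.append("row count changed")
    # Assumed here: the cleaner keeps the same columns in the same order
    if list(cleaned.columns) != list(unclean.columns):
        problems.append("columns changed")
    # Coroner names should look like 'L. Brown'
    if "CoronerName" in cleaned.columns:
        ok = cleaned["CoronerName"].astype(str).str.match(r"^[A-Z]\. \S+")
        if not ok.all():
            problems.append(f"{(~ok).sum()} coroner name(s) not in 'Initial. LastName' format")
    return problems

# Tiny stand-in frames using values from the reports in this demo
raw_demo = pd.DataFrame({"ID": ["2025-0140", "2025-0136"],
                         "CoronerName": ["Lydia Brown", "Susan Ridge"]})
clean_demo = pd.DataFrame({"ID": ["2025-0140", "2025-0136"],
                           "CoronerName": ["L. Brown", "S. Ridge"]})

problems = check_cleaned(raw_demo, clean_demo)
print(problems or "all checks passed")
```

In the notebook itself, the same function could be called as `check_cleaned(unclean_reports, cleaned_reports)`.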


Let's compare it with the original, unclean reports that we imported earlier. Even though the content below is truncated, we can see that the cleaned output above has correctly standardised coroner names into the desired format. In a couple of the longer sections, stray spaces have also been removed (e.g. "On 19 th September" has been changed to "On 19th September").

In [None]:
unclean_reports.head(n=10)

Unnamed: 0.1,Unnamed: 0,URL,ID,Date,CoronerName,Area,Receiver,InvestigationAndInquest,CircumstancesOfDeath,MattersOfConcern
0,0,https://www.judiciary.uk/prevention-of-future-...,2025-0140,2025-02-01,Lydia Brown,West London,Revon Healthcare,On 18 December 2023 I commenced an investigati...,James was found deceased in his room at [REDAC...,During the course of the inquest the evidence ...
1,1,https://www.judiciary.uk/prevention-of-future-...,2025-0136,2025-11-03,Susan Ridge,Surrey,HMPPS,N/A: Not found,During the course of the inquest the court hea...,The MATTERS OF CONCERN are: a.Probation staff ...
2,2,https://www.judiciary.uk/prevention-of-future-...,2025-0121,2025-04-03,Nicholas Walker,"Hampshire, Portsmouth and Southampton",1. National Institute for Health and Care Exce...,On 19 th September 2023 an investigation was c...,Chloe Elizabeth Burgess was found deceased at ...,During the inquest the evidence revealed matte...
3,3,https://www.judiciary.uk/prevention-of-future-...,2025-0115,2025-02-28,Andrew Cox,Cornwall and the Isles of Scilly,"1. , MP, Secretary of State for Health & Socia...","On 27 February 2025, I concluded a four-day ju...",The jury recorded the following: Despite appro...,"During the course of these inquests, the evide..."
4,4,https://www.judiciary.uk/prevention-of-future-...,2025-0114,2025-02-28,Andrew Cox,Cornwall and the Isles of Scilly,"1. , Chief Constable, Devon & Cornwall Constab...","On 27/2/25, I concluded a four-day jury inques...",The relevant background circumstances are that...,"During the course of these inquests, the evide..."
5,5,https://www.judiciary.uk/prevention-of-future-...,2025-0113,2025-02-28,Heath Westerman,"Shropshire, Telford & Wrekin","1. NHS England, Wellington House, 133-155 Wate...","On 12 July 2023 Mr Ellery, H.M. Senior Coroner...",Mr Green was admitted to The Royal Shrewsbury ...,During the course of the inquest the evidence ...
6,6,https://www.judiciary.uk/prevention-of-future-...,2025-0110,2025-02-27,Richard Middleton,Dorset,The Home Office,"On the 13 th June 2024, an investigation was c...",Mr Leatham-Prosser had started misusing ketami...,N/A: Not found
7,7,https://www.judiciary.uk/prevention-of-future-...,2025-0057,2025-01-31,Joseph Turner,"West Sussex, Brighton and Hove",Ministry of Defence,On 01 November 2023 I commenced an investigati...,Mr Taylor had rapidly fallen into drug addicti...,During the course of the investigation my inqu...
8,8,https://www.judiciary.uk/prevention-of-future-...,2025-0055,2025-01-31,Nigel Parsley,Suffolk,Secretary of State Department of Health and So...,On 13 th May 2024 I commenced an investigation...,Kim Robinson's death was recognised at 05:16 o...,During the course of the inquest the evidence ...
9,9,https://www.judiciary.uk/prevention-of-future-...,2025-0048,2025-01-24,Xavier Mooyaart,Inner South London,NHS England,On 1 July 2021 an investigation into the death...,Mr Marriage had a longstanding diagnosis of id...,During the course of the inquest the evidence ...
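
Comparing the two tables above by eye only works for a handful of rows. As a rough sketch, the same comparison can be done with pandas, assuming both frames share the same index and columns. The frames `before` and `after` below are small stand-ins built from values shown above; in the notebook they would be `unclean_reports` and `cleaned_reports`.

```python
import pandas as pd

# Stand-in frames with values taken from the unclean and cleaned tables
before = pd.DataFrame({
    "ID": ["2025-0140", "2025-0113"],
    "CoronerName": ["Lydia Brown", "Heath Westerman"],
    "Area": ["West London", "Shropshire, Telford & Wrekin"],
})
after = pd.DataFrame({
    "ID": ["2025-0140", "2025-0113"],
    "CoronerName": ["L. Brown", "H. Westerman"],
    "Area": ["West London", "Shropshire, Telford and Wrekin"],
})

# Boolean frame that is True wherever a cell differs, summed per column
changes_per_column = before.ne(after).sum()
print(changes_per_column)

# Side-by-side view of only the cells that changed ('self' = before, 'other' = after)
diff = before.compare(after)
print(diff)
```

Columns with a count of zero (here `ID`) were left untouched by cleaning, which makes unexpected edits easier to spot.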
