## Automating ONS Research on Child Suicides

In February 2025, the ONS published [research](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/mentalhealth/bulletins/preventionoffuturedeathreportsforsuicideinchildreninenglandandwales/january2015tonovember2023) analysing:

> Prevention of Future Death reports for suicide in children in England and Wales: January 2015 to November 2023

Their study involved the manual identification and thematic coding of PFD reports relating to child suicides, a process known to be time-consuming and difficult to scale.

This notebook evaluates the performance of PFD Toolkit in replicating and automating the ONS approach. By comparing automated outputs to the original manual research, we assess the toolkit’s accuracy, efficiency, and potential to accelerate large-scale, reproducible analysis of PFD reports.

In [1]:
# Keep track of notebook run time
import time
start_time = time.time()

### Identifying the reports

#### Loading all reports

In [2]:
from pfd_toolkit import load_reports

reports = load_reports(category='all',
                    start_date="2015-01-01",
                    end_date="2023-11-01")

print(f"In total, there were {len(reports)} PFD reports published between January 2015 and November 2023")

In total, there were 3941 PFD reports published between January 2015 and November 2023


#### Create 'Screener' specification to filter reports

**Note:** The ONS analysis defines a "child" as "aged 18 years and under," with included cases ranging from 12 to 18 years old. While the standard UK definition of a child is "under 18," for consistency we have adopted the ONS inclusion criteria.
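
The inclusion rule can be written as a simple predicate. The sketch below is illustrative only: `meets_ons_age_criterion` is a hypothetical helper that is not part of PFD Toolkit, and it assumes a structured age is available, whereas the Screener applies the same rule through a natural-language query over report text.

```python
from typing import Optional

ONS_MAX_AGE = 18  # ONS criterion: "aged 18 years and under"


def meets_ons_age_criterion(age: Optional[int]) -> bool:
    """Return True only when an age is explicitly known and is 18 or under."""
    if age is None:
        # No explicit age recorded, so the case is excluded
        return False
    return age <= ONS_MAX_AGE


for age in (12, 17, 18, 19, None):
    print(age, meets_ons_age_criterion(age))
```

Under the standard "under 18" definition, an 18-year-old would be excluded; under the ONS criterion adopted here, they are included.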

First, we need to set up the LLM and Screener modules...


In [3]:
from pfd_toolkit import LLM, Screener
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, 
                 max_workers=25)

# Set up Screener
user_query = (
"Where the deceased is **explicitly** noted as being aged 18 or younger *AND* the death was due to suicide." 
)

child_suicide_screener = Screener(llm=llm_client,
                                  reports=reports,
                                  user_query=user_query,
                                  match_leniency=None,
                                  include_concerns=False
                                  )

The Screener turns this query into a prompt for the LLM. We can inspect the prompt template:

In [4]:
child_suicide_screener._build_prompt_template(user_query)

"You are an expert text classification assistant. Your task is to read the following excerpt from a Prevention of Future Death (PFD) report and decide whether it matches the user's query. \n\n**Instructions:** \n- Only respond 'Yes' if **all** elements of the user query are clearly present in the report. \n- If any required element is missing or there is not enough information, respond 'No'. \n- Make sure any user query related to the deceased is concerned with them *only*, not other persons.\n- Your response must be a JSON object in which 'matches_topic' can be either 'Yes' or 'No'. \n\n**User query:** \n'Where the deceased is **explicitly** noted as being aged 18 or younger *AND* the death was due to suicide.'\nHere is the PFD report excerpt:\n\n{report_excerpt}"

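The prompt asks the model to reply with a JSON object whose `matches_topic` field is either 'Yes' or 'No'. As a rough illustration of how such a reply could be turned into a boolean, here is a minimal sketch. `parse_screener_response` is a hypothetical helper written for this example; it is not taken from PFD Toolkit, whose internal parsing is not shown in this notebook.

```python
import json


def parse_screener_response(raw: str) -> bool:
    """Convert a JSON reply like '{"matches_topic": "Yes"}' into True/False."""
    payload = json.loads(raw)
    answer = payload["matches_topic"]
    if answer not in ("Yes", "No"):
        # The prompt only permits these two values
        raise ValueError(f"Unexpected matches_topic value: {answer!r}")
    return answer == "Yes"


print(parse_screener_response('{"matches_topic": "Yes"}'))
print(parse_screener_response('{"matches_topic": "No"}'))
```
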
Now we can run the Screener and assign the results to `child_suicide_reports`.

In [5]:
%%time

child_suicide_reports = child_suicide_screener.screen_reports()

print(
    f"""From the initial {len(reports)} reports, PFD Toolkit 
      identified {len(child_suicide_reports)} reports on child suicide"""
)

Sending requests to the LLM (in parallel):   0%|          | 0/3941 [00:00<?, ?it/s]

From the initial 3941 reports, PFD Toolkit 
      identified 61 reports on child suicide
CPU times: user 51.6 s, sys: 1.79 s, total: 53.4 s
Wall time: 5min 27s


The Toolkit identified 61 cases, compared to the ONS's 37.

**Note: I checked

---

In [6]:
# Save & reload reports to keep progress (index=False avoids writing the row index as an extra column)
child_suicide_reports.to_csv('../data/child_suicide.csv', index=False)

In [7]:
import pandas as pd
child_suicide_reports = pd.read_csv('../data/child_suicide.csv')
len(child_suicide_reports)

61

### Categorise addressees

Next, we reproduce the following table from the ONS analysis. The counts will differ because the Toolkit identified a different number of reports:

| Addressee                         | No of reports | %  |
|----------------------------------|---------------|----|
| Government department or minister| 15            | 41 |
| NHS Trust or CCG                 | 15            | 41 |
| Professional body                | 12            | 32 |
| Local council                    | 8             | 22 |
| Other                            | 10            | 27 |
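
The percentages in the ONS table are consistent with a denominator of the 37 reports ONS identified. The short check below recomputes them from the counts; the variable names are chosen for this example only. Because the counts sum to more than 37, a single report must be able to fall under more than one addressee category.

```python
ONS_TOTAL_REPORTS = 37  # number of reports identified by ONS

ons_counts = {
    "Government department or minister": 15,
    "NHS Trust or CCG": 15,
    "Professional body": 12,
    "Local council": 8,
    "Other": 10,
}

# Percentage of the 37 reports sent to each addressee type
ons_percentages = {
    name: round(count / ONS_TOTAL_REPORTS * 100)
    for name, count in ons_counts.items()
}

for name, pct in ons_percentages.items():
    print(f"{name}: {pct}%")

# Categories overlap: the counts add up to more than the number of reports
print(sum(ons_counts.values()), ">", ONS_TOTAL_REPORTS)
```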

For this, we'll create a new Screener object and run it once per addressee category. We'll also turn on 'annotate mode', which doesn't filter reports but instead adds a classification column, which we can name and use to build a tabulation.
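
To make the difference between filtering and annotating concrete, here is a toy pandas sketch. The data and the `matches` series are invented for illustration and stand in for the LLM's yes/no decisions; no part of this is PFD Toolkit code.

```python
import pandas as pd

# Toy data: three made-up reports and their receivers
toy_reports = pd.DataFrame(
    {
        "id": ["A", "B", "C"],
        "receiver": ["Department of Health", "Local council", "NHS Trust"],
    }
)

# Stand-in for the classifier's decision on "sent to government department"
matches = pd.Series([True, False, False], index=toy_reports.index)

# Filter mode: keep only the matching rows
filtered = toy_reports[matches]

# Annotate mode: keep every row and record the decision in a named column
annotated = toy_reports.assign(sent_gov=matches)

print(len(filtered), len(annotated))
print(int(annotated["sent_gov"].sum()))
```

Keeping every row is what makes the later tabulation possible: each category becomes a boolean column that can be summed.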

The code in the next cell is repetitive, but this will be addressed in future versions of PFD Toolkit, which will contain a Categorisation module.

In [8]:
from pfd_toolkit import Screener, LLM
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, max_workers=20)

addressee_screener = Screener(llm=llm_client,
                         reports=child_suicide_reports,
                         filter_df=False,
                         # Turn off long sections
                         include_investigation=False,
                         include_circumstances=False,
                         include_concerns=False,
                         
                         # Turn on receiver section
                         include_receiver=True
                         )

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to government department or minister",
    result_col_name="sent_gov"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to NHS Trust, CCG or ICS", 
    result_col_name="sent_nhs"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to some form of professional body, but **not** govt, NHS Trust/CCG/ICS nor local council", 
    result_col_name="sent_prof_body"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to a local council",
    result_col_name="sent_council"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query=(
        "Sent to any of the following: "
        "rail or road company, police (including British Transport Police), **private** companies/hospital or CAMHS"
    ),
    result_col_name="sent_other"
)

Sending requests to the LLM (in parallel):   0%|          | 0/61 [00:00<?, ?it/s]

Sending requests to the LLM (in parallel):   0%|          | 0/61 [00:00<?, ?it/s]

Sending requests to the LLM (in parallel):   0%|          | 0/61 [00:00<?, ?it/s]

Sending requests to the LLM (in parallel):   0%|          | 0/61 [00:00<?, ?it/s]

Sending requests to the LLM (in parallel):   0%|          | 0/61 [00:00<?, ?it/s]
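
Until a dedicated Categorisation module exists, one way to reduce the repetition in the cell above is to keep the queries in a dictionary and loop over them. The sketch below uses a `StubScreener` class, which is a made-up stand-in so the example runs without an API key; it only mimics the `screen_reports(user_query=..., result_col_name=...)` call used in this notebook and assumes, as the notebook's results suggest, that annotation columns accumulate across calls.

```python
import pandas as pd


class StubScreener:
    """Hypothetical stand-in for an annotate-mode Screener (no LLM calls)."""

    def __init__(self, reports: pd.DataFrame):
        self.reports = reports

    def screen_reports(self, user_query: str, result_col_name: str) -> pd.DataFrame:
        # A real screener would classify each report; the stub marks all False
        self.reports = self.reports.assign(**{result_col_name: False})
        return self.reports


# Column name -> query, so adding a category is a one-line change
addressee_queries = {
    "sent_gov": "Sent to government department or minister",
    "sent_council": "Sent to a local council",
}

screener = StubScreener(pd.DataFrame({"id": ["A", "B"]}))

for col_name, query in addressee_queries.items():
    annotated = screener.screen_reports(user_query=query, result_col_name=col_name)

print(list(annotated.columns))
```

With the real `addressee_screener`, the same loop would replace the five near-identical calls, provided its behaviour matches the assumption stated above.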

#### Create summary table

In [9]:
import pandas as pd

categories_config = [
    {"name": "Government department or minister", "col": "sent_gov"},
    {"name": "NHS Trust or CCG", "col": "sent_nhs"},
    {"name": "Professional body", "col": "sent_prof_body"},
    {"name": "Local council", "col": "sent_council"},
    {"name": "Other", "col": "sent_other"},
]

total_reports = len(child_suicide_reports)

summary_data = [
    {
        "Addressee": cat["name"],
        "No of reports": int(child_suicide_reports[cat["col"]].sum()),
        "%": int(
            round((child_suicide_reports[cat["col"]].sum() / total_reports) * 100)
        ),
    }
    for cat in categories_config
]

summary_table_df = pd.DataFrame(summary_data)

print("Summary Table of Report Addressees:")
print(summary_table_df.to_string(index=False))

Summary Table of Report Addressees:
                        Addressee  No of reports  %
Government department or minister             19 31
                 NHS Trust or CCG             21 34
                Professional body              5  8
                    Local council              7 11
                            Other              6 10
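
Since the two analyses rest on different numbers of reports (37 for ONS, 61 for the Toolkit), percentages are the more comparable measure. The sketch below places the two sets of percentages side by side and computes the gap in percentage points. The figures are copied from the two tables in this notebook; the column names are chosen for this example.

```python
import pandas as pd

comparison = pd.DataFrame(
    {
        "Addressee": [
            "Government department or minister",
            "NHS Trust or CCG",
            "Professional body",
            "Local council",
            "Other",
        ],
        "ONS %": [41, 41, 32, 22, 27],      # from the ONS table (37 reports)
        "Toolkit %": [31, 34, 8, 11, 10],   # from the summary above (61 reports)
    }
)

# Positive values mean the Toolkit share is higher than the ONS share
comparison["Difference (pp)"] = comparison["Toolkit %"] - comparison["ONS %"]

print(comparison.to_string(index=False))
```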
