## Automating ONS Research on Child Suicides

In February 2025, the ONS published [research](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/mentalhealth/bulletins/preventionoffuturedeathreportsforsuicideinchildreninenglandandwales/january2015tonovember2023) analysing:

> Prevention of Future Death reports for suicide in children in England and Wales: January 2015 to November 2023

This notebook assesses how well the PFD Toolkit can replicate and automate the ONS’s manual approach.

*Note: The dataset loaded via `load_reports` is an early development version built solely from HTML and PDF scraping, without the LLM fallback. Because of the associated costs, I'm holding off on a full LLM scrape of all reports until I'm confident the scraping logic is as good as it can be. As a result, some reports may be missed during screening because of missing data.*

In [1]:
# Time the entire workflow

import time
start = time.time()

### Identifying the reports

#### Loading all reports

In [2]:
from pfd_toolkit import load_reports

reports = load_reports(category="all",
                       start_date="2015-01-01",
                       end_date="2023-11-01")

print(f"In total, there were {len(reports)} PFD reports published between January 2015 and November 2023")

In total, there were 3941 PFD reports published between January 2015 and November 2023


#### Create 'Screener' specification to filter reports

**Note:** The ONS analysis defines a "child" as "aged 18 years and under," with included cases ranging from 12 to 18 years old. While the standard UK definition of a child is *under* 18, for consistency we have adopted the ONS inclusion criteria.

First, we need to set up the LLM and Screener modules...


In [3]:
from pfd_toolkit import LLM, Screener
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, 
                 max_workers=8)

# Set up Screener
user_query = (
"Where the deceased is **explicitly** noted as being aged 18 or younger *AND* the death was due to suicide." 
)

child_suicide_screener = Screener(llm=llm_client,
                                  reports=reports,
                                  user_query=user_query,
                                  match_leniency=None,
                                  include_concerns=False # ...don't need the concerns section for this task
                                  )

This generates a prompt for our LLM. We can see what this prompt looks like:

In [4]:
child_suicide_screener._build_prompt_template(user_query)

"You are an expert text classification assistant. Your task is to read the following excerpt from a Prevention of Future Death (PFD) report and decide whether it matches the user's query. \n\n**Instructions:** \n- Only respond 'Yes' if **all** elements of the user query are clearly present in the report. \n- If any required element is missing or there is not enough information, respond 'No'. \n- You may not infer or make judgements; the evidence must be clear.- Make sure any user query related to the deceased is concerned with them *only*, not other persons.\n- Your response must be a JSON object in which 'matches_topic' can be either 'Yes' or 'No'. \n\n**User query:** \n'Where the deceased is **explicitly** noted as being aged 18 or younger *AND* the death was due to suicide.'\nHere is the PFD report excerpt:\n\n{report_excerpt}"

Now we can run the Screener and assign the results to `child_suicide_reports`.

In [5]:
child_suicide_reports = child_suicide_screener.screen_reports()

print(
    f"""\nFrom the initial {len(reports)} reports, PFD Toolkit identified {len(child_suicide_reports)} reports on child suicide"""
)

Sending requests to the LLM (in parallel):   0%|          | 0/3941 [00:00<?, ?it/s]


From the initial 3941 reports, PFD Toolkit identified 58 reports on child suicide


For context, the ONS identified 37 relevant reports.

---

In [6]:
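# Save & reload reports to keep progress...
# index=False stops the row index coming back as an extra 'Unnamed: 0' column on reload
child_suicide_reports.to_csv('../data/child_suicide.csv', index=False)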

In [7]:
import pandas as pd
child_suicide_reports = pd.read_csv('../data/child_suicide.csv')
len(child_suicide_reports)

58

---

### Categorise addressees

We now need to reproduce the 'report by addressees' table produced by ONS.

For this, we'll create a new Screener object and run it multiple times. We'll also turn on 'annotate mode', which doesn't filter reports but instead adds a classification column. We can name each column and use it to build a tabulation.

The code below is a bit ungraceful, but this will be addressed in future versions of PFD Toolkit, which will include a Categorisation module.
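Until such a module exists, one way to tidy the pattern is to drive the repeated calls from a mapping of column name to query. The sketch below is not part of PFD Toolkit: `annotate_all` is a hypothetical helper, and it assumes only what the next cell relies on, namely that `screen_reports(user_query=..., result_col_name=...)` returns the reports with every annotation column added so far. `FakeScreener` and the demo data are stand-ins so the example runs without an API key.

```python
def annotate_all(screener, queries):
    """Run one annotation pass per (column name, query) pair.

    Assumes screener.screen_reports(user_query=..., result_col_name=...)
    returns the reports with all annotation columns added so far.
    """
    annotated = None
    for col_name, query in queries.items():
        annotated = screener.screen_reports(
            user_query=query, result_col_name=col_name
        )
    return annotated


class FakeScreener:
    """Hypothetical stand-in for Screener: flags a report by keyword match."""

    def __init__(self, reports):
        # Each report is a plain dict here; the real toolkit uses a DataFrame.
        self.reports = [dict(r) for r in reports]

    def screen_reports(self, user_query, result_col_name):
        keyword = user_query.lower()
        for report in self.reports:
            report[result_col_name] = keyword in report["receiver"].lower()
        return self.reports


# Made-up queries and recipients, for illustration only.
addressee_queries = {
    "sent_gov": "department",
    "sent_council": "council",
}
demo_reports = [
    {"receiver": "Department of Health and Social Care"},
    {"receiver": "Example County Council"},
]

annotated = annotate_all(FakeScreener(demo_reports), addressee_queries)
for report in annotated:
    print(report)
```

With the real `Screener`, the same mapping would hold the full natural-language queries, and adding a category would mean adding one dictionary entry rather than another call.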

In [8]:
from pfd_toolkit import Screener, LLM
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, max_workers=8)

addressee_screener = Screener(llm=llm_client,
                         reports=child_suicide_reports,
                         filter_df=False,
                         # Hide long report sections...
                         include_investigation=False,
                         include_circumstances=False,
                         include_concerns=False,
                         
                         # Show receiver section...
                         include_receiver=True
                         )

child_suicide_reports = addressee_screener.screen_reports(
    user_query="At least one recipient is government department or minister",
    result_col_name="sent_gov"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="At least one recipient is NHS Trust, CCG or ICS",
    result_col_name="sent_nhs",
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="At least one recipient is an organisation with statutory responsibility for a profession (GMC, NMC, Royal Colleges, etc.)",
    result_col_name="sent_prof_body",
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="At least one recipient is a local council",
    result_col_name="sent_council",
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query=(
        "At least one recipient is **none** of the following: "
        "Government dept or minister, NHS Trust / CCG / ICS, professional body, local council"
    ),
    result_col_name="sent_other",
)

Sending requests to the LLM (in parallel):   0%|          | 0/58 [00:00<?, ?it/s]


#### Create summary table

In [9]:
import pandas as pd

categories_config = [
    {"name": "Government department or minister", "col": "sent_gov"},
    {"name": "NHS Trust or CCG", "col": "sent_nhs"},
    {"name": "Professional body", "col": "sent_prof_body"},
    {"name": "Local council", "col": "sent_council"},
    {"name": "Other", "col": "sent_other"},
]

total_reports = len(child_suicide_reports)

summary_data = [
    {
        "Addressee": cat["name"],
        "No of reports": int(child_suicide_reports[cat["col"]].sum()),
        "%": int(
            round((child_suicide_reports[cat["col"]].sum() / total_reports) * 100)
        ),
    }
    for cat in categories_config
]

summary_table_df = pd.DataFrame(summary_data)

print("Summary Table of Report Addressees:")
print(summary_table_df.to_string(index=False))

Summary Table of Report Addressees:
                        Addressee  No of reports  %
Government department or minister             26 45
                 NHS Trust or CCG             28 48
                Professional body              6 10
                    Local council             12 21
                            Other             21 36


We can now compare this with ONS's own table...

| Addressee                         | No of reports | %  |
|----------------------------------|---------------|----|
| Government department or minister| 15            | 41 |
| NHS Trust or CCG                 | 15            | 41 |
| Professional body                | 12            | 32 |
| Local council                    | 8             | 22 |
| Other                            | 10            | 27 |


The main difference is in the 'professional body' category. In the ONS research it accounted for 32% of reports, compared with only 10% of ours. Even though we identified considerably more reports overall (58 vs. 37), our absolute count for this category is lower than the ONS's (6 vs. 12).

In ONS's report and accompanying metadata spreadsheet, I was unable to find a definition of 'professional body'. So I used what I think is a reasonable definition:

> "An organisation with statutory responsibility for a profession (e.g. GMC, Nursing and Midwifery Council, Royal Colleges, etc.)"

It's therefore possible that this difference stems from a definitional mismatch.
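The size of each gap can be checked with a few lines of arithmetic. The sketch below recomputes the rounded percentages from the counts in the two tables above (58 toolkit reports, 37 ONS reports) and lists the difference in percentage points. The variable and function names are illustrative only.

```python
# Counts copied from the two tables above: (PFD Toolkit, ONS).
TOOLKIT_TOTAL, ONS_TOTAL = 58, 37
counts = {
    "Government department or minister": (26, 15),
    "NHS Trust or CCG": (28, 15),
    "Professional body": (6, 12),
    "Local council": (12, 8),
    "Other": (21, 10),
}


def pct(n, total):
    """Share of reports as a whole-number percentage."""
    return round(n / total * 100)


# Toolkit percentage minus ONS percentage, per addressee category.
gaps = {
    name: pct(toolkit_n, TOOLKIT_TOTAL) - pct(ons_n, ONS_TOTAL)
    for name, (toolkit_n, ons_n) in counts.items()
}

for name, gap in gaps.items():
    print(f"{name:<35} {gap:+d} percentage points")

# Category with the largest absolute gap.
largest = max(gaps, key=lambda name: abs(gaps[name]))
print(f"Largest gap: {largest}")
```

On these figures 'Professional body' has the largest gap by some distance, while 'Local council' is within a percentage point.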

### Categorise 'themes' from coroner concerns

ONS coded the **coroner's concerns** sections into six primary themes: service provision, staffing & resourcing, communication, multiple services involved in care, accessing services, and access to harmful content & environment.

Each of these themes contains a number of sub-themes. 

The code below is repetitive: it makes one `screen_reports` call per sub-theme.

In [10]:
theme_screener = Screener(
    llm=llm_client,
    reports=child_suicide_reports,
    match_leniency=None,
    filter_df=False,
    # We only need the Concerns section
    include_investigation=False,
    include_circumstances=False,
    include_concerns=True,
)

# ── Service Provision ──────────────────────────────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Standard operating procedures (e.g. note taking, monitoring, observations) don't exist, are unclear, or not followed correctly",
    result_col_name="sp_sop_inadequate",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Specialist services unavailable or insufficient (e.g. issues with crisis teams, urgent inpatient beds, special educational needs, autism support, deprioritised services, etc.)",
    result_col_name="sp_specialist_services",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Risk assessment documents not completed, assessed inadequately, not updated, or not communicated",
    result_col_name="sp_risk_assessment",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Discharge without review or liaison, self-discharge when detention may be required, poor communication of care requirements to community teams, uncoordinated post-discharge care, inadequate care packages, etc.",
    result_col_name="sp_discharge",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Delayed diagnosis, misdiagnosis, lack of caregiver support for a specific diagnosis, lack of specialist diagnostic training, etc.",
    result_col_name="sp_diagnostics",
)

# ── Staffing & Resourcing ──────────────────────────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Inadequate staff knowledge of suicide-prevention processes, missing grab bags or anti-ligature tools, staff not following procedures, training gaps, etc.",
    result_col_name="sr_training",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Staff not appropriately qualified, inexperienced case worker assigned, or other inadequate staffing levels",
    result_col_name="sr_inadequate_staffing",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of funding to CAMHS services etc., preventing recruitment or provision of specialist services",
    result_col_name="sr_funding",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Unable to recruit specialist staff or retain an adequate number of staff",
    result_col_name="sr_recruitment_retention",
)

# ── Communication ──────────────────────────────────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of communication between CAMHS and foster or care services or schools, or information sharing between services not possible or not conducted",
    result_col_name="comm_between_services",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of communication from CAMHS with child and/or parent, including insufficient family involvement, support or signposting",
    result_col_name="comm_patient_family",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Instances where professionals did not communicate with parents or caregivers, resulting in missed opportunities to intervene",
    result_col_name="comm_confidentiality_risk",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Inadequate communication of policies to staff, inadequate note keeping or record sharing, unclear responsibility for care coordination within a service",
    result_col_name="comm_within_services",
)

# ── Multiple Services Involved in Care ─────────────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Care coordinator not assigned or unclear responsibility for coordinating care needs across multiple services",
    result_col_name="msic_integration_care",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of social services involvement, no social worker, inadequate safeguarding checks, lack of specialist support in schools, or missing school safety plan",
    result_col_name="msic_local_authority",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of support transitioning from CAMHS to adult services or unclear guidance for 16–18-year-olds",
    result_col_name="msic_transition_camhs",
)

# ── Accessing Services ─────────────────────────────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Delay in GP or CAMHS referrals, CAMHS picking up referral, offering appointments, excessive waiting times leading to inappropriate referral, or COVID-19 related delays",
    result_col_name="as_delays_waiting",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Referral rejected due to waiting times, lack of staff, inadequate risk assessment, or complex needs not met by CAMHS",
    result_col_name="as_referral_rejected",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Inadequate contact with child or parent regarding referral, or patient refusal to engage followed by insufficient follow-up",
    result_col_name="as_patient_engagement",
)

# ── Access to Harmful Content & Environments ───────────────────────────────────
child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of internet safeguarding in school or failure of websites or social media to block harmful content",
    result_col_name="ahce_internet",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Sensitive questions or material presented to a child without adequate follow-up, adult support, warnings, or consideration of safety",
    result_col_name="ahce_safeguarding_sensitive",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Access to items that can be used to harm or ligature, or access to alcohol, drugs, or substances where safety concerns are known",
    result_col_name="ahce_harmful_items",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Ability to access railway environments where access should be prevented, such as inadequate fencing",
    result_col_name="ahce_trainline",
)

Sending requests to the LLM (in parallel):   0%|          | 0/58 [00:00<?, ?it/s]


#### Create theme tables

In [11]:
# 1. Service provision
print("\nPrimary theme: Service provision")
service_provision_config = [
    {
        "name": "Standard operating procedures/ processes not followed or adequate",
        "col": "sp_sop_inadequate",
    },
    {
        "name": "Specialist services (crisis, autism, beds)",
        "col": "sp_specialist_services",
    },
    {"name": "Risk assessment", "col": "sp_risk_assessment"},
    {"name": "Discharge from services", "col": "sp_discharge"},
    {"name": "Diagnostics", "col": "sp_diagnostics"},
]
service_provision_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in service_provision_config
]
service_provision_df = pd.DataFrame(service_provision_data)
print(service_provision_df.to_string(index=False))

# 2. Staffing and resourcing 
print("\n---\n\nPrimary theme: Staffing and resourcing\n")
staffing_resourcing_config = [
    {"name": "Training", "col": "sr_training"},
    {"name": "Inadequate staffing", "col": "sr_inadequate_staffing"},
    {"name": "Funding", "col": "sr_funding"},
    {"name": "Recruitment and retention", "col": "sr_recruitment_retention"},
]
staffing_resourcing_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in staffing_resourcing_config
]
staffing_resourcing_df = pd.DataFrame(staffing_resourcing_data)
print(staffing_resourcing_df.to_string(index=False))

# 3. Communication
print("\n---\n\nPrimary theme: Communication\n")
communication_config = [
    {"name": "Between services", "col": "comm_between_services"},
    {"name": "With patient and family", "col": "comm_patient_family"},
    {
        "name": "Confidentiality risk not communicated",
        "col": "comm_confidentiality_risk",
    },
    {"name": "Within services", "col": "comm_within_services"},
]
communication_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in communication_config
]
communication_df = pd.DataFrame(communication_data)
print(communication_df.to_string(index=False))

# 4. Multiple services involved in care
print("\n---\n\nPrimary theme: Multiple services involved in care\n")
multi_services_config = [
    {"name": "Integration of care", "col": "msic_integration_care"},
    {
        "name": "Local Authority (incl child services, schools)",
        "col": "msic_local_authority",
    },
    {"name": "Transition from CAMHS", "col": "msic_transition_camhs"},
]
multi_services_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in multi_services_config
]
multi_services_df = pd.DataFrame(multi_services_data)
print(multi_services_df.to_string(index=False))

# 5. Accessing services 
print("\n---\n\nPrimary theme: Accessing services\n")
accessing_services_config = [
    {"name": "Delays in referrals and waiting times", "col": "as_delays_waiting"},
    {"name": "Referral rejected", "col": "as_referral_rejected"},
    {"name": "Patient engagement", "col": "as_patient_engagement"},
]
accessing_services_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in accessing_services_config
]
accessing_services_df = pd.DataFrame(accessing_services_data)
print(accessing_services_df.to_string(index=False))

# 6. Access to harmful content and environment
print("\n---\n\nPrimary theme: Access to harmful content and environment\n")
harmful_content_config = [
    {"name": "Internet", "col": "ahce_internet"},
    {
        "name": "Safeguarding from sensitive material",
        "col": "ahce_safeguarding_sensitive",
    },
    {"name": "Harmful items/ substances", "col": "ahce_harmful_items"},
    {"name": "Trainline", "col": "ahce_trainline"},
]
harmful_content_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in harmful_content_config
]
harmful_content_df = pd.DataFrame(harmful_content_data)
print(harmful_content_df.to_string(index=False))


Primary theme: Service provision
                                                        Sub-theme  Number of reports
Standard operating procedures/ processes not followed or adequate                 21
                       Specialist services (crisis, autism, beds)                 29
                                                  Risk assessment                 14
                                          Discharge from services                  2
                                                      Diagnostics                 16

---

Primary theme: Staffing and resourcing

                Sub-theme  Number of reports
                 Training                  5
      Inadequate staffing                 16
                  Funding                  9
Recruitment and retention                  5

---

Primary theme: Communication

                            Sub-theme  Number of reports
                     Between services                 15
              With patient and fami

In [12]:
child_suicide_reports.to_csv('../data/child_suicide_tagged.csv')
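The six tabulation blocks in the theme-table cell all follow one pattern, so they could be folded into a single helper. The sketch below is one possible arrangement, not toolkit functionality: `count_flags` is a hypothetical name, the `demo` flags are made up, and it assumes each annotation column holds plain True/False values. It indexes by column name and iterates, so it should work on a pandas DataFrame such as `child_suicide_reports` as well as on the dict of lists used here.

```python
def count_flags(table, config):
    """Return (sub-theme name, number of flagged reports) pairs.

    `table` is anything indexable by column name that yields an iterable
    of True/False values, e.g. a pandas DataFrame or a dict of lists.
    """
    return [
        (item["name"], sum(bool(flag) for flag in table[item["col"]]))
        for item in config
    ]


# One entry per primary theme; only a few sub-themes shown for brevity.
themes = {
    "Staffing and resourcing": [
        {"name": "Training", "col": "sr_training"},
        {"name": "Funding", "col": "sr_funding"},
    ],
    "Accessing services": [
        {"name": "Referral rejected", "col": "as_referral_rejected"},
    ],
}

# Made-up flags for three reports, standing in for child_suicide_reports.
demo = {
    "sr_training": [True, False, True],
    "sr_funding": [False, False, True],
    "as_referral_rejected": [False, False, False],
}

results = {theme: count_flags(demo, config) for theme, config in themes.items()}

for theme, rows in results.items():
    print(f"\nPrimary theme: {theme}")
    for name, n in rows:
        print(f"{name:<25} {n}")
```

Keeping the theme definitions in one dictionary also means the same structure could drive both the screening calls and the tables.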

### Check workflow runtime

In [13]:
end = time.time()

elapsed_seconds = int(end - start)

minutes, seconds = divmod(elapsed_seconds, 60)
print(f"Elapsed time: {minutes}m {seconds}s")

Elapsed time: 8m 55s
