## Automating ONS Research on Child Suicides

In February 2025, the ONS published [research](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/mentalhealth/bulletins/preventionoffuturedeathreportsforsuicideinchildreninenglandandwales/january2015tonovember2023) analysing:

> Prevention of Future Death reports for suicide in children in England and Wales: January 2015 to November 2023

This notebook evaluates the performance of PFD Toolkit in replicating and automating the ONS approach.

### Identifying the reports

#### Loading all reports

In [None]:
from pfd_toolkit import load_reports

reports = load_reports(category='all',
                    start_date="2015-01-01",
                    end_date="2023-11-01")

print(f"In total, there were {len(reports)} PFD reports published between January 2015 and November 2023")

#### Create 'Screener' specification to filter reports

**Note:** The ONS analysis defines a "child" as "aged 18 years and under," with included cases ranging from 12 to 18 years old. While the standard UK definition of a child is "under 18," for consistency we have adopted the ONS inclusion criteria.

First, we set up the LLM client and the Screener.


In [None]:
from pfd_toolkit import LLM, Screener
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, 
                 max_workers=25)

# Set up Screener
user_query = (
    "Where the deceased is **explicitly** noted as being aged 18 or younger *AND* the death was due to suicide."
)

child_suicide_screener = Screener(llm=llm_client,
                                  reports=reports,
                                  user_query=user_query,
                                  match_leniency=None,
                                  include_concerns=False
                                  )

The Screener turns this query into a prompt for the LLM. We can inspect the prompt it builds:

In [None]:
child_suicide_screener._build_prompt_template(user_query)

Now we can run the Screener and assign the results to `child_suicide_reports`.

In [None]:
%%time

child_suicide_reports = child_suicide_screener.screen_reports()

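# Implicit string concatenation avoids the stray newline and indentation
# that a triple-quoted f-string would embed in the printed message
print(
    f"From the initial {len(reports)} reports, PFD Toolkit "
    f"identified {len(child_suicide_reports)} reports on child suicide"
)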

For context, the ONS identified 37 relevant reports.
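Raw counts only go so far. If the identifiers of the reports included by ONS were available, the two selections could be compared report by report. The sketch below assumes each selection is a set of report IDs; `compare_selections` is a hypothetical helper (not part of PFD Toolkit) and the IDs are made up for illustration. Reports flagged by the toolkit but not by ONS are not necessarily errors, since the toolkit found more reports overall; they would still need manual review.

```python
def compare_selections(toolkit_ids: set, reference_ids: set) -> dict:
    """Compare two sets of report IDs, treating `reference_ids` as the benchmark."""
    overlap = toolkit_ids & reference_ids
    # Guard against division by zero when either selection is empty
    precision = len(overlap) / len(toolkit_ids) if toolkit_ids else 0.0
    recall = len(overlap) / len(reference_ids) if reference_ids else 0.0
    return {
        "overlap": len(overlap),
        "toolkit_only": len(toolkit_ids - reference_ids),
        "reference_only": len(reference_ids - toolkit_ids),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical IDs, for illustration only
toolkit = {"2019-0101", "2020-0202", "2021-0303", "2022-0404"}
ons = {"2019-0101", "2021-0303", "2023-0505"}
print(compare_selections(toolkit, ons))
```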

---

In [None]:
# Save & reload reports to keep progress...
child_suicide_reports.to_csv('../data/child_suicide.csv', index=False)

In [2]:
import pandas as pd
child_suicide_reports = pd.read_csv('../data/child_suicide.csv')
len(child_suicide_reports)

61
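One thing to watch when saving and reloading: `DataFrame.to_csv` writes the index by default, and `pd.read_csv` then reads it back as an extra `Unnamed: 0` column. The self-contained sketch below uses an in-memory buffer in place of `../data/child_suicide.csv` to show the difference `index=False` makes (alternatively, pass `index_col=0` when reading).

```python
import io

import pandas as pd

df = pd.DataFrame({"id": ["a", "b"], "sent_gov": [True, False]})

# Default: the index is written out as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
print(pd.read_csv(buf).columns.tolist())  # ['Unnamed: 0', 'id', 'sent_gov']

# With index=False, the round trip keeps only the original columns
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
print(pd.read_csv(buf).columns.tolist())  # ['id', 'sent_gov']
```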

### Categorise addressees

We now want to reproduce the following table from the ONS analysis (although we have identified a larger number of reports):

| Addressee                         | No of reports | %  |
|----------------------------------|---------------|----|
| Government department or minister| 15            | 41 |
| NHS Trust or CCG                 | 15            | 41 |
| Professional body                | 12            | 32 |
| Local council                    | 8             | 22 |
| Other                            | 10            | 27 |

To do this, we create a new Screener and run it once for each addressee category. We also switch on 'annotate mode' (`filter_df=False`), which keeps every report and instead adds a classification column. We can name this column and use it to build the tabulation.

The code below is rather repetitive; future versions of PFD Toolkit will address this with a dedicated Categorisation module.
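In the meantime, the repetition can be reduced by keeping the queries in a dictionary keyed by result column and looping over it. This sketch assumes, as the repeated reassignment in this notebook suggests, that each `screen_reports(user_query=..., result_col_name=...)` call returns the reports with every column added so far. `run_queries` is a hypothetical helper, and `KeywordScreener` is a toy offline stand-in for the LLM-backed `Screener` (simple substring matching on a hypothetical `receiver` column); neither is part of PFD Toolkit.

```python
import pandas as pd


def run_queries(screener, queries: dict) -> pd.DataFrame:
    """Run one annotate-mode screen per query; each key becomes a result column.

    Returns the output of the last call (None if `queries` is empty).
    """
    result = None
    for col_name, query in queries.items():
        result = screener.screen_reports(user_query=query, result_col_name=col_name)
    return result


class KeywordScreener:
    """Toy stand-in for Screener (hypothetical): flags reports whose receiver
    text contains the query string, ignoring case."""

    def __init__(self, reports: pd.DataFrame, text_col: str = "receiver"):
        self.reports = reports.copy()  # leave the caller's DataFrame untouched
        self.text_col = text_col

    def screen_reports(self, user_query: str, result_col_name: str) -> pd.DataFrame:
        self.reports[result_col_name] = (
            self.reports[self.text_col]
            .str.lower()
            .str.contains(user_query.lower(), regex=False)
        )
        return self.reports


toy_reports = pd.DataFrame(
    {"receiver": ["Department of Health; NHS England", "Local council", "NHS Trust"]}
)
toy_queries = {"sent_gov": "department", "sent_nhs": "nhs", "sent_council": "council"}
annotated = run_queries(KeywordScreener(toy_reports), toy_queries)
print(annotated)

# With the real Screener defined in the next cell, the call might look like:
# child_suicide_reports = run_queries(addressee_screener, {
#     "sent_gov": "Sent to government department or minister",
#     "sent_nhs": "Sent to NHS Trust, CCG or ICS",
# })
```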

In [3]:
from pfd_toolkit import Screener, LLM
from dotenv import load_dotenv
import os

# Load OpenAI API key from local environment
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, max_workers=25)

addressee_screener = Screener(llm=llm_client,
                              reports=child_suicide_reports,
                              filter_df=False,
                              # Turn off long sections
                              include_investigation=False,
                              include_circumstances=False,
                              include_concerns=False,
                              # Turn on receiver section
                              include_receiver=True
                              )

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to government department or minister",
    result_col_name="sent_gov"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to NHS Trust, CCG or ICS", 
    result_col_name="sent_nhs"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to some form of professional body, but **not** govt, NHS Trust/CCG/ICS nor local council", 
    result_col_name="sent_prof_body"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query="Sent to a local council",
    result_col_name="sent_council"
)

child_suicide_reports = addressee_screener.screen_reports(
    user_query=(
        "Sent to any of the following: "
        "rail or road company, police (including British Transport Police), **private** companies/hospital or CAMHS)"
    ),
    result_col_name="sent_other"
)


#### Create summary table

In [4]:
import pandas as pd

categories_config = [
    {"name": "Government department or minister", "col": "sent_gov"},
    {"name": "NHS Trust or CCG", "col": "sent_nhs"},
    {"name": "Professional body", "col": "sent_prof_body"},
    {"name": "Local council", "col": "sent_council"},
    {"name": "Other", "col": "sent_other"},
]

total_reports = len(child_suicide_reports)

summary_data = [
    {
        "Addressee": cat["name"],
        "No of reports": int(child_suicide_reports[cat["col"]].sum()),
        "%": int(
            round((child_suicide_reports[cat["col"]].sum() / total_reports) * 100)
        ),
    }
    for cat in categories_config
]

summary_table_df = pd.DataFrame(summary_data)

print("Summary Table of Report Addressees:")
print(summary_table_df.to_string(index=False))

Summary Table of Report Addressees:
                        Addressee  No of reports  %
Government department or minister             19 31
                 NHS Trust or CCG             21 34
                Professional body              4  7
                    Local council              8 13
                            Other              7 11
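A single report can be sent to several addressees, so a report can be flagged in more than one column and the percentages need not sum to 100 (the ONS percentages sum to 163). Conversely, the table above contains 19 + 21 + 4 + 8 + 7 = 59 flags across 61 reports, so at least two reports were not assigned to any of the five categories, which may be worth checking by hand. The sketch below is a vectorised alternative to the list comprehension that also counts unflagged reports; `summarise_flags` is a hypothetical helper and the data are toy values.

```python
import pandas as pd


def summarise_flags(df: pd.DataFrame, labels: dict):
    """Tabulate boolean flag columns; `labels` maps column name -> display name.

    Returns the summary table and the number of reports with no flag set.
    """
    cols = list(labels)
    counts = df[cols].sum().astype(int)
    summary = pd.DataFrame(
        {
            "Addressee": [labels[c] for c in cols],
            # .to_numpy() avoids aligning on the Series index (the column names)
            "No of reports": counts.to_numpy(),
            "%": (counts / len(df) * 100).round().astype(int).to_numpy(),
        }
    )
    unflagged = int((~df[cols].any(axis=1)).sum())
    return summary, unflagged


toy = pd.DataFrame(
    {
        "sent_gov": [True, True, False, False],
        "sent_nhs": [True, False, True, False],
    }
)
summary, unflagged = summarise_flags(
    toy, {"sent_gov": "Government department or minister", "sent_nhs": "NHS Trust or CCG"}
)
print(summary.to_string(index=False))
print(f"Reports with no addressee flagged: {unflagged}")  # 1

# On the real data: summarise_flags(child_suicide_reports,
#                                   {c["col"]: c["name"] for c in categories_config})
```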


### Categorise 'themes' from coroner concerns

ONS coded the **coroner's concerns** sections into six primary themes: service provision; staffing and resourcing; communication; multiple services involved in care; accessing services; and access to harmful content and environment.

Each of these themes contains a number of sub-themes.

As with the addressees, we run the Screener once per sub-theme, so the code is repetitive.

In [None]:
theme_screener = Screener(
    llm=llm_client,
    reports=child_suicide_reports,
    match_leniency=None,
    filter_df=False,
    # We only need to read the concerns section
    include_investigation=False,
    include_circumstances=False,
    include_concerns=True,
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Issues with inadequate or un-followed standard operating procedures in service provision.",
    result_col_name="sp_sop_inadequate",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Concerns about specialist services (e.g., crisis, autism, beds) within service provision.",
    result_col_name="sp_specialist_services",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Failures or inadequacies in risk assessment during service provision.",
    result_col_name="sp_risk_assessment",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Problems related to discharge from services within service provision.",
    result_col_name="sp_discharge",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Issues concerning diagnostics as part of service provision.",
    result_col_name="sp_diagnostics",
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Inadequate or insufficient training for staff impacting staffing and resourcing.",
    result_col_name="sr_training",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Problems due to inadequate staffing levels or poor staff deployment.",
    result_col_name="sr_inadequate_staffing",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Funding shortages or financial constraints affecting staffing and resourcing.",
    result_col_name="sr_funding",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Difficulties in staff recruitment or retention impacting services.",
    result_col_name="sr_recruitment_retention",
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Breakdowns or poor communication between different services or agencies.",
    result_col_name="comm_between_services",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Failures or difficulties in communication with the patient or their family.",
    result_col_name="comm_patient_family",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Failure to communicate confidentiality risks to relevant parties.",
    result_col_name="comm_confidentiality_risk",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Poor communication or information sharing within a single service or team.",
    result_col_name="comm_within_services",
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Lack of integration or coordination of care when multiple services are involved.",
    result_col_name="msic_integration_care",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Issues related to Local Authority involvement (e.g., child services, schools) in multi-agency care.",
    result_col_name="msic_local_authority",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Difficulties or failures during the transition from CAMHS in a multi-service care context.",
    result_col_name="msic_transition_camhs",
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Significant delays in referrals or excessively long waiting times for accessing services.",
    result_col_name="as_delays_waiting",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Referrals being inappropriately or incorrectly rejected for accessing services.",
    result_col_name="as_referral_rejected",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Challenges or failures in engaging patients with necessary services.",
    result_col_name="as_patient_engagement",
)

child_suicide_reports = theme_screener.screen_reports(
    user_query="Concerns about access to harmful content or negative experiences via the internet.",
    result_col_name="ahce_internet",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Failures in safeguarding individuals from accessing sensitive or inappropriate material.",
    result_col_name="ahce_safeguarding_sensitive",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Access to or presence of harmful items or substances.",
    result_col_name="ahce_harmful_items",
)
child_suicide_reports = theme_screener.screen_reports(
    user_query="Incidents, risks, or access to harm specifically related to trainlines or railway environments.",
    result_col_name="ahce_trainline",
)
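The six tabulation blocks in the cell that follows differ only in their configuration lists. A more compact sketch keeps every sub-theme in one nested mapping (primary theme -> result column -> sub-theme label) and loops over it. It also counts the reports flagged under at least one sub-theme of each primary theme, which is one possible way to get a primary-theme total, not necessarily how ONS counted them. `THEMES` and `tabulate_themes` are hypothetical names, only two themes are filled in for brevity, and the data are toy values.

```python
import pandas as pd

# Hypothetical nested mapping: primary theme -> {result column: sub-theme label}
THEMES = {
    "Staffing and resourcing": {
        "sr_training": "Training",
        "sr_inadequate_staffing": "Inadequate staffing",
        "sr_funding": "Funding",
        "sr_recruitment_retention": "Recruitment and retention",
    },
    "Multiple services involved in care": {
        "msic_integration_care": "Integration of care",
        "msic_local_authority": "Local Authority (incl child services, schools)",
        "msic_transition_camhs": "Transition from CAMHS",
    },
}


def tabulate_themes(df: pd.DataFrame, themes: dict) -> dict:
    """Return {primary theme: (sub-theme table, reports flagged under any sub-theme)}."""
    tables = {}
    for primary, subthemes in themes.items():
        cols = list(subthemes)
        table = pd.DataFrame(
            {
                "Sub-theme": [subthemes[c] for c in cols],
                "Number of reports": df[cols].sum().astype(int).to_numpy(),
            }
        )
        any_flagged = int(df[cols].any(axis=1).sum())
        tables[primary] = (table, any_flagged)
    return tables


# Toy data: three reports, all flags False except the three set below
toy = pd.DataFrame(
    {col: [False, False, False] for subthemes in THEMES.values() for col in subthemes}
)
toy.loc[0, "sr_training"] = True
toy.loc[0, "sr_funding"] = True
toy.loc[1, "msic_transition_camhs"] = True

for primary, (table, any_flagged) in tabulate_themes(toy, THEMES).items():
    print(f"\nPrimary theme: {primary} (reports with any sub-theme: {any_flagged})")
    print(table.to_string(index=False))
```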


In [None]:
# 1. Service provision
print("\nPrimary theme: Service provision")
service_provision_config = [
    {
        "name": "Standard operating procedures/ processes not followed or adequate",
        "col": "sp_sop_inadequate",
    },
    {
        "name": "Specialist services (crisis, autism, beds)",
        "col": "sp_specialist_services",
    },
    {"name": "Risk assessment", "col": "sp_risk_assessment"},
    {"name": "Discharge from services", "col": "sp_discharge"},
    {"name": "Diagnostics", "col": "sp_diagnostics"},
]
service_provision_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in service_provision_config
]
service_provision_df = pd.DataFrame(service_provision_data)
print(service_provision_df.to_string(index=False))

# 2. Staffing and resourcing 
print("\n---\n\nPrimary theme: Staffing and resourcing\n")
staffing_resourcing_config = [
    {"name": "Training", "col": "sr_training"},
    {"name": "Inadequate staffing", "col": "sr_inadequate_staffing"},
    {"name": "Funding", "col": "sr_funding"},
    {"name": "Recruitment and retention", "col": "sr_recruitment_retention"},
]
staffing_resourcing_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in staffing_resourcing_config
]
staffing_resourcing_df = pd.DataFrame(staffing_resourcing_data)
print(staffing_resourcing_df.to_string(index=False))

# 3. Communication
print("\n---\n\nPrimary theme: Communication\n")
communication_config = [
    {"name": "Between services", "col": "comm_between_services"},
    {"name": "With patient and family", "col": "comm_patient_family"},
    {
        "name": "Confidentiality risk not communicated",
        "col": "comm_confidentiality_risk",
    },
    {"name": "Within services", "col": "comm_within_services"},
]
communication_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in communication_config
]
communication_df = pd.DataFrame(communication_data)
print(communication_df.to_string(index=False))

# 4. Multiple services involved in care
print("\n---\n\nPrimary theme: Multiple services involved in care\n")
multi_services_config = [
    {"name": "Integration of care", "col": "msic_integration_care"},
    {
        "name": "Local Authority (incl child services, schools)",
        "col": "msic_local_authority",
    },
    {"name": "Transition from CAMHS", "col": "msic_transition_camhs"},
]
multi_services_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in multi_services_config
]
multi_services_df = pd.DataFrame(multi_services_data)
print(multi_services_df.to_string(index=False))

# 5. Accessing services 
print("\n---\n\nPrimary theme: Accessing services\n")
accessing_services_config = [
    {"name": "Delays in referrals and waiting times", "col": "as_delays_waiting"},
    {"name": "Referral rejected", "col": "as_referral_rejected"},
    {"name": "Patient engagement", "col": "as_patient_engagement"},
]
accessing_services_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in accessing_services_config
]
accessing_services_df = pd.DataFrame(accessing_services_data)
print(accessing_services_df.to_string(index=False))

# 6. Access to harmful content and environment
print("\n---\n\nPrimary theme: Access to harmful content and environment\n")
harmful_content_config = [
    {"name": "Internet", "col": "ahce_internet"},
    {
        "name": "Safeguarding from sensitive material",
        "col": "ahce_safeguarding_sensitive",
    },
    {"name": "Harmful items/ substances", "col": "ahce_harmful_items"},
    {"name": "Trainline", "col": "ahce_trainline"},
]
harmful_content_data = [
    {
        "Sub-theme": theme["name"],
        "Number of reports": int(child_suicide_reports[theme["col"]].sum()),
    }
    for theme in harmful_content_config
]
harmful_content_df = pd.DataFrame(harmful_content_data)
print(harmful_content_df.to_string(index=False))


Primary theme: Service provision
                                                        Sub-theme  Number of reports
Standard operating procedures/ processes not followed or adequate                 18
                       Specialist services (crisis, autism, beds)                 12
                                                  Risk assessment                  1
                                          Discharge from services                  1
                                                      Diagnostics                  2

---

Primary theme: Staffing and resourcing

                Sub-theme  Number of reports
                 Training                  3
      Inadequate staffing                  2
                  Funding                  4
Recruitment and retention                  0

---

Primary theme: Communication

                            Sub-theme  Number of reports
                     Between services                 14
              With patient and fami