# Getting started

This page talks you through an example workflow using PFD Toolkit: loading a dataset and screening for relevant cases.

It doesn't cover everything: for more, we strongly suggest browsing through the pages in the top panel.

---

## Installation

PFD Toolkit can be installed with pip as `pfd_toolkit`:

```bash
pip install pfd_toolkit
```
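To confirm the install worked, one option is to ask Python for the installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is ours for illustration and is not part of PFD Toolkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None
print(installed_version("pfd_toolkit"))
```

If this prints `None`, the package is probably installed into a different Python environment from the one you are running.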

## Load your first dataset

First, you'll need to load a PFD dataset. These datasets are updated weekly, meaning you always have access to the latest reports with minimal setup.

```python
from pfd_toolkit import load_reports

# Load all PFD reports from January 2024 to May 2025
reports = load_reports(
    start_date="2024-01-01",
    end_date="2025-05-01",
)

# Identify number of reports
num_reports = len(reports)

# Preview the first five reports
reports.head(n=5)
```

| | url | id | date | coroner | area | receiver | investigation | circumstances | concerns |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.judiciary.uk/prevention-of-future-... | 2025-0209 | 2025-05-01 | A. Hodson | Birmingham and Solihull | NHS England; The Robert Jones and Agnes Hunt O... | On 9th December 2024 I commenced an investigat... | At 10.45am on 23rd November 2024, Peter sadly ... | To The Robert Jones and Agnes Hunt Orthopaedic... |
| 1 | https://www.judiciary.uk/prevention-of-future-... | 2025-0208 | 2025-04-30 | J. Andrews | West Sussex, Brighton and Hove | West Sussex County Council | On 2 November 2024 I commenced an investigatio... | Mrs Turner drove her car into the canal at the... | The inquest was told that South Bank is a resi... |
| 2 | https://www.judiciary.uk/prevention-of-future-... | 2025-0207 | 2025-04-30 | A. Mutch | Manchester South | Flixton Road Medical Centre; Greater Mancheste... | On 1 October 2024 I commenced an investigation... | Louise Danielle Rosendale was prescribed long ... | The inquest heard evidence that Louise Rosenda... |
| 3 | https://www.judiciary.uk/prevention-of-future-... | 2025-0206 | 2025-04-25 | J. Heath | North Yorkshire and York | Townhead Surgery | On 4th June 2024 I commenced an investigation ... | On 15 March 2024, Richard James Moss attended ... | When a referral document is completed by a med... |
| 4 | https://www.judiciary.uk/prevention-of-future-... | 2025-0120 | 2025-04-25 | M. Hassell | Inner North London | The President Royal College Obstetricians and ... | On 23 August 2024, one of my assistant coroner... | Jannat was a big baby and her mother had a his... | With the benefit of a maternity and newborn sa... |
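Before screening, it can help to get a feel for the data. The sketch below assumes `load_reports()` returns a pandas DataFrame with the columns shown in the preview above (the `.head()` call suggests this, but it is an assumption). It uses a small stand-in table, `sample`, built from the previewed rows rather than the real dataset.

```python
import pandas as pd

# Stand-in for `reports`: a few values copied from the preview above.
# Assumption: the real object is a pandas DataFrame with these columns.
sample = pd.DataFrame(
    {
        "id": ["2025-0209", "2025-0208", "2025-0207", "2025-0206", "2025-0120"],
        "date": ["2025-05-01", "2025-04-30", "2025-04-30", "2025-04-25", "2025-04-25"],
        "area": [
            "Birmingham and Solihull",
            "West Sussex, Brighton and Hove",
            "Manchester South",
            "North Yorkshire and York",
            "Inner North London",
        ],
    }
)

# Parse the dates so they compare chronologically rather than as text
sample["date"] = pd.to_datetime(sample["date"])

# Keep only reports dated on or after 30 April 2025
recent = sample[sample["date"] >= pd.Timestamp("2025-04-30")]

# Count how many reports each coroner area has issued
reports_per_area = sample["area"].value_counts()

print(len(recent))
print(reports_per_area)
```

With the real dataset, you would replace `sample` with `reports`.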


## Screening for relevant reports

You're likely using PFD Toolkit because you want to answer a specific question. For example: "Do any PFD reports raise concerns related to detention under the Mental Health Act?"

PFD Toolkit lets you query reports in plain English — no need to know precise keywords or categories. Just describe the cases you care about, and the toolkit will return matching reports.

### Set up an LLM client

Screening and other advanced features use AI, so you first need to set up an LLM client. Head to [platform.openai.com](https://platform.openai.com/docs/overview) and create an API key. Once you've got this, pass it to `LLM`.

```python
from pfd_toolkit import LLM
from dotenv import load_dotenv
import os

# Load OpenAI API key
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialise LLM client
llm_client = LLM(api_key=openai_api_key, max_workers=30)
```
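Here, `api.env` is a plain text file containing a line of the form `OPENAI_API_KEY=your-key-here`. Note that `os.getenv` returns `None` when the variable is missing, so a mistyped file name or variable name may only surface later as a confusing error. A small check like the one below fails early instead; `read_api_key` is a hypothetical helper written for this page, not part of PFD Toolkit.

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file or export it "
            "in your shell before creating the LLM client."
        )
    return key
```

You could then call `openai_api_key = read_api_key()` after `load_dotenv("api.env")`, in place of the `os.getenv` line.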

### Screen reports in plain English

Now, all we need to do is specify our `user_query` (the statement the LLM will use to filter reports), and set up our `Screener` engine.

```python
from pfd_toolkit import Screener

# Create a user query to filter
user_query = "Concerns related to detention under the Mental Health Act **only**"

# Screen reports
screener = Screener(
    llm=llm_client,
    reports=reports,  # Reports that you loaded earlier
)

filtered_reports = screener.screen_reports(
    user_query=user_query,
    produce_spans=True,
)
```

```text
Sending requests to the LLM: 100%|██████████| 883/883 [00:20<00:00, 44.02it/s]
```


```python
# Capture number of screened reports
num_reports_screened = len(filtered_reports)

# Check how many reports we've identified
print(
    f"From our initial {num_reports} reports, PFD Toolkit identified "
    f"{num_reports_screened} reports mentioning concerns around detention "
    "under the Mental Health Act."
)
```

```text
From our initial 883 reports, PFD Toolkit identified 80 reports mentioning concerns around detention under the Mental Health Act.
```
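The progress bar above suggests that screening sends one LLM request per report (883 here), so you may want to save the results rather than re-screen every session. The sketch below assumes `screen_reports()` returns a pandas DataFrame; that is our assumption, not something this page confirms. It uses a tiny stand-in table, `filtered_sample`, and a temporary folder so that it runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for `filtered_reports`.
# Assumption: the real object is a pandas DataFrame.
filtered_sample = pd.DataFrame(
    {
        "id": ["2025-0209", "2025-0208"],
        "area": ["Birmingham and Solihull", "West Sussex, Brighton and Hove"],
    }
)

# Write to a temporary folder here; in a real project, pick a stable path
out_path = Path(tempfile.mkdtemp()) / "filtered_reports.csv"

# index=False stops pandas writing the row index as an extra unnamed column
filtered_sample.to_csv(out_path, index=False)

# Later, reload the saved results instead of screening again
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```

Fields that contain commas, such as the area name in the second row, are quoted automatically on write and restored on read.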


In practice, we'd probably want to extend our start and end dates to cover the entire corpus of reports. We've only kept things short for demo purposes :)

## Discover themes

Now that we've got our collection of reports related to detention under the Mental Health Act, we might want to discover recurring themes: in other words, common occurrences 