# DEMO of response format capabilities
As the made-up example below shows, the LLM is constrained to produce output in the format laid out in the class called Test. Test must inherit from Pydantic's BaseModel: the OpenAI SDK handles this internally and also uses it to parse the JSON text output back into an instance of the class. Once you have the output object, you can access its attributes as normal. OpenAI enforces the type annotations, so a string is always a string, an integer is always an integer, a List is always a list, and a Literal has to be one of the options you give it. This makes the structure of the LLM output predictable whatever the input is, so downstream code does not break on a malformed response. It constrains the shape of the output only, not whether the content is correct.

Field, from Pydantic, lets you add a description to an attribute. This description is provided to the LLM too, and so are any docstrings!
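One way to check what descriptive text ends up in the schema is to print the JSON schema that Pydantic generates for a model. The sketch below assumes Pydantic v2 (`model_json_schema`), and `Patient` is a hypothetical cut-down model used only for illustration. It shows that the docstring and the Field descriptions are both part of the schema; exactly how pfd_toolkit and the OpenAI SDK forward that schema is not shown here.

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class Patient(BaseModel):  # hypothetical model, for illustration only
    """A short record describing a patient."""

    name: Literal["john", "sam", "other"] = Field(..., description="The name of the patient")
    sibling_ages: List[str] = Field(..., description="The ages of their siblings")


# Pydantic v2 API (assumption); on Pydantic v1 the equivalent call is Patient.schema().
schema = Patient.model_json_schema()

print(schema["description"])                        # the class docstring
print(schema["properties"]["name"]["description"])  # the Field description
print(schema["properties"]["name"]["enum"])         # the allowed Literal values
```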

In [None]:
from pfd_toolkit.llm import LLM
import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field, create_model
from typing import List, Literal
load_dotenv()
class Test(BaseModel):
    """An example response format case, this text is seen by the LLM too!"""
    name: Literal['john', 'sam', 'other'] = Field(..., description='The name of the patient')
    sibling_ages: List[str] = Field(..., description='The ages of their siblings')
    death_reason: str = Field(..., description="A verbose reason for the patient's death")
    category: Literal['suicide', 'old-age', 'covid'] = Field(..., description='Which category the death reason falls into')

In [2]:
llm = LLM(api_key=os.getenv('OPENAI_API_KEY'), model='gpt-4o-mini')

In [40]:
res = llm.generate(prompt='johns siblings are 28 and 29 respectively. john died from confusion and suffered from long-covid which caused cognitive impairment that ultimately resulted in his passing...', response_format=Test)

In [41]:
res

Test(name='john', sibling_ages=['28', '29'], death_reason='John died from confusion and suffered from long-covid which caused cognitive impairment that ultimately resulted in his passing.', category='covid')
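The parsing step can also be tried in isolation, without an API call. This is a minimal sketch assuming Pydantic v2; `Record` is a hypothetical, trimmed-down stand-in for Test, and both JSON strings are made up for the example. It shows JSON text being parsed back into a typed object, and a value outside the Literal options being rejected.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class Record(BaseModel):  # hypothetical, trimmed-down stand-in for Test
    name: Literal["john", "sam", "other"]
    sibling_ages: List[str]


# JSON text of the kind a model might return (made up for this example).
good = '{"name": "john", "sibling_ages": ["28", "29"]}'
bad = '{"name": "alice", "sibling_ages": ["28", "29"]}'  # "alice" is not an allowed name

record = Record.model_validate_json(good)  # Pydantic v2 API (assumption)
print(record.name, record.sibling_ages)

try:
    Record.model_validate_json(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"rejected with {len(exc.errors())} error(s)")
```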

# Classification of PFD report demo

In [None]:
class ReportClassification(BaseModel):
    """The classification of the prevention of future death report based on the given context"""
    classification: Literal['suicide', 'depression', 'covid', 'old age', 'confused']

In [46]:
case = llm.generate(prompt='The cause of death in this case was old age, as the patient lived to an impressive 98 years of age.', response_format=ReportClassification)

In [48]:
case

ReportClassification(classification='old age')

In [49]:
case.classification

'old age'
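If the list of categories is only known at runtime, the same kind of model can be built dynamically. The sketch below is one possible approach using Pydantic's `create_model`, not something the notebook itself does; `DynamicClassification` is a hypothetical name, and whether `llm.generate` accepts a dynamically created model is an assumption that would need checking.

```python
from typing import Literal

from pydantic import ValidationError, create_model

# Same labels as the ReportClassification example.
labels = ["suicide", "depression", "covid", "old age", "confused"]

# Literal[tuple(labels)] is equivalent to Literal["suicide", "depression", ...].
DynamicClassification = create_model(
    "DynamicClassification",  # hypothetical model name
    classification=(Literal[tuple(labels)], ...),  # "..." marks the field as required
)

ok = DynamicClassification(classification="old age")
print(ok.classification)

try:
    # "accident" is not one of the labels, so validation fails.
    DynamicClassification(classification="accident")
    rejected = False
except ValidationError:
    rejected = True
```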

These are the results of my testing of the grammar (response format). Please delete this section, or let me know once you are happy with it, and we'll clean the notebook up. Sorry it's here: I started in this notebook and don't want to move it and redo the LLM calls just to regenerate the cell outputs.

In [1]:
from pfd_toolkit import PFDScraper, llm
from dotenv import load_dotenv
import os
# Load OpenAI API key
load_dotenv("api.env")
openai_api_key = os.getenv("OPENAI_API_KEY")
llm_client = llm.LLM(api_key=openai_api_key)

# Run the scraper! :D
scraper = PFDScraper(
    llm=llm_client,
    category="all",
    date_from="2024-01-10",
    date_to="2024-01-11",
    html_scraping=False,

    pdf_fallback=False,
    llm_fallback=True,
    # docx_conversion="LibreOffice", # Doesn't currently seem to work; need to debug.
    include_time_stamp=False,
    delay_range=None,
    verbose=False,
)
scraper.scrape_reports()
scraper.estimate_api_costs()

While this is a high-performance option, large API costs may be incurred, especially for large requests. 
Consider enabling HTML scraping or .pdf fallback for more cost-effective data extraction.

This will disable delays between requests. This may trigger anti-scraping measures by the host, leading to temporary or permanent IP bans. 
We recommend setting to (1,2).

Fetching pages: 2 page(s) [00:00,  5.79 page(s)/s]
INFO:pfd_toolkit.scraper:Total collected report links: 4
Scraping reports:  25%|██▌       | 1/4 [00:18<00:56, 18.89s/it]

OUTPUT JSON:

 {'date of report': '08/01/2024', "coroner's name": 'Karen HENDERSON', 'area': 'West Sussex, Brighton and Hove', 'receiver': '1. Association of Anaesthetists Great Britain and Ireland\n2. Royal College of Anaesthetists\n3. Chief Executive Health Education, England\n4. CQC (Care Quality Commission)', 'investigation and inquest': 'On 7th January 2022 I resumed an investigation into the death of David Bryan Moore sitting with a Jury. On 21st July 2022, the investigation was concluded:\n\nThe medical cause of death given was:\n\n1a. Hypoxic ischaemic brain injury\n1b. Cardiac arrest\n1c. Dislodged tracheostomy tube and delayed replacement\n1d. Burns suffered in an industrial accident requiring a tracheostomy tube\nII. Obesity, Hypertension\n\nThe jury determined:\n\nMr Moore was a self-employed industrial electrician, employed on the 29th May 2021 to change a molded case circuit breaker (MCCB) at a property in Uxbridge. Mr Moore engineered the circuit to allow the front doors

Scraping reports:  50%|█████     | 2/4 [00:19<00:16,  8.27s/it]

OUTPUT JSON:

 {'date of report': '4 January 2024', "coroner's name": 'Ian Potter', 'area': 'Inner North London', 'receiver': 'Lee Rowley MP, Chief Executive [REDACTED]', 'investigation and inquest': 'On 13 December 2022, an investigation was commenced into the death of BERNADETTE GRACE FAULKNER, then aged 80 years. The investigation concluded at the end of an inquest, heard by me, on 13 December 2023.\n\nThe conclusion of the inquest was accidental death, the medical cause of death being:\n1a respiratory failure\n1b lung contusion\n1c multiple bilateral rib fractures (out of hospital fall, 2/12/2022)\n1d obstructive sleep apnoea, type 2 diabetes mellitus, hypertension, asthma', 'circumstances of death': '(1) Mrs Faulkner rented a flat from her local authority, which was a former Victorian townhouse converted into four separate flats. Her electricity meter (installed in 2001) was in a cupboard, just inside the communal door to the flats, some 7-8 feet off the ground.\n\n(2) Mrs Faulkne

Scraping reports:  75%|███████▌  | 3/4 [00:20<00:04,  4.86s/it]

OUTPUT JSON:

 {'date of report': '28/12/2023', "coroner's name": 'Victoria DAVIES', 'area': 'Cheshire', 'receiver': 'Department for Health and Social Care, National Crime Agency, Department for Science, Innovation & Technology', 'investigation and inquest': 'On 17 November 2017 I commenced an investigation into the death of Adrian Brendan GALLAGHER aged 24. The investigation concluded at the end of the inquest on 19 December 2023. The conclusion of the inquest was that:\n\nThis was a death due to suicide.', 'circumstances of death': 'Adrian Gallagher had a history of mental health struggles dating back to 2013, with no definitive diagnosis. On 12 June 2017 he was admitted to Hollins Park Hospital as an informal patient, but was discharged at his request on 16 June 2017. The same day, he was taken to hospital having been found intoxicated at a bridge, with suicidal ideation. He was formally sectioned under the Mental Health Act the following day and re-admitted to Hollins Park. During 

Scraping reports: 100%|██████████| 4/4 [00:22<00:00,  5.60s/it]
INFO:pfd_toolkit.scraper:Estimated API cost for LLM fallback (model: gpt-4o-mini): $0.00 based on 0 missing fields.


OUTPUT JSON:

 {'date of report': '8 January 2024', "coroner's name": 'Dr Nicholas Shaw', 'area': 'Cumbria', 'receiver': '[REDACTED]', 'investigation and inquest': 'On 20 December 2022 I commenced an investigation into the death of Walter FAULDER. The investigation concluded at the end of the inquest . The conclusion of the inquest was Death due to Road Traffic Collision and the medical cause of death was 1a Multiple Injuries', 'circumstances of death': "Walter Faulder BEM was an 88 year old gentleman who lived alone in Wigton, Cumberland. He had recently been discharged from hospital.\n\nOn Saturday 10th December 2022, Walter's daughter visited his address and there was a Cranston’s shopping bag next to his chair. He said he had got the bus to Orton Grange (where the shop is) and back which raised alarm bells to his daughter. She asked him not to do this journey again as it is unsafe. Walter stated that he believes there is a pelican crossing to assist him crossing the road (A595). Th
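The warning in the output above recommends a delay range of (1, 2) instead of `delay_range=None`. As a rough illustration of what such a range usually means, the sketch below sleeps for a random duration between the two bounds before each request. `polite_pause` is a hypothetical helper, not part of pfd_toolkit, and treating the bounds as seconds is an assumption.

```python
import random
import time
from typing import Callable, Tuple


def polite_pause(
    delay_range: Tuple[float, float] = (1, 2),
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration within delay_range (assumed seconds) and return it."""
    low, high = delay_range
    delay = random.uniform(low, high)
    sleep(delay)  # injectable so the pause can be stubbed out in tests
    return delay
```

Calling `polite_pause()` before each request spreads requests out by one to two seconds, which is the behaviour the warning suggests restoring.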

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
scraper.reports

Unnamed: 0,URL,ID,Date,CoronerName,Area,Receiver,InvestigationAndInquest,CircumstancesOfDeath,MattersOfConcern
0,https://www.judiciary.uk/prevention-of-future-...,N/A: Not found,08/01/2024,Karen HENDERSON,"West Sussex, Brighton and Hove",1. Association of Anaesthetists Great Britain ...,On 7th January 2022 I resumed an investigation...,The conclusion of the jury at the Inquest prov...,During the course of the investigation my inqu...
1,https://www.judiciary.uk/prevention-of-future-...,N/A: Not found,4 January 2024,Ian Potter,Inner North London,"Lee Rowley MP, Chief Executive [REDACTED]","On 13 December 2022, an investigation was comm...",(1) Mrs Faulkner rented a flat from her local ...,During the course of the inquest the evidence ...
2,https://www.judiciary.uk/prevention-of-future-...,N/A: Not found,28/12/2023,Victoria DAVIES,Cheshire,"Department for Health and Social Care, Nationa...",On 17 November 2017 I commenced an investigati...,Adrian Gallagher had a history of mental healt...,During the course of the investigation my inqu...
3,https://www.judiciary.uk/prevention-of-future-...,N/A: Not found,8 January 2024,Dr Nicholas Shaw,Cumbria,[REDACTED],On 20 December 2022 I commenced an investigati...,Walter Faulder BEM was an 88 year old gentlema...,During the course of the inquest the evidence ...


In [4]:
sample_report = scraper.reports.iloc[0:1,:]

In [9]:
print(sample_report.InvestigationAndInquest[0])

On 7th January 2022 I resumed an investigation into the death of David Bryan Moore sitting with a Jury. On 21st July 2022, the investigation was concluded:

The medical cause of death given was:

1a. Hypoxic ischaemic brain injury
1b. Cardiac arrest
1c. Dislodged tracheostomy tube and delayed replacement
1d. Burns suffered in an industrial accident requiring a tracheostomy tube
II. Obesity, Hypertension

The jury determined:

Mr Moore was a self-employed industrial electrician, employed on the 29th May 2021 to change a molded case circuit breaker (MCCB) at a property in Uxbridge. Mr Moore engineered the circuit to allow the front doors of the property to open. On doing this the metal plate divider between the MCCB’s made contact with the exposed live bus bars resulting in an electrical flashover. As a result, Mr Moore sustained burns covering 32 % of his body surface area.

Mr Moore was transferred to St Mary’s Hospital where he was intubated, ventilated and had surgical release of bur

In [5]:
scraper.verbose = True
result = scraper.run_llm_fallback(reports_df=sample_report)

Running LLM Fallback: 100%|██████████| 1/1 [00:00<?, ?it/s]


In [21]:
result

Unnamed: 0,URL,ID,Date,CoronerName,Area,Receiver,InvestigationAndInquest,CircumstancesOfDeath,MattersOfConcern
0,https://www.judiciary.uk/prevention-of-future-...,N/A: Not found,28/12/2023,Victoria DAVIES,Cheshire,"Department for Health and Social Care, Nationa...",On 17 November 2017 I commenced an investigati...,Adrian Gallagher had a history of mental healt...,During the course of the investigation my inqu...
