```python
#| hide
from police_risk_open_ai.llm import *
from police_risk_open_ai.crawl import *
import os
from dotenv import load_dotenv
import pandas as pd
import openai
# embeddings_utils ships with the pre-1.0 openai package
from openai.embeddings_utils import distances_from_embeddings

# Read the API key from a local .env file
load_dotenv()

openai.api_key = os.getenv("OPENAI_API_KEY")

# Load the pre-computed document embeddings
df = pd.read_parquet('processed/embeddings.parquet')
```

# An OpenAI Experiment - CopBot!

In case you've been living under a rock and missed the latest news from [OpenAI](https://openai.com/), their newest version of ChatGPT, and the [large language model](https://en.wikipedia.org/wiki/Large_language_model) underpinning it, is absolute technological wizardry at this point. Given that using the tools in development is [actually quite affordable](https://openai.com/pricing), I thought I'd build a few prototypes for public safety use cases and see how they perform.

![CopBot](images/cop_bot.png)

Reading, evaluating and prioritising risk is a core policing skill: from investigating crimes with piles of witness statements in dingy offices, to responding to life-threatening incidents at 2am on a rainy street, the key challenge is digesting piles of information quickly and deciding what to do first, taking into account decades' worth of policing legislation and making sure your decision is justifiable when it inevitably goes wrong.

So can an AI learn to speak police and make vaguely convincing risk assessments? I scraped all the policing guidance I could find, fed it to an OpenAI-powered model, and asked it to predict risk for missing people in an explainable way...and it kind of works! Of course, don't use it for any real police work, but you [can try out the prototype here](https://andreasthinks-police-risk-open-ai-main-y0l65v.streamlit.app/) for as long as I don't go over my usage limits.

For those who want to get into the detail, the [project code is available here](https://github.com/AndreasThinks/police-risk-open-ai) (built using [Jupyter Notebooks with nbdev](https://nbdev.fast.ai/), which makes the code easy to read, was a real delight to use, and deserves its own blog post). I thought I'd put together a post to explain the high-level principles and document some thoughts.


## Teaching policing to AI

You've probably already played with the [public version of ChatGPT](https://openai.com/blog/chatgpt), and hopefully understand the basic principles: the language model is trained on a huge corpus of publicly available text and taught to answer questions in a helpful way...but not necessarily an operationally useful one, nor one that takes account of legislation and best practice. If you had all the relevant documentation to hand, and knew exactly which parts were most relevant, you could just feed it into your prompt (something like "answer this operational question, but consider this legislative text"), but how do you do that if you don't know what's relevant? If you want to teach it how to investigate missing people, where do you even start? Thankfully, the College of Policing, the professional body for policing, publishes its [Authorised Professional Practice](https://www.college.police.uk/app/major-investigation-and-public-protection/missing-persons) in one place (though sadly not through an API), so we'll start there, using a clever crawler script to download each page, as well as any other document it refers to.

The next stage is to convert all those documents into [embeddings](https://platform.openai.com/docs/guides/embeddings), turning them from text into a numerical representation of what the model thinks they actually mean. I've previously done those computations myself, using libraries like [Hugging Face](https://huggingface.co/) or [spaCy](https://spacy.io/), but [OpenAI provides all of their embeddings through a quite affordable API you can query](https://platform.openai.com/docs/guides/embeddings).
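To make the idea concrete, here's a minimal sketch of what "a table of documents plus embeddings" might look like. The `toy_embed` function is a hypothetical stand-in that hashes words into a small bag-of-words vector so the example runs offline; in the real pipeline that call would go to OpenAI's embeddings API instead. The column names `text` and `embeddings` are my assumption, not necessarily the project's actual schema.

```python
import hashlib

import numpy as np
import pandas as pd

DIM = 64  # toy size; real OpenAI embeddings have far more dimensions


def toy_embed(text: str, dim: int = DIM) -> np.ndarray:
    """Hypothetical stand-in for an embeddings API call.

    Hashes each lower-cased word into one of `dim` buckets (a bag-of-words
    vector), then L2-normalises the result so vectors are comparable.
    """
    vec = np.zeros(dim)
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def embed_corpus(texts, embed_fn=toy_embed) -> pd.DataFrame:
    """One row per document: its text and its embedding vector.

    The column names are assumptions, not the project's real schema.
    """
    texts = list(texts)
    return pd.DataFrame({"text": texts, "embeddings": [embed_fn(t) for t in texts]})


corpus = embed_corpus([
    "Very detailed information and a lifestyle profile will be needed in high-risk cases",
    "A constable may require a preliminary breath test after an accident",
])
print(corpus.shape)  # (2, 2)
```

Swapping `embed_fn` for a real API call is the only change needed to move from the toy version to actual embeddings, which is why it's passed in as a parameter.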

::: {.callout-note collapse="true"}
## Embeddings
:::

Once we've converted all our documentation into embeddings, we can quickly calculate the distance between our question and each document, which tells us which pieces of our corpus are most closely associated with the question we're asking!
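The setup cell imports `distances_from_embeddings` from the pre-1.0 `openai.embeddings_utils` module for this step. Here's a numpy-only sketch of the same idea, assuming cosine distance (one minus cosine similarity) as the metric: normalise every vector, take dot products with the question, and sort nearest-first.

```python
import numpy as np


def cosine_distances(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine distance (1 - cosine similarity) from one query vector to every row of docs."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return 1.0 - d @ q


def rank_documents(query, docs, top_k=3):
    """Return the indices of the top_k nearest documents (nearest first) and all distances."""
    query = np.asarray(query, dtype=float)
    docs = np.asarray(docs, dtype=float)
    dist = cosine_distances(query, docs)
    return np.argsort(dist)[:top_k].tolist(), dist


order, dist = rank_documents([1.0, 0.1], [[1, 0], [0, 1], [1, 1]])
print(order)  # [0, 2, 1]
```

The query points almost exactly along the first document, halfway-ish towards the third, and nearly orthogonal to the second, hence the ordering.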

Unlike ChatGPT, though, we want to limit our model to *only answer questions from that corpus*: if our documents don't contain information about a certain topic, [don't just go and hallucinate a whole new answer](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)). Then it's just a matter of embedding our question, extracting the most relevant documents, feeding them into our prompt, and asking OpenAI to complete the answer, *unless it can't answer based on the documentation provided*.
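In code, that boils down to two small steps: concatenate the nearest documents up to some size budget, then wrap them in a prompt that explicitly permits "I don't know". The sketch below is hypothetical: `build_context` and `build_prompt` are illustrative names, it uses a word count as a rough proxy for tokens, and the `###` separator is only a guess based on the `###` line that shows up in one of the debug outputs, not necessarily what the project does.

```python
def build_context(ranked_texts, max_words=300, sep="\n\n###\n\n"):
    """Join the closest documents (already sorted nearest-first) until a
    rough word budget is reached. Word count stands in for token count."""
    chunks, used = [], 0
    for text in ranked_texts:
        n = len(text.split())
        if used + n > max_words:
            break
        chunks.append(text)
        used += n
    return sep.join(chunks)


def build_prompt(context, question):
    """Prompt that tells the model to answer only from the supplied context."""
    return (
        "Answer the question based on the context below, and if the question "
        "can't be answered based on the context, say \"I don't know\".\n\n"
        f"Context: {context}\n\n---\n\nQuestion: {question}\nAnswer:"
    )


context = build_context(["a b c", "d e", "f g h i"], max_words=5)
print(build_prompt(context, "What day is it?"))
```

The explicit "I don't know" escape hatch is what lets the model decline rather than invent an answer when the retrieved context is irrelevant.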

So how does it work?  Well, let's start by asking it some generic questions about policy.

```python
answer_question(df, question="What are the most important factors to consider when searching for a missing person?", debug=True)
```

Context:
Very detailed information and a lifestyle profile will be needed in high-risk cases consider taking a full statement from the person reporting the missing person as well as any other key individuals (for example, the last person to see them) conduct initial searches of relevant premises, the extent and nature of the search should be recorded (see Search) consider seizing electronic devices, computers, and other documentation, (for example, diaries, financial records and notes) and obtain details of usernames and passwords obtain photos of the missing person; these should ideally be current likeness of the missing person and obtained in a digital format obtain details of the individual’s mobile phone and if they have it with them; if the missing person has a mobile phone arrange for a TextSafe© to be sent by the charity, Missing People obtain details of any vehicles that they may have access to and place markers on relevant vehicles on the PNC without delay consider obtaining a

'The primary consideration for the first responder is the safety of the missing person. Judgements made at this early stage may have a significant impact on the outcome of the investigation. The initial investigating officer should begin the investigation by identifying places where the person might be, check information and assumptions, corroborate what they have been told, review the risk assessment, seek and secure evidence, conduct appropriate searches, conduct appropriate intelligence checks, continually reassess the level of risk using the risk principles, assess the level of support required for the missing person’s family, residential worker or foster carer as appropriate, consider seizing electronic devices, computers, and other documentation, obtain photos of the missing person, obtain details of the individual’s mobile phone, obtain'

Because I've enabled debug mode, the function starts by printing the relevant documentation it has found (the context), before giving its answer...which is actually pretty convincing! Let's see what happens if I ask a question it can't know the answer to.

```python
answer_question(df, question="What day is it?", debug=True)
```

Context:
.

###

It’s also been raining heavily in the night and we have further calls about flooding in the road, so we ring Highways to inform them. I have a little smile to myself as I remember a call in the summer about cars stopping on the M11 because a mother duck and her ducklings were crossing the road. Lunchtime looms. I’m feeling hungry, but that disappears when I take a call from a 16-year-old male, who tells me that he can’t cope any more. He has cut himself with a knife but he doesn’t want to die. His sister has just had a baby. This goes on an emergency straight away, and officers are dispatched within three minutes. I have to talk to him about anything I can to distract him from his misery – luckily, I am good at small talk! Officers arrive and I feel relief as I can hang up the phone. COVID-19 has really affected Essex this year. People are low and weary. You can hear it in their voices. The number of mental health incidents has gone through the roof, and even the force

"I don't know."

Success! While it does retrieve some documentation describing someone's working day from our corpus, it correctly identifies that it can't know what day it is today.

Let's try something a little more technical, and see if it can answer a few questions from the [Sergeants’ and inspectors’ NPPF legal exams](https://www.college.police.uk/career-learning/taking-exams-online/nppf-step-two). I couldn't find any official questions publicly available, so [here are two questions from an online guidance service](https://www.how2become.com/blog/police-sergeants-inspectors-exam/).

```python
question = """ Officer Jennings is on his evening patrol. He is just about to finish for the day. As he walks down the street, he is approached by a man named Mark, who claims that he saw a man (named Steven) driving down a road not far from the location. Mark claims that he saw Steven drive into a cyclist, before driving off without stopping. Luckily, the cyclist was unharmed. The cyclist was named Kevin. Mark spoke to Kevin, and discovered that he is a 42 year old man, with a wife and two daughters.

Fifteen minutes later, Officer Jennings manages to stop the car being driven by Steven. He pulls him over to the side of the road, and orders him to step out of the car.

Referring s.6 (5) of the Road Traffic Act 1988, is Officer Jennings within his legal rights to order that Steven takes a preliminary breath test?

A – No. Officer Jennings has no right to tell Steven what he can and can’t do. He should never have stopped Steven in the first place.

B – No. In order for Officer Jennings to do this, an accident must have happened. The fact that Officer Jennings suspects an accident has taken place, does not meet this requirement.

C – Yes. However, the breath test must take place within or close to an area where the requirements for Steven to cooperate, can be imposed.

D – Yes. Officer Jennings can tell Steven to do whatever he wants, as he’s a police officer."""

answer_sergeant_exam_question(df, question)
```

'B - No. In order for Officer Jennings to do this, an accident must have happened. The fact that Officer Jennings suspects an accident has taken place, does not meet this requirement.'

```python
question = """Sarah is walking to work one morning, when she is approached from behind by Henry and Jacob.

‘We won’t hurt you, as long as you give us the bag,’ Henry says.

‘You’re not getting it!’ Sarah shouts.

Henry grabs Sarah and holds a knife to her throat, whilst Jacob tries to snatch her bag.

Sarah fights with her attackers, and begins to run away. As the two men chase her, she trips and bangs her head on the pavement. She is taken to hospital and dies from head trauma.

Based on the above information, which of the following options is correct?

A – Jacob cannot be held accountable for the death of Sarah, as he simply tried to take her bag.

B – Jacob and Henry will be charged with attempted robbery, but not in the death of Sarah.

C – Jacob and Henry could be considered liable for the death of Sarah.

D – Sarah’s death cannot be blamed on Henry and Jacob, as it was her choice to run away."""

answer_sergeant_exam_question(df, question)
```

'C - The answer is C because Jacob and Henry could be considered liable for the death of Sarah, as they were the ones who initiated the attack and chased her, which led to her tripping and hitting her head.'

The correct answer to both should have been C, so that's 50/50 for CopBot... not bad! You can see it's referring to relevant guidance, but sadly that doesn't really help you pass a promotion exam (though I suspect it would do seriously well if trained on a bank of questions instead).
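Two questions is a tiny sample; scoring a larger bank would mean pulling the chosen option out of each free-text answer automatically. A minimal sketch, assuming (as in both outputs above) that the model starts its answer with the option letter; `extract_choice` and `score` are hypothetical helpers, not part of the project.

```python
import re


def extract_choice(answer: str):
    """Pull the leading option letter (A-D) out of answers like 'B - No. ...'.
    Returns None if the answer doesn't start with a standalone option letter."""
    match = re.match(r"\s*([A-D])\b", answer)
    return match.group(1) if match else None


def score(answers, correct):
    """Fraction of answers whose extracted letter matches the expected letter."""
    if not correct:
        return 0.0
    hits = sum(extract_choice(a) == c for a, c in zip(answers, correct))
    return hits / len(correct)


answers = [
    "B - No. In order for Officer Jennings to do this, an accident must have happened.",
    "C - The answer is C because Jacob and Henry could be considered liable for the death of Sarah",
]
print(score(answers, ["C", "C"]))  # 0.5
```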

## So does it work?

Before we test our model on fictional missing-person scenarios, I made one last tweak: I amended the prompt to explicitly refer to identified risk factors and return them in a fixed format. You can see how it works below.
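The answers below all follow the same shape: a `Graded as <level> risk, because of the below risk factors:` line followed by `- ` bullets, returned as the first element of the tuple `machine_risk_assessment` appears to give back. If you wanted to sort a queue of cases by grade, you'd need to turn that text back into structured data. A hypothetical parser, assuming the model sticks to that format (language models don't always, hence the `None` fallback):

```python
import re

GRADE_PATTERN = re.compile(r"Graded as (\w+) risk", re.IGNORECASE)


def parse_risk_assessment(text: str):
    """Split a 'Graded as X risk, because of the below risk factors:' answer
    into (grade, [risk factors]). Grade is None if the pattern isn't found."""
    match = GRADE_PATTERN.search(text)
    grade = match.group(1).capitalize() if match else None
    factors = [
        line.strip().lstrip("-").strip()  # drop the bullet marker and spaces
        for line in text.splitlines()
        if line.strip().startswith("-")
    ]
    return grade, factors


margaret_text = (
    "Graded as High risk, because of the below risk factors: \n"
    "- Margaret is 97 years old and has severe dementia, making her more vulnerable to harm\n"
    "- She has been missing for 6 hours, which is longer than usual, and it is now dark outside, "
    "increasing the risk of harm"
)
grade, factors = parse_risk_assessment(margaret_text)
print(grade, len(factors))  # High 2
```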

```python
margaret_risk_profile = """ Margaret is a 97 year old woman with severe dementia from Twickenham. She lives in supported accomodation, but regularly goes missing, as she walks out when left unsupervised.

She has been missing 6 hours, and it is now 2200.  It is getting dark, and staff are saying she is rarely missing this long"""

margaret_answer = machine_risk_assessment(margaret_risk_profile, df, debug=True)
margaret_answer
```


Question:
 Margaret is a 97 year old woman with severe dementia from Twickenham. She lives in supported accomodation, but regularly goes missing, as she walks out when left unsupervised.

She has been missing 6 hours, and it is now 2200.  It is getting dark, and staff are saying she is rarely missing this long
Context:
.police.uk research projects maximizing effectiveness police scotland investigations when people living dementia go missing.            Maximizing the effectiveness of Police Scotland investigations when people living with dementia go missing | College of Policing             Sorry, you need to enable JavaScript to visit this website.    Skip to content Jump to search         Menu      Secondary navigation About us News & views Contact us  Search Search     Main navigation Policing guidance Research Career & learning Support for forces Ethics     Breadcrumb Home Research Research projects map           Maximizing the effectiveness of Police Scotland investigations when p

('Graded as High risk, because of the below risk factors: \n- Margaret is 97 years old and has severe dementia, making her more vulnerable to harm\n- She has been missing for 6 hours, which is longer than usual, and it is now dark outside, increasing the risk of harm',

```python
about_james = """ 
James is a 34 year old man, who was reported missing by his wife this evening as he has not returned home from work. It is now 2200, and she expected him home by 1900.
She says while he does go out for drinks after work sometimes, he has not been out this late before, and his phone is off.
James is in good health, there are no mental health concerns or other vulnerabilities. The weather is good, and his friend from work said he'd probably just gone out for drinks.
"""

james_answer = machine_risk_assessment(about_james, df, debug=True)
james_answer
```

Question:
 
James is a 34 year old man, who was reported missing by his wife this evening as he has not returned home from work. It is now 2200, and she expected him home by 1900.
She says while he does go out for drinks after work sometimes, he has not been out this late before, and his phone is off.
James is in good health, there are no mental health concerns or other vulnerabilities. The weather is good, and his friend from work said he'd probably just gone out for drinks.

Context:
 First published 22 November 2016  Updated 15 March 2023   Latest changes  Written by College of Policing  Missing persons  30 mins read   Implications for the UK leaving the European Union are currently under review – please see APP on international investigation for latest available detail on specific areas, for example: Schengen Information System Europol INTERPOL Joint Investigation Teams This section provides additional information to aid the investigation based on the vulnerability of the individua

('Graded as Medium risk, because of the below risk factors:\n- James has not returned home at the expected time, and his phone is off\n- There is no indication of any mental health concerns or other vulnerabilities that could put him at risk',
 ' First published 22 November 2016  Updated 15 March 2023   Latest changes  Written by College of Policing  Missing persons  30 mins read   Implications for the UK leaving the European Union are currently under review – please see\xa0APP\xa0on international investigation\xa0for latest available detail on specific areas, for example: Schengen Information System Europol INTERPOL Joint Investigation Teams This section provides additional information to aid the investigation based on the vulnerability of the individual and the circumstances in which they are missing. Missing children Safeguarding young and vulnerable people is a responsibility of the police service and partner agencies (see\xa0Children Act 2004). When the police are notified that a 

```python
about_yannik = """ Yannik is a 15 year old boy. He has recently been down, and was reported missing by his parents as he did not return home from school today.

His friends are worried he may be depressed, and when he apparently told one a few days ago 'if it doesn't get any better, I'm going to end it soon'
"""

yannik_answer = machine_risk_assessment(about_yannik, df, debug=True)
yannik_answer
```

Question:
 Yannik is a 15 year old boy. He has recently been down, and was reported missing by his parents as he did not return home from school today.

His friends are worried he may be depressed, and when he apparently told one a few days ago 'if it doesn't get any better, I'm going to end it soon'

Context:
 First published 22 November 2016  Updated 15 March 2023   Latest changes  Written by College of Policing  Missing persons  30 mins read   Implications for the UK leaving the European Union are currently under review – please see APP on international investigation for latest available detail on specific areas, for example: Schengen Information System Europol INTERPOL Joint Investigation Teams This section provides additional information to aid the investigation based on the vulnerability of the individual and the circumstances in which they are missing. Missing children Safeguarding young and vulnerable people is a responsibility of the police service and partner agencies (see Ch

('Graded as High risk, because of the below risk factors: \n- Yannik is a 15 year old boy who has been reported missing by his parents\n- His friends are worried he may be depressed\n- He has expressed suicidal ideation to one of his friends',
 " First published 22 November 2016  Updated 15 March 2023   Latest changes  Written by College of Policing  Missing persons  30 mins read   Implications for the UK leaving the European Union are currently under review – please see\xa0APP\xa0on international investigation\xa0for latest available detail on specific areas, for example: Schengen Information System Europol INTERPOL Joint Investigation Teams This section provides additional information to aid the investigation based on the vulnerability of the individual and the circumstances in which they are missing. Missing children Safeguarding young and vulnerable people is a responsibility of the police service and partner agencies (see\xa0Children Act 2004). When the police are notified that a 

```python
jason_risk_profile = """ Jason is a 15 year old adult male, who has gone missing from his care home in Southwark. His carer has contacted the school, which has said he was not in today.
They that this is not the first time, and that Jason has been seen hanging out with older boys, who may be involved in crime and drugs."""

jason_answer = machine_risk_assessment(jason_risk_profile, df, debug=True)
jason_answer
```

Question:
 Jason is a 15 year old adult male, who has gone missing from his care home in Southwark. His carer has contacted the school, which has said he was not in today.
They that this is not the first time, and that Jason has been seen hanging out with older boys, who may be involved in crime and drugs.
Context:
 First published 22 November 2016  Updated 15 March 2023   Latest changes  Written by College of Policing  Missing persons  30 mins read   Implications for the UK leaving the European Union are currently under review – please see APP on international investigation for latest available detail on specific areas, for example: Schengen Information System Europol INTERPOL Joint Investigation Teams This section provides additional information to aid the investigation based on the vulnerability of the individual and the circumstances in which they are missing. Missing children Safeguarding young and vulnerable people is a responsibility of the police service and partner agencies (s

('Graded as Medium risk, because of the below risk factors: \n- Jason is a 15 year old adult male, who has gone missing from his care home in Southwark\n- This is not the first time he has gone missing\n- He has been seen hanging out with older boys, who may be involved in crime and drugs',

How does it do? Honestly, not badly! While this certainly wouldn't replace human decision-making, it could act as a guard rail against missing a key piece of information at three in the morning, when you have a queue of eight missing people to evaluate, each with pages' worth of history.

There are certainly some pretty major niggles you'd need to fix: for one, I'm not sure how the model would perform given actual legislation rather than "plain English" guidance, nor do I imagine it would cope particularly well with policing-specific terminology. The risks it has identified so far are mostly the obvious ones, rather than a needle in a giant haystack of intelligence reports. It's also all going through OpenAI's black-box servers, though I imagine that could be replaced with something open-ish like [Llama](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) without too much effort. But when I consider where NLP was even 18 months ago, and just how much cognitive computing power we've been able to deploy in only a few hours...who knows where we'll be next year?