# Walkthrough example of the pipeline

Before we start, a few functions I'll use later.

In [69]:
import ollama

# basic function to prompt the model
def get_response_from_model(prompt, model="gemma3:1b"):
    response = ollama.generate(model=model, prompt=prompt)
    return response['response']

# convert documents to vector embeddings and store them in a Chroma collection
# note: relies on the global chromadb `client`, which is created further down
def vectorize_documents(documents, name):
    # get_or_create_collection avoids an error when the cell is re-run
    collection = client.get_or_create_collection(name=name)

    # documents is a list of (condition_name, page_text) tuples
    for condition, text in documents:
        response = ollama.embed(model="mxbai-embed-large", input=text)
        # upsert instead of add, so re-running does not duplicate ids
        collection.upsert(
            ids=[condition],
            embeddings=response["embeddings"],
            documents=[text],
        )

    return collection

# retrieve the ids of the stored documents closest to the query
def retrieve_relevant_doc(query, collection, n_results=1):
    # embed the query with the same model used for the documents
    response = ollama.embed(
        model="mxbai-embed-large",
        input=query
    )
    # Chroma returns one list of ids per query embedding
    results = collection.query(
        query_embeddings=[response["embeddings"][0]],
        n_results=n_results
    )
    return results['ids']
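For intuition, `collection.query` is a nearest-neighbour search: given the query embedding, it returns the ids of the stored documents whose embeddings are closest. Below is a minimal pure-Python sketch of the same idea, assuming cosine similarity as the closeness measure (Chroma's distance metric is configurable per collection and is not necessarily cosine). `top_k_by_cosine` is a hypothetical helper, not part of Chroma.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec, docs, k=1):
    """Return the ids of the k stored vectors most similar to query_vec.

    docs maps a document id (e.g. a condition name) to its embedding.
    """
    ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query_vec, docs[doc_id]), reverse=True)
    return ranked[:k]

# toy 3-dimensional "embeddings" (real ones have hundreds of dimensions)
docs = {
    "black-eye": [0.9, 0.1, 0.0],
    "red-eye": [0.7, 0.3, 0.1],
    "back-pain": [0.0, 0.2, 0.9],
}
print(top_k_by_cosine([1.0, 0.0, 0.0], docs, k=2))
# ['black-eye', 'red-eye']
```

A real vector store avoids this linear scan with an approximate index, but the ranking it returns is meant to approximate exactly this.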

## First step: persona generation

The first step is to generate synthetic requests. Given a randomly selected condition and its level of severity, the LLM should create a corresponding synthetic request, including fabricated demographic details as needed.

In [114]:
import os
import random
from bs4 import BeautifulSoup

# path to the condition folder (download them from SharePoint or scrape them again)
conditions_folder = '../../use-cases/nhs-conditions/nhs-use-case/conditions/'

# keep only folders that contain an index.html (the folder also holds index.html and README.txt files)
candidates = [c for c in os.listdir(conditions_folder)
              if os.path.isfile(os.path.join(conditions_folder, c, 'index.html'))]

# loop a few times and pick something meaningful for now (some "conditions" are not really conditions!)
selected_condition = random.choice(candidates)
with open(os.path.join(conditions_folder, selected_condition, 'index.html'), 'r', encoding='utf-8') as f:
    content = f.read()
soup = BeautifulSoup(content, 'html.parser')
text = soup.get_text()
print(selected_condition)

black-eye


In [115]:
# randomly pick a severity level, from low to high
severity = random.choice([
    'low: monitor the situation',
    'medium: go to the GP',
    'medium borderline high: some factors might make it seem high, but following guidelines the outcome should be go to the GP',
    'borderline high: some factors might make it seem less dangerous than it is, but following guidelines the outcome should be call 999',
    'high: call 999',
])
print(severity)

medium: go to the GP
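Each severity label encodes the outcome the guidelines should lead to: the two borderline labels are meant to look more or less urgent than they are, but still resolve to "go to the GP" or "call 999". To score the final step automatically, it helps to map each label to that expected outcome. A minimal sketch; `expected_outcome` is a hypothetical helper and assumes the labels are exactly the strings used above.

```python
def expected_outcome(severity_label):
    """Map a severity label from the list above to the action the guidelines should lead to."""
    # the level is everything before the first colon, e.g. "medium borderline high"
    level = severity_label.split(":", 1)[0].strip()
    mapping = {
        "low": "monitor the situation",
        "medium": "go to the GP",
        "medium borderline high": "go to the GP",
        "borderline high": "call 999",
        "high": "call 999",
    }
    return mapping[level]

print(expected_outcome("borderline high: some factors might make it seem less dangerous than it is"))
# call 999
```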


In [116]:
import re, json

prompt = f"""
I want to generate synthetic requests to NHS 111. I will give you a condition, a patient severity level for the condition (e.g. low, medium, high), and a textual description of possible actions given the condition and the severity level.

Write the profile of a patient who is looking for help with the condition and that severity level. The profile should include general demographics (all patients are adults) and should be structured as a JSON object with the following keys:

1. age: The patient's age (between 18 and 80)

2. gender: The patient's gender

3. location: The patient's location (in UK)

4. occupation: The patient's occupation

5. social_support: A brief description of the patient's social support network

6. condition: The condition they are experiencing

7. description: A detailed description of their symptoms and concerns (please do not mention the condition name here!)

8. reason_for_seeking_help: The reason they are reaching out to NHS 111

9. overall_assessment: A summary of their condition and what should be the next step, based on the description and the severity level. 

Here I'm providing the condition, severity level and a description of possible actions given the condition and the severity level.

Condition: {selected_condition}, Severity level: {severity}, Description: {text}.

If the severity level is not covered by the description (e.g., the condition is not serious enough for a high severity level or the opposite), please say so and do not generate a profile.

Format the output as a JSON object."""
response = get_response_from_model(prompt)

# strip the markdown code fence the model often wraps around the JSON (with or without the "json" tag)
response = re.sub(r"^```(?:json)?\s*|\s*```$", "", response).strip()
# json.loads raises an error if the model declined to generate a profile (see the last instruction in the prompt)
json_object = json.loads(response)

# I'm asking for condition and overall_assessment in the JSON to double-check that the model is not hallucinating
# and that it is actually using the condition and severity level I provided.
# I'll remove them later.
for key, value in json_object.items():
    print(f"{key}: {value}")

age: 18
gender: female
location: UK
occupation: Student
social_support: Limited - primarily university friends
condition: black-eye
description: A black eye is bruising and swelling around the eye, usually caused by a blow to the area, such as a punch or fall. It should get better within 2 to 3 weeks.
reason_for_seeking_help: Seeking advice and guidance on managing a minor injury.
overall_assessment: Medium severity. The black eye is a relatively minor injury that requires attention to prevent further complications. It's likely a bruise and swelling, and while it's not a serious condition, it needs monitoring to ensure it doesn't worsen or become a concern.
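Two things in the output above are worth checking automatically. The model may decline to produce a profile (the prompt explicitly allows it), in which case `json.loads` raises. And the `description` here opens with "A black eye is...", naming the condition despite the instruction not to. A minimal validation sketch, assuming the key list from the prompt; `parse_persona` and its checks are hypothetical, not part of the pipeline.

```python
import json
import re

REQUIRED_KEYS = [
    "age", "gender", "location", "occupation", "social_support",
    "condition", "description", "reason_for_seeking_help", "overall_assessment",
]

def parse_persona(raw_response, condition_slug):
    """Parse the model output; return (persona, problems).

    persona is None if the output is not a JSON object (e.g. the model declined).
    condition_slug is the folder name, e.g. "black-eye".
    """
    # strip an optional ```json ... ``` fence around the payload
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw_response.strip()).strip()
    try:
        persona = json.loads(cleaned)
    except json.JSONDecodeError:
        return None, ["response is not valid JSON (the model may have declined)"]
    if not isinstance(persona, dict):
        return None, ["response is JSON but not an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in persona]

    # crude leak check: does the description contain the condition name verbatim?
    condition_words = condition_slug.replace("-", " ").lower()
    if condition_words in str(persona.get("description", "")).lower():
        problems.append("description mentions the condition name")
    return persona, problems

raw = '```json\n{"description": "A black eye is bruising around the eye."}\n```'
persona, problems = parse_persona(raw, "black-eye")
print(problems[-1])
# description mentions the condition name
```

The substring check only catches verbatim mentions; synonyms or partial names ("a cut" for cuts-and-grazes) would slip through, so failing personas could be regenerated rather than silently used.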


## Second step: information retrieval

Given the patient description, we generate a series of keywords and use them to search our database of conditions.

In [117]:
description = json_object['description']

print (f"Description: {description}\n")

# number of keywords to generate
n = 5

prompt = f"""
Generate no more than {n} keywords for someone who has these symptoms and is searching the NHS database of possible conditions. Symptoms: {description}.
The keywords should be relevant to the symptoms and should help in identifying the condition. The keywords should be separated by commas and should not include any personal information or specific medical terms.
"""

keywords = get_response_from_model(prompt)
print (f"Keywords: {keywords}")

Description: A black eye is bruising and swelling around the eye, usually caused by a blow to the area, such as a punch or fall. It should get better within 2 to 3 weeks.

Keywords: Bleeding, Bruising, Eye Injury, Trauma, Healing
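The model returns the keywords as one comma-separated string, and the retrieval step embeds that string as-is. If you want to inspect or deduplicate the keywords, or enforce the "no more than n" limit that the prompt only requests, a small parser helps. A sketch; `parse_keywords` is a hypothetical helper.

```python
def parse_keywords(raw, n):
    """Split a comma-separated keyword string, drop blanks and duplicates, keep at most n."""
    keywords = []
    seen = set()
    for part in raw.split(","):
        kw = part.strip().strip(".")  # also drop a trailing full stop
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    return keywords[:n]

print(parse_keywords("Bleeding, Bruising, Eye Injury, Trauma, Healing", 5))
# ['Bleeding', 'Bruising', 'Eye Injury', 'Trauma', 'Healing']
```

Passing `", ".join(parse_keywords(keywords, n))` to `retrieve_relevant_doc` would keep the query format unchanged.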


Now we read all the condition pages and store each one as a (name, text) pair in a list.

In [85]:
# Read all conditions and put them in a list of (name, text) pairs

conditions = []
for condition in os.listdir(conditions_folder):
    try:
        with open(os.path.join(conditions_folder, condition, 'index.html'), 'r', encoding='utf-8') as f:
            content = f.read()
        soup = BeautifulSoup(content, 'html.parser')
        text = soup.get_text()
        conditions.append((condition, text))
    except Exception as e:
        # plain files (index.html, README.txt) and folders without an index.html end up here
        print(f"Error reading condition {condition}: {e}")

Error reading condition index.html: [Errno 20] Not a directory: '../../use-cases/nhs-conditions/nhs-use-case/conditions/index.html/index.html'
Error reading condition README.txt: [Errno 20] Not a directory: '../../use-cases/nhs-conditions/nhs-use-case/conditions/README.txt/index.html'
Error reading condition mental-health: [Errno 2] No such file or directory: '../../use-cases/nhs-conditions/nhs-use-case/conditions/mental-health/index.html'
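`soup.get_text()` keeps the whitespace and line breaks of the HTML layout, so the extracted text typically contains long runs of blank lines and indentation. Collapsing them before embedding keeps the input compact. A minimal standard-library sketch; `normalise_whitespace` is a hypothetical helper, and whether it improves retrieval has not been tested here.

```python
import re

def normalise_whitespace(text):
    """Collapse runs of spaces/tabs and of blank lines in text extracted from HTML."""
    text = re.sub(r"[ \t\xa0]+", " ", text)   # runs of spaces, tabs and non-breaking spaces
    text = re.sub(r" *\n[ \n]*", "\n", text)   # blank lines and spaces around line breaks
    return text.strip()

print(repr(normalise_whitespace("\n\n  Black eye \n\n\n\n  Overview\t\tof  symptoms\n")))
# 'Black eye\nOverview of symptoms'
```

It would slot into the loop above as `conditions.append((condition, normalise_whitespace(text)))`.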


### Important step: we convert each condition to a vector

We'll need to do some research here on which embedding model works best.

In [21]:
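import chromadb

# in-memory Chroma client; vectorize_documents uses this global
client = chromadb.Client()

# embed every condition page and store it in a collection called "conditions"
collection = vectorize_documents(conditions, "conditions")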
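Embedding models accept a limited number of input tokens, so a long condition page may be truncated before it is embedded, and then only its beginning influences retrieval. One common option to try while researching this is to split each page into overlapping chunks, embed each chunk separately, and map chunk ids back to the condition. A sketch of the splitting step only, assuming word-based chunks (the sizes are placeholders, not tuned values); `chunk_document` is hypothetical.

```python
def chunk_document(doc_id, text, chunk_size=200, overlap=50):
    """Split text into overlapping word chunks.

    Returns (chunk_id, chunk_text) pairs; ids look like "black-eye#0",
    so the condition name can be recovered with chunk_id.split("#")[0].
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    # stop once the remaining words are already covered by the previous chunk's overlap
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        chunks.append((f"{doc_id}#{i}", " ".join(words[start:start + chunk_size])))
    return chunks

for chunk_id, chunk in chunk_document("black-eye", "one two three four five six seven eight nine ten", chunk_size=4, overlap=2):
    print(chunk_id, "|", chunk)
# black-eye#0 | one two three four
# black-eye#1 | three four five six
# black-eye#2 | five six seven eight
# black-eye#3 | seven eight nine ten
```

With chunks, `collection.query` returns chunk ids, which would need to be reduced to unique condition names before checking whether `selected_condition` was retrieved.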

### Information retrieval evaluation

Here we decide how many conditions to retrieve and check whether the correct one is among them.

Note that the better the persona generator is (e.g. at producing complex, non-obvious presentations), the harder this step should be.
If the persona has an obvious symptom, this step should be very easy.

In [118]:
number_of_conditions_to_retrieve = 5

# retrieve_relevant_doc returns one list of ids per query; we sent a single query
relevant_docs = retrieve_relevant_doc(keywords, collection, n_results=number_of_conditions_to_retrieve)[0]

print(f"Relevant docs: {relevant_docs}")
# check whether the true condition is among the retrieved conditions
if selected_condition in relevant_docs:
    print(f"Condition {selected_condition} is among the retrieved conditions.")
else:
    print(f"Condition {selected_condition} was NOT retrieved.")

Relevant docs: ['eye-injuries', 'black-eye', 'red-eye', 'head-injury-and-concussion', 'cuts-and-grazes']
Condition black-eye is among the retrieved conditions.
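A single run only tells us whether this one persona's condition was retrieved. To compare embedding models or values of `number_of_conditions_to_retrieve`, it is more informative to repeat the pipeline over many personas and aggregate. A sketch of two standard retrieval metrics, hit rate at k and mean reciprocal rank. The `runs` list of (true condition, retrieved ids) pairs is assumed to be collected by looping over the steps above; only the first entry comes from this notebook, the second is illustrative.

```python
def hit_rate_at_k(runs, k):
    """Fraction of runs where the true condition is among the top-k retrieved ids."""
    hits = sum(1 for true_id, retrieved in runs if true_id in retrieved[:k])
    return hits / len(runs)

def mean_reciprocal_rank(runs):
    """Average of 1/rank of the true condition (counting 0 when it was not retrieved)."""
    total = 0.0
    for true_id, retrieved in runs:
        if true_id in retrieved:
            total += 1.0 / (retrieved.index(true_id) + 1)
    return total / len(runs)

runs = [
    # the run shown above
    ("black-eye", ["eye-injuries", "black-eye", "red-eye", "head-injury-and-concussion", "cuts-and-grazes"]),
    # illustrative second run, not a real result
    ("back-pain", ["sciatica", "slipped-disc", "kidney-infection", "shingles", "gallstones"]),
]
print(hit_rate_at_k(runs, k=1), hit_rate_at_k(runs, k=5), mean_reciprocal_rank(runs))
# 0.0 0.5 0.25
```

Hit rate at k answers "is the right page in the prompt at all?", while reciprocal rank also rewards putting it near the top.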


## Third step: condition prediction and outcome recommendation

In [119]:
# Load the page text of each retrieved condition into a dictionary keyed by condition name

possible_conditions = {}
for condition in relevant_docs:
    try:
        with open(os.path.join(conditions_folder, condition, 'index.html'), 'r', encoding='utf-8') as f:
            content = f.read()
        soup = BeautifulSoup(content, 'html.parser')
        text = soup.get_text()
        possible_conditions[condition] = text
    except Exception as e:
        print(f"Error reading condition {condition}: {e}")
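All five retrieved pages go into the final prompt in full. Condition pages can be long, and if the prompt exceeds the context window the model is run with, it may be truncated and some candidates lost. A crude mitigation is to cap each page; below is a sketch with a hypothetical `trim_pages` helper and a placeholder character budget (characters are only a rough proxy for tokens).

```python
def trim_pages(pages, max_total_chars=12000):
    """Cap the combined length of the condition pages, giving each page an equal share.

    pages maps condition name -> page text; returns a new dict with the same keys.
    """
    if not pages:
        return {}
    per_page = max_total_chars // len(pages)
    return {name: text[:per_page] for name, text in pages.items()}

print(trim_pages({"black-eye": "x" * 10, "red-eye": "y" * 3}, max_total_chars=8))
# {'black-eye': 'xxxx', 'red-eye': 'yyy'}
```

Cutting from the end has a cost: it can drop exactly the sections (when to see a GP, when to call 999) that the outcome depends on, so extracting those sections explicitly may work better than a blind cap.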

In [120]:
# Build the patient info for the evaluator, leaving out condition and overall_assessment (they would give the answer away)

patient_info = {
    "age": json_object['age'],
    "gender": json_object['gender'],
    "location": json_object['location'],
    "occupation": json_object['occupation'],
    "social_support": json_object['social_support'],
    "description": json_object['description'],
    "reason_for_seeking_help": json_object['reason_for_seeking_help']
}

### Final evaluation
Here we assess whether a separate LLM call (by default the same `gemma3:1b` model used to generate the persona) can recognise the correct condition and suggest the correct outcome.

In [121]:
# offer all possible outcomes; listing only the sampled `severity` would give the answer away
outcome_options = ['low: monitor the situation', 'medium: go to the GP', 'high: call 999']
condition_names = ', '.join(possible_conditions)
outcome_text = '; '.join(outcome_options)

prompt = f'''
You are supporting a 111 operator. You have been given the following patient information.
Patient information: {json.dumps(patient_info, indent=2)}
Possible conditions based on patient information: {json.dumps(possible_conditions, indent=2)}
Please provide:
1. the most likely condition (one of the following: {condition_names}) based on the patient information and the textual description of each condition.
2. the most likely outcome for the patient based on the symptoms and the condition (from point 1). The outcome should be one of the following: {outcome_text}. It should be followed by a clear suggestion for the next step.
All answers should be strictly based on the patient information and the possible conditions. If you lack the information to make a decision, please say so.
Answer only with:
1. the condition name and
2. the outcome, with a clear suggestion for the next step.
'''
response = get_response_from_model(prompt)
print(f"Response: {response}")

Response: 1. Cuts and grazes
2. medium: Go to the GP. Schedule an urgent appointment with your GP to assess the wound and determine if further treatment is needed. Consider seeking advice from a healthcare professional to ensure proper wound care and prevent infection.
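In this run the evaluator picked "Cuts and grazes" instead of black eye, while its outcome (go to the GP) matches the sampled severity. To score many runs without reading every answer, the response can be parsed and compared with the ground truth. A sketch, assuming the model keeps the "1. ... / 2. ..." answer format requested in the prompt (small models do not always do so) and that condition names map to folder names by lower-casing and hyphenating; `score_response` and `to_slug` are hypothetical helpers.

```python
import re

def to_slug(name):
    """Turn a display name like "Cuts and grazes" into a folder-style id: "cuts-and-grazes"."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def score_response(response, true_condition, expected_action):
    """Return (condition_correct, outcome_correct), or None if the answer format was not followed."""
    condition_match = re.search(r"^\s*1\.\s*(.+)$", response, re.MULTILINE)
    outcome_match = re.search(r"^\s*2\.\s*(.+)$", response, re.MULTILINE)
    if not condition_match or not outcome_match:
        return None
    predicted = to_slug(condition_match.group(1))
    outcome_line = outcome_match.group(1).lower()
    return predicted == true_condition, expected_action.lower() in outcome_line

# the response shown above, shortened
response = """1. Cuts and grazes
2. medium: Go to the GP. Schedule an urgent appointment with your GP to assess the wound."""
print(score_response(response, "black-eye", "go to the GP"))
# (False, True)
```

Returning `None` for malformed answers keeps format failures separate from wrong answers, which matters when comparing models of different sizes.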


