# SOAP Note Generation

In this cookbook, we'll show you how to generate structured SOAP notes from a patient-doctor consultation using [Cohere's NLP platform](https://cohere.com).

# Please Note

LLMs are not substitutes for medical expertise or advice, and should only be viewed as general informational support for healthcare professionals in tasks such as SOAP note summarization. LLM summaries may omit crucial details or introduce errors that could lead to harmful outcomes in patient care.

When using summarization in this use case, we recommend ensuring the following:

- Awareness: Healthcare professionals should be aware that SOAP note summaries may be generated or assisted by LLMs. The LLM's role and limitations should be clearly documented within patient care protocols.
- Audit: A qualified healthcare professional should always carefully review and edit LLM-generated summaries before they are used for patient care decisions.
- Testing: In addition, LLMs used for SOAP note summarization should undergo a regular cadence of testing and validation by healthcare professionals to ensure summaries accurately capture key clinical information and avoid propagating or introducing new biases, omissions of critical information, or other errors.
- Rectification: Healthcare professionals must establish clear procedures for patients who experience issues due to inaccurate information in LLM-generated summaries. This may involve avenues for reporting errors, requesting corrections to the original SOAP note, and modifying existing practices to prevent similar issues in the future.

For more guidance about how to use Cohere's models safely, visit Cohere's [Responsible Use Guides](https://docs.cohere.com/docs/responsible-use).

# Getting Started
The first step is to download and install the necessary packages. In this case, we'll install [Cohere's Python SDK](https://docs.cohere.com/reference/about) and import it. We'll also import `getpass`, a Python standard-library module that prompts for secrets such as API keys without echoing them, and `requests`, which we'll use to download sample data.

In [None]:
%%capture
!pip install --pre --upgrade cohere

In [None]:
import cohere
import requests

from getpass import getpass

## Getting the data
Now it's time to prepare the transcript. We've provided an example transcript from the [Primock57 corpus of primary care consultation transcripts](https://github.com/babylonhealth/primock57). Feel free to go along with this one, or upload your own if you have one available.

Hint: if you only have audio, you can turn it into a transcript with your preferred transcription tool, for example [Amazon Transcribe Medical](https://docs.aws.amazon.com/transcribe/latest/dg/transcribe-medical.html).

In [None]:
# Load data
transcript_url = "https://raw.githubusercontent.com/cohere-ai/notebooks/4cd8595fab4f39bc367de1685faf0ed7a4ba3446/notebooks/data/patient_doctor_consultation.txt"
response = requests.get(transcript_url)
response.raise_for_status()  # Fail early if the download did not succeed
data = response.text.strip()

# Or replace with your own transcript!
transcript = data

# Set up Cohere client
api_key = getpass("Cohere API Key: ")
model = "command-r"
co = cohere.Client(api_key=api_key)

Cohere API Key: ··········


# Building the prompt step-by-step

Now we can build up the prompt we'll send to Cohere's LLM.

## Overall format

The first step is to structure the prompt with the high-level sections. At Cohere, we've found it's best to notate sections with `##`, so an "Introduction" section would be `## INTRODUCTION`.

To generate high-quality SOAP notes, we'll use five sections:
1. Transcript (of doctor consultation)
2. Instructions to the model (on how to construct a SOAP note)
3. Guidelines for the model to follow
4. A format for the model to use
5. An example for the model to reference

By including an example, we're using a technique called "one-shot prompting," which helps the model by giving it an expected level of detail and tone.

In [None]:
prompt_template = """
## TRANSCRIPT
{transcript}

## INSTRUCTION
{instruction}

## GUIDELINES
{guidelines}

## JSON
{format_json}

## EXAMPLE
{example}

## SOAP NOTE
""".strip()

## Filling in the data

Now we build up the content of the prompt by filling in each of the templated sections.

### Instruction

We start by giving the model simple and clear instructions to follow. A great way to do this is to give the model a role by telling it "You are X", and then provide instructions. In this case, we'll provide instructions as we would to an entry-level medical student.

In [None]:
instruction = """
You are a medical assistant AI. You take a TRANSCRIPT between a DOCTOR (D) and a PATIENT (P) and generate a SOAP note based on the GUIDELINES below. Strictly follow the guidelines. You will be evaluated on the quality of your SOAP note.
""".strip()

### Guidelines

After the instruction, we can provide a few additional rules to keep the model on track:

In [None]:
guidelines = """
A medical SOAP note should maintain consistency, quality and completeness. All generated SOAP notes, to achieve this, must adhere to the below mentioned foundational principles to avoid inaccurate, and incomplete medical SOAP notes.
**Fundamental Tenets to guide documentation:**
1. If clinician discusses a detail/fact, must be documented.
2. If a detail/fact was not discussed, do not add into the note.
3. If it was not documented, it never happened.
4. Do adhere to preferences by a clinician.
5. Do use proper medical terminology, avoiding Laymans terms in documentation.
6. Never use an abbreviation that is not designated as a proper medical abbreviation.
7. Do use the 24-hour format when documenting time of day of an event/complaint.
8. Do adhere to US date format of month/day/year for any dates discussed.
9. Do correct all conflicting statements/phrases/details/facts within the note.
10. Do not construe details from vague statements in a patient-clinician interview.

In addition, strictly follow the JSON format for the SOAP note shared below. If some fields don't have any answer based on the conversation, mark them N/A. Only output the JSON with no other text.
""".strip()

### Format and example

Now we tell the model which output format we want, and show it an example of a good answer. In this case we're using [JSON](https://www.json.org/json-en.html) because it's a structured format that's easy to parse after generation. At Cohere we train our models to deeply understand JSON, so it's a great way to get structured data from our LLMs.

Tip: even if your output format isn't structured as JSON, it can help to ask for a response in JSON and convert it to the desired format later. It can still boost performance in some cases!
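As a rough illustration of the tip above, here is a minimal sketch that converts a JSON note into indented plain text after generation. The `raw_note` string is a hypothetical, shortened stand-in for real model output, and `note_to_text` is an illustrative helper, not part of the Cohere SDK.

```python
import json

# Hypothetical, shortened stand-in for a generated note (a real note has more fields).
raw_note = '{"subjective": {"chief_complaint": "Left elbow swelling", "medical_history": "Osteoarthritis"}, "plan": "N/A"}'


def note_to_text(note):
    """Render a SOAP note dict (one level of nesting) as indented plain text."""
    lines = []
    for section, content in note.items():
        # Section names become upper-case headings, e.g. "review_of_systems" -> "REVIEW OF SYSTEMS"
        lines.append(section.replace("_", " ").upper())
        if isinstance(content, dict):
            for field, value in content.items():
                lines.append(f"  {field.replace('_', ' ').capitalize()}: {value}")
        else:
            lines.append(f"  {content}")
    return "\n".join(lines)


text_note = note_to_text(json.loads(raw_note))
print(text_note)
```

The same pattern applies to other target formats (Markdown, HTML, a database row): the model only has to get the JSON right, and the presentation logic stays in ordinary code.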

In [None]:
format_json = """
{
    "subjective": {
        "chief_complaint": "<chief complaint>",
        "history_present_illness": "<history of present illness>",
        "medical_history": "<past medical history>",
        "surgical_history": "<past surgical history>",
        "family_history": "<family medical history>",
        "allergies": "<drug allergies>",
        "social_history": "<relevant social history>",
        "medication_list": "<medications>",
        "immunization_history": "<immunization history>"
    },
    "review_of_systems": "<review of systems>",
    "objective": {
        "vitals": "<measurements of vitals>",
        "physical_exam": "<results of physical exam>",
        "diagnostic_studies": "<diagnostics>"
    },
    "assessment": {
        "diagnosis": "<diagnosis>"
    },
    "plan": "<plan>"
}
""".strip()

example = """
{
    "subjective": {
        "chief_complaint": "Left elbow swelling with no pain or injury history.",
        "history_present_illness": "53 y/o male presents with a one-week history of elbow swelling. He denies any specific trauma or injury but states that, while showering, he noticed a warm sensation and a sensation of fluid in his elbow about a week ago. He denies this ever happening previously. He states he is able to range his elbow normally through all ranges of motion without any limitations. He also has noticed some dry skin at the tip of his elbow but denies any history of rheumatological disease in himself or his family. He has a history of osteoarthritis but is otherwise well without any other medical conditions.",
        "medical_history": "Osteoarthritis",
        "surgical_history": "N/A",
        "family_history": "No known rheumatological diseases.",
        "allergies": "NKDA, Peanut allergy",
        "social_history": "Runs 2-3x per week",
        "medication_list": "N/A",
        "immunization_history": "N/A"
    },
    "review_of_systems": "No other joint issues, no dry skin elsewhere, no eczema.",
    "objective": {
        "vitals": "N/A",
        "physical_exam": "Video assessment: Patient exhibits full range of motion. Unable to test strength or ligamentous integrity through virtual assessment. Patient reports that the elbow is warm to touch with some dry skin on the tip of the elbow. There is no obvious tenderness to palpation. No observable signs of septic joint.",
        "diagnostic_studies": "N/A"
    },
    "assessment":{
        "diagnosis": "Left Elbow Bursitis, does not appear to be septic at this time."
    },
    "plan": "1. Ibuprofen 400mg, twice daily with food (not on an empty stomach). Stop Ibuprofen if any heartburn is experienced. 2. Blood tests to be done (forms to be sent) to rule out including but not limited to inflammatory conditions, gout, infection and rheumatological conditions. 3. Patient to contact office to book follow-up appointment via phone or in-person after bloodwork is complete. 4. Advised to return immediately for in person or phone appointment if the patient starts to experience worsening erythema and/or elbow pain."
}
""".strip()

### Format the prompt

Finally, we format the prompt by entering the data in the appropriate places.

In [None]:
prompt = prompt_template.format(
    transcript=transcript,
    instruction=instruction,
    guidelines=guidelines,
    format_json=format_json,
    example=example,
)
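One detail of `str.format` is worth noting here: braces inside the substituted values (such as the JSON in `format_json` and `example`) are copied verbatim, so they need no escaping. Only braces written directly into the template are interpreted as placeholders. The sketch below shows both cases with hypothetical miniature templates (`mini_template` and `inline_template` are illustrative names, not part of the prompt above).

```python
# Hypothetical miniature of the prompt template, for illustration only.
mini_template = "## JSON\n{format_json}\n\n## SOAP NOTE"
mini_format_json = '{"plan": "<plan>"}'

# Braces inside a substituted value are inserted as-is.
mini_prompt = mini_template.format(format_json=mini_format_json)
print(mini_prompt)

# Literal braces written directly into a template must be doubled ({{ and }}),
# otherwise str.format would try to treat them as placeholders.
inline_template = '## JSON\n{{"plan": "<plan>"}}\n\n## TRANSCRIPT\n{transcript}'
inline_prompt = inline_template.format(transcript="Doctor: Good morning.")
print(inline_prompt)
```

This is one reason to keep the JSON format and the example in separate variables rather than pasting them into the template itself.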

In [None]:
print(prompt)

## TRANSCRIPT
Doctor: Good morning. I'm Doctor Smith from Babylon. Can you just confirm your name, date of birth, and the first line of your address please?
 Patient: Hi. My name is Susan. Um, thirty, Redbridge Street, SW two two HZ.
 Doctor: Hello.
 Doctor: And your date of birth?
 Patient: forty, oh two, nineteen seventy four.
 Doctor: OK. Are you in a private place so you can have a consultation today?
 Patient: Yes I am.
 Doctor: OK. What can I do for you?
 Patient: It hurts when I pee.
 Doctor: OK, and how long has that been going on for?
 Patient: It stays now.
 Doctor: Pardon?
 Patient: Uh, six days.
 Doctor: Six days, OK. And just tell me a bit more about that. How did it start?
 Patient: Um, I've got this thing when I pee, and it hurts when I go to the loo, and I've got this very unpleasant smell that comes out.
 Doctor: 
 Doctor: OK. And, have you had any other symptoms along with that? Have you had any abdominal pain, or back, lower back pain at all?
 Patient: I've got, pain

# Running it through the model

Getting your SOAP note is now as simple as calling the model! We can do this easily with the chat endpoint in the Cohere SDK. For this task, we only need a basic invocation of `co.chat`, but the underlying API is much more powerful. To explore additional ways you can use `chat` with Cohere, check out [chat's API reference](https://docs.cohere.com/reference/chat).

In [None]:
response = co.chat(message=prompt, model=model)
note = response.text
print(note)

```json
{
    "subjective": {
        "chief_complaint": "Patient reports dysuria and an unpleasant smell for the past six days.",
        "history_present_illness": "The patient describes experiencing pain during urination, with associated blood spots in urine. The pain is localized to the middle of the lower abdomen and rated as 7/10 in severity. It comes and goes.",
        "medical_history": "IBS, no other medical problems.",
        "surgical_history": "N/A",
        "family_history": "No medical problems reported.",
        "allergies": "Clindamycin",
        "social_history": "The patient lives with friends, has a job as a support worker, consumes one glass of wine a week and is a non-smoker.",
        "medication_list": "Mebeverine 200mg, three times a day.",
        "immunization_history": "N/A"
    },
    "review_of_systems": "The patient denies any abdominal pain, back pain, vaginal discharge, change in bowel habit, weight loss, or fever. She reports pain in the lower middle

## Cost

Command-R model calls cost \$0.50 per million input tokens and \$1.50 per million output tokens. We can calculate the cost of this generation with the following formula:

In [None]:
billed_units = response.meta["billed_units"]
in_tokens, out_tokens = billed_units["input_tokens"], billed_units["output_tokens"]
total_cost = (0.5 * in_tokens / 1e6) + (1.5 * out_tokens / 1e6)
print(f"${total_cost:.3f}")

$0.002


This SOAP note cost _less than half a cent_ to generate!
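To budget for more than one consultation, the same arithmetic can be wrapped in a small helper. This is a sketch using the prices quoted above; the token counts are hypothetical and will vary with transcript length, and `estimate_cost` is an illustrative name.

```python
# Prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(in_tokens, out_tokens):
    """Return the estimated cost in USD for one generation."""
    return (INPUT_PRICE_PER_M * in_tokens + OUTPUT_PRICE_PER_M * out_tokens) / 1e6


# Hypothetical token counts for a single consultation.
per_note = estimate_cost(in_tokens=3_000, out_tokens=500)
per_thousand_notes = 1_000 * per_note

print(f"${per_note:.5f} per note")
print(f"${per_thousand_notes:.2f} per 1,000 notes")
```

Because the transcript dominates the prompt, the input token count is usually the larger of the two numbers, even though output tokens are priced higher.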

# Export

Finally, you may want to export your SOAP Note to another location like Google Sheets. Since our SOAP Note was generated in structured format, that's easy to do!
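Before exporting, it can be worth checking that the model's output actually parses and contains the fields requested in the format. The sketch below is one way to do that using only the standard library. `EXPECTED_FIELDS` mirrors the format defined earlier, while `parse_note`, `missing_fields`, and the sample data are illustrative assumptions rather than part of the Cohere SDK.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes wrap JSON in

# Sections and nested fields from the JSON format above (None = plain string section).
EXPECTED_FIELDS = {
    "subjective": {
        "chief_complaint", "history_present_illness", "medical_history",
        "surgical_history", "family_history", "allergies", "social_history",
        "medication_list", "immunization_history",
    },
    "review_of_systems": None,
    "objective": {"vitals", "physical_exam", "diagnostic_studies"},
    "assessment": {"diagnosis"},
    "plan": None,
}


def parse_note(raw):
    """Strip optional code-fence markers and parse the note as JSON.

    Raises json.JSONDecodeError if the output is not valid JSON
    (for example, if the generation was cut off).
    """
    cleaned = raw.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(cleaned)


def missing_fields(note):
    """List the sections and fields that are absent from a parsed note."""
    missing = []
    for section, fields in EXPECTED_FIELDS.items():
        value = note.get(section)
        if value is None:
            missing.append(section)
        elif fields:
            present = set(value) if isinstance(value, dict) else set()
            missing.extend(f"{section}.{name}" for name in sorted(fields - present))
    return missing


# Hypothetical, deliberately incomplete output wrapped in a code fence.
sample = {"subjective": {"chief_complaint": "N/A"}, "plan": "N/A"}
raw_note = f"{FENCE}json\n{json.dumps(sample)}\n{FENCE}"

problems = missing_fields(parse_note(raw_note))
print(problems)
```

An empty list means the note has every expected field. Anything else is a signal to regenerate the note or flag it for manual review before it goes any further.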

In [None]:
from google.colab import auth
auth.authenticate_user()

import json
import gspread
from google.auth import default
creds, _ = default()

# Parse the response from Cohere
note_parsed = note.replace("```json", "").replace("```", "").strip()  # Strip json tags
note_parsed = note_parsed.replace("\n", "")  # Flatten
note_object = json.loads(note_parsed)

# Create a new spreadsheet and open its first worksheet
gc = gspread.authorize(creds)
sh = gc.create('SOAP Report')
worksheet = sh.sheet1

# Header row: one column per top-level section of the note
headers = [key.capitalize() if key != 'review_of_systems' else 'Review of Systems' for key in note_object.keys()]

# Data row: nested sections become one multi-line cell, plain strings are used as-is
data_row = []
for value in note_object.values():
    if isinstance(value, dict):
        cell_content = "\n".join(
            f"{subkey}: {subvalue}" for subkey, subvalue in value.items()
        )
    else:
        cell_content = value
    data_row.append(cell_content)

# Write the header and data rows in a single request (no empty padding rows)
worksheet.append_rows([headers, data_row])

# Make the headers bold
worksheet.format('A1:M1', {'textFormat': {'bold': True}})

{'spreadsheetId': '14qMKf4GvqyJ5S9vVoOzRj5EUJOjI1cpINQ-CVcn0qOg',
 'replies': [{}]}