# Synthesize Patient Communications

_Click the button below to open this tutorial as an executable notebook._

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/atlasfutures/memex-sample-public/blob/main/docs/tutorial/tutorials/synthesize-patient-communications/synthesize-patient-communications.ipynb)

***

### Introduction

Imagine you are a data scientist or engineer at a company building an AI application that helps healthcare providers manage their patient communication data. Examples include an application that summarizes conversations, one that analyzes conversations and classifies them by type, or even a chatbot.

To build this application, you'll need example data to make sure it works well and efficiently. But you quickly realize that it can be hard to get access to a good dataset of patient communication messages.

In this tutorial, we'll show you how you can use Memex to generate a synthetic set of patient communications. We'll explore how to pre-process data, utilize Memex's capabilities to query and transform data, and finally, how to synthesize patient communications using Memex's unique integration of User-Defined Functions (UDFs) with SQL queries.

### Workflow summary

In this tutorial we'll start with detailed medical records as inputs, and end with relevant patient communications through the following workflow:

_Patient detailed medical records -> formatted patient medical histories -> summarized medical histories -> Patient messages to a healthcare provider_

***

### Configuring Memex Connection

Before proceeding, ensure that the Memex library is installed in your environment. This library provides the necessary tools and functions to interact with the Memex platform. (Note: you might see an error about pip dependencies; you can safely disregard it.)

In [None]:
!pip install -q memexdata

To interact with Memex, set up your instance URL and API key. You can find both in the `API Access` section under your profile bubble, at the top right corner of the Memex UI, after you log in.

In [None]:
MEMEX_INSTANCE_URL = "https://<YOUR_INSTANCE>.memexdata.com"
MEMEX_API_KEY = "<YOUR_API_KEY>"

With the credentials set, instantiate a MemexSession.

In [None]:
import os
from pathlib import Path

from memex import MemexSession

mx = MemexSession(MEMEX_INSTANCE_URL, api_key=MEMEX_API_KEY)
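
If you'd rather not hard-code credentials in the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the helper `load_memex_credentials` and the environment variable names `MEMEX_INSTANCE_URL` and `MEMEX_API_KEY` are assumptions chosen to mirror the notebook variables above, not part of the Memex library.

```python
import os


def load_memex_credentials(env=None):
    """Return (instance_url, api_key) from an environment-style mapping.

    Hypothetical helper: the variable names below are assumptions,
    not names that Memex itself reads.
    """
    env = os.environ if env is None else env
    required = ("MEMEX_INSTANCE_URL", "MEMEX_API_KEY")
    # Collect every missing or empty variable so the error names them all
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return env["MEMEX_INSTANCE_URL"], env["MEMEX_API_KEY"]
```

You could then write `MEMEX_INSTANCE_URL, MEMEX_API_KEY = load_memex_credentials()` and pass the values to `MemexSession` exactly as shown above.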

### Uploading the dataset

In this tutorial we'll start with a synthetic set of 100 patient medical records.

The dataset, called `patients_joined`, includes patient demographics, encounters, conditions, and observations in a single table. You can [download the JSONL file](https://sample.memexdata.com/synthesize-patient/data/patients_joined.jsonl) and upload it to Memex either via the UI or the Python API.
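
Before uploading, it can help to peek at the file locally. This is a minimal sketch, assuming the file has already been downloaded and follows the JSON Lines convention (one JSON object per line). `read_jsonl` is a hypothetical helper, not a Memex function.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `records = read_jsonl("patients_joined.jsonl")` followed by `len(records)` and `sorted(records[0])` shows the row count and the field names of the first record.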

### Create patient medical histories

The first step is to process the data to generate a nicely formatted patient medical history based on the information we have for each patient. To do this, we define a UDF `format_patient_history`, which takes various patient details from `patients_joined` as input and formats them into a patient medical history broken down into three sections: Medications, Encounter Reasons, and Conditions.

In [None]:
from pydantic import BaseModel
from datetime import datetime
from typing import List


class Encounter(BaseModel):
    id: str
    encounterclass: str
    desc: str
    start: datetime
    end: datetime
    reason_code: float
    reason_desc: str
    provider_id: str
    provider_name: str


class Condition(BaseModel):
    id: int
    desc: str
    start: datetime
    end: datetime


class Medication(BaseModel):
    id: int
    desc: str
    start: datetime
    end: datetime
    medication_reason_code: float
    reason_desc: str


@mx.udf
def format_patient_history(
    first: str,
    last: str,
    birthdate: str,
    encounter_list: List[Encounter],
    condition_list: List[Condition],
    medication_list: List[Medication],
) -> str:
    # Copy each record into a plain dict without the 'id' key.
    # This relies on the records exposing `.items()`, i.e. arriving as dict-like objects.
    encounter_dicts = [
        {k: v for k, v in e.items() if k != "id"} for e in encounter_list
    ]
    condition_dicts = [
        {k: v for k, v in c.items() if k != "id"} for c in condition_list
    ]
    medication_dicts = [
        {k: v for k, v in m.items() if k != "id"} for m in medication_list
    ]

    # sort the histories in reverse chronological order by start date
    encounter_dicts = sorted(encounter_dicts, key=lambda x: x["start"], reverse=True)
    condition_dicts = sorted(condition_dicts, key=lambda x: x["start"], reverse=True)
    medication_dicts = sorted(medication_dicts, key=lambda x: x["start"], reverse=True)

    # select the most recent encounter that has a reason description
    most_recent_reason = next(
        (e["reason_desc"] for e in encounter_dicts if e["reason_desc"]), None
    )

    # get the date of the most recent encounter
    most_recent_encounter_date = (
        encounter_dicts[0]["start"] if encounter_dicts else None
    )

    # get the provider name of the most recent encounter
    most_recent_provider = (
        encounter_dicts[0]["provider_name"] if encounter_dicts else None
    )

    # filter the condition and medication lists to only include items that started before the most recent encounter
    # (skipped when the patient has no encounters, to avoid comparing against None)
    if most_recent_encounter_date is not None:
        condition_dicts = [
            c for c in condition_dicts if c["start"] < most_recent_encounter_date
        ]
        medication_dicts = [
            m for m in medication_dicts if m["start"] < most_recent_encounter_date
        ]

    # get a list of active medications by their description
    active_medications = [m["desc"] for m in medication_dicts if m["end"] is None]

    # Format the active medications into a list. If the list is empty, return a message indicating that there are no active medications
    formatted_medications = "Active medications\n"
    formatted_medications += (
        "\n".join(active_medications) if active_medications else "No active medications"
    )

    # group encounters by reason and encounter class, with a list of dates on which those encounter types occurred
    # exclude reasons that are "None" or empty strings
    encounter_reasons = {}
    for e in encounter_dicts:
        if e["reason_desc"] and e["reason_desc"] not in encounter_reasons:
            encounter_reasons[e["reason_desc"]] = {}
        if e["reason_desc"]:
            if e["encounterclass"] not in encounter_reasons[e["reason_desc"]]:
                encounter_reasons[e["reason_desc"]][e["encounterclass"]] = []
            encounter_reasons[e["reason_desc"]][e["encounterclass"]].append(e["start"])

    # format the encounter reasons, classes, and dates into a string; the dates are in str format and are joined as-is
    formatted_encounter_reasons = "Encounter reasons\n"
    for reason, classes in encounter_reasons.items():
        formatted_encounter_reasons += f"{reason}\n"
        for cls, dates in classes.items():
            formatted_encounter_reasons += f"  {cls}: {', '.join(dates)}\n"

    # get a list of active conditions by their description
    active_conditions = [c["desc"] for c in condition_dicts if c["end"] is None]

    # Format the active conditions into a string. If the list is empty, return a message indicating that there are no active conditions
    formatted_conditions = "Active conditions\n"
    formatted_conditions += (
        "\n".join(active_conditions) if active_conditions else "No active conditions"
    )

    summary = f"""{first} {last}, born on {birthdate}, is communicating with their healthcare provider, {most_recent_provider}, about "{most_recent_reason}". 
    
## Below is a summary of their healthcare history:

### {formatted_medications}

### {formatted_encounter_reasons} 
 
### {formatted_conditions}"""

    return summary
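
To try the encounter-grouping step outside Memex, the same idea can be exercised on plain dictionaries. This is a sketch with made-up sample records. The helper name `group_encounters`, the MM/YYYY rendering, and the assumption that `start` arrives as an ISO 8601 string are ours, not confirmed by the dataset.

```python
from collections import defaultdict
from datetime import datetime


def group_encounters(encounters):
    """Group encounter start dates by reason, then by encounter class.

    Encounters without a reason description are skipped, as in the UDF.
    Dates are rendered as MM/YYYY (assumes ISO 8601 `start` strings).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    # Most recent encounters first, matching the reverse chronological sort above
    for e in sorted(encounters, key=lambda x: x["start"], reverse=True):
        if not e.get("reason_desc"):
            continue
        month_year = datetime.fromisoformat(e["start"]).strftime("%m/%Y")
        grouped[e["reason_desc"]][e["encounterclass"]].append(month_year)
    # Convert the nested defaultdicts into plain dicts for easy printing
    return {reason: dict(classes) for reason, classes in grouped.items()}


# Hypothetical sample records, for illustration only
sample = [
    {"start": "2021-03-04T10:00:00", "encounterclass": "ambulatory", "reason_desc": "Asthma"},
    {"start": "2022-07-19T09:30:00", "encounterclass": "ambulatory", "reason_desc": "Asthma"},
    {"start": "2022-01-02T08:00:00", "encounterclass": "wellness", "reason_desc": None},
]
print(group_encounters(sample))
```

Testing the logic this way keeps the feedback loop short before the function is registered as a UDF.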

Now that we have our `format_patient_history` UDF, let's call it in a query and print the output for a few patients. Notice that `format_patient_history` is a Python-only UDF (i.e., it makes no LLM calls) that Memex lets you call from SQL.

In [None]:
format_query = """
SELECT 
  id, 
  format_patient_history(first, last, birthdate, encounter_list, condition_list, medication_list) as pt_hist 
FROM patients_joined"""

In [None]:
for record in mx.query(format_query, limit=3)['pt_hist'].values:
  print(record, "\n---\n")

Which generates something like this:

<figure><img src="../.gitbook/assets/synthesize-patient-query-1.png" alt=""><figcaption></figcaption></figure>

**Tip:** The Memex UI was built to render this type of result more effectively. At any point you can go to your instance and see the data you are creating. For example, the formatted patient history looks like this:

<figure><img src="../.gitbook/assets/synthesize-patient-ui-1.png" alt=""><figcaption></figcaption></figure>

Now let's run it on all 100 synthetic patients and save the outputs in a new table called `patient_histories_formatted`, which will contain the formatted patient medical histories.

In [None]:
format_query = """
SELECT 
  id, 
  format_patient_history(first, last, birthdate, encounter_list, condition_list, medication_list) as pt_hist 
FROM patients_joined"""

mx.save_as_table("patient_histories_formatted", format_query, overwrite=True)

The new table is now created and accessible through the Memex UI.

### Summarizing Patient Medical Histories

In this step, we'll define a prompt UDF, `summarize_patient_history`, which takes a patient's medical history as input and summarizes it. Because it is a prompt UDF, its body consists only of the LLM prompt we want to run on each medical history.

In [None]:
@mx.prompt
def summarize_patient_history(patient_history: str) -> str:
    """Summarize the following patient history:
    {patient_history}
    """
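
To preview the text a prompt like this might produce for a given row, you can fill the docstring template locally. This sketch assumes Memex substitutes `{patient_history}` with the argument value, much like Python's `str.format`. The helper `render_prompt` and the undecorated stand-in function are hypothetical and not part of Memex.

```python
import inspect


def summarize_patient_history_template(patient_history: str) -> str:
    """Summarize the following patient history:
    {patient_history}
    """


def render_prompt(func, **kwargs):
    """Fill a function's docstring template with keyword arguments."""
    template = inspect.getdoc(func)  # returns the docstring with indentation cleaned up
    return template.format(**kwargs)


print(
    render_prompt(
        summarize_patient_history_template,
        patient_history="Active medications\nNo active medications",
    )
)
```

Looking at the rendered prompt for one or two rows is a cheap way to catch formatting problems before spending LLM calls on the whole table.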

Memex supports many different LLMs. You can use the following call to list the available models and choose the one you want to use for the `summarize_patient_history` prompt UDF.

In [None]:
mx.get_models()

Let's go with `gpt-4-turbo-preview`.

Now we create a query that calls `summarize_patient_history`, passing each patient's history from `patient_histories_formatted` as input. Let's see what the result looks like:

In [None]:
summarize_query = """
WITH patients AS (
    SELECT *
    FROM patient_histories_formatted
    LIMIT 5
)
SELECT id , summarize_patient_history(pt_hist) as pt_hist_summary FROM patients
"""

mx.query(summarize_query).values

<figure><img src="../.gitbook/assets/synthesize-patient-query-2.png" alt=""><figcaption></figcaption></figure>

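And save the output as a new table called `patient_histories_summarized`. Note that `summarize_query` still contains `LIMIT 5`, so as written only five patients are summarized; remove the limit to summarize all 100.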

In [None]:
mx.save_as_table(
    "patient_histories_summarized",
    summarize_query,
    model="gpt-4-turbo-preview",
    temperature=0.2,
    max_tokens=1000,
    overwrite=True,
)

### Creating Patient Portal Messages

In this last step, we'll create synthetic patient conversations with their healthcare provider.

We'll start by defining a prompt UDF, `create_portal_message`. This function takes a patient medical history summary as input and generates a message that the patient might send to their healthcare provider through a patient portal. The aim is to use colloquial language to express concerns or questions about the patient's health that are specific to their own medical history.

In [None]:
@mx.prompt
def create_portal_message(patient_summary: str) -> str:
    """Your instructions are to write a message to your healthcare provider as if you are the patient summarized below.
    - You are messaging them with questions about your current health issue, which may be symptoms, billing issues, prescription issues, etc.
    - Use colloquial language and be concise. Do not use sophisticated medical terms.

    Patient Summary:
    {patient_summary}
    """

### Generating Portal Communications

With `create_portal_message` defined, the next step is to apply it to the patient medical histories we've summarized.

We do this with a SQL query that calls the UDF for each patient medical history summary, generating a simulated message that the patient might send to their healthcare provider about something related to their medical history.

Let's run it on a single patient to see it in action.

In [None]:
portal_query = """
WITH patients AS (
    SELECT id, pt_hist_summary
    FROM patient_histories_summarized
)
SELECT id, create_portal_message(pt_hist_summary) as portal_communication 
FROM patients LIMIT 1
"""


mx.query(portal_query)['portal_communication'].values[0]

Which generates something like this:

<figure><img src="../.gitbook/assets/synthesize-patient-query-3.png" alt=""><figcaption></figcaption></figure>

And now we can run it on all the patients to generate one message for each.

In [None]:
portal_query = """
WITH patients AS (
    SELECT id, pt_hist_summary
    FROM patient_histories_summarized
)
SELECT id, create_portal_message(pt_hist_summary) as portal_communication 
FROM patients LIMIT 100
"""

mx.save_as_table(
    "portal_communications",
    portal_query,
    model="gpt-4-turbo-preview",
    temperature=0.2,
    max_tokens=1000,
    overwrite=True,
)

The results of this query, saved as the `portal_communications` table, contain the generated messages from patients to their healthcare providers.
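
Since the prompt asks for concise messages, a quick sanity pass over the generated text can catch empty or overly long outputs before you rely on the dataset. This is a minimal sketch, assuming you have pulled the `portal_communication` column into a plain list of strings. `check_messages` and its `max_words` threshold are hypothetical and not a Memex feature.

```python
def check_messages(messages, max_words=150):
    """Return (index, problem) pairs for messages that look suspicious.

    `max_words` is an arbitrary illustrative threshold, not a Memex setting.
    """
    problems = []
    for i, msg in enumerate(messages):
        text = (msg or "").strip()
        if not text:
            problems.append((i, "empty"))
        elif len(text.split()) > max_words:
            problems.append((i, "too long"))
    return problems
```

An empty result means every message passed both checks; any flagged rows are worth inspecting in the Memex UI.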

And you are done! This is a good moment to go back to the Memex UI and explore all the data you created:

<figure><img src="../.gitbook/assets/synthesize-patient-ui-2.png" alt=""><figcaption></figcaption></figure>


***

## Conclusion

Let's put ourselves back in the shoes of the data scientist or engineer at the AI healthcare company. In just a few steps, you were able to format patient histories, summarize medical records, and generate patient-specific communications.

Well done!