# Prompt Template Writing with AI Editors

In this demo, we use an LLM-based prompt editor to write a draft prompt template. We then have the editor comment on the draft and manually complete the final prompt template. We use the prompt template to extract diagnosis frames and their attributes (date, status).

In [14]:
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import SentenceFrameExtractor
from llm_ie.prompt_editor import PromptEditor

We load a synthesized medical note generated by ChatGPT. 

In [2]:
# Load synthesized medical note
with open("./document/synthesized_note.txt", 'r') as f:
    note_text = f.read()

print(note_text)

### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Date of Birth:** January 15, 1975  
**Date of Admission:** July 20, 2024  
**Date of Discharge:** July 27, 2024  

**Attending Physician:** Dr. Jane Smith, MD  
**Consulting Physicians:** Dr. Emily Brown, MD (Cardiology), Dr. Michael Green, MD (Pulmonology)

#### Reason for Admission
John Doe, a 49-year-old male, was admitted to the hospital with complaints of chest pain, shortness of breath, and dizziness. The patient has a history of hypertension, hyperlipidemia, and Type 2 diabetes mellitus.

#### History of Present Illness
The patient reported that the chest pain started two days prior to admission. The pain was described as a pressure-like sensation in the central chest, radiating to the left arm and jaw. He also experienced dyspnea on exertion and occasional palpitations. The patient denied any recent upper respiratory infection, cough, or fever.

#### Past Medical History
- Hypertens

We start by defining an LLM inference engine and the model it serves. In this demo, we use Ollama to run Llama 3.1 70B.

In [3]:
# Define an LLM inference engine
llm = OllamaInferenceEngine(model_name="llama3.1:70b-instruct-q8_0", keep_alive=3600)

Define the LLM-based editor.

In [4]:
editor = PromptEditor(llm, SentenceFrameExtractor)

Give the editor a short description of the task. The editor will write a draft for us.

In [5]:
prompt_template_draft = editor.rewrite("Extract diagnosis from the clinical note. Make sure to include diagnosis date and status.")

Here is a rewritten version of the draft prompt following the provided guideline:

# Task description
The text below is from a clinical note. Your task is to extract the diagnosis information in a given sentence (provided by user at a time). Note that diagnosis can have associated dates and statuses.

# Schema definition
Your output should contain: 
    Must have, "Diagnosis" which is the name of the diagnosis spelled exactly as in the source document,
    If applicable, "Date" which is the date of the diagnosis,
    If applicable, "Status" which is the status of the diagnosis (e.g. active, resolved),

# Output format definition
Your output should follow JSON format, 
if there are diagnosis mentions:
    [{"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"},
    {"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"}]
if there is no diagnosis in the given sentence, just output an empty list:
    []

# Additional hints
Your output should be 100% ba

Then we have the editor comment on the draft.

In [6]:
comments = editor.comment(prompt_template_draft)

Based on the provided guideline, here's an analysis of the draft prompt:

**Task description**: The task description is clear and concise, mentioning that the text is from a clinical note and that the task is to extract diagnosis information from a given sentence.

**Schema definition**: The schema definition is well-defined, specifying the required and optional fields for the output. However, it might be helpful to clarify what format the "Date" field should take (e.g., MM/DD/YYYY).

**Output format definition**: The output format definition is clear, providing examples of how the JSON output should look like when there are diagnosis mentions and when there are none.

**Additional hints**: The additional hints section is concise but could be improved by adding more specific guidance on handling cases where the date or status is not mentioned. For example, it might be helpful to specify whether the "Date" field should be omitted entirely if no date is mentioned or if a default value (e

We adopt the editor's suggestions and manually write a version 2 prompt template below.

In [7]:
prompt_template_v2 = """
# Task description
The text below is from a clinical note. Your task is to extract the diagnosis information in a given sentence (provided by user at a time). Note that diagnosis can have associated dates and statuses.

# Schema definition
Your output should contain: 
    Must have, "Diagnosis" which is the name of the diagnosis spelled exactly as in the source document,
    If applicable, "Date" which is the date of the diagnosis following "MM/DD/YYYY" format. If only part of the date is known, default to the first day of the year/month.
    If applicable, "Status" which is the status of the diagnosis (e.g. active, resolved),

# Output format definition
Your output should follow JSON format, 
if there are diagnosis mentions:
    [{"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"},
    {"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"}]
if there is no diagnosis in the given sentence, just output an empty list:
    []

# Additional hints
    1. Your output should be 100% based on the provided content. DO NOT output fake information. 
    2. If there is no specific date or status, just omit the "Date" or "Status" key while still outputting the Diagnosis.

# Input placeholder
Below is a clinical note for your reference. I will feed you with sentences from it one by one.
"{{input}}"
"""
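
The date rule in the schema ("MM/DD/YYYY", defaulting to the first day of the year or month when only part of the date is known) can be illustrated with a small sketch. The helper `normalize_partial_date` is hypothetical and is not part of `llm_ie`; it only shows what the prompt asks the LLM to produce, and it could be used to sanity-check extracted dates.

```python
from datetime import date
from typing import Optional


def normalize_partial_date(year: int, month: Optional[int] = None,
                           day: Optional[int] = None) -> str:
    """Format a possibly partial date as MM/DD/YYYY.

    Hypothetical helper: a missing month or day defaults to 1,
    mirroring the rule stated in the prompt template.
    """
    month = 1 if month is None else month
    day = 1 if day is None else day
    # datetime.date raises ValueError for impossible dates (e.g. month 13)
    return date(year, month, day).strftime("%m/%d/%Y")


print(normalize_partial_date(2010))         # only the year is known
print(normalize_partial_date(2024, 7))      # year and month are known
print(normalize_partial_date(2024, 7, 18))  # full date
```

Under this rule, "diagnosed in 2010" maps to `01/01/2010`.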


Have the editor comment again.

In [8]:
comments = editor.comment(prompt_template_v2)

Based on the prompt guideline, here's an analysis of the draft prompt:

**Task description**: The task description is clear and concise, mentioning that the text is from a clinical note and that the task is to extract diagnosis information from a given sentence.

**Schema definition**: The schema definition is well-defined, specifying the required and optional fields for the output. However, it would be helpful to clarify what types of dates are expected (e.g., admission date, discharge date, etc.) and what statuses are possible (e.g., active, resolved, pending, etc.).

**Output format definition**: The output format definition is clear, specifying that the output should follow JSON format with specific keys for Diagnosis, Date, and Status.

**Additional hints**: The additional hints are helpful in clarifying the expectations for the output. However, it would be beneficial to provide more guidance on how to handle cases where the date or status is not explicitly mentioned in the senten

Based on the comments, we want to add some examples to the prompt. We manually prepare an example as a string.

In [9]:
example = """
Input: #### Past Medical History
- Hypertension (diagnosed in 2010)...

Output: [{"Diagnosis": "Hypertension", "Date": "01/01/2010", "Status": "history"}]
"""
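
Since the example is written by hand, it may be worth checking that its output is valid JSON and follows the schema before putting it into the prompt. The sketch below is one way to do that with the standard library; `check_example_output` and `ALLOWED_KEYS` are hypothetical names, not part of `llm_ie`, and the check assumes the example has exactly one line starting with `Output:`.

```python
import json

# Keys permitted by the schema in the prompt template
ALLOWED_KEYS = {"Diagnosis", "Date", "Status"}


def check_example_output(example_text: str) -> list:
    """Parse the 'Output:' line of an example and check it against the schema."""
    output_lines = [line for line in example_text.splitlines()
                    if line.startswith("Output:")]
    if len(output_lines) != 1:
        raise ValueError("expected exactly one line starting with 'Output:'")
    records = json.loads(output_lines[0][len("Output:"):])
    if not isinstance(records, list):
        raise ValueError("output must be a JSON list")
    for record in records:
        if "Diagnosis" not in record:
            raise ValueError(f"missing required key 'Diagnosis': {record}")
        extra = set(record) - ALLOWED_KEYS
        if extra:
            raise ValueError(f"unexpected keys {sorted(extra)}: {record}")
    return records


example = """
Input: #### Past Medical History
- Hypertension (diagnosed in 2010)...

Output: [{"Diagnosis": "Hypertension", "Date": "01/01/2010", "Status": "history"}]
"""

records = check_example_output(example)
print(records)
```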

We use the placeholder ```{{examples}}``` to insert the example text.

In [10]:
prompt_template_final = """
# Task description
The text below is from a clinical note. Your task is to extract the diagnosis information in a given sentence (provided by user at a time). Note that diagnosis can have associated dates and statuses.

# Schema definition
Your output should contain: 
    Must have, "Diagnosis" which is the name of the diagnosis spelled exactly as in the source document,
    If applicable, "Date" which is the date of the diagnosis following "MM/DD/YYYY" format. If only part of the date is known, default to the first day of the year/month.
    If applicable, "Status" which is the status of the diagnosis (e.g. active, resolved),

# Output format definition
Your output should follow JSON format, 
if there are diagnosis mentions:
    [{"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"},
    {"Diagnosis": "<Diagnosis text>", "Date": "<date>", "Status": "<status>"}]
if there is no diagnosis in the given sentence, just output an empty list:
    []

# Examples
{{examples}}

# Additional hints
    1. Your output should be 100% based on the provided content. DO NOT output fake information. 
    2. If there is no specific date or status, just omit the "Date" or "Status" key while still outputting the Diagnosis.

# Input placeholder
Below is a clinical note for your reference. I will feed you with sentences from it one by one.
"{{input}}"
"""
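
To make the role of the double-brace placeholders concrete, here is a minimal sketch of how such a template could be filled from a dictionary. This is an illustration only and not the `llm_ie` implementation; `fill_template` and `PLACEHOLDER` are hypothetical names. The sketch also shows why the placeholder names must match the dictionary keys exactly.

```python
import re

# Matches placeholders such as {{examples}} or {{input}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, text_content: dict) -> str:
    """Replace every {{name}} in the template with text_content[name]."""
    names = set(PLACEHOLDER.findall(template))
    missing = names - set(text_content)
    if missing:
        raise KeyError(f"no value provided for placeholders: {sorted(missing)}")
    # A function replacement avoids escape handling in the inserted text
    return PLACEHOLDER.sub(lambda m: text_content[m.group(1)], template)


template = '# Examples\n{{examples}}\n\n# Input placeholder\n"{{input}}"'
filled = fill_template(template, {"examples": "Input: ... Output: []",
                                  "input": "Discharge Summary Note"})
print(filled)
```

In this sketch, a dictionary key spelled `example` would not satisfy the `{{examples}}` placeholder and would raise a `KeyError`.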

We define the extractor, then use the ```text_content``` parameter to pass in the note text and the example. The ```entity_key``` tells the post-processor which output key holds the entity text, so that it can locate the entity's span in the document. The ```document_key``` tells the extractor which key of ```text_content``` provides the document text. 

In [12]:
extractor = SentenceFrameExtractor(llm, prompt_template_final)

frames = extractor.extract_frames(text_content={"examples":example, "input":note_text}, 
                                  entity_key="Diagnosis",
                                  document_key="input", 
                                  stream=True)



Sentence: 
### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Date of Birth:** January 15, 1975  
**Date of Admission:** July 20, 2024  
**Date of Discharge:** July 27, 2024  

**Attending Physician:** Dr.

Extraction:
[]



Sentence: 
Jane Smith, MD  
**Consulting Physicians:** Dr.

Extraction:
[]



Sentence: 
Emily Brown, MD (Cardiology), Dr.

Extraction:
[]



Sentence: 
Michael Green, MD (Pulmonology)

#### Reason for Admission
John Doe, a 49-year-old male, was admitted to the hospital with complaints of chest pain, shortness of breath, and dizziness.

Extraction:
[
    {"Diagnosis": "chest pain"},
    {"Diagnosis": "shortness of breath"},
    {"Diagnosis": "dizziness"}
]



Sentence: 
The patient has a history of hypertension, hyperlipidemia, and Type 2 diabetes mellitus.

Extraction:
[
    {"Diagnosis": "hypertension", "Status": "history"},
    {"Diagnosis": "hyperlipidemia", "Status": "history"},
    {"Diagnosis": "Type 2 diabetes

Now, we inspect the extracted diagnosis frames and attributes.

In [13]:
# Check extracted frames
for frame in frames:
    print(frame.to_dict())

{'frame_id': '0', 'start': 460, 'end': 470, 'entity_text': 'chest pain', 'attr': None}
{'frame_id': '1', 'start': 472, 'end': 491, 'entity_text': 'shortness of breath', 'attr': None}
{'frame_id': '2', 'start': 497, 'end': 506, 'entity_text': 'dizziness', 'attr': None}
{'frame_id': '3', 'start': 537, 'end': 549, 'entity_text': 'hypertension', 'attr': {'Status': 'history'}}
{'frame_id': '4', 'start': 551, 'end': 565, 'entity_text': 'hyperlipidemia', 'attr': {'Status': 'history'}}
{'frame_id': '5', 'start': 571, 'end': 595, 'entity_text': 'Type 2 diabetes mellitus', 'attr': {'Status': 'history'}}
{'frame_id': '6', 'start': 660, 'end': 670, 'entity_text': 'chest pain', 'attr': {'Date': '07/18/2024'}}
{'frame_id': '7', 'start': 837, 'end': 856, 'entity_text': 'dyspnea on exertion', 'attr': None}
{'frame_id': '8', 'start': 872, 'end': 884, 'entity_text': 'palpitations', 'attr': None}
{'frame_id': '9', 'start': 991, 'end': 1003, 'entity_text': 'Hypertension', 'attr': {'Date': '01/01/2010'}}
{
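
The `start` and `end` values above look like character offsets into the note with an exclusive end (for example, 470 - 460 equals the length of "chest pain"). The sketch below shows one way such spans could be located for a list of entity texts. It is an assumption about the general technique, not the post-processor that `llm_ie` actually uses, and `locate_entities` is a hypothetical helper.

```python
def locate_entities(document: str, entity_texts: list) -> list:
    """Return (entity, start, end) tuples, with end exclusive.

    Searches forward from the previous match so that repeated mentions
    (e.g. two occurrences of "chest pain") get distinct spans.
    """
    spans = []
    cursor = 0
    for entity in entity_texts:
        start = document.find(entity, cursor)
        if start == -1:
            # Fall back to searching the whole document
            start = document.find(entity)
        if start == -1:
            spans.append((entity, None, None))  # entity text not found
            continue
        end = start + len(entity)
        spans.append((entity, start, end))
        cursor = end
    return spans


document = ("complaints of chest pain, shortness of breath, and dizziness. "
            "The chest pain started two days ago.")
entities = ["chest pain", "shortness of breath", "dizziness", "chest pain"]

for entity, start, end in locate_entities(document, entities):
    print(entity, start, end)
```

This also illustrates why the prompt asks for the diagnosis to be spelled exactly as in the source document: an exact string search cannot find a paraphrased entity.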

Optionally, we can store the extracted frames in a ```LLMInformationExtractionDocument``` object for better management.

In [15]:
from llm_ie.data_types import LLMInformationExtractionDocument

doc = LLMInformationExtractionDocument(doc_id="Medical note", text=note_text)
doc.add_frames(frames)

In [16]:
print(doc)

LLMInformationExtractionDocument(doc_id: "Medical note"
text: "### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Dat...",
frames: 20
relations: 0


In [17]:
# doc.save("<your_directory>/<your_filename>.llmie")