# Drug, Strength, and Frequency Extraction
#### This notebook demonstrates named entity recognition (NER) and relation extraction (RE) with the llm-ie package.

In [1]:
from llm_ie.engines import OllamaInferenceEngine
from llm_ie.extractors import SentenceFrameExtractor, MultiClassRelationExtractor
from llm_ie.data_types import LLMInformationExtractionDocument

We load a synthesized medical note generated by ChatGPT and use it throughout this demo to show the NER and RE workflow.

In [2]:
# Load synthesized medical note
with open("./document/synthesized_note.txt", "r", encoding="utf-8") as f:
    note_text = f.read()

print(note_text)

### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Date of Birth:** January 15, 1975  
**Date of Admission:** July 20, 2024  
**Date of Discharge:** July 27, 2024  

**Attending Physician:** Dr. Jane Smith, MD  
**Consulting Physicians:** Dr. Emily Brown, MD (Cardiology), Dr. Michael Green, MD (Pulmonology)

#### Reason for Admission
John Doe, a 49-year-old male, was admitted to the hospital with complaints of chest pain, shortness of breath, and dizziness. The patient has a history of hypertension, hyperlipidemia, and Type 2 diabetes mellitus.

#### History of Present Illness
The patient reported that the chest pain started two days prior to admission. The pain was described as a pressure-like sensation in the central chest, radiating to the left arm and jaw. He also experienced dyspnea on exertion and occasional palpitations. The patient denied any recent upper respiratory infection, cough, or fever.

#### Past Medical History
- Hypertens

We start by defining an LLM inference engine. In this demo, we use Ollama to run Llama 3.3 70B (the 8-bit quantized instruct model).

In [None]:
# Define a LLM inference engine
llm = OllamaInferenceEngine(model_name="llama3.3:70b-instruct-q8_0", keep_alive=3600)

### Frame extraction (NER)

We write a prompt template for NER. This can be done manually or with the built-in ```PromptEditor``` (see the llm-ie GitHub main page). 

In [None]:

prompt_template = """
# Task description
The clinical note below contains information about medications prescribed to a patient. Your task is to extract medication-related information from a given sentence (the user provides one sentence at a time). Specifically, you need to identify mentions of medication names, strengths, and frequencies.

# Schema definition
Your output should contain:
    "entity_text", which is the exact mention of the entity, and
    "attr", which holds "entity_type", one of "Medication", "Strength", or "Frequency".

# Output format definition
Your output should follow JSON format. If there are medication-related mentions:
[
    {"entity_text": "<Medication name>", "attr": {"entity_type": "Medication"}}, 
    {"entity_text": "<Strength mention>", "attr": {"entity_type": "Strength"}},
    {"entity_text": "<Frequency mention>", "attr": {"entity_type": "Frequency"}}
]
If there is no medication-related information in the given sentence, just output an empty list:
[]

# Additional hints
Your output should be 100% based on the provided content. DO NOT output fake information.

# Context
Below is the clinical note for your reference. I will feed you sentences from it one by one.
{{input}}
"""

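The `{{input}}` placeholder marks where the full note is inserted. As a rough illustration only, the hypothetical `fill_template` helper below fills double-brace placeholders with the standard library; llm-ie's own substitution logic is not shown in this notebook and may differ. A double-brace syntax has the practical advantage that the single braces in the JSON examples are left untouched, which would not be the case with Python's `str.format`.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace each {{name}} placeholder with values[name] (hypothetical helper).

    Raises KeyError if a placeholder has no value, so missing inputs surface early.
    """
    def substitute(match: re.Match) -> str:
        # Returning the value from a function makes re.sub insert it literally,
        # so backslashes in the note text are not treated as escape sequences
        return str(values[match.group(1)])

    # \{\{ and \}\} match literal double braces; (\w+) captures the placeholder name
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


demo = '{"entity_text": "<Medication name>"}\n# Context\n{{input}}'
print(fill_template(demo, {"input": "Aspirin 81 mg daily."}))
```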
Now, we define an extractor to perform frame extraction. Note that this code block will take a few minutes to run, depending on your GPU.

In [None]:
# Define extractor
extractor = SentenceFrameExtractor(llm, prompt_template)

# Extract with concurrent mode (faster)
frames = extractor.extract_frames(note_text, concurrent=True)

# To print the step-by-step output, use the `concurrent=False` and `stream=True` options
# frames = extractor.extract_frames(note_text, concurrent=False, stream=True)
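Conceptually, a sentence-level extractor sends the model one sentence at a time, parses the JSON list the prompt asks for, and maps each `entity_text` back to character offsets in the full note. The sketch below illustrates that idea with a naive regex sentence splitter and a hypothetical `fake_llm` stand-in instead of a real model. It is not llm-ie's implementation, which may split sentences and match spans differently.

```python
import json
import re
from typing import Callable, Dict, List

# Naive sentence pattern (assumption): text ending at ., !, or ? followed by whitespace
# or end of text, or at a line break. Decimals such as "2.5 mg" are not split.
SENTENCE_RE = re.compile(r"\S[^\n]*?(?:[.!?](?=\s|$)|(?=\n)|$)")


def extract_frames(text: str, llm: Callable[[str], str]) -> List[Dict]:
    """Query `llm` once per sentence and map each returned entity back to note offsets."""
    frames: List[Dict] = []
    for match in SENTENCE_RE.finditer(text):
        sentence, offset = match.group(), match.start()
        for entity in json.loads(llm(sentence)):
            local = sentence.find(entity["entity_text"])  # first exact occurrence only
            if local == -1:
                continue  # skip mentions that do not appear verbatim in the sentence
            start = offset + local
            frames.append({
                "frame_id": str(len(frames)),
                "start": start,
                "end": start + len(entity["entity_text"]),
                "entity_text": entity["entity_text"],
                "attr": entity.get("attr", {}),
            })
    return frames


def fake_llm(sentence: str) -> str:
    """Hypothetical stand-in for a model call: tags a few hard-coded terms as JSON."""
    vocab = {"Aspirin": "Medication", "81 mg": "Strength", "daily": "Frequency"}
    found = [{"entity_text": term, "attr": {"entity_type": etype}}
             for term, etype in vocab.items() if term in sentence]
    return json.dumps(found)


note = "Continue Aspirin 81 mg daily. No known allergies."
for frame in extract_frames(note, fake_llm):
    print(frame)
```

Offsetting each local match by the sentence's start position is what turns per-sentence answers into document-level spans.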

The extractor outputs a list of frames (```LLMInformationExtractionFrame```). We can print them for inspection.

In [6]:
# Check extractions
for frame in frames:
    print(frame.to_dict())

{'frame_id': '0', 'start': 2482, 'end': 2489, 'entity_text': 'aspirin', 'attr': {'entity_type': 'Medication'}}
{'frame_id': '1', 'start': 2494, 'end': 2505, 'entity_text': 'clopidogrel', 'attr': {'entity_type': 'Medication'}}
{'frame_id': '2', 'start': 2518, 'end': 2540, 'entity_text': 'high-dose atorvastatin', 'attr': {'entity_type': 'Strength'}}
{'frame_id': '3', 'start': 3080, 'end': 3089, 'entity_text': 'metformin', 'attr': {'entity_type': 'Medication'}}
{'frame_id': '4', 'start': 3214, 'end': 3221, 'entity_text': 'Aspirin', 'attr': {'entity_type': 'Medication'}}
{'frame_id': '5', 'start': 3222, 'end': 3227, 'entity_text': '81 mg', 'attr': {'entity_type': 'Strength'}}
{'frame_id': '6', 'start': 3228, 'end': 3233, 'entity_text': 'daily', 'attr': {'entity_type': 'Frequency'}}
{'frame_id': '7', 'start': 3236, 'end': 3247, 'entity_text': 'Clopidogrel', 'attr': {'entity_type': 'Medication'}}
{'frame_id': '8', 'start': 3248, 'end': 3253, 'entity_text': '75 mg', 'attr': {'entity_type': 'S
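A quick consistency pass over the frames can catch span errors before they reach the document. The sketch below uses the first eight frames from the listing above (the ninth is truncated) and checks that each span's length matches its text, then counts entity types. It cannot catch type errors such as "high-dose atorvastatin" being labeled as a Strength; those still need a manual review.

```python
from collections import Counter

# (start, end, entity_text, entity_type) for the first eight frames printed above
listed = [
    (2482, 2489, "aspirin", "Medication"),
    (2494, 2505, "clopidogrel", "Medication"),
    (2518, 2540, "high-dose atorvastatin", "Strength"),
    (3080, 3089, "metformin", "Medication"),
    (3214, 3221, "Aspirin", "Medication"),
    (3222, 3227, "81 mg", "Strength"),
    (3228, 3233, "daily", "Frequency"),
    (3236, 3247, "Clopidogrel", "Medication"),
]
# In the notebook, the same shape comes from: [frame.to_dict() for frame in frames]
frame_dicts = [{"start": s, "end": e, "entity_text": t, "attr": {"entity_type": ty}}
               for s, e, t, ty in listed]

# A span's length should equal the length of the text it claims to cover.
# With the note loaded, a stricter check is note_text[f["start"]:f["end"]] == f["entity_text"].
mismatched = [f for f in frame_dicts if f["end"] - f["start"] != len(f["entity_text"])]
type_counts = Counter(f["attr"]["entity_type"] for f in frame_dicts)

print(mismatched)   # []
print(type_counts)  # Counter({'Medication': 5, 'Strength': 2, 'Frequency': 1})
```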

We define a document object to store the extracted frames. The ```add_frames()``` method validates the frames (checking for duplicates) and assigns IDs automatically.

In [7]:
# Define document
doc = LLMInformationExtractionDocument(doc_id="Meidcal note", text=note_text)

# Add frames to document
doc.add_frames(frames, valid_mode="span", create_id=True)
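The notebook does not show how `add_frames()` validates frames internally. As a rough illustration of what a span-based duplicate check with automatic IDs could look like, the hypothetical `dedupe_by_span` helper below keeps the first frame for each `(start, end)` pair and renumbers the survivors; llm-ie's actual rules for `valid_mode="span"` may differ.

```python
from typing import Dict, List


def dedupe_by_span(frames: List[Dict]) -> List[Dict]:
    """Keep the first frame for each (start, end) span and assign sequential string IDs."""
    seen = set()
    kept: List[Dict] = []
    for frame in frames:
        span = (frame["start"], frame["end"])
        if span in seen:
            continue  # this span was already added: treat the frame as a duplicate
        seen.add(span)
        kept.append({**frame, "frame_id": str(len(kept))})  # copy with a new ID
    return kept


frames = [
    {"frame_id": "a", "start": 3214, "end": 3221, "entity_text": "Aspirin"},
    {"frame_id": "b", "start": 3214, "end": 3221, "entity_text": "Aspirin"},
    {"frame_id": "c", "start": 3222, "end": 3227, "entity_text": "81 mg"},
]
print([f["frame_id"] for f in dedupe_by_span(frames)])  # ['0', '1']
```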

Inspect the document; it now holds the frames.

In [8]:
print(doc)

LLMInformationExtractionDocument(doc_id: "Meidcal note"
text: "### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Dat...",
frames: 23
relations: 0


### Relation extraction (RE)

Now that we have frames, we can extract relations between them. We write a prompt template for RE as below. This can be done manually or through the ```PromptEditor```. 

In [9]:
re_prompt_template = """
# Task description
This is a multi-class relation extraction task. Given a region of interest (ROI) text and two frames from a medical note, classify the relation type between the two frames.

# Schema definition
    Strength-Drug: this is a relationship between the drug strength and its name.
    Frequency-Drug: this is a relationship between a drug frequency and its name.

# Output format definition
    Choose one of the relation types listed below or choose "No Relation":
    {{pos_rel_types}}

    Your output should follow the JSON format:
    {"RelationType": "<relation type or No Relation>"}

    I am only interested in the JSON object. Do not explain your answer. 

# Hints
    1. Your input always contains one medication entity and 1) one strength entity or 2) one frequency entity.
    2. Pay attention to the medication entity and see if the strength or frequency is for it.
    3. If the strength or frequency is for another medication, output "No Relation". 
    4. If the strength or frequency is for the same medication but at a different location (span), output "No Relation".

# Input placeholders
ROI Text with the two entities annotated with <entity_1> and <entity_2>:
"{{roi_text}}"

Entity 1 full information:
{{frame_1}}

Entity 2 full information:
{{frame_2}}
"""

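The `{{roi_text}}` placeholder expects a snippet in which the two entities are wrapped in `<entity_1>` and `<entity_2>` tags. The hypothetical `mark_roi` function below shows one way to build such a snippet from two character spans plus surrounding context. The closing-tag form (`</entity_1>`) and the `window` size are assumptions, not necessarily what llm-ie produces.

```python
from typing import Tuple

Span = Tuple[int, int]


def mark_roi(text: str, span_1: Span, span_2: Span, window: int = 20) -> str:
    """Crop text around two non-overlapping spans and wrap them in <entity_1>/<entity_2> tags."""
    roi_start = max(0, min(span_1[0], span_2[0]) - window)
    roi_end = min(len(text), max(span_1[1], span_2[1]) + window)
    roi = text[roi_start:roi_end]
    # Tag the right-most span first so the left span's offsets stay valid
    for (start, end), tag in sorted([(span_1, "entity_1"), (span_2, "entity_2")], reverse=True):
        s, e = start - roi_start, end - roi_start
        roi = roi[:s] + f"<{tag}>" + roi[s:e] + f"</{tag}>" + roi[e:]
    return roi


note = "Continue Aspirin 81 mg daily. Start metformin."
print(mark_roi(note, (9, 16), (17, 22), window=5))
# inue <entity_1>Aspirin</entity_1> <entity_2>81 mg</entity_2> dail
```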
To avoid checking every combination of frame pairs (which would consume a lot of computation), the relation extractors require users to supply a pre-processing function that, given two frames, returns the possible relation types between them. 

In the function ```possible_relation_types_func``` below, we set:
  - if the two frames are > 500 characters apart (measured between their start positions), we assume "No Relation"
  - if the two frames are "Medication" and "Strength", the only possible relation types are "Strength-Drug" or "No Relation"
  - if the two frames are "Medication" and "Frequency", the only possible relation types are "Frequency-Drug" or "No Relation"

In [10]:
from typing import List

def possible_relation_types_func(frame_1, frame_2) -> List[str]:
    # If the two frames are > 500 characters apart, we assume "No Relation"
    if abs(frame_1.start - frame_2.start) > 500:
        return []
    
    # If the two frames are "Medication" and "Strength", the only possible relation types are "Strength-Drug" or "No Relation"
    if (frame_1.attr["entity_type"] == "Medication" and frame_2.attr["entity_type"] == "Strength") or \
        (frame_2.attr["entity_type"] == "Medication" and frame_1.attr["entity_type"] == "Strength"):
        return ['Strength-Drug']
    
    # If the two frames are "Medication" and "Frequency", the only possible relation types are "Frequency-Drug" or "No Relation"
    if (frame_1.attr["entity_type"] == "Medication" and frame_2.attr["entity_type"] == "Frequency") or \
        (frame_2.attr["entity_type"] == "Medication" and frame_1.attr["entity_type"] == "Frequency"):
        return ['Frequency-Drug']

    return []
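To see how much work the pre-filter saves, the sketch below re-expresses the same rules as a lookup table keyed by the unordered pair of entity types and applies them to the first eight frames from the earlier listing. `Frame` is a minimal stand-in dataclass, not llm-ie's `LLMInformationExtractionFrame`. The table-driven form makes adding a new relation type a one-line change.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List


@dataclass
class Frame:
    """Minimal stand-in exposing the two fields the filter reads."""
    start: int
    attr: Dict[str, str] = field(default_factory=dict)


# Unordered entity-type pair -> candidate relation types (same rules as the function above)
RULES = {
    frozenset({"Medication", "Strength"}): ["Strength-Drug"],
    frozenset({"Medication", "Frequency"}): ["Frequency-Drug"],
}


def candidate_types(f1: Frame, f2: Frame, max_distance: int = 500) -> List[str]:
    """Return possible relation types; an empty list means "No Relation" without an LLM call."""
    if abs(f1.start - f2.start) > max_distance:
        return []
    return RULES.get(frozenset({f1.attr["entity_type"], f2.attr["entity_type"]}), [])


# (start, entity_type) for the first eight frames listed earlier
listed = [(2482, "Medication"), (2494, "Medication"), (2518, "Strength"), (3080, "Medication"),
          (3214, "Medication"), (3222, "Strength"), (3228, "Frequency"), (3236, "Medication")]
frames = [Frame(start, {"entity_type": etype}) for start, etype in listed]

pairs = list(combinations(frames, 2))
candidates = [(a, b) for a, b in pairs if candidate_types(a, b)]
print(f"{len(candidates)} of {len(pairs)} pairs sent to the LLM")  # 8 of 28 pairs sent to the LLM
```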

Now we can define a relation extractor to perform relation extraction. Note that this code block can take a few minutes to run.

In [11]:
# Define relation extractor
relation_extractor = MultiClassRelationExtractor(
    llm,
    prompt_template=re_prompt_template,
    possible_relation_types_func=possible_relation_types_func,
)

# Extract multi-class relations with concurrent mode (faster)
relations = relation_extractor.extract_relations(doc, concurrent=True)

# To print the step-by-step output, use the `concurrent=False` and `stream=True` options
# relations = relation_extractor.extract_relations(doc, concurrent=False, stream=True)
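The RE prompt asks the model for `{"RelationType": "<relation type or No Relation>"}`. Models sometimes wrap that JSON in extra prose, so a tolerant parser helps. The hypothetical `parse_relation` below pulls out the first `{...}` object, reads `RelationType`, and returns `None` for "No Relation", malformed output, or any label outside the allowed list. How llm-ie itself post-processes the response is not shown in this notebook.

```python
import json
import re
from typing import List, Optional


def parse_relation(response: str, allowed: List[str]) -> Optional[str]:
    """Return the predicted relation type, or None if absent, invalid, or "No Relation"."""
    # Non-greedy match of the first flat {...} object; nested braces are not expected here
    match = re.search(r"\{.*?\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        label = json.loads(match.group()).get("RelationType")
    except json.JSONDecodeError:
        return None
    # "No Relation" is not in `allowed`, so it maps to None as well
    return label if label in allowed else None


print(parse_relation('Answer: {"RelationType": "Strength-Drug"}', ["Strength-Drug"]))  # Strength-Drug
print(parse_relation('{"RelationType": "No Relation"}', ["Strength-Drug"]))  # None
```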

The relation extractor outputs a list of relations; each is a dictionary with the ID of frame_1, the ID of frame_2, and the relation type.

In [12]:
print(relations)

[{'frame_1': '3', 'frame_2': '20', 'relation': 'Strength-Drug'}, {'frame_1': '3', 'frame_2': '21', 'relation': 'Frequency-Drug'}, {'frame_1': '4', 'frame_2': '5', 'relation': 'Strength-Drug'}, {'frame_1': '4', 'frame_2': '6', 'relation': 'Frequency-Drug'}, {'frame_1': '4', 'frame_2': '9', 'relation': 'Frequency-Drug'}, {'frame_1': '4', 'frame_2': '15', 'relation': 'Frequency-Drug'}, {'frame_1': '4', 'frame_2': '18', 'relation': 'Frequency-Drug'}, {'frame_1': '4', 'frame_2': '21', 'relation': 'Frequency-Drug'}, {'frame_1': '5', 'frame_2': '16', 'relation': 'Strength-Drug'}, {'frame_1': '6', 'frame_2': '7', 'relation': 'Frequency-Drug'}, {'frame_1': '6', 'frame_2': '10', 'relation': 'Frequency-Drug'}, {'frame_1': '6', 'frame_2': '16', 'relation': 'Frequency-Drug'}, {'frame_1': '7', 'frame_2': '8', 'relation': 'Strength-Drug'}, {'frame_1': '7', 'frame_2': '9', 'relation': 'Frequency-Drug'}, {'frame_1': '10', 'frame_2': '11', 'relation': 'Strength-Drug'}, {'frame_1': '10', 'frame_2': '12',
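For downstream use, it can be handier to pivot the pairwise relations into one record per medication mention. The sketch below does this with plain dictionaries. For illustration it assumes that frame IDs '4' to '7' in the relations refer to the frames with those IDs in the earlier listing (Aspirin, 81 mg, daily, Clopidogrel), which may not hold if `add_frames(..., create_id=True)` renumbered them. The relation dictionaries do not say which side is the medication, so the sketch checks entity types.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative subset: frame ID -> (entity_text, entity_type), assumed to match the listing above
frames: Dict[str, Tuple[str, str]] = {
    "4": ("Aspirin", "Medication"),
    "5": ("81 mg", "Strength"),
    "6": ("daily", "Frequency"),
    "7": ("Clopidogrel", "Medication"),
}
relations = [
    {"frame_1": "4", "frame_2": "5", "relation": "Strength-Drug"},
    {"frame_1": "4", "frame_2": "6", "relation": "Frequency-Drug"},
    {"frame_1": "6", "frame_2": "7", "relation": "Frequency-Drug"},
]


def group_by_medication(frames, relations) -> Dict[str, Dict[str, List[str]]]:
    """Map each medication frame ID to its linked strength and frequency mentions.

    Assumes every relation links exactly one Medication frame.
    """
    records = defaultdict(lambda: {"Strength": [], "Frequency": []})
    for rel in relations:
        id_1, id_2 = rel["frame_1"], rel["frame_2"]
        # Either side of the pair may be the medication
        med_id, other_id = (id_1, id_2) if frames[id_1][1] == "Medication" else (id_2, id_1)
        other_text, other_type = frames[other_id]
        records[med_id][other_type].append(other_text)
    return dict(records)


for med_id, attrs in group_by_medication(frames, relations).items():
    print(frames[med_id][0], attrs)
```

A grouped view like this also makes questionable links easier to spot, such as a single "daily" mention attached to two different medications.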

Add the extracted relations to the document object. The method validates them (e.g., it checks that every frame referenced in a relation exists).

In [13]:
doc.add_relations(relations)
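The kind of referential check described above can be sketched in a few lines. The hypothetical `invalid_relations` helper below returns every relation that points to a frame ID not in the document. It assumes, for illustration, that the document's 23 frames carry the IDs '0' to '22'; llm-ie's own validation may check more than this.

```python
from typing import Dict, Iterable, List, Set


def invalid_relations(relations: Iterable[Dict[str, str]], frame_ids: Set[str]) -> List[Dict[str, str]]:
    """Return relations that reference a frame ID not present in frame_ids."""
    return [r for r in relations
            if r["frame_1"] not in frame_ids or r["frame_2"] not in frame_ids]


frame_ids = {str(i) for i in range(23)}  # assumed IDs '0'-'22' for the 23 frames
relations = [
    {"frame_1": "4", "frame_2": "5", "relation": "Strength-Drug"},
    {"frame_1": "4", "frame_2": "99", "relation": "Frequency-Drug"},  # hypothetical dangling reference
]
print(invalid_relations(relations, frame_ids))
# [{'frame_1': '4', 'frame_2': '99', 'relation': 'Frequency-Drug'}]
```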

Now the document has all the frames and relations.

In [14]:
print(doc)

LLMInformationExtractionDocument(doc_id: "Meidcal note"
text: "### Discharge Summary Note

**Patient Name:** John Doe  
**Medical Record Number:** 12345678  
**Dat...",
frames: 23
relations: 24


Visualize the frames and relations with the `viz_serve()` method. A Flask app will start on *localhost:5000*.

In [None]:
doc.viz_serve(color_attr_key="entity_type")

Alternatively, visualize the frames and relations with `viz_render()`, which returns the visualization as HTML. This works better in a Jupyter notebook.

In [16]:
import html
from IPython.display import display, HTML

# Render the visualization as an HTML string
html_content = doc.viz_render(color_attr_key="entity_type")

# Embed the HTML in an iframe so it is isolated from the notebook's own page
iframe_html = f"""
    <iframe srcdoc="{html.escape(html_content)}" width="100%" height="300px" style="border:none;"></iframe>
"""
display(HTML(iframe_html))

Finally, we can save the document to a file with the ".llmie" extension. The extension is not enforced, but it is recommended.

In [30]:
# doc.save("<your_directory>/<your_filename>.llmie")
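If other tools need the results as plain JSON, a simple export can sit alongside the `.llmie` file. The hypothetical `export_json` helper below writes frames and relations with the standard library. It does not reproduce the `.llmie` format, and the frame dictionaries are assumed to have the shape returned by `frame.to_dict()` shown earlier.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def export_json(path: Path, doc_id: str, frames: List[Dict], relations: List[Dict]) -> None:
    """Write frames and relations to a plain JSON file (not the .llmie format)."""
    payload = {"doc_id": doc_id, "frames": frames, "relations": relations}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


# In the notebook, e.g.: export_json(Path("note.json"), "Medical note",
#                                    [f.to_dict() for f in frames], relations)
sample_frame = {"frame_id": "4", "start": 3214, "end": 3221,
                "entity_text": "Aspirin", "attr": {"entity_type": "Medication"}}
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "note.json"
    export_json(out, "Medical note", [sample_frame], [])
    print(json.loads(out.read_text(encoding="utf-8"))["frames"][0]["entity_text"])  # Aspirin
```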

### Conclusion

In this demo, we performed NER with ```SentenceFrameExtractor``` and RE with ```MultiClassRelationExtractor```. The output is a document object that holds "Medication", "Strength", and "Frequency" frames and "Strength-Drug" and "Frequency-Drug" relations. 