![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Applied_Generative_A/Healthcare_NLP_Agents_with_LLMs.ipynb)

# **JohnSnowLabs Healthcare NLP Agents with LLMs (Certification Trainings, July 2024)**

# Coding an LLM Agent with John Snow Labs Library (Healthcare NLP)

In [None]:
import json
import os

from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)

In [None]:
license_keys['JSL_VERSION']


'5.3.2'

In [None]:
license_keys['PUBLIC_VERSION']


'5.3.2'

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.4.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [None]:
import json
import os

import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

spark

Spark NLP Version : 5.3.2
Spark NLP_JSL Version : 5.3.2


In [None]:
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
import pyspark.sql.types as T
import pyspark.sql as SQL
from pyspark import keyword_only

In [None]:
!pip install -qU transformers accelerate

## Information Extraction with LLMs

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

In [None]:
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])


 To solve the equation 2x + 3 = 7, you need to isolate the variable x. Here are the steps:

1. Subtract 3 from both sides of the equation to get rid of the +3 on the left side. This gives you: 2x = 7 - 3, which simplifies to 2x = 4.

2. Now, divide both sides of the equation by 2 to solve for x. This gives you: x = 4 / 2, which simplifies to x = 2.

So, the solution to the equation 2x + 3 = 7 is x = 2.


### With a simple prompt

In [None]:
simple_prompt = """
You are a highly experienced and skilled medical annotator who has been working on medical texts to label medical and pharma-related entities.
Given the following text, return all the Problem, Test and Treatment entities:
"""
content_text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

messages = [
    {"role": "user", "content": f"{simple_prompt} {content_text}"}
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])


 Problem:
1. History of gestational diabetes mellitus
2. Subsequent type two diabetes mellitus (T2DM)
3. Prior episode of hyperglycemia-induced pancreatitis (HTG-induced pancreatitis)
4. Acute hepatitis
5. Polyuria
6. Poor appetite
7. Vomiting

Test:
1. Serum glucose
2. Creatinine
3. Triglycerides
4. Total cholesterol
5. Venous pH

Treatment:
1. Metformin
2. Glipizide
3. Dapagliflozin
4. Atorvastatin
5. Gemfibrozil


### With a detailed prompt (instructions and few-shot examples)

In [None]:
ner_prompt = """
You are a highly experienced and skilled medical annotator who has been working on medical texts to label medical and pharma-related entities.

I will provide you some entity types with sample chunks and I want you to find similar entities from given texts and label them with right entity types.


-  Entity Type: Problem

    Examples:
    a) given sample sentence:
    The patient is an 86-year-old man who has a diabetes.
    Problem in above given text: diabetes

    b) given sample sentence:
    No family history of coronary artery disease, CVA, diabetes, CHF or MI.
    Problem in above given text: coronary artery disease, CVA, diabetes, CHF or MI

-  Entity Type: Test

    Examples:
    a) given sample sentence:
    Decortication of the lung with pleural biopsy and transpleural fluoroscopy.
    Test in above given text: transpleural fluoroscopy

    b) given sample sentence:
    VITAL SIGNS: Blood pressure 123/95, heart rate 83, respirations 20, temperature 97, and oxygen saturation 97%.
    Test in above given text: Blood pressure 123/95, heart rate 83, respirations 20, temperature 97, and oxygen saturation 97%

-  Entity Type: Treatment

    Examples:
    a) given sample sentence:
    He has also tried acupuncture, TENS unit, physical therapy, chiropractic treatment and multiple neuropathic medications, with no effect.
    Treatment in above given text: acupuncture, TENS unit, physical therapy, chiropractic treatment, multiple neuropathic medications


As AnnotatorGPT, I want you to extract Problem, Test and Treatment chunks from the given text and label them accordingly.

Task :

Find entities in the given sentence.

Answer value must be as given (valid JSON) for the given example sentence:

{{"list_of_entities":
    [
        {{"entity_type": "Treatment", "chunk": "skilled speech therapy"}}
    ]
}}

Now I want you to find the Problem, Test and Treatment entities in the given sentence:
"""



In [None]:
content_text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

messages = [
    {"role": "user", "content": f"{ner_prompt} {content_text}"}
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])


 {{
    "list_of_entities":
    [
        {{"entity_type": "Problem", "chunk": "gestational diabetes mellitus", "additional_info": "diagnosed eight years prior to presentation"}},
        {{"entity_type": "Problem", "chunk": "type two diabetes mellitus (T2DM)", "additional_info": "subsequent to gestational diabetes mellitus"}},
        {{"entity_type": "Problem", "chunk": "HTG-induced pancreatitis", "additional_info": "one prior episode three years prior to presentation"}},
        {{"entity_type": "Problem", "chunk": "acute hepatitis", "additional_info": "associated with"}},
        {{"entity_type": "Test", "chunk": "serum glucose 111 mg/dl"}},
        {{"entity_type": "Test", "chunk": "creatinine 0.4 mg/dL"}},
        {{"entity_type": "Test", "chunk": "triglycerides 508 mg/dL"}},
        {{"entity_type": "Test", "chunk": "total cholesterol 122 mg/dL"}},
        {{"entity_type": "Test", "chunk": "venous pH 7.27"}}
    ]
}}
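
The answer above uses doubled braces, so `json.loads` rejects it as written. A likely cause is that `ner_prompt` is a plain string rather than an f-string, so the `{{ ... }}` in its example reaches the model literally and the model imitates it. The sketch below is one way to tolerate this; `parse_entities` is a hypothetical helper, and the brace collapsing is an assumption that would misfire if a chunk itself contained doubled braces.

```python
import json

def parse_entities(raw_output):
    """Return the list under 'list_of_entities', tolerating doubled braces."""
    text = raw_output.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Fallback: collapse the doubled braces echoed from the prompt example
        payload = json.loads(text.replace("{{", "{").replace("}}", "}"))
    return payload["list_of_entities"]

# Shortened sample in the same shape as the model answer above
raw = '''
 {{
    "list_of_entities":
    [
        {{"entity_type": "Test", "chunk": "venous pH 7.27"}}
    ]
}}
'''
for entity in parse_entities(raw):
    print(entity["entity_type"], "->", entity["chunk"])
```

Valid JSON is tried first, so a well-formed answer with legitimately nested objects is never altered by the fallback.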


## Information Extraction with Healthcare NLP by John Snow Labs

In [None]:
# Annotator that transforms a text column from dataframe into an Annotation ready for NLP
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenizer splits words in a relevant format for NLP
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings trained on PubMed dataset
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["sentence","token"])\
    .setOutputCol("embeddings")

# NER model trained on i2b2 (sampled from MIMIC) dataset
clinical_ner = MedicalNerModel.pretrained("ner_clinical_large","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner")\
    .setLabelCasing("upper") #decide if we want to return the tags in upper or lower case

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter
        ])


empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical_large download started this may take some time.
[OK!]


In [None]:
clinical_ner.getClasses()


['O',
 'B-TREATMENT',
 'I-TREATMENT',
 'B-PROBLEM',
 'I-PROBLEM',
 'B-TEST',
 'I-TEST']
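
The classes above are BIO tags: `B-` marks the first token of an entity, `I-` a continuation token, and `O` a token outside any entity. As a small sketch, the distinct entity types can be derived from them, for example to build a list of candidate entity types instead of hard-coding it. `entity_types_from_bio` is a hypothetical helper name.

```python
def entity_types_from_bio(classes):
    """Strip the B-/I- prefixes from BIO tags and return the unique entity types."""
    types = []
    for tag in classes:
        if tag == "O":
            # 'O' marks tokens outside any entity
            continue
        entity = tag.split("-", 1)[1]
        if entity not in types:
            types.append(entity)
    return types

# The class list returned by clinical_ner.getClasses() above
classes = ['O', 'B-TREATMENT', 'I-TREATMENT', 'B-PROBLEM', 'I-PROBLEM', 'B-TEST', 'I-TEST']
print(entity_types_from_bio(classes))  # ['TREATMENT', 'PROBLEM', 'TEST']
```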

In [None]:
def explode_annotate(ann_result):
    '''
        Convert a LightPipeline fullAnnotate() result into a list of dictionaries.
        input: raw result (a list with one entry per input text)
        output: one dictionary per NER chunk found in the first text
    '''
    result = []

    # Only the first document is processed; each chunk is an Annotation object
    for chunk in ann_result[0]['ner_chunk']:
        content = {
            "chunk": chunk.result,
            "begin": int(chunk.begin),
            "end": int(chunk.end),
            "entity_label": chunk.metadata['entity'],
            "confidence": chunk.metadata['confidence']  # kept as returned in the metadata
        }

        result.append(content)

    return result


In [None]:
# fullAnnotate in LightPipeline

text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

print (text)

light_model = LightPipeline(model)

res = light_model.fullAnnotate(text)

pd.DataFrame(explode_annotate(res))


A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.



| | chunk | begin | end | entity_label | confidence |
|---|---|---|---|---|---|
| 0 | gestational diabetes mellitus | 40 | 68 | PROBLEM | 0.91976666 |
| 1 | subsequent type two diabetes mellitus | 118 | 154 | PROBLEM | 0.75924003 |
| 2 | T2DM | 157 | 160 | PROBLEM | 0.9917 |
| 3 | HTG-induced pancreatitis | 185 | 208 | PROBLEM | 0.97535 |
| 4 | an acute hepatitis | 265 | 282 | PROBLEM | 0.9440667 |
| 5 | polyuria | 322 | 329 | PROBLEM | 0.9728 |
| 6 | poor appetite | 332 | 344 | PROBLEM | 0.9934 |
| 7 | vomiting | 351 | 358 | PROBLEM | 0.9854 |
| 8 | metformin | 372 | 380 | TREATMENT | 0.9998 |
| 9 | glipizide | 383 | 391 | TREATMENT | 0.9999 |
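
`explode_annotate` copies `confidence` straight from the annotation metadata, where it is a string (the dictionary output later in this notebook shows values such as `'0.91976666'`). A minimal sketch of filtering on it therefore needs a cast first. `filter_by_confidence` and the 0.9 threshold are illustrative assumptions, not part of the library.

```python
def filter_by_confidence(entities, min_confidence=0.9):
    """Keep entities whose confidence (stored as a string) is at least min_confidence."""
    return [e for e in entities if float(e["confidence"]) >= min_confidence]

# A few rows taken from the table above, in the shape explode_annotate returns
sample_entities = [
    {"chunk": "gestational diabetes mellitus", "entity_label": "PROBLEM", "confidence": "0.91976666"},
    {"chunk": "subsequent type two diabetes mellitus", "entity_label": "PROBLEM", "confidence": "0.75924003"},
    {"chunk": "metformin", "entity_label": "TREATMENT", "confidence": "0.9998"},
]

for entity in filter_by_confidence(sample_entities):
    print(entity["chunk"], entity["confidence"])
```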


## Building an NER agent with Healthcare NLP




In [None]:
from typing import List, Union

def SNLP4HC_general_Tool_func(text: Union[str, None]):
    light_result = light_model.fullAnnotate(text)
    return explode_annotate(light_result)


In [None]:
agent_list = [{
        'name': "SNLP4HC_general_Tool_func",
        'description' : "when you need to extract clinical named entities such as Problem, Test and Treatment from patient notes. The requested entities could be any of Problem, Test or Treatment. For the irrelevant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.",
        'shot': 'Given the following text, return all the Problem, Test and Treatment entities: sample text'
}]
agent_list

[{'name': 'SNLP4HC_general_Tool_func',
  'description': "when you need to extract clinical named entities such as Problem, Test and Treatment from patient notes. The requested entities could be any of Problem, Test or Treatment. For the irrelevant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.",
  'shot': 'Given the following text, return all the Problem, Test and Treatment entities: sample text'}]

In [None]:
# prompt template

HC_agent_prompt_text = ""

for agent in agent_list:

    name = agent['name']

    description = agent['description']

    HC_agent_prompt = f"""
    "{name}": Useful for {description}
    To use the {name} tool, Assistant should respond like so:
    {{{{"action": "{name}", "action_input": "the text user specifies or refers to"}}}}
    """

    HC_agent_prompt_text = HC_agent_prompt_text+ HC_agent_prompt


print (HC_agent_prompt_text)


    "SNLP4HC_general_Tool_func": Useful for when you need to extract clinical named entities such as Problem, Test and Treatment from patient notes. The requested entities could be any of Problem, Test or Treatment. For the irrelevant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.
    To use the SNLP4HC_general_Tool_func tool, Assistant should respond like so:
    {{"action": "SNLP4HC_general_Tool_func", "action_input": "the text user specifies or refers to"}}
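
The printed instruction keeps doubled braces because four braces in an f-string collapse to two. That form would be needed if the text were later passed through `str.format`, but here the final template is itself an f-string, so the doubled braces reach the model as-is. A small sketch of the difference (the variable names are illustrative):

```python
import json

name = "SNLP4HC_general_Tool_func"

# Four braces in an f-string produce two literal braces (the form printed above)
doubled = f'{{{{"action": "{name}", "action_input": "sample text"}}}}'

# Two braces in an f-string produce one literal brace, i.e. plain JSON
single = f'{{"action": "{name}", "action_input": "sample text"}}'

print(doubled)
print(single)
print(json.loads(single)["action"])
```

Only the single-brace form parses with `json.loads`, so it is the safer example to show a model that is expected to answer in JSON.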
    


In [None]:
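HC_agent_shot_text = ""

for agent in agent_list:

    name = agent['name']

    # The few-shot request is built from the description (the part before '>');
    # the 'shot' key defined in agent_list is not used here.
    shot = agent['description']

    shot = shot.split('>')[0].replace('when you need to ','')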

    HC_agent_shot = f"""
    User: Can you {shot}: sample text.
    Assistant: {{{{"action": "{name}", "action_input": "sample text"}}}}
    """

    HC_agent_shot_text = HC_agent_shot_text + HC_agent_shot


print (HC_agent_shot_text)


    User: Can you extract clinical named entities such as Problem, Test and Treatment from patient notes. The requested entities could be any of Problem, Test or Treatment. For the irrelevant requests, don't use this tool. : sample text.
    Assistant: {{"action": "SNLP4HC_general_Tool_func", "action_input": "sample text"}}
    


In [None]:
user_input = 'user_input'

prompt_template = f'''<s>[INST] <<SYS>>
Assistant is a expert JSON builder designed to assist with a wide range of tasks.

Assistant is able to trigger actions for User by responding with JSON strings that contain "action" and "action_input" parameters.

Actions available to Assistant are:

{HC_agent_prompt_text}

Here are some previous conversations between the Assistant and User:

{HC_agent_shot_text}

If the Assistant is not able to find any agent, it should answer using its own knowledge and answer directly.

<</SYS>>

{user_input}

[/INST]'''

print (prompt_template)

<s>[INST] <<SYS>>
Assistant is a expert JSON builder designed to assist with a wide range of tasks.

Assistant is able to trigger actions for User by responding with JSON strings that contain "action" and "action_input" parameters.

Actions available to Assistant are:


    "SNLP4HC_general_Tool_func": Useful for when you need to extract clinical named entities such as Problem, Test and Treatment from patient notes. The requested entities could be any of Problem, Test or Treatment. For the irrelevant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.
    To use the SNLP4HC_general_Tool_func tool, Assistant should respond like so:
    {{"action": "SNLP4HC_general_Tool_func", "action_input": "the text user specifies or refers to"}}
    

Here are some previous conversations between the Assistant and User:


    User: Can you extract clinical named entities such as Problem, Test and Treatment from patient notes. T

In [None]:
import re
import json
from IPython.display import display, HTML


def local_llm_own_knowledge(user_prompt):
    # Send the user prompt to the LLM as-is, without the agent instructions
    output = pipe(user_prompt, **generation_args)

    return output[0]['generated_text']

def local_llm(user_prompt):
    # Fill the 'user_input' placeholder of the agent prompt template
    user_prompt = prompt_template.replace('user_input', "User: " + user_prompt)

    output = pipe(user_prompt, **generation_args)

    return output[0]['generated_text']


def process_command_SingleAgent(user_prompt):

    # Send command to the model
    response = local_llm(user_prompt)

    print ('raw response:',response)

    # Try to find JSON in the response
    try:
        # Extract JSON from the model response by finding the first and last braces {}
        firstBracketIndex = response.index("{")
        lastBracketIndex = response.rindex("}") + 1
        jsonString = response[firstBracketIndex:lastBracketIndex]
        responseJson = json.loads(jsonString)

        if responseJson['action'] == 'SNLP4HC_general_Tool_func':
            print ('\n Agent found:', responseJson['action'], '\n')
            return SNLP4HC_general_Tool_func(responseJson['action_input'])
        else:
            print ('\n NO MATCHING AGENT FOUND ! LLM will answer using its own knowledge.\n')
            return local_llm_own_knowledge(user_prompt)

    except Exception:
        # No braces, invalid JSON, a missing key or a tool failure all end up here
        print('\n NO MATCHING AGENT FOUND OR LLM FAILED TO RETURN STRUCTURED JSON! \n LLM will answer using its own knowledge.\n')
        return local_llm_own_knowledge(user_prompt)


In [None]:
text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

result = process_command_SingleAgent(f"Can you extract Problem, Test and Treatment entities from the following text: {text}")

result

raw response:  To extract the requested clinical named entities such as Problem, Test, and Treatment from the provided text, the Assistant will use the SNLP4HC_general_Tool_func action with the text as the input.

Here is how the Assistant should respond:

```json
{
  "action": "SNLP4HC_general_Tool_func",
  "action_input": "A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly, her abdominal examination was benign with no ten

[{'chunk': 'gestational diabetes mellitus',
  'begin': 39,
  'end': 67,
  'entity_label': 'PROBLEM',
  'confidence': '0.91976666'},
 {'chunk': 'subsequent type two diabetes mellitus',
  'begin': 117,
  'end': 153,
  'entity_label': 'PROBLEM',
  'confidence': '0.75924003'},
 {'chunk': 'T2DM',
  'begin': 156,
  'end': 159,
  'entity_label': 'PROBLEM',
  'confidence': '0.9917'},
 {'chunk': 'HTG-induced pancreatitis',
  'begin': 184,
  'end': 207,
  'entity_label': 'PROBLEM',
  'confidence': '0.97535'},
 {'chunk': 'an acute hepatitis',
  'begin': 264,
  'end': 281,
  'entity_label': 'PROBLEM',
  'confidence': '0.9440667'},
 {'chunk': 'polyuria',
  'begin': 321,
  'end': 328,
  'entity_label': 'PROBLEM',
  'confidence': '0.9728'},
 {'chunk': 'poor appetite',
  'begin': 331,
  'end': 343,
  'entity_label': 'PROBLEM',
  'confidence': '0.9934'},
 {'chunk': 'vomiting',
  'begin': 350,
  'end': 357,
  'entity_label': 'PROBLEM',
  'confidence': '0.9854'},
 {'chunk': 'metformin',
  'begin': 371,
 

In [None]:
pd.DataFrame(result)

| | chunk | begin | end | entity_label | confidence |
|---|---|---|---|---|---|
| 0 | gestational diabetes mellitus | 39 | 67 | PROBLEM | 0.91976666 |
| 1 | subsequent type two diabetes mellitus | 117 | 153 | PROBLEM | 0.75924003 |
| 2 | T2DM | 156 | 159 | PROBLEM | 0.9917 |
| 3 | HTG-induced pancreatitis | 184 | 207 | PROBLEM | 0.97535 |
| 4 | an acute hepatitis | 264 | 281 | PROBLEM | 0.9440667 |
| 5 | polyuria | 321 | 328 | PROBLEM | 0.9728 |
| 6 | poor appetite | 331 | 343 | PROBLEM | 0.9934 |
| 7 | vomiting | 350 | 357 | PROBLEM | 0.9854 |
| 8 | metformin | 371 | 379 | TREATMENT | 0.9998 |
| 9 | glipizide | 382 | 390 | TREATMENT | 0.9999 |
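
`process_command_SingleAgent` locates the JSON by slicing from the first `{` to the last `}`, which fails whenever the model adds a stray brace after the object. A hedged alternative is to let the standard-library decoder find where the first valid object ends. `extract_first_json_object` is a hypothetical helper, not something the notebook or any library defines.

```python
import json

def extract_first_json_object(response):
    """Return the first JSON object embedded in free text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _ = decoder.raw_decode(response, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # Not valid JSON from this brace: try the next one
        start = response.find("{", start + 1)
    return None

sample_response = (
    'Here is the call:\n```json\n'
    '{"action": "SNLP4HC_general_Tool_func", "action_input": "sample text"}\n'
    '```\nUse {} with care.'
)
print(extract_first_json_object(sample_response))
```

In this sample the trailing `{}` would make the first-to-last-brace slice invalid JSON, while the decoder-based scan still returns the action dictionary.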


In [None]:
# Test with a non-existent entity label
# The LLM should answer using its own knowledge.

text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''


result = process_command_SingleAgent(f"Can you extract Oncology entities from the following text: {text}")

result

raw response:  To extract Oncology entities from the provided text, the Assistant should use the specified tool designed for clinical information extraction. Given the context, it seems that the focus is on identifying terms related to Oncology within the patient's history and current condition. However, the provided text does not explicitly mention any Oncology entities such as specific types of cancer, treatments, or procedures directly related to Oncology. The text primarily discusses gestational diabetes, type 2 diabetes, hypertriglyceridemia (HTG), and symptoms that may require further investigation but does not directly reference Oncology entities.

Given the instructions and the nature of the text, the Assistant should still attempt to use the tool for a comprehensive analysis, even though the direct application of "SNLP4HC_general_Tool_func" for Oncology entities might not yield explicit results in this specific context. The Assistant's response should reflect an attempt to app

'Her initial serum aminotransferases were normal. Repeat testing at 72 hours of presentation showed an alanine aminotransferase (ALT) of 1013 U/L and aspartate aminotransferase (AST) of 972 U/L.\nSerologic testing for viral hepatitis A, B, and C was negative.\nHer abdominal ultrasound showed no evidence of fatty liver, cholelithiasis, or cholecystitis.\nHer computed tomography (CT) scan of the abdomen and pelvis with contrast showed no evidence of pancreatic mass, ascites, or other intra-abdominal pathology.\nHer magnetic resonance cholangiopancreatography (MRCP) showed no evidence of biliary obstruction.\nHer endoscopic ultrasound (EUS) showed no evidence of pancreatic mass.\nHer endoscopic retrograde cholangiopancreatography (ERCP) showed no evidence of biliary obstruction.\nHer diagnosis of acute pancreatitis was confirmed by elevated serum aminotransferases and imaging studies.\nHer acute pancreatitis was thought to be due to dapagliflozin.\nHer dapagliflozin was discontinued.\nHer
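
The fallback answer above reads like a continuation of the clinical note rather than a reply. One plausible reason is that the fallback passes a plain string to the pipeline, whereas the earlier cells passed a list of chat messages, and the text-generation pipeline only applies the model's chat template to message lists. A minimal sketch of wrapping the prompt (`as_chat_messages` is a hypothetical helper; whether this fully fixes the answer has not been verified here):

```python
def as_chat_messages(user_prompt):
    """Wrap a plain prompt in the chat-message format used in the earlier cells."""
    return [{"role": "user", "content": user_prompt}]

messages = as_chat_messages("Can you extract Oncology entities from the following text: <clinical note>")
print(messages)

# With the notebook's objects in scope, the call would then be:
# output = pipe(messages, **generation_args)
# print(output[0]['generated_text'])
```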

In [None]:
# Let's look at the raw LLM output again (no agent prompt)
# As above, the response is irrelevant to the request

user_prompt = f"Can you extract Problem, Test and Treatment entities from the following text: {text}"

output = pipe(user_prompt, **generation_args)

output[0]['generated_text']

"Her initial workup included a comprehensive metabolic panel, complete blood count, and urinalysis. The comprehensive metabolic panel revealed a creatinine of 0.4 mg/dL, an AST of 10 U/L, an ALT of 10 U/L, a BUN of 10 mg/dL, a calcium of 9.5 mg/dL, a chloride of 100 mEq/L, a CO2 of 20 mEq/L, a creatinine of 0.4 mg/dL, a potassium of 4.0 mEq/L, a sodium of 140 mEq/L, and a total bilirubin of 0.5 mg/dL. The complete blood count revealed a hemoglobin of 12.0 g/dL, a hematocrit of 36%, a platelet count of 250,000/mm3, a red blood cell count of 4.2 million cells/mm3, a white blood cell count of 7,000/mm3, and a white blood cell differential of 60% neutrophils, 30% lymphocytes, and 10% monocytes. The urinalysis revealed a specific gravity of 1.010, a pH of 6.0, a protein of negative, a red blood cell count of 0/hpf, a white blood cell count of 0/hpf, and a nitrite of negative.\nThe patient was admitted to the hospital for further workup and management. She was started on IV fluids and insuli

#### Align with user prompts precisely

In [None]:
text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

agent_result = process_command_SingleAgent(f"Can you extract Problem entities from the following text: {text}")

pd.DataFrame(agent_result)

raw response:  To extract "Problem" entities from the provided text using the specified tool, the Assistant should structure the JSON response as follows:

```json
{
  "action": "SNLP4HC_general_Tool_func",
  "action_input": "A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly, her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg

| | chunk | begin | end | entity_label | confidence |
|---|---|---|---|---|---|
| 0 | gestational diabetes mellitus | 39 | 67 | PROBLEM | 0.91976666 |
| 1 | subsequent type two diabetes mellitus | 117 | 153 | PROBLEM | 0.75924003 |
| 2 | T2DM | 156 | 159 | PROBLEM | 0.9917 |
| 3 | HTG-induced pancreatitis | 184 | 207 | PROBLEM | 0.97535 |
| 4 | an acute hepatitis | 264 | 281 | PROBLEM | 0.9440667 |
| 5 | polyuria | 321 | 328 | PROBLEM | 0.9728 |
| 6 | poor appetite | 331 | 343 | PROBLEM | 0.9934 |
| 7 | vomiting | 350 | 357 | PROBLEM | 0.9854 |
| 8 | metformin | 371 | 379 | TREATMENT | 0.9998 |
| 9 | glipizide | 382 | 390 | TREATMENT | 0.9999 |


**As you can see, even though we asked only for `PROBLEM` entities, the agent returned all entities, because the NLP tool cannot filter entities based on the user query. Let's fix this.**

In [None]:
def filter_by_user_query(user_prompt, candidate_entities):

  new_prompt = f'''<s>[INST] <<SYS>>

  Given the following user query, figure out which entities user asked for precisely out of candidate entities and return a list of entities accordingly:

  candidate entities are {candidate_entities}

  input: Give me all the Problem entities.

  output: ['Problem']

  <</SYS>>

  input: {user_prompt}
  output:

  [/INST]'''

  output = pipe(new_prompt, **generation_args)

  return output[0]['generated_text']

In [None]:
user_prompt = f"Can you extract Problem entities from the following text."

candidate_entities = ['Problem', 'Test', 'Treatment']

target_entities = filter_by_user_query(user_prompt, candidate_entities)
print (target_entities)

 ['Problem']


In [None]:
user_prompt = f"Return all the Test entities from the following text."

target_entities = filter_by_user_query(user_prompt, candidate_entities)
print (target_entities)

 ['Test']


In [None]:
df = pd.DataFrame(agent_result)

df[df['entity_label'].isin([x.upper() for x in eval(target_entities)])]

Unnamed: 0,chunk,begin,end,entity_label,confidence
16,Physical examination,536,555,TEST,0.9448
18,her abdominal examination,625,649,TEST,0.8749666
22,serum glucose,754,766,TEST,0.969
23,creatinine,779,788,TEST,0.9994
24,triglycerides,801,813,TEST,0.9999
25,total cholesterol,826,842,TEST,0.9827
26,venous pH,859,867,TEST,0.97185004
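
The cells above call `eval` on the text generated by the LLM, which executes whatever the model returns. A more defensive variant is sketched below. It is only an illustration: the helper name `parse_entity_list` is hypothetical and not part of the notebook, and it assumes the model replies with a Python-style list literal such as ` ['Test', 'Treatment']`.

```python
import ast

def parse_entity_list(llm_output, candidate_entities):
    """Parse an LLM reply such as " ['Test', 'Treatment']" into upper-case labels.

    Falls back to every candidate label when the reply is not a list literal.
    """
    allowed = {c.upper() for c in candidate_entities}
    try:
        # literal_eval only accepts Python literals; it never runs code
        parsed = ast.literal_eval(llm_output.strip())
    except (ValueError, SyntaxError):
        return sorted(allowed)
    if not isinstance(parsed, (list, tuple)):
        return sorted(allowed)
    # Keep only labels that really are candidates
    labels = [str(x).upper() for x in parsed if str(x).upper() in allowed]
    return labels or sorted(allowed)

print(parse_entity_list(" ['Test', 'Treatment']", ['Problem', 'Test', 'Treatment']))
```

The returned list can be passed directly to `df['entity_label'].isin(...)`, since the labels produced by the NER model are upper-case.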


In [None]:
user_prompt = f"Return all the Test and Treatment entities from the following text."

target_entities = filter_by_user_query(user_prompt, candidate_entities)

print ('target_entities:', target_entities)

df = pd.DataFrame(agent_result)

df[df['entity_label'].isin([x.upper() for x in eval(target_entities)])]

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


target_entities:  ['Test', 'Treatment']


Unnamed: 0,chunk,begin,end,entity_label,confidence
8,metformin,371,379,TREATMENT,0.9998
9,glipizide,382,390,TREATMENT,0.9999
10,dapagliflozin,397,409,TREATMENT,1.0
12,atorvastatin,424,435,TREATMENT,0.9999
13,gemfibrozil,441,451,TREATMENT,1.0
15,dapagliflozin,478,490,TREATMENT,0.9999
16,Physical examination,536,555,TEST,0.9448
18,her abdominal examination,625,649,TEST,0.8749666
22,serum glucose,754,766,TEST,0.969
23,creatinine,779,788,TEST,0.9994


In [None]:
type(agent_result)

list

In [None]:
agent_result[0]

{'chunk': 'gestational diabetes mellitus',
 'begin': 39,
 'end': 67,
 'entity_label': 'PROBLEM',
 'confidence': '0.91976666'}

In [None]:
target_entities = [x.upper() for x in eval(target_entities)]

[record for record in agent_result if record['entity_label'] in target_entities]

[{'chunk': 'metformin',
  'begin': 371,
  'end': 379,
  'entity_label': 'TREATMENT',
  'confidence': '0.9998'},
 {'chunk': 'glipizide',
  'begin': 382,
  'end': 390,
  'entity_label': 'TREATMENT',
  'confidence': '0.9999'},
 {'chunk': 'dapagliflozin',
  'begin': 397,
  'end': 409,
  'entity_label': 'TREATMENT',
  'confidence': '1.0'},
 {'chunk': 'atorvastatin',
  'begin': 424,
  'end': 435,
  'entity_label': 'TREATMENT',
  'confidence': '0.9999'},
 {'chunk': 'gemfibrozil',
  'begin': 441,
  'end': 451,
  'entity_label': 'TREATMENT',
  'confidence': '1.0'},
 {'chunk': 'dapagliflozin',
  'begin': 478,
  'end': 490,
  'entity_label': 'TREATMENT',
  'confidence': '0.9999'},
 {'chunk': 'Physical examination',
  'begin': 536,
  'end': 555,
  'entity_label': 'TEST',
  'confidence': '0.9448'},
 {'chunk': 'her abdominal examination',
  'begin': 625,
  'end': 649,
  'entity_label': 'TEST',
  'confidence': '0.8749666'},
 {'chunk': 'serum glucose',
  'begin': 754,
  'end': 766,
  'entity_label': '

In [None]:
import ast

def agent_with_filtered_entities(content_text, query, candidate_entities):

  agent_result = process_command_SingleAgent(f"{query}: {content_text}")

  # Ask the LLM which of the candidate entities the query refers to
  target_entities = filter_by_user_query(query, candidate_entities)

  print ('target_entities:', target_entities)

  try:
    # literal_eval parses the list literal without executing arbitrary code
    target_entities = [x.upper() for x in ast.literal_eval(target_entities.strip())]
    print('filtered by:', target_entities)
  except Exception:
    # The LLM output was not a valid list: keep every candidate entity.
    # Labels in agent_result are upper-case, so the fallback is upper-cased too.
    print ('no filtering')
    target_entities = [x.upper() for x in candidate_entities]

  df = pd.DataFrame(agent_result)

  return df[df['entity_label'].isin(target_entities)]


In [None]:
query = "Can you extract Problem entities from the following text"

content_text = '''
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
'''

candidate_entities = ['Problem', 'Test', 'Treatment']

df = agent_with_filtered_entities(content_text, query, candidate_entities)

df

raw response:  To extract "Problem" entities from the provided text using the specified tool, the Assistant should structure the JSON response as follows:

```json
{
  "action": "SNLP4HC_general_Tool_func",
  "action_input": "A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly, her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg

Unnamed: 0,chunk,begin,end,entity_label,confidence
0,gestational diabetes mellitus,39,67,PROBLEM,0.91976666
1,subsequent type two diabetes mellitus,117,153,PROBLEM,0.75924003
2,T2DM,156,159,PROBLEM,0.9917
3,HTG-induced pancreatitis,184,207,PROBLEM,0.97535
4,an acute hepatitis,264,281,PROBLEM,0.9440667
5,polyuria,321,328,PROBLEM,0.9728
6,poor appetite,331,343,PROBLEM,0.9934
7,vomiting,350,357,PROBLEM,0.9854
11,T2DM,415,418,PROBLEM,0.9995
14,HTG,457,459,PROBLEM,0.9994


## Building a multi-agent setup with Healthcare NLP

##### Adding Posology (medication) NER

In [None]:
# NER model trained on i2b2 (sampled from MIMIC) dataset
posology_ner = MedicalNerModel.pretrained("ner_posology", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

posology_ner_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")

# greedy model
posology_ner_greedy = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner_greedy")

ner_converter_greedy = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_greedy"])\
    .setOutputCol("ner_chunk_greedy")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    posology_ner,
    posology_ner_converter,
    posology_ner_greedy,
    ner_converter_greedy])

empty_data = spark.createDataFrame([[""]]).toDF("text")

posology_model = nlpPipeline.fit(empty_data)

light_posology_model = LightPipeline(posology_model)

ner_posology download started this may take some time.
[OK!]
ner_posology_greedy download started this may take some time.
[OK!]


In [None]:
posology_ner_greedy.getClasses()

['O',
 'I-STRENGTH',
 'B-FORM',
 'I-FREQUENCY',
 'B-ROUTE',
 'I-ROUTE',
 'B-FREQUENCY',
 'I-DRUG',
 'B-DURATION',
 'I-DOSAGE',
 'B-DRUG',
 'I-DURATION',
 'B-STRENGTH',
 'I-FORM',
 'B-DOSAGE']

In [None]:
set([x.split('-')[-1].lower() for x in posology_ner_greedy.getClasses() if x!='O'])

{'dosage', 'drug', 'duration', 'form', 'frequency', 'route', 'strength'}

In [None]:
text ='The patient was prescribed 1 capsule of Advil for 5 days . He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months .'

res = light_posology_model.fullAnnotate(text)

pd.DataFrame(explode_annotate(res))

Unnamed: 0,chunk,begin,end,entity_label,confidence
0,1,27,27,DOSAGE,0.9992
1,capsule,29,35,FORM,0.9897
2,Advil,40,44,DRUG,0.997
3,for 5 days,46,55,DURATION,0.71383333
4,40 units,126,133,DOSAGE,0.84085
5,insulin glargine,138,153,DRUG,0.87525
6,at night,155,162,FREQUENCY,0.76839995
7,12 units,166,173,DOSAGE,0.88755
8,insulin lispro,178,191,DRUG,0.77805
9,with meals,193,202,FREQUENCY,0.9213


In [None]:
from typing import List, Union

def SNLP4HC_general_Tool_func(text: Union[str, None]):
    light_result = light_model.fullAnnotate(text)
    return explode_annotate(light_result)

def SNLP4HC_medication_Tool_func(text: Union[str, None]):
    light_result = light_posology_model.fullAnnotate(text)
    return explode_annotate(light_result)

agent_list = [{
        'name': "SNLP4HC_general_Tool_func",
        'description' : "when you need to extract clinical named entities such as Problem, Test and Treatement from patient notes. The requested entities could be any of Problem, Test or Treatement. For the irrelavant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.",
        'shot': 'Given the following text, return all the Problem, Test and Treatment entities: sample text'
},
              {
        'name': "SNLP4HC_medication_Tool_func",
        'description': "when you need to extract medication/ posology related named entities (e.g. drug, dosage, strength etc) from patient notes.  The requested entities could be any of dosage, drug, duration, form, frequency, route, strength. For the irrelavant requests, don't use this tool. > Use this only when medication and poslogy related entities need to be extracted.",
        'shot': "extract the drug related entities from the following text: sample text"
    },]
agent_list

[{'name': 'SNLP4HC_general_Tool_func',
  'description': "when you need to extract clinical named entities such as Problem, Test and Treatement from patient notes. The requested entities could be any of Problem, Test or Treatement. For the irrelavant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.",
  'shot': 'Given the following text, return all the Problem, Test and Treatment entities: sample text'},
 {'name': 'SNLP4HC_medication_Tool_func',
  'description': "when you need to extract medication/ posology related named entities (e.g. drug, dosage, strength etc) from patient notes.  The requested entities could be any of dosage, drug, duration, form, frequency, route, strength. For the irrelavant requests, don't use this tool. > Use this only when medication and poslogy related entities need to be extracted.",
  'shot': 'extract the drug related entities from the following text: sample text'}]

In [None]:
HC_agent_shot_text = ""

for agent in agent_list:

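    name = agent['name']

    # Build the few-shot request from the description: keep the part before '>'
    # and drop the leading "when you need to " (the 'shot' key is not used here)
    shot = agent['description']

    shot = shot.split('>')[0].replace('when you need to ','')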

    HC_agent_shot = f"""
    User: Can you {shot}: sample text.
    Assistant: {{{{"action": "{name}", "action_input": "sample text"}}}}
    """

    HC_agent_shot_text = HC_agent_shot_text + HC_agent_shot


print (HC_agent_shot_text)


    User: Can you extract clinical named entities such as Problem, Test and Treatement from patient notes. The requested entities could be any of Problem, Test or Treatement. For the irrelavant requests, don't use this tool. : sample text.
    Assistant: {{"action": "SNLP4HC_general_Tool_func", "action_input": "sample text"}}
    
    User: Can you extract medication/ posology related named entities (e.g. drug, dosage, strength etc) from patient notes.  The requested entities could be any of dosage, drug, duration, form, frequency, route, strength. For the irrelavant requests, don't use this tool. : sample text.
    Assistant: {{"action": "SNLP4HC_medication_Tool_func", "action_input": "sample text"}}
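
The doubled braces in the printed shots come from the `{{{{` and `}}}}` written inside the f-string. The sketch below shows how the escaping behaves; the assumption (not confirmed by the notebook) is that the text is later passed through `str.format` or a similar template renderer, which would collapse `{{` to `{`. The value `hypothetical question` is made up for illustration.

```python
import json

name = "SNLP4HC_medication_Tool_func"

# Inside an f-string, "{{{{" renders as "{{" and "{name}" is substituted
shot = f'Assistant: {{{{"action": "{name}", "action_input": "sample text"}}}}'
print(shot)

# A later str.format call treats "{{" as an escaped "{", leaving plain JSON
rendered = ("{user_input}\n" + shot).format(user_input="hypothetical question")
print(rendered)

# The part after "Assistant: " is now valid JSON
payload = json.loads(rendered.split("Assistant: ")[1])
print(payload["action"])
```

If the prompt is never rendered a second time, the model sees the doubled braces verbatim, as in the printed output above.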
    


In [None]:

HC_agent_prompt_text = ""

for agent in agent_list:

    name = agent['name']

    description = agent['description']

    HC_agent_prompt = f"""
- "{name}": Useful for {description}
    - To use the {name} tool, Assistant should respond like so:
    {{{{"action": "{name}", "action_input": "the text user specifies or refers to"}}}}
"""

    HC_agent_prompt_text = HC_agent_prompt_text+ HC_agent_prompt


print (HC_agent_prompt_text)


- "SNLP4HC_general_Tool_func": Useful for when you need to extract clinical named entities such as Problem, Test and Treatement from patient notes. The requested entities could be any of Problem, Test or Treatement. For the irrelavant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.
    - To use the SNLP4HC_general_Tool_func tool, Assistant should respond like so:
    {{"action": "SNLP4HC_general_Tool_func", "action_input": "the text user specifies or refers to"}}

- "SNLP4HC_medication_Tool_func": Useful for when you need to extract medication/ posology related named entities (e.g. drug, dosage, strength etc) from patient notes.  The requested entities could be any of dosage, drug, duration, form, frequency, route, strength. For the irrelavant requests, don't use this tool. > Use this only when medication and poslogy related entities need to be extracted.
    - To use the SNLP4HC_medication_Tool_func tool, Ass

In [None]:
user_input = 'user_input'

prompt_template = f'''<s>[INST] <<SYS>>
Assistant is a expert JSON builder designed to assist with a wide range of tasks.

Assistant is able to trigger actions for User by responding with JSON strings that contain "action" and "action_input" parameters.

Actions available to Assistant are:

{HC_agent_prompt_text}

Here are some previous conversations between the Assistant and User:

{HC_agent_shot_text}

If the Assistant is not able to find any agent, it should answer using its own knowledge and answer directly.

<</SYS>>

{user_input}

[/INST]'''

print (prompt_template)

<s>[INST] <<SYS>>
Assistant is a expert JSON builder designed to assist with a wide range of tasks.

Assistant is able to trigger actions for User by responding with JSON strings that contain "action" and "action_input" parameters.

Actions available to Assistant are:


- "SNLP4HC_general_Tool_func": Useful for when you need to extract clinical named entities such as Problem, Test and Treatement from patient notes. The requested entities could be any of Problem, Test or Treatement. For the irrelavant requests, don't use this tool. > This will be used only when clinical information extraction tools are picked by the agent.
    - To use the SNLP4HC_general_Tool_func tool, Assistant should respond like so:
    {{"action": "SNLP4HC_general_Tool_func", "action_input": "the text user specifies or refers to"}}

- "SNLP4HC_medication_Tool_func": Useful for when you need to extract medication/ posology related named entities (e.g. drug, dosage, strength etc) from patient notes.  The requested e

In [None]:

def process_command_MultiAgent(user_prompt):

    # Send command to the model
    response = local_llm(user_prompt)

    print ('raw response:',response)
    # Try to find JSON in the response
    try:
        # Extract JSON from the model response by locating the first '{' and the last '}'
        firstBracketIndex = response.index("{")
        lastBracketIndex = response.rindex("}") + 1
        jsonString = response[firstBracketIndex:lastBracketIndex]
        responseJson = json.loads(jsonString)
        #print ('responseJson:', responseJson)
        if responseJson['action'] == 'SNLP4HC_general_Tool_func':
            print ('\n Agent found:', responseJson['action'], '\n')
            return SNLP4HC_general_Tool_func(responseJson['action_input'])
        elif responseJson['action'] == 'SNLP4HC_medication_Tool_func':
            print ('\n Agent found:', responseJson['action'], '\n')
            return SNLP4HC_medication_Tool_func(responseJson['action_input'])
        else:
            print ('\n NO MATCHING AGENT FOUND ! LLM will answer using its own knowledge.\n')
            return local_llm_own_knowledge(user_prompt)

    except Exception:
        # No '{...}' block, invalid JSON, or missing keys: fall back to the plain LLM answer
        print('\n NO MATCHING AGENT FOUND OR LLM FAILED TO RETURN STRUCTURED JSON! \n LLM will answer using its own knowledge.\n')
        return local_llm_own_knowledge(user_prompt)
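
As more tools are added, the `if`/`elif` chain grows with them. One possible alternative is a dictionary that maps tool names to functions. The sketch below is self-contained and only illustrative: `general_tool`, `medication_tool`, `extract_action` and `dispatch` are hypothetical stand-ins, not functions from the notebook or the library.

```python
import json

def general_tool(text):      # hypothetical stand-in for SNLP4HC_general_Tool_func
    return [{"tool": "general", "text": text}]

def medication_tool(text):   # hypothetical stand-in for SNLP4HC_medication_Tool_func
    return [{"tool": "medication", "text": text}]

# Registry: tool name returned by the LLM -> function to call
TOOLS = {
    "SNLP4HC_general_Tool_func": general_tool,
    "SNLP4HC_medication_Tool_func": medication_tool,
}

def extract_action(response):
    """Return the JSON object between the first '{' and the last '}', or None."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        parsed = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None

def dispatch(response, fallback):
    """Call the tool named in the response, or the fallback if none matches."""
    action = extract_action(response)
    name = action.get("action") if action else None
    tool = TOOLS.get(name) if isinstance(name, str) else None
    if tool is None:
        return fallback(response)
    return tool(action.get("action_input"))

reply = ('Here is the call:\n```json\n{"action": "SNLP4HC_medication_Tool_func", '
         '"action_input": "metformin 1000 mg"}\n```')
print(dispatch(reply, lambda r: "fallback"))
```

With this layout, registering a new tool means adding one entry to `TOOLS`; the parsing and fallback logic stay unchanged.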


In [None]:
text ='The patient was prescribed 1 capsule of Advil for 5 days . He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months .'

result = process_command_MultiAgent(f"Can you extract medication entities from the following text: {text}")

result

raw response:  To extract medication-related named entities such as drug names, dosages, and administration details from the provided text, the Assistant should use the SNLP4HC_medication_Tool_func tool. The text contains information about various medications including their names (Advil, insulin glargine, insulin lispro), dosages (1 capsule, 5 days, 40 units, 12 units, 1000 mg, twice a day), and administration details (at night, with meals).

Here is how the Assistant should respond:

```json

{

  "action": "SNLP4HC_medication_Tool_func",

  "action_input": "The patient was prescribed 1 capsule of Advil for 5 days. He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals, and metformin 1000 mg two times a day. It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months."

}

```

This JSON response correctly identifies the action to be taken and includes the text in

[{'chunk': '1',
  'begin': 27,
  'end': 27,
  'entity_label': 'DOSAGE',
  'confidence': '0.9992'},
 {'chunk': 'capsule',
  'begin': 29,
  'end': 35,
  'entity_label': 'FORM',
  'confidence': '0.9897'},
 {'chunk': 'Advil',
  'begin': 40,
  'end': 44,
  'entity_label': 'DRUG',
  'confidence': '0.997'},
 {'chunk': 'for 5 days',
  'begin': 46,
  'end': 55,
  'entity_label': 'DURATION',
  'confidence': '0.71383333'},
 {'chunk': '40 units',
  'begin': 125,
  'end': 132,
  'entity_label': 'DOSAGE',
  'confidence': '0.84085'},
 {'chunk': 'insulin glargine',
  'begin': 137,
  'end': 152,
  'entity_label': 'DRUG',
  'confidence': '0.87525'},
 {'chunk': 'at night',
  'begin': 154,
  'end': 161,
  'entity_label': 'FREQUENCY',
  'confidence': '0.76839995'},
 {'chunk': '12 units',
  'begin': 164,
  'end': 171,
  'entity_label': 'DOSAGE',
  'confidence': '0.88755'},
 {'chunk': 'insulin lispro',
  'begin': 176,
  'end': 189,
  'entity_label': 'DRUG',
  'confidence': '0.77805'},
 {'chunk': 'with meals'

In [None]:
pd.DataFrame(result)

Unnamed: 0,chunk,begin,end,entity_label,confidence
0,1,27,27,DOSAGE,0.9992
1,capsule,29,35,FORM,0.9897
2,Advil,40,44,DRUG,0.997
3,for 5 days,46,55,DURATION,0.71383333
4,40 units,125,132,DOSAGE,0.84085
5,insulin glargine,137,152,DRUG,0.87525
6,at night,154,161,FREQUENCY,0.76839995
7,12 units,164,171,DOSAGE,0.88755
8,insulin lispro,176,189,DRUG,0.77805
9,with meals,191,200,FREQUENCY,0.9213


#### Align with user prompts precisely

In [None]:
import ast

def MultiAgent_with_filtered_entities(content_text, query, candidate_entities):

  agent_result = process_command_MultiAgent(f"{query}: {content_text}")

  # Ask the LLM which of the candidate entities the query refers to
  target_entities = filter_by_user_query(query, candidate_entities)

  print ('target_entities:', target_entities)

  try:
    # literal_eval parses the list literal without executing arbitrary code
    target_entities = [x.upper() for x in ast.literal_eval(target_entities.strip())]
    print('filtered by:', target_entities)
  except Exception:
    # The LLM output was not a valid list: keep every candidate entity.
    # Labels in agent_result are upper-case, so the fallback is upper-cased too.
    print ('no filtering')
    target_entities = [x.upper() for x in candidate_entities]

  df = pd.DataFrame(agent_result)

  return df[df['entity_label'].isin(target_entities)]


In [None]:
candidate_entities = list(set([x.split('-')[-1] for x in posology_ner_greedy.getClasses()]))
candidate_entities

['STRENGTH', 'FREQUENCY', 'DOSAGE', 'ROUTE', 'DURATION', 'O', 'FORM', 'DRUG']
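
The list above still contains `'O'`, the BIO tag for tokens outside any entity, so it is offered to the LLM as if it were a real entity type. A minimal sketch of deriving the candidate labels without it follows; the tag list is copied from the `getClasses()` output shown earlier, and `entity_labels` is a hypothetical helper name.

```python
# BIO tags as returned by posology_ner_greedy.getClasses() earlier in the notebook
bio_tags = ['O', 'I-STRENGTH', 'B-FORM', 'I-FREQUENCY', 'B-ROUTE', 'I-ROUTE',
            'B-FREQUENCY', 'I-DRUG', 'B-DURATION', 'I-DOSAGE', 'B-DRUG',
            'I-DURATION', 'B-STRENGTH', 'I-FORM', 'B-DOSAGE']

def entity_labels(tags):
    """Strip the B-/I- prefix, drop the 'O' tag, and return sorted unique labels."""
    return sorted({t.split('-', 1)[-1] for t in tags if t != 'O'})

print(entity_labels(bio_tags))
```

Sorting also makes the order deterministic, which keeps the prompt sent to the LLM stable across runs.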

In [None]:
query = "Can you extract the drug and dosage information from the following text"

content_text ='The patient was prescribed 1 capsule of Advil for 5 days . He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months .'

df = MultiAgent_with_filtered_entities(content_text, query, candidate_entities)

df

raw response:  To extract the drug and dosage information from the provided text, we will use the "SNLP4HC_medication_Tool_func" action as it is specifically designed for this purpose. The text contains information about medications, their dosages, and the frequency of administration.

Here is how the JSON response should be structured:

```json
{
  "action": "SNLP4HC_medication_Tool_func",
  "action_input": "The patient was prescribed 1 capsule of Advil for 5 days. He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals, and metformin 1000 mg two times a day. It was determined that all SGLT2 inhibitors should be discontinued indefinitely for 3 months."
}
```

In this JSON response, the "action" parameter specifies that we are using the medication extraction tool, and the "action_input" parameter contains the text from which we need to extract the medication and dosage information. The assistant wil

Unnamed: 0,chunk,begin,end,entity_label,confidence
0,1,27,27,DOSAGE,0.9992
2,Advil,40,44,DRUG,0.997
4,40 units,125,132,DOSAGE,0.84085
5,insulin glargine,137,152,DRUG,0.87525
7,12 units,164,171,DOSAGE,0.88755
8,insulin lispro,176,189,DRUG,0.77805
10,metformin,207,215,DRUG,0.9997
13,SGLT2 inhibitors,269,284,DRUG,0.8754
