In [None]:
# enable automatic reloading of imported modules before each cell runs
%load_ext autoreload
%autoreload 2

In [1]:
# Load model directly
from jsonformer import Jsonformer
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AdaptLLM/medicine-chat")
model = AutoModelForCausalLM.from_pretrained("AdaptLLM/medicine-chat", device_map="auto", load_in_4bit=True)


In [2]:
user_input = '''Please extract information from the text between !!! by using only categorical words or choosing from options Yes or No. Do not provide any additional text or information. The output must be in the following format:

{
    "Sex": "",
    "Age": "",
    "Treatment": "",
    "Patient had ECG done?": "",
    "Patient had palpitations?": "",
    "Patient had leg operation?": "",
    "Rehabilitation time": ""
    "Patient finished with treatment?": "",
    "Patient died": "",
}

!!!
A 28-year-old previously healthy man presented with a 6-week history of palpitations.
The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea.
Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.
An electrocardiogram (ECG) revealed normal sinus rhythm and a Wolff– Parkinson– White pre-excitation pattern (Fig.1: Top), produced by a right-sided accessory pathway.
Transthoracic echocardiography demonstrated the presence of Ebstein's anomaly of the tricuspid valve, with apical displacement of the valve and formation of an “atrialized” right ventricle (a functional unit between the right atrium and the inlet [inflow] portion of the right ventricle) (Fig.2).
The anterior tricuspid valve leaflet was elongated (Fig.2C, arrow), whereas the septal leaflet was rudimentary (Fig.2C, arrowhead).
Contrast echocardiography using saline revealed a patent foramen ovale with right-to-left shunting and bubbles in the left atrium (Fig.2D).
The patient underwent an electrophysiologic study with mapping of the accessory pathway, followed by radiofrequency ablation (interruption of the pathway using the heat generated by electromagnetic waves at the tip of an ablation catheter).
His post-ablation ECG showed a prolonged PR interval and an odd “second” QRS complex in leads III, aVF and V2–V4 (Fig.1Bottom), a consequence of abnormal impulse conduction in the “atrialized” right ventricle.
The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.
!!!

'''

# Apply the prompt template and system prompt of LLaMA-2-Chat demo for chat models (NOTE: NO prompt template is required for base models!)
our_system_prompt = "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n" # Please do NOT change this

prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{user_input} [/INST]"

# # NOTE:
# # If you want to apply your own system prompt, please integrate it into the instruction part following our system prompt like this:
# your_system_prompt = "Please, answer this question faithfully."
# prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{your_system_prompt}\n{user_input} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')



### User Input:
Please extract information from the text between !!! by using only categorical words or choosing from options Yes or No. Do not provide any additional text or information. The output must be in the following format:

{
    "Sex": "",
    "Age": "",
    "Treatment": "",
    "Patient had ECG done?": "",
    "Patient had palpitations?": "",
    "Patient had leg operation?": "",
    "Rehabilitation time": ""
    "Patient finished with treatment?": "",
    "Patient died": "",
}

!!!
A 28-year-old previously healthy man presented with a 6-week history of palpitations.
The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea.
Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.
An electrocardiogram (ECG) revealed normal sinus rhythm and a Wolff– Parkinson– White pre-excitation p
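
The prompt asks for a JSON-shaped reply, but a chat model may wrap it in extra sentences. A minimal sketch of pulling the first JSON object out of such a reply, assuming the reply contains valid JSON somewhere in the text; `extract_json` and the sample `reply` string are hypothetical and not part of this notebook or of `transformers`:

```python
import json


def extract_json(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace, try the next one
        start = text.find("{", start + 1)
    return None


# Made-up reply, for illustration only
reply = 'Sure! Here is the result: {"Sex": "Male", "Age": "28"} Let me know if you need more.'
print(extract_json(reply))
```

Returning `None` instead of raising keeps a batch run going when one reply is malformed; such cases can then be counted separately.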

In [None]:
user_input = '''Please extract information from the text between !!! by using only categorical words or choosing from options Yes or No. Do not provide any additional text or information. Please provide the response in the form of a Python list. It should begin with “[” and end with “]”:

[
    "Sex": "",
    "Age": "",
    "Treatment": "",
    "Patient had ECG done?": "",
    "Patient had palpitations?": "",
    "Patient had leg operation?": "",
    "Rehabilitation time": ""
    "Patient finished with treatment?": "",
    "Patient died": "",
]

!!!
A 28-year-old previously healthy man presented with a 6-week history of palpitations.
The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea.
Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.
An electrocardiogram (ECG) revealed normal sinus rhythm and a Wolff– Parkinson– White pre-excitation pattern (Fig.1: Top), produced by a right-sided accessory pathway.
Transthoracic echocardiography demonstrated the presence of Ebstein's anomaly of the tricuspid valve, with apical displacement of the valve and formation of an “atrialized” right ventricle (a functional unit between the right atrium and the inlet [inflow] portion of the right ventricle) (Fig.2).
The anterior tricuspid valve leaflet was elongated (Fig.2C, arrow), whereas the septal leaflet was rudimentary (Fig.2C, arrowhead).
Contrast echocardiography using saline revealed a patent foramen ovale with right-to-left shunting and bubbles in the left atrium (Fig.2D).
The patient underwent an electrophysiologic study with mapping of the accessory pathway, followed by radiofrequency ablation (interruption of the pathway using the heat generated by electromagnetic waves at the tip of an ablation catheter).
His post-ablation ECG showed a prolonged PR interval and an odd “second” QRS complex in leads III, aVF and V2–V4 (Fig.1Bottom), a consequence of abnormal impulse conduction in the “atrialized” right ventricle.
The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.
!!!

'''

# Apply the prompt template and system prompt of LLaMA-2-Chat demo for chat models (NOTE: NO prompt template is required for base models!)
our_system_prompt = "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n" # Please do NOT change this

prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{user_input} [/INST]"

# # NOTE:
# # If you want to apply your own system prompt, please integrate it into the instruction part following our system prompt like this:
# your_system_prompt = "Please, answer this question faithfully."
# prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{your_system_prompt}\n{user_input} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')
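
The same LLaMA-2-Chat template is rebuilt by hand in every cell. A minimal sketch of wrapping it in one function, assuming the template string used in the cells above; `build_llama2_prompt` and its `extra_instruction` parameter are hypothetical names introduced here:

```python
def build_llama2_prompt(user_input, system_prompt, extra_instruction=None):
    """Wrap user_input in the LLaMA-2-Chat template used in this notebook."""
    if extra_instruction:
        # A custom instruction goes into the instruction part, before the user input
        user_input = f"{extra_instruction}\n{user_input}"
    return f"<s>[INST] <<SYS>>{system_prompt}<</SYS>>\n\n{user_input} [/INST]"


# Short placeholder strings, for illustration only
print(build_llama2_prompt("Extract the age.", "\nYou are a helpful assistant.\n"))
```

With this helper, each experiment cell would only need to define `user_input` and call the function with `our_system_prompt`.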

In [None]:

user_input = '''Please extract information from the text between !!! by using only categorical words or choosing from options Yes or No. Do not provide any additional text or information.

!!!
A 28-year-old previously healthy man presented with a 6-week history of palpitations.
The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea.
Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.
An electrocardiogram (ECG) revealed normal sinus rhythm and a Wolff– Parkinson– White pre-excitation pattern (Fig.1: Top), produced by a right-sided accessory pathway.
Transthoracic echocardiography demonstrated the presence of Ebstein's anomaly of the tricuspid valve, with apical displacement of the valve and formation of an “atrialized” right ventricle (a functional unit between the right atrium and the inlet [inflow] portion of the right ventricle) (Fig.2).
The anterior tricuspid valve leaflet was elongated (Fig.2C, arrow), whereas the septal leaflet was rudimentary (Fig.2C, arrowhead).
Contrast echocardiography using saline revealed a patent foramen ovale with right-to-left shunting and bubbles in the left atrium (Fig.2D).
The patient underwent an electrophysiologic study with mapping of the accessory pathway, followed by radiofrequency ablation (interruption of the pathway using the heat generated by electromagnetic waves at the tip of an ablation catheter).
His post-ablation ECG showed a prolonged PR interval and an odd “second” QRS complex in leads III, aVF and V2–V4 (Fig.1Bottom), a consequence of abnormal impulse conduction in the “atrialized” right ventricle.
The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.
!!!

Format the information using the following schema: 
'''
#prompt = "Generate a person's information based on the following schema:"
json_schema = {
    "type": "object",
    "properties": {
        "sex": {"type": "string"},
        "age": {"type": "string"}, # for some reason, "type": "number" does not work with the medicine-chat model (investigate?)
        "Treatment": {"type": "string"},
        "Patient had ECG done": {"type": "boolean"},
        "Patient died": {"type": "boolean"},
    }
}

# Apply the prompt template and system prompt of LLaMA-2-Chat demo for chat models (NOTE: NO prompt template is required for base models!)
our_system_prompt = "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n" # Please do NOT change this

prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{user_input} [/INST]"


jsonformer = Jsonformer(model, tokenizer, json_schema, user_input)
generated_data = jsonformer()

print(generated_data)

# # NOTE:
# # If you want to apply your own system prompt, please integrate it into the instruction part following our system prompt like this:
# your_system_prompt = "Please, answer this question faithfully."
# prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{your_system_prompt}\n{user_input} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')

In [None]:
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

json_schema = {
    "type": "object",
    "properties": {
        "sex": {"type": "string"},
        "age": {"type": "number"},
        "Treatment": {"type": "string"},
        "Patient had ECG done": {"type": "boolean"},
        "Patient died": {"type": "boolean"},
    }
}

prompt = """Please extract information from the text between !!! by using only categorical words or choosing from options Yes or No. Do not provide any additional text or information. Please generate a person's information based on the following schema:


!!!
A 28-year-old previously healthy man presented with a 6-week history of palpitations.
The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea.
Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.
An electrocardiogram (ECG) revealed normal sinus rhythm and a Wolff– Parkinson– White pre-excitation pattern (Fig.1: Top), produced by a right-sided accessory pathway.
Transthoracic echocardiography demonstrated the presence of Ebstein's anomaly of the tricuspid valve, with apical displacement of the valve and formation of an “atrialized” right ventricle (a functional unit between the right atrium and the inlet [inflow] portion of the right ventricle) (Fig.2).
The anterior tricuspid valve leaflet was elongated (Fig.2C, arrow), whereas the septal leaflet was rudimentary (Fig.2C, arrowhead).
Contrast echocardiography using saline revealed a patent foramen ovale with right-to-left shunting and bubbles in the left atrium (Fig.2D).
The patient underwent an electrophysiologic study with mapping of the accessory pathway, followed by radiofrequency ablation (interruption of the pathway using the heat generated by electromagnetic waves at the tip of an ablation catheter).
His post-ablation ECG showed a prolonged PR interval and an odd “second” QRS complex in leads III, aVF and V2–V4 (Fig.1Bottom), a consequence of abnormal impulse conduction in the “atrialized” right ventricle.
The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.
!!!
"""
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()

print(generated_data)
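
Before comparing models, it may help to check that the returned dictionary really has every key with the expected type. A minimal sketch for flat schemas like the one above, assuming only the `string`, `number` and `boolean` types; `check_against_schema` and the sample dictionaries are hypothetical and are not actual model output:

```python
def check_against_schema(data, schema):
    """Return a list of problems; an empty list means data fits the flat schema."""
    checkers = {
        "string": lambda v: isinstance(v, str),
        "boolean": lambda v: isinstance(v, bool),
        # bool is a subclass of int in Python, so exclude it explicitly
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    }
    problems = []
    for key, spec in schema["properties"].items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not checkers[spec["type"]](data[key]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


example_schema = {
    "type": "object",
    "properties": {
        "sex": {"type": "string"},
        "age": {"type": "number"},
        "Patient died": {"type": "boolean"},
    },
}

# Made-up dictionaries, for illustration only
good = {"sex": "male", "age": 28, "Patient died": False}
bad = {"sex": "male", "age": "28"}
print(check_against_schema(good, example_schema))
print(check_against_schema(bad, example_schema))
```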

In [6]:
import os
import re

In [3]:
path = os.getcwd()

parent_directory_path = os.path.dirname(path)

data_directory = "data/maccrobat/MACCROBAT2020"

path = os.path.join(parent_directory_path, data_directory)
file_list = os.listdir(path)

# Retrieve txt and ann files
txt_files = [file for file in file_list if file.endswith('.txt')]
ann_files = [file for file in file_list if file.endswith('.ann')]

# Pair each txt file with the ann file that has the same name
file_tuples = [(txt, txt[:-4] + '.ann') for txt in txt_files if txt[:-4] + '.ann' in ann_files]
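
The same pairing can be sketched with `pathlib`, which avoids slicing file names by hand. This is an alternative under the assumption that every report is a `.txt` file with an optional `.ann` file of the same stem; `pair_txt_ann` is a hypothetical helper:

```python
from pathlib import Path


def pair_txt_ann(directory):
    """Return (txt_name, ann_name) tuples for every .txt file that has a matching .ann file."""
    directory = Path(directory)
    pairs = []
    for txt_path in sorted(directory.glob("*.txt")):
        ann_path = txt_path.with_suffix(".ann")
        if ann_path.exists():
            pairs.append((txt_path.name, ann_path.name))
    return pairs


# Usage (the directory is the one built in the cell above):
# file_tuples = pair_txt_ann(path)
```

Sorting the file names makes the order reproducible across runs, which `os.listdir` does not guarantee.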

In [4]:
file_tuples

[('26444414.txt', '26444414.ann'),
 ('28079821.txt', '28079821.ann'),
 ('23033875.txt', '23033875.ann'),
 ('28767567.txt', '28767567.ann'),
 ('27741115.txt', '27741115.ann'),
 ('25572898.txt', '25572898.ann'),
 ('25246819.txt', '25246819.ann'),
 ('28353604.txt', '28353604.ann'),
 ('25410034.txt', '25410034.ann'),
 ('26530965.txt', '26530965.ann'),
 ('20146086.txt', '20146086.ann'),
 ('21308977.txt', '21308977.ann'),
 ('22520024.txt', '22520024.ann'),
 ('28353613.txt', '28353613.ann'),
 ('27218632.txt', '27218632.ann'),
 ('18666334.txt', '18666334.ann'),
 ('21477357.txt', '21477357.ann'),
 ('24526194.txt', '24526194.ann'),
 ('26285706.txt', '26285706.ann'),
 ('27928148.txt', '27928148.ann'),
 ('22791498.txt', '22791498.ann'),
 ('21527041.txt', '21527041.ann'),
 ('28193213.txt', '28193213.ann'),
 ('24518095.txt', '24518095.ann'),
 ('19860925.txt', '19860925.ann'),
 ('26405496.txt', '26405496.ann'),
 ('24957905.txt', '24957905.ann'),
 ('25210224.txt', '25210224.ann'),
 ('26469535.txt', '2

In [7]:
label_types = set()

for annFileName in ann_files:
    pathToAnnFile = os.path.join(path, annFileName)
    # Keep only text-bound annotations, whose IDs start with 'T'; the file is closed automatically
    with open(pathToAnnFile, "r") as annFile:
        allAnnLines = [re.split(r'\t+', tag.rstrip('\t')) for tag in annFile if tag.startswith('T')]

    for annLine in allAnnLines:
        # The second field looks like "Label start end"; keep only the label
        label_types.add(annLine[1].split()[0])

print(len(label_types))
print(label_types)

41
{'Area', 'Personal_background', 'Subject', 'Quantitative_concept', 'Qualitative_concept', 'Medication', 'Duration', 'Coreference', 'Dosage', 'History', 'Outcome', 'Distance', 'Severity', 'Mass', 'Height', 'Frequency', 'Biological_structure', 'Date', 'Activity', 'Color', 'Detailed_description', 'Nonbiological_location', 'Sign_symptom', 'Disease_disorder', 'Weight', 'Shape', 'Age', 'Texture', 'Administration', 'Sex', 'Clinical_event', 'Time', 'Family_history', 'Other_event', 'Lab_value', 'Other_entity', 'Occupation', 'Diagnostic_procedure', 'Biological_attribute', 'Therapeutic_procedure', 'Volume'}
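
Each text-bound line in an `.ann` file has three tab-separated fields: an ID, a `Label start end` span, and the covered text. A minimal sketch of turning one such line into a named record, assuming that layout; `TextBound` and `parse_text_bound` are hypothetical names, and the sample line is the first annotation of `26444414.ann`:

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(line):
    """Parse one 'T' line of a brat .ann file into a TextBound record."""
    ann_id, span, text = line.rstrip("\n").split("\t", 2)
    fields = span.split()
    # Discontinuous spans look like "Label 10 20;30 40": keep the first start and the last end
    label, start, end = fields[0], int(fields[1]), int(fields[-1])
    return TextBound(ann_id, label, start, end, text)


print(parse_text_bound("T1\tAge 2 13\t58-year-old\n"))
```

Named fields make later filtering easier to read, for example keeping only records whose `label` is in a chosen set.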


In [None]:
mySet = {
    'Diagnostic_procedure',
    'Sign_symptom',
    'Biological_structure',
    'Detailed_description',
    'Age',
    'Lab_value'
}

In [21]:
for txtAnnPair in file_tuples[:1]:
    pathToTxtFile = os.path.join(path, txtAnnPair[0])
    pathToAnnFile = os.path.join(path, txtAnnPair[1])

    # Read the whole report text; the file is closed automatically
    with open(pathToTxtFile, "r") as txtFile:
        Txtfile = txtFile.read()

    # Keep only text-bound annotations, whose IDs start with 'T'
    with open(pathToAnnFile, "r") as annFile:
        allAnnLines = [re.split(r'\t+', tag.rstrip('\t')) for tag in annFile if tag.startswith('T')]
    print(len(allAnnLines))

    # Drop an annotation when it starts at the same offset as the line before it
    removed = []
    previous = -10
    for annLine in allAnnLines.copy():
        currentStart = int(annLine[1].split()[1])
        if currentStart == previous:
            removed.append(annLine)
            allAnnLines.remove(annLine)
        previous = currentStart

    print(allAnnLines)
    print(Txtfile)

53
[['T1', 'Age 2 13', '58-year-old\n'], ['T2', 'Sex 14 17', 'man\n'], ['T3', 'Sign_symptom 42 57', 'general fatigue\n'], ['T4', 'Sign_symptom 69 75', 'anemia\n'], ['T5', 'Severity 62 68', 'severe\n'], ['T6', 'Duration 80 94', 'several months\n'], ['T7', 'Diagnostic_procedure 100 117', 'hemoglobin levels\n'], ['T8', 'Lab_value 123 131', '6.6 g/dl\n'], ['T9', 'History 167 185', 'no medical history\n'], ['T10', 'History 190 215', 'did not take any medicine\n'], ['T11', 'Diagnostic_procedure 217 243', 'Esophagogastroduodenoscopy\n'], ['T12', 'Diagnostic_procedure 248 259', 'colonoscopy\n'], ['T13', 'Sign_symptom 291 299', 'bleeding\n'], ['T14', 'Diagnostic_procedure 311 330', 'computer tomography\n'], ['T15', 'Biological_structure 301 310', 'Abdominal\n'], ['T16', 'Sign_symptom 361 366', 'tumor\n'], ['T17', 'Biological_structure 374 389', 'small intestine\n'], ['T18', 'Detailed_description 347 360', 'hypervascular\n'], ['T19', 'Distance 342 346', '2-cm\n'], ['T20', 'Diagnostic_procedure 4
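
The loop above only compares each annotation with the line directly before it, and the annotations are not sorted by offset (T5 starts at 62, after T4 at 69). A variant that remembers every start offset seen so far could be sketched as follows. This is a swapped-in approach, not the notebook's method; `drop_same_start` is a hypothetical helper and the third sample line is made up:

```python
def drop_same_start(ann_lines):
    """Keep the first annotation for each start offset; return (kept, removed)."""
    seen_starts = set()
    kept, removed = [], []
    for ann_line in ann_lines:
        # The second field looks like "Label start end"
        start = int(ann_line[1].split()[1])
        if start in seen_starts:
            removed.append(ann_line)
        else:
            seen_starts.add(start)
            kept.append(ann_line)
    return kept, removed


sample = [
    ["T1", "Age 2 13", "58-year-old\n"],
    ["T2", "Sex 14 17", "man\n"],
    ["T99", "Subject 2 17", "58-year-old man\n"],  # made-up overlapping annotation
]
kept, removed = drop_same_start(sample)
print(kept)
print(removed)
```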

# IDEAS for better results

Perform lemmatization on both the generated words and the words in the dataset!
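
A minimal sketch of that idea: normalize both sides with the same function before comparing them. `toy_lemmatize` is a crude plural-stripping stand-in and not a real lemmatizer; a proper one (for example NLTK's `WordNetLemmatizer().lemmatize`) could be passed in its place. All names and the `generated` list are hypothetical:

```python
def toy_lemmatize(word):
    """Crude stand-in for a real lemmatizer: strips a trailing plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(word, lemmatize=toy_lemmatize):
    """Lowercase, trim whitespace and newlines, then lemmatize."""
    return lemmatize(word.strip().lower())


def matched_terms(generated, annotated, lemmatize=toy_lemmatize):
    """Return the generated words whose normalized form also occurs among the annotated words."""
    annotated_lemmas = {normalize(w, lemmatize) for w in annotated}
    return [w for w in generated if normalize(w, lemmatize) in annotated_lemmas]


generated = ["Tumors", "bleeding", "fever"]      # made-up model output
annotated = ["tumor\n", "bleeding\n", "anemia\n"]  # Sign_symptom spans as read from the .ann file
print(matched_terms(generated, annotated))
```

Passing the lemmatizer as an argument keeps the comparison logic unchanged when the stand-in is replaced.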

In [1]:
#from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AdaptLLM/medicine-chat", device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("AdaptLLM/medicine-chat")


In [25]:
user_input = """Identify sign symptoms mentioned in text between !!!. For every detected sign symptom extract only one word and do not provide additional information.

!!!
A 58-year-old man had been suffering from general fatigue and severe anemia for several months.
His hemoglobin levels were 6.6 g/dl (normal range: 12–16 g/dl).
He had no medical history and did not take any medicine.
Esophagogastroduodenoscopy and colonoscopy did not reveal any significant bleeding.
Abdominal computer tomography revealed a 2-cm hypervascular tumor in the small intestine (Fig.1).
Oral DBE detected a 2-cm-diameter reddish, submucosal tumor-like lesion with surface ulceration in the jejunum, approximately 20 cm away from the Treitz ligament (Fig.2).
We did not perform biopsy because it can be difficult to stop bleeding in the case of hypervascular lesions.
Under the diagnosis of a small bowel tumor, gastrointestinal stromal tumor (GIST), malignant lymphoma, or cancer, we performed laparoscopic-assisted segmental resection of the jejunum with the dissection of lymph nodes.
Examination of the resected tumor showed that it measured 19 × 16 mm in diameter (Fig.3).
Histology revealed the proliferation of blood capillaries and granulation tissue, which was consistent with PG (Fig.4).
The patient was discharged on postoperative day 9 without complication and his anemia improved gradually without the need for oral iron after surgery.
!!!
"""
our_system_prompt = "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n" # Please do NOT change this

prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{user_input} [/INST]"

# # NOTE:
# # If you want to apply your own system prompt, please integrate it into the instruction part following our system prompt like this:
# your_system_prompt = "Please, answer this question faithfully."
# prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{your_system_prompt}\n{user_input} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096, max_new_tokens=150)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')

Both `max_new_tokens` (=150) and `max_length`(=4096) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


### User Input:
Identify sign symptoms mentioned in text between !!!. For every detected sign symptom extract only one word and do not provide additional information.

!!!
A 58-year-old man had been suffering from general fatigue and severe anemia for several months.
His hemoglobin levels were 6.6 g/dl (normal range: 12–16 g/dl).
He had no medical history and did not take any medicine.
Esophagogastroduodenoscopy and colonoscopy did not reveal any significant bleeding.
Abdominal computer tomography revealed a 2-cm hypervascular tumor in the small intestine (Fig.1).
Oral DBE detected a 2-cm-diameter reddish, submucosal tumor-like lesion with surface ulceration in the jejunum, approximately 20 cm away from the Treitz ligament (Fig.2).
We did not perform biopsy because it can be difficult to stop bleeding in the case of hypervascular lesions.
Under the diagnosis of a small bowel tumor, gastrointestinal stromal tumor (GIST), malignant lymphoma, or cancer, we performed laparoscopic-assisted 

In [20]:
user_input = """Identify and sign symptoms mentioned in text between !!!. For every detected sign symptom extract only one word and do not provide additional information.


!!!
A 58-year-old man had been suffering from general fatigue and severe anemia for several months.
His hemoglobin levels were 6.6 g/dl (normal range: 12–16 g/dl).
He had no medical history and did not take any medicine.
Esophagogastroduodenoscopy and colonoscopy did not reveal any significant bleeding.
Abdominal computer tomography revealed a 2-cm hypervascular tumor in the small intestine (Fig.1).
Oral DBE detected a 2-cm-diameter reddish, submucosal tumor-like lesion with surface ulceration in the jejunum, approximately 20 cm away from the Treitz ligament (Fig.2).
We did not perform biopsy because it can be difficult to stop bleeding in the case of hypervascular lesions.
Under the diagnosis of a small bowel tumor, gastrointestinal stromal tumor (GIST), malignant lymphoma, or cancer, we performed laparoscopic-assisted segmental resection of the jejunum with the dissection of lymph nodes.
Examination of the resected tumor showed that it measured 19 × 16 mm in diameter (Fig.3).
Histology revealed the proliferation of blood capillaries and granulation tissue, which was consistent with PG (Fig.4).
The patient was discharged on postoperative day 9 without complication and his anemia improved gradually without the need for oral iron after surgery.
!!!
"""
our_system_prompt = "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n" # Please do NOT change this

prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{user_input} [/INST]"

# # NOTE:
# # If you want to apply your own system prompt, please integrate it into the instruction part following our system prompt like this:
# your_system_prompt = "Please, answer this question faithfully."
# prompt = f"<s>[INST] <<SYS>>{our_system_prompt}<</SYS>>\n\n{your_system_prompt}\n{user_input} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')

### User Input:
Identify and sign symptoms mentioned in text between !!!. For every detected sign symptom extract only one word and do not provide additional information.


!!!
A 58-year-old man had been suffering from general fatigue and severe anemia for several months.
His hemoglobin levels were 6.6 g/dl (normal range: 12–16 g/dl).
He had no medical history and did not take any medicine.
Esophagogastroduodenoscopy and colonoscopy did not reveal any significant bleeding.
Abdominal computer tomography revealed a 2-cm hypervascular tumor in the small intestine (Fig.1).
Oral DBE detected a 2-cm-diameter reddish, submucosal tumor-like lesion with surface ulceration in the jejunum, approximately 20 cm away from the Treitz ligament (Fig.2).
We did not perform biopsy because it can be difficult to stop bleeding in the case of hypervascular lesions.
Under the diagnosis of a small bowel tumor, gastrointestinal stromal tumor (GIST), malignant lymphoma, or cancer, we performed laparoscopic-assi

In [None]:
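import transformers
# NOTE: these import paths match older LangChain releases; newer releases may expose the classes elsewhere
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

generator = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True, # langchain expects full text
    task='text-generation',
    #stopping_criteria=stopping_criteria, # without this model rambles during chat
    temperature=0.1, # sampling temperature: lower values make the output less random
    max_new_tokens=512, # max number of tokens to generate in the output
    repetition_penalty=1.1 # without this output begins repeating
)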

llm = HuggingFacePipeline(pipeline=generator)
# creating prompt for large language model
pre_prompt = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\nGenerate the next agent response by answering the question. Answer it as succinctly as possible. You are provided several documents with titles. If the answer comes from different documents please mention all possibilities in your answer and use the titles to separate between topics or domains. If you cannot answer the question from the given documents, please state that you do not have an answer.\n"""
prompt = pre_prompt + "CONTEXT:\n\n{context}\n" + "Question : {question}" + "[/INST]"
llama_prompt = PromptTemplate(template=prompt, input_variables=["context", "question"])
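# integrate prompt with LLM
# NOTE: loaded_vectorstore is not defined in this notebook; it has to be created or loaded beforehand
chain = ConversationalRetrievalChain.from_llm(llm, loaded_vectorstore.as_retriever(), combine_docs_chain_kwargs={"prompt": llama_prompt}, return_source_documents=True)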