In [1]:
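import numpy as np
import warnings
warnings.filterwarnings('ignore')  # silence library warnings to keep cell output readable
import torch
import transformers
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification
import semantic_search  # helper module used below for the retrieval step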

-------

### Input a medical text:

In [None]:
input_text = input("Please insert a medical text to be analyzed by the model: ")

In [3]:
print_text = input_text.replace('. ', '.\n')
print("You've inserted the following medical text:")
print(print_text)

You've inserted the following medical text:
The patient was a 34-yr-old man who presented with complaints of fever and a chronic cough.
He was a smoker and had a history of pulmonary tuberculosis that had been treated and cured.
A computed tomographic (CT) scan revealed multiple tiny nodules in both lungs.
A thoracoscopic lung biopsy was taken from the right upper lobe.
The microscopic examination revealed a typical LCH.
The tumor cells had vesicular and grooved nuclei, and they formed small aggregations around the bronchioles (Fig.1).
The tumor cells were strongly positive for S-100 protein, vimentin, CD68 and CD1a.
There were infiltrations of lymphocytes and eosinophils around the tumor cells.
With performing additional radiologic examinations, no other organs were thought to be involved.
He quit smoking, but he received no other specific treatment.
He was well for the following one year.
After this, a follow-up CT scan was performed and it showed a 4 cm-sized mass in the left lower 

## Named Entity Recognition

In [4]:

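# Hugging Face checkpoint used for named entity recognition.
ner_model_name = "SahuH/distilbert-ner"
ner_tokenizer = AutoTokenizer.from_pretrained(ner_model_name)
# AutoModel loads the checkpoint without a token-classification head;
# the NER cell below reloads it with AutoModelForTokenClassification.
ner_model = AutoModel.from_pretrained(ner_model_name)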

In [None]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the checkpoint with its token-classification head for NER.
tokenizer = AutoTokenizer.from_pretrained(ner_model_name)
model = AutoModelForTokenClassification.from_pretrained(ner_model_name)

# aggregation_strategy="simple" merges word pieces into whole entity spans.
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
output = pipe(input_text)

In [None]:
!python -m spacy download en_core_web_sm

In [41]:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(input_text)

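# Convert the pipeline's character offsets into spaCy spans.
ents = []
for d in output:
    ent = doc.char_span(d['start'], d['end'], label=d['entity_group'])
    # char_span returns None when the offsets do not align with token boundaries.
    if ent is None:
        continue

    ents.append(ent)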

doc.ents = ents

displacy.render(doc, style="ent", jupyter=True)
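
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries carrying `entity_group`, `start`, and `end` keys (among others). As a rough sketch of how those character offsets can be used without spaCy, the helper below groups the matched text by label. The function name `group_entities`, the sample sentence's offsets, and the labels `TEST` and `PROBLEM` are hypothetical and chosen only for illustration; the actual label set depends on the checkpoint.

```python
from collections import defaultdict


def group_entities(text, entities):
    """Group entity strings by label, slicing the text with character offsets."""
    grouped = defaultdict(list)
    for ent in entities:
        span_text = text[ent["start"]:ent["end"]]
        grouped[ent["entity_group"]].append(span_text)
    return dict(grouped)


def make_entity(text, phrase, label):
    """Build a pipeline-style dict for a phrase (illustrative helper only)."""
    start = text.find(phrase)
    return {"entity_group": label, "start": start, "end": start + len(phrase)}


# Hypothetical data: these labels are assumptions, not the model's real label set.
sample_text = "A CT scan revealed multiple tiny nodules in both lungs."
sample_entities = [
    make_entity(sample_text, "CT scan", "TEST"),
    make_entity(sample_text, "multiple tiny nodules", "PROBLEM"),
]

grouped = group_entities(sample_text, sample_entities)
for label, spans in grouped.items():
    print(label, spans)
```

In the notebook, the same helper could be called as `group_entities(input_text, output)` to list the detected entities per label.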

-----

### Relevant Information Retrieval from Medical Text

In [None]:
input_question = input("Please input a query to retrieve relevant information from the given text: ")

In [43]:
print("You've given the following query:")
print(input_question)

You've given the following query:
Which disease was detected in patient through biopsy?


In [8]:
relevant_info = semantic_search.semantic_search(input_text, input_question)

In [9]:
print("The following information was retrieved from the given text based on the query:")
print(relevant_info)

The following information was retrieved from the given text based on the query:
A needle biopsy specimen revealed the possibility of a sarcoma; therefore, a lobectomy was performed.
With performing additional radiologic examinations, no other organs were thought to be involved.
The microscopic examination revealed a typical LCH.
Now, at five months after lobectomy, the patient is doing well with no significant change in the radiologic findings..
A computed tomographic (CT) scan revealed multiple tiny nodules in both lungs.
Microscopically, the tumor cells were aggregated in large sheets and they showed an infiltrative growth.
After this, a follow-up CT scan was performed and it showed a 4 cm-sized mass in the left lower lobe, in addition to the multiple tiny nodules in both lungs (Fig.2).
A thoracoscopic lung biopsy was taken from the right upper lobe.
The cytologic features of some of the tumor cells were similar to those seen in a typical LCH.
The ultrastructural analysis failed to d
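
The `semantic_search` module itself is not shown in this notebook. To illustrate the general idea of ranking sentences against a query, here is a minimal sketch that swaps in plain bag-of-words cosine similarity for learned embeddings. It is not the module's implementation; `tokenize`, `cosine`, `rank_sentences`, and the `top_k` parameter are hypothetical names introduced for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(text, query, top_k=3):
    """Return the top_k sentences most similar to the query."""
    # Split after a period followed by whitespace, so "Fig.1" stays intact.
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]
    query_vec = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(s)), query_vec), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


demo_text = (
    "A thoracoscopic lung biopsy was taken from the right upper lobe. "
    "He quit smoking. "
    "The microscopic examination revealed a typical LCH."
)
top = rank_sentences(demo_text, "lung biopsy", top_k=1)
print(top)
```

Word overlap only matches exact tokens, so a query about a "disease" would not match "LCH"; embedding-based search is meant to close that gap.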

### Information Summarization


#### Here, we summarize the relevant information retrieved in the previous part

In [10]:
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-base")
# min_length and max_length bound the length of the generated summary, in tokens.
output = summarizer(relevant_info, min_length=30, max_length=256)

In [11]:
final_output = output[0]["summary_text"].replace('. ', '.\n')
print("A concise summary of the relevant information is: ")
print(final_output)

A concise summary of the relevant information is: 
a needle biopsy specimen revealed the possibility of a sarcoma .
after lobectomy, no other organs were thought to be involved .
the patient is now doing well with no significant change in the radiologic findings .
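
Summarization models accept only a limited input length, so longer retrieved passages may need to be split before summarizing. The sketch below shows one possible approach, assuming the retrieved text holds one sentence per line and using a word count as a rough stand-in for the token count. The function `chunk_sentences` and its `max_words` parameter are hypothetical and not part of the notebook.

```python
def chunk_sentences(text, max_words=200):
    """Pack newline-separated sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in text.split("\n"):
        sentence = sentence.strip()
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = "a b c\nd e\nf g h"
print(chunk_sentences(demo, max_words=5))
```

Each chunk could then be passed to `summarizer` separately and the partial summaries joined, at the cost of losing some context across chunk boundaries.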
