

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/NER_DIAG_PROC.ipynb)




# **Detect diagnosis and procedures**

To run this notebook yourself, you need to provide your license keys. Run the cell below and upload your license file when prompted; the cell saves it as `spark_jsl.json`. Alternatively, open the file explorer on the left side of the screen and upload the file as `spark_jsl.json`, which is the name the cell looks for.
Otherwise, you can look at the saved example outputs shown throughout the notebook.



## **Colab Setup**

In [None]:
import json, os
from google.colab import files

if 'spark_jsl.json' not in os.listdir():
    license_keys = files.upload()
    os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
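
The later cells read `SECRET`, `PUBLIC_VERSION`, and `JSL_VERSION` from the loaded license file, so a missing key only surfaces when `pip install` or `sparknlp_jsl.start` fails. As a minimal sketch (the helper name `missing_license_keys` and the placeholder values are hypothetical, and your license file may contain additional keys), the check could look like this:

```python
import json

# Keys that later cells of this notebook reference; other keys are ignored here.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")

def missing_license_keys(license_keys):
    """Return the required keys that are absent or empty in the loaded dict."""
    return [key for key in REQUIRED_KEYS if not license_keys.get(key)]

# Hypothetical placeholder content, for illustration only.
example = json.loads('{"SECRET": "placeholder", "PUBLIC_VERSION": "4.2.8"}')
print(missing_license_keys(example))  # ['JSL_VERSION']
```

In the notebook itself, calling `missing_license_keys(license_keys)` right after `json.load` would report an incomplete file before any installation step runs.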

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G", 
          "spark.kryoserializer.buffer.max":"2000M", 
          "spark.driver.maxResultSize":"2000M"} 

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.2.8
Spark NLP_JSL Version : 4.2.8


## **🔎 About the models**


📌 **ner_diseases** --> *Pretrained named entity recognition deep learning model for diseases.*

*   Predicted Entities => **Disease**

📌 **ner_diseases_large** --> *Extract mentions of different types of disease in medical text using pretrained NER model.*

*   Predicted Entities => **Disease** 



📌 **ner_jsl** --> *Pretrained named entity recognition deep learning model for clinical terminology.*

*   Predicted Entities => **Injury_or_Poisoning, Direction, Test, Admission_Discharge, Death_Entity, Relationship_Status, Duration, Respiration, Hyperlipidemia, Birth_Entity, Age, Labour_Delivery, Family_History_Header, BMI, Temperature, Alcohol, Kidney_Disease, Oncological, Medical_History_Header, Cerebrovascular_Disease, Oxygen_Therapy, O2_Saturation, Psychological_Condition, Heart_Disease, Employment, Obesity, Disease_Syndrome_Disorder, Pregnancy, ImagingFindings, Procedure, Medical_Device, Race_Ethnicity, Section_Header, Symptom, Treatment, Substance, Route, Drug_Ingredient, Blood_Pressure, Diet, External_body_part_or_region, LDL, VS_Finding, Allergen, EKG_Findings, Imaging_Technique, Triglycerides, RelativeTime, Gender, Pulse, Social_History_Header, Substance_Quantity, Diabetes, Modifier, Internal_organ_or_component, Clinical_Dept, Form, Drug_BrandName, Strength, Fetus_NewBorn, RelativeDate, Height, Test_Result, Sexually_Active_or_Sexual_Orientation, Frequency, Time, Weight, Vaccine, Vital_Signs_Header, Communicable_Disease, Dosage, Overweight, Hypertension, HDL, Total_Cholesterol, Smoking, Date**


**🔎 You can find all these models and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**

## **🔎 Define the Spark NLP pipeline**

In [4]:
#basic_stages👇🏻
documentAssembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = Tokenizer()\
    .setInputCols(['sentence']) \
    .setOutputCol('token')

word_embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')
    
#select models👇🏻

def pipeline(model_name):
    """Build a LightPipeline that runs the given pretrained clinical NER model."""
    clinical_ner = MedicalNerModel.pretrained(model_name, 'en', 'clinical/models') \
        .setInputCols(['sentence', 'token', 'embeddings']) \
        .setOutputCol('ner')
    
    # ner_jsl predicts many entity types; keep only diagnoses and procedures
    if model_name == "ner_jsl":
        ner_converter = NerConverterInternal() \
            .setInputCols(['sentence', 'token', 'ner']) \
            .setOutputCol('ner_chunk')\
            .setWhiteList(['Disease_Syndrome_Disorder', 'Procedure'])
             
    else:
        ner_converter = NerConverter() \
            .setInputCols(['sentence', 'token', 'ner']) \
            .setOutputCol('ner_chunk')
            

    nlpPipeline = Pipeline(stages=[documentAssembler, 
                                   sentenceDetector,
                                   tokenizer,
                                   word_embeddings,
                                   clinical_ner,
                                   ner_converter])

    # No stage needs training data, so fitting on an empty DataFrame just assembles the PipelineModel
    empty_df = spark.createDataFrame([['']]).toDF("text")
    pipelineModel = nlpPipeline.fit(empty_df)
    
    light_model = LightPipeline(pipelineModel)
    return light_model

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
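
For `ner_jsl`, the converter is given a whitelist so that only `Disease_Syndrome_Disorder` and `Procedure` chunks reach the output. The sketch below imitates that effect on plain dictionaries. It is not the library's implementation: `filter_chunks` and the label `Hypothetical_Other_Label` are made up for illustration.

```python
def filter_chunks(chunks, white_list):
    """Keep only chunks whose entity label is in white_list (order preserved)."""
    allowed = set(white_list)
    return [chunk for chunk in chunks if chunk["entity"] in allowed]

# Toy chunks; the middle label is hypothetical and stands for any non-whitelisted entity.
chunks = [
    {"chunk": "Cervical myelopathy", "entity": "Disease_Syndrome_Disorder"},
    {"chunk": "C4-C5", "entity": "Hypothetical_Other_Label"},
    {"chunk": "Arthrodesis", "entity": "Procedure"},
]

kept = filter_chunks(chunks, ["Disease_Syndrome_Disorder", "Procedure"])
print([chunk["chunk"] for chunk in kept])  # ['Cervical myelopathy', 'Arthrodesis']
```

The two disease-only models need no such filter, because `Disease` is the only entity they predict.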


# **🔎 "ner_diseases" model**

In [5]:
sample_text = """Nature and course of the diagnosis has been discussed with the patient. 

According to its presentation with no history of malignant melanoma, this appears to be Pathological fracture of the left proximal hip. 
At the present time, I would recommend obtaining a bone scan and repeat x-rays, which will include AP pelvis, femur, hip including knee.

She needs a left hip hemiarthroplasty versus calcar hemiarthroplasty, cemented type. Indication, risk, and benefits of left hip hemiarthroplasty has been discussed with the patient, which includes, but not limited to infection, nerve injury, blood vessel injury, dislocation , 
leg length discrepancy, myositis ossificans, intraoperative fracture, prosthetic fracture, need for conversion to total hip replacement surgery, revision surgery, DVT, pulmonary embolism, risk of anesthesia, need for blood transfusion, and cardiac arrest. She understands above and is willing to undergo further procedure. The goal and the functional outcome have been explained. Further plan will be discussed with her once we obtain the bone scan and the radiographic studies. We will also await for the oncology feedback and clearance.
"""

In [6]:
light_result = pipeline("ner_diseases").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_diseases download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | malignant melanoma | 123 | 140 | 1 | Disease |
| 1 | Pathological fracture of the left proximal hip | 162 | 207 | 1 | Disease |
| 2 | infection | 566 | 574 | 4 | Disease |
| 3 | nerve injury | 577 | 588 | 4 | Disease |
| 4 | blood vessel injury | 591 | 609 | 4 | Disease |
| 5 | dislocation | 612 | 622 | 4 | Disease |
| 6 | leg length discrepancy | 627 | 648 | 4 | Disease |
| 7 | myositis ossificans | 651 | 669 | 4 | Disease |
| 8 | intraoperative fracture | 672 | 694 | 4 | Disease |
| 9 | prosthetic fracture | 697 | 715 | 4 | Disease |
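
In the output above, `end - begin + 1` equals the chunk length (for "malignant melanoma", 140 - 123 + 1 = 18 characters), which suggests that `end` is an inclusive character offset. Assuming that reading is right, slicing the original text needs `end + 1`. The sketch below shows the convention on a short made-up sentence; `chunk_span` is a hypothetical helper, not part of Spark NLP.

```python
def chunk_span(text, chunk):
    """Return (begin, end) of the first occurrence of chunk, with an inclusive end."""
    begin = text.find(chunk)
    if begin == -1:
        raise ValueError(f"{chunk!r} not found in text")
    return begin, begin + len(chunk) - 1

text = "No history of malignant melanoma."
begin, end = chunk_span(text, "malignant melanoma")

print(begin, end)            # 14 31
print(text[begin:end + 1])   # malignant melanoma
```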


In [7]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')

# **🔎 "ner_diseases_large" model**

In [8]:
sample_text = """FINDINGS: The patient was found upon excision of the cyst that it contained a large Prolene suture, which is multiply knotted as it always is; beneath this was a very small incisional hernia, the hernia cavity, which contained omentum; the hernia was easily repaired.

DESCRIPTION OF PROCEDURE: The patient was identified, then taken into the operating room, where after induction of an LMA anesthetic, his abdomen was prepped with Betadine solution and draped in sterile fashion. The puncta of the wound lesion was infiltrated with methylene blue and peroxide. The lesion was excised and the existing scar was excised using an ellipse and using a tenotomy scissors, the cyst was excised down to its base. In doing so, we identified a large Prolene suture within the wound and followed this cyst down to its base at which time we found that it contained omentum and was in fact overlying a small incisional hernia. The cyst was removed in its entirety, divided from the omentum using a Metzenbaum and tying with 2-0 silk ties. The hernia repair was undertaken with interrupted 0 Vicryl suture with simple sutures. The wound was then irrigated and closed with 3-0 Vicryl subcutaneous and 4-0 Vicryl subcuticular and Steri-Strips. Patient tolerated the procedure well. Dressings were applied and he was taken to recovery room in stable condition.
"""

In [9]:
light_result = pipeline("ner_diseases_large").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_diseases_large download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | cyst | 53 | 56 | 0 | Disease |
| 1 | incisional hernia | 173 | 189 | 0 | Disease |
| 2 | hernia | 196 | 201 | 0 | Disease |
| 3 | hernia | 240 | 245 | 0 | Disease |
| 4 | incisional hernia | 896 | 912 | 4 | Disease |
| 5 | cyst | 919 | 922 | 5 | Disease |
| 6 | hernia | 1031 | 1036 | 6 | Disease |
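
The loop that collects `begin`, `end`, `result`, and the metadata fields is repeated for every model in this notebook. One possible way to factor it out is sketched below. `chunks_to_rows` is a hypothetical helper, and `FakeAnnotation` is only a stand-in exposing the four attributes the loop reads, so that the sketch runs without a Spark session.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation: only the attributes used by the loop.
FakeAnnotation = namedtuple("FakeAnnotation", ["begin", "end", "result", "metadata"])

def chunks_to_rows(annotations):
    """Turn NER chunk annotations into one dict per chunk, ready for a DataFrame."""
    return [
        {
            "chunks": a.result,
            "begin": a.begin,
            "end": a.end,
            "sentence_id": a.metadata["sentence"],
            "entities": a.metadata["entity"],
        }
        for a in annotations
    ]

sample = [FakeAnnotation(53, 56, "cyst", {"entity": "Disease", "sentence": "0"})]
rows = chunks_to_rows(sample)
print(rows)
```

In the notebook, `pd.DataFrame(chunks_to_rows(light_result[0]['ner_chunk']))` should then give the same columns as the tables shown here.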


In [10]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')

# **🔎 "ner_jsl" model**

In [11]:
sample_text = """PREOPERATIVE DIAGNOSIS: Cervical myelopathy secondary to  very large disc herniations at C4-C5 and C5-C6.
PROCEDURE PERFORMED:
1. Anterior cervical discectomy, C4-C5 and C5-C6.
2. Arthrodesis, C4-C5 and C5-C6.
3. Partial corpectomy, C5.
4. Machine bone allograft, C4-C5 and C5-C6.
5. Placement of anterior cervical plate with a Zephyr C4 to C6.
6. Fluoroscopic guidance.
7. Microscopic dissection
"""

In [12]:
light_result = pipeline("ner_jsl").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_jsl download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | Cervical myelopathy | 24 | 42 | 0 | Disease_Syndrome_Disorder |
| 1 | disc herniations | 69 | 84 | 0 | Disease_Syndrome_Disorder |
| 2 | Anterior cervical discectomy | 130 | 157 | 2 | Procedure |
| 3 | Arthrodesis | 180 | 190 | 3 | Procedure |
| 4 | Partial corpectomy | 213 | 230 | 4 | Procedure |
| 5 | Machine bone allograft | 240 | 261 | 4 | Procedure |
| 6 | Placement of anterior cervical plate | 284 | 319 | 5 | Procedure |
| 7 | Fluoroscopic guidance | 348 | 368 | 5 | Procedure |
| 8 | Microscopic dissection | 374 | 395 | 6 | Procedure |
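
Since this notebook is about separating diagnoses from procedures, it can help to group the extracted chunks by entity label. The sketch below does that on the first four rows of the output above; `group_by_entity` is a hypothetical helper working on plain dictionaries rather than on the DataFrame.

```python
from collections import defaultdict

def group_by_entity(rows):
    """Map each entity label to the list of chunk texts carrying that label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["entities"]].append(row["chunks"])
    return dict(grouped)

# First four rows of the ner_jsl output shown above.
rows = [
    {"chunks": "Cervical myelopathy", "entities": "Disease_Syndrome_Disorder"},
    {"chunks": "disc herniations", "entities": "Disease_Syndrome_Disorder"},
    {"chunks": "Anterior cervical discectomy", "entities": "Procedure"},
    {"chunks": "Arthrodesis", "entities": "Procedure"},
]

grouped = group_by_entity(rows)
for label, chunk_texts in grouped.items():
    print(label, "->", chunk_texts)
```

With the DataFrame from the cell above, `df.groupby('entities')['chunks'].apply(list)` is a pandas equivalent.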


In [13]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')