

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/NER_DIAG_PROC.ipynb)




# **Detect diagnosis and procedures**

To run this notebook yourself, you need to upload your license keys. Run the cell below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.



## **Colab Setup**

In [None]:
import json
import os

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)

# Adding license key-value pairs to environment variables
os.environ.update(license_keys)
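
The later cells read `SECRET`, `JSL_VERSION` and `PUBLIC_VERSION` from the uploaded file, so it can help to check for them before installing anything. The following is a minimal sketch, not part of the original notebook: the helper name `missing_license_keys` and the stand-in dictionary are hypothetical, and the list of required keys is inferred only from the cells in this notebook.

```python
# Keys that the cells in this notebook read from license_keys.json.
# Assumption: your license file may contain additional keys; they are ignored here.
REQUIRED_KEYS = ("SECRET", "JSL_VERSION", "PUBLIC_VERSION")


def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]


# Stand-in dictionary with placeholder values; in the notebook,
# pass the loaded `license_keys` dictionary instead.
example_keys = {"SECRET": "xxxx", "JSL_VERSION": "x.y.z"}

print(missing_license_keys(example_keys))  # ['PUBLIC_VERSION']
```

An empty list means every key used by this notebook is present.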

## **Install dependencies**

In [4]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

## **Import dependencies into Python and start the Spark session**

In [5]:
import json
import os

from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession

import sparknlp
import sparknlp_jsl

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.pretrained import ResourceDownloader
from pyspark.sql import functions as F

import pandas as pd

pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

import string
import numpy as np

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(secret = SECRET, params=params)

print ("Spark NLP Version :", sparknlp.version())
print ("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 3.4.4
Spark NLP_JSL Version : 3.5.2


## **🔎 About the models**


📌 **ner_diseases** --> *Pretrained named entity recognition deep learning model for diseases.*

*   Predicted Entities => **Disease**

📌 **ner_diseases_large** --> *Extract mentions of different types of disease in medical text using pretrained NER model.*

*   Predicted Entities => **Disease** 



📌 **ner_jsl** --> *Pretrained named entity recognition deep learning model for clinical terminology.*

*   Predicted Entities => **Injury_or_Poisoning, Direction, Test, Admission_Discharge, Death_Entity, Relationship_Status, Duration, Respiration, Hyperlipidemia, Birth_Entity, Age, Labour_Delivery, Family_History_Header, BMI, Temperature, Alcohol, Kidney_Disease, Oncological, Medical_History_Header, Cerebrovascular_Disease, Oxygen_Therapy, O2_Saturation, Psychological_Condition, Heart_Disease, Employment, Obesity, Disease_Syndrome_Disorder, Pregnancy, ImagingFindings, Procedure, Medical_Device, Race_Ethnicity, Section_Header, Symptom, Treatment, Substance, Route, Drug_Ingredient, Blood_Pressure, Diet, External_body_part_or_region, LDL, VS_Finding, Allergen, EKG_Findings, Imaging_Technique, Triglycerides, RelativeTime, Gender, Pulse, Social_History_Header, Substance_Quantity, Diabetes, Modifier, Internal_organ_or_component, Clinical_Dept, Form, Drug_BrandName, Strength, Fetus_NewBorn, RelativeDate, Height, Test_Result, Sexually_Active_or_Sexual_Orientation, Frequency, Time, Weight, Vaccine, Vital_Signs_Header, Communicable_Disease, Dosage, Overweight, Hypertension, HDL, Total_Cholesterol, Smoking, Date**


**🔎 You can find all these models and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**

## **🔎Define Spark NLP pipeline**

In [16]:
#basic_stages👇🏻
documentAssembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = Tokenizer()\
    .setInputCols(['sentence']) \
    .setOutputCol('token')

word_embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')
    
#select models👇🏻

def pipeline(model_name):
    """Build a LightPipeline that runs the given pretrained clinical NER model."""

    clinical_ner = MedicalNerModel.pretrained(model_name, 'en', 'clinical/models') \
        .setInputCols(['sentence', 'token', 'embeddings']) \
        .setOutputCol('ner')

    # Merge token-level NER tags into entity chunks
    ner_converter = NerConverter() \
        .setInputCols(['sentence', 'token', 'ner']) \
        .setOutputCol('ner_chunk')

    # ner_jsl predicts many entity types; keep only diagnoses and procedures
    if model_name == "ner_jsl":
        ner_converter.setWhiteList(['Disease_Syndrome_Disorder', 'Procedure'])

    nlpPipeline = Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter])

    # Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
    empty_df = spark.createDataFrame([['']]).toDF("text")
    pipelineModel = nlpPipeline.fit(empty_df)

    # LightPipeline annotates plain Python strings without building a Spark DataFrame
    light_model = LightPipeline(pipelineModel)
    return light_model

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
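
For `ner_jsl`, the pipeline above passes a whitelist to the converter so that only `Disease_Syndrome_Disorder` and `Procedure` chunks are kept. The sketch below illustrates the observable effect of such a whitelist in plain Python. It is not how `NerConverter` is implemented, the helper `filter_by_whitelist` is hypothetical, and the (chunk, label) pairs are made-up inputs that use labels from the `ner_jsl` entity list.

```python
def filter_by_whitelist(chunks, whitelist):
    """Keep only the (text, label) pairs whose label appears in `whitelist`."""
    allowed = set(whitelist)
    return [(text, label) for text, label in chunks if label in allowed]


# Hypothetical chunks for illustration only; not actual model output.
example_chunks = [
    ("hernia repair", "Procedure"),
    ("abdomen", "External_body_part_or_region"),
    ("hernia", "Disease_Syndrome_Disorder"),
    ("Betadine", "Drug_BrandName"),
]

kept = filter_by_whitelist(example_chunks, ["Disease_Syndrome_Disorder", "Procedure"])
print(kept)
# [('hernia repair', 'Procedure'), ('hernia', 'Disease_Syndrome_Disorder')]
```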


## **Create sample input**

In [17]:
sample_text = """FINDINGS: The patient was found upon excision of the cyst that it contained a large Prolene suture, which is multiply knotted as it always is; beneath this was a very small incisional hernia, the hernia cavity, which contained omentum; the hernia was easily repaired.

DESCRIPTION OF PROCEDURE: The patient was identified, then taken into the operating room, where after induction of an LMA anesthetic, his abdomen was prepped with Betadine solution and draped in sterile fashion. The puncta of the wound lesion was infiltrated with methylene blue and peroxide. The lesion was excised and the existing scar was excised using an ellipse and using a tenotomy scissors, the cyst was excised down to its base. In doing so, we identified a large Prolene suture within the wound and followed this cyst down to its base at which time we found that it contained omentum and was in fact overlying a small incisional hernia. The cyst was removed in its entirety, divided from the omentum using a Metzenbaum and tying with 2-0 silk ties. The hernia repair was undertaken with interrupted 0 Vicryl suture with simple sutures. The wound was then irrigated and closed with 3-0 Vicryl subcutaneous and 4-0 Vicryl subcuticular and Steri-Strips. Patient tolerated the procedure well. Dressings were applied and he was taken to recovery room in stable condition. """

# **🔎 "ner_diseases" model**

In [18]:
light_result = pipeline("ner_diseases").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_diseases download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | the cyst | 49 | 56 | 0 | Disease |
| 1 | a large Prolene suture | 76 | 97 | 0 | Disease |
| 2 | a very small incisional hernia | 160 | 189 | 0 | Disease |
| 3 | the hernia cavity | 192 | 208 | 0 | Disease |
| 4 | omentum | 227 | 233 | 0 | Disease |
| 5 | the hernia | 236 | 245 | 0 | Disease |
| 6 | the wound lesion | 495 | 510 | 2 | Disease |
| 7 | The lesion | 562 | 571 | 3 | Disease |
| 8 | the existing scar | 589 | 605 | 3 | Disease |
| 9 | the cyst | 667 | 674 | 3 | Disease |
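
The `begin` and `end` values in this output are zero-based character offsets into the input text, and `end` is inclusive. A short sketch of how to recover a chunk from its offsets, using only the opening words of `sample_text`; the helper name `chunk_text` is hypothetical.

```python
# Opening words of sample_text, copied from the cell above.
text = "FINDINGS: The patient was found upon excision of the cyst that it contained"


def chunk_text(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]


print(chunk_text(text, 49, 56))  # the cyst
print(chunk_text(text, 53, 56))  # cyst
print(chunk_text(text, 37, 56))  # excision of the cyst
```

Because `end` is inclusive, a plain `text[begin:end]` slice would drop the last character of every chunk.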


In [19]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')

# **🔎 "ner_diseases_large" model**

In [20]:
light_result = pipeline("ner_diseases_large").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_diseases_large download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | cyst | 53 | 56 | 0 | Disease |
| 1 | incisional hernia | 173 | 189 | 0 | Disease |
| 2 | hernia | 196 | 201 | 0 | Disease |
| 3 | hernia | 240 | 245 | 0 | Disease |
| 4 | incisional hernia | 896 | 912 | 4 | Disease |
| 5 | cyst | 919 | 922 | 5 | Disease |
| 6 | hernia | 1031 | 1036 | 6 | Disease |
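
The same loop for collecting chunks is repeated for each model in this notebook. One possible way to factor it out is sketched below. The helper `chunks_to_rows` is hypothetical, and `Annotation` is a stand-in namedtuple that models only the attributes the notebook's loop reads (`begin`, `end`, `result`, `metadata`), so the sketch runs without Spark.

```python
from collections import namedtuple

# Stand-in for a Spark NLP annotation object; only the attributes used below are modelled.
Annotation = namedtuple("Annotation", ["begin", "end", "result", "metadata"])


def chunks_to_rows(annotations):
    """Turn chunk annotations into a list of row dictionaries, one per chunk."""
    return [
        {
            "chunks": a.result,
            "begin": a.begin,
            "end": a.end,
            "sentence_id": a.metadata["sentence"],
            "entities": a.metadata["entity"],
        }
        for a in annotations
    ]


# Two rows taken from the ner_diseases_large output above.
sample_annotations = [
    Annotation(53, 56, "cyst", {"entity": "Disease", "sentence": "0"}),
    Annotation(173, 189, "incisional hernia", {"entity": "Disease", "sentence": "0"}),
]

rows = chunks_to_rows(sample_annotations)
print(rows[1]["chunks"], rows[1]["begin"], rows[1]["end"])  # incisional hernia 173 189
```

In the notebook, `pd.DataFrame(chunks_to_rows(light_result[0]['ner_chunk']))` should then give the same table as the loop-based cells.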


In [21]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')

# **🔎 "ner_jsl" model**

In [22]:
light_result = pipeline("ner_jsl").fullAnnotate(sample_text)

chunks = []
entities = []
sentence= []
begin = []
end = []

for n in light_result[0]['ner_chunk']:
        
    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    sentence.append(n.metadata['sentence'])
    
    
import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, 
                   'sentence_id':sentence, 'entities':entities})

df.head(20)

ner_jsl download started this may take some time.
[OK!]


| | chunks | begin | end | sentence_id | entities |
|---|---|---|---|---|---|
| 0 | excision of the cyst | 37 | 56 | 0 | Procedure |
| 1 | hernia | 184 | 189 | 0 | Disease_Syndrome_Disorder |
| 2 | hernia | 196 | 201 | 0 | Disease_Syndrome_Disorder |
| 3 | hernia | 240 | 245 | 0 | Disease_Syndrome_Disorder |
| 4 | cyst | 671 | 674 | 3 | Disease_Syndrome_Disorder |
| 5 | cyst | 791 | 794 | 4 | Disease_Syndrome_Disorder |
| 6 | hernia | 907 | 912 | 4 | Disease_Syndrome_Disorder |
| 7 | cyst | 919 | 922 | 5 | Disease_Syndrome_Disorder |
| 8 | hernia repair | 1031 | 1043 | 6 | Procedure |
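
The `ner_diseases_large` and `ner_jsl` outputs above often mark the same regions of text with different boundaries. The sketch below compares the two sets of spans, copied from the tables in this notebook. The helper `overlaps` and the variable names are hypothetical, and this is only one simple way to compare models, not an evaluation metric.

```python
# (begin, end) spans copied from the output tables in this notebook; `end` is inclusive.
large_spans = [(53, 56), (173, 189), (196, 201), (240, 245),
               (896, 912), (919, 922), (1031, 1036)]
jsl_spans = [(37, 56), (184, 189), (196, 201), (240, 245), (671, 674),
             (791, 794), (907, 912), (919, 922), (1031, 1043)]


def overlaps(a, b):
    """True if two inclusive (begin, end) spans share at least one character."""
    return a[0] <= b[1] and b[0] <= a[1]


# Spans that both models report with identical boundaries.
exact = sorted(set(large_spans) & set(jsl_spans))

# ner_jsl spans that do not touch any ner_diseases_large span.
jsl_only = [s for s in jsl_spans if not any(overlaps(s, t) for t in large_spans)]

print(exact)     # [(196, 201), (240, 245), (919, 922)]
print(jsl_only)  # [(671, 674), (791, 794)]
```

On this sample, every `ner_diseases_large` span overlaps some `ner_jsl` span, while two `cyst` mentions are reported only by `ner_jsl`.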


In [23]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()
visualiser.display(result = light_result[0] ,label_col = 'ner_chunk', document_col = 'document')