![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/NER_CHEXPERT.ipynb)

# `NER_CHEXPERT` **Model**

This model extracts anatomical (`ANAT`) and observation (`OBS`) entities from chest radiology reports.

## 1. Colab Setup

**Install the library and import license keys**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## 2. Start Session

In [None]:
from johnsnowlabs import *
# Automatically load license data and start a session with all jars the user has access to
spark = jsl.start()

In [None]:
spark

## 3. Select the model and construct the pipeline

**Create the pipeline**

In [None]:
# Explicit import so the cell does not depend on what the wildcard import exposes
from pyspark.ml import Pipeline

# Wrap the raw text column into "document" annotations
document_assembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

# Split each document into sentences with the default pretrained model
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
      .setInputCols(["document"])\
      .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
      .setInputCols(["sentence"])\
      .setOutputCol("token")

# Clinical word embeddings consumed by the NER model
word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("word_embeddings")

# Token-level tags from the pretrained ner_chexpert model
clinical_ner = medical.NerModel.pretrained("ner_chexpert", "en", "clinical/models")\
      .setInputCols(["sentence", "token", "word_embeddings"])\
      .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = medical.NerConverterInternal()\
      .setInputCols(["sentence", "token", "ner"])\
      .setOutputCol("ner_chunk")\
      .setGreedyMode(True)

nlpPipeline = Pipeline(stages = [document_assembler,
                                 sentenceDetector,
                                 tokenizer,
                                 word_embeddings,
                                 clinical_ner,
                                 ner_converter])

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_chexpert download started this may take some time.
[OK!]
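
The last two stages work as a pair: the NER model assigns one tag per token, and the converter merges adjacent tagged tokens into the chunks stored in `ner_chunk`. The sketch below illustrates that merging idea in plain Python. It is a simplified illustration, not Spark NLP's implementation; the `iob_to_chunks` helper and the `B-`/`I-` tag values are assumptions made for the example, and it does not model `setGreedyMode`.

```python
# Simplified IOB-to-chunk merge for illustration only (hypothetical helper).
def iob_to_chunks(tokens, tags):
    """Group consecutive B-/I- tagged tokens into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Close the chunk that is currently open, if any.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)
        # A B- tag, or an I- tag whose label differs from the open chunk, starts a new chunk.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


# Hypothetical token-level tags for a shortened sentence.
tokens = ["Lungs", "are", "clear", ".", "Normal", "hilar", "silhouettes"]
tags = ["B-ANAT", "O", "B-OBS", "O", "B-OBS", "B-ANAT", "I-ANAT"]
print(iob_to_chunks(tokens, tags))
```
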


## 4. Create example inputs

In [None]:
sample_text = ["""FINAL REPORT HISTORY : Chest tube leak , to assess for pneumothorax . 
FINDINGS : In comparison with study of ___ , the endotracheal tube and Swan - Ganz catheter have been removed . Chest-tube remains in place and there is no evidence of pneumothorax. Mild atelectatic changes are seen at the left base.""",

"""FINAL REPORT EXAMINATION: CHEST ( PORTABLE AP ).
INDICATION : _ year old woman with SAH / / Fever workup Fever workup. 
IMPRESSION : Compared to chest radiographs _. Lungs are clear . Normal cardiomediastinal and hilar silhouettes and pleural surfaces .""",

"""FINAL REPORT EXAMINATION : CHEST ( PORTABLE AP ). 
INDICATION : _ year old woman with OGT / / OGT placement OGT placement. 
IMPRESSION : In comparison with the earlier study of this date , the nasogastric tube is been pushed forward so that it extends at least to the mid portion of the body stomach were crosses the lower margin of the image . Side - port is definitely distal to the esophagogastric junction . Otherwise little change .""",

"""TECHNIQUE : Chest PA and lateral.
FINAL REPORT INDICATION : _ year old woman with OGT / / OGT placement  
COMPARISON : Chest radiograph from _ from earlier today. 
FINDINGS : The lung volumes are stable . The mediastinal and hilar contours are normal . The pleural surfaces are normal . The ET tube terminates approximately 4.3cm from the carina . The NG tube is still malpositioned and is located closer to the distal esophagus / esophagogastric junction . 
IMPRESSION : ET tube is in appropriate position . The enteric tube is still malpositioned and should be advanced approximately 12 cm .""",

"""FINAL REPORT EXAMINATION : CHEST ( PORTABLE AP ) 
INDICATION : _ year old woman with SAH in ICU . / / Interval pulm changes Interval pulm changes. 
IMPRESSION : In comparison with the study of _ , No evidence of acute cardiopulmonary-disease . No pneumonia , vascular-congestion , or pleural-effusion . _ sixth""",

"""FINAL REPORT INDICATION : _ year old woman with SAH , intubated / / serial exam. 
TECHNIQUE : Chest PA and lateral COMPARISON : Chest radiograph from _ .
FINDINGS : The lung volumes are stable . The cardiomediastinal and hilar contours are normal . The pleural surfaces are normal . The ET tube terminates approximately 1.8cm from the carina . The NG tube appears to be closer to the esophagogastric junction or scarcely in the proximal stomach and the side ports are approximately 12cm from the mid stomach . 
IMPRESSION : ET tube is closer to the carina and should be withdrawn approximately 3 cm . NG tube is malpositioned and should be advanced approximately 12cm ."""
]

In [None]:
from pyspark.sql.types import StringType

# One row per report, in a single string column named "text"
df = spark.createDataFrame(sample_text, StringType()).toDF('text')
df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|FINAL REPORT HISTORY : Chest tube leak , to assess for pneumothorax . 
FINDINGS : In comparison w...|
|FINAL REPORT EXAMINATION: CHEST ( PORTABLE AP ).
INDICATION : _ year old woman with SAH / / Fever...|
|FINAL REPORT EXAMINATION : CHEST ( PORTABLE AP ). 
INDICATION : _ year old woman with OGT / / OGT...|
|TECHNIQUE : Chest PA and lateral.
FINAL REPORT INDICATION : _ year old woman with OGT / / OGT pla...|
|FINAL REPORT EXAMINATION : CHEST ( PORTABLE AP ) 
INDICATION : _ year old woman with SAH in ICU ....|
|FINAL REPORT INDICATION : _ year old woman with SAH , intubated / / serial exam. 
TECHNIQUE : Che...|
+----------------------------------------------------------------------------------------------------+
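
Each row in the output above wraps onto a second line because the reports contain newline characters, and each cell ends in `...` because `show(truncate = 100)` shortens longer strings to 100 characters. The plain-Python sketch below mimics that shortening rule so the cell contents are easier to interpret. `show_cell` is a hypothetical helper written for this example, not a Spark function, and the sample string is made up from phrases in the reports.

```python
def show_cell(value: str, truncate: int = 100) -> str:
    """Mimic how a long string cell is shortened for display (sketch only)."""
    if truncate > 0 and len(value) > truncate:
        if truncate < 4:
            return value[:truncate]
        # Keep room for the three-character ellipsis.
        return value[: truncate - 3] + "..."
    return value


# Hypothetical multi-line report, long enough to be truncated.
report = "FINDINGS :\n" + "Lungs are clear . " * 10
cell = show_cell(report)

print(len(cell))
print(cell.splitlines())  # the embedded newline is why a row spans two printed lines
```
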

## 5. Use the pipeline to create outputs

In [None]:
result = nlpPipeline.fit(df).transform(df)

In [None]:
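# Explicit import so the cell does not depend on what the wildcard import exposes
from pyspark.sql import functions as F

# Zip the parallel annotation arrays, then explode to one row per detected chunk
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("entity")).show()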

+--------------------+-----+---+------+
|               chunk|begin|end|entity|
+--------------------+-----+---+------+
|   endotracheal tube|  120|136|   OBS|
|Swan - Ganz catheter|  142|161|   OBS|
|          Chest-tube|  183|192|   OBS|
|            in place|  202|209|   OBS|
|        pneumothorax|  239|250|   OBS|
|Mild atelectatic ...|  253|276|   OBS|
|           left base|  294|302|  ANAT|
|               Lungs|  166|170|  ANAT|
|               clear|  176|180|   OBS|
|              Normal|  184|189|   OBS|
|   cardiomediastinal|  191|207|  ANAT|
|   hilar silhouettes|  213|229|  ANAT|
|    pleural surfaces|  235|250|  ANAT|
|    nasogastric tube|  193|208|   OBS|
|         mid portion|  268|278|  ANAT|
|        body stomach|  287|298|  ANAT|
|lower margin of t...|  317|341|   OBS|
|         Side - port|  345|355|   OBS|
|              distal|  371|376|  ANAT|
|esophagogastric j...|  385|408|  ANAT|
+--------------------+-----+---+------+
only showing top 20 rows
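
In the table above, `begin` and `end` are character offsets into each report, and `end` is inclusive: for every untruncated row, `end - begin + 1` equals the chunk length. The sketch below checks this on a few rows copied from the table and shows how a chunk would be sliced back out of its text. `chunk_length` and `slice_chunk` are hypothetical helpers, and the short `text` string is an assumption for the example.

```python
# Rows copied from the output table: (chunk, begin, end, entity)
rows = [
    ("endotracheal tube", 120, 136, "OBS"),
    ("Swan - Ganz catheter", 142, 161, "OBS"),
    ("Chest-tube", 183, 192, "OBS"),
    ("pneumothorax", 239, 250, "OBS"),
    ("left base", 294, 302, "ANAT"),
]


def chunk_length(begin: int, end: int) -> int:
    """Length of a span whose end offset is inclusive."""
    return end - begin + 1


def slice_chunk(text: str, begin: int, end: int) -> str:
    """Recover the chunk text; Python slices exclude the stop index, hence end + 1."""
    return text[begin:end + 1]


for chunk, begin, end, entity in rows:
    assert chunk_length(begin, end) == len(chunk)

# Hypothetical short text to demonstrate the slicing.
text = "Lungs are clear ."
begin = text.index("clear")
end = begin + len("clear") - 1
print(slice_chunk(text, begin, end))
```
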



## 6. Visualize results

In [None]:
from sparknlp_display import NerVisualizer

ner_viz = NerVisualizer()

# Collect once; calling result.collect() inside the loop would recompute the result for every document
rows = result.collect()
for row in rows:
    ner_viz.display(result = row, label_col = "ner_chunk")
    print("\n\n")
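
The visualizer renders highlighted text inside the notebook. Where that rendering is not available (for example, when reading logs), a rough plain-text equivalent can be built from the `begin`, `end`, and entity values. The sketch below is one way to do that; `mark_entities` is a hypothetical helper, the sample text and spans are made up for the example, and it assumes the spans do not overlap.

```python
def mark_entities(text, chunks):
    """Wrap each (begin, end, label) span as [span|LABEL]; end is inclusive."""
    out, cursor = [], 0
    for begin, end, label in sorted(chunks):
        out.append(text[cursor:begin])                   # untouched text before the span
        out.append(f"[{text[begin:end + 1]}|{label}]")   # the marked span
        cursor = end + 1
    out.append(text[cursor:])                            # remainder after the last span
    return "".join(out)


# Hypothetical text and spans, given out of order on purpose.
text = "Lungs are clear ."
chunks = [(10, 14, "OBS"), (0, 4, "ANAT")]
print(mark_entities(text, chunks))  # [Lungs|ANAT] are [clear|OBS] .
```
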