![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/ER_UMLS_CUI.ipynb)

# `sbiobertresolve_umls_findings` **Model**

This model maps clinical findings to their corresponding UMLS Concept Unique Identifier (CUI) codes. It is a sentence entity resolver that works on `sbiobert_base_cased_mli` sentence embeddings of the detected chunks.
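Conceptually, a sentence entity resolver embeds each detected chunk and returns the concepts whose precomputed embeddings lie closest to it. The sketch below illustrates only that ranking step. It assumes Euclidean distance, and the 3-dimensional vectors and the `rank_candidates` helper are hypothetical. The real model uses high-dimensional `sbiobert_base_cased_mli` embeddings and its own internal index. The two codes and descriptions are taken from the output in section 5.

```python
import math

# Hypothetical toy index: concept code -> (description, embedding).
# Real resolver embeddings are high-dimensional; these 3-d vectors are made up.
CONCEPT_INDEX = {
    "C1999266": ("Depression", [0.9, 0.1, 0.0]),
    "C1963064": ("Anxiety", [0.1, 0.9, 0.1]),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(chunk_embedding, index, k=2):
    """Return the k closest (code, description, distance) tuples, nearest first."""
    scored = [
        (code, desc, euclidean(chunk_embedding, vec))
        for code, (desc, vec) in index.items()
    ]
    return sorted(scored, key=lambda item: item[2])[:k]


# A made-up embedding standing in for the chunk "depression".
best = rank_candidates([0.8, 0.2, 0.0], CONCEPT_INDEX)
print(best[0][0], best[0][1])  # C1999266 Depression
```

The top-ranked code plays the role of the `resolution` result. The full ranked list roughly corresponds to the `all_k_results` / `all_k_resolutions` metadata shown later.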

## 1. Colab Setup

**Import license keys**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please upload your John Snow Labs license using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Make sure to restart your notebook afterwards for the changes to take effect.

jsl.install()

## 2. Start Session

In [None]:
from johnsnowlabs import *
# Automatically load license data and start a session with all jars user has access to
spark = jsl.start()

## 3. Select the model and construct the pipeline

In [None]:
MODEL_NAME = "sbiobertresolve_umls_findings"

**Create the pipeline**

In [None]:
from pyspark.ml import Pipeline

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = medical.NerModel.pretrained("ner_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["PROBLEM"])

c2doc = nlp.Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc") 

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

resolver = medical.SentenceEntityResolverModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["ner_chunk_doc", "sbert_embeddings"]) \
    .setOutputCol("resolution")


nlp_pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner,
    ner_converter,
    c2doc,
    sbert_embedder,
    resolver,
])


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_umls_findings download started this may take some time.
[OK!]
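Each annotator reads the columns named in `setInputCols` and writes the column named in `setOutputCol`, so the stages must be ordered such that every input column already exists. The plain-Python sketch below needs no Spark. It copies the column names from the pipeline above and checks that ordering. The `STAGES` table and the `check_order` helper are illustrative only and are not part of Spark NLP.

```python
# Column wiring copied from the pipeline above: (stage, input columns, output column).
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "embeddings"),
    ("ner", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
    ("c2doc", ["ner_chunk"], "ner_chunk_doc"),
    ("sbert_embedder", ["ner_chunk_doc"], "sbert_embeddings"),
    ("resolver", ["ner_chunk_doc", "sbert_embeddings"], "resolution"),
]


def check_order(stages, source_cols=("text",)):
    """Return (stage, missing_column) problems; an empty list means the order is valid."""
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)
    return problems


print(check_order(STAGES))  # []

# Moving the resolver before the sentence embedder breaks the chain.
swapped = STAGES[:7] + [STAGES[8], STAGES[7]]
print(check_order(swapped))  # [('resolver', 'sbert_embeddings')]
```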


## 4. Create example inputs

In [None]:
sample_text = [
            
"""HPI: A 69-year-old white female with a history of depression, anxiety, admitted to the ABCD Hospital on February 6, 2007, for shortness of breath. The patient was consulted by Psychiatry for anxiety. I know this patient from a previous consult. During this recent admission, she was given Ativan 0.25 mg on a p.r.n. basis with relief after one to two hours. The patient was seen by Abc, MD, and Def, Ph.D. PAST MEDICAL HISTORY: The patient has a history of hypertension, hypothyroidism, cholelithiasis, Port-A-Cath placement, and hydronephrosis""",

"""The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He has lost about 200 pounds and was otherwise doing well until yesterday evening around 7:00-8:00 when he developed nausea and right upper quadrant pain, which apparently wrapped around toward his right side and back. He feels like he was on it but has not done so. He has overall malaise and a low-grade temperature of 100.3. His last normal bowel movement was yesterday. He denies any outright chills. PAST MEDICAL HISTORY: Significant for hypertension and morbid obesity, now resolved.""",

"""The patient is a 51-year-old Caucasian female with past medical history of morbid obesity and chronic lower extremity lymphedema. She follows up at the wound care center at Hospital. Her lower extremity edema is being managed there. She has had multiple episodes of cellulitis of the lower extremities for which she has received treatment with oral Bactrim and ciprofloxacin in the past according to her. As her lymphedema was not improving on therapy at that facility, she was referred for admission to Long-Term Acute Care Facility for lymphedema management. She at present has a stage II ulcer on the lower part of the medial aspect of left leg. Her measurements for lymphedema wraps have been taken and in my opinion, it is going to be started in a day or two.""",

"""HISTORY: The patient is a 78-year-old gentleman with no substantial past medical history. He denies diabetes including kidney disease, stroke, vision loss, or neuropathy. At this time, he has been admitted for anemia with hemoglobin of 7.1 and requiring transfusion. He reports that he has no signs or symptom of bleeding and had a blood transfusion approximately two months ago and actually several weeks before that blood transfusion, he had a transfusion for anemia. He has been placed on B12, oral iron, and Procrit. At this time, we are asked to evaluate him for further causes and treatment for his anemia. No fevers, rash, arthralgias, or myalgias.""",

"""The patient is a 36-year-old gentleman admitted to the hospital because he passed out at home. Over the past week, he has been noticing increasing shortness of breath. He also started having some abdominal pain; however, he continued about his regular activity until the other day when he passed out at home. His wife called paramedics and he was brought to the emergency room. He has been started on heparin and we are asked to see him because of increasing BUN and creatinine. The patient has no past history of any renal problems. He feels that he has been in good health until this current episode. His appetite has been good. He denies swelling in his feet or ankles. He denies chest pain. He denies any unexplained weight loss. He denies any recent change in bowel habits."""

]

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(sample_text, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|HPI: A 69-year-old white female with a history of depression, anxiety, admitted to the ABCD Hospi...|
|The patient is a 28-year-old, who is status post gastric bypass surgery nearly one year ago. He h...|
|The patient is a 51-year-old Caucasian female with past medical history of morbid obesity and chr...|
|HISTORY: The patient is a 78-year-old gentleman with no substantial past medical history. He deni...|
|The patient is a 36-year-old gentleman admitted to the hospital because he passed out at home. Ov...|
+----------------------------------------------------------------------------------------------------+



## 5. Use the pipeline to create outputs

In [None]:
from pyspark.sql import functions as F

result = nlp_pipeline.fit(df).transform(df)

result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.resolution.result,
                                     result.resolution.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("entity"),
              F.expr("cols['4']").alias("UML_code"),
              F.expr("cols['5']['resolved_text']").alias("description"),
              F.expr("cols['5']['all_k_results']").alias("all_codes"),
              F.expr("cols['5']['all_k_resolutions']").alias("resolutions"),
              ).show(truncate=40)

+-----------------------------------+-----+---+-------+--------+----------------------------------------+----------------------------------------+----------------------------------------+
|                              chunk|begin|end| entity|UML_code|                             description|                               all_codes|                             resolutions|
+-----------------------------------+-----+---+-------+--------+----------------------------------------+----------------------------------------+----------------------------------------+
|                         depression|   50| 59|PROBLEM|C1999266|                              Depression|C1999266:::C0541868:::C1319226:::C054...|Depression:::DEPRESSION FUNCTIONAL:::...|
|                            anxiety|   62| 68|PROBLEM|C1963064|                                 Anxiety|C1963064:::C0518695:::C0564474:::C471...|Anxiety:::associated anxiety:::anxiet...|
|                shortness of breath|  126|144|PROBLEM|C0748
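The `begin` and `end` columns are character offsets into the input text, and in the rows above `end` is inclusive. For example, `depression` spans 50 to 59, which is 10 characters. The `all_k_results` and `all_k_resolutions` metadata hold the ranked candidates as `:::`-separated strings. The sketch below checks the offsets against the first sample text and pairs up the candidate lists. The helper names are illustrative, and the candidate strings are shortened to the two entries that are fully visible in the truncated output above.

```python
# Prefix of the first sample text from section 4, long enough to cover the chunks shown above.
text = ("HPI: A 69-year-old white female with a history of depression, anxiety, "
        "admitted to the ABCD Hospital on February 6, 2007, for shortness of breath.")


def chunk_at(text, begin, end):
    """Slice a chunk using a 0-based begin offset and an inclusive end offset."""
    return text[begin:end + 1]


def pair_candidates(all_codes, all_resolutions, sep=":::"):
    """Pair each ranked candidate code with its description, best match first."""
    return list(zip(all_codes.split(sep), all_resolutions.split(sep)))


for begin, end in [(50, 59), (62, 68), (126, 144)]:
    print(begin, end, chunk_at(text, begin, end))

# Shortened from the "depression" row: only the fully visible candidates are kept.
candidates = pair_candidates("C1999266:::C0541868", "Depression:::DEPRESSION FUNCTIONAL")
print(candidates)
```

Inside Spark, the same split can be applied column-wise with `F.split(column, ":::")`.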

## 6. Visualize results

In [None]:
from sparknlp_display import EntityResolverVisualizer

resolver_viz = EntityResolverVisualizer()

# Collect the results once, then render each document with its resolved UMLS codes
for row in result.collect():
    resolver_viz.display(result=row, label_col="ner_chunk", resolution_col="resolution")
    print("\n\n")
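`EntityResolverVisualizer` highlights each chunk in place together with its resolved code. Outside a notebook, you can produce a rough plain-text approximation from the same offsets and codes. The `annotate` helper below is a hypothetical sketch and is not part of `sparknlp_display`. It assumes non-overlapping chunks with inclusive `end` offsets and uses the two fully visible rows from section 5.

```python
def annotate(text, chunks):
    """Insert "[CODE]" after each chunk; chunks are (begin, end_inclusive, code) tuples."""
    out = []
    cursor = 0
    for _begin, end, code in sorted(chunks):
        out.append(text[cursor:end + 1])  # text up to and including the chunk
        out.append(f" [{code}]")
        cursor = end + 1
    out.append(text[cursor:])  # remainder after the last chunk
    return "".join(out)


text = ("HPI: A 69-year-old white female with a history of depression, anxiety, "
        "admitted to the ABCD Hospital")
print(annotate(text, [(50, 59, "C1999266"), (62, 68, "C1963064")]))
```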
