![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/ER_UMLS_CUI_DRUG_SUBSTANCE.ipynb)

# `sbiobertresolve_umls_drug_substance` **Models**

This model maps clinical entities to UMLS CUI codes. It is trained on the 2021AB UMLS dataset. The complete dataset has 127 different categories; this model is trained on the Clinical Drug, Pharmacologic Substance, Antibiotic, and Hazardous or Poisonous Substance categories using `sbiobert_base_cased_mli` embeddings.
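Conceptually, a sentence entity resolver embeds each detected chunk and returns the codes whose reference embeddings lie closest to it. The sketch below is a toy, pure-Python illustration of that nearest-neighbour idea, not the internal implementation of `SentenceEntityResolverModel`. Its assumptions are invented for illustration: the three-dimensional vectors stand in for real `sbiobert_base_cased_mli` embeddings, `CUI_A` to `CUI_C` are placeholder codes rather than real CUIs, and cosine distance is assumed as the metric.

```python
import math

# Hypothetical reference index: placeholder codes mapped to invented 3-d vectors
# (real sentence embeddings are much higher-dimensional).
TERM_INDEX = {
    "CUI_A": [0.9, 0.1, 0.0],
    "CUI_B": [0.1, 0.9, 0.1],
    "CUI_C": [0.0, 0.2, 0.95],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def resolve(chunk_vec, index, k=2):
    """Return the k (code, distance) pairs closest to chunk_vec, nearest first."""
    scored = [(code, cosine_distance(chunk_vec, vec)) for code, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


top = resolve([0.85, 0.15, 0.05], TERM_INDEX)
print(top[0][0])  # prints: CUI_A
```

Returning the k nearest codes rather than only the best one mirrors the ranked candidate lists (`all_k_results`) that the resolver outputs later in this notebook.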

## 1. Colab Setup

**Import license keys**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please upload your John Snow Labs license using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM
# Make sure to restart your notebook afterwards for changes to take effect

jsl.install()

## 2. Start Session

In [None]:
from johnsnowlabs import *
# Automatically load license data and start a session with all jars the user has access to
spark = jsl.start()

## 3. Select the model and construct the pipeline

In [None]:
MODEL_NAME = "sbiobertresolve_umls_drug_substance"

**Create the pipeline**

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = medical.NerModel.pretrained("ner_posology_greedy", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DRUG"])

c2doc = nlp.Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc") 

sbert_embedder = nlp.BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

resolver = medical.SentenceEntityResolverModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["ner_chunk_doc", "sbert_embeddings"]) \
    .setOutputCol("resolution")


nlp_pipeline = nlp.Pipeline(
    stages = [
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner,
        ner_converter,
        c2doc,
        sbert_embedder,
        resolver
    ])


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_posology_greedy download started this may take some time.
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_umls_drug_substance download started this may take some time.
[OK!]
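Each stage reads the columns produced by earlier stages, so a typo in a `setInputCols` name only surfaces at run time. As a sketch under stated assumptions, the column names below are copied from the stages defined above, and `find_unwired_inputs` is a hypothetical helper (not part of Spark NLP) that checks names only, not annotation types.

```python
# (stage name, input columns, output column) copied from the pipeline above.
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "embeddings"),
    ("ner", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
    ("c2doc", ["ner_chunk"], "ner_chunk_doc"),
    ("sbert_embedder", ["ner_chunk_doc"], "sbert_embeddings"),
    ("resolver", ["ner_chunk_doc", "sbert_embeddings"], "resolution"),
]


def find_unwired_inputs(stages, source_cols=("text",)):
    """Return (stage, column) pairs whose input column is not produced upstream."""
    available = set(source_cols)
    missing = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                missing.append((name, col))
        available.add(output)  # later stages may consume this stage's output
    return missing


print(find_unwired_inputs(STAGES))  # prints: []
```

An empty list means every input column is produced by an earlier stage or by the source DataFrame's `text` column.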


## 4. Create example inputs

In [None]:
sample_text = [
"""HISTORY OF PRESENT ILLNESS: The patient is a 72-year-old gentleman who was diagnosed with chronic lymphocytic leukemia in May 2008. He was noted to have autoimmune hemolytic anemia at the time of his CLL diagnosis. He comes in to clinic today for follow-up and complete blood count.
CURRENT MEDICATIONS: Levothyroxine 50 mcg, vitamin C 500 mg and simvastatin 20 mg""",

""" She did receive a course of bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this continues to receive daily care. The patient was transferred secondary to inability and continue of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Levothyroxine 0.1 mg daily, Aspirin 81 mg daily, Percocet 5/325 mg, Magnesium Citrate""",

"""The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. He has been advised Aspirin 81 mg, Hydrochlorothiazide 50 mg and Nitroglycerin 1/150 sublingually PRN chest pain.""",

"""HISTORY: The patient is a 78-year-old gentleman with no substantial past medical history except for diabetes. He denies any comorbid complications of the diabetes including kidney disease, heart disease, stroke, vision loss, or neuropathy. At this time, he has been admitted for anemia with hemoglobin of 7.1 and requiring transfusion. He reports that he has no signs or symptom of bleeding and had a blood transfusion approximately two months ago and actually several weeks before that blood transfusion, he had a transfusion for anemia. He has been placed on Oral iron, and Procrit. At this time, we are asked to evaluate him for further causes and treatment for his anemia. He denies any constitutional complaints except for fatigue, malaise, and some dyspnea. He has no adenopathy that he reports. No fevers, night sweats, bone pain, rash, arthralgias, or myalgias.
PAST MEDICAL HISTORY: Diabetes.
PAST SURGICAL HISTORY: Hernia repair.
ALLERGIES: He has no allergies.
MEDICATIONS: Listed in the chart and include Coumadin, Lasix, Diltiazem, Prevacid.""",

"""INTERIM HISTORY: The patient comes to the clinic today for followup. I am seeing him once every 4 to 8 weeks. He is off of all immunosuppression. He does have mild chronic GVHD but not enough to warrant any therapy and the disease has been under control and he is 4-1/2-years posttransplant.
He has multiple complaints. He has had hematochezia. I referred him to gastroenterology. They did an upper and lower endoscopy. No evidence of ulcers or any abnormality was found. Some polyps were removed. They were benign. He may have mild iron deficiency, but he is fatigued and has several complaints related to his level of activity.
CURRENT MEDICATIONS: Cozaar, Prozac 20 mg, Potassium 10 mEq, Mirapex.""",

"""HISTORY OF PRESENT ILLNESS: This is a return visit to the renal clinic for the patient where she is followed up for diabetes and kidney disease management. Her last visit to this clinic was approximately three months ago. Since that time, the patient states that she has had some variability in her glucose control too largely to recent upper and lower respiratory illnesses. She did not seek attention for these, and the symptoms have begun to subside on their own and in the meantime, she continues to have some difficulties with blood sugar management. Her 14-day average is 191. She was able to manage this completely on her own. In the meantime, she is not having any other medical problems that have interfered with glucose control. Her diet has been a little bit different in that she had been away visiting with her family for some period of time as well.
CURRENT MEDICATIONS: Fluoxetine 20 mg, Protonix 40 mg, Calcium carbonate 500 mg, Valsartan 80 mg, Amlodipine 5 mg, Aspirin 81 mg,
""",

"""HISTORY OF PRESENT ILLNESS:  The patient had blood work done at Dr. XYZ's office on June 01, 2006, which revealed an elevation in his creatinine up to 2.3. He was asked to come in to see a nephrologist for further evaluation. I am therefore asked by Dr. XYZ to see this patient in consultation for evaluation of acute on chronic kidney failure. He has not had an ultrasound but has been diagnosed with prostatic hypertrophy by his primary care doctor and placed on Flomax. He states that his urinary dribbling and weak stream had not improved since doing this. For the past couple of weeks, he has had dizziness in the morning. This is then associated with low glucose. However, the patient's blood glucose this morning was 123 and he still was dizzy. This was worse on standing. He states that he has been checking his blood pressure regularly at home because he has felt so bad and that he has gotten under 100/60 on several occasions. His pulses remained in the 60s.
MEDICATIONS: Nitroglycerin p.r.n., potassium 10 mEq daily, folate 1 mg b.i.d., Niaspan 500 mg daily, atenolol 50 mg daily, aspirin 325 mg daily, Tylenol, and Flomax 0.4 mg daily.""",

"""She was immediately given hydrogen peroxide 30 mg to treat the infection on her leg, and has been advised neosporin cream for 5 days. She has a history of taking, isobutyltheophylline, and hydrocorticosterone."""
]

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(sample_text, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|HISTORY OF PRESENT ILLNESS: The patient is a 72-year-old gentleman who was diagnosed with chronic...|
| She did receive a course of bactrim for 14 days for UTI. Evidently, at some point in time, the p...|
|The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The ...|
|HISTORY: The patient is a 78-year-old gentleman with no substantial past medical history except f...|
|INTERIM HISTORY: The patient comes to the clinic today for followup. I am seeing him once every 4...|
|HISTORY OF PRESENT ILLNESS: This is a return visit to the renal clinic for the patient where she ...|
|HISTORY OF PRESENT ILLNESS:  The patient had blood work done at Dr. XYZ'

## 5. Use the pipeline to create outputs

In [None]:
from pyspark.sql import functions as F

limited_df = df.limit(2)

result = nlp_pipeline.fit(limited_df).transform(limited_df)

# Zip the chunk and resolution arrays so that each output row holds one chunk and its UMLS resolution
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.resolution.result,
                                     result.resolution.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("entity"),
              F.expr("cols['4']").alias("UML_code"),
              F.expr("cols['5']['all_k_results']").alias("all_codes"),
              F.expr("cols['5']['all_k_resolutions']").alias("resolutions"),
              F.expr("cols['5']['resolved_text']").alias("resolved_text")).show(truncate=40)

+--------------------+-----+---+------+--------+----------------------------------------+----------------------------------------+
|               chunk|begin|end|entity|UML_code|                               all_codes|                             resolutions|
+--------------------+-----+---+------+--------+----------------------------------------+----------------------------------------+
|Levothyroxine 50 mcg|  304|323|  DRUG|C0775246|C0775246:::C1828438:::C0978141:::C430...|LEVOTHYROXINE NA 50MCG TAB:::LEVOTHYR...|
|    vitamin C 500 mg|  326|341|  DRUG|C4765109|C4765109:::C0691927:::C0773489:::C097...|vitamin C 500 MG Oral Powder:::vitami...|
|   simvastatin 20 mg|  347|363|  DRUG|C0989915|C0989915:::C5137076:::C0980170:::C137...|simvastatin 20 MG:::rosuvastatin 20 M...|
|             bactrim|   29| 35|  DRUG|C0591139|C0591139:::C1530008:::C0718801:::C004...|Bactrim:::Bactil:::bactine:::Bactrime...|
|Levothyroxine 0.1 mg|  467|486|  DRUG|C2730026|C2730026:::C2730027:::C2747397:::C2
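The `all_codes` and `resolutions` columns above are `:::`-joined strings, and `begin`/`end` are character offsets into the input text. A minimal pure-Python sketch of post-processing one row follows. It assumes the k-th code and the k-th resolution refer to the same candidate; the helper names are hypothetical; and the values are trimmed to the first three complete entries of the `bactrim` row above and the opening of the second sample text. Spark NLP `end` offsets are inclusive, which the `bactrim` row (begin 29, end 35, seven characters) illustrates.

```python
def pair_top_k(all_codes, resolutions, sep=":::"):
    """Split the joined strings and pair each candidate code with its resolution.

    Assumes both lists are index-aligned (k-th code <-> k-th resolution).
    """
    codes = all_codes.split(sep)
    texts = resolutions.split(sep)
    if len(codes) != len(texts):
        raise ValueError("code and resolution lists differ in length")
    return list(zip(codes, texts))


def chunk_from_offsets(text, begin, end):
    """Recover a chunk from its offsets; `end` is inclusive, so slice to end + 1."""
    return text[begin:end + 1]


# First three candidates of the 'bactrim' row shown above.
pairs = pair_top_k("C0591139:::C1530008:::C0718801", "Bactrim:::Bactil:::bactine")
print(pairs[0])  # prints: ('C0591139', 'Bactrim')

# Opening of the second sample text (note its leading space).
sample = " She did receive a course of bactrim for 14 days for UTI."
print(chunk_from_offsets(sample, 29, 35))  # prints: bactrim
```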

## 6. Visualize results

In [None]:
from sparknlp_display import EntityResolverVisualizer

resolver_viz = EntityResolverVisualizer()


# Collect once instead of re-collecting the DataFrame on every loop iteration
for row in result.collect():
    resolver_viz.display(result=row, label_col="ner_chunk", resolution_col="resolution")
    print("\n\n")