![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/ER_RXNORM_DRUG_CLASS.ipynb)

# `sbiobertresolve_rxnorm_disposition` **Model**

This model maps medication entities (drugs and ingredients) to RxNorm codes and their dispositions, using `sbiobert_base_cased_mli` Sentence BERT embeddings.

## 1. Colab Setup

**Import license keys**

In [None]:
import json, os
from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
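The cells below read three keys from `spark_jsl.json`: `PUBLIC_VERSION` and `JSL_VERSION` in the `pip` cell, and `SECRET` in both `pip` and `sparknlp_jsl.start()`. If one is missing, the install fails later with a less obvious error. The helpers below, `missing_license_keys` and `load_license_keys`, are hypothetical and not part of Spark NLP. They sketch an early check, assuming only those three keys are required.

```python
import json

# Keys this notebook reads later: PUBLIC_VERSION and JSL_VERSION (pip cell)
# and SECRET (pip cell and sparknlp_jsl.start). This list is an assumption.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")


def missing_license_keys(license_keys, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [k for k in required if not license_keys.get(k)]


def load_license_keys(path="spark_jsl.json"):
    """Load the license JSON and raise early if a required key is missing."""
    with open(path) as f:
        keys = json.load(f)
    missing = missing_license_keys(keys)
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return keys
```

If you use it, `license_keys = load_license_keys()` would replace the `json.load` call in the cell above. The rest of the cell stays the same.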

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G", 
          "spark.kryoserializer.buffer.max":"2000M", 
          "spark.driver.maxResultSize":"2000M"} 

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.2.8
Spark NLP_JSL Version : 4.2.8


## 2. Select the model and construct the pipeline

In [4]:
MODEL_NAME = "sbiobertresolve_rxnorm_disposition"

**Create the pipeline**

In [5]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_posology", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DRUG"])

c2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc") 

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

resolver = SentenceEntityResolverModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sbert_embeddings"]) \
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")


nlp_pipeline = Pipeline(
    stages = [
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner,
        ner_converter,
        c2doc,
        sbert_embedder,
        resolver
  ])


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_posology download started this may take some time.
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_rxnorm_disposition download started this may take some time.
[OK!]
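Conceptually, the last two stages do two things. They embed each `DRUG` chunk with `sbiobert_base_cased_mli`, then rank the model's stored RxNorm terms by their distance to that embedding. `setDistanceFunction("EUCLIDEAN")` selects the metric. The plain-Python sketch below reproduces only this ranking step. It uses hypothetical 3-dimensional vectors and invented codes in place of the real 768-dimensional embeddings and RxNorm vocabulary. It is an illustration of nearest-neighbour resolution, not the internal implementation of `SentenceEntityResolverModel`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(query, index, k=3):
    """Return the k (code, term, distance) entries closest to `query`.

    `index` maps a code to (term, embedding); everything here is toy data.
    """
    scored = [(code, term, euclidean(query, vec)) for code, (term, vec) in index.items()]
    scored.sort(key=lambda item: item[2])  # smallest distance first
    return scored[:k]


def to_metadata(hits):
    """Join the ranked hits with ':::' in the style of the all_k_* metadata fields."""
    return {
        "all_k_results": ":::".join(code for code, _, _ in hits),
        "all_k_resolutions": ":::".join(term for _, term, _ in hits),
    }


# Hypothetical vocabulary: codes, terms and vectors are invented for illustration.
toy_index = {
    "C1": ("drug alpha", [1.0, 0.0, 0.0]),
    "C2": ("drug alpha tablet", [0.9, 0.1, 0.0]),
    "C3": ("drug beta", [0.0, 1.0, 0.0]),
    "C4": ("drug gamma", [0.0, 0.0, 1.0]),
}

hits = resolve([0.97, 0.03, 0.0], toy_index, k=2)
print(to_metadata(hits))
```

The first entry of the ranked list corresponds to the top-1 code in `resolution.result`. The full ranked list corresponds to the `:::`-joined `all_k_results` and `all_k_resolutions` strings shown in section 4.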


## 3. Create example inputs

In [6]:
sample_text = [
"""Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on dapagliflozin for T2DM and atorvastatin for HTG . She had been on dapagliflozin for six months at the time of presentation .The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals. """,
"""This 48-year-old woman returns in followup after a full-night sleep study performed to evaluate her for daytime fatigue and insomnia.,  PAST MEDICAL HISTORY:,1.  Depression.,2.  Hepatitis C.,3.  Hypertension.,4.  Inhaled and intravenous drug abuse history.,The patient has a history of smoking two packs per day of cigarettes for approximately 25 pounds.  She also has a history of recurrent atypical chest pain for which she has been evaluated.,MEDICATIONS: , Current medications include the following: Methadone 110 mg by mouth every day , Avalide (irbesartan)., Albuterol .""",
"""The patient returns to the Pulmonary Medicine Clinic for followup evaluation of interstitial disease secondary to lupus pneumonitis.  She was last seen in the Pulmonary Medicine Clinic in January 2004.  Since that time, her respiratory status has been quite good.  She has had no major respiratory difficulties; however, starting yesterday she began with increasing back and joint pain and as a result a deep breath has caused some back discomfort.  She denies any problems with cough or sputum production.  No fevers or chills.  Recently, she has had a bit more problems with fatigue.  For the most part, she has had no pulmonary limitations to her activity.,CURRENT MEDICATIONS:, prednisone, she was 2.5 mg daily, but discontinued this on 06/16/2004, aspirin 81 mg daily.  She is also on calcium, vitamin D .,ALLERGIES:,  Penicillin and also intolerance to shellfish.""",
"""CHIEF COMPLAINT:,  Leg pain.,HISTORY OF PRESENT ILLNESS:,  This is a 56-year-old female who has pain in her legs at nighttime and when she gets up it comes and goes, radiates from her buttocks to her legs, sometimes it is her ankle. She has had some night sweats occasionally.  She has had a little bit of fever and nausea.  She has noticed her blood sugars have been low.  She has lost over 30 pounds after exercising doing water aerobics at Genesis in Wichita.  She has noticed her fasting blood sugars have been ranging from 100 to 120.  Blood sugars one and a half hours after meals have been 185.  She is coming in for a diabetic checkup in one month and wants lab prior to that time.  She has been eating more meat recently and has not been on a diet for cholesterol.,CURRENT MEDICATIONS: Hydroxyzine pamoate 50 mg at h.s., aspirin 81 mg q.d.,  estradiol one mg q.d., and glucosamine 1000 mg q.d.,ALLERGIES: Cipro, Sulfac """,
"""This is an extremely pleasant 64-year-old gentleman who I am following for essential thrombocytosis.  He was first diagnosed when he first saw a hematologist on 07/09/07.  At that time, his platelet count was 1,240,000.  He was initially started on Hydrea 1000 mg q.d.  On 07/11/07, he underwent a bone marrow biopsy, which showed essential thrombocytosis.  He was positive for the JAK-2 mutation.  On 11/06/07, his platelets were noted to be 766,000.  His current Hydrea dose is now 1500 mg on Mondays and Fridays and 1000 mg on all other days.  He moved to ABCD in December 2009 in an attempt to improve his wife's rheumatoid arthritis.,Overall, he is doing well.  He has a good energy level, and his ECOG performance status is 0.  He denies any fevers, chills, or night sweats.  No lymphadenopathy.  No nausea or vomiting.  No change in bowel or bladder habits.,CURRENT MEDICATIONS: , Hydrea 1500 mg on Mondays and Fridays and 1000 mg the other days of the week, vitamin D q.d , aspirin 81 mg q.d.,ALLERGIES: , No known drug allergies.,REVIEW OF SYSTEMS:,  As per the HPI, otherwise negative.,PAST MEDICAL HISTORY:,1.  He is status post an appendectomy.,2.  Status post a tonsillectomy and adenoidectomy.,3.  Status post bilateral cataract surgery.,4.  BPH.""",
"""The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. He has been advised Aspirin 81 milligrams QDay. HISTORY OF PRESENT ILLNESS: , This is a 66-year-old gentleman status post deceased donor kidney transplant in 12/07,  who has had recurrent urinary retention issues since that time.  Most recently, he was hospitalized on 02/04/08 for acute renal insufficiency,  which was probably secondary to dehydration.  He was seen by urology again at this visit for urinary retention.  He had been seen by urology during a previous hospitalization and he passed his voiding trial at the time of his stent removal on 01/22/08.  Cystoscopy showed at that time obstructive BPH. During the most recent readmission on 02/04/08, he went back into urinary retention and he had had a Foley placed at the outside hospital.,PAST MEDICAL HISTORY:,1. End-stage renal disease, now status post deceased donor kidney transplant in 12/07.,2.  Hypertension.,3.  History of nephrolithiasis.,4. Gout.,5.  BPH.,6.  DJD., HOME MEDICATIONS: Clonidine 0.2 mg, Allopurinol , Oxybutynin , Aspirin , Omeprazole , Prednisone , Ganciclovir , Nystatin swish and swallow , Dapsone , Finasteride .ALLERGIES: No known drug allergies.""",
"""HISTORY OF PRESENT ILLNESS: , This is a 61-year-old woman with a history of polyarteritis nodosa, mononeuritis multiplex involving the lower extremities, and severe sleep apnea returns in followup following an overnight sleep study, on CPAP and oxygen to evaluate her for difficulty in initiating and maintaining sleep.  She returns today to review results of an inpatient study performed approximately two weeks ago.,In the meantime, the patient reports she continues on substantial doses of opiate medication to control leg pain from mononeuritis multiplex.,The patient reports that she generally initiates sleep on CPAP, but rips her mask off, tosses and turns throughout the night and has "terrible quality sleep.",MEDICATIONS: , Current medications are as previously noted.  Changes include reduction in prednisone from 9 to 6 mg by mouth every morning. Her an immediate release morphine preparation, 45 to 75 mg by mouth every 8 hours as needed.ASSESSMENT:,1.  Obesity hypoventilation syndrome.  The patient has evidence of a well-compensated respiratory acidosis, which is probably primarily related to severe obesity.  In addition, there may be contribution from large doses of opiates and standing doses of gabapentin.,2.  Severe central sleep apnea, on CPAP at 10 cmH2O and supplemental oxygen at 8 liters per minute.  The breathing pattern is that of cluster or Biot's breathing throughout sleep.  The primary etiology is probably opiate use, with contribution with further exacerbation by severe obesity which acts to lower the baseline oxyhemoglobin saturation, and worsen desaturations during apneic episodes."""
]

In [7]:
from pyspark.sql.types import StringType, IntegerType

df = spark.createDataFrame(sample_text, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a res...|
|This 48-year-old woman returns in followup after a full-night sleep study performed to evaluate h...|
|The patient returns to the Pulmonary Medicine Clinic for followup evaluation of interstitial dise...|
|CHIEF COMPLAINT:,  Leg pain.,HISTORY OF PRESENT ILLNESS:,  This is a 56-year-old female who has p...|
|This is an extremely pleasant 64-year-old gentleman who I am following for essential thrombocytos...|
|The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The ...|
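|HISTORY OF PRESENT ILLNESS: , This is a 61-year-old woman with a history of polyarteritis nodosa,...|
+----------------------------------------------------------------------------------------------------+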
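`df.show(truncate = 100)` shortens every cell longer than 100 characters to its first 97 characters followed by `...`. That is why each row above ends mid-word. The function below mirrors this rule in plain Python, as we understand Spark's `show` behaviour: limits below 4 cut without an ellipsis, and a limit of 0 disables truncation. You can use it to predict what a given limit will display. `show_truncate` is a hypothetical name, and the function only approximates Spark's rendering.

```python
def show_truncate(value, truncate=20):
    """Approximate how DataFrame.show(truncate=n) renders a long string cell."""
    if truncate <= 0 or len(value) <= truncate:
        return value  # no truncation needed (or disabled)
    if truncate < 4:
        return value[:truncate]  # too short to fit an ellipsis
    return value[: truncate - 3] + "..."


first_note = (
    "Two weeks prior to presentation , she was treated with a five-day course "
    "of amoxicillin for a respiratory tract infection ."
)
print(show_truncate(first_note, 100))
```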

## 4. Use the pipeline to create outputs

In [8]:
limited_df = df.limit(2)
result = nlp_pipeline.fit(limited_df).transform(limited_df)

result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.resolution.result,
                                     result.resolution.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("entity"),
              F.expr("cols['4']").alias("rxn_code"),  # top-1 RxNorm code
              F.expr("cols['5']['all_k_results']").alias("all_codes"),
              F.expr("cols['5']['all_k_resolutions']").alias("resolutions")).show(truncate=40)

+----------------+-----+---+------+--------+----------------------------------------+----------------------------------------+
|           chunk|begin|end|entity|rxn_code|                               all_codes|                             resolutions|
+----------------+-----+---+------+--------+----------------------------------------+----------------------------------------+
|     amoxicillin|   76| 86|  DRUG|     723|723:::540141:::437527:::1152900:::370...|amoxicillin:::amoxicillinan:::amoxici...|
|   dapagliflozin|  135|147|  DRUG| 1488564|1488564:::1545653:::1992672:::1488566...|dapagliflozin:::empagliflozin:::ertug...|
|    atorvastatin|  162|173|  DRUG|   83367|83367:::1158285:::1158284:::301542:::...|atorvastatin:::atorvastatin pill:::at...|
|   dapagliflozin|  201|213|  DRUG| 1488564|1488564:::1545653:::1992672:::1488566...|dapagliflozin:::empagliflozin:::ertug...|
|insulin glargine|  347|362|  DRUG|  274783|274783:::1157459:::378864:::1740938::...|insulin glargine:::insulin
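The `F.arrays_zip` and `F.explode` pattern above turns the parallel per-document arrays (chunk text, begin, end, metadata) into one row per chunk. Each `all_k_results` and `all_k_resolutions` value is a single `:::`-delimited string. The plain-Python sketch below mimics both steps on one hand-built row so the shape of the output is easy to see. The input dictionary is a hypothetical, shortened stand-in for a collected Spark row. It reuses only the first two codes and terms visible in the `dapagliflozin` and `atorvastatin` lines above, and assumes the two `:::` lists are position-aligned.

```python
def explode_zip(row, columns):
    """Yield one dict per array position, like F.explode(F.arrays_zip(...)).

    Note: Python's zip stops at the shortest array, whereas Spark's arrays_zip
    pads shorter arrays with nulls.
    """
    for values in zip(*(row[c] for c in columns)):
        yield dict(zip(columns, values))


def top_k(metadata):
    """Pair each candidate code with its term from the ':::'-joined metadata."""
    codes = metadata["all_k_results"].split(":::")
    terms = metadata["all_k_resolutions"].split(":::")
    return list(zip(codes, terms))


# Hypothetical, shortened stand-in for one collected row.
row = {
    "chunk": ["dapagliflozin", "atorvastatin"],
    "begin": [135, 162],
    "end": [147, 173],
    "metadata": [
        {"all_k_results": "1488564:::1545653",
         "all_k_resolutions": "dapagliflozin:::empagliflozin"},
        {"all_k_results": "83367:::1158285",
         "all_k_resolutions": "atorvastatin:::atorvastatin pill"},
    ],
}

for rec in explode_zip(row, ["chunk", "begin", "end", "metadata"]):
    print(rec["chunk"], rec["begin"], rec["end"], top_k(rec["metadata"]))
```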

## 5. Visualize results

In [9]:
from sparknlp_display import EntityResolverVisualizer

resolver_viz = EntityResolverVisualizer()

# Collect once, instead of re-running the pipeline on every loop iteration
rows = result.collect()

for row in rows:
    resolver_viz.display(result=row, label_col="ner_chunk", resolution_col="resolution")
    print("\n\n")









