![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/ER_NDC.ipynb)

# `sbiobertresolve_ndc` **Models**

This model maps clinical entities and concepts (such as drugs and ingredients) to [National Drug Codes](https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory) (NDC) using `sbiobert_base_cased_mli` Sentence BERT embeddings. It also returns package options and alternative drugs in the `all_k_aux_label` column.
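The codes this notebook resolves to (for example `60505-0112` in the output of section 4) consist of two hyphen-separated numeric segments; in FDA terminology the first identifies the labeler and the second the product. The following is a minimal sketch for splitting such a code. The helper name `split_product_ndc` and the accepted segment lengths are assumptions made for illustration, not part of Spark NLP.

```python
import re

# Assumption: codes look like the ones in this notebook's output, e.g. "60505-0112"
# (a labeler segment of 4-5 digits and a product segment of 3-4 digits).
_PRODUCT_NDC = re.compile(r"^(\d{4,5})-(\d{3,4})$")


def split_product_ndc(code):
    """Split a labeler-product NDC string into its (labeler, product) segments."""
    match = _PRODUCT_NDC.match(code.strip())
    if match is None:
        raise ValueError(f"not a labeler-product NDC: {code!r}")
    return match.group(1), match.group(2)


print(split_product_ndc("60505-0112"))  # ('60505', '0112')
```

Keeping the segments as strings preserves leading zeros, which would be lost if they were converted to integers.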

## 1. Colab Setup

**Import license keys**

In [None]:
import json, os
from google.colab import files

# Upload the license file only if it is not already in the working directory
if 'spark_jsl.json' not in os.listdir():
    license_keys = files.upload()
    os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
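The install and start cells in this notebook read `SECRET`, `PUBLIC_VERSION`, and `JSL_VERSION` from the license file, so it can help to check for them before running `pip`. A minimal sketch follows; the helper `missing_license_keys` is hypothetical, and the example dictionary holds placeholder values rather than a real license.

```python
# Keys that the later cells of this notebook read from the license file.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")


def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required entries that are absent or empty in the license dict."""
    return [name for name in required if not keys.get(name)]


# Placeholder values for illustration only (not a real license).
example_keys = {"SECRET": "placeholder-secret", "PUBLIC_VERSION": "4.2.8"}
print(missing_license_keys(example_keys))  # ['JSL_VERSION']
```

In the notebook itself, calling `missing_license_keys(license_keys)` right after loading `spark_jsl.json` should return an empty list when the file is complete.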

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G", 
          "spark.kryoserializer.buffer.max":"2000M", 
          "spark.driver.maxResultSize":"2000M"} 

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.2.8
Spark NLP_JSL Version : 4.2.8


## 2. Select the model and construct the pipeline

In [4]:
MODEL_NAME = "sbiobertresolve_ndc"

**Create the pipeline**

In [5]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DRUG"])

c2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc") 

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

resolver = SentenceEntityResolverModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sbert_embeddings"]) \
    .setOutputCol("resolution")


nlp_pipeline = Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner,
        ner_converter,
        c2doc,
        sbert_embedder,
        resolver
    ])


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_posology_greedy download started this may take some time.
[OK!]
sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
[OK!]
sbiobertresolve_ndc download started this may take some time.
[OK!]


## 3. Create example inputs

In [6]:
sample_text = [
               
"""On presentation included gabapentin 100 mg/1, aspirin 81 mg. The patient also takes oxybutynin chloride 1 kg/kg.""",

"""FAMILY HISTORY: Noncontributory. MEDICATIONS: meloxicam 7.5 mg/1 , metoprolol tartrate 5 mg/5ml , hydrocortisone , prednisone 5 mg/1 TABLET [Prednisone]""",

"""A 28-year-old female with a history of gestational diabetes mellitus, used to take metformin 1000 mg two times a day, presented with a one-week history of polyuria , polydipsia , poor appetite, vomiting. She was seen by the endocrinology service and discharged on ibuprofen 800 mg/1, insulin glargine 300 u/ml.""",

"""The patient is a 72-year-old gentleman who was diagnosed with chronic lymphocytic leukemia in May 2008. He was noted to have autoimmune hemolytic anemia at the time of his CLL diagnosis. CURRENT MEDICATIONS:  levothyroxine sodium 62.5 ug/ml, simvastatin 20 mg, levothyroxine sodium 75 ug/1.""",

"""The patient was transferred secondary to inability and continue of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given aspirin 81 mg, vitamin a, metformin 500 mg, sotalol hydrochloride 80 mg/1"""

]

In [7]:
from pyspark.sql.types import StringType, IntegerType

df = spark.createDataFrame(sample_text, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|On presentation included gabapentin 100 mg/1, aspirin 81 mg. The patient also takes oxybutynin ch...|
|FAMILY HISTORY: Noncontributory. MEDICATIONS: meloxicam 7.5 mg/1 , metoprolol tartrate 5 mg/5ml ,...|
|A 28-year-old female with a history of gestational diabetes mellitus, used to take metformin 1000...|
|The patient is a 72-year-old gentleman who was diagnosed with chronic lymphocytic leukemia in May...|
|The patient was transferred secondary to inability and continue of her diabetes, the sacral decub...|
+----------------------------------------------------------------------------------------------------+



## 4. Use the pipeline to create outputs

In [8]:
result = nlp_pipeline.fit(df).transform(df)

# Zip the chunk and resolution arrays so that each row holds one chunk with its NDC resolution
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.resolution.result,
                                     result.resolution.metadata)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("entity"),
              F.expr("cols['4']").alias("ndc_code"),
              F.expr("cols['5']['all_k_results']").alias("all_codes"),
              F.expr("cols['5']['all_k_resolutions']").alias("resolutions"),
              F.expr("cols['5']['resolved_text']").alias("ndc_description")).show(truncate=40)

+-------------------------------------+-----+---+------+----------+----------------------------------------+----------------------------------------+-------------------------------+
|                                chunk|begin|end|entity|  ndc_code|                               all_codes|                             resolutions|                ndc_description|
+-------------------------------------+-----+---+------+----------+----------------------------------------+----------------------------------------+-------------------------------+
|                  gabapentin 100 mg/1|   25| 43|  DRUG|60505-0112|60505-0112:::80425-0150:::17351-5010:...|gabapentin 100 mg/1:::gabapentin 100m...|            gabapentin 100 mg/1|
|                        aspirin 81 mg|   46| 58|  DRUG|41250-0780|41250-0780:::72036-0080:::17714-0009:...|aspirin 81 mg:::aspirin 81mg:::aspiri...|                  aspirin 81 mg|
|          oxybutynin chloride 1 kg/kg|   84|110|  DRUG|49169-1019|49169-1019:::17381-0015
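
The `all_codes` and `resolutions` columns above are single strings in which candidates are joined by `:::`, and `begin`/`end` are character offsets into the input text with `end` inclusive. The sketch below shows one way to work with both in plain Python. The helper names `parse_candidates` and `chunk_text` are hypothetical, and the candidate strings are limited to the first two entries visible in the truncated table.

```python
def parse_candidates(all_k_results, all_k_resolutions, sep=":::"):
    """Pair each candidate code with its resolution text, best match first."""
    codes = all_k_results.split(sep)
    texts = all_k_resolutions.split(sep)
    # zip stops at the shorter list if the two strings disagree in length
    return list(zip(codes, texts))


def chunk_text(text, begin, end):
    """Recover a chunk from its offsets; `end` is inclusive, hence the +1."""
    return text[begin:end + 1]


# First two aspirin candidates as shown in the (truncated) table above
pairs = parse_candidates("41250-0780:::72036-0080", "aspirin 81 mg:::aspirin 81mg")
print(pairs)  # [('41250-0780', 'aspirin 81 mg'), ('72036-0080', 'aspirin 81mg')]

text = "On presentation included gabapentin 100 mg/1, aspirin 81 mg."
print(chunk_text(text, 25, 43))  # gabapentin 100 mg/1
```

The same split could be done inside Spark with `F.split`, but doing it on collected rows keeps the example independent of a running session.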

## 5. Visualize results

In [9]:
from sparknlp_display import EntityResolverVisualizer

resolver_viz = EntityResolverVisualizer()


# Collect the results once on the driver, then render each row
for row in result.collect():
    resolver_viz.display(result = row, label_col = "ner_chunk", resolution_col="resolution")
    print("\n\n")