

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/ER_RXNORM.ipynb)




# **RxNorm coding**

To run this notebook yourself, you need to upload your license keys. Run the cell below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.



## 1. Colab Setup

Import license keys

In [1]:
import os
import json

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

sparknlp_version = license_keys["PUBLIC_VERSION"]
jsl_version = license_keys["JSL_VERSION"]

print('SparkNLP Version:', sparknlp_version)
print('SparkNLP-JSL Version:', jsl_version)

Saving v3_spark_nlp_for_healthcare.json to v3_spark_nlp_for_healthcare.json
SparkNLP Version: 3.0.1
SparkNLP-JSL Version: 3.0.0
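
Before going further, it can help to confirm that the uploaded file contains the keys the rest of the notebook reads. The sketch below is a minimal check, assuming only the three keys this notebook references (`PUBLIC_VERSION`, `JSL_VERSION`, `SECRET`). The helper name `missing_license_keys` is hypothetical and not part of any Spark NLP API.

```python
# Keys this notebook reads from the license file
REQUIRED_KEYS = ("PUBLIC_VERSION", "JSL_VERSION", "SECRET")

def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]

# Example with placeholder values (not real credentials)
example_keys = {"PUBLIC_VERSION": "3.0.1", "JSL_VERSION": "3.0.0"}
print(missing_license_keys(example_keys))
```

An empty list means all three keys are present. In the notebook you would pass the `license_keys` dict loaded above.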


Install dependencies

In [2]:
%%capture
for k,v in license_keys.items(): 
    %set_env $k=$v

!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/jsl_colab_setup.sh
!bash jsl_colab_setup.sh

# Install Spark NLP Display for visualization
!pip install --ignore-installed spark-nlp-display
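
The `%set_env` line is an IPython magic, so it only works inside a notebook. Outside one, a rough equivalent is to write the same key/value pairs into `os.environ`. This is a minimal sketch, assuming every value in the license file should be exported as a string. `export_license_keys` is a hypothetical helper name.

```python
import os

def export_license_keys(keys, environ=os.environ):
    """Copy each license key into the environment mapping as a string."""
    for name, value in keys.items():
        environ[name] = str(value)
    return environ

# Demonstrate on a plain dict so the real environment is left untouched
demo_env = export_license_keys({"JSL_VERSION": "3.0.0"}, environ={})
print(demo_env)
```

Calling `export_license_keys(license_keys)` without the `environ` argument would set the variables for the current process.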

Import dependencies into Python

In [3]:
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

import sparknlp
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
import sparknlp_jsl


Start the Spark session

In [4]:
spark = sparknlp_jsl.start(license_keys['SECRET'])

# manually configure the session
# params = {"spark.driver.memory" : "16G",
#           "spark.kryoserializer.buffer.max" : "2000M",
#           "spark.driver.maxResultSize" : "2000M"}

# spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

## 2. Select the Entity Resolver model and construct the pipeline

Select the models:

**RxNorm Entity Resolver models:**

1.   **chunkresolve_rxnorm_cd_clinical**
2.   **chunkresolve_rxnorm_sbd_clinical**
3.   **chunkresolve_rxnorm_scd_clinical**






For more details: https://github.com/JohnSnowLabs/spark-nlp-models#pretrained-models---spark-nlp-for-healthcare

In [5]:
# Important: you can change the NER models and whitelisted entities to suit your own requirements.

# Maps each entity resolver to the NER model that feeds it
ner_er_dict = {'chunkresolve_rxnorm_scd_clinical': 'ner_posology',
               'chunkresolve_rxnorm_cd_clinical': 'ner_posology',
               'chunkresolve_rxnorm_sbd_clinical': 'ner_clinical'}
# Entities to whitelist, so the resolver only runs on the required entity types
wl_er_dict = {'chunkresolve_rxnorm_scd_clinical': ['DRUG'],
              'chunkresolve_rxnorm_cd_clinical': ['DRUG'],
              'chunkresolve_rxnorm_sbd_clinical': ['TREATMENT']}
# Change this to the model you want to use and re-run the cells below.
model = 'chunkresolve_rxnorm_sbd_clinical'
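
Because the resolver name is used as a key into both dictionaries, a typo only surfaces later as a `KeyError` while the pipeline is being built. The sketch below shows one way to look up both settings together and fail early. The dictionaries are copied from the cell above, and `resolve_config` is a hypothetical helper, not a Spark NLP function.

```python
# Mappings copied from the cell above
ner_er_dict = {'chunkresolve_rxnorm_scd_clinical': 'ner_posology',
               'chunkresolve_rxnorm_cd_clinical': 'ner_posology',
               'chunkresolve_rxnorm_sbd_clinical': 'ner_clinical'}
wl_er_dict = {'chunkresolve_rxnorm_scd_clinical': ['DRUG'],
              'chunkresolve_rxnorm_cd_clinical': ['DRUG'],
              'chunkresolve_rxnorm_sbd_clinical': ['TREATMENT']}

def resolve_config(model):
    """Return (ner_model_name, whitelist) for a resolver, raising on an unknown name."""
    if model not in ner_er_dict:
        raise ValueError(f"Unknown resolver {model!r}; choose one of {sorted(ner_er_dict)}")
    return ner_er_dict[model], wl_er_dict[model]

print(resolve_config('chunkresolve_rxnorm_sbd_clinical'))
```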

Create the pipeline

In [6]:
document_assembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentences')

tokenizer = Tokenizer()\
    .setInputCols(['sentences']) \
    .setOutputCol('tokens')

embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models')\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained(ner_er_dict[model], "en", "clinical/models") \
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")   

ner_chunker = NerConverter()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(wl_er_dict[model])

chunk_embeddings = ChunkEmbeddings()\
    .setInputCols("ner_chunk", "embeddings")\
    .setOutputCol("chunk_embeddings")

entity_resolver = ChunkEntityResolverModel.pretrained(model, "en", "clinical/models")\
    .setInputCols("tokens", "chunk_embeddings")\
    .setOutputCol("resolution")
    
pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_chunker,
    chunk_embeddings,
    entity_resolver])

empty_df = spark.createDataFrame([['']]).toDF("text")
pipeline_model = pipeline.fit(empty_df)

light_pipeline = sparknlp.base.LightPipeline(pipeline_model)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
Approximate size to download 13.9 MB
[OK!]
chunkresolve_rxnorm_sbd_clinical download started this may take some time.
Approximate size to download 17.9 MB
[OK!]
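
Each stage in this pipeline reads columns produced by earlier stages, so a misspelled column name breaks the chain. The sketch below checks that wiring in plain Python, without Spark. The stage names and column names are transcribed from the pipeline cell above, and `unmet_inputs` is a hypothetical helper written for illustration.

```python
# (stage name, input columns, output column), transcribed from the pipeline above
stages = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentences"),
    ("tokenizer", ["sentences"], "tokens"),
    ("embeddings", ["sentences", "tokens"], "embeddings"),
    ("ner_model", ["sentences", "tokens", "embeddings"], "ner_tags"),
    ("ner_chunker", ["sentences", "tokens", "ner_tags"], "ner_chunk"),
    ("chunk_embeddings", ["ner_chunk", "embeddings"], "chunk_embeddings"),
    ("entity_resolver", ["tokens", "chunk_embeddings"], "resolution"),
]

def unmet_inputs(stages, initial_columns=("text",)):
    """List (stage, column) pairs whose input column is not produced upstream."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        problems += [(name, col) for col in inputs if col not in available]
        available.add(output)
    return problems

print(unmet_inputs(stages))
```

An empty list means every input column is available by the time its stage runs.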


## 3. Create example inputs

In [7]:
# Enter examples as strings in this list
input_list = [
"""The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. He has been advised Aspirin 81 milligrams QDay. Humulin N. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually PRN chest pain.""",
]

## 4. Run the pipeline

In [8]:
df = spark.createDataFrame(pd.DataFrame({"text": input_list}))
result = pipeline_model.transform(df)
light_result = light_pipeline.fullAnnotate(input_list[0])
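
`fullAnnotate` returns one dictionary per input text, keyed by output column, with a list of annotation objects under each key. The sketch below pairs each chunk with its resolution in plain Python. It assumes the `ner_chunk` and `resolution` lists line up one-to-one and that the metadata keys are `entity` and `resolved_text`, as used in the Visualize section. `FakeAnnotation` is a stand-in for illustration only, and the sample values are copied from the first row of the output table in that section.

```python
from collections import namedtuple

# Stand-in exposing only the annotation attributes used here (illustrative)
FakeAnnotation = namedtuple("FakeAnnotation", "result begin end metadata")

def resolution_rows(annotated):
    """Pair each NER chunk with its resolution, assuming the two lists align."""
    rows = []
    for chunk, res in zip(annotated["ner_chunk"], annotated["resolution"]):
        rows.append({
            "chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "entity": chunk.metadata.get("entity"),
            "RxNorm_code": res.result,
            "RxNorm_description": res.metadata.get("resolved_text"),
        })
    return rows

# Hand-built sample mirroring the first row of the notebook's output table
sample = {
    "ner_chunk": [FakeAnnotation("Aspirin", 306, 312, {"entity": "TREATMENT"})],
    "resolution": [FakeAnnotation("260848", 306, 312,
                                  {"resolved_text": "Aspirin 325 MG Oral Tablet [Buffex]"})],
}
rows = resolution_rows(sample)
print(rows[0]["chunk"], rows[0]["RxNorm_code"])
```

In the notebook, `resolution_rows(light_result[0])` would be the corresponding call, and the resulting list can be passed to `pd.DataFrame` for display.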

## 5. Visualize

In [23]:
result.select(
    F.explode(
        F.arrays_zip('ner_chunk.result', 
                     'ner_chunk.begin',
                     'ner_chunk.end',
                     'ner_chunk.metadata',
                     'resolution.metadata', 'resolution.result')
    ).alias('cols')
).select(
    F.expr("cols['0']").alias('chunk'),
    F.expr("cols['1']").alias('begin'),
    F.expr("cols['2']").alias('end'),
    F.expr("cols['3']['entity']").alias('entity'),
    F.expr("cols['4']['resolved_text']").alias('RxNorm_description'),
    F.expr("cols['5']").alias('RxNorm_code'),
).show(truncate=False)

+-------------+-----+---+---------+----------------------------------------------------+-----------+
|chunk        |begin|end|entity   |RxNorm_description                                  |RxNorm_code|
+-------------+-----+---+---------+----------------------------------------------------+-----------+
|Aspirin      |306  |312|TREATMENT|Aspirin 325 MG Oral Tablet [Buffex]                 |260848     |
|Humulin N    |334  |342|TREATMENT|insulin isophane 100 UNT/ML Pen Injector [Novolin N]|2206101    |
|insulin      |345  |351|TREATMENT|Oxytocin 10 UNT/ML Injection [Pitocin]              |1791725    |
|HCTZ         |370  |373|TREATMENT|Hydrochlorothiazide 50 MG Oral Tablet [Loqua]       |207946     |
|Nitroglycerin|387  |399|TREATMENT|Nitroglycerin 0.3 MG Sublingual Tablet [GTN]        |104430     |
+-------------+-----+---+---------+----------------------------------------------------+-----------+
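
The `begin` and `end` columns appear to be character offsets with an inclusive end: in every row above, `end - begin + 1` equals the length of the chunk. The sketch below checks that against the table and shows the matching slice. `span_text` is a hypothetical helper, and the short `snippet` string is an excerpt of the example input used only for the demonstration.

```python
# (chunk, begin, end) triples copied from the table above
table_rows = [
    ("Aspirin", 306, 312),
    ("Humulin N", 334, 342),
    ("insulin", 345, 351),
    ("HCTZ", 370, 373),
    ("Nitroglycerin", 387, 399),
]

def span_text(text, begin, end):
    """Slice `text` using an inclusive end offset."""
    return text[begin:end + 1]

# Every row is consistent with an inclusive end offset
lengths_match = all(end - begin + 1 == len(chunk) for chunk, begin, end in table_rows)
print(lengths_match)

# Recover a chunk from its offsets in a short excerpt
snippet = "advised Aspirin 81 milligrams"
begin = snippet.index("Aspirin")
end = begin + len("Aspirin") - 1
print(span_text(snippet, begin, end))
```

With Python slicing, which excludes the end index, this means adding 1 to `end` when cutting a chunk out of the original text.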



In [19]:
from sparknlp_display import EntityResolverVisualizer

vis = EntityResolverVisualizer()

## To set custom label colors:
vis.set_label_colors({'TREATMENT':'#800080', 'PROBLEM':'#77b5fe'})

vis.display(light_result[0], 'ner_chunk', 'resolution', 'document')