![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/42.Opioid_Models.ipynb)

# **Opioid Models**

This notebook covers the pretrained models that detect and label opioid-related entities in text. Opioids are a class of drugs that includes the illegal drug heroin, synthetic opioids such as fentanyl, and pain relievers available legally by prescription.

## Healthcare NLP for Data Scientists Course

If you are not familiar with the components in this notebook, see the [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP) for details on each component.

## **Setup**

In [None]:
import json
import os

from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.4.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [None]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp_jsl.pretrained import InternalResourceDownloader

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

spark

Spark NLP Version : 5.3.1
Spark NLP_JSL Version : 5.3.1


## **List of Pretrained Models**

In [None]:
df = pd.DataFrame()
for model_type in ['MedicalNerModel', 'AssertionDLModel']:
    model_list = sorted(list(set([model[0] for model in InternalResourceDownloader.returnPrivateModels(model_type) if 'opioid' in model[0]])))
    if len(model_list) > 0:
      if model_type == "MedicalNerModel":
        model_list = list(filter(lambda x: "wip" not in x, model_list))
      df = pd.concat([df, pd.DataFrame(model_list, columns = [model_type])], axis = 1)

df.fillna('')

|    | MedicalNerModel | AssertionDLModel |
|---:|:----------------|:-----------------|
| 0 | ner_opioid | assertion_opioid_drug_status_wip |
| 1 |  | assertion_opioid_general_symptoms_status_wip |
| 2 |  | assertion_opioid_wip |


## **NER Models**

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_opioid", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])

sample_texts = ["""The patient, unmarried and with a significant history of substance abuse involving the illicit consumption of various opioids such as heroin, fentanyl, and oxycodone, presented with a headache and was diagnosed PTSD. Despite denying the use of alcohol, smoking, or marijuana, the patient, who has been unemployed for several months, required administration of Narcan for suspected opioid overdose. A recent toxicology screen confirmed the presence of opioids, and showed negative results for benzodiazepines, cocaine, amphetamines, barbiturates, and tricyclic substances.""",
                """The patient presented with symptoms consistent with opioid withdrawal, including feelings of anxiety, tremors, and diarrhea. Vital signs were within normal limits, and supportive measures were initiated. The patient was closely monitored for potential complications and provided with appropriate pharmacological interventions to manage their symptoms.""",
               """The patient presented to the rehabilitation facility with a documented history of opioid abuse, primarily stemming from misuse of prescription percocet pills intended for their partner's use. Initial assessment revealed withdrawal symptoms consistent with opioid dependency, including agitation, diaphoresis, and myalgias.""",
               """The patient presented to the emergency department following an overdose on cocaine. On examination, the patient displayed signs of sympathetic nervous system stimulation, including tachycardia, hypertension, dilated pupils, and agitation.""",
               """The patient, known for a history of substance abuse, was brought to the hospital in a highly agitated and aggressive state, consistent with potential cocaine use. Initial assessment revealed signs of sympathetic overstimulation, including tachycardia, hypertension, and profuse sweating."""]


data = spark.createDataFrame(sample_texts, StringType()).toDF("text")

result = pipeline.fit(data).transform(data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_opioid download started this may take some time.
[OK!]


In [None]:
ner_opioid_labels = sorted(list(set([label.split('-')[-1] for label in ner_model.getClasses() if label != 'O'])))

len(ner_opioid_labels)

22
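The count above depends on the tags returned by `getClasses()` carrying a `B-`/`I-` prefix, which is what the `split('-')` call in the cell assumes. A minimal sketch of that step on a hand-written tag list (the tags below are illustrative, not the model's actual class list, and `entity_labels` is a hypothetical helper):

```python
# Illustrative IOB-style tags; the real list comes from ner_model.getClasses().
sample_tags = ["O", "B-opioid_drug", "I-opioid_drug", "B-test", "I-test", "B-test_result"]

def entity_labels(tags):
    """Drop the 'O' tag and the B-/I- prefix, then deduplicate and sort."""
    return sorted({tag.split("-")[-1] for tag in tags if tag != "O"})

labels = entity_labels(sample_tags)
print(labels)
```

Because the entity names use underscores rather than hyphens, taking the last piece after splitting on `-` leaves each name intact.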

In [None]:
label_df = pd.DataFrame()
for column in range((len(ner_opioid_labels)//10)+1):
  label_df = pd.concat([label_df, pd.DataFrame(ner_opioid_labels, columns = [''])[column*10:(column+1)*10].reset_index(drop= True)], axis = 1)

label_df.fillna('')

|    |                      |                        |             |
|---:|:---------------------|:-----------------------|:------------|
| 0 | alcohol_use | general_symptoms | test_result |
| 1 | antidote | legal_issue | violence |
| 2 | communicable_disease | marital_status |  |
| 3 | drug_duration | opioid_drug |  |
| 4 | drug_form | other_disease |  |
| 5 | drug_frequency | other_drug |  |
| 6 | drug_quantity | psychiatric_issue |  |
| 7 | drug_route | sexual_orientation |  |
| 8 | drug_strength | substance_use_disorder |  |
| 9 | employment | test |  |


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label")) \
      .filter("ner_label != 'O'") \
      .show(1000, truncate=False)

+--------------------------------------+-----+---+----------------------+
|chunk                                 |begin|end|ner_label             |
+--------------------------------------+-----+---+----------------------+
|unmarried                             |13   |21 |marital_status        |
|substance abuse                       |57   |71 |substance_use_disorder|
|opioids                               |118  |124|opioid_drug           |
|heroin                                |134  |139|opioid_drug           |
|fentanyl                              |142  |149|opioid_drug           |
|oxycodone                             |156  |164|opioid_drug           |
|headache                              |184  |191|general_symptoms      |
|PTSD                                  |211  |214|psychiatric_issue     |
|alcohol                               |244  |250|alcohol_use           |
|marijuana                             |265  |273|other_drug            |
|unemployed                           

In [None]:
results = result.collect()

In [None]:
results[0]['ner_chunk']

[Row(annotatorType='chunk', begin=13, end=21, result='unmarried', metadata={'sentence': '0', 'chunk': '0', 'ner_source': 'ner_chunk', 'entity': 'marital_status', 'confidence': '0.9939'}, embeddings=[]),
 Row(annotatorType='chunk', begin=57, end=71, result='substance abuse', metadata={'sentence': '0', 'chunk': '1', 'ner_source': 'ner_chunk', 'entity': 'substance_use_disorder', 'confidence': '0.92155004'}, embeddings=[]),
 Row(annotatorType='chunk', begin=118, end=124, result='opioids', metadata={'sentence': '0', 'chunk': '2', 'ner_source': 'ner_chunk', 'entity': 'opioid_drug', 'confidence': '0.9996'}, embeddings=[]),
 Row(annotatorType='chunk', begin=134, end=139, result='heroin', metadata={'sentence': '0', 'chunk': '3', 'ner_source': 'ner_chunk', 'entity': 'opioid_drug', 'confidence': '0.9999'}, embeddings=[]),
 Row(annotatorType='chunk', begin=142, end=149, result='fentanyl', metadata={'sentence': '0', 'chunk': '4', 'ner_source': 'ner_chunk', 'entity': 'opioid_drug', 'confidence': '1.
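Each element of `results[0]['ner_chunk']` exposes `begin`, `end`, `result`, and `metadata`, as the output above shows. A small sketch of flattening such annotations into plain dictionaries follows. `chunk_records` is a hypothetical helper, and `SimpleNamespace` objects stand in for the Spark `Row` objects so the snippet runs without a Spark session; the values are copied from the output above.

```python
from types import SimpleNamespace

def chunk_records(chunks):
    """Flatten chunk annotations into one dict per chunk."""
    return [
        {
            "chunk": c.result,
            "begin": c.begin,
            "end": c.end,
            "entity": c.metadata["entity"],
            # Confidence is stored as a string in the metadata map.
            "confidence": float(c.metadata["confidence"]),
        }
        for c in chunks
    ]

# Stand-ins for the first two rows printed above (assumption: same attribute names).
demo_chunks = [
    SimpleNamespace(begin=13, end=21, result="unmarried",
                    metadata={"entity": "marital_status", "confidence": "0.9939"}),
    SimpleNamespace(begin=57, end=71, result="substance abuse",
                    metadata={"entity": "substance_use_disorder", "confidence": "0.92155004"}),
]

records = chunk_records(demo_chunks)
for record in records:
    print(record)
```

With real results, `chunk_records(results[0]['ner_chunk'])` should behave the same way, and `pd.DataFrame(records)` would turn the list into a table.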

In [None]:
from sparknlp_display import NerVisualizer
visualiser = NerVisualizer()

for i in range(len(results)):
    visualiser.display(results[i], label_col='ner_chunk')

## **Assertion Models**

<div align="center">

|    | model_name | Predicted Entities |
|---:|:-----------|:-------------------|
| 1 | [assertion_opioid_wip](https://nlp.johnsnowlabs.com/2024/02/28/assertion_opioid_wip_en.html) | `present`, `history`, `absent`, `hypothetical`, `past`, `family_or_someoneelse` |
| 2 | [assertion_opioid_drug_status_wip](https://nlp.johnsnowlabs.com/2024/02/28/assertion_opioid_drug_status_wip_en.html) | `opioid_medical_use`, `opioid_abuse`, `opioid_overdose`, `drug_medical_use`, `drug_abuse`, `drug_overdose` |
| 3 | [assertion_opioid_general_symptoms_status_wip](https://nlp.johnsnowlabs.com/2024/02/28/assertion_opioid_general_symptoms_status_wip_en.html) | `underlying_pain`, `withdrawal_symptom`, `overdose_symptom` |


</div>
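The table above can also be kept as a lookup structure, for example to find which model predicts a given status label. The sketch below is illustrative only: `ASSERTION_MODELS` and `models_predicting` are hypothetical names, and the label sets are copied from the table.

```python
# Model name -> predicted labels, transcribed from the table above.
ASSERTION_MODELS = {
    "assertion_opioid_wip": {
        "present", "history", "absent", "hypothetical", "past", "family_or_someoneelse",
    },
    "assertion_opioid_drug_status_wip": {
        "opioid_medical_use", "opioid_abuse", "opioid_overdose",
        "drug_medical_use", "drug_abuse", "drug_overdose",
    },
    "assertion_opioid_general_symptoms_status_wip": {
        "underlying_pain", "withdrawal_symptom", "overdose_symptom",
    },
}

def models_predicting(label):
    """Return the names of the models whose label set contains `label`."""
    return sorted(name for name, labels in ASSERTION_MODELS.items() if label in labels)

print(models_predicting("withdrawal_symptom"))
print(models_predicting("opioid_abuse"))
```

A name returned by this lookup could then be passed to `AssertionDLModel.pretrained(name, "en", "clinical/models")` in place of `assertion_opioid_wip` in the pipeline that follows.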

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_opioid", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

assertion = AssertionDLModel.pretrained("assertion_opioid_wip", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter,
    assertion
    ])

sample_texts = [ """The patient with a history of substance abuse presented with clinical signs indicative of opioid overdose, including constricted pupils, cyanotic lips, drowsiness, and confusion. Immediate assessment and intervention were initiated to address the patient's symptoms and stabilize their condition. Close monitoring for potential complications, such as respiratory depression, was maintained throughout the course of treatment.""",
                """The patient presented to the rehabilitation facility with a documented history of opioid abuse, primarily stemming from misuse of prescription percocet pills intended for their partner's use. Initial assessment revealed withdrawal symptoms consistent with opioid dependency.""",
               """The patient was brought to the clinic exhibiting symptoms consistent with opioid withdrawal, despite denying any illicit drug use. Upon further questioning, the patient revealed using tramadol for chronic pain management."""]

data = spark.createDataFrame(sample_texts, StringType()).toDF("text")

result = pipeline.fit(data).transform(data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_opioid download started this may take some time.
[OK!]
assertion_opioid_wip download started this may take some time.
[OK!]


In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata,
                                     result.assertion.result,
                                     result.assertion.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label"),
              F.expr("cols['4']").alias("assertion"),
              F.expr("cols['5']['confidence']").alias("confidence") ).show(truncate=False)

+----------------------+-----+---+----------------------+------------+----------+
|chunk                 |begin|end|ner_label             |assertion   |confidence|
+----------------------+-----+---+----------------------+------------+----------+
|substance abuse       |30   |44 |substance_use_disorder|history     |0.9644    |
|opioid                |90   |95 |opioid_drug           |hypothetical|0.7974    |
|overdose              |97   |104|other_disease         |hypothetical|0.9961    |
|constricted pupils    |117  |134|general_symptoms      |past        |0.732     |
|cyanotic lips         |137  |149|general_symptoms      |past        |0.8501    |
|drowsiness            |152  |161|general_symptoms      |past        |0.9469    |
|confusion             |168  |176|general_symptoms      |past        |0.9686    |
|respiratory depression|351  |372|other_disease         |hypothetical|0.5921    |
|opioid                |82   |87 |opioid_drug           |history     |0.735     |
|percocet       
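Downstream code often needs only the chunks asserted as actually occurring. A minimal sketch of such a filter, assuming the table above has been collected into `(chunk, ner_label, assertion, confidence)` tuples: the 0.8 threshold and the decision to exclude `hypothetical` are arbitrary choices for illustration, `keep_asserted` is a hypothetical helper, and the rows are copied from the output above.

```python
# Rows transcribed from the assertion output above.
rows = [
    ("substance abuse", "substance_use_disorder", "history", 0.9644),
    ("opioid", "opioid_drug", "hypothetical", 0.7974),
    ("overdose", "other_disease", "hypothetical", 0.9961),
    ("constricted pupils", "general_symptoms", "past", 0.732),
    ("cyanotic lips", "general_symptoms", "past", 0.8501),
    ("drowsiness", "general_symptoms", "past", 0.9469),
    ("confusion", "general_symptoms", "past", 0.9686),
    ("respiratory depression", "other_disease", "hypothetical", 0.5921),
    ("opioid", "opioid_drug", "history", 0.735),
]

def keep_asserted(rows, excluded=("hypothetical",), min_confidence=0.8):
    """Keep rows whose assertion is not excluded and whose confidence meets the threshold."""
    return [r for r in rows if r[2] not in excluded and r[3] >= min_confidence]

kept = keep_asserted(rows)
for chunk, ner_label, assertion, confidence in kept:
    print(f"{chunk:<20} {ner_label:<24} {assertion:<8} {confidence}")
```

The same filter could be expressed directly on the Spark DataFrame with `.filter(...)`; the plain-Python version is shown only to keep the example independent of a Spark session.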

In [None]:
results = result.collect()

In [None]:
from sparknlp_display import AssertionVisualizer
visualiser = AssertionVisualizer()

for i in range(len(results)):
    visualiser.display(results[i], 'ner_chunk', 'assertion')