# Detecting Adverse Drug Events From Conversational Texts

Adverse Drug Events (ADEs) are potentially very dangerous to patients and are among the top causes of morbidity and mortality. Many ADEs are hard to discover because they affect only certain groups of people under certain conditions, and they may take a long time to surface. Healthcare providers conduct clinical trials to discover ADEs before the products are sold, but these trials are normally limited in size. Thus, post-market drug safety monitoring is required to help discover ADEs after the drugs are on the market.

Less than 5% of ADEs are reported via official channels; the vast majority are described in free-text channels: emails and phone calls to patient support centers, social media posts, sales conversations between clinicians and pharma sales reps, online patient forums, and so on. This requires pharmaceutical companies and drug safety groups to monitor and analyze unstructured medical text across a variety of jargons, formats, channels, and languages, with timeliness and scale requirements that call for automation.

Here we show how to use Spark NLP's existing models to process conversational text and extract highly specialized ADE and DRUG information that can be used for various downstream use cases, including:

- Conversational Texts ADE Classification
- Detecting ADE and Drug Entities From Texts
- Analysis of Drug and ADE Entities
- Finding the Most Frequently Mentioned Drugs and ADEs
- Detecting Most Common Drug-ADE Pairs
- Checking Assertion Status of ADEs
- Relations Between ADEs and Drugs

**Initial Configurations**

In [0]:
import json
import os

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel,Pipeline
from pyspark.sql import functions as F
from pyspark.sql.types import *

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
import sparknlp_jsl
import sparknlp

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_colwidth",100)

print('sparknlp.version : ',sparknlp.version())
print('sparknlp_jsl.version : ',sparknlp_jsl.version())

spark

## Download Dataset

We will use a slightly modified version of some conversational ADE texts downloaded from https://sites.google.com/site/adecorpus/home/document.

An article describing this dataset is available here: https://www.sciencedirect.com/science/article/pii/S1532046412000615

**We will work with two main files in the dataset:**

- DRUG-AE.rel : Conversations with ADE.
- ADE-NEG.txt : Conversations with no ADE.

Let's get started by downloading these files.

In [0]:
!wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/ADE_Corpus_V2/DRUG-AE.rel -P /dbfs/
!wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/ADE_Corpus_V2/ADE-NEG.txt -P /dbfs/

Now we will create dataframes named `neg_df` with ADE-Negative conversations and `pos_df` with ADE-Positive conversations.

In [0]:
neg_df = spark.read.csv("/ADE-NEG.txt").select("_c0")\
                                       .withColumn("hash", F.hash("_c0"))\
                                       .orderBy("hash")

neg_df = neg_df.withColumn('text', F.split(neg_df["_c0"], 'NEG').getItem(1))\
               .withColumn("is_ADE", F.lit(False))\
               .drop_duplicates(["text"])\
               .withColumn("id", F.monotonically_increasing_id()).select("id", "text", "is_ADE")
              
display(neg_df.limit(20))

id,text,is_ADE
0,Two patients had clots in the pump.,False
1,They no longer need red cell transfusions and have had a normal Hb concentration and normal ferrokinetics.,False
2,Both patients experienced the onset of psychiatric symptoms as young adults,False
3,Range orders for the delivery of IV opioids give nurses the flexibility needed to treat patients' pain in a timely manner while allowing for differences in patient response to pain and to analgesia.,False
4,The liver had zolpidem present at a concentration of 4.74 microg/g,False
5,Very rarely,False
6,Chest roentgenogram showed infiltration in the left lung field,False
7,The total amount ingested range from 14.3 to 99.3 mg/kg (mean,False
8,The patient was discharged on the 8th postoperative week in good conditions.,False
9,His immune system at this time was almost completely depleted.,False
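
Several rows in the preview above (for example `Very rarely` and `The total amount ingested range from 14.3 to 99.3 mg/kg (mean`) look cut off at a comma. That would be consistent with reading the file through a comma-delimited CSV reader and keeping only `_c0`. The plain-Python sketch below illustrates the effect and one possible alternative, assuming each line has the layout `<id> NEG <sentence>`; the sample line and the `parse_neg_line` helper are hypothetical, and real CSV readers also handle quoting, which is ignored here. In Spark, reading the file with `spark.read.text` (one `value` column per line) before splitting would be one way to apply the same idea.

```python
def parse_neg_line(line: str):
    """Return the sentence after the ' NEG ' marker, or None if the marker is absent."""
    _doc_id, marker, sentence = line.partition(" NEG ")
    return sentence.strip() if marker else None


# Hypothetical line in the assumed "<id> NEG <sentence>" layout.
line = "12345 NEG Very rarely, patients reported mild nausea."

# Splitting on commas first and keeping the first field loses the rest of the sentence.
first_csv_field = line.split(",")[0]
truncated = first_csv_field.split("NEG")[1].strip()

print(truncated)             # text cut at the first comma
print(parse_neg_line(line))  # full sentence
```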


In [0]:
neg_df.count()

In [0]:
pos_df = spark.read.csv("/DRUG-AE.rel", sep="|", header=None).select("_c1")\
                                                             .withColumn("hash", F.hash("_c1"))\
                                                             .withColumnRenamed("_c1","text")\
                                                             .withColumn("is_ADE", F.lit(True))\
                                                             .orderBy("hash")

pos_df = pos_df.drop_duplicates(["text"]).withColumn("id", F.monotonically_increasing_id()).select("id", "text", "is_ADE")
display(pos_df.limit(20))

id,text,is_ADE
0,The authors report 2 cases of renal damage associated with lithium carbonate treatment.,True
1,Hepatic angiosarcoma occurring after cyclophosphamide therapy: case report and review of the literature.,True
2,TREATMENT/OUTCOME: Standard anti-tuberculosis therapy was administered but was complicated by interaction with cyclosporine and drug-induced cholestasis.,True
3,"Thus, an immunological mechanism might be involved in the mechanism of pirmenol-induced QT prolongation and T wave inversion on the electrocardiogram.",True
4,Contact dermatitis due to budesonide: report of five cases and review of the Japanese literature.,True
5,We report a case of acute generalized exanthematous pustulosis (AGEP) in a 50-year-old woman that was attributed to the ingestion of nimesulide.,True
6,Spindle coma in benzodiazepine toxicity: case report.,True
7,Ceftriaxone was approved in 1997 for the treatment of otitis media despite previous studies that documented an association of ceftriaxone with elevated hepato-biliary enzymes and transient biliary stasis.,True
8,Fibrosis of corpus cavernosum after intracavernous injection of phentolamine/papaverine.,True
9,Presented is a case of acute renal failure induced by acetazolamide therapy for glaucoma.,True


In [0]:
pos_df.count()

We will store our dataframes as Delta tables.

In [0]:
delta_path='/FileStore/HLS/nlp/delta/jsl/'

neg_df.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/neg_df')
display(dbutils.fs.ls(f'{delta_path}/ADE/neg_df'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/neg_df/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/neg_df/part-00000-1fd31ed4-7b32-4f82-b9ad-be300c32dca4-c000.snappy.parquet,part-00000-1fd31ed4-7b32-4f82-b9ad-be300c32dca4-c000.snappy.parquet,996193


In [0]:
pos_df.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/pos_df')
display(dbutils.fs.ls(f'{delta_path}/ADE/pos_df'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/pos_df/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/pos_df/part-00000-07045875-b965-409d-8b68-9e3c9ef46d92-c000.snappy.parquet,part-00000-07045875-b965-409d-8b68-9e3c9ef46d92-c000.snappy.parquet,352896


# 1. Conversational ADE Classification

## 1.1 Use Case: Classifying Texts by Whether They Contain an ADE

Now we will try to predict whether a text contains an ADE by using `classifierdl_ade_conversational_biobert`. For this, we will create a new dataframe that merges all ADE-negative and ADE-positive texts and shuffle it.

In [0]:
sdf = neg_df.union(pos_df).drop("is_ADE").orderBy(F.rand(seed=42)).withColumn("id", F.monotonically_increasing_id())

display(sdf.limit(20))

id,text
0,CSF examination revealed an inflammatory profile.
1,"We report a case of generalized cutaneous sclerosis associated with muscle and oesophageal involvement in a patient exposed to herbicides containing bromocil, diuron and aminotriazole."
2,METHODS: This is a case of angiosarcoma developing 5 years after curative therapy for T3N0 squamous cell carcinoma of the supraglottic larynx.
3,"Two patients with ovarian cancer who had received multiple courses of cisplatin without complications experienced hypersensitivity reactions to cisplatin: one, involving intrahepatic artery infusion, manifested general erythema, dyspnea, and hypotension; the other, involving intravenous infusion, manifested abdominal pain, general erythema, and fever."
4,Evidence obtained indicated that the Reye-like syndrome might be caused by calcium hopantenate possibly due to the induction of pantothenic acid deficiency.
5,"The goal of this study is to describe three patients diagnosed with migraine and epilepsy (both under control) who evolved into status migrainosus after the introduction of oxcarbazepine (OXC), as part of a switch off from carbamazepine (CBZ)."
6,All 4 patients who were recruited completed the procedure successfully without significant difficulty.
7,Sensory neuropathy revealing necrotizing vasculitis during infliximab therapy for rheumatoid arthritis.
8,CONCLUSION: Copperhead bites typically result in mild to moderate envenomation due to local tissue effects.
9,A case of metoclopramide-induced oculogyric crisis in a 16-year-old girl with cystic fibrosis.


Store the dataframe in a delta table.

In [0]:
sdf = sdf.repartition(12)

In [0]:
sdf.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/sdf_shuffled')
display(dbutils.fs.ls(f'{delta_path}/ADE/sdf_shuffled'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00000-8fd8d833-a0a6-471a-821b-686302ab2eed-c000.snappy.parquet,part-00000-8fd8d833-a0a6-471a-821b-686302ab2eed-c000.snappy.parquet,112908
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00001-a0fb36c1-6149-447c-adde-b7315c6ea3b2-c000.snappy.parquet,part-00001-a0fb36c1-6149-447c-adde-b7315c6ea3b2-c000.snappy.parquet,116740
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00002-6be333c8-71a4-48f9-bca7-463d53429b04-c000.snappy.parquet,part-00002-6be333c8-71a4-48f9-bca7-463d53429b04-c000.snappy.parquet,115876
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00003-a08e94a3-75cf-4151-b998-b8f0bf2a0444-c000.snappy.parquet,part-00003-a08e94a3-75cf-4151-b998-b8f0bf2a0444-c000.snappy.parquet,115286
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00004-880b9859-e915-49b3-aadd-e3c7e20aa32c-c000.snappy.parquet,part-00004-880b9859-e915-49b3-aadd-e3c7e20aa32c-c000.snappy.parquet,115112
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00005-6e6c479d-40fb-4a7e-bd1a-dc08f0cc1792-c000.snappy.parquet,part-00005-6e6c479d-40fb-4a7e-bd1a-dc08f0cc1792-c000.snappy.parquet,116054
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00006-7c5c4183-f9d2-4e06-85a6-0f89018ff361-c000.snappy.parquet,part-00006-7c5c4183-f9d2-4e06-85a6-0f89018ff361-c000.snappy.parquet,116377
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00007-b3ef9ce1-0c73-403c-abe8-d00d65f7c3ee-c000.snappy.parquet,part-00007-b3ef9ce1-0c73-403c-abe8-d00d65f7c3ee-c000.snappy.parquet,115779
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/sdf_shuffled/part-00008-00dc4c15-9786-44c4-823d-527415c5d4df-c000.snappy.parquet,part-00008-00dc4c15-9786-44c4-823d-527415c5d4df-c000.snappy.parquet,115898


In [0]:
document_assembler = DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")

tokenizer = Tokenizer()\
        .setInputCols(['document'])\
        .setOutputCol('token')

embeddings = BertEmbeddings.pretrained('biobert_pubmed_base_cased')\
        .setInputCols(["document", 'token'])\
        .setOutputCol("embeddings")

sentence_embeddings = SentenceEmbeddings() \
        .setInputCols(["document", "embeddings"]) \
        .setOutputCol("sentence_embeddings") \
        .setPoolingStrategy("AVERAGE")

conv_classifier = ClassifierDLModel.pretrained('classifierdl_ade_conversational_biobert', 'en', 'clinical/models')\
        .setInputCols(['document', 'token', 'sentence_embeddings'])\
        .setOutputCol('conv_class')


clf_pipeline = Pipeline(stages=[
    document_assembler, 
    tokenizer, 
    embeddings, 
    sentence_embeddings, 
    conv_classifier])

empty_data = spark.createDataFrame([['']]).toDF("text")
clf_model = clf_pipeline.fit(empty_data)
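
The `SentenceEmbeddings` stage above uses the `AVERAGE` pooling strategy to turn per-token vectors into one vector per document. As a rough illustration of what average pooling means (not the library's implementation), the sketch below averages a few made-up two-dimensional token vectors; the `average_pool` helper and the numbers are hypothetical.

```python
def average_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


# Three toy token embeddings of dimension 2 (real BioBERT vectors are much larger).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = average_pool(tokens)
print(sentence_vector)
```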

In [0]:
result = clf_model.transform(sdf)

Let's get the `classifierdl_ade_conversational_biobert` model results in the `conv_cl_result` column.

In [0]:
res_df = result.select("id", "text", F.explode(F.arrays_zip("conv_class.result")).alias("cols"))\
               .select("id", "text", F.expr("cols['0']").alias("conv_cl_result"))

In [0]:
display(res_df.limit(20))

id,text,conv_cl_result
4869,Acne fulminans and 13-cis-retinoic acid.,False
9797,METHODS: Between April 2002 and October 2003,False
10247,Symptomatic cluster headache related to ocular pathologies have been rarely described.,False
1002,The first signs of joint manifestations started one year after HIV seroconversion and resolved when antiviral treatment with AZT was started.,False
9697,Whole blood histamine release studies were negative.,False
970,An unusual cause of burn injury: unsupervised use of drugs that contain psoralens.,False
13299,In our case no complications developed from the treatment and the patient's visual loss and renal function improved.,False
12291,A 12-year-old,False
20078,We explored ophthalmic and neurologic findings in two children who have been exposed prenatally to VGB.,True
6634,CASE HISTORY: A 58-year-old male Caucasian developed delayed onset diffuse lamellar keratitis,True


In [0]:
res_df.printSchema()

We will change the type of the labels in our results to `boolean`.

In [0]:
res_df = res_df.withColumn('conv_cl_result', F.col('conv_cl_result').cast('boolean'))
display(res_df.limit(20))

id,text,conv_cl_result
4869,Acne fulminans and 13-cis-retinoic acid.,False
9797,METHODS: Between April 2002 and October 2003,False
10247,Symptomatic cluster headache related to ocular pathologies have been rarely described.,False
1002,The first signs of joint manifestations started one year after HIV seroconversion and resolved when antiviral treatment with AZT was started.,False
9697,Whole blood histamine release studies were negative.,False
970,An unusual cause of burn injury: unsupervised use of drugs that contain psoralens.,False
13299,In our case no complications developed from the treatment and the patient's visual loss and renal function improved.,False
12291,A 12-year-old,False
20078,We explored ophthalmic and neurologic findings in two children who have been exposed prenatally to VGB.,True
6634,CASE HISTORY: A 58-year-old male Caucasian developed delayed onset diffuse lamellar keratitis,True


In [0]:
res_df.printSchema()

In [0]:
res_df.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/clf_res')
display(dbutils.fs.ls(f'{delta_path}/ADE/clf_res'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00000-bb713c77-9fd2-4f4d-87dd-3e8c278c5c4b-c000.snappy.parquet,part-00000-bb713c77-9fd2-4f4d-87dd-3e8c278c5c4b-c000.snappy.parquet,113340
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00001-60b1fdb6-e305-48a9-98c8-9a18cb2d41f9-c000.snappy.parquet,part-00001-60b1fdb6-e305-48a9-98c8-9a18cb2d41f9-c000.snappy.parquet,117172
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00002-8460dd13-1efe-4c8d-b55e-75e3ba710687-c000.snappy.parquet,part-00002-8460dd13-1efe-4c8d-b55e-75e3ba710687-c000.snappy.parquet,116308
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00003-2214677a-04e1-4dc3-92e1-f67e5bdaea22-c000.snappy.parquet,part-00003-2214677a-04e1-4dc3-92e1-f67e5bdaea22-c000.snappy.parquet,115718
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00004-cc1e28ec-ca14-4d2f-9000-43edeca46cd5-c000.snappy.parquet,part-00004-cc1e28ec-ca14-4d2f-9000-43edeca46cd5-c000.snappy.parquet,115544
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00005-917c173a-9b13-4598-8a37-1d514e287383-c000.snappy.parquet,part-00005-917c173a-9b13-4598-8a37-1d514e287383-c000.snappy.parquet,116486
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00006-a8e4cbbc-1b54-4df0-9330-7359003773a8-c000.snappy.parquet,part-00006-a8e4cbbc-1b54-4df0-9330-7359003773a8-c000.snappy.parquet,116809
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00007-d8f7033f-8685-4841-b009-c7e8e5686c8b-c000.snappy.parquet,part-00007-d8f7033f-8685-4841-b009-c7e8e5686c8b-c000.snappy.parquet,116211
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/clf_res/part-00008-441d152e-c9fc-48db-8992-c942caa7095a-c000.snappy.parquet,part-00008-441d152e-c9fc-48db-8992-c942caa7095a-c000.snappy.parquet,116330


**Let's see the number of predictions for each class.**

In [0]:
display(
  res_df.groupBy("conv_cl_result")
       .count()
       .orderBy(F.desc('count'))
       )

conv_cl_result,count
False,15486
True,4703
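
The counts above can be turned into class shares, and, if the original `is_ADE` labels were kept alongside the predictions, into precision and recall. The sketch below is a minimal plain-Python illustration: the share uses the counts reported above, while the `gold` and `predicted` lists are made-up toy values, since this notebook drops `is_ADE` before classification.

```python
def class_share(counts):
    """Convert per-class counts into fractions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def precision_recall(gold, predicted):
    """Precision and recall for the positive (True) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts taken from the prediction summary above.
shares = class_share({False: 15486, True: 4703})
print(f"Predicted ADE share: {shares[True]:.1%}")

# Hypothetical labels, only to show how the helper is called.
gold = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(precision_recall(gold, predicted))
```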


**Let's check some example sentences that the model predicted as `True` and `False` for ADE.**

In [0]:
display(res_df.filter(res_df["id"].isin([1,3,5,6,8,9,10])))

id,text,conv_cl_result
6,All 4 patients who were recruited completed the procedure successfully without significant difficulty.,False
3,"Two patients with ovarian cancer who had received multiple courses of cisplatin without complications experienced hypersensitivity reactions to cisplatin: one, involving intrahepatic artery infusion, manifested general erythema, dyspnea, and hypotension; the other, involving intravenous infusion, manifested abdominal pain, general erythema, and fever.",True
1,"We report a case of generalized cutaneous sclerosis associated with muscle and oesophageal involvement in a patient exposed to herbicides containing bromocil, diuron and aminotriazole.",True
9,A case of metoclopramide-induced oculogyric crisis in a 16-year-old girl with cystic fibrosis.,True
10,The primary tumor sites included the tongue,False
8,CONCLUSION: Copperhead bites typically result in mild to moderate envenomation due to local tissue effects.,False
5,"The goal of this study is to describe three patients diagnosed with migraine and epilepsy (both under control) who evolved into status migrainosus after the introduction of oxcarbazepine (OXC), as part of a switch off from carbamazepine (CBZ).",True


# 2. ADE-DRUG NER Examination

From now on, we will work with the `pos_df` dataframe.

In [0]:
df = spark.read.format('delta').load(f'{delta_path}/ADE/pos_df/').drop("is_ADE")
display(df.limit(20))

id,text
0,The authors report 2 cases of renal damage associated with lithium carbonate treatment.
1,Hepatic angiosarcoma occurring after cyclophosphamide therapy: case report and review of the literature.
2,TREATMENT/OUTCOME: Standard anti-tuberculosis therapy was administered but was complicated by interaction with cyclosporine and drug-induced cholestasis.
3,"Thus, an immunological mechanism might be involved in the mechanism of pirmenol-induced QT prolongation and T wave inversion on the electrocardiogram."
4,Contact dermatitis due to budesonide: report of five cases and review of the Japanese literature.
5,We report a case of acute generalized exanthematous pustulosis (AGEP) in a 50-year-old woman that was attributed to the ingestion of nimesulide.
6,Spindle coma in benzodiazepine toxicity: case report.
7,Ceftriaxone was approved in 1997 for the treatment of otitis media despite previous studies that documented an association of ceftriaxone with elevated hepato-biliary enzymes and transient biliary stasis.
8,Fibrosis of corpus cavernosum after intracavernous injection of phentolamine/papaverine.
9,Presented is a case of acute renal failure induced by acetazolamide therapy for glaucoma.


In [0]:
df.count()

In [0]:
df = df.repartition(12)

## 2.1. Use Case: Detecting ADE and Drug Entities From Texts

Now we will extract `ADE` and `DRUG` entities from the conversational texts by using a combination of `ner_ade_clinical` and `ner_posology` models.

In [0]:
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
  
ade_ner = MedicalNerModel.pretrained("ner_ade_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ade_ner")

ade_ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ade_ner"]) \
    .setOutputCol("ade_ner_chunk")

pos_ner = MedicalNerModel.pretrained("ner_posology", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("pos_ner")

pos_ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "pos_ner"]) \
    .setOutputCol("pos_ner_chunk")\
    .setWhiteList(["DRUG"])

chunk_merger = ChunkMergeApproach()\
    .setInputCols("ade_ner_chunk","pos_ner_chunk")\
    .setOutputCol("ner_chunk")


ner_pipeline = Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    word_embeddings,
    ade_ner,
    ade_ner_converter,
    pos_ner,
    pos_ner_converter,
    chunk_merger
    ])


empty_data = spark.createDataFrame([[""]]).toDF("text")
ade_ner_model = ner_pipeline.fit(empty_data)
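
The pipeline above filters the posology chunks with a whitelist and then merges them with the ADE chunks. The plain-Python sketch below mimics those two steps on character offsets. It assumes that overlapping chunks are resolved in favor of the longer one, which is an assumption about the merge behavior rather than something this notebook confirms; the `Chunk` type, the helpers, the sentence, and the `STRENGTH` chunk are all hypothetical.

```python
from typing import List, NamedTuple, Set


class Chunk(NamedTuple):
    begin: int   # index of the first character
    end: int     # index of the last character (inclusive)
    text: str
    entity: str


def make_chunk(sentence: str, phrase: str, entity: str) -> Chunk:
    """Build a chunk from the first occurrence of phrase in sentence."""
    begin = sentence.index(phrase)
    return Chunk(begin, begin + len(phrase) - 1, phrase, entity)


def keep_whitelisted(chunks: List[Chunk], allowed: Set[str]) -> List[Chunk]:
    """Keep only chunks whose entity label is in the whitelist."""
    return [c for c in chunks if c.entity in allowed]


def merge_chunks(first: List[Chunk], second: List[Chunk]) -> List[Chunk]:
    """Merge two chunk lists; on overlap the longer chunk wins (assumed rule)."""
    candidates = sorted(first + second, key=lambda c: (c.begin - c.end, c.begin))
    kept: List[Chunk] = []
    for cand in candidates:
        if all(cand.end < k.begin or cand.begin > k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.begin)


sentence = "Renal damage after 300 mg lithium carbonate."
ade_chunks = [make_chunk(sentence, "Renal damage", "ADE"),
              make_chunk(sentence, "lithium", "DRUG")]
pos_chunks = keep_whitelisted(
    [make_chunk(sentence, "300 mg", "STRENGTH"),
     make_chunk(sentence, "lithium carbonate", "DRUG")],
    {"DRUG"})

merged = merge_chunks(ade_chunks, pos_chunks)
print([(c.text, c.entity) for c in merged])
```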

In [0]:
result = ade_ner_model.transform(df)

**Show the `ADE` and `DRUG` phrases detected in the conversations.**

In [0]:
display(result.select('id', 'text','ner_chunk.result')\
              .toDF('id', 'text','ADE_phrases')\
              .filter(F.size('ADE_phrases')>0)\
              .orderBy("id").limit(20))

id,text,ADE_phrases
0,The authors report 2 cases of renal damage associated with lithium carbonate treatment.,"List(renal damage, lithium carbonate)"
1,Hepatic angiosarcoma occurring after cyclophosphamide therapy: case report and review of the literature.,"List(Hepatic angiosarcoma, cyclophosphamide)"
2,TREATMENT/OUTCOME: Standard anti-tuberculosis therapy was administered but was complicated by interaction with cyclosporine and drug-induced cholestasis.,"List(cyclosporine, cholestasis)"
3,"Thus, an immunological mechanism might be involved in the mechanism of pirmenol-induced QT prolongation and T wave inversion on the electrocardiogram.",List(pirmenol-induced)
4,Contact dermatitis due to budesonide: report of five cases and review of the Japanese literature.,"List(Contact dermatitis, budesonide)"
5,We report a case of acute generalized exanthematous pustulosis (AGEP) in a 50-year-old woman that was attributed to the ingestion of nimesulide.,"List(acute generalized exanthematous pustulosis, AGEP, nimesulide)"
6,Spindle coma in benzodiazepine toxicity: case report.,"List(Spindle coma, benzodiazepine)"
7,Ceftriaxone was approved in 1997 for the treatment of otitis media despite previous studies that documented an association of ceftriaxone with elevated hepato-biliary enzymes and transient biliary stasis.,"List(Ceftriaxone, ceftriaxone, elevated hepato-biliary enzymes, transient biliary stasis)"
8,Fibrosis of corpus cavernosum after intracavernous injection of phentolamine/papaverine.,"List(Fibrosis of corpus cavernosum, phentolamine/papaverine)"
9,Presented is a case of acute renal failure induced by acetazolamide therapy for glaucoma.,"List(acute renal failure, acetazolamide)"


**Show extracted chunks and their confidence levels**

In [0]:
result_sdf = result.select('id', F.explode(F.arrays_zip("ner_chunk.result","ner_chunk.metadata")).alias("cols"))\
                  .select('id', F.expr("cols['0']").alias("chunk"),
                                F.expr("cols['1']['entity']").alias("entity"),
                                F.expr("cols['1']['confidence']").alias("confidence")).orderBy("id")
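
The cell above pairs each chunk with its metadata via `F.arrays_zip` and then emits one row per pair via `F.explode`. The plain-Python sketch below reproduces that reshaping for a single record so the logic is easier to follow. The record mirrors the first row of the output shown in this section; the `explode_chunks` helper is hypothetical, and the cast to `float` is an addition (annotation metadata values are strings, and the original cell keeps them as such).

```python
def explode_chunks(record):
    """Emit one flat row per (chunk, metadata) pair of a single document."""
    rows = []
    for chunk, meta in zip(record["result"], record["metadata"]):
        rows.append({
            "id": record["id"],
            "chunk": chunk,
            "entity": meta["entity"],
            # metadata values are strings, so convert before numeric use
            "confidence": float(meta["confidence"]),
        })
    return rows


record = {
    "id": 0,
    "result": ["renal damage", "lithium carbonate"],
    "metadata": [
        {"entity": "ADE", "confidence": "0.93515"},
        {"entity": "DRUG", "confidence": "0.86545"},
    ],
}

for row in explode_chunks(record):
    print(row)
```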

In [0]:
display(result_sdf.limit(20))

id,chunk,entity,confidence
0,renal damage,ADE,0.93515
0,lithium carbonate,DRUG,0.86545
1,Hepatic angiosarcoma,ADE,0.9554
1,cyclophosphamide,DRUG,0.9952
2,cyclosporine,DRUG,0.9891
2,cholestasis,ADE,0.6868
3,pirmenol-induced,DRUG,0.928
4,budesonide,DRUG,0.9928
4,Contact dermatitis,ADE,0.90435004
5,acute generalized exanthematous pustulosis,ADE,0.709625


In [0]:
result_sdf.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/ner_result')
display(dbutils.fs.ls(f'{delta_path}/ADE/ner_result'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00000-287ff13f-675f-477d-b5bf-5b5121805a24-c000.snappy.parquet,part-00000-287ff13f-675f-477d-b5bf-5b5121805a24-c000.snappy.parquet,2711
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00001-2b832276-aef1-4a2d-8a80-12e4144a4a7a-c000.snappy.parquet,part-00001-2b832276-aef1-4a2d-8a80-12e4144a4a7a-c000.snappy.parquet,2655
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00002-89a14583-ec89-486b-b61c-f328982335fc-c000.snappy.parquet,part-00002-89a14583-ec89-486b-b61c-f328982335fc-c000.snappy.parquet,2700
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00003-58f89ded-6a3d-4ac4-9d0b-66fcf2e42fef-c000.snappy.parquet,part-00003-58f89ded-6a3d-4ac4-9d0b-66fcf2e42fef-c000.snappy.parquet,2602
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00004-a7c875e0-3f8a-4014-a927-372d5086a23f-c000.snappy.parquet,part-00004-a7c875e0-3f8a-4014-a927-372d5086a23f-c000.snappy.parquet,2802
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00005-27b0ac00-97a1-4a67-9383-f9b2b17b1bd4-c000.snappy.parquet,part-00005-27b0ac00-97a1-4a67-9383-f9b2b17b1bd4-c000.snappy.parquet,2688
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00006-a1121c11-2f2e-4eb2-ae2c-06b808daba52-c000.snappy.parquet,part-00006-a1121c11-2f2e-4eb2-ae2c-06b808daba52-c000.snappy.parquet,2487
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00007-218fe7af-1925-4add-a229-bf11117759f6-c000.snappy.parquet,part-00007-218fe7af-1925-4add-a229-bf11117759f6-c000.snappy.parquet,2750
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/ner_result/part-00008-23cdadc8-d332-47c6-8ff8-abad93e99bc0-c000.snappy.parquet,part-00008-23cdadc8-d332-47c6-8ff8-abad93e99bc0-c000.snappy.parquet,2717


**Highlight the extracted entities on the raw text by using `sparknlp_display` library for better visual understanding.**

In [0]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()

light_model = LightPipeline(ade_ner_model)

sample_text = result.filter(result["id"].isin([0,2,5,7,12,13])).select("text").collect()

# change color of an entity label (set once, before rendering)
visualiser.set_label_colors({'ADE':'#ff037d', 'DRUG':'#7EBF9B'})

for index, row in enumerate(sample_text):

    print("\n", "*"*50, f'Sample Text {index+1}', "*"*50, "\n")

    # pass the raw string, not the Row object, to the light pipeline
    light_result = light_model.fullAnnotate(row["text"])

    ner_vis = visualiser.display(light_result[0], label_col='ner_chunk', document_col='document', return_html=True)

    displayHTML(ner_vis)

## 2.2 Use Case: Analyse DRUG & ADE Entities - Find the Most Frequently Mentioned DRUGs and ADEs

**Let's start by reading our table as a pandas dataframe and creating `ADE` and `DRUG` dataframes.**

In [0]:
result_df = pd.read_parquet(f'/dbfs{delta_path}/ADE/ner_result')
result_df.head(20)

Unnamed: 0,id,chunk,entity,confidence
0,0,renal damage,ADE,0.93515
1,0,lithium carbonate,DRUG,0.86545
2,1,Hepatic angiosarcoma,ADE,0.9554
3,1,cyclophosphamide,DRUG,0.9952
4,2,cyclosporine,DRUG,0.9891
5,2,cholestasis,ADE,0.6868
6,3,pirmenol-induced,DRUG,0.928
7,4,Contact dermatitis,ADE,0.90435004
8,4,budesonide,DRUG,0.9928
9,5,acute generalized exanthematous pustulosis,ADE,0.709625


In [0]:
# copy the slice so that later column assignments do not act on a view of result_df
drug_df = result_df[result_df.entity == "DRUG"].copy()
drug_df

Unnamed: 0,id,chunk,entity,confidence
1,0,lithium carbonate,DRUG,0.86545
3,1,cyclophosphamide,DRUG,0.9952
4,2,cyclosporine,DRUG,0.9891
6,3,pirmenol-induced,DRUG,0.928
8,4,budesonide,DRUG,0.9928
...,...,...,...,...
11024,4267,Colchicine-induced,DRUG,0.9717
11026,4268,benztropine,DRUG,0.9834
11031,4269,melphalan,DRUG,0.996
11032,4269,busulfan,DRUG,0.9989


In [0]:
# copy the slice so that later column assignments do not act on a view of result_df
ade_df = result_df[result_df.entity == "ADE"].copy()
ade_df

Unnamed: 0,id,chunk,entity,confidence
0,0,renal damage,ADE,0.93515
2,1,Hepatic angiosarcoma,ADE,0.9554
5,2,cholestasis,ADE,0.6868
7,4,Contact dermatitis,ADE,0.90435004
9,5,acute generalized exanthematous pustulosis,ADE,0.709625
...,...,...,...,...
11027,4268,confusion,ADE,0.838
11028,4268,abdominal pain,ADE,0.95585
11029,4268,distention,ADE,0.9585
11030,4269,Additive pulmonary toxicity,ADE,0.66113335


**We convert the chunks in these dataframes to lowercase so that case variants are counted together, then check the most frequent `ADE` and `DRUG` entities.**

In [0]:
drug_df.chunk = drug_df.chunk.str.lower()
drug_df.chunk.value_counts().head(20)

In [0]:
ade_df.chunk = ade_df.chunk.str.lower()
ade_df.chunk.value_counts().head(20)
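
Lowercasing merges variants such as `Ceftriaxone` and `ceftriaxone`, but the NER output above also contains chunks like `pirmenol-induced` and `Colchicine-induced` tagged as `DRUG`. The plain-Python sketch below shows one possible extra normalization step that strips a trailing `-induced` before counting. This step is a suggestion, not part of the original notebook, and the `normalize_drug` helper and the short input list are hypothetical.

```python
from collections import Counter


def normalize_drug(chunk: str) -> str:
    """Lowercase a drug chunk and drop a trailing '-induced' suffix."""
    name = chunk.lower().strip()
    suffix = "-induced"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# A few chunk strings of the kind seen in the NER output above.
chunks = ["Ceftriaxone", "ceftriaxone", "pirmenol-induced",
          "Colchicine-induced", "colchicine"]

counts = Counter(normalize_drug(c) for c in chunks)
print(counts.most_common())
```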

**Let's show the most frequently mentioned `DRUG` and `ADE` entities on a bar plot.**

In [0]:
import plotly.express as px

data=drug_df.chunk.value_counts().head(30)
data_pdf=pd.DataFrame({"Count":data.values,'Drug':data.index})
fig = px.bar(data_pdf, y='Drug', x='Count',orientation='h',color='Count', 
             color_continuous_scale=px.colors.sequential.Bluered, width=1200, height=700) 

fig.update_layout(
    title={
        'text': "Most Common DRUG Entities",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center'
        },
    font=dict(size=15))

fig.show()

In [0]:
import plotly.express as px

data=ade_df.chunk.value_counts().head(30)
data_pdf=pd.DataFrame({"Count":data.values,'ADE':data.index})
fig = px.bar(data_pdf, y='ADE', x='Count',orientation='h',color='Count', 
             color_continuous_scale=px.colors.sequential.Bluered, width=1200, height=700) 

fig.update_layout(
    title={
        'text': "Most Common ADE Entities",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center'
        },
    font=dict(size=15))

fig.show()

# 3. Get Assertion Status of ADE & DRUG Entities
We will create a new pipeline that sets a whitelist in `NerConverter` to keep only the `ADE` entities that come from the `ner_ade_clinical` model. We will also add the `assertion_jsl` model to get their assertion status. We can reuse the annotators that are shared with the NER pipeline we created before.

In [0]:
ade_ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ade_ner"]) \
    .setOutputCol("ade_ner_chunk")\
    .setWhiteList(["ADE"])
 
 
assertion = AssertionDLModel.pretrained("assertion_jsl", "en", "clinical/models") \
    .setInputCols(["sentence", "ade_ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")
 
 
assertion_pipeline = Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    word_embeddings,
    ade_ner,
    ade_ner_converter,
    assertion
])
 
empty_data = spark.createDataFrame([[""]]).toDF("text")
assertion_model = assertion_pipeline.fit(empty_data)

In [0]:
as_result = assertion_model.transform(df)

**Now we will create a dataframe with the `ADE` chunks, their assertion status, and the confidence level of each extracted chunk.**

In [0]:
as_result_sdf = as_result.select('id', 'text', F.explode(F.arrays_zip("ade_ner_chunk.result","ade_ner_chunk.metadata", "assertion.result")).alias("cols"))\
                        .select('id', 'text', F.expr("cols['0']").alias("chunk"),
                                        F.expr("cols['1']['entity']").alias("entity"),
                                        F.expr("cols['2']").alias("assertion"),
                                        F.expr("cols['1']['confidence']").alias("confidence")).orderBy('id')                 
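
Once each `ADE` chunk carries an assertion status, a typical next step is to count the statuses and keep only confidently extracted events that are asserted as present. The plain-Python sketch below illustrates this on a few rows. The chunks and confidences come from the NER output earlier in this notebook, but the assertion labels (`Present`, `Possible`), the `0.8` threshold, and both helpers are hypothetical and only serve the illustration.

```python
from collections import Counter


def count_by_assertion(rows):
    """Count how many chunks fall under each assertion status."""
    return Counter(r["assertion"] for r in rows)


def select_ades(rows, accepted=("Present",), min_confidence=0.8):
    """Keep chunks with an accepted status and a high enough NER confidence."""
    return [r["chunk"] for r in rows
            if r["assertion"] in accepted
            and float(r["confidence"]) >= min_confidence]


# Assertion labels below are made up for illustration.
rows = [
    {"chunk": "renal damage", "assertion": "Present", "confidence": "0.93515"},
    {"chunk": "cholestasis", "assertion": "Possible", "confidence": "0.6868"},
    {"chunk": "Hepatic angiosarcoma", "assertion": "Present", "confidence": "0.9554"},
    {"chunk": "acute generalized exanthematous pustulosis",
     "assertion": "Present", "confidence": "0.709625"},
]

print(count_by_assertion(rows))
print(select_ades(rows))
```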

In [0]:
as_result_sdf.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/assertion_result')
display(dbutils.fs.ls(f'{delta_path}/ADE/assertion_result'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00000-848c40c9-baa1-41a6-8b87-16021f4b55e5-c000.snappy.parquet,part-00000-848c40c9-baa1-41a6-8b87-16021f4b55e5-c000.snappy.parquet,5440
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00001-aaa73bdc-bbb4-403f-ad0f-4eed80bbb725-c000.snappy.parquet,part-00001-aaa73bdc-bbb4-403f-ad0f-4eed80bbb725-c000.snappy.parquet,5856
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00002-5cce816e-9343-4e99-9de4-984c9545b6f0-c000.snappy.parquet,part-00002-5cce816e-9343-4e99-9de4-984c9545b6f0-c000.snappy.parquet,5600
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00003-a2a4817e-9084-4b70-b862-4e75be8b8d5e-c000.snappy.parquet,part-00003-a2a4817e-9084-4b70-b862-4e75be8b8d5e-c000.snappy.parquet,5205
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00004-2c01dd8a-ca8e-4799-8fcf-455ac30d3a16-c000.snappy.parquet,part-00004-2c01dd8a-ca8e-4799-8fcf-455ac30d3a16-c000.snappy.parquet,5633
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00005-e5aee803-80c0-4818-9ea7-9f59e6a60b5a-c000.snappy.parquet,part-00005-e5aee803-80c0-4818-9ea7-9f59e6a60b5a-c000.snappy.parquet,5231
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00006-33f33f6a-9ffe-45a3-8b58-7f381dcf618e-c000.snappy.parquet,part-00006-33f33f6a-9ffe-45a3-8b58-7f381dcf618e-c000.snappy.parquet,5146
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00007-21043219-7b6b-45df-b7f2-92896428847b-c000.snappy.parquet,part-00007-21043219-7b6b-45df-b7f2-92896428847b-c000.snappy.parquet,4956
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/assertion_result/part-00008-0538c342-0a5d-49c6-9478-9f2168c55c48-c000.snappy.parquet,part-00008-0538c342-0a5d-49c6-9478-9f2168c55c48-c000.snappy.parquet,4988


**Show the assertion status of the entities on the raw text.**

In [0]:
from sparknlp_display import AssertionVisualizer

assertion_vis = AssertionVisualizer()

as_light_model = LightPipeline(assertion_model)

sample_text = df.filter(df["id"].isin([ 0, 10, 12, 28, 839])).select(["text"]).collect()

for index, text in enumerate(sample_text):

    as_light_result = as_light_model.fullAnnotate(text)

    print("\n", "*"*50, f'Sample Text {index+1}', "*"*50, "\n")
    
    assertion_vis.set_label_colors({'ADE':'#113CB8'})

    assert_vis =     assertion_vis.display(as_light_result[0], 
                                            label_col = 'ade_ner_chunk', 
                                            assertion_col = 'assertion', 
                                            document_col = 'document',
                                            return_html=True
                                            )
    displayHTML(assert_vis)

**Let's plot the assertion status counts of the `ADE` entities.**

In [0]:
display(
  as_result_sdf
  .groupBy('assertion')
  .count()
  .orderBy(F.desc('count'))
)

assertion,count
Present,2895
Past,992
Hypothetical,931
Possible,400
Someoneelse,127
Absent,91
Planned,49
Family,45
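
The raw counts are easier to compare as shares of the total. The sketch below is plain Python and not part of the original notebook; it copies the counts from the table above and converts them to percentages.

```python
# Counts copied from the assertion status table above.
assertion_counts = {
    "Present": 2895,
    "Past": 992,
    "Hypothetical": 931,
    "Possible": 400,
    "Someoneelse": 127,
    "Absent": 91,
    "Planned": 49,
    "Family": 45,
}

total = sum(assertion_counts.values())
shares = {status: count / total for status, count in assertion_counts.items()}

# Print the statuses from most to least frequent.
for status, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{status:<13}{share:6.1%}")
```

With these numbers, `Absent` accounts for fewer than 2% of the detected `ADE` chunks, so filtering it out removes only a small part of the data.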


## 3.1 Use Case: Conversation Counts by DRUG & ADE Entities
**We will work with the `ADE` entities whose assertion status is not `Absent`.**

In [0]:
final_ade_df = as_result_sdf.filter("assertion != 'Absent'").toPandas()
final_ade_df.chunk = final_ade_df.chunk.str.lower()

final_ade_df.head(30)

Unnamed: 0,id,text,chunk,entity,assertion,confidence
0,0,The authors report 2 cases of renal damage associated with lithium carbonate treatment.,renal damage,ADE,Present,0.93515
1,1,Hepatic angiosarcoma occurring after cyclophosphamide therapy: case report and review of the lit...,hepatic angiosarcoma,ADE,Past,0.9554
2,2,TREATMENT/OUTCOME: Standard anti-tuberculosis therapy was administered but was complicated by in...,cholestasis,ADE,Past,0.6868
3,4,Contact dermatitis due to budesonide: report of five cases and review of the Japanese literature.,contact dermatitis,ADE,Someoneelse,0.90435004
4,5,We report a case of acute generalized exanthematous pustulosis (AGEP) in a 50-year-old woman tha...,acute generalized exanthematous pustulosis,ADE,Present,0.709625
5,5,We report a case of acute generalized exanthematous pustulosis (AGEP) in a 50-year-old woman tha...,agep,ADE,Present,0.9254
6,6,Spindle coma in benzodiazepine toxicity: case report.,spindle coma,ADE,Present,0.85940003
7,7,Ceftriaxone was approved in 1997 for the treatment of otitis media despite previous studies that...,elevated hepato-biliary enzymes,ADE,Past,0.76946664
8,7,Ceftriaxone was approved in 1997 for the treatment of otitis media despite previous studies that...,transient biliary stasis,ADE,Past,0.7843
9,8,Fibrosis of corpus cavernosum after intracavernous injection of phentolamine/papaverine.,fibrosis of corpus cavernosum,ADE,Present,0.8859


**We will find the most frequent `ADE` and `DRUG` entities, then plot the number of distinct conversations that contain each of them.**

In [0]:
most_common_ade = final_ade_df.chunk.value_counts().index[:20]
most_common_ade

In [0]:
import plotly.express as px

unique_ade = final_ade_df[final_ade_df.chunk.isin(most_common_ade)].rename(columns={"chunk":"ade"}).groupby(['id','ade']).count().reset_index()[['id', 'ade']]

data=unique_ade.ade.value_counts().head(20)
data_pdf=pd.DataFrame({"Count":data.values,'ADE':data.index})
fig = px.bar(data_pdf, y='ADE', x='Count',orientation='h',color='Count', 
             color_continuous_scale=px.colors.sequential.Bluyl, width=1200, height=700) 

fig.update_layout(
    title={
        'text': "Unique Conversation Counts by ADE",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center'
        },
    font=dict(size=15))

fig.show()
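
The chart counts distinct conversations, not raw mentions: an entity repeated within one conversation is counted once, which is what the `groupby(['id','ade'])` step achieves. A minimal plain-Python sketch of the difference, using made-up mentions, could look like this:

```python
from collections import Counter

# Hypothetical (conversation id, ADE chunk) mentions; id 5 repeats "fever".
mentions = [
    (1, "fever"),
    (5, "fever"),
    (5, "fever"),
    (5, "agranulocytosis"),
    (9, "agranulocytosis"),
    (9, "fever"),
]

# Raw mention counts: every occurrence is counted.
mention_counts = Counter(chunk for _, chunk in mentions)

# Distinct conversation counts: each (id, chunk) pair is counted once.
conversation_counts = Counter(chunk for _, chunk in set(mentions))

print(mention_counts["fever"], conversation_counts["fever"])
```

Counting per conversation keeps a single long text that repeats the same term from dominating the ranking.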

In [0]:
most_common_drug = drug_df.chunk.value_counts().index[:20]
most_common_drug

In [0]:
import plotly.express as px

unique_drug = drug_df[drug_df.chunk.isin(most_common_drug)].rename(columns={"chunk":"drug"}).groupby(['id','drug']).count().reset_index()[['id', 'drug']]

data=unique_drug.drug.value_counts().head(20)
data_pdf=pd.DataFrame({"Count":data.values,'DRUG':data.index})
fig = px.bar(data_pdf, y='DRUG', x='Count',orientation='h',color='Count', 
             color_continuous_scale=px.colors.sequential.Bluyl, width=1200, height=700) 

fig.update_layout(
    title={
        'text': "Unique Conversation Counts by DRUG",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center'
        },
    font=dict(size=15))

fig.show()

## 3.2 Use Case: Most Common ADE-DRUG Pairs 
**We can find the most common `ADE`-`DRUG` pairs that are mentioned in the same conversation.**

In [0]:
top_20_ade = unique_ade.groupby("ade").count().sort_values(by="id", ascending=False).iloc[:20].index
top_20_drug = unique_drug.groupby("drug").count().sort_values(by="id", ascending=False).iloc[:20].index

In [0]:
merged_df = pd.merge(unique_ade[unique_ade.ade.isin(top_20_ade)],
                     unique_drug[unique_drug.drug.isin(top_20_drug)],
                     on = "id").groupby(["ade", "drug"]).count().reset_index()

drug_ade_df = merged_df.pivot_table(index="ade", columns=["drug"], values="id", fill_value=0)
drug_ade_df

ade,5-fluorouracil,amiodarone,bleomycin,carbamazepine,chemotherapy,cisplatin,clozapine,cyclophosphamide,cyclosporine,ethambutol,infliximab,insulin,lithium,methotrexate,olanzapine,phenytoin,risperidone,rituximab
acute pancreatitis,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
acute renal failure,0,0,0,0,0,0,0,0,0,0,0,0,1,4,0,0,0,0
agranulocytosis,0,0,0,0,0,0,7,0,0,0,1,0,0,0,2,0,0,0
akathisia,0,0,0,0,0,0,1,0,0,0,0,0,1,0,2,0,1,0
anaphylaxis,0,0,0,0,0,2,0,0,1,0,0,0,0,3,0,0,0,0
fever,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,3,0,0
hepatitis,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,1,0,0
hepatotoxicity,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0
hypersensitivity,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0
leukopenia,0,0,0,2,0,0,0,0,0,0,0,0,0,0,2,0,0,0
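
The merge on `id` followed by a group-by is a co-occurrence count: a pair is counted once for every conversation that mentions both the ADE and the drug. The following plain-Python sketch shows the same idea on made-up data; the dictionaries and their contents are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical per-conversation entity sets (already deduplicated per id).
ades_by_id = {
    1: {"agranulocytosis"},
    2: {"agranulocytosis", "fever"},
    3: {"fever"},
}
drugs_by_id = {
    1: {"clozapine"},
    2: {"clozapine", "olanzapine"},
    4: {"lithium"},
}

pair_counts = Counter()
# Inner join on id: only conversations with at least one ADE and one drug.
for conv_id in ades_by_id.keys() & drugs_by_id.keys():
    pair_counts.update(product(ades_by_id[conv_id], drugs_by_id[conv_id]))

most_common_pair, n_conversations = pair_counts.most_common(1)[0]
print(most_common_pair, n_conversations)
```

Co-occurrence alone does not show that the drug caused the event. It only says both were mentioned in the same text, which is why Section 4 extracts explicit relations.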


In [0]:
import plotly.express as px

fig = px.imshow(drug_ade_df,labels=dict(x="DRUG", y="ADE", color='Occurrence'),y=list(drug_ade_df.index), 
                x=list(drug_ade_df.columns), color_continuous_scale=px.colors.sequential.Mint)

fig.update_layout(
    autosize=False,
    width=1000,
    height=1000,
    title={
        'text': "Number of Conversations",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center'
        },
    font=dict(size=15))

fig.show()

**As you can see in the results, Pneumonitis-Methotrexate is the most common `ADE-DRUG` pair.**

# 4. Analyze Relations Between ADE & DRUG Entities

## 4.1. Use Case: Extract Relations Between ADE and DRUG Entities

We can extract the relations between `ADE` and `DRUG` entities by using the `re_ade_clinical` model. We won't use the `SentenceDetector` annotator in this pipeline, so that relations between entities in different sentences can also be checked.

In [0]:
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
  
ade_ner = MedicalNerModel.pretrained("ner_ade_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos_tags")    

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos_tags", "token"])\
    .setOutputCol("dependencies")

reModel = RelationExtractionModel.pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunk", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(0)\
    .setRelationPairs(["ade-drug", "drug-ade"])


re_pipeline = Pipeline(stages=[
    documentAssembler, 
    tokenizer,
    word_embeddings,
    ade_ner,
    ner_converter,
    pos_tagger,
    dependency_parser,
    reModel
])


empty_data = spark.createDataFrame([[""]]).toDF("text")
re_model = re_pipeline.fit(empty_data)

In [0]:
re_result = re_model.transform(df)

**Now we can show our detected entities, their relations and confidence levels in a dataframe.**

In [0]:
rel_sdf = re_result.select('text', F.explode(F.arrays_zip('relations.result', 'relations.metadata')).alias("cols"))\
                  .select('text', F.expr("cols['0']").alias("relation"),
                                  F.expr("cols['1']['entity1']").alias("entity1"),
                                  F.expr("cols['1']['chunk1']").alias("chunk1"),
                                  F.expr("cols['1']['entity2']").alias("entity2"),
                                  F.expr("cols['1']['chunk2']").alias("chunk2"),
                                  F.expr("cols['1']['confidence']").alias("confidence"))

In [0]:
rel_sdf.write.format('delta').mode('overwrite').save(f'{delta_path}/ADE/relation_result')
display(dbutils.fs.ls(f'{delta_path}/ADE/relation_result'))

path,name,size
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/_delta_log/,_delta_log/,0
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00000-a108172d-bf7e-47c2-a006-ca767e4ad2bd-c000.snappy.parquet,part-00000-a108172d-bf7e-47c2-a006-ca767e4ad2bd-c000.snappy.parquet,41002
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00001-86f6c420-d8cd-48ab-92d9-ec830b2df54d-c000.snappy.parquet,part-00001-86f6c420-d8cd-48ab-92d9-ec830b2df54d-c000.snappy.parquet,41392
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00002-ef655af5-1c47-43e2-9810-a3e6703bc446-c000.snappy.parquet,part-00002-ef655af5-1c47-43e2-9810-a3e6703bc446-c000.snappy.parquet,42800
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00003-f8d93299-b2fa-4ead-80e8-48a0a4c0c89f-c000.snappy.parquet,part-00003-f8d93299-b2fa-4ead-80e8-48a0a4c0c89f-c000.snappy.parquet,40607
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00004-6902adc5-30ca-4b85-a66d-f9a0b154f9be-c000.snappy.parquet,part-00004-6902adc5-30ca-4b85-a66d-f9a0b154f9be-c000.snappy.parquet,42383
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00005-12c96067-09c5-4e72-b8e1-fd09cd5cc408-c000.snappy.parquet,part-00005-12c96067-09c5-4e72-b8e1-fd09cd5cc408-c000.snappy.parquet,39977
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00006-99c2b8a7-a92a-425d-9fa4-19a370e2a89e-c000.snappy.parquet,part-00006-99c2b8a7-a92a-425d-9fa4-19a370e2a89e-c000.snappy.parquet,41431
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00007-5a498934-a662-4e38-a50a-c29e7d4d677d-c000.snappy.parquet,part-00007-5a498934-a662-4e38-a50a-c29e7d4d677d-c000.snappy.parquet,42506
dbfs:/FileStore/HLS/nlp/delta/jsl/ADE/relation_result/part-00008-cdef6a63-bc85-4eda-8dcc-da80833974f1-c000.snappy.parquet,part-00008-cdef6a63-bc85-4eda-8dcc-da80833974f1-c000.snappy.parquet,44009


In [0]:
rel_df = pd.read_parquet(f'/dbfs{delta_path}/ADE/relation_result/')
rel_df.head(30)

Unnamed: 0,text,relation,entity1,chunk1,entity2,chunk2,confidence
0,We wish to call for cautious approach at time of cessation of prolonged ACTH therapy because of ...,1,DRUG,ACTH,ADE,hyperkalemia,1.0
1,Infections are a major adverse effect during the treatment with anti-TNF-alpha.,1,ADE,Infections,DRUG,anti-TNF-alpha,1.0
2,Severe Raynaud's phenomenon with yohimbine therapy for erectile dysfunction.,1,ADE,Severe Raynaud's phenomenon,DRUG,yohimbine,1.0
3,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,rapidly progressive glomerulonephritis,DRUG,D-penicillamine,1.0
4,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,epithelial crescents,DRUG,D-penicillamine,1.0
5,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,hemoptysis,DRUG,D-penicillamine,1.0
6,"In two patients with mycosis fungoides, a squamous cell carcinoma developed during therapy with ...",1,ADE,squamous cell carcinoma,DRUG,psoralens,1.0
7,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,Cholestatic liver disease,DRUG,clindamycin,1.0
8,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,Cholestatic liver disease,DRUG,trimethoprim-sulfamethoxazole,1.0
9,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,ductopenia,DRUG,clindamycin,1.0


**We will convert the chunks to lowercase to get more accurate results.**

In [0]:
rel_df.chunk1 = rel_df.chunk1.str.lower()
rel_df.chunk2 = rel_df.chunk2.str.lower()
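
Lowercasing matters because the same entity can appear with different casing across texts, and case-sensitive comparison treats those as different values. A small plain-Python sketch with made-up pairs illustrates the effect on deduplication:

```python
# Hypothetical (chunk1, chunk2) pairs with inconsistent casing.
pairs = [
    ("ACTH", "hyperkalemia"),
    ("acth", "Hyperkalemia"),
    ("Infections", "anti-TNF-alpha"),
]

# Without normalization, the first two pairs count as different.
unique_raw = set(pairs)

# After lowercasing both chunks, they collapse into one pair.
unique_normalized = {(a.lower(), b.lower()) for a, b in pairs}

print(len(unique_raw), len(unique_normalized))
```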

In [0]:
rel_df.drop_duplicates(["chunk1", "chunk2"]).head(20)

Unnamed: 0,text,relation,entity1,chunk1,entity2,chunk2,confidence
0,We wish to call for cautious approach at time of cessation of prolonged ACTH therapy because of ...,1,DRUG,acth,ADE,hyperkalemia,1.0
1,Infections are a major adverse effect during the treatment with anti-TNF-alpha.,1,ADE,infections,DRUG,anti-tnf-alpha,1.0
2,Severe Raynaud's phenomenon with yohimbine therapy for erectile dysfunction.,1,ADE,severe raynaud's phenomenon,DRUG,yohimbine,1.0
3,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,rapidly progressive glomerulonephritis,DRUG,d-penicillamine,1.0
4,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,epithelial crescents,DRUG,d-penicillamine,1.0
5,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,hemoptysis,DRUG,d-penicillamine,1.0
6,"In two patients with mycosis fungoides, a squamous cell carcinoma developed during therapy with ...",1,ADE,squamous cell carcinoma,DRUG,psoralens,1.0
7,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,cholestatic liver disease,DRUG,clindamycin,1.0
8,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,cholestatic liver disease,DRUG,trimethoprim-sulfamethoxazole,1.0
9,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,ductopenia,DRUG,clindamycin,1.0


**We can keep only the `ADE`-`DRUG` pairs that are in relation, i.e. rows where `relation` is 1.**

In [0]:
in_relation_df = rel_df[rel_df.relation.astype(int) == 1].drop_duplicates().reset_index(drop=True)
in_relation_df.head(20)

Unnamed: 0,text,relation,entity1,chunk1,entity2,chunk2,confidence
0,We wish to call for cautious approach at time of cessation of prolonged ACTH therapy because of ...,1,DRUG,acth,ADE,hyperkalemia,1.0
1,Infections are a major adverse effect during the treatment with anti-TNF-alpha.,1,ADE,infections,DRUG,anti-tnf-alpha,1.0
2,Severe Raynaud's phenomenon with yohimbine therapy for erectile dysfunction.,1,ADE,severe raynaud's phenomenon,DRUG,yohimbine,1.0
3,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,rapidly progressive glomerulonephritis,DRUG,d-penicillamine,1.0
4,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,epithelial crescents,DRUG,d-penicillamine,1.0
5,A 56-year-old woman with scleroderma developed rapidly progressive glomerulonephritis with epith...,1,ADE,hemoptysis,DRUG,d-penicillamine,1.0
6,"In two patients with mycosis fungoides, a squamous cell carcinoma developed during therapy with ...",1,ADE,squamous cell carcinoma,DRUG,psoralens,1.0
7,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,cholestatic liver disease,DRUG,clindamycin,1.0
8,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,cholestatic liver disease,DRUG,trimethoprim-sulfamethoxazole,1.0
9,Cholestatic liver disease with ductopenia (vanishing bile duct syndrome) after administration of...,1,ADE,ductopenia,DRUG,clindamycin,1.0
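
In the table above, the drug sometimes appears as `entity1` and sometimes as `entity2`. For downstream counting it can help to put every pair in a fixed (drug, ADE) order. The sketch below is one possible way to do that in plain Python. The helper `as_drug_ade` is hypothetical, and the sketch assumes `relation` arrives as a string, as the `astype(int)` cast in the notebook suggests.

```python
def as_drug_ade(record):
    """Return (drug, ade) regardless of entity order, or None if not such a pair."""
    by_entity = {
        record["entity1"]: record["chunk1"],
        record["entity2"]: record["chunk2"],
    }
    if set(by_entity) != {"DRUG", "ADE"}:
        return None
    return by_entity["DRUG"], by_entity["ADE"]


# Rows shaped like the relation table above (values from its first two rows).
rows = [
    {"relation": "1", "entity1": "DRUG", "chunk1": "acth",
     "entity2": "ADE", "chunk2": "hyperkalemia"},
    {"relation": "1", "entity1": "ADE", "chunk1": "infections",
     "entity2": "DRUG", "chunk2": "anti-tnf-alpha"},
]

# Keep only rows in relation, then orient each pair as (drug, ade).
drug_ade_pairs = [as_drug_ade(r) for r in rows if int(r["relation"]) == 1]
print(drug_ade_pairs)
```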


**Show the relations on the raw text by using the `sparknlp_display` library.**

In [0]:
from sparknlp_display import RelationExtractionVisualizer

re_light_model = LightPipeline(re_model)
re_vis = RelationExtractionVisualizer()

sample_text = df.filter(df["id"].isin([12, 34, 29, 4256, 1649])).select(["text"]).collect()

for index, text in enumerate(sample_text):

    print("\n", "*"*50, f'Sample Text {index+1}', "*"*50, "\n")
    
    re_light_result = re_light_model.fullAnnotate(text)

    relation_vis = re_vis.display(re_light_result[0],
                                  relation_col = 'relations',
                                  document_col = 'sentence',
                                  show_relations=True,
                                  return_html=True
                                   )
    
    displayHTML(relation_vis)

## License
Copyright / License info of the notebook. Copyright [2021] the Notebook Authors.  The source in this notebook is provided subject to the [Apache 2.0 License](https://spdx.org/licenses/Apache-2.0.html).  All included or referenced third party libraries are subject to the licenses set forth below.

|Library Name|Library License|Library License URL|Library Source URL|
| :-: | :-:| :-: | :-:|
|Pandas |BSD 3-Clause License| https://github.com/pandas-dev/pandas/blob/master/LICENSE | https://github.com/pandas-dev/pandas|
|Numpy |BSD 3-Clause License| https://github.com/numpy/numpy/blob/main/LICENSE.txt | https://github.com/numpy/numpy|
|Apache Spark |Apache License 2.0| https://github.com/apache/spark/blob/master/LICENSE | https://github.com/apache/spark/tree/master/python/pyspark|
|MatPlotLib | | https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE | https://github.com/matplotlib/matplotlib|
|Seaborn |BSD 3-Clause License | https://github.com/seaborn/seaborn/blob/master/LICENSE | https://github.com/seaborn/seaborn/|
|Plotly|MIT License|https://github.com/plotly/plotly.py/blob/master/LICENSE.txt|https://github.com/plotly/plotly.py|
|Spark NLP Display|Apache License 2.0|https://github.com/JohnSnowLabs/spark-nlp-display/blob/main/LICENSE|https://github.com/JohnSnowLabs/spark-nlp-display|
|Spark NLP |Apache License 2.0| https://github.com/JohnSnowLabs/spark-nlp/blob/master/LICENSE | https://github.com/JohnSnowLabs/spark-nlp|
|Spark NLP for Healthcare|[Proprietary license - John Snow Labs Inc.](https://www.johnsnowlabs.com/spark-nlp-health/) |NA|NA|


|Author|
|-|
|Databricks Inc.|
|John Snow Labs Inc.|

## Disclaimers
Databricks Inc. (“Databricks”) does not dispense medical, diagnosis, or treatment advice. This Solution Accelerator (“tool”) is for informational purposes only and may not be used as a substitute for professional medical advice, treatment, or diagnosis. This tool may not be used within Databricks to process Protected Health Information (“PHI”) as defined in the Health Insurance Portability and Accountability Act of 1996, unless you have executed with Databricks a contract that allows for processing PHI, an accompanying Business Associate Agreement (BAA), and are running this notebook within a HIPAA Account.  Please note that if you run this notebook within Azure Databricks, your contract with Microsoft applies.