![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# Adverse Drug Events Detection using Named Entity Recognition, Classification and Assertion Status Models

`ADE NER`: Extracts ADE and DRUG entities from clinical texts.

`ADE Classifier`: Classifies whether a sentence is ADE-related (`True`) or not (`False`).

We use several datasets to train these models:

- Twitter dataset, used in the paper "`Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts`" (https://pubmed.ncbi.nlm.nih.gov/28339747/)
- ADE-Corpus-V2, used in the paper "`An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text`" (https://arxiv.org/abs/1801.00625) and available online: https://sites.google.com/site/adecorpus/home/document.
- CADEC dataset, used in the paper "`Cadec: A corpus of adverse drug event annotations`" (https://pubmed.ncbi.nlm.nih.gov/25817970)

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
#nlp.install()

In [None]:
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
import pyspark.sql.types as T
import pyspark.sql as SQL

import os
import json
import string
import numpy as np
import pandas as pd

#from pyspark.ml import Pipeline, PipelineModel

pd.set_option('max_colwidth', 100)
pd.set_option('display.max_columns', 100)  
pd.set_option('display.expand_frame_repr', False)

spark



## ADE Classifier 

The ADE classifier predicts whether a sentence is ADE-related (`True`) or not (`False`).

`True` : The sentence is talking about a possible ADE.

`False` : The sentence doesn't contain any information about an ADE.


|index |model |Predicted Labels|
|-----:|:-----|:----------------:|
| 1| [classifierdl_ade_biobert](https://nlp.johnsnowlabs.com/2021/01/21/classifierdl_ade_biobert_en.html)   |True, False|
| 2| [classifierdl_ade_clinicalbert](https://nlp.johnsnowlabs.com/2021/01/21/classifierdl_ade_clinicalbert_en.html)  |True, False|
| 3| [classifierdl_ade_conversational_biobert](https://nlp.johnsnowlabs.com/2021/01/21/classifierdl_ade_conversational_biobert_en.html)  |True, False|
| 4| [bert_sequence_classifier_ade](https://nlp.johnsnowlabs.com/2022/02/08/bert_sequence_classifier_ade_en.html)  |True, False|
| 5| [bert_sequence_classifier_ade_augmented](https://nlp.johnsnowlabs.com/2022/07/27/bert_sequence_classifier_ade_augmented_en_3_0.html)   |ADE, noADE|
| 6| [distilbert_sequence_classifier_ade](https://nlp.johnsnowlabs.com/2022/02/08/distilbert_sequence_classifier_ade_en.html)  |True, False|
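
The models in the table above do not share one label vocabulary: most emit `True`/`False`, while `bert_sequence_classifier_ade_augmented` emits `ADE`/`noADE`. When comparing their predictions it can help to map both onto a boolean. The following is a minimal sketch; `normalize_ade_label` is a hypothetical helper, not part of Spark NLP.

```python
# Hypothetical helper: map the label sets listed in the table above to a bool.
_POSITIVE = {"true", "ade"}
_NEGATIVE = {"false", "noade"}

def normalize_ade_label(label: str) -> bool:
    """Return True for an ADE-related label, False for a non-ADE label."""
    key = label.strip().lower()
    if key in _POSITIVE:
        return True
    if key in _NEGATIVE:
        return False
    # Fail loudly on anything the table does not list
    raise ValueError(f"Unexpected ADE label: {label!r}")

print(normalize_ade_label("ADE"), normalize_ade_label("False"))
```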

### ADE Classifier with BioBert

In [None]:
# Annotator that transforms a text column from dataframe into an Annotation ready for NLP
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("sentence")

# Tokenizer splits words in a relevant format for NLP
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

bert_embeddings = nlp.BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

embeddingsSentence = nlp.SentenceEmbeddings() \
    .setInputCols(["sentence", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")\
    .setStorageRef('biobert_pubmed_base_cased')

classsifierdl = nlp.ClassifierDLModel.pretrained("classifierdl_ade_biobert", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

ade_clf_pipeline = nlp.Pipeline(
    stages=[documentAssembler, 
            tokenizer,
            bert_embeddings,
            embeddingsSentence,
            classsifierdl])


empty_data = spark.createDataFrame([[""]]).toDF("text")
ade_clf_model = ade_clf_pipeline.fit(empty_data)

ade_lp_pipeline = nlp.LightPipeline(ade_clf_model)

biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ | ][ / ][ — ][ \ ][ | ][ / ][ — ][ \ ][ | ][OK!]
classifierdl_ade_biobert download started this may take some time.
Approximate size to download 21.8 MB
[ | ][ / ][ — ][OK!]


In [None]:
text = """I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

print(ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].result)
print(ade_lp_pipeline.fullAnnotate(text)[0]["class"])

True
[Annotation(category, 0, 247, True, {'sentence': '0', 'False': '0.012636117', 'True': '0.9873639'}, [])]


In [None]:
text="I just took an Advil and have no gastric problems so far."

print(ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].result)
print(ade_lp_pipeline.fullAnnotate(text)[0]["class"])

False
[Annotation(category, 0, 56, False, {'sentence': '0', 'False': '0.9999769', 'True': '2.3126835E-5'}, [])]


As you can see, the sentence is classified as `False`: `gastric problems` appears in a negated context, and the classifier handled that correctly.

In [None]:
text="""Always tired, and possible blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. 
I had every test in the book done at the hospital, and they couldn't find anything. I was completley healthy! I am thinking it was from the voltaren. 
I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. 
I can now sleep all thru the night. I wont take this again. If I have the back pain, I will pop a tylonol instead."""

print(ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].result)
print(ade_lp_pipeline.fullAnnotate(text)[0]["class"])

True
[Annotation(category, 0, 567, True, {'sentence': '0', 'False': '0.0146723185', 'True': '0.9853277'}, [])]


In [None]:
ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].metadata["True"]

Out[10]: '0.9853277'
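
The class scores in `metadata` are stored as strings (for example `'0.9853277'` here and `'2.3126835E-5'` in the Advil example), so they need converting before any numeric comparison. A minimal sketch follows; `scores_from_metadata` and `top_label` are hypothetical helpers, and the code assumes that `sentence` is the only non-label key, which matches the outputs shown in this notebook.

```python
def scores_from_metadata(metadata: dict) -> dict:
    """Convert the string scores of a `class` annotation to floats."""
    # Assumption: 'sentence' is the only key that is not a class label.
    return {k: float(v) for k, v in metadata.items() if k != "sentence"}

def top_label(metadata: dict) -> tuple:
    """Return (label, score) for the highest-scoring class."""
    scores = scores_from_metadata(metadata)
    label = max(scores, key=scores.get)
    return label, scores[label]

# Metadata copied from the Advil example output
meta = {"sentence": "0", "False": "0.9999769", "True": "2.3126835E-5"}
print(top_label(meta))
```

With the real pipeline, the same helper could be applied to `ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].metadata`.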

In [None]:
texts = ["The patient was prescribed 1000 mg fish oil and multivitamins. She was discharged on zopiclone and ambrisentan.",
"I feel a bit drowsy & have a little blurred vision, after taking a pill.",
"I've been on Arthrotec 50 for over 10 years on and off, only taking it when I needed it.",
"Due to my arthritis getting progressively worse, to the point where I am in tears with the agony, gp's started me on 75 twice a day and I have to take it every day for the next month to see how I get on, here goes.",
"So far its been very good, pains almost gone, but I feel a bit weird, didn't have that when on 50."]

for text in texts:
    # Annotate once and reuse the result for both the label and its confidence
    annotation = ade_lp_pipeline.fullAnnotate(text)[0]["class"][0]
    cls = annotation.result
    confidence = annotation.metadata[cls]
    print(cls, "\t", confidence)


False 	 0.9999689
True 	 0.7971174
False 	 0.99999654
True 	 0.50409865
False 	 0.9763144
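
The fourth sentence is labelled `True` with a confidence of only about 0.50, so the label alone can be misleading. One option is to route low-confidence predictions to manual review. The sketch below illustrates the idea; the `triage` helper and the 0.8 threshold are illustrative assumptions, not values recommended for these models.

```python
def triage(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Return the label, or 'review' when the model is not confident enough."""
    return label if confidence >= threshold else "review"

# (label, confidence) pairs copied from the output above
predictions = [
    ("False", 0.9999689),
    ("True", 0.7971174),
    ("False", 0.99999654),
    ("True", 0.50409865),
    ("False", 0.9763144),
]

for label, conf in predictions:
    print(label, conf, "->", triage(label, conf))
```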


### ADE Classifier trained with conversational (short) sentences

This model is trained on short, conversational sentences related to ADE and is expected to perform better on short, everyday text.

In [None]:
conv_classsifierdl = nlp.ClassifierDLModel.pretrained("classifierdl_ade_conversational_biobert", "en", "clinical/models")\
            .setInputCols(["sentence_embeddings"]) \
            .setOutputCol("class")

conv_ade_clf_pipeline = nlp.Pipeline(
    stages=[documentAssembler, 
            tokenizer,
            bert_embeddings,
            embeddingsSentence,
            conv_classsifierdl])

empty_data = spark.createDataFrame([[""]]).toDF("text")
conv_ade_clf_model = conv_ade_clf_pipeline.fit(empty_data)

conv_ade_lp_pipeline = nlp.LightPipeline(conv_ade_clf_model)

classifierdl_ade_conversational_biobert download started this may take some time.
Approximate size to download 21.8 MB
[ | ][ / ][ — ][OK!]


In [None]:
text = "after taking a pill, he denies any pain"

conv_ade_lp_pipeline.annotate(text)['class'][0]

print(conv_ade_lp_pipeline.fullAnnotate(text)[0]["class"][0].result)
print(conv_ade_lp_pipeline.fullAnnotate(text)[0]["class"])

False
[Annotation(category, 0, 38, False, {'sentence': '0', 'False': '0.9569519', 'True': '0.04304803'}, [])]


### ADE Sequence Classifier

MedicalBertForSequenceClassification

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
    
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")
        
sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_ade_augmented", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")
    

ade_clf_pipeline = nlp.Pipeline(stages=[
                    document_assembler, 
                    tokenizer,
                    sequenceClassifier])


data = spark.createDataFrame([["So glad I am off effexor, so sad it ruined my teeth. tip Please be carefull taking antideppresiva and read about it 1st"],
                              ["Religare Capital Ranbaxy has been accepting approval for Diovan since 2012"]]).toDF("text")
              
result = ade_clf_pipeline.fit(data).transform(data)

result.select("text", "class.result").show(truncate=False)

bert_sequence_classifier_ade_augmented download started this may take some time.
[ | ][ / ][ — ][ \ ][ | ][ / ][ — ][ \ ][ | ][ / ][OK!]
+-----------------------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                                   |result |
+-----------------------------------------------------------------------------------------------------------------------+-------+
|So glad I am off effexor, so sad it ruined my teeth. tip Please be carefull taking antideppresiva and read about it 1st|[ADE]  |
|Religare Capital Ranbaxy has been accepting approval for Diovan since 2012                                             |[noADE]|
+-----------------------------------------------------------------------------------------------------------------------+-------+



MedicalDistilBertForSequenceClassification

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
    
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")
    
sequenceClassifier = medical.DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_ade", "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")
 
ade_clf_pipeline = nlp.Pipeline(stages=[
                    document_assembler, 
                    tokenizer,
                    sequenceClassifier])


data = spark.createDataFrame([["I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."],
                              ["Religare Capital Ranbaxy has been accepting approval for Diovan since 2012"]]).toDF("text")
              
result = ade_clf_pipeline.fit(data).transform(data)

result.select("text", "class.result").show(truncate=100)

distilbert_sequence_classifier_ade download started this may take some time.
[ | ][ / ][ — ][ \ ][ | ][ / ][ — ][ \ ][ | ][OK!]
+----------------------------------------------------------------------------------------------------+-------+
|                                                                                                text| result|
+----------------------------------------------------------------------------------------------------+-------+
|I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numb...| [True]|
|                          Religare Capital Ranbaxy has been accepting approval for Diovan since 2012|[False]|
+----------------------------------------------------------------------------------------------------+-------+



## ADE NER

Extracts `ADE` and `DRUG` entities from text.

|    | model_name                 |Predicted Entities|
|---:|:---------------------------|:----------------:|
|  1 | [ner_ade_clinical](https://nlp.johnsnowlabs.com/2021/04/01/ner_ade_clinical_en.html)      |ADE, DRUG|
|  2 | [ner_ade_biobert](https://nlp.johnsnowlabs.com/2021/04/01/ner_ade_biobert_en.html)        |ADE, DRUG|
|  3 | [ner_ade_healthcare](https://nlp.johnsnowlabs.com/2021/04/01/ner_ade_healthcare_en.html)  |ADE, DRUG|
|  4 | [ner_ade_clinicalbert](https://nlp.johnsnowlabs.com/2021/04/01/ner_ade_clinicalbert_en.html)   |ADE, DRUG|
|  5 | [bert_token_classifier_ner_ade](https://nlp.johnsnowlabs.com/2022/01/04/bert_token_classifier_ner_ade_en.html) |ADE, DRUG|
|  6 | [bert_token_classifier_ade_tweet_binary](https://nlp.johnsnowlabs.com/2022/07/29/bert_token_classifier_ade_tweet_binary_en_3_0.html) |ADE |

#### ADE NER with Word embeddings

In [None]:
documentAssembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
  .setInputCols(["document"])\
  .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
  .setInputCols(["sentence"])\
  .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
  .setInputCols(["sentence", "token"])\
  .setOutputCol("embeddings")

ade_ner = medical.NerModel.pretrained("ner_ade_clinical", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")

ner_converter = nlp.NerConverter() \
  .setInputCols(["sentence", "token", "ner"]) \
  .setOutputCol("ner_chunk")

ner_pipeline = nlp.Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    word_embeddings,
    ade_ner,
    ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ner_model = ner_pipeline.fit(empty_data)

ade_ner_lp = nlp.LightPipeline(ade_ner_model)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[ | ][OK!]
ner_ade_clinical download started this may take some time.
[ | ][OK!]


In [None]:
text = """I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

light_result = ade_ner_lp.fullAnnotate(text)

chunks = []
entities = []
begin =[]
end = []
confidence = []

for n in light_result[0]['ner_chunk']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])  

import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                    'begin': begin, 'end': end, "confidence":confidence})

df



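|    | chunks                      | entities | begin | end | confidence |
|---:|:----------------------------|:--------:|------:|----:|-----------:|
|  0 | allergic reaction           | ADE      | 10    | 26  | 0.68645    |
|  1 | vancomycin                  | DRUG     | 31    | 40  | 0.9988     |
|  2 | itchy skin                  | ADE      | 52    | 61  | 0.71365    |
|  3 | sore throat/burning/itching | ADE      | 64    | 90  | 0.81229997 |
|  4 | numbness of tongue          | ADE      | 93    | 110 | 0.6829667  |
|  5 | gums                        | ADE      | 116   | 119 | 0.7055     |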


As you can see, `vancomycin` is extracted as `DRUG` and the reported symptoms as `ADE`. Note that `numbness of tongue and gums` is split into two chunks, `numbness of tongue` and `gums`.
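
The loop that collects `begin`, `end`, `result` and `metadata` from `ner_chunk` is repeated for each NER pipeline in this notebook, so it could be factored into one helper. The following is a minimal sketch under stated assumptions: `chunks_to_rows` is a hypothetical helper, and `FakeAnnotation` is a stand-in that only mimics the attributes the loop reads, so the example runs without Spark NLP. It also assumes the `confidence` metadata value is a string, like the other metadata values shown in this notebook.

```python
from dataclasses import dataclass, field

@dataclass
class FakeAnnotation:
    """Stand-in exposing only the attributes the loop reads."""
    begin: int
    end: int
    result: str
    metadata: dict = field(default_factory=dict)

def chunks_to_rows(annotations):
    """Flatten chunk annotations into dicts suitable for pd.DataFrame(rows)."""
    return [
        {
            "chunks": a.result,
            "entities": a.metadata["entity"],
            "begin": a.begin,
            "end": a.end,
            # Assumption: confidence arrives as a string and is cast to float
            "confidence": float(a.metadata["confidence"]),
        }
        for a in annotations
    ]

sample = [
    FakeAnnotation(10, 26, "allergic reaction", {"entity": "ADE", "confidence": "0.68645"}),
    FakeAnnotation(31, 40, "vancomycin", {"entity": "DRUG", "confidence": "0.9988"}),
]
rows = chunks_to_rows(sample)
print(rows)
```

With a real pipeline the call would look like `pd.DataFrame(chunks_to_rows(light_result[0]['ner_chunk']))`.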

#### ADE NER with Bert embeddings

In [None]:
documentAssembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
  .setInputCols(["document"])\
  .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
  .setInputCols(["sentence"])\
  .setOutputCol("token")

bert_embeddings = nlp.BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
  .setInputCols(["sentence", "token"])\
  .setOutputCol("embeddings")
  
ade_ner_bert = medical.NerModel.pretrained("ner_ade_biobert", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")

ner_converter = nlp.NerConverter() \
  .setInputCols(["sentence", "token", "ner"]) \
  .setOutputCol("ner_chunk")

ner_pipeline = nlp.Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    bert_embeddings,
    ade_ner_bert,
    ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ner_model_bert = ner_pipeline.fit(empty_data)

ade_ner_lp_bert = nlp.LightPipeline(ade_ner_model_bert)

biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ | ][OK!]
ner_ade_biobert download started this may take some time.
[ | ][OK!]


In [None]:
text = """I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""

light_result = ade_ner_lp_bert.fullAnnotate(text)

chunks = []
entities = []
begin =[]
end = []
confidence = []

for n in light_result[0]['ner_chunk']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])  

import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                    'begin': begin, 'end': end, "confidence":confidence})

df

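|    | chunks                      | entities | begin | end | confidence |
|---:|:----------------------------|:--------:|------:|----:|-----------:|
|  0 | allergic reaction           | ADE      | 10    | 26  | 0.8071     |
|  1 | vancomycin                  | DRUG     | 31    | 40  | 0.9979     |
|  2 | itchy skin                  | ADE      | 52    | 61  | 0.94315004 |
|  3 | sore throat/burning/itching | ADE      | 64    | 90  | 0.90735    |
|  4 | numbness of tongue and gums | ADE      | 93    | 119 | 0.90212    |
|  5 | any other medication        | DRUG     | 227   | 246 | 0.75869995 |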
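
`begin` and `end` are character offsets into the input text, and the values in the output are consistent with `end` being inclusive: `allergic reaction` spans 10 to 26 and is 17 characters long. A chunk can therefore be recovered with `text[begin:end + 1]`. The sketch below checks this against the first sentence of the example; `chunk_text` is a hypothetical helper.

```python
# First sentence of the example text, with spans copied from the output above.
text = ("I have an allergic reaction to vancomycin so I have itchy skin, "
        "sore throat/burning/itching, numbness of tongue and gums.")

spans = [
    (10, 26, "allergic reaction"),
    (31, 40, "vancomycin"),
    (52, 61, "itchy skin"),
    (64, 90, "sore throat/burning/itching"),
    (93, 119, "numbness of tongue and gums"),
]

def chunk_text(text: str, begin: int, end: int) -> str:
    """Slice a chunk out of the text; `end` is treated as inclusive."""
    return text[begin:end + 1]

for begin, end, expected in spans:
    # Each slice should reproduce the chunk string reported by the model
    assert chunk_text(text, begin, end) == expected
```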


#### ADE NER for Tweets

In [None]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

tokenClassifier = medical.BertForTokenClassification.pretrained("bert_token_classifier_ade_tweet_binary", "en", "clinical/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("ner")\
    .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")


nlpPipeline =  nlp.Pipeline(stages=[
                      documentAssembler,
                      sentenceDetector,
                      tokenizer,
                      tokenClassifier,
                      ner_converter])

ade_ner_model_tweet= nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

ade_ner_lp_tweet = nlp.LightPipeline(ade_ner_model_tweet)

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[ | ][OK!]
bert_token_classifier_ade_tweet_binary download started this may take some time.
[ | ][OK!]


In [None]:
twitter_text = """I understand you very well. :( just got 1st urgh ! humira worked for me for just 3months then got painful reactions.
This vyvanse got me sweating right now and i dont even know why!,
Wonder which drug is doing this memory lapse thing. My guess the Duloxetine.
I used to be on paxil but that made me more depressed and prozac made me angry.
Maybe it's because of the effect of seroquel, but when I eat fast carbohydrates, I feel the sugar drop."""


In [None]:
light_result = ade_ner_lp_tweet.fullAnnotate(twitter_text)

chunks = []
entities = []
begin =[]
end = []
confidence = []

for n in light_result[0]['ner_chunk']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])  

import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                    'begin': begin, 'end': end, "confidence":confidence})

df

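|    | chunks            | entities | begin | end | confidence |
|---:|:------------------|:--------:|------:|----:|-----------:|
|  0 | painful reactions | ADE      | 98    | 114 | 0.96369493 |
|  1 | sweating          | ADE      | 137   | 144 | 0.99936104 |
|  2 | memory lapse      | ADE      | 215   | 226 | 0.98965096 |
|  3 | depressed         | ADE      | 304   | 312 | 0.99975455 |
|  4 | angry             | ADE      | 333   | 337 | 0.99960786 |
|  5 | sugar drop        | ADE      | 432   | 441 | 0.90284485 |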


## ADE NER with AssertionDL Model

In [None]:
assertion_ner_converter = nlp.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ass_ner_chunk")\
    .setWhiteList(['ADE'])

biobert_assertion = medical.AssertionDLModel.pretrained("assertion_dl_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "ass_ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

assertion_ner_pipeline = nlp.Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    bert_embeddings,
    ade_ner_bert,
    ner_converter,
    assertion_ner_converter,
    biobert_assertion])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ass_ner_model_bert = assertion_ner_pipeline.fit(empty_data)

ade_ass_ner_model_lp_bert = nlp.LightPipeline(ade_ass_ner_model_bert)

assertion_dl_biobert download started this may take some time.
[ | ][OK!]


In [None]:
import pandas as pd
text = "I feel a bit drowsy & have a little blurred vision, so far no gastric problems. I have been on Arthrotec 50 for over 10 years on and off, only taking it when I needed it. Due to my arthritis getting progressively worse, to the point where I am in tears with the agony, gp's started me on 75 twice a day and I have to take it every day for the next month to see how I get on, here goes. So far its been very good, pains almost gone, but I feel a bit weird, didn't have that when on 50."

print (text)

light_result = ade_ass_ner_model_lp_bert.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]
confidence=[]


for n,m in zip(light_result['ass_ner_chunk'],light_result['assertion']):
    
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    status.append(m.result)
    confidence.append(n.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, "confidence":confidence})

df

I feel a bit drowsy & have a little blurred vision, so far no gastric problems. I have been on Arthrotec 50 for over 10 years on and off, only taking it when I needed it. Due to my arthritis getting progressively worse, to the point where I am in tears with the agony, gp's started me on 75 twice a day and I have to take it every day for the next month to see how I get on, here goes. So far its been very good, pains almost gone, but I feel a bit weird, didn't have that when on 50.


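|    | chunks           | entities | assertion | confidence |
|---:|:-----------------|:--------:|:---------:|-----------:|
|  0 | drowsy           | ADE      | present   | 0.7226     |
|  1 | blurred vision   | ADE      | present   | 0.9296     |
|  2 | gastric problems | ADE      | absent    | 0.8494     |
|  3 | pains            | ADE      | present   | 0.8005     |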


Looks great! `gastric problems` is detected as an `ADE` entity, and the assertion model correctly marks it as `absent`.
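
In practice, the assertion status can be used to discard ADE mentions that the text negates. The following is a minimal sketch operating on rows copied from the output above; `present_ades` is a hypothetical helper, not part of Spark NLP.

```python
# Rows copied from the output above: (chunk, entity, assertion, confidence)
rows = [
    ("drowsy", "ADE", "present", 0.7226),
    ("blurred vision", "ADE", "present", 0.9296),
    ("gastric problems", "ADE", "absent", 0.8494),
    ("pains", "ADE", "present", 0.8005),
]

def present_ades(rows):
    """Keep only the ADE chunks whose assertion status is 'present'."""
    return [
        chunk
        for chunk, entity, assertion, _ in rows
        if entity == "ADE" and assertion == "present"
    ]

print(present_ades(rows))
```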

## ADE Relation Extraction Model

ADE Relation Extraction classifies whether an adverse event is caused by a drug. `1` means the adverse event and drug entities are related; `0` means they are not related.

|    | model_name                 |Predicted Labels|
|---:|:---------------------------|:----------------:|
|  1 | [re_ade_clinical](https://nlp.johnsnowlabs.com/2021/07/12/re_ade_clinical_en.html)    |0, 1|
|  2 | [re_ade_biobert](https://nlp.johnsnowlabs.com/2021/07/16/re_ade_biobert_en.html)      |0, 1|
|  3 | [redl_ade_biobert](https://nlp.johnsnowlabs.com/2021/07/12/redl_ade_biobert_en.html)  |0, 1|
|  4 | [re_ade_conversational](https://nlp.johnsnowlabs.com/2022/07/27/re_ade_cConversational_en_3_0.html)  |not_related, is_related|

In [None]:
# get relations in a pandas dataframe
import pandas as pd

def get_relations_df (results, rel_col='relations', chunk_col='ner_chunks'):
    rel_pairs=[]
    chunks = []

    for rel in results[rel_col]:
        rel_pairs.append((
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'], 
            rel.metadata['entity1'], 
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'], 
            rel.metadata['entity2'],
            rel.result, 
            rel.metadata['confidence'],
        ))

    for chunk in results[chunk_col]:
        chunks.append((
            chunk.metadata["sentence"],
            chunk.begin,
            chunk.end,
            chunk.result, 
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['entity1_begin', 'entity1_end', 'chunk1', 'entity1', 'entity2_begin', 'entity2_end', 'chunk2', 'entity2', 'relation', 'confidence'])

    chunks_df = pd.DataFrame(chunks, columns = ["sentence", "begin", "end", "chunk"])
    chunks_df.begin = chunks_df.begin.astype(str)
    chunks_df.end = chunks_df.end.astype(str)

    result_df = pd.merge(rel_df,chunks_df, left_on=["entity1_begin", "entity1_end", "chunk1"], right_on=["begin", "end", "chunk"])[["sentence"] + list(rel_df.columns)]


    return result_df



In [None]:
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

words_embedder = nlp.WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos_tags")

ner_tagger = medical.NerModel()\
    .pretrained("ner_ade_clinical", "en", "clinical/models")\
    .setInputCols("sentence", "token", "embeddings")\
    .setOutputCol("ner_tags")    

ner_chunker = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos_tags", "token"])\
    .setOutputCol("dependencies")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(20)\
    .setRelationPairs(["drug-ade", "ade-drug"])\
    .setRelationPairsCaseSensitive(False)\
    .setCustomLabels({"1": "is_related", "0": "not_related"})

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer, 
    words_embedder, 
    pos_tagger, 
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")
ade_model = pipeline.fit(empty_data)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[ | ][OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[ | ][ / ][OK!]
ner_ade_clinical download started this may take some time.
[ | ][OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[ | ][ / ][ — ][ \ ][ | ][ / ][ — ][ \ ][OK!]
re_ade_clinical download started this may take some time.
Approximate size to download 10.9 MB
[ | ][ / ][OK!]


In [None]:
text = ["""
Hypersensitivity to aspirin can be manifested as acute asthma, urticaria and/or angioedema, or a systemic anaphylactoid reaction.
A patient had undergone a renal transplantation as a result of malignant hypertension, and immunosuppressive therapy consisting of cyclosporin and prednisone ,  developed  sweating  and  thrombosis alone 5 years following the transplantation but there were not stomach pain.  
A 44-year-old man taking naproxen for chronic low back pain and a 20-year-old woman on oxaprozin for   rheumatoid arthritis  presented  with  tense bullae and cutaneous fragility on the face and the back of the hands.""",

"""We describe the side effects of 5-FU in a colon cancer patient who suffered severe mucositis,  prolonged myelosuppression, and neurologic toxicity that required admission to the intensive care unit who  has a healthy appetite.
The reported cases of in utero exposure to cyclosposphamide shared the following manifestations with our patient who suffered  growth deficiency, developmental delay, craniosynostosis, blepharophimosis, flat nasal bridge and abnormal ears.
I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.
I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor.
A 44-year-old man taking naproxen for chronic low back pain and a 20-year-old woman on oxaprozin for rheumatoid arthritis presented with tense bullae and cutaneous fragility on the face and the back of the hands.""" ]



In [None]:
light_results = nlp.LightPipeline(ade_model).fullAnnotate(text)



In [None]:
get_relations_df(light_results[0], 'relations')



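|    | sentence | entity1_begin | entity1_end | chunk1      | entity1 | entity2_begin | entity2_end | chunk2                                                    | entity2 | relation   | confidence |
|---:|---------:|--------------:|------------:|:------------|:-------:|--------------:|------------:|:----------------------------------------------------------|:-------:|:----------:|-----------:|
|  0 | 0 | 21  | 27  | aspirin     | DRUG | 50  | 61  | acute asthma                                              | ADE | is_related | 1.0        |
|  1 | 0 | 21  | 27  | aspirin     | DRUG | 64  | 72  | urticaria                                                 | ADE | is_related | 1.0        |
|  2 | 0 | 21  | 27  | aspirin     | DRUG | 81  | 90  | angioedema                                                | ADE | is_related | 1.0        |
|  3 | 0 | 21  | 27  | aspirin     | DRUG | 98  | 128 | systemic anaphylactoid reaction                           | ADE | is_related | 0.99994946 |
|  4 | 1 | 262 | 272 | cyclosporin | DRUG | 303 | 310 | sweating                                                  | ADE | is_related | 0.99999833 |
|  5 | 1 | 262 | 272 | cyclosporin | DRUG | 318 | 327 | thrombosis                                                | ADE | is_related | 1.0        |
|  6 | 1 | 278 | 287 | prednisone  | DRUG | 303 | 310 | sweating                                                  | ADE | is_related | 1.0        |
|  7 | 1 | 278 | 287 | prednisone  | DRUG | 318 | 327 | thrombosis                                                | ADE | is_related | 1.0        |
|  8 | 2 | 433 | 440 | naproxen    | DRUG | 550 | 561 | tense bullae                                              | ADE | is_related | 1.0        |
|  9 | 2 | 433 | 440 | naproxen    | DRUG | 567 | 623 | cutaneous fragility on the face and the back of the hands | ADE | is_related | 0.999218   |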
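
Each row of the output is one drug-ADE pair, so a drug with several reactions appears several times. Grouping by drug gives a more compact view. The sketch below uses a subset of the pairs copied from the output above; `ades_by_drug` is a hypothetical helper, not part of Spark NLP.

```python
from collections import defaultdict

# (drug, ade, relation) triples copied from part of the output above
relations = [
    ("aspirin", "acute asthma", "is_related"),
    ("aspirin", "urticaria", "is_related"),
    ("aspirin", "angioedema", "is_related"),
    ("cyclosporin", "sweating", "is_related"),
    ("cyclosporin", "thrombosis", "is_related"),
    ("prednisone", "sweating", "is_related"),
]

def ades_by_drug(relations):
    """Map each drug to the ADEs it is related to, keeping first-seen order."""
    grouped = defaultdict(list)
    for drug, ade, label in relations:
        # Skip pairs the model labelled as unrelated, and avoid duplicates
        if label == "is_related" and ade not in grouped[drug]:
            grouped[drug].append(ade)
    return dict(grouped)

print(ades_by_drug(relations))
```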


In [None]:
vis = nlp.viz.RelationExtractionVisualizer()
re_vis = vis.display(light_results[0], 'relations', show_relations=True,return_html=True) # default show_relations: True
displayHTML(re_vis)



## ADE models applied to Spark Dataframes

In [None]:
import pyspark.sql.functions as F

! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/sample_ADE_dataset.csv

Out[1]: True

In [None]:
dbutils.fs.cp("file:/databricks/driver/sample_ADE_dataset.csv", "dbfs:/")

Out[4]: True

In [None]:
ade_DF = spark.read\
                .option("header", "true")\
                .csv("/sample_ADE_dataset.csv")\
                .filter(F.col("label").isin(['True','False']))

ade_DF.show(truncate=50)

+--------------------------------------------------+-----+
|                                              text|label|
+--------------------------------------------------+-----+
|Do U know what Meds are R for bipolar depressio...|False|
|# hypercholesterol: Because of elevated CKs (pe...| True|
|Her weight, respirtory status and I/O should be...|False|
|* DM - Pt had several episodes of hypoglycemia ...| True|
|We report the case of a female acromegalic pati...| True|
|2 . Calcipotriene 0.005% Cream Sig: One (1) App...|False|
|Always tired, and possible blood clots. I was o...| True|
|A difference in chemical structure between thes...|False|
|10 . She was left on prednisone 20mg qd due to ...|False|
|The authors suggest that risperidone may increa...| True|
|- Per oral maxillofacial surgery there is no ev...|False|
|@marionjross Cipro is just as bad! Stay away fr...|False|
|A young woman with epilepsy had tonic-clonic se...| True|
|Intravenous methotrexate is an effective adjunc...|Fals

**With the BioBERT version of the NER model** (slower, but more accurate)

In [None]:
import pyspark.sql.functions as F

ner_converter = nlp.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")\
    .setWhiteList(['ADE'])

ner_pipeline = nlp.Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    bert_embeddings,
    ade_ner_bert,
    ner_converter])


empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ner_model = ner_pipeline.fit(empty_data)

result = ade_ner_model.transform(ade_DF)

sample_df = result.select('text','ner_chunk.result')\
                  .toDF('text','ADE_phrases')\
                  .filter(F.size('ADE_phrases')>0).toPandas()

In [None]:
import pandas as pd
pd.set_option('display.max_colwidth', 0)

In [None]:
sample_df.sample(20)

|index |text |ADE_phrases |
|-----:|:----|:-----------|
| 3| Always tired, and possible blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. I had every test in the book done at the hospital, and they couldnt find anything. I was completley healthy! I am thinking it was from the voltaren. I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. I can now sleep all thru the night. I wont take this again. If I have the back pain, I will pop a tylonol instead.| [tired, blood clots, stroke, blood clots that traveled to my eye, completley healthy]|
| 13| Hypokalemia after normal doses of neubulized albuterol (salbutamol).| [Hypokalemia]|
| 7| PURPOSE: To report new indocyanine green angiographic (ICGA) findings after intravitreal bevacizumab (IVB) for myopic choroidal neovascularization (mCNV).| [indocyanine green angiographic]|
| 2| We report the case of a female acromegalic patient in whom multiple hepatic adenomas appeared soon after danazol treatment for uterine fibromatosis.| [hepatic adenomas]|
| 21| 2 years with no problems, then toe neuropathy for two years now and other foot problems because of this I assume. I stopped Lipitor after taking it for 2 years and toe neuropathy started. I also had stomach problems and pain. After stopping Lipitor, I thought my heart would jump out of my chest, but now stomach and heart are both OK - Feet are terrible thanks to Lipitor. I know many others that have had problems with Lipitor - feet and legs - and had to stop. All my blood test were normal. I weigh 114 and 5'2 . Hike, play golf and keep busy. It is a terrible drug and should be off the market.| [toe neuropathy, foot problems, toe neuropathy, stomach problems, pain]|
| 1| * DM - Pt had several episodes of hypoglycemia on lantus due to decreasing oral intake.| [hypoglycemia]|
| 26| With itraconazole, hepatotoxic reactions have only very rarely been reported, and histologic data are lacking.| [hepatotoxic reactions]|
| 19| Therefore, parenteral amiodarone was implicated as the cause of acute hepatitis in this patient.| [acute hepatitis]|
| 28| Lipitor is the only prescription drug I take and have taken it for four years now. I have noticed I am gaining excess weight even though I am not eating more. My stomach is bloated, with excess gas (although could be the weight gain?). I also am experiencing increased memory loss, and fatigue, but then I rarely sleep well (due to night sweats) since I went through menopause ten years ago. I think some of my symptoms are related to being post menopausal rather than from the Lipitor.| [gaining excess weight, stomach is bloated, memory loss, fatigue, night sweats]|
| 9| # thrombocytopenia: Secondary to chemotherapy and MDS/AML concerns.| [thrombocytopenia]|


**Doing the same with the clinical embeddings version** (faster results)

In [None]:
import pyspark.sql.functions as F

ner_converter = nlp.NerConverter() \
  .setInputCols(["sentence", "token", "ner"]) \
  .setOutputCol("ner_chunk")\
  .setWhiteList(['ADE'])

ner_pipeline = nlp.Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    word_embeddings,
    ade_ner,
    ner_converter])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ner_model = ner_pipeline.fit(empty_data)

result = ade_ner_model.transform(ade_DF)

result.select('text','ner_chunk.result')\
.toDF('text','ADE_phrases').filter(F.size('ADE_phrases')>0)\
.show(truncate=70)


+----------------------------------------------------------------------+----------------------------------------------------------------------+
|                                                                  text|                                                           ADE_phrases|
+----------------------------------------------------------------------+----------------------------------------------------------------------+
|# hypercholesterol: Because of elevated CKs (peaked at 819) the pat...|                                                        [elevated CKs]|
|We report the case of a female acromegalic patient in whom multiple...|                                           [multiple hepatic adenomas]|
|Always tired, and possible blood clots. I was on Voltaren for about...|                      [blood clots that traveled to my eye, back pain]|
|The authors suggest that risperidone may increase affect in patient...|                                                     [increase a

### Creating sentence dataframe (one sentence per row) and getting ADE entities and categories

In [None]:
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector()\
      .setInputCols(["document"])\
      .setOutputCol("sentence")\
      .setExplodeSentences(True)

tokenizer = nlp.Tokenizer()\
      .setInputCols(["sentence"])\
      .setOutputCol("token")

bert_embeddings = nlp.BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("embeddings")

embeddingsSentence = nlp.SentenceEmbeddings() \
      .setInputCols(["sentence", "embeddings"]) \
      .setOutputCol("sentence_embeddings") \
      .setPoolingStrategy("AVERAGE")\
      .setStorageRef('biobert_pubmed_base_cased')

classsifierdl = nlp.ClassifierDLModel.pretrained("classifierdl_ade_biobert", "en", "clinical/models")\
      .setInputCols(["sentence_embeddings"]) \
      .setOutputCol("class")\
      .setStorageRef('biobert_pubmed_base_cased')

ade_ner = medical.NerModel.pretrained("ner_ade_biobert", "en", "clinical/models") \
      .setInputCols(["sentence", "token", "embeddings"]) \
      .setOutputCol("ner")
  
ner_converter = nlp.NerConverter() \
      .setInputCols(["sentence", "token", "ner"]) \
      .setOutputCol("ner_chunk")\
      .setWhiteList(['ADE'])

ner_clf_pipeline = nlp.Pipeline(
    stages=[documentAssembler, 
            sentenceDetector,
            tokenizer,
            bert_embeddings,
            embeddingsSentence,
            classsifierdl,
            ade_ner,
            ner_converter])

ade_Sentences = ner_clf_pipeline.fit(ade_DF)

biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ | ][OK!]
classifierdl_ade_biobert download started this may take some time.
Approximate size to download 21.8 MB
[ | ][OK!]
ner_ade_biobert download started this may take some time.
[ | ][OK!]


In [None]:
import pyspark.sql.functions as F

ade_Sentences.transform(ade_DF).select('sentence.result','ner_chunk.result','class.result')\
                               .toDF('sentence','ADE_phrases','is_ADE').show(truncate=60)

+------------------------------------------------------------+---------------------------------------------+-------+
|                                                    sentence|                                  ADE_phrases| is_ADE|
+------------------------------------------------------------+---------------------------------------------+-------+
|         [Do U know what Meds are R for bipolar depression?]|                                           []|[False]|
|         [Currently #FDA approved #quetiapine AKA #Seroquel]|                                           []|[False]|
|[# hypercholesterol: Because of elevated CKs (peaked at 8...|                               [elevated CKs]|[False]|
|[Her weight, respirtory status and I/O should be monitore...|                                           []|[False]|
|[* DM - Pt had several episodes of hypoglycemia on lantus...|                               [hypoglycemia]| [True]|
|[We report the case of a female acromegalic patient in wh...|  
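
In the output above, the NER model and the sentence classifier do not always agree: the `elevated CKs` sentence has an ADE chunk but is classified `False`. One way to surface such cases for review is to bucket sentences by which signal fired. This is a minimal sketch in plain Python, assuming the rows have been collected from the dataframe as `(sentence, ADE_phrases, is_ADE)` tuples; the function name `compare_signals` and the bucket names are illustrative only.

```python
# Rows copied (in truncated form, as displayed) from the output above:
# (sentence, ADE_phrases, is_ADE)
rows = [
    ("Do U know what Meds are R for bipolar depression?", [], "False"),
    ("# hypercholesterol: Because of elevated CKs (peaked at 8...", ["elevated CKs"], "False"),
    ("* DM - Pt had several episodes of hypoglycemia on lantus...", ["hypoglycemia"], "True"),
]


def compare_signals(rows):
    """Bucket sentences by whether NER, the classifier, both, or neither flagged an ADE."""
    buckets = {"both": [], "ner_only": [], "classifier_only": [], "neither": []}
    for sentence, ade_phrases, is_ade in rows:
        has_chunk = len(ade_phrases) > 0
        flagged = is_ade == "True"
        if has_chunk and flagged:
            key = "both"
        elif has_chunk:
            key = "ner_only"
        elif flagged:
            key = "classifier_only"
        else:
            key = "neither"
        buckets[key].append(sentence)
    return buckets


buckets = compare_signals(rows)
for name, sentences in buckets.items():
    print(name, len(sentences))
```

Sentences in the `ner_only` and `classifier_only` buckets are the ones where the two models disagree and may be worth a manual look.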

## Creating a pretrained pipeline with ADE NER, Assertion and Classifier

In [None]:
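# Annotator that transforms a text column from a dataframe into an Annotation ready for NLP.
# There is no SentenceDetector stage here: the whole text is treated as one sentence,
# so the output column is named "sentence" directly.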
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("sentence")

# Tokenizer splits words in a relevant format for NLP
tokenizer = nlp.Tokenizer()\
      .setInputCols(["sentence"])\
      .setOutputCol("token")

bert_embeddings = nlp.BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("embeddings")

ade_ner = medical.NerModel.pretrained("ner_ade_biobert", "en", "clinical/models") \
      .setInputCols(["sentence", "token", "embeddings"]) \
      .setOutputCol("ner")\
      .setStorageRef('biobert_pubmed_base_cased')

ner_converter = nlp.NerConverter() \
      .setInputCols(["sentence", "token", "ner"]) \
      .setOutputCol("ner_chunk")

assertion_ner_converter = nlp.NerConverter() \
      .setInputCols(["sentence", "token", "ner"]) \
      .setOutputCol("ass_ner_chunk")\
      .setWhiteList(['ADE'])

biobert_assertion = medical.AssertionDLModel.pretrained("assertion_dl_biobert", "en", "clinical/models") \
      .setInputCols(["sentence", "ass_ner_chunk", "embeddings"]) \
      .setOutputCol("assertion")

embeddingsSentence = nlp.SentenceEmbeddings() \
      .setInputCols(["sentence", "embeddings"]) \
      .setOutputCol("sentence_embeddings") \
      .setPoolingStrategy("AVERAGE")\
      .setStorageRef('biobert_pubmed_base_cased')

classsifierdl = nlp.ClassifierDLModel.pretrained("classifierdl_ade_conversational_biobert", "en", "clinical/models")\
      .setInputCols(["sentence_embeddings"]) \
      .setOutputCol("class")

ade_clf_pipeline = nlp.Pipeline(
    stages=[documentAssembler, 
            tokenizer,
            bert_embeddings,
            ade_ner,
            ner_converter,
            assertion_ner_converter,
            biobert_assertion,
            embeddingsSentence,
            classsifierdl])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_ner_clf_model = ade_clf_pipeline.fit(empty_data)

ade_ner_clf_pipeline = nlp.LightPipeline(ade_ner_clf_model)

biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ | ][OK!]
ner_ade_biobert download started this may take some time.
[ | ][OK!]
assertion_dl_biobert download started this may take some time.
[ | ][OK!]
classifierdl_ade_conversational_biobert download started this may take some time.
Approximate size to download 21.8 MB
[ | ][OK!]


In [None]:
classsifierdl.getStorageRef()

Out[23]: 'biobert_pubmed_base_cased'

In [None]:
text =  'I have always felt tired, but no blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. I had every test in the book done at the hospital, and they couldnt find anything. I was completley healthy! I am thinking it was from the voltaren. I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. I can now sleep all thru the night. I wont take this again. If I have the back pain, I will pop a tylonol instead.'

light_result = ade_ner_clf_pipeline.fullAnnotate(text)

print (light_result[0]['class'][0].metadata)

chunks = []
entities = []
begin =[]
end = []
confidence = []

for n in light_result[0]['ner_chunk']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    confidence.append(n.metadata['confidence']) 

import pandas as pd

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                    'begin': begin, 'end': end, "confidence":confidence})

df

{'sentence': '0', 'False': '0.028450416', 'True': '0.97154963'}


|index |chunks |entities |begin |end |confidence |
|-----:|:------|:--------|-----:|---:|----------:|
| 0| felt tired| ADE| 14| 23| 0.86925|
| 1| blood clots| ADE| 33| 43| 0.93635|
| 2| Voltaren| DRUG| 55| 62| 0.9977|
| 3| stroke| ADE| 116| 121| 0.831|
| 4| blood clots that traveled to my eye| ADE| 131| 165| 0.8084857|
| 5| voltaren| DRUG| 307| 314| 0.7912|
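
The classifier metadata printed above stores the per-label probabilities as strings, alongside a `sentence` index. A minimal sketch of turning that dictionary into numeric scores and picking the top label follows; it assumes every key other than `sentence` is a class label, and the helper name `class_probabilities` is hypothetical.

```python
def class_probabilities(metadata):
    """Convert classifier metadata into {label: probability} with float values.

    Assumption: every key except 'sentence' is a class label.
    """
    return {
        label: float(value)
        for label, value in metadata.items()
        if label != "sentence"
    }


# Metadata copied from the printed output above
metadata = {'sentence': '0', 'False': '0.028450416', 'True': '0.97154963'}

probs = class_probabilities(metadata)
label = max(probs, key=probs.get)  # label with the highest probability
print(label, probs[label])
```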


In [None]:
import pandas as pd

text = 'I have always felt tired, but no blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. I had every test in the book done at the hospital, and they couldnt find anything. I was completley healthy! I am thinking it was from the voltaren. I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. I can now sleep all thru the night. I wont take this again. If I have the back pain, I will pop a tylonol instead.'

print (text)

light_result = ade_ass_ner_model_lp_bert.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]
confidence = []

for n,m in zip(light_result['ass_ner_chunk'],light_result['assertion']):
    
    chunks.append(n.result)
    entities.append(n.metadata['entity']) 
    status.append(m.result)
    confidence.append(m.metadata['confidence']) 
        
df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, "confidence":confidence})

df

I have always felt tired, but no blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. I had every test in the book done at the hospital, and they couldnt find anything. I was completley healthy! I am thinking it was from the voltaren. I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. I can now sleep all thru the night. I wont take this again. If I have the back pain, I will pop a tylonol instead.


|index |chunks |entities |assertion |confidence |
|-----:|:------|:--------|:---------|----------:|
| 0| tired| ADE| present| 0.9998|
| 1| blood clots| ADE| absent| 1.0|
| 2| stroke| ADE| present| 1.0|
| 3| blood clots that traveled to my eye| ADE| present| 1.0|
| 4| completley healthy| ADE| present| 1.0|
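
The assertion status makes it possible to drop negated mentions such as "no blood clots" before reporting ADEs. A minimal sketch in plain Python, assuming the table rows are available as tuples; the helper name `asserted_ades` and the default `keep` statuses are illustrative choices rather than a Spark NLP API.

```python
# Rows copied from the assertion table above: (chunk, entity, assertion, confidence)
rows = [
    ("tired", "ADE", "present", 0.9998),
    ("blood clots", "ADE", "absent", 1.0),
    ("stroke", "ADE", "present", 1.0),
    ("blood clots that traveled to my eye", "ADE", "present", 1.0),
    ("completley healthy", "ADE", "present", 1.0),
]


def asserted_ades(rows, keep=("present",)):
    """Return ADE chunks whose assertion status is in `keep`."""
    return [
        chunk
        for chunk, entity, status, _confidence in rows
        if entity == "ADE" and status in keep
    ]


present_ades = asserted_ades(rows)
print(present_ades)
```

Note that this only filters on assertion status. A chunk such as `completley healthy` is still kept because the model marked it `present`, so the assertion filter does not correct NER false positives.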


In [None]:
result = ade_ner_clf_pipeline.annotate('I just took an Advil 100 mg and it made me drowsy')

print (result['class'])
print(list(zip(result['token'],result['ner'])))

['False']
[('I', 'O'), ('just', 'O'), ('took', 'O'), ('an', 'O'), ('Advil', 'B-DRUG'), ('100', 'O'), ('mg', 'O'), ('and', 'O'), ('it', 'O'), ('made', 'O'), ('me', 'O'), ('drowsy', 'B-ADE')]
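
The `ner` column above holds one IOB tag per token (`B-` opens an entity, `I-` continues it, `O` is outside), while `ner_chunk` holds the merged entities. The following is a simplified illustration of that merging step in plain Python. It is a sketch of the idea only, not the `NerConverter` implementation, and the function name `iob_to_chunks` is hypothetical.

```python
def iob_to_chunks(pairs):
    """Merge (token, IOB tag) pairs into (chunk text, entity label) tuples."""
    chunks = []
    tokens, label = [], None
    for token, tag in pairs:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open chunk, then open a new one
            if tokens:
                chunks.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the currently open chunk
            tokens.append(token)
        else:
            # 'O' tag: close any open chunk
            if tokens:
                chunks.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        chunks.append((" ".join(tokens), label))
    return chunks


# Token/tag pairs copied from the output above
pairs = [('I', 'O'), ('just', 'O'), ('took', 'O'), ('an', 'O'), ('Advil', 'B-DRUG'),
         ('100', 'O'), ('mg', 'O'), ('and', 'O'), ('it', 'O'), ('made', 'O'),
         ('me', 'O'), ('drowsy', 'B-ADE')]

chunks = iob_to_chunks(pairs)
print(chunks)
```

Joining tokens with a single space is a simplification made for this sketch; the chunks produced by the pipeline carry character offsets (`begin`, `end`) into the original text instead.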


In [None]:
ade_ner_clf_model.write().overwrite().save('/databricks/driver/ade_pretrained_pipeline')

In [None]:
ade_pipeline = nlp.PretrainedPipeline.from_disk('/databricks/driver/ade_pretrained_pipeline')

ade_pipeline.annotate('I just took an Advil 100 mg and it made me drowsy')

Out[39]: {'ner_chunk': ['Advil', 'drowsy'],
 'assertion': ['conditional'],
 'sentence_embeddings': ['I just took an Advil 100 mg and it made me drowsy'],
 'token': ['I',
  'just',
  'took',
  'an',
  'Advil',
  '100',
  'mg',
  'and',
  'it',
  'made',
  'me',
  'drowsy'],
 'ner': ['O', 'O', 'O', 'O', 'B-DRUG', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ADE'],
 'class': ['False'],
 'ass_ner_chunk': ['drowsy'],
 'embeddings': ['I',
  'just',
  'took',
  'an',
  'Advil',
  '100',
  'mg',
  'and',
  'it',
  'made',
  'me',
  'drowsy'],
 'sentence': ['I just took an Advil 100 mg and it made me drowsy']}

In [None]:
ade_pipeline.model.stages

Out[40]: [DocumentAssembler_aa40e5e040eb,
 REGEX_TOKENIZER_d4a6b731ce93,
 BERT_EMBEDDINGS_c6741c518b81,
 MedicalNerModel_4fc5b46ae2cf,
 NerConverter_b059aa644186,
 NerConverter_fc015c29b44f,
 ASSERTION_DL_2f4db8443148,
 SentenceEmbeddings_2d4d019bcd52,
 ClassifierDLModel_6edc7e323980]

## Pretrained ADE Pipeline

A pipeline for `Adverse Drug Events (ADE)` built on `ner_ade_healthcare` and `classifierdl_ade_biobert`. It extracts `ADE` and `DRUG` clinical entities, then assigns an ADE status to the text (`True` means ADE-related, `False` means not ADE-related). It also extracts relations between `DRUG` and `ADE` entities (`1` means the adverse event and drug entities are related, `0` means they are not).

In [None]:
#pretrained_ade_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

In [None]:
#pretrained_ade_pipeline.model.stages



In [None]:
#result = pretrained_ade_pipeline.fullAnnotate("The main adverse effects of Leflunomide consist of diarrhea, nausea, liver enzyme elevation, hypertension, #alopecia, and allergic skin reactions.")

#result[0].keys()



In [None]:
#result[0]['class'][0].metadata



In [None]:
#text = "I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor."
#
#import pandas as pd
#
#chunks = []
#entities = []
#begin =[]
#end = []
#confidence=[]
#
#print ('sentence:', text)
#print()
#
#result = pretrained_ade_pipeline.fullAnnotate(text)
#
#print ('ADE status:', result[0]['class'][0].result)
#
#print ('prediction probability>> True : ', result[0]['class'][0].metadata['True'], \
#        'False: ', result[0]['class'][0].metadata['False'])
#
#for n in result[0]['ner_chunks_ade']:
#
#    begin.append(n.begin)
#    end.append(n.end)
#    chunks.append(n.result)
#    entities.append(n.metadata['entity']) 
#    confidence.append(n.metadata["confidence"])
#
#df = pd.DataFrame({'chunks':chunks, 'entities':entities,
#                'begin': begin, 'end': end,"confidence":confidence})
#
#df




#### with AssertionDL

In [None]:
#import pandas as pd
#
#text = """I have an allergic reaction to vancomycin. My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
#I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""
#
#print (text)
#
#light_result = pretrained_ade_pipeline.fullAnnotate(text)[0]
#
#chunks=[]
#entities=[]
#status=[]
#confidence=[]
#
#
#for n,m in zip(light_result['ner_chunks_ade_assertion'],light_result['assertion_ade']):
#    
#    chunks.append(n.result)
#    entities.append(n.metadata['entity']) 
#    status.append(m.result)
#    confidence.append(m.metadata["confidence"])
#
#df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, "confidence":confidence})
#
#df



#### with Relation Extraction

In [None]:
#import pandas as pd
#
#text = """A patient had undergone a renal transplantation as a result of malignant hypertension, and immunosuppressive therapy consisting of cyclosporin and prednisone ,  developed  sweating  and  thrombosis alone 5 years following the transplantation but there were not stomach pain 
#"""
# 
#print (text)
#
#results = pretrained_ade_pipeline.fullAnnotate(text)
#
#rel_pairs=[]
#
#for rel in results[0]["relations_ade_drug"]:
#    rel_pairs.append((
#        rel.result, 
#        rel.metadata['entity1'], 
#        rel.metadata['entity1_begin'],
#        rel.metadata['entity1_end'],
#        rel.metadata['chunk1'], 
#        rel.metadata['entity2'],
#        rel.metadata['entity2_begin'],
#        rel.metadata['entity2_end'],
#        rel.metadata['chunk2'], 
#        rel.metadata['confidence']
#    ))
#
#rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])
#rel_df



In [None]:
#vis = nlp.viz.RelationExtractionVisualizer()
#
#re_vis = vis.display(results[0], 'relations_ade_drug', show_relations=True,return_html=True) # default show_relations: True
#displayHTML(re_vis)





You can check the links below if you want to see more examples of:

- Pretrained Clinical Pipelines in Spark NLP : [11.Pretrained Clinical Pipelines](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.Pretrained_Clinical_Pipelines.ipynb)
- Relation Extraction Model of DRUG and ADE entities `re_ade_biobert`: [10. Clinical Relation Extraction Model](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/10.Clinical_Relation_Extraction.ipynb)

End of Notebook #