![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/NER_PATHOGEN.ipynb)

# `ner_pathogen` **Models**

Pretrained named entity recognition deep learning model for pathogen-related texts and reports.

## 1. Colab Setup

**Import license keys**

In [None]:
import json, os
from google.colab import files

# Upload the license file only if it is not already in the working directory
if 'spark_jsl.json' not in os.listdir():
    license_keys = files.upload()
    os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
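
The install and session cells that follow read `SECRET`, `JSL_VERSION` and `PUBLIC_VERSION` from the license file, so a missing key only surfaces later as a confusing `pip` or startup error. A minimal sketch of an early check is shown here; the helper name `missing_license_keys` and the placeholder values are hypothetical and not part of Spark NLP, and the key list covers only the names this notebook references.

```python
# Keys this notebook reads from spark_jsl.json (other keys may exist in a real license file).
REQUIRED_KEYS = ("SECRET", "JSL_VERSION", "PUBLIC_VERSION")

def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]

# Placeholder dictionary standing in for the parsed license file.
example_keys = {"SECRET": "placeholder-secret", "JSL_VERSION": "placeholder-version"}

missing = missing_license_keys(example_keys)
if missing:
    print("License file is missing:", ", ".join(missing))
```

In the notebook itself, the same function could be called on `license_keys` right after `json.load(f)`.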

**Install dependencies**

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

## 2. Start Spark Session

**Import dependencies into Python and start the Spark session**

In [None]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory": "16G",
          "spark.kryoserializer.buffer.max": "2000M",
          "spark.driver.maxResultSize": "2000M"}

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.1.0
Spark NLP_JSL Version : 4.1.0


## 3. Select the model and construct the pipeline

**Create the pipeline**

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetectorDL = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl_healthcare", "en", 'clinical/models') \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical" ,"en", "clinical/models")\
    .setInputCols(["sentence","token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_pathogen", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentenceDetectorDL,
                            tokenizer,
                            word_embeddings,
                            ner_model, 
                            ner_converter])

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_pathogen download started this may take some time.
[OK!]
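
The stages in this pipeline are wired together purely by column names: each annotator's input columns must have been produced by an earlier stage. The sketch below mirrors the column names used in the cell above as plain tuples and checks that wiring without Spark; the helper `unresolved_inputs` is a hypothetical illustration, not a Spark NLP API.

```python
# (stage name, input columns, output column), copied from the pipeline definition above.
stages = [
    ("document_assembler", ["text"], "document"),
    ("sentenceDetectorDL", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "embeddings"),
    ("ner_model", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
]

def unresolved_inputs(stages, initial_columns=("text",)):
    """Return (stage, column) pairs whose input is not produced by an earlier stage."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        # The stage's output becomes available to every later stage.
        available.add(output)
    return problems

print(unresolved_inputs(stages))      # full pipeline
print(unresolved_inputs(stages[1:]))  # same pipeline with the DocumentAssembler dropped
```

An empty list means every input column is satisfied; dropping or reordering a stage shows up as a named missing column.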


## 4. Create example inputs

In [None]:
text_list = [
"""Signs of dehydration often begin with loss of Skin Elasticity and Irritability. This can progress to skin discoloration , a fast heart rate , and a decreased responsiveness as it becomes more severe . Loose but non-watery stools in babies who are exclusively breastfed , however , are normal .Antiretroviral therapy ( ART ) is recommended for all HIV - infected individuals to reduce the risk of disease progression . ART also is recommended for HIV - infected individuals for the prevention of transmission of HIV . Patients starting ART should be willing and able to commit to treatment and understand the benefits and risks of therapy and the importance of adherence .""",

"""Measured resistance to amantadine and rimantadine in American isolates of H3N2 has increased to 91% in 2005. This high level of resistance may be due to the easy availability of amantadines as part of over-the-counter the common COLD medications in countries such as China and Russia, and their use to prevent outbreaks of influenza in farmed poultry .""",

"""Polymers of adamantane have been patented as ANTIVIRAL-Medications against HIV . Buprenorphine has been shown experimentally ( 1982–1995 ) to be effective against severe , refractory depression. Gabapentin , approved for treatment of seizures and postherpetic neuralgia in adults , has side-effects which are useful in treating bipolar disorder1 , essential tremor , hot flashes , migraine prophylaxis , neuropathic pain syndromes , phantom limb syndrome , and restless leg syndrome .""",

"""The CDC recommended against using M2-INHIBITORS during the 2005–06 influenza season due to high levels of drug resistance . The two classes of ANTIVIRAL-DRUGS used against influenza are neuraminidase inhibitors ( oseltamivir , zanamivir , laninamivir and peramivir ) and M2 protein inhibitors ( adamantane derivatives ) Influenza , commonly known as a flu infection, is an infectious disease caused by an influenza virus . Symptoms can be mild to severe . The most common symptoms include : SORE-THROAT , MUSCLE-and-JOINT pain , headache , coughing , and FEEL-TIRED . These symptoms typically begin two days after exposure to the Virus and most last less than a week .""",

""" Other diseases are under investigation to discover if they have a VIRUS as the causative agent , such as the possible connection between Human-Herpesvirus 6 ( HHV6 ) and Neurological Disorders(ND's) such as multiple sclerosis and chronic fatigue syndrome . All medical applications known so far involve not pure adamantane , but its derivatives . The first adamantane derivative used as a drug was amantadine – first ( 1967 ) as an ANTIVIRAL-DRUG against various strains of influenza virus and then to treat Parkinson's disease . Other drugs among adamantane derivatives include adapalene , adapromine , bromantane , carmantadine , chlodantane , dopamantine , memantine , rimantadine , saxagliptin , tromantadine , and vildagliptin ."""
]

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(text_list,StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|Signs of dehydration often begin with loss of Skin Elasticity and Irritability. This can progress...|
|Measured resistance to amantadine and rimantadine in American isolates of H3N2 has increased to 9...|
|Polymers of adamantane have been patented as ANTIVIRAL-Medications against HIV . Buprenorphine ha...|
|The CDC recommended against using M2-INHIBITORS during the 2005–06 influenza season due to high l...|
| Other diseases are under investigation to discover if they have a VIRUS as the causative agent ,...|
+----------------------------------------------------------------------------------------------------+



## 5. Use the pipeline to create outputs

In [None]:
result = pipeline.fit(df).transform(df)

In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result, 
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label")).show(truncate=False)

+------------------------+-----+---+----------------+
|chunk                   |begin|end|ner_label       |
+------------------------+-----+---+----------------+
|dehydration             |9    |19 |MedicalCondition|
|Skin Elasticity         |46   |60 |MedicalCondition|
|Irritability            |66   |77 |MedicalCondition|
|skin discoloration      |101  |118|MedicalCondition|
|fast heart rate         |124  |138|MedicalCondition|
|decreased responsiveness|148  |171|MedicalCondition|
|Antiretroviral therapy  |293  |314|Medicine        |
|ART                     |318  |320|Medicine        |
|HIV                     |347  |349|Pathogen        |
|ART                     |418  |420|Medicine        |
|HIV                     |446  |448|Pathogen        |
|HIV                     |511  |513|Pathogen        |
|ART                     |535  |537|Medicine        |
|amantadine              |23   |32 |Medicine        |
|rimantadine             |38   |48 |Medicine        |
|H3N2                    |74
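
Judging by the rows shown, `begin` and `end` are character offsets into the input text with an inclusive `end`, so recovering a chunk with a Python slice needs `end + 1`. A small sketch using the first sentence of the first example and three rows from the table; the helper name `slice_chunk` is hypothetical.

```python
# First sentence of the first example text, and three rows copied from the output table.
text = "Signs of dehydration often begin with loss of Skin Elasticity and Irritability."
chunks = [
    ("dehydration", 9, 19),
    ("Skin Elasticity", 46, 60),
    ("Irritability", 66, 77),
]

def slice_chunk(text, begin, end):
    """Recover a chunk from inclusive begin/end character offsets."""
    return text[begin:end + 1]

recovered = [slice_chunk(text, begin, end) for _, begin, end in chunks]
for (expected, begin, end), actual in zip(chunks, recovered):
    print(f"{begin:>3}-{end:<3} {actual!r} matches: {actual == expected}")
```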

## 6. Visualize results

In [None]:
from sparknlp_display import NerVisualizer

ner_viz = NerVisualizer()

# Collect once; calling result.collect() inside the loop would trigger a new Spark job on every iteration
rows = result.collect()

for row in rows:
    ner_viz.display(result = row, label_col = "ner_chunk")
    print("\n\n")


# `bert_token_classifier_ner_pathogen` **Models**

In [None]:
text_list = [
"""Signs of dehydration often begin with loss of Skin-Elasticity and Irritability. This can progress to Skin-Color Change, a Fast-Heart-Rate , and a decreased responsiveness as it becomes more severe . Loose but non-watery stools in babies who are exclusively breastfed , however , are normal .Antiretroviral therapy ( ART ) is recommended for all HIV - infected individuals to reduce the risk of disease progression . ART also is recommended for the HIV(Human immunodeficiency viruses)- infected individuals for the prevention of transmission of HIV . Patients starting ART should be willing and able to commit to treatment and understand the benefits and risks of therapy and the importance of adherence .""",

"""Measured resistance to amantadine and rimantadine in American isolates of H3N2 has increased to 91% in 2005. This high level of resistance may be due to the easy availability of amantadines as part of over-the-counter the common COLD medications in countries such as China and Russia, and their use to prevent outbreaks of influenza in farmed poultry .""",

"""Polymers of adamantane have been patented as ANTIVIRAL-Medications against HIV . Buprenorphine has been shown experimentally ( 1982–1995 ) to be effective against severe , refractory depression. Gabapentin , approved for treatment of seizures and postherpetic neuralgia in adults , has side-effects which are useful in treating bipolar disorder1 , essential tremor , hot flashes , migraine prophylaxis , neuropathic-pain syndromes , phantom limb syndrome , and restless leg syndrome .""",

"""The CDC recommended against using M2-INHIBITORS during the 2005–06 influenza season due to high levels of drug resistance . The two classes of ANTIVIRAL-DRUGS used against influenza are neuraminidase inhibitors ( oseltamivir , zanamivir , laninamivir and peramivir ) and M2 protein inhibitors ( adamantane derivatives ) Influenza , commonly known as a flu infection, is an infectious disease caused by an influenza virus . Symptoms can be mild to severe . The most common symptoms include : SORE-THROAT , MUSCLE-and-JOINT pain , headache , coughing , and FEEL-TIRED . These symptoms typically begin two days after exposure to the VIRUS infection and most last less than a week""",

""" Other diseases are under investigation to discover if they have a VIRUS as the causative agent , such as the possible connection between HUMAN-HERPES-VIRUS6 (HHV6) and ND's(Neurological-Disorders) such as multiple sclerosis and chronic fatigue syndrome . All medical applications known so far involve not pure adamantane , but its derivatives . The first adamantane derivative used as a drug was amantadine – first ( 1967 ) as an ANTIVIRAL-DRUG against various strains of influenza virus and then to treat Parkinson's disease . Other drugs among adamantane derivatives include adapalene , adapromine , bromantane , carmantadine , chlodantane , dopamantine , memantine , rimantadine , saxagliptin , tromantadine , and vildagliptin ."""
]

In [None]:
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

sentenceDetector = SentenceDetectorDLModel.pretrained()\
  .setInputCols(["document"])\
  .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(['sentence']) \
    .setOutputCol('token')

tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_pathogen", "en", "clinical/models")\
    .setInputCols(['token', "sentence"])\
    .setOutputCol("label")\
    .setCaseSensitive(True)

ner_converter = NerConverter()\
    .setInputCols(["sentence","token","label"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentenceDetector, 
                            tokenizer,
                            tokenClassifier,
                            ner_converter])


sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
bert_token_classifier_ner_pathogen download started this may take some time.
[OK!]


In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(text_list, StringType()).toDF('text')

df.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|Signs of dehydration often begin with loss of Skin-Elasticity and Irritability. This can progress...|
|Measured resistance to amantadine and rimantadine in American isolates of H3N2 has increased to 9...|
|Polymers of adamantane have been patented as ANTIVIRAL-Medications against HIV . Buprenorphine ha...|
|The CDC recommended against using M2-INHIBITORS during the 2005–06 influenza season due to high l...|
| Other diseases are under investigation to discover if they have a VIRUS as the causative agent ,...|
+----------------------------------------------------------------------------------------------------+



## Use the pipeline to create outputs

In [None]:
result = pipeline.fit(df).transform(df)

In [None]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result, 
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label")).show(truncate=False)

+----------------------------------+-----+---+----------------+
|chunk                             |begin|end|ner_label       |
+----------------------------------+-----+---+----------------+
|dehydration                       |9    |19 |MedicalCondition|
|Skin-Elasticity                   |46   |60 |MedicalCondition|
|Irritability                      |66   |77 |MedicalCondition|
|Skin-Color Change                 |101  |117|MedicalCondition|
|Fast-Heart-Rate                   |122  |136|MedicalCondition|
|decreased responsiveness          |146  |169|MedicalCondition|
|Antiretroviral therapy            |291  |312|Medicine        |
|ART                               |316  |318|Medicine        |
|HIV                               |345  |347|Pathogen        |
|ART                               |416  |418|Medicine        |
|HIV(Human immunodeficiency viruses|448  |481|Pathogen        |
|HIV                               |544  |546|Pathogen        |
|ART                               |568 
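
To get a quick feel for the label distribution, the chunk rows can be tallied by `ner_label`. The sketch below does this in plain Python on the twelve complete rows visible in the table above (the truncated last row is left out); on the full DataFrame, a Spark `groupBy` on the label column would serve the same purpose.

```python
from collections import Counter

# (chunk, ner_label) pairs copied from the complete rows of the output table.
rows = [
    ("dehydration", "MedicalCondition"),
    ("Skin-Elasticity", "MedicalCondition"),
    ("Irritability", "MedicalCondition"),
    ("Skin-Color Change", "MedicalCondition"),
    ("Fast-Heart-Rate", "MedicalCondition"),
    ("decreased responsiveness", "MedicalCondition"),
    ("Antiretroviral therapy", "Medicine"),
    ("ART", "Medicine"),
    ("HIV", "Pathogen"),
    ("ART", "Medicine"),
    ("HIV(Human immunodeficiency viruses", "Pathogen"),
    ("HIV", "Pathogen"),
]

# Count how many chunks were assigned to each label.
label_counts = Counter(label for _, label in rows)
for label, count in label_counts.most_common():
    print(f"{label:<16} {count}")
```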

## Visualize results

In [None]:
from sparknlp_display import NerVisualizer

ner_viz = NerVisualizer()

# Collect once; calling result.collect() inside the loop would trigger a new Spark job on every iteration
rows = result.collect()

for row in rows:
    ner_viz.display(result = row, label_col = "ner_chunk")
    print("\n\n")
