![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/PUBLIC_HEALTH_MB4SC.ipynb)

# `Medical Bert For Sequence Classification` for **Public Health Models**

# **Colab Setup**

In [None]:
import json
import os

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)

# Adding license key-value pairs to environment variables
os.environ.update(license_keys)
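
The later cells rely on `SECRET`, `PUBLIC_VERSION`, and `JSL_VERSION` being present in the uploaded license file. A minimal sketch of a pre-flight check, assuming those three are the keys this notebook needs; the helper name `missing_license_keys` is hypothetical and not part of any John Snow Labs API:

```python
# Hypothetical helper: not part of Spark NLP or the license tooling.
# The key names below are the ones referenced by later cells in this notebook.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")


def missing_license_keys(license_keys, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the license dict."""
    return [key for key in required if not license_keys.get(key)]


# Made-up license dict for illustration (placeholder values, not real credentials)
example = {"SECRET": "xxx", "JSL_VERSION": "4.0.2"}
print(missing_license_keys(example))
```

Running such a check right after loading the JSON may surface a malformed license file before the `pip install` cells fail with a less obvious error.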

# **Install dependencies**

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

# **Import dependencies into Python and start the Spark session**

In [4]:
import json
import os

from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession

import sparknlp
import sparknlp_jsl

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.pretrained import ResourceDownloader
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, IntegerType

import pandas as pd

pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

import string
import numpy as np

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(secret = SECRET, params=params)

print ("Spark NLP Version :", sparknlp.version())
print ("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.0.2
Spark NLP_JSL Version : 4.0.2


# **General Function for MedicalBertForSequenceClassification Pipeline**

In [26]:
def run_pipeline(model, text):
  """Classify each string in `text` with the pretrained `model` and print the predictions."""
  # Wrap the raw text column into Spark NLP documents
  document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

  # Split each document into tokens
  tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

  # Load the pretrained sequence classifier from the clinical models repository
  sequenceClassifier = MedicalBertForSequenceClassification.pretrained(model, "en", "clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("class")

  pipeline = Pipeline(stages=[
    document_assembler, 
    tokenizer,
    sequenceClassifier
    ])

  # One row per input text
  df = spark.createDataFrame(text, StringType()).toDF("text")
  results = pipeline.fit(df).transform(df)

  print("\n")
  print("<----------------- MODEL NAME:","\033[1m" + model + "\033[0m"," ----------------- >")

  # Zip text, predicted label, and metadata map, then flatten to one row per document
  res = results.select(F.explode(F.arrays_zip("document.result", "class.result","class.metadata")).alias("col"))\
               .select(F.expr("col['1']").alias("prediction"),
                       F.expr("col['2']").alias("confidence"),
                       F.expr("col['0']").alias("sentence"))

  # Show the table whenever there is at least one row, including single-text inputs
  if res.count() > 0:
    # The metadata map stores each label's score under the key "Some(<label>)"
    udf_func = F.udf(lambda x,y:  x["Some("+str(y)+")"])
    print("\n",model,"\n") 
    res.withColumn('confidence', udf_func(res.confidence, res.prediction)).show(truncate=False)
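
The UDF in `run_pipeline` reads the confidence of the predicted label out of the `class.metadata` map, looking it up under a key of the form `Some(<label>)`. A plain-Python sketch of that lookup, outside Spark; the fallback to a bare label key is an assumption for metadata that is not wrapped this way, and `confidence_for` is a hypothetical name:

```python
def confidence_for(metadata, label):
    """Return the score stored for `label`, trying 'Some(label)' first, then 'label'."""
    for key in (f"Some({label})", label):
        if key in metadata:
            return float(metadata[key])
    return None  # label not present in the metadata map


# Made-up metadata in the shape the notebook's UDF expects (scores are illustrative)
meta = {"Some(ADE)": "0.9", "Some(noADE)": "0.1"}
print(confidence_for(meta, "ADE"))
```

Returning `None` for a missing key, instead of raising as the one-line lambda would, keeps a single unexpected label from failing the whole job.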

# **MODELS**

## **bert_sequence_classifier_ade_augmented**

In [22]:
model = "bert_sequence_classifier_ade_augmented"

In [24]:
sample_texts = [
"""I'm so fine today. increasing zyprexa,my condisition is became so good. it has a side effect that increase my weight. i must care about it.""",
"""Actually, also loving it because it is a medicine for bipolar disorder and they named it Latuda.""",
"""Yeah,it can be caused by swelling from around a nerve from ra,but the effexor causes shaking like ur cold(shivering)""",
"""Day three of #nonsmoking - 90% of my thoughts revolve around cigs. The nicotine lozenges I have taste like cherry infused with ashtray.""",
"""I just had a look buddy, and my medication (Seroquel) does affect tolerance to the sun.""",
"""Many new physicians have been identified and added to the Buprenorphine Certified Physicians and Treatment Providers directory!""",
"""I started out with lyrica but i could no longer afford it. it made me bloated. tried cymbalta , my heart was beating wicked fast."""
]

In [27]:
run_pipeline(model, sample_texts)

bert_sequence_classifier_ade_augmented download started this may take some time.
[OK!]


<----------------- MODEL NAME: bert_sequence_classifier_ade_augmented  ----------------- >

 bert_sequence_classifier_ade_augmented 

+----------+----------+-------------------------------------------------------------------------------------------------------------------------------------------+
|prediction|confidence|sentence                                                                                                                                   |
+----------+----------+-------------------------------------------------------------------------------------------------------------------------------------------+
|ADE       |0.99947673|I'm so fine today. increasing zyprexa,my condisition is became so good. it has a side effect that increase my weight. i must care about it.|
|noADE     |0.99999017|Actually, also loving it because it is a medicine for bipolar disorder and they named it L