![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/BertForSequenceClassification.ipynb)

# `BertForSequenceClassification` **Models**

## 1. Colab Setup

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.2.0 spark-nlp==3.4.1

Import Libraries

In [2]:
import pandas as pd
import numpy as np
import json
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

from pyspark.sql.types import StringType, IntegerType

## 2. Start Spark Session

In [3]:
spark = sparknlp.start(spark32=True)

print("Spark NLP Version :", sparknlp.version())

spark

Spark NLP Version : 3.4.1


## 3. Select the DL model

In [4]:
### Select Model
model_antisemitism = "bert_sequence_classifier_antisemitism"
model_trec_coarse = "bert_sequence_classifier_trec_coarse"
model_age_news = "bert_sequence_classifier_age_news"
model_hatexplain = "bert_sequence_classifier_hatexplain"
model_emotion = "bert_sequence_classifier_emotion"
model_banking = "bert_sequence_classifier_banking77"

## 4. Some sample examples

In [5]:
text_antisemitism = ["""Shylock in Merchant of Venice. Shylock was a Jew and moneylender. Depends on the context it is used but as the antisemitism is hotly debated nowadays if I were a Jew I wouldn't like to hear it. Perhaps I'm wrong but that's my opinion.""",
"""That Jew gripped yo nuts and you did nothing and you been attacking black people ever since. Probably like that shit""",
"""They came for the Jews, and I did not speak out Because I was not a Jew.Then they came for me and there was no one left to speak for me""",
"""David is a sephardic jew huh.... now i have to give him my entire heart i guess""",
"""I asked a genuine question, she has been smearing @georgegalloway for a while now without any evidence. Am I not allowed to ask her to show me the anti Semitism. Remember @RachelRileyRR is the one who said she ‘doesn’t look like a normal jew’ that to me is anti Semitic""",
"""I pointed the finger directly at the fascists still in control of Europe the muh jew shills began in earnest. Distraction, anger, insult, lies.""",
]

In [6]:
text_trec_coarse = ["""Germany is the largest country in Europe economically.""",
"""What other prince showed his paintings in a two-prince exhibition with Prince Charles in London?""",
"""What is the name of the chronic neurological autoimmune disease which attacks the protein sheath that surrounds nerve cells causing a gradual loss of movement in the body?""",
"""How many hands does Bjorn Borg use when hitting his forehand?""",
"""CNN is the abbreviation for what?""",
"""Give a reason for American Indians oftentimes dropping out of school.""",
"""What was organized as a confederate veterans social club in Pulaski, in Tennessee, in 1866?""",
"""Who was the first person inducted into the U.S. Swimming Hall of Fame?""",
"""What did the only repealed amendment to the U.S. Constitution deal with?""",
]

In [7]:
text_age_news = ["""Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group, which has a reputation for making well-timed and occasionally controversial plays in the defense industry, has quietly placed its bets on another part of the market.""",
"""Juventus and Olympique Lyon joined Chelsea and Inter Milan in the last 16 of the Champions League after both scored late goals to seal their places on Wednesday.""",
"""Fearing the fate of Italy, the centre-right government has threatened to be merciless with those who flout tough restrictions.""",
"""Cisco Systems is teaming up with researchers at the Massachusetts Institute of Technology to develop new communications technology that the company hopes to ultimately turn into products.""",
"""Golf is not a sport renowned for being at ease with gender issues. Many of the games great clubs refuse to admit women, and the occasional inclusion of women in men professional tournaments has caused consternation from Arran to Augusta.""",
"""The Supreme Council of the United Arab Emirates (UAE) on Wednesday evening unanimously elected Sheikh Khalifa bin Zayed Al Nahyan as the new president of the UAE, the official Emirates news agency WAM reported Wednesday.""",
"""The Norwegian hacker famed for developing DVD encryption cracking software has apparently struck again; this time breaking the locks on Apple Computer Inc.'s wireless music streaming technology.""",
"""Utility holding company Ameren Corp. announced plans Wednesday to reduce its payroll by offering 950 union and management employees financial incentives to leave."""
]

In [8]:
text_hatexplain = ["""If English is not imposition then Hindi is also not imposition shut up chutiya retards stop hindi imposition.""",
"""The uk has threatened to return radioactive waste to the eu if an agreement cannot be reached can not we keep the radioactive waste and send back all the paki migrants instead""",
"""Afghani immigrant once told me that in Afghanistan we do not believe in fences while our conversation was about livestock guardian dogs it now occurs to me what this really means is that there no belief in protection of persons or property""",
"""I post a video showing the shocking promotion of violence by the south african eff themselves and jewtube still say i am promoting violence""",
"""I want to go back to school and get an education but i am poor and colleges are kike rip offs i am not satisfied with where i am in life and honestly have no clue where or how to start pushing forward""",
"""I am confused how a president blocked stimulus checks for those who married to immigrants while he himself married to an immigrant""",
"""I love you very much!"""]

In [9]:
text_emotion = ["""I've been taking or milligrams or times recommended amount and I've fallen asleep a lot faster but I also feel like so funny.""",
"""I have been with petronas for years I feel that petronas has performed well and made a huge profit.""",
"""I feel a bit rude writing to an elderly gentleman to ask for gifts because i feel a bit greedy but what is christmas about if not mild greed.""",
"""I feel romantic too""",
"""I now feel compromised and skeptical of the value of every unit of work I put in""",
"""I started feeling sentimental about dolls I had as a child and so began a collection of vintage barbie dolls from the sixties"""]

In [10]:
text_banking = ["""I have been waiting over a week. Is the card still coming?""",
"""I need a transaction reversed from my account.""",
"""How long does it take for cards to be delivered after ordering them?""",
"""I've just been married and need to update my name""",
"""I'm interested in learning more about disposable virtual cards.""",
"""I tried topping up using my card, but the money is gone?"""]

## 5. Define Spark NLP pipeline

In [11]:
# Map each pretrained model name to its list of sample texts
model_dict = {model_antisemitism: text_antisemitism,
              model_trec_coarse: text_trec_coarse,
              model_age_news: text_age_news,
              model_hatexplain: text_hatexplain,
              model_emotion: text_emotion,
              model_banking: text_banking}

In [12]:
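def run_pipeline(model, text, results):
  """Classify `text` with the pretrained `model` and store the output DataFrame in `results[model]`."""
  # Wrap the raw text column into Spark NLP document annotations
  document_assembler = DocumentAssembler() \
      .setInputCol('text') \
      .setOutputCol('document')

  # Split each document into tokens
  tokenizer = Tokenizer() \
      .setInputCols(['document']) \
      .setOutputCol('token')

  # Load the pretrained English classifier by name (downloaded on first use)
  sequenceClassifier = BertForSequenceClassification \
      .pretrained(model, 'en') \
      .setInputCols(['token', 'document']) \
      .setOutputCol('pred_class')

  pipeline = Pipeline(stages=[document_assembler, tokenizer, sequenceClassifier])

  # One row per sample text, in a single column named "text"
  df = spark.createDataFrame(text, StringType()).toDF("text")
  results[model] = pipeline.fit(df).transform(df)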

## 6. Run the pipeline

In [13]:
results = {}
for model, text in model_dict.items():
  run_pipeline(model, text, results)

bert_sequence_classifier_antisemitism download started this may take some time.
Approximate size to download 390.8 MB
[OK!]
bert_sequence_classifier_trec_coarse download started this may take some time.
Approximate size to download 387.8 MB
[OK!]
bert_sequence_classifier_age_news download started this may take some time.
Approximate size to download 40.4 MB
[OK!]
bert_sequence_classifier_hatexplain download started this may take some time.
Approximate size to download 391.1 MB
[OK!]
bert_sequence_classifier_emotion download started this may take some time.
Approximate size to download 391.1 MB
[OK!]
bert_sequence_classifier_banking77 download started this may take some time.
Approximate size to download 391.3 MB
[OK!]


## 7. Visualize results
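
The visualization cell in this section zips three parallel arrays (document text, predicted label, metadata map), explodes them into one row per element, and then reads the confidence from the metadata map under the key `"Some(<label>)"`. The sketch below mirrors those steps in plain Python so the data flow is easier to follow. It is only an illustration: `transformed_rows`, `flatten_predictions`, the sample texts, the labels, and the scores are all hypothetical, and the metadata layout is inferred from the lookup used in the cell rather than from the library documentation.

```python
# Hypothetical stand-in for the transformed DataFrame: every value below is made up.
transformed_rows = [
    {"document": ["first sample text"],
     "pred_class": ["1"],
     "metadata": [{"Some(0)": "0.10", "Some(1)": "0.90"}]},
    {"document": ["second sample text"],
     "pred_class": ["0"],
     "metadata": [{"Some(0)": "0.75", "Some(1)": "0.25"}]},
]

def flatten_predictions(rows):
    """Mimic arrays_zip + explode, then look up the predicted label's score."""
    flat = []
    for row in rows:
        # arrays_zip analogue: pair the parallel arrays index by index
        zipped = zip(row["document"], row["pred_class"], row["metadata"])
        # explode analogue: emit one output record per zipped element
        for sentence, prediction, meta in zipped:
            # Assumed key format "Some(<label>)", as used by the lookup in this section
            confidence = meta["Some(" + str(prediction) + ")"]
            flat.append({"prediction": prediction,
                         "confidence": confidence,
                         "sentence": sentence})
    return flat

flat_rows = flatten_predictions(transformed_rows)
for record in flat_rows:
    print(record)
```

Python's `zip` stops at the shortest input, which is harmless here because the three arrays are assumed to have equal length. The scores stay as strings in this sketch, since no numeric conversion is applied.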

In [14]:
for model_name, result in results.items():
  # Zip the parallel arrays (document text, predicted label, metadata map) and explode to one row per element
  res = result.select(F.explode(F.arrays_zip(result.document.result,
                                             result.pred_class.result,
                                             result.pred_class.metadata)).alias("col"))\
              .select(F.expr("col['1']").alias("prediction"),
                      F.expr("col['2']").alias("confidence"),
                      F.expr("col['0']").alias("sentence"))

  if res.count()>1:
    # Look up the predicted label's score in the metadata map under the key "Some(<label>)"
    udf_func = F.udf(lambda x,y:  x["Some("+str(y)+")"])
    print("\n",model_name,"\n")
    res.withColumn('confidence', udf_func(res.confidence, res.prediction)).show(truncate=False)
    print("\n**********************************\n")


 bert_sequence_classifier_antisemitism 

+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|prediction|confidence|sentence                                                                                                                                                                                                                                                                     |
+----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|0         |0.92433804|Shylock in Merchant of Venice. Shylock was a Jew and 