

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTIMENT_EN_SARCASM.ipynb)




# **Detect Sarcasm in Text**

## 1. Colab Setup

In [1]:
# Install Java 8 (Spark 2.4 runs on Java 8)
!apt-get update -qq
!apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
!java -version

# Install PySpark
!pip install --ignore-installed -q pyspark==2.4.4

# Install Spark NLP (unpinned; the run shown below resolved to 2.5.5)
!pip install --ignore-installed spark-nlp

openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)
  Building wheel for pyspark (setup.py) ... done
Collecting spark-nlp
  Downloading https://files.pythonhosted.org/packages/b5/a2/5c2e18a65784442ded6f6c58af175ca4d99649337de569fac55b04d7ed8e/spark_nlp-2.5.5-py2.py3-none-any.whl (124kB)
Installing collected packages: spark-nlp
Successfully installed spark-nlp-2.5.5


In [2]:
import pandas as pd
import numpy as np
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
import json
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
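
The `java -version` output in the setup cell reports 11.0.8 even though OpenJDK 8 was just installed, presumably because that command ran before `JAVA_HOME` and `PATH` were updated in the cell above. A minimal sketch for confirming that the Java 8 `bin` directory now comes first on the `PATH` follows; the helper name `java_home_is_first_on_path` is hypothetical and not part of Spark NLP.

```python
import os


def java_home_is_first_on_path(env):
    """Return True if $JAVA_HOME/bin is the first entry of $PATH in `env`."""
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return False
    first_entry = env.get("PATH", "").split(os.pathsep)[0]
    expected = os.path.join(java_home, "bin")
    return os.path.normpath(first_entry) == os.path.normpath(expected)


# In the notebook, pass os.environ after the environment variables are set.
print(java_home_is_first_on_path(os.environ))
```

If this prints `False` after the cell above has run, Spark may still pick up a different Java installation.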

## 2. Start Spark Session

In [3]:
spark = sparknlp.start()

## 3. Select the DL model

In [4]:
MODEL_NAME = 'classifierdl_use_sarcasm'

## 4. Some sample examples

In [5]:
# Example texts; each comment gives the expected label
# ("neutral" here corresponds to the model's "normal" label)
text_list = [
             #sarcasm
             """Love getting home from work knowing that in less than 8hours you're getting up to go back there again.""",
             #neutral
             """Oh my gosh! Can you imagine @JessieJ playing piano on her tour while singing a song. I would die and go to heaven. #sheisanangel""",
             #sarcasm
             """Dear Teva, thank you for waking me up every few hours by howling. Your just trying to be mother natures alarm clock.""",
             #neutral
             """The United States is a signatory to this international convention""",
             #sarcasm
             """If I could put into words how much I love waking up at am on Tuesdays I would""",
             #neutral
             """@pdomo Don't forget that Nick Foles is also the new Tom Brady. What a preseason! #toomanystudQBs #thankgodwedonthavetebow""",
             #sarcasm
             """I cant even describe how excited I am to go cook noodles for hours""",
             #neutral
             """@Will_Piper should move back up fella. I'm already here... On my own... Having loads of fun""",
             #sarcasm
             """Tweeting at work... Having sooooo much fun and honestly not bored at all #countdowntillfinish""",
             #neutral
             """I can do what I want to. I play by my own rules""",
             ]
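
The comments in `text_list` record which label each example is expected to receive. One way to compare those expectations with the model's predictions later is a small agreement score. The sketch below is illustrative only: `label_agreement` and `expected_labels` are hypothetical names, and the expected labels use `normal` instead of `neutral` because that is the label the model prints in section 7.

```python
def label_agreement(expected, predicted):
    """Return the fraction of positions where the two label lists agree."""
    if len(expected) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


# The ten examples alternate between sarcasm and non-sarcasm.
expected_labels = ["sarcasm", "normal"] * 5

# Toy call with made-up predictions, only to show the return value.
print(label_agreement(["sarcasm", "normal"], ["sarcasm", "sarcasm"]))  # prints 0.5
```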

## 5. Define Spark NLP pipeline

In [6]:
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
    
use = UniversalSentenceEncoder.pretrained(name="tfhub_use", lang="en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")


sentimentdl = ClassifierDLModel.pretrained(name=MODEL_NAME)\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("sentiment")

nlpPipeline = Pipeline(
    stages=[
        documentAssembler,
        use,
        sentimentdl
    ])


tfhub_use download started this may take some time.
Approximate size to download 923.7 MB
[OK!]
classifierdl_use_sarcasm download started this may take some time.
Approximate size to download 21.5 MB
[OK!]
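
Each stage in the pipeline reads the column that the previous stage writes: `text` to `document`, `document` to `sentence_embeddings`, and `sentence_embeddings` to `sentiment`. The sketch below checks that kind of wiring in plain Python, without Spark. `check_wiring` and the tuple layout are assumptions made for illustration, not a Spark NLP API.

```python
def check_wiring(stages, source_columns):
    """Return (stage_name, missing_column) pairs; an empty list means consistent."""
    available = set(source_columns)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        # The stage's output becomes available to every later stage.
        available.add(output)
    return problems


# Column names copied from the pipeline definition above.
stages = [
    ("documentAssembler", ["text"], "document"),
    ("use", ["document"], "sentence_embeddings"),
    ("sentimentdl", ["sentence_embeddings"], "sentiment"),
]
print(check_wiring(stages, ["text"]))  # prints []
```

A mismatch, such as a classifier reading `embeddings` when the encoder writes `sentence_embeddings`, would show up as a non-empty list.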


## 6. Run the pipeline

In [7]:
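# Every stage is pretrained, so fit() only assembles the PipelineModel;
# an empty DataFrame with a "text" column is enough.
empty_df = spark.createDataFrame([['']]).toDF("text")

pipelineModel = nlpPipeline.fit(empty_df)

# Run the fitted pipeline on the example texts
df = spark.createDataFrame(pd.DataFrame({"text": text_list}))
result = pipelineModel.transform(df)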
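
For a single string, building a DataFrame can be skipped by wrapping the fitted model in Spark NLP's `LightPipeline`. This is a hedged sketch: it assumes `annotate` returns a dict keyed by output column name with lists of strings as values, and the helper names `first_label` and `classify_one` are hypothetical.

```python
def first_label(annotation, column="sentiment"):
    """Return the first label in annotation[column], or None if there is none."""
    labels = annotation.get(column) or []
    return labels[0] if labels else None


def classify_one(pipeline_model, text):
    """Classify one string with a fitted pipeline, without creating a DataFrame."""
    # Imported lazily so first_label can be used without Spark NLP installed.
    from sparknlp.base import LightPipeline

    light = LightPipeline(pipeline_model)
    return first_label(light.annotate(text))
```

In the notebook this would be called as `classify_one(pipelineModel, text_list[0])`, where the `"sentiment"` key matches the output column set on the classifier.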

## 7. Visualize results

In [8]:

# Pair each document with its predicted label, one row per document
result.select(F.explode(F.arrays_zip('document.result', 'sentiment.result')).alias("cols")) \
    .select(F.expr("cols['0']").alias("document"),
            F.expr("cols['1']").alias("sentiment")).show(truncate=False)

+--------------------------------------------------------------------------------------------------------------------------------+---------+
|document                                                                                                                        |sentiment|
+--------------------------------------------------------------------------------------------------------------------------------+---------+
|Love getting home from work knowing that in less than 8hours you're getting up to go back there again.                          |sarcasm  |
|Oh my gosh! Can you imagine @JessieJ playing piano on her tour while singing a song. I would die and go to heaven. #sheisanangel|normal   |
|Dear Teva, thank you for waking me up every few hours by howling. Your just trying to be mother natures alarm clock.            |sarcasm  |
|The United States is a signatory to this international convention                                                               |normal   |
|If I could p