

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/ABSA_Inference.ipynb)




# **Aspect Based Sentiment Analysis in Spark NLP**

#### Model Details: https://nlp.johnsnowlabs.com/2020/12/29/ner_aspect_based_sentiment_en.html

### Spark NLP documentation and instructions:
https://nlp.johnsnowlabs.com/docs/en/quickstart

### You can find details about Spark NLP annotators here:
https://nlp.johnsnowlabs.com/docs/en/annotators

### You can find details about Spark NLP models here:
https://nlp.johnsnowlabs.com/models


## 1. Colab Setup

Install Dependencies and Libraries

In [1]:
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
# !bash colab.sh
# -p is for pyspark
# -s is for spark-nlp
# !bash colab.sh -p 3.1.1 -s 3.0.1
# by default they are set to the latest

# Install Spark NLP Display lib
! pip install --upgrade -q spark-nlp-display


Import and start the Spark session

In [2]:
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

import sparknlp
from sparknlp.annotator import *
from sparknlp.base import *

spark = sparknlp.start()

# To start the session manually instead, uncomment the block below:
# spark = SparkSession.builder \
#     .appName('Spark NLP Licensed') \
#     .master('local[*]') \
#     .config('spark.driver.memory', '16G') \
#     .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
#     .config('spark.kryoserializer.buffer.max', '2000M') \
#     .config('spark.jars.packages', 'com.johnsnowlabs.nlp:spark-nlp_2.11:' + sparknlp.version()) \
#     .getOrCreate()

## 2. Build Pipeline

In [3]:
document_assembler = DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = Tokenizer()\
    .setInputCols(['sentence']) \
    .setOutputCol('token')

word_embeddings = WordEmbeddingsModel.pretrained("glove_6B_300", "xx")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")
    
ner_model = NerDLModel.pretrained("ner_aspect_based_sentiment")\
    .setInputCols(["document", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(['sentence', 'token', 'ner']) \
    .setOutputCol('ner_chunk')

nlp_pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_model,
    ner_converter])

empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = nlp_pipeline.fit(empty_df)
light_pipeline = LightPipeline(pipeline_model)

glove_6B_300 download started this may take some time.
Approximate size to download 426.2 MB
[OK!]
ner_aspect_based_sentiment download started this may take some time.
Approximate size to download 21.3 MB
[OK!]
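
The last stage, `NerConverter`, turns the token-level tags produced by the NER model into labelled chunks. The idea can be illustrated with a minimal pure-Python sketch of IOB merging. This is a simplified illustration, not Spark NLP's implementation; the function name `merge_iob` and the example tags are hypothetical and chosen only to mirror the kind of output this notebook produces.

```python
from typing import List, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks = []
    current = None  # (list of tokens, label) for the chunk being built

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")

        # An I- tag with the same label extends the open chunk
        if prefix == "I" and current is not None and current[1] == label:
            current[0].append(token)
            continue

        # Anything else closes the open chunk, if there is one
        if current is not None:
            chunks.append((" ".join(current[0]), current[1]))
            current = None

        # A B- tag opens a new chunk; "O" and stray I- tags are dropped
        if prefix == "B":
            current = ([token], label)

    if current is not None:
        chunks.append((" ".join(current[0]), current[1]))
    return chunks


# Illustrative tags only (assumed, not taken from the model's actual output)
demo_tokens = ["friendly", "staff", "members", ","]
demo_tags = ["O", "B-POS", "I-POS", "O"]
print(merge_iob(demo_tokens, demo_tags))
```

In the pipeline itself none of this is needed: `ner_converter` performs the merge and writes the chunks to the `ner_chunk` column.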


## 3. Create example inputs

In [4]:
# Enter examples as strings in this list
input_list = [
    """From the beginning, we were met by friendly staff members, and the convienent parking at Chelsea Piers made it easy for us to get to the boat."""]

## 4. Run the pipeline

Full Pipeline (expects a Spark DataFrame)

In [5]:
df = spark.createDataFrame(pd.DataFrame({"text": input_list}))
result = pipeline_model.transform(df)

Light Pipeline (expects a list of strings)

In [6]:
lresult = light_pipeline.fullAnnotate(input_list)

## 5. Visualize results

Full Pipeline Result

In [7]:
# Using display lib
from sparknlp_display import NerVisualizer

NerVisualizer().display(result.collect()[0], 'ner_chunk', 'document')

In [8]:
# Process manually: pair each chunk's text with its metadata, one row per chunk
exploded = F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata'))
select_expression_0 = F.expr("cols['0']").alias("chunk")
select_expression_1 = F.expr("cols['1']['entity']").alias("ner_label")
result.select(exploded.alias("cols")) \
    .select(select_expression_0, select_expression_1).show(truncate=False)
# Keep the Spark DataFrame in `result`; store the pandas copy under a new name
result_pd = result.toPandas()

+-------------+---------+
|chunk        |ner_label|
+-------------+---------+
|staff members|POS      |
+-------------+---------+



Light Pipeline Result

In [9]:
# Using display lib
from sparknlp_display import NerVisualizer

NerVisualizer().display(lresult[0], 'ner_chunk', 'document')
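
Outside a notebook, where the HTML visualizer is not available, the chunk offsets can be used to produce a plain-text rendering instead. The sketch below is one way to do that. The helper `mark_chunks` and the `Chunk` stand-in are hypothetical names introduced here so the example runs without Spark; the stand-in only mimics the `begin`, `end`, `result` and `metadata` fields of an annotation. It also assumes the `end` offset is inclusive, which is worth verifying against your Spark NLP version.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as an annotation object
Chunk = namedtuple("Chunk", ["begin", "end", "result", "metadata"])


def mark_chunks(text, chunks):
    """Return `text` with each chunk wrapped as [chunk|LABEL].

    Assumes `end` is an inclusive character offset and chunks do not overlap.
    """
    pieces, cursor = [], 0
    for chunk in sorted(chunks, key=lambda c: c.begin):
        pieces.append(text[cursor:chunk.begin])  # text before the chunk
        span = text[chunk.begin:chunk.end + 1]   # the chunk itself
        pieces.append("[{}|{}]".format(span, chunk.metadata["entity"]))
        cursor = chunk.end + 1
    pieces.append(text[cursor:])                 # text after the last chunk
    return "".join(pieces)


# Demo data built by hand; offsets are computed rather than hard-coded
text = "we were met by friendly staff members"
start = text.find("staff members")
demo = [Chunk(start, start + len("staff members") - 1,
              "staff members", {"entity": "POS"})]
print(mark_chunks(text, demo))
```

With the objects from this notebook, a call along the lines of `mark_chunks(input_list[0], lresult[0]['ner_chunk'])` should work the same way, provided the offsets refer to the original document text.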

In [10]:
# Process manually
for example in lresult:
    for res in example['ner_chunk']:
        print('Token/Phrase:', res.result, 'Sentiment: ', res.metadata['entity'])

Token/Phrase: staff members Sentiment:  POS
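
When many reviews are processed at once, it is often more useful to collect the aspects under each sentiment label than to print them one by one. The following is a minimal sketch of that aggregation. The function `group_by_sentiment` and the `Chunk` stand-in are hypothetical; the stand-in mimics only the `result` and `metadata` attributes read in the loop above, and the `NEG` entry in the demo data is invented for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the annotation objects in a fullAnnotate() result
Chunk = namedtuple("Chunk", ["result", "metadata"])


def group_by_sentiment(annotated):
    """Map each sentiment label to the list of aspect phrases carrying it."""
    grouped = defaultdict(list)
    for example in annotated:
        for chunk in example["ner_chunk"]:
            grouped[chunk.metadata["entity"]].append(chunk.result)
    return dict(grouped)


# Hand-built demo in the same shape as `lresult`: one dict per input text
demo_results = [
    {"ner_chunk": [Chunk("staff members", {"entity": "POS"})]},
    {"ner_chunk": [Chunk("service", {"entity": "NEG"})]},  # invented example
]
print(group_by_sentiment(demo_results))
```

Passing `lresult` in place of `demo_results` should give the same kind of dictionary for the notebook's own inputs.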
