
## Rule-based Sentiment Analysis

In the following example, we walk through a simple use case for the straightforward SentimentDetector annotator.

This annotator works on top of a dictionary of labeled entries, each of which carries one of the following labels:
    
    positive
    negative
    revert
    increment
    decrement

These labeled entries are used to assign a sentiment score to the text.
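
To make the role of the five labels concrete, here is a minimal pure-Python sketch of dictionary-based scoring. It is not Spark NLP's implementation. The toy dictionary, the numeric weights (a base of 1.0, doubled by `increment`, halved by `decrement`), and the rule that modifiers apply to the next scored word are all assumptions chosen for illustration.

```python
# Toy rule-based scorer; NOT the SentimentDetector implementation.
# The dictionary entries and the numeric weights are assumptions.
TOY_DICT = {
    "good": "positive",
    "bad": "negative",
    "not": "revert",
    "very": "increment",
    "slightly": "decrement",
}


def toy_score(tokens, dictionary=TOY_DICT):
    """Return a signed score: above 0 leans positive, below 0 leans negative."""
    score = 0.0
    sign, weight = 1.0, 1.0  # pending modifiers for the next sentiment word
    for token in tokens:
        label = dictionary.get(token.lower())
        if label == "revert":
            sign = -sign          # flip the polarity of the next scored word
        elif label == "increment":
            weight *= 2.0         # strengthen the next scored word
        elif label == "decrement":
            weight *= 0.5         # weaken the next scored word
        elif label in ("positive", "negative"):
            base = 1.0 if label == "positive" else -1.0
            score += sign * weight * base
            sign, weight = 1.0, 1.0  # modifiers are consumed once applied
    return score


print(toy_score(["not", "very", "good"]))  # -2.0
print(toy_score(["slightly", "bad"]))      # -0.5
```

Words that are absent from the dictionary contribute nothing in this sketch, which is why the quality of the dictionary dominates the quality of the score.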

### Spark `2.4` and Spark NLP `2.0.0`

#### 1. Call necessary imports and set the resource path to read local data files

In [4]:
#Imports
import sys
sys.path.append('../../')

import sparknlp

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.sql.functions import array_contains
from sparknlp.annotator import *
from sparknlp.common import RegexRule
from sparknlp.base import DocumentAssembler, Finisher

#### 2. Load SparkSession if not already there

In [5]:
spark = sparknlp.start()

In [10]:
! wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sentiment.parquet.zip -P /tmp
! wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/lemma-corpus-small/lemmas_small.txt -P /tmp
! wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sentiment-corpus/default-sentiment-dict.txt -P /tmp    

--2019-03-17 18:01:52--  https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sentiment.parquet.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.170.117
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.170.117|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘/tmp/sentiment.parquet.zip’ not modified on server. Omitting download.

--2019-03-17 18:01:53--  https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/lemma-corpus-small/lemmas_small.txt
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.170.117
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.170.117|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘/tmp/lemmas_small.txt’ not modified on server. Omitting download.

--2019-03-17 18:01:53--  https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sentiment-corpus/default-sentiment-dict.txt
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.

In [7]:
! unzip /tmp/sentiment.parquet.zip -d /tmp/

Archive:  /tmp/sentiment.parquet.zip
   creating: /tmp/sentiment.parquet/
  inflating: /tmp/sentiment.parquet/.part-00002-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet.crc  
  inflating: /tmp/sentiment.parquet/part-00002-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet  
  inflating: /tmp/sentiment.parquet/part-00003-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet  
  inflating: /tmp/sentiment.parquet/.part-00000-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet.crc  
  inflating: /tmp/sentiment.parquet/part-00001-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet  
 extracting: /tmp/sentiment.parquet/_SUCCESS  
  inflating: /tmp/sentiment.parquet/.part-00003-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet.crc  
  inflating: /tmp/sentiment.parquet/part-00000-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet  
  inflating: /tmp/sentiment.parquet/.part-00001-08092d15-dd8c-40f9-a1df-641a1a4b1698.snappy.parquet.crc  


In [8]:
data = spark.read \
    .parquet("/tmp/sentiment.parquet") \
    .limit(10000) \
    .cache()

data.show()

+------+---------+--------------------+
|itemid|sentiment|                text|
+------+---------+--------------------+
|799033|        0|@FrankomQ8 What's...|
|799034|        1|@FranKoUK guitar ...|
|799035|        0|@frankparenteau u...|
|799036|        1|@frankparenteau w...|
|799037|        1|@FrankPatris dude...|
|799038|        0|@FrankRamblings a...|
|799039|        1|@frankroberts  ni...|
|799040|        0|@frankroberts ur ...|
|799041|        1|@FrankS Breaking ...|
|799042|        1|@frankschultelad ...|
|799043|        0|@frankshorter Wol...|
|799044|        0|@franksting - its...|
|799045|        1|@franksting Ha! D...|
|799046|        1|@franksting yeah,...|
|799047|        1|@franksting yes, ...|
|799048|        1|@FrankSylar arn't...|
|799049|        1|    @frankules WO ? |
|799050|        0|@frankwkelly I'm ...|
|799051|        1|@FrankXSalinas Th...|
|799052|        1|@frankybhoy93 tha...|
+------+---------+--------------------+
only showing top 20 rows



#### 3. Create the appropriate annotators. We detect sentences, tokenize them, and find the lemmas of those tokens. The Finisher outputs only the sentiment.

In [11]:
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

lemmatizer = Lemmatizer() \
    .setInputCols(["token"]) \
    .setOutputCol("lemma") \
    .setDictionary("/tmp/lemmas_small.txt", key_delimiter="->", value_delimiter="\t")
        
sentiment_detector = SentimentDetector() \
    .setInputCols(["lemma", "sentence"]) \
    .setOutputCol("sentiment_score") \
    .setDictionary("/tmp/default-sentiment-dict.txt", ",")
    
finisher = Finisher() \
    .setInputCols(["sentiment_score"]) \
    .setOutputCols(["sentiment"])
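
The delimiters passed to the two `setDictionary` calls describe the layout of the resource files. The pure-Python sketch below shows how such lines could be split, assuming the lemma file puts a lemma before `->` followed by tab-separated word forms, and the sentiment file puts a word and its label on either side of a comma. The helper names are hypothetical and the sample lines are made up for illustration, not copied from the downloaded files.

```python
def parse_lemma_line(line, key_delimiter="->", value_delimiter="\t"):
    """Split 'lemma -> form1<TAB>form2' into (lemma, [forms])."""
    key, _, values = line.partition(key_delimiter)
    forms = [v.strip() for v in values.split(value_delimiter) if v.strip()]
    return key.strip(), forms


def parse_sentiment_line(line, delimiter=","):
    """Split 'word,label' into (word, label)."""
    word, _, label = line.partition(delimiter)
    return word.strip(), label.strip()


# Hypothetical sample lines (assumptions, not taken from the real files).
lemma, forms = parse_lemma_line("be -> is\tare\twas")
word, label = parse_sentiment_line("good,positive")

# Invert the lemma entry so each word form points back to its lemma.
form_to_lemma = {form: lemma for form in forms}
print(form_to_lemma)
print(word, label)
```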

#### 4. Fit the pipeline. Fitting here relies only on the external resources (the lemma and sentiment dictionaries), not on the dataset we pass in. The prediction then runs on the target dataset.

In [None]:
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, lemmatizer, sentiment_detector, finisher])
model = pipeline.fit(data)
result = model.transform(data)
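
The order of the stages matters because each annotator reads columns produced by an earlier one. The sketch below checks that dependency chain in plain Python. It is only an illustration, not how Spark ML validates a pipeline. The column names are copied from the annotator setup, `document` is assumed to be the DocumentAssembler's output column, and `check_wiring` is a hypothetical helper.

```python
# (stage name, input columns, output column), in pipeline order.
stages = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("lemmatizer", ["token"], "lemma"),
    ("sentiment_detector", ["lemma", "sentence"], "sentiment_score"),
    ("finisher", ["sentiment_score"], "sentiment"),
]


def check_wiring(stages, initial_columns):
    """Raise ValueError if a stage reads a column nothing has produced yet."""
    available = set(initial_columns)
    for name, inputs, output in stages:
        missing = [c for c in inputs if c not in available]
        if missing:
            raise ValueError(f"{name} is missing input columns: {missing}")
        available.add(output)
    return available


# Columns of the input dataset as shown by data.show(). The dataset already
# has a "sentiment" column; this sketch does not model name collisions.
columns = check_wiring(stages, ["itemid", "sentiment", "text"])
print(sorted(columns))
```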

#### 5. Filter the Finisher output to find the lines with positive sentiment

In [None]:
result.where(array_contains(result.sentiment, "positive")).show(10, truncate=False)
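
The filter keeps a row when its `sentiment` array contains the string `"positive"` anywhere. A plain-Python equivalent is sketched below, assuming the Finisher emits one label per detected sentence. The rows are hypothetical and only mimic the shape of the output.

```python
# Hypothetical rows shaped like the Finisher output (assumption:
# one sentiment label per detected sentence in the text).
rows = [
    {"itemid": 1, "sentiment": ["positive"]},
    {"itemid": 2, "sentiment": ["negative", "positive"]},
    {"itemid": 3, "sentiment": ["negative"]},
]

# Same membership test that array_contains(col, "positive") expresses.
positive_rows = [r for r in rows if "positive" in r["sentiment"]]
print([r["itemid"] for r in positive_rows])  # [1, 2]
```

Under that assumption, a text with mixed sentences (row 2 here) passes the filter as soon as one of its sentences is scored positive.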