<img src="https://nlp.johnsnowlabs.com/assets/images/logo.png" width="180" height="50" style="float: left;">

## Sentiment Analysis Pipeline

This pipeline demonstrates several important features of the Spark-NLP library: Sentence Detection, Tokenization, Spell Checking, and Sentiment Detection.
The idea is to start with natural language as a user might have entered it and obtain the sentiment associated with it. Let's walk through each of the stages!
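
To make the flow of the four stages concrete before loading the real pipeline, here is a toy, library-free sketch in plain Python. It does not use Spark-NLP and does not reflect how the library's annotators work internally: the function names, the tiny correction dictionary, and the word lists are all invented for illustration.

```python
import re

# Hypothetical lookup tables, invented for this illustration only.
CORRECTIONS = {"disapointed": "disappointed", "awoid": "avoid", "amesome": "awesome"}
POSITIVE = {"awesome", "nice", "good"}
NEGATIVE = {"disappointed", "uninspired", "avoid"}


def detect_sentences(text):
    # Split wherever sentence-ending punctuation is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Runs of word characters and single punctuation marks become tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)


def check_spelling(tokens):
    # Replace known misspellings; leave every other token unchanged.
    return [CORRECTIONS.get(t.lower(), t) for t in tokens]


def detect_sentiment(tokens):
    # Count sentiment words, flipping the sign of a word preceded by "not".
    score = 0
    previous = ""
    for token in tokens:
        word = token.lower()
        value = (word in POSITIVE) - (word in NEGATIVE)
        score += -value if previous == "not" else value
        previous = word
    # Ties are arbitrarily reported as "positive" in this sketch.
    return "positive" if score >= 0 else "negative"


def analyze(text):
    # Chain the stages: sentences -> tokens -> corrected tokens -> label.
    tokens = []
    for sentence in detect_sentences(text):
        tokens.extend(check_spelling(tokenize(sentence)))
    return detect_sentiment(tokens)


docs = [
    "I felt so disapointed to see this very uninspired film. I recommend others to awoid this movie is not good.",
    "This was movie was amesome, everything was nice.",
]
labels = [analyze(d) for d in docs]
print(labels)
```

Each stage consumes the output of the previous one, which is the same chaining idea the real pipeline relies on, only with far more capable annotators.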


### Spark `2.4` and Spark NLP `???`

#### 1. Call necessary imports and set the resource path to read local data files

In [7]:
# Imports
import sys
sys.path.append('../../')  # extend the module search path two directories up

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains
from sparknlp.annotator import *
from pyspark.ml import Pipeline, PipelineModel

# Location of the pre-trained pipelines
resource_path = "../demo_pipelines/movies_sentiment"

#### 2. Load SparkSession if not already there

In [8]:
spark = SparkSession.builder \
    .appName("Sentiment_Rule_Based")\
    .master("local[*]")\
    .config("spark.driver.memory","8G")\
    .config("spark.driver.maxResultSize", "2G")\
    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.3")\
    .config("spark.kryoserializer.buffer.max", "500m")\
    .config("spark.jars", "/tmp/sparknlp.jar")\
    .config("spark.driver.extraClassPath", "/tmp/sparknlp.jar")\
    .config("spark.executor.extraClassPath", "/tmp/sparknlp.jar")\
    .getOrCreate()

#### 3. Load our predefined pipeline containing all the important annotators.

In [9]:
pipeline = PipelineModel.load("../demo_pipelines/movies_sentiment_analysis_en_1.8.0_2.4_1552694056982")

#### 4. Create some user opinions about movies. Keep an eye on the spelling; we'll get back to that soon.

In [10]:
testDocs = [
    "I felt so disapointed to see this very uninspired film. I recommend others to awoid this movie is not good.",
    "This was movie was amesome, everything was nice."]

In [14]:
from sparknlp.base import LightPipeline
lp = LightPipeline(pipeline)
result = lp.annotate(testDocs)
[(r['sentence'], r['my_sda_scores']) for r in result]

[(['I felt so disapointed to see this very uninspired film.',
   'I recommend others to awoid this movie is not good.'],
  ['negative']),
 (['This was movie was amesome, everything was nice.'], ['positive'])]

#### Let's check the sentiment
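
In the output shown for the two test documents, each entry pairs a document's sentences with one label under `my_sda_scores`. A minimal sketch of tallying those labels, using values copied from that output rather than a live Spark session (the names `rows` and `counts` are only illustrative):

```python
from collections import Counter

# (sentences, labels) pairs copied from the notebook output for testDocs.
rows = [
    (['I felt so disapointed to see this very uninspired film.',
      'I recommend others to awoid this movie is not good.'],
     ['negative']),
    (['This was movie was amesome, everything was nice.'], ['positive']),
]

# Count how many documents received each label.
counts = Counter(label for _, labels in rows for label in labels)
print(dict(counts))
```

With a larger batch of opinions, the same tally gives a quick overview of how the labels are distributed.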

#### [Optional] Inspect intermediate stages: spell checking

In [18]:
[list(zip(r['token'],r['checked_token'])) for r in result]

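
Zipping the two columns yields every token pair, most of which are identical. A small sketch that keeps only the tokens the spell checker actually changed, using the `token` and `checked_token` values of the first document as printed for `result` in this notebook (the helper name `changed_tokens` is hypothetical):

```python
# Values copied from the first entry of `result` as printed in this notebook.
tokens = ['I', 'felt', 'so', 'disapointed', 'to', 'see', 'this', 'very',
          'uninspired', 'film', '.', 'I', 'recommend', 'others', 'to',
          'awoid', 'this', 'movie', 'is', 'not', 'good', '.']
checked = ['I', 'felt', 'so', 'disappointed', 'to', 'see', 'this', 'very',
           'uninspired', 'film', '.', 'I', 'recommend', 'others', 'to',
           'avoid', 'this', 'movie', 'is', 'not', 'good', '.']


def changed_tokens(original, corrected):
    # Pair tokens by position and keep only the pairs that differ.
    return [(o, c) for o, c in zip(original, corrected) if o != c]


corrections = changed_tokens(tokens, checked)
print(corrections)
# → [('disapointed', 'disappointed'), ('awoid', 'avoid')]
```

This relies on both lists having the same length and order, which holds for the output shown here.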

In [16]:
result


[{'checked_token': ['I',
   'felt',
   'so',
   'disappointed',
   'to',
   'see',
   'this',
   'very',
   'uninspired',
   'film',
   '.',
   'I',
   'recommend',
   'others',
   'to',
   'avoid',
   'this',
   'movie',
   'is',
   'not',
   'good',
   '.'],
  'document': ['I felt so disapointed to see this very uninspired film. I recommend others to awoid this movie is not good.'],
  'my_sda_scores': ['negative'],
  'sentence': ['I felt so disapointed to see this very uninspired film.',
   'I recommend others to awoid this movie is not good.'],
  'token': ['I',
   'felt',
   'so',
   'disapointed',
   'to',
   'see',
   'this',
   'very',
   'uninspired',
   'film',
   '.',
   'I',
   'recommend',
   'others',
   'to',
   'awoid',
   'this',
   'movie',
   'is',
   'not',
   'good',
   '.']},
 {'checked_token': ['This',
   'was',
   'movie',
   'was',
   'awesome',
   ',',
   'everything',
   'was',
   'nice',
   '.'],
  'document': ['This was movie was amesome, everything was nice.