![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/dictionary-sentiment/sentiment_rb.ipynb)

## 0. Colab Setup

In [0]:
import os

# Install java
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! java -version

# Install pyspark
! pip install --ignore-installed -q pyspark==2.4.4

# Install Spark NLP
! pip install --ignore-installed -q spark-nlp==2.5.0

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

## Sentiment Analysis Pipeline

This pipeline demonstrates several important features of the Spark NLP library: sentence detection, tokenization, spell checking, and sentiment detection.
The idea is to start with natural language as a user might have entered it and obtain the sentiment associated with it. Let's walk through each of the stages!


#### 1. Import the necessary modules

In [0]:
#Imports
import sys

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.functions import array_contains

from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline

#### 2. Start the SparkSession if one is not already running

In [0]:
import sparknlp

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Spark NLP version:  2.5.0
Apache Spark version:  2.4.4


#### 3. Load the pretrained pipeline, which contains all the annotators we need

In [0]:
pipeline = PretrainedPipeline("analyze_sentiment", lang="en")

analyze_sentiment download started this may take some time.
Approx size to download 4.9 MB
[OK!]


#### 4. Create some user opinions about movies. Keep an eye on the spelling; we'll get back to that soon.

In [0]:
testDocs = [
    "I felt so disapointed to see this very uninspired film. I recommend others to awoid this movie is not good.",
    "This was movie was amesome, everything was nice."]

In [0]:
result = pipeline.annotate(testDocs)
[(r['sentence'], r['sentiment']) for r in result]

[(['I felt so disapointed to see this very uninspired film.',
   'I recommend others to awoid this movie is not good.'],
  ['positive', 'negative']),
 (['This was movie was amesome, everything was nice.'], ['negative'])]
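The nested output above can be easier to read as one row per sentence. The following is a minimal sketch, not part of the Spark NLP API: `pair_sentences_with_sentiment` is a hypothetical helper, and `sample_result` is a hand-copied excerpt of the output shown above rather than a live call to `pipeline.annotate`. It assumes each document carries exactly one sentiment label per sentence, which holds for the output shown here but may not hold for every pipeline.

```python
# Excerpt copied from the notebook output above. In a live session this would be
# the list returned by pipeline.annotate(testDocs).
sample_result = [
    {
        "sentence": [
            "I felt so disapointed to see this very uninspired film.",
            "I recommend others to awoid this movie is not good.",
        ],
        "sentiment": ["positive", "negative"],
    },
    {
        "sentence": ["This was movie was amesome, everything was nice."],
        "sentiment": ["negative"],
    },
]


def pair_sentences_with_sentiment(annotations):
    """Hypothetical helper: flatten to (doc_index, sentence, label) rows.

    Assumes one sentiment label per sentence in each document.
    """
    rows = []
    for doc_index, doc in enumerate(annotations):
        for sentence, label in zip(doc["sentence"], doc["sentiment"]):
            rows.append((doc_index, sentence, label))
    return rows


rows = pair_sentences_with_sentiment(sample_result)
for doc_index, sentence, label in rows:
    print(f"doc {doc_index} | {label:8} | {sentence}")
```

Flattening this way makes it easier to spot questionable labels, such as the first sentence being tagged `positive`.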

#### [Optional] Inspect intermediate stages: spell checking
As the output below shows, the `checked` field suggests `avoid` instead of `awoid`.
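One way to list the corrections without scanning the full output by eye is to compare the `token` and `checked` lists position by position. This is a sketch under stated assumptions: `spelling_corrections` is a hypothetical helper, the two lists are hand-copied from this notebook's output for the first document, and the comparison assumes `token` and `checked` align one-to-one (true for the output shown in this notebook, not guaranteed in general).

```python
# Values copied from the 'token' and 'checked' fields of the first document
# in this notebook's output.
tokens = [
    "I", "felt", "so", "disapointed", "to", "see", "this", "very",
    "uninspired", "film", ".", "I", "recommend", "others", "to", "awoid",
    "this", "movie", "is", "not", "good", ".",
]
checked = [
    "I", "felt", "so", "disappointed", "to", "see", "this", "very",
    "uninspired", "film", ".", "I", "recommend", "others", "to", "avoid",
    "this", "movie", "is", "not", "good", ".",
]


def spelling_corrections(original, corrected):
    """Hypothetical helper: return (original, corrected) pairs that differ.

    Assumes both lists have the same length and are aligned by position.
    """
    if len(original) != len(corrected):
        raise ValueError("token and checked lists are not aligned")
    return [(o, c) for o, c in zip(original, corrected) if o != c]


corrections = spelling_corrections(tokens, checked)
print(corrections)
# [('disapointed', 'disappointed'), ('awoid', 'avoid')]
```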

In [0]:
result

[{'checked': ['I',
   'felt',
   'so',
   'disappointed',
   'to',
   'see',
   'this',
   'very',
   'uninspired',
   'film',
   '.',
   'I',
   'recommend',
   'others',
   'to',
   'avoid',
   'this',
   'movie',
   'is',
   'not',
   'good',
   '.'],
  'document': ['I felt so disapointed to see this very uninspired film. I recommend others to awoid this movie is not good.'],
  'sentence': ['I felt so disapointed to see this very uninspired film.',
   'I recommend others to awoid this movie is not good.'],
  'sentiment': ['positive', 'negative'],
  'token': ['I',
   'felt',
   'so',
   'disapointed',
   'to',
   'see',
   'this',
   'very',
   'uninspired',
   'film',
   '.',
   'I',
   'recommend',
   'others',
   'to',
   'awoid',
   'this',
   'movie',
   'is',
   'not',
   'good',
   '.']},
 {'checked': ['This',
   'was',
   'movie',
   'was',
   'awesome',
   ',',
   'everything',
   'was',
   'nice',
   '.'],
  'document': ['This was movie was amesome, everything was nice.'],
