# Objectives

1. What is **Sentiment Analysis**? What practical use cases does it have?
2. Rule-based sentiment analysis using Vader 
3. What other approaches can be used for **Sentiment Analysis**?

# Warm-up

**Take a look at the [Vader GitHub repo](https://github.com/cjhutto/vaderSentiment) and try to answer these questions:**

1. What is sentiment analysis?

2. What can we find in the **lexicon**, and more specifically, what do the four values represent?

3. Does Vader take punctuation into account? Which words does Vader consider to intensify a sentiment?

4. How does Vader score a text as a whole?

## 1. What is Sentiment Analysis?

**1. What is Sentiment Analysis?**

- Sentiment analysis is the process of automatically working out whether a piece of text expresses a positive, negative or neutral sentiment.
- Most sentiment analysis approaches take one of two forms:
    1. polarity-based, where pieces of text are classified as either positive or negative;
    2. valence-based, where the intensity of the sentiment is also taken into account.
- For example, the words ‘good’ and ‘excellent’ would be treated the same in a polarity-based approach, whereas ‘excellent’ would be treated as more positive than ‘good’ in a valence-based approach.
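
To make the distinction concrete, here is a minimal sketch with a toy lexicon. The valence numbers are invented for illustration (they are not Vader's values), and `polarity` and `valence` are hypothetical helper names.

```python
# Toy lexicon: the valence values are hypothetical, chosen only for illustration
TOY_LEXICON = {"good": 2.0, "excellent": 3.0, "bad": -2.0}


def polarity(word: str) -> str:
    """Polarity-based view: only the sign of the sentiment matters."""
    score = TOY_LEXICON.get(word.lower(), 0.0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def valence(word: str) -> float:
    """Valence-based view: the magnitude (intensity) is kept as well."""
    return TOY_LEXICON.get(word.lower(), 0.0)


for w in ["good", "excellent", "bad", "table"]:
    print(f"{w:<10} polarity={polarity(w):<8} valence={valence(w)}")
```

Both ‘good’ and ‘excellent’ come out as "positive" under `polarity`, while `valence` still separates them.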


**2. What can we find in the *lexicon*, and more specifically, what do the four values represent?**

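- The Vader lexicon is stored in the text file *vader_lexicon.txt*.
- The file is tab-delimited, with four columns: TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS.
- Each token was rated by 10 independent human raters on a scale from -4 (extremely negative) to +4 (extremely positive), with 0 meaning neutral: the second column is the mean of those ratings, the third is their standard deviation, and the fourth is the list of individual ratings.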
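
A minimal sketch of reading one line of that file, assuming the fourth column is written as a Python-style list such as `[2, 1, 3]`. The `LexiconEntry` class and `parse_lexicon_line` function are hypothetical names, and the sample line (including its token) is invented, not copied from *vader_lexicon.txt*.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class LexiconEntry:
    token: str
    mean_rating: float      # mean of the human ratings (-4 = extremely negative, +4 = extremely positive)
    std_dev: float          # standard deviation of those ratings
    raw_ratings: List[int]  # the individual human ratings


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Split one tab-delimited lexicon line into its four fields."""
    token, mean, std, raw = line.rstrip("\n").split("\t")
    # Assumption: the raw ratings are stored as a Python-style list literal
    return LexiconEntry(token, float(mean), float(std), ast.literal_eval(raw))


# Invented sample line, only to illustrate the format
entry = parse_lexicon_line("sampletoken\t3.0\t0.44721\t[3, 3, 2, 4, 3, 3, 3, 3, 3, 3]")
print(entry)
print("mean of raw ratings:", sum(entry.raw_ratings) / len(entry.raw_ratings))
```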

**3. Does Vader take punctuation into account? Which words does Vader consider to intensify a sentiment?**

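Vader is not trained on data in the machine-learning sense: it is a lexicon- and rule-based tool that is specifically attuned to sentiment expressed in social media text such as tweets. It takes the following into consideration: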

- typical negations (e.g., "not good")
- use of contractions as negations (e.g., "wasn't very good")
- conventional use of punctuation to signal increased sentiment intensity (e.g., "Good!!!")
- conventional use of word-shape to signal emphasis (e.g., using ALL CAPS for words/phrases)
- using degree modifiers to alter sentiment intensity (e.g., intensity boosters such as "very" and intensity dampeners such as "kind of")
- understanding many sentiment-laden slang words (e.g., 'sux')
- understanding many sentiment-laden slang words as modifiers such as 'uber' or 'friggin' or 'kinda'
- understanding many sentiment-laden emoticons such as :) and :D
- translating UTF-8-encoded emojis such as 💘 and 💋 and 😁
- understanding sentiment-laden initialisms and acronyms (for example: 'lol')

**4. How does Vader score a text as a whole?**

- Vader gives each text four scores: neg, neu, pos and compound.
- The neg, neu and pos scores are ratios for the proportions of the text that fall into each category, so they should add up to 1 (or very close to it, because of floating-point rounding).
- The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules above, and then normalizing the sum to lie between -1 (most extreme negative) and +1 (most extreme positive).
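
The combination step can be sketched in a few lines. This is a simplified re-implementation for illustration, not the library's own code: it assumes the per-token valences have already been computed (lexicon lookup plus the rules above), uses the normalization `x / sqrt(x^2 + alpha)` with `alpha = 15` as in the `vaderSentiment` source, and takes the punctuation boost as an input. `combine_scores` is a hypothetical name.

```python
import math
from typing import Dict, List


def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into [-1, 1] via x / sqrt(x^2 + alpha)."""
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))


def combine_scores(valences: List[float], punct_amp: float = 0.0) -> Dict[str, float]:
    """Simplified sketch: turn per-token valences into neg/neu/pos/compound scores."""
    # Compound: sum the valences, then push the sum further from 0 by the punctuation boost
    sum_s = sum(valences)
    if sum_s > 0:
        sum_s += punct_amp
    elif sum_s < 0:
        sum_s -= punct_amp
    compound = normalize(sum_s)

    # Ratios: a sentiment-bearing token counts as |valence| + 1, a neutral token as 1
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    # The punctuation boost goes to whichever side dominates
    if pos_sum > abs(neg_sum):
        pos_sum += punct_amp
    elif pos_sum < abs(neg_sum):
        neg_sum -= punct_amp

    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg_sum / total), 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(compound, 4),
    }


# "The food is good.": four tokens, only "good" carries a valence (assumed 1.9)
print(combine_scores([0, 0, 0, 1.9]))
```

With 1.9 for ‘good’ (the value that reproduces the 0.4404 this notebook reports for "The food is good."), the sketch returns `{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}`.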

## 2. Rule-based sentiment analysis using Vader

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()
```

```python
def print_sentiment_scores(sentence):
    """Print a sentence followed by its Vader scores (neg, neu, pos, compound)."""
    snt = analyser.polarity_scores(sentence)
    print("{} : {}".format(sentence, snt))
```
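
`print_sentiment_scores` prints the raw dictionary. When a single label is needed, the Vader README suggests typical thresholds on the compound score: at least 0.05 is positive, at most -0.05 is negative, and anything in between is neutral. `label_from_compound` is a hypothetical helper sketching that convention.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a label using the +/-0.05 convention from the Vader README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Sample compound scores
for score in (0.0, -0.4404, 0.639):
    print(score, label_from_compound(score))
```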

```python
print_sentiment_scores("I just got a call from my boss - does he realise it's Saturday?")
```

```
I just got a call from my boss - does he realise it's Saturday? : {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
```


```python
# What happens when we add an emoticon?

print_sentiment_scores("I just got a call from my boss - does he realise it's Saturday? :(")
```

```
I just got a call from my boss - does he realise it's Saturday? :( : {'neg': 0.172, 'neu': 0.828, 'pos': 0.0, 'compound': -0.4404}
```


```python
# Let’s now add the acronym ‘smh’ (shaking my head) and see what happens

print_sentiment_scores("I just got a call from my boss - does he realise it's Saturday? smh :(")
```

```
I just got a call from my boss - does he realise it's Saturday? smh :( : {'neg': 0.271, 'neu': 0.729, 'pos': 0.0, 'compound': -0.6369}
```


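```python
# Vader also takes the context of a word into account
# Start from a plain positive sentence as a baseline, then modify it in the next cells

print_sentiment_scores("The food is good.")
```

```
The food is good. : {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
```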


```python
# Let's introduce capitalization

print_sentiment_scores("The food is GOOD.")
```

```
The food is GOOD. : {'neg': 0.0, 'neu': 0.452, 'pos': 0.548, 'compound': 0.5622}
```
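
The all-caps boost can be sketched as follows. The constant 0.733 is the value of `C_INCR` in the `vaderSentiment` source (check your installed version); the helper names are hypothetical. The rule only fires when the text mixes ALL-CAPS and non-ALL-CAPS words, so a sentence written entirely in capitals gets no boost.

```python
C_INCR = 0.733  # all-caps boost; value taken from the vaderSentiment source (may differ by version)


def has_cap_differential(tokens):
    """True if some, but not all, tokens are written in ALL CAPS."""
    all_caps = sum(1 for t in tokens if t.isupper())
    return 0 < all_caps < len(tokens)


def caps_emphasis(token: str, valence: float, cap_diff: bool) -> float:
    """Push an ALL-CAPS word's valence away from zero by C_INCR."""
    if token.isupper() and cap_diff:
        if valence > 0:
            return valence + C_INCR
        if valence < 0:
            return valence - C_INCR
    return valence


tokens = "The food is GOOD".split()
diff = has_cap_differential(tokens)
print(diff, caps_emphasis("GOOD", 1.9, diff))  # 1.9 is the assumed valence of "good"
```

With ‘good’ at 1.9, the boosted valence is 2.633, and 2.633 / sqrt(2.633² + 15) ≈ 0.5622, the compound score shown above.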


```python
# Exclamation marks also increase the intensity of a sentence's sentiment

print_sentiment_scores("The food is GOOD!!")
```

```
The food is GOOD!! : {'neg': 0.0, 'neu': 0.416, 'pos': 0.584, 'compound': 0.639}
```
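
Punctuation emphasis is added to the summed valence before normalization. A sketch of the amplifier, using the constants from the `vaderSentiment` source (0.292 per exclamation mark, capped at four; 0.18 per question mark when there are two or three, 0.96 for more than three, nothing for a single one); `punctuation_amplifier` is a hypothetical name.

```python
def punctuation_amplifier(text: str) -> float:
    """Sketch of Vader-style punctuation emphasis for '!' and '?'."""
    # Exclamation marks: 0.292 each, counting at most four
    ep_amp = min(text.count("!"), 4) * 0.292

    # Question marks only add emphasis when there is more than one
    qm_count = text.count("?")
    if qm_count <= 1:
        qm_amp = 0.0
    elif qm_count <= 3:
        qm_amp = qm_count * 0.18
    else:
        qm_amp = 0.96
    return ep_amp + qm_amp


for s in ["The food is good.", "The food is GOOD!!", "Really??", "GOOD!!!!!!"]:
    print(repr(s), punctuation_amplifier(s))
```

Assuming ‘GOOD’ has a valence of 2.633 (1.9 for ‘good’ plus the all-caps boost), the two exclamation marks add 2 × 0.292 = 0.584, and 3.217 normalizes to about 0.639, as in the cell above.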


```python
# Vader also accounts for degree modifiers placed in front of a sentiment term
# For example, “extremely bad” increases the negative intensity of a sentence, whereas “kinda bad” decreases it

print_sentiment_scores("The food is really GOOD!!")
```

```
The food is really GOOD!! : {'neg': 0.0, 'neu': 0.47, 'pos': 0.53, 'compound': 0.6715}
```


```python
print_sentiment_scores("The food is kinda GOOD!!")
```

```
The food is kinda GOOD!! : {'neg': 0.0, 'neu': 0.505, 'pos': 0.495, 'compound': 0.6025}
```
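
Degree modifiers add a fixed scalar to the valence of the sentiment word that follows them. This sketch assumes the `vaderSentiment` constants `B_INCR = 0.293` and `B_DECR = -0.293`, and it ignores the small decay the library applies when the modifier sits two or three words before the sentiment word; `apply_booster` is a hypothetical helper.

```python
B_INCR = 0.293   # booster, e.g. "really", "very" (value from the vaderSentiment source)
B_DECR = -0.293  # dampener, e.g. "kinda", "kind of"


def apply_booster(valence: float, scalar: float) -> float:
    """Add a degree-modifier scalar to a word's valence (simplified: no distance decay)."""
    if valence < 0:
        # Flip the sign so that a booster makes a negative word more negative
        scalar = -scalar
    return valence + scalar


good_caps = 1.9 + 0.733  # assumed valence of "GOOD": 1.9 for "good" plus the all-caps boost
print(apply_booster(good_caps, B_INCR))  # "really GOOD"
print(apply_booster(good_caps, B_DECR))  # "kinda GOOD"
print(apply_booster(-1.9, B_INCR))       # a booster in front of a hypothetical negative word
```

Adding the 0.584 boost from the two exclamation marks and normalizing with x / sqrt(x² + 15) gives about 0.6715 for "really" and 0.6025 for "kinda", the compound scores of the two cells above.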


```python
# Vader also handles changes in a sentence's sentiment intensity when it contains ‘but’
# The sentiments expressed both before and after ‘but’ are taken into account,
# but the sentiment after it is weighted more heavily than the sentiment before it

print_sentiment_scores("The food is really GOOD! But the service is dreadful.")
```

```
The food is really GOOD! But the service is dreadful. : {'neg': 0.284, 'neu': 0.548, 'pos': 0.169, 'compound': -0.3977}
```
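
The ‘but’ rule can be sketched as a reweighting of the per-token valences. The weights 0.5 (before ‘but’) and 1.5 (after it) are the ones used in the `vaderSentiment` source; the helper name and the valences in the example are hypothetical.

```python
from typing import List


def apply_but_rule(tokens: List[str], valences: List[float]) -> List[float]:
    """Halve the valences before 'but' and multiply those after it by 1.5."""
    lowered = [t.lower() for t in tokens]
    if "but" not in lowered:
        return list(valences)
    bi = lowered.index("but")  # only the first 'but' is used in this sketch
    weighted = []
    for i, v in enumerate(valences):
        if i < bi:
            weighted.append(v * 0.5)
        elif i > bi:
            weighted.append(v * 1.5)
        else:
            weighted.append(v)
    return weighted


tokens = "The food is nice but the service is slow".split()
valences = [0, 0, 0, 2.0, 0, 0, 0, 0, -2.0]  # hypothetical valences
weighted = apply_but_rule(tokens, valences)
print(sum(valences), "->", sum(weighted))
```

Two opposite sentiments of equal strength cancel out without the rule, but after reweighting the clause following ‘but’ dominates and the sum turns negative, which is the same effect seen in the cell above.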


## 3. What other approaches can be used for Sentiment Analysis?