# Lab5.3 - Sentiment analysis using VADER

Copyright, Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

In this notebook, we introduce how to use [VADER](https://github.com/cjhutto/vaderSentiment), as included in NLTK, to perform sentiment analysis.

**At the end of this notebook, you will**:
* have VADER installed on your computer
* be able to load the VADER model
* be able to apply the VADER model to new sentences:
    * with and without lemmatization
    * with only certain parts of speech as input, e.g., only the adjectives from a sentence

## 1. The VADER package in NLTK
Please run the following cells first to import NLTK and download the VADER lexicon to your computer.

In [1]:
import nltk

When using VADER for the first time, you need to download its lexicon through NLTK. If you have done this before, you can comment out the next cell.

In [3]:
nltk.download('vader_lexicon', quiet=False)

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/piek/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

To check that NLTK's VADER module is available, you can run the following import. The lexicon itself is only read when the model is instantiated further below.

In [4]:
from nltk.sentiment import vader

VADER is a rule-based system that makes use of a lexicon. The lexicon can be found [here](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt).
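
To make the idea of a lexicon concrete, here is a minimal sketch of how such a file can be read into a dictionary. It assumes the tab-separated layout of `vader_lexicon.txt` (token, mean valence, standard deviation, raw ratings). The function name `parse_lexicon` and the two sample entries are made up for illustration and are not copied from the real lexicon.

```python
import io

# Two made-up lines in the assumed tab-separated layout:
# token <TAB> mean valence <TAB> standard deviation <TAB> raw ratings.
# "goodword" and "badword" are placeholder tokens, not real lexicon entries.
SAMPLE_LEXICON = (
    "goodword\t2.0\t0.5\t[2, 2, 1, 3, 2, 2, 2, 2, 3, 1]\n"
    "badword\t-1.5\t0.7\t[-1, -2, -1, -2, -2, -1, -1, -2, -2, -1]\n"
)


def parse_lexicon(lines):
    """Map each token to its mean valence; the other columns are ignored."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        token, valence = line.split("\t")[:2]
        lexicon[token] = float(valence)
    return lexicon


lexicon = parse_lexicon(io.StringIO(SAMPLE_LEXICON))
print(lexicon)
```

A rule-based system of this kind looks up each input word in such a dictionary and then adjusts the values with rules, for example for negation or intensifiers.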

The model can be loaded in the following way.

In [10]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [11]:
vader_model = SentimentIntensityAnalyzer()

We will use the following three sentences:

In [12]:
sentences = ["Here are my sentences.",
             "It's a nice day.",
             "It's a rainy day."] 

The next for loop assigns a sentiment score from VADER to **each sentence**.

In [13]:
for sent in sentences:
    scores = vader_model.polarity_scores(sent)
    print()
    print('INPUT SENTENCE', sent)
    print('VADER OUTPUT', scores)


INPUT SENTENCE Here are my sentences.
VADER OUTPUT {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516}

INPUT SENTENCE It's a nice day.
VADER OUTPUT {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215}

INPUT SENTENCE It's a rainy day.
VADER OUTPUT {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772}
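
The `compound` value is often turned into a label. The sketch below uses the thresholds commonly suggested for VADER (at least 0.05 for positive, at most -0.05 for negative). The helper name `label_from_compound` is hypothetical, and the thresholds are a convention rather than a fixed part of the model, so treat them as an assumption to tune for your own data.

```python
def label_from_compound(scores, threshold=0.05):
    """Turn a VADER score dict into 'positive', 'negative' or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dicts copied from the output above
outputs = {
    "Here are my sentences.": {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516},
    "It's a nice day.": {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215},
    "It's a rainy day.": {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772},
}

for sentence, scores in outputs.items():
    print(sentence, '->', label_from_compound(scores))
```

With these thresholds, the first sentence only just counts as positive, which shows how sensitive the label is to the chosen cut-off.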


## 2. Combining VADER with spaCy

We can manipulate the input to VADER in two ways: by providing lemmas instead of the original words, and by only providing words with certain parts of speech, e.g., only adjectives. We use spaCy for the lemmatization and part-of-speech tagging.

In [14]:
import spacy
# load the small English model by its full name (the 'en' shortcut only exists in spaCy 2.x)
nlp = spacy.load('en_core_web_sm')

The next function takes text, processes it using spaCy and applies VADER according to different settings.

In [15]:
# Custom function that expects text as input. It:
# 1) runs spaCy on the text,
# 2) prepares the spaCy tokens for VADER,
# 3) runs VADER on the prepared input,
# and finally returns the VADER scores.

def run_vader(textual_unit, 
              lemmatize=False, 
              parts_of_speech_to_consider=None,
              verbose=0):
    """
    Run VADER on a textual unit that is first processed with spaCy
    
    :param str textual_unit: a textual unit, e.g., a sentence or several sentences (one string);
    all its sentences are processed by looping over doc.sents
    :param bool lemmatize: if True, provide lemmas to VADER instead of words
    :param set parts_of_speech_to_consider:
    -None or empty set -> all parts of speech are provided
    -non-empty set: only these parts of speech are considered
    :param int verbose: if set to 1, information is printed
    about input and output
    
    :rtype: dict
    :return: vader output dict
    """
    doc = nlp(textual_unit)
        
    input_to_vader = []

    for sent in doc.sents:
        for token in sent:

            to_add = token.text

            if lemmatize:
                to_add = token.lemma_

                # spaCy 2.x uses '-PRON-' as the lemma of pronouns; keep the word itself
                if to_add == '-PRON-': 
                    to_add = token.text

            if parts_of_speech_to_consider:
                if token.pos_ in parts_of_speech_to_consider:
                    input_to_vader.append(to_add) 
            else:
                input_to_vader.append(to_add)

    scores = vader_model.polarity_scores(' '.join(input_to_vader))
    
    if verbose >= 1:
        print()
        # print the full input text, not only the last spaCy sentence
        print('INPUT SENTENCE', textual_unit)
        print('INPUT TO VADER', input_to_vader)
        print('VADER OUTPUT', scores)

    return scores
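
The selection logic inside `run_vader` can be tried out without spaCy. The sketch below uses hand-written (text, lemma, part of speech) triples for "It's a nice day." in place of spaCy tokens. The helper name `select_input` is hypothetical. The lemmas and the NOUN, VERB and ADJ tags mirror the outputs shown later in this section; the remaining tags and the `-PRON-` lemma for "It" are assumptions about what a spaCy 2.x model returns.

```python
# Hand-written stand-ins for spaCy tokens: (text, lemma, part of speech).
# PRON, DET and PUNCT, and the '-PRON-' lemma, are assumed values.
TOKENS = [
    ("It", "-PRON-", "PRON"),
    ("'s", "be", "VERB"),
    ("a", "a", "DET"),
    ("nice", "nice", "ADJ"),
    ("day", "day", "NOUN"),
    (".", ".", "PUNCT"),
]


def select_input(tokens, lemmatize=False, parts_of_speech=frozenset()):
    """Return the list of strings that would be joined and passed to VADER."""
    selected = []
    for text, lemma, pos in tokens:
        word = text
        # use the lemma, except for the '-PRON-' placeholder
        if lemmatize and lemma != "-PRON-":
            word = lemma
        # an empty filter means that every part of speech is kept
        if not parts_of_speech or pos in parts_of_speech:
            selected.append(word)
    return selected


print(select_input(TOKENS, lemmatize=True))
print(select_input(TOKENS, lemmatize=True, parts_of_speech={"ADJ"}))
```

Separating the selection step from the tagging step like this makes it easy to check which words reach VADER before looking at any scores.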

First, we provide VADER with lemmas:

In [16]:
sentences = ["Here are my sentences.",
             "It's a nice day.",
             "It's a rainy day."]

In [17]:
run_vader(sentences[1], lemmatize=True)

{'neg': 0.0, 'neu': 0.517, 'pos': 0.483, 'compound': 0.4215}

If you want the function to print more information, you can set the keyword argument **verbose** to 1.

In [18]:
# running the function on a single sentence
run_vader(sentences[1], lemmatize=True, verbose=1)


INPUT SENTENCE It's a nice day.
INPUT TO VADER ['It', 'be', 'a', 'nice', 'day', '.']
VADER OUTPUT {'neg': 0.0, 'neu': 0.517, 'pos': 0.483, 'compound': 0.4215}


{'neg': 0.0, 'neu': 0.517, 'pos': 0.483, 'compound': 0.4215}

You can also filter on part of speech. 

Only nouns:

In [19]:
run_vader(sentences[1], 
          lemmatize=True, 
          parts_of_speech_to_consider={'NOUN'},
          verbose=1)


INPUT SENTENCE It's a nice day.
INPUT TO VADER ['day']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Only verbs:

In [20]:
run_vader(sentences[1], 
          lemmatize=True, 
          parts_of_speech_to_consider={'VERB'},
          verbose=1)


INPUT SENTENCE It's a nice day.
INPUT TO VADER ['be']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Only adjectives:

In [21]:
run_vader(sentences[1], 
          lemmatize=True, 
          parts_of_speech_to_consider={'ADJ'},
          verbose=1)


INPUT SENTENCE It's a nice day.
INPUT TO VADER ['nice']
VADER OUTPUT {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4215}


{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4215}
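
Note that the `compound` score is 0.4215 whenever "nice" is part of the input, while the `neg`, `neu` and `pos` proportions change with the number of other words. A minimal sketch of why: it assumes that the compound score is the summed word valence normalized as x / sqrt(x² + alpha) with alpha = 15, as in the vaderSentiment reference implementation, and that "nice" is the only lexicon word in the sentence, with a valence of 1.8. Both values are assumptions that happen to reproduce the output above.

```python
import math


def normalize(score, alpha=15):
    """Squash a summed valence into the range [-1, 1]."""
    norm_score = score / math.sqrt(score * score + alpha)
    # clip for safety, although the formula already stays within the range
    return max(-1.0, min(1.0, norm_score))


# assumed valence of "nice"; no other word contributes in "It's a nice day."
nice_valence = 1.8
print(round(normalize(nice_valence), 4))
```

Because words that are not in the lexicon add nothing to the sum, filtering them out changes the proportions but not the compound score.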

## 3. Applying VADER to attributions

In [24]:
import pickle

# load the pickled attribution relations
with open('attribution-relations.pickle', 'rb') as inputfile:
    attributions = pickle.load(inputfile)


In [25]:
for attribution in attributions:
    print(attribution)

('make', ' Amazon', 'ORG', ' this week just grocery delivery free , so Walmart is now touting how its grocery service offers the booze .')
('raise', ' Earl Mobile banking app Current', 'PERSON', ' $ 20 M Series B , tops half a million users')
('roll', ' Google', 'ORG', ' Sarah Perez A year ago , out “ .new')
('follow', ' Google Play Pass', 'ORG', ' soon as a way to subscribe to a sizable collection of both apps and ga Spotify now lets artists buy a full - screen ‘ recommendation ’ promoting their new album Oct 24 , 2019')
('work', ' Sarah', 'PERSON', ' Prior to her work as a reporter , in I.T. across a number of industries , including banking , retail and software .')
('rise', ' Sarah Perez North Carolina', 'PERSON', ' has been as an entrepreneurial hub .')
('add', ' Sarah Perez Spotify', 'PERSON', ' recently a feature that will occasionally pop up a full - screen recommendation of a new album the service thinks you ’ll like , based on a combination of your listening taste and huma App

We can now take the predicate and the content of each attribution and apply VADER to them.
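
As a small sketch of this step, the helper below builds the text for one attribution. It assumes that each tuple holds the predicate at index 0 and the content at index 3, as used in this section. The field names and the function name `attribution_text` are hypothetical labels introduced only for readability.

```python
from collections import namedtuple

# Hypothetical field names for the four positions of an attribution tuple
Attribution = namedtuple("Attribution",
                         ["predicate", "source", "source_label", "content"])


def attribution_text(attribution):
    """Join predicate and content into one string with single spaces."""
    item = Attribution(*attribution)
    # split() followed by join() removes leading and repeated whitespace
    return " ".join(f"{item.predicate} {item.content}".split())


# one of the tuples printed above
example = ('raise', ' Earl Mobile banking app Current', 'PERSON',
           ' $ 20 M Series B , tops half a million users')
print(attribution_text(example))
```

The resulting string can be passed to `run_vader` in the same way as the example sentences.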

In [28]:
for attribution in attributions:
    # combine the predicate (index 0) with the content (index 3);
    # the content strings start with a space, so no separator is needed
    text = attribution[0] + attribution[3]
    sentiment = run_vader(text,
                          lemmatize=True,
                          parts_of_speech_to_consider={'ADJ'},
                          verbose=0)
    print(sentiment)

The same loop can be run without the part-of-speech filter, so that all lemmas are passed to VADER:

In [29]:
for attribution in attributions:
    # combine the predicate (index 0) with the content (index 3)
    text = attribution[0] + attribution[3]
    sentiment = run_vader(text,
                          lemmatize=True,
                          verbose=0)
    print(sentiment)


## End of this notebook