# Lab2 - Sentiment analysis using VADER

In this notebook, we show how to use [VADER](https://github.com/cjhutto/vaderSentiment), as shipped with NLTK, to perform sentiment analysis.

**At the end of this notebook, you will**:
* have VADER installed on your computer
* be able to load the VADER model
* be able to apply the VADER model on new sentences:
    * with and without lemmatization
    * by passing VADER only certain parts of speech, e.g., only the adjectives from a sentence

## Downloading VADER package
Please run the following commands first to download the VADER lexicon to your computer.

In [10]:
import nltk

In [11]:
nltk.download('vader_lexicon', quiet=False)

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/marten/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

To verify that VADER can now be imported from NLTK, you can run the following command.

In [1]:
from nltk.sentiment import vader

VADER is a rule-based system that makes use of a lexicon. The lexicon can be found [here](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt).
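
To give a sense of what the lexicon contains, here is a minimal sketch that parses lines in the lexicon's tab-separated layout (token, mean rating, standard deviation, raw ratings) into a dictionary. The sample lines and the helper name `parse_lexicon_lines` are assumptions made for illustration; the lines are made up and not copied from the lexicon file.

```python
# Made-up sample lines in the lexicon's tab-separated layout:
# token <TAB> mean rating <TAB> standard deviation <TAB> raw ratings
SAMPLE_LINES = [
    "nice\t1.8\t0.4\t[2, 2, 2, 2, 1, 2, 2, 1, 2, 2]",
    "rainy\t-0.3\t0.46\t[0, 0, -1, 0, -1, 0, 0, -1, 0, 0]",
]


def parse_lexicon_lines(lines):
    """Map each token to its mean rating (hypothetical helper)."""
    lexicon = {}
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        # Only the first two columns are needed for scoring.
        token, mean_rating = line.strip().split("\t")[:2]
        lexicon[token] = float(mean_rating)
    return lexicon


lexicon = parse_lexicon_lines(SAMPLE_LINES)
print(lexicon)
```

Words that do not occur in the lexicon carry no sentiment of their own, which is why most tokens in a sentence end up counted as neutral.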

## Load VADER model
The model can be loaded in the following way.

In [2]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [3]:
vader_model = SentimentIntensityAnalyzer()

Before we can apply VADER, we need to preprocess the text, e.g., split it into sentences. We will use spaCy for this.

In [4]:
import spacy
nlp = spacy.load('en')  # spaCy 2 shortcut; on spaCy 3 and later, load 'en_core_web_sm' instead

We take an arbitrary text and run spaCy on it.

In [5]:
sometext = "Here are my sentences. It's a nice day. It's a rainy day." 
doc = nlp(sometext)

Let's inspect how spaCy split the text into sentences.

In [6]:
for sent in doc.sents:
    print(sent.text)

Here are my sentences.
It's a nice day.
It's a rainy day.


The next for loop assigns a sentiment score from VADER to **each sentence**.

In [12]:
for sent in doc.sents:
    scores = vader_model.polarity_scores(sent.text)
    print()
    print('INPUT SENTENCE', sent)
    print('VADER OUTPUT', scores)


INPUT SENTENCE Here are my sentences.
VADER OUTPUT {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516}

INPUT SENTENCE It's a nice day.
VADER OUTPUT {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215}

INPUT SENTENCE It's a rainy day.
VADER OUTPUT {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772}
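
The `compound` value is the one most often used to assign a single label to a sentence. The sketch below applies the cutoffs suggested in the VADER README (at least 0.05 is positive, at most -0.05 is negative, anything in between is neutral) to the three outputs above. The function name `label_from_compound` and its `threshold` parameter are assumptions for illustration.

```python
def label_from_compound(scores, threshold=0.05):
    """Turn a VADER output dict into a coarse label (hypothetical helper)."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# The three VADER outputs printed above, keyed by their input sentence.
outputs = {
    "Here are my sentences.": {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516},
    "It's a nice day.": {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215},
    "It's a rainy day.": {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772},
}

for sentence, scores in outputs.items():
    print(sentence, '->', label_from_compound(scores))
```

Note that the first sentence only just crosses the positive cutoff, so it can be worth experimenting with the threshold on your own data.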


We can manipulate the input to VADER by providing the lemmas as input instead of the words and by only providing words with certain parts of speech, e.g., only adjectives.

In [32]:
def run_vader(spacy_sent, 
              lemmatize=False, 
              parts_of_speech_to_consider=set(),
              verbose=0):
    """
    Run VADER on a sentence from spacy
    
    :param spacy.tokens.span.Span spacy_sent: a sentence in spaCy
    (by looping over doc.sents)
    :param bool lemmatize: If True, provide lemmas to VADER instead of words
    :param set parts_of_speech_to_consider:
    -empty set -> all parts of speech are provided
    -non-empty set: only these parts of speech are considered
    :param int verbose: if set to 1, information is printed
    about input and output
    
    :rtype: dict
    :return: vader output dict
    """
    input_to_vader = []

    # Iterate over the sentence passed in as argument,
    # not over a notebook-level variable.
    for token in spacy_sent:

        to_add = token.text

        if lemmatize:
            to_add = token.lemma_

            # spaCy 2 lemmatizes every pronoun to '-PRON-'; keep the original word instead
            if to_add == '-PRON-': 
                to_add = token.text

        if parts_of_speech_to_consider:
            if token.pos_ in parts_of_speech_to_consider:
                input_to_vader.append(to_add) 
        else:
            input_to_vader.append(to_add)

    scores = vader_model.polarity_scores(' '.join(input_to_vader))
    
    if verbose >= 1:
        print()
        print('INPUT SENTENCE', spacy_sent)
        print('INPUT TO VADER', input_to_vader)
        print('VADER OUTPUT', scores)

    return scores
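
The token-selection logic of `run_vader` can also be tried out without spaCy or VADER. The sketch below mirrors it on hand-annotated tokens; `FakeToken`, `select_vader_input`, and the annotations themselves (spaCy 2 style lemmas and part-of-speech tags) are assumptions for illustration, not output produced by spaCy here.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token: only the three attributes used below.
FakeToken = namedtuple('FakeToken', ['text', 'lemma_', 'pos_'])


def select_vader_input(tokens, lemmatize=False, parts_of_speech_to_consider=None):
    """Return the list of strings that would be joined and sent to VADER."""
    selected = []
    for token in tokens:
        to_add = token.text
        # Use the lemma unless it is the spaCy 2 pronoun placeholder.
        if lemmatize and token.lemma_ != '-PRON-':
            to_add = token.lemma_
        # An empty or missing filter means: keep every part of speech.
        if not parts_of_speech_to_consider or token.pos_ in parts_of_speech_to_consider:
            selected.append(to_add)
    return selected


# Hand-written annotations for "It's a rainy day." (assumed values).
tokens = [
    FakeToken('It', '-PRON-', 'PRON'),
    FakeToken("'s", 'be', 'VERB'),
    FakeToken('a', 'a', 'DET'),
    FakeToken('rainy', 'rainy', 'ADJ'),
    FakeToken('day', 'day', 'NOUN'),
    FakeToken('.', '.', 'PUNCT'),
]

print(select_vader_input(tokens, lemmatize=True))
print(select_vader_input(tokens, parts_of_speech_to_consider={'ADJ'}))
```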

Provide VADER with lemmas

In [33]:
sometext = "Here are my sentences. It's a nice day. It's a rainy day."
doc = nlp(sometext)  # re-run spaCy so that doc matches sometext
sentences = list(doc.sents)

In [34]:
run_vader(sentences[2], lemmatize=True)  # sentences[2] is "It's a rainy day."

{'neg': 0.302, 'neu': 0.698, 'pos': 0.0, 'compound': -0.0772}

If you want the function to print more information, you can set the keyword argument **verbose** to 1.

In [35]:
run_vader(sentences[2], lemmatize=True, verbose=1)


INPUT SENTENCE It's a rainy day.
INPUT TO VADER ['It', 'be', 'a', 'rainy', 'day', '.']
VADER OUTPUT {'neg': 0.302, 'neu': 0.698, 'pos': 0.0, 'compound': -0.0772}


{'neg': 0.302, 'neu': 0.698, 'pos': 0.0, 'compound': -0.0772}

You can also filter on part of speech. 

Only nouns:

In [37]:
run_vader(sentences[2], 
          lemmatize=True, 
          parts_of_speech_to_consider={'NOUN'},
          verbose=1)


INPUT SENTENCE It's a rainy day.
INPUT TO VADER ['day']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Only verbs:

In [38]:
run_vader(sentences[2], 
          lemmatize=True, 
          parts_of_speech_to_consider={'VERB'},
          verbose=1)


INPUT SENTENCE It's a rainy day.
INPUT TO VADER ['be']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Only adjectives:

In [39]:
run_vader(sentences[2], 
          lemmatize=True, 
          parts_of_speech_to_consider={'ADJ'},
          verbose=1)


INPUT SENTENCE It's a rainy day.
INPUT TO VADER ['rainy']
VADER OUTPUT {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.0772}


{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.0772}
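
In the adjective-only run, `compound` is the normalized valence of the single word `rainy`. As a sketch of that step: the VADER source maps a summed valence x to x / sqrt(x² + alpha) with alpha = 15. The valences 1.8 for `nice` and -0.3 for `rainy` used below are assumptions inferred from the outputs in this notebook, not values looked up in the lexicon here.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Assumed lexicon valences, inferred from the compound scores shown above.
print(round(normalize(1.8), 4))   # matches the compound of "It's a nice day."
print(round(normalize(-0.3), 4))  # matches the compound of "It's a rainy day."
```

This also explains why the compound score stays the same whether VADER sees the whole sentence or only the adjective: the other words contribute no valence, so only the `neg`, `neu`, and `pos` proportions change.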