# Lab2 - Sentiment analysis using VADER

In this notebook, we show how to use [VADER](https://github.com/cjhutto/vaderSentiment), as included in NLTK, to perform sentiment analysis.

**At the end of this notebook, you will**:
* have VADER installed on your computer
* be able to load the VADER model
* be able to apply the VADER model on new sentences


**If you want to learn more about sentiment analysis, you might find the following links useful**:

## Downloading the VADER package
Please run the following commands first to download the VADER lexicon to your computer.

In [2]:
import nltk

In [3]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/marten/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

To verify that NLTK's VADER module is available, you can run the following import. Note that the lexicon file itself is only read when the model is loaded in the next section.

In [5]:
from nltk.sentiment import vader
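
If you prefer an explicit check of the lexicon file, a minimal sketch is to look it up on NLTK's data path and download it only when it is missing. The helper name `ensure_vader_lexicon` is hypothetical, and the resource path `sentiment/vader_lexicon.zip` is an assumption about where `nltk.download('vader_lexicon')` stores the file.

```python
def ensure_vader_lexicon():
    """Download the VADER lexicon only if NLTK cannot find it locally."""
    import nltk  # imported inside the helper so it can be defined before NLTK is set up

    try:
        # nltk.data.find raises LookupError when the resource is not on the data path.
        # The resource path below is an assumption; adjust it if your NLTK version differs.
        nltk.data.find('sentiment/vader_lexicon.zip')
    except LookupError:
        nltk.download('vader_lexicon')
```

Calling `ensure_vader_lexicon()` before creating the model should make the notebook safe to re-run on a machine where the lexicon has not been downloaded yet.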

## Load VADER model
The model can be loaded in the following way.

In [6]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [7]:
vader_model = SentimentIntensityAnalyzer()

VADER scores one string at a time, so we first need to split the text into sentences (and, later on, into tokens). We will use spaCy for this.

In [8]:
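import spacy
# the 'en' shortcut was removed in spaCy 3; the full model name works in spaCy 2.x and 3.x
nlp = spacy.load('en_core_web_sm')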

We take an arbitrary text and run spaCy on it.

In [9]:
sometext = "Here are my sentences. It's a nice day. It's a rainy day." 
doc = nlp(sometext)

Let's inspect how spaCy split the text into sentences.

In [10]:
for sent in doc.sents:
    print(sent.text)

Here are my sentences.
It's a nice day.
It's a rainy day.


The next for loop assigns a sentiment score from VADER to **each sentence**.

In [17]:
for sent in doc.sents:
    scores = vader_model.polarity_scores(sent.text)
    print()
    print('INPUT SENTENCE', sent)
    print('VADER OUTPUT', scores)


INPUT SENTENCE Here are my sentences.
VADER OUTPUT {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516}

INPUT SENTENCE It's a nice day.
VADER OUTPUT {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215}

INPUT SENTENCE It's a rainy day.
VADER OUTPUT {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772}
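
To turn the `compound` score into a label, a common convention (suggested in the vaderSentiment README, not enforced by NLTK) is to treat scores of at least 0.05 as positive and scores of at most -0.05 as negative. The sketch below applies that convention to the compound scores printed above; the function name `label_from_compound` and the `threshold` parameter are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label (threshold is a convention, not a rule)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# compound scores copied from the output above
example_scores = {
    "Here are my sentences.": 0.0516,
    "It's a nice day.": 0.4215,
    "It's a rainy day.": -0.0772,
}

for sentence, compound in example_scores.items():
    print(sentence, '->', label_from_compound(compound))
```

With this threshold, the first sentence only barely counts as positive, which illustrates why the cut-off is worth tuning for your own data.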


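Lemmatizing the text might improve the performance of VADER, since a lemma can match a lexicon entry that an inflected form does not; this is worth checking on your own data. In addition, you can experiment with providing only words with a certain part of speech to get an idea of how VADER works.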

In [35]:
lemmatize = True # if you want to lemmatize
filter_on_part_of_speech = True # if you want to filter on set of parts of speech
parts_of_speech_to_consider = {"ADJ"} # edit this set for other parts of speech
# see https://spacy.io/usage/linguistic-features for list of parts of speech in spaCy

for sent in doc.sents:
    input_to_vader = []

    for token in sent:
        # use the lemma instead of the surface form if requested
        to_add = token.lemma_ if lemmatize else token.text

        # spaCy 2.x lemmatizes pronouns to '-PRON-'; skip these placeholders
        if lemmatize and to_add == '-PRON-':
            continue

        # keep the token unless the part-of-speech filter excludes it
        if not filter_on_part_of_speech or token.pos_ in parts_of_speech_to_consider:
            input_to_vader.append(to_add)

    scores = vader_model.polarity_scores(' '.join(input_to_vader))
    print()
    print('INPUT SENTENCE', sent)
    print('INPUT TO VADER', input_to_vader)
    print('VADER OUTPUT', scores)



INPUT SENTENCE Here are my sentences.
INPUT TO VADER []
VADER OUTPUT {'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}

INPUT SENTENCE It's a nice day.
INPUT TO VADER ['nice']
VADER OUTPUT {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4215}

INPUT SENTENCE It's a rainy day.
INPUT TO VADER ['rainy']
VADER OUTPUT {'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.0772}
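
The compound scores for `nice` and `rainy` alone are identical to those of the full sentences, which suggests that these single words carry all of the sentiment. In the vaderSentiment implementation, the compound score is the summed word valence x normalized as x / sqrt(x² + alpha), with alpha = 15. The sketch below reproduces the two scores under that formula; the valences 1.8 and -0.3 are assumptions chosen because they are consistent with the output above, so check them against the lexicon before relying on them.

```python
import math


def normalize_valence(valence_sum, alpha=15):
    """Squash a summed valence into the range (-1, 1), as VADER does for 'compound'."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# assumed lexicon valences (not read from the lexicon file here)
assumed_valences = {'nice': 1.8, 'rainy': -0.3}

for word, valence in assumed_valences.items():
    print(word, round(normalize_valence(valence), 4))
```

If the lexicon is loaded, you can compare these assumed values with the entries in `vader_model.lexicon`.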
