# Lab2 - Sentiment analysis using VADER

In this notebook, we show how to use [VADER](https://github.com/cjhutto/vaderSentiment), as shipped with NLTK, to perform sentiment analysis.

**At the end of this notebook, you will**:
* have VADER installed on your computer
* be able to load the VADER model
* be able to apply the VADER model on new sentences


**If you want to learn more about sentiment analysis, you might find the following links useful**:

## Downloading the VADER package
Please run the following commands first to download the VADER lexicon to your computer.

In [1]:
import nltk

In [2]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/piek/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

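To check that the VADER module is available, you can run the following command. Note that a missing lexicon only surfaces as a `LookupError` when the model is loaded in the next section.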

In [3]:
from nltk.sentiment import vader
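
A more direct check is to ask NLTK whether it can locate the downloaded resource. The sketch below is a minimal example, not part of the original lab. The helper name `vader_lexicon_available` is hypothetical, and the resource path `sentiment/vader_lexicon.zip` is assumed from NLTK's default layout for this package.

```python
def vader_lexicon_available() -> bool:
    """Return True if NLTK can locate the downloaded VADER lexicon."""
    try:
        import nltk
        # Assumed resource path; nltk.data.find raises LookupError if it is missing.
        nltk.data.find('sentiment/vader_lexicon.zip')
        return True
    except (ImportError, LookupError):
        # Either NLTK itself or the lexicon resource is not installed.
        return False


print('VADER lexicon found:', vader_lexicon_available())
```

If this prints `False`, rerunning `nltk.download('vader_lexicon')` is the first thing to try.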



## Load VADER model
The model can be loaded in the following way.

In [4]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [5]:
vader_model = SentimentIntensityAnalyzer()

VADER scores one piece of text at a time, so we first split the text into sentences. We will use spaCy for this.

In [10]:
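import spacy
# spaCy 3 removed the 'en' shortcut; load the small English pipeline by its full name
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')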

We take an arbitrary text and run spaCy on it.

In [11]:
sometext = "Here are my sentences. It's a nice day. It's a rainy day." 
doc = nlp(sometext)

Let's inspect how spaCy split the text into sentences.

In [12]:
for sent in doc.sents:
    print(sent.text)

Here are my sentences.
It's a nice day.
It's a rainy day.


The next for loop assigns a sentiment score from VADER to **each sentence**.

In [13]:
for sent in doc.sents:
    scores = vader_model.polarity_scores(sent.text)
    print()
    print('INPUT SENTENCE', sent)
    print('VADER OUTPUT', scores)


INPUT SENTENCE Here are my sentences.
VADER OUTPUT {'neg': 0.0, 'neu': 0.714, 'pos': 0.286, 'compound': 0.0516}

INPUT SENTENCE It's a nice day.
VADER OUTPUT {'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.4215}

INPUT SENTENCE It's a rainy day.
VADER OUTPUT {'neg': 0.394, 'neu': 0.606, 'pos': 0.0, 'compound': -0.0772}
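
The `compound` value is a single normalized score between -1 and 1, and it is often turned into a label. The sketch below assumes the thresholds commonly suggested in the VADER README (positive at 0.05 or above, negative at -0.05 or below). The helper name `label_from_compound` is hypothetical, and the scores are copied from the output above rather than recomputed.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label (assumed thresholds)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores copied from the VADER output shown above.
outputs = {
    "Here are my sentences.": 0.0516,
    "It's a nice day.": 0.4215,
    "It's a rainy day.": -0.0772,
}

for sentence, compound in outputs.items():
    print(sentence, '->', label_from_compound(compound))
```

With these thresholds the first sentence is labeled positive only by a narrow margin, which is a reminder that the cut-offs are a convention and may need tuning for a given dataset.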


**Open question (Piek):** Why was the VADER lexicon downloaded? Will it be adapted, and if so, how can the adapted lexicon be used?

Explanation of the VADER lexicon:

FORMAT: the file is tab-delimited, with one entry per line consisting of TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS


(-:<	-0.4	2.15407	[-3, 3, -1, -1, 2, -1, -2, 3, -3, -1]
(-:o	1.5	0.67082	[3, 1, 1, 2, 2, 2, 1, 1, 1, 1]
(-:O	1.5	0.67082	[3, 1, 1, 2, 2, 2, 1, 1, 1, 1]
(-:{	-0.1	1.57797	[-2, -3, 1, -2, 1, 1, 0, 0, 2, 1]
asset	1.5	0.80623	[2, 1, 1, 3, 2, 0, 2, 2, 1, 1]
assets	0.7	1.00499	[0, 0, 1, 3, 0, 1, 0, 0, 2, 0]
assfucking	-2.5	1.43178	[-3, -3, 0, -3, 0, -2, -4, -4, -4, -2]
assholes	-2.8	0.74833	[-3, -3, -3, -3, -4, -3, -2, -3, -1, -3]
assurance	1.4	0.4899	[1, 1, 2, 2, 1, 1, 1, 2, 2, 1]
assurances	1.4	0.4899	[2, 2, 1, 1, 1, 2, 2, 1, 1, 1]
assure	1.4	0.4899	[1, 1, 1, 1, 2, 1, 1, 2, 2, 2]
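
To make the format concrete, here is a minimal sketch that parses one lexicon line using only the standard library. The function name `parse_lexicon_line` is hypothetical, and the sample line is the `asset` entry copied from the excerpt above. For this entry, the mean and the population standard deviation of the raw ratings reproduce the second and third columns; that is an observation about this sample, not a documented guarantee for the whole file.

```python
import ast
from statistics import mean, pstdev


def parse_lexicon_line(line: str):
    """Split one tab-delimited lexicon line into its four fields."""
    token, mean_rating, std_dev, raw = line.rstrip('\n').split('\t')
    # The raw ratings are written as a Python-style list literal.
    return token, float(mean_rating), float(std_dev), ast.literal_eval(raw)


# The 'asset' entry from the excerpt above, with explicit tab separators.
sample = "asset\t1.5\t0.80623\t[2, 1, 1, 3, 2, 0, 2, 2, 1, 1]"
token, mean_rating, std_dev, ratings = parse_lexicon_line(sample)

print(token, mean_rating, std_dev, ratings)
# Recompute the summary columns from the ten raw human ratings.
print(mean(ratings), round(pstdev(ratings), 5))
```

The MEAN-SENTIMENT-RATING column is the per-token value that VADER looks up when scoring a sentence, which is why the lexicon has to be downloaded before the model can be loaded.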
