
# VADER Sentiment Analysis

Valence Aware Dictionary and sEntiment Reasoner (VADER) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER does not require any training data; instead, it is built on a generalizable, valence-based, human-curated gold-standard sentiment lexicon. (A sentiment lexicon is a list of lexical features, e.g., words, that are generally labelled according to their semantic orientation as either positive or negative.) VADER has been found to be quite successful when dealing with social media texts, editorials, movie reviews, and product reviews. This is because VADER reports not only whether a sentiment is positive or negative but also how positive or negative it is.

## Import the library

In [None]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


## Scoring
We will use the `polarity_scores()` method to obtain the polarity scores for a given sentence.

In [None]:
def sentiment_analyzer_scores(sentence):
    score = analyser.polarity_scores(sentence)
    print("{} {}".format(sentence, str(score)))

Let's see how it performs on a custom sentence.

In [None]:
sentiment_analyzer_scores("VADER is smart, handsome, and funny.")

VADER is smart, handsome, and funny. {'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}


1. The `pos`, `neg`, and `neu` scores represent the proportion of the text that falls into each category, so they add up to 1 (up to rounding). Here, our sentence was rated as about 75% positive, 25% neutral, and 0% negative.
2. The `compound` score is computed by summing the valence scores of each word in the text that is found in the lexicon, adjusted according to the rules, and then normalized to lie between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate. Typical thresholds for classifying a sentence are:
   - positive sentiment: compound score >= 0.05
   - neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
   - negative sentiment: compound score <= -0.05
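The thresholds above, and the normalization step, can be sketched in plain Python. This is a minimal sketch, not the library's API: the function names `normalize` and `label_from_compound` are our own, and the normalization formula `x / sqrt(x^2 + alpha)` with `alpha = 15` is an assumption based on the reference VADER code. The `scores` dictionary is copied from the output shown earlier.

```python
import math

def normalize(raw_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Assumption: mirrors the normalization used in the reference VADER code.
    """
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

def label_from_compound(compound):
    """Map a compound score to a label using the thresholds listed above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# Scores copied from the example output earlier in this notebook.
scores = {'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}

# The three proportions should add up to 1 (up to rounding).
proportion_sum = scores['neg'] + scores['neu'] + scores['pos']
print(round(proportion_sum, 3))
print(label_from_compound(scores['compound']))
```

Because the normalization grows more slowly as the raw sum gets larger, adding more sentiment-bearing words pushes the compound score toward -1 or +1 without ever reaching either.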

## Where does VADER Fail?
You saw how well VADER works on short, social-media-style text. However, on more nuanced examples, it performs poorly.

Consider the review “everything tastes like garbage to me but we keep coming back because my wife loves the pasta”. Clearly the reviewer does NOT like this restaurant, despite the fact that the reviewer's wife “loves” the pasta. So for a human reader, this review is clearly negative.

Let's see what VADER does.

In [None]:
sentiment_analyzer_scores("everything tastes like garbage to me but we keep coming back because my wife loves the pasta")

everything tastes like garbage to me but we keep coming back because my wife loves the pasta {'neg': 0.0, 'neu': 0.688, 'pos': 0.312, 'compound': 0.7783}


VADER instead returns a compound score of 0.78, which is strongly positive. It relies on the polarity of individual words, such as “loves”, to determine the overall sentiment, and it has no broader understanding of the sentence's syntax or meaning.
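To make this failure mode concrete, here is a toy word-level scorer. It is not VADER: the lexicon `TOY_LEXICON` and its valence values are invented for illustration, the function name is hypothetical, and none of VADER's rules are applied. It only shows how summing per-word polarities can miss what a sentence means.

```python
# Toy illustration only: TOY_LEXICON and its valence values are invented,
# not taken from VADER's lexicon.
TOY_LEXICON = {"loves": 3.0, "like": 1.5, "hate": -2.5}

def toy_word_level_score(sentence, lexicon=TOY_LEXICON):
    """Sum the valence of each known word, ignoring context entirely."""
    tokens = [w.strip('.,!?"\'').lower() for w in sentence.split()]
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

review = ("everything tastes like garbage to me but we keep coming back "
          "because my wife loves the pasta")

# Unknown words contribute nothing, so only "like" and "loves" are counted.
print(toy_word_level_score(review))
```

In this sketch, "like" is scored as approval even though the review uses it as a comparison ("tastes like garbage"), and "loves" is credited to the sentence as a whole even though it describes someone else's opinion. A scorer that looks only at individual words cannot tell the difference.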

**Now, as your assignment, try sentiment analysis using two classical ML-based methods:**

1. Naive-Bayes
2. SVM