# ___Sentiment Analysis using VADER and TextBlob___

___Rule-based methods:___
* ___TextBlob___ _: Simple rule-based API for sentiment analysis_
* ___VADER___ _: Parsimonious rule-based model for sentiment analysis of social media text._

___Feature-based methods:___
* ___Logistic Regression___ _: Generalized linear model in Scikit-learn._
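* ___Support Vector Machine (SVM)___ _: Linear model in Scikit-learn trained with a stochastic gradient descent (SGD) optimizer on the hinge loss._
* ___Naive Bayes___ _: Probabilistic classifier based on Bayes' theorem that assumes features are conditionally independent given the class._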

___Embedding-based methods:___
* ___FastText___ _: An NLP library that uses highly efficient CPU-based representations of word embeddings for classification tasks._
* ___Flair___ _: A PyTorch-based framework for NLP tasks such as sequence tagging and classification._

_[Reference](https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4)_

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## ___VADER Sentiment Analysis___
___VADER (Valence Aware Dictionary and sEntiment Reasoner)___ _is a model for text sentiment analysis that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. __VADER sentiment analysis__ relies on a dictionary that maps lexical features to emotion intensities, known as sentiment scores. It is a rule-based tool specifically attuned to sentiments expressed in social media._

___Advantages of using VADER___

_VADER has several advantages over traditional methods of sentiment analysis, including:_
* _It works exceedingly well on social-media-style text, yet readily generalizes to multiple domains_
* _It doesn’t require any training data but is constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon_
* _It is fast enough to be used online with streaming data, and_
* _It does not severely suffer from a speed-performance tradeoff._

In [11]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [14]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

In [16]:
doc1 = 'This is a good movie'
sid.polarity_scores(doc1)

{'compound': 0.4404, 'neg': 0.0, 'neu': 0.508, 'pos': 0.492}

In [17]:
doc2 = 'This was the best, most awesome movie EVER MADE!'
sid.polarity_scores(doc2)

{'compound': 0.8716, 'neg': 0.0, 'neu': 0.441, 'pos': 0.559}

* _The __Positive, Negative and Neutral scores__ represent the proportion of the text that falls into each category, so the three should add up to (approximately) 1. For example, the second sentence was rated about 56% positive, 44% neutral and 0% negative._
* _The __Compound score__ is computed by summing the lexicon valence scores of the words in the text, adjusting them according to VADER's rules, and normalizing the sum to lie between -1 (most extreme negative) and +1 (most extreme positive)._
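_The arithmetic behind these four fields can be sketched in plain Python. This is a simplified reconstruction, not NLTK's code: it assumes each token already has a valence (here `good` = 1.9, which is consistent with the compound score for `doc1`), assumes NLTK's VADER drops one-letter words such as "a" before scoring, and skips VADER's rule adjustments (negation, boosters, "but", capitalization, punctuation emphasis). The helper `vader_style_scores` is hypothetical; `alpha = 15` is the normalization constant used in VADER's source. Under these assumptions it reproduces the scores for `doc1`._

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)


def vader_style_scores(valences, alpha=15):
    """Simplified VADER-style scoring from per-token valences (hypothetical helper).

    Positive tokens add (valence + 1) and negative tokens add (valence - 1), so that,
    like neutral tokens (counted as 1 each), every token contributes at least one unit.
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)

    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:  # empty input
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(normalize(sum(valences), alpha), 4),
    }


# doc1 keeps four tokens once the one-letter word "a" is dropped;
# only "good" is assumed to carry a lexicon valence (1.9).
print(vader_style_scores([0.0, 0.0, 1.9, 0.0]))
# {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
```

_The +1 shift explains why a single positive word in a four-word sentence yields a `pos` share near 0.5 rather than 0.25: each sentiment-bearing word weighs more than a neutral one in the proportions._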

## ___TextBlob___

* ___TextBlob Module___ _: Linguistic researchers have labeled the sentiment of words based on their domain expertise. The sentiment of a word can vary depending on where it appears in a sentence. The TextBlob module lets us take advantage of these labels._
* ___Sentiment Labels___ _: Each word in the lexicon is labeled in terms of polarity and subjectivity (there are other labels as well, but we'll ignore them for now). The sentiment of a text is the average of these word-level scores._
1. ___Polarity___ _: How positive or negative a word is, from -1 (very negative) to +1 (very positive)._
2. ___Subjectivity___ _: How subjective, or opinionated, a word is, from 0 (very objective, i.e. factual) to +1 (very subjective, i.e. an opinion)._
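_A rough sketch of "the sentiment of a text is the average": look up every word that appears in a lexicon, then average polarity and subjectivity over those words only. The two-word `LEXICON` below is hypothetical; the value for `good` (0.7, 0.6) matches the TextBlob output further down, and `great` (0.8, 0.75) is the mean of its four XML entries listed below. TextBlob's real `PatternAnalyzer` also handles intensifiers such as "very" and negation, which this sketch ignores._

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity)
LEXICON = {
    "good": (0.7, 0.6),
    "great": (0.8, 0.75),
}


def average_sentiment(text, lexicon=LEXICON):
    """Average polarity and subjectivity over the words found in the lexicon.

    Words missing from the lexicon are ignored; a text with no known words
    scores (0.0, 0.0).
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity


print(average_sentiment("A good and great film!"))   # approximately (0.75, 0.675)
print(average_sentiment("The film runs two hours."))  # (0.0, 0.0)
```

_Averaging only over known words means neutral filler does not dilute the score, which is why a short sentence containing one opinion word inherits that word's polarity unchanged._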

_The default analyzer looks words up in a lexicon stored in en-sentiment.xml (from the Pattern library), an XML document that includes, for example, the following four entries for the word “great”._

```xml
<word form="great" cornetto_synset_id="n_a-525317" wordnet_id="a-01123879" pos="JJ" sense="very good" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01278818" pos="JJ" sense="of major significance or importance" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01386883" pos="JJ" sense="relatively large in size or number or extent" polarity="0.4" subjectivity="0.2" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01677433" pos="JJ" sense="remarkable or out of the ordinary in degree or magnitude or effect" polarity="0.8" subjectivity="0.8" intensity="1.0" confidence="0.9" />
```
_When a word has several senses, as “great” does here, TextBlob computes its sentiment by simply __averaging__ the scores of those senses._
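_To make the averaging concrete, the sketch below parses the four “great” entries quoted above (wrapped in a root element, with some attributes trimmed) using Python's standard `xml.etree.ElementTree`, and averages their scores. The helper `word_sentiment` is hypothetical and does not reproduce Pattern's own lexicon loader. The means work out to polarity (1.0 + 1.0 + 0.4 + 0.8) / 4 = 0.8 and subjectivity (1.0 + 1.0 + 0.2 + 0.8) / 4 = 0.75._

```python
import xml.etree.ElementTree as ET

# The four "great" entries from en-sentiment.xml shown above, wrapped in a
# root element so they parse as a single XML document (attributes trimmed).
ENTRIES = """
<sentiment>
  <word form="great" sense="very good" polarity="1.0" subjectivity="1.0" />
  <word form="great" sense="of major significance or importance" polarity="1.0" subjectivity="1.0" />
  <word form="great" sense="relatively large in size or number or extent" polarity="0.4" subjectivity="0.2" />
  <word form="great" sense="remarkable or out of the ordinary in degree or magnitude or effect" polarity="0.8" subjectivity="0.8" />
</sentiment>
"""


def word_sentiment(xml_text, form):
    """Average polarity and subjectivity over every sense listed for `form`."""
    root = ET.fromstring(xml_text.strip())
    senses = [w for w in root.iter("word") if w.get("form") == form]
    if not senses:
        raise ValueError(f"no lexicon entries for {form!r}")
    polarity = sum(float(w.get("polarity")) for w in senses) / len(senses)
    subjectivity = sum(float(w.get("subjectivity")) for w in senses) / len(senses)
    return polarity, subjectivity


print(word_sentiment(ENTRIES, "great"))  # approximately (0.8, 0.75)
```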

In [10]:
from textblob import TextBlob

In [20]:
doc1 = 'This is a good movie'
TextBlob(doc1).sentiment

Sentiment(polarity=0.7, subjectivity=0.6000000000000001)

_By default, TextBlob uses `PatternAnalyzer`, which is based on the Pattern library. We can also use `NaiveBayesAnalyzer`, a Naive Bayes classifier from NLTK trained on the movie reviews corpus._

In [23]:
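import nltk
nltk.download('movie_reviews')  # training corpus for NaiveBayesAnalyzer
nltk.download('punkt')          # tokenizer models used by TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

# Returns a predicted label plus the probabilities of each class
opinion = TextBlob("python is dope!", analyzer=NaiveBayesAnalyzer())
opinion.sentiment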

[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Sentiment(classification='neg', p_pos=0.38888888888888934, p_neg=0.6111111111111108)
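_`NaiveBayesAnalyzer` returns a label together with `p_pos` and `p_neg`, which sum to 1 because they are normalized posterior probabilities for the two classes. The toy sketch below illustrates where such numbers come from, using a multinomial Naive Bayes with add-one (Laplace) smoothing trained on a hypothetical four-sentence corpus (`TRAIN`, `train` and `classify` are all illustrative names). It is not TextBlob's implementation, which trains NLTK's classifier on the `movie_reviews` corpus with its own features, so it will not reproduce the scores above._

```python
import math
from collections import Counter

# Hypothetical toy training data: (text, label)
TRAIN = [
    ("a great and fun movie", "pos"),
    ("i loved the acting", "pos"),
    ("a boring and awful movie", "neg"),
    ("i hated the plot", "neg"),
]


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return (label, p_pos, p_neg) from add-one smoothed multinomial Naive Bayes."""
    word_counts, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    log_scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(unnorm.values())
    p_pos, p_neg = unnorm["pos"] / z, unnorm["neg"] / z
    return ("pos" if p_pos >= p_neg else "neg"), p_pos, p_neg


model = train(TRAIN)
print(classify("a fun movie", model))    # label 'pos', p_pos about 0.67
print(classify("an awful plot", model))  # label 'neg', p_neg about 0.8
```

_Because the classifier only knows the words in its training data, its verdict on slang such as "dope" depends entirely on how that word happens to be distributed across the training reviews._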