## Sentiment Analysis

In this notebook we'll look at some tools for gauging the sentiment of a piece of text. This can be incredibly useful for analyzing reviews or responding appropriately to queries. Unlike other tasks, sentiment analysis depends heavily on punctuation, flow, and context. Therefore we do not apply our pre-processing pipeline, and instead send the text directly to the models. Again, we will be using the Brown corpus provided through NLTK.

In [1]:
from nltk.corpus import brown

corpus = brown

categories = ["adventure", "belles_lettres", "editorial", "fiction", "government", 
              "hobbies", "humor", "learned", "lore", "mystery", "news", "religion",
              "reviews", "romance", "science_fiction"]

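def get_texts(count, sentences):
    """Return `count` passages, each built from `sentences` consecutive "adventure" sentences."""
    # Build the sentence view once rather than once per sentence.
    sents = corpus.sents(categories="adventure")
    out = []
    for i in range(count):
        # Each corpus sentence is a list of tokens: join the tokens, then the sentences.
        text = " ".join(" ".join(sents[j]) for j in range(sentences * i, sentences * (i + 1)))
        # Re-attach the sentence-final punctuation that tokenization split off.
        text = text.replace(" .", ".").replace(" ?", "?").replace(" !", "!")
        out.append(text)
    return out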

## VADER Sentiment

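VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based tool for estimating the sentiment of a piece of text. It is distributed as the standalone `vaderSentiment` package, which we use here, and a version is also bundled with NLTK. Most importantly, it's super easy to use! Let's see a quick example: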

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

print(analyzer.polarity_scores("This class is my favorite!!!"))
print(analyzer.polarity_scores("I hate this class :("))

{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.5962}
{'neg': 0.688, 'neu': 0.312, 'pos': 0.0, 'compound': -0.765}


The `neg`, `neu`, and `pos` scores give the proportion of the input that falls into each category, and they sum to 1. The `compound` score is the metric we will be looking at: it is computed from the summed valence of the words in the text and normalized to range from -1 (highly negative) to 1 (highly positive). Somewhat surprisingly, neither of these two highly opinionated sentences comes close to the extremes, though the general direction is picked up correctly in both cases. Let's see how VADER performs on some text from the corpus.

In [3]:
texts = get_texts(6, 1)
for text in texts:
    print(text, "\n", analyzer.polarity_scores(text), "\n")

Dan Morgan told himself he would forget Ann Turner. 
 {'neg': 0.192, 'neu': 0.808, 'pos': 0.0, 'compound': -0.2263} 

He was well rid of her. 
 {'neg': 0.0, 'neu': 0.704, 'pos': 0.296, 'compound': 0.2732} 

He certainly didn't want a wife who was fickle as Ann. 
 {'neg': 0.097, 'neu': 0.713, 'pos': 0.19, 'compound': 0.291} 

If he had married her , he'd have been asking for trouble. 
 {'neg': 0.197, 'neu': 0.803, 'pos': 0.0, 'compound': -0.4019} 

But all of this was rationalization. 
 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0} 

Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. 
 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0} 
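
To turn a compound score into a label, the VADER documentation suggests cutoffs of ±0.05: at or above 0.05 is positive, at or below -0.05 is negative, and anything in between is neutral. The sketch below applies those cutoffs to the six compound scores printed above. The helper name `label_compound` is my own and is not part of the library.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label.

    `threshold` defaults to the 0.05 cutoff suggested in the VADER docs;
    the function itself is a hypothetical helper, not a library call.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the single-sentence output above.
compounds = [-0.2263, 0.2732, 0.291, -0.4019, 0.0, 0.0]
labels = [label_compound(c) for c in compounds]
print(labels)
# ['negative', 'positive', 'positive', 'negative', 'neutral', 'neutral']
```

Seen this way, the last two sentences are not scored as weakly negative; they are scored as having no sentiment at all, which suggests that none of their words carried any valence for VADER.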



With such short examples, VADER is not able to get a good read on the sentiment, and it outputs fairly conservative predictions. In my experience, more text is needed for an accurate assessment. Let's bump it up to four sentences per test.

In [4]:
texts = get_texts(4, 4)
for text in texts:
    print(text, "\n", analyzer.polarity_scores(text), "\n")

Dan Morgan told himself he would forget Ann Turner. He was well rid of her. He certainly didn't want a wife who was fickle as Ann. If he had married her , he'd have been asking for trouble. 
 {'neg': 0.134, 'neu': 0.762, 'pos': 0.104, 'compound': -0.0829} 

But all of this was rationalization. Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing. The easiest thing would be to sell out to Al Budd and leave the country , but there was a stubborn streak in him that wouldn't allow it. 
 {'neg': 0.113, 'neu': 0.847, 'pos': 0.041, 'compound': -0.6482} 

The best antidote for the bitterness and disappointment that poisoned him was hard work. He found that if he was tired enough at night , he went to sleep simply because he was too exhausted to stay awake. Each day he found himself thinking less often of Ann ; ; each day the hurt was a li

This looks much better. The first selection is neutral, leaning slightly negative. The next two are certainly negative, and the final selection (independent of the other three) is fairly positive. VADER picks up on all of this. In my experience, passages of about this length work best with VADER. Let's take a quick look at an even longer passage.

In [5]:
text = get_texts(1, 16)[0]
print(text, "\n", analyzer.polarity_scores(text), "\n")

Dan Morgan told himself he would forget Ann Turner. He was well rid of her. He certainly didn't want a wife who was fickle as Ann. If he had married her , he'd have been asking for trouble. But all of this was rationalization. Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing. The easiest thing would be to sell out to Al Budd and leave the country , but there was a stubborn streak in him that wouldn't allow it. The best antidote for the bitterness and disappointment that poisoned him was hard work. He found that if he was tired enough at night , he went to sleep simply because he was too exhausted to stay awake. Each day he found himself thinking less often of Ann ; ; each day the hurt was a little duller , a little less poignant. He had plenty of work to do. Because the summer was unusually dry and hot , the spring produced a 

With such a large selection, it is difficult to give a single sentiment. As a result, the compound metric saturates and misses the positive sentiment at the end of the paragraph. For these reasons, I recommend using VADER with smaller chunks.
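
The saturation is easier to see from the normalization step. As far as I understand the reference implementation, the compound score is the sum of the rule-adjusted word valences, squashed into the range -1 to 1 by x / sqrt(x² + α) with α = 15. The sketch below reimplements only that squashing step in plain Python. The valence sums fed to it are made-up numbers for illustration, not values taken from the passage.

```python
import math


def normalize(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Mirrors my understanding of VADER's normalization; alpha=15 is assumed
    to be the library default.
    """
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Hypothetical valence sums: each doubling stands for a longer negative passage.
for total in (-2, -4, -8, -16):
    print(total, round(normalize(total), 4))
```

Each doubling of the summed valence moves the result a smaller distance toward -1. Once a long passage has accumulated a large negative total, a few positive words at the end barely change the compound score.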

## TextBlob

Another easy-to-use library is TextBlob. While VADER focuses solely on sentiment analysis, TextBlob is a full NLP library similar to NLTK and spaCy. One advantage of TextBlob's sentiment analysis tool is that it reports two values: polarity, from -1 (negative) to 1 (positive), and subjectivity, from 0 (objective) to 1 (subjective). Let's compare outputs with the same inputs as above.

In [6]:
from textblob import TextBlob

testimonial = TextBlob("This class is my favorite!!!")
print(testimonial.sentiment)

testimonial = TextBlob("I hate this class :(")
print(testimonial.sentiment)

Sentiment(polarity=0.9765625, subjectivity=1.0)
Sentiment(polarity=-0.775, subjectivity=0.95)


Again, notice how simple it is to import and use this package. Looking at the results, TextBlob reports stronger polarities than VADER for the same inputs. The subjectivity is also high, as expected for two statements of personal opinion.

In [7]:
texts = get_texts(6, 1)
for text in texts:
    blob = TextBlob(text)
    print(text, "\n", blob.sentiment, "\n")

Dan Morgan told himself he would forget Ann Turner. 
 Sentiment(polarity=0.0, subjectivity=0.0) 

He was well rid of her. 
 Sentiment(polarity=0.0, subjectivity=0.0) 

He certainly didn't want a wife who was fickle as Ann. 
 Sentiment(polarity=0.21428571428571427, subjectivity=0.5714285714285714) 

If he had married her , he'd have been asking for trouble. 
 Sentiment(polarity=0.024999999999999994, subjectivity=0.225) 

But all of this was rationalization. 
 Sentiment(polarity=0.0, subjectivity=0.0) 

Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. 
 Sentiment(polarity=0.0, subjectivity=0.0) 



Like VADER, TextBlob does not perform well on short sentences: four of the six come back with a polarity of exactly 0. More context is needed.

In [8]:
texts = get_texts(4, 4)
for text in texts:
    blob = TextBlob(text)
    print(text, "\n", blob.sentiment, "\n")

Dan Morgan told himself he would forget Ann Turner. He was well rid of her. He certainly didn't want a wife who was fickle as Ann. If he had married her , he'd have been asking for trouble. 
 Sentiment(polarity=0.0880952380952381, subjectivity=0.34047619047619043) 

But all of this was rationalization. Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing. The easiest thing would be to sell out to Al Budd and leave the country , but there was a stubborn streak in him that wouldn't allow it. 
 Sentiment(polarity=0.037500000000000006, subjectivity=0.15000000000000002) 

The best antidote for the bitterness and disappointment that poisoned him was hard work. He found that if he was tired enough at night , he went to sleep simply because he was too exhausted to stay awake. Each day he found himself thinking less often of Ann ; ; each d

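TextBlob's polarity behaves differently from VADER's compound score on multi-sentence input, and strong polarities are harder to reach. TextBlob also disagrees with VADER on the second selection, marking it slightly positive (0.0375) where VADER reported a clearly negative compound score (-0.6482).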
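
One plausible reason for the weaker polarities is that TextBlob's default analyzer, as I understand it, averages the polarity of the words it recognizes rather than summing them. The toy model below shows what averaging does as a passage grows. It is not TextBlob's actual algorithm, and the per-word polarities are invented for illustration.

```python
def mean_polarity(word_scores):
    """Average per-word polarity; 0.0 when no word carries any sentiment.

    A toy stand-in for an averaging analyzer, not a TextBlob function.
    """
    return sum(word_scores) / len(word_scores) if word_scores else 0.0


# Hypothetical per-word polarities (made up for this sketch).
short_text = [0.8]                         # one strongly positive word
long_text = [0.8, -0.6, 0.5, -0.5, 0.1]    # mixed words in a longer passage

print(mean_polarity(short_text))
print(round(mean_polarity(long_text), 3))
```

Under this assumption, a single strong word dominates a short input, while mixed words in a longer passage pull the average back toward zero.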

In [9]:
text = get_texts(1, 12)[0]
blob = TextBlob(text)
print(text, "\n", blob.sentiment, "\n")

Dan Morgan told himself he would forget Ann Turner. He was well rid of her. He certainly didn't want a wife who was fickle as Ann. If he had married her , he'd have been asking for trouble. But all of this was rationalization. Sometimes he woke up in the middle of the night thinking of Ann , and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing. The easiest thing would be to sell out to Al Budd and leave the country , but there was a stubborn streak in him that wouldn't allow it. The best antidote for the bitterness and disappointment that poisoned him was hard work. He found that if he was tired enough at night , he went to sleep simply because he was too exhausted to stay awake. Each day he found himself thinking less often of Ann ; ; each day the hurt was a little duller , a little less poignant. 
 Sentiment(polarity=-0.05187969924812029, subjectivity=0.35545112781954885) 



Once again, such a large sample makes it difficult to accurately assess the polarity. I would recommend a smaller section.
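
Scoring in smaller sections can be sketched as a pair of helpers that group sentences into fixed-size passages and score each one separately. The names `chunk_sentences`, `score_chunks`, and `toy_score` are hypothetical. `toy_score` is a stand-in so the sketch runs without either library installed.

```python
def chunk_sentences(sentences, size):
    """Group a list of sentence strings into passages of `size` sentences."""
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def score_chunks(sentences, size, score_fn):
    """Score each passage separately and return (passage, score) pairs."""
    return [(chunk, score_fn(chunk)) for chunk in chunk_sentences(sentences, size)]


def toy_score(text):
    """Stand-in scorer: counts 'good' minus 'bad'. Replace with a real model."""
    return text.count("good") - text.count("bad")


sentences = ["A good start.", "Still good.", "Then a bad turn.",
             "A bad end.", "One good coda."]
results = score_chunks(sentences, 2, toy_score)
for chunk, score in results:
    print(score, chunk)
```

With the analyzers from this notebook, `score_fn` could be `lambda t: analyzer.polarity_scores(t)["compound"]` for VADER or `lambda t: TextBlob(t).sentiment.polarity` for TextBlob. Keeping the per-chunk scores, rather than collapsing them into one number, preserves shifts in sentiment such as the positive turn at the end of the passage.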

VADER and TextBlob are two out-of-the-box solutions for sentiment analysis. However, getting reliable results takes some care, particularly in choosing how much text to score at once. Thanks for reading!