# Sentiment Analysis


## What is Sentiment Analysis?

Sentiment analysis is a natural language processing (NLP) technique for classifying the emotion (or tone) and subjectivity of human language. At its most basic, the goal is to classify a text as positive, negative, or neutral.

### Getting Started

In [4]:
# import what we need
import pandas as pd
from pandas import DataFrame as DF, Series

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from textblob import TextBlob

## Basic Sentiment Analysis

Every `TextBlob` object has a `sentiment` property. It returns a named tuple with two values:
* polarity: a value in the range [-1, 1] indicating how negative or positive the text is (values close to 0.0 are neutral).
* subjectivity: a value in the range [0, 1] indicating how subjective the text is (1 is very subjective).
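As a small illustration of these two values, the sketch below uses a stand-in named tuple with the same two fields, so it runs without TextBlob installed. The `Sentiment` type defined here is only a stand-in for illustration (assuming the field order polarity, then subjectivity), and the sample values are copied from this notebook's output for "I hate the food".

```python
from collections import namedtuple

# Stand-in for the object returned by TextBlob(...).sentiment (assumption:
# same field names and order: polarity first, then subjectivity).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Values taken from this notebook's output for "I hate the food".
s = Sentiment(polarity=-0.8, subjectivity=0.9)

# Fields can be read by name or by position; both give the same value.
assert s.polarity == s[0]
assert s.subjectivity == s[1]

# Both values stay inside their documented ranges.
in_range = -1.0 <= s.polarity <= 1.0 and 0.0 <= s.subjectivity <= 1.0
print(s.polarity, s.subjectivity, in_range)
```

Reading the fields by name (`s.polarity`) is usually easier to follow than indexing with `s[0]` and `s[1]`, although both work.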

In [5]:
lines = ["The food is on the table", "The food is green", "I don't like the food",
         "I do not like the food", "I like the food", "I don't love the food", "I do not love the food",
         "I hate the food", "I love the food", "The food is delicious"]

# analyze the sentences
sentiments = [TextBlob(line).sentiment for line in lines]
for line, s in zip(lines, sentiments):
    print('{} \n(p={}, s={})'.format(line, s.polarity, s.subjectivity), '\n')

The food is on the table 
(p=0.0, s=0.0) 

The food is green 
(p=-0.2, s=0.3) 

I don't like the food 
(p=0.0, s=0.0) 

I do not like the food 
(p=0.0, s=0.0) 

I like the food 
(p=0.0, s=0.0) 

I don't love the food 
(p=0.5, s=0.6) 

I do not love the food 
(p=-0.25, s=0.6) 

I hate the food 
(p=-0.8, s=0.9) 

I love the food 
(p=0.5, s=0.6) 

The food is delicious 
(p=1.0, s=1.0) 



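These results show the limitations of TextBlob's default analyzer: "I like the food" scores 0.0, and "I don't love the food" gets the same positive polarity (0.5) as "I love the food", while "I do not love the food" scores -0.25. It is still good enough for practice.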
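In the output above, "I do not love the food" is scored as negative while "I don't love the food" is not. One possible preprocessing step, sketched below, is to expand contractions before scoring. The `CONTRACTIONS` mapping and the `expand_contractions` helper are hypothetical names introduced here for illustration; they are not part of TextBlob, and the mapping is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny mapping; a real list would be much longer.
CONTRACTIONS = {
    "don't": "do not",
    "didn't": "did not",
    "can't": "cannot",
    "won't": "will not",
}

# One pattern that matches any key of the mapping as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)

def expand_contractions(text):
    """Replace known contractions with their expanded form.

    Note: the replacement is always lowercase, so "Don't" becomes "do not".
    """
    # Normalize curly apostrophes, which are common in tweets.
    text = text.replace("\u2019", "'")
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)

print(expand_contractions("I don't love the food"))
```

The expanded string could then be passed to `TextBlob(...)` in place of the raw text. Whether this helps on other sentences is not shown here; the only evidence is the pair of "love" examples above.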

### Using TextBlob on labeled data

In [8]:
# read data
cols = ['airline_sentiment','airline_sentiment_confidence',
        'airline','name','text']
data = pd.read_csv('data/dataset/tweets.csv', usecols=cols)

In [9]:
# get subset of tweets where confidence is > 0.6
subset = data[data.airline_sentiment_confidence > 0.6]\
    .head(10).copy().reset_index(drop=True)
tweets = subset.text

In [10]:
subset

| | airline_sentiment | airline_sentiment_confidence | airline | name | text |
|---|---|---|---|---|---|
| 0 | neutral | 1.0 | Virgin America | cairdin | @VirginAmerica What @dhepburn said. |
| 1 | neutral | 0.6837 | Virgin America | yvonnalynn | @VirginAmerica I didn't today... Must mean I n... |
| 2 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica it's really aggressive to blast... |
| 3 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica and it's a really big bad thing... |
| 4 | negative | 1.0 | Virgin America | jnardino | @VirginAmerica seriously would pay $30 a fligh... |
| 5 | positive | 0.6745 | Virgin America | cjmcginnis | @VirginAmerica yes, nearly every time I fly VX... |
| 6 | neutral | 0.634 | Virgin America | pilot | @VirginAmerica Really missed a prime opportuni... |
| 7 | positive | 0.6559 | Virgin America | dhepburn | @virginamerica Well, I didn't…but NOW I DO! :-D |
| 8 | positive | 1.0 | Virgin America | YupitsTate | @VirginAmerica it was amazing, and arrived an ... |
| 9 | neutral | 0.6769 | Virgin America | idk_but_youtube | @VirginAmerica did you know that suicide is th... |


### Compare the `sentiment` predictions with each line in `subset`

We want to get a sense of how each tweet is scored compared with its labeled sentiment.

In [11]:
# print the tweets and predicted polarity line-by-line
for i,t in enumerate(tweets):
    s = TextBlob(t).sentiment
    target = subset.airline_sentiment[i]
    print(t, '\n', '{} (target: {}) \n'.format(s[0], target))

@VirginAmerica What @dhepburn said. 
 0.0 (target: neutral) 

@VirginAmerica I didn't today... Must mean I need to take another trip! 
 -0.390625 (target: neutral) 

@VirginAmerica it's really aggressive to blast obnoxious "entertainment" in your guests' faces &amp; they have little recourse 
 0.0062500000000000056 (target: negative) 

@VirginAmerica and it's a really big bad thing about it 
 -0.3499999999999999 (target: negative) 

@VirginAmerica seriously would pay $30 a flight for seats that didn't have this playing.
it's really the only bad thing about flying VA 
 -0.2083333333333333 (target: negative) 

@VirginAmerica yes, nearly every time I fly VX this “ear worm” won’t go away :) 
 0.4666666666666666 (target: positive) 

@VirginAmerica Really missed a prime opportunity for Men Without Hats parody, there. https://t.co/mWpG7grEZP 
 0.2 (target: neutral) 

@virginamerica Well, I didn't…but NOW I DO! :-D 
 1.0 (target: positive) 

@VirginAmerica it was amazing, and arrived an hour e
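One way to make the polarity scores above comparable with the `airline_sentiment` labels is to bucket them with a threshold. The sketch below does this for the eight complete tweet/score pairs printed above. The `threshold` of 0.1 and the helper name `to_label` are assumptions for illustration, not part of TextBlob.

```python
# (polarity, target) pairs copied from the printed output above.
results = [
    (0.0, "neutral"),
    (-0.390625, "neutral"),
    (0.0062500000000000056, "negative"),
    (-0.3499999999999999, "negative"),
    (-0.2083333333333333, "negative"),
    (0.4666666666666666, "positive"),
    (0.2, "neutral"),
    (1.0, "positive"),
]

def to_label(polarity, threshold=0.1):
    """Map a polarity score to a class label (threshold is an assumption)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

predictions = [to_label(p) for p, _ in results]

# Count how many bucketed predictions agree with the labeled sentiment.
correct = sum(pred == target for pred, (_, target) in zip(predictions, results))
accuracy = correct / len(results)
print("{} of {} match (accuracy={:.3f})".format(correct, len(results), accuracy))
```

With only eight tweets this figure says little about overall quality, and a different threshold may bucket individual tweets differently.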

In [19]:
story = ["I had a great dog", "He used to be my true friend", "We used to go out and enjoy our days",
         "but he got sick", "he felt really bad", "doctors said he can't be cured",  "he died eventually",
         "I felt so sad", "and I became so lonely"]

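# analyze the sentences
for line in story:
    s = TextBlob(line).sentiment
    print('{} \n(p={}, s={})'.format(line, s[0], s[1]), '\n')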

I had a great dog 
(p=0.8, s=0.75) 

He used to be my true friend 
(p=0.35, s=0.65) 

We used to go out and enjoy our days 
(p=0.4, s=0.5) 

but he got sick 
(p=-0.7142857142857143, s=0.8571428571428571) 

he felt really bad 
(p=-0.6999999999999998, s=0.6666666666666666) 

doctors said he can't be cured 
(p=0.0, s=0.0) 

he died eventually 
(p=0.0, s=0.0) 

I felt so sad 
(p=-0.5, s=1.0) 

and I became so lonely 
(p=-0.09999999999999998, s=0.7) 

