# Natural Language Processing (3): Sentiment Analysis

A few key points to remember with sentiment analysis:
1. **TextBlob**: TextBlob is a Python library built on top of **nltk**. Linguistic researchers have labeled the sentiment of words based on their domain expertise, and the sentiment of a word can vary depending on where it appears in a sentence. The TextBlob module lets us take advantage of these labels.
2. **Sentiment Labels**: Each word in a corpus is labeled in terms of polarity and subjectivity. A corpus's sentiment is the average of these labels.
    - **Polarity**: How positive or negative a word is. -1 means very negative and +1 means very positive.
    - **Subjectivity**: How subjective, or opinionated, a word is. 0 is fact, and +1 is opinion.
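
To make the averaging idea concrete, here is a minimal sketch that scores a sentence from a tiny hand-made lexicon. The lexicon values and the `toy_sentiment` helper are hypothetical and are not TextBlob's actual labels; the library's own scoring is more involved than this plain average.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# These numbers are made up for illustration, not taken from TextBlob.
TOY_LEXICON = {
    "great": (0.8, 0.75),
    "boring": (-1.0, 1.0),
    "funny": (0.25, 1.0),
}

def toy_sentiment(text):
    """Average the polarity and subjectivity of the labeled words in text."""
    scores = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not scores:
        return 0.0, 0.0  # no labeled words: treat as neutral and factual
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return polarity, subjectivity

polarity, subjectivity = toy_sentiment("a great and funny show")
print(polarity, subjectivity)
```

Words missing from the lexicon are simply skipped, so only labeled words contribute to the two averages.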

The output of sentiment analysis is a pair of scores. For each actor we compute a polarity score, which indicates how positive or negative their transcript is, and a subjectivity score, which indicates how opinionated it is.

Sentiment analysis depends heavily on word order, so we start by reading in the corpus rather than the document-term matrix (DTM).
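
A small sketch of why word order matters: the two made-up sentences below say opposite things, yet their word counts, which is all a document-term matrix keeps, are identical. The sentences and the `bag_of_words` helper are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

def bag_of_words(text):
    """Count words while ignoring their order, as one DTM row would."""
    return Counter(text.lower().split())

a = "the show was funny not boring"
b = "the show was boring not funny"

same_counts = bag_of_words(a) == bag_of_words(b)
print(same_counts)  # True: the counts cannot tell the two sentences apart
```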

In [None]:
import pandas as pd
data = pd.read_pickle('corpus.pkl')
data

In [None]:
from textblob import TextBlob

# using lambda function to find the polarity and subjectivity of each transcript
data['polarity'] = data['transcript'].apply(lambda x: TextBlob(x).sentiment.polarity)
data['subjectivity'] = data['transcript'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

data

In [None]:
# Plot the results
import matplotlib.pyplot as plt
%matplotlib inline

# setting the size
plt.rcParams['figure.figsize'] = [10, 8]

for actor in data.index:
    x = data.loc[actor, 'polarity']
    y = data.loc[actor, 'subjectivity']
    plt.scatter(x, y)
    # look up the name by index label rather than by position
    plt.text(x + 0.001, y + 0.001, data.loc[actor, 'full_name'], fontsize=10)

# set the x-axis range once, after all points are plotted
plt.xlim(-0.01, 0.12)

plt.title('Sentiment Analysis', fontsize = 20)
plt.xlabel('<-- Negative -------- Positive -->', fontsize=15)
plt.ylabel('<-- Facts -------- Opinions -->', fontsize=15)

plt.show()

In [None]:
# Check how the sentiment changes over the course of each routine
import numpy as np
import math

# Split each routine into 10 parts
def split_text(text, n = 10):
    length = len(text)
    size = math.floor(length / n)
    start = np.arange(0, length, size)
    
    split_list = []
    for piece in range(n):
        split_list.append(text[start[piece]: start[piece] + size])
    
    return split_list
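
One detail of `split_text` worth seeing on a small input: because the piece size is rounded down, any characters beyond `n * size` are left out of the result. The sketch below uses a plain-Python equivalent, assuming the text has at least `n` characters; `split_text_plain` and `sample` are hypothetical names introduced only for this illustration.

```python
def split_text_plain(text, n=10):
    """Split text into n equal-sized pieces, dropping any leftover tail."""
    size = len(text) // n  # same rounding down as math.floor(length / n)
    return [text[i * size:(i + 1) * size] for i in range(n)]

sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
pieces = split_text_plain(sample)

covered = "".join(pieces)             # the text that made it into a piece
dropped = sample[len(covered):]       # the tail that no piece contains
print(len(pieces), repr(covered), repr(dropped))
```

For long transcripts the dropped tail is at most `n - 1` characters, so it should have little effect on the per-piece polarity.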

In [None]:
# create a list to hold all of the pieces of text
list_pieces = []

for t in data.transcript:
    split = split_text(t)
    list_pieces.append(split)

list_pieces
# list_pieces holds 12 actors (len(list_pieces)), and each transcript is split into 10 pieces (len(list_pieces[0]))

In [None]:
# Calculate the polarity for each piece of text
polarity_transcript = []
for lp in list_pieces:
    polarity_piece = []
    for p in lp:
        polarity_piece.append(TextBlob(p).sentiment.polarity)
    polarity_transcript.append(polarity_piece)

# polarity_transcript is a nested list: one list of piece polarities per transcript
polarity_transcript

In [None]:
# Show the plot for all comedians
plt.rcParams['figure.figsize'] = [16, 12]

for index, actor in enumerate(data.index):
    plt.subplot(3, 4, index+1)
    plt.plot(polarity_transcript[index])
    
    # Draw the horizontal line y = 0 as a baseline for judging whether each piece is positive or negative
    plt.plot(np.arange(0, 10), np.zeros(10))
    plt.title(data['full_name'].iloc[index])
    plt.ylim(-0.2, 0.3)

plt.show()