# Sentiment Analysis

## Introduction

So far, all of the analysis we've done has been pretty generic - looking at counts, creating scatter plots, etc. These techniques could be applied to numeric data as well.

When it comes to text data, there are a few popular techniques that we'll be going through in the next few notebooks, starting with sentiment analysis. Here are a few key points to remember about sentiment analysis:

1. TextBlob Module: Linguistic researchers have labeled the sentiment of words based on their domain expertise. The sentiment of a word can vary based on where it appears in a sentence. The TextBlob module allows us to take advantage of these labels.
2. Sentiment Labels: Each word in a corpus is labeled in terms of polarity and subjectivity (there are more labels as well, but we're going to ignore them for now). A corpus' sentiment is the average of these word-level labels.
   - Polarity: How positive or negative a word is. -1 is very negative. +1 is very positive.
   - Subjectivity: How subjective, or opinionated, a word is. 0 is fact. +1 is very much an opinion.

For more information on how TextBlob coded up its sentiment function, see the TextBlob documentation and source code.
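To make the averaging idea concrete, here is a minimal sketch of lexicon-based scoring in plain Python. The lexicon `TOY_LEXICON`, its values, and the helper `toy_sentiment` are all hypothetical and invented for illustration. This is not TextBlob's actual implementation, which also accounts for things like negation and intensifiers.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# The values are made up for illustration; they are not TextBlob's labels.
TOY_LEXICON = {
    "love": (0.5, 0.5),
    "great": (0.75, 0.75),
    "sad": (-0.5, 1.0),
    "terrible": (-1.0, 1.0),
}


def toy_sentiment(text):
    """Average the polarity and subjectivity of the words found in the lexicon."""
    words = text.lower().split()
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    if not scores:
        # No labeled words: treat the text as neutral and factual
        return 0.0, 0.0
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return polarity, subjectivity


print(toy_sentiment("I love this great song"))  # (0.625, 0.625)
print(toy_sentiment("the sky is blue"))         # (0.0, 0.0)
```

Words that are not in the lexicon are simply skipped, so a long lyric with only a few labeled words is scored from those few words alone.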

Let's take a look at the overall sentiment of the song lyrics, one genre at a time: rock, pop, and hip-hop.

In [1]:
# We'll start by reading in the corpus, which preserves word order
import pandas as pd

data_rock = pd.read_pickle('data_clean_rock.pkl')
data_rock


In [None]:
# Create quick lambda functions to find the polarity and subjectivity of each song
# Terminal / Anaconda Navigator: conda install -c conda-forge textblob
from textblob import TextBlob

pol = lambda x: TextBlob(x).sentiment.polarity
sub = lambda x: TextBlob(x).sentiment.subjectivity

data_rock['polarity'] = data_rock['Lyric'].apply(pol)
data_rock['subjectivity'] = data_rock['Lyric'].apply(sub)
data_rock['Genre'] = 'Rock'
data_rock.index.name = 'SongNumber'
data_rock

In [None]:
# Plot polarity against subjectivity for each rock song
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [20, 8]

for song in data_rock.index:
    x = data_rock.polarity.loc[song]
    y = data_rock.subjectivity.loc[song]

    plt.scatter(x, y, color='blue')
    # Look up the genre by index label rather than by position
    plt.text(x+.001, y+.001, data_rock['Genre'].loc[song], fontsize=10)

# Fix the x-axis to the full polarity range (set once, after plotting)
plt.xlim(-1, 1)

plt.title('Sentiment Analysis for Rock Songs', fontsize=20)
plt.xlabel('<-- Negative -------- Positive -->', fontsize=15)
plt.ylabel('<-- Facts -------- Opinions -->', fontsize=15)

plt.show()

In [None]:
# We'll start by reading in the corpus, which preserves word order
import pandas as pd

data_pop = pd.read_pickle('data_clean_pop.pkl')
data_pop

In [2]:
# Create quick lambda functions to find the polarity and subjectivity of each song
# Terminal / Anaconda Navigator: conda install -c conda-forge textblob
from textblob import TextBlob

pol = lambda x: TextBlob(x).sentiment.polarity
sub = lambda x: TextBlob(x).sentiment.subjectivity

data_pop['polarity'] = data_pop['Lyric'].apply(pol)
data_pop['subjectivity'] = data_pop['Lyric'].apply(sub)
data_pop['Genre'] = 'Pop'
data_pop.index.name = 'SongNumber'
data_pop


In [None]:
# Plot polarity against subjectivity for each pop song
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [20, 8]

for song in data_pop.index:
    x = data_pop.polarity.loc[song]
    y = data_pop.subjectivity.loc[song]

    plt.scatter(x, y, color='red')
    plt.text(x+.001, y+.001, 'Pop', fontsize=10)

# Fix the x-axis to the full polarity range (set once, after plotting)
plt.xlim(-1, 1)

plt.title('Sentiment Analysis for Pop Songs', fontsize=20)
plt.xlabel('<-- Negative -------- Positive -->', fontsize=15)
plt.ylabel('<-- Facts -------- Opinions -->', fontsize=15)

plt.show()

In [None]:
# We'll start by reading in the corpus, which preserves word order
import pandas as pd

data_hip = pd.read_pickle('data_clean_hiphop.pkl')
data_hip

In [None]:
# Create quick lambda functions to find the polarity and subjectivity of each song
# Terminal / Anaconda Navigator: conda install -c conda-forge textblob
from textblob import TextBlob

pol = lambda x: TextBlob(x).sentiment.polarity
sub = lambda x: TextBlob(x).sentiment.subjectivity

data_hip['polarity'] = data_hip['Lyric'].apply(pol)
data_hip['subjectivity'] = data_hip['Lyric'].apply(sub)
data_hip['Genre'] = 'Hip-Hop'
data_hip.index.name = 'SongNumber'
data_hip

In [None]:
# Plot polarity against subjectivity for each hip-hop song
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [20, 8]

for song in data_hip.index:
    x = data_hip.polarity.loc[song]
    y = data_hip.subjectivity.loc[song]

    plt.scatter(x, y, color='green')
    plt.text(x+.001, y+.001, 'Hip-Hop', fontsize=10)

# Fix the x-axis to the full polarity range (set once, after plotting)
plt.xlim(-1, 1)

plt.title('Sentiment Analysis for Hip-Hop Songs', fontsize=20)
plt.xlabel('<-- Negative -------- Positive -->', fontsize=15)
plt.ylabel('<-- Facts -------- Opinions -->', fontsize=15)

plt.show()

In [None]:
# Plot all three genres on one chart, with a legend to tell them apart
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [20, 8]

# One scatter call per genre draws every song in that genre at once
for data, color, genre in [(data_rock, 'blue', 'Rock'),
                           (data_pop, 'red', 'Pop'),
                           (data_hip, 'green', 'Hip-Hop')]:
    plt.scatter(data.polarity, data.subjectivity, color=color, label=genre)

# Fix the x-axis to the full polarity range
plt.xlim(-1, 1)
plt.legend(fontsize=12)

plt.title('Sentiment Analysis for Songs', fontsize=20)
plt.xlabel('<-- Negative -------- Positive -->', fontsize=15)
plt.ylabel('<-- Facts -------- Opinions -->', fontsize=15)

plt.show()
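
A combined scatter plot can be hard to read when the genres overlap, so a numeric summary may help. The sketch below shows one possible way to compare average polarity and subjectivity per genre with pandas. The three small data frames and their values are hypothetical stand-ins for the scored `data_rock`, `data_pop`, and `data_hip` frames, not real results.

```python
import pandas as pd

# Hypothetical stand-ins for the scored genre data frames (values are made up)
rock = pd.DataFrame({'polarity': [0.1, 0.3], 'subjectivity': [0.5, 0.7], 'Genre': 'Rock'})
pop = pd.DataFrame({'polarity': [0.2, 0.4], 'subjectivity': [0.4, 0.6], 'Genre': 'Pop'})
hip = pd.DataFrame({'polarity': [-0.1, 0.1], 'subjectivity': [0.5, 0.5], 'Genre': 'Hip-Hop'})

# Stack the genres into one frame, then average each score within a genre
combined = pd.concat([rock, pop, hip], ignore_index=True)
summary = combined.groupby('Genre')[['polarity', 'subjectivity']].mean()

print(summary)
```

With the real data, the same two lines (`pd.concat` followed by `groupby(...).mean()`) would apply to the three scored frames directly, assuming each has `polarity`, `subjectivity`, and `Genre` columns.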