<h2>Appendix 11 - Sentiment Analysis, Polarity and Subjectivity</h2>

This program uses the TextBlob module to assign a polarity score and a subjectivity score to every tweet in our corpus. The reasons for choosing this approach to sentiment analysis, and an explanation of how it works, are given in the Sentiment Analysis section of our study.
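As a rough intuition for what the two scores mean: TextBlob's default analyzer looks up words in a sentiment lexicon and averages the polarity (-1.0 to 1.0) and subjectivity (0.0 to 1.0) of the words it recognises, returning 0.0 for both when it finds none. The sketch below is a simplified toy under that assumption. `TOY_LEXICON` and `toy_sentiment` are hypothetical names with made-up values, and the sketch ignores the negation and intensifier handling that TextBlob applies, so its numbers are not TextBlob's.

```python
# Toy illustration only: a hypothetical two-word lexicon, NOT TextBlob's real
# lexicon or scoring rules (TextBlob also handles negation and intensifiers).
TOY_LEXICON = {
    # word: (polarity in [-1, 1], subjectivity in [0, 1])
    "great": (0.8, 0.75),
    "bad": (-0.7, 0.6),
}


def toy_sentiment(text):
    """Average the polarity and subjectivity of lexicon words found in text.

    Returns (0.0, 0.0) when no lexicon word occurs, i.e. the text is
    treated as neutral and fully objective.
    """
    # Crude tokenisation: split on whitespace, strip punctuation and '#'
    words = [w.strip(".,!?#").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0, 0.0
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return polarity, subjectivity


print(toy_sentiment("A great speech, but a bad ending."))
print(toy_sentiment("No sentiment words here"))
```

The second call shows why many short, factual tweets can end up with scores of exactly 0.0 under a lexicon-averaging approach.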

In [1]:
import pandas as pd
import numpy as np
from textblob import TextBlob as tb

In [2]:
tweet_data = pd.read_excel("04sotu_with_gender_final.xlsx")

In [3]:
# Create new columns to be populated with sentiment values
tweet_data["tbpolarity"] = np.nan
tweet_data["tbsubjectivity"] = np.nan

In [5]:
for i in tweet_data.index:
    text = tweet_data.at[i, "text"]
    hashtags = tweet_data.at[i, "text_hashtags_split"]
    # Skip tweets with no text (empty Excel cells are read in as NaN floats)
    if isinstance(text, str):
        # Incorporate words from the split hashtags if available
        if isinstance(hashtags, str):
            text = text + " " + hashtags  # Combine text and hashtags
        sentiment = tb(text).sentiment  # Convert into a TextBlob object and score it
        tweet_data.at[i, "tbpolarity"] = sentiment.polarity  # Polarity in [-1.0, 1.0]
        tweet_data.at[i, "tbsubjectivity"] = sentiment.subjectivity  # Subjectivity in [0.0, 1.0]
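The branching in the loop above decides which string is scored for each tweet. It can be pulled out into a small pure function and checked without TextBlob or the spreadsheet. `build_sentiment_input` is a hypothetical helper introduced for illustration; like the loop, it assumes that missing cells arrive as non-string values such as `float("nan")`.

```python
def build_sentiment_input(text, hashtags):
    """Return the string to be scored for one tweet, or None if it has no text.

    Mirrors the notebook loop: non-string text (e.g. NaN from an empty Excel
    cell) is skipped, and the split hashtag words are appended to the tweet
    text when they are present.
    """
    if not isinstance(text, str):
        return None  # the tweet keeps NaN polarity and subjectivity
    if isinstance(hashtags, str):
        return text + " " + hashtags
    return text


print(build_sentiment_input("Strong economy", "jobs growth"))
print(build_sentiment_input("Strong economy", float("nan")))
print(build_sentiment_input(float("nan"), "jobs"))
```

The returned string, when not `None`, is what would be passed to `tb(...)`.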

In [7]:
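# Split polarity and subjectivity values into bins of width 0.25 for an early indication of the spread of the data.
# pd.cut uses right-closed intervals (left, right] by default, so the lowest edges sit just below
# -1.0 and 0.0; otherwise scores of exactly -1.0 (polarity) or 0.0 (subjectivity) would be dropped.
polarity_bins = [-1.1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
polarity_binned = pd.cut(tweet_data["tbpolarity"], polarity_bins)
subjectivity_bins = [-0.1, 0.25, 0.5, 0.75, 1.0]
subjectivity_binned = pd.cut(tweet_data["tbsubjectivity"], subjectivity_bins)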
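Because `pd.cut` defaults to `right=True`, each bin excludes its left edge and includes its right edge. A polarity of exactly 0.0 therefore lands in `(-0.25, 0.0]`, not `(0.0, 0.25]`. The standard-library sketch below reproduces that assignment rule for the polarity bins. `right_closed_bin` is a hypothetical helper, not part of pandas.

```python
from bisect import bisect_left

POLARITY_BINS = [-1.1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]


def right_closed_bin(value, edges):
    """Return the (left, right] interval from sorted edges that contains value.

    Mimics pd.cut's default right=True behaviour; returns None for values
    outside (edges[0], edges[-1]], where pd.cut would produce NaN.
    """
    if not edges[0] < value <= edges[-1]:
        return None
    k = bisect_left(edges, value)  # index of the first edge >= value: the right edge
    return edges[k - 1], edges[k]


for score in (-1.0, -0.25, 0.0, 0.1, 1.0):
    print(score, right_closed_bin(score, POLARITY_BINS))
```

This matters when reading the counts below: the `(-0.25, 0.0]` polarity bin mixes slightly negative tweets with every tweet scored exactly 0.0.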

In [8]:
polarity_binned.value_counts()  # Series method; top-level pd.value_counts is deprecated in recent pandas

(-0.25, 0.0]     145248
(0.0, 0.25]       47876
(0.25, 0.5]       36400
(-0.5, -0.25]     15262
(0.5, 0.75]        9514
(-0.75, -0.5]      8813
(0.75, 1.0]        7160
(-1.1, -0.75]      4136
Name: tbpolarity, dtype: int64

In [9]:
subjectivity_binned.value_counts()  # Series method; top-level pd.value_counts is deprecated in recent pandas

(-0.1, 0.25]    125768
(0.25, 0.5]      61238
(0.5, 0.75]      51439
(0.75, 1.0]      35964
Name: tbsubjectivity, dtype: int64
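Both value counts sum to the same total. This is consistent with the loop assigning polarity and subjectivity together, while rows with no text stay NaN and are excluded by `value_counts`. The sketch below copies the counts shown above and converts them to percentages. It is plain arithmetic on the reported output, not a re-run of the analysis.

```python
# Counts copied from the value_counts() output above (no new data).
polarity_counts = {
    "(-1.1, -0.75]": 4136,
    "(-0.75, -0.5]": 8813,
    "(-0.5, -0.25]": 15262,
    "(-0.25, 0.0]": 145248,
    "(0.0, 0.25]": 47876,
    "(0.25, 0.5]": 36400,
    "(0.5, 0.75]": 9514,
    "(0.75, 1.0]": 7160,
}
subjectivity_counts = {
    "(-0.1, 0.25]": 125768,
    "(0.25, 0.5]": 61238,
    "(0.5, 0.75]": 51439,
    "(0.75, 1.0]": 35964,
}


def percentage_shares(counts):
    """Return (total, {bin label: percentage of total, rounded to 0.1})."""
    total = sum(counts.values())
    return total, {label: round(100 * n / total, 1) for label, n in counts.items()}


pol_total, pol_shares = percentage_shares(polarity_counts)
subj_total, subj_shares = percentage_shares(subjectivity_counts)

# Both columns are filled together in the scoring loop, so the totals should match.
assert pol_total == subj_total

for label, pct in pol_shares.items():
    print(f"polarity     {label:>15}: {pct:5.1f}%")
for label, pct in subj_shares.items():
    print(f"subjectivity {label:>15}: {pct:5.1f}%")
```

For example, the `(-0.25, 0.0]` polarity bin holds about 52.9% of the 274,409 scored tweets, and the lowest subjectivity bin holds about 45.8%.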

In [13]:
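# Save the scored corpus; the context manager closes the file (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter("05sotu_with_tb_sentiment.xlsx") as writer:
    tweet_data.to_excel(writer, sheet_name="Sheet1")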