### Barack Obama - Sentiment Analysis

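#### Language Processing Pipeline (Potential Idea - sentence tokenizer, POS, NER using spaCy pipeline)

In spaCy, a pipeline is the sequence of processing components (for example a part-of-speech tagger, a dependency parser and a named entity recognizer) that run on a text after it has been tokenized.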

In [1]:
import pandas as pd

In [2]:
BO_Nov2006 = pd.read_pickle('Obama Speeches Raw/BO_Nov2006_df.pkl')
BO_Feb2009 = pd.read_pickle('Obama Speeches Raw/BO_Feb2009_df.pkl')
BO_May2011 = pd.read_pickle('Obama Speeches Raw/BO_May2011_df.pkl')
BO_Aug2014 = pd.read_pickle('Obama Speeches Raw/BO_Aug2014_df.pkl')

In [3]:
BO_Nov2006.head()

Unnamed: 0,sentence
0,"Throughout American history, the..."
1,They are the soul-trying times our forbearers ...
2,making the hard choices and sacrifices necessa...
3,This was true for those who went...
4,It was true for those who lie bu...


In [4]:
# add the name of the speech to each DataFrame
BO_Nov2006['speech'] = 'A way Forward in Iraq Nov2006'
BO_Feb2009['speech'] = 'Responsibly Ending the War in Iraq Feb2009'
BO_May2011['speech'] = 'American Diplomacy in Middle East and Northern Africa May2011'
BO_Aug2014['speech'] = 'Iraq Airstrikes and Humanitarian Aid Aug2014'

In [5]:
# reset the index of each DataFrame; the old index is kept as an 'index' column, used later as the row id
BO_Nov2006.reset_index(inplace=True)
BO_Feb2009.reset_index(inplace=True)
BO_May2011.reset_index(inplace=True)
BO_Aug2014.reset_index(inplace=True)

### Sentiment Analysis using RoBERTa Twitter model

In [6]:
# import modules
import transformers
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax
from tqdm.notebook import tqdm # makes loops show a progress bar

In [7]:
# create tokenizer and model objects
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

In [8]:
def polarity_scores_roberta(speech_text):
    """Apply the model to a text and return the softmax scores as a dict."""
    encoded_text = tokenizer(speech_text, return_tensors='pt')
    output = model(**encoded_text)
    scores = output[0][0].detach().numpy()
    scores = softmax(scores) # returns a numpy array
    scores_dict = {
        "roberta_neg": scores[0],
        "roberta_neu": scores[1],
        "roberta_pos": scores[2],
    }
    return scores_dict
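
The `softmax` call in the function above turns the model's three raw output scores (logits) into values that are non-negative and sum to 1, which is why the `roberta_neg`, `roberta_neu` and `roberta_pos` columns can be read as probabilities. A minimal pure-Python sketch of that step follows; `softmax_sketch` is a hypothetical helper and the logits are made up for illustration, not real model output.

```python
import math

def softmax_sketch(logits):
    """Illustrative softmax: exponentiate and normalise so the values sum to 1."""
    # Subtracting the maximum first improves numerical stability
    # and does not change the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the order (negative, neutral, positive)
example_logits = [-1.0, 0.5, 2.0]
probs = softmax_sketch(example_logits)

print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```

The largest logit always receives the largest probability, so the ordering of the three classes is preserved.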

##### Run the model on each DataFrame

In [9]:
from tqdm.notebook import tqdm_notebook

results = {}

# Set up tqdm_notebook with the total number of iterations
tqdm_bar = tqdm_notebook(total=len(BO_Nov2006))

for i, row in BO_Nov2006.iterrows():
    try:
        text = row['sentence']
        rowid = row['index'] # make sure DataFrame has reset index 
        results[rowid] = polarity_scores_roberta(text) # apply function and store in results dictionary

    except RuntimeError:
        print(f"Broke for id: {rowid}") # catch row id if bad response
        
    tqdm_bar.update(1)  # Update the progress bar

tqdm_bar.close()

  0%|          | 0/199 [00:00<?, ?it/s]
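
The same loop is repeated for each of the four speeches. One possible way to avoid the repetition is to wrap it in a helper that takes the scoring function as an argument. The sketch below is an assumption about how that could look, not the notebook's own code: `score_sentences` and the stand-in `fake_scorer` are hypothetical names, and in the notebook the scorer would be `polarity_scores_roberta`.

```python
def score_sentences(rows, scorer):
    """Apply `scorer` to each (row_id, sentence) pair.

    Returns a dict of results keyed by row id and a list of ids that failed.
    """
    results = {}
    failed_ids = []
    for row_id, sentence in rows:
        try:
            results[row_id] = scorer(sentence)
        except RuntimeError:
            # Same idea as the notebook loop: note the id and carry on.
            failed_ids.append(row_id)
    return results, failed_ids


def fake_scorer(sentence):
    """Hypothetical stand-in for polarity_scores_roberta, used only in this demo."""
    if not sentence:
        raise RuntimeError("empty input")
    return {"roberta_neg": 0.1, "roberta_neu": 0.7, "roberta_pos": 0.2}


rows = [(0, "Good evening."), (1, ""), (2, "Thank you.")]
results, failed_ids = score_sentences(rows, fake_scorer)

print(sorted(results))  # [0, 2]
print(failed_ids)       # [1]
```

With a DataFrame, the pairs could be supplied as `zip(BO_Nov2006['index'], BO_Nov2006['sentence'])`, and the returned dictionary passed to `pd.DataFrame(results).T` as in the cells that follow.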

In [10]:
BO_Nov2006_results_df = pd.DataFrame(results).T
BO_Nov2006_df = pd.concat([BO_Nov2006, BO_Nov2006_results_df], axis=1)
BO_Nov2006_df

Unnamed: 0,index,sentence,speech,roberta_neg,roberta_neu,roberta_pos
0,0,"Throughout American history, the...",A way Forward in Iraq Nov2006,0.131547,0.740866,0.127587
1,1,They are the soul-trying times our forbearers ...,A way Forward in Iraq Nov2006,0.173441,0.693343,0.133217
2,2,making the hard choices and sacrifices necessa...,A way Forward in Iraq Nov2006,0.102811,0.694173,0.203016
3,3,This was true for those who went...,A way Forward in Iraq Nov2006,0.026974,0.689914,0.283112
4,4,It was true for those who lie bu...,A way Forward in Iraq Nov2006,0.650926,0.319936,0.029138
...,...,...,...,...,...,...
194,194,The time for waiting in Iraq is ...,A way Forward in Iraq Nov2006,0.478585,0.482813,0.038602
195,195,It is time to change our policy.,A way Forward in Iraq Nov2006,0.202463,0.732728,0.064810
196,196,It is time to give Iraqis their ...,A way Forward in Iraq Nov2006,0.188637,0.648993,0.162370
197,197,And it is time to refocus America’s ...,A way Forward in Iraq Nov2006,0.178327,0.707893,0.113781


In [11]:
results = {}

# Set up tqdm_notebook with the total number of iterations
tqdm_bar = tqdm_notebook(total=len(BO_Feb2009))

for i, row in BO_Feb2009.iterrows():
    try:
        text = row['sentence']
        rowid = row['index'] # make sure DataFrame has reset index 
        results[rowid] = polarity_scores_roberta(text) # apply function and store in results dictionary

    except RuntimeError:
        print(f"Broke for id: {rowid}") # catch row id if bad response
        
    tqdm_bar.update(1)  # Update the progress bar

tqdm_bar.close()

  0%|          | 0/177 [00:00<?, ?it/s]

In [12]:
BO_Feb2009_results_df = pd.DataFrame(results).T
BO_Feb2009_df = pd.concat([BO_Feb2009, BO_Feb2009_results_df], axis=1)
BO_Feb2009_df

Unnamed: 0,index,sentence,speech,roberta_neg,roberta_neu,roberta_pos
0,0,Good morning Marines.,Responsibly Ending the War in Iraq Feb2009,0.005283,0.117133,0.877583
1,1,Good morning Camp Lejeune.,Responsibly Ending the War in Iraq Feb2009,0.004063,0.277454,0.718482
2,2,Good morning Jacksonville.,Responsibly Ending the War in Iraq Feb2009,0.007778,0.177322,0.814900
3,3,Thank you for that outstanding welcome.,Responsibly Ending the War in Iraq Feb2009,0.001327,0.008560,0.990112
4,4,I want to thank Lieutenant General Hejlik for ...,Responsibly Ending the War in Iraq Feb2009,0.002158,0.069132,0.928709
...,...,...,...,...,...,...
172,172,Your sacrifice should challenge all of us -- e...,Responsibly Ending the War in Iraq Feb2009,0.285542,0.571485,0.142972
173,173,There will be more danger in the months ahead.,Responsibly Ending the War in Iraq Feb2009,0.865656,0.128964,0.005380
174,174,We will face new tests and unforeseen trials.,Responsibly Ending the War in Iraq Feb2009,0.208399,0.730318,0.061283
175,175,But thanks to the sacrifices of those who have...,Responsibly Ending the War in Iraq Feb2009,0.111227,0.497074,0.391699


In [13]:
results = {}

# Set up tqdm_notebook with the total number of iterations
tqdm_bar = tqdm_notebook(total=len(BO_May2011))

for i, row in BO_May2011.iterrows():
    try:
        text = row['sentence']
        rowid = row['index'] # make sure DataFrame has reset index 
        results[rowid] = polarity_scores_roberta(text) # apply function and store in results dictionary

    except RuntimeError:
        print(f"Broke for id: {rowid}") # catch row id if bad response
        
    tqdm_bar.update(1)  # Update the progress bar

tqdm_bar.close()

  0%|          | 0/271 [00:00<?, ?it/s]

In [14]:
BO_May2011_results_df = pd.DataFrame(results).T
BO_May2011_df = pd.concat([BO_May2011, BO_May2011_results_df], axis=1)
BO_May2011_df

Unnamed: 0,index,sentence,speech,roberta_neg,roberta_neu,roberta_pos
0,0,Thank you very much.,American Diplomacy in Middle East and Northern...,0.002824,0.037212,0.959963
1,1,Thank you.,American Diplomacy in Middle East and Northern...,0.007133,0.139902,0.852965
2,2,"Please, have a seat.",American Diplomacy in Middle East and Northern...,0.244394,0.691612,0.063994
3,3,Thank you very much.,American Diplomacy in Middle East and Northern...,0.002824,0.037212,0.959963
4,4,"I want to begin by thanking Hillary Clinton, w...",American Diplomacy in Middle East and Northern...,0.001580,0.042995,0.955426
...,...,...,...,...,...,...
266,266,12 It will not be easy.,American Diplomacy in Middle East and Northern...,0.477565,0.447358,0.075076
267,267,"There’s no straight line to progress, and har ...",American Diplomacy in Middle East and Northern...,0.120123,0.708587,0.171289
268,268,But the United States of America was founded o...,American Diplomacy in Middle East and Northern...,0.160360,0.770617,0.069023
269,269,And now we cannot hesitate to stand squarely o...,American Diplomacy in Middle East and Northern...,0.006380,0.170502,0.823118


In [15]:
results = {}

# Set up tqdm_notebook with the total number of iterations
tqdm_bar = tqdm_notebook(total=len(BO_Aug2014))

for i, row in BO_Aug2014.iterrows():
    try:
        text = row['sentence']
        rowid = row['index'] # make sure DataFrame has reset index 
        results[rowid] = polarity_scores_roberta(text) # apply function and store in results dictionary

    except RuntimeError:
        print(f"Broke for id: {rowid}") # catch row id if bad response
        
    tqdm_bar.update(1)  # Update the progress bar

tqdm_bar.close()

  0%|          | 0/73 [00:00<?, ?it/s]

In [16]:
BO_Aug2014_results_df = pd.DataFrame(results).T
BO_Aug2014_df = pd.concat([BO_Aug2014, BO_Aug2014_results_df], axis=1)
BO_Aug2014_df

Unnamed: 0,index,sentence,speech,roberta_neg,roberta_neu,roberta_pos
0,0,Good evening.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.003997,0.136151,0.859851
1,1,Today I authorized two operations in Iraq -- ...,Iraq Airstrikes and Humanitarian Aid Aug2014,0.406741,0.543621,0.049638
2,2,Let me explain the actions we’re taking and wh...,Iraq Airstrikes and Humanitarian Aid Aug2014,0.104323,0.850220,0.045457
3,3,"First, I said in June -- as the terrorist grou...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.265198,0.714070,0.020732
4,4,"In rece nt days, these terrorists have continu...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.755782,0.237042,0.007176
...,...,...,...,...,...,...
68,68,That’s who we are.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.029198,0.486363,0.484439
69,69,"So tonight, we give thanks to our men and wome...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.011584,0.106486,0.881929
70,70,They represent American leadership at its best.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.010095,0.197740,0.792165
71,71,"As a nation, we should be proud of them, and ...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.004900,0.079683,0.915417


##### Combine DataFrames

In [26]:
BO_combined_df = pd.concat([BO_Nov2006_df, BO_Feb2009_df, BO_May2011_df, BO_Aug2014_df], axis=0, ignore_index=True)
BO_combined_df.drop(columns=['index'], inplace=True)
BO_combined_df

Unnamed: 0,sentence,speech,roberta_neg,roberta_neu,roberta_pos
0,"Throughout American history, the...",A way Forward in Iraq Nov2006,0.131547,0.740866,0.127587
1,They are the soul-trying times our forbearers ...,A way Forward in Iraq Nov2006,0.173441,0.693343,0.133217
2,making the hard choices and sacrifices necessa...,A way Forward in Iraq Nov2006,0.102811,0.694173,0.203016
3,This was true for those who went...,A way Forward in Iraq Nov2006,0.026974,0.689914,0.283112
4,It was true for those who lie bu...,A way Forward in Iraq Nov2006,0.650926,0.319936,0.029138
...,...,...,...,...,...
715,That’s who we are.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.029198,0.486363,0.484439
716,"So tonight, we give thanks to our men and wome...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.011584,0.106486,0.881929
717,They represent American leadership at its best.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.010095,0.197740,0.792165
718,"As a nation, we should be proud of them, and ...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.004900,0.079683,0.915417


### Analyse the Results

We'll now assess the results of the model for each speech. The analysis will consist of the following:
* top 5 most positive sentences
* top 5 most negative sentences
* average sentiment score 

##### Top 5 Analysis

In [37]:
df_sorted_pos

Unnamed: 0,sentence,speech,roberta_neg,roberta_neu,roberta_pos
719,"God bless our Armed Forces, and God bless the ...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.00253,0.055588,0.941882
718,"As a nation, we should be proud of them, and ...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.0049,0.079683,0.915417
716,"So tonight, we give thanks to our men and wome...",Iraq Airstrikes and Humanitarian Aid Aug2014,0.011584,0.106486,0.881929
647,Good evening.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.003997,0.136151,0.859851
717,They represent American leadership at its best.,Iraq Airstrikes and Humanitarian Aid Aug2014,0.010095,0.19774,0.792165


In [41]:
for each in list(BO_combined_df['speech'].unique()):
    df = BO_combined_df[BO_combined_df['speech'] == each]
    df_sorted_pos = df.sort_values(by='roberta_pos', ascending=False).head(5)
    print(each)
    for i, row in df_sorted_pos.iterrows():
        print(f"{row['sentence']} | {row['roberta_pos']}")
    print('\n\n\n')

A way Forward in Iraq Nov2006
Now, I am hopeful that               the Iraq Study Group emerges next month with a series of proposals               around which we can begin to build a bipartisan consensus. | 0.8604554533958435
But we should know that our               success in doing so is enhanced by engaging our allies so that we               receive the crucial diplomatic, military, intelligence, and financial               support that can lighten our load and add legitimacy to our actions.                | 0.7828771471977234
Thank you.                  | 0.7771697044372559
Our best hope for success is to use the tools               we have ? | 0.7750783562660217
I am               committed to working with this White House and any of my colleagues               in the months to come to craft such a consensus. | 0.7290148138999939




Responsibly Ending the War in Iraq Feb2009
Thank you for that outstanding welcome. | 0.9901123642921448
Thank you, God Bless you, and God Bless th

In [42]:
for each in list(BO_combined_df['speech'].unique()):
    df = BO_combined_df[BO_combined_df['speech'] == each]
    df_sorted_neg = df.sort_values(by='roberta_neg', ascending=False).head(5)
    print(each)
    for i, row in df_sorted_neg.iterrows():
        print(f"{row['sentence']} | {row['roberta_neg']}")
    print('\n\n\n')

A way Forward in Iraq Nov2006
Today, the Iraqi landscape               is littered with ill-conceived, half-finished projects that have               done almost nothing to help the Iraqi people or stabilize the country. | 0.9608924984931946
Such lack of foresight               is simply inexcusable. | 0.9567728042602539
the American               people have determined that all these phrases have become meaningless               in the face of a conflict that grows more deadly and chaotic with               each passing day ? | 0.9480002522468567
a conflict that has only increased the terrorist               threat it was supposed to help contain.2,867 Americans have now               died in this war. | 0.9474151730537415
I refuse to accept the possibility that I will               have to come back a year from now and say the same thing. | 0.9457480311393738




Responsibly Ending the War in Iraq Feb2009
We cannot sustain indefinitely a commitment that has put a strain on our milita

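The model appears to do a reasonable job of categorising the sentences into negative and positive. In the Nov2006 speech, for example, the highest positive scores go to thanks and expressions of hope, while the highest negative scores go to criticisms of the conduct of the war. This top 5 analysis is a qualitative spot check rather than a formal evaluation, but it gives us some confidence in the results.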
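
A spot check of the extremes can be complemented with a rough summary of how sentences are labelled overall. The sketch below assigns each sentence the label with the highest score and counts the labels. The `dominant_label` helper is hypothetical, and the three rows are scores copied from the Aug2014 table above, chosen only for illustration.

```python
from collections import Counter

LABELS = ("roberta_neg", "roberta_neu", "roberta_pos")

def dominant_label(scores):
    """Return the name of the highest-scoring sentiment class."""
    return max(LABELS, key=lambda name: scores[name])

# Three rows taken from the Aug2014 results shown earlier
sample_rows = [
    {"roberta_neg": 0.003997, "roberta_neu": 0.136151, "roberta_pos": 0.859851},
    {"roberta_neg": 0.406741, "roberta_neu": 0.543621, "roberta_pos": 0.049638},
    {"roberta_neg": 0.755782, "roberta_neu": 0.237042, "roberta_pos": 0.007176},
]

label_counts = Counter(dominant_label(row) for row in sample_rows)
print(label_counts)
```

Run over a whole speech, counts like these would show at a glance how much of it the model treats as neutral, which the top 5 lists alone do not reveal.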

##### Average Sentiment Score