# Data Analysis

In this notebook, I'd like to finally get down to analyzing the sentiment of these tweets. We'll need to conduct the analysis, store the result in a new column of the dataframe, and then visualize it somehow to get an understanding of what's going on. Let's get started!

**Hypothesis:** Tweets about Naomi Osaka will be overwhelmingly more positive than tweets about Serena Williams.

In [15]:
%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd

from statistics import mean
from textblob import TextBlob

In [71]:
filepath = "./data/CLEANED_naomi-serena.pkl"

In [72]:
data = pd.read_pickle(filepath)

In [73]:
data.head()

Unnamed: 0,tweet_date,tweet_text,search query
0,Sat Sep 08 19:59:59 +0000 2018,"[naomi, osaka, upset, serena, williams, contro...",naomi osaka
1,Sat Sep 08 19:59:57 +0000 2018,"[go, girl, got, back, congrats, open]",naomi osaka
2,Sat Sep 08 19:59:56 +0000 2018,"[probably, felt, like, friend, house, mom, sta...",naomi osaka
3,Sat Sep 08 19:59:55 +0000 2018,"[congrats, girly, let, anyone, take, moment, o...",naomi osaka
4,Sat Sep 08 19:59:55 +0000 2018,"[naomi, osaka, defeat, serena, williams, drama...",naomi osaka


In [7]:
test_data = data[:10].copy()
test_data

Unnamed: 0,tweet_date,tweet_text,search query
0,Sat Sep 08 19:59:59 +0000 2018,"[naomi, osaka, upset, serena, williams, contro...",naomi osaka
1,Sat Sep 08 19:59:57 +0000 2018,"[go, girl, got, back, congrats, open]",naomi osaka
2,Sat Sep 08 19:59:56 +0000 2018,"[probably, felt, like, friend, house, mom, sta...",naomi osaka
3,Sat Sep 08 19:59:55 +0000 2018,"[congrats, girly, let, anyone, take, moment, o...",naomi osaka
4,Sat Sep 08 19:59:55 +0000 2018,"[naomi, osaka, defeat, serena, williams, drama...",naomi osaka
6,Sat Sep 08 19:59:54 +0000 2018,"[carlos, ramos, also, robbed, osaka, imagine, ...",naomi osaka
7,Sat Sep 08 19:59:52 +0000 2018,"[yes, bravo, bajin, course, love, always, sinc...",naomi osaka
8,Sat Sep 08 19:59:50 +0000 2018,"[tennis, official, coach, seen, coaching, play...",naomi osaka
9,Sat Sep 08 19:59:50 +0000 2018,"[naomi, osaka, top, serena, williams, open, fi...",naomi osaka
10,Sat Sep 08 19:59:49 +0000 2018,"[booing, damn, naomi, osaka, girl, cooking, go...",naomi osaka


In [10]:
for tweet in test_data['tweet_text']:
    for word in tweet:
        print(TextBlob(word).sentiment)

Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.55, subjectivity=0.95)
Sentiment(polarity=0.0, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=1.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.5)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.0, subjecti

This is pretty interesting. Most words score as neutral (polarity 0.0), so on their own they contribute nothing to the sentiment. I'd like to try two ways of scoring a tweet:

1. Score each word separately, then average the scores to get the sentiment of the tweet.
2. Join the words into a pseudo-sentence and score the sentence as a whole.

## Strategy 1: Word-Based Sentiment

In [40]:
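def get_sentiment(tweets):
    """Average the word-level polarity of a tokenized tweet."""
    for tweet in tweets:
        sentiments = []
        for word in tweet:
            sentiments.append(TextBlob(word).sentiment.polarity)
        # Note: returning here exits after the first tweet, so only its mean is reported.
        return mean(sentiments)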

In [41]:
get_sentiment(test_data['tweet_text'])

0.05500000000000001
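
Because `get_sentiment` returns from inside the outer loop, the value above is the word-level mean of the first tweet only. Here is a minimal sketch of a variant that produces one score per tweet. `word_level_sentiment` and its `score_word` parameter are hypothetical names; `score_word` stands in for something like `lambda w: TextBlob(w).sentiment.polarity`, and the toy lexicon exists only so the example runs without TextBlob.

```python
from statistics import mean

def word_level_sentiment(tweets, score_word):
    """Return one averaged word polarity per tokenized tweet."""
    results = []
    for tweet in tweets:
        # mean() raises on an empty list, so treat empty tweets as neutral.
        if not tweet:
            results.append(0.0)
            continue
        results.append(mean(score_word(word) for word in tweet))
    return results

# Toy stand-in for a real polarity lookup (values are illustrative only).
toy_lexicon = {"love": 0.5, "awful": -1.0}
toy_scores = word_level_sentiment(
    [["love", "tennis"], ["awful"], []],
    lambda word: toy_lexicon.get(word, 0.0),
)
print(toy_scores)
```

Collecting the scores in a list keeps them aligned with the rows of the dataframe, so they could be assigned to a new column directly.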

Looking at this output, I'm not sure I like what I'm seeing. Tweets that are clearly positive are registering as neutral, and tweets that could be considered negative have a score above zero. Let's see whether the sentence-based method gives different output.

In [22]:
def get_sentiment_2(tweets):
    for tweet in tweets:
        # Join the tokens back into one string and score it as a whole.
        sentence = " ".join(tweet)
        print(sentence, TextBlob(sentence).sentiment.polarity)

In [23]:
get_sentiment_2(test_data['tweet_text'])

naomi osaka upset serena williams controversial open final cnn smartnews 0.18333333333333335
go girl got back congrats open 0.0
probably felt like friend house mom started yelling usopen 0.0
congrats girly let anyone take moment outplayed everyone even goat serena 0.0
naomi osaka defeat serena williams dramatic open final -0.14444444444444443
carlos ramos also robbed osaka imagine much better would feel broke serena go instead giving game 0.04999999999999999
yes bravo bajin course love always since turned pro 0.5
naomi osaka top serena williams open final becomes first japanese grand slam single champion japan time 0.1683673469387755
booing damn naomi osaka girl cooking good job spot light earned naomi 0.55


A bit better, but I'm starting to think it was a mistake to tokenize the tweets the way I did. Let's see what happens if I run the sentiment analysis on the raw tweets, with no tokenization.

In [24]:
test_file = pd.read_pickle("./data/DATA00_naomi-serena-all.pkl")

In [25]:
test_file.head()

Unnamed: 0,tweet_date,tweet_text,search query
0,Sat Sep 08 19:59:59 +0000 2018,Naomi Osaka upsets Serena Williams in controve...,naomi osaka
1,Sat Sep 08 19:59:57 +0000 2018,"@ Naomi_Osaka_ , you go girl! I got your back!...",naomi osaka
2,Sat Sep 08 19:59:56 +0000 2018,@ Naomi_Osaka_ probably felt like she was at h...,naomi osaka
3,Sat Sep 08 19:59:55 +0000 2018,"Congrats girly, don’t let anyone take this mom...",naomi osaka
4,Sat Sep 08 19:59:55 +0000 2018,Naomi Osaka defeats Serena Williams in a drama...,naomi osaka


In [26]:
test2 = test_file[:10].copy()

In [27]:
test2

Unnamed: 0,tweet_date,tweet_text,search query
0,Sat Sep 08 19:59:59 +0000 2018,Naomi Osaka upsets Serena Williams in controve...,naomi osaka
1,Sat Sep 08 19:59:57 +0000 2018,"@ Naomi_Osaka_ , you go girl! I got your back!...",naomi osaka
2,Sat Sep 08 19:59:56 +0000 2018,@ Naomi_Osaka_ probably felt like she was at h...,naomi osaka
3,Sat Sep 08 19:59:55 +0000 2018,"Congrats girly, don’t let anyone take this mom...",naomi osaka
4,Sat Sep 08 19:59:55 +0000 2018,Naomi Osaka defeats Serena Williams in a drama...,naomi osaka
5,Sat Sep 08 19:59:55 +0000 2018,https://twitter.com/juventino5555/status/10377...,naomi osaka
6,Sat Sep 08 19:59:54 +0000 2018,Carlos Ramos also robbed Osaka. Imagine how mu...,naomi osaka
7,Sat Sep 08 19:59:52 +0000 2018,Yes Bravo to @ BigSascha Bajin And of course l...,naomi osaka
8,Sat Sep 08 19:59:50 +0000 2018,Tennis officials.. where coaches are seen coac...,naomi osaka
9,Sat Sep 08 19:59:50 +0000 2018,Naomi Osaka tops Serena Williams in U.S. Open ...,naomi osaka


In [28]:
for tweet in test2['tweet_text']:
    print(tweet, TextBlob(tweet).sentiment.polarity)

Naomi Osaka upsets Serena Williams in controversial US Open final - CNN # SmartNewshttps://edition.cnn.com/2018/09/08/sport/naomi-osaka-serena-williams-us-open-tennis-int-spt/index.html … 0.18333333333333335
@ Naomi_Osaka_ , you go girl! I got your back! Congrats on the US open! 0.0
@ Naomi_Osaka_ probably felt like she was at her friend’s house when their mom started yelling at them # usopen 0.0
Congrats girly, don’t let anyone take this moment from you..you outplayed everyone, even the GOAT Serena @ Naomi_Osaka_ 0.0
Naomi Osaka defeats Serena Williams in a dramatic US Open final https://twitter.com/i/events/1038540032330493952 … -0.14444444444444443
https://twitter.com/juventino5555/status/1037768949109276672?s=19 … 0.0
Carlos Ramos also robbed Osaka. Imagine how much better she would feel if she broke Serena to go up 5-3 instead of being giving the game. 0.04999999999999999
Yes Bravo to @ BigSascha Bajin And of course love as always (since she turned pro!) to @ Naomi_Osaka_ https://

These still seem overwhelmingly positive, though I'm getting more sentiment than from the tokenized corpus alone. Just to test the hypothesis, let's see what happens with the last 10 tweets of the dataset.

In [29]:
test3 = test_file[-10:].copy()

In [30]:
test3

Unnamed: 0,tweet_date,tweet_text,search query
23818,Sat Sep 08 18:45:26 +0000 2018,You should see what Serena's coach has just ad...,serena williams
23819,Sat Sep 08 18:45:25 +0000 2018,Serena Williams. The greatest athlete of all t...,serena williams
23820,Sat Sep 08 18:45:25 +0000 2018,Congrats to # naomiosaka who just won the # US...,serena williams
23821,Sat Sep 08 18:45:25 +0000 2018,Justified as in being upset at a stupid rule/v...,serena williams
23822,Sat Sep 08 18:45:25 +0000 2018,That's why you're so irrelevant Mr. Kasich. Ca...,serena williams
23823,Sat Sep 08 18:45:25 +0000 2018,@ serenawilliams you are an inspiration and a ...,serena williams
23824,Sat Sep 08 18:45:24 +0000 2018,@ serenawilliams I love you so much mommy. I k...,serena williams
23825,Sat Sep 08 18:45:24 +0000 2018,@ serenawilliams Mama needs a dictionary. Grac...,serena williams
23826,Sat Sep 08 18:45:23 +0000 2018,I’d rather lose than cheat @ serenawilliams,serena williams
23827,Sat Sep 08 18:45:23 +0000 2018,"Sorry, don't agree on this one. Referee had no...",serena williams


In [31]:
for tweet in test3['tweet_text']:
    print(tweet, TextBlob(tweet).sentiment.polarity)

You should see what Serena's coach has just admitted to. Yep, coaching his player. 0.0
Serena Williams. The greatest athlete of all time. A black woman. # USOpen18 0.4166666666666667
Congrats to # naomiosaka who just won the # USopen against her idol, # SerenaWilliams https://www.instagram.com/p/Bne3yDrBRn3/?utm_source=ig_twitter_share&igshid=755i1rx1ff4k … 0.0
Justified as in being upset at a stupid rule/violation. -0.19999999999999996
That's why you're so irrelevant Mr. Kasich. Calling an umpire names and a thief is not being a real sport and a class act. -0.15
@ serenawilliams you are an inspiration and a role model for women. Not only in sports but ALL women. # UsOpenFinal 0.0
@ serenawilliams I love you so much mommy. I know that you were right. You are amazing woman and I want to be like you. I will never ever stop dreaming to see you. And of course congratulations @ Naomi_Osaka_. Don't be sad. You are amazing. You don't deserve that, but life happenspic.twitter.com/PMWFTayrqH 0.

Still not quite what I was expecting, but the sentiment is there, nonetheless.

Let's do a side-by-side comparison of both strategies, using just the tokenized tweets.

In [48]:
test_data['lexicon-based sentiment'] = test_data['tweet_text'].apply(lambda x: mean([TextBlob(word).sentiment.polarity for word in x]))

In [49]:
test_data

Unnamed: 0,tweet_date,tweet_text,search query,lexicon-based sentiment
0,Sat Sep 08 19:59:59 +0000 2018,"[naomi, osaka, upset, serena, williams, contro...",naomi osaka,0.055
1,Sat Sep 08 19:59:57 +0000 2018,"[go, girl, got, back, congrats, open]",naomi osaka,0.0
2,Sat Sep 08 19:59:56 +0000 2018,"[probably, felt, like, friend, house, mom, sta...",naomi osaka,0.0
3,Sat Sep 08 19:59:55 +0000 2018,"[congrats, girly, let, anyone, take, moment, o...",naomi osaka,0.0
4,Sat Sep 08 19:59:55 +0000 2018,"[naomi, osaka, defeat, serena, williams, drama...",naomi osaka,-0.054167
6,Sat Sep 08 19:59:54 +0000 2018,"[carlos, ramos, also, robbed, osaka, imagine, ...",naomi osaka,0.01875
7,Sat Sep 08 19:59:52 +0000 2018,"[yes, bravo, bajin, course, love, always, sinc...",naomi osaka,0.055556
8,Sat Sep 08 19:59:50 +0000 2018,"[tennis, official, coach, seen, coaching, play...",naomi osaka,0.023077
9,Sat Sep 08 19:59:50 +0000 2018,"[naomi, osaka, top, serena, williams, open, fi...",naomi osaka,0.073661
10,Sat Sep 08 19:59:49 +0000 2018,"[booing, damn, naomi, osaka, girl, cooking, go...",naomi osaka,0.091667


In [50]:
test_data['pseudo-sentence sentiment'] = test_data['tweet_text'].apply(lambda x: TextBlob(" ".join(x)).sentiment.polarity)

In [51]:
test_data

Unnamed: 0,tweet_date,tweet_text,search query,lexicon-based sentiment,pseudo-sentence sentiment
0,Sat Sep 08 19:59:59 +0000 2018,"[naomi, osaka, upset, serena, williams, contro...",naomi osaka,0.055,0.183333
1,Sat Sep 08 19:59:57 +0000 2018,"[go, girl, got, back, congrats, open]",naomi osaka,0.0,0.0
2,Sat Sep 08 19:59:56 +0000 2018,"[probably, felt, like, friend, house, mom, sta...",naomi osaka,0.0,0.0
3,Sat Sep 08 19:59:55 +0000 2018,"[congrats, girly, let, anyone, take, moment, o...",naomi osaka,0.0,0.0
4,Sat Sep 08 19:59:55 +0000 2018,"[naomi, osaka, defeat, serena, williams, drama...",naomi osaka,-0.054167,-0.144444
6,Sat Sep 08 19:59:54 +0000 2018,"[carlos, ramos, also, robbed, osaka, imagine, ...",naomi osaka,0.01875,0.05
7,Sat Sep 08 19:59:52 +0000 2018,"[yes, bravo, bajin, course, love, always, sinc...",naomi osaka,0.055556,0.5
8,Sat Sep 08 19:59:50 +0000 2018,"[tennis, official, coach, seen, coaching, play...",naomi osaka,0.023077,0.2
9,Sat Sep 08 19:59:50 +0000 2018,"[naomi, osaka, top, serena, williams, open, fi...",naomi osaka,0.073661,0.168367
10,Sat Sep 08 19:59:49 +0000 2018,"[booing, damn, naomi, osaka, girl, cooking, go...",naomi osaka,0.091667,0.55


It looks like the pseudo-sentence sentiment is stronger, often by a wide margin: row 7 scores 0.5 as a pseudo-sentence but only 0.055556 when its words are averaged. I think it might be beneficial to retain both a tokenized and a non-tokenized version of each tweet, to compare those sentiment polarities as well.
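
As a rough illustration of keeping both versions side by side, the sketch below returns a cleaned string and a token list from one raw tweet. It is not the logic of `scrub_tweets.py`, which isn't shown here; `prepare_tweet`, the regular expressions, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "in", "on", "of", "and", "you", "your", "i"}

def prepare_tweet(raw):
    """Return (cleaned_text, tokens) for one raw tweet string."""
    text = re.sub(r"https?://\S+", " ", raw)   # drop URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # drop punctuation, digits, symbols
    cleaned = " ".join(text.lower().split())   # lowercase, collapse whitespace
    tokens = [w for w in cleaned.split() if w not in STOP_WORDS]
    return cleaned, tokens

cleaned, tokens = prepare_tweet("You go girl! I got your back! https://example.com/x")
print(cleaned)
print(tokens)
```

Storing the two results in separate columns makes it possible to score the same tweet both ways and compare.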

In [52]:
%run scrub_tweets.py

Filename: DATA00_naomi-serena-all.pkl
Loading ./data/DATA00_naomi-serena-all.pkl...
A brief look at the data...
                       tweet_date  \
0  Sat Sep 08 19:59:59 +0000 2018   
1  Sat Sep 08 19:59:57 +0000 2018   
2  Sat Sep 08 19:59:56 +0000 2018   
3  Sat Sep 08 19:59:55 +0000 2018   
4  Sat Sep 08 19:59:55 +0000 2018   

                                          tweet_text search query  
0  Naomi Osaka upsets Serena Williams in controve...  naomi osaka  
1  @ Naomi_Osaka_ , you go girl! I got your back!...  naomi osaka  
2  @ Naomi_Osaka_ probably felt like she was at h...  naomi osaka  
3  Congrats girly, don’t let anyone take this mom...  naomi osaka  
4  Naomi Osaka defeats Serena Williams in a drama...  naomi osaka  
---------------------------
                           tweet_date  \
23823  Sat Sep 08 18:45:25 +0000 2018   
23824  Sat Sep 08 18:45:24 +0000 2018   
23825  Sat Sep 08 18:45:24 +0000 2018   
23826  Sat Sep 08 18:45:23 +0000 2018   
23827  Sat Sep 08 18:45:

In [53]:
new_data = pd.read_pickle("./data/CLEANED_DATA02_naomi-serena-processed.pkl")

In [54]:
new_data.head()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro..."
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]"
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta..."
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o..."
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama..."


In [55]:
new_test_data = new_data[:10].copy()

In [56]:
new_test_data['tweet sentiment'] = new_test_data['tweet_text'].apply(lambda x: TextBlob(x).sentiment.polarity)
new_test_data['tokenized sentiment'] = new_test_data['processed_tweet'].apply(lambda x: mean([TextBlob(word).sentiment.polarity for word in x]))

In [57]:
new_test_data

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet,tweet sentiment,tokenized sentiment
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro...",0.183333,0.055
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]",0.0,0.0
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta...",0.0,0.0
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o...",0.0,0.0
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama...",-0.144444,-0.054167
6,Sat Sep 08 19:59:54 +0000 2018,carlos ramos also robbed osaka imagine how mu...,naomi osaka,"[carlos, ramos, also, robbed, osaka, imagine, ...",0.05,0.01875
7,Sat Sep 08 19:59:52 +0000 2018,yes bravo to bajin and of course love as alw...,naomi osaka,"[yes, bravo, bajin, course, love, always, sinc...",0.5,0.055556
8,Sat Sep 08 19:59:50 +0000 2018,tennis officials where coaches are seen coac...,naomi osaka,"[tennis, official, coach, seen, coaching, play...",0.25,0.023077
9,Sat Sep 08 19:59:50 +0000 2018,naomi osaka tops serena williams in u s open ...,naomi osaka,"[naomi, osaka, top, serena, williams, open, fi...",0.15,0.073661
10,Sat Sep 08 19:59:49 +0000 2018,booing damn naomi osaka won the girl was cooki...,naomi osaka,"[booing, damn, naomi, osaka, girl, cooking, go...",0.55,0.091667


In [60]:
# rename() returns a new DataFrame; new_test_data itself keeps its original column names.
new_test_data.rename(index=str, columns={"tweet_text": "original tweet", "processed_tweet": "tweet_text"})

Unnamed: 0,tweet_date,original tweet,search query,tweet_text,tweet sentiment,tokenized sentiment
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro...",0.183333,0.055
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]",0.0,0.0
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta...",0.0,0.0
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o...",0.0,0.0
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama...",-0.144444,-0.054167
6,Sat Sep 08 19:59:54 +0000 2018,carlos ramos also robbed osaka imagine how mu...,naomi osaka,"[carlos, ramos, also, robbed, osaka, imagine, ...",0.05,0.01875
7,Sat Sep 08 19:59:52 +0000 2018,yes bravo to bajin and of course love as alw...,naomi osaka,"[yes, bravo, bajin, course, love, always, sinc...",0.5,0.055556
8,Sat Sep 08 19:59:50 +0000 2018,tennis officials where coaches are seen coac...,naomi osaka,"[tennis, official, coach, seen, coaching, play...",0.25,0.023077
9,Sat Sep 08 19:59:50 +0000 2018,naomi osaka tops serena williams in u s open ...,naomi osaka,"[naomi, osaka, top, serena, williams, open, fi...",0.15,0.073661
10,Sat Sep 08 19:59:49 +0000 2018,booing damn naomi osaka won the girl was cooki...,naomi osaka,"[booing, damn, naomi, osaka, girl, cooking, go...",0.55,0.091667


In [64]:
earlier_results = test_data[["lexicon-based sentiment", "pseudo-sentence sentiment"]]

In [69]:
new_test_data.join(earlier_results)

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet,tweet sentiment,tokenized sentiment,lexicon-based sentiment,pseudo-sentence sentiment
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro...",0.183333,0.055,0.055,0.183333
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]",0.0,0.0,0.0,0.0
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta...",0.0,0.0,0.0,0.0
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o...",0.0,0.0,0.0,0.0
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama...",-0.144444,-0.054167,-0.054167,-0.144444
6,Sat Sep 08 19:59:54 +0000 2018,carlos ramos also robbed osaka imagine how mu...,naomi osaka,"[carlos, ramos, also, robbed, osaka, imagine, ...",0.05,0.01875,0.01875,0.05
7,Sat Sep 08 19:59:52 +0000 2018,yes bravo to bajin and of course love as alw...,naomi osaka,"[yes, bravo, bajin, course, love, always, sinc...",0.5,0.055556,0.055556,0.5
8,Sat Sep 08 19:59:50 +0000 2018,tennis officials where coaches are seen coac...,naomi osaka,"[tennis, official, coach, seen, coaching, play...",0.25,0.023077,0.023077,0.2
9,Sat Sep 08 19:59:50 +0000 2018,naomi osaka tops serena williams in u s open ...,naomi osaka,"[naomi, osaka, top, serena, williams, open, fi...",0.15,0.073661,0.073661,0.168367
10,Sat Sep 08 19:59:49 +0000 2018,booing damn naomi osaka won the girl was cooki...,naomi osaka,"[booing, damn, naomi, osaka, girl, cooking, go...",0.55,0.091667,0.091667,0.55


The results are almost exactly the same: a pseudo-sentence made by joining the tokenized words produces much the same sentiment as the tweet itself (only rows 8 and 9 differ, and only slightly). Removing stop words and the like seems to have little impact on the overall sentiment of a tweet. However, averaging the sentiment of the individual tokens seems much less useful than analyzing the sentence as a whole (I believe because sentiment depends on syntax and semantics at the sentence level, not just on individual words).
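
Another factor may simply be dilution. Averaging over every token counts each neutral word as 0.0, whereas the sentence-level score appears to average only over the words that carry a score. The sketch below reproduces row 7's two numbers under that assumption, using the tokens printed earlier and a toy lexicon in which only `love` is scored. It does not call TextBlob, and the reading of how TextBlob aggregates scores is an inference from the figures in this notebook rather than a confirmed description of its internals.

```python
from statistics import mean

# Tokens of row 7 as printed earlier in the notebook.
tokens = "yes bravo bajin course love always since turned pro".split()

# Toy lexicon (assumption): only "love" carries a polarity score.
toy_lexicon = {"love": 0.5}

# Word-level strategy: every token contributes, neutral words count as 0.0.
word_average = mean(toy_lexicon.get(t, 0.0) for t in tokens)

# Sentence-level reading: average only over tokens found in the lexicon.
hits = [toy_lexicon[t] for t in tokens if t in toy_lexicon]
hit_average = mean(hits) if hits else 0.0

print(round(word_average, 6), hit_average)
```

The two results line up with the 0.055556 and 0.5 reported for that row, which is at least consistent with neutral words diluting the word-level average.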

The analysis will proceed by joining the tokenized words into a single string and calculating the overall sentiment of that string.

### Calculating Sentiment for Data:

In [82]:
new_data.head()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro..."
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]"
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta..."
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o..."
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama..."


In [83]:
analyzed_tweets = new_data.copy()

In [84]:
analyzed_tweets.head()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro..."
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]"
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta..."
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o..."
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama..."


In [85]:
analyzed_tweets.tail()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet
23823,Sat Sep 08 18:45:25 +0000 2018,you are an inspiration and a role model for ...,serena williams,"[inspiration, role, model, woman, sport, woman..."
23824,Sat Sep 08 18:45:24 +0000 2018,i love you so much mommy i know that you we...,serena williams,"[love, much, mommy, know, right, amazing, woma..."
23825,Sat Sep 08 18:45:24 +0000 2018,mama needs a dictionary gracious humble u...,serena williams,"[mama, need, dictionary, gracious, humble, umm..."
23826,Sat Sep 08 18:45:23 +0000 2018,i would rather lose than cheat,serena williams,"[would, rather, lose, cheat]"
23827,Sat Sep 08 18:45:23 +0000 2018,sorry do not agree on this one referee had n...,serena williams,"[sorry, agree, one, referee, business, making,..."


In [86]:
analyzed_tweets['sentiment'] = analyzed_tweets['processed_tweet'].apply(lambda x: TextBlob(" ".join(x)).sentiment.polarity)

In [87]:
analyzed_tweets.head()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet,sentiment
0,Sat Sep 08 19:59:59 +0000 2018,naomi osaka upsets serena williams in controve...,naomi osaka,"[naomi, osaka, upset, serena, williams, contro...",0.183333
1,Sat Sep 08 19:59:57 +0000 2018,you go girl i got your back congrats on ...,naomi osaka,"[go, girl, got, back, congrats, open]",0.0
2,Sat Sep 08 19:59:56 +0000 2018,probably felt like she was at her friends ho...,naomi osaka,"[probably, felt, like, friend, house, mom, sta...",0.0
3,Sat Sep 08 19:59:55 +0000 2018,congrats girly do not let anyone take this mo...,naomi osaka,"[congrats, girly, let, anyone, take, moment, o...",0.0
4,Sat Sep 08 19:59:55 +0000 2018,naomi osaka defeats serena williams in a drama...,naomi osaka,"[naomi, osaka, defeat, serena, williams, drama...",-0.144444


In [88]:
analyzed_tweets.tail()

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet,sentiment
23823,Sat Sep 08 18:45:25 +0000 2018,you are an inspiration and a role model for ...,serena williams,"[inspiration, role, model, woman, sport, woman...",0.0
23824,Sat Sep 08 18:45:24 +0000 2018,i love you so much mommy i know that you we...,serena williams,"[love, much, mommy, know, right, amazing, woma...",0.280952
23825,Sat Sep 08 18:45:24 +0000 2018,mama needs a dictionary gracious humble u...,serena williams,"[mama, need, dictionary, gracious, humble, umm...",0.195238
23826,Sat Sep 08 18:45:23 +0000 2018,i would rather lose than cheat,serena williams,"[would, rather, lose, cheat]",0.0
23827,Sat Sep 08 18:45:23 +0000 2018,sorry do not agree on this one referee had n...,serena williams,"[sorry, agree, one, referee, business, making,...",-0.25


In [89]:
analyzed_tweets.sort_values(by="sentiment")

Unnamed: 0,tweet_date,tweet_text,search query,processed_tweet,sentiment
9067,Sat Sep 08 18:35:26 +0000 2018,no you are terribly disrespectful with shing...,naomi osaka,"[terribly, disrespectful, shingo, kunieda, yui...",-1.0
16528,Sat Sep 08 19:22:34 +0000 2018,yes after warning she had the penalty coach...,serena williams,"[yes, warning, penalty, coach, raquet, screami...",-1.0
12414,Sat Sep 08 19:52:02 +0000 2018,shame that disgraceful official did not show h...,serena williams,"[shame, disgraceful, official, show, level, re...",-1.0
4634,Sat Sep 08 19:07:53 +0000 2018,as opposed to carlos stealing a point how do...,naomi osaka,"[opposed, carlos, stealing, point, get, coachi...",-1.0
23102,Sat Sep 08 18:48:28 +0000 2018,he will forever be known as a horrible ref,serena williams,"[forever, known, horrible, ref]",-1.0
21940,Sat Sep 08 18:53:42 +0000 2018,mate you are quoting incidents from decades ag...,serena williams,"[mate, quoting, incident, decade, ago, abused,...",-1.0
19423,Sat Sep 08 19:05:31 +0000 2018,serena williams conduct tonight was disgusting...,serena williams,"[serena, williams, conduct, tonight, disgustin...",-1.0
16726,Sat Sep 08 19:21:12 +0000 2018,awful,serena williams,[awful],-1.0
16767,Sat Sep 08 19:20:56 +0000 2018,waoooo so serena is for tennis channel what hi...,serena williams,"[waoooo, serena, tennis, channel, hillary, cnn...",-1.0
19549,Sat Sep 08 19:04:56 +0000 2018,disgusting sportsmanship by serenawilliams u...,serena williams,"[disgusting, sportsmanship, serenawilliams, us...",-1.0


In [91]:
analyzed_tweets.groupby(by="search query").size()

search query
naomi osaka        11380
serena williams    12395
dtype: int64

In [92]:
naomi_analysis = analyzed_tweets[analyzed_tweets['search query'] == 'naomi osaka']

In [93]:
serena_analysis = analyzed_tweets[analyzed_tweets['search query'] == 'serena williams']

In [95]:
naomi_analysis['sentiment'].sum()

1897.4450200120045

In [96]:
serena_analysis['sentiment'].sum()

1305.5000090534168
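
Because the two groups differ in size (11380 vs. 12395 tweets), the raw sums are not directly comparable. Dividing each sum by its group size gives a per-tweet mean. The sketch below does that arithmetic with the figures printed above; in the notebook itself, `naomi_analysis['sentiment'].mean()` and `serena_analysis['sentiment'].mean()` should presumably give the same values.

```python
# Sums and group sizes copied from the notebook output above.
naomi_sum, naomi_count = 1897.4450200120045, 11380
serena_sum, serena_count = 1305.5000090534168, 12395

# Per-tweet mean polarity for each search query.
naomi_mean = naomi_sum / naomi_count
serena_mean = serena_sum / serena_count

print(f"naomi osaka:     {naomi_mean:.4f}")
print(f"serena williams: {serena_mean:.4f}")
```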

I think I've got my final answer! By taking a crude sum of the sentiment scores for each search query, I found that tweets for the search query 'naomi osaka' were, on the whole, more positive than tweets for the search query 'serena williams'. I wonder if I can run a statistical test to confirm this. I'm excited to move on to visualizations so I can see this data more clearly.
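
One possible way to test the difference is Welch's t-test, which compares two group means without assuming equal variances. This is a sketch of the statistic only, not an analysis the notebook performs. `welch_t` is a hypothetical helper, and the sample lists are made-up placeholder values, not data from the tweets.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Placeholder polarity scores, invented purely to exercise the function.
group_a = [0.5, 0.0, 0.2, 0.55, 0.1]
group_b = [0.0, -0.25, 0.2, 0.0, -0.1]

t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))
```

With the notebook's data, the inputs would be the `sentiment` columns of `naomi_analysis` and `serena_analysis`, converted to lists. `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also reports a p-value.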