# Tweet Collection from the Twitter API

The categories that we are predicting for this project are **Best Picture**, **Best Actor**, **Best Actress**, **Best Supporting Actor**, and **Best Supporting Actress**.

The technical goal is to store the collected tweets as JSON, grouped by Oscar nominee, with one JSON file per category.

tweets -> keyword -> nominee -> json
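The pipeline above could be sketched as follows. This is only an illustration under stated assumptions, not the storage code used later in this notebook (which pickles a DataFrame); the function name `save_category_json`, the output file name, and the sample records are all hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict


def save_category_json(rows, path):
    """Group tweet records by nominee and write them to one JSON file.

    rows: iterable of dicts that each carry an "actor_name" key.
    """
    by_nominee = defaultdict(list)
    for row in rows:
        # Everything except the nominee name is stored under that nominee.
        tweet = {k: v for k, v in row.items() if k != "actor_name"}
        by_nominee[row["actor_name"]].append(tweet)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(by_nominee, f, ensure_ascii=False, indent=2)
    return dict(by_nominee)


# Hypothetical sample records, shaped like the columns collected in this notebook.
sample_rows = [
    {"actor_name": "Daniel Kaluuya", "text": "example tweet one",
     "favorite_count": 0, "retweet_count": 2},
    {"actor_name": "Paul Raci", "text": "example tweet two",
     "favorite_count": 5, "retweet_count": 0},
]
out_path = os.path.join(tempfile.gettempdir(), "best_supporting_actor.json")
grouped = save_category_json(sample_rows, out_path)
```

Keying the file by nominee keeps the "nominee -> json" step a single lookup when the category file is read back.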

### Authenticating with the Twitter API

In [1]:
import json
import tweepy
import pandas as pd

In [2]:
#keys/secrets
credentials = {}
credentials['CONSUMER_KEY'] = ""
credentials['CONSUMER_SECRET'] = ""
credentials['ACCESS_TOKEN'] = ""
credentials['ACCESS_SECRET'] = ""

with open("twitter_credentials.json", "w") as f:
    json.dump(credentials, f)

In [3]:
with open("twitter_credentials.json", "r") as f:
    creds = json.load(f)
    
auth = tweepy.OAuthHandler(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
auth.set_access_token(creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])

api = tweepy.API(auth, wait_on_rate_limit = True,
                wait_on_rate_limit_notify = True)
try:
    api.verify_credentials()
    print("Authentication OK")
except Exception as err:
    # Report why verification failed instead of hiding the error
    print("Error during Authentication:", err)

Authentication OK
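As an alternative to writing the keys into `twitter_credentials.json`, the four values could be read from environment variables. The sketch below is an assumption-laden illustration: the `TWITTER_` prefix and the `load_credentials` helper are hypothetical names, not part of tweepy or of this project.

```python
import os

REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")


def load_credentials(env=None, prefix="TWITTER_"):
    """Read the four credentials from environment-style variables.

    The TWITTER_ prefix is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    missing = [prefix + k for k in REQUIRED_KEYS if not env.get(prefix + k)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {k: env[prefix + k] for k in REQUIRED_KEYS}


# Demonstration with a fake environment; real values would come from os.environ.
fake_env = {"TWITTER_" + k: "placeholder" for k in REQUIRED_KEYS}
creds_example = load_credentials(fake_env)
```

The returned dictionary has the same keys as `creds` above, so it could be passed to the same `OAuthHandler` calls.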


In [4]:
# Collect tweets matching one nominee's search query, once per date in dateList,
# and append one row per tweet to df.
def collectTweets(key, query, df, dateList):
    for date in dateList:
        for tweet in api.search(q=query, lang="en", count=100, toDate=date):
            # The "date" column stores the query date from dateList,
            # not the tweet's own created_at timestamp.
            df.loc[len(df.index)] = [key, tweet.text, tweet.favorite_count,
                                     tweet.retweet_count, date]
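The same collection loop can be written so that rows accumulate in a plain list and the table is built once at the end, which avoids growing a DataFrame row by row. This is a sketch, not the notebook's code: `collect_rows` and `fake_search` are hypothetical names, and `fake_search` stands in for the API call so the example runs offline.

```python
from types import SimpleNamespace


def collect_rows(key, query, date_list, search):
    """Return one row per tweet per query date.

    search: any callable taking (query, date) and returning tweet-like
    objects with text, favorite_count and retweet_count attributes.
    """
    rows = []
    for date in date_list:
        for tweet in search(query, date):
            rows.append([key, tweet.text, tweet.favorite_count,
                         tweet.retweet_count, date])
    return rows


def fake_search(query, date):
    # Hypothetical stand-in for the API call so the sketch runs offline.
    return [SimpleNamespace(text="example about " + query,
                            favorite_count=1, retweet_count=0)]


rows = collect_rows("Paul Raci", "#oscars #paulraci", ["day 1", "day 2"], fake_search)
```

The resulting list could then be handed to `pd.DataFrame(rows, columns=[...])` in a single step.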

In [2]:
n = ["Daniel Kaluuya", "Sacha Baron Cohen", "Leslie Odom, JR.", "Paul Raci", "Lakeith Stanfield"]

nominees_keyword_dict = {n[0]:"%23oscars%20%23danielkaluuya%20-filter%3Aretweets", 
                         n[1]:"%23oscars%20%23sachabaroncohen%20-filter%3Aretweets", 
                         n[2]:"%23oscars%20%23leslieodomjr%20-filter%3Aretweets", 
                         n[3]:"%23oscars%20%23paulraci%20-filter%3Aretweets", 
                         n[4]:"%23oscars%20%23lakeithstanfield%20-filter%3Aretweets"}

tweets_df = pd.DataFrame(columns = ["actor_name", "text", "favorite_count", "retweet_count", "date"])

dates = ["Sat Apr 23 12:00:00 +0000 2021",
         "Sat Apr 22 12:00:00 +0000 2021",
         "Sat Apr 21 12:00:00 +0000 2021",
         "Sat Apr 20 12:00:00 +0000 2021",
         "Sat Apr 19 12:00:00 +0000 2021",
         "Sat Apr 18 12:00:00 +0000 2021",
         "Sat Apr 17 12:00:00 +0000 2021",
         "Sat Apr 16 12:00:00 +0000 2021"]

In [6]:
# Collecting tweets for the Best Supporting Actor nominees
for key in nominees_keyword_dict:
    collectTweets(key, nominees_keyword_dict[key], tweets_df, dates)
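The percent-encoded query strings in `nominees_keyword_dict` follow one pattern, `#oscars #<name> -filter:retweets`, so they could be generated rather than typed by hand. The sketch below uses `urllib.parse.quote` from the standard library; `build_query` is a hypothetical helper, and deriving the hashtag by stripping spaces and punctuation is an assumption that happens to match the five names used here.

```python
import re
from urllib.parse import quote


def build_query(name):
    """Build the percent-encoded '#oscars #<name> -filter:retweets' query."""
    # Hashtag form of the name: lowercase with spaces and punctuation removed.
    hashtag = re.sub(r"[^a-z0-9]", "", name.lower())
    return quote("#oscars #" + hashtag + " -filter:retweets")


names = ["Daniel Kaluuya", "Sacha Baron Cohen", "Leslie Odom, JR.",
         "Paul Raci", "Lakeith Stanfield"]
queries = {name: build_query(name) for name in names}
```

Generating the strings this way keeps the dictionary in step with the nominee list when another category is collected.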

In [7]:
tweets_df

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
0,Daniel Kaluuya,RT @opendoorpeople: Let’s make sure we note -\...,0,49,Sat Apr 23 12:00:00 +0000 2021
1,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
2,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 23 12:00:00 +0000 2021
3,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 23 12:00:00 +0000 2021
4,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
2819,Lakeith Stanfield,RT @CreativeByLucas: The 6th entry for my #Osc...,0,1,Sat Apr 16 12:00:00 +0000 2021
2820,Lakeith Stanfield,The 6th entry for my #Oscars poster series to ...,6,1,Sat Apr 16 12:00:00 +0000 2021
2821,Lakeith Stanfield,Continuing my #awardseason viewing with #Judas...,0,0,Sat Apr 16 12:00:00 +0000 2021
2822,Lakeith Stanfield,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021


In [8]:
# Tweets collected for Daniel Kaluuya (n[0])
tweets_df[tweets_df["actor_name"] == n[0]]

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
0,Daniel Kaluuya,RT @opendoorpeople: Let’s make sure we note -\...,0,49,Sat Apr 23 12:00:00 +0000 2021
1,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
2,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 23 12:00:00 +0000 2021
3,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 23 12:00:00 +0000 2021
4,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
795,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 16 12:00:00 +0000 2021
796,Daniel Kaluuya,RT @ABelloWrites: This took me out! 😂😂\n#Danie...,0,46,Sat Apr 16 12:00:00 +0000 2021
797,Daniel Kaluuya,RT @DanielMays9: Massive congratulations to th...,0,6,Sat Apr 16 12:00:00 +0000 2021
798,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 16 12:00:00 +0000 2021


In [9]:
tweets_df[tweets_df["actor_name"] == n[1]]

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
800,Sacha Baron Cohen,Sacha Baron Cohen Was The Best Dressed Man At ...,0,0,Sat Apr 23 12:00:00 +0000 2021
801,Sacha Baron Cohen,I cannot begin to describe the joy I felt this...,0,0,Sat Apr 23 12:00:00 +0000 2021
802,Sacha Baron Cohen,RT @XpressCinema: It is time for the Best Supp...,0,2,Sat Apr 23 12:00:00 +0000 2021
803,Sacha Baron Cohen,Sacha Baron Cohen deserved best supporting act...,1,0,Sat Apr 23 12:00:00 +0000 2021
804,Sacha Baron Cohen,#SachaBaronCohen and #IslaFisher Dazzle at the...,0,0,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
1051,Sacha Baron Cohen,Good luck! Good luck! Good luck. And loads of ...,2,1,Sat Apr 16 12:00:00 +0000 2021
1052,Sacha Baron Cohen,#TheTrialoftheChicago7 is nominated for 6 #Osc...,0,0,Sat Apr 16 12:00:00 +0000 2021
1053,Sacha Baron Cohen,"Los nominados al #Oscar a ""mejor actor secunda...",2,0,Sat Apr 16 12:00:00 +0000 2021
1054,Sacha Baron Cohen,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021


In [10]:
tweets_df[tweets_df["actor_name"] == n[2]]

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
1056,"Leslie Odom, JR.",I think it’s because we spend so much time con...,0,0,Sat Apr 23 12:00:00 +0000 2021
1057,"Leslie Odom, JR.",#Zendaya #ReginaKing #AngelaBassett #MariaBaka...,0,0,Sat Apr 23 12:00:00 +0000 2021
1058,"Leslie Odom, JR.",😂 We do! Black people are 🪄#DanielKaluuya &amp...,0,0,Sat Apr 23 12:00:00 +0000 2021
1059,"Leslie Odom, JR.",#Oscars #LeslieOdomJr #OneNightInMiami\nOscars...,0,0,Sat Apr 23 12:00:00 +0000 2021
1060,"Leslie Odom, JR.",RT @XpressCinema: It is time for the Best Supp...,0,2,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
1675,"Leslie Odom, JR.",#oscar week continues and today we have #nomin...,2,0,Sat Apr 16 12:00:00 +0000 2021
1676,"Leslie Odom, JR.","We caught up with #LeslieOdomJr, who's up for ...",4,1,Sat Apr 16 12:00:00 +0000 2021
1677,"Leslie Odom, JR.",The #AcademyAwards will be broadcast Sunday🌟\n...,0,0,Sat Apr 16 12:00:00 +0000 2021
1678,"Leslie Odom, JR.",Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021


In [11]:
tweets_df[tweets_df["actor_name"] == n[3]]

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
1680,Paul Raci,congrats #danielkaluuya!! I bet it better not ...,0,0,Sat Apr 23 12:00:00 +0000 2021
1681,Paul Raci,RT @XpressCinema: It is time for the Best Supp...,0,2,Sat Apr 23 12:00:00 +0000 2021
1682,Paul Raci,Congratulations to the Sound of Metal team on ...,5,0,Sat Apr 23 12:00:00 +0000 2021
1683,Paul Raci,Yup. And probably a big reason why Paul Raci w...,1,0,Sat Apr 23 12:00:00 +0000 2021
1684,Paul Raci,A film comes alive for the first time in the e...,2,0,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
2115,Paul Raci,#SpiritAwards Winner #paulraci going to have a...,9,1,Sat Apr 16 12:00:00 +0000 2021
2116,Paul Raci,Oscars Predictions: Best Supporting Actor – Da...,9,0,Sat Apr 16 12:00:00 +0000 2021
2117,Paul Raci,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021
2118,Paul Raci,oh my god guys!! \nPaul Raci liked my tweet!\n...,1,0,Sat Apr 16 12:00:00 +0000 2021


In [12]:
tweets_df[tweets_df["actor_name"] == n[4]]

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
2120,Lakeith Stanfield,he ALWAYS understands the assignment #LakeithS...,1,0,Sat Apr 23 12:00:00 +0000 2021
2121,Lakeith Stanfield,#Zendaya #ReginaKing #AngelaBassett #MariaBaka...,0,0,Sat Apr 23 12:00:00 +0000 2021
2122,Lakeith Stanfield,#JudasAndTheBlackMessiah is available now on D...,1,0,Sat Apr 23 12:00:00 +0000 2021
2123,Lakeith Stanfield,congrats #danielkaluuya!! I bet it better not ...,0,0,Sat Apr 23 12:00:00 +0000 2021
2124,Lakeith Stanfield,I think I passed by #LaKeithStanfield in High ...,0,0,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
2819,Lakeith Stanfield,RT @CreativeByLucas: The 6th entry for my #Osc...,0,1,Sat Apr 16 12:00:00 +0000 2021
2820,Lakeith Stanfield,The 6th entry for my #Oscars poster series to ...,6,1,Sat Apr 16 12:00:00 +0000 2021
2821,Lakeith Stanfield,Continuing my #awardseason viewing with #Judas...,0,0,Sat Apr 16 12:00:00 +0000 2021
2822,Lakeith Stanfield,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021


In [15]:
tweets_df[tweets_df["actor_name"] == n[4]].iloc[0]

actor_name                                        Lakeith Stanfield
text              he ALWAYS understands the assignment #LakeithS...
favorite_count                                                    1
retweet_count                                                     0
date                                 Sat Apr 23 12:00:00 +0000 2021
Name: 2120, dtype: object

In [16]:
tweets_df.to_pickle("best_supporting_actor.pkl")

In [3]:
train = pd.read_pickle("best_supporting_actor.pkl")

# Text Processing

In [4]:
train

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date
0,Daniel Kaluuya,RT @opendoorpeople: Let’s make sure we note -\...,0,49,Sat Apr 23 12:00:00 +0000 2021
1,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
2,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 23 12:00:00 +0000 2021
3,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 23 12:00:00 +0000 2021
4,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021
...,...,...,...,...,...
2819,Lakeith Stanfield,RT @CreativeByLucas: The 6th entry for my #Osc...,0,1,Sat Apr 16 12:00:00 +0000 2021
2820,Lakeith Stanfield,The 6th entry for my #Oscars poster series to ...,6,1,Sat Apr 16 12:00:00 +0000 2021
2821,Lakeith Stanfield,Continuing my #awardseason viewing with #Judas...,0,0,Sat Apr 16 12:00:00 +0000 2021
2822,Lakeith Stanfield,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021


In [5]:
import nltk
from nltk.corpus import stopwords
additional  = ['rt','rts','retweet']
swords = set().union(stopwords.words('english'),additional)

In [6]:
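# Lowercase, strip @mentions, URLs and non-alphanumeric characters,
# collapse repeated spaces, then tokenize and drop stop words.
# regex=True is passed explicitly because newer pandas releases
# treat the pattern as a literal string by default.
train['processed_text'] = train['text'].str.lower()\
          .str.replace(r'(@[a-z0-9]+)\w+', ' ', regex=True)\
          .str.replace(r'(http\S+)', ' ', regex=True)\
          .str.replace(r'([^0-9a-z \t])', ' ', regex=True)\
          .str.replace(r' +', ' ', regex=True)\
          .str.replace(r'(@[0-9]+)\w+', ' ', regex=True)\
          .apply(lambda x: [i for i in x.split() if i not in swords])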
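To see what the cleaning chain does to a single tweet, the same regular expressions can be applied with the standard `re` module. This is an illustrative sketch: `clean_tweet` is a hypothetical helper, the sample tweet is made up, and the tiny `STOPWORDS` set stands in for the NLTK English stop word list so that the example runs without any downloads.

```python
import re

# Tiny stand-in for the NLTK stop word list plus the extra 'rt' terms.
STOPWORDS = {"rt", "rts", "retweet", "to", "the", "for", "on", "his"}


def clean_tweet(text, stopwords=STOPWORDS):
    """Lowercase, strip mentions/URLs/punctuation, then tokenize."""
    text = text.lower()
    text = re.sub(r"(@[a-z0-9]+)\w+", " ", text)   # @mentions
    text = re.sub(r"(http\S+)", " ", text)         # URLs
    text = re.sub(r"([^0-9a-z \t])", " ", text)    # anything not alphanumeric
    text = re.sub(r" +", " ", text)                # repeated spaces
    return [tok for tok in text.split() if tok not in stopwords]


tokens = clean_tweet(
    "RT @youngvictheatre: Congratulations to #DanielKaluuya on his win! https://t.co/abc123"
)
print(tokens)  # ['congratulations', 'danielkaluuya', 'win']
```

Note that hashtags survive as plain words because only the `#` character is removed, which is why `danielkaluuya` appears as a token.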

In [7]:
from nltk.stem import PorterStemmer
ps = PorterStemmer()
train['stemmed'] = train['processed_text'].apply(lambda x: [ps.stem(i) for i in x if i != ''])

In [8]:
train["text2"] = train["processed_text"].str.join(" ")

In [9]:
train["text3"] = train["stemmed"].str.join(" ")

In [10]:
train

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date,processed_text,stemmed,text2,text3
0,Daniel Kaluuya,RT @opendoorpeople: Let’s make sure we note -\...,0,49,Sat Apr 23 12:00:00 +0000 2021,"[let, make, sure, note, british, working, clas...","[let, make, sure, note, british, work, class, ...",let make sure note british working class talen...,let make sure note british work class talent e...
1,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021,"[congratulations, danielkaluuya, historic, win...","[congratul, danielkaluuya, histor, win, first,...",congratulations danielkaluuya historic win fir...,congratul danielkaluuya histor win first black...
2,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 23 12:00:00 +0000 2021,"[danielkaluuya, mom, goes, viral, hilarious, r...","[danielkaluuya, mom, goe, viral, hilari, respo...",danielkaluuya mom goes viral hilarious respons...,danielkaluuya mom goe viral hilari respons bri...
3,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 23 12:00:00 +0000 2021,"[blessed, lived, lifetime, existed, played, di...","[bless, live, lifetim, exist, play, divid, con...",blessed lived lifetime existed played divide c...,bless live lifetim exist play divid conquer sa...
4,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021,"[congratulations, danielkaluuya, historic, win...","[congratul, danielkaluuya, histor, win, first,...",congratulations danielkaluuya historic win fir...,congratul danielkaluuya histor win first black...
...,...,...,...,...,...,...,...,...,...
2819,Lakeith Stanfield,RT @CreativeByLucas: The 6th entry for my #Osc...,0,1,Sat Apr 16 12:00:00 +0000 2021,"[6th, entry, oscars, poster, series, celebrate...","[6th, entri, oscar, poster, seri, celebr, best...",6th entry oscars poster series celebrate best ...,6th entri oscar poster seri celebr best pictur...
2820,Lakeith Stanfield,The 6th entry for my #Oscars poster series to ...,6,1,Sat Apr 16 12:00:00 +0000 2021,"[6th, entry, oscars, poster, series, celebrate...","[6th, entri, oscar, poster, seri, celebr, best...",6th entry oscars poster series celebrate best ...,6th entri oscar poster seri celebr best pictur...
2821,Lakeith Stanfield,Continuing my #awardseason viewing with #Judas...,0,0,Sat Apr 16 12:00:00 +0000 2021,"[continuing, awardseason, viewing, judasandthe...","[continu, awardseason, view, judasandtheblackm...",continuing awardseason viewing judasandtheblac...,continu awardseason view judasandtheblackmessi...
2822,Lakeith Stanfield,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021,"[victorias, en, mejor, actor, de, reparto, dan...","[victoria, en, mejor, actor, de, reparto, dani...",victorias en mejor actor de reparto danielkalu...,victoria en mejor actor de reparto danielkaluu...


# Calculating Average Sentiment Scores

In [11]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [12]:
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Hojin\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [13]:
sia = SentimentIntensityAnalyzer()

In [14]:
def sentiment_calcnltk(text):
    # Return VADER polarity scores, or None if the text cannot be scored
    try:
        return sia.polarity_scores(text)
    except Exception:
        return None

In [15]:
train['NLTKsentiment_raw'] = train["text3"].apply(sentiment_calcnltk)

In [16]:
# Keep only the compound score (between -1 and 1) from each VADER result
train["NLTKsentiment"] = [t["compound"] for t in train["NLTKsentiment_raw"]]
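Because `sentiment_calcnltk` can return `None`, indexing every result with `["compound"]` would fail on such a row. A None-tolerant extraction could look like the sketch below; `extract_compound` is a hypothetical helper and the score dictionaries are illustrative values in the shape produced by `polarity_scores`.

```python
def extract_compound(raw_scores):
    """Pull the 'compound' value out of each VADER-style score dict.

    Entries that are None (a failed scoring call) stay None instead of
    raising a TypeError.
    """
    return [None if s is None else s["compound"] for s in raw_scores]


# Illustrative score dicts in the shape returned by polarity_scores.
raw = [
    {"neg": 0.0, "neu": 0.597, "pos": 0.403, "compound": 0.7964},
    None,
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]
compounds = extract_compound(raw)
```

Leaving failed rows as `None` keeps them visible, so they can be dropped or counted before averaging.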

In [17]:
train

Unnamed: 0,actor_name,text,favorite_count,retweet_count,date,processed_text,stemmed,text2,text3,NLTKsentiment_raw,NLTKsentiment
0,Daniel Kaluuya,RT @opendoorpeople: Let’s make sure we note -\...,0,49,Sat Apr 23 12:00:00 +0000 2021,"[let, make, sure, note, british, working, clas...","[let, make, sure, note, british, work, class, ...",let make sure note british working class talen...,let make sure note british work class talent e...,"{'neg': 0.0, 'neu': 0.597, 'pos': 0.403, 'comp...",0.7964
1,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021,"[congratulations, danielkaluuya, historic, win...","[congratul, danielkaluuya, histor, win, first,...",congratulations danielkaluuya historic win fir...,congratul danielkaluuya histor win first black...,"{'neg': 0.0, 'neu': 0.404, 'pos': 0.596, 'comp...",0.9153
2,Daniel Kaluuya,RT @ETCanada: #DanielKaluuya's mom goes viral ...,0,405,Sat Apr 23 12:00:00 +0000 2021,"[danielkaluuya, mom, goes, viral, hilarious, r...","[danielkaluuya, mom, goe, viral, hilari, respo...",danielkaluuya mom goes viral hilarious respons...,danielkaluuya mom goe viral hilari respons bri...,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000
3,Daniel Kaluuya,"RT @Participant: ""How blessed we are that we l...",0,55,Sat Apr 23 12:00:00 +0000 2021,"[blessed, lived, lifetime, existed, played, di...","[bless, live, lifetim, exist, play, divid, con...",blessed lived lifetime existed played divide c...,bless live lifetim exist play divid conquer sa...,"{'neg': 0.0, 'neu': 0.574, 'pos': 0.426, 'comp...",0.6369
4,Daniel Kaluuya,RT @youngvictheatre: Congratulations to #Danie...,0,39,Sat Apr 23 12:00:00 +0000 2021,"[congratulations, danielkaluuya, historic, win...","[congratul, danielkaluuya, histor, win, first,...",congratulations danielkaluuya historic win fir...,congratul danielkaluuya histor win first black...,"{'neg': 0.0, 'neu': 0.404, 'pos': 0.596, 'comp...",0.9153
...,...,...,...,...,...,...,...,...,...,...,...
2819,Lakeith Stanfield,RT @CreativeByLucas: The 6th entry for my #Osc...,0,1,Sat Apr 16 12:00:00 +0000 2021,"[6th, entry, oscars, poster, series, celebrate...","[6th, entri, oscar, poster, seri, celebr, best...",6th entry oscars poster series celebrate best ...,6th entri oscar poster seri celebr best pictur...,"{'neg': 0.0, 'neu': 0.704, 'pos': 0.296, 'comp...",0.6369
2820,Lakeith Stanfield,The 6th entry for my #Oscars poster series to ...,6,1,Sat Apr 16 12:00:00 +0000 2021,"[6th, entry, oscars, poster, series, celebrate...","[6th, entri, oscar, poster, seri, celebr, best...",6th entry oscars poster series celebrate best ...,6th entri oscar poster seri celebr best pictur...,"{'neg': 0.0, 'neu': 0.704, 'pos': 0.296, 'comp...",0.6369
2821,Lakeith Stanfield,Continuing my #awardseason viewing with #Judas...,0,0,Sat Apr 16 12:00:00 +0000 2021,"[continuing, awardseason, viewing, judasandthe...","[continu, awardseason, view, judasandtheblackm...",continuing awardseason viewing judasandtheblac...,continu awardseason view judasandtheblackmessi...,"{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'comp...",0.6369
2822,Lakeith Stanfield,Victorias en MEJOR ACTOR DE REPARTO 🎭\n\n#Dani...,0,0,Sat Apr 16 12:00:00 +0000 2021,"[victorias, en, mejor, actor, de, reparto, dan...","[victoria, en, mejor, actor, de, reparto, dani...",victorias en mejor actor de reparto danielkalu...,victoria en mejor actor de reparto danielkaluu...,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000


In [18]:
ob1 = train[train["actor_name"] == n[0]]
ob2 = train[train["actor_name"] == n[1]]
ob3 = train[train["actor_name"] == n[2]]
ob4 = train[train["actor_name"] == n[3]]
ob5 = train[train["actor_name"] == n[4]]                     

In [19]:
score1 = ob1["NLTKsentiment"].mean()
score2 = ob2["NLTKsentiment"].mean()
score3 = ob3["NLTKsentiment"].mean()
score4 = ob4["NLTKsentiment"].mean()
score5 = ob5["NLTKsentiment"].mean()

print(n[0], "Average Sentiment Score", score1)
print(n[1], "Average Sentiment Score", score2)
print(n[2], "Average Sentiment Score", score3)
print(n[3], "Average Sentiment Score", score4)
print(n[4], "Average Sentiment Score", score5)

Daniel Kaluuya Average Sentiment Score 0.3792020000000009
Sacha Baron Cohen Average Sentiment Score 0.39029999999999937
Leslie Odom, JR. Average Sentiment Score 0.34504358974359034
Paul Raci Average Sentiment Score 0.48267636363636396
Lakeith Stanfield Average Sentiment Score 0.24239772727272732
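The five per-nominee averages above follow the same group-then-average pattern, which can be written once for any number of nominees. The sketch below uses only the standard library; `average_by_nominee` is a hypothetical helper and the scores are made-up values, not the notebook's data.

```python
from collections import defaultdict
from statistics import mean


def average_by_nominee(pairs):
    """Average the sentiment values for each nominee.

    pairs: iterable of (nominee_name, compound_score) tuples.
    """
    grouped = defaultdict(list)
    for name, score in pairs:
        grouped[name].append(score)
    return {name: mean(scores) for name, scores in grouped.items()}


# Made-up scores, not the notebook's data.
pairs = [("Paul Raci", 0.5), ("Paul Raci", 0.25), ("Daniel Kaluuya", 0.0)]
averages = average_by_nominee(pairs)
```

With the DataFrame already in memory, `train.groupby("actor_name")["NLTKsentiment"].mean()` would be the pandas equivalent of this grouping.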


In [20]:
sentiment = {n[0]: score1,
             n[1]: score2,
             n[2]: score3,
             n[3]: score4,
             n[4]: score5}

# IMDB Movie Ratings, Rotten Tomatoes Critic Scores

In [21]:
# Scores were collected on April 24th

# IMDb rating, out of 10
imdb = {n[0]: 7.6,
        n[1]: 7.8,
        n[2]: 7.2,
        n[3]: 7.8,
        n[4]: 7.6}

# Rotten Tomatoes critic score, as a fraction of 1
rt_critics = {n[0]: 0.96,
              n[1]: 0.89,
              n[2]: 0.98,
              n[3]: 0.97,
              n[4]: 0.96}

# Rotten Tomatoes audience score, as a fraction of 1
rt_audience = {n[0]: 0.95,
               n[1]: 0.91,
               n[2]: 0.82,
               n[3]: 0.91,
               n[4]: 0.95}

# Scoring Model

In [22]:
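# We will calculate a score for each actor, interpreted as the percentage chance of winning the Oscar.
# Our paper is a commentary on public opinion, so each of the four metrics gets equal weight.
# Therefore we need to scale each metric to the range 0 to 0.25 before summing them.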

def scaling(score, old_range):
    new_range = (0, 0.25)
    mini = old_range[0]
    maxi = old_range[1]
    percent = (score - (mini)) / (maxi - (mini))
    # Scaling formula
    weighted = new_range[1] * percent + new_range[0]
    return weighted
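As a worked example of the scaling step, the sketch below applies the same min-max formula to Daniel Kaluuya's four inputs, copied from the cells above. The compact `scaling` variant with a `new_max` parameter is a rewrite for illustration, not the notebook's function.

```python
def scaling(score, old_range, new_max=0.25):
    """Min-max scale score from old_range onto [0, new_max]."""
    lo, hi = old_range
    return new_max * (score - lo) / (hi - lo)


# Daniel Kaluuya's four inputs, copied from the cells above.
parts = {
    "sentiment": scaling(0.3792020000000009, (-1, 1)),  # VADER range is -1 to 1
    "imdb": scaling(7.6, (0, 10)),
    "rt_critics": scaling(0.96, (0, 1)),
    "rt_audience": scaling(0.95, (0, 1)),
}
total = sum(parts.values())
```

Here the IMDb rating contributes 0.19, the critic score 0.24, the audience score 0.2375 and the sentiment roughly 0.1724, so `total` is about 0.8399, in line with the score reported for him at the end of the notebook.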

In [32]:
actor_win = {}
actors_df = pd.DataFrame(columns = ["name", "category", "imdb_audience_score", "rt_critic_score", "rt_audience_score", "sentiment_score", "oscar_win"])

In [33]:
for actor in n:
    sentiment_score = scaling(sentiment[actor], (-1, 1))
    imdb_score = scaling(imdb[actor], (0, 10))
    rt_critics_score = scaling(rt_critics[actor], (0, 1))
    rt_audience_score = scaling(rt_audience[actor], (0, 1))
    percentage_win = sentiment_score + imdb_score + rt_critics_score + rt_audience_score
    actor_win[actor] = percentage_win
    actors_df.loc[len(actors_df.index)] = [actor, "supporting_actor", imdb_score, rt_critics_score, rt_audience_score, 
                                           sentiment_score, percentage_win]
actors_df.to_csv("supporting_actor_results.csv", index = False)

In [34]:
for key in actor_win:
    print(key, "score is", actor_win[key])

Daniel Kaluuya score is 0.8399002500000001
Sacha Baron Cohen score is 0.8187875
Leslie Odom, JR. score is 0.7981304487179487
Paul Raci score is 0.8503345454545455
Lakeith Stanfield score is 0.822799715909091
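To read the result as a prediction, the scores can be sorted from highest to lowest. The sketch below copies the printed values above into a dictionary; the names `ranking` and `predicted_winner` are introduced here for illustration only.

```python
# Scores copied from the printed output above.
actor_win = {
    "Daniel Kaluuya": 0.8399002500000001,
    "Sacha Baron Cohen": 0.8187875,
    "Leslie Odom, JR.": 0.7981304487179487,
    "Paul Raci": 0.8503345454545455,
    "Lakeith Stanfield": 0.822799715909091,
}

# Sort nominees from highest to lowest combined score.
ranking = sorted(actor_win.items(), key=lambda item: item[1], reverse=True)
predicted_winner = ranking[0][0]
print(predicted_winner)  # Paul Raci
```

Under this scoring model Paul Raci ranks first, followed by Daniel Kaluuya.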
