This notebook walks through how we can predict the US presidential election with Python. We will not train a machine learning model. Instead, we will analyze the sentiment of tweets addressed to each candidate and, at the end, draw a conclusion from the share of positive and negative tweets for each one.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
from wordcloud import WordCloud
import plotly.graph_objects as go
import plotly.express as px

## Import datasets

In [3]:
trump = pd.read_csv("/content/drive/MyDrive/Data_Sets/Trumpall2.csv")
biden = pd.read_csv("/content/drive/MyDrive/Data_Sets/Bidenall2.csv")

In [4]:
trump

Unnamed: 0,user,text
0,manny_rosen,@sanofi please tell us how many shares the Cr...
1,osi_abdul,"https://t.co/atM98CpqF7 Like, comment, RT #P..."
2,Patsyrw,Your AG Barr is as useless &amp; corrupt as y...
3,seyedebrahimi_m,Mr. Trump! Wake Up! Most of the comments bel...
4,James09254677,After 4 years you think you would have figure...
...,...,...
2783,4diva63,"@realDonaldTrump For the 1/100 time, absentee ..."
2784,hidge826,@realDonaldTrump If you’re so scared of losing...
2785,SpencerRossy,@realDonaldTrump I rarely get involved with fo...
2786,ScoobyMcpherson,@realDonaldTrump This is the moment when Trump...


In [5]:
biden

Unnamed: 0,user,text
0,MarkHodder3,@JoeBiden And we’ll find out who won in 2026...
1,K87327961G,@JoeBiden Your Democratic Nazi Party cannot be...
2,OldlaceA,@JoeBiden So did Lying Barr
3,penblogger,@JoeBiden It's clear you didnt compose this tw...
4,Aquarian0264,@JoeBiden I will vote in person thank you.
...,...,...
2535,meryn1977,@JoeBiden You'll just try to calm those waters...
2536,BSNelson114,@JoeBiden 96 days 96 dias #VoteJoeBiden2020 #...
2537,KenCapel,@JoeBiden YOU THINK YOU CAN DO THAT??? YOU CAN...
2538,LeslyeHale,@JoeBiden Trump wants our children back at sch...


## Start with sentiment analysis, using TextBlob to perform simple text classification

In [6]:
##check the sentiment of one sample tweet from trump's dataset
textblob1 = TextBlob(trump['text'][10])
print("Trump:", textblob1.sentiment)

##check the sentiment of one sample tweet from biden's dataset
textblob2 = TextBlob(biden['text'][500])
print("Biden:", textblob2.sentiment)

Trump: Sentiment(polarity=0.15, subjectivity=0.3125)
Biden: Sentiment(polarity=0.6, subjectivity=0.9)


## Add a polarity column to show the sentiment polarity of both datasets

In [7]:
##add a new sentiment polarity column to both datasets to show the polarity of every tweet
def find_polarity(review):
  return TextBlob(review).sentiment.polarity

##trump's dataset
trump["Sentiment Polarity"] = trump['text'].apply(find_polarity)
print(trump)

print()

##biden's dataset
biden["Sentiment Polarity"] = biden['text'].apply(find_polarity)
print(biden)

                 user  ... Sentiment Polarity
0         manny_rosen  ...              0.050
1           osi_abdul  ...              0.000
2             Patsyrw  ...             -0.500
3     seyedebrahimi_m  ...              0.500
4       James09254677  ...              0.000
...               ...  ...                ...
2783          4diva63  ...              0.000
2784         hidge826  ...              0.000
2785     SpencerRossy  ...              0.225
2786  ScoobyMcpherson  ...              0.000
2787          bjklinz  ...             -0.500

[2788 rows x 3 columns]

              user  ... Sentiment Polarity
0      MarkHodder3  ...               0.00
1       K87327961G  ...               0.00
2         OldlaceA  ...               0.00
3       penblogger  ...               0.05
4     Aquarian0264  ...               0.00
...            ...  ...                ...
2535     meryn1977  ...               0.15
2536   BSNelson114  ...               0.00
2537      KenCapel  ...            

Polarity: Polarity ranges from -1 to +1 (negative to positive) and tells whether the text expresses negative or positive sentiment. The second value TextBlob returns, subjectivity, ranges from 0 to 1 and tells how opinionated (close to 1) or factual (close to 0) the text is.
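To make the sign convention concrete, here is a minimal sketch that maps a polarity score to a coarse label. The function name `label_from_polarity` and the sample scores are assumptions made for illustration; they are not part of TextBlob or of the datasets.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    # A score of exactly 0 carries no positive or negative signal.
    return "neutral"


# Hypothetical scores, chosen only to show the three cases.
for score in (0.6, -0.5, 0.0):
    print(score, label_from_polarity(score))
```

Treating only an exact 0 as neutral matches the rule used later in this notebook; a wider neutral band around 0 would be a different design choice.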

## Sentiment polarity for both candidates

Now I will add a new column named “Expression Label” to both datasets:



In [8]:
##add an expression label to trump's dataset to show whether each tweet is positive, negative or neutral
trump["Expression Label"] = np.where(trump["Sentiment Polarity"] > 0, "positive", "negative")
##assign through .loc so pandas does not warn about setting values on a copy
trump.loc[trump["Sentiment Polarity"] == 0, "Expression Label"] = "neutral"
print(trump)

                 user  ... Expression Label
0         manny_rosen  ...         positive
1           osi_abdul  ...          neutral
2             Patsyrw  ...         negative
3     seyedebrahimi_m  ...         positive
4       James09254677  ...          neutral
...               ...  ...              ...
2783          4diva63  ...          neutral
2784         hidge826  ...          neutral
2785     SpencerRossy  ...         positive
2786  ScoobyMcpherson  ...          neutral
2787          bjklinz  ...         negative

[2788 rows x 4 columns]



In [9]:
##add an expression label to biden's dataset to show whether each tweet is positive, negative or neutral
biden["Expression Label"] = np.where(biden["Sentiment Polarity"] > 0, "positive", "negative")
##assign through .loc so pandas does not warn about setting values on a copy
biden.loc[biden["Sentiment Polarity"] == 0, "Expression Label"] = "neutral"
print(biden)

              user  ... Expression Label
0      MarkHodder3  ...          neutral
1       K87327961G  ...          neutral
2         OldlaceA  ...          neutral
3       penblogger  ...         positive
4     Aquarian0264  ...          neutral
...            ...  ...              ...
2535     meryn1977  ...         positive
2536   BSNelson114  ...          neutral
2537      KenCapel  ...          neutral
2538    LeslyeHale  ...         positive
2539      rerickre  ...         positive

[2540 rows x 4 columns]
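Assigning through chained indexing, as in `df[col][mask] = value`, is the pattern pandas flags with the "value is trying to be set on a copy" warning. As an alternative, the sketch below uses `np.select` to assign all three labels in one step. The toy DataFrame and its scores are made up for illustration and are not taken from the tweet datasets.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up polarity scores (an assumption for this sketch).
df = pd.DataFrame({"Sentiment Polarity": [0.05, 0.0, -0.5, 0.5]})

polarity = df["Sentiment Polarity"]
# np.select takes, for each row, the label of the first condition that holds;
# rows matching no condition (polarity == 0) receive the default.
df["Expression Label"] = np.select(
    [polarity > 0, polarity < 0],
    ["positive", "negative"],
    default="neutral",
)
print(df)
```

Because the whole column is assigned at once, there is no second write into a slice, so the copy warning has nothing to trigger on.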



## Drop all tweets with neutral polarity

Neutral tweets are neither positive nor negative, so we remove them from both datasets.

In [10]:
##trump's dataset: count the neutral tweets, then drop those rows
review1 = trump[trump["Sentiment Polarity"] == 0]
print(review1.shape)

trump.drop(review1.index, inplace = True)
print(trump.shape)

(1464, 4)
(1324, 4)


In [11]:
##biden's dataset: count the neutral tweets, then drop those rows
review2 = biden[biden["Sentiment Polarity"] == 0]
print(review2.shape)

biden.drop(review2.index, inplace = True)
print(biden.shape)

(1509, 4)
(1031, 4)


## Balance the datasets by removing rows (Trump: 324, Biden: 31)

Trump's dataset has 1324 rows and Biden's has 1031 rows, so we balance them at 1000 rows each by randomly removing 324 rows from Trump's and 31 rows from Biden's.

In [12]:
##trump's dataset
np.random.seed(10)
remove_rows = 324
drop_indices = np.random.choice(trump.index, remove_rows, replace = False)
subset_trump = trump.drop(drop_indices)
print(subset_trump.shape)

(1000, 4)


In [13]:
##biden's dataset
np.random.seed(10)
remove_rows = 31
drop_indices = np.random.choice(biden.index, remove_rows, replace = False)
subset_biden = biden.drop(drop_indices)
print(subset_biden.shape)

(1000, 4)
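The same balancing can be expressed by sampling the rows to keep instead of choosing rows to drop. The sketch below uses `DataFrame.sample` with a fixed `random_state`; the toy frames, their sizes, and `target_size` are assumptions for illustration only, and the rows it selects will not match those chosen by `np.random.choice` in the cells above.

```python
import pandas as pd

# Toy stand-ins for the two filtered datasets (sizes are illustrative only).
trump_toy = pd.DataFrame({"text": [f"tweet {i}" for i in range(13)]})
biden_toy = pd.DataFrame({"text": [f"tweet {i}" for i in range(10)]})

# Hypothetical common size; it must not exceed the smaller frame.
target_size = 8

# random_state fixes the random draw so the subsets are reproducible.
subset_trump_toy = trump_toy.sample(n=target_size, random_state=10)
subset_biden_toy = biden_toy.sample(n=target_size, random_state=10)

print(subset_trump_toy.shape, subset_biden_toy.shape)
```

Stating the target size directly avoids computing how many rows to remove from each dataset by hand.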


## Analyzing the data to predict the US election

Both datasets now contain only positive and negative tweets, so by comparing the share of positive and negative sentiment in each dataset we can judge who wins and who loses.

In [14]:
##trump's positive and negative polarity
count1 = subset_trump.groupby("Expression Label").count()
print(count1)

##look up the counts by label instead of by position, and divide by the actual row count
negative_polarity1 = (count1.loc["negative", "Sentiment Polarity"]/len(subset_trump))*100
positive_polarity1 = (count1.loc["positive", "Sentiment Polarity"]/len(subset_trump))*100

print("Negative Polarity of Trump: ",negative_polarity1)
print("Positive Polarity of Trump: ",positive_polarity1)

                  user  text  Sentiment Polarity
Expression Label                                
negative           449   449                 449
positive           551   551                 551
Negative Polarity of Trump:  44.9
Positive Polarity of Trump:  55.1


In [15]:
##biden's positive and negative polarity
count2 = subset_biden.groupby("Expression Label").count()
print(count2)

##look up the counts by label instead of by position, and divide by the actual row count
negative_polarity2 = (count2.loc["negative", "Sentiment Polarity"]/len(subset_biden))*100
positive_polarity2 = (count2.loc["positive", "Sentiment Polarity"]/len(subset_biden))*100

print("Negative Polarity of Biden: ",negative_polarity2)
print("Positive Polarity of Biden: ",positive_polarity2)

                  user  text  Sentiment Polarity
Expression Label                                
negative           393   393                 393
positive           607   607                 607
Negative Polarity of Biden:  39.300000000000004
Positive Polarity of Biden:  60.699999999999996
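The long decimals in the output above come from floating-point arithmetic. As a hedged alternative, the sketch below computes label percentages with `value_counts(normalize=True)` and rounds them for display. The toy labels are made up for illustration and are not the notebook's data.

```python
import pandas as pd

# Toy labels (an assumption for this sketch, not the tweet datasets).
labels = pd.Series(
    ["positive", "negative", "positive", "positive", "negative"],
    name="Expression Label",
)

# normalize=True returns fractions; scale to percent and round for display.
percentages = labels.value_counts(normalize=True).mul(100).round(1)
print(percentages)

# Look up each share by its label rather than by position.
positive_pct = percentages["positive"]
negative_pct = percentages["negative"]
```

Looking the values up by label keeps the code correct even if the order of the groups changes.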


In [16]:
##now visualize the results in a graph to show who wins
politician = ["Donald Trump", "Joe Biden"]

list_of_positive = [positive_polarity1, positive_polarity2]
list_of_negative = [negative_polarity1, negative_polarity2]

figure = go.Figure(data = [
    go.Bar(name = "Positive", x = politician, y = list_of_positive),
    go.Bar(name = "Negative", x = politician, y = list_of_negative)
])

figure.update_layout(barmode = "group")
figure.show()

print("As we see in the graph above, Joe Biden has 60.7% positive tweets and Donald Trump has 55.1%.")
print("                              *Joe Biden has won the Election*")

As we see in the graph above, Joe Biden has 60.7% positive tweets and Donald Trump has 55.1%.
                              *Joe Biden has won the Election*
