<h1>Is Biden the Most Popular Presidential Candidate Ever?</h1>
In the 2020 presidential election, as of November 6, Joe Biden has received more votes than any other candidate in American history. However, does that make him the most popular presidential candidate ever?

To answer that, instead of comparing raw vote counts, I will calculate what percentage of the total US population (not just eligible voters) voted for each candidate in each election.
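Stated as a formula (the symbols are introduced here only for clarity), the quantity compared for candidate $c$ running in election year $y$ is

$$\text{share}(c, y) = \frac{V(c, y)}{P(y)}$$

where $V(c, y)$ is the candidate's popular vote and $P(y)$ is the total US population in year $y$. Because $P(y)$ counts everyone, including people who cannot or did not vote, this share is always smaller than the candidate's share of the votes cast.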

This requires two datasets: the demographic history of the United States, and the number of popular votes every presidential candidate, winner or loser, has received.

In [1]:
# Library dependencies
import pandas as pd

# Plotly imports used by the charts below
import plotly.graph_objects as go
import plotly.express as px
from plotly.offline import init_notebook_mode

# Render plotly figures inline in the notebook (offline mode)
init_notebook_mode(connected=True)

<h2>Demographic History of the United States</h2>

In [2]:
# Source: https://en.wikipedia.org/wiki/Demographic_history_of_the_United_States
df = pd.read_csv('us_population.csv')
df.tail()

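<table>
<thead>
<tr><th></th><th>Census year</th><th>Population</th><th>Growth rate</th></tr>
</thead>
<tbody>
<tr><td>37</td><td>1980</td><td>226545805</td><td>11.48%</td></tr>
<tr><td>38</td><td>1990</td><td>248709873</td><td>9.78%</td></tr>
<tr><td>39</td><td>2000</td><td>281421906</td><td>13.15%</td></tr>
<tr><td>40</td><td>2010</td><td>308745538</td><td>9.71%</td></tr>
<tr><td>41</td><td>2020</td><td>332639000</td><td>7.74%</td></tr>
</tbody>
</table>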


In [3]:
fig = go.Figure(data=go.Scatter(x=df['Census year'], y=df['Population'], mode='markers'))
fig.update_layout(title='Year vs. Population (US)')
fig.layout.template = 'seaborn'

fig.show()

<h2>Number of Votes Received per US Presidential Candidate</h2>

In [4]:
# Source: https://en.wikipedia.org/wiki/List_of_United_States_presidential_candidates_by_number_of_votes_received
dp = pd.read_csv('table.csv')
dp.head()

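<table>
<thead>
<tr><th></th><th>Candidate</th><th>Year</th><th>Party</th><th>Popular vote</th><th>Notes</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>Barack Obama</td><td>2008</td><td>Democratic</td><td>69498516</td><td>Winner</td></tr>
<tr><td>1</td><td>Barack Obama</td><td>2012</td><td>Democratic</td><td>65915795</td><td>Winner</td></tr>
<tr><td>2</td><td>Hillary Clinton</td><td>2016</td><td>Democratic</td><td>65853514</td><td>Received the most votes, but lost the electora...</td></tr>
<tr><td>3</td><td>Donald Trump</td><td>2016</td><td>Republican</td><td>62984828</td><td>Winner. Lost the popular vote, but won the ele...</td></tr>
<tr><td>4</td><td>George W. Bush</td><td>2004</td><td>Republican</td><td>62040610</td><td>Winner</td></tr>
</tbody>
</table>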


In [5]:
fig = go.Figure(data=go.Scatter(x=dp['Year'], y=dp['Popular vote'], mode='markers'))
fig.update_layout(title='Year vs. Popular Vote (US)')
fig.layout.template = 'seaborn'

fig.show()

<h2>Calculating Voter Percentage</h2>
Unfortunately, the US census is only conducted once every ten years, so the population in election years such as 1984 or 2012 has to be estimated.

For this project, the population is estimated by linear interpolation between censuses. For example, to estimate the population in 1984, we take the census counts for 1980 and 1990, assume the population grew linearly over that decade, and take the value four years into it.
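Written out, with $y_0$ the census year at or before $y$ (so $y_0 = 10\lfloor y/10 \rfloor$), the estimate is

$$P(y) \approx P(y_0) + \frac{y - y_0}{10}\,\bigl(P(y_0 + 10) - P(y_0)\bigr)$$

Plugging the 1980 and 1990 census counts from the table above into the 1984 example:

$$P(1984) \approx 226{,}545{,}805 + \tfrac{4}{10}\,(248{,}709{,}873 - 226{,}545{,}805) = 226{,}545{,}805 + 8{,}865{,}627.2 \approx 235{,}411{,}432$$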

In [6]:
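def getYearBounds(year):
    """Return the pair of census years that bracket `year`."""
    # 2020 is the latest census in the table, so there is no 2030 upper bound;
    # use the 2010-2020 decade instead (the estimate then equals the 2020 value).
    if year == 2020:
        return 2010, 2020
    lower_year = year - year % 10   # e.g. 1984 -> 1980
    upper_year = lower_year + 10    # e.g. 1984 -> 1990
    return lower_year, upper_year

def estimatePopulation(year):
    """Linearly interpolate the US population for `year` from the census table `df`."""
    lower_year, upper_year = getYearBounds(year)
    lower_pop = df.loc[df['Census year'] == lower_year, 'Population'].astype(int).iloc[0]
    upper_pop = df.loc[df['Census year'] == upper_year, 'Population'].astype(int).iloc[0]

    # Assume constant growth within the decade: add the elapsed fraction of the ten-year change
    ten_year_diff = upper_pop - lower_pop
    estimate = lower_pop + int((year - lower_year) * ten_year_diff / 10)

    return estimate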
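The special case for 2020 in `getYearBounds` exists only because the table ends there. As a sketch of a more general alternative (the `interpolate_population` helper is hypothetical, not part of the original notebook), the bracketing census years can be found with a binary search over the sorted census years, which handles any year inside the table's range, including the last census year, without special-casing and without assuming exactly ten years between censuses:

```python
from bisect import bisect_right


def interpolate_population(year, census):
    """Estimate the population in `year` by linear interpolation.

    `census` is a list of (census_year, population) pairs sorted by year.
    Census years return their exact count; years outside the table raise ValueError.
    """
    years = [y for y, _ in census]
    if not (years[0] <= year <= years[-1]):
        raise ValueError(f"{year} is outside the census range {years[0]}-{years[-1]}")

    # Position of the first census year strictly after `year`
    i = bisect_right(years, year)
    if i == len(years):
        # `year` is the last census year, so there is no later census to interpolate toward
        return census[-1][1]

    (y0, p0), (y1, p1) = census[i - 1], census[i]
    # Integer floor division; for a growing population this truncates the same way
    # as the int(...) call in estimatePopulation
    return p0 + (year - y0) * (p1 - p0) // (y1 - y0)
```

The `(year, population)` pairs could be pulled from `df` with something like `list(df[['Census year', 'Population']].itertuples(index=False, name=None))`. With the 1980 to 2020 rows shown in `df.tail()`, `interpolate_population(2008, census)` gives 303,280,811, matching the `Population_Estimate` for Obama's 2008 row.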

In [7]:
# Estimate the population for each election year and store it alongside the vote data
dp['Population_Estimate'] = dp['Year'].apply(estimatePopulation)
dp.head()

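<table>
<thead>
<tr><th></th><th>Candidate</th><th>Year</th><th>Party</th><th>Popular vote</th><th>Notes</th><th>Population_Estimate</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>Barack Obama</td><td>2008</td><td>Democratic</td><td>69498516</td><td>Winner</td><td>303280811</td></tr>
<tr><td>1</td><td>Barack Obama</td><td>2012</td><td>Democratic</td><td>65915795</td><td>Winner</td><td>313524230</td></tr>
<tr><td>2</td><td>Hillary Clinton</td><td>2016</td><td>Democratic</td><td>65853514</td><td>Received the most votes, but lost the electora...</td><td>323081615</td></tr>
<tr><td>3</td><td>Donald Trump</td><td>2016</td><td>Republican</td><td>62984828</td><td>Winner. Lost the popular vote, but won the ele...</td><td>323081615</td></tr>
<tr><td>4</td><td>George W. Bush</td><td>2004</td><td>Republican</td><td>62040610</td><td>Winner</td><td>292351358</td></tr>
</tbody>
</table>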


In [8]:
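# Percentage of the total (estimated) population that voted for each candidate
dp['Vote_Share_Pct'] = 100 * dp['Popular vote'] / dp['Population_Estimate']

fig = px.scatter(dp, x='Year', y='Vote_Share_Pct', color='Party', hover_name='Candidate')
fig.update_traces(marker=dict(size=12,
                              line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.update_layout(title='Year vs. % of Population Voting for Candidate (US)')
fig.show()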

With the shares computed, we can see that through 2016 the candidate with the highest share of the population was Ronald Reagan in 1984: about 23.1% of the US population voted for him.<br><br>
Great!  We have our baseline value of <b>23.1%</b>.  Now let's see how Sleepy Joe compares.
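The ranking behind that baseline is just each vote count divided by its population estimate, sorted in descending order. As a stdlib-only sketch (the `Result` type and `rank_by_share` function are hypothetical names, not from the notebook), here it is applied to the five rows shown in the `dp.head()` output with population estimates; since those rows don't include Reagan, the top spot in this subset is Obama in 2008:

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One candidate's popular vote in one election (hypothetical helper type)."""
    candidate: str
    year: int
    votes: int
    population: int  # estimated US population in the election year

    @property
    def share(self) -> float:
        # Fraction of the whole population that voted for this candidate
        return self.votes / self.population


def rank_by_share(results):
    """Sort results from highest to lowest population share."""
    return sorted(results, key=lambda r: r.share, reverse=True)


# The five rows from the Population_Estimate output above
rows = [
    Result("Barack Obama", 2008, 69498516, 303280811),
    Result("Barack Obama", 2012, 65915795, 313524230),
    Result("Hillary Clinton", 2016, 65853514, 323081615),
    Result("Donald Trump", 2016, 62984828, 323081615),
    Result("George W. Bush", 2004, 62040610, 292351358),
]

for r in rank_by_share(rows):
    print(f"{r.candidate} ({r.year}): {r.share:.1%}")
```

On the full `dp` DataFrame, a pandas equivalent would be something like `dp.assign(share=dp['Popular vote'] / dp['Population_Estimate']).nlargest(5, 'share')`.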

<h2>Adding 2020 Data to the Mix</h2>
For 2020, the population table above gives an estimated population of 332,639,000.<br><br>
As of 9 AM PST on November 6, 2020, Joe Biden has 73,683,443 votes (source: The Associated Press).

In [9]:
# Share of the 2020 population that has voted for Biden so far (AP count / 2020 estimate)
joe_biden_percent = 73683443 / 332639000
print(joe_biden_percent)

0.22151173795015017


<h2>Conclusion</h2>
So there you have it. With just under <b>22.2%</b> of the population voting for him, Joe Biden trails Ronald Reagan's <b>23.1%</b>. By the same measure, Obama in 2008, LBJ in 1964, and Nixon in 1972 were also more popular than Biden currently is. Still, that makes him the 5th most popular candidate in American history by this measure.<br><br>
If Joe Biden wants to become the most popular presidential candidate ever by this measure, though, he would need more than 3.27 million additional votes before the count is finished. Looks like he has his work cut out for him.<br><br>
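The vote gap above is Reagan's population share applied to the 2020 population, minus Biden's current total. A minimal sketch of that arithmetic (the function name `additional_votes_needed` is hypothetical) keeps everything in integers, so "more than Reagan's share" means strictly greater with no floating-point rounding:

```python
def additional_votes_needed(target_votes, target_population, population, current_votes):
    """Smallest number of extra votes that pushes current_votes / population
    strictly above target_votes / target_population (exact integer arithmetic)."""
    # Largest vote total v with v / population <= target share, i.e. floor(share * population)
    threshold = target_votes * population // target_population
    # One vote past the threshold strictly exceeds the target share
    return max(0, threshold + 1 - current_votes)
```

Passing Reagan's 1984 popular vote and the 1984 population estimate from `dp` as the first two arguments, 332,639,000 as the population, and 73,683,443 as the current votes can be used to check the gap quoted above. Plugging in the rounded 23.1% instead (`231, 1000`) gives 3,156,167, noticeably less, so the unrounded share matters here.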