# 2000 U.S. Presidential Election in Florida

The 2000 presidential election was contested by George W. Bush (R) and Al Gore (D). It was an extraordinarily close race and came down to just one state: Florida. Because of the Electoral College, whoever won Florida would win the election.

In this Colab, you will explore the Florida election data by county to understand the controversies surrounding this election.

In [36]:
import pandas as pd
import plotly.express as px

## Question 1
The data can be found at https://datasci112.stanford.edu/data/florida2000.csv. Read in the data. The columns Bush00, Gore00, Buchanan00, and Other00 contain the number of votes obtained by Bush, Gore, Pat Buchanan (a third-party candidate), and all other candidates, respectively, in the 2000 election.

Who won the election and by how many votes? Notice that some of the votes came from overseas absentee voters. If we exclude these votes, who would have won the election?

In [37]:
# load the dataset
df = pd.read_csv('data/florida2000.csv')
df

Unnamed: 0,County,VotingMachine,Ballot,Undervote,Overvote,Bush00,Gore00,Buchanan00,Other00,Clinton96,Dole96,Perot96
0,Alachua,Optical,1-column,217.0,105.0,34124,47365,263,3971,40144,25303,8072
1,Baker,Optical,1-column,79.0,46.0,5610,2392,73,79,2273,3684,667
2,Bay,Optical,1-column,541.0,141.0,38637,18850,248,1065,17020,28290,5922
3,Bradford,Optical,2-column,41.0,695.0,5414,3075,65,119,3356,4038,819
4,Brevard,Optical,1-column,277.0,136.0,115185,97318,570,5311,80416,87980,25249
...,...,...,...,...,...,...,...,...,...,...,...,...
63,Volusia,Optical,1-column,339.0,171.0,82357,97304,498,3486,78905,63067,17319
64,Wakulla,Datavote,1-column,49.0,373.0,4512,3838,46,189,3054,2931,1091
65,Walton,Optical,1-column,135.0,72.0,12182,5642,120,371,5341,7706,2342
66,Washington,Optical,1-column,305.0,36.0,4994,2798,88,145,2992,3522,1287


In [38]:
# get votes of each candidate
df_votes = df[['Bush00', 'Gore00', 'Buchanan00', 'Other00']].sum().sort_values(ascending=False)
df_votes

Bush00        2912790
Gore00        2912253
Other00        120583
Buchanan00      17484
dtype: int64

In [39]:
# create a bar chart
fig = px.bar(
    df_votes,
    x=df_votes.index,
    y=df_votes.values,
    title='Florida 2000 Presidential Election Votes',
    labels={'x': 'Candidates', 'y': 'Votes'},
    color=df_votes.index)
# show the bar chart
fig.show()

In [40]:
# get votes of each candidate except for 'federal absentee'
df_votes_no_federal_absentee = df[df['County'] != '(Federal Absentee)'][['Bush00', 'Gore00', 'Buchanan00', 'Other00']].sum().sort_values(ascending=False)
df_votes_no_federal_absentee

Gore00        2911417
Bush00        2911215
Other00        119947
Buchanan00      17479
dtype: int64
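
To answer the question directly, the two margins can be read off the totals printed above. The sketch below hard-codes those printed totals instead of recomputing them from `df`; the dictionary names and the `winner_and_margin` helper are illustrative and not part of the original notebook.

```python
# Totals copied from the two outputs above (illustrative names, not from the notebook)
totals_all = {"Bush00": 2912790, "Gore00": 2912253}
totals_no_absentee = {"Bush00": 2911215, "Gore00": 2911417}


def winner_and_margin(totals):
    """Return the leading candidate column and its lead over the runner-up."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    (first, first_votes), (_, second_votes) = ranked[0], ranked[1]
    return first, first_votes - second_votes


print(winner_and_margin(totals_all))          # ('Bush00', 537)
print(winner_and_margin(totals_no_absentee))  # ('Gore00', 202)
```

On these totals, Bush leads by 537 votes when every row is counted, while Gore leads by 202 once the `(Federal Absentee)` row is excluded.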

In [41]:
# create a bar chart
fig_no_federal_absentee = px.bar(
    df_votes_no_federal_absentee,
    x=df_votes_no_federal_absentee.index,
    y=df_votes_no_federal_absentee.values,
    title='Florida 2000 Presidential Election Votes (Excluding Federal Absentee)',
    labels={'x': 'Candidates', 'y': 'Votes'},
    color=df_votes_no_federal_absentee.index)
# show the bar chart without federal absentee votes
fig_no_federal_absentee.show()

## Question 2

One controversy in the 2000 election was the so-called "butterfly ballot" used by Palm Beach County, shown below.

Gore was the 2nd ticket listed on the left-hand side, but to vote for Gore, a voter had to punch the 3rd hole. Punching the 2nd hole would result in a vote for Pat Buchanan.

Many voters claimed that they were confused by the ballot design and inadvertently voted for Buchanan, when they meant to vote for Gore.

_Learn_: Take a look at the plotly scatter() [documentation](https://plotly.com/python-api-reference/generated/plotly.express.scatter.html). There are a ton of optional arguments! Pick 2 arguments besides DataFrame, x, and y and explain what they do.

_Do_: Make scatterplots of the data to see whether there is evidence of an unusually high number of votes for Pat Buchanan in Palm Beach County. Would this have been enough to swing the election?

In [42]:
# Highlight Palm Beach County in scatterplots of Buchanan vs Bush and Buchanan vs Gore
# (generated by GPT-4.1)

import plotly.express as px

# Add a column to indicate Palm Beach County
df['PalmBeach'] = df['County'] == 'Palm Beach'

# comment: be careful with set axes
# Buchanan vs Bush scatterplot
fig_buchanan_bush_pb = px.scatter(
    df, x='Bush00', y='Buchanan00', color='PalmBeach',
    title='Buchanan vs Bush Votes by County (Palm Beach Highlighted)',
    labels={'Bush00': 'Bush Votes', 'Buchanan00': 'Buchanan Votes'}
)
fig_buchanan_bush_pb.show()

# comment: be careful with set axes
# Buchanan vs Gore scatterplot
fig_buchanan_gore_pb = px.scatter(
    df, x='Gore00', y='Buchanan00', color='PalmBeach',
    title='Buchanan vs Gore Votes by County (Palm Beach Highlighted)',
    labels={'Gore00': 'Gore Votes', 'Buchanan00': 'Buchanan Votes'}
)
fig_buchanan_gore_pb.show()

# Check Buchanan votes in Palm Beach vs other counties
pb_buchanan = df.loc[df['County'] == 'Palm Beach', 'Buchanan00'].values[0]
mean_buchanan = df.loc[df['County'] != 'Palm Beach', 'Buchanan00'].mean()
print(f"Palm Beach Buchanan votes: {pb_buchanan}")
print(f"Mean Buchanan votes (other counties): {mean_buchanan:.2f}")
print(f"Difference: {pb_buchanan - mean_buchanan:.2f}")

# Would this have been enough to swing the election?
# df_votes includes the federal absentee row, so this is the statewide margin over all votes
bush_gore_diff = abs(df_votes['Bush00'] - df_votes['Gore00'])
print(f"Bush-Gore margin (all votes): {bush_gore_diff}")
print(f"Excess Buchanan votes in Palm Beach: {pb_buchanan - mean_buchanan:.2f}")

Palm Beach Buchanan votes: 3411
Mean Buchanan votes (other counties): 210.04
Difference: 3200.96
Bush-Gore margin (all votes): 537
Excess Buchanan votes in Palm Beach: 3200.96
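
Raw vote counts grow with county size, so comparing one large county against the unweighted mean of all others may exaggerate the gap. One alternative, sketched below, is to compare Buchanan's share of the counted votes. The `add_buchanan_share` helper and the `BuchananShare00` column are hypothetical names, and the `sample` frame holds only rows displayed earlier (Palm Beach is not among them).

```python
import pandas as pd

# Rows copied from the table shown earlier; Palm Beach is not among the displayed rows
sample = pd.DataFrame({
    "County": ["Alachua", "Baker", "Bay", "Bradford", "Brevard"],
    "Bush00": [34124, 5610, 38637, 5414, 115185],
    "Gore00": [47365, 2392, 18850, 3075, 97318],
    "Buchanan00": [263, 73, 248, 65, 570],
    "Other00": [3971, 79, 1065, 119, 5311],
})


def add_buchanan_share(frame):
    """Return a copy with Buchanan's share of counted votes (hypothetical helper)."""
    out = frame.copy()
    counted = out[["Bush00", "Gore00", "Buchanan00", "Other00"]].sum(axis=1)
    out["BuchananShare00"] = out["Buchanan00"] / counted
    return out


shares = add_buchanan_share(sample)
print(shares[["County", "BuchananShare00"]])
```

Applied to the full `df`, the same helper would let Palm Beach be compared with counties of any size on a common scale.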


In [43]:
# my answer 1
fig_bush_vs_gore = px.scatter(df, x='Bush00', y='Gore00', color='County', title='Bush vs Gore Votes by County')
fig_bush_vs_gore.show()

In [44]:
# my answer 2
fig_buchanan_vs_bush = px.scatter(df, x='Buchanan00', y='Bush00', color='County', title='Buchanan vs Bush Votes by County')
fig_buchanan_vs_bush.show()

In [45]:
# my answer 3
fig_buchanan_vs_gore = px.scatter(df, x='Buchanan00', y='Gore00', color='County', title='Buchanan vs Gore Votes by County')
fig_buchanan_vs_gore.show()

## Question 3a

Another controversy in the 2000 election was overvotes. Overvoting is when a voter selects more than one candidate. Overvotes are disqualified and not counted. In the 2000 election, Gore appeared on a disproportionate number of overvotes in Florida.

Calculate the overvote proportion in each county.

In [46]:
# calculate the overvote ratio in each county
df['TotalVote00'] = df[['Undervote', 'Overvote', 'Bush00', 'Gore00', 'Buchanan00', 'Other00']].sum(axis=1)
df['OvervoteRatio00'] = df['Overvote'] / df['TotalVote00']
df

Unnamed: 0,County,VotingMachine,Ballot,Undervote,Overvote,Bush00,Gore00,Buchanan00,Other00,Clinton96,Dole96,Perot96,PalmBeach,TotalVote00,OvervoteRatio00
0,Alachua,Optical,1-column,217.0,105.0,34124,47365,263,3971,40144,25303,8072,False,86045.0,0.001220
1,Baker,Optical,1-column,79.0,46.0,5610,2392,73,79,2273,3684,667,False,8279.0,0.005556
2,Bay,Optical,1-column,541.0,141.0,38637,18850,248,1065,17020,28290,5922,False,59482.0,0.002370
3,Bradford,Optical,2-column,41.0,695.0,5414,3075,65,119,3356,4038,819,False,9409.0,0.073865
4,Brevard,Optical,1-column,277.0,136.0,115185,97318,570,5311,80416,87980,25249,False,218797.0,0.000622
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Volusia,Optical,1-column,339.0,171.0,82357,97304,498,3486,78905,63067,17319,False,184155.0,0.000929
64,Wakulla,Datavote,1-column,49.0,373.0,4512,3838,46,189,3054,2931,1091,False,9007.0,0.041412
65,Walton,Optical,1-column,135.0,72.0,12182,5642,120,371,5341,7706,2342,False,18522.0,0.003887
66,Washington,Optical,1-column,305.0,36.0,4994,2798,88,145,2992,3522,1287,False,8366.0,0.004303


## Question 3b

In the 2000 election, each county in Florida chose its own voting machine and ballot design.

Use the grammar of graphics to determine if there is evidence that the voting machine (`VotingMachine`) or ballot design (`Ballot`) influenced the overvote ratio defined above.

In [47]:
# get the average overvote ratio for each voting machine
df_voting_machine_overvote = df[['VotingMachine', 'OvervoteRatio00']].groupby('VotingMachine').mean().reset_index()
df_voting_machine_overvote

Unnamed: 0,VotingMachine,OvervoteRatio00
0,Datavote,0.060193
1,Hand,0.057066
2,Lever,0.0009
3,Optical,0.027363
4,Votomatic,0.019127
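
A group mean can be driven by a handful of counties, so it may help to report how many counties fall in each group, along with the median. The sketch below does this on the nine rows displayed earlier; `sample` and `summary` are illustrative names, and the result is not representative of the full dataset.

```python
import pandas as pd

# Displayed rows only (illustrative subset of the full table)
sample = pd.DataFrame({
    "County": ["Alachua", "Baker", "Bay", "Bradford", "Brevard",
               "Volusia", "Wakulla", "Walton", "Washington"],
    "VotingMachine": ["Optical", "Optical", "Optical", "Optical", "Optical",
                      "Optical", "Datavote", "Optical", "Optical"],
    "OvervoteRatio00": [0.001220, 0.005556, 0.002370, 0.073865, 0.000622,
                        0.000929, 0.041412, 0.003887, 0.004303],
})

# Count, mean, and median of the overvote ratio per voting machine
summary = sample.groupby("VotingMachine")["OvervoteRatio00"].agg(["count", "mean", "median"])
print(summary)
```

A group with a very small `count` (for example, a machine type used by only one or two counties) deserves extra caution when its mean is compared with the others.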


In [76]:
# visualize the overvote ratio by voting machine type
fig_overvote_ratio = px.bar(
    df_voting_machine_overvote,
    x='VotingMachine',
    y='OvervoteRatio00',
    title='Mean of Overvote Ratio by Voting Machine Type',
    labels={'VotingMachine': 'Voting Machine', 'OvervoteRatio00': 'Overvote Ratio'},
    color='VotingMachine'
)
fig_overvote_ratio.show()

In [49]:
# get the average overvote ratio for each ballot
df_ballot_overvote = df[['Ballot', 'OvervoteRatio00']].groupby('Ballot').mean().reset_index()
df_ballot_overvote

Unnamed: 0,Ballot,OvervoteRatio00
0,1-column,0.015858
1,2-column,0.065649


In [77]:
# visualize the overvote ratio by ballot type
fig_ballot_overvote_ratio = px.bar(
    df_ballot_overvote,
    x='Ballot',
    y='OvervoteRatio00',
    title='Mean of Overvote Ratio by Ballot Type',
    labels={'Ballot': 'Ballot Type', 'OvervoteRatio00': 'Overvote Ratio'},
    color='Ballot'
)
fig_ballot_overvote_ratio.show()

## Question 4

Al Gore challenged the state-certified count in court, in a case that made it all the way to the Supreme Court (Bush v. Gore). Since miscast votes and overvotes cannot be counted, Gore's challenge focused on undervotes. Undervoting is when a voter selects no candidate. However, this may be because the voter did not fully punch a hole in the ballot, leaving so-called "hanging chads" (see image).

Gore demanded recounts in counties with punch-card ballots (i.e., Votomatic or Datavote), arguing that the undervote rate was higher in counties with punch-card ballots. Do you agree?

In [51]:
# get undervote ratio
df['UndervoteRatio00'] = df['Undervote'] / df['TotalVote00']
df

Unnamed: 0,County,VotingMachine,Ballot,Undervote,Overvote,Bush00,Gore00,Buchanan00,Other00,Clinton96,Dole96,Perot96,PalmBeach,TotalVote00,OvervoteRatio00,UndervoteRatio00
0,Alachua,Optical,1-column,217.0,105.0,34124,47365,263,3971,40144,25303,8072,False,86045.0,0.001220,0.002522
1,Baker,Optical,1-column,79.0,46.0,5610,2392,73,79,2273,3684,667,False,8279.0,0.005556,0.009542
2,Bay,Optical,1-column,541.0,141.0,38637,18850,248,1065,17020,28290,5922,False,59482.0,0.002370,0.009095
3,Bradford,Optical,2-column,41.0,695.0,5414,3075,65,119,3356,4038,819,False,9409.0,0.073865,0.004358
4,Brevard,Optical,1-column,277.0,136.0,115185,97318,570,5311,80416,87980,25249,False,218797.0,0.000622,0.001266
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Volusia,Optical,1-column,339.0,171.0,82357,97304,498,3486,78905,63067,17319,False,184155.0,0.000929,0.001841
64,Wakulla,Datavote,1-column,49.0,373.0,4512,3838,46,189,3054,2931,1091,False,9007.0,0.041412,0.005440
65,Walton,Optical,1-column,135.0,72.0,12182,5642,120,371,5341,7706,2342,False,18522.0,0.003887,0.007289
66,Washington,Optical,1-column,305.0,36.0,4994,2798,88,145,2992,3522,1287,False,8366.0,0.004303,0.036457


In [None]:
# get PunchcardBallot column
# PunchcardBallot is True if VotingMachine is 'Votomatic' or 'Datavote'
df['PunchcardBallot'] = df['VotingMachine'].isin(['Votomatic', 'Datavote'])
df

Unnamed: 0,County,VotingMachine,Ballot,Undervote,Overvote,Bush00,Gore00,Buchanan00,Other00,Clinton96,Dole96,Perot96,PalmBeach,TotalVote00,OvervoteRatio00,UndervoteRatio00,PunchcardBallot
0,Alachua,Optical,1-column,217.0,105.0,34124,47365,263,3971,40144,25303,8072,False,86045.0,0.001220,0.002522,False
1,Baker,Optical,1-column,79.0,46.0,5610,2392,73,79,2273,3684,667,False,8279.0,0.005556,0.009542,False
2,Bay,Optical,1-column,541.0,141.0,38637,18850,248,1065,17020,28290,5922,False,59482.0,0.002370,0.009095,False
3,Bradford,Optical,2-column,41.0,695.0,5414,3075,65,119,3356,4038,819,False,9409.0,0.073865,0.004358,False
4,Brevard,Optical,1-column,277.0,136.0,115185,97318,570,5311,80416,87980,25249,False,218797.0,0.000622,0.001266,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,Volusia,Optical,1-column,339.0,171.0,82357,97304,498,3486,78905,63067,17319,False,184155.0,0.000929,0.001841,False
64,Wakulla,Datavote,1-column,49.0,373.0,4512,3838,46,189,3054,2931,1091,False,9007.0,0.041412,0.005440,True
65,Walton,Optical,1-column,135.0,72.0,12182,5642,120,371,5341,7706,2342,False,18522.0,0.003887,0.007289,False
66,Washington,Optical,1-column,305.0,36.0,4994,2798,88,145,2992,3522,1287,False,8366.0,0.004303,0.036457,False


In [71]:
# get the average undervote ratio whether the ballot is a punchcard or not
df_punchcard_undervote = df.groupby('PunchcardBallot')['UndervoteRatio00'].mean()
df_punchcard_undervote

PunchcardBallot
False    0.005827
True     0.012981
Name: UndervoteRatio00, dtype: float64
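
The mean of county-level ratios gives a small county the same weight as a large one. A pooled rate (total undervotes divided by total ballots in each group) is one alternative worth checking. The sketch below computes it on the nine rows displayed earlier; `sample`, `sums`, and `pooled_rate` are illustrative names, and the values are not the full-dataset answer.

```python
import pandas as pd

# Displayed rows only (illustrative subset of the full table)
sample = pd.DataFrame({
    "County": ["Alachua", "Baker", "Bay", "Bradford", "Brevard",
               "Volusia", "Wakulla", "Walton", "Washington"],
    "Undervote": [217.0, 79.0, 541.0, 41.0, 277.0, 339.0, 49.0, 135.0, 305.0],
    "TotalVote00": [86045.0, 8279.0, 59482.0, 9409.0, 218797.0,
                    184155.0, 9007.0, 18522.0, 8366.0],
    "PunchcardBallot": [False, False, False, False, False,
                        False, True, False, False],
})

# Sum undervotes and ballots within each group, then divide the sums
sums = sample.groupby("PunchcardBallot")[["Undervote", "TotalVote00"]].sum()
pooled_rate = sums["Undervote"] / sums["TotalVote00"]
print(pooled_rate)
```

If the pooled rates and the mean of county ratios point in the same direction on the full data, the conclusion about punch-card ballots does not hinge on how counties are weighted.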

In [78]:
# create a bar chart of the mean undervote ratio by punchcard ballot
# x and color use the index in its original order so each label stays paired with its value
fig_punchcard_undervote = px.bar(
    df_punchcard_undervote,
    x=df_punchcard_undervote.index.astype(str),
    y=df_punchcard_undervote.values,
    title='Mean of Undervote Ratio by Punchcard Ballot',
    labels={'x': 'Punchcard Ballot', 'y': 'Undervote Ratio'},
    color=df_punchcard_undervote.index.astype(str)
)
fig_punchcard_undervote.show()

In [None]:
# create a histogram of undervote ratio by punchcard ballot
# labels are keyed by column name because the data comes from df columns
fig_punchcard_undervote = px.histogram(
    df,
    x='UndervoteRatio00',
    color='PunchcardBallot',
    title='Distribution of Undervote Ratio by Punchcard Ballot',
    labels={'UndervoteRatio00': 'Undervote Ratio', 'PunchcardBallot': 'Punchcard Ballot'},
    barmode='overlay'
)
fig_punchcard_undervote.show()