# Maine Ranked Choice Voting (RCV) Analysis

Following the rules here: 
Data source: https://www.maine.gov/sos/cec/elec/results/results18.html#Nov6

In [1]:
import pandas as pd

ballot_df = pd.read_csv('ME_all.csv')
ballot_df.head()

| | record | precinct | ballot_style | rank1 | rank2 | rank3 | rank4 | rank5 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Fayette | CAN Ballot Style 130 | REP Poliquin, Bruce (5931) | REP Poliquin, Bruce | REP Poliquin, Bruce | undervote | undervote |
| 1 | 2 | Fayette | CAN Ballot Style 130 | REP Poliquin, Bruce (5931) | undervote | undervote | undervote | undervote |
| 2 | 3 | Fayette | CAN Ballot Style 130 | DEM Golden, Jared F. (5471) | Bond, Tiffany L. | undervote | undervote | undervote |
| 3 | 4 | Fayette | CAN Ballot Style 130 | REP Poliquin, Bruce (5931) | DEM Golden, Jared F. | Bond, Tiffany L. | DEM Golden, Jared F. | Hoar, William R.S. |
| 4 | 6 | Fayette | CAN Ballot Style 130 | REP Poliquin, Bruce (5931) | undervote | undervote | undervote | undervote |


In [2]:
ballot_df['rank1'].value_counts()

REP Poliquin, Bruce (5931)     133793
DEM Golden, Jared F. (5471)    131003
Bond, Tiffany L.                16415
Hoar, William R.S.               6782
undervote                        6641
DEM Golden, Jared F.              819
overvote                          424
REP Poliquin, Bruce               200
Name: rank1, dtype: int64

## Data Cleaning

In [3]:
# Remove ' (5931)' from 'REP Poliquin, Bruce (5931)' and ' (5471)' from 'DEM Golden, Jared F. (5471)'
# so that each candidate has exactly one string representing a vote for them.
ballot_df['rank1'] = ballot_df['rank1'].map(lambda choice: choice.split(' (')[0].strip())
ballot_df.rank1.value_counts()

REP Poliquin, Bruce     133993
DEM Golden, Jared F.    131822
Bond, Tiffany L.         16415
Hoar, William R.S.        6782
undervote                 6641
overvote                   424
Name: rank1, dtype: int64

In [4]:
# apply the same cleaning to all other ranks
for index in range(2, 6):
    rank = 'rank{}'.format(index)
    print("Cleaning", rank)
    ballot_df[rank] = ballot_df[rank].map(lambda choice: choice.split(' (')[0].strip())

Cleaning rank2
Cleaning rank3
Cleaning rank4
Cleaning rank5


In [5]:
# ensure strings are consistent across all columns
unique_candidate_set = set()
for index in range(1, 6):
    rank = 'rank{}'.format(index)
    rank_i_unique_candidate_set = set(ballot_df[rank].unique())
    # take the union of unique candidates so far and unique candidates in this round
    unique_candidate_set = unique_candidate_set | rank_i_unique_candidate_set 
unique_candidate_set

{'Bond, Tiffany L.',
 'DEM Golden, Jared F.',
 'Hoar, William R.S.',
 'REP Poliquin, Bruce',
 'overvote',
 'undervote'}

We now see that across all ranks, each candidate is represented by exactly one unique string.
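
The normalization used in the cleaning cells can also be checked in isolation. The sketch below applies the same `split(' (')[0].strip()` logic to a few of the raw strings seen in the `rank1` counts. The helper name `clean_choice` is an assumption made for illustration and is not defined elsewhere in this notebook.

```python
def clean_choice(choice):
    """Drop a trailing ' (NNNN)' suffix, mirroring the lambda used in the cleaning cells."""
    return choice.split(' (')[0].strip()

# Raw strings copied from the rank1 value counts above
raw_choices = [
    'REP Poliquin, Bruce (5931)',
    'REP Poliquin, Bruce',
    'DEM Golden, Jared F. (5471)',
    'DEM Golden, Jared F.',
    'undervote',
]

# Both spellings of each candidate should collapse to a single string
cleaned = sorted({clean_choice(c) for c in raw_choices})
print(cleaned)
```

Strings without a parenthesized suffix, such as `undervote`, pass through unchanged.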

## Definitions
From [Title 21-A, §723-A](https://legislature.maine.gov/statutes/21-A/title21-Asec723-A.html):

>A. "Batch elimination" means the simultaneous defeat of multiple candidates for whom it is mathematically impossible to be elected.   [IB 2015, c. 3, §5 (NEW).]

>B. "Continuing ballot" means a ballot that is not an exhausted ballot.   [IB 2015, c. 3, §5 (NEW).]

>C. "Continuing candidate" means a candidate who has not been defeated.   [IB 2015, c. 3, §5 (NEW).]

In [6]:
OVERVOTE = 'overvote'
UNDERVOTE = 'undervote'

candidate_set = {
    'Bond, Tiffany L.',
    'DEM Golden, Jared F.',
    'Hoar, William R.S.',
    'REP Poliquin, Bruce'
}
continuing_candidate_set = {
    'Bond, Tiffany L.',
    'DEM Golden, Jared F.',
    'Hoar, William R.S.',
    'REP Poliquin, Bruce'
}

>D. "Exhausted ballot" means a ballot that does not rank any continuing candidate, contains an overvote at the highest continuing ranking or contains 2 or more sequential skipped rankings before its highest continuing ranking. 
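
To make the three conditions in definition D concrete, here is a minimal sketch that scans a single ballot, given as a list of choices in rank order. It is a hypothetical single-ballot reading of the definition, not the DataFrame-based implementation used in this notebook. The function name `is_exhausted` is an assumption, as is the interpretation that an overvote or a pair of sequential skipped rankings exhausts the ballot only when it is reached before any continuing candidate.

```python
OVERVOTE = 'overvote'
UNDERVOTE = 'undervote'

def is_exhausted(ranks, continuing):
    """Return True if the ballot is exhausted under one reading of definition D.

    ranks: list of choices in rank order (rank1 first).
    continuing: set of continuing candidate names.
    """
    skipped_in_a_row = 0
    for choice in ranks:
        if choice == UNDERVOTE:
            skipped_in_a_row += 1
            if skipped_in_a_row >= 2:
                return True  # two sequential skipped rankings reached first
            continue
        skipped_in_a_row = 0
        if choice == OVERVOTE:
            return True      # overvote reached before any continuing candidate
        if choice in continuing:
            return False     # the ballot counts for this candidate
        # otherwise the choice is a defeated candidate: keep scanning
    return True              # no continuing candidate is ranked

continuing = {'DEM Golden, Jared F.', 'REP Poliquin, Bruce'}
ballot = ['Hoar, William R.S.', UNDERVOTE, 'DEM Golden, Jared F.', UNDERVOTE, UNDERVOTE]
print(is_exhausted(ballot, continuing))
```

In the usage example, the ballot ranks a defeated candidate first and skips only one ranking before reaching a continuing candidate, so under this reading it is not exhausted.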

In [7]:
# Constants
OVERVOTE = 'overvote'
UNDERVOTE = 'undervote'

# Helper functions
def ballot_has_two_consecutive_skipped_rankings(ballot):
    # True when the two rankings immediately before the highest continuing rank are both undervotes
    highest_continuing_rank = ballot['highest_continuing_rank']
    if highest_continuing_rank < 3:
        # ranks 1 and 2 have fewer than two rankings before them
        return False
    rank_one_before = "rank{}".format(highest_continuing_rank - 1)
    rank_two_before = "rank{}".format(highest_continuing_rank - 2)
    return (ballot[rank_one_before] == UNDERVOTE) and (ballot[rank_two_before] == UNDERVOTE)

def ballot_exhausted_for_overvote(ballot):
    highest_continuing_rank = ballot['highest_continuing_rank']
    if highest_continuing_rank in {1,2,3,4,5}:
        return ballot["rank{}".format(highest_continuing_rank)] == OVERVOTE
    else:
        return False

def mark_exhausted_ballots(continuing_ballot_df, continuing_candidate_set):
    # determine which ballots rank a continuing candidate according to https://legislature.maine.gov/statutes/21-A/title21-Asec723-A.html
    ranks_a_continuing_candidate = (continuing_ballot_df['rank1'].isin(continuing_candidate_set)
        | continuing_ballot_df['rank2'].isin(continuing_candidate_set)
        | continuing_ballot_df['rank3'].isin(continuing_candidate_set)
        | continuing_ballot_df['rank4'].isin(continuing_candidate_set)
        | continuing_ballot_df['rank5'].isin(continuing_candidate_set))
    continuing_ballot_df['does_not_rank_any_continuing_candidate'] = ~ranks_a_continuing_candidate
    # .sum() counts the True values and, unlike value_counts()[True], does not raise a KeyError when there are none
    print(continuing_ballot_df['does_not_rank_any_continuing_candidate'].sum(), "ballots exhausted for not ranking any continuing candidate.")
    
    # determine the highest continuing rank according to https://legislature.maine.gov/statutes/21-A/title21-Asec723-A.html
    # E. "Highest continuing ranking" means the highest ranking on a voter's ballot for a continuing candidate.   [IB 2015, c. 3, §5 (NEW).]
    continuing_ballot_df['highest_continuing_rank'] = 6 # defaults to 6 so that if the ballot is all undervotes, then ballot_has_two_consecutive_skipped_rankings will catch it
    for rank_idx in range(5, 0, -1):
        continuing_ballot_df['rank_choice_in_continuing_candidate_set'] = continuing_ballot_df["rank{}".format(rank_idx)].isin(continuing_candidate_set | {OVERVOTE})
        continuing_ballot_df.loc[continuing_ballot_df['rank_choice_in_continuing_candidate_set'], 'highest_continuing_rank'] = rank_idx
    
    # determine which ballots have an overvote at their highest continuing ranking according to https://legislature.maine.gov/statutes/21-A/title21-Asec723-A.html
    # H. "Overvote" means a circumstance in which a voter has ranked more than one candidate at the same ranking.   [IB 2015, c. 3, §5 (NEW).]
    continuing_ballot_df['overvote'] = continuing_ballot_df.apply(ballot_exhausted_for_overvote, axis=1)
    # .sum() counts the True values and does not raise a KeyError when there are none
    print(continuing_ballot_df['overvote'].sum(), "ballots exhausted for overvotes.")

    continuing_ballot_df['two_consecutive_skipped_rankings'] = continuing_ballot_df.apply(ballot_has_two_consecutive_skipped_rankings, axis=1)
    print(continuing_ballot_df['two_consecutive_skipped_rankings'].sum(), "ballots exhausted for two consecutive skipped rankings.")

    # a ballot is exhausted if it meets any one of the three conditions
    continuing_ballot_df['exhausted'] = continuing_ballot_df['overvote'] | continuing_ballot_df['two_consecutive_skipped_rankings'] | continuing_ballot_df['does_not_rank_any_continuing_candidate']
    print(continuing_ballot_df['exhausted'].sum(), "total ballots exhausted.")
    return continuing_ballot_df

## Procedure
>Except as provided in subsections 3 and 4, the following procedures are used to determine the winner of an election determined by ranked-choice voting. The ranked‑choice voting count must proceed in rounds. In each round, the number of votes for each continuing candidate must be counted. Each continuing ballot counts as one vote for its highest-ranked continuing candidate for that round. Exhausted ballots are not counted for any continuing candidate. The round then ends with one of the following 2 potential outcomes.

>A. If there are 2 or fewer continuing candidates, the candidate with the most votes is declared the winner of the election.   [IB 2015, c. 3, §5 (NEW).]

>B. If there are more than 2 continuing candidates, the last-place candidate is defeated and a new round begins.   [IB 2015, c. 3, §5 (NEW).]
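
The round structure described above can be sketched on a handful of toy ballots before applying it to the real data. This is a simplified illustration under stated assumptions: the function name `tabulate` and the candidates `A`, `B`, and `C` are hypothetical, ties are not handled, and overvotes and skipped rankings are ignored, so a ballot simply counts for the first continuing candidate it lists.

```python
from collections import Counter

def tabulate(ballots, candidates):
    """Run last-place elimination rounds; return (winner, list of per-round tallies)."""
    continuing = set(candidates)
    rounds = []
    while True:
        tally = Counter({c: 0 for c in continuing})
        for ballot in ballots:
            # one vote for the highest-ranked continuing candidate, if the ballot has one
            top = next((c for c in ballot if c in continuing), None)
            if top is not None:
                tally[top] += 1
        rounds.append(dict(tally))
        if len(continuing) <= 2:
            # outcome A: two or fewer continuing candidates, most votes wins
            return max(tally, key=tally.get), rounds
        # outcome B: defeat the last-place candidate and start a new round
        continuing.remove(min(tally, key=tally.get))

# Toy ballots (hypothetical): 4 rank only A, 3 rank B then C, 2 rank C then B
toy_ballots = [['A']] * 4 + [['B', 'C']] * 3 + [['C', 'B']] * 2
winner, rounds = tabulate(toy_ballots, {'A', 'B', 'C'})
print(winner, rounds)
```

In this toy election `A` leads the first round, but `C` is defeated and its two ballots transfer to `B`, which then has the most votes in the final round.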

In [8]:
candidate_set = {
    'Bond, Tiffany L.',
    'DEM Golden, Jared F.',
    'Hoar, William R.S.',
    'REP Poliquin, Bruce'
}
continuing_candidate_set = {
    'Bond, Tiffany L.',
    'DEM Golden, Jared F.',
    'Hoar, William R.S.',
    'REP Poliquin, Bruce'
}

import warnings
warnings.filterwarnings('error')

decided = False
round_index = 1
votes_by_round_df = pd.DataFrame()
continuing_ballot_df = ballot_df.copy()  # copy so the helper columns added below do not modify ballot_df
round_to_ballot_df = {}

def get_highest_ranked_candidate(ballot):
    rank_idx = ballot['highest_continuing_rank']
    if rank_idx in {1,2,3,4,5}:
        return ballot["rank{}".format(rank_idx)]
    else:
        return False

while not decided:
    print("\nStarting Round", round_index)

    # determine which ballots are exhausted
    print("Determining which ballots are exhausted")
    continuing_ballot_df_exhausted_marked = mark_exhausted_ballots(continuing_ballot_df, continuing_candidate_set)
    
    # determine the highest ranked continuing candidate for each ballot
    continuing_ballot_df['highest_ranked_continuing_candidate'] = continuing_ballot_df.apply(lambda ballot: get_highest_ranked_candidate(ballot), axis=1)  
    # record this round's ballot dataframe for analysis
    round_to_ballot_df[round_index] = continuing_ballot_df_exhausted_marked
    
    # remove exhausted ballots from consideration
    continuing_ballot_df = pd.DataFrame(continuing_ballot_df_exhausted_marked[~continuing_ballot_df_exhausted_marked['exhausted']])
    assert set(continuing_ballot_df['highest_ranked_continuing_candidate'].unique()) <= continuing_candidate_set
                                                                              
    # for each continuing ballot, assign one vote for its highest ranked continuing candidate                              
    votes_by_round_df[round_index] = continuing_ballot_df['highest_ranked_continuing_candidate'].value_counts()

    if len(continuing_candidate_set) <= 2:
        decided = True
        winning_candidate = votes_by_round_df[votes_by_round_df[round_index]==votes_by_round_df[round_index].max()].index[0]
        print(winning_candidate, "won!")
    else:
        # Eliminate the continuing candidate with the fewest votes
        eliminated_candidate = votes_by_round_df[votes_by_round_df[round_index]==votes_by_round_df[round_index].min()].index[0]
        continuing_candidate_set.remove(eliminated_candidate)
        print(eliminated_candidate, "was eliminated.")
    round_index += 1
                                                                              
votes_by_round_df


Starting Round 1
Determining which ballots are exhausted
5928 ballots exhausted for not ranking any continuing candidate.
462 ballots exhausted for overvotes.
6018 ballots exhausted for two consecutive skipped rankings.
6453 total ballots exhausted.
Hoar, William R.S. was eliminated.

Starting Round 2
Determining which ballots are exhausted
2130 ballots exhausted for not ranking any continuing candidate.
27 ballots exhausted for overvotes.
2044 ballots exhausted for two consecutive skipped rankings.
2162 total ballots exhausted.
Bond, Tiffany L. was eliminated.

Starting Round 3
Determining which ballots are exhausted
5991 ballots exhausted for not ranking any continuing candidate.
123 ballots exhausted for overvotes.
5728 ballots exhausted for two consecutive skipped rankings.
6088 total ballots exhausted.
DEM Golden, Jared F. won!


| Candidate | Round 1 | Round 2 | Round 3 |
|---|---|---|---|
| REP Poliquin, Bruce | 134184 | 135073.0 | 138932.0 |
| DEM Golden, Jared F. | 132013 | 133216.0 | 142442.0 |
| Bond, Tiffany L. | 16552 | 19173.0 | NaN |
| Hoar, William R.S. | 6875 | NaN | NaN |


In [9]:
votes_by_round_df.sum()

1    289624.0
2    287462.0
3    281374.0
dtype: float64
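
As a consistency check, the round totals should equal the number of ballots cast minus the ballots exhausted so far, because the loop drops exhausted ballots before the next round. The sketch below redoes that arithmetic using only figures printed earlier in this notebook (the cleaned `rank1` counts, the "total ballots exhausted" lines, and the sums above). The variable names are assumptions made for illustration.

```python
# Cleaned rank1 counts: every ballot has exactly one rank1 value
total_ballots = 133993 + 131822 + 16415 + 6782 + 6641 + 424

exhausted_by_round = [6453, 2162, 6088]   # "total ballots exhausted" printed in each round
round_totals = [289624, 287462, 281374]   # votes_by_round_df.sum()

remaining = total_ballots
for exhausted, expected in zip(exhausted_by_round, round_totals):
    remaining -= exhausted
    # each round's vote total should match the ballots still in play
    assert remaining == expected

# Winner's share of the votes counted in the final round
golden_share = 142442 / round_totals[-1]
print(total_ballots, round(100 * golden_share, 2))
```

The assertions pass for all three rounds, so the printed exhaustion counts and the vote totals are mutually consistent.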