# Getting started with `votekit`

This guide will help you get started with `votekit` using real election data from the 2013 Minneapolis mayoral election. This election had 35 candidates running for one seat and used single-winner instant-runoff voting (IRV) to elect the winner. Voters were allowed to rank up to three candidates.

```python
# these are the votekit functions we'll need access to
from votekit.cvr_loaders import load_csv
from votekit.elections import STV, fractional_transfer
from votekit.cleaning import remove_noncands
```

You can find the necessary csv file in the `src/votekit/data` folder of the GitHub repo. Download it, and then edit the path below to where you placed it.

We have the election data from 2013 in a csv file with 7 columns. The first column records the precinct in which a ballot was cast. The second through fourth columns give each voter's first, second, and third choice. The remaining columns give the race of the candidate in each rank.

The first thing we will do is create a `PreferenceProfile` object from our csv. A preference profile is a term from the social choice literature that represents the rankings of some set of candidates from some voters. Put another way, a preference profile stores the votes from an election, and is a collection of `Ballot` objects and candidates. 
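
To make the idea concrete, here is a toy model of a preference profile in plain Python. It is not `votekit`'s actual `Ballot` or `PreferenceProfile` class; the names `ToyBallot` and `ToyProfile` are hypothetical and only illustrate the idea of rankings paired with weights (how many voters cast each ranking).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyBallot:
    """Hypothetical stand-in for a ballot: a ranking plus a weight."""
    ranking: tuple   # candidates from most to least preferred
    weight: int = 1  # number of voters who cast exactly this ranking


@dataclass
class ToyProfile:
    """Hypothetical stand-in for a preference profile: a collection of ballots."""
    ballots: list = field(default_factory=list)

    def candidates(self):
        # Collect each candidate once, in order of first appearance.
        seen = []
        for ballot in self.ballots:
            for cand in ballot.ranking:
                if cand not in seen:
                    seen.append(cand)
        return seen

    def total_weight(self):
        # Total number of voters represented by the profile.
        return sum(b.weight for b in self.ballots)


profile = ToyProfile([
    ToyBallot(("A", "B", "C"), weight=3),
    ToyBallot(("B", "A"), weight=2),
    ToyBallot(("C",), weight=1),
])
print(profile.candidates())    # ['A', 'B', 'C']
print(profile.total_weight())  # 6
```

Storing one ballot per distinct ranking with a weight, rather than one object per voter, keeps the profile small when many voters submit identical rankings.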

We give the `load_csv` function the path to the csv file, as well as the indices of the columns that contain rankings. These are listed in order of preference: the first column is the voter's top choice, the last column their bottom choice. Don't forget that Python starts indexing from 0, so the precinct column is column 0 and the three ranking columns are 1, 2, and 3. If we did not provide the parameter `rank_cols`, `votekit` would assume that every column of the csv was a ranking column. There are some other optional parameters which you can read about in the documentation.
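
The effect of `rank_cols` can be sketched with Python's standard `csv` module. This only illustrates the column selection described above and is not how `load_csv` is implemented; the helper `read_rankings` and the sample rows are made up, and turning an empty cell into `None` mirrors the behavior this guide describes for `votekit`.

```python
import csv
import io
from collections import Counter

# Made-up rows in the same shape as described above:
# column 0 = precinct, columns 1-3 = first/second/third choice.
SAMPLE = """precinct,first,second,third
P-01,MARK ANDREW,,
P-01,BETSY HODGES,MARK ANDREW,DON SAMUELS
P-02,MARK ANDREW,,
"""


def read_rankings(text, rank_cols):
    """Hypothetical helper: tally rankings read from the given 0-indexed columns."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row of this sample
    tally = Counter()
    for row in reader:
        # Keep only the ranking columns, in order; empty cells become None.
        ranking = tuple(row[i] if row[i] != "" else None for i in rank_cols)
        tally[ranking] += 1
    return tally


tally = read_rankings(SAMPLE, rank_cols=[1, 2, 3])
print(tally[("MARK ANDREW", None, None)])  # 2
```

Identical rows collapse into one ranking with a count, which is the same shape of data the `head` output later in this guide displays as `Ballots` and `Weight`.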

```python
# you'll need to edit this path!
minneapolis_profile = load_csv("minneapolis/2013-mayor-cvr.csv", rank_cols=[1, 2, 3])
```

The `PreferenceProfile` object has lots of helpful methods that allow us to study our votes. Let's use some of them to explore the ballots that were submitted. This is crucial since our data was not preprocessed. There could be undervotes, overvotes, defective, or spoiled ballots.

The `get_candidates` method returns a list of the unique candidates. The `head` method shows the $n$ ballots with the largest weights. The `Ballots` column shows a ranking that was cast, and the `Weight` column shows how many voters cast that exact ranking.
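
Judging from the outputs in this guide, `head` and `tail` list distinct ballots ordered by weight. A minimal sketch of that idea with `collections.Counter` follows; the weights are made up, the helpers `top_n` and `bottom_n` are hypothetical, and how `votekit` orders ballots with equal weights is not modeled here.

```python
from collections import Counter

# Made-up ballot tallies: ranking -> number of voters who cast it.
ballots = Counter({
    ("A",): 40,
    ("B", "A", "C"): 25,
    ("B", "C", "A"): 20,
    ("C", "B"): 10,
    ("A", "C"): 5,
})


def top_n(counter, n):
    """The n heaviest ballots, largest weight first."""
    return counter.most_common(n)


def bottom_n(counter, n):
    """The n lightest ballots, smallest weight first."""
    return sorted(counter.items(), key=lambda item: item[1])[:n]


print(top_n(ballots, 2))     # [(('A',), 40), (('B', 'A', 'C'), 25)]
print(bottom_n(ballots, 2))  # [(('A', 'C'), 5), (('C', 'B'), 10)]
```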

```python
# returns a list of unique candidates
print(minneapolis_profile.get_candidates())

# returns the top n ballots
minneapolis_profile.head(n=5)
```

```text
['UWI', None, 'BILL KAHN', 'MIKE GOULD', 'EDMUND BERNARD BRUYERE', 'DON SAMUELS', 'MARK V ANDERSON', 'CHRISTOPHER CLARK', 'JOHN LESLIE HARTWIG', 'KURTIS W. HANNA', 'JACKIE CHERRYHOMES', 'TONY LANE', 'MARK ANDREW', 'CHRISTOPHER ROBIN ZIMMERMAN', 'JOHN CHARLES WILSON', 'JEFFREY ALAN WAGNER', 'CAM WINTON', 'NEAL BAXTER', 'CAPTAIN JACK SPARROW', 'JAYMIE KELLY', 'ABDUL M RAHAMAN "THE ROCK"', 'GREGG A. IVERSON', 'TROY BENJEGERDES', 'ALICIA K. BENNETT', 'BOB FINE', 'CYD GORMAN', 'OLE SAVIOR', 'MERRILL ANDERSON', 'STEPHANIE WOODRUFF', 'RAHN V. WORKCUFF', 'BOB "AGAIN" CARNEY JR', 'DAN COHEN', 'DOUG MANN', 'JAMES EVERETT', 'JOSHUA REA', 'JAMES "JIMMY" L. STROUD, JR.', 'BETSY HODGES']
```


|   | Ballots | Weight |
|---|---|---|
| 0 | (MARK ANDREW, None, None) | 4129 |
| 1 | (BETSY HODGES, MARK ANDREW, DON SAMUELS) | 3309 |
| 2 | (BETSY HODGES, DON SAMUELS, MARK ANDREW) | 3031 |
| 3 | (MARK ANDREW, BETSY HODGES, DON SAMUELS) | 2502 |
| 4 | (BETSY HODGES, None, None) | 2417 |


Woah, that's a little funky! There's a candidate called 'None' and a candidate called 'UWI'. In this dataset, 'UWI' stands for unlisted write-in, a candidate who was not on the official ballot. The 'None' candidate arises when we read in the csv file: if a ballot has an empty ranking (e.g., if I ranked my top two choices but not a third), `votekit` fills that empty space with 'None'.

It's really important to think carefully about how you want to handle cleaning up the ballots, as this depends entirely on the context of a given election. For now, let's assume that we want to get rid of the 'None' and 'UWI' candidates. The function `remove_noncands` will do this for us. If a ballot was "A B None", it would now be "A B". If a ballot was "A None B" it would now be "A B" as well. This might not be how you want to handle such things, but for now let's go with it. 
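
The rule just described (delete the unwanted entries, keep the remaining candidates in order, then merge ballots that have become identical) can be sketched in plain Python. This follows the behavior described in this guide rather than `votekit`'s source; the helper `drop_noncands` and the weights are hypothetical, and dropping ballots that end up empty is a choice made only for this sketch.

```python
from collections import Counter


def drop_noncands(tally, noncands):
    """Hypothetical helper: remove unwanted entries from every ranking, then merge duplicates."""
    cleaned = Counter()
    for ranking, weight in tally.items():
        # Keep the relative order of what remains, e.g. (A, None, B) -> (A, B).
        kept = tuple(c for c in ranking if c not in noncands)
        if kept:  # in this sketch, a ballot left with no candidates is discarded
            cleaned[kept] += weight
    return cleaned


raw = Counter({
    ("MARK ANDREW", None, None): 10,
    ("MARK ANDREW", "UWI", None): 2,
    ("A", None, "B"): 1,
    ("UWI", None, None): 4,
})
cleaned = drop_noncands(raw, {None, "UWI"})
print(cleaned[("MARK ANDREW",)])  # 12
print(cleaned[("A", "B")])        # 1
```

Because identical cleaned rankings are merged, a ballot's weight can grow after cleaning: here the 10 and 2 votes combine into a single ballot of weight 12.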

```python
minneapolis_profile = remove_noncands(minneapolis_profile, [None, "UWI"])
print(minneapolis_profile.get_candidates())
```

```text
['CAM WINTON', 'NEAL BAXTER', 'CAPTAIN JACK SPARROW', 'JAYMIE KELLY', 'ABDUL M RAHAMAN "THE ROCK"', 'BILL KAHN', 'GREGG A. IVERSON', 'TROY BENJEGERDES', 'ALICIA K. BENNETT', 'BOB FINE', 'MIKE GOULD', 'CYD GORMAN', 'EDMUND BERNARD BRUYERE', 'DON SAMUELS', 'MERRILL ANDERSON', 'STEPHANIE WOODRUFF', 'OLE SAVIOR', 'RAHN V. WORKCUFF', 'JOHN CHARLES WILSON', 'MARK V ANDERSON', 'CHRISTOPHER CLARK', 'JOHN LESLIE HARTWIG', 'BOB "AGAIN" CARNEY JR', 'DAN COHEN', 'KURTIS W. HANNA', 'DOUG MANN', 'JACKIE CHERRYHOMES', 'TONY LANE', 'MARK ANDREW', 'JAMES EVERETT', 'JOSHUA REA', 'CHRISTOPHER ROBIN ZIMMERMAN', 'JEFFREY ALAN WAGNER', 'JAMES "JIMMY" L. STROUD, JR.', 'BETSY HODGES']
```


Alright, things are looking a bit cleaner. Let's examine some of the ballots.

```python
# returns the top n ballots
minneapolis_profile.head(n=5, percents=True)
```

|   | Ballots | Weight | Voter Share |
|---|---|---|---|
| 0 | (MARK ANDREW,) | 4156 | 0.052357 |
| 1 | (BETSY HODGES, MARK ANDREW, DON SAMUELS) | 3309 | 0.041687 |
| 2 | (BETSY HODGES, DON SAMUELS, MARK ANDREW) | 3031 | 0.038184 |
| 3 | (MARK ANDREW, BETSY HODGES, DON SAMUELS) | 2502 | 0.03152 |
| 4 | (BETSY HODGES,) | 2448 | 0.03084 |


Notice how the weight on the Mark Andrew ballot changed: it went from 4129 to 4156 after we removed the erroneous candidates. That means ballots other than (MARK ANDREW, None, None), with a combined weight of 27, were also reduced to (MARK ANDREW,); for example, a ballot (MARK ANDREW, UWI, None) would now collapse to the same ballot. This highlights how our cleaning choices matter!

We can similarly print the bottom $n$ ballots. Here we turn on the optional `percents` and `totals` arguments: `percents` adds a column with each ballot's share of the total vote, and `totals` adds a row summing the weights and shares of the ballots shown.
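
A sketch of the arithmetic behind the extra column and row, assuming (as the `head` output above suggests, since the shares are proportional to the weights) that a ballot's voter share is its weight divided by the total weight of the profile. The tallies and the helper `with_shares` are made up for illustration.

```python
from collections import Counter

# Made-up tallies; in the guide these come from the cleaned profile.
tally = Counter({("A",): 50, ("B", "A"): 30, ("C",): 15, ("C", "B"): 5})


def with_shares(rows, total):
    """Hypothetical helper: attach each ballot's fraction of the total weight."""
    return [(ranking, weight, weight / total) for ranking, weight in rows]


total = sum(tally.values())  # 100 voters in this toy example
bottom = sorted(tally.items(), key=lambda item: item[1])[:2]  # two lightest ballots
rows = with_shares(bottom, total)

# The "Totals" row sums the weights and the shares of the rows shown.
totals = (sum(r[1] for r in rows), sum(r[2] for r in rows))
print(rows)       # [(('C', 'B'), 5, 0.05), (('C',), 15, 0.15)]
print(totals[0])  # 20
```

Note that the totals cover only the displayed rows, not the whole profile, which is why the `tail` totals in this guide are so small.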

```python
# returns the bottom n ballots
minneapolis_profile.tail(n=5, percents=True, totals=True)
```

|   | Ballots | Weight | Voter Share |
|---|---|---|---|
| 0 | (ABDUL M RAHAMAN "THE ROCK", ALICIA K. BENNETT... | 1 | 1.3e-05 |
| 1 | (JAMES EVERETT, BOB FINE, DOUG MANN) | 1 | 1.3e-05 |
| 2 | (JAMES EVERETT, BOB "AGAIN" CARNEY JR, RAHN V.... | 1 | 1.3e-05 |
| 3 | (JAMES EVERETT, BOB "AGAIN" CARNEY JR, MIKE GO... | 1 | 1.3e-05 |
| 4 | (JAMES EVERETT, BOB "AGAIN" CARNEY JR, CHRISTO... | 1 | 1.3e-05 |
| Totals | | 5 | 6.3e-05 |


There are a few other methods you can read about in the documentation, but now let's run an election!

Just because we have a collection of ballots does not mean we have a winner. To turn a `PreferenceProfile` into a winner (or winners), we need to choose an election method. The mayoral race was conducted as a single-winner IRV election, which in `votekit` is equivalent to an STV election with one seat. The transfer method tells us what to do if a candidate has a surplus of votes over the winning quota (which by default is the Droop quota).
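
For intuition, here is a minimal single-winner IRV count in plain Python. It is a sketch, not `votekit`'s `STV` class: the helpers `droop_quota` and `irv_winner` are hypothetical, the quota uses one common form of the Droop quota, floor(votes / (seats + 1)) + 1, computed once from all ballots, and ties for last place are broken alphabetically, an arbitrary choice made only for this example. Ballots that run out of continuing candidates are exhausted and stop counting.

```python
from collections import Counter


def droop_quota(total_votes, seats=1):
    # One common form of the Droop quota: floor(votes / (seats + 1)) + 1.
    return total_votes // (seats + 1) + 1


def irv_winner(tally):
    """Hypothetical single-winner IRV sketch; tally maps ranking tuples to voter counts."""
    continuing = {c for ranking in tally for c in ranking}
    quota = droop_quota(sum(tally.values()), seats=1)
    eliminated = []
    while len(continuing) > 1:
        # Each ballot counts for its highest-ranked continuing candidate.
        firsts = Counter({c: 0 for c in continuing})
        for ranking, weight in tally.items():
            top = next((c for c in ranking if c in continuing), None)
            if top is not None:  # None means the ballot is exhausted
                firsts[top] += weight
        leader, leader_votes = max(firsts.items(), key=lambda kv: kv[1])
        if leader_votes >= quota:
            return leader, eliminated
        # Eliminate the weakest candidate; ties broken alphabetically (arbitrary).
        loser = min(firsts, key=lambda c: (firsts[c], c))
        continuing.remove(loser)
        eliminated.append(loser)
    # Last candidate standing wins even without reaching the quota.
    return continuing.pop(), eliminated


winner, eliminated = irv_winner(Counter({
    ("A", "B"): 4,
    ("B", "C"): 3,
    ("C", "B"): 2,
    ("D", "B"): 1,
}))
print(winner)      # B
print(eliminated)  # ['D', 'C']
```

With 10 voters the quota is 6. A leads the first round with 4 votes, but once D and then C are eliminated their ballots flow to B, who reaches 6. In a one-seat race the count ends as soon as a winner is found, so no surplus is ever transferred; the choice of transfer rule should therefore not affect the winner here.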

```python
minn_election = STV(profile=minneapolis_profile, transfer=fractional_transfer, seats=1)
```

```python
# the run_election method prints a dataframe showing the order in which candidates are eliminated under STV
minn_election.run_election()
```

```text
                   Candidate     Status  Round
                BETSY HODGES    Elected     35
                 MARK ANDREW Eliminated     34
                 DON SAMUELS Eliminated     33
                  CAM WINTON Eliminated     32
          JACKIE CHERRYHOMES Eliminated     31
                    BOB FINE Eliminated     30
                   DAN COHEN Eliminated     29
          STEPHANIE WOODRUFF Eliminated     28
             MARK V ANDERSON Eliminated     27
                   DOUG MANN Eliminated     26
                  OLE SAVIOR Eliminated     25
               JAMES EVERETT Eliminated     24
           ALICIA K. BENNETT Eliminated     23
  ABDUL M RAHAMAN "THE ROCK" Eliminated     22
        CAPTAIN JACK SPARROW Eliminated     21
           CHRISTOPHER CLARK Eliminated     20
                   TONY LANE Eliminated     19
                JAYMIE KELLY Eliminated     18
                  MIKE GOULD Eliminated     17
             KURTIS W. HANNA Eliminated     16
 CHRISTOPHER 
```

And there you go! You've created a `PreferenceProfile` from real election data, done some cleaning, and then conducted an STV election. You can check the official 2013 results to confirm that `votekit` elected the same candidate as the real election.