# PS 88 Week 3 Lab: Utility, Expected Utility, and Pivotal Voters

In this lab we will use tables to explore several of the topics covered in the lecture. First we will think about preferences and utility with the aid of tables. Then we will move on to some expected utility calculations in the context of deciding what candidates to vote for (or whether to abstain) in an election. 

Next, we will start studying the question of when votes are more or less likely to be "pivotal" in elections by running simulations (again, aided by tables).

In [None]:
# Libraries we will use in the lab
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datascience import Table
from ipywidgets import interact
%matplotlib inline

## Part 1: Preferences in Tables

First let's think about preferences in the context of the 2020 Democratic primary. There were tons of candidates that ran in 2020, so to keep things manageable let's restrict attention to the six who won at least one delegate. We will store them in a list called `cands`.

In [None]:
cands = ["Biden", "Sanders", "Warren", "Bloomberg", 
                             "Klobuchar", "Gabbard"]

A "complete" way to think about preferences is to create a table with every possible pair and then ask if our voter prefers A to B (and B to A).

Here is some code that makes a table with the pairs (don't sweat the details).

In [None]:
cand1 = []
cand2 = []
for i in range(len(cands)):
    for j in range(len(cands)):
        cand1 = np.append(cand1, cands[i])
        cand2 = np.append(cand2, cands[j])
pairs = Table().with_columns("Candidate 1", cand1, "Candidate 2", cand2)
pairs    
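
For readers curious about the details, the same 36 ordered pairs can be generated with `itertools.product` from the standard library. This is only a sketch of an alternative, not the construction the lab relies on; the names `cand_names` and `all_pairs` are introduced here for illustration so that nothing in the lab is overwritten.

```python
from itertools import product

# Hypothetical local copy of the candidate list, so the lab's `cands` is untouched
cand_names = ["Biden", "Sanders", "Warren", "Bloomberg", "Klobuchar", "Gabbard"]

# product(..., repeat=2) yields every ordered pair, including each candidate paired with themselves
all_pairs = list(product(cand_names, repeat=2))

print(len(all_pairs))   # 6 * 6 = 36 ordered pairs
print(all_pairs[:3])    # the first few pairs, in the same order as the nested loops
```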

Now let's add some preferences. In reality, we could go through and ask a hypothetical voter "Do you like candidate 1 at least as much as candidate 2?" 36 times, but for now I'm going to add an arbitrary preference, which I purposefully won't explain yet.

In [None]:
# Creating an arbitrary preference
pref1 = pairs.column("Candidate 1") >=  pairs.column("Candidate 2")
preftable1 = pairs.with_column("1pref2", pref1)
preftable1

As the name indicates, the third column answers the question "is candidate 1 at least as good as candidate 2"?

Now we can implement the algorithm discussed in lecture to identify the rationalizable candidate for a voter with these preferences:

1. Pick a "potential best candidate"

2. Ask "is our potential best candidate at least as good as everyone else?" 

    2.1 If so, we are done. 
    
    2.2 If not, there is someone who is strictly better: call them the new potential best candidate, and repeat step 2.

Let's do this, starting with Biden as our initial potential best candidate. 

To check if Biden is the best candidate, we can look at the table which compares him to the others:

In [None]:
preftable1.where("Candidate 1", "Biden")

The second line tells us that Biden is not preferred to Sanders. 

So now let's see if Sanders is the best.

In [None]:
preftable1.where("Candidate 1", "Sanders")

Now we are down to one candidate preferred to Sanders, which is Warren. Let's see if she is best:

In [None]:
preftable1.where("Candidate 1", "Warren")

Since Warren is at least as good as any other choice, she is rationalizable!
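
The manual search we just did can also be written as a loop. The sketch below is a minimal version under stated assumptions: `find_best` and `at_least_as_good` are hypothetical names, and the toy preferences over snacks come from made-up scores rather than from the candidate table.

```python
def find_best(options, at_least_as_good):
    """Search for an option that is at least as good as every other option.

    `at_least_as_good(a, b)` is a hypothetical helper that returns True when
    a is weakly preferred to b. Returns None if the search does not settle.
    """
    best = options[0]                      # step 1: initial potential best option
    for _ in range(len(options)):
        # step 2: list the options our potential best is NOT at least as good as
        better = [o for o in options if not at_least_as_good(best, o)]
        if not better:
            return best                    # at least as good as everyone, so we are done
        best = better[0]                   # otherwise move to a strictly better option
    return None                            # no option survived the search

# Toy example (assumed data): preferences over snacks given by made-up scores
scores = {"apple": 2, "banana": 5, "cherry": 3}
result = find_best(list(scores), lambda a, b: scores[a] >= scores[b])
print(result)   # banana
```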

**Question 1.1. Repeat this process, but with Gabbard as the initial potential best candidate. Do you reach the same conclusion, and why?**

In [None]:
# Code for 1.1

*Words for 1.1*

**Question 1.2. Now that you have seen some examples of what candidates are preferred to others, you may be able to piece together the rule for the preferences of this voter. (You can also refer back to the code which produced the `pref1` variable as a hint, which tells you how Python interprets inequalities applied to strings!) What is the simplest way to describe this voter's preferences?**

*Words for 1.2*

**Question 1.3. Given your answer to 1.2, come up with a utility function which represents these preferences (i.e., assign a number to each candidate)**

*Answer to 1.3: add utility numbers to each candidate*

Biden:

Klobuchar:

Warren:

Sanders:

Gabbard:

Bloomberg:

We can also define a function that takes some candidates and a preference table as input, and then loops through and asks if this candidate is at least as good as everyone else.

In [None]:
def get_rationalizable(candlist,preftable):
    rat_cands = []
    for i in candlist:
        preftablei = preftable.where("Candidate 1", i)
        betterthani = sum(preftablei.column("1pref2"))
        if betterthani == len(candlist):
            rat_cands = np.append(rat_cands,i)
    return rat_cands

**Question 1.4. Use the `get_rationalizable` function to identify the rationalizable candidate given these preferences.**

In [None]:
# Code for 1.4

Now let's consider some alternate "preferences", where we will see why I added scare quotes soon.

In short, I'm going to make it so voter 2 has the same preferences as voter 1, except they strictly prefer Klobuchar to Warren. (Think about why this requires two "swaps!")

In [None]:
pref2 = pref1.copy()  # copy, so that editing pref2 below leaves pref1 unchanged
swap1 = (preftable1.column("Candidate 1") == "Klobuchar")*(preftable1.column("Candidate 2") == "Warren")
swap2 = (preftable1.column("Candidate 1") == "Warren")*(preftable1.column("Candidate 2") == "Klobuchar")
pref2[swap1] = 1
pref2[swap2] = 0
preftable2 = pairs.with_column("1pref2", pref2)

Unsurprisingly, now Warren is not preferred to all, which we can check using the same code as before but with `preftable2` replacing `preftable1`:

In [None]:
preftable2.where("Candidate 1", "Warren")

How about Klobuchar?

In [None]:
preftable2.where("Candidate 1", "Klobuchar")

...Sanders?

In [None]:
preftable2.where("Candidate 1", "Sanders")

It's starting to look like maybe no one is rationalizable? 

**Question 1.5. Run the `get_rationalizable` function for this preference table.**

In [None]:
# Code for 1.5

**Question 1.6. Is there a rationalizable choice given these "preferences"? If so, who? If not, why?**

*Answer to 1.6*

## Part 2: Utility with Tables

When the utilities associated with different choices are a function of several variables, it can be useful to keep track of this with a table. 

Let's keep using the example of the Democratic primary in 2020 (and the 6 candidates who won a delegate). Suppose that our voters care about three factors: (1) how liberal the candidate is, (2) the candidate's gender, and (3) whether the candidate has experience in the executive branch of government.

Here is a table that contains this data for the candidates who won a delegate. (Note: I got the liberal measure from eyeballing a graphic in <a href="https://www.businessinsider.com/2020-democratic-presidential-candidates-political-spectrum-ranking-2019-5">this article</a>, which was based on a survey of voter perceptions.)

In [None]:
cands = Table().with_columns("Name", 
                            cands,
                           "Liberal",
                           [5,10,8,2,4,6],
                           "Female",
                           [0,0,1,0,1,1],
                           "Exec",
                           [1,0,0,1,0,0])
cands

Let's consider a voter Bob, who is quite liberal, would like there to be a female nominee, and also thinks executive experience is very important. We can capture this by giving Bob a utility function
$$
U_{bob} = -|Liberal - 9| + Female + 2 \cdot Exec
$$

The first term captures the idea that Bob likes candidates less when their liberalism score is far from 9, which we can think of as his "ideal liberalism" value (more on this in week 5!). This means he likes a candidate with a score of 9 best, and as a candidate's score gets farther from 9 he likes them less.

The second and third terms mean he adds 1 to his utility for a female candidate, and 2 for a candidate with executive experience. Here is how we can compute his utility for each candidate:

In [None]:
Ubob = -abs(cands.column("Liberal")-9) + cands.column("Female") + 2*cands.column("Exec")
Ubob
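
As a check on the first entry of this array, Biden's utility can be worked out by hand from the table above (Liberal $=5$, Female $=0$, Exec $=1$):
$$
U_{bob}(\text{Biden}) = -|5 - 9| + 0 + 2 \cdot 1 = -4 + 2 = -2
$$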

Now let's create a new table to keep track of the utilities:

In [None]:
utils = cands.select("Name")
utils = utils.with_column("Ubob", Ubob)
utils

If we want to see who Bob likes best, one way is to sort the table in descending order:

In [None]:
utils.sort("Ubob", descending=True)

So, it looks like Bob likes Warren best.

**Question 2.1. There are two candidates who Bob is indifferent between. Which two, and why?**

*Answer to 2.1*

Another way we can find the best candidate is by finding the one that gives Bob his maximum utility value. First, let's figure out what this is and save it as a variable called `maxUbob`.

In [None]:
maxUbob = np.max(utils.column("Ubob"))
maxUbob

And now we can use the `where` function to find candidates that maximize Bob's utility.

In [None]:
utils.where("Ubob", maxUbob)

Suppose a second voter named Anna has the following utility:
$$
U_{anna} = -|Liberal - 3| + 3 \cdot Exec
$$

**Question 2.2. What does this utility function mean for how Anna evaluates female vs male candidates?**


*Answer to 2.2*

**Question 2.3. Write code to (1) compute the utility Anna assigns to each candidate, (2) add a column to the `utils` table with this information, and (3) determine which candidate Anna likes best.**

In [None]:
# Code for 2.3

*Words for 2.3*

**Question 2.4 Come up with a utility function which will make Sanders the most preferred candidate.**

In [None]:
# Code for 2.4

## Part 3: Computing and plotting expected utility

We can use Python to do expected utility calculations and explore the relationship between parameters in decision models and optimal choices. 

In class we showed that the expected utility for voting for a preferred candidate can be written $p_1 b - c$. A nice way to do calculations like this is to first assign values to the variables:

In [None]:
p1=.6
b=100
c=2
p1*b-c

**Question 3.1. Write code to compute the expected utility to voting when $p_1 = .5$, $b=50$, and $c=.5$**

In [None]:
# Code for 3.1 here

We don't necessarily care about these expected utilities on their own, but how they compare to the expected utility to abstaining, which is equal to $p_0 b$. 
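
A minimal sketch of this comparison, wrapping the two formulas in small helper functions. The helper names are hypothetical, the values of $p_1$, $b$, and $c$ are the ones assigned in the example cell above, and `p0_example = 0.55` is an assumed value chosen only for illustration.

```python
def eu_vote(p1, b, c):
    """Expected utility of voting: p1 * b - c."""
    return p1 * b - c

def eu_abstain(p0, b):
    """Expected utility of abstaining: p0 * b."""
    return p0 * b

# p1, b, c match the earlier example cell; p0_example is an assumed value
p1_example, b_example, c_example = 0.6, 100, 2
p0_example = 0.55

vote_value = eu_vote(p1_example, b_example, c_example)
abstain_value = eu_abstain(p0_example, b_example)

# Voting is the better choice exactly when its expected utility is higher
print(vote_value, abstain_value, vote_value > abstain_value)
```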

**Question 3.2. If $b=50$ and $p_0 = .48$, write code to compute the expected utility to abstaining.**

In [None]:
# Code for 3.2 here

**Question 3.3. Given 3.1 and 3.2, is voting the expected utility maximizing choice for these parameters?**

*Answer to 3.3 here*

We can also use the graphic capabilities of Python to learn more about how these models work. 

The following block of code plots the expected utility for voting (solid line) and abstaining (dashed line) as a function of the voting cost.

In [None]:
b=50
p0=.48
plt.hlines(p0*b, 0,2, label='Abstaining Utility',linestyles="dashed")
p1=.5
c = np.arange(0,2, step=.01)
y = p1*b-c
plt.ticklabel_format(style='plain')
plt.plot(c,y, label='Voting Expected Utility')
plt.xlabel('Voting Cost (c)')
plt.ylabel('Expected Utility')
plt.legend()

Note the abstaining utility is flat as the voting cost increases (since the expected utility to abstaining is not a function of $c$). However, the voting expected utility is decreasing in $c$.

**Question 3.4. From this graph, identify the values of $c$ where it is rational to vote (given these values of $p_0$, $p_1$, and $b$)**

*Answer to question 3.4*

**Question 3.5. (OPTIONAL) In the cell below, write some code which uses the calculating functions of Python to verify your answer to the previous question.**

In [None]:
# Code for 3.5 here

We can also use Python (and tables) to do expected utility calculations with more than two options pretty easily. 

Let's suppose that in our list of Democratic candidates, the probability of each winning at some point was: Biden (45%), Sanders (30%), Warren (15%), Bloomberg (5%), Klobuchar (4%), Gabbard (1%). We can capture this as a list, and check that it sums to 1:

In [None]:
pwin1 = [.45, .3, .15, .05, .04, .01]
sum(pwin1)
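
Because these are floating-point numbers, the printed sum can differ from 1 by a tiny rounding error. A tolerant check such as `np.isclose` is one way to confirm that the probabilities add up; `pwin_check` is a name used only in this sketch.

```python
import numpy as np

# Same probabilities as above, under a separate name for this illustration
pwin_check = [.45, .3, .15, .05, .04, .01]

# np.isclose compares within a small tolerance instead of requiring exact equality
print(np.isclose(sum(pwin_check), 1.0))
```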

It might be nice to add this to our candidate/utility table:

In [None]:
utils=utils.with_column("pwin1", pwin1)
utils

If we multiply the arrays of Bob's utility with the probability of each winning, and then sum them up, that will sum up all of the $p_i u_i$'s, giving the expected utility:

In [None]:
EUbob1 = sum(utils.column("Ubob")*utils.column("pwin1"))
EUbob1
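
The same "multiply, then sum" computation can also be written as a dot product. The sketch below uses a made-up lottery with three outcomes (the utilities and probabilities are assumptions, not values from the lab) to show that the two forms agree.

```python
import numpy as np

# Assumed toy lottery: utilities of three outcomes and their probabilities
toy_utils = np.array([10.0, 0.0, -5.0])
toy_probs = np.array([0.5, 0.3, 0.2])

# Elementwise product, then sum: p_1*u_1 + p_2*u_2 + p_3*u_3
eu_sum = np.sum(toy_utils * toy_probs)

# np.dot performs the same multiply-and-add in one call
eu_dot = np.dot(toy_utils, toy_probs)

print(eu_sum, eu_dot)
```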

**Question 3.6. Write a line of code to compute the expected utility for Anna**

In [None]:
# Code for 3.6

Now suppose Bloomberg drops out. As he was generally considered a centrist, this would presumably help Biden at the cost of the more liberal candidates (Sanders, Warren). Suppose the new probabilities of winning are Biden (65%), Sanders (25%), Warren (5%), Bloomberg (0%), Klobuchar (4%), Gabbard (1%). 

**Question 3.7. Since Bloomberg was Bob's least favorite candidate, we might think that he is happy to have him out of the race. Write code to see if Bob's expected utility goes up or down, and then explain why.**

In [None]:
# Code for 3.7

*Words for 3.7*

## Part 4: Simulating votes

How can we estimate the probability of a vote mattering? One route is to use probability theory, which in realistic settings (like the electoral college in the US) requires lots of complicated mathematical manipulation. Another way, which will often be faster and uses the tools you are learning in Data 8, is to run simulations.

As we will see throughout the class, simulation is an incredibly powerful tool that can be used for many purposes. For example, later in the class we will use simulation to see how different causal processes can produce similar data.

For now, we are going to use simulation to estimate the probability a vote matters. The general idea is simple. We will create a large number of "fake electorates" with parameters and randomness that we control, and then see how often an individual vote matters in these simulations. 

Before we get to voting, let's do a simple exercise as warmup. Suppose we want to simulate flipping a coin 10 times. To do this we can use the `random.binomial` function from `numpy` (imported above as `np`). This function takes two arguments: the number of flips (`n`) and the probability that a flip is "heads" (`p`). More generally, we often call $n$ the number of "trials" and $p$ the probability of "success".

The following line of code simulates flipping a "fair" (i.e., $p=.5$) coin 10 times. Run it a few times.

In [None]:
# First number argument is the number of times to flip, the second is the probability of a "heads"
np.random.binomial(n=10, p=.5)

We can simulate 100 coin flips at a time by changing the `n` argument to 100. The output tells us how many of these simulated coin flips came up heads. Run it a few times to see what happens for different simulations.

In [None]:
np.random.binomial(n=100, p=.5)
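
To see what `np.random.binomial` is counting, the sketch below builds one draw "by hand": it generates 10 uniform random numbers, treats each one that falls below `p_heads` as a head, and counts the heads. The variable names are illustrative only.

```python
import numpy as np

n_flips = 10
p_heads = 0.5

# Each uniform draw in [0, 1) falls below p_heads with probability p_heads
flips = np.random.random(n_flips) < p_heads   # array of True (heads) / False (tails)

# Counting the True values gives one binomial(n_flips, p_heads) draw
heads = int(np.sum(flips))
print(flips, heads)
```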

In the 2020 election, about 158.4 million people voted. This is a big number to have to keep typing, so let's define a variable: 

In [None]:
voters2020 = 158400000

**Question 4.1. Write a line of code to simulate 158.4 million people flipping a coin and counting how many heads there are.**

In [None]:
# Code for 4.1 here

Of course, we don't care about coin flipping per se, but we can think about this as the number of "yes" votes if we have $n$ people who each vote for a candidate with probability $p$. In the 2020 election, about 51.3% of the voters voted for Joe Biden. Let's do a simulated version of the election by running `np.random.binomial` with 158.4 million trials and a probability of "success" of 51.3%. 

Coding note: sometimes we will include a call to `np.random.seed`. This ensures that our random number generator (while still effectively "random" if we only run it once) always produces the same output.

In [None]:
np.random.seed(88)
joe_count = np.random.binomial(n=voters2020, p=.513)
joe_count
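
A quick way to see what the seed does is to set it, draw some numbers, then set it again and repeat the draw. This sketch uses the `size` argument of `np.random.binomial` (not otherwise needed at this point in the lab) to get five draws per call; the names `first` and `second` are illustrative.

```python
import numpy as np

np.random.seed(88)
first = np.random.binomial(n=10, p=.5, size=5)

np.random.seed(88)            # resetting the seed restarts the same random sequence
second = np.random.binomial(n=10, p=.5, size=5)

print(first, second)          # the two arrays match
```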

In reality, Biden won 81.27 million votes. 

**Question 4.2. How close was your answer to the real election? Compare this to the cases where you flipped 10 coins at a time.**


*Answer to 4.2 here*

## Part 5: Pivotal votes

Suppose that you are a voter in a population with 10 other people who are equally likely to vote for candidate A or candidate B, and you prefer candidate A. If you turn out to vote, you will be pivotal if the other 10 are split evenly between the two candidates. How often will that happen?

We can answer this question by running a whole bunch of simulations where we effectively flip 10 coins and count how many heads there are. 

The following line runs the code to do 10 coin flips with `p=.5` 10,000 times, and stores the results in a list. 

In general if you write `[function(args) for _ in range(x)]` Python will run `function(args)` x times, and store the results in a list. We will often use code like this to run simulations where `function(args)` contains some element of randomness.

In [None]:
ntrials=10000
trials10 = [np.random.binomial(n=10, p=.5) for _ in range(ntrials)]
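
As a smaller illustration of the same pattern, the sketch below rolls a six-sided die five times. `roll_die` is a hypothetical helper defined only for this example.

```python
import numpy as np

def roll_die():
    # randint's upper bound is exclusive, so this returns an integer from 1 to 6
    return np.random.randint(1, 7)

# Calls roll_die() five times; `_` signals that the loop variable itself is unused
rolls = [roll_die() for _ in range(5)]
print(rolls)
```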

Let's put these in a table, and then make a histogram to see how often each number of heads occurs. To make sure we get a count for each individual number of heads, we need to get the "bins" right.

In [None]:
simtable = Table().with_column("sims10",trials10)
# Bin edges 0, 1, ..., 11, so that 10 heads gets its own bin
simtable.hist("sims10", bins=range(12))
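
One subtlety when choosing integer bin edges: `numpy` (and `matplotlib`, which follows the same rule) treats every bin as including its left edge but not its right edge, except the last bin, which includes both. Edges running from 0 to 10 therefore put 9 heads and 10 heads in the same bin, while edges running from 0 to 11 give each count its own bin. The sketch below checks this on made-up data with `np.histogram`, assuming `Table.hist` passes its `bins` through unchanged.

```python
import numpy as np

heads_counts = [9, 10]   # assumed toy data: one trial with 9 heads, one with 10

# Edges 0, 1, ..., 10 make ten bins; the last bin covers 9 through 10 inclusive
counts_short, _ = np.histogram(heads_counts, bins=np.arange(11))

# Edges 0, 1, ..., 11 make eleven bins, so 9 and 10 land in separate bins
counts_long, _ = np.histogram(heads_counts, bins=np.arange(12))

print(counts_short[-1])     # 2: both values share the final bin
print(counts_long[-2:])     # [1 1]: one value in each of the last two bins
```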

Let's see what happens with 20 coin flips. First we create a bunch of simulations:

In [None]:
trials20 = [np.random.binomial(n=20, p=.5) for _ in range(ntrials)]

And then add the new trials to `simtable` using the `.with_column()` function.

In [None]:
simtable=simtable.with_column("sims20", trials20)
simtable

**Question 5.1 Make a histogram of the number of heads in the trials with 20 flips. Make sure to set the bins so that each one contains exactly one integer.**

In [None]:
# Code for 5.1

Let's see what this looks like with a different probability of success. Here is a set of simulations of 10 flips each with a higher probability of success ($p = .7$):

In [None]:
np.random.seed(88)
trials_high = [np.random.binomial(n=10, p=.7) for _ in range(ntrials)]

**Question 5.2. Add this list to `simtable`, as a column called `sims_high`, and create a histogram which shows the frequency of heads in these trials.**

In [None]:
# Code for 5.2

**Question 5.3. Compare this to the histogram where $p=.5$**

*Answer to 5.3 here*

Next we want to figure out exactly how often a voter is pivotal in different situations. To do this, let's create a variable called `pivot10` which is true when there are exactly 5 other voters choosing each candidate.

In [None]:
simtable = simtable.with_column("pivot10", simtable.column("sims10")==5)
simtable

We can then count the number of trials where a voter was pivotal.

In [None]:
sum(simtable.column("pivot10"))

Since there were 10,000 trials, we can convert this into a proportion:

In [None]:
sum(simtable.column("pivot10"))/ntrials
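
The simulated proportion can be compared with the exact binomial probability that exactly half of the other voters choose candidate A. The sketch below is a cross-check under the same assumptions as the simulation (independent voters, each choosing A with probability `p`); `exact_pivot_prob` is a hypothetical helper, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def exact_pivot_prob(n, p):
    """Probability that exactly n/2 of n independent voters choose candidate A."""
    if n % 2 == 1:
        return 0.0            # an odd number of other voters cannot split evenly
    k = n // 2
    # Number of ways to pick which k voters choose A, times the probability of each way
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(exact_pivot_prob(10, 0.5))   # 252 / 1024 = 0.24609375
```

With 10,000 trials the simulated value should land close to this number, though not exactly on it.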

**Question 5.4. Write code to determine what proportion of the time a voter is pivotal when $p=.5$ and $n=20$**

In [None]:
# Code for 5.4

To explore how changing the size of the electorate and the probabilities of voting affect the probability of being pivotal without having to go through all of these steps, we will define a function which does one simulation and then checks whether a new voter would be pivotal.

In [None]:
def one_pivot(n,p):
    return 1*(np.random.binomial(n=n,p=p)==n/2)

Run this a few times.

In [None]:
one_pivot(n=10, p=.6)

Let's see how the probability of being pivotal changes as the size of the electorate changes. To do so, we will use the same looping trick to store 10,000 simulations for different $n$. Here is code to simulate with $n=10$ (note that we defined `ntrials=10000` above):

In [None]:
piv_trials10 = [one_pivot(n=10, p=.5) for _ in range(ntrials)]
sum(piv_trials10)/ntrials
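
For large electorates, the loop can get slow. One alternative, sketched here as an option rather than as the lab's method, is to ask `np.random.binomial` for all of the simulations in a single call using its `size` argument. The names `n_sims`, `draws`, and `share_pivotal` are illustrative.

```python
import numpy as np

np.random.seed(88)
n_sims = 10000     # same number of simulations as ntrials above

# One call returns an array of n_sims independent binomial draws
draws = np.random.binomial(n=10, p=.5, size=n_sims)

# The mean of a True/False array is the share of True values
share_pivotal = np.mean(draws == 10 / 2)
print(share_pivotal)
```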

**Question 5.5 Write code to simulate how often a voter is pivotal with $n=100$ and $n=1000000$. (Keep $p=.5$)**

In [None]:
# Code for 5.5 (n=100)

In [None]:
# Code for 5.5 (n=1000000)

**Question 5.6 Now let's return to the $n=10$ case, and see what happens when we change $p$. Write code to simulate how often a voter in an electorate of 10 will be pivotal with $p=.2$, $p=.4$, and $p=.6$**

In [None]:
# Code for 5.6

**Question 5.7. Compare the probability of being pivotal for these values of $p$. What does this (and the analysis of the effect of changing $n$) tell you about what kinds of real world elections generate the highest probability of being pivotal?** 

*Answer to 5.7*

**Question 5.8 (Optional) Make a plot of the probability of being pivotal with $n=10$ as a function of $p$.**

In [None]:
# Code for 5.8