# PS 88 Lab 5: Political Competition, Theory and Data

In this lab we will explore some theories of the ideological positions of candidates for political office, and then see how they line up with real data from the US House of Representatives.

In [None]:
from datascience import Table
import numpy as np
%matplotlib inline

## Part 1: The Hotelling-Downs Model

First let's see how we can use the tools of Python to better understand the dynamics of the Hotelling-Downs model of political competition. 

So that we have some "real" data as motivation, let's bring back the data from the class survey, where I asked you all to place yourself on a 1-7 point scale from "Very Conservative" to "Very Liberal". Let's reload the data.

In [None]:
classdata = Table.read_table("PS88survey.csv")
classdata

**Question 1.1. Make a histogram of the `LibCon` variable. (Hint: add a `bins=np.arange(1,8,.5)` argument to make it look a bit nicer). What does this tell you about the distribution of political ideology in the class?**

In [None]:
# Code for 1.1


*Words for 1.1*

We can now ask what would happen in hypothetical elections for "PS 88 President" under the assumption that voters will vote for the candidate who is strictly closer to them, if there is one, and will abstain otherwise.

Given platforms $x_A$ and $x_B$, we can check if someone with ideal point $x_i$ is closer to $A$ using absolute values. In particular, they vote for $A$ if:
$$
|x_i - x_A | < |x_i - x_B|
$$

For example, suppose candidate A picks a platform of 4 and candidate B picks 6. The following line of code creates an array which answers the question of whether each class member would vote for A:

In [None]:
abs(classdata.column("LibCon") - 4) < abs(classdata.column("LibCon") - 6)

We can count how many Trues there are (and hence how many A votes) by summing these:

In [None]:
sum(abs(classdata.column("LibCon") - 4) < abs(classdata.column("LibCon") - 6))
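To see how the strict comparison treats indifferent voters, here is a small sketch on a made-up electorate (the array `toy_electorate` is hypothetical and is not the class data). A voter at 5 is exactly as far from 4 as from 6, so the strict inequality is False for them and they are not counted for A.

```python
import numpy as np

# Hypothetical ideal points, for illustration only (not the class data)
toy_electorate = np.array([1, 3, 4, 5, 5, 6, 7])

# True where the voter is strictly closer to A's platform (4) than to B's (6)
closer_to_A = abs(toy_electorate - 4) < abs(toy_electorate - 6)
a_votes = np.sum(closer_to_A)

# Voters who are equally far from both platforms abstain
n_indifferent = np.sum(abs(toy_electorate - 4) == abs(toy_electorate - 6))

print(closer_to_A)
print(a_votes, n_indifferent)
```

In this toy electorate, the two voters at 5 abstain, so the A votes and the remaining votes do not add up to the size of the electorate.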

**Question 1.2. Write a line of code to count how many class members would vote for B. Who would win in an election between candidates with these platforms?**

In [None]:
#Code for 1.2

*Words for 1.2*

To make calculations like this more efficient, let's define some functions. Here is one that takes as input two platforms `xA` and `xB` and the ideal points of the electorate as `electorate`, and computes the number of votes that A would get.

In [None]:
def get_Avotes(xA, xB, electorate):
    # Count the voters who are strictly closer to xA than to xB
    return sum(abs(electorate - xA) < abs(electorate - xB))

In [None]:
get_Avotes(4,6, classdata.column("LibCon"))

**Question 1.3. Write a function called `get_bothvotes` which takes as input the platforms and the electorate ideal points, and returns an array with the number of A votes and the number of B votes. To check that it works, use the `get_bothvotes` function to ask how many votes each party gets if A picks a platform of 4 and B picks a platform of 6 (it should correspond to what you learned above).**

In [None]:
# Code for 1.3

**Question 1.4. Use the `get_bothvotes` function to show that if party B picks a platform of 6, A could win by picking a slightly higher platform than 4 (note: parties need not pick integer platforms!). Why does this slightly higher platform lead to a much different result?**

In [None]:
# Code for 1.4

*Words for 1.4*

**Question 1.5. Show that if party B picks a platform of 6, party A could also win by picking something slightly below 6.**

In [None]:
# Code for 1.5

**Question 1.6. Write code to find the median position of the class. Then use the `get_bothvotes` function to show that if candidate A places themselves at the median, then B will lose if placing themselves slightly higher or lower, but will tie if also going to the median.**

In [None]:
# Code for 1.6.

Note that our function can also show how the same platforms would do with a different electorate. Let's define `opp_class` as an electorate with the exact opposite preferences from those you reported. Since the ideology is on a 1-7 scale, 8-ideology will also be on a 1 to 7 scale (think about why!).

In [None]:
opp_class = 8- classdata.column("LibCon")
opp_class
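As a quick check of what this transformation does, here is a sketch on a few made-up ideology values (the array `toy_ideology` is hypothetical, not the class data). Each value is reflected around the middle of the scale.

```python
import numpy as np

# Hypothetical ideology values on the 1-7 scale, for illustration only
toy_ideology = np.array([1, 2, 4, 6, 7])

# Subtracting from 8 swaps the two ends of the scale and leaves 4 in place
toy_opposite = 8 - toy_ideology

print(toy_opposite)
print(toy_opposite.min(), toy_opposite.max())
```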

**Question 1.7. What does the Downs-Hotelling model say about the NE party positions with this electorate? Confirm this by showing that if A picks this position and B picks something slightly lower or higher, A will get more votes.**

In [None]:
# Code for 1.7

*Words for 1.7*

**Question 1.8. Find pairs of platforms where (1) A would win with the class electorate but B would win with the opposite electorate, and (2) A would win with either electorate.**

In [None]:
# Code for 1.8

## Part 2: Multiple Electorates, Multiple Elections

To connect this theory to some data we will explore, let's imagine there are a bunch of districts, which have voters with different ideological distributions, which we will generate randomly. 

In each district, there is a "district election" (which will correspond to a member of the legislature) and a "national election". 

We will assume the "district" election happens as in the theory above, and so the winning candidate will have a platform equal to the position of the median voter. As in the theory, we assume that this platform is "credible" in the sense that the winning candidate will implement this platform if in office.

We will assume that (for reasons not explicitly modeled), in the national election there are two candidates L and R who place themselves at positions 2 and 6, and voters pick the candidate closest to them. From this we will compute the R vote share in the national election.

Finally, we will look at the relationship between the result of the national election and the district election.

Let's define some variables which will correspond to the number of districts (set to 400, to approximate the data we will use later) and the district size. We set the district size to 100 voters; this is much smaller than the real House districts we will look at, but making it bigger would just make the code run more slowly.

In [None]:
n_dist = 400
dist_size = 100

Within each district we want to create some random voter ideologies. If we just make all of them come from the same distribution, then the median in each district will be pretty similar (you will learn more about this when you study sampling in Data 8). 
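As a rough illustration of this point, the sketch below draws every district from the same uniform distribution on 1 to 7 and compares the spread of the district medians to the spread of individual voters. The variable names and the seed are choices made for this example only, and it uses its own random generator so that it does not affect the random draws elsewhere in the lab.

```python
import numpy as np

rng = np.random.default_rng(88)  # separate generator with an arbitrary seed

# 400 hypothetical districts of 100 voters, all drawn from the same distribution
same_dist_voters = 1 + 6 * rng.random((400, 100))
same_dist_medians = np.median(same_dist_voters, axis=1)

# Individual voters vary a lot, but the district medians vary much less
print("SD of individual ideologies:", np.std(same_dist_voters))
print("SD of district medians:", np.std(same_dist_medians))
```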

In reality, more liberal and conservative voters tend to cluster in different districts, as they did in our class! To reflect this, we will simulate our electorate in two steps. First, we will create random district means which range from 2 to 6, and then we will simulate the individual voter ideologies.

To do this, we can use the `np.random.rand(n)` function, which creates n random numbers between 0 and 1. These follow a "uniform" distribution, which loosely means that all values are equally likely. For example, the following creates `n_dist` random numbers between 0 and 1 and then puts them in a table so we can make a histogram with the `.hist` function.

In [None]:
ideology01 = np.random.rand(n_dist)
dist_data = Table().with_column("Ideology01", ideology01)
dist_data.hist("Ideology01")

If we want the district ideology to have a different range, we can "transform" this variable. For example, if we wanted it to range from 1 to 7 we can create the following variable, and confirm the range (and also check the average):

In [None]:
ideology17 = 1 + 6*ideology01
print(np.min(ideology17), np.mean(ideology17) ,np.max(ideology17))
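The same idea works for any range: multiplying by the width of the interval stretches the numbers, and adding the lower end shifts them. Here is a minimal sketch, where the helper name `rescale_unit` is made up for this illustration:

```python
import numpy as np

def rescale_unit(u, low, high):
    # Map numbers in [0, 1] onto [low, high]: 0 goes to low, 1 goes to high
    return low + (high - low) * u

# The endpoints and midpoint of [0, 1] land where we expect on a 1 to 7 scale
print(rescale_unit(np.array([0, 0.5, 1]), 1, 7))

# Applied to uniform draws, every value stays inside the new range
draws = rescale_unit(np.random.rand(1000), 1, 7)
print(np.min(draws), np.max(draws))
```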

For our simulation we want the mean ideology to range from 2 to 6. 


**Question 2.1. Write code to (1) create an array called `dist_means` which is uniformly distributed from 2 to 6, (2) add it to the `dist_data` table, and (3) make a histogram to confirm that it has the range that we want.**

In [None]:
# Code for 2.1

Now let's create individual voter ideologies. To do this for the first district, let's first figure out the mean ideology there (recall that the first district is in position 0):

In [None]:
dist_means[0]

Let's assume that within each district, the individual voters' ideologies are between the district mean - 2 and the district mean +2. We then check what the median voter ideology is within our test district.

In [None]:
test_dist_voters = dist_means[0] -2 + 4*np.random.rand(dist_size)
np.median(test_dist_voters)

We can see how this district would vote in the national election with candidate platforms at 2 and 6 using the `get_bothvotes` function we defined above.

In [None]:
get_bothvotes(2,6,test_dist_voters)

**Question 2.2. Now do the same thing to simulate `dist_size` voters in the second district, and compute (1) the district median ideology and (2) the number of votes for each candidate on the national level. Compare both to the first district.**

In [None]:
# Code for 2.2

Rather than going through these one by one, let's write a loop which (1) simulates the voter ideologies in each district, and then (2) computes the district median and national vote share.

In [None]:
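# Create blank lists to hold the district and national results
dist_elec_dh = []
nat_elec = []
# Loop through the districts, adding the district and national results to our lists
for i in range(n_dist):
    # Voter ideologies are uniform within 3 points of the district mean
    dist_voters = dist_means[i] + 6*np.random.rand(dist_size) - 3
    # District election: the winner's platform is the median voter's ideology
    dist_elec_dh = np.append(dist_elec_dh, np.median(dist_voters))
    # National election: count the voters strictly closer to L (2) or to R (6)
    lvotes = sum(abs(dist_voters - 2) < abs(dist_voters - 6))
    rvotes = sum(abs(dist_voters - 2) > abs(dist_voters - 6))
    # R share among those who voted
    nat_elec = np.append(nat_elec, rvotes/(lvotes + rvotes))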
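To get a feel for what the loop stores for each district, here is a self-contained sketch that runs the same two steps for three made-up district means. The helper `simulate_district`, the example means, and the seed are assumptions for this illustration, not part of the lab's code.

```python
import numpy as np

def simulate_district(dist_mean, size, rng):
    # Voters are uniform within 3 points of the district mean, as in the loop
    voters = dist_mean + 6 * rng.random(size) - 3
    lvotes = np.sum(abs(voters - 2) < abs(voters - 6))
    rvotes = np.sum(abs(voters - 2) > abs(voters - 6))
    # Return the district median and the R share of the national vote
    return np.median(voters), rvotes / (lvotes + rvotes)

rng = np.random.default_rng(0)   # arbitrary seed so the example is repeatable
example_means = [2.0, 4.0, 6.0]  # hypothetical low, middle, and high district means
results = [simulate_district(m, 100, rng) for m in example_means]

# Print the district mean, the district median, and the R vote share
for m, (median, r_share) in zip(example_means, results):
    print(m, round(median, 2), round(r_share, 2))
```

Districts with a higher mean should show both a higher median and a larger R vote share, which is the relationship the full loop records across all `n_dist` districts.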
    

**Question 2.3. Add the median district ideology (call this "District DH" to emphasize that this is the prediction from the Downs-Hotelling model) and the national vote share (call this "National") to the `dist_data` table, and then make a scatter plot with the national vote share on the x axis and the district median on the y axis. Interpret this graph.**

In [None]:
#Code for 2.3

*Words for 2.3*

## Part 3: Party Loyalty Theory

One thing that we might think is missing in the Downs-Hotelling theory is the role of parties. In most political systems, candidates don't just run on an ideology; they usually run with a party label as well. (Sometimes voters *only* have the option to pick one of the parties.) Further, once in office, legislators are typically pressured to vote with the party, though there is heterogeneity here across countries as well.

Let's consider an extreme version of a model with party loyalty/discipline. As above, suppose the "national" platform of the L party is at 2 and that of the R party is at 6. Voters in each district get a choice between a candidate from the L party, who they expect will vote with the national L party, and a candidate from the R party, who will vote with her national party. As a result, voters will vote for the L party if their ideology is closer to 2 and for the R party if their ideology is closer to 6.

We already have a variable which indicates what proportion of each district prefers 6 to 2: that is just the `National` variable. However, if we want to predict legislator behavior, we want a variable that is equal to 2 if more than half of the voters prefer 2 to 6, and equal to 6 if more than half of the voters prefer 6 to 2.

**Question 3.1. Create a variable called `District PL` which indicates how the legislator will vote in each district given this theory, and add it to the `dist_data` table. (Hint: you can assume that if there is a tie, the R party wins (or the L party wins).)**

In [23]:
#Code for 3.1

**Question 3.2. Make a scatterplot with the National vote share on the x axis and the winning legislator position on the Y axis. Compare this to the Downs-Hotelling prediction.**

In [None]:
#Code for 3.2

*Words for 3.2*

## Part 4: The Data

Now let's look at some real data, which comes from <a href="https://www.jstor.org/stable/2669364?seq=1#metadata_info_tab_contents">this paper</a>. The data include members of the House of Representatives who were elected in 2000. To measure their "platform" or "position", the authors use how they voted on bills after being elected. This is measured in the `Member Position` variable. It ranges from 0, meaning most liberal, to 1, meaning most conservative.

To get at the ideology of the district, we can use how citizens within that district voted in the 2000 presidential election. In particular, if we use `Bush` as our x axis variable, that will give a sense of how conservative the median voter of the district is.

There are also some other variables that may be of interest: The party of the member, how often they voted with their party, their gender, etc.

In [None]:
realdata = Table.read_table("housedata.csv")
realdata

**Question 4.1. To see the empirical analog of our theoretical predictions, make a scatter plot with `Bush` on the x axis and `Member Position` on the y axis.**

In [None]:
# Code for 4.1

You should see two "clumps" of data, which given the discussion above might correspond to the two parties. 

**Question 4.2. Make a version of the same graph, but use the `group` option to label the points by party.**

In [24]:
# Code for 4.2

**Question 4.3. Compare this to the theoretical predictions of the Downs-Hotelling model and the Party Loyalty model. Which looks "closer" to reality, and why?**

*Words for 4.3*

Another thing we might want to check is whether *within the same party* members from more conservative districts vote in a more conservative fashion. 

**Question 4.4. To check this, make two scatterplots, one for each party (hint: use the `where` function), which also include a `fit_line=True` argument to draw a best fit line.**

In [None]:
# Code for 4.4

**Question 4.5. Interpret these graphs in light of the Downs-Hotelling and Party Loyalty theories.**

*Words for 4.5*

## Part 5: A Hybrid Theory

Perhaps a better theory to explain the voting behavior of members of the House combines the two models. A simple way we can do that is to predict a *weighted average* of the two predictions. Loosely, we can interpret this as predicting that members of Congress sometimes get to vote according to their own/their district ideology, but other times have to vote with the party.

With just two numbers $x$ and $y$, the $w$-weighted average is given by:
$$
w \cdot x + (1-w) \cdot y
$$

(We can also define weighted averages for more than two numbers, but won't need to do so for our purposes in this lab.)

Here is a simple function to compute the weighted average of two columns/arrays with weight $w$. 

In [None]:
def w_avg(x,y,w):
    return w*x + (1-w)*y

We can use this to compute the regular average of two numbers if we put the weight at $w=.5$:

In [None]:
w_avg(2,6, .5)

If we increase the $w$ parameter, we put more weight on the first number:

In [None]:
w_avg(2,6, .8)

We can also apply this to arrays:

In [None]:
w_avg(np.array([5,3,2]), np.array([8,5,1]), .8)

Note that this is different from taking the average of the two arrays: rather we take the $w$-weighted average of the first number from each array, then the $w$-weighted average of the second number from each array, etc.
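A quick way to check your understanding of the weight is to look at the extreme cases: with $w=1$ we get back the first input, and with $w=0$ the second. The sketch below redefines `w_avg` exactly as above so that it runs on its own, and reuses the same illustrative arrays; the names `first` and `second` are chosen just for this example.

```python
import numpy as np

def w_avg(x, y, w):
    return w*x + (1-w)*y

first = np.array([5, 3, 2])
second = np.array([8, 5, 1])

print(w_avg(first, second, 1))   # all weight on the first array
print(w_avg(first, second, 0))   # all weight on the second array
print(w_avg(first, second, .8))  # element by element, closer to the first array
```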

**Question 5.1. Create a variable which is equal to an equal weighted average ($w=.5$) of the prediction of the DH theory and the PL theory, and add it to the `dist_data` table with the name "District Hybrid".**

In [25]:
#Code for 5.1

**Question 5.2. Make a scatter plot with "National" on the x axis and the hybrid prediction on the y axis.**

In [26]:
# Code for 5.2

**Question 5.3. Repeat this process for $w=.2$ and $w=.8$. Which of these looks closest to the real data? What might this tell us about the relative importance of the two theories?**

In [27]:
# Code for 5.3

*Words for 5.3*

**Question 5.4. What else might we want to add to our theory to make it more realistic or fit the data better? (No need to write any code here, but if you want to that's great!)**

*Words for 5.4*