# PS 88 - Lab 2 - Theories of Accountability and Segregation

## Part 1: What outcomes affect votes?

In class I showed a graph that plotted GDP growth during a president's term against how well the incumbent party did in the next election. This is often viewed as important evidence that voters reward or punish politicians based on how the economy performs under their control, which could put more competent leaders in office and give politicians an incentive to work hard to give voters good outcomes.

Let's first replicate the code to make that graph here.

In [None]:
# Importing libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Loading data
pv = pd.read_stata("data/presvote.dta")
# Subsetting to elections from 1940 onward
# There isn't much good GDP data before then, and the Great Depression/rebound is a weird time
pv = pv[pv['year'] >= 1940]
# Making the plot and labeling axes
sns.scatterplot(x='gdpchange', y='incvote', data=pv)
plt.xlabel('GDP Growth')
plt.ylabel('Incumbent Vote Share')

Recall that GDP growth could be a misleading indicator of whether individual financial situations are improving. Some have argued that something called "real disposable income" (RDI) is a better measure of this (see <a href="https://www.bea.gov/resources/learning-center/what-to-know-income-saving">here</a> for a comparison of some different related variables). Fortunately, our data table has this information too. In particular, the `RDIg_term` column contains the real disposable income growth over the four years preceding the election.

Theoretically, it makes sense to focus on these four years since we'd like to know how things went under the control of the incumbent (or the incumbent party).

**Question 1.1: Modify the code below to change the x axis to real disposable income growth over the four years preceding the election.**

In [None]:
# Code for 1.1: Change something here to plot real disposable income growth
sns.scatterplot(x=..., y='incvote', data=pv)
# Change something here to label the axis properly
plt.xlabel("RDI Growth (4 year)")
plt.ylabel('Incumbent Vote Share')

Another common argument is that people don't necessarily think carefully about how their economic situation changed over the entire time the incumbent was in office, but instead focus mostly on the recent past. One way we can test this is by looking at RDI growth over the year leading up to the election. This is captured by the variable `RDIyrgrowth`.

**Question 1.2. Write code to make a scatterplot with RDI growth over the year leading up to the election on the x axis and the incumbent vote share on the y axis (feel free to copy from your answer to 1.1 and then modify it).**

In [None]:
# Code for 1.2 here

**Question 1.3. Compare the results of the three graphs we have made so far. What might they say about the applicability of our model of political accountability? (Note: there are lots of potential answers here!)**

*Answer for 1.3*

To preview something we will learn later in class, we can also produce a similar graph but add a *line of best fit*, which describes the average trend in the data.

(We do this with a function called `regplot` in the Seaborn library, which we imported as `sns`)

In [None]:
# Creating a scatterplot with a line of best fit. 
# The ci=None option removes confidence intervals
sns.regplot(x='RDIyrgrowth', y='incvote', data=pv, ci=None)
plt.xlabel('RDI Growth')
plt.ylabel('Incumbent Vote Share')

One way to think about this line is saying "given a level of growth, what is our best prediction about the incumbent vote share?" 

There are lots of cool things we can do with this (again, more to come!) but one that is interesting in light of our accountability model is that we can think of elections that are far from this line as ones where the outcome is different than we would predict based on how the economy was doing.
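To make the "best prediction" and "far from the line" ideas concrete, here is a minimal sketch using made-up numbers (not the `presvote` data). It fits a least-squares line with NumPy's `polyfit`, which should be the same kind of straight-line fit that `regplot` draws, and then measures how far each point sits from that line. The variable names and values are assumptions for illustration only.

```python
import numpy as np

# Made-up illustrative numbers, NOT taken from the presvote data
growth = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # hypothetical growth values
vote = np.array([44.0, 50.0, 50.0, 52.0, 54.0])  # hypothetical incumbent vote shares

# Fit a straight line by least squares; polyfit returns (slope, intercept) for deg=1
slope, intercept = np.polyfit(growth, vote, deg=1)

# The line's prediction for each observed level of growth
predicted = intercept + slope * growth

# Positive residual: did better than the line predicts; negative: did worse
residuals = vote - predicted

# Position of the point that is farthest from the line
farthest = int(np.argmax(np.abs(residuals)))
print(slope, intercept, residuals, farthest)
```

In this toy example, the point with the largest residual (in absolute value) plays the role of an election whose outcome differs most from what the economy alone would predict.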

To see what years had the incumbent do better or worse than expected, we can add some labels to the points.

In [None]:
# removing NA to avoid annoying errors
pvtoplot = pv[['RDIyrgrowth', 'incvote', 'year']].dropna()
pvtoplot['year'] = pvtoplot['year'].astype(int)

# The ci=None option removes confidence intervals
sns.regplot(x='RDIyrgrowth', y='incvote', data=pvtoplot, ci=None)
plt.xlabel('RDI election year growth')
plt.ylabel('Incumbent Vote Share')

# Looping through to label points with the year
for x, y, z in zip(pvtoplot['RDIyrgrowth'], pvtoplot['incvote'], pvtoplot['year']):
 # Offset each label slightly from its point by adjusting the x and y coordinates
 plt.text(x = x+.025, # x-coordinate of the label, just to the right of the point
 y = y-.01, # y-coordinate of the label, just below the point
 s = z) # label text: the election year (converted to an integer above)

**Question 1.4. Note that 2020 is a year where the incumbent did much worse than the best fit line predicts. Why might that be? (There are multiple good ways to answer this!)**

*Answer for 1.4*

**Question 1.5. What is a factor outside of economic performance which voters might use to evaluate politicians? Come up with some data one might collect in order to test whether our accountability model works in this domain as well. (Hint: the easiest way to do this is to think of a non-economic variable you might put on the x axis for a graph like the ones we used in the lab.)**

*Answer for 1.5*

## Part 2: Automated Segregation

In the second part of this lab we will show how to leverage the power of simulation to quickly and easily run algorithms like the Schelling segregation model. 

The first line of code here runs a script which contains the functions to run and display the segregation algorithm.

In [None]:
%run schelling_py.ipynb

The main function we have written for you is called `display_schelling`. As we will see, this function can run variants of the algorithm from the lecture. It requires at least one *argument*, which is the initial arrangement of houses. We will call an arrangement of houses a city, and input it as a *string*, which is what Python calls a piece of text (a sequence of characters such as letters). In particular, the string for the class example is "ABABABAB_".

The following line of code replicates the example from class. 

In [None]:
display_schelling("ABABABAB_")

The output here is a "data frame" where each line corresponds to a turn. The first column says whose turn it is (set to "0" for the initial setup), the second column says what they do, and the third column shows the resulting city arrangement.

The `display_schelling` function also has several *optional* arguments, which have a default value that you can override when you call the function. You will learn more about this soon in Data 8.

Here is one we will often make use of throughout the lab: if we want to make things a bit more concise, we can add a `shorten=True` argument, which only displays the start and then the turns where someone moves.

In [None]:
display_schelling("ABABABAB_", shorten=True)
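The idea of an optional argument with a default value can be sketched with a toy function. The function `describe_city` below is hypothetical and unrelated to the lab's own code; it only illustrates how leaving an argument out falls back to its default.

```python
def describe_city(city, shorten=False):
    """Hypothetical toy function: returns the city string, optionally abbreviated."""
    if shorten:
        # Keep only the first three houses and mark the rest as omitted
        return city[:3] + "..."
    return city

# Leaving the optional argument out uses the default (shorten=False)
full_default = describe_city("ABABABAB_")

# Passing the default value explicitly gives the same result
full_explicit = describe_city("ABABABAB_", shorten=False)

# Overriding the default changes the behavior
short = describe_city("ABABABAB_", shorten=True)

print(full_default, full_explicit, short)
```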


One of the interesting features of this model is that seemingly small changes can have a big impact on the final result. Here is what happens if, in the initial arrangement, there are alternating pairs of As and Bs rather than alternating individuals.

In [None]:
display_schelling("BBAABBAA_", shorten=True)

Hmm, that looks a bit weird. Let's do the same thing but without the shorten option. One way to do this is to just delete the `shorten=True` argument, but another way is to explicitly set `shorten=False`. These do the same thing because `shorten=False` is the default setting (again, you will learn more about this in Data 8 when discussing functions), so if we don't specify whether to shorten, the function will not do so.

In [None]:
display_schelling("BBAABBAA_", shorten=False)

Ah, so what happened here is that, given this initial arrangement, no one wanted to move!

**Question 2.1. Given the way we defined this algorithm, why does no one want to move?**

*Answer for 2.1*

There are some additional arguments which we can change in order to capture different moving rules by the households. They are:

- `b_in` is how much the households value being close to in-group members.
- `b_out` is how much the households value (or, if negative, dislike) being close to out-group members
- `b_home` is how much the households value staying in their current home

The way the algorithm works is by computing a "utility" for each available house, where higher utility numbers mean liking the spot more (we will discuss the concept of utility more next week). For every in-group neighbor at this potential house, we add `b_in` to this utility. For every out-group neighbor we add `b_out` (which will typically be zero or a negative number). If the available home is the current one, we add `b_home`. The household who is taking the current turn then goes to the available house that gives the highest utility (and goes to the leftmost one that gives the highest possible utility in the case of a tie).

The defaults for these arguments, which replicate the rules we used in the lecture, set `b_in=1`, `b_out=0`, and `b_home=.01`. The `b_in=1` means we add 1 to the utility for each in-group member that would be a neighbor. Any positive number could do here, since all we want to capture is that having more in-group neighbors is better. The `b_out=0` captures the idea that households don't care about having out-group neighbors either way: this does not affect the utility. Another way to think of this is that people are indifferent between having an empty house or an out-group member as a neighbor (but would rather have an in-group neighbor!).

You can think of the `b_home=.01` as a "tie-breaking" rule: households won't move unless they can have more in-group neighbors. (The only important thing to replicate the algorithm from lecture is that `b_home` is smaller than `b_in`; if not, then households would not move even if it led to one more in-group neighbor.)

If there are multiple spots that are equally good (give equal utility) the house moves to the left-most one. And after anyone moves, we "reset" and let the left-most house see if they want to move first. 
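The utility rule described above can be sketched in a few lines. This is not the code in `schelling_py.ipynb`, which may be implemented differently; the function names `utility_sketch` and `best_spot_sketch` are hypothetical. The sketch assumes that a house's neighbors are the houses immediately to its left and right, and that a household evaluating an empty spot imagines its current house as vacated.

```python
def utility_sketch(city, mover, spot, b_in=1, b_out=0, b_home=0.01):
    """Utility for the household at index `mover` of living at index `spot`."""
    group = city[mover]
    # Picture the city as it would look after the move (assumption: old house is vacated)
    after = list(city)
    after[mover] = "_"
    after[spot] = group
    utility = 0.0
    # Assumption: neighbors are the adjacent houses on the left and right
    for j in (spot - 1, spot + 1):
        if 0 <= j < len(after) and after[j] != "_":
            utility += b_in if after[j] == group else b_out
    # Bonus for staying in the current home
    if spot == mover:
        utility += b_home
    return utility


def best_spot_sketch(city, mover, **prefs):
    """Highest-utility available spot for `mover`; ties go to the leftmost spot."""
    candidates = [mover] + [i for i, house in enumerate(city) if house == "_"]
    # Maximize utility first; among equal utilities, prefer the smallest index
    return max(candidates, key=lambda s: (utility_sketch(city, mover, s, **prefs), -s))


city = "ABABABAB_"
print(best_spot_sketch(city, 0))  # the leftmost A stays at index 0
print(best_spot_sketch(city, 1))  # the leftmost B prefers the empty house at index 8
```

With the default preferences, the leftmost A gains nothing by moving, so the small `b_home` bonus keeps it in place, while the leftmost B would gain an in-group neighbor by moving to the empty house.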

The wonder of doing this using a computer rather than by hand is that we can quickly see how things would shake out differently with some minor changes. For example, by setting `b_in=0` and `b_out=-1` we can see what would happen if the households don't intrinsically like being close to in-group members but want to avoid out-group members. (Think about why!)

Let's do this for the example from class where they start out alternating by house.

In [None]:
display_schelling("ABABABAB_", b_in=0, b_out=-1, b_home=.01, 
                  shorten=True)

**Question 2.2. Compare the final outcome here to the case where `b_in=1` and `b_out=0`. What does this mean in words?**

*Answer for 2.2*

If we want to capture the notion that households like living near the ingroup **and** dislike living near the outgroup, we can set `b_in=1` and `b_out=-1`. Let's see what happens in the "alternating pairs" starting point using this moving rule:

In [None]:
display_schelling("BBAABBAA_", b_in=1, b_out=-1, b_home=.01, shorten=True)

**Question 2.3. Why does this lead to a move when we didn't see any moves for the default parameters?**

*Answer for 2.3*

Next, let's see what happens if we give the households another empty spot to move to.

**Question 2.4. Write code to run the algorithm with the same moving rule as the last example (`b_in=1, b_out=-1, b_home=.01`) but with an additional empty house added to the end of the initial arrangement. Use the `shorten=True` argument to keep things concise.**

In [None]:
# Code for question 2.4 here

**Question 2.5. Adding this blank house led to a very different final arrangement. Give an explanation for why this happened. What might this say about the drivers of segregation in real world cities?**

*Answer for 2.5.*

So far we have just been eyeballing the different arrangements, and saying how segregated we think they are. It will also help to have a more systematic definition of this.

There are several ways to measure segregation, but here is one that will be good for our purposes. For each household that has neighbors, let $n_s$ be the number of neighbors who are in the same group, and $n_d$ be the number in a different group. (Empty houses do not count.) Let the "individual segregation" for a household be $(n_s - n_d)/(n_s + n_d)$. Note this will be equal to $-1$ if all neighbors are out-group, $1$ if all neighbors are in-group, and $0$ if there are an equal number of in-group and out-group neighbors (here, the only possibility is one of each).

Finally, we take the average of the individual segregation measures to get a measure for the whole city.
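To make the definition concrete, here is a minimal sketch of the measure as described above. The name `seg_meas_sketch` is hypothetical and this is not the lab's own implementation, which may handle edge cases differently. It assumes that neighbors are the adjacent houses on the left and right, and that households with no neighbors are left out of the average.

```python
def seg_meas_sketch(city):
    """Average of (n_s - n_d) / (n_s + n_d) over households that have neighbors."""
    scores = []
    for i, group in enumerate(city):
        if group == "_":
            continue  # empty houses have no household to score
        n_s = 0  # same-group neighbors
        n_d = 0  # different-group neighbors
        # Assumption: neighbors are the adjacent houses on the left and right
        for j in (i - 1, i + 1):
            if 0 <= j < len(city) and city[j] != "_":
                if city[j] == group:
                    n_s += 1
                else:
                    n_d += 1
        if n_s + n_d > 0:  # households without neighbors are skipped
            scores.append((n_s - n_d) / (n_s + n_d))
    if not scores:
        return float("nan")  # no household has a neighbor
    return sum(scores) / len(scores)


# Individual scores are 1, 0, 0, 1 for the four households, so the average is 0.5
print(seg_meas_sketch("AABB_"))  # 0.5
```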

We wrote a function to implement this for you, called `seg_meas`, where the input is a city string. Let's see what it looks like for the initial arrangement of our previous example.

In [None]:
#  Getting the segregation measure for our initial city
seg_meas("ABABABAB_")

This is the lowest possible measure of segregation, because everyone in this city only has out-group neighbors. When we ran our algorithm on this initial arrangement with either in-group favoritism or out-group animus, the final arrangement was "AAAA_BBBB".

**Question 2.6. Write code to get the segregation measure for this final arrangement, and explain the output in the following markdown cell.**

In [None]:
# Code for question 2.6 here

*Answer for 2.6*

**Question 2.7. Now let's consider the simulation with alternating pairs, where the start was "BBAABBAA_" and the end was "BB_ABBAAA". How did the segregation level change as we ran the algorithm here? Compare this change to the change you found in 2.6.**

In [None]:
# Code for 2.7

*Answer for 2.7.*

The last thing we will explore is what happens if there are more than two groups. We can do this by adding some additional letters into the mix. For consistency, let's call the new groups C and D. Here is an initial arrangement with four groups:

In [None]:
init_four="ABCADBC_CD_ABBD"
init_four

Here is what happens with our default preferences (liking being close to the in-group, no antipathy towards any out-group). Note that now that we have defined `init_four`, we can just use this as our `init=` argument.

In [None]:
display_schelling(init=init_four, shorten=True)

**Question 2.8. Try a few variants of the simulation with four groups (e.g., change the initial arrangement, or change the `b_in` or `b_out` parameters). Does this lead to more or less segregation? Remember you can use the `seg_meas` function to measure the segregation level.**

In [None]:
# Code for question 2.8

*Answer for 2.8*

**Question 2.9. Recall that one of our key principles for good theory is that we want to simplify the world in a way that allows us to capture key features of the question we are studying. What is a question you might want to ask related to segregation which is NOT well-suited to the algorithm here? How might we modify the model here in order to answer that question?**

*Answer for 2.9* 