# PS 88 Lab 11 - Nonlinear and Interactive Relationships

We will explore both of the themes this week using a <a href="https://www.pnas.org/doi/10.1073/pnas.2116851119#executive-summary-abstract">fascinating recent paper</a> about support for political violence in the United States. 


## Part 1: Theory and initial findings

We can motivate this paper with a quick callback to the game theory section of the class. Suppose two political actors can choose whether or not to use violence, with the following payoffs:

|          | B Violent | B Nonviolent     | 
|----------|----------|--------------|
| A Violent |    0,0    |   1,-1        |  
| A Nonviolent |    -1,1   |   2,2      |  


The best outcome is if both are nonviolent, giving payoff 2. But that may not be the only Nash equilibrium!
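One way to check for equilibria is to go through every cell of the table and ask whether either player could gain by switching on their own. Here is a minimal sketch of that check; the names `payoffs` and `is_nash` are just illustrative helpers, and the numbers are copied from the table above.

```python
# Payoffs from the table above: (A's action, B's action) -> (A's payoff, B's payoff)
actions = ["Violent", "Nonviolent"]
payoffs = {
    ("Violent", "Violent"): (0, 0),
    ("Violent", "Nonviolent"): (1, -1),
    ("Nonviolent", "Violent"): (-1, 1),
    ("Nonviolent", "Nonviolent"): (2, 2),
}

def is_nash(a, b):
    """True if neither player can gain by unilaterally switching actions."""
    a_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
    b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
    return a_ok and b_ok

# Collect every pair of actions that passes the check.
equilibria = [(a, b) for a in actions for b in actions if is_nash(a, b)]
print(equilibria)
```

A check like this only confirms which cells are equilibria; it does not replace explaining the best responses in words.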

**Question 1.1. Show that if one player expects the other to choose violence it is a best response to choose violence as well, and so there is a Nash Equilibrium where both pick Violent.**

*Words for 1.1*

From this we might expect that individuals who think their political opponents are apt to use violence may be more apt to use violence themselves. We can check if this is true in the United States using data from Mernyk et al.

In [None]:
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
spv = pd.read_csv("SPV_cleaned.csv")
spv

The authors did a survey which also involved an experiment that we will look at in part 2. First we will do some basic analysis of support for political violence in light of the theory above. 

In the survey, respondents were asked which party they supported, and we will focus on people who support the Democratic or Republican party (i.e., no independents). Democrats were then asked questions like "How much do you feel it is justified for Democrats to use violence in advancing their political goals these days?" Republicans were asked the same question but with "Republicans" replacing "Democrats". 

Respondents were asked four questions that were meant to tap into their *Support For Political Violence* (SPV), from which they make an index which ranges from 0 to 100, where higher numbers mean more support for violence. 

The authors then asked respondents how they thought *others* would answer questions like this. We will focus on the "out-group metaperception", which is their best guess of the average SPV of people in the other party.

To summarize, the first variables we will study are:
- `SPV_self`: the individual's own SPV measured on a scale from 0-100
- `SPV_meta_out`: the guess about the average SPV among those in the other party

Let's take a look at the histogram of actual SPV:

In [None]:
spv.hist('SPV_self')

SPV might differ between the two parties. Recall we can use `smf.ols` to run a bivariate regression which is essentially a difference of means. The party of the individual is given by the "party" variable, which is coded as 1 for Republicans and 2 for Democrats. To make this easier to interpret, let's create a variable called "dem" that equals 1 for Democrats and 0 for Republicans:


In [None]:
spv['dem'] = np.where(spv['party']==2,1,0)

Now we can run the following regression:

In [None]:
smf.ols("SPV_self ~ dem", data=spv).fit().summary()

This tells us that the average Democrat in this survey reports about 1.4 points more SPV on the 100 point scale. However, this is not much bigger than a standard error, so this may just be driven by random noise. Moving forward we will assume there aren't major differences between the parties on this variable.
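To see why a regression on a 0/1 variable is essentially a difference of means, here is a small sketch on made-up numbers (not the survey data). It uses `np.linalg.lstsq` rather than `smf.ols` so it runs with numpy alone; the least-squares coefficients are the same idea.

```python
import numpy as np

# Made-up scores for illustration only; these are not the survey data.
y = np.array([10.0, 20.0, 30.0, 15.0, 25.0, 35.0, 40.0])
dem = np.array([0, 0, 0, 1, 1, 1, 1])

# Least-squares fit of y on an intercept and the 0/1 variable.
X = np.column_stack([np.ones(len(y)), dem])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

# Compare with the group means computed directly.
diff_in_means = y[dem == 1].mean() - y[dem == 0].mean()
print(intercept, slope, diff_in_means)
```

The intercept matches the mean of the group coded 0, and the slope matches the gap between the two group means.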

**Question 1.2. Make a histogram of the perceived SPV of the other party, and then fit and summarize a regression to see if this is higher or lower among Democrats. Interpret these results: do respondents have a good sense of the other party's support for political violence (if you aren't sure you may want to also compare the averages of these variables), and is this different between the two parties?**

In [None]:
# Code for 1.2

*Words for 1.2*

Another way to see this is to make a scatter plot of the self and outparty variables, along with a line where these two values are equal (the "45-degree line"). Points above the line are individuals who support violence more themselves than they think the outparty does, and points below the line are individuals who support violence less than they think the outparty does. 

In [None]:
sns.scatterplot(x='SPV_meta_out', y='SPV_self', data=spv)
plt.plot([0,100],[0,100])

These differences will prove important later on, but before we get to that remember part of our initial motivation was to see if those who think their political opponents are apt to use violence are apt to use it themselves.

**Question 1.3. Fit and summarize a bivariate regression with "SPV_self" as the dependent variable and "SPV_meta_out" as the independent variable. Interpret the slope. Give a reason why this might not represent a causal effect of perceptions of outparty support for violence on personal support for violence.**

In [None]:
# Code for 1.3

*Words for 1.3*

## Part 2: Left-right ideology and SPV

Above we looked at differences between Republicans and Democrats, but we also might think that those who are relatively more extreme in their ideological views will be more likely to support violence. The "polit" variable is a 1 to 7 point scale ranging from "very liberal" (1) to "moderate" (4) to "very conservative" (7).

We can check if there is a relationship between this variable and SPV visually using the regplot function:

In [None]:
sns.regplot(x="polit", y="SPV_self", data=spv)

**Question 2.1. Use `smf.ols` to fit and summarize a regression with "SPV_self" as the dependent variable and "polit" as the independent variable. Interpret the slope (hint: how would going from the lowest level of 1 to the highest level of 7 change the predicted SPV?).**

In [None]:
# Code for 2.1

*Words for 2.1*

This regression can tell us if more liberal or more conservative individuals have a higher SPV, but by assuming a linear relationship we can't get at the idea that those on both ideological extremes have a higher SPV.
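To make the limitation concrete, here is a sketch on made-up, noise-free data where the outcome is lowest for moderates and rises toward both extremes. It uses `np.polyfit` instead of `smf.ols` to keep the example short; the numbers are invented and say nothing about the survey.

```python
import numpy as np

# Made-up, noise-free data for illustration only: a 1-7 scale where the outcome
# is lowest at the midpoint (4) and rises symmetrically toward both extremes.
x = np.tile(np.arange(1, 8), 10).astype(float)
y = 2 * (x - 4) ** 2 + 10

linear_slope = np.polyfit(x, y, 1)[0]   # slope from a straight-line fit
quad_coef = np.polyfit(x, y, 2)[0]      # coefficient on the squared term

print(round(linear_slope, 3), round(quad_coef, 3))
```

In this invented example the straight-line slope is essentially zero even though the extremes differ a lot from the middle, while the squared term picks up the U shape.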

One way we can see this visually is by using `sns.regplot` and adding an `order=2` argument, which adds a squared term to the regression. 

In [None]:
sns.regplot(x="polit", y="SPV_self", data=spv, order=2)

Hmm, still looks pretty flat. Let's confirm this by running a regression.

**Question 2.2. Fit and summarize a regression predicting "SPV_self" using a linear and squared term of `polit` (recall that to make a squared term of a variable "X" we can add `I(X**2)` to our regression formula).**

In [None]:
# Code for 2.2

Now let's do some similar analysis looking at the relationship between political ideology and the perception of outparty SPV.

**Question 2.3.  Using a combination of `sns.regplot` and `smf.ols`, examine the linear and quadratic relationship between political ideology and beliefs about outparty SPV in this survey. Interpret your results.** 

In [None]:
# Code for 2.3

*Words for 2.3*

## Part 3: Do corrections reduce SPV?

Now we are ready to get to the main point of the paper. So far we know that self SPV and perceptions of outparty SPV have a positive correlation, and that individuals overestimate the outparty's SPV. So, what will happen if we *correct* their beliefs about the outparty by telling subjects the truth before measuring their own SPV? We might expect that learning that the outparty rarely supports violence will make individuals less supportive of violence themselves.

In other words, if we can lower perceived SPV of the outparty in a randomized fashion, we can get a sense of the causal effect of these perceptions without selection bias.

The authors check this by randomly assigning some people to observe a "correction" which tells them the true outparty SPV. Given what we learned above, for most people this will make them realize their political opponents support violence less than they thought. The control group does not receive this information. 
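The logic of randomization can be illustrated with a small simulation. Everything here is hypothetical: we invent an unobserved `trait` that raises both perceptions and own SPV, set a "true" effect of 0.1, and suppose the correction lowers perceptions by 20 points. None of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_effect = 0.1   # hypothetical effect of perceived outparty SPV on own SPV

# Observational world: an unobserved trait raises both perceptions and own SPV.
trait = rng.normal(0, 1, n)
perceived = 50 + 10 * trait + rng.normal(0, 5, n)
own = 10 + true_effect * perceived + 5 * trait + rng.normal(0, 1, n)
naive_slope = np.polyfit(perceived, own, 1)[0]

# Experimental world: a coin flip lowers perceptions by 20 points for the treated.
treated = rng.integers(0, 2, n)
perceived_exp = perceived - 20 * treated
own_exp = 10 + true_effect * perceived_exp + 5 * trait + rng.normal(0, 1, n)
diff = own_exp[treated == 1].mean() - own_exp[treated == 0].mean()
implied_slope = diff / -20   # change in own SPV per point of perception

print(round(naive_slope, 2), round(implied_slope, 2))
```

In this simulated setup the naive slope is well above the true effect because of the hidden trait, while the comparison based on random assignment lands close to it.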

The variable which stores this treatment status is called "condition". To ease interpretation let's turn it into a 0/1 variable:

In [None]:
spv['correct'] = np.where(spv['condition']=="out_correct", 1,0)

**Question 3.1. Fit and summarize a bivariate regression with "SPV_self" as the dependent variable and "correct" as the independent variable. Interpret the coefficient on "correct."**

In [None]:
# Code for 3.1

*Words for 3.1*

In the paper the authors also include some control variables for respondent demographics. This isn't about trying to control for confounding variables, since we know that the treatment status was randomized, so those who got the correction should be otherwise similar on average. However, doing so often leads to less noise in our estimates (for reasons we won't cover), and can also provide some additional interesting information about who tends to support violence. 

Some of the variables we want to include as controls are stored as "factors" which are categorical variables that can take on two or more values. The `smf.ols` function will automatically create several 0/1 variables which will tell us the difference in the predicted mean between each category and a "base category" (holding other variables fixed). For example, here is a table of the education variable:


In [None]:
# Count how many respondents fall in each education category
spv['education'].value_counts()

If we use this as an independent variable in a regression we get:

In [None]:
smf.ols("SPV_self ~ education", data=spv).fit().summary()

Note we get coefficients corresponding to "Graduate", "HS or less", and "Some college". Where is "bachelor's degree"? This was set as the base category. So we can interpret the three coefficients as a comparison of the average SPV of each category to that of respondents with a bachelor's degree. In this case, those with HS or less have a higher SPV than those with a BA degree, while those with a graduate degree have a lower SPV than those with a BA. 
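Here is a sketch of what happens behind the scenes, using made-up data and building the 0/1 columns by hand with numpy. It assumes the base category is the first level in sorted order, which is the usual default for text-valued variables in formula-based regressions; the category labels and outcome values are invented.

```python
import numpy as np

# Made-up data for illustration only; these are not the survey responses.
education = np.array(["Bachelor's degree", "Graduate", "HS or less",
                      "Bachelor's degree", "Graduate", "HS or less"])
y = np.array([10.0, 6.0, 15.0, 12.0, 8.0, 19.0])

levels = sorted(set(education))          # sorted order (assumed rule for picking the base)
base, others = levels[0], levels[1:]

# Design matrix: an intercept plus one 0/1 column for each non-base category.
X = np.column_stack([np.ones(len(y))] +
                    [(education == lev).astype(float) for lev in others])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

# Each coefficient should equal that category's mean minus the base category's mean.
base_mean = y[education == base].mean()
for lev, b in zip(others, coefs[1:]):
    print(lev, round(b, 2), round(y[education == lev].mean() - base_mean, 2))
```

The intercept plays the role of the base category's mean, which is why that category does not get its own coefficient.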

**Question 3.2. Fit and summarize a multivariate regression predicting "SPV_self" using the "correct" variable, with control variables "gender", "age", "dem", and "education". Does this affect the coefficient on "correct" compared to the bivariate case? Interpret another coefficient from the regression.**

In [None]:
# Code for 3.2

*Words for 3.2*

We might think that these corrections may have a different effect on people who start with a low perceived outparty SPV vs those who think the outparty has a high SPV. To check this, the authors calculated a variable called "SPV_meta_over", which is the difference between the perceived outparty SPV and the truth. 


**Question 3.3. Make a histogram of this variable and interpret what you find.**

In [None]:
# Code for 3.3

*Words for 3.3*

**Question 3.4. Fit and summarize an interactive model predicting self SPV with the correction, "SPV_meta_over", and an interaction between the two (refer back to the class notebook for an example of how to do this).**

In [None]:
# Code for 3.4

Note that when "SPV_meta_over" is equal to zero, the subject had a correct perception of the outgroup before the correction. 
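As a reminder of how to read an interactive model, the effect of the treatment at a given value of the other variable is the treatment coefficient plus the interaction coefficient times that value. The sketch below uses made-up, noise-free data with hypothetical coefficients (2, 0.3, and -0.1), so the numbers say nothing about the survey; `treatment_effect` is an illustrative helper.

```python
import numpy as np

# Made-up, noise-free data for illustration only; the coefficients are hypothetical.
treat = np.tile([0.0, 1.0], 50)                       # 0/1 treatment indicator
moderator = np.repeat(np.arange(0.0, 100.0, 2.0), 2)  # each value appears with treat=0 and treat=1
y = 10 + 2 * treat + 0.3 * moderator - 0.1 * treat * moderator

# Least-squares fit with an intercept, both variables, and their product.
X = np.column_stack([np.ones(len(y)), treat, moderator, treat * moderator])
b0, b_treat, b_mod, b_inter = np.linalg.lstsq(X, y, rcond=None)[0]

def treatment_effect(m):
    """Predicted effect of the treatment when the moderator equals m."""
    return b_treat + b_inter * m

print(round(treatment_effect(0), 2), round(treatment_effect(50), 2))
```

When the moderator is zero the interaction term drops out, so the treatment coefficient alone gives the effect at that point.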

**Question 3.4. What does this regression say about the effect of the correction on someone who had accurate beliefs to start with?**

*Words for 3.4*

**Question 3.5. Now interpret the coefficient on the interaction term. As one is more pessimistic (and incorrect) about the outgroup, how does the effect of the correction change?**

*Words for 3.5*

**Question 3.6. What is the effect of the correction on someone who overestimated the outparty SPV by 50 points? (Note this is not too extreme; the average is around 40.)**

In [None]:
# Code for 3.6

**Question 3.7. Interpret these results in light of our initial theory. Does changing perceptions of outparty SPV affect individuals' own SPV? Who does the correction seem to affect, and why?**

*Words for 3.7*