# Laboratories of Democratic Backsliding - Jacob M. Grumbach (2022) - Replication

**General instructions:** these replications will be similar to labs, though with less guidance than we would typically give. An advantage you have is that the answer you are supposed to get for most questions is in the paper.

When we say to "replicate" a table we don't mean you need to reproduce every bit of formatting. Typically for regressions we just want you to verify that the coefficients match those reported in the table. For graphs, the output should look pretty similar, though again you don't need to make the formatting look exactly the same. 

A linguistic note: a tricky thing about this paper is that the name of one of the major US parties (the "Democratic party") also matches the key outcome variable (the level of state democracy). We distinguish between these by writing Democratic with a capital D when we are referring to the party, and democratic with a lower-case d when we are referring to the kind of government (or the "regime type", to use standard comparative politics terminology). Be careful about this distinction, and always be clear whether you are referring to the party or the outcome.

In [None]:
## Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt 
%matplotlib inline

## Part 1: Loading and exploring the data

In [None]:
## Open Data
df = pd.read_csv("data_grumbach.csv") 

A nice thing to do after loading any data frame is to use the `.info()` method to see what variables are inside. Adding a `verbose=True` argument forces it to display all variables.

In [None]:
df.info(verbose=True)

One of the key variables we will study is "partycontrol", which is equal to 0 when Republicans have unified control of the state government, 2 when Democrats have unified control of the state government, and 1 when there is divided government (e.g., a governor from one party while another party has a majority in the legislature).

We can explore patterns in this variable across states using the `pd.crosstab` function, which creates a table that counts how many observations fall into each combination of the two variables.

In [None]:
pd.crosstab(df['state'], df['partycontrol'])

For example, we can learn from this that over the time period in this data frame, Wyoming had 11 years of unified Republican control, and 8 years of divided government. 
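
To make the counting behavior of `pd.crosstab` concrete, here is a minimal sketch on a made-up data frame. The states "A" and "B" and their values are hypothetical and are not taken from the replication data.

```python
import pandas as pd

# Hypothetical toy data: two states observed for three years each
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "partycontrol": [0, 0, 1, 2, 2, 1],
})

# Rows are states, columns are partycontrol values,
# and each cell counts the matching observations
counts = pd.crosstab(toy["state"], toy["partycontrol"])
print(counts)
```

Each row of the table sums to the number of observations for that state, so a cell can be read as "number of years this state spent under this kind of party control".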

**Question 1.1. Make a crosstab of the "year" variable and party control. What is something we can learn from the table?**

In [None]:
# Code for 1.1

*Words for 1.1*

The key outcome we will study is the state-level democracy index, which is in the "democracy_mcmc" variable. Here is a histogram of that variable:

In [None]:
df.hist("democracy_mcmc")

We can also take the mean and standard deviation:

In [None]:
np.mean(df["democracy_mcmc"]), (np.var(df["democracy_mcmc"]))**(1/2)

This variable is loosely normally distributed with mean 0 and standard deviation 1 (this is by construction; see the paper for more details), so we can interpret a one unit increase in the state-level democracy score as about a one standard deviation increase in how democratic a state is. 
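
As a small illustration of what a "mean 0, standard deviation 1" scale means, the sketch below standardizes a handful of made-up numbers. This is only meant to show the interpretation of the scale; it is not the procedure the paper uses to build the index, and the `scores` values are hypothetical.

```python
import numpy as np

# Hypothetical raw scores (not from the replication data)
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(scores)
sd = np.std(scores)  # same as np.var(scores) ** (1/2)

# Standardize: subtract the mean, divide by the standard deviation
z = (scores - mean) / sd

print(mean, sd)
print(z.mean(), z.std())
```

After standardizing, a value of 1 means "one standard deviation above the average", which is the sense in which a one unit change in the democracy score can be read.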

**Question 1.2. Make separate histograms of the distribution of "democracy_mcmc" for state-years under unified Republican control and unified Democratic control. Interpret any differences you see (hint: note the x axis may be different).**

In [None]:
# Code for 1.2

*Words for 1.2*

**Question 1.3. Use `smf.ols` to fit and summarize a regression with "democracy_mcmc" as the dependent variable and "partycontrol" as an independent variable. Interpret the coefficient on "partycontrol". (Hint: think carefully about what a one unit change in each variable means!)**

In [None]:
# Code for 1.3

*Words for 1.3*

**Question 1.4. Why might this coefficient not correspond to a causal effect of party control on the level of democracy?**

*Words for 1.4*

## Part 2: Replicating Graphs

Now let's do our first replication of an analysis in the paper by creating versions of figures 3 and 4. We can do this with the `sns.lineplot` function. If we run this function with "year" on the x axis and "democracy_mcmc" on the y axis, it will plot the average of this variable by year. (Set `ci=0`, since the confidence intervals in the paper are doing something different from what the function does. In seaborn 0.12 and later, `ci` is deprecated in favor of `errorbar`, so this may raise a warning.)

In [None]:
sns.lineplot(data=df, x='year', y='democracy_mcmc', ci=0)
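
As a rough check on what the line represents, the per-year average that `sns.lineplot` draws by default can be computed by hand with `groupby`. The sketch below uses a small made-up data frame; the values are illustrative only and are not from the replication data.

```python
import pandas as pd

# Hypothetical toy data: two observations per year
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002, 2002],
    "democracy_mcmc": [-1.0, 1.0, 0.0, 1.0, 0.5, 1.5],
})

# Average of the outcome within each year; with the default
# estimator, sns.lineplot plots these values against year
yearly_mean = toy.groupby("year")["democracy_mcmc"].mean()
print(yearly_mean)
```

Running the same `groupby` on `df` gives the numbers behind the plotted line, which can be handy when comparing against the paper's figures.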

We can also use a `hue='state'` argument to separate out the trend for each state. Though this is kind of messy!

In [None]:
sns.lineplot(data=df, x='year', y='democracy_mcmc', ci=0, hue='state')

To make a version of figure 3, let's do something similar to the previous graph but combine all of the states other than NC, TX, and WA into an "Other" category. First we'll create a variable called "Stlab" which is just "Other" for all observations. Then we can overwrite it for North Carolina to be "NC".

In [None]:
df['Stlab'] = 'Other'
df.loc[df['state'] == "North Carolina", "Stlab"] = "NC"
df.value_counts('Stlab')

**Question 2.1. Use `sns.lineplot` to compare the trend of "democracy_mcmc" in North Carolina to all other states.**

In [None]:
# Code for 2.1

**Question 2.2. Change the "Stlab" variable to "TX" for observations corresponding to Texas and "WA" for observations corresponding to Washington. Then use `sns.lineplot` to compare the trends in "democracy_mcmc" of these three states to the average of all other states. Interpret this graph. (Hint: to see how party control changed for these states over the time window, you can make a separate lineplot with "partycontrol" as the y-axis variable.)**

In [None]:
# Code for 2.2

*Words for 2.2*

**Question 2.3. To make something like Figure 4, plot the trend in "democracy_mcmc", separated out by the "partycontrol" variable using a `hue=` argument. (Note this will look somewhat different because the paper uses a method to smooth out the trends. The general idea should be about the same, though.) Interpret this graph.**

In [None]:
# Code for 2.3

*Words for 2.3*

## Part 3: Regressions

Finally let's replicate some of the regressions in tables 1 and 2. 

The regressions in the paper only include observations that have data for all of the key variables. For some state-years there is no data for the "competition" variable, so we drop these values.

In [None]:
# Drop NaN values to keep 833 observations like the paper
df = df.dropna(subset=['competition_allleg_lag']) 
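
To see what the `subset=` argument does, here is a minimal sketch on a made-up data frame. The column `other_var` and all values are hypothetical; only the column name `competition_allleg_lag` comes from the data above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in two different columns
toy = pd.DataFrame({
    "state": ["A", "A", "B", "B"],
    "competition_allleg_lag": [0.4, np.nan, 0.7, 0.2],
    "other_var": [np.nan, 1.0, 2.0, 3.0],
})

# Only rows missing the listed column are dropped;
# missing values in other columns are left alone
kept = toy.dropna(subset=["competition_allleg_lag"])
print(len(toy), len(kept))
```

Comparing `len(df)` before and after the drop is a quick way to confirm that the number of remaining observations matches the count reported in the paper.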

The models in table 1 all have "democracy_mcmc" as the dependent variable, and also include state and year fixed effects. Recall that if we want to make sure a variable is treated as categorical, we can include it in the `smf.ols` formula as `...+C(varname)+...`. The other variables used in these regressions are:
- Competition is in the 'competition_allleg_lag' column
- Polarization is in the 'polarization_avg' column
- Republican is in the 'republican' column
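
As a reminder of how `C(...)` produces fixed effects, here is a minimal sketch on a made-up panel. The column names `unit`, `period`, `x`, and `y`, and all of the numbers, are hypothetical. The outcome is built with no noise as a unit effect plus a period effect plus 2 times `x`, so the regression should recover a coefficient of about 2 on `x`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 3 units observed in 4 periods
periods = [2000, 2001, 2002, 2003]
unit_effect = {"A": 1.0, "B": -0.5, "C": 0.25}
period_effect = {2000: 0.0, 2001: 0.1, 2002: 0.3, 2003: -0.2}
x_values = {"A": [0, 1, 0, 1], "B": [1, 1, 0, 0], "C": [0, 0, 1, 1]}

rows = []
for u in unit_effect:
    for t, x in zip(periods, x_values[u]):
        # Outcome = unit effect + period effect + 2 * x (no noise)
        y = unit_effect[u] + period_effect[t] + 2.0 * x
        rows.append({"unit": u, "period": t, "x": x, "y": y})
toy = pd.DataFrame(rows)

# C(unit) and C(period) add one dummy per category (minus a baseline)
fit = smf.ols("y ~ x + C(unit) + C(period)", data=toy).fit()
print(fit.params["x"])
```

Wrapping `period` in `C(...)` matters here: without it, a numeric year would enter as a single linear trend rather than as a separate intercept shift for each year.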

*DUE TO HOW THE KEY OUTCOME VARIABLE IS GENERATED, THE COEFFICIENTS IN THE NOTEBOOK WILL BE SLIGHTLY DIFFERENT FROM THOSE IN THE PAPER. SEE <a href="https://anthlittle.github.io/files/Grumbach_Tables.pdf">HERE</a> FOR A VERSION OF TABLES 1 AND 2 WITH THE COEFFICIENTS THAT YOU SHOULD GET.*

**Question 3.1. Use `smf.ols` to fit and summarize a regression which replicates model 1 in table 1. Note this should include state fixed effects and year fixed effects.**

In [None]:
# Code for 3.1

**Question 3.2. Now do the same for models 2 and 3 in table 1. Interpret the results of these three models.**

In [None]:
# Code for 3.2

*Words for 3.2*

**Question 3.3. Now replicate the model in column 7, where "Competition x Republican" is an interaction term between these two variables. Suppose, hypothetically, that the coefficient on Competition x Republican were around .475 (i.e., the same magnitude as the Competition coefficient but positive). What would this mean for the relationship between close elections, party control, and levels of democracy?**

In [None]:
# Code for 3.3

*Words for 3.3*

**Question 3.4. Finally, replicate model 4 in table 2. Note the relevant variables to add are pct_black_change and pct_latino_change. Interpret the coefficients on these variables.**

In [None]:
# Code for 3.4

*Words for 3.4*

**Question 3.5. All of the analysis here compares Republican control to divided or Democratic control. Run a regression similar to those above that can also answer the question of whether there is a meaningful difference between divided government and unified Democratic state government.**

In [None]:
# Code for 3.5

*Words for 3.5*

## Part 4. Wrapping up

**Question 4.1. What did you learn from this replication exercise? (3-4 sentences)**

*Words for 4.1*

**Question 4.2. What additional data might you want to collect to build on the findings here? What would you expect to find? (4-5 sentences)**

*Words for 4.2*