# PS 137 In Class Notebook with NELDA data

In this notebook we will work with the NELDA data, a project led by our own Susan Hyde. 

The <a href="https://www.dropbox.com/scl/fi/xivptq8yhnl5f1pj8npvm/NELDA_Codebook_V5.pdf?rlkey=uybuf7fzhrxk63c68jk2muenv&e=1&dl=0">codebook for the data is here.</a>

In the main version of the data, each row correspond to a legislative or executive (i.e., presidental) election. Let's load that up:

In [None]:
elevel <- read.csv('data/Nelda.csv')
head(elevel)

Most of the variables in the data are answers to yes or no questions. For example, one used in the paper is `nelda15`, which answers "Is there evidence that the government harassed the opposition?" Let's look at a table of this:

In [None]:
table(elevel$nelda15)

It will help for future analysis to create a variable equal to 1 when the answer to this is "yes" and "no" otherwise. Here is one way to do that:


In [None]:
elevel$harass <- ifelse(elevel$nelda15=="yes", 1, 0)
table(elevel$harass)

Another variable used in the paper is nelda16, or "In the run-up to the election, were there allegations of media bias in favor of the incumbent?"

**Make a table of this variable, and then make a new variable in `elevel` called `mediabias` equal to 1 if this variable is coded as "yes"**

Now let's see if opposition and media harassment tend to go together. Over the whole time period, we can see the share of elections with media bias by taking the mean, since the mean of a 0/1 variable gives the share that are 1s.

In [None]:
mean(elevel$mediabias)

**What share of elections are coded as having opposition harassment?**

Now to get to the relationship, we can look at the share of elections with media bias among those where the opposition is NOT harassed by subsetting to elections where `elevel$harass==0`:

In [None]:
mean(elevel$mediabias[elevel$harass==0])

**What is the share of elections with media bias among those where the opposition did face harassment. Interpret this result**

*Interpretion*

Another way to see this is to run a linear regression predicting media bias with harassment:

In [None]:
summary(lm(mediabias ~ harass, data=elevel))

The intercept tells us that when harass=0 we would predict media bias will be 0.17, and when harass goes up by 1 unit (e.g., from 0 to 1) this increases by 0.51. Think about how this relates to the means you found above

**Now run a regression where the dependent variable is `harass` and the independent variable is `mediabias`. Interpret the coefficients.**

The Hyde paper looks a trends in the number of elections where **either** there is media bias or opposition harassment. We can create this variable by looking at whether the sum of these variables is greater than 1, which will capture casees where there is one but not the other (and hence the sum is 1) or both (sum of 2)

In [None]:
elevel$problems <- ifelse(elevel$mediabias + elevel$harass > 1, 1, 0)
table(elevel$problems)

We can plot a count of the of elections with these problems (part of figure 1) using the `tapply` function. Before we used `tapply(x,y,mean)` to get the average of x for each possibly value of y. We can also use this function to take sums, which does the counting. 

In [None]:
plot(1945:2020, tapply(elevel$problems, elevel$year, sum), type="l")

**Make a plot of the share of elections with problems for each year. (Hint: you still want to use `tapply`, but apply a different function!) Interpret the graph. Does it seem consistent with the discussion in the paper?**

To make things more clear, let's follow the lead of the paper and look at "moving averages". That is, rather than just plotting a count or a mean for an individual year, we will plot it for the 5 years leading up to each year. This will smooth things out.

By default, this function will use a 5 year window, though we can change this. When the `msum` argument is true, the function will create a moving average of the *sum* of `avgvar`. Otherwise it will compute a "moving average of the average" of `avgvar`. This will be more clear with examples!

In [128]:
makema <- function(yearvar, avgvar, window=5, msum=TRUE){
    allyears <- unique(yearvar)
    allyears <- allyears[order(allyears)]
    mayears <- allyears[(window):length(allyears)]
    ma <- NA
    for (i in 1:length(mayears)){
        ma[i] <- ifelse(msum, sum(avgvar[yearvar > mayears[i] - window + 1 & yearvar <= mayears[i]], na.rm=TRUE)/window,
                        mean(avgvar[yearvar > mayears[i] - window + 1 & yearvar <= mayears[i]], na.rm=TRUE))
    }
    return(data.frame(year=mayears, avg=ma))
}

To replicate something like the red curve in Figure 3 of the paper (which uses a related but more technical smoothing process), we can compute the moving average of the sum of elections with problems by year like this:

In [129]:
probms <- makema(elevel$year, elevel$problems)
head(probms)

Unnamed: 0_level_0,year,avg
Unnamed: 0_level_1,<int>,<dbl>
1,1949,2.6
2,1950,2.4
3,1951,3.0
4,1952,3.0
5,1953,2.6
6,1954,3.6


Note the first year we can compute a moving average of the sum is 1949, since the data starts in 1945 and the first 5 year window is 1945-1949. 

**Plot this moving average of the sum of problematic elections. How does this compare to the red curve in the paper?**

**Now repeat this process but change the "window" to 10 years. How does this change the trend?**

We might also be interested in the share of elections that have problems, not just the count. To do this, we can add a `msum=FALSE` argument to `makema`. If we do this, it will compute the share of elections with problems in each year, and then compute a moving average of that.

**Make a plot of the moving average of the share of elections with problems (make the window as big or small as you want!)**

Another variable in the data is `nelda46`: "were Western monitors present?" 

**Make a variable called `elevel$westmon` equal to 1 if this is true ("yes") and 0 otherwise. Them plot a moving average of the share of elections with western monitors over time.**

Finally, let's look at the relationship between being in an intergovernmental organization like the IMF and having clean elections. I'll do the easy part for you here:

In [None]:
elevel$igo <- ifelse(elevel$nelda53=="yes", 1,0)

**Run a linear regression with problems as the dependent variable and igo membership as the independent variable. Interpret the result.**