# PS 137 - Data Assignment 1
## Due - 2pm on 2/4

In this notebook we will continue analyzing how our class's answers compare to the experts' answers, both on questions rating US democracy and on questions about what matters for democracy. We will also bring in some data from the general public. First, let's load the data.

```r
# Run this cell!
suppressPackageStartupMessages(library(tidyverse))
class <- read_csv("data/students.csv")
experts <- read_csv("data/experts.csv")
public <- read_csv("data/public.csv")
```

## Part 1: What do the people say?

Let's take a look at the first few rows of the public data:

```r
head(public)
```

The data include variables on the importance of the aspects of democracy that you rated (e.g., `fairelec`, `contrib`, `media`), as well as several more. For example, one we will look at later is `equalvote`, which corresponds to the importance of "All adult citizens have equal opportunity to vote."

The only rating of US democracy in the public data is `rating`, which corresponds to December 2024. The analogous ratings from the class and the experts were:

```r
mean(class$rating_dec24)
# Recall that the expert data have a few missing values, so we need the na.rm=TRUE argument
mean(experts$rating_now, na.rm=TRUE)
```
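To see why the `na.rm=TRUE` argument matters, here is a minimal sketch on a made-up vector of ratings (the numbers are hypothetical, not from any of the data sets). A single missing value makes `mean()` return `NA` unless we tell it to drop missing values first.

```r
# Hypothetical ratings with one missing value (not real survey data)
toy_ratings <- c(6, NA, 4, 8)

# Without na.rm, any NA makes the whole mean NA
mean_with_na <- mean(toy_ratings)

# With na.rm=TRUE, the NA is dropped before averaging: (6 + 4 + 8) / 3
mean_dropped <- mean(toy_ratings, na.rm=TRUE)

mean_with_na   # NA
mean_dropped   # 6
```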

**Question 1.1. Take the mean of the public rating. How does it compare to the class and expert means? What might explain the difference? (1-2 sentences)**

*Words for 1.1*

Another variable in the public data is how the respondent voted. Let's make a table of the responses:

```r
table(public$vote)
```

Most of the sample said they voted (if this were a public opinion class, we would talk more about how more people say they voted than actually do!). Among those who did, a handful voted for a third party, while the votes are split fairly evenly between the two major candidates.
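If you want to check these claims as shares rather than counts, wrapping the table in `prop.table()` divides each count by the total so the entries sum to 1. A minimal sketch on a hypothetical vector of vote responses (the labels and counts are invented and may not match the categories in `public$vote`):

```r
# Hypothetical vote responses (labels and counts invented for illustration)
toy_vote <- c("Harris", "Trump", "Trump", "Harris", "Other", "Did not vote",
              "Harris", "Trump")

vote_counts <- table(toy_vote)           # number of respondents per category
vote_shares <- prop.table(vote_counts)   # each count divided by the total (8)

vote_counts
round(vote_shares, 3)
```

With the real data, the same idea is `prop.table(table(public$vote))`.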

Let's see if people who voted (or at least said they voted) in different ways have different assessments of US democracy as of December 2024. 

**Question 1.2. Before we check, which of these groups do you think will rate US democracy the highest at this time point? The lowest? Note: we won't grade you on being correct, just making a reasonable argument! (2-3 sentences)**

*Answer to 1.2*

One way to compare the means across groups who voted differently is to first take subsets of the data based on the vote, and then take the average within each subset. (Feel free to do this another way if you'd like!) For example, to get the average rating among Harris supporters, we can do the following:

```r
public_harris <- subset(public, vote=="Harris")
mean(public_harris$rating)
```
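Subsetting is only one way to do this. One alternative is base R's `tapply()`, which computes the mean of one variable within every group at once. Here is a sketch on a small hypothetical data frame (the column names mirror `public`, but the rows and values are made up):

```r
# Hypothetical mini data set with the same column names as `public`
toy_public <- data.frame(
  vote   = c("Harris", "Trump", "Harris", "Trump", "Other"),
  rating = c(3, 7, 5, 9, 4)
)

# Subset approach (as above), for one group
toy_harris <- subset(toy_public, vote == "Harris")
harris_mean <- mean(toy_harris$rating)   # (3 + 5) / 2 = 4

# tapply approach: mean rating within every vote group at once
group_means <- tapply(toy_public$rating, toy_public$vote, mean)
group_means
```

If the rating has missing values, extra arguments are passed through to the function, e.g. `tapply(x, g, mean, na.rm=TRUE)`.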

**Question 1.3. Compute the average rating of US democracy among Trump supporters, "other" supporters, and people who abstained. Is anything different from what you expected? If so, come up with an explanation for what you did find. (2-3 sentences)**

*Words for 1.3*

## Part 2: Views of Democracy

Now let's see how these three groups weigh different aspects of democracy. Here is a reminder of the questions asked of all three groups, and how the coding works:

1. `fairelec`: Fair Elections  
**Definition:** Elections are conducted, ballots counted, and winners determined without pervasive fraud or manipulation.  

2. `contrib`: Campaign Contributions  
**Definition:** Public policy is not determined by large campaign contributions.  

3. `judicial`: Judicial Independence  
**Definition:** The elected branches respect judicial independence.  

4. `turnout`: Voter Turnout  
**Definition:** Voter participation in elections is generally high.  

5. `speech`: Freedom of Speech  
**Definition:** Government protects individuals' right to engage in unpopular speech or expression.  

6. `media`: Media Freedom  
**Definition:** Government does not interfere with journalists or news organizations.  

All three samples answered these questions on a 1-to-4 scale:

1. **Not relevant:** This has no impact on democracy (1)
2. **Beneficial:** This enhances democracy, but is not required for democracy (2)
3. **Important:** If this is absent, democracy is compromised (3)
4. **Essential:** A country cannot be described as democratic without this (4)

You may recall that for most of these questions most of you and the experts tended to answer either 3 or 4. To make differences across questions clear, we will generally focus on what proportion of respondents answered 4, i.e., said the dimension was essential. 
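Computing "the proportion who answered 4" relies on a simple trick. Comparing a vector to 4 gives `TRUE`/`FALSE` values, and R treats `TRUE` as 1 and `FALSE` as 0, so the mean of the comparison is the share of 4s. Here is a quick check on made-up responses (hypothetical values, not from the data):

```r
# Hypothetical importance ratings on the 1-4 scale, with one missing answer
toy_ratings <- c(4, 3, 4, 2, NA, 4)

is_essential <- toy_ratings == 4   # TRUE, FALSE, TRUE, FALSE, NA, TRUE
share_essential <- mean(is_essential, na.rm=TRUE)  # 3 of the 5 non-missing answers

share_essential  # 0.6
```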

To spare you some coding, I'm going to create a data frame where each row corresponds to a dimension and each column gives the share of one group (public, experts, class) that says this dimension is essential for democracy. (The comments explain what is going on, but it's not important to follow exactly how the code works.)

```r
# The six dimensions that all three groups rated
shared_cols <- c("fairelec", "contrib", "judicial", "turnout", "speech", "media")

# For each group:
# 1. subset() keeps only the shared columns
# 2. == 4 turns each answer into TRUE (essential) or FALSE (anything else)
# 3. apply(..., 2, mean) averages each column (2 = columns, 1 = rows);
#    the mean of TRUE/FALSE values is the share who answered 4
# 4. Since there are NAs, na.rm=TRUE drops them before averaging
pub_means <- apply(subset(public, select=shared_cols) == 4, 2, mean, na.rm=TRUE)
expert_means <- apply(subset(experts, select=shared_cols) == 4, 2, mean, na.rm=TRUE)
class_means <- apply(subset(class, select=shared_cols) == 4, 2, mean, na.rm=TRUE)

# Combine into one data frame: one row per dimension, one column per group
essential_df <- data.frame(dimension=shared_cols, public=pub_means,
                           expert=expert_means, class=class_means)
essential_df
```
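To make the `apply()` step concrete, here is the same computation on a tiny hypothetical data frame with two of the dimension columns. `== 4` on a data frame produces a logical matrix, and `apply(..., 2, mean, na.rm=TRUE)` averages each column. Base R's `colMeans()` gives the same result and is a common shortcut.

```r
# Hypothetical responses from four people on two dimensions (1-4 scale)
toy <- data.frame(
  fairelec = c(4, 4, 3, 4),
  media    = c(3, 4, NA, 2)
)

essential <- subset(toy, select=c("fairelec", "media")) == 4  # logical matrix

# MARGIN = 2 means "apply the function to each column"
via_apply <- apply(essential, 2, mean, na.rm=TRUE)
via_colmeans <- colMeans(essential, na.rm=TRUE)

via_apply     # fairelec = 0.75, media = 1/3 (the NA is dropped)
```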

Now we can quickly compare how the class views the importance of each dimension with a *bar plot*. The first argument (`essential_df$class`) sets the height of each bar. The second (`names.arg=essential_df$dimension`) gives the label for each bar. The third (`ylim=c(0,1)`) makes the y axis range from 0 to 1, which will make it easier to compare across groups.

```r
barplot(essential_df$class, names.arg=essential_df$dimension, ylim=c(0,1))
```
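One optional way to compare groups in a single figure (a sketch, not required for the assignment) is to pass `barplot()` a matrix with one row per group and set `beside=TRUE`. This draws each group's bar side by side for every dimension. The shares below are invented placeholders. With the real data, you would build the matrix from the `public`, `expert`, and `class` columns of `essential_df`.

```r
# Hypothetical shares saying each dimension is essential (not real results)
toy_essential <- data.frame(
  dimension = c("fairelec", "media", "turnout"),
  public    = c(0.70, 0.50, 0.20),
  expert    = c(0.80, 0.60, 0.10),
  class     = c(0.90, 0.55, 0.15)
)

# Rows = groups, columns = dimensions
share_matrix <- t(as.matrix(toy_essential[, c("public", "expert", "class")]))
colnames(share_matrix) <- toy_essential$dimension

# beside=TRUE puts the groups' bars next to each other within a dimension;
# barplot() invisibly returns the x positions of the bar midpoints
mids <- barplot(share_matrix, beside=TRUE, ylim=c(0, 1),
                legend.text=rownames(share_matrix))
```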

**Question 2.1. Interpret this graph. Which dimensions did the class think are the least and most important? (2-3 sentences)**

*Words for 2.1*

**Question 2.2. Now make similar barplots for the public and experts. Find two interesting comparisons across the groups, and explain why you think they might arise. (2-3 sentences)**

*Words for 2.2*

Another way to visualize the data is to make a scatterplot where the importance of each dimension for one group is on the x axis and for the other is on the y axis.

```r
plot(essential_df$class, essential_df$public, xlim=c(0,1), ylim=c(0,1),
     xlab="Share Essential (class)", ylab="Share Essential (public)")
```

We can see from this that the responses are generally positively correlated: things the class thinks are more important are also considered more important among the public. 
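You can also summarize "positively correlated" with a single number. `cor()` returns the Pearson correlation, which is positive when points tend to rise from left to right. Here is a sketch with hypothetical shares (not the actual values in `essential_df`):

```r
# Hypothetical shares for four dimensions (invented for illustration)
class_share  <- c(0.90, 0.40, 0.60, 0.20)
public_share <- c(0.80, 0.50, 0.55, 0.30)

r <- cor(class_share, public_share)
r  # positive: higher class shares go with higher public shares
```

With the real data, the analogous call would be `cor(essential_df$class, essential_df$public)`. With only six dimensions, treat it as a rough summary.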

To help interpret, let's (1) add a "45 degree line" which corresponds to parts of the graph where both groups rate the trait equally, and (2) label the points with the first letter of the corresponding dimension.  

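```r
# pch=dimension plots each point as the first letter of its dimension name
plot(essential_df$class, essential_df$public, xlim=c(0,1), ylim=c(0,1),
     xlab="Share Essential (class)", ylab="Share Essential (public)",
     pch=essential_df$dimension)
# 45 degree line: intercept a=0, slope b=1
abline(a=0, b=1)
```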

The 'f' (`fairelec`) lies essentially on the 45 degree line, meaning the class and the public rate this trait about equally.
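A point sits above the 45 degree line exactly when its y value exceeds its x value. The vertical distance from the line is therefore just the difference between the two shares. Here is a sketch with hypothetical numbers (the dimension names match the notebook, but the shares are made up):

```r
# Hypothetical shares (not real results)
toy_essential <- data.frame(
  dimension = c("fairelec", "speech", "turnout"),
  class     = c(0.85, 0.70, 0.20),
  public    = c(0.84, 0.50, 0.35)
)

# Positive gap: above the line (public more often says "essential")
# Negative gap: below the line (class more often says "essential")
toy_essential$gap <- toy_essential$public - toy_essential$class

# Sort from most "class > public" to most "public > class"
toy_essential[order(toy_essential$gap), ]
```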

**Question 2.3. From the graph, identify a trait that the class thinks is more important than the public does, and one that the public thinks is more important than the class does. Why might this be? (2-3 sentences)**

*Words for 2.3*

**Question 2.4. Now make a similar graph with the class rating on the x axis and the expert rating on the y axis. What can you learn about the differences between the ratings, and what might drive them? (2-3 sentences)**

```r
plot(essential_df$class, essential_df$expert, xlim=c(0,1), ylim=c(0,1),
     xlab="Share Essential (class)", ylab="Share Essential (expert)",
     pch=essential_df$dimension)
abline(a=0, b=1)
```

*Words for 2.4*

Here is a description of the six dimensions that are in the public and expert data but not the class data (the numbering starts at 7 since we already covered six):

7. `concede`: Losers Concede  
**Definition:** Incumbent politicians who lose elections publicly concede defeat.  

8. `equalvote`: Equal Voting  
**Definition:** All adult citizens have equal opportunity to vote.  

9. `equalrights`: Equal Rights  
**Definition:** All adult citizens enjoy the same legal and political rights.  

10. `protest`: Right to Protest  
**Definition:** Government protects individuals' right to engage in peaceful protest.  

11. `limits`: Constitutional Limits  
**Definition:** Executive authority cannot be expanded beyond constitutional limits.  

12. `facts`: Shared Facts  
**Definition:** Even when there are disagreements about ideology or policy, political leaders generally share a common understanding of relevant facts.  

**Question 2.5. Which of these dimension(s) might do a good job of capturing Dahl's "participation" component of democracy? Which might do a good job of capturing Acemoglu and Robinson's definition of democracy? (2-3 sentences)**

*Words for 2.5*

**Question 2.6 [Optional]. Examine the expert and public responses to the dimensions you identified for 2.5. Do these groups seem to agree with Dahl's and/or Acemoglu and Robinson's definitions of democracy?**