# POLSCI 3

## Week 5 Activity 1: Randomized Experiments

As previewed in Lecture, today we will analyze data from a really interesting experiment involving over 100,000 US voters.

In this experiment, the authors sent out mailings to registered voters encouraging them to vote. Voters were randomly assigned to either no mailing (the control group) or one of four different mailings, described below. The mailers used increasingly strong "social pressure" to encourage turnout.
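
To make the idea of random assignment concrete, here is a minimal sketch of how people could be shuffled into five conditions. It is only an illustration, not the authors' actual procedure: the sample size, the seed, the equal group sizes, and the names `conditions`, `n.voters`, and `assignment` are all assumptions made for this example.

```r
# Hypothetical illustration of random assignment (not the study's real procedure).
set.seed(3)  # arbitrary seed so the shuffle is reproducible

conditions <- c("control", "civic", "hawthorne", "self", "neighbors")
n.voters <- 20  # toy sample size, chosen so it divides evenly by 5

# Repeat each condition label equally often, then shuffle the labels.
# Each toy voter gets the label in their position of the shuffled vector.
assignment <- sample(rep(conditions, each = n.voters / length(conditions)))

# Count how many toy voters ended up in each condition.
table(assignment)
```

Because the labels are shuffled by chance alone, nothing about a person (age, income, past voting) influences which condition they land in.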

### Part 1: Importing the Data

Let's first take a look at the data.

In [None]:
#RUN THIS CELL
library(testthat)

social <- read.csv('ps3_week5_social_pressure.csv')
head(social)

Here's what the variables mean:

- Outcome: `outcome_voted`: 1 if that particular person voted, 0 if not.
- Treatments:
    - `control_group`: 1 if assigned to the control group and 0 otherwise.
    - `treat_civic`: mail with "do your civic duty" message, 1 if assigned and 0 otherwise.
    - `treat_hawthorne`: mail that says that the voter is being observed, 1 if assigned and 0 otherwise.
    - `treat_self`: mail with own voting history, 1 if assigned and 0 otherwise.
    - `treat_neighbors`: mail with own and neighbors' voting history, 1 if assigned and 0 otherwise.
- Other Variables:
    - `sex`: 1 female, 0 male
    - `yob`: year of birth
    - `g2000`: voted in 2000 general election
    - `g2002`: voted in 2002 general election
    - `median_income`: median income in the last 12 months in person's neighborhood
    - `p2004`: voted in 2004 primary election
    - `democrat`: registered Democrat
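
Since each person is described as being in either the control group or one of the mailing groups, the five assignment indicators should add up to 1 for every row. The sketch below shows one way to check that on a tiny made-up data frame. `toy.social`, `assignment.cols`, and `conditions.per.person` are hypothetical names, and the values are invented rather than taken from the real file.

```r
# Made-up rows that reuse the assignment column names from the list above.
toy.social <- data.frame(
  control_group   = c(1, 0, 0, 0, 0, 1),
  treat_civic     = c(0, 1, 0, 0, 0, 0),
  treat_hawthorne = c(0, 0, 1, 0, 0, 0),
  treat_self      = c(0, 0, 0, 1, 0, 0),
  treat_neighbors = c(0, 0, 0, 0, 1, 0)
)

assignment.cols <- c("control_group", "treat_civic", "treat_hawthorne",
                     "treat_self", "treat_neighbors")

# Add up the five indicators within each row.
conditions.per.person <- rowSums(toy.social[, assignment.cols])

# TRUE if every toy person is in exactly one condition.
all(conditions.per.person == 1)

# Number of toy people in each condition.
colSums(toy.social[, assignment.cols])
```

The same two checks could be run on `social` after loading it, assuming its assignment columns are coded 0/1 as described.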
   
### Reminder about Treatment Conditions

Here's a reminder about the differences between the treatment conditions. In the table below, each row is one of the conditions, and the columns describe the mail sent to people in that condition. The end of the notebook has pictures of the mail sent to people in each treatment condition if you want to take a look.
    
<table>
<thead>
  <tr>
    <th>Condition</th>
    <th>Mailed Reminder<br>to Vote?</th>
    <th>Told Turnout<br>Being Watched</th>
    <th>Given Own<br>Vote History</th>
    <th>Neighbors and<br>Self Given All<br>Neighbors' Vote<br>History</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Control</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Civic Duty</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Hawthorne</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Self</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Neighbors</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
  </tr>
</tbody>
</table>

------

### Part 2: Data Analysis

**Question 1.** Create a variable called `meanControl` that contains the mean of `outcome_voted` among individuals assigned to the control group. Reminder: the name of the dataset is `social`.

*Hint: first subset the `social` dataset to just the control group by using the `control_group` variable.*
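
If `subset()` is unfamiliar, here is a small sketch of the subset-then-mean pattern on a made-up data frame. The names `toy`, `in_group`, `score`, `toy.group`, and `toy.mean` are invented for this illustration and do not appear in the real data.

```r
# Toy data frame; all names and values are made up for illustration.
toy <- data.frame(
  in_group = c(1, 1, 0, 0, 1),
  score    = c(1, 0, 1, 1, 1)
)

# Keep only the rows where in_group equals 1.
toy.group <- subset(toy, in_group == 1)

# The mean of a 0/1 variable is the share of rows that equal 1.
toy.mean <- mean(toy.group$score)

toy.mean * 100  # the same share expressed as a percentage
```

The second argument to `subset()` is a logical condition, so it uses `==` (a comparison), not `=`.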


In [None]:
# Compute mean of turnout in control group
# First, subset the data to the control group.
control.group <- subset(social, ...)

# Second, take the mean of the outcome_voted variable among the control group.
meanControl <- mean(...)

meanControl * 100 # Prints the percentage of people in the control group who voted.

-----

**Question 2.** Create a variable called `meanNeighbors` that contains the mean of `outcome_voted` among individuals assigned to the "Neighbors" group. Reminder: the name of the dataset is `social`.

*Hint: first subset the `social` dataset to just the "Neighbors" group by using the `treat_neighbors` variable.*


In [None]:
# Compute mean of turnout in neighbors group
# First, subset the data to the neighbors mailing group.
neighbors.group <- subset(social, ...)

# Second, take the mean of the outcome_voted variable among the neighbors mailing group.
meanNeighbors <- mean(...)

meanNeighbors * 100 # Prints the percentage of people in the neighbors group who voted.

------

**Question 3.** Relative to the control group, what was the effect of the neighbors mail on voter turnout? To answer this, write a line of code that calculates `meanNeighbors` minus `meanControl`, so that a positive number means turnout was higher in the "Neighbors" group.
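
A note on units: the difference between two turnout rates is measured in percentage points, which is not the same as a percent change. The sketch below illustrates the distinction with made-up 0/1 outcomes. The names `toy.a`, `toy.b`, `toy.diff`, and `toy.relative.change` are hypothetical and the numbers are not from the study.

```r
# Made-up 0/1 outcomes for two toy groups (not real data).
toy.a <- c(1, 0, 0, 0)  # 25% of group A has the outcome
toy.b <- c(1, 1, 0, 0)  # 50% of group B has the outcome

# Difference in means: group B minus group A.
toy.diff <- mean(toy.b) - mean(toy.a)
toy.diff * 100  # difference in percentage points

# Relative change: the difference as a fraction of group A's rate.
toy.relative.change <- toy.diff / mean(toy.a)
toy.relative.change * 100  # percent change relative to group A
```

In this toy case the gap is 25 percentage points, but group B's rate is 100 percent higher than group A's. The question above asks for the first kind of number.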


In [None]:
effect.of.neighbors.mail <- NULL # YOUR CODE HERE
effect.of.neighbors.mail * 100 # Prints the effect in percentage points.

-----

**Question 4.** Professor Frank Lee Dense, a political scientist at Stanford, argues that we can't tell whether the "Neighbors" mail increased voter turnout because of omitted variable bias. In particular, Professor Dense said that he thinks age is an omitted variable: he thinks older people have more neighbors, and so were especially likely to get the "Neighbors" mailing; and older people are especially likely to vote, too. This means, he argues, that age is an omitted variable in the study when we compare people in the "Neighbors" group and the control group.

Is he right? Let's check: how much older, on average, are people assigned to the "Neighbors" group than the control group?

*Hint: use the `yob` variable, which records people's year of birth. Remember that you already made `control.group` and `neighbors.group` above, too.*
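
Keep in mind that a smaller year of birth means an older person, so the direction of the subtraction matters. The sketch below uses made-up birth years to show that a gap in average age equals the gap in average year of birth with the order reversed. The names `toy.yob.a`, `toy.yob.b`, and `reference.year` are hypothetical, and the reference year is arbitrary because it cancels out of the difference.

```r
# Made-up years of birth for two toy groups (not real data).
toy.yob.a <- c(1950, 1960, 1970)
toy.yob.b <- c(1955, 1965, 1975)

reference.year <- 2006  # arbitrary year, used only to convert birth year to age

# Average age in each toy group.
mean.age.a <- mean(reference.year - toy.yob.a)
mean.age.b <- mean(reference.year - toy.yob.b)

# How many years older group A is than group B, computed from ages...
mean.age.a - mean.age.b

# ...and the same quantity computed directly from years of birth.
mean(toy.yob.b) - mean(toy.yob.a)
```

Both expressions give the same answer, so there is no need to convert `yob` to age before comparing groups.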


In [None]:
# First, compute the mean year of birth in the Control group
mean.yob.control.group <- NULL # YOUR CODE HERE

# Second, compute the mean year of birth in the Neighbors group
mean.yob.neighbors.group <- NULL # YOUR CODE HERE

# Finally, compute how much older the neighbors group is than the control group using simple subtraction
neighbors.group.this.many.years.older <- NULL # YOUR CODE HERE

neighbors.group.this.many.years.older # Let's take a look at the answer

-----

**Question 5.** Why was Professor Dense wrong? That is, why did you find that the Neighbors and control groups had such similar average ages?

- `'a'`: The researchers who ran this particular study just got lucky: if someone else did this same randomized experiment again, it's very possible that the "Neighbors" group might end up being much older than the control group.
- `'b'`: The researchers who conducted the study must have removed older people from the "Neighbors" group in order to make the "Neighbors" and control groups so similar in age.
- `'c'`: Randomized experiments do not suffer from omitted variable bias: because of random assignment, the treatment and control groups will be similar on all variables, including age.

Put your answer (`a`, `b`, or `c`) between the `'` quotes below, replacing just the `...`. (For example, if you wanted to answer option a, you would have `q5 <- 'a'` below.)


In [None]:
q5 <- '...'

-----

**Question 6.** Professor Dense also claimed the experiment could be run better and more cheaply with the following method: rather than sending each mailer to 1 individual voter, we send mailers only to addresses with 2 or more voters, and address all of the voters at that address in the same mailer. Addresses with 2 or more voters are randomly assigned to one of the treatment conditions. Addresses with only 1 voter are assigned to the control group.

Is this a good idea?

- `'a'`: Yes, since the mailers are randomly assigned among those living in addresses with 2 or more voters, there is no omitted variable bias and we save money in conducting this experiment.
- `'b'`: Yes, the treatment and control groups are now composed of voters who are more similar than before. This increases comparability and we save money in conducting this experiment.
- `'c'`: No, because there may be omitted variable bias in assignment into one of the treatment conditions among the group of those living in addresses with 2 or more voters.
- `'d'`: No, those who live in addresses with 2 or more voters may systematically differ from those who live in addresses with only 1 voter. Whether people are in the control group or not may not be random.

Put your answer (`a`, `b`, `c`, or `d`) between the `'` quotes below, replacing just the `...`. (For example, if you wanted to answer option a, you would have `q6 <- 'a'` below.)


In [None]:
q6 <- '...'

-----

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save and Checkpoint`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("Week5_Activity1.ipynb")

After you hit "Run" on the cell above, wait for a moment (about 5 seconds), then click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.

## Appendix: Mail Images

#### Civic Duty Mailer
<img src="mail_images/civic_duty.png" width="500"/>

#### "Hawthorne" Mailer
<img src="mail_images/hawthorne.png" width="500"/>

#### "Self" Mailer
<img src="mail_images/self.png" width="500"/>

#### "Neighbors" Mailer
<img src="mail_images/neighbors.png" width="500"/>