# POLSCI 3

## Week 5, Lecture 2

In this activity, we are going to practice our new approach for comparing treatment and control groups in experiments, using the `difference_in_means()` function from the `estimatr` package.

### Revisiting the Data

We're going to be using the same social pressure dataset, but with a tweak to how the treatment variable is recorded.

In many experimental datasets with more than two conditions (for example, several treatments plus a control, instead of just one treatment and one control), each person's treatment condition is saved as a string. You'll see why in a moment.

In [None]:
#RUN THIS CELL
library(testthat)
library(estimatr) # This loads the estimatr package, where the difference_in_means() function is from.

social <- read.csv('ps3_week5_social_pressure_str.csv')
head(social)

Here's what the variables mean:

- Outcome: `outcome_voted`: 1 if that particular person voted, 0 if not.
- `treat` is now a string with the following values:
    - `"control"`: assigned to control group
    - `"civic"`: mail with "do your civic duty" message
    - `"hawthorne"`: mail that says that the voter is being observed
    - `"self"`: mail with own voting history
    - `"neighbors"`: mail with own and neighbors' voting history
- Other Variables:
    - `sex`: 1 female, 0 male
    - `yob`: year of birth
    - `g2000`: voted in 2000 general election
    - `g2002`: voted in 2002 general election
    - `median_income`: median income in the last 12 months in person's neighborhood
    - `p2004`: voted in 2004 primary election
    - `democrat`: registered Democrat

## Mean Turnout by Treatment Condition

As we discussed, one of the interesting things about this experiment is that there isn't just one treatment group and one control group. Instead, there's a control group and several different treatment groups. Let's remind ourselves what these are:

In [None]:
table(social$treat)

Here's a reminder about the differences between the treatment conditions. The end of the notebook has pictures of all the mail sent to people in the various conditions if you want to take a look.
    
<table>
<thead>
  <tr>
    <th>Condition</th>
    <th>Mailed Reminder<br>to Vote?</th>
    <th>Told Turnout<br>Being Watched</th>
    <th>Given Own<br>Vote History</th>
    <th>Neighbors and<br>Self Given All<br>Neighbors' Vote<br>History</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Control</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Civic Duty</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Hawthorne</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Self</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Neighbors</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
  </tr>
</tbody>
</table>

Now, let's take a look at the average turnout rate by treatment condition:

In [None]:
# Don't worry about how this code works, just take a look at the results.
data.frame(Condition = c('Control', 'Civic Duty', 'Hawthorne', 'Self', 'Neighbors'),
          `Turnout Rate` = 100*with(social, c(mean(outcome_voted[treat=='control']), mean(outcome_voted[treat=='civic']),
           mean(outcome_voted[treat=='hawthorne']), mean(outcome_voted[treat=='self']), mean(outcome_voted[treat=='neighbors']))))

In the last class, we compared the Control group and the Neighbors group.

But comparisons between other groups can be pretty interesting too! For example, we could compare the Control and Civic Duty groups to look at the effect of just sending some mail with a bland encouragement to vote. Or, we could compare the Civic Duty and Hawthorne groups -- how much does it increase turnout just to tell people they're being watched?

Note that in each of these comparisons, there's always a **baseline**: we always look at the effect of something *relative to a baseline*. Usually in experiments the baseline is a control group, but when there are multiple treatment conditions, we can compare them to each other.

For example, if we look at how much higher turnout is in the Hawthorne group than the Civic Duty group, the Civic Duty group would be the baseline (and we'd be computing average turnout in Hawthorne group minus average turnout in Civic Duty group).
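
To make the idea of a baseline concrete, here is a minimal sketch using a tiny made-up dataset. The `toy` data frame and its values are invented for illustration and are not the real experiment's results. It computes average turnout in each condition, then subtracts the baseline's average.

```r
# Made-up data for illustration only (NOT the real experiment).
toy <- data.frame(
  treat = rep(c("civic", "hawthorne"), each = 4),
  outcome_voted = c(1, 0, 0, 0,   # civic: 1 of 4 voted
                    1, 1, 0, 0)   # hawthorne: 2 of 4 voted
)

# Average turnout in each condition.
turnout <- tapply(toy$outcome_voted, toy$treat, mean)

# Effect of Hawthorne relative to the Civic Duty baseline:
# average in the comparison group minus average in the baseline group.
hawthorne_vs_civic <- as.numeric(turnout["hawthorne"] - turnout["civic"])
hawthorne_vs_civic
```

Changing the baseline changes the question being asked, so it is worth saying out loud which group is being subtracted.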

## Using the `difference_in_means()` function

Let's compute the effect of the Hawthorne mailer.

### Old way to compute effect of Hawthorne mailer

In [None]:
# First, subset to just those assigned to Hawthorne mailer
hawthorne.subset <- subset(social, treat == 'hawthorne')

# Second, subset to the control group
control.subset <- subset(social, treat == 'control')

# Now, take the difference (this is already shorter than what you've learned before)
hawthorne.effect <- mean(hawthorne.subset$outcome_voted) - mean(control.subset$outcome_voted)

hawthorne.effect # Let's see the results.

### New way, using `difference_in_means()`

Here's the generic recipe:

`difference_in_means(outcome.name ~ treatment.name, dataset.name, condition1 = 'control.condition', condition2 = 'treatment.condition')`

You'll replace:
- `outcome.name` with the name of the variable in the dataset that contains the outcome
- `treatment.name` with the name of the variable in the dataset that contains the treatment
- `dataset.name` with the name of the dataset
- `'control.condition'` with whatever the *baseline* for comparison is (usually the control group name)
- `'treatment.condition'` with whatever the *treatment* you're trying to estimate the effect of is (one of the treatment group names)

(This is available in the <a href="https://bcourses.berkeley.edu/courses/1505753/pages/r-cheat-sheet" target="_blank">R Cheat Sheet</a>.)
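
The order of the two conditions matters: the estimate is the average for `condition2` minus the average for `condition1`. The sketch below shows this on a small made-up dataset. The `toy` data frame is hypothetical, and the `requireNamespace()` guard is only there so the code is skipped if `estimatr` is not installed.

```r
# Made-up data for illustration only (NOT the real experiment).
toy <- data.frame(
  treat = rep(c("control", "hawthorne"), each = 4),
  outcome_voted = c(0, 0, 0, 1,   # control: 1 of 4 voted
                    0, 1, 1, 1)   # hawthorne: 3 of 4 voted
)

if (requireNamespace("estimatr", quietly = TRUE)) {
  library(estimatr)

  # Baseline = control, so the estimate is (hawthorne mean) - (control mean).
  forward <- difference_in_means(outcome_voted ~ treat, data = toy,
                                 condition1 = "control", condition2 = "hawthorne")

  # Swapping the two conditions makes hawthorne the baseline and flips the sign.
  backward <- difference_in_means(outcome_voted ~ treat, data = toy,
                                  condition1 = "hawthorne", condition2 = "control")

  print(forward$coefficients)   # estimate with control as the baseline
  print(backward$coefficients)  # same size, opposite sign
}
```

If an estimate comes out with the opposite sign from what you expected, checking which group was passed as `condition1` is a good first step.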

Let's see this for the effect of the Hawthorne mailer:

In [None]:
hawth_effect <- difference_in_means(outcome_voted ~ treat, social, condition1 = 'control', condition2 = 'hawthorne')
hawth_effect

For now, we are just going to focus on the first number, under `Estimate`. It matches what we found using the old way!
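
If you would rather check the match in code than by eye, the sketch below pulls out the estimate and compares it with the by-hand difference. It uses a small made-up dataset (the `toy` data frame is hypothetical) and reads the estimate from the `coefficients` element of the result. The guard skips the `estimatr` part if the package is not installed.

```r
# Made-up data for illustration only (NOT the real experiment).
toy <- data.frame(
  treat = rep(c("control", "hawthorne"), each = 5),
  outcome_voted = c(0, 0, 1, 0, 1,   # control: 2 of 5 voted
                    1, 1, 0, 1, 1)   # hawthorne: 4 of 5 voted
)

# Old way: subtract the two group means by hand.
by_hand <- mean(toy$outcome_voted[toy$treat == "hawthorne"]) -
  mean(toy$outcome_voted[toy$treat == "control"])

if (requireNamespace("estimatr", quietly = TRUE)) {
  fit <- estimatr::difference_in_means(outcome_voted ~ treat, data = toy,
                                       condition1 = "control",
                                       condition2 = "hawthorne")

  # The number printed under `Estimate` is stored in `coefficients`.
  new_way <- unname(fit$coefficients)

  # Compare the two approaches, allowing for tiny rounding differences.
  print(all.equal(by_hand, new_way))
}
```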

There are three advantages of this new approach:

1. One-liner!
2. Starting next week, we'll look at the other numbers here (e.g., `Std. Error`) and interpret them.
3. Easy to look at different comparisons. Let's look at this next.

### Reminder: What does the estimate mean?

Reminder: The estimate is **our best guess of the true average treatment effect**.

In Week 4, we covered what it is we are trying to estimate: the true average treatment effect. Because we can never see all the potential outcomes in a real-world dataset, we can never know the exact true average treatment effect. But an estimate from an experiment represents our best guess of what the true average treatment effect is in that dataset.
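
A simulation can make this distinction concrete, because in a simulation (unlike real data) we get to see both potential outcomes for everyone. The sketch below is a made-up example: the probabilities, the sample size, and the variable names are all assumptions chosen for illustration, not features of the social pressure experiment.

```r
set.seed(3)  # arbitrary seed so the made-up example is reproducible

n <- 10000

# Potential outcomes for every simulated person (never both observable in real data):
# y0 = would they vote without the mailer, y1 = would they vote with it.
y0 <- rbinom(n, size = 1, prob = 0.30)
y1 <- pmax(y0, rbinom(n, size = 1, prob = 0.10))  # here the mailer can only add voters

# The true average treatment effect, knowable only because this is a simulation.
true_ate <- mean(y1 - y0)

# Randomly assign half of the people to treatment; we observe one outcome per person.
treated <- sample(rep(c(0, 1), each = n / 2))
observed <- ifelse(treated == 1, y1, y0)

# The estimate: difference in observed means between the two groups.
estimate <- mean(observed[treated == 1]) - mean(observed[treated == 0])

c(true_ate = true_ate, estimate = estimate)
```

The two numbers should be close but will generally not be identical, which is the sense in which the estimate is a best guess rather than the truth.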

### Comparing two treatment conditions

Earlier, we compared the Hawthorne condition to the Control group.

Suppose a researcher had a *theory* that telling people they are being studied has a big effect on people's behavior.

(This is actually a common idea, called the Hawthorne effect, which <a href="https://en.wikipedia.org/wiki/Hawthorne_effect" target="_blank">Wikipedia helpfully defines</a> as when "individuals modify an aspect of their behavior in response to their awareness of being observed." There's some fun stories on that webpage if you have a few minutes.)

Can we do a good job of testing this theory by comparing how often people voted in the Hawthorne and Control groups? That is, can we tell whether just telling people they are being studied has an effect on their behavior?

Let's look at this table again:

<table>
<thead>
  <tr>
    <th>Condition</th>
    <th>Mailed Reminder<br>to Vote?</th>
    <th>Told Turnout<br>Being Watched</th>
    <th>Given Own<br>Vote History</th>
    <th>Neighbors and<br>Self Given All<br>Neighbors' Vote<br>History</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Control</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Civic Duty</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Hawthorne</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Self</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Neighbors</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
  </tr>
</tbody>
</table>

Just comparing the Hawthorne condition and the Control group isn't a great way to test this theory, because people in the Hawthorne condition are both told they are being monitored *and* mailed a reminder to vote.

So, it would be nice to hold constant whether people got a reminder to vote, and just vary whether they're told they're being studied. To do this, we can compare the Hawthorne group to the Civic Duty group.

In [None]:
hawth_vs_civic <- difference_in_means(outcome_voted ~ treat, social, condition1 = 'civic', condition2 = 'hawthorne')

hawth_vs_civic # Let's look at what we computed. (Don't change this line.)

The Hawthorne group a) had much higher turnout than the Control group but b) didn't have much higher turnout than the Civic Duty group. Together, these two facts suggest that the Civic Duty condition had an effect on turnout almost as strong as the Hawthorne condition. We can double-check this by running `difference_in_means()` once more, this time looking at the effect of the Civic Duty condition relative to the Control group.

In [None]:
difference_in_means(outcome_voted ~ treat, social, condition1 = 'control', condition2 = 'civic')
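
The three comparisons are linked by simple arithmetic: the Hawthorne-vs-Control difference equals the Civic-Duty-vs-Control difference plus the Hawthorne-vs-Civic-Duty difference. The sketch below checks this with hypothetical turnout rates. The numbers are invented to keep the arithmetic easy and are not the experiment's results.

```r
# Hypothetical turnout rates, chosen only to make the arithmetic easy to follow.
turnout <- c(control = 0.30, civic = 0.32, hawthorne = 0.33)

hawthorne_vs_control <- turnout[["hawthorne"]] - turnout[["control"]]
civic_vs_control     <- turnout[["civic"]]     - turnout[["control"]]
hawthorne_vs_civic   <- turnout[["hawthorne"]] - turnout[["civic"]]

# The total difference splits into the "reminder" part plus the "being watched" part.
all.equal(hawthorne_vs_control, civic_vs_control + hawthorne_vs_civic)
```

So if most of the Hawthorne-vs-Control difference is already present in the Civic-Duty-vs-Control comparison, little is left over to attribute to being told you are observed.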

### Quick FYI - R packages

At the beginning, I mentioned that `difference_in_means()` is from the `estimatr` R package.

What is an R "package"?

One of the cool things about R is that people have built free packages for it. Packages are like apps for your phone, but for R. You can get them for free (usually) and install them, and they let R do new things. `estimatr` is one of those apps (packages), and one of its features is the `difference_in_means()` function. Running `library(estimatr)` at the top of the notebook tells R to "open this app" (load the package) so that its functions are available. Installing a package is a separate, one-time step.
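
As a rough sketch of the two separate steps (installing once, then loading each session), the code below checks whether `estimatr` is present before loading it. The install line is commented out on the assumption that the package is already installed on the course notebook server.

```r
# One-time step per computer: download and install the package.
# (Commented out; assumed to be already done on the course server.)
# install.packages("estimatr")

# Check whether the package is installed (this does not attach it to your session).
is_installed <- requireNamespace("estimatr", quietly = TRUE)
is_installed

# Every session: load the installed package so its functions can be called by name.
if (is_installed) {
  library(estimatr)
}
```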

# Appendix: Mail Images

If you want to take a closer look at the treatment groups:

#### Civic Duty Mailer
<img src="mail_images/civic_duty.png" width="500"/>

#### "Hawthorne" Mailer
<img src="mail_images/hawthorne.png" width="500"/>

#### "Self" Mailer
<img src="mail_images/self.png" width="500"/>

#### "Neighbors" Mailer
<img src="mail_images/neighbors.png" width="500"/>