# POLSCI 3

## Week 8, Activity Notebook 1: Heterogeneous treatment effects

First, read in the data below.

In [None]:
library(testthat)
library(estimatr)

data <- read.csv("ps3_week8_electing_women.csv")
head(data)

Here is a quick reminder of what each column means:

- `unique_id`: Precinct ID
- `treat`: treatment variable
    - `'control'`: control group
    - `'supply'`: supply group; party chair instructed to recruit 2-3 women
    - `'demand'`: demand group; party chair reads letter at precinct convention
    - `'both'`: a fourth group getting both the supply and demand treatments; party chair instructed to read letter *and* to recruit 2-3 women
- `prop_sd_fem2014`: **Outcome**: Proportion of 2014 elected state delegates from that precinct who were women
- `sd_onefem2014`: 1 if at least one woman was selected; 0 otherwise
- `county`: County name in Utah
- `pc_male`: 1 if precinct chair is male; 0 otherwise (precinct chair is person who runs precinct meeting, would read letter if assigned to do so, etc.)
- `mormon`: 1 if precinct chair filled out a survey and told the party they were a Mormon; 0 otherwise (either because not Mormon or did not fill out survey) **<span style="color:red">New variable!</span>**

This dataset is otherwise similar to Week 7's, but **we now have a new variable `mormon`**:

In [None]:
# Here's the new mormon column:
table(data$mormon)

##### Subsetting reminder:

Remember that we can subset a dataset based on the value of a numeric variable:
`data.subset <- subset(data, name.of.number.variable == 0)` would subset `data` to the cases where `name.of.number.variable` is 0.
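As a quick sketch of that pattern on made-up data (the data frame `toy` and its columns `flag` and `score` are hypothetical and are not part of the course dataset):

```r
# Toy data frame; `toy`, `flag`, and `score` are made-up names for illustration
toy <- data.frame(
  flag  = c(0, 1, 0, 1, 1),
  score = c(10, 20, 30, 40, 50)
)

# Keep only the rows where the numeric variable `flag` equals 0
toy.zero <- subset(toy, flag == 0)

# Inspect the subset: how many rows were kept, and their mean score
nrow(toy.zero)
mean(toy.zero$score)
```

The same call works for any numeric column; only the variable name and the value on the right of `==` change.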

### Comparing effects for Mormon and non-Mormon precinct chairs
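Before the questions, here is a minimal sketch of the general workflow on made-up data: subset to one group, then compare the mean outcome between two treatment arms within that subset. The data frame `toy`, the columns `group`, `arm`, and `y`, and the helper `diff.both.control()` are all hypothetical. The difference is computed by hand in base R here; `difference_in_means()` from `estimatr` reports the same kind of estimate along with a standard error and $p$-value.

```r
# Made-up data: two subgroups (group = 1 or 0), two arms, and an outcome y
toy <- data.frame(
  group = c(1, 1, 1, 1, 0, 0, 0, 0),
  arm   = c("control", "control", "both", "both",
            "control", "control", "both", "both"),
  y     = c(0.2, 0.4, 0.5, 0.7, 0.3, 0.3, 0.4, 0.4)
)

# Hypothetical helper: mean of y under "both" minus mean of y under "control"
diff.both.control <- function(d) {
  mean(d$y[d$arm == "both"]) - mean(d$y[d$arm == "control"])
}

# Estimate the difference separately within each subgroup
est.group1 <- diff.both.control(subset(toy, group == 1))
est.group0 <- diff.both.control(subset(toy, group == 0))

est.group1
est.group0
```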

**Question 1.** Estimate the effect of the "both" treatment (relative to the control group) on the `prop_sd_fem2014` variable, among precinct chairs who identified themselves as Mormon in the survey.


In [None]:
# First, subset to chairs who identify as Mormon
mormon.chairs <- NULL # YOUR CODE HERE

# Next, use difference_in_means() to estimate the effect on prop_sd_fem2014 in this subset
dim.mormons <- NULL # YOUR CODE HERE
dim.mormons

-----

**Question 2.** Estimate the effect of the "both" treatment (relative to the control group) on the `prop_sd_fem2014` variable, among precinct chairs who **did not** identify themselves as Mormon in the survey (either because they didn't complete the survey or because they said they were not Mormon).


In [None]:
# First, subset to chairs who do not identify as Mormon
nonmormon.chairs <- NULL # YOUR CODE HERE

# Next, use difference_in_means() to estimate the effect in this subset
dim.nonmormons <- NULL # YOUR CODE HERE
dim.nonmormons

-----

**Question 3. `TRUE` or `FALSE`: Our best guess is that the effect of the "both" treatment is larger for precinct chairs who identified themselves as Mormon in the pre-survey.**

Save your answer in `q3.answer`. Do not use quotes. For example, to answer `TRUE`, type `q3.answer <- TRUE`.


In [None]:
q3.answer <- NULL # YOUR CODE HERE

-----

**Question 4. `TRUE` or `FALSE`: The data indicates that, if non-Mormon chairs converted to Mormonism, the "both" treatment would then have a larger effect on them.**

Save your answer in `q4.answer`. Do not use quotes. For example, to answer `TRUE`, type `q4.answer <- TRUE`.


In [None]:
q4.answer <- NULL # YOUR CODE HERE

-----

In Question 1, you calculated the following $p$-value:

In [None]:
dim.mormons$p.value

**Question 5.** Which one of the following statements best describes what this value means?

- `'a'`: The probability that we would see the estimate we did among Mormons if the "both" treatment had no effect among Mormons
- `'b'`: The probability that the "both" treatment has an effect among Mormons
- `'c'`: The probability that the "both" treatment does not have an effect among Mormons
- `'d'`: The probability that the effect among Mormons is larger than the effect among non-Mormons

Enter your answer below between quotes. For example, if you wanted to answer 'a', your answer would look like: `q5.answer <- 'a'`.


In [None]:
q5.answer <- '...'

-----

**Question 6.** Which one of the following would be an example of $p$-hacking?

- `'a'`: Testing whether the "both" treatment caused precincts to be more likely to elect women
- `'b'`: Splitting up the dataset by county, estimating the effect of the "both" treatment in every county in Utah, and then focusing on the results only in counties where the $p$-value is statistically significant
- `'c'`: Calculating the ends of a confidence interval by computing $\text{Estimate} + 1.96 \times \text{Standard Error}$ and $\text{Estimate} - 1.96 \times \text{Standard Error}$
- `'d'`: Deciding in advance of running an experiment to increase the sample size so that the standard error will be lower

Enter your answer below between quotes. For example, if you wanted to answer 'a', your answer would look like: `q6.answer <- 'a'`.


In [None]:
q6.answer <- '...'

-----

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save and Checkpoint`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("Week8_Activity1.ipynb")

After you hit "Run" on the cell above, wait for a moment (about 5 seconds), then click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try the "old way": hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.