# POLSCI 3

## Week 4 Activity 1: Causality and Potential Outcomes

For this activity, we will use what we've learned from class about potential outcomes in order to analyze data from a company's workplace wellness program.

Many companies and organizations have workplace wellness programs that encourage employees to live healthier lives. (<a href="https://hr.berkeley.edu/hr-network/central-guide-managing-hr/managing-hr/wellness/health-safety/services/health-matters" target="_blank">Berkeley has one too!</a>) One reason is that companies and organizations have to pay higher health insurance costs if their employees use lots of expensive medical care. So, **workplace wellness programs try to encourage employees to live healthier lives, with the goal of reducing the amount of money the employees spend on medical care, thereby reducing how much money the company has to spend on employee health insurance**.

But do these workplace wellness programs actually reduce employees' medical care costs? This week, we'll analyze data from a real study that looked at this question. Today, though, we'll pretend we are omniscient and can see all the potential outcomes in this study.

### Part 1: Importing the Data

In [None]:
#RUN THIS CELL
library(testthat)

# Read in the data.
data <- read.csv('ps3_wellness_with_POs.csv', stringsAsFactors = FALSE)
data # Let's take a look at the data.

This dataset contains real data on participants in a wellness program... and a couple fake variables we've added in: the potential outcomes.

Each row in this dataset represents a unique person, and the data measures medical care costs before and after the wellness program started. Here is more information about the variables:

- `name`: Respondent Name
- `participate`: `0` if the person did not participate in the workplace wellness program; `1` if the person did participate in the workplace wellness program
- `baseline`: Average monthly medical costs at baseline; that is, before the program started.
- `po_control`: Monthly cost of medical care for this person after the workplace wellness program started, **if the person _did not_ participate in the program**. In this study, this is the control potential outcome (because this is what would have happened if these people did not receive the treatment; i.e., if they did not participate in the wellness program).
- `po_treat`: Monthly cost of medical care for this person after the workplace wellness program started, **if the person _did_ participate in the program**. In this study, this is the treatment potential outcome (because this is what would have happened if these people received the treatment, which is the wellness program).
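
To make the layout above concrete, here is a minimal sketch of a toy data frame with the same five columns. The names and numbers are invented for illustration and are not taken from the real dataset; the object is called `toy` so that it does not overwrite `data`.

```r
# Hypothetical toy data: every name and number here is made up for illustration.
toy <- data.frame(
  name        = c("Ana", "Ben", "Cy"),
  participate = c(1, 0, 1),        # 1 = participated, 0 = did not
  baseline    = c(200, 150, 300),  # monthly costs before the program
  po_control  = c(210, 150, 280),  # costs if the person did NOT participate
  po_treat    = c(210, 120, 280),  # costs if the person DID participate
  stringsAsFactors = FALSE
)
toy       # each row is one person, with both potential outcomes visible
str(toy)  # shows the type of each column
```

Note that each row carries both `po_control` and `po_treat`, regardless of the value of `participate`.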

-----

### Part 2: Data Analysis

----

**Question 1a (multiple choice).** In this example, what is the "**treatment**"?

- `a`: Respondents' names
- `b`: Participating in the wellness program
- `c`: Average monthly medical costs before the program started
- `d`: Average monthly medical costs after the program started

Enter `a`, `b`, `c`, or `d` between the quotes below to answer the question. For example, to answer `a`, write `q1a.answer <- 'a'`.


In [None]:
q1a.answer <- '...'

----

**Question 1b (multiple choice).** In this example, what is the "**outcome**"?

- `a`: Respondents' names
- `b`: Participating in the wellness program
- `c`: Average monthly medical costs before the program started
- `d`: Average monthly medical costs after the program started

Enter `a`, `b`, `c`, or `d` between the quotes below to answer the question. For example, to answer `a`, write `q1b.answer <- 'a'`.


In [None]:
q1b.answer <- '...'

-----

**Question 2.** Create a new variable in the dataset called `treatment_effect` which contains the treatment effect for every individual. Do this by calculating the treatment potential outcome minus the control potential outcome.

*Hint: The last slide of the lecture gave you the line of code to do this!*
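
In potential-outcomes notation, the definition above can be written compactly. The symbols are a common convention and an assumption here, not something this notebook defines: $Y_i(1)$ stands for person $i$'s `po_treat`, $Y_i(0)$ for their `po_control`, and $\tau_i$ for their treatment effect.

$$\tau_i = Y_i(1) - Y_i(0)$$

For a made-up person with $Y_i(1) = 100$ and $Y_i(0) = 120$, this gives $\tau_i = 100 - 120 = -20$.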


In [None]:
# Replace ... with the right answer
data$treatment_effect <- ...
data$treatment_effect # Let's print the result.

----

**Question 3 (multiple choice).** Why is the treatment effect defined as the treatment potential outcome minus the control potential outcome?

- `a`: To understand the effect of the treatment, we compare outcomes after the treatment was administered to outcomes from before the treatment was administered
- `b`: We want to control for confounding factors when analyzing data
- `c`: This captures what additional effect (if any) the treatment would have relative to what would have happened anyway if the treatment had not been administered
- `d`: The people in the sample might not be representative of the broader population, and we use the control potential outcome to adjust for this

Enter `a`, `b`, `c`, or `d` between the quotes below to answer the question. For example, to answer `a`, write `q3.answer <- 'a'`.


In [None]:
q3.answer <- '...'

------

Before we move on, let's look at what the dataset looks like now that we've added our new variable to it, `treatment_effect`:

In [None]:
data # Just run this cell, no need to change it.

As you can see, the treatment effect is zero for most people, but some people have non-zero treatment effects.

Let's look at the data for one person, Haley, whose treatment effect isn't zero:

In [None]:
subset(data, name == 'Haley') # Just run this cell, no need to change it.

----

**Question 4 (multiple choice).** What does it mean that Haley's treatment effect is -50?

- `a`: By random chance, Haley happened to spend less on medical care during the month when she was participating in the workplace wellness program.
- `b`: If Haley participated in the workplace wellness program, she would spend \$50 less per month on medical care than if she did not participate in the workplace wellness program.
- `c`: The workplace wellness program causes Haley to spend \$50 more per month on medical care.
- `d`: Haley spent \$50 less per month on medical care during the workplace wellness program than she did in the month before the program started.

Enter `a`, `b`, `c`, or `d` between the quotes below to answer the question. For example, to answer `a`, write `q4.answer <- 'a'`.


In [None]:
q4.answer <- '...'

-----

**Question 5.** Now let's think about everyone in this dataset, not just Haley. What is the *average treatment effect* of the workplace wellness program for the people in the `data` dataset? (This isn't a trick question: what is the average or *mean* of the `treatment_effect` variable?)
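
As a minimal sketch of what averaging individual treatment effects looks like, the example below uses a hypothetical vector of five made-up effects (not values from `data`) and base R's `mean()`.

```r
# Hypothetical individual treatment effects for five made-up people.
toy_effects <- c(0, -20, 0, 10, 0)

# The average treatment effect is the mean of the individual effects:
# (0 - 20 + 0 + 10 + 0) / 5 = -2
toy_ate <- mean(toy_effects)
toy_ate
```

The names `toy_effects` and `toy_ate` are illustrative only, chosen so they do not clash with `average.treatment.effect` below.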


In [None]:
average.treatment.effect <- NULL # YOUR CODE HERE
average.treatment.effect # Prints your answer.

-----

**Question 6 (multiple choice).** Which of the following claims is/are true?

1. In real life, we could never observe the true treatment effect of the wellness treatment for any specific individual
2. An individual's treatment effect cannot be zero

- `a`: Only number 1 is true
- `b`: Only number 2 is true
- `c`: 1 and 2 are both true
- `d`: Neither 1 nor 2 is true (both are false)

Enter `a`, `b`, `c`, or `d` between the quotes below to answer the question. For example, to answer `a`, write `q6.answer <- 'a'`.


In [None]:
q6.answer <- '...'

----

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

In [None]:
ottr::export("Week4_Activity1.ipynb", pdf = TRUE, force_save = TRUE)