## Problem Set 5
### UGBA 88: Data and Decisions, Fall 2019

In [None]:
#run this cell once, then *restart kernel*
%pip install gsExport

In [1]:
from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import numpy as np
import gsExport

Deadline: This assignment is due Monday, October 21st at noon (12pm). Late work will not be accepted.

You will submit your solutions using both OKpy and Gradescope. You will find detailed submission instructions ([here](https://docs.google.com/document/d/1vrg66vGtBf93xt4-LUQPpacUAQAxIJEeJ10fRsb8oUc/)). **Please do not remove or add cells and please ignore the '#newpage' cells** (these are here to facilitate Gradescope submission).

You should start early so that you have time to get help if you're stuck. Post questions on [Piazza](https://piazza.com/class/jzw0f05ebpof0). Check the syllabus for the office hours schedule. Remember that Connector Assistant office hours are for *coding questions only*.

## Question 1: Oregon Health Study

**(31 points)** In this question we will analyze data from the Oregon Health Study, discussed in detail in Chapter 1 of *Mastering ’Metrics.* Recall that there is non-compliance in this experiment: there are participants who win the lottery but *do not* ultimately enroll in Medicaid, and there are participants who do not win the lottery but *do* enroll in Medicaid. We will estimate the causal effect of Medicaid coverage in this context.

(Technical note: the results from this exercise will not match the results presented in the book exactly. In particular, the data we use here are limited to participants who live alone. We do this because including participants from larger households complicates the analysis in ways that are outside the scope of this problem set. For more details, read the paper on the Oregon Health Study cited in *Mastering ’Metrics*.)

Run the cell below to read in the data. Each row of the table represents a participant in the Oregon Medicaid lottery.

In [None]:
#run this cell to read in the data
ohs = Table.read_table("ohs_final.csv")
ohs.show(5)

**a. (2 points)** The column `any_medicaid` defines whether a participant is with or without Medicaid coverage. Compare the following columns for those two groups:

* `doc_any`: indicator for whether participant has any primary care doctor visits after lottery

* `cost_any_owe`: indicator for whether participant owes any money for medical expenses 12 months after lottery

* `poor_health`: indicator equaling 1 if the participant self-reports their health is poor 12 months after lottery

* `female`: indicator for whether participant is female

To compare these columns, compute the means of each outcome or characteristic by Medicaid coverage status, and then compute and print the difference in means.
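As a reminder of the mechanics, here is a minimal sketch of a difference in means between two groups defined by a 0/1 indicator. The values are made up for illustration (they are not the Oregon Health Study data), and `toy_rows` and `difference_in_means` are hypothetical names; the sketch uses plain Python lists rather than a `Table` so it runs on its own.

```python
from statistics import mean

# Hypothetical toy records, NOT the Oregon Health Study data:
# each pair is (group indicator, outcome indicator).
toy_rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0)]

def difference_in_means(rows):
    """Mean outcome where the indicator is 1 minus mean outcome where it is 0."""
    in_group = [outcome for group, outcome in rows if group == 1]
    out_of_group = [outcome for group, outcome in rows if group == 0]
    return mean(in_group) - mean(out_of_group)

toy_difference = difference_in_means(toy_rows)
print(round(toy_difference, 3))
```

The same logic should carry over to a `Table`: select the rows for each group, take the mean of the column of interest, and subtract.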

In [2]:
#write code here

**b. (3 points)** Are the differences you’ve measured in **part a** a good estimate for the causal effect of some treatment? If not, why? If so, for what treatment? (You can ignore the possibility of interference or attrition.)

*Write answer here*

**c. (2 points)** Now, compare and print the same outcomes described in **part a** but for *lottery winners and losers*. This is defined by the column `win_lottery`. A value of 1 means the participant won the lottery; a value of 0 means the participant lost the lottery.

In [3]:
#write code here

**d. (3 points)** Are the differences you’ve measured in **part c** a good estimate for the causal effect of some treatment? If not, why? If so, for what treatment? (Again, you can ignore the possibility of interference or attrition.)

*Write answer here*

Next you will use this experiment to estimate the causal effect of Medicaid coverage. To do that, you will *instrument* for Medicaid coverage using `win_lottery`, the indicator for winning the lottery.

**e. (6 points)** What conditions or assumptions are required for this approach to be valid? Describe each of these conditions in the context of this experiment.

*Write answer here*

**f. (3 points)** In the context of this experiment, what defines a *complier*?

*Write answer here*

**g. (6 points)** For simplicity, let’s assume the monotonicity assumption holds. That is, there are no **defiers**. In class, we discussed how the first stage provides an estimate for the share of experiment participants that are compliers. Based on the data, what would you estimate is the share of participants that are **always-takers**? **Never-takers**?
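To recall what the first stage measures, here is a minimal sketch on made-up values (not the study data). It computes the treatment take-up rate in each arm of the instrument and the gap between them. The names `toy_assignments`, `take_up_rate`, and `first_stage_estimate` are hypothetical.

```python
from statistics import mean

# Hypothetical toy records, NOT the study data:
# each pair is (instrument value, treatment received).
toy_assignments = [(1, 1), (1, 1), (1, 0), (1, 1),
                   (0, 0), (0, 1), (0, 0), (0, 0)]

def take_up_rate(rows, instrument_value):
    """Share of rows with the given instrument value that received treatment."""
    return mean([treated for assigned, treated in rows
                 if assigned == instrument_value])

rate_if_assigned = take_up_rate(toy_assignments, 1)
rate_if_not_assigned = take_up_rate(toy_assignments, 0)

# First stage: the effect of the instrument on treatment take-up.
first_stage_estimate = rate_if_assigned - rate_if_not_assigned
print(rate_if_assigned, rate_if_not_assigned, first_stage_estimate)
```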

In [4]:
#write code here

*Write answer here*

**h. (3 points)** Write a general function (in Python) that calculates the *Local Average Treatment Effect* (LATE) for a given outcome, treatment, and instrument.

In [5]:
#write code here

**i. (3 points)** Apply that function here for the outcome `cost_any_owe`, and describe your findings in a sentence. (If you're unable to create the function, just compute the LATE manually.)

In [6]:
#write code here

*Write description here*

#newpage

## Question 2: Should You Break Up? Flip a Coin.

**(19 points)** In this question, we study an experiment conducted by Steve Levitt, an economist you may know from his [Freakonomics](http://freakonomics.com/) book series and film.

In our lives, we face many difficult decisions that have potentially far-reaching consequences for our well-being: for example, whether to quit a job, end a relationship, or move across the country. Often we are quite uncertain about what to do. Levitt's research question is this: do we make good choices when facing such important decisions?

This is an ambitious question to answer. In the 'ideal' experiment for answering this question we would take a sample of people unsure about a major life decision; say, for example, whether to quit their job. Then we would *randomize* which decision they actually make and compare measures of well-being. From this hypothetical experiment we could measure the causal effect of quitting your job (for example) on well-being among those considering that decision.

Of course, running an experiment like this is impossible in practice -- we cannot force people to quit their job or end a relationship, so even if we randomly *assign* participants to make a particular choice, there is a clear noncompliance problem. Participants can ultimately decide as they please. However, from the tools we developed in class, we can potentially deal with noncompliance.

Here's how Levitt's experiment works: each participant in the experiment has stated that they are having a difficult time making an important life decision. (*I've included the list of major decisions that were included in the study below.*) Each decision has two choices: an 'active choice' (e.g., quit the job) and a 'status quo' choice (don't quit the job). Then, for each participant, *a coin is flipped*. One choice is assigned to 'heads' and the other choice is assigned to 'tails'. The outcome of the coin toss is random, and the participant is shown that outcome. If the coin toss dictates that the participant make a change, the participant is encouraged to make that change within two months. If the coin toss dictates that the participant maintain the status quo, the participant is told to maintain the status quo for at least the next two months. In this sense, participants are randomly 'assigned' to make one choice or another. (Remarkably, as you'll see below, some participants actually follow what the coin dictates.) Six months after the coin toss, participants are surveyed on: (1) what choice they ultimately made and (2) how happy they are. About 3,000 participants facing important life decisions complete this survey.

Happiness is measured on a 10-point scale. For reference, the standard deviation for a participant's *change* in happiness over 6 months is about 2.5 points.

To summarize, the *outcome of interest* is reported happiness. The *treatment of interest* is an indicator for making the active choice (e.g. quit the job). And the *instrument* is an indicator for whether the coin toss dictates an active choice.
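With an outcome, a treatment, and an instrument in hand, the standard instrumental-variables (Wald) estimate can be written as a ratio of two differences in means. The notation here is an assumption and may differ from the notation used in lecture: $Y_i$ is reported happiness, $D_i$ is the indicator for making the active choice, $Z_i$ is the indicator for the coin toss dictating the active choice, and $\bar{Y}_{Z=1}$ denotes the sample mean of $Y_i$ among participants with $Z_i = 1$.

$$
\widehat{\text{LATE}} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
$$

The numerator is the effect of the instrument on the outcome (the reduced form) and the denominator is the effect of the instrument on the treatment (the first stage). The ratio has a causal interpretation only if the instrument assumptions hold.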

*You won't need to read this list to answer the questions below, but for reference, these are the important life decisions that are included in the study:*
 * Should I quit my job?
 * Should I end a relationship?
 * Should I go back to school?
 * Should I start my own business?
 * Should I move?
 * Should I quit smoking?
 * Should I have a child?
 * Should I propose?

**a. (3 points)** Applying the same definitions from lecture, what does the term $Y_{1i}$ represent in this context? What does the term $D_{1i}$ represent in this context?

*Write answer here*

**b. (3 points)** Levitt finds that, regardless of whether the participant follows the dictate of the coin toss, participants that take the *active* choice report higher happiness. Does this evidence indicate that making the *active* choice increases happiness on average among participants?

*Write answer here*

The table below summarizes results from the experiment:

<img src="cointoss_table.png" alt="Drawing" style="width: 600px;"/>

**c. (4 points)** Describe in words what defines a **complier** in the context of this experiment. Approximately what share of participants are compliers? Assume there are no **defiers**.

*Write answer here*

**d. (3 points)** What is the **exclusion restriction** that must be satisfied for us to use the coin toss as an *instrument* for the participant's decision?

*Write answer here*

**e. (3 points)** Suppose the necessary assumptions are satisfied for us to use the outcome of the coin toss as an *instrument*. Among compliers, what is the average causal effect of making the active choice? (You should report a number and use appropriate units in your description.) What does the experiment imply that compliers should do (at least on average)?

*Write answer here*

**f. (3 points)** Compliers may be an unusual bunch. How do you think the average treatment effect for **never-takers** would compare to your LATE estimate? Why?

*Write answer here*

If you're interested in learning more about this experiment, [here's a write-up](https://www.theatlantic.com/business/archive/2016/08/quitting-your-job-and-other-life-choices/495122/) with more detail.

## Submission

Before submitting, please click "Kernel" above and click "Restart & Run All" to ensure all of your code is working as expected. This is important. Code that does not run cannot be graded. After confirming that all of your work looks and runs as you'd like it to, run **BOTH** of the below cells to submit your work.

Make sure that the following runs successfully for submission to OKpy.

In [None]:
from client.api.notebook import Notebook
ok = Notebook('pset5.ok')                
_ = ok.auth(inline=True)
_ = ok.submit()

Then, make sure that the following runs successfully to generate a PDF to upload to Gradescope. **Do not upload any PDF to Gradescope other than the one generated by the code below.** If you have difficulty downloading the PDF, please review the submission instructions ([here](https://docs.google.com/document/d/1vrg66vGtBf93xt4-LUQPpacUAQAxIJEeJ10fRsb8oUc/)) or see Piazza for troubleshooting steps.


In [None]:
gsExport.generateSubmission('pset5.ipynb')