--- 
Project for the course in Microeconometrics | Summer 2021, M.Sc. Economics, Bonn University | [Bahar Coskun](https://github.com/baharcos)

# Replication of Joshua Angrist, Daniel Lang, and Philip Oreopoulos (2009) <a class="tocSkip">   
---

The following notebook contains my replication of the results of

> Angrist, J., Lang, D., & Oreopoulos, P. (2009). Incentives and Services for College Achievement: Evidence from a Randomized Trial. American Economic Journal: Applied Economics, 1(1), 136-63.

#### Downloading and Viewing this Notebook

* To ensure that every image and format is displayed properly, I recommend downloading this notebook from its repository on [GitHub](https://github.com/OpenSourceEconomics/ose-data-science-course-project-baharcos). Other viewing options like _MyBinder_ or _NBViewer_ may have issues displaying formulas and formatting.

* The original data, code and paper can be found [here](https://economics.mit.edu/faculty/angrist/data1/data/angrist,1).

#### Information about the Replication and Individual Contributions

* I have labeled all figures and tables in this notebook in ascending order to avoid confusing the reader. For replicated figures and tables, I always indicate the label of the corresponding figure or table in the original paper.

* I always state explicitly when I am giving the opinion of the authors.

* All sections and subsections that are labeled as *Extension* are my independent contributions.  

* For more detailed information, I refer the reader to Angrist et al. [(2009)](https://www.aeaweb.org/articles?id=10.1257/app.1.1.136). I replicate the results of the main analysis.

* The layout and the structure of the notebook is inspired by Annica Gehlen's replication of Jason M. Lindo, Nicholas J. Sanders & Philip Oreopoulos (2010) from the 2019 iteration of the Microeconometrics class. [(link)](https://github.com/amageh/replication-performance-standards/blob/master/replication-notebook.ipynb)

<h1> WiP: Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-Introduction" data-toc-modified-id="1.-Introduction-1">1. Introduction</a></span></li><li><span><a href="#2.-Data" data-toc-modified-id="2.-Data-2">2. Data</a></span></li><li><span><a href="#3.-Identification and Empirical Strategy" data-toc-modified-id="3.-Identification and Empirical Strategy-2">3. Identification and Empirical Strategy</a></span></li> <li><span><a href="#4.-Econometric Background" data-toc-modified-id="4.-Econometric Background-4">4. Econometric Background</a></span></li><li><span><a href="#5.-Replication-of-Angrist-et-al.-(2009)" data-toc-modified-id="5.-Replication-of-Angrist-et-al.-(2009)-3">5. Replication of Angrist et al. (2009)</a></span></li><li><span><a href="#6.-Conclusion-and-Critical-Assessment" data-toc-modified-id="6.-Conclusion-and-Critical-Assessment-6">6. Conclusion and Critical Assessment</a></span></li>
<li><span><a href="#7.-References" data-toc-modified-id="7.-References-7">7. References</a></span></li></ul></div></li>

---

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from auxiliary.colors import get_colors, plot_colors
```

---
# WiP: 1. Introduction
---
Will need to work on the argumentation:

***Motivation***:
Angrist et al. (2009) are motivated by the importance of academic performance, persistence (not dropping out) and on-time completion for the post-secondary education experience, outcomes that many students, particularly those from low-income families, struggle to achieve. One reason for this is poor study skills. In response, many North American universities offer academic support services to improve skills such as note-taking, time management and goal-setting. However, non-experimental research on academic support services for college students has found mixed results, even though experimental evaluations of similar services for high school students paint a more promising picture.
Furthermore, merit scholarships have long been part of college education as a way to motivate better academic performance, although only a small group of exceptional students has benefited from this financial incentive.

***Study Design***: \
Angrist et al. (2009) analyse the results of the Student Achievement and Retention (STAR) Demonstration Project. STAR was designed as a randomised evaluation of the effects of support services and financial incentives on academic performance at a satellite campus of a large Canadian university, which in U.S. terms resembles a large state university with heavily subsidised tuition. Students attending are mostly from the local area and have similar high school backgrounds, which eliminates geographic background differences that could otherwise arise between treatment groups.

***Estimation Strategy***:\
Students are randomly allocated to the control group or one of the treatment groups. Students in a treatment group receive an offer and then have to sign up to be eligible, and sign-up is not random. In order to estimate the causal impact of financial incentives and academic support services, Angrist et al. (2009) use **instrumental variables (IV)**. However, due to the low compliance rate, most results reported are **intent-to-treat (ITT)** estimates. The data cover 1656 full-time first-year students and combine administrative data with survey data collected from the students **prior** to treatment assignment.


**Main variables** 

| **Treatments**   | **Outcome Variables**  |
|------------------|------------------------|
| Services         | GPA                    |
| Incentives       | On probation/withdrew  |              
| Combined         | Credits earned         | 


***Results***: \
Main findings of Angrist et al. (2009) are: 
- Females use the services much more than males
- The interventions left male achievement unchanged
- Services alone failed to attract many students
- For females, services combined with financial incentives increased GPA by about 0.35$\sigma$


Write what is in this notebook

---
# WiP: 2. Study Design, Data and Descriptives (the extension)
---
**2.1. STAR Demonstration Project**\

STAR randomly assigned entering first-year undergraduates to the control group or to one of the following treatment groups:
- **Service strategy (SSP, Student Support Program):** peer advisors are trained upper-year students from the treated students' own program who offer academic advice and suggestions for coping with the first year of university. Advisors email their students regularly (at least biweekly) about adjusting to university, scheduling, studying and time management, and remind them that they can meet at the STAR office. Advisors are also trained to recognise circumstances that call for professional help. In addition, Facilitated Study Groups (FSGs) complement regular tutorials to improve study habits and develop reasoning skills, focusing on critical thinking, note-taking, graphic organisation, questioning techniques, vocabulary acquisition, and test prediction and preparation. FSGs were offered for approximately half of first-year courses; some large courses offered the service to all students because it was already in place before the experiment.

- **Incentive strategy (SFP, Student Fellowship Program):** award targets for first-year grades depend on the student's high school grade quartile:
  - lowest quartile: \$5,000 for B, \$1,000 for C+
  - second quartile: \$5,000 for B+, \$1,000 for B-
  - third quartile: \$5,000 for A-, \$1,000 for B

  To qualify, students had to take at least four courses (completing the degree on time requires five courses per semester) and register for the second year (all program students who met their targets did so). The targets reflect a trade-off between program costs and the accessibility of the awards.

- **Combined strategy (SFSP, Student Fellowship and Support Program):** students are offered both services and incentives, but the two are not linked: students can use the services without being eligible for a fellowship, and can be eligible for a fellowship without using the services.
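The SFP award schedule amounts to a small lookup table. The following Python sketch encodes it under stated assumptions: the helper `sfp_award`, the letter-grade ordering `GRADE_ORDER`, and the simplification that first-year performance is summarised by a single letter grade are hypothetical and chosen for illustration; the second-year registration requirement is not modelled.

```python
# Hypothetical encoding of the SFP award schedule (letter grades only;
# the second-year registration requirement is NOT modelled).
GRADE_ORDER = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

# High school grade quartile -> (grade needed for $1,000, grade needed for $5,000)
AWARD_TARGETS = {
    1: ("C+", "B"),   # lowest quartile
    2: ("B-", "B+"),  # second quartile
    3: ("B", "A-"),   # third quartile
}


def sfp_award(hs_quartile: int, grade: str, n_courses: int) -> int:
    """Return the fellowship amount in dollars, or 0 if no target is met."""
    # Students outside the three quartiles or taking fewer than four courses get nothing
    if hs_quartile not in AWARD_TARGETS or n_courses < 4:
        return 0
    low_target, high_target = AWARD_TARGETS[hs_quartile]
    rank = GRADE_ORDER.index(grade)
    if rank >= GRADE_ORDER.index(high_target):
        return 5000
    if rank >= GRADE_ORDER.index(low_target):
        return 1000
    return 0


print(sfp_award(1, "B", 4))   # lowest quartile, meets the $5,000 target -> 5000
print(sfp_award(3, "B+", 5))  # third quartile, meets only the $1,000 target -> 1000
print(sfp_award(2, "A", 3))   # too few courses, so no award -> 0
```

Encoding the targets as a dictionary keeps the schedule in one place, so a different target structure only requires changing `AWARD_TARGETS`.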

**2.2 Data**

The main source of data is the baseline survey. To do: state which variables come from administrative data and which from the survey.



```python
data = pd.read_stata("data/STAR_public_use.dta")  # read the data
```

```python
data.info()  # check variables, missing values and data types
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1656 entries, 0 to 1655
Data columns (total 48 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   GPA_year1           1537 non-null   float32 
 1   GPA_year2           1368 non-null   float32 
 2   age                 1656 non-null   int8    
 3   chooseUTM           1472 non-null   float32 
 4   compsurv            1656 non-null   float32 
 5   control             1656 non-null   int8    
 6   credits_earned1     1575 non-null   float32 
 7   credits_earned2     1575 non-null   float32 
 8   dad1                1472 non-null   float32 
 9   dad2                1472 non-null   float32 
 10  dad_edn             1472 non-null   category
 11  english             1656 non-null   int8    
 12  female              1656 non-null   int8    
 13  finish4             1472 non-null   float64 
 14  goodstanding_year1  1634 non-null   float32 
 15  goodstanding_year2  1634 non-null   fl
```

**Plot Ideas**
- 
- 

---
# WiP: 3. Econometric Background, Empirical Strategy and Identification
---
The main goal of Angrist et al. (2009) is to determine the causal effect of incentives and services on academic performance.

Random assignment is incomplete: the randomised offer does not force students to participate, and because students choose whether to sign up, comparing participants with non-participants suffers from selection bias. The sign-up procedure, which imposes a small cost, was used to identify students who are actually motivated to use the program, so that participants would use, or at least care about, the program to some degree. Only students who received an offer can participate in one of the programs, and participation is expected to improve academic performance.

- The instrument is the offer.
- The first stage is the effect of the offer on sign-up rates, i.e. how much more likely students are to sign up when they receive an offer. This is given by the compliance rate.
- The second stage is the effect of participation on academic performance.
- The reduced form (ITT, the effect of the offer on academic performance) equals the effect of the offer on sign-up rates times the effect of participation on academic performance.
- What we want to find out is the effect of participation on academic performance.
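To make this decomposition concrete, here is a small simulation sketch in Python. All parameters (the compliance rate, the participation effect and the size of the selection term) are hypothetical and chosen for illustration; they are not estimates from Angrist et al. (2009). The simulation shows that the reduced form is approximately the first stage times the participation effect, while a naive comparison of participants with everyone else is biased upward because motivated students select into the program.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Hypothetical parameters chosen for illustration, not estimates from the paper
compliance = 0.5  # share of offered students who sign up
effect = 0.3      # effect of participation on the outcome
selection = 0.4   # motivated students (would-be sign-ups) do better anyway

z = rng.integers(0, 2, size=n)          # randomised offer: the instrument
motivated = rng.random(n) < compliance  # would sign up if offered
p = (z == 1) & motivated                # only offered students can participate
y = rng.normal(size=n) + selection * motivated + effect * p  # e.g. standardised GPA

first_stage = p[z == 1].mean() - p[z == 0].mean()   # offer -> sign-up
reduced_form = y[z == 1].mean() - y[z == 0].mean()  # offer -> outcome (ITT)
naive = y[p].mean() - y[~p].mean()                  # participants vs. everyone else

print(f"first stage x effect: {first_stage * effect:.3f}")
print(f"reduced form (ITT):   {reduced_form:.3f}")
print(f"naive comparison:     {naive:.3f}  (biased by self-selection)")
```

Because the offer is random, the selection term cancels out of the reduced form but not out of the naive participant comparison.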


<img src="files/causalgraph.png" width=500 />

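**Key Assumptions:**

**Substantial first stage:** the offer must substantially change sign-up. Since only students who receive an offer can sign up, the first stage equals the sign-up rate among offered students, so it is safe to say that this assumption holds as long as a meaningful share of them sign up.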

**Independence assumption:** the instrument has to be as good as randomly assigned. The offer is given to students by random assignment. The authors nevertheless check covariate balance across treatment groups in Table 1, to verify that no group of students is suspiciously more likely to be selected for treatment.
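A balance check of this kind can be sketched as a difference in covariate means between each treatment group and the control group. This is a minimal sketch, not the authors' exact specification for Table 1. It assumes a single column `group` with the hypothetical labels `control`, `ssp`, `sfp` and `sfsp`; the public-use file contains a `control` indicator, so such a column would first have to be constructed. The covariates `age` and `female` appear in the data, but the values below are simulated.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def balance_table(df: pd.DataFrame, group_col: str, covariates: Sequence[str],
                  control_label: str = "control") -> pd.DataFrame:
    """Difference in covariate means (treatment minus control) with Welch t-statistics."""
    control = df[df[group_col] == control_label]
    rows = []
    for group in df[group_col].unique():
        if group == control_label:
            continue
        treated = df[df[group_col] == group]
        for cov in covariates:
            t_vals, c_vals = treated[cov].dropna(), control[cov].dropna()
            diff = t_vals.mean() - c_vals.mean()
            # Unequal-variance (Welch) standard error of the difference in means
            se = np.sqrt(t_vals.var(ddof=1) / len(t_vals) + c_vals.var(ddof=1) / len(c_vals))
            rows.append({"group": group, "covariate": cov, "diff": diff, "t_stat": diff / se})
    return pd.DataFrame(rows)


# Simulated data with random assignment, so differences should be small
rng = np.random.default_rng(seed=1)
n = 1656
sim = pd.DataFrame({
    "group": rng.choice(["control", "ssp", "sfp", "sfsp"], size=n),
    "age": rng.integers(17, 21, size=n),
    "female": rng.integers(0, 2, size=n),
})
print(balance_table(sim, "group", ["age", "female"]))
```

Large absolute t-statistics for many covariates would be a warning sign that assignment was not as good as random.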

**Exclusion restriction:** the instrument must affect the outcome only through the variable of interest (participation). The authors check this by running an additional over-identified IV regression (to be verified).

If these assumptions hold, the causal effect of participation can be identified by IV.
The results reported in the paper are mainly ITT estimates. I will run additional checks on which group is more likely to sign up.

**Evaluation Framework**\
ITT: no correction for sign-up, so the estimate is diluted by non-compliance.\
IV: the offer of services is used as an instrument for program sign-up, to estimate the effect of treatment on those who signed up (who are not a random subset of students).\
$P_i = 1$ indicates that student $i$ gave consent (signed up) to receive emails.\
$Z_i$ is the random variable indicating the randomly assigned offer of treatment.

\begin{equation}
E[Y_{1i} - Y_{0i} |P_i=1] = (E[Y_i|Z_i = 1] - E[Y_i|Z_i = 0]) / Pr[P_i = 1|Z_i = 1]
\end{equation}

- $E[Y_{1i} - Y_{0i} |P_i=1]$ is the true causal effect of participation in one of the treatment programs, for those who participate.

- $E[Y_i|Z_i = 1] - E[Y_i|Z_i = 0]$ is the intent-to-treat estimate, i.e. the effect of the offer.

- $Pr[P_i = 1|Z_i = 1]$ is the compliance rate, i.e. the share of offered students who sign up.
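The Bloom estimator above can be computed directly from three arrays: outcomes $Y_i$, offers $Z_i$ and participation $P_i$. The following is a minimal sketch; the function name `bloom_estimate` is hypothetical, and it assumes one-sided non-compliance (students without an offer cannot participate), which holds in STAR because only students who received an offer could sign up.

```python
import numpy as np


def bloom_estimate(y, z, p):
    """Effect of participation on participants: ITT divided by the compliance rate.

    Assumes one-sided non-compliance: only students with z == 1 can have p == 1.
    """
    y, z, p = (np.asarray(a, dtype=float) for a in (y, z, p))
    if p[z == 0].any():
        raise ValueError("participation without an offer violates one-sided non-compliance")
    itt = y[z == 1].mean() - y[z == 0].mean()  # E[Y|Z=1] - E[Y|Z=0]
    compliance = p[z == 1].mean()              # Pr[P=1|Z=1]
    return itt / compliance


# Toy example: four offered students (two sign up) and four controls
y = [3.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
p = [1, 1, 0, 0, 0, 0, 0, 0]
print(bloom_estimate(y, z, p))  # ITT 0.5 / compliance 0.5 -> 1.0
```

Dividing the ITT by the compliance rate rescales the diluted offer effect to the effect on those who actually signed up.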

