--- 
Microeconometrics | Summer 2021 | M.Sc. Economics, Bonn University 

# Replication of Angrist, J., and Evans, W. (1998). "Children and Their Parent's Labor Supply: Evidence from Exogenous Variation in Family Size". <a class="tocSkip">   

[Carolina Alvarez](https://github.com/carolinalvarez)
---

**Angrist, J.D., & Evans, W.N. (1998).** [Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size](https://www.jstor.org/stable/116844?seq=1). *The American Economic Review*, 88(3). 450-477. 

# Table of contents
* [Introduction](#Introduction)
* [Identification Strategy](#Identification)
* [Empirical Methodology](#Empirical-Methodology)
* [Replication Angrist & Evans (1998)](#Replication-of-Angrist-&-Evans-(1998))
 * [Data & Descriptive Statistics](#Data-&-Descriptive-Statistics)

In [1]:
#%matplotlib inline
import numpy as np
import pandas as pd
import pandas.io.formats.style
import seaborn as sns
import statsmodels as sm
import statsmodels.formula.api as smf
import statsmodels.api as sm_api
import matplotlib as plt
from IPython.display import HTML

---
# Introduction 
---

Angrist and Evans (1998) study the causal mechanisms between fertility and the work effort of both men and women. The authors begin by explaining the theoretical and practical reasons of studying the relationship between fertility and labor supply. First, there has been development of economic models that link the family and the labor market. Second, the relationship between fertility and labor supply could explain the increase of women's participation in the labor market in the post-war period, where having fewer children could have increased the female labor-force share. Meanwhile, other studies have linked fertility with female withdraws from the labor market and lower wages compared to men.

The mayority of empirical studies related to childbearing and labor supply find a negative correlation between family size (i.e., fertility) and female labor force. However, in his assesment of Economics of the Family, Robert J. Willis argues that there has not been well-measured exogenous variables that allow to separate cause and effect relationships from correlations among variables such as delay of marriage, decline of childbearing, increase in divorces, and increase in female labor force participation.

In this vein, the authors argue that the problems concerning the causal association between family size and labor supply arises from the theoretical argument that both factors are jointly determinated. For example, some labor-supply econometric models often use child-status variables as regressors on hours of work. On the other hand, economic demographers usually measure the effect of wages on fertility. According to the authors, "*since fertility variables cannot be both dependent and exogenous at the same time, it seems unlikely that either sort of regression has a causal interpretation*". 

Angrist and Evans (1998) contribute by using an **instrumental variable strategy (IV)** based on the sex-mix of children in families with two or more kids. This captures parental preferences for mixed-sex siblings, where parents of same-sex children are much more likely to have an additional child.

**Endogeneity Problem**

<center>Fertility 🠊 Labor supply</center>
<center>Labor supply 🠊 Fertility </center>

**Instrument** 

<center>Dummy variable for whether the sex of the second child matches the sex of the first child</center> 




---
# Identification
--- 
![ERROR:Here should be causal graph 1](files/causal_graph_v1.png)


---
# Empirical Methodology
## Casual estimation with a Binary IV

\begin{equation}
Y = \alpha + \delta D + \epsilon
\end{equation}

\begin{equation}
E[Y] = E[\alpha + \delta D + \epsilon]= \alpha + \delta E[D] + E[\epsilon]
\end{equation}

We re-write it as a difference equation in Z and divide both sides by $ E[D|Z=1] - E[D|Z=0]$ which yields:

\begin{equation}
\frac{E[Y|Z=1]-E[Y|Z=0]}{E[D|Z=1]-E[D|Z=0]} =\frac{\delta (E[D|Z=1]-E[D|Z=0]) + (E[\epsilon|Z=1]-E[\epsilon|Z=0])}{E[D|Z=1]-E[D|Z=0]}
\end{equation}

If the data holds for the causal graph despicted above, then $Z$ has no association with $ /epsilon$ and therefore:

\begin{equation}
\frac{E[Y|Z=1]-E[Y|Z=0]}{E[D|Z=1]-E[D|Z=0]} =\delta
\end{equation}

Under these conditions, the ratio of the population-level association between Y and Z and between D and Z is equal to the causal effect of D on Y. Then, if $Z$ is associated with $D$ but not with $/upvarepsilon$, then the following is the IV-Estimator for infinite samples:

\begin{equation}
\hat{\delta}_{IV,WALD} = \frac{E_N[y_i|z_i=1] - E_N[y_i|z_i=0]}{E_N[d_i|z_i=1] - E_N[d_i|z_i=0]}
\end{equation}

This is the IV-Estimator, which is known as the Wald Estimator when the instrument is binary. The wald estimator takes the average difference in the observed outcome of those who were exposed to the instrumental variable and of those who were not. Then it takes the average difference between the ones in the treatment group who took the treatment and those in the tratment group who did not receive the treatment.

## IV Estimation as LATE Estimation

Imbends and Angrist (1994) developed a framework for classifiying individuals as: i) those who respond positively to an instrument; ii) those who remain unaffected by the instrument; iii) those who rebel against the instrument. When $D$ and $Z$ are binary variables, then they are four possible group of individuals:

| Status                                    |Potential treatment assignment         | 
| ------------------------------------------|:-------------------------------------:| 
| Compliers ($\tilde{C}=c$)                 | $D^{Z=0}=0; D^{Z=1}=1$                | 
| Defiers ($\tilde{C}=d$)                   | $D^{Z=0}=1 D^{Z=1}=0$                 | 
| Always takers ($\tilde{C}=a$)             | $D^{Z=0}=1 D^{Z=1}=1$                 |  
| Never takers ($\tilde{C}=n$)              | $D^{Z=0}=0 D^{Z=1}=0$                 |   


A valid instrument $Z$ for the casual effect of $D$ on $Y$ must satisfy three assumptions in order to identify the **LATE**:

* Independence assumption: ($Y^{1}, Y^{0}, D^{Z=1}, D^{Z=0} \indep Z$)

This is analogous to the assumption that $cov(Z, \varepsilon)=0$ in the traditinal IV litera
* Non-zero effect of instrument assumption: $k \neq 0$ for all $i$
* Monotonicity assumption: either $k \geq 0$ for all $i$ or $k \leq 0$ for all $i$ 


---

---
# Replication of Angrist & Evans (1998)
---

## Data & Descriptive Statistics

Angrist and Evans (1998) use two extracts from the Census Public Use Micro Samples(PUMS) that correspond to the year 1980 and 1990. (Add that the paper also uses 1970 for descr analytics only not available in the repo).

In [39]:
census_1 = pd.read_stata("data/m_d_806_1.dta")
census_2 = pd.read_stata("data/m_d_806_2.dta")

In [40]:
data=census_1.append(census_2, ignore_index=False, verify_integrity=False, sort=False)
#data = prepare_data(data) from the auxiliary to be created after def variables

In [42]:
data.describe()

Unnamed: 0,STATE,SEXK,AGEK,QTRBKID,RACEK,SPANISH,BIRTHPLK,SCHOOLK,GRADE,FINGRADE,...,AWEEK79D,AHOUR79D,AINC1D,AINC2D,id,boy1st,boy2nd,two_boys,two_girls,same_sex
count,927267.0,927267.0,927267.0,927267.0,927267.0,927267.0,927267.0,927267.0,927267.0,927267.0,...,762843.0,762843.0,762843.0,762843.0,927267.0,927267.0,655169.0,927267.0,927267.0,927267.0
mean,28.463958,0.488175,8.758242,2.522137,1.575727,0.1484,48.79557,0.956873,5.507927,0.825858,...,0.200804,0.206107,0.311123,0.319551,463653.625,0.511825,0.511378,0.186647,0.170051,0.0
std,15.381372,0.49986,4.764916,1.113425,2.055951,0.610885,112.210375,0.685231,4.39152,0.454339,...,0.740307,0.758842,0.82736,0.876777,267656.625,0.49986,0.499871,0.389628,0.375678,0.0
min,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,17.0,0.0,5.0,2.0,1.0,0.0,17.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,231817.5,0.0,0.0,0.0,0.0,0.0
50%,29.0,0.0,9.0,3.0,1.0,0.0,30.0,1.0,5.0,1.0,...,0.0,0.0,0.0,0.0,463634.0,1.0,1.0,0.0,0.0,0.0
75%,41.0,1.0,13.0,4.0,1.0,0.0,42.0,1.0,9.0,1.0,...,0.0,0.0,0.0,0.0,695450.5,1.0,1.0,0.0,0.0,0.0
max,56.0,1.0,17.0,4.0,13.0,4.0,997.0,3.0,22.0,3.0,...,3.0,3.0,3.0,3.0,927267.0,1.0,1.0,1.0,1.0,0.0


In [6]:
data.head()

Unnamed: 0,STATE,SEXK,AGEK,QTRBKID,RACEK,SPANISH,BIRTHPLK,SCHOOLK,GRADE,FINGRADE,...,CLASSD,WEEKSD,HOURSD,INCOME1D,INCOME2D,AWEEK79D,AHOUR79D,AINC1D,AINC2D,id
0,1,1,9,3,1,0,1,1,6,1,...,1.0,52.0,40.0,28005.0,0.0,0.0,0.0,0.0,0.0,1.0
1,1,1,8,3,1,0,1,2,5,1,...,5.0,52.0,72.0,0.0,10005.0,0.0,0.0,0.0,3.0,2.0
2,1,0,9,1,1,0,1,1,5,1,...,5.0,16.0,48.0,0.0,16005.0,0.0,0.0,2.0,0.0,3.0
3,1,0,5,2,1,0,1,1,2,1,...,,,,,,,,,,4.0
4,1,1,11,1,1,0,1,1,7,1,...,1.0,32.0,40.0,9925.0,0.0,0.0,0.0,3.0,3.0,5.0


In [41]:
#constructing same sex variables --> instrument
#boy 1rst
data["boy1st"] = np.NaN
data.loc[data.SEXK == 0, "boy1st"] = 1 #for boys
data.loc[data.SEXK == 1, "boy1st"] = 0 #for girls

#boy 2nd
data["boy2nd"] = np.NaN
data.loc[data.SEX2ND == 0, "boy2nd"] = 1 #for boys
data.loc[data.SEX2ND == 1, "boy2nd"] = 0 #some will have NAs because there is no second child

#gen var two boys
data["two_boys"] = np.where(
    (data["boy1st"] == 1) & (data["boy2nd"] ==1), 1, 0)
#data["two_boys"] = np.NaN
#data.loc[(data["boy1st"] == 1) & (data["boy2nd"] == 1), "two_boys"] = 1

#gen var two girls
data["two_girls"] = np.where(
    (data["boy1st"] == 0) & (data["boy2nd"] ==0), 1, 0)

#data["two_girls"] = np.NaN
#data.loc[(data["boy1st"] == 0) & (data["boy2nd"] == 0), "two_girls"] = 1

#gen var same sex
data["same_sex"] = np.where(
    (data["two_boys"] == 1) & (data["two_girls"] ==1), 1, 0)

#data["same_sex"] = np.NaN
#data.loc[(data["two_boys"] == 1) | (data["two_girls"] == 1), "same_sex"] = 1

In [43]:
#constructing race indicators for mom and dad
#mom
data["blackm"] = np.NaN
data.loc[data.RACEM == 2, "blackm"] = 1
data["hispm"] = np.NaN
data.loc[data.RACEM == 12, "hispm"] = 1
data["whitem"] = np.NaN
data.loc[data.RACEM == 1, "whitem"] = 1
data["otheracem"] = np.NaN
data.loc[(data["blackm"] != 1) & (data["hispm"] != 1) & (data["whitem"] != 1), "otheracem"] = 1 

#dad
data["blackd"] = np.NaN
data.loc[data.RACED== 2, "blackd"] = 1
data["hispd"] = np.NaN
data.loc[data.RACED == 12, "hispd"] = 1
data["whited"] = np.NaN
data.loc[data.RACED == 1, "whited"] = 1
data["otheraced"] = np.NaN
data.loc[(data["blackd"] != 1) & (data["hispd"] != 1) & (data["whited"] != 1), "otheraced"] = 1 


In [44]:
#constructing education var for mom
data["educm"] = np.where(
    (data["FINGRADM"] == 1) | (data['FINGRADM'] == 2), data["GRADEM"] - 3, data["GRADEM"] - 2) 

#highschool graduated
data["hsgrad"]= np.NaN
data.loc[data["educm"] == 12, "hsgrad"] = 1

#highschoool or more
data["moregrad"]= np.NaN
data.loc[data["educm"] > 12, "moregrad"] = 1

In [45]:
# constructing income variables for labor market supply (?)
#for dad
data["total_incomed"]=(data.INCOME1D + np.maximum(data.INCOME2D, 0))*2.099173554 #taking into account neg values
#reported in INCOME2D and deflating wages as stated in Angrist and Evans (1998) 

#for mom
data["total_incomem"]=(data.INCOME1M + np.maximum(data.INCOME2M, 0))*2.099173554 #taking into account neg values
#reported in INCOME2M and deflating wages as stated in Angrist and Evans (1998) 


In [46]:
#more than 2 children
data["more2k"] = np.where(
    (data["KIDCOUNT"] > 2), 1, 0) 

In [47]:
#constructing age when first child was born

#year of birth dad
data["yobd"] = np.where(
    (data["QTRBTHD"] == 0), 80-data["AGED"], 79-data["AGED"]) 

#gen ageqm=4*(80-YOBM)-QTRBTHM-1
#gen ageqd=4*(80-yobd)-QTRBTHD
#gen agefstm=(ageqm-AGEQK)/4

data["ageqm"]=4*(80-data.YOBM)-data.QTRBTHM-1
data["ageqd"]=4*(80-data.yobd)-data.QTRBTHD
data["agefstm"]=(data.ageqm-data.AGEQK)/4 #age of mom when kid first born

In [69]:
#sample of moms aged between 21 and 35, second kid no old than 1 year
data_2=data[((data['AGEM']>=21) & (data['AGEM']<=35)) & (data['KIDCOUNT']>=2) & (data['AGEQ2ND']>4) & (data['agefstm']>=15) 
            & (data['ASEX']==0) & (data['AAGE']==0) & (data['AQTRBRTH']==0)  
            & (data['ASEX2ND']==0) & (data['AAGE2ND']==0) & (data['AQTRBRTH']==0)]

print("The sample of moms aged between 21 and 35 with second kid no older than 1 year old", len(data_2), "observations.")

The sample of moms aged between 21 and 35 with second kid no older than 1 year old 394840 observations.


In [56]:
data_2.describe()

Unnamed: 0,STATE,SEXK,AGEK,QTRBKID,RACEK,SPANISH,BIRTHPLK,SCHOOLK,GRADE,FINGRADE,...,educm,hsgrad,moregrad,total_incomed,total_incomem,more2k,yobd,ageqm,ageqd,agefstm
count,394840.0,394840.0,394840.0,394840.0,394840.0,394840.0,394840.0,394840.0,394840.0,394840.0,...,394840.0,37552.0,96907.0,333707.0,394840.0,394840.0,333707.0,394840.0,333707.0,394840.0
mean,28.674321,0.488912,9.112289,2.528148,1.599433,0.15016,47.596269,1.05603,5.601018,0.93627,...,11.418301,1.0,1.0,37426.343925,7160.815382,0.402064,46.291232,120.015333,133.311938,20.515074
std,15.371484,0.499878,3.536165,1.111621,2.131755,0.598132,107.658081,0.555176,3.47727,0.331009,...,2.324975,0.0,0.0,24536.664413,10804.13011,0.490315,4.984661,14.039271,19.918779,2.937905
min,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,...,-2.0,1.0,1.0,0.0,0.0,0.0,-2.0,82.0,60.0,15.0
25%,17.0,0.0,6.0,2.0,1.0,0.0,17.0,1.0,3.0,1.0,...,11.0,1.0,1.0,23101.404962,0.0,0.0,43.0,110.0,121.0,18.25
50%,29.0,0.0,9.0,3.0,1.0,0.0,29.0,1.0,5.0,1.0,...,11.0,1.0,1.0,35696.446286,1353.966942,0.0,46.0,122.0,133.0,20.0
75%,41.0,1.0,12.0,4.0,1.0,0.0,42.0,1.0,8.0,1.0,...,12.0,1.0,1.0,47661.735544,12395.619836,1.0,49.0,131.0,145.0,22.25
max,56.0,1.0,17.0,4.0,13.0,4.0,996.0,3.0,16.0,3.0,...,20.0,1.0,1.0,314876.0331,260308.016564,1.0,65.0,141.0,328.0,33.75


In [63]:
def table_sum_stats(data_2):
    """
    Creates Table 2.
    """
    variables = data_2[
        [
            "KIDCOUNT",
            "more2k",
            "boy1st",
            "boy2nd",
            "two_boys",
            "two_girls",
            "agefstm"
        ]
    ]

    table2 = pd.DataFrame()
    table2["Mean"] = variables.mean()
    table2["Standard Deviation"] = variables.std()
    table2 = table2.astype(float).round(2)
    table2["Description"] = [
        "Children ever born",
        "More than two children",
        "The first child is a boy",
        "The second child is a boy",
        "First two childs are boys",
        "First two childs are girls",
        "Age of mom when first kid was born",
    ]

    return table2

In [67]:
table_sum_stats(data_2)


Unnamed: 0,Mean,Standard Deviation,Description
KIDCOUNT,2.55,0.81,Children ever born
more2k,0.4,0.49,More than two children
boy1st,0.51,0.5,The first child is a boy
boy2nd,0.51,0.5,The second child is a boy
two_boys,0.26,0.44,First two childs are boys
two_girls,0.24,0.43,First two childs are girls
agefstm,20.52,2.94,agefstm


In [10]:
data["INCOME1D"].describe()
data["INCOME2D"].describe()

count    762843.000000
mean       1503.258842
std        7094.477677
min       -9995.000000
25%           0.000000
50%           0.000000
75%           0.000000
max       75000.000000
Name: INCOME2D, dtype: float64

---
<span style="color:orange">**NOTE**:</span> The original data provided by the authors can be found [here](https://www.aeaweb.org/articles?id=10.1257/app.2.2.95). For this replication the data is split into two .dta-files due to size constraints.

---

As shown in the graph below, the distance from the cutoff for university GPA in the provided dataset still spans from values of -1.6 to 2.8 as can be seen below. Lindo et al. (2010) use a bandwidth of *(-0.6, 0.6)* for regression results and a bandwidth of *(-1.2, 1.2)* for graphical analysis. 

Table 1 shows the descriptive statistics of the main student characteristics and outcomes in the restricted sample with a bandwidth of 0.6 from the cutoff. The majority of students are female (62%) and native English speakers (72%). Students in the reduced sample on average placed in the 33rd percentile in high school. It should also be noted that quite a large number of students (35%) are placed on probation after the fist year. An additional 11% are placed on probation after the first year.

#### Table 1- Summary statistics

##  5.2. Results

### 5.2.1. Tests of the Validity of the RD Approach

The core motivation in the application of RD approaches is the idea, that the variation in treatment near the cutoff is random if subjects are unable to control the selection into treatment (Lee & Lemieux, 2010). This condition, if fulfilled, means the RDD can closely emulate a randomized experiment and allows researchers to identify the causal effects of treatment. 

For evaluating the effects of academic probation on subsequent student outcomes, the RDD is thus a valid approach only if students are not able to precisely manipulate whether they score above or below the cutoff. Lindo et al. (2010) offer multiple arguments to address concerns about nonrandom sorting: 

1. The study focuses on first-year students, assuming this group of students is likely to be less familiar with the probation policy on campus. To verify their conjecture, the authors also conducted a survey in an introductory economics course which revealed that around 50 % of students were unsure of the probation cutoff at their campus. They also claim that this analysis showed no relationship between knowledge of probation cutoffs and students' grades. 


2. The authors also point out that most first-year courses span the entire year and most of the evaluation takes place at the end of the term which would make it difficult for students to purposely aim for performances slightly above the cutoff for academic probation.


3. Finally, and most importantly, the implication of local randomization is testable. If nonrandom sorting were to be a problem, there should be a discontinuity in the distribution of grades at the cutoff with a disproportionate number of students scoring just above the cutoff. Additionally, all the covariates should be continuous throughout the cutoff to ensure that the group above the probation cutoff constitutes a realistic counterfactual for the treated group.

In the following section, I first conduct a brief visual and descriptive check of validity before presenting my replication of the validity checks conducted in Lindo et al. (2010).

###  i.  Extension: Visual Validity Check

To check for discontinuities in the covariates and the distribution of students around the cutoff Lindo et al. (2010) use local linear regression analysis. Before implementing the rather extensive validity check conducted by Lindo et al. (2010) I show in this section that a rather simple descriptive and graphical analysis of the distribution of covariates already supports the assumption they are continuous throughout the threshold.

#### Extension | Table - Descriptive Statistics of Treated and Untreated Group Close to the Cutoff
The table below shows the means of the different covariates at the limits of the cutoff from both sides (here within a bandwidth of 0.1 grade points). We can see that the means of the groups below and above the probation cutoff are very similar, even equal for some of the variables.

#### Extension | Figure - Distribution of Covariates throughout the Probation Cutoff
The figure below shows the means of the nine covariates in bins of size 0.5 (grade points). Similar to the descriptive table shown above, this visualization shows that there seem to be no apparent discontinuities in the distribution of students for any of the observable characteristics (graphs with bins of size 0.1 or 0.025 suggest the same).

### ii. Advanced Validity Check
(as conducted by Lindo et al. (2010))

#### Figure 1 | Distribution of Student Grades Relative to their Cutoff

To test the assumption of local randomization, Lindo et al. (2010) run a local linear regression on the distribution of students throughout the cutoff. As mentioned above, these should be continuous as a jump in the distribution of students around the cutoff would indicate that students can in some way manipulate their GPA to place above the cutoff. 

For the analysis, the data (containing all observations within 1.2 GPA points from the cutoff) is sorted into bins of size 0.1. The bins contain their lower limit but not their upper limit. To replicate the result from Lindo et al. (2010), I calculate the frequency of each bin and then run a local linear regression with a bandwidth of 0.6 on the size of the bins. Figure 1 shows the bins and the predicted frequency for each bin. The results show that the distribution of grades seems to be continuous around the cutoff, suggesting that we can assume local randomization. 

This method of testing the validity is especially useful because it could capture the effects of unobservables, whose influence we cannot otherwise test like we test for discontinuities in observable characteristics in the parts above and below. If all observable characteristics would show to be continuous throughout the cutoff but we could still observe a jump in the distribution of students above the cutoff, this would suggest that some unobservable characteristic distinguishes students above and below the probation threshold. Fortunately, the results shown below indicate that this is not the case supporting the RDD as a valid identification strategy.

#### Table 2 - Estimated Discontinuities in Observable Characteristics 

Table 2 shows the results of local linear regression (using a bandwidth of 0.6) for a range of observable characteristics that are related to student outcomes. Significant discontinuities would indicate that students with certain characteristics might be able to manipulate their grades to score above the probation cutoff. Similar to the descriptive validity checks on covariates in the section, these results additionally support the validity of the RDD. Table 2 shows that the coefficient for scoring below the cutoff is insignificant at the 10% level for all covariates. 

---
<span style="color:orange">**NOTE**:</span> My results for 'Male' and 'Age at entry' are switched compared to the table presented in Lindo et al. (2010). Since the results are identical otherwise, I assume this difference stems from an error in the table formatting of the published paper.

<span style="color:orange">**NOTE**:</span> The p-values in all regression tables are color-coded to enhance readability:

* P-values at the <span style="color:magenta">10% level</span> are magenta,
* P-values at the <span style="color:red">5 % level</span> are red,
* P-values at the <span style="color:orange">1 % level</span> are orange.

The color-coding may not be visible in all viewing options for Jupyter Notebooks (e.g. MyBinder).

---

### 5.2.2. First Year GPAs and Academic Probation

Figure 2 and Table 3 show the estimated discontinuity in probation status. Figure 2 and the first part of Table 3 show the estimated discontinuity for the probation status after the _first year_. The second part of Table 3 presents the results for the estimated effects of scoring below the cutoff on the probability of _ever_ being placed on academic probation.

Figure 2 and part 1 of Table 3 verify that the discontinuity at the cutoff is **sharp**, i.e. all students whose GPA falls below the cutoff are placed on probation. For students below the cutoff, the probability of being placed on probation is 1, for students above the cutoff it is 0.

It should be noted that the estimated discontinuity at the cutoff is only approximately equal to 1 for all of the different subgroups, as the results in Part 1 of Table 3 show. The authors attribute this fact to administrative errors in the data reportage. 

#### Figure 2 - Porbation Status at the End of First Year

#### Table 3 - Estimated Discontinuity in Probation Status

To estimate the discontinuity in probation status, the authors again use a bandwidth of 0.6 from the cutoff. In addition to the whole sample, they also estimate the discontinuities for certain subgroups within the selected bandwidth:

* **high school grades below** and **above the median** (here, median refers to the median of the entire dataset (median: *50*) and not the median of the subset of students with a GPA within 0.6 grade points of the probation cutoff (the median for this set would be *28*))
* **male** and **female** students
* **english** native speakers and students with a different native language (**nonenglish**) 

**Table 3 | Part 1 - Estimated Discontinuity in Probation Status for Year 1**


**Table 3 | Part 2 - Estimated Discontinuity in Probabtion Status Ever**

Part 2 of Table 3 presents the estimated effect of scoring below the cutoff in the first year for _ever_ being placed on probation. The results show that even of those who score slightly above the probation cutoff in year 1, 33 % are placed on probation at some other point in time during their studies. 

For the different subgroups of students this value varies from 29% (for students with high school grades above the median) up to 36.7% (for the group of males). These results already indicate that we can expect heterogeneities in the way different students react to being placed on academic probation.

The fact that it is not unlikely for low performing students just slightly above the cutoff to fall below it later on also underlines these student's fitness as a control group for the purpose of the analysis. Lindo et al. (2010) argue that the controls can be thought of as receiving a much weaker form of treatment than the group that is placed on probation, as scoring just above the cutoff in year 1 does not save students from falling below the cutoff and being placed on probation in subsequent terms. 

### 5.2.3. The Immediate Response to Academic Probation 

Students who have been placed on academic probation enter their next term at university with the threat of suspension in case they fail to improve their grades. Recalling the theoretical framework presented in prior sections, students face the following set of options after each term:

1. **Option 1**: Return to school, exhibit low effort and achieving a low GPA,
2. **Option 2**: Return to school, exhibit high effort with the intent of achieving a high GPA,
3. **Neither** option: Drop out of university.

Students on probation face a different set of choices than the students that were not placed on probation as the threat of suspension essentially eliminates option 1. Of course, students could enter the next term, exhibit low effort, and receive low grades, but this would result in suspension. Since both option 1 and option 3 result in the student not continuing school (at least for a certain period of time), students who cannot meet the performance standard (thus leading to suspension) are much better off dropping out and saving themselves the cost of attending university for another term.

#### Table 4 - Estimated Effect on the Decision to Leave after the First Evaluation

The results presented in Table 4 and and Figure 3 show the effects of being placed on probation on the probability to drop out of school after the first evaluation. The first row of Table 4 shows the average effect of academic probation on this outcome. The results indicate that, on average, being placed on probation increases the probability of leaving university by 1.8 percentage points. A student on academic probation is thus 44% more likely to drop out than their control group counterpart.

The results presented in the rest of Table 4 and and Figure 3 show that the average effect of being placed on probation is also characterized by large heterogeneities between the different subgroups of students. For males and native English speakers, the results, which are significant at the 5% level, show an increase of 3.7 and 2.8 percentage points respectively in the probability of leaving university after being placed on probation after the first evaluation. The results show no significant effects for these group's counterparts, the subgroups of females and nonnative English speakers. 

Aside from gender and native language, the results also indicate that high school performance seems to play a role in how students react on being placed on probation. For the group of students who scored above the median in high school academic probation roughly doubles the probability of leaving school compared to the control group while there is no such effect for students who scored below the median in high school. Lindo et al. (2010) contribute this finding to a discouragement effect for those students who are placed on probation, which seems to be larger for students who did well in high school.

#### Figure 3 - Stratified Results for Voluntarily Leaving School at the End of the First year

### 5.2.4. The Impact onSubsequent Performance

### i. Main Results for Impact on GPA & Probability of Placing Above Cutoff in the Next Term

The next outcome Lindo et al. (2010) analyze is the performance of students who stayed at university for the next term. The theoretical framework presented in Section 2 predicts that those students on probation who stay at university will try to improve their GPA. Indeed, if they do not manage to improve, they will be suspended and could have saved themselves the effort by dropping out.

The results presented in Figure 4 and Table 5 show the estimated discontinuity in subsequent GPA. Lindo et al. (2010) find significant results (at the 5% level) for all subgroups, which is an even bigger effect than that of probation on drop out rates, where only some subgroups were affected. 

#### Figure 4 - GPA in the Next Enrolled Term

As part A of Table 5 shows, the average treatment effect on the GPA in the next term is positive for all groups of students. The average student on probation has a GPA increase of 0.23 grade points which is 74% of the control group. 

The increase is greatest for students who have high school grades below the median. These students increase their GPA by 0.25 grade points on average, 90% more than their control group. This is an interesting finding because the counterpart, students who scored above the median in high school, are especially likely to drop out. Thus high school grades seem to have a large effect on whether students perceive academic probation as discouragement or as an incentive to improve their performance. 

It should be noted here, that the '*next term*' may not be the next year for all students because some students take summer classes. If students fail to improve their grades during summer classes, they are already suspended after summer and will not enter the second year. Only using grades from the second year would thus omit students who were suspended before even entering the second year. The existence of summer classes may complicate the comparability of students after being put on probation. However, in a footnote Lindo et al. (2010) mention that they find no statistically significant impact of academic probation on the probability that a student enrolls in summer classes and the estimates for subsequent GPA are nearly identical when controlling for whether a student's next term was attending a summer class.

---
<span style="color:orange">**NOTE**:</span> Lindo et al. (2010) in this call this the '*improvement*' of students' GPA, however, this phrasing in my opinion could be misleading, as the dependent variable in this analysis is the distance from cutoff in the next term. The results thus capture the increase in subsequent GPA in general and not relative to the GPA in the prior term.

---

#### Table 5 - Estimated Discontinuites in Subsequent GPA | Part A - Next Term GPA

#### Table 5 - Estimated Discontinuites in Subsequent GPA | Part B - Probability of Placing Above the Cutoff in Next Term

Panel B of Table 5 shows the probability of scoring above the cutoff in the next term. This statistic is very important because it decides whether students on academic probation are suspended after the subsequent term. It is therefore important for students who scored below the cutoff in the first year to not only improve their GPA, but improve it enough to score above the cutoff in the next term. Again academic probation increases the probability of students scoring above the cutoff in the next term for all subgroups.

### ii. Formal Bound Analysis on Subsequent GPA (partial extension)

As already mentioned in the section on the identification strategy, analyzing outcomes that occur after the immediate reaction to probation (the decision whether to drop out or not) becomes more challenging if we find that students are significantly more or less likely to drop out if they have been placed on academic probation. As discussed in the preceding section, this is the case because some groups of students indeed are more likely to drop out if they have been placed on probation.

For the analysis of subsequent GPA, this means that the results become less reliable because there is a group of students (those who dropped out) whose subsequent performance cannot be observed. This can cause the results to be biased. For example, if academic probation causes students with relatively low ability to drop out (which the performance model would predict) then we would find a positive impact on subsequent GPA being solely driven by the fact that the low performers in the treatment group dropped out. If, on the other hand, high ability students were more likely to drop out, the estimates for the impact on subsequent performance would be downward biased.

In short, the control group might not be comparable anymore. To test whether the results on subsequent GPA are robust to these concerns, Lindo et al. (2010) use formal bound analysis for the results on subsequent GPA which I present below.

In addition to this formal bound analysis, I plot confidence intervals for the results on subsequent GPA. Confidence intervals are a useful way to support the graphical analysis of RDDs and ensure the discontinuity at the threshold does not disappear when new population samples are drawn. The graph below shows the estimates from before including a bootstrap 95% percent confidence interval. The confidence interval around the cutoff shows to be quite small, and the fall in subsequent GPA between the treatment and control group persists even at the borders of the confidence interval. 

#### Subsequent Performance with 95% Confidence Interval

---
<span style="color:orange">**NOTE**:</span> The confidence intervals presented here are the product of only 100 resampling iterations of the bootstrap because increasing the number of times the data is resampled significantly increases the runtime of this notebook. However, I have tested the bootstrap for up to 1000 iterations and the results do not diverge very much from the version shown here. 

---

This type of confidence interval, however, does not correct for potential biases in the treatment or control group discussed above because the bootstrap only resamples the original data and therefore can at best achieve the estimate resulting from the original sample. 

To test the sensitivity to possible nonrandom attrition through specific students dropping out of university, Lindo et al. (2010) perform a formal bound analysis using a trimming procedure proposed by Lee (2009)*. The reasoning for this approach is based on the concerns described above. To find a lower bound of the estimate, Lindo et al. (2010) assume that academic probation causes students who would have performed worse in the next term to drop out. The control group is thus made comparable by dropping the lowest-performing students (in the next term) from the sample, assuming these students would have dropped out had they been placed on probation. To calculate the upper bound estimate, the same share of students is dropped from the upper part of the grade distribution instead. 

The share of students who need to be dropped is given by the estimated impact of probation on leaving school. For example, in the entire sample students on probation are 1.8 percentage points more likely to drop out, which is 44% of the control mean. Thus, to make the groups comparable again we presumably need to drop 44% more students from the control group than actually dropped out. 

For groups of students where the estimated impact of probation on leaving school is negative, students from the control group need to be dropped instead (i.e. here the lower bound is given by dropping the top students in the treatment group and the upper bound is given by dropping the bottom students). 

While all results I have presented in this replication so far are exactly identical to the results from Lindo et al. (2010), I, unfortunately, cannot replicate the results from the formal bound analysis precisely. The description in the paper is brief and the provided STATA code from the authors does not include the formal bound analysis. While referring to methods presented in Lee (2009) has been helpful to understand the trimming procedure, I am unable to replicate the exact numbers presented in Lindo et al. (2010).

The table pictured below shows the results of the formal bound analysis presented in Lindo et al. (2010). The authors conclude that the positive effects of academic probation on students' subsequent GPA are too great to be explained by the attrition caused by dropouts. 

---
<span style="color:orange">**NOTE**:</span> In their paper Lindo et al. (2010) quote _'Lee (2008)'_ which could also refer to a different paper by Lee and Card from 2008 listed in the references. However, since this paper in contrast to the 2009 paper by Lee does not mention formal bound analysis and since Lee (2009) is not mentioned anywhere else in the paper, I am certain this is a citation error.

---

#### Formal Bound Analysis from Lindo et al. (2010) (p.110)

![ERROR: Here should be a picture of the bounds from the paper](files/bounds_nextGPA.PNG)

The table below shows my results using the proposed trimming procedure (table is again transposed compared to the original). The overall results are quite similar to the ones presented in Lindo et al. (2010), all estimates presented in Table 5 still lie between the lower and upper bound. It should be noted that in my replication the lower bound estimate for students with high school grades above the median was not significant at the 10% level while the results for all other groups were.

#### Replication of Formal Bound Analysis

### 5.2.5. The Impacts on Graduation

As a third outcome, Lindo et al. (2010) examine the effects of academic probation on students' graduation rates. As already discussed in the previous section, the outcomes that are realized later in time are more complex to examine because of all the different choices a student has made until she or he reaches that outcome. Graduation rates are the product of a dynamic decision-making process that spans throughout the students' time at university. While the study focuses mainly on the effects of being put on probation after the first year, the decision problem described in the theoretical framework can be faced by students at different points during their academic career as students can be placed on probation each term or for multiple terms in a row. There are different ways in which academic probation could affect graduation rates. On the one hand, it could reduce the probability of graduating because probation increases the probability of dropping out and some students who fail to increase their grades are suspended. On the other hand, these students might have graduated either way and thus do not have an effect. Additionally, probation could increase graduation rates because those students who remain improve their performance. 

#### Figure 5 - Graduation Rates

Figure 5 and Table 6 show the estimated impacts of academic probation after the first year on whether a student has graduated in four, five or six years. The effects are negative for all three options, suggesting that the negative effects discussed above overweigh potential positive effects on graduation rates.

#### Table 6 - Estimated Effects on Graduation

The effects on graduation rates are insignificant for most subgroups, the group of students with high school grades above the median stands out as being especially negatively affected by being placed on probation in the first year. This group of students sees their probability of graduation within six years reduced by 14.5 percent. Lindo et al. (2010) attribute these results to the fact that this group of students is especially likely to drop out after being put on probation and also on average does not do much better than their counterpart if they continue to attend university.

Overall the results on graduation rates are rather limited. This likely stems from the more complex nature in which probation in the first year can affect this outcome later down the line. Unfortunately, most of the data in the provided dataset focus on the first two years of students' time at university (e.g. we only now the GPA of the first two years). Much more information would be needed to uncover the mechanisms in which probation may affect students' probability of graduating within specific timeframes.

---
<span style="color:orange">**NOTE**:</span> Below I only show the sections of Table 6 that are discussed above as the entire table is quite extensive. The other results presented in Table 6 of the paper can be viewed by uncommenting the code at the end of this section.

---

#### Graduated after 6 years

**Code for complete Table 6:**

---
# 6. Extension: Robustness Checks 
---

As discussed in my replication of Lindo et al. (2010) above, the authors use a variety of validity and robustness checks to analyze the reliability of their results. Aside from some smaller independent contributions that I already discuss in the replication part for better context, I in this section further analyze subsequent performance and check the bandwidth sensitivity of the results in drop out rates and subsequent GPA.

## 6.1.  A Closer Look at Students' Subsequent Performance. 

### 6.1.1. Subsequent Performance and Total Credits in Year 2

The results from Lindo et al. (2010) presented above show that students are more likely to drop out after being placed on academic probation but those who remain in school tend to improve their GPA above the cutoff in the next term. These results are generally in line with the theoretical framework presented in the paper which predicts that students either drop out or improve their GPA if the cost of not improving in the next term increases. The performance standard model explains these results through students self-selecting between increasing effort and dropping out based on their abilities (which are defined as the probability of meeting the performance standard). Students who are less likely to improve their GPA should thus be more likely to drop out. Unfortunately, it is not possible to test this prediction, as Lindo et al. (2010) emphasize in the paper because the probability of meeting the performance standard is not observed for students who leave school. 

However, examining the students who remain in school may give some further insights. While Lindo et al. (2010) observe that  students have been placed on probation on average improve their performance, it is not clear under which circumstances this is happening. A look at the amount of credits students are taking in their second year may give some insights. The results presented below show that being placed on probation after the first year has a negative effect on the amount of credits students take in the second year for all of the examined subgroups except the group of nonnative English speakers. This is a stark contrast to the first year where both the treatment and control group take almost the same amount of credits (as shown in the section on the validity of the RD Approach).

The results indicate that being placed on probation decreases the total credits taken by the average student in year two by 0.33, around 8% of the control mean. As the table below shows, the results are most prominent for males, native English speakers, and students with high school grades above the median. Interestingly, these are the same groups of students that are most likely to drop out, suggesting that the discouragement effect persists throughout these groups and even those who re-enroll for the next term proceed with caution by taking fewer credits.

When interpreting these results it should be kept in mind that some students' next evaluation takes place during summer classes. Students who have taken summer classes enter their second year already having either passed the next evaluation or not. Those who fell below the cutoff will have been suspended and thus are missing from the data for the second year and those who have passed the threshold in the summer classes are likely not on probation anymore. Estimating the effects of probation on credits taken in the second year separately for both groups shows that those who did not take classes in the summer are more affected than those who did. For the students who took summer classes, the results are only significant for males, students with high school grades above the median and native English speakers.

#### No summer classes

#### Summer classes

These findings are useful for interpreting the subsequent performance of students because more credits likely signify a larger workload for the student. Instead of increasing their effort, students may just decrease their workload by completing fewer credits in the next term. Unfortunately, we cannot test this in detail because the data doesn't show how many credits students completed in which term. 

Reducing the sample for the analysis of the subsequent GPA to students who did not attend summer classes and completed 4 credits in the second year (the most frequent amount of credits takeen by this group of students) shows that the effect of scoring below the cutoff in year 1 becomes insignificant for the students who have above-median high school grades and nonnative English speakers. The improvement decreases a bit for some groups like females or students with high school grades below the median but increases for others like males and native english speakers. Overall the results are still highly significant though considering the small window of observations to which the data is reduced in this case. This suggests that while students on probation do seem to take fewer credits in the next year, the improvements to subsequent performance is too great to just be attributed to students decreasing their workload. 

### 6.1.2. Subsequent Cumulative Grade Point Average (CGPA) 

An additional factor that might be important for the analysis of subsequent performance is the Cumulative Grade Point Average (CGPA). Lindo et al. (2010) focus their analysis of subsequent performance solely on the grades achieved in the next term. However, in the section on the institutional background in the paper the authors write:

>*At all campuses, students on probation can avoid suspension and return to good academic standing by bringing their cumulative GPA up to the cutoff.* (Lindo et al., 2010, p.98).

To avoid suspension in the long term, students on probation thus are required to not only score above the cutoff in the next term but to score high enough to bring their CGPA above the probation threshold. Students who score above the threshold in the next term but still have a CGPA below the cutoff remain on probation. Students who fail to bring their GPA above the cutoff (and thus also their CGPA since their first-year GPA and first-year CGPA are the same) are suspended. 

As the figure and table below show, the positive effects of probation on subsequent performance carry over to students' CGPA as well. Being placed on probation on average increases students' CGPA by 0.07 grade points or 63% of the control mean although the difference is rather difficult to spot visually.

#### Effect of Academic Probation on Subsequent CGPA

However, in contrast to the probability of improving the next term GPA above the cutoff, academic probation has no significant effect on the probability of improving the CGPA above the cutoff in the next term except for the group of nonnative English speakers where the probability is actually negative. Indeed, out of all students on probation (within 0.6 grade points of the cutoff), only around 37% improve their next term CGPA above the cutoff. Around 23% improve their GPA above the cutoff but not their CGPA and remain on probation. The other students dropped out or are suspended after the next term. This suggests that the effects of probation span much longer than just the subsequent term for many students, not only indirectly because they have had the experience of being placed on probation but also directly because many of them remain on probation for multiple subsequent terms. These factors underline the points made in previous sections about the complexity of the way academic probation can affect a student's academic career. After being placed on probation a student can take a multitude of different paths, many more than the theoretical framework introduced in Section 2 leads on. A more dynamic approach to estimating the effects of academic probation could likely offer more insights into how students react to this university policy.

#### Effect of Academic Probation on the Probability of Achieving a CGPA Above the Cutoff in the Next Term

## 6.2. Bandwidth Sensitivity 

As a final robustness check, I evaluate the model at different bandwidths to ensure that results are not limited to one specific sample of students within a particular bandwidth. Lindo et al. (2010) use a distance from the threshold of 0.6 for the main regression analysis and 1.2 for graphical analysis (although the estimated curve at each point relies on a local linear regression with a bandwidth of 0.6 as well). The chosen bandwidth around the cutoff thus captures around 25% of the total range of grades (the GPA values observed in the first year span from 0 to 4.3). 

Lindo et al. (2010) do not discuss the reasoning behind their choice of bandwidth in detail and do not apply optimal bandwidth selection methods like some other applications of regression discontinuity (Imbens & Lemieux, 2008; Lee & Lemieux, 2010). However, from a heuristic standpoint, this bandwidth choice seems reasonable. Since the cutoff lies at a GPA of 1.5 (1.6 at Campus 3), this bandwidth includes students whose GPA falls roughly between 0.9 and 2.1 grade points, so a range of around one average grade point including the edges. A much larger bandwidth would not make sense because it would include students that are failing every class and students who are achieving passable grades and are thus not very comparable to students who pass or fall below the threshold by a small margin.

I evaluate bandwidths of length 0.2 (0.1 distance from cutoff on each side) up to 2.4 (1.2 distance from cutoff on both sides). As Lindo et al. (2010), I choose a maximum bandwidth of 1.2 the reasons explained in the paragraph above.

#### Bandwidth sensitivity of the effect of probation on the probability of leaving school

The table below shows the estimated effect of probation on the probability to leave school after the first year using local linear regression (same specification as before) for all bandwidths between 0.1 and 1.2. The bandwidths are on the vertical axis, and the different subgroups are on the horizontal axis of the table. An *x* in the table indicates that the estimate was insignificant at the 10% level and is thus not shown for readability. 

The table shows that the results for the effects on leaving school are relatively sensitive to bandwidth selection. Estimates of students within only 0.2 grade points of the probation threshold are not significant for any of the groups considered. Results for students with high school grades below the median are only significant for bandwidths between 0.3 and 0.5 while estimates for students with high school grades above the median are only significant between values of 0.5 and 0.7. The results for the other subgroups, on the other hand, seem to be quite robust to bandwidth selection.  

The findings reported in this table suggest that some results presented in the previous sections should be interpreted carefully. Especially the estimates of students based on high school grades might be driven by some underlying factors that are not observed in this study. These could explain the sensitivity of the results to bandwidth selection.

#### Bandwidth sensitivity of the effect of probation on subsequent GPA

The results for the effects of academic probation on subsequent performance, on the other hand, seem to be quite robust to bandwidth selection. The estimated effects are the highest for most subgroups around the threshold of 0.6 chosen by Lindo et al. (2010) but the effects do not change sign for any subgroup and still remain quite similar.

Again, the group of students with high school grades above the median does not show significant results for bandwidths between 0.1 and 0.4 and thus seems to be the most sensitive to bandwidth selection. 

---
# 7. Conclusion
---

Overall, the results in this notebook support the findings reported by Lindo et al. (2010) in their paper. The transparent research methods and STATA code provided by the authors allowed me to reproduce the results precisely for almost all tables and figures except for the formal bound analysis presented in Section 5.2.4. for which I could only produce similar results. In addition to the replication of the main results from Lindo et al. (2010), I discuss the identification strategy used in the paper and evaluate the robustness of the results, especially in the context of the performance standard model used in the paper. The results presented in Lindo et al. (2010) and my additional evaluation offer overall support for the performance standard model by Bénabou and Tirole (2000) which predicts that students who are put on probation will be more likely to drop out of university or improve their performance if they remain in school. However, one core feature of the model, the idea that students make their choices based on their ability to meet the performance standard, could not be tested due to the fact that the subsequent performance of students who left school cannot be observed.

Lindo et al. (2010) find large heterogeneities in the way students react to probation based on a set of covariates, however, the underlying sources of these heterogeneities are not evaluated. Further analysis of performance standards like academic probation using a larger set of information on student characteristics like personality traits, patience or socioeconomic background may thus be helpful to reveal the reasons why different students react to this negative incentive in certain ways. 

Additionally, the study focused only on the effects of academic probation in the first year and relatively short term outcomes while long term outcomes were not assessed in detail. As already discussed in the section on the effects of academic probation on graduation rates, analyzing long term outcomes is much more difficult because of the multitude of different choices a student can make before reaching a specific outcome. Being placed on probation in the first year already expands the types of paths students may follow greatly. However, students can be placed on probation, suspended or leave school each term. To analyze the long term effects of academic probation in detail, there are too many questions that the data used in this study cannot answer. 

Overall the findings from Lindo et al. (2010) offer quite robust results on the effects of academic probation on low performing students. They contribute important insights into how students or individuals in general may react to negative incentives with the threat of severe real-world penalties if they fail to adjust their behavior.

---
# 8. References
---

* **Bénabou, R., & Tirole, J. (2000)**. *Self-Confidence and Social Interactions* (No. w7585). National bureau of economic research.


* **Imbens, G. W., & Lemieux, T. (2008)**. Regression discontinuity designs: A guide to practice. *Journal of Econometrics*, 142(2), 615-635.


* **Lee, D. S. (2009)**. Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. *The Review of Economic Studies*, 76(3), 1071-1102.


* **Lee, D. S., & Lemieux, T. (2010)**. Regression Discontinuity Designs in Economics. *Journal of Economic Literature*, 48(2), 281-355.


* **Lindo, J. M., Sanders, N. J., & Oreopoulos, P. (2010)**. Ability, Gender, and Performance Standards: Evidence from Academic Probation. *American Economic Journal: Applied Economics*, 2(2), 95-117.


* **Thistlethwaite, D. L., & Campbell, D. T. (1960)**. Regression-discontinuity analysis: An alternative to the ex post facto experiment. *Journal of Educational Psychology*, 51(6), 309.



-------
Notebook by Annica Gehlen | Find me on GitHub at https://github.com/amageh.

---