--- 
Project for the course in Microeconometrics | Summer 2021, M.Sc. Economics, Bonn University | [Mengxi Wang](https://github.com/Mengxi-20)

# Replication of Chen, T., Kung, J. K. S., & Ma, C. (2020) <a class="tocSkip">   
---

This notebook contains my replication of the results from the following paper:

> Chen, T., Kung, J. K. S., & Ma, C. (2020). Long Live Keju! The Persistent Effects of China’s Civil Examination System. The Economic Journal, 130(631), 2030–2064. 

##### Downloading and viewing this notebook:


* The best way to view this notebook is to download it, together with the repository it is located in, from [GitHub](https://github.com/OpenSourceEconomics/ose-data-science-course-project-Mengxi-20). Other viewing options like _MyBinder_ or _NBViewer_ may not display images or the coloring of certain parts correctly (missing images can be viewed in the folder [files](https://github.com/OpenSourceEconomics/ose-data-science-course-project-Mengxi-20) on GitHub).

* The original paper, as well as the data and code provided by the authors can be accessed [here](https://academic.oup.com/ej/article/130/631/2030/5819954).


##### Information about replication and individual contributions:

* 

<h1>Table of Contents<span class="tocSkip"></span></h1>

<div class="toc"><ul class="toc-item"><li><span><a href="#1.-Introduction" data-toc-modified-id="1.-Introduction-1">1. Introduction</a></span></li><li><span><a href="#2.-Identification" data-toc-modified-id="2.-Identification-2">2. Identification</a></span></li><li><span><a href="#3.-Empirical-Setup" data-toc-modified-id="3.-Empirical-Setup-3">3. Empirical Setup</a></span></li><li><span><a href="#4.-Replication-of-Ting-Chen,-James-Kai-sing-Kung-and-Chicheng-Ma-(2020)" data-toc-modified-id="4.-Replication-of-Ting-Chen,-James-Kai-sing-Kung-and-Chicheng-Ma-(2020)-4">4. Replication of Ting Chen, James Kai-sing Kung and Chicheng Ma (2020)</a></span><ul class="toc-item"><li><span><a href="#4.1.-Data-&amp;-Descriptive-Statistics" data-toc-modified-id="4.1.-Data-&amp;-Descriptive-Statistics-4.1">4.1. Data &amp; Descriptive Statistics</a></span></li></ul></li><li><span><a href="#5.-Extension" data-toc-modified-id="5.-Extension-5">5. Extension</a></span></li><li><span><a href="#6.-Conclusion" data-toc-modified-id="6.-Conclusion-6">6. Conclusion</a></span></li><li><span><a href="#7.-References" data-toc-modified-id="7.-References-7">7. References</a></span></li></ul></div>

In [1]:
```python
%matplotlib inline
# Install the required packages into the kernel's environment. The shell
# command `!pip` failed because `pip` was not on the PATH; the `%pip` magic
# runs the pip that belongs to the running kernel instead.
%pip install linearmodels graphviz

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
```


In [2]:
```python
# IV estimators (2SLS, LIML, GMM, CUE-GMM) and panel OLS from linearmodels
from linearmodels import IV2SLS, IVLIML, IVGMM, IVGMMCUE, PanelOLS
from IPython.display import HTML, Image
```

---
# 1. Introduction 
---

Chen et al. (2020) examine the effects of China’s civil examination system (keju), a long-lived institution, on human capital outcomes today. Dominant from the Song dynasty (c. 960–1276) onward, keju was the earliest elite selection system in the world and aimed to recruit talent to serve in the bureaucracy. Jinshi was the highest degree in the examination, so passing its highest level and obtaining the jinshi title meant generous pecuniary rewards and a promising future. Over time, the civil examination system formed a distinct group of local elites with deep respect for learning and academic achievement. This cultural trait persisted long after the examination system was abolished.

To identify the causal relationship between keju and contemporary human capital outcomes, Chen et al. (2020) introduce an **instrumental variable (IV)**, the distance to the printing ingredients pine and bamboo, to tackle omitted variable bias. The IV is motivated by the idea that, to succeed in the keju exam, candidates needed not only the limited set of textbooks but also a large collection of reference books, which explained the nuances of the texts and taught essay-writing techniques. This is why printing ingredients played such an important role in the keju exam. Finally, Chen et al. (2020) present the causal estimates and compare OLS and two-stage least squares (TSLS) results using different sets of control variables.
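To make the OLS versus TSLS comparison concrete, here is a minimal sketch on simulated data, not the authors' data. All names (`distance`, `jinshi`, `schooling`, `ability`) and coefficients are hypothetical: an unobserved `ability` raises both historical jinshi density and schooling, while `distance` shifts jinshi density only. Under these assumptions OLS is biased upward, while two-stage least squares, computed by hand with NumPy, recovers the true effect.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients of y on X (X already contains a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, endog, instrument):
    """Manual 2SLS point estimate with one endogenous regressor and one instrument."""
    const = np.ones_like(y)
    Z = np.column_stack([const, instrument])
    # First stage: project the endogenous regressor on the instrument
    endog_hat = Z @ ols(endog, Z)
    # Second stage: regress the outcome on the first-stage fitted values
    return ols(y, np.column_stack([const, endog_hat]))


rng = np.random.default_rng(0)
n = 50_000
true_effect = 0.5  # hypothetical causal effect of jinshi density on schooling

ability = rng.normal(size=n)   # unobserved confounder (hypothetical)
distance = rng.normal(size=n)  # instrument, e.g. river distance (hypothetical)
jinshi = -0.8 * distance + ability + rng.normal(size=n)
schooling = true_effect * jinshi + 1.0 * ability + rng.normal(size=n)

const = np.ones(n)
beta_ols = ols(schooling, np.column_stack([const, jinshi]))[1]
beta_2sls = two_stage_least_squares(schooling, jinshi, distance)[1]
print(f"OLS:  {beta_ols:.3f}")
print(f"2SLS: {beta_2sls:.3f}")
```

The manual second stage gives the correct 2SLS point estimate, but its naive standard errors are not valid; for inference, an IV routine such as `IV2SLS` from `linearmodels` (imported above) is the safer choice.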

This notebook is structured as follows. Section 2 analyzes the identification strategy, and Section 3 briefly discusses the empirical strategy the authors use for estimation. Sections 4 and 5, the core of this notebook, replicate the main results of the paper and try to address possible problems with weak instruments, xxx and xxx. Section 6 concludes, and Section 7 lists the references.

---
# 2. Identification
---

Chen et al. (2020) aim to verify whether the relationship between keju and contemporary human capital outcomes is causal. However, a prefecture's exam performance, measured by its jinshi density, may be related to many other factors. Estimates of the effect of keju on contemporary human capital are therefore likely to suffer from omitted variables, i.e. variables that are associated with both historical jinshi density and years of schooling today. For example, unobserved natural or genetic endowments may be more common in prefectures that produced more jinshi. Because such endowments are hard to measure, they cannot be controlled for when estimating the effect of keju, and the estimates may suffer from omitted variable bias. To deal with this concern, the authors employ an instrumental variable approach.
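The omitted variable problem can be illustrated with the textbook omitted variable bias formula. In the hypothetical simulation below (not the authors' data), an unobserved endowment `u` raises both historical jinshi density `x` and schooling today `y`. In sample, the slope from the regression that leaves out `u` equals the slope from the regression that includes it plus `gamma * delta`, where `gamma` is the coefficient on `u` and `delta` is the slope from regressing `u` on `x`. In real data `u` is unobserved, so the "long" regression is infeasible, which is what motivates the IV approach.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Hypothetical data-generating process: endowment u raises both x and y
u = rng.normal(size=n)                      # unobserved endowment
x = 0.7 * u + rng.normal(size=n)            # historical jinshi density
y = 0.5 * x + 1.0 * u + rng.normal(size=n)  # years of schooling today


def slopes(y, *regressors):
    """OLS slope coefficients (a constant is included, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


beta_short = slopes(y, x)[0]        # u omitted
beta_long, gamma = slopes(y, x, u)  # u controlled for (infeasible in practice)
delta = slopes(u, x)[0]             # auxiliary regression of u on x

# Omitted variable bias formula: short = long + gamma * delta (exact in sample)
print(beta_short, beta_long + gamma * delta)
```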

Chen et al. (2020) examine the effects of China’s civil examination system using data from the Ming-Qing period (c. 1368–1905). In this period, printing in China relied mainly on pine and bamboo for producing ink and paper. Printing centers were typically located near pine and bamboo habitats to reduce transport costs. In addition, for geographical reasons, raw materials, i.e. pine and bamboo products, were mostly transported via waterways.

Therefore, the instrumental variable is constructed as a prefecture’s shortest river distance to its nearest sites of pine and bamboo, the two key ingredients required for producing ink and paper in woodblock printing. The logic for why this instrument is feasible and reasonable is as follows:

* To some extent, a prefecture's performance in the keju exam depended on how conveniently books, not only textbooks but also reference books, could be printed and obtained.

* The main printing centers were located close to the areas producing pine and bamboo.

* The raw materials needed for printing were mainly transported by water via the main river branches. 

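Since the geographic distributions of pine and bamboo forests are determined by natural conditions and can be treated as random with respect to local human capital, the authors argue that the exogeneity assumption of the instrumental variable holds.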
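The instrument is a shortest distance along rivers rather than a straight-line distance. One way to compute such a measure is a multi-source Dijkstra search over a river network. The sketch below is an assumption about how this could be done, not the authors' GIS procedure. The network, node names and kilometre values are made up, rivers are treated as undirected (flow direction is ignored), and pine and bamboo sites are handled as two separate source sets.

```python
import heapq


def river_distance(graph, sources):
    """Multi-source Dijkstra: shortest distance along the river network
    from every node to the nearest node in `sources`."""
    dist = {node: float("inf") for node in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbor, length in graph[node]:
            nd = d + length
            if nd < dist[neighbor]:
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical river network: undirected river segments with lengths in km
edges = [("A", "B", 120), ("B", "C", 80), ("C", "D", 200), ("B", "E", 150), ("E", "F", 60)]
graph = {}
for u, v, km in edges:
    graph.setdefault(u, []).append((v, km))
    graph.setdefault(v, []).append((u, km))

pine_sites = ["D"]          # hypothetical pine habitats
bamboo_sites = ["A", "F"]   # hypothetical bamboo habitats
to_pine = river_distance(graph, pine_sites)
to_bamboo = river_distance(graph, bamboo_sites)
for prefecture in sorted(graph):
    print(prefecture, to_pine[prefecture], to_bamboo[prefecture])
```

For example, prefecture `B` lies 120 km from the nearest bamboo site (`A`), because the alternative route via `E` to `F` is 210 km long.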


In [3]:
```python
# Causal graph of the IV strategy (image stored in the repository's `files` folder)
Image("files/causal graph.png")
```


If the shortest distance to printing centers were used directly as the instrument, instead of a prefecture’s shortest river distance to its nearest sites of pine and bamboo, the exclusion restriction would likely be violated, because the locations of printing centers were not exogenously determined. For example, printing centers were more likely to be located in economically prosperous and densely populated areas, and prosperity may affect human capital today through channels other than keju.
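A small simulation shows why this matters. In the hypothetical setup below (made-up names and coefficients, not the authors' data), local `prosperity` raises schooling directly, and printing centers cluster in prosperous areas, so the distance to them is correlated with prosperity. The simple IV (Wald) estimator, Cov(z, y) / Cov(z, x), recovers the true effect with the distance to pine and bamboo, but is biased when the distance to printing centers is used as the instrument.

```python
import numpy as np


def iv_estimate(y, x, z):
    """Simple IV (Wald) estimator with one instrument: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(2)
n = 50_000
true_effect = 0.5
prosperity = rng.normal(size=n)  # hypothetical local economic prosperity

# Valid instrument: distance to pine/bamboo, unrelated to prosperity
dist_ingredients = rng.normal(size=n)
# Invalid instrument: printing centers cluster in prosperous areas,
# so the distance to them is shorter where prosperity is higher
dist_center = -0.8 * prosperity + rng.normal(size=n)

jinshi = -0.6 * dist_ingredients - 0.6 * dist_center + prosperity + rng.normal(size=n)
schooling = true_effect * jinshi + 1.0 * prosperity + rng.normal(size=n)

iv_valid = iv_estimate(schooling, jinshi, dist_ingredients)
iv_invalid = iv_estimate(schooling, jinshi, dist_center)
print("IV with distance to pine/bamboo:   ", round(iv_valid, 3))
print("IV with distance to printing centers:", round(iv_invalid, 3))
```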


---
# 3. Empirical Setup
---

The authors examine the impact of keju on contemporary human capital outcomes. For the regression analysis, Chen et al. (2020) estimate the following model:
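As a hedged sketch, a generic two-stage least squares setup consistent with the description in Sections 1 and 2 is shown below. The notation is hypothetical and not necessarily the authors' exact specification, which may include further controls or fixed effects: $i$ indexes prefectures, $\text{Jinshi}_i$ is historical jinshi density, $\text{RiverDistance}_i$ is the shortest river distance to pine and bamboo sites, $\mathbf{X}_i$ is a vector of controls, and $\text{Schooling}_i$ is years of schooling today.

$$
\text{Jinshi}_i = \pi_0 + \pi_1\,\text{RiverDistance}_i + \mathbf{X}_i'\boldsymbol{\pi}_2 + \nu_i \qquad \text{(first stage)}
$$

$$
\text{Schooling}_i = \beta_0 + \beta_1\,\widehat{\text{Jinshi}}_i + \mathbf{X}_i'\boldsymbol{\beta}_2 + \varepsilon_i \qquad \text{(second stage)}
$$

Identification of $\beta_1$ requires relevance, $\pi_1 \neq 0$, and exogeneity of the instrument, $\operatorname{Cov}(\text{RiverDistance}_i, \varepsilon_i \mid \mathbf{X}_i) = 0$.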

---
# 4. Replication of Ting Chen, James Kai-sing Kung and Chicheng Ma (2020)
---

## 4.1. Data & Descriptive Statistics


To obtain the data of key explanatory variable


---
# 5. Extension
---

---
# 6. Conclusion
---

---
# 7. References
---

* **Chen, T., Kung, J. K. S., & Ma, C. (2020)**. Long Live Keju! The Persistent Effects of China’s Civil Examination System. *The Economic Journal*, 130(631), 2030–2064.

