http://www.axel-dreher.de/Dreher%20et%20al._Aid_China_Growth.pdf



--- 
Project for the course in Microeconometrics / OSE Data Science | Summer 2021, M.Sc. Economics, Bonn University | [Jonathan Willnow](https://github.com/JonathanWillnow)

# Replication of Dreher et al. (2021): Aid, China, and Growth: Evidence from a New Global Development Finance Dataset  <a class="tocSkip">   
---

The aim of this notebook is to replicate the following paper:

> Dreher et al. (2021): Aid, China, and Growth: Evidence from a New Global Development Finance Dataset. American Economic Journal: Economic Policy, vol. 13(2), May 2021 (pp. 135-74).


##### Downloading and viewing this notebook:

* The best way to view this notebook is by downloading it and the repository it is located in from [GitHub](https://github.com/OpenSourceEconomics/ose-data-science-course-project-JonathanWillnow). Other viewing options like _MyBinder_ or _NBViewer_ may have issues with displaying images or coloring of certain parts (missing images can be viewed in the folder [files](https://github.com/OpenSourceEconomics/ose-data-science-course-project-JonathanWillnow) on GitHub).


* The original paper, as well as the data and code provided by the authors can be accessed [here](https://www.aeaweb.org/articles?id=10.1257/pol.20180631).

##### Information about replication and individual contributions:

* For the replication, I try to remain true to the original structure of the paper so readers can easily follow along and compare. All tables and figures are named and labeled as they appear in Dreher et al. (2021).


* The tables in my replication appear transposed compared to the original tables to suit my workflow in Python.


* For transparency, all sections of the replication that are my own independent contributions, meaning that they are not part of the results presented in the paper or that they deviate from the methods used there, are marked as _extensions_.

# Table of Contents

### Library imports

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import pandas.io.formats.style
import seaborn as sns
import statsmodels as sm
import statsmodels.formula.api as smf
import statsmodels.api as sm_api
import matplotlib.pyplot as plt
from IPython.display import HTML

In [2]:
from auxiliary.example_project_auxiliary_predictions import *
from auxiliary.example_project_auxiliary_plots import *
from auxiliary.example_project_auxiliary_tables import *

# 1. Introduction

The Belt and Road Initiative (BRI), better known as the "New Silk Road", is the instance of China's overseas development financing that is best known to Europeans, but it is only one of many. Other projects, mostly infrastructure projects (measured by transaction value), link China with the rest of Asia and with the African continent. China's role as a significant donor provokes strong opinions, but the debate has rested on few facts because most project details are not officially reported. The paper at hand uses the Tracking Underreported Financial Flows (TUFF) methodology to introduce a new dataset that provides the evidentiary foundation this debate has lacked.

Dreher et al. address two questions:

* Does Chinese development finance lead to economic growth?
* Does Chinese development finance undermine the effectiveness of Western development finance?


To identify whether and how Chinese development finance affects economic growth, the authors employ instrumental variables (IV) that exploit year-to-year changes in the supply of Chinese development finance in tandem with cross-sectional variation capturing the probability that a country receives a smaller or larger share of such funding.

# 2. The Tracking Underreported Financial Flows (TUFF) methodology

The dataset is constructed using the Tracking Underreported Financial Flows (TUFF) methodology, which codifies a set of open-source data collection procedures in a systematic, transparent and replicable way. The methodology was originally developed by Strange et al. (2012) in collaboration with AidData, a research lab at William & Mary, and has been used and improved many times since (p. 62). It makes it possible to identify detailed financial, locational and even operational information about officially financed projects that the donors and lenders (here China) do NOT report through international reporting systems such as the OECD's Creditor Reporting System (CRS) or the International Aid Transparency Initiative (IATI).

Since the authors of the paper address the relevance of this methodology many times and the constructed data set is unique in its range and accuracy, we will briefly explore the TUFF methodology. 

### First Stage
This is the stage of primary data collection. All recorded projects of interest are collected. In parallel, potential projects are identified at the donor/lender-recipient/borrower-year level using a standardized set of search queries. The database of choice for this dataset is the media database Factiva, which collects newspapers and radio and television transcripts worldwide in 28 languages. The resulting set of documents is then filtered by a machine learning algorithm trained on a large number of previously identified and classified documents. This yields a subset of documents that are most likely to contain information on projects officially financed by China. Each document in the subset is then reviewed by the team to assess whether it actually contains such information.

### Second Stage
The resulting set of documents is then subjected to a second review and augmentation in order to validate or invalidate each record and, where possible, add project information to improve accuracy and scope. This is performed by native speakers and language experts. For this specific dataset, the researchers also collected information from entities such as the projects' private contractors and from experts with tacit knowledge of specific projects, and they involved external reviewers who had done fieldwork on a specific project or country. As a measure of validity, the researchers systematically calculate triangulation and completeness scores for each project. Triangulating across sources raises validity, reduces systematic risk and avoids over-reliance on Factiva.

### Third Stage
The aim of this stage is to maximize the reliability and completeness of the individual project records through quality assurance procedures. This involves identifying and correcting inconsistent coding (e.g. caused by differing categorization standards), several de-duplication procedures and the vetting of each individual project record by higher-ranking researchers, since the whole process involves `HOWMANY?` researchers and assistants. All projects with poor records and relatively high transaction values are identified through their triangulation and completeness scores and undergo another review.
Finally, the constructed dataset is peer-reviewed by internal and external reviewers. For this specific dataset, more than 30 external and internal reviewers were involved.


---
<span style="color:blue">**NOTE**:</span> More information about AidData's TUFF methodology, its development and the coder instructions can be found [here](https://www.aiddata.org/publications). This section is based on Strange et al. (2017): AidData's Tracking Underreported Financial Flows (TUFF) Methodology, Version 1.3. Williamsburg, VA: AidData at William & Mary.



---

# 3. Theoretical Background

The aim is to analyze the causal effect of Chinese development finance. The authors set up the following regression for all recipient countries that the World Bank does not classify as high-income in year $t$:


\begin{equation}
Growth_{i,t} = \beta_{1}OF_{CHN,i,t-2} + \beta_{2}pop_{i,t-1} + \beta_{3}\eta_{i} + \beta_{4}\mu_{t} + \epsilon_{i,t}
\end{equation}

* $Growth_{i,t}$ is recipient country $i$'s real GDP per capita growth in year $t$,
* $OF_{CHN,i,t-2}$ is a measure of Chinese development finance two years earlier,
* $pop_{i,t-1}$ is recipient $i$'s logged population size in $t-1$,
* $\eta_{i}$ represents country fixed effects,
* $\mu_{t}$ represents time fixed effects and
* $\epsilon_{i,t}$ is the error term.
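
To make the role of the two sets of fixed effects concrete, here is a minimal sketch on a synthetic balanced panel. It is not the authors' estimation code: the data, the coefficient values and the helper `two_way_demean` are all invented for illustration, and the within transformation used here is only one of several equivalent ways to absorb country and year effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_years = 50, 15

# Synthetic balanced panel: every number here is made up for illustration.
eta = rng.normal(size=(n_countries, 1))   # country fixed effects
mu = rng.normal(size=(1, n_years))        # year fixed effects
# Finance is built to correlate with eta, so ignoring the fixed effects biases OLS.
of_lag2 = rng.normal(size=(n_countries, n_years)) + 0.5 * eta
pop_lag1 = rng.normal(size=(n_countries, n_years))
beta_true = np.array([0.8, -0.3])
noise = 0.1 * rng.normal(size=(n_countries, n_years))
growth = beta_true[0] * of_lag2 + beta_true[1] * pop_lag1 + eta + mu + noise


def two_way_demean(x):
    """Subtract country and year means, add back the grand mean (balanced panel only)."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()


# Within estimator: OLS on the demeaned variables.
y = two_way_demean(growth).ravel()
X = np.column_stack([two_way_demean(of_lag2).ravel(), two_way_demean(pop_lag1).ravel()])
beta_fe, *_ = np.linalg.lstsq(X, y, rcond=None)

# Pooled OLS with an intercept but without fixed effects, for comparison.
X_pooled = np.column_stack([np.ones(growth.size), of_lag2.ravel(), pop_lag1.ravel()])
beta_pooled, *_ = np.linalg.lstsq(X_pooled, growth.ravel(), rcond=None)

print("fixed effects:", beta_fe)
print("pooled OLS   :", beta_pooled[1:])
```

In this simulated setting the fixed-effects estimate should land close to the assumed coefficients, while the pooled estimate of the finance coefficient is pulled upward by the country effects it ignores.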

As this setup shows, development finance $OF_{CHN,i,t-2}$ is likely to be endogenous to the dependent variable $Growth_{i,t}$. One potential source of endogeneity is reverse causation: not only can Chinese development finance drive growth, but a recipient country's growth may also influence Beijing's decision to deploy development finance. Chinese development finance and real GDP per capita growth may be positively correlated if the Chinese government prefers to concentrate its development finance in countries with high growth, but a negative correlation is also possible given its stated goal "to ensure its aid benefits as many needy people as possible" (p. 15; see also State Council 2011). In addition, the regression is rather lean compared to the rich dataset. It is therefore possible that the error term correlates with the regressor $OF_{CHN,i,t-2}$, which creates a risk of omitted-variable bias.

This endogeneity is addressed with an instrumental variables regression that uses the following first-stage regression:

\begin{equation}
OF_{CHN,i,t-2} = \gamma_{1}Material_{t-3} \times p_{CHN,i} + \gamma_{2}Reserves_{t-3} \times p_{CHN,i} + \gamma_{3}pop_{i,t-1} + \gamma_{4}\eta_{i} + \gamma_{5}\mu_{t} + u_{i,t-2}
\end{equation}

The instruments that are used are

* $Material_{t-3}$, the lagged, detrended and logged Chinese production of materials, which varies over time $t$, interacted with $p_{CHN,i}$, the probability that recipient $i$ receives Chinese development finance,
* $Reserves_{t-3}$, the lagged and detrended change in China's net foreign exchange reserves, again interacted with $p_{CHN,i}$.
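
The logic of interacting a time series with a country-specific probability can be sketched with two-stage least squares on simulated data. Everything below is an assumption made for illustration: the data-generating process, the coefficient values, the omission of the population control and the helper `two_way_demean` are mine, not the authors'. The sketch only reports point estimates, since standard errors taken naively from the second stage would be wrong.

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_years = 200, 15

# Synthetic ingredients: all values are invented for illustration.
p = rng.uniform(0, 1, size=(n_countries, 1))   # stand-in for p_{CHN,i}
material = rng.normal(size=(1, n_years))       # stand-in for Material_{t-3}
reserves = rng.normal(size=(1, n_years))       # stand-in for Reserves_{t-3}
z1, z2 = material * p, reserves * p            # interacted instruments

eta = rng.normal(size=(n_countries, 1))        # country fixed effects
mu = rng.normal(size=(1, n_years))             # year fixed effects
confounder = rng.normal(size=(n_countries, n_years))  # unobserved, drives finance and growth

shape = (n_countries, n_years)
finance = 3.0 * z1 + 2.0 * z2 + confounder + eta + mu + rng.normal(size=shape)
beta_true = 0.5
growth = beta_true * finance - confounder + eta + mu + rng.normal(size=shape)


def two_way_demean(x):
    """Absorb country and year fixed effects in a balanced panel."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()


Z = np.column_stack([two_way_demean(z1).ravel(), two_way_demean(z2).ravel()])
x = two_way_demean(finance).ravel()
y = two_way_demean(growth).ravel()

# First stage: project finance on the instruments.
gamma_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ gamma_hat

# Second stage: regress growth on the fitted values; plain OLS for comparison.
beta_iv = (x_hat @ y) / (x_hat @ x_hat)
beta_ols = (x @ y) / (x @ x)

print("OLS :", beta_ols)
print("2SLS:", beta_iv)
```

Because the pure time series and the pure probability are absorbed by the year and country fixed effects, only their interaction provides identifying variation. In this simulation the confounder pushes the OLS estimate below the assumed effect, while the 2SLS estimate should stay close to it.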

$Material_{t-3}$ is the time-varying part of the first instrument. It is constructed using factor analysis, which identifies latent structure in the individual production series, extracts their common variance and condenses it into a single common score. This captures the joint variation of the logged and detrended production figures.
https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/factor-analysis/
https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/factor-analysis-2/
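
The idea of condensing several detrended series into one common score can be sketched as follows. Note the substitution: this sketch uses the first principal component (via a singular value decomposition) as a simple stand-in for a factor-analysis score, which is not necessarily the procedure the authors used. The series, loadings and trends are simulated, and the number of series and years are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, n_series = 30, 6
t = np.arange(n_years)

# Simulated logged production series: trend + common factor + idiosyncratic noise.
factor = rng.normal(size=n_years)                       # latent common driver
loadings = rng.uniform(0.5, 1.5, size=n_series)
slopes = rng.uniform(0.02, 0.10, size=n_series)
log_output = (5.0 + np.outer(t, slopes) + 0.2 * np.outer(factor, loadings)
              + 0.05 * rng.normal(size=(n_years, n_series)))

# Detrend each series with its own linear time trend.
coef = np.polyfit(t, log_output, 1)                     # shape (2, n_series)
detrended = log_output - (np.outer(t, coef[0]) + coef[1])

# Standardize and extract the first principal component as the common score.
standardized = detrended / detrended.std(axis=0)
U, s, Vt = np.linalg.svd(standardized, full_matrices=False)
score = U[:, 0] * s[0]
explained_share = s[0] ** 2 / (s ** 2).sum()

# The sign of a principal component is arbitrary, so compare in absolute value.
corr_with_factor = abs(np.corrcoef(score, factor)[0, 1])
print("share of variance explained:", explained_share)
print("|corr| with latent factor  :", corr_with_factor)
```

A single score of this kind would then be lagged and interacted with the receipt probability to form the instrument.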





# 4. Replication

## 4.1 The Dataset


The dataset of Dreher et al. (2021), constructed with the TUFF methodology introduced above, covers 4,304 Chinese-financed development projects that were committed, implemented or completed between 2000 and 2014 in 138 countries worldwide, based on 15,500 unique sources of information.

## 4.2 Descriptive Statistics

In [3]:
data_1 = pd.read_stata('data/map_1yw_merge.dta')
data_2 = pd.read_stata('data/worldcoord.dta')

In [4]:
data_1.head()

| | FIPS_CNTRY | GMI_CNTRY | ISO_2DIGIT | recipient_iso3 | ISO_NUM | CNTRY_NAME | LONG_NAME | POP2007 | SQKM | SQMI | LAND_SQKM | COLORMAP | id | OFa_all_con | probaid_PRC_OFn_all |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AA | ABW | AW | ABW | 533.0 | Aruba | Aruba | 72194.0 | 139.93 | 54.03 | 193.0 | 1.0 | 42.0 | 0.0 | 0.0 |
| 1 | AF | AFG | AF | AFG | 4.0 | Afghanistan | Islamic Republic of Afghanistan | 31889923.0 | 641358.44 | 247628.48 | 647500.0 | 3.0 | 177.0 | 135674200.0 | 0.933333 |
| 2 | AO | AGO | AO | AGO | 24.0 | Angola | Republic of Angola | 12263596.0 | 1252934.88 | 483758.22 | 1246700.0 | 1.0 | 221.0 | 13164030000.0 | 0.933333 |
| 3 | AV | AIA | AI | AIA | 660.0 | Anguilla | Anguilla | 13677.0 | 74.48 | 28.76 | 102.0 | 6.0 | 59.0 |  |  |
| 4 | AL | ALB | AL | ALB | 8.0 | Albania | Republic of Albania | 3600523.0 | 28798.0 | 11118.91 | 27398.0 | 6.0 | 122.0 | 233010900.0 | 0.733333 |


In [15]:
largest_recipiants_df = data_1.sort_values(by="OFa_all_con", ascending=False)
largest_recipiants_df.head(5)

| | FIPS_CNTRY | GMI_CNTRY | ISO_2DIGIT | recipient_iso3 | ISO_NUM | CNTRY_NAME | LONG_NAME | POP2007 | SQKM | SQMI | LAND_SQKM | COLORMAP | id | OFa_all_con | probaid_PRC_OFn_all |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 180 | RS | RUS | RU | RUS | 643.0 | Russia | Russian Federation | 141377752.0 | 16897294.0 | 6524043.5 | 16995800.0 | 1.0 | 104.0 | 28717520000.0 | 0.6 |
| 163 | PK | PAK | PK | PAK | 586.0 | Pakistan | Islamic Republic of Pakistan | 169270617.0 | 880202.69 | 339846.25 | 778720.0 | 4.0 | 180.0 | 19340710000.0 | 0.933333 |
| 2 | AO | AGO | AO | AGO | 24.0 | Angola | Republic of Angola | 12263596.0 | 1252934.88 | 483758.22 | 1246700.0 | 1.0 | 221.0 | 13164030000.0 | 0.933333 |
| 67 | ET | ETH | ET | ETH | 231.0 | Ethiopia | Federal Democratic Republic of Ethiopia | 76511887.0 | 1134156.0 | 437897.63 | 1119683.0 | 4.0 | 94.0 | 12262410000.0 | 0.933333 |
| 123 | CE | LKA | LK | LKA | 144.0 | Sri Lanka | Democratic Socialist Republic of Sri Lan | 20926315.0 | 64665.21 | 24967.24 | 64740.0 | 7.0 | 165.0 | 10082000000.0 | 0.866667 |
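
As a small extension of this ranking, the totals can be scaled by population. The sketch below retypes the five rows shown above instead of reading the `.dta` file, so it runs on its own. The column interpretation is an assumption: `OFa_all_con` is read as a monetary total of official finance and `POP2007` as a head count, and the derived column names `OF_billion` and `OF_per_capita` are hypothetical. Units should be checked against the authors' data documentation before interpreting the numbers.

```python
import pandas as pd

# The five largest recipients by OFa_all_con, copied from the output above.
top5 = pd.DataFrame(
    {
        "CNTRY_NAME": ["Russia", "Pakistan", "Angola", "Ethiopia", "Sri Lanka"],
        "OFa_all_con": [28717520000.0, 19340710000.0, 13164030000.0,
                        12262410000.0, 10082000000.0],
        "POP2007": [141377752.0, 169270617.0, 12263596.0, 76511887.0, 20926315.0],
    }
)

# Assumed interpretation: monetary total divided by head count.
top5["OF_billion"] = top5["OFa_all_con"] / 1e9
top5["OF_per_capita"] = top5["OFa_all_con"] / top5["POP2007"]

# Re-rank the same five countries by the per-capita figure.
ranked = top5.sort_values(by="OF_per_capita", ascending=False).reset_index(drop=True)
print(ranked[["CNTRY_NAME", "OF_billion", "OF_per_capita"]].round(2))
```

Among these five countries the ordering changes once population is taken into account, with Angola and Sri Lanka moving ahead of Russia.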


In [14]:
# Use geopandas to plot!


# Literature to-dos

* Tracking Underreported Financial Flows (TUFF) methodology developed by Strange et al. (2017a, 2017b),
* Interview with the Prime Minister of Ethiopia ("main reason for turnaround in fate in Africa") http://et.china-embassy.org/eng/zagx/t899134.htm
