# Replication of ADH (2013)
#### By: Augusto Ospital. Date: April 5, 2022

The main model is

$ \Delta L^m_{it} = \gamma_t + \beta_1 \Delta IPW_{uit} + X'_{it}\beta_2 + e_{it}$

where $\Delta L^m_{it}$ is the decadal change in the manufacturing employment share of the working-age population in commuting zone (CZ) $i$, $\gamma_t$ are time dummies, and $\Delta IPW_{uit}$ is the change in import exposure. The time differences cover the period 1990–2007. The matrix $X_{it}$ includes controls. Standard errors are clustered at the state level to account for spatial correlation across CZs.

The change in import exposure $\Delta IPW_{uit}$ is instrumented by the non-US exposure variable $\Delta IPW_{oit}$, which is constructed using data on contemporaneous industry-level growth of Chinese exports to other high-income markets.
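To make the instrumenting step concrete, here is a minimal sketch of weighted two-stage least squares on synthetic data. It is not the ADH data and not the estimator call used in this notebook: the helper `weighted_2sls`, the simulated variables, and the coefficient values (a true slope of -0.6) are all assumptions chosen for illustration, and it uses plain NumPy rather than `linearmodels`.

```python
import numpy as np

def weighted_2sls(y, x_endog, z_instr, exog, weights):
    """Weighted 2SLS with one endogenous regressor and one instrument.

    Returns coefficients ordered as [exog columns..., endogenous regressor].
    Illustrative helper only; not part of linearmodels.
    """
    sw = np.sqrt(weights)[:, None]
    # Scale every variable by sqrt(weight) so least squares on the scaled data is WLS.
    X = np.column_stack([exog, x_endog]) * sw   # structural regressors
    Z = np.column_stack([exog, z_instr]) * sw   # exogenous regressors plus instrument
    yw = y * sw[:, 0]
    # First stage: project the structural regressors on the instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, yw, rcond=None)[0]

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                  # instrument (stand-in for non-US exposure)
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)    # endogenous regressor (stand-in for US exposure)
y = 1.0 - 0.6 * x + 2.0 * u + rng.normal(size=n)
const = np.ones((n, 1))
w = rng.uniform(0.5, 1.5, size=n)       # hypothetical observation weights

beta_iv = weighted_2sls(y, x, z, const, w)
# Unweighted OLS for comparison; it is biased because x is correlated with u.
beta_ols = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)[0]
print("2SLS slope:", beta_iv[1])
print("OLS slope: ", beta_ols[1])
```

In this simulation the confounder pushes the OLS slope away from the true value, while the 2SLS slope should land near it, because the instrument affects the outcome only through the endogenous regressor.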

## Data and code preliminaries

In [1]:
from pathlib import Path
import pandas as pd
from linearmodels.iv import IV2SLS, compare
from io import StringIO
import warnings

In [2]:
mainp = Path('/Users/augusto/Dropbox/UCLA Classes/Teaching/econ424_S22/assignment1')

In [3]:
df = pd.read_stata(mainp / 'ADH/Public Release Data/dta/workfile_china.dta')

In [4]:
df.head()

Unnamed: 0,czone,statefip,city,yr,t2,timepwt48,reg_midatl,reg_encen,reg_wncen,reg_satl,...,d_tradefactor_otch_lag_io,d_expfactor_otch_lag_io,d_tradeuschlw_pw,d_tradeotchlw_pw_lag,d_tradeuschce_pw,d_tradeotchce_pw_lag,d_tradeusce_pw,d_tradeotce_pw_lag,d_tradeushi_pw,d_tradeothi_pw_lag
0,100.0,47,undefined,1990,0,0.002114,0,0,0,0,...,2.236522,0.473667,5.745085,2.437189,11.72344,2.363754,6.429653,0.084927,8.086318,3.398634
1,100.0,47,undefined,2000,1,0.002067,0,0,0,0,...,7.756201,2.385448,7.433516,9.709133,10.229408,9.147025,3.609548,0.178625,7.066726,24.310005
2,200.0,47,undefined,1990,0,0.000732,0,0,0,0,...,2.819397,0.470286,3.139381,3.033377,5.698626,2.847327,2.668147,0.049593,4.824689,0.610824
3,200.0,47,undefined,2000,1,0.000815,0,0,0,0,...,4.411247,0.982626,10.840832,4.853496,13.314946,4.596729,3.0542,0.115029,0.679557,7.543435
4,301.0,47,undefined,1990,0,0.000261,0,0,0,0,...,1.100026,0.110491,2.65686,0.798065,6.746723,0.739753,4.684127,0.018845,5.223952,-0.121527


In [5]:
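def MyIVreg(formula, df=df):
    # Fit the formula with IV2SLS, weighting observations by the timepwt48 column
    # and clustering standard errors by state (statefip).
    res = IV2SLS.from_formula(
        formula,
        df,
        weights=df["timepwt48"]
    ).fit(cov_type="clustered", clusters=df["statefip"])

    return res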

In [6]:
pd.options.display.latex.repr = True

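def CompareDF(x, fit_stats=['Estimator', 'R-squared', 'No. Observations'], keep=[]):
    # Render the linearmodels comparison table as CSV and read it back into pandas.
    with warnings.catch_warnings():
        warnings.simplefilter(action='ignore', category=FutureWarning)
        y = pd.read_csv(StringIO(compare(x, stars=True, precision='std_errors').summary.as_csv()),
                        skiprows=1, skipfooter=1, engine='python')
    # Index rows by their stripped label; columns are (model label, dependent variable).
    z = pd.DataFrame(
        data=y.iloc[:, 1:].values,
        index=y.iloc[:, 0].str.strip(),
        columns=pd.MultiIndex.from_arrays(
            arrays=[y.columns[1:], y.iloc[0][1:]],
            names=['Model', 'Dep. Var.']
        )
    )
    if not keep:
        # No variables requested: return rows from position 11 onward plus the fit statistics.
        return pd.concat([z.iloc[11:], z.loc[fit_stats]])
    else:
        # For each requested variable, keep its coefficient row and the row below it
        # (the standard errors), then append the fit statistics.
        return pd.concat([*[z.iloc[z.index.get_loc(v):z.index.get_loc(v)+2] for v in keep], z.loc[fit_stats]])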



## Table 3: Change in Manuf/Pop, Pooled Regressions with Controls

In Table 3, we augment the first-difference model for the period 1990–2007 with a set of demographic and labor force measures, which test robustness and may eliminate confounds.
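The progressive addition of controls across the table's columns amounts to appending each control set to a common base formula. The sketch below shows that construction in isolation; the helper name `build_iv_formulas` is hypothetical, and the control sets are shortened to two columns purely for illustration.

```python
def build_iv_formulas(base_formula, control_sets):
    """Return {'(1)': formula, '(2)': formula, ...}, one formula per control set."""
    return {
        "({})".format(i + 1): " + ".join([base_formula, *controls])
        for i, controls in enumerate(control_sets)
    }

# Base 2SLS formula: the bracketed term is [endogenous ~ instrument].
base = "d_sh_empl_mfg ~ [d_tradeusch_pw ~ d_tradeotch_pw_lag] + 1"
# Shortened, illustrative control sets (the notebook uses six).
control_sets = [
    ["t2"],
    ["t2", "l_shind_manuf_cbp"],
]

formulas = build_iv_formulas(base, control_sets)
for label, formula in formulas.items():
    print(label, formula)
```

Each resulting string can be passed to a formula-based estimator, so a new specification only requires adding another list of column names.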

In [7]:
regions = list(filter(lambda x: x.startswith("reg"), df.columns))
baseform3 = "d_sh_empl_mfg ~ [d_tradeusch_pw ~ d_tradeotch_pw_lag] + 1"
controls3 = [
    ["t2"],
    ["t2","l_shind_manuf_cbp"],
    ["t2","l_shind_manuf_cbp"] + regions,
    ["t2","l_shind_manuf_cbp", "l_sh_popedu_c", "l_sh_popfborn", "l_sh_empl_f"] + regions,
    ["t2","l_shind_manuf_cbp", "l_task_outsource", "l_sh_routine33"] + regions,
    ["t2","l_shind_manuf_cbp", "l_sh_popedu_c", "l_sh_popfborn", "l_sh_empl_f", "l_task_outsource", "l_sh_routine33"] + regions,
]

#### I. 1990–2007 stacked first differences

In [8]:
models3 = {
    '({})'.format(i+1) : ' + '.join([baseform3, *controls3[i]]) for i in range(len(controls3))
}
results3 = {i : MyIVreg(m) for i, m in models3.items()}

In [9]:
# CompareDF(results3)
keep = ['d_tradeusch_pw','l_shind_manuf_cbp', 'l_sh_popedu_c', 'l_sh_popfborn', 'l_sh_empl_f', 
        'l_task_outsource', 'l_sh_routine33']
CompareDF(results3, keep = keep)

Model,(1),(2),(3),(4),(5),(6)
Dep. Var.,d_sh_empl_mfg,d_sh_empl_mfg,d_sh_empl_mfg,d_sh_empl_mfg,d_sh_empl_mfg,d_sh_empl_mfg
,,,,,,
d_tradeusch_pw,-0.7460***,-0.6104***,-0.5376***,-0.5080***,-0.5625***,-0.5964***
,(0.0680),(0.0940),(0.0909),(0.0812),(0.0963),(0.0988)
l_shind_manuf_cbp,,-0.0355,-0.0521***,-0.0613***,-0.0563***,-0.0402***
,,(0.0218),(0.0201),(0.0170),(0.0164),(0.0131)
l_sh_popedu_c,,,,-0.0082,,0.0131
,,,,(0.0165),,(0.0122)
l_sh_popfborn,,,,-0.0071,,0.0304***
,,,,(0.0078),,(0.0108)
l_sh_empl_f,,,,-0.0541**,,-0.0059


#### II. 2SLS first stage estimates

In [10]:
baseform3_first = 'd_tradeusch_pw ~ d_tradeotch_pw_lag + 1'
models3_first = {
    '({})'.format(i+1) : ' + '.join([baseform3_first, *controls3[i]]) 
    for i in range(len(controls3))
}
results3_first = {i : MyIVreg(m) for i, m in models3_first.items()}

In [11]:
CompareDF(results3_first, keep=['d_tradeotch_pw_lag'], fit_stats=['R-squared'])

Model,(1),(2),(3),(4),(5),(6)
Dep. Var.,d_tradeusch_pw,d_tradeusch_pw,d_tradeusch_pw,d_tradeusch_pw,d_tradeusch_pw,d_tradeusch_pw
,,,,,,
d_tradeotch_pw_lag,0.7916***,0.6637***,0.6518***,0.6346***,0.6380***,0.6310***
,(0.0793),(0.0881),(0.0924),(0.0925),(0.0893),(0.0900)
R-squared,0.5436,0.5729,0.5789,0.5846,0.5833,0.5848


## Table 5: Change in Employment, Unemployment and Non-Employment

In [13]:
regions = list(filter(lambda x: x.startswith("reg"), df.columns))
controls5 = ['t2','l_shind_manuf_cbp','l_sh_popedu_c','l_sh_popfborn','l_sh_empl_f','l_sh_routine33',
             'l_task_outsource'] + regions
lhs5 = {
    'A':['lnchg_no_empl_mfg','lnchg_no_empl_nmfg','lnchg_no_unempl','lnchg_no_nilf','lnchg_no_ssadiswkrs'],
    'B':['d_sh_empl_mfg','d_sh_empl_nmfg','d_sh_unempl','d_sh_nilf','d_sh_ssadiswkrs'],
    'C':['d_sh_empl_mfg_edu_c','d_sh_empl_nmfg_edu_c','d_sh_unempl_edu_c','d_sh_nilf_edu_c'],
    'D':['d_sh_empl_mfg_edu_nc','d_sh_empl_nmfg_edu_nc','d_sh_unempl_edu_nc','d_sh_nilf_edu_nc']
}

#### Panel A. 100 × log change in population counts
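As a reminder of what a "100 × log change" outcome measures, the sketch below computes it for made-up counts. The function name `log_change_x100` and the numbers are assumptions for illustration only; the `lnchg_*` variables in the dataset come precomputed, and any rescaling applied to them by the authors is not reproduced here.

```python
import numpy as np

def log_change_x100(count_start, count_end):
    """100 times the log change between two counts (approximately a percent change)."""
    return 100.0 * (np.log(count_end) - np.log(count_start))

# Hypothetical start- and end-of-period head counts for two locations.
start = np.array([1000.0, 2000.0])
end = np.array([900.0, 2200.0])

chg = log_change_x100(start, end)
print(chg)  # negative where the count fell, positive where it rose
```

For small changes the value is close to the ordinary percent change, which is why coefficients in this panel can be read roughly as percentage-point effects on counts.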

In [14]:
models5a = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs5['A'][i]), 
                                     *controls5]) for i in range(len(lhs5['A']))
}
results5a = {i : MyIVreg(m) for i, m in models5a.items()}

In [15]:
CompareDF(results5a, keep=['d_tradeusch_pw'], fit_stats=[])

Model,(1),(2),(3),(4),(5)
Dep. Var.,lnchg_no_empl_mfg,lnchg_no_empl_nmfg,lnchg_no_unempl,lnchg_no_nilf,lnchg_no_ssadiswkrs
,,,,,
d_tradeusch_pw,-4.2305***,-0.2741,4.9213***,2.0583*,1.4659***
,(1.0472),(0.6509),(1.1278),(1.0797),(0.5566)


#### Panel B. Change in population shares

In [16]:
models5b = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs5['B'][i]), 
                                     *controls5]) for i in range(len(lhs5['B']))
}
results5b = {i : MyIVreg(m) for i, m in models5b.items()}

In [17]:
CompareDF(results5b, keep=['d_tradeusch_pw'], fit_stats=[])

Model,(1),(2),(3),(4),(5)
Dep. Var.,d_sh_empl_mfg,d_sh_empl_nmfg,d_sh_unempl,d_sh_nilf,d_sh_ssadiswkrs
,,,,,
d_tradeusch_pw,-0.5964***,-0.1779,0.2209***,0.5533***,0.0764***
,(0.0988),(0.1370),(0.0576),(0.1503),(0.0276)


#### Panel C. College education

In [18]:
models5c = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs5['C'][i]), 
                                     *controls5]) for i in range(len(lhs5['C']))
}
results5c = {i : MyIVreg(m) for i, m in models5c.items()}

In [19]:
CompareDF(results5c, keep=['d_tradeusch_pw'], fit_stats=[])

Model,(1),(2),(3),(4)
Dep. Var.,d_sh_empl_mfg_edu_c,d_sh_empl_nmfg_edu_c,d_sh_unempl_edu_c,d_sh_nilf_edu_c
,,,,
d_tradeusch_pw,-0.5917***,0.1681,0.1191***,0.3044***
,(0.1247),(0.1216),(0.0387),(0.1129)


#### Panel D. No college education

In [20]:
models5d = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs5['D'][i]), 
                                     *controls5]) for i in range(len(lhs5['D']))
}
results5d = {i : MyIVreg(m) for i, m in models5d.items()}

In [21]:
CompareDF(results5d, keep=['d_tradeusch_pw'], fit_stats=[])

Model,(1),(2),(3),(4)
Dep. Var.,d_sh_empl_mfg_edu_nc,d_sh_empl_nmfg_edu_nc,d_sh_unempl_edu_nc,d_sh_nilf_edu_nc
,,,,
d_tradeusch_pw,-0.5811***,-0.5314***,0.2817***,0.8308***
,(0.0949),(0.2035),(0.0851),(0.2106)


## Table 6: Wage Changes

In [22]:
controls6 = controls5
lhs6 = {
    'A':['d_avg_lnwkwage','d_avg_lnwkwage_m','d_avg_lnwkwage_f'],
    'B':['d_avg_lnwkwage_c','d_avg_lnwkwage_c_m','d_avg_lnwkwage_c_f'],
    'C':['d_avg_lnwkwage_nc','d_avg_lnwkwage_nc_m','d_avg_lnwkwage_nc_f'],
}

# for panel in lhs6.keys():
#     for lhs in lhs6[panel]:
#         baseform6 = '{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs)
#         ADH_two_stage_regression(' + '.join([baseform6, *controls6]), print_first=False)

#### Panel A. All education levels

In [23]:
models6a = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs6['A'][i]), 
                                     *controls6]) for i in range(len(lhs6['A']))
}
results6a = {i : MyIVreg(m) for i, m in models6a.items()}

In [24]:
CompareDF(results6a, keep=['d_tradeusch_pw'], fit_stats=['R-squared'])

Model,(1),(2),(3)
Dep. Var.,d_avg_lnwkwage,d_avg_lnwkwage_m,d_avg_lnwkwage_f
,,,
d_tradeusch_pw,-0.7592***,-0.8921***,-0.6140***
,(0.2526),(0.2937),(0.2374)
R-squared,0.5603,0.4414,0.6919


#### Panel B. College education

In [25]:
models6b = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs6['B'][i]), 
                                     *controls6]) for i in range(len(lhs6['B']))
}
results6b = {i : MyIVreg(m) for i, m in models6b.items()}

In [26]:
CompareDF(results6b, keep=['d_tradeusch_pw'], fit_stats=['R-squared'])

Model,(1),(2),(3)
Dep. Var.,d_avg_lnwkwage_c,d_avg_lnwkwage_c_m,d_avg_lnwkwage_c_f
,,,
d_tradeusch_pw,-0.7568**,-0.9905***,-0.5255*
,(0.3084),(0.3741),(0.2786)
R-squared,0.5155,0.3875,0.6313


#### Panel C. No college education

In [27]:
models6c = {
    '({})'.format(i+1) : ' + '.join(['{} ~ 1 + [d_tradeusch_pw ~ d_tradeotch_pw_lag]'.format(lhs6['C'][i]), 
                                     *controls6]) for i in range(len(lhs6['C']))
}
results6c = {i : MyIVreg(m) for i, m in models6c.items()}

In [28]:
CompareDF(results6c, keep=['d_tradeusch_pw'], fit_stats=['R-squared'])

Model,(1),(2),(3)
Dep. Var.,d_avg_lnwkwage_nc,d_avg_lnwkwage_nc_m,d_avg_lnwkwage_nc_f
,,,
d_tradeusch_pw,-0.8144***,-0.7031***,-1.1164***
,(0.2357),(0.2496),(0.2781)
R-squared,0.5189,0.4518,0.5916
