# <font color=blue>_THE NATURAL SELECTION OF INFECTIOUS DISEASE RESISTANCE AND ITS EFFECT ON CONTEMPORARY HEALTH- A replication study_
## <font color=black> C. Justin Cook
### _Review of Economics and Statistics Volume 97 | Issue 4 | October 2015 p.742-757_

This study pretends to do a replication of _"THE NATURAL SELECTION OF INFECTIOUS DISEASE RESISTANCE AND
ITS EFFECT ON CONTEMPORARY HEALTH"_ whose author is C. Justin Cook, an assistant Professor, Economics University of California, Merced. The replication data for paper is available [here](https://doi.org/10.7910/DVN/27669)

The replication is done by Sara Ariza Murillo, a colombian economist from the Universidad del Rosario. This includes two jupyter notebooks:

1. **Notebook one:** presents the stata do file of the author that replicates tables 2-7 within the paper. It also replicates in stata the figures of the paper that are not included in the author's do file.
2. **Notebook two:** presents the code elaborated by Sara in R that replicates the tables within the paper. 

Both notebooks include a summary of the document.

If there are any questions, please contact Sara Ariza at sara.ariza@urosario.edu.co

### <font color=blue> Notebook two

#### <font color=blue> Abstract
 _"This paper empirically tests the association between genetically determined resistance to infectious disease and cross-country health differences. A country-level measure of genetic diversity for the system of genes associated with the recognition and disposal of foreign pathogens is constructed. Genetic diversity within this system has been shown to reduce the virulence and prevalence of infectious diseases and is hypothesized to have been naturally selected from historical exposure to infectious pathogens. Base estimation shows a statistically strong, robust, and positive relationship between this constructed measure and country-level health outcomes in times prior to, but not after, the international epidemiological transition"_

#### <font color=blue> I. Introduction
<p style="text-align: justify;"> Prior to the major medical discoveries associated with the international epidemiological transition, infectious diseases were a major determinant of mortality and subsequent differences in life expectancy across countries. (The discovery and widespread use of effective medicines (e.g., penicillin, streptomycin, a range of vaccines) in the late 1940s to early 1950s is labeled by Acemoglu and Johnson (2007) as the international epidemiological transition.)  
   
<p style="text-align: justify;"> This paper answers the question what were the causes of the initial cross-country disparities in the virulence of infectious diseases? the above, by empirically investigating the role of genetically determined differences in resistance to infectious diseases 
    
<p style="text-align: justify;"> The main hyphotesis is that innate resistance did influence country-level response to infectious disease prior to the international epidemiological transition, but the effects of innate resistance are dissipated by more efficacious health technologies.

<p style="text-align: justify;"> The measure of genetic resistance is found within the human leukocyte antigen (HLA) system. 

##### The HLA system 
<p style="text-align: justify;"> The HLA is responsible for locating foreign proteins in order to direct cells of the immune system to initiate an immune response and is broken into two major classes, class I and class II, with both classes being associated with the recognition of certain pathogens (Piertney&Oliver, 2006).

Using country-level aggregations of ethnic-level genetic data, the author constructs a cross-country measure for diversity within the HLA system: HLA heterozygosity. He tests the hypothesis by estimating the association between country-level health measures (e.g., life expectancy at birth) and HLA heterozygosity in periods both prior to and after the international epidemiological transition 


In [1]:
rm(list=ls())
cat("\014")



In [2]:
install.packages("lmtest")
install.packages("sandwich")
install.packages("stargazer")
install.packages("data.table")

package 'lmtest' successfully unpacked and MD5 sums checked


"restored 'lmtest'"


The downloaded binary packages are in
	C:\Users\USUARIO\AppData\Local\Temp\RtmpsvIWcj\downloaded_packages
package 'sandwich' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\USUARIO\AppData\Local\Temp\RtmpsvIWcj\downloaded_packages
package 'stargazer' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\USUARIO\AppData\Local\Temp\RtmpsvIWcj\downloaded_packages
package 'data.table' successfully unpacked and MD5 sums checked


"restored 'data.table'"


The downloaded binary packages are in
	C:\Users\USUARIO\AppData\Local\Temp\RtmpsvIWcj\downloaded_packages


In [3]:
library(lmtest)
library(sandwich)
library(stargazer)
library(data.table)

"package 'lmtest' was built under R version 3.6.3"Loading required package: zoo
"package 'zoo' was built under R version 3.6.3"
Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

"package 'sandwich' was built under R version 3.6.3"
Please cite as: 

 Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.2. https://CRAN.R-project.org/package=stargazer 

"package 'data.table' was built under R version 3.6.3"

In [4]:
dir1 <-("./raw/") #the directory where the base is
setwd(dir1)

In [5]:
hla_country_web <- read.delim("hla_country_web.tab", header=TRUE)


In [6]:
#############################
##########TABLA1#############
#############################

data<-data.table(hla_country_web)
stargazer(data,type="text")



Statistic       N    Mean    St. Dev.     Min    Pctl(25)  Pctl(75)     Max    
-------------------------------------------------------------------------------
ln_hla_het     175  -1.158     0.093    -1.560    -1.174    -1.109     -1.042  
hla_het        175   0.315     0.027     0.210     0.309     0.330     0.353   
ln_mort40      89   -1.095     0.802    -3.507    -1.619    -0.477     0.181   
ln_le40        87    3.820     0.253     3.292     3.620     4.039     4.215   
ln_le50        155   3.893     0.250     3.401     3.665     4.116     4.286   
ln_le60        158   3.975     0.223     3.438     3.790     4.175     4.298   
ln_le70        160   4.047     0.205     3.525     3.906     4.224     4.313   
ln_le80        159   4.112     0.178     3.654     4.010     4.252     4.332   
ln_le90        159   4.157     0.175     3.491     4.083     4.275     4.367   
ln_le00        160   4.185     0.174     3.682     4.117     4.306     4.395   
ln_le10        159   4.231     0.153   

In [7]:
##############################################################################
########## TABLE_2: Historic Determinants of HLA Heterozygosity  #############
##############################################################################

regt1_1<-lm(ln_hla_het~aa_ln_atd,subset=aa_mdist!="NA" & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA",data=hla_country_web)
regt1_1robust<-coeftest(regt1_1, vcov = vcovHC(regt1_1, type="HC1"))
summary(regt1_1)
summary(regt1_1robust)

regt1_2<-lm(ln_hla_het~aa_ln_atd+aa_mdist,subset=aa_mdist!="NA" & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt1_2robust<-coeftest(regt1_2, vcov = vcovHC(regt1_2, type="HC1"))
summary(regt1_2)
summary(regt1_2robust)

regt1_3<-lm(ln_hla_het~aa_lanim,subset=aa_mdist!="NA" & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt1_3robust<-coeftest(regt1_3, vcov = vcovHC(regt1_3, type="HC1"))
summary(regt1_3)
summary(regt1_3robust)

regt1_4<-lm(ln_hla_het~aa_lanim+aa_mdist,subset=aa_mdist!="NA" & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt1_4robust<-coeftest(regt1_4, vcov = vcovHC(regt1_4, type="HC1"))
summary(regt1_4)
summary(regt1_4robust)

regt1_5<-lm(ln_hla_het~aa_lpd1,subset=aa_mdist!="NA"  & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt1_5robust<-coeftest(regt1_5, vcov = vcovHC(regt1_5, type="HC1"))
summary(regt1_5)
summary(regt1_5robust)

regt1_6<-lm(ln_hla_het~aa_lpd1+aa_mdist,subset=aa_mdist!="NA" & ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt1_6robust<-coeftest(regt1_6, vcov = vcovHC(regt1_6, type="HC1"))
summary(regt1_6)
summary(regt1_6robust)



Call:
lm(formula = ln_hla_het ~ aa_ln_atd, data = hla_country_web, 
    subset = aa_mdist != "NA" & ln_le60 != "NA" & ln_hla_het != 
        "NA" & ln_frac != "NA " & ln_abslat != "NA" & ln_arable != 
        "NA" & ln_suitavg != "NA" & aa_ln_atd != "NA" & ln_le60 != 
        "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.29871 -0.02954  0.01560  0.04933  0.09588 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.32606    0.10613  -12.49   <2e-16 ***
aa_ln_atd    0.02114    0.01251    1.69   0.0934 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06738 on 129 degrees of freedom
  (44 observations deleted due to missingness)
Multiple R-squared:  0.02166,	Adjusted R-squared:  0.01408 
F-statistic: 2.856 on 1 and 129 DF,  p-value: 0.09344


    Estimate          Std. Error          t value           Pr(>|t|)       
 Min.   :-1.32606   Min.   :0.009805   Min.   :-15.660   Min.   :0.000000  
 1st Qu.:-0.98926   1st Qu.:0.028523   1st Qu.:-11.206   1st Qu.:0.008225  
 Median :-0.65246   Median :0.047241   Median : -6.752   Median :0.016449  
 Mean   :-0.65246   Mean   :0.047241   Mean   : -6.752   Mean   :0.016449  
 3rd Qu.:-0.31566   3rd Qu.:0.065960   3rd Qu.: -2.298   3rd Qu.:0.024674  
 Max.   : 0.02114   Max.   :0.084678   Max.   :  2.157   Max.   :0.032899  


Call:
lm(formula = ln_hla_het ~ aa_ln_atd + aa_mdist, data = hla_country_web, 
    subset = aa_mdist != "NA" & ln_le60 != "NA" & ln_hla_het != 
        "NA" & ln_frac != "NA " & ln_abslat != "NA" & ln_arable != 
        "NA" & ln_suitavg != "NA" & aa_ln_atd != "NA" & ln_le60 != 
        "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.189302 -0.028152  0.003329  0.033802  0.138810 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.321902   0.081579 -16.204  < 2e-16 ***
aa_ln_atd    0.029984   0.009661   3.104  0.00235 ** 
aa_mdist    -0.012063   0.001269  -9.505  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05179 on 128 degrees of freedom
  (44 observations deleted due to missingness)
Multiple R-squared:  0.4265,	Adjusted R-squared:  0.4175 
F-statistic: 47.59 on 2 and 128 DF,  p-value: 3.522e-16


    Estimate          Std. Error          t value           Pr(>|t|)        
 Min.   :-1.32190   Min.   :0.001422   Min.   :-18.110   Min.   :0.0000000  
 1st Qu.:-0.66698   1st Qu.:0.005030   1st Qu.:-13.295   1st Qu.:0.0000000  
 Median :-0.01206   Median :0.008638   Median : -8.481   Median :0.0000000  
 Mean   :-0.43466   Mean   :0.027685   Mean   : -7.707   Mean   :0.0002357  
 3rd Qu.: 0.00896   3rd Qu.:0.040816   3rd Qu.: -2.505   3rd Qu.:0.0003536  
 Max.   : 0.02998   Max.   :0.072994   Max.   :  3.471   Max.   :0.0007072  


Call:
lm(formula = ln_hla_het ~ aa_lanim, data = hla_country_web, subset = aa_mdist != 
    "NA" & ln_le60 != "NA" & ln_hla_het != "NA" & ln_frac != 
    "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
    "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28924 -0.03132  0.01314  0.05735  0.08659 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.189394   0.012819 -92.780   <2e-16 ***
aa_lanim     0.026613   0.007815   3.406    0.001 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07228 on 87 degrees of freedom
  (86 observations deleted due to missingness)
Multiple R-squared:  0.1176,	Adjusted R-squared:  0.1075 
F-statistic:  11.6 on 1 and 87 DF,  p-value: 0.001001


    Estimate          Std. Error          t value          Pr(>|t|)        
 Min.   :-1.18939   Min.   :0.006444   Min.   :-94.03   Min.   :0.000e+00  
 1st Qu.:-0.88539   1st Qu.:0.007995   1st Qu.:-69.49   1st Qu.:2.078e-05  
 Median :-0.58139   Median :0.009546   Median :-44.95   Median :4.157e-05  
 Mean   :-0.58139   Mean   :0.009546   Mean   :-44.95   Mean   :4.157e-05  
 3rd Qu.:-0.27739   3rd Qu.:0.011097   3rd Qu.:-20.41   3rd Qu.:6.235e-05  
 Max.   : 0.02661   Max.   :0.012649   Max.   :  4.13   Max.   :8.314e-05  


Call:
lm(formula = ln_hla_het ~ aa_lanim + aa_mdist, data = hla_country_web, 
    subset = aa_mdist != "NA" & ln_hla_het != "NA" & ln_frac != 
        "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.173322 -0.027746  0.009552  0.035496  0.127879 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.109316   0.011777 -94.190  < 2e-16 ***
aa_lanim     0.036053   0.005395   6.683 2.22e-09 ***
aa_mdist    -0.013321   0.001318 -10.110 2.73e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04914 on 86 degrees of freedom
  (86 observations deleted due to missingness)
Multiple R-squared:  0.5968,	Adjusted R-squared:  0.5874 
F-statistic: 63.65 on 2 and 86 DF,  p-value: < 2.2e-16


    Estimate          Std. Error          t value             Pr(>|t|)        
 Min.   :-1.10932   Min.   :0.001494   Min.   :-102.9112   Min.   :0.000e+00  
 1st Qu.:-0.56132   1st Qu.:0.003295   1st Qu.: -55.9145   1st Qu.:3.620e-14  
 Median :-0.01332   Median :0.005096   Median :  -8.9177   Median :7.240e-14  
 Mean   :-0.36219   Mean   :0.005790   Mean   : -34.9181   Mean   :1.258e-10  
 3rd Qu.: 0.01137   3rd Qu.:0.007938   3rd Qu.:  -0.9215   3rd Qu.:1.887e-10  
 Max.   : 0.03605   Max.   :0.010779   Max.   :   7.0747   Max.   :3.773e-10  


Call:
lm(formula = ln_hla_het ~ aa_lpd1, data = hla_country_web, subset = aa_mdist != 
    "NA" & ln_hla_het != "NA" & ln_frac != "NA " & ln_abslat != 
    "NA" & ln_arable != "NA" & ln_suitavg != "NA" & aa_ln_atd != 
    "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28985 -0.02536  0.01560  0.04899  0.09343 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -1.154714   0.006867 -168.154   <2e-16 ***
aa_lpd1      0.014363   0.005273    2.724   0.0075 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.06849 on 111 degrees of freedom
  (62 observations deleted due to missingness)
Multiple R-squared:  0.06264,	Adjusted R-squared:  0.0542 
F-statistic: 7.418 on 1 and 111 DF,  p-value: 0.007501


    Estimate          Std. Error          t value            Pr(>|t|)        
 Min.   :-1.15471   Min.   :0.004286   Min.   :-155.386   Min.   :0.0000000  
 1st Qu.:-0.86245   1st Qu.:0.005073   1st Qu.:-115.702   1st Qu.:0.0002755  
 Median :-0.57018   Median :0.005859   Median : -76.017   Median :0.0005510  
 Mean   :-0.57018   Mean   :0.005859   Mean   : -76.017   Mean   :0.0005510  
 3rd Qu.:-0.27791   3rd Qu.:0.006645   3rd Qu.: -36.333   3rd Qu.:0.0008265  
 Max.   : 0.01436   Max.   :0.007431   Max.   :   3.351   Max.   :0.0011020  


Call:
lm(formula = ln_hla_het ~ aa_lpd1 + aa_mdist, data = hla_country_web, 
    subset = aa_mdist != "NA" & ln_hla_het != "NA" & ln_frac != 
        "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.180134 -0.026307  0.008982  0.032460  0.114983 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -1.072103   0.010613 -101.019  < 2e-16 ***
aa_lpd1      0.015834   0.004032    3.927  0.00015 ***
aa_mdist    -0.012383   0.001383   -8.955 9.65e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05233 on 110 degrees of freedom
  (62 observations deleted due to missingness)
Multiple R-squared:  0.4578,	Adjusted R-squared:  0.448 
F-statistic: 46.45 on 2 and 110 DF,  p-value: 2.381e-15


    Estimate           Std. Error          t value            Pr(>|t|)        
 Min.   :-1.072104   Min.   :0.001501   Min.   :-107.000   Min.   :0.000e+00  
 1st Qu.:-0.542243   1st Qu.:0.002379   1st Qu.: -57.625   1st Qu.:0.000e+00  
 Median :-0.012383   Median :0.003258   Median :  -8.250   Median :0.000e+00  
 Mean   :-0.356217   Mean   :0.004926   Mean   : -36.796   Mean   :1.304e-06  
 3rd Qu.: 0.001726   3rd Qu.:0.006639   3rd Qu.:  -1.695   3rd Qu.:1.956e-06  
 Max.   : 0.015834   Max.   :0.010020   Max.   :   4.861   Max.   :3.912e-06  

In [8]:
Table2<-stargazer(regt1_1robust,regt1_2robust,regt1_3robust,regt1_4robust,
                  regt1_5robust,regt1_6robust,type="text", keep.stat=c("n", "rsq"), align = TRUE, 
                  title = "Table 2: Explaining HLA Heterozygosity",dep.var.labels = "ln HLA Heterozygosity",
                  column.sep.width = "5pt",  digits=4, no.space=TRUE, order=c("aa_ln_atd","aa_lanim","aa_lpd1","aa_mdist"),
                  covariate.labels=c("ln Years since Neolithic Revolution","ln No. of Potential Domesticate Animals", 
                                     "ln Population Density in 1 CE", "Migratory Distance from East Africa") )



Table 2: Explaining HLA Heterozygosity
                                                               Dependent variable:                       
                                        -----------------------------------------------------------------
                                                              ln HLA Heterozygosity                      
                                           (1)        (2)        (3)        (4)        (5)        (6)    
---------------------------------------------------------------------------------------------------------
ln Years since Neolithic Revolution      0.0211**  0.0300***                                             
                                         (0.0098)   (0.0086)                                             
ln No. of Potential Domesticate Animals                       0.0266***  0.0361***                       
                                                               (0.0064)   (0.0051)                       
ln Pop

In [9]:
#########################################################
##########    TABLE_3: Premedicinal Health   ############
#########################################################

regt3_1<-lm(ln_mort40~ln_hla_het,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_1robust<-coeftest(regt3_1, vcov = vcovHC(regt3_1, type="HC1"))
summary(regt3_1)

regt3_2<-lm(ln_mort40~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_2robust<-coeftest(regt3_2, vcov = vcovHC(regt3_2, type="HC1"))
summary(regt3_2)

regt3_3<-lm(ln_le40~ln_hla_het,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_3robust<-coeftest(regt3_3, vcov = vcovHC(regt3_3, type="HC1"))
summary(regt3_3)

regt3_4<-lm(ln_le40~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_4robust<-coeftest(regt3_4, vcov = vcovHC(regt3_4, type="HC1"))
summary(regt3_4)

regt3_5<-lm(ln_le60~ln_hla_het,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_5robust<-coeftest(regt3_5, vcov = vcovHC(regt3_5, type="HC1"))
summary(regt3_5)


regt3_6<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_hla_het!="NA" & ln_frac!="NA "& ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA" & ln_le60!="NA" ,data=hla_country_web)
regt3_6robust<-coeftest(regt3_6, vcov = vcovHC(regt3_6, type="HC1"))
summary(regt3_6)



Call:
lm(formula = ln_mort40 ~ ln_hla_het, data = hla_country_web, 
    subset = ln_hla_het != "NA" & ln_frac != "NA " & ln_abslat != 
        "NA" & ln_arable != "NA" & ln_suitavg != "NA" & aa_ln_atd != 
        "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-1.18727 -0.34339  0.01533  0.26435  1.09459 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -6.6724     0.9301  -7.174 5.63e-10 ***
ln_hla_het   -5.0080     0.8120  -6.167 3.79e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4915 on 71 degrees of freedom
  (102 observations deleted due to missingness)
Multiple R-squared:  0.3488,	Adjusted R-squared:  0.3397 
F-statistic: 38.03 on 1 and 71 DF,  p-value: 3.795e-08



Call:
lm(formula = ln_mort40 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_hla_het != "NA" & ln_frac != 
        "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-1.01614 -0.30628  0.05079  0.28246  0.91429 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9.57531    2.25891  -4.239  7.6e-05 ***
ln_hla_het  -3.86103    1.04886  -3.681 0.000489 ***
ln_frac     -0.39701    0.38210  -1.039 0.302832    
aa_ln_atd    0.59752    0.21443   2.787 0.007060 ** 
ln_arable    0.04486    0.10042   0.447 0.656616    
ln_suitavg   0.03620    0.08595   0.421 0.675104    
ln_abslat   -0.32604    0.10993  -2.966 0.004283 ** 
europe       0.10748    0.35411   0.304 0.762501    
africa       0.45897    0.40337   1.138 0.259562    
asia         


Call:
lm(formula = ln_le40 ~ ln_hla_het, data = hla_country_web, subset = ln_hla_het != 
    "NA" & ln_frac != "NA " & ln_abslat != "NA" & ln_arable != 
    "NA" & ln_suitavg != "NA" & aa_ln_atd != "NA" & ln_le60 != 
    "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.44116 -0.18547  0.07159  0.18426  0.34939 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.2554     0.4174  14.985  < 2e-16 ***
ln_hla_het    2.1436     0.3653   5.869 1.39e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.216 on 69 degrees of freedom
  (104 observations deleted due to missingness)
Multiple R-squared:  0.3329,	Adjusted R-squared:  0.3233 
F-statistic: 34.44 on 1 and 69 DF,  p-value: 1.386e-07



Call:
lm(formula = ln_le40 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_hla_het != "NA" & ln_frac != 
        "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.32401 -0.10245 -0.00067  0.09843  0.36017 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.316719   0.686427   7.746 1.33e-10 ***
ln_hla_het   1.077112   0.348335   3.092 0.003013 ** 
ln_frac      0.072362   0.128129   0.565 0.574344    
aa_ln_atd   -0.003076   0.068392  -0.045 0.964279    
ln_arable   -0.105980   0.033565  -3.157 0.002491 ** 
ln_suitavg   0.052993   0.027554   1.923 0.059195 .  
ln_abslat    0.098031   0.037499   2.614 0.011290 *  
europe      -0.052723   0.116216  -0.454 0.651708    
africa      -0.502073   0.128697  -3.901 0.000245 ***
asia 


Call:
lm(formula = ln_le60 ~ ln_hla_het, data = hla_country_web, subset = ln_hla_het != 
    "NA" & ln_frac != "NA " & ln_abslat != "NA" & ln_arable != 
    "NA" & ln_suitavg != "NA" & aa_ln_atd != "NA" & ln_le60 != 
    "NA")

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5756 -0.1791  0.0458  0.1764  0.3490 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.8146     0.3023  19.233  < 2e-16 ***
ln_hla_het    1.6200     0.2631   6.157 8.72e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2036 on 129 degrees of freedom
  (44 observations deleted due to missingness)
Multiple R-squared:  0.2271,	Adjusted R-squared:  0.2211 
F-statistic: 37.91 on 1 and 129 DF,  p-value: 8.721e-09



Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_hla_het != "NA" & ln_frac != 
        "NA " & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA" & ln_le60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48975 -0.06072  0.00716  0.07354  0.27776 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.962062   0.408122  12.158  < 2e-16 ***
ln_hla_het   1.015120   0.192335   5.278 5.89e-07 ***
ln_frac     -0.121001   0.075280  -1.607 0.110605    
aa_ln_atd    0.036998   0.037781   0.979 0.329412    
ln_arable   -0.008700   0.015168  -0.574 0.567349    
ln_suitavg   0.008039   0.014414   0.558 0.578063    
ln_abslat    0.019909   0.016277   1.223 0.223684    
europe       0.038739   0.081110   0.478 0.633792    
africa      -0.312210   0.084269  -3.705 0.000321 ***
asia 

In [10]:
Table3<-stargazer(regt3_1robust,regt3_2robust,regt3_3robust,regt3_4robust,
                  regt3_5robust,regt3_6robust,type="text", align = TRUE,
                  keep=c("ln_hla_het","ln_frac","aa_ln_atd","ln_arable", "ln_suitavg","ln_abslat"),
                  title = "Table 3: The Effect of HLA Heterozygosity prior to the International Epidemiological Transition",
                  column.labels=c("ln PM1940", "ln PM1940", "ln LE1940", "ln LE1940", "ln LE1960", "ln LE1960"),
                  column.sep.width = "10pt",  digits=4, no.space=TRUE,
                  covariate.labels=c("ln HLA Heterozygosity", "ln Ethnic Fractionalization", "ln Years since Neolithic Revolution", 
                                     "ln Fraction of Arable Land", "ln Suitability of Agriculture", "ln Abs. Latitude") )



Table 3: The Effect of HLA Heterozygosity prior to the International Epidemiological Transition
                                                         Dependent variable:                      
                                    --------------------------------------------------------------
                                                                                                  
                                    ln PM1940  ln PM1940  ln LE1940 ln LE1940  ln LE1960 ln LE1960
                                       (1)        (2)        (3)       (4)        (5)       (6)   
--------------------------------------------------------------------------------------------------
ln HLA Heterozygosity               -5.0080*** -3.8610*** 2.1436*** 1.0771***  1.6200*** 1.0151***
                                     (0.7117)   (1.1282)  (0.3310)   (0.3771)  (0.2455)  (0.1899) 
ln Ethnic Fractionalization                     -0.3970               0.0724              -0.1210 
            

In [11]:
##################################################################################
##########    TABLE_ 4: Additional Years:  The effect of medicine    #############
##################################################################################

regt4_1<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_1robust<-coeftest(regt4_1, vcov = vcovHC(regt4_1, type="HC1"))
summary(regt4_1)

regt4_2<-lm(ln_le70~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_2robust<-coeftest(regt4_2, vcov = vcovHC(regt4_2, type="HC1"))
summary(regt4_2)

regt4_3<-lm(ln_le80~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_3robust<-coeftest(regt4_3, vcov = vcovHC(regt4_3, type="HC1"))
summary(regt4_3)


regt4_4<-lm(ln_le90~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_4robust<-coeftest(regt4_4, vcov = vcovHC(regt4_4, type="HC1"))
summary(regt4_4)


regt4_5<-lm(ln_le00~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_5robust<-coeftest(regt4_5, vcov = vcovHC(regt4_5, type="HC1"))
summary(regt4_5)

regt4_6<-lm(ln_le10~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=ln_le60!="NA" & ln_le70!="NA" & ln_le80!="NA" & ln_le90!="NA" & ln_le00!="NA" & ln_le10!="NA",data=hla_country_web)
regt4_6robust<-coeftest(regt4_6, vcov = vcovHC(regt4_6, type="HC1"))
summary(regt4_6)




Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48975 -0.06072  0.00716  0.07354  0.27776 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.962062   0.408122  12.158  < 2e-16 ***
ln_hla_het   1.015120   0.192335   5.278 5.89e-07 ***
ln_frac     -0.121001   0.075280  -1.607 0.110605    
aa_ln_atd    0.036998   0.037781   0.979 0.329412    
ln_arable   -0.008700   0.015168  -0.574 0.567349    
ln_suitavg   0.008039   0.014414   0.558 0.578063    
ln_abslat    0.019909   0.016277   1.223 0.223684    
europe       0.038739   0.081110   0.478 0.633792    
africa      -0.312210   0.084269  -3.705 0.000321 ***
asia        -0.205622   0.077827  -2


Call:
lm(formula = ln_le70 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47914 -0.04940  0.00058  0.07175  0.24402 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.1986107  0.3863098  13.457  < 2e-16 ***
ln_hla_het   1.0404702  0.1820558   5.715 8.15e-08 ***
ln_frac     -0.1163928  0.0712564  -1.633 0.104998    
aa_ln_atd    0.0246249  0.0357620   0.689 0.492418    
ln_arable   -0.0141638  0.0143576  -0.987 0.325870    
ln_suitavg   0.0007926  0.0136433   0.058 0.953771    
ln_abslat    0.0045721  0.0154071   0.297 0.767166    
europe       0.0344496  0.0767749   0.449 0.654450    
africa      -0.3192781  0.0797647  -4.003 0.000109 ***
asia        -0.1420921  0.


Call:
lm(formula = ln_le80 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42788 -0.04509  0.00724  0.06262  0.25050 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.9452963  0.3561394  13.886  < 2e-16 ***
ln_hla_het   0.8420928  0.1678374   5.017 1.84e-06 ***
ln_frac     -0.1462667  0.0656913  -2.227 0.027842 *  
aa_ln_atd    0.0348747  0.0329691   1.058 0.292271    
ln_arable   -0.0199753  0.0132363  -1.509 0.133893    
ln_suitavg   0.0003745  0.0125777   0.030 0.976298    
ln_abslat    0.0070101  0.0142038   0.494 0.622533    
europe       0.0176880  0.0707789   0.250 0.803088    
africa      -0.2552229  0.0735352  -3.471 0.000722 ***
asia        -0.1194574  0.


Call:
lm(formula = ln_le90 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42359 -0.04798  0.00311  0.06009  0.26370 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.554555   0.359376  12.674  < 2e-16 ***
ln_hla_het   0.549024   0.169363   3.242  0.00154 ** 
ln_frac     -0.183742   0.066288  -2.772  0.00646 ** 
aa_ln_atd    0.038322   0.033269   1.152  0.25166    
ln_arable   -0.022074   0.013357  -1.653  0.10101    
ln_suitavg  -0.004649   0.012692  -0.366  0.71478    
ln_abslat    0.024767   0.014333   1.728  0.08657 .  
europe       0.018960   0.071422   0.265  0.79111    
africa      -0.220302   0.074203  -2.969  0.00361 ** 
asia        -0.072690   0.068532  -1


Call:
lm(formula = ln_le00 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.39356 -0.05137  0.00062  0.05057  0.21952 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.850750   0.309218  12.453  < 2e-16 ***
ln_hla_het   0.251217   0.145725   1.724 0.087299 .  
ln_frac     -0.177528   0.057037  -3.113 0.002319 ** 
aa_ln_atd    0.084388   0.028625   2.948 0.003845 ** 
ln_arable   -0.006223   0.011492  -0.542 0.589157    
ln_suitavg  -0.016824   0.010921  -1.541 0.126047    
ln_abslat    0.013741   0.012332   1.114 0.267420    
europe      -0.001058   0.061454  -0.017 0.986296    
africa      -0.249743   0.063847  -3.912 0.000152 ***
asia        -0.092455   0.058967  -1


Call:
lm(formula = ln_le10 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_le70 != 
        "NA" & ln_le80 != "NA" & ln_le90 != "NA" & ln_le00 != 
        "NA" & ln_le10 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.36822 -0.04284  0.00248  0.04303  0.22940 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.912044   0.282804  13.833  < 2e-16 ***
ln_hla_het   0.178763   0.133277   1.341 0.182360    
ln_frac     -0.150893   0.052164  -2.893 0.004537 ** 
aa_ln_atd    0.073300   0.026180   2.800 0.005960 ** 
ln_arable   -0.004571   0.010511  -0.435 0.664390    
ln_suitavg  -0.013801   0.009988  -1.382 0.169608    
ln_abslat    0.006617   0.011279   0.587 0.558550    
europe       0.006555   0.056204   0.117 0.907355    
africa      -0.223514   0.058393  -3.828 0.000207 ***
asia        -0.083993   0.053930  -1

In [12]:
Table4<-stargazer(regt4_1robust,regt4_2robust,regt4_3robust,regt4_4robust,
                  regt4_5robust,regt4_6robust,type="text", align = TRUE,
                  title = "Table 4: The Effect of HLA Heterozygosity after the International Epidemiological Transition",
                  dep.var.labels = "ln Life Expectancy", keep=c("ln_hla_het"), 
                  column.labels=c("1960", "1970", "1980", "1990", "2000", "2010"),
                  column.sep.width = "10pt",  digits=4, no.space=TRUE,
                  covariate.labels=c("ln HLA Heterozygosity"))


Table 4: The Effect of HLA Heterozygosity after the International Epidemiological Transition
                                         Dependent variable:                   
                      ---------------------------------------------------------
                                         ln Life Expectancy                    
                        1960      1970      1980      1990      2000     2010  
                         (1)       (2)       (3)       (4)      (5)      (6)   
-------------------------------------------------------------------------------
ln HLA Heterozygosity 1.0151*** 1.0405*** 0.8421*** 0.5490***  0.2512   0.1788 
                      (0.1899)  (0.1696)  (0.1700)  (0.1798)  (0.1541) (0.1335)
Note:                                               *p<0.1; **p<0.05; ***p<0.01


In [13]:
########################################################
##########TABLE_5: Pop. Composition Trunc. #############
########################################################

regt5_1<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas, subset = frac_eur==0 & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA" & ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA", data=hla_country_web)
regt5_1robust<-coeftest(regt5_1, vcov = vcovHC(regt5_1, type="HC1"))
summary(regt5_1)

regt5_2<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas, subset = frac_eur>0 & frac_eur<1 & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA" & ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA", data=hla_country_web)
regt5_2robust<-coeftest(regt5_2, vcov = vcovHC(regt5_2, type="HC1"))
summary(regt5_2)


regt5_3<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas, subset = frac_eur==1 & ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA" & ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA", data=hla_country_web)
regt5_3robust<-coeftest(regt5_3, vcov = vcovHC(regt5_3, type="HC1"))
summary(regt5_3)


regt5_4<-lm(ln_le60~ln_hla_het+frac_eur+frac_me+frac_easia+frac_africa+frac_am+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset= ln_le60!="NA" & ln_hla_het!="NA" & ln_frac!="NA" & ln_abslat!="NA" & ln_arable!="NA" & ln_suitavg!="NA" & aa_ln_atd!="NA",data=hla_country_web)
regt5_4robust<-coeftest(regt5_4, vcov = vcovHC(regt5_4, type="HC1"))
summary(regt5_4)




Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = frac_eur == 0 & ln_le60 != 
        "NA" & ln_hla_het != "NA" & ln_frac != "NA" & ln_abslat != 
        "NA" & ln_arable != "NA" & ln_suitavg != "NA" & aa_ln_atd != 
        "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23551 -0.07323  0.01446  0.06769  0.29570 

Coefficients: (2 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.663561   0.719793   6.479 5.52e-08 ***
ln_hla_het   0.724094   0.362643   1.997   0.0518 .  
ln_frac     -0.166265   0.129578  -1.283   0.2059    
aa_ln_atd    0.010913   0.066160   0.165   0.8697    
ln_arable   -0.006245   0.020622  -0.303   0.7634    
ln_suitavg   0.001592   0.021353   0.075   0.9409    
ln_abslat   -0.010405   0.022727  -0.458   0.6492    
europe             NA         NA      NA       N


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = frac_eur > 0 & frac_eur < 
        1 & ln_le60 != "NA" & ln_hla_het != "NA" & ln_frac != 
        "NA" & ln_abslat != "NA" & ln_arable != "NA" & ln_suitavg != 
        "NA" & aa_ln_atd != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49173 -0.05557 -0.00057  0.07756  0.20268 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.97634    0.84821   4.688 3.04e-05 ***
ln_hla_het   0.80490    0.36724   2.192   0.0341 *  
ln_frac     -0.15810    0.14726  -1.074   0.2893    
aa_ln_atd    0.14118    0.08746   1.614   0.1142    
ln_arable   -0.06410    0.03803  -1.686   0.0995 .  
ln_suitavg   0.04920    0.04322   1.138   0.2616    
ln_abslat    0.05377    0.04204   1.279   0.2081    
europe      -0.03648    0.12133  -0.301   0.7652    
africa      -0.24688    0.1363


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = frac_eur == 1 & ln_le60 != 
        "NA" & ln_hla_het != "NA" & ln_frac != "NA" & ln_abslat != 
        "NA" & ln_arable != "NA" & ln_suitavg != "NA" & aa_ln_atd != 
        "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.079159 -0.024165  0.002131  0.022136  0.057548 

Coefficients: (4 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.517311   0.758054   4.640 0.000234 ***
ln_hla_het   0.728074   0.365006   1.995 0.062365 .  
ln_frac     -0.073481   0.070289  -1.045 0.310474    
aa_ln_atd    0.035708   0.044750   0.798 0.435913    
ln_arable   -0.009074   0.022862  -0.397 0.696384    
ln_suitavg   0.018469   0.024473   0.755 0.460780    
ln_abslat    0.323034   0.129588   2.493 0.023290 *  
europe             NA         NA      


Call:
lm(formula = ln_le60 ~ ln_hla_het + frac_eur + frac_me + frac_easia + 
    frac_africa + frac_am + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_le60 != "NA" & ln_hla_het != 
        "NA" & ln_frac != "NA" & ln_abslat != "NA" & ln_arable != 
        "NA" & ln_suitavg != "NA" & aa_ln_atd != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42848 -0.06734  0.00682  0.06718  0.24529 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.598250   0.497114   9.250 1.45e-15 ***
ln_hla_het   0.886992   0.286872   3.092   0.0025 ** 
frac_eur     0.317721   0.194336   1.635   0.1048    
frac_me      0.020621   0.204333   0.101   0.9198    
frac_easia   0.140123   0.190495   0.736   0.4635    
frac_africa  0.017364   0.194295   0.089   0.9289    
frac_am      0.145135   0.180890   0.802   0.4240    
ln_frac     -0.129576   0.073851  -1.755   0.0820 .

In [14]:
Table5<-stargazer(regt5_1robust,regt5_2robust,regt5_3robust,regt5_4robust,
                  type="text", align = TRUE, title = "Table 5: Robustness to the Influence of Regional Populations",
                  dep.var.labels = "ln Life Expectancy in 1960", keep=c("ln_hla_het"), 
                  column.labels=c("%European=0", "%European=(0;1)", "%European=1", "Full"),
                  column.sep.width = "10pt",  digits=4, no.space=TRUE,
                  covariate.labels=c("ln HLA Heterozygosity"))


Table 5: Robustness to the Influence of Regional Populations
                                     Dependent variable:               
                      -------------------------------------------------
                                 ln Life Expectancy in 1960            
                      %European=0 %European=(0;1) %European=1   Full   
                          (1)           (2)           (3)        (4)   
-----------------------------------------------------------------------
ln HLA Heterozygosity   0.7241*      0.8049***     0.7281**   0.8870***
                       (0.3623)      (0.2923)      (0.3111)   (0.2872) 
Note:                                       *p<0.1; **p<0.05; ***p<0.01


In [15]:
###################################################################
##########    TABLE_6: Exogenous Omitted Variables    #############
###################################################################

regt6_1<-lm(ln_le60~ln_hla_het+aa_pdiv+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset= distcr1000!="NA" & aa_pdiv!="NA" & malfal!="NA"  & tropical!="NA" & pnativ_60!="NA" , data=hla_country_web)
regt6_1robust<-coeftest(regt6_1, vcov = vcovHC(regt6_1, type="HC1"))
summary(regt6_1)
 

regt6_2<-lm(ln_le60~ln_hla_het+aa_pdiv+ln_hla_het+malfal+tropical+desert+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=distcr1000!="NA" & aa_pdiv!="NA" & malfal!="NA"  & tropical!="NA" & pnativ_60!="NA" , data=hla_country_web)
regt6_2robust<-coeftest(regt6_2, vcov = vcovHC(regt6_2, type="HC1"))
summary(regt6_2)


regt6_3<-lm(ln_le60~ln_hla_het+frac_migrant+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=distcr1000!="NA" & aa_pdiv!="NA" & malfal!="NA"  & tropical!="NA" & pnativ_60!="NA" , data=hla_country_web)
regt6_3robust<-coeftest(regt6_3, vcov = vcovHC(regt6_3, type="HC1"))
summary(regt6_3)


regt6_4<-lm(ln_le60~ln_hla_het+aa_pdiv+frac_migrant+ln_hla_het+malfal+tropical+desert+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset=distcr1000!="NA" & aa_pdiv!="NA" & malfal!="NA"  & tropical!="NA" & pnativ_60!="NA" , data=hla_country_web)
regt6_4robust<-coeftest(regt6_4, vcov = vcovHC(regt6_4, type="HC1"))
summary(regt6_4)



Call:
lm(formula = ln_le60 ~ ln_hla_het + aa_pdiv + ln_frac + aa_ln_atd + 
    ln_arable + ln_suitavg + ln_abslat + europe + africa + asia + 
    americas, data = hla_country_web, subset = distcr1000 != 
    "NA" & aa_pdiv != "NA" & malfal != "NA" & tropical != "NA" & 
    pnativ_60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49329 -0.06037  0.00781  0.07372  0.27135 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.196418   0.811731   6.402 3.18e-09 ***
ln_hla_het   1.077102   0.267620   4.025 0.000101 ***
aa_pdiv     -0.278363   0.832380  -0.334 0.738652    
ln_frac     -0.113391   0.078912  -1.437 0.153364    
aa_ln_atd    0.040665   0.039475   1.030 0.305031    
ln_arable   -0.008339   0.015263  -0.546 0.585860    
ln_suitavg   0.007875   0.014476   0.544 0.587445    
ln_abslat    0.020602   0.016469   1.251 0.213395    
europe       0.039045   0.081417   0.480 0.632416    
africa      -0.303156   0.088810  -3.414 0.0008


Call:
lm(formula = ln_le60 ~ ln_hla_het + aa_pdiv + ln_hla_het + malfal + 
    tropical + desert + ln_frac + aa_ln_atd + ln_arable + ln_suitavg + 
    ln_abslat + europe + africa + asia + americas, data = hla_country_web, 
    subset = distcr1000 != "NA" & aa_pdiv != "NA" & malfal != 
        "NA" & tropical != "NA" & pnativ_60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50609 -0.06769  0.00790  0.06232  0.25659 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.4368764  0.8027329   5.527 2.03e-07 ***
ln_hla_het   0.6126416  0.2805173   2.184 0.030977 *  
aa_pdiv      0.7093535  0.8347645   0.850 0.397206    
malfal      -0.1729166  0.0507154  -3.410 0.000896 ***
tropical    -0.0003448  0.0005208  -0.662 0.509231    
desert      -0.0014115  0.0015310  -0.922 0.358459    
ln_frac     -0.0967928  0.0763874  -1.267 0.207647    
aa_ln_atd    0.0008407  0.0396231   0.021 0.983108    
ln_arable   -0.0026086  0.0147429  -0.177 0.85


Call:
lm(formula = ln_le60 ~ ln_hla_het + frac_migrant + ln_frac + 
    aa_ln_atd + ln_arable + ln_suitavg + ln_abslat + europe + 
    africa + asia + americas, data = hla_country_web, subset = distcr1000 != 
    "NA" & aa_pdiv != "NA" & malfal != "NA" & tropical != "NA" & 
    pnativ_60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47729 -0.06295  0.00722  0.06580  0.28144 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.708443   0.433304  10.866  < 2e-16 ***
ln_hla_het    0.869948   0.210203   4.139 6.55e-05 ***
frac_migrant  0.098209   0.059433   1.652   0.1011    
ln_frac      -0.144091   0.076038  -1.895   0.0605 .  
aa_ln_atd     0.041594   0.037615   1.106   0.2710    
ln_arable    -0.008539   0.015060  -0.567   0.5718    
ln_suitavg    0.011369   0.014452   0.787   0.4330    
ln_abslat     0.015788   0.016352   0.966   0.3362    
europe        0.113322   0.092317   1.228   0.2220    
africa       -0.252790   0.09106


Call:
lm(formula = ln_le60 ~ ln_hla_het + aa_pdiv + frac_migrant + 
    ln_hla_het + malfal + tropical + desert + ln_frac + aa_ln_atd + 
    ln_arable + ln_suitavg + ln_abslat + europe + africa + asia + 
    americas, data = hla_country_web, subset = distcr1000 != 
    "NA" & aa_pdiv != "NA" & malfal != "NA" & tropical != "NA" & 
    pnativ_60 != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49909 -0.05211  0.00440  0.06217  0.24834 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.6502914  0.8050756   5.776 6.63e-08 ***
ln_hla_het    0.5817241  0.2786370   2.088  0.03903 *  
aa_pdiv       0.1578063  0.8857528   0.178  0.85891    
frac_migrant  0.1079735  0.0618486   1.746  0.08352 .  
malfal       -0.1678646  0.0503568  -3.334  0.00115 ** 
tropical     -0.0003982  0.0005171  -0.770  0.44285    
desert       -0.0009068  0.0015449  -0.587  0.55837    
ln_frac      -0.1038242  0.0758290  -1.369  0.17361    
aa_ln_atd     0.01139

In [16]:
Table6<-stargazer(regt6_1robust,regt6_2robust,regt6_3robust,regt6_4robust,
                  type="text", align = TRUE)


                       Dependent variable:          
             ---------------------------------------
                                                    
                (1)       (2)       (3)       (4)   
----------------------------------------------------
ln_hla_het   1.077***   0.613*   0.870***   0.582*  
              (0.340)   (0.319)   (0.224)   (0.313) 
                                                    
aa_pdiv       -0.278     0.709               0.158  
              (1.200)   (0.974)             (1.015) 
                                                    
malfal                 -0.173***           -0.168***
                        (0.049)             (0.048) 
                                                    
tropical                -0.0003             -0.0004 
                        (0.001)             (0.001) 
                                                    
desert                  -0.001              -0.001  
                        (0.001)             (

In [17]:
####################################################################
##########    TABLE_7: Endogenous Omitted Variables    #############
####################################################################

regt7_1<-lm(ln_le60~ln_hla_het+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset = ln_ypc60!="NA" & ln_yr_sch1960!="NA" & ln_pd60!="NA" & ln_urb60!="NA" & ln_young!="NA", data=hla_country_web)
regt7_1robust<-coeftest(regt7_1, vcov = vcovHC(regt7_1, type="HC1"))
summary(regt7_1)

regt7_2<-lm(ln_le60~ln_hla_het+ln_ypc60+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset = ln_ypc60!="NA" & ln_yr_sch1960!="NA" & ln_pd60!="NA" & ln_urb60!="NA" & ln_young!="NA", data=hla_country_web)
regt7_2robust<-coeftest(regt7_2, vcov = vcovHC(regt7_2, type="HC1"))
summary(regt7_2)

regt7_3<-lm(ln_le60~ln_hla_het+ln_yr_sch1960+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset = ln_ypc60!="NA" & ln_yr_sch1960!="NA" & ln_pd60!="NA" & ln_urb60!="NA" & ln_young!="NA", data=hla_country_web)
regt7_3robust<-coeftest(regt7_3, vcov = vcovHC(regt7_3, type="HC1"))
summary(regt7_3)

regt7_4<-lm(ln_le60~ln_hla_het+ln_pd60+ln_urb60+ln_young+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset = ln_ypc60!="NA" & ln_yr_sch1960!="NA" & ln_pd60!="NA" & ln_urb60!="NA" & ln_young!="NA", data=hla_country_web)
regt7_4robust<-coeftest(regt7_4, vcov = vcovHC(regt7_4, type="HC1"))
summary(regt7_4)

regt7_5<-lm(ln_le60~ln_hla_het+ln_ypc60+ln_yr_sch1960+ln_pd60+ln_urb60+ln_young+ln_frac+aa_ln_atd+ln_arable+ln_suitavg+ln_abslat+europe+africa+asia+americas,subset = ln_ypc60!="NA" & ln_yr_sch1960!="NA" & ln_pd60!="NA" & ln_urb60!="NA" & ln_young!="NA", data=hla_country_web)
regt7_5robust<-coeftest(regt7_5, vcov = vcovHC(regt7_5, type="HC1"))
summary(regt7_5)



Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_ypc60 != "NA" & ln_yr_sch1960 != 
        "NA" & ln_pd60 != "NA" & ln_urb60 != "NA" & ln_young != 
        "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48437 -0.04816  0.00643  0.06730  0.30371 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.075976   0.435901  11.645  < 2e-16 ***
ln_hla_het   0.999959   0.202655   4.934 3.29e-06 ***
ln_frac     -0.068797   0.080539  -0.854  0.39508    
aa_ln_atd    0.022079   0.041794   0.528  0.59850    
ln_arable   -0.020301   0.016533  -1.228  0.22243    
ln_suitavg   0.007012   0.015235   0.460  0.64633    
ln_abslat    0.016946   0.017460   0.971  0.33417    
europe       0.082177   0.081488   1.008  0.31572    
africa      -0.303860   0.086142  -3.527  0.00064 ***
asia        -0.198774   0.078794  -2.523  0.0


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_ypc60 + ln_frac + aa_ln_atd + 
    ln_arable + ln_suitavg + ln_abslat + europe + africa + asia + 
    americas, data = hla_country_web, subset = ln_ypc60 != "NA" & 
    ln_yr_sch1960 != "NA" & ln_pd60 != "NA" & ln_urb60 != "NA" & 
    ln_young != "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.36913 -0.05115 -0.00077  0.05649  0.25210 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.0474992  0.4156969   9.737 4.97e-16 ***
ln_hla_het   0.6574851  0.1846542   3.561 0.000575 ***
ln_ypc60     0.0977108  0.0167017   5.850 6.64e-08 ***
ln_frac     -0.1265309  0.0702960  -1.800 0.074974 .  
aa_ln_atd   -0.0006487  0.0363256  -0.018 0.985789    
ln_arable   -0.0105789  0.0143837  -0.735 0.463824    
ln_suitavg   0.0290353  0.0136934   2.120 0.036524 *  
ln_abslat    0.0089705  0.0151501   0.592 0.555156    
europe       0.0967317  0.0704638   1.373 0.172985    
africa      -0.1342940  0.079885


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_yr_sch1960 + ln_frac + 
    aa_ln_atd + ln_arable + ln_suitavg + ln_abslat + europe + 
    africa + asia + americas, data = hla_country_web, subset = ln_ypc60 != 
    "NA" & ln_yr_sch1960 != "NA" & ln_pd60 != "NA" & ln_urb60 != 
    "NA" & ln_young != "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.259833 -0.044515  0.001145  0.046709  0.179913 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.990270   0.323522  12.334  < 2e-16 ***
ln_hla_het     0.612726   0.146980   4.169 6.66e-05 ***
ln_yr_sch1960  0.132013   0.013021  10.139  < 2e-16 ***
ln_frac       -0.024146   0.056578  -0.427   0.6705    
aa_ln_atd      0.071103   0.029667   2.397   0.0185 *  
ln_arable      0.004400   0.011832   0.372   0.7108    
ln_suitavg    -0.008141   0.010774  -0.756   0.4517    
ln_abslat      0.005958   0.012276   0.485   0.6285    
europe         0.028567   0.057315   0.498   0.6193    
africa 


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_pd60 + ln_urb60 + ln_young + 
    ln_frac + aa_ln_atd + ln_arable + ln_suitavg + ln_abslat + 
    europe + africa + asia + americas, data = hla_country_web, 
    subset = ln_ypc60 != "NA" & ln_yr_sch1960 != "NA" & ln_pd60 != 
        "NA" & ln_urb60 != "NA" & ln_young != "NA")

Residuals:
      Min        1Q    Median        3Q       Max 
-0.300764 -0.043457 -0.009262  0.061716  0.245806 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.5230912  0.4641918  11.898  < 2e-16 ***
ln_hla_het   0.7528765  0.1766469   4.262 4.77e-05 ***
ln_pd60      0.0001866  0.0141599   0.013  0.98951    
ln_urb60     0.1137048  0.0177831   6.394 6.02e-09 ***
ln_young    -0.0614235  0.0915465  -0.671  0.50388    
ln_frac     -0.1367887  0.0734167  -1.863  0.06553 .  
aa_ln_atd   -0.0755990  0.0374562  -2.018  0.04638 *  
ln_arable   -0.0064507  0.0173545  -0.372  0.71094    
ln_suitavg   0.0133635  0.0131086   1.019  0.31058  


Call:
lm(formula = ln_le60 ~ ln_hla_het + ln_ypc60 + ln_yr_sch1960 + 
    ln_pd60 + ln_urb60 + ln_young + ln_frac + aa_ln_atd + ln_arable + 
    ln_suitavg + ln_abslat + europe + africa + asia + americas, 
    data = hla_country_web, subset = ln_ypc60 != "NA" & ln_yr_sch1960 != 
        "NA" & ln_pd60 != "NA" & ln_urb60 != "NA" & ln_young != 
        "NA")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.21803 -0.03882  0.00079  0.04623  0.22772 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    4.0482107  0.4849748   8.347 6.35e-13 ***
ln_hla_het     0.5295807  0.1467296   3.609 0.000497 ***
ln_ypc60       0.0280348  0.0183296   1.529 0.129539    
ln_yr_sch1960  0.1036749  0.0149946   6.914 5.84e-10 ***
ln_pd60        0.0062813  0.0116550   0.539 0.591220    
ln_urb60       0.0342884  0.0196702   1.743 0.084609 .  
ln_young      -0.0148321  0.0771438  -0.192 0.847953    
ln_frac       -0.0582461  0.0607576  -0.959 0.340214    
aa_ln_at

In [18]:
Table7<-stargazer(regt7_1robust,regt7_2robust,regt7_3robust,regt7_4robust,regt7_5robust,
                  type="text", align = TRUE, title = "Table 7: Robustness to Endogenous Omitted Variables",
                  dep.var.labels = "ln Life Expectancy in 1960", keep=c("ln_hla_het", "ln_ypc60", "ln_yr_sch1960",
                                                                       "ln_pd60", "ln_urb60", "ln_young"),
                  column.sep.width = "10pt",  digits=4, no.space=TRUE,
                  covariate.labels=c("ln HLA Heterozygosity", "ln GDP per Capita in 1960", "ln Avg. Years of School in 1960",
                                     "ln Population Density in 1960", "ln Urbanization Rate in 1960", "ln Fraction of Population under 15 Years in 1960"))


Table 7: Robustness to Endogenous Omitted Variables
                                                                Dependent variable:               
                                                 -------------------------------------------------
                                                            ln Life Expectancy in 1960            
                                                    (1)       (2)       (3)       (4)       (5)   
--------------------------------------------------------------------------------------------------
ln HLA Heterozygosity                            1.0000*** 0.6575*** 0.6127*** 0.7529*** 0.5296***
                                                 (0.2118)  (0.2124)  (0.1633)  (0.1944)  (0.1673) 
ln GDP per Capita in 1960                                  0.0977***                      0.0280* 
                                                           (0.0221)                      (0.0167) 
ln Avg. Years of School in 1960                         