## Instrumental Variable Regression

##### Ronggui Huang (Department of Sociology, Fudan University)

In [2]:
# install.packages('AER')

library(AER)

Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: survival


In [5]:
data("CigarettesSW")

CigarettesSW$rprice <- with(CigarettesSW, price/cpi)
# real price
CigarettesSW$rincome <- with(CigarettesSW, income/population/cpi) 
# real income per person

# tax: average state, federal, and average local excise taxes for the fiscal year
# taxs: average excise taxes for the fiscal year, including sales tax
CigarettesSW$tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
# real tax on cigarettes arising from the state's general sales tax

In [6]:
# restrict the analysis to a single year (1995)
Cigarettes95 <- subset(CigarettesSW, year == "1995")

In [7]:
summary(lm(log(packs) ~ log(rprice) + log(rincome) , data = Cigarettes95))


Call:
lm(formula = log(packs) ~ log(rprice) + log(rincome), data = Cigarettes95)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.59077 -0.07856 -0.00149  0.11860  0.35442 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   10.3420     1.0227  10.113 3.66e-13 ***
log(rprice)   -1.4065     0.2514  -5.595 1.24e-06 ***
log(rincome)   0.3439     0.2350   1.463     0.15    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1873 on 45 degrees of freedom
Multiple R-squared:  0.4327,	Adjusted R-squared:  0.4075 
F-statistic: 17.16 on 2 and 45 DF,  p-value: 2.884e-06


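* Structural model

How strongly cigarette consumption responds to price, controlling for per capita income.

* Potentially endogenous variable

Price. Price and quantity are determined jointly in the market: a higher price lowers the quantity demanded, while a shift in demand in turn moves the price, so price is likely correlated with the error term of the demand equation.

* Instrumental variable

Taxes, which are set by the government rather than by the market.

* Model specification

The outcome and the regressors are log-transformed.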
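
Written out, the specification described above can be sketched as the following pair of equations. The symbols $\beta$, $\pi$, $u_i$, and $v_i$ are notation assumed here for illustration and do not appear in the original notebook; the variable names match the columns of `Cigarettes95`.

$$
\log(\text{packs}_i) = \beta_0 + \beta_1 \log(\text{rprice}_i) + \beta_2 \log(\text{rincome}_i) + u_i
$$

$$
\log(\text{rprice}_i) = \pi_0 + \pi_1\,\text{tdiff}_i + \pi_2 \log(\text{rincome}_i) + v_i
$$

The first line is the structural (demand) equation; because both sides are in logs, $\beta_1$ can be read as the price elasticity of demand. The second line is the first-stage equation, in which the instrument `tdiff` and the exogenous control `log(rincome)` explain the endogenous regressor.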

In [4]:
iv1 <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff, data = Cigarettes95)

In [5]:
summary(iv1)


Call:
ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
    tdiff, data = Cigarettes95)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.611000 -0.086072  0.009423  0.106912  0.393159 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.4307     1.3584   6.943 1.24e-08 ***
log(rprice)   -1.1434     0.3595  -3.181  0.00266 ** 
log(rincome)   0.2145     0.2686   0.799  0.42867    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1896 on 45 degrees of freedom
Multiple R-Squared: 0.4189,	Adjusted R-squared: 0.3931 
Wald test: 6.534 on 2 and 45 DF,  p-value: 0.003227 
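
To make clear what `ivreg` is doing, the two stages can be run by hand with `lm`. This is a minimal sketch, not the notebook's own code: the names `c95`, `stage1`, `stage2`, and `lrprice_hat` are introduced here for illustration, and the data preparation repeats the steps from the cells above so that the block runs on its own.

```r
library(AER)
data("CigarettesSW")

# Rebuild the 1995 subset with the same derived variables as above
c95 <- subset(CigarettesSW, year == "1995")
c95$rprice  <- with(c95, price / cpi)
c95$rincome <- with(c95, income / population / cpi)
c95$tdiff   <- with(c95, (taxs - tax) / cpi)

# Stage 1: regress the endogenous regressor on the instrument and the control
stage1 <- lm(log(rprice) ~ log(rincome) + tdiff, data = c95)
c95$lrprice_hat <- fitted(stage1)

# Stage 2: replace log(rprice) by its first-stage fitted values
stage2 <- lm(log(packs) ~ lrprice_hat + log(rincome), data = c95)

# Compare the point estimates with ivreg
iv <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff,
            data = c95)
cbind(manual = coef(stage2), ivreg = coef(iv))
```

The point estimates from the two approaches should coincide. The standard errors printed by `summary(stage2)` are not valid, however, because the second-stage `lm` treats the fitted values as observed data; use the standard errors reported by `ivreg`.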


In [6]:
summary(iv1, diagnostics = TRUE)


Call:
ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
    tdiff, data = Cigarettes95)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.611000 -0.086072  0.009423  0.106912  0.393159 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.4307     1.3584   6.943 1.24e-08 ***
log(rprice)   -1.1434     0.3595  -3.181  0.00266 ** 
log(rincome)   0.2145     0.2686   0.799  0.42867    

Diagnostic tests:
                 df1 df2 statistic  p-value    
Weak instruments   1  45    45.158 2.65e-08 ***
Wu-Hausman         1  44     1.102      0.3    
Sargan             0  NA        NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1896 on 45 degrees of freedom
Multiple R-Squared: 0.4189,	Adjusted R-squared: 0.3931 
Wald test: 6.534 on 2 and 45 DF,  p-value: 0.003227 


*Diagnostic tests*

 - an F test of the first-stage regression for weak instruments
 - a Wu-Hausman test for endogeneity
 - a Sargan test of overidentifying restrictions (only available when there are more instruments than endogenous regressors)

In [8]:
iv2 <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + I(tax/cpi), data = Cigarettes95)

In [9]:
summary(iv2, diagnostics = TRUE)


Call:
ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
    tdiff + I(tax/cpi), data = Cigarettes95)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.6006931 -0.0862222 -0.0009999  0.1164699  0.3734227 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.8950     1.0586   9.348 4.12e-12 ***
log(rprice)   -1.2774     0.2632  -4.853 1.50e-05 ***
log(rincome)   0.2804     0.2386   1.175    0.246    

Diagnostic tests:
                 df1 df2 statistic p-value    
Weak instruments   2  44   244.734  <2e-16 ***
Wu-Hausman         1  44     3.068  0.0868 .  
Sargan             1  NA     0.333  0.5641    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1879 on 45 degrees of freedom
Multiple R-Squared: 0.4294,	Adjusted R-squared: 0.4041 
Wald test: 13.28 on 2 and 45 DF,  p-value: 2.931e-05 


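*Diagnostic test results*

- Weak instruments: significant (F = 244.7), so the null hypothesis that the instruments are unrelated to the endogenous regressor is rejected; the instruments are not weak.
- Wu-Hausman: only marginally significant (p = 0.087), so the evidence that price is endogenous is weak at the 5% level; in the just-identified model `iv1` the test is not significant (p = 0.3).
- Sargan: not significant (p = 0.564), so the overidentifying restrictions are not rejected; there is no evidence that the instruments are correlated with the error term.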
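
The Wu-Hausman and Sargan statistics can also be computed by hand, which shows what each test measures. This is a sketch under stated assumptions: it uses the regression-based (control function) form of the Wu-Hausman test and the n times R-squared form of the Sargan test, and the names `c95`, `rtax`, `v_hat`, `u_hat`, `aug`, and `aux` are introduced here for illustration. `rtax` is the same quantity as `I(tax/cpi)` in `iv2`.

```r
library(AER)
data("CigarettesSW")

# Rebuild the 1995 subset with the same derived variables as above
c95 <- subset(CigarettesSW, year == "1995")
c95$rprice  <- with(c95, price / cpi)
c95$rincome <- with(c95, income / population / cpi)
c95$tdiff   <- with(c95, (taxs - tax) / cpi)
c95$rtax    <- with(c95, tax / cpi)

iv <- ivreg(log(packs) ~ log(rprice) + log(rincome) |
              log(rincome) + tdiff + rtax, data = c95)

# Wu-Hausman: add the first-stage residuals to the OLS demand equation.
# A significant coefficient on v_hat indicates that log(rprice) is endogenous.
stage1 <- lm(log(rprice) ~ log(rincome) + tdiff + rtax, data = c95)
c95$v_hat <- residuals(stage1)
aug <- lm(log(packs) ~ log(rprice) + log(rincome) + v_hat, data = c95)
wu_hausman_F <- coef(summary(aug))["v_hat", "t value"]^2
wu_hausman_p <- coef(summary(aug))["v_hat", "Pr(>|t|)"]

# Sargan: regress the IV residuals on all instruments; n * R^2 is compared
# with a chi-squared distribution whose degrees of freedom equal
# (number of excluded instruments) - (number of endogenous regressors) = 2 - 1.
c95$u_hat <- residuals(iv)
aux <- lm(u_hat ~ log(rincome) + tdiff + rtax, data = c95)
sargan_stat <- nrow(c95) * summary(aux)$r.squared
sargan_p <- pchisq(sargan_stat, df = 1, lower.tail = FALSE)

c(wu_hausman_F = wu_hausman_F, wu_hausman_p = wu_hausman_p,
  sargan_stat = sargan_stat, sargan_p = sargan_p)
```

These values should agree with the Wu-Hausman and Sargan rows of `summary(iv2, diagnostics = TRUE)`.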

In [11]:
summary(lm(log(rprice) ~ log(rincome) + tdiff + I(tax/cpi), data = Cigarettes95))


Call:
lm(formula = log(rprice) ~ log(rincome) + tdiff + I(tax/cpi), 
    data = Cigarettes95)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.067411 -0.017296 -0.001123  0.023591  0.071556 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.1030339  0.0988873  41.492  < 2e-16 ***
log(rincome) 0.1083449  0.0397382   2.726  0.00916 ** 
tdiff        0.0108898  0.0020086   5.422 2.37e-06 ***
I(tax/cpi)   0.0093517  0.0006273  14.909  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03226 on 44 degrees of freedom
Multiple R-squared:  0.9403,	Adjusted R-squared:  0.9363 
F-statistic: 231.1 on 3 and 44 DF,  p-value: < 2.2e-16
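
Note that the overall F-statistic of this first-stage regression tests all three regressors jointly, whereas the weak-instruments diagnostic tests only the excluded instruments (`tdiff` and `I(tax/cpi)`). A minimal sketch of the latter, comparing the first stage with and without the instruments; the names `c95`, `rtax`, `full`, and `restricted` are introduced here for illustration:

```r
library(AER)
data("CigarettesSW")

# Rebuild the 1995 subset with the same derived variables as above
c95 <- subset(CigarettesSW, year == "1995")
c95$rprice  <- with(c95, price / cpi)
c95$rincome <- with(c95, income / population / cpi)
c95$tdiff   <- with(c95, (taxs - tax) / cpi)
c95$rtax    <- with(c95, tax / cpi)   # same quantity as I(tax/cpi)

# First stage with and without the excluded instruments
full       <- lm(log(rprice) ~ log(rincome) + tdiff + rtax, data = c95)
restricted <- lm(log(rprice) ~ log(rincome), data = c95)

# F test of the joint null that both instrument coefficients are zero
cmp <- anova(restricted, full)
first_stage_F <- cmp$F[2]

# Overall F of the first stage, for comparison
overall_F <- unname(summary(full)$fstatistic["value"])

c(first_stage_F = first_stage_F, overall_F = overall_F)
```

`first_stage_F` should match the "Weak instruments" statistic reported by `summary(iv2, diagnostics = TRUE)`, while `overall_F` is the F-statistic printed at the bottom of the first-stage summary.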


In [10]:
?ivreg