# **Causal inference with observational data**
## Instrumental variables
* Have repeated observations of the outcome over time?
  * **no**
* Does treatment assignment depend on a sharp cutoff?
  * **no**
* **Have a third variable associated with the outcome only through the cause variable?**
  * **yes**

$\rightarrow$ Not a time series, and no sharp cutoff  
$\rightarrow$ A third variable that affects the outcome only through the cause is available, so use instrumental variables

### Motivation
#### Simple regression
$$y=\beta x+u$$

\begin{array}{lcl}
 x & \rightarrow & y\\
  & \nearrow & \\
 u & & \\
\end{array}

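When $u$ is unrelated to $x$ ($E[u \mid x] = 0$), the OLS estimator $\hat\beta_{OLS}$ is unbiased and consistent for $\beta$.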
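A small simulation illustrates the exogenous case. It is a sketch under assumed values that do not come from the text ($\beta = 2$, $n = 10{,}000$, independent standard-normal regressor and error); the variable names are hypothetical and suffixed so they do not collide with the `x`, `d`, `z`, `e` used later in this notebook. With $u$ drawn independently of $x$, the OLS slope should land close to $\beta$.

```r
# Exogenous case: u is drawn independently of x, so OLS is consistent for beta.
# beta = 2 and n = 10000 are illustrative assumptions.
set.seed(42)
n <- 10000
beta <- 2
x_exo <- rnorm(n)
u_exo <- rnorm(n)               # error independent of the regressor
y_exo <- beta * x_exo + u_exo   # y = beta * x + u

beta_ols_exo <- unname(coef(lm(y_exo ~ x_exo))["x_exo"])
beta_ols_exo                    # should be close to beta = 2
```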

#### There is an association between $x$ and $u$

$$y=\beta x + u(x)$$

\begin{array}{lcl}
 x & \rightarrow & y\\
  \uparrow & \nearrow & \\
 u & & \\
\end{array}

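$$\frac{dy}{dx}=\beta + \frac{du}{dx}$$

A change in $x$ now moves $y$ both directly and through $u$. OLS attributes the whole of $dy/dx$ to $x$, so the OLS estimator is biased and inconsistent for $\beta$.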
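The size of the bias follows from $\hat\beta_{OLS} \to \operatorname{Cov}(x,y)/\operatorname{Var}(x) = \beta + \operatorname{Cov}(x,u)/\operatorname{Var}(x)$. The sketch below checks this by simulation under assumed values ($\beta = 2$, $x = v + 0.5u$ with independent standard-normal $v$ and $u$, so $\operatorname{Cov}(x,u) = 0.5$ and $\operatorname{Var}(x) = 1.25$); the variable names are hypothetical.

```r
# Endogenous case: u enters x, so Cov(x, u) != 0 and OLS is inconsistent.
# beta = 2 and the 0.5 loading of u on x are illustrative assumptions.
set.seed(42)
n <- 100000
beta <- 2
v_endo <- rnorm(n)                 # part of x unrelated to u
u_endo <- rnorm(n)
x_endo <- v_endo + 0.5 * u_endo    # x depends on u
y_endo <- beta * x_endo + u_endo

beta_ols_endo <- unname(coef(lm(y_endo ~ x_endo))["x_endo"])
# Probability limit of OLS: beta + Cov(x, u) / Var(x) = 2 + 0.5 / 1.25 = 2.4
plim_ols <- beta + 0.5 / 1.25
c(estimate = beta_ols_endo, plim = plim_ols)
```

The estimate settles near 2.4 rather than 2, and a larger sample does not remove the gap: that is what "inconsistent" means here.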

### Definition of an Instrument

\begin{array}{ccccc}
z & \rightarrow & x & \rightarrow & y\\
 & & \uparrow & \nearrow & \\
 & & u & & \\
\end{array}

1. $z$ is uncorrelated with the error $u$ (exogeneity).
2. $z$ is correlated with the regressor $x$ (relevance).
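With one instrument and one regressor, these two conditions are exactly what identifies $\beta$. Taking covariances of $y=\beta x+u$ with $z$ gives $\operatorname{Cov}(z,y)=\beta\operatorname{Cov}(z,x)+\operatorname{Cov}(z,u)$; condition 1 removes the last term and condition 2 makes the division valid, so $\beta_{IV}=\operatorname{Cov}(z,y)/\operatorname{Cov}(z,x)$. The sketch below uses assumed data-generating values and hypothetical names, compares this estimator with OLS, and reports the first-stage $F$ statistic as a check of condition 2. With a single instrument, condition 1 cannot be tested from the data and has to be argued.

```r
# Simple IV estimator with one instrument z for one regressor x.
# Data-generating values below are illustrative assumptions.
set.seed(42)
n <- 100000
beta <- 2
z_iv <- rnorm(n)                        # instrument: independent of u (condition 1)
u_iv <- rnorm(n)
x_iv <- z_iv + rnorm(n) + 0.5 * u_iv    # x depends on z (condition 2) and on u
y_iv <- beta * x_iv + u_iv

# OLS is pulled away from beta by Cov(x, u)
beta_ols_iv <- unname(coef(lm(y_iv ~ x_iv))["x_iv"])
# IV estimator: Cov(z, y) / Cov(z, x)
beta_iv <- cov(z_iv, y_iv) / cov(z_iv, x_iv)
# Relevance check: F statistic of the first-stage regression of x on z
first_stage_F <- summary(lm(x_iv ~ z_iv))$fstatistic[["value"]]
c(ols = beta_ols_iv, iv = beta_iv, first_stage_F = first_stage_F)
```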

### Estimation

#### Two-Stage Least Squares (2SLS)
* Stage 1: regress the regressors $X$ on the instruments $Z$  
$X=Z\gamma+\tau$  
$\hat \gamma = (Z^TZ)^{-1}Z^{T}X$  
$\hat X = Z\hat\gamma = Z(Z^TZ)^{-1}Z^{T}X$  
* Stage 2: regress $Y$ on the fitted values $\hat X$  
$Y=\hat X \beta + u$  
$\hat \beta_{2SLS}=({\hat X}^T \hat X)^{-1}{\hat X}^TY$
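The two stages translate directly into matrix code. This sketch assumes a simulated design with hypothetical names and values (intercept 10, coefficient 1 on the exogenous `x_exog` and on the endogenous `d_endo`), in which the exogenous columns appear in both $X$ and $Z$ so that they act as their own instruments. One detail the formulas leave implicit: the 2SLS residuals are $Y - X\hat\beta_{2SLS}$, computed with the original $X$, so a plain second-stage regression on $\hat X$ gives the right point estimates but not the right standard errors.

```r
# 2SLS via the matrix formulas above.
# The simulated design (true coefficients 10, 1, 1) is an illustrative assumption.
set.seed(42)
n <- 5000
x_exog <- rnorm(n)                                # exogenous regressor
z_inst <- rnorm(n)                                # instrument
e_err  <- rnorm(n)                                # structural error
d_endo <- 0.7 * z_inst + 0.3 * e_err + rnorm(n)   # endogenous regressor
y_out  <- 10 + 1 * x_exog + 1 * d_endo + e_err

X <- cbind(1, x_exog, d_endo)   # regressors: intercept, exogenous, endogenous
Z <- cbind(1, x_exog, z_inst)   # instruments: exogenous columns instrument themselves

# Stage 1: gamma_hat = (Z'Z)^{-1} Z'X and X_hat = Z gamma_hat
gamma_hat <- solve(crossprod(Z), crossprod(Z, X))
X_hat <- Z %*% gamma_hat
# Stage 2: beta_2SLS = (X_hat'X_hat)^{-1} X_hat'Y
beta_2sls <- drop(solve(crossprod(X_hat), crossprod(X_hat, y_out)))

# Residuals use the original X, not X_hat
u_hat <- y_out - drop(X %*% beta_2sls)
sigma2 <- sum(u_hat^2) / (n - ncol(X))
se_2sls <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))

# lm() on X_hat reproduces the point estimates (but not the standard errors)
beta_lm <- unname(coef(lm(y_out ~ X_hat - 1)))
round(cbind(beta_2sls, se_2sls, beta_lm), 4)
```

`se_2sls` is the conventional homoskedastic 2SLS standard error, $\sqrt{\hat\sigma^2\,[(\hat X^T\hat X)^{-1}]_{jj}}$. `summary(lm(y_out ~ X_hat - 1))` reports the same coefficients, but its standard errors are computed from $Y - \hat X\hat\beta$, so they are not valid 2SLS standard errors.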

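```r
# Target correlation matrix of (x, d, z, e):
# d is endogenous (corr(d, e) = 0.3); z is correlated with d (0.7)
# but essentially uncorrelated with the error e (0.001)
R <- matrix(c(1,     0.001, 0.002, 0.001,
              0.001, 1,     0.7,   0.3,
              0.002, 0.7,   1,     0.001,
              0.001, 0.3,   0.001, 1), nrow = 4)
rownames(R) <- colnames(R) <- c("x", "d", "z", "e")
R

# Draw numobs correlated standard-normal vectors using the Cholesky factor of R
U <- t(chol(R))
nvars <- nrow(U)
numobs <- 1000
set.seed(1)
random.normal <- matrix(rnorm(nvars * numobs, 0, 1), nrow = nvars, ncol = numobs)
data <- as.data.frame(t(U %*% random.normal))
colnames(data) <- c("x", "d", "z", "e")

# True model: y = 10 + 1*x + 1*d + e, so the effect of d is 1.
# Columns are referenced through `data` instead of attach(), so objects
# already in the global environment cannot mask them.
data$y <- 10 + 1 * data$x + 1 * data$d + data$e

# OLS: biased for the coefficient on d because d is correlated with e
ols <- lm(y ~ x + d, data = data)
summary(ols)

# Step 1: regress the endogenous d on the exogenous x and the instrument z
tsls1 <- lm(d ~ x + z, data = data)
summary(tsls1)
data$d.hat <- fitted.values(tsls1)

# Step 2: regress y on x and the fitted d.hat. The coefficient on d.hat is the
# 2SLS estimate, but lm()'s standard errors here are not valid 2SLS standard errors.
tsls2 <- lm(y ~ x + d.hat, data = data)
summary(tsls2)
```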

|   | x | d | z | e |
|---|---|---|---|---|
| x | 1.0 | 0.001 | 0.002 | 0.001 |
| d | 0.001 | 1.0 | 0.7 | 0.3 |
| z | 0.002 | 0.7 | 1.0 | 0.001 |
| e | 0.001 | 0.3 | 0.001 | 1.0 |

```
The following objects are masked _by_ .GlobalEnv:

    x, z
```

These outputs come from a session in which `attach(data)` had been run several times and objects named `x` and `z` already existed in the global environment. Those global objects masked the simulated columns, so `y` and the regressions below used them together with the simulated `d`. This is consistent with the first stage finding essentially no relationship between `d` and `z` (coefficient 0.034, p = 0.468; overall F-statistic 1.983) and with the very imprecise second-stage estimate for `d.hat`.


```
Call:
lm(formula = y ~ x + d)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2615 -0.6055 -0.0237  0.6580  2.7711 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.99453    0.03104  321.94   <2e-16 ***
x            1.01191    0.02168   46.67   <2e-16 ***
d            1.31268    0.03028   43.36   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9817 on 997 degrees of freedom
Multiple R-squared:  0.8121,	Adjusted R-squared:  0.8118 
F-statistic:  2155 on 2 and 997 DF,  p-value: < 2.2e-16
```

```
Call:
lm(formula = d ~ x + z)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2154 -0.6730 -0.0068  0.6898  2.7035 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0004957  0.0324797   0.015    0.988
x           0.0257322  0.0318376   0.808    0.419
z           0.0342855  0.0472577   0.726    0.468

Residual standard error: 1.027 on 997 degrees of freedom
Multiple R-squared:  0.003961,	Adjusted R-squared:  0.001963 
F-statistic: 1.983 on 2 and 997 DF,  p-value: 0.1383
```

```
Call:
lm(formula = y ~ x + d.hat)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0566 -1.1361  0.0343  1.1128  4.5039 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.99445    0.05273 189.531   <2e-16 ***
x            1.02884    0.10091  10.196   <2e-16 ***
d.hat        0.90940    2.23871   0.406    0.685    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.667 on 997 degrees of freedom
Multiple R-squared:  0.458,	Adjusted R-squared:  0.4569 
F-statistic: 421.2 on 2 and 997 DF,  p-value: < 2.2e-16
```

