# IPTW Data Analysis

To install an R package, use the following syntax:
```R
install.packages("survey",repos='http://cran.us.r-project.org')
```

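```R
library(tableone)  # CreateTableOne(), svyCreateTableOne()
library(Matching)
library(ipw)       # ipwpoint()
library(survey)    # svydesign(), svyglm()
library(MatchIt)   # source of the lalonde data
library(sandwich)  # vcovHC(), used in the commented-out alternative for Q3
```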

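```R
# 614-subject version from MatchIt (Matching ships a different lalonde dataset)
data(lalonde, package = "MatchIt")
```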

### Dataset information

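The data have n = 614 subjects and 10 variables. Recent MatchIt releases replace **black** and **hispan** with a single `race` factor, so with those versions the two indicators have to be derived from `race`.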

**age** age in years.

**educ** years of schooling.

**black** indicator variable for Black race.

**hispan** indicator variable for Hispanic ethnicity.

**married** indicator variable for marital status.

**nodegree** indicator variable for having no high school diploma.

**re74** real earnings in 1974.

**re75** real earnings in 1975.

**re78** real earnings in 1978.

**treat** an indicator variable for treatment status.

The *outcome* is **re78** – post-intervention income.

The *treatment* is **treat** – which is equal to 1 if the subject received the labor training and equal to 0 otherwise.

The potential *confounding variables* are: **age, educ, black, hispan, married, nodegree, re74, re75**.

Fit a propensity score model. Use a logistic regression model, where the outcome is treatment. Include the 8 confounding variables in the model as predictors, with no interaction terms or non-linear terms (such as squared terms). Obtain the propensity score for each subject. Next, obtain the inverse probability of treatment weights for each subject.
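In symbols (notation introduced here for illustration), with $T_i$ the value of **treat** for subject $i$ and $\hat e_i$ that subject's fitted propensity score, the model and the weights are:

$$
\log\frac{\hat e_i}{1 - \hat e_i} = \hat\beta_0 + \hat\beta_1\,\text{age}_i + \hat\beta_2\,\text{educ}_i + \hat\beta_3\,\text{black}_i + \hat\beta_4\,\text{hispan}_i + \hat\beta_5\,\text{married}_i + \hat\beta_6\,\text{nodegree}_i + \hat\beta_7\,\text{re74}_i + \hat\beta_8\,\text{re75}_i
$$

$$
w_i = \frac{T_i}{\hat e_i} + \frac{1 - T_i}{1 - \hat e_i}
$$

So a treated subject gets weight $1/\hat e_i$ and a control gets $1/(1 - \hat e_i)$; because $0 < \hat e_i < 1$, every weight is greater than 1.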

#### Q1. What are the minimum and maximum weights?

```R
str(lalonde)
```

```
'data.frame':	614 obs. of  10 variables:
 $ treat   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ age     : int  37 22 30 27 33 22 23 32 22 33 ...
 $ educ    : int  11 9 12 11 8 9 12 11 16 12 ...
 $ black   : int  1 0 1 1 1 1 1 1 1 0 ...
 $ hispan  : int  0 1 0 0 0 0 0 0 0 0 ...
 $ married : int  1 0 0 0 0 0 0 0 0 1 ...
 $ nodegree: int  1 1 0 1 1 1 0 1 0 0 ...
 $ re74    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ re75    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ re78    : num  9930 3596 24909 7506 290 ...
```


```R
xvars <- c("age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75")
# Unweighted Table 1: covariate means (SD) by treatment arm, with SMDs
table1 <- CreateTableOne(vars = xvars, strata = "treat", data = lalonde, test = FALSE)
print(table1, smd = TRUE)
```

```
                      Stratified by treat
                       0                 1                 SMD   
  n                        429               185                 
  age (mean (SD))        28.03 (10.79)     25.82 (7.16)     0.242
  educ (mean (SD))       10.24 (2.86)      10.35 (2.01)     0.045
  black (mean (SD))       0.20 (0.40)       0.84 (0.36)     1.668
  hispan (mean (SD))      0.14 (0.35)       0.06 (0.24)     0.277
  married (mean (SD))     0.51 (0.50)       0.19 (0.39)     0.719
  nodegree (mean (SD))    0.60 (0.49)       0.71 (0.46)     0.235
  re74 (mean (SD))     5619.24 (6788.75) 2095.57 (4886.62)  0.596
  re75 (mean (SD))     2466.48 (3292.00) 1532.06 (3219.25)  0.287
```


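```R
# Propensity score: main-effects logistic regression of treat on the 8 confounders
ps_model <- glm(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
                data = lalonde, family = binomial(link = "logit"))
ps <- predict(ps_model, type = "response")

# IPTW weights: 1/ps for treated subjects, 1/(1 - ps) for controls
weight <- ifelse(lalonde$treat == 1, 1 / ps, 1 / (1 - ps))

print(c(min(weight), max(weight)))
```

```
[1]  1.009163 40.077293
```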


#### Answer 1. Minimum weight `1.009163`, maximum weight `40.077293`
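Two quick sanity checks apply to any set of IPTW weights: every weight is at least 1 (a probability never exceeds 1), and the weights within each arm sum to roughly the full sample size, because each arm is reweighted to stand in for the whole population. The sketch below illustrates both on hypothetical simulated data with one confounder `x`; the names `x` and `w` and the simulation settings are assumptions, not part of the lalonde analysis.

```R
# Hypothetical simulated data (NOT lalonde): one confounder x affects treatment
set.seed(42)
n <- 2000
x <- rnorm(n)
treat <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))

# Propensity score from a main-effects logistic regression
ps <- predict(glm(treat ~ x, family = binomial(link = "logit")), type = "response")

# IPTW weights: 1/ps for treated subjects, 1/(1 - ps) for controls
w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

range(w)               # minimum and maximum weight; never below 1
tapply(w, treat, sum)  # per-arm pseudo-population size; each should be near n
```

For lalonde, the per-arm sums of `weight` appear to be what the weighted `n` row of the `svyCreateTableOne` output reports (616.00 and 553.63, both near 614).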

#### Q2. Find the standardized differences for each confounder on the weighted (pseudo) population. What is the standardized difference for nodegree?

```R
# Survey design carrying the IPTW weights (no clustering: ids = ~1)
weighteddata <- svydesign(ids = ~1, data = lalonde, weights = weight)
weightedtable <- svyCreateTableOne(vars = xvars, strata = "treat", data = weighteddata, test = FALSE)
print(weightedtable, smd = TRUE)
```

```
                      Stratified by treat
                       0                 1                 SMD   
  n                     616.00            553.63                 
  age (mean (SD))        27.10 (10.80)     25.57 (6.53)     0.172
  educ (mean (SD))       10.29 (2.74)      10.61 (2.05)     0.132
  black (mean (SD))       0.40 (0.49)       0.45 (0.50)     0.101
  hispan (mean (SD))      0.12 (0.32)       0.12 (0.33)     0.014
  married (mean (SD))     0.41 (0.49)       0.31 (0.47)     0.197
  nodegree (mean (SD))    0.62 (0.48)       0.57 (0.50)     0.112
  re74 (mean (SD))     4552.74 (6337.09) 2932.18 (5709.42)  0.269
  re75 (mean (SD))     2172.04 (3160.14) 1658.07 (3072.89)  0.165
```


#### Answer 2. `0.112`
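The SMD column compares the two arms' means in units of their pooled standard deviation, $|\bar x_1 - \bar x_0| / \sqrt{(s_1^2 + s_0^2)/2}$; for example, the unweighted age values give $2.21 / \sqrt{(7.16^2 + 10.79^2)/2} \approx 0.24$. In the weighted table the means and SDs are weighted. Below is a minimal base-R sketch of that calculation on hypothetical simulated data; the helper `smd()` is illustrative, and its variance normalization (dividing by the sum of weights) may differ slightly from what tableone and survey use.

```R
# Standardized mean difference for one covariate, optionally weighted.
# Sketch only: weighted means and variances normalized by the sum of weights.
smd <- function(x, treat, w = rep(1, length(x))) {
  wmean <- function(v, wt) sum(wt * v) / sum(wt)
  wvar <- function(v, wt) sum(wt * (v - wmean(v, wt))^2) / sum(wt)
  t1 <- treat == 1
  mean_diff <- wmean(x[t1], w[t1]) - wmean(x[!t1], w[!t1])
  pooled_sd <- sqrt((wvar(x[t1], w[t1]) + wvar(x[!t1], w[!t1])) / 2)
  abs(mean_diff) / pooled_sd
}

# Hypothetical simulated data (NOT lalonde) with one confounder x
set.seed(42)
n <- 2000
x <- rnorm(n)
treat <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
ps <- predict(glm(treat ~ x, family = binomial), type = "response")
w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

smd_raw <- smd(x, treat)          # imbalance in the original sample
smd_weighted <- smd(x, treat, w)  # imbalance in the weighted pseudo-population
c(raw = smd_raw, weighted = smd_weighted)
```

Because the propensity model here is correctly specified, weighting should shrink the SMD for `x` well below its raw value, mirroring the drop from the unweighted to the weighted lalonde table.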

#### Q3. Using IPTW, find the estimate and 95% confidence interval for the average causal effect. This can be obtained from `svyglm`.

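```R
# Alternative: weighted GLM with a robust (HC0 sandwich) standard error.
# re78 is continuous, so use a gaussian (identity-link) family.
# glm.obj <- glm(re78 ~ treat, data = lalonde, weights = weight, family = gaussian(link = "identity"))
# betaiptw <- coef(glm.obj)
# SE <- sqrt(diag(vcovHC(glm.obj, type = "HC0")))
# causalrd <- betaiptw[2]
# lcl <- betaiptw[2] - 1.96 * SE[2]
# ucl <- betaiptw[2] + 1.96 * SE[2]
# c(lcl, causalrd, ucl)

# Weighted regression of the outcome on treatment; the treat coefficient is the IPTW estimate
rd <- svyglm(re78 ~ treat, design = weighteddata)
coef(rd)
confint(rd)
```

|             | 2.5 %     | 97.5 %   |
|-------------|-----------|----------|
| (Intercept) | 5706.948  | 7138.73  |
| treat       | -1559.321 | 2008.673 |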


#### Answer 3. Estimate `224.67`, 95% confidence interval `(-1559.32, 2008.67)`
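With a single binary regressor, the weighted-regression coefficient on `treat` equals the difference between the two arms' weighted mean outcomes, and a sandwich standard error accounts for the weights. The base-R sketch below checks the first fact and builds an HC0 sandwich interval by hand on hypothetical simulated data (a true effect of 500 is assumed by construction). `svyglm`'s linearization standard error should be close to, though not necessarily identical to, this HC0 value; both treat the estimated weights as fixed.

```R
# Hypothetical simulated data (NOT lalonde); x confounds treatment and outcome
set.seed(7)
n <- 2000
x <- rnorm(n)
treat <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
y <- 1000 + 500 * treat + 800 * x + rnorm(n, sd = 1000)  # assumed true effect: 500

ps <- predict(glm(treat ~ x, family = binomial), type = "response")
w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Weighted least squares of y on treat
fit <- lm(y ~ treat, weights = w)
est <- unname(coef(fit)[2])

# Same number as the difference of weighted arm means
hajek <- weighted.mean(y[treat == 1], w[treat == 1]) -
  weighted.mean(y[treat == 0], w[treat == 0])

# HC0 sandwich: (X'WX)^-1 [sum_i w_i^2 e_i^2 x_i x_i'] (X'WX)^-1
X <- model.matrix(fit)
e <- residuals(fit)  # raw residuals y - fitted
bread <- solve(crossprod(X, w * X))
meat <- crossprod(X, (w * e)^2 * X)
se <- unname(sqrt(diag(bread %*% meat %*% bread))[2])

ci <- c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
ci
```

This is the same computation as the commented-out `vcovHC(..., type = "HC0")` alternative in the cell above, written out without the sandwich package.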

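Now truncate the weights at the 1st and 99th percentiles. This can be done with the `trunc = 0.01` option of `ipwpoint()` from the ipw package; the truncated weights are then passed to `svyglm` through a new survey design.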
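Truncation here means winsorizing: weights below the 1st percentile are raised to it and weights above the 99th percentile are lowered to it, which caps the influence of extreme weights such as the maximum of about 40 found in Q1. A base-R sketch on hypothetical simulated weights follows; the helper `truncate_weights` is illustrative, and `ipwpoint()` may use a different quantile convention from R's default `quantile()` type.

```R
# Winsorize weights at the trunc and (1 - trunc) quantiles
truncate_weights <- function(w, trunc = 0.01) {
  limits <- quantile(w, probs = c(trunc, 1 - trunc), names = FALSE)
  pmin(pmax(w, limits[1]), limits[2])  # raise low weights, lower high weights
}

# Hypothetical simulated weights (NOT lalonde)
set.seed(1)
n <- 1000
x <- rnorm(n)
treat <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
ps <- predict(glm(treat ~ x, family = binomial), type = "response")
w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

w_trunc <- truncate_weights(w, trunc = 0.01)
rbind(original = range(w), truncated = range(w_trunc))
```

Applied to the Q1 vector `weight`, `truncate_weights(weight)` should closely track `weight_model$weights.trunc`, assuming both use the same quantile definition.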

#### Q4. Using IPTW with the truncated weights, find the estimate and 95% confidence interval for the average causal effect

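```R
# Unstabilized IPTW weights from ipwpoint, truncated at the 1st and 99th percentiles
weight_model <- ipwpoint(exposure = treat, family = "binomial", link = "logit",
                         denominator = ~ age + educ + black + hispan + married + nodegree + re74 + re75,
                         data = lalonde, trunc = 0.01)

# Refit the outcome model on a survey design that uses the truncated weights
rd <- svyglm(re78 ~ treat,
             design = svydesign(ids = ~1, data = lalonde, weights = weight_model$weights.trunc))
coef(rd)
confint(rd)
```

|             | 2.5 %     | 97.5 %   |
|-------------|-----------|----------|
| (Intercept) | 5707.033  | 7138.84  |
| treat       | -1090.639 | 2064.506 |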


#### Answer 4. Estimate `486.93`, 95% confidence interval `(-1090.64, 2064.51)`