% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/estimatr_lm_lin.R
\name{lm_lin}
\alias{lm_lin}
\title{Linear regression with the Lin (2013) covariate adjustment}
\usage{
lm_lin(formula, covariates, data, weights, subset, clusters, se_type = NULL,
ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE)
}
\arguments{
\item{formula}{an object of class formula, as in \code{\link{lm}}, such as
\code{Y ~ Z}, with only one variable on the right-hand side: the treatment}
\item{covariates}{a right-sided formula with pre-treatment covariates on
the right-hand side, such as \code{~ x1 + x2 + x3}.}
\item{data}{A \code{data.frame}}
\item{weights}{the bare (unquoted) name of the weights variable in the
supplied data.}
\item{subset}{An optional bare (unquoted) expression specifying a subset
of observations to be used.}
\item{clusters}{An optional bare (unquoted) name of the variable that
corresponds to the clusters in the data.}
\item{se_type}{The sort of standard error sought. If \code{clusters} is
not specified, the options are "HC0", "HC1" (or "stata", the equivalent),
"HC2" (default), "HC3", or "classical". If \code{clusters} is specified,
the options are "CR0", "CR2" (default), or "stata".}
\item{ci}{logical. Whether to compute and return p-values and confidence
intervals, TRUE by default.}
\item{alpha}{The significance level, 0.05 by default.}
\item{return_vcov}{logical. Whether to return the variance-covariance
matrix for later usage, TRUE by default.}
\item{try_cholesky}{logical. Whether to try using a Cholesky
decomposition to solve least squares instead of a QR decomposition,
FALSE by default. Using a Cholesky decomposition may result in speed gains,
but it should only be used if users are sure their model is full-rank
(i.e., there is no perfect multicollinearity).}
}
\value{
An object of class \code{"lm_robust"}.
The post-estimation functions \code{summary} and \code{\link{tidy}}
return results in a \code{data.frame}. To extract results, you can use
these data frames, use the resulting list directly, or use the generic
accessor functions \code{coef}, \code{vcov}, \code{confint}, and
\code{predict}. Marginal effects and their associated uncertainty can be
obtained by passing this object to \code{\link[margins]{margins}} from
the \pkg{margins} package.
Users who want to print the results in TeX or HTML can use the
\code{\link[texreg]{extract}} function and the \pkg{texreg} package.
An object of class \code{"lm_robust"} is a list containing at least the
following components:
\item{coefficients}{the estimated coefficients}
\item{std.error}{the estimated standard errors}
\item{df}{the estimated degrees of freedom}
\item{p.value}{the p-values from a two-sided t-test using \code{coefficients}, \code{std.error}, and \code{df}}
\item{ci.lower}{the lower bound of the \code{1 - alpha} percent confidence interval}
\item{ci.upper}{the upper bound of the \code{1 - alpha} percent confidence interval}
\item{term}{a character vector of coefficient names}
\item{alpha}{the significance level specified by the user}
\item{se_type}{the standard error type specified by the user}
\item{res_var}{the residual variance}
\item{N}{the number of observations used}
\item{k}{the number of columns in the design matrix (includes linearly dependent columns!)}
\item{rank}{the rank of the fitted model}
\item{vcov}{the fitted variance covariance matrix}
\item{r.squared}{The \eqn{R^2},
\deqn{R^2 = 1 - Sum(e[i]^2) / Sum((y[i] - y^*)^2),} where \eqn{y^*}
is the mean of \eqn{y[i]} if there is an intercept and zero otherwise,
and \eqn{e[i]} is the ith residual.}
\item{adj.r.squared}{The \eqn{R^2} but penalized for having more parameters, \code{rank}}
\item{weighted}{whether or not weights were applied}
\item{call}{the original function call}
\item{scaled_center}{the means of each of the covariates used for centering}
We also return \code{terms} and \code{contrasts}, used by \code{predict}.
}
\description{
This function is a wrapper for \code{\link{lm_robust}} that
is useful for estimating treatment effects with pre-treatment covariate
data. This implements the method described by Lin (2013).
}
\details{
This function is simply a wrapper for \code{\link{lm_robust}} and implements
the Lin estimator (see the reference below). This method
pre-processes the data by taking the covariates specified in the
\code{covariates} argument, centering each covariate by subtracting its
mean, and interacting the centered covariates with the treatment. If the treatment has
multiple values, a series of dummies for each value is created and each of
those is interacted with the demeaned covariates. More details can be found
in the
\href{https://declaredesign.org/R/estimatr/articles/getting-started.html}{Getting Started vignette}
and the
\href{https://declaredesign.org/R/estimatr/articles/mathematical-notes.html}{mathematical notes}.
}
\examples{
library(fabricatr)
library(randomizr)
dat <- fabricate(
N = 40,
x = rnorm(N, mean = 2.3),
x2 = rpois(N, lambda = 2),
x3 = runif(N),
y0 = rnorm(N) + x,
y1 = rnorm(N) + x + 0.35
)
dat$z <- complete_ra(N = nrow(dat))
dat$y <- ifelse(dat$z == 1, dat$y1, dat$y0)
# Same specification as lm_robust() with one additional argument
lmlin_out <- lm_lin(y ~ z, covariates = ~ x, data = dat)
tidy(lmlin_out)
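# As a sketch of the equivalence described in Details, lm_lin() should
# produce the same estimates as lm_robust() fit on a manually centered
# covariate interacted with treatment ('x_c' is a hypothetical name
# introduced here for illustration)
dat$x_c <- dat$x - mean(dat$x)
lm_robust(y ~ z + x_c + z:x_c, data = dat)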
# Works with multiple pre-treatment covariates
lm_lin(y ~ z, covariates = ~ x + x2, data = dat)
# Also centers data AFTER evaluating any functions in formula
lmlin_out2 <- lm_lin(y ~ z, covariates = ~ x + log(x3), data = dat)
lmlin_out2$scaled_center["log(x3)"]
mean(log(dat$x3))
# Works easily with clusters
dat$clusterID <- rep(1:20, each = 2)
dat$z_clust <- cluster_ra(clusters = dat$clusterID)
lm_lin(y ~ z_clust, covariates = ~ x, data = dat, clusters = clusterID)
# Works with multi-valued treatments
dat$z_multi <- sample(1:3, size = nrow(dat), replace = TRUE)
lm_lin(y ~ z_multi, covariates = ~ x, data = dat)
# Stratified estimator with blocks
dat$blockID <- rep(1:5, each = 8)
dat$z_block <- block_ra(blocks = dat$blockID)
lm_lin(y ~ z_block, ~ factor(blockID), data = dat)
\dontrun{
# Can also use 'margins' package if you have it installed to get
# marginal effects
library(margins)
lmlout <- lm_lin(y ~ z_block, ~ x, data = dat)
summary(margins(lmlout))
# Can output results using 'texreg'
library(texreg)
texregobj <- extract(lmlout)
}
}
\seealso{
\code{\link{lm_robust}}
}
\references{
Freedman, David A. 2008. "On Regression Adjustments in Experiments with Several Treatments." The Annals of Applied Statistics 2 (1): 176-96. \url{https://doi.org/10.1214/07-AOAS143}.

Lin, Winston. 2013. "Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique." The Annals of Applied Statistics 7 (1): 295-318. \url{https://doi.org/10.1214/12-AOAS583}.
}