The goal of regweight is to make it easy to diagnose a model using Aronow and Samii (2015) regression weights.
In short, these weights show which observations are most influential in determining the value of a coefficient in a linear regression. If the regression is meant to estimate a causal effect, this implies that the OLS estimand may differ from the average treatment effect, because OLS averages unit-level effects using these weights rather than weighting every unit equally. The weights provide, in some sense, the most precise estimate available given a conditioning set (and a linear model). In expectation, each weight equals the conditional variance of the variable of interest given the other covariates in the model.
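The statement above can be written compactly. The notation here is introduced only for illustration and is not taken from the package documentation: $Y_i$ is the outcome, $D_i$ the variable of interest, $X_i$ the remaining covariates, $\tau_i$ the unit-level effect of $D_i$, and $w_i$ the implicit weight. The limit in the last expression holds under the assumptions of Aronow and Samii, including that $\mathbb{E}[D_i \mid X_i]$ is linear in $X_i$.

```latex
w_i = \left(D_i - \mathbb{E}[D_i \mid X_i]\right)^2,
\qquad
\mathbb{E}[w_i \mid X_i] = \operatorname{Var}(D_i \mid X_i),
\qquad
\hat{\beta} \xrightarrow{p} \frac{\mathbb{E}[w_i \, \tau_i]}{\mathbb{E}[w_i]}
```

Units whose value of $D_i$ is well predicted by $X_i$ therefore contribute little to the coefficient, however many of them are in the sample.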
For more details, see vignette("example-usage").
You can install regweight like so:
# From CRAN:
install.packages("regweight")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("ddimmery/regweight")
This is a basic example which shows you how to analyze the implicit regression weights in a simple problem:
library(regweight)
library(estimatr)
data(penguins, package = "palmerpenguins")
model <- lm_robust(body_mass_g ~ ., penguins)
summary(model)
#>
#> Call:
#> lm_robust(formula = body_mass_g ~ ., data = penguins)
#>
#> Standard error type: HC2
#>
#> Coefficients:
#>                   Estimate Std. Error t value  Pr(>|t|) CI Lower   CI Upper  DF
#> (Intercept)       84087.94  41946.611  2.0046 4.584e-02 1564.884 166611.005 323
#> speciesChinstrap   -282.54     79.288 -3.5635 4.212e-04 -438.525   -126.554 323
#> speciesGentoo       890.96    132.512  6.7236 8.048e-11  630.263   1151.653 323
#> islandDream         -21.18     56.015 -0.3781 7.056e-01 -131.380     89.019 323
#> islandTorgersen     -58.78     63.182 -0.9303 3.529e-01 -183.078     65.524 323
#> bill_length_mm       18.96      6.214  3.0516 2.465e-03    6.738     31.190 323
#> bill_depth_mm        60.80     18.841  3.2270 1.379e-03   23.732     97.863 323
#> flipper_length_mm    18.50      2.878  6.4283 4.632e-10   12.841     24.167 323
#> sexmale             378.98     45.265  8.3724 1.737e-15  289.926    468.028 323
#> year                -42.78     20.953 -2.0420 4.197e-02  -84.006     -1.563 323
#>
#> Multiple R-squared: 0.8768 , Adjusted R-squared: 0.8734
#> F-statistic: 298.1 on 9 and 323 DF, p-value: < 2.2e-16
Let’s say that we want to explore the effect of flipper_length_mm on body_mass_g. Which units have high implicit weight in estimating this effect?
It’s very easy to use regweight to answer this question:
rw_model <- calculate_weights(model, "flipper_length_mm")
hist(rw_model)
#> Warning: Removed 11 rows containing non-finite values (stat_bin).
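To make concrete what these weights are, here is a minimal base-R sketch of the idea. It is not a description of how regweight computes them internally. It assumes the weight for each unit is the squared residual from regressing the variable of interest on all other covariates. The helper implicit_weights and its arguments are hypothetical names introduced for this illustration, and the sketch only handles a numeric variable of interest.

```r
# Hypothetical helper, NOT part of regweight: hand-rolled implicit weights.
implicit_weights <- function(data, outcome, term) {
  # Work on complete cases so residuals line up with the rows of `data`.
  data <- na.omit(data)

  # Auxiliary regression: variable of interest on every other covariate
  # (the outcome is excluded from the right-hand side).
  aux <- lm(as.formula(paste(term, "~ . -", outcome)), data = data)
  d_tilde <- residuals(aux)

  # Assumed weight definition: squared residual of the variable of interest.
  w <- d_tilde^2

  # Frisch-Waugh-Lovell: the same residuals reproduce the OLS coefficient.
  beta <- sum(d_tilde * data[[outcome]]) / sum(w)

  list(weights = w / sum(w), beta = beta)
}

data(penguins, package = "palmerpenguins")
res <- implicit_weights(penguins, "body_mass_g", "flipper_length_mm")

# Compare against the coefficient from the full regression.
full <- lm(body_mass_g ~ ., data = penguins)
all.equal(res$beta, unname(coef(full)["flipper_length_mm"]))

# Units with the largest share of the total weight.
head(sort(res$weights, decreasing = TRUE))
```

The weights are normalized to sum to one, so each value can be read as a unit's share of the total weight behind the flipper_length_mm coefficient.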
We can see how the distribution of weights over islands varies:
plot(rw_model, penguins$island)
We can similarly see the implicit distribution of bill_length_mm in the nominal (unweighted) and implicit (regression weighted) sample:
plot(rw_model, penguins$bill_length_mm)
Or get a table of summary statistics:
summary(
  rw_model,
  dplyr::select(penguins, island, bill_length_mm),
  output = "html"
)
Covariate | Value | Nominal Mean | Nominal Std. Dev. | Implicit Mean | Implicit Std. Dev.
---|---|---|---|---|---
Discrete variables | | | | |
island | Biscoe | 0.488 | | 0.453 |
island | Dream | 0.360 | | 0.416 |
island | Torgersen | 0.151 | | 0.131 |
Continuous variables | | | | |
bill_length_mm | | 43.9 | 5.46 | 43.8 | 5.78
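The nominal and implicit columns differ only in how units are weighted. The following base-R sketch shows one way such a comparison could be computed by hand. It assumes the weights are squared residuals from regressing flipper_length_mm on the other covariates, and it restricts both columns to complete cases. The package may treat missing rows differently, so the numbers need not match the table exactly.

```r
data(penguins, package = "palmerpenguins")
complete <- na.omit(penguins)

# Assumed weight definition: squared residual of the variable of interest
# after partialling out the other covariates (outcome excluded).
aux <- lm(flipper_length_mm ~ . - body_mass_g, data = complete)
w <- residuals(aux)^2

# Continuous covariate: unweighted mean vs. regression-weighted mean.
nominal_mean  <- mean(complete$bill_length_mm)
implicit_mean <- weighted.mean(complete$bill_length_mm, w)

# Discrete covariate: share of units vs. share of total weight per island.
nominal_share  <- prop.table(table(complete$island))
implicit_share <- tapply(w, complete$island, sum) / sum(w)

print(c(nominal = nominal_mean, implicit = implicit_mean))
print(rbind(nominal = nominal_share, implicit = implicit_share))
```

An island whose implicit share exceeds its nominal share contributes more to the flipper_length_mm coefficient than its headcount alone would suggest.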