This package provides a causal_tbl class for causal inference. A
causal_tbl is a subclass of tibble which keeps track of information
on the roles of variables like treatment and outcome, and provides
functionality to store models and their fitted values as columns in a
data frame.
You can install the development version of causaltbl from GitHub with:
# install.packages("remotes")
remotes::install_github("CoryMcCartan/causaltbl")A causal tibble, causal_tbl, is a data frame with attributes
identifying which columns correspond to common inputs in causal
inference analyses. At the most basic level, you can indicate the
outcome and treatment columns. For more involved analyses, causal_tbls
can keep track of additional columns including multiple outcomes and
multiple treatments.
The primary entryway to causaltbl is through
create a causal_tbl directly via causal_tbl().
Suppose we have data from a really simple differences in differences design. Our data looks like this:
df <- data.frame(
id = c("a", "a", "a", "a", "b", "b", "b", "b"),
year = rep(2015:2018, 2),
trt = c(0, 0, 0, 0, 0, 0, 1, 1),
y = c(1, 3, 2, 3, 2, 4, 4, 5)
)There are two units (id), a and b. We have 4 yearly observations
from 2015 to 2018 (year) for each unit. a is never treated and b
is treated in 2017 and 2018 (trt). Some outcome (y) is measured
yearly.
We first can make a causal_tbl by passing df to causal_tbl(). We
don’t need to specify any options.
library(causaltbl)
did <- causal_tbl(df)Now did is a causal_tbl version of df.
did
#> # A <causal_tbl> [8 × 4]
#>
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5To set outcome , we can use the corresponding functions set_outcome().
causal_tbl uses tidy evaluation, so we can use the bare column name.
did <- did |>
set_outcome(outcome = y)
did
#> # A <causal_tbl> [8 × 4]
#> [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5Similarly, we can indicate that did has a treatment column trt or
panel structure for each id-year with the corresponding
set_treatment() and set_panel() functions.
did <- did |>
set_treatment(treatment = trt) |>
set_panel(unit = id, time = year)
did
#> # A <causal_tbl> [8 × 4]
#> [unit] [time] [trt] [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5This sets attributes that are used down-the-line by other packages. We
can retrieve them by calling their getters. For the outcome,
get_outcome():
get_outcome(did)
#> [1] "y"For the treatment, get_treatment():
get_treatment(did)
#> y
#> "trt"And for the panel structure, get_panel():
get_panel(did)
#> $unit
#> [1] "id"
#>
#> $time
#> [1] "year"For more information on using causal_tbls or designing functions that
use causal_tbls, see the Advanced causal_tbl vignette.
