Merge branch 'main' of github.com:CoryMcCartan/causaltbl
CoryMcCartan committed Mar 26, 2023
2 parents 4621854 + 0fc842f commit 15f196c
Showing 6 changed files with 276 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ pkgdown
*.tmp
*.bak
*.swp
inst/doc
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -18,6 +18,8 @@ Imports:
    stats
Suggests:
    dplyr,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
License: MIT + file LICENSE
Encoding: UTF-8
@@ -27,3 +29,4 @@ Config/testthat/edition: 3
URL: https://github.com/CoryMcCartan/causaltbl,
    http://corymccartan.com/causaltbl/
BugReports: https://github.com/CoryMcCartan/causaltbl/issues
VignetteBuilder: knitr
68 changes: 68 additions & 0 deletions README.Rmd
@@ -34,3 +34,71 @@ You can install the development version of causaltbl from [GitHub](https://githu
# install.packages("remotes")
remotes::install_github("CoryMcCartan/causaltbl")
```

## Using `causaltbl`

A causal tibble, `causal_tbl`, is a data frame with attributes identifying which columns correspond to common inputs in causal inference analyses. At the most basic level, you can indicate the outcome and treatment columns. For more involved analyses, `causal_tbl`s can keep track of additional columns including multiple outcomes and multiple treatments.

The primary entryway to `causaltbl` is through <!--- [`tidycausal`](https://corymccartan.com/tidycausal/) -->.
You can create a `causal_tbl` directly via `causal_tbl()`.

Suppose we have data from a very simple difference-in-differences design. Our data look like this:

```{r}
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
```

There are two units (`id`), `a` and `b`. We have 4 yearly observations from 2015 to 2018 (`year`) for each unit. `a` is never treated and `b` is treated in 2017 and 2018 (`trt`). Some outcome (`y`) is measured yearly.
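
For intuition, the difference-in-differences contrast implied by this toy data can be computed by hand. The sketch below uses only base R and is not part of `causaltbl`; the helper variables `post` and `treated` and the name `did_estimate` are introduced here purely for illustration.

```r
# Same toy data as above
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

# Helper variables (illustrative, not part of the original data)
post <- df$year >= 2017   # years in which `b` is treated
treated <- df$id == "b"   # the only unit that is ever treated

# Change over time within each group
change_treated <- mean(df$y[treated & post]) - mean(df$y[treated & !post])
change_control <- mean(df$y[!treated & post]) - mean(df$y[!treated & !post])

# Difference-in-differences contrast
did_estimate <- change_treated - change_control
did_estimate
#> [1] 1
```

The treated unit's mean outcome rises by 1.5 and the control unit's by 0.5, so the contrast is 1.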

We can first make a `causal_tbl` by passing `df` to `causal_tbl()`. We don't need to specify any options.

```{r}
library(causaltbl)
did <- causal_tbl(df)
```

Now `did` is a `causal_tbl` version of `df`.

```{r}
did
```

To set the outcome, we can use the corresponding function, `set_outcome()`. `causal_tbl` uses tidy evaluation, so we can use the bare column name.

```{r}
did <- did |>
  set_outcome(outcome = y)
did
```

Similarly, we can indicate that `did` has a treatment column `trt` and a panel structure indexed by `id` and `year` with the corresponding `set_treatment()` and `set_panel()` functions.

```{r}
did <- did |>
  set_treatment(treatment = trt) |>
  set_panel(unit = id, time = year)
did
```

This sets attributes that other packages can use down the line. We can retrieve them by calling the corresponding getters. For the outcome, `get_outcome()`:

```{r}
get_outcome(did)
```

For the treatment, `get_treatment()`:

```{r}
get_treatment(did)
```

And for the panel structure, `get_panel()`:

```{r}
get_panel(did)
```

For more information on using `causal_tbl`s or designing functions that use `causal_tbl`s, see the Advanced `causal_tbl` vignette.

131 changes: 131 additions & 0 deletions README.md
@@ -24,3 +24,134 @@ You can install the development version of causaltbl from
# install.packages("remotes")
remotes::install_github("CoryMcCartan/causaltbl")
```

## Using `causaltbl`

A causal tibble, `causal_tbl`, is a data frame with attributes
identifying which columns correspond to common inputs in causal
inference analyses. At the most basic level, you can indicate the
outcome and treatment columns. For more involved analyses, `causal_tbl`s
can keep track of additional columns including multiple outcomes and
multiple treatments.

The primary entryway to `causaltbl` is through
<!--- [`tidycausal`](https://corymccartan.com/tidycausal/) -->. You can
create a `causal_tbl` directly via `causal_tbl()`.

Suppose we have data from a very simple difference-in-differences
design. Our data look like this:

``` r
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
```

There are two units (`id`), `a` and `b`. We have 4 yearly observations
from 2015 to 2018 (`year`) for each unit. `a` is never treated and `b`
is treated in 2017 and 2018 (`trt`). Some outcome (`y`) is measured
yearly.

We can first make a `causal_tbl` by passing `df` to `causal_tbl()`. We
don’t need to specify any options.

``` r
library(causaltbl)
did <- causal_tbl(df)
```

Now `did` is a `causal_tbl` version of `df`.

``` r
did
#> # A <causal_tbl> [8 × 4]
#>
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
```

To set the outcome, we can use the corresponding function,
`set_outcome()`. `causal_tbl` uses tidy evaluation, so we can use the
bare column name.

``` r
did <- did |>
  set_outcome(outcome = y)
did
#> # A <causal_tbl> [8 × 4]
#> [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
```

Similarly, we can indicate that `did` has a treatment column `trt` and a
panel structure indexed by `id` and `year` with the corresponding
`set_treatment()` and `set_panel()` functions.

``` r
did <- did |>
  set_treatment(treatment = trt) |>
  set_panel(unit = id, time = year)
did
#> # A <causal_tbl> [8 × 4]
#> [unit] [time] [trt] [out]
#> id year trt y
#> <chr> <int> <dbl> <dbl>
#> 1 a 2015 0 1
#> 2 a 2016 0 3
#> 3 a 2017 0 2
#> 4 a 2018 0 3
#> 5 b 2015 0 2
#> 6 b 2016 0 4
#> 7 b 2017 1 4
#> 8 b 2018 1 5
```

This sets attributes that other packages can use down the line. We can
retrieve them by calling the corresponding getters. For the outcome,
`get_outcome()`:

``` r
get_outcome(did)
#> [1] "y"
```

For the treatment, `get_treatment()`:

``` r
get_treatment(did)
#> y
#> "trt"
```

And for the panel structure, `get_panel()`:

``` r
get_panel(did)
#> $unit
#> [1] "id"
#>
#> $time
#> [1] "year"
```

For more information on using `causal_tbl`s or designing functions that
use `causal_tbl`s, see the Advanced `causal_tbl` vignette.
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
71 changes: 71 additions & 0 deletions vignettes/advanced.Rmd
@@ -0,0 +1,71 @@
---
title: "Advanced `causal_tbl`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced `causal_tbl`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette provides more specific details on how `causal_tbl` objects work and how to extend them. Most users won't need to know much about `causal_tbl`s except that they (1) are extensions of `tibble`s and (2) rely on a `causal_cols` attribute that makes things "just work". The `causal_cols` attribute records which columns play which causal role, such as outcome or treatment. The package provides various getter and setter functions for these.

This vignette covers:

1. How `causal_cols` works internally.
2. How to extend the type if your model needs shiny new causal variables.

```{r setup}
library(causaltbl)
```

## Internal Design of `causal_tbl`

As in the README, here we use a simple difference-in-differences example: 8 observations for 2 units, across 4 years.

```{r}
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
```

Here, when we create the `causal_tbl`, we can specify the outcome and treatment directly via `.outcome` and `.treatment`.

```{r}
did <- causal_tbl(df, .outcome = y, .treatment = trt)
```

All causal attributes can be recovered with `causal_cols()`:

```{r}
causal_cols(did)
```

Each of these elements is a character vector, with each element being a name of a column in the data frame. For some variables, this vector should be of length 1, but for other variables, there may be multiple columns of that type.

In our case, the `causal_cols()` are the `outcome` and `treatment`. The outcome has no name, i.e., it's just `"y"`. The treatment entry is named: the name indicates that `trt` is automatically associated with `"y"` as its outcome.

The optional `names()` of the columns within a particular element of `causal_cols` convey information on any associated variable. For example, the treatment variable is by default associated with a particular outcome. And a propensity score or outcome model is associated with a particular treatment or outcome variable.
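
The naming convention can be illustrated with a plain list built by hand. This is only a sketch of the structure described above, not output from the package; the element names `outcome` and `treatment` follow the text and are otherwise an assumption.

```r
# Hand-built stand-in for the structure described above; in practice it
# would come from `causal_cols(did)`.
cols <- list(
  outcome = "y",            # unnamed: just the outcome column
  treatment = c(y = "trt")  # named: `trt` is the treatment for outcome `y`
)

# Column holding the treatment
unname(cols$treatment)
#> [1] "trt"

# Outcome that this treatment is associated with
names(cols$treatment)
#> [1] "y"

# The outcome entry carries no names
is.null(names(cols$outcome))
#> [1] TRUE
```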

However, you are not limited to one treatment or one outcome. For example, if a package author was developing methods for causal inference with multiple continuous treatments, the treatment element of `causal_cols` could have an entry for each `treatment` column.
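
As a rough illustration of the multiple-treatment case, a treatment element with two entries might look like the following. The columns `dose_a` and `dose_b` are hypothetical, and the list is written by hand in base R rather than produced by `causaltbl`.

```r
# Hypothetical columns `dose_a` and `dose_b`, both treatments for outcome `y`
cols <- list(
  outcome = "y",
  treatment = c(y = "dose_a", y = "dose_b")
)

# Number of treatment columns recorded
length(cols$treatment)
#> [1] 2

# All treatments associated with outcome `y`
unname(cols$treatment[names(cols$treatment) == "y"])
#> [1] "dose_a" "dose_b"
```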

Once set, these column names within `causal_cols` are automatically updated if columns are renamed, or set to `NULL` if columns are dropped. This reassignment happens automatically and silently in all cases.
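
To make the renaming behaviour concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: `rename_and_sync()` is a hypothetical helper, it covers renaming only (not dropped columns), and updating the names of associated variables is an assumption of the sketch.

```r
# Hypothetical helper: rename a column and update any references to it
rename_and_sync <- function(data, cols, old, new) {
  names(data)[names(data) == old] <- new
  cols <- lapply(cols, function(x) {
    # Column referenced directly as a value
    x[x == old] <- new
    # Column referenced as an associated variable, via the names
    if (!is.null(names(x))) {
      names(x)[names(x) == old] <- new
    }
    x
  })
  list(data = data, cols = cols)
}

df <- data.frame(trt = c(0, 1), y = c(2, 4))
cols <- list(outcome = "y", treatment = c(y = "trt"))

res <- rename_and_sync(df, cols, old = "y", new = "earnings")

names(res$data)
#> [1] "trt"      "earnings"
res$cols$outcome
#> [1] "earnings"
names(res$cols$treatment)
#> [1] "earnings"
```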

## Extending `causal_tbl` with new `causal_cols`

Now, if you need something fancy, odds are you should implement a new attribute for `causal_cols`. As we saw before, `causal_cols` attributes can be retrieved via `causal_cols()`. They can be set using `causal_cols() <- ...`.

Each new entry to `causal_cols` should be a named list, where:

- the name of the list denotes, in short form, what the thing is (e.g., if they're propensity scores, the name should be `pscores`)
- each entry in the list denotes one of those things
- each name of each entry indicates what that entry corresponds to
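
Following the conventions in the list above, a propensity score entry might be built as sketched below. The sketch is plain base R on a hand-written list; the column `ps_trt` is hypothetical, and it uses a named character vector to match how existing elements are described earlier in this vignette, which is an assumption about the expected type.

```r
# Existing entries, written by hand for illustration
cols <- list(
  outcome = "y",
  treatment = c(y = "trt")
)

# New entry: the element name says what it is (`pscores`), the value is the
# column holding it (`ps_trt`, hypothetical), and the name of the value says
# which variable it corresponds to (the treatment `trt`)
cols$pscores <- c(trt = "ps_trt")

names(cols)
#> [1] "outcome"   "treatment" "pscores"

# With causaltbl loaded, the updated list would then be assigned back with
# the setter described above, e.g. `causal_cols(did) <- cols`.
```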

It is the responsibility of implementers of particular methods to check that a `causal_tbl` has the necessary columns set, via helpers like `has_treatment()`, `has_outcome()`, etc.
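
A method might perform this check along the following lines. This is a hedged sketch: `naive_diff()` is a hypothetical estimator, and it assumes that `has_outcome()`, `has_treatment()`, `get_outcome()` and `get_treatment()` take the `causal_tbl` as their first argument, that the getters return column names, and that a single outcome and a single binary treatment are set.

```r
# Hypothetical estimator: raw difference in mean outcomes between treated
# and untreated rows. Fails early if the required columns are not set.
naive_diff <- function(data) {
  if (!causaltbl::has_outcome(data) || !causaltbl::has_treatment(data)) {
    stop("`data` must have both an outcome and a treatment set.")
  }
  # Getters are assumed to return column names; drop any names before indexing
  y <- data[[unname(causaltbl::get_outcome(data))]]
  trt <- data[[unname(causaltbl::get_treatment(data))]]
  mean(y[trt == 1]) - mean(y[trt == 0])
}

# Run the example only when causaltbl is installed
if (requireNamespace("causaltbl", quietly = TRUE)) {
  df <- data.frame(trt = c(0, 0, 1, 1), y = c(1, 3, 4, 5))
  did <- causaltbl::causal_tbl(df, .outcome = y, .treatment = trt)
  naive_diff(did)
}
```

Checking up front keeps the error message close to the cause, rather than failing later with an obscure subsetting error.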
