Update documentation
ddimmery committed Jul 23, 2023
1 parent 9e1cf18 commit d3368f1
Showing 7 changed files with 93 additions and 10 deletions.
14 changes: 14 additions & 0 deletions R/public_api.R
Original file line number Diff line number Diff line change
@@ -38,6 +38,11 @@ attach_config <- function(data, .HTE_cfg) {
#' arbitrary number of covariates on which to stratify the splits.
#' It returns the original dataset with an additional column `.split_id`
#' corresponding to an identifier for the split.
#'
#' To see an example analysis, read `vignette("experimental_analysis")` in the context
#' of an experiment, `vignette("observational_analysis")` for an observational study, or
#' `vignette("methodological_details")` for a deeper dive under the hood.
#'
#' @param data dataframe
#' @param identifier Unquoted name of unique identifier column
#' @param ... variables on which to stratify (requires that `quickblock` be installed.)
@@ -141,6 +146,11 @@ make_splits <- function(data, identifier, ..., .num_splits) {
#' to an estimate of the conditional expectation of treatment (`.pi_hat`), along with the
#' conditional expectation of the control and treatment potential outcome surfaces
#' (`.mu0_hat` and `.mu1_hat` respectively).
#'
#' To see an example analysis, read `vignette("experimental_analysis")` in the context
#' of an experiment, `vignette("observational_analysis")` for an observational study, or
#' `vignette("methodological_details")` for a deeper dive under the hood.
#'
#' @param data dataframe (already prepared with `attach_config` and `make_splits`)
#' @param outcome Unquoted name of the outcome variable.
#' @param treatment Unquoted name of the treatment variable.
@@ -276,6 +286,10 @@ produce_plugin_estimates <- function(data, outcome, treatment, ..., .weights = N
#' plugin estimates and pseudo-outcomes and calculates the requested
#' quantities of interest (QoIs).
#'
#' To see an example analysis, read `vignette("experimental_analysis")` in the context
#' of an experiment, `vignette("observational_analysis")` for an observational study, or
#' `vignette("methodological_details")` for a deeper dive under the hood.
#'
#' @param data data frame (already prepared with `attach_config`, `make_splits`,
#' `produce_plugin_estimates` and `construct_pseudo_outcomes`)
#' @param ... Unquoted names of moderators to calculate QoIs for.
11 changes: 9 additions & 2 deletions _pkgdown.yml
@@ -20,7 +20,9 @@ navbar:
reference:
- title: Estimation API
desc: >
Tidy functions for performing an analysis of heterogeneous treatment effects.
Tidy functions for performing an analysis of heterogeneous treatment effects. Once
a configuration has been defined (for example, by the Recipe API), these functions are the
workhorses that perform all estimation.
- contents:
- attach_config
- make_splits
@@ -30,6 +32,7 @@ reference:
- title: Recipe API
desc: >
Tidy functions for configuring a "recipe" for how to estimate heterogeneous treatment effects.
This is the easiest way to get started with setting up a configuration for an HTE analysis.
- contents:
- basic_config
- add_propensity_score_model
@@ -44,6 +47,9 @@
- title: Model Configuration
desc: >
Classes to define the configuration of models to be used in the eventual HTE analysis.
These are the classes that define the underlying configurations used by the Recipe API.
They are most useful for advanced users who want granular control over their analysis;
most users will be better served by the Recipe API.
- subtitle: Base Class
- contents:
- Model_cfg
@@ -70,7 +76,8 @@
These classes configure the overall shape of the HTE analysis: essentially, how
all the various components and models should fit together. They explain what models
should be estimated and how those models should be combined into relevant quantities
of interest.
of interest. These, too, underlie the Recipe API, and should rarely need to be used
directly.
- contents:
- Diagnostics_cfg
- HTE_cfg
1 change: 1 addition & 0 deletions tests/testthat/test-api-pcate.R
@@ -151,6 +151,7 @@ n_rows <- (


test_that("Check results data", {
skip_on_cran()
checkmate::check_character(results$estimand, any.missing = FALSE)
checkmate::check_double(results$estimate, any.missing = FALSE)
checkmate::check_double(results$std_error, any.missing = FALSE)
2 changes: 2 additions & 0 deletions tests/testthat/test-api-repeats.R
@@ -103,6 +103,7 @@ test_that("Estimate QoIs", {
})

test_that("VIMP is valid", {
skip_on_cran()
vimp <- results %>% dplyr::filter(grepl("VIMP", estimand))
vimp_z <- vimp$estimate / vimp$std_error
# expect small p-value for x1 which has actual HTE
@@ -123,6 +124,7 @@ n_rows <- (
)

test_that("Check results data", {
skip_on_cran()
checkmate::check_character(results$estimand, any.missing = FALSE)
checkmate::check_double(results$estimate, any.missing = FALSE)
checkmate::check_double(results$std_error, any.missing = FALSE)
9 changes: 9 additions & 0 deletions tests/testthat/test-pseudooutcomes.R
@@ -93,6 +93,15 @@ test_that("Estimate Plugin Models", {
checkmate::expect_data_frame(data3)
})

test_that("Errors on unknown type", {
expect_error(
construct_pseudo_outcomes(
data3, {{ outcome_variable }}, {{ treatment_variable }}, type = "idk"
),
"Unknown type of pseudo-outcome."
)
})

test_that("Construct DR Pseudo-outcomes", {
data4 <<- construct_pseudo_outcomes(
data3, {{ outcome_variable }}, {{ treatment_variable }}
44 changes: 36 additions & 8 deletions vignettes/methodological_details.Rmd
@@ -22,7 +22,8 @@ abstract: >
This package implements the methods of @kennedy2020optimal and presents them through a tidy-style user-facing API.
The design principles undergirding this package are (1) the APIs should be tidy-friendly, (2) analyses should be
easy to replicate with minor changes, (3) specifying complex ensembles for the nuisance functions should be
straightforward, and (4) sensible diagnostics should be easily accessible.
straightforward, and (4) sensible diagnostics should be easily accessible. Plotting and formatting of the results
are left for the end-user to customize.
---

```{r setup, echo=FALSE}
@@ -41,12 +42,14 @@ set.seed(100)
n <- nrow(penguins)
```

# Introduction
# Summary

This document details the broad strokes of how `tidyhte` constructs estimates of heterogeneous treatment effects. It will highlight a variety of the features of the package and discuss the mathematics which undergird them.
This document details how `tidyhte` constructs estimates of heterogeneous treatment effects. It will highlight a variety of the features of the package and discuss the mathematics which undergird them.

After a brief introduction to the methods of HTE estimation of @kennedy2020optimal, the structure will generally follow the estimation API of `tidyhte`: it will begin by discussing the creation of cross-validation folds, then highlight nuisance function estimation, proceed to the construction of "pseudo-outcomes", a concept from @kennedy2020optimal, and conclude by demonstrating the calculation of a few varieties of Quantities of Interest: the actual statistics which are desired by the end-user.

# Statement of Need

# Preliminaries

## Problem Setting
@@ -68,7 +71,7 @@ Heterogeneous treatment effects are defined as the difference in conditional exp
Under these assumptions, $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$.

For the remainder of this paper, we will use a semi-simulated running example based on the `penguins` dataset of @palmerpenguins.
We will imagine a nutritional intervention for which we wish to measure the causal effect on body mass.
We will imagine a randomly assigned nutritional intervention for which we wish to measure the causal effect on body mass.
We also have a lower variance measurement of the average food consumed per day over an observation period.
Gentoo penguins gain weight from this intervention on average, while Adelie penguins lose weight on average.
The average change in weight for Chinstrap penguins is zero.
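A minimal, self-contained sketch of a simulation in this spirit (hypothetical numbers, not real penguin data and not the package's simulation code):

```{r}
# Hypothetical simulation of the running example: a randomized
# intervention whose effect is +1 for Gentoo, -1 for Adelie,
# and 0 for Chinstrap (arbitrary units).
set.seed(100)
n <- 300
species <- sample(c("Adelie", "Chinstrap", "Gentoo"), n, replace = TRUE)
a <- rbinom(n, 1, 0.5)                        # randomized treatment
tau <- c(Adelie = -1, Chinstrap = 0, Gentoo = 1)[species]
y <- 5 + tau * a + rnorm(n)                   # outcome, e.g. food consumed

# Species-level difference-in-means recovers the signed effects:
sapply(split(seq_len(n), species), function(i) {
  mean(y[i][a[i] == 1]) - mean(y[i][a[i] == 0])
})
```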
@@ -101,13 +104,13 @@
\hat{\tau}_{dr}(x) = \hat{\mathbb{E}}[\hat{\psi}(Z) \mid X = x]
$$

This algorithm is written when estimates of nuisance functions are trained on separate data and, therefore, can be treated as fixed. The easiest way to do this is with a sample-splitting procedure, after which results are averaged across splits.
This algorithm is written assuming estimates of nuisance functions are trained on separate data and, therefore, can be treated as fixed. The easiest way to do this is with a sample-splitting procedure, after which results are averaged across splits.
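The pseudo-outcome construction can be sketched in a few lines of base R. This is a toy illustration under a known propensity score of one half with two-fold cross-fitting, not the package's internal implementation:

```{r}
# Toy cross-fit DR-Learner: nuisance models are fit on one fold,
# used to predict on the other, and psi is regressed on x.
set.seed(1)
n <- 400
x <- rnorm(n)
a <- rbinom(n, 1, 0.5)        # randomized, so pi(x) = 0.5 is known
y <- x + a * x + rnorm(n)     # true tau(x) = x
pi_hat <- rep(0.5, n)

fold <- sample(rep(1:2, length.out = n))
mu0_hat <- mu1_hat <- numeric(n)
df <- data.frame(y = y, x = x, a = a)
for (k in 1:2) {
  train <- fold != k
  test  <- fold == k
  m0 <- lm(y ~ x, data = df[train & df$a == 0, ])
  m1 <- lm(y ~ x, data = df[train & df$a == 1, ])
  mu0_hat[test] <- predict(m0, newdata = df[test, ])
  mu1_hat[test] <- predict(m1, newdata = df[test, ])
}

mu_a <- ifelse(a == 1, mu1_hat, mu0_hat)
psi <- (a - pi_hat) / (pi_hat * (1 - pi_hat)) * (y - mu_a) +
  mu1_hat - mu0_hat
coef(lm(psi ~ x))  # second stage: slope should be near 1
```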

The crucial result, Theorem 2 of @kennedy2020optimal, shows that the error of the second stage regression will match the error of an oracle regression of the true individual treatment effects on covariates $X$ up to a factor which depends on the product of the errors of the two estimated nuisance functions (that is, the errors in $\pi$ and in $\mu$).

When the second-stage regression is simple, like subgroup averages within cells defined by $X$, the DR-Learner inherits unbiasedness and efficiency like an AIPW estimator [@robins1995analysis; @van2003unified; @tsiatis2006semiparametric; @tsiatis2008covariate; @chernozhukov2018double].

The approximation results of @kennedy2020optimal applied to regressions on this transformed outcome allows for a lot of flexibility in quantities of interest that may be estimated. We will use this fact to allow the estimation of a variety of models *as if* they were estimated on the individual causal effects themselves: for example, variable importance measures operate off the assumption that various second-stage regressions will be accurate.
The approximation results of @kennedy2020optimal applied to regressions on this transformed outcome allow for a lot of flexibility in quantities of interest that may be estimated. We will use this fact to allow the estimation of a variety of models *as if* they were estimated on the individual causal effects themselves: for example, variable importance measures operate off the assumption that various second-stage regressions will be accurate.

## Principles

@@ -118,10 +121,24 @@ This design allows for repeating very similar analyses multiple times with very
This might be useful, for example, when the user wishes to explore heterogeneity across a variety of outcomes, or when there are several treatment contrasts of particular interest.
Each of these analyses will tend to share common features: similar classes of models will be included in the ensembles for nuisance estimation, for example.

Also important is that the methods provided support common features of real-world research designs, such as clustered treatment assignment and population weights.

### Clustered data

A common feature of real-world data is that treatment may be clustered.
In other words, if one unit receives treatment, there may be other units who are then also more likely to receive treatment.
A common example of this sort of design might be when villages or townships are assigned to treatment, but measurement occurs at the individual level.
As discussed by @abadie2023should, this design implies that standard errors should be clustered at the level at which treatment was assigned.
The `tidyhte` package supports clustering as a first-class citizen, and all resulting estimates receive statistical uncertainty estimates that properly account for this clustering.
In practice, the end-user simply specifies the individual unit-id when constructing cross-validation splits (`make_splits()`), and all subsequent analyses take this into account.
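The effect of specifying a cluster-level identifier can be sketched as follows (hypothetical column names; `make_splits()` handles this bookkeeping internally):

```{r}
# Sketch: fold assignment at the village level, so that every unit
# in a village lands in the same cross-validation split.
set.seed(2)
dat <- data.frame(village = rep(1:20, each = 5), y = rnorm(100))
fold_of_village <- sample(rep(1:4, length.out = 20))
dat$.split_id <- fold_of_village[dat$village]

# No village straddles two folds:
all(tapply(dat$.split_id, dat$village, function(s) length(unique(s))) == 1)
```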

### Population weights

Another common feature of designs used in practice is that they come from samples that do not perfectly represent the larger populations from which they are drawn.
The most common solution to this problem is to use weights that make the sample more closely resemble the population.
This, too, is supported by `tidyhte`, by simply specifying weights when models are estimated (`produce_plugin_estimates()`).
Downstream analyses will then take weights into account appropriately.
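A sketch of how weights enter a downstream summary (hypothetical data; in `tidyhte` this happens automatically once `.weights` is supplied to `produce_plugin_estimates()`):

```{r}
# Sketch: population-weighted subgroup means of a pseudo-outcome.
set.seed(3)
g   <- sample(c("a", "b"), 200, replace = TRUE)
psi <- rnorm(200, mean = ifelse(g == "a", 1, 0))
w   <- runif(200, 0.5, 2)   # population weights

sapply(split(seq_along(psi), g), function(i) weighted.mean(psi[i], w[i]))
```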

# Recipe API

In order to build up definitions around how HTE should be estimated, `tidyhte` provides an API to progressively build up a configuration object.
@@ -139,9 +156,9 @@ cfg <- basic_config() %>%

Since the subject of interest is an experimental intervention, the propensity score is known.
Using this known propensity score provides unbiasedness for many quantities of interest (although this may leave some efficiency on the table CITE).
In this case, in addition to the (default) linear model included in the SuperLearner ensemble, we add an elastic-net regression.
In this case, in addition to the (default) linear model included in the SuperLearner ensemble, we add an elastic-net regression [@zou2005regularization].
We sweep over a variety of mixing parameters between LASSO and ridge, throwing each of these models into the ensemble.
This means that SuperLearner will perform model selection and averaging to identify the best hyper-parameter values.
This means that SuperLearner will perform model selection and averaging to identify the best hyper-parameter values [@van2007super].
Furthermore, we define all of the moderators of interest and how their results should be collected and displayed.
Discrete moderators will just take stratified averages at each level of the moderator, while continuous moderators will use local polynomial regression [@fan2018local; @calonico2019nprobust].
Finally, we add a variable importance measure from @williamson2021nonparametric.
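As a rough illustration of the continuous-moderator case, the second-stage smoothing can be mimicked with `loess` (a sketch only; the package uses `nprobust`-style local polynomial regression, not `loess`):

```{r}
# Sketch: a CATE curve in a continuous moderator, via local
# regression of a noisy pseudo-outcome on x (true tau(x) = x).
set.seed(4)
x <- runif(300, -2, 2)
psi <- x + rnorm(300, sd = 2)
fit <- loess(psi ~ x, span = 0.75)
grid <- seq(-1.5, 1.5, by = 0.5)
round(predict(fit, newdata = data.frame(x = grid)), 2)
```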
@@ -235,6 +252,10 @@ penguins %<>% construct_pseudo_outcomes(food_consumed_g, treatment)
Finally, it comes to the most glamorous part of the analysis, when effects are estimated and put into charts.

The design of `tidyhte` chooses to leave the charting to the end-user, but merely returns a tidy tibble with all of the requested quantities of interest.
The package focuses on a few types of quantities of interest:
- Marginal Conditional Average Treatment Effects (MCATEs): the standard Conditional Average Treatment Effects (CATEs) in the literature, in which all covariates except one are marginalized over, providing one average effect at each level of a single covariate. This is in contrast to a "Partial" CATE, in which other variables are "controlled for" in some way; the latter lacks a satisfying causal interpretation without assumptions of joint randomization of treatment and covariates.
- Variable Importance (VIMP): Using @williamson2021nonparametric, `tidyhte` calculates how much each moderator contributes to the overall reduction in mean-squared-error in a joint model of the heterogeneous effects.
- Diagnostics: A wide variety of diagnostics are provided for all models fit as part of the `tidyhte` estimation process. These include single-number summaries like mean-squared-error or AUC, as well as entire receiver operating characteristic curves and coefficients in the SuperLearner ensembles.


```{r qoi, results='hide', message=FALSE, warning=FALSE}
@@ -380,7 +401,14 @@ By allowing the user to flexibly compose HTE estimators, it drastically reduces
# Conclusion

This paper has introduced the concepts underlying the `tidyhte` package and given examples as to how the package can be used.
The package is written in a sufficiently flexible way that new features can be added with relatively little work.
For instance, adding new plugin models is as simple as providing some configuration information along with standardized train / predict methods.
The same is true of providing new ways to summarize CATEs for plotting.
This makes it relatively easy for methodologists to plug in their preferred fitting methods and thereby extend `tidyhte`'s functionality.

# Acknowledgements

We gratefully acknowledge the collaboration with the US 2020 Facebook and Instagram Election Project for being a testbed for the initial versions of `tidyhte`, particularly Pablo Barberá at Meta. We received no financial support for this software project.

# References

22 changes: 22 additions & 0 deletions vignettes/refs.bib
@@ -800,4 +800,26 @@ @misc{dwivedi2020stable
eprint={2008.10109},
archivePrefix={arXiv},
primaryClass={stat.ME}
}

@article{abadie2023should,
title={When should you adjust standard errors for clustering?},
author={Abadie, Alberto and Athey, Susan and Imbens, Guido W and Wooldridge, Jeffrey M},
journal={The Quarterly Journal of Economics},
volume={138},
number={1},
pages={1--35},
year={2023},
publisher={Oxford University Press}
}

@article{zou2005regularization,
title={Regularization and variable selection via the elastic net},
author={Zou, Hui and Hastie, Trevor},
journal={Journal of the Royal Statistical Society Series B: Statistical Methodology},
volume={67},
number={2},
pages={301--320},
year={2005},
publisher={Oxford University Press}
}
