diff --git a/R/linear_pool.R b/R/linear_pool.R
index e6eae35..012a64d 100644
--- a/R/linear_pool.R
+++ b/R/linear_pool.R
@@ -3,23 +3,7 @@
 #' each combination of model task, output type, and output type id. Supported
 #' output types include `mean`, `quantile`, `cdf`, and `pmf`.
 #'
-#' @param model_outputs an object of class `model_output_df` with component
-#' model outputs (e.g., predictions).
-#' @param weights an optional `data.frame` with component model weights. If
-#' provided, it should have a column named `model_id` and a column containing
-#' model weights. Optionally, it may contain additional columns corresponding
-#' to task id variables, `output_type`, or `output_type_id`, if weights are
-#' specific to values of those variables. The default is `NULL`, in which case
-#' an equally-weighted ensemble is calculated.
-#' @param weights_col_name `character` string naming the column in `weights`
-#' with model weights. Defaults to `"weight"`
-#' @param model_id `character` string with the identifier to use for the
-#' ensemble model.
-#' @param task_id_cols `character` vector with names of columns in
-#' `model_outputs` that specify modeling tasks. Defaults to `NULL`, in which
-#' case all columns in `model_outputs` other than `"model_id"`, the specified
-#' `output_type_col` and `output_type_id_col`, and `"value"` are used as task
-#' ids.
+#' @inheritParams simple_ensemble
 #' @param n_samples `numeric` that specifies the number of samples to use when
 #' calculating quantiles from an estimated quantile function. Defaults to `1e4`.
 #' @param ... parameters that are passed to `distfromq::make_q_fn`, specifying
@@ -37,14 +21,14 @@
 #' in three steps:
 #' 1. Interpolate and extrapolate from the provided quantiles for each component
 #' model to obtain an estimate of the cdf of that distribution.
-#' 2. Draw samples from the distribution for each component model. To reduce Monte
-#' Carlo variability, we use pseudo-random samples corresponding to quantiles
-#' of the estimated distribution.
+#' 2. Draw samples from the distribution for each component model. To reduce
+#' Monte Carlo variability, we use quasi-random samples corresponding to
+#' quantiles of the estimated distribution.
 #' 3. Collect the samples from all component models and extract the desired quantiles.
 #' Steps 1 and 2 in this process are performed by `distfromq::make_q_fn`.
 #'
-#' @return a `model_out_tbl` object of ensemble predictions. Note that any additional
-#' columns in the input `model_outputs` are dropped.
+#' @return a `model_out_tbl` object of ensemble predictions. Note that any
+#' additional columns in the input `model_outputs` are dropped.
 #'
 #' @export
 #'
diff --git a/R/linear_pool_quantile.R b/R/linear_pool_quantile.R
index f71c942..6dc63b5 100644
--- a/R/linear_pool_quantile.R
+++ b/R/linear_pool_quantile.R
@@ -2,39 +2,9 @@
 #' (distributional mixture) of component model outputs for the `quantile`
 #' output type.
 #'
-#' @param model_outputs an object of class `model_output_df` with component
-#' model outputs (e.g., predictions) with only a `quantile` output type.
-#' Should be pre-validated.
-#' @param weights an optional `data.frame` with component model weights. If
-#' provided, it should have a column named `model_id` and a column containing
-#' model weights. Optionally, it may contain additional columns corresponding
-#' to task id variables, `output_type`, or `output_type_id`, if weights are
-#' specific to values of those variables. The default is `NULL`, in which case
-#' an equally-weighted ensemble is calculated. Should be pre-validated.
-#' @param weights_col_name `character` string naming the column in `weights`
-#' with model weights. Defaults to `"weight"`.
-#' @param model_id `character` string with the identifier to use for the
-#' ensemble model.
-#' @param task_id_cols `character` vector with names of columns in
-#' `model_outputs` that specify modeling tasks. Defaults to `NULL`, in which
-#' case all columns in `model_outputs` other than `"model_id"`, the specified
-#' `output_type_col` and `output_type_id_col`, and `"value"` are used as task
-#' ids. Should be pre-validated.
-#' @param n_samples `numeric` that specifies the number of samples to use when
-#' calculating quantiles from an estimated quantile function. Defaults to `1e4`.
-#' @param ... parameters that are passed to `distfromq::make_q_fun`, specifying
-#' details of how to estimate a quantile function from provided quantile levels
-#' and quantile values for `output_type` `"quantile"`.
+#' @inherit linear_pool params details
 #' @noRd
-#' @details The underlying mechanism for the computations to obtain the quantiles
-#' of a linear pool in three steps is as follows:
-#' 1. Interpolate and extrapolate from the provided quantiles for each component
-#' model to obtain an estimate of the cdf of that distribution.
-#' 2. Draw samples from the distribution for each component model. To reduce Monte
-#' Carlo variability, we use pseudo-random samples corresponding to quantiles
-#' of the estimated distribution.
-#' 3. Collect the samples from all component models and extract the desired quantiles.
-#' Steps 1 and 2 in this process are performed by `distfromq::make_q_fun`.
+#'
 #' @return a `model_out_tbl` object of ensemble predictions for the `quantile` output type.
 #' @importFrom rlang .data
 
diff --git a/R/simple_ensemble.R b/R/simple_ensemble.R
index f4c1fd2..8169ef0 100644
--- a/R/simple_ensemble.R
+++ b/R/simple_ensemble.R
@@ -9,7 +9,7 @@
 #' model weights. Optionally, it may contain additional columns corresponding
 #' to task id variables, `output_type`, or `output_type_id`, if weights are
 #' specific to values of those variables. The default is `NULL`, in which case
-#' an equally-weighted ensemble is calculated.
+#' an equally-weighted ensemble is calculated. Should be pre-validated.
 #' @param weights_col_name `character` string naming the column in `weights`
 #' with model weights. Defaults to `"weight"`
 #' @param agg_fun a function or character string name of a function to use for
@@ -21,9 +21,8 @@
 #' ensemble model.
 #' @param task_id_cols `character` vector with names of columns in
 #' `model_outputs` that specify modeling tasks. Defaults to `NULL`, in which
-#' case all columns in `model_outputs` other than `"model_id"`, the specified
-#' `output_type_col` and `output_type_id_col`, and `"value"` are used as task
-#' ids.
+#' case all columns in `model_outputs` other than `"model_id"`, `"output_type"`,
+#' `"output_type_id"`, and `"value"` are used as task ids.
 #'
 #' @details The default for `agg_fun` is `"mean"`, in which case the ensemble's
 #' output is the average of the component model outputs within each group
diff --git a/man/linear_pool.Rd b/man/linear_pool.Rd
index 1911a36..9dba060 100644
--- a/man/linear_pool.Rd
+++ b/man/linear_pool.Rd
@@ -18,7 +18,7 @@ linear_pool(
 )
 }
 \arguments{
-\item{model_outputs}{an object of class \code{model_output_df} with component
+\item{model_outputs}{an object of class \code{model_out_tbl} with component
 model outputs (e.g., predictions).}
 
 \item{weights}{an optional \code{data.frame} with component model weights. If
@@ -26,7 +26,7 @@ provided, it should have a column named \code{model_id} and a column containing
 model weights. Optionally, it may contain additional columns corresponding
 to task id variables, \code{output_type}, or \code{output_type_id}, if weights are
 specific to values of those variables. The default is \code{NULL}, in which case
-an equally-weighted ensemble is calculated.}
+an equally-weighted ensemble is calculated. Should be pre-validated.}
 
 \item{weights_col_name}{\code{character} string naming the column in \code{weights}
 with model weights. Defaults to \code{"weight"}}
@@ -36,9 +36,8 @@ ensemble model.}
 
 \item{task_id_cols}{\code{character} vector with names of columns in
 \code{model_outputs} that specify modeling tasks. Defaults to \code{NULL}, in which
-case all columns in \code{model_outputs} other than \code{"model_id"}, the specified
-\code{output_type_col} and \code{output_type_id_col}, and \code{"value"} are used as task
-ids.}
+case all columns in \code{model_outputs} other than \code{"model_id"}, \code{"output_type"},
+\code{"output_type_id"}, and \code{"value"} are used as task ids.}
 
 \item{n_samples}{\code{numeric} that specifies the number of samples to use when
 calculating quantiles from an estimated quantile function. Defaults to \code{1e4}.}
@@ -48,8 +47,8 @@ details of how to estimate a quantile function from provided quantile levels
 and quantile values for \code{output_type} \code{"quantile"}.}
 }
 \value{
-a \code{model_out_tbl} object of ensemble predictions. Note that any additional
-columns in the input \code{model_outputs} are dropped.
+a \code{model_out_tbl} object of ensemble predictions. Note that any
+additional columns in the input \code{model_outputs} are dropped.
 }
 \description{
 Compute ensemble model outputs as a linear pool, otherwise known as a
@@ -70,9 +69,9 @@ When the \code{output_type} is \code{quantile}, we obtain the quantiles of a lin
 in three steps:
 1. Interpolate and extrapolate from the provided quantiles for each component
 model to obtain an estimate of the cdf of that distribution.
-2. Draw samples from the distribution for each component model. To reduce Monte
-Carlo variability, we use pseudo-random samples corresponding to quantiles
-of the estimated distribution.
+2. Draw samples from the distribution for each component model. To reduce
+Monte Carlo variability, we use quasi-random samples corresponding to
+quantiles of the estimated distribution.
 3. Collect the samples from all component models and extract the desired quantiles.
 Steps 1 and 2 in this process are performed by \code{distfromq::make_q_fn}.
 }
diff --git a/man/simple_ensemble.Rd b/man/simple_ensemble.Rd
index 3f514cd..b2c78d9 100644
--- a/man/simple_ensemble.Rd
+++ b/man/simple_ensemble.Rd
@@ -25,7 +25,7 @@ provided, it should have a column named \code{model_id} and a column containing
 model weights. Optionally, it may contain additional columns corresponding
 to task id variables, \code{output_type}, or \code{output_type_id}, if weights are
 specific to values of those variables. The default is \code{NULL}, in which case
-an equally-weighted ensemble is calculated.}
+an equally-weighted ensemble is calculated. Should be pre-validated.}
 
 \item{weights_col_name}{\code{character} string naming the column in \code{weights}
 with model weights. Defaults to \code{"weight"}}
@@ -42,9 +42,8 @@ ensemble model.}
 
 \item{task_id_cols}{\code{character} vector with names of columns in
 \code{model_outputs} that specify modeling tasks. Defaults to \code{NULL}, in which
-case all columns in \code{model_outputs} other than \code{"model_id"}, the specified
-\code{output_type_col} and \code{output_type_id_col}, and \code{"value"} are used as task
-ids.}
+case all columns in \code{model_outputs} other than \code{"model_id"}, \code{"output_type"},
+\code{"output_type_id"}, and \code{"value"} are used as task ids.}
 }
 \value{
 a \code{model_out_tbl} object of ensemble predictions. Note that