Skip to content

Part IV Function Reference 1

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 4 revisions

← Part III — Computational Architecture · gdpar Wiki Home · Part IV — Exhaustive Function Reference (2/7) →


Part IV — Exhaustive Function Reference

Documentation of every function in all 44 R source files, generated by the GLM-5.2 and MiMo-V2.5-Pro lineages under a faithful-to-source spec and audited (guaranteed floor on the mathematically dense modules: families, W_basis, geometry_*, dependence_robust, stan_codegen).

R/adapter_econml.R

gdpar_adapter_econml(estimator = "CausalForestDML", n_estimators = 1000L, model_y = NULL, model_t = NULL, seed = NULL)

Purpose Creates and returns a gdpar_meta_learner_adapter object that wraps an EconML estimator via the reticulate package, for use with gdpar_compare_meta_learners. It enables causal inference using the Orthogonal Double/Debiased Machine Learning (DML) framework, specifically the CausalForestDML estimator from the EconML Python package.

Arguments

  • estimator Character scalar. Identifies the EconML estimator to use. Currently only "CausalForestDML" is supported in this package version.
  • n_estimators Integer scalar. The number of trees in the CausalForestDML forest.
  • model_y Optional Python model object. The outcome model used in the first stage of the DML procedure. If NULL, EconML's default (a gradient-boosted tree) is used.
  • model_t Optional Python model object. The treatment model used in the first stage of the DML procedure. If NULL, EconML's default is used.
  • seed Optional integer scalar. A random seed for reproducibility, passed to the EconML estimator's random_state parameter. Must be between 1 and .Machine$integer.max.

Mathematics The CausalForestDML estimator implements the Orthogonal Double Machine Learning (DML) framework for estimating the Conditional Average Treatment Effect (CATE), $\tau(x)$, at a covariate vector $x$. For a binary treatment $T$, outcome $Y$, and covariates $X$, the CATE is defined as: $$\tau(x) = \mathbb{E}[Y(1) - Y(0) | X = x]$$ where $Y(t)$ denotes the potential outcome under treatment $t$. DML proceeds by:

  1. Estimating nuisance parameters $\eta_0$, e.g., the outcome model $\mathbb{E}[Y|X, T]$ and the propensity score $\mathbb{P}(T=1|X)$, using flexible machine learning.
  2. Constructing the pseudo-outcome (the "DML residual"): $\psi(W; \eta) = \left[ \frac{T - \hat{\mathbb{P}}(T=1|X)}{\hat{\mathbb{P}}(T=1|X)(1 - \hat{\mathbb{P}}(T=1|X))} \right] \left( Y - \hat{\mathbb{E}}[Y|X, T] \right)$
  3. Estimating the CATE by regressing $\psi(W; \hat{\eta})$ on $X$ using a causal forest.

Confidence intervals are constructed using the effect_interval method of the fitted EconML estimator, which provides asymptotically valid intervals based on the forest's variance estimation.

Returns A gdpar_meta_learner_adapter list object with the following components:

  • name: "econml"
  • fit_predict_fun: A function that fits the model and returns CATE estimates and confidence intervals.
  • predict_fun: A function that predicts from a fitted model without re-fitting.
  • requires_r: "reticulate"
  • requires_py: "econml"
  • native_ci: TRUE
  • description: A character string summarizing the estimator and its settings.

Notes

  • The Python module econml must be installed in the active reticulate environment. If not found, the function aborts with a gdpar_missing_dependency_error.
  • The state object returned by fit_predict_fun contains a reference to a Python object. This reference is invalidated if the R session is restarted (e.g., after saveRDS and loadRDS). Using predict_fun on such a state will result in a gdpar_unsupported_feature_error.
  • The function validates input arguments using assert_count and assert_numeric_scalar (internal validation functions not shown).
  • The internal .econml_to_matrix function is called to convert covariate data frames to numeric matrices compatible with EconML.

.econml_to_matrix(df, template = NULL)

Purpose Converts a data frame of covariates into a numeric matrix suitable for input to the EconML Python package. It also manages a template to ensure consistent encoding of factor levels between training and prediction data.

Arguments

  • df A data frame or object coercible to a data frame. Contains the covariates.
  • template A list or NULL. If NULL, the function creates a new template from df. If provided, it ensures df is processed to match the template's column structure.

Mathematics The conversion process applies the following transformation to each covariate:

  1. Character columns are converted to factors.
  2. The data frame is converted to a model matrix using the formula ~ . - 1, which expands factor variables into dummy (one-hot) variables without an intercept. For a factor with $k$ levels, this yields $k$ binary indicator columns.

The resulting matrix $X$ is used in the EconML estimator's fit and effect methods.

Returns A numeric matrix with the following attributes:

  • colnames: The column names of the matrix.
  • factor_levels: A list mapping original factor column names to their levels. If the column is not a factor, the entry is NULL.
  • template: The template list (either created or passed in) is attached as an attribute to the matrix. This template is stored in the adapter's state to ensure consistent data processing during prediction.

Notes

  • If template is NULL (i.e., during model fitting), the function requires at least one column in df. Otherwise, it aborts with a gdpar_input_error.
  • If template is provided (i.e., during prediction), the function enforces that all factor variables in the template exist in df and have the same levels. It also ensures the resulting model matrix has exactly the same columns as the template. Incompatibilities (missing or extra columns) result in a gdpar_input_error.
  • The function uses stats::model.matrix for the conversion, which handles factor expansion and removal of intercepts.

R/adapter_grf.R

gdpar_adapter_grf(num_trees = 2000L, sample_fraction = 0.5, mtry = NULL, honesty = TRUE, seed = NULL)

Purpose

Factory that constructs a gdpar_meta_learner_adapter object wrapping the R-side causal forest estimator grf::causal_forest (Athey, Tibshirani, and Wager, 2019) for use with gdpar_compare_meta_learners. The adapter populates both the mandatory fit_predict_fun (fit + predict in one call) and the optional predict_fun (reuse a cached forest on a fresh evaluation grid without refitting). It advertises requires_r = "grf" and native_ci = TRUE because grf's built-in variance estimator supplies confidence intervals via the normal approximation.

Arguments

  • num_trees: Integer scalar; number of trees in the forest. Default 2000L, matching grf's default.
  • sample_fraction: Numeric scalar in $(0, 0.5]$; fraction of the training sample drawn for each tree. Default 0.5.
  • mtry: Optional integer scalar; number of candidate variables per split. Default NULL delegates to grf's own default ($\min(\lceil\sqrt{p} + 20\rceil, p)$).
  • honesty: Logical scalar; whether to use honest splitting. Default TRUE (recommended; grf CIs are valid only under honesty).
  • seed: Optional integer scalar; seed propagated to grf's internal RNG when the comparator's seed_run is NULL. Default NULL.

Mathematics

Native confidence intervals are obtained by the normal approximation

$$ \big[,\widehat\tau(x) - z_{1-\alpha/2}\cdot\sqrt{\widehat{\mathrm{Var}}(\widehat\tau(x))},;; \widehat\tau(x) + z_{1-\alpha/2}\cdot\sqrt{\widehat{\mathrm{Var}}(\widehat\tau(x))},\big], $$

where $\alpha = 1 - \text{level},\ z_{1-\alpha/2} = \Phi^{-1}(1 - (1-\text{level})/2)$, and $\widehat{\mathrm{Var}}(\widehat\tau(x))$ is grf's built-in variance estimate obtained via predict(..., estimate.variance = TRUE). The standard error is clamped at zero: $\mathrm{se}(x) = \sqrt{\max(\widehat{\mathrm{Var}}(\widehat\tau(x)), 0)}$.

Returns

A gdpar_meta_learner_adapter object (constructed via gdpar_meta_learner_adapter) with fields:

  • name = "grf",
  • fit_predict_fun: closure capturing the hyperparameter list hp,
  • predict_fun: closure independent of hp,
  • requires_r = "grf",
  • native_ci = TRUE,
  • description: a string of the form "grf::causal_forest (num_trees = <n>, honesty = <h>) with normal-approximation CIs from estimate.variance.".

Notes

  • Validation sequence:
    1. assert_count(num_trees, "num_trees").
    2. assert_numeric_scalar(sample_fraction, "sample_fraction", lower = 1e-3, upper = 0.5).
    3. If mtry is non-NULL, assert_count(mtry, "mtry").
    4. honesty must be a length-1 non-NA logical; otherwise gdpar_abort raises a gdpar_input_error with message "Argument 'honesty' must be a non-NA logical scalar.".
    5. If seed is non-NULL, assert_numeric_scalar(seed, "seed", lower = 1, upper = .Machine$integer.max).
  • Hyperparameters are coerced and stored in a list hp: num_treesinteger, sample_fractionnumeric, mtryinteger (or NULL), honestylogical, seedinteger (or NULL).
  • The fit_predict_fun and predict_fun closures are defined locally and passed to gdpar_meta_learner_adapter; see their dedicated subsections below.
  • Side effects: none beyond construction of the adapter object. The closures themselves perform fitting/prediction only when invoked.

fit_predict_fun(X, Y, T, X_newdata, level, seed_run) (closure defined inside gdpar_adapter_grf)

Purpose

Mandatory adapter entry point: fit a grf::causal_forest on the training triple (X, Y, T) and immediately predict the conditional average treatment effect (CATE) on X_newdata together with native normal-approximation confidence intervals at confidence level level. Returns both the predictions and a state object that allows predict_fun to reuse the fitted forest.

Arguments

  • X: Covariate data (data frame or coercible) for the training sample; passed to .grf_to_matrix.
  • Y: Numeric outcome vector; coerced via as.numeric(Y) and passed as Y to grf::causal_forest.
  • T: Numeric treatment indicator vector; coerced via as.numeric(T) and passed as W to grf::causal_forest.
  • X_newdata: Covariate data for prediction; converted via .grf_to_matrix(X_newdata, template = attr(X_mat, "template")) so its design aligns with the training design.
  • level: Numeric scalar in $(0, 1)$; confidence level for the CIs.
  • seed_run: Optional integer scalar; seed supplied by the comparator. If non-NULL it overrides hp$seed.

Mathematics

The causal forest is fit by do.call(grf::causal_forest, args) with args = list(X = X_mat, Y = as.numeric(Y), W = as.numeric(T), num.trees = hp$num_trees, sample.fraction = hp$sample_fraction, honesty = hp$honesty), conditionally augmented with mtry (if hp$mtry non-NULL) and seed (if eff_seed non-NULL). Predictions use estimate.variance = TRUE, yielding $\widehat\tau(x)$ and $\widehat{\mathrm{Var}}(\widehat\tau(x))$. Then

$$ z = \Phi^{-1}!\left(1 - \frac{1 - \text{level}}{2}\right), \qquad \mathrm{se}(x) = \sqrt{\max!\big(\widehat{\mathrm{Var}}(\widehat\tau(x)),, 0\big)}, $$

$$ \text{CI}(x) = \big[,\widehat\tau(x) - z,\mathrm{se}(x),;; \widehat\tau(x) + z,\mathrm{se}(x),\big]. $$

Returns

A list with elements:

  • cate_mean: numeric vector of point predictions as.numeric(pred$predictions).
  • cate_ci: numeric matrix with columns lower and upper, row-aligned with cate_mean.
  • state: list with forest (the fitted causal_forest object cf) and template (the column template extracted from attr(X_mat, "template")).
  • notes: character(0L) (empty).

Notes

  • Calls require_suggested("grf", "fit gdpar_adapter_grf") to ensure the suggested package is available; this is expected to error if grf is not installed.
  • eff_seed is resolved as seed_run (coerced to integer) when non-NULL, otherwise falls back to hp$seed.
  • The variance estimates are clamped at zero via pmax(as.numeric(pred$variance.estimates), 0) before taking the square root, guarding against tiny negative numerical artifacts.
  • The state$forest object retains grf's internal RNG state and trained trees; it is intended to be passed back to predict_fun.
  • Side effects: triggers grf::causal_forest fitting (RNG consumption if eff_seed is set) and a prediction call.

predict_fun(state, X_newdata, level) (closure defined inside gdpar_adapter_grf)

Purpose

Optional adapter entry point: reuse a previously fitted causal forest stored in state to predict CATEs (with native CIs) on a fresh evaluation grid X_newdata, without refitting.

Arguments

  • state: List previously produced by fit_predict_fun, expected to contain forest (a fitted grf::causal_forest object) and template (column template for design alignment).
  • X_newdata: Covariate data for prediction; converted via .grf_to_matrix(X_newdata, template = state$template).
  • level: Numeric scalar in $(0, 1)$; confidence level for the CIs.

Mathematics

Identical normal-approximation CI construction as in fit_predict_fun:

$$ z = \Phi^{-1}!\left(1 - \frac{1 - \text{level}}{2}\right), \qquad \mathrm{se}(x) = \sqrt{\max!\big(\widehat{\mathrm{Var}}(\widehat\tau(x)),, 0\big)}, $$

$$ \text{CI}(x) = \big[,\widehat\tau(x) - z,\mathrm{se}(x),;; \widehat\tau(x) + z,\mathrm{se}(x),\big]. $$

Returns

A list with elements:

  • cate_mean: numeric vector of point predictions.
  • cate_ci: numeric matrix with columns lower and upper.

(No state or notes elements are returned, unlike fit_predict_fun.)

Notes

  • Dependency check is performed inline via requireNamespace("grf", quietly = TRUE); on failure, gdpar_abort raises a gdpar_missing_dependency_error with data = list(package = "grf") and message "Package 'grf' is required to reuse a cached causal_forest state.".
  • If state is NULL or state$forest is NULL, gdpar_abort raises a gdpar_internal_error with message "Cached state for the grf adapter is empty; refit before predicting.".
  • Uses stats::predict (not grf:::predict.causal_forest directly) so S3 dispatch resolves the method.
  • Variance estimates are clamped at zero via pmax(..., 0) before the square root.
  • Side effects: triggers a grf prediction call (no refitting, no RNG consumption).

.grf_to_matrix(df, template = NULL)

Purpose

Internal helper that converts a covariate data frame into a fully numeric design matrix suitable for grf::causal_forest. Character columns are coerced to factors, factors are expanded via model.matrix(~ . - 1, ...), and numeric columns pass through unchanged. A template attribute records the column structure of the first call so subsequent calls on X_newdata align identically; the function aborts when a new column appears or a previously observed factor level is missing.

Arguments

  • df: A data frame, or an object coercible via as.data.frame(df, stringsAsFactors = FALSE). Contains the covariates.
  • template: Optional list with elements colnames (character vector of expected design-matrix column names) and factor_levels (named list mapping original data-frame column names to either their factor levels or NULL for non-factor columns). If NULL, a new template is built from df.

Mathematics

The design matrix is constructed as

$$ \mathbf{X}_{\text{mat}} = \text{model.matrix}(\sim, . - 1,; \text{data} = \text{df}), $$

which expands each $K$-level factor into $K$ indicator columns (no intercept). When a template is supplied, factor columns are first re-leveled to the template's levels via factor(df[[j]], levels = template$factor_levels[[j]]), the resulting matrix is checked for set-equality of column names against template$colnames, and finally reordered as mm[, template$colnames, drop = FALSE].

Returns

A numeric matrix with attribute "template" (a list with colnames and factor_levels). When template = NULL on input, the returned template is freshly built from df; otherwise the input template is attached unchanged to the returned matrix.

Notes

  • If df is not a data frame, it is coerced via as.data.frame(df, stringsAsFactors = FALSE).
  • A for loop over seq_along(df) coerces any character column to factor via as.factor.
  • Template-NULL branch:
    • If ncol(df) == 0L, gdpar_abort raises a gdpar_input_error with message "gdpar_adapter_grf requires at least one covariate; received a 0-column data frame.".
    • template$factor_levels is built by lapply(df, function(col) if (is.factor(col)) levels(col) else NULL), so non-factor columns map to NULL.
  • Template-non-NULL branch:
    • For each name j in names(template$factor_levels) whose entry is non-NULL (i.e., a factor column in the original training data):
      • If j is not in colnames(df), gdpar_abort raises a gdpar_input_error with message "Covariate '<j>' missing from newdata for the grf adapter." and data = list(missing = j).
      • Otherwise df[[j]] is re-factored with factor(df[[j]], levels = template$factor_levels[[j]]); this silently drops unseen levels and introduces NA for values not in the template levels.
    • After building mm, if !setequal(colnames(mm), template$colnames):
      • missing_cols <- setdiff(template$colnames, colnames(mm)).
      • extra_cols <- setdiff(colnames(mm), template$colnames).
      • gdpar_abort raises a gdpar_input_error with a formatted message listing missing and extra columns (using "<none>" when a set is empty) and data = list(missing = missing_cols, extra = extra_cols).
    • On success, columns are reordered to exactly match template$colnames via mm[, template$colnames, drop = FALSE].
  • The template attribute is set on the returned matrix via attr(mm, "template") <- template, enabling chained calls (e.g., fit_predict_fun reads attr(X_mat, "template") and passes it to the X_newdata conversion).
  • No S3 dispatch; this is a plain internal function.

R/amm_build.R

amm_build(p = 1L)

Purpose Initialises a chainable builder object of class amm_builder that serves as an incremental specification container for an Adaptive Moderated Model (AMM). The builder is a programmatic alternative to the single-call amm_spec() constructor; it accumulates per-dimension additive (a) and multiplicative (b) basis formulas, a global modulating basis W, and optional covariate names x_vars, and is ultimately converted into an amm_spec via as_amm_spec().

Arguments

Argument Type Meaning
p Scalar positive integer (coerced to integer) Dimension of the per-individual parameter vector $\theta_i \in \mathbb{R}^p$. Defaults to 1L (scalar/ univariate path). Must satisfy p ≥ 1.

Mathematics None. The function performs only object construction.

Returns An S3 object of class c("amm_builder", "list") with four components:

  • pinteger, the value of the p argument.
  • dims — a dims_spec object (from dimwise(a = NULL, b = NULL)) representing per-dimension additive/multiplicative basis specifications with a NULL base and no overrides.
  • WNULL (no modulating basis set yet).
  • x_varsNULL (no covariate names set yet).

Notes

  • p is validated by assert_count(p, "p"), which enforces a single positive integer value. Non-integer numerics are silently coerced to integer via as.integer(p).
  • The resulting builder is intended to be mutated in-place through successive amm_set_*() calls and finalised by as_amm_spec(). Despite the pipe-friendly API, the builder is a plain list and mutations rely on R's copy-on-modify semantics.
  • When p = 1L, finalisation through as_amm_spec() resolves the embedded dims_spec to a scalar entry and invokes the scalar AMM path; when p > 1L, the dims_spec is forwarded directly to the multivariate path. This bifurcation happens at finalisation time, not at build time.

amm_set_a_uniform(builder, a)

Purpose Replaces the base (uniform) additive basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension $k = 1, \dots, p$ that does not carry an explicit per-dimension override (set via amm_set_a()). Passing NULL disables the additive component on the base layer.

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
a One-sided formula (~ ...) or NULL The new base additive basis. NULL means no additive component on the base layer.

Returns The mutated amm_builder object (returned invisibly for pipe compatibility). The mutation is to builder$dims$base$a.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • a is validated by assert_one_sided_formula(a, "a", allow_null = TRUE), which enforces a one-sided formula or NULL.
  • The mutation builder$dims$base$a <- a directly overwrites the base additive slot. Any per-dimension overrides previously registered via amm_set_a() are preserved because they are stored separately in the dims_spec override layer (see override()).

amm_set_b_uniform(builder, b)

Purpose Replaces the base (uniform) multiplicative basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension $k = 1, \dots, p$ that does not carry an explicit per-dimension override (set via amm_set_b()). Passing NULL disables the multiplicative component on the base layer.

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
b One-sided formula (~ ...) or NULL The new base multiplicative basis. NULL means no multiplicative component on the base layer.

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$dims$base$b.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • b is validated by assert_one_sided_formula(b, "b", allow_null = TRUE).
  • Per-dimension overrides previously registered via amm_set_b() are preserved; only the base layer is changed.

amm_set_a(builder, k, a)

Purpose Registers a per-dimension override of the additive component for a specific dimension index $k \in \{1, \dots, p\}$. This override takes precedence over the uniform base (set by amm_set_a_uniform()) at index $k$ only. Calling this function twice with the same k replaces the previous override for that dimension.

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
k Positive integer in 1:p The dimension index to override. Validated against builder$p as the upper bound.
a One-sided formula (~ ...) or NULL The override additive basis for dimension $k$. NULL disables the additive component for that dimension.

Returns The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, a = a).

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • k is validated by assert_count(k, "k", max = builder$p), which enforces a single positive integer $\leq p$.
  • a is validated by assert_one_sided_formula(a, "a", allow_null = TRUE).
  • The override is applied via override(), which layers a per-dimension specification on top of the existing dims_spec. Overrides survive subsequent uniform changes (e.g., a later amm_set_a_uniform() call will not erase overrides registered here).

amm_set_b(builder, k, b)

Purpose Registers a per-dimension override of the multiplicative component for a specific dimension index $k \in \{1, \dots, p\}$. This override takes precedence over the uniform base (set by amm_set_b_uniform()) at index $k$ only. Calling this function twice with the same k replaces the previous override for that dimension.

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
k Positive integer in 1:p The dimension index to override. Validated against builder$p as the upper bound.
b One-sided formula (~ ...) or NULL The override multiplicative basis for dimension $k$. NULL disables the multiplicative component for that dimension.

Returns The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, b = b).

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • k is validated by assert_count(k, "k", max = builder$p).
  • b is validated by assert_one_sided_formula(b, "b", allow_null = TRUE).
  • The override is applied via override(), which layers a per-dimension specification on top of the existing dims_spec. Overrides survive subsequent uniform changes.

amm_set_W(builder, W)

Purpose Stores a W_basis object as the global modulating basis of the specification under construction. The modulating component $W(\theta_i)$ is a function of $\theta_i$ shared across all dimensions and enters the response model as a linear factor multiplying a covariate vector $x$:

$$\text{response} \sim W(\theta_i), x$$

Passing NULL disables the modulating component.

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
W W_basis object or NULL The modulating basis to store. NULL clears any previously set basis.

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$W.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • If W is non-NULL, it is validated by assert_inherits(W, "W_basis", "W").
  • The modulating component is global to all dimensions of $\theta_i$; there is no per-dimension setter for $W$. Declaring $W$ per-dimension would restrict the model to the separable sub-class, which is rejected by construction in the package design.
  • W is stored as a single top-level slot of the builder, not inside dims.

amm_set_x_vars(builder, x_vars)

Purpose Records a character vector of covariate names that enter the modulating component as the linear factor $x$ in $W(\theta_i)\, x$, or clears it by passing NULL. The recorded value is forwarded to amm_spec() at finalisation time. When NULL, the package derives covariate names from the right-hand side of the model formula passed to gdpar().

Arguments

Argument Type Meaning
builder amm_builder object The builder to modify.
x_vars Character vector (length $\geq 1$) or NULL Names of the covariates used in the modulating component. NULL defers covariate identification to gdpar().

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$x_vars.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • If x_vars is non-NULL, the function performs a manual validation check: it must be is.character(x_vars) and length(x_vars) >= 1L. On failure, gdpar_abort() is called with class "gdpar_input_error" and a data list containing the argument name and the received value.
  • This is the only setter in the builder suite that performs its own argument validation via gdpar_abort() rather than delegating to an assert_* helper, because the validation logic (non-empty character vector or NULL) does not match any existing assertion primitive.

Note: The roxygen block for as_amm_spec(builder) begins at the end of this section but its function body is in section 2 of 2. It is therefore not documented here.

as_amm_spec(builder)

Purpose
Converts an amm_builder object into a finalized amm_spec specification object. This is the constructor that finalizes the builder's accumulated configuration, resolving any pending dimension specifications and handling the special case of a univariate AMM (p=1).

Arguments

  • builder: An object of class amm_builder containing the accumulated model configuration (e.g., dimensions dims, basis matrix W, predictor variables x_vars, and AR order p).

Returns
An object of class amm_spec representing the fully specified AMM model.

Notes

  • The function first asserts that builder is indeed an amm_builder object.
  • For the univariate case (p == 1), it explicitly resolves the dims specification to extract the scalar anchor parameters a and b using resolve_dims_spec. The resulting amm_spec object will contain these scalar values.
  • For the multivariate case (p > 1), the dims list is passed directly to the amm_spec constructor without immediate resolution.
  • The amm_spec object is constructed by calling the amm_spec() constructor (presumably defined elsewhere) with the relevant components from the builder.

print.amm_builder(x, ...)

Purpose
An S3 print method for objects of class amm_builder. It provides a human-readable summary of the builder's current configuration, including base dimensions, any override specifications for higher-order terms, the basis structure, and predictor variables.

Arguments

  • x: An object of class amm_builder to be printed.
  • ...: Additional arguments (unused, but required for S3 method compatibility).

Returns
Invisibly returns the input object x.

Notes

  • This is an exported S3 method.
  • The output is printed to the console via cat().
  • The summary includes:
    • The AR order p.
    • Base anchor parameters a and b. These are printed as NULL if they have not been set.
    • Any override specifications for dimensions of higher-order terms (k > 0). Overrides are printed if they exist, otherwise <none> is shown.
    • The basis matrix specification W. If present, it is printed as W_basis(type = '<type>').
    • The predictor variable names x_vars. If NULL, it notes they are inherited from the gdpar() formula.
  • The function handles NULL values gracefully by printing "NULL" instead of attempting to print NULL directly.
  • The overrides are printed in ascending order of the integer key k.

R/amm_serialize.R

amm_save_spec(spec, path)

Purpose

Serializes an amm_spec object (the constructor-input representation of an AMM specification) into a canonical, human-readable plain-text file. The format is designed for version control, archival, and bit-exact reproducibility. The file is intended to be round-tripped by amm_load_spec, which parses it with a dedicated lexical parser (no source/eval), making the serialized form safe to load from untrusted locations.

Arguments

  • spec: Object of class amm_spec. The specification to serialize. Only the constructor inputs are recorded; any materialized state (e.g., a W_basis materialized at a specific $\theta_{\mathrm{ref}}$ via the internal materialize_W_basis) is deliberately not written, so that the reconstructed object is the unmaterialized form normally produced by amm_spec.
  • path: Character scalar. The destination file path. Must be a single non-empty string.

Returns

Invisibly returns path (the character scalar that was passed in), after writing the canonical text representation to that file via writeLines.

Notes

Validation and errors:

  • spec is checked with assert_inherits(spec, "amm_spec", "spec"); a failure raises an error (dispatched by assert_inherits).
  • path is validated explicitly: it must satisfy is.character(path), length(path) == 1L, and nzchar(path). If any of these fail, the function aborts via gdpar_abort with class "gdpar_input_error" and a data list containing argument = "path" and received = path.
  • If spec[["W"]] is non-NULL and identical(W[["type"]], "user"), the function aborts with a gdpar_input_error (no data field). User-defined W_basis objects cannot be canonized because the evaluator is an arbitrary R function.

Serialization logic (line-by-line construction):

  1. The package version is obtained from utils::packageVersion("gdpar") and emitted as the mandatory header line:
    # gdpar_spec_version: <version>
    
  2. The dimension count is emitted as p: <spec[["p"]]>.
  3. Scalar path (spec[["p"]] == 1L, checked with isTRUE): the a and b one-sided formulas are serialized via the internal helper .serialize_formula and emitted as a: <literal> and b: <literal>.
  4. Multivariate path (spec[["p"]] > 1L): a and b are both written as the literal string NULL (the per-dimension formulas are emitted separately below).
  5. x_vars is serialized via .serialize_char_vec and emitted as x_vars: <literal> (handles NULL and character vectors).
  6. W block:
    • If W is NULL: emits W.type: NULL.
    • If W[["type"]] is "polynomial": emits W.type: polynomial followed by W.degree: <as.integer(W[["degree"]])>.
    • If W[["type"]] is "bspline": emits W.type: bspline, then W.degree: <as.integer(W[["degree"]])>. Additionally, if W[["knots"]] is non-NULL, emits W.knots: <.serialize_num_vec(W[["knots"]])>; if W[["df"]] is non-NULL, emits W.df: <as.integer(W[["df"]])>. Either, both, or neither of W.knots/W.df may appear depending on which fields are populated.
  7. Multivariate per-dimension entries (spec[["p"]] > 1L, checked with isTRUE): for each k in seq_len(spec[["p"]]), the function retrieves entry <- spec[["dims"]][[k]] and emits two lines:
    dims.<k>.a: <.serialize_formula(entry[["a"]])>
    dims.<k>.b: <.serialize_formula(entry[["b"]])>
    
    The index k is interpolated directly into the key name, producing keys such as dims.1.a, dims.1.b, dims.2.a, etc.

Side effects: Writes a text file to path via writeLines(lines, con = path). The file is overwritten if it exists.

Version policy: The version header records the running package version exactly. The loader (amm_load_spec) checks this strictly; until the first stable release, any mismatch is treated as an error.

Format grammar summary (as emitted by this function):

Key Value grammar
# gdpar_spec_version: Package version string (header line)
p Positive integer
a, b NULL or one-sided formula literal (e.g. ~ x1 + x2)
x_vars NULL or c("x1", "x2", ...)
W.type NULL, polynomial, or bspline
W.degree Positive integer (present for polynomial and bspline)
W.knots c(...) of numerics (present only for bspline with interior knots)
W.df Positive integer (present only for bspline with df)
dims.K.a, dims.K.b Same grammar as a, b, for K in 1:p (emitted only when p > 1)

amm_load_spec(path)

Purpose

Reads a canonical gdpar specification file from disk and parses it into an amm_spec object. The function is the deserialization counterpart to the package's spec writer: it enforces a strict, line-oriented key: value grammar, validates a version header against the currently loaded package version, and dispatches to amm_spec() either in the univariate (p = 1) form (with scalar a/b) or the multivariate (p > 1) form (with a dims list of per-dimension a/b pairs).

Arguments

  • path : character scalar (length 1, non-empty). Filesystem path to a canonical gdpar spec file. Must exist on disk.

Mathematics

The parser implements a deterministic finite scan over the file's lines. Let $L = (l_1, \dots, l_n)$ be the raw lines. Define the comment predicate $C(l) \equiv \mathrm{trimws}(l) \text{ starts with } \texttt{\#}$ and the blank predicate $B(l) \equiv \mathrm{nzchar}(\mathrm{trimws}(l)) = \mathrm{FALSE}$. For each non-skipped line $l_i$, the split position is

$$ \mathrm{pos}_i = \mathrm{regexpr}(\texttt{":"}, l_i, \mathrm{fixed}=\mathrm{TRUE}), $$

and the key/value pair is

$$ k_i = \mathrm{trimws}(\mathrm{substr}(l_i, 1, \mathrm{pos}_i - 1)), \qquad v_i = \mathrm{trimws}(\mathrm{substr}(l_i, \mathrm{pos}_i + 1, |l_i|)). $$

The recognised key set is

$$ \mathcal{K} = {\texttt{p}, \texttt{a}, \texttt{b}, \texttt{x_vars}, \texttt{W.type}, \texttt{W.degree}, \texttt{W.knots}, \texttt{W.df}} \cup {\texttt{dims.}k\texttt{.}{a,b} : k \in \mathbb{Z}_{\geq 1}}. $$

For $p = 1$, the admissible record set excludes any dims.*.* key; for $p &gt; 1$, the admissible dims indices are exactly $\{1, \dots, p\}$ and the scalar a/b keys must be absent (i.e. parse to NULL).

Returns

An amm_spec object constructed by amm_spec():

  • When p == 1L: amm_spec(a = a_scalar, b = b_scalar, W = W, x_vars = x_vars, p = 1L), where a_scalar and b_scalar are parsed formula objects (or NULL) and W is the parsed weight-basis specification returned by .parse_W_records().
  • When p > 1L: amm_spec(W = W, x_vars = x_vars, p = p_val, dims = dims_list), where dims_list is a list of length p_val whose $k$-th element is list(a = a_k, b = b_k) with a_k, b_k parsed formula objects.

Notes

  • All validation failures are raised via gdpar_abort() with class "gdpar_input_error"; condition data payloads are attached where contextual (e.g. argument, received, path, line, raw, key, file_version, package_version).
  • Input validation of path: rejects non-character, non-scalar, or empty-string inputs before any I/O.
  • Version header: the file must contain a line matching the regex ^\s*#\s*gdpar_spec_version\s*:. The extracted (whitespace-trimmed) value must be identical() to as.character(utils::packageVersion("gdpar")); any mismatch aborts, citing bit-exact reproducibility concerns across development releases.
  • Comment and blank lines are skipped during record parsing; only the first : (fixed-match) on each remaining line is used as the key/value separator.
  • Duplicate-key detection is performed against an accumulating records list; the error message reports both the current line and the line of the first occurrence.
  • Unknown-key rejection: any key not in recognised_prefixes and not matching ^dims\.[0-9]+\.[ab]$ aborts.
  • Required-key enforcement: p, a, b are mandatory for the univariate branch; for the multivariate branch, a/b must be NULL (i.e. .parse_formula() returned NULL) and dims.K.a / dims.K.b are required for every $k \in \{1, \dots, p\}$.
  • p is parsed via .parse_int() and must satisfy $p \geq 1$.
  • x_vars is optional; when absent it is passed as NULL.
  • The W block is delegated entirely to .parse_W_records(records).
  • Multivariate consistency: when p == 1L, any dims.* key triggers an abort listing the offending keys (via sQuote). When p > 1L, after building dims_list, the function recomputes the set of dims. keys and subtracts those matching ^dims\.[1-(p-1)]\.[ab]$|^dims\.p\.[ab]$ (i.e. the regex ^dims\.[1-%d]\.[ab]$|^dims\.%d\.[ab]$ with p_val - 1L and p_val substituted); any remainder is parsed for its integer index and aborted if the index is NA, less than 1, or greater than p_val, or if the key does not match the dims.K.{a,b} shape at all.
  • Side effects: reads from the filesystem via readLines(path, warn = FALSE); performs no writes.
  • No S3 dispatch is performed inside this function; the returned object's class is determined by amm_spec().

Serialization and Parsing Helpers (Formula, Vectors, Integers, W Records)

.serialize_formula(f)

Purpose

Serializes an R formula object into a single-line character string suitable for writing into the canonical text file format used by gdpar. This is the inverse of .parse_formula.

Arguments

  • f: formula or NULL. The formula to serialize. May be NULL, in which case the literal string "NULL" is produced.

Mathematics

No numerical formula. The transformation is:

$$ \text{serialize}(f) = \begin{cases} \texttt{"NULL"} & \text{if } f = \texttt{NULL}, \\ \texttt{paste}(\texttt{deparse}(f,, \text{width.cutoff}=500),, \text{collapse}=\texttt{" "}) & \text{otherwise.} \end{cases} $$

Returns

A length-one character string. When f is NULL, returns "NULL". Otherwise returns the deparsed formula collapsed onto a single line with elements separated by single spaces.

Notes

  • Uses deparse(..., width.cutoff = 500L) to reduce line-wrapping in the deparsed output, then collapses any resulting multi-element character vector with spaces.
  • No validation is performed on the structure of f (e.g., one-sided vs. two-sided); that is the responsibility of the parser on read-back.

.parse_formula(value, key, line_no)

Purpose

Parses a character string back into an R formula object, enforcing that it is either NULL or a one-sided formula beginning with ~. Used when reading canonical configuration files.

Arguments

  • value: character(1). The raw string read from the file for the given key.
  • key: character(1). The configuration key name, used in error messages.
  • line_no: integer(1) or numeric. The line number in the source file where value appeared, used in error messages.

Mathematics

No numerical formula. The validation logic is:

$$ \text{parse}(v) = \begin{cases} \texttt{NULL} & \text{if } \texttt{trimws}(v) = \texttt{"NULL"}, \\ \texttt{as.formula}(v) & \text{if } v \text{ starts with } \sim \text{ and } |\texttt{as.formula}(v)| = 2, \\ \text{error} & \text{otherwise.} \end{cases} $$

Returns

NULL or a one-sided formula object of length 2 (i.e., ~ rhs).

Notes

  • Trims whitespace from value before any comparison.
  • If the trimmed value is exactly "NULL", returns NULL immediately.
  • If the value does not start with "~", raises an error of class "gdpar_input_error" with data = list(key = key, value = value, line = line_no).
  • Wraps stats::as.formula(value) in tryCatch; on parse failure, raises a "gdpar_input_error" (without the data field).
  • After successful parsing, checks length(out) != 2L; a two-sided formula (length 3) triggers a "gdpar_input_error".
  • All errors are raised via gdpar_abort.

.serialize_char_vec(x)

Purpose

Serializes a character vector (or NULL) into a c(...) literal string with each element double-quoted, suitable for the canonical file format. This is the inverse of .parse_char_vec.

Arguments

  • x: character vector or NULL. The vector to serialize.

Mathematics

$$ \text{serialize}(x) = \begin{cases} \texttt{"NULL"} & \text{if } x = \texttt{NULL}, \\ \texttt{c("}x_1\texttt{", "}x_2\texttt{", }\ldots\texttt{")} & \text{otherwise,} \end{cases} $$

subject to the constraint that no element of $x$ contains ", \, \n, or \r.

Returns

A length-one character string. "NULL" for NULL input; otherwise a c(...) literal with quoted entries.

Notes

  • Before serialization, checks every element of x for the regex pattern ["\\\n\r] (double quote, backslash, newline, carriage return). If any match is found, raises a "gdpar_input_error" via gdpar_abort with the message indicating that double quotes, backslashes, or newlines are not permitted.
  • Each element is formatted via sprintf("\"%s\"", x_i).
  • Elements are joined with ", ".
  • An empty (length-zero) non-NULL character vector produces the string c().

.parse_char_vec(value, key, line_no)

Purpose

Parses a c(...) literal string back into a character vector, or recognizes "NULL". Inverse of .serialize_char_vec.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

The extraction proceeds as:

  1. Trim value; if equal to "NULL", return NULL.
  2. Require the regex ^c\(.*\)$; otherwise error.
  3. Extract the inner content: inner = sub("^c\((.*)\)$", "\1", v).
  4. Find all quoted tokens via the regex "([^"]*)" on inner.
  5. Strip the surrounding double quotes from each match.

Returns

NULL, character(0), or a character vector of the parsed quoted tokens.

Notes

  • If the trimmed value is "NULL", returns NULL.
  • If the value does not match ^c\(.*\)$, raises a "gdpar_input_error".
  • After extracting inner, if there are zero regex matches:
    • If inner is empty or whitespace-only (!nzchar(trimws(inner))), returns character(0).
    • Otherwise raises a "gdpar_input_error" indicating quoted tokens were expected.
  • Quoted tokens are matched with the regex "([^"]*)" and then the double quotes are removed via gsub("\"", "", matches, fixed = TRUE).
  • The parser does not handle escaped quotes inside tokens; this is consistent with the serializer's prohibition on double quotes within elements.

.serialize_num_vec(x)

Purpose

Serializes a numeric vector (or NULL) into a c(...) literal string using high-precision formatting. Inverse of .parse_num_vec.

Arguments

  • x: numeric vector or NULL. The vector to serialize.

Mathematics

Each element is formatted with 17 significant digits:

$$ x_i \mapsto \texttt{sprintf}(\texttt{"%.17g"},, x_i), $$

then joined as $\texttt{c(}x_1\texttt{, }x_2\texttt{, }\ldots\texttt{)}$.

Returns

A length-one character string:

  • "NULL" if x is NULL.
  • "c()" if x has length 0.
  • Otherwise a c(...) literal with %.17g-formatted numbers.

Notes

  • The %.17g format provides enough precision to round-trip IEEE-754 double-precision values in most cases.
  • No validation is performed on finiteness or NA/NaN values; sprintf("%.17g", NA) yields "NA", and sprintf("%.17g", Inf) yields "Inf", which would fail on re-parse via .parse_num_vec (since as.numeric("NA") is NA and triggers the non-numeric error, and Inf would parse but is not checked here).

.parse_num_vec(value, key, line_no)

Purpose

Parses a c(...) literal string back into a numeric vector, or recognizes "NULL". Inverse of .serialize_num_vec.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

  1. Trim value; if "NULL", return NULL.
  2. Require ^c\(.*\)$; otherwise error.
  3. Extract inner: inner = trimws(sub("^c\((.*)\)$", "\1", v)).
  4. If inner is empty, return numeric(0).
  5. Split on , (fixed), trim each part, coerce with as.numeric.
  6. If any result is NA, error.

Returns

NULL, numeric(0), or a numeric vector.

Notes

  • Uses suppressWarnings(as.numeric(parts)) to silence coercion warnings; the presence of NA in the result is then checked explicitly.
  • If any token fails to coerce (producing NA), raises a "gdpar_input_error" via gdpar_abort.
  • Splitting is done with strsplit(inner, ",", fixed = TRUE), so commas inside quoted strings would cause issues — but numeric vectors do not contain quoted strings, so this is safe.
  • Note that Inf, -Inf, and NaN would parse successfully via as.numeric and pass the is.na check for Inf (since is.na(Inf) is FALSE), but NaN would fail since is.na(NaN) is TRUE.

.parse_int(value, key, line_no)

Purpose

Parses a character string into a single integer value, with strict validation.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

$$ v = \texttt{as.numeric}(\texttt{trimws}(\text{value})), $$

then validate:

$$ \text{valid} \iff \neg\texttt{is.na}(v) ;\wedge; \texttt{is.finite}(v) ;\wedge; v = \texttt{as.integer}(v). $$

If valid, return $\texttt{as.integer}(v)$.

Returns

A length-one integer.

Notes

  • Uses suppressWarnings(as.numeric(value)) to silence coercion warnings.
  • The check v != as.integer(v) rejects non-integer-valued numerics (e.g., 3.5). Note that as.integer(v) truncates toward zero, so this comparison effectively requires $v$ to be a whole number.
  • is.finite(v) rejects Inf, -Inf, NaN, and NA.
  • On any validation failure, raises a "gdpar_input_error" via gdpar_abort.
  • The returned value is explicitly as.integer(v), so it is of R type integer (not numeric).

.parse_W_records(records)

Purpose

Parses the collection of W.* configuration records (for the basis-function specification of the dynamic parameter model) and constructs a W_basis object. This is the central dispatcher for reconstructing the basis $W$ from the canonical file format.

Arguments

  • records: a named list of record objects. Each record is expected to be a list with elements value (character string) and line (line number). The relevant keys are "W.type", "W.degree", "W.knots", and "W.df".

Mathematics

The basis $W$ is constructed according to:

$$ W = \begin{cases} \texttt{NULL} & \text{if } \texttt{W.type} \text{ is absent or } \texttt{"NULL"}, \\ \text{error} & \text{if } \texttt{W.type} = \texttt{"user"}, \\ W_{\text{poly}}(\text{degree}) & \text{if } \texttt{W.type} = \texttt{"polynomial"}, \\ W_{\text{bspline}}(\text{degree},, \text{knots}) & \text{if } \texttt{W.type} = \texttt{"bspline"} \text{ and } \texttt{W.knots} \text{ given}, \\ W_{\text{bspline}}(\text{degree},, \text{df}) & \text{if } \texttt{W.type} = \texttt{"bspline"} \text{ and } \texttt{W.df} \text{ given}. \end{cases} $$

Returns

NULL, or a W_basis object constructed by W_basis(...).

Notes

  • If records[["W.type"]] is NULL or its value is "NULL", returns NULL immediately (no basis).
  • W.type = "user" is explicitly rejected with a "gdpar_input_error" because user-defined bases reference arbitrary R functions that cannot be serialized in the canonical format.
  • Only "polynomial" and "bspline" are accepted as W.type values; anything else raises a "gdpar_input_error" referencing the line number from Wt_rec[["line"]].
  • W.degree is required for both supported types. If records[["W.degree"]] is NULL, an error is raised. The degree is parsed via .parse_int.
  • For W.type = "polynomial": returns W_basis(type = "polynomial", degree = W_degree). No further keys are consulted.
  • For W.type = "bspline":
    • W.knots and W.df are mutually exclusive. If both are present, a "gdpar_input_error" is raised.
    • If W.knots is present, it is parsed via .parse_num_vec and passed as knots to W_basis.
    • If W.df is present, it is parsed via .parse_int and passed as df to W_basis.
    • If neither W.knots nor W.df is present, a "gdpar_input_error" is raised stating that one must be supplied.
  • All errors are raised via gdpar_abort with class "gdpar_input_error".
  • This function delegates the actual basis construction to the (presumably exported or internal) W_basis constructor, which is not defined in this section.

R/amm_spec.R

amm_spec(a = NULL, b = NULL, W = NULL, x_vars = NULL, p = 1L, dims = NULL)

Purpose Constructs a specification object that declares which components of the Additive-Multiplicative-Modulated (AMM) canonical form

$$\Delta(x, \theta) = a(x) + b(x) \odot \theta + W(\theta), x$$

are active and how each is parametrized. The returned amm_spec object is the primary input consumed by gdpar() to assemble design matrices and the Stan model. Two mutually exclusive construction paths exist, selected by p: the scalar path (p = 1L, default) accepts a and b directly as one-sided formulas; the multivariate path (p > 1L) requires the dims argument instead and forbids a/b.

Arguments

Argument Type Meaning
a One-sided formula or NULL Scalar path only. Basis for the additive component $a(x)$. Evaluated without an intercept via stats::model.matrix. NULL disables the additive component. Must be NULL when p > 1L.
b One-sided formula or NULL Scalar path only. Basis for the multiplicative component $b(x)$. Same semantics as a. Must be NULL when p > 1L.
W Object of class W_basis or NULL Basis for the modulating component $W(\theta)$. Declared once regardless of p because it couples all dimensions of $\theta_{\mathrm{ref}}$. NULL disables the modulating component. Validated with assert_inherits.
x_vars Character vector or NULL Names of covariates entering the modulating component as the linear factor $x$ in $W(\theta)\,x$. When NULL, gdpar() later uses the covariates from the right-hand side of the model formula. Must be non-empty character when supplied.
p Integer $\geq 1$ Dimension of the per-individual parameter vector $\theta_i$. Defaults to 1L. Coerced to integer via as.integer after validation with assert_count.
dims dims_spec object, plain list of length p, or NULL Multivariate path only. Each entry is a list with components a (one-sided formula or NULL) and b (one-sided formula or NULL). Must be NULL when p == 1L. Bare formulas are rejected to prevent silent recycling. If a dims_spec object (from dimwise()), it is resolved via resolve_dims_spec(dims, p).

Mathematics

The AMM level of the specification is inferred from the (non-)nullity of the components across all dimensions:

$$ \text{level} = \begin{cases} 0 & \text{if every component is } \texttt{NULL} \text{ across all dimensions,} \\ 1 & \text{if only } a \text{ is active in any dimension and neither } b \text{ nor } W \text{ is active anywhere,} \\ 2 & \text{otherwise.} \end{cases} $$

On the scalar path the check is direct on a, b, W; on the multivariate path it is computed over the resolved per-dimension list:

$$\texttt{any_a} = \bigvee_{k=1}^{p} \lnot \texttt{is.null}(\texttt{resolved}ka), \qquad \texttt{any_b} = \bigvee_{k=1}^{p} \lnot \texttt{is.null}(\texttt{resolved}kb).$$

The design-matrix centering conditions (C2) and (C3) are not enforced inside this constructor; they are enforced empirically when the design matrices $Z_a$ and $Z_b$ are built by the (internal, downstream) design-matrix materializer, which subtracts column-wise sample means so that $\mathrm{colMeans}(Z_a) = \mathbf{0}$ and $\mathrm{colMeans}(Z_b) = \mathbf{0}$ exactly.

Returns An S3 object of class c("amm_spec", "list") with the following components:

Component Type Description
a One-sided formula or NULL Populated on the scalar path; NULL on the multivariate path.
b One-sided formula or NULL Populated on the scalar path; NULL on the multivariate path.
W W_basis object or NULL The modulating basis, passed through verbatim.
x_vars Character vector or NULL Passed through verbatim.
level Integer (0L, 1L, or 2L) AMM hierarchy level as defined above.
p Integer Dimension of $\theta_i$.
dims NULL or list of length p NULL on the scalar path. On the multivariate path, a resolved list of length p where each element is list(a = ..., b = ...).

Notes

  • Scalar path (p == 1L): a and b are validated with assert_one_sided_formula(..., allow_null = TRUE). If dims is non-NULL, an error of class gdpar_input_error is raised.
  • Multivariate path (p > 1L): If either a or b is non-NULL, an error is raised directing the user to dims. If dims is NULL, an error is raised.
  • Bare formula guard: If dims inherits from "formula", a specific error is raised to prevent silent recycling across dimensions.
  • Plain list validation for dims: The list must have length exactly p; each entry must itself be a list. Missing a/b names default to NULL. Each formula entry is validated with assert_one_sided_formula.
  • dims_spec resolution: When dims is a dims_spec object (produced by dimwise() and possibly composed with override()), it is resolved via the internal resolve_dims_spec(dims, p) function.
  • Unknown class for dims: If dims is not NULL, not a formula, not a dims_spec, and not a list, an error of class gdpar_input_error is raised.
  • No cross-component identifiability check: The constructor does not detect non-identifiability between $a,\ b$, and $W$ components; this is deferred to gdpar_check_identifiability() which is called automatically before fitting.
  • Linearity assumption (LIN): Holds automatically for formula-based bases (linear subspaces of $L^2_0(\mu)$) and for polynomial/B-spline W_basis types.
  • Dependencies: Uses assert_count, assert_one_sided_formula, assert_inherits (all internal assertion helpers), gdpar_abort (structured error signalling), and resolve_dims_spec.

build_amm_design(amm, data, formula_rhs)

Purpose Constructs the centered design matrices ($Z_a,\ Z_b$, and optionally $X$) for an Asymmetric Mixture Model (AMM) specification with $p=1$ coordinate. For $p&gt;1$ it delegates to .build_amm_design_multi(). This function prepares the covariate matrices for the static and modulating components of the AMM, centering them and standardising the modulating covariates.

Arguments

  • amm (amm_spec): An object of class "amm_spec" representing the AMM specification.
  • data (data.frame): The data frame containing the variables referenced by the AMM specification.
  • formula_rhs (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor x variables when amm$x_vars is NULL.

Mathematics The function implements the following operations:

  1. Centering the static components:
    For the static intercept formula amm$a (and similarly amm$b), the design matrix $Z_a^\text{full}$ is constructed. It is then column-centred by subtracting its column means: $Z_a = Z_a^\text{full} - \mathbf{1} \cdot \bar{Z}_a^\text{T}$ where $\bar{Z}_a$ is the vector of column means of $Z_a^\text{full}$. The same is done for $Z_b$.

  2. Standardising the modulating component:
    When the modulating component amm$W is active, the covariate matrix $X^\text{full}$ is built from x_vars. It is then centred and scaled to have zero mean and unit standard deviation (using the sample standard deviation): $X = \text{diag}\left(\frac{1}{s_X}\right) \left( X^\text{full} - \mathbf{1} \cdot \bar{X}^\text{T} \right)$ where $\bar{X}$ is the vector of column means and $s_X$ is the vector of column sample standard deviations.

Returns A list with components:

  • Z_a (matrix): Centred design matrix for the static component a (rows = observations, columns = terms from a). If a is NULL, a 0-column matrix of the correct row count.
  • Z_b (matrix): Centred design matrix for the static component b.
  • X (matrix): Standardised (centred and scaled) design matrix for the modulating component. If W is inactive, a 0-column matrix.
  • Z_a_means (numeric): Column means of the raw (uncentred) design matrix for a.
  • Z_b_means (numeric): Column means for b.
  • X_means (numeric): Column means of the raw covariate matrix for x.
  • X_sds (numeric): Column standard deviations of the centred covariate matrix for x.
  • Z_a_names (character): Column names of Z_a.
  • Z_b_names (character): Column names of Z_b.
  • X_names (character): Column names of X (the x_vars).

Notes

  • If amm$p > 1, the function immediately returns the result of .build_amm_design_multi(amm, data, formula_rhs).
  • The function performs strict input validation via assert_inherits and assert_data_frame.
  • Raises a gdpar_input_error if any covariate needed by a, b, or x_vars contains missing values (NA). The error message specifies that "Path 1 does not impute".
  • Raises a gdpar_input_error if the modulating component W is active but no x_vars could be identified (from amm$x_vars, formula_rhs, or the formula terms).
  • Raises a gdpar_input_error if any required x_vars are missing from data.
  • Raises a gdpar_input_error if any x_vars are constant (zero standard deviation after centring).
  • The design matrices are constructed using stats::model.matrix with the formula updated to ~ . + 0 to suppress the intercept column.
  • The returned Z_a, Z_b, and X are always plain matrices (not data frames or tibbles).

.build_amm_design_multi(amm, data, formula_rhs)

Purpose Internal workhorse for building per-coordinate centred design matrices ($Z_{a,k},\ Z_{b,k}$) and a shared modulating matrix ($X$) for a multivariate AMM specification with $p &gt; 1$. The modulating component is shared across coordinates because it depends only on the global covariate vector $x$.

Arguments

  • amm (amm_spec): An object of class "amm_spec" with amm$p > 1. The per-coordinate specifications are stored in amm$dims.
  • data (data.frame): The data frame containing the variables referenced by the AMM specification.
  • formula_rhs (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor x variables when amm$x_vars is NULL.

Mathematics The algorithm is an extension of the univariate case ($p=1$) to $k=1,\dots,p$ coordinates:

  1. Collect needed variables:
    Iterate over all coordinates $k$ and collect all variables referenced in amm$dims[[k]]$a and amm$dims[[k]]$b. The union of these, plus amm$x_vars, forms the set of variables that must be present and complete (no NAs) in data.

  2. Per-coordinate centred design matrices:
    For each coordinate $k$:

    • If the static intercept formula a_k is not NULL, construct its design matrix $Z_{a,k}^\text{full}$ and centre it: $Z_{a,k} = Z_{a,k}^\text{full} - \mathbf{1} \cdot \bar{Z}_{a,k}^\text{T}$
    • Similarly for $Z_{b,k}$ from b_k.
    • If a_k or b_k is NULL, the corresponding matrix is set to a 0-column matrix.
  3. Shared modulating matrix $X$:
    The construction is identical to the univariate case: identify x_vars, build $X^\text{full}$, centre, scale, and standardise to unit variance. The resulting $X$ is shared across all $p$ coordinates.

Returns A list with components:

  • p (integer): The number of coordinates.
  • Z_a_list (list of matrices): Length p list; each element is the centred design matrix for coordinate $k$'s static component a.
  • Z_b_list (list of matrices): Length p list; each element is the centred design matrix for coordinate $k$'s static component b.
  • X (matrix): The shared standardised modulating matrix (0 columns if W is inactive).
  • Z_a_means_list (list of numeric vectors): Column means for each raw $Z_{a,k}^\text{full}$.
  • Z_b_means_list (list of numeric vectors): Column means for each raw $Z_{b,k}^\text{full}$.
  • X_means (numeric): Column means of the raw $X^\text{full}$.
  • X_sds (numeric): Column standard deviations of the centred $X$.
  • Z_a_names_list (list of character vectors): Column names for each $Z_{a,k}$.
  • Z_b_names_list (list of character vectors): Column names for each $Z_{b,k}$.
  • X_names (character): Column names of X (the x_vars).

Notes

  • This function is internal and not exported (hence the leading dot).
  • It performs the same input validation and error checks as build_amm_design: missing values in needed variables, identification of x_vars when W is active, missing x_vars in data, and constant covariates in x_vars.
  • The per-coordinate matrices in Z_a_list and Z_b_list may have different column counts (ragged arrays). Downstream assembly (e.g., for Stan) must handle this raggedness.
  • The function does not compute any group-level random effects or parameter transformations; it only constructs the design matrices from the data and specifications.

.build_amm_design_K(amm_list_canonical, data, formula_rhs)

Purpose Constructs centered design matrices and metadata for a multi‑individual (K > 1) AMM specification. Processes a list of canonical amm_spec objects (one per individual) and a dataset to produce centered design matrices for additive (a) and multiplicative (b) components, plus a scaled matrix for the modulating component (W) if present.

Arguments

  • amm_list_canonical (list): A named list of length ≥ 2. Each element must be an object of class amm_spec with p = 1 (the K > 1, p > 1 regime is unsupported). Slots must have non‑empty names.
  • data (data.frame): Data frame containing the variables used in the formulas. Must have no missing values in the required covariates.
  • formula_rhs (formula or character): Right‑hand side of the model formula. Used as a fallback to determine the set of covariates for the modulating component if x_vars is not specified and the union of variables from a/b formulas is empty.

Mathematics

For each individual $k = 1, \dots, K$:

  1. Additive component:
    If the formula $a_k$ is not NULL, build the design matrix $Z^{(a)}_{\text{full}, k}$ via model.matrix. Compute column means $\bar{z}^{(a)}_k$ and center: $Z^{(a)}_k = Z^{(a)}_{\text{full}, k} - \mathbf{1}_n \bar{z}^{(a)\top}_k.$ If $a_k$ is NULL, set $Z^{(a)}_k$ to an $n \times 0$ matrix.

  2. Multiplicative component:
    Analogously, from $b_k$, produce $Z^{(b)}_k$ after centering.

  3. Modulating component (if any $W$ is non‑NULL):
    Determine the variable set $\mathbf{x}$ (explicit x_vars or union of variables from all $a_k,\ b_k$, or from formula_rhs).
    Form $X_{\text{full}}$ from the data, then center and scale: $\bar{x} = \frac{1}{n} \mathbf{1}_n^\top X_{\text{full}},\ X_c = X_{\text{full}} - \mathbf{1}_n \bar{x}^\top,\ s_x = \text{diag}\left(\sqrt{\frac{1}{n-1} X_c^\top X_c}\right),\ X = X_c \cdot \text{diag}(1/s_x).$ Each column of $X$ has zero mean and unit sample standard deviation.

Returns
A list with components:

  • K (integer): Number of individuals.
  • slot_names (character vector): Names of the input list slots.
  • Z_a_k_list (list of matrices): Centered design matrices for additive components (each $n \times p_a^{(k)}$, where $p_a^{(k)}$ is the number of columns from $a_k;\ n \times 0$ if absent).
  • Z_b_k_list (list of matrices): Centered design matrices for multiplicative components.
  • X (matrix): Centered and scaled design matrix for the modulating component ($n \times q;\ n \times 0$ if no $W$ is active).
  • Z_a_k_means_list (list of numeric vectors): Column means of the additive design matrices before centering.
  • Z_b_k_means_list (list of numeric vectors): Column means of the multiplicative design matrices.
  • X_means (numeric vector): Column means of the modulating component matrix (empty if $q = 0$).
  • X_sds (numeric vector): Column standard deviations of $X_c$ (empty if $q = 0$).
  • Z_a_k_names_list (list of character vectors): Column names of the additive design matrices.
  • Z_b_k_names_list (list of character vectors): Column names of the multiplicative design matrices.
  • X_names (character vector): Names of the variables in the modulating component.

Notes

  • Raises errors via gdpar_abort if:
    • amm_list_canonical is not a list of length ≥ 2 or has empty/missing names.
    • Any amm_spec has p > 1 (unsupported feature error).
    • data is not a data frame.
    • Missing values exist in any required covariate (no imputation).
    • The modulating component is active but no covariates are identified.
    • Required variables for the modulating component are missing in data.
    • Any variable in the modulating component is constant (zero standard deviation).
  • Uses assert_inherits to verify each list element is an amm_spec.
  • Uses stats::model.matrix and sweep for matrix construction and centering.

print.amm_spec(x, ...)

Purpose S3 method that prints a concise summary of an amm_spec object to the console, displaying its key specifications: level, dimension p, formulas for additive/multiplicative components, modulating component, and x_vars if present.

Arguments

  • x (amm_spec): An object of class amm_spec to print.
  • ... (any): Additional arguments (unused; present for S3 generic compatibility).

Returns Invisibly returns the input object x.

Notes

  • Output format:
    <amm_spec> AMM Level <level>
      p (dim theta_i)    : <p>
      a (additive)       : <formula or NULL>
      b (multiplicative) : <formula or NULL>
      W (modulating)     : <W_basis(...) or NULL>
      x_vars             : <comma-separated list, if any>
    
    If p > 1, instead of separate a and b lines, it prints a table of per‑dimension formulas from x$dims.
  • Formulas are deparsed to character strings for display.
  • The method is exported and can be called directly on amm_spec objects or via print().

R/bvm_check.R

gdpar_bvm_check(fit, parameters = NULL, level = 0.95, verbose = TRUE)

Purpose

Exported methodological-audit function. Empirically verifies the conclusion of the Bernstein–von Mises theorem (Theorem 4C of Block 4) for a fitted Path 1 Bayesian gdpar model by comparing Bayesian posterior credible intervals with Hessian-based asymptotic confidence intervals obtained from a Laplace approximation around the maximum likelihood estimator on a prior-stripped Stan model. The function is opt-in, computationally expensive, and does not modify the input fit.

Arguments

  • fit: object of S3 class gdpar_fit (produced by gdpar() with path = "bayes"). Must contain $fit (a cmdstanr fit object with a draws() method), $stan_data (a list with possible field use_groups), and $prior (passed to generate_stan_code).
  • parameters: optional character vector of parameter names to include in the comparison. When NULL (default), defaults to the user-facing parameters that the prior-stripped likelihood identifies, obtained by filtering posterior::variables(draws) against an exclusion regex.
  • level: numeric scalar in $[0,1]$ giving the nominal credible/confidence level. Defaults to 0.95.
  • verbose: logical scalar; when TRUE, prints an estimated-cost / opt-in message before starting. Defaults to TRUE.

Mathematics

Let $\theta$ denote the finite-dimensional parameter vector of the parametric AMM specification. Theorem 4C states that, under the LAN condition with non-singular Fisher information $\mathcal{I}(\theta_0)$ at the true parameter $\theta_0$, the posterior converges in total variation:

$$ \pi(\theta \mid y) \xrightarrow{TV} \mathcal{N}!\left(\hat\theta_{\mathrm{MLE}},; \frac{1}{n}\mathcal{I}(\theta_0)^{-1}\right). $$

For a nominal level $1-\alpha$ with $\alpha = 1 - \texttt{level}$, the function computes two-sided intervals with lower and upper quantiles

$$ q_L = \alpha/2, \qquad q_U = 1 - \alpha/2. $$

For each parameter, the Bayesian credible interval is $[b_L, b_U]$ from posterior quantiles and the asymptotic confidence interval is $[a_L, a_U]$ from the Laplace approximation around the MLE. The interval-width ratio and discrepancy are

$$ r = \frac{b_U - b_L}{a_U - a_L}, \qquad d = \left|\log r\right|. $$

A width ratio is flagged as suspicious when $r &lt; 0.5$ or $r &gt; 2$.

The MLE is obtained on the constrained (natural) scale because Stan's optimizer is invoked with jacobian = FALSE; the same convention is used for cmdstanr::laplace, so the Hessian-based covariance corresponds to the inverse observed information on the natural scale rather than the unconstrained scale.

Returns

A list of S3 class c("gdpar_bvm_report", "list") with components:

  • table: a data frame with one row per selected parameter and columns
    • variable: parameter name,
    • bayes_mean, bayes_lower, bayes_upper, bayes_width: posterior mean, lower/upper quantiles, and width $b_U - b_L$,
    • asymp_mean, asymp_lower, asymp_upper, asymp_width: Laplace-approximation mean, lower/upper quantiles, and width $a_U - a_L$ (all NA_real_ if the Laplace step failed),
    • width_ratio: $r = \texttt{bayes\_width} / \texttt{asymp\_width}$ (NA_real_ if the Laplace step failed).
  • discrepancy: numeric vector of length length(parameters) with $d = |\log r|$ (NA_real_ if the Laplace step failed).
  • level: the input level.
  • warnings: character vector of warning messages. Set to "Laplace approximation failed; asymptotic comparison unavailable." when the Laplace step fails; otherwise appended with one entry of the form "Width ratio outside [0.5, 2] for: <vars>." whenever any ratio falls outside $[0.5, 2]$.

Notes

  • Input validation:
    • assert_inherits(fit, "gdpar_fit", "fit") is called first.
    • assert_numeric_scalar(level, "level", lower = 0, upper = 1) enforces $0 \le \texttt{level} \le 1$.
    • verbose is checked manually: if !is.logical(verbose) || length(verbose) != 1L, the function aborts with gdpar_abort(..., class = "gdpar_input_error").
  • Hierarchical fits are rejected: if fit$stan_data$use_groups is non-null and equals 1L (after as.integer), the function aborts with class gdpar_unsupported_feature_error and a message explaining that Theorem 4C does not cover per-group random anchors.
  • Suggested-package dependencies cmdstanr (for optimization and Laplace) and posterior (for draw extraction/summarisation) are required via require_suggested; missing packages raise an error.
  • The opt-in message is emitted through gdpar_inform with class "gdpar_optin_message" when verbose = TRUE.
  • Candidate parameters are obtained from posterior::variables(draws) after excluding names matching the regex
    ^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw|c_b|c_b_raw|mu_theta_ref|sigma_theta_ref|sigma_a|sigma_b|sigma_W)
    
    This excludes latent noise, log-likelihood, predictions, per-observation anchors, random-effect coefficients and raws, centering constants, prior hyperparameters, and hierarchical scales (sigma_a, sigma_b, sigma_W), leaving user-facing identified parameters such as theta_ref, sigma_y, and phi.
  • When parameters is supplied by the user, any requested name not in candidate_vars triggers gdpar_abort with class gdpar_input_error and a message listing the missing names wrapped in sQuote.
  • Stan code is regenerated from fit$prior via generate_stan_code(fit$prior, mle = TRUE) (prior block stripped using the // BEGIN PRIORS / // END PRIORS markers) and written to a temporary file via write_stan_to_tempfile.
  • The MLE is found by cs_model$optimize(data = stan_data, algorithm = "lbfgs", refresh = 0, history_size = 5, init_alpha = 0.001, iter = 2000, jacobian = FALSE).
  • The Laplace step is wrapped in tryCatch. On error, gdpar_warn is emitted with class "gdpar_diagnostic_warning" carrying the message "Laplace approximation failed: <conditionMessage>. Falling back to MLE-only summary.", and the asymptotic columns of table plus discrepancy are filled with NA_real_.
  • Bayesian and Laplace summaries are both produced with posterior::summarise_draws using custom mean, sd, q_lower, q_upper functions; q_lower and q_upper use stats::quantile with names = FALSE at probabilities alpha/2 and 1 - alpha/2 respectively.
  • The function does not modify fit; the returned report is purely informational.
  • Side effects: writes a Stan file to a temporary location, invokes cmdstanr::cmdstan_model, optimize, and (optionally) laplace; may emit one opt-in message, one diagnostic warning, and one or more width-ratio warnings.

print.gdpar_bvm_report(x, ...)

Purpose

Exported S3 print method for objects of class gdpar_bvm_report produced by gdpar_bvm_check. Renders a human-readable summary of the calibration comparison.

Arguments

  • x: an object of S3 class gdpar_bvm_report.
  • ...: unused; present for S3 generic compatibility.

Returns

Invisibly returns x.

Notes

  • Prints a header line of the form <gdpar_bvm_report> level = <level> followed by a newline.
  • Prints x$table via print with row.names = FALSE.
  • If length(x$warnings) > 0L, prints a blank line, the heading Warnings:, and each warning prefixed by - on its own line.
  • Dispatched through the S3 generic print based on the first class element "gdpar_bvm_report".
  • No validation of x is performed; the function assumes the object was constructed by gdpar_bvm_check.

R/causal_bridge_methods.R

print.gdpar_causal_bridge(x, ...)

Purpose S3 print method for objects of class gdpar_causal_bridge. Produces a concise, human-readable summary of the bridge's structural metadata and first few CATE estimates.

Arguments

Argument Type Meaning
x gdpar_causal_bridge The bridge object to print.
... (unused) Present for S3 generic compatibility; ignored.

Returns Invisibly returns x.

Notes

  • The method extracts the treatment-model family (x$fits$treat$family) and prints both its name and link.
  • It uses %||% (null-coalescing) to default K and p to 1L when the corresponding slot is absent in x$fits$treat.
  • K is printed only when K > 1; otherwise p is printed only when p > 1. If both are 1, neither label is emitted.
  • The anchor vector is formatted with digits = 4 and displayed as a comma-separated bracketed list.
  • x$meta$newdata_source defaults to "<unknown>" when NULL.
  • The head of cate_mean is displayed via the internal helper .bridge_format_head. If x$warnings is a non-empty character vector, each warning is printed on its own indented line under a "Warnings:" heading.
  • Side effect: writes to the console via cat.

.bridge_format_head(cate_mean, n_show = 6L)

Purpose Internal helper that formats the first few elements (or rows) of the cate_mean vector/matrix for display in print.gdpar_causal_bridge.

Arguments

Argument Type Meaning
cate_mean numeric vector or numeric matrix The posterior mean CATE values. A vector for scalar bridges; a matrix for multivariate/K-individual bridges.
n_show integer, default 6L Maximum number of elements or rows to display.

Returns A single character string suitable for cat.

Notes

  • Matrix path (is.matrix(cate_mean) is TRUE): nrow(cate_mean) observations are assumed; n_show is clamped to min(n_show, n). Each row i is formatted as [v1, v2, ...] with digits = 3 via format. Rows are concatenated with "; " separators.
  • Vector path: length(cate_mean) observations are assumed; n_show is clamped similarly. The first n_show elements are formatted with digits = 3 and joined with ", ".
  • Marked @keywords internal and @noRd; not exported.

summary.gdpar_causal_bridge(object, ...)

Purpose S3 summary method for gdpar_causal_bridge objects. Constructs a structured summary containing a per-observation table of posterior CATE (mean and credible interval), the marginal Average Treatment Effect (ATE) and its credible interval, and ancillary metadata.

Arguments

Argument Type Meaning
object gdpar_causal_bridge The bridge to summarise.
... (unused) Present for S3 generic compatibility; ignored.

Returns An object of class c("summary.gdpar_causal_bridge", "list") with components:

Component Structure Description
table data.frame Per-observation (and per-slot) CATE table (see below).
ate named numeric vector Marginal ATE. Scalar for K=1,p=1; length-K vector otherwise.
ate_ci numeric matrix (K × 2) Credible bounds for the marginal ATE. Columns lower, upper. Row names are slot names (or "ate" for scalar case).
level numeric scalar Credible level used.
type character scalar Type of bridge (e.g. "response", "link").
n_draws integer scalar Number of posterior draws.
n_obs integer scalar Number of evaluation observations.

Mathematics

Let $S$ denote the number of posterior draws and $n$ the number of observations. The credible level is $\ell$ and the tail probabilities are

$$\alpha = 1 - \ell, \quad q_L = \frac{\alpha}{2}, \quad q_U = 1 - \frac{\alpha}{2}.$$

Scalar bridge (cate_draws is an $S \times n$ matrix):

$$\text{ATE} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_mean}_i, \qquad \text{ATE draws} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_draws}_{s,i}, \quad s=1,\dots,S.$$

The ATE credible interval is

$$\text{CI}_{\text{ATE}} = \bigl[,Q_{q_L}(\text{ATE draws}),; Q_{q_U}(\text{ATE draws}),\bigr],$$

where $Q_p$ denotes the empirical $p$-quantile (via stats::quantile).

Multivariate / K-individual bridge (cate_draws is an $S \times n \times K$ array):

For each slot $k = 1,\dots,K$:

$$\text{ATE}_k = \frac{1}{n S}\sum_{s=1}^{S}\sum_{i=1}^{n} \text{cate_draws}_{s,i,k}, \qquad \bar{\tau}_{s,k} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_draws}_{s,i,k}.$$

$$\text{CI}_{\text{ATE},k} = \bigl[,Q_{q_L}(\bar{\tau}_{\cdot,k}),; Q_{q_U}(\bar{\tau}_{\cdot,k}),\bigr].$$

Notes

  • Calls assert_inherits(object, "gdpar_causal_bridge", "object") to validate input.
  • The scalar branch is taken when is.matrix(cate_draws) is TRUE; the table then has columns observation, cate_mean, cate_lower, cate_upper. The cate_mean and cate_ci components of the bridge object (object$cate_mean, object$cate_ci) are assumed pre-computed.
  • The array branch is taken otherwise. Slot names are taken from object$meta$dim_names; if NULL, they default to "dim_1", "dim_2", …, "dim_K". The table gains a slot column and has K × n rows total.
  • The per-observation ATE for the scalar case is mean(cate_draws) (i.e. the grand mean over both draws and observations), and the ATE CI uses rowMeans(cate_draws) (a length-$S$ vector of per-draw cross-observation means).
  • For the array case, ate_vec[k] is computed as mean(cate_draws[, , k]) (grand mean over all draws and observations for slot $k$). The CI for slot $k$ uses apply(cate_draws[, , k, drop = FALSE], 1L, mean) to produce a length-$S$ vector of per-draw means, then takes quantiles of that.
  • The ate_mat matrix in the scalar path has a single row named "ate".
  • Class is set to c("summary.gdpar_causal_bridge", "list").

print.summary.gdpar_causal_bridge(x, ...)

Purpose S3 print method for summary.gdpar_causal_bridge objects. Displays bridge metadata, the marginal ATE table, and the first 10 rows of the per-observation CATE table.

Arguments

Argument Type Meaning
x summary.gdpar_causal_bridge The summary object to print.
... (unused) Present for S3 generic compatibility; ignored.

Returns Invisibly returns x.

Notes

  • Builds an ate_df data frame with columns slot, mean, lower, upper from x$ate (coerced to unnamed vector) and x$ate_ci. This data frame is printed with row.names = FALSE.
  • The per-observation CATE table is truncated to the first 10 rows via utils::head(x$table, 10L).
  • Side effect: writes to the console via cat and print.

predict.gdpar_causal_bridge(object, newdata, level = NULL, summary = c("all", "draws", "mean_ci"), ...)

Purpose S3 predict method for gdpar_causal_bridge objects. Recomputes the per-observation Conditional Average Treatment Effect (CATE) on a new evaluation grid by leveraging the treatment and control model fits already stored in the bridge. Structural compatibility of the two fits is not re-checked.

Arguments

Argument Type Meaning
object gdpar_causal_bridge The bridge whose fits are used for prediction.
newdata data.frame New covariate grid on which to evaluate the CATE. Required.
level numeric scalar in $(0,1)$ or NULL Credible level for the new credible intervals. If NULL (default), uses object$level.
summary character scalar, one of "all", "draws", "mean_ci" Controls the structure of the returned value (see Returns).
... (unused) Present for S3 generic compatibility; ignored.

Mathematics

Let $\hat{\tau}_{\text{treat}}(x_i)$ and $\hat{\tau}_{\text{ctrl}}(x_i)$ denote the posterior draws for observation $x_i$ under the treatment and control fits respectively, aligned to the same set of $S$ draws. Then

$$\text{CATE}(x_i) = \hat{\tau}_{\text{treat}}(x_i) - \hat{\tau}_{\text{ctrl}}(x_i).$$

The summary statistics are

$$\text{cate_mean}_i = \frac{1}{S}\sum_{s=1}^{S} \text{CATE}_s(x_i), \qquad \text{cate_ci}_i = \bigl[,Q_{q_L}(\text{CATE}_{\cdot}(x_i)),; Q_{q_U}(\text{CATE}_{\cdot}(x_i)),\bigr],$$

where $q_L = \alpha/2,\ q_U = 1 - \alpha/2$, and $\alpha = 1 - \text{level}$.

Returns

summary value Return structure
"all" A list with components cate_draws (array/matrix of posterior CATE draws), cate_mean (numeric vector/matrix of posterior means), cate_ci (array/matrix of credible bounds), n_draws (integer scalar), n_obs (integer scalar).
"draws" The raw cate_draws object (matrix or array), returned early.
"mean_ci" A list with components cate_mean and cate_ci.

Notes

  • Validates object via assert_inherits and newdata via assert_data_frame.
  • level is validated by assert_numeric_scalar when non-NULL, with bounds (1e-3, 1 − 1e-3) exclusive.
  • summary is matched via match.arg; partial matching is allowed by R convention.
  • The type slot is extracted from object$type and forwarded to both stats::predict calls with summary = "draws". This means both the treatment and control predictions return raw posterior draws.
  • Draw alignment is performed by the internal helper .align_bridge_draws(pred_t, pred_c), which returns a list with aligned $treat and $ctrl arrays and the common draw count $S.
  • CATE is computed as the element-wise difference aligned$treat - aligned$ctrl.
  • Summarisation (mean and CI) is delegated to the internal helper .summarize_cate(cate_draws, ql, qu).
  • The return value for summary = "all" mirrors the cate_* slot structure of a freshly constructed gdpar_causal_bridge, enabling downstream methods (e.g. summary, print) to operate on the result if it is wrapped appropriately.

R/causal_bridge.R

gdpar_causal_bridge(fit_treat, fit_ctrl, newdata = NULL, type = c("response", "theta_i", "linear_predictor"), level = 0.95, ...)

Purpose

Exported constructor for the T-learner causal bridge between two independent gdpar_fit objects. It estimates the conditional average treatment effect (CATE) as the per-observation, per-draw difference of the posterior predictive distributions from the treatment-arm fit and the control-arm fit, evaluated on a common evaluation set. The function does not modify either input fit and performs no causal adjustment beyond what is encoded in the two AMM specifications; the no-unmeasured-confounding assumption within each arm is the user's responsibility.

Arguments

  • fit_treat — (gdpar_fit) A fit object produced by gdpar() for the treatment arm. Must be a Path 1 (path = "bayes") Bayesian fit.
  • fit_ctrl — (gdpar_fit) A fit object produced by gdpar() for the control arm. Must share the family, anchor, AMM level, and covariate structure of fit_treat.
  • newdata — (data.frame or NULL) Optional evaluation data frame. When NULL (default), the function attempts to recover each arm's training data by evaluating the captured data argument of each fit's call in the caller's environment; if both recoveries succeed and the two data frames share column structure, their rbind is used. Otherwise the function aborts and requests an explicit newdata.
  • type — (character) Scalar selecting the prediction scale. Matched via match.arg against c("response", "theta_i", "linear_predictor"). "response" applies the inverse link per draw; "theta_i" and "linear_predictor" are synonyms selecting the linear predictor of the individual parameter.
  • level — (numeric) Scalar in $(0, 1)$ giving the nominal credible level for per-observation CATE intervals. Defaults to 0.95. Validated to lie in $[10^{-3},\, 1 - 10^{-3}]$.
  • ... — Reserved for future arguments; currently unused.

Mathematics

For each posterior draw $s = 1, \dots, S$ and observation $i = 1, \dots, n_{\text{new}}$, the bridge computes

$$ \widehat{\tau}^{(s)}_i = \hat{\mu}^{(s)}_{\text{treat}}(x_i) - \hat{\mu}^{(s)}_{\text{ctrl}}(x_i), $$

where $\hat{\mu}^{(s)}_{\text{arm}}(x)$ is the posterior prediction of the chosen type at $x$. The marginal posterior of the CATE at each $x_i$ is summarized by the empirical mean and the $(\alpha/2,\; 1-\alpha/2)$ quantiles with

$$ \alpha = 1 - \text{level}. $$

Because the two fits are independent (sampled from disjoint data subsets), the joint posterior factorizes:

$$ p(\theta_{\text{treat}}, \theta_{\text{ctrl}} \mid \text{data}) = p(\theta_{\text{treat}} \mid \text{data}_{\text{treat}}) \cdot p(\theta_{\text{ctrl}} \mid \text{data}_{\text{ctrl}}), $$

so any pairing of marginal draws is a valid sample from the joint. When the two fits differ in number of draws, the function trims to

$$ S = \min(S_{\text{treat}},; S_{\text{ctrl}}) $$

and emits a gdpar_diagnostic_warning.

For multivariate ($p &gt; 1$) or distributional ($K &gt; 1$) fits, predict.gdpar_fit returns a 3-array of shape $[S, n, \text{dim}]$; the CATE is computed elementwise, and the per-coordinate or per-slot CATEs occupy the last dimension of cate_draws. For type = "response", the canonical inverse link of each coordinate or slot is applied by predict.gdpar_fit before differencing.

Returns

An object of class c("gdpar_causal_bridge", "list") with components:

  • cate_drawsmatrix $[S, n]$ (scalar fits) or array $[S, n, \text{dim}]$ (multivariate / K-individual fits): per-draw CATE values.
  • cate_mean — posterior mean of the CATE per observation (and per dimension when applicable).
  • cate_ci — credible interval bounds at the requested level.
  • newdata — the resolved evaluation data frame.
  • id_checklist(treat = fit_treat$identifiability_report, ctrl = fit_ctrl$identifiability_report).
  • fitslist(treat = fit_treat, ctrl = fit_ctrl).
  • type — the matched type string.
  • level — the numeric credible level.
  • n_drawsS, the (possibly trimmed) number of aligned draws.
  • n_obsnrow(newdata_resolved).
  • call — the matched call (match.call()).
  • warningscharacter vector of fallback notifications (e.g., posterior-draw trimming); empty on the happy path.
  • metalist(dim_kind, dim_size, dim_names, newdata_source) where newdata_source is the "bridge_source" attribute of the resolved newdata.

Notes

  • Calls six internal validators in sequence: .check_bridge_path, .check_bridge_hierarchical, .check_bridge_family, .check_bridge_dim, .check_bridge_amm, .check_bridge_anchor. Only the first three are defined in this section; the remaining three and .resolve_bridge_newdata, .align_bridge_draws, .summarize_cate are defined in subsequent sections.
  • Aborts with condition class gdpar_unsupported_feature_error (via gdpar_abort) when structural compatibility checks fail.
  • The assert_inherits calls validate that both fit_treat and fit_ctrl inherit from "gdpar_fit" before any other logic executes.
  • assert_numeric_scalar enforces level within $[10^{-3},\; 1 - 10^{-3}]$.
  • The newdata resolution uses parent.frame() as the evaluation environment for recovering captured data arguments.
  • Predictions are obtained via stats::predict(fit, newdata = ..., type = type, summary = "draws"), dispatching to predict.gdpar_fit.
  • The warnings field is set to character(0L) when aligned$warning is NA, otherwise to the warning string.
  • (C7) anti-aliasing of Block 6.5 is not invoked because the hierarchical guard rules out the regime in which it applies.
  • Companion S3 methods print.gdpar_causal_bridge and summary.gdpar_causal_bridge are documented elsewhere.

.check_bridge_path(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that both fits were produced under Path 1 (path = "bayes"). T-learner support for Paths 2/3 is not implemented.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

Each fit's path slot is read with a fallback default of "bayes":

$$ \text{path}_{\text{arm}} = \text{fit}_{\text{arm}}\text{$path} ;%|%; \text{"bayes"}. $$

The check passes if and only if $\text{path}_{\text{treat}} \equiv \text{"bayes"}$ and $\text{path}_{\text{ctrl}} \equiv \text{"bayes"}$ (via identical).

Returns

invisible(NULL) on success.

Notes

  • On failure, calls gdpar_abort with class = "gdpar_unsupported_feature_error" and a data list containing path_treat and path_ctrl.
  • The %||% operator provides the default "bayes" when fit$path is NULL, meaning a fit lacking an explicit path slot is treated as Path 1.

.check_bridge_hierarchical(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that neither fit was sampled in the hierarchical (grouped) regime. The T-learner difference of per-group anchors is not defined in the canonical formulation and is queued for a future sub-phase.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

A fit is classified as grouped when

$$ \text{is_grouped}(\text{fit}) = \bigl(\text{fit$stan_data$use_groups} \neq \text{NULL}\bigr) ;\wedge; \bigl(\lfloor \text{fit$stan_data$use_groups} \rfloor = 1\bigr). $$

The check passes if and only if $\neg\,\text{is\_grouped}(\text{fit}_{\text{treat}}) \;\wedge\; \neg\,\text{is\_grouped}(\text{fit}_{\text{ctrl}})$.

Returns

invisible(NULL) on success.

Notes

  • Defines a local closure is_grouped(fit) that tests fit$stan_data$use_groups for non-NULL and integer equality to 1L (via as.integer(...)).
  • On failure, calls gdpar_abort with class = "gdpar_unsupported_feature_error" and a data list containing treat_grouped and ctrl_grouped (logical scalars).
  • Uses the same condition class as gdpar_bvm_check so user code handling unsupported-feature errors covers both helpers.
  • The error message advises refitting each arm without the group argument.

.check_bridge_family(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that the two fits share compatible family identifiers—both the top-level family name and link, and, when param_specs are present (multivariate / K-individual families), the per-slot family identifiers.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

Let $\text{fam}_{\text{arm}} = \text{fit}_{\text{arm}}\text{\$family}$. The top-level check requires

$$ \text{fam}_{\text{treat}}\text{$name} \equiv \text{fam}_{\text{ctrl}}\text{$name} ;\wedge; \text{fam}_{\text{treat}}\text{$link} \equiv \text{fam}_{\text{ctrl}}\text{$link}. $$

When param_specs is non-NULL on either fit, let $n_{\text{arm}} = |\text{fam}_{\text{arm}}\text{\$param\_specs}|$ (defaulting to an empty list via %||%). The slot-count check requires $n_{\text{treat}} = n_{\text{ctrl}}$. The per-slot identifier vector for each arm is

$$ \text{names}{\text{arm}}[k] = \text{as.character}\bigl(\text{ps}{\text{arm}}k\text{$family_id} ;%|%; \text{ps}{\text{arm}}k\text{$name} ;%|%; \text{NA_character_}\bigr), \quad k = 1, \dots, n{\text{arm}}, $$

and the slot-identifier check requires $\text{names}_{\text{treat}} \equiv \text{names}_{\text{ctrl}}$ (via identical).

Returns

invisible(NULL) on success.

Notes

  • Performs three sequential aborts, all with class = "gdpar_unsupported_feature_error":
    1. Top-level family name or link mismatch — data contains family_treat, family_ctrl, link_treat, link_ctrl.
    2. Slot count mismatch — data contains n_slots_treat, n_slots_ctrl.
    3. Per-slot family identifier mismatch — data contains slot_families_treat, slot_families_ctrl (character vectors).
  • The param_specs branch is entered when !is.null(ps_t) || !is.null(ps_c), meaning if either fit has param_specs, both are expected to have them and be compared.
  • Per-slot identifier extraction uses the chain s$family_id %||% s$name %||% NA_character_, so a slot lacking both family_id and name yields NA_character_.
  • vapply with character(1L) is used for type-safe extraction of per-slot identifiers.

Internal Functions (Section 2)

.check_bridge_dim(fit_treat, fit_ctrl)

Purpose

Guard function for gdpar_causal_bridge that verifies the two fitted model objects share identical structural dimensions $(K, p)$, where $K$ is the number of AMM components (or mixture/strata slots) and $p$ is the parameter dimension. This is a prerequisite for any per-draw contrast between the treatment and control arms.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) The treatment-arm fit; must contain (or default) elements K and p.
fit_ctrl list (fitted model object) The control-arm fit; must contain (or default) elements K and p.

Mathematics

The check enforces the equality constraints

$$ K_{\text{treat}} = K_{\text{ctrl}}, \qquad p_{\text{treat}} = p_{\text{ctrl}}. $$

Each dimension is recovered with a null-coalescing fallback to 1L:

$$ K_t = \text{fit_treat}$K ;%||%; 1, \qquad K_c = \text{fit_ctrl}$K ;%||%; 1, $$

and analogously for $p_t, p_c$.

Returns

invisible(NULL) when the dimensions match. Otherwise it never returns: it calls gdpar_abort with class "gdpar_unsupported_feature_error" and a data payload listing K_treat, K_ctrl, p_treat, p_ctrl.

Notes

  • Uses the %||% operator, so a missing K or p slot silently defaults to 1L rather than erroring.
  • The error condition is tagged "gdpar_unsupported_feature_error", signalling that mismatched dimensions are a feature limitation rather than a user input typo.

.check_bridge_amm(fit_treat, fit_ctrl)

Purpose

Asserts that the two fits are compatible at the AMM (Additive Modulating Model) specification level. Three facets are compared: (1) the AMM level (structural composition of the spec slots a/b/W), (2) the modulating basis type (polynomial vs. B-spline), and (3) the covariate column structure of the AMM design. This guarantees that the predict path on newdata reuses the same algorithm on both arms.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) Treatment-arm fit; expected to carry an amm element or a fallback amm_list_canonical.
fit_ctrl list (fitted model object) Control-arm fit; same expectation.

Mathematics

Let $\mathcal{A}_t$ and $\mathcal{A}_c$ denote the AMM specs of the two arms. The function verifies three identities:

$$ \text{level}(\mathcal{A}_t) \equiv \text{level}(\mathcal{A}_c), $$

$$ \text{W_type}(\mathcal{A}_t) \equiv \text{W_type}(\mathcal{A}_c), $$

$$ \text{covariates}(\mathcal{A}_t) \equiv \text{covariates}(\mathcal{A}_c). $$

The AMM spec is resolved via null-coalescing:

$$ \mathcal{A} = \text{fit}$\text{amm} ;%||%; \text{fit}$\text{amm_list_canonical}. $$

Returns

invisible(NULL) on success. On any mismatch, calls gdpar_abort:

  • Missing AMM on either fit → class "gdpar_internal_error".
  • Level mismatch → class "gdpar_unsupported_feature_error", data = list(level_treat, level_ctrl).
  • W-type mismatch → class "gdpar_unsupported_feature_error", data = list(W_type_treat, W_type_ctrl). The formatted message substitutes "<none>" for a NULL type.
  • Covariate mismatch → class "gdpar_unsupported_feature_error". The message lists the symmetric set difference of covariate component names:

$$ \bigl(,\text{names}(\text{cov}_t) \cup \text{names}(\text{cov}_c),\bigr) ;\setminus; \bigl(,\text{names}(\text{cov}_t) \cap \text{names}(\text{cov}_c),\bigr). $$

Notes

  • Dispatches to three helper functions: .bridge_amm_level, .bridge_amm_W_type, .bridge_amm_covariates.
  • The covariate comparison uses identical(), so ordering of list elements matters for the check to pass.
  • A missing AMM on either fit is treated as an internal error (not a user error), reflecting the expectation that fits should always carry an AMM spec.

.check_bridge_anchor(fit_treat, fit_ctrl)

Purpose

Asserts that the reference anchors stored in the two fits are numerically identical (within tolerance). The anchor enters the modulating term as $\theta_{\text{ref}}^k - \text{anchor}^k$, so a mismatch between arms changes the semantic meaning of the predicted $\theta_i$ and therefore of the CATE.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) Treatment-arm fit; must contain an anchor element (numeric vector).
fit_ctrl list (fitted model object) Control-arm fit; must contain an anchor element (numeric vector).

Mathematics

Given anchor vectors $a_t$ and $a_c$, the function first checks length equality:

$$ \text{length}(a_t) = \text{length}(a_c). $$

Then it computes element-wise absolute differences and a per-element scale:

$$ d_i = |a_{t,i} - a_{c,i}|, \qquad s_i = \max\bigl(|a_{t,i}|,; |a_{c,i}|,; 1\bigr), $$

and requires

$$ d_i \le 10^{-8} \cdot s_i \quad \forall, i. $$

This combines relative tolerance (when $|a_{t,i}|$ or $|a_{c,i}|$ exceeds 1) with absolute tolerance (floor at 1).

Returns

invisible(NULL) when anchors match. Otherwise calls gdpar_abort:

  • Length mismatch → class "gdpar_unsupported_feature_error", data = list(anchor_treat_length, anchor_ctrl_length).
  • Value mismatch → class "gdpar_unsupported_feature_error", data = list(anchor_treat, anchor_ctrl). The message formats anchor values with digits = 6 and instructs the user to refit one arm anchored to the other's value.

Notes

  • The tolerance is fixed at 1e-8 and is not configurable.
  • pmax is used (element-wise parallel max), so the scale vector has the same length as the anchors.
  • If either anchor slot is NULL, the subtraction a_t - a_c will error in base R before the tolerance check; this is not explicitly guarded.

.bridge_amm_level(amm)

Purpose

Infers the AMM level from an amm_spec object or a list of amm_spec objects. The level encodes the structural composition of the spec (presence/absence of a, b, W components).

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification (single or per-slot when $K &gt; 1$).

Returns

  • If amm inherits from "amm_spec": as.integer(amm$level) (scalar integer).
  • If amm is a list: an integer vector of the same length, where each element is as.integer(a$level) if the element inherits from "amm_spec", otherwise NA_integer_.
  • Otherwise: NA_integer_.

Notes

  • No error is raised for non-amm_spec inputs; the function returns NA_integer_ silently.
  • The return type is always integer (via as.integer and NA_integer_), making the output suitable for identical() comparison in .check_bridge_amm.

.bridge_amm_W_type(amm)

Purpose

Extracts the modulating basis type (e.g., "polynomial" or "bspline") from the W sub-component of an AMM spec. This ensures the predict path on new data reuses the same basis expansion algorithm on both arms.

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification.

Returns

  • If amm inherits from "amm_spec":
    • NULL when amm$W is NULL (no modulating term).
    • as.character(amm$W$type) otherwise.
  • If amm is a list:
    • A character vector of length length(amm), where each element is as.character(a$W$type) if the element is an amm_spec with a non-null W, otherwise NA_character_.
    • If all elements are NA, returns NULL.
    • Otherwise returns the full character vector (including NA entries).
  • Otherwise: NULL.

Notes

  • The "all NA → NULL" collapse means a list of specs with no W on any slot is indistinguishable from a single spec with no W.
  • When some slots have W and others do not, the returned vector contains NA_character_ entries, which will cause .check_bridge_amm to fail the identical() test if the two arms differ in which slots lack W.

.bridge_amm_covariates(amm)

Purpose

Extracts the covariate names for each AMM component (a, b, and x_vars) from an AMM spec. This is used to verify that both arms share the same covariate column structure in their AMM designs.

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification.

Mathematics

For a single amm_spec $a$, the function extracts:

$$ \text{a_vars} = \text{all.vars}(a$a), \qquad \text{b_vars} = \text{all.vars}(a$b), \qquad \text{x_vars} = a$\text{x_vars}, $$

where all.vars() parses an R formula/expression and returns the variable names referenced in it. Each component defaults to character(0L) when the corresponding slot is NULL.

Returns

  • If amm inherits from "amm_spec": a named list with elements a_vars, b_vars, x_vars (each a character vector).
  • If amm is a list: a list of such named lists (one per element of amm).
  • Otherwise: an empty list().

Notes

  • all.vars() is applied to a$a and a$b, which are expected to be formula-like objects (formulas, calls, or expressions). If they are character strings, all.vars() will return character(0).
  • The x_vars slot is taken as-is (not parsed), so it must already be a character vector.
  • The returned structure is compared with identical() in .check_bridge_amm, so the order of x_vars and the order of list elements matter.

.resolve_bridge_newdata(fit_treat, fit_ctrl, newdata, eval_env)

Purpose

Resolves the evaluation grid (newdata) on which the CATE will be computed. When the user supplies newdata, it is used directly. When newdata is NULL, the function attempts to recover both arms' training data by evaluating the captured data argument from each fit's call, then rbinds them into a single data frame.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) Treatment-arm fit; its $call$data is evaluated if newdata is NULL.
fit_ctrl list (fitted model object) Control-arm fit; same recovery logic.
newdata data.frame or NULL User-supplied evaluation grid. If non-NULL, it is validated and returned.
eval_env environment The environment in which to evaluate fit$call$data when recovering training data.

Returns

A data.frame with an attribute "bridge_source":

  • "user" — when newdata was supplied by the user (passed through after assert_data_frame validation).
  • "training_rbind" — when newdata was NULL and both arms' training data were successfully recovered and rbind-ed.

The rbind uses the column order of the treatment arm (data_t) for both arms, ensuring consistent column alignment.

Notes

  • Recovery logic: The inner function recover(fit, arm_label) extracts fit$call$data and evaluates it in eval_env inside a tryCatch. Any evaluation error silently yields NULL. The arm_label argument is accepted but never used in the body.
  • Column matching: Column names are compared after sorting (sort(colnames(...))), so column order may differ between arms but the set of columns must be identical. However, the rbind itself uses data_t's column order for both, so if data_c has columns in a different order, they are subsetted to match data_t's order before binding.
  • Error conditions (all via gdpar_abort):
    • Recovery failure (either arm's data is NULL or not a data frame) → class "gdpar_input_error", data = list(treat_recovered, ctrl_recovered).
    • Column structure mismatch → class "gdpar_input_error", data = list(colnames_treat, colnames_ctrl).
  • assert_data_frame(newdata, "newdata") is called on user-supplied newdata; this is presumably a checkmate-style assertion (not defined in this section).
  • The rbind is performed with drop = FALSE subsetting, preserving data frame structure even for single-column data.

.align_bridge_draws(pred_t, pred_c)

Purpose

Aligns the posterior draw arrays of the two arms by trimming the longer set to match the shorter. This is necessary because the two fits may have been run with different numbers of posterior draws ($S$). When trimming occurs, a diagnostic warning is emitted and the trimming notification is returned for persistence in the final bridge object.

Arguments

Argument Type Meaning
pred_t matrix [S_t, n] or array [S_t, n, dim] Posterior draws from the treatment arm.
pred_c matrix [S_c, n] or array [S_c, n, dim] Posterior draws from the control arm.

Mathematics

The aligned draw count is:

$$ S = \min(S_t, S_c). $$

When $S_t \neq S_c$, the longer array is subsetted to its first $S$ rows along the first axis (the draw axis). The trimming operation for a matrix is:

$$ \text{pred}^{\prime} = \text{pred}[,1{:}S,, \cdot,], $$

and for an array of dimension $d$:

$$ \text{pred}^{\prime} = \text{pred}[,1{:}S,, \cdot,, \ldots,, \cdot,]. $$

Returns

A named list with four components:

Component Type Description
treat matrix or array Trimmed treatment draws, first axis of length $S$.
ctrl matrix or array Trimmed control draws, first axis of length $S$.
S integer The aligned draw count $\min(S_t, S_c)$.
warning character scalar The trimming notification message when $S_t \neq S_c$; NA_character_ otherwise.

Notes

  • Draw count inference: $S$ is inferred from nrow() (for matrices) or dim()[1L] (for arrays). If pred_t is a matrix and pred_c is an array (or vice versa), the function does not explicitly check for type consistency; it will still compute $S_t$ and $S_c$ but the trimming logic branches on is.matrix() per input.
  • Warning emission: When trimming occurs, gdpar_warn is called with class "gdpar_diagnostic_warning" and data = list(S_treat, S_ctrl, S). The same message string is stored in the warning field for persistence.
  • Trimming implementation: The inner function trim_first_axis(arr, S) handles both matrices (simple row indexing with drop = FALSE) and arrays (constructs an index list with seq_len(S) for the first axis and quote(expr =) (i.e., empty subscript) for all remaining axes, then calls do.call([, ...) with drop = FALSE). The use of quote(expr =) produces a missing/empty argument in the subscript list, which selects all elements along that dimension.
  • Persistence: The warning field is designed to be persisted by the constructor into the $warnings slot of the resulting gdpar_causal_bridge object, so the print method can surface the fallback notification (referenced as "D48 canonical norm; D50 of Sesion 18 Etapa 2 of Sesion 8.4" in the source comments).
  • No error is raised if the two inputs have different shapes beyond the first axis; the function only aligns the first (draw) axis.

.summarize_cate(cate_draws, ql, qu)

Purpose

Internal helper that converts raw posterior draws of a Conditional Average Treatment Effect (CATE) quantity into a compact summary consisting of the posterior mean and a lower/upper credible-interval bound per unit (and, in the multi-component case, per component slot). It is the canonical summarizer used downstream by the causal-bridge machinery to produce user-facing CATE summaries; it also tags the output with metadata (dim_kind, dim_size, dim_names) describing the structure of the summarized quantity so that downstream printers/extractors can dispatch correctly.

Arguments

  • cate_draws : numeric posterior draws of the CATE. Two shapes are supported:
    • a matrix with rows indexing posterior draws ($S$) and columns indexing units ($n$), i.e. dim = c(S, n);
    • a 3-dimensional array with dim = c(S, n, K), where the first dimension indexes draws, the second indexes units, and the third indexes component "slots" (e.g. multiple treatment arms, multiple outcome dimensions, or per-dimension effects).
  • ql : numeric scalar in $[0,1]$. Lower-tail quantile probability used for the credible interval.
  • qu : numeric scalar in $[0,1]$. Upper-tail quantile probability used for the credible interval.

Mathematics

For the matrix case (one scalar CATE per unit), let $\theta^{(s)}_i$ denote draw $s$ for unit $i,\ s=1,\dots,S,\ i=1,\dots,n$. The function computes

$$ \hat{\mu}_i = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_i, \qquad q^{,l}_i = F^{-1}_{\theta_i}(q_l), \qquad q^{,u}_i = F^{-1}_{\theta_i}(q_u), $$

where $F^{-1}_{\theta_i}$ is the empirical quantile function of the sample $\{\theta^{(s)}_i\}_{s=1}^{S}$ (as computed by stats::quantile with names = FALSE).

For the 3-D array case, let $\theta^{(s)}_{i,k}$ denote draw $s$ for unit $i$ and slot $k,\ k=1,\dots,K$. The function computes, for each $(i,k)$,

$$ \hat{\mu}_{i,k} = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_{i,k}, \qquad q^{,l}_{i,k} = F^{-1}_{\theta_{i,k}}(q_l), \qquad q^{,u}_{i,k} = F^{-1}_{\theta_{i,k}}(q_u). $$

The dim_kind tag is assigned by the heuristic

$$ \text{dim_kind} = \begin{cases} \texttt{"multi"}, & \text{if } \texttt{slot_names} \text{ is } \texttt{NULL} \text{ or every slot name begins with } \texttt{dim_},\\ \texttt{"K_individual"}, & \text{otherwise.} \end{cases} $$

Returns

A list with the following components, whose shapes depend on the input:

  • Matrix input:

    • mean : numeric vector of length $n$ (column means of cate_draws).
    • ci : numeric $n \times 2$ matrix with columns named "lower" and "upper"; column 1 holds the ql-quantiles, column 2 the qu-quantiles.
    • dim_kind : character scalar, hard-coded to "scalar".
    • dim_size : integer scalar, hard-coded to 1L.
    • dim_names : NULL.
  • 3-D array input:

    • mean : numeric $n \times K$ matrix with dimnames set to list(NULL, slot_names) (i.e. unit dimension unnamed, slot dimension named after the third-dimension names of cate_draws).
    • ci : numeric $n \times K \times 2$ array with dimnames = list(NULL, slot_names, c("lower", "upper")); slice [, , 1] holds the ql-quantiles, slice [, , 2] the qu-quantiles.
    • dim_kind : character scalar, either "multi" or "K_individual" per the heuristic above.
    • dim_size : integer scalar equal to $K$ (dim(cate_draws)[3L]).
    • dim_names : the third-dimension names of cate_draws (dimnames(cate_draws)[[3L]]), which may be NULL.

Notes

  • The function is internal (leading dot) and is not exported.
  • Shape dispatch is performed by testing is.matrix(cate_draws) first, then length(dim(cate_draws)) == 3L. Any other shape (e.g. a vector, a 2-D non-matrix array, or a 4+-D array) triggers an error via gdpar_abort with class = "gdpar_internal_error" and a data field carrying dim(cate_draws). The error message is the literal string "Internal error: unsupported shape for cate_draws.".
  • Quantiles are computed with stats::quantile and names = FALSE; the default quantile type (type 7) of stats::quantile is therefore in effect.
  • In the matrix branch, the two quantile rows returned by apply are re-assembled into an $n \times 2$ matrix explicitly (rather than transposed), so the orientation of ci is guaranteed regardless of how apply lays out its result.
  • In the 3-D branch, apply is called separately for the lower and upper quantiles (with scalar probs), producing two $n \times K$ matrices that are then written into the two slices of the output array. This avoids any ambiguity about the shape of apply's output when probs has length 1.
  • dim_kind is purely a metadata tag derived from the slot names; it does not affect the numerical content of mean or ci. The "multi" tag is intended to flag either anonymous component dimensions (slot_names is NULL) or component dimensions whose names follow the package-internal dim_ naming convention, while "K_individual" flags named per-individual components.
  • The function has no side effects and performs no S3 dispatch of its own; it is a plain closure.

R/check_identifiability.R

gdpar_check_identifiability(amm, data, theta_ref_init = NULL, formula_rhs = NULL, family = NULL, tol = 1e-8, rigor = c("full", "fast"))

Purpose
Diagnose whether the chosen finite parametric representation of the AMM canonical form satisfies the basis‑restricted Functional Independence Condition at a candidate population reference point. This pre‑fit check verifies parameter identifiability by examining the empirical Gram matrix of the extended design matrix.

Arguments

  • amm: Object of class amm_spec (created by amm_spec) defining bases for additive, multiplicative, and modulating components.
  • data: Data frame containing variables referenced in amm; covariates are centered internally before computing the Gram matrix.
  • theta_ref_init: Numeric vector of length p (dimension of the population reference) at which the diagnostic is evaluated. Defaults to a zero vector (if multiplicative component b is absent) or a vector of ones (if b is present).
  • formula_rhs: Optional formula or character vector specifying covariates for the modulating linear factor x; defaults to amm$x_vars.
  • tol: Numeric scalar ∈ (0,1) setting the relative condition‑number tolerance. Failure is flagged when λ_min < tol × λ_max. Default 1e-8.
  • family: Optional gdpar_family or gdpar_family_multi object; if supplied, triggers parameter‑level identifiability (D‑ID) pre‑fit checks.
  • rigor: Character scalar, "full" (default) or "fast", controlling the C4‑bis cross‑component check for p > 1 specs. Ignored when p = 1.

Mathematics
The function builds an extended design matrix $Z$ by column‑binding the design matrices for the additive ($Z_a$), anchored multiplicative ($Z_b \cdot \theta_{\text{ref}}$), and modulating ($X$) components. Each column of $Z$ is normalized to unit Euclidean norm, yielding $Z_{\text{norm}}$. The empirical Gram matrix is

$$ G = \frac{1}{n} Z_{\text{norm}}^\top Z_{\text{norm}}. $$

Its eigenvalues $\lambda_{\text{max}}$ and $\lambda_{\text{min}}$ are computed. The relative condition criterion is

$$ \frac{\lambda_{\text{min}}}{\lambda_{\text{max}}} \ge \text{tol}, $$

equivalent to the condition number $\kappa = \lambda_{\text{max}} / \lambda_{\text{min}} \le 1/\text{tol}$.
For p > 1, a per‑coordinate C4‑bis check is performed (if applicable) by calling check_C4_bis_per_k, and a D‑ID pre‑fit check is performed via .check_did_pre_fit.

Returns
An object of class gdpar_identifiability_report (a list) containing:

Component Description
passed Logical; TRUE iff all applicable checks (Gram, C4‑bis, D‑ID) pass.
lambda_min, lambda_max, condition_number Eigenvalues and condition number of the Gram matrix (if computed).
collinear_directions List of near‑zero eigenvectors projected onto basis columns (if Gram check fails); otherwise NULL.
theta_ref_used The theta_ref_init used.
tol_used, rigor_used The tol and rigor used.
column_labels Character vector labeling columns of the extended design matrix.
c4_bis Result of the C4‑bis per‑coordinate check (if performed); else NULL.
did_pre_fit Result of the D‑ID pre‑fit check (if performed); else NULL.
message Human‑readable summary.

A print method formats the report.

Notes

  • Trivial cases: If amm$level == 0L or there are no active design blocks, the function returns passed = TRUE with a message and NA eigenvalues.
  • Input validation: Uses assert_inherits, assert_data_frame, assert_numeric_scalar, and match.arg; raises gdpar_input_error on invalid inputs.
  • Warning: If theta_ref_init is zero in every coordinate and amm$b is non‑null, a gdpar_diagnostic_warning is issued because the multiplicative block vanishes trivially.
  • Zero‑norm columns: If any column of $Z$ is zero after centering, the function returns passed = FALSE with lambda_min = 0 and condition_number = Inf.
  • Multivariate path: When design$Z_a_list exists and design$Z_a is NULL, the function delegates to check_identifiability_multi (defined elsewhere) and returns its result.
  • Extended matrix construction: For p > 1, the multiplicative block is expanded as $Z_b \cdot \theta_{\text{ref}[k]}$ for each coordinate k, producing column labels b*theta[k]:....
  • C4‑bis and D‑ID checks: Only performed when p > 1 and the necessary design components exist. The rigor argument controls failure vs. warning on column‑name overlap between additive and modulating blocks.
  • Condition number computation: Uses eigen(G, symmetric = TRUE) and guards against division by zero with max(lambda_min, .Machine$double.eps).

check_identifiability_multi(amm, design, theta_ref, tol, rigor, family = NULL)

Purpose Top-level multivariate identifiability report. Combines two pre-fit layers—(i) the per-coordinate C4-bis functional independence check (check_C4_bis_per_k) and (ii) the pre-fit parameter identifiability / D-ID layer (.check_did_pre_fit)—into a single consolidated "gdpar_identifiability_report" object.

Arguments

Argument Type Meaning
amm amm_spec The model specification; forwarded to sub-checks (used to read W$dim and p).
design list Design structure returned by build_amm_design; forwarded to sub-checks.
theta_ref numeric vector (length p) Reference (initial) parameter vector at which identifiability is evaluated.
tol numeric scalar Tolerance for the eigenvalue-ratio (condition-number) criterion in C4-bis.
rigor character scalar ("fast" or "full") Controls depth of the C4-bis check; forwarded to both sub-checks.
family gdpar_family, gdpar_family_multi, or NULL Family object for the D-ID layer; NULL skips that layer entirely.

Mathematics

The function aggregates per-coordinate eigenvalue diagnostics. Denote the smallest and largest eigenvalues returned by coordinate $k$ as $\lambda_{\min}^{(k)}$ and $\lambda_{\max}^{(k)}$, and its condition number $\kappa^{(k)} = \lambda_{\max}^{(k)} / \lambda_{\min}^{(k)}$. The report-level summaries are:

$$ \lambda_{\min}^{\star} = \min_{k};\lambda_{\min}^{(k)}, \qquad \lambda_{\max}^{\star} = \max_{k};\lambda_{\max}^{(k)}, \qquad \kappa^{\star} = \max_{k};\kappa^{(k)} $$

with NA_real_ propagated when every per-$k$ value is NA.

Returns

A list of class c("gdpar_identifiability_report", "list") with elements:

Element Type / Content
passed Logical scalar; TRUE iff both passed_c4_bis and passed_did are TRUE.
lambda_min Numeric; global minimum eigenvalue across coordinates (or NA_real_).
lambda_max Numeric; global maximum eigenvalue across coordinates (or NA_real_).
condition_number Numeric; worst (maximum) condition number across coordinates (or NA_real_).
collinear_directions Always NULL at this level (populated in per-k entries).
theta_ref_used Numeric vector; echo of theta_ref.
tol_used Numeric; echo of tol.
rigor_used Character; echo of rigor.
column_labels character(0).
c4_bis Full return value of check_C4_bis_per_k.
did_pre_fit Return value of .check_did_pre_fit (or NULL).
message Character scalar summarising pass/fail with human-readable explanation.

Notes

  • When family is NULL, .check_did_pre_fit is not called (did_pre_fit is set to NULL) and passed_did defaults to TRUE, so the overall result depends solely on C4-bis.
  • The message distinguishes three cases: all-pass, C4-bis failure, and D-ID failure.
  • min/max over lmins/lmaxs/conds uses na.rm = TRUE but falls back to NA_real_ when every element is NA (guarded by all(is.na(...))).

check_C4_bis_per_k(design, amm, theta_ref, rigor, tol)

Purpose Internal per-coordinate cross-component identifiability check (C4-bis). For each coordinate $k = 1,\dots,p$ either performs a structural column-name overlap test (rigor = "fast") or a full Gram-matrix rank test (rigor = "full") on the additive design matrix $\mathbf{Z}_a[k]$, thereby detecting collinearity within the additive component and naming overlap with the modulating-component design $\mathbf{X}$.

Arguments

Argument Type Meaning
design list Must contain Z_a_list, Z_a_names_list, X, X_names.
amm amm_spec Model specification; amm$W$dim is read to derive the per-k block dimension.
theta_ref numeric vector (length p) Reference parameter vector; used to determine p.
rigor character scalar ("fast" or "full") Selects the check strategy (see below).
tol numeric scalar Eigenvalue-ratio tolerance; a coordinate passes iff $\lambda_{\min} \ge \text{tol}\cdot\lambda_{\max}$.

Mathematics

Let $p = \text{length}(\theta_{\text{ref}})$ and $W_{\text{per-k}} = \lfloor \dim(W) / p \rfloor$.

For each coordinate $k$:

  1. Extract the additive design sub-matrix $\mathbf{Z}_a[k]$ (column-normalised to unit $\ell_2$ norm):

    $\widetilde{\mathbf{Z}}_a[k] = \mathbf{Z}_a[k]\,\mathrm{diag}\!\bigl(\lVert \mathbf{Z}_a[k]_{\cdot j}\rVert_2^{-1}\bigr)$

  2. Form the (sample) correlation/gram matrix:

    $\mathbf{G}_k = \frac{1}{n}\,\widetilde{\mathbf{Z}}_a[k]^{\!\top}\,\widetilde{\mathbf{Z}}_a[k]$

  3. Compute eigen-decomposition $\mathbf{G}_k = \mathbf{V}\,\mathrm{diag}(\lambda)\,\mathbf{V}^{\!\top}$. The coordinate passes the rank sub-check iff:

    $\lambda_{\min}^{(k)} \;\ge\; \text{tol}\;\cdot\;\lambda_{\max}^{(k)}$

  4. Structural overlap sub-check: Independently compute $\text{shared} = \mathrm{colnames}\!\bigl(\mathbf{Z}_a[k]\bigr) \;\cap\; \mathrm{colnames}(\mathbf{X})$. Under rigor = "full", any non-empty intersection causes a fail.

    Under rigor = "fast", overlap triggers only a gdpar_c4bis_overlap_warning (not a failure), and the rank check is skipped entirely—every coordinate is marked passed = TRUE with NA eigenvalue diagnostics.

Algorithm (rigor = "full")

for k in 1..p:
    shared ← intersect(colnames(Z_a[k]), colnames(X))
    if ncol(Z_a[k]) == 0 → pass trivially
    col_norms ← √(colSums(Z_a[k]²))
    if any(col_norms == 0) → fail, record zero-norm columns
    Z_k_n ← Z_a[k] / col_norms          # column-normalise
    G_k ← crossprod(Z_k_n) / nrow(Z_k_n)
    eig_k ← eigen(G_k, symmetric = TRUE)
    λ_min ← min(eig_k$values)
    λ_max ← max(eig_k$values)
    κ ← λ_max / max(λ_min, ε_machine)
    passed_rank ← (λ_min ≥ tol · λ_max)
    passed_overlap ← (shared is empty)
    passed_k ← passed_rank AND passed_overlap
    collinear_directions:
        if !passed_rank  → eigenvectors with eigenvalue < tol·λ_max
        if !passed_overlap → shared columns labelled

Returns

A list with elements:

Element Type / Content
rigor Character; echo of the input rigor.
per_k List of length $p$; each entry is a list:

Each per_k[[k]] entry contains:

Sub-element Type / Content
passed Logical; TRUE if this coordinate passes both sub-checks.
rigor Character ("fast" or "full").
lambda_min Numeric; smallest eigenvalue of $\mathbf{G}_k$ (or NA_real_).
lambda_max Numeric; largest eigenvalue of $\mathbf{G}_k$ (or NA_real_).
condition_number Numeric; $\kappa^{(k)} = \lambda_{\max}^{(k)} / \max(\lambda_{\min}^{(k)},\,\varepsilon)$ (or NA_real_).
shared_cols Character vector; column names shared between $\mathbf{Z}_a[k]$ and $\mathbf{X}$.
collinear_directions List of lists (or NULL); each sub-list has eigenvalue, columns, coefficients (eigenvector loadings with $`
coord Integer; the coordinate index $k$.
message (Only when ncol(Z_a_k) == 0 under "full"): "empty Z_a[k]: trivially passes."

Notes

  • Zero-norm column detection (full rigor): If any column of $\mathbf{Z}_a[k]$ has zero $\ell_2$ norm the coordinate immediately fails with $\lambda_{\min}=0,\ \kappa=\infty$, and the offending columns are recorded in collinear_directions under the label "zero-norm columns in Z_a[k]" with coefficient 1.
  • Fast rigor warning aggregation: Overlap detected across coordinates is accumulated in overlap_acc and emitted as a single gdpar_c4bis_overlap_warning (via gdpar_warn) after the loop. The warning includes a data$shared_cols_by_coord slot listing shared columns per coordinate.
  • Modulating-component design $\mathbf{X}$ not used in rank check: The source code documents that the extended Gram matrix $[\mathbf{Z}_a[k] \mid \theta_{\text{ref}}[k]^m \mathbf{X}]$ was considered but rejected because at a fixed $\theta_{\text{ref}}$ the $W$-block columns are scalar multiples of $\mathbf{X}$ per coordinate $l$, yielding rank 1 by construction. Cross-component non-identifiability is therefore deferred to post-fit diagnostics (divergences, low ESS, high $\hat{R}$).
  • Column naming convention: Collinear-direction column labels are prefixed with "a{k}:" (e.g., "a3:smooth_term").
  • Default anchor: The comment states the anchor is taken as zero, consistent with the package default on the linear-predictor scale.
  • %||% usage: X_names, colnames(Z_a_k), lambda_min, lambda_max, and condition_number all use the null-coalescing operator %||% to supply fallback values.

.check_did_pre_fit(family, design, theta_ref, rigor)

Purpose Pre-fit validation of Data-Identifying Design (DID) conditions for individual-scope parameters within a gdpar_family or gdpar_family_multi object. Called before the Stan fit to verify that all per-observation or per-group parameter specifications have an explicit DID declaration and, when rigor == "full" and $K \geq 2$, that the parameters exhibit symbolic separability (distinct canonical prior kinds).

Arguments

Argument Type Meaning
family gdpar_family, gdpar_family_multi, or NULL The family object whose param_specs are inspected. If NULL, the function returns NULL immediately.
design any Present in the signature but never referenced in the function body; reserved for future use or passed through for call-site compatibility.
theta_ref any Present in the signature but never referenced in the function body; same rationale as design.
rigor character scalar "full" activates the symbolic-separability sub-check when $K \geq 2$; any other value skips it (separability defaults to TRUE).

Algorithm

  1. Family extraction. If family inherits from "gdpar_family_multi", the first element family$families[[1]] is taken as base_family; if it inherits from "gdpar_family", base_family <- family; otherwise return NULL.

  2. Individual-scope filtering. From base_family$param_specs, retain only specs whose scope is "per_observation" or "per_group". Let $K$ be the count of such specs.

  3. Per-parameter metadata. For each retained spec $s$, extract the fields name, scope, did_status, did_condition, did_reference, prior_canonical_kind.

  4. DID declarative check. $$ \text{passed_did} = \bigwedge_{k=1}^{K} \bigl(s_k.\text{did_status} \in {\texttt{"holds"},;\texttt{"holds_under_condition"}}\bigr) $$

  5. Symbolic separability (conditional on $K \geq 2$ and rigor == "full"). Collect the vector of prior_canonical_kind values $\mathbf{p} = (p_1, \dots, p_K)$. An element $p_i$ is overlapping if $p_i = p_j$ for some $j \neq i$: $$ \text{overlap}_i = \bigl(\exists, j \neq i : p_j = p_i\bigr) $$ This is detected via duplicated(p) | duplicated(p, fromLast = TRUE). If any overlap exists, passed_separability <- FALSE and the offending kinds/names are recorded. Otherwise the check passes.

  6. Final verdict. $$ \text{passed} = \text{passed_did} ;\wedge; \text{passed_separability} $$

Returns

A list (or NULL if family is NULL or not a recognized gdpar family):

Field Type Meaning
passed logical scalar TRUE iff DID declarations are valid and separability holds
K integer scalar Number of individual-scope parameter specs
per_param list of length $K$ Per-parameter metadata (name, scope, did_status, did_condition, did_reference, prior_canonical_kind)
symbolic_separability NULL or list NULL when not evaluated; otherwise a list with passed, overlapping_kinds, overlapping_names, message
rigor character scalar Echo of the input rigor

Notes

  • design and theta_ref are accepted but completely unused; this allows a uniform calling convention across identifiability-check helpers.
  • For a gdpar_family_multi object only the first family (family$families[[1]]) is inspected; sub-families at index $\geq 2$ are ignored.
  • If base_family$param_specs is NULL, the function returns NULL (not a list with passed = TRUE), signaling that DID checking is not applicable.
  • No errors are raised; all conditions are evaluated non-destructively.

.check_Z_a_K_per_slot(design_K, rigor, tol)

Purpose Implements layer D-B3 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check of the per-slot additive design matrix $Z_a^{(k)}$ for each slot $k = 1, \dots, K$ in the $K$-individual parameter path. Each $Z_a^{(k)}$ is already column-centered by .build_amm_design_K() before reaching this helper. The check detects zero-norm columns (structural rank deficiency) and, under rigor == "full", computes the normalized Gram condition number and flags slots whose minimum eigenvalue falls below $\text{tol} \cdot \lambda_{\max}$.

Arguments

Argument Type Meaning
design_K list Output of .build_amm_design_K(); must contain Z_a_k_list (list of $K$ matrices), Z_a_k_names_list (list of column-name vectors), slot_names (character vector of length $K$).
rigor character scalar "fast" skips the Gram eigendecomposition (rank is checked only via zero-norm detection); "full" performs the full eigendecomposition.
tol numeric scalar Threshold for the eigenvalue-ratio criterion. A slot passes when $\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}$.

Mathematics

For each slot $k$ with non-empty $Z_a^{(k)} \in \mathbb{R}^{n \times p_k}$:

  1. Column norms. $c_j = \lVert z_j \rVert_2 = \sqrt{\sum_{i=1}^n z_{ij}^2}$ for $j = 1, \dots, p_k$. Any $c_j = 0$ signals an identically-zero column and immediately fails the slot.

  2. Column normalization (full rigor). $$ \widetilde{Z} = Z_a^{(k)} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_{p_k}^{-1}) $$

  3. Normalized Gram matrix. $$ G = \frac{1}{n},\widetilde{Z}^\top \widetilde{Z} ;\in; \mathbb{R}^{p_k \times p_k} $$

  4. Eigendecomposition. $G = V \Lambda V^\top$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_{p_k} \geq 0$.

  5. Rank criterion. $$ \text{ok} = \bigl(\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}\bigr) $$ where $\lambda_{\min} = \lambda_{p_k}$ and $\lambda_{\max} = \lambda_1$.

  6. Collinear-direction extraction. For each eigenvalue $\lambda_j &lt; \text{tol} \cdot \lambda_{\max}$, the corresponding eigenvector $v^{(j)}$ is inspected: components with $|v^{(j)}_i| &gt; 10^{-3}$ are retained and sorted by descending absolute value, yielding the collinear direction report.

Returns

A list:

Field Type Meaning
passed logical scalar TRUE iff every slot passes
rigor character scalar Echo of input
per_slot list of length $K$ (names = slot_names) Per-slot reports (see below)

Each element of per_slot is a list:

Field Type Meaning
slot character Slot name
passed logical Slot-level pass/fail
rigor character Effective rigor for this slot
lambda_min numeric Minimum eigenvalue (NA_real_ if not computed)
lambda_max numeric Maximum eigenvalue (NA_real_ if not computed)
condition_number numeric $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; Inf if zero-norm columns detected
collinear_columns NULL or list NULL when passed; under rigor == "fast" with zero columns, a character vector of offending column names; under rigor == "full" a list of sub-lists each with eigenvalue, columns, coefficients
message character Human-readable diagnostic

Notes

  • Empty matrix shortcut. If $p_k = 0$, the slot passes trivially with lambda_min = lambda_max = condition_number = NA_real_.
  • Zero-norm columns. Detected before any eigendecomposition. The offending column names are taken from Z_a_k_names_list[[k]], falling back to colnames(Z_a_k), falling back to character(ncol(Z_a_k)).
  • rigor == "fast" with no zero-norm columns. Passes unconditionally without Gram computation; diagnostics are NA_real_.
  • Numerical safeguard. The condition number divides by $\max(\lambda_{\min}, \epsilon_{\text{mach}})$ where $\epsilon_{\text{mach}}$ is .Machine$double.eps, preventing division by zero.
  • No S3 dispatch; purely internal utility.

.check_C4_bis_K_cross_slot(design_K, rigor, tol)

Purpose Implements layer D-B2 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check on the column-wise concatenation of per-slot additive design matrices $Z_a^{(1)}, \dots, Z_a^{(K)}$. Even when each individual $Z_a^{(k)}$ is full column rank (verified by .check_Z_a_K_per_slot), the joint matrix can be rank-deficient when the same covariate appears in multiple slots with linearly equivalent designs. This helper detects such cross-slot collinearity.

Arguments

Argument Type Meaning
design_K list Output of .build_amm_design_K(); same structure as for .check_Z_a_K_per_slot.
rigor character scalar "fast" returns a structural pass without eigendecomposition; "full" computes the Gram eigendecomposition of the joint matrix.
tol numeric scalar Eigenvalue-ratio threshold: $\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}$.

Mathematics

  1. Joint construction. For each slot $k$ with $p_k &gt; 0$ columns, prefix every column name with "{slot_name}:" and horizontally concatenate: $$ Z_{\text{joint}} = \bigl[, Z_a^{(1)} ;\big|; Z_a^{(2)} ;\big|; \cdots ;\big|; Z_a^{(K)},\bigr] ;\in; \mathbb{R}^{n \times P} $$ where $P = \sum_{k: p_k &gt; 0} p_k$.

  2. Zero-norm detection. Column norms $c_j = \lVert z_j \rVert_2$. Any $c_j = 0$ immediately fails the check.

  3. Column normalization and Gram matrix (full rigor). $$ \widetilde{Z}{\text{joint}} = Z{\text{joint}} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_P^{-1}), \qquad G = \frac{1}{n},\widetilde{Z}{\text{joint}}^\top \widetilde{Z}{\text{joint}} $$

  4. Eigendecomposition and rank criterion. Identical to the per-slot check: $$ \text{ok} = \bigl(\lambda_{\min}(G) \geq \text{tol} \cdot \lambda_{\max}(G)\bigr) $$

  5. Collinear-direction extraction. Same thresholding procedure as .check_Z_a_K_per_slot: for each eigenvalue below the tolerance band, retain eigenvector components with $|v_i| &gt; 10^{-3}$, sorted by descending absolute value. Column names are prefixed with slot identifiers so the report identifies which slot contributes each column.

Returns

A list:

Field Type Meaning
passed logical scalar TRUE iff the joint matrix passes the rank criterion
rigor character scalar Echo of input
lambda_min numeric Minimum eigenvalue of $G$ (NA_real_ if not computed)
lambda_max numeric Maximum eigenvalue of $G$ (NA_real_ if not computed)
condition_number numeric $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; Inf if zero-norm columns
collinear_directions NULL or list NULL when passed; otherwise a list of sub-lists each with eigenvalue, columns (prefixed names), coefficients
total_columns integer $P$, the total number of columns in $Z_{\text{joint}}$
message character Human-readable diagnostic

Notes

  • Empty joint design. If every slot has $p_k = 0$, the function returns passed = TRUE with total_columns = 0L and NA diagnostics.
  • rigor == "fast". After concatenation (to compute total_columns), the function returns a structural pass without eigendecomposition. Note that zero-norm detection is also skipped under "fast" — unlike .check_Z_a_K_per_slot which at least checks column norms under "fast".
  • Zero-norm columns under "full". Detected before eigendecomposition; the returned collinear_directions contains a single synthetic entry with eigenvalue = 0 and coefficients = rep(1, length(bad)), since the true null direction is degenerate.
  • Column-name resolution. For each slot $k$, names are sourced from Z_a_k_names_list[[k]], then colnames(Z_a_k), then auto-generated "col1", "col2", … via paste0("col", seq_len(ncol(Z_a_k))).
  • No S3 dispatch; purely internal. The returned list is a component of the combined report produced by the companion function .check_identifiability_K.

.check_identifiability_K(design_K, rigor = "full", tol = 1e-8)

Purpose Top-level orchestrator for K-individual identifiability checks in the gdpar pipeline. It aggregates two independent sub-checks—per-slot rank of the anchor design blocks (C1–C3 conditions) and cross-slot Gram-matrix conditioning (C4-bis condition)—into a single logical pass/fail verdict. This function is called during the pre-fit decision layer to determine whether the anchor parameterisation is structurally identifiable.

Arguments

Argument Type Meaning
design_K list Design structure returned by the AMM design builders. Must contain a Z_a_k_list component (list of per-slot anchor design matrices). Passed directly to the two sub-check helpers.
rigor character scalar Checking rigor; matched against c("full", "fast") via match.arg. "full" performs exhaustive eigenvalue/SVD-based rank analysis; "fast" may use cheaper heuristics. Defaults to "full".
tol numeric scalar Numerical tolerance (eigenvalue ratio threshold) for rank decisions. Defaults to 1e-8. Passed through to both sub-check functions.

Mathematics This function does not implement a formula directly; it delegates to:

  1. .check_Z_a_K_per_slot(design_K, rigor, tol) — verifies that each per-slot anchor design matrix $Z_{a,k}$ has full column rank (the C1–C3 conditions).
  2. .check_C4_bis_K_cross_slot(design_K, rigor, tol) — verifies that the cross-slot block Gram matrix is non-singular (the C4-bis condition), i.e., that no linear combination of columns across different $k$-slots creates a collinearity.

The overall verdict is the logical conjunction: the identifiability check passes if and only if both sub-checks pass.

Returns A named list with components:

Component Type Description
passed logical TRUE iff both per_slot_rank$passed and cross_slot_gram$passed are TRUE.
rigor character The resolved rigor level.
tol numeric The tolerance used.
K integer Number of K-individual slots, derived as length(design_K$Z_a_k_list).
per_slot_rank list Full return value of .check_Z_a_K_per_slot.
cross_slot_gram list Full return value of .check_C4_bis_K_cross_slot.

Notes

  • rigor is validated by match.arg; an unrecognised value raises an error before either sub-check executes.
  • Both sub-checks receive the same rigor and tol values, ensuring consistent stringency.
  • K is extracted from the length of Z_a_k_list; if that component is missing or empty, K will be 0 and the sub-checks must handle that case internally.
  • No side effects; purely computational with no global state modification.

.compute_info_ratio_K(fit, family, slot_names, use_groups, prior)

Purpose Implements Block D-B1 of sub-phase 8.3.4: post-fit information contraction analysis per K-individual slot. For each slot's anchor parameter, it computes the prior-to-posterior variance contraction ratio $C_k$ and classifies the result as pass, warn, or information-error. This diagnostic detects slots where the data is essentially uninformative about the anchor, so the posterior merely recovers the prior. The function is internal and invoked after Stan sampling completes.

Arguments

Argument Type Meaning
fit cmdstanr fit object The output of cs_model$sample(). Used to extract posterior draws via fit$draws(format = "draws_matrix").
family list (gdpar_family) A gdpar_family object whose K-individual slots have been promoted to per-observation scope by .gdpar_promote_scope_per_observation(). Its param_specs component is consulted for prior_canonical_kind per slot.
slot_names character vector Canonical slot names, length $K$. Used to name output entries and to construct Stan parameter names.
use_groups integer scalar (0 or 1) Whether the fit used per-group hierarchical anchors. Determines the Stan parameter root: "mu_theta_ref_k" when 1, "theta_ref_k" when 0.
prior list (gdpar_prior) A gdpar_prior object; its priors_by_kind overrides are documented as potentially consulted but are currently inert (the helper falls back to canonical kinds).

Mathematics

The contraction for slot $k$ is:

$$C_k = 1 - \frac{\mathrm{Var}_{\text{post}}(\theta_{\text{ref},k})}{\mathrm{Var}_{\text{prior}}(\theta_{\text{ref},k})}$$

where $\mathrm{Var}_{\text{post}}$ is the sample variance of the posterior draws and $\mathrm{Var}_{\text{prior}}$ is obtained from .gdpar_canonical_prior_variance(kind) for the slot's canonical prior kind.

Decision thresholds (from sub-phase 8.3.4 scoping):

$$\text{status} = \begin{cases} \texttt{"pass"} & \text{if } C_k \ge 0.5 \ \texttt{"warn"} & \text{if } 0.1 \le C_k < 0.5 \ \texttt{"information_error"} & \text{if } C_k < 0.1 \end{cases}$$

Returns A named list with components:

Component Type Description
passed logical TRUE iff no slot triggered "warn" or "information_error".
any_warn logical TRUE if any slot has status "warn".
any_info_error logical TRUE if any slot has status "information_error".
thresholds named numeric vector c(warn = 0.5, information_error = 0.1).
per_slot named list of length $K$ Per-slot diagnostic results (see below).

Each element of per_slot is a named list:

Component Type Description
slot character The slot name from slot_names[k].
var_post numeric Posterior sample variance (or NA_real_ if skipped).
var_prior numeric Prior variance from canonical kind (or NA_real_).
contraction numeric $C_k$ value (or NA_real_ if skipped/non-finite).
status character One of "pass", "warn", "information_error", "skipped".
message character Diagnostic message string.

Notes

  • Early return on missing draws: If fit$draws(format = "draws_matrix") throws an error (caught via tryCatch), the function returns immediately with passed = TRUE and all slots marked "skipped" with message "fit$draws() unavailable". This is a defensive fallback—no diagnostic is raised when draws are inaccessible.
  • Parameter name construction: The Stan parameter name is paste0(param_root, "[1,", k, "]") for both use_groups == 0 and use_groups == 1 (the index structure is [1, k] in both cases, using the first row). The root is "theta_ref_k" or "mu_theta_ref_k" depending on use_groups.
  • Spec resolution: The function attempts to filter family$param_specs for specs with scope in c("per_observation", "per_group"). If the filtered length does not equal $K$, it falls back to taking the first $K$ specs positionally (family$param_specs[seq_len(K)]).
  • Non-finite contraction: If var_prior is Inf, 0, or the ratio produces NaN/Inf, the contraction is NA_real_ and status is "skipped" with message "non-finite contraction".
  • Insufficient draws: If the draws column exists but has fewer than 4 elements (length(draws_k) < 4L), the slot is skipped with message "draws for '<param>' unavailable" (even though draws technically exist, the sample is too small for a reliable variance estimate).
  • No canonical variance: If .gdpar_canonical_prior_variance(kind) returns a non-finite value, the slot is skipped with message reporting the kind.
  • The information-error raises a gdpar_information_error warning class elsewhere in the pipeline; this function itself only classifies—it does not emit warnings or errors.
  • The prior argument is accepted for future extensibility but is currently unused within the function body.

print.gdpar_identifiability_report(x, ...)

Purpose S3 print method for objects of class "gdpar_identifiability_report". Renders a human-readable summary of the identifiability diagnostic to the console, covering all possible sub-report sections: eigenvalue/condition-number diagnostics, collinear directions (C1–C4), C4-bis per-coordinate cross-component checks, and D-ID pre-fit parameter analysis. Exported for user convenience.

Arguments

Argument Type Meaning
x list with S3 class "gdpar_identifiability_report" The identifiability report object. Expected components are detailed in the Returns section below.
... (unused) Present for S3 generic compatibility; ignored.

Returns Invisibly returns x (the input object), following standard R print-method conventions. The primary effect is the side effect of printing formatted text to the console via cat().

Notes

  • Top-level fields: Always prints x$passed. Prints lambda_min, lambda_max, condition_number, and tol_used only if x$lambda_min is not NA. Prints x$rigor_used if non-NULL. Prints x$message unconditionally.
  • Collinear directions (C1–C4): If x$passed is FALSE and x$collinear_directions is non-NULL, iterates over each direction entry, printing the eigenvalue and each column/coefficient pair in the direction vector. Each entry d is expected to have d$eigenvalue (numeric), d$columns (character vector), and d$coefficients (numeric vector of same length as d$columns).
  • C4-bis section: If x$c4_bis is non-NULL, iterates over x$c4_bis$per_k. Each entry pk is expected to have pk$coord, pk$rigor, pk$passed, pk$condition_number (may be Inf), pk$shared_cols (character vector, may be empty), and optionally pk$collinear_directions with the same structure as above.
  • D-ID pre-fit section: If x$did_pre_fit is non-NULL, prints the total K, overall passed flag, and per-parameter details (name, scope, prior_canonical_kind, did_status, did_condition). Also prints symbolic_separability if present, including passed and overlapping_kinds.
  • Formatting: Uses format(..., digits = 4) for eigenvalues and condition numbers, format(..., digits = 3, width = 7) for direction coefficients. Indentation uses fixed strings (" ", " ", " ", " ").
  • No validation is performed on the structure of x; missing components are guarded by is.null / is.na checks. If x lacks expected fields, the corresponding section is silently omitted.
  • The function does not raise errors under normal conditions; it is purely presentational.

Note: The source section also contains the roxygen documentation block for a .check_C7_group_anchor_aliasing function (implementing condition C7 from Block 6.5—detecting aliasing between group indicators and design columns). However, the function definition (signature and body) is not present in this section; it begins in the subsequent section (section 5 of 5). Therefore it is documented there, not here.

.check_group_aliasing_c7(design, group_id, group_var_name, tol = 1e-8)

Purpose
Checks the identifiability condition (C7) of Block 6.5 for group aliasing. Ensures that columns of the design matrices for the $a$ or $b$ components are not constant within groups (which would alias with the per-group anchor $\theta_{\text{ref}[g]}$) and that the combined matrix of group indicators and design columns has full column rank (no indirect aliasing). This function orchestrates per-block checks, handling both univariate and multi‑coordinate designs.

Arguments

  • design: A list containing design matrices. In a univariate setting it should have elements Z_a, Z_b, Z_a_names, Z_b_names. In a multi‑coordinate setting it should have Z_a_list, Z_b_list, Z_a_names_list, Z_b_names_list (each a list of length $p$, the number of coordinates).
  • group_id: A vector of group identifiers. Converted internally to an integer factor. If NULL or with fewer than two levels, the check is skipped.
  • group_var_name: A character string naming the grouping variable (used in error messages).
  • tol: Numeric tolerance for comparing variances and QR‑rank deficiency (default 1e-8).

Mathematics
The function implements two checks per design block (see .check_c7_one_block):

  1. Within‑group variance test: For each column $j$ of a design matrix $Z$, compute the within‑group variance $s_{jg}^2$ for each group $g$. If $s_{jg}^2 \le \text{tol}$ for all groups (i.e., the column is constant within every group), identifiability is violated.
  2. Rank test: Let $G$ be the indicator matrix for the groups. The combined matrix $M = [G \mid Z]$ must have full column rank. If $\mathrm{rank}(M) &lt; \text{ncol}(M)$ (after column normalization), there exists an indirect alias between the group anchor and a linear combination of the design columns.

Returns
invisible(NULL) invisibly. The function is called for its side effects (raising errors).

Notes

  • Internal function (not exported, leading dot).
  • If group_id is NULL or has fewer than two levels, the function returns immediately without checks.
  • For multi‑coordinate designs (has_multi_design is TRUE), the check is applied to each coordinate block separately, iterating over seq_len(p) where p = length(design$Z_a_list).
  • For each block, it calls .check_c7_one_block with the appropriate design sub‑matrix, column names, and coordinate index.
  • Errors are raised via gdpar_abort() with class "gdpar_input_error" and a structured data list containing the component, coordinate (if any), group variable name, and (for variance violations) the names of the aliased columns.

.check_c7_one_block(Z, Z_names, component, coord, group_int, J_groups, group_var_name, tol)

Purpose
Internal helper that applies the two‑layer aliasing check (C7) to a single design block $Z$ (either $Z_a$ or $Z_b$, for a specific coordinate $k$ in a multi‑coordinate design or with coord = NA for univariate). It performs the within‑group variance test and the joint rank test.

Arguments

  • Z: Numeric design matrix (may have zero columns or rows). If NULL, zero columns, or zero rows, the function returns immediately.
  • Z_names: Character vector of column names for Z, used in error messages to identify aliased columns.
  • component: Character string, either "a" or "b", indicating which model component the block belongs to.
  • coord: Integer coordinate index $k$ (for multi‑coordinate designs) or NA_integer_ for univariate. Used only for error message formatting.
  • group_int: Integer vector of group memberships (values in $1, \dots, J_{\text{groups}}$) for each observation.
  • J_groups: Integer number of groups. (Received but not used directly in the computations; group structure is encoded in group_int and the indicator matrix $G$.)
  • group_var_name: Character string naming the grouping variable (for error messages).
  • tol: Numeric tolerance for variance comparisons and QR‑rank deficiency.

Mathematics

  1. Within‑group variance test:
    For each column $j = 1, \dots, p$ (where $p = \text{ncol}(Z)$): $$ s_{jg}^2 = \begin{cases} 0 & \text{if group } g \text{ has fewer than 2 observations}, \ \frac{1}{|g|-1} \sum_{i \in g} (Z_{ij} - \bar{Z}_{\cdot j g})^2 & \text{otherwise}, \end{cases} $$ where $\bar{Z}_{\cdot j g}$ is the mean of column $j$ within group $g$.
    Let $m_j = \max_{g=1}^{J_{\text{groups}}} s_{jg}^2$.
    If $m_j \le \text{tol}$, column $j$ is flagged as constant within every group.
  2. Rank test:
    Construct the group indicator matrix $G \in \mathbb{R}^{n \times J_{\text{groups}}}$ via model.matrix(~ as.factor(group_int) + 0).
    Form $M = [G \mid Z]$. Normalize each column of $M$ by its Euclidean norm (with zero norms replaced by 1 to avoid division by zero).
    Compute the QR decomposition of the normalized matrix: $\text{qr}(M_{\text{norm}})$.
    If $\mathrm{rank}(M_{\text{norm}}) &lt; \text{ncol}(M_{\text{norm}})$, the joint matrix is rank‑deficient, indicating an indirect alias.

Returns
invisible(NULL) if all checks pass.

Notes

  • Internal function (not exported, leading dot), called only by .check_group_aliasing_c7.
  • Early returns with invisible(NULL) if:
    • Z is NULL,
    • ncol(Z) == 0L,
    • nrow(Z) == 0L.
  • Two possible error conditions:
    1. Direct aliasing (constant columns): If any column has within‑group variance $\le \text{tol}$, gdpar_abort() is called with a message listing the offending column names. The error data includes the component, coordinate, group variable name, and aliased_columns.
    2. Indirect aliasing (rank deficiency): If the rank of the normalized combined matrix is less than the number of columns, gdpar_abort() is called with a message including the component, coordinate, group variable name, the observed rank, and the number of columns. The error data includes rank and ncol.
  • Errors are of class "gdpar_input_error" and contain a data list for programmatic handling.
  • The argument J_groups is passed but not used in any computation; the group structure is fully represented by group_int and the generated indicator matrix $G$.

R/compare_eb_fb_methods.R

S3 Methods for gdpar_eb_fb_comparison Objects

This file defines three S3 methods supporting the comparison of Empirical-Bayes (EB) and Fully-Bayes (FB) estimation paths produced elsewhere in the gdpar package. The methods provide console printing, a structured summary, and printing of that summary.


print.gdpar_eb_fb_comparison(x, digits = 3L, ...)

Purpose

S3 print method for objects of class gdpar_eb_fb_comparison. Produces a concise human-readable console summary of an EB-vs-FB comparison, including the estimation families and paths involved, the number of common $\xi$ parameters, marginal summary statistics of the Total Variation (TV) distribution and the EB/FB interval width-ratio distribution, and the first six rows of the per-anchor $\theta$-diff table. Any stored warnings are appended at the end.

Arguments

Argument Type Meaning
x gdpar_eb_fb_comparison (list) The comparison object to display. Expected components: family_eb, family_fb (character), path_eb, path_fb (character), level (numeric), tv_bins (integer scalar), n_common_params (integer scalar), tv_table (data frame with a tv column, possibly NULL), coverage_table (data frame with a width_ratio column, possibly NULL), theta_diff_table (data frame, possibly NULL), warnings (character vector).
digits integer scalar Passed to format() for numeric formatting; defaults to 3L.
... (any) Unused; absorbed for S3 generic compatibility.

Mathematics

No mathematical formula is implemented. The method computes order statistics over finite subsets of two distributions:

  • TV values: $\{v \in \texttt{tv\_table}\$tv : v \in \mathbb{R}\}$, reporting $\min,\ \mathrm{median},\ \max$.
  • Width ratios: $\{r \in \texttt{coverage\_table}\$width\_ratio : r \in \mathbb{R}\}$, reporting $\min,\ \mathrm{median},\ \max$.

Non-finite values (NA, NaN, Inf, -Inf) are excluded via is.finite() before computing summary statistics.

Returns

The object x, returned invisibly (via invisible(x)). The primary effect is console output.

Notes

  • The level component is formatted with format(x$level, digits = digits); its type is not validated.
  • If tv_table is NULL or has zero rows, or if all tv values are non-finite, the marginal TV line is silently omitted.
  • If coverage_table is NULL or has zero rows, or if all width_ratio values are non-finite, the width-ratio line is silently omitted.
  • The $\theta$-diff preview uses utils::head(x$theta_diff_table, 6L) and is passed through format(..., digits = digits) before print().
  • Warnings are printed one per line, prefixed with " - ".
  • No validation or error-checking is performed on the structure of x; missing components would propagate as R errors (e.g., $ on NULL).

summary.gdpar_eb_fb_comparison(object, ...)

Purpose

S3 summary method for objects of class gdpar_eb_fb_comparison. Constructs a structured list suitable for programmatic access and for the canonical print.summary.gdpar_eb_fb_comparison method. Aggregates the TV table and the coverage (width-ratio) table into seven-point summary statistics (count, min, 25th percentile, median, 75th percentile, max, mean).

Arguments

Argument Type Meaning
object gdpar_eb_fb_comparison (list) The comparison object to summarize. Same expected components as for print.gdpar_eb_fb_comparison, plus optionally call.
... (any) Unused; absorbed for S3 generic compatibility.

Mathematics

For the TV distribution, let $V = \{v \in \texttt{tv\_table}\$tv : v \in \mathbb{R}\}$. If $|V| &gt; 0$, the summary stores:

$$ \texttt{tv_summary} = \bigl(n,; \min(V),; Q_{0.25}(V),; \mathrm{median}(V),; Q_{0.75}(V),; \max(V),; \bar{V}\bigr) $$

where $Q_p$ denotes the empirical quantile at probability $p$ (computed via stats::quantile) and $\bar{V} = \frac{1}{n}\sum_{i} v_i$.

For the width-ratio distribution, let $R = \{r \in \texttt{coverage\_table}\$width\_ratio : r \in \mathbb{R}\}$. If $|R| &gt; 0$, the summary stores the analogous seven statistics:

$$ \texttt{coverage_summary} = \bigl(n,; \min(R),; Q_{0.25}(R),; \mathrm{median}(R),; Q_{0.75}(R),; \max(R),; \bar{R}\bigr) $$

If either finite subset is empty, the corresponding summary element is set to NULL.

Returns

A list of class c("summary.gdpar_eb_fb_comparison", "list") with the following components:

Component Type Description
family_eb character Copied from object$family_eb.
family_fb character Copied from object$family_fb.
path_eb character Copied from object$path_eb.
path_fb character Copied from object$path_fb.
level (inherits from object$level) Copied from object$level.
tv_bins integer Copied from object$tv_bins.
n_common_params integer Copied from object$n_common_params.
n_anchor_cells integer 0L if object$theta_diff_table is NULL, otherwise nrow(object$theta_diff_table).
tv_summary list or NULL Seven-element list (n, min, q25, median, q75, max, mean) or NULL.
coverage_summary list or NULL Seven-element list (same structure) or NULL.
theta_diff_table data frame or NULL Copied by reference from object$theta_diff_table.
tv_table data frame or NULL Copied by reference from object$tv_table.
coverage_table data frame or NULL Copied by reference from object$coverage_table.
warnings character `object$warnings %
call (any) Copied from object$call.

Notes

  • The %||% infix operator is used for the warnings default; this operator is assumed to be defined elsewhere in the package (not in this file). Under standard rlang::%||%`` semantics, it returns the left-hand side if it is not NULL, otherwise the right-hand side.
  • Quantiles are computed with stats::quantile at probabilities $0.25$ and $0.75$; the default type = 7 interpolation is used. The names attribute of the scalar result is stripped via unname().
  • The q25 and q75 fields are unnamed scalars; all other numeric fields inherit names from min(), stats::median(), max(), and mean() (typically NULL for these functions on atomic vectors).
  • No copy is made of the table data frames; they are assigned by reference into the output list.
  • The output class vector is c("summary.gdpar_eb_fb_comparison", "list"), enabling dispatch on both the specific class and the implicit list class.

print.summary.gdpar_eb_fb_comparison(x, digits = 3L, ...)

Purpose

S3 print method for objects of class summary.gdpar_eb_fb_comparison. Renders the structured summary to the console, displaying the EB/FB families and paths, the level, the count of common $\xi$ parameters and anchor cells, the full five-point quantile summary plus mean for both the marginal TV distribution and the EB/FB width-ratio distribution, the complete $\theta$-diff table, and any stored warnings.

Arguments

Argument Type Meaning
x summary.gdpar_eb_fb_comparison (list) The summary object to display. Expected components: family_eb, family_fb, path_eb, path_fb (character), level (numeric), n_common_params (integer scalar), n_anchor_cells (integer scalar), tv_summary (list or NULL), coverage_summary (list or NULL), theta_diff_table (data frame or NULL), warnings (character vector).
digits integer scalar Passed to format() for numeric formatting; defaults to 3L.
... (any) Unused; absorbed for S3 generic compatibility.

Mathematics

No mathematical formula is implemented. The method formats and displays pre-computed summary statistics. For each distribution (TV and width-ratio), it prints:

$$ \min,; Q_{0.25},; \mathrm{median},; Q_{0.75},; \max $$

on one line, followed by $\bar{X}$ (the mean) on a separate line.

Returns

The object x, returned invisibly (via invisible(x)). The primary effect is console output.

Notes

  • Unlike print.gdpar_eb_fb_comparison (which shows only the first six rows of theta_diff_table), this method prints the full theta_diff_table via print(format(x$theta_diff_table, digits = digits)).
  • The tv_summary and coverage_summary blocks are each printed only if the corresponding component is non-NULL. Each block includes the sample size n in its header.
  • The level is formatted with format(x$level, digits = digits).
  • Warnings are printed one per line, prefixed with " - ", only if length(x$warnings) > 0L.
  • The tv_bins component is not printed by this method (unlike print.gdpar_eb_fb_comparison), even though it is present in the summary object.
  • No structural validation of x is performed; missing or mistyped components would propagate as R errors.

R/compare_eb_fb.R

gdpar_compare_eb_fb(eb_fit, fb_fit, level = 0.95, tv_bins = 30L, ...)

Purpose (role in the package). Exported orchestrator for Sub-phase 8.6.E (Charter Section 3.5, decision 2.5 Trio of vignettes). It produces a descriptive operational comparison between an Empirical-Bayes fit (gdpar_eb_fit, from gdpar_eb()) and a Fully-Bayes fit (gdpar_fit, from gdpar()) fitted on the same dataset. It does not assert algorithmic equivalence and does not test hypotheses across the two inferential frames. It computes three tables:

  1. Per-anchor-cell differences in the population anchor $\theta_{\text{ref}}$.
  2. Marginal empirical total-variation (TV) distance between the lower-level posteriors of $\xi = (a,\, b,\, W,\, \text{dispersion})$, parameter by parameter.
  3. Operational verification of the higher-order coverage discrepancy (v07 Section 6, Proposition 7B scalar / 7B* matricial / 7B* tensorial) on the nominal EB and FB credible intervals.

Arguments

Argument Type Meaning
eb_fit object of class gdpar_eb_fit Empirical-Bayes fit produced by gdpar_eb(). Covers all four path regimes: K = 1 + p = 1; Path A (K = 1 + p > 1); Path B (K > 1 + p = 1); Path C (K > 1 + p > 1, via the K × p tensor extension of Sub-phase 8.6.D).
fb_fit object of class gdpar_fit Fully-Bayes fit produced by gdpar(). Must have been fitted on the same dataset (same outcome, same covariates, same K / p regime). The comparator does not refit either model.
level numeric scalar Credible-interval level for coverage-discrepancy reporting. Must lie in $(0,\,1)$. Defaults to 0.95.
tv_bins integer scalar Number of histogram bins used to approximate the marginal TV distance per parameter. Must be $\geq 5$. Defaults to 30L. Larger values give a finer empirical TV but require more draws per parameter for stability.
... (any) Reserved for future arguments; currently unused.

Mathematics

The marginal total-variation distance between two distributions $P$ and $Q$ on a single scalar parameter is approximated via a histogram-based plug-in estimator:

$$\widehat{\text{TV}}(P,Q) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat p_j - \hat q_j\bigr|,$$

where $B = \texttt{tv\_bins}$ is the number of bins, $\hat p_j$ and $\hat q_j$ are the relative bin counts (from a common breakpoint grid) of the EB and FB posterior draws respectively. No finite-sample correction is applied. Joint TV across the high-dimensional $\xi$ is out of scope (would require kernel Stein discrepancy or similar density-free metrics); the marginal TV reported per parameter is the operational proxy recommended in v07 Section 11.1.

The coverage-discrepancy table compares EB-nominal vs FB-nominal credible-interval widths at level level per anchor cell, operationally verifying the $\mathcal{O}(n^{-1})$ under-cover claim of Proposition 7B.

Returns

An object of class c("gdpar_eb_fb_comparison", "list") with components:

Component Type / Structure Meaning
theta_diff_table data.frame or NULL Per-anchor-cell comparison of EB vs FB $\theta_{\text{ref}}$ estimates. See .gdpar_eb_fb_theta_diff_table.
tv_table data.frame or NULL Marginal TV distance per common $\xi$ parameter. NULL when EB or FB draws are unavailable or there are zero common parameter names.
coverage_table data.frame or NULL Coverage-discrepancy table (EB-nominal vs FB-nominal IC widths per anchor cell).
level numeric scalar Echo of the level input.
tv_bins integer scalar Echo of the tv_bins input.
n_common_params integer Number of rows in tv_table (or 0L if tv_table is NULL).
path_eb character Path identifier from eb_fit$path, defaulting to "eb".
path_fb character Path identifier from fb_fit$path, defaulting to "fb".
family_eb character eb_fit$family$name, or NA_character_ if absent.
family_fb character fb_fit$family$name, or NA_character_ if absent.
call call The matched call.
warnings character vector Accumulated fallback notification messages from helper extractors. Empty (character(0L)) in the happy path.
meta list Contains mode = "compare_eb_fb" and a human-readable note summarizing the TV and coverage methodology.

Companion S3 methods print.gdpar_eb_fb_comparison and summary.gdpar_eb_fb_comparison are documented separately.

Notes

  • Input validation. eb_fit must inherit "gdpar_eb_fit"; fb_fit must inherit "gdpar_fit". level must be a single numeric value in $(0,1)$. tv_bins must be a single numeric value $\geq 5$ (coerced to integer). Violations raise a gdpar_input_error via gdpar_abort().
  • Required namespace. The posterior package is required (suggested dependency); an informative error is raised if absent.
  • Warning accumulation. A local emit() closure accumulates messages into warnings_msg (and also calls gdpar_warn() for each). Six distinct fallback conditions are checked:
    1. All-FB theta_ref draws are NA (unknown template convention or empty draws).
    2. EB $\xi$ draws are NULL.
    3. FB $\xi$ draws are NULL.
    4. TV table is NULL despite both draw sets being non-NULL (zero common parameter names).
    5. All-FB widths in the coverage table are NA.
  • FB conditional_fit fallback. For the FB $\xi$ draws extraction, the function first tries fb_fit$conditional_fit and falls back to fb_fit$fit via the %||% null-coalescing operator.
  • Path uniformity. All four EB regimes are handled uniformly by the helpers. For Path C the theta_ref_kp_hat tensor is flattened to a length-$Kp$ vector keyed by (slot, coord); the joint $K \times p$ inflation tensor is reported in the coverage table per cell via diagonal-block entries.
  • S3 dispatch. The returned object carries class c("gdpar_eb_fb_comparison", "list") for dispatch by the companion print/summary methods.

.gdpar_eb_fb_theta_diff_table(eb_fit, fb_fit, level)

Purpose (role in the package). Internal helper that builds the theta_diff_table component of the comparison object. It extracts the EB anchor point estimates and their standard errors, attempts to extract FB posterior draws of $\theta_{\text{ref}}$ via .gdpar_eb_fb_extract_theta_ref_draws_fb(), and computes per-anchor-cell differences. The row key structure varies by path regime.

Arguments

Argument Type Meaning
eb_fit object of class gdpar_eb_fit Empirical-Bayes fit. Path regime is inferred from eb_fit$path (checked for "eb_KxP" for Path C). Contains theta_ref_kp_hat / theta_ref_kp_se (Path C), or theta_ref_hat / theta_ref_se (other paths).
fb_fit object of class gdpar_fit Fully-Bayes fit. Passed to the FB draws extractor.
level numeric scalar in $(0,1)$ Credible-interval level. Accepted by the function signature but not used in the current body (reserved for future use or passed downstream elsewhere).

Mathematics

For each anchor cell $i$:

$$\texttt{diff}_i ;=; \widehat{\theta}_{\text{ref},i}^{,\text{EB}} ;-; \overline{\theta}_{\text{ref},i}^{,\text{FB}},$$

$$\texttt{diff_rel}_i ;=; \frac{\widehat{\theta}_{\text{ref},i}^{,\text{EB}} ;-; \overline{\theta}_{\text{ref},i}^{,\text{FB}}}{\text{sd}!\bigl(\theta_{\text{ref},i}^{,\text{FB draws}}\bigr)},$$

where $\overline{\theta}_{\text{ref},i}^{\,\text{FB}}$ and $\text{sd}(\cdot)$ are the posterior mean and posterior standard deviation of the FB draws for cell $i$. The relative difference $\texttt{diff\_rel}_i$ is set to NA_real_ when $\text{sd}(\cdot)$ is not finite or $\leq 0$.

Returns

A data.frame whose structure depends on path regime:

  • Path C (eb_fit$path == "eb_KxP", i.e.\ $K &gt; 1$ and $p &gt; 1$): One row per $(g,\,k,\,c)$ triple ($g = 1,\ldots,J$ groups; $k = 1,\ldots,K$ slots; $c = 1,\ldots,p$ coordinates). Columns:

    Column Type Meaning
    group integer Group index $g$.
    slot character Slot name from eb_fit$slot_names[k].
    coord integer Coordinate index $c$.
    eb_estimate numeric eb_fit$theta_ref_kp_hat[g, k, c].
    eb_se numeric eb_fit$theta_ref_kp_se[g, k, c].
    fb_mean numeric Posterior mean of FB draws for cell $(g,k,c)$; NA_real_ if draws unavailable.
    fb_se numeric Posterior SD of FB draws; NA_real_ if draws unavailable.
    diff numeric $\texttt{eb\_estimate} - \texttt{fb\_mean}$.
    diff_rel numeric $(\texttt{eb\_estimate} - \texttt{fb\_mean}) / \texttt{fb\_se}$, or NA_real_ when fb_se is not finite or $\leq 0$.
  • Non-Path-C (K = 1 + p = 1, Path A, Path B): One row per anchor cell, indexed sequentially. Columns:

    Column Type Meaning
    cell integer Sequential cell index $1,\ldots,n_{\text{cells}}$.
    eb_estimate numeric as.numeric(eb_fit$theta_ref_hat)[i].
    eb_se numeric as.numeric(eb_fit$theta_ref_se)[i].
    fb_mean numeric From fb_draws$flat$means[i]; NA_real_ if absent.
    fb_se numeric From fb_draws$flat$ses[i]; NA_real_ if absent.
    diff numeric $\texttt{eb\_estimate} - \texttt{fb\_mean}$.
    diff_rel numeric Conditional ratio as above; uses vectorized ifelse.

Notes

  • FB draws extraction. Calls .gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit) inside a tryCatch that silently returns NULL on any error. When NULL, all FB columns are filled with NA_real_.
  • Path C iteration. Uses a triple nested for loop over $(g, k, c)$, pre-allocating a list of length $J \cdot K \cdot p$. Each list element is a single-row data.frame, assembled at the end via do.call(rbind, rows). FB draws for each cell are accessed as fb_draws$kp[[g]][[k]][, c].
  • Non-Path-C path. EB estimates and SEs are coerced to numeric vectors via as.numeric(). FB means and SEs are taken from fb_draws$flat$means and fb_draws$flat$ses, with length truncated to min(n_cells, length(fb_draws$flat$means)).
  • Edge case — zero-length FB draws. For Path C, if a cell's FB draw vector is NULL or has length == 0, both fb_mean and fb_se are set to NA_real_.
  • Edge case — zero FB SE. diff_rel is NA_real_ whenever fb_se is NA, not finite, or $\leq 0$.
  • The level argument is accepted but not used in the computation within this helper; it is present for interface consistency with the other helpers.

.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)

Purpose (role in the package). Internal helper that extracts the $\theta_{\text{ref}}$ posterior draws from a Fully-Bayes gdpar_fit object in a path-aware manner. Used by .gdpar_eb_fb_theta_diff_table to obtain the FB posterior summaries ($\overline{\theta}^{\,\text{FB}}$ and $\text{sd}(\theta^{\,\text{FB draws}})$) for comparison against EB point estimates.

Arguments

Argument Type Meaning
fb_fit object of class gdpar_fit Fully-Bayes fit whose Stan draws are to be inspected for $\theta_{\text{ref}}$ variables.

Mathematics

No formula is implemented by this helper; it is a pure data-extraction utility that retrieves posterior draws from the Stan output using the canonical posterior::as_draws_matrix interface.

Returns

A list whose structure depends on the path regime of fb_fit:

Path regime Component Structure Meaning
Non-Path-C (K = 1 + p = 1 / Path A / Path B) flat Named list with elements means (numeric vector) and ses (numeric vector) Posterior means and standard deviations of $\theta_{\text{ref}}$ draws, keyed by cell index. Draw variable names expected under theta_ref[...] or theta_ref_k[...] conventions.
Path C (K > 1 + p > 1) kp Nested list: kp[[g]][[k]] is a matrix with $n_{\text{draws}}$ rows and $p$ columns Posterior draws of $\theta_{\text{ref}}$ per group $g$, slot $k$, and coordinate $c = 1,\ldots,p$. Draw variable names expected under theta_ref_kp[...] convention.

Returns NULL when extraction fails (draws are not present in the recognized variable-name convention, or fb_fit lacks draw data).

Notes

  • Body not in this section. The function body is defined in section 2 of 2; only the roxygen documentation block appears in this section. The documented behavior is as described above.
  • Path C debt. The documentation explicitly notes that the K × p FB template for Path C is itself a follow-on debt of the 8.4 unification effort per Charter and the project_gdpar_deuda_8_4_unificacion_stan debt item.
  • Fail-silent design. The function returns NULL on failure rather than raising an error. The calling orchestrator (gdpar_compare_eb_fb) detects this via downstream NULL checks and emits structured warnings through the emit() closure.
  • @keywords internal / @noRd. This function is internal and does not generate an .Rd help page.

.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)

Purpose Extracts the posterior draws of reference-anchor parameters ($\theta_{\mathrm{ref}}$) from a fitted Forward-Backward (FB) model object. It inspects the stored Stan/MCMC draws and returns them in one of two structural forms—"flat" (a list of posterior means and standard deviations) or "kp" (a nested list of per-group, per-slot, per-coordinate draw matrices)—depending on which variable-naming convention (Path A/B/C) was used during sampling. Returns NULL if no $\theta_{\mathrm{ref}}$ variables are found or if the draws cannot be retrieved.

Arguments

Argument Type Meaning
fb_fit list A fitted FB model object. The function first looks for fb_fit$conditional_fit (an EB conditional fit embedded within the FB workflow) and falls back to fb_fit$fit (the raw FB fit). Each is expected to possess a $draws() method returning a posterior draws object.

Mathematics

No closed-form formula is implemented. The function is a dispatch-and-extraction routine that selects the appropriate draws-dimension based on variable naming:

  • Path C convention: variables matching ^theta_ref_kp\[ (three-index tensor $\theta_{\mathrm{ref}\_kp}[g,k,c]$).
  • Path B convention: variables matching ^theta_ref_k\[ (one-dimensional, per-slot).
  • Path A / default convention: variables matching ^theta_ref(\[|$) (scalar or simple vector).

For the "flat" returns (Paths A and B), the function computes: $$\bar{\theta}j = \frac{1}{S}\sum{s=1}^{S} \theta_j^{(s)}, \qquad \mathrm{sd}j = \sqrt{\frac{1}{S-1}\sum{s=1}^{S}\bigl(\theta_j^{(s)} - \bar{\theta}_j\bigr)^2}$$ where $S$ is the number of posterior draws and $j$ indexes the parameter.

Returns

  • NULL if no draws object is available, if $draws() errors, if variable names are empty, or if no $\theta_{\mathrm{ref}}$ variables are found.
  • list(kp = <nested list>) — Path C: a nested list produced by .gdpar_eb_fb_unpack_kp, indexed as kp[[g]][[k]][, c] with $g=1,\dots,J$ (groups), $k=1,\dots,K$ (slots), $c=1,\dots,p$ (coordinates), each cell containing a numeric vector of posterior draws.
  • list(flat = list(means = <numeric>, ses = <numeric>)) — Paths A/B: unnamed numeric vectors of posterior means and standard deviations, one element per $\theta_{\mathrm{ref}}$ cell.

Notes

  • Uses the null-coalescing operator %||% (from rlang) to choose conditional_fit over fit.
  • Both the $draws() call and the dimnames() access are wrapped in tryCatch, silently returning NULL on error—this is a defensive pattern for partially-initialized or incomplete fit objects.
  • Variable detection proceeds in priority order: Path C (kp) is checked first, then Path B (k), then Path A (default). Only the first match is returned.
  • The comment notes that the Path C "kp" branch is not consumed by the diff table when the EB fit is $K=1, p=1$ (Path A); only the kp branch itself touches that structure.
  • Depends on the posterior package for as_draws_matrix and subset_draws.

.gdpar_eb_fb_unpack_kp(mat, vars_c)

Purpose Unpacks a draws matrix containing Path C-style $\theta_{\mathrm{ref}\_kp}[g,k,c]$ variables into a deeply nested list structure indexed by group $g$, slot $k$, and coordinate $c$. Each leaf is a numeric vector of posterior draws for one tensor cell.

Arguments

Argument Type Meaning
mat posterior::draws_matrix A draws matrix whose columns are the $\theta_{\mathrm{ref}\_kp}$ variables. Column names must follow the pattern theta_ref_kp[g,k,c].
vars_c character Character vector of variable names matching the theta_ref_kp[...] pattern (used for parsing index triples).

Mathematics

The function reconstructs the three-index tensor $$\theta_{\mathrm{ref}_kp} \in \mathbb{R}^{J \times K \times p}$$ from flat column names. For each column name theta_ref_kp[g,k,c], the integer indices $(g, k, c)$ are extracted via regular-expression parsing. The dimensions $J,\ K,\ p$ are inferred as: $$J = \max_g g, \quad K = \max_k k, \quad p = \max_c c$$

Returns

A nested list kp of depth 3:

  • kp[[g]] — list of length $J$ (groups/observations).
  • kp[[g]][[k]] — list of length $K$ (slots).
  • kp[[g]][[k]] — matrix of dimension $S \times p$ ($S$ = number of posterior draws, $p$ = number of coordinates). If a particular theta_ref_kp[g,k,c] column is absent from mat, that column is filled with NA_real_.

Notes

  • The regex "\\[(\\d+),(\\d+),(\\d+)\\]" with regexec captures exactly three comma-separated integers inside square brackets. If a parsed match has fewer than 4 elements (the full match plus 3 groups), c(NA, NA, NA) is substituted.
  • The result matrix kp[[g]][[k]] is initialized to NA_real_ before filling, so any missing column in mat yields NA draws for that cell rather than an error.
  • This function is marked @keywords internal and @noRd; it is not exported.
  • Uses sprintf to reconstruct the expected column name for lookup in mat.

.gdpar_eb_fb_extract_xi_draws(fit_obj)

Purpose Extracts the posterior draws of the "xi" parameter vector (the non-anchor model parameters: fixed-effect coefficients $a$, covariance parameters $c_b$, covariance matrix $W$, and dispersion parameters) from a fitted model object. It explicitly excludes $\theta_{\mathrm{ref}}$ variables, generated quantities ($\eta,\ \log\tilde{lik},\ y_{\mathrm{pred}},\ \theta_i$), raw/packed helper variables, and lp__. This function is used by both the EB conditional fit and the FB fit.

Arguments

Argument Type Meaning
fit_obj list A fitted model object (either EB conditional or FB) expected to have a $draws() method returning a posterior draws object. May be NULL.

Mathematics

No formula is implemented. The function performs a filtering operation on variable names. Variables are retained if they do not match any of the following exclusion patterns (joined by |):

Pattern Excluded variables
^lp__$ Log posterior density
^theta_ref Reference-anchor parameters (all variants)
^mu_theta_ref Mean of reference-anchor hyperparameters
^sigma_theta_ref SD of reference-anchor hyperparameters
^eta Linear predictor generated quantities
^eta_kp Path C linear predictor
^log_lik Pointwise log-likelihood (LOO)
^y_pred Posterior predictive draws
^theta_i Individual-level random effects
^a_raw Raw (non-centered) fixed-effect coefficients
^c_b_raw Raw covariance parameters
^c_b_kp_raw Path C raw covariance parameters
^W_raw Raw covariance matrix elements

Returns

  • NULL if fit_obj is NULL, if $draws() errors, if variable names are empty, or if no variables survive the exclusion filter.
  • A posterior::draws_matrix containing only the retained "xi" columns.

Notes

  • This function is the complement of .gdpar_eb_fb_extract_theta_ref_draws_fb: that function extracts only $\theta_{\mathrm{ref}}$ draws, while this one extracts everything except $\theta_{\mathrm{ref}}$ and generated quantities.
  • Both $draws() and dimnames() accesses are wrapped in tryCatch, silently returning NULL on error.
  • The exclusion list is hard-coded; adding new generated-quantity prefixes would require editing the paste(...) call.
  • Marked @keywords internal, @noRd, not exported.

.gdpar_eb_fb_tv_table(draws_eb, draws_fb, tv_bins)

Purpose Computes a marginal Total Variation (TV) distance between the EB and FB posterior distributions for each parameter common to both draws objects. This provides a diagnostic for how closely the EB approximation matches the full FB posterior on a per-parameter basis.

Arguments

Argument Type Meaning
draws_eb posterior::draws_matrix Posterior draws from the EB (Empirical Bayes) fit. Column names are parameter names.
draws_fb posterior::draws_matrix Posterior draws from the FB (Forward-Backward) fit. Column names are parameter names.
tv_bins integer scalar Number of bins for the shared histogram grid used in the TV computation.

Mathematics

For each parameter $\psi$ common to both draw sets, a shared support grid $[r_{\min}, r_{\max}]$ is constructed where: $$r_{\min} = \min\bigl(\min(x_{\mathrm{eb}}),, \min(x_{\mathrm{fb}})\bigr), \quad r_{\max} = \max\bigl(\max(x_{\mathrm{eb}}),, \max(x_{\mathrm{fb}})\bigr)$$

The grid is divided into $B$ equal-width bins with breakpoints: $$b_j = r_{\min} + j \cdot \frac{r_{\max} - r_{\min}}{B}, \quad j = 0, 1, \dots, B$$

Histogram counts $h_{\mathrm{eb},j}$ and $h_{\mathrm{fb},j}$ are computed over these bins and normalized to empirical pmfs: $$\hat{p}{\mathrm{eb},j} = \frac{h{\mathrm{eb},j}}{\sum_j h_{\mathrm{eb},j}}, \qquad \hat{p}{\mathrm{fb},j} = \frac{h{\mathrm{fb},j}}{\sum_j h_{\mathrm{fb},j}}$$

The marginal TV distance is then the histogram plug-in estimator: $$\widehat{\mathrm{TV}}(\psi) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat{p}{\mathrm{eb},j} - \hat{p}{\mathrm{fb},j}\bigr|$$

Returns

  • NULL if either draws_eb or draws_fb is NULL, or if no common column names exist.
  • A data.frame with one row per common parameter and columns:
Column Type Meaning
parameter character Parameter name (column name in the draws objects).
tv numeric Marginal TV distance ($\in [0,1]$), or NA_real_ if the range is degenerate or non-finite.
n_eb integer Number of EB draws for this parameter.
n_fb integer Number of FB draws for this parameter.
mean_eb numeric Posterior mean from EB draws.
mean_fb numeric Posterior mean from FB draws.

Notes

  • If the combined range rng contains non-finite values or has zero width (diff(rng) <= 0), tv is set to NA_real_ for that parameter.
  • intersect(colnames(draws_eb), colnames(draws_fb)) determines common parameters; order follows draws_eb column order.
  • Uses graphics::hist(..., plot = FALSE) solely for bin counting; no plot is produced.
  • The rows are assembled via do.call(rbind, rows) after a loop over common parameters.
  • Marked @keywords internal, @noRd, not exported.

.gdpar_eb_fb_coverage_table(eb_fit, fb_fit, level)

Purpose Builds a coverage diagnostic table comparing credible-interval widths between the EB and FB posteriors for each $\theta_{\mathrm{ref}}$ cell. The ratio $\mathrm{width}_{\mathrm{eb}} / \mathrm{width}_{\mathrm{fb}}$ operationally diagnoses the $O(n^{-1})$ under-coverage predicted by Proposition 7B (v07 Section 6), with extensions to Path A, Path B, and Path C (v07b Section 5).

Arguments

Argument Type Meaning
eb_fit list A fitted EB model object. Must contain theta_ref_hat, theta_ref_se, and for Path C: theta_ref_kp_hat, theta_ref_kp_se, K, p, slot_names. May contain correction_applied (logical), eb_correction_constant (scalar), correction_tensor_constant (matrix), and path (character).
fb_fit list A fitted FB model object, passed to .gdpar_eb_fb_extract_theta_ref_draws_fb.
level numeric scalar Nominal credible level $\ell \in (0,1)$ (e.g., 0.95).

Mathematics

The significance level and critical value are: $$\alpha = 1 - \ell, \qquad z = \Phi^{-1}!\bigl(1 - \alpha/2\bigr)$$

Path C ($\texttt{eb\_fit\$path} == \texttt{"eb\_KxP"}$): For each cell $(g, k, c)$ with $g=1,\dots,J,\ k=1,\dots,K,\ c=1,\dots,p$:

  • The EB standard error is $\mathrm{se}_{g,k,c}^{\mathrm{eb}} = \mathtt{eb\_fit\$theta\_ref\_kp\_se}[g,k,c]$.
  • If the EB correction was applied and the correction tensor $\mathbf{T} \in \mathbb{R}^{K \times p \times p}$ is available with finite entries, the inflation factor is: $$\mathrm{inflate}_{k,c} = \sqrt{1 + \frac{T[k,c,c]}{\max(1, J)}}$$ Otherwise $\mathrm{inflate}_{k,c} = 1$.
  • The EB credible-interval width is: $$w_{g,k,c}^{\mathrm{eb}} = 2z \cdot \mathrm{se}{g,k,c}^{\mathrm{eb}} \cdot \mathrm{inflate}{k,c}$$
  • The FB credible-interval width is: $$w_{g,k,c}^{\mathrm{fb}} = 2z \cdot \mathrm{sd}\bigl(\theta_{\mathrm{ref}_kp}^{(s)}[g,k,c]\bigr)$$ computed from the FB posterior draws (via .gdpar_eb_fb_extract_theta_ref_draws_fb), or NA_real_ if unavailable.
  • The width ratio is: $$R_{g,k,c} = \frac{w_{g,k,c}^{\mathrm{eb}}}{w_{g,k,c}^{\mathrm{fb}}}$$ set to NA_real_ if $w^{\mathrm{fb}}$ is non-finite or zero.

Path A / Path B (non-Path-C): For each cell $j=1,\dots,J_{\mathrm{flat}}$ (where $J_{\mathrm{flat}} = \mathrm{length}(\mathtt{theta\_ref\_hat})$):

  • $\mathrm{se}_j^{\mathrm{eb}} = \mathtt{eb\_fit\$theta\_ref\_se}[j]$.
  • If eb_fit$correction_applied is TRUE, the scalar inflation factor is: $$\mathrm{inflate} = \sqrt{1 + \frac{c_{\mathrm{eb}}}{\max(1, J_{\mathrm{flat}})}}$$ where $c_{\mathrm{eb}} = \mathtt{eb\_fit\$eb\_correction\_constant}$ (defaulting to 0 if NULL). Otherwise $\mathrm{inflate} = 1$.
  • EB width: $w_j^{\mathrm{eb}} = 2z \cdot \mathrm{se}_j^{\mathrm{eb}} \cdot \mathrm{inflate}$.
  • FB width: $w_j^{\mathrm{fb}} = 2z \cdot \mathrm{sd}_{j}^{\mathrm{fb}}$ from the flat FB draws, or NA_real_ if the FB draws are shorter or unavailable.
  • The width ratio is computed element-wise with ifelse, yielding NA_real_ when $w^{\mathrm{fb}}$ is non-finite or $\leq 0$.

Returns

  • NULL is never explicitly returned (unlike the other functions in this section); instead the function always returns a data.frame.

  • Path C: A data.frame with $J \times K \times p$ rows and columns:

Column Type Meaning
group integer Group index $g$.
slot character Slot name from eb_fit$slot_names[k].
coord integer Coordinate index $c$.
eb_width numeric EB credible-interval width.
fb_width numeric FB credible-interval width (or NA).
width_ratio numeric $\mathrm{eb\_width} / \mathrm{fb\_width}$ (or NA).
inflation numeric The correction inflation factor $\mathrm{inflate}_{k,c}$ (always $\geq 1$ when active).
  • Path A / Path B: A data.frame with $J_{\mathrm{flat}}$ rows and columns:
Column Type Meaning
cell integer Sequential cell index $j = 1, 2, \dots$.
eb_width numeric EB credible-interval width.
fb_width numeric FB credible-interval width (or NA).
width_ratio numeric $\mathrm{eb\_width} / \mathrm{fb\_width}$ (or NA).
inflation numeric Scalar correction inflation factor.

Notes

  • Path C vs. non-Path-C dispatch is determined by identical(eb_fit$path, "eb_KxP").
  • The FB draws extraction (.gdpar_eb_fb_extract_theta_ref_draws_fb) is wrapped in tryCatch; on error fb_draws is set to NULL, and all FB widths will be NA_real_.
  • For Path C, the correction tensor entry tensor[k, c, c] is checked with all(is.finite(tensor[k, c, c])) before computing the inflation; if the tensor is NULL or the entry is non-finite, inflation defaults to 1.
  • For Path A/B, eb_fit$eb_correction_constant defaults to 0 via %||% when NULL.
  • The width_ratio is the key diagnostic: a ratio significantly above 1 indicates the EB credible intervals are wider than the FB intervals, consistent with the $O(n^{-1})$ under-coverage correction described in the referenced theoretical results.
  • Marked @keywords internal, @noRd, not exported.

← Part III — Computational Architecture · gdpar Wiki Home · Part IV — Exhaustive Function Reference (2/7) →

Clone this wiki locally