Skip to content

Part IV Function Reference 2

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 4 revisions

← Part IV — Exhaustive Function Reference (1/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (3/7) →


R/compare_meta_learners_methods.R

print.gdpar_meta_learner_comparison(x, ...)

Purpose S3 print method for objects of class gdpar_meta_learner_comparison. Renders a concise human-readable summary of a meta-learner comparison: the bridge identifier, observation/method counts, the credible level, per-external-method metadata (native CI availability, elapsed time, note count, presence of a predict_fun), and a head view of the three concordance matrices (RMSE, Pearson, MAD).

Arguments

  • x: gdpar_meta_learner_comparison. The comparison object to display. Expected to contain components n_obs, n_methods, level, external (a named list of per-method adapter result lists, each with native_ci, time_sec, notes, has_predict_fun), and comparison (a list with rmse, pearson, mad matrices).
  • ...: any. Unused; present for S3 generic compatibility. Silently ignored.

Mathematics

None.

Returns

Invisibly returns x (via invisible(x)). The side effect is console output.

Notes

  • The method does not validate x; it assumes the structure is present. Missing components would propagate as errors from cat/print.
  • Per-method line format is fixed by sprintf: "- %-12s native_ci = %s time = %.3f s notes = %d predict = %s\n". native_ci and has_predict_fun are coerced to character by sprintf's %s (typically "TRUE"/"FALSE").
  • The three concordance matrices are printed with round(..., 4L); they are expected to be square numeric matrices with shared row/column names of length $m = 1 + n_{\text{methods}}$ (bridge plus externals).
  • S3 dispatch is triggered by print(x) when class(x) contains "gdpar_meta_learner_comparison".

summary.gdpar_meta_learner_comparison(object, ...)

Purpose S3 summary method for gdpar_meta_learner_comparison objects. Constructs a structured long-format summary containing: per-method ATE point estimates, per-method ATE CI bounds (averaged from per-observation native CIs when available, otherwise NA_real_), the three concordance matrices pivoted into long form, and per-method timing/CI-availability metadata.

Arguments

  • object: gdpar_meta_learner_comparison. The comparison object. Must contain bridge_cate (with cate_mean and optionally cate_ci), external (named list of adapter results, each with cate_mean, optionally cate_ci, time_sec, native_ci), comparison (with rmse, pearson, mad), level, n_obs, n_methods.
  • ...: any. Unused; present for S3 generic compatibility.

Mathematics

Per-method ATE is the sample mean of the per-observation CATE posterior means:

$$ \widehat{\mathrm{ATE}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} \widehat{\mathrm{CATE}}_{\text{method},i} $$

When native per-observation CIs are present, the ATE CI bounds are likewise the sample means of the per-observation lower and upper bounds:

$$ \widehat{\mathrm{ATE}}^{\text{lo}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} L_{\text{method},i}, \qquad \widehat{\mathrm{ATE}}^{\text{hi}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} U_{\text{method},i} $$

Otherwise both bounds are NA_real_.

Returns

A list of class c("summary.gdpar_meta_learner_comparison", "list") with components:

  • ate_table: data.frame with columns method (character: "bridge" followed by names(object$external)), ate (numeric), ate_lower (numeric, possibly NA), ate_upper (numeric, possibly NA).
  • metrics: long-format data.frame produced by .comparison_long(object$comparison) with columns method_i, method_j, rmse, pearson, mad (off-diagonal rows only).
  • timing: data.frame with columns method (external method names only — bridge excluded), time_sec (numeric), native_ci (logical).
  • level, n_obs, n_methods: copied verbatim from object.

Notes

  • Calls assert_inherits(object, "gdpar_meta_learner_comparison", "object"); raises an error (presumably of class gdpar_input_error per package convention) if the class is absent.
  • Bridge ATE CI bounds are populated only if object$bridge_cate$cate_ci is non-NULL; otherwise they remain NA_real_.
  • External ATE CI bounds are populated per-method only if e$cate_ci is non-NULL; the code does not consult e$native_ci here, only the presence of cate_ci.
  • ate_vec, ate_lower, ate_upper are named numeric vectors initialized with stats::setNames; the bridge slot is filled first, then external slots in iteration order.
  • The timing data frame excludes the bridge (no timing recorded for it).
  • S3 dispatch is triggered by summary(object) when class(object) contains "gdpar_meta_learner_comparison".

print.summary.gdpar_meta_learner_comparison(x, ...)

Purpose S3 print method for objects of class summary.gdpar_meta_learner_comparison. Prints the credible level, observation count, method count, the ATE table, the timing/CI-availability table, and the first 20 rows of the long-format pairwise concordance metrics.

Arguments

  • x: summary.gdpar_meta_learner_comparison. The summary object produced by summary.gdpar_meta_learner_comparison. Expected components: level, n_obs, n_methods, ate_table, timing, metrics.
  • ...: any. Unused; present for S3 generic compatibility.

Mathematics

None.

Returns

Invisibly returns x (via invisible(x)). The side effect is console output.

Notes

  • Does not validate x.
  • ate_table and timing are printed with row.names = FALSE.
  • metrics is truncated to its first 20 rows via utils::head(x$metrics, 20L); if nrow(x$metrics) > 20L, a sprintf line of the form " ... (%d more rows)\n" is emitted with the count of omitted rows.
  • S3 dispatch is triggered by print(x) when class(x) contains "summary.gdpar_meta_learner_comparison".

.comparison_long(comparison)

Purpose Internal helper that pivots the three square concordance matrices (rmse, pearson, mad) stored in a comparison object into a single long-format data.frame containing one row per ordered off-diagonal method pair $(i, j)$ with $i \neq j$.

Arguments

  • comparison: list. Must contain numeric matrix components rmse, pearson, mad with identical dimensions and shared rownames/colnames. Row names are read from rownames(comparison$rmse).

Mathematics

Given $m \times m$ concordance matrices $R$ (RMSE), $P$ (Pearson), $D$ (MAD) indexed by method names $\{n_1, \ldots, n_m\}$, the long form enumerates all ordered pairs $(i, j)$ with $i \neq j$:

$$ \text{row}_{(i,j)} = \bigl(, n_i,\ n_j,\ R_{ij},\ P_{ij},\ D_{ij},\bigr) $$

Diagonal entries ($i = j$) are skipped via if (i == j) next. The total number of rows is $m(m-1)$.

Returns

A data.frame with columns method_i (character), method_j (character), rmse (numeric), pearson (numeric), mad (numeric), constructed by do.call(rbind, out_rows) over a list of single-row data frames. stringsAsFactors = FALSE is set on each constituent.

Notes

  • Marked @keywords internal and @noRd; not exported.
  • Iteration uses seq_along(nms) for both i and j, so the order is row-major over the upper and lower triangles combined (i.e., both $(i, j)$ and $(j, i)$ appear, but $(i, i)$ is excluded).
  • Assumes rownames(rmse) is non-NULL and that all three matrices share the same dimension and names; no consistency check is performed.
  • The list out_rows is pre-allocated by appending with an incrementing index k; if any i == j is skipped, the corresponding slot is never assigned, but because k is only incremented after assignment, no NULL slots are produced.
  • Returns NULL (from do.call(rbind, list())) if nms is empty.

predict.gdpar_meta_learner_comparison(object, newdata, level = NULL, bridge = NULL, data = NULL, ...)

Purpose S3 predict method for gdpar_meta_learner_comparison objects. Re-evaluates the CATE on a new covariate grid newdata for the bridge component and for every external adapter. Adapters exposing a predict_fun reuse their cached fitted state without refitting; adapters without a usable predict_fun (or whose predict_fun errors) are flagged for refit, their cate_mean is filled with NA_real_, and a gdpar_diagnostic_warning is emitted. The bridge is re-evaluated via predict.gdpar_causal_bridge when real fits are present, otherwise falls back to cached cate_mean/cate_ci only when newdata matches the original observation count.

Arguments

  • object: gdpar_meta_learner_comparison. The comparison object. Must contain level, bridge (a gdpar_causal_bridge or NULL), and external (named list of adapter results, each possibly containing predict_fun, state, native_ci, notes).
  • newdata: data.frame. Required. The new evaluation grid. Must be a data frame (asserted by assert_data_frame).
  • level: numeric(1) or NULL. Optional credible level in $(10^{-3}, 1 - 10^{-3})$ overriding object$level. Defaults to NULL, which reuses object$level. Validated by assert_numeric_scalar(level, "level", lower = 1e-3, upper = 1 - 1e-3) when non-NULL.
  • bridge: gdpar_causal_bridge or NULL. Optional replacement bridge object used instead of object$bridge. Defaults to NULL (use cached bridge). Useful when the cached bridge was stripped (e.g., after a saveRDS round-trip that lost the two fits).
  • data: named list with components X, T, Y (and optionally X_newdata) or NULL. Reserved for the case of a forced re-fit. Defaults to NULL. Note: the current implementation does not consume data at all — it is accepted but never referenced in the body.
  • ...: any. Reserved for future arguments; currently unused.

Mathematics

Let $X_{\text{new}} \in \mathbb{R}^{n_{\text{new}} \times p}$ be the covariate matrix extracted from newdata. For each external method $m$ with a working predict_fun $f_m$:

$$ \widehat{\mathrm{CATE}}_{m}^{\text{new}} = f_m(\text{state}_m,\ X_{\text{new}},\ \text{level}) $$

The bridge prediction is

$$ \widehat{\mathrm{CATE}}_{\text{bridge}}^{\text{new}} = \mathrm{predict}(\text{bridge},\ \text{newdata},\ \text{level},\ \text{summary} = \text{"mean_ci"}) $$

when real fits are present. The concordance metrics are then recomputed over the vector of method-specific CATE means via .compute_comparison_metrics(cate_list).

Returns

A list of class c("predict.gdpar_meta_learner_comparison", "list") with components:

  • bridge: list with cate_mean and cate_ci (from bridge_pred).
  • external: named list mirroring object$external names; each entry is a list with cate_mean (numeric, possibly all NA_real_), cate_ci (matrix or NULL), method (character), native_ci (logical), time_sec (NA_real_), notes (character vector, augmented with a status message).
  • comparison: result of .compute_comparison_metrics(cate_list) — a list of concordance matrices.
  • newdata: the input newdata (stored verbatim).
  • level: the resolved numeric level.

Notes

  • Calls assert_inherits(object, "gdpar_meta_learner_comparison", "object") and assert_data_frame(newdata, "newdata") up front.
  • If bridge_obj (resolved from bridge or object$bridge) does not inherit from "gdpar_causal_bridge", the function aborts via gdpar_abort with class "gdpar_input_error" and a message instructing the user to pass a bridge via the bridge argument.
  • The outcome name is recovered via .bridge_outcome_name(bridge_obj$fits$treat, bridge_obj$fits$ctrl), and covariates are extracted from newdata via .extract_covariates(newdata, outcome_name) (presumably dropping the outcome column).
  • Bridge re-evaluation branches on has_real_fits:
    • If both bridge_obj$fits$treat$fit and bridge_obj$fits$ctrl$fit are non-NULL, calls stats::predict(bridge_obj, newdata = newdata, level = level, summary = "mean_ci").
    • Otherwise, falls back to cached bridge_obj$cate_mean / bridge_obj$cate_ci only if nrow(newdata) == bridge_obj$n_obs; otherwise cate_mean is rep(NA_real_, nrow(newdata)) and cate_ci is NULL.
  • For each external method:
    • If e$predict_fun is a function, it is invoked as pf(state = e$state, X_newdata = X_newdata, level = level) inside tryCatch. On success, cate_mean is coerced via as.numeric(out$cate_mean), cate_ci is taken as out$cate_ci, native_ci is e$native_ci && !is.null(out$cate_ci), and notes is augmented with "reused cached state via predict_fun". On error, the method is added to needs_refit, cate_mean is rep(NA_real_, nrow(newdata)), cate_ci is NULL, native_ci is FALSE, and notes is augmented with "predict_fun failed: <message>".
    • If no predict_fun, the method is added to needs_refit, cate_mean is rep(NA_real_, nrow(newdata)), cate_ci is NULL, native_ci is FALSE, and notes is augmented with "predict_fun unavailable; a full refit would be required".
  • If length(needs_refit) > 0L, a warning of class "gdpar_diagnostic_warning" is emitted via gdpar_warn with data = list(needs_refit = needs_refit) and a message listing the affected adapters, advising the user to rebuild the comparison with gdpar_compare_meta_learners().
  • time_sec is always set to NA_real_ for every external entry (no timing is recorded for prediction).
  • The data argument is declared and documented but not used in the body; no refit path is actually implemented despite the documentation mentioning fit_predict_fun. The function only reuses cached state or returns NA predictions.
  • cate_list is constructed as c(list(bridge = as.numeric(bridge_pred$cate_mean)), lapply(external, function(e) e$cate_mean)) and passed to .compute_comparison_metrics; the resulting matrices therefore have row/column names "bridge" followed by the external method names (subject to .compute_comparison_metrics's naming behavior).
  • S3 dispatch is triggered by predict(object, newdata = ...) when class(object) contains "gdpar_meta_learner_comparison".

R/compare_meta_learners.R

gdpar_compare_meta_learners(bridge, methods, newdata = NULL, data = NULL, seed = NULL, ...)

Purpose
Orchestrates a descriptive comparison of the T-learner (AMM-side) embedded in a gdpar_causal_bridge object against one or more external meta-learners (e.g., grf, EconML). It evaluates each method on a common evaluation grid, reports point/posterior CATE estimates and their native confidence intervals, and computes three concordance metrics (RMSE, Pearson correlation, mean absolute discrepancy) between every ordered pair of methods on their point/posterior CATE estimates. It does not perform hypothesis tests.

Arguments

  • bridge: Object of class gdpar_causal_bridge (from gdpar_causal_bridge()). Contains two fitted gdpar objects (treatment and control arms), precomputed CATE estimates, and metadata.
  • methods: Non-empty list of gdpar_meta_learner_adapter objects. Each adapter wraps a specific external meta-learner implementation (e.g., gdpar_adapter_grf()).
  • newdata: Optional data.frame on which to evaluate CATE. Defaults to the evaluation grid stored in bridge$newdata.
  • data: Optional list with components X (covariate data.frame), T (integer 0/1 treatment vector), Y (numeric outcome vector). Used to supply training data explicitly if it cannot be recovered from the bridge's stored calls (e.g., when the original data is not in the calling environment).
  • seed: Optional integer scalar. Propagated to each adapter's fit_predict_fun as seed_run for reproducibility.
  • ...: Reserved for future arguments; currently unused.

Mathematics
For every ordered pair of methods $(i, j)$ (including the bridge, indexed as "bridge"), computes the following concordance metrics on the point/posterior CATE estimates $\widehat{\tau}$: $$ \mathrm{RMSE}{ij} = \sqrt{\frac{1}{N}\sum{n=1}^N \left( \widehat{\tau}_i(x_n) - \widehat{\tau}j(x_n) \right)^2} $$ $$ \mathrm{Pearson}{ij} = \mathrm{cor}\left( \widehat{\tau}i, \widehat{\tau}j \right) $$ $$ \mathrm{MAD}{ij} = \frac{1}{N}\sum{n=1}^N \left| \widehat{\tau}_i(x_n) - \widehat{\tau}_j(x_n) \right| $$ where $N$ is the number of evaluation points (n_obs). Confidence intervals are not pooled because their inferential origins (Bayesian posterior, asymptotic, bootstrap) are heterogeneous.

Returns
An object of class gdpar_meta_learner_comparison (a list) with components:

  • bridge_cate: List with cate_mean (numeric vector of bridge CATE point estimates) and cate_ci (matrix of bridge credible intervals).
  • bridge: The original gdpar_causal_bridge object.
  • external: Named list of results for each external adapter. Each element contains cate_mean, cate_ci, method, native_ci (logical), time_sec, notes, state (from the adapter), and the adapter's predict_fun/fit_predict_fun if provided.
  • comparison: Matrix of concordance metrics (RMSE, Pearson, MAD) between all method pairs.
  • newdata: The evaluation grid (data.frame) used.
  • level: The confidence level (numeric) used for intervals.
  • n_obs: Integer number of evaluation points.
  • n_methods: Integer total number of methods compared (bridge + external).
  • call: The matched call.
  • meta: List of metadata including package version, timestamp, seed, original bridge call, and adapter specifications.

Notes

  • Scalar-outcome restriction: Rejects bridges with dim_kind != "scalar" (i.e., distributional or multivariate regression) via .guard_scalar_outcome().
  • Method names: If the methods list is unnamed, names are taken from each adapter's $name field. Duplicate names cause an error.
  • Dataset recovery: If data is NULL, the function attempts to reconstruct the training data from the bridge's stored calls using .assemble_bridge_dataset(). If this fails, the user must supply data explicitly.
  • Adapter validation: Each adapter is checked for unmet software requirements (R packages, Python modules) via .check_adapter_requirements(). Missing dependencies cause a gdpar_missing_dependency_error.
  • Adapter output validation: Results from each adapter are checked with .validate_adapter_output() for correct length and structure.
  • Bridge CATE recomputation: If newdata differs from the bridge's original grid and lengths mismatch, the bridge CATE is re-predicted using stats::predict(bridge, ...).

.guard_scalar_outcome(bridge)

Purpose
Internal validation function ensuring the bridge was constructed from scalar-outcome fits (i.e., dim_kind == "scalar"). Rejects bridges from distributional regression (K > 1) or multivariate response (p > 1) with a specific error, as multi-output external adapters are not supported in the current scope (Sub-phase 8.5.B).

Arguments

  • bridge: Object of class gdpar_causal_bridge. Its $meta$dim_kind component is inspected.

Returns
invisible(NULL) if the bridge is scalar. Otherwise, raises a gdpar_unsupported_feature_error.

Notes

  • Uses the null-coalescing operator %||% to default "scalar" if dim_kind is missing.
  • The error message references "Sub-phase 8.5.B" and queues multi-output support for "Block 9" per the package roadmap.

.assemble_bridge_dataset(bridge, newdata, data, eval_env)

Purpose
Internal helper that constructs the unified training dataset (X, T, Y) required by external meta-learner adapters. It either uses an explicitly provided data argument or attempts to recover the training data from the bridge's stored fits by evaluating their captured call objects in the specified environment.

Arguments

  • bridge: gdpar_causal_bridge object.
  • newdata: data.frame of evaluation covariates (the CATE grid).
  • data: Optional list with components X, T, Y. If supplied, it is used directly.
  • eval_env: Environment in which to evaluate the bridge's stored calls (typically parent.frame() of the caller).

Mathematics
When data is NULL, the algorithm for each arm (treatment, control) is:

  1. Recover the arm's training dataset via eval(fit$call$data, eval_env).
  2. Identify the outcome variable name from the LHS of fit$call$formula.
  3. Extract the covariate matrix X_arm (all columns except the outcome) and outcome vector Y_arm.
  4. Create treatment indicator T_arm = 1L (treatment) or 0L (control).
  5. Stack the two arms row-wise to form (X, T, Y).

Returns
A list with components:

  • X: data.frame of stacked covariates (rows = training observations from both arms).
  • T: Integer vector of treatment indicators (0/1).
  • Y: Numeric vector of outcomes.
  • X_newdata: The supplied newdata (unchanged).
  • outcome_name: Character string of the outcome variable name.

Notes

  • If evaluation of a fit's call$data fails (e.g., the object is not in eval_env), the function aborts with a gdpar_input_error advising the user to pass data explicitly.
  • The helper ensures the covariate column order and types are consistent between training and evaluation data.
  • The function is responsible for ensuring that the stacked dataset aligns with the external adapter's expectations (i.e., X is a data frame, T is integer 0/1, Y is numeric).

.assemble_bridge_dataset(bridge, newdata, data, eval_env)

Purpose Assembles the unified training dataset and newdata covariate matrix required by the meta-learner comparison machinery. It either accepts an explicitly supplied data argument (a list with X, T, Y, and optionally X_newdata) or attempts to recover the original training data from the captured call objects inside the treatment and control fits of a bridge object. In both paths it validates consistency, combines arms, and returns a standardized list.

Arguments

Argument Type Meaning
bridge list A bridge object with component fits$treat and fits$ctrl, each a fitted model object that carries a $call element.
newdata data.frame (or coercible) New covariate data for which predictions will be compared. Must contain every covariate column present in the training data.
data list or NULL Optional explicit data. If non-NULL it must be a named list with components X (covariate matrix/data.frame), T (integer treatment indicator, values 0/1), Y (numeric outcome), and optionally X_newdata (covariate data.frame for newdata; if absent, covariates are extracted from newdata).
eval_env environment The environment in which fit call expressions (e.g. cl$data) are evaluated when recovering training data from the fitted objects.

Mathematics

No formula is implemented. The function performs data assembly only.

Training data from two arms are row-bound with treatment indicators prepended:

$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}^{(t)} \ \mathbf{Y}^{(c)} \end{bmatrix}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{1}_{n_t} \ \mathbf{0}_{n_c} \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} \mathbf{X}^{(t)} \ \mathbf{X}^{(c)} \end{bmatrix}$$

where $n_t$ and $n_c$ are the row counts of the treatment and control recovered data frames respectively.

Returns

A named list with five components:

Component Type Description
X data.frame Training covariates (outcome column removed). Row count equals $n_t + n_c$.
T integer vector Treatment indicator, length $n_t + n_c$, values 1L (treatment arm first) then 0L (control arm).
Y numeric vector Outcome values, treatment arm first then control arm, length $n_t + n_c$.
X_newdata data.frame Covariates for newdata, column subset/reordered to match X. Row count equals nrow(newdata).
outcome_name character scalar The name of the outcome variable, inferred by .bridge_outcome_name().

Notes

  • Explicit data path (data is non-NULL):

    • data must be a list with named components X, T, Y. If any is missing, a gdpar_input_error is raised.
    • X is coerced to data.frame if it is not one already (with stringsAsFactors = FALSE).
    • T is coerced to integer; Y to numeric.
    • Lengths of T, Y, and nrow(X) must agree; otherwise a gdpar_input_error is raised with a diagnostic sprintf message.
    • T must contain only 0L and 1L; otherwise a gdpar_input_error is raised.
    • If data$X_newdata is present it is used (coerced to data.frame if needed); otherwise covariates are extracted from newdata by calling .extract_covariates(newdata, outcome_name).
    • The function returns immediately via return(...) without any data-recovery attempt.
  • Recovery path (data is NULL):

    • The internal function recover(fit) evaluates fit$call$data in eval_env. If the call is NULL, the data component is NULL, or evaluation throws an error, NULL is returned.
    • If either recovered data is NULL or not a data.frame, a gdpar_input_error is raised with extra data fields treat_recovered and ctrl_recovered.
    • The outcome variable name (from .bridge_outcome_name()) must appear as a column in both recovered data frames; otherwise a gdpar_input_error is raised.
    • The column sets of the two recovered data frames must be identical (after sorting); otherwise a gdpar_input_error is raised listing both column sets.
    • Both data frames are subset to their common columns, then row-bound. The outcome column is removed from X via .extract_covariates().
    • newdata covariates are extracted and checked for missing columns present in X; if any are missing a gdpar_input_error is raised listing them. X_newdata is then reordered/subset to match X's columns exactly.
  • All errors are raised via gdpar_abort() with appropriate condition classes (gdpar_input_error, gdpar_unsupported_feature_error).


.bridge_outcome_name(fit_t, fit_c)

Purpose Infers the name of the outcome (response) variable from the captured formula calls of the treatment and control fits within a bridge object. This is used internally by .assemble_bridge_dataset() and other comparison functions.

Arguments

Argument Type Meaning
fit_t fitted model object Treatment-arm fit. Must carry a $call element, ideally with a $formula component.
fit_c fitted model object Control-arm fit. Same expectations as fit_t.

Mathematics

No formula. The function performs formula introspection.

Returns

A single character string giving the outcome variable name. If both fits resolve to the same name, that name is returned. If only one resolves, the resolved name is returned. If neither resolves or they disagree, an error is raised (see Notes).

Notes

  • The internal helper pick(fit) attempts three strategies in order to extract the LHS of a two-sided formula from fit$call$formula:

    1. Evaluate cl$formula in the environment of fit$call. If the result is a formula of length 3 (two-sided), extract as.character(fm[[2L]]).
    2. If cl$formula itself is a call or name, attempt to evaluate it with bare eval(). If the result is a two-sided formula, extract the LHS.
    3. If cl$formula is a call of length 3 whose first element is ~, directly extract as.character(cl$formula[[2L]]).
    4. If all strategies fail, NA_character_ is returned.
  • If both n_t and n_c are NA, a gdpar_input_error is raised advising the user to pass an explicit data argument.

  • If both are non-NA but differ (!identical(n_t, n_c)), a gdpar_unsupported_feature_error is raised listing both names. This means the two fits must model the same outcome variable.

  • If exactly one is NA, the non-NA value is returned (i.e., n_c when n_t is NA, otherwise n_t).


.extract_covariates(df, outcome_name)

Purpose Removes the outcome column from a data frame (or object coercible to one) and returns only the covariate columns. Used throughout the comparison pipeline to separate predictors from response.

Arguments

Argument Type Meaning
df data.frame or coercible A data frame whose columns include covariates and the outcome.
outcome_name character scalar The name of the outcome column to drop.

Mathematics

None.

Returns

A data.frame containing all columns of df except the one named outcome_name. If outcome_name is not present in colnames(df), all columns are returned (since setdiff returns the full set). The drop = FALSE argument ensures the result is always a data frame even if a single column remains.

Notes

  • If df is not already a data.frame, it is coerced via as.data.frame(df, stringsAsFactors = FALSE).
  • This is a utility function; no errors are raised by it directly.

.validate_adapter_output(result, n_newdata, adapter_name)

Purpose Validates that the return value of a meta-learner adapter conforms to the expected shape and types. Called internally after each adapter invocation to enforce the adapter interface contract.

Arguments

Argument Type Meaning
result list The object returned by an adapter. Must contain at minimum a cate_mean component. May optionally contain cate_ci.
n_newdata integer scalar The expected number of rows (observations) in the newdata, i.e., the required length of cate_mean and row count of cate_ci.
adapter_name character scalar Human-readable name of the adapter, used in error messages.

Mathematics

None.

Returns

Invisibly returns NULL (invisible(NULL)). Side-effect–only function: raises errors if validation fails.

Notes

  • First check: result must be a list and must have an element named "cate_mean". If not, a gdpar_internal_error is raised.
  • Second check: result$cate_mean must be a numeric vector of length exactly n_newdata. If not, a gdpar_internal_error is raised with a diagnostic sprintf.
  • Third check (conditional): If result$cate_ci is non-NULL, it must be a matrix with nrow == n_newdata and ncol == 2L (lower and upper bounds). If not, a gdpar_internal_error is raised reporting the actual dimensions.
  • All errors use class "gdpar_internal_error", indicating a programming error in the adapter rather than user input.

.compute_comparison_metrics(cate_list)

Purpose Computes three pairwise concordance/similarity matrices across a list of CATE (Conditional Average Treatment Effect) estimate vectors. These matrices quantify the agreement between different meta-learner methods on the same newdata.

Arguments

Argument Type Meaning
cate_list list of numeric vectors Each element is a numeric vector of CATE predictions for the same newdata observations. All vectors must have the same length. List element names (if present) are used as row/column labels; otherwise names m1, m2, … are generated.

Mathematics

Let $m = \texttt{length(cate\_list)}$ and let $\hat{\tau}_i \in \mathbb{R}^n$ denote the $i$-th CATE vector ($i = 1, \ldots, m$). Three $m \times m$ matrices are computed:

Root Mean Squared Error (RMSE):

$$\text{RMSE}_{ij} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left(\hat{\tau}_{i,k} - \hat{\tau}_{j,k}\right)^2}, \quad i \neq j$$

Diagonal: $\text{RMSE}_{ii} = 0$.

Mean Absolute Deviation (MAD):

$$\text{MAD}_{ij} = \frac{1}{n} \sum_{k=1}^{n} \left|\hat{\tau}_{i,k} - \hat{\tau}_{j,k}\right|, \quad i \neq j$$

Diagonal: $\text{MAD}_{ii} = 0$.

Pearson Correlation:

$$\text{Pearson}_{ij} = \text{Cor}!\left(\hat{\tau}_i, \hat{\tau}_j\right), \quad i \neq j$$

Diagonal: $\text{Pearson}_{ii} = 1$. Note that correlation is computed only for $i &lt; j$ and then copied symmetrically: $\text{Pearson}_{ji} = \text{Pearson}_{ij}$.

Returns

A named list with three components:

Component Type Description
rmse matrix ($m \times m$) Pairwise RMSE. Diagonal is 0. Symmetric. Dimnames are the method names.
pearson matrix ($m \times m$) Pairwise Pearson correlation. Diagonal is 1. Symmetric. Dimnames are the method names.
mad matrix ($m \times m$) Pairwise MAD. Diagonal is 0. Symmetric. Dimnames are the method names.

Notes

  • All CATE vectors in cate_list are column-bound into a single matrix M via do.call(cbind, cate_list). This requires all vectors to have the same length; no explicit check is performed—cbind will recycle or error if lengths differ.
  • If cate_list has no names, synthetic names "m1", "m2", … are assigned and propagated to dimnames.
  • The double loop iterates over all $(i, j)$ pairs with $i \neq j$. For RMSE and MAD, each off-diagonal entry is written once. For Pearson, the loop only computes the correlation when $i &lt; j$ (using stats::cor()) and mirrors the value to $[j, i]$. This avoids redundant correlation calls.
  • stats::cor() is wrapped in suppressWarnings() to silence warnings about constant vectors (which yield NA correlations).
  • The matrices are not guaranteed to be perfectly symmetric due to floating-point considerations in the $i \neq j$ case for RMSE and MAD (each pair is computed only once and written to one cell; the symmetric cell is left at the diagonal-init value). Specifically, rmse[i,j] is set for all $i \neq j$ in the inner loop, so both $[i,j]$ and $[j,i]$ are filled (the loop visits both orderings since i == j is the only skip). The Pearson matrix is explicitly symmetric because only $i &lt; j$ is computed and mirrored.

R/contraction_diagnostic.R

R/contraction_diagnostic.R

gdpar_contraction_diagnostic(fit, data, sizes = NULL, replicates = 1L, parameters = NULL, level = 0.95, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose

Empirical posterior contraction-rate diagnostic for a fitted Path 1 (Bayesian) gdpar model. It is an opt-in, computationally expensive methodological audit tool that refits the model at multiple subsample sizes, records the median posterior credible-interval width across user-facing parameters at each size, and fits an ordinary-least-squares regression of log-width on log-sample-size. The estimated slope is compared against the theoretical parametric contraction rate $n^{-1/2}$ predicted by Theorem 4B of Block 4. The function does not modify the original fit; it returns a standalone report.

Arguments

Argument Type Meaning
fit gdpar_fit A fitted model object produced by gdpar with path = "bayes". Must inherit from class "gdpar_fit". The original fit$call is extracted and modified to produce subsampled refits.
data data frame The data frame originally passed to gdpar, or another data frame compatible with the AMM specification of fit. Its row count $n$ governs subsample-size validation and sampling.
sizes NULL or numeric vector Subsample sizes at which to refit. If NULL (default), a length-five geometric sequence is generated between $\max(20, \lceil n/8 \rceil)$ and $n$. Entries must lie in $[5, n]$.
replicates integer scalar (count) Number of independent subsamples drawn per size. Defaults to 1L. Higher values reduce Monte Carlo variance of the log-width curve at additional computational cost. Must be a non-negative integer (validated by assert_count).
parameters NULL or character vector Optional explicit list of posterior variable names to include in the credible-width calculation. If NULL (default), the function auto-selects user-facing parameters by filtering out variables matching the internal ignore pattern.
level numeric scalar in $(0, 1)$ Nominal credible level for interval-width computation. Defaults to 0.95. The interval is formed from the $\alpha/2$ and $1 - \alpha/2$ quantiles where $\alpha = 1 - \text{level}$.
iter_warmup integer scalar (count) Warmup iterations for each refit. Defaults to 500L. Forwarded to gdpar via the modified call.
iter_sampling integer scalar (count) Sampling iterations for each refit. Defaults to 500L. Forwarded to gdpar via the modified call.
chains integer scalar (count) Number of MCMC chains per refit. Defaults to 2L. Forwarded to gdpar via the modified call.
verbose logical scalar (length 1) If TRUE, prints a cost message via gdpar_inform before starting the refits. Defaults to TRUE. Must be a single logical value.
... any Additional arguments forwarded to gdpar through the modified refit call.

Mathematics

The diagnostic fits the linear regression

$$ \log(\text{width}_i) = \alpha + \beta ,\log(n_i) + \varepsilon_i $$

where $\text{width}_i$ is the median credible-interval width across the selected parameters at subsample size $n_i$. For each parameter $\theta_j$ and each refit, the credible-interval width is

$$ \text{width}_j = q_{1-\alpha/2}(\theta_j) - q_{\alpha/2}(\theta_j), $$

with $\alpha = 1 - \text{level}$, and the cell-level summary is

$$ \text{median_width} = \mathrm{median}_j(\text{width}_j). $$

The slope $\hat\beta$ and its standard error $\mathrm{SE}(\hat\beta)$ are extracted from stats::lm. An approximate 95% confidence interval for $\beta$ is

$$ [\hat\beta - 1.96,\mathrm{SE}(\hat\beta),;; \hat\beta + 1.96,\mathrm{SE}(\hat\beta)]. $$

The verdict logic compares this interval against the theoretical target $(-0.6,\,-0.4)$:

$$ \text{verdict} = \begin{cases} \text{Consistent with parametric } n^{-1/2} \text{ rate.} & \text{if } \hat\beta_{\text{upper}} \ge -0.6 ;\text{and}; \hat\beta_{\text{lower}} \le -0.4, \[4pt] \text{Faster than } n^{-1/2}\text{; check for spurious artefacts.} & \text{if } \hat\beta_{\text{upper}} < -0.6, \[4pt] \text{Slower than } n^{-1/2}\text{; check for prior misspecification or model misspecification.} & \text{otherwise.} \end{cases} $$

The first branch tests whether the 95% CI overlaps the interval $[-0.6, -0.4]$.

The default subsample sizes, when sizes = NULL, are generated as

$$ \text{sizes} = \mathrm{unique}!\left(\mathrm{round}!\left(\exp!\left(\mathrm{seq}!\left(\log!\left(\max(20,,\lceil n/8\rceil)\right),;\log n,;\text{length.out}=5\right)\right)\right)\right). $$

Returns

A list of class c("gdpar_contraction_report", "list") with components:

Component Type Description
table data frame Columns n (subsample size), replicate (replicate index), median_width (median credible-interval width, NA_real_ if the refit failed). One row per (size, replicate) cell.
slope_estimate numeric scalar OLS slope $\hat\beta$ from lm(log_w ~ log_n), with names stripped via unname.
slope_se numeric scalar Standard error of $\hat\beta$, with names stripped.
slope_ci_lower numeric scalar Lower bound $\hat\beta - 1.96\,\mathrm{SE}(\hat\beta)$.
slope_ci_upper numeric scalar Upper bound $\hat\beta + 1.96\,\mathrm{SE}(\hat\beta)$.
verdict character One of three verdict strings (see Mathematics).
level numeric scalar The credible level used (echoed from the level argument).
warnings character vector Per-refit failure messages; empty if all refits succeeded.

Notes

  • Input validation. Calls assert_inherits(fit, "gdpar_fit", ...), assert_data_frame(data, ...), assert_count(replicates, ...), assert_numeric_scalar(level, ..., lower = 0, upper = 1), assert_count(iter_warmup, ...), assert_count(iter_sampling, ...), assert_count(chains, ...). The verbose argument is checked inline: if not a length-1 logical, gdpar_abort is called with class "gdpar_input_error". The sizes argument, when non-NULL, is validated inline: if not numeric, or if any entry is $&lt; 5$ or $&gt; n$, gdpar_abort is called with class "gdpar_input_error" and a message formatted via sprintf.
  • Suggested-package dependencies. Calls require_suggested("cmdstanr", ...) and require_suggested("posterior", ...). If either is unavailable, an error is raised by that helper.
  • Cost message. When verbose = TRUE, emits a gdpar_inform message of class "gdpar_optin_message" stating the total number of refits (length(sizes) * replicates).
  • Refit call construction. The original fit$call is copied and modified: data is set to quote(sub); iter_warmup, iter_sampling, chains are overwritten from the corresponding arguments; verbose is set to FALSE; refresh is set to 0L; skip_id_check is set to TRUE. The modified call is eval-uated in a freshly created environment (new.env(parent = parent.frame())) in which the symbol sub is bound to the subsampled data frame. The local variable call_data_arg_name <- "data" is assigned but never used.
  • Subsampling. For each (size, replicate) cell, sample.int(n, size = sz) draws a simple random sample without replacement. Despite the documentation mentioning "stratified by row order," the code performs uniform random sampling with no stratification.
  • Refit failure handling. Each refit is wrapped in tryCatch. On error, refit_failure_msg is populated (via <<- inside the error handler) with a formatted message including the size, replicate, and conditionMessage(e). A gdpar_warn of class "gdpar_diagnostic_warning" is emitted, the message is appended to warnings_msg, and a row with median_width = NA_real_ is recorded. The loop then continues to the next cell.
  • Variable selection. Posterior variables are retrieved via posterior::variables(draws). The ignore pattern "^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)" is applied via grepl to exclude internal/auxiliary variables. If parameters is NULL, the filtered set (candidate_vars) is used; otherwise parameters is used directly without validation against available variables.
  • Width computation. posterior::summarise_draws is called on posterior::subset_draws(draws, variable = use_vars) with two custom summary functions q_lower and q_upper that wrap stats::quantile at probabilities $\alpha/2$ and $1 - \alpha/2$ respectively (with names = FALSE). Widths are computed as the element-wise difference q_upper - q_lower, and the cell's median_width is stats::median(widths).
  • Minimum data requirement. After removing NA rows, if fewer than 3 successful refits remain, gdpar_abort is called with class "gdpar_diagnostic_error" and message "Not enough successful refits to estimate the contraction slope.".
  • Regression. stats::lm(log_w ~ log_n) is fit on the non-NA subset. Coefficients and standard errors are extracted from stats::coef(reg) and summary(reg)$coefficients[, "Std. Error"] respectively, indexing by the name "log_n".
  • Side effects. May print a cost message (gdpar_inform), emit per-refit warnings (gdpar_warn), and perform length(sizes) * replicates full Bayesian refits via cmdstanr (through gdpar).

print.gdpar_contraction_report(x, ...)

Purpose

S3 print method for objects of class gdpar_contraction_report. Produces a human-readable summary of the contraction-rate diagnostic report, including the per-cell table, the estimated slope with standard error and 95% confidence interval, the verdict string, and any recorded warnings.

Arguments

Argument Type Meaning
x gdpar_contraction_report The report object returned by gdpar_contraction_diagnostic.
... any Unused; present for S3 generic compatibility.

Returns

Invisibly returns x (via invisible(x)).

Notes

  • Output format. Prints, in order:
    1. A header line "<gdpar_contraction_report> level = <level>" (using cat with sep = "").
    2. The table component via print(x$table, row.names = FALSE).
    3. A blank line, then "Slope estimate (log_width ~ log_n): <slope> (SE = <se>)" with values formatted to 3 significant digits via format(..., digits = 3).
    4. "95% CI: [<lower>, <upper>]" with values formatted to 3 significant digits.
    5. "Verdict: <verdict>".
    6. If length(x$warnings) > 0L, a blank line, the header "Warnings:", and each warning prefixed with " - ".
  • S3 dispatch. Registered as the print method for class gdpar_contraction_report; dispatched automatically when such an object is printed at the console.
  • No side effects beyond console output.

R/dependence_robust.R

.gdpar_eb_scalar_y_obs(object)

Purpose. Extracts the observed scalar outcome vector from a scalar Empirical-Bayes fit (gdpar_eb_fit) by reading the Stan data bundle stored in object$stan_data. It serves as the canonical accessor for the response used downstream by dependence diagnostics (e.g., residual-based Moran's I or block-bootstrap refit engines). Aborts for non-scalar outcomes (multivariate p > 1 or multi-slot K > 1), which are explicitly deferred in this sub-block.

Arguments.

Argument Type Meaning
object gdpar_eb_fit A scalar Empirical-Bayes fit object whose stan_data list contains the outcome vector.

Returns. A numeric vector (as.numeric(y_raw)), the observed outcome values. If the Stan data stored a real-valued response (y_real) that is used; otherwise y_int (count / Bernoulli families) is used. The result is always coerced to numeric.

Notes.

  • Reads object$stan_data$y_real first; if NULL, falls back to object$stan_data$y_int. If both are NULL, raises a gdpar_internal_error via gdpar_abort().
  • If y_raw is a matrix with more than one column (ncol(y_raw) > 1L), raises a gdpar_unsupported_feature_error, because multivariate (p > 1) outcomes are deferred.
  • Multi-slot (K > 1) outcomes are not checked here directly (that is handled by .gdpar_assert_scalar_eb()), but the matrix-column check implicitly guards against multi-column outcome matrices.
  • No S3 dispatch; purely internal.

.gdpar_assert_scalar_eb(object, arg_name = "object")

Purpose. Validates that object is a scalar Empirical-Bayes fit (gdpar_eb_fit) suitable for dependence-robust inference. Checks three conditions: (i) the object inherits from gdpar_eb_fit, (ii) it has no heterogeneous-family list (K > 1), and (iii) its conditional HMC fit is present.

Arguments.

Argument Type Meaning
object gdpar_eb_fit The fit object to validate.
arg_name character (length 1) Name of the argument, used in error messages. Defaults to "object".

Returns. invisible(object) — the same object, if all checks pass.

Notes.

  • Calls assert_inherits(object, "gdpar_eb_fit", arg_name) first; this is an external assertion helper that aborts with an appropriate class if the check fails.
  • If object$family$families is non-NULL, this indicates heterogeneous families (K > 1), and a gdpar_unsupported_feature_error is raised.
  • If object$conditional_fit is NULL, a gdpar_internal_error is raised because the conditional HMC fit is required for downstream residual extraction.
  • Returns invisibly to support use as a guard clause.

.gdpar_assert_scalar_dep(object, arg_name = "object")

Purpose. The Axis 2 gate (decision D102): validates that a fit object is a scalar fit on either the Empirical-Bayes or the full-Bayes path, suitable for dependence-robust inference. For EB fits it delegates verbatim to .gdpar_assert_scalar_eb(). For full-Bayes fits (gdpar_fit) it checks the path class and presence of the HMC fit. Any other class is rejected.

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit The fit object to validate.
arg_name character (length 1) Name of the argument, used in error messages. Defaults to "object".

Returns. invisible(object) — the same object, if all checks pass.

Notes.

  • If object inherits from gdpar_eb_fit, delegates to .gdpar_assert_scalar_eb(object, arg_name) and returns its result. This preserves byte-identical EB-path behaviour.
  • If object inherits from gdpar_fit:
    • Calls .gdpar_fit_path_class(object) (an internal helper elsewhere in the package) and asserts the result is "scalar". If not, raises gdpar_unsupported_feature_error (multivariate p > 1 and K > 1 full-Bayes fits are deferred).
    • If object$fit is NULL, raises gdpar_internal_error (the HMC fit is missing).
  • If object is neither gdpar_eb_fit nor gdpar_fit, raises gdpar_input_error with a message naming the offending argument.
  • Returns invisibly.

.gdpar_eb_estimate_vector(fit)

Purpose. Extracts the EB point-estimate vector from a scalar Empirical-Bayes fit and flattens it into a single named numeric vector. This is the EB touchpoint of the block-bootstrap engine: the same extraction is performed on each bootstrap refit, and column alignment across refits depends on the name stability guaranteed here.

Arguments.

Argument Type Meaning
fit gdpar_eb_fit A scalar Empirical-Bayes fit object.

Mathematics. No formula per se, but the extraction order is deterministic and fixed:

$$ \hat{\boldsymbol{\beta}} = \bigl(\hat{\theta}_{\text{ref}},; \hat{a},; \hat{b},; \hat{W}_{\text{raw}}\bigr)^{!\top} $$

where each component is a sub-vector of the named coefficients returned by coef.gdpar_eb_fit(). The concatenation order is: theta_ref, then a, then b, then W.

Returns. A named numeric vector containing all EB point estimates. Names follow the convention "theta_ref" or "theta_ref[1]" etc. for theta_ref, and "a[1]", "b[1]", "W[1]" etc. for the remaining components (unless the coef() result already provides names).

Notes.

  • Calls stats::coef(fit) to obtain the structured coefficient list.
  • Iterates over components "theta_ref", "a", "b", "W" in that fixed order.
  • If a component's $estimate field is NULL, it is silently skipped.
  • If names are NULL, synthetic names of the form "<comp>[<index>]" are generated. For theta_ref of length 1, the name is simply "theta_ref".
  • If no estimates can be extracted (all components NULL), raises gdpar_internal_error.
  • The result of do.call(c, unname(parts)) concatenates the named sub-vectors while preserving names.

.gdpar_eb_model_se_vector(fit)

Purpose. Mirrors .gdpar_eb_estimate_vector() but extracts the model-based (Laplace / conditional posterior) standard errors instead of point estimates. The resulting vector is name-aligned with the estimate vector, enabling ratio computations such as se_ratio = robust_se / model_se.

Arguments.

Argument Type Meaning
fit gdpar_eb_fit A scalar Empirical-Bayes fit object.

Mathematics. The model SE for each coefficient $k$ is:

$$ \text{model_SE}_k = \text{posterior SD from the Laplace approximation} $$

as stored in coef(fit)$<component>$se.

Returns. A named numeric vector of the same length and name structure as .gdpar_eb_estimate_vector(fit). If a component's $se field is NULL but its $estimate field is non-NULL, the corresponding entries are filled with NA_real_.

Notes.

  • Reads $se fields from the coef(fit) list. If $se is NULL for a given component but $estimate is present, fills with NA_real_ (length-matched via rep(NA_real_, length(est))).
  • Uses $estimate (not $se) to determine names and presence, ensuring alignment with the estimate vector.
  • Iterates over theta_ref, a, b, W in the same fixed order as .gdpar_eb_estimate_vector().
  • Returns a do.call(c, unname(parts)) result, identical structure to the estimate vector.

.gdpar_fb_coef_draws_matrix(object)

Purpose. Extracts the posterior draws of the AMM coefficients from a scalar full-Bayes fit (gdpar_fit) as a single $S \times P$ matrix ($S$ = number of posterior draws, $P$ = number of AMM coefficient parameters). This is the full-Bayes counterpart of the EB coefficient extraction. The matrix is used to compute both point estimates (posterior means) and model-based standard errors (posterior SDs).

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object (already validated by .gdpar_assert_scalar_dep()).

Mathematics. Let $\boldsymbol{\theta}^{(s)}$ denote the $s$-th posterior draw of the AMM coefficient vector, $s = 1, \ldots, S$. The returned matrix is:

$$ \mathbf{M} = \begin{pmatrix} \boldsymbol{\theta}^{(1)\top} \ \vdots \ \boldsymbol{\theta}^{(S)\top} \end{pmatrix} \in \mathbb{R}^{S \times P} $$

where the columns correspond to the Stan variables theta_ref, a_coef, b_coef, and W_raw, in that order, each included only if the corresponding AMM component is active.

Returns. An $S \times P$ numeric matrix (unclassed draws_matrix) whose columns carry Stan variable names (e.g., "theta_ref[1]", "a_coef[1]"). Row count equals the number of posterior draws.

Notes.

  • Requires the suggested package posterior; calls require_suggested("posterior", "extract posterior draws") which will abort with an informative message if unavailable.
  • Reads draws via object$fit$draws() (the raw CmdStan / Stan fit object).
  • Variables included: always "theta_ref"; additionally "a_coef" if object$amm$a is non-NULL; "b_coef" if object$amm$b is non-NULL; "W_raw" if object$amm$W is non-NULL.
  • Uses raw W_raw draws (not sigma_W-scaled effective weights), matching the EB extractor's use of raw W_raw conditional estimates. This is a deliberate parity choice (decision D102).
  • Excludes hyperparameters (mu_theta_ref, sigma_theta_ref) for EB/FB parity.
  • If the resulting matrix is NULL or has zero columns, raises gdpar_internal_error.
  • Calls unclass() on the result to strip the draws_matrix class, returning a plain numeric matrix.

.gdpar_fb_estimate_vector(object)

Purpose. Computes the full-Bayes point-estimate vector as the posterior mean of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_estimate_vector().

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object.

Mathematics. For each coefficient $k = 1, \ldots, P$:

$$ \hat{\theta}_k = \frac{1}{S} \sum_{s=1}^{S} \theta_k^{(s)} $$

where $\theta_k^{(s)}$ is the $s$-th posterior draw of coefficient $k$.

Returns. A named numeric vector of length $P$ with names taken from the column names of the draws matrix (e.g., "theta_ref[1]", "a_coef[1]", etc.).

Notes.

  • Calls .gdpar_fb_coef_draws_matrix(object) to obtain the $S \times P$ matrix, then computes column means via colMeans(mat).
  • Names are set from colnames(mat), which are the Stan variable names.

.gdpar_fb_model_se_vector(object)

Purpose. Computes the full-Bayes model-based standard error vector as the posterior standard deviation of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_model_se_vector().

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object.

Mathematics. For each coefficient $k = 1, \ldots, P$:

$$ \text{model_SE}_k = \sqrt{\frac{1}{S - 1} \sum_{s=1}^{S} \bigl(\theta_k^{(s)} - \hat{\theta}_k\bigr)^2} $$

where $\hat{\theta}_k$ is the posterior mean. This is the sample standard deviation of the posterior draws.

Returns. A named numeric vector of length $P$, name-aligned with .gdpar_fb_estimate_vector(object). Names are taken from colnames(mat).

Notes.

  • Calls .gdpar_fb_coef_draws_matrix(object) then applies apply(mat, 2L, stats::sd) to compute column-wise standard deviations.
  • Uses stats::sd (which divides by $S - 1$, Bessel-corrected).
  • The "model SE" here is the posterior SD, which is like-for-like with the EB Laplace SD, so the se_ratio = robust_se / model_se comparison is a SD-vs-SD ratio on both paths.

.gdpar_dep_estimate_vector(object)

Purpose. Class-dispatched accessor for the point-estimate vector, the first touchpoint of the shared block-bootstrap engine. For a gdpar_eb_fit it delegates to .gdpar_eb_estimate_vector() (byte-identical EB path); for a gdpar_fit it delegates to .gdpar_fb_estimate_vector().

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A validated scalar fit object (EB or full-Bayes).

Mathematics. See .gdpar_eb_estimate_vector() and .gdpar_fb_estimate_vector().

Returns. A named numeric vector of AMM coefficient point estimates, regardless of path.

Notes.

  • Dispatch is via inherits(object, "gdpar_eb_fit") (manual S3-style, not formal UseMethod).
  • If the object is a gdpar_eb_fit, calls and returns .gdpar_eb_estimate_vector(object) verbatim, preserving regression-gate compatibility.
  • Otherwise (assumed gdpar_fit), calls .gdpar_fb_estimate_vector(object).
  • Column names are stable across refits of the same model specification, which is critical for the block-bootstrap column alignment.

.gdpar_dep_model_se_vector(object)

Purpose. Class-dispatched accessor for the model-based standard error vector, the second touchpoint of the shared block-bootstrap engine. EB path: Laplace / conditional posterior SD (verbatim). Full-Bayes path: posterior SD per coefficient. In both cases the "model SE" is a within-model (posterior / Laplace) standard deviation.

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A validated scalar fit object (EB or full-Bayes).

Mathematics. See .gdpar_eb_model_se_vector() and .gdpar_fb_model_se_vector().

Returns. A named numeric vector of model-based standard errors, name-aligned with .gdpar_dep_estimate_vector(object).

Notes.

  • Same dispatch pattern as .gdpar_dep_estimate_vector(): inherits(object, "gdpar_eb_fit") triggers the EB path; otherwise full-Bayes.
  • The resulting vector is used in computing se_ratio = robust_se / model_se, and because both EB and full-Bayes model SEs are standard deviations (SD-vs-SD), the ratio is a like-for-like comparison.
  • Name alignment with the estimate vector is guaranteed by the internal extractors.

Default block-length rate function (incomplete in section)

Purpose. According to the documentation comment, this function returns the rate-optimal default block length for the moving block bootstrap:

$$ b_n = \max!\bigl(1,; \lfloor n^{1/3} \rceil\bigr) $$

where $n$ is the time-series length and $\lfloor \cdot \rceil$ denotes rounding to the nearest integer. This is the optimal growth rate for the moving block bootstrap variance estimator (Künsch 1989; Hall, Horowitz & Jing 1995).

Arguments. Not defined in this section — the function body is truncated at the end of the provided source.

Returns. Presumably a single integer: $\max(1, \text{round}(n^{1/3}))$.

Notes.

  • The section is incomplete; only the roxygen/description comment is present. The function name and full signature are not visible in this segment.
  • The data-driven constant of Politis & White (2004) is noted as a deferred refinement; this default provides only the correct rate, not the optimal constant.
  • Full documentation will require the subsequent section(s) where the function body is defined.

.gdpar_default_block_length(n)

Purpose Computes the default block length for block bootstrap resampling using the cube-root rate $n^{1/3}$. Used as a fallback when the data-driven Politis–White selector cannot run (degenerate inputs, insufficient sample size, etc.).

Arguments

Argument Type Meaning
n integer-coercible scalar Sample size (number of observations).

Mathematics

Implements the rate:

$$\ell = \max!\bigl(1,;\lfloor n^{1/3} + 0.5 \rfloor\bigr)$$

where the rounding and flooring produce an integer $\ge 1$.

Returns A single integer: the default block length.

Notes The as.integer(round(...)) call rounds to the nearest integer and truncates; the outer max(1L, ...) guarantees the result is at least 1 even when n = 0 or n = 1.


.gdpar_is_auto(x)

Purpose Predicate that tests whether a block-size argument is the literal character string "auto", distinguishing the data-driven Politis–White path from a fixed integer or the NULL rate default. Shared by the temporal and spatial robust estimators.

Arguments

Argument Type Meaning
x any R object The block-length (or block-size) argument to inspect.

Returns A logical scalar: TRUE if x is exactly the character string "auto" (length-1 character, not NA), FALSE otherwise.

Notes The compound guard is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto") is deliberately strict: a factor, an NA_character_, or a character vector of length ≠ 1 all return FALSE. No side effects.


.gdpar_flat_top_window(s)

Purpose Evaluates the flat-top lag window (kernel) of Politis (2003) / Politis & White (2004), vectorised over its argument. Used inside the Politis–White block-length selector to compute the spectral density estimate $\hat{g}$ and the sum $\widehat{\text{spec}}$.

Arguments

Argument Type Meaning
s numeric vector Scaled lag values $s = k/M$, where $k$ is the lag index and $M$ the bandwidth.

Mathematics

$$ \lambda(s) = \begin{cases} 1, & |s| \le \tfrac{1}{2},\[4pt] 2,(1 - |s|), & \tfrac{1}{2} < |s| \le 1,\[4pt] 0, & |s| > 1. \end{cases} $$

Returns A numeric vector of the same length as s containing $\lambda(s)$.

Notes Vectorised via nested ifelse over abs(s). No input validation; non-finite or NA inputs propagate NA.


.gdpar_pw_mhat(rho, Kn, crit)

Purpose Determines the adaptive bandwidth $\hat{m}$ for the Politis & White (2004) automatic block-length selector. Searches the sample autocorrelation sequence for the first run of Kn consecutive negligible lags (the "first insignificant run" rule). Factored out for direct unit testing.

Arguments

Argument Type Meaning
rho numeric vector Sample autocorrelations at lags $1, 2, \dots, L$ (computed from residuals).
Kn integer scalar Number of consecutive insignificant lags required to declare the bandwidth; typically $\max(5, \lceil\log_{10} n\rceil)$.
crit numeric scalar Critical value for the significance test; a lag $j$ is deemed insignificant when $`

Mathematics

Returns the smallest integer $j$ such that

$$|\hat\rho(j + \ell)| < \texttt{crit} \quad \forall; \ell = 0, \dots, K_N - 1.$$

If no such run exists in $1,\dots,L$, the fallback is

$$\hat m = \max\bigl{j : |\hat\rho(j)| \ge \texttt{crit}\bigr},$$

i.e., the largest significant lag. If every lag is insignificant, $\hat m = 1$.

Returns An integer scalar: the estimated bandwidth $\hat m$.

Notes Early-return inside the for loop at the first qualifying run. The function operates on a logical vector insig <- abs(rho) < crit of length $L$, requiring $L \ge K_N$ for the scan to execute. Returns 1L as a safe minimum when all autocorrelations are negligible (near-white noise).


.gdpar_politis_white_block_length(resid, c_thresh = stats::qnorm(0.975))

Purpose Computes the optimal block length $b_{\text{opt}}$ for overlapping block bootstrap using the Politis & White (2004) data-driven selector with the Patton, Politis & White (2009) correction. Operates on residuals of a working-independence model (no model refit needed). Falls back to the $n^{1/3}$ rate with a human-readable reason when the data-driven path is infeasible.

Arguments

Argument Type Meaning
resid numeric vector Residuals of the fitted working-independence model, already in the (temporal or spatial) bootstrap ordering.
c_thresh numeric scalar Critical-value multiplier for the adaptive bandwidth test. Default qnorm(0.975) ≈ 1.96, matching np::b.star.

Mathematics

  1. Bandwidth selection. Compute sample autocorrelations $\hat\rho(1),\dots,\hat\rho(M_{\max})$ where $M_{\max} = \min(\lceil\sqrt{n}\rceil + K_N,\; n-1)$ and $K_N = \max(5, \lceil\log_{10} n\rceil)$. The adaptive bandwidth $\hat{m}$ is found by .gdpar_pw_mhat with critical value

$$\texttt{crit} = c_{\text{thresh}} \sqrt{\frac{\log_{10} n}{n}}.$$

Set $M = \min(2\hat{m},\; M_{\max})$.

  1. Spectral estimates. Recompute autocovariances $\hat{R}(k)$ for $k = 0,\dots,M$. Apply the flat-top window $\lambda(k/M)$:

$$\widehat{\text{spec}} = \hat{R}(0) + 2\sum_{k=1}^{M} \lambda(k/M),\hat{R}(k),$$

$$\hat{g} = 2\sum_{k=1}^{M} \lambda(k/M),k,\hat{R}(k).$$

  1. Optimal block length. For overlapping (moving/circular) block bootstrap the variance constant is $D = \tfrac{4}{3}\,\widehat{\text{spec}}^2$ (Lahiri 2003), giving

$$b_{\text{opt}} = \left(\frac{2,\hat{g}^2}{D}\right)^{1/3} n^{1/3}.$$

  1. Capping. The final integer block length is

$$b = \max!\Bigl(1,;\min!\bigl(\lfloor b_{\text{opt}} + 0.5 \rfloor,; \lceil\min(3\sqrt{n},; n/3)\rceil\bigr)\Bigr).$$

Returns A list with components:

Component Type Meaning
block_length integer The selected block length.
method character "auto" if the data-driven rule succeeded; "rate" if the fallback was used.
reason character Human-readable description of the selection, including $\hat{m}$, $M$, uncapped $b$, and cap.

Notes Five fallback paths return the $n^{1/3}$ rate with method = "rate": (i) $n &lt; 8$; (ii) non-positive or non-finite residual variance; (iii) $M_{\max} &lt; K_N + 1$ (series too short for lag scan); (iv) non-positive or non-finite $\widehat{\text{spec}}$ or non-finite $\hat{g}$; (v) implicitly when $\hat{g} \approx 0$ drives $b_{\text{opt}} \to 0$ (floored at 1, which is the honest data-driven answer, not a fallback).


.gdpar_block_bootstrap_data_indices(n, block_length, type = c("moving", "circular"))

Purpose Generates a resampled index vector of length n for a single temporal block bootstrap replicate. Draws ceiling(n / block_length) contiguous blocks with replacement, concatenates them, and truncates to length n. Supports both the moving (Künsch 1989) and circular (Politis & Romano 1992) block schemes.

Arguments

Argument Type Meaning
n integer-coercible scalar Sample size.
block_length integer-coercible scalar Length of each contiguous block; must be in $[1, n]$.
type character "moving" (default) or "circular". Matched via match.arg.

Mathematics

Let $B = \lceil n / \ell \rceil$ be the number of blocks and $\ell$ the block length.

  • Moving block bootstrap ("moving"): block start positions are drawn uniformly from $\{1, 2, \dots, n - \ell + 1\}$. Each block $b$ contributes indices $s_b, s_b+1, \dots, s_b + \ell - 1$.

  • Circular block bootstrap ("circular"): block start positions are drawn uniformly from $\{1, 2, \dots, n\}$. Indices wrap around modulo $n$: the raw index $i$ maps to $((i-1) \bmod n) + 1$.

The output is the first $n$ entries of the concatenated $B \times \ell$ index vector.

Returns An integer vector of length n containing resampled observation indices in $\{1, \dots, n\}$.

Notes

  • Raises an abort (class "gdpar_input_error") via gdpar_abort() if block_length is outside $[1, n]$.
  • The circular scheme gives every observation equal expected resampling weight, whereas the moving scheme slightly down-weights observations near the boundaries.
  • This is the single-chain sibling of a multi-chain MCMC-draw block bootstrap resampler (block_bootstrap_indices()) documented elsewhere.

.gdpar_dependence_residuals(object, residual_type, randomize_seed)

Purpose Computes residuals of a scalar fit (Empirical-Bayes or full-Bayes) for use in the dependence diagnostics. Shared by the temporal diagnostic (gdpar_dependence_diagnostic) and the spatial diagnostic (gdpar_spatial_dependence_diagnostic) to ensure a single, consistent residual definition (design decision D100).

Arguments

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A scalar fitted model object.
residual_type character One of "quantile", "response", "pearson", "deviance".
randomize_seed integer or NULL Seed for reproducibility of randomized quantile residuals for discrete families; ignored for continuous families.

Returns A numeric vector of residuals of length $n$.

Notes

  • Full-Bayes branch (gdpar_fit that is not a gdpar_eb_fit): delegates entirely to the S3 method stats::residuals(object, type = residual_type, randomize_seed = randomize_seed), which internally uses the posterior predictive draws and .gdpar_residuals_dispatch() (design decision D102).
  • Empirical-Bayes branch: extracts the scalar observed outcome via .gdpar_eb_scalar_y_obs(object), obtains response-type predictions via stats::predict(object, type = "response"), reads the family name from object$family$name, and dispatches to .gdpar_residuals_dispatch().

gdpar_dependence_diagnostic(object, index = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), max_lag = NULL, level = 0.95, randomize_seed = NULL, ...)

Purpose (Exported.) Quantifies serial (temporal) dependence in the residuals of a scalar Path 1 Empirical-Bayes or full-Bayes fit. The diagnostic is the gate for gdpar_dependence_robust(): it makes violations of the conditional-independence assumption visible and measurable before any block-bootstrap remedy is applied. Only scalar fits ($K = 1$, $p = 1$) are supported; multi-parameter paths are deferred.

Arguments

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A scalar fitted model.
index numeric vector or NULL Temporal (or one-dimensional) ordering of observations. If non-NULL, residuals are sorted by order(index) before statistics are computed. Must have length $n$.
residual_type character Residual type: "quantile" (default; Dunn-Smyth / randomized quantile residuals), "response", "pearson", or "deviance".
max_lag integer or NULL Maximum lag for the Ljung–Box test. Default: $\min(\lfloor 10\log_{10} n\rfloor,\; n - 1)$.
level numeric in $(0,1)$ Confidence level for the verdict. Dependence is flagged when a p-value $&lt; 1 - \texttt{level}$. Default 0.95.
randomize_seed integer or NULL Seed for randomized quantile residuals (discrete families).
... Unused; present for signature stability.

Mathematics

  1. Lag-1 autocorrelation. Let $r_1, \dots, r_n$ be the (optionally re-ordered) residuals, $\bar{r}$ their mean, and $\tilde{r}_t = r_t - \bar{r}$. Then

$$\hat{\rho}_1 = \frac{\sum_{t=2}^{n} \tilde{r}_t,\tilde{r}_{t-1}}{\sum_{t=1}^{n} \tilde{r}_t^2}.$$

The approximate one-sided p-value under the null $\rho_1 = 0$ uses the normal approximation $\sqrt{n}\,\hat\rho_1 \dot\sim \mathcal{N}(0,1)$:

$$p_1 = 2,\Phi!\bigl(-|\sqrt{n},\hat\rho_1|\bigr).$$

  1. Durbin–Watson statistic. Reported descriptively (not as a formal test):

$$DW = \frac{\sum_{t=2}^{n}(r_t - r_{t-1})^2}{\sum_{t=1}^{n} r_t^2} ;\approx; 2(1 - \hat\rho_1).$$

Values near 2 indicate no first-order autocorrelation.

  1. Ljung–Box test. The omnibus test across lags $1, \dots, h$ (where $h = \texttt{max\_lag}$) is

$$Q = n(n+2)\sum_{j=1}^{h}\frac{\hat\rho_j^2}{n-j} ;\dot\sim; \chi^2_h \quad\text{under } H_0,$$

computed via stats::Box.test(..., type = "Ljung-Box", fitdf = 0). The degrees of freedom are not reduced by the number of estimated model coefficients (fitdf = 0), making the test mildly optimistic for residuals of a fitted model.

  1. Verdict. Dependence is flagged when $p_{\text{Ljung-Box}} &lt; 1 - \texttt{level}$.

Returns An object of class c("gdpar_dependence_diagnostic", "list") with components:

Component Type Meaning
residual_type character The residual type used.
n integer Number of residuals.
max_lag integer Maximum lag used for the Ljung–Box test.
lag1_autocorr numeric $\hat\rho_1$.
lag1_p_value numeric Two-sided p-value for $\hat\rho_1$.
durbin_watson numeric Durbin–Watson statistic $DW$.
ljung_box_statistic numeric Ljung–Box $Q$ statistic.
ljung_box_df integer Degrees of freedom of the $\chi^2$ reference distribution.
ljung_box_p_value numeric P-value of the Ljung–Box test.
level numeric Confidence level used.
index_supplied logical Whether index was non-NULL.
verdict character Human-readable verdict string.

A print method (S3 dispatch on "gdpar_dependence_diagnostic") is provided for formatted output.

Notes

  • Input validation. Calls .gdpar_assert_scalar_dep(object, "object") to ensure the fit is scalar. Validates level via assert_numeric_scalar(level, ..., lower = 0, upper = 1). Requires the posterior package (suggested dependency) for extracting posterior draws.
  • Abort conditions. Raises gdpar_abort with class "gdpar_input_error" if index has the wrong length or max_lag is outside $[1, n-1]$. Raises class "gdpar_diagnostic_error" if all residuals have zero variance (denom <= 0).
  • S3 method note. The returned object carries class "gdpar_dependence_diagnostic" as its primary class, enabling print.gdpar_dependence_diagnostic() dispatch.
  • Scope. Only scalar ($K=1$, $p=1$) fits are accepted. Spatial dependence is handled by the sibling gdpar_spatial_dependence_diagnostic().

print.gdpar_dependence_diagnostic(x, digits, ...)

Purpose (Exported S3 method.) Provides a human-readable formatted summary of a gdpar_dependence_diagnostic object.

Arguments

Argument Type Meaning
x gdpar_dependence_diagnostic The diagnostic object to print.
digits integer Number of significant digits for the printed statistics. (Signature declared in roxygen; exact default and implementation body are in the subsequent section.)
... Unused; present for S3 generic compatibility.

Returns Invisibly returns x.

Notes The function body is defined in the next section (section 3 of 7); only the roxygen documentation and function signature are present in this section. The method is registered via @export for S3 dispatch on the "gdpar_dependence_diagnostic" class.

print.gdpar_dependence_diagnostic(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_dependence_diagnostic. Produces a human-readable, multi-line textual summary of the serial-dependence diagnostic battery (autocorrelation, Durbin–Watson, Ljung–Box) attached to a fitted model.

Arguments

Argument Type Meaning
x list (S3 class gdpar_dependence_diagnostic) The diagnostic object produced by gdpar_dependence_diagnostic(). Required fields: $residual_type, $index_supplied, $n, $lag1_autocorr, $lag1_p_value, $durbin_watson, $ljung_box_df, $ljung_box_statistic, $ljung_box_p_value, $verdict.
digits integer (default 3L) Number of significant digits used by format() when printing numeric quantities.
... Ignored; present for S3 method compatibility.

Returns invisible(x) — the input object, invisibly, following standard R print-method convention.

Notes

  • All output is emitted via cat() to the console (stdout). No value is returned visibly.
  • The print method checks x$index_supplied to annotate whether the residuals were ordered by a user-supplied index or by natural row order.
  • If x$index_supplied is TRUE, the residual-type line reads "(ordered by supplied index)"; otherwise "(natural row order)".
  • No validation of x fields is performed; missing or NULL fields will produce blank output segments.

.gdpar_dependence_robust_engine(object, data, resample_fun, B, level, seed, iter_warmup, iter_sampling, chains, verbose, verbose_msg, caller_env, ...)

Purpose Internal (non-exported) shared block-bootstrap-by-refit engine. Factors out the entire resampling loop, seed management, bootstrap-SE and percentile-interval assembly, and per-refit convergence accounting that is common to the temporal (gdpar_dependence_robust) and spatial (gdpar_spatial_dependence_robust) robust-inference wrappers. The two public entry points differ only in their resample_fun and in the descriptive metadata they attach; everything downstream of the resample is handled here identically. Serves both the Empirical-Bayes (gdpar_eb_fit) and full-Bayes (gdpar_fit) paths through class-dispatched extractors (decision D102).

Arguments

Argument Type Meaning
object list (S3 class gdpar_eb_fit or gdpar_fit) A scalar Path 1 fit object (K = 1, p = 1). Must carry $call (the original fitting call), and must be dispatchable by .gdpar_dep_estimate_vector() and .gdpar_dep_model_se_vector().
data data.frame The original data frame passed to the fitting function. Each bootstrap iteration indexes rows of this data frame.
resample_fun nullary function A closure with no arguments that returns an integer vector of length nrow(data) — the row indices for one bootstrap resample. For temporal bootstrapping this wraps .gdpar_block_bootstrap_data_indices() (moving or circular blocks ordered by index); for spatial bootstrapping it returns a spatial-block index vector.
B integer scalar Number of bootstrap refits to perform.
level numeric scalar in $(0, 1)$ Confidence level for the percentile interval (e.g. 0.95).
seed integer or NULL Optional RNG seed. When non-NULL, set.seed(as.integer(seed)) is called once before the loop, ensuring reproducibility of both the per-refit Stan seeds and the resample_fun() draws.
iter_warmup integer scalar Number of warmup (burn-in) iterations passed to each Stan refit.
iter_sampling integer scalar Number of post-warmup sampling iterations passed to each Stan refit.
chains integer scalar Number of HMC chains for each refit.
verbose logical scalar When TRUE, emits verbose_msg once at the start via gdpar_inform().
verbose_msg character or NULL Pre-formatted cost message printed when verbose is TRUE.
caller_env environment The environment (typically the public wrapper's parent.frame()) in which each refit call is evaluated, so that model symbols resolve exactly as for a direct gdpar_eb() or gdpar() call.
... Passed through to nothing directly; present for extensibility.

Mathematics

RNG-consumption contract. The engine's random-number consumption order is frozen for reproducibility:

  1. set.seed(seed) (when seed is non-NULL);
  2. $B$ per-refit Stan seeds drawn via sample.int(.Machine$integer.max, B) — these are assigned deterministically to iteration $b = 1, \ldots, B$;
  3. One call to resample_fun() per iteration $b$.

Point-estimate extraction. For each successful refit $\hat{\theta}^{(b)}$, the estimate vector is obtained via the class-dispatched extractor .gdpar_dep_estimate_vector(fit_b), which returns a named numeric vector of all AMM coefficients (theta_ref, a_coef, b_coef, W_raw, etc.).

Robust standard error. Let $\hat{\theta}_j^{(b)}$ denote the estimate of parameter $j$ from bootstrap replicate $b$, and let $B_{\text{ok}}$ be the number of replicates with no errors and no NA coefficients. Then:

$$ \widehat{\text{SE}}_{\text{robust},,j} = \text{SD}!\bigl(\hat{\theta}_j^{(b)} : b \in \text{successful}\bigr) = \sqrt{\frac{1}{B_{\text{ok}}-1}\sum_{b=1}^{B_{\text{ok}}}\bigl(\hat{\theta}_j^{(b)} - \bar{\hat\theta}_j\bigr)^2} $$

where $\bar{\hat\theta}_j$ is the bootstrap sample mean.

Percentile confidence interval. For level $\ell$ (e.g. 0.95), set $\alpha = 1 - \ell$. The two-sided percentile interval for parameter $j$ is:

$$ \bigl[\hat\theta_{j,,\alpha/2}^{_},;\hat\theta_{j,,1-\alpha/2}^{_}\bigr] $$

where $\hat\theta_{j,\,q}^{*}$ is the $q$-quantile of the $B_{\text{ok}}$ successful bootstrap estimates, computed via stats::quantile(..., probs = c(alpha/2, 1 - alpha/2), names = FALSE).

SE ratio. The ratio comparing robust and model-based uncertainty:

$$ \text{se_ratio}_j = \frac{\widehat{\text{SE}}_{\text{robust},,j}}{\text{SE}_{\text{model},,j}} $$

A ratio $&gt; 1$ indicates the model-based SE understates the true sampling variability due to dependence.

Convergence diagnostics. Per-refit convergence fields are aggregated over the $B$ replicates (successful and unsuccessful):

  • $\text{max\_rhat} = \max_b \hat{R}^{(b)}_{\max}$ (maximum across all refits of the per-refit maximum split-$\hat{R}$);
  • $\text{min\_ess\_bulk} = \min_b \text{ESS}_{\text{bulk},\,\min}^{(b)}$ (minimum across all refits of the per-refit minimum bulk ESS);
  • $\text{n\_divergent\_refits} = |\{b : D^{(b)} &gt; 0\}|$ (number of refits with at least one divergent transition);
  • $\text{n\_high\_rhat\_refits} = |\{b : \hat{R}^{(b)}_{\max} &gt; 1.05\}|$ (number of refits with max R-hat exceeding the 1.05 threshold).

The R-hat threshold $1.05$ is the classical Gelman–Rubin "clearly non-converged" heuristic. If $\text{max\_rhat} &gt; 1.05$, a warning is appended advising the user to increase iter_warmup/iter_sampling.

Returns A list with components:

Component Type Description
table data.frame One row per AMM coefficient, columns: parameter (character), estimate (original point estimate), model_se (Laplace SD or posterior SD), robust_se (bootstrap SD), se_ratio (robust_se / model_se), ci_lower, ci_upper (percentile interval endpoints).
B_ok integer Number of successful bootstrap refits (no errors, all coefficients non-NA).
seed integer The supplied seed, or NA_integer_ if seed was NULL.
warnings character vector Accumulated warning messages (refit failures, convergence issues). Zero-length if clean.
refit_diagnostics list Aggregate convergence summary: max_rhat (numeric), min_ess_bulk (numeric), n_divergent_refits (integer), n_high_rhat_refits (integer), rhat_threshold (numeric, always 1.05).

Notes

  • No refit exclusion. Under-converged or divergent refits are never excluded or down-weighted. The rationale (documented in source decision D102) is that excluding under-converged refits is non-random — it removes precisely the data configurations the bootstrap is meant to probe — and would bias the SE. Both R-hat breaches and divergence counts are reported as diagnostics only.
  • Error handling. If a refit raises an error, the error message is captured via tryCatch, stored in warnings_msg, and the iteration is skipped (next). If fewer than 2 refits succeed (B_ok < 2), the engine calls gdpar_abort() with class "gdpar_diagnostic_error", aborting the run.
  • Parameter alignment. Only parameters common to the original fit's param_names and each refit's estimate vector are recorded in boot[b, ]. This handles the (rare) case where a refit produces a partial coefficient vector.
  • Refit call construction. The refit call is object$call with fields overridden: datasub (the resampled data), iter_warmup, iter_sampling, chains, verboseFALSE, refresh0L, skip_id_checkTRUE, seedrefit_seeds[b]. A new environment env is created with parent = caller_env and env$sub <- sub, so the symbol sub resolves inside the call.
  • Diagnostics path-agnostic. Both gdpar_eb_fit and gdpar_fit objects carry a $diagnostics slot with fields rhat_max, ess_bulk_min, divergent_count. The engine reads whichever is present.
  • Byte-identical EB path. On the Empirical-Bayes path, the dispatch to .gdpar_dep_estimate_vector / .gdpar_dep_model_se_vector resolves to the original EB helpers, and the refit is a gdpar_eb() call, so the engine's output is bit-for-bit identical to the pre-D102 temporal-only implementation.

gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = "quantile", randomize_seed = NULL, type = "moving", B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = FALSE, ...)

Note: This function's roxygen documentation and @export directive appear at the end of this section (section 3 of 7); the actual function body is defined in a subsequent section. The documentation below is derived strictly from the roxygen block present here.

Purpose Public, exported entry point for dependence-robust standard errors via a temporal block bootstrap. Re-estimates the uncertainty of a scalar Path 1 Empirical-Bayes or full-Bayes fit so that it is robust to temporal (serial) dependence in the data, without modelling that dependence. It refits the model on $B$ moving (or circular) block bootstrap resamples of the data ordered by index, and reports the bootstrap standard deviation and percentile intervals of each AMM coefficient alongside the model-based (Laplace / posterior) standard errors. This implements the working-independence + robust-variance stance of Liang & Zeger (1986): the point estimates are unchanged (consistent when the mean structure is correct, not efficient); only the reported uncertainty is made dependence-robust. Delegates the core loop to .gdpar_dependence_robust_engine().

Arguments

Argument Type Meaning
object S3 object (gdpar_eb_fit or gdpar_fit) A scalar Path 1 fit (K = 1, p = 1): either from gdpar_eb() (Empirical Bayes) or gdpar() (full Bayes).
data data.frame The data frame originally passed to the fitting function. The fit object does not store the data (to stay lightweight), so it must be re-supplied. Resampled by contiguous blocks and the model is refit on each resample.
index numeric/integer vector of length $n$, or NULL Optional temporal ordering of the rows of data. Data are sorted by order(index) so that contiguous blocks correspond to contiguous time. When NULL (default), the natural row order is assumed to be the temporal order.
block_length NULL, positive integer, or "auto" Block size for the bootstrap. NULL (default): uses the rate-optimal $\max(1, \lfloor n^{1/3} \rceil)$ (Künsch 1989; Hall, Horowitz & Jing 1995). Positive integer: fixes the block length manually. "auto": selects the block length data-drivenly via the Politis & White (2004) automatic rule (with the Patton, Politis & White 2009 correction), computed from the fitted residuals (no extra refit), falling back to the rate-optimal formula on a degenerate series. The chosen value and method are reported in the result.
residual_type character, one of "quantile" (default), "response", "pearson", "deviance" Type of residuals fed to the Politis & White automatic block-length selector. Used only when block_length = "auto"; ignored otherwise. "quantile" refers to Dunn–Smyth randomized quantile residuals.
randomize_seed integer or NULL Optional seed for the randomized quantile residuals of discrete families. Used only by the "auto" block-length selector for reproducibility of the block-length choice; ignored otherwise.
type character, one of "moving" (default) or "circular" Type of block bootstrap. "moving" uses overlapping blocks that slide along the series; "circular" wraps the series into a circle.
B integer scalar (default 199L) Number of bootstrap refits.
level numeric scalar in $(0, 1)$ (default 0.95) Confidence level for the percentile interval.
seed integer or NULL Optional RNG seed controlling both the block resampling and deterministically derived per-refit Stan seeds, for full reproducibility.
iter_warmup integer scalar (default 500L) Number of warmup iterations per refit. Defaults are deliberately short to keep cost manageable.
iter_sampling integer scalar (default 500L) Number of post-warmup sampling iterations per refit.
chains integer scalar (default 2L) Number of HMC chains per refit.
verbose logical scalar (default FALSE) When TRUE, prints an opt-in cost message once.
... Additional arguments forwarded to gdpar_eb() (or gdpar()) for every refit.

Mathematics

The function applies the Liang & Zeger (1986) working-independence / sandwich-variance paradigm to the gdpar model class. The key quantities are:

  1. Block-length selection. Under the rate-optimal default: $$ L = \max!\bigl(1,, \lfloor n^{1/3} \rceil\bigr) $$ Under "auto", the Politis & White (2004) algorithm estimates the optimal block length from the spectral density at frequency zero of the fitted residuals, with the Patton–Politis–White (2009) bias correction.

  2. Moving block bootstrap. For series length $n$ and block length $L$, the moving block bootstrap draws $\lfloor n/L \rfloor$ contiguous blocks of length $L$ uniformly at random (with replacement) from the $n - L + 1$ possible overlapping blocks, concatenating them to form a resampled series of length $\approx n$.

  3. Circular block bootstrap. The series is wrapped into a circle; $n - L + 1$ is replaced by $n$ possible blocks, eliminating edge effects.

  4. Robust SE. Computed by the engine: $$ \widehat{\text{SE}}{\text{robust}} = \text{SD}\bigl(\hat\theta^{(1)}, \ldots, \hat\theta^{(B{\text{ok}})}\bigr) $$

  5. SE ratio. $$ \text{se_ratio} = \frac{\widehat{\text{SE}}{\text{robust}}}{\text{SE}{\text{model}}} $$ Values $&gt; 1$ signal that the model-based SE understates true sampling variability due to dependence.

Returns A list of S3 class gdpar_dependence_robust with components:

Component Type Description
table data.frame One row per AMM coefficient; columns: parameter, estimate, model_se, robust_se, se_ratio, ci_lower, ci_upper.
block_length integer The chosen block length.
block_length_method character One of "rate" (rate-optimal formula, also flags fallback from "auto"), "fixed" (user-supplied), "auto" (Politis–White).
type character "moving" or "circular".
B integer Requested number of bootstrap replications.
B_ok integer Number of successful refits.
level numeric Confidence level used.
index_supplied logical Whether the user supplied an index vector.
seed integer The supplied seed, or NA_integer_.
warnings character vector Accumulated warning messages (refit failures, convergence issues).
refit_diagnostics list Aggregate per-refit convergence: max_rhat, min_ess_bulk, n_divergent_refits, n_high_rhat_refits, rhat_threshold.

A print method (defined elsewhere) provides a human-readable summary.

Notes

  • Empirical-Bayes vs. full-Bayes parity. Both paths are supported (decision D102). On the EB path, estimate is the Laplace/conditional-posterior mean and model_se is its SD; on the full-Bayes path, estimate is the posterior mean and model_se is the posterior SD. The posterior mean (not median) is used for parity and to keep the SE ratio a dimensionless SD-vs-SD ratio without undeclared normal-scaling constants.
  • Full-Bayes caveats. (1) Each full-Bayes refit runs full HMC (costly). (2) Finite-iteration refits carry Monte-Carlo error in their posterior mean, which slightly and conservatively inflates robust_se. (3) Under an informative prior the full-Bayes posterior SD can be smaller than the bootstrap SD even under correct independent specification ($\text{se\_ratio} &lt; 1$), because the prior concentrates the posterior beyond what the data alone support — this is benign regularization, not overstatement.
  • Scope limitation. The bootstrap delivers robust variance, not better point estimates. It is valid for weak / short-range dependence relative to block_length; it does not rescue long-memory or unit-root processes.
  • Dependencies. Uses cmdstanr for refits and posterior to extract coefficient estimates.
  • Exported. This function is exported from the package namespace (present in NAMESPACE).

gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, type = c("moving", "circular"), B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose Top-level exported function that performs a dependence-robust uncertainty audit for a fitted gdpar model via block bootstrap. It re-estimates standard errors (and confidence intervals) of model coefficients to account for temporal dependence in the residuals, without changing point estimates. The method repeatedly refits the model on block-bootstrap resamples of the original data.

Arguments

Argument Type Meaning
object gdpar_fit or gdpar_eb_fit or compatible The fitted model object whose uncertainty is to be audited.
data data.frame The original data frame used in fitting. Must be row-aligned with the model.
index numeric or NULL Optional temporal ordering variable. If non-NULL, data and residuals are sorted by this index before blocking. Must have length equal to nrow(data).
block_length NULL, positive integer, or "auto" Block length for the moving/circular block bootstrap. NULL uses a default rate $n^{1/3}$. "auto" selects the block length data-adaptively via the Politis–White (2004) plug-in method on the residuals.
residual_type character scalar, one of "quantile", "response", "pearson", "deviance" Type of residual used when block_length = "auto" for the Politis–White plug-in (and for spatial diagnostics). Matched via match.arg.
randomize_seed integer or NULL Seed for randomized quantile residuals (used only if residual_type = "quantile").
type character scalar, one of "moving", "circular" Block-bootstrap scheme. "moving" uses overlapping blocks of length block_length; "circular" wraps the data end-to-end. Matched via match.arg.
B positive integer Number of bootstrap replicates (default 199).
level numeric in $(0,1)$ Confidence level for percentile-based intervals (default 0.95).
seed integer or NULL Master seed passed to the engine for reproducibility.
iter_warmup positive integer Stan warmup iterations per refit.
iter_sampling positive integer Stan sampling iterations per refit.
chains positive integer Number of MCMC chains per refit.
verbose logical scalar If TRUE, prints an informational banner describing the audit before computation begins.
... Additional arguments passed through to .gdpar_dependence_robust_engine and ultimately to the Stan refit.

Mathematics

Default block length (rate method):

When block_length is NULL, the default block length is set to

$$\ell = \max!\bigl(1,;\lfloor n^{1/3} \rfloor\bigr)$$

where $n = \texttt{nrow(data)}$. This is the $d=1$ specialisation of the variance-MSE-optimal rate $\ell \sim n^{d/(d+2)}$.

Block bootstrap:

For each of $B$ replicates, a set of $n$ row indices is drawn by .gdpar_block_bootstrap_data_indices(n, block_length, type). If type = "moving", consecutive blocks of length $\ell$ are drawn starting at uniformly random positions; if type = "circular", the data are conceptually wrapped in a circle.

Auto block length (Politis–White):

When block_length = "auto", residuals $r_i$ are extracted from the model. The Politis–White (2004) plug-in estimator is applied to select $\ell$, returning a list with $block_length, $method, and $reason.

Robust standard error:

The block-bootstrap standard error of each coefficient is the sample standard deviation of the $B$ bootstrap replications. The ratio

$$\texttt{se_ratio} = \frac{\widehat{\mathrm{SE}}_{\text{robust}}}{\widehat{\mathrm{SE}}_{\text{model}}}$$

measures how much the model-based uncertainty understates the dependence-robust uncertainty; values $&gt; 1$ indicate that naive standard errors are anticonservative.

Returns An object of class c("gdpar_dependence_robust", "list") with the following components:

Component Type Meaning
table data.frame Coefficient table with robust SEs, model SEs, se_ratio, and confidence intervals at the requested level.
block_length integer The block length used (after resolution of NULL or "auto").
block_length_method character One of "fixed", "rate", or the method string returned by Politis–White.
type character The bootstrap scheme used ("moving" or "circular").
B integer Requested number of replicates.
B_ok integer Number of replicates that completed successfully.
level numeric Confidence level.
index_supplied logical Whether the caller supplied an ordering index.
seed integer or NULL Seed actually used by the engine.
warnings character vector Accumulated warning messages from failed or slow refits.
refit_diagnostics list or NULL Aggregate convergence diagnostics across all refits (max R-hat, min ESS, divergent transitions, high-R-hat count).

Notes

  • The function requires the cmdstanr and posterior packages; if absent, a suggestion-error is raised.
  • Validation errors (class = "gdpar_input_error") are raised for: non-scalar object, non-data-frame data, mismatched index length, invalid block_length (non-NULL, non-integer, non-"auto"), block_length outside $[1, n]$, non-scalar logical verbose.
  • If index is non-NULL, both data and (internally) residuals are reordered by order(index) before blocking, ensuring temporal coherence.
  • The function detects whether object inherits from "gdpar_fit" but not "gdpar_eb_fit" (i.e., is a full-Bayes fit) and adjusts the verbose message to warn that full HMC refits are markedly more expensive.
  • The resample-generating closure resample_fun is created in the local environment and passed to the engine.
  • caller_env <- parent.frame() is captured so the engine can re-evaluate expressions in the caller's scope if needed.

print.gdpar_dependence_robust(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_dependence_robust. Renders a human-readable summary of the block-bootstrap audit results to the console.

Arguments

Argument Type Meaning
x gdpar_dependence_robust The object to print.
digits integer scalar (default 3) Number of significant digits for formatting numeric columns in the table.
... Unused; present for S3 generic compatibility.

Returns Invisibly returns x (the input object).

Notes

  • Prints the bootstrap scheme ("moving" or "circular"), block length (with provenance label: "auto: Politis-White", "rate: n^(1/3)", or blank for fixed), $B$, $B_{\text{ok}}$, index-supplied status, and confidence level.
  • The label for block_length_method uses a switch with four branches: "auto", "rate", "fixed", and a default empty string; the %||% operator defaults to "fixed" if the component is NULL.
  • Numeric columns of the table are formatted with format(col, digits = digits).
  • Appends an explanatory note about the se_ratio interpretation.
  • Calls .gdpar_print_refit_diagnostics() to print convergence diagnostics.
  • If warnings are present, prints up to 5, with a count of remaining suppressed warnings.

.gdpar_print_refit_diagnostics(rd, digits = 3L)

Purpose Internal helper that prints aggregate per-refit convergence diagnostics (max R-hat, min ESS bulk, divergent transition count, high-R-hat refit count) to the console. Called by print.gdpar_dependence_robust.

Arguments

Argument Type Meaning
rd list or NULL The refit_diagnostics component of a gdpar_dependence_robust object.
digits integer scalar (default 3) Number of significant digits for formatting.

Returns invisible(NULL) in all cases.

Notes

  • Returns early (silently) if rd is NULL.
  • Also returns early if rd$max_rhat is NULL or non-finite (!is.finite(mr)).
  • Uses %||% to default missing components to NA_real_ or 0L or 1.05 as appropriate.
  • Prints a single formatted line showing: max R-hat, min ESS (bulk), number of divergent refits, number of refits with R-hat above a threshold (default 1.05).

.gdpar_spatial_default_g(n)

Purpose Internal function returning the variance-optimal default number of grid cells per axis $g$ for the spatial block bootstrap in two dimensions.

Arguments

Argument Type Meaning
n integer Number of spatial observations.

Mathematics

Implements the $d = 2$ specialisation of the variance-MSE-optimal block rate. The optimal number of points per block is $M \sim n^{d/(d+2)}$, so the number of cells per axis is

$$g = \max!\bigl(2,;\lfloor n^{1/4} + 0.5 \rfloor\bigr)$$

yielding $g^2 \sim n^{1/2}$ total cells and $M \sim n^{1/2}$ points per cell. For $d=1$ this reduces to the $n^{1/3}$ temporal rate.

Returns Integer scalar $g \geq 2$.

Notes

  • Uses round(n^(1/4)) then coerces to integer, with a floor of 2.
  • The documentation references decision D100 as the registered dissent.

.gdpar_validate_coords(coords, n, arg = "coords")

Purpose Internal function that validates and coerces a coordinate matrix into a numeric $n \times 2$ matrix suitable for spatial analysis.

Arguments

Argument Type Meaning
coords data.frame, matrix, or other Coordinate input to validate.
n integer Expected number of rows (observations).
arg character (default "coords") Name of the argument for error messages.

Returns A numeric matrix with exactly 2 columns and n rows, with no non-finite values.

Notes

  • If coords is a data.frame, it is coerced via as.matrix().
  • Raises gdpar_abort (class "gdpar_input_error") if:
    • coords is not a numeric matrix after coercion.
    • coords does not have exactly 2 columns.
    • nrow(coords) != n.
    • coords contains any non-finite values (NA, NaN, Inf, -Inf).

.gdpar_knn_adjacency(coords, k)

Purpose Internal function constructing a binary $k$-nearest-neighbour spatial adjacency matrix using Euclidean distance. Ties are broken by index order, so duplicate locations are well-defined but spatially degenerate.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated).
k positive integer Number of nearest neighbours.

Mathematics

Computes the $n \times n$ Euclidean distance matrix

$$D_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2$$

via stats::dist. For each observation $i$, the diagonal entry $D_{ii}$ is set to $\infty$, the $k$ smallest distances are selected by order, and the corresponding entries of the adjacency matrix $W$ are set to 1:

$$W_{ij} = \begin{cases} 1 & \text{if } j \in \text{KNN}_k(i) \ 0 & \text{otherwise} \end{cases}$$

The resulting $W$ is generally asymmetric (if $j$ is a nearest neighbour of $i$, $i$ is not necessarily a nearest neighbour of $j$).

Returns An $n \times n$ binary (0/1) integer matrix $W$.

Notes

  • All $n$ rows of $W$ are initialized to zero, then row-by-row the $k$ nearest neighbours are set to 1.
  • Because order breaks ties by position, duplicate coordinates are handled deterministically.

.gdpar_distance_band_adjacency(coords)

Purpose Internal function constructing a binary distance-band adjacency matrix. The threshold is data-driven: the smallest distance that leaves no observation isolated.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated).

Mathematics

  1. Compute the $n \times n$ Euclidean distance matrix $D$.
  2. Set the diagonal $D_{ii} = \infty$.
  3. The bandwidth threshold is

$$d^* = \max_{i=1,\ldots,n} \min_{j \neq i} D_{ij}$$

i.e., the maximum over all points of their nearest-neighbour distance. This ensures every point has at least one neighbour.

  1. The adjacency matrix is

$$W_{ij} = \begin{cases} 1 & \text{if } D_{ij} \leq d^* \text{ and } i \neq j \ 0 & \text{otherwise} \end{cases}$$

Returns An $n \times n$ binary (0/1) matrix $W$ with zero diagonal.

Notes

  • Described as a "declared data-driven heuristic."
  • The resulting $W$ is symmetric because Euclidean distance is symmetric.
  • The diagonal is explicitly zeroed after the threshold comparison.

.gdpar_morans_i(resid, W, S0 = sum(W))

Purpose Internal function computing Moran's $I$ statistic for a residual vector under a (possibly asymmetric) spatial weights matrix. Used to test for spatial autocorrelation.

Arguments

Argument Type Meaning
resid numeric vector of length $n$ Residuals (row-aligned with the weights matrix).
W $n \times n$ numeric matrix Spatial weights (binary adjacency or otherwise; need not be symmetric).
S0 numeric scalar (default sum(W)) The sum of all weights $S_0 = \sum_{i,j} w_{ij}$. Pre-computed for efficiency.

Mathematics

Let $\bar{r} = \frac{1}{n}\sum_{i=1}^n r_i$ and define the centred residuals $z_i = r_i - \bar{r}$. Moran's $I$ is

$$I = \frac{n}{S_0} \cdot \frac{\displaystyle\sum_{i=1}^n \sum_{j=1}^n w_{ij}, z_i, z_j}{\displaystyle\sum_{i=1}^n z_i^2}$$

In vector notation, with $\mathbf{z} = (z_1, \ldots, z_n)^\top$:

$$I = \frac{n}{S_0} \cdot \frac{\mathbf{z}^\top W \mathbf{z}}{\mathbf{z}^\top \mathbf{z}}$$

The implementation computes $W\mathbf{z}$ via matrix multiplication (W %*% z), then takes the elementwise product $\mathbf{z} \odot (W\mathbf{z})$ and sums, which is equivalent to $\mathbf{z}^\top W \mathbf{z}$.

Returns A numeric scalar: the Moran's $I$ statistic.

Notes

  • Under the null hypothesis of no spatial autocorrelation and row-standardised weights, $E[I] \approx -1/(n-1)$. Values near 1 indicate positive spatial autocorrelation; values near $-1/(n-1)$ indicate negative autocorrelation.
  • The formula as implemented handles asymmetric $W$ correctly because $\sum_{ij} w_{ij} z_i z_j = \mathbf{z}^\top W \mathbf{z}$ does not require symmetry.
  • No $p$-value or reference distribution is computed here; this is a pure computational helper.

.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)

Purpose Internal function generating a length-$n$ vector of resampled row indices for a spatial block bootstrap. The spatial analogue of .gdpar_block_bootstrap_data_indices() for 2-D coordinates.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated and row-aligned).
g positive integer Number of grid cells per axis.
scheme character, "tiled" or "moving" Spatial bootstrap scheme.
random_origin logical If TRUE, the grid origin is randomized per replicate (Politis–Romano–Lahiri randomized partition).
mins numeric vector of length 2 Coordinate minima per axis (bounding-box lower corner).
ranges numeric vector of length 2 Coordinate range per axis (bounding-box extent).

Mathematics

Cell side lengths:

$$\mathbf{L} = \frac{\mathbf{ranges}}{g} = \left(\frac{\text{range}_x}{g},; \frac{\text{range}_y}{g}\right)$$

Tiled scheme:

  1. Set the origin $\mathbf{o}$. If random_origin = TRUE, draw $\mathbf{u} \sim U(0,1)^2$ and set $\mathbf{o} = \mathbf{mins} - \mathbf{u} \odot \mathbf{L}$; otherwise $\mathbf{o} = \mathbf{mins}$.
  2. Assign each observation $i$ to a cell:

$$c_{x,i} = \left\lfloor \frac{x_i - o_x}{L_x} \right\rfloor, \quad c_{y,i} = \left\lfloor \frac{y_i - o_y}{L_y} \right\rfloor$$

  1. Group observations by cell label $(c_{x,i}, c_{y,i})$.
  2. Sample cells with replacement (uniform) and concatenate their member indices until $\geq n$ indices accumulate. Truncate to exactly $n$.

Moving scheme:

  1. Repeatedly draw a random seed point $\mathbf{s}$ from the data.
  2. Draw $\mathbf{u} \sim U(0,1)^2$ and set the block origin $\mathbf{o} = \mathbf{s} - \mathbf{u} \odot \mathbf{L}$.
  3. Collect all observations within the axis-aligned square $[\mathbf{o},\, \mathbf{o} + \mathbf{L})$.
  4. Append to the output until $\geq n$ indices accumulate. Truncate to exactly $n$.

Returns An integer vector of length $n$ containing resampled row indices (1-based, with possible repetitions).

Notes

  • In the tiled scheme, non-empty cells are guaranteed to have at least one observation. Empty cells are implicitly excluded because split only creates groups for observed cell labels.
  • In the moving scheme, every block is guaranteed non-empty because the block is anchored at a randomly sampled observation and is sized to cover at least that point (assuming the observation falls inside the bounding box, which it does by construction).
  • The random_origin mechanism implements the Politis–Romano–Lahiri randomized partition to break grid-alignment artifacts.

.gdpar_spatial_block_length_auto(coords, resid, scheme, random_origin, mins, ranges, B0 = 200L, var_const = 1, seed = NULL)

Purpose Internal workhorse that data-selects the spatial block size $g$ (the number of grid cells along each axis) for the spatial block bootstrap. It minimises a mean-squared-error (MSE) criterion over a grid of candidate $g$ values, balancing bias (distance from the large-block anchor) against the variance of the bootstrap variance estimator. When the automatic procedure cannot run or fails, it falls back to the $n^{1/4}$ default rate via .gdpar_spatial_default_g.

Arguments

Argument Type Meaning
coords numeric $n \times 2$ matrix Spatial coordinates (columns = $x$, $y$), row-aligned with resid.
resid numeric vector of length $n$ Model residuals (centred internally: $z_i = \text{resid}_i - \bar{\text{resid}}$).
scheme character string Block-tile scheme identifier forwarded to .gdpar_spatial_block_indices (controls how spatial blocks are laid out relative to random_origin, mins, ranges).
random_origin logical or scalar Whether to randomise the grid origin in each bootstrap replicate (forwarded to .gdpar_spatial_block_indices).
mins numeric vector of length 2 Minimum coordinate values $(x_{\min}, y_{\min})$ defining the bounding box of the spatial domain.
ranges numeric vector of length 2 Coordinate ranges $(r_x, r_y)$ of the spatial domain.
B0 integer (default 200L) Number of Monte Carlo block-bootstrap replicates per candidate $g$.
var_const numeric scalar (default 1) Multiplicative constant $c$ scaling the variance-of-variance term in the MSE criterion.
seed integer or NULL Optional seed set via set.seed() before the bootstrap loop for reproducibility.

Mathematics

The procedure operates as follows.

  1. Default fallback. Compute $g_{\text{def}} = \lfloor n^{1/4} \rfloor$ via .gdpar_spatial_default_g(n). Early returns use $g_{\text{def}}$ when:

    • $n &lt; 25$,
    • coordinate spread is degenerate ($\text{sd}(x) \le 0$ or $\text{sd}(y) \le 0$),
    • fewer than 3 valid grid points exist,
    • bootstrap variances are non-finite or all zero,
    • the MSE criterion is non-finite, or
    • the MSE minimum falls on the largest-$g$ (smallest-block) boundary.
  2. Design matrix for a spatial-mean surrogate. Construct a $n \times 3$ matrix $$\mathbf{D}_{\text{surr}} = \bigl[,\mathbf{1},; (x - \bar x)/s_x,; (y - \bar y)/s_y,\bigr]$$ where $s_x, s_y$ are the coordinate standard deviations.

  3. Candidate grid. Define bounds $$g_{\text{lo}} = \max!\bigl(2,;\lfloor 0.5, g_{\text{def}} \rfloor\bigr), \qquad g_{\text{hi}} = \min!\bigl(\lfloor 3, g_{\text{def}} \rfloor,; \lfloor \sqrt{n/3},\rfloor\bigr).$$ Generate 6 points on a log-spaced grid in $[g_{\text{lo}},\, g_{\text{hi}}]$, round to unique integers, and retain only $g \ge 2$ with average cell occupancy $n/g^2 \ge 3$.

  4. Bootstrap variance per $g$. For each candidate $g$ and each replicate $b = 1,\dots,B_0$:

    • Draw a spatial block index vector $\mathcal{I}_b$ from .gdpar_spatial_block_indices.
    • Compute a $3$-vector of block-level spatial-mean statistics: $$\mathbf{T}b = \frac{1}{n},\mathbf{D}{\text{surr}}[\mathcal{I}_b,]^\top, z[\mathcal{I}_b].$$
    • Aggregate across coordinates: $$V_g = \sum_{j=1}^{3} \mathrm{Var}{b}(T{b,j}).$$
    • Also count the number of unique occupied tiles $n_{\text{tiles}}(g)$.
  5. MSE criterion. Smooth $V_g$ with a running median ($k=3$): $\tilde V_g = \mathrm{runmed}(V_g,\, 3)$. Anchor at $g_{\min}$ (the smallest candidate, i.e.\ the largest blocks). Then: $$\text{bias}^2(g) = \bigl(\tilde V_g - \tilde V_{g_{\min}}\bigr)^2, \qquad \text{var}(g) = c ;\frac{\tilde V_g^{,2}}{n_{\text{tiles}}(g)},$$ $$\text{MSE}(g) = \text{bias}^2(g) + \text{var}(g).$$ The variance term reflects the inverse-number-of-blocks scaling (Lahiri 2003), not the Monte Carlo noise from finite $B_0$.

  6. Selection. $g^* = \arg\min_g \text{MSE}(g)$. If $g^*$ equals the last (smallest-block) grid element, the procedure bails out to the $n^{1/4}$ default (anticonservative boundary).

Returns A named list with three elements:

Element Type Meaning
block_size integer The chosen $g^*$ (or the default $g_{\text{def}}$ on fallback).
method character "auto" if the data-driven selection succeeded; "rate" on any fallback.
reason character Human-readable explanation: on success a formatted string with the grid, $B_0$, $c$, and $g^*$; on fallback a diagnostic message indicating why the default was used.

Notes

  • Fallback cascade. There are six distinct early-return paths, all returning method = "rate" via the inner fb() helper, each with a different reason string. The function never errors; it always returns a valid list.
  • Bootstrap machinery. All block-index generation delegates to .gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges) which implements the spatial tiling and optional random-origin jitter.
  • Side effects. Calls set.seed() when seed is non-NULL. No other side effects.
  • No S3 dispatch. This is an internal utility, not an S3 generic or method.
  • Cell-occupancy bound. The upper cap on $g$ enforces $n/g^2 \ge 3$ (at least ~3 observations per cell on average), a validity constraint for within-cell resampling. This is deliberately looser than the $n^{1/3}$ rate sometimes seen in the spatial bootstrap literature.
  • Running median smoothing. stats::runmed(..., k = 3L) uses a centred running median by default (endrule = "median"), so the first and last values may be smoothed with a half-window.

gdpar_spatial_dependence_diagnostic(object, coords, W = NULL, weights = c("knn", "distance"), k = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), test = c("permutation", "analytic"), n_perm = 999L, level = 0.95, randomize_seed = NULL, seed = NULL, ...)

Purpose Exported diagnostic that quantifies spatial autocorrelation in the residuals of a scalar ($K = 1$, $p = 1$) Empirical-Bayes or full-Bayes fit via Moran's $I$. It tests the null hypothesis of spatial exchangeability (conditional independence). A significant result signals that the model-based (posterior / Laplace) uncertainty is too narrow because spatial dependence in the residuals violates the independence assumption. This is the spatial analogue of gdpar_dependence_diagnostic and the recommended gate before calling gdpar_spatial_dependence_robust.

Arguments

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A scalar Path 1 fit ($K = 1$, $p = 1$). Asserted by .gdpar_assert_scalar_dep.
coords numeric $n \times 2$ matrix or data frame Spatial coordinates, row-aligned with training data. Validated by .gdpar_validate_coords.
W numeric $n \times n$ matrix or NULL User-supplied spatial weight matrix. Overrides weights/k. Diagonal is zeroed internally. Row-standardized before use.
weights character, one of "knn" (default) or "distance" Neighbourhood construction method when W is NULL. "knn" = $k$-nearest-neighbour adjacency; "distance" = distance-band whose threshold is the smallest that isolates no location. Both produce row-standardized weights. Ignored when W is supplied.
k integer or NULL Number of neighbours for weights = "knn". Default heuristic: $\max(4,\;\min(\lfloor\log n\rceil,\; n-1))$. Must satisfy $1 \le k \le n-1$.
residual_type character, one of "quantile" (default), "response", "pearson", "deviance" Type of residual extracted from object via .gdpar_dependence_residuals. "quantile" = randomized quantile (Dunn–Smyth) residuals.
test character, one of "permutation" (default) or "analytic" Hypothesis test for Moran's $I$. "permutation" = location-relabelling permutation test (two-sided via $`
n_perm integer (default 999L) Number of permutations for test = "permutation". Capped below $n!$ for tiny $n$.
level numeric in $(0, 1)$ (default 0.95) Confidence level used to convert the $p$-value into a verdict.
randomize_seed integer or NULL Seed for randomized quantile residuals of discrete families; ignored otherwise.
seed integer or NULL Seed for the permutation test (reproducibility).
... Unused; present for signature stability.

Mathematics

The function implements the following:

Moran's $I$. With row-standardized weights $w_{ij}$ (so that $\sum_j w_{ij} = 1$) and $S_0 = \sum_{i,j} w_{ij} = n$: $$I = \frac{n}{S_0};\frac{\sum_{i}\sum_{j} w_{ij},(z_i - \bar z)(z_j - \bar z)}{\sum_i (z_i - \bar z)^2}$$ Under spatial exchangeability the expected value is: $$E[I] = -\frac{1}{n - 1}$$

Permutation test. For each of n_perm permutations $\pi$:

  1. Compute $I_\pi$ from the residuals $z_{\pi(i)}$.
  2. Two-sided $p$-value: $$p = \frac{1 + #{b : |I_\pi - E[I]| \ge |I_{\text{obs}} - E[I]|}}{\text{n_perm} + 1}$$

Analytic (Cliff–Ord) normal approximation. Define: $$S_1 = \frac{1}{2}\sum_{i,j}(w_{ij} + w_{ji})^2, \qquad S_2 = \sum_i!\Bigl(\sum_j w_{ij} + \sum_j w_{ji}\Bigr)^{!2}$$ $$\mathrm{Var}[I] = \frac{n^2 S_1 - n S_2 + 3 S_0^2}{S_0^2,(n^2 - 1)} - E[I]^2$$ $$z = \frac{I - E[I]}{\sqrt{\mathrm{Var}[I]}}, \qquad p = 2,\Phi(-|z|)$$

Returns A list of class c("gdpar_spatial_dependence_diagnostic", "list") with components:

Component Type Meaning
residual_type character The residual type used.
n integer Number of observations.
weights character "user" if W was supplied, otherwise the matched weights argument.
k integer $k$ used for kNN, or NA_integer_ if not applicable.
style character Always "W" (row-standardized).
n_zero_weight integer Number of locations with zero row sum in the raw weight matrix.
morans_i numeric Observed Moran's $I$, or NA if any location has zero weight.
expected_i numeric $-1/(n-1)$, or NA if undefined.
var_i numeric Analytic variance of $I$ (Cliff–Ord), or NA when test = "permutation" or the analytic variance is non-positive.
z numeric $z$-score from the analytic test, or NA when not computed.
p_value numeric Two-sided $p$-value, or NA if the test is undefined.
test character "permutation" or "analytic".
n_perm integer Effective number of permutations (may be less than requested for tiny $n$), or NA for the analytic test.
level numeric Confidence level used.
verdict character Human-readable summary string. Three forms: (1) "...Undefined..." if zero-weight locations exist, (2) "Spatial dependence detected..." if $p &lt; \alpha$, (3) "No evidence against spatial independence..." otherwise. Also handles the case where $p$ is NA.

Notes

  • S3 dispatch. This is an exported function (not a method). A print method for the returned object is defined immediately below.
  • Guards and assertions.
    • .gdpar_assert_scalar_dep(object, "object") enforces that the fit is scalar ($K = 1$, $p = 1$).
    • stats::var(resid) <= 0 triggers an abort via gdpar_abort with class "gdpar_diagnostic_error".
    • If W is supplied, it must be a numeric $n \times n$ matrix with all finite values; violations abort with class "gdpar_input_error".
    • k must satisfy $1 \le k \le n - 1$; otherwise abort with class "gdpar_input_error".
  • Zero-weight locations. If any location has zero row sum after weight construction, a warning is emitted, morans_i is set to NA, and the verdict reports "Undefined". With kNN and $k \ge 1$ this should never occur (every point gets at least one neighbour).
  • Small-sample warnings. For test = "permutation":
    • $n &lt; 20$: hard warning ("very small… treat the p-value as indicative only").
    • $n &lt; 50$: soft warning ("small… approximate").
  • Permutation cap. max_distinct is factorial(n) for $n \le 10$ and Inf otherwise. n_perm_eff = min(n_perm, max(1, max_distinct - 1)).
  • Analytic test asymmetry warning. If the row-standardized weight matrix Wn is not symmetric (which is typical for kNN and distance-band weights), a warning recommends test = "permutation".
  • Side effects. Calls set.seed() when seed is non-NULL (inside the permutation loop). Calls require_suggested("posterior", ...) to ensure the posterior package is available for extracting posterior draws.
  • Coordinate handling. Coordinates are validated by .gdpar_validate_coords. They are treated as Euclidean; lon/lat data must be projected before calling this function.
  • Residual extraction. Delegates to .gdpar_dependence_residuals(object, residual_type, randomize_seed).
  • Weight construction. kNN via .gdpar_knn_adjacency(coords, k); distance-band via .gdpar_distance_band_adjacency(coords). Both return raw (unstandardized) adjacency matrices.

print.gdpar_spatial_dependence_diagnostic(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_spatial_dependence_diagnostic. Provides a human-readable summary of the spatial dependence diagnostic to the console.

Arguments

Argument Type Meaning
x gdpar_spatial_dependence_diagnostic The diagnostic object to print.
digits integer (default 3L) Number of significant digits for printed statistics.
... Unused; present for S3 generic compatibility.

Mathematics None.

Returns Invisibly returns x (the input object unchanged), following standard R print method conventions.

Notes

  • Export. Declared with @export in the roxygen header, so it is exported and registered as an S3 method for the print generic on class gdpar_spatial_dependence_diagnostic.
  • Body not shown. The source code for the function body is not included in this section (the section ends at the roxygen closing). Only the roxygen documentation is available; the exact formatting of the printed output cannot be described from this section alone.

print.gdpar_spatial_dependence_diagnostic(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_spatial_dependence_diagnostic. Produces a formatted console summary of the spatial dependence diagnostic (Moran's I test on model residuals, optionally on a k-nearest-neighbour or distance-band spatial weight matrix).

Arguments

Argument Type Meaning
x gdpar_spatial_dependence_diagnostic list The diagnostic object to print. Expected components: residual_type (character), n (integer), weights (character, one of "knn", "distance", or other for user-supplied), k (integer, number of neighbours when weights == "knn"), morans_i (numeric or NA), expected_i (numeric, $E[I]$ under the null), test (character, "analytic" or "permutation"), z (numeric, analytic z-score), p_value (numeric), n_perm (integer, number of permutation draws), verdict (character).
digits integer scalar (default 3L) Number of significant digits used by format() when printing numeric statistics.
... Absorbed for S3 method compatibility; unused.

Returns x invisibly (enabling piping/invisible return in scripts).

Notes

  • When x$morans_i is NA, the method prints "Moran's I : undefined" and skips printing the expected value, test statistic, and p-value entirely (regardless of x$test).
  • When x$weights is "knn", the printed label includes the value of x$k via sprintf. When "distance", it prints a fixed label. Any other value falls through to "user-supplied, row-standardized".
  • Output is directed to the console via cat() with sep = "".
  • No side effects beyond printing; no error handling.

gdpar_spatial_dependence_robust(object, data, coords, block_size = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, scheme = c("tiled", "moving"), random_origin = TRUE, B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose Re-estimates the uncertainty (standard errors and percentile confidence intervals) of every AMM coefficient from a scalar Path 1 fit so that inference is robust to unmodelled spatial dependence in the data. The point estimates themselves are unchanged; only the reported uncertainty is adjusted. This implements the working-independence + robust-variance stance of Liang & Zeger (1986) via a spatial block bootstrap: the model is refitted on B spatial block-bootstrap resamples over the observed coords, and the bootstrap standard deviation and percentile intervals of each coefficient replace (or supplement) the model-based (Laplace / posterior) standard errors. The function is the spatial counterpart of gdpar_dependence_robust (temporal); both share one internal refit engine (.gdpar_dependence_robust_engine).

Arguments

Argument Type Default Meaning
object gdpar_eb_fit or gdpar_fit A scalar Path 1 fit (K = 1, p = 1): either an Empirical-Bayes fit from gdpar_eb() or a full-Bayes fit from gdpar(). Validated by .gdpar_assert_scalar_dep.
data data frame The original data frame passed to the fitting function. Must be row-aligned with coords. It is resampled by spatial blocks and the model is refit on each resample. Validated by assert_data_frame.
coords numeric matrix or data frame ($n \times 2$) Spatial coordinates, row-aligned with data. Validated and coerced by .gdpar_validate_coords.
block_size NULL, positive integer, or "auto" NULL Number of grid cells per axis $g$. NULL: variance-optimal rate $g = \max(2, \lfloor n^{1/4} \rceil)$ (decision D100). Positive integer: user-supplied. "auto": data-driven calibration over a grid of $g$ values using residuals (decision D101); falls back to the rate if calibration degenerates.
residual_type character (one of "quantile", "response", "pearson", "deviance") "quantile" Type of residuals used only when block_size = "auto" to feed the data-driven block-size selector. "quantile" gives Dunn-Smyth randomized quantile residuals. Ignored when block_size is NULL or a fixed integer.
randomize_seed NULL or integer NULL Optional seed for reproducibility of the randomized quantile residuals of discrete families. Used only by the "auto" block-size selector; ignored otherwise.
scheme character (one of "tiled", "moving") "tiled" Resampling scheme. "tiled": non-overlapping rectangular cells. "moving": overlapping square blocks anchored on sampled observation points.
random_origin logical scalar TRUE When TRUE and scheme = "tiled", the grid origin is randomly shifted per bootstrap replicate (Politis-Romano-Lahiri circular block idea adapted to 2-D), breaking deterministic boundary artifacts at the cost of one extra random draw per refit.
B integer $\geq 1$ 199L Number of bootstrap refits. Validated by assert_count.
level numeric in $(0,1)$ 0.95 Level for the percentile confidence intervals. Validated by assert_numeric_scalar.
seed NULL or integer NULL Optional seed controlling the block resampling and per-refit Stan seeds for reproducibility. Passed through to .gdpar_dependence_robust_engine.
iter_warmup integer $\geq 1$ 500L Warmup iterations per refit's conditional HMC.
iter_sampling integer $\geq 1$ 500L Sampling iterations per refit's conditional HMC.
chains integer $\geq 1$ 2L Number of chains per refit.
verbose logical scalar TRUE When TRUE, prints an opt-in cost message describing the number of refits, grid dimensions, scheme, and whether the full-Bayes path is in use.
... Additional arguments absorbed for forward compatibility (passed to .gdpar_dependence_robust_engine).

Mathematics

Default block-size rate (decision D100, $d = 2$). The number of cells per axis is $g = \max\bigl(2,\,\bigl\lfloor n^{1/4}\bigr\rceil\bigr)$. This is the $d = 2$ specialization of the variance-optimal rate that minimises the mean-squared error of the block-bootstrap variance estimator. Writing $M$ for the number of points per block (linear extent $M^{1/d}$ per axis), the first-order bias from dependence broken at block edges is $O(M^{-1/d})$ and the estimator variance is $O(M/n)$, so

$$\text{MSE}(M) ;\sim; M^{-2/d} + \frac{M}{n}$$

is minimised at $M \sim n^{d/(d+2)}$. At $d = 1$ this gives $M \sim n^{1/3}$ (the temporal default); at $d = 2$ it gives $M \sim n^{1/2}$ points per block, i.e.\ $g^2 = n/M \sim n^{1/2}$ cells, hence $g \sim n^{1/4}$ cells per axis. The block side-length is $L_0 = \text{ranges} / g$ (per axis), and each point is assigned to the cell

$$\text{cell}(i) = \Bigl\lfloor \min!\Bigl(\frac{x_i - x_{\min}}{L_{0x}},; g-1\Bigr)\Bigr\rfloor ;\cup; \Bigl\lfloor \min!\Bigl(\frac{y_i - y_{\min}}{L_{0y}},; g-1\Bigr)\Bigr\rfloor$$

Resampling.

  • Tiled scheme: non-empty cells are sampled with replacement; the resample is truncated to $n$ observations (introducing a negative bias $O(1/n)$, negligible). When random_origin = TRUE, the grid origin is shifted by a random sub-cell offset per replicate.
  • Moving scheme: overlapping square blocks of side $g$ cells are anchored on sampled observation locations, guaranteeing non-empty blocks.

Data-driven block size ($g$, decision D101). For each candidate $g$ on a grid, $B_0$ cheap (no-refit) spatial block resamples produce the bootstrap variance $V(g)$ of the design-weighted residual functionals

$$\frac{1}{n},\bigl[1,;\tilde{g}_x,;\tilde{g}_y\bigr]^\top z$$

(the influence directions of the coefficient). $g$ is chosen to minimise an empirical mean-squared error: squared bias (anchored at the largest blocks, the least biased because the dependence-breaking bias grows like $g/\sqrt{n}$) plus a leave-one-out jackknife variance, with the $n^{1/4}$ rate as the fallback. A single isotropic $g$ is used.

Returns A list of class c("gdpar_spatial_dependence_robust", "list") with the following components:

Component Type Description
table data frame One row per AMM coefficient. Columns: estimate (original point estimate, unchanged), model_se (model-based SE), robust_se (bootstrap SD), se_ratio (robust_se / model_se), ci_lower and ci_upper (percentile interval at level).
block_size integer The chosen $g$ (cells per axis).
block_size_method character One of "rate" (variance-optimal default; also returned when "auto" falls back), "fixed" (user-supplied integer), or "auto" (data-driven calibration succeeded).
scheme character The resampling scheme used ("tiled" or "moving").
random_origin logical Whether random grid-origin shifts were used (relevant only for "tiled").
n_tiles integer Number of unique spatial cells at the chosen resolution.
B integer Requested number of bootstrap refits.
B_ok integer Number of refits that successfully converged (from .gdpar_dependence_robust_engine).
level numeric The percentile-interval level.
seed integer or NULL The seed actually used (may be supplied or internally generated by the engine).
warnings character vector Accumulated warning messages (single-cell warning if n_tiles <= 1, plus any from the refit engine).
refit_diagnostics list Aggregate per-refit convergence diagnostics, structured as in gdpar_dependence_robust.

A print method is declared (signature print.gdpar_spatial_dependence_robust(x, digits, ...); body in another section).

Notes

  • Input validation. object is checked by .gdpar_assert_scalar_dep (must be a scalar Path 1 fit). coords is validated by .gdpar_validate_coords for dimension and type. Collinear coordinates (zero range on either axis) raise an error via gdpar_abort (class gdpar_input_error). block_size must be NULL, a positive integer, or the string "auto"; any other string triggers an error. random_origin and verbose must be logical scalars. B, iter_warmup, iter_sampling, chains are validated by assert_count. level is validated as a numeric scalar in $(0, 1)$.
  • Single-cell warning. If all locations fall into a single spatial cell at the chosen resolution (n_tiles <= 1), a warning is emitted and the bootstrap SE will collapse toward zero. The warning message is stored in warnings_pre and appended to the returned warnings vector.
  • Full-Bayes detection. The function detects whether object is a full-Bayes fit (inherits(object, "gdpar_fit") && !inherits(object, "gdpar_eb_fit")) to adjust the verbose cost message accordingly (full-Bayes refits use full HMC and are markedly more expensive).
  • Suggested dependencies. Requires cmdstanr (for Stan refits) and posterior (for extracting posterior draws); both are loaded via require_suggested.
  • Internal engine. The actual bootstrap loop is delegated to .gdpar_dependence_robust_engine, which receives a resample_fun closure that calls .gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges) to generate block indices for each replicate.
  • Coordinate pre-processing. mins (per-axis minima) and ranges (per-axis ranges) are computed from coords and used throughout cell assignment and the resampling closure.
  • ... arguments. Forwarded to .gdpar_dependence_robust_engine; currently absorbed for compatibility with the temporal sibling gdpar_dependence_robust.
  • No dependence modelling. The function does not model spatial dependence; it only makes inference robust to it. Valid for weak / short-range spatial dependence relative to block size; does not rescue strong long-range dependence.
  • Isotropic block. A single isotropic $g$ is used for both coordinate axes; strongly anisotropic residual dependence is a documented limitation.

print.gdpar_spatial_dependence_robust(x, digits = 3L, ...)

Purpose S3 print method for objects of class "gdpar_spatial_dependence_robust". Renders a human-readable summary of spatial-dependence-robust inference results produced by spatial block-bootstrap variance estimation. It displays the block-bootstrap configuration (grid size, scheme, number of non-empty tiles), the number of refits performed and succeeded, the confidence level, a formatted table of coefficient estimates with model-based and robust standard errors and their ratio, a brief interpretation of the se_ratio, refit diagnostics, and any stored warnings.

Arguments

Argument Type Meaning
x list (S3 class "gdpar_spatial_dependence_robust") The spatial-dependence-robust result object. Expected to contain the named elements scheme, block_size_method, block_size, random_origin, n_tiles, B, B_ok, level, table, refit_diagnostics, and warnings.
digits integer(1) (default 3L) Number of significant digits used when formatting numeric columns of the coefficient table via format().
... Additional arguments passed to print(); accepted for S3 method signature compatibility but not used in the body.

Returns Invisibly returns x, the original input object (via invisible(x)). The primary effect is the side effect of printing to the console.

Notes

  1. S3 dispatch. This is an S3 method registered for the generic print on objects of class "gdpar_spatial_dependence_robust". Standard print() dispatch applies.
  2. Header line. Prints the scheme name (e.g. "lattice", "tiled") followed by " spatial block bootstrap".
  3. Grid description. Prints block_size × block_size cells. The block_size_method element is checked:
    • "auto" appends " (auto: data-driven calibration)".
    • "rate" appends " (rate: n^(1/4))".
    • "fixed" or any other value (including NULL via the %||% fallback) appends nothing. If random_origin is TRUE and scheme is exactly "tiled", the note " (randomized origin)" is also appended. The count of non-empty tiles (n_tiles) is always shown.
  4. Refit summary. Displays the total number of bootstrap refits (B) and how many completed successfully (B_ok).
  5. Confidence level. Prints level (a numeric probability, e.g. 0.95).
  6. Coefficient table formatting. Columns of x$table that are numeric are re-formatted with format(col, digits = digits). The table is then printed with row.names = FALSE.
  7. Interpretation hint. A short explanatory sentence is printed: the se_ratio is defined as robust_se / model_se; a ratio greater than 1 indicates that the model-based standard errors understate the spatial-dependence-robust uncertainty.
  8. Refit diagnostics. Delegates to .gdpar_print_refit_diagnostics(x$refit_diagnostics, digits) (an internal helper defined elsewhere) to print any additional convergence or numerical diagnostics from the bootstrap refits.
  9. Warnings. If x$warnings is a non-empty character vector, up to 5 warnings are printed, each preceded by " - ". If more than 5 exist, a count of remaining warnings is appended.
  10. Edge cases. If block_size_method is NULL, the %||% (null-coalescing) operator defaults to "fixed", so no calibration label is printed. If scheme is not "tiled" or random_origin is not TRUE, the randomized-origin note is omitted. If x$warnings has length zero or is NULL, the warnings section is skipped entirely (the if guards handle this).

R/dims_spec.R

dimwise(a = NULL, b = NULL)

Purpose Constructor for the dims_spec S3 class. It broadcasts a single uniform per-component specification (additive basis a, multiplicative basis b) across all $p$ dimensions of a multivariate $\theta_i$, deferring the actual value of $p$ to the point of consumption (amm_spec). It is the intended value of the dims argument of amm_spec when every dimension shares the same $a_k, b_k$ structure; per-dimension deviations are subsequently layered on with override.

Arguments

  • a : NULL or a one-sided formula. The additive basis applied uniformly to every dimension $k = 1, \ldots, p$ of $\theta_i$. NULL disables the additive component for all dimensions.
  • b : NULL or a one-sided formula. The multiplicative basis applied uniformly to every dimension. NULL disables the multiplicative component for all dimensions.

Mathematics The object encodes, for the canonical AMM form $$ \theta_i[k] = \theta_{\mathrm{ref}}[k] + a_k(x_i) + b_k(x_i),\theta_{\mathrm{ref}}[k] + \bigl(W_k(\theta_{\mathrm{ref}}) - W_k(\theta_{\mathrm{anchor}})\bigr),x_i, \quad k = 1, \ldots, p, $$ the per-dimension, covariate-only pieces $a_k$ and $b_k$. The cross-dimension modulator $W_k(\theta_{\mathrm{ref}})$ is deliberately excluded from dims_spec because it couples all dimensions of $\theta_{\mathrm{ref}}$ and cannot be factored per $k$ without silently restricting the model to the separable sub-class.

Returns A list of class c("dims_spec", "list") with two components:

  • base : a list list(a = a, b = b) holding the uniform template.
  • overrides : an empty named list list(), to be populated by override.

Notes

  • Both arguments may simultaneously be NULL; this is permitted and yields a dims_spec whose base disables both components.
  • Validation is delegated to assert_one_sided_formula(., allow_null = TRUE) for each of a and b; malformed formulas abort there.
  • The dimension $p$ is intentionally not stored; coherence with $p$, the multivariate $W$ basis, and any overrides is validated later by amm_spec.
  • Bare formulas passed directly to amm_spec's dims argument when $p &gt; 1$ are rejected; wrapping in dimwise() is the explicit opt-in to broadcasting.

override(dims, k, a, b)

Purpose Attach a per-dimension override to an existing dims_spec, replacing the additive and/or multiplicative formula for a single dimension index k while leaving the base template and other dimensions untouched. Overrides compose across multiple calls and overwrite on repeated k.

Arguments

  • dims : a dims_spec object (produced by dimwise).
  • k : a positive integer scalar. The dimension index to override. Coherence with the global $p$ is checked later by amm_spec/resolve_dims_spec, not here.
  • a : optional. A one-sided formula replacing the additive basis for dimension k, or NULL to disable the additive component for that dimension only. Missing (omitted) means "inherit from base"; explicitly NULL means "disable for this dimension".
  • b : optional. Same semantics as a for the multiplicative basis.

Mathematics For the overridden dimension $k$, the resolved per-dimension pair $(a_k, b_k)$ becomes $$ a_k = \begin{cases} a^{\text{ov}} & \text{if a supplied} \ a^{\text{base}} & \text{if a missing} \end{cases}, \qquad b_k = \begin{cases} b^{\text{ov}} & \text{if b supplied} \ b^{\text{base}} & \text{if b missing} \end{cases}, $$ where a supplied value of NULL is interpreted as "disabled" (a valid formula replacement of NULL), distinct from "missing" (inherit).

Returns A new dims_spec (a modified copy of dims; the input is not mutated in place because list subsetting creates copies) with the override registered under the character key as.character(as.integer(k)) in dims$overrides. Each override entry is a list with components a, b, a_set (logical), b_set (logical). Calling override twice with the same k replaces the prior entry for that index.

Notes

  • assert_inherits(dims, "dims_spec", "dims") is called first; non-dims_spec input aborts.
  • assert_count(k, "k") enforces that k is a positive integer scalar.
  • If both a and b are missing, the function aborts via gdpar_abort with class = "gdpar_input_error" and the message: "override(): at least one of 'a' or 'b' must be supplied. To leave a dimension unchanged, do not call override() for it."
  • The missing-vs-NULL distinction is implemented with base::missing(). When a is supplied, assert_one_sided_formula(a, "a", allow_null = TRUE) is run, then ov["a"] <- list(a) (the [<--with-list idiom is used so that assigning NULL retains the element rather than deleting it) and ov$a_set <- TRUE. Symmetrically for b.
  • If no prior override exists for k, a fresh entry list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE) is seeded before applying the supplied arguments, so unsupplied components correctly remain flagged as unset and will inherit from the base at resolution time.
  • Range validation of k against $p$ is not performed here; it is deferred to resolve_dims_spec.

resolve_dims_spec(dims, p)

Purpose Internal resolver that flattens a dims_spec into the canonical per-dimension representation consumed by amm_spec: a length-p list of list(a, b) pairs, with overrides applied on top of the base template.

Arguments

  • dims : a dims_spec object.
  • p : a positive integer scalar, the global dimension.

Mathematics For each $k \in \{1, \ldots, p\}$ the resolved pair is $$ (a_k, b_k) = \begin{cases} (a^{\text{ov}}_k, b^{\text{ov}}_k) & \text{if an override for } k \text{ exists, per-component when set} \ (a^{\text{base}}, b^{\text{base}}) & \text{otherwise.} \end{cases} $$ Concretely, $a_k = a^{\text{ov}}_k$ iff ov$a_set is TRUE, else $a_k = a^{\text{base}}$; likewise for $b_k$.

Returns A list of length p. Each entry is list(a = a_k, b = b_k) where a_k/b_k are each either a one-sided formula or NULL.

Notes

  • assert_inherits(dims, "dims_spec", "dims") and assert_count(p, "p") are run first.
  • Before flattening, every override key is parsed with suppressWarnings(as.integer(key)). Any key that is NA, < 1, or > p is collected into bad; if bad is non-empty, the function aborts via gdpar_abort with class = "gdpar_input_error", a sprintf message listing the bad keys and the valid range 1:p, and a data field list(bad_keys = bad, p = p).
  • The flattening loop iterates seq_len(p); for each k it starts from dims$base$a / dims$base$b, then if dims$overrides[[as.character(k)]] is non-NULL it conditionally replaces a_k when isTRUE(ov$a_set) and b_k when isTRUE(ov$b_set). Unset components therefore inherit from the base, realising the missing-vs-NULL semantics established by override.
  • Marked @keywords internal / @noRd; not exported.

print.dims_spec(x, ...)

Purpose S3 print method for objects of class dims_spec. Renders a compact human-readable summary of the base template and any registered overrides.

Arguments

  • x : a dims_spec object.
  • ... : ignored; present for S3 generic compatibility.

Returns Invisibly returns x.

Notes

  • Output layout:
    • Header line <dims_spec>.
    • base: section printing a : <deparse(formula) or "NULL"> and b : <deparse(formula) or "NULL">. Formula deparsing uses base::deparse.
    • If length(x$overrides) > 0L, an overrides: section enumerating each override. Keys are sorted by their integer value (sort(as.integer(names(x$overrides)))) and printed as k = <int> : <parts>, where <parts> is the semicolon-joined set of a = <deparse or "NULL"> (only if isTRUE(ov$a_set)) and b = <deparse or "NULL"> (only if isTRUE(ov$b_set)). Unset components are omitted from the line.
    • If there are no overrides, prints overrides: <none>.
  • The method is exported (the generic print is dispatched via UseMethod on class "dims_spec", which sits before "list" in the class vector).
  • No validation of x is performed; passing a malformed object may produce confusing output or errors from deparse/subsetting.

R/eb_methods.R

print.gdpar_eb_fit(x, digits = 3L, ...)

Purpose
Provides a concise, human-readable console summary of a fitted Empirical-Bayes model object. This S3 print method dispatches on objects of class gdpar_eb_fit, displaying key model characteristics, parameter estimates, numerical diagnostics of the Laplace approximation, and conditional HMC diagnostics.

Arguments

  • x: A gdpar_eb_fit object, the result of an Empirical-Bayes fitting procedure.
  • digits: Integer scalar. Controls the number of digits for numeric formatting via format(). Defaults to 3L.
  • ...: Additional arguments (unused; included for S3 method consistency).

Mathematics
No explicit mathematical formula is implemented. The method presents estimates and standard errors computed elsewhere.

Returns
Invisibly returns the input object x (type gdpar_eb_fit).

Notes

  • S3 Dispatch: Invoked by print() when the first argument is of class gdpar_eb_fit.
  • Conditional Output: The printed output adapts based on the path component of the object. For "eb_KxP" (Path C, the K×p regime), it prints a multi-dimensional array of estimates (theta_ref_kp_hat, theta_ref_kp_se), slot names, and per-slot condition numbers. For other paths, it prints scalar/vectors of theta_ref_hat and theta_ref_se.
  • Side Effects: Writes directly to the console via cat().
  • NULL-safe Access: Uses the %||% operator (likely from rlang) to provide default values for potentially NULL components (e.g., x$family$name), preventing errors during formatting.
  • Diagnostics: Displays numerical diagnostics (diagnostics_numerical) and, if available, one-line conditional HMC diagnostics (diagnostics).

summary.gdpar_eb_fit(object, level = 0.95, ...)

Purpose
Constructs a structured summary of an Empirical-Bayes fit suitable for programmatic access and further printing. This S3 summary method computes credible intervals, optionally applying the Proposition 7B scalar or tensor correction, and extracts a summary of the conditional posterior if available.

Arguments

  • object: A gdpar_eb_fit object.
  • level: Numeric scalar in the interval (0, 1). Specifies the probability level for credible intervals. Defaults to 0.95.
  • ...: Additional arguments (unused).

Mathematics

  1. Credible Interval Inflation (Correction):
    The standard error (se) is multiplied by an inflation factor inflate to widen the credible interval, accounting for the uncertainty of the reference anchor.

    • Scalar Correction (Path C off):
      $$ \text{inflate} = \sqrt{1 + \frac{C}{\max(1, J)}} $$
      where $C$ is the constant object$eb_correction_constant and $J$ is the number of groups.
    • Tensor Correction (Path C on):
      For each group $g$, slot $k$, and coordinate $c$:
      $$ \text{inflate}_{k,c} = \sqrt{1 + \frac{\mathbf{T}[k, c, c]}{\max(1, J)}} $$
      where $\mathbf{T}$ is the object$correction_tensor_constant.
  2. Credible Interval Calculation:
    The $(1-\alpha)$ credible interval is:
    $$ \text{estimate} \pm z_{1-\alpha/2} \cdot \text{se} \cdot \text{inflate} $$
    where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\alpha = 1 - \text{level}$.

Returns
An object of class summary.gdpar_eb_fit, which is a list containing:

  • theta_table: A data.frame (or array for Path C) of estimates, standard errors, lower/upper interval bounds, and inflation factors.
  • conditional_summary: A posterior summary (from the posterior package) of the conditional model fit, if available. Otherwise NULL.
  • correction_applied: Logical flag indicating if an EB correction was applied.
  • correction_constant (non-Path C) or correction_tensor (Path C): The correction value(s) used.
  • inflation_factor: The computed inflation factor(s).
  • level, family, link, J_groups, K_slots, p_dim, slot_names (Path C), diagnostics_numerical, diagnostics_hmc, path, call: Various model metadata.

Notes

  • Input Validation: Raises a gdpar_input_error (via gdpar_abort()) if level is not a single numeric value in (0, 1).
  • Conditional Posterior Extraction: Attempts to extract and summarize the conditional posterior draws using posterior::summarise_draws(). Filters out latent parameters (e.g., eta, log_lik) by pattern matching. Errors are silently caught, returning NULL.
  • Path Dependency: The structure of the returned summary, especially theta_table, differs significantly between the scalar (non-Path C) and tensor (Path C, eb_KxP) regimes.

print.summary.gdpar_eb_fit(x, digits = 3L, ...)

Purpose
Formats and prints the summary of an Empirical-Bayes fit produced by summary.gdpar_eb_fit(). This S3 print method provides a detailed, human-readable display of the summary object.

Arguments

  • x: A summary.gdpar_eb_fit object.
  • digits: Integer scalar for numeric formatting. Defaults to 3L.
  • ...: Additional arguments (unused).

Mathematics
No new calculations; presents the pre-computed values from the summary object.

Returns
Invisibly returns the input summary object x.

Notes

  • S3 Dispatch: Invoked by print() when the first argument is of class summary.gdpar_eb_fit.
  • Path-Dependent Output: Prints different sections depending on whether x$path is "eb_KxP" (Path C). For Path C, it prints the tensor-based correction details and the full theta_table. For other paths, it prints the scalar correction constant and inflation factor.
  • Conditional Summary Display: If available, prints the first 8 rows of the conditional posterior summary for a quick overview.
  • Side Effects: Writes directly to the console via cat() and print().

coef.gdpar_eb_fit(object, ...)

Purpose
Extracts coefficient estimates from a fitted empirical Bayes General Dynamic Parameter model (gdpar_eb_fit object). It returns the reference parameter estimates and, if a conditional HMC fit is available, the conditional model parameters (random effects, fixed effects, and raw W parameters).

Arguments

  • object: A gdpar_eb_fit object resulting from a call to a fitting function (e.g., gdpar_eb).
  • ...: Additional arguments (currently unused).

Mathematics
No new mathematical operations. It extracts precomputed quantities:

  • $\widehat{\theta}_{\text{ref}}^{\text{EB}}$: The empirical Bayes estimate of the reference parameter.
  • $\text{SE}(\widehat{\theta}_{\text{ref}})$: Its standard error.
  • $\text{Cov}(\widehat{\theta}_{\text{ref}})$: Its covariance matrix.
  • For conditional parameters, it extracts posterior means and standard deviations from HMC draws: $$ \widehat{\mu}a = \frac{1}{S} \sum{s=1}^S a^{(s)}, \quad \text{SD}(a) = \sqrt{\frac{1}{S-1} \sum_{s=1}^S (a^{(s)} - \widehat{\mu}_a)^2} $$ (analogous for b and W), where $S$ is the number of posterior draws.

Returns
A list of class c("gdpar_coef_eb", "gdpar_coef", "list") with components:

  • theta_ref: A list containing:
    • method: Character "EB".
    • estimate: Numeric scalar, $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
    • se: Numeric scalar, standard error.
    • cov: Numeric matrix, covariance matrix.
    • eb_correction_applied: Logical, whether an EB correction was applied.
    • eb_correction_constant: Numeric, the constant used for EB correction (if any).
  • If object$conditional_fit exists and posterior package is available:
    • a: List with estimate (vector of means) and se (vector of SDs) for a_coef parameters.
    • b: List with estimate and se for b_coef parameters.
    • W: List with estimate and se for W_raw parameters.

Notes

  • S3 method for class gdpar_eb_fit.
  • The conditional parameters (a, b, W) are only extracted if the posterior package is available and the conditional fit object contains draws.
  • The helper function pick(pat) uses a regex pattern pat to match variable names in the posterior draws and returns their means and SDs.
  • The output class inherits from gdpar_coef, allowing use of generic coefficient methods.

predict.gdpar_eb_fit(object, newdata = NULL, type = c("response", "linear_predictor"), level = 0.95, ...)

Purpose
Computes posterior predictions from the conditional HMC model fit at the plug-in empirical Bayes estimate $\widehat{\theta}_{\text{ref}}^{\text{EB}}$. Supports in-sample prediction only; out-of-sample prediction is deferred to a later phase.

Arguments

  • object: A gdpar_eb_fit object.
  • newdata: Optional data frame with the same variables as training data. Currently must be NULL (in-sample prediction).
  • type: Character string specifying prediction scale:
    • "response" (default): Predictions on the response scale via the family's inverse-link function ($y$).
    • "linear_predictor": Predictions on the linear predictor scale ($\eta$).
  • level: Numeric scalar in $(0,1)$ for the credible interval width. Defaults to $0.95$.
  • ...: Additional arguments (currently unused).

Mathematics
Let $\eta^{(s)}$ be the $s$-th posterior draw of the linear predictor, and $y^{(s)}$ the corresponding response-scale draw. For each observation $i$:

  • Mean: $\bar{\eta}_i = \frac{1}{S} \sum_{s=1}^S \eta_i^{(s)}$ (or $\bar{y}_i$ for response).
  • Credible interval bounds: $Q_{\alpha/2}(\eta_i)$ and $Q_{1-\alpha/2}(\eta_i)$, where $\alpha = 1 - \text{level}$ and $Q$ denotes the sample quantile.

Returns
A list with components:

  • mean: Numeric vector of posterior predictive means (length $n$).
  • lower: Numeric vector of lower credible interval bounds (length $n$).
  • upper: Numeric vector of upper credible interval bounds (length $n$).
  • draws: Numeric matrix of posterior predictive draws (dimensions $S \times n$).
  • level: The credible interval level used.
  • type: The prediction type ("response" or "linear_predictor").

Notes

  • S3 method for class gdpar_eb_fit.
  • If newdata is not NULL, an error of class "gdpar_unsupported_feature_error" is raised, stating that out-of-sample prediction is not yet implemented (deferred to Sub-phase 8.6.C).
  • Requires the posterior package to extract and manipulate HMC draws.
  • The function searches for variables in the conditional fit's draws matching "^eta\\[" (for linear predictor) or "^y_pred\\[" (for response). If none are found, an internal error is raised.
  • Quantiles are computed using stats::quantile with names = FALSE.
  • The draws matrix is transposed from the posterior draws matrix format to $S \times n$.

R/eb.R

gdpar_eb(formula, family = gdpar_family("gaussian"), amm = amm_spec(), W = NULL, data, prior = NULL, anchor = "prior_mean", skip_id_check = FALSE, chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L, adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L, verbose = TRUE, seed = NULL, group = NULL, parametrization = c("auto", "ncp", "cp"), id_check_rigor = c("full", "fast"), eb_correction = TRUE, laplace_control = list(), ...)

Purpose

Exported main entry point for Path 1 Empirical-Bayes (EB) estimation of the AMM canonical model. It is the EB counterpart of gdpar(). The function implements a three-step pipeline:

  1. Step (i) — Estimate the population reference $\theta_{\text{ref}}$ by maximizing the marginal (Type II) likelihood via Laplace approximation (cmdstanr::laplace()), with multi-start optimization and adaptive Levenberg–Marquardt ridge perturbation for numerical anti-fragility.
  2. Step (iii) — Sample the lower-level parameters $\xi = (a, b, W, \sigma_*, \phi)$ from the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$ via HMC (cmdstanr::sample()).
  3. Optionally apply the scalar Proposition 7B coverage-discrepancy inflation factor to the conditional credible intervals.

The function dispatches across three path regimes based on the resolved $(K, p)$:

  • Path A (8.6.B): $K = 1$, $p = 1$ — the base regime executed inline in the function body.
  • Path B (8.6.C): $K &gt; 1$, $p = 1$ — delegated to .gdpar_eb_run_K().
  • Path C (8.6.D): $K &gt; 1$ and any slot with $p &gt; 1$ — delegated to .gdpar_eb_run_KxP().

Arguments

Argument Type Meaning
formula Two-sided formula or gdpar_formula_set Outcome and RHS specification. Same semantics as gdpar()'s formula. When it inherits from "gdpar_formula_set", the K-input dispatch fires.
family gdpar_family object or named list Distributional family. Sub-phase 8.6.B supports stan_id in c(1, 2, 3, 4) (Gaussian, Poisson, neg-binomial-2, Bernoulli) for the $K=1$ path. When a named list (not inheriting gdpar_family or gdpar_family_multi), it is treated as a multi-family input for K-input dispatch.
amm amm_spec or named list of amm_spec AMM specification. Must have amm$p == 1L for the base regime; multivariate ($p &gt; 1$) flows through Path A internally. A named list (not inheriting amm_spec) triggers K-input dispatch.
W W_basis object or NULL Optional modulating basis (polynomial or B-spline).
data data.frame Data frame containing all variables referenced by formula and amm.
prior gdpar_prior object or NULL Prior specification. When NULL, defaults via gdpar_prior() are used.
anchor Numeric scalar, "prior_mean", or "empirical_y" Anchor value for $\theta_{\text{ref}}$. Default "prior_mean".
skip_id_check Logical scalar If TRUE, skips the basis-restricted identifiability check.
chains Integer scalar Number of HMC chains for Step (iii). Default 4L.
iter_warmup Integer scalar HMC warmup iterations per chain. Default 1000L.
iter_sampling Integer scalar HMC sampling iterations per chain. Default 1000L.
adapt_delta Numeric scalar HMC adapt_delta. Default 0.95.
max_treedepth Integer scalar HMC maximum tree depth. Default 12L.
refresh Integer scalar HMC refresh interval. Default 100L.
verbose Logical scalar Controls diagnostic messages and show_messages/show_exceptions in HMC.
seed Integer scalar or NULL Random seed for reproducibility (Laplace multi-start, parametrization pre-flight, and HMC).
group One-sided formula or NULL Grouping variable specification.
parametrization Character scalar One of "auto" (default), "ncp", "cp". Selects CP/NCP sampling parametrization for additive and modulating components in Step (iii). "auto" triggers a pre-flight diagnostic via resolve_parametrization().
id_check_rigor Character scalar One of "full" or "fast". Matched but not otherwise consumed in this function body (forwarded to K-path orchestrators).
eb_correction Logical scalar If TRUE (default), applies the scalar Proposition 7B inflation factor to conditional credible intervals. If FALSE, issues a gdpar_diagnostic_warning about expected $O(n^{-1})$ under-coverage.
laplace_control Named list Controls for Step (i) Laplace approximation and anti-fragility. Recognized entries: multi_start_M (default 5), kappa_threshold (default 1e10), ridge_init (default 1e-6), epsilon_lm (default sqrt(.Machine$double.eps)), ridge_max_iter (default 10), ridge_grow_factor (default 10.0), laplace_draws (default 1000), optim_algorithm (default "lbfgs"). Resolved by .gdpar_eb_resolve_laplace_control().
... Additional arguments Forwarded to the underlying HMC sampler (conditional_model$sample()) in Step (iii).

Mathematics

The EB estimator maximizes the marginal (Type II) log-likelihood:

$$ \widehat{\theta}_{\text{ref}}^{\text{EB}} = \arg\max_{\theta_{\text{ref}}} ; \ell_{\text{marg}}(\theta_{\text{ref}}), \qquad \ell_{\text{marg}}(\theta_{\text{ref}}) = \log \int p(y \mid \xi, \theta_{\text{ref}}) , p(\xi \mid \theta_{\text{ref}}) , d\xi. $$

The integral is approximated by the Laplace method: for each candidate $\theta_{\text{ref}}$, the inner posterior mode $\widehat{\xi}(\theta_{\text{ref}})$ and Hessian $H(\theta_{\text{ref}})$ are computed, yielding

$$ \ell_{\text{marg}}(\theta_{\text{ref}}) \approx \log p(y \mid \widehat{\xi}, \theta_{\text{ref}}) + \log p(\widehat{\xi} \mid \theta_{\text{ref}}) + \frac{d_\xi}{2}\log(2\pi) - \frac{1}{2}\log\det H(\theta_{\text{ref}}), $$

where $d_\xi$ is the dimension of $\xi$. The marginal standard error for $\theta_{\text{ref}}$ is derived from the inverse of the marginal Hessian (Laplace covariance).

Given $\widehat{\theta}_{\text{ref}}^{\text{EB}}$, the conditional posterior is

$$ p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}}) \propto p(y \mid \xi, \widehat{\theta}_{\text{ref}}^{\text{EB}}) , p(\xi \mid \widehat{\theta}_{\text{ref}}^{\text{EB}}), $$

sampled via HMC in Step (iii).

When eb_correction = TRUE, the scalar Proposition 7B inflation constant $c_{\text{EB}}$ is applied to the conditional credible intervals, inflating their half-widths to account for the $O(n^{-1})$ coverage discrepancy between the EB conditional posterior and the exact marginal posterior.

Returns

An object of class c("gdpar_eb_fit", "list") with the following named components:

Component Type Description
theta_ref_hat Numeric vector (length J_groups) EB point estimates of $\theta_{\text{ref}}$.
theta_ref_se Numeric vector (length J_groups) Marginal standard errors from the Laplace covariance.
conditional_fit cmdstanr fit object The HMC fit from Step (iii).
amm amm_spec The resolved AMM specification.
family gdpar_family The resolved family object.
prior gdpar_prior The resolved prior.
design AMM design object Built by build_amm_design().
anchor Numeric scalar The resolved anchor value.
stan_data Named list The Stan data list (includes K_slots, p_dim).
identifiability_report Report object or NULL Result of gdpar_check_identifiability(); NULL when skip_id_check = TRUE.
diagnostics gdpar_diagnostics Diagnostics from the conditional HMC fit, computed by compute_diagnostics().
diagnostics_numerical Named list Numerical diagnostics from the Laplace step: kappa, lm_perturbation, lm_n_iter, lm_status (one of "not_needed", "converged", "exhausted"), kappa_post_ridge, multi_start_dispersion, marginal_log_lik_history. For Path C, slot-vectorized counterparts (kappa_per_slot, lm_lambda_per_slot, lm_n_iter_per_slot, lm_status_per_slot) replace the scalars.
parametrization Named list Contains cp_a (logical), cp_W (logical), and meta (metadata from resolve_parametrization()).
group_info Group info object or NULL Resolved grouping information.
correction_applied Logical scalar Whether the Proposition 7B correction was applied.
eb_correction_constant Numeric scalar The inflation constant when eb_correction = TRUE; NA_real_ otherwise.
call call The matched call.
path Character scalar Always "eb".

Notes

  • Argument matching: parametrization and id_check_rigor are resolved via match.arg() at function entry. call is captured via match.call().

  • Input validation: Delegates to .gdpar_eb_validate_inputs() (defined in a subsequent section) for type discipline of formula, family, amm, data, eb_correction, and laplace_control. If prior is NULL, it is replaced by gdpar_prior(); then assert_inherits() enforces class "gdpar_prior".

  • cmdstanr dependency: require_suggested("cmdstanr", ...) is called to ensure the suggested package is available. The Laplace method requires cmdstanr ≥ 0.7.0.

  • K-input dispatch: Four boolean flags detect multi-slot input patterns:

    • .formula_set_input: formula inherits "gdpar_formula_set".
    • .amm_list_input: amm is a list, does not inherit "amm_spec", and has non-NULL names.
    • .classic_with_amm_calls: formula is a standard two-sided formula (length 3) whose RHS contains a()/b()/W() calls, detected by .gdpar_rhs_has_amm_calls().
    • .family_is_named_list: family is a named list not inheriting "gdpar_family" or "gdpar_family_multi".

    When any of these fires, .gdpar_eb_resolve_K_inputs() builds amm_list_canonical, family_promoted, outcome_name, formula_env, and family_id_k_vector. If resolved $K &gt; 1$, the function checks whether any slot has $p &gt; 1$ (.any_slot_p_gt1); if so, it returns .gdpar_eb_run_KxP() (Path C), otherwise .gdpar_eb_run_K() (Path B). If $K = 1$, the single amm_spec is unwrapped from amm_list_canonical[[1]], family is replaced by family_promoted, and a new formula is reconstructed from the union of all.vars(amm$a) and all.vars(amm$b) (or "1" if both are empty), using K_inputs$formula_env as the environment.

  • Path A (K = 1) pipeline: After K-input resolution (or if no K-input pattern fired), the function proceeds inline:

    1. p_resolved is read from amm$p (defaulting to 1L if absent). K_resolved is always 1L.
    2. .gdpar_eb_check_stan_id_for_path() validates the family's stan_id against the resolved $(K, p)$.
    3. The outcome variable name is extracted from formula[[2]]. If not found in data, a gdpar_input_error is raised. Non-finite values (NA, NaN, Inf) in the outcome trigger a gdpar_input_error with a count.
    4. The RHS formula is extracted as formula[c(1L, 3L)] and updated with ~ . + 0 (no intercept).
    5. If amm$W is non-NULL, it is materialized via materialize_W_basis(amm$W, p = p_resolved).
    6. The AMM design is built via build_amm_design(amm, data, formula_rhs = rhs).
    7. The anchor is resolved via resolve_anchor(anchor, family, y, prior, verbose).
    8. Unless skip_id_check = TRUE, gdpar_check_identifiability() is called with theta_ref_init set to 1 when amm$b is non-NULL and abs(anchor_value) < 1e-8, otherwise anchor_value. If the check does not pass, a gdpar_identifiability_error is raised with the report attached in data = list(report = rep).
    9. Group resolution via .resolve_group_argument(). If a group is present, .check_group_aliasing_c7() is called.
    10. Stan data is assembled via assemble_stan_data(). stan_data$K_slots and stan_data$p_dim are set to the resolved integers.
    11. Parametrization is resolved via resolve_parametrization() (which may run a pre-flight diagnostic when parametrization = "auto").
    12. The marginal Stan model source is generated by .gdpar_eb_generate_stan_marginal(), written to a tempfile via write_stan_to_tempfile(), and compiled via cmdstanr::cmdstan_model().
    13. The marginal likelihood is maximized by .gdpar_eb_maximize_marginal(), returning theta_ref_hat, theta_ref_se, and diagnostics.
    14. The conditional Stan model source is generated by .gdpar_eb_generate_stan_conditional(), written and compiled analogously.
    15. stan_data_cond is a copy of stan_data with theta_ref_data set: when $p &gt; 1$ and length(theta_hat_loc) == J_groups * p, it is reshaped to a J_groups × p matrix (column-major, byrow = FALSE); otherwise it is passed as a flat numeric vector.
    16. HMC sampling is invoked via do.call(conditional_model$sample, sample_args). Extra arguments from ... are merged into sample_args, potentially overriding defaults. seed is included only when non-NULL.
    17. Diagnostics are computed via compute_diagnostics(fit_cond, verbose = verbose).
    18. The EB correction is computed by .gdpar_eb_apply_correction().
  • Errors raised:

    • gdpar_input_error: outcome variable not in data; outcome contains non-finite values.
    • gdpar_identifiability_error: basis-restricted identifiability check failed (with data = list(report = rep)).
    • gdpar_unsupported_feature_error: raised by .gdpar_eb_check_stan_id_for_path() for unsupported stan_id / $(K, p)$ combinations (as documented; the actual raise is inside the helper).
    • gdpar_eb_numerical_error: raised by .gdpar_eb_maximize_marginal() when the condition number exceeds kappa_threshold after adaptive ridge (as documented; the actual raise is inside the helper).
  • Side effects: Writes Stan source files to temporary files on disk; compiles Stan models (may invoke the C++ toolchain); runs optimization and HMC sampling (may produce console output controlled by verbose/refresh).

  • S3 dispatch: The returned object has class c("gdpar_eb_fit", "list"). No S3 methods for this class are defined in this section.

.gdpar_eb_validate_inputs(formula, family, amm, data, eb_correction, laplace_control)

Purpose

Top-level input validator for the EB (Empirical Bayes) correction pipeline. Called before any dispatch to verify that every public argument conforms to the expected type and structure. Guards the entry point of the EB path and prevents downstream functions from receiving malformed inputs.

Arguments

  • formula (any): Must be either a two-sided R formula of length 3 (y ~ ...) or an object inheriting from class "gdpar_formula_set".
  • family (any): Must be one of: an object inheriting from "gdpar_family", an object inheriting from "gdpar_family_multi" (Path A, $p &gt; 1$), or a named list whose every element inherits from "gdpar_family" with no duplicated or empty names (Path B heterogeneous $K$, sub-phase 8.3.7 pattern).
  • amm (any): Must be an object inheriting from "amm_spec" or a named list (whose elements are expected to be "amm_spec" objects) for Path B $K &gt; 1$.
  • data (any): Must be a data.frame.
  • eb_correction (any): Must be a single, non-NA logical value (TRUE or FALSE).
  • laplace_control (any): Must be a list (possibly empty, possibly unnamed at this stage — naming is enforced downstream in .gdpar_eb_resolve_laplace_control).

Returns

invisible(NULL). The function is called for its side effect of raising errors on invalid input.

Notes

  • Raises an error of class "gdpar_input_error" (via gdpar_abort) for each validation failure, with a condition data field carrying received_class where applicable.
  • The formula check first tests inherits(formula, "gdpar_formula_set"); if that fails, it requires inherits(formula, "formula") and length(formula) == 3L.
  • The family named-list detection (Path B) requires all of: is.list(family), not inheriting from "gdpar_family" or "gdpar_family_multi", non-null names, all names non-empty (nzchar), no duplicated names (anyDuplicated == 0L), and every element inheriting from "gdpar_family" (checked via vapply).
  • The amm named-list detection requires is.list(amm), not inheriting from "amm_spec", and non-null names.
  • The $K &gt; 1$ + $p &gt; 1$ guard (Path C) is explicitly released per Sub-phase 8.6.D (Session 13b, 2026-05-25); Path C is routed to .gdpar_eb_run_KxP() in the dispatcher. Per-path stan_id checks are deferred to .gdpar_eb_check_stan_id_for_path().

.gdpar_eb_check_stan_id_for_path(family, K, p)

Purpose

Enforces the per-path supported stan_id set for the EB Stan templates, depending on the resolved $(K, p)$ regime. Called by the dispatcher (iteratively across $K$ slots under Path C) before assembling the family_id_k_vector data field.

Arguments

  • family (list): A single family specification object. Must contain $stan_id (coercible to integer) and $name (character) fields.
  • K (integer/numeric): The resolved number of mixture components.
  • p (integer/numeric): The resolved number of parametric coordinates.

Mathematics

The supported stan_id sets by regime:

$$ \text{supported}(K, p) = \begin{cases} {1, 2, 3, 4} & \text{if } K = 1 \quad \text{(Gaussian, Poisson, NB, Bernoulli)} \\ {1, 3} & \text{if } K > 1 \text{ and } p > 1 \quad \text{(Path C: Gaussian } K{=}2 \text{ + NB } K{=}2\text{)} \\ {1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13} & \text{if } K > 1 \text{ and } p = 1 \quad \text{(Path B full set)} \end{cases} $$

Note that the $K = 1$ branch does not distinguish $p = 1$ from $p &gt; 1$; both receive $\{1, 2, 3, 4\}$.

Returns

invisible(NULL) on success.

Notes

  • If family$stan_id is NULL, the function returns immediately without checking (short-circuit).
  • stan_id is coerced via as.integer().
  • On failure, raises an error of class "gdpar_unsupported_feature_error" via gdpar_abort, with a condition data list containing family, stan_id, K, p, and supported.
  • Under Path C ($K &gt; 1$, $p &gt; 1$), the dispatcher is expected to iterate this check across the $K$ slots before assembling the family_id_k_vector data field.
  • The deferred Path B set $\{5, 6, 7, 8, 9, 10, 11, 12, 13\}$ (Beta, Gamma, Lognormal_loc_scale, Student-t, Tweedie, ZIP, ZINB, Hurdle-Poisson, Hurdle-NB) for the $K &gt; 1, p &gt; 1$ regime is queued for a later iteration of Sub-phase 8.6.D.

.gdpar_eb_resolve_laplace_control(user)

Purpose

Merges a user-supplied laplace_control list with documented defaults, coercing types and validating bounds. Produces the fully resolved control list consumed by downstream Laplace approximation and ridge-perturbation routines.

Arguments

  • user (list): User-supplied control parameters. May be empty. If non-empty, every entry must be named.

Mathematics

Default values:

$$ \begin{aligned} M_{\text{multi-start}} &= 5 \\ \kappa_{\text{threshold}} &= 10^{10} \\ \lambda_{\text{ridge,init}} &= 10^{-6} \\ D_{\text{Laplace}} &= 1000 \\ \text{algorithm} &= \texttt{"lbfgs"} \\ \epsilon_{\text{lm}} &= \sqrt{\epsilon_{\text{mach}}} \\ \text{ridge_max_iter} &= 10 \\ g_{\text{ridge}} &= 10.0 \end{aligned} $$

where $\epsilon_{\text{mach}}$ is .Machine$double.eps.

Returns

A named list with the following entries, all type-coerced:

Field Type Default
multi_start_M integer 5L
kappa_threshold double 1e10
ridge_init double 1e-6
laplace_draws integer 1000L
optim_algorithm character "lbfgs"
epsilon_lm double sqrt(.Machine$double.eps)
ridge_max_iter integer 10L
ridge_grow_factor double 10.0

User-supplied values for recognized names override defaults; unrecognized names are dropped after a warning.

Notes

  • If user is empty (length(user) == 0L), returns the defaults list directly.
  • If user is non-empty but has NULL names or any empty (!nzchar) names, raises an error of class "gdpar_input_error".
  • Unknown entries (not in names(defaults)) trigger a soft warning of class "gdpar_diagnostic_warning" via gdpar_warn and are silently dropped from the output.
  • Post-merge type coercion: multi_start_M, laplace_draws, ridge_max_iter are coerced via as.integer(); kappa_threshold, ridge_init, epsilon_lm, ridge_grow_factor via as.double(). optim_algorithm is left as-is.
  • Validation bounds (each raises "gdpar_input_error" on failure):
    • multi_start_M >= 1L
    • kappa_threshold > 0
    • epsilon_lm > 0
    • ridge_max_iter >= 1L
    • ridge_grow_factor > 1
  • laplace_draws is coerced to integer but not bounds-checked in this function.

.gdpar_eb_lm_perturb(cov, control)

Purpose

Adaptive Levenberg-Marquardt ridge perturbation for the empirical posterior covariance matrix returned by cmdstanr::laplace(). Implements component 2 of the four-component anti-fragility strategy, extending the single-step ridge of Sub-phase 8.6.B into an iterative geometric-growth loop.

Arguments

  • cov (numeric matrix): A square symmetric matrix — the empirical posterior covariance. For Path C this is a per-slot block; for Path A/B it is the full $\theta_{\text{ref}}$ covariance.
  • control (list): A resolved laplace_control list (as produced by .gdpar_eb_resolve_laplace_control). Must contain $ridge_init, $ridge_max_iter, $ridge_grow_factor, $kappa_threshold, and $epsilon_lm.

Mathematics

Let $\Sigma$ denote the input covariance (cov) of dimension $n \times n$, with eigenvalues $\{\lambda_i\}_{i=1}^{n}$ and trace mean $\bar{d} = \max(|\text{mean}(\text{diag}(\Sigma))|,\; 10^{-12})$.

Trigger condition. Ridge perturbation is needed if:

$$ \text{needs_ridge} = \Big(\exists, i:; \lambda_i \notin \mathbb{R}_{\text{finite}} ;\text{or}; \lambda_i \leq 0\Big) ;;\text{or};; \Big(|\det(\Sigma)| &lt; \epsilon_{\text{lm}}\Big) $$

where $\det(\Sigma) = \prod_i \lambda_i$ when all eigenvalues are finite.

If not needed: returns the original matrix with status "not_needed" and $\kappa_{\text{post}} = \lambda_{\max} / \lambda_{\min}$.

Adaptive loop. Starting with $\lambda \leftarrow \lambda_{\text{ridge,init}}$, for iteration $t = 1, \dots, T_{\max}$:

$$ \lambda_{\text{eff}}^{(t)} = \max!\Big(\lambda^{(t)},; 10^{-3} \cdot \bar{d}\Big) $$

$$ \Sigma_{\text{try}}^{(t)} = \Sigma + \lambda_{\text{eff}}^{(t)} \cdot I_n $$

Compute eigenvalues $\{\mu_i^{(t)}\}$ of $\Sigma_{\text{try}}^{(t)}$. If all finite and positive, compute the condition number:

$$ \kappa^{(t)} = \frac{\mu_{\max}^{(t)}}{\mu_{\min}^{(t)}} $$

Convergence: If $\kappa^{(t)} \leq \kappa_{\text{threshold}}$, return $\Sigma_{\text{try}}^{(t)}$ with status "converged".

Growth: Otherwise, $\lambda^{(t+1)} \leftarrow \lambda^{(t)} \cdot g_{\text{ridge}}$ and iterate.

Exhaustion: If the loop completes $T_{\max}$ iterations without convergence, return the last $\Sigma_{\text{try}}$ with status "exhausted" and $\kappa_{\text{post}} = \kappa^{(T_{\max})}$ (or $\infty$ if eigenvalues are still non-finite/non-positive).

Returns

A list with fields:

Field Type Description
cov_perturbed numeric matrix The (possibly ridged) covariance. Equals cov when status = "not_needed"; equals the last cov_try otherwise.
lambda_used numeric Final effective ridge $\lambda_{\text{eff}}$. Equals 0 when status = "not_needed".
n_iter integer Number of iterations performed. 0L when status = "not_needed". Equals control$ridge_max_iter when "exhausted".
kappa_post numeric Condition number after perturbation. Original $\kappa$ when "not_needed"; $\kappa^{(t)}$ at convergence; $\kappa^{(T_{\max})}$ or $\infty$ when "exhausted".
status character One of c("not_needed", "converged", "exhausted").

Notes

  • Eigenvalue computation uses eigen(cov, symmetric = TRUE, only.values = TRUE) wrapped in tryCatch; if it errors, eigenvalues are set to NA_real_, which triggers the ridge path.
  • The determinant is computed as prod(eigs0) only when all eigenvalues are finite; otherwise det_val is NA_real_ and the determinant-based trigger is skipped (but the eigenvalue-based trigger may still fire).
  • trace_mean is clamped to at least $10^{-12}$ to avoid a zero floor when the diagonal is near-zero.
  • The lambda_eff floor of $10^{-3} \cdot \bar{d}$ is applied inside every iteration, so even if control$ridge_init is very small, the effective ridge is bounded below by the trace-mean-scaled floor.
  • When status = "exhausted", the returned cov_perturbed is the last attempted matrix (which may or may not be positive-definite), and kappa_post is Inf if the final eigenvalues are non-finite or non-positive.
  • No error is raised on exhaustion; the caller is expected to inspect status.

Functions in R/eb.R (section 3 of 8)


.gdpar_eb_generate_stan_marginal(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)

Purpose

Dispatches to the correct Stan template generator for the EB marginal model — the model in which theta_ref (or theta_ref_k) lives in the parameters{} block and is assigned an anchor prior in model{}. This corresponds to Step (i)/(ii) of the EB workflow where the marginal log-likelihood is maximised to obtain the empirical-Bayes anchor estimate. The function selects among four template paths based on the resolved dimensions $(K, p)$.

Arguments

Argument Type Meaning
prior list Prior specification list. Expected elements (consumed downstream by the renderer): theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, phi.
cp_a logical (default FALSE) Centered-parameterization flag for a. When TRUE, a is scaled directly by sigma_a; when FALSE, a non-centered * sigma_a[1] scaling is applied.
cp_W logical (default FALSE) Centered-parameterization flag for W. Semantics mirror cp_a.
K integer (default 1L) Number of K-slots (groups/series). Coerced to integer at entry.
p integer (default 1L) Coordinate dimension of the response. Coerced to integer at entry.
family NULL or family object Passed only to the Path B (K > 1, p = 1) generator generate_stan_code_K.
cp_a_per_k NULL or logical Per-k centered-parameterization flag for a, forwarded to generate_stan_code_multi (Path A).
cp_a_per_K NULL or logical Per-K centered-parameterization flag for a, forwarded to generate_stan_code_K (Path B).

Mathematics

The dispatch is a partition of the $(K, p)$ plane:

$$ \text{template}(K, p) = \begin{cases} \texttt{amm_eb_marginal.stan} & K = 1 \wedge p = 1 \\ \texttt{amm_eb_marginal_multi.stan} & K = 1 \wedge p > 1 \\ \texttt{amm_eb_marginal_K.stan} & K > 1 \wedge p = 1 \\ \texttt{amm_eb_marginal_KxP.stan} & K > 1 \wedge p > 1 \end{cases} $$

Returns

A character string containing the rendered Stan model code. For the $K=1, p=1$ and $K&gt;1, p&gt;1$ paths the string is produced by .gdpar_eb_render_template; for the other two paths it is produced by generate_stan_code_multi or generate_stan_code_K respectively.

Notes

  • $K$ and $p$ are coerced to integer immediately upon entry (as.integer).
  • The Path C template ($K &gt; 1 \wedge p &gt; 1$) has a restricted placeholder set: only theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_y, phi are present. The placeholders {{A_SCALE}}, {{A_PRIOR}}, {{W_SCALE}}, {{W_PRIOR}} are absent because the NCP (non-centered parameterization) is hardcoded per slot per coordinate and W is disabled (decision D39).
  • The function does not itself raise errors; any errors propagate from the downstream generators/renderers.

.gdpar_eb_generate_stan_conditional(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)

Purpose

Companion of .gdpar_eb_generate_stan_marginal for Step (iii) of the EB workflow. Generates the EB conditional Stan model, in which theta_ref (or theta_ref_k) has been moved from parameters{} to data{} and the anchor priors are dropped from model{}. The dispatch table is structurally identical to the marginal helper; only the template names differ.

Arguments

Identical to .gdpar_eb_generate_stan_marginal (same names, types, defaults, and meanings).

Mathematics

$$ \text{template}(K, p) = \begin{cases} \texttt{amm_eb_conditional.stan} & K = 1 \wedge p = 1 \\ \texttt{amm_eb_conditional_multi.stan} & K = 1 \wedge p > 1 \\ \texttt{amm_eb_conditional_K.stan} & K > 1 \wedge p = 1 \\ \texttt{amm_eb_conditional_KxP.stan} & K > 1 \wedge p > 1 \end{cases} $$

Returns

A character string of rendered Stan model code, sourced from the same generators as the marginal path but with conditional template names.

Notes

  • The conditional templates share the same placeholder set as their marginal counterparts, except that anchor-prior placeholders are consumed only in the marginal path (the conditional path drops them from model{}).
  • $K$ and $p$ are coerced to integer at entry.
  • No errors are raised directly; all are delegated downstream.

.gdpar_eb_render_template(template_name, prior, cp_a, cp_W)

Purpose

Shared low-level renderer for the EB Stan template family. Reproduces the placeholder-substitution logic of generate_stan_code() but restricted to EB templates. It (1) translates legacy single-template names to their canonical-piece equivalents, (2) locates the template file in the installed package or falls back to inst/stan/, (3) injects the canonical helpers piece when the // {{CANONICAL_HELPERS}} marker is present, (4) performs all {{...}} substitutions, and (5) aborts with a structured error if any placeholder remains un-substituted.

Arguments

Argument Type Meaning
template_name character Base name of the .stan template file (e.g. "amm_eb_marginal.stan").
prior list Prior specification list; the renderer reads prior$theta_ref, prior$sigma_theta_ref, prior$sigma_a, prior$sigma_b, prior$sigma_W, prior$sigma_y, prior$phi.
cp_a logical Centered-parameterization flag for a. Controls the values substituted for {{A_SCALE}} and {{A_PRIOR}}.
cp_W logical Centered-parameterization flag for W. Controls the values substituted for {{W_SCALE}} and {{W_PRIOR}}.

Mathematics

The placeholder substitution map is:

Placeholder Value when cp_* = TRUE Value when cp_* = FALSE
{{A_SCALE}} "" " * sigma_a[1]"
{{A_PRIOR}} "normal(0, sigma_a[1])" "normal(0, 1)"
{{W_SCALE}} "" " * sigma_W[1]"
{{W_PRIOR}} "normal(0, sigma_W[1])" "normal(0, 1)"

The prior placeholders map directly: {{PRIOR_THETA_REF}} $\leftarrow$ prior$theta_ref, {{PRIOR_SIGMA_THETA_REF}} $\leftarrow$ prior$sigma_theta_ref, {{PRIOR_SIGMA_A}} $\leftarrow$ prior$sigma_a, {{PRIOR_SIGMA_B}} $\leftarrow$ prior$sigma_b, {{PRIOR_SIGMA_W}} $\leftarrow$ prior$sigma_W, {{PRIOR_SIGMA_Y}} $\leftarrow$ prior$sigma_y, {{PRIOR_PHI}} $\leftarrow$ prior$phi.

Returns

A character string: the fully substituted Stan source code.

Notes

  • Template name translation: "amm_eb_marginal.stan" is mapped to "amm_canonical_eb_marginal.stan" and "amm_eb_conditional.stan" is mapped to "amm_canonical_eb_conditional.stan". All other template names (including the KxP templates) pass through unchanged.
  • File location: If the effective template name starts with "amm_canonical_", the file is sought in system.file("stan", "_canonical_pieces", ...) with a fallback to file.path("inst", "stan", "_canonical_pieces", ...). Otherwise it is sought in system.file("stan", ...) with a fallback to file.path("inst", "stan", ...).
  • Helpers injection: If the template source contains the literal // {{CANONICAL_HELPERS}}, the file amm_canonical_helpers.stan is read from the same _canonical_pieces directory and substituted in place. Templates without this marker (e.g. the KxP EB templates) pass through unchanged.
  • Error — template not found: If the resolved template_path does not exist, calls gdpar_abort with class "gdpar_internal_error" and message "Stan template file '<name>' not found.".
  • Error — helpers not found: If the helpers piece file does not exist, calls gdpar_abort with class "gdpar_internal_error".
  • Error — unsubstituted placeholder: After all substitutions, if the string still contains "{{", the first match of \{\{[A-Za-z0-9_]+\}\} is extracted via regmatches/regexpr and passed to gdpar_abort with class "gdpar_internal_error" and message "Unsubstituted placeholder remains in EB Stan code: <leftover>".
  • All gsub calls use fixed = TRUE, so placeholders are treated as literal strings.

.gdpar_eb_maximize_marginal(model, stan_data, control, seed, verbose)

Purpose

Implements Step (i) of the EB workflow with the anti-fragility strategy of Charter Section 2.8. Runs cmdstanr::optimize() followed by cmdstanr::laplace() on the marginal EB Stan model with multi_start_M independent random inits, retains the init with the highest log-marginal approximation, applies an adaptive Levenberg–Marquardt ridge if the Hessian-derived covariance is ill-conditioned, and assembles the diagnostics needed by the gdpar_eb_fit$diagnostics_numerical slot.

Arguments

Argument Type Meaning
model CmdStanModel A compiled cmdstanr model object exposing $optimize() and $laplace() methods.
stan_data list Data list for Stan. Must contain J_groups (integer, number of groups). For path dispatch, may contain p_dim (integer, coordinate dimension) and K_slots (integer, number of K-slots).
control list Control parameters. Must contain: multi_start_M (integer, number of multi-start inits), optim_algorithm (character, passed to optimize), laplace_draws (integer, number of Laplace draws), kappa_threshold (numeric, condition-number gate).
seed NULL or integer Base random seed. When non-NULL, per-init seeds are as.integer(seed) + m for optimize and as.integer(seed) + 1000L for Laplace.
verbose logical When TRUE, emits informational messages about failed inits and multimodality warnings.

Mathematics

Multi-start optimization. For $m = 1, \dots, M$:

$$ \hat{\theta}^{(m)} = \arg\max_{\theta} ; \log p(\theta \mid \text{data}) $$

The best init is selected by the largest finite $\log p$ (the lp__ value from optimize()):

$$ m^\star = \arg\max_{m \in {1,\dots,M}} ; \text{lp}^{(m)} $$

Laplace approximation. At the best mode $\hat{\theta}^{(m^\star)}$:

$$ \hat{\Sigma} = \left[ -\nabla^2 \log p(\theta \mid \text{data}) \Big|_{\hat{\theta}} \right]^{-1} $$

Adaptive Levenberg–Marquardt ridge. If $\hat{\Sigma}$ is non-PD or $|\det(\hat{\Sigma})| &lt; \varepsilon_{\text{lm}}$, a ridge $\lambda I$ is grown geometrically until the post-ridge condition number satisfies $\kappa_{\text{post}} \leq \kappa_{\text{threshold}}$ or ridge_max_iter is reached:

$$ \hat{\Sigma}_{\text{ridged}} = \hat{\Sigma} + \lambda I $$

Condition-number gate. The final covariance is accepted only if:

$$ \kappa_{\text{post}} \leq \kappa_{\text{threshold}} \quad \wedge \quad \text{status} \neq \text{"exhausted"} $$

Multi-start dispersion. Computed over the finite $\log p$ values:

$$ \text{dispersion} = \frac{\text{sd}({\text{lp}^{(m)} : \text{finite}})}{\max(|\overline{\text{lp}}|, 1)} $$

A dispersion exceeding $0.05$ triggers a multimodality warning when verbose = TRUE.

Path-aware variable extraction. The theta_ref variable names extracted from the Laplace draws depend on the path:

Path Condition Variable pattern Expected count
Base $K=1, p=1$ theta_ref[1], …, theta_ref[J] (or theta_ref if $J=1$) $J$
Path A $p &gt; 1$ (and $K=1$) theta_ref[j,k] for $j \in 1..J$, $k \in 1..p$ $J \cdot p$
Path B $K &gt; 1$ (and $p=1$) theta_ref_k[j,k] for $j \in 1..J$, $k \in 1..K$ $J \cdot K$

Returns

A list with components:

Component Type Description
theta_ref_hat numeric vector (length $J$ or $J \cdot p$ or $J \cdot K$) Posterior mean of theta_ref from the Laplace draws (colMeans of the draws matrix).
theta_ref_se numeric vector (same length) Standard errors: $\sqrt{\max(\text{diag}(\hat{\Sigma}), 0)}$.
theta_ref_cov matrix ($n \times n$) Covariance matrix (possibly ridged).
diagnostics named list See below.

The diagnostics list contains:

Element Type Description
kappa numeric Post-ridge condition number $\kappa_{\text{post}}$.
lm_perturbation numeric The ridge $\lambda$ used (lambda_used).
lm_n_iter integer Number of LM ridge iterations.
lm_status character Status from .gdpar_eb_lm_perturb (e.g. "ok" or "exhausted").
kappa_post_ridge numeric Duplicate of kappa (from lm_out$kappa_post).
multi_start_dispersion numeric Dispersion of finite $\log p$ values across multi-start inits; NA if fewer than 2 finite values.
marginal_log_lik_history numeric vector (length $M$) lp__ from each init; NA for failed inits.
best_init_index integer The $m^\star$ index of the winning init.

Notes

  • Init dispatch: The flag is_multi_or_K is TRUE when stan_data$p_dim > 1L or stan_data$K_slots > 1L. In that case, init_m is set to NULL (cmdstanr's default unconstrained-space random sampler is used). Otherwise, .gdpar_eb_make_random_init(stan_data, seed_offset = m, base_seed = seed) is called. Each multi-start iteration uses a distinct seed offset, preserving reproducibility.
  • Optimize call: jacobian = TRUE is always set (required for downstream laplace() to match the unconstrained-scale convention). When init_m is non-NULL, it is wrapped as list(init_m) (single chain). When seed is non-NULL, the per-init seed is as.integer(seed) + m.
  • Laplace call: Uses mode = best_opt, jacobian = TRUE, draws = control$laplace_draws. Seed (if non-NULL) is as.integer(seed) + 1000L.
  • Error — all inits fail: If best_opt is NULL (every optimize() call failed or returned NULL), calls gdpar_abort with class "gdpar_unsupported_feature_error", message recommending gdpar() (FB), and data = list(history_lp = history_lp).
  • Error — Laplace fails: If model$laplace() returns NULL (error caught), calls gdpar_abort with class "gdpar_eb_numerical_error", message about singular/non-PD Hessian at the candidate MAP, and data = list(history_lp, best_idx).
  • Error — missing theta_ref variables (Path B): If the number of theta_ref_k[j,k] variables found in the draws does not equal $J \cdot K$, calls gdpar_abort with class "gdpar_internal_error".
  • Error — missing theta_ref variables (Path A): If the number of theta_ref[j,k] variables found does not equal $J \cdot p$, calls gdpar_abort with class "gdpar_internal_error".
  • Error — missing theta_ref variables (Base): If no theta_ref[...] variables are found and $J = 1$ does not rescue via the bare theta_ref name, calls gdpar_abort with class "gdpar_internal_error" and message "theta_ref variable not found in Laplace draws output.".
  • Error — kappa exceeds threshold: If $\kappa_{\text{post}} &gt; \kappa_{\text{threshold}}$ or lm_out$status == "exhausted", calls gdpar_abort with class "gdpar_eb_numerical_error", a detailed message including $\kappa$, threshold, LM status, iteration count, $\lambda$, and smallest eigenvalue, and data containing kappa, eigenvalues, history_lp, lm_status, lm_n_iter, lm_lambda.
  • Warning — multimodality: When dispersion > 0.05 and verbose = TRUE, calls gdpar_warn with class "gdpar_diagnostic_warning" and data = list(dispersion, history_lp).
  • Covariance computation: If the draws matrix has more than one column, stats::cov(theta_mat) is used; otherwise a $1 \times 1$ matrix from stats::var(theta_mat[, 1]).
  • Eigenvalue computation: eigen(theta_cov, symmetric = TRUE, only.values = TRUE) is attempted in a tryCatch; on error returns NA_real_. The minimum eigenvalue is reported in the kappa-exceeds-threshold error message.
  • Verbose messages: Failed optimize() calls emit a gdpar_inform with class "gdpar_eb_message" when verbose = TRUE.
  • The %||% operator is used for the all_vars fallback (dimnames(draws)$variable %||% character(0L)).

← Part IV — Exhaustive Function Reference (1/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (3/7) →

Clone this wiki locally