Part IV Function Reference 2
print.gdpar_meta_learner_comparison(x, ...)
Purpose
S3 print method for objects of class gdpar_meta_learner_comparison. Renders a concise human-readable summary of a meta-learner comparison: the bridge identifier, observation/method counts, the credible level, per-external-method metadata (native CI availability, elapsed time, note count, presence of a predict_fun), and a head view of the three concordance matrices (RMSE, Pearson, MAD).
Arguments
- `x`: `gdpar_meta_learner_comparison`. The comparison object to display. Expected to contain components `n_obs`, `n_methods`, `level`, `external` (a named list of per-method adapter result lists, each with `native_ci`, `time_sec`, `notes`, `has_predict_fun`), and `comparison` (a list with `rmse`, `pearson`, `mad` matrices).
- `...`: any. Unused; present for S3 generic compatibility. Silently ignored.
Mathematics
None.
Returns
Invisibly returns x (via invisible(x)). The side effect is console output.
Notes
- The method does not validate `x`; it assumes the structure is present. Missing components would propagate as errors from `cat`/`print`.
- Per-method line format is fixed by `sprintf`: `"- %-12s native_ci = %s time = %.3f s notes = %d predict = %s\n"`. `native_ci` and `has_predict_fun` are coerced to character by `sprintf`'s `%s` (typically `"TRUE"`/`"FALSE"`).
- The three concordance matrices are printed with `round(..., 4L)`; they are expected to be square numeric matrices with shared row/column names of length $m = 1 + n_{\text{methods}}$ (bridge plus externals).
- S3 dispatch is triggered by `print(x)` when `class(x)` contains `"gdpar_meta_learner_comparison"`.
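As a rough illustration of the fixed line format, the sketch below applies the same `sprintf` template to a hand-built `external` list. The two adapter entries and their values are hypothetical, and using `length(e$notes)` for the note count is an assumption rather than a statement about the package internals.

```r
# Hypothetical adapter metadata; field names follow the Arguments section.
external <- list(
  grf    = list(native_ci = TRUE,  time_sec = 1.25, notes = character(0),
                has_predict_fun = TRUE),
  econml = list(native_ci = FALSE, time_sec = 10.5, notes = c("note a", "note b"),
                has_predict_fun = FALSE)
)

# Format string quoted in the Notes.
fmt <- "- %-12s native_ci = %s time = %.3f s notes = %d predict = %s\n"

# Build one line per method; %s turns the logicals into "TRUE"/"FALSE".
lines <- vapply(names(external), function(nm) {
  e <- external[[nm]]
  # Assumption: the note count is the length of the notes vector.
  sprintf(fmt, nm, e$native_ci, e$time_sec, length(e$notes), e$has_predict_fun)
}, character(1))

cat(lines, sep = "")
```

The `%-12s` conversion left-justifies the method name in a 12-character field, which keeps the remaining columns aligned for short adapter names.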
summary.gdpar_meta_learner_comparison(object, ...)
Purpose
S3 summary method for gdpar_meta_learner_comparison objects. Constructs a structured long-format summary containing: per-method ATE point estimates, per-method ATE CI bounds (averaged from per-observation native CIs when available, otherwise NA_real_), the three concordance matrices pivoted into long form, and per-method timing/CI-availability metadata.
Arguments
- `object`: `gdpar_meta_learner_comparison`. The comparison object. Must contain `bridge_cate` (with `cate_mean` and optionally `cate_ci`), `external` (named list of adapter results, each with `cate_mean`, optionally `cate_ci`, `time_sec`, `native_ci`), `comparison` (with `rmse`, `pearson`, `mad`), `level`, `n_obs`, `n_methods`.
- `...`: any. Unused; present for S3 generic compatibility.
Mathematics
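Per-method ATE is the sample mean of the per-observation CATE posterior means:

$$\widehat{\mathrm{ATE}}_m = \frac{1}{n} \sum_{k=1}^{n} \hat{\tau}_{m,k}$$

where $n$ is `n_obs` and $\hat{\tau}_{m,k}$ is entry $k$ of method $m$'s `cate_mean`.
When native per-observation CIs are present, the ATE CI bounds are likewise the sample means of the per-observation lower bounds $\ell_{m,k}$ and upper bounds $u_{m,k}$:

$$\widehat{\mathrm{ATE}}_m^{\text{lower}} = \frac{1}{n} \sum_{k=1}^{n} \ell_{m,k}, \qquad \widehat{\mathrm{ATE}}_m^{\text{upper}} = \frac{1}{n} \sum_{k=1}^{n} u_{m,k}$$

Otherwise both bounds are `NA_real_`.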
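A minimal sketch of this averaging, assuming `cate_ci` is a two-column matrix with lower bounds in column 1 and upper bounds in column 2. The helper name `summarise_ate` and the toy numbers are hypothetical and not part of the package.

```r
# Hypothetical helper: average per-observation CATE values and CI bounds.
summarise_ate <- function(cate_mean, cate_ci = NULL) {
  ate <- mean(cate_mean)
  if (is.null(cate_ci)) {
    # No native CI: both bounds stay NA_real_.
    return(c(ate = ate, ate_lower = NA_real_, ate_upper = NA_real_))
  }
  c(ate = ate,
    ate_lower = mean(cate_ci[, 1L]),  # mean of per-observation lower bounds
    ate_upper = mean(cate_ci[, 2L]))  # mean of per-observation upper bounds
}

cate_mean <- c(0.5, 1.0, 1.5)
cate_ci   <- cbind(lower = c(0, 0.5, 1), upper = c(1, 1.5, 2))

with_ci    <- summarise_ate(cate_mean, cate_ci)
without_ci <- summarise_ate(cate_mean)
print(with_ci)
print(without_ci)
```

Averaging bounds this way summarises the per-observation intervals; it is not an interval for the ATE derived from a joint posterior or sampling distribution.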
Returns
A list of class c("summary.gdpar_meta_learner_comparison", "list") with components:
- `ate_table`: `data.frame` with columns `method` (character: `"bridge"` followed by `names(object$external)`), `ate` (numeric), `ate_lower` (numeric, possibly `NA`), `ate_upper` (numeric, possibly `NA`).
- `metrics`: long-format `data.frame` produced by `.comparison_long(object$comparison)` with columns `method_i`, `method_j`, `rmse`, `pearson`, `mad` (off-diagonal rows only).
- `timing`: `data.frame` with columns `method` (external method names only; the bridge is excluded), `time_sec` (numeric), `native_ci` (logical).
- `level`, `n_obs`, `n_methods`: copied verbatim from `object`.
Notes
- Calls `assert_inherits(object, "gdpar_meta_learner_comparison", "object")`; raises an error (presumably of class `gdpar_input_error` per package convention) if the class is absent.
- Bridge ATE CI bounds are populated only if `object$bridge_cate$cate_ci` is non-`NULL`; otherwise they remain `NA_real_`.
- External ATE CI bounds are populated per-method only if `e$cate_ci` is non-`NULL`; the code does not consult `e$native_ci` here, only the presence of `cate_ci`.
- `ate_vec`, `ate_lower`, `ate_upper` are named numeric vectors initialized with `stats::setNames`; the bridge slot is filled first, then external slots in iteration order.
- The `timing` data frame excludes the bridge (no timing recorded for it).
- S3 dispatch is triggered by `summary(object)` when `class(object)` contains `"gdpar_meta_learner_comparison"`.
print.summary.gdpar_meta_learner_comparison(x, ...)
Purpose
S3 print method for objects of class summary.gdpar_meta_learner_comparison. Prints the credible level, observation count, method count, the ATE table, the timing/CI-availability table, and the first 20 rows of the long-format pairwise concordance metrics.
Arguments
- `x`: `summary.gdpar_meta_learner_comparison`. The summary object produced by `summary.gdpar_meta_learner_comparison`. Expected components: `level`, `n_obs`, `n_methods`, `ate_table`, `timing`, `metrics`.
- `...`: any. Unused; present for S3 generic compatibility.
Mathematics
None.
Returns
Invisibly returns x (via invisible(x)). The side effect is console output.
Notes
- Does not validate `x`.
- `ate_table` and `timing` are printed with `row.names = FALSE`.
- `metrics` is truncated to its first 20 rows via `utils::head(x$metrics, 20L)`; if `nrow(x$metrics) > 20L`, a `sprintf` line of the form `" ... (%d more rows)\n"` is emitted with the count of omitted rows.
- S3 dispatch is triggered by `print(x)` when `class(x)` contains `"summary.gdpar_meta_learner_comparison"`.
.comparison_long(comparison)
Purpose
Internal helper that pivots the three square concordance matrices (rmse, pearson, mad) stored in a comparison object into a single long-format data.frame containing one row per ordered off-diagonal method pair $(i, j)$ with $i \neq j$.
Arguments
- `comparison`: list. Must contain numeric matrix components `rmse`, `pearson`, `mad` with identical dimensions and shared `rownames`/`colnames`. Row names are read from `rownames(comparison$rmse)`.
Mathematics
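Given $m$ methods with names taken from `rownames(comparison$rmse)`, the output contains one row $(\text{method}_i, \text{method}_j, \text{rmse}_{ij}, \text{pearson}_{ij}, \text{mad}_{ij})$ for each ordered pair $(i, j)$ with $i \neq j$.
Diagonal entries ($i = j$) are skipped via `if (i == j) next`. The total number of rows is $m(m - 1)$.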
Returns
A data.frame with columns method_i (character), method_j (character), rmse (numeric), pearson (numeric), mad (numeric), constructed by do.call(rbind, out_rows) over a list of single-row data frames. stringsAsFactors = FALSE is set on each constituent.
Notes
- Marked `@keywords internal` and `@noRd`; not exported.
- Iteration uses `seq_along(nms)` for both `i` and `j`, so the order is row-major over the upper and lower triangles combined (i.e., both $(i, j)$ and $(j, i)$ appear, but $(i, i)$ is excluded).
- Assumes `rownames(rmse)` is non-`NULL` and that all three matrices share the same dimension and names; no consistency check is performed.
- The list `out_rows` is grown by assignment at an incrementing index `k`. Skipped diagonal pairs (`i == j`) never advance `k`, so no `NULL` slots are produced.
- Returns `NULL` (from `do.call(rbind, list())`) if `nms` is empty.
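The pivot described above can be sketched as follows. This is an illustrative reimplementation under the stated assumptions (three matrices with shared row names), not the package source; the name `comparison_long_sketch` and the toy matrices are hypothetical.

```r
# Hypothetical reimplementation of the wide-to-long pivot.
comparison_long_sketch <- function(comparison) {
  nms <- rownames(comparison$rmse)
  out_rows <- list()
  k <- 0L
  for (i in seq_along(nms)) {
    for (j in seq_along(nms)) {
      if (i == j) next              # skip the diagonal
      k <- k + 1L
      out_rows[[k]] <- data.frame(
        method_i = nms[i], method_j = nms[j],
        rmse     = comparison$rmse[i, j],
        pearson  = comparison$pearson[i, j],
        mad      = comparison$mad[i, j],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out_rows)          # NULL when out_rows is empty
}

# Toy 3 x 3 matrices with shared dimnames.
nms <- c("bridge", "grf", "econml")
mk  <- function(d) {
  m <- matrix(0.5, 3, 3, dimnames = list(nms, nms))
  diag(m) <- d
  m
}
long <- comparison_long_sketch(list(rmse = mk(0), pearson = mk(1), mad = mk(0)))
print(long)
```

With $m = 3$ methods the result has $m(m - 1) = 6$ rows, one per ordered off-diagonal pair.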
predict.gdpar_meta_learner_comparison(object, newdata, level = NULL, bridge = NULL, data = NULL, ...)
Purpose
S3 predict method for gdpar_meta_learner_comparison objects. Re-evaluates the CATE on a new covariate grid newdata for the bridge component and for every external adapter. Adapters exposing a predict_fun reuse their cached fitted state without refitting; adapters without a usable predict_fun (or whose predict_fun errors) are flagged for refit, their cate_mean is filled with NA_real_, and a gdpar_diagnostic_warning is emitted. The bridge is re-evaluated via predict.gdpar_causal_bridge when real fits are present, otherwise falls back to cached cate_mean/cate_ci only when newdata matches the original observation count.
Arguments
- `object`: `gdpar_meta_learner_comparison`. The comparison object. Must contain `level`, `bridge` (a `gdpar_causal_bridge` or `NULL`), and `external` (named list of adapter results, each possibly containing `predict_fun`, `state`, `native_ci`, `notes`).
- `newdata`: `data.frame`. Required. The new evaluation grid. Must be a data frame (asserted by `assert_data_frame`).
- `level`: `numeric(1)` or `NULL`. Optional credible level in $(10^{-3}, 1 - 10^{-3})$ overriding `object$level`. Defaults to `NULL`, which reuses `object$level`. Validated by `assert_numeric_scalar(level, "level", lower = 1e-3, upper = 1 - 1e-3)` when non-`NULL`.
- `bridge`: `gdpar_causal_bridge` or `NULL`. Optional replacement bridge object used instead of `object$bridge`. Defaults to `NULL` (use cached bridge). Useful when the cached bridge was stripped (e.g., after a `saveRDS` round-trip that lost the two fits).
- `data`: named list with components `X`, `T`, `Y` (and optionally `X_newdata`) or `NULL`. Reserved for the case of a forced re-fit. Defaults to `NULL`. Note: the current implementation does not consume `data` at all; it is accepted but never referenced in the body.
- `...`: any. Reserved for future arguments; currently unused.
Mathematics
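Let $X_{\text{new}}$ denote the covariates extracted from `newdata`. For each external method $m$ with a usable `predict_fun`, the re-evaluated CATE is $\hat{\tau}_m(X_{\text{new}}) = f_m(s_m, X_{\text{new}}, \text{level})$, where $f_m$ is the adapter's `predict_fun` and $s_m$ its cached `state`.
The bridge prediction is $\hat{\tau}_{\text{bridge}}(X_{\text{new}})$, obtained from `predict.gdpar_causal_bridge`,
when real fits are present. The concordance metrics are then recomputed over the list of method-specific CATE means via `.compute_comparison_metrics(cate_list)`.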
Returns
A list of class c("predict.gdpar_meta_learner_comparison", "list") with components:
- `bridge`: list with `cate_mean` and `cate_ci` (from `bridge_pred`).
- `external`: named list mirroring `object$external` names; each entry is a list with `cate_mean` (numeric, possibly all `NA_real_`), `cate_ci` (matrix or `NULL`), `method` (character), `native_ci` (logical), `time_sec` (`NA_real_`), `notes` (character vector, augmented with a status message).
- `comparison`: result of `.compute_comparison_metrics(cate_list)`, a list of concordance matrices.
- `newdata`: the input `newdata` (stored verbatim).
- `level`: the resolved numeric level.
Notes
- Calls `assert_inherits(object, "gdpar_meta_learner_comparison", "object")` and `assert_data_frame(newdata, "newdata")` up front.
- If `bridge_obj` (resolved from `bridge` or `object$bridge`) does not inherit from `"gdpar_causal_bridge"`, the function aborts via `gdpar_abort` with class `"gdpar_input_error"` and a message instructing the user to pass a bridge via the `bridge` argument.
- The outcome name is recovered via `.bridge_outcome_name(bridge_obj$fits$treat, bridge_obj$fits$ctrl)`, and covariates are extracted from `newdata` via `.extract_covariates(newdata, outcome_name)` (presumably dropping the outcome column).
- Bridge re-evaluation branches on `has_real_fits`:
  - If both `bridge_obj$fits$treat$fit` and `bridge_obj$fits$ctrl$fit` are non-`NULL`, calls `stats::predict(bridge_obj, newdata = newdata, level = level, summary = "mean_ci")`.
  - Otherwise, falls back to cached `bridge_obj$cate_mean`/`bridge_obj$cate_ci` only if `nrow(newdata) == bridge_obj$n_obs`; otherwise `cate_mean` is `rep(NA_real_, nrow(newdata))` and `cate_ci` is `NULL`.
- For each external method:
  - If `e$predict_fun` is a function, it is invoked as `pf(state = e$state, X_newdata = X_newdata, level = level)` inside `tryCatch`. On success, `cate_mean` is coerced via `as.numeric(out$cate_mean)`, `cate_ci` is taken as `out$cate_ci`, `native_ci` is `e$native_ci && !is.null(out$cate_ci)`, and `notes` is augmented with `"reused cached state via predict_fun"`. On error, the method is added to `needs_refit`, `cate_mean` is `rep(NA_real_, nrow(newdata))`, `cate_ci` is `NULL`, `native_ci` is `FALSE`, and `notes` is augmented with `"predict_fun failed: <message>"`.
  - If there is no `predict_fun`, the method is added to `needs_refit`, `cate_mean` is `rep(NA_real_, nrow(newdata))`, `cate_ci` is `NULL`, `native_ci` is `FALSE`, and `notes` is augmented with `"predict_fun unavailable; a full refit would be required"`.
- If `length(needs_refit) > 0L`, a warning of class `"gdpar_diagnostic_warning"` is emitted via `gdpar_warn` with `data = list(needs_refit = needs_refit)` and a message listing the affected adapters, advising the user to rebuild the comparison with `gdpar_compare_meta_learners()`.
- `time_sec` is always set to `NA_real_` for every external entry (no timing is recorded for prediction).
- The `data` argument is declared and documented but not used in the body; no refit path is actually implemented despite the documentation mentioning `fit_predict_fun`. The function only reuses cached state or returns `NA` predictions.
- `cate_list` is constructed as `c(list(bridge = as.numeric(bridge_pred$cate_mean)), lapply(external, function(e) e$cate_mean))` and passed to `.compute_comparison_metrics`; the resulting matrices therefore have row/column names `"bridge"` followed by the external method names (subject to `.compute_comparison_metrics`'s naming behavior).
- S3 dispatch is triggered by `predict(object, newdata = ...)` when `class(object)` contains `"gdpar_meta_learner_comparison"`.
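The reuse-or-flag logic for external adapters can be sketched as below. This is a simplified stand-in written from the Notes, not the package code: the helper `reuse_or_flag`, the three toy adapters, and the extra `needs_refit` field are hypothetical, and the real method collects `needs_refit` as a separate vector and warns via `gdpar_warn`.

```r
# Hypothetical helper: reuse cached state via predict_fun, or flag for refit.
reuse_or_flag <- function(e, X_newdata, level) {
  failed <- list(cate_mean = rep(NA_real_, nrow(X_newdata)), cate_ci = NULL,
                 native_ci = FALSE, needs_refit = TRUE)
  if (!is.function(e$predict_fun)) {
    failed$notes <- c(e$notes, "predict_fun unavailable; a full refit would be required")
    return(failed)
  }
  tryCatch({
    out <- e$predict_fun(state = e$state, X_newdata = X_newdata, level = level)
    list(cate_mean = as.numeric(out$cate_mean), cate_ci = out$cate_ci,
         native_ci = isTRUE(e$native_ci) && !is.null(out$cate_ci),
         needs_refit = FALSE,
         notes = c(e$notes, "reused cached state via predict_fun"))
  }, error = function(err) {
    failed$notes <- c(e$notes, paste0("predict_fun failed: ", conditionMessage(err)))
    failed
  })
}

# Toy adapters: one that works, one that errors, one without predict_fun.
X_new <- data.frame(x = c(1, 2, 3))
adapters <- list(
  ok   = list(state = list(slope = 2), native_ci = FALSE, notes = character(0),
              predict_fun = function(state, X_newdata, level)
                list(cate_mean = state$slope * X_newdata$x, cate_ci = NULL)),
  bad  = list(state = NULL, notes = character(0),
              predict_fun = function(state, X_newdata, level) stop("boom")),
  none = list(notes = character(0))
)

res <- lapply(adapters, reuse_or_flag, X_newdata = X_new, level = 0.95)
needs_refit <- names(Filter(function(r) r$needs_refit, res))
print(needs_refit)
```

Keeping the failure result as a single template (`failed`) makes the two failure branches differ only in their note text, which mirrors the symmetry described in the Notes.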
gdpar_compare_meta_learners(bridge, methods, newdata, data, seed, ...)
Purpose
Orchestrates a descriptive comparison of the T-learner (AMM-side) embedded in a gdpar_causal_bridge object against one or more external meta-learners (e.g., grf, EconML). It evaluates each method on a common evaluation grid, reports point/posterior CATE estimates and their native confidence intervals, and computes three concordance metrics (RMSE, Pearson correlation, mean absolute discrepancy) between every ordered pair of methods on their point/posterior CATE estimates. It does not perform hypothesis tests.
Arguments
- `bridge`: Object of class `gdpar_causal_bridge` (from `gdpar_causal_bridge()`). Contains two fitted `gdpar` objects (treatment and control arms), precomputed CATE estimates, and metadata.
- `methods`: Non-empty list of `gdpar_meta_learner_adapter` objects. Each adapter wraps a specific external meta-learner implementation (e.g., `gdpar_adapter_grf()`).
- `newdata`: Optional `data.frame` on which to evaluate CATE. Defaults to the evaluation grid stored in `bridge$newdata`.
- `data`: Optional list with components `X` (covariate `data.frame`), `T` (integer 0/1 treatment vector), `Y` (numeric outcome vector). Used to supply training data explicitly if it cannot be recovered from the bridge's stored calls (e.g., when the original data is not in the calling environment).
- `seed`: Optional integer scalar. Propagated to each adapter's `fit_predict_fun` as `seed_run` for reproducibility.
- `...`: Reserved for future arguments; currently unused.
Mathematics
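For every ordered pair of methods $(i, j)$ (including `"bridge"`), computes the following concordance metrics on the point/posterior CATE estimates $\hat{\tau}_i$ and $\hat{\tau}_j$ (vectors of length $n$ = `n_obs`):

$$\mathrm{RMSE}_{ij} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (\hat{\tau}_{i,k} - \hat{\tau}_{j,k})^2}, \qquad \rho_{ij} = \mathrm{cor}(\hat{\tau}_i, \hat{\tau}_j), \qquad \mathrm{MAD}_{ij} = \frac{1}{n} \sum_{k=1}^{n} |\hat{\tau}_{i,k} - \hat{\tau}_{j,k}|$$

Confidence intervals are not pooled because their inferential origins (Bayesian posterior, asymptotic, bootstrap) are heterogeneous.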
Returns
An object of class gdpar_meta_learner_comparison (a list) with components:
- `bridge_cate`: List with `cate_mean` (numeric vector of bridge CATE point estimates) and `cate_ci` (matrix of bridge credible intervals).
- `bridge`: The original `gdpar_causal_bridge` object.
- `external`: Named list of results for each external adapter. Each element contains `cate_mean`, `cate_ci`, `method`, `native_ci` (logical), `time_sec`, `notes`, `state` (from the adapter), and the adapter's `predict_fun`/`fit_predict_fun` if provided.
- `comparison`: List of concordance matrices (`rmse`, `pearson`, `mad`) between all method pairs.
- `newdata`: The evaluation grid (`data.frame`) used.
- `level`: The confidence level (numeric) used for intervals.
- `n_obs`: Integer number of evaluation points.
- `n_methods`: Integer total number of methods compared (bridge + external).
- `call`: The matched call.
- `meta`: List of metadata including package version, timestamp, seed, original bridge call, and adapter specifications.
Notes
- Scalar-outcome restriction: Rejects bridges with `dim_kind != "scalar"` (i.e., distributional or multivariate regression) via `.guard_scalar_outcome()`.
- Method names: If the `methods` list is unnamed, names are taken from each adapter's `$name` field. Duplicate names cause an error.
- Dataset recovery: If `data` is `NULL`, the function attempts to reconstruct the training data from the bridge's stored calls using `.assemble_bridge_dataset()`. If this fails, the user must supply `data` explicitly.
- Adapter validation: Each adapter is checked for unmet software requirements (R packages, Python modules) via `.check_adapter_requirements()`. Missing dependencies cause a `gdpar_missing_dependency_error`.
- Adapter output validation: Results from each adapter are checked with `.validate_adapter_output()` for correct length and structure.
- Bridge CATE recomputation: If `newdata` differs from the bridge's original grid and lengths mismatch, the bridge CATE is re-predicted using `stats::predict(bridge, ...)`.
.guard_scalar_outcome(bridge)
Purpose
Internal validation function ensuring the bridge was constructed from scalar-outcome fits (i.e., dim_kind == "scalar"). Rejects bridges from distributional regression (K > 1) or multivariate response (p > 1) with a specific error, as multi-output external adapters are not supported in the current scope (Sub-phase 8.5.B).
Arguments
- `bridge`: Object of class `gdpar_causal_bridge`. Its `$meta$dim_kind` component is inspected.
Returns
invisible(NULL) if the bridge is scalar. Otherwise, raises a gdpar_unsupported_feature_error.
Notes
- Uses the null-coalescing operator `%||%` to default to `"scalar"` if `dim_kind` is missing.
- The error message references "Sub-phase 8.5.B" and queues multi-output support for "Block 9" per the package roadmap.
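A minimal sketch of the guard, assuming a locally defined `%||%` and a plain condition object in place of the package's own abort helper. The function name `guard_scalar_sketch` and the error message wording are hypothetical; only the error class and the `"scalar"` default come from the description above.

```r
# Local null-coalescing operator (defined here so the sketch is self-contained).
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical guard: accept scalar bridges, reject everything else.
guard_scalar_sketch <- function(bridge) {
  dim_kind <- bridge$meta$dim_kind %||% "scalar"   # missing dim_kind counts as scalar
  if (!identical(dim_kind, "scalar")) {
    stop(structure(
      list(message = sprintf("dim_kind = '%s' is not supported", dim_kind),
           call = NULL),
      class = c("gdpar_unsupported_feature_error", "error", "condition")
    ))
  }
  invisible(NULL)
}

ok  <- guard_scalar_sketch(list(meta = list()))            # passes silently
out <- tryCatch(
  guard_scalar_sketch(list(meta = list(dim_kind = "multivariate"))),
  gdpar_unsupported_feature_error = function(e) conditionMessage(e)
)
print(out)
```

Signalling a classed condition lets callers handle the unsupported-feature case separately from ordinary input errors.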
.assemble_bridge_dataset(bridge, newdata, data, eval_env)
Purpose
Internal helper that constructs the unified training dataset (X, T, Y) required by external meta-learner adapters. It either uses an explicitly provided data argument or attempts to recover the training data from the bridge's stored fits by evaluating their captured call objects in the specified environment.
Arguments
- `bridge`: `gdpar_causal_bridge` object.
- `newdata`: `data.frame` of evaluation covariates (the CATE grid).
- `data`: Optional list with components `X`, `T`, `Y`. If supplied, it is used directly.
- `eval_env`: Environment in which to evaluate the bridge's stored calls (typically `parent.frame()` of the caller).
Mathematics
When data is NULL, the algorithm for each arm (treatment, control) is:
- Recover the arm's training dataset via `eval(fit$call$data, eval_env)`.
- Identify the outcome variable name from the LHS of `fit$call$formula`.
- Extract the covariate matrix `X_arm` (all columns except the outcome) and outcome vector `Y_arm`.
- Create treatment indicator `T_arm = 1L` (treatment) or `0L` (control).
- Stack the two arms row-wise to form `(X, T, Y)`.
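The stacking step can be sketched as follows, assuming both arms have already been recovered as data frames with identical columns. The helper `stack_arms_sketch` and the toy data are hypothetical.

```r
# Hypothetical helper: stack treatment and control arms into (X, T, Y).
stack_arms_sketch <- function(df_treat, df_ctrl, outcome_name) {
  covars <- setdiff(colnames(df_treat), outcome_name)
  list(
    # Covariates only, treatment arm first.
    X = rbind(df_treat[, covars, drop = FALSE], df_ctrl[, covars, drop = FALSE]),
    # 1L for treatment rows, 0L for control rows.
    T = c(rep(1L, nrow(df_treat)), rep(0L, nrow(df_ctrl))),
    Y = as.numeric(c(df_treat[[outcome_name]], df_ctrl[[outcome_name]]))
  )
}

df_treat <- data.frame(y = c(2.0, 3.0), x1 = c(0.1, 0.2))
df_ctrl  <- data.frame(y = c(1.0, 1.5, 0.5), x1 = c(0.3, 0.4, 0.5))
stacked  <- stack_arms_sketch(df_treat, df_ctrl, outcome_name = "y")
str(stacked)
```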
Returns
A list with components:
- `X`: `data.frame` of stacked covariates (rows = training observations from both arms).
- `T`: Integer vector of treatment indicators (0/1).
- `Y`: Numeric vector of outcomes.
- `X_newdata`: The supplied `newdata` (unchanged).
- `outcome_name`: Character string of the outcome variable name.
Notes
- If evaluation of a fit's `call$data` fails (e.g., the object is not in `eval_env`), the function aborts with a `gdpar_input_error` advising the user to pass `data` explicitly.
- The helper ensures the covariate column order and types are consistent between training and evaluation data.
- The function is responsible for ensuring that the stacked dataset aligns with the external adapter's expectations (i.e., `X` is a data frame, `T` is integer 0/1, `Y` is numeric).
.assemble_bridge_dataset(bridge, newdata, data, eval_env)
Purpose
Assembles the unified training dataset and newdata covariate matrix required by the meta-learner comparison machinery. It either accepts an explicitly supplied data argument (a list with X, T, Y, and optionally X_newdata) or attempts to recover the original training data from the captured call objects inside the treatment and control fits of a bridge object. In both paths it validates consistency, combines arms, and returns a standardized list.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `bridge` | `list` | A bridge object with components `fits$treat` and `fits$ctrl`, each a fitted model object that carries a `$call` element. |
| `newdata` | `data.frame` (or coercible) | New covariate data for which predictions will be compared. Must contain every covariate column present in the training data. |
| `data` | `list` or `NULL` | Optional explicit data. If non-`NULL` it must be a named list with components `X` (covariate matrix/data.frame), `T` (integer treatment indicator, values 0/1), `Y` (numeric outcome), and optionally `X_newdata` (covariate data.frame for newdata; if absent, covariates are extracted from `newdata`). |
| `eval_env` | `environment` | The environment in which fit call expressions (e.g. `cl$data`) are evaluated when recovering training data from the fitted objects. |
Mathematics
No formula is implemented. The function performs data assembly only.
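Training data from two arms are row-bound with treatment indicators prepended:

$$X = \begin{bmatrix} X_t \\ X_c \end{bmatrix}, \qquad T = (\underbrace{1, \dots, 1}_{n_t}, \underbrace{0, \dots, 0}_{n_c})^\top, \qquad Y = \begin{bmatrix} Y_t \\ Y_c \end{bmatrix}$$

where $n_t$ and $n_c$ are the row counts of the treatment-arm and control-arm training data.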
Returns
A named list with five components:
| Component | Type | Description |
|---|---|---|
| `X` | `data.frame` | Training covariates (outcome column removed). Row count equals $n_t + n_c$ (treatment-arm plus control-arm rows). |
| `T` | integer vector | Treatment indicator, length $n_t + n_c$: `1L` (treatment arm first) then `0L` (control arm). |
| `Y` | numeric vector | Outcome values, treatment arm first then control arm, length $n_t + n_c$. |
| `X_newdata` | `data.frame` | Covariates for newdata, column subset/reordered to match `X`. Row count equals `nrow(newdata)`. |
| `outcome_name` | character scalar | The name of the outcome variable, inferred by `.bridge_outcome_name()`. |
Notes
- Explicit data path (`data` is non-`NULL`):
  - `data` must be a list with named components `X`, `T`, `Y`. If any is missing, a `gdpar_input_error` is raised.
  - `X` is coerced to `data.frame` if it is not one already (with `stringsAsFactors = FALSE`).
  - `T` is coerced to `integer`; `Y` to `numeric`.
  - Lengths of `T`, `Y`, and `nrow(X)` must agree; otherwise a `gdpar_input_error` is raised with a diagnostic `sprintf` message.
  - `T` must contain only `0L` and `1L`; otherwise a `gdpar_input_error` is raised.
  - If `data$X_newdata` is present it is used (coerced to `data.frame` if needed); otherwise covariates are extracted from `newdata` by calling `.extract_covariates(newdata, outcome_name)`.
  - The function returns immediately via `return(...)` without any data-recovery attempt.
- Recovery path (`data` is `NULL`):
  - The internal function `recover(fit)` evaluates `fit$call$data` in `eval_env`. If the call is `NULL`, the data component is `NULL`, or evaluation throws an error, `NULL` is returned.
  - If either recovered data is `NULL` or not a `data.frame`, a `gdpar_input_error` is raised with extra data fields `treat_recovered` and `ctrl_recovered`.
  - The outcome variable name (from `.bridge_outcome_name()`) must appear as a column in both recovered data frames; otherwise a `gdpar_input_error` is raised.
  - The column sets of the two recovered data frames must be identical (after sorting); otherwise a `gdpar_input_error` is raised listing both column sets.
  - Both data frames are subset to their common columns, then row-bound. The outcome column is removed from `X` via `.extract_covariates()`.
  - `newdata` covariates are extracted and checked for missing columns present in `X`; if any are missing a `gdpar_input_error` is raised listing them. `X_newdata` is then reordered/subset to match `X`'s columns exactly.
- All errors are raised via `gdpar_abort()` with appropriate condition classes (`gdpar_input_error`, `gdpar_unsupported_feature_error`).
.bridge_outcome_name(fit_t, fit_c)
Purpose
Infers the name of the outcome (response) variable from the captured formula calls of the treatment and control fits within a bridge object. This is used internally by .assemble_bridge_dataset() and other comparison functions.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_t` | fitted model object | Treatment-arm fit. Must carry a `$call` element, ideally with a `$formula` component. |
| `fit_c` | fitted model object | Control-arm fit. Same expectations as `fit_t`. |
Mathematics
No formula. The function performs formula introspection.
Returns
A single character string giving the outcome variable name. If both fits resolve to the same name, that name is returned. If only one resolves, the resolved name is returned. If neither resolves or they disagree, an error is raised (see Notes).
Notes
- The internal helper `pick(fit)` attempts three strategies in order to extract the LHS of a two-sided formula from `fit$call$formula`:
  - Evaluate `cl$formula` in the environment of `fit$call`. If the result is a `formula` of length 3 (two-sided), extract `as.character(fm[[2L]])`.
  - If `cl$formula` itself is a `call` or `name`, attempt to evaluate it with bare `eval()`. If the result is a two-sided formula, extract the LHS.
  - If `cl$formula` is a call of length 3 whose first element is `~`, directly extract `as.character(cl$formula[[2L]])`.
  - If all strategies fail, `NA_character_` is returned.
- If both `n_t` and `n_c` are `NA`, a `gdpar_input_error` is raised advising the user to pass an explicit `data` argument.
- If both are non-`NA` but differ (`!identical(n_t, n_c)`), a `gdpar_unsupported_feature_error` is raised listing both names. This means the two fits must model the same outcome variable.
- If exactly one is `NA`, the non-`NA` value is returned (i.e., `n_c` when `n_t` is `NA`, otherwise `n_t`).
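The resolution rules can be sketched as below. For brevity the sketch implements only the third strategy (reading the LHS directly from an unevaluated `~` call) and uses plain `stop()` in place of the package's classed errors; the helper names `lhs_name` and `resolve_outcome`, and the toy calls, are hypothetical.

```r
# Hypothetical: read the LHS name straight from an unevaluated formula call.
lhs_name <- function(fit) {
  fm <- fit$call$formula
  if (is.call(fm) && length(fm) == 3L && identical(fm[[1L]], as.name("~"))) {
    as.character(fm[[2L]])
  } else {
    NA_character_
  }
}

# Hypothetical: combine the two per-arm names following the rules in the Notes.
resolve_outcome <- function(fit_t, fit_c) {
  n_t <- lhs_name(fit_t)
  n_c <- lhs_name(fit_c)
  if (is.na(n_t) && is.na(n_c)) stop("outcome name could not be resolved")
  if (!is.na(n_t) && !is.na(n_c) && !identical(n_t, n_c)) {
    stop("treatment and control fits model different outcomes")
  }
  if (is.na(n_t)) n_c else n_t
}

# Toy fits: only the treatment arm carries a formula in its call.
fit_t <- list(call = quote(gdpar(formula = y ~ x1 + x2, data = d1)))
fit_c <- list(call = quote(gdpar(data = d0)))
outcome <- resolve_outcome(fit_t, fit_c)
print(outcome)
```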
.extract_covariates(df, outcome_name)
Purpose
Removes the outcome column from a data frame (or object coercible to one) and returns only the covariate columns. Used throughout the comparison pipeline to separate predictors from response.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `df` | `data.frame` or coercible | A data frame whose columns include covariates and the outcome. |
| `outcome_name` | character scalar | The name of the outcome column to drop. |
Mathematics
None.
Returns
A data.frame containing all columns of df except the one named outcome_name. If outcome_name is not present in colnames(df), all columns are returned (since setdiff returns the full set). The drop = FALSE argument ensures the result is always a data frame even if a single column remains.
Notes
- If `df` is not already a `data.frame`, it is coerced via `as.data.frame(df, stringsAsFactors = FALSE)`.
- This is a utility function; no errors are raised by it directly.
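A minimal sketch of the behaviour described above, written from the description rather than copied from the package; the name `extract_covariates_sketch` is hypothetical.

```r
# Hypothetical reimplementation: drop the outcome column, keep a data frame.
extract_covariates_sketch <- function(df, outcome_name) {
  if (!is.data.frame(df)) df <- as.data.frame(df, stringsAsFactors = FALSE)
  keep <- setdiff(colnames(df), outcome_name)   # full set if outcome is absent
  df[, keep, drop = FALSE]                      # drop = FALSE keeps a data frame
}

d <- data.frame(y = c(1, 2), x = c(3, 4))
only_x   <- extract_covariates_sketch(d, "y")            # one column, still a data frame
all_cols <- extract_covariates_sketch(d, "not_present")  # nothing to drop
from_mat <- extract_covariates_sketch(cbind(y = 1:2, x = 3:4), "y")
print(only_x)
```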
.validate_adapter_output(result, n_newdata, adapter_name)
Purpose
Validates that the return value of a meta-learner adapter conforms to the expected shape and types. Called internally after each adapter invocation to enforce the adapter interface contract.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `result` | `list` | The object returned by an adapter. Must contain at minimum a `cate_mean` component. May optionally contain `cate_ci`. |
| `n_newdata` | integer scalar | The expected number of rows (observations) in the newdata, i.e., the required length of `cate_mean` and row count of `cate_ci`. |
| `adapter_name` | character scalar | Human-readable name of the adapter, used in error messages. |
Mathematics
None.
Returns
Invisibly returns NULL (invisible(NULL)). Side-effect–only function: raises errors if validation fails.
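The checks this helper performs (container, `cate_mean`, optional `cate_ci`) can be sketched as follows. The sketch signals a plain condition object with the documented class instead of calling the package's abort helper, and its message wording is invented; `validate_adapter_output_sketch` is a hypothetical name.

```r
# Hypothetical reimplementation of the adapter-output checks.
validate_adapter_output_sketch <- function(result, n_newdata, adapter_name) {
  fail <- function(msg) {
    stop(structure(
      list(message = sprintf("Adapter '%s': %s", adapter_name, msg), call = NULL),
      class = c("gdpar_internal_error", "error", "condition")
    ))
  }
  # Check 1: a list carrying cate_mean.
  if (!is.list(result) || !("cate_mean" %in% names(result))) {
    fail("result must be a list with a 'cate_mean' element")
  }
  # Check 2: numeric cate_mean of the expected length.
  if (!is.numeric(result$cate_mean) || length(result$cate_mean) != n_newdata) {
    fail(sprintf("cate_mean must be numeric of length %d", n_newdata))
  }
  # Check 3 (conditional): cate_ci is an n_newdata x 2 matrix.
  ci <- result$cate_ci
  if (!is.null(ci) && !(is.matrix(ci) && nrow(ci) == n_newdata && ncol(ci) == 2L)) {
    fail("cate_ci must be a matrix with n_newdata rows and 2 columns")
  }
  invisible(NULL)
}

good <- list(cate_mean = c(0.1, 0.2, 0.3), cate_ci = matrix(0, nrow = 3, ncol = 2))
ok   <- validate_adapter_output_sketch(good, 3L, "toy")
caught <- tryCatch(
  validate_adapter_output_sketch(list(cate_mean = c(0.1, 0.2)), 3L, "toy"),
  gdpar_internal_error = function(e) "length mismatch caught"
)
print(caught)
```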
Notes
- First check: `result` must be a `list` and must have an element named `"cate_mean"`. If not, a `gdpar_internal_error` is raised.
- Second check: `result$cate_mean` must be a `numeric` vector of length exactly `n_newdata`. If not, a `gdpar_internal_error` is raised with a diagnostic `sprintf`.
- Third check (conditional): If `result$cate_ci` is non-`NULL`, it must be a `matrix` with `nrow == n_newdata` and `ncol == 2L` (lower and upper bounds). If not, a `gdpar_internal_error` is raised reporting the actual dimensions.
- All errors use class `"gdpar_internal_error"`, indicating a programming error in the adapter rather than user input.
.compute_comparison_metrics(cate_list)
Purpose
Computes three pairwise concordance/similarity matrices across a list of CATE (Conditional Average Treatment Effect) estimate vectors. These matrices quantify the agreement between different meta-learner methods on the same newdata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cate_list` | list of numeric vectors | Each element is a numeric vector of CATE predictions for the same newdata observations. All vectors must have the same length. List element names (if present) are used as row/column labels; otherwise names `m1`, `m2`, … are generated. |
Mathematics
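Let $\hat{\tau}_i$ and $\hat{\tau}_j$ denote the CATE vectors of methods $i$ and $j$, each of length $n$, with $m$ methods in total.
Root Mean Squared Error (RMSE):

$$\mathrm{RMSE}_{ij} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (\hat{\tau}_{i,k} - \hat{\tau}_{j,k})^2}$$

Diagonal: $\mathrm{RMSE}_{ii} = 0$.
Mean Absolute Deviation (MAD):

$$\mathrm{MAD}_{ij} = \frac{1}{n} \sum_{k=1}^{n} |\hat{\tau}_{i,k} - \hat{\tau}_{j,k}|$$

Diagonal: $\mathrm{MAD}_{ii} = 0$.
Pearson Correlation:

$$\rho_{ij} = \mathrm{cor}(\hat{\tau}_i, \hat{\tau}_j)$$

Diagonal: $\rho_{ii} = 1$.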
Returns
A named list with three components:
| Component | Type | Description |
|---|---|---|
| `rmse` | matrix ($m \times m$, with $m$ the number of methods) | Pairwise RMSE. Diagonal is 0. Symmetric. Dimnames are the method names. |
| `pearson` | matrix ($m \times m$) | Pairwise Pearson correlation. Diagonal is 1. Symmetric. Dimnames are the method names. |
| `mad` | matrix ($m \times m$) | Pairwise MAD. Diagonal is 0. Symmetric. Dimnames are the method names. |
Notes
- All CATE vectors in `cate_list` are column-bound into a single matrix `M` via `do.call(cbind, cate_list)`. This requires all vectors to have the same length; no explicit check is performed, and `cbind` recycles shorter vectors (with a warning when the longest length is not a multiple of a shorter one) rather than failing.
- If `cate_list` has no names, synthetic names `"m1"`, `"m2"`, … are assigned and propagated to `dimnames`.
- The double loop iterates over all $(i, j)$ pairs with $i \neq j$. For RMSE and MAD, each off-diagonal entry is written once. For Pearson, the loop only computes the correlation when $i < j$ (using `stats::cor()`) and mirrors the value to $[j, i]$. This avoids redundant correlation calls.
- `stats::cor()` is wrapped in `suppressWarnings()` to silence warnings about constant vectors (which yield `NA` correlations).
- RMSE and MAD are computed separately for $[i, j]$ and $[j, i]$, because the loop visits both orderings and skips only `i == j`. Both metrics are symmetric functions of the pair, so the two cells agree. The Pearson matrix is symmetric by construction because only $i < j$ is computed and mirrored.
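The computation described in this section can be sketched as follows. This is an illustrative reimplementation based on the description, not the package source; `compute_metrics_sketch` and the toy vectors are hypothetical.

```r
# Hypothetical reimplementation of the three pairwise concordance matrices.
compute_metrics_sketch <- function(cate_list) {
  M   <- do.call(cbind, cate_list)
  m   <- ncol(M)
  nms <- names(cate_list)
  if (is.null(nms)) nms <- paste0("m", seq_len(m))   # synthetic names
  init <- function(d) {
    out <- matrix(NA_real_, m, m, dimnames = list(nms, nms))
    diag(out) <- d
    out
  }
  rmse <- init(0); mad <- init(0); pearson <- init(1)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      if (i == j) next
      d <- M[, i] - M[, j]
      rmse[i, j] <- sqrt(mean(d^2))
      mad[i, j]  <- mean(abs(d))
      if (i < j) {
        # Constant vectors give NA with a warning; silence the warning.
        r <- suppressWarnings(stats::cor(M[, i], M[, j]))
        pearson[i, j] <- r
        pearson[j, i] <- r
      }
    }
  }
  list(rmse = rmse, pearson = pearson, mad = mad)
}

res <- compute_metrics_sketch(list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(3, 2, 1)))
print(round(res$rmse, 4L))
print(round(res$pearson, 4L))
```

In this toy input, `a` and `b` differ by a constant shift, so their RMSE and MAD are both 1 while their correlation is 1; RMSE and MAD measure level agreement, Pearson measures agreement in ranking and shape.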
gdpar_contraction_diagnostic(fit, data, sizes = NULL, replicates = 1L, parameters = NULL, level = 0.95, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose
Empirical posterior contraction-rate diagnostic for a fitted Path 1 (Bayesian) gdpar model. It is an opt-in, computationally expensive methodological audit tool that refits the model at multiple subsample sizes, records the median posterior credible-interval width across user-facing parameters at each size, and fits an ordinary-least-squares regression of log-width on log-sample-size. The estimated slope is compared against the theoretical parametric contraction rate, which corresponds to a slope of $-1/2$ on the log-log scale. The function does not modify fit; it returns a standalone report.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_fit` | A fitted model object produced by `gdpar` with `path = "bayes"`. Must inherit from class `"gdpar_fit"`. The original `fit$call` is extracted and modified to produce subsampled refits. |
| `data` | data frame | The data frame originally passed to `gdpar`, or another data frame compatible with the AMM specification of `fit`. Its row count $n$ is the upper bound for the subsample sizes. |
| `sizes` | `NULL` or numeric vector | Subsample sizes at which to refit. If `NULL` (default), a length-five geometric sequence is generated. When supplied, every entry must lie between 5 and $n$. |
| `replicates` | integer scalar (count) | Number of independent subsamples drawn per size. Defaults to `1L`. Higher values reduce Monte Carlo variance of the log-width curve at additional computational cost. Must be a non-negative integer (validated by `assert_count`). |
| `parameters` | `NULL` or character vector | Optional explicit list of posterior variable names to include in the credible-width calculation. If `NULL` (default), the function auto-selects user-facing parameters by filtering out variables matching the internal ignore pattern. |
| `level` | numeric scalar in $(0, 1)$ | Nominal credible level for interval-width computation. Defaults to `0.95`. The interval is formed from posterior quantiles at this level. |
| `iter_warmup` | integer scalar (count) | Warmup iterations for each refit. Defaults to `500L`. Forwarded to `gdpar` via the modified call. |
| `iter_sampling` | integer scalar (count) | Sampling iterations for each refit. Defaults to `500L`. Forwarded to `gdpar` via the modified call. |
| `chains` | integer scalar (count) | Number of MCMC chains per refit. Defaults to `2L`. Forwarded to `gdpar` via the modified call. |
| `verbose` | logical scalar (length 1) | If `TRUE`, prints a cost message via `gdpar_inform` before starting the refits. Defaults to `TRUE`. Must be a single logical value. |
| `...` | any | Additional arguments forwarded to `gdpar` through the modified refit call. |
Mathematics
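The diagnostic fits the linear regression

$$\log w_k = \alpha + \beta \log n_k + \varepsilon_k$$

where $n_k$ is the subsample size of cell $k$ and $w_k$ is the corresponding median credible-interval width,
with the median taken across the selected parameters.
The slope $\beta$ is estimated by OLS via `stats::lm`. An approximate 95% confidence interval for $\beta$ is formed from the estimate $\hat{\beta}$ and its standard error.
The verdict logic compares this interval against the theoretical target $\beta = -1/2$.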
The first branch tests whether the 95% CI overlaps a fixed interval around this target.
When `sizes = NULL`, the default subsample sizes form a length-five geometric sequence.
Returns
A list of class c("gdpar_contraction_report", "list") with components:
| Component | Type | Description |
|---|---|---|
| `table` | data frame | Columns `n` (subsample size), `replicate` (replicate index), `median_width` (median credible-interval width, `NA_real_` if the refit failed). One row per (size, replicate) cell. |
| `slope_estimate` | numeric scalar | OLS slope $\hat{\beta}$ from `lm(log_w ~ log_n)`, with names stripped via `unname`. |
| `slope_se` | numeric scalar | Standard error of $\hat{\beta}$. |
| `slope_ci_lower` | numeric scalar | Lower bound of the approximate 95% confidence interval for the slope. |
| `slope_ci_upper` | numeric scalar | Upper bound of the approximate 95% confidence interval for the slope. |
| `verdict` | character | One of three verdict strings (see Mathematics). |
| `level` | numeric scalar | The credible level used (echoed from the `level` argument). |
| `warnings` | character vector | Per-refit failure messages; empty if all refits succeeded. |
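The slope-related components can be illustrated with a small log-log regression on synthetic data. Everything in the sketch is an assumption made for illustration: the sizes and widths are invented (widths shrink roughly like $n^{-1/2}$), no model is refitted, and the normal-approximation multiplier `qnorm(0.975)` for the interval may differ from what the package uses.

```r
# Synthetic (size, median width) cells; values are invented for illustration.
n_sub <- c(50, 100, 200, 400, 800)
width <- 2 / sqrt(n_sub) * c(1.01, 0.99, 1.00, 1.01, 0.99)

cells  <- data.frame(log_n = log(n_sub), log_w = log(width))
fit_lm <- stats::lm(log_w ~ log_n, data = cells)
coefs  <- summary(fit_lm)$coefficients

slope_estimate <- unname(coefs["log_n", "Estimate"])
slope_se       <- unname(coefs["log_n", "Std. Error"])

# Assumption: normal-approximation 95% interval around the slope.
slope_ci <- slope_estimate + c(-1, 1) * stats::qnorm(0.975) * slope_se

print(c(estimate = slope_estimate, se = slope_se,
        lower = slope_ci[1], upper = slope_ci[2]))
```

A slope near $-1/2$ indicates that interval widths shrink at the parametric rate as the subsample size grows; slopes closer to zero indicate slower contraction.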
Notes
- Input validation. Calls `assert_inherits(fit, "gdpar_fit", ...)`, `assert_data_frame(data, ...)`, `assert_count(replicates, ...)`, `assert_numeric_scalar(level, ..., lower = 0, upper = 1)`, `assert_count(iter_warmup, ...)`, `assert_count(iter_sampling, ...)`, `assert_count(chains, ...)`. The `verbose` argument is checked inline: if not a length-1 logical, `gdpar_abort` is called with class `"gdpar_input_error"`. The `sizes` argument, when non-`NULL`, is validated inline: if not numeric, or if any entry is $< 5$ or $> n$, `gdpar_abort` is called with class `"gdpar_input_error"` and a message formatted via `sprintf`.
- Suggested-package dependencies. Calls `require_suggested("cmdstanr", ...)` and `require_suggested("posterior", ...)`. If either is unavailable, an error is raised by that helper.
- Cost message. When `verbose = TRUE`, emits a `gdpar_inform` message of class `"gdpar_optin_message"` stating the total number of refits (`length(sizes) * replicates`).
- Refit call construction. The original `fit$call` is copied and modified: `data` is set to `quote(sub)`; `iter_warmup`, `iter_sampling`, `chains` are overwritten from the corresponding arguments; `verbose` is set to `FALSE`; `refresh` is set to `0L`; `skip_id_check` is set to `TRUE`. The modified call is evaluated with `eval` in a freshly created environment (`new.env(parent = parent.frame())`) in which the symbol `sub` is bound to the subsampled data frame. The local variable `call_data_arg_name <- "data"` is assigned but never used.
- Subsampling. For each `(size, replicate)` cell, `sample.int(n, size = sz)` draws a simple random sample without replacement. Despite the documentation mentioning "stratified by row order," the code performs uniform random sampling with no stratification.
- Refit failure handling. Each refit is wrapped in `tryCatch`. On error, `refit_failure_msg` is populated (via `<<-` inside the error handler) with a formatted message including the size, replicate, and `conditionMessage(e)`. A `gdpar_warn` of class `"gdpar_diagnostic_warning"` is emitted, the message is appended to `warnings_msg`, and a row with `median_width = NA_real_` is recorded. The loop then continues to the next cell.
- Variable selection. Posterior variables are retrieved via `posterior::variables(draws)`. The ignore pattern `"^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)"` is applied via `grepl` to exclude internal/auxiliary variables. If `parameters` is `NULL`, the filtered set (`candidate_vars`) is used; otherwise `parameters` is used directly without validation against available variables.
- Width computation. `posterior::summarise_draws` is called on `posterior::subset_draws(draws, variable = use_vars)` with two custom summary functions `q_lower` and `q_upper` that wrap `stats::quantile` at probabilities $\alpha/2$ and $1 - \alpha/2$ respectively (with `names = FALSE`). Widths are computed as the element-wise difference `q_upper - q_lower`, and the cell's `median_width` is `stats::median(widths)`.
- Minimum data requirement. After removing `NA` rows, if fewer than 3 successful refits remain, `gdpar_abort` is called with class `"gdpar_diagnostic_error"` and message `"Not enough successful refits to estimate the contraction slope."`.
- Regression. `stats::lm(log_w ~ log_n)` is fit on the non-`NA` subset. Coefficients and standard errors are extracted from `stats::coef(reg)` and `summary(reg)$coefficients[, "Std. Error"]` respectively, indexing by the name `"log_n"`.
- Side effects. May print a cost message (`gdpar_inform`), emit per-refit warnings (`gdpar_warn`), and perform `length(sizes) * replicates` full Bayesian refits via `cmdstanr` (through `gdpar`).
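
The width computation and the slope regression described in the notes above can be sketched without running any Stan refit. This is a minimal illustration, not the package's code: the "posterior draws" are simulated normal draws whose spread shrinks like $n^{-1/2}$ (an assumption made purely so the slope has a known target), and the object names `cells`, `ok`, and `reg` are hypothetical.

```r
# Simulated stand-in for refits: at subsample size n, each of three fake
# parameters has 2000 "posterior draws" with sd proportional to n^(-1/2).
set.seed(1)
level <- 0.95
alpha <- 1 - level
sizes <- c(50, 100, 200, 400, 800)
replicates <- 3L

# One row per (size, replicate) cell, as in the `table` component.
cells <- expand.grid(replicate = seq_len(replicates), n = sizes)
cells$median_width <- vapply(cells$n, function(n) {
  draws <- matrix(rnorm(2000 * 3, sd = 1 / sqrt(n)), ncol = 3)
  q_lower <- apply(draws, 2, stats::quantile, probs = alpha / 2, names = FALSE)
  q_upper <- apply(draws, 2, stats::quantile, probs = 1 - alpha / 2, names = FALSE)
  stats::median(q_upper - q_lower)  # median credible-interval width
}, numeric(1))

# Drop failed cells (none here), then regress log width on log n.
ok <- cells[!is.na(cells$median_width), ]
ok$log_w <- log(ok$median_width)
ok$log_n <- log(ok$n)
reg <- stats::lm(log_w ~ log_n, data = ok)

slope_estimate <- unname(stats::coef(reg)["log_n"])
slope_se <- unname(summary(reg)$coefficients["log_n", "Std. Error"])
```

Because the simulated spread is proportional to $n^{-1/2}$, the fitted slope should land close to $-0.5$ in this toy setting.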
Purpose
S3 print method for objects of class gdpar_contraction_report. Produces a human-readable summary of the contraction-rate diagnostic report, including the per-cell table, the estimated slope with standard error and 95% confidence interval, the verdict string, and any recorded warnings.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_contraction_report` | The report object returned by `gdpar_contraction_diagnostic`. |
| `...` | any | Unused; present for S3 generic compatibility. |

Returns
Invisibly returns `x` (via `invisible(x)`).
Notes
- Output format. Prints, in order:
  - A header line `"<gdpar_contraction_report> level = <level>"` (using `cat` with `sep = ""`).
  - The `table` component via `print(x$table, row.names = FALSE)`.
  - A blank line, then `"Slope estimate (log_width ~ log_n): <slope> (SE = <se>)"` with values formatted to 3 significant digits via `format(..., digits = 3)`.
  - `"95% CI: [<lower>, <upper>]"` with values formatted to 3 significant digits.
  - `"Verdict: <verdict>"`.
  - If `length(x$warnings) > 0L`, a blank line, the header `"Warnings:"`, and each warning prefixed with `" - "`.
- S3 dispatch. Registered as the print method for class `gdpar_contraction_report`; dispatched automatically when such an object is printed at the console.
- No side effects beyond console output.
Purpose. Extracts the observed scalar outcome vector from a scalar Empirical-Bayes fit (gdpar_eb_fit) by reading the Stan data bundle stored in object$stan_data. It serves as the canonical accessor for the response used downstream by dependence diagnostics (e.g., residual-based Moran's I or block-bootstrap refit engines). Aborts for non-scalar outcomes (multivariate p > 1 or multi-slot K > 1), which are explicitly deferred in this sub-block.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object whose `stan_data` list contains the outcome vector. |

Returns. A numeric vector (`as.numeric(y_raw)`), the observed outcome values. If the Stan data stored a real-valued response (`y_real`), that is used; otherwise `y_int` (count / Bernoulli families) is used. The result is always coerced to numeric.
Notes.
- Reads `object$stan_data$y_real` first; if `NULL`, falls back to `object$stan_data$y_int`. If both are `NULL`, raises a `gdpar_internal_error` via `gdpar_abort()`.
- If `y_raw` is a matrix with more than one column (`ncol(y_raw) > 1L`), raises a `gdpar_unsupported_feature_error`, because multivariate (`p > 1`) outcomes are deferred.
- Multi-slot (`K > 1`) outcomes are not checked here directly (that is handled by `.gdpar_assert_scalar_eb()`), but the matrix-column check implicitly guards against multi-column outcome matrices.
- No S3 dispatch; purely internal.
Purpose. Validates that object is a scalar Empirical-Bayes fit (gdpar_eb_fit) suitable for dependence-robust inference. Checks three conditions: (i) the object inherits from gdpar_eb_fit, (ii) it has no heterogeneous-family list (K > 1), and (iii) its conditional HMC fit is present.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` | The fit object to validate. |
| `arg_name` | character (length 1) | Name of the argument, used in error messages. Defaults to `"object"`. |

Returns. `invisible(object)`: the same object, if all checks pass.
Notes.
- Calls `assert_inherits(object, "gdpar_eb_fit", arg_name)` first; this is an external assertion helper that aborts with an appropriate class if the check fails.
- If `object$family$families` is non-`NULL`, this indicates heterogeneous families (`K > 1`), and a `gdpar_unsupported_feature_error` is raised.
- If `object$conditional_fit` is `NULL`, a `gdpar_internal_error` is raised because the conditional HMC fit is required for downstream residual extraction.
- Returns invisibly to support use as a guard clause.
Purpose. The Axis 2 gate (decision D102): validates that a fit object is a scalar fit on either the Empirical-Bayes or the full-Bayes path, suitable for dependence-robust inference. For EB fits it delegates verbatim to .gdpar_assert_scalar_eb(). For full-Bayes fits (gdpar_fit) it checks the path class and presence of the HMC fit. Any other class is rejected.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | The fit object to validate. |
| `arg_name` | character (length 1) | Name of the argument, used in error messages. Defaults to `"object"`. |

Returns. `invisible(object)`: the same object, if all checks pass.
Notes.
- If `object` inherits from `gdpar_eb_fit`, delegates to `.gdpar_assert_scalar_eb(object, arg_name)` and returns its result. This preserves byte-identical EB-path behaviour.
- If `object` inherits from `gdpar_fit`:
  - Calls `.gdpar_fit_path_class(object)` (an internal helper elsewhere in the package) and asserts the result is `"scalar"`. If not, raises `gdpar_unsupported_feature_error` (multivariate `p > 1` and `K > 1` full-Bayes fits are deferred).
  - If `object$fit` is `NULL`, raises `gdpar_internal_error` (the HMC fit is missing).
- If `object` is neither `gdpar_eb_fit` nor `gdpar_fit`, raises `gdpar_input_error` with a message naming the offending argument.
- Returns invisibly.
Purpose. Extracts the EB point-estimate vector from a scalar Empirical-Bayes fit and flattens it into a single named numeric vector. This is the EB touchpoint of the block-bootstrap engine: the same extraction is performed on each bootstrap refit, and column alignment across refits depends on the name stability guaranteed here.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object. |

Mathematics. No formula per se, but the extraction order is deterministic and fixed:

$$\hat{\psi} = \big(\hat\theta_{\text{ref}},\ \hat{a},\ \hat{b},\ \hat{W}\big),$$

where each component is a sub-vector of the named coefficients returned by `coef.gdpar_eb_fit()`. The concatenation order is: `theta_ref`, then `a`, then `b`, then `W`.
Returns. A named numeric vector containing all EB point estimates. Names follow the convention `"theta_ref"` or `"theta_ref[1]"` etc. for `theta_ref`, and `"a[1]"`, `"b[1]"`, `"W[1]"` etc. for the remaining components (unless the `coef()` result already provides names).
Notes.
- Calls `stats::coef(fit)` to obtain the structured coefficient list.
- Iterates over components `"theta_ref"`, `"a"`, `"b"`, `"W"` in that fixed order.
- If a component's `$estimate` field is `NULL`, it is silently skipped.
- If names are `NULL`, synthetic names of the form `"<comp>[<index>]"` are generated. For `theta_ref` of length 1, the name is simply `"theta_ref"`.
- If no estimates can be extracted (all components `NULL`), raises `gdpar_internal_error`.
- The result of `do.call(c, unname(parts))` concatenates the named sub-vectors while preserving names.
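
The flattening rules in the notes above can be illustrated with a small stand-alone sketch. The coefficient list `coefs` and the helper name `flatten_estimates` are hypothetical; the real extractor reads from `stats::coef(fit)` and raises `gdpar_internal_error` rather than calling `stop()`.

```r
# Hypothetical coefficient list shaped like the structure described above.
coefs <- list(
  theta_ref = list(estimate = 0.4),
  a = list(estimate = c(1.2, -0.3)),
  b = list(estimate = NULL),                  # NULL estimate: skipped
  W = list(estimate = c(w1 = 0.7, w2 = 0.1))  # existing names are kept
)

flatten_estimates <- function(coefs, comps = c("theta_ref", "a", "b", "W")) {
  parts <- list()
  for (comp in comps) {                       # fixed extraction order
    est <- coefs[[comp]]$estimate
    if (is.null(est)) next
    if (is.null(names(est))) {
      names(est) <- if (comp == "theta_ref" && length(est) == 1L) {
        "theta_ref"
      } else {
        sprintf("%s[%d]", comp, seq_along(est))  # synthetic "<comp>[<index>]"
      }
    }
    parts[[comp]] <- est
  }
  if (length(parts) == 0L) stop("no estimates could be extracted")
  # unname() drops the component labels so c() keeps only element names.
  do.call(c, unname(parts))
}

out <- flatten_estimates(coefs)
```

Dropping the outer names with `unname()` before `c()` is what prevents prefixed names such as `a.a[1]` from appearing in the result.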
Purpose. Mirrors .gdpar_eb_estimate_vector() but extracts the model-based (Laplace / conditional posterior) standard errors instead of point estimates. The resulting vector is name-aligned with the estimate vector, enabling ratio computations such as se_ratio = robust_se / model_se.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object. |

Mathematics. The model SE for each coefficient is its Laplace / conditional posterior standard deviation, as stored in `coef(fit)$<component>$se`.
Returns. A named numeric vector of the same length and name structure as `.gdpar_eb_estimate_vector(fit)`. If a component's `$se` field is `NULL` but its `$estimate` field is non-`NULL`, the corresponding entries are filled with `NA_real_`.
Notes.
- Reads `$se` fields from the `coef(fit)` list. If `$se` is `NULL` for a given component but `$estimate` is present, fills with `NA_real_` (length-matched via `rep(NA_real_, length(est))`).
- Uses `$estimate` (not `$se`) to determine names and presence, ensuring alignment with the estimate vector.
- Iterates over `theta_ref`, `a`, `b`, `W` in the same fixed order as `.gdpar_eb_estimate_vector()`.
- Returns a `do.call(c, unname(parts))` result, identical in structure to the estimate vector.
Purpose. Extracts the posterior draws of the AMM coefficients from a scalar full-Bayes fit (`gdpar_fit`) as a single $S \times P$ matrix.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object (already validated by `.gdpar_assert_scalar_dep()`). |

Mathematics. Let $S$ be the number of posterior draws and $P$ the number of extracted AMM coefficients. The result is an $S \times P$ matrix, where the columns correspond to the Stan variables `theta_ref`, `a_coef`, `b_coef`, and `W_raw`, in that order, each included only if the corresponding AMM component is active.
Returns. An $S \times P$ numeric matrix (a plain matrix, with the `draws_matrix` class stripped) whose columns carry Stan variable names (e.g., `"theta_ref[1]"`, `"a_coef[1]"`). Row count equals the number of posterior draws.
Notes.
- Requires the suggested package `posterior`; calls `require_suggested("posterior", "extract posterior draws")`, which will abort with an informative message if unavailable.
- Reads draws via `object$fit$draws()` (the raw CmdStan / Stan fit object).
- Variables included: always `"theta_ref"`; additionally `"a_coef"` if `object$amm$a` is non-`NULL`; `"b_coef"` if `object$amm$b` is non-`NULL`; `"W_raw"` if `object$amm$W` is non-`NULL`.
- Uses raw `W_raw` draws (not `sigma_W`-scaled effective weights), matching the EB extractor's use of raw `W_raw` conditional estimates. This is a deliberate parity choice (decision D102).
- Excludes hyperparameters (`mu_theta_ref`, `sigma_theta_ref`) for EB/FB parity.
- If the resulting matrix is `NULL` or has zero columns, raises `gdpar_internal_error`.
- Calls `unclass()` on the result to strip the `draws_matrix` class, returning a plain numeric matrix.
Purpose. Computes the full-Bayes point-estimate vector as the posterior mean of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_estimate_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object. |

Mathematics. For each coefficient $j = 1, \dots, P$,

$$\hat\psi_j = \frac{1}{S} \sum_{s=1}^{S} \psi_j^{(s)},$$

where $\psi_j^{(s)}$ is the $s$-th posterior draw of coefficient $j$.
Returns. A named numeric vector of length $P$ whose names are the Stan variable names (`"theta_ref[1]"`, `"a_coef[1]"`, etc.).
Notes.
- Calls `.gdpar_fb_coef_draws_matrix(object)` to obtain the $S \times P$ matrix, then computes column means via `colMeans(mat)`.
- Names are set from `colnames(mat)`, which are the Stan variable names.
Purpose. Computes the full-Bayes model-based standard error vector as the posterior standard deviation of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_model_se_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object. |

Mathematics. For each coefficient $j = 1, \dots, P$,

$$\text{model\_se}_j = \sqrt{\frac{1}{S - 1} \sum_{s=1}^{S} \big(\psi_j^{(s)} - \bar\psi_j\big)^2},$$

where $\bar\psi_j$ is the posterior mean of coefficient $j$ over the $S$ draws.
Returns. A named numeric vector of length $P$, name-aligned with `.gdpar_fb_estimate_vector(object)`. Names are taken from `colnames(mat)`.
Notes.
- Calls `.gdpar_fb_coef_draws_matrix(object)`, then applies `apply(mat, 2L, stats::sd)` to compute column-wise standard deviations.
- Uses `stats::sd` (which divides by $S - 1$, Bessel-corrected).
- The "model SE" here is the posterior SD, which is like-for-like with the EB Laplace SD, so the `se_ratio = robust_se / model_se` comparison is an SD-vs-SD ratio on both paths.
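
The two full-Bayes summaries reduce to column operations on the draws matrix. A minimal sketch, assuming a simulated matrix `mat` in place of the output of `.gdpar_fb_coef_draws_matrix()` (the column names and values are made up for illustration):

```r
# Simulated S x P draws matrix with Stan-style column names (hypothetical data).
set.seed(42)
mat <- cbind(rnorm(1000, mean = 1, sd = 0.5), rnorm(1000, mean = -2, sd = 0.1))
colnames(mat) <- c("theta_ref[1]", "a_coef[1]")

estimate <- colMeans(mat)               # posterior mean per coefficient
model_se <- apply(mat, 2L, stats::sd)   # posterior SD (divides by S - 1)
```

Both vectors inherit their names from `colnames(mat)`, which is what keeps the estimate and SE vectors aligned.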
Purpose. Class-dispatched accessor for the point-estimate vector, the first touchpoint of the shared block-bootstrap engine. For a gdpar_eb_fit it delegates to .gdpar_eb_estimate_vector() (byte-identical EB path); for a gdpar_fit it delegates to .gdpar_fb_estimate_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A validated scalar fit object (EB or full-Bayes). |

Mathematics. See `.gdpar_eb_estimate_vector()` and `.gdpar_fb_estimate_vector()`.
Returns. A named numeric vector of AMM coefficient point estimates, regardless of path.
Notes.
- Dispatch is via `inherits(object, "gdpar_eb_fit")` (manual S3-style, not formal `UseMethod`).
- If the object is a `gdpar_eb_fit`, calls and returns `.gdpar_eb_estimate_vector(object)` verbatim, preserving regression-gate compatibility.
- Otherwise (assumed `gdpar_fit`), calls `.gdpar_fb_estimate_vector(object)`.
- Column names are stable across refits of the same model specification, which is critical for the block-bootstrap column alignment.
Purpose. Class-dispatched accessor for the model-based standard error vector, the second touchpoint of the shared block-bootstrap engine. EB path: Laplace / conditional posterior SD (verbatim). Full-Bayes path: posterior SD per coefficient. In both cases the "model SE" is a within-model (posterior / Laplace) standard deviation.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A validated scalar fit object (EB or full-Bayes). |

Mathematics. See `.gdpar_eb_model_se_vector()` and `.gdpar_fb_model_se_vector()`.
Returns. A named numeric vector of model-based standard errors, name-aligned with `.gdpar_dep_estimate_vector(object)`.
Notes.
- Same dispatch pattern as `.gdpar_dep_estimate_vector()`: `inherits(object, "gdpar_eb_fit")` triggers the EB path; otherwise full-Bayes.
- The resulting vector is used in computing `se_ratio = robust_se / model_se`, and because both EB and full-Bayes model SEs are standard deviations (SD-vs-SD), the ratio is a like-for-like comparison.
- Name alignment with the estimate vector is guaranteed by the internal extractors.
Purpose Computes the default block length for block bootstrap resampling using the cube-root rate $\ell \propto n^{1/3}$, the rate-optimal default for the moving block bootstrap.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `n` | integer-coercible scalar | Sample size (number of observations). |

Mathematics
Implements the rate:

$$\ell = \max\!\big(1,\ \operatorname{round}(n^{1/3})\big),$$

where the rounding and the lower bound of 1 produce an integer $\ell \ge 1$.
Returns A single integer: the default block length.
Notes
- The `as.integer(round(...))` call rounds to the nearest integer and coerces to integer; the outer `max(1L, ...)` guarantees the result is at least 1 even when `n = 0` or `n = 1`.
- The data-driven constant of Politis & White (2004) is noted as a deferred refinement; this default provides only the correct rate, not the optimal constant.
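
A one-line sketch of the cube-root default, assuming a unit constant in front of $n^{1/3}$. The function name `default_block_length` is hypothetical and stands in for the package's internal helper.

```r
# Cube-root rate default: round n^(1/3) to the nearest integer, floor at 1.
default_block_length <- function(n) {
  max(1L, as.integer(round(n^(1 / 3))))
}

default_block_length(100)   # 100^(1/3) is about 4.64, which rounds to 5
```

The lower bound matters only for degenerate inputs such as `n = 0`, where the rounded cube root would otherwise be 0.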
Purpose Predicate that tests whether a block-size argument is the literal character string `"auto"`, distinguishing the data-driven Politis–White path from a fixed integer or the `NULL` rate default. Shared by the temporal and spatial robust estimators.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | any R object | The block-length (or block-size) argument to inspect. |

Returns A logical scalar: `TRUE` if `x` is exactly the character string `"auto"` (length-1 character, not `NA`), `FALSE` otherwise.
Notes The compound guard `is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto")` is deliberately strict: a factor, an `NA_character_`, or a character vector of length ≠ 1 all return `FALSE`. No side effects.
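
The guard quoted in the notes can be wrapped and exercised directly. The wrapper name `is_auto_block` is hypothetical; the condition itself is the one stated above.

```r
# Strict predicate: TRUE only for the length-1, non-NA character string "auto".
is_auto_block <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto")
}

is_auto_block("auto")            # TRUE
is_auto_block(factor("auto"))    # FALSE: a factor is not a character vector
is_auto_block(c("auto", "auto")) # FALSE: length is not 1
```

Because `&&` short-circuits, `is.na(x)` is only ever evaluated on a length-1 character value.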
Purpose Evaluates the flat-top lag window (kernel) of Politis (2003) / Politis & White (2004), vectorised over its argument. Used inside the Politis–White block-length selector to compute the spectral estimates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `s` | numeric vector | Scaled lag values $k/M$. |

Mathematics

$$\lambda(s) = \begin{cases} 1, & |s| \le 1/2, \\ 2\,(1 - |s|), & 1/2 < |s| \le 1, \\ 0, & |s| > 1. \end{cases}$$

Returns A numeric vector of the same length as `s` containing $\lambda(s)$.
Notes Vectorised via nested `ifelse` over `abs(s)`. No input validation; non-finite or `NA` inputs propagate `NA`.
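
A sketch of a trapezoidal flat-top window written with nested `ifelse`, assuming the standard Politis–White form (equal to 1 up to $|s| = 1/2$, linear decay to 0 at $|s| = 1$). The function name `flat_top` is hypothetical.

```r
# Flat-top (trapezoidal) lag window, vectorised over s.
flat_top <- function(s) {
  a <- abs(s)
  ifelse(a <= 0.5, 1, ifelse(a <= 1, 2 * (1 - a), 0))
}

flat_top(c(0, 0.5, 0.75, 1, 2))
```

In this sketch `NA` inputs yield `NA`, because `ifelse` returns `NA` wherever its test is `NA`.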
Purpose Determines the adaptive bandwidth $\hat{m}$ from the first run of `Kn` consecutive negligible lags (the "first insignificant run" rule). Factored out for direct unit testing.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `rho` | numeric vector | Sample autocorrelations at lags $1, \dots, M_{\max}$. |
| `Kn` | integer scalar | Number of consecutive insignificant lags required to declare the bandwidth; typically $K_N = \max(5, \lceil\log_{10} n\rceil)$. |
| `crit` | numeric scalar | Critical value for the significance test; a lag $k$ is insignificant when $|\hat\rho(k)| < \texttt{crit}$. |

Mathematics
Returns the smallest integer $\hat{m}$ identified by the first run of `Kn` consecutive lags with $|\hat\rho(k)| < \texttt{crit}$.
If no such run exists in the supplied lags,

$$\hat{m} = \max\{k : |\hat\rho(k)| \ge \texttt{crit}\},$$

i.e., the largest significant lag. If every lag is insignificant, $\hat{m} = 1$.
Returns An integer scalar: the estimated bandwidth $\hat{m}$.
Notes Early-return inside the `for` loop at the first qualifying run. The function operates on a logical vector `insig <- abs(rho) < crit` with one entry per supplied lag, and returns `1L` as a safe minimum when all autocorrelations are negligible (near-white noise).
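
The "first insignificant run" rule can be sketched as below. Two points are assumptions rather than facts about the package: the function name `first_insignificant_run` is hypothetical, and the returned value is taken to be the lag at which the qualifying run starts (the convention used by `np::b.star`); the actual helper may index the run differently.

```r
# First-insignificant-run rule (sketch; returns the starting lag of the run).
first_insignificant_run <- function(rho, Kn, crit) {
  insig <- abs(rho) < crit
  n_lags <- length(insig)
  if (n_lags >= Kn) {
    for (j in seq_len(n_lags - Kn + 1L)) {
      # Early return at the first window of Kn consecutive insignificant lags.
      if (all(insig[j:(j + Kn - 1L)])) return(as.integer(j))
    }
  }
  sig <- which(!insig)
  # No qualifying run: largest significant lag, or 1L if none is significant.
  if (length(sig) > 0L) max(sig) else 1L
}

rho <- c(0.8, 0.5, 0.05, 0.02, 0.01, 0.03, 0.01)
first_insignificant_run(rho, Kn = 5L, crit = 0.1)
```

In the example the first two lags are significant and lags 3 to 7 form the first run of five insignificant lags, so the sketch returns 3.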
Purpose Computes the optimal block length with the data-driven Politis–White selector.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `resid` | numeric vector | Residuals of the fitted working-independence model, already in the (temporal or spatial) bootstrap ordering. |
| `c_thresh` | numeric scalar | Critical-value multiplier for the adaptive bandwidth test. Default `qnorm(0.975)` ≈ 1.96, matching `np::b.star`. |

Mathematics
- Bandwidth selection. Compute sample autocorrelations $\hat\rho(1),\dots,\hat\rho(M_{\max})$, where $M_{\max} = \min(\lceil\sqrt{n}\rceil + K_N,\; n-1)$ and $K_N = \max(5, \lceil\log_{10} n\rceil)$. The adaptive bandwidth $\hat{m}$ is found by `.gdpar_pw_mhat` with critical value $c_{\text{thresh}}\sqrt{\log_{10}(n)/n}$. Set $M = 2\hat{m}$.
- Spectral estimates. Recompute autocovariances $\hat{R}(k)$ for $k = 0,\dots,M$. Apply the flat-top window $\lambda(k/M)$:

  $$\widehat{\text{spec}} = \sum_{k=-M}^{M} \lambda(k/M)\,\hat{R}(k), \qquad \hat{G} = \sum_{k=-M}^{M} \lambda(k/M)\,|k|\,\hat{R}(k).$$

- Optimal block length. For overlapping (moving/circular) block bootstrap the variance constant is $D = \tfrac{4}{3}\,\widehat{\text{spec}}^2$ (Lahiri 2003), giving

  $$\hat{\ell}_{\text{opt}} = \left(\frac{2\,\hat{G}^2}{D}\right)^{1/3} n^{1/3}.$$

- Capping. The final integer block length is
Returns A list with components:
| Component | Type | Meaning |
|---|---|---|
| `block_length` | integer | The selected block length. |
| `method` | character | `"auto"` if the data-driven rule succeeded; `"rate"` if the fallback was used. |
| `reason` | character | Human-readable description of the selection. |

Notes Five fallback paths return `method = "rate"`: (i)
Purpose Generates a resampled index vector of length n for a single temporal block bootstrap replicate. Draws ceiling(n / block_length) contiguous blocks with replacement, concatenates them, and truncates to length n. Supports both the moving (Künsch 1989) and circular (Politis & Romano 1992) block schemes.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `n` | integer-coercible scalar | Sample size. |
| `block_length` | integer-coercible scalar | Length of each contiguous block; must be in $[1, n]$. |
| `type` | character | `"moving"` (default) or `"circular"`. Matched via `match.arg`. |

Mathematics
Let $\ell$ denote `block_length`; $\lceil n/\ell \rceil$ blocks are drawn, with start positions $s_b$.
- Moving block bootstrap (`"moving"`): block start positions are drawn uniformly from $\{1, 2, \dots, n - \ell + 1\}$. Each block $b$ contributes indices $s_b, s_b+1, \dots, s_b + \ell - 1$.
- Circular block bootstrap (`"circular"`): block start positions are drawn uniformly from $\{1, 2, \dots, n\}$. Indices wrap around modulo $n$: the raw index $i$ maps to $((i-1) \bmod n) + 1$.

The output is the first $n$ entries of the concatenated block indices.
Returns An integer vector of length `n` containing resampled observation indices in $\{1, \dots, n\}$.
Notes
- Raises an abort (class `"gdpar_input_error"`) via `gdpar_abort()` if `block_length` is outside $[1, n]$.
- The circular scheme gives every observation equal expected resampling weight, whereas the moving scheme slightly down-weights observations near the boundaries.
- This is the single-chain sibling of a multi-chain MCMC-draw block bootstrap resampler (`block_bootstrap_indices()`) documented elsewhere.
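
The two block schemes can be sketched in a few lines. The function name `block_indices` is hypothetical, and the sketch signals invalid input with `stop()` where the package uses `gdpar_abort()`.

```r
# One block-bootstrap replicate of row indices (moving or circular scheme).
block_indices <- function(n, block_length, type = c("moving", "circular")) {
  type <- match.arg(type)
  n <- as.integer(n)
  l <- as.integer(block_length)
  if (l < 1L || l > n) stop("block_length must be in [1, n]")
  n_blocks <- ceiling(n / l)
  # Moving blocks must fit inside 1..n; circular blocks may start anywhere.
  n_starts <- if (type == "moving") n - l + 1L else n
  starts <- sample.int(n_starts, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s + seq_len(l) - 1L))
  if (type == "circular") idx <- ((idx - 1L) %% n) + 1L  # wrap around modulo n
  idx[seq_len(n)]                                        # truncate to length n
}

set.seed(11)
block_indices(10, 3, "circular")
```

Within each block the indices are consecutive, which is what preserves short-range dependence in the resampled series.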
Purpose Computes residuals of a scalar fit (Empirical-Bayes or full-Bayes) for use in the dependence diagnostics. Shared by the temporal diagnostic (gdpar_dependence_diagnostic) and the spatial diagnostic (gdpar_spatial_dependence_diagnostic) to ensure a single, consistent residual definition (design decision D100).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A scalar fitted model object. |
| `residual_type` | character | One of `"quantile"`, `"response"`, `"pearson"`, `"deviance"`. |
| `randomize_seed` | integer or `NULL` | Seed for reproducibility of randomized quantile residuals for discrete families; ignored for continuous families. |

Returns A numeric vector of residuals, one per observation.
Notes
- Full-Bayes branch (`gdpar_fit` that is not a `gdpar_eb_fit`): delegates entirely to the S3 method `stats::residuals(object, type = residual_type, randomize_seed = randomize_seed)`, which internally uses the posterior predictive draws and `.gdpar_residuals_dispatch()` (design decision D102).
- Empirical-Bayes branch: extracts the scalar observed outcome via `.gdpar_eb_scalar_y_obs(object)`, obtains response-type predictions via `stats::predict(object, type = "response")`, reads the family name from `object$family$name`, and dispatches to `.gdpar_residuals_dispatch()`.

gdpar_dependence_diagnostic(object, index = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), max_lag = NULL, level = 0.95, randomize_seed = NULL, ...)
Purpose (Exported.) Quantifies serial (temporal) dependence in the residuals of a scalar Path 1 Empirical-Bayes or full-Bayes fit. The diagnostic is the gate for `gdpar_dependence_robust()`: it makes violations of the conditional-independence assumption visible and measurable before any block-bootstrap remedy is applied. Only scalar fits ($K = 1$, $p = 1$) are supported.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A scalar fitted model. |
| `index` | numeric vector or `NULL` | Temporal (or one-dimensional) ordering of observations. If non-`NULL`, residuals are sorted by `order(index)` before statistics are computed. Must have the same length as the residual vector. |
| `residual_type` | character | Residual type: `"quantile"` (default; Dunn-Smyth / randomized quantile residuals), `"response"`, `"pearson"`, or `"deviance"`. |
| `max_lag` | integer or `NULL` | Maximum lag for the Ljung–Box test; must lie in $[1, n-1]$. When `NULL`, a default is chosen internally. |
| `level` | numeric between 0 and 1 | Confidence level for the verdict. Dependence is flagged when a p-value is below $1 - \texttt{level}$. Default `0.95`. |
| `randomize_seed` | integer or `NULL` | Seed for randomized quantile residuals (discrete families). |
| `...` | any | Unused; present for signature stability. |

Mathematics
- Lag-1 autocorrelation. Let $r_1, \dots, r_n$ be the (optionally re-ordered) residuals, $\bar{r}$ their mean, and $\tilde{r}_t = r_t - \bar{r}$. Then

  $$\hat\rho_1 = \frac{\sum_{t=2}^{n} \tilde{r}_t\,\tilde{r}_{t-1}}{\sum_{t=1}^{n} \tilde{r}_t^{2}}.$$

  An approximate p-value for $\hat\rho_1$ under the null of no lag-1 autocorrelation is reported as `lag1_p_value`.
- Durbin–Watson statistic. Reported descriptively (not as a formal test): the sum of squared successive residual differences divided by the residual sum of squares. Values near 2 indicate no first-order autocorrelation.
- Ljung–Box test. The omnibus test across lags $1, \dots, h$ (where $h = \texttt{max\_lag}$) is

  $$Q = n\,(n+2) \sum_{k=1}^{h} \frac{\hat\rho_k^{2}}{n-k},$$

  computed via `stats::Box.test(..., type = "Ljung-Box", fitdf = 0)`. The degrees of freedom are not reduced by the number of estimated model coefficients (`fitdf = 0`), making the test mildly optimistic for residuals of a fitted model.
- Verdict. Dependence is flagged when $p_{\text{Ljung-Box}} < 1 - \texttt{level}$.
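
The three statistics can be reproduced on a simulated residual series. This is a sketch under stated assumptions: the residuals `r` are an artificial AR(1) series, `h` is an arbitrary illustrative `max_lag`, and the Durbin–Watson denominator uses the centered sum of squares, which may differ from the package's choice.

```r
# Simulated serially dependent "residuals" (AR(1) with coefficient 0.6).
set.seed(7)
r <- as.numeric(stats::arima.sim(list(ar = 0.6), n = 300))
n <- length(r)
level <- 0.95

r_c <- r - mean(r)
denom <- sum(r_c^2)

rho1 <- sum(r_c[-1] * r_c[-n]) / denom   # lag-1 autocorrelation
dw <- sum(diff(r)^2) / denom             # Durbin-Watson (centered denominator)

h <- 10L                                 # illustrative max_lag
lb <- stats::Box.test(r, lag = h, type = "Ljung-Box", fitdf = 0)
flagged <- lb$p.value < 1 - level        # verdict rule
```

For this strongly autocorrelated series the lag-1 autocorrelation is well above zero, the Durbin–Watson value falls below 2, and the Ljung–Box verdict flags dependence.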
Returns An object of class c("gdpar_dependence_diagnostic", "list") with components:
| Component | Type | Meaning |
|---|---|---|
| `residual_type` | character | The residual type used. |
| `n` | integer | Number of residuals. |
| `max_lag` | integer | Maximum lag used for the Ljung–Box test. |
| `lag1_autocorr` | numeric | Lag-1 sample autocorrelation of the residuals. |
| `lag1_p_value` | numeric | Two-sided p-value for the lag-1 autocorrelation. |
| `durbin_watson` | numeric | Durbin–Watson statistic. |
| `ljung_box_statistic` | numeric | Ljung–Box test statistic. |
| `ljung_box_df` | integer | Degrees of freedom of the Ljung–Box reference distribution. |
| `ljung_box_p_value` | numeric | P-value of the Ljung–Box test. |
| `level` | numeric | Confidence level used. |
| `index_supplied` | logical | Whether `index` was non-`NULL`. |
| `verdict` | character | Human-readable verdict string. |

A print method (S3 dispatch on `"gdpar_dependence_diagnostic"`) is provided for formatted output.
Notes
- Input validation. Calls `.gdpar_assert_scalar_dep(object, "object")` to ensure the fit is scalar. Validates `level` via `assert_numeric_scalar(level, ..., lower = 0, upper = 1)`. Requires the `posterior` package (suggested dependency) for extracting posterior draws.
- Abort conditions. Raises `gdpar_abort` with class `"gdpar_input_error"` if `index` has the wrong length or `max_lag` is outside $[1, n-1]$. Raises class `"gdpar_diagnostic_error"` if all residuals have zero variance (`denom <= 0`).
- S3 method note. The returned object carries class `"gdpar_dependence_diagnostic"` as its primary class, enabling `print.gdpar_dependence_diagnostic()` dispatch.
- Scope. Only scalar ($K=1$, $p=1$) fits are accepted. Spatial dependence is handled by the sibling `gdpar_spatial_dependence_diagnostic()`.
Purpose (Exported S3 method.) S3 print method for objects of class `gdpar_dependence_diagnostic`. Produces a human-readable, multi-line textual summary of the serial-dependence diagnostic battery (autocorrelation, Durbin–Watson, Ljung–Box) attached to a fitted model.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list (S3 class `gdpar_dependence_diagnostic`) | The diagnostic object produced by `gdpar_dependence_diagnostic()`. Required fields: `$residual_type`, `$index_supplied`, `$n`, `$lag1_autocorr`, `$lag1_p_value`, `$durbin_watson`, `$ljung_box_df`, `$ljung_box_statistic`, `$ljung_box_p_value`, `$verdict`. |
| `digits` | integer (default `3L`) | Number of significant digits used by `format()` when printing numeric quantities. |
| `...` | any | Ignored; present for S3 method compatibility. |

Returns `invisible(x)`: the input object, invisibly, following standard R print-method convention.
Notes
- The method is registered via `@export` for S3 dispatch on the `"gdpar_dependence_diagnostic"` class.
- All output is emitted via `cat()` to the console (stdout). No value is returned visibly.
- The print method checks `x$index_supplied` to annotate whether the residuals were ordered by a user-supplied index or by natural row order.
- If `x$index_supplied` is `TRUE`, the residual-type line reads `"(ordered by supplied index)"`; otherwise `"(natural row order)"`.
- No validation of `x` fields is performed; missing or `NULL` fields will produce blank output segments.

.gdpar_dependence_robust_engine(object, data, resample_fun, B, level, seed, iter_warmup, iter_sampling, chains, verbose, verbose_msg, caller_env, ...)
Purpose Internal (non-exported) shared block-bootstrap-by-refit engine. Factors out the entire resampling loop, seed management, bootstrap-SE and percentile-interval assembly, and per-refit convergence accounting that is common to the temporal (gdpar_dependence_robust) and spatial (gdpar_spatial_dependence_robust) robust-inference wrappers. The two public entry points differ only in their resample_fun and in the descriptive metadata they attach; everything downstream of the resample is handled here identically. Serves both the Empirical-Bayes (gdpar_eb_fit) and full-Bayes (gdpar_fit) paths through class-dispatched extractors (decision D102).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | list (S3 class `gdpar_eb_fit` or `gdpar_fit`) | A scalar Path 1 fit object (`K = 1`, `p = 1`). Must carry `$call` (the original fitting call), and must be dispatchable by `.gdpar_dep_estimate_vector()` and `.gdpar_dep_model_se_vector()`. |
| `data` | data.frame | The original data frame passed to the fitting function. Each bootstrap iteration indexes rows of this data frame. |
| `resample_fun` | nullary function | A closure with no arguments that returns an integer vector of length `nrow(data)`: the row indices for one bootstrap resample. For temporal bootstrapping this wraps `.gdpar_block_bootstrap_data_indices()` (moving or circular blocks ordered by `index`); for spatial bootstrapping it returns a spatial-block index vector. |
| `B` | integer scalar | Number of bootstrap refits to perform. |
| `level` | numeric scalar between 0 and 1 | Confidence level for the percentile interval (e.g. `0.95`). |
| `seed` | integer or `NULL` | Optional RNG seed. When non-`NULL`, `set.seed(as.integer(seed))` is called once before the loop, ensuring reproducibility of both the per-refit Stan seeds and the `resample_fun()` draws. |
| `iter_warmup` | integer scalar | Number of warmup (burn-in) iterations passed to each Stan refit. |
| `iter_sampling` | integer scalar | Number of post-warmup sampling iterations passed to each Stan refit. |
| `chains` | integer scalar | Number of HMC chains for each refit. |
| `verbose` | logical scalar | When `TRUE`, emits `verbose_msg` once at the start via `gdpar_inform()`. |
| `verbose_msg` | character or `NULL` | Pre-formatted cost message printed when `verbose` is `TRUE`. |
| `caller_env` | environment | The environment (typically the public wrapper's `parent.frame()`) in which each refit call is evaluated, so that model symbols resolve exactly as for a direct `gdpar_eb()` or `gdpar()` call. |
| `...` | any | Not forwarded anywhere; present for extensibility. |

RNG-consumption contract. The engine's random-number consumption order is frozen for reproducibility:
- set.seed(seed) (when seed is non-NULL);
- $B$ per-refit Stan seeds drawn via `sample.int(.Machine$integer.max, B)`, assigned deterministically to iterations $b = 1, \ldots, B$;
- one call to resample_fun() per iteration $b$.
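The consumption order above can be illustrated with a small sketch. It is not the engine's code: `n`, `B`, `run_once()` and the toy `resample_fun()` are hypothetical stand-ins, and no model is refit. It only shows that seeding once, drawing all refit seeds up front, and then calling the resampler once per iteration makes two runs with the same seed agree.

```r
# Hypothetical stand-ins: n, B and this resample_fun are illustrative only.
n <- 10L
B <- 4L
resample_fun <- function() sample.int(n, n, replace = TRUE)

run_once <- function(seed) {
  if (!is.null(seed)) set.seed(as.integer(seed))          # step 1: seed once
  refit_seeds <- sample.int(.Machine$integer.max, B)      # step 2: all B seeds up front
  resamples <- lapply(seq_len(B), function(b) resample_fun())  # step 3: one draw per b
  list(refit_seeds = refit_seeds, resamples = resamples)
}

a <- run_once(123L)
b <- run_once(123L)
identical(a, b)  # same seed, same consumption order, same draws
```

Because the refit seeds are drawn before the loop, changing how many random numbers a single resample consumes would not shift the seeds assigned to later iterations.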
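Point-estimate extraction. For each successful refit $b$, the coefficient vector $\hat\theta^{(b)}$ is obtained from .gdpar_dep_estimate_vector(fit_b), which returns a named numeric vector of all AMM coefficients (theta_ref, a_coef, b_coef, W_raw, etc.).
Robust standard error. Let $B_{\text{ok}}$ be the number of refits that completed without error and with no NA coefficients. Then, for each coefficient $j$:
$$ \widehat{\text{SE}}_{\text{robust},j} = \sqrt{\frac{1}{B_{\text{ok}} - 1} \sum_{b=1}^{B_{\text{ok}}} \bigl(\hat\theta_j^{(b)} - \bar\theta_j\bigr)^2} $$
where $\bar\theta_j = B_{\text{ok}}^{-1} \sum_{b} \hat\theta_j^{(b)}$ is the mean of the bootstrap estimates.
Percentile confidence interval. For level $1 - \alpha$:
$$ \bigl[\hat q_j(\alpha/2),\ \hat q_j(1 - \alpha/2)\bigr] $$
where $\hat q_j(p)$ is the empirical $p$-quantile of the $B_{\text{ok}}$ bootstrap estimates of coefficient $j$, computed with stats::quantile(..., probs = c(alpha/2, 1 - alpha/2), names = FALSE).
SE ratio. The ratio comparing robust and model-based uncertainty:
$$ \text{se\_ratio}_j = \frac{\widehat{\text{SE}}_{\text{robust},j}}{\text{SE}_{\text{model},j}} $$
A ratio $> 1$ indicates that the model-based SE understates the sampling variability under dependence.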
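As a rough illustration of how the robust SE, percentile interval and SE ratio fit together, the sketch below assembles them from a matrix of bootstrap estimates. The matrix `boot`, the vector `model_se` and the simulated values are assumptions made for the example; in the engine these would come from the refits and from .gdpar_dep_model_se_vector().

```r
set.seed(1L)
# Hypothetical inputs: a 200 x 2 matrix of bootstrap estimates (one row per refit)
# and the model-based SEs of the same two coefficients.
boot <- matrix(rnorm(200L * 2L, mean = c(1, -0.5), sd = c(0.2, 0.4)),
               ncol = 2L, byrow = TRUE,
               dimnames = list(NULL, c("theta_ref", "a_coef")))
model_se <- c(theta_ref = 0.1, a_coef = 0.4)

level <- 0.95
alpha <- 1 - level

ok <- stats::complete.cases(boot)        # keep refits with no NA coefficient
boot_ok <- boot[ok, , drop = FALSE]
B_ok <- nrow(boot_ok)

robust_se <- apply(boot_ok, 2L, stats::sd)            # sample SD over refits
ci <- apply(boot_ok, 2L, stats::quantile,
            probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

out <- data.frame(
  parameter = colnames(boot_ok),
  model_se  = model_se,
  robust_se = robust_se,
  se_ratio  = robust_se / model_se,
  ci_lower  = ci[1L, ],
  ci_upper  = ci[2L, ],
  row.names = NULL
)
out
```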
Convergence diagnostics. Per-refit convergence fields are aggregated over the refits that completed:
- $\text{max\_rhat} = \max_b \hat{R}^{(b)}_{\max}$ (maximum across all refits of the per-refit maximum split-$\hat{R}$);
- $\text{min\_ess\_bulk} = \min_b \text{ESS}_{\text{bulk},\,\min}^{(b)}$ (minimum across all refits of the per-refit minimum bulk ESS);
- $\text{n\_divergent\_refits} = |\{b : D^{(b)} > 0\}|$ (number of refits with at least one divergent transition);
- $\text{n\_high\_rhat\_refits} = |\{b : \hat{R}^{(b)}_{\max} > 1.05\}|$ (number of refits with max R-hat exceeding the 1.05 threshold).
The R-hat threshold iter_warmup/iter_sampling.
Returns A list with components:
| Component | Type | Description |
|---|---|---|
| table | data.frame | One row per AMM coefficient, columns: parameter (character), estimate (original point estimate), model_se (Laplace SD or posterior SD), robust_se (bootstrap SD), se_ratio (robust_se / model_se), ci_lower, ci_upper (percentile interval endpoints). |
| B_ok | integer | Number of successful bootstrap refits (no errors, all coefficients non-NA). |
| seed | integer | The supplied seed, or NA_integer_ if seed was NULL. |
| warnings | character vector | Accumulated warning messages (refit failures, convergence issues). Zero-length if clean. |
| refit_diagnostics | list | Aggregate convergence summary: max_rhat (numeric), min_ess_bulk (numeric), n_divergent_refits (integer), n_high_rhat_refits (integer), rhat_threshold (numeric, always 1.05). |
Notes
- No refit exclusion. Under-converged or divergent refits are never excluded or down-weighted. The rationale (documented in source decision D102) is that excluding under-converged refits is non-random — it removes precisely the data configurations the bootstrap is meant to probe — and would bias the SE. Both R-hat breaches and divergence counts are reported as diagnostics only.
- Error handling. If a refit raises an error, the error message is captured via tryCatch, stored in warnings_msg, and the iteration is skipped (next). If fewer than 2 refits succeed (B_ok < 2), the engine calls gdpar_abort() with class "gdpar_diagnostic_error", aborting the run.
- Parameter alignment. Only parameters common to the original fit's param_names and each refit's estimate vector are recorded in boot[b, ]. This handles the (rare) case where a refit produces a partial coefficient vector.
- Refit call construction. The refit call is `object$call` with fields overridden: data → sub (the resampled data), iter_warmup, iter_sampling, chains, verbose → FALSE, refresh → 0L, skip_id_check → TRUE, seed → refit_seeds[b]. A new environment env is created with parent = caller_env and `env$sub <- sub`, so the symbol sub resolves inside the call.
- Diagnostics path-agnostic. Both gdpar_eb_fit and gdpar_fit objects carry a `$diagnostics` slot with fields rhat_max, ess_bulk_min, divergent_count. The engine reads whichever is present.
- Byte-identical EB path. On the Empirical-Bayes path, the dispatch to .gdpar_dep_estimate_vector / .gdpar_dep_model_se_vector resolves to the original EB helpers, and the refit is a gdpar_eb() call, so the engine's output is bit-for-bit identical to the pre-D102 temporal-only implementation.
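The refit-call construction and error capture described in the notes above can be sketched with base R alone. This is an illustration under assumptions, not the engine's code: `lm()` stands in for the gdpar fitting function, `object` is a hypothetical list that carries only a stored call, and `refit_once()` is an invented helper name.

```r
# Hypothetical stand-in for a fit object: only the stored call matters here.
d <- data.frame(x = 1:20, y = 2 * (1:20) + rep(c(-1, 1), 10))
object <- list(call = quote(lm(y ~ x, data = d)))
caller_env <- environment()

refit_once <- function(object, data, idx, caller_env) {
  sub <- data[idx, , drop = FALSE]      # the resampled data
  cl <- object$call
  cl$data <- quote(sub)                 # override the data field with the symbol `sub`
  env <- new.env(parent = caller_env)   # model symbols still resolve via caller_env
  env$sub <- sub                        # so that `sub` resolves inside the call
  tryCatch(
    eval(cl, envir = env),
    error = function(e) conditionMessage(e)  # capture the message instead of failing
  )
}

set.seed(1L)
fit_b <- refit_once(object, d, sample.int(nrow(d), replace = TRUE), caller_env)
coef(fit_b)
```

Overriding the field with a symbol rather than the data itself keeps the stored call small and leaves every other argument of the original call untouched.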
gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = "quantile", randomize_seed = NULL, type = "moving", B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = FALSE, ...)
Note: This function's roxygen documentation and @export directive appear at the end of this section (section 3 of 7); the actual function body is defined in a subsequent section. The documentation below is derived strictly from the roxygen block present here.
Purpose Public, exported entry point for dependence-robust standard errors via a temporal block bootstrap. Re-estimates the uncertainty of a scalar Path 1 Empirical-Bayes or full-Bayes fit so that it is robust to temporal (serial) dependence in the data, without modelling that dependence. It refits the model on $B$ block-bootstrap resamples of the data, drawn as contiguous blocks in the order given by index, and reports the bootstrap standard deviation and percentile intervals of each AMM coefficient alongside the model-based (Laplace / posterior) standard errors. This implements the working-independence + robust-variance stance of Liang & Zeger (1986): the point estimates are unchanged (consistent when the mean structure is correct, not efficient); only the reported uncertainty is made dependence-robust. Delegates the core loop to .gdpar_dependence_robust_engine().
Arguments
| Argument | Type | Meaning |
|---|---|---|
| object | S3 object (gdpar_eb_fit or gdpar_fit) | A scalar Path 1 fit (K = 1, p = 1): either from gdpar_eb() (Empirical Bayes) or gdpar() (full Bayes). |
| data | data.frame | The data frame originally passed to the fitting function. The fit object does not store the data (to stay lightweight), so it must be re-supplied. It is resampled by contiguous blocks and the model is refit on each resample. |
| index | numeric/integer vector of length $n$, or NULL | Optional temporal ordering of the rows of data. Data are sorted by order(index) so that contiguous blocks correspond to contiguous time. When NULL (default), the natural row order is assumed to be the temporal order. |
| block_length | NULL, positive integer, or "auto" | Block size for the bootstrap. NULL (default): uses the rate-optimal $L = \max(1, \lfloor n^{1/3} \rceil)$. "auto": selects the block length from the data via the Politis & White (2004) automatic rule (with the Patton, Politis & White 2009 correction), computed from the fitted residuals (no extra refit), falling back to the rate-optimal formula on a degenerate series. The chosen value and method are reported in the result. |
| residual_type | character, one of "quantile" (default), "response", "pearson", "deviance" | Type of residuals fed to the Politis & White automatic block-length selector. Used only when block_length = "auto"; ignored otherwise. "quantile" refers to Dunn–Smyth randomized quantile residuals. |
| randomize_seed | integer or NULL | Optional seed for the randomized quantile residuals of discrete families. Used only by the "auto" block-length selector for reproducibility of the block-length choice; ignored otherwise. |
| type | character, one of "moving" (default) or "circular" | Type of block bootstrap. "moving" uses overlapping blocks that slide along the series; "circular" wraps the series into a circle. |
| B | integer scalar (default 199L) | Number of bootstrap refits. |
| level | numeric scalar in $(0, 1)$ (default 0.95) | Confidence level for the percentile interval. |
| seed | integer or NULL | Optional RNG seed controlling both the block resampling and deterministically derived per-refit Stan seeds, for full reproducibility. |
| iter_warmup | integer scalar (default 500L) | Number of warmup iterations per refit. Defaults are deliberately short to keep cost manageable. |
| iter_sampling | integer scalar (default 500L) | Number of post-warmup sampling iterations per refit. |
| chains | integer scalar (default 2L) | Number of HMC chains per refit. |
| verbose | logical scalar (default FALSE) | When TRUE, prints an opt-in cost message once. |
| ... | any | Additional arguments forwarded to gdpar_eb() (or gdpar()) for every refit. |
The function applies the Liang & Zeger (1986) working-independence / sandwich-variance paradigm to the gdpar model class. The key quantities are:
- Block-length selection. Under the rate-optimal default:
  $$ L = \max\bigl(1,\ \lfloor n^{1/3} \rceil\bigr) $$
  Under "auto", the Politis & White (2004) algorithm estimates the optimal block length from the spectral density at frequency zero of the fitted residuals, with the Patton–Politis–White (2009) bias correction.
- Moving block bootstrap. For series length $n$ and block length $L$, the moving block bootstrap draws $\lfloor n/L \rfloor$ contiguous blocks of length $L$ uniformly at random (with replacement) from the $n - L + 1$ possible overlapping blocks, concatenating them to form a resampled series of length $\approx n$.
- Circular block bootstrap. The series is wrapped into a circle; the $n - L + 1$ possible blocks are replaced by $n$ possible blocks, eliminating edge effects.
- Robust SE. Computed by the engine:
  $$ \widehat{\text{SE}}_{\text{robust}} = \text{SD}\bigl(\hat\theta^{(1)}, \ldots, \hat\theta^{(B_{\text{ok}})}\bigr) $$
- SE ratio.
  $$ \text{se\_ratio} = \frac{\widehat{\text{SE}}_{\text{robust}}}{\text{SE}_{\text{model}}} $$
  Values $> 1$ signal that the model-based SE understates true sampling variability due to dependence.
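A minimal sketch of the two resampling schemes and the rate-optimal block length follows. The function names `rate_block_length()` and `block_indices()` are hypothetical, not the package's internals, and one detail is an assumption made here: the sketch draws $\lceil n/L \rceil$ blocks and truncates to exactly $n$ indices so that the result always has length $n$.

```r
# Rate-optimal default: nearest integer to n^(1/3), at least 1.
rate_block_length <- function(n) max(1L, as.integer(round(n^(1 / 3))))

# Hypothetical helper: row indices for one block-bootstrap resample.
block_indices <- function(n, L, type = c("moving", "circular")) {
  type <- match.arg(type)
  n <- as.integer(n)
  L <- as.integer(L)
  # Moving: a block must fit inside the series; circular: any start is allowed.
  n_starts <- if (type == "moving") n - L + 1L else n
  n_blocks <- ceiling(n / L)            # assumption: overshoot, then truncate to n
  starts <- sample.int(n_starts, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s + seq_len(L) - 1L))
  idx <- (idx - 1L) %% n + 1L           # wrap around the circle (no-op for moving)
  idx[seq_len(n)]
}

set.seed(1L)
L <- rate_block_length(100)
idx_moving <- block_indices(100, L, "moving")
idx_circular <- block_indices(100, L, "circular")
head(idx_moving, 10)
```

Within each block the indices are consecutive, which is what preserves short-range dependence inside the resample.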
Returns A list of S3 class gdpar_dependence_robust with components:
| Component | Type | Description |
|---|---|---|
| table | data.frame | One row per AMM coefficient; columns: parameter, estimate, model_se, robust_se, se_ratio, ci_lower, ci_upper. |
| block_length | integer | The chosen block length. |
| block_length_method | character | One of "rate" (rate-optimal formula, also flags fallback from "auto"), "fixed" (user-supplied), "auto" (Politis–White). |
| type | character | "moving" or "circular". |
| B | integer | Requested number of bootstrap replications. |
| B_ok | integer | Number of successful refits. |
| level | numeric | Confidence level used. |
| index_supplied | logical | Whether the user supplied an index vector. |
| seed | integer | The supplied seed, or NA_integer_. |
| warnings | character vector | Accumulated warning messages (refit failures, convergence issues). |
| refit_diagnostics | list | Aggregate per-refit convergence: max_rhat, min_ess_bulk, n_divergent_refits, n_high_rhat_refits, rhat_threshold. |
A print method (defined elsewhere) provides a human-readable summary.
Notes
- Empirical-Bayes vs. full-Bayes parity. Both paths are supported (decision D102). On the EB path, estimate is the Laplace/conditional-posterior mean and model_se is its SD; on the full-Bayes path, estimate is the posterior mean and model_se is the posterior SD. The posterior mean (not median) is used for parity and to keep the SE ratio a dimensionless SD-vs-SD ratio without undeclared normal-scaling constants.
- Full-Bayes caveats. (1) Each full-Bayes refit runs full HMC (costly). (2) Finite-iteration refits carry Monte-Carlo error in their posterior mean, which slightly and conservatively inflates robust_se. (3) Under an informative prior the bootstrap SD can be smaller than the full-Bayes posterior SD even under correct independent specification ($\text{se\_ratio} < 1$), because the prior concentrates the posterior beyond what the data alone support; this is benign regularization, not overstatement.
- Scope limitation. The bootstrap delivers robust variance, not better point estimates. It is valid for weak / short-range dependence relative to block_length; it does not rescue long-memory or unit-root processes.
- Dependencies. Uses cmdstanr for refits and posterior to extract coefficient estimates.
- Exported. This function is exported from the package namespace (present in NAMESPACE).
gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, type = c("moving", "circular"), B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose Top-level exported function that performs a dependence-robust uncertainty audit for a fitted gdpar model via block bootstrap. It re-estimates standard errors (and confidence intervals) of model coefficients to account for temporal dependence in the residuals, without changing point estimates. The method repeatedly refits the model on block-bootstrap resamples of the original data.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| object | gdpar_fit or gdpar_eb_fit or compatible | The fitted model object whose uncertainty is to be audited. |
| data | data.frame | The original data frame used in fitting. Must be row-aligned with the model. |
| index | numeric or NULL | Optional temporal ordering variable. If non-NULL, data and residuals are sorted by this index before blocking. Must have length equal to nrow(data). |
| block_length | NULL, positive integer, or "auto" | Block length for the moving/circular block bootstrap. NULL uses the default rate $L = \max(1, \lfloor n^{1/3} \rceil)$; "auto" selects the block length data-adaptively via the Politis–White (2004) plug-in method on the residuals. |
| residual_type | character scalar, one of "quantile", "response", "pearson", "deviance" | Type of residual used when block_length = "auto" for the Politis–White plug-in (and for spatial diagnostics). Matched via match.arg. |
| randomize_seed | integer or NULL | Seed for randomized quantile residuals (used only if residual_type = "quantile"). |
| type | character scalar, one of "moving", "circular" | Block-bootstrap scheme. "moving" uses overlapping blocks of length block_length; "circular" wraps the data end-to-end. Matched via match.arg. |
| B | positive integer | Number of bootstrap replicates (default 199). |
| level | numeric in $(0, 1)$ | Confidence level for percentile-based intervals (default 0.95). |
| seed | integer or NULL | Master seed passed to the engine for reproducibility. |
| iter_warmup | positive integer | Stan warmup iterations per refit. |
| iter_sampling | positive integer | Stan sampling iterations per refit. |
| chains | positive integer | Number of MCMC chains per refit. |
| verbose | logical scalar | If TRUE, prints an informational banner describing the audit before computation begins. |
| ... | any | Additional arguments passed through to .gdpar_dependence_robust_engine and ultimately to the Stan refit. |
Mathematics
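Default block length (rate method):
When block_length is NULL, the default block length is set to
$$ L = \max\bigl(1,\ \lfloor n^{1/3} \rceil\bigr) $$
where $n$ is nrow(data).
Block bootstrap:
For each of $B$ replicates, a resampled index vector is drawn via .gdpar_block_bootstrap_data_indices(n, block_length, type). If type = "moving", consecutive blocks of length $L$ are drawn from the $n - L + 1$ overlapping blocks of the series; if type = "circular", the data are conceptually wrapped in a circle.
Auto block length (Politis–White):
When block_length = "auto", residuals of the requested residual_type are passed to the Politis–White selector, whose result carries `$block_length`, `$method`, and `$reason`.
Robust standard error:
The block-bootstrap standard error of each coefficient is the sample standard deviation of the $B_{\text{ok}}$ successful bootstrap estimates. The ratio
$$ \text{se\_ratio} = \frac{\widehat{\text{SE}}_{\text{robust}}}{\text{SE}_{\text{model}}} $$
measures how much the model-based uncertainty understates the dependence-robust uncertainty; values $> 1$ indicate understatement.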
Returns An object of class c("gdpar_dependence_robust", "list") with the following components:
| Component | Type | Meaning |
|---|---|---|
| table | data.frame | Coefficient table with robust SEs, model SEs, se_ratio, and confidence intervals at the requested level. |
| block_length | integer | The block length used (after resolution of NULL or "auto"). |
| block_length_method | character | One of "fixed", "rate", or the method string returned by Politis–White. |
| type | character | The bootstrap scheme used ("moving" or "circular"). |
| B | integer | Requested number of replicates. |
| B_ok | integer | Number of replicates that completed successfully. |
| level | numeric | Confidence level. |
| index_supplied | logical | Whether the caller supplied an ordering index. |
| seed | integer or NULL | Seed actually used by the engine. |
| warnings | character vector | Accumulated warning messages from failed or slow refits. |
| refit_diagnostics | list or NULL | Aggregate convergence diagnostics across all refits (max R-hat, min ESS, divergent transitions, high-R-hat count). |
Notes
- The function requires the cmdstanr and posterior packages; if absent, a suggestion-error is raised.
- Validation errors (class = "gdpar_input_error") are raised for: non-scalar object, non-data-frame data, mismatched index length, invalid block_length (non-NULL, non-integer, non-"auto"), block_length outside $[1, n]$, non-scalar logical verbose.
- If index is non-NULL, both data and (internally) residuals are reordered by order(index) before blocking, ensuring temporal coherence.
- The function detects whether object inherits from "gdpar_fit" but not "gdpar_eb_fit" (i.e., is a full-Bayes fit) and adjusts the verbose message to warn that full HMC refits are markedly more expensive.
- The resample-generating closure resample_fun is created in the local environment and passed to the engine.
- caller_env <- parent.frame() is captured so the engine can re-evaluate expressions in the caller's scope if needed.
Purpose S3 print method for objects of class gdpar_dependence_robust. Renders a human-readable summary of the block-bootstrap audit results to the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| x | gdpar_dependence_robust | The object to print. |
| digits | integer scalar (default 3) | Number of significant digits for formatting numeric columns in the table. |
| ... | any | Unused; present for S3 generic compatibility. |
Returns Invisibly returns x (the input object).
Notes
- Prints the bootstrap scheme ("moving" or "circular"), block length (with provenance label: "auto: Politis-White", "rate: n^(1/3)", or blank for fixed), $B$, $B_{\text{ok}}$, index-supplied status, and confidence level.
- The label for block_length_method uses a switch with four branches: "auto", "rate", "fixed", and a default empty string; the %||% operator defaults to "fixed" if the component is NULL.
- Numeric columns of the table are formatted with format(col, digits = digits).
- Appends an explanatory note about the se_ratio interpretation.
- Calls .gdpar_print_refit_diagnostics() to print convergence diagnostics.
- If warnings are present, prints up to 5, with a count of remaining suppressed warnings.
Purpose Internal helper that prints aggregate per-refit convergence diagnostics (max R-hat, min ESS bulk, divergent transition count, high-R-hat refit count) to the console. Called by print.gdpar_dependence_robust.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| rd | list or NULL | The refit_diagnostics component of a gdpar_dependence_robust object. |
| digits | integer scalar (default 3) | Number of significant digits for formatting. |
Returns invisible(NULL) in all cases.
Notes
- Returns early (silently) if rd is NULL.
- Also returns early if `rd$max_rhat` is NULL or non-finite (!is.finite(mr)).
- Uses %||% to default missing components to NA_real_ or 0L or 1.05 as appropriate.
- Prints a single formatted line showing: max R-hat, min ESS (bulk), number of divergent refits, number of refits with R-hat above a threshold (default 1.05).
Purpose Internal function returning the variance-optimal default number of grid cells per axis, $g$, for the spatial block bootstrap.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| n | integer | Number of spatial observations. |
Mathematics
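Implements the $n^{1/4}$ rule
$$ g = \max\bigl(2,\ \lfloor n^{1/4} \rceil\bigr), $$
yielding about $g^2 \approx \sqrt{n}$ grid cells in total.
Returns Integer scalar $g \ge 2$.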
Notes
- Uses round(n^(1/4)) then coerces to integer, with a floor of 2.
- The documentation references decision D100 as the registered dissent.
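The rule stated in the notes (round, coerce to integer, never below 2) can be written in one line. The name `default_g()` is a hypothetical stand-in for the internal helper.

```r
# Nearest integer to n^(1/4), with a lower bound of 2 cells per axis.
default_g <- function(n) max(2L, as.integer(round(n^(1 / 4))))

vapply(c(10, 100, 1000, 10000), default_g, integer(1))
# 2 3 6 10
```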
Purpose Internal function that validates and coerces a coordinate matrix into a numeric $n \times 2$ matrix.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| coords | data.frame, matrix, or other | Coordinate input to validate. |
| n | integer | Expected number of rows (observations). |
| arg | character (default "coords") | Name of the argument for error messages. |
Returns A numeric matrix with exactly 2 columns and n rows, with no non-finite values.
Notes
- If coords is a data.frame, it is coerced via as.matrix().
- Raises gdpar_abort (class "gdpar_input_error") if:
  - coords is not a numeric matrix after coercion.
  - coords does not have exactly 2 columns.
  - nrow(coords) != n.
  - coords contains any non-finite values (NA, NaN, Inf, -Inf).
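A sketch of the validation sequence in the notes above, written with base R only. `validate_coords()` is a hypothetical name, and because gdpar_abort() is internal to the package, the sketch signals a plain condition carrying the class "gdpar_input_error" instead; the error messages are invented for the example.

```r
validate_coords <- function(coords, n, arg = "coords") {
  # Stand-in for gdpar_abort(): signal an error condition with the documented class.
  fail <- function(msg) {
    stop(structure(
      list(message = paste0("`", arg, "` ", msg), call = NULL),
      class = c("gdpar_input_error", "error", "condition")
    ))
  }
  if (is.data.frame(coords)) coords <- as.matrix(coords)
  if (!is.matrix(coords) || !is.numeric(coords)) fail("must be a numeric matrix.")
  if (ncol(coords) != 2L) fail("must have exactly 2 columns.")
  if (nrow(coords) != n) fail(sprintf("must have %d rows.", as.integer(n)))
  if (!all(is.finite(coords))) fail("must contain only finite values.")
  coords
}

xy <- validate_coords(data.frame(x = c(0, 1, 2), y = c(0, 1, 4)), n = 3L)
dim(xy)
```

A caller can then handle the failure by class, for example with `tryCatch(..., gdpar_input_error = function(e) ...)`.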
Purpose Internal function constructing a binary $k$-nearest-neighbour adjacency matrix from spatial coordinates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| coords | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated). |
| k | positive integer | Number of nearest neighbours. |
Mathematics
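Computes the $n \times n$ Euclidean distance matrix
$$ D_{ij} = \lVert \mathbf{c}_i - \mathbf{c}_j \rVert_2 $$
via stats::dist. For each observation $i$, the $k$ nearest other observations are found with order, and the corresponding entries of the adjacency matrix are set to one: $W_{ij} = 1$ if $j$ is among the $k$ nearest neighbours of $i$, and $W_{ij} = 0$ otherwise.
The resulting $W$ is in general not symmetric, because $j$ can be among the nearest neighbours of $i$ without the converse holding.
Returns An $n \times n$ binary (0/1) matrix with $k$ ones in each row.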
Notes
- All $n$ rows of $W$ are initialized to zero, then row-by-row the $k$ nearest neighbours are set to 1.
- Because order breaks ties by position, duplicate coordinates are handled deterministically.
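The construction can be sketched as follows. `knn_adjacency()` is a hypothetical name, and excluding each point from its own neighbour set by putting `Inf` on the diagonal is an assumption of this sketch rather than something the text above states.

```r
knn_adjacency <- function(coords, k) {
  D <- as.matrix(stats::dist(coords))   # n x n Euclidean distances
  diag(D) <- Inf                        # assumption: a point is not its own neighbour
  n <- nrow(D)
  W <- matrix(0, n, n)                  # start from all zeros
  for (i in seq_len(n)) {
    nn <- order(D[i, ])[seq_len(k)]     # ties broken by position
    W[i, nn] <- 1
  }
  W
}

# Four points on a line: the last one is far from the rest.
coords <- cbind(c(0, 1, 2, 10), c(0, 0, 0, 0))
W <- knn_adjacency(coords, k = 1L)
W
```

In this example point 4 has point 3 as its nearest neighbour, while point 3's nearest neighbour is point 2, so the matrix is not symmetric.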
Purpose Internal function constructing a binary distance-band adjacency matrix. The threshold is data-driven: the smallest distance that leaves no observation isolated.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| coords | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated). |
Mathematics
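- Compute the $n \times n$ Euclidean distance matrix $D$.
- Set the diagonal $D_{ii} = \infty$.
- The bandwidth threshold is
  $$ d^{*} = \max_i \min_{j \ne i} D_{ij}, $$
  i.e., the maximum over all points of their nearest-neighbour distance. This ensures every point has at least one neighbour.
- The adjacency matrix is
  $$ W_{ij} = \mathbb{1}\bigl[D_{ij} \le d^{*}\bigr] \ \text{for } i \ne j, \qquad W_{ii} = 0. $$

Returns An $n \times n$ binary, symmetric matrix with zero diagonal.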
Notes
- Described as a "declared data-driven heuristic."
- The resulting $W$ is symmetric because Euclidean distance is symmetric.
- The diagonal is explicitly zeroed after the threshold comparison.
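The steps above translate almost line by line into base R. `distance_band_adjacency()` is a hypothetical name for illustration, and the inclusive comparison `<=` is an assumption consistent with the threshold being the smallest distance that isolates no point.

```r
distance_band_adjacency <- function(coords) {
  D <- as.matrix(stats::dist(coords))
  diag(D) <- Inf                         # ignore self-distances
  d_star <- max(apply(D, 1L, min))       # largest nearest-neighbour distance
  W <- (D <= d_star) * 1                 # 0/1 matrix from the threshold comparison
  diag(W) <- 0                           # zero the diagonal explicitly
  W
}

coords <- cbind(c(0, 1, 2, 10), c(0, 0, 0, 0))
W <- distance_band_adjacency(coords)
rowSums(W)  # every location has at least one neighbour
```

Here the isolated fourth point sets the threshold at 8, which also links points 1 and 3; a single outlying location can therefore make the band wide for everyone.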
Purpose Internal function computing Moran's $I$ statistic of spatial autocorrelation for a residual vector and a spatial weights matrix.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| resid | numeric vector of length $n$ | Residuals (row-aligned with the weights matrix). |
| W | numeric $n \times n$ matrix | Spatial weights (binary adjacency or otherwise; need not be symmetric). |
| S0 | numeric scalar (default sum(W)) | The sum of all weights, $S_0 = \sum_{i,j} w_{ij}$. |
Mathematics
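Let $z_i = e_i - \bar e$ denote the centred residuals. Then
$$ I = \frac{n}{S_0}\,\frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^2}. $$
In vector notation, with $\mathbf{z} = (z_1, \ldots, z_n)^\top$:
$$ I = \frac{n}{S_0}\,\frac{\mathbf{z}^\top W \mathbf{z}}{\mathbf{z}^\top \mathbf{z}}. $$
The implementation computes $W\mathbf{z}$ (W %*% z), then takes the elementwise product $\mathbf{z} \odot (W\mathbf{z})$ and sums it to obtain the numerator.
Returns A numeric scalar: the Moran's $I$ statistic.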
Notes
- Under the null hypothesis of no spatial autocorrelation and row-standardised weights, $E[I] \approx -1/(n-1)$. Values well above this expectation (towards 1) indicate positive spatial autocorrelation; values near $-1/(n-1)$ are consistent with no autocorrelation; values well below it indicate negative autocorrelation.
- The formula as implemented handles asymmetric $W$ correctly because $\sum_{ij} w_{ij} z_i z_j = \mathbf{z}^\top W \mathbf{z}$ does not require symmetry.
- No $p$-value or reference distribution is computed here; this is a pure computational helper.
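A compact sketch of the computation, using the vectorised numerator described above. `morans_i()` is a hypothetical name; the four-point chain is a made-up example small enough to check by hand.

```r
morans_i <- function(resid, W, S0 = sum(W)) {
  z <- resid - mean(resid)              # centre the residuals
  n <- length(z)
  Wz <- as.vector(W %*% z)              # W z
  (n / S0) * sum(z * Wz) / sum(z^2)     # (n / S0) * z'Wz / z'z
}

# Chain of four locations, each linked to its immediate neighbours.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
morans_i(c(1, 2, 3, 4), W)
```

For this example the centred residuals are (-1.5, -0.5, 0.5, 1.5), the numerator sum is 2.5, the denominator is 5 and $S_0 = 6$, so $I = (4/6)(2.5/5) = 1/3$.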
Purpose Internal function generating a length-$n$ vector of row indices for one spatial block-bootstrap resample; the spatial counterpart of .gdpar_block_bootstrap_data_indices() for 2-D coordinates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| coords | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated and row-aligned). |
| g | positive integer | Number of grid cells per axis. |
| scheme | character, "tiled" or "moving" | Spatial bootstrap scheme. |
| random_origin | logical | If TRUE, the grid origin is randomized per replicate (Politis–Romano–Lahiri randomized partition). |
| mins | numeric vector of length 2 | Coordinate minima per axis (bounding-box lower corner). |
| ranges | numeric vector of length 2 | Coordinate range per axis (bounding-box extent). |
Mathematics
Cell side lengths:
$$ \mathbf{L} = (L_x, L_y) = \mathbf{ranges} / g. $$
Tiled scheme:
- Set the origin $\mathbf{o}$. If random_origin = TRUE, draw $\mathbf{u} \sim U(0,1)^2$ and set $\mathbf{o} = \mathbf{mins} - \mathbf{u} \odot \mathbf{L}$; otherwise $\mathbf{o} = \mathbf{mins}$.
- Assign each observation $i$ to a cell:
  $$ c_{x,i} = \left\lfloor \frac{x_i - o_x}{L_x} \right\rfloor, \qquad c_{y,i} = \left\lfloor \frac{y_i - o_y}{L_y} \right\rfloor. $$
- Group observations by cell label $(c_{x,i}, c_{y,i})$.
- Sample cells with replacement (uniform) and concatenate their member indices until $\geq n$ indices accumulate. Truncate to exactly $n$.

Moving scheme:
- Repeatedly draw a random seed point $\mathbf{s}$ from the data.
- Draw $\mathbf{u} \sim U(0,1)^2$ and set the block origin $\mathbf{o} = \mathbf{s} - \mathbf{u} \odot \mathbf{L}$.
- Collect all observations within the axis-aligned rectangle $[\mathbf{o},\, \mathbf{o} + \mathbf{L})$.
- Append to the output until $\geq n$ indices accumulate. Truncate to exactly $n$.

Returns An integer vector of length $n$ of row indices into coords.
Notes
- In the tiled scheme, only non-empty cells can be sampled: empty cells are implicitly excluded because split only creates groups for observed cell labels.
- In the moving scheme, every block is guaranteed non-empty because the block is anchored at a randomly sampled observation and is sized to cover at least that point (assuming the observation falls inside the bounding box, which it does by construction).
- The random_origin mechanism implements the Politis–Romano–Lahiri randomized partition to break grid-alignment artifacts.
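A minimal sketch of the tiled scheme follows. It is an illustration under assumptions, not the internal function: `tiled_block_indices()` is a hypothetical name, the bounding box is computed inside the function rather than passed in as mins and ranges, the floor-based cell assignment is the usual grid rule, and both coordinate ranges are assumed to be positive.

```r
tiled_block_indices <- function(coords, g, random_origin = TRUE) {
  n <- nrow(coords)
  mins <- apply(coords, 2L, min)
  ranges <- apply(coords, 2L, max) - mins
  side <- ranges / g                               # cell side lengths per axis
  origin <- if (random_origin) mins - stats::runif(2L) * side else mins
  cx <- floor((coords[, 1L] - origin[1L]) / side[1L])
  cy <- floor((coords[, 2L] - origin[2L]) / side[2L])
  # split() only creates groups for labels that occur, so empty cells never appear.
  cells <- split(seq_len(n), paste(cx, cy, sep = "_"))
  out <- integer(0)
  while (length(out) < n) {
    pick <- sample.int(length(cells), 1L)          # uniform over non-empty cells
    out <- c(out, cells[[pick]])
  }
  out[seq_len(n)]                                  # truncate to exactly n
}

set.seed(1L)
coords <- cbind(runif(50), runif(50))
idx <- tiled_block_indices(coords, g = 3L)
length(idx)
```

Whole cells are copied into the resample, so observations that are close in space tend to stay together, which is the point of blocking.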
.gdpar_spatial_block_length_auto(coords, resid, scheme, random_origin, mins, ranges, B0 = 200L, var_const = 1, seed = NULL)
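Purpose Internal workhorse that data-selects the spatial block size $g$ (grid cells per axis) by minimising an estimated MSE over a candidate grid, falling back to the default from .gdpar_spatial_default_g.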
Arguments
| Argument | Type | Meaning |
|---|---|---|
| coords | numeric $n \times 2$ matrix | Spatial coordinates (columns = $x$, $y$), row-aligned with resid. |
| resid | numeric vector of length $n$ | Model residuals (centred internally: $z = \text{resid} - \overline{\text{resid}}$). |
| scheme | character string | Block-tile scheme identifier forwarded to .gdpar_spatial_block_indices (controls how spatial blocks are laid out relative to random_origin, mins, ranges). |
| random_origin | logical scalar | Whether to randomise the grid origin in each bootstrap replicate (forwarded to .gdpar_spatial_block_indices). |
| mins | numeric vector of length 2 | Minimum coordinate values per axis. |
| ranges | numeric vector of length 2 | Coordinate ranges per axis. |
| B0 | integer (default 200L) | Number of Monte Carlo block-bootstrap replicates per candidate $g$. |
| var_const | numeric scalar (default 1) | Multiplicative constant $c$ on the variance term of the MSE criterion. |
| seed | integer or NULL | Optional seed set via set.seed() before the bootstrap loop for reproducibility. |
Mathematics
The procedure operates as follows.
- Default fallback. Compute $g_{\text{def}} = \max\bigl(2,\ \lfloor n^{1/4} \rceil\bigr)$ via .gdpar_spatial_default_g(n). Early returns use $g_{\text{def}}$ when:
  - $n < 25$,
  - coordinate spread is degenerate ($\text{sd}(x) \le 0$ or $\text{sd}(y) \le 0$),
  - fewer than 3 valid grid points exist,
  - bootstrap variances are non-finite or all zero,
  - the MSE criterion is non-finite, or
  - the MSE minimum falls on the largest-$g$ (smallest-block) boundary.
- Design matrix for a spatial-mean surrogate. Construct an $n \times 3$ matrix
  $$ \mathbf{D}_{\text{surr}} = \bigl[\,\mathbf{1},\ (x - \bar x)/s_x,\ (y - \bar y)/s_y\,\bigr] $$
  where $s_x, s_y$ are the coordinate standard deviations.
- Candidate grid. Define bounds
  $$ g_{\text{lo}} = \max\bigl(2,\ \lfloor 0.5\, g_{\text{def}} \rfloor\bigr), \qquad g_{\text{hi}} = \min\bigl(\lfloor 3\, g_{\text{def}} \rfloor,\ \lfloor \sqrt{n/3} \rfloor\bigr). $$
  Generate 6 points on a log-spaced grid in $[g_{\text{lo}},\, g_{\text{hi}}]$, round to unique integers, and retain only $g \ge 2$ with average cell occupancy $n/g^2 \ge 3$.
- Bootstrap variance per $g$. For each candidate $g$ and each replicate $b = 1,\dots,B_0$:
  - Draw a spatial block index vector $\mathcal{I}_b$ from .gdpar_spatial_block_indices.
  - Compute a 3-vector of block-level spatial-mean statistics:
    $$ \mathbf{T}_b = \frac{1}{n}\,\mathbf{D}_{\text{surr}}[\mathcal{I}_b,\,]^\top z[\mathcal{I}_b]. $$
  - Aggregate across coordinates:
    $$ V_g = \sum_{j=1}^{3} \mathrm{Var}_{b}(T_{b,j}). $$
  - Also count the number of unique occupied tiles $n_{\text{tiles}}(g)$.
- MSE criterion. Smooth $V_g$ with a running median ($k=3$): $\tilde V_g = \mathrm{runmed}(V_g,\, 3)$. Anchor at $g_{\min}$ (the smallest candidate, i.e. the largest blocks). Then:
  $$ \text{bias}^2(g) = \bigl(\tilde V_g - \tilde V_{g_{\min}}\bigr)^2, \qquad \text{var}(g) = c\,\frac{\tilde V_g^{\,2}}{n_{\text{tiles}}(g)}, $$
  $$ \text{MSE}(g) = \text{bias}^2(g) + \text{var}(g). $$
  The variance term reflects the inverse-number-of-blocks scaling (Lahiri 2003), not the Monte Carlo noise from finite $B_0$.
- Selection. $g^* = \arg\min_g \text{MSE}(g)$. If $g^*$ equals the last (smallest-block) grid element, the procedure bails out to the $n^{1/4}$ default (anticonservative boundary).
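The MSE criterion and the selection rule can be sketched in isolation, given the per-candidate bootstrap variances. `select_g()` and its inputs are hypothetical: in the real function `V` and `n_tiles` come out of the bootstrap loop, and the fallback returns the rate default rather than `NA`.

```r
select_g <- function(g_grid, V, n_tiles, var_const = 1) {
  V_s <- as.numeric(stats::runmed(V, k = 3L))   # running-median smoothing
  bias2 <- (V_s - V_s[1L])^2                    # anchor at the smallest g (largest blocks)
  mse <- bias2 + var_const * V_s^2 / n_tiles    # bias^2 + c * V^2 / n_tiles
  j <- which.min(mse)
  # Minimum on the smallest-block boundary: signal a fallback (NA in this sketch).
  if (!is.finite(mse[j]) || j == length(g_grid)) return(NA_integer_)
  as.integer(g_grid[j])
}

g_grid  <- c(2, 3, 4, 6, 8)
V       <- c(4, 4, 4, 1, 1)       # made-up bootstrap variances per candidate
n_tiles <- g_grid^2               # made-up occupied-tile counts
select_g(g_grid, V, n_tiles)
```

With these made-up inputs the criterion values are roughly 4, 1.78, 1, 9.03 and 9.02, so the third candidate ($g = 4$) is selected: larger $g$ lowers the variance term until the drop in $\tilde V_g$ shows up as bias.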
Returns A named list with three elements:
| Element | Type | Meaning |
|---|---|---|
| block_size | integer | The chosen $g$ (grid cells per axis). |
| method | character | "auto" if the data-driven selection succeeded; "rate" on any fallback. |
| reason | character | Human-readable explanation: on success a formatted string with the grid and the selected value; on fallback, the reason for the fallback. |
Notes
- Fallback cascade. There are six distinct early-return paths, all returning method = "rate" via the inner fb() helper, each with a different reason string. The function never errors; it always returns a valid list.
- Bootstrap machinery. All block-index generation delegates to .gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges) which implements the spatial tiling and optional random-origin jitter.
- Side effects. Calls set.seed() when seed is non-NULL. No other side effects.
- No S3 dispatch. This is an internal utility, not an S3 generic or method.
- Cell-occupancy bound. The upper cap on $g$ enforces $n/g^2 \ge 3$ (at least ~3 observations per cell on average), a validity constraint for within-cell resampling. This is deliberately looser than the $n^{1/3}$ rate sometimes seen in the spatial bootstrap literature.
- Running median smoothing. stats::runmed(..., k = 3L) computes a centred running median and, with its default endrule = "median", treats the two end values with Tukey's end-point rule, so the first and last values may also be changed by the smoothing.
gdpar_spatial_dependence_diagnostic(object, coords, W = NULL, weights = c("knn", "distance"), k = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), test = c("permutation", "analytic"), n_perm = 999L, level = 0.95, randomize_seed = NULL, seed = NULL, ...)
Purpose Exported diagnostic that quantifies spatial autocorrelation in the residuals of a scalar ($K = 1$, $p = 1$) fit via Moran's $I$. It is the spatial counterpart of gdpar_dependence_diagnostic and the recommended gate before calling gdpar_spatial_dependence_robust.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| object | gdpar_eb_fit or gdpar_fit | A scalar Path 1 fit ($K = 1$, $p = 1$), enforced by .gdpar_assert_scalar_dep. |
| coords | numeric $n \times 2$ matrix or data.frame | Spatial coordinates, row-aligned with training data. Validated by .gdpar_validate_coords. |
| W | numeric $n \times n$ matrix or NULL | User-supplied spatial weight matrix. Overrides weights/k. Diagonal is zeroed internally. Row-standardized before use. |
| weights | character, one of "knn" (default) or "distance" | Neighbourhood construction method when W is NULL. "knn" = $k$-nearest-neighbour adjacency; "distance" = distance-band whose threshold is the smallest that isolates no location. Both produce row-standardized weights. Ignored when W is supplied. |
| k | integer or NULL | Number of neighbours for weights = "knn". When NULL, a default heuristic is used. |
| residual_type | character, one of "quantile" (default), "response", "pearson", "deviance" | Type of residual extracted from object via .gdpar_dependence_residuals. "quantile" = randomized quantile (Dunn–Smyth) residuals. |
| test | character, one of "permutation" (default) or "analytic" | Hypothesis test for Moran's $I$. "permutation" = location-relabelling permutation test (two-sided via $\lvert I - E[I] \rvert$); "analytic" = Cliff–Ord normal approximation. |
| n_perm | integer (default 999L) | Number of permutations for test = "permutation". Capped below the number of distinct permutations ($n!$) for tiny $n$. |
| level | numeric in $(0, 1)$ (default 0.95) | Confidence level used to convert the $p$-value into the verdict. |
| randomize_seed | integer or NULL | Seed for randomized quantile residuals of discrete families; ignored otherwise. |
| seed | integer or NULL | Seed for the permutation test (reproducibility). |
| ... | any | Unused; present for signature stability. |
Mathematics
The function implements the following:
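Moran's $I$. With centred residuals $z_i$ and row-standardized weights $w_{ij}$ (total weight $S_0 = \sum_{i,j} w_{ij}$):
$$ I = \frac{n}{S_0}\,\frac{\sum_{i,j} w_{ij}\, z_i z_j}{\sum_i z_i^2}, \qquad E[I] = -\frac{1}{n-1}. $$
Permutation test. For each of n_perm permutations $\pi$ of the location labels:
- Compute $I_\pi$ from the residuals $z_{\pi(i)}$.

The two-sided $p$-value is then
$$ p = \frac{1 + \#\{\pi : |I_\pi - E[I]| \ge |I_{\text{obs}} - E[I]|\}}{\text{n\_perm} + 1}. $$
Analytic (Cliff–Ord) normal approximation. The observed statistic is standardized as $Z = (I_{\text{obs}} - E[I]) / \sqrt{\mathrm{Var}(I)}$, where $\mathrm{Var}(I)$ is the analytic variance reported as var_i and $Z$ is reported as z; the two-sided $p$-value is $2\,\bigl(1 - \Phi(|Z|)\bigr)$.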
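The permutation test described above can be sketched as follows. `moran_perm_test()` is a hypothetical name, the chain-shaped weight matrix and the trending residuals are made up for the example, and the sketch assumes every location has at least one neighbour so that row-standardization is well defined.

```r
moran_perm_test <- function(resid, W, n_perm = 999L, seed = NULL) {
  if (!is.null(seed)) set.seed(as.integer(seed))
  W <- W / rowSums(W)                       # row-standardize (assumes no zero rows)
  n <- length(resid)
  S0 <- sum(W)
  stat <- function(r) {
    z <- r - mean(r)
    (n / S0) * sum(z * as.vector(W %*% z)) / sum(z^2)
  }
  i_obs <- stat(resid)
  e_i <- -1 / (n - 1)
  i_perm <- replicate(n_perm, stat(sample(resid)))   # relabel locations
  p <- (1 + sum(abs(i_perm - e_i) >= abs(i_obs - e_i))) / (n_perm + 1)
  list(morans_i = i_obs, expected_i = e_i, p_value = p, n_perm = n_perm)
}

# Thirty locations on a line with a smooth trend in the residuals.
n <- 30L
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W[cbind(2:n, 1:(n - 1))] <- 1
res <- moran_perm_test(seq_len(n) / n, W, n_perm = 199L, seed = 42L)
res$morans_i
```

The `1 +` in numerator and denominator counts the observed arrangement as one of the permutations, so the $p$-value can never be exactly zero.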
Returns A list of class c("gdpar_spatial_dependence_diagnostic", "list") with components:
| Component | Type | Meaning |
|---|---|---|
| residual_type | character | The residual type used. |
| n | integer | Number of observations. |
| weights | character | "user" if W was supplied, otherwise the matched weights argument. |
| k | integer | Number of neighbours used; NA_integer_ if not applicable. |
| style | character | Always "W" (row-standardized). |
| n_zero_weight | integer | Number of locations with zero row sum in the raw weight matrix. |
| morans_i | numeric | Observed Moran's $I$; NA if any location has zero weight. |
| expected_i | numeric | Expected value $E[I]$ under the null; NA if undefined. |
| var_i | numeric | Analytic variance of $I$; NA when test = "permutation" or the analytic variance is non-positive. |
| z | numeric | Standardized statistic of the analytic test; NA when not computed. |
| p_value | numeric | Two-sided $p$-value; NA if the test is undefined. |
| test | character | "permutation" or "analytic". |
| n_perm | integer | Effective number of permutations (may be less than requested for tiny $n$); NA for the analytic test. |
| level | numeric | Confidence level used. |
| verdict | character | Human-readable summary string. Three forms: (1) "...Undefined..." if zero-weight locations exist, (2) "Spatial dependence detected..." if the $p$-value falls below $1 - \text{level}$, (3) "No evidence against spatial independence..." otherwise. Also handles the case where the $p$-value is NA. |
Notes
- S3 dispatch. This is an exported function (not a method). A `print` method for the returned object is defined immediately below.
- Guards and assertions.
  - `.gdpar_assert_scalar_dep(object, "object")` enforces that the fit is scalar ($K = 1$, $p = 1$).
  - `stats::var(resid) <= 0` triggers an abort via `gdpar_abort` with class `"gdpar_diagnostic_error"`.
  - If `W` is supplied, it must be a numeric $n \times n$ matrix with all finite values; violations abort with class `"gdpar_input_error"`.
  - `k` must satisfy $1 \le k \le n - 1$; otherwise the function aborts with class `"gdpar_input_error"`.
- Zero-weight locations. If any location has zero row sum after weight construction, a `warning` is emitted, `morans_i` is set to `NA`, and the verdict reports `"Undefined"`. With kNN and $k \ge 1$ this should never occur (every point gets at least one neighbour).
- Small-sample warnings. For `test = "permutation"`:
  - $n < 20$: hard warning ("very small… treat the p-value as indicative only").
  - $n < 50$: soft warning ("small… approximate").
- Permutation cap. `max_distinct` is `factorial(n)` for $n \le 10$ and `Inf` otherwise. `n_perm_eff = min(n_perm, max(1, max_distinct - 1))`.
- Analytic test asymmetry warning. If the row-standardized weight matrix `Wn` is not symmetric (which is typical for kNN and distance-band weights), a warning recommends `test = "permutation"`.
- Side effects. Calls `set.seed()` when `seed` is non-`NULL` (inside the permutation loop). Calls `require_suggested("posterior", ...)` to ensure the `posterior` package is available for extracting posterior draws.
- Coordinate handling. Coordinates are validated by `.gdpar_validate_coords`. They are treated as Euclidean; lon/lat data must be projected before calling this function.
- Residual extraction. Delegates to `.gdpar_dependence_residuals(object, residual_type, randomize_seed)`.
- Weight construction. kNN via `.gdpar_knn_adjacency(coords, k)`; distance-band via `.gdpar_distance_band_adjacency(coords)`. Both return raw (unstandardized) adjacency matrices.
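The permutation cap described in the notes can be written as a small stand-alone function. This is a sketch that only restates the formula given above; `n_perm_effective` is a hypothetical name, not part of the package.

```r
# Effective number of permutations: for tiny n there are only n! distinct
# relabellings, one of which is the observed arrangement.
n_perm_effective <- function(n, n_perm = 999L) {
  max_distinct <- if (n <= 10) factorial(n) else Inf
  min(n_perm, max(1, max_distinct - 1))
}

n_perm_effective(4)    # 4! - 1 = 23 distinct non-identity relabellings
n_perm_effective(30)   # cap is inactive, the requested 999 is kept
```

The `max(1, ...)` guard keeps at least one permutation even in the degenerate case $n = 1$.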
Purpose
S3 `print` method for objects of class `gdpar_spatial_dependence_diagnostic`. Provides a human-readable summary of the spatial dependence diagnostic on the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_spatial_dependence_diagnostic` | The diagnostic object to print. |
| `digits` | integer (default `3L`) | Number of significant digits for printed statistics. |
| `...` | n/a | Unused; present for S3 generic compatibility. |
Mathematics
None.
Returns
Invisibly returns `x` (the input object unchanged), following standard R `print` method conventions.
Notes
- Export. Declared with `@export` in the roxygen header, so it is exported and registered as an S3 method for the `print` generic on class `gdpar_spatial_dependence_diagnostic`.
- Body not shown. The source code for the function body is not included in this section (the section ends at the roxygen closing). Only the roxygen documentation is available; the exact formatting of the printed output cannot be described from this section alone.
Purpose
S3 `print` method for objects of class `gdpar_spatial_dependence_diagnostic`. Produces a formatted console summary of the spatial dependence diagnostic (Moran's $I$ test on model residuals, using a k-nearest-neighbour, distance-band, or user-supplied spatial weight matrix).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_spatial_dependence_diagnostic` list | The diagnostic object to print. Expected components: `residual_type` (character), `n` (integer), `weights` (character, one of `"knn"`, `"distance"`, or other for user-supplied), `k` (integer, number of neighbours when `weights == "knn"`), `morans_i` (numeric or `NA`), `expected_i` (numeric), `test` (character, `"analytic"` or `"permutation"`), `z` (numeric, analytic z-score), `p_value` (numeric), `n_perm` (integer, number of permutation draws), `verdict` (character). |
| `digits` | integer scalar (default `3L`) | Number of significant digits used by `format()` when printing numeric statistics. |
| `...` | n/a | Absorbed for S3 method compatibility; unused. |
Returns
`x`, invisibly, so the method can be used inside scripts and pipelines without re-printing.
Notes
- When `x$morans_i` is `NA`, the method prints `"Moran's I : undefined"` and skips the expected value, test statistic, and p-value entirely (regardless of `x$test`).
- When `x$weights` is `"knn"`, the printed label includes the value of `x$k` via `sprintf`. When it is `"distance"`, a fixed label is printed. Any other value falls through to `"user-supplied, row-standardized"`.
- Output is directed to the console via `cat()` with `sep = ""`.
- No side effects beyond printing; no error handling.
gdpar_spatial_dependence_robust(object, data, coords, block_size = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, scheme = c("tiled", "moving"), random_origin = TRUE, B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose
Re-estimates the uncertainty (standard errors and percentile confidence intervals) of every AMM coefficient from a scalar Path 1 fit so that inference is robust to unmodelled spatial dependence in the data. The point estimates themselves are unchanged; only the reported uncertainty is adjusted. This implements the working-independence + robust-variance stance of Liang & Zeger (1986) via a spatial block bootstrap: the model is refitted on `B` spatial block-bootstrap resamples over the observed `coords`, and the bootstrap standard deviation and percentile intervals of each coefficient replace (or supplement) the model-based (Laplace / posterior) standard errors. The function is the spatial counterpart of `gdpar_dependence_robust` (temporal); both share one internal refit engine (`.gdpar_dependence_robust_engine`).
Arguments
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | (required) | A scalar Path 1 fit ($K = 1$, $p = 1$): either an Empirical-Bayes fit from `gdpar_eb()` or a full-Bayes fit from `gdpar()`. Validated by `.gdpar_assert_scalar_dep`. |
| `data` | data frame | (required) | The original data frame passed to the fitting function. Must be row-aligned with `coords`. It is resampled by spatial blocks and the model is refit on each resample. Validated by `assert_data_frame`. |
| `coords` | numeric matrix or data frame ($n \times 2$) | (required) | Spatial coordinates, row-aligned with `data`. Validated and coerced by `.gdpar_validate_coords`. |
| `block_size` | `NULL`, positive integer, or `"auto"` | `NULL` | Number of grid cells per axis, $g$. `NULL`: variance-optimal rate. Positive integer: fixed value. `"auto"`: data-driven calibration over a grid of candidate block sizes. |
| `residual_type` | character (one of `"quantile"`, `"response"`, `"pearson"`, `"deviance"`) | `"quantile"` | Type of residuals, used only when `block_size = "auto"` to feed the data-driven block-size selector. `"quantile"` gives Dunn-Smyth randomized quantile residuals. Ignored when `block_size` is `NULL` or a fixed integer. |
| `randomize_seed` | `NULL` or integer | `NULL` | Optional seed for reproducibility of the randomized quantile residuals of discrete families. Used only by the `"auto"` block-size selector; ignored otherwise. |
| `scheme` | character (one of `"tiled"`, `"moving"`) | `"tiled"` | Resampling scheme. `"tiled"`: non-overlapping rectangular cells. `"moving"`: overlapping square blocks anchored on sampled observation points. |
| `random_origin` | logical scalar | `TRUE` | When `TRUE` and `scheme = "tiled"`, the grid origin is randomly shifted per bootstrap replicate (Politis-Romano-Lahiri circular block idea adapted to 2-D), breaking deterministic boundary artifacts at the cost of one extra random draw per refit. |
| `B` | integer | `199L` | Number of bootstrap refits. Validated by `assert_count`. |
| `level` | numeric in $(0, 1)$ | `0.95` | Level for the percentile confidence intervals. Validated by `assert_numeric_scalar`. |
| `seed` | `NULL` or integer | `NULL` | Optional seed controlling the block resampling and per-refit Stan seeds for reproducibility. Passed through to `.gdpar_dependence_robust_engine`. |
| `iter_warmup` | integer | `500L` | Warmup iterations per refit's conditional HMC. |
| `iter_sampling` | integer | `500L` | Sampling iterations per refit's conditional HMC. |
| `chains` | integer | `2L` | Number of chains per refit. |
| `verbose` | logical scalar | `TRUE` | When `TRUE`, prints an opt-in cost message describing the number of refits, grid dimensions, scheme, and whether the full-Bayes path is in use. |
| `...` | n/a | n/a | Additional arguments absorbed for forward compatibility (passed to `.gdpar_dependence_robust_engine`). |
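Mathematics
Default block-size rate (decision D100). With `block_size = NULL`, the number of cells per axis $g$ follows the variance-optimal rate, i.e. the rate at which the error of the block-bootstrap variance estimate is minimised; the `print` method labels this rate `n^(1/4)`.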
Resampling.
- Tiled scheme: non-empty cells are sampled with replacement; the resample is truncated to $n$ observations (introducing a negative bias $O(1/n)$, negligible). When `random_origin = TRUE`, the grid origin is shifted by a random sub-cell offset per replicate.
- Moving scheme: overlapping square blocks of side $g$ cells are anchored on sampled observation locations, guaranteeing non-empty blocks.
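Data-driven block size (`block_size = "auto"`). The selector is calibrated on quantities built from the `residual_type` residuals (the influence directions of the coefficient).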
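The tiled scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the package's `.gdpar_spatial_block_indices`: `tiled_block_indices` is a hypothetical name, the grid is a regular $g \times g$ partition of the bounding box, and the random origin is implemented as a uniform sub-cell shift.

```r
# Draw one tiled block-bootstrap resample: returns n row indices.
tiled_block_indices <- function(coords, g, random_origin = TRUE) {
  coords <- as.matrix(coords)
  n <- nrow(coords)
  mins <- apply(coords, 2, min)
  ranges <- apply(coords, 2, max) - mins
  width <- ranges / g                       # cell width per axis
  # Optional random sub-cell shift of the grid origin (assumed form).
  offset <- if (random_origin) stats::runif(2) * width else c(0, 0)
  ix <- floor((coords[, 1] - mins[1] + offset[1]) / width[1])
  iy <- floor((coords[, 2] - mins[2] + offset[2]) / width[2])
  # Only non-empty cells appear as groups after split().
  members <- split(seq_len(n), paste(ix, iy, sep = "_"))
  idx <- integer(0)
  while (length(idx) < n) {
    pick <- sample.int(length(members), 1L) # sample cells with replacement
    idx <- c(idx, members[[pick]])
  }
  idx[seq_len(n)]                           # truncate to n observations
}

set.seed(42)
coords <- cbind(stats::runif(50), stats::runif(50))
idx <- tiled_block_indices(coords, g = 3)
```

Each element of `idx` is a row of the original data; refitting the model on `data[idx, ]` would give one bootstrap replicate.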
Returns
A list of class `c("gdpar_spatial_dependence_robust", "list")` with the following components:
| Component | Type | Description |
|---|---|---|
| `table` | data frame | One row per AMM coefficient. Columns: `estimate` (original point estimate, unchanged), `model_se` (model-based SE), `robust_se` (bootstrap SD), `se_ratio` (`robust_se / model_se`), `ci_lower` and `ci_upper` (percentile interval at `level`). |
| `block_size` | integer | The chosen number of grid cells per axis, $g$. |
| `block_size_method` | character | One of `"rate"` (variance-optimal default; also returned when `"auto"` falls back), `"fixed"` (user-supplied integer), or `"auto"` (data-driven calibration succeeded). |
| `scheme` | character | The resampling scheme used (`"tiled"` or `"moving"`). |
| `random_origin` | logical | Whether random grid-origin shifts were used (relevant only for `"tiled"`). |
| `n_tiles` | integer | Number of unique spatial cells at the chosen resolution. |
| `B` | integer | Requested number of bootstrap refits. |
| `B_ok` | integer | Number of refits that successfully converged (from `.gdpar_dependence_robust_engine`). |
| `level` | numeric | The percentile-interval level. |
| `seed` | integer or `NULL` | The seed actually used (may be supplied or internally generated by the engine). |
| `warnings` | character vector | Accumulated warning messages (single-cell warning if `n_tiles <= 1`, plus any from the refit engine). |
| `refit_diagnostics` | list | Aggregate per-refit convergence diagnostics, structured as in `gdpar_dependence_robust`. |

A `print` method is declared (signature `print.gdpar_spatial_dependence_robust(x, digits, ...)`; body in another section).
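The columns of `table` can be illustrated from a matrix of bootstrap coefficient replicates. The sketch below assumes a `B x p` matrix `boot` of refitted coefficients; `robust_table` is a hypothetical helper, and the refit engine itself is not reproduced here.

```r
# Assemble the coefficient table from bootstrap replicates (one column per coefficient).
robust_table <- function(estimate, model_se, boot, level = 0.95) {
  alpha <- 1 - level
  robust_se <- apply(boot, 2, stats::sd)    # bootstrap SD per coefficient
  data.frame(
    estimate  = estimate,                   # point estimates are left unchanged
    model_se  = model_se,
    robust_se = robust_se,
    se_ratio  = robust_se / model_se,
    ci_lower  = apply(boot, 2, stats::quantile, probs = alpha / 2, names = FALSE),
    ci_upper  = apply(boot, 2, stats::quantile, probs = 1 - alpha / 2, names = FALSE)
  )
}

set.seed(1)
boot <- matrix(stats::rnorm(199 * 2), nrow = 199, ncol = 2)  # simulated replicates
tab <- robust_table(estimate = c(0, 0), model_se = c(1, 1), boot = boot)
```

With `model_se` fixed at 1 in this toy input, `se_ratio` equals `robust_se`, which makes the ratio easy to read off.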
Notes
- Input validation. `object` is checked by `.gdpar_assert_scalar_dep` (must be a scalar Path 1 fit). `coords` is validated by `.gdpar_validate_coords` for dimension and type. Collinear coordinates (zero range on either axis) raise an error via `gdpar_abort` (class `gdpar_input_error`). `block_size` must be `NULL`, a positive integer, or the string `"auto"`; any other string triggers an error. `random_origin` and `verbose` must be logical scalars. `B`, `iter_warmup`, `iter_sampling`, `chains` are validated by `assert_count`. `level` is validated as a numeric scalar in $(0, 1)$.
- Single-cell warning. If all locations fall into a single spatial cell at the chosen resolution (`n_tiles <= 1`), a warning is emitted and the bootstrap SE will collapse toward zero. The warning message is stored in `warnings_pre` and appended to the returned `warnings` vector.
- Full-Bayes detection. The function detects whether `object` is a full-Bayes fit (`inherits(object, "gdpar_fit") && !inherits(object, "gdpar_eb_fit")`) to adjust the verbose cost message accordingly (full-Bayes refits use full HMC and are markedly more expensive).
- Suggested dependencies. Requires `cmdstanr` (for Stan refits) and `posterior` (for extracting posterior draws); both are loaded via `require_suggested`.
- Internal engine. The actual bootstrap loop is delegated to `.gdpar_dependence_robust_engine`, which receives a `resample_fun` closure that calls `.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)` to generate block indices for each replicate.
- Coordinate pre-processing. `mins` (per-axis minima) and `ranges` (per-axis ranges) are computed from `coords` and used throughout cell assignment and the resampling closure.
- `...` arguments. Forwarded to `.gdpar_dependence_robust_engine`; currently absorbed for compatibility with the temporal sibling `gdpar_dependence_robust`.
- No dependence modelling. The function does not model spatial dependence; it only makes inference robust to it. It is valid for weak / short-range spatial dependence relative to block size and does not rescue strong long-range dependence.
- Isotropic block. A single isotropic $g$ is used for both coordinate axes; strongly anisotropic residual dependence is a documented limitation.
Purpose
S3 print method for objects of class "gdpar_spatial_dependence_robust". Renders a human-readable summary of spatial-dependence-robust inference results produced by spatial block-bootstrap variance estimation. It displays the block-bootstrap configuration (grid size, scheme, number of non-empty tiles), the number of refits performed and succeeded, the confidence level, a formatted table of coefficient estimates with model-based and robust standard errors and their ratio, a brief interpretation of the se_ratio, refit diagnostics, and any stored warnings.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list (S3 class `"gdpar_spatial_dependence_robust"`) | The spatial-dependence-robust result object. Expected to contain the named elements `scheme`, `block_size_method`, `block_size`, `random_origin`, `n_tiles`, `B`, `B_ok`, `level`, `table`, `refit_diagnostics`, and `warnings`. |
| `digits` | `integer(1)` (default `3L`) | Number of significant digits used when formatting numeric columns of the coefficient table via `format()`. |
| `...` | n/a | Accepted for S3 method signature compatibility; not used in the body. |
Returns
Invisibly returns x, the original input object (via invisible(x)). The primary effect is the side effect of printing to the console.
Notes
- S3 dispatch. This is an S3 method registered for the generic `print` on objects of class `"gdpar_spatial_dependence_robust"`. Standard `print()` dispatch applies.
- Header line. Prints the scheme name (`"tiled"` or `"moving"`) followed by `" spatial block bootstrap"`.
- Grid description. Prints `block_size` × `block_size` cells. The `block_size_method` element is checked:
  - `"auto"` appends `" (auto: data-driven calibration)"`.
  - `"rate"` appends `" (rate: n^(1/4))"`.
  - `"fixed"` or any other value (including `NULL` via the `%||%` fallback) appends nothing.

  If `random_origin` is `TRUE` and `scheme` is exactly `"tiled"`, the note `" (randomized origin)"` is also appended. The count of non-empty tiles (`n_tiles`) is always shown.
- Refit summary. Displays the total number of bootstrap refits (`B`) and how many completed successfully (`B_ok`).
- Confidence level. Prints `level` (a numeric probability, e.g. `0.95`).
- Coefficient table formatting. Columns of `x$table` that are numeric are re-formatted with `format(col, digits = digits)`. The table is then printed with `row.names = FALSE`.
- Interpretation hint. A short explanatory sentence is printed: the `se_ratio` is defined as `robust_se / model_se`; a ratio greater than 1 indicates that the model-based standard errors understate the spatial-dependence-robust uncertainty.
- Refit diagnostics. Delegates to `.gdpar_print_refit_diagnostics(x$refit_diagnostics, digits)` (an internal helper defined elsewhere) to print any additional convergence or numerical diagnostics from the bootstrap refits.
- Warnings. If `x$warnings` is a non-empty character vector, up to 5 warnings are printed, each preceded by `" - "`. If more than 5 exist, a count of remaining warnings is appended.
- Edge cases. If `block_size_method` is `NULL`, the `%||%` (null-coalescing) operator defaults to `"fixed"`, so no calibration label is printed. If `scheme` is not `"tiled"` or `random_origin` is not `TRUE`, the randomized-origin note is omitted. If `x$warnings` has length zero or is `NULL`, the warnings section is skipped entirely (the `if` guards handle this).
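The label logic of the grid description can be sketched as follows. The suffix strings are those quoted in the notes; the `"%d x %d cells"` prefix and the helper name `grid_label` are assumptions made for illustration, since the exact printed layout is not given here.

```r
# Null-coalescing helper, defined locally so the sketch is self-contained.
`%||%` <- function(a, b) if (is.null(a)) b else a

grid_label <- function(x) {
  method <- x$block_size_method %||% "fixed"
  suffix <- switch(method,
    auto = " (auto: data-driven calibration)",
    rate = " (rate: n^(1/4))",
    ""                                       # "fixed" or anything else
  )
  origin <- if (isTRUE(x$random_origin) && identical(x$scheme, "tiled")) {
    " (randomized origin)"
  } else {
    ""
  }
  sprintf("%d x %d cells%s%s", x$block_size, x$block_size, suffix, origin)
}

lab_rate  <- grid_label(list(block_size = 4L, block_size_method = "rate",
                             scheme = "tiled", random_origin = TRUE))
lab_fixed <- grid_label(list(block_size = 4L, block_size_method = NULL,
                             scheme = "moving", random_origin = TRUE))
```

The second call shows both edge cases at once: a `NULL` method falls back to `"fixed"`, and the randomized-origin note is dropped because the scheme is not `"tiled"`.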
Purpose
Constructor for the `dims_spec` S3 class. It broadcasts a single uniform per-component specification (additive basis `a`, multiplicative basis `b`) across all $p$ dimensions of $\theta_i$ (with $p$ resolved later by `amm_spec`). It is the intended value of the `dims` argument of `amm_spec` when every dimension shares the same specification.
Arguments
- `a`: `NULL` or a one-sided `formula`. The additive basis applied uniformly to every dimension $k = 1, \ldots, p$ of $\theta_i$. `NULL` disables the additive component for all dimensions.
- `b`: `NULL` or a one-sided `formula`. The multiplicative basis applied uniformly to every dimension. `NULL` disables the multiplicative component for all dimensions.
Mathematics
The object encodes, for the canonical AMM form
$$
\theta_i[k] = \theta_{\mathrm{ref}}[k] + a_k(x_i) + b_k(x_i)\,\theta_{\mathrm{ref}}[k] + \bigl(W_k(\theta_{\mathrm{ref}}) - W_k(\theta_{\mathrm{anchor}})\bigr)\,x_i, \quad k = 1, \ldots, p,
$$
the per-dimension, covariate-only pieces $a_k$ and $b_k$. The $W$ term is not part of `dims_spec` because it couples all dimensions of $\theta_{\mathrm{ref}}$.
Returns
A list of class `c("dims_spec", "list")` with two components:
- `base`: a list `list(a = a, b = b)` holding the uniform template.
- `overrides`: an empty named list `list()`, to be populated by `override`.
Notes
- Both arguments may simultaneously be `NULL`; this is permitted and yields a `dims_spec` whose base disables both components.
- Validation is delegated to `assert_one_sided_formula(., allow_null = TRUE)` for each of `a` and `b`; malformed formulas abort there.
- The dimension $p$ is intentionally not stored; coherence with $p$, the multivariate $W$ basis, and any overrides is validated later by `amm_spec`.
- Bare formulas passed directly to `amm_spec`'s `dims` argument when $p > 1$ are rejected; wrapping in `dimwise()` is the explicit opt-in to broadcasting.
Purpose
Attach a per-dimension override to an existing dims_spec, replacing the additive and/or multiplicative formula for a single dimension index k while leaving the base template and other dimensions untouched. Overrides compose across multiple calls and overwrite on repeated k.
Arguments
- `dims`: a `dims_spec` object (produced by `dimwise`).
- `k`: a positive integer scalar. The dimension index to override. Coherence with the global $p$ is checked later by `amm_spec` / `resolve_dims_spec`, not here.
- `a`: optional. A one-sided `formula` replacing the additive basis for dimension `k`, or `NULL` to disable the additive component for that dimension only. Missing (omitted) means "inherit from base"; explicitly `NULL` means "disable for this dimension".
- `b`: optional. Same semantics as `a` for the multiplicative basis.
Mathematics
For the overridden dimension $k$,
$$
a_k = \begin{cases} a^{\text{ov}} & \text{if } a \text{ supplied} \\ a^{\text{base}} & \text{if } a \text{ missing} \end{cases}, \qquad
b_k = \begin{cases} b^{\text{ov}} & \text{if } b \text{ supplied} \\ b^{\text{base}} & \text{if } b \text{ missing} \end{cases},
$$
where a supplied value of `NULL` is interpreted as "disabled" (`NULL` is a valid replacement for the formula), distinct from "missing" (inherit).
Returns
A new `dims_spec` (a modified copy of `dims`; the input is not mutated in place because R lists have copy-on-modify semantics) with the override registered under the character key `as.character(as.integer(k))` in `dims$overrides`. Each override entry is a list with components `a`, `b`, `a_set` (logical), `b_set` (logical). Calling `override` twice with the same `k` replaces the prior entry for that index.
Notes
- `assert_inherits(dims, "dims_spec", "dims")` is called first; non-`dims_spec` input aborts.
- `assert_count(k, "k")` enforces that `k` is a positive integer scalar.
- If both `a` and `b` are missing, the function aborts via `gdpar_abort` with `class = "gdpar_input_error"` and the message: "override(): at least one of 'a' or 'b' must be supplied. To leave a dimension unchanged, do not call override() for it."
- The missing-vs-`NULL` distinction is implemented with `base::missing()`. When `a` is supplied, `assert_one_sided_formula(a, "a", allow_null = TRUE)` is run, then `ov["a"] <- list(a)` (the `[<-`-with-`list` idiom is used so that assigning `NULL` retains the element rather than deleting it) and `ov$a_set <- TRUE`. Symmetrically for `b`.
- If no prior override exists for `k`, a fresh entry `list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE)` is seeded before applying the supplied arguments, so unsupplied components correctly remain flagged as unset and will inherit from the base at resolution time.
- Range validation of `k` against $p$ is not performed here; it is deferred to `resolve_dims_spec`.
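The missing-vs-`NULL` mechanics described in the notes can be reproduced in a stand-alone sketch. `override_sketch` is a hypothetical re-implementation written only for illustration: it omits the package's assertion helpers and uses a plain list in place of a real `dims_spec`.

```r
override_sketch <- function(dims, k, a, b) {
  if (missing(a) && missing(b)) {
    stop("at least one of 'a' or 'b' must be supplied")
  }
  key <- as.character(as.integer(k))
  ov <- dims$overrides[[key]]
  if (is.null(ov)) {
    # Seed a fresh entry: nothing is set yet, so everything inherits.
    ov <- list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE)
  }
  if (!missing(a)) {
    ov["a"] <- list(a)   # `[<-` with list() keeps the element even when a is NULL
    ov$a_set <- TRUE
  }
  if (!missing(b)) {
    ov["b"] <- list(b)
    ov$b_set <- TRUE
  }
  dims$overrides[[key]] <- ov
  dims                    # modified copy; the caller's object is untouched
}

# Stand-in for a dims_spec (assumed structure: base + overrides).
d  <- list(base = list(a = ~ x, b = ~ z), overrides = list())
d2 <- override_sketch(d, k = 2, a = NULL)   # disable `a` for dimension 2 only
```

In `d2`, dimension 2 has `a_set = TRUE` with `a = NULL` (disabled) and `b_set = FALSE` (inherit), while `d` itself still has no overrides.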
Purpose
Internal resolver that flattens a dims_spec into the canonical per-dimension representation consumed by amm_spec: a length-p list of list(a, b) pairs, with overrides applied on top of the base template.
Arguments
- `dims`: a `dims_spec` object.
- `p`: a positive integer scalar, the global dimension.
Mathematics
For each $k = 1, \ldots, p$, $a_k$ is the override value if an override for $k$ exists and `ov$a_set` is `TRUE`, else the base value; $b_k$ is resolved in the same way from `ov$b_set`.
Returns
A list of length `p`. Each entry is `list(a = a_k, b = b_k)` where `a_k` / `b_k` are each either a one-sided formula or `NULL`.
Notes
- `assert_inherits(dims, "dims_spec", "dims")` and `assert_count(p, "p")` are run first.
- Before flattening, every override key is parsed with `suppressWarnings(as.integer(key))`. Any key that is `NA`, `< 1`, or `> p` is collected into `bad`; if `bad` is non-empty, the function aborts via `gdpar_abort` with `class = "gdpar_input_error"`, a `sprintf` message listing the bad keys and the valid range `1:p`, and a `data` field `list(bad_keys = bad, p = p)`.
- The flattening loop iterates `seq_len(p)`; for each `k` it starts from `dims$base$a` / `dims$base$b`, then if `dims$overrides[[as.character(k)]]` is non-`NULL` it conditionally replaces `a_k` when `isTRUE(ov$a_set)` and `b_k` when `isTRUE(ov$b_set)`. Unset components therefore inherit from the base, realising the missing-vs-`NULL` semantics established by `override`.
- Marked `@keywords internal` / `@noRd`; not exported.
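The flattening step can be sketched as below. `resolve_sketch` is a hypothetical stand-in that follows the notes above but replaces `gdpar_abort` with a plain `stop()` and takes an ordinary list instead of a validated `dims_spec`.

```r
resolve_sketch <- function(dims, p) {
  # Validate override keys against the range 1:p.
  keys <- suppressWarnings(as.integer(names(dims$overrides)))
  bad <- names(dims$overrides)[is.na(keys) | keys < 1L | keys > p]
  if (length(bad) > 0L) {
    stop(sprintf("override keys outside 1:%d: %s", p, paste(bad, collapse = ", ")))
  }
  # Start from the base template and apply only the components that were set.
  lapply(seq_len(p), function(k) {
    a_k <- dims$base$a
    b_k <- dims$base$b
    ov <- dims$overrides[[as.character(k)]]
    if (!is.null(ov)) {
      if (isTRUE(ov$a_set)) a_k <- ov$a
      if (isTRUE(ov$b_set)) b_k <- ov$b
    }
    list(a = a_k, b = b_k)
  })
}

# Assumed input: dimension 2 disables `a` and inherits `b`.
dims <- list(
  base = list(a = ~ x, b = ~ z),
  overrides = list(`2` = list(a = NULL, b = NULL, a_set = TRUE, b_set = FALSE))
)
out <- resolve_sketch(dims, p = 3)
```

Here `out[[2]]$a` is `NULL` because the override explicitly disabled it, while `out[[2]]$b` is still the base formula `~ z`.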
Purpose
S3 print method for objects of class dims_spec. Renders a compact human-readable summary of the base template and any registered overrides.
Arguments
- `x`: a `dims_spec` object.
- `...`: ignored; present for S3 generic compatibility.
Returns
Invisibly returns `x`.
Notes
- Output layout:
  - Header line `<dims_spec>`.
  - `base:` section printing `a : <deparse(formula) or "NULL">` and `b : <deparse(formula) or "NULL">`. Formula deparsing uses `base::deparse`.
  - If `length(x$overrides) > 0L`, an `overrides:` section enumerating each override. Keys are sorted by their integer value (`sort(as.integer(names(x$overrides)))`) and printed as `k = <int> : <parts>`, where `<parts>` is the semicolon-joined set of `a = <deparse or "NULL">` (only if `isTRUE(ov$a_set)`) and `b = <deparse or "NULL">` (only if `isTRUE(ov$b_set)`). Unset components are omitted from the line.
  - If there are no overrides, prints `overrides: <none>`.
- The method is exported (the generic `print` is dispatched via `UseMethod` on class `"dims_spec"`, which sits before `"list"` in the class vector).
- No validation of `x` is performed; passing a malformed object may produce confusing output or errors from `deparse` / subsetting.
Purpose
Provides a concise, human-readable console summary of a fitted Empirical-Bayes model object. This S3 print method dispatches on objects of class gdpar_eb_fit, displaying key model characteristics, parameter estimates, numerical diagnostics of the Laplace approximation, and conditional HMC diagnostics.
Arguments
- `x`: A `gdpar_eb_fit` object, the result of an Empirical-Bayes fitting procedure.
- `digits`: Integer scalar. Controls the number of digits for numeric formatting via `format()`. Defaults to `3L`.
- `...`: Additional arguments (unused; included for S3 method consistency).
Mathematics
No explicit mathematical formula is implemented. The method presents estimates and standard errors computed elsewhere.
Returns
Invisibly returns the input object x (type gdpar_eb_fit).
Notes
- S3 Dispatch: Invoked by `print()` when the first argument is of class `gdpar_eb_fit`.
- Conditional Output: The printed output adapts based on the `path` component of the object. For `"eb_KxP"` (Path C, the K×p regime), it prints a multi-dimensional array of estimates (`theta_ref_kp_hat`, `theta_ref_kp_se`), slot names, and per-slot condition numbers. For other paths, it prints the scalar or vector `theta_ref_hat` and `theta_ref_se`.
- Side Effects: Writes directly to the console via `cat()`.
- NULL-safe Access: Uses the `%||%` operator (likely from rlang) to provide default values for potentially `NULL` components (e.g., `x$family$name`), preventing errors during formatting.
- Diagnostics: Displays numerical diagnostics (`diagnostics_numerical`) and, if available, one-line conditional HMC diagnostics (`diagnostics`).
Purpose
Constructs a structured summary of an Empirical-Bayes fit suitable for programmatic access and further printing. This S3 summary method computes credible intervals, optionally applying the Proposition 7B scalar or tensor correction, and extracts a summary of the conditional posterior if available.
Arguments
- `object`: A `gdpar_eb_fit` object.
- `level`: Numeric scalar in the interval (0, 1). Specifies the probability level for credible intervals. Defaults to 0.95.
- `...`: Additional arguments (unused).
Mathematics
- Credible Interval Inflation (Correction): The standard error (`se`) is multiplied by an inflation factor `inflate` to widen the credible interval, accounting for the uncertainty of the reference anchor.
  - Scalar Correction (Path C off):
    $$ \text{inflate} = \sqrt{1 + \frac{C}{\max(1, J)}} $$
    where $C$ is the constant `object$eb_correction_constant` and $J$ is the number of groups.
  - Tensor Correction (Path C on): For each group $g$, slot $k$, and coordinate $c$:
    $$ \text{inflate}_{k,c} = \sqrt{1 + \frac{\mathbf{T}[k, c, c]}{\max(1, J)}} $$
    where $\mathbf{T}$ is the `object$correction_tensor_constant`.
- Credible Interval Calculation: The $(1-\alpha)$ credible interval is:
  $$ \text{estimate} \pm z_{1-\alpha/2} \cdot \text{se} \cdot \text{inflate} $$
  where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\alpha = 1 - \text{level}$.
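The scalar case can be checked numerically with a short sketch. `eb_interval` is a hypothetical helper that only evaluates the two formulas above; the inputs (`C = 3`, `J = 1`) are made-up values chosen so that the inflation factor is exactly 2.

```r
# Inflated normal-approximation credible interval (scalar correction).
eb_interval <- function(estimate, se, C, J, level = 0.95) {
  inflate <- sqrt(1 + C / max(1, J))
  z <- stats::qnorm(1 - (1 - level) / 2)
  half <- z * se * inflate
  list(lower = estimate - half, upper = estimate + half, inflate = inflate)
}

ci <- eb_interval(estimate = 1, se = 0.5, C = 3, J = 1)
```

Since the correction term is divided by $\max(1, J)$, the inflation factor shrinks toward 1 as the number of groups grows, and the interval approaches the uncorrected one.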
Returns
An object of class `summary.gdpar_eb_fit`, which is a list containing:
- `theta_table`: A `data.frame` (or array for Path C) of estimates, standard errors, lower/upper interval bounds, and inflation factors.
- `conditional_summary`: A posterior summary (from the `posterior` package) of the conditional model fit, if available. Otherwise `NULL`.
- `correction_applied`: Logical flag indicating if an EB correction was applied.
- `correction_constant` (non-Path C) or `correction_tensor` (Path C): The correction value(s) used.
- `inflation_factor`: The computed inflation factor(s).
- `level`, `family`, `link`, `J_groups`, `K_slots`, `p_dim`, `slot_names` (Path C), `diagnostics_numerical`, `diagnostics_hmc`, `path`, `call`: Various model metadata.
Notes
- Input Validation: Raises a `gdpar_input_error` (via `gdpar_abort()`) if `level` is not a single numeric value in (0, 1).
- Conditional Posterior Extraction: Attempts to extract and summarize the conditional posterior draws using `posterior::summarise_draws()`. Filters out latent parameters (e.g., `eta`, `log_lik`) by pattern matching. Errors are silently caught, returning `NULL`.
- Path Dependency: The structure of the returned summary, especially `theta_table`, differs significantly between the scalar (non-Path C) and tensor (Path C, `eb_KxP`) regimes.
Purpose
Formats and prints the summary of an Empirical-Bayes fit produced by summary.gdpar_eb_fit(). This S3 print method provides a detailed, human-readable display of the summary object.
Arguments
- `x`: A `summary.gdpar_eb_fit` object.
- `digits`: Integer scalar for numeric formatting. Defaults to `3L`.
- `...`: Additional arguments (unused).
Mathematics
No new calculations; presents the pre-computed values from the summary object.
Returns
Invisibly returns the input summary object x.
Notes
- S3 Dispatch: Invoked by `print()` when the first argument is of class `summary.gdpar_eb_fit`.
- Path-Dependent Output: Prints different sections depending on whether `x$path` is `"eb_KxP"` (Path C). For Path C, it prints the tensor-based correction details and the full `theta_table`. For other paths, it prints the scalar correction constant and inflation factor.
- Conditional Summary Display: If available, prints the first 8 rows of the conditional posterior summary for a quick overview.
- Side Effects: Writes directly to the console via `cat()` and `print()`.
Purpose
Extracts coefficient estimates from a fitted empirical Bayes General Dynamic Parameter model (gdpar_eb_fit object). It returns the reference parameter estimates and, if a conditional HMC fit is available, the conditional model parameters (random effects, fixed effects, and raw W parameters).
Arguments
- `object`: A `gdpar_eb_fit` object resulting from a call to a fitting function (e.g., `gdpar_eb`).
- `...`: Additional arguments (currently unused).
Mathematics
No new mathematical operations. It extracts precomputed quantities:
- $\widehat{\theta}_{\text{ref}}^{\text{EB}}$: The empirical Bayes estimate of the reference parameter.
- $\text{SE}(\widehat{\theta}_{\text{ref}})$: Its standard error.
- $\text{Cov}(\widehat{\theta}_{\text{ref}})$: Its covariance matrix.
- For conditional parameters, it extracts posterior means and standard deviations from HMC draws:
  $$ \widehat{\mu}_a = \frac{1}{S} \sum_{s=1}^S a^{(s)}, \quad \text{SD}(a) = \sqrt{\frac{1}{S-1} \sum_{s=1}^S (a^{(s)} - \widehat{\mu}_a)^2} $$
  (analogous for `b` and `W`), where $S$ is the number of posterior draws.
Returns
A list of class `c("gdpar_coef_eb", "gdpar_coef", "list")` with components:
- `theta_ref`: A list containing:
  - `method`: Character `"EB"`.
  - `estimate`: Numeric scalar, $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
  - `se`: Numeric scalar, standard error.
  - `cov`: Numeric matrix, covariance matrix.
  - `eb_correction_applied`: Logical, whether an EB correction was applied.
  - `eb_correction_constant`: Numeric, the constant used for EB correction (if any).
- If `object$conditional_fit` exists and the `posterior` package is available:
  - `a`: List with `estimate` (vector of means) and `se` (vector of SDs) for `a_coef` parameters.
  - `b`: List with `estimate` and `se` for `b_coef` parameters.
  - `W`: List with `estimate` and `se` for `W_raw` parameters.
Notes
- S3 method for class `gdpar_eb_fit`.
- The conditional parameters (`a`, `b`, `W`) are only extracted if the `posterior` package is available and the conditional fit object contains draws.
- The helper function `pick(pat)` uses a regex pattern `pat` to match variable names in the posterior draws and returns their means and SDs.
- The output class inherits from `gdpar_coef`, allowing use of generic coefficient methods.
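The pattern-matching extraction can be illustrated on a plain draws matrix. `pick_sketch` is a hypothetical analogue of the `pick(pat)` helper mentioned above; it assumes draws are stored as an `S x variables` matrix with Stan-style column names, which may not match how the package holds them internally.

```r
# Posterior mean and SD for all variables whose name matches `pat`.
pick_sketch <- function(draws, pat) {
  cols <- grep(pat, colnames(draws), value = TRUE)
  if (length(cols) == 0L) return(NULL)
  sub <- draws[, cols, drop = FALSE]
  list(estimate = colMeans(sub), se = apply(sub, 2, stats::sd))
}

# Toy draws (assumed values): three draws of one `a_coef` and one `b_coef`.
draws <- cbind(`a_coef[1]` = c(1, 2, 3), `b_coef[1]` = c(0, 0, 0))
a_res <- pick_sketch(draws, "^a_coef\\[")
w_res <- pick_sketch(draws, "^W_raw\\[")    # no match in the toy draws
```

`stats::sd` uses the $S - 1$ denominator, which matches the SD formula given in the Mathematics section.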
predict.gdpar_eb_fit(object, newdata = NULL, type = c("response", "linear_predictor"), level = 0.95, ...)
Purpose
Computes posterior predictions from the conditional HMC model fit at the plug-in empirical Bayes estimate $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
Arguments
- `object`: A `gdpar_eb_fit` object.
- `newdata`: Optional data frame with the same variables as training data. Currently must be `NULL` (in-sample prediction).
- `type`: Character string specifying prediction scale:
  - `"response"` (default): Predictions on the response scale via the family's inverse-link function ($y$).
  - `"linear_predictor"`: Predictions on the linear predictor scale ($\eta$).
- `level`: Numeric scalar in $(0,1)$ for the credible interval width. Defaults to $0.95$.
- `...`: Additional arguments (currently unused).
Mathematics
Let $\eta_i^{(s)}$, $s = 1, \ldots, S$, denote the posterior draws of the linear predictor for observation $i$ (or the corresponding response-scale draws when `type = "response"`).
- Mean: $\bar{\eta}_i = \frac{1}{S} \sum_{s=1}^S \eta_i^{(s)}$ (or $\bar{y}_i$ for response).
- Credible interval bounds: $Q_{\alpha/2}(\eta_i)$ and $Q_{1-\alpha/2}(\eta_i)$, where $\alpha = 1 - \text{level}$ and $Q$ denotes the sample quantile.
Returns
A list with components:
- mean: Numeric vector of posterior predictive means (length $n$).
- lower: Numeric vector of lower credible interval bounds (length $n$).
- upper: Numeric vector of upper credible interval bounds (length $n$).
- draws: Numeric matrix of posterior predictive draws (dimensions $S \times n$).
- level: The credible interval level used.
- type: The prediction type ("response" or "linear_predictor").
Notes
- S3 method for class gdpar_eb_fit.
- If newdata is not NULL, an error of class "gdpar_unsupported_feature_error" is raised, stating that out-of-sample prediction is not yet implemented (deferred to Sub-phase 8.6.C).
- Requires the posterior package to extract and manipulate HMC draws.
- The function searches for variables in the conditional fit's draws matching "^eta\\[" (for linear predictor) or "^y_pred\\[" (for response). If none are found, an internal error is raised.
- Quantiles are computed using stats::quantile with names = FALSE.
- The draws matrix is transposed from the posterior draws matrix format to $S \times n$.
gdpar_eb(formula, family = gdpar_family("gaussian"), amm = amm_spec(), W = NULL, data, prior = NULL, anchor = "prior_mean", skip_id_check = FALSE, chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L, adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L, verbose = TRUE, seed = NULL, group = NULL, parametrization = c("auto", "ncp", "cp"), id_check_rigor = c("full", "fast"), eb_correction = TRUE, laplace_control = list(), ...)
Purpose
Exported main entry point for Path 1 Empirical-Bayes (EB) estimation of the AMM canonical model. It is the EB counterpart of gdpar(). The function implements a three-step pipeline:
- Step (i): Estimate the population reference $\theta_{\text{ref}}$ by maximizing the marginal (Type II) likelihood via Laplace approximation (cmdstanr::laplace()), with multi-start optimization and adaptive Levenberg–Marquardt ridge perturbation for numerical anti-fragility.
- Step (iii): Sample the lower-level parameters $\xi = (a, b, W, \sigma_*, \phi)$ from the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$ via HMC (cmdstanr::sample()).
- Optionally apply the scalar Proposition 7B coverage-discrepancy inflation factor to the conditional credible intervals.

The function dispatches across three path regimes based on the resolved $(K, p)$:
- Path A (8.6.B): $K = 1$, $p = 1$. The base regime, executed inline in the function body.
- Path B (8.6.C): $K > 1$, $p = 1$. Delegated to .gdpar_eb_run_K().
- Path C (8.6.D): $K > 1$ and any slot with $p > 1$. Delegated to .gdpar_eb_run_KxP().
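As a rough illustration of the regime split, the following base-R sketch labels a resolved $(K, p)$ pair. The function eb_path_regime_sketch is hypothetical and only mirrors the three regimes listed above; the case $K = 1$ with $p > 1$ is not covered by that list, so the sketch returns NA for it rather than guessing.

```r
# Hypothetical helper (not a gdpar function): label the EB path regime from the
# resolved number of slots K and the per-slot coordinate dimensions p.
eb_path_regime_sketch <- function(K, p) {
  K <- as.integer(K)
  p <- as.integer(p)
  stopifnot(length(K) == 1L, K >= 1L, length(p) >= 1L, all(p >= 1L))
  if (K > 1L && any(p > 1L)) return("C")  # would be delegated to .gdpar_eb_run_KxP()
  if (K > 1L) return("B")                 # would be delegated to .gdpar_eb_run_K()
  if (all(p == 1L)) return("A")           # base regime, run inline
  NA_character_                           # K = 1 with p > 1: not in the list above
}

eb_path_regime_sketch(1, 1)
eb_path_regime_sketch(3, c(1, 1, 1))
eb_path_regime_sketch(2, c(1, 2))
```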
Arguments
| Argument | Type | Meaning |
|---|---|---|
| formula | Two-sided formula or gdpar_formula_set | Outcome and RHS specification. Same semantics as gdpar()'s formula. When it inherits from "gdpar_formula_set", the K-input dispatch fires. |
| family | gdpar_family object or named list | Distributional family. Sub-phase 8.6.B supports stan_id in c(1, 2, 3, 4) (Gaussian, Poisson, neg-binomial-2, Bernoulli) for the base regime ($K = 1$, $p = 1$). When family is a named list (not inheriting from gdpar_family or gdpar_family_multi), it is treated as a multi-family input for K-input dispatch. |
| amm | amm_spec or named list of amm_spec | AMM specification. Must have amm$p == 1L for the base regime; multivariate input (a named list of amm_spec) triggers K-input dispatch. |
| W | W_basis object or NULL | Optional modulating basis (polynomial or B-spline). |
| data | data.frame | Data frame containing all variables referenced by formula and amm. |
| prior | gdpar_prior object or NULL | Prior specification. When NULL, defaults via gdpar_prior() are used. |
| anchor | Numeric scalar, "prior_mean", or "empirical_y" | Anchor value, or the rule used to resolve it. Default "prior_mean". |
| skip_id_check | Logical scalar | If TRUE, skips the basis-restricted identifiability check. |
| chains | Integer scalar | Number of HMC chains for Step (iii). Default 4L. |
| iter_warmup | Integer scalar | HMC warmup iterations per chain. Default 1000L. |
| iter_sampling | Integer scalar | HMC sampling iterations per chain. Default 1000L. |
| adapt_delta | Numeric scalar | HMC adapt_delta. Default 0.95. |
| max_treedepth | Integer scalar | HMC maximum tree depth. Default 12L. |
| refresh | Integer scalar | HMC refresh interval. Default 100L. |
| verbose | Logical scalar | Controls diagnostic messages and show_messages/show_exceptions in HMC. |
| seed | Integer scalar or NULL | Random seed for reproducibility (Laplace multi-start, parametrization pre-flight, and HMC). |
| group | One-sided formula or NULL | Grouping variable specification. |
| parametrization | Character scalar | One of "auto" (default), "ncp", "cp". Selects CP/NCP sampling parametrization for additive and modulating components in Step (iii). "auto" triggers a pre-flight diagnostic via resolve_parametrization(). |
| id_check_rigor | Character scalar | One of "full" or "fast". Matched but not otherwise consumed in this function body (forwarded to K-path orchestrators). |
| eb_correction | Logical scalar | If TRUE (default), applies the scalar Proposition 7B inflation factor to conditional credible intervals. If FALSE, issues a gdpar_diagnostic_warning about the expected coverage discrepancy. |
| laplace_control | Named list | Controls for Step (i) Laplace approximation and anti-fragility. Recognized entries: multi_start_M (default 5), kappa_threshold (default 1e10), ridge_init (default 1e-6), epsilon_lm (default sqrt(.Machine$double.eps)), ridge_max_iter (default 10), ridge_grow_factor (default 10.0), laplace_draws (default 1000), optim_algorithm (default "lbfgs"). Resolved by .gdpar_eb_resolve_laplace_control(). |
| ... | Additional arguments | Forwarded to the underlying HMC sampler (conditional_model$sample()) in Step (iii). |
Mathematics
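The EB estimator maximizes the marginal (Type II) log-likelihood:

$$\widehat{\theta}_{\text{ref}}^{\text{EB}} = \arg\max_{\theta_{\text{ref}}} \log p(y \mid \theta_{\text{ref}}), \qquad p(y \mid \theta_{\text{ref}}) = \int p(y \mid \xi, \theta_{\text{ref}})\, p(\xi \mid \theta_{\text{ref}})\, d\xi .$$

The integral is approximated by the Laplace method: for each candidate $\theta_{\text{ref}}$,

$$\log p(y \mid \theta_{\text{ref}}) \approx \log p(y, \widehat{\xi} \mid \theta_{\text{ref}}) + \frac{d}{2} \log(2\pi) - \frac{1}{2} \log\det H(\widehat{\xi}),$$

where $\widehat{\xi}$ maximizes $\log p(y, \xi \mid \theta_{\text{ref}})$ over $\xi$, $H(\widehat{\xi})$ is the negative Hessian of that log-density at $\widehat{\xi}$, and $d = \dim(\xi)$.

Given $\widehat{\theta}_{\text{ref}}^{\text{EB}}$, the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$ is sampled via HMC in Step (iii).

When eb_correction = TRUE, the scalar Proposition 7B inflation constant is applied to the conditional credible intervals.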
Returns
An object of class c("gdpar_eb_fit", "list") with the following named components:
| Component | Type | Description |
|---|---|---|
| theta_ref_hat | Numeric vector (length J_groups) | EB point estimates of $\theta_{\text{ref}}$. |
| theta_ref_se | Numeric vector (length J_groups) | Marginal standard errors from the Laplace covariance. |
| conditional_fit | cmdstanr fit object | The HMC fit from Step (iii). |
| amm | amm_spec | The resolved AMM specification. |
| family | gdpar_family | The resolved family object. |
| prior | gdpar_prior | The resolved prior. |
| design | AMM design object | Built by build_amm_design(). |
| anchor | Numeric scalar | The resolved anchor value. |
| stan_data | Named list | The Stan data list (includes K_slots, p_dim). |
| identifiability_report | Report object or NULL | Result of gdpar_check_identifiability(); NULL when skip_id_check = TRUE. |
| diagnostics | gdpar_diagnostics | Diagnostics from the conditional HMC fit, computed by compute_diagnostics(). |
| diagnostics_numerical | Named list | Numerical diagnostics from the Laplace step: kappa, lm_perturbation, lm_n_iter, lm_status (one of "not_needed", "converged", "exhausted"), kappa_post_ridge, multi_start_dispersion, marginal_log_lik_history. For Path C, slot-vectorized counterparts (kappa_per_slot, lm_lambda_per_slot, lm_n_iter_per_slot, lm_status_per_slot) replace the scalars. |
| parametrization | Named list | Contains cp_a (logical), cp_W (logical), and meta (metadata from resolve_parametrization()). |
| group_info | Group info object or NULL | Resolved grouping information. |
| correction_applied | Logical scalar | Whether the Proposition 7B correction was applied. |
| eb_correction_constant | Numeric scalar | The inflation constant when eb_correction = TRUE; NA_real_ otherwise. |
| call | call | The matched call. |
| path | Character scalar | Always "eb". |
Notes
- Argument matching: parametrization and id_check_rigor are resolved via match.arg() at function entry. call is captured via match.call().
- Input validation: Delegates to .gdpar_eb_validate_inputs() (defined in a subsequent section) for type discipline of formula, family, amm, data, eb_correction, and laplace_control. If prior is NULL, it is replaced by gdpar_prior(); then assert_inherits() enforces class "gdpar_prior".
- cmdstanr dependency: require_suggested("cmdstanr", ...) is called to ensure the suggested package is available. The Laplace method requires cmdstanr ≥ 0.7.0.
- K-input dispatch: Four boolean flags detect multi-slot input patterns:
  - .formula_set_input: formula inherits "gdpar_formula_set".
  - .amm_list_input: amm is a list, does not inherit "amm_spec", and has non-NULL names.
  - .classic_with_amm_calls: formula is a standard two-sided formula (length 3) whose RHS contains a()/b()/W() calls, detected by .gdpar_rhs_has_amm_calls().
  - .family_is_named_list: family is a named list not inheriting "gdpar_family" or "gdpar_family_multi".

  When any of these fires, .gdpar_eb_resolve_K_inputs() builds amm_list_canonical, family_promoted, outcome_name, formula_env, and family_id_k_vector. If the resolved $K > 1$, the function checks whether any slot has $p > 1$ (.any_slot_p_gt1); if so, it returns the result of .gdpar_eb_run_KxP() (Path C), otherwise that of .gdpar_eb_run_K() (Path B). If $K = 1$, the single amm_spec is unwrapped from amm_list_canonical[[1]], family is replaced by family_promoted, and a new formula is reconstructed from the union of all.vars(amm$a) and all.vars(amm$b) (or "1" if both are empty), using K_inputs$formula_env as the environment.
- Path A (K = 1) pipeline: After K-input resolution (or if no K-input pattern fired), the function proceeds inline:
  - p_resolved is read from amm$p (defaulting to 1L if absent). K_resolved is always 1L.
  - .gdpar_eb_check_stan_id_for_path() validates the family's stan_id against the resolved $(K, p)$.
  - The outcome variable name is extracted from formula[[2]]. If not found in data, a gdpar_input_error is raised. Non-finite values (NA, NaN, Inf) in the outcome trigger a gdpar_input_error with a count.
  - The RHS formula is extracted as formula[c(1L, 3L)] and updated with ~ . + 0 (no intercept).
  - If amm$W is non-NULL, it is materialized via materialize_W_basis(amm$W, p = p_resolved).
  - The AMM design is built via build_amm_design(amm, data, formula_rhs = rhs).
  - The anchor is resolved via resolve_anchor(anchor, family, y, prior, verbose).
  - Unless skip_id_check = TRUE, gdpar_check_identifiability() is called with theta_ref_init set to 1 when amm$b is non-NULL and abs(anchor_value) < 1e-8, otherwise anchor_value. If the check does not pass, a gdpar_identifiability_error is raised with the report attached in data = list(report = rep).
  - Group resolution via .resolve_group_argument(). If a group is present, .check_group_aliasing_c7() is called.
  - Stan data is assembled via assemble_stan_data(). stan_data$K_slots and stan_data$p_dim are set to the resolved integers.
  - Parametrization is resolved via resolve_parametrization() (which may run a pre-flight diagnostic when parametrization = "auto").
  - The marginal Stan model source is generated by .gdpar_eb_generate_stan_marginal(), written to a tempfile via write_stan_to_tempfile(), and compiled via cmdstanr::cmdstan_model().
  - The marginal likelihood is maximized by .gdpar_eb_maximize_marginal(), returning theta_ref_hat, theta_ref_se, and diagnostics.
  - The conditional Stan model source is generated by .gdpar_eb_generate_stan_conditional(), written and compiled analogously.
  - stan_data_cond is a copy of stan_data with theta_ref_data set: when $p > 1$ and length(theta_hat_loc) == J_groups * p, it is reshaped to a J_groups × p matrix (column-major, byrow = FALSE); otherwise it is passed as a flat numeric vector.
  - HMC sampling is invoked via do.call(conditional_model$sample, sample_args). Extra arguments from ... are merged into sample_args, potentially overriding defaults. seed is included only when non-NULL.
  - Diagnostics are computed via compute_diagnostics(fit_cond, verbose = verbose).
  - The EB correction is computed by .gdpar_eb_apply_correction().
- Errors raised:
  - gdpar_input_error: outcome variable not in data; outcome contains non-finite values.
  - gdpar_identifiability_error: basis-restricted identifiability check failed (with data = list(report = rep)).
  - gdpar_unsupported_feature_error: raised by .gdpar_eb_check_stan_id_for_path() for unsupported stan_id / $(K, p)$ combinations (as documented; the actual raise is inside the helper).
  - gdpar_eb_numerical_error: raised by .gdpar_eb_maximize_marginal() when the condition number exceeds kappa_threshold after adaptive ridge (as documented; the actual raise is inside the helper).
- Side effects: Writes Stan source files to temporary files on disk; compiles Stan models (may invoke the C++ toolchain); runs optimization and HMC sampling (may produce console output controlled by verbose/refresh).
- S3 dispatch: The returned object has class c("gdpar_eb_fit", "list"). No S3 methods for this class are defined in this section.
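Two of the pipeline steps above, the theta_ref_data reshape and the merging of ... into sample_args, are simple enough to sketch in base R. Both helpers below are hypothetical stand-ins written for illustration, under the assumption that later entries override earlier ones by name; they do not call cmdstanr.

```r
# Hypothetical helper: shape the EB estimate for the conditional Stan data.
# Column-major fill, matching the byrow = FALSE convention described above.
shape_theta_ref_sketch <- function(theta_hat, J_groups, p) {
  if (p > 1L && length(theta_hat) == J_groups * p) {
    matrix(theta_hat, nrow = J_groups, ncol = p, byrow = FALSE)
  } else {
    as.numeric(theta_hat)
  }
}

# Hypothetical helper: merge user extras into sampler defaults; the seed is
# added only when non-NULL, and extras override defaults by name.
merge_sample_args_sketch <- function(defaults, seed = NULL, ...) {
  args <- defaults
  if (!is.null(seed)) args$seed <- as.integer(seed)
  utils::modifyList(args, list(...))
}

theta_mat <- shape_theta_ref_sketch(c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3), J_groups = 3L, p = 2L)
defaults <- list(chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L,
                 adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L)
sample_args <- merge_sample_args_sketch(defaults, seed = NULL, adapt_delta = 0.99)
```

In the documented pipeline the merged list would then be passed to do.call(conditional_model$sample, sample_args).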
Purpose
Top-level input validator for the EB (Empirical Bayes) correction pipeline. Called before any dispatch to verify that every public argument conforms to the expected type and structure. Guards the entry point of the EB path and prevents downstream functions from receiving malformed inputs.
Arguments
- formula (any): Must be either a two-sided R formula of length 3 (y ~ ...) or an object inheriting from class "gdpar_formula_set".
- family (any): Must be one of: an object inheriting from "gdpar_family", an object inheriting from "gdpar_family_multi" (Path A, $p > 1$), or a named list whose every element inherits from "gdpar_family" with no duplicated or empty names (Path B heterogeneous $K$, sub-phase 8.3.7 pattern).
- amm (any): Must be an object inheriting from "amm_spec" or a named list (whose elements are expected to be "amm_spec" objects) for Path B $K > 1$.
- data (any): Must be a data.frame.
- eb_correction (any): Must be a single, non-NA logical value (TRUE or FALSE).
- laplace_control (any): Must be a list (possibly empty, possibly unnamed at this stage; naming is enforced downstream in .gdpar_eb_resolve_laplace_control).
Returns
invisible(NULL). The function is called for its side effect of raising errors on invalid input.
Notes
- Raises an error of class "gdpar_input_error" (via gdpar_abort) for each validation failure, with a condition data field carrying received_class where applicable.
- The formula check first tests inherits(formula, "gdpar_formula_set"); if that fails, it requires inherits(formula, "formula") and length(formula) == 3L.
- The family named-list detection (Path B) requires all of: is.list(family), not inheriting from "gdpar_family" or "gdpar_family_multi", non-null names, all names non-empty (nzchar), no duplicated names (anyDuplicated == 0L), and every element inheriting from "gdpar_family" (checked via vapply).
- The amm named-list detection requires is.list(amm), not inheriting from "amm_spec", and non-null names.
- The $K > 1$ + $p > 1$ guard (Path C) is explicitly released per Sub-phase 8.6.D (Session 13b, 2026-05-25); Path C is routed to .gdpar_eb_run_KxP() in the dispatcher. Per-path stan_id checks are deferred to .gdpar_eb_check_stan_id_for_path().
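The family named-list detection described above can be sketched with base R predicates only. The function is_family_named_list_sketch and the fake_family constructor are hypothetical; the sketch returns a logical instead of raising the classed errors the validator is documented to raise.

```r
# Hypothetical helper: TRUE when `family` looks like a Path B named list of
# gdpar_family objects, following the conditions listed above.
is_family_named_list_sketch <- function(family) {
  if (!is.list(family)) return(FALSE)
  if (inherits(family, "gdpar_family") || inherits(family, "gdpar_family_multi")) {
    return(FALSE)
  }
  nms <- names(family)
  if (is.null(nms) || !all(nzchar(nms)) || anyDuplicated(nms) != 0L) return(FALSE)
  all(vapply(family, inherits, logical(1L), what = "gdpar_family"))
}

# Stand-in for a family object; real objects come from gdpar_family().
fake_family <- function(name) structure(list(name = name), class = "gdpar_family")

is_family_named_list_sketch(list(y1 = fake_family("gaussian"), y2 = fake_family("poisson")))
is_family_named_list_sketch(fake_family("gaussian"))
```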
Purpose
Enforces the per-path supported stan_id set for the EB Stan templates, depending on the resolved $(K, p)$ regime.
Arguments
- family (list): A single family specification object. Must contain $stan_id (coercible to integer) and $name (character) fields.
- K (integer/numeric): The resolved number of mixture components.
- p (integer/numeric): The resolved number of parametric coordinates.
Mathematics
The supported stan_id sets by regime:
Note that the
Returns
invisible(NULL) on success.
Notes
- If family$stan_id is NULL, the function returns immediately without checking (short-circuit).
- stan_id is coerced via as.integer().
- On failure, raises an error of class "gdpar_unsupported_feature_error" via gdpar_abort, with a condition data list containing family, stan_id, K, p, and supported.
- Under Path C ($K > 1$, $p > 1$), the dispatcher is expected to iterate this check across the $K$ slots before assembling the family_id_k_vector data field.
- The deferred Path B set $\{5, 6, 7, 8, 9, 10, 11, 12, 13\}$ (Beta, Gamma, Lognormal_loc_scale, Student-t, Tweedie, ZIP, ZINB, Hurdle-Poisson, Hurdle-NB) for the $K > 1, p > 1$ regime is queued for a later iteration of Sub-phase 8.6.D.
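The check-and-raise pattern can be illustrated with a base-R condition object. This is a sketch under assumptions: check_stan_id_sketch is hypothetical, it takes the supported set as an argument instead of deriving it from $(K, p)$, and it builds the classed condition by hand where the package uses gdpar_abort.

```r
# Hypothetical helper: validate a family's stan_id against a supplied set and
# raise a classed condition carrying a `data` payload on failure.
check_stan_id_sketch <- function(family, supported) {
  if (is.null(family$stan_id)) return(invisible(NULL))  # short-circuit
  id <- as.integer(family$stan_id)
  if (!(id %in% supported)) {
    cond <- structure(
      class = c("gdpar_unsupported_feature_error", "error", "condition"),
      list(
        message = sprintf("stan_id %d (%s) is not supported in this regime.", id, family$name),
        call = NULL,
        data = list(family = family$name, stan_id = id, supported = supported)
      )
    )
    stop(cond)
  }
  invisible(NULL)
}

# The base-regime set c(1, 2, 3, 4) is the one stated for Sub-phase 8.6.B.
ok <- check_stan_id_sketch(list(stan_id = 2, name = "poisson"), supported = 1:4)
err <- tryCatch(
  check_stan_id_sketch(list(stan_id = 5, name = "beta"), supported = 1:4),
  gdpar_unsupported_feature_error = function(e) e
)
```

A caller can branch on the condition class and read err$data without parsing the message.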
Purpose
Merges a user-supplied laplace_control list with documented defaults, coercing types and validating bounds. Produces the fully resolved control list consumed by downstream Laplace approximation and ridge-perturbation routines.
Arguments
- user (list): User-supplied control parameters. May be empty. If non-empty, every entry must be named.
Mathematics
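Default values:

$$\texttt{multi\_start\_M} = 5,\quad \texttt{kappa\_threshold} = 10^{10},\quad \texttt{ridge\_init} = 10^{-6},\quad \texttt{epsilon\_lm} = \sqrt{\varepsilon_{\text{mach}}},$$

$$\texttt{ridge\_max\_iter} = 10,\quad \texttt{ridge\_grow\_factor} = 10,\quad \texttt{laplace\_draws} = 1000,$$

where $\varepsilon_{\text{mach}}$ is .Machine$double.eps. The remaining entry, optim_algorithm, defaults to "lbfgs".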
Returns
A named list with the following entries, all type-coerced:
| Field | Type | Default |
|---|---|---|
| multi_start_M | integer | 5L |
| kappa_threshold | double | 1e10 |
| ridge_init | double | 1e-6 |
| laplace_draws | integer | 1000L |
| optim_algorithm | character | "lbfgs" |
| epsilon_lm | double | sqrt(.Machine$double.eps) |
| ridge_max_iter | integer | 10L |
| ridge_grow_factor | double | 10.0 |
User-supplied values for recognized names override defaults; unrecognized names are dropped after a warning.
Notes
- If user is empty (length(user) == 0L), returns the defaults list directly.
- If user is non-empty but has NULL names or any empty (!nzchar) names, raises an error of class "gdpar_input_error".
- Unknown entries (not in names(defaults)) trigger a soft warning of class "gdpar_diagnostic_warning" via gdpar_warn and are dropped from the output.
- Post-merge type coercion: multi_start_M, laplace_draws, ridge_max_iter are coerced via as.integer(); kappa_threshold, ridge_init, epsilon_lm, ridge_grow_factor via as.double(). optim_algorithm is left as-is.
- Validation bounds (each raises "gdpar_input_error" on failure):
  - multi_start_M >= 1L
  - kappa_threshold > 0
  - epsilon_lm > 0
  - ridge_max_iter >= 1L
  - ridge_grow_factor > 1
- laplace_draws is coerced to integer but not bounds-checked in this function.
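The merge, coercion, and bounds rules above can be sketched as follows. resolve_laplace_control_sketch is a hypothetical re-implementation for illustration: it uses base stop() and warning() where the package is documented to raise classed conditions via gdpar_abort and gdpar_warn.

```r
# Hypothetical sketch of the documented merge / coerce / validate sequence.
resolve_laplace_control_sketch <- function(user = list()) {
  defaults <- list(
    multi_start_M = 5L, kappa_threshold = 1e10, ridge_init = 1e-6,
    laplace_draws = 1000L, optim_algorithm = "lbfgs",
    epsilon_lm = sqrt(.Machine$double.eps),
    ridge_max_iter = 10L, ridge_grow_factor = 10.0
  )
  if (length(user) == 0L) return(defaults)
  nms <- names(user)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("every laplace_control entry must be named")
  }
  unknown <- setdiff(nms, names(defaults))
  if (length(unknown) > 0L) {
    warning("dropping unknown laplace_control entries: ",
            paste(unknown, collapse = ", "))
  }
  out <- utils::modifyList(defaults, user[setdiff(nms, unknown)])
  for (nm in c("multi_start_M", "laplace_draws", "ridge_max_iter")) {
    out[[nm]] <- as.integer(out[[nm]])
  }
  for (nm in c("kappa_threshold", "ridge_init", "epsilon_lm", "ridge_grow_factor")) {
    out[[nm]] <- as.double(out[[nm]])
  }
  stopifnot(out$multi_start_M >= 1L, out$kappa_threshold > 0, out$epsilon_lm > 0,
            out$ridge_max_iter >= 1L, out$ridge_grow_factor > 1)
  out
}

# A double is coerced to integer; the unknown entry is dropped with a warning.
ctl <- suppressWarnings(resolve_laplace_control_sketch(list(multi_start_M = 8, bogus = 1)))
```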
Purpose
Adaptive Levenberg-Marquardt ridge perturbation for the empirical posterior covariance matrix returned by cmdstanr::laplace(). Implements component 2 of the four-component anti-fragility strategy, extending the single-step ridge of Sub-phase 8.6.B into an iterative geometric-growth loop.
Arguments
- cov (numeric matrix): A square symmetric matrix: the empirical posterior covariance. For Path C this is a per-slot block; for Path A/B it is the full $\theta_{\text{ref}}$ covariance.
- control (list): A resolved laplace_control list (as produced by .gdpar_eb_resolve_laplace_control). Must contain $ridge_init, $ridge_max_iter, $ridge_grow_factor, $kappa_threshold, and $epsilon_lm.
Mathematics
Let cov) of dimension
Trigger condition. Ridge perturbation is needed if:
where
If not needed: returns the original matrix with status "not_needed" and
Adaptive loop. Starting with
Compute eigenvalues
Convergence: If "converged".
Growth: Otherwise,
Exhaustion: If the loop completes "exhausted" and
Returns
A list with fields:
| Field | Type | Description |
|---|---|---|
| cov_perturbed | numeric matrix | The (possibly ridged) covariance. Equals cov when status = "not_needed"; equals the last cov_try otherwise. |
| lambda_used | numeric | Final effective ridge $\lambda$; 0 when status = "not_needed". |
| n_iter | integer | Number of iterations performed. 0L when status = "not_needed". Equals control$ridge_max_iter when "exhausted". |
| kappa_post | numeric | Condition number after perturbation. The original condition number when status = "not_needed"; may be Inf when status = "exhausted". |
| status | character | One of c("not_needed", "converged", "exhausted"). |
Notes
- Eigenvalue computation uses eigen(cov, symmetric = TRUE, only.values = TRUE) wrapped in tryCatch; if it errors, eigenvalues are set to NA_real_, which triggers the ridge path.
- The determinant is computed as prod(eigs0) only when all eigenvalues are finite; otherwise det_val is NA_real_ and the determinant-based trigger is skipped (but the eigenvalue-based trigger may still fire).
- trace_mean is clamped to at least $10^{-12}$ to avoid a zero floor when the diagonal is near-zero.
- The lambda_eff floor of $10^{-3} \cdot \bar{d}$ is applied inside every iteration, so even if control$ridge_init is very small, the effective ridge is bounded below by the trace-mean-scaled floor.
- When status = "exhausted", the returned cov_perturbed is the last attempted matrix (which may or may not be positive-definite), and kappa_post is Inf if the final eigenvalues are non-finite or non-positive.
- No error is raised on exhaustion; the caller is expected to inspect status.
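A minimal sketch of an adaptive ridge loop in this style follows. It is not the package implementation. The exact trigger and convergence tests are assumptions made for illustration (non-positive or non-finite eigenvalues, a condition number above kappa_threshold, or a determinant below epsilon_lm in absolute value); the trace-mean clamp, the $10^{-3} \cdot \bar{d}$ floor, the geometric growth, and the returned fields follow the notes above. lm_perturb_sketch is a hypothetical name.

```r
# Hypothetical sketch of an adaptive Levenberg-Marquardt style ridge loop.
lm_perturb_sketch <- function(cov, control) {
  safe_eigs <- function(m) {
    tryCatch(eigen(m, symmetric = TRUE, only.values = TRUE)$values,
             error = function(e) NA_real_)
  }
  kappa_of <- function(e) {
    if (all(is.finite(e)) && min(e) > 0) max(e) / min(e) else Inf
  }
  eigs0 <- safe_eigs(cov)
  kappa0 <- kappa_of(eigs0)
  det_val <- if (all(is.finite(eigs0))) prod(eigs0) else NA_real_
  # ASSUMPTION: trigger rule chosen for this sketch, not taken from the source.
  needed <- !is.finite(kappa0) || kappa0 > control$kappa_threshold ||
    (!is.na(det_val) && abs(det_val) < control$epsilon_lm)
  if (!needed) {
    return(list(cov_perturbed = cov, lambda_used = 0, n_iter = 0L,
                kappa_post = kappa0, status = "not_needed"))
  }
  trace_mean <- max(mean(diag(cov)), 1e-12)  # clamp away from zero
  lambda <- control$ridge_init
  status <- "exhausted"
  for (it in seq_len(control$ridge_max_iter)) {
    lambda_eff <- max(lambda, 1e-3 * trace_mean)  # trace-mean-scaled floor
    cov_try <- cov + diag(lambda_eff, nrow(cov))
    kappa_try <- kappa_of(safe_eigs(cov_try))
    # ASSUMPTION: convergence means a finite condition number under the gate.
    if (kappa_try <= control$kappa_threshold) {
      status <- "converged"
      break
    }
    lambda <- lambda * control$ridge_grow_factor  # geometric growth
  }
  list(cov_perturbed = cov_try, lambda_used = lambda_eff, n_iter = it,
       kappa_post = kappa_try, status = status)
}

control <- list(ridge_init = 1e-6, ridge_max_iter = 10L, ridge_grow_factor = 10,
                kappa_threshold = 1e10, epsilon_lm = sqrt(.Machine$double.eps))
singular <- matrix(1, 2, 2)  # rank-1 matrix, so the ridge path is taken
out <- lm_perturb_sketch(singular, control)
```

Because no error is raised on exhaustion, a caller of such a routine has to inspect status before using cov_perturbed.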
.gdpar_eb_generate_stan_marginal(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)
Purpose
Dispatches to the correct Stan template generator for the EB marginal model, that is, the model in which theta_ref (or theta_ref_k) lives in the parameters{} block and is assigned an anchor prior in model{}. This corresponds to Step (i)/(ii) of the EB workflow where the marginal log-likelihood is maximised to obtain the empirical-Bayes anchor estimate. The function selects among four template paths based on the resolved dimensions $(K, p)$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| prior | list | Prior specification list. Expected elements (consumed downstream by the renderer): theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, phi. |
| cp_a | logical (default FALSE) | Centered-parameterization flag for a. When TRUE, a receives the prior normal(0, sigma_a[1]) with no multiplicative scaling; when FALSE, the non-centered form is used, with prior normal(0, 1) and a * sigma_a[1] scaling. |
| cp_W | logical (default FALSE) | Centered-parameterization flag for W. Semantics mirror cp_a. |
| K | integer (default 1L) | Number of K-slots (groups/series). Coerced to integer at entry. |
| p | integer (default 1L) | Coordinate dimension of the response. Coerced to integer at entry. |
| family | NULL or family object | Passed only to the Path B (K > 1, p = 1) generator generate_stan_code_K. |
| cp_a_per_k | NULL or logical | Per-k centered-parameterization flag for a, forwarded to generate_stan_code_multi (Path A). |
| cp_a_per_K | NULL or logical | Per-K centered-parameterization flag for a, forwarded to generate_stan_code_K (Path B). |
Mathematics
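The dispatch is a partition of the $(K, p)$ space:

| Regime | Generator |
|---|---|
| $K = 1$, $p = 1$ | .gdpar_eb_render_template |
| $K = 1$, $p > 1$ (Path A) | generate_stan_code_multi |
| $K > 1$, $p = 1$ (Path B) | generate_stan_code_K |
| $K > 1$, $p > 1$ (Path C) | .gdpar_eb_render_template |

Returns
A character string containing the rendered Stan model code. For the $K = 1$, $p = 1$ and $K > 1$, $p > 1$ regimes it is produced by .gdpar_eb_render_template; for the other two paths it is produced by generate_stan_code_multi or generate_stan_code_K respectively.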
Notes
- $K$ and $p$ are coerced to integer immediately upon entry (as.integer).
- The Path C template ($K > 1 \wedge p > 1$) has a restricted placeholder set: only theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_y, phi are present. The placeholders {{A_SCALE}}, {{A_PRIOR}}, {{W_SCALE}}, {{W_PRIOR}} are absent because the NCP (non-centered parameterization) is hardcoded per slot per coordinate and W is disabled (decision D39).
- The function does not itself raise errors; any errors propagate from the downstream generators/renderers.
.gdpar_eb_generate_stan_conditional(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)
Purpose
Companion of .gdpar_eb_generate_stan_marginal for Step (iii) of the EB workflow. Generates the EB conditional Stan model, in which theta_ref (or theta_ref_k) has been moved from parameters{} to data{} and the anchor priors are dropped from model{}. The dispatch table is structurally identical to the marginal helper; only the template names differ.
Arguments
Identical to .gdpar_eb_generate_stan_marginal (same names, types, defaults, and meanings).
Mathematics
Returns
A character string of rendered Stan model code, sourced from the same generators as the marginal path but with conditional template names.
Notes
- The conditional templates share the same placeholder set as their marginal counterparts, except that anchor-prior placeholders are consumed only in the marginal path (the conditional path drops them from model{}).
- $K$ and $p$ are coerced to integer at entry.
- No errors are raised directly; all are delegated downstream.
Purpose
Shared low-level renderer for the EB Stan template family. Reproduces the placeholder-substitution logic of generate_stan_code() but restricted to EB templates. It (1) translates legacy single-template names to their canonical-piece equivalents, (2) locates the template file in the installed package or falls back to inst/stan/, (3) injects the canonical helpers piece when the // {{CANONICAL_HELPERS}} marker is present, (4) performs all {{...}} substitutions, and (5) aborts with a structured error if any placeholder remains un-substituted.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| template_name | character | Base name of the .stan template file (e.g. "amm_eb_marginal.stan"). |
| prior | list | Prior specification list; the renderer reads prior$theta_ref, prior$sigma_theta_ref, prior$sigma_a, prior$sigma_b, prior$sigma_W, prior$sigma_y, prior$phi. |
| cp_a | logical | Centered-parameterization flag for a. Controls the values substituted for {{A_SCALE}} and {{A_PRIOR}}. |
| cp_W | logical | Centered-parameterization flag for W. Controls the values substituted for {{W_SCALE}} and {{W_PRIOR}}. |
Mathematics
The placeholder substitution map is:
| Placeholder | Value when cp_* = TRUE | Value when cp_* = FALSE |
|---|---|---|
| {{A_SCALE}} | "" | " * sigma_a[1]" |
| {{A_PRIOR}} | "normal(0, sigma_a[1])" | "normal(0, 1)" |
| {{W_SCALE}} | "" | " * sigma_W[1]" |
| {{W_PRIOR}} | "normal(0, sigma_W[1])" | "normal(0, 1)" |

The prior placeholders map directly: {{PRIOR_THETA_REF}} → prior$theta_ref, {{PRIOR_SIGMA_THETA_REF}} → prior$sigma_theta_ref, {{PRIOR_SIGMA_A}} → prior$sigma_a, {{PRIOR_SIGMA_B}} → prior$sigma_b, {{PRIOR_SIGMA_W}} → prior$sigma_W, {{PRIOR_SIGMA_Y}} → prior$sigma_y, {{PRIOR_PHI}} → prior$phi.
Returns
A character string: the fully substituted Stan source code.
Notes
- Template name translation: "amm_eb_marginal.stan" is mapped to "amm_canonical_eb_marginal.stan" and "amm_eb_conditional.stan" is mapped to "amm_canonical_eb_conditional.stan". All other template names (including the KxP templates) pass through unchanged.
- File location: If the effective template name starts with "amm_canonical_", the file is sought in system.file("stan", "_canonical_pieces", ...) with a fallback to file.path("inst", "stan", "_canonical_pieces", ...). Otherwise it is sought in system.file("stan", ...) with a fallback to file.path("inst", "stan", ...).
- Helpers injection: If the template source contains the literal // {{CANONICAL_HELPERS}}, the file amm_canonical_helpers.stan is read from the same _canonical_pieces directory and substituted in place. Templates without this marker (e.g. the KxP EB templates) pass through unchanged.
- Error (template not found): If the resolved template_path does not exist, calls gdpar_abort with class "gdpar_internal_error" and message "Stan template file '<name>' not found.".
- Error (helpers not found): If the helpers piece file does not exist, calls gdpar_abort with class "gdpar_internal_error".
- Error (unsubstituted placeholder): After all substitutions, if the string still contains "{{", the first match of \{\{[A-Za-z0-9_]+\}\} is extracted via regmatches/regexpr and passed to gdpar_abort with class "gdpar_internal_error" and message "Unsubstituted placeholder remains in EB Stan code: <leftover>".
- All gsub calls use fixed = TRUE, so placeholders are treated as literal strings.
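The substitution and leftover check can be sketched in a few lines of base R. render_placeholders_sketch is a hypothetical helper and the two-line template string is an invented Stan fragment used only to exercise the placeholders; the real templates live in the package's stan directory.

```r
# Hypothetical helper: literal {{KEY}} substitution followed by a leftover check.
render_placeholders_sketch <- function(src, values) {
  for (key in names(values)) {
    src <- gsub(paste0("{{", key, "}}"), values[[key]], src, fixed = TRUE)
  }
  if (grepl("{{", src, fixed = TRUE)) {
    leftover <- regmatches(src, regexpr("\\{\\{[A-Za-z0-9_]+\\}\\}", src))
    stop("Unsubstituted placeholder remains in EB Stan code: ", leftover)
  }
  src
}

# Values follow the substitution table above for the non-centered case.
cp_a <- FALSE
values <- list(
  A_SCALE = if (cp_a) "" else " * sigma_a[1]",
  A_PRIOR = if (cp_a) "normal(0, sigma_a[1])" else "normal(0, 1)"
)
template <- "a = a_raw{{A_SCALE}};\na_raw ~ {{A_PRIOR}};"  # invented fragment
code <- render_placeholders_sketch(template, values)
```

Using fixed = TRUE matters here because braces are regex metacharacters; with literal matching the placeholder text needs no escaping.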
Purpose
Implements Step (i) of the EB workflow with the anti-fragility strategy of Charter Section 2.8. Runs cmdstanr::optimize() followed by cmdstanr::laplace() on the marginal EB Stan model with multi_start_M independent random inits, retains the init with the highest log-marginal approximation, applies an adaptive Levenberg–Marquardt ridge if the Hessian-derived covariance is ill-conditioned, and assembles the diagnostics needed by the gdpar_eb_fit$diagnostics_numerical slot.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| model | CmdStanModel | A compiled cmdstanr model object exposing $optimize() and $laplace() methods. |
| stan_data | list | Data list for Stan. Must contain J_groups (integer, number of groups). For path dispatch, may contain p_dim (integer, coordinate dimension) and K_slots (integer, number of K-slots). |
| control | list | Control parameters. Must contain: multi_start_M (integer, number of multi-start inits), optim_algorithm (character, passed to optimize), laplace_draws (integer, number of Laplace draws), kappa_threshold (numeric, condition-number gate). |
| seed | NULL or integer | Base random seed. When non-NULL, per-init seeds are as.integer(seed) + m for optimize and as.integer(seed) + 1000L for Laplace. |
| verbose | logical | When TRUE, emits informational messages about failed inits and multimodality warnings. |
Mathematics
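Multi-start optimization. For $m = 1, \dots, M$ (with $M$ = multi_start_M), optimize() is run from an independent init and its log-posterior value $\ell_m$ is recorded (NA for failed inits).

The best init is selected by the largest finite log-posterior value (lp__ from optimize()): $m^* = \arg\max_{m:\ \ell_m \text{ finite}} \ell_m$.

Laplace approximation. At the best mode, laplace() produces laplace_draws draws; theta_ref_hat is their column mean and the covariance is their sample covariance.

Adaptive Levenberg–Marquardt ridge. If the covariance is ill-conditioned, .gdpar_eb_lm_perturb applies a ridge that grows geometrically until convergence or until ridge_max_iter is reached.

Condition-number gate. The final covariance is accepted only if $\kappa_{\text{post}} \le \kappa_{\text{threshold}}$ and the LM status is not "exhausted".

Multi-start dispersion. Computed over the finite entries of the log-posterior history. A dispersion exceeding 0.05 triggers a multimodality warning when verbose = TRUE.

Path-aware variable extraction. The theta_ref variable names extracted from the Laplace draws depend on the path ($J$ denotes J_groups):

| Path | Condition | Variable pattern | Expected count |
|---|---|---|---|
| Base | $K = 1$, $p = 1$ | theta_ref[1], …, theta_ref[J] (or theta_ref if $J = 1$) | $J$ |
| Path A | $p > 1$ | theta_ref[j,k] for $j = 1, \dots, J$, $k = 1, \dots, p$ | $J \cdot p$ |
| Path B | $K > 1$ | theta_ref_k[j,k] for $j = 1, \dots, J$, $k = 1, \dots, K$ | $J \cdot K$ |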
Returns
A list with components:
| Component | Type | Description |
|---|---|---|
| theta_ref_hat | numeric vector (length equal to the expected count for the path) | Posterior mean of theta_ref from the Laplace draws (colMeans of the draws matrix). |
| theta_ref_se | numeric vector (same length) | Standard errors: square roots of the diagonal of theta_ref_cov. |
| theta_ref_cov | matrix (square, same dimension) | Covariance matrix (possibly ridged). |
| diagnostics | named list | See below. |
The diagnostics list contains:
| Element | Type | Description |
|---|---|---|
| kappa | numeric | Post-ridge condition number $\kappa$. |
| lm_perturbation | numeric | The ridge $\lambda$ applied (lambda_used). |
| lm_n_iter | integer | Number of LM ridge iterations. |
| lm_status | character | Status from .gdpar_eb_lm_perturb: one of "not_needed", "converged", "exhausted". |
| kappa_post_ridge | numeric | Duplicate of kappa (from lm_out$kappa_post). |
| multi_start_dispersion | numeric | Dispersion of the finite log-posterior values; NA if fewer than 2 finite values. |
| marginal_log_lik_history | numeric vector (length multi_start_M) | lp__ from each init; NA for failed inits. |
| best_init_index | integer | The index of the selected (best) init. |
Notes
- Init dispatch: The flag is_multi_or_K is TRUE when stan_data$p_dim > 1L or stan_data$K_slots > 1L. In that case, init_m is set to NULL (cmdstanr's default unconstrained-space random sampler is used). Otherwise, .gdpar_eb_make_random_init(stan_data, seed_offset = m, base_seed = seed) is called. Each multi-start iteration uses a distinct seed offset, preserving reproducibility.
- Optimize call: jacobian = TRUE is always set (required for downstream laplace() to match the unconstrained-scale convention). When init_m is non-NULL, it is wrapped as list(init_m) (single chain). When seed is non-NULL, the per-init seed is as.integer(seed) + m.
- Laplace call: Uses mode = best_opt, jacobian = TRUE, draws = control$laplace_draws. Seed (if non-NULL) is as.integer(seed) + 1000L.
- Error (all inits fail): If best_opt is NULL (every optimize() call failed or returned NULL), calls gdpar_abort with class "gdpar_unsupported_feature_error", message recommending gdpar() (FB), and data = list(history_lp = history_lp).
- Error (Laplace fails): If model$laplace() returns NULL (error caught), calls gdpar_abort with class "gdpar_eb_numerical_error", message about singular/non-PD Hessian at the candidate MAP, and data = list(history_lp, best_idx).
- Error (missing theta_ref variables, Path B): If the number of theta_ref_k[j,k] variables found in the draws does not equal $J \cdot K$, calls gdpar_abort with class "gdpar_internal_error".
- Error (missing theta_ref variables, Path A): If the number of theta_ref[j,k] variables found does not equal $J \cdot p$, calls gdpar_abort with class "gdpar_internal_error".
- Error (missing theta_ref variables, Base): If no theta_ref[...] variables are found and $J = 1$ does not rescue via the bare theta_ref name, calls gdpar_abort with class "gdpar_internal_error" and message "theta_ref variable not found in Laplace draws output.".
- Error (kappa exceeds threshold): If $\kappa_{\text{post}} > \kappa_{\text{threshold}}$ or lm_out$status == "exhausted", calls gdpar_abort with class "gdpar_eb_numerical_error", a detailed message including $\kappa$, threshold, LM status, iteration count, $\lambda$, and smallest eigenvalue, and data containing kappa, eigenvalues, history_lp, lm_status, lm_n_iter, lm_lambda.
- Warning (multimodality): When dispersion > 0.05 and verbose = TRUE, calls gdpar_warn with class "gdpar_diagnostic_warning" and data = list(dispersion, history_lp).
- Covariance computation: If the draws matrix has more than one column, stats::cov(theta_mat) is used; otherwise a $1 \times 1$ matrix from stats::var(theta_mat[, 1]).
- Eigenvalue computation: eigen(theta_cov, symmetric = TRUE, only.values = TRUE) is attempted in a tryCatch; on error returns NA_real_. The minimum eigenvalue is reported in the kappa-exceeds-threshold error message.
- Verbose messages: Failed optimize() calls emit a gdpar_inform with class "gdpar_eb_message" when verbose = TRUE.
- The %||% operator is used for the all_vars fallback (dimnames(draws)$variable %||% character(0L)).
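The bookkeeping around the multi-start loop (best-init selection, per-init seeds, and the draws summary) can be sketched without cmdstanr. All three helpers are hypothetical and operate on plain vectors and matrices; they follow the selection, seeding, and covariance rules stated in the notes above and leave out the dispersion statistic, whose formula is not reproduced here.

```r
# Hypothetical helper: index of the init with the largest finite lp__.
select_best_init_sketch <- function(history_lp) {
  finite <- is.finite(history_lp)
  if (!any(finite)) stop("all multi-start inits failed")
  best_idx <- which(finite)[which.max(history_lp[finite])]
  list(best_init_index = best_idx, best_lp = history_lp[best_idx])
}

# Hypothetical helper: seeds as described (seed + m for optimize, seed + 1000L
# for Laplace); NULL when no base seed is given.
per_init_seeds_sketch <- function(seed, M) {
  if (is.null(seed)) return(NULL)
  list(optimize = as.integer(seed) + seq_len(M), laplace = as.integer(seed) + 1000L)
}

# Hypothetical helper: point estimate and covariance from a draws matrix,
# with the single-column special case noted above.
laplace_summary_sketch <- function(theta_mat) {
  theta_cov <- if (ncol(theta_mat) > 1L) {
    stats::cov(theta_mat)
  } else {
    matrix(stats::var(theta_mat[, 1]), 1L, 1L)
  }
  list(theta_ref_hat = colMeans(theta_mat), theta_ref_cov = theta_cov)
}

best <- select_best_init_sketch(c(-120.5, NA, -118.2, -Inf))
seeds <- per_init_seeds_sketch(42, M = 3L)
set.seed(2)
summ <- laplace_summary_sketch(matrix(rnorm(500), ncol = 1L))
```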