Part IV Function Reference 3
Purpose
Generates a list of random initial values for the Stan HMC sampler in the Empirical Bayes (EB) workflow. The structure of the returned list is conditioned on the flags and dimensions carried in stan_data, so that only parameters relevant to the configured model are initialised.
Arguments
- `stan_data` (list): The data list prepared for Stan. The following fields are consulted:
  - `J_groups` (integer): number of reference-parameter groups $J$.
  - `use_groups` (integer flag, 0/1): whether group-level hyperparameters are active.
  - `use_a` (integer flag, 0/1): whether the $a$ AMM component is active.
  - `J_a` (integer): dimension of the $a$ component.
  - `use_b` (integer flag, 0/1): whether the $b$ AMM component is active.
  - `J_b` (integer): dimension of the $b$ component.
  - `use_W` (integer flag, 0/1): whether the $W$ AMM component is active.
  - `dim_W` (integer): row dimension of the $W$ matrix.
  - `d` (integer): column dimension of the $W$ matrix (latent dimension).
  - `use_dispersion_y` (integer flag, 0/1): whether an observation-level dispersion is active.
  - `use_dispersion_phi` (integer flag, 0/1): whether a $\phi$ dispersion parameter is active.
- `seed_offset` (integer, default `1L`): integer added to `base_seed` to derive the RNG seed.
- `base_seed` (integer or `NULL`, default `NULL`): base seed. If `NULL`, the global RNG state is left untouched.
Mathematics
When `base_seed` is non-`NULL`, the effective seed is

$$\text{rng\_seed} = \text{base\_seed} + \text{seed\_offset}.$$
The draws are:
and, when the corresponding flag is set:
Returns
A named list suitable for passing as init to a Stan sampler. Scalar parameters are wrapped in 1-element arrays via as.array(); W_raw is a matrix; theta_ref, a_raw, and c_b_raw are numeric vectors.
Notes
- When `base_seed` is non-`NULL`, the function calls `set.seed(rng_seed)` and registers an `on.exit` handler that restores the prior `.Random.seed` state in `.GlobalEnv` (if it existed) upon return. If `.Random.seed` did not exist in `.GlobalEnv`, the handler does nothing (the seed set by `set.seed` persists).
- The `on.exit` handler is registered with `add = TRUE`, so it composes with any pre-existing exit handlers.
- Flags are tested with `isTRUE(... == 1L)`. Because `==` compares by value, a scalar that compares equal to `1L` (such as `1L`, `1`, or `TRUE`) activates the flag; `0`, `NA`, `NULL`, and vectors of length greater than one are treated as inactive.
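The seed handling described in these notes can be sketched as follows. This is a minimal illustration, not the package's code: `draw_with_scoped_seed` is a hypothetical helper, and it assumes the effective seed is `as.integer(base_seed) + seed_offset`.

```r
# Hypothetical helper: draw `n` normals under a temporary seed and put the
# caller's global RNG state back on exit.
draw_with_scoped_seed <- function(n, base_seed = NULL, seed_offset = 1L) {
  if (!is.null(base_seed)) {
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    if (had_seed) {
      old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
      # add = TRUE keeps any exit handlers that were registered earlier.
      on.exit(assign(".Random.seed", old_seed, envir = .GlobalEnv), add = TRUE)
    }
    # If no prior state existed, nothing is restored and this seed persists.
    set.seed(as.integer(base_seed) + as.integer(seed_offset))
  }
  stats::rnorm(n)
}

set.seed(123)                 # make sure .Random.seed exists
before <- .Random.seed
a <- draw_with_scoped_seed(3, base_seed = 10L, seed_offset = 1L)
b <- draw_with_scoped_seed(3, base_seed = 10L, seed_offset = 1L)
after <- .Random.seed
```

Two calls with the same `base_seed` and `seed_offset` return the same draws, and the caller's RNG stream is left where it was.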
Purpose
Entry point for the Proposition 7B coverage-discrepancy correction in the EB workflow. In the scalar regime ($p = 1$) it computes the scalar correction constant itself; for $p > 1$ it delegates to `.gdpar_eb_correction_matrix()`. The correction is not applied to the raw draws here; only the scaling object is returned for downstream S3 methods.
Arguments
- `eb_correction` (logical): whether the correction should be applied.
- `laplace_result` (list): result of the Laplace approximation step. Must contain `theta_ref_cov` (a matrix, or at least an indexable object for the `[1L, 1L]` element in the scalar path).
- `stan_data` (list): the Stan data list. Passed through but not directly used in the scalar computation.
- `p` (integer, default `1L`): dimensionality of the reference parameter for the correction.
- `verbose` (logical): whether to emit a diagnostic warning when the correction is disabled.
Mathematics
Scalar form (
For the default identity functional
with
Returns
A list with two elements:
- `applied` (logical): `TRUE` if the correction was successfully computed, `FALSE` otherwise.
- `constant` (numeric scalar): the scalar correction $C_{g,\alpha}$ when `applied = TRUE`; `NA_real_` otherwise.

When $p > 1$, the returned list is whatever `.gdpar_eb_correction_matrix()` produces (with a matrix-valued `constant`).
Notes
- If `eb_correction` is `FALSE` and `verbose` is `TRUE`, a warning is issued via `gdpar_warn()` with class `"gdpar_diagnostic_warning"`, stating that intervals will use nominal coverage and may under-cover by $O(n^{-1})$.
- The marginal variance is extracted as `laplace_result$theta_ref_cov[1L, 1L]` inside a `tryCatch`; any error yields `NA_real_`.
- If the marginal variance is not finite or is $\leq 0$, the function returns `applied = FALSE, constant = NA_real_` silently (no warning).
- For $p > 1$, `p` is coerced to integer before the delegation check.
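The variance guard in these notes might look like the sketch below. The function name and the `constant_fun` argument are hypothetical; `constant_fun` is a placeholder for the Proposition 7B formula, which this example does not attempt to reproduce.

```r
# Hypothetical sketch of the scalar-path guard only.
scalar_correction_guard <- function(laplace_result,
                                    constant_fun = function(v) v) {
  # Any extraction error collapses to NA_real_.
  v <- tryCatch(
    as.numeric(laplace_result$theta_ref_cov[1L, 1L]),
    error = function(e) NA_real_
  )
  # Missing, non-finite, or non-positive variance disables the correction
  # silently (no warning).
  if (length(v) != 1L || !is.finite(v) || v <= 0) {
    return(list(applied = FALSE, constant = NA_real_))
  }
  list(applied = TRUE, constant = constant_fun(v))
}

ok      <- scalar_correction_guard(list(theta_ref_cov = matrix(0.04)))
neg     <- scalar_correction_guard(list(theta_ref_cov = matrix(-1)))
missing <- scalar_correction_guard(list())
broken  <- scalar_correction_guard(list(theta_ref_cov = "a"))
```

Only the first call yields `applied = TRUE`; the other three fall through to the disabled result.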
Purpose
Computes the matrix-valued Proposition 7B* coverage-discrepancy correction for the multivariate regime ($p > 1$). It is called from `.gdpar_eb_apply_correction()` and implements v07b Section 5.1.
Arguments
- `eb_correction` (logical): whether the correction should be applied.
- `laplace_result` (list): Laplace approximation result containing `theta_ref_cov`.
- `stan_data` (list): Stan data list (passed through, not used in computation).
- `p` (integer, default `1L`): dimension of the reference parameter.
- `verbose` (logical): intended for diagnostics (not directly used in the body beyond being accepted).
Mathematics
Matrix form (Proposition 7B*, v07b Section 5.1):
For the default identity functional
with
Returns
A list with two elements:
- `applied` (logical): `TRUE` if the matrix correction was successfully computed.
- `constant` (matrix): the $p \times p$ (or matching `cov_mat` dimension) correction matrix when `applied = TRUE`; an `NA_real_` matrix of appropriate size otherwise.
Notes
- The function aborts silently to `applied = FALSE` with an NA matrix in the following cases:
  - `eb_correction` is not `TRUE`.
  - `laplace_result$theta_ref_cov` is `NULL`, not a matrix, or non-square (extraction wrapped in `tryCatch` returning `NULL` on error).
  - Any element of `cov_mat` is non-finite.
  - Eigenvalues of `cov_mat` (computed via `eigen(..., symmetric = TRUE, only.values = TRUE)`) are non-finite, or any eigenvalue is $< -10^{-10}$ (i.e., the matrix is not positive semi-definite within tolerance).
- When the PSD check fails, the returned NA matrix has dimensions matching `nrow(cov_mat)` / `ncol(cov_mat)`, not necessarily `p`.
- The eigenvalue extraction is wrapped in `tryCatch` returning `NA_real_` on error, which then triggers the non-finite check.
- Downstream S3 methods are expected to fall back to nominal credible intervals when `applied = FALSE`.
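The chain of checks listed above can be sketched as a single predicate. `is_psd_within_tol` is a hypothetical name used for illustration; the tolerance of `1e-10` is taken from the notes.

```r
# Hypothetical predicate mirroring the documented validation order:
# shape, finiteness, then eigenvalues within tolerance.
is_psd_within_tol <- function(cov_mat, tol = 1e-10) {
  if (is.null(cov_mat) || !is.matrix(cov_mat) ||
      nrow(cov_mat) != ncol(cov_mat)) {
    return(FALSE)
  }
  if (!is.numeric(cov_mat) || any(!is.finite(cov_mat))) {
    return(FALSE)
  }
  # An eigen() failure becomes NA_real_, which fails the finiteness test.
  ev <- tryCatch(
    eigen(cov_mat, symmetric = TRUE, only.values = TRUE)$values,
    error = function(e) NA_real_
  )
  all(is.finite(ev)) && all(ev >= -tol)
}

is_psd_within_tol(diag(2))                       # identity: passes
is_psd_within_tol(matrix(c(1, 2, 2, 1), 2))      # eigenvalues 3 and -1: fails
```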
.gdpar_eb_resolve_K_inputs(formula, amm, W, family, formula_set_input, amm_list_input, classic_with_amm_calls, family_is_named_list)
Purpose
Resolves the three possible K-input patterns (formula set, named list of amm_spec, or classic formula with AMM wrapper calls) into a single canonical amm_list_canonical, and promotes the family scope accordingly. This mirrors the K-input dispatch logic of gdpar() and is the EB-path companion of .gdpar_K. The logic is intentionally duplicated rather than refactored to preserve bit-exact behaviour of golden tests.
Arguments
- `formula` (formula or `gdpar_formula_set`): the model formula or formula set.
- `amm` (`amm_spec` or named list of `amm_spec`): the AMM specification(s).
- `W` (matrix or `NULL`): the $W$ matrix passed to AMM construction.
- `family` (`gdpar_family` or named list of `gdpar_family`): the response family specification.
- `formula_set_input` (logical): whether `formula` is a `gdpar_formula_set`.
- `amm_list_input` (logical): whether `amm` is a named list of `amm_spec`.
- `classic_with_amm_calls` (logical): whether the formula RHS contains `a()` / `b()` / `W()` wrapper calls.
- `family_is_named_list` (logical): whether `family` is a named list (heterogeneous K-slot pattern).
Returns
A list with elements:
- `amm_list_canonical` (named list of `amm_spec`): the resolved canonical AMM specifications, one per K-slot.
- `K` (integer): length of `amm_list_canonical`.
- `outcome_name` (character): the name of the outcome variable extracted from the formula.
- `formula_env` (environment): the environment associated with the formula.
- `family_promoted`: the family object after scope promotion (either a promoted `gdpar_family` or a heterogeneous family structure).
- `family_id_k_vector` (integer vector or `NULL`): per-observation family IDs when the heterogeneous path is taken; `NULL` otherwise.
Notes
Three dispatch branches, evaluated in order:
- `formula_set_input` branch: `amm` must be the default `amm_spec()` (checked via `.gdpar_is_default_amm_spec()`); otherwise an error of class `"gdpar_input_error"` is raised. The canonical list is built by `.gdpar_formula_set_to_amm_spec_list(formula, W)`. `outcome_name` and `formula_env` are taken from `formula$outcome` and `formula$env`.
- `amm_list_input` branch: `amm` is used directly as `amm_list_canonical`. Each slot name must be non-empty (checked via `nzchar()`), each entry must inherit from class `"amm_spec"`, and slot names must be unique (`anyDuplicated(...) == 0L`). `formula` must be a two-sided formula (`length(formula) == 3L`). Violations raise `"gdpar_input_error"`. `outcome_name` is `as.character(formula[[2L]])`; `formula_env` is `environment(formula)`.
- Classic (else) branch: `amm` must be the default `amm_spec()`. The first eligible parameter name is extracted from `family`: either `family[[1L]]$param_specs[[1L]]$name` (if `family_is_named_list`) or `family$param_specs[[1L]]$name`. A `gdpar_formula_set` is constructed via `do.call(gdpar_formula_set, args_for_fs)` with the formula named by that parameter, then `.gdpar_formula_set_to_amm_spec_list(fs, W)` builds the canonical list.

After resolution, $K$ is the length of `amm_list_canonical`, and the family is promoted according to $K$:
- $K > 1$: If `family_is_named_list`, calls `.gdpar_resolve_heterogeneous_family_K(family, names(amm_list_canonical))` and unpacks `location_family` and `family_id_k_vector`. Otherwise calls `.gdpar_promote_scope_per_observation(family, names(amm_list_canonical))` with `family_id_k_vector = NULL`.
- $K = 1$: If `family_is_named_list`, raises `"gdpar_input_error"` (heterogeneous path requires $K \geq 2$). Otherwise calls `.gdpar_promote_scope_per_observation(family, k_name)` with `family_id_k_vector = NULL`.
Errors raised (all via gdpar_abort with class = "gdpar_input_error"):
- Formula set path with non-default `amm`.
- Named-list `amm` with empty slot name, non-`amm_spec` entry, or duplicated names.
- Named-list `amm` with `formula` that is not a two-sided formula.
- Classic path with non-default `amm`.
- Heterogeneous family (`family_is_named_list = TRUE`) resolved to $K = 1$.
The data field of the abort is populated for some errors (e.g., list(slot = ..., received = ...) and list(K = K)).
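The named-list checks of the `amm_list_input` branch can be illustrated with a small validator. This is a sketch under stated assumptions: `validate_amm_list` is a hypothetical name, and plain `stop()` stands in for `gdpar_abort` with class `"gdpar_input_error"`.

```r
# Hypothetical validator for a named list of amm_spec objects.
validate_amm_list <- function(amm) {
  nms <- names(amm)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("every slot needs a non-empty name")
  }
  ok_class <- vapply(amm, inherits, logical(1), what = "amm_spec")
  if (!all(ok_class)) {
    stop("every entry must inherit from 'amm_spec'")
  }
  if (anyDuplicated(nms) != 0L) {
    stop("slot names must be unique")
  }
  invisible(length(amm))  # this is K
}

# Minimal stand-in objects carrying only the class attribute.
spec <- structure(list(), class = "amm_spec")
K_ok <- validate_amm_list(list(mu = spec, sigma = spec))
dup  <- tryCatch(validate_amm_list(list(mu = spec, mu = spec)),
                 error = function(e) e)
unnamed <- tryCatch(validate_amm_list(list(spec, spec)),
                    error = function(e) e)
```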
.gdpar_eb_run_K(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)
Purpose
Primary orchestrator for the Empirical-Bayes (`"eb"`) estimation path under the Path B regime ($K$ slots sharing a length-$n$ univariate outcome). It returns an object of class `gdpar_eb_fit`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm_list_canonical` | named list of length $K$ | Canonical AMM (anchor model matrix) specifications. Each element is a list potentially containing `$a` (formula for the $a$ component), `$b` (formula for the $b$ component), and `$W` (pre-specified basis matrix or formula). Names become `slot_names`. |
| `family` | list / character | Response family specification passed to Stan code generators and data assemblers. |
| `data` | data.frame | Data containing the outcome column and all covariates referenced in slot formulas. |
| `prior` | list | Prior specification passed to Stan code generators. |
| `anchor` | various | Anchor specification (scalar, vector, or special keyword) resolved by `resolve_anchor_K`. |
| `outcome_name` | character | Name of the outcome column in `data`. |
| `formula_env` | environment | Environment attached to all internally constructed formulas via `stats::as.formula(..., env = formula_env)`. |
| `family_id_k_vector` | integer vector | Per-slot family identifiers of length $K$, forwarded to `.assemble_stan_data_K`. |
| `skip_id_check` | logical | If `TRUE`, all identifiability checks (per-slot and $K$-level) are skipped and `id_report` is set to `NULL`. |
| `chains` | numeric/integer | Number of MCMC chains for the conditional model; coerced to integer. |
| `iter_warmup` | numeric/integer | Warmup iterations; coerced to integer. |
| `iter_sampling` | numeric/integer | Sampling iterations; coerced to integer. |
| `adapt_delta` | numeric | Stan NUTS `adapt_delta` control parameter. |
| `max_treedepth` | numeric/integer | Stan NUTS maximum tree depth; coerced to integer. |
| `refresh` | numeric/integer | Stan output refresh interval; coerced to integer. |
| `verbose` | logical | Controls `show_messages`, `show_exceptions` in Stan sampling, and verbosity of helper calls. |
| `seed` | integer or `NULL` | Random seed for Stan sampling and Laplace maximization. If non-`NULL`, coerced to integer. |
| `group` | various | Grouping specification for hierarchical structure, resolved by `.resolve_group_argument`. |
| `parametrization` | character | Requested parametrization (`"cp"` for centered, otherwise non-centered). In Path B both `cp_a` and `cp_W` are set uniformly across all $K$ slots. |
| `id_check_rigor` | various | Rigor level forwarded to `.check_identifiability_K` for the $K$-level check. |
| `eb_correction` | logical | Whether to apply the Proposition 7B coverage-discrepancy correction. |
| `laplace_control` | list | Control parameters forwarded to `.gdpar_eb_maximize_marginal`. |
| `call` | call | The original top-level function call, stored in the returned object. |
| `...` | any | Extra named arguments merged into the `sample_args` list passed to cmdstanr's `$sample()` method, potentially overriding defaults. |
Mathematics
The function implements a two-stage Empirical Bayes estimator:
Stage 1: Laplace marginal maximization. The marginal Stan model is generated and compiled. The marginal log-posterior of the anchor parameters is maximized:
where
Stage 2: Conditional MCMC. The conditional Stan model is generated, compiled, and sampled with the anchor held fixed at its Stage 1 estimate (supplied as `theta_ref_k_data`), drawing from:
EB correction (Proposition 7B, scalar form at
where .gdpar_eb_apply_correction.
Identifiability diagnostic test point. For each slot
This avoids testing identifiability at a degenerate zero anchor when a
Returns
A list with S3 class c("gdpar_eb_fit", "list") containing:
| Element | Type / Structure | Description |
|---|---|---|
| `theta_ref_hat` | numeric | Laplace point estimate of the anchor (flat vector). |
| `theta_ref_se` | numeric | Standard error of the Laplace estimate. |
| `conditional_fit` | `CmdStanMCMC` | The cmdstanr fit object from the conditional model. |
| `amm_list_canonical` | named list | The input AMM list (with `$W` slots materialized). |
| `family` | (as supplied) | The input family. |
| `prior` | (as supplied) | The input prior. |
| `design_K` | list | Design structure from `.build_amm_design_K`, containing `Z_a_k_list`, `Z_b_k_list`, `X`, etc. |
| `anchor` | numeric vector | Resolved anchor values returned by `resolve_anchor_K`. |
| `stan_data` | list | Assembled Stan data list from `.assemble_stan_data_K`, augmented with `K_slots` and `p_dim`. |
| `identifiability_report` | named list or `NULL` | Per-slot identifiability reports (named by `slot_names`) with a `K_level` attribute; `NULL` when `skip_id_check = TRUE`. |
| `diagnostics` | (from `compute_diagnostics`) | MCMC diagnostics from `compute_diagnostics`. |
| `diagnostics_numerical` | (from `laplace_result`) | Laplace optimizer diagnostics from `laplace_result$diagnostics`. |
| `parametrization` | list | Resolved parametrization with elements `cp_a` (logical), `cp_W` (logical), `cp_a_per_K` (`NULL`), and `meta` (list with `mode = "eb_K_path_B"`, `note`, `requested`). |
| `group_info` | list or `NULL` | Resolved group information from `.resolve_group_argument`. |
| `correction_applied` | logical | Whether the EB correction was applied. |
| `eb_correction_constant` | (from `.gdpar_eb_apply_correction`) | The correction constant from `.gdpar_eb_apply_correction`. |
| `call` | call | The original function call. |
| `path` | character | Always `"eb"`. |
| `K` | integer | Number of slots. |
| `slot_names` | character | Names of the AMM list elements. |
Notes
Input validation errors (class gdpar_input_error):
- If `outcome_name` is not a column in `data`.
- If the outcome `y` is a matrix or an array with `length(dim(y)) > 1` (Path B requires a length-$n$ univariate vector shared across all $K$ slots).
- If `y` contains any non-finite values: for numeric `y`, any `!is.finite(y)` (NA, NaN, Inf); for non-numeric `y`, any `is.na(y)`.

Identifiability errors (class `gdpar_identifiability_error`):
- If any per-slot check `gdpar_check_identifiability` returns `rep_k$passed != TRUE`, the error `data` field contains `slot` (name) and `report` (the full report).
- If the $K$-level check `.check_identifiability_K` returns `passed != TRUE`, the error `data` field contains `report` (the $K$-level report).
- Both are bypassed when `skip_id_check = TRUE`.
Formula construction:
- The union of all variables across all slots' `$a` and `$b` formulas is collected. If empty, the RHS is `"1"`; otherwise it is `paste(union_vars, collapse = " + ")`.
- The full formula is `outcome_name ~ rhs_str` with `env = formula_env`.
- The RHS is extracted as `formula_full[c(1L, 3L)]` (a one-sided formula) and updated with `~ . + 0` to remove the intercept.
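The three formula-construction steps can be traced in a few lines of base R. `build_rhs_formula` is a hypothetical wrapper, and the slot formulas below are made-up inputs; collecting variables with `all.vars()` is an assumption about how the union is formed.

```r
# Hypothetical wrapper around the documented steps.
build_rhs_formula <- function(slot_formulas, outcome_name,
                              formula_env = parent.frame()) {
  # Union of variables across all slot formulas.
  union_vars <- unique(unlist(lapply(slot_formulas, all.vars)))
  rhs_str <- if (length(union_vars) == 0L) "1" else
    paste(union_vars, collapse = " + ")
  formula_full <- stats::as.formula(paste(outcome_name, "~", rhs_str),
                                    env = formula_env)
  # Keep the `~` operator and the right-hand side: a one-sided formula.
  rhs <- formula_full[c(1L, 3L)]
  # Adding 0 removes the intercept.
  stats::update(rhs, ~ . + 0)
}

rhs <- build_rhs_formula(list(~ x1 + x2, ~ x2 + x3), outcome_name = "y")
vars <- all.vars(rhs)
has_intercept <- attr(stats::terms(rhs), "intercept")
```

The result mentions `x1`, `x2`, and `x3` once each and carries no intercept term.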
W basis materialization:
- For each slot with a non-`NULL` `$W` element, `materialize_W_basis(amm_list_canonical[[k]]$W, p = 1L)` is called in place, mutating `amm_list_canonical`.

Parametrization resolution:
- Both `cp_a` and `cp_W` are set to `identical(parametrization, "cp")`, meaning the same parametrization is applied uniformly across all $K$ slots. The `meta$note` explicitly states that per-slot preflight (`cp_a_per_K`) is queued but not yet implemented.
theta_ref_k_data reshaping:
- `theta_hat` is extracted as `as.numeric(laplace_result$theta_ref_hat)`.
- `J_groups_loc` is read from `stan_data$J_groups` and coerced to integer.
- The if/else branches both produce `matrix(theta_hat, nrow = J_groups_loc, ncol = K, byrow = FALSE)`; the two branches are functionally identical, suggesting a placeholder for future differentiation.
- The resulting matrix is assigned to `stan_data_cond$theta_ref_k_data`, intended for Stan's `array[J_groups] vector[K]` consumer.
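A small example shows what the column-major fill (`byrow = FALSE`) implies for the layout. The values of `theta_hat`, `J_groups_loc`, and `K` are invented for illustration.

```r
# Made-up flat estimate for J_groups = 3 groups and K = 2 slots.
theta_hat <- as.numeric(1:6)
J_groups_loc <- 3L
K <- 2L

# Column-major fill: the first J_groups entries become slot 1,
# the next J_groups entries become slot 2.
theta_ref_k_data <- matrix(theta_hat, nrow = J_groups_loc, ncol = K,
                           byrow = FALSE)
slot_1 <- theta_ref_k_data[, 1]
slot_2 <- theta_ref_k_data[, 2]
```

Each row of the matrix then holds the length-`K` anchor vector for one group.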
Stan model lifecycle:
- The marginal Stan source is generated by `.gdpar_eb_generate_stan_marginal`, written to a tempfile via `write_stan_to_tempfile`, and compiled with `cmdstanr::cmdstan_model`.
- The conditional Stan source is generated by `.gdpar_eb_generate_stan_conditional` and undergoes the same write-and-compile cycle.
- Both tempfile paths are transient (side effect on the filesystem).
Sample arguments:
- The `sample_args` list is constructed with explicit integer coercions for `chains`, `iter_warmup`, `iter_sampling`, `max_treedepth`, and `refresh`.
- `adapt_delta` is passed without coercion.
- `show_messages` and `show_exceptions` are both set to `verbose`.
- `seed` is added only if non-`NULL`.
- Extra arguments from `...` are merged into `sample_args` by name, potentially overriding any of the above defaults.
- Sampling is invoked via `do.call(conditional_model$sample, sample_args)`.
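The merge-then-`do.call` pattern can be sketched without cmdstanr. In this illustration `fake_sample` is a stand-in for `conditional_model$sample` that only echoes its arguments, `merge_sample_args` is a hypothetical helper, and the numeric defaults are arbitrary.

```r
# Stand-in for conditional_model$sample; it only returns what it receives.
fake_sample <- function(...) list(...)

# Hypothetical helper: named extras overwrite same-named defaults.
merge_sample_args <- function(sample_args, ...) {
  extra <- list(...)
  for (nm in names(extra)) {
    if (nzchar(nm)) sample_args[[nm]] <- extra[[nm]]
  }
  sample_args
}

verbose <- FALSE
seed <- 42
sample_args <- list(
  chains = as.integer(4),
  iter_warmup = as.integer(1000),
  iter_sampling = as.integer(1000),
  adapt_delta = 0.95,                 # passed without coercion
  max_treedepth = as.integer(12),
  refresh = as.integer(0),
  show_messages = verbose,
  show_exceptions = verbose
)
if (!is.null(seed)) sample_args$seed <- as.integer(seed)

# Extras override adapt_delta and add parallel_chains.
sample_args <- merge_sample_args(sample_args, adapt_delta = 0.99,
                                 parallel_chains = 2L)
fit <- do.call(fake_sample, sample_args)
```

Because extras win on name clashes, a caller-supplied `adapt_delta` replaces the default silently.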
Group aliasing:
- When `group_info` is non-`NULL`, `.check_group_aliasing_c7` is called for each slot $k$ with a design list containing `Z_a = design_K$Z_a_k_list[[k]]`, `Z_b = design_K$Z_b_k_list[[k]]`, and `X = design_K$X`.

Trailing roxygen block:
- The section concludes with a roxygen `@noRd` documentation block for an internal function implementing the tensor-valued Proposition 7B* correction under $K > 1$ and $p > 1$. The function itself is not defined in this section; its documented signature includes parameters `eb_correction`, `laplace_result_per_slot`, `K`, `p`, and `verbose`, and it returns a list with `applied`, `constant` (3D array $[K, p, p]$), and `slot_dispositions`. This function appears in a subsequent section.
Purpose
Builds a three-dimensional correction tensor for the Path C empirical-Bayes (EB) regime. The tensor scales each slot's reference-parameter covariance (extracted from per-slot Laplace results) by a fixed multiplier and is consumed downstream by S3 coverage methods. If any slot fails validation, the entire correction is disabled and downstream methods fall back to nominal coverage.
Arguments
- `eb_correction`: logical scalar (or any value testable by `isTRUE`). When not `TRUE`, the function short-circuits and returns a disabled result with no slot processing.
- `laplace_result_per_slot`: list of length `K`. Each element is a Laplace-fit result object expected to contain a `theta_ref_cov_k` field holding the $p \times p$ covariance of the reference parameters for that slot.
- `K`: integer-ish scalar; number of slots. Coerced to integer. Default `2L`.
- `p`: integer-ish scalar; number of coordinates per slot. Coerced to integer. Default `1L`.
- `verbose`: logical scalar; when `TRUE` and at least one slot fails, a diagnostic warning is emitted via `gdpar_warn`.
Mathematics
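For each slot $k = 1, \dots, K$, the tensor slice is

$$C_{k,\cdot,\cdot} = \kappa_{\alpha}\, \Sigma_k, \qquad \kappa_{\alpha} = 1.92,$$

where $\Sigma_k$ is the $p \times p$ `theta_ref_cov_k` matrix for slot $k$.

The positive-semidefinite check uses eigenvalues $\lambda_i$ computed via `eigen(..., symmetric = TRUE, only.values = TRUE)`; a slot is rejected if any $\lambda_i$ is non-finite or negative beyond the numerical tolerance.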
Returns
A named list with three components:
- `applied`: logical scalar. `TRUE` only if every slot passed validation and the tensor was filled.
- `constant`: numeric array of dimensions `c(K, p, p)`. When `applied = TRUE`, filled with the scaled covariances; otherwise filled with `NA_real_` (the "empty tensor").
- `slot_dispositions`: named character vector of length `K` (names are `seq_len(K)` coerced to character). Each entry is one of: `"disabled"` (correction globally off), `"missing"` (covariance absent or wrong shape), `"non_finite"` (covariance contains non-finite entries), `"non_psd"` (eigenvalue check failed), or `"ok"`.
Notes
- The multiplier `kappa_alpha_95` is hardcoded to `1.92` (not the exact 1.959964… standard-normal 97.5th percentile).
- When `eb_correction` is not `TRUE`, the returned `slot_dispositions` are all `"disabled"` and names are set via `setNames(rep("disabled", K), seq_len(K))`.
- When any slot fails, `any_failed` is set, the function returns `applied = FALSE` with an empty (NA) tensor, and, if `verbose`, a warning of class `"gdpar_diagnostic_warning"` is emitted via `gdpar_warn` summarising the count and unique failure types.
- The PSD eigen-decomposition is wrapped in `tryCatch`; an error from `eigen` yields `NA_real_` values, which then trigger the `"non_psd"` disposition.
- The covariance shape check requires `is.matrix(cov_k)`, `nrow == p`, and `ncol == p`.
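A compact sketch of the per-slot validation and fill follows. `correction_tensor_sketch` is a hypothetical name; it takes the covariance matrices directly rather than Laplace result objects, omits the `"disabled"` short-circuit and the warning, and assumes a PSD tolerance of `1e-10`, which the description above does not state for this function.

```r
# Hypothetical sketch: validate each slot, then scale by the fixed multiplier.
correction_tensor_sketch <- function(cov_per_slot, K, p, kappa = 1.92,
                                     tol = 1e-10) {  # tol is an assumption
  empty <- function() array(NA_real_, dim = c(K, p, p))
  disp <- setNames(rep("ok", K), seq_len(K))
  out <- empty()
  for (k in seq_len(K)) {
    cov_k <- cov_per_slot[[k]]
    if (!is.matrix(cov_k) || nrow(cov_k) != p || ncol(cov_k) != p) {
      disp[k] <- "missing"; next
    }
    if (any(!is.finite(cov_k))) {
      disp[k] <- "non_finite"; next
    }
    ev <- tryCatch(
      eigen(cov_k, symmetric = TRUE, only.values = TRUE)$values,
      error = function(e) NA_real_
    )
    if (any(!is.finite(ev)) || any(ev < -tol)) {
      disp[k] <- "non_psd"; next
    }
    out[k, , ] <- kappa * cov_k
  }
  # One failed slot disables the whole correction.
  if (any(disp != "ok")) {
    return(list(applied = FALSE, constant = empty(), slot_dispositions = disp))
  }
  list(applied = TRUE, constant = out, slot_dispositions = disp)
}

good <- correction_tensor_sketch(list(diag(2), 2 * diag(2)), K = 2L, p = 2L)
bad  <- correction_tensor_sketch(list(diag(2), matrix(c(1, 2, 2, 1), 2)),
                                 K = 2L, p = 2L)
```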
Purpose
Local helper that allocates a fresh `c(K, p, p)` array filled with `NA_real_`, used as the default/disabled constant tensor.
Arguments
None. Captures K and p from the enclosing .gdpar_eb_correction_tensor scope.
Returns
A numeric array of dimensions c(K, p, p) filled entirely with NA_real_.
Notes
Defined as a closure; not accessible outside its parent function.
Purpose
Constructs the per-slot multivariate (ragged) design matrices for the Path C $K \times p$ regime. It iterates over the `K` slots of a canonicalised `amm_spec` list, enforces homogeneous $p \geq 2$ across slots, and calls `.build_amm_design_multi()` for each slot. The returned structure is the direct input consumed by `.assemble_stan_data_KxP()`.
Arguments
- `amm_list_canonical`: named list of length $K \geq 2$ of `amm_spec` objects. Each object must carry a `$p` field (defaulting to `1L` via `%||%` if absent) that is $\geq 2$ and identical across all slots.
- `data`: data frame containing the variables referenced by the per-slot AMM specifications. Validated by `assert_data_frame()`.
- `formula_rhs`: two-sided formula identifying the covariate columns of `data` used as the linear factor $x$. Passed through verbatim to `.build_amm_design_multi()` for each slot.
Returns
A named list with:
- `K`: integer scalar; number of slots.
- `p`: integer scalar; the homogeneous coordinate dimension (taken from the first slot).
- `slot_names`: character vector of length `K`; the `names()` of `amm_list_canonical`.
- `design_per_slot`: named list of length `K`. Each entry is the list returned by `.build_amm_design_multi(a_k, data, formula_rhs)` for that slot's `amm_spec`.
Notes
- Aborts with class `"gdpar_internal_error"` via `gdpar_abort` if `amm_list_canonical` is not a list or has length `< 2`.
- Aborts if any element lacks a non-empty name (`is.null(slot_names)` or `any(!nzchar(slot_names))`).
- Aborts if the per-slot `$p` values are not all $\geq 2$ or not all identical. The error message includes the comma-separated `p_per_slot` vector.
- Each slot is validated with `assert_inherits(a_k, "amm_spec", ...)` before delegation.
- The `$p` extraction uses `a$p %||% 1L`, so a missing `$p` field is treated as `1L`, which then triggers the homogeneous-$p \geq 2$ abort.
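The interaction between the `%||%` default and the homogeneity check can be seen in a short sketch. `check_homogeneous_p` is a hypothetical name, plain lists stand in for `amm_spec` objects, and `stop()` stands in for `gdpar_abort`; `%||%` is defined locally so the example also runs on R versions that lack it in base.

```r
# Null-coalescing operator, defined locally for portability.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical check: every slot must carry the same p >= 2.
check_homogeneous_p <- function(amm_list) {
  p_per_slot <- vapply(amm_list, function(a) as.integer(a$p %||% 1L),
                       integer(1))
  if (any(p_per_slot < 2L) || length(unique(p_per_slot)) != 1L) {
    stop("p must be >= 2 and identical across slots; got: ",
         paste(p_per_slot, collapse = ", "))
  }
  unname(p_per_slot[1L])
}

p_ok <- check_homogeneous_p(list(mu = list(p = 3L), sigma = list(p = 3L)))
# A slot without $p falls back to 1L and therefore fails the check.
p_missing <- tryCatch(
  check_homogeneous_p(list(mu = list(p = 3L), sigma = list())),
  error = function(e) conditionMessage(e)
)
```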
.assemble_stan_data_KxP(design_KxP, family, amm_list_canonical, y_matrix, theta_anchor_kp, group_id = NULL, path = c("EB", "FB"), cp_W = FALSE)
Purpose
Assembles the complete named-list data block consumed by the Path C Stan templates (`amm_eb_marginal_KxP.stan` / `amm_eb_conditional_KxP.stan` for the EB path; `amm_canonical_pmulti_KxP.stan` for the FB path). Dispatches on `path` to enforce or lift the Sub-phase 8.6.D first-iteration restrictions: EB hardcodes `use_W = 0` and restricts `stan_id` to $\{1, 3\}$, while FB enables $W$ and accepts a wider set of `stan_id` values (see Notes).
Arguments
- `design_KxP`: list returned by `.build_amm_design_KxP()`. Must contain `$design_per_slot`, `$K`, and `$p`.
- `family`: promoted `gdpar_family` object (validated by `assert_inherits`). Must carry a `$stan_id` field and a `$name` field.
- `amm_list_canonical`: named list of `K` `amm_spec` objects with $p \geq 2$ per slot. Used to extract per-slot `use_a` / `use_b` flags and (FB path) $W$ metadata.
- `y_matrix`: numeric or integer matrix of outcomes, shape $n \times p$.
- `theta_anchor_kp`: numeric matrix of shape $K \times p$; per-slot per-coordinate anchors on the linear-predictor scale.
- `group_id`: optional integer vector of length $n$. Resolved via `.resolve_group_id()`.
- `path`: character scalar; one of `"EB"` or `"FB"` (resolved by `match.arg`). Default `"EB"`.
- `cp_W`: logical scalar. Present in the signature but not referenced anywhere in the function body.
Mathematics
Per-slot per-coordinate design matrices are packed into 4D arrays with zero-padding:
where
and
For the FB path with
with
The use_a_k / use_b_k flags are computed as
and analogously for use_b_k.
Returns
A named list. The base list (returned for both paths) contains:
| Field | Type | Description |
|---|---|---|
| `n` | integer | Number of observations. |
| `K` | integer | Number of slots. |
| `p` | integer | Coordinate dimension. |
| `family_id_k_vector` | integer vector (length `K`) | Homogeneous `stan_id` replicated `K` times. |
| `inv_link_id_per_slot` | integer vector (length `K`) | Computed by `.gdpar_compute_inv_link_id_per_slot()`. |
| `use_a_k` | integer vector (length `K`) | Per-slot $a$ flag. |
| `use_b_k` | integer vector (length `K`) | Per-slot $b$ flag. |
| `use_W` | integer scalar | `0L` (EB) or `as.integer(any_W)` (FB). |
| `J_a_max` | integer | Maximum $J_a$ over slots and coordinates. |
| `J_b_max` | integer | Maximum $J_b$ over slots and coordinates. |
| `J_a_per_kp` | integer matrix ($K \times p$) | Per-slot per-coord $J_a$. |
| `J_b_per_kp` | integer matrix ($K \times p$) | Per-slot per-coord $J_b$. |
| `Z_a_kp` | numeric array (4D) | Padded $Z_a$ design matrices. |
| `Z_b_kp` | numeric array (4D) | Padded $Z_b$ design matrices. |
| `y_real` | numeric matrix ($n \times p$) | Real-valued outcomes (or zeros if `needs_real` is `FALSE`). |
| `y_int` | integer matrix ($n \times p$) | Integer-valued outcomes (or zeros if `needs_int` is `FALSE`). |
| `theta_anchor_kp` | list of `K` double vectors (each length `p`) | Row-wise decomposition of the input matrix. |
| `use_dispersion_y_k` | integer vector (length `K`) | Always zero in both paths. |
| `use_dispersion_phi_k` | integer vector (length `K`) | Always zero in both paths. |
| `use_groups` | (from `.resolve_group_id`) | Group flag. |
| `J_groups` | (from `.resolve_group_id`) | Number of groups. |
| `group_id` | (from `.resolve_group_id`) | Group index vector. |
| `K_slots` | integer | Redundant copy of `K`. |
| `p_dim` | integer | Redundant copy of `p`. |
For the FB path only, the list is extended (c(base_list, ...)) with:
| Field | Type | Description |
|---|---|---|
| `dim_W` | integer | Total $W$ basis dimension, or `0L` when $W$ is inactive. |
| `d` | integer | Number of columns in the shared design matrix $X$. |
| `W_per_kj_dim` | integer | Per-(slot, coord) basis dimension. |
| `X` | numeric matrix ($n \times d$) | Shared linear-factor design matrix. |
| `W_type_id` | (from `.gdpar_resolve_W_stan_data`) | Basis type identifier. |
| `W_n_knots_full` | (from `.gdpar_resolve_W_stan_data`) | Knot count. |
| `W_knots_full` | (from `.gdpar_resolve_W_stan_data`) | Knot vector. |
| `W_degree` | (from `.gdpar_resolve_W_stan_data`) | Spline degree. |
Notes
- EB path restrictions: `use_W` is hardcoded to `0L`; `stan_id` must be in $\{1, 3\}$ (Gaussian or Negative Binomial), otherwise a `"gdpar_unsupported_feature_error"` is raised. If any slot declares `W != NULL` on the EB path, a `"gdpar_unsupported_feature_error"` is raised.
- FB path extensions: `stan_id` must be in $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$; otherwise a `"gdpar_unsupported_feature_error"` is raised. $W$ is enabled if any slot declares it; the first such slot's `$W` object defines the basis metadata (shared globally).
- Outcome validation: For count families (`stan_id` $\in \{3, 10, 11, 12, 13\}$), every entry of `y_matrix` must be a finite, non-negative integer; otherwise a `"gdpar_input_error"` is raised. For continuous families (`stan_id` $\in \{1, 5, 6, 7, 8, 9\}$), every entry must be finite.
- `y_real` / `y_int` population: `needs_real` is `TRUE` for `stan_id` $\in \{1, 5, 6, 7, 8, 9\}$; `needs_int` is `TRUE` for `stan_id` $\in \{3, 10, 11, 12, 13\}$. The unused matrix is zero-filled.
- `theta_anchor_kp` is validated as a $K \times p$ matrix and then decomposed row-wise into a list of `K` length-`p` double vectors via `lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ]))`.
- `family_id_k_vector` is `rep(as.integer(stan_id), K)`, homogeneous across slots regardless of path.
- `use_dispersion_y_k` / `use_dispersion_phi_k` are zero vectors in both paths (the FB comment notes future B9.7+ may lift this).
- `cp_W` is accepted as a parameter but never read.
- Internal errors (class `"gdpar_internal_error"`) are raised for: invalid `design_KxP` structure; `K < 2` or `p < 2`; `y_matrix` not a matrix; `y_matrix` column count mismatch; `theta_anchor_kp` shape mismatch; FB path with `dim_W <= 0` when `use_W == 1`.
- Calls `.resolve_group_id()`, `.gdpar_compute_inv_link_id_per_slot()`, and (FB only) `.gdpar_resolve_W_stan_data()`.
- The `pad_to` local helper (see below) handles zero-padding of design matrices.
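Two of the steps in these notes, the count-family outcome check and the row-wise anchor decomposition, are easy to illustrate. `is_valid_count_matrix` and `anchor_rows` are hypothetical names; the first returns a logical instead of raising `"gdpar_input_error"`.

```r
# Hypothetical check: every entry finite, non-negative, and integer-valued.
is_valid_count_matrix <- function(y) {
  is.numeric(y) && all(is.finite(y)) && all(y >= 0) && all(y == floor(y))
}

# Row-wise decomposition of a K x p matrix into K length-p double vectors,
# using the lapply() expression quoted in the notes.
anchor_rows <- function(theta_anchor_kp) {
  K <- nrow(theta_anchor_kp)
  lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ]))
}

counts_ok  <- is_valid_count_matrix(matrix(c(0, 1, 2, 3), nrow = 2))
counts_bad <- is_valid_count_matrix(matrix(c(0, 1.5, 2, 3), nrow = 2))
rows <- anchor_rows(matrix(1:6, nrow = 2))  # K = 2 slots, p = 3 coordinates
```

The list form matters because each row becomes its own vector when the data is handed to Stan.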
Purpose
Zero-pads a design matrix `z` to `target_cols` columns. If `target_cols` is `0L`, returns an `n_rows` $\times$ 0 matrix. If `z` already has at least `target_cols` columns, returns `z` unchanged. Otherwise right-pads with a zero matrix.
Arguments
- `z`: numeric matrix; the per-slot per-coordinate design matrix to pad.
- `target_cols`: integer scalar; the target column count ($J_{a,\max}$ or $J_{b,\max}$).
- `n_rows`: integer scalar; the number of rows to use when `target_cols == 0L` (i.e., $n$).
Returns
A numeric matrix with nrow(z) rows and max(ncol(z), target_cols) columns (or n_rows rows and 0 columns when target_cols == 0L).
Notes
Defined as a local closure inside .assemble_stan_data_KxP; captures nothing from the enclosing scope (all inputs are explicit arguments). When target_cols > 0L but ncol(z) >= target_cols, z is returned as-is (no truncation occurs even if z has more columns than the target).
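The behaviour described above fits in a few lines. This is a reconstruction from the description, not the package source, and the example matrices are invented.

```r
# Reconstruction of the padding rule from its documented behaviour.
pad_to <- function(z, target_cols, n_rows) {
  # No columns requested: return an n_rows x 0 matrix.
  if (target_cols == 0L) return(matrix(0, nrow = n_rows, ncol = 0L))
  # Already wide enough: return unchanged (never truncates).
  if (ncol(z) >= target_cols) return(z)
  # Otherwise append zero columns on the right.
  cbind(z, matrix(0, nrow = nrow(z), ncol = target_cols - ncol(z)))
}

z <- matrix(1, nrow = 4, ncol = 2)
padded <- pad_to(z, target_cols = 5L, n_rows = 4L)
empty  <- pad_to(z, target_cols = 0L, n_rows = 4L)
wide   <- pad_to(matrix(1, nrow = 4, ncol = 6), target_cols = 5L, n_rows = 4L)
```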
Purpose
Internal helper that fabricates a random initial-values list for the cmdstanr optimizer / Laplace approximation in the Path C K×p EB workflow. The returned list conforms to the cmdstanr automatic packing convention for the theta_ref_kp parameter (a 3D array [J, K, p]) and conditionally emits the auxiliary scale / raw-coefficient parameters that the K×p Stan template exposes when group structure or free a/b coefficients are active.
Arguments
- `stan_data`: list. The Stan data environment. The following fields are consulted (via null-coalescing `%||%`): `K_slots` (fallback `K`), `p_dim` (fallback `p`), `J_groups` (fallback `1L`), `use_groups` (fallback `0L`), `use_a_k`, `use_b_k`, `J_a_per_kp`, `J_b_per_kp`.
- `seed_offset`: integer scalar, default `1L`. Integer added to `base_seed` to derive the per-start RNG seed, enabling distinct inits across multi-start iterations.
- `base_seed`: integer scalar or `NULL`. When non-`NULL`, the function seeds the global RNG with `as.integer(base_seed) + seed_offset` and restores the prior `.Random.seed` state on exit. When `NULL`, no seeding is performed and the global RNG state is untouched.
Mathematics
The RNG seed is

$$\text{rng\_seed} = \text{base\_seed} + \text{seed\_offset}.$$
Draws produced (all i.i.d. unless noted):
- `theta_ref_kp[g,k,c]` $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[J, K, p]$.
- When `use_groups == 1`:
  - `mu_theta_ref_kp[1,k,c]` $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[1, K, p]$.
  - `sigma_theta_ref_kp[1,k,c]` $= |\mathcal{N}(0.5,\, 0.05^2)|$, shape $[1, K, p]$.
- When any `use_a_k == 1`:
  - `sigma_a_k[s]` $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $s = 1, \dots, n_{\sigma_a}$, where $n_{\sigma_a}$ is the count of slots $k$ satisfying `use_a_k[k] == 1` and $\sum_{c} \mathbf{1}\{J_{a,\text{per\_kp}}[k,c] > 0\} > 0$.
  - `a_raw[j]` $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{a,\text{per\_kp}}[k,c]$.
- When any `use_b_k == 1`:
  - `sigma_b_k[k]` $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $k = 1, \dots, K$.
  - `c_b_kp_raw[j]` $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{b,\text{per\_kp}}[k,c]$.
Returns
A named list. Always contains theta_ref_kp (a 3D numeric array of dim c(J, K, p)). Conditionally also contains:
- `mu_theta_ref_kp`: 3D array `[1, K, p]` (only when `use_groups == 1`).
- `sigma_theta_ref_kp`: 3D array `[1, K, p]` (only when `use_groups == 1`).
- `sigma_a_k`: 1D numeric array of length `n_sigma_a` (only when `any_use_a == 1` and `n_sigma_a > 0`).
- `a_raw`: numeric vector of length `total_J_a_free` (only when `any_use_a == 1` and `total_J_a_free > 0`).
- `sigma_b_k`: 1D numeric array of length `K` (only when `any_use_b == 1`).
- `c_b_kp_raw`: numeric vector of length `total_J_b_free` (only when `any_use_b == 1` and `total_J_b_free > 0`).
Notes
- Side effect: when `base_seed` is non-`NULL`, the global `.Random.seed` is overwritten via `set.seed(rng_seed)` and restored on function exit through an `on.exit` handler. If `.Random.seed` did not previously exist in `.GlobalEnv`, the handler performs no restoration (the seed state is left as set).
- The slot-free-a mask is computed by coercing `stan_data$J_a_per_kp` to an integer matrix of shape `K × p` (row-major), then taking `rowSums(.jap > 0L) > 0L` intersected with `use_a_k == 1L`. This mirrors the `n_sigma_a` transformed-data quantity of the K×P Stan template; when every slot carries free `a` coefficients, $n_{\sigma_a} = K$ and the draw count is identical to the unconditional case.
- `sigma_a_k` and `sigma_b_k` are wrapped with `as.array` to ensure the 1D-array typing expected by cmdstanr init packing.
- No errors are raised by this function; malformed `stan_data` would propagate as errors from downstream coercions (e.g. `as.integer`, `matrix`).
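The seed side effect described in the notes follows a common save, seed, restore pattern. The sketch below illustrates it under stated assumptions: `draw_with_local_seed` is a hypothetical name, and the body only draws a few normals rather than building a full init list.

```r
# Hypothetical helper illustrating the save/seed/restore pattern.
draw_with_local_seed <- function(n, base_seed = NULL, seed_offset = 1L) {
  if (!is.null(base_seed)) {
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    if (had_seed) {
      old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
      # Restore the caller's RNG state when the function exits.
      on.exit(assign(".Random.seed", old_seed, envir = .GlobalEnv), add = TRUE)
    }
    # If no prior state existed, nothing is restored and this seed persists.
    set.seed(as.integer(base_seed) + seed_offset)
  }
  stats::rnorm(n, mean = 0, sd = 0.1)
}

set.seed(1)  # make sure .Random.seed exists before the calls
before <- get(".Random.seed", envir = .GlobalEnv)
x1 <- draw_with_local_seed(3, base_seed = 10L, seed_offset = 1L)
x2 <- draw_with_local_seed(3, base_seed = 10L, seed_offset = 1L)
x3 <- draw_with_local_seed(3, base_seed = 10L, seed_offset = 2L)
after <- get(".Random.seed", envir = .GlobalEnv)
```

Here `x1` and `x2` coincide because they share the same effective seed, `x3` differs because of its offset, and the caller's RNG state is the same before and after the three calls.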
Purpose
Step (i) of the EB workflow under Path C, specialized for the K×p regime. Runs a multi-start joint Laplace approximation over the full `theta_ref_kp` anchor tensor of shape `[J_groups, K, p]`, selects the best init by marginal log-likelihood, draws from the Laplace approximation, and extracts per-slot $p \times p$ covariance blocks in the form consumed by `.gdpar_eb_correction_tensor()`.
Arguments
- `model`: cmdstanr model object. Must expose `$optimize` and `$laplace` methods.
- `stan_data`: list. Stan data list; must contain `J_groups`, `K_slots`, `p_dim`, plus whatever fields `.gdpar_eb_make_random_init_KxP` requires.
- `control`: list. Must contain at least: `multi_start_M` (integer, number of starts), `optim_algorithm` (passed to `model$optimize`), `laplace_draws` (integer, number of Laplace draws), `kappa_threshold` (numeric, condition-number gate), and any fields consumed by `.gdpar_eb_lm_perturb`.
- `seed`: integer scalar or `NULL`. Base seed for reproducibility; propagated to both the init generator and cmdstanr.
- `verbose`: logical. Controls emission of informational messages via `gdpar_inform` and `gdpar_warn`.
Mathematics
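Multi-start optimization. For each start $m = 1, \dots, M$, with $M$ = `control$multi_start_M`, a random init is generated and passed to `model$optimize` (seeded from `seed` when `seed` is non-`NULL`). The marginal log-likelihood of each start is $\ell_m$, the `lp__` value extracted from that start's optimum, and the selected start is $m^\star = \arg\max_m \ell_m$ over finite values,
where inits that errored are skipped (their $\ell_m$ is recorded as `NA_real_`).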
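Laplace approximation. Given the best start $m^\star$, `control$laplace_draws` draws are taken from the Laplace approximation via `model$laplace` (seeded from `seed` when `seed` is non-`NULL`).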
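Posterior mean of the anchor tensor: with $S$ Laplace draws $\theta_{\mathrm{ref}}^{(s)}$,

$$\hat{\theta}_{\mathrm{ref}}[g,k,c] = \frac{1}{S} \sum_{s=1}^{S} \theta_{\mathrm{ref}}^{(s)}[g,k,c].$$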
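Per-slot covariance. For slot $k$, the sample covariance of that slot's $J \cdot p$ draw columns is computed and its $J$ diagonal $p \times p$ group blocks are averaged into a single matrix $\Sigma_k$,
where $\Sigma_k$ is then passed to `.gdpar_eb_lm_perturb`, yielding a ridge-perturbed $\tilde{\Sigma}_k$ together with its condition number $\kappa_k$, the ridge $\lambda_k$, an iteration count, and a status string.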
Dispersion diagnostic. Let
Returns
A list with three top-level components:
- `theta_ref_kp_hat`: 3D numeric array of dim `c(J, K, p)`, the posterior-mean anchor tensor.
- `laplace_result_per_slot`: named list of length `K` with names `paste0("slot_", 1:K)`. Each entry is a list with element `theta_ref_cov_k`, a $p \times p$ numeric matrix (the ridge-perturbed, group-averaged slot covariance).
- `diagnostics`: list with:
  - `kappa_per_slot`: numeric vector of length `K` (post-ridge condition numbers).
  - `lm_lambda_per_slot`: numeric vector of length `K` (LM ridge lambda used per slot).
  - `lm_n_iter_per_slot`: integer vector of length `K` (LM iterations per slot).
  - `lm_status_per_slot`: character vector of length `K` (LM status per slot; `"not_needed"` when no perturbation was required).
  - `multi_start_dispersion`: numeric scalar or `NA_real_`.
  - `marginal_log_lik_history`: numeric vector of length `M` (the $\ell_m$ values, with `NA_real_` for failed starts).
  - `best_init_index`: integer scalar $m^\star$.
Notes
- Multi-start loop: each start's `model$optimize` call is wrapped in `tryCatch`; on error, a `verbose`-gated `gdpar_inform` message of class `gdpar_eb_message` is emitted and the start is recorded as `NULL` (skipped via `next`). The `lp__` extraction is itself wrapped in `tryCatch`, returning `NA_real_` on failure.
- Best-init selection: a start becomes the new best if `best_idx` is `NA`, or if its $\ell_m$ is finite and strictly greater than the current best's recorded value (which may be `NA`). This means the first non-errored start with finite $\ell_m$ is always selected as a fallback.
- All-starts-failed abort: if `best_opt` remains `NULL` after the loop, `gdpar_abort` is invoked with class `c("gdpar_unsupported_feature_error")` and `data = list(history_lp = history_lp)`, recommending `gdpar()` (FB) instead.
- Laplace failure abort: if `model$laplace` errors (caught by `tryCatch` returning `NULL`), `gdpar_abort` is invoked with class `c("gdpar_eb_numerical_error")` and `data = list(history_lp = history_lp, best_idx = best_idx)`.
- Missing-variable abort: the expected `theta_ref_kp[g,k,c]` variable names are enumerated via `expand.grid(g = 1:J, k = 1:K, c = 1:p)` (row-major in `(g, k, c)`). If the intersection with `dimnames(draws)$variable` does not cover all expected names, `gdpar_abort` is invoked with class `c("gdpar_internal_error")`.
- Condition-number gate: after per-slot LM perturbation, any slot with finite $\kappa_k > \text{control\$kappa\_threshold}$ or LM status `"exhausted"` triggers `gdpar_abort` of class `c("gdpar_eb_numerical_error")` with a detailed `data` list carrying per-slot kappa, lambda, n_iter, status, and the full `history_lp`.
- Dispersion warning: when `dispersion > 0.05` and `verbose` is `TRUE`, `gdpar_warn` of class `c("gdpar_diagnostic_warning")` is emitted with `data = list(dispersion, history_lp)`, flagging possible multimodality.
- Draws handling: uses `posterior::subset_draws` and `posterior::as_draws_matrix`; the `%||%` operator guards `dimnames(draws)$variable`.
- Group-block indexing: within a slot's column subset, the $g$-th group block is assumed to occupy contiguous indices `((g-1)*p + 1):((g-1)*p + p)`, consistent with the `(g, k, c)` row-major enumeration from `expand.grid`.
- Single-draws-column slot: when a slot has only one column in `mat_kp` (i.e. $J \cdot p = 1$ for that slot), the covariance is computed as `matrix(stats::var(...), 1, 1)` rather than `stats::cov`.
- External dependencies: calls `.gdpar_eb_lm_perturb`, `gdpar_inform`, `gdpar_abort`, `gdpar_warn` (all defined elsewhere in the package).
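The best-init selection rule in the notes can be sketched as a small loop. This is an illustration only: `select_best_start` and its `errored` argument are hypothetical, and treating a non-finite incumbent value as always beatable by a finite $\ell_m$ is an assumption about how the `NA` comparison is resolved.

```r
# Hypothetical sketch of the best-init rule (not the package source).
# history_lp: numeric vector of lp__ values, NA_real_ where extraction failed.
# errored:    logical vector, TRUE where the optimizer call itself errored.
select_best_start <- function(history_lp, errored) {
  best_idx <- NA_integer_
  for (m in seq_along(history_lp)) {
    if (errored[m]) next                      # errored starts are skipped
    lp <- history_lp[m]
    best_lp <- if (is.na(best_idx)) NA_real_ else history_lp[best_idx]
    # Assumption: a finite lp always beats a non-finite incumbent value.
    better <- is.finite(lp) && (!is.finite(best_lp) || lp > best_lp)
    if (is.na(best_idx) || better) best_idx <- m
  }
  best_idx
}

idx_a <- select_best_start(c(NA, -10, -5, -7), errored = c(TRUE, FALSE, FALSE, FALSE))
idx_b <- select_best_start(c(NA_real_, -3),    errored = c(FALSE, FALSE))
idx_c <- select_best_start(c(NA_real_, NA_real_), errored = c(TRUE, TRUE))
```

In the first call start 3 wins with the largest finite value; in the second, start 1 is taken as a fallback and then replaced by start 2; in the third, every start errored and the index stays `NA`, which corresponds to the all-starts-failed abort.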
Purpose
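Internal helper that resolves the user-supplied anchor argument into a canonical $K \times p$ matrix. Four forms are admissible: a numeric scalar, a $K \times p$ matrix, the string `"prior_mean"`, and the string `"empirical_y"`.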
Arguments
- `anchor`: `numeric` scalar, `matrix` of shape $K \times p$, or `character` scalar in `{"prior_mean", "empirical_y"}`. The unresolved anchor specification.
- `family`: a family object (list-like) exposing a `linkfun` function in its location slot; used only when `anchor == "empirical_y"`.
- `y_matrix`: `numeric` `matrix` of shape $n \times p$ containing the multivariate outcome; consumed only for the `"empirical_y"` branch via `colMeans`.
- `K`: `integer`/`numeric` scalar; number of slots (rows of the resolved anchor).
- `p`: `integer`/`numeric` scalar; number of coordinates per slot (columns of the resolved anchor).
- `verbose`: `logical` scalar; when `TRUE`, an informational message is emitted for the `"empirical_y"` branch via `gdpar_inform`.
Mathematics
For the scalar branch the resolved matrix is
For "prior_mean",
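For `"empirical_y"`, let $\bar{y}_c$ be the $c$-th column mean of `y_matrix` and let $g$ denote `family$linkfun`. Then $A_{1,c} = g(\bar{y}_c)$ for $c = 1, \dots, p$, and $A_{k,c} = 0$ for $k = 2, \dots, K$.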
Returns
A `matrix` of mode `double` with `nrow = K` and `ncol = p`. In the `"empirical_y"` branch, row 1 holds the link-transformed column means of `y_matrix` and all other rows are 0.
Notes
- Scalar branch: requires `is.numeric(anchor)`, `length(anchor) == 1L`, and `is.finite(anchor)`; otherwise falls through.
- Matrix branch: requires `is.matrix(anchor)`, `nrow(anchor) == K`, `ncol(anchor) == p`. If any entry is non-finite, raises an error of class `c("gdpar_input_error", ...)` via `gdpar_abort` with the message "Argument 'anchor' as a matrix must contain only finite values."
- Character branch: only the two exact strings `"prior_mean"` and `"empirical_y"` are recognized. For `"empirical_y"`, `family$linkfun` is applied per column inside a `tryCatch`; on error, `gdpar_abort` is invoked with class `"gdpar_input_error"` and a formatted message naming the offending column index and the captured condition message.
- When `verbose` is `TRUE` and the `"empirical_y"` branch is taken, a message of class `c("gdpar_anchor_message", ...)` is emitted via `gdpar_inform` describing the computed anchor (formatted to 4 digits) and noting that all other slots are anchored at 0.
- Fall-through (no recognized mode): `gdpar_abort` is called with class `"gdpar_input_error"` and `data = list(received = anchor, K = K, p = p)`, with the message enumerating the four admissible forms.
- No side effects beyond messages and errors; the function is pure with respect to its inputs.
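The `"empirical_y"` branch described above can be sketched as follows. This is a minimal illustration, not the package source: `sketch_anchor_empirical_y` is a hypothetical name, the link is passed directly as a function rather than through a family object, and `stop()` stands in for `gdpar_abort`.

```r
# Hypothetical sketch of the "empirical_y" branch (not the package source).
sketch_anchor_empirical_y <- function(y_matrix, linkfun, K) {
  p <- ncol(y_matrix)
  y_bar <- colMeans(y_matrix)
  # All slots start anchored at 0; only row 1 is overwritten below.
  anchor <- matrix(0, nrow = K, ncol = p)
  for (c_idx in seq_len(p)) {
    anchor[1L, c_idx] <- tryCatch(
      linkfun(y_bar[c_idx]),
      error = function(e) {
        # stop() is a stand-in for gdpar_abort() with class "gdpar_input_error".
        stop(sprintf("linkfun failed for column %d: %s",
                     c_idx, conditionMessage(e)), call. = FALSE)
      }
    )
  }
  anchor
}

y <- cbind(c(1, 2, 3), c(10, 20, 30))
A <- sketch_anchor_empirical_y(y, linkfun = log, K = 3L)
```

With the log link, row 1 of `A` holds `log(2)` and `log(20)`, and rows 2 and 3 stay at 0.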
.gdpar_eb_run_KxP(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)
Purpose
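Internal orchestrator for the Path C Empirical-Bayes (EB) pipeline in the K×p regime (resolved $p \ge 2$). It validates the inputs, resolves the anchor, runs the Laplace stage and the conditional HMC stage, and assembles the `gdpar_eb_fit` result object.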
Arguments
- `amm_list_canonical`: named `list` of length $K$ of canonicalized `amm_spec` objects. Each element is expected to expose fields `$a`, `$b`, `$W`, `$dims`, and `$p`. Names become `slot_names`.
- `family`: family object/list; passed to `.resolve_anchor_KxP`, `.gdpar_eb_check_stan_id_for_path`, `.assemble_stan_data_KxP`, and the Stan-code generators. Must expose `linkfun` (used by the anchor resolver).
- `data`: `data.frame` containing the outcome matrix-column and all RHS variables referenced by the AMM specifications.
- `prior`: prior specification object; forwarded to the Stan-code generators `.gdpar_eb_generate_stan_marginal` and `.gdpar_eb_generate_stan_conditional`.
- `anchor`: unresolved anchor specification; forwarded to `.resolve_anchor_KxP`.
- `outcome_name`: `character` scalar naming the matrix-column in `data` that holds the $n \times p$ outcome.
- `formula_env`: `environment` used as the enclosure for `stats::as.formula` constructions.
- `family_id_k_vector`: (declared in the signature but not referenced in the body of this section) per-slot family-id vector.
- `skip_id_check`: `logical` scalar. When `TRUE`, identifiability checking is skipped entirely and `id_report` is set to `NULL`.
- `chains`: `integer`/`numeric` scalar; number of MCMC chains, coerced via `as.integer`.
- `iter_warmup`: `integer`/`numeric` scalar; warmup iterations, coerced via `as.integer`.
- `iter_sampling`: `integer`/`numeric` scalar; sampling iterations, coerced via `as.integer`.
- `adapt_delta`: `numeric` scalar; CmdStan `adapt_delta` setting (passed uncoerced).
- `max_treedepth`: `integer`/`numeric` scalar; CmdStan `max_treedepth`, coerced via `as.integer`.
- `refresh`: `integer`/`numeric` scalar; CmdStan `refresh`, coerced via `as.integer`.
- `verbose`: `logical` scalar; controls informational messages and the `show_messages`/`show_exceptions` settings passed to CmdStan.
- `seed`: `NULL` or `integer`/`numeric` scalar; PRNG seed for both Laplace mode-finding and HMC sampling.
- `group`: group specification forwarded to `.resolve_group_argument`.
- `parametrization`: `character` scalar; currently only `"cp"` is detected (both `cp_a` and `cp_W` are set to the same value); other values yield `FALSE` for both.
- `id_check_rigor`: (declared in the signature but not referenced in the body of this section) rigor level for identifiability checks.
- `eb_correction`: correction specification forwarded to `.gdpar_eb_correction_tensor`.
- `laplace_control`: `list` of control parameters forwarded to `.gdpar_eb_maximize_marginal_KxP`.
- `call`: the original call; stored verbatim in the returned object.
- `...`: extra named arguments spliced into the `sample_args` list passed to `conditional_model$sample`, overriding defaults.
Mathematics
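Let $K$ be the number of slots (`length(amm_list_canonical)`), $p$ the resolved per-slot coordinate count, and $J_{\mathrm{groups}}$ the number of groups.
The anchor matrix $A \in \mathbb{R}^{K \times p}$ is obtained from `.resolve_anchor_KxP`. The Laplace stage produces a mode estimate $\hat{\theta}_{\mathrm{ref}} \in \mathbb{R}^{J_{\mathrm{groups}} \times K \times p}$
together with per-slot covariance blocks $\Sigma_k \in \mathbb{R}^{p \times p}$, from which the standard errors are $\mathrm{se}[g,k,c] = \sqrt{\max(\Sigma_k[c,c],\, 0)}$ for every group $g$.
If $\Sigma_k$ is not available as a matrix for some slot, that slot's standard errors are left as `NA_real_`.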
Returns
A list with S3 class c("gdpar_eb_fit", "list") containing:
- `theta_ref_kp_hat`: the $J_{\mathrm{groups}} \times K \times p$ array of Laplace mode estimates (from `laplace_result$theta_ref_kp_hat`).
- `theta_ref_kp_se`: `array` of shape $J_{\mathrm{groups}} \times K \times p$ of standard errors derived from per-slot covariance diagonals.
- `theta_ref_kp_cov_per_slot`: `list` of length $K$ of per-slot covariance matrices (`theta_ref_cov_k`).
- `conditional_fit`: the `cmdstanr` fit object returned by `conditional_model$sample(...)`.
- `amm_list_canonical`: the input `amm_list_canonical` (unchanged).
- `family`: the input `family`.
- `prior`: the input `prior`.
- `design_KxP`: the design object returned by `.build_amm_design_KxP`.
- `anchor`: the resolved $K \times p$ anchor matrix.
- `stan_data`: the Stan data list returned by `.assemble_stan_data_KxP`.
- `identifiability_report`: `NULL` when `skip_id_check` is `TRUE`; otherwise a named `list` of length $K$ of per-slot reports (entries may be `NULL` if a per-slot check errored).
- `diagnostics`: object returned by `compute_diagnostics(fit_cond, verbose = verbose)`.
- `diagnostics_numerical`: `laplace_result$diagnostics` (numerical diagnostics from the Laplace stage).
- `parametrization`: `list` with elements `cp_a` (logical), `cp_W` (logical), `cp_a_per_K` (`NULL`), and `meta` (a `list` with `mode = "eb_KxP_path_C"`, a `note` string, and `requested = list(parametrization = parametrization)`).
- `group_info`: value returned by `.resolve_group_argument` (may be `NULL`).
- `correction_applied`: `correction$applied`.
- `correction_tensor_constant`: `correction$constant`.
- `correction_tensor_dispositions`: `correction$slot_dispositions`.
- `call`: the input `call`.
- `path`: `"eb_KxP"`.
- `K`: integer number of slots.
- `p`: resolved per-slot coordinate count.
- `slot_names`: `names(amm_list_canonical)`.
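The `theta_ref_kp_se` entry is built from the per-slot covariance diagonals, with the same per-slot vector reused for every group. A minimal sketch, assuming a hypothetical helper name `sketch_se_array` and a plain list of covariance matrices as input:

```r
# Hypothetical sketch of the SE construction (not the package source).
sketch_se_array <- function(cov_per_slot, J, p) {
  K <- length(cov_per_slot)
  se <- array(NA_real_, dim = c(J, K, p))
  for (k in seq_len(K)) {
    cov_k <- cov_per_slot[[k]]
    if (is.matrix(cov_k)) {
      # Clamp tiny negative diagonal entries to zero before the square root.
      se_k <- sqrt(pmax(diag(cov_k), 0))
      # The same per-slot SE vector is replicated across all groups.
      for (g in seq_len(J)) se[g, k, ] <- se_k
    }
    # Slots without a covariance matrix keep NA_real_.
  }
  se
}

covs <- list(slot_1 = diag(c(0.04, 0.09)), slot_2 = NULL)
se <- sketch_se_array(covs, J = 2L, p = 2L)
```

Slot 1 yields standard errors 0.2 and 0.3 for both groups, while slot 2, which has no covariance matrix, stays `NA_real_`.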
Notes
- Homogeneous-p requirement. Per-slot $p$ values are extracted via `a$p %||% 1L`. If `any(p_per_slot != p_resolved)`, an error of class `c("gdpar_unsupported_feature_error", ...)` is raised with `data = list(K = K, p_per_slot = p_per_slot)`, deferring heterogeneous-$p$ support to "Block 9.x".
- Minimum $p$. If `p_resolved < 2L`, an error of class `c("gdpar_internal_error", ...)` is raised, indicating a dispatcher routing mistake.
- Stan-id coverage. `.gdpar_eb_check_stan_id_for_path(family, K, p_resolved)` is called before any data validation; this is the gatekeeper for the supported `(family, K, p)` triples in the first iteration.
- Outcome presence and shape. `outcome_name` must be a column of `data`; otherwise an error of class `c("gdpar_input_error", ...)` is raised. The outcome object must be a `matrix` (not a vector/factor); otherwise an error of class `c("gdpar_input_error", ...)` is raised with `data = list(received_class = class(y_obj), outcome_name = outcome_name)`. Its column count must equal `p_resolved`; otherwise an error of class `c("gdpar_input_error", ...)` is raised with `data = list(received_ncol = ncol(y_obj), p_resolved = p_resolved)`.
- Outcome finiteness. For numeric outcomes, non-finite entries (`NA`, `NaN`, `Inf`) are flagged via `!is.finite`; for non-numeric outcomes, `is.na` is used. Any non-finite entry triggers an error of class `c("gdpar_input_error", ...)` reporting the count.
- RHS union. The union of `all.vars` over each slot's `$a`, `$b`, and (if present) each element of `$dims`'s `$a`/`$b` is collected. If empty, the RHS string defaults to `"1"`; otherwise it is `paste(union_vars, collapse = " + ")`. A full formula `outcome_name ~ rhs` is constructed in `formula_env`, then the RHS is extracted as `formula_full[c(1L, 3L)]` and updated with `~ . + 0` to drop the intercept.
- W component disabled. If any slot declares a non-`NULL` `$W`, an error of class `c("gdpar_unsupported_feature_error", ...)` is raised (per decision D39), directing the user to remove the `W()` wrapper or defer to Block 9.x.
- Identifiability checks. When `skip_id_check` is `TRUE`, `id_report` is `NULL`. When `FALSE`, a `verbose` informational message of class `c("gdpar_eb_message", ...)` is emitted noting that the K-level joint check is deferred. Per slot, the slot's own RHS variables are gathered (analogously to the union above), a per-slot RHS formula is built in `formula_env`, and `gdpar_check_identifiability` is called with `theta_ref_init` set to `anchor_value[k, 1L]` (or `1` when that anchor entry is near zero in absolute value, threshold `1e-8`, and the slot has a non-`NULL` `$b`). Each per-slot call is wrapped in `tryCatch`; on error, a `verbose` message of class `c("gdpar_eb_message", ...)` is emitted and the slot's report is set to `NULL`.
- Group resolution. `.resolve_group_argument(group, data, n = nrow(y_matrix), verbose = verbose)` is called; if it returns non-`NULL`, its `$group_id` is forwarded to `.assemble_stan_data_KxP` as `group_id`.
- Parametrization. Both `cp_a` and `cp_W` are set to `identical(parametrization, "cp")`; `cp_a_per_K` is `NULL`. The `meta$note` documents that the first iteration ships with NCP hardcoded in the Stan templates and that per-slot preflight is queued for Block 9.x.
- Stan model generation and compilation. `.gdpar_eb_generate_stan_marginal(prior, cp_a, cp_W, K, p, family)` produces the marginal source, written to a tempfile via `write_stan_to_tempfile` and compiled with `cmdstanr::cmdstan_model`. The same pattern is used for the conditional source via `.gdpar_eb_generate_stan_conditional`.
- Laplace stage. `.gdpar_eb_maximize_marginal_KxP(model = marginal_model, stan_data = stan_data, control = laplace_control, seed = seed, verbose = verbose)` returns `laplace_result`, expected to contain `theta_ref_kp_hat` (a 3D array) and `laplace_result_per_slot` (a list of length $K$ whose elements carry `theta_ref_cov_k`).
- Conditional stage. `stan_data_cond` is a shallow copy of `stan_data` augmented with `theta_ref_kp_data = laplace_result$theta_ref_kp_hat`. A source comment notes that cmdstanr's automatic packing accepts an R 3D array of shape $[J_{\mathrm{groups}}, K, p]$ for the Stan declaration `array[J_groups, K] vector[p] theta_ref_kp_data`. Sampling is invoked via `do.call(conditional_model$sample, sample_args)`; `sample_args` includes `data`, `chains`, `iter_warmup`, `iter_sampling`, `adapt_delta`, `max_treedepth`, `refresh`, `show_messages = verbose`, `show_exceptions = verbose`, and (if non-`NULL`) `seed`. Extra arguments from `...` are spliced in by name, potentially overriding the defaults.
- Diagnostics. `compute_diagnostics(fit_cond, verbose = verbose)` is called on the conditional fit; its return value is stored as `diagnostics`. The Laplace stage's own diagnostics are stored as `diagnostics_numerical`.
- Correction tensor. `.gdpar_eb_correction_tensor(eb_correction, laplace_result_per_slot, K, p_resolved, verbose)` is called; its `$applied`, `$constant`, and `$slot_dispositions` are stored.
- SE construction. A $J_{\mathrm{groups}} \times K \times p$ array of `NA_real_` is allocated and filled per slot from `sqrt(pmax(diag(cov_k), 0))` when `cov_k` is a matrix, else left as `NA_real_`. The same per-slot SE vector is replicated across all groups (per decision D43 = (a)).
- Side effects. Writes two Stan source files to tempfiles; compiles two CmdStan models; runs Laplace optimization and HMC sampling (which may produce console output when `verbose` is `TRUE`); may emit `gdpar_inform` messages. No global state is mutated.
- S3 class. The returned object is assigned class `c("gdpar_eb_fit", "list")`, enabling dispatch on `gdpar_eb_fit` methods downstream.
- Unused formal arguments. `family_id_k_vector` and `id_check_rigor` appear in the signature but are not referenced in this section's body; they are preserved for API stability and downstream/future use.
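The RHS-union step from the notes above can be sketched with base R formula tools. This is a minimal illustration under stated assumptions: `sketch_rhs_union` is a hypothetical name, and each slot is represented by a single one-sided formula instead of the `$a`/`$b`/`$dims` fields of an `amm_spec` object.

```r
# Hypothetical sketch of the RHS-union step (not the package source).
sketch_rhs_union <- function(slot_formulas, outcome_name,
                             formula_env = parent.frame()) {
  # Collect every variable referenced by any slot, without duplicates.
  union_vars <- unique(unlist(lapply(slot_formulas, all.vars)))
  rhs <- if (length(union_vars) == 0L) "1" else paste(union_vars, collapse = " + ")
  formula_full <- stats::as.formula(paste(outcome_name, "~", rhs),
                                    env = formula_env)
  rhs_only <- formula_full[c(1L, 3L)]   # drop the response, keep `~ rhs`
  stats::update(rhs_only, ~ . + 0)      # remove the intercept
}

f <- sketch_rhs_union(list(~ x1, ~ x2 + x1), outcome_name = "Y")
```

The result is a one-sided formula over `x1` and `x2` whose terms object reports no intercept.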
.gdpar_param_spec(name, link, did_status, did_condition, did_reference, support, prior_canonical_kind, scope, family_role)
Purpose
Internal constructor that builds a single structural-parameter specification object (class gdpar_param_spec). Each such object describes one parameter of a statistical family under the canonical contract established in the multi-parametric extension scoping (Block 8 Session 1, decision 1C): a family is represented as a list of marginal parameter specifications.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `name` | character scalar | Short canonical name of the parameter (e.g. `"mu"`, `"sigma"`, `"phi"`, `"pi"`, `"p"`). |
| `link` | character scalar | Name of the link function to apply; passed to `.gdpar_link_funcs(link)`. Must be one of `"identity"`, `"log"`, `"logit"`. |
| `did_status` | character scalar | Identifiability status under Lemma 1B. One of `"holds"`, `"holds_under_condition"`, or `"user_responsible"`. |
| `did_condition` | character scalar or `NA_character_` | Human-readable description of the condition under which identifiability holds (relevant only when `did_status` is `"holds_under_condition"`). |
| `did_reference` | character scalar | Citation or pointer to the formal justification for the identifiability claim (e.g. `"Block 1, Section 6.4 (Lemma 1B)"`). |
| `support` | character scalar | Natural support of the parameter on the real line or a constrained domain. One of `"real_line"`, `"positive_real"`, `"unit_interval"`, `"bounded_open"`. |
| `prior_canonical_kind` | character scalar | Identifier for the canonical prior kind (decision 2b-iii), e.g. `"mu"`, `"log_sigma"`, `"logit_p"`, `"log_phi"`, `"log_shape"`, `"log_nu"`, `"power_p"`, `"logit_pi"`. |
| `scope` | character scalar | Scope of the parameter in the model hierarchy. One of `"per_observation"` (varies by observation) or `"population"` (shared across observations). Decision 2a-iii. |
| `family_role` | character scalar | Structural role of the parameter within the family. Must be one of the values returned by `.gdpar_known_family_roles()`. |
Returns
A list of class c("gdpar_param_spec", "list") with the following named elements:
| Element | Value |
|---|---|
| `name` | The `name` argument. |
| `link` | The `link` argument. |
| `linkfun` | The link function (closure) returned by `.gdpar_link_funcs(link)`. |
| `inv_link` | The inverse link function (closure) returned by `.gdpar_link_funcs(link)`. |
| `did_status` | Identifiability status string. |
| `did_condition` | Identifiability condition string or `NA_character_`. |
| `did_reference` | Identifiability reference string. |
| `support` | Support string. |
| `prior_canonical_kind` | Prior kind identifier string. |
| `scope` | Scope string. |
| `family_role` | Role string. |
Notes
- The link functions (`linkfun`, `inv_link`) are obtained by calling `.gdpar_link_funcs(link)`. If an unsupported link name is supplied, that helper will call `gdpar_abort()`.
- The class vector is `c("gdpar_param_spec", "list")`, enabling S3 dispatch on `"gdpar_param_spec"` first.
- No validation of `did_status`, `support`, `scope`, or `family_role` against known values is performed inside this constructor; such validation is the responsibility of callers or upstream validators.
Purpose
Returns the complete character vector of all recognised family_role values that a gdpar_param_spec entry may carry. Used by validators inside .gdpar_param_spec() and by codegen branches that emit family-specific Stan likelihood blocks.
Arguments
None.
Returns
A character vector of length 6: `c("location", "scale", "shape", "df", "mixture_pi", "power")`.

| Value | Meaning |
|---|---|
| `"location"` | Position/location parameter ($\mu$). |
| `"scale"` | Dispersion / scale parameter ($\sigma$, $\phi$). |
| `"shape"` | Shape parameter (Gamma shape, when relevant). |
| `"df"` | Degrees of freedom ($\nu$, Student-t). |
| `"mixture_pi"` | Mixture probability (ZIP / ZINB / Hurdle). |
| `"power"` | Power parameter (Tweedie $p$). |
Notes
- Pure function with no side effects.
- The ordering of elements is fixed by the source code and may be relied upon for positional indexing.
Purpose
Link function factory. Given the name of a link, returns a list containing both the link function and its inverse. Centralizes the link-function switch used by the family and parameter-spec constructors.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `link` | character scalar | Name of the link function. Must be one of `"identity"`, `"log"`, or `"logit"`. |
Mathematics
For each supported link:
- identity: $\mathrm{linkfun}(\mu) = \mu$, $\mathrm{inv\_link}(\eta) = \eta$
- log: $\mathrm{linkfun}(\mu) = \log(\mu)$, $\mathrm{inv\_link}(\eta) = \exp(\eta)$
- logit: $\mathrm{linkfun}(\mu) = \log\!\left(\dfrac{\mu}{1-\mu}\right)$, $\mathrm{inv\_link}(\eta) = \dfrac{1}{1 + \exp(-\eta)}$
Returns
A named list with two elements:
| Element | Type | Meaning |
|---|---|---|
| `linkfun` | function | The link function, mapping $\mu$ to $\eta$. |
| `inv_link` | function | The inverse link function, mapping $\eta$ back to $\mu$. |
Notes
- If `link` is not one of the three recognised names, the function calls `gdpar_abort()` with message `"Internal error: unsupported link '<link>'."` and error class `"gdpar_internal_error"`. This abort occurs inside the `switch()` for `inv_link` first; the `linkfun` switch is never reached.
- All returned closures are anonymous functions defined at call time.
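A link factory of this shape can be sketched in plain R. This is an illustration rather than the package implementation: `sketch_link_funcs` is a hypothetical name and `stop()` stands in for `gdpar_abort()`, which is defined elsewhere in the package.

```r
# Hypothetical stand-in for the documented factory; stop() replaces gdpar_abort().
sketch_link_funcs <- function(link) {
  unsupported <- function() {
    stop(sprintf("Internal error: unsupported link '%s'.", link), call. = FALSE)
  }
  # The inverse link is resolved first, so an unknown name fails here.
  inv_link <- switch(link,
    identity = function(eta) eta,
    log      = function(eta) exp(eta),
    logit    = function(eta) 1 / (1 + exp(-eta)),
    unsupported()
  )
  linkfun <- switch(link,
    identity = function(mu) mu,
    log      = function(mu) log(mu),
    logit    = function(mu) log(mu / (1 - mu)),
    unsupported()
  )
  list(linkfun = linkfun, inv_link = inv_link)
}

lf <- sketch_link_funcs("logit")
eta <- lf$linkfun(0.25)        # log(0.25 / 0.75)
mu_back <- lf$inv_link(eta)    # round trip back to 0.25
bad <- tryCatch(sketch_link_funcs("probit"), error = function(e) e)
```

The round trip through `linkfun` and `inv_link` recovers the original value, and an unrecognised name such as `"probit"` produces an error condition.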
Purpose
Returns the canonical list of `gdpar_param_spec` objects describing every structural parameter that a built-in family admits as eligible for an individual specification. The first element is always the per-observation location parameter; subsequent elements are population-level auxiliary parameters in their canonical scope. The returned list is the registry of eligible parameters, not a subset selected for a particular model; promotion of auxiliary parameters to per-observation scope is handled downstream by `.gdpar_promote_scope_per_observation()`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `name` | character scalar | Built-in family name. Must be one of: `"gaussian"`, `"poisson"`, `"neg_binomial_2"`, `"bernoulli"`, `"beta"`, `"gamma"`, `"student_t"`, `"tweedie"`, `"zip"`, `"zinb"`, `"hurdle_poisson"`, `"hurdle_neg_binomial_2"`. |
| `link` | character scalar | Link function name to use for the location (`mu`) parameter; forwarded to `.gdpar_param_spec()`. |
Returns
A list of `gdpar_param_spec` objects (each of class `c("gdpar_param_spec", "list")`). The list length and contents depend on `name`:
| Family | Parameters (in order) | `link` applied to |
|---|---|---|
| `"gaussian"` | `mu` (location), `sigma` (scale) | `mu` |
| `"poisson"` | `mu` (location) | `mu` |
| `"neg_binomial_2"` | `mu` (location), `phi` (scale) | `mu` |
| `"bernoulli"` | `mu` (location) | `mu` |
| `"beta"` | `mu` (location), `phi` (scale) | `mu` |
| `"gamma"` | `mu` (location), `shape` (shape) | `mu` |
| `"student_t"` | `mu` (location), `sigma` (scale), `nu` (df) | `mu` |
| `"tweedie"` | `mu` (location), `phi` (scale), `p` (power) | `mu` |
| `"zip"` | `mu` (location), `pi` (mixture_pi) | `mu` |
| `"zinb"` | `mu` (location), `phi` (scale), `pi` (mixture_pi) | `mu` |
| `"hurdle_poisson"` | `mu` (location), `pi` (mixture_pi) | `mu` |
| `"hurdle_neg_binomial_2"` | `mu` (location), `phi` (scale), `pi` (mixture_pi) | `mu` |
All location parameters receive the user-supplied link. All auxiliary parameters use their own fixed link (always "log" for positive-real parameters, "logit" for unit-interval parameters, "identity" for the Tweedie power parameter).
Notes
- If `name` does not match any recognised family, the function calls `gdpar_abort()` with message `"Internal error: no canonical param_specs for family '<name>'."` and error class `"gdpar_internal_error"`.
- Every location parameter has `did_status = "holds"`, `did_condition = NA_character_`, and `scope = "per_observation"`.
- Every auxiliary parameter has `did_status = "holds_under_condition"` with a family-specific `did_condition` string and `scope = "population"`.
- The `prior_canonical_kind` for auxiliary parameters is determined by the family-specific convention (e.g. `"log_sigma"` for Gaussian `sigma`, `"log_phi"` for NB `phi`, `"logit_pi"` for ZIP `pi`).
- For `zip` and `zinb`, the `did_reference` includes a literature citation (`Lambert (1992)` and `Greene (1994)` respectively). For `hurdle_poisson` and `hurdle_neg_binomial_2`, the reference includes `Mullahy (1986)`.
- The tweedie `p` parameter has `support = "bounded_open"` and `prior_canonical_kind = "power_p"`, distinguishing it from all other auxiliary parameters, which use `"positive_real"` or `"unit_interval"`.
- This function is a pure lookup; it does not mutate any global state.
Purpose
Implements the did_override plasticity argument of gdpar_family() (D2 of sub-phase 8.3.5a, 2026-05-21). Allows the user to keep the canonical likelihood and links of a built-in family but replace the identifiability descriptors of one or more parameter slots with their own declaration. Returns a new list of param-specs with the overrides applied; the input list is not mutated.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `param_specs` | list of `gdpar_param_spec` objects | The canonical parameter specifications to modify. |
| `did_override` | `NULL` or a named list | If `NULL`, a no-op. Otherwise a named list whose names must be a subset of the parameter names in `param_specs`. Each entry is itself a named list with names from `{"did_status", "did_condition", "did_reference"}`. |
| `family_name` | character scalar | Family name used only for error messages. |
Returns
List of gdpar_param_spec objects with the user-supplied identifiability descriptor overrides applied.
Notes
- The docstring is present in this section but the function body is not; the implementation follows in a subsequent section of `R/families.R`.
- Validation requirements (documented but not visible in source here): `did_override` must be `NULL` or a named list whose names are a subset of the parameter names in the family's `param_specs`. Each entry must be a list whose names are a subset of `c("did_status", "did_condition", "did_reference")`; unknown fields raise an error. `did_status` must be one of `"holds"`, `"holds_under_condition"`, `"user_responsible"`.
- Design rationale: D2 of the sub-phase 8.3.5 scoping rejected registering a built-in inside `.gdpar_K_custom_patterns()` because that would blur the contract boundary between "package canonizes the likelihood" and "user declares D-ID". The override keeps the likelihood and links first-class while letting the user adjust identifiability semantics for their design.
Purpose
Applies user-supplied overrides to the identifiability (D-ID) descriptors of one or more parameter specifications (param_specs) for a given family. This allows users to modify the default D-ID status, condition, and reference without altering the likelihood or link structure. It is an internal helper used during family construction.
Arguments
- `param_specs`: A list of parameter specification objects (each a list with at least a `name` field). These represent the canonical parameters of the family.
- `did_override`: Either `NULL` (no overrides) or a named list keyed by slot names (matching `param_specs[*]$name`). Each value must be a named list with optional fields: `did_status`, `did_condition`, `did_reference`.
- `family_name`: A character scalar identifying the family (used in error messages).
Mathematics
No explicit mathematical formulas; implements input validation and slot-wise patching of D-ID descriptors.
Returns
The modified param_specs list with any applicable D-ID overrides applied.
Notes
- Input validation is strict: `did_override` must be a named list with no empty names. Each slot entry must be a named list containing only allowed fields.
- Allowed fields: `"did_status"`, `"did_condition"`, `"did_reference"`.
- Allowed values for `did_status`: `"holds"`, `"holds_under_condition"`, `"user_responsible"`.
- Raises a `gdpar_input_error` via `gdpar_abort()` if any validation fails, providing details about the offending slot or field.
- If `did_override` is `NULL`, returns `param_specs` unchanged.
- Patches the function's local copy of `param_specs` for each slot present in `did_override`. Because R lists have copy-on-modify semantics, the caller's list is not mutated; the patched list is the return value.
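The validation and slot-wise patching described in these notes can be sketched as follows. This is a minimal illustration, not the package source: `sketch_apply_did_override` is a hypothetical name, `stop()` stands in for `gdpar_abort()`, and the parameter specs are plain lists rather than `gdpar_param_spec` objects.

```r
# Hypothetical sketch of slot-wise D-ID patching; stop() replaces gdpar_abort().
sketch_apply_did_override <- function(param_specs, did_override) {
  if (is.null(did_override)) return(param_specs)
  allowed_fields <- c("did_status", "did_condition", "did_reference")
  allowed_status <- c("holds", "holds_under_condition", "user_responsible")
  spec_names <- vapply(param_specs, function(s) s$name, character(1))
  for (slot in names(did_override)) {
    idx <- match(slot, spec_names)
    if (is.na(idx)) stop("Unknown parameter slot: ", slot, call. = FALSE)
    entry <- did_override[[slot]]
    bad <- setdiff(names(entry), allowed_fields)
    if (length(bad) > 0L) {
      stop("Unknown field(s): ", paste(bad, collapse = ", "), call. = FALSE)
    }
    if (!is.null(entry$did_status) && !(entry$did_status %in% allowed_status)) {
      stop("Invalid did_status for slot ", slot, call. = FALSE)
    }
    # Assigning into param_specs modifies the function's local copy only.
    param_specs[[idx]][names(entry)] <- entry
  }
  param_specs
}

specs <- list(list(name = "mu", did_status = "holds"),
              list(name = "sigma", did_status = "holds_under_condition"))
patched <- sketch_apply_did_override(
  specs, list(sigma = list(did_status = "user_responsible"))
)
rejected <- tryCatch(
  sketch_apply_did_override(specs, list(tau = list(did_status = "holds"))),
  error = function(e) e
)
```

After the call, `patched` carries the overridden status for `sigma` while `specs` in the calling scope is unchanged, and an override keyed by an unknown slot name is rejected.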
Purpose
S3 print method for objects of class gdpar_param_spec. Provides a human-readable summary of a single parameter specification.
Arguments
- `x`: An object of class `gdpar_param_spec`.
- `...`: Additional arguments (unused; required for S3 generic compatibility).
Mathematics
None.
Returns
Invisibly returns the input x.
Notes
- Exported as an S3 method for the base `print` generic.
- Output includes: `name`, `link`, `family_role`, `scope`, `support`, `did_status`, `did_condition` (if not `NA`), and `prior_canonical_kind`.
- Each field is printed on a separate line with fixed-width alignment.
Purpose
Constructs a standard (built-in) family object for use with the package, setting up the link function, parameter specifications, and Stan identifier. This function serves as the primary entry point for creating family objects for the built-in distributions.
Arguments
- `name`: character scalar; one of `"gaussian"`, `"poisson"`, `"neg_binomial_2"`, `"bernoulli"`, `"beta"`, `"gamma"`, `"student_t"`, `"tweedie"`, `"zip"`, `"zinb"`, `"hurdle_poisson"`, `"hurdle_neg_binomial_2"`. Identifies the distributional family.
- `link`: character scalar or `NULL`; the link function for the mean (location) parameter. If `NULL`, the default link for the chosen family is used.
- `did_override`: list or `NULL`; if non-`NULL`, passed to `.gdpar_apply_did_override` to modify the default identifiability (D-ID) status of the family's parameters.
Returns
An object of class c("gdpar_family", "list") containing:
- `name`: the family name.
- `link`: the link function name (character).
- `inv_link`: the inverse-link function (closure).
- `linkfun`: the link function itself (closure).
- `stan_id`: integer identifier for the Stan template branch.
- `has_dispersion`: logical; `TRUE` if the family has a dispersion parameter.
- `did_status`: character; the identifiability status (e.g., `"holds"`, `"holds_under_condition"`, `"user_responsible"`).
- `did_condition`: character; description of the condition under which identifiability holds, if any.
- `did_reference`: character; reference supporting the identifiability claim.
- `param_specs`: list of parameter specification lists (one per parameter, e.g., location and dispersion).
Notes
- Raises a `gdpar_input_error` via `gdpar_abort` if the provided `link` is not in the set of allowed links for the chosen family.
- The `did_override` argument is passed directly to `.gdpar_apply_did_override` (internal, not defined in this section) and can alter the default D-ID status of parameters.
- The function internally calls `.gdpar_family_param_specs_for` (internal, not defined in this section) to obtain the default parameter specifications for the given family and link.
gdpar_family_custom(name, link, did_holds, did_condition, stan_loglik_block, stan_log_lik_block, stan_y_pred_block, y_type, did_reference)
Purpose
Constructs a custom family object for use with the package when the built-in families are insufficient. The user provides Stan code snippets for the likelihood, log-likelihood (for LOO-CV), and posterior predictive sampling, and explicitly declares identifiability (D-ID) status.
Arguments
- `name`: character scalar; a unique identifier for the custom family. Must not coincide with any built-in family name.
- `link`: character scalar; one of `"identity"`, `"log"`, `"logit"`.
- `did_holds`: logical scalar (`TRUE` or `FALSE`); whether the family is identifiable in its parameter. Missing values (`NA`) are not allowed.
- `did_condition`: character scalar; description of the condition under which identifiability holds when `did_holds = TRUE` but identifiability is conditional. Use `NA_character_` if identifiability holds unconditionally.
- `stan_loglik_block`: character scalar; Stan code snippet for the `model` block that adds the log-likelihood for one observation. Must reference the linear predictor `eta[i]` and either `y_real[i]` (when `y_type = "real"`) or `y_int[i]` (when `y_type = "integer"`). The legacy placeholder `y[i]` is not allowed.
- `stan_log_lik_block`: character scalar; Stan code snippet for the `generated quantities` block that assigns to `log_lik[i]` for one observation. Used by `gdpar_loo` for LOO-CV.
- `stan_y_pred_block`: character scalar; Stan code snippet for the `generated quantities` block that assigns to `y_pred[i]` for one observation. Used for posterior predictive checks.
- `y_type`: character scalar; either `"real"` (outcome is real-valued, Stan references `y_real[i]`) or `"integer"` (outcome is integer-valued, Stan references `y_int[i]`).
- `did_reference`: character scalar; citation or source supporting the identifiability declaration.
Returns
An object of class `c("gdpar_family", "list")` containing:
- `name`: the custom family name.
- `link`: the link function (character).
- `inv_link`: the inverse-link function (closure).
- `linkfun`: the link function itself (closure).
- `stan_id`: `NA_integer_` (custom families do not have a built-in Stan branch ID).
- `has_dispersion`: `FALSE` (custom families are assumed to have only a location parameter).
- `did_status`: derived from `did_holds` and `did_condition` (one of `"holds"`, `"holds_under_condition"`, `"user_responsible"`).
- `did_condition`: the condition text (or `NA_character_`).
- `did_reference`: the user-provided reference.
- `stan_loglik_block`: the user-provided `model` block snippet.
- `stan_log_lik_block`: the user-provided `generated quantities` snippet for `log_lik`.
- `stan_y_pred_block`: the user-provided `generated quantities` snippet for `y_pred`.
- `y_type`: `"real"` or `"integer"`.
- `is_custom`: `TRUE` (flag indicating a custom family).
- `param_specs`: a list of length 1 containing the parameter specification for the location parameter `"mu"`.
Notes
- Raises a `gdpar_input_error` via `gdpar_abort` for any validation failure (e.g., `name` coincides with a built-in, invalid `link`, missing `did_holds`, empty Stan snippets, legacy `y[i]` in snippets).
- Emits a `gdpar_did_message` via `gdpar_inform` upon successful creation, reminding the user of their responsibility for the correctness of the Stan code and identifiability declaration.
- The internal function `.gdpar_param_spec` (not defined in this section) is called to create the parameter specification for the location parameter `"mu"`.
Purpose
Internal (non-exported) registry function that returns a named list of canonical bi-parametric ($K=2$) patterns available to `gdpar_family_custom_K`. Each pattern specifies the Stan log-probability density function identifier, the outcome type, and the slot-level descriptors (link, support, prior kind, scope, family role, D-ID metadata). The registry is the single source of truth for the descriptor-based wiring decided in sub-phase 8.3.4 (Block 8, D-A3.B option (b)).
Arguments
None.
Returns
A named list. Currently a single element is registered:
| Name | `stan_id` | `y_type` | Slots |
|---|---|---|---|
| `lognormal_loc_scale` | `7L` | `"real"` | `mu` (location), `sigma` (scale) |
Each element is itself a list with components:
- `stan_id`: integer; the Stan-side family identifier used by the `amm_distrib_K.stan` dispatcher.
- `y_type`: character; the Stan type of the outcome variable (e.g. `"real"`).
- `slot_specs`: a list of lists, each describing one distributional parameter slot. Every slot contains:
  - `name`: parameter name (e.g. `"mu"`, `"sigma"`).
  - `link`: link function name (`"identity"`, `"log"`).
  - `support`: mathematical support (`"real_line"`, `"positive_real"`).
  - `prior_canonical_kind`: canonical prior identifier (`"mu"`, `"log_sigma"`).
  - `scope`: whether the parameter varies `"per_observation"` or is `"population"`-level.
  - `family_role`: semantic role (`"location"`, `"scale"`).
  - `did_status`: identifiability declaration (`"holds"`, `"holds_under_condition"`).
  - `did_condition`: character or `NA_character_`; the condition under which D-ID holds.
  - `did_reference`: citation string for the D-ID claim.
Notes
- This function is never called directly by the user; it is consumed exclusively by `gdpar_family_custom_K`.
- The registry is closed at the package level; adding a new $K=2$ pattern requires a source-code contribution (new list entry) rather than user-side injection, enforcing the auditability contract of D-A3.B option (b).
- Currently only `lognormal_loc_scale` is registered (`stan_id` 7). The doc comments note that future sub-phases will extend the whitelist.
gdpar_family_custom_K(name, stan_lpdf_id, did_holds = TRUE, did_condition = NULL, did_reference = NULL)
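Purpose
Exported constructor for a $K=2$ custom family. It looks up the canonical pattern selected by `stan_lpdf_id` in `.gdpar_K_custom_patterns`, wires the descriptor-level metadata into `gdpar_param_spec` objects, and returns a `gdpar_family` object that routes through the same `amm_distrib_K.stan` branch as built-in families. This is the descriptor-based counterpart of `gdpar_family_custom` (the free-form constructor).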
Arguments
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `name` | character scalar | (required) | Unique identifier for the custom family. Must be non-empty, must not collide with any built-in family name. |
| `stan_lpdf_id` | character scalar | (required) | Registry key selecting a canonical pattern (currently only `"lognormal_loc_scale"` is valid). |
| `did_holds` | logical scalar | `TRUE` | Whether the D-ID condition of Lemma 1B holds for this family. If `FALSE`, every slot's `did_status` is overridden to `"user_responsible"`. `NA` is not allowed. |
| `did_condition` | character scalar or `NULL` | `NULL` | Optional override of the pattern-level D-ID condition string. When `NULL`, the registry's per-slot `did_condition` is used. |
| `did_reference` | character scalar or `NULL` | `NULL` | Optional override of the pattern-level D-ID citation. When `NULL`, the registry's per-slot `did_reference` is used. |
Mathematics
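No formula is computed; the constructor wires descriptors. The families it produces are intended for the AMM canonical form, with the likelihood selected by `stan_lpdf_id`. For the `lognormal_loc_scale` pattern the likelihood is

$$y_i \sim \text{LogNormal}(\mu_i, \sigma_i),$$

with an identity link on $\mu$ and a log link on $\sigma$.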
Returns
An object of class `gdpar_family` (also inherits `"list"`) with components:
| Component | Type | Description |
|---|---|---|
| `name` | character | The user-supplied family name. |
| `link` | character | Link function of the first (location) slot. |
| `inv_link` | function | Inverse-link function from the first slot's `gdpar_param_spec`. |
| `linkfun` | function | Link function object from the first slot's `gdpar_param_spec`. |
| `stan_id` | integer | Stan family identifier from the registry pattern (e.g. `7L`). |
| `has_dispersion` | logical | Always `TRUE` (both slots exist). |
| `did_status` | character | D-ID status of the first slot (possibly overridden). |
| `did_condition` | character or `NA` | D-ID condition of the first slot (possibly overridden). |
| `did_reference` | character | D-ID reference of the first slot (possibly overridden). |
| `y_type` | character | Stan outcome type from the pattern (e.g. `"real"`). |
| `is_custom` | logical | Always `TRUE`. |
| `stan_lpdf_id` | character | The registry key used. |
| `param_specs` | list of `gdpar_param_spec` | One per slot in the pattern (length 2 for the registered pattern). |
Notes
- Validation errors raised (via `gdpar_abort` with class `"gdpar_input_error"`):
  - `name` is not a non-empty character scalar.
  - `name` collides with a built-in family name (hard-coded vector of 12 built-in names: `"gaussian"`, `"poisson"`, `"neg_binomial_2"`, `"bernoulli"`, `"beta"`, `"gamma"`, `"student_t"`, `"tweedie"`, `"zip"`, `"zinb"`, `"hurdle_poisson"`, `"hurdle_neg_binomial_2"`).
  - `stan_lpdf_id` is not a character scalar.
  - `stan_lpdf_id` is not found in the registry.
  - `did_holds` is not a non-`NA` logical scalar.
- D-ID override logic: If `did_holds` is `FALSE`, every slot's `did_status` is set to `"user_responsible"` regardless of the registry value. The user-supplied `did_condition` and `did_reference` (when non-`NULL`) propagate to every slot, overriding the per-slot registry values.
- Side effect: Emits an informational message (via `gdpar_inform` with class `"gdpar_did_message"`) confirming the registration, the canonical pattern used, and the identifiability provenance.
- No uniqueness check: The current source does not verify that `name` does not collide with a previously registered custom-K family in the session; only the hard-coded built-in vector is checked. This is consistent with the code as written; a session-level uniqueness guard may be added in a later sub-phase.
- The returned object's top-level `link`, `inv_link`, and `linkfun` are taken from the first slot (`param_specs[[1]]`), i.e. the location parameter.
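The D-ID override rules in the notes above can be illustrated with a few lines of base R. This is only a sketch under stated assumptions: `apply_did_override()` is a hypothetical helper, not a function of the package, and the slot lists merely mimic the documented `slot_specs` fields with placeholder values.

```r
# Hypothetical helper mirroring the documented override rules; not gdpar source.
apply_did_override <- function(slot_specs, did_holds = TRUE,
                               did_condition = NULL, did_reference = NULL) {
  stopifnot(is.logical(did_holds), length(did_holds) == 1L, !is.na(did_holds))
  lapply(slot_specs, function(slot) {
    # did_holds = FALSE overrides the status of every slot.
    if (!did_holds) slot$did_status <- "user_responsible"
    # Non-NULL user values replace the per-slot registry values.
    if (!is.null(did_condition)) slot$did_condition <- did_condition
    if (!is.null(did_reference)) slot$did_reference <- did_reference
    slot
  })
}

# Illustrative registry-like slots (all field values are placeholders).
slots <- list(
  list(name = "mu", did_status = "holds",
       did_condition = NA_character_, did_reference = "registry ref"),
  list(name = "sigma", did_status = "holds_under_condition",
       did_condition = "registry condition", did_reference = "registry ref")
)

overridden <- apply_did_override(slots, did_holds = FALSE,
                                 did_reference = "user ref")
statuses <- vapply(overridden, function(s) s$did_status, character(1))
print(statuses)
```

Because `did_condition` is left `NULL` in the call, the per-slot condition strings are untouched while the status and reference are replaced in both slots.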
Purpose
S3 print method for objects of class `gdpar_family`. Provides a human-readable textual summary of the family's key metadata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_family` object | The family to print. |
| `...` | (ignored) | Present for S3 generic compatibility; unused. |
Returns
Invisibly returns x (the input object unchanged). Side effect is console output formatted as:
<gdpar_family>
name : <name>
link : <link>
has_dispersion : <TRUE/FALSE>
did_status : <status>
did_condition : <condition> # only if not NA
did_reference : <reference>
param_specs : <slot1_name (scope1)>, <slot2_name (scope2)>, ...
The param_specs line is only printed if x$param_specs is non-NULL. Each slot is formatted as "<name> (<scope>)" and the entries are comma-separated.
Notes
- The `did_condition` line is suppressed when `x$did_condition` is `NA` (checked with `!is.na()`).
- No validation of `x` is performed; the method trusts that the object has the expected structure.
- This is the only print method for `gdpar_family` in this section. A separate print method for `gdpar_family_multi` is referenced in the `gdpar_family_multi` documentation but is not defined here.
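A print method with this behaviour can be sketched as follows. The class `demo_family` and the reduced set of printed fields are assumptions made for illustration; the sketch shows only the conditional `did_condition` line, the `"<name> (<scope>)"` slot formatting, and the invisible return value.

```r
# Hypothetical class "demo_family" standing in for gdpar_family; not gdpar source.
print.demo_family <- function(x, ...) {
  cat("<demo_family>\n")
  cat("  name           : ", x$name, "\n", sep = "")
  cat("  did_status     : ", x$did_status, "\n", sep = "")
  # The condition line is suppressed when the field is NA.
  if (!is.na(x$did_condition)) {
    cat("  did_condition  : ", x$did_condition, "\n", sep = "")
  }
  # The param_specs line is printed only when specs are present.
  if (!is.null(x$param_specs)) {
    slots <- vapply(x$param_specs,
                    function(s) sprintf("%s (%s)", s$name, s$scope),
                    character(1))
    cat("  param_specs    : ", paste(slots, collapse = ", "), "\n", sep = "")
  }
  invisible(x)
}

fam <- structure(
  list(name = "gaussian", did_status = "holds", did_condition = NA_character_,
       param_specs = list(list(name = "mu", scope = "per_observation"),
                          list(name = "sigma", scope = "population"))),
  class = c("demo_family", "list")
)

# Capture the console output and keep the (invisible) return value.
out <- capture.output(res <- print(fam))
```

Returning `invisible(x)` keeps the method usable in pipelines without echoing the object a second time at the console.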
Purpose
Exported constructor for a multivariate family object used with `gdpar` when the AMM specification has dimension $p$. It bundles $p$ univariate `gdpar_family` objects (one per coordinate of the individual parameter vector $\theta_i \in \mathbb{R}^p$) into a marginal factorization of the likelihood, with cross-dimensional coupling carried exclusively by the modulating component.
Arguments
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `family` | character scalar, `gdpar_family` object, or list of `gdpar_family` | (required) | When a character scalar, it is the name of a built-in family (one of `"gaussian"`, `"poisson"`, `"neg_binomial_2"`, `"bernoulli"`); `gdpar_family()` is called internally to construct the base family. When already a `gdpar_family` object, it is replicated $p$ times. When a list, it must contain $p$ `gdpar_family` objects (homogeneous restriction: all must share the same `stan_id`; heterogeneous per-coordinate families are deferred to a later sub-phase). |
| `p` | positive integer | (required) | Dimension of $\theta_i$. Must match the `p` of the `amm_spec` passed to `gdpar`. |
| `link` | character scalar | (implied: canonical link) | Link function used when `family` is supplied as a name. Ignored when `family` is already a `gdpar_family` object or a list. |
Returns
An object of class gdpar_family_multi containing at minimum:
| Component | Type | Description |
|---|---|---|
| `families` | list of `p` `gdpar_family` objects | One univariate family per coordinate. |
| `p` | integer | The dimension. |
| `homogeneous` | logical | `TRUE` when all $p$ families share the same `stan_id`. |
| `stan_id` | integer | Common Stan family identifier (present when homogeneous). |
| `has_dispersion` | logical | Common dispersion flag (present when homogeneous). |
| `name` | character | Common family name (present when homogeneous). |
| `did_status` | character | Common identifiability status (present when homogeneous). |
A print method for gdpar_family_multi provides a human-readable summary.
Mathematics
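The factorization follows architectural Option B of the Phase F decision:

$$p(y_i \mid \theta_i) = \prod_{k=1}^{p} D_k\big(y_{ik} \mid \theta_i[k]\big),$$

where $D_k$ is the univariate family of coordinate $k$, $y_{ik}$ is the $k$-th coordinate of the outcome, and $\theta_i[k]$ is the $k$-th coordinate of the individual parameter vector.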
Notes
- Scope restriction: This constructor handles the marginal factorization case only. Multi-parametric families (a single univariate outcome parametrized by the full vector $\theta_i \in \mathbb{R}^p$, e.g. a Gaussian with $\theta_i = (\mu_i, \log \sigma_i)$ in the distributional regression sense) are explicitly deferred to a dedicated post-validation block.
- Homogeneity: In this version, even when a list of families is supplied, all entries must share the same `stan_id`. Heterogeneous per-coordinate families are deferred.
- D-ID: The identifiability condition of Lemma 1B applies coordinate-wise: each univariate marginal $D_k$ identifies $\theta_i[k]$ from $y_{ik}$ independently. The cross-dimensional identifiability condition (C4-bis), guarding against aliasing between coordinates of $\theta_{\text{ref}}$ that share basis structure, is checked by `gdpar_check_identifiability` (Phase H, pending).
- Dependency: Calls `gdpar_family()` when `family` is supplied as a character name.
- The function body is not included in this section (section 4 of 7); only the roxygen block and the signature are present here. The implementation appears in a subsequent section.
Purpose
Constructs a multivariate family object (class `gdpar_family_multi`) representing a product of `p` univariate families across coordinates. This is the primary factory for multivariate response models in `gdpar`, enforcing homogeneity constraints in the current version.
Arguments
- `family`: Input family specification. Accepted types:
  - `character(1)`: Name of a built-in family (e.g., `"gaussian"`).
  - `gdpar_family`: A pre-constructed univariate family object.
  - `list`: A list of `p` `gdpar_family` objects, one per coordinate.
- `p`: Integer > 0, the number of response dimensions (coordinates).
- `link`: Optional character string specifying a link function. Ignored if `family` is already a `gdpar_family` object.
Mathematics
None. This function is a constructor and validator.
Returns
An object of class `c("gdpar_family_multi", "list")` with elements:
- `families`: List of `p` `gdpar_family` objects.
- `p`: Integer, the number of coordinates.
- `homogeneous`: Logical, currently always `TRUE` (enforced).
- `stan_id`, `name`, `link`, `has_dispersion`, `did_status`, `did_condition`, `did_reference`: Properties taken from the first (and only, due to homogeneity) family.
- `param_specs_per_coord`: List of `p` `param_specs` lists.
Notes
- Raises `gdpar_input_error` if `p` is not a positive integer, if `family` is an invalid type, if the list length mismatches `p`, or if any list element is not a `gdpar_family`.
- Raises `gdpar_unsupported_feature_error` if the supplied families are heterogeneous (different `stan_id` or `link`). This is a known limitation; future versions will support heterogeneity.
- If `family` is a character string, it is passed to `gdpar_family()` to create a base family, which is then replicated `p` times.
- The `link` argument is ignored when a `gdpar_family` object is supplied to avoid silent overwrites; a warning-like abort is issued instead.
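The replicate-then-check flow described in these notes can be sketched in base R. Everything here is hypothetical: `build_multi()`, `new_demo_family()` and the `demo_family` class are stand-ins, and plain `stop()` replaces the package's structured errors.

```r
# Hypothetical stand-ins for the constructor's list handling; not gdpar source.
new_demo_family <- function(name, stan_id, link) {
  structure(list(name = name, stan_id = stan_id, link = link),
            class = c("demo_family", "list"))
}

build_multi <- function(family, p) {
  stopifnot(is.numeric(p), length(p) == 1L, p >= 1, p == round(p))
  families <- if (inherits(family, "demo_family")) {
    rep(list(family), p)   # replicate a single family p times
  } else {
    family                 # assume a list of p family objects
  }
  if (length(families) != p) stop("list length does not match p")
  ids   <- vapply(families, function(f) f$stan_id, integer(1))
  links <- vapply(families, function(f) f$link, character(1))
  # Homogeneity: every coordinate must share the same stan_id and link.
  if (length(unique(ids)) > 1L || length(unique(links)) > 1L) {
    stop("heterogeneous families are not supported in this sketch")
  }
  structure(list(families = families, p = as.integer(p), homogeneous = TRUE,
                 stan_id = ids[[1]], link = links[[1]]),
            class = c("demo_family_multi", "list"))
}

gauss <- new_demo_family("gaussian", 1L, "identity")
nb    <- new_demo_family("neg_binomial_2", 3L, "log")

multi <- build_multi(gauss, p = 3)
mixed <- tryCatch(build_multi(list(gauss, nb), p = 2),
                  error = function(e) "rejected")
```

A single family object yields three identical coordinates, while the mixed Gaussian / negative-binomial list is rejected by the homogeneity check.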
Purpose
S3 print method for `gdpar_family_multi` objects, providing a concise summary.
Arguments
- `x`: An object of class `gdpar_family_multi`.
- `...`: Additional arguments (unused, present for S3 generic compatibility).
Mathematics
None.
Returns
Invisibly returns `x`.
Notes
Prints a formatted summary to the console, including dimensions (`p`), homogeneity status, family name, link, and parameter specifications. If `homogeneous` is `TRUE` and `param_specs_per_coord` is available, it prints a summary of parameter names and scopes for the first coordinate.
Purpose
Internal helper to promote specified parameters of a family to `"per_observation"` scope. This is used to implement K-individual specifications where some parameters vary per observation (e.g., in mixed-effects-like models). For multivariate families (`gdpar_family_multi`), promotion is applied to each coordinate's family.
Arguments
- `family`: A `gdpar_family` or `gdpar_family_multi` object.
- `k_names`: Character vector of parameter names to promote. May be `NULL` or empty (no-op).
Mathematics
None. This function modifies metadata only.
Returns
A copy of the input family object with updated `scope` fields in `param_specs`. The original is not mutated.
Notes
- Raises `gdpar_internal_error` if `k_names` is not a character vector or if a family lacks `param_specs`.
- Raises `gdpar_input_error` if any name in `k_names` is not among the eligible parameter names of the family.
- For `gdpar_family_multi`, it updates each family in the `families` list and also updates the `param_specs_per_coord` field accordingly.
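The copy-and-modify behaviour described above can be illustrated with a small base-R sketch. `promote_scope()` is a hypothetical helper, and it assumes every parameter name in `param_specs` is eligible for promotion, which the actual helper may restrict further.

```r
# Hypothetical helper illustrating scope promotion; not gdpar source.
promote_scope <- function(family, k_names) {
  if (is.null(k_names) || length(k_names) == 0L) return(family)  # no-op
  stopifnot(is.character(k_names), !is.null(family$param_specs))
  spec_names <- vapply(family$param_specs, function(s) s$name, character(1))
  unknown <- setdiff(k_names, spec_names)
  if (length(unknown) > 0L) {
    stop("not a parameter of this family: ", paste(unknown, collapse = ", "))
  }
  family$param_specs <- lapply(family$param_specs, function(s) {
    if (s$name %in% k_names) s$scope <- "per_observation"
    s
  })
  family  # copy-on-modify semantics leave the caller's object untouched
}

fam <- list(param_specs = list(list(name = "mu", scope = "per_observation"),
                               list(name = "sigma", scope = "population")))
promoted <- promote_scope(fam, "sigma")
```

After the call, `promoted` carries `sigma` at per-observation scope while `fam` still records it as population-level.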
Purpose
Internal helper that returns the integer ID of the canonical inverse link function for the location slot (slot 1) of the built-in family identified by `stan_id`. This ID is used by Stan code for link dispatch.
Arguments
- `stan_id`: Integer scalar, one of the built-in family Stan IDs (1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13).
Mathematics
The mapping is:
- `stan_id == 1` (Gaussian) → identity link → ID `0`
- `stan_id == 3` (Negative Binomial) → log link → ID `2`
- `stan_id == 5` (Beta) → logit link → ID `1`
- `stan_id == 6` (Gamma) → log link → ID `2`
- `stan_id == 7` (Lognormal) → identity link (on log scale) → ID `0`
- `stan_id == 8` (Student-t) → identity link → ID `0`
- `stan_id == 9` (Tweedie) → log link → ID `2`
- `stan_id == 10` (Zero-Inflated Poisson) → log link → ID `2`
- `stan_id == 11` (Zero-Inflated Negative Binomial) → log link → ID `2`
- `stan_id == 12` (Hurdle Poisson) → log link → ID `2`
- `stan_id == 13` (Hurdle Negative Binomial) → log link → ID `2`
Returns
Integer scalar in {0, 1, 2}.
Notes
- Raises `gdpar_internal_error` if `stan_id` is `NULL` or `NA`.
- Raises `gdpar_internal_error` with a descriptive message if `stan_id` is not one of the recognized codes.
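The mapping above is small enough to be expressed as a named lookup vector. The sketch below transcribes the listed IDs; the function name `slot1_inv_link_id()` is hypothetical, and plain `stop()` stands in for the package's `gdpar_internal_error`.

```r
# Lookup transcribed from the mapping above; the function name is hypothetical.
slot1_inv_link_id <- function(stan_id) {
  if (is.null(stan_id) || length(stan_id) != 1L || is.na(stan_id)) {
    stop("stan_id must be a non-NA scalar")
  }
  # 0 = identity, 1 = logit, 2 = log
  lookup <- c(`1` = 0L, `3` = 2L, `5` = 1L, `6` = 2L, `7` = 0L, `8` = 0L,
              `9` = 2L, `10` = 2L, `11` = 2L, `12` = 2L, `13` = 2L)
  key <- as.character(as.integer(stan_id))
  if (!key %in% names(lookup)) stop("unrecognised stan_id: ", key)
  unname(lookup[[key]])
}

gaussian_id <- slot1_inv_link_id(1L)
beta_id     <- slot1_inv_link_id(5)
unknown     <- tryCatch(slot1_inv_link_id(2L), error = function(e) e)
```

IDs that do not appear in the list (such as 2 in this sketch) produce an error rather than a silent default.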
Purpose
Internal helper that returns the integer ID of the canonical inverse link function for an arbitrary slot index in the canonical parameter specifications of the built-in family identified by `stan_id`. This is used in the homogeneous branch of the heterogeneous resolver to obtain the correct link for a given slot.
Arguments
- `stan_id`: Integer scalar, a built-in family Stan ID.
- `slot_idx`: Integer scalar, 1-based slot index in the family's canonical `param_specs`.
Mathematics
The mapping is family-specific and slot-specific. For example, for the Gaussian family (`stan_id` = 1), slot 1 (mean) uses the identity link (ID 0) and slot 2 (scale) uses the log link (ID 2). The exact mapping is not provided in the visible code.
Returns
Integer scalar in {0, 1, 2}.
Notes
- The function body is not fully shown in the provided code, but it is expected to be a switch-case or lookup based on `stan_id` and `slot_idx`.
- Likely raises `gdpar_internal_error` for invalid `stan_id` or `slot_idx`.
Purpose
Internal helper that maps a given slot of a given family (identified by its Stan numeric ID) to a canonical integer inverse-link identifier used by the Stan template's `apply_inv_link_by_id()` dispatcher. It handles both built-in families (via a switch on `stan_id`) and the special custom-K pattern `lognormal_loc_scale` (`stan_id` 7).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `stan_id` | Integer scalar (or coercible) | The Stan-family numeric identifier. Must not be `NULL` or `NA`. |
| `slot_idx` | Integer scalar | 1-based index of the parameter slot whose inverse-link ID is requested. Must lie within `1:length(slot_specs)` for the relevant family/pattern. |
Mathematics
The function does not compute a mathematical formula; it is a pure lookup/dispatch table. It returns the integer encoding of the link function associated with slot `slot_idx`:

$$\text{identity} \mapsto 0, \qquad \text{logit} \mapsto 1, \qquad \text{log} \mapsto 2.$$

For `stan_id` 7 the slot specs are obtained from `.gdpar_K_custom_patterns()[["lognormal_loc_scale"]]$slot_specs`. For all other built-in families the specs come from `.gdpar_family_param_specs_for(family_name, .gdpar_default_link_for(family_name))`.
Returns
An integer scalar in {`0L`, `1L`, `2L`} representing the canonical inverse-link identifier.
Notes
- Raises a `gdpar_internal_error` if `stan_id` is `NULL` or `NA`.
- For `stan_id` 7 (`lognormal_loc_scale`), the function bypasses the normal `gdpar_family` path and fetches specs directly from `.gdpar_K_custom_patterns()`, because `lognormal_loc_scale` is a custom-K registry pattern, not a built-in `gdpar_family`.
- Raises a `gdpar_internal_error` if `slot_idx` is out of range for the resolved family/pattern.
- Raises a `gdpar_internal_error` if `stan_id` does not correspond to any recognised built-in family (the `family_name` switch returns `NULL`).
- Raises a `gdpar_internal_error` if the resolved link string for the slot is not one of `identity`, `logit`, or `log`.
- No S3 dispatch; pure procedural logic.
Purpose
Returns the default (canonical) link-function string for a built-in family given its character name. Used internally by `.gdpar_canonical_inv_link_id_slot` to look up `param_specs` without the caller needing to know link conventions.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `name` | Character scalar | One of the recognised built-in family names (see table below). |
Returns
A character scalar: the name of the default link function. The mapping is:
| Family name | Default link |
|---|---|
| `gaussian` | `"identity"` |
| `poisson` | `"log"` |
| `neg_binomial_2` | `"log"` |
| `bernoulli` | `"logit"` |
| `beta` | `"logit"` |
| `gamma` | `"log"` |
| `student_t` | `"identity"` |
| `tweedie` | `"log"` |
| `zip` | `"log"` |
| `zinb` | `"log"` |
| `hurdle_poisson` | `"log"` |
| `hurdle_neg_binomial_2` | `"log"` |
Notes
- Implemented via a `switch()` with no default branch. If `name` does not match any of the listed families, `switch()` returns `NULL` silently (no explicit error is raised).
- No S3 dispatch.
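The silent `NULL` from a `switch()` without a default branch is standard R behaviour and is easy to overlook. The sketch below uses a hypothetical, reduced three-family version of the lookup to show what a caller sees for an unknown name.

```r
# Hypothetical reduced version of the lookup: three families, no default branch.
default_link_demo <- function(name) {
  switch(name,
         gaussian  = "identity",
         poisson   = "log",
         bernoulli = "logit")
}

known   <- default_link_demo("gaussian")
unknown <- default_link_demo("weibull")   # no match: switch() yields NULL

# A caller that wants a hard failure has to test for NULL explicitly.
checked <- if (is.null(unknown)) "no default link registered" else unknown
```

This is why callers such as `.gdpar_canonical_inv_link_id_slot` are documented as checking for a `NULL` result themselves.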
Purpose
Returns the canonical support string of slot 1 (the location/first parameter) of the family identified by `stan_id`. Used by the heterogeneous validator (D4 of sub-phase 8.3.7): when `family_id_k[k]` is heterogeneous, the emitted support is the support of slot 1 of that family per the L1 refinement rule.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `stan_id` | Integer scalar (or coercible) | The Stan-family numeric identifier. |
Mathematics
None; pure lookup table.
Returns
A character scalar, one of `"real_line"`, `"positive_real"`, `"unit_interval"`, or `"bounded_open"`. The mapping is:
| `stan_id` | Family | Slot 1 support |
|---|---|---|
| 1 | `gaussian` | `"real_line"` |
| 3 | `neg_binomial_2` | `"positive_real"` |
| 5 | `beta` | `"unit_interval"` |
| 6 | `gamma` | `"positive_real"` |
| 7 | `lognormal_loc_scale` | `"real_line"` |
| 8 | `student_t` | `"real_line"` |
| 9 | `tweedie` | `"positive_real"` |
| 10 | `zip` | `"positive_real"` |
| 11 | `zinb` | `"positive_real"` |
| 12 | `hurdle_poisson` | `"positive_real"` |
| 13 | `hurdle_neg_binomial_2` | `"positive_real"` |
Notes
- Raises a `gdpar_internal_error` if the integer-converted `stan_id` does not match any recognised key.
- No S3 dispatch.
Purpose
Predicate that tests whether the support emitted by a heterogeneous family's slot 1 is a coherent subset of the support required by the role of that slot in the location family. Used by the D4 heterogeneous validator.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `emitted` | Character scalar | The support string emitted by the heterogeneous family's slot 1 (e.g. from `.gdpar_canonical_support_slot1`). |
| `required` | Character scalar | The support string required by the role of the slot in the location family's `param_specs`. |
Mathematics
The subset-coherence relation is defined as:
The reflexive case (emitted == required) always returns TRUE.
Returns
A logical scalar: `TRUE` if `emitted` is a coherent subset of `required`; `FALSE` otherwise (including the case where `emitted` is `"custom"` or any unrecognised string).
Notes
- Custom supports (`"custom"`) are implicitly rejected: they match none of the `if` branches and fall through to `FALSE`, because the heterogeneous family cannot have a custom slot 1 built-in.
- Bounds are not checked for `"bounded_open"` (no built-in family registers a bounded range for slot 1).
- No S3 dispatch.
Purpose
Enumerates alternative built-in families whose slot 1 support satisfies the subset-coherence predicate against a given required support. Used by the D4 validator to build informative error messages suggesting replacement families when a user-supplied heterogeneous family fails validation.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `required` | Character scalar | The support required by the role of the slot in the location family. |
| `exclude_stan_ids` | Integer vector (default `integer(0)`) | Stan IDs to exclude from the candidate list. Typically includes the location family's `stan_id` and the failing heterogeneous family's `stan_id`. |
Mathematics
None; iterates over a fixed candidate list and applies `.gdpar_support_subset_coherent`.
Returns
A character vector of built-in family names whose slot 1 support coherently satisfies `required`, excluding any whose `stan_id` appears in `exclude_stan_ids`. May be length zero.
Notes
- The candidate list is hardcoded and includes only six families: `gaussian` (1), `neg_binomial_2` (3), `beta` (5), `gamma` (6), `student_t` (8), `tweedie` (9). Notably absent are `zip`, `zinb`, `hurdle_poisson`, `hurdle_neg_binomial_2`, `poisson`, and `bernoulli`; these are not offered as suggestions.
- The location family itself is excluded via `exclude_stan_ids` (the homogeneous trivial case is not a suggestion).
- No S3 dispatch.
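Purpose
Validates a heterogeneous family list (a named list of `gdpar_family` objects, one per slot $k = 1, \dots, K$). For each slot $k \geq 2$ whose family differs from the location family, it checks that the support emitted by that family's slot 1 is subset-coherent with the support required by the slot's role in the location family.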
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `family_het_list` | Named list of `gdpar_family` objects, length $K$ | The heterogeneous family specification. `[[1]]` is the location family. Names must match `slot_names`. |
| `location_param_specs` | List of `param_specs` | The location family's parameter specifications after promotion to per-observation, defining the canonical role assignment of the $K$ slots. |
| `slot_names` | Character vector, length $K$ | Canonical slot names (e.g. `c("mu", "sigma", ...)`). |
Mathematics
None; validation logic only.
Returns
`invisible(NULL)` on success.
Notes
- Raises a `gdpar_internal_error` if the three inputs have inconsistent lengths $K$.
- For each slot $k \geq 2$:
  - Raises a `gdpar_input_error` if `family_het_list[[k]]` is not a `gdpar_family` object.
  - Raises a `gdpar_input_error` if `family_het_list[[k]]$stan_id` is `NA` or `NULL` (custom free-form families are not admitted as heterogeneous slot entries; only built-in families and descriptor-based custom-K patterns are allowed).
  - Skips validation (via `next`) if the heterogeneous family's `stan_id` equals the location family's `stan_id` (homogeneous slot).
  - Raises a `gdpar_input_error` if the subset-coherence predicate fails. The error message includes the offending slot name, its role, the emitted and required supports, the location family name, and a suggestion list from `.gdpar_compatible_families_for_support`.
- Slot 1 (the location family) is never validated here; it is the canonical reference that determines role assignment.
- No S3 dispatch.
.gdpar_inv_link_id_per_slot(family_id_k_vector, location_family) (documentation only; body not in this section)
Purpose
Computes the `inv_link_id_per_slot` integer vector of length $K$ for the Stan `data` block. Implements the L1 refined rule (D3.5 of 8.3.7) for dispatching inverse-link functions per slot.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `family_id_k_vector` | Integer vector, length $K$ | Per-slot Stan family ID. Element $k$ is the `stan_id` governing slot $k$. |
| `location_family` | A `gdpar_family` object | The location family (slot 1), used to look up canonical slot-$k$ links. |
Mathematics
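For each slot $k = 1, \dots, K$:
- if `family_id_k_vector[k]` equals the `stan_id` of the location family, the entry is the canonical inverse-link ID of slot $k$ of the location family;
- otherwise, the entry is the canonical inverse-link ID of slot 1 of the family identified by `family_id_k_vector[k]`.

The returned integer vector values lie in $\{0, 1, 2\}$, encoding identity (0), logit (1), and log (2) as defined in `.gdpar_canonical_inv_link_id_slot`.
Returns
An integer vector of length $K$.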
Notes
- Backward-compatibility note (D7 of 8.3.7): In the strict homogeneous regime (all `family_id_k` entries equal), the vector reproduces the canonical slot-by-slot links of the location family. The Stan refactor replaces previously hardcoded `inv_link` calls with `apply_inv_link_by_id()`, which preserves mathematical equivalence but is NOT guaranteed to be bit-for-bit identical with pre-8.3.7 outputs. Goldens for $K=2$ Beta and Gamma must be re-bootstrapped on close.
- The function body is not present in this section; only the roxygen documentation block appears here.
Purpose
Computes a vector of integer inverse-link identifiers, one per slot, in a heterogeneous family model. Slots that share the location family's `stan_id` receive the location family's canonical inverse link for that slot; slots governed by a different family receive the canonical slot-1 inverse link of that family.
Arguments
- `family_id_k_vector` (integer vector): Length-$K$ vector of Stan family identifiers for each slot.
- `location_family` (`gdpar_family`): The primary location family object whose `stan_id` determines the canonical inverse-link for matching slots.
Mathematics
For each slot $k = 1, \dots, K$:
- If `family_id_k_vector[k] == location_family$stan_id`, then

  $$\text{out}[k] = \text{.gdpar\_canonical\_inv\_link\_id\_slot}(\text{location\_stan\_id},\, k).$$

- Otherwise,

  $$\text{out}[k] = \text{.gdpar\_canonical\_inv\_link\_id\_slot1}(\text{family\_id\_k\_vector}[k]),$$

where `.gdpar_canonical_inv_link_id_slot` and `.gdpar_canonical_inv_link_id_slot1` are helper functions (not defined in this section) that return integer inverse-link identifiers.
Returns
An integer vector of length $K$.
Notes
- This is an internal helper function (not exported).
- The function loops over $K$ slots.
- The inverse-link identifier is used by Stan to compute the inverse-link function for the linear predictor in each slot.
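The per-slot rule can be sketched with two small lookup tables. This is an illustration only: `inv_link_id_per_slot()` below takes the location family's slot IDs as an explicit argument instead of calling the package helpers, and the tables contain just the Gaussian slot links and a few slot-1 IDs quoted elsewhere in this reference.

```r
# Hypothetical lookup tables; not gdpar source.
slot1_id <- c(`1` = 0L, `3` = 2L, `5` = 1L, `6` = 2L)  # stan_id -> slot-1 ID
gaussian_slot_ids <- c(0L, 2L)                         # slot 1 identity, slot 2 log

inv_link_id_per_slot <- function(family_id_k, location_stan_id, location_slot_ids) {
  out <- integer(length(family_id_k))
  for (k in seq_along(family_id_k)) {
    if (family_id_k[k] == location_stan_id) {
      # Same family as the location family: use its canonical slot-k link.
      out[k] <- location_slot_ids[k]
    } else {
      # Different family: use the canonical slot-1 link of that family.
      out[k] <- slot1_id[[as.character(family_id_k[k])]]
    }
  }
  out
}

homogeneous   <- inv_link_id_per_slot(c(1L, 1L), 1L, gaussian_slot_ids)
heterogeneous <- inv_link_id_per_slot(c(1L, 5L), 1L, gaussian_slot_ids)
```

With a Gaussian location family, the homogeneous case reproduces the Gaussian slot links, while assigning a Beta family (ID 5) to the second slot switches that slot to the logit identifier.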
Purpose
Resolves a heterogeneous family argument (a named list of `gdpar_family` objects) into a canonical form for the model. It validates the input, ensures slot names match the formula set, checks for unsupported heterogeneity in certain location families, and propagates the identifiability (D-ID) status of each heterogeneous slot's family into the location family's `param_specs`. This function implements the named-list public API (D5 of 8.3.7) and the materialization rule (D3.5 / L1 refined).
Arguments
- `family_input` (either a `gdpar_family` object or a named list of `gdpar_family` objects):
  - If a single `gdpar_family`, it is treated as homogeneous (all slots share the same family).
  - If a named list, it must have one entry per slot, keyed by slot name.
- `slot_names` (character vector): Length-$K$ vector of canonical slot names from the `gdpar_formula_set`.
Returns
A named list with three components:
- `location_family` (`gdpar_family`): The location family with per-slot identifiability (D-ID) status overrides propagated (only for slots that are heterogeneous).
- `family_id_k_vector` (integer vector): Length-$K$ vector of Stan family identifiers for each slot.
- `is_heterogeneous` (logical): `TRUE` if at least one slot has a different Stan identifier from the first slot (i.e., the location family).
Notes
- Input validation:
  - If `family_input` is not a `gdpar_family` or a list, an error of class `gdpar_input_error` is raised.
  - If `family_input` is a list but not a named list (or has empty/duplicate names), an error is raised.
  - The length of the list must equal $K$ (the number of slots).
  - The names of the list must exactly match `slot_names` (no missing or extra names).
  - Each element of the list must be a `gdpar_family` object.
- Location family requirements:
  - The first slot (location family) must have a non-`NA` `stan_id` (i.e., it must be a built-in or descriptor-based custom-K family).
- Heterogeneity restrictions:
  - Heterogeneous families (different families across slots) are only supported for a subset of location families: Gaussian (1), negative binomial (3), beta (5), gamma (6), and `lognormal_loc_scale` (7). If heterogeneity is detected and the location family is not one of these, an error of class `gdpar_unsupported_feature_error` is raised.
- D-ID propagation:
  - For each slot $k \ge 2$ that has a different family from the location family, the `did_status`, `did_condition`, and `did_reference` fields in the location family's `param_specs` for that slot are overwritten with those from the heterogeneous family for that slot. The link, support, and prior canonical kind are not overwritten because the Stan side uses `inv_link_id_per_slot` and canonical priors are applied in linear predictor space.
- Side effects:
  - The `location_family` returned is a modified copy of the original location family (promoted to have per-observation scope via `.gdpar_promote_scope_per_observation`).
  - Validation of the heterogeneous family is performed by `.gdpar_validate_heterogeneous_family_K` (not defined in this section).
- Error handling:
  - Uses `gdpar_abort` with appropriate classes and data for structured error reporting.
- Internal use:
  - This is an internal function (not exported).
Purpose
Internal validation guard that detects intercept-suppression syntax (`- 1` or `+ 0`) in a formula by inspecting the `intercept` attribute of the terms object produced by `stats::terms()`. In the AMM canonical form the population anchor is mandatory, so a formula that suppresses it is rejected with an error of class `gdpar_input_error`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `f` | `formula` | A one-sided or two-sided formula object to be inspected. |
| `slot_name` | `character` (scalar) | Name of the slot within the enclosing `gdpar_formula_set`; used solely for constructing the error message. |
Mathematics
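The decision rule is:

$$
\text{check}(f) = \begin{cases}
\text{abort}(\texttt{gdpar\_input\_error}) & \text{if } \operatorname{attr}(\operatorname{terms}(f), \texttt{"intercept"}) = 0, \\
\text{TRUE} & \text{otherwise.}
\end{cases}
$$

When `stats::terms(f)` raises an error (i.e. the formula cannot be parsed by `terms`), the function treats the formula as valid and returns `TRUE`:

$$
\operatorname{terms}(f) \text{ fails} \;\Rightarrow\; \texttt{trm} = \texttt{NULL} \;\Rightarrow\; \text{check}(f) = \text{TRUE}.
$$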
Returns
invisible(TRUE) in all non-aborting paths:
- When `stats::terms(f)` fails and `trm` is `NULL`.
- When the intercept attribute is non-zero (intercept retained).
When intercept suppression is detected, the function does not return; it calls gdpar_abort() with class = "gdpar_input_error" and a data list containing slot and formula (the deparsed formula at width.cutoff = 500L).
Notes
- The error message is bilingual (English and Spanish) and directs users to `glm()` / `brms` / `stan_glm()` for anchor-free models.
- There is a minor redundancy in the `NULL` branch: `invisible(TRUE)` is called and then immediately `return(invisible(TRUE))` follows; the first call has no observable effect.
- The function is marked `@keywords internal` and `@noRd`; it is not exported.
- No S3 dispatch is involved; the function is a plain closure.
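The guard described above can be approximated in a few lines of base R. The sketch below is illustrative only: `check_no_intercept_suppression` is a hypothetical stand-in, and it signals a plain `stop()` where the package is documented to call `gdpar_abort()`.

```r
# Hypothetical stand-in for the guard described above; not the package source.
check_no_intercept_suppression <- function(f, slot_name) {
  # terms() can fail (e.g. a `.` on the RHS with no data); treat that as valid.
  trm <- tryCatch(stats::terms(f), error = function(e) NULL)
  if (is.null(trm)) {
    return(invisible(TRUE))
  }
  # attr(trm, "intercept") is 0 when the formula contains `- 1` or `+ 0`.
  if (attr(trm, "intercept") == 0L) {
    stop(sprintf("Slot '%s': intercept suppression is not allowed in %s",
                 slot_name,
                 paste(deparse(f, width.cutoff = 500L), collapse = " ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

ok_plain    <- check_no_intercept_suppression(y ~ a(x), "mu")
ok_unparsed <- check_no_intercept_suppression(y ~ ., "mu")
bad_minus   <- tryCatch(check_no_intercept_suppression(y ~ a(x) - 1, "mu"),
                        error = function(e) "rejected")
bad_zero    <- tryCatch(check_no_intercept_suppression(~ a(x) + 0, "sigma"),
                        error = function(e) "rejected")
```

Note that `stats::terms()` does not evaluate `a(x)`, so the wrapper function need not exist for the check to run.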
Purpose
Internal assertion that validates a list of arguments captured from ... in a constructor. It enforces four structural invariants: non-emptiness, full naming, name uniqueness, and that every element inherits from class "formula".
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `args` | `list` | A list captured from `...` in the calling constructor (e.g. `gdpar_formula_set`). |
| `fn_name` | `character` (scalar) | Name of the calling function, embedded in error messages for traceability. |
Returns
invisible(TRUE) on success. On any validation failure, the function aborts via gdpar_abort() with class = "gdpar_input_error".
Notes
Validation proceeds in four sequential checks, each of which aborts on failure:
- Non-emptiness. If `length(args) == 0L`, aborts with message `"<fn_name>() requires at least one named formula."` No `data` field is attached.
- Naming. `arg_names <- names(args)`. If `arg_names` is `NULL` or any element fails `nzchar()` (i.e. is `""`), aborts with a message instructing the user to name formulas with the canonical parameter name (e.g. `mu = y ~ a(x)`). No `data` field is attached.
- Uniqueness. `anyDuplicated(arg_names) > 0L` triggers an abort. Duplicates are collected via `unique(arg_names[duplicated(arg_names)])` and reported with `sQuote` and comma separation. The `data` field is `list(duplicates = dups)`.
- Formula class. Iterates `seq_along(args)`; if any element does not satisfy `inherits(., "formula")`, aborts with a message naming the slot, the function, and the received class(es) (via `sQuote(class(args[[i]]))`). The `data` field is `list(slot = arg_names[[i]], received = class(args[[i]]))`.
The function is marked @keywords internal and @noRd.
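The four checks can be sketched as follows. This is a minimal illustration under stated assumptions: `assert_named_formula_list` is a hypothetical name, the error messages are paraphrased, and base `stop()` replaces the structured `gdpar_abort()` call.

```r
# Illustrative re-implementation with base-R errors; not the package source.
assert_named_formula_list <- function(args, fn_name) {
  # 1. Non-emptiness.
  if (length(args) == 0L) {
    stop(sprintf("%s() requires at least one named formula.", fn_name), call. = FALSE)
  }
  # 2. Naming: names(list(y ~ x)) is NULL; a partly named list has "" entries.
  arg_names <- names(args)
  if (is.null(arg_names) || !all(nzchar(arg_names))) {
    stop(sprintf("%s(): every formula must be named, e.g. mu = y ~ a(x).", fn_name),
         call. = FALSE)
  }
  # 3. Uniqueness.
  if (anyDuplicated(arg_names) > 0L) {
    dups <- unique(arg_names[duplicated(arg_names)])
    stop(sprintf("%s(): duplicated names: %s", fn_name,
                 paste(sQuote(dups), collapse = ", ")), call. = FALSE)
  }
  # 4. Formula class.
  for (i in seq_along(args)) {
    if (!inherits(args[[i]], "formula")) {
      stop(sprintf("%s(): slot '%s' is not a formula (received %s).", fn_name,
                   arg_names[[i]], paste(class(args[[i]]), collapse = "/")),
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

res_ok      <- assert_named_formula_list(list(mu = y ~ a(x), sigma = ~ a(x)), "demo")
res_dup     <- tryCatch(assert_named_formula_list(list(mu = y ~ a(x), mu = ~ a(x)), "demo"),
                        error = conditionMessage)
res_unnamed <- tryCatch(assert_named_formula_list(list(mu = y ~ a(x), ~ a(x)), "demo"),
                        error = conditionMessage)
res_class   <- tryCatch(assert_named_formula_list(list(mu = "y ~ a(x)"), "demo"),
                        error = conditionMessage)
```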
Purpose
Exported constructor that builds the canonical internal representation of a multi-parameter AMM formula set. This object is the single source of truth for which structural parameters of the distributional family receive an AMM (per-observation) design. It is consumed downstream by gdpar().
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `...` | Named `formula` objects | The first must be a two-sided formula whose LHS is a single symbol naming the outcome (e.g. `mu = y ~ a(x)`). Subsequent arguments must be one-sided formulas (e.g. `sigma = ~ a(x)`). |
Mathematics
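Let $f_1, \dots, f_K$ denote the $K$ named formulas supplied through `...`:

- $f_1$: a two-sided formula $\text{outcome} \sim \text{design}_1$, where $\text{outcome}$ is a bare symbol.
- $f_k$ for $k \geq 2$: one-sided formulas $\sim \text{design}_k$.

Each formula name corresponds to a canonical parameter of the distributional family. `gdpar()` promotes every named parameter to per-observation scope, while unnamed family parameters retain their canonical scope (typically `"population"`). The population anchor is retained in every slot, since intercept suppression is rejected during validation (see Notes).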
Returns
An S3 object of class c("gdpar_formula_set", "list") with four components:
| Component | Type | Description |
|---|---|---|
| `outcome` | `character` (scalar) | The outcome variable name extracted from the LHS of the first formula. |
| `formulas` | named `list` of `formula` | The original `args` list, preserving order and names. The first element is two-sided; the rest are one-sided. |
| `param_names` | `character` vector | Identical to `names(formulas)`. |
| `env` | `environment` | `environment(first_f)`, the environment of the first formula, used downstream for evaluation. |
Notes
Validation is performed in a strict sequence; the first failing check aborts via gdpar_abort() with class = "gdpar_input_error":
- Named formula list. Delegates to `.gdpar_assert_named_formula_list(args, "gdpar_formula_set")`, which checks non-emptiness, naming, uniqueness, and formula class.
- First slot is two-sided. `length(first_f) != 3L` detects a non-two-sided first formula (R formulas have length 2 when one-sided, 3 when two-sided). The error `data` is `list(slot = arg_names[[1L]], received = deparse(first_f, width.cutoff = 500L))`.
- First slot LHS is a bare symbol. `outcome_expr <- first_f[[2L]]`; if `!is.symbol(outcome_expr)`, the function aborts. This rejects function calls on the LHS such as `log(y)`. The error `data` is `list(slot = arg_names[[1L]], received = deparse(outcome_expr, width.cutoff = 500L))`. The outcome name is then extracted via `as.character(outcome_expr)`.
- Subsequent slots are one-sided. For `i` in `2:length(args)`, `length(f) != 2L` triggers an abort. The error `data` is `list(slot = arg_names[[i]], received = deparse(f, width.cutoff = 500L))`.
- No intercept suppression. Iterates `seq_along(args)` and calls `.gdpar_check_no_intercept_suppression(args[[i]], arg_names[[i]])` for every formula.
Set membership (slot names must be a subset of the family's eligible param_specs) is not enforced here; it is deferred to gdpar() once both the formula set and the family are available.
The function is exported (@export).
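The shape checks in the validation sequence rest on base R facts about formula objects, which can be verified directly. The snippet below uses only base R; the variable names are illustrative and nothing here calls the package.

```r
first_f  <- y ~ a(x)        # two-sided: `~`, LHS, RHS
second_f <- ~ a(x)          # one-sided: `~`, RHS
bad_lhs  <- log(y) ~ a(x)   # LHS is a call, not a bare symbol

# A formula is a call to `~`, so its length distinguishes the two shapes.
shape <- c(two_sided = length(first_f), one_sided = length(second_f))

# For a two-sided formula, element 2 is the LHS.
lhs_is_symbol <- c(good = is.symbol(first_f[[2L]]), bad = is.symbol(bad_lhs[[2L]]))
outcome <- as.character(first_f[[2L]])
```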
Purpose
Exported brms-style sugar constructor that produces a gdpar_formula_set from a sequence of two-sided formulas. It is a convenience wrapper around gdpar_formula_set(), designed to feel familiar to users of brms::bf. The brms package is not a runtime dependency.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `...` | Two-sided `formula` objects | The first carries the outcome on its LHS and defaults to slot name `"mu"`. Subsequent formulas carry the canonical parameter name on their LHS (e.g. `sigma ~ a(x)`). |
Mathematics
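The sugar translates a sequence of two-sided formulas into the canonical representation. Writing $\text{design}_k$ for the RHS of the $k$-th formula, the default case is:

$$
(y \sim \text{design}_1,\ \texttt{sigma} \sim \text{design}_2) \;\longmapsto\; (\texttt{mu} = y \sim \text{design}_1,\ \texttt{sigma} = {\sim}\,\text{design}_2).
$$

When the first argument is explicitly named (e.g. `theta = y ~ a(x)`), that name overrides the default `"mu"`:

$$
(\texttt{theta} = y \sim \text{design}_1,\ \texttt{sigma} \sim \text{design}_2) \;\longmapsto\; (\texttt{theta} = y \sim \text{design}_1,\ \texttt{sigma} = {\sim}\,\text{design}_2).
$$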
Returns
An object of class c("gdpar_formula_set", "list"), identical in structure to what gdpar_formula_set() returns (components: outcome, formulas, param_names, env).
Notes
- The function body is not present in this source section. Only the roxygen documentation (`@param`, `@return`, `@examples`, `@seealso`, `@export`) appears in section 1 of 3. The implementation is expected in a subsequent section; based on the documentation, it internally calls `gdpar_formula_set()` after converting the two-sided formulas into the canonical form (the first formula's LHS becomes the outcome, subsequent formulas' LHSs become slot names, and their RHSs become one-sided formulas).
- The function is exported (`@export`).
- No S3 dispatch is involved at the constructor level.
Purpose
User-facing constructor that assembles a gdpar_formula_set from a variable number of formula arguments. It is the primary entry point for declaring the outcome formula together with one or more parameter (slot) formulas in a single call, performing full validation of shapes, names, and duplicates before delegating to gdpar_formula_set.
Arguments
- `...`: `formula` objects. The first must be a two-sided formula whose left-hand side is the outcome variable (e.g. `y ~ a(x)`). Subsequent formulas must be two-sided with a single symbol on the left-hand side naming the parameter slot (e.g. `sigma ~ a(x)`), or one-sided formulas supplied via a named argument whose name matches the intended slot (e.g. `sigma = ~ a(x)`). Argument names are optional for positional use; when present they must agree with the LHS symbol of the corresponding formula.
Returns
A gdpar_formula_set object, obtained by do.call(gdpar_formula_set, args) where args is the validated, named list of formulas. The first slot is named "mu" when no explicit name is supplied; subsequent slots are named after their LHS symbol (or their explicit argument name).
Notes
- Raises a `gdpar_input_error` (via `gdpar_abort`) when:
  - no arguments are supplied;
  - any argument is not a `formula`;
  - the first formula is not two-sided (length 3);
  - any formula after the first is not two-sided;
  - the LHS of a non-first formula is not a single symbol;
  - an explicit argument name disagrees with the LHS symbol of its formula;
  - duplicate slot names are detected.
- For formulas after the first, the two-sided formula is converted to a one-sided formula `~ <RHS>`, preserving the original environment, before being placed in the named list passed to `gdpar_formula_set`.
- The first formula is left untouched (still two-sided) in the list handed to `gdpar_formula_set`; downstream code is expected to extract the outcome from it.
- Error objects carry a `data` field with diagnostic information (position, received class, duplicates, etc.).
- Side effects: none beyond error signaling and construction of the returned object.
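The conversion from brms-style input to a named list can be sketched in base R. Everything below is an assumption made for illustration: `sugar_to_named_list` is a hypothetical helper, it handles only the two-sided case, and it stops short of calling `gdpar_formula_set`.

```r
# Hypothetical helper: the first formula stays two-sided; later formulas are
# renamed after their LHS symbol and reduced to one-sided form.
sugar_to_named_list <- function(...) {
  args <- list(...)
  stopifnot(length(args) >= 1L,
            all(vapply(args, inherits, logical(1), "formula")),
            length(args[[1L]]) == 3L)
  given <- names(args)
  if (is.null(given)) given <- rep("", length(args))

  out <- vector("list", length(args))
  slot_names <- character(length(args))
  out[[1L]] <- args[[1L]]
  slot_names[[1L]] <- if (nzchar(given[[1L]])) given[[1L]] else "mu"

  for (i in seq_along(args)[-1L]) {
    f <- args[[i]]
    stopifnot(length(f) == 3L, is.symbol(f[[2L]]))
    lhs <- as.character(f[[2L]])
    if (nzchar(given[[i]]) && !identical(given[[i]], lhs)) {
      stop("argument name disagrees with the LHS symbol", call. = FALSE)
    }
    # Dropping element 2 (the LHS) keeps the class and environment of `f`.
    out[[i]] <- f[c(1L, 3L)]
    slot_names[[i]] <- lhs
  }
  if (anyDuplicated(slot_names) > 0L) stop("duplicate slot names", call. = FALSE)
  names(out) <- slot_names
  out
}

converted <- sugar_to_named_list(y ~ a(x), sigma ~ a(z))
renamed   <- sugar_to_named_list(theta = y ~ a(x), sigma ~ a(z))
```

Subsetting a formula with `[` goes through the `[.formula` method in `stats`, which is why the one-sided result is still a `formula` with the original environment.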
Purpose
S3 print method for objects of class gdpar_formula_set. Renders a compact human-readable summary showing the number of slots K, the outcome variable, the parameter names in declaration order, and each slot's formula.
Arguments
- `x`: a `gdpar_formula_set` object. Expected to be a list with components `outcome` (character scalar), `param_names` (character vector), and `formulas` (named list of formulas).
- `...`: unused; present for S3 generic compatibility.
Returns
Invisibly returns x.
Notes
- Dispatches on class `gdpar_formula_set`.
- Uses `deparse(f, width.cutoff = 500L)` to render each formula on a single logical line.
- No validation of `x` is performed; malformed inputs may produce partial or confusing output.
Purpose
S3 double-bracket subsetting operator for gdpar_formula_set objects. Extracts a single formula by slot name or integer index.
Arguments
- `x`: a `gdpar_formula_set` object containing a `formulas` component (a named list).
- `i`: a character scalar (slot name) or an integer scalar (position), passed directly to `[[` on `x$formulas`.
Returns
The single formula stored at slot i (not wrapped in a list).
Notes
- Dispatches on class `gdpar_formula_set`.
- Inherits the indexing semantics of R's `[[` for lists: an unknown slot name returns `NULL`, an out-of-bounds integer index raises a "subscript out of bounds" error, and character indices are matched exactly unless `exact = FALSE` is passed to `[[`.
- No re-validation is performed; the return is a bare `formula`.
Purpose
S3 single-bracket subsetting operator for gdpar_formula_set objects. Returns a named list of formulas for the requested slots.
Arguments
- `x`: a `gdpar_formula_set` object containing a `formulas` component.
- `i`: a character vector of slot names or an integer vector of positions, passed directly to `[` on `x$formulas`.
Returns
A named list of formulas (a plain list, not a gdpar_formula_set). The documentation explicitly notes that downstream re-validation requires reconstruction via gdpar_formula_set.
Notes
- Dispatches on class `gdpar_formula_set`.
- The return value loses the `gdpar_formula_set` class intentionally; callers needing a validated set must rebuild one.
- Indexing semantics follow base R list behavior: logical indices are recycled, out-of-bounds positions and unknown names yield `NULL` elements whose name is `NA`, and character indices are matched exactly.
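Because both subsetting methods forward to base R indexing on `x$formulas`, their edge cases can be checked on a plain named list. The list below is a stand-in for `x$formulas`; no package object is constructed.

```r
# A plain named list standing in for `x$formulas`.
formulas <- list(mu = y ~ a(x), sigma = ~ a(z))

# `[[` semantics
by_name      <- formulas[["sigma"]]   # the bare one-sided formula
unknown_name <- formulas[["nu"]]      # NULL
partial_name <- formulas[["sig"]]     # NULL: `[[` matches exactly by default
oob_integer  <- tryCatch(formulas[[3L]], error = function(e) "subscript error")

# `[` semantics
subset_known <- formulas["sigma"]     # named list of length 1
subset_oob   <- formulas[3L]          # one NULL element with an NA name
```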
Purpose
S3 names method for gdpar_formula_set objects. Returns the slot (parameter) names in declaration order.
Arguments
- `x`: a `gdpar_formula_set` object containing a `param_names` component (character vector).
Returns
A character vector equal to x$param_names, giving the slot names in declaration order.
Notes
- Dispatches on class `gdpar_formula_set`.
- No validation is performed; the value is returned verbatim from `x$param_names`.
Purpose
S3 length method for gdpar_formula_set objects. Reports the number of parameter slots K.
Arguments
- `x`: a `gdpar_formula_set` object containing a `formulas` component (a named list).
Returns
An integer scalar equal to length(x$formulas), i.e. the number of slots (including the outcome slot mu).
Notes
- Dispatches on class `gdpar_formula_set`.
- The returned count includes the outcome slot, so it is the total number of formulas stored, not the number of auxiliary parameters.
Purpose
Internal predicate used by the dispatch logic in gdpar (sub-phase 8.3.3, decision P-dispatch) to decide whether a classic two-sided formula should be routed through the new K-individual parser path or through the legacy single-amm_spec path. Returns TRUE when at least one top-level summand of the RHS is a call to one of the AMM wrappers a, b, or W.
Arguments
- `formula`: a classic two-sided (or one-sided) `formula` object, or any R object.
Returns
A logical scalar. FALSE when formula is not a formula, or when no top-level summand is a call to a, b, or W; TRUE otherwise.
Mathematics
Let the RHS expression be $e$, and define the predicate $H$ recursively:

$$
H(e) = \begin{cases}
H(e_2) \lor H(e_3) & \text{if } e \text{ is a three-element call } (e_1, e_2, e_3) \text{ with } e_1 \equiv +, \\
\mathbb{1}\{\mathrm{head}(e) \in \{a, b, W\}\} & \text{if } e \text{ is a call with a symbol head}, \\
\text{FALSE} & \text{otherwise.}
\end{cases}
$$

The function returns $H(e)$, where $e$ is `formula[[3L]]` for two-sided formulas or `formula[[2L]]` for one-sided formulas.
Notes
- Non-throwing by design: an invalid RHS that the parser would later reject still returns `TRUE` here when an AMM wrapper is detected, so that dispatch routes the call to the parser, which produces the canonical error message.
- Only top-level summands of a `+` tree are inspected; nested calls inside other operators are not traversed (the recursion only descends through binary `+`).
- A call head must be a symbol; calls with non-symbol heads (e.g. `::` or `$`-qualified names) do not trigger `TRUE`.
- Returns `FALSE` immediately when `formula` does not inherit from `"formula"`.
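The recursive predicate can be written compactly in base R. The sketch below follows the description above; `has_amm_wrapper` is a hypothetical name and the code is not taken from the package.

```r
# Hypothetical predicate: TRUE when a top-level summand is a call to a, b or W.
has_amm_wrapper <- function(formula) {
  if (!inherits(formula, "formula")) return(FALSE)
  rhs <- if (length(formula) == 3L) formula[[3L]] else formula[[2L]]

  walk <- function(e) {
    if (!is.call(e)) return(FALSE)
    head <- e[[1L]]
    # Descend only through binary `+`.
    if (length(e) == 3L && identical(head, as.name("+"))) {
      return(walk(e[[2L]]) || walk(e[[3L]]))
    }
    # Any other call counts only if its head is one of the wrapper symbols.
    is.symbol(head) && as.character(head) %in% c("a", "b", "W")
  }
  walk(rhs)
}

hit_top_level  <- has_amm_wrapper(y ~ x + a(z))
hit_one_sided  <- has_amm_wrapper(~ W())
miss_plain     <- has_amm_wrapper(y ~ x + z)
miss_nested    <- has_amm_wrapper(y ~ log(a(x)))
miss_qualified <- has_amm_wrapper(y ~ pkg::a(x))
miss_non_formula <- has_amm_wrapper("y ~ a(x)")
```

In `pkg::a(x)` the head of the call is itself a call to `::`, which is why it does not count as a wrapper here.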
Purpose
Internal predicate that detects whether an amm_spec object is the canonical default Level 0 spec (no active components). Used by the dispatch in gdpar to treat an uncustomized amm argument as effectively absent when the new K-individual paths are taken.
Arguments
- `amm`: any R object.
Returns
A logical scalar. TRUE when amm inherits from "amm_spec" and all of the following hold: amm$a is NULL, amm$b is NULL, amm$W is NULL, amm$x_vars is NULL, amm$p is NULL or equal to 1L, and amm$dims is NULL. FALSE otherwise.
Notes
- The check on `p` uses `is.null(amm$p) || isTRUE(amm$p == 1L)`, so a missing `p` component also qualifies as default.
- Accesses components via `$`, which returns `NULL` for absent components in lists; thus an `amm_spec` lacking some of these fields entirely is still considered default.
- Does not throw; any non-`amm_spec` input returns `FALSE` via short-circuit of `&&`.
Purpose
Internal parser that walks the RHS expression of a one-sided formula and extracts the AMM components declared via the special wrapper calls a(...), b(...), and W(). Produces a plain list descriptor that downstream dispatch combines with the external W_basis argument to build an amm_spec per slot.
Arguments
- `rhs_formula`: a one-sided (or two-sided) `formula` whose RHS is to be parsed.
- `slot_name`: a character scalar giving the slot name from the enclosing `gdpar_formula_set`, used solely for error messages.

Returns
A named list with three components:
- `a_formula`: a one-sided formula `~ <expr>` when `a(<expr>)` appears in the RHS; `NULL` otherwise. The formula carries the environment of `rhs_formula`.
- `b_formula`: a one-sided formula `~ <expr>` when `b(<expr>)` appears in the RHS; `NULL` otherwise.
- `W_present`: a logical scalar; `TRUE` when `W()` appears in the RHS.
Mathematics
Let the RHS expression be $e$, flattened into summands $s_1, \dots, s_m$ by `.gdpar_split_amm_summands`. For each summand $s_j$:
- If $s_j \equiv 1$ (the symbol `1` or the numeric value `1`), it is accepted as a Level 0 anchor-only term and skipped.
- Otherwise $s_j$ must be a call whose head is a symbol in $\{a, b, W\}$.
- For head $W$: the call must have zero arguments; sets `W_present` $\leftarrow$ `TRUE`.
- For head $a$ or $b$: the call must have exactly one argument; the interior expression is wrapped as $\sim\,\text{interior}$ and stored in `a_formula` or `b_formula` respectively.

The output is the triple (`a_formula`, `b_formula`, `W_present`).
Notes
- Raises `gdpar_internal_error` (via `gdpar_abort`) when `rhs_formula` is not a `formula`; this is treated as an internal invariant violation.
- Raises `gdpar_input_error` when:
  - a summand is a bare term (not a call and not `1`); the message (partly in Spanish) instructs the user to wrap terms in `a()`, `b()`, or `W()`, or to use `~ 1` for a degenerate Level 0 AMM;
  - a call has a non-symbol head;
  - a call's head is not one of `a`, `b`, `W` (unknown function);
  - `W()` is called with any arguments (must be zero-argument);
  - `W()` appears more than once in the same slot;
  - `a()` or `b()` is called with a number of arguments other than one;
  - `a()` or `b()` appears more than once in the same slot.
- The interior of `a()` / `b()` is wrapped in `~ <deparse(interior)>` using `stats::as.formula` with `env = environment(rhs_formula)`, then passed to `.gdpar_check_no_intercept_suppression` for an additional validation (presumably rejecting `0` or `-1` intercept-suppression syntax; the exact behavior is defined in that helper, not shown in this section).
- The splitter `.gdpar_split_amm_summands` is invoked to flatten binary `+` trees; non-`+` binary operators at the top level are rejected there.
- P5 canonization: a bare RHS of `1` is accepted as Level 0 (anchor only); any other bare RHS without AMM wrappers aborts.
- Error objects carry a `data` field with `slot`, `term`, `fname`, `nargs`, or `allowed` diagnostics as appropriate.
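A simplified version of the parsing rules can be sketched in base R. The code below is an illustration under stated assumptions: `parse_amm_rhs` is a hypothetical name, the flattening step is inlined without operator validation, the intercept-suppression check is omitted, and errors are plain `stop()` calls with paraphrased messages.

```r
# Hypothetical parser sketch; names and error text are illustrative only.
parse_amm_rhs <- function(rhs_formula) {
  stopifnot(inherits(rhs_formula, "formula"))
  rhs <- rhs_formula[[length(rhs_formula)]]   # RHS of a one- or two-sided formula
  env <- environment(rhs_formula)

  # Flatten the top-level `+` tree into summands, left to right.
  flatten <- function(e) {
    if (is.call(e) && length(e) == 3L && identical(e[[1L]], as.name("+"))) {
      c(flatten(e[[2L]]), flatten(e[[3L]]))
    } else {
      list(e)
    }
  }

  out <- list(a_formula = NULL, b_formula = NULL, W_present = FALSE)
  for (s in flatten(rhs)) {
    if (identical(s, 1) || identical(s, 1L)) next   # anchor-only term
    if (!is.call(s) || !is.symbol(s[[1L]])) {
      stop("bare or malformed term: ", paste(deparse(s), collapse = " "), call. = FALSE)
    }
    head <- as.character(s[[1L]])
    n_args <- length(s) - 1L
    if (head == "W") {
      if (n_args != 0L || out$W_present) {
        stop("W() must appear once, with no arguments", call. = FALSE)
      }
      out$W_present <- TRUE
    } else if (head %in% c("a", "b")) {
      key <- paste0(head, "_formula")
      if (n_args != 1L || !is.null(out[[key]])) {
        stop(head, "() must appear once, with exactly one argument", call. = FALSE)
      }
      interior <- paste(deparse(s[[2L]], width.cutoff = 500L), collapse = " ")
      out[[key]] <- stats::as.formula(paste("~", interior), env = env)
    } else {
      stop("unknown wrapper: ", head, call. = FALSE)
    }
  }
  out
}

parsed <- parse_amm_rhs(~ a(x1:x2) + b(z) + W())
level0 <- parse_amm_rhs(~ 1)
bare   <- tryCatch(parse_amm_rhs(~ x), error = function(e) "rejected")
twice  <- tryCatch(parse_amm_rhs(~ a(x) + a(z)), error = function(e) "rejected")
```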
Purpose
Internal helper used during AMM (Additive Multiplicative Model) formula parsing. It recursively decomposes an unevaluated R expression into a flat list of "summands" by walking down the top-level + nodes of the parse tree. It simultaneously enforces a syntactic restriction: at the top level of a slot's right-hand side, only additive composition (+) of wrapper calls (e.g. a(), b(), W()) is permitted; any other binary operator at the top level is rejected with a structured error. This guarantees that interactions and transformations are pushed inside the wrappers (e.g. a(x1:x2)) rather than appearing as top-level terms.
Arguments
- `expr`: `language` (a call) or `symbol`/literal. The unevaluated R expression representing (a sub-tree of) the right-hand side of a slot formula. May be a single call, a symbol, or a literal constant.
- `slot_name`: `character(1)`. The name of the parameter slot currently being parsed; used only for constructing diagnostic error messages.
Mathematics
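Given an expression tree $E$, the ordered collection of summands $S(E)$ is defined recursively by

$$
S(E) = \begin{cases}
S(E_L) \cup S(E_R) & \text{if } E = E_L + E_R, \\
\{E\} & \text{otherwise,}
\end{cases}
$$

where the union preserves left-to-right order (i.e. `c(S(E_L), S(E_R))`). Before reaching the recursive base case, the function inspects any three-element call whose head is not `+`: if the head is one of `-`, `*`, `/`, `:`, `^`, `|`, the function aborts with `gdpar_input_error` instead of returning $\{E\}$.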
Returns
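A list of length $m \geq 1$ whose elements are the unevaluated summands, i.e. the maximal sub-expressions that are not themselves a top-level `+` call. The list is in left-to-right reading order of the original expression.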
Notes
- The recursion is triggered only when `expr` is a call of length exactly `3L` whose first element is `identical` to the symbol `+`. All other three-element calls are routed to the validation branch.
- The set of forbidden top-level binary operators is exactly `c("-", "*", "/", ":", "^", "|")`. Note that `+` is handled separately (recursion) and unary `-` (a call of length 2) is not intercepted here.
- When a forbidden operator is detected, the function calls `gdpar_abort` with:
  - `class = "gdpar_input_error"`,
  - `data = list(slot = slot_name, operator = op)`,
  - a message (in Spanish) explaining that AMM canonizes only additive composition of the wrappers `a()`, `b()`, `W()`, and that interactions/transformations must go inside the wrapper.
- The function performs no evaluation of `expr`; it is purely structural.
- No side effects other than the potential abort.
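The splitting rule and the operator restriction can be sketched together in base R. `split_summands` below is a hypothetical stand-in: it uses `stop()` with an English message where the package is documented to call `gdpar_abort()` with a Spanish one.

```r
# Hypothetical splitter sketch; not the package source.
split_summands <- function(expr, slot_name) {
  forbidden <- c("-", "*", "/", ":", "^", "|")
  if (is.call(expr) && length(expr) == 3L) {
    head <- expr[[1L]]
    if (identical(head, as.name("+"))) {
      # Recurse left then right so the result keeps reading order.
      return(c(split_summands(expr[[2L]], slot_name),
               split_summands(expr[[3L]], slot_name)))
    }
    if (is.symbol(head) && as.character(head) %in% forbidden) {
      stop(sprintf("Slot '%s': top-level operator '%s' is not allowed.",
                   slot_name, as.character(head)), call. = FALSE)
    }
  }
  list(expr)
}

summands <- split_summands(quote(a(x1:x2) + b(z) + W()), "mu")
labels <- vapply(summands, function(s) paste(deparse(s), collapse = " "), character(1))
rejected <- tryCatch(split_summands(quote(a(x) * b(z)), "mu"),
                     error = function(e) "rejected")
unary_minus <- split_summands(quote(-a(x)), "mu")   # length-2 call, passes through
```

The `:` inside `a(x1:x2)` is not at the top level, so it never reaches the validation branch.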
Purpose
Internal converter that bridges the user-facing multi-slot formula specification (a `gdpar_formula_set`) and the lower-level AMM machinery. For each parameter slot it extracts the right-hand side of the stored formula, parses it via `.gdpar_parse_amm_formula()`, validates the consistency between any declared `W()` term and the externally supplied `W_basis` object, and assembles an `amm_spec` object. The result is a named list of `amm_spec` objects (one per slot) suitable for K-individual dispatch in `gdpar()`. For $K = 1$ the caller is expected to keep using the legacy single-`amm_spec` path to preserve bit-exact backward compatibility with Block 7.
Arguments
- `fs`: `gdpar_formula_set`. An S3 object containing at least the components `$param_names` (a character vector of slot names) and `$formulas` (a named list of `formula` objects, one per slot). The function aborts with an internal error if `fs` does not inherit from `"gdpar_formula_set"`.
- `W_basis_arg`: `W_basis` or `NULL`. The external `W` argument passed by the user to `gdpar()`. When non-`NULL` it must inherit from `"W_basis"`; it is attached to every slot whose parsed formula declares a `W()` term. Defaults to `NULL`.
Mathematics
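Let the formula set contain $K$ slots with names $n_1, \dots, n_K$ and formulas $f_1, \dots, f_K$. For each slot $k$ the function extracts the one-sided right-hand side

$$
r_k = {\sim}\,\mathrm{RHS}(f_k)
$$

and parses it into components

$$
(a_k,\ b_k,\ w_k) = \texttt{.gdpar\_parse\_amm\_formula}(r_k,\ n_k),
$$

where $w_k$ is the logical flag `W_present`. The output is the named list

$$
\bigl\{\, n_k \mapsto \texttt{amm\_spec}(a = a_k,\ b = b_k,\ W = W_k) \,\bigr\}_{k=1}^{K},
\qquad
W_k = \begin{cases}
\texttt{W\_basis\_arg} & \text{if } w_k = \text{TRUE}, \\
\texttt{NULL} & \text{otherwise.}
\end{cases}
$$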
Returns
A named list of length length(fs$param_names), with names equal to fs$param_names. Each element is an amm_spec object constructed by amm_spec(a = parsed$a_formula, b = parsed$b_formula, W = W_for_slot). Slots without a W() term receive W = NULL; slots with a W() term receive the (validated) W_basis_arg.
Notes
- Input class check. If `!inherits(fs, "gdpar_formula_set")`, the function aborts via `gdpar_abort` with `class = "gdpar_internal_error"` and the message `"Internal error: .gdpar_formula_set_to_amm_spec_list expected a gdpar_formula_set."`. This is treated as a programmer error, not a user error.
- RHS extraction. The right-hand side is obtained by indexing the formula object: when `length(f) == 3L` (a two-sided formula `y ~ ...`), `rhs_only <- f[c(1L, 3L)]` reconstructs a one-sided formula `~ ...`; otherwise the formula is used as-is. This avoids deparsing or re-parsing and preserves the original language object.
- `W()` consistency (decision P3, sub-phase 8.3.3). Two abort conditions are checked, in order, for each slot where `parsed$W_present` is `TRUE`:
  - If `W_basis_arg` is `NULL`, abort with `class = "gdpar_input_error"`, `data = list(slot = k)`, and a message instructing the user to pass an explicit `W = W_basis(...)` or remove the `W()` term.
  - Else if `W_basis_arg` does not inherit from `"W_basis"`, abort with `class = "gdpar_input_error"`, `data = list(received = class(W_basis_arg))`, and a message stating that the external `W` argument must be of class `W_basis`.
- The `W_basis_arg` is shared across all slots that declare `W()`; there is no per-slot `W` override at this layer.
- The function does not evaluate any formula; it only manipulates language objects and delegates parsing to `.gdpar_parse_amm_formula`.
- No S3 dispatch is performed by this function itself; it relies on the `amm_spec()` constructor (whose dispatch is external to this section).
- The loop iterates over `fs$param_names` in their stored order, so the output list preserves that order.
- For $K = 1$ the return value is a length-one named list; the function does not unwrap it (the caller handles backward compatibility).
Purpose (role in the package). Constructs a tidy data.frame of posterior summary statistics for a generic block of model parameters. It is the workhorse called by coef.gdpar_fit() for both the "multi" and "scalar" branches, producing one row per parameter term with its mean and central credible-interval quantiles.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `draws_mat` | `matrix` or `NULL` | Matrix of posterior draws with rows = MCMC/HMC samples and columns = parameters. If `NULL` or with zero columns the function returns `NULL`. |
| `term_names` | `character` | Column labels (parameter names) whose length must equal `ncol(draws_mat)`. |
Mathematics
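For each column $j$ of `draws_mat` with draws $x_{1j}, \dots, x_{Sj}$, the function computes the posterior mean

$$
\bar{x}_j = \frac{1}{S} \sum_{s=1}^{S} x_{sj}
$$

and the empirical quantiles

$$
q_{0.05,\,j}, \quad q_{0.50,\,j}, \quad q_{0.95,\,j}
$$

of the same draws, as computed by `stats::quantile`.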
Returns
A data.frame with columns term (character), mean, q05, q50, q95 (all numeric), one row per parameter. Returns NULL when draws_mat is NULL or has zero columns.
Notes
- Raises an error of class `gdpar_internal_error` (via `gdpar_abort`) when `length(term_names) != ncol(draws_mat)`.
- `stats::quantile` is called with `names = FALSE` and `na.rm = TRUE`.
- The returned `mean` and quantile columns are `unname`d.
- `stringsAsFactors = FALSE` is explicitly set; `row.names = NULL`.
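The summary described above can be sketched in base R. `summarise_terms` is a hypothetical name, and the length mismatch is reported with `stop()` where the package is documented to use `gdpar_abort()`; the toy draws are chosen so the quantiles can be checked by hand.

```r
# Hypothetical sketch of a per-column posterior summary.
summarise_terms <- function(draws_mat, term_names) {
  if (is.null(draws_mat) || ncol(draws_mat) == 0L) return(NULL)
  if (length(term_names) != ncol(draws_mat)) {
    stop("length(term_names) must equal ncol(draws_mat).", call. = FALSE)
  }
  # One quantile per column, stripped of names.
  quant <- function(prob) {
    unname(apply(draws_mat, 2L, stats::quantile,
                 probs = prob, names = FALSE, na.rm = TRUE))
  }
  data.frame(
    term = term_names,
    mean = unname(colMeans(draws_mat, na.rm = TRUE)),
    q05 = quant(0.05), q50 = quant(0.50), q95 = quant(0.95),
    stringsAsFactors = FALSE, row.names = NULL
  )
}

# 101 "draws" per column: 1..101 and a constant 2.
draws <- cbind(1:101, rep(2, 101))
summary_df <- summarise_terms(draws, c("alpha", "beta"))
empty <- summarise_terms(NULL, character(0))
```

With the default quantile type, the first column yields a mean and median of 51, a 5% quantile of 6, and a 95% quantile of 96.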
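Purpose (role in the package). Builds a long-format data.frame summarising the posterior draws of a matrix-valued coefficient block of a `gdpar` fit, with one row for each combination of spatial-basis index and covariate name.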
Arguments
| Argument | Type | Meaning |
|---|---|---|
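| `draws_mat` | `matrix` or `NULL` | Draw matrix of dimension $S \times (\texttt{basis\_dim} \cdot d_x)$; each row is one vectorised `basis_dim` $\times$ $d_x$ matrix. Returns `NULL` if this is `NULL` or has zero columns. |
| `basis_dim` | `integer(1)` | Number of spatial basis functions. |
| `x_names` | `character` | Names of the $d_x$ covariates. |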
Mathematics
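The draws matrix stores a row-major vectorisation of the coefficient matrix, with the basis index $b$ varying fastest:

$$
j = (x - 1) \cdot \texttt{basis\_dim} + b, \qquad b = 1, \dots, \texttt{basis\_dim}, \quad x = 1, \dots, d_x,
$$

so that column $j$ of `draws_mat` maps to basis index $b$ and covariate $x$. The per-column mean and quantiles are the same summaries as in `build_coef_term_df`.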
Returns
A data.frame with columns basis_idx (integer, 1-based spatial-basis index), x_name (character, covariate name), mean, q05, q50, q95. Row count is basis_dim * length(x_names). Returns NULL when any degenerate condition is met (NULL draws, zero columns, basis_dim == 0, or empty x_names).
Notes
- `basis_idx` is constructed via `rep(seq_len(basis_dim), times = length(x_names))` and `x_name` via `rep(x_names, each = basis_dim)`, matching the row-major ordering with $b$ varying fastest.
- Raises `gdpar_internal_error` when `ncol(draws_mat) != basis_dim * length(x_names)`.
- Quantiles and means are `unname`d; `na.rm = TRUE`.
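The two `rep()` calls in the notes fix how columns of the draws matrix are labelled. The small example below, with made-up values for `basis_dim` and `x_names`, makes the layout explicit and checks it against the index formula $j = (x - 1) \cdot \texttt{basis\_dim} + b$.

```r
basis_dim <- 2L
x_names <- c("u", "v", "w")   # illustrative covariate names

layout <- data.frame(
  column    = seq_len(basis_dim * length(x_names)),
  basis_idx = rep(seq_len(basis_dim), times = length(x_names)),
  x_name    = rep(x_names, each = basis_dim),
  stringsAsFactors = FALSE
)

# Recompute the column index from (b, x) to confirm the labelling.
layout$check <- (match(layout$x_name, x_names) - 1L) * basis_dim + layout$basis_idx
```

Here `basis_idx` cycles 1, 2, 1, 2, 1, 2 while `x_name` advances every `basis_dim` columns.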
Purpose (role in the package). Produces a tidy data.frame of posterior summaries for the reference-anchor parameter vector `theta_ref`, with one row per coordinate.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `draws_mat` | `matrix` | Posterior draws of dimension $S \times p$. |
| `p` | `integer(1)` | Expected number of coordinates; must equal `ncol(draws_mat)`. |
Mathematics
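For coordinate $k \in \{1, \dots, p\}$, the posterior mean and the empirical 5%, 50%, and 95% quantiles are computed over the $S$ draws in column $k$ of `draws_mat`.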
Returns
A data.frame with columns `k` (integer, $1, \dots, p$), `mean`, `q05`, `q50`, `q95`, and exactly $p$ rows. There is no `NULL` guard; the function assumes non-empty input.
Notes
- Raises `gdpar_internal_error` if `ncol(draws_mat) != p`.
- `k` is always `seq_len(p)`, guaranteeing contiguous integer labels.
- Called internally by `build_coef_theta_ref_df_grouped` when `J_groups == 1`.
Purpose (role in the package). Generalises `build_coef_theta_ref_df` to the grouped case (Block 6.5 of the model). Handles both univariate ($p = 1$) and multivariate ($p > 1$) anchors; when `J_groups == 1` it collapses to the ungrouped schema for backward compatibility.
Arguments
| Argument | Type | Meaning |
|---|---|---|
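| `arr` | `matrix` or `array` | Posterior draws. Shape depends on context: an $S \times J_{\text{groups}}$ matrix when $p = 1$, or an $S \times J_{\text{groups}} \times p$ array otherwise; when `J_groups == 1`, several shapes are tolerated (see Notes). |
| `J_groups` | `integer(1)` | Number of groups $J_{\text{groups}}$. |
| `p` | `integer(1)` | Number of coordinates per group. |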
Mathematics
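When `J_groups > 1` and $p \geq 1$, the posterior mean and the empirical 5%, 50%, and 95% quantiles are computed for every pair $(g, k)$ over the $S$ draws of that group and coordinate.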
When J_groups == 1, the group dimension is dropped and the output schema matches build_coef_theta_ref_df.
Returns
- If `J_groups == 1`: delegates to `build_coef_theta_ref_df`, returning a `data.frame` with columns `(k, mean, q05, q50, q95)` and $p$ rows.
- If `J_groups > 1` and `arr` is a matrix ($p = 1$): `data.frame` with columns `(g, k, mean, q05, q50, q95)`, $J_{\text{groups}}$ rows, and `k` is always `1L`.
- If `J_groups > 1` and `arr` is a 3-D array: `data.frame` with columns `(g, k, mean, q05, q50, q95)`, $J_{\text{groups}} \times p$ rows, constructed by looping over groups and coordinates.
Notes
- When `J_groups == 1`, the function accepts three input shapes:
  - A 3-D array with `dim[2] == 1`: the single group slice `arr[, 1, ]` is extracted.
  - A matrix with one column and `p == 1`: used directly.
  - A matrix with `p` columns: used directly (the ungrouped case).

  Any other shape raises `gdpar_internal_error`.
- When `J_groups > 1` and `arr` is a matrix, it must have `ncol == J_groups` and `p == 1`; otherwise `gdpar_internal_error`.
- When `J_groups > 1` and `arr` is a 3-D array, dimensions must be exactly `(S, J_groups, p)`.
- The loop-based construction for the 3-D case appends single-row `data.frame`s to a list, then calls `do.call(rbind, out_rows)`.
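The loop-based construction for the 3-D case can be sketched as follows. `summarise_grouped` is a hypothetical name covering only the `(S, J_groups, p)` array branch, and the toy array is filled with values whose summaries are easy to verify.

```r
# Hypothetical sketch of the 3-D branch: one row per (group, coordinate).
summarise_grouped <- function(arr, J_groups, p) {
  stopifnot(length(dim(arr)) == 3L, dim(arr)[2L] == J_groups, dim(arr)[3L] == p)
  out_rows <- list()
  for (g in seq_len(J_groups)) {
    for (k in seq_len(p)) {
      draws <- arr[, g, k]
      qs <- stats::quantile(draws, probs = c(0.05, 0.50, 0.95),
                            names = FALSE, na.rm = TRUE)
      out_rows[[length(out_rows) + 1L]] <- data.frame(
        g = g, k = k, mean = mean(draws, na.rm = TRUE),
        q05 = qs[1L], q50 = qs[2L], q95 = qs[3L]
      )
    }
  }
  do.call(rbind, out_rows)
}

# Toy draws: slice (g, k) holds 0..10 shifted by 10 * g + k.
S <- 11L
arr <- array(NA_real_, dim = c(S, 2L, 3L))
for (g in 1:2) for (k in 1:3) arr[, g, k] <- (0:10) + 10 * g + k
grouped <- summarise_grouped(arr, J_groups = 2L, p = 3L)
```

Growing a list and binding once at the end, as described in the notes, avoids repeated `rbind` calls inside the loop.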
Purpose (role in the package). Constructs a tidy data.frame of posterior summaries for a hyperparameter vector (specifically `mu_theta_ref` or `sigma_theta_ref`), using the same schema as `build_coef_theta_ref_df`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `draws_mat` | `matrix` or `NULL` | Posterior draws of dimension $S \times p$, as obtained from `.extract_mu_sigma_theta_ref`. Returns `NULL` if `NULL`. |
| `p` | `integer(1)` | Expected number of coordinates; must equal `ncol(draws_mat)`. |
Mathematics
Identical per-coordinate mean and quantile computation as in build_coef_theta_ref_df.
Returns
A data.frame with columns `k` (integer, $1, \dots, p$), `mean`, `q05`, `q50`, `q95`, and exactly $p$ rows, or `NULL` when `draws_mat` is `NULL`.
Notes
- Raises `gdpar_internal_error` if `ncol(draws_mat) != p` (when `draws_mat` is not `NULL`).
- Structurally identical to `build_coef_theta_ref_df` except for the explicit `NULL` guard, reflecting that hyperparameters may be absent when there is no grouping.
Purpose (role in the package). Internal validation helper that checks the structural integrity of a per-coordinate slot of a `gdpar_coef` object. Each such slot is either `NULL` (component absent at the AMM level) or a list of length $p$ whose elements are either `NULL` (coordinate inactive) or data.frames with specific required columns.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `slot` | `list` or `NULL` | The coefficient slot to validate. |
| `p` | `integer(1)` | Expected list length (number of coordinates). |
| `expected_cols` | `character` | Column names that every non-`NULL` data.frame element must contain. |
| `slot_name` | `character(1)` | Human-readable name used in error messages. |
Mathematics
None (pure validation logic).
Returns
Returns `invisible(NULL)` on success. On failure the function does not return; it raises `gdpar_internal_error`.
Notes
Validation proceeds in order:
- If `slot` is `NULL`, return immediately (component absent, which is valid).
- Check `is.list(slot)` and `length(slot) == p`; abort otherwise.
- For each $k \in 1,\dots,p$: if element $k$ is `NULL`, skip (coordinate inactive); otherwise verify it is a `data.frame` and that all `expected_cols` are present in `names(elem)`.
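The validation order above translates directly into base R. In this sketch `validate_coef_slot` is a hypothetical name, the messages are paraphrased, and `stop()` stands in for the `gdpar_abort()` call that raises `gdpar_internal_error`.

```r
# Hypothetical sketch of the slot validator.
validate_coef_slot <- function(slot, p, expected_cols, slot_name) {
  if (is.null(slot)) return(invisible(NULL))       # component absent: valid
  if (!is.list(slot) || length(slot) != p) {
    stop(sprintf("%s must be NULL or a list of length %d.", slot_name, p),
         call. = FALSE)
  }
  for (k in seq_len(p)) {
    elem <- slot[[k]]
    if (is.null(elem)) next                        # coordinate inactive
    if (!is.data.frame(elem)) {
      stop(sprintf("%s[[%d]] must be NULL or a data.frame.", slot_name, k),
           call. = FALSE)
    }
    missing_cols <- setdiff(expected_cols, names(elem))
    if (length(missing_cols) > 0L) {
      stop(sprintf("%s[[%d]] is missing columns: %s", slot_name, k,
                   paste(missing_cols, collapse = ", ")), call. = FALSE)
    }
  }
  invisible(NULL)
}

cols <- c("term", "mean", "q05", "q50", "q95")
good_df <- data.frame(term = "x", mean = 0, q05 = -1, q50 = 0, q95 = 1)

ok_absent  <- validate_coef_slot(NULL, 2L, cols, "a")
ok_partial <- validate_coef_slot(list(NULL, good_df), 2L, cols, "a")
bad_cols   <- tryCatch(validate_coef_slot(list(good_df[, 1:2]), 1L, cols, "a"),
                       error = conditionMessage)
bad_length <- tryCatch(validate_coef_slot(list(good_df), 2L, cols, "a"),
                       error = conditionMessage)
```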
Purpose (role in the package). Validates the structural integrity of the `theta_ref` slot of a `gdpar_coef` object. Handles both the ungrouped schema (no `g` column) and the grouped schema (with a `g` column).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `theta_ref_df` | `data.frame` | The `theta_ref` component to validate. |
| `p` | `integer(1)` | Expected number of coordinate levels ($p$). |
| `J_groups` | `integer(1)` | Expected number of groups. Defaults to `1L`. |
Mathematics
None (pure validation logic).
Returns
Returns invisible(NULL) on success. Raises gdpar_internal_error on any failure.
Notes
Validation steps in order:
- `theta_ref_df` must be a `data.frame`.
- Expected columns are `c("k", "mean", "q05", "q50", "q95")`; if column `"g"` is present, `"g"` is prepended to the expected set.
- All expected columns must exist; missing ones trigger an abort.
- Row count must equal $J_{\text{groups}} \times p$ (when `g` is present) or $p$ (when ungrouped).
- Ungrouped path: `theta_ref_df$k` must contain exactly the integers $1,\dots,p$ without gaps (checked via `sort(as.integer(...))`).
- Grouped path: `theta_ref_df$g` must contain exactly $1,\dots,J_{\text{groups}}$ and `theta_ref_df$k` must contain exactly $1,\dots,p$, each without gaps.
Purpose (role in the package). Validates a hyperparameter summary data.frame (either `mu_theta_ref` or `sigma_theta_ref`) inside a `gdpar_coef` object. Expects a data.frame with columns `(k, mean, q05, q50, q95)` and exactly $p$ rows. A `NULL` slot is accepted (no grouping present).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `hyper_df` | `data.frame` or `NULL` | The hyperparameter summary to validate. `NULL` is accepted (component inactive / no grouping). |
| `p` | `integer(1)` | Expected number of coordinate rows. |
| `hyper_name` | `character(1)` | Human-readable name used in error messages (typically `"mu_theta_ref"` or `"sigma_theta_ref"`). |
Mathematics
None (pure validation logic).
Returns
Returns invisible(NULL) on success. Raises gdpar_internal_error on failure.
Notes
The function body is truncated in this section (the section ends immediately after the roxygen @noRd tag). Based on the documentation header, the expected behaviour is: accept NULL silently; otherwise verify the object is a data.frame with columns c("k", "mean", "q05", "q50", "q95") and exactly p rows. The full implementation is in a subsequent section.
Purpose Internal validation helper that checks whether a hyperparameter summary data.frame (such as mu_theta_ref or sigma_theta_ref) conforms to the expected structure. Called during new_gdpar_coef() construction.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `df` | `data.frame` or `NULL` | The hyperparameter summary table to validate. If `NULL` the function returns silently (component is absent). |
| `p` | integer scalar | Expected number of rows (one per coordinate). |
| `slot_name` | character scalar | Human-readable name of the slot being validated; used in error messages. |
Mathematics
No formula is implemented. The function enforces the invariant that, when present, a hyperparameter summary `data.frame` must contain exactly the columns (`k`, `mean`, `q05`, `q50`, `q95`) and exactly $p$ rows.
Returns `invisible(NULL)` on success. Never returns a value on failure; an error of class `"gdpar_internal_error"` is raised instead.
Notes
- If `df` is `NULL`, returns immediately; this is the expected path when the hyperparameter component is absent.
- If `df` is not a `data.frame`, `gdpar_abort()` is called with a message stating the slot must be `NULL` or a `data.frame`.
- Missing columns are detected via `setdiff(expected, names(df))` and reported in a single comma-separated list.
- Row-count mismatch triggers an error message quoting the actual and expected ($p$) row counts.
- All errors carry class `"gdpar_internal_error"`.
new_gdpar_coef(theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels, summary_stats)
Purpose Internal constructor for the gdpar_coef S3 class. Validates every slot and assembles a well-formed object that downstream methods can trust without further checks. This is the sole entry point for creating gdpar_coef instances.
Signature
```r
new_gdpar_coef(theta_ref, a = NULL, b = NULL, W = NULL,
               p,
               mu_theta_ref = NULL,
               sigma_theta_ref = NULL,
               J_groups = 1L,
               group_levels = NULL,
               summary_stats = c("mean", "q05", "q50", "q95"))
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `theta_ref` | `data.frame` | Reference-anchored parameter summaries with columns (`k`, `mean`, `q05`, `q50`, `q95`) and `p` rows. May also include a `g` column for grouped models, in which case it has `J_groups * p` rows. |
| `a` | `NULL` or list of length `p` | Per-coordinate intercept/additive component slot. Each entry is `NULL` (inactive) or a `data.frame` with columns (`term`, `mean`, `q05`, `q50`, `q95`). |
| `b` | `NULL` or list of length `p` | Per-coordinate slope/multiplier component slot. Same structure as `a`. |
| `W` | `NULL` or list of length `p` | Per-coordinate basis-expansion component slot. Each entry is `NULL` or a `data.frame` with columns (`basis_idx`, `x_name`, `mean`, `q05`, `q50`, `q95`). |
| `p` | numeric scalar | Number of coordinates (time points, locations, etc.). Must be a positive integer. |
| `mu_theta_ref` | `NULL` or `data.frame` | Hyperparameter summary for the prior mean of $\theta_{\text{ref}}$; validated by `validate_coef_hyper`. |
| `sigma_theta_ref` | `NULL` or `data.frame` | Hyperparameter summary for the prior standard deviation of $\theta_{\text{ref}}$; validated by `validate_coef_hyper`. |
| `J_groups` | integer scalar | Number of distinct groups in a grouped model. Defaults to `1L`. |
| `group_levels` | `NULL` or vector | Labels for the `J_groups` groups; stored as-is. |
| `summary_stats` | character vector | Names of the summary statistics carried by the object (e.g. `c("mean","q05","q50","q95")`). |
Returns A list of class "gdpar_coef" with elements theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels, summary_stats.
Notes
- `p` is coerced to integer after validation. The check requires it to be a length-1 numeric, non-`NA`, $\geq 1$, and equal to its own integer cast.
- `J_groups` is coerced to integer unconditionally.
- Validation is delegated to helpers: `validate_coef_theta_ref` for `theta_ref`; `validate_coef_slot` for `a`, `b`, `W` (each with its expected column set); `validate_coef_hyper` for `mu_theta_ref` and `sigma_theta_ref`.
- Any validation failure raises an error of class `"gdpar_internal_error"` via `gdpar_abort()`.
- The returned object uses `structure(...)` to attach the `"gdpar_coef"` class.
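The scalar check on `p` and the final class attachment can be illustrated as follows. This is only a sketch of the rule as described in the notes: `is_valid_p` is a hypothetical name, and the toy object carries a single element instead of the full slot set.

```r
# Hypothetical predicate for the documented rule: length-1 numeric, non-NA,
# >= 1, and equal to its own integer cast.
is_valid_p <- function(p) {
  is.numeric(p) && length(p) == 1L && !is.na(p) && p >= 1 &&
    isTRUE(p == suppressWarnings(as.integer(p)))
}

is_valid_p(3)        # a double holding a whole number passes
is_valid_p(2.5)      # fails: differs from its integer cast
is_valid_p(0L)       # fails: below 1
is_valid_p(c(1, 2))  # fails: not length 1

# Attaching the class with structure(), as the constructor is described to do.
obj <- structure(list(p = as.integer(3)), class = "gdpar_coef")
```

Wrapping the comparison in `isTRUE()` keeps the predicate from returning `NA` when the integer cast overflows for very large inputs.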
Purpose Internal utility that counts how many coordinates in a component slot (a, b, or W) have a non-empty data.frame, i.e. how many coordinates actively contribute a component.
Signature
```r
count_active_coords(slot)
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `slot` | `NULL` or list | A per-coordinate component list (e.g. `x$a`). |
Returns An integer scalar: the number of list elements that are non-NULL data.frames with at least one row. Returns 0L when slot is NULL.
Notes
- Uses `vapply` with `logical(1L)` for type-safe element-wise testing.
- An element is "active" if and only if all three conditions hold: `!is.null(e)`, `is.data.frame(e)`, and `nrow(e) > 0L`.
- Called by `print.gdpar_coef` and `summary.gdpar_coef` to report component activity counts.
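A minimal sketch of the counting rule described above, assuming nothing beyond base R. The name `count_active_sketch` is hypothetical; the three-part activity test is taken directly from the notes.

```r
# Count list elements that are non-NULL data.frames with at least one row.
count_active_sketch <- function(slot) {
  if (is.null(slot)) return(0L)
  active <- vapply(
    slot,
    function(e) !is.null(e) && is.data.frame(e) && nrow(e) > 0L,
    logical(1L)
  )
  sum(active)  # summing a logical vector yields an integer
}

slot <- list(
  NULL,                                                  # inactive coordinate
  data.frame(term = "x1", mean = 0.3),                   # active
  data.frame(term = character(0), mean = numeric(0))     # empty, so inactive
)
count_active_sketch(slot)
count_active_sketch(NULL)
```

Using `vapply` with a `logical(1L)` template makes the helper fail loudly if the predicate ever returns something other than a single `TRUE`/`FALSE`.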
Purpose S3 print method for gdpar_coef objects. Provides three verbosity levels for inspecting coefficient summaries.
Signature
```r
print.gdpar_coef(x,
                 level = c("global", "coord", "full"),
                 digits = 4L,
                 ...)
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_coef` | Object to print. |
| `level` | character, one of `"global"`, `"coord"`, `"full"` | Controls verbosity. `"global"` shows only the `theta_ref` table and active-component counts. `"coord"` appends per-coordinate component means (only the `mean` column). `"full"` appends per-coordinate data.frames with all summary columns. |
| `digits` | integer scalar | Number of significant digits passed to `format_coef_df()` for numeric formatting. |
| `...` | ignored | Present for S3 generic compatibility. |
Returns invisible(x) — the input object, unmodified.
Notes
- The header prints: class tag `<gdpar_coef>`, `p`, conditionally `J_groups` (only when $> 1$), `summary_stats`, and active-component counts formatted as `a(n/p) b(n/p) W(n/p)`.
- If `mu_theta_ref` or `sigma_theta_ref` is non-`NULL`, their data.frames are printed before `theta_ref`.
- At `level = "global"` a hint is printed suggesting the other levels and `as.data.frame()`.
- At `level = "coord"` or `"full"`, the method iterates $k = 1, \ldots, p$. For each coordinate, if at least one of `a[[k]]`, `b[[k]]`, `W[[k]]` is non-`NULL` and non-empty, its table is printed. At `"coord"` level only columns (`term`, `mean`) (for `a`/`b`) or (`basis_idx`, `x_name`, `mean`) (for `W`) are shown; at `"full"` level all columns are shown.
- All numeric formatting is delegated to `format_coef_df()`.
- Uses `match.arg()` for `level` argument matching.
Purpose Internal helper that formats numeric columns of a coefficient `data.frame` for pretty printing. Converts numeric columns (except integer identifier columns) to character strings with a specified number of significant digits and returns the modified copy.
Signature
```r
format_coef_df(df, digits)
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `df` | `data.frame` or `NULL` | The data.frame to format. |
| `digits` | integer scalar | Number of significant digits for `formatC()`. |
Returns The same data.frame with selected numeric columns replaced by formatted character strings. Returns df unchanged if it is NULL or has zero rows.
Notes
- Columns `"g"`, `"k"`, and `"basis_idx"` are skipped even if numeric, because they serve as integer identifiers rather than estimated quantities.
- Formatting uses `formatC(..., digits = digits, format = "g", flag = "-")`, where `"g"` selects the shorter of `%f`/`%e` representation and `"-"` requests left-justification.
- The function modifies its local copy of `df` and returns it; under R's copy-on-modify semantics the caller's object is left unchanged.
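The formatting rule can be sketched in base R as below. `format_df_sketch` is a hypothetical name and the loop is only one way to express the documented behaviour (skip `g`, `k`, `basis_idx`; format the remaining numeric columns with `formatC`).

```r
format_df_sketch <- function(df, digits = 4L) {
  # NULL or zero-row input is returned unchanged.
  if (is.null(df) || nrow(df) == 0L) return(df)
  skip <- c("g", "k", "basis_idx")  # integer identifiers, never formatted
  for (nm in setdiff(names(df), skip)) {
    if (is.numeric(df[[nm]])) {
      df[[nm]] <- formatC(df[[nm]], digits = digits, format = "g", flag = "-")
    }
  }
  df
}

raw <- data.frame(k = 1:2, mean = c(0.123456, 12.5))
out <- format_df_sketch(raw, digits = 4L)
str(out)   # k stays integer, mean becomes character
str(raw)   # the caller's data.frame still holds numeric values
```

Comparing `str(out)` with `str(raw)` shows the copy-on-modify behaviour: only the returned object carries character columns.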
Purpose S3 summary method that returns a compact list of aggregated statistics about a gdpar_coef object.
Signature
```r
summary.gdpar_coef(object, ...)
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_coef` | Object to summarize. |
| `...` | ignored | Present for S3 generic compatibility. |
Mathematics
Computes the cross-coordinate average of posterior means:
$$\bar{\theta}_{\text{ref}} = \frac{1}{p}\sum_{k=1}^{p} \theta_{\text{ref},k}^{\text{mean}}$$
via mean(object$theta_ref$mean).
Returns A named list with elements:
| Element | Type | Content |
|---|---|---|
| `p` | integer | Number of coordinates. |
| `n_active` | named integer vector | Counts of active coordinates for components `a`, `b`, `W` (via `count_active_coords`). |
| `theta_ref_mean` | numeric scalar | Mean of the `mean` column of `theta_ref` across all coordinates. |
| `summary_stats` | character vector | The summary statistic names carried by the object. |
Notes
- The `n_active` vector has names `"a"`, `"b"`, `"W"`.
- `mean()` is called directly on `object$theta_ref$mean`; if `theta_ref` has a `g` column (grouped model), this averages over all group-coordinate combinations.
Purpose S3 method that coerces a gdpar_coef object into a single long-format (tidy) data.frame. Flattens the hierarchical per-coordinate, per-component structure into rows suitable for use with dplyr / ggplot2.
Signature
```r
as.data.frame.gdpar_coef(x, row.names = NULL, optional = FALSE, ...)
```
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_coef` | Object to coerce. |
| `row.names` | `NULL` | Ignored; required by the `as.data.frame` generic. |
| `optional` | logical | Ignored; required by the generic. |
| `...` | ignored | Present for generic compatibility. |
Returns A data.frame with one row per summarized scalar coefficient and columns:
| Column | Type | Content |
|---|---|---|
| `component` | character | `"theta_ref"`, `"mu_theta_ref"`, `"sigma_theta_ref"`, `"a"`, `"b"`, or `"W"`. |
| `g` | integer | Group index from `theta_ref` if a `g` column exists; `NA_integer_` for all other components. |
| `k` | integer | Coordinate index ($k \in 1,\dots,p$). |
| `identifier` | character | `term` value for `a`/`b`; `basis_idx` (as character) for `W`; `NA_character_` for `theta_ref` and hyperparameter rows. |
| `x_name` | character | Predictor name from the `W` slot; `NA_character_` elsewhere. |
| `mean` | numeric | Posterior mean. |
| `q05` | numeric | 5th percentile of the posterior. |
| `q50` | numeric | 50th percentile (median) of the posterior. |
| `q95` | numeric | 95th percentile of the posterior. |
Notes
- When `x$theta_ref` contains a column `"g"`, each `theta_ref` row carries the corresponding integer group value; otherwise `g` is `NA_integer_`.
- The helper function `add_hyper(slot, comp_name)` (defined inline) iterates over rows of `mu_theta_ref` / `sigma_theta_ref` and appends a row per entry. Its `g` column is always `NA_integer_`.
- The helper function `add_terms(slot, comp_name)` (defined inline) iterates over coordinates and then over rows of the per-coordinate data.frame for components `a` and `b`, extracting the `term` column as the `identifier`.
- For `W`, iteration is done directly (not via `add_terms`) because the identifier is `basis_idx` and an additional `x_name` column must be carried.
- If no rows are accumulated (all slots are `NULL`), a zero-row data.frame with the correct column schema is returned.
- Rows are accumulated in a list (`rows`) and combined with `do.call(rbind, rows)` at the end.
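The accumulate-then-bind pattern described in these notes can be sketched on toy data. The helper `make_row` and the example values are hypothetical; only the column schema and the `do.call(rbind, rows)` step come from the description above, and the sketch covers just `theta_ref` (ungrouped) and `a`.

```r
# One-row data.frame in the long-format schema (hypothetical helper).
make_row <- function(component, g, k, identifier, x_name, d) {
  data.frame(component = component, g = g, k = k,
             identifier = identifier, x_name = x_name,
             mean = d$mean, q05 = d$q05, q50 = d$q50, q95 = d$q95,
             stringsAsFactors = FALSE)
}

theta_ref <- data.frame(k = 1:2, mean = c(0.5, 1.5), q05 = c(0.1, 1.0),
                        q50 = c(0.5, 1.5), q95 = c(0.9, 2.0))
a <- list(NULL, data.frame(term = "x1", mean = 0.2, q05 = 0.0,
                           q50 = 0.2, q95 = 0.4))

rows <- list()
for (r in seq_len(nrow(theta_ref))) {
  rows[[length(rows) + 1L]] <- make_row("theta_ref", NA_integer_,
                                        theta_ref$k[r], NA_character_,
                                        NA_character_, theta_ref[r, ])
}
for (k in seq_along(a)) {
  d <- a[[k]]
  if (is.null(d)) next  # inactive coordinate contributes no rows
  for (r in seq_len(nrow(d))) {
    rows[[length(rows) + 1L]] <- make_row("a", NA_integer_, k,
                                          as.character(d$term[r]),
                                          NA_character_, d[r, ])
  }
}
tidy <- do.call(rbind, rows)
```

The resulting `tidy` frame has one row per summarized scalar, which is the shape `dplyr` and `ggplot2` pipelines expect.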
The section ends with the roxygen header for an exported function described as a "One-line formatter for gdpar_coef objects" but the function body is not included in this section. The documented signature is format.gdpar_coef(x, ...) (inferred from the generic dispatch implied by the @export tag and the parameter name x). It is documented as returning a length-1 character vector. Its implementation continues in section 3.
Purpose S3 method that provides a human-readable one-line textual representation of a `gdpar_coef` object (the coefficient container for GDPAR models). It is dispatched when `format()` is called on an object of class `"gdpar_coef"` and is typically used for console display. Calling `print()` on such an object dispatches to the separate `print.gdpar_coef` method.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list with class attribute `"gdpar_coef"` | The coefficient object whose structure is to be rendered. Expected components accessed are `$p`, `$a`, `$b`, and `$W`. |
| `...` | (ignored) | Additional arguments passed through the S3 generic; not used by this method. |
Returns A single character string of the form:
<gdpar_coef> p=<p>, components=[theta_ref, a(<k_a>/<p>), b(<k_b>/<p>), W(<k_W>/<p>)]
where
- $p$ is `x$p` (the number of coordinates).
- $k_a$, $k_b$, $k_W$ are the values returned by `count_active_coords(x$a)`, `count_active_coords(x$b)`, and `count_active_coords(x$W)` respectively; each counts how many coordinates in the corresponding slot hold a non-`NULL` `data.frame` with at least one row (active).

`sprintf()` is used for interpolation; the `%d` placeholders are all supplied with integers.
Notes
- S3 dispatch. The `generic.class` naming pattern (`format.gdpar_coef`) is the standard R mechanism for S3 method dispatch. When `format()` is invoked on any object whose class vector contains `"gdpar_coef"`, this method is selected; `print()` dispatches to `print.gdpar_coef` instead.
- Dependency on `count_active_coords`. The helper function `count_active_coords` (defined elsewhere in the package) must return a scalar integer. If it returned a vector, `sprintf()` would recycle and produce several strings; if it returned a non-integer-valued double, the `%d` placeholder would raise an error.
- No side effects. The function is purely informational; it does not modify `x` or any global state.
- Assumptions on structure. The method assumes `x` contains list elements `p`, `a`, `b`, and `W`. For a plain list, `$` returns `NULL` for a missing element rather than raising an error: a missing `a`, `b`, or `W` is counted as 0 active coordinates, while a missing `p` makes `sprintf()` return a zero-length character vector.
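The one-line format string can be reproduced with a short sketch. `format_sketch` and its inline `n_active` helper are hypothetical stand-ins for `format.gdpar_coef` and `count_active_coords`; only the output template is taken from the Returns section above.

```r
format_sketch <- function(x) {
  # Inline stand-in for count_active_coords(): non-empty data.frames only.
  n_active <- function(slot) {
    if (is.null(slot)) return(0L)
    sum(vapply(slot, function(e) is.data.frame(e) && nrow(e) > 0L,
               logical(1L)))
  }
  sprintf(
    "<gdpar_coef> p=%d, components=[theta_ref, a(%d/%d), b(%d/%d), W(%d/%d)]",
    x$p, n_active(x$a), x$p, n_active(x$b), x$p, n_active(x$W), x$p
  )
}

toy <- list(p = 2L,
            a = list(NULL, data.frame(term = "x1", mean = 0.2)),
            b = NULL,
            W = NULL)
format_sketch(toy)
# "<gdpar_coef> p=2, components=[theta_ref, a(1/2), b(0/2), W(0/2)]"

# With `p` missing, `$` yields NULL and sprintf() returns character(0).
length(format_sketch(list(a = NULL, b = NULL, W = NULL)))
```

The second call illustrates why a malformed object produces an empty result instead of an error in this sketch.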
Purpose
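Exported entry point for approximate leave-one-out cross-validation of a `gdpar_fit` object. It delegates to `loo::loo()` to perform Pareto-smoothed importance sampling LOO (PSIS-LOO), using the per-observation log-likelihood draws persisted by the Stan model in the generated quantity `log_lik`. For multivariate fits ($p > 1$), the `aggregation` argument determines whether log-likelihoods are summed within each subject or each cell is treated as a separate observation.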
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_fit` | A fitted gdpar model object. Must inherit from `"gdpar_fit"` (checked via `assert_inherits`). The function accesses `fit$fit$draws(...)` (the underlying Stan/CmdStan fit) and `fit$amm$p` (the declared coordinate dimension). |
| `aggregation` | character scalar | One of `"subject"` (default) or `"cell"`. Matched via `match.arg`. Ignored when $p = 1$. `"subject"` sums log-likelihood over coordinates within each row; `"cell"` treats each $(i, k)$ cell as a separate observation. |
| `r_eff` | numeric vector or `NULL` | Relative effective sample sizes per observation. If `NULL` (default), computed from the draws via `loo::relative_eff()` using chain identifiers reconstructed from the draws array dimensions. Supplying it explicitly avoids recomputation across repeated LOO calls on the same fit. |
| `cores` | integer (coerced to `1L` by default) | Number of cores passed to `loo::loo()` and `loo::relative_eff()`. Defaults to `1L` (sequential) to avoid non-determinism. |
| `...` | (any) | Additional arguments forwarded to `loo::loo()`. |
Mathematics
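For a univariate fit ($p = 1$, `log_lik` as a length-$n$ vector), the pointwise log-likelihood of observation $i$ at posterior draw $s$ is $\ell_i^{(s)}$, the value of `log_lik[i]` at that draw.
For a multivariate fit ($p > 1$, `log_lik` as an $n \times p$ matrix), let $\ell_{ik}^{(s)}$ denote the value of `log_lik[i,k]` at draw $s$;
the per-subject log-likelihood (aggregation `"subject"`) is
$$\ell_i^{(s)} = \sum_{k=1}^{p} \ell_{ik}^{(s)}.$$
For aggregation `"cell"`, each $(i, k)$ cell is treated as a separate observation, giving $n \cdot p$ pointwise terms $\ell_{ik}^{(s)}$.
The relative effective sample size is computed (when `r_eff = NULL`) as `loo::relative_eff(exp(log_lik_mat), chain_id = chain_id, cores = cores)`,
where `chain_id` is constructed as `rep(seq_len(n_chain), each = n_iter)` from the draws array of shape `[n_iter, n_chain, n_variables]`.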
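The chain-identifier construction can be checked on synthetic draws. The sketch below assumes that flattening the array stacks chains one after another (iteration varying fastest), which is what makes `rep(seq_len(n_chain), each = n_iter)` line up with the matrix rows; the data are random placeholders, and the `loo` calls are guarded because `loo` is only a suggested package.

```r
set.seed(1)
n_iter <- 100L; n_chain <- 2L; n_obs <- 5L

# Placeholder log-likelihood draws with shape [n_iter, n_chain, n_variables].
draws <- array(rnorm(n_iter * n_chain * n_obs, mean = -1, sd = 0.1),
               dim = c(n_iter, n_chain, n_obs))

# Stack chains: row (c - 1) * n_iter + t holds iteration t of chain c.
log_lik_mat <- matrix(draws, nrow = n_iter * n_chain, ncol = n_obs)
chain_id <- rep(seq_len(n_chain), each = n_iter)

if (requireNamespace("loo", quietly = TRUE)) {
  # relative_eff() expects likelihoods on the natural scale, hence exp().
  r_eff <- loo::relative_eff(exp(log_lik_mat), chain_id = chain_id, cores = 1L)
  res <- loo::loo(log_lik_mat, r_eff = r_eff, cores = 1L)
}
```

Precomputing `r_eff` once and passing it to repeated `loo::loo()` calls is the reuse pattern the `r_eff` argument is meant to support.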
Returns
A `loo` object: an S3 object of class `"psis_loo"` containing the ELPD estimate (`elpd_loo`), its standard error, and Pareto-$\hat{k}$ diagnostics, as returned by `loo::loo(log_lik_mat, r_eff = r_eff, cores = cores, ...)`.
Notes
- Flagged `@keywords experimental`; the aggregation rule is stable but the signature may gain additional arguments in future versions.
- Calls `assert_inherits(fit, "gdpar_fit", "fit")`, which raises an error if `fit` does not inherit from `"gdpar_fit"`.
- Calls `require_suggested("loo", "compute PSIS-LOO approximate cross-validation")`, which raises an error if the `loo` package is not installed.
- Draws are extracted via `fit$fit$draws(variables = "log_lik", format = "draws_array")`, producing a 3-D array `[iteration, chain, variable]`.
- The coordinate dimension `p` is read from `fit$amm$p`; if that slot is `NULL`, `p` defaults to `1L`.
- The `r_eff` computation uses `exp(log_lik_mat)` (i.e., the likelihood on the natural scale) as required by `loo::relative_eff()`.
- Pareto-$\hat{k} > 0.7$ signals that the PSIS approximation is unreliable for the affected observations; the documentation recommends `loo::loo_moment_match()` or `loo::reloo()` as refinements.
- The `"subject"` aggregation matches the convention used by brms multivariate fits with `set_rescor(FALSE)`, yielding ELPD values directly comparable to per-coordinate competitors aggregated identically.
- The `"cell"` aggregation is recommended for diagnostics only, not for reporting comparable ELPD values, because it conflates subject-level and coordinate-level cross-validation.
Purpose
Internal helper (not exported; documented @noRd) that converts a draws_array of the Stan-generated log_lik quantity into a plain numeric matrix suitable for loo::loo(). It handles the univariate case (pass-through), the multivariate subject-aggregation case (sum over coordinates), and the multivariate cell-aggregation case (concatenation in i-major, k-minor order).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `draws_arr` | `draws_array` (from posterior/cmdstanr) | A draws array containing only the `log_lik` variable(s). Expected shape: `[n_iter, n_chain, n_variables]`. |
| `p` | integer | Declared coordinate dimension. If `1L`, the function returns the draws matrix unchanged. If $p > 1$, variable names are parsed and the draws are combined according to `aggregation`. |
| `aggregation` | character scalar | Either `"subject"` or `"cell"`. Determines whether coordinates are summed (subject) or concatenated (cell). Not validated inside this function (the caller `gdpar_loo` passes the result of `match.arg`). |
Mathematics
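For $p = 1$, the draws matrix is returned unchanged.
For $p > 1$, each variable name is matched against the regular expression `^log_lik\[(\d+),(\d+)\]$`, yielding index pairs $(i, k)$ with $i \in 1,\dots,n$ and $k \in 1,\dots,p$.
Subject aggregation produces an $S \times n$ matrix
$$\text{out}[s, i] = \sum_{k=1}^{p} \text{mat}[s, c(i,k)],$$
with $c(i,k)$ the column of `mat` corresponding to variable `log_lik[i,k]`.
Cell aggregation produces an $S \times (n \cdot p)$ matrix
$$\text{out}[s, (i-1)\,p + k] = \text{mat}[s, c(i,k)],$$
so columns are ordered as $(1,1), (1,2), \dots, (1,p), (2,1), \dots, (n,p)$.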
Returns
A plain numeric matrix (no class attributes beyond matrix, since unclass() is applied to the draws_matrix):
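- $p = 1$: shape $S \times n$, identical to the input draws matrix.
- $p > 1$, `"subject"`: shape $S \times n$, columns indexed by subject $i$.
- $p > 1$, `"cell"`: shape $S \times (n \cdot p)$, columns indexed by flat $(i,k)$.

Here $S$ is the total number of posterior draws (`n_iter` times `n_chain`), $n$ the number of subjects, and $p$ the number of coordinates.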
Notes
- The function begins by extracting variable names via `posterior::variables(draws_arr)` and converting to a plain matrix via `unclass(posterior::as_draws_matrix(draws_arr))`.
- For $p = 1$, the function returns immediately without parsing variable names; both aggregation modes would yield the same result.
- Parse error: If any variable name does not match `^log_lik\[(\d+),(\d+)\]$`, the function calls `gdpar_abort()` with:
  - `class = "gdpar_loo_parse_error"`,
  - `data = list(unparsed = vars[bad])`,
  - a message listing up to 5 offending names (via `utils::head(vars[bad], 5L)` and `sQuote`).
- Dimension mismatch error: If the parsed maximum $k$ (`p_in`) does not equal the declared `p`, the function calls `gdpar_abort()` with:
  - `class = "gdpar_loo_dim_mismatch"`,
  - `data = list(p_parsed = p_in, p_fit = p)`,
  - a message of the form `"log_lik dimension mismatch: parsed p = %d from draws; fit declares p = %d."`.
- The index matrix `ij` is built by `do.call(rbind, lapply(parsed, function(z) as.integer(z[2:3])))`, with column names `c("i", "k")`.
- The subject-aggregation loop iterates `seq_along(vars)` and accumulates `mat[, col_idx]` into `out[, i]`; the cell-aggregation loop places each `mat[, col_idx]` into `out[, flat_col]` without summation.
- No validation of `aggregation` is performed inside this function; it assumes the caller has already matched the argument. If an unrecognized value is passed, the cell-aggregation branch is taken by default (since the `if (aggregation == "subject")` check fails), and the subject branch is skipped.
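Both aggregation modes can be traced on a tiny matrix in base R. This is an illustrative sketch, not the helper itself: the toy `mat` replaces the matrix obtained from `posterior`, the error branches are omitted, and `subject` / `cell` are hypothetical names for the two outputs.

```r
# Toy draws matrix: 3 draws, n = 2 subjects, p = 2 coordinates.
# Variable names follow Stan's column-major order for a 2 x 2 log_lik.
vars <- c("log_lik[1,1]", "log_lik[2,1]", "log_lik[1,2]", "log_lik[2,2]")
mat <- matrix(as.numeric(1:12), nrow = 3, dimnames = list(NULL, vars))

# Parse (i, k) from each name; z[1] is the full match, z[2:3] the two groups.
parsed <- regmatches(vars, regexec("^log_lik\\[(\\d+),(\\d+)\\]$", vars))
ij <- do.call(rbind, lapply(parsed, function(z) as.integer(z[2:3])))
colnames(ij) <- c("i", "k")
n <- max(ij[, "i"]); p <- max(ij[, "k"])

subject <- matrix(0, nrow(mat), n)            # sums over k within subject i
cell    <- matrix(NA_real_, nrow(mat), n * p) # one column per (i, k) cell
for (col_idx in seq_along(vars)) {
  i <- ij[col_idx, "i"]; k <- ij[col_idx, "k"]
  subject[, i] <- subject[, i] + mat[, col_idx]
  cell[, (i - 1L) * p + k] <- mat[, col_idx]  # i-major, k-minor placement
}

subject[1, ]  # 8 14: (1 + 7) for subject 1, (4 + 10) for subject 2
cell[1, ]     # 1 7 4 10: cells (1,1), (1,2), (2,1), (2,2)
```

The reordering in `cell` is visible in the first row: Stan emits `log_lik[2,1]` before `log_lik[1,2]`, while the flat layout groups both coordinates of subject 1 first.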
R/gdpar-package.R contains no functions: it holds only the package-level documentation sentinel "_PACKAGE" (which Roxygen turns into the ?gdpar-package help topic, aliased gdpar-package) plus the package-wide @importFrom directives.
Package-level documentation. The roxygen block states the package's thesis — individual parameters decomposed as
The roxygen @section Three paths describes path = "bayes" (Path 1, hierarchical Bayesian via cmdstanr — the default and, in this release, the only operational path), path = "vcm" (Path 2, varying-coefficient via mgcv), and path = "hyper" (Path 3, hypernetwork via torch, explicitly "Not implemented in this release"; invoking it raises a structured gdpar_unsupported_feature_error, and torch is not a declared dependency). The stated default rationale: Path 1 admits the closed-form identifiability results (Theorems 1A/1E) and calibrated Bayesian uncertainty (Theorem 4C). (Note: consistent with the operational status documented throughout this wiki and the framework-overview vignette, the executable surface of gdpar 0.1.0 is Path 1 only; path = "vcm" and path = "hyper" abort with gdpar_unsupported_feature_error.)
Imports. @importFrom stats model.matrix terms update sd quantile median lm coef qnorm setNames qt and @importFrom methods is — the base-stats/methods symbols used across the package.