Skip to content

Part IV Function Reference 3

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 3 revisions

← Part IV — Exhaustive Function Reference (2/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (4/7) →


.gdpar_eb_make_random_init(stan_data, seed_offset = 1L, base_seed = NULL)

Purpose

Generates a list of random initial values for the Stan HMC sampler in the Empirical Bayes (EB) workflow. The structure of the returned list is conditioned on the flags and dimensions carried in stan_data, so that only parameters relevant to the configured model are initialised.

Arguments

  • stan_data (list): The data list prepared for Stan. The following fields are consulted:
    • J_groups (integer): number of reference-parameter groups $J$.
    • use_groups (integer flag, 0/1): whether group-level hyperparameters are active.
    • use_a (integer flag, 0/1): whether the $a$ AMM component is active.
    • J_a (integer): dimension of the $a$ component.
    • use_b (integer flag, 0/1): whether the $b$ AMM component is active.
    • J_b (integer): dimension of the $b$ component.
    • use_W (integer flag, 0/1): whether the $W$ AMM component is active.
    • dim_W (integer): row dimension of the $W$ matrix.
    • d (integer): column dimension of the $W$ matrix (latent dimension).
    • use_dispersion_y (integer flag, 0/1): whether an observation-level dispersion is active.
    • use_dispersion_phi (integer flag, 0/1): whether a $\phi$ dispersion parameter is active.
  • seed_offset (integer, default 1L): integer added to base_seed to derive the RNG seed.
  • base_seed (integer or NULL, default NULL): base seed. If NULL, the global RNG state is left untouched.

Mathematics

When base_seed is non-NULL, the effective seed is

$$ \texttt{rng_seed} = \texttt{as.integer(base_seed)} + \texttt{seed_offset}. $$

The draws are:

$$ \theta_{ref,j} \sim \mathcal{N}(0,, 0.5^2), \quad j = 1,\dots,J $$

and, when the corresponding flag is set:

$$ \mu_{\theta_{ref}} \sim \mathcal{N}(0,, 0.5^2), \qquad \sigma_{\theta_{ref}} = |Z| + 0.1, \quad Z \sim \mathcal{N}(0,1) $$

$$ \sigma_a = |Z| + 0.1, \qquad a_{raw} \sim \mathcal{N}(0,, 0.5^2) \in \mathbb{R}^{J_a} $$

$$ \sigma_b = |Z| + 0.1, \qquad c_{b,raw} \sim \mathcal{N}(0,, 0.5^2) \in \mathbb{R}^{J_b} $$

$$ \sigma_W = |Z| + 0.1, \qquad W_{raw} \sim \mathcal{N}(0,, 0.5^2) \in \mathbb{R}^{\texttt{dim_W} \times d} $$

$$ \sigma_y = |Z| + 0.1, \qquad \phi = |Z| + 1 $$

Returns

A named list suitable for passing as init to a Stan sampler. Scalar parameters are wrapped in 1-element arrays via as.array(); W_raw is a matrix; theta_ref, a_raw, and c_b_raw are numeric vectors.

Notes

  • When base_seed is non-NULL, the function calls set.seed(rng_seed) and registers an on.exit handler that restores the prior .Random.seed state in .GlobalEnv (if it existed) upon return. If .Random.seed did not exist in .GlobalEnv, the handler does nothing (the seed set by set.seed persists).
  • The on.exit handler is registered with add = TRUE, so it composes with any pre-existing exit handlers.
  • Flags are tested with isTRUE(... == 1L), so any value other than exactly 1L (including TRUE or 1) is treated as inactive.

.gdpar_eb_apply_correction(eb_correction, laplace_result, stan_data, p = 1L, verbose)

Purpose

Entry point for the Proposition 7B coverage-discrepancy correction in the EB workflow. In the scalar regime ($p = 1$) it computes a scalar inflation constant directly; for $p > 1$ it delegates to .gdpar_eb_correction_matrix(). The correction is not applied to the raw draws here—only the scaling object is returned for downstream S3 methods.

Arguments

  • eb_correction (logical): whether the correction should be applied.
  • laplace_result (list): result of the Laplace approximation step. Must contain theta_ref_cov (a matrix, or at least an indexable object for the [1L, 1L] element in the scalar path).
  • stan_data (list): the Stan data list. Passed through but not directly used in the scalar computation.
  • p (integer, default 1L): dimensionality of the reference parameter for the correction.
  • verbose (logical): whether to emit a diagnostic warning when the correction is disabled.

Mathematics

Scalar form ($p = 1$), from v07 Section 6 / v07b Section 5.2:

$$ C_{g,\alpha} = \kappa(\alpha) \cdot \bigl(g'(\xi^*)\bigr)^2 \cdot \bigl(J^{\xi}\bigr)^2 \cdot \bigl(I_{\theta\theta}^{marg}\bigr)^{-1} $$

For the default identity functional $g(\xi) = \xi$ and $p = 1$, this reduces to:

$$ C_{g,\alpha} = \kappa(\alpha) \cdot \mathrm{Var}^{marg}(\theta_{ref}) $$

with $\kappa(\alpha_{95\%}) = 1.92$ hardcoded in the function.

Returns

A list with two elements:

  • applied (logical): TRUE if the correction was successfully computed, FALSE otherwise.
  • constant (numeric scalar): the scalar correction $C_{g,\alpha}$ when applied = TRUE; NA_real_ otherwise.

When $p > 1$, the return value is whatever .gdpar_eb_correction_matrix() produces (a $p \times p$ matrix in constant).

Notes

  • If eb_correction is FALSE and verbose is TRUE, a warning is issued via gdpar_warn() with class "gdpar_diagnostic_warning", stating that intervals will use nominal coverage and may under-cover by $O(n^{-1})$.
  • The marginal variance is extracted as laplace_result$theta_ref_cov[1L, 1L] inside a tryCatch; any error yields NA_real_.
  • If the marginal variance is not finite or is $\leq 0$, the function returns applied = FALSE, constant = NA_real_ silently (no warning).
  • For $p > 1$, p is coerced to integer before the delegation check.

.gdpar_eb_correction_matrix(eb_correction, laplace_result, stan_data, p = 1L, verbose)

Purpose

Computes the matrix-valued Proposition 7B* coverage-discrepancy correction for the multivariate regime ($p > 1$). This is the companion of the scalar path in .gdpar_eb_apply_correction() and implements v07b Section 5.1.

Arguments

  • eb_correction (logical): whether the correction should be applied.
  • laplace_result (list): Laplace approximation result containing theta_ref_cov.
  • stan_data (list): Stan data list (passed through, not used in computation).
  • p (integer, default 1L): dimension of the reference parameter.
  • verbose (logical): intended for diagnostics (not directly used in the body beyond being accepted).

Mathematics

Matrix form (Proposition 7B*, v07b Section 5.1):

$$ C^{*}_{g,\alpha} = \kappa(\alpha) \cdot J^{\xi,T} \cdot \Sigma^{marg}_{\theta_{ref}} \cdot J^{\xi} $$

For the default identity functional $g(\xi) = \xi$, the Jacobian $J^{\xi} = I_p$, so:

$$ C^{*}_{g,\alpha} = \kappa(\alpha) \cdot \Sigma^{marg}_{\theta_{ref}} $$

with $\kappa(\alpha_{95\%}) = 1.92$. At $p = 1$ this collapses to the scalar form $\kappa(\alpha) \cdot \mathrm{Var}^{marg}(\theta_{ref})$.

Returns

A list with two elements:

  • applied (logical): TRUE if the matrix correction was successfully computed.
  • constant (matrix): the $p \times p$ (or matching cov_mat dimension) correction matrix when applied = TRUE; an NA_real_ matrix of appropriate size otherwise.

Notes

  • The function aborts silently to applied = FALSE with an NA matrix in the following cases:
    1. eb_correction is not TRUE.
    2. laplace_result$theta_ref_cov is NULL, not a matrix, or non-square (extraction wrapped in tryCatch returning NULL on error).
    3. Any element of cov_mat is non-finite.
    4. Eigenvalues of cov_mat (computed via eigen(..., symmetric = TRUE, only.values = TRUE)) are non-finite, or any eigenvalue is $< -10^{-10}$ (i.e., the matrix is not positive semi-definite within tolerance).
  • When the PSD check fails, the returned NA matrix has dimensions matching nrow(cov_mat) / ncol(cov_mat), not necessarily p.
  • The eigenvalue extraction is wrapped in tryCatch returning NA_real_ on error, which then triggers the non-finite check.
  • Downstream S3 methods are expected to fall back to nominal credible intervals when applied = FALSE.

.gdpar_eb_resolve_K_inputs(formula, amm, W, family, formula_set_input, amm_list_input, classic_with_amm_calls, family_is_named_list)

Purpose

Resolves the three possible K-input patterns (formula set, named list of amm_spec, or classic formula with AMM wrapper calls) into a single canonical amm_list_canonical, and promotes the family scope accordingly. This mirrors the K-input dispatch logic of gdpar() and is the EB-path companion of .gdpar_K. The logic is intentionally duplicated rather than refactored to preserve bit-exact behaviour of golden tests.

Arguments

  • formula (formula or gdpar_formula_set): the model formula or formula set.
  • amm (amm_spec or named list of amm_spec): the AMM specification(s).
  • W (matrix or NULL): the $W$ matrix passed to AMM construction.
  • family (gdpar_family or named list of gdpar_family): the response family specification.
  • formula_set_input (logical): whether formula is a gdpar_formula_set.
  • amm_list_input (logical): whether amm is a named list of amm_spec.
  • classic_with_amm_calls (logical): whether the formula RHS contains a()/b()/W() wrapper calls.
  • family_is_named_list (logical): whether family is a named list (heterogeneous K-slot pattern).

Returns

A list with elements:

  • amm_list_canonical (named list of amm_spec): the resolved canonical AMM specifications, one per K-slot.
  • K (integer): length of amm_list_canonical.
  • outcome_name (character): the name of the outcome variable extracted from the formula.
  • formula_env (environment): the environment associated with the formula.
  • family_promoted: the family object after scope promotion (either a promoted gdpar_family or a heterogeneous family structure).
  • family_id_k_vector (integer vector or NULL): per-observation family IDs when the heterogeneous path is taken; NULL otherwise.

Notes

Three dispatch branches, evaluated in order:

  1. formula_set_input branch: amm must be the default amm_spec() (checked via .gdpar_is_default_amm_spec()); otherwise an error of class "gdpar_input_error" is raised. The canonical list is built by .gdpar_formula_set_to_amm_spec_list(formula, W). outcome_name and formula_env are taken from formula$outcome and formula$env.

  2. amm_list_input branch: amm is used directly as amm_list_canonical. Each slot name must be non-empty (checked via nzchar()), each entry must inherit from class "amm_spec", and slot names must be unique (anyDuplicated(...) == 0L). formula must be a two-sided formula (length(formula) == 3L). Violations raise "gdpar_input_error". outcome_name is as.character(formula[[2L]]); formula_env is environment(formula).

  3. Classic (else) branch: amm must be the default amm_spec(). The first eligible parameter name is extracted from family—either family[[1L]]$param_specs[[1L]]$name (if family_is_named_list) or family$param_specs[[1L]]$name. A gdpar_formula_set is constructed via do.call(gdpar_formula_set, args_for_fs) with the formula named by that parameter, then .gdpar_formula_set_to_amm_spec_list(fs, W) builds the canonical list.

After resolution, $K = \texttt{length(amm\_list\_canonical)}$:

  • $K > 1$: If family_is_named_list, calls .gdpar_resolve_heterogeneous_family_K(family, names(amm_list_canonical)) and unpacks location_family and family_id_k_vector. Otherwise calls .gdpar_promote_scope_per_observation(family, names(amm_list_canonical)) with family_id_k_vector = NULL.
  • $K = 1$: If family_is_named_list, raises "gdpar_input_error" (heterogeneous path requires $K \geq 2$). Otherwise calls .gdpar_promote_scope_per_observation(family, k_name) with family_id_k_vector = NULL.

Errors raised (all via gdpar_abort with class = "gdpar_input_error"):

  • Formula set path with non-default amm.
  • Named-list amm with empty slot name, non-amm_spec entry, or duplicated names.
  • Named-list amm with formula that is not a two-sided formula.
  • Classic path with non-default amm.
  • Heterogeneous family (family_is_named_list = TRUE) resolved to $K = 1$.

The data field of the abort is populated for some errors (e.g., list(slot = ..., received = ...) and list(K = K)).

.gdpar_eb_run_K(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)

Purpose

Primary orchestrator for the Empirical-Bayes ("eb") estimation path under the regime $K > 1$ slots with $p = 1$ (univariate outcome shared across all slots). Internally labeled as "Sub-phase 8.6.C Path B." The function performs the complete pipeline: input validation, design-matrix construction for $K$ slots, anchor resolution, per-slot and $K$-level identifiability checks, group-aliasing checks, Stan data assembly, Laplace marginal maximization, conditional MCMC sampling, diagnostic computation, and EB correction application, returning a fitted object of class gdpar_eb_fit.

Arguments

Argument Type Meaning
amm_list_canonical named list of length $K$ Canonical AMM (anchor model matrix) specifications. Each element is a list potentially containing $a (formula for the $a$-basis), $b (formula for the $b$-basis), and $W (pre-specified basis matrix or formula). Names become slot_names.
family list / character Response family specification passed to Stan code generators and data assemblers.
data data.frame Data containing the outcome column and all covariates referenced in slot formulas.
prior list Prior specification passed to Stan code generators.
anchor various Anchor specification (scalar, vector, or special keyword) resolved by resolve_anchor_K.
outcome_name character Name of the outcome column in data.
formula_env environment Environment attached to all internally constructed formulas via stats::as.formula(..., env = formula_env).
family_id_k_vector integer vector Per-slot family identifiers of length $K$, forwarded to .assemble_stan_data_K.
skip_id_check logical If TRUE, all identifiability checks (per-slot and $K$-level) are bypassed and id_report is set to NULL.
chains numeric/integer Number of MCMC chains for the conditional model; coerced to integer.
iter_warmup numeric/integer Warmup iterations; coerced to integer.
iter_sampling numeric/integer Sampling iterations; coerced to integer.
adapt_delta numeric Stan NUTS adapt_delta control parameter.
max_treedepth numeric/integer Stan NUTS maximum tree depth; coerced to integer.
refresh numeric/integer Stan output refresh interval; coerced to integer.
verbose logical Controls show_messages, show_exceptions in Stan sampling, and verbosity of helper calls.
seed integer or NULL Random seed for Stan sampling and Laplace maximization. If non-NULL, coerced to integer.
group various Grouping specification for hierarchical structure, resolved by .resolve_group_argument.
parametrization character Requested parametrization ("cp" for centered, otherwise non-centered). In Path B both cp_a and cp_W are set uniformly across all $K$ slots.
id_check_rigor various Rigor level forwarded to .check_identifiability_K for the $K$-level check.
eb_correction logical Whether to apply the Proposition 7B coverage-discrepancy correction.
laplace_control list Control parameters forwarded to .gdpar_eb_maximize_marginal.
call call The original top-level function call, stored in the returned object.
... any Extra named arguments merged into the sample_args list passed to cmdstanr's $sample() method, potentially overriding defaults.

Mathematics

The function implements a two-stage Empirical Bayes estimator:

Stage 1 — Laplace marginal maximization. The marginal Stan model is generated and compiled. The marginal log-posterior of the anchor parameters is maximized:

$$\hat{\boldsymbol{\theta}}_{\mathrm{ref}} = \arg\max_{\boldsymbol{\theta}_{\mathrm{ref}}} ; \log p(\boldsymbol{\theta}_{\mathrm{ref}} \mid \mathbf{y})$$

where $\boldsymbol{\theta}_{\mathrm{ref}} \in \mathbb{R}^{J_{\mathrm{groups}} \times K}$ is the vector of per-group, per-slot anchor parameters. The Laplace helper returns the mode $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ and its standard error.

Stage 2 — Conditional MCMC. The conditional Stan model is generated, compiled, and sampled with $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ plugged in as data (theta_ref_k_data), drawing from:

$$p(\boldsymbol{\xi} \mid \mathbf{y}, \hat{\boldsymbol{\theta}}_{\mathrm{ref}})$$

EB correction (Proposition 7B, scalar form at $p=1$). At $p = 1$, the tensor-valued correction degenerates to a per-slot scalar:

$$C_{g,\alpha}[k] = \kappa(\alpha) \cdot \Sigma^{\mathrm{marg}}_{\theta_{\mathrm{ref},k}}$$

where $\Sigma^{\mathrm{marg}}_{\theta_{\mathrm{ref},k}}$ is the marginal variance of the $k$-th slot's anchor. The correction is applied by .gdpar_eb_apply_correction.

Identifiability diagnostic test point. For each slot $k$, the diagnostic anchor is:

$$\theta_{\mathrm{diag},k} = \begin{cases} 1 & \text{if } b_k \text{ exists and } |\theta_{\mathrm{ref},k}| < 10^{-8} \ \theta_{\mathrm{ref},k} & \text{otherwise} \end{cases}$$

This avoids testing identifiability at a degenerate zero anchor when a $b$-basis is present.

Returns

A list with S3 class c("gdpar_eb_fit", "list") containing:

Element Type / Structure Description
theta_ref_hat numeric Laplace point estimate of the anchor (flat vector of length $J_{\mathrm{groups}} \times K$).
theta_ref_se numeric Standard error of the Laplace estimate.
conditional_fit CmdStanMCMC The cmdstanr fit object from the conditional model.
amm_list_canonical named list The input AMM list (with $W slots materialized).
family The input family.
prior The input prior.
design_K list Design structure from .build_amm_design_K, containing Z_a_k_list, Z_b_k_list, X, etc.
anchor numeric vector Resolved anchor values of length $K$ from resolve_anchor_K.
stan_data list Assembled Stan data list from .assemble_stan_data_K, augmented with K_slots and p_dim.
identifiability_report named list or NULL Per-slot identifiability reports (named by slot_names) with a K_level attribute; NULL when skip_id_check = TRUE.
diagnostics MCMC diagnostics from compute_diagnostics.
diagnostics_numerical Laplace optimizer diagnostics from laplace_result$diagnostics.
parametrization list Resolved parametrization with elements cp_a (logical), cp_W (logical), cp_a_per_K (NULL), and meta (list with mode = "eb_K_path_B", note, requested).
group_info list or NULL Resolved group information from .resolve_group_argument.
correction_applied logical Whether the EB correction was applied.
eb_correction_constant The correction constant from .gdpar_eb_apply_correction.
call call The original function call.
path character Always "eb".
K integer Number of slots.
slot_names character Names of the AMM list elements.

Notes

Input validation errors (class gdpar_input_error):

  • If outcome_name is not a column in data.
  • If the outcome y is a matrix or an array with length(dim(y)) > 1 (Path B requires a length-$n$ univariate vector shared across all $K$ slots).
  • If y contains any non-finite values: for numeric y, any !is.finite(y) (NA, NaN, Inf); for non-numeric y, any is.na(y).

Identifiability errors (class gdpar_identifiability_error):

  • If any per-slot check gdpar_check_identifiability returns rep_k$passed != TRUE, the error data field contains slot (name) and report (the full report).
  • If the $K$-level check .check_identifiability_K returns passed != TRUE, the error data field contains report (the $K$-level report).
  • Both are bypassed when skip_id_check = TRUE.

Formula construction:

  • The union of all variables across all slots' $a and $b formulas is collected. If empty, the RHS is "1"; otherwise it is paste(union_vars, collapse = " + ").
  • The full formula is outcome_name ~ rhs_str with env = formula_env.
  • The RHS is extracted as formula_full[c(1L, 3L)] (a one-sided formula) and updated with ~ . + 0 to remove the intercept.

W basis materialization:

  • For each slot with a non-NULL $W element, materialize_W_basis(amm_list_canonical[[k]]$W, p = 1L) is called in place, mutating amm_list_canonical.

Parametrization resolution:

  • Both cp_a and cp_W are set to identical(parametrization, "cp"), meaning the same parametrization is applied uniformly across all $K$ slots. The meta$note explicitly states that per-slot preflight (cp_a_per_K) is queued but not yet implemented.

theta_ref_k_data reshaping:

  • theta_hat is extracted as as.numeric(laplace_result$theta_ref_hat).
  • J_groups_loc is read from stan_data$J_groups and coerced to integer.
  • The if/else branches both produce matrix(theta_hat, nrow = J_groups_loc, ncol = K, byrow = FALSE) — the two branches are functionally identical, suggesting a placeholder for future differentiation.
  • The resulting matrix is assigned to stan_data_cond$theta_ref_k_data, intended for Stan's array[J_groups] vector[K] consumer.

Stan model lifecycle:

  • The marginal Stan source is generated by .gdpar_eb_generate_stan_marginal, written to a tempfile via write_stan_to_tempfile, and compiled with cmdstanr::cmdstan_model.
  • The conditional Stan source is generated by .gdpar_eb_generate_stan_conditional and undergoes the same write-and-compile cycle.
  • Both tempfile paths are transient (side effect on the filesystem).

Sample arguments:

  • The sample_args list is constructed with explicit integer coercions for chains, iter_warmup, iter_sampling, max_treedepth, and refresh.
  • adapt_delta is passed without coercion.
  • show_messages and show_exceptions are both set to verbose.
  • seed is added only if non-NULL.
  • Extra arguments from ... are merged into sample_args by name, potentially overriding any of the above defaults.
  • Sampling is invoked via do.call(conditional_model$sample, sample_args).

Group aliasing:

  • When group_info is non-NULL, .check_group_aliasing_c7 is called for each slot $k$ with a design list containing Z_a = design_K$Z_a_k_list[[k]], Z_b = design_K$Z_b_k_list[[k]], and X = design_K$X.

Trailing roxygen block:

  • The section concludes with a roxygen @noRd documentation block for an internal function implementing the tensor-valued Proposition 7B* correction under $K &gt; 1$ and $p &gt; 1$. The function itself is not defined in this section; its documented signature includes parameters eb_correction, laplace_result_per_slot, K, p, and verbose, and it returns a list with applied, constant (3D array $[K, p, p]$), and slot_dispositions. This function appears in a subsequent section.

.gdpar_eb_correction_tensor(eb_correction, laplace_result_per_slot, K = 2L, p = 1L, verbose = TRUE)

Purpose

Builds a three-dimensional correction tensor for the Path C empirical-Bayes (EB) regime. The tensor scales each slot's reference-parameter covariance (extracted from per-slot Laplace results) by a fixed multiplier and is consumed downstream by S3 coverage methods. If any slot fails validation, the entire correction is disabled and downstream methods fall back to nominal coverage.

Arguments

  • eb_correction — logical scalar (or any value testable by isTRUE). When not TRUE, the function short-circuits and returns a disabled result with no slot processing.
  • laplace_result_per_slot — list of length K. Each element is a Laplace-fit result object expected to contain a theta_ref_cov_k field holding the $p \times p$ covariance of the reference parameters for that slot.
  • K — integer-ish scalar; number of slots. Coerced to integer. Default 2L.
  • p — integer-ish scalar; number of coordinates per slot. Coerced to integer. Default 1L.
  • verbose — logical scalar; when TRUE and at least one slot fails, a diagnostic warning is emitted via gdpar_warn.

Mathematics

For each slot $k \in \{1, \dots, K\}$ that passes validation, the correction constant is

$$C_k ;=; \kappa_{\alpha,0.95};\Sigma_k, \qquad \kappa_{\alpha,0.95} = 1.92,$$

where $\Sigma_k$ is the theta_ref_cov_k matrix for slot $k$. The resulting tensor has shape $K \times p \times p$ with slice $C[k,\cdot,\cdot] = C_k$.

The positive-semidefinite check uses eigenvalues $\lambda_i(\Sigma_k)$ computed via eigen(..., symmetric = TRUE, only.values = TRUE); a slot is rejected if any $\lambda_i &lt; -10^{-10}$.

Returns

A named list with three components:

  • applied — logical scalar. TRUE only if every slot passed validation and the tensor was filled.
  • constant — numeric array of dimensions c(K, p, p). When applied = TRUE, filled with the scaled covariances; otherwise filled with NA_real_ (the "empty tensor").
  • slot_dispositions — named character vector of length K (names are seq_len(K) coerced to character). Each entry is one of: "disabled" (correction globally off), "missing" (covariance absent or wrong shape), "non_finite" (covariance contains non-finite entries), "non_psd" (eigenvalue check failed), or "ok".

Notes

  • The multiplier kappa_alpha_95 is hardcoded to 1.92 (not the exact 1.959964… standard-normal 97.5th percentile).
  • When eb_correction is not TRUE, the returned slot_dispositions are all "disabled" and names are set via setNames(rep("disabled", K), seq_len(K)).
  • When any slot fails, any_failed is set, the function returns applied = FALSE with an empty (NA) tensor, and—if verbose—a warning of class "gdpar_diagnostic_warning" is emitted via gdpar_warn summarising the count and unique failure types.
  • The PSD eigen-decomposition is wrapped in tryCatch; an error from eigen yields NA_real_ values, which then trigger the "non_psd" disposition.
  • The covariance shape check requires is.matrix(cov_k), nrow == p, and ncol == p.

empty_tensor() (nested closure inside .gdpar_eb_correction_tensor)

Purpose

Local helper that allocates a fresh $K \times p \times p$ array of NA_real_, used as the default/disabled constant tensor.

Arguments

None. Captures K and p from the enclosing .gdpar_eb_correction_tensor scope.

Returns

A numeric array of dimensions c(K, p, p) filled entirely with NA_real_.

Notes

Defined as a closure; not accessible outside its parent function.


.build_amm_design_KxP(amm_list_canonical, data, formula_rhs)

Purpose

Constructs the per-slot multivariate (ragged) design matrices for the Path C $K \times p$ specification. Iterates over the K slots of a canonicalised amm_spec list, enforces homogeneous $p \geq 2$ across slots, and delegates to .build_amm_design_multi() for each slot. The returned structure is the direct input consumed by .assemble_stan_data_KxP().

Arguments

  • amm_list_canonical — named list of length $K \geq 2$ of amm_spec objects. Each object must carry a $p field (defaulting to 1L via %||% if absent) that is $\geq 2$ and identical across all slots.
  • data — data frame containing the variables referenced by the per-slot AMM specifications. Validated by assert_data_frame().
  • formula_rhs — two-sided formula identifying the covariate columns of data used as the linear factor $x$. Passed through verbatim to .build_amm_design_multi() for each slot.

Returns

A named list with:

  • K — integer scalar; number of slots.
  • p — integer scalar; the homogeneous coordinate dimension (taken from the first slot).
  • slot_names — character vector of length K; the names() of amm_list_canonical.
  • design_per_slot — named list of length K. Each entry is the list returned by .build_amm_design_multi(a_k, data, formula_rhs) for that slot's amm_spec.

Notes

  • Aborts with class "gdpar_internal_error" via gdpar_abort if amm_list_canonical is not a list or has length < 2.
  • Aborts if any element lacks a non-empty name (is.null(slot_names) or any(!nzchar(slot_names))).
  • Aborts if the per-slot $p values are not all $\geq 2$ or not all identical. The error message includes the comma-separated p_per_slot vector.
  • Each slot is validated with assert_inherits(a_k, "amm_spec", ...) before delegation.
  • The $p extraction uses a$p %||% 1L, so a missing $p field is treated as 1L—which then triggers the homogeneous-$p \geq 2$ abort.

.assemble_stan_data_KxP(design_KxP, family, amm_list_canonical, y_matrix, theta_anchor_kp, group_id = NULL, path = c("EB", "FB"), cp_W = FALSE)

Purpose

Assembles the complete named-list data block consumed by the Path C Stan templates (amm_eb_marginal_KxP.stan / amm_eb_conditional_KxP.stan for the EB path; amm_canonical_pmulti_KxP.stan for the FB path). Dispatches on path to enforce or lift the Sub-phase 8.6.D first-iteration restrictions: EB hardcodes use_W = 0 and restricts stan_id to $\{1, 3\}$; FB enables the modulating $W$ (globally shared) and accepts the extended Path B family set $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$.

Arguments

  • design_KxP — list returned by .build_amm_design_KxP(). Must contain $design_per_slot, $K, and $p.
  • family — promoted gdpar_family object (validated by assert_inherits). Must carry a $stan_id field and a $name field.
  • amm_list_canonical — named list of K amm_spec objects with $p \geq 2$ per slot. Used to extract per-slot use_a / use_b flags and (FB path) $W$ metadata.
  • y_matrix — numeric or integer matrix of outcomes, shape $n \times p$.
  • theta_anchor_kp — numeric matrix of shape $K \times p$; per-slot per-coordinate anchors on the linear-predictor scale.
  • group_id — optional integer vector of length $n$. Resolved via .resolve_group_id().
  • path — character scalar; one of "EB" or "FB" (resolved by match.arg). Default "EB".
  • cp_W — logical scalar. Present in the signature but not referenced anywhere in the function body.

Mathematics

Per-slot per-coordinate design matrices are packed into 4D arrays with zero-padding:

$$Z_{a,kp}[k, j, \cdot, \cdot] \in \mathbb{R}^{n \times J_{a,\max}}, \qquad Z_{b,kp}[k, j, \cdot, \cdot] \in \mathbb{R}^{n \times J_{b,\max}},$$

where

$$J_{a,\max} = \max_{k,j} J_a^{(k,j)}, \qquad J_{b,\max} = \max_{k,j} J_b^{(k,j)},$$

and $J_a^{(k,j)} = \text{ncol}(d_k\$\text{Z\_a\_list}[[j]])$ (similarly for $b$). Slots/coordinates with fewer columns than the maximum are right-padded with zeros.

For the FB path with $W$ enabled, the total $W$ parameter dimension is

$$\text{dim_W}_{\text{total}} = K \cdot p \cdot W_{\text{per_kj_dim}},$$

with $W_{\text{per\_kj\_dim}} = \text{amm}\$W\$\text{dim}$ taken from the first slot that declares $W$.

The use_a_k / use_b_k flags are computed as

$$\text{use_a}_k = \mathbb{1}!\left[a_ka \neq \text{NULL} ;\lor; \exists d \in a_k\text{dims}:; da \neq \text{NULL}\right],$$

and analogously for use_b_k.

Returns

A named list. The base list (returned for both paths) contains:

Field Type Description
n integer Number of observations.
K integer Number of slots.
p integer Coordinate dimension.
family_id_k_vector integer vector (length K) Homogeneous stan_id replicated K times.
inv_link_id_per_slot integer vector (length K) Computed by .gdpar_compute_inv_link_id_per_slot().
use_a_k integer vector (length K) Per-slot $a$-component flag.
use_b_k integer vector (length K) Per-slot $b$-component flag.
use_W integer scalar 0L (EB) or as.integer(any_W) (FB).
J_a_max integer Maximum $a$-design column count.
J_b_max integer Maximum $b$-design column count.
J_a_per_kp integer matrix ($K \times p$) Per-slot per-coord $a$-design sizes.
J_b_per_kp integer matrix ($K \times p$) Per-slot per-coord $b$-design sizes.
Z_a_kp numeric array ($K \times p \times n \times J_{a,\max}$) Padded $a$-design matrices.
Z_b_kp numeric array ($K \times p \times n \times J_{b,\max}$) Padded $b$-design matrices.
y_real numeric matrix ($n \times p$) Real-valued outcomes (or zeros if needs_real is FALSE).
y_int integer matrix ($n \times p$) Integer-valued outcomes (or zeros if needs_int is FALSE).
theta_anchor_kp list of K double vectors (each length p) Row-wise decomposition of the input matrix.
use_dispersion_y_k integer vector (length K) Always zero in both paths.
use_dispersion_phi_k integer vector (length K) Always zero in both paths.
use_groups (from .resolve_group_id) Group flag.
J_groups (from .resolve_group_id) Number of groups.
group_id (from .resolve_group_id) Group index vector.
K_slots integer Redundant copy of K.
p_dim integer Redundant copy of p.

For the FB path only, the list is extended (c(base_list, ...)) with:

Field Type Description
dim_W integer Total $W$ dimension ($K \cdot p \cdot W_{\text{per\_kj\_dim}}$), or 0L.
d integer Number of columns in the shared design matrix $X$.
W_per_kj_dim integer Per-(slot, coord) basis dimension.
X numeric matrix ($n \times d$) Shared linear-factor design matrix (or $n \times 0$).
W_type_id (from .gdpar_resolve_W_stan_data) $W$ basis type identifier.
W_n_knots_full (from .gdpar_resolve_W_stan_data) Knot count.
W_knots_full (from .gdpar_resolve_W_stan_data) Knot vector.
W_degree (from .gdpar_resolve_W_stan_data) Spline degree.

Notes

  • EB path restrictions: use_W is hardcoded to 0L; stan_id must be in $\{1, 3\}$ (Gaussian or Negative Binomial), otherwise a "gdpar_unsupported_feature_error" is raised. If any slot declares W != NULL on the EB path, a "gdpar_unsupported_feature_error" is raised.
  • FB path extensions: stan_id must be in $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$; otherwise a "gdpar_unsupported_feature_error" is raised. $W$ is enabled if any slot declares it; the first such slot's $W object defines the basis metadata (shared globally).
  • Outcome validation: For count families (stan_id $\in \{3, 10, 11, 12, 13\}$), every entry of y_matrix must be a finite, non-negative integer; otherwise a "gdpar_input_error" is raised. For continuous families (stan_id $\in \{1, 5, 6, 7, 8, 9\}$), every entry must be finite.
  • y_real / y_int population: needs_real is TRUE for stan_id $\in \{1, 5, 6, 7, 8, 9\}$; needs_int is TRUE for stan_id $\in \{3, 10, 11, 12, 13\}$. The unused matrix is zero-filled.
  • theta_anchor_kp is validated as a $K \times p$ matrix and then decomposed row-wise into a list of K length-p double vectors via lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ])).
  • family_id_k_vector is rep(as.integer(stan_id), K)—homogeneous across slots regardless of path.
  • use_dispersion_y_k / use_dispersion_phi_k are zero vectors in both paths (the FB comment notes future B9.7+ may lift this).
  • cp_W is accepted as a parameter but never read.
  • Internal errors (class "gdpar_internal_error") are raised for: invalid design_KxP structure; K < 2 or p < 2; y_matrix not a matrix; y_matrix column count mismatch; theta_anchor_kp shape mismatch; FB path with dim_W <= 0 when use_W == 1.
  • Calls .resolve_group_id(), .gdpar_compute_inv_link_id_per_slot(), and (FB only) .gdpar_resolve_W_stan_data().
  • The pad_to local helper (see below) handles zero-padding of design matrices.

pad_to(z, target_cols, n_rows) (nested closure inside .assemble_stan_data_KxP)

Purpose

Zero-pads a design matrix z to target_cols columns. If target_cols is 0L, returns an $n_{\text{rows}} \times 0$ matrix. If z already has at least target_cols columns, returns z unchanged. Otherwise right-pads with a zero matrix.

Arguments

  • z — numeric matrix; the per-slot per-coordinate design matrix to pad.
  • target_cols — integer scalar; the target column count ($J_{a,\max}$ or $J_{b,\max}$).
  • n_rows — integer scalar; the number of rows to use when target_cols == 0L (i.e., $n$).

Returns

A numeric matrix with nrow(z) rows and max(ncol(z), target_cols) columns (or n_rows rows and 0 columns when target_cols == 0L).

Notes

Defined as a local closure inside .assemble_stan_data_KxP; captures nothing from the enclosing scope (all inputs are explicit arguments). When target_cols > 0L but ncol(z) >= target_cols, z is returned as-is (no truncation occurs even if z has more columns than the target).

.gdpar_eb_make_random_init_KxP(stan_data, seed_offset = 1L, base_seed = NULL)

Purpose Internal helper that fabricates a random initial-values list for the cmdstanr optimizer / Laplace approximation in the Path C K×p EB workflow. The returned list conforms to the cmdstanr automatic packing convention for the theta_ref_kp parameter (a 3D array [J, K, p]) and conditionally emits the auxiliary scale / raw-coefficient parameters that the K×p Stan template exposes when group structure or free a/b coefficients are active.

Arguments

  • stan_data — list. The Stan data environment. The following fields are consulted (via null-coalescing %||%): K_slots (fallback K), p_dim (fallback p), J_groups (fallback 1L), use_groups (fallback 0L), use_a_k, use_b_k, J_a_per_kp, J_b_per_kp.
  • seed_offset — integer scalar, default 1L. Integer added to base_seed to derive the per-start RNG seed, enabling distinct inits across multi-start iterations.
  • base_seed — integer scalar or NULL. When non-NULL, the function seeds the global RNG with as.integer(base_seed) + seed_offset and restores the prior .Random.seed state on exit. When NULL, no seeding is performed and the global RNG state is untouched.

Mathematics

The RNG seed is

$$ \text{rng_seed} = \begin{cases} \text{base_seed} + \text{seed_offset} & \text{base_seed} \neq \text{NULL} \ \text{NULL (no seeding)} & \text{otherwise} \end{cases} $$

Draws produced (all i.i.d. unless noted):

  • theta_ref_kp[g,k,c] $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[J, K, p]$.
  • When use_groups == 1:
    • mu_theta_ref_kp[1,k,c] $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[1, K, p]$.
    • sigma_theta_ref_kp[1,k,c] $= |\mathcal{N}(0.5,\, 0.05^2)|$, shape $[1, K, p]$.
  • When any use_a_k == 1:
    • sigma_a_k[s] $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $s = 1, \dots, n_{\sigma_a}$, where $n_{\sigma_a}$ is the count of slots $k$ satisfying use_a_k[k] == 1 and $\sum_{c} \mathbf{1}\{J_{a,\text{per\_kp}}[k,c] &gt; 0\} &gt; 0$.
    • a_raw[j] $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{a,\text{per\_kp}}[k,c]$.
  • When any use_b_k == 1:
    • sigma_b_k[k] $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $k = 1, \dots, K$.
    • c_b_kp_raw[j] $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{b,\text{per\_kp}}[k,c]$.

Returns

A named list. Always contains theta_ref_kp (a 3D numeric array of dim c(J, K, p)). Conditionally also contains:

  • mu_theta_ref_kp — 3D array [1, K, p] (only when use_groups == 1).
  • sigma_theta_ref_kp — 3D array [1, K, p] (only when use_groups == 1).
  • sigma_a_k — 1D numeric array of length n_sigma_a (only when any_use_a == 1 and n_sigma_a > 0).
  • a_raw — numeric vector of length total_J_a_free (only when any_use_a == 1 and total_J_a_free > 0).
  • sigma_b_k — 1D numeric array of length K (only when any_use_b == 1).
  • c_b_kp_raw — numeric vector of length total_J_b_free (only when any_use_b == 1 and total_J_b_free > 0).

Notes

  • Side effect: when base_seed is non-NULL, the global .Random.seed is overwritten via set.seed(rng_seed) and restored on function exit through an on.exit handler. If .Random.seed did not previously exist in .GlobalEnv, the handler performs no restoration (the seed state is left as set).
  • The slot-free-a mask is computed by coercing stan_data$J_a_per_kp to an integer matrix of shape K × p (row-major), then taking rowSums(.jap > 0L) > 0L intersected with use_a_k == 1L. This mirrors the n_sigma_a transformed-data quantity of the K×P Stan template; when every slot carries free a coefficients, $n_{\sigma_a} = K$ and the draw count is bit-identical to the unconditional case.
  • sigma_a_k and sigma_b_k are wrapped with as.array to ensure 1D-array typing expected by cmdstanr init packing.
  • No errors are raised by this function; malformed stan_data would propagate as errors from downstream coercions (e.g. as.integer, matrix).

.gdpar_eb_maximize_marginal_KxP(model, stan_data, control, seed, verbose)

Purpose

Step (i) of the EB workflow under Path C, specialized for the K×p regime. Runs a multi-start joint Laplace approximation over the full theta_ref_kp anchor tensor of shape [J_groups, K, p], selects the best init by marginal log-likelihood, draws from the Laplace approximation, extracts per-slot $p \times p$ covariance blocks (with canonical group-averaging when $J &gt; 1$), applies per-slot adaptive Levenberg–Marquardt ridge perturbation, and gates the result on a condition-number threshold. The packaged output is consumed downstream by .gdpar_eb_correction_tensor().

Arguments

  • model — cmdstanr model object. Must expose $optimize and $laplace methods.
  • stan_data — list. Stan data list; must contain J_groups, K_slots, p_dim, plus whatever fields .gdpar_eb_make_random_init_KxP requires.
  • control — list. Must contain at least: multi_start_M (integer, number of starts), optim_algorithm (passed to model$optimize), laplace_draws (integer, number of Laplace draws), kappa_threshold (numeric, condition-number gate), and any fields consumed by .gdpar_eb_lm_perturb.
  • seed — integer scalar or NULL. Base seed for reproducibility; propagated to both the init generator and cmdstanr.
  • verbose — logical. Controls emission of informational messages via gdpar_inform and gdpar_warn.

Mathematics

Multi-start optimization. For $m = 1, \dots, M$:

$$ \theta^{(m)}_0 = \texttt{.gdpar_eb_make_random_init_KxP}(\text{stan_data},, m,, \text{seed}) $$

$$ \hat{\theta}^{(m)} = \arg\max_{\theta}, \log p(\theta \mid \text{data}) $$

with optimizer seed $\text{seed} + m$ (when seed non-NULL). The marginal log-likelihood of each start is $\ell_m = \texttt{opt\_m\$mle()["lp\_\_"]}$. The best start is

$$ m^\star = \arg\max_{m \in {1,\dots,M}} \ell_m $$

where inits that errored are skipped (their $\ell_m$ is NA_real_).

Laplace approximation. Given $\hat{\theta}^{(m^\star)}$,

$$ \theta^{(s)} \sim \mathcal{N}!\left(\hat{\theta}^{(m^\star)},, \left[-\nabla^2 \log p(\hat{\theta}^{(m^\star)} \mid \text{data})\right]^{-1}\right), \quad s = 1, \dots, S $$

with $S = \text{control\$laplace\_draws}$ and Laplace seed $\text{seed} + 1000$ (when seed non-NULL).

Posterior mean of the anchor tensor:

$$ \hat{\theta}_{g,k,c} = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_{g,k,c} $$

Per-slot covariance. For slot $k$, let $\mathcal{I}_k = \{(g, c) : g \in [J],\, c \in [p]\}$ index the $J \cdot p$ rows of the draws matrix belonging to slot $k$. The raw slot covariance is the sample covariance of those columns. When $J &gt; 1$, the $J$ group-blocks of size $p \times p$ are averaged:

$$ \Sigma_k = \frac{1}{J}\sum_{g=1}^{J} \Sigma_{k,g} $$

where $\Sigma_{k,g}$ is the $p \times p$ sub-block of the slot covariance corresponding to group $g$. Each $\Sigma_k$ is then passed through .gdpar_eb_lm_perturb, yielding a ridge-perturbed $\tilde{\Sigma}_k$ with post-ridge condition number $\kappa_k$.

Dispersion diagnostic. Let $\mathcal{F} = \{m : \ell_m \text{ finite}\}$. Then

$$ \text{dispersion} = \begin{cases} \dfrac{\mathrm{sd}({\ell_m}_{m \in \mathcal{F}})}{\max!\left(\left|\overline{{\ell_m}_{m \in \mathcal{F}}}\right|,, 1\right)} &amp; |\mathcal{F}| &gt; 1 \ \text{NA_real_} &amp; \text{otherwise} \end{cases} $$

Returns

A list with three top-level components:

  • theta_ref_kp_hat — 3D numeric array of dim c(J, K, p), the posterior-mean anchor tensor.
  • laplace_result_per_slot — named list of length K with names paste0("slot_", 1:K). Each entry is a list with element theta_ref_cov_k, a $p \times p$ numeric matrix (the ridge-perturbed, group-averaged slot covariance).
  • diagnostics — list with:
    • kappa_per_slot — numeric vector of length K (post-ridge condition numbers).
    • lm_lambda_per_slot — numeric vector of length K (LM ridge lambda used per slot).
    • lm_n_iter_per_slot — integer vector of length K (LM iterations per slot).
    • lm_status_per_slot — character vector of length K (LM status per slot; "not_needed" when no perturbation was required).
    • multi_start_dispersion — numeric scalar or NA_real_.
    • marginal_log_lik_history — numeric vector of length M (the $\ell_m$ values, with NA_real_ for failed starts).
    • best_init_index — integer scalar $m^\star$.

Notes

  • Multi-start loop: each start's model$optimize call is wrapped in tryCatch; on error, verbose-gated gdpar_inform message of class gdpar_eb_message is emitted and the start is recorded as NULL (skipped via next). The lp__ extraction is itself wrapped in tryCatch returning NA_real_ on failure.
  • Best-init selection: a start becomes the new best if best_idx is NA, or if its $\ell_m$ is finite and strictly greater than the current best's recorded value (which may be NA). This means the first non-errored start with finite $\ell_m$ is always selected as a fallback.
  • All-starts-failed abort: if best_opt remains NULL after the loop, gdpar_abort is invoked with class c("gdpar_unsupported_feature_error") and data = list(history_lp = history_lp), recommending gdpar() (FB) instead.
  • Laplace failure abort: if model$laplace errors (caught by tryCatch returning NULL), gdpar_abort is invoked with class c("gdpar_eb_numerical_error") and data = list(history_lp = history_lp, best_idx = best_idx).
  • Missing-variable abort: the expected theta_ref_kp[g,k,c] variable names are enumerated via expand.grid(g = 1:J, k = 1:K, c = 1:p) (row-major in (g, k, c)). If the intersection with dimnames(draws)$variable does not cover all expected names, gdpar_abort is invoked with class c("gdpar_internal_error").
  • Condition-number gate: after per-slot LM perturbation, any slot with finite $\kappa_k &gt; \text{control\$kappa\_threshold}$ or LM status "exhausted" triggers gdpar_abort of class c("gdpar_eb_numerical_error") with a detailed data list carrying per-slot kappa, lambda, n_iter, status, and the full history_lp.
  • Dispersion warning: when dispersion > 0.05 and verbose is TRUE, gdpar_warn of class c("gdpar_diagnostic_warning") is emitted with data = list(dispersion, history_lp), flagging possible multimodality.
  • Draws handling: uses posterior::subset_draws and posterior::as_draws_matrix; the %||% operator guards dimnames(draws)$variable.
  • Group-block indexing: within a slot's column subset, the $g$-th group block is assumed to occupy contiguous indices ((g-1)*p + 1):((g-1)*p + p), consistent with the (g, k, c) row-major enumeration from expand.grid.
  • Single-draws-column slot: when a slot has only one column in mat_kp (i.e. $J \cdot p = 1$ for that slot), the covariance is computed as matrix(stats::var(...), 1, 1) rather than stats::cov.
  • External dependencies: calls .gdpar_eb_lm_perturb, gdpar_inform, gdpar_abort, gdpar_warn (all defined elsewhere in the package).

.resolve_anchor_KxP(anchor, family, y_matrix, K, p, verbose)

Purpose

Internal helper that resolves the user-supplied anchor argument into a canonical $K \times p$ numeric matrix used by the Path C (multivariate, $K \ge 2$ AND $p \ge 2$) Empirical-Bayes pipeline. It supports four input modes: a finite numeric scalar (broadcast), a $K \times p$ finite matrix, the string "prior_mean", and the string "empirical_y".

Arguments

  • anchornumeric scalar, matrix of shape $K \times p$, or character scalar in {"prior_mean", "empirical_y"}. The unresolved anchor specification.
  • family — a family object (list-like) exposing a linkfun function in its location slot; used only when anchor == "empirical_y".
  • y_matrixnumeric matrix of shape $n \times p$ containing the multivariate outcome; consumed only for the "empirical_y" branch via colMeans.
  • Kinteger/numeric scalar; number of slots (rows of the resolved anchor).
  • pinteger/numeric scalar; number of coordinates per slot (columns of the resolved anchor).
  • verboselogical scalar; when TRUE, an informational message is emitted for the "empirical_y" branch via gdpar_inform.

Mathematics

For the scalar branch the resolved matrix is

$$ A_{k,c} = a \in \mathbb{R}, \quad k=1,\dots,K,; c=1,\dots,p. $$

For "prior_mean",

$$ A_{k,c} = 0 \quad \forall, k,c. $$

For "empirical_y", let $\bar{y}_c = \frac{1}{n}\sum_{i=1}^{n} y_{i,c}$ be the column means of y_matrix and $g(\cdot)$ be family$linkfun. Then

$$ A_{k,c} = \begin{cases} g(\bar{y}_c) & k = 1, \ 0 & k = 2,\dots,K, \end{cases} \quad c = 1,\dots,p. $$

Returns

A matrix of mode double with nrow = K and ncol = p. In the "empirical_y" branch, row 1 holds $g(\bar y_1), \dots, g(\bar y_p)$ and rows $2,\dots,K$ are all zero.

Notes

  • Scalar branch: requires is.numeric(anchor), length(anchor) == 1L, and is.finite(anchor); otherwise falls through.
  • Matrix branch: requires is.matrix(anchor), nrow(anchor) == K, ncol(anchor) == p. If any entry is non-finite, raises an error of class c("gdpar_input_error", ...) via gdpar_abort with the message "Argument 'anchor' as a matrix must contain only finite values."
  • Character branch: only the two exact strings "prior_mean" and "empirical_y" are recognized. For "empirical_y", family$linkfun is applied per column inside a tryCatch; on error, gdpar_abort is invoked with class "gdpar_input_error" and a formatted message naming the offending column index and the captured condition message.
  • When verbose is TRUE and the "empirical_y" branch is taken, a message of class c("gdpar_anchor_message", ...) is emitted via gdpar_inform describing the computed anchor (formatted to 4 digits) and noting that all other slots are anchored at 0.
  • Fall-through (no recognized mode): gdpar_abort is called with class "gdpar_input_error" and data = list(received = anchor, K = K, p = p), with the message enumerating the four admissible forms.
  • No side effects beyond messages and errors; the function is pure with respect to its inputs.

.gdpar_eb_run_KxP(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)

Purpose

Internal orchestrator for the Path C Empirical-Bayes (EB) pipeline, invoked when $K \ge 2$ AND $p \ge 2$ (Sub-phase 8.6.D, first iteration). It validates inputs, builds the $K \times p$ AMM design, resolves the anchor, optionally runs per-slot identifiability checks, assembles Stan data, generates and compiles the marginal and conditional Stan models, runs Laplace mode-finding on the marginal model, runs HMC sampling on the conditional model, computes diagnostics, applies the EB correction tensor, and assembles a gdpar_eb_fit result object.

Arguments

  • amm_list_canonical — named list of length $K$ of canonicalized amm_spec objects. Each element is expected to expose fields $a, $b, $W, $dims, and $p. Names become slot_names.
  • family — family object/list; passed to .resolve_anchor_KxP, .gdpar_eb_check_stan_id_for_path, .assemble_stan_data_KxP, and the Stan-code generators. Must expose linkfun (used by the anchor resolver).
  • datadata.frame containing the outcome matrix-column and all RHS variables referenced by the AMM specifications.
  • prior — prior specification object; forwarded to the Stan-code generators .gdpar_eb_generate_stan_marginal and .gdpar_eb_generate_stan_conditional.
  • anchor — unresolved anchor specification; forwarded to .resolve_anchor_KxP.
  • outcome_namecharacter scalar naming the matrix-column in data that holds the $n \times p$ outcome.
  • formula_envenvironment used as the enclosure for stats::as.formula constructions.
  • family_id_k_vector — (declared in the signature but not referenced in the body of this section) per-slot family-id vector.
  • skip_id_checklogical scalar. When TRUE, identifiability checking is skipped entirely and id_report is set to NULL.
  • chainsinteger/numeric scalar; number of MCMC chains, coerced via as.integer.
  • iter_warmupinteger/numeric scalar; warmup iterations, coerced via as.integer.
  • iter_samplinginteger/numeric scalar; sampling iterations, coerced via as.integer.
  • adapt_deltanumeric scalar; CmdStan adapt_delta setting (passed uncoerced).
  • max_treedepthinteger/numeric scalar; CmdStan max_treedepth, coerced via as.integer.
  • refreshinteger/numeric scalar; CmdStan refresh, coerced via as.integer.
  • verboselogical scalar; controls informational messages and show_messages/show_exceptions to CmdStan.
  • seedNULL or integer/numeric scalar; PRNG seed for both Laplace mode-finding and HMC sampling.
  • group — group specification forwarded to .resolve_group_argument.
  • parametrizationcharacter scalar; currently only "cp" is detected (both cp_a and cp_W are set to the same value); other values yield FALSE for both.
  • id_check_rigor — (declared in the signature but not referenced in the body of this section) rigor level for identifiability checks.
  • eb_correction — correction specification forwarded to .gdpar_eb_correction_tensor.
  • laplace_controllist of control parameters forwarded to .gdpar_eb_maximize_marginal_KxP.
  • call — the original call; stored verbatim in the returned object.
  • ... — extra named arguments spliced into the sample_args list passed to cmdstanr::cmdstan_model$sample, overriding defaults.

Mathematics

Let $K$ be the number of slots and $p$ the (homogeneous) per-slot coordinate count. The function enforces

$$ p_k = p \quad \forall, k \in {1,\dots,K}, \qquad p \ge 2. $$

The anchor matrix $A \in \mathbb{R}^{K \times p}$ is resolved by .resolve_anchor_KxP. The Laplace stage produces a mode estimate

$$ \hat\theta^{\mathrm{ref}}_{g,k,c} \in \mathbb{R}^{J_{\mathrm{groups}} \times K \times p}, $$

together with per-slot covariance blocks $\Sigma_k \in \mathbb{R}^{p \times p}$. Per-group, per-slot, per-coordinate standard errors are obtained from the diagonal of each slot's covariance block and replicated across groups:

$$ \mathrm{SE}_{g,k,c} = \sqrt{\max\bigl([\Sigma_k]_{c,c},, 0\bigr)}, \qquad g = 1,\dots,J_{\mathrm{groups}}. $$

If $\Sigma_k$ is not a matrix, $\mathrm{SE}_{g,k,c} = \mathrm{NA}$ for all $g, c$ in that slot.

Returns

A list with S3 class c("gdpar_eb_fit", "list") containing:

  • theta_ref_kp_hat — the $J_{\mathrm{groups}} \times K \times p$ array of Laplace mode estimates (from laplace_result$theta_ref_kp_hat).
  • theta_ref_kp_searray of shape $J_{\mathrm{groups}} \times K \times p$ of standard errors derived from per-slot covariance diagonals.
  • theta_ref_kp_cov_per_slotlist of length $K$ of per-slot covariance matrices (theta_ref_cov_k).
  • conditional_fit — the cmdstanr fit object returned by conditional_model$sample(...).
  • amm_list_canonical — the input amm_list_canonical (unchanged).
  • family — the input family.
  • prior — the input prior.
  • design_KxP — the design object returned by .build_amm_design_KxP.
  • anchor — the resolved $K \times p$ anchor matrix.
  • stan_data — the Stan data list returned by .assemble_stan_data_KxP.
  • identifiability_reportNULL when skip_id_check is TRUE; otherwise a named list of length $K$ of per-slot reports (entries may be NULL if a per-slot check errored).
  • diagnostics — object returned by compute_diagnostics(fit_cond, verbose = verbose).
  • diagnostics_numericallaplace_result$diagnostics (numerical diagnostics from the Laplace stage).
  • parametrizationlist with elements cp_a (logical), cp_W (logical), cp_a_per_K (NULL), and meta (a list with mode = "eb_KxP_path_C", a note string, and requested = list(parametrization = parametrization)).
  • group_info — value returned by .resolve_group_argument (may be NULL).
  • correction_appliedcorrection$applied.
  • correction_tensor_constantcorrection$constant.
  • correction_tensor_dispositionscorrection$slot_dispositions.
  • call — the input call.
  • path"eb_KxP".
  • K — integer number of slots.
  • p — resolved per-slot coordinate count.
  • slot_namesnames(amm_list_canonical).

Notes

  • Homogeneous-p requirement. Per-slot $p$ values are extracted via a$p %||% 1L. If any(p_per_slot != p_resolved), an error of class c("gdpar_unsupported_feature_error", ...) is raised with data = list(K = K, p_per_slot = p_per_slot), deferring heterogeneous-$p$ support to "Block 9.x".
  • Minimum $p$. If p_resolved < 2L, an error of class c("gdpar_internal_error", ...) is raised, indicating a dispatcher routing mistake.
  • Stan-id coverage. .gdpar_eb_check_stan_id_for_path(family, K, p_resolved) is called before any data validation; this is the gatekeeper for the supported (family, K, p) triples in the first iteration.
  • Outcome presence and shape. outcome_name must be a column of data; otherwise an error of class c("gdpar_input_error", ...) is raised. The outcome object must be a matrix (not a vector/factor); otherwise an error of class c("gdpar_input_error", ...) is raised with data = list(received_class = class(y_obj), outcome_name = outcome_name). Its column count must equal p_resolved; otherwise an error of class c("gdpar_input_error", ...) is raised with data = list(received_ncol = ncol(y_obj), p_resolved = p_resolved).
  • Outcome finiteness. For numeric outcomes, non-finite entries (NA, NaN, Inf) are flagged via !is.finite; for non-numeric outcomes, is.na is used. Any non-finite entry triggers an error of class c("gdpar_input_error", ...) reporting the count.
  • RHS union. The union of all.vars over each slot's $a, $b, and (if present) each element of $dims's $a/$b is collected. If empty, the RHS string defaults to "1"; otherwise it is paste(union_vars, collapse = " + "). A full formula outcome_name ~ rhs is constructed in formula_env, then the RHS is extracted as formula_full[c(1L, 3L)] and updated with ~ . + 0 to drop the intercept.
  • W component disabled. If any slot declares a non-NULL $W, an error of class c("gdpar_unsupported_feature_error", ...) is raised (per decision D39), directing the user to remove the W() wrapper or defer to Block 9.x.
  • Identifiability checks. When skip_id_check is TRUE, id_report is NULL. When FALSE, a verbose informational message of class c("gdpar_eb_message", ...) is emitted noting that the K-level joint check is deferred. Per-slot, the slot's own RHS variables are gathered (analogously to the union above), a per-slot RHS formula is built in formula_env, and gdpar_check_identifiability is called with theta_ref_init set to anchor_value[k, 1L] (or 1 when that anchor entry is near zero in absolute value — threshold 1e-8 — and the slot has a non-NULL $b). Each per-slot call is wrapped in tryCatch; on error, a verbose message of class c("gdpar_eb_message", ...) is emitted and the slot's report is set to NULL.
  • Group resolution. .resolve_group_argument(group, data, n = nrow(y_matrix), verbose = verbose) is called; if it returns non-NULL, its $group_id is forwarded to .assemble_stan_data_KxP as group_id.
  • Parametrization. Both cp_a and cp_W are set to identical(parametrization, "cp"); cp_a_per_K is NULL. The meta$note documents that the first iteration ships with NCP hardcoded in the Stan templates and that per-slot preflight is queued for Block 9.x.
  • Stan model generation and compilation. .gdpar_eb_generate_stan_marginal(prior, cp_a, cp_W, K, p, family) produces the marginal source, written to a tempfile via write_stan_to_tempfile and compiled with cmdstanr::cmdstan_model. The same pattern is used for the conditional source via .gdpar_eb_generate_stan_conditional.
  • Laplace stage. .gdpar_eb_maximize_marginal_KxP(model = marginal_model, stan_data = stan_data, control = laplace_control, seed = seed, verbose = verbose) returns laplace_result, expected to contain theta_ref_kp_hat (a 3D array) and laplace_result_per_slot (a list of length $K$ whose elements carry theta_ref_cov_k).
  • Conditional stage. stan_data_cond is a shallow copy of stan_data augmented with theta_ref_kp_data = laplace_result$theta_ref_kp_hat. The comment notes that cmdstanr's automatic packing accepts an R 3D array of shape $[J_{\mathrm{groups}}, K, p]$ for the Stan declaration array[J_groups, K] vector[p] theta_ref_kp_data. Sampling is invoked via do.call(conditional_model$sample, sample_args); sample_args includes data, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, show_messages = verbose, show_exceptions = verbose, and (if non-NULL) seed. Extra arguments from ... are spliced in by name, potentially overriding the defaults.
  • Diagnostics. compute_diagnostics(fit_cond, verbose = verbose) is called on the conditional fit; its return value is stored as diagnostics. The Laplace stage's own diagnostics are stored as diagnostics_numerical.
  • Correction tensor. .gdpar_eb_correction_tensor(eb_correction, laplace_result_per_slot, K, p_resolved, verbose) is called; its $applied, $constant, and $slot_dispositions are stored.
  • SE construction. A $J_{\mathrm{groups}} \times K \times p$ array of NA_real_ is allocated and filled per slot from sqrt(pmax(diag(cov_k), 0)) when cov_k is a matrix, else left as NA_real_. The same per-slot SE vector is replicated across all groups (per decision D43 = (a)).
  • Side effects. Writes two Stan source files to tempfiles; compiles two CmdStan models; runs Laplace optimization and HMC sampling (which may produce console output when verbose is TRUE); may emit gdpar_inform messages. No global state is mutated.
  • S3 class. The returned object is assigned class c("gdpar_eb_fit", "list"), enabling dispatch on gdpar_eb_fit methods downstream.
  • Unused formal arguments. family_id_k_vector and id_check_rigor appear in the signature but are not referenced in this section's body; they are preserved for API stability and downstream/future use.

R/families.R

.gdpar_param_spec(name, link, did_status, did_condition, did_reference, support, prior_canonical_kind, scope, family_role)

Purpose Internal constructor that builds a single structural-parameter specification object (class gdpar_param_spec). Each such object describes one parameter of a statistical family under the canonical contract established in the multi-parametric extension scoping (Block 8 Session 1, decision 1C): a family is represented as a list of marginal parameter specifications.

Arguments

Argument Type Meaning
name character scalar Short canonical name of the parameter (e.g. "mu", "sigma", "phi", "pi", "p").
link character scalar Name of the link function to apply; passed to .gdpar_link_funcs(link). Must be one of "identity", "log", "logit".
did_status character scalar Identifiability status under Lemma 1B. One of "holds", "holds_under_condition", or "user_responsible".
did_condition character scalar or NA_character_ Human-readable description of the condition under which identifiability holds (relevant only when did_status is "holds_under_condition").
did_reference character scalar Citation or pointer to the formal justification for the identifiability claim (e.g. "Block 1, Section 6.4 (Lemma 1B)").
support character scalar Natural support of the parameter on the real line or a constrained domain. One of "real_line", "positive_real", "unit_interval", "bounded_open".
prior_canonical_kind character scalar Identifier for the canonical prior kind (decision 2b-iii), e.g. "mu", "log_sigma", "logit_p", "log_phi", "log_shape", "log_nu", "power_p", "logit_pi".
scope character scalar Scope of the parameter in the model hierarchy. One of "per_observation" (varies by observation) or "population" (shared across observations). Decision 2a-iii.
family_role character scalar Structural role of the parameter within the family. Must be one of the values returned by .gdpar_known_family_roles().

Returns A list of class c("gdpar_param_spec", "list") with the following named elements:

Element Value
name The name argument.
link The link argument.
linkfun The link function $g(\mu)$ mapping from the natural parameter space to the linear predictor space.
inv_link The inverse link function $g^{-1}(\eta)$ mapping from the linear predictor space back to the natural parameter space.
did_status Identifiability status string.
did_condition Identifiability condition string or NA_character_.
did_reference Identifiability reference string.
support Support string.
prior_canonical_kind Prior kind identifier string.
scope Scope string.
family_role Role string.

Notes

  • The link functions (linkfun, inv_link) are obtained by calling .gdpar_link_funcs(link). If an unsupported link name is supplied, that helper will call gdpar_abort().
  • The class vector is c("gdpar_param_spec", "list"), enabling S3 dispatch on "gdpar_param_spec" first.
  • No validation of did_status, support, scope, or family_role against known values is performed inside this constructor; such validation is the responsibility of callers or upstream validators.

.gdpar_known_family_roles()

Purpose Returns the complete character vector of all recognised family_role values that a gdpar_param_spec entry may carry. Used by validators inside .gdpar_param_spec() and by codegen branches that emit family-specific Stan likelihood blocks.

Arguments None.

Returns A character vector of length 6:

c("location", "scale", "shape", "df", "mixture_pi", "power")
Value Meaning
"location" Position/location parameter ($\mu$ in any GLMM-like family; the per-observation primary parameter).
"scale" Dispersion / scale parameter ($\sigma$ for Gaussian, $\phi$ for NB/Beta, $\phi$ for Tweedie).
"shape" Shape parameter (Gamma shape, when relevant).
"df" Degrees of freedom ($\nu$ for Student-t).
"mixture_pi" Mixture probability (ZIP / ZINB / Hurdle).
"power" Power parameter (Tweedie $p \in (1,2)$).

Notes

  • Pure function with no side effects.
  • The ordering of elements is fixed by the source code and may be relied upon for positional indexing.

.gdpar_link_funcs(link)

Purpose Link function factory. Given the name of a link, returns a list containing both the link function and its inverse. Centralizes the link-function switch used by the family and parameter-spec constructors.

Arguments

Argument Type Meaning
link character scalar Name of the link function. Must be one of "identity", "log", or "logit".

Mathematics

For each supported link:

  • identity: $\mathrm{linkfun}(\mu) = \mu$, $\mathrm{inv\_link}(\eta) = \eta$
  • log: $\mathrm{linkfun}(\mu) = \log(\mu)$, $\mathrm{inv\_link}(\eta) = \exp(\eta)$
  • logit: $\mathrm{linkfun}(\mu) = \log\!\left(\dfrac{\mu}{1-\mu}\right)$, $\mathrm{inv\_link}(\eta) = \dfrac{1}{1 + \exp(-\eta)}$

Returns A named list with two elements:

Element Type Meaning
linkfun function The link function $g(\mu)$; takes a numeric vector on the natural parameter scale and returns the linear predictor scale.
inv_link function The inverse link function $g^{-1}(\eta)$; takes a numeric vector on the linear predictor scale and returns the natural parameter scale.

Notes

  • If link is not one of the three recognised names, the function calls gdpar_abort() with message "Internal error: unsupported link '<link>'." and error class "gdpar_internal_error". This abort occurs inside the switch() for inv_link first; the linkfun switch is never reached.
  • All returned closures are anonymous functions defined at call time.

.gdpar_family_param_specs_for(name, link)

Purpose Returns the canonical list of gdpar_param_spec objects describing every structural parameter that a built-in family admits as eligible for an individual specification. The first element is always the per-observation location parameter; subsequent elements are population-level auxiliary parameters in their canonical scope. The returned list is the registry of eligible parameters, not a subset selected for a particular model; promotion of auxiliary parameters to per-observation scope is handled downstream by .gdpar_promote_scope_per_observation().

Arguments

Argument Type Meaning
name character scalar Built-in family name. Must be one of: "gaussian", "poisson", "neg_binomial_2", "bernoulli", "beta", "gamma", "student_t", "tweedie", "zip", "zinb", "hurdle_poisson", "hurdle_neg_binomial_2".
link character scalar Link function name to use for the location ($\mu$) parameter. Passed through to .gdpar_param_spec().

Returns A list of gdpar_param_spec objects (each of class c("gdpar_param_spec", "list")). The list length and contents depend on name:

Family Parameters (in order) link applied to
"gaussian" mu (location), sigma (scale) mu
"poisson" mu (location) mu
"neg_binomial_2" mu (location), phi (scale) mu
"bernoulli" mu (location) mu
"beta" mu (location), phi (scale) mu
"gamma" mu (location), shape (shape) mu
"student_t" mu (location), sigma (scale), nu (df) mu
"tweedie" mu (location), phi (scale), p (power) mu
"zip" mu (location), pi (mixture_pi) mu
"zinb" mu (location), phi (scale), pi (mixture_pi) mu
"hurdle_poisson" mu (location), pi (mixture_pi) mu
"hurdle_neg_binomial_2" mu (location), phi (scale), pi (mixture_pi) mu

All location parameters receive the user-supplied link. All auxiliary parameters use their own fixed link (always "log" for positive-real parameters, "logit" for unit-interval parameters, "identity" for the Tweedie power parameter).

Notes

  • If name does not match any recognised family, the function calls gdpar_abort() with message "Internal error: no canonical param_specs for family '<name>'." and error class "gdpar_internal_error".
  • Every location parameter has did_status = "holds", did_condition = NA_character_, and scope = "per_observation".
  • Every auxiliary parameter has did_status = "holds_under_condition" with a family-specific did_condition string and scope = "population".
  • The prior_canonical_kind for auxiliary parameters is determined by the family-specific convention (e.g. "log_sigma" for Gaussian sigma, "log_phi" for NB phi, "logit_pi" for ZIP pi).
  • For zip and zinb, the did_reference includes a literature citation (Lambert (1992) and Greene (1994) respectively). For hurdle_poisson and hurdle_neg_binomial_2, the reference includes Mullahy (1986).
  • The tweedie p parameter has support = "bounded_open" and prior_canonical_kind = "power_p", distinguishing it from all other auxiliary parameters which use "positive_real" or "unit_interval".
  • This function is a pure lookup; it does not mutate any global state.

.gdpar_apply_did_override(param_specs, did_override, family_name)

Purpose Implements the did_override plasticity argument of gdpar_family() (D2 of sub-phase 8.3.5a, 2026-05-21). Allows the user to keep the canonical likelihood and links of a built-in family but replace the identifiability descriptors of one or more parameter slots with their own declaration. Returns a new list of param-specs with the overrides applied; the input list is not mutated.

Arguments

Argument Type Meaning
param_specs list of gdpar_param_spec objects The canonical parameter specifications to modify.
did_override NULL or a named list If NULL, a no-op. Otherwise a named list whose names must be a subset of the parameter names in param_specs. Each entry is itself a named list with names from {"did_status", "did_condition", "did_reference"}.
family_name character scalar Family name used only for error messages.

Returns List of gdpar_param_spec objects with the user-supplied identifiability descriptor overrides applied.

Notes

  • The docstring is present in this section but the function body is not; the implementation follows in a subsequent section of R/families.R.
  • Validation requirements (documented but not visible in source here): did_override must be NULL or a named list whose names are a subset of the parameter names in the family's param_specs. Each entry must be a list whose names are a subset of c("did_status", "did_condition", "did_reference"); unknown fields raise an error. did_status must be one of "holds", "holds_under_condition", "user_responsible".
  • Design rationale: D2 of the sub-phase 8.3.5 scoping rejected registering a built-in inside .gdpar_K_custom_patterns() because that would blur the contract boundary between "package canonizes the likelihood" and "user declares D-ID". The override keeps the likelihood and links first-class while letting the user adjust identifiability semantics for their design.

.gdpar_apply_did_override(param_specs, did_override, family_name)

Purpose
Applies user-supplied overrides to the identifiability (D-ID) descriptors of one or more parameter specifications (param_specs) for a given family. This allows users to modify the default D-ID status, condition, and reference without altering the likelihood or link structure. It is an internal helper used during family construction.

Arguments

  • param_specs : A list of parameter specification objects (each a list with at least a name field). These represent the canonical parameters of the family.
  • did_override : Either NULL (no overrides) or a named list keyed by slot names (matching param_specs[*]$name). Each value must be a named list with optional fields: did_status, did_condition, did_reference.
  • family_name : A character scalar identifying the family (used in error messages).

Mathematics
No explicit mathematical formulas; implements input validation and slot-wise patching of D-ID descriptors.

Returns
The modified param_specs list with any applicable D-ID overrides applied.

Notes

  • Input validation is strict: did_override must be a named list with no empty names. Each slot entry must be a named list containing only allowed fields.
  • Allowed fields: "did_status", "did_condition", "did_reference".
  • Allowed values for did_status: "holds", "holds_under_condition", "user_responsible".
  • Raises a gdpar_input_error via gdpar_abort() if any validation fails, providing details about the offending slot or field.
  • If did_override is NULL, returns param_specs unchanged.
  • Modifies param_specs in-place (by reference) for any slot present in did_override.

print.gdpar_param_spec(x, ...)

Purpose
S3 print method for objects of class gdpar_param_spec. Provides a human-readable summary of a single parameter specification.

Arguments

  • x : An object of class gdpar_param_spec.
  • ... : Additional arguments (unused; required for S3 generic compatibility).

Mathematics
None.

Returns
Invisibly returns the input x.

Notes

  • Exported as an S3 method for the base print generic.
  • Output includes: name, link, family_role, scope, support, did_status, did_condition (if not NA), and prior_canonical_kind.
  • Each field is printed on a separate line with fixed-width alignment.

gdpar_family(name, link, did_override)

Purpose
Constructs a standard (built-in) family object for use with the package, setting up the link function, parameter specifications, and Stan identifier. This function serves as the primary entry point for creating family objects for the built-in distributions.

Arguments

  • name : character scalar; one of "gaussian", "poisson", "neg_binomial_2", "bernoulli", "beta", "gamma", "student_t", "tweedie", "zip", "zinb", "hurdle_poisson", "hurdle_neg_binomial_2". Identifies the distributional family.
  • link : character scalar or NULL; the link function for the mean (location) parameter. If NULL, the default link for the chosen family is used.
  • did_override : list or NULL; if non-NULL, passed to .gdpar_apply_did_override to modify the default identifiability (D-ID) status of the family's parameters.

Returns
An object of class c("gdpar_family", "list") containing:

  • name : the family name.
  • link : the link function (character).
  • inv_link : the inverse-link function (closure).
  • linkfun : the link function itself (closure).
  • stan_id : integer identifier for the Stan template branch.
  • has_dispersion : logical; TRUE if the family has a dispersion parameter.
  • did_status : character; the identifiability status (e.g., "holds", "holds_under_condition", "user_responsible").
  • did_condition : character; description of condition under which identifiability holds, if any.
  • did_reference : character; reference supporting the identifiability claim.
  • param_specs : list of parameter specification lists (one per parameter, e.g., location and dispersion).

Notes

  • Raises a gdpar_input_error via gdpar_abort if the provided link is not in the set of allowed links for the chosen family.
  • The did_override argument is passed directly to .gdpar_apply_did_override (internal, not defined in this section) and can alter the default D-ID status of parameters.
  • The function internally calls .gdpar_family_param_specs_for (internal, not defined in this section) to obtain the default parameter specifications for the given family and link.

gdpar_family_custom(name, link, did_holds, did_condition, stan_loglik_block, stan_log_lik_block, stan_y_pred_block, y_type, did_reference)

Purpose
Constructs a custom family object for use with the package when the built-in families are insufficient. The user provides Stan code snippets for the likelihood, log-likelihood (for LOO-CV), and posterior predictive sampling, and explicitly declares identifiability (D-ID) status.

Arguments

  • name : character scalar; a unique identifier for the custom family. Must not coincide with any built-in family name.
  • link : character scalar; one of "identity", "log", "logit".
  • did_holds : logical scalar (TRUE or FALSE); whether the family is identifiable in its parameter. Missing values (NA) are not allowed.
  • did_condition : character scalar; description of the condition under which identifiability holds when did_holds = TRUE but identifiability is conditional. Use NA_character_ if identifiability holds unconditionally.
  • stan_loglik_block : character scalar; Stan code snippet for the model block that adds the log-likelihood for one observation. Must reference the linear predictor eta[i] and either y_real[i] (when y_type = "real") or y_int[i] (when y_type = "integer"). The legacy placeholder y[i] is not allowed.
  • stan_log_lik_block : character scalar; Stan code snippet for the generated quantities block that assigns to log_lik[i] for one observation. Used by gdpar_loo for LOO-CV.
  • stan_y_pred_block : character scalar; Stan code snippet for the generated quantities block that assigns to y_pred[i] for one observation. Used for posterior predictive checks.
  • y_type : character scalar; either "real" (outcome is real-valued, Stan references y_real[i]) or "integer" (outcome is integer-valued, Stan references y_int[i]).
  • did_reference : character scalar; citation or source supporting the identifiability declaration.

Returns
An object of class c("gdpar_family", "list") containing:

  • name : the custom family name.
  • link : the link function (character).
  • inv_link : the inverse-link function (closure).
  • linkfun : the link function itself (closure).
  • stan_id : NA_integer_ (custom families do not have a built-in Stan branch ID).
  • has_dispersion : FALSE (custom families are assumed to have only a location parameter).
  • did_status : derived from did_holds and did_condition (one of "holds", "holds_under_condition", "user_responsible").
  • did_condition : the condition text (or NA_character_).
  • did_reference : the user-provided reference.
  • stan_loglik_block : the user-provided model block snippet.
  • stan_log_lik_block : the user-provided generated quantities snippet for log_lik.
  • stan_y_pred_block : the user-provided generated quantities snippet for y_pred.
  • y_type : "real" or "integer".
  • is_custom : TRUE (flag indicating a custom family).
  • param_specs : a list of length 1 containing the parameter specification for the location parameter "mu".

Notes

  • Raises a gdpar_input_error via gdpar_abort for any validation failure (e.g., name coincides with a built-in, invalid link, missing did_holds, empty Stan snippets, legacy y[i] in snippets).
  • Emits a gdpar_did_message via gdpar_inform upon successful creation, reminding the user of their responsibility for the correctness of the Stan code and identifiability declaration.
  • The internal function .gdpar_param_spec (not defined in this section) is called to create the parameter specification for the location parameter "mu".

.gdpar_K_custom_patterns()

Purpose Internal (non-exported) registry function that returns a named list of canonical bi-parametric ($K=2$) likelihood patterns available for constructing custom families via gdpar_family_custom_K. Each pattern specifies the Stan log-probability density function identifier, the outcome type, and the slot-level descriptors (link, support, prior kind, scope, family role, D-ID metadata). The registry is the single source of truth for the descriptor-based wiring decided in sub-phase 8.3.4 (Block 8, D-A3.B option (b)).

Arguments

None.

Returns

A named list. Currently a single element is registered:

Name stan_id y_type Slots
lognormal_loc_scale 7L "real" mu (location), sigma (scale)

Each element is itself a list with components:

  • stan_id — integer; the Stan-side family identifier used by the amm_distrib_K.stan dispatcher.
  • y_type — character; the Stan type of the outcome variable (e.g. "real").
  • slot_specs — a list of lists, each describing one distributional parameter slot. Every slot contains:
    • name — parameter name (e.g. "mu", "sigma").
    • link — link function name ("identity", "log").
    • support — mathematical support ("real_line", "positive_real").
    • prior_canonical_kind — canonical prior identifier ("mu", "log_sigma").
    • scope — whether the parameter varies "per_observation" or is "population"-level.
    • family_role — semantic role ("location", "scale").
    • did_status — identifiability declaration ("holds", "holds_under_condition").
    • did_condition — character or NA_character_; the condition under which D-ID holds.
    • did_reference — citation string for the D-ID claim.

Notes

  • This function is never called directly by the user; it is consumed exclusively by gdpar_family_custom_K.
  • The registry is closed at the package level; adding a new $K=2$ pattern requires a source-code contribution (new list entry) rather than user-side injection, enforcing the auditability contract of D-A3.B option (b).
  • Currently only lognormal_loc_scale is registered (stan_id 7). The doc comments note that future sub-phases will extend the whitelist.

gdpar_family_custom_K(name, stan_lpdf_id, did_holds = TRUE, did_condition = NULL, did_reference = NULL)

Purpose Exported constructor for a $K=2$ custom distributional regression family. It selects a canonical bi-parametric likelihood pattern from the registry returned by .gdpar_K_custom_patterns, wires the descriptor-level metadata into gdpar_param_spec objects, and returns a gdpar_family object that routes through the same amm_distrib_K.stan branch as built-in families. This is the $K=2$ sibling of gdpar_family_custom (the $K=1$ free-form path).

Arguments

Argument Type Default Meaning
name character scalar (required) Unique identifier for the custom family. Must be non-empty, must not collide with any built-in family name.
stan_lpdf_id character scalar (required) Registry key selecting a canonical pattern (currently only "lognormal_loc_scale" is valid).
did_holds logical scalar TRUE Whether the D-ID condition of Lemma 1B holds for this family. If FALSE, every slot's did_status is overridden to "user_responsible". NA is not allowed.
did_condition character scalar or NULL NULL Optional override of the pattern-level D-ID condition string. When NULL, the registry's per-slot did_condition is used.
did_reference character scalar or NULL NULL Optional override of the pattern-level D-ID citation. When NULL, the registry's per-slot did_reference is used.

Mathematics

No formula is computed; the constructor wires descriptors. However, the families it produces are intended for the AMM canonical form

$$y_i \sim D_k!\bigl(\cdot \mid \eta_{ik}\bigr), \qquad \eta_{ik} = g_k!\bigl(\theta_i[k]\bigr),$$

where $D_k$ is the likelihood selected by stan_lpdf_id, $g_k$ is the link function of slot $k$, and $\theta_i \in \mathbb{R}^K$ is the individual parameter vector. For the registered lognormal_loc_scale pattern the likelihood is

$$\log p(y_i \mid \mu_i, \sigma) = -\log(\sigma) - \frac{(\log y_i - \mu_i)^2}{2\sigma^2} - \frac{1}{2}\log(2\pi) - \log y_i,$$

with link $g_1(\mu)=\mu$ (identity) and $g_2(\sigma)=\log\sigma$ (log).

Returns

An object of class gdpar_family (also inherits "list") with components:

Component Type Description
name character The user-supplied family name.
link character Link function of the first (location) slot.
inv_link function Inverse-link function from the first slot's gdpar_param_spec.
linkfun function Link function object from the first slot's gdpar_param_spec.
stan_id integer Stan family identifier from the registry pattern (e.g. 7L).
has_dispersion logical Always TRUE (both slots exist).
did_status character D-ID status of the first slot (possibly overridden).
did_condition character or NA D-ID condition of the first slot (possibly overridden).
did_reference character D-ID reference of the first slot (possibly overridden).
y_type character Stan outcome type from the pattern (e.g. "real").
is_custom logical Always TRUE.
stan_lpdf_id character The registry key used.
param_specs list of gdpar_param_spec One per slot in the pattern (length 2 for the registered pattern).

Notes

  • Validation errors raised (via gdpar_abort with class "gdpar_input_error"):
    • name is not a non-empty character scalar.
    • name collides with a built-in family name (hard-coded vector of 12 built-in names: "gaussian", "poisson", "neg_binomial_2", "bernoulli", "beta", "gamma", "student_t", "tweedie", "zip", "zinb", "hurdle_poisson", "hurdle_neg_binomial_2").
    • stan_lpdf_id is not a character scalar.
    • stan_lpdf_id is not found in the registry.
    • did_holds is not a non-NA logical scalar.
  • D-ID override logic: If did_holds is FALSE, every slot's did_status is set to "user_responsible" regardless of the registry value. The user-supplied did_condition and did_reference (when non-NULL) propagate to every slot, overriding the per-slot registry values.
  • Side effect: Emits an informational message (via gdpar_inform with class "gdpar_did_message") confirming the registration, the canonical pattern used, and the identifiability provenance.
  • No uniqueness check: The current source does not verify that name does not collide with a previously registered custom-K family in the session; only the hard-coded built-in vector is checked. This is consistent with the code as written; a session-level uniqueness guard may be added in a later sub-phase.
  • The returned object's top-level link, inv_link, and linkfun are taken from the first slot (param_specs[[1]]), i.e. the location parameter.

print.gdpar_family(x, ...)

Purpose S3 print method for objects of class gdpar_family. Provides a human-readable textual summary of the family's key metadata.

Arguments

Argument Type Meaning
x gdpar_family object The family to print.
... (ignored) Present for S3 generic compatibility; unused.

Returns

Invisibly returns x (the input object unchanged). Side effect is console output formatted as:

<gdpar_family>
  name           : <name>
  link           : <link>
  has_dispersion : <TRUE/FALSE>
  did_status     : <status>
  did_condition  : <condition>    # only if not NA
  did_reference  : <reference>
  param_specs    : <slot1_name (scope1)>, <slot2_name (scope2)>, ...

The param_specs line is only printed if x$param_specs is non-NULL. Each slot is formatted as "<name> (<scope>)" and the entries are comma-separated.

Notes

  • The did_condition line is suppressed when x$did_condition is NA (checked with !is.na()).
  • No validation of x is performed; the method trusts that the object has the expected structure.
  • This is the only print method for gdpar_family in this section. A separate print method for gdpar_family_multi is referenced in the gdpar_family_multi documentation but is not defined here.

gdpar_family_multi(family, p, link)

Purpose Exported constructor for a multivariate family object used with gdpar when the AMM specification has dimension $p &gt; 1$. It builds a collection of $p$ univariate gdpar_family objects (one per coordinate of the individual parameter vector $\theta_i \in \mathbb{R}^p$) under the assumption that the likelihood factorizes across coordinates:

$$p(y_i \mid \theta_i) = \prod_{k=1}^{p} D_k(y_{ik} \mid \theta_i[k]),$$

with cross-dimensional coupling carried exclusively by the modulating component $W(\theta_{\text{ref}})$ of the AMM canonical form.

Arguments

Argument Type Default Meaning
family character scalar, gdpar_family object, or list of gdpar_family (required) When a character scalar, it is the name of a built-in family (one of "gaussian", "poisson", "neg_binomial_2", "bernoulli"); gdpar_family() is called internally to construct the base family. When already a gdpar_family object, it is replicated $p$ times. When a list of length $p$, each element must be a gdpar_family (homogeneous restriction: all must share the same stan_id; heterogeneous per-coordinate families are deferred to a later sub-phase).
p positive integer (required) Dimension of $\theta_i$. Must match the p of the amm_spec passed to gdpar.
link character scalar (implied: canonical link) Link function used when family is supplied as a name. Ignored when family is already a gdpar_family object or a list.

Returns

An object of class gdpar_family_multi containing at minimum:

Component Type Description
families list of p gdpar_family objects One univariate family per coordinate.
p integer The dimension.
homogeneous logical TRUE when all $p$ families share the same stan_id.
stan_id integer Common Stan family identifier (present when homogeneous).
has_dispersion logical Common dispersion flag (present when homogeneous).
name character Common family name (present when homogeneous).
did_status character Common identifiability status (present when homogeneous).

A print method for gdpar_family_multi provides a human-readable summary.

Mathematics

The factorization follows architectural Option B of the Phase F decision:

$$p(y_i \mid \theta_i) = \prod_{k=1}^{p} D_k(y_{ik} \mid \theta_i[k]),$$

where $y_i \in \mathbb{R}^p$, $D_k$ is the univariate marginal distribution for coordinate $k$, and $\theta_i[k]$ is the $k$-th coordinate of the individual parameter vector. Cross-dimensional dependence enters through the AMM coupling $\theta_i = \theta_{\text{ref}} + W(\theta_{\text{ref}}) \, \delta_i$, not through the likelihood.

Notes

  • Scope restriction: This constructor handles the marginal factorization case only. Multi-parametric families (a single univariate outcome parametrized by the full vector $\theta_i \in \mathbb{R}^p$, e.g. a Gaussian with $\theta_i = (\mu_i, \log \sigma_i)$ in the distributional regression sense) are explicitly deferred to a dedicated post-validation block.
  • Homogeneity: In this version, even when a list of families is supplied, all entries must share the same stan_id. Heterogeneous per-coordinate families are deferred.
  • D-ID: The identifiability condition of Lemma 1B applies coordinate-wise: each univariate marginal $D_k$ identifies $\theta_i[k]$ from $y_{ik}$ independently. The cross-dimensional identifiability condition (C4-bis), guarding against aliasing between coordinates of $\theta_{\text{ref}}$ that share basis structure, is checked by gdpar_check_identifiability (Phase H, pending).
  • Dependency: Calls gdpar_family() when family is supplied as a character name.
  • The function body is not included in this section (section 4 of 7); only the roxygen block and the signature are present here. The implementation appears in a subsequent section.

gdpar_family_multi(family, p, link = NULL)

Purpose Constructs a multivariate family object (class gdpar_family_multi) representing a product of p univariate families across coordinates. This is the primary factory for multivariate response models in gdpar, enforcing homogeneity constraints in the current version.

Arguments

  • family: Input family specification. Accepted types:
    • character(1): Name of a built-in family (e.g., "gaussian").
    • gdpar_family: A pre-constructed univariate family object.
    • list: A list of p gdpar_family objects, one per coordinate.
  • p: Integer > 0, the number of response dimensions (coordinates).
  • link: Optional character string specifying a link function. Ignored if family is already a gdpar_family object.

Mathematics None. This function is a constructor and validator.

Returns An object of class c("gdpar_family_multi", "list") with elements:

  • families: List of p gdpar_family objects.
  • p: Integer, the number of coordinates.
  • homogeneous: Logical, currently always TRUE (enforced).
  • stan_id, name, link, has_dispersion, did_status, did_condition, did_reference: Properties taken from the first (and only, due to homogeneity) family.
  • param_specs_per_coord: List of p param_specs lists.

Notes

  • Raises gdpar_input_error if p is not a positive integer, if family is an invalid type, if the list length mismatches p, or if any list element is not a gdpar_family.
  • Raises gdpar_unsupported_feature_error if the supplied families are heterogeneous (different stan_id or link). This is a known limitation; future versions will support heterogeneity.
  • If family is a character string, it is passed to gdpar_family() to create a base family, which is then replicated p times.
  • The link argument is ignored when a gdpar_family object is supplied to avoid silent overwrites; a warning-like abort is issued instead.

print.gdpar_family_multi(x, ...)

Purpose S3 print method for gdpar_family_multi objects, providing a concise summary.

Arguments

  • x: An object of class gdpar_family_multi.
  • ...: Additional arguments (unused, present for S3 generic compatibility).

Mathematics None.

Returns Invisibly returns x.

Notes Prints a formatted summary to the console, including dimensions (p), homogeneity status, family name, link, and parameter specifications. If homogeneous is TRUE and param_specs_per_coord is available, it prints a summary of parameter names and scopes for the first coordinate.

.gdpar_promote_scope_per_observation(family, k_names)

Purpose Internal helper to promote specified parameters of a family to "per_observation" scope. This is used to implement K-individual specifications where some parameters vary per observation (e.g., in mixed-effects-like models). For multivariate families (gdpar_family_multi), promotion is applied to each coordinate's family.

Arguments

  • family: A gdpar_family or gdpar_family_multi object.
  • k_names: Character vector of parameter names to promote. May be NULL or empty (no-op).

Mathematics None. This function modifies metadata only.

Returns A copy of the input family object with updated scope fields in param_specs. The original is not mutated.

Notes

  • Raises gdpar_internal_error if k_names is not a character vector or if a family lacks param_specs.
  • Raises gdpar_input_error if any name in k_names is not among the eligible parameter names of the family.
  • For gdpar_family_multi, it updates each family in the families list and also updates the param_specs_per_coord field accordingly.

.gdpar_canonical_inv_link_id_slot1(stan_id)

Purpose Internal helper that returns the integer ID of the canonical inverse link function for the location slot (slot 1) of the built-in family identified by stan_id. This ID is used by Stan code for link dispatch.

Arguments

  • stan_id: Integer scalar, one of the built-in family Stan IDs (1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13).

Mathematics The mapping is:

  • stan_id == 1 (Gaussian) → identity link → ID 0
  • stan_id == 3 (Negative Binomial) → log link → ID 2
  • stan_id == 5 (Beta) → logit link → ID 1
  • stan_id == 6 (Gamma) → log link → ID 2
  • stan_id == 7 (Lognormal) → identity link (on log scale) → ID 0
  • stan_id == 8 (Student-t) → identity link → ID 0
  • stan_id == 9 (Tweedie) → log link → ID 2
  • stan_id == 10 (Zero-Inflated Poisson) → log link → ID 2
  • stan_id == 11 (Zero-Inflated Negative Binomial) → log link → ID 2
  • stan_id == 12 (Hurdle Poisson) → log link → ID 2
  • stan_id == 13 (Hurdle Negative Binomial) → log link → ID 2

Returns Integer scalar in {0, 1, 2}.

Notes

  • Raises gdpar_internal_error if stan_id is NULL or NA.
  • Raises gdpar_internal_error with a descriptive message if stan_id is not one of the recognized codes.

.gdpar_canonical_inv_link_id(stan_id, slot_idx)

Purpose Internal helper that returns the integer ID of the canonical inverse link function for an arbitrary slot index in the canonical parameter specifications of the built-in family identified by stan_id. This is used in the homogeneous branch of the heterogeneous resolver to obtain the correct link for a given slot.

Arguments

  • stan_id: Integer scalar, a built-in family Stan ID.
  • slot_idx: Integer scalar, 1-based slot index in the family's canonical param_specs.

Mathematics The mapping is family-specific and slot-specific. For example, for the Gaussian family (stan_id=1), slot 1 (mean) uses identity link (ID 0) and slot 2 (scale) uses log link (ID 2). The exact mapping is not provided in the visible code.

Returns Integer scalar in {0, 1, 2}.

Notes

  • The function body is not fully shown in the provided code, but it is expected to be a switch-case or lookup based on stan_id and slot_idx.
  • Likely raises gdpar_internal_error for invalid stan_id or slot_idx.

.gdpar_canonical_inv_link_id_slot(stan_id, slot_idx)

Purpose Internal helper that maps a given slot of a given family (identified by its Stan numeric ID) to a canonical integer inverse-link identifier used by the Stan template's apply_inv_link_by_id() dispatcher. It handles both built-in families (via a switch on stan_id) and the special custom-K pattern lognormal_loc_scale (stan_id 7).

Arguments

Argument Type Meaning
stan_id Integer scalar (or coercible) The Stan-family numeric identifier. Must not be NULL or NA.
slot_idx Integer scalar 1-based index of the parameter slot whose inverse-link ID is requested. Must lie within 1:length(slot_specs) for the relevant family/pattern.

Mathematics

The function does not compute a mathematical formula; it is a pure lookup/dispatch table. It returns the integer encoding of the link function associated with slot slot_idx:

$$ \text{inv_link_id} = \begin{cases} 0 & \text{if link} = \texttt{identity},\\ 1 & \text{if link} = \texttt{logit},\\ 2 & \text{if link} = \texttt{log}. \end{cases} $$

For stan_id 7 the slot specs are obtained from .gdpar_K_custom_patterns()[["lognormal_loc_scale"]]$slot_specs. For all other built-in families the specs come from .gdpar_family_param_specs_for(family_name, .gdpar_default_link_for(family_name)).

Returns An integer scalar in {0L, 1L, 2L} representing the canonical inverse-link identifier.

Notes

  • Raises a gdpar_internal_error if stan_id is NULL or NA.
  • For stan_id 7 (lognormal_loc_scale), the function bypasses the normal gdpar_family path and fetches specs directly from .gdpar_K_custom_patterns(), because lognormal_loc_scale is a custom-K registry pattern, not a built-in gdpar_family.
  • Raises a gdpar_internal_error if slot_idx is out of range for the resolved family/pattern.
  • Raises a gdpar_internal_error if stan_id does not correspond to any recognised built-in family (the family_name switch returns NULL).
  • Raises a gdpar_internal_error if the resolved link string for the slot is not one of identity, logit, or log.
  • No S3 dispatch; pure procedural logic.

.gdpar_default_link_for(name)

Purpose Returns the default (canonical) link-function string for a built-in family given its character name. Used internally by .gdpar_canonical_inv_link_id_slot to look up param_specs without the caller needing to know link conventions.

Arguments

Argument Type Meaning
name Character scalar One of the recognised built-in family names (see table below).

Returns A character scalar — the name of the default link function. The mapping is:

Family name Default link
gaussian "identity"
poisson "log"
neg_binomial_2 "log"
bernoulli "logit"
beta "logit"
gamma "log"
student_t "identity"
tweedie "log"
zip "log"
zinb "log"
hurdle_poisson "log"
hurdle_neg_binomial_2 "log"

Notes

  • Implemented via a switch() with no default branch. If name does not match any of the listed families, switch() returns NULL silently (no explicit error is raised).
  • No S3 dispatch.

.gdpar_canonical_support_slot1(stan_id)

Purpose Returns the canonical support string of slot 1 (the location/first parameter) of the family identified by stan_id. Used by the heterogeneous validator (D4 of sub-phase 8.3.7): when family_id_k[k] is heterogeneous, the emitted support is the support of slot 1 of that family per the L1 refinement rule.

Arguments

Argument Type Meaning
stan_id Integer scalar (or coercible) The Stan-family numeric identifier.

Mathematics None — pure lookup table.

Returns A character scalar, one of "real_line", "positive_real", "unit_interval", or "bounded_open". The mapping is:

stan_id Family Slot 1 support
1 gaussian "real_line"
3 neg_binomial_2 "positive_real"
5 beta "unit_interval"
6 gamma "positive_real"
7 lognormal_loc_scale "real_line"
8 student_t "real_line"
9 tweedie "positive_real"
10 zip "positive_real"
11 zinb "positive_real"
12 hurdle_poisson "positive_real"
13 hurdle_neg_binomial_2 "positive_real"

Notes

  • Raises a gdpar_internal_error if the integer-converted stan_id does not match any recognised key.
  • No S3 dispatch.

.gdpar_support_subset_coherent(emitted, required)

Purpose Predicate that tests whether the support emitted by a heterogeneous family's slot 1 is a coherent subset of the support required by the role of that slot in the location family. Used by the D4 heterogeneous validator.

Arguments

Argument Type Meaning
emitted Character scalar The support string emitted by the heterogeneous family's slot 1 (e.g. from .gdpar_canonical_support_slot1).
required Character scalar The support string required by the role of the slot in the location family's param_specs.

Mathematics

The subset-coherence relation is defined as:

$$ \begin{aligned} &\texttt{real_line} \subseteq {\texttt{real_line}}, \\ &\texttt{positive_real} \subseteq {\texttt{positive_real},; \texttt{real_line}}, \\ &\texttt{unit_interval} \subseteq {\texttt{unit_interval},; \texttt{positive_real},; \texttt{real_line}}, \\ &\texttt{bounded_open} \subseteq {\texttt{bounded_open},; \texttt{positive_real},; \texttt{real_line}}. \end{aligned} $$

The reflexive case (emitted == required) always returns TRUE.

Returns A logical scalar: TRUE if emitted is a coherent subset of required; FALSE otherwise (including the case where emitted is "custom" or any unrecognised string).

Notes

  • Custom supports ("custom") are implicitly rejected: they match none of the if branches and fall through to FALSE, because the heterogeneous family cannot have a custom slot 1 built-in.
  • Bounds are not checked for "bounded_open" (no built-in family registers a bounded range for slot 1).
  • No S3 dispatch.

.gdpar_compatible_families_for_support(required, exclude_stan_ids = integer(0))

Purpose Enumerates alternative built-in families whose slot 1 support satisfies the subset-coherence predicate against a given required support. Used by the D4 validator to build informative error messages suggesting replacement families when a user-supplied heterogeneous family fails validation.

Arguments

Argument Type Meaning
required Character scalar The support required by the role of the slot in the location family.
exclude_stan_ids Integer vector (default integer(0)) Stan IDs to exclude from the candidate list. Typically includes the location family's stan_id and the failing heterogeneous family's stan_id.

Mathematics None — iterates over a fixed candidate list and applies .gdpar_support_subset_coherent.

Returns A character vector of built-in family names whose slot 1 support coherently satisfies required, excluding any whose stan_id appears in exclude_stan_ids. May be length zero.

Notes

  • The candidate list is hardcoded and includes only six families: gaussian (1), neg_binomial_2 (3), beta (5), gamma (6), student_t (8), tweedie (9). Notably absent are zip, zinb, hurdle_poisson, hurdle_neg_binomial_2, poisson, and bernoulli — these are not offered as suggestions.
  • The location family itself is excluded via exclude_stan_ids (the homogeneous trivial case is not a suggestion).
  • No S3 dispatch.

.gdpar_validate_heterogeneous_family_K(family_het_list, location_param_specs, slot_names)

Purpose Validates a heterogeneous family list (a named list of gdpar_family objects, one per slot $k = 1, \dots, K$) against the location family's slot roles. Implements the D4 validator of sub-phase 8.3.7: for each slot $k \geq 2$, it checks that the support emitted by the heterogeneous family's slot 1 is a coherent subset of the support required by slot $k$'s role in the location family.

Arguments

Argument Type Meaning
family_het_list Named list of gdpar_family objects, length $K$ The heterogeneous family specification. [[1]] is the location family. Names must match slot_names.
location_param_specs List of $K$ param_specs The location family's parameter specifications after promotion to per-observation, defining the canonical role assignment of the $K$ slots.
slot_names Character vector, length $K$ Canonical slot names (e.g. c("mu", "sigma", ...)).

Mathematics None — validation logic only.

Returns invisible(NULL) on success.

Notes

  • Raises a gdpar_internal_error if the three inputs have inconsistent lengths $K$.
  • For each slot $k \geq 2$:
    • Raises a gdpar_input_error if family_het_list[[k]] is not a gdpar_family object.
    • Raises a gdpar_input_error if family_het_list[[k]]$stan_id is NA or NULL (custom free-form families are not admitted as heterogeneous slot entries; only built-in families and descriptor-based custom-K patterns are allowed).
    • Skips validation (via next) if the heterogeneous family's stan_id equals the location family's stan_id (homogeneous slot).
    • Raises a gdpar_input_error if the subset-coherence predicate fails. The error message includes the offending slot name, its role, the emitted and required supports, the location family name, and a suggestion list from .gdpar_compatible_families_for_support.
  • Slot 1 (the location family) is never validated here; it is the canonical reference that determines role assignment.
  • No S3 dispatch.

.gdpar_inv_link_id_per_slot(family_id_k_vector, location_family) (documentation only; body not in this section)

Purpose Computes the inv_link_id_per_slot integer vector of length $K$ passed to the Stan template's data block. Implements the L1 refined rule (D3.5 of 8.3.7) for dispatching inverse-link functions per slot.

Arguments

Argument Type Meaning
family_id_k_vector Integer vector, length $K$ Per-slot Stan family ID. Element $k$ is the stan_id governing slot $k$.
location_family A gdpar_family object The location family (slot 1), used to look up canonical slot-$k$ links in the homogeneous case.

Mathematics

For each slot $k = 1, \dots, K$:

$$ \texttt{inv_link_id}[k] = \begin{cases} \text{canonical inv_link ID of slot } k \text{ of the location family} & \text{if } \texttt{family_id_k}[k] = \texttt{family_id_k}[1],\\ \text{canonical inv_link ID of slot } 1 \text{ of the family with stan_id } \texttt{family_id_k}[k] & \text{otherwise.} \end{cases} $$

The returned integer vector values lie in $\{0, 1, 2\}$, corresponding to identity (0), logit (1), and log (2) as defined in .gdpar_canonical_inv_link_id_slot.

Returns An integer vector of length $K$ with values in $\{0, 1, 2\}$.

Notes

  • Backward-compatibility note (D7 of 8.3.7): In the strict homogeneous regime (all family_id_k entries equal), the vector reproduces the canonical slot-by-slot links of the location family. The Stan refactor replaces previously hardcoded inv_link calls with apply_inv_link_by_id(), which preserves mathematical equivalence but is NOT guaranteed to be bit-for-bit identical with pre-8.3.7 outputs. Goldens for $K=2$ Beta and Gamma must be re-bootstrapped on close.
  • The function body is not present in this section; only the roxygen documentation block appears here.

.gdpar_compute_inv_link_id_per_slot(family_id_k_vector, location_family)

Purpose
Computes a vector of integer inverse-link identifiers for each slot in a heterogeneous family model. This is used to map the canonical inverse-link function of the location family to slots that share the same family as the location family, and to assign a slot-specific canonical inverse-link for slots with different families.

Arguments

  • family_id_k_vector (integer vector): Length-K vector of Stan family identifiers for each slot.
  • location_family (gdpar_family): The primary location family object whose stan_id determines the canonical inverse-link for matching slots.

Mathematics
For each slot $k = 1, \dots, K$:

  • If $\text{family\_id\_k\_vector}[k] == \text{location\_family\$stan\_id}$, then
    $$ \text{out}[k] = \text{.gdpar_canonical_inv_link_id_slot}(\text{location_stan_id}, k). $$
  • Otherwise,
    $$ \text{out}[k] = \text{.gdpar_canonical_inv_link_id_slot1}(\text{family_id_k_vector}[k]). $$ where .gdpar_canonical_inv_link_id_slot and .gdpar_canonical_inv_link_id_slot1 are helper functions (not defined in this section) that return integer inverse-link identifiers.

Returns
An integer vector of length $K$ where each element is the inverse-link identifier for the corresponding slot.

Notes

  • This is an internal helper function (not exported).
  • The function loops over $K$ slots.
  • The inverse-link identifier is used by Stan to compute the inverse-link function for the linear predictor in each slot.

.gdpar_resolve_heterogeneous_family_K(family_input, slot_names)

Purpose
Resolves a heterogeneous family argument (a named list of gdpar_family objects) into a canonical form for the model. It validates the input, ensures slot names match the formula set, checks for unsupported heterogeneity in certain location families, and propagates design (did) status overrides from the first slot (location family) to other slots that have the same family. This function implements the named-list public API (D5 of 8.3.7) and the materialization rule (D3.5 / L1 refined).

Arguments

  • family_input (either a gdpar_family object or a named list of gdpar_family objects):
    • If a single gdpar_family, it is treated as homogeneous (all slots share the same family).
    • If a named list, it must have one entry per slot, keyed by slot name.
  • slot_names (character vector): Length-K vector of canonical slot names from the gdpar_formula_set.

Returns
A named list with three components:

  1. location_family (gdpar_family): The location family with per-slot design (did) status overrides propagated (only for slots that are heterogeneous).
  2. family_id_k_vector (integer vector): Length-K vector of Stan family identifiers for each slot.
  3. is_heterogeneous (logical): TRUE if at least one slot has a different Stan identifier from the first slot (i.e., the location family).

Notes

  • Input Validation:
    • If family_input is not a gdpar_family or a list, an error of class gdpar_input_error is raised.
    • If family_input is a list but not a named list (or has empty/duplicate names), an error is raised.
    • The length of the list must equal $K$ (the number of slots).
    • The names of the list must exactly match slot_names (no missing or extra names).
    • Each element of the list must be a gdpar_family object.
  • Location Family Requirements:
    • The first slot (location family) must have a non-NA stan_id (i.e., it must be a built-in or descriptor-based custom_K family).
  • Heterogeneity Restrictions:
    • Heterogeneous families (different families across slots) are only supported for a subset of location families: Gaussian (1), negative binomial (3), beta (5), gamma (6), and lognormal_loc_scale (7). If heterogeneity is detected and the location family is not one of these, an error of class gdpar_unsupported_feature_error is raised.
  • Design (did) Propagation:
    • For each slot $k \ge 2$ that has a different family from the location family, the did_status, did_condition, and did_reference fields in the location family's param_specs for that slot are overwritten with those from the heterogeneous family for that slot. The link, support, and prior canonical kind are not overwritten because the Stan side uses inv_link_id_per_slot and canonical priors are applied in linear predictor space.
  • Side Effects:
    • The location_family returned is a modified copy of the original location family (promoted to have per-observation scope via .gdpar_promote_scope_per_observation).
    • Validation of the heterogeneous family is performed by .gdpar_validate_heterogeneous_family_K (not defined in this section).
  • Error Handling:
    • Uses gdpar_abort with appropriate classes and data for structured error reporting.
  • Internal Use:
    • This is an internal function (not exported).

R/formula_spec.R

formula_spec.R (section 1 of 3)

.gdpar_check_no_intercept_suppression(f, slot_name)

Purpose

Internal validation guard that detects intercept-suppression syntax (- 1 or + 0) in a formula by inspecting the intercept attribute of the terms object produced by stats::terms(). In the AMM canonical form the population anchor $\theta_{\mathrm{ref}}$ is a structural term that cannot be removed; a user attempt to suppress it is treated as a contract violation and aborts with a gdpar_input_error.

Arguments

Argument Type Meaning
f formula A one-sided or two-sided formula object to be inspected.
slot_name character (scalar) Name of the slot within the enclosing gdpar_formula_set; used solely for constructing the error message.

Mathematics

The decision rule is:

$$ \text{abort} \iff \mathrm{attr}!\big(\mathrm{terms}(f),;\text{"intercept"}\big) = 0 $$

When stats::terms(f) raises an error (i.e. the formula cannot be parsed by terms), the function treats the formula as valid and returns TRUE:

$$ \mathrm{terms}(f) ;\text{fails} ;\Longrightarrow; \text{return } \mathrm{TRUE} $$

Returns

invisible(TRUE) in all non-aborting paths:

  1. When stats::terms(f) fails and trm is NULL.
  2. When the intercept attribute is non-zero (intercept retained).

When intercept suppression is detected, the function does not return; it calls gdpar_abort() with class = "gdpar_input_error" and a data list containing slot and formula (the deparsed formula at width.cutoff = 500L).

Notes

  • The error message is bilingual (English and Spanish) and directs users to glm()/brms/stan_glm() for anchor-free models.
  • There is a minor redundancy in the NULL branch: invisible(TRUE) is called and then immediately return(invisible(TRUE)) follows; the first call has no observable effect.
  • The function is marked @keywords internal and @noRd; it is not exported.
  • No S3 dispatch is involved; the function is a plain closure.

.gdpar_assert_named_formula_list(args, fn_name)

Purpose

Internal assertion that validates a list of arguments captured from ... in a constructor. It enforces four structural invariants: non-emptiness, full naming, name uniqueness, and that every element inherits from class "formula".

Arguments

Argument Type Meaning
args list A list captured from ... in the calling constructor (e.g. gdpar_formula_set).
fn_name character (scalar) Name of the calling function, embedded in error messages for traceability.

Returns

invisible(TRUE) on success. On any validation failure, the function aborts via gdpar_abort() with class = "gdpar_input_error".

Notes

Validation proceeds in four sequential checks, each of which aborts on failure:

  1. Non-emptiness. If length(args) == 0L, aborts with message "<fn_name>() requires at least one named formula." No data field is attached.

  2. Naming. arg_names <- names(args). If arg_names is NULL or any element fails nzchar() (i.e. is ""), aborts with a message instructing the user to name formulas with the canonical parameter name (e.g. mu = y ~ a(x)). No data field is attached.

  3. Uniqueness. anyDuplicated(arg_names) > 0L triggers an abort. Duplicates are collected via unique(arg_names[duplicated(arg_names)]) and reported with sQuote and comma separation. The data field is list(duplicates = dups).

  4. Formula class. Iterates seq_along(args); if any element does not inherits(., "formula"), aborts with a message naming the slot, the function, and the received class(es) (via sQuote(class(args[[i]]))). The data field is list(slot = arg_names[[i]], received = class(args[[i]])).

The function is marked @keywords internal and @noRd.


gdpar_formula_set(...)

Purpose

Exported constructor that builds the canonical internal representation of a multi-parameter AMM formula set. This object is the single source of truth for which structural parameters of the distributional family receive an AMM (per-observation) design. It is consumed downstream by gdpar().

Arguments

Argument Type Meaning
... Named formula objects The first must be a two-sided formula whose LHS is a single symbol naming the outcome (e.g. mu = y ~ a(x)). Subsequent arguments must be one-sided formulas (e.g. sigma = ~ a(x)).

Mathematics

Let $\mathcal{F} = \{f_1, f_2, \dots, f_K\}$ be the set of $K$ formulas supplied. The constructor partitions $\mathcal{F}$ into:

  • $f_1$: a two-sided formula $\text{outcome} \sim \text{design}_1$, where $\text{outcome}$ is a bare symbol.
  • $f_k$ for $k \geq 2$: one-sided formulas $\sim \text{design}_k$.

Each formula name corresponds to a canonical parameter $\theta_k$ of the family. Downstream, gdpar() promotes every named parameter to:

$$ \mathrm{scope}(\theta_k) = \text{"per_observation"}, \quad k = 1, \dots, K $$

while unnamed family parameters retain their canonical scope (typically "population"). The population anchor $\theta_{\mathrm{ref}}$ is structural and cannot be suppressed.

Returns

An S3 object of class c("gdpar_formula_set", "list") with four components:

Component Type Description
outcome character (scalar) The outcome variable name extracted from the LHS of the first formula.
formulas named list of formula The original args list, preserving order and names. The first element is two-sided; the rest are one-sided.
param_names character vector Identical to names(formulas).
env environment environment(first_f), the environment of the first formula, used downstream for evaluation.

Notes

Validation is performed in a strict sequence; the first failing check aborts via gdpar_abort() with class = "gdpar_input_error":

  1. Named formula list. Delegates to .gdpar_assert_named_formula_list(args, "gdpar_formula_set"), which checks non-emptiness, naming, uniqueness, and formula class.

  2. First slot is two-sided. length(first_f) != 3L detects a non-two-sided first formula (R formulas have length 2 when one-sided, 3 when two-sided). The error data is list(slot = arg_names[[1L]], received = deparse(first_f, width.cutoff = 500L)).

  3. First slot LHS is a bare symbol. outcome_expr <- first_f[[2L]]; if !is.symbol(outcome_expr), the function aborts. This rejects function calls on the LHS such as log(y). The error data is list(slot = arg_names[[1L]], received = deparse(outcome_expr, width.cutoff = 500L)). The outcome name is then extracted via as.character(outcome_expr).

  4. Subsequent slots are one-sided. For i in 2:length(args), length(f) != 2L triggers an abort. The error data is list(slot = arg_names[[i]], received = deparse(f, width.cutoff = 500L)).

  5. No intercept suppression. Iterates seq_along(args) and calls .gdpar_check_no_intercept_suppression(args[[i]], arg_names[[i]]) for every formula.

Set membership (slot names must be a subset of the family's eligible param_specs) is not enforced here; it is deferred to gdpar() once both the formula set and the family are available.

The function is exported (@export).


gdpar_bf(...)

Purpose

Exported brms-style sugar constructor that produces a gdpar_formula_set from a sequence of two-sided formulas. It is a convenience wrapper around gdpar_formula_set(), designed to feel familiar to users of brms::bf. The brms package is not a runtime dependency.

Arguments

Argument Type Meaning
... Two-sided formula objects The first carries the outcome on its LHS and defaults to slot name "mu". Subsequent formulas carry the canonical parameter name on their LHS (e.g. sigma ~ a(x)).

Mathematics

The sugar translates a sequence of two-sided formulas into the canonical representation. For the default case:

$$ \texttt{gdpar_bf}\big(y \sim a(x_1),;\sigma \sim a(x_2)\big) ;\equiv; \texttt{gdpar_formula_set}\big(\mu = y \sim a(x_1),;\sigma = \sim a(x_2)\big) $$

When the first argument is explicitly named (e.g. theta = y ~ a(x)), that name overrides the default "mu":

$$ \texttt{gdpar_bf}(\theta = y \sim a(x)) ;\equiv; \texttt{gdpar_formula_set}(\theta = y \sim a(x)) $$

Returns

An object of class c("gdpar_formula_set", "list"), identical in structure to what gdpar_formula_set() returns (components: outcome, formulas, param_names, env).

Notes

  • The function body is not present in this source section. Only the roxygen documentation (@param, @return, @examples, @seealso, @export) appears in section 1 of 3. The implementation is expected in a subsequent section; based on the documentation, it internally calls gdpar_formula_set() after converting the two-sided formulas into the canonical form (first formula's LHS becomes the outcome, subsequent formulas' LHSs become slot names and their RHSs become one-sided formulas).
  • The function is exported (@export).
  • No S3 dispatch is involved at the constructor level.

gdpar_bf(...)

Purpose User-facing constructor that assembles a gdpar_formula_set from a variable number of formula arguments. It is the primary entry point for declaring the outcome formula together with one or more parameter (slot) formulas in a single call, performing full validation of shapes, names, and duplicates before delegating to gdpar_formula_set.

Arguments

  • ... : formula objects. The first must be a two-sided formula whose left-hand side is the outcome variable (e.g. y ~ a(x)). Subsequent formulas must be two-sided with a single symbol on the left-hand side naming the parameter slot (e.g. sigma ~ a(x)), or one-sided formulas supplied via a named argument whose name matches the intended slot (e.g. sigma = ~ a(x)). Argument names are optional for positional use; when present they must agree with the LHS symbol of the corresponding formula.

Returns A gdpar_formula_set object, obtained by do.call(gdpar_formula_set, args) where args is the validated, named list of formulas. The first slot is named "mu" when no explicit name is supplied; subsequent slots are named after their LHS symbol (or their explicit argument name).

Notes

  • Raises a gdpar_input_error (via gdpar_abort) when:
    • no arguments are supplied;
    • any argument is not a formula;
    • the first formula is not two-sided (length 3);
    • any formula after the first is not two-sided;
    • the LHS of a non-first formula is not a single symbol;
    • an explicit argument name disagrees with the LHS symbol of its formula;
    • duplicate slot names are detected.
  • For formulas after the first, the two-sided formula is converted to a one-sided formula ~ <RHS> preserving the original environment, before being placed in the named list passed to gdpar_formula_set.
  • The first formula is left untouched (still two-sided) in the list handed to gdpar_formula_set; downstream code is expected to extract the outcome from it.
  • Error objects carry a data field with diagnostic information (position, received class, duplicates, etc.).
  • Side effects: none beyond error signaling and construction of the returned object.

print.gdpar_formula_set(x, ...)

Purpose S3 print method for objects of class gdpar_formula_set. Renders a compact human-readable summary showing the number of slots K, the outcome variable, the parameter names in declaration order, and each slot's formula.

Arguments

  • x : a gdpar_formula_set object. Expected to be a list with components outcome (character scalar), param_names (character vector), and formulas (named list of formulas).
  • ... : unused; present for S3 generic compatibility.

Returns Invisibly returns x.

Notes

  • Dispatches on class gdpar_formula_set.
  • Uses deparse(f, width.cutoff = 500L) to render each formula on a single logical line.
  • No validation of x is performed; malformed inputs may produce partial or confusing output.

[[.gdpar_formula_set(x, i)

Purpose S3 double-bracket subsetting operator for gdpar_formula_set objects. Extracts a single formula by slot name or integer index.

Arguments

  • x : a gdpar_formula_set object containing a formulas component (a named list).
  • i : a character scalar (slot name) or an integer scalar (position), passed directly to [[ on x$formulas.

Returns The single formula stored at slot i (not wrapped in a list).

Notes

  • Dispatches on class gdpar_formula_set.
  • Inherits the indexing semantics of R's [[ for lists: out-of-bounds integer indices return NULL, and partial matching may apply for character indices.
  • No re-validation is performed; the return is a bare formula.

[.gdpar_formula_set(x, i)

Purpose S3 single-bracket subsetting operator for gdpar_formula_set objects. Returns a named list of formulas for the requested slots.

Arguments

  • x : a gdpar_formula_set object containing a formulas component.
  • i : a character vector of slot names or an integer vector of positions, passed directly to [ on x$formulas.

Returns A named list of formulas (a plain list, not a gdpar_formula_set). The documentation explicitly notes that downstream re-validation requires reconstruction via gdpar_formula_set.

Notes

  • Dispatches on class gdpar_formula_set.
  • The return value loses the gdpar_formula_set class intentionally; callers needing a validated set must rebuild one.
  • Indexing semantics (recycling, out-of-bounds producing NA, partial matching) follow base R list behavior.

names.gdpar_formula_set(x)

Purpose S3 names method for gdpar_formula_set objects. Returns the slot (parameter) names in declaration order.

Arguments

  • x : a gdpar_formula_set object containing a param_names component (character vector).

Returns A character vector equal to x$param_names, giving the slot names in declaration order.

Notes

  • Dispatches on class gdpar_formula_set.
  • No validation is performed; the value is returned verbatim from x$param_names.

length.gdpar_formula_set(x)

Purpose S3 length method for gdpar_formula_set objects. Reports the number of parameter slots K.

Arguments

  • x : a gdpar_formula_set object containing a formulas component (a named list).

Returns An integer scalar equal to length(x$formulas), i.e. the number of slots (including the outcome slot mu).

Notes

  • Dispatches on class gdpar_formula_set.
  • The returned count includes the outcome slot, so it is the total number of formulas stored, not the number of auxiliary parameters.

.gdpar_rhs_has_amm_calls(formula)

Purpose Internal predicate used by the dispatch logic in gdpar (sub-phase 8.3.3, decision P-dispatch) to decide whether a classic two-sided formula should be routed through the new K-individual parser path or through the legacy single-amm_spec path. Returns TRUE when at least one top-level summand of the RHS is a call to one of the AMM wrappers a, b, or W.

Arguments

  • formula : a classic two-sided (or one-sided) formula object, or any R object.

Returns A logical scalar. FALSE when formula is not a formula, or when no top-level summand is a call to a, b, or W; TRUE otherwise.

Mathematics

Let the RHS expression be $e$. Define a predicate $H(e)$ recursively:

$$ H(e) = \begin{cases} H(e_2) \lor H(e_3) & \text{if } e \text{ is a binary call } e_1 + e_2 + e_3 \text{ with } e_1 \equiv +, \ \mathbb{1}{\mathrm{head}(e) \in {a, b, W}} & \text{if } e \text{ is a call with a symbol head}, \ \text{FALSE} & \text{otherwise.} \end{cases} $$

The function returns $H(e)$ where $e$ is formula[[3L]] for two-sided formulas or formula[[2L]] for one-sided formulas.

Notes

  • Non-throwing by design: an invalid RHS that the parser would later reject still returns TRUE here when an AMM wrapper is detected, so that dispatch routes the call to the parser which produces the canonical error message.
  • Only top-level summands of a + tree are inspected; nested calls inside other operators are not traversed (the recursion only descends through binary +).
  • A call head must be a symbol; calls with non-symbol heads (e.g. :: or $-qualified names) do not trigger TRUE.
  • Returns FALSE immediately when formula does not inherit from "formula".

.gdpar_is_default_amm_spec(amm)

Purpose Internal predicate that detects whether an amm_spec object is the canonical default Level 0 spec (no active components). Used by the dispatch in gdpar to treat an uncustomized amm argument as effectively absent when the new K-individual paths are taken.

Arguments

  • amm : any R object.

Returns A logical scalar. TRUE when amm inherits from "amm_spec" and all of the following hold: amm$a is NULL, amm$b is NULL, amm$W is NULL, amm$x_vars is NULL, amm$p is NULL or equal to 1L, and amm$dims is NULL. FALSE otherwise.

Notes

  • The check on p uses is.null(amm$p) || isTRUE(amm$p == 1L), so a missing p component also qualifies as default.
  • Accesses components via $, which returns NULL for absent components in lists; thus an amm_spec lacking some of these fields entirely is still considered default.
  • Does not throw; any non-amm_spec input returns FALSE via short-circuit of &&.

.gdpar_parse_amm_formula(rhs_formula, slot_name)

Purpose Internal parser that walks the RHS expression of a one-sided formula and extracts the AMM components declared via the special wrapper calls a(...), b(...), and W(). Produces a plain list descriptor that downstream dispatch combines with the external W_basis argument to build an amm_spec per slot.

Arguments

  • rhs_formula : a one-sided (or two-sided) formula whose RHS is to be parsed.
  • slot_name : a character scalar giving the slot name from the enclosing gdpar_formula_set, used solely for error messages.

Returns A named list with three components:

  • a_formula : a one-sided formula ~ <expr> when a(<expr>) appears in the RHS; NULL otherwise. The formula carries the environment of rhs_formula.
  • b_formula : a one-sided formula ~ <expr> when b(<expr>) appears in the RHS; NULL otherwise.
  • W_present : a logical scalar; TRUE when W() appears in the RHS.

Mathematics

Let the RHS expression be $e$, split into top-level summands $s_1, \ldots, s_n$ by .gdpar_split_amm_summands. For each summand $s_j$:

  • If $s_j \equiv 1$ (the symbol 1 or the numeric value 1), it is accepted as a Level 0 anchor-only term and skipped.
  • Otherwise $s_j$ must be a call whose head is a symbol in $\{a, b, W\}$.
  • For head $W$: the call must have zero arguments; sets W_present $\leftarrow$ TRUE.
  • For head $a$ or $b$: the call must have exactly one argument; the interior expression is wrapped as $\sim\,\text{interior}$ and stored in a_formula or b_formula respectively.

The output is $(A, B, w)$ where $A$ is a_formula, $B$ is b_formula, and $w$ is W_present.

Notes

  • Raises gdpar_internal_error (via gdpar_abort) when rhs_formula is not a formula; this is treated as an internal invariant violation.
  • Raises gdpar_input_error when:
    • a summand is a bare term (not a call and not 1) — the message (partly in Spanish) instructs to wrap terms in a(), b(), or W(), or use ~ 1 for a degenerate Level 0 AMM;
    • a call has a non-symbol head;
    • a call's head is not one of a, b, W (unknown function);
    • W() is called with any arguments (must be zero-argument);
    • W() appears more than once in the same slot;
    • a() or b() is called with a number of arguments other than one;
    • a() or b() appears more than once in the same slot.
  • The interior of a() / b() is wrapped in ~ <deparse(interior)> using stats::as.formula with env = environment(rhs_formula), then passed to .gdpar_check_no_intercept_suppression for an additional validation (presumably rejecting 0 or -1 intercept-suppression syntax; the exact behavior is defined in that helper, not shown in this section).
  • The splitter .gdpar_split_amm_summands is invoked to flatten binary + trees; non-+ binary operators at the top level are rejected there.
  • P5 canonization: a bare RHS of 1 is accepted as Level 0 (anchor only); any other bare RHS without AMM wrappers aborts.
  • Error objects carry a data field with slot, term, fname, nargs, or allowed diagnostics as appropriate.

.gdpar_split_amm_summands(expr, slot_name)

Purpose

Internal helper used during AMM (Additive Multiplicative Model) formula parsing. It recursively decomposes an unevaluated R expression into a flat list of "summands" by walking down the top-level + nodes of the parse tree. It simultaneously enforces a syntactic restriction: at the top level of a slot's right-hand side, only additive composition (+) of wrapper calls (e.g. a(), b(), W()) is permitted; any other binary operator at the top level is rejected with a structured error. This guarantees that interactions and transformations are pushed inside the wrappers (e.g. a(x1:x2)) rather than appearing as top-level terms.

Arguments

  • exprlanguage (a call) or symbol/literal. The unevaluated R expression representing (a sub-tree of) the right-hand side of a slot formula. May be a single call, a symbol, or a literal constant.
  • slot_namecharacter(1). The name of the parameter slot currently being parsed; used only for constructing diagnostic error messages.

Mathematics

Given an expression tree $E$, the function computes a list $S(E)$ of top-level additive summands:

$$ S(E) = \begin{cases} S(E_L) \cup S(E_R) & \text{if } E = (E_L + E_R), \\ {E} & \text{otherwise,} \end{cases} $$

where the union preserves left-to-right order (i.e. c(S(E_L), S(E_R))). Before reaching the recursive base case, the function inspects any three-element call whose head is not +: if the head is one of $\{-, *, /, :, \wedge, |\}$, parsing is aborted.

Returns

A list of length $\geq 1$ whose elements are unevaluated R expressions (calls, symbols, or literals). Each element is a maximal sub-expression that is not itself a top-level + call. The list is in left-to-right reading order of the original expression.

Notes

  • The recursion is triggered only when expr is a call of length exactly 3L whose first element is identical to the symbol +. All other three-element calls are routed to the validation branch.
  • The set of forbidden top-level binary operators is exactly c("-", "*", "/", ":", "^", "|"). Note that + is handled separately (recursion) and unary - (a call of length 2) is not intercepted here.
  • When a forbidden operator is detected, the function calls gdpar_abort with:
    • class = "gdpar_input_error",
    • data = list(slot = slot_name, operator = op),
    • a message (in Spanish) explaining that AMM canonizes only additive composition of the wrappers a(), b(), W(), and that interactions/transformations must go inside the wrapper.
  • The function performs no evaluation of expr; it is purely structural.
  • No side effects other than the potential abort.

.gdpar_formula_set_to_amm_spec_list(fs, W_basis_arg = NULL)

Purpose

Internal converter that bridges the user-facing multi-slot formula specification (a gdpar_formula_set) and the lower-level AMM machinery. For each parameter slot it extracts the right-hand side of the stored formula, parses it via .gdpar_parse_amm_formula(), validates the consistency between any declared W() term and the externally supplied W_basis object, and assembles an amm_spec object. The result is a named list of amm_spec objects — one per slot — suitable for K-individual dispatch in gdpar(). For $K = 1$ the function still returns a length-one named list; the caller is responsible for unwrapping it when routing to the legacy single-amm_spec path to preserve bit-exact backward compatibility with Block 7.

Arguments

  • fsgdpar_formula_set. An S3 object containing at least the components $param_names (a character vector of slot names) and $formulas (a named list of formula objects, one per slot). The function aborts with an internal error if fs does not inherit from "gdpar_formula_set".
  • W_basis_argW_basis or NULL. The external W argument passed by the user to gdpar(). When non-NULL it must inherit from "W_basis"; it is attached to every slot whose parsed formula declares a W() term. Defaults to NULL.

Mathematics

Let the formula set contain $K$ slots with names $\{k_1, \dots, k_K\} = \text{fs\$param\_names}$. For each slot $k$ the function extracts the right-hand side $R_k$ of the stored formula $f_k$:

$$ R_k = \begin{cases} f_k[,c(1,3),] & \text{if } \text{length}(f_k) = 3 \quad (\text{i.e. } \sim \text{ with LHS}), \\ f_k & \text{otherwise} \quad (\text{i.e. } \sim \text{ with no LHS}), \end{cases} $$

and parses it into components $(a_k, b_k, w_k^{\text{present}}) = \text{.gdpar\_parse\_amm\_formula}(R_k, k)$. The effective $W$ basis for slot $k$ is

$$ W_k = \begin{cases} \text{W_basis_arg} & \text{if } w_k^{\text{present}} = \text{TRUE and } \text{W_basis_arg} \neq \text{NULL}, \\ \text{NULL} & \text{if } w_k^{\text{present}} = \text{FALSE}. \end{cases} $$

The output is the named list

$$ \text{out} = \bigl{, k \mapsto \text{amm_spec}(a = a_k,; b = b_k,; W = W_k) ;\big|; k \in {k_1,\dots,k_K} ,\bigr}. $$

Returns

A named list of length length(fs$param_names), with names equal to fs$param_names. Each element is an amm_spec object constructed by amm_spec(a = parsed$a_formula, b = parsed$b_formula, W = W_for_slot). Slots without a W() term receive W = NULL; slots with a W() term receive the (validated) W_basis_arg.

Notes

  • Input class check. If !inherits(fs, "gdpar_formula_set"), the function aborts via gdpar_abort with class = "gdpar_internal_error" and the message "Internal error: .gdpar_formula_set_to_amm_spec_list expected a gdpar_formula_set.". This is treated as a programmer error, not a user error.
  • RHS extraction. The right-hand side is obtained by indexing the formula object: when length(f) == 3L (a two-sided formula y ~ ...), rhs_only <- f[c(1L, 3L)] reconstructs a one-sided formula ~ ...; otherwise the formula is used as-is. This avoids deparsing or re-parsing and preserves the original language object.
  • W() consistency (decision P3, sub-phase 8.3.3). Two abort conditions are checked, in order, for each slot where parsed$W_present is TRUE:
    1. If W_basis_arg is NULL, abort with class = "gdpar_input_error", data = list(slot = k), and a message instructing the user to pass an explicit W = W_basis(...) or remove the W() term.
    2. Else if W_basis_arg does not inherit from "W_basis", abort with class = "gdpar_input_error", data = list(received = class(W_basis_arg)), and a message stating that the external W argument must be of class W_basis.
  • The W_basis_arg is shared across all slots that declare W(); there is no per-slot W override at this layer.
  • The function does not evaluate any formula; it only manipulates language objects and delegates parsing to .gdpar_parse_amm_formula.
  • No S3 dispatch is performed by this function itself; it relies on the amm_spec() constructor (whose dispatch is external to this section).
  • The loop iterates over fs$param_names in their stored order, so the output list preserves that order.
  • For $K = 1$ the return value is a length-one named list; the function does not unwrap it (the caller handles backward compatibility).

R/gdpar_coef.R

build_coef_term_df(draws_mat, term_names)

Purpose (role in the package). Constructs a tidy data.frame of posterior summary statistics for a generic block of model parameters. It is the workhorse called by coef.gdpar_fit() for both the "multi" and "scalar" branches, producing one row per parameter term with its mean and central credible-interval quantiles.

Arguments

Argument Type Meaning
draws_mat matrix or NULL Matrix of posterior draws with rows = MCMC/HMC samples and columns = parameters. If NULL or with zero columns the function returns NULL.
term_names character Column labels (parameter names) whose length must equal ncol(draws_mat).

Mathematics

For each column $j$ of draws_mat with draws $\{\theta_j^{(s)}\}_{s=1}^S$ the function computes

$$\bar{\theta}_j = \frac{1}{S}\sum_{s=1}^S \theta_j^{(s)},$$

and the empirical quantiles

$$q_j^{(\alpha)} = \text{Quantile}\bigl({\theta_j^{(s)}},;\alpha\bigr), \quad \alpha \in {0.05,;0.50,;0.95}.$$

Returns

A data.frame with columns term (character), mean, q05, q50, q95 (all numeric), one row per parameter. Returns NULL when draws_mat is NULL or has zero columns.

Notes

  • Raises a class gdpar_internal_error (via gdpar_abort) when length(term_names) != ncol(draws_mat).
  • stats::quantile is called with names = FALSE and na.rm = TRUE.
  • The returned mean and quantile columns are unnamed.
  • stringsAsFactors = FALSE is explicitly set; row.names = NULL.

build_coef_W_df(draws_mat, basis_dim, x_names)

Purpose (role in the package). Builds a long-format data.frame summarising the posterior draws of a $W$ block (the spatially-varying basis-coefficient matrix). Each row of the output corresponds to one (basis index, covariate name) combination, reflecting the row-major flattening convention used inside gdpar for the $W$ matrix.

Arguments

Argument Type Meaning
draws_mat matrix or NULL Draw matrix of dimension $(S,\;\texttt{basis\_dim} \times d_x)$ where $S$ is the number of posterior samples and $d_x = \texttt{length(x\_names)}$. Columns store a row-major flattening of a basis_dim $\times$ d_x matrix. Returns NULL if this is NULL or has zero columns.
basis_dim integer(1) Number of spatial basis functions $K_{\text{basis}}$ (i.e.\ the number of rows of the $W$ matrix per covariate).
x_names character Names of the $d_x$ covariates whose coefficients are spatially varying.

Mathematics

The draws matrix stores a row-major vectorisation of

$$\mathbf{W} \in \mathbb{R}^{K_{\text{basis}} \times d_x},$$

so that column $j$ of draws_mat maps to element $(b, x)$ of $\mathbf{W}$ via $j = (x-1)\,K_{\text{basis}} + b$. For each such column the function computes $\bar{w}_{b,x}$ and $q_{b,x}^{(\alpha)}$ exactly as in build_coef_term_df.

Returns

A data.frame with columns basis_idx (integer, 1-based spatial-basis index), x_name (character, covariate name), mean, q05, q50, q95. Row count is basis_dim * length(x_names). Returns NULL when any degenerate condition is met (NULL draws, zero columns, basis_dim == 0, or empty x_names).

Notes

  • basis_idx is constructed via rep(seq_len(basis_dim), times = length(x_names)) and x_name via rep(x_names, each = basis_dim), matching the row-major ordering $b$ varying fastest.
  • Raises gdpar_internal_error when ncol(draws_mat) != basis_dim * length(x_names).
  • Quantiles and means are unnamed; na.rm = TRUE.

build_coef_theta_ref_df(draws_mat, p)

Purpose (role in the package). Produces a tidy data.frame of posterior summaries for the reference-anchor parameter vector $\boldsymbol{\theta}_{\text{ref}}$ in the ungrouped (single-group) case. Each row represents one of the $p$ coordinates.

Arguments

Argument Type Meaning
draws_mat matrix Posterior draws of dimension $(S, p)$ where $S$ is the number of samples and $p$ the number of model coordinates.
p integer(1) Expected number of coordinates; must equal ncol(draws_mat).

Mathematics

For coordinate $k \in \{1,\dots,p\}$:

$$\bar{\theta}_{\text{ref},k} = \frac{1}{S}\sum_{s=1}^S \theta_{\text{ref},k}^{(s)}, \qquad q_k^{(\alpha)} = \text{Quantile}\bigl({\theta_{\text{ref},k}^{(s)}},;\alpha\bigr), \quad \alpha \in {0.05,0.50,0.95}.$$

Returns

A data.frame with columns k (integer, $1,\dots,p$), mean, q05, q50, q95, and exactly $p$ rows. Returns no NULL guard; assumes non-empty input.

Notes

  • Raises gdpar_internal_error if ncol(draws_mat) != p.
  • k is always seq_len(p), guaranteeing contiguous integer labels.
  • Called internally by build_coef_theta_ref_df_grouped when J_groups == 1.

build_coef_theta_ref_df_grouped(arr, J_groups, p)

Purpose (role in the package). Generalises build_coef_theta_ref_df to the grouped case (Block 6.5 of the model). Handles both univariate ($p=1$, 2-D matrix input) and multivariate ($p&gt;1$, 3-D array input) scenarios. When J_groups == 1 it collapses to the ungrouped schema for backward compatibility.

Arguments

Argument Type Meaning
arr matrix or array Posterior draws. Shape depends on context: $(S, J_{\text{groups}})$ when $p=1$ (matrix), or $(S, J_{\text{groups}}, p)$ (3-D array) when $p \ge 1$. When J_groups == 1, several shapes are tolerated (see Notes).
J_groups integer(1) Number of groups $J_{\text{groups}} \ge 1$.
p integer(1) Number of coordinates per group.

Mathematics

When J_groups > 1 and $p \ge 1$, for group $g$ and coordinate $k$:

$$\bar{\theta}_{\text{ref},g,k} = \frac{1}{S}\sum_{s=1}^S \theta_{\text{ref},g,k}^{(s)}, \qquad q_{g,k}^{(\alpha)} = \text{Quantile}\bigl({\theta_{\text{ref},g,k}^{(s)}},;\alpha\bigr).$$

When J_groups == 1, the group dimension is dropped and the output schema matches build_coef_theta_ref_df.

Returns

  • If J_groups == 1: delegates to build_coef_theta_ref_df, returning a data.frame with columns (k, mean, q05, q50, q95) and $p$ rows.
  • If J_groups > 1 and arr is a matrix ($p=1$): data.frame with columns (g, k, mean, q05, q50, q95), $J_{\text{groups}}$ rows, and k is always 1L.
  • If J_groups > 1 and arr is a 3-D array: data.frame with columns (g, k, mean, q05, q50, q95), $J_{\text{groups}} \times p$ rows, constructed by looping over groups and coordinates.

Notes

  • When J_groups == 1, the function accepts three input shapes:
    1. A 3-D array with dim[2] == 1: the single group slice arr[, 1, ] is extracted.
    2. A matrix with one column and p == 1: used directly.
    3. A matrix with p columns: used directly (the ungrouped case). Any other shape raises gdpar_internal_error.
  • When J_groups > 1 and arr is a matrix, it must have ncol == J_groups and p == 1; otherwise gdpar_internal_error.
  • When J_groups > 1 and arr is a 3-D array, dimensions must be exactly (S, J_groups, p).
  • The loop-based construction for the 3-D case appends single-row data.frames to a list, then calls do.call(rbind, out_rows).

build_coef_hyper_df(draws_mat, p)

Purpose (role in the package). Constructs a tidy data.frame of posterior summaries for a hyperparameter vector (specifically $\boldsymbol{\mu}_{\theta_{\text{ref}}}$ or $\boldsymbol{\sigma}_{\theta_{\text{ref}}}$) of dimension $p$. The output schema is identical to build_coef_theta_ref_df.

Arguments

Argument Type Meaning
draws_mat matrix or NULL Posterior draws of dimension $(S, p)$, as returned by .extract_mu_sigma_theta_ref. Returns NULL if NULL.
p integer(1) Expected number of coordinates; must equal ncol(draws_mat).

Mathematics

Identical per-coordinate mean and quantile computation as in build_coef_theta_ref_df.

Returns

A data.frame with columns k ($1,\dots,p$), mean, q05, q50, q95, and $p$ rows. Returns NULL when draws_mat is NULL.

Notes

  • Raises gdpar_internal_error if ncol(draws_mat) != p (when draws_mat is not NULL).
  • Structurally identical to build_coef_theta_ref_df except for the explicit NULL guard, reflecting that hyperparameters may be absent when there is no grouping.

validate_coef_slot(slot, p, expected_cols, slot_name)

Purpose (role in the package). Internal validation helper that checks the structural integrity of a per-$k$ coefficient list slot inside a gdpar_coef object. Each such slot is either NULL (component absent at the AMM level) or a list of length $p$ whose elements are themselves either NULL (coordinate inactive) or data.frames with specific required columns.

Arguments

Argument Type Meaning
slot list or NULL The coefficient slot to validate.
p integer(1) Expected list length (number of coordinates).
expected_cols character Column names that every non-NULL data.frame element must contain.
slot_name character(1) Human-readable name used in error messages.

Mathematics

None (pure validation logic).

Returns

Returns invisible(NULL) unconditionally on success. Side-effect-only on failure: raises gdpar_internal_error.

Notes

Validation proceeds in order:

  1. If slot is NULL, return immediately (component absent — valid).
  2. Check is.list(slot) and length(slot) == p; abort otherwise.
  3. For each $k \in 1,\dots,p$: if element $k$ is NULL skip (coordinate inactive); otherwise verify it is a data.frame and that all expected_cols are present in names(elem).

validate_coef_theta_ref(theta_ref_df, p, J_groups)

Purpose (role in the package). Validates the structural integrity of the theta_ref slot of a gdpar_coef object. Handles both ungrouped ($J_{\text{groups}} = 1$, no g column) and grouped schemas.

Arguments

Argument Type Meaning
theta_ref_df data.frame The theta_ref component to validate.
p integer(1) Expected number of coordinate levels ($1,\dots,p$).
J_groups integer(1) Expected number of groups. Defaults to 1L.

Mathematics

None (pure validation logic).

Returns

Returns invisible(NULL) on success. Raises gdpar_internal_error on any failure.

Notes

Validation steps in order:

  1. theta_ref_df must be a data.frame.
  2. Expected columns are c("k", "mean", "q05", "q50", "q95"); if column "g" is present, "g" is prepended to the expected set.
  3. All expected columns must exist; missing ones trigger an abort.
  4. Row count must equal $J_{\text{groups}} \times p$ (when g present) or $p$ (when ungrouped).
  5. Ungrouped path: theta_ref_df$k must contain exactly the integers $1,\dots,p$ without gaps (checked via sort(as.integer(...))).
  6. Grouped path: theta_ref_df$g must contain exactly $1,\dots,J_{\text{groups}}$ and theta_ref_df$k must contain exactly $1,\dots,p$, each without gaps.

validate_coef_hyper_df (signature: validate_coef_hyper_df(hyper_df, p, hyper_name))

Purpose (role in the package). Validates a hyperparameter summary data.frame (either mu_theta_ref or sigma_theta_ref) inside a gdpar_coef object. Expects a data.frame with columns (k, mean, q05, q50, q95) and exactly $p$ rows. A NULL slot is accepted (no grouping present).

Arguments

Argument Type Meaning
hyper_df data.frame or NULL The hyperparameter summary to validate. NULL is accepted (component inactive / no grouping).
p integer(1) Expected number of coordinate rows.
hyper_name character(1) Human-readable name used in error messages (typically "mu_theta_ref" or "sigma_theta_ref").

Mathematics

None (pure validation logic).

Returns

Returns invisible(NULL) on success. Raises gdpar_internal_error on failure.

Notes

The function body is truncated in this section (the section ends immediately after the roxygen @noRd tag). Based on the documentation header, the expected behaviour is: accept NULL silently; otherwise verify the object is a data.frame with columns c("k", "mean", "q05", "q50", "q95") and exactly p rows. The full implementation is in a subsequent section.

validate_coef_hyper(df, p, slot_name)

Purpose Internal validation helper that checks whether a hyperparameter summary data.frame (such as mu_theta_ref or sigma_theta_ref) conforms to the expected structure. Called during new_gdpar_coef() construction.

Arguments

Argument Type Meaning
df data.frame or NULL The hyperparameter summary table to validate. If NULL the function returns silently (component is absent).
p integer scalar Expected number of rows (one per coordinate).
slot_name character scalar Human-readable name of the slot being validated; used in error messages.

Mathematics

No formula is implemented. The function enforces the invariant that, when present, a hyperparameter summary data.frame must contain exactly the columns $${\texttt{k},;\texttt{mean},;\texttt{q05},;\texttt{q50},;\texttt{q95}}$$ and must have exactly $p$ rows.

Returns invisible(NULL) on success. Never returns a value on failure; an error of class "gdpar_internal_error" is raised instead.

Notes

  • If df is NULL, returns immediately—this is the expected path when the hyperparameter component is absent.
  • If df is not a data.frame, gdpar_abort() is called with a message stating the slot must be NULL or a data.frame.
  • Missing columns are detected via setdiff(expected, names(df)) and reported in a single comma-separated list.
  • Row-count mismatch triggers an error message quoting the actual and expected ($p$) row counts.
  • All errors carry class "gdpar_internal_error".

new_gdpar_coef(theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels, summary_stats)

Purpose Internal constructor for the gdpar_coef S3 class. Validates every slot and assembles a well-formed object that downstream methods can trust without further checks. This is the sole entry point for creating gdpar_coef instances.

Signature

new_gdpar_coef(theta_ref, a = NULL, b = NULL, W = NULL,
               p,
               mu_theta_ref = NULL,
               sigma_theta_ref = NULL,
               J_groups = 1L,
               group_levels = NULL,
               summary_stats = c("mean", "q05", "q50", "q95"))

Arguments

Argument Type Meaning
theta_ref data.frame Reference-anchored parameter summaries with columns (k, mean, q05, q50, q95) and p rows. May also include a g column for grouped models.
a NULL or list of length p Per-coordinate intercept/additive component slot. Each entry is NULL (inactive) or a data.frame with columns (term, mean, q05, q50, q95).
b NULL or list of length p Per-coordinate slope/multiplier component slot. Same structure as a.
W NULL or list of length p Per-coordinate basis-expansion component slot. Each entry is NULL or a data.frame with columns (basis_idx, x_name, mean, q05, q50, q95).
p numeric scalar Number of coordinates (time points, locations, etc.). Must be a positive integer.
mu_theta_ref NULL or data.frame Hyperparameter summary for the prior mean of $\theta_{\text{ref}}$. When present, validated with validate_coef_hyper.
sigma_theta_ref NULL or data.frame Hyperparameter summary for the prior standard deviation of $\theta_{\text{ref}}$. When present, validated with validate_coef_hyper.
J_groups integer scalar Number of distinct groups in a grouped model. Defaults to 1L.
group_levels NULL or vector Labels for the J_groups groups; stored as-is.
summary_stats character vector Names of the summary statistics carried by the object (e.g. c("mean","q05","q50","q95")).

Returns A list of class "gdpar_coef" with elements theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels, summary_stats.

Notes

  • p is coerced to integer after validation. The check requires it to be a length-1 numeric, non-NA, $\geq 1$, and equal to its own integer cast.
  • J_groups is coerced to integer unconditionally.
  • Validation is delegated to helpers: validate_coef_theta_ref for theta_ref; validate_coef_slot for a, b, W (each with its expected column set); validate_coef_hyper for mu_theta_ref and sigma_theta_ref.
  • Any validation failure raises an error of class "gdpar_internal_error" via gdpar_abort().
  • The returned object uses structure(...) to attach the "gdpar_coef" class.

count_active_coords(slot)

Purpose Internal utility that counts how many coordinates in a component slot (a, b, or W) have a non-empty data.frame, i.e. how many coordinates actively contribute a component.

Signature

count_active_coords(slot)

Arguments

Argument Type Meaning
slot NULL or list A per-coordinate component list (e.g. x$a).

Returns An integer scalar: the number of list elements that are non-NULL data.frames with at least one row. Returns 0L when slot is NULL.

Notes

  • Uses vapply with logical(1L) for type-safe element-wise testing.
  • An element is "active" if and only if all three conditions hold: !is.null(e), is.data.frame(e), and nrow(e) > 0L.
  • Called by print.gdpar_coef and summary.gdpar_coef to report component activity counts.

print.gdpar_coef(x, level, digits, ...)

Purpose S3 print method for gdpar_coef objects. Provides three verbosity levels for inspecting coefficient summaries.

Signature

print.gdpar_coef(x,
                 level = c("global", "coord", "full"),
                 digits = 4L,
                 ...)

Arguments

Argument Type Meaning
x gdpar_coef Object to print.
level character, one of "global", "coord", "full" Controls verbosity. "global" shows only the theta_ref table and active-component counts. "coord" appends per-coordinate component means (only the mean column). "full" appends per-coordinate data.frames with all summary columns.
digits integer scalar Number of significant digits passed to format_coef_df() for numeric formatting.
... ignored Present for S3 generic compatibility.

Returns invisible(x) — the input object, unmodified.

Notes

  • The header prints: class tag <gdpar_coef>, p, conditionally J_groups (only when $&gt; 1$), summary_stats, and active-component counts formatted as a(n/p) b(n/p) W(n/p).
  • If mu_theta_ref or sigma_theta_ref is non-NULL, their data.frames are printed before theta_ref.
  • At level = "global" a hint is printed suggesting the other levels and as.data.frame().
  • At level = "coord" or "full", the method iterates $k = 1, \ldots, p$. For each coordinate, if at least one of a[[k]], b[[k]], W[[k]] is non-NULL and non-empty, its table is printed. At "coord" level only columns (term, mean) (for a/b) or (basis_idx, x_name, mean) (for W) are shown; at "full" level all columns are shown.
  • All numeric formatting is delegated to format_coef_df().
  • Uses match.arg() for level argument matching.

format_coef_df(df, digits)

Purpose Internal helper that formats numeric columns of a coefficient data.frame for pretty printing. Modifies numeric columns in-place (except integer-like columns) to character strings with a specified number of significant digits.

Signature

format_coef_df(df, digits)

Arguments

Argument Type Meaning
df data.frame or NULL The data.frame to format.
digits integer scalar Number of significant digits for formatC().

Returns The same data.frame with selected numeric columns replaced by formatted character strings. Returns df unchanged if it is NULL or has zero rows.

Notes

  • Columns "g", "k", and "basis_idx" are skipped even if numeric, because they serve as integer identifiers rather than estimated quantities.
  • Formatting uses formatC(..., digits = digits, format = "g", flag = "-"), where "g" selects the shorter of %f / %e representation and "-" requests left-justification.
  • The function mutates and returns df in-place (standard R copy-on-modify semantics apply).

summary.gdpar_coef(object, ...)

Purpose S3 summary method that returns a compact list of aggregated statistics about a gdpar_coef object.

Signature

summary.gdpar_coef(object, ...)

Arguments

Argument Type Meaning
object gdpar_coef Object to summarize.
... ignored Present for S3 generic compatibility.

Mathematics

Computes the cross-coordinate average of posterior means: $$\bar{\theta}{\text{ref}} = \frac{1}{p}\sum{k=1}^{p} \theta_{\text{ref},k}^{\text{mean}}$$ via mean(object$theta_ref$mean).

Returns A named list with elements:

Element Type Content
p integer Number of coordinates.
n_active named integer vector Counts of active coordinates for components a, b, W (via count_active_coords).
theta_ref_mean numeric scalar Mean of the mean column of theta_ref across all coordinates.
summary_stats character vector The summary statistic names carried by the object.

Notes

  • The n_active vector has names "a", "b", "W".
  • mean() is called directly on object$theta_ref$mean; if theta_ref has a g column (grouped model), this averages over all group–coordinate combinations.

as.data.frame.gdpar_coef(x, row.names, optional, ...)

Purpose S3 method that coerces a gdpar_coef object into a single long-format (tidy) data.frame. Flattens the hierarchical per-coordinate, per-component structure into rows suitable for use with dplyr / ggplot2.

Signature

as.data.frame.gdpar_coef(x, row.names = NULL, optional = FALSE, ...)

Arguments

Argument Type Meaning
x gdpar_coef Object to coerce.
row.names NULL Ignored; required by the as.data.frame generic.
optional logical Ignored; required by the generic.
... ignored Present for generic compatibility.

Returns A data.frame with one row per summarized scalar coefficient and columns:

Column Type Content
component character "theta_ref", "mu_theta_ref", "sigma_theta_ref", "a", "b", or "W".
g integer Group index from theta_ref if a g column exists; NA_integer_ for all other components.
k integer Coordinate index ($1, \ldots, p$).
identifier character term value for a/b; basis_idx (as character) for W; NA_character_ for theta_ref and hyperparameter rows.
x_name character Predictor name from the W slot; NA_character_ elsewhere.
mean numeric Posterior mean.
q05 numeric 5th percentile of the posterior.
q50 numeric 50th percentile (median) of the posterior.
q95 numeric 95th percentile of the posterior.

Notes

  • When x$theta_ref contains a column "g", each theta_ref row carries the corresponding integer group value; otherwise g is NA_integer_.
  • The helper function add_hyper(slot, comp_name) (defined inline) iterates over rows of mu_theta_ref / sigma_theta_ref and appends a row per entry. Its g column is always NA_integer_.
  • The helper function add_terms(slot, comp_name) (defined inline) iterates over coordinates and then over rows of the per-coordinate data.frame for components a and b, extracting the term column as the identifier.
  • For W, iteration is done directly (not via add_terms) because the identifier is basis_idx and an additional x_name column must be carried.
  • If no rows are accumulated (all slots are NULL), a zero-row data.frame with the correct column schema is returned.
  • Rows are accumulated in a list (rows) and combined with do.call(rbind, rows) at the end.

One-line formatter (truncated)

The section ends with the roxygen header for an exported function described as a "One-line formatter for gdpar_coef objects" but the function body is not included in this section. The documented signature is format.gdpar_coef(x, ...) (inferred from the generic dispatch implied by the @export tag and the parameter name x). It is documented as returning a length-1 character vector. Its implementation continues in section 3.

format.gdpar_coef(x, ...)

Purpose S3 method that provides a human-readable textual representation of a gdpar_coef object (the coefficient container for GDPAR models). It is automatically dispatched when format() or print() is called on an object of class "gdpar_coef", and is typically used for console display.

Arguments

Argument Type Meaning
x list with class attribute "gdpar_coef" The coefficient object whose structure is to be rendered. Expected components accessed are $p, $a, $b, and $W.
... (ignored) Additional arguments passed through the S3 generic; not used by this method.

Returns A single character string of the form:

<gdpar_coef> p=<p>, components=[theta_ref, a(<k_a>/<p>), b(<k_b>/<p>), W(<k_W>/<p>)]

where

  • $p$ is x$p (total number of predictors/anchors).
  • $k_a$, $k_b$, $k_W$ are the values returned by count_active_coords(x$a), count_active_coords(x$b), and count_active_coords(x$W) respectively — each counts how many coordinates in the corresponding array are non-zero (active).

The expression sprintf is used for interpolation; the %d placeholders are all supplied as integers.

Notes

  • S3 dispatch. The leading-dot naming convention .gdpar_coef after the generic name format is the standard R mechanism for S3 method dispatch. When format() (or print(), which calls format()) is invoked on any object whose class vector contains "gdpar_coef", this method is selected.
  • Dependency on count_active_coords. The helper function count_active_coords (defined elsewhere in the package) must return a scalar integer. If it returns a non-scalar or non-integer, sprintf will either coerce or warn.
  • No side effects. The function is purely informational; it does not modify x or any global state.
  • Assumptions on structure. The method assumes x contains list elements p, a, b, and W. If any are missing, a runtime error will be thrown at the point of access.

R/gdpar_loo.R

gdpar_loo.R

gdpar_loo(fit, aggregation = c("subject", "cell"), r_eff = NULL, cores = 1L, ...)

Purpose

Exported entry point for approximate leave-one-out cross-validation of a gdpar_fit object. It delegates to loo::loo() to perform Pareto-smoothed importance sampling LOO (PSIS-LOO), using the per-observation log-likelihood draws persisted by the Stan model in the generated quantity log_lik. For multivariate fits ($p &gt; 1$) it offers two aggregation strategies that determine what counts as a single "observation" for the ELPD computation.

Arguments

Argument Type Meaning
fit gdpar_fit A fitted gdpar model object. Must inherit from "gdpar_fit" (checked via assert_inherits). The function accesses fit$fit$draws(...) (the underlying Stan/CmdStan fit) and fit$amm$p (the declared coordinate dimension).
aggregation character scalar One of "subject" (default) or "cell". Matched via match.arg. Ignored when $p = 1$. "subject" sums log-likelihood over coordinates within each row; "cell" treats each $(i,k)$ pair as an independent observation.
r_eff numeric vector or NULL Relative effective sample sizes per observation. If NULL (default), computed from the draws via loo::relative_eff() using chain identifiers reconstructed from the draws array dimensions. Supplying it explicitly avoids recomputation across repeated LOO calls on the same fit.
cores integer (coerced to 1L by default) Number of cores passed to loo::loo() and loo::relative_eff(). Defaults to 1L (sequential) to avoid non-determinism.
... (any) Additional arguments forwarded to loo::loo().

Mathematics

For a univariate fit ($p = 1$), the Stan model emits log_lik as a length-$n$ vector, and each of the $n$ rows of the input data is one observation.

For a multivariate fit ($p &gt; 1$), the Stan model emits log_lik as an $n \times p$ matrix with $\texttt{log\_lik}[i,k] = \log p(y_{ik} \mid \theta_i[k])$. Under the coordinate-wise factorization

$$ p(y_i \mid \theta_i) = \prod_{k=1}^{p} D_k!\left(y_{ik} \mid \theta_i[k]\right), $$

the per-subject log-likelihood (aggregation "subject") is

$$ \log p(y_i \mid \theta_i) = \sum_{k=1}^{p} \log p(y_{ik} \mid \theta_i[k]). $$

For aggregation "cell", each $(i,k)$ pair is treated as an independent observation, yielding PSIS-LOO over $n \cdot p$ cells.

The relative effective sample size is computed (when r_eff = NULL) as

$$ \texttt{r_eff} = \texttt{loo::relative_eff}!\left(\exp(\texttt{log_lik_mat}),; \texttt{chain_id}\right), $$

where chain_id is constructed as rep(seq_len(n_chain), each = n_iter) from the draws array of shape [n_iter, n_chain, n_variables].

Returns

A loo object: an S3 object of class "psis_loo" containing the ELPD estimate (elpd_loo), its standard error, Pareto-$\hat{k}$ diagnostics, and pointwise contributions. This is the direct return value of loo::loo(log_lik_mat, r_eff = r_eff, cores = cores, ...).

Notes

  • Flagged @keywords experimental; the aggregation rule is stable but the signature may gain additional arguments in future versions.
  • Calls assert_inherits(fit, "gdpar_fit", "fit") — raises an error if fit does not inherit from "gdpar_fit".
  • Calls require_suggested("loo", "compute PSIS-LOO approximate cross-validation") — raises an error if the loo package is not installed.
  • Draws are extracted via fit$fit$draws(variables = "log_lik", format = "draws_array"), producing a 3-D array [iteration, chain, variable].
  • The coordinate dimension p is read from fit$amm$p; if that slot is NULL, p defaults to 1L.
  • The r_eff computation uses exp(log_lik_mat) (i.e., the likelihood on the natural scale) as required by loo::relative_eff().
  • Pareto-$\hat{k} &gt; 0.7$ signals that the PSIS approximation is unreliable for the affected observations; the documentation recommends loo::loo_moment_match() or loo::reloo() as refinements.
  • The "subject" aggregation matches the convention used by brms multivariate fits with set_rescor(FALSE), yielding ELPD values directly comparable to per-coordinate competitors aggregated identically.
  • The "cell" aggregation is recommended for diagnostics only, not for reporting comparable ELPD values, because it conflates subject-level and coordinate-level cross-validation.

aggregate_log_lik(draws_arr, p, aggregation)

Purpose

Internal helper (not exported; documented @noRd) that converts a draws_array of the Stan-generated log_lik quantity into a plain numeric matrix suitable for loo::loo(). It handles the univariate case (pass-through), the multivariate subject-aggregation case (sum over coordinates), and the multivariate cell-aggregation case (concatenation in i-major, k-minor order).

Arguments

Argument Type Meaning
draws_arr draws_array (from posterior/cmdstanr) A draws array containing only the log_lik variable(s). Expected shape: [n_iter, n_chain, n_variables].
p integer Declared coordinate dimension. If 1L, the function returns the draws matrix unchanged. If $&gt; 1$, variable names are parsed to extract indices $(i, k)$.
aggregation character scalar Either "subject" or "cell". Determines whether coordinates are summed (subject) or concatenated (cell). Not validated inside this function (the caller gdpar_loo passes the result of match.arg).

Mathematics

For $p = 1$, the input matrix is returned as-is (shape $S \times n$, where $S$ is the total number of posterior draws).

For $p &gt; 1$, variable names are parsed with the regular expression ^log_lik\[(\d+),(\d+)\]$, yielding index pairs $(i, k)$. Let $n = \max_i$ and $p_{\text{in}} = \max_k$.

Subject aggregation produces an $S \times n$ matrix where column $i$ is

$$ \texttt{out}[\cdot, i] = \sum_{k=1}^{p} \texttt{mat}[\cdot, \texttt{col}(i,k)], $$

with $\texttt{col}(i,k)$ being the column index in mat corresponding to variable log_lik[i,k].

Cell aggregation produces an $S \times (n \cdot p)$ matrix where the flat column index is

$$ \texttt{flat_col}(i, k) = (i - 1),p + k, $$

so columns are ordered as $((1,1), (1,2), \ldots, (1,p), (2,1), \ldots, (n,p))$ — i-major, k-minor.

Returns

A plain numeric matrix (no class attributes beyond matrix, since unclass() is applied to the draws_matrix):

  • $p = 1$: shape $S \times n$, identical to the input draws matrix.
  • $p &gt; 1$, "subject": shape $S \times n$, columns indexed by subject $i$.
  • $p &gt; 1$, "cell": shape $S \times (n \cdot p)$, columns indexed by flat $(i,k)$.

Here $S = \texttt{nrow(mat)}$ is the total number of posterior draws (iterations $\times$ chains).

Notes

  • The function begins by extracting variable names via posterior::variables(draws_arr) and converting to a plain matrix via unclass(posterior::as_draws_matrix(draws_arr)).
  • For $p = 1$, the function returns immediately without parsing variable names; both aggregation modes would yield the same result.
  • Parse error: If any variable name does not match ^log_lik\[(\d+),(\d+)\]$, the function calls gdpar_abort() with:
    • class = "gdpar_loo_parse_error",
    • data = list(unparsed = vars[bad]),
    • a message listing up to 5 offending names (via utils::head(vars[bad], 5L) and sQuote).
  • Dimension mismatch error: If the parsed maximum $k$ (p_in) does not equal the declared p, the function calls gdpar_abort() with:
    • class = "gdpar_loo_dim_mismatch",
    • data = list(p_parsed = p_in, p_fit = p),
    • a message of the form "log_lik dimension mismatch: parsed p = %d from draws; fit declares p = %d.".
  • The index matrix ij is built by do.call(rbind, lapply(parsed, function(z) as.integer(z[2:3]))), with column names c("i", "k").
  • The subject-aggregation loop iterates seq_along(vars) and accumulates mat[, col_idx] into out[, i]; the cell-aggregation loop places each mat[, col_idx] into out[, flat_col] without summation.
  • No validation of aggregation is performed inside this function; it assumes the caller has already matched the argument. If an unrecognized value is passed, the cell-aggregation branch is taken by default (since the if (aggregation == "subject") check fails), and the subject branch is skipped.

R/gdpar-package.R

R/gdpar-package.R contains no functions: it holds only the package-level documentation sentinel "_PACKAGE" (which Roxygen turns into the ?gdpar-package help topic, aliased gdpar-package) plus the package-wide @importFrom directives.

Package-level documentation. The roxygen block states the package's thesis — individual parameters decomposed as $\theta_i=\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})$ in the Additive–Multiplicative–Modulated canonical form $\Delta(x,\theta)=a(x)+b(x)\odot\theta+W(\theta)x$ — and documents the three estimation paths and the ten theoretical vignettes (Blocks 0–9: canonical form & identifiability, gnoseological validity, special-case subsumptions, per-path asymptotics, EB vs FB, causal positioning, cognitive motivation).

The roxygen @section Three paths describes path = "bayes" (Path 1, hierarchical Bayesian via cmdstanr — the default and, in this release, the only operational path), path = "vcm" (Path 2, varying-coefficient via mgcv), and path = "hyper" (Path 3, hypernetwork via torch, explicitly "Not implemented in this release"; invoking it raises a structured gdpar_unsupported_feature_error, and torch is not a declared dependency). The stated default rationale: Path 1 admits the closed-form identifiability results (Theorems 1A/1E) and calibrated Bayesian uncertainty (Theorem 4C). (Note: consistent with the operational status documented throughout this wiki and the framework-overview vignette, the executable surface of gdpar 0.1.0 is Path 1 only; path = "vcm" and path = "hyper" abort with gdpar_unsupported_feature_error.)

Imports. @importFrom stats model.matrix terms update sd quantile median lm coef qnorm setNames qt and @importFrom methods is — the base-stats/methods symbols used across the package.


← Part IV — Exhaustive Function Reference (2/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (4/7) →

Clone this wiki locally