Skip to content

Part IV Function Reference 7

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 4 revisions

← Part IV — Exhaustive Function Reference (6/7) · gdpar Wiki Home · Part V — Stan Templates (1/3) →


Pre-flight Diagnostics (Section 2)

preflight_attribution_score(log_sigma, raw_norm, divergent)

Purpose Pre-flight diagnostic that produces a z-like statistic measuring whether divergent MCMC transitions are disproportionately concentrated in regions of draw-space where the log prior scale (log_sigma) and the raw NCP normalization (raw_norm) co-vary. A large magnitude suggests that the divergences are structurally linked to the parametrization geometry, informing the CP/NCP decision.

Arguments

Argument Type Meaning
log_sigma numeric vector Per-draw log of the prior scale parameter (e.g. log(sigma_a) or log(sigma_W)).
raw_norm numeric vector Per-draw raw NCP normalization quantity (e.g. a column of W_raw or a_raw).
divergent logical or integer vector Per-draw divergence indicator; used as a logical subset index into the per-draw arrays.

Mathematics

Let $n$ be the common length. Standardize both vectors:

$$z_{\sigma} = \frac{\log\sigma - \overline{\log\sigma}}{\mathrm{sd}(\log\sigma)}, \qquad z_{r} = \frac{\text{raw_norm} - \overline{\text{raw_norm}}}{\mathrm{sd}(\text{raw_norm})}$$

Define the per-draw interaction score:

$$g_i = -,z_{\sigma,i}, z_{r,i}$$

Restrict to divergent draws and compute:

$$s_{\text{div}} = \frac{1}{n_{\text{div}}}\sum_{i \in \mathcal{D}} g_i, \qquad s_{\text{centered}} = s_{\text{div}} - \bar{g}$$

where $\mathcal{D}$ is the set of divergent draws and $\bar{g} = \mathrm{mean}(g)$. The standard error is:

$$\text{SE} = \frac{\mathrm{sd}(g)}{\sqrt{n_{\text{div}}}}$$

The returned statistic is:

$$t = \frac{s_{\text{centered}}}{\text{SE}}$$

Returns A numeric scalar (the statistic $t$), or NA_real_ if any of the following hold: length mismatch among the three inputs; sd(log_sigma) or sd(raw_norm) is NA or $\le 0$; fewer than 1 divergent draw (n_div < 1L); sd(g) is NA or $\le 0$.

Notes

  • No S3 dispatch; a plain closure.
  • The divergent argument is used directly as a logical index (g[divergent]), so it must be coercible to logical. Integer 0/1 vectors work because R recycles them as logical.
  • The sign convention $g = -z_\sigma z_r$ means that draws where both standardized quantities are large in the same direction contribute negatively; the centering by $\bar{g}$ removes the baseline association.

block_bootstrap_indices(n_draws, n_chains, block_size = 10L)

Purpose Generates resampling indices for a chain-aware block bootstrap. Contiguous blocks of block_size draws are resampled within each chain (never across chains), producing an integer vector of length n_draws that can be used to subset a draws matrix. This mitigates MCMC autocorrelation that would cause a naive i.i.d. bootstrap to underestimate standard errors.

Arguments

Argument Type Meaning
n_draws integer (coerced) Total number of draws across all chains.
n_chains integer (coerced) Number of chains.
block_size integer (coerced, default 10L) Length of each contiguous block. Must be in $[1,\; \text{draws\_per\_chain}]$.

Mathematics

Let $D = n_{\text{draws}} / n_{\text{chains}}$ (draws per chain) and $B = \lceil D / L \rceil$ where $L = \text{block\_size}$. For each chain $c \in \{1, \dots, n_{\text{chains}}\}$:

  1. Define the chain's global start offset: $\text{start}_c = (c-1)\,D + 1$.
  2. Sample $B$ block-start positions uniformly with replacement from $\{1, 2, \dots, D - L + 1\}$.
  3. For the $b$-th sampled start $s_b$, the block contributes indices $\{\text{start}_c + s_b - 1, \dots, \text{start}_c + s_b + L - 2\}$.
  4. Concatenate all $B \times L$ indices and truncate to the first $D$ entries.

The final output is the concatenation across all chains, yielding exactly $n_{\text{draws}}$ indices.

Returns An integer vector of length n_draws containing 1-based indices into the original draws matrix.

Notes

  • Raises an error of class "gdpar_input_error" (via gdpar_abort) if n_draws %% n_chains != 0L.
  • Raises an error of class "gdpar_input_error" if block_size < 1L or block_size > draws_per_chain. The error payload includes data = list(block_size = ..., draws_per_chain = ...).
  • Assumes chain-first row ordering: chain 1 iterations 1…D, then chain 2 iterations 1…D, etc.
  • Because $B = \lceil D/L \rceil$, the raw concatenated block indices may exceed $D$; the truncation chain_idx[seq_len(draws_per_chain)] handles this, meaning the last block may be partially represented.
  • Uses sample.int with replace = TRUE, so the same block start can appear multiple times.

preflight_info_ratio_t_from_fit(fit, component, n_coords, n_chains, theta_anchor = NULL, dim_W = NULL, d = NULL, tau_cp = 5, tau_ncp = 2)

Purpose Extracts the effective coefficients and per-draw reference scales from a cmdstanr pre-flight fit (Path B′) and delegates to preflight_info_ratio_t to compute the info-ratio z-statistics. Handles two components: "a" (scalar/vector coefficients) and "W" (basis-function coefficients requiring reference-point differencing).

Arguments

Argument Type Meaning
fit cmdstanr fit object The NCP pre-flight model fit.
component character Either "a" or "W".
n_coords integer Number of free coordinates $J_{a,\text{free}}$; used only when component == "a". Ignored for "W".
n_chains integer Number of chains in the underlying sampler.
theta_anchor numeric scalar Anchor value $\theta_{\text{anchor}}$; required when component == "W".
dim_W integer Basis dimension $K$; required when component == "W".
d integer Output dimension of $W$; required when component == "W".
tau_cp numeric scalar (default 5) CP threshold, forwarded to preflight_info_ratio_t.
tau_ncp numeric scalar (default 2) NCP threshold, forwarded to preflight_info_ratio_t.

Mathematics

Component "a":

$$\text{eff}[t, j] = a_{\text{coef}}[t, j], \qquad \text{ref}[t] = \sigma_a[t, 1]$$

for $j = 1, \dots, J_{a,\text{free}}$ and draws $t = 1, \dots, n_{\text{draws}}$.

Component "W":

Let $\theta_{\text{ref}}[t]$ be the reference location at draw $t$, $\sigma_W[t]$ the scale, and $W_{\text{raw}}[t, k, j]$ the raw coefficient for basis $k$ and coordinate $j$. Define the basis difference:

$$\Delta_k[t] = \theta_{\text{ref}}[t]^k - \theta_{\text{anchor}}^k$$

The effective coefficient at coordinate $j$ is:

$$\text{eff}[t, j] = \sum_{k=1}^{K} \Delta_k[t] ; W_{\text{raw}}[t, k, j] ; \sigma_W[t]$$

The per-draw reference scale (conditional prior SD of $\text{eff}[t, j]$ given hyperparameters at draw $t$) is:

$$\text{ref}[t] = \sigma_W[t] \sqrt{\sum_{k=1}^{K} \Delta_k[t]^2}$$

These are then passed to preflight_info_ratio_t.

Returns A list with elements t_cp and t_ncp (the z-statistics from preflight_info_ratio_t). Returns list(t_cp = NA_real_, t_ncp = NA_real_) on any of the following failure conditions:

  • component == "a": draws_coef or draws_sigma is NULL or has zero columns; any sigma_vec is NA or $\le 0$.
  • component == "W": theta_anchor, dim_W, or d is NULL; any of draws_theta, draws_sigma, draws_W is NULL or has zero columns; any theta_vec or sigma_vec is NA; any sigma_vec $\le 0$; ncol(draws_W) != dim_W * d; any basis_norm is non-finite or $\le 0$.
  • component is neither "a" nor "W": falls through to the final list(t_cp = NA_real_, t_ncp = NA_real_).

Notes

  • All fit$draws() calls are wrapped in tryCatch; extraction errors yield NULL, which triggers the NA-list return.
  • For "W", the W_raw draws matrix is reshaped into a 3-D array of dimensions c(n_draws, dim_W, d) via array(). The column ordering of the draws matrix is assumed to be dim_W * d with the first index (basis $k$) varying fastest, consistent with Stan's column-major flattening of W_raw[k, j].
  • The vapply for basis_diff computes theta_vec^k - theta_anchor^k for each $k$; if the result is not a matrix (e.g., dim_W == 1), it is explicitly reshaped to n_draws rows.
  • tau_cp and tau_ncp are forwarded unchanged to preflight_info_ratio_t.

preflight_info_ratio_t(effective_coef, reference_scale_per_draw, n_chains, tau_cp = 5, tau_ncp = 2, alpha = 0.025, n_boot = 200L, block_size = 10L)

Purpose Core engine of the Path B′ info-ratio filter. Computes the mean log information ratio across coordinates on the full data, estimates its standard error via a chain-aware block bootstrap, and returns two z-statistics testing against $\log \tau_{cp}$ (upper) and $\log \tau_{ncp}$ (lower). The caller compares these z-statistics to qnorm(1 - alpha) to decide parametrization.

Arguments

Argument Type Meaning
effective_coef numeric matrix (n_draws × n_coords) Per-draw effective coefficients. Coerced to matrix if not already.
reference_scale_per_draw numeric vector (length n_draws) Per-draw reference scale; must be strictly positive.
n_chains integer Number of chains; n_draws must be divisible by this.
tau_cp numeric scalar (default 5) CP information-ratio threshold.
tau_ncp numeric scalar (default 2) NCP information-ratio threshold.
alpha numeric scalar (default 0.025) Retained for documentation only; not used in computation. The caller is responsible for comparing the returned z-statistics to qnorm(1 - alpha).
n_boot integer (default 200L) Number of bootstrap replicates. Must be $\ge 2$.
block_size integer (default 10L) Block length for the chain-aware bootstrap. Clamped to min(block_size, n_draws %/% n_chains).

Mathematics

Full-data statistic. For each coordinate $j$:

$$\ell_j = \log!\left(\frac{\bar{r}}{\mathrm{sd}(\text{eff}_j)}\right), \qquad \bar{r} = \frac{1}{n}\sum_{t=1}^{n} \text{ref}[t]$$

The point estimate is the cross-coordinate mean:

$$m = \frac{1}{J}\sum_{j=1}^{J} \ell_j$$

Bootstrap standard error. For each replicate $b = 1, \dots, n_{\text{boot}}$:

  1. Draw indices $\mathcal{I}_b$ via block_bootstrap_indices(n_draws, n_chains, block_size).
  2. Compute $\ell_{j,b} = \log(\bar{r}_b / \mathrm{sd}(\text{eff}_{b,j}))$ where $\bar{r}_b$ and $\mathrm{sd}(\text{eff}_{b,j})$ are computed on the resampled rows.
  3. $m_b = \mathrm{mean}_j(\ell_{j,b})$.

Replicates with non-finite $m_b$ are discarded. The standard error is:

$$\text{SE} = \mathrm{sd}({m_b : m_b \text{ finite}})$$

Z-statistics:

$$t_{cp} = \frac{m - \log \tau_{cp}}{\text{SE}}, \qquad t_{ncp} = \frac{m - \log \tau_{ncp}}{\text{SE}}$$

Returns A list list(t_cp = ..., t_ncp = ...). Returns list(t_cp = NA_real_, t_ncp = NA_real_) under any of the following conditions:

  • effective_coef has fewer than 1 row or 1 column after coercion.
  • length(reference_scale_per_draw) != n_draws.
  • Any element of effective_coef is NA.
  • Any element of reference_scale_per_draw is NA or $\le 0$.
  • Any coordinate's full-data sd is NA or $\le 0$.
  • full_mean_ref is non-finite or $\le 0$.
  • m (the mean log info ratio) is non-finite.
  • n_chains < 1L or n_draws %% n_chains != 0L.
  • block_size after clamping is < 1L.
  • n_boot < 2L.
  • Fewer than 2 finite bootstrap means remain after filtering.
  • se (sd of bootstrap means) is non-finite or $\le 0$.

Notes

  • block_size is silently clamped: block_size <- min(as.integer(block_size), n_draws %/% n_chains). If this yields a value < 1L, the function returns NA.
  • Within the bootstrap loop, individual replicates that produce NA or non-positive sd_b or non-finite/non-positive mean_ref_b are recorded as NA_real_ and subsequently filtered out by boot_means[is.finite(boot_means)].
  • The alpha parameter has no effect on the returned values; it exists for API symmetry and documentation. The caller must explicitly compare t_cp and t_ncp to qnorm(1 - alpha).
  • The names t_cp and t_ncp are kept for symmetry with the Path B implementation; conceptually they are z-statistics under the asymptotic regime of large n_boot.
  • No side effects; no S3 dispatch.

resolve_parametrization(parametrization, parametrization_a, parametrization_W, prior, stan_data, amm, preflight_seed, verbose)

Purpose

Central dispatcher that reconciles three layers of parametrization specification — a single global parametrization switch, optional per-parameter overrides (parametrization_a, parametrization_W), and an automatic preflight sampler — into two concrete Boolean flags (cp_a, cp_W) plus a metadata record explaining how each flag was obtained. It is the bridge between user-facing parametrization arguments and the downstream Stan model configuration, deciding whether the centered (cp) or non-centered (ncp) parametrization is used for the a and W parameter blocks.

Arguments

  • parametrizationcharacter(1). Global switch; recognized values are "ncp" (non-centered), "cp" (centered), and "auto" (defer to preflight). Any other value causes switch() to return NULL, which propagates as an edge case (see Notes).
  • parametrization_acharacter(1) or NULL. Per-parameter override for the a block. When non-NULL, equality to "cp" is tested; any other string is treated as non-cp.
  • parametrization_Wcharacter(1) or NULL. Per-parameter override for the W block, interpreted analogously to parametrization_a.
  • prior — passed through to preflight_parametrization() when preflight is required. Type/meaning determined by that function.
  • stan_data — passed through to preflight_parametrization() when preflight is required.
  • amm — passed through to preflight_parametrization() when preflight is required.
  • preflight_seed — passed through as the preflight_seed argument of preflight_parametrization() when preflight is required.
  • verbose — passed through as the verbose argument of preflight_parametrization() when preflight is required.

Mathematics

The function implements a three-tier precedence resolution. Define the user-level indicator

$$ \text{cp}_a^{\text{user}} = \begin{cases} \mathbb{1}!\left[\text{parametrization_a} = \text{"cp"}\right] & \text{parametrization_a} \neq \text{NULL},\\ \text{NA} & \text{otherwise}, \end{cases} $$

and analogously $\text{cp}_W^{\text{user}}$. The global indicator is shared by both blocks:

$$ \text{cp}_a^{\text{global}} = \text{cp}_W^{\text{global}} = \begin{cases} \text{FALSE} & \text{parametrization} = \text{"ncp"},\\ \text{TRUE} & \text{parametrization} = \text{"cp"},\\ \text{NA} & \text{parametrization} = \text{"auto"}. \end{cases} $$

Preflight is invoked iff at least one block is unresolved at both tiers:

$$ \text{needs_pf} = \bigl(\text{isNA}(\text{cp}_a^{\text{user}}) \land \text{isNA}(\text{cp}_a^{\text{global}})\bigr) ;\lor; \bigl(\text{isNA}(\text{cp}_W^{\text{user}}) \land \text{isNA}(\text{cp}_W^{\text{global}})\bigr). $$

The final flag for each block follows the precedence user > preflight > global:

$$ \text{cp}_a = \begin{cases} \text{cp}_a^{\text{user}} & \text{if } \text{cp}_a^{\text{user}} \neq \text{NA},\\ \text{pf}$\text{cp}_a & \text{if preflight ran and } \text{cp}_a^{\text{user}} = \text{NA},\\ \text{cp}_a^{\text{global}} & \text{if preflight did not run and } \text{cp}_a^{\text{user}} = \text{NA}, \end{cases} $$

with $\text{cp}_W$ defined symmetrically. The decision-reason label is

$$ \text{reason}_a = \begin{cases} \text{"user_explicit_a"} & \text{cp}_a^{\text{user}} \neq \text{NA},\\ \text{"user_global"} & \text{preflight did not run and } \text{cp}_a^{\text{user}} = \text{NA},\\ \text{(from preflight meta)} & \text{preflight ran and } \text{cp}_a^{\text{user}} = \text{NA}, \end{cases} $$

again symmetric in $W$.

Returns

A named list with three elements:

  • cp_alogical(1). Final centered-parametrization flag for the a block, coerced via as.logical().
  • cp_Wlogical(1). Final centered-parametrization flag for the W block, coerced via as.logical().
  • metalist. Diagnostics/audit record. Two mutually exclusive shapes:
    • Preflight path (needs_pf_a || needs_pf_W): meta is pf$meta returned by preflight_parametrization(), with meta$decision_reason_a overwritten to "user_explicit_a" when cp_a_user is not NA, and meta$decision_reason_W overwritten to "user_explicit_W" when cp_W_user is not NA. The non-overwritten reason retains whatever preflight_parametrization() set.
    • No-preflight path: meta is a freshly constructed list with:
      • used_preflight = FALSE
      • n_divergent = NA_integer_
      • div_pct = NA_real_
      • ebfmi_min = NA_real_
      • t_attribution_a = NA_real_
      • t_attribution_W = NA_real_
      • t_info_cp_a = NA_real_
      • t_info_ncp_a = NA_real_
      • t_info_cp_W = NA_real_
      • t_info_ncp_W = NA_real_
      • decision_reason_a{"user_explicit_a", "user_global"}
      • decision_reason_W{"user_explicit_W", "user_global"}

Notes

  • The global flag is shared: cp_W_global <- cp_a_global, so parametrization = "cp"/"ncp" forces both blocks identically unless a per-parameter override intervenes.
  • parametrization_a == "cp" (and the W analogue) is a strict string equality test; supplying e.g. "CP", "centered", or any non-"cp" value yields FALSE, i.e. is silently interpreted as non-centered. No validation or warning is emitted.
  • If parametrization is not one of "ncp", "cp", "auto", switch() returns NULL, so cp_a_global and cp_W_global become NULL. Subsequent is.na(cp_a_global) then returns a zero-length logical, which alters the scalar && short-circuit behavior in needs_pf_a/needs_pf_W; this is an unvalidated input path and the resulting control flow is not explicitly guarded.
  • When preflight runs but only one block was "auto"-unresolved (the other being user-explicit), preflight is still executed (because the condition is ||), and pf$cp_a/pf$cp_W may both be consulted; however, the user-explicit block's flag is taken from the user value, and only its decision_reason_* is overwritten — the underlying preflight diagnostic fields in meta reflect the full preflight run regardless.
  • as.logical() is applied to the final cp_a/cp_W. For the no-preflight path these are already logical scalars; for the preflight path they are whatever preflight_parametrization() returned (expected logical), coerced to logical.
  • No errors are raised directly by this function; any error propagates from preflight_parametrization().
  • No S3 dispatch, no I/O side effects beyond those internal to preflight_parametrization() (e.g. sampling, console output when verbose).
  • The function is not exported (no @export visible); it is an internal resolver invoked during model setup.

R/prior_spec.R

.gdpar_canonical_prior_for_kind(kind) -- the real signature.

Purpose
Returns the canonical default prior expression (in Stan syntax) for a given parameter kind used in the multi-parametric extension of the gdpar package. This internal function is used by generate_stan_code*() when a param_spec carries a prior_canonical_kind that has not been overridden by the user via the priors_by_kind argument of gdpar_prior.

Arguments

  • kind
    Type: Character scalar.
    Meaning: Identifies the parameter kind. Must be one of the known canonical kinds: "mu", "log_sigma", "log_phi", "logit_p", "log_shape", "log_nu", "logit_pi", or "power_p".

Mathematics
No formula; a simple switch mapping.

Returns
A character scalar containing the Stan-syntax prior expression for the given kind. If an unrecognized kind is provided, the function calls gdpar_abort with an internal error.

Notes

  • This function is internal (not exported) and is used during Stan code generation.
  • The canonical priors are weakly informative defaults calibrated for covariates standardized to unit variance and outcomes on the natural scale of the family.
  • An internal error is raised if kind is not one of the recognized kinds.

.gdpar_canonical_prior_variance(kind) -- the real signature.

Purpose
Returns the variance of the canonical default prior for a given parameter kind. This internal function is used in the post-fit information ratio diagnostic (D-B1) introduced in sub-phase 8.3.4 of Block 8. The contraction metric compares each K-individual slot's posterior variance against the canonical prior variance returned here.

Arguments

  • kind
    Type: Character scalar.
    Meaning: Identifies the parameter kind. Must be one of the known canonical kinds.

Mathematics
For each kind, the variance is derived from the canonical prior returned by .gdpar_canonical_prior_for_kind:

  • For mu, logit_p, logit_pi: prior is normal(0, 2.5), variance = $2.5^2 = 6.25$.
  • For log_sigma, log_phi, log_shape: prior is normal(0, 1), variance = $1$.
  • For log_nu: prior is normal(log(10), 1), variance = $1$ (variance of normal depends only on scale).
  • For power_p: prior is uniform(1.01, 1.99), variance = $(1.99 - 1.01)^2 / 12$.

Returns
A numeric scalar with the canonical prior variance. If kind is unrecognized, returns NA_real_.

Notes

  • This function is internal (not exported).
  • The variance for power_p is computed as $(b-a)^2/12$ for a uniform distribution on $[a, b]$.

.gdpar_known_canonical_kinds() -- the real signature.

Purpose
Returns the character vector of all parameter kinds for which a canonical prior is registered. This internal function is used by the validator of priors_by_kind in gdpar_prior.

Arguments
None.

Mathematics
None.

Returns
A character vector containing the eight known canonical kinds: "mu", "log_sigma", "log_phi", "logit_p", "log_shape", "log_nu", "logit_pi", "power_p".

Notes

  • This function is internal (not exported) and is used for input validation.

gdpar_prior(theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, phi, priors_by_kind) -- the real signature.

Purpose
Builds a prior specification object of class gdpar_prior that is consumed by gdpar when path = "bayes". It allows users to specify priors for the population-level and hierarchical parameters of the AMM (Asymmetric Multinomial Model) hierarchical Bayesian model. Defaults are weakly informative on the linear-predictor scale.

Arguments

  • theta_ref
    Type: Character scalar (default "normal(0, 2.5)").
    Meaning: Prior on the population reference theta_ref (or its hyperparameter mu_theta_ref when grouping is active) on the linear-predictor scale, in Stan syntax.
  • sigma_theta_ref
    Type: Character scalar (default "student_t(3, 0, 1)").
    Meaning: Prior on the hierarchical scale of theta_ref[g] across groups (positive-truncated in Stan). Used only when grouping is active.
  • sigma_a
    Type: Character scalar (default "student_t(3, 0, 1)").
    Meaning: Prior on the hierarchical scale of the additive component coefficients (positive-truncated in Stan).
  • sigma_b
    Type: Character scalar (default "student_t(3, 0, 1)").
    Meaning: Prior on the hierarchical scale of the multiplicative contribution to the linear predictor (positive-truncated in Stan). Internally, the model samples c_b = theta_ref * b_coef; this prior applies to c_b.
  • sigma_W
    Type: Character scalar (default "student_t(3, 0, 1)").
    Meaning: Prior on the hierarchical scale of the modulating component coefficients (positive-truncated in Stan).
  • sigma_y
    Type: Character scalar (default "student_t(3, 0, 2.5)").
    Meaning: Prior on the residual standard deviation for Gaussian families (positive-truncated in Stan).
  • phi
    Type: Character scalar (default "gamma(2, 0.1)").
    Meaning: Prior on the negative-binomial dispersion parameter phi (Stan parametrization: variance = $\mu + \mu^2 / \phi$).
  • priors_by_kind
    Type: Optional named list of character scalars (default NULL).
    Meaning: Overrides for canonical priors by parameter kind (used when K > 1 in the multi-parametric extension). Each element must be a non-empty character scalar in Stan syntax. Recognized kinds are those from .gdpar_known_canonical_kinds().

Mathematics
No explicit formulas; the function constructs a list of prior strings.

Returns
An object of class gdpar_prior (also inheriting from list) containing the seven legacy character snippets and the priors_by_kind override list.

Notes

  • The function validates that each of the seven legacy arguments is a non-empty character scalar. If not, it calls gdpar_abort with a gdpar_input_error.
  • If priors_by_kind is NULL, it is set to an empty list. If provided, it must be a named list with no empty names; each name must be a recognized canonical kind (as returned by .gdpar_known_canonical_kinds). Each element must be a non-empty character scalar. Otherwise, gdpar_abort is called.
  • The class gdpar_prior is used by downstream functions (e.g., gdpar, code generators) to inject prior specifications into the Stan model.
  • The defaults are calibrated for standardized covariates and outcomes on the natural scale of the family. Scale parameters are declared positive in Stan; the prior strings are interpreted accordingly.
  • The object is intended for use with gdpar(path = "bayes").

.gdpar_resolve_effective_prior_for_kind(prior, kind) -- the real signature.

Purpose
Returns the effective prior expression for a given parameter kind by first checking for a user-supplied override in prior$priors_by_kind and, if absent, falling back to the canonical default from .gdpar_canonical_prior_for_kind. This internal function is used by the code generator and tests of the canonical-kind override mechanism.

Arguments

  • prior
    Type: A gdpar_prior object.
    Meaning: The prior specification object (created by gdpar_prior).
  • kind
    Type: Character scalar.
    Meaning: The parameter kind to resolve.

Mathematics
None; a lookup operation.

Returns
A character scalar containing the Stan-syntax prior expression for the given kind. If the kind is not found in prior$priors_by_kind, the canonical default is returned.

Notes

  • This function is internal (not exported).
  • It assumes kind is one of the known canonical kinds; otherwise, the canonical default lookup may fail with an internal error (via .gdpar_canonical_prior_for_kind).
  • Used during Stan code generation to insert the appropriate prior for each parameter kind in the model.

.gdpar_prior_for_kind(prior, kind)

Purpose Retrieves the specific prior specification for a given component kind from a gdpar_prior object. It first checks for an override in the object's priors_by_kind list. If no override exists, it falls back to the canonical (default) prior for that kind defined elsewhere in the package. Arguments

  • prior: An object of class gdpar_prior (or a list-like object containing an element named priors_by_kind).
  • kind: A character string identifying the model component (e.g., parameter group) for which to retrieve the prior.

Returns The prior specification (the exact structure depends on the canonical/override implementation) for the requested kind. If an override exists in prior$priors_by_kind[[kind]], it is returned; otherwise, the result of .gdpar_canonical_prior_for_kind(kind) is returned.

Notes

  • This is an internal function (note the leading .). It is a utility used to standardize prior lookups, allowing user-specified overrides via the priors_by_kind slot.
  • The fallback function .gdpar_canonical_prior_for_kind() is not defined in this section and must be provided by the package infrastructure.

print.gdpar_prior(x, ...)

Purpose An S3 print method for objects of class gdpar_prior. It formats and prints a structured summary of the prior object to the console. Arguments

  • x: An object of class gdpar_prior.
  • ...: Additional arguments passed to the generic; ignored in this method.

Returns Invisibly returns the input object x.

Notes

  • This is an exported function (due to the @export tag), providing the standard print() interface for gdpar_prior objects.
  • The output lists the values of the following top-level prior parameters, if they exist in x: theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, and phi.
  • If x contains a non-null, non-empty priors_by_kind list, each element (key-value pair) in that list is printed indented under a "priors_by_kind" header. The keys are the component names (the kind argument in other functions), and the values are their corresponding prior specifications.
  • The function uses cat() for direct console output and format(..., width = ...) for alignment.

R/residuals.R

Internal Helpers (residuals.R, section 1 of 2)

.gdpar_fit_path_class(object)

Purpose

Classifies a fitted gdpar_fit object into one of three dispatch "paths" so that downstream residual helpers can select the correct data-extraction and array-shaping logic. The three paths mirror the three Stan data-assembly routines in gdpar.R (.assemble_stan_data, .assemble_stan_data_multi, .assemble_stan_data_K) and the three Stan templates.

Arguments

Argument Type Meaning
object list (gdpar_fit) Fitted model object; must contain scalar integer elements K and p.

Returns

A length-1 character string:

  • "K_individual" — when object[["K"]] is non-NULL and strictly greater than 1L.
  • "multivariate" — when K is NULL or ≤ 1 and object[["p"]] is non-NULL and strictly greater than 1L.
  • "scalar" — otherwise (the fall-through default).

Notes

K is checked first and takes precedence over p. If either element is NULL or not greater than 1L, the corresponding branch is skipped. No validation of object structure is performed beyond the is.null / > 1L tests; a malformed object silently falls through to "scalar".


.gdpar_integer_family_stan_ids()

Purpose

Returns the set of stan_id integer codes that identify integer-valued (count or binary) response families. Used by .gdpar_get_y_obs() to decide whether to read y_int or y_real from stan_data.

Arguments

None.

Returns

An integer vector of length 7: c(2L, 3L, 4L, 10L, 11L, 12L, 13L). Per the roxygen documentation these correspond to Poisson, negative-binomial-2, Bernoulli, and the ZIP / ZINB / hurdle-Poisson / hurdle-NB mixture families.

Notes

The mapping from stan_id to family name is not validated inside this helper; callers are expected to use the returned vector for %in% membership tests only.


.gdpar_get_y_obs(object)

Purpose

Extracts the observed response vector (or matrix) from a gdpar_fit object's stan_data component, choosing between y_real and y_int according to the family's stan_id.

Arguments

Argument Type Meaning
object list (gdpar_fit) Fitted model object containing $stan_data, $family, and (for path classification) $K / $p.

Returns

  • Multivariate path (p > 1): a numeric matrix of shape $n \times p$, taken from stan_data$y_real if it is a non-empty matrix, otherwise from stan_data$y_int.
  • Scalar / K-individual paths: a numeric vector of length $n$. If family$stan_id is in .gdpar_integer_family_stan_ids(), as.numeric(sd$y_int) is returned; otherwise as.numeric(sd$y_real).

Notes

  • For the multivariate path, y_real is preferred over y_int; if neither is a non-NULL matrix with at least one row, the function calls gdpar_abort() with class = "gdpar_internal_error" and message "Internal error: multivariate fit lacks both y_real and y_int in stan_data.".
  • For scalar / K-individual paths, if object$family$stan_id is NULL, gdpar_abort() is called with class = "gdpar_internal_error" and message "Internal error: family lacks stan_id; cannot determine y_real vs y_int.".
  • The stan_id extraction uses an identical if/else branch for both "K_individual" and "scalar" paths (object$family$stan_id in both cases), so the code is functionally uniform across those two paths despite the explicit branch.

.gdpar_get_y_pred_draws(object)

Purpose

Extracts posterior draws of the Stan parameter y_pred from the CmdStanFit / stanfit object stored in object$fit, reshaping the flat draws matrix into the path-appropriate array layout.

Arguments

Argument Type Meaning
object list (gdpar_fit) Fitted model object; object$fit must support $draws(variables = "y_pred", format = "draws_matrix").

Mathematics

For the multivariate path, variable names follow the pattern y_pred[i,k] with $i \in \{1,\dots,n\}$, $k \in \{1,\dots,p\}$. The flat $S \times (n\cdot p)$ matrix is reshaped into a 3-D array $A \in \mathbb{R}^{S \times n \times p}$ where

$$A[s, i, k] = \texttt{mat}[s,, \text{col}(i,k)]$$

and $\text{col}(i,k)$ is the column index whose variable name encodes indices $(i,k)$.

For the scalar / K-individual paths, variable names follow y_pred[i] with $i \in \{1,\dots,n\}$, and the output is an $S \times n$ matrix $M$ with $M[s, i] = \texttt{mat}[s, \text{col}(i)]$.

Returns

  • Multivariate path: numeric array of shape $S \times n \times p$.
  • Scalar / K-individual paths: numeric matrix of shape $S \times n$.

In both cases $S$ is the number of posterior draws. Entries are initialised to NA_real_ and then filled column-by-column.

Notes

  • Calls require_suggested("posterior", "extract posterior y_pred draws") before extraction; if the posterior package is unavailable this will error.
  • Variable-name parsing uses regexec with patterns ^y_pred\[(\d+),(\d+)\]$ (multivariate) or ^y_pred\[(\d+)\]$ (scalar / K-individual). If any parsed index is NA, gdpar_abort() is called with class = "gdpar_internal_error" and a message indicating which path failed.
  • The dimensions $n$ and $p$ (multivariate) or $n$ (scalar) are inferred as max(idx_i) / max(idx_k) from the parsed variable names, not from object$stan_data.
  • unclass(draws) is used to strip the draws_matrix class before column indexing.

.gdpar_family_name_for_residuals(object, coord = NULL)

Purpose

Returns a single family-name string suitable for dispatching family-specific residual computations (deviance, quantile). For the multivariate path the family is per-coordinate; for the other paths it is the single location family.

Arguments

Argument Type Meaning
object list (gdpar_fit) Fitted model object.
coord NULL or integer Coordinate index selecting a family from object$family$families for the multivariate path. Ignored for scalar / K-individual paths.

Returns

A length-1 character string:

  • Multivariate path: object$family$families[[coord]]$name.
  • Scalar / K-individual paths: object$family$name.

Notes

If the path is "multivariate" and coord is NULL, gdpar_abort() is called with class = "gdpar_internal_error" and message "Internal error: multivariate path requires explicit coord index.". No bounds checking on coord is performed; an out-of-range index will propagate a standard subscript out of bounds error from [[.


.gdpar_family_is_discrete(name)

Purpose

Tests whether a family name corresponds to a discrete (count or binary) distribution, used to switch on uniform jittering in Dunn–Smyth randomized quantile residuals.

Arguments

Argument Type Meaning
name character Family name string (as returned by .gdpar_family_name_for_residuals()).

Returns

TRUE if name is one of "poisson", "neg_binomial_2", "bernoulli", "zip", "zinb", "hurdle_poisson", "hurdle_neg_binomial_2"; otherwise FALSE.

Notes

Vectorised via %in%, so a character vector of length > 1 returns a logical vector of the same length, although callers typically pass a scalar.


.gdpar_deviance_residual(y, mu, family_name, dispersion = NULL)

Purpose

Computes a single family-specific deviance residual contribution for each $(y_i, \mu_i)$ pair, signed by $\mathrm{sign}(y_i - \mu_i)$. Used as the building block for G1 deviance residuals.

Arguments

Argument Type Meaning
y numeric vector Observed responses.
mu numeric vector Predicted means (same length as y).
family_name character (length 1) Family name string.
dispersion NULL or numeric Dispersion / scale parameter ($\sigma$ for Gaussian / Student-$t$, $\phi$ for NB2). Defaults to 1 when NULL.

Mathematics

Let $\varepsilon =$ .Machine$double.eps$^{1/3}$ and $\mu_i^{\star} = \max(\mu_i, \varepsilon)$. The returned value is $r_i = \mathrm{sign}(y_i - \mu_i)\, g_i$ where $g_i$ depends on family_name:

Gaussian ($\sigma = \max(\texttt{dispersion}, \varepsilon)$, default $1$):

$$g_i = \frac{|y_i - \mu_i|}{\sigma}$$

Poisson / ZIP / hurdle-Poisson (with $\mathbf{1}_{y_i&gt;0}$ indicator):

$$d_i = 2\Big[\mathbf{1}_{y_i>0}, y_i \log\frac{y_i}{\mu_i^{\star}} - (y_i - \mu_i)\Big], \qquad g_i = \sqrt{\max(d_i, 0)}$$

Negative-binomial-2 / ZINB / hurdle-NB2 ($\phi = \max(\texttt{dispersion}, \varepsilon)$, default $1$):

$$d_i = 2\left[\mathbf{1}_{y_i>0}, y_i \log\frac{y_i}{\mu_i^{\star}} + (y_i + \phi)\log\frac{\mu_i^{\star} + \phi}{y_i + \phi}\right], \qquad g_i = \sqrt{\max(d_i, 0)}$$

Bernoulli ($\tilde\mu_i = \min(\max(\mu_i^{\star}, \varepsilon), 1-\varepsilon)$):

$$d_i = -2\big[y_i \log \tilde\mu_i + (1 - y_i)\log(1 - \tilde\mu_i)\big], \qquad g_i = \sqrt{\max(d_i, 0)}$$

Beta ($\tilde y_i = \min(\max(y_i, \varepsilon), 1-\varepsilon)$, $\tilde\mu_i$ as above):

$$d_i = 2\left[\tilde y_i \log\frac{\tilde y_i}{\tilde\mu_i} + (1 - \tilde y_i)\log\frac{1 - \tilde y_i}{1 - \tilde\mu_i}\right], \qquad g_i = \sqrt{\max(d_i, 0)}$$

Gamma ($y_i^{\star} = \max(y_i, \varepsilon)$):

$$d_i = 2\left[-\log\frac{y_i^{\star}}{\mu_i^{\star}} + \frac{y_i^{\star} - \mu_i^{\star}}{\mu_i^{\star}}\right], \qquad g_i = \sqrt{\max(d_i, 0)}$$

Lognormal (loc-scale):

$$g_i = \big|\log y_i^{\star} - \log \mu_i^{\star}\big|$$

Student-$t$ ($\sigma$ as for Gaussian):

$$g_i = \frac{|y_i - \mu_i|}{\sigma}$$

Tweedie and unrecognised families (fallback):

$$g_i = \sqrt{\max(2,|y_i - \mu_i|,; 0)}$$

Returns

A numeric vector of the same length as y containing the signed deviance residuals.

Notes

  • The NB2 deviance is described in the roxygen as "approximated by Poisson deviance with shape inflation factor, conservative"; the implemented formula is the standard NB2 deviance with $\phi$ as the reciprocal dispersion / shape parameter.
  • Mixtures and hurdle families that are not explicitly listed (e.g. zip, zinb, hurdle_poisson, hurdle_neg_binomial_2) are handled by the Poisson or NB2 branches respectively, not by the fallback.
  • The fallback branch (Tweedie and any unrecognised family_name) uses a Pearson-like surrogate $r_i = \mathrm{sign}(y_i-\mu_i)\sqrt{2|y_i-\mu_i|}$, matching the roxygen description.
  • eps is computed once as .Machine$double.eps^(1/3).
  • All vectorised operations rely on ifelse / pmax / pmin, so inputs of length > 1 are handled element-wise.

.gdpar_quantile_residuals_bayesian(y_obs, y_pred_mat, family_name, randomize_seed = NULL)

Purpose

Computes Bayesian randomized quantile residuals (Dunn & Smyth, 1996) by averaging the conditional CDF over posterior draws of $y_{\text{pred}}$, using the empirical ECDF of the posterior predictive draws as a nonparametric estimator of the family-conditional CDF.

Arguments

Argument Type Meaning
y_obs numeric vector Observed responses, length $n$.
y_pred_mat numeric matrix Posterior predictive draws of shape $S \times n$ (rows = draws, columns = observations).
family_name character (length 1) Family name, used to determine discreteness via .gdpar_family_is_discrete().
randomize_seed NULL or integer If non-NULL, the global RNG state is saved, set.seed(randomize_seed) is called, and the original state is restored on exit.

Mathematics

For each observation $i = 1, \dots, n$, let $\{y_{\text{pred},i}^{(s)}\}_{s=1}^{S}$ be the $S$ posterior predictive draws and let $\hat F_i$ denote their empirical CDF.

Discrete families (Dunn–Smyth randomization): draw $v_i \sim \mathrm{Uniform}(0,1)$ and compute

$$u_i = \hat F_i(y_i^-) + v_i\big[\hat F_i(y_i) - \hat F_i(y_i^-)\big]$$

where $\hat F_i(y_i^-) = \frac{1}{S}\sum_{s=1}^{S}\mathbf{1}\{y_{\text{pred},i}^{(s)} &lt; y_i\}$ and $\hat F_i(y_i) = \frac{1}{S}\sum_{s=1}^{S}\mathbf{1}\{y_{\text{pred},i}^{(s)} \le y_i\}$.

Continuous families:

$$u_i = \hat F_i(y_i) = \frac{1}{S}\sum_{s=1}^{S}\mathbf{1}{y_{\text{pred},i}^{(s)} \le y_i}$$

Then $u_i$ is clamped to $[\varepsilon,\, 1-\varepsilon]$ with $\varepsilon =$ .Machine$double.eps$^{1/3}$ and mapped to the normal scale:

$$r_i = \Phi^{-1}(u_i)$$

Returns

A numeric vector of length $n$ containing the quantile residuals on the standard-normal scale.

Notes

  • Seed handling: when randomize_seed is non-NULL, the function checks for .Random.seed in globalenv(). If it exists, it is captured into old_seed and restored via on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE). Then set.seed(randomize_seed) is called. If .Random.seed does not exist in globalenv(), no restoration is registered (the on.exit is inside the if (exists(...)) block), but set.seed is still called.
  • For discrete families, exactly one uniform variate $v_i$ is drawn per observation via stats::runif(1L), so the randomization is per-observation independent.
  • The loop over seq_len(n) is not vectorised for the CDF computation; each observation's yp <- y_pred_mat[, i] column is extracted and compared to y_obs[i].
  • Clamping with eps prevents qnorm from returning ±Inf when $u_i$ is exactly 0 or 1.
  • No input validation is performed on the dimensions of y_pred_mat; if nrow(y_pred_mat) or ncol(y_pred_mat) is inconsistent with length(y_obs), indexing will produce NA or recycling behaviour.

.gdpar_has_dharma()

Purpose

Detects whether the DHARMa package is installed and available, implementing the "Suggests with detect-and-use" pattern (decision E1.A of the 8.3.9 scoping).

Arguments

None.

Returns

TRUE if requireNamespace("DHARMa", quietly = TRUE) succeeds; FALSE otherwise.

Notes

The call is wrapped in requireNamespace with quietly = TRUE, so no warning or message is emitted on failure. The function is a thin predicate intended for use in if (.gdpar_has_dharma()) { ... } guards.

.gdpar_build_dharma_object(object, coord = NULL)

Purpose

Internal helper that constructs a DHARMa simulation object from a fitted gdpar_fit model by forwarding the posterior predictive draws of y_pred and the observed response y to DHARMa::createDHARMa(). Used by the exported wrapper gdpar_dharma_object().

Arguments

  • object: An object of class gdpar_fit. The fitted model from which observed responses and posterior predictive draws are extracted.
  • coord: NULL or an integer scalar. For multivariate fits (p > 1), selects the coordinate (column) for which the DHARMa object is built; required (non-NULL) in that case. Ignored for scalar / K-individual paths.

Mathematics

For each observation $i$, the posterior predictive draws form a matrix $Y_{\text{pred}} \in \mathbb{R}^{S \times n}$ (or $S \times n$ slice for coordinate coord of a multivariate fit). The fitted predicted response is the posterior mean per observation:

$$ \hat{\mu}_i = \frac{1}{S} \sum_{s=1}^{S} Y_{\text{pred}, s, i}. $$

These are passed to DHARMa::createDHARMa() with simulatedResponse = t(Y_pred) (draws as columns), observedResponse = y_obs, fittedPredictedResponse = \hat{\mu}, and integerResponse set according to whether the family is discrete.

Returns

A DHARMa::DHARMa simulation object (return value of DHARMa::createDHARMa()).

Notes

  • Raises an error of class gdpar_input_error if DHARMa is not installed (via .gdpar_has_dharma()), directing the user to residuals(., type = "quantile") as a fallback.
  • Raises an error of class gdpar_input_error if the fit path is "multivariate" and coord is NULL.
  • For multivariate fits, y_pred is indexed as y_pred[, , coord] (an $S \times n$ matrix) and y_obs as y_obs[, coord]; for other paths, y_pred and y_obs are used directly.
  • Discreteness of the family is determined by .gdpar_family_is_discrete(.gdpar_family_name_for_residuals(object, coord = coord)), which controls the integerResponse flag passed to DHARMa.
  • fitted_pred is computed via colMeans(y_pred_mat), i.e. the posterior mean per observation across draws.

residuals.gdpar_fit(object, type = c("quantile", "response", "pearson", "deviance"), coord = NULL, randomize_seed = NULL, ...)

Purpose

Exported S3 method implementing stats::residuals() for objects of class gdpar_fit. Computes posterior-predictive residuals of four types — response, Pearson, deviance, and Bayesian randomized quantile (Dunn-Smyth 1996) — from the y_pred draws emitted by the Stan templates.

Arguments

  • object: An object of class gdpar_fit (validated via assert_inherits()).
  • type: Character scalar matched against c("quantile", "response", "pearson", "deviance") via match.arg(). Default is "quantile".
  • coord: NULL or an integer scalar in $[1, p]$. Only used for multivariate fits. When NULL (default), residuals are returned as an $n \times p$ matrix; when specified, a length-$n$ vector for that coordinate is returned.
  • randomize_seed: NULL or an integer scalar. When provided, sets the RNG seed used by the randomized quantile residual under discrete families. When NULL, the global RNG state is used.
  • ...: Unused; present for S3 generic compatibility.

Mathematics

Let $y_i$ denote the observed response for observation $i$ and $Y_{\text{pred}, s, i}$ the $s$-th posterior predictive draw ($s = 1, \dots, S$). Define

$$ \hat{\mu}_i = \frac{1}{S}\sum_{s=1}^{S} Y_{\text{pred}, s, i}, \qquad \hat{\sigma}_i = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\bigl(Y_{\text{pred}, s, i} - \hat{\mu}_i\bigr)^2}. $$

  • Response: $r_i = y_i - \hat{\mu}_i$.
  • Pearson: $r_i = (y_i - \hat{\mu}_i) / \max(\hat{\sigma}_i, \varepsilon)$, where $\varepsilon =$ .Machine$double.eps^(1/3).
  • Deviance: family-specific, delegated to .gdpar_deviance_residual(y_obs, mu_hat, family_name); for mixtures (ZIP/ZINB) and Hurdle families a Pearson-like surrogate is used.
  • Quantile (Bayesian randomized quantile, Dunn and Smyth 1996): averages the nonparametric ECDF of y_pred draws at $y_i$ across posterior draws, with uniform jitter on the equality mass when the family is discrete. Under correct specification, $r_i \overset{\text{approx}}{\sim} \mathcal{N}(0,1)$.

Returns

  • Scalar / K-individual paths: a numeric vector of length $n$.
  • Multivariate path with coord = NULL: a numeric matrix of size $n \times p$ with column names paste0("dim_", seq_len(p)).
  • Multivariate path with coord = k: a numeric vector of length $n$ for coordinate $k$.

Notes

  • Dispatches on the S3 class gdpar_fit registered for stats::residuals.
  • The fit path is determined by .gdpar_fit_path_class(object).
  • For multivariate fits with non-NULL coord, the function validates coord is an integer in $[1, p]$ (where p <- object[["p"]]); otherwise raises an error of class gdpar_input_error with a formatted message.
  • For multivariate fits with coord = NULL, residuals are computed coordinate-by-coordinate in a for (k in seq_len(p)) loop, with the family name retrieved per coordinate via .gdpar_family_name_for_residuals(object, coord = k).
  • Actual residual computation is delegated to .gdpar_residuals_dispatch().
  • y_obs and y_pred are obtained from .gdpar_get_y_obs(object) and .gdpar_get_y_pred_draws(object) respectively.

.gdpar_residuals_dispatch(y_obs, y_pred_mat, type, family_name, randomize_seed)

Purpose

Internal type-dispatcher that computes residuals for a single (y_obs, y_pred_mat) pair according to type. Called by residuals.gdpar_fit() once the appropriate coordinate slice has been extracted.

Arguments

  • y_obs: Numeric vector of length $n$. Observed responses for the coordinate being processed.
  • y_pred_mat: Numeric matrix of dimensions $S \times n$. Posterior predictive draws (draws in rows, observations in columns).
  • type: Character scalar; one of "response", "pearson", "deviance", "quantile". No match.arg() is performed here (the caller has already matched).
  • family_name: Character scalar naming the response family, used for deviance and quantile residuals.
  • randomize_seed: NULL or integer scalar; passed through to .gdpar_quantile_residuals_bayesian().

Mathematics

  • "response": returns $y - \text{colMeans}(Y_{\text{pred}})$, i.e. $r_i = y_i - \hat{\mu}_i$.
  • "pearson": with $\hat{\mu}_i = \text{colMeans}(Y_{\text{pred}})_i$ and $\hat{\sigma}_i = \text{sd}(Y_{\text{pred}, \cdot, i})$ (sample standard deviation across draws), returns

$$ r_i = \frac{y_i - \hat{\mu}_i}{\max(\hat{\sigma}_i, \varepsilon)}, \qquad \varepsilon = \text{.Machine$double.eps}^{1/3}. $$

  • "deviance": returns .gdpar_deviance_residual(y_obs, mu_hat, family_name).
  • otherwise ("quantile"): returns .gdpar_quantile_residuals_bayesian(y_obs, y_pred_mat, family_name, randomize_seed).

Returns

A numeric vector of length $n$ (the residuals for the supplied pair).

Notes

  • The function uses early return() statements; the final branch (quantile) is the fall-through default when type is not "response", "pearson", or "deviance".
  • sd_hat is computed via apply(y_pred_mat, 2L, stats::sd) (per-column sample SD across draws).
  • The floor eps <- .Machine$double.eps^(1 / 3) prevents division by zero when a coordinate has zero posterior predictive variance.
  • No input validation is performed; the caller is responsible for argument checking.

gdpar_posterior_predict(object, ndraws = NULL, ...)

Purpose

Exported function that extracts the posterior predictive draws y_pred emitted by the Stan templates. Independent of the rstantools generic; exported under the name gdpar_posterior_predict() to avoid a hard dependency on rstantools. When rstantools or brms is loaded, posterior_predict() routes to this method via the S3 method registered in NAMESPACE.

Arguments

  • object: An object of class gdpar_fit (validated via assert_inherits()).
  • ndraws: NULL (default) or an integer scalar. When given, subsamples the first ndraws posterior draws (rows of the matrix / first slices of the array).
  • ...: Unused; present for S3 generic compatibility.

Mathematics

Let $S$ be the total number of posterior draws. When ndraws = NULL, all $S$ draws are returned. When ndraws = m is supplied, the first $\min(m, S)$ draws are retained:

$$ Y_{\text{pred}}^{(\text{sub})} = Y_{\text{pred}}[1{:}\min(m, S), \cdot, (\cdot)]. $$

Returns

  • Scalar / K-individual paths: a numeric matrix of dimensions $S \times n$ (or $\min(\text{ndraws}, S) \times n$ when ndraws is given).
  • Multivariate path: a numeric array of dimensions $S \times n \times p$ (or $\min(\text{ndraws}, S) \times n \times p$).

In both cases drop = FALSE is used so dimensions are preserved.

Notes

  • Raises an error of class gdpar_input_error if ndraws is not a positive integer scalar (after coercion via as.integer()): conditions are length(ndraws) != 1L, is.na(ndraws), or ndraws < 1L.
  • The underlying draws are obtained from .gdpar_get_y_pred_draws(object).
  • Subsampling is performed via seq_len(min(ndraws, n_total)) where n_total is nrow(out) for matrices or dim(out)[1L] for arrays.
  • When ndraws exceeds the available number of draws, all draws are returned silently (no error).

gdpar_dharma_object(object, coord = NULL)

Purpose

Exported wrapper that builds and returns a DHARMa::DHARMa simulation object from a fitted gdpar_fit model, enabling DHARMa diagnostic routines such as testResiduals(), testZeroInflation(), and plotResiduals().

Arguments

  • object: An object of class gdpar_fit (validated via assert_inherits()).
  • coord: NULL or an integer scalar between $1$ and $p$. Required for multivariate fits; ignored for scalar / K-individual paths.

Returns

A DHARMa simulation object (the return value of .gdpar_build_dharma_object()).

Notes

  • Thin wrapper: validates object inherits from gdpar_fit and then delegates entirely to .gdpar_build_dharma_object(object, coord = coord).
  • Requires the suggested package DHARMa; if not installed, .gdpar_build_dharma_object() raises an error of class gdpar_input_error pointing to residuals(., type = "quantile") as the built-in fallback.
  • For multivariate fits, coord must be supplied (non-NULL); otherwise .gdpar_build_dharma_object() raises an error of class gdpar_input_error.

pp_check.gdpar_fit(object, type = c("dens_overlay", "hist", "ecdf_overlay", "stat", "intervals"), coord = NULL, ndraws = 50L, ...)

Purpose

Exported S3 method implementing the bayesplot::pp_check() generic for objects of class gdpar_fit. Forwards y_pred draws and the observed response y to one of the bayesplot::ppc_* family of functions selected by type, producing a posterior predictive check plot.

Arguments

  • object: An object of class gdpar_fit (validated via assert_inherits()).
  • type: Character scalar matched against c("dens_overlay", "hist", "ecdf_overlay", "stat", "intervals") via match.arg(). Default is "dens_overlay".
  • coord: NULL or an integer scalar in $[1, p]$. Required for multivariate fits; selects the coordinate to plot.
  • ndraws: Integer scalar; subsamples the first ndraws posterior draws before plotting. Defaults to 50L.
  • ...: Additional arguments forwarded to the selected bayesplot function.

Mathematics

Let $S$ be the total number of posterior draws. The function retains the first $\min(\text{ndraws}, S)$ draws:

$$ Y_{\text{rep}}^{(\text{sub})} = Y_{\text{pred}}[1{:}\min(\text{ndraws}, S), \cdot, (\cdot)]. $$

The selected bayesplot function is then invoked as fn(y = y_obs_vec, yrep = Y_rep_sub, ...).

Returns

A ggplot object produced by the selected bayesplot ppc_* function.

Notes

  • Registered via @exportS3Method bayesplot::pp_check so that the method is available when bayesplot is loaded.
  • Calls require_suggested("bayesplot", "produce a posterior predictive check plot"); if bayesplot is not installed, an error is raised (the exact class is determined by require_suggested()).
  • The fit path is determined by .gdpar_fit_path_class(object).
  • For multivariate fits:
    • If coord is NULL, raises an error of class gdpar_input_error instructing the user to pass coord = k.
    • Coerces coord to integer and validates it lies in $[1, p]$ where p <- object[["p"]]; otherwise raises an error of class gdpar_input_error with a formatted message.
    • Extracts y_obs_vec <- as.numeric(y_obs[, coord]) and yrep_mat <- y_pred[, , coord].
  • For scalar / K-individual paths: y_obs_vec <- as.numeric(y_obs) and yrep_mat <- y_pred.
  • ndraws is coerced via as.integer(); raises an error of class gdpar_input_error if length(ndraws) != 1L, is.na(ndraws), or ndraws < 1L.
  • Subsampling: keep <- seq_len(min(ndraws, nrow(yrep_mat))); yrep_mat <- yrep_mat[keep, , drop = FALSE].
  • The bayesplot function is selected via switch(type, ...):
    • "dens_overlay"bayesplot::ppc_dens_overlay
    • "hist"bayesplot::ppc_hist
    • "ecdf_overlay"bayesplot::ppc_ecdf_overlay
    • "stat"bayesplot::ppc_stat
    • "intervals"bayesplot::ppc_intervals
  • The selected function is invoked with named arguments y = y_obs_vec, yrep = yrep_mat, and forwarded ....

R/stan_codegen.R

.gdpar_n_params_individual(family)

Purpose Counts the number of individual-scope (per-observation or per-group) parameters $K$ in a distributional family specification. Used internally by guards that enforce block-level constraints on the multi-parametric / multivariate feature matrix.

Arguments

Argument Type Meaning
family gdpar_family or gdpar_family_multi A family object whose param_specs field (if present) lists the parameter slots and their scopes.

Mathematics

$$ K = \max!\Bigl(1,;\sum_{s,\in,\texttt{param_specs}} \mathbf{1}\bigl[\text{scope}(s) \in {\texttt{"per_observation"},;\texttt{"per_group"}}\bigr]\Bigr) $$

Population-level auxiliary parameters (e.g. $\sigma$, $\phi$ in the default $K=1$ block) are excluded. The result is clamped to a minimum of 1.

Returns Integer scalar $K \geq 1$.

Notes

  • For a gdpar_family_multi, the function extracts the first family from family$families[[1L]] and inspects its param_specs.
  • Backward compatibility: when param_specs is NULL (families predating the Block 8.0 sub-phase refactor), the function returns 1L unconditionally.
  • The clamping max(K, 1L) ensures that even if no param_specs entry has an individual scope, the returned value is at least 1.

.gdpar_guard_multiparam_multivariate(family, p)

Purpose Guards against the unsupported combination of a multi-parametric family ($K &gt; 1$) with a multivariate AMM dimension ($p &gt; 1$). In Block 8.0, the multi-parametric extension is exposed only for $p = 1$; the full $K &gt; 1 \times p &gt; 1$ combinatoric regime is deferred to Block 8.1.

Arguments

Argument Type Meaning
family gdpar_family or gdpar_family_multi The family object; forwarded to .gdpar_n_params_individual() to obtain $K$.
p Integer scalar The AMM dimension (number of coordinate functions).

Returns invisible(NULL) when the guard passes (i.e. the combination is allowed). Aborts with a gdpar_unsupported_feature_error otherwise, carrying a data list with fields K and p.

Notes

  • The guard calls .gdpar_n_params_individual(family) internally to determine $K$.
  • The error message directs the user to choose either $K &gt; 1$ with $p = 1$, or $K = 1$ with $p &gt; 1$.
  • Side effect on failure: calls gdpar_abort() which halts execution.

.gdpar_guard_K_below_family_min(family)

Purpose Prevents routing a family to legacy $K = 1$ Stan templates when that family requires more individual-scope slots than $K = 1$ can provide. Certain built-in families (Beta, Gamma, Student-t, Tweedie, etc.) have structural auxiliary parameters that are part of the likelihood and must be elevated to the $K$-individual distributional regression path (amm_distrib_K.stan).

Arguments

Argument Type Meaning
family gdpar_family or gdpar_family_multi The family object. For a gdpar_family_multi, the first element of family$families is inspected.

Minimum $K$ per stan_id

stan_id Family name Minimum $K$ Parameter slots
5 Beta 2 $\mu + \phi$
6 Gamma 2 $\mu + \text{shape}$
7 lognormal_loc_scale 2 $\mu + \sigma$
8 Student-t 3 $\mu + \sigma + \nu$
9 Tweedie 3 $\mu + \phi + p$
10 ZIP 2 $\mu + \pi$
11 ZINB 3 $\mu + \phi + \pi$
12 hurdle_poisson 2 $\mu + \pi$
13 hurdle_neg_binomial_2 3 $\mu + \phi + \pi$

Families with stan_id 1–4 (Normal, Bernoulli, Poisson, Neg-Binomial-2 in $K=1$ legacy wiring) bypass the guard. Families with stan_id that is NULL, NA, or not in the above table also bypass the guard.

Returns invisible(NULL) when the guard passes. Aborts with a gdpar_unsupported_feature_error on failure.

Notes

  • The guard computes $K$ via .gdpar_n_params_individual(family). If $K \geq \texttt{min\_K}$, it passes.
  • On failure, the error message names the family, the required $K$, the received $K$, lists the auxiliary slot names, and gives an example of how to elevate them via gdpar_bf().
  • Auxiliary slot names are read from base$param_specs when available; otherwise the fallback c("location", "dispersion") is used. The first slot is treated as the location; all remaining are auxiliaries.

.gdpar_emit_canonical_stan(spec)

Purpose Canonical dispatcher for $K = 1$ FB Stan templates (introduced in Sub-sub-fase 9.3.a, Bloque 9 Sesion B9.3, 2026-05-27). Single source-of-truth dispatcher that reads a frozen byte-identical canonical piece from inst/stan/_canonical_pieces/, injects helper fragments when requested, and performs placeholder substitution to produce the final Stan model source. Replaces duplicated inline template-read + substitute logic that previously lived in generate_stan_code() and generate_stan_code_multi().

Arguments

Argument Type Meaning
spec Named list Canonical codegen specification. Required fields depend on spec$p_class. See table below.

spec fields by p_class

p_class value Template file Required fields
"p1" amm_canonical_p1.stan prior, cp_a, cp_W, mle
"pmulti" amm_canonical_pmulti.stan (or spec$template_name) prior, cp_a, cp_W, cp_a_per_k
"distrib_K" amm_canonical_distrib_K.stan (or spec$template_name) prior, cp_a, cp_W, cp_a_per_K, family
"pmulti_KxP" amm_canonical_pmulti_KxP.stan prior, cp_a, cp_W, family

Algorithm / Substitution logic

  1. Validation. Verify spec$p_class $\in \{$"p1", "pmulti", "distrib_K", "pmulti_KxP"$\}$; abort with gdpar_internal_error otherwise.

  2. Template resolution. Select the Stan template file name. For names starting with "amm_canonical_", search in inst/stan/_canonical_pieces/; otherwise search in inst/stan/. Both system.file() (installed package) and a direct inst/ fallback are tried.

  3. Helpers injection (Sub-sub-fase 9.3.a colateral, Sesion B9.4). If the template source contains the literal placeholder // {{CANONICAL_HELPERS}}, read amm_canonical_helpers.stan from _canonical_pieces/ and substitute it into the placeholder via gsub(..., fixed = TRUE). This injects the canonical definitions of bspline_basis_eval and apply_W_basis_diff without using cmdstan #include semantics.

  4. Placeholder substitution. Build a named list replacements depending on p_class:

    • "p1": $$\texttt{A_SCALE} = \begin{cases} \texttt{""} & \text{if } \texttt{cp_a} = \texttt{TRUE} \ \texttt{"\ *\ sigma_a[1]"} & \text{otherwise} \end{cases}$$ $$\texttt{A_PRIOR} = \begin{cases} \texttt{"normal(0, sigma_a[1])"} & \text{if } \texttt{cp_a} \ \texttt{"normal(0, 1)"} & \text{otherwise} \end{cases}$$ Analogous logic for W_SCALE and W_PRIOR based on cp_W. When spec$mle is TRUE, both cp_a and cp_W are forced to TRUE.

    • "distrib_K": Validates cp_a_per_K (logical vector, no NA, length $\geq 1$; defaults to as.logical(spec$cp_a)[1L]). Calls generate_a_blocks_K(cp_a_per_K) to produce data_decl, tp_block, model_block strings. Calls .gdpar_build_theta_ref_prior_block(family) for the per-family anchor block. Replaces {{THETA_REF_PRIOR_BLOCK}}, {{DATA_CP_A_PER_K_DECL}}, {{TP_A_BLOCK}}, {{MODEL_A_BLOCK}}, plus the standard prior and CP/W placeholders. Supports EB-side templates via spec$template_name override.

    • "pmulti_KxP": Calls .gdpar_build_theta_ref_prior_block_KxP(family) for the K×P anchor block. Uses inline NCP-only a/b blocks (no per-slot CP substitutions in this sub-phase). Replaces {{THETA_REF_PRIOR_BLOCK}} plus the standard prior and CP/W placeholders.

    • "pmulti": Validates cp_a_per_k (same validation as distrib_K). Calls generate_a_blocks_multi(cp_a_per_k) for data_decl, tp_block, model_block. Replaces those blocks plus the standard prior and CP/W placeholders.

    All branches replace the following common placeholders: {{PRIOR_THETA_REF}}, {{PRIOR_SIGMA_THETA_REF}}, {{PRIOR_SIGMA_A}}, {{PRIOR_SIGMA_B}}, {{PRIOR_SIGMA_W}}, {{PRIOR_SIGMA_Y}}, {{PRIOR_PHI}}, {{W_SCALE}}, {{W_PRIOR}}.

  5. Leftover check. After substitution, if any {{…}} pattern remains in the source, abort with gdpar_internal_error reporting the unsubstituted placeholder.

  6. MLE strip. For p_class == "p1" with spec$mle == TRUE, call .strip_priors_block(src) to remove the entire priors block.

Returns Character scalar: the fully substituted Stan model source string, ready for cmdstanr::cmdstan_model(write_stan_file(...)).

Notes

  • Caching is delegated to cmdstanr, which hashes the source string. CP and NCP variants compile and cache independently.
  • Bit-exactness: the canonical pieces are frozen byte-identical copies of the legacy amm_main.stan and amm_distrib_multi.stan; the dispatcher preserves the same substituted strings as the legacy paths for identical inputs.
  • The distrib_K branch's template_name override supports EB-side templates (amm_eb_marginal_K.stan, amm_eb_conditional_K.stan) per D34 of Sub-fase 8.6.C.
  • The pmulti_KxP branch uses .gdpar_build_theta_ref_prior_block_KxP() (distinct from the distrib_K branch's .gdpar_build_theta_ref_prior_block()), reflecting the K×P architecture from Sub-bloque 9.3.d under decision I.iv + J.iv.A.
  • spec$prior is expected to be an object (likely gdpar_prior) with fields: theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, phi (each a character string of Stan code for the corresponding prior distribution).
  • Side effects on validation failure: calls gdpar_abort() with informative messages.

generate_stan_code(prior, cp_a = FALSE, cp_W = FALSE, mle = FALSE)

Purpose Exported entry point for generating the Stan model source code string for a General Dynamic Parameter (GDPAR) model via reference anchoring. Constructs a parameter-configuration list describing the "Path 1" (univariate) canonical template and delegates all template loading, placeholder substitution and emission to the internal .gdpar_emit_canonical_stan helper.

Arguments

Argument Type Meaning
prior object of class gdpar_prior Prior specification for all model parameters. Passed through to the Stan code generator.
cp_a logical scalar (default FALSE) If TRUE, the additive component $a$ is sampled in the centered parametrization (CP); if FALSE, the non-centered parametrization (NCP) is used.
cp_W logical scalar (default FALSE) If TRUE, the modulating component $W$ is sampled in the centered parametrization; if FALSE, NCP is used.
mle logical scalar (default FALSE) If TRUE, the prior block is stripped from the emitted Stan source so that the model reduces to maximum-likelihood estimation (used by gdpar_bvm_check).

Mathematics No direct formula; this function is a configuration wrapper. The parametrization choice affects how the Stan template rewrites the sampling statements for the latent parameters $a$ and $W$.

Returns A character scalar containing the complete Stan model source code, ready for compilation by cmdstanr::cmdstan_model.

Notes

  • Raises an error (via assert_inherits) if prior does not inherit from class "gdpar_prior".
  • The internal helper .gdpar_emit_canonical_stan receives a list with keys p_class (hard-coded to "p1"), prior, cp_a, cp_W, and mle.
  • When mle = TRUE, the emitted source is further processed by .strip_priors_block to remove all prior sampling statements.

.strip_priors_block(src)

Purpose Internal utility that removes the prior block from an already-substituted Stan source string. The block is delimited by the sentinel comment lines // BEGIN PRIORS and // END PRIORS (inclusive). Used when generate_stan_code(mle = TRUE) is requested so that the Stan model contains only the likelihood and deterministic transformations.

Arguments

Argument Type Meaning
src character scalar Fully substituted Stan source code (single string with embedded newlines).

Mathematics No formula; this is a string-processing operation: $$\text{out} = \text{lines}[1:(\texttt{begin_idx}-1)] ;||; \text{lines}[(\texttt{end_idx}+1):\text{end}]$$

Returns A character scalar equal to src with every line from the // BEGIN PRIORS marker through the // END PRIORS marker (inclusive) removed.

Notes

  • Raises a fatal error with class "gdpar_internal_error" (via gdpar_abort) if:
    • Exactly one // BEGIN PRIORS marker is not found.
    • Exactly one // END PRIORS marker is not found.
    • The BEGIN marker appears at or after the END marker.
  • The markers are matched via grep("//\\s*BEGIN PRIORS", ...) and grep("//\\s*END PRIORS", ...) (flexible whitespace).
  • This is a template invariant violation guard: the sentinel comments must exist in every canonical Stan template.

write_stan_to_tempfile(src)

Purpose Internal helper that writes a Stan model source string to a temporary .stan file and returns the file path. The file name incorporates a deterministic hash of the source content so that identical source strings reuse the same file within a session, enabling cmdstanr's internal compilation cache.

Arguments

Argument Type Meaning
src character scalar Complete Stan model source code to be written.

Mathematics The hash is computed as follows. The source string is converted to its raw byte representation, then formatted as numeric strings, concatenated, and truncated to 16 characters. Any character not in [A-Za-z0-9] is replaced with "0": $$h = \texttt{gsub}!\Bigl("[^A\text{-}Za\text{-}z0\text{-}9]",; "0",; \texttt{substr}\bigl(\texttt{format}(\texttt{as.numeric}(\texttt{charToRaw}(s))),;1,;16\bigr)\Bigr)$$ The path is then: $$\text{path} = \texttt{tempdir()};/;\texttt{"gdpar_amm_"},h,\texttt{".stan"}$$

Returns A character scalar with the absolute path to the written .stan file.

Notes

  • Complies with CRAN policy by writing only to tempdir().
  • The hash is a best-effort cache key; collisions are astronomically unlikely for 16 alphanumeric characters but not formally guaranteed.
  • Uses writeLines(src, path) which appends a trailing newline if src does not already end with one.

assemble_stan_data(design, family, amm, y, theta_anchor, group_id = NULL)

Purpose Internal function that assembles the named list of data passed to Stan's data block via cmdstanr::sample(data = ...). This is the univariate (single-coordinate, $p = 1$) path. It converts the R-side design matrices, family metadata, outcome vector, and anchor value into the exact scalar, vector, and matrix types that the Stan template expects. Dispatches to .assemble_stan_data_multi when the model is multivariate ($p &gt; 1$).

Arguments

Argument Type Meaning
design list (from build_amm_design) Contains design matrices Z_a ($n \times J_a$), Z_b ($n \times J_b$), and X ($n \times d$).
family object of class gdpar_family Specifies the response distribution (Gaussian, Poisson, negative binomial, Bernoulli, etc.) and its Stan integer ID.
amm object of class amm_spec The AMM (additive modulating model) specification; carries a, b, W sub-specs and the dimension count p.
y numeric or integer vector of length $n$ The observed outcome vector.
theta_anchor numeric scalar The reference (anchor) value $\theta^{\star}$ that pins the model's location.
group_id NULL or vector of length $n$ Optional grouping variable for multilevel/hierarchical extensions. If NULL, a single group is assumed.

Mathematics No explicit formula; the function constructs the data objects that enter the Stan log-likelihood. Key dimensions exposed:

  • $n$: number of observations (length(y))
  • $J_a$: columns of $Z_a$ (additive fixed-effect basis)
  • $J_b$: columns of $Z_b$ (additive random-effect basis)
  • $d$: columns of $X$ (shared modulating covariates)
  • $\text{dim\_W}$: dimensionality of the $W$ basis (0 if unused)

Family-specific outcome encoding: $$\text{y_real} = \begin{cases} y & \text{if family} = \texttt{"gaussian"} \ \mathbf{0} & \text{otherwise} \end{cases}, \qquad \text{y_int} = \begin{cases} y & \text{if family} \in {\texttt{poisson}, \texttt{neg_binomial_2}, \texttt{bernoulli}} \ \mathbf{0} & \text{otherwise} \end{cases}$$

Returns A named list with the following Stan data fields (among others):

Field Type Description
n integer Number of observations.
family_id integer Stan-side family identifier.
use_a, use_b, use_W 0 or 1 Flags indicating which AMM components are active.
J_a, J_b integer Column counts of $Z_a$ and $Z_b$.
dim_W integer Dimension of $W$ basis (0 if unused).
d integer Column count of $X$.
Z_a, Z_b, X matrix Design matrices (empty $n \times 0$ matrices when unused).
y_real, y_int numeric/integer vector Outcome in real and integer form.
theta_anchor double Anchor value.
use_dispersion_y, use_dispersion_phi 0 or 1 Flags for Gaussian dispersion $\sigma_y$ and NB-2 dispersion $\phi$.
W_type_id, W_n_knots_full, W_knots_full, W_degree various Modulating basis spline metadata.
use_groups, J_groups, group_id integer / integer vector Grouping fields.

Notes

  • Calls .gdpar_guard_multiparam_multivariate(family, p_value) and .gdpar_guard_K_below_family_min(family) as input validation guards.
  • If family inherits from "gdpar_family_multi" or amm$p > 1, dispatches to .assemble_stan_data_multi and returns immediately.
  • Design matrices with zero columns are replaced with matrix(0, n, 0) to satisfy Stan's requirement for non-null matrix data.
  • group_id is resolved via .resolve_group_id.

.resolve_group_id(group_id_arg, n)

Purpose Internal helper that validates and coerces the optional group_id argument into the integer-level representation required by the Stan template. Handles the backward-compatible single-group case (when group_id_arg is NULL) and the multilevel case.

Arguments

Argument Type Meaning
group_id_arg NULL or a vector of length $n$ Grouping variable. When NULL, a single group is assumed.
n positive integer Expected length of the group identifier vector (must match the number of observations).

Mathematics When group_id_arg is not NULL, it is coerced to a factor. The integer codes returned by as.integer(as.factor(group_id_arg)) assign contiguous levels $1, 2, \ldots, J_{\text{groups}}$: $$\texttt{group_id}i \in {1, 2, \ldots, J{\text{groups}}}, \quad J_{\text{groups}} = \texttt{nlevels}(\texttt{as.factor}(\texttt{group_id_arg}))$$

Returns A named list with three elements:

Field Type Description
use_groups integer 0 or 1 0 when group_id_arg is NULL; 1 otherwise.
J_groups integer $\geq 1$ Number of distinct groups. 1 in the single-group regime.
group_id integer vector of length $n$ Per-observation group index (1-based, contiguous).

Notes

  • Raises an error with class "gdpar_input_error" if length(group_id_arg) != n.
  • Raises an error with class "gdpar_input_error" if any element of group_id_arg is NA.
  • In the NULL case, group_id is rep(1L, n) and J_groups is 1L.

.assemble_stan_data_multi(design, family, amm, y, theta_anchor, group_id = NULL)

Purpose Internal function implementing the multivariate ($p &gt; 1$) path of assemble_stan_data. Converts ragged per-coordinate design matrices, the multivariate family object, the outcome matrix, and per-coordinate anchor values into the data list expected by the amm_distrib_multi.stan Stan template. Since Stan does not natively support ragged arrays, per-coordinate design matrices $Z_{a}^{[k]}$ and $Z_{b}^{[k]}$ are padded with zero columns up to the maximum column count $J_{a,\max}$ or $J_{b,\max}$ across coordinates $k = 1, \ldots, p$.

Arguments

Argument Type Meaning
design list (from .build_amm_design_multi) Contains Z_a_list and Z_b_list (lists of $p$ matrices, one per coordinate) and shared X matrix.
family object of class gdpar_family_multi Multivariate family with per-coordinate marginal families and a common Stan ID. Must have $p matching amm$p.
amm object of class amm_spec with amm$p > 1 The multivariate AMM specification.
y $n \times p$ numeric matrix Outcome matrix. Column $k$ feeds the marginal family $D_k(y_{ik} \mid \theta_i[k])$.
theta_anchor numeric scalar or vector of length $p$ Anchor value(s). Scalar is broadcast to length $p$.
group_id NULL or vector of length $n$ Optional grouping variable.

Mathematics Key operations:

  1. Column padding (ragged-to-rectangular): $$Z_{a}^{[k]}{\text{padded}} = \bigl[,Z{a}^{[k]} ;\big|; \mathbf{0}{n \times (J{a,\max} - J_{a}^{[k]})},\bigr]$$ Analogously for $Z_b$. The Stan template iterates only over $1 : J_{a}^{[k]}$, so padded zeros never enter the likelihood.

  2. Modulating basis dimension: $$\text{dim_W} = \frac{\dim(W)}{1}, \qquad W_{\text{per_k_dim}} = \frac{\text{dim_W}}{p}$$ The total $W$ dimension must be a positive multiple of $p$ for separable bases.

  3. Anchor broadcasting: $$\theta^{\star}_k = \begin{cases} \theta^{\star} & \text{if } \texttt{length}(\theta^{\star}) = 1 \ \theta^{\star}_k & \text{if } \texttt{length}(\theta^{\star}) = p \end{cases}$$

  4. Outcome encoding (per-coordinate): $$\text{y_real}{ik} = \begin{cases} y{ik} & \text{if family is Gaussian} \ 0 & \text{otherwise} \end{cases}, \qquad \text{y_int}{ik} = \begin{cases} y{ik} & \text{if family} \in {\texttt{poisson}, \texttt{neg_binomial_2}, \texttt{bernoulli}} \ 0 & \text{otherwise} \end{cases}$$

Returns A named list with the Stan data fields expected by amm_distrib_multi.stan:

Field Type Description
n integer Number of observations.
p integer Number of coordinates.
family_id integer Stan-side family ID.
use_a, use_b, use_W 0 or 1 Activity flags for AMM components.
J_a_max, J_b_max integer Maximum column count across coordinates for $Z_a$ and $Z_b$.
J_a_per_k, J_b_per_k integer vector of length $p$ Per-coordinate column counts.
Z_a, Z_b list of $p$ matrices, each $n \times J_{\max}$ Padded design matrices.
dim_W integer Total $W$ basis dimension.
d integer Column count of $X$.
W_per_k_dim integer Per-coordinate $W$ dimension ($\text{dim\_W} / p$).
X $n \times d$ matrix Shared modulating covariate matrix.
y_real $n \times p$ double matrix Real-valued outcome encoding.
y_int $n \times p$ integer matrix Integer outcome encoding.
theta_anchor double vector of length $p$ Per-coordinate anchor values.
use_dispersion_y, use_dispersion_phi 0 or 1 Dispersion flags.
W_type_id, W_n_knots_full, W_knots_full, W_degree various Spline metadata for $W$.
use_groups, J_groups, group_id integer / integer vector Grouping fields.

Notes

  • Raises "gdpar_internal_error" if amm$p is NULL or not positive.
  • Raises "gdpar_input_error" if family does not inherit from "gdpar_family_multi" when amm$p > 1.
  • Raises "gdpar_input_error" if family$p does not equal amm$p.
  • Raises "gdpar_input_error" if y is not a matrix/array, or if ncol(y) != p.
  • Raises "gdpar_internal_error" if dim_W is not a positive multiple of $p$ (required for separable bases).
  • Raises "gdpar_input_error" if theta_anchor is neither length 1 nor length $p$.
  • Calls .gdpar_resolve_W_stan_data to obtain the modulating basis metadata (W_type_id, knots, degree).
  • Grouping is resolved via .resolve_group_id (same as the univariate path).

generate_stan_code_multi(prior, cp_a = FALSE, cp_W = FALSE, cp_a_per_k = NULL, template_name = "amm_distrib_multi.stan")

Purpose: Translates a user-facing request for Stan code for a multi-coordinate AMM (Additive Mixed Model) into the canonical template machinery. It maps legacy template names to their canonical equivalents and delegates code assembly to .gdpar_emit_canonical_stan.

Arguments:

  • prior: An object of class "gdpar_prior" specifying the prior configuration for the model.
  • cp_a: A logical scalar (default FALSE). If TRUE, indicates that the a component (random effects) should use a centered parameterization (CP). When TRUE, it overrides per-coordinate settings.
  • cp_W: A logical scalar (default FALSE). If TRUE, indicates that the spatial/spline component W should use a centered parameterization.
  • cp_a_per_k: An optional logical vector of length p (the multivariate dimension). Each element indicates whether the corresponding coordinate's a component should use CP (TRUE) or non-centered parameterization (NCP, FALSE). If NULL (default), the choice is governed by the scalar cp_a flag.
  • template_name: A character string (default "amm_distrib_multi.stan") identifying the desired Stan template. The function maps legacy names to canonical pieces in inst/stan/_canonical_pieces/.

Returns: The result of calling .gdpar_emit_canonical_stan with a list containing the translated template name and all arguments. This is typically a list of character strings representing the Stan code blocks.

Notes:

  • The function asserts that prior inherits from the "gdpar_prior" class.
  • The template name translation is handled by an internal switch statement. The default "amm_distrib_multi.stan" is mapped to "amm_canonical_pmulti.stan".
  • The function does not generate the code directly; it prepares the input for the internal canonical emitter.

generate_a_blocks_multi(cp_a_per_k)

Purpose: Generates the Stan code blocks (data_decl, tp_block, model_block) for the a component (random effects) in a multi-coordinate model. It dispatches on the pattern of the cp_a_per_k vector to produce code for three cases: uniform NCP, uniform CP, or mixed CP/NCP across coordinates.

Arguments:

  • cp_a_per_k: A logical vector of length p (the multivariate dimension). Each element indicates the parameterization for that coordinate.

Mathematics: The function generates Stan code for the transformation of raw random effects a_raw into the scaled coefficients a_coef. For each coordinate k:

  • Uniform NCP: a_coef[k][j] = a_raw[...] * sigma_a[k] with a prior a_raw ~ normal(0, 1).
  • Uniform CP: a_coef[k][j] = a_raw[...] with a prior segment(a_raw, ...) ~ normal(0, sigma_a[k]).
  • Mixed: A runtime binary flag cp_a_per_k_data[k] selects between the two transformations and priors.

Returns: A named list with three character scalar elements:

  • data_decl: Stan code for any required data block declarations (empty for uniform cases, contains an integer array declaration for the mixed case).
  • tp_block: Stan code for the transformed parameters block that computes a_coef.
  • model_block: Stan code for the model block that specifies the prior on a_raw.

Notes:

  • The tp_block always includes a guard if (use_a == 1) to skip if the a component is not used.
  • In the mixed case, the caller must inject stan_data$cp_a_per_k_data as an integer vector into the Stan data.
  • The generated code uses a_raw_offset (a per-coordinate index vector) to segment the flat-packed a_raw vector.

.assemble_stan_data_K(design_K, family, amm_list_canonical, y, theta_anchor_K, group_id = NULL, family_id_k_vector = NULL)

Purpose: Assembles the data list for the K-individual (multiple families) Stan template (amm_distrib_K.stan). This is the counterpart of assemble_stan_data for the regime where K > 1 (multiple outcome distributions) and p = 1 (univariate outcomes per distribution). It consumes design matrices, family specifications, and anchors, and produces the structured list expected by the Stan template.

Arguments:

  • design_K: A list returned by .build_amm_design_K(), containing per-slot design matrices (Z_a_k_list, Z_b_k_list, X) and slot metadata.
  • family: A gdpar_family object. Its param_specs must have been promoted to scope = "per_observation" for the K-individual slots. Must not be a gdpar_family_multi object.
  • amm_list_canonical: A named list of amm_spec objects of length K. Used to extract per-slot flags (use_a, use_b, W).
  • y: Numeric or integer vector of outcomes (length n).
  • theta_anchor_K: Numeric vector of length K with the per-slot anchor values.
  • group_id: Optional integer vector of length n for group indices. NULL indicates a single group.
  • family_id_k_vector: Optional integer vector of length K specifying per-slot family IDs (Stan distribution codes). If NULL, all slots use the same family as the family argument.

Mathematics:

  • The function constructs a data list for a distributional regression model with canonical link functions per outcome slot.
  • Each slot k has its own family (identified by family_id_k[k]), a design matrix for the location parameter (Z_a_k), and optional design matrices for scale (Z_b_k) and spline (W) components.
  • The anchor theta_anchor_K[k] is the offset for the linear predictor of slot k.
  • The likelihood is family-specific, with families identified by integer codes (e.g., 1=Gaussian, 3=neg_binomial_2, 5=Beta, etc.).

Returns: A named list with the following elements (among others) expected by amm_distrib_K.stan:

  • n, K: Data dimensions.
  • family_id_k: Integer vector of per-slot family IDs.
  • inv_link_id_per_slot: Integer vector of per-slot inverse-link function IDs.
  • use_a_k, use_b_k: Integer vectors indicating presence of a/b components per slot.
  • use_W, dim_W: Global flags for the spline component.
  • J_a_per_k, J_b_per_k, J_a_max, J_b_max: Column counts for design matrices.
  • Z_a_k, Z_b_k: Lists of padded design matrices (padded to max column count).
  • X: Population-level design matrix.
  • y_real, y_int: Outcome vectors (real or integer, depending on family).
  • theta_anchor_K: Numeric vector of per-slot anchors.
  • use_dispersion_y_k, use_dispersion_phi_k: Integer vectors (currently all zeros) for dispersion flags.
  • W_type_id, W_n_knots_full, W_knots_full, W_degree: Spline component parameters.
  • use_groups, J_groups, group_id: Grouping information.

Notes:

  • Input Validation: The function performs extensive checks:
    • design_K must be a list with element Z_a_k_list.
    • K must be ≥ 2.
    • amm_list_canonical length must equal K.
    • family must be a gdpar_family (not gdpar_family_multi) with a non-NA stan_id.
    • stan_id must be one of the built-in codes (1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13).
    • y must be a vector (not a multi-column matrix) for the K > 1 path.
    • Outcome range is checked per family (e.g., Beta requires y ∈ (0,1), count families require non-negative integers).
    • theta_anchor_K must have length K.
    • If family_id_k_vector is provided, it must be a numeric vector of length K with no NAs, and its first element must match family$stan_id.
  • Internal Error Handling: Raises gdpar_abort with specific error classes (gdpar_internal_error, gdpar_unsupported_feature_error, gdpar_input_error) for invalid inputs.
  • Design Matrix Padding: Design matrices Z_a_k and Z_b_k are padded with zeros to have a common maximum column count (J_a_max, J_b_max) across slots.
  • Grouping: Uses .resolve_group_id to standardize grouping information.
  • Inverse Links: Computes per-slot inverse-link IDs via .gdpar_compute_inv_link_id_per_slot.
  • Side Effects: None; returns the data list.

generate_a_blocks_K(cp_a_per_K)

Purpose Generates the three Stan code fragments (data declaration, transformed-parameters block, and model block) that govern how the raw a coefficients are rescaled into a_coef_k for the K-individual template. The function inspects whether every component of cp_a_per_K is TRUE (all centered), FALSE (all non-centered), or mixed, and emits the corresponding Stan source. This controls the centered vs. non-centered parametrization of the group-level a coefficients in the generated Stan program.

Arguments

Argument Type Meaning
cp_a_per_K Logical vector of length ≥ 1 Per-K indicator: TRUE means the a coefficients for that K-slot use the centered parametrization (no sigma scaling in the transformed parameters block), FALSE means non-centered (raw standard-normal variates are multiplied by sigma_a_k in transformed parameters).

Mathematics

The function selects among three code-generation strategies for mapping raw parameters a_raw to a_coef_k[k][j]:

  • All non-centered (all_ncp): every FALSE. The transformed-parameters assignment is

$$\texttt{a_coef_k}[k][j] = \texttt{a_raw}[\texttt{a_raw_offset}[k]+j] \cdot \sigma_{a,k}$$

and the model block applies a unit-normal prior:

$$\texttt{a_raw} \sim \mathcal{N}(0,1).$$

  • All centered (all_cp): every TRUE. The transformed-parameters assignment is simply

$$\texttt{a_coef_k}[k][j] = \texttt{a_raw}[\texttt{a_raw_offset}[k]+j]$$

and the model block applies the group-level prior per segment:

$$\text{segment}(\texttt{a_raw},;\texttt{a_raw_offset}[k]+1,;J_{a}^{\text{free}}[k]) \sim \mathcal{N}!\bigl(0,;\sigma_{a,k}\bigr).$$

  • Mixed: neither all TRUE nor all FALSE. A per-K binary flag cp_a_per_K_data[k] is read from data. In the transformed-parameters block a local scale_k is computed:

$$\texttt{scale_k} = \begin{cases} 1 & \text{if } \texttt{cp_a_per_K_data}[k]=1 \text{ (centered)},\ \sigma_{a,k} & \text{otherwise (non-centered)}, \end{cases}$$

and a_coef_k[k][j] = a_raw[...] * scale_k. The model block branches per K: if centered, the segment prior is $\mathcal{N}(0, \sigma_{a,k})$; if non-centered, $\mathcal{N}(0,1)$.

Returns A named list with three character scalar elements:

Element Content
data_decl Stan data block fragment declaring cp_a_per_K_data (empty string when all_ncp or all_cp; non-empty array declaration for mixed).
tp_block Stan transformed parameters block fragment assigning a_coef_k[k][j] (conditionally guarded by any_use_a == 1 and use_a_k[k] == 1 && J_a_per_k[k] > 0).
model_block Stan model block fragment placing the prior on a_raw.

Notes

  • The all_ncp branch sets data_decl and model_block for a global a_raw ~ normal(0,1) (no per-K loop in the model block).
  • The all_cp branch loops over K in the model block, applying segment(...) with sigma_a_k per K.
  • In the mixed branch the data declaration cp_a_per_K_data is an array of K integers bounded [0,1]; the caller must supply this in the Stan data list.
  • All three branches guard the tp_block with if (any_use_a == 1) so the block is a no-op when no a coefficients are present.
  • Side effects: none; the function is pure code-generation.

.gdpar_build_theta_ref_prior_block(family)

Purpose (Internal, not exported.) Builds the Stan code block that assigns the anchor priors to theta_ref_k for the single-parametric K-individual distributional-regression template. For most families the block is a vectorized canonical prior; for Tweedie (stan_id == 9) the third distributional slot (the power parameter $p$) must receive a structurally separate uniform(1.01, 1.99) prior. The returned block still contains {{PRIOR_THETA_REF}} and {{PRIOR_SIGMA_THETA_REF}} placeholders, which the caller (generate_stan_code_K) substitutes on a subsequent pass.

Arguments

Argument Type Meaning
family gdpar_family object or NULL The distributional family. When NULL, or when family$stan_id is NA, the canonical (default) vectorized block is returned. When stan_id == 9 (Tweedie), a Tweedie-specific block is emitted.

Mathematics

The default block implements a hierarchical anchor-prior structure:

  • Grouped (use_groups == 1): hyper-priors on $\mu_{\theta_{\text{ref}}}$ and $\sigma_{\theta_{\text{ref}}}$, then for each group $g$:

$$\theta_{\text{ref},k}[g] \sim \mathcal{N}!\bigl(\mu_{\theta_{\text{ref}}}[1],;\sigma_{\theta_{\text{ref}}}[1]\bigr).$$

  • Ungrouped (use_groups != 1): a direct prior on $\theta_{\text{ref},k}[1]$.

For Tweedie, the block is split: slots $[1{:}2]$ receive {{PRIOR_THETA_REF}}, while slot $[3]$ (the power parameter) receives:

$$\texttt{theta_ref_k}[1][3] \sim \text{Uniform}(1.01,; 1.99).$$

Returns A character scalar containing a Stan code snippet (with {{PRIOR_THETA_REF}} / {{PRIOR_SIGMA_THETA_REF}} placeholders still present).

Notes

  • The function is @noRd / @keywords internal; it is never exported.
  • It relies on the caller ordering the THETA_REF_PRIOR_BLOCK substitution before the PRIOR_THETA_REF substitution so that the inner placeholders survive the first pass.
  • When family is NULL or stan_id is NA, the default (non-Tweedie) block is always returned.
  • Side effects: none.

.gdpar_build_theta_ref_prior_block_KxP(family)

Purpose (Internal, not exported.) Parallel to .gdpar_build_theta_ref_prior_block but emits the anchor-prior block for the multi-parametric multivariate (K×P) Full-Bayes Path C template amm_canonical_pmulti_KxP.stan. The anchor container differs: theta_ref_kp[g, k] is an array[J_groups, K] vector[p] (per-slot, per-coord), and the hierarchical hyper-prior is per-slot mu_theta_ref_kp[1, k] and sigma_theta_ref_kp[1, k]. For Tweedie (stan_id == 9), slot $k = 3$ is the power parameter and receives uniform(1.01, 1.99) on every coordinate $jj$.

Arguments

Argument Type Meaning
family gdpar_family object or NULL Same semantics as .gdpar_build_theta_ref_prior_block: NULL or NA stan_id yields the default vectorized block; stan_id == 9 yields the Tweedie-specific block.

Mathematics

The default block iterates over $k = 1, \ldots, K$:

  • Grouped: for each $k$:

$$\mu_{\theta_{\text{ref}},kp}[1, k] \sim \texttt{PRIOR_THETA_REF}, \qquad \sigma_{\theta_{\text{ref}},kp}[1, k] \sim \texttt{PRIOR_SIGMA_THETA_REF},$$

then for each group $g$:

$$\theta_{\text{ref},kp}[g, k] \sim \mathcal{N}!\bigl(\mu_{\theta_{\text{ref}},kp}[1, k],;\sigma_{\theta_{\text{ref}},kp}[1, k]\bigr).$$

  • Ungrouped: for each $k$:

$$\theta_{\text{ref},kp}[1, k] \sim \texttt{PRIOR_THETA_REF}.$$

For Tweedie, the block adds a conditional on $k$: when $k \le 2$ the canonical vectorized prior is used; when $k = 3$ (power slot), every coordinate $jj = 1,\ldots,p$ receives:

$$\mu_{\theta_{\text{ref}},kp}[1, 3][jj] \sim \text{Uniform}(1.01,; 1.99)$$

(and analogously theta_ref_kp[1, 3][jj] in the ungrouped branch).

Returns A character scalar with Stan code containing {{PRIOR_THETA_REF}} and {{PRIOR_SIGMA_THETA_REF}} placeholders.

Notes

  • The function does not need to know K at code-generation time because it emits a for (k in 1:K) loop in the Stan source; K is data.
  • The Tweedie branch uses if (k <= 2) rather than slicing [1:2] because the container is a vector of length p, and the distinction is by slot (the outer k index), not by inner vector elements. Slots 1 and 2 are the location and scale parameters; slot 3 is the power parameter.
  • Same caller-ordering requirement as .gdpar_build_theta_ref_prior_block: the THETA_REF_PRIOR_BLOCK substitution must precede PRIOR_THETA_REF.
  • Side effects: none.

generate_stan_code_KxP_FB(prior, cp_W = FALSE, family = NULL)

Purpose (Internal, not exported.) Thin wrapper around .gdpar_emit_canonical_stan for the multi-parametric multivariate Full-Bayes regime (K ≥ 2 slots crossed with p ≥ 2 coords). It dispatches to the canonical template-dispatcher by supplying the piece class "pmulti_KxP". The wrapper exists so that bootstrap scripts (data-raw/bootstrap_fb_goldens_KxP.R) and Tier-1 compare-path tests can emit and compile the Path C template deterministically, while the user-facing guard .gdpar_guard_multiparam_multivariate still aborts gdpar() when K > 1 and p > 1.

Arguments

Argument Type Meaning
prior gdpar_prior object The prior specification; validated by assert_inherits(prior, "gdpar_prior", "prior").
cp_W Logical scalar (default FALSE) If TRUE, the modulating component W is sampled in the centered parametrization; if FALSE, non-centered.
family gdpar_family object or NULL (default NULL) When supplied, drives the family-specific anchor-prior block via .gdpar_build_theta_ref_prior_block_KxP(). When NULL, the canonical vectorized block is used.

Returns A character scalar containing the complete Stan model source code (ready for cmdstanr::cmdstan_model()), produced by .gdpar_emit_canonical_stan after all placeholder substitutions.

Notes

  • The first line performs argument validation: assert_inherits(prior, "gdpar_prior", "prior") raises an error if prior is not of the correct class.
  • The function delegates entirely to .gdpar_emit_canonical_stan, passing a configuration list with keys p_class = "pmulti_KxP", prior, cp_W, and family. No template-reading or substitution logic is performed locally.
  • The function is not exported; it is consumed only by internal bootstrap and test machinery.
  • Side effects: none beyond the validation assertion.

.gdpar_fb_KxP_fit(y_matrix, family, amm_list, data, formula_rhs, theta_anchor_kp = NULL, iter_warmup = 200L, iter_sampling = 200L, chains = 2L, seed = 91001L, refresh = 0L, ...)

Purpose Internal workhorse that orchestrates a full Bayesian (FB) full-rank $K \times P$ fit for the General Dynamic Parameter framework. It validates inputs, assembles the Stan data block from the $K$-slot amm_list, generates and compiles the Stan model via the canonical-piece code-generation pipeline, constructs a deterministic per-chain initialisation function (reusing the EB-side canonical helper), invokes cmdstanr::cmdstan_model$sample(), and returns a list bundling the fitted object together with metadata needed for downstream post-processing (e.g. family_name, dimensions, MCMC settings, cmdstan version).

Arguments

Argument Type Meaning
y_matrix numeric matrix Response matrix with $n = \texttt{nrow(data)}$ rows and $p$ columns (one column per outcome dimension / time-point).
family gdpar_family object Distributional family for the likelihood (e.g. Gaussian, beta, gamma, student-t, Tweedie). Must inherit from class "gdpar_family".
amm_list named list of amm_spec objects Length-$K$ list where each slot describes one individual-specific AMM specification. Must have $K \ge 2$. Each slot's $p element (defaulting to $1$) must satisfy $p \ge 2$.
data data.frame Observational data whose row count determines $n$.
formula_rhs formula (right-hand side) Right-hand-side formula used inside .build_amm_design_KxP() to construct the design matrices.
theta_anchor_kp numeric matrix or NULL $K \times p$ matrix of anchor-point values for $\theta$. If NULL, defaults to a zero matrix $\mathbf{0}_{K \times p}$.
iter_warmup integer scalar Number of warmup (adaptation) iterations per chain. Default 200L.
iter_sampling integer scalar Number of post-warmup sampling iterations per chain. Default 200L.
chains integer scalar Number of MCMC chains. Default 2L.
seed integer scalar PRNG seed passed to cmdstanr. Default 91001L.
refresh integer scalar Console refresh interval for Stan progress output. Default 0L (silent).
... Additional arguments forwarded to cmdstanr::cmdstan_model$sample() (e.g. adapt_delta, max_treedepth).

Algorithm

  1. Package check – Guard that cmdstanr is available; abort with class "gdpar_unsupported_feature_error" if not.
  2. Input validation – Assert family inherits "gdpar_family"; assert amm_list is a list of length $\ge 2$; extract $K = |\texttt{amm\_list}|$ and $p = \texttt{amm\_list}[[1]]\$p$ (default $1$), then assert $p \ge 2$.
  3. W-basis materialisation – For every slot $k \in \{1,\dots,K\}$, if an W basis object exists but carries NULL or NA dimension, call materialize_W_basis(W, p = p).
  4. Design construction – Call .build_amm_design_KxP(amm_list, data, formula_rhs) to produce design_KxP.
  5. Anchor default – If theta_anchor_kp is NULL, set it to matrix(0, K, p).
  6. Response validation – Assert y_matrix is a numeric matrix with nrow(data) rows and p columns.
  7. Stan data assembly – Call .assemble_stan_data_KxP(design_KxP, family, amm_list, y_matrix, theta_anchor_kp, path = "FB").
  8. Prior & code generation – Create a default prior via gdpar_prior(); generate the Stan source via generate_stan_code_KxP_FB(prior, cp_W = FALSE, family = family); write it to a temporary file and compile with cmdstanr::cmdstan_model.
  9. Initialisation function – Define fb_init_fn(chain_id) that calls .gdpar_eb_make_random_init_KxP(stan_data, seed_offset = chain_id, base_seed = seed). This produces chain-deterministic starting values with:
    • $\theta_{\text{ref},kp} \sim$ small jitter,
    • $\sigma_{a,k} = 0.1 + \text{jitter}$,
    • $\sigma_{b,k} = 0.1 + \text{jitter}$,
    • $a_{\text{raw}} \sim$ small jitter,
    • $c_{b,kp,\text{raw}} \sim$ small jitter. Parameters not consumed by the FB extension (e.g. sigma_W, W_raw when use_W = 0; sigma_y_pop_k, phi_pop_k when all use_dispersion_*_k are zero) have size zero by construction, so missing init entries are harmless.
  10. Sampling – Call model$sample(...) with the assembled data and all MCMC control arguments.
  11. Return – Bundle the cmdstanr fit object, stan_data, family identifiers, dimensional constants ($K$, $p$, $n$), MCMC settings, seed, and the cmdstan version string.

Returns A named list with elements:

Element Type Description
fit CmdStanMCMC The fitted MCMC object from cmdstanr.
stan_data named list The complete Stan data list passed to the model.
family_name character Human-readable family name (e.g. "gaussian").
family_stan_id integer Numeric family identifier used inside Stan.
K integer Number of individuals/slots.
p integer Number of outcome dimensions per slot.
n integer Number of observations (rows of data).
seed integer PRNG seed used.
iter_warmup integer Warmup iterations per chain.
iter_sampling integer Post-warmup iterations per chain.
chains integer Number of MCMC chains.
cmdstan_version character (or value from helper) Version string of the running CmdStan installation.

Notes

  • The function is not exported (leading dot in name).
  • Several gdpar_abort() calls raise classed errors: "gdpar_unsupported_feature_error" (missing cmdstanr), "gdpar_input_error" (invalid amm_list length, $p &lt; 2$, mismatched y_matrix dimensions).
  • The W-basis materialisation loop is a defensive side-effect that mutates amm_list in-place for any slot whose W object has unmaterialised dimension.
  • The init function deliberately reuses the EB-side canonical helper .gdpar_eb_make_random_init_KxP() to avoid inv-logit / exp blow-ups near link-support boundaries for distributional families (beta, gamma, student-t, etc.), which would occur with cmdstanr's default init = 2.
  • Extra arguments (...) are forwarded verbatim to cmdstanr::cmdstan_model$sample().

generate_stan_code_K(prior, cp_a = FALSE, cp_W = FALSE, cp_a_per_K = NULL, family = NULL, template_name = "amm_distrib_K.stan")

Purpose Reads a $K$-individual Stan template from the package's inst/stan/ directory and substitutes prior, parametrisation, and $a$-block placeholders to produce a complete Stan model source string. It is the $K$-individual counterpart of generate_stan_code_multi(). Starting from sub-phase 8.3.5b the function also accepts an optional gdpar_family object to emit a family-specific anchor-prior block ($\theta_{\text{ref}}$ prior). The function translates legacy template names to their canonical-piece equivalents (relocated in sessions B9.5/B9.6) and delegates the actual emission to .gdpar_emit_canonical_stan().

Arguments

Argument Type Meaning
prior gdpar_prior object Prior specification governing the substitution of prior-related placeholders (e.g. hyperprior scale expressions).
cp_a logical scalar Centring flag for the $a$-block (population-level effects). When cp_a_per_K is NULL, this value is broadcast as a one-element pseudo-vector (legacy mode). Default FALSE (non-centred).
cp_W logical scalar If TRUE, the modulating component $W$ is sampled in centred parametrisation; if FALSE (default), non-centred parametrisation.
cp_a_per_K logical vector of length $K$ or NULL Per-slot centring control for the $a$-block. NULL (default) falls back to broadcasting cp_a uniformly across all $K$ slots.
family gdpar_family object or NULL When supplied, drives the family-specific anchor-prior block emission via .gdpar_build_theta_ref_prior_block(). NULL (default) retains the canonical vectorised block for backward compatibility with existing test call sites.
template_name character scalar Name of the Stan template file. Default "amm_distrib_K.stan". Internally mapped to a canonical-piece filename (see mapping table below).

Algorithm / Template Translation

The function performs a name mapping from legacy template names to their canonical-piece equivalents before delegating:

Legacy template_name Canonical-piece name
"amm_distrib_K.stan" "amm_canonical_distrib_K.stan"
"amm_eb_marginal_K.stan" "amm_canonical_eb_marginal_K.stan"
"amm_eb_conditional_K.stan" "amm_canonical_eb_conditional_K.stan"
(any other) used as-is

After mapping, the function assembles a configuration list:

$$\texttt{config} = {\texttt{p_class}: \texttt{"distrib_K"},; \texttt{prior},; \texttt{cp_a},; \texttt{cp_W},; \texttt{cp_a_per_K},; \texttt{family},; \texttt{template_name}}$$

and passes it to .gdpar_emit_canonical_stan(config) which performs the actual template reading and placeholder substitution.

Mathematics

The W-related placeholders {{W_SCALE}} and {{W_PRIOR}} are identical across the multivariate and $K$-individual templates because $W$ is a globally shared parameter (a single $\sigma_W[1]$); the same expressions are substituted regardless of template. The {{CANONICAL_HELPERS}} placeholder absorbs the in-line bspline_basis_eval and apply_W_basis_diff helpers following the G.iv pattern. In-line Tweedie function blocks and apply_inv_link_by_id remain $K$-specific and are kept in-line.

When family is non-NULL, the {{THETA_REF_PRIOR_BLOCK}} placeholder is replaced by the output of .gdpar_build_theta_ref_prior_block(family), which emits the per-family prior on anchor-point parameters $\theta_{\text{ref},kp}$. When family is NULL, the legacy vectorised block is retained.

Returns A character scalar containing the fully substituted Stan model source code, ready for compilation via cmdstanr::cmdstan_model().

Notes

  • The function is not exported (lacks @export tag; documented with @keywords internal and @noRd).
  • prior is validated via assert_inherits(prior, "gdpar_prior", "prior"), which aborts if the class does not match.
  • The template relocation (from inst/stan/amm_distrib_K.stan to inst/stan/_canonical_pieces/amm_canonical_distrib_K.stan) and the analogous relocation of the EB-side $K$ templates were performed in sessions B9.5 and B9.6 respectively. Bit-exact Stan semantics preservation was verified token-wise during the migration.
  • The function does not write the Stan source to disk; it only returns the string. The caller is responsible for persisting and compiling.

R/utils-conditions.R

gdpar_abort(message, class = character(), data = list())

Purpose

Internal helper that signals a structured error condition. It wraps base::stop with a class hierarchy so that user code can catch specific gdpar errors via tryCatch and optionally inspect additional fields embedded in the condition object.

Arguments

  • message — Character scalar. The human-readable error message.
  • class — Character vector (default character()). Additional class names prepended to the condition's class vector, ahead of the fixed suffix c("gdpar_error", "error", "condition").
  • data — Named list (default list()). Additional fields copied into the condition object for programmatic access by handlers.

Returns

Never returns; signals an error condition of class c(class, "gdpar_error", "error", "condition") via stop.

Notes

  • The condition object is constructed with structure(..., list(message = message, call = sys.call(-1))). The call field is set to sys.call(-1), i.e., the call of the function that invoked gdpar_abort, so the reported call site reflects the caller rather than gdpar_abort itself.
  • Each element of data is copied into the condition object by name via a for loop over names(data); existing fields (message, call) can be overwritten if data contains entries named message or call.
  • Because stop is invoked, execution halts unless the condition is caught by a tryCatch handler matching one of the condition's classes.
  • Marked @keywords internal and @noRd; not exported.

gdpar_warn(message, class = character(), data = list())

Purpose

Internal helper that signals a structured warning condition. It wraps base::warning with a class hierarchy so that user code can catch specific gdpar warnings via withCallingHandlers.

Arguments

  • message — Character scalar. The human-readable warning message.
  • class — Character vector (default character()). Additional class names prepended to the condition's class vector, ahead of the fixed suffix c("gdpar_warning", "warning", "condition").
  • data — Named list (default list()). Additional fields copied into the condition object for programmatic access by handlers.

Returns

Invisibly NULL; signals a warning condition of class c(class, "gdpar_warning", "warning", "condition") via warning.

Notes

  • The condition object is constructed identically to gdpar_abort in structure: list(message = message, call = sys.call(-1)) with the class vector set to c(class, "gdpar_warning", "warning", "condition").
  • Each element of data is copied into the condition object by name via a for loop over names(data); existing fields (message, call) can be overwritten if data contains entries named message or call.
  • Unlike gdpar_abort, warning does not halt execution by default (unless options(warn = 2) is set, which converts warnings to errors).
  • The return value is governed by warning, which returns NULL invisibly; gdpar_warn itself does not explicitly return, so the value is whatever warning returns.
  • Marked @keywords internal and @noRd; not exported.

gdpar_inform(message, class = character(), data = list())

Purpose

Internal helper that signals a structured informative message condition. It wraps base::message with a class hierarchy so that user code can suppress specific gdpar messages via suppressMessages or handle them via withCallingHandlers.

Arguments

  • message — Character scalar. The human-readable informative message.
  • class — Character vector (default character()). Additional class names prepended to the condition's class vector, ahead of the fixed suffix c("gdpar_message", "message", "condition").
  • data — Named list (default list()). Additional fields copied into the condition object for programmatic access by handlers.

Returns

Invisibly NULL; signals a message condition of class c(class, "gdpar_message", "message", "condition") via message.

Notes

  • The condition object is constructed with list(message = paste0(message, "\n"), call = sys.call(-1)). A trailing newline ("\n") is appended to message before embedding it in the condition, so the displayed text ends with a newline.
  • Each element of data is copied into the condition object by name via a for loop over names(data); existing fields (message, call) can be overwritten if data contains entries named message or call.
  • The return value is governed by message, which returns NULL invisibly; gdpar_inform itself does not explicitly return.
  • Marked @keywords internal and @noRd; not exported.

require_suggested(pkg, reason)

Purpose

Internal helper that checks whether a Suggests-listed package is installed before attempting to use it. This enforces CRAN policy that Suggests packages may only be invoked through conditional loading (e.g., requireNamespace), never assumed to be present.

Arguments

  • pkg — Character scalar. The name of the suggested package to check.
  • reason — Character scalar. A human-readable description of what the package is needed for; embedded verbatim into the error message shown to the user.

Mathematics

The check is a simple predicate:

$$ \text{available}(\text{pkg}) = \texttt{requireNamespace}(\text{pkg},\ \texttt{quietly = TRUE}) $$

If $\text{available}(\text{pkg})$ is FALSE, an error is signaled.

Returns

Invisibly TRUE if the package is available. Otherwise, signals a condition of class c("gdpar_missing_dependency_error", "gdpar_error", "error", "condition") via gdpar_abort and does not return.

Notes

  • Availability is tested with requireNamespace(pkg, quietly = TRUE), which returns TRUE if the package can be loaded without attaching it and without printing status messages.
  • On failure, gdpar_abort is called with:
    • message: sprintf("Package '%s' is required to %s but is not installed. Please install it.", pkg, reason).
    • class: "gdpar_missing_dependency_error".
    • data: list(package = pkg, reason = reason), so handlers can programmatically access the offending package name and the reason string via cond$package and cond$reason.
  • Because the error is signaled through gdpar_abort, the call field of the condition is sys.call(-1) relative to gdpar_abort, i.e., the call to require_suggested.
  • Marked @keywords internal and @noRd; not exported.

R/utils-validation.R

assert_inherits(x, cls, arg_name)

Purpose Internal helper to enforce that an object belongs to one of the specified classes; used to validate arguments before proceeding with computation.

Arguments

  • x : Object to check. Type: ANY.
  • cls : Character vector of acceptable classes. Type: character.
  • arg_name : Character scalar with the argument name as it appears in the calling function's signature, used to construct an informative error message. Type: character.

Mathematics Not applicable.

Returns Invisibly returns TRUE on success. Otherwise, it signals a condition of class gdpar_input_error via gdpar_abort.

Notes

  • The condition is signaled using gdpar_abort with the class "gdpar_input_error".
  • The data list attached to the condition contains the original argument name, the expected classes, and the received class(es).
  • The check is performed using inherits(x, cls).

assert_numeric_scalar(x, arg_name, lower = -Inf, upper = Inf, allow_null = FALSE)

Purpose Internal helper to validate that an argument is a single, finite numeric value within a specified inclusive range, with optional allowance for NULL.

Arguments

  • x : Object to check. Type: ANY.
  • arg_name : Character scalar with the argument name. Type: character.
  • lower : Numeric scalar specifying the inclusive lower bound. Defaults to -Inf. Type: numeric.
  • upper : Numeric scalar specifying the inclusive upper bound. Defaults to Inf. Type: numeric.
  • allow_null : Logical flag. If TRUE, NULL is accepted without further checks. Defaults to FALSE. Type: logical.

Mathematics Not applicable.

Returns Invisibly returns TRUE on success. Otherwise, it signals a condition of class gdpar_input_error via gdpar_abort.

Notes

  • The function checks for NULL first, followed by numeric type, length, and finiteness.
  • The range check is inclusive: $x \in [lower, upper]$.
  • The error message for an out-of-range value includes the lower and upper bounds and the received value.

assert_count(x, arg_name, max = Inf)

Purpose Internal helper to validate that an argument is a positive integer scalar (i.e., a count) and optionally enforce a maximum value.

Arguments

  • x : Object to check. Type: ANY.
  • arg_name : Character scalar with the argument name. Type: character.
  • max : Integer upper bound. Defaults to Inf, meaning no upper bound. Type: numeric.

Mathematics Not applicable.

Returns Invisibly returns TRUE on success. Otherwise, it signals a condition of class gdpar_input_error via gdpar_abort.

Notes

  • A valid value must satisfy: is.numeric(x), length(x) == 1L, is.finite(x), x >= 1, and x must equal as.integer(x).
  • The check for integer equivalence is performed by comparing x to as.integer(x). Due to floating-point precision, this may reject non-integer numbers that are very close to integers.

assert_one_sided_formula(x, arg_name, allow_null = TRUE)

Purpose Internal helper to validate that an argument is a one-sided formula (e.g., ~ x + y).

Arguments

  • x : Object to check. Type: ANY.
  • arg_name : Character scalar with the argument name. Type: character.
  • allow_null : Logical flag. If TRUE, NULL is accepted. Defaults to TRUE. Type: logical.

Mathematics Not applicable.

Returns Invisibly returns TRUE on success. Otherwise, it signals a condition of class gdpar_input_error via gdpar_abort.

Notes

  • The check for a formula object is performed using inherits(x, "formula").
  • A one-sided formula in R is represented as an object of length 2 (the call to ~ and its single right-hand-side argument).

assert_data_frame(x, arg_name, required_vars = character())

Purpose Internal helper to validate that an argument is a data frame and that it contains specific columns.

Arguments

  • x : Object to check. Type: ANY.
  • arg_name : Character scalar with the argument name. Type: character.
  • required_vars : Character vector of variable names that must be present as columns in x. Defaults to an empty vector. Type: character.

Mathematics Not applicable.

Returns Invisibly returns TRUE on success. Otherwise, it signals a condition of class gdpar_input_error via gdpar_abort.

Notes

  • The function first checks for data frame structure using is.data.frame(x).
  • It then computes missing_vars <- setdiff(required_vars, colnames(x)).
  • If any required variables are missing, the error message lists them in single quotes.

R/W_basis.R

W_basis(type = c("polynomial", "bspline", "user"), degree = NULL, knots = NULL, df = NULL, boundary_knots = NULL, basis_fn = NULL, dim = NULL, p = NULL)

Purpose

Constructor for the S3 class W_basis. It defines the finite-dimensional functional space from which the modulating component $W:\Theta\to\mathbb{R}^{p\times d}$ of the AMM canonical form is drawn. The returned object bundles a closure (evaluator) that maps a value of theta_ref to a numeric basis vector, together with metadata (type, degree, knots, df, boundary_knots, dim, p, block_indices) consumed by downstream code-generation and pre-flight machinery.

Arguments

Argument Type Meaning
type Character scalar Basis type; resolved via match.arg to one of "polynomial", "bspline", "user".
degree Positive integer or NULL Polynomial degree (type = "polynomial") or B-spline degree (type = "bspline"). Defaults to 1L (polynomial) or 3L (bspline) when NULL. Ignored for "user".
knots Numeric vector or NULL Interior knots passed to splines::bs. Required for "bspline" when df is not supplied. Mutually exclusive with df.
df Positive integer or NULL Degrees of freedom passed to splines::bs. Required for "bspline" when knots is not supplied. Mutually exclusive with knots.
boundary_knots Numeric vector of length 2 or NULL Fixed boundary knots for the B-spline expansion. Ignored when type != "bspline".
basis_fn Function or NULL User-supplied evaluator; required for type = "user", ignored otherwise. Must accept a numeric vector of length p and return a numeric vector of length dim.
dim Positive integer or NULL Declared basis dimension; required for type = "user", computed automatically for the package-provided types.
p Positive integer or NULL Dimension of theta_ref. When non-NULL, the basis is eagerly materialized via materialize_W_basis() before return.

Mathematics

For type = "polynomial" the evaluator computes, for a numeric vector $\theta=(\theta_1,\ldots,\theta_{p_\theta})$,

$$ \text{evaluator}(\theta)= \begin{cases} (\theta_1,\theta_1^2,\ldots,\theta_1^{\text{degree}}) & p_\theta=1,\[4pt] \bigl(\theta_1,\theta_1^2,\ldots,\theta_1^{\text{degree}},;\theta_2,\ldots,\theta_2^{\text{degree}},;\ldots,;\theta_{p_\theta},\ldots,\theta_{p_\theta}^{\text{degree}}\bigr) & p_\theta>1. \end{cases} $$

Cross-terms between coordinates are excluded; the layout is block-by-coordinate.

For type = "bspline" the evaluator applies splines::bs independently to each scalar coordinate $\theta_k$ and concatenates the resulting basis vectors in coordinate order:

$$ \text{evaluator}(\theta)=\bigl(B(\theta_1),,B(\theta_2),,\ldots,,B(\theta_{p_\theta})\bigr), $$

where $B(\cdot)$ denotes the B-spline basis evaluated at a single point. When boundary_knots is NULL and knots is supplied, the boundary is set to range(c(knots, tk)) per evaluation point; when boundary_knots is supplied it is passed as Boundary.knots.

Returns

A list of class c("W_basis", "list") with components:

  • type — the resolved basis type string.
  • degree — integer (polynomial/bspline) or NA_integer_ (user).
  • knots — the supplied knot vector or NULL.
  • df — the supplied df or NULL.
  • boundary_knots — the supplied boundary knots or NULL.
  • dimNA_integer_ at construction for polynomial/bspline (populated later by materialize_W_basis()); the user-declared integer for type = "user".
  • evaluator — closure mapping theta_ref to a numeric vector.
  • pNULL unless p was supplied (then set by materialize_W_basis()).
  • block_indicesNULL unless materialized.

When p is supplied, materialize_W_basis(obj, p) is invoked prior to return, so dim, p, and block_indices are populated.

Notes

  • assert_count(p, "p") is called when p is non-NULL; p is then coerced to integer.
  • For type = "bspline": if both knots and df are NULL, an error of class "gdpar_input_error" is raised. If both are non-NULL, a mutual-exclusivity error is raised.
  • boundary_knots validation (when non-NULL): must be numeric, length exactly 2, both finite, and boundary_knots[1] < boundary_knots[2]; otherwise a "gdpar_input_error" is raised with data = list(received = boundary_knots). When knots is also supplied, the interior knots must satisfy min(knots) > boundary_knots[1] and max(knots) < boundary_knots[2]; violation raises a "gdpar_input_error" with data containing knots_range and boundary_knots.
  • require_suggested("splines", "evaluate B-spline bases") is invoked for the bspline branch.
  • For type = "user": basis_fn must pass is.function; otherwise a "gdpar_input_error" is raised. dim is validated via assert_count. The evaluator slot is set directly to basis_fn (no wrapper).
  • The degree argument is validated via assert_count for both polynomial and bspline branches (after default resolution).
  • No S3 methods are defined in this section; the class attribute is set manually as c("W_basis", "list").

materialize_W_basis(wb, p)

Purpose

Internal finalizer that records the total basis dimension dim, the theta_ref dimension p, and (for separable types) the per-coordinate block_indices on an existing W_basis object. For polynomial and bspline bases the dimension is derived analytically from constructor arguments; for user bases the evaluator is probed at rep(0.5, p) and the declared dim is validated against the actual output length.

Arguments

Argument Type Meaning
wb W_basis object The basis object to finalize. Validated via assert_inherits(wb, "W_basis", "wb").
p Positive integer Length of theta_ref. Validated via assert_count; coerced to integer.

Mathematics

For type = "polynomial":

$$ \text{per_k_dim} = \text{degree},\qquad \text{dim} = \text{per_k_dim}\cdot p, $$

$$ \text{block_indices}^{(k)} = \bigl((k-1),\text{per_k_dim}+1\bigr):\bigl(k,\text{per_k_dim}\bigr),\quad k=1,\ldots,p. $$

For type = "bspline":

$$ \text{per_k_dim}= \begin{cases} \text{df} & \text{if } \texttt{df}\text{ is supplied},\\ \text{length(knots)}+\text{degree} & \text{otherwise}, \end{cases} \qquad \text{dim} = \text{per_k_dim}\cdot p, $$

with block_indices given by the same formula as the polynomial case.

For type = "user": the evaluator is probed at $\theta^{\star}=(0.5,\ldots,0.5)\in\mathbb{R}^{p}$. The returned vector out must be numeric and satisfy $\text{length}(\text{out})=\texttt{wb\$dim}$. block_indices is set to NULL.

Returns

A modified W_basis object (mutated in place and returned) with dim, p, and block_indices populated. For user bases, block_indices remains NULL.

Notes

  • If wb$p is already non-NULL, non-NA, and differs from the supplied p, a "gdpar_input_error" is raised with data = list(wb_p = wb$p, requested_p = p).
  • For user bases, evaluator errors during probing are caught via tryCatch and re-raised as "gdpar_input_error" with data = list(probe = probe).
  • If the user evaluator returns a non-numeric object, a "gdpar_input_error" is raised with data = list(returned_class = class(out)).
  • If the user evaluator returns a vector whose length differs from wb$dim, a "gdpar_input_error" is raised (no data field).
  • Marked @keywords internal and @noRd; not exported.

as_per_k(wb)

Purpose

Documented (via roxygen) as an exported function that splits a separable multivariate W_basis (polynomial or bspline with $p&gt;1$) into a list of length p of univariate W_basis objects, each materialized with p = 1L. For type = "user" the function is documented to return NULL with a warning, since separability of a user-supplied evaluator cannot be inferred automatically.

Arguments

Argument Type Meaning
wb W_basis A basis object with p populated (either at construction or via materialize_W_basis()).

Returns

Per the roxygen: a list of length wb$p of W_basis objects (separable types), or NULL with a warning (type = "user").

Notes

  • The function body is not present in this section (section 1 of 2); only the roxygen documentation and @export tag appear. The implementation is deferred to a subsequent section. The description above reflects the documented contract, not verified source behavior.

as_per_k(wb)

Purpose: Decomposes a materialized multivariate W_basis (with p > 1) into a list of p per-coordinate W_basis objects, each with p = 1L. This supports the separable-multivariate Stan data assembly path, where each coordinate k gets its own scalar basis.

Arguments:

  • wb (W_basis): A materialized W basis object. Must inherit from "W_basis" and have a non-null, non-NA p slot.

Mathematics: For a separable basis of dimension $p$, produce $p$ independent univariate bases:

$$\text{out} = \big[,W^{(1)},, W^{(2)},, \ldots,, W^{(p)},\big], \quad W^{(k)} \text{ with } p = 1.$$

Each $W^{(k)}$ inherits the type-specific parameters (degree, knots, df, boundary_knots) of the parent.

Returns: A list of length wb$p whose elements are W_basis objects with p = 1L; or NULL for type == "user" (after a warning); or NULL for unrecognized types (implicit fall-through).

Notes:

  • Calls assert_inherits(wb, "W_basis", "wb") — raises an error if wb is not a W_basis.
  • If wb$p is NULL or NA, aborts via gdpar_abort with class "gdpar_input_error", instructing the user to either construct with p = ... or call materialize_W_basis(wb, p).
  • For type == "user": issues warning(...) (not an abort) and returns NULL, because separability cannot be inferred automatically for user-supplied bases.
  • For type == "polynomial": each child is W_basis(type = "polynomial", degree = wb$degree, p = 1L).
  • For type == "bspline": each child carries degree, knots, df, and boundary_knots from the parent, with p = 1L.
  • The loop variable k is unused inside the lapply body — it merely indexes position.

print.W_basis(x, ...)

Purpose: S3 print method for objects of class "W_basis". Renders a human-readable summary of the basis configuration to stdout.

Arguments:

  • x (W_basis): The basis object to print.
  • ... (any): Unused; present for S3 generic compatibility.

Returns: Invisibly returns x.

Notes:

  • Prints <W_basis> header followed by indented fields.
  • type is always printed.
  • degree is printed only when !is.na(x$degree) (so NA degree is suppressed).
  • knots is printed (comma-separated, via format()) only when !is.null(x$knots).
  • df is printed only when !is.null(x$df).
  • boundary_knots is printed (comma-separated) only when !is.null(x$boundary_knots).
  • dim is printed only when !is.na(x$dim).
  • p is printed when !is.null(x$p) && !is.na(x$p), with a contextual label:
    • " (univariate)" when x$p == 1L,
    • " (multivariate, separable)" when x$p != 1L and !is.null(x$block_indices),
    • " (multivariate)" otherwise.
  • block_indices is printed when non-NULL. Each element idx of the list is rendered as:
    • as.character(idx) when length(idx) == 1L,
    • sprintf("%d-%d", min(idx), max(idx)) otherwise. Elements are joined by ", ".
  • No side effects beyond console output.

.gdpar_resolve_bspline_knots(W)

Purpose: Resolves (derives) the interior knots of a B-spline W_basis from df and boundary_knots when knots is not explicitly provided. This ensures a deterministic knot vector for Stan-side Cox–de Boor recursion across HMC steps. Mutates and returns the input W.

Arguments:

  • W (W_basis): A W basis object, expected to have type, knots, df, degree, and boundary_knots slots.

Mathematics: Given degree $D$, degrees of freedom $\text{df}$, and boundary knots $[b_{\text{lo}}, b_{\text{hi}}]$:

$$n_{\text{int}} = \text{df} - D.$$

If $n_{\text{int}} &gt; 0$, the interior knots are equally spaced inside the boundary:

$$\text{grid} = \text{seq}!\left(b_{\text{lo}},, b_{\text{hi}},, \text{length.out} = n_{\text{int}} + 2\right),$$

$$\text{knots} = \text{grid}\big[2 \ldots (n_{\text{int}}+1)\big] = \text{grid}[-c(1,, n_{\text{int}}+2)].$$

If $n_{\text{int}} = 0$, knots is set to numeric(0L) (no interior knots).

Returns: The input W_basis object with W$knots populated (mutated in place and returned). Passthrough (unchanged) when W$type != "bspline" or when W$knots is already non-NULL.

Notes:

  • Early return (passthrough) if W$type != "bspline".
  • Early return (passthrough) if W$knots is already set (non-NULL).
  • If W$knots is NULL and either W$df or W$boundary_knots is NULL, aborts via gdpar_abort with class "gdpar_input_error" and data = list(df = W$df, boundary_knots = W$boundary_knots).
  • If n_int < 0L (i.e., df < degree), aborts via gdpar_abort with class "gdpar_input_error" and a formatted message.
  • If n_int == 0L, sets W$knots <- numeric(0L) and returns.
  • The function mutates W directly (R's copy-on-modify semantics apply to the caller's reference only if the caller reassigns the return value).

.gdpar_bspline_knots_full(W)

Purpose: Assembles the augmented (full) B-spline knot vector consumed by Stan-side Cox–de Boor recursion. This vector has clamped boundary knots repeated $D+1$ times at each end.

Arguments:

  • W (W_basis): A B-spline W basis object (must have type, boundary_knots, degree, and optionally knots/df).

Mathematics: Let $D$ be the degree, $b_{\text{lo}}, b_{\text{hi}}$ the boundary knots, and $\text{interior}$ the resolved interior knots (possibly empty). The augmented knot vector is:

$$t = \big(,\underbrace{b_{\text{lo}}, \ldots, b_{\text{lo}}}_{D+1},; \text{interior}_1, \ldots, \text{interior}_{n_{\text{int}}},; \underbrace{b_{\text{hi}}, \ldots, b_{\text{hi}}}_{D+1},\big),$$

with total length $n_{\text{int}} + 2(D+1)$. The number of Cox–de Boor B-splines is $n_{\text{int}} + D + 1$; dropping the first (to mirror splines::bs(intercept = FALSE)) yields per-coordinate df $= n_{\text{int}} + D$.

Returns: A numeric vector (the augmented knot vector). Returns numeric(0L) if W$type != "bspline".

Notes:

  • If W$type != "bspline", returns numeric(0L) immediately.
  • If W$boundary_knots is NULL, aborts via gdpar_abort with class "gdpar_internal_error".
  • Internally calls .gdpar_resolve_bspline_knots(W) to ensure interior knots are populated.
  • If the resolved knots slot is NULL (should not happen after resolution, but defensively handled), uses numeric(0L).
  • Does not mutate W; operates on a local copy W_resolved.

.gdpar_estimate_theta_ref_proxy_range(y_vec, family)

Purpose: Estimates a conservative range for $\theta_{\text{ref}}$ from the outcome vector, used by the boundary-range validator to check that the projected $\theta_{\text{ref}}$ range fits inside the B-spline boundary_knots. Applies the canonical-link inverse to the outcome as a proxy.

Arguments:

  • y_vec (numeric): The outcome/response vector.
  • family (list/environment): A family object with a stan_id field, used to determine the canonical inverse link.

Mathematics: Let $y$ be the finite subset of y_vec.

  1. If $|y| &lt; 2$, return $[-1, 1]$.
  2. Determine link_id via .gdpar_canonical_inv_link_id_slot1(family$stan_id):
    • link_id == 0 (identity / unknown): $z = y$.
    • link_id == 1 (logit): clamp $\tilde{y} = \min(\max(y, 10^{-6}),\, 1 - 10^{-6})$, then $z = \log\!\left(\frac{\tilde{y}}{1 - \tilde{y}}\right)$.
    • Otherwise (log): $z = \log(\max(y, 10^{-6}))$.
  3. Compute $m = [\min(z),\, \max(z)]$ and $s_z = \text{sd}(z)$.
  4. Fallbacks for $s_z$: if not finite or zero, $s_z = |m_2 - m_1| / 4$; if still not finite or zero, $s_z = 1$.
  5. Return $[m_1 - s_z,\; m_2 + s_z]$.

Returns: A length-2 numeric vector c(lower, upper) representing the estimated $\theta_{\text{ref}}$ proxy range.

Notes:

  • Filters y_vec to finite values via is.finite() before processing.
  • If fewer than 2 finite values remain, returns c(-1, 1) immediately.
  • The call to .gdpar_canonical_inv_link_id_slot1(family$stan_id) is wrapped in tryCatch; on error, link_id defaults to 0L (identity proxy). The docstring notes that unknown/exotic stan_ids fall back to identity with a warning, though the code itself does not issue a warning — the warning (if any) would come from .gdpar_canonical_inv_link_id_slot1.
  • na.rm = TRUE is used in range() and sd() as a secondary safety net.
  • The clamping constants 1e-6 and 1 - 1e-6 prevent log(0) and log(Inf) in the logit case.

.gdpar_resolve_W_stan_data(W, use_W, y_vec, family)

Purpose: Unified entry point that assembles the W-related Stan data fields, dispatching over W$type into a single integer identifier W_type_id (0 = off, 1 = polynomial, 2 = B-spline) plus auxiliary fields for Cox–de Boor recursion. Used by the three assemble_stan_data* variants.

Arguments:

  • W (W_basis or NULL): The W basis object.
  • use_W (integer/numeric): Flag indicating whether W is active. 0L means W is off.
  • y_vec (numeric): The outcome vector, passed to .gdpar_estimate_theta_ref_proxy_range for B-spline boundary validation.
  • family (list/environment): The family object, passed to .gdpar_estimate_theta_ref_proxy_range.

Mathematics: The returned W_type_id encodes the basis:

$$\text{W_type_id} = \begin{cases} 0 & \text{W off or NULL}, \ 1 & \text{polynomial}, \ 2 & \text{B-spline}. \end{cases}$$

For B-splines, the augmented knot vector $t$ (from .gdpar_bspline_knots_full) is passed as W_knots_full with length W_n_knots_full, and W_degree $= D$.

Returns: A named list of four fields, suitable for c()-merging into a stan_data list:

  • W_type_id (integer): 0, 1, or 2.
  • W_n_knots_full (integer): Length of the augmented knot vector (0 for polynomial/off).
  • W_knots_full (numeric): The augmented knot vector (empty for polynomial/off).
  • W_degree (integer): The basis degree (0 for off).

Notes:

  • If use_W == 0L or W is NULL, returns the empty/default list immediately (all zeros/empties).
  • For type == "polynomial": returns W_type_id = 1L, no knots, W_degree = as.integer(W$degree).
  • For type == "bspline":
    • If W$boundary_knots is NULL, aborts via gdpar_abort with class "gdpar_input_error".
    • Calls .gdpar_resolve_bspline_knots(W) to populate interior knots.
    • Calls .gdpar_estimate_theta_ref_proxy_range(y_vec, family) to get the projected $\theta_{\text{ref}}$ range.
    • Calls .gdpar_validate_bspline_boundary_range(W_resolved, proj_rng) to enforce the boundary containment contract.
    • Calls .gdpar_bspline_knots_full(W_resolved) to build the augmented knot vector.
    • Returns W_type_id = 2L, W_n_knots_full = length(knots_full), W_knots_full = as.double(knots_full), W_degree = as.integer(W_resolved$degree).
  • For any other type (including "user"): aborts via gdpar_abort with class "gdpar_unsupported_feature_error" and data = list(W_type = W$type).
  • The y_vec and family arguments are only used in the B-spline branch.

.gdpar_validate_bspline_boundary_range(W, projected_range)

Purpose: Validates that the B-spline boundary_knots contain the projected $\theta_{\text{ref}}$ range. This enforces the R-side contract (scoping decision D4) that the Stan-side Cox–de Boor recursion assumes: the projected $\theta_{\text{ref}}$ range must lie within the basis support.

Arguments:

  • W (W_basis): A W basis object with type and boundary_knots slots.
  • projected_range (numeric, length 2): The projected $\theta_{\text{ref}}$ range [lower, upper], typically from .gdpar_estimate_theta_ref_proxy_range.

Mathematics: Let $b = [b_{\text{lo}}, b_{\text{hi}}]$ be the boundary knots and $r = [r_{\text{lo}}, r_{\text{hi}}]$ the projected range. The validation passes (no abort) when:

$$r_{\text{lo}} \geq b_{\text{lo}} \quad \text{and} \quad r_{\text{hi}} \leq b_{\text{hi}}.$$

Equivalently, it aborts when $r_{\text{lo}} &lt; b_{\text{lo}}$ or $r_{\text{hi}} &gt; b_{\text{hi}}$. Note: the code allows equality at the boundary (i.e., $r_{\text{lo}} = b_{\text{lo}}$ or $r_{\text{hi}} = b_{\text{hi}}$ does not trigger an abort), which is slightly more permissive than the docstring's "strictly contain" language.

Returns: invisible(NULL) on success (or on early-exit conditions).

Notes:

  • Early return (no-op) if W$type != "bspline".
  • Early return (no-op) if W$boundary_knots is NULL.
  • Coerces projected_range to numeric via as.numeric().
  • If projected_range does not have length exactly 2, or if any element is non-finite, aborts via gdpar_abort with class "gdpar_internal_error" and data = list(received = projected_range). This is classified as an internal error (not input error), suggesting it should not arise from user action.
  • On boundary violation, aborts via gdpar_abort with class "gdpar_input_error" and data = list(boundary_knots = b, projected_range = rng). The message includes both ranges formatted with %g.
  • The function does not modify W.

← Part IV — Exhaustive Function Reference (6/7) · gdpar Wiki Home · Part V — Stan Templates (1/3) →

Clone this wiki locally