Skip to content

Part IV Function Reference 6

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 5 revisions

← Part IV — Exhaustive Function Reference (5/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (7/7) →


R/golden_compare.R

gdpar_golden_compare(observed, golden, k_sigma = 3, sanity_floor = NULL)

Purpose
Compares an observed gdpar fit snapshot against a persisted golden reference snapshot for regression testing. It evaluates four distinct layers (structural, discrete, continuous, sanity) and returns a comprehensive report of any deviations that exceed principled tolerances.

Arguments

  • observed : A list with the schema produced by gdpar_snapshot_fit() containing fields structural, discrete, continuous, sanity, and parametrization_resolved.
  • golden : A list of identical schema (typically loaded from an archived reference snapshot via readRDS()).
  • k_sigma : numeric scalar; multiplier for the Monte Carlo standard error tolerance in layer C (continuous). Default is 3.
  • sanity_floor : Optional named list overriding the default absolute thresholds for layer D (sanity). Defaults to list(rhat_max = 1.05, ess_bulk_min = 100, ess_tail_min = 100, divergent_pct = 0.01, ebfmi_min = 0.3).

Mathematics
For layer C (continuous), the comparison criterion for each continuous variable $v$ is: $$ | \text{obs}(v) - \text{exp}(v) | \le k_sigma \times \text{MC_SE}_{\text{golden}}(v) $$ where $\text{MC\_SE}_{\text{golden}}(v) = \frac{\text{sd}_{\text{golden}}(v)}{\sqrt{\text{ESS}_{\text{bulk, golden}}(v)}}$.

Returns
A list with:

  • passed : logical scalar; TRUE if and only if no failures are detected in any layer.
  • failures : A data.frame with one row per failure. Columns: layer (character), item (character), expected (character, compactly formatted), observed (character, compactly formatted), delta (numeric), threshold (numeric), severity (character).
  • by_layer : Named integer vector of failure counts per layer (structural, discrete, continuous, sanity). Missing layers are reported as 0.

Notes

  • Both observed and golden must be lists; otherwise an error of class gdpar_internal_error is raised via gdpar_abort().
  • All four layers are evaluated regardless of earlier failures; failures are aggregated, not short-circuited.
  • The function is marked experimental. The snapshot schema is versioned at schema_version = 1L and may change in future releases.
  • The tolerance contract (k_sigma and sanity floors) is subject to refinement until the first stable release.
  • No S3 dispatch is involved; it is a plain function.

format_compact(x)

Purpose
Internal helper to format any R value into a compact one-line string suitable for inclusion in the comparison report's expected and observed columns.

Arguments

  • x : Any R object.

Returns
A character string:

  • "<NULL>" if x is NULL.
  • "<empty>" if length(x) == 0.
  • "<list:N>" if x is a list of length N.
  • For character or logical vectors, elements are collapsed with commas.
  • For numeric vectors of length ≤ 4, values are formatted with formatC(digits = 6, format = "g") and trimmed of whitespace before collapsing with commas.
  • For numeric vectors of length > 4, returns "<num[N]>" where N is the length.
  • For all other objects, uses as.character() and collapses with commas.

Notes

  • This is an internal function (not exported) and is used solely to generate human-readable representations for the comparison report.
  • No errors are raised; it handles any input gracefully.

gdpar_snapshot_fit(fit)

Purpose
Extracts a four-layer reproducibility snapshot from a fitted gdpar_fit object. The snapshot is the canonical reference object for regression testing and is designed to be persisted via saveRDS() and later compared with gdpar_golden_compare().

Arguments

  • fit : A gdpar_fit object returned by gdpar().

Returns
A list with:

  • structural : Class signatures, slot shapes, and column names of the fit object (built by build_structural_snapshot_golden()).
  • discrete : Integer sampler diagnostics: n_divergent (total divergences), treedepth_max_n (number of transitions hitting maximum treedepth), n_leapfrog_total_per_chain (vector of total leapfrog steps per chain), treedepth_max_value (maximum treedepth observed), ebfmi_min (minimum E-BFMI across chains).
  • continuous : Named list of per-variable posterior summaries for variables theta_ref, a_coef, sigma_a, sigma_y, lp__ (if present). Each entry is a named list of elements (e.g., theta_ref[1]) with fields: mean, sd, ess_bulk, ess_tail, rhat, mc_se (Monte Carlo standard error = sd / sqrt(ess_bulk)).
  • sanity : Aggregated convergence floors (built by extract_sanity_golden()).
  • parametrization_resolved : List containing resolved CP/NCP flags (cp_a, cp_W, cp_a_per_k, cp_W_per_k) and the aggregation method (aggregation) from fit$parametrization$meta.

Notes

  • The function asserts that fit inherits from class gdpar_fit.
  • The snapshot schema is versioned at schema_version = 1L.
  • The function is marked experimental.

summarize_draws_var_golden(draws_array, var_name)

Purpose
Internal helper to compute per-element posterior summaries for all elements of a given variable (or variable prefix) in a draws_array object.

Arguments

  • draws_array : A draws_array object (from the posterior package) containing the MCMC draws.
  • var_name : Character scalar specifying the variable name or prefix (e.g., "theta_ref"). Elements are identified via regex ^var_name($|\\[).

Mathematics
For each element $v$ of the variable:

  • Mean: $\bar{v} = \frac{1}{N} \sum_{i=1}^N v_i$
  • Standard deviation: $\text{sd}(v) = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (v_i - \bar{v})^2}$
  • Bulk effective sample size (ESS): $\text{ESS}_{\text{bulk}}$
  • Tail effective sample size (ESS): $\text{ESS}_{\text{tail}}$
  • Potential scale reduction factor ($\hat{R}$)
  • Monte Carlo standard error: $\text{MC\_SE} = \frac{\text{sd}(v)}{\sqrt{\text{ESS}_{\text{bulk}}}}$

Returns

  • NULL if no elements of the variable are found.
  • Otherwise, a named list where each element corresponds to a variable element (e.g., "theta_ref[1]"). Each element is a list with components: mean, sd, ess_bulk, ess_tail, rhat, mc_se (all numeric scalars).

Notes

  • Uses posterior::summarise_draws() with custom summary functions for mean, sd, ESS bulk, ESS tail, and rhat.
  • The variable matching is done via regex, so "theta_ref" matches "theta_ref", "theta_ref[1]", "theta_ref[2]", etc.

extract_discrete_diag_golden(fit)

Purpose
Internal helper to extract discrete (integer-valued) sampler diagnostics from a gdpar_fit object.

Arguments

  • fit : A gdpar_fit object.

Returns
A named list:

  • n_divergent : Integer total number of divergent transitions across all chains.
  • treedepth_max_n : Integer count of transitions that hit the maximum treedepth.
  • n_leapfrog_total_per_chain : Integer vector of total leapfrog steps per chain (length = number of chains).
  • treedepth_max_value : Integer maximum treedepth observed across all draws and chains.
  • ebfmi_min : Numeric scalar minimum E-BFMI across chains.

Notes

  • Relies on cmdstanr methods: $sampler_diagnostics() and $diagnostic_summary().
  • The ebfmi_min is computed as the minimum across chains of the E-BFMI values returned by $diagnostic_summary().

extract_sanity_golden(draws_array, discrete)

Purpose Computes a set of summary statistics for MCMC diagnostics from a draws array and a discrete diagnostics object. This function aggregates key sampling quality metrics to be compared against floor values in sanity checks.

Arguments

  • draws_array: A 3D array of MCMC draws (iterations × chains × parameters), typically from posterior::as_draws_array.
  • discrete: A list containing discrete diagnostics from the sampler (e.g., from rstan::get_sampler_params). Expected to include n_leapfrog_total_per_chain, n_divergent, and ebfmi_min.

Mathematics The function computes the following:

  1. For each parameter, the R-hat ($\hat{R}$), bulk effective sample size (ESS), and tail ESS using the posterior package.
  2. The maximum R-hat across all parameters: $\hat{R}_{\text{max}} = \max_{\theta} \hat{R}(\theta)$
  3. The minimum bulk ESS: $\text{ESS}_{\text{bulk,min}} = \min_{\theta} \text{ESS}_{\text{bulk}}(\theta)$
  4. The minimum tail ESS: $\text{ESS}_{\text{tail,min}} = \min_{\theta} \text{ESS}_{\text{tail}}(\theta)$
  5. The divergent transition percentage: $\text{divergent\%} = \frac{n_{\text{divergent}}}{n_{\text{chains}} \times n_{\text{sampling}}}$ where $n_{\text{sampling}}$ is the number of post-warmup draws per chain.
  6. The minimum E-BFMI (energy Bayesian fraction of missing information): $\text{E-BFMI}_{\text{min}} = \min_{\text{chains}} \text{E-BFMI}$

Returns A named list with numeric elements: rhat_max, ess_bulk_min, ess_tail_min, divergent_pct, and ebfmi_min.

Notes

  • n_sampling is computed as ndraws(draws_array) / n_chains. If n_leapfrog_total_per_chain length does not match the number of chains in the draws array, the divergent percentage may be misinterpreted.
  • Parameters with missing R-hat values are excluded from the max R-hat computation (na.rm = TRUE).
  • This is an internal function; not exported.

build_structural_snapshot_golden(fit)

Purpose Captures a structural snapshot of a fitted gdpar model object. This snapshot is used to detect silent API changes (e.g., changes in class, slot shapes, or column names) before running numerical comparisons.

Arguments

  • fit: A fitted model object of a class recognized by the gdpar package (typically with a coef method).

Mathematics None.

Returns A named list containing:

  • fit_class: The class of fit.
  • coef_class: The class of the coef(fit) object.
  • p: An integer, the number of parameters from coef(fit)$p.
  • summary_stats: The summary statistics from coef(fit).
  • components: A character vector of names in coef(fit) excluding "p" and "summary_stats".
  • theta_ref_cols: Column names of the theta_ref matrix in the coefficients.
  • theta_ref_nrow: Number of rows in theta_ref.
  • a_class, a_length, a_per_k_cols: Class, length, and per-component column names of the a component (or NULL/NA).
  • b_class, b_length: Class and length of the b component (or NULL/NA).
  • W_class, W_length: Class and length of the W component (or NULL/NA).
  • parametrization_keys: A sorted character vector of names in fit$parametrization.

Notes

  • This is an internal function; not exported.
  • The snapshot is designed for use in golden_compare_structural.

golden_compare_structural(obs, exp, add)

Purpose Performs a structural comparison (Layer A) between an observed and an expected structural snapshot. It reports differences in classes, dimensions, and component names.

Arguments

  • obs: The observed structural snapshot (a list from build_structural_snapshot_golden).
  • exp: The expected (golden) structural snapshot (a list from build_structural_snapshot_golden).
  • add: A callback function that records differences. It is called as add(layer, key, expected, observed, ...).

Mathematics None.

Returns Invisible NULL. Side-effect: calls add for each detected discrepancy.

Notes

  • If both obs and exp are NULL, returns immediately.
  • If only one is NULL, records a snapshot_present mismatch.
  • Compares scalars (integer/character) by identity, ordered vectors by identity, and sets (like components and parametrization_keys) by set equality.
  • The a_per_k_cols field is compared by identity after formatting with format_compact (not defined here; presumably a helper).
  • This is an internal function; not exported.

golden_compare_discrete(obs, exp, add)

Purpose Performs a discrete comparison (Layer B) of sampler diagnostics between observed and expected snapshots. It checks integer-valued diagnostics for bit-exact equality.

Arguments

  • obs: The observed discrete diagnostics snapshot (a list).
  • exp: The expected discrete diagnostics snapshot (a list).
  • add: A callback function that records differences.

Mathematics None.

Returns Invisible NULL. Side-effect: calls add for each detected discrepancy.

Notes

  • If both obs and exp are NULL, returns immediately.
  • If only one is NULL, records a snapshot_present mismatch.
  • Checks the following integer keys: n_divergent, treedepth_max_n, treedepth_max_value.
  • Also checks n_leapfrog_total_per_chain if both are non-NULL, but note that this field may be a vector (one per chain). The comparison uses as.integer and identity, so it must match exactly in length and values.
  • The delta field in add is set as obs[[key]] - exp[[key]] for integer keys, but note that for n_leapfrog_total_per_chain this delta is not explicitly set (the default add call does not include delta).
  • This is an internal function; not exported.

golden_compare_continuous(obs, exp, k_sigma, add)

Purpose Performs a continuous comparison (Layer C) of parameter summaries. It flags any parameter whose observed mean deviates from the expected mean by more than k_sigma times the expected Monte Carlo standard error (MC_SE).

Arguments

  • obs: The observed continuous summaries (a nested list: group → variable → summary with mean and mc_se).
  • exp: The expected continuous summaries (same structure).
  • k_sigma: A numeric multiplier for the MC_SE threshold.
  • add: A callback function that records failures.

Mathematics For each common group and variable, the comparison is: $$ |\text{obs.mean} - \text{exp.mean}| > k_{\sigma} \times \max(\text{exp.mc_se}, \epsilon) $$ where $\epsilon = \texttt{.Machine\$double.eps}$ to avoid division by zero.

Returns Invisible NULL. Side-effect: calls add for any parameter failing the threshold.

Notes

  • If both obs and exp are NULL, returns immediately.
  • If only one is NULL, records a snapshot_present mismatch.
  • Only groups and variables present in both obs and exp are compared.
  • Parameters with missing mean or mc_se in either snapshot are skipped.
  • The threshold is the maximum of k_sigma * e$mc_se and a tiny epsilon to prevent zero thresholds when mc_se is zero.
  • The severity of a failure is set to "fail".
  • This is an internal function; not exported.

golden_compare_sanity(obs, floor, add)

Purpose Performs a sanity check (Layer D) against absolute floor values for MCMC diagnostics. It flags when observed diagnostics are worse than the floor.

Arguments

  • obs: The observed sanity diagnostics (a list with rhat_max, ess_bulk_min, ess_tail_min, divergent_pct, ebfmi_min).
  • floor: The floor (minimum acceptable) values for the same diagnostics.
  • add: A callback function that records failures.

Mathematics The checks are:

  • obs$rhat_max > floor$rhat_max
  • obs$ess_bulk_min < floor$ess_bulk_min
  • obs$ess_tail_min < floor$ess_tail_min
  • obs$divergent_pct > floor$divergent_pct
  • obs$ebfmi_min < floor$ebfmi_min

Returns Invisible NULL. Side-effect: calls add for each violated floor.

Notes

  • If obs is NULL, records a snapshot_present mismatch.
  • Only fields present in obs (non-NULL) are compared.
  • The delta and threshold fields in the add call are set accordingly (delta = obs - floor, threshold = floor value).
  • This is an internal function; not exported.

R/golden_helpers.R

.gdpar_golden_manifest_path(root_dir = NULL)

Purpose

Returns the filesystem path to the golden manifest CSV (golden_manifest.csv), the reproducibility-tracking file for golden snapshots. All other manifest helpers delegate to this function for path resolution.

Arguments

  • root_dir (character(1) or NULL): Explicit root directory in which the manifest is expected to reside. When NULL, the function searches a fixed set of candidate directories.

Returns

A character(1) path string. The resolution logic is:

  • If root_dir is NULL, iterate over the candidate vector c("tests/testthat/data", "data", "."); return file.path(candidate, "golden_manifest.csv") for the first candidate for which dir.exists() is TRUE. If no candidate directory exists, return "golden_manifest.csv" (a bare filename in the current working directory).
  • If root_dir is supplied, return file.path(root_dir, "golden_manifest.csv") unconditionally—no existence check is performed.

Notes

No directories are created. The candidate list is hardcoded; there is no fallback to recursive search.


.gdpar_golden_manifest_columns()

Purpose

Returns the canonical, ordered column schema of the golden manifest CSV. Used by init/read/add helpers to guarantee schema consistency.

Arguments

None.

Returns

A character vector of length 16, in order:

"family", "K", "basis_type", "p", "sub_phase", "bootstrap_date",
"cmdstan_version", "R_version", "DHARMa_version", "fit_code_hash",
"seed", "n_chains", "n_iter_warmup", "n_iter_sampling",
"rds_path", "n_obs"

Notes

Pure constant function; no side effects, no I/O.


.gdpar_golden_manifest_init(path = NULL)

Purpose

Ensures the manifest CSV exists with the canonical header. If the file is already present, the function is a no-op; otherwise it creates a zero-row CSV with the canonical columns.

Arguments

  • path (character(1) or NULL): Path to the manifest CSV. If NULL, delegates to .gdpar_golden_manifest_path().

Returns

Invisibly returns path (the resolved path string), regardless of whether the file was created or already existed.

Notes

  • Existence is checked via file.exists(path).
  • When creating, an empty data.frame is built with matrix(character(0), nrow = 0L, ncol = length(header), dimnames = list(NULL, header)), coerced via as.data.frame(..., stringsAsFactors = FALSE), and written with utils::write.csv(empty, path, row.names = FALSE).
  • Does not create parent directories; if the parent directory does not exist, write.csv will error.

.gdpar_golden_manifest_read(path = NULL)

Purpose

Reads the manifest CSV into a data.frame. Returns a schema-conformant empty frame if the file does not exist.

Arguments

  • path (character(1) or NULL): Path to the manifest CSV. If NULL, delegates to .gdpar_golden_manifest_path().

Returns

A data.frame:

  • If file.exists(path) is FALSE: an empty frame with 0 rows and the 16 canonical columns (all character), constructed identically to .gdpar_golden_manifest_init's empty frame.
  • If the file exists: utils::read.csv(path, stringsAsFactors = FALSE).

Notes

Unlike .gdpar_golden_manifest_init, this function does not create the file when it is absent—it merely returns an empty in-memory frame. The returned frame's column types depend on what read.csv infers; only the absent-file branch guarantees all-character columns.


.gdpar_golden_manifest_add(row_data, path = NULL)

Purpose

Appends a single row to the manifest CSV with idempotent upsert semantics on the composite key (family, K, basis_type, p): any pre-existing row sharing the same key is removed before the new row is appended.

Arguments

  • row_data (named list): Field values for the new manifest row. Must contain at least the four key fields family, K, basis_type, p. Additional fields matching the canonical column set are written; non-matching fields are silently dropped.
  • path (character(1) or NULL): Path to the manifest CSV. If NULL, delegates to .gdpar_golden_manifest_path().

Mathematics

The deduplication key is a pipe-delimited concatenation:

$$ \text{key} = \text{family} ;\vert; \text{K} ;\vert; \text{basis_type} ;\vert; \text{p} $$

For each existing row $i$, compute $\text{key}_i$; retain row $i$ if and only if $\text{key}_i \neq \text{key}_{\text{new}}$. The surviving rows are then row-bound with the new row.

Returns

Invisibly returns path.

Notes

  • Calls .gdpar_golden_manifest_init(path) first, guaranteeing the file exists.
  • Validates that row_data contains all four required keys (family, K, basis_type, p). If any are missing, calls gdpar_abort() with class "gdpar_internal_error" and a message of the form "Golden manifest row is missing required keys: 'x', 'y'." (missing keys formatted with sQuote and comma-collapsed).
  • Constructs a 1-row data.frame initialized entirely to NA_character_ across all 16 canonical columns, then fills in values from row_data for matching column names via as.character().
  • Deduplication: builds key_old from the existing frame's columns and key_new from the new row's columns; filters with keep <- key_old != key_new; subsets current[keep, , drop = FALSE].
  • Writes the combined frame via utils::write.csv(combined, path, row.names = FALSE).
  • Column ordering in the output is governed by the canonical column vector because row_df is constructed with that dimnames order and current was originally written in that order (though read.csv may reorder if the file on disk differs).

.gdpar_golden_fit_code_hash(fit_code_fn)

Purpose

Computes a reproducibility identifier (hash) for the fit-code closure that generated a golden snapshot, enabling drift detection between the generator at bootstrap time and at audit time.

Arguments

  • fit_code_fn (function): A closure whose deparsed source text serves as the hash input.

Mathematics

$$ \text{txt} = \text{paste}\bigl(\text{deparse}(\text{fit_code_fn}),; \text{collapse} = \text{"}\backslash\text{n"}\bigr) $$

$$ \text{hash}(\text{txt}) = \begin{cases} \text{digest::digest}(\text{txt},; \text{algo} = \text{"sha256"}) & \text{if \pkg{digest} is available} \[4pt] \text{"nohash_"} ,\Vert, \text{as.character}\bigl(\text{nchar}(\text{txt})\bigr) & \text{otherwise} \end{cases} $$

Returns

A character(1) string: the SHA-256 hex digest of txt when the digest package is installed, otherwise the fallback "nohash_<n>" where <n> is nchar(txt).

Notes

  • Availability is checked with requireNamespace("digest", quietly = TRUE).
  • The fallback hash is not collision-resistant; it encodes only the string length, so two distinct closures of identical deparsed length would produce the same fallback identifier.
  • deparse() may truncate or reformat long closures depending on R's deparse.max.lines / width options; the hash is therefore sensitive to the active R session's deparse settings.

.gdpar_cmdstan_version()

Purpose

Returns the installed cmdstan version string for inclusion in manifest metadata, or NA when cmdstanr is unavailable or the version query fails.

Arguments

None.

Returns

A character(1): as.character(cmdstanr::cmdstan_version()) if cmdstanr is installed and the call succeeds; NA_character_ otherwise.

Notes

  • Availability checked via requireNamespace("cmdstanr", quietly = TRUE).
  • The cmdstan_version() call is wrapped in tryCatch with an error handler returning NA_character_, so a broken cmdstan installation yields NA rather than a propagated error.
  • The result is coerced with as.character(), so a numeric or numeric_version result is stringified.

.gdpar_dharma_version()

Purpose

Returns the installed DHARMa package version string for manifest metadata, or NA when DHARMa is unavailable.

Arguments

None.

Returns

A character(1): as.character(utils::packageVersion("DHARMa")) if DHARMa is installed; NA_character_ otherwise.

Notes

  • Availability checked via requireNamespace("DHARMa", quietly = TRUE).
  • Unlike .gdpar_cmdstan_version(), there is no tryCatch wrapper; packageVersion is expected not to throw once the namespace check passes.

.gdpar_golden_rds_name(family, K, basis_type = "polynomial", p = 1L)

Purpose

Generates the canonical RDS filename for a (non-EB) golden snapshot, keyed by the quadruple (family, K, basis_type, p). Matches the naming convention of the frozen goldens from Sub-phases 8.3.4 / 8.3.5a / 8.3.3 Unit 3.

Arguments

  • family (character(1)): Canonical family name (e.g., "gaussian").
  • K (scalar, numeric/integer): Number of mixture components.
  • basis_type (character(1)): Basis type label; defaults to "polynomial".
  • p (scalar, numeric/integer): Basis dimension / polynomial degree; defaults to 1L.

Mathematics

$$ \text{base} = \text{sprintf}!\bigl(\text{"golden_K%d_%s_K%d"},; K,; \text{family},; K\bigr) $$

$$ \text{if } \text{basis_type} \neq \text{"polynomial"}:\quad \text{base} \leftarrow \text{base} ,\Vert, \text{"_"} ,\Vert, \text{basis_type} $$

$$ \text{if } p > 1:\quad \text{base} \leftarrow \text{sprintf}!\bigl(\text{"%s_p%d"},; \text{base},; p\bigr) $$

$$ \text{filename} = \text{base} ,\Vert, \text{".rds"} $$

Returns

A character(1) filename string ending in .rds.

Notes

  • The basis_type suffix is omitted only when basis_type == "polynomial" (exact string equality). Any other value—including "Polynomial" with different casing—triggers the suffix.
  • The p suffix is omitted when p <= 1L (the check is p > 1L). A non-integer p such as 1.5 would trigger the suffix.
  • K is interpolated twice into the base name (both before and after the family name), producing patterns like golden_K2_gaussian_K2.rds.

.gdpar_golden_eb_rds_name(family, regime = "poly")

Purpose

Generates the canonical RDS filename for an EB (empirical-Bayes) golden snapshot, keyed by the (family, regime) pair of Sub-phase 8.6.B.

Arguments

  • family (character(1)): Canonical family name (e.g., "gaussian", "zip").
  • regime (character(1)): Regime label identifying the configuration; defaults to "poly". Typical values: "poly", "bspline", "guard".

Returns

A character(1) of the form:

$$ \text{filename} = \text{"golden_eb_"} ,\Vert, \text{family} ,\Vert, \text{"_"} ,\Vert, \text{regime} ,\Vert, \text{".rds"} $$

Notes

No conditional logic; the pattern is unconditional. Examples: golden_eb_gaussian_poly.rds, golden_eb_gaussian_bspline.rds, golden_eb_zip_guard.rds.


.gdpar_golden_roster_8_6_B()

Purpose

Returns the canonical 17-configuration roster for Sub-phase 8.6.B (Charter Section 3.2 / parcial handoff Section 5.1 / D33 of the closure session). The roster partitions into 5 fitable and 12 guard-rejected configurations, each annotated with family, regime, status, guard class, and rationale.

Arguments

None.

Returns

A list of length 17, where each element is a named list with exactly five components:

Component Type Description
family character(1) Canonical family name
regime character(1) Filename suffix
status character(1) "fitable" or "guard"
guard_class character(1) or NA_character_ NA_character_ for fitable; canonical error class for guarded
rationale character(1) One-line description

The 17 entries, in order:

Fitable (5):

# family regime status guard_class rationale
1 gaussian poly fitable NA_character_ K=1, polynomial W, identity link
2 poisson poly fitable NA_character_ K=1, polynomial W, log link
3 neg_binomial_2 poly fitable NA_character_ K=1 with phi as global dispersion, log link
4 bernoulli poly fitable NA_character_ K=1, polynomial W, logit link
5 gaussian bspline fitable NA_character_ K=1, B-spline W (Sub-phase 8.3.8 path)

Guard-rejected, stan_id D33 (5):

# family regime status guard_class rationale
6 beta guard guard gdpar_unsupported_feature_error stan_id=5 outside EB-supported set (D33)
7 gamma guard guard gdpar_unsupported_feature_error stan_id=6 outside EB-supported set (D33)
8 student_t guard guard gdpar_unsupported_feature_error stan_id=8 outside EB-supported set (D33)
9 tweedie guard guard gdpar_unsupported_feature_error stan_id=9 outside EB-supported set (D33)
10 lognormal_loc_scale guard guard gdpar_input_error not in gdpar_family() registry (K>=2 only)

Guard-rejected, min_K (4):

# family regime status guard_class rationale
11 zip guard guard gdpar_unsupported_feature_error min_K>1 (mixture; deferred to 8.6.C)
12 zinb guard guard gdpar_unsupported_feature_error min_K>1 (mixture; deferred to 8.6.C)
13 hurdle_poisson guard guard gdpar_unsupported_feature_error min_K>1 (hurdle; deferred to 8.6.C)
14 hurdle_neg_binomial_2 guard guard gdpar_unsupported_feature_error min_K>1 (hurdle; deferred to 8.6.C)

Guard-rejected, heterogeneous amm K>1 (3):

# family regime status guard_class rationale
15 het_beta_gamma_K2 guard guard gdpar_unsupported_feature_error heterogeneous K=2 amm list; deferred to 8.6.C
16 het_tweedie_K3 guard guard gdpar_unsupported_feature_error heterogeneous K=3 amm list; deferred to 8.6.C
17 het_student_t_K3 guard guard gdpar_unsupported_feature_error heterogeneous K=3 amm list; deferred to 8.6.C

Notes

Pure data function; no I/O, no side effects. The roster is entirely hardcoded. Entry 10 (lognormal_loc_scale) is the only guard entry whose guard_class is gdpar_input_error rather than gdpar_unsupported_feature_error.


.gdpar_golden_roster_8_6_C()

Purpose (from roxygen preamble only; function body not present in this section)

Intended to return the canonical 17-configuration roster for Sub-phase 8.6.C (Charter Section 3.3, apertura Section 4.4, D34 canonization), covering two multivariate paths:

  • Path A ($K = 1,\; p &gt; 1$): 6 fitable (Gaussian / Poisson / Negative Binomial / Bernoulli on amm_eb_marginal_multi.stan; stan_id in $\{1,2,3,4\}$) + 4 guard (Beta / Gamma / Student-t / Tweedie, rejected because Path A's supported set inherits amm_distrib_multi.stan $\{1,2,3,4\}$).
  • Path B ($K &gt; 1,\; p = 1$): 7 fitable spanning Beta K=2, Gamma K=2, Student-t K=3, Tweedie K=3, Lognormal-loc-scale K=2, ZIP K=2, and heterogeneous Gauss+Beta K=2.

Each entry is documented as carrying: family, regime ("polyP2" / "polyP3" / "bsplineP2" / "K2" / "K3" / "pathA_p2_guard"), status ("fitable" / "guard"), path ("A" or "B"), K, p, guard_class, rationale. The documented filename pattern is golden_eb_<family>_path<A|B>_<regime>.rds.

Notes

Only the roxygen documentation block is present in this section; the function definition (function(...) { ... }) is not included. The behavior described above reflects the roxygen comments and cannot be verified against source in this section.

.gdpar_golden_roster_8_6_C()

Purpose

Factory that returns the canonical roster (a hard-coded list of configuration descriptors) for Sub-phase 8.6.C of the golden-reference test infrastructure. This roster enumerates the family/regime/path/K/p combinations that are either "fitable" (i.e. backed by a real Stan model and eligible for a golden RDS artifact) or "guard" (i.e. explicitly rejected with a documented error class) across two disjoint modeling paths:

  • Path A — single-regime models with $K = 1$ and polynomial/B-spline order $p &gt; 1$.
  • Path B — multi-regime models with $K &gt; 1$ and $p = 1$.

The roster is consumed downstream by golden-artifact generation and validation logic to iterate over the supported and guarded configuration space without colliding with the 8.6.B or 8.6.D namespaces.

Arguments

None. The function is a zero-argument closure that returns a literal data structure.

Returns

A list of length 17, each element being a named list with the following fields:

Field Type Meaning
family character Canonical distribution family name (e.g. "gaussian", "neg_binomial_2", "het_gauss_beta_K2").
regime character Filename suffix / regime tag (e.g. "polyP2", "bsplineP2", "polyP3", "K2", "K3", "polyP2_guard").
status character Either "fitable" (real Stan-backed configuration) or "guard" (explicitly rejected configuration).
path character Modeling path tag: "A" or "B".
K integer Number of regimes. 1L for all Path A entries; 2L or 3L for Path B entries.
p integer Basis-function order. 2L or 3L for Path A; 1L for Path B.
guard_class character NA_character_ for fitable entries; "gdpar_unsupported_feature_error" for guard entries.
rationale character Human-readable description embedding the corresponding stan_id and the path/K/p summary.

The 17 entries decompose as:

  • Path A fitable (6)gaussian/polyP2 (stan_id 1), gaussian/bsplineP2 (stan_id 1), gaussian/polyP3 (stan_id 1), poisson/polyP2 (stan_id 2), neg_binomial_2/polyP2 (stan_id 3), bernoulli/polyP2 (stan_id 4). All have K = 1L, p = 2L (except polyP3 which has p = 3L).
  • Path A guard (4)beta/polyP2_guard (stan_id 5), gamma/polyP2_guard (stan_id 6), student_t/polyP2_guard (stan_id 8), tweedie/polyP2_guard (stan_id 9). All have K = 1L, p = 2L, guard_class = "gdpar_unsupported_feature_error", with rationale text "outside Path A supported set (K=1)".
  • Path B fitable (7)beta/K2 (stan_id 5, $K=2$), gamma/K2 (stan_id 6, $K=2$), student_t/K3 (stan_id 8, $K=3$), tweedie/K3 (stan_id 9, $K=3$), lognormal_loc_scale/K2 (stan_id 7, $K=2$), zip/K2 (stan_id 10, $K=2$), het_gauss_beta_K2/K2 (named-list family, $K=2$). All have p = 1L.

Notes

  • The roster is purely declarative; it performs no validation, I/O, or model fitting. Side effects: none.
  • The stan_id values cited in the rationale strings are documentation-only metadata; they are not stored as a separate field.
  • The heterogeneous-family entry het_gauss_beta_K2 uses the family name itself suffixed with _K2, distinguishing it from the Path B het_gauss_beta entry that appears in other rosters.
  • The Path A guard entries document families that are supported under Path B (e.g. beta, gamma, student_t, tweedie) but are rejected when combined with $K = 1$ and $p &gt; 1$.
  • No errors are raised by this function under any input (it takes no input).

.gdpar_golden_eb_rds_name_8_6_C(family, path, regime)

Purpose

Generates the canonical on-disk filename for an empirical-Bayes (EB) golden RDS artifact belonging to Sub-phase 8.6.C. The path tag ("A" or "B") is embedded in the basename so that 8.6.C artifacts do not collide on disk with the parallel 8.6.B namespace produced by .gdpar_golden_eb_rds_name.

Arguments

Argument Type Meaning
family character (scalar) Canonical distribution family name, e.g. "gaussian", "beta", "het_gauss_beta_K2".
path character (scalar) Modeling path tag, expected to be "A" or "B". Inserted verbatim after path in the format string.
regime character (scalar) Regime/filename suffix, e.g. "polyP2", "bsplineP2", "K2", "polyP2_guard".

Mathematics

The filename is produced by the format string

$$ \texttt{filename} = \texttt{sprintf}!\left(\texttt{"golden_eb_%s_path%s_%s.rds"},; \textit{family},; \textit{path},; \textit{regime}\right) $$

yielding, for example, golden_eb_gaussian_pathA_polyP2.rds or golden_eb_het_gauss_beta_K2_pathB_K2.rds.

Returns

A length-1 character string — the canonical RDS filename (basename only, no directory prefix).

Notes

  • No validation of family, path, or regime is performed; any string is substituted verbatim.
  • The function is @keywords internal and @noRd; it is not exported and not documented in the user-facing manual.
  • Because the path tag is embedded, two configurations sharing the same (family, regime) but different paths (e.g. a hypothetical gaussian/polyP2 under Path A vs. Path C) produce distinct filenames.
  • The extension is always .rds.

.gdpar_golden_roster_8_6_D()

Purpose

Factory that returns the canonical roster for Sub-phase 8.6.D, covering Path C — models with $K &gt; 1$ and $p &gt; 1$ simultaneously (the "both greater than one" frontier). Per the roxygen documentation, this sub-phase encodes decisions D36 ($\alpha$), D37 (Session 13a, item (i)), D38'' (item (h)), D39, D40', D41 (Session 13b), and D43 (Session 13c, item (a)).

The initial iteration covers stan_id ∈ {1, 3} (Gaussian and NB2) at $K = 2$ crossed with $p ∈ \{2, 3\}$, yielding 4 fitable configurations. An additional 4 guard entries document the deferred status of Beta, Gamma, Student-t, and ZIP at $p = 2$ under decision D40', with the explicit numerical caveat of "opening Section 6.1."

Arguments

None.

Returns

A list of length 8, each element a named list with fields family, regime, status, path, K, p, guard_class, and rationale (same schema as .gdpar_golden_roster_8_6_C). All entries have path = "C".

The 8 entries decompose as:

  • Path C fitable (4) — all with guard_class = NA_character_:
    • gaussian / polyP2, $K = 2$, $p = 2$ (stan_id 1)
    • gaussian / polyP3, $K = 2$, $p = 3$ (stan_id 1)
    • neg_binomial_2 / polyP2, $K = 2$, $p = 2$ (stan_id 3)
    • neg_binomial_2 / polyP3, $K = 2$, $p = 3$ (stan_id 3)
  • Path C guard (4) — all with guard_class = "gdpar_unsupported_feature_error" and rationale citing "deferred under D40' (numerical caveat 6.1)":
    • beta / polyP2_guard, $K = 2$, $p = 2$ (stan_id 5)
    • gamma / polyP2_guard, $K = 2$, $p = 2$ (stan_id 6)
    • student_t / polyP2_guard, $K = 3$, $p = 2$ (stan_id 8)
    • zip / polyP2_guard, $K = 2$, $p = 2$ (stan_id 10)

Notes

  • The student_t guard entry is the only Path C entry with $K = 3$; all others have $K = 2$. This mirrors the Path B roster where student_t is registered at $K = 3$.
  • Heterogeneous-$p$ across slots is explicitly not represented in this initial roster; the roxygen documents it as "follow-on debt against Block 9.x."
  • The roxygen references stan_id values {5, 6, 7, 8, 9, 10, 11, 12, 13} as "other Path B families" that are guard-rejected; however, the actual returned list only contains guard entries for stan_ids 5, 6, 8, and 10. The broader set mentioned in the documentation describes the conceptual deferred space, not the literal list contents.
  • Filename pattern for artifacts derived from this roster is golden_eb_<family>_pathC_<regime>.rds via .gdpar_golden_eb_rds_name_8_6_D.
  • No side effects; no errors raised.

.gdpar_golden_eb_rds_name_8_6_D(family, regime)

Purpose

Generates the canonical on-disk filename for an EB golden RDS artifact belonging to Sub-phase 8.6.D (Path C). Mirrors .gdpar_golden_eb_rds_name_8_6_C but hardcodes the path tag to "C", accepting only family and regime as arguments. This ensures the 8.6.B, 8.6.C, and 8.6.D namespaces remain disjoint on disk.

Arguments

Argument Type Meaning
family character (scalar) Canonical distribution family name, e.g. "gaussian", "neg_binomial_2", "beta".
regime character (scalar) Regime/filename suffix, e.g. "polyP2", "polyP3", "polyP2_guard".

Mathematics

$$ \texttt{filename} = \texttt{sprintf}!\left(\texttt{"golden_eb_%s_pathC_%s.rds"},; \textit{family},; \textit{regime}\right) $$

Returns

A length-1 character string — the canonical RDS basename, e.g. golden_eb_gaussian_pathC_polyP2.rds or golden_eb_beta_pathC_polyP2_guard.rds.

Notes

  • Unlike .gdpar_golden_eb_rds_name_8_6_C, this function does not accept a path argument; the path is fixed to "C".
  • No input validation is performed.
  • The function is @keywords internal and @noRd.
  • The extension is always .rds.

.gdpar_golden_roster_8_3_9()

Purpose

Factory that returns the canonical 14-configuration roster for sub-phase 8.3.9, described in the roxygen as "Decision D1 = (1D), 13 atrasados + K=1+p=2 polynomial baseline." This roster aggregates the "atrasados" (deferred/late) configurations from sub-phases 8.3.4 through 8.3.8 plus the 8.3.9 baseline, providing a unified iteration surface for golden-artifact generation across the heterogeneous family/basis/K/p space.

Arguments

None.

Returns

A list of length 14. Each element is a named list with fields:

Field Type Meaning
family character Canonical distribution family name.
K integer Number of regimes.
basis_type character Either "polynomial" or "bspline".
p integer Basis-function order.
sub_phase character Originating sub-phase tag, e.g. "8.3.4", "8.3.5a", "8.3.6", "8.3.7", "8.3.8", "8.3.9".

The 14 entries, in order, are:

# family K basis_type p sub_phase
1 lognormal_loc_scale 2 polynomial 1 8.3.4
2 student_t 3 polynomial 1 8.3.5a
3 tweedie 3 polynomial 1 8.3.5b
4 zip 2 polynomial 1 8.3.6
5 zinb 3 polynomial 1 8.3.6
6 hurdle_poisson 2 polynomial 1 8.3.6
7 hurdle_neg_binomial_2 3 polynomial 1 8.3.6
8 het_gauss_beta 2 polynomial 1 8.3.7
9 het_gauss_gamma 2 polynomial 1 8.3.7
10 het_nb_beta 2 polynomial 1 8.3.7
11 gaussian 1 bspline 1 8.3.8
12 het_gauss_beta 2 bspline 1 8.3.8
13 gaussian 1 bspline 2 8.3.8
14 gaussian 1 polynomial 2 8.3.9

Notes

  • The roxygen comment for entry 1 notes that gaussian, beta, and gamma "already exist on disk" and are therefore excluded from the 8.3.4 atrasados; only lognormal_loc_scale at $K = 2$ is registered as the remaining gap.
  • This roster uses a different schema from the 8.6.C / 8.6.D rosters: it has no status, path, guard_class, or rationale fields, and it adds basis_type and sub_phase fields instead. All entries are implicitly "fitable" (there are no guard entries).
  • Sub-phase 8.3.6 contributes 4 mixture-family entries (ZIP, ZINB, hurdle-Poisson, hurdle-NB2), alternating $K = 2$ and $K = 3$.
  • Sub-phase 8.3.7 contributes 3 heterogeneous-family entries, all at $K = 2$.
  • Sub-phase 8.3.8 contributes 3 B-spline entries; notably entry 13 (gaussian, $K = 1$, bspline, $p = 2$) is the only entry in this roster with $p = 2$ besides the 8.3.9 baseline.
  • The 8.3.9 baseline (entry 14) is the only entry with basis_type = "polynomial" and $p = 2$; the roxygen describes it as "filling the last gap."
  • No side effects; no errors raised; no input validation (zero-argument function).

R/ksd_joint.R

gdpar_ksd_joint(eb_fit, fb_fit, kernel = c("imq", "rbf"), bandwidth = c("median", "fixed"), bandwidth_value = NULL, beta = -0.5, ess_weighted = FALSE, seed = NULL, ...)

Purpose
Computes the joint kernel Stein discrepancy (KSD) between the EB (empirical Bayes) and FB (full Bayes) posteriors. This is a density-free spectral metric for multivariate discrepancy between the two posterior distributions, addressing the limitation of marginal total-variation (TV) distance in capturing joint dependence structure.

Arguments

  • eb_fit: Object of class gdpar_eb_fit (from gdpar_eb()).
  • fb_fit: Object of class gdpar_fit (from gdpar()). Must be fitted on the same dataset.
  • kernel: Character scalar; kernel type: "imq" (inverse multi-quadric, default) or "rbf" (radial basis function).
  • bandwidth: Character scalar; bandwidth selection: "median" (median heuristic on FB pairwise squared distances, default) or "fixed".
  • bandwidth_value: Numeric scalar; bandwidth value (squared units) when bandwidth = "fixed". Defaults to 1.0 if NULL.
  • beta: Numeric scalar in (-1, 0); exponent for IMQ kernel. Default -0.5. Ignored for RBF.
  • ess_weighted: Logical scalar; if TRUE, thin both posteriors to common effective sample size (ESS) count. Default FALSE.
  • seed: Integer scalar or NULL; RNG seed for thinning when ess_weighted = TRUE.
  • ...: Reserved; currently unused.

Mathematics
The KSD uses an empirical Gaussian target $P$ derived from FB posterior draws: mean $\hat{\mu}$, covariance $\hat{\Sigma}$. The score function is $s(x) = -\hat{\Sigma}^{-1}(x - \hat{\mu})$.

For kernel $k(x, y)$:

  • IMQ: $k(x, y) = (h + \|x - y\|^2)^\beta$, $\beta \in (-1, 0)$, default $\beta = -1/2$.
  • RBF: $k(x, y) = \exp(-\|x - y\|^2 / (2h))$.

The Stein kernel under Gaussian target is: $$ k_p(x, y) = \langle s(x), s(y) \rangle k(x, y) + \langle s(x), \nabla_y k(x, y) \rangle + \langle s(y), \nabla_x k(x, y) \rangle + \mathrm{tr}(\nabla_x \nabla_y^\top k(x, y)). $$

The KSD V-statistic is: $$ \mathrm{KSD}(Q, P) = \sqrt{\max\left{0,, n^{-2} \sum_{i, j = 1}^{n} k_p(x_i, x_j)\right}}, \qquad x_i \sim Q. $$ Here $Q$ is the EB posterior (thinned if ess_weighted = TRUE), and $n$ is the (possibly thinned) sample size.

When ess_weighted = TRUE, both posteriors are thinned to $n_{\text{eff}} = \min\{\widehat{\mathrm{ESS}}_{EB}, \widehat{\mathrm{ESS}}_{FB}\}$ via uniform subsampling, where ESS is computed as the minimum basic ESS across retained $\xi$-variables using posterior::ess_basic.

Returns
An object of class gdpar_ksd_joint (list) with components:

  • ksd_value: Numeric scalar; $\mathrm{KSD}$ (square root, clamped $\ge 0$).
  • ksd_squared: Numeric scalar; raw V-statistic before clamping.
  • kernel, bandwidth, bandwidth_value, beta: Settings used.
  • n_eb_draws, n_fb_draws, n_dim: Integer scalars; sample sizes and dimension after thinning.
  • target_mu, target_Sigma: Empirical mean and covariance of FB draws (Gaussian target).
  • ess_weighted, thinned_to: Logical and integer; thinning configuration.
  • vars: Character vector; common $\xi$-variables used.
  • call: The matched call.

Notes

  • Raises errors (via gdpar_abort()) if: inputs are not of correct classes; $\beta$ not in (-1, 0); bandwidth_value non-positive when bandwidth = "fixed"; ess_weighted or seed not scalar logical/numeric; no common $\xi$-variables; ESS-thinned $n_{\text{eff}} &lt; 2$.
  • Side effects: sets RNG seed if seed and ess_weighted = TRUE.
  • S3 methods: print.gdpar_ksd_joint and summary.gdpar_ksd_joint exist (documented separately).

.gdpar_ksd_validate_inputs(eb_fit, fb_fit, kernel, bandwidth, bandwidth_value, beta, ess_weighted, seed)

Purpose
Internal validation helper for gdpar_ksd_joint(). Checks argument types and ranges, aborting with informative errors on invalid inputs.

Arguments

  • eb_fit: Expected to be gdpar_eb_fit; validated by class.
  • fb_fit: Expected to be gdpar_fit; validated by class.
  • kernel: Character scalar; validated by match.arg() upstream.
  • bandwidth: Character scalar; validated by match.arg() upstream.
  • bandwidth_value: Numeric scalar or NULL; if non-NULL and bandwidth = "fixed", must be positive.
  • beta: Numeric scalar; must be in (-1, 0).
  • ess_weighted: Logical scalar; must be TRUE or FALSE.
  • seed: Numeric scalar or NULL; if non-NULL, must be a single numeric.

Mathematics
None.

Returns
NULL invisibly (called for side effects).

Notes

  • Raises errors via gdpar_abort() with class "gdpar_input_error" and relevant data.
  • Does not check consistency of eb_fit and fb_fit (e.g., same dataset) – that is left to the caller.

.gdpar_ksd_median_squared_distance(X)

Purpose. Computes a robust, scale-equivariant bandwidth heuristic for kernel-based discrepancy measures. Given a matrix of samples, it returns the median of all pairwise squared Euclidean distances among distinct pairs, serving as the default bandwidth $h$ (or a derived quantity thereof) for the Stein kernel in the KSD computation.

Arguments.

Argument Type Meaning
X matrix $(n \times d)$ Sample matrix whose rows are observations in $\mathbb{R}^d$.

Mathematics. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be the rows of X. The function first constructs the $n \times n$ matrix of squared Euclidean distances:

$$D^2_{ij} = |\mathbf{x}_i - \mathbf{x}_j|^2 = |\mathbf{x}_i|^2 + |\mathbf{x}_j|^2 - 2,\mathbf{x}_i^\top \mathbf{x}_j$$

implemented via the identity outer(ss, ss, "+") - 2 * tcrossprod(X) where ss = rowSums(X * X) stores $\|\mathbf{x}_i\|^2$. Any negative entries (numerical artifacts) are clamped to zero. The function extracts the strictly upper-triangular entries $\{D^2_{ij} : i &lt; j\}$ and returns:

$$h_{\text{med}} = \mathrm{median}!\bigl({D^2_{ij}}_{i<j}\bigr)$$

Returns. A numeric scalar:

  • 1.0 if $n &lt; 2$ (too few samples to form a pair).
  • 1.0 if the computed median is non-finite or $\le 0$ (degenerate configuration).
  • Otherwise, the median pairwise squared distance (always $&gt; 0$ under the guard).

Notes.

  • Purely internal (not exported, no @export tag).
  • The clamping D2[D2 < 0.0] <- 0.0 guards against floating-point negativity from the algebraic identity when points nearly coincide.
  • No side effects; does not modify its argument.

.gdpar_ksd_safe_inverse(Sigma)

Purpose. Computes the inverse of a covariance (or precision-related) matrix $\Sigma$ in a numerically robust fashion. Used internally to obtain the precision matrix $\Lambda = \Sigma^{-1}$ for the Gaussian-target score $s(x) = -\Lambda(x - \mu)$ in the KSD computation. If direct Cholesky inversion fails due to near-singularity, it retries after adding a ridge.

Arguments.

Argument Type Meaning
Sigma matrix $(d \times d)$ Symmetric positive-(semi)definite covariance matrix to be inverted.

Mathematics. The function attempts to compute $\Sigma^{-1}$ via Cholesky factorisation:

  1. First try: $\Sigma^{-1} = (\mathbf{L}\mathbf{L}^\top)^{-1}$ using chol2inv(chol(Sigma)).
  2. On failure, form the ridged matrix: $$\tilde{\Sigma} = \Sigma + \varepsilon,\mathbf{I}_d, \qquad \varepsilon = \max!\bigl(10^{-8},; 10^{-10}\cdot \mathrm{mean}(\mathrm{diag}(\Sigma))\bigr)$$ and retry chol2inv(chol(Sigma_ridged)).
  3. If the ridged inversion also fails, the function aborts.

Returns. A matrix $(d \times d)$ equal to $\Sigma^{-1}$ (or $\tilde{\Sigma}^{-1}$ if ridging was necessary).

Notes.

  • Raises a fatal gdpar_internal_error (via gdpar_abort) if even the ridged matrix is numerically singular.
  • The ridge $\varepsilon$ scales with the average diagonal of $\Sigma$, making it relative to the data scale, with an absolute floor of $10^{-8}$.
  • Internal only; not exported.

.gdpar_ksd_safe_min_ess(draws_mat)

Purpose. Safely computes the minimum effective sample size (ESS) across all columns (variables) of a draws matrix. Used to decide whether and how much to thin MCMC draws before computing the KSD, guarding against version-incompatible calls to the posterior package.

Arguments.

Argument Type Meaning
draws_mat matrix or draws_matrix Matrix of posterior draws (rows = iterations, columns = variables), typically from the feedback (FB) model.

Mathematics. Let $E_j$ be the ESS for variable $j$. The function returns:

$$n_{\text{eff}} = \min_j E_j$$

computed via posterior::ess_basic. The comment references fix B9.4 G.iv/H.iv: the call is routed through posterior::summarise_draws() to avoid ambiguous method dispatch of posterior::ess_basic() on raw matrices across package versions.

Returns. A numeric scalar:

  • If posterior::summarise_draws() succeeds and yields a valid "ess_basic" column (data frame) or a numeric vector, the minimum positive finite value is returned.
  • If the call fails, returns non-finite or empty, or ess_basic cannot be extracted, the function falls back to nrow(draws_mat) (i.e., the total number of draws, an upper bound).

Notes.

  • Internal only.
  • The tryCatch guards against the absence of the posterior package or API changes. The fallback nrow(draws_mat) means that in a degraded setting, no thinning based on ESS occurs.
  • Values $\le 0$ or non-finite are filtered out before taking the minimum.

.gdpar_ksd_stein_kernel_sum(X, mu, Lambda, kernel, h, beta)

Purpose. Computes the V-statistic numerator $\sum_{i=1}^{n}\sum_{j=1}^{n} k_p(\mathbf{x}_i, \mathbf{x}_j)$ of the kernelised Stein discrepancy (KSD) under a Gaussian target $p(\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}, \Sigma)$ with score $s(\mathbf{x}) = -\Lambda(\mathbf{x} - \boldsymbol{\mu})$ where $\Lambda = \Sigma^{-1}$. This is the core computational workhorse for the joint KSD statistic.

Arguments.

Argument Type Meaning
X matrix $(n \times d)$ Sample matrix (posterior draws in the common-$\xi$ space).
mu numeric (length $d$) Mean vector $\boldsymbol{\mu}$ of the Gaussian reference (typically the EB point estimate).
Lambda matrix $(d \times d)$ Precision matrix $\Lambda = \Sigma^{-1}$ of the Gaussian reference.
kernel character Base kernel type: either "imq" (inverse multi-quadratic) or "rbf" (radial basis function / Gaussian).
h numeric scalar Bandwidth parameter $h$ (squared-distance units).
beta numeric scalar Exponent $\beta$ for the IMQ kernel; ignored when kernel = "rbf".

Mathematics. Let $k_p$ denote the Stein kernel built from base kernel $k$ and score $s$:

$$k_p(\mathbf{x}, \mathbf{y}) = s(\mathbf{x})^\top s(\mathbf{y}), k(\mathbf{x}, \mathbf{y}) + s(\mathbf{x})^\top \nabla_{\mathbf{y}} k(\mathbf{x}, \mathbf{y}) + \nabla_{\mathbf{x}} k(\mathbf{x}, \mathbf{y})^\top s(\mathbf{y}) + \mathrm{tr}!\bigl(\nabla_{\mathbf{x}}\nabla_{\mathbf{y}} k(\mathbf{x}, \mathbf{y})\bigr)$$

The function computes the full $n \times n$ matrix for each of the four terms and sums all entries. Pre-computed quantities:

  • $\mathbf{S}_i = s(\mathbf{x}_i) = -\Lambda\,(\mathbf{x}_i - \boldsymbol{\mu})$, stored in S ($n \times d$).
  • $D^2_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|^2$ (squared distance matrix, clamped $\ge 0$).
  • $\mathbf{S}\mathbf{X}^\top$, $\mathbf{X}\mathbf{S}^\top$, $\mathbf{S}\mathbf{S}^\top$, and $\mathbf{s}_i^\top \mathbf{x}_i$ (diagonal products).

For the IMQ kernel $k(\mathbf{x}, \mathbf{y}) = (h + \|\mathbf{x} - \mathbf{y}\|^2)^{\beta}$:

Let $u_{ij} = h + D^2_{ij}$. Define $K = u^{\beta}$, $K_1 = u^{\beta - 1}$, $K_2 = u^{\beta - 2}$ element-wise. The four Stein-kernel term matrices are:

$$\begin{aligned} T_1 &= (\mathbf{S}\mathbf{S}^\top) \odot K \[4pt] T_2 &= -2\beta, K_1 \odot \bigl(\mathbf{s}_i^\top \mathbf{x}_i,\mathbf{1} - \mathbf{S}\mathbf{X}^\top\bigr) \[4pt] T_3 &= \phantom{-}2\beta, K_1 \odot \bigl(\mathbf{X}\mathbf{S}^\top - \mathbf{1},\mathbf{s}_i^\top \mathbf{x}_i\bigr) \[4pt] T_4 &= -4\beta(\beta - 1), K_2 \odot D^2 - 2\beta, d, K_1 \end{aligned}$$

where $\odot$ denotes Hadamard (element-wise) product, $\mathbf{s}_i^\top \mathbf{x}_i$ is broadcast to $n \times n$ matrices, and the scalar $-2\beta d\, K_1$ is broadcast element-wise.

For the RBF kernel $k(\mathbf{x}, \mathbf{y}) = \exp\!\bigl(-\|\mathbf{x} - \mathbf{y}\|^2/(2h)\bigr)$:

$$\begin{aligned} K &= \exp!\bigl(-D^2/(2h)\bigr) \[4pt] T_1 &= (\mathbf{S}\mathbf{S}^\top) \odot K \[4pt] T_2 &= -\frac{1}{h}, K \odot \bigl(\mathbf{s}_i^\top \mathbf{x}_i,\mathbf{1} - \mathbf{S}\mathbf{X}^\top\bigr) \[4pt] T_3 &= -\frac{1}{h}, K \odot \bigl(\mathbf{1},\mathbf{s}_i^\top \mathbf{x}_i - \mathbf{X}\mathbf{S}^\top\bigr) \[4pt] T_4 &= K \odot \bigl(d/h - D^2/h^2\bigr) \end{aligned}$$

(The signs on $T_2$, $T_3$ match the code's negated-inner-difference convention.)

The function returns $\sum_{i,j}(T_1 + T_2 + T_3 + T_4)_{ij}$.

Returns. A numeric scalar: the V-statistic numerator $\sum_{i,j} k_p(\mathbf{x}_i, \mathbf{x}_j)$. The caller takes $\sqrt{\cdot}$ (or $\sqrt{\cdot}/n$) to obtain the reported KSD value.

Notes.

  • Cost is $O(n^2 d^2)$ due to the dense $n \times n$ matrix products $\mathbf{S}\mathbf{X}^\top$, $\mathbf{X}\mathbf{S}^\top$, $\mathbf{S}\mathbf{S}^\top$.
  • The beta argument is only meaningful when kernel == "imq"; for "rbf" it is accepted but unused.
  • Negative squared distances in $D^2$ are clamped to zero as in the other internal helpers.
  • Internal only; not exported.

print.gdpar_ksd_joint(x, digits = 4L, ...)

Purpose. S3 print method for objects of class "gdpar_ksd_joint", producing a human-readable summary of the joint KSD result to the console.

Arguments.

Argument Type Meaning
x gdpar_ksd_joint The KSD joint object, a named list produced by the main KSD computation. Expected components: ksd_value, kernel, beta, bandwidth_value, bandwidth, n_dim, n_eb_draws, n_fb_draws, ess_weighted, thinned_to.
digits integer scalar Number of significant digits for formatting numeric values. Default 4L.
... Reserved for future arguments; currently unused.

Returns. Invisibly returns x (the input object), following standard R print-method convention.

Notes.

  • Exported (@export); registered as an S3 method for print on class "gdpar_ksd_joint".
  • Conditionally prints the beta parameter line only when x$kernel == "imq".
  • Conditionally prints the ESS-weighted thinning line only when x$ess_weighted is TRUE.

summary.gdpar_ksd_joint(object, ...)

Purpose. S3 summary method for "gdpar_ksd_joint" objects. Constructs a structured summary list containing the KSD value, squared KSD, kernel details, dimensionality, variable names, and an interpretation string.

Arguments.

Argument Type Meaning
object gdpar_ksd_joint The KSD joint result object. Expected components: ksd_value, ksd_squared, kernel, bandwidth_value, n_dim, vars.
... Reserved; currently unused.

Returns. An object of class c("summary.gdpar_ksd_joint", "list") with components:

Component Type Meaning
ksd_value numeric $\sqrt{V}$, the square root of the V-statistic.
ksd_squared numeric $V$, the raw V-statistic.
kernel character Kernel type used ("imq" or "rbf").
bandwidth_value numeric Bandwidth value $h$ in squared-distance units.
n_dim integer Dimension $d$ of the common-$\xi$ space.
vars character vector Names of the common-$\xi$ variables.
interpretation character scalar Explanatory text: a value close to zero indicates EB matches the empirical Gaussian fit of FB on the joint $\xi$ posterior; positive values indicate joint distributional deviation. Advises cross-checking with gdpar_compare_eb_fb()'s marginal TV table.

Notes.

  • Exported.
  • Does not print anything itself; use print() on the returned object for formatted display.

print.summary.gdpar_ksd_joint(x, digits = 4L, ...)

Purpose. S3 print method for "summary.gdpar_ksd_joint" objects. Produces a formatted, human-readable display of the KSD summary including all key quantities and an interpretation paragraph.

Arguments.

Argument Type Meaning
x summary.gdpar_ksd_joint Summary object created by summary.gdpar_ksd_joint().
digits integer scalar Number of significant digits for numeric formatting. Default 4L.
... Reserved; currently unused.

Returns. Invisibly returns x, following standard R print-method convention.

Notes.

  • Exported (@export).
  • Uses strwrap() with width = 70L, indent = 0L, exdent = 2L to wrap the interpretation paragraph neatly at 70 characters with a 2-character hanging indent.
  • Displays: KSD value (V-stat sqrt), KSD squared, kernel type, bandwidth, dimension, variable names, and the interpretation block.

R/meta_learner_adapter.R

gdpar_meta_learner_adapter(name, fit_predict_fun, predict_fun = NULL, requires_r = character(0L), requires_py = character(0L), native_ci = FALSE, description = NULL)

Purpose

Constructor of the pluggable adapter contract through which gdpar_compare_meta_learners dispatches to external meta-learner implementations (e.g. grf, EconML, DoubleML, or custom doubly-robust estimators). The returned object is a small S3 list of class c("gdpar_meta_learner_adapter", "list") that bundles a mandatory fit-and-predict closure, an optional re-prediction closure, dependency manifests for R packages and Python modules, a flag declaring native credible-interval support, and a free-form description. The comparator orchestrates adapters by invoking fit_predict_fun for the full fit-and-predict cycle and, when available, predict_fun for re-evaluation on a new grid without re-fitting; when predict_fun is NULL, the comparator falls back to fit_predict_fun and emits a gdpar_diagnostic_warning announcing the re-fit.

Arguments

  • name: character scalar. Adapter identifier; must be non-empty (nzchar(name)) and is expected to be unique within a single call to gdpar_compare_meta_learners (uniqueness is enforced by the comparator, not by this constructor).
  • fit_predict_fun: function. Mandatory closure with signature function(X, Y, T, X_newdata, level, seed_run) that fits the meta-learner on (X, Y, T) and predicts the CATE on X_newdata. X and X_newdata are data frames of covariates (the adapter is responsible for any internal conversion to matrices or to language-native arrays such as numpy.ndarray); Y is a numeric vector; T is a 0/1 integer vector; level is the nominal credible level inherited from the bridge (default 0.95); seed_run is the per-method seed propagated by the comparator. The closure must return a list with components cate_mean (numeric vector), cate_ci (numeric matrix [n_newdata, 2] with columns lower, upper, or NULL), state (opaque object cached for the optional predict_fun; may be NULL), and notes (character vector of method-specific diagnostics/warnings).
  • predict_fun: function or NULL (default NULL). Optional closure with signature function(state, X_newdata, level) that re-evaluates the meta-learner on X_newdata using the cached state produced by fit_predict_fun. Must return a list with components cate_mean and cate_ci (same shape contract as fit_predict_fun). NULL signals that the adapter does not support re-prediction.
  • requires_r: character vector. R packages the adapter needs (checked via requireNamespace before the fit). Default character(0L).
  • requires_py: character vector. Python modules the adapter needs (checked via reticulate::py_module_available before the fit, which itself requires reticulate). Default character(0L).
  • native_ci: logical scalar. TRUE when the adapter returns a native CI in cate_ci; FALSE when it does not (the comparator does not synthesize intervals in that case and the slot is left at NULL). Must be non-NA.
  • description: NULL or character scalar. One-line human-readable description used by the print method. Default NULL.

Returns

An object of class c("gdpar_meta_learner_adapter", "list") with components name, fit_predict_fun, predict_fun, requires_r, requires_py, native_ci (coerced via as.logical(native_ci)), and description.

Notes

The constructor performs strict input validation and raises errors via gdpar_abort with S3 class "gdpar_input_error":

  • name: must satisfy is.character(name) && length(name) == 1L && nzchar(name); otherwise gdpar_abort is called with data = list(received = name).
  • fit_predict_fun: must satisfy is.function(fit_predict_fun).
  • predict_fun: must be NULL or satisfy is.function(predict_fun).
  • requires_r: must satisfy is.character(requires_r) (length is not constrained).
  • requires_py: must satisfy is.character(requires_py).
  • native_ci: must satisfy is.logical(native_ci) && length(native_ci) == 1L && !is.na(native_ci).
  • description: must be NULL or a character scalar (is.character(description) && length(description) == 1L).

No side effects beyond the construction of the list and assignment of the S3 class. The function does not validate the body or signature of fit_predict_fun/predict_fun; conformance to the documented contract is the caller's responsibility.


is_gdpar_meta_learner_adapter(x)

Purpose

Predicate testing whether an object inherits from class "gdpar_meta_learner_adapter". Used to guard comparator internals and user-facing code against malformed adapter arguments.

Arguments

  • x: any R object to test.

Returns

TRUE when inherits(x, "gdpar_meta_learner_adapter") is TRUE, FALSE otherwise. Always a length-1 logical scalar (never NA, because inherits never returns NA).

Notes

No validation, no errors, no side effects. The test is purely class-based; an object that happens to be a list with the right components but lacking the S3 class attribute will return FALSE.


print.gdpar_meta_learner_adapter(x, ...)

Purpose

S3 print method for objects of class gdpar_meta_learner_adapter. Renders a compact human-readable summary used by the print generic and by the example workflow in gdpar_meta_learner_adapter.

Arguments

  • x: a gdpar_meta_learner_adapter object.
  • ...: unused; present for S3 generic compatibility.

Returns

Invisibly returns x.

Notes

Emits the following lines to standard output via cat:

  1. <gdpar_meta_learner_adapter> (header).
  2. name : followed by x$name.
  3. If !is.null(x$description): description : followed by x$description.
  4. requires (R) : followed by either <none> (when length(x$requires_r) == 0L) or paste(x$requires_r, collapse = ", ").
  5. requires (Python) : followed by either <none> (when length(x$requires_py) == 0L) or paste(x$requires_py, collapse = ", ").
  6. native CI : followed by x$native_ci.
  7. supports predict : followed by !is.null(x$predict_fun).

No validation of x is performed; passing a non-adapter will either succeed (printing whatever fields happen to be present) or error with a standard R subsetting error. The ... argument is silently ignored.


.check_adapter_requirements(adapter)

Purpose

Internal helper that verifies whether the R packages and Python modules declared in an adapter's requires_r and requires_py fields are available in the current session. Used by the comparator before dispatching to fit_predict_fun so that missing-dependency failures surface as structured diagnostics rather than opaque downstream errors.

Arguments

  • adapter: a gdpar_meta_learner_adapter object (or any list with requires_r and requires_py character components). No class check is performed.

Returns

A list with three components:

  • ok: logical scalar. TRUE iff both missing_r and missing_py have length zero.
  • missing_r: character vector of entries from adapter$requires_r for which requireNamespace(pkg, quietly = TRUE) returns FALSE.
  • missing_py: character vector of entries from adapter$requires_py deemed unavailable (see algorithm below).

Mathematics

The availability predicate for R packages is

$$ \mathrm{avail}_R(p) ;=; \texttt{requireNamespace}(p,;\texttt{quietly=TRUE}), $$

and the overall R-side status is

$$ \mathrm{missing}_R ;=; {, p \in \texttt{requires_r} : \neg, \mathrm{avail}_R(p) ,}. $$

For Python, the predicate is conditional on reticulate itself being present:

$$ \mathrm{avail}_P(m) ;=; \begin{cases} \texttt{reticulate::py_module_available}(m), & \text{if \textbf{reticulate} is available},\\ \texttt{FALSE}, & \text{otherwise}, \end{cases} $$

with each call wrapped in tryCatch(..., error = function(e) FALSE), and

$$ \mathrm{missing}_P ;=; {, m \in \texttt{requires_py} : \neg, \mathrm{avail}_P(m) ,}. $$

The final status is

$$ \mathrm{ok} ;=; \bigl(|\mathrm{missing}_R| = 0\bigr) ;\wedge; \bigl(|\mathrm{missing}_P| = 0\bigr). $$

Notes

The Python branch proceeds in three cases:

  1. Empty requires_py: missing_py is set to character(0L) unconditionally; reticulate is not loaded.
  2. Non-empty requires_py but reticulate unavailable: missing_py is set to the entirety of adapter$requires_py (i.e. every declared module is reported as missing) without attempting to load Python.
  3. Non-empty requires_py and reticulate available: the helper first checks, via exists("py_require", envir = asNamespace("reticulate"), inherits = FALSE), whether reticulate exposes the newer py_require API; if so, it calls reticulate::py_require(adapter$requires_py) inside a tryCatch(..., error = function(e) NULL) (errors are silently swallowed and the return value is discarded — this call is a best-effort hint to reticulate's environment manager, not a hard gate). Then each module in adapter$requires_py is tested with reticulate::py_module_available(mod), again wrapped in tryCatch(..., error = function(e) FALSE); modules for which the call returns FALSE or errors are collected into missing_py.

The function is marked @keywords internal and @noRd; it is not exported and is intended only for use within the package. It performs no validation of adapter beyond subsetting requires_r and requires_py; a malformed adapter lacking these components will trigger an R subsetting error.

R/methods.R

print.gdpar_fit(x, ...)

Purpose Provides a concise printed summary of a fitted gdpar_fit object, displaying key model attributes and convergence diagnostics. Arguments

  • x: An object of class gdpar_fit.
  • ...: Unused; present for S3 generic compatibility. Mathematics None. Returns Invisibly returns the input object x. Notes
  • This is the S3 print method for gdpar_fit objects.
  • The output includes the model path, family and link, AMM level, anchor value(s), number of observations (inferred from design matrices), identifiability report (if present), and convergence diagnostics (converged status, maximum R-hat, minimum bulk ESS, divergent count) when available.
  • For multivariate models (p > 1), the anchor is printed as a vector.

summary.gdpar_fit(object, ...)

Purpose Returns a posterior summary table for user-facing parameters of a fitted gdpar_fit object. Arguments

  • object: An object of class gdpar_fit.
  • ...: Unused; present for S3 generic compatibility. Mathematics None. Returns A data frame of posterior summaries (mean, median, standard deviation, 5% and 95% quantiles, R-hat, ESS bulk and tail) for the selected parameters. Notes
  • This is the S3 summary method for gdpar_fit objects.
  • Requires the suggested package posterior.
  • Parameters are filtered to exclude internal variables (eta, log_lik, y_pred, theta_i, a_coef, b_coef, a_raw, b_raw, W_raw). If no user-facing parameters remain, defaults to theta_ref or the first variable.
  • Uses posterior::summarise_draws with default summary and convergence measures.

predict.gdpar_fit(object, newdata = NULL, type = c("theta_i", "linear_predictor", "response"), summary = c("draws", "mean_se", "quantiles"), ...)

Purpose Computes posterior draws of individual parameters $\theta_i$ for a fitted gdpar_fit model, optionally on the response scale. Arguments

  • object: An object of class gdpar_fit.
  • newdata: Optional data frame for prediction. When NULL (default), predictions are computed on training data using stored Stan-generated theta_i draws.
  • type: Character scalar: "theta_i" or "linear_predictor" (synonyms) for the linear predictor scale; "response" for the inverse-link-transformed scale.
  • summary: Character scalar: "draws" (default) returns full draws as a matrix/array; "mean_se" returns posterior mean and standard error per observation; "quantiles" returns 5%, 50%, and 95% quantiles per observation.
  • ...: Unused; present for S3 generic compatibility. Mathematics The individual parameters are defined as: $$\theta_i = \theta_{\text{ref}} + \Delta(x_i, \theta_{\text{ref}})$$ where $\Delta$ is the deviation function from the AMM specification. When type = "response", the family's inverse link function $g^{-1}(\cdot)$ is applied to $\theta_i$. Returns
  • For summary = "draws": A numeric matrix (draws $\times$ observations) for univariate models; an array of dimensions $(S, n, p)$ for multivariate models ($p &gt; 1$); or an array of dimensions $(S, n, K)$ for $K &gt; 1$ models.
  • For summary = "mean_se": A data frame with columns mean and se for univariate models; a named list of such data frames for multivariate or $K &gt; 1$ models.
  • For summary = "quantiles": A data frame with columns q05, q50, q95 for univariate models; a named list of such data frames for multivariate or $K &gt; 1$ models. Notes
  • This is the S3 predict method for gdpar_fit objects.
  • Dispatches to internal helper functions based on model structure (multivariate with $p &gt; 1$ or $K &gt; 1$).
  • For newdata, centering parameters (column means of $Z_a$, $Z_b$, and column means/standard deviations of $X$) from training are used to rebuild design matrices.
  • For type = "response", the inverse link is applied per-draw; reported quantiles are of the response-scale distribution.
  • Requires the suggested package posterior.

predict_gdpar_fit_multi(object, newdata, type, summary, draws)

Purpose Internal helper for predict.gdpar_fit handling multivariate models ($p &gt; 1$). Arguments

  • object: A gdpar_fit object with $p &gt; 1$.
  • newdata: Either NULL (in-sample) or a data frame for out-of-sample prediction.
  • type: Character scalar: "theta_i", "linear_predictor", or "response".
  • summary: Character scalar: "draws", "mean_se", or "quantiles".
  • draws: The posterior draws object from the Stan fit. Mathematics For in-sample predictions, the generated quantity $\theta_i$ (stored as a matrix with columns for each $(i,k)$ pair) is reshaped into an array of dimensions $(S, n, p)$. For out-of-sample, $\eta_{i,k}$ is rebuilt per coordinate from draws of $\theta_{\text{ref}}$, $a_{\text{coef}}$, $c_b$, $W_{\text{raw}}$, and $\sigma_W$. Returns
  • For summary = "draws": An array of dimensions $(S, n, p)$ (draws $\times$ observations $\times$ dimensions).
  • For summary = "mean_se": A named list of length $p$, each element a data frame with mean and se.
  • For summary = "quantiles": A named list of length $p$, each element a data frame with q05, q50, q95. Notes
  • Internal function, not exported.
  • Parses variable names of the form theta_i[i,k] from draws to map to array indices.
  • For type = "response", applies the $k$-th component's inverse link from object$family$families[[k]]$inv_link.
  • Handles grouped data via predict_from_newdata_grouped_multi.

predict_gdpar_fit_K(object, newdata, type, summary, draws)

Purpose Internal helper for predict.gdpar_fit handling distributional regression models with $K &gt; 1$ parameters (e.g., location and scale). Arguments

  • object: A gdpar_fit object with $K &gt; 1$.
  • newdata: Either NULL (in-sample) or a data frame for out-of-sample prediction.
  • type: Character scalar: "theta_i", "linear_predictor", or "response".
  • summary: Character scalar: "draws", "mean_se", or "quantiles".
  • draws: The posterior draws object from the Stan fit. Mathematics For in-sample, the generated quantity $\theta_{i,k}$ (stored as a matrix of dimensions $n \times K$) is reshaped into an array of dimensions $(S, n, K)$. For out-of-sample, $\eta_{i,k}$ is rebuilt per distributional parameter $k$ from draws of $\theta_{\text{ref},k}$, $a_{\text{coef},k}$, $c_{b,k}$, and the globally shared $W_{\text{raw}}$ and $\sigma_W$. Returns
  • For summary = "draws": An array of dimensions $(S, n, K)$.
  • For summary = "mean_se": A named list of length $K$, each element a data frame with mean and se.
  • For summary = "quantiles": A named list of length $K$, each element a data frame with q05, q50, q95. Notes
  • Internal function, not exported.
  • For type = "response", applies the canonical inverse link for each distributional parameter $k$ as specified in object$family$param_specs[[k]]$inv_link (e.g., identity for location, log for dispersion).
  • Out-of-sample predictions with grouping or B-spline bases on newdata are not yet supported and raise gdpar_unsupported_feature_error.
  • Mirrors predict_gdpar_fit_multi but iterates over $K$ slots.

predict_gdpar_fit_K(object, newdata, type, summary, draws)

Purpose Computes predictions for a gdpar_fit object with K > 1 (K-individual model). It reconstructs posterior draws of the linear predictor or response for each slot (individual) using the provided posterior draws, optionally transforming them via inverse link functions and summarizing them.

Arguments

  • object: A gdpar_fit object (list) containing model information, including K, slot_names, family, and amm.
  • newdata: Either NULL (use training data) or a data frame of new covariates for prediction.
  • type: Character string, either "linear_predictor" (default, returns linear predictor) or "response" (applies inverse link functions).
  • summary: Character string, either "draws" (returns the full array of draws), "mean_se" (returns mean and standard error), or "quantile" (returns 5%, 50%, and 95% quantiles).
  • draws: A draws object (from the posterior package) containing posterior samples, including variables theta_i_k, theta_ref_k, a_coef_k, c_b_k, W_raw, and sigma_W as applicable.

Mathematics When newdata is NULL, the function extracts draws of theta_i_k[i,k] from draws. When newdata is provided, it calls predict_from_newdata_K (not defined in this section) to compute $\eta_{i,k}$ for each draw $s$, individual $k$, and new observation $i$ using the formula: $$ \eta_{i,k} = \theta_{\mathrm{ref},k} + Z_{a,k}[i,]\cdot a_{\mathrm{coef},k} + Z_{b,k}[i,]\cdot c_{b,k} + \sum_{j=1}^{\dim_W} (\theta_{\mathrm{ref},k}^j - \theta_{\mathrm{anchor},k}^j), W_{\mathrm{raw}}[j,\cdot],\sigma_W,X[i,]^{\top} $$ If type = "response", each slot's linear predictor is transformed by the inverse link function from object$family$param_specs[[k]]$inv_link.

Returns

  • If summary = "draws": a 3-dimensional array of dimensions (S, n, K) (S = number of draws, n = number of observations, K = number of slots) with dimension names slot_names for the third dimension.
  • If summary = "mean_se": a list of length K (names = slot_names) where each element is a data frame with columns mean and se.
  • If summary = "quantile": a list of length K (names = slot_names) where each element is a data frame with columns q05, q50, q95.

Notes

  • The function requires the posterior package (suggested dependency) to extract draws.
  • When newdata is NULL, it parses variable names of theta_i_k using the regex ^theta_i_k\\[(\\d+),(\\d+)\\]$ to extract indices. If parsing fails, it raises a gdpar_internal_error.
  • For type = "response", if object$family$param_specs is NULL or has incorrect length, it raises a gdpar_internal_error advising to use type = "linear_predictor".
  • The function does not modify object or draws.

predict_from_newdata(object, newdata, draws)

Purpose Internal helper that computes the linear predictor $\eta_i$ for new data in a univariate (p = 1) dynamic parameter model. It uses the centering parameters from the training design and posterior draws of theta_ref, a_coef, b_coef, W_raw, and sigma_W.

Arguments

  • object: A gdpar_fit object (list) with amm (model specification) and design (training design matrices and centering constants).
  • newdata: Data frame with covariates required by the AMM specification.
  • draws: A draws object (from posterior) containing posterior samples.

Mathematics For each posterior draw $s$, the linear predictor $\eta_i$ is computed as: $$ \eta_i = \theta_{\mathrm{ref}} + Z_a[i,]\cdot a_{\mathrm{coef}} + Z_b[i,]\cdot b_{\mathrm{coef}} \cdot \theta_{\mathrm{ref}} + \sum_{j=1}^{\dim_W} (\theta_{\mathrm{ref}}^j - \theta_{\mathrm{anchor}}^j), W_{\mathrm{raw}}[j,\cdot],\sigma_W, X[i,]^{\top} $$ where:

  • $Z_a$ and $Z_b$ are design matrices for the additive and multiplicative components, centered using training means.
  • $X$ is the standardized modulating covariate matrix (centered and scaled using training means and standard deviations).
  • $\theta_{\mathrm{ref}}$ is the reference parameter draw.
  • $a_{\mathrm{coef}}$, $b_{\mathrm{coef}}$ are coefficient draws.
  • $W_{\mathrm{raw}}$ are draws of the modulating weights, scaled by $\sigma_W$ (if non-centered parameterization).
  • $\theta_{\mathrm{anchor}}$ is the fixed anchor parameter.

Returns A matrix of dimensions (S, n_new) (S = number of draws, n_new = number of rows in newdata) containing the linear predictor draws.

Notes

  • This function is internal (not exported) and is called by predict_gdpar_fit for the univariate case.
  • It assumes the model has a polynomial W basis (amm$W$type == "polynomial"). If the modulating component exists but is not polynomial, it is ignored (the has_W condition ensures only polynomial W is used).
  • If amm$a or amm$b are NULL, the corresponding design matrices are set to zero-column matrices.
  • The function loops over each draw $s$ in R, which may be slow for large S; however, it is intended for prediction on new data where vectorization across draws is non-trivial due to the multiplicative term involving $\theta_{\mathrm{ref}}$.

predict_from_newdata_multi(object, newdata, draws)

Purpose Internal helper that computes the linear predictor $\eta_{i,k}$ for new data in a multivariate (p > 1) dynamic parameter model. It rebuilds per-coordinate predictors using per-coordinate design matrices and centering, and a globally shared modulating component.

Arguments

  • object: A gdpar_fit object with p > 1 (number of dimensions) and per-dimension AMM specifications in amm$dims.
  • newdata: Data frame with covariates required by every dimension's AMM specification.
  • draws: A draws object (from posterior) containing posterior samples.

Mathematics For each posterior draw $s$ and dimension $k = 1,\dots,p$, the linear predictor $\eta_{i,k}$ is: $$ \eta_{i,k} = \theta_{\mathrm{ref},k} + Z_{a,k}[i,]\cdot a_{\mathrm{coef},k} + Z_{b,k}[i,]\cdot c_{b,k} + \sum_{j=1}^{\dim_W/p} (\theta_{\mathrm{ref},k}^j - \theta_{\mathrm{anchor},k}^j), W_{\mathrm{raw}}[r_{k,j},\cdot],\sigma_W, X[i,]^{\top} $$ where:

  • $Z_{a,k}$ and $Z_{b,k}$ are per-dimension design matrices, centered using per-dimension training means.
  • $X$ is the standardized modulating covariate matrix (shared across dimensions), centered and scaled using global training means and standard deviations.
  • $\theta_{\mathrm{ref},k}$, $a_{\mathrm{coef},k}$, $c_{b,k}$ are per-dimension draws.
  • $W_{\mathrm{raw}}$ is a matrix of dimension $\dim_W \times d$ (where $d$ is the number of modulating covariates), with rows partitioned by dimension: rows $(k-1)\cdot(\dim_W/p) + 1$ to $k\cdot(\dim_W/p)$ correspond to dimension $k$.
  • $\sigma_W$ is a scalar multiplier (if non-centered parameterization; omitted if centered).
  • $\theta_{\mathrm{anchor},k}$ is the per-dimension anchor.

Returns A 3-dimensional array of dimensions (S, n_new, p) (S = number of draws, n_new = number of rows in newdata, p = number of dimensions).

Notes

  • This function is internal and called by predict_gdpar_fit for multivariate models.
  • It uses helper functions (not shown in this section) to extract draws: .extract_theta_ref_multi_flat, posterior_array_var_to_list, posterior_var_to_matrix_3d, posterior_var_to_array_2d.
  • The modulating component is only included if amm$W exists, has type "polynomial", and X has columns. The global standardization of X is applied.
  • If the centered parameterization for W is used (cp_W = TRUE), sigma_W is not multiplied.
  • The function loops over draws and dimensions in R, which may be slow but is necessary due to the complex dependencies.

predict_from_newdata_K(object, newdata, draws)

Purpose Internal helper that generates posterior predictions on new covariate data for K-individual fits (K > 1). It mirrors the Stan-side linear predictor $\eta_k$ for each individual slot $k$ and posterior draw $s$, using centred / scaled design matrices built from newdata. Called by the predict.gdpar_fit S3 method when newdata is non-NULL and the fit has K > 1.

Arguments

Argument Type Meaning
object gdpar_fit list The fitted model object. Must contain K, amm_list_canonical, design_K, anchor, and parametrization.
newdata data.frame New covariate observations at which to predict. Must contain every column referenced by the model's fixed-effect (a), multiplicative-effect (b/c_b), and modulator (W) formula terms, plus X_names columns when a W basis is active.
draws draws_array / draws_matrix (from posterior) Posterior draws extracted from the fitted model (e.g., via as_draws_array(fit$stanfit)). Must contain the variables theta_ref_k, and conditionally a_coef_k, c_b_k, W_raw, and sigma_W.

Mathematics

The linear predictor for individual $k$, observation $i$, and draw $s$ is

$$\eta_{k,i}^{(s)} = \theta_{\mathrm{ref},k}^{(s)} ;+; \mathbf{Z}_{a,k,i}^{!\top},\mathbf{a}_k^{(s)} ;+; \mathbf{Z}_{b,k,i}^{!\top},\mathbf{c}_{b,k}^{(s)} ;+; \mathbf{X}_i^{!\top},\mathbf{w}_{\mathrm{diff},k}^{(s)},$$

where

  • $\theta_{\mathrm{ref},k}^{(s)}$ is the reference-point parameter for slot $k$ at draw $s$.
  • $\mathbf{Z}_{a,k}$ and $\mathbf{Z}_{b,k}$ are design matrices for the additive and multiplicative effects of slot $k$, each column-mean-centred using the training means stored in design_K$Z_a_k_means_list[[k]] and design_K$Z_b_k_means_list[[k]].
  • $\mathbf{a}_k^{(s)}$ and $\mathbf{c}_{b,k}^{(s)}$ are the corresponding coefficient vectors at draw $s$.
  • $\mathbf{X}_i$ is the vector of modulator covariates for observation $i$, standardised (centred then divided by training SDs) via design_K$X_means and design_K$X_sds.

The modulating coefficient vector is built from the polynomial basis difference

$$w_{\mathrm{diff},k,j}^{(s)} = \sum_{m=1}^{d_W} W_{\mathrm{raw},m,j}^{(s)};\Bigl[\bigl(\theta_{\mathrm{ref},k}^{(s)}\bigr)^{m} - \theta_{\mathrm{anchor},k}^{m}\Bigr], \qquad j = 1,\dots,d,$$

where $d_W$ (dim_W) is the number of polynomial basis functions and $d$ is the number of modulator covariates. When the centred parametrisation flag cp_W is FALSE, each entry is additionally scaled by $\sigma_W^{(s)}$:

$$\mathbf{w}_{\mathrm{diff},k}^{(s)} ;\leftarrow; \sigma_W^{(s)};\mathbf{w}_{\mathrm{diff},k}^{(s)}.$$

The anchor $\theta_{\mathrm{anchor},k}$ is recycled to length $K$ if its stored length is less than $K$.

Algorithm outline

  1. Validation — Abort with gdpar_unsupported_feature_error if the fit has grouping (J_groups > 1), or if the W basis type is not "polynomial" (e.g. B-spline). Both are queued for a future release (Session 8.4).
  2. Build newdata design matrices — For each slot $k \in \{1,\dots,K\}$, construct Z_a_k_new[[k]] and Z_b_k_new[[k]] from newdata using the stored formula objects, then centre each column by the training mean. If a slot has no a or b term, the corresponding matrix has zero columns. Likewise, build the standardised modulator matrix X_new from design_K$X_names, centre, and scale.
  3. Extract posterior draws — Pull theta_ref_k into an $(S \times K)$ matrix; pull a_coef_k and c_b_k into per-slot lists of $(S \times J_k)$ matrices (via posterior_array_var_to_list); pull W_raw into an $(S \times d_W \times d)$ array and optionally sigma_W into an $(S \times 1)$ matrix.
  4. Loop over draws and slots — For each draw $s$ and slot $k$, accumulate the linear predictor $\eta_k$ and, if a W basis is active, compute the polynomial basis difference and the dot product with W_raw.
  5. Return the filled array.

Returns A numeric 3-dimensional array of dimensions (S, n_new, K) where S = number of posterior draws, n_new = nrow(newdata), and K = number of individual slots. Entry [s, i, k] is the linear predictor $\eta_{k,i}^{(s)}$ (on the link scale; no inverse-link is applied).

Notes

  • Requires the posterior package (loaded via require_suggested).
  • Unsupported features that raise errors: (a) J_groups > 1 (grouped K-individual fits); (b) non-polynomial W basis types (e.g., "B-spline"). Both produce gdpar_unsupported_feature_error.
  • The W basis is declared global across all $K$ slots (per design decision "Scope of W: global"); therefore W_raw and sigma_W are shared, and the same modulating contribution enters every slot's $\eta_k$.
  • When has_W is TRUE, the function inspects only the first active slot's W specification to determine dim_W and type.
  • Newdata design matrices are centred/scaled using training statistics (Z_a_k_means_list, Z_b_k_means_list, X_means, X_sds), not the new data's own moments.
  • This function is not exported and is not an S3 method.

posterior_var_to_array_2d(draws, var_name, p)

Purpose Internal helper that extracts a one-dimensional Stan variable declared as real var[p] (e.g., theta_ref[k], sigma_W[1]) from a posterior draws object and reshapes it into an $(S \times p)$ matrix, correctly mapping each Stan-indexed element to its column by parsing the bracket index from the variable name.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix Posterior draws object (from posterior).
var_name character scalar Base name of the Stan variable (e.g., "theta_ref_k", "sigma_W").
p integer scalar Expected number of elements (length of the Stan vector/array).

Mathematics

No closed-form formula; this is a reshape/reindex operation. It maps column $c$ of the raw draws matrix (whose variable name is var_name[idx[c]]) to column idx[c] of the output matrix, where idx[c] is parsed via the regex ^var_name\[(\d+)\]$.

Algorithm

  1. Subset draws to variables matching var_name via posterior::subset_draws.
  2. Convert to a plain (S × C) matrix via posterior::as_draws_matrix (stripped of class).
  3. Parse each variable name with the regex ^var_name\[(\d+)\]$ to extract the integer index.
  4. Validate: abort with gdpar_internal_error if any name fails to match, or if the number of matched entries $\neq p$.
  5. Allocate an $(S \times p)$ output matrix filled with NA_real_. For each column $c$ of the raw matrix, assign it to column idx[c] of the output.

Returns A numeric matrix of dimensions (S, p) where S = number of posterior draws. Column j contains the draws for var_name[j].

Notes

  • Abort condition: names that fail regex parsing → gdpar_internal_error with message "failed to parse … variable names from draws".
  • Abort condition: length(idx) != pgdpar_internal_error with message "expected %d entries … found %d".
  • The function is robust to arbitrary column ordering in the posterior draws object because it reindexes by the parsed integer.
  • Not exported; not an S3 method.

posterior_var_to_matrix_3d(draws, var_name, rows, cols)

Purpose Internal helper that extracts a two-dimensional Stan variable declared as matrix[rows, cols] var (e.g., matrix[dim_W, d] W_raw) from a posterior draws object and reshapes it into a 3-dimensional R array of dimensions (S, rows, cols), correctly mapping each Stan-indexed element [r, c] to the appropriate array slice.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix Posterior draws object.
var_name character scalar Base name of the Stan matrix variable (e.g., "W_raw").
rows integer scalar Number of rows declared in Stan.
cols integer scalar Number of columns declared in Stan.

Mathematics

No formula; reshaping only. The element drawn from variable name var_name[r, c] is placed at array index [s, r, c] for each draw $s$.

Algorithm

  1. Subset and convert to a plain matrix as in posterior_var_to_array_2d.
  2. Parse variable names with regex ^var_name\[(\d+),(\d+)\]$ extracting row index idx_r and column index idx_c.
  3. Validate: abort with gdpar_internal_error if any name fails to match.
  4. Allocate an $(S \times \texttt{rows} \times \texttt{cols})$ array filled with NA_real_. For each raw matrix column $c$, assign it to [s, idx_r[c], idx_c[c]] across all draws $s$.

Returns A numeric 3-dimensional array of dimensions (S, rows, cols).

Notes

  • Abort condition: names that fail regex parsing → gdpar_internal_error.
  • The rows and cols arguments are used only for allocating the output array; the actual row/column mapping comes from the parsed indices, so the function is safe even if the posterior package reorders columns.
  • Not exported; not an S3 method.

posterior_array_var_to_list(draws, var_name, p, ncols)

Purpose Internal helper that extracts a Stan variable declared as array[p] vector[J_max] var (i.e., a ragged array of vectors, e.g., a_coef[k][j], c_b[k][j]) from a posterior draws object and returns a list of length $p$ whose $k$-th entry is an $(S \times \texttt{ncols}[k])$ matrix. The restriction to the leading ncols[k] columns is necessary because the Stan template pads each vector to a common J_max, and the trailing padded zeros must be excluded.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix Posterior draws object.
var_name character scalar Base name of the Stan array-of-vectors variable (e.g., "a_coef_k", "c_b_k").
p integer scalar Number of individual slots $K$ (length of the outer array).
ncols integer vector of length p Number of effective (non-padded) elements per slot $k$. Elements with index $j &gt; \texttt{ncols}[k]$ are silently discarded.

Mathematics

No formula; extraction and reshaping. For each draw $s$ and slot $k$, the output matrix entry at column $j$ ($1 \le j \le \texttt{ncols}[k]$) receives the value of var_name[k, j].

Algorithm

  1. Subset and convert to a plain matrix.
  2. Parse variable names with regex ^var_name\[(\d+),(\d+)\]$ extracting idx_k (slot index) and idx_j (within-slot index).
  3. Validate: abort with gdpar_internal_error if any name fails to match.
  4. Allocate a list of length $p$; the $k$-th entry is an $(S \times \texttt{ncols}[k])$ matrix of NA_real_.
  5. For each raw matrix column $c$, if idx_j[c] <= ncols[idx_k[c]], assign the entire column of draws to out[[k]][, j]; otherwise skip (padded column).

Returns A list of length p. The $k$-th element is a numeric matrix with S rows and ncols[k] columns containing posterior draws of the $k$-th slot's coefficient vector.

Notes

  • Padding columns (idx_j > ncols[k]) are silently ignored; this is by design since the Stan template pads to a common J_max.
  • Abort condition: names that fail regex parsing → gdpar_internal_error.
  • Not exported; not an S3 method.

coef.gdpar_fit(object, ...)

Purpose S3 method for the coef generic, dispatched on class gdpar_fit. Returns posterior summary statistics (mean and 5 % / 50 % / 95 % quantiles) of the model's key parameters: the reference point(s) $\theta_{\mathrm{ref}}$, additive coefficients $\mathbf{a}$, multiplicative coefficients $\mathbf{b}$ / $\mathbf{c}_b$, and modulating basis coefficients $\mathbf{W}_{\mathrm{raw}}$ (optionally multiplied by $\sigma_W$). For K-individual fits (K > 1), returns a named list of per-slot gdpar_coef objects.

Arguments

Argument Type Meaning
object gdpar_fit A fitted General Dynamic Parameter model.
... Unused; present for S3 generic signature compatibility.

Mathematics

The reported modulating coefficients use the natural-scale product

$$\tilde{\mathbf{W}}^{(s)} = \sigma_W^{(s)},\mathbf{W}_{\mathrm{raw}}^{(s)}$$

when cp_W is FALSE, so that summaries are on the modulating scale rather than the centred-parameterisation scale. When cp_W is TRUE, W_raw is used directly.

For K-individual fits, the W basis is global (shared across all $K$ slots per design decision "Scope of W: global" in handoff 28). The W component is therefore attached identically to every slot's returned gdpar_coef object. The per-coordinate multiplicative slot draws from c_b_k are already on the $\theta_{\mathrm{ref}}$-multiplied scale (Recovery 2, handoff 4).

Returns

Fit type Return value
Scalar (p = 1) or multi (p > 1) An object of class gdpar_coef (a list with components theta_ref, a, b, W).
K-individual (K > 1) A named list of length K, each element a gdpar_coef object with p = 1L.

The helper as.data.frame() can be applied to a single gdpar_coef to obtain a long-tidy table; for K-individual fits, lapply(coef(fit), as.data.frame) is recommended.

Notes

  • Exported S3 method (registered via @export).
  • K-individual fits with grouping (J_groups > 1) are not yet supported and raise gdpar_unsupported_feature_error (queued for Session 8.4).
  • The ... argument is not used but is required by the coef generic signature.
  • The function body is defined in another section of this file; only the roxygen documentation is present in this section.

Functions in R/methods.R — Section 4 of 6


coef.gdpar_fit(object, ...)

Purpose S3 method for the coef generic dispatching on objects of class gdpar_fit. Extracts posterior draws from a fitted general dynamic parameter model and restructures them into a gdpar_coef object (scalar/multivariate case) or a named list of gdpar_coef objects (K-individual case). This is the primary user-facing entry point for obtaining coefficient summaries from a fitted model.

Arguments

Argument Type Meaning
object gdpar_fit A fitted model object produced by the package's fitting machinery. Must contain at least fit (a CmdStanMCMC or similar object), p or K, and the requisite amm, design, and parametrization metadata.
... Ignored; present for S3 generic compatibility.

Algorithm

The method performs a three-way dispatch based on the model structure:

  1. Guard: Assert object inherits from "gdpar_fit" (raises an error otherwise).
  2. Require posterior package via require_suggested() (conditionally loads the posterior package for extracting draws).
  3. Read K: If object[["K"]] is non-NULL and $K &gt; 1$, delegate entirely to coef_K_to_gdpar_coef_list(object) and return its result immediately.
  4. Read p: Otherwise, read object[["p"]]; default to $p = 1$ if NULL, coerced to integer.
  5. Extract draws via object$fit$draws() (calls the posterior draws accessor on the embedded Stan fit).
  6. If $p = 1$: call coef_scalar_to_gdpar_coef(object, draws).
  7. If $p &gt; 1$: call coef_multi_to_gdpar_coef(object, draws).

Returns

  • A gdpar_coef S3 object (scalar or multivariate path), or a named list of gdpar_coef objects (K-individual path).

Notes

  • This is an S3 method dispatched on the "gdpar_fit" class. Calling coef(object) on any gdpar_fit will route here.
  • The method does not perform any summarisation (e.g., posterior means, credible intervals); it returns the full posterior draws packed into tidy data frames inside gdpar_coef.
  • If K > 1 and grouping information is present, the call aborts at the internal helper with a gdpar_unsupported_feature_error (see coef_K_to_gdpar_coef_list below).
  • Requires the suggested package posterior; if unavailable, an informative error is raised by require_suggested.

coef_K_to_gdpar_coef_list(object)

Purpose Internal helper that builds a named list of gdpar_coef objects (one per individual/slot) from a $K$-individual distributional regression fit ($K &gt; 1$). Each list entry is a gdpar_coef object with $p = 1$. This implements Decision E4.A of sub-phase 8.3.10, reusing the existing gdpar_coef S3 contract for each slot and wrapping the results in a plain named list.

Arguments

Argument Type Meaning
object gdpar_fit A fitted model object with K > 1. Must contain design_K, amm_list_canonical, slot_names, fit, and parametrization metadata.

Algorithm

  1. Require posterior package.
  2. Parse K from object[["K"]] (coerced to integer).
  3. Grouping guard: If object$group_info is non-NULL, abort with gdpar_unsupported_feature_error indicating that grouped $K$-individual fits ($J_{\text{groups}} &gt; 1$) are queued for Session 8.4 and not yet implemented.
  4. Extract metadata: Retrieve design_K (per-slot design specifications), amm_list_canonical (per-slot AMM specifications), and slot_names (defaulting to slot_1, …, slot_K if NULL or length mismatch).
  5. Extract draws via object$fit$draws().
  6. Extract $\theta_{\text{ref},k}$: Using posterior_var_to_matrix_3d on Stan variable "theta_ref_k" (shape: $S \times 1 \times K$, where $S$ is the number of posterior draws). Reshape to an $S \times K$ matrix.
  7. Determine per-slot column counts for $a$ and $b$ terms: $$\text{ncols_a}[k] = \lvert \texttt{design_KZ_a_k_names_list}k \rvert, \quad \text{ncols_b}[k] = \lvert \texttt{design_KZ_b_k_names_list}k \rvert$$
  8. Extract $a$ coefficients: If any slot has $a$ terms, extract "a_coef_k" via posterior_array_var_to_list(draws, "a_coef_k", K, ncols = ncols_a). Padding columns (those beyond a slot's actual design width) are silently skipped by the helper.
  9. Extract $b$ coefficients: If any slot has $b$ terms, extract "c_b_k" via posterior_array_var_to_list(draws, "c_b_k", K, ncols = ncols_b). Note that $b$ is recovered from the linear-reparametrization variable c_b_k, not b_coef_k (Recovery 2, handoff 4 addendum).
  10. Extract global $W$ block: Determine which slots declare a non-NULL W in their AMM. If any do and design_K$X_names is non-empty:
    • Identify the first active slot's W specification.
    • Verify it is "polynomial" with $\dim_W &gt; 0$.
    • Extract "W_raw" as an $S \times \dim_W \times d_x$ 3D array via posterior_var_to_matrix_3d.
    • If not centered parametrization (cp_W is FALSE): extract "sigma_W" and scale each draw's slice: $$W_{\text{eff}}[s, \cdot, \cdot] = W_{\text{raw}}[s, \cdot, \cdot] \cdot \sigma_W[s, 1]$$
    • Flatten to an $S \times (\dim_W \cdot d_x)$ matrix and call build_coef_W_df to produce a tidy data frame.
    • This W_global_df is replicated into every slot's gdpar_coef (since the K-individual Stan template shares $W$ globally).
  11. Assemble per-slot gdpar_coef: For $k = 1, \dots, K$:
    • Build theta_ref_df_k via build_coef_theta_ref_df from the $k$-th column of theta_ref_mat (scalar, $p = 1$).
    • If $\text{ncols\_a}[k] &gt; 0$: wrap in a one-element list via build_coef_term_df(a_dr_list[[k]], term_names = design_K$Z_a_k_names_list[[k]]).
    • If $\text{ncols\_b}[k] &gt; 0$: similarly wrap via build_coef_term_df(b_dr_list[[k]], ...).
    • If W_global_df is non-NULL: wrap in a one-element list.
    • Call new_gdpar_coef(...) with all components, $p = 1$, mu_theta_ref = NULL, sigma_theta_ref = NULL, J_groups = 1L, group_levels = NULL.
  12. Name the list with slot_names and return.

Mathematics

For each slot $k \in \{1, \dots, K\}$, the extracted model quantities are:

$$\theta_{\text{ref},k} \in \mathbb{R}^{S}, \quad a_k \in \mathbb{R}^{S \times \text{ncols_a}[k]}, \quad b_k \in \mathbb{R}^{S \times \text{ncols_b}[k]}$$

The globally shared weight matrix (per draw $s$):

$$W_{\text{eff}}[s] \in \mathbb{R}^{\dim_W \times d_x}$$

is replicated identically across all $K$ slots when the non-centered parametrization is active:

$$W_{\text{eff}}[s] = W_{\text{raw}}[s] \cdot \sigma_W[s]$$

Returns

A named list of length $K$, whose names are slot_names and whose elements are gdpar_coef S3 objects, each with $p = 1$.

Notes

  • Raises gdpar_unsupported_feature_error (via gdpar_abort) if object$group_info is non-NULL (i.e., $J_{\text{groups}} &gt; 1$). This combination is queued for implementation in Session 8.4.
  • The $b$ slot uses the linear-reparametrization variable c_b_k rather than b_coef_k, consistent with Recovery 2 / handoff 4 addendum convention.
  • Padding columns in a_coef_k and c_b_k arrays (when slots have different design widths) are silently skipped by posterior_array_var_to_list.
  • The W block is replicated across all slots (global $W$). Per-slot $W$ differentiation is not supported in the $K$-individual template.
  • Marked @keywords internal @noRd; not exported and not documented in the user manual.

coef_scalar_to_gdpar_coef(object, draws)

Purpose Internal helper that builds a single gdpar_coef object from a scalar ($p = 1$) univariate or grouped distributional regression fit. This is the workhorse for the $p = 1$ path of coef.gdpar_fit.

Arguments

Argument Type Meaning
object gdpar_fit A fitted model object with p = 1. Must contain group_info (or NULL), amm, design, and parametrization metadata.
draws draws_array / draws_matrix Posterior draws extracted from object$fit$draws().

Algorithm

  1. Determine grouping: J_groups = number of group levels (default 1 if no group_info); group_levels = the level labels (or NULL).
  2. Extract $\theta_{\text{ref}}$: Call .extract_theta_ref_uni_grouped(draws, J_groups) to obtain a 3D array of draws (shape $S \times p \times J_{\text{groups}}$). Build a tidy data frame via build_coef_theta_ref_df_grouped.
  3. Extract hyperparameters (grouped models only, group_info non-NULL):
    • $\mu_{\theta_{\text{ref}}}$: via .extract_mu_sigma_theta_ref(draws, "mu_theta_ref", p = 1), then build_coef_hyper_df(mu_dr, p = 1).
    • $\sigma_{\theta_{\text{ref}}}$: via .extract_mu_sigma_theta_ref(draws, "sigma_theta_ref", p = 1), then build_coef_hyper_df(sigma_dr, p = 1).
  4. Extract $a$ coefficients: If object$amm$a is non-NULL, extract "a_coef" as a draws matrix, then build a tidy data frame via build_coef_term_df(a_dr, object$design$Z_a_names). Wrap in a one-element list.
  5. Extract $b$ coefficients: If object$amm$b is non-NULL, extract "b_coef" as a draws matrix, then build_coef_term_df(b_dr, object$design$Z_b_names). Wrap in a one-element list.
  6. Extract $W$ block: If object$amm$W is non-NULL:
    • Extract "W_raw" as a draws matrix.
    • If not centered parametrization (cp_W is FALSE): extract "sigma_W" and compute: $$W_{\text{eff}} = W_{\text{raw}} \cdot \sigma_W$$ where $\sigma_W$ is broadcast from an $S \times 1$ matrix to a scalar per draw.
    • Otherwise $W_{\text{eff}} = W_{\text{raw}}$.
    • Build a tidy data frame via build_coef_W_df(W_eff, basis_dim = object$amm$W$dim, x_names = object$design$X_names). Wrap in a one-element list.
  7. Assemble via new_gdpar_coef(theta_ref, a, b, W, p = 1, mu_theta_ref, sigma_theta_ref, J_groups, group_levels).

Mathematics

For a scalar model ($p = 1$), the reference parameter is: $$\theta_{\text{ref}} \in \mathbb{R}^{S \times 1 \times J_{\text{groups}}}$$

In the grouped hierarchical case, the prior on the reference parameter is: $$\theta_{\text{ref},j} \sim \mathcal{N}(\mu_{\theta_{\text{ref}}},, \sigma_{\theta_{\text{ref}}}^2)$$

The non-centered $W$ transformation: $$W_{\text{eff}}[s] = W_{\text{raw}}[s] \cdot \sigma_W[s]$$

Returns

A single gdpar_coef S3 object with $p = 1$.

Notes

  • For grouped models ($J_{\text{groups}} &gt; 1$), the theta_ref data frame contains one column per group level, and the hyperparameter data frames (mu_theta_ref, sigma_theta_ref) are non-NULL.
  • For ungrouped models ($J_{\text{groups}} = 1$), mu_theta_ref and sigma_theta_ref are NULL, and group_levels is NULL.
  • a and b are each wrapped in a one-element list (matching the list-of-length-$p$ convention used in new_gdpar_coef, even though $p = 1$).
  • The $b$ coefficient is extracted from Stan variable "b_coef" (unlike the $K$-individual path which uses "c_b_k").
  • Marked @keywords internal @noRd; not exported.

coef_multi_to_gdpar_coef(object, draws)

Purpose Internal helper that builds a single gdpar_coef object from a multivariate ($p &gt; 1$) distributional regression fit. Handles the complexity of per-dimension design structures for $a$, $b$, and $W$ terms.

Arguments

Argument Type Meaning
object gdpar_fit A fitted model object with p > 1. Must contain amm, design, p, group_info (or NULL), and parametrization.
draws draws_array / draws_matrix Posterior draws extracted from object$fit$draws().

Algorithm

  1. Parse metadata: Extract amm, design, integer p, J_groups (default 1), group_levels (default NULL).
  2. Extract $\theta_{\text{ref}}$: Call .extract_theta_ref_multi_grouped(draws, J_groups, p) to obtain a 3D array (shape $S \times p \times J_{\text{groups}}$). Build a grouped data frame via build_coef_theta_ref_df_grouped.
  3. Extract hyperparameters (grouped models only):
    • $\mu_{\theta_{\text{ref}}}$: .extract_mu_sigma_theta_ref(draws, "mu_theta_ref", p)build_coef_hyper_df(mu_dr, p).
    • $\sigma_{\theta_{\text{ref}}}$: .extract_mu_sigma_theta_ref(draws, "sigma_theta_ref", p)build_coef_hyper_df(sigma_dr, p).
  4. Extract $a$ coefficients (per-dimension): Check if any dimension declares an $a$ term (has_a_global). If so:
    • Compute per-dimension column counts: $$\text{ncols_a}[k] = \lvert \texttt{designZ_a_names_list}k \rvert, \quad k = 1, \dots, p$$
    • Extract "a_coef" via posterior_array_var_to_list(draws, "a_coef", p, ncols = ncols_a).
    • For each $k$: if $\text{ncols\_a}[k] &gt; 0$, call build_coef_term_df(a_dr_list[[k]], term_names = design$Z_a_names_list[[k]]); otherwise NULL.
    • Result is a length-$p$ list (with possible NULL entries).
  5. Extract $b$ coefficients (per-dimension): Check if any dimension declares a $b$ term (has_b_global). If so:
    • Compute per-dimension column counts: $$\text{ncols_b}[k] = \lvert \texttt{designZ_b_names_list}k \rvert, \quad k = 1, \dots, p$$
    • Extract "c_b" via posterior_array_var_to_list(draws, "c_b", p, ncols = ncols_b).
    • Build per-dimension data frames (NULL if empty). Result is a length-$p$ list.
    • Note: Uses "c_b" (linear-reparametrization variable), not "b_coef".
  6. Extract $W$ block (per-dimension sub-blocks): Active if amm$W is non-NULL, design$X_names is non-empty, and amm$W$type == "polynomial". If so:
    • Compute per-dimension weight basis dimension: $$W_{\text{per_k_dim}} = \left\lfloor \frac{\dim_W}{p} \right\rfloor$$
    • Extract "W_raw" as an $S \times \dim_W \times d_x$ 3D array.
    • If not centered parametrization: extract "sigma_W".
    • For each $k = 1, \dots, p$:
      • If $W_{\text{per\_k\_dim}} = 0$: NULL.
      • Otherwise extract the row block $[(k-1) \cdot W_{\text{per\_k\_dim}} + 1,\; k \cdot W_{\text{per\_k\_dim}}]$ from the $\dim_W$ axis.
      • Scale by $\sigma_W$ if not centered: $$W_{\text{block}}[s] = W_{\text{raw,block}}[s] \cdot \sigma_W[s]$$
      • Flatten and build data frame via build_coef_W_df.
    • Result is a length-$p$ list.
  7. Assemble via new_gdpar_coef(theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels).

Mathematics

For a multivariate model with $p &gt; 1$ dimensions, the weight matrix $\mathbf{W}$ has total dimension $\dim_W \times d_x$ and is partitioned into $p$ sub-blocks along the row axis, each of dimension $W_{\text{per\_k\_dim}} \times d_x$:

$$\mathbf{W} = \begin{pmatrix} \mathbf{W}^{(1)} \ \mathbf{W}^{(2)} \ \vdots \ \mathbf{W}^{(p)} \end{pmatrix}, \quad \mathbf{W}^{(k)} \in \mathbb{R}^{W_{\text{per_k_dim}} \times d_x}$$

Under the non-centered parametrization: $$\mathbf{W}^{(k)}{\text{eff}}[s] = \mathbf{W}^{(k)}{\text{raw}}[s] \cdot \sigma_W[s]$$

The $a$ and $b$ terms are extracted as arrays: $$a_k \in \mathbb{R}^{S \times \text{ncols_a}[k]}, \quad c_b_k \in \mathbb{R}^{S \times \text{ncols_b}[k]}$$

where $c\_b_k$ is the linear-reparametrization variable used to recover $b_k$ on the Stan side.

Returns

A single gdpar_coef S3 object with $p &gt; 1$. The a, b, and W slots are each length-$p$ lists (with possible NULL entries for dimensions that lack the respective term).

Notes

  • Uses "c_b" (not "b_coef") for the $b$ coefficients, consistent with the Stan-side linear reparametrization convention (Recovery 2).
  • The $a$ and $b$ extraction uses the global flag (has_a_global, has_b_global): if any dimension declares the term, the array is extracted for all $p$ dimensions; dimensions with zero columns yield NULL list entries.
  • The weight matrix is physically stored as a single 3D array in Stan and logically partitioned into per-dimension blocks in R.
  • Integer division $\lfloor \dim_W / p \rfloor$ determines per-dimension block size; if $\dim_W$ is not evenly divisible by $p$, any remainder rows would be discarded (though in practice the design enforces divisibility).
  • For grouped models, hyperparameter data frames are included; for ungrouped models, they are NULL.
  • Marked @keywords internal @noRd; not exported.

Note: The section also contains roxygen documentation for print.gdpar_diagnostics(x, ...) (an exported S3 print method), but the function body is not defined in this section and appears in a subsequent section (5 of 6). It is therefore not documented here.

print.gdpar_diagnostics(x, ...)

Purpose S3 print method for objects of class gdpar_diagnostics. Provides a concise, human-readable summary of MCMC diagnostic quantities stored in the object, including convergence status, $\hat{R}$, effective sample sizes, divergences, saturated treedepth, the minimum expected fraction of missing information (E-FMI), and any advisory messages.

Arguments

Argument Type Meaning
x list with class gdpar_diagnostics A diagnostics object whose named elements are converged (logical), rhat_max (numeric or NA), ess_bulk_min (numeric or NA), ess_tail_min (numeric or NA), divergent_count (integer), treedepth_saturated (integer), efmi_min (numeric or NA), and messages (character vector, possibly length-0).
... Additional arguments passed to print; ignored.

Returns x invisibly (invisible(x)). The function is called for its side effect of printing to the console.

Notes

  • Uses cat with sep = "" so no extra whitespace is introduced between label and value.
  • Numeric diagnostics (rhat_max, ess_bulk_min, ess_tail_min, efmi_min) that are NA are printed as the literal string "NA" rather than being passed to format.
  • When length(x$messages) > 0L, messages are printed as a bulleted list with "- " prefix and indented four spaces.
  • No S3 dispatch beyond the class name; UseMethod is not called explicitly here because print is already a generic.

.extract_theta_ref_uni_grouped(draws, J_groups)

Purpose Internal helper that extracts the posterior draws of theta_ref from a univariate Stan model with grouping. The Stan template declares theta_ref as vector[J_groups], which the posterior package exposes as scalar variables theta_ref[1], theta_ref[2], …, theta_ref[J_groups]. The helper reassembles these into a draws $\times$ groups matrix.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix (posterior) Posterior draws object from a fitted univariate grouped Stan model.
J_groups integer scalar Expected number of groups.

Returns A numeric matrix of dimension $(S, J_\text{groups})$ where $S$ is the number of posterior draws. Column $g$ holds the draws of $\theta_{\text{ref},g}$ for group $g$, correctly ordered regardless of the column order in the raw draws matrix.

Notes

  • Uses regex ^theta_ref\\[(\\d+)\\]$ to parse integer group indices from variable names.
  • Calls gdpar_abort with class "gdpar_internal_error" if:
    • Any parsed index is NA.
    • The number of parsed columns $\neq J_\text{groups}$.
  • Internally converts the posterior draws subset to a plain matrix via unclass(posterior::as_draws_matrix(...)).
  • Uses a for loop to place each column of the raw matrix into the correct column of out based on the parsed index, so ordering is deterministic even if Stan/the posterior package emits columns in a different order.

.extract_theta_ref_multi_grouped(draws, J_groups, p)

Purpose Internal helper that extracts the posterior draws of theta_ref from a multivariate Stan model with grouping. The Stan template declares theta_ref as array[J_groups] vector[p], exposed by the posterior package as variables theta_ref[g, k]. The helper reassembles these into a three-dimensional array.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix (posterior) Posterior draws object from a fitted multivariate grouped Stan model.
J_groups integer scalar Expected number of groups.
p integer scalar Dimensionality of the multivariate parameter space.

Returns A numeric array of dimension $(S, J_\text{groups}, p)$ where $S$ is the number of posterior draws. Element $[s, g, k]$ holds draw $s$ of $\theta_{\text{ref},g,k}$ for group $g$ and coordinate $k$.

Notes

  • Uses regex ^theta_ref\\[(\\d+),\\s*(\\d+)\\]$ to parse group index $g$ and coordinate index $k$.
  • Expected number of raw columns is $J_\text{groups} \times p$.
  • Calls gdpar_abort with class "gdpar_internal_error" if any parsed index is NA or the total count does not match.
  • Iterates over columns of the raw matrix, placing each into the correct [s, g, k] slice of the output array.

.extract_theta_ref_multi_flat(draws, J_groups, p)

Purpose Convenience wrapper for the no-grouping branch of the multivariate Stan template. When $J_\text{groups} = 1$, the three-dimensional array returned by .extract_theta_ref_multi_grouped has a trivial first axis. This function collapses $(S, 1, p)$ into a plain $(S, p)$ matrix, which is more natural for downstream single-group predictions.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix (posterior) Posterior draws object.
J_groups integer scalar Must be exactly 1L.
p integer scalar Dimensionality of the multivariate parameter.

Returns A numeric matrix of dimension $(S, p)$, equivalent to arr[, 1L, ] where arr is the $(S, 1, p)$ array.

Notes

  • Calls .extract_theta_ref_multi_grouped internally and then drops the singleton group axis.
  • Raises gdpar_abort (class "gdpar_internal_error") if $J_\text{groups} \neq 1$, guarding against misuse.

.extract_mu_sigma_theta_ref(draws, var_name, p)

Purpose Internal helper that extracts posterior draws of either mu_theta_ref or sigma_theta_ref (hierarchical hyperparameters of the group-level reference anchor). These Stan variables are emitted only when use_groups = 1. In the univariate template they are declared array[1] real, yielding variables mu_theta_ref[1]; in the multivariate template they are array[1] vector[p], yielding mu_theta_ref[1, k]. The helper detects both shapes and normalises the output to a matrix of dimension $(S, p)$.

Arguments

Argument Type Meaning
draws draws_array / draws_matrix (posterior) Posterior draws object.
var_name character scalar Either "mu_theta_ref" or "sigma_theta_ref".
p integer scalar Dimensionality of the parameter space ($p = 1$ for univariate).

Returns A numeric matrix of dimension $(S, p)$ where $S$ is the number of draws. For the univariate case ($p = 1$) the matrix has a single column.

Notes

  • First attempts to match a two-index pattern ^<var_name>\\[1,\\s*(\\d+)\\]$ (multivariate). If any matches succeed, it expects exactly $p$ such columns and assembles them into the matrix by coordinate index.
  • If no two-index matches are found, falls back to the one-index pattern ^<var_name>\\[(\\d+)\\]$ (univariate). It expects exactly one match and returns a single-column matrix.
  • Calls gdpar_abort (class "gdpar_internal_error") if:
    • The two-index path finds a count $\neq p$.
    • The one-index path finds 0 or $&gt; 1$ matches, or any parsed index is NA.
  • The var_name parameter allows the same logic to serve both mu_theta_ref and sigma_theta_ref.

predict_from_newdata_grouped(object, newdata, draws)

Purpose Internal function that generates posterior predictions of $\theta_i$ for new observations under a univariate grouped fit. It mirrors the non-grouped predict_from_newdata but respects the per-group anchor structure of Block 6.5: known groups reuse their group-specific theta_ref draws; unseen groups fall back to the hierarchical prior predictive.

Arguments

Argument Type Meaning
object gdpar fitted model object Must contain amm, design, group_info, and anchor components.
newdata data.frame New covariate data; must include the grouping variable group_info$var_name.
draws draws_matrix / draws_array (posterior) Posterior draws from the fitted Stan model.

Algorithm / Mathematics

Let $S$ denote the number of posterior draws and $n$ the number of rows in newdata.

1. Group resolution. For each row $i$, look up group $g_i$ from the grouping variable column in newdata, mapped against the factor levels observed at fit time. Rows whose level was never seen are flagged unseen.

2. Per-draw loop ($s = 1, \dots, S$):

For each observation $i$:

$$ \theta_{\text{ref},i}^{(s)} = \begin{cases} \theta_{\text{ref},g_i}^{(s)} & \text{if } g_i \text{ is a known group level}, \\ z^{(s)} \sim \mathcal{N}!\bigl(\mu_{\theta_\text{ref}}^{(s)},; \sigma_{\theta_\text{ref}}^{(s)}\bigr) & \text{if } g_i \text{ is unseen}. \end{cases} $$

Then the linear predictor is assembled:

$$ \eta_i^{(s)} = \theta_{\text{ref},i}^{(s)}

  • \mathbf{z}{a,i}^{\top},\mathbf{a}{\text{coef}}^{(s)}
  • \mathbf{z}_{b,i}^{\top},\mathbf{c}_b^{(s)}
  • \text{W-contribution}_i^{(s)} $$

where $\mathbf{z}_{a,i}$ and $\mathbf{z}_{b,i}$ are the centered design rows from newdata for the varying-intercept and varying-slope sub-models respectively, $\mathbf{a}_{\text{coef}}$ are the population-level intercept coefficients, and $\mathbf{c}_b$ are the direct $b$-block coefficients (not the derived b_coef; see decision D4 of Block 6.5).

3. W-contribution (only if amm$W is non-null and of type "polynomial"). Let $\mathbf{W}$ be the $(d_w \times d)$ matrix of basis coefficients reshaped from W_raw[s, ], $\sigma_W^{(s)}$ the scale parameter, and $\theta_\text{anchor}$ the reference anchor. For each row $i$:

$$ \text{basis_diff}_{i,k}^{(s)} = \bigl(\theta_{\text{ref},i}^{(s)}\bigr)^k - \theta_{\text{anchor}}^k, \quad k = 1,\dots,d_w $$

$$ \mathbf{w}_i^{(s)} = \sigma_W^{(s)},\mathbf{W}^{\top},\text{basis_diff}_i^{(s)} $$

$$ \text{W-contribution}_i^{(s)} = \mathbf{x}_i^{\top},\mathbf{w}_i^{(s)} $$

where $\mathbf{x}_i$ is the (centered and standardised) covariate row for the $W$ component.

Design matrix construction for newdata:

  • Z_a: model matrix for amm$a on newdata, centered column-wise by the training means design_train$Z_a_means.
  • Z_b: model matrix for amm$b on newdata, centered column-wise by design_train$Z_b_means.
  • X: columns of newdata matching design_train$X_names, centered by X_means and scaled by X_sds (training statistics).
  • Empty matrices (zero columns) are created if the corresponding sub-model is absent.

Returns A numeric matrix of dimension $(S, n)$ where $S$ is the number of posterior draws and $n$ is nrow(newdata). Entry $[s, i]$ is $\eta_i^{(s)}$.

Notes

  • Calls require_suggested("posterior", ...) to ensure the posterior package is available.
  • Raises gdpar_abort (class "gdpar_input_error") if the grouping variable is not found in newdata.
  • If any rows contain unseen group levels, a warning of class "gdpar_predict_unseen_group_warning" is issued via gdpar_warn, naming the unseen levels. Those rows receive draws from the marginal prior predictive rather than a group-specific posterior draw.
  • theta_ref_dr is extracted via .extract_theta_ref_uni_grouped, mu_dr and sigma_dr via .extract_mu_sigma_theta_ref with p = 1L.
  • Population-level coefficients (a_coef, c_b, W_raw, sigma_W) are extracted from draws only if the corresponding sub-model is present; otherwise set to NULL and the corresponding block is skipped.
  • The for loop over $s$ is the outermost loop, so the entire prediction for draw $s$ is computed before moving to $s+1$. This includes random number generation for unseen groups (via stats::rnorm), meaning the RNG state matters and results are reproducible only if the seed is set beforehand.

predict_from_newdata_grouped_multi(object, newdata, draws)

Purpose
Generates posterior predictive draws of the linear predictor for new data from a fitted generalized dynamic parameter (GDPAR) model with multiple groups. It is an internal workhorse used by the predict.gdpar method for grouped models. It handles unseen groups by drawing their reference parameters from the marginal prior predictive distribution.

Arguments

  • object : A fitted model object of class "gdpar" containing the amm (Anchored Model Matrix), design, p, group_info, parametrization, and anchor components.
  • newdata : A data frame containing new observations. Must include the same grouping variable used during model fitting.
  • draws : A posterior draws object (e.g., from the posterior package) containing samples of all model parameters.

Mathematics
For each posterior draw $s$, each new observation $i$, and each response dimension $k = 1, \dots, p$, the linear predictor $\eta_{s,i,k}$ is computed as:

  1. Reference Parameter
    Let $g(i)$ be the group of observation $i$. If $g(i)$ is a seen group,
    $\theta_{\text{ref}, s, i, k} = \theta_{\text{ref}, s, g(i), k}^{(\text{train})}$ where $\theta_{\text{ref}, s, g(i), k}^{(\text{train})}$ is the group‑specific reference draw.
    If $g(i)$ is unseen, $\theta_{\text{ref}, s, i, k}$ is drawn from the prior predictive:
    $\theta_{\text{ref}, s, i, k} \sim \mathcal{N}\left( \mu_{\theta_{\text{ref}}, s, k},\; \sigma_{\theta_{\text{ref}}, s, k} \right).$

  2. Linear Additive Terms
    Add the contributions from the a and b design matrices:
    $\eta_{s,i,k} \;+=\; Z_{a,i,k}^\top \, a_{\text{coef}, s, k} \;+\; Z_{b,i,k}^\top \, c_{b, s, k}$ where $Z_{a,i,k}$ and $Z_{b,i,k}$ are centered (mean‑subtracted) design vectors for the a and b sub‑models of dimension $k$, and $a_{\text{coef}, s, k}$, $c_{b, s, k}$ are the corresponding coefficient draws.

  3. Polynomial Basis Term (if present)
    If the model contains a polynomial basis matrix $\mathbf{W}$, then for each dimension $k$ let $\theta_{\text{anchor},k}$ be the fixed anchor point. Define the basis difference vector
    $$\Delta_{s,i,k} = \begin{bmatrix} \theta_{\text{ref}, s, i, k}^{1} - \theta_{\text{anchor},k}^{1} \ \theta_{\text{ref}, s, i, k}^{2} - \theta_{\text{anchor},k}^{2} \ \vdots \ \theta_{\text{ref}, s, i, k}^{W_{\text{per}_k}} - \theta_{\text{anchor},k}^{W_{\text{per}_k}} \end{bmatrix}$$
    where $W_{\text{per}\_k}$ is the number of polynomial terms per dimension. Let $W_{s,k}$ be the block of the weight matrix draw corresponding to dimension $k$. Then
    $\eta_{s,i,k} \;+=\; X_i^\top \left( W_{s,k}^\top \, \Delta_{s,i,k} \right).$ If the model uses a non‑centered parametrization (cp_W = FALSE), the term is additionally scaled by $\sigma_{W,s}$.

Returns
A 3‑dimensional array of size $(S, n_{\text{new}}, p)$ where $S$ is the number of posterior draws, $n_{\text{new}}$ is the number of rows in newdata, and $p$ is the number of response dimensions. Entry [s, i, k] is the linear predictor $\eta_{s,i,k}$ described above.

Notes

  • Side effect: Loads the posterior package via require_suggested().
  • Error handling: Raises a gdpar_input_error if the required grouping variable is missing from newdata.
  • Warning: Issues a gdpar_predict_unseen_group_warning if any level of the grouping variable in newdata was not present during training.
  • Unseen groups: Their reference parameters are drawn i.i.d. from the prior predictive $\mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}})$ for each draw, ignoring any group‑level posterior shrinkage.
  • Design matrix centering: The Z_a and Z_b matrices are centered using the training‑set column means stored in design_train$Z_a_means_list and Z_b_means_list.
  • Polynomial basis: Only active when the model’s amm$W is present and of type "polynomial". The number of terms per dimension ($W_{\text{per}\_k}$) is computed as dim_W / p.
  • Parametrization: The scaling of the $W$ basis by $\sigma_W$ depends on the logical flag object$parametrization$cp_W.
  • S3 dispatch: Not directly dispatched; called internally by predict.gdpar when the model is grouped and multi‑dimensional.

R/preflight_multi.R

preflight_parametrization_multi(prior, stan_data, amm, preflight_seed = NULL, verbose = FALSE, tau_cp = 5, tau_ncp = 2, aggregation = "any_ncp")

Purpose

Multivariate ($p &gt; 1$) pre-flight diagnostic that selects between centered (CP) and non-centered (NCP) parametrizations for each effective component ($a$ and $W$) of a coordinate-wise factorized dynamic parameter model. It is the multivariate counterpart of preflight_parametrization(). The function compiles and samples a short NCP version of the multivariate Stan template, then applies a three-filter cascade—divergence attribution, E-BFMI, and info-ratio z-test (Path B′)—to each coordinate $k \in \{1,\dots,p\}$ of each component independently. Per-coordinate decisions are aggregated into per-component global decisions via the aggregation strategy, and a full report is returned as a gdpar_preflight_report S3 object.

Arguments

Argument Type Meaning
prior gdpar_prior Prior specification object; validated via assert_inherits.
stan_data list Data list assembled by .assemble_stan_data_multi; must contain fields use_a, use_W, J_a_per_k, W_per_k_dim, d, theta_anchor.
amm amm_spec AMM specification object; must have amm$p set to an integer $\geq 1$.
preflight_seed integer or NULL Random seed passed to run_preflight_sample.
verbose logical If FALSE, pre-flight stderr is sunk to a tempfile and discarded.
tau_cp numeric scalar Upper null-hypothesis value for the info-ratio z-test. Default 5 (handoff-canonical).
tau_ncp numeric scalar Lower null-hypothesis value for the info-ratio z-test. Default 2 (handoff-canonical).
aggregation character scalar One of c("any_ncp", "majority", "per_k"). Default "any_ncp". Controls how per-coordinate decisions are aggregated into per-component decisions inside the report.

Mathematics

Let $p = \texttt{amm}\$p$. For each coordinate $k \in \{1,\dots,p\}$ and each component $c \in \{a, W\}$, a decision $d_c^{(k)} \in \{\texttt{"CP"}, \texttt{"NCP"}, \texttt{"absent"}\}$ is produced through a sequential filter cascade. A coordinate is marked "absent" when the component is unused or degenerate for that $k$.

Preliminary quantities.

$$ J_a^{\text{free}}(k) = \max(0,; J_{a,\text{per}_k}[k]) $$

$$ \texttt{offset}_a(k) = \begin{cases} 0 &amp; k = 1 \ \sum_{j=1}^{k-1} J_a^{\text{free}}(j) &amp; k &gt; 1 \end{cases} $$

A coordinate $k$ needs processing for component $a$ iff $\texttt{use\_a} = 1$ and $J_a^{\text{free}}(k) \geq 1$; for component $W$ iff $\texttt{use\_W} = 1$, $W_{\text{per}\_k\_\text{dim}} \geq 1$, and $d \geq 1$.

Critical values (computed once, $\alpha = 0.025$):

$$ t_{\text{crit}} = t_{1-\alpha}!\left(\max(1,; n_{\text{div}} - 1)\right), \qquad z_{\text{crit}} = \Phi^{-1}(1 - \alpha) $$

where $n_{\text{div}}$ is the total number of divergent transitions across all chains and draws.

Filter 1 — Divergence attribution (per coordinate). Applied only when $n_{\text{div}} \geq 1$. For each undecided coordinate $k$ of component $a$, a per-$k$ attribution score $s_a^{(k)}$ is computed by preflight_attribution_score_a_per_k_from_fit using $\log(\sigma_a[k])$ and $\|a_{\text{raw}}[\text{section}_k]\|$. For component $W$, $s_W^{(k)}$ uses $\log(\sigma_W[1])$ and the norm of the $k$-th block of $W_{\text{raw}}$. If $s_c^{(k)} &gt; t_{\text{crit}}$, the coordinate is forced to CP:

$$ d_c^{(k)} \leftarrow \texttt{"CP"} \quad \text{if } s_c^{(k)} > t_{\text{crit}} $$

Filter 2 — E-BFMI (global). If $\min(\text{E-BFMI}) &lt; 0.3$, every still-undecided coordinate of every effective component is forced to CP:

$$ d_c^{(k)} \leftarrow \texttt{"CP"} \quad \text{if } d_c^{(k)} = \texttt{"absent"} \text{ and } \text{E-BFMI}_{\min} < 0.3 $$

Filter 3 — Info-ratio z-test (Path B′, per coordinate). For each undecided coordinate $k$, the info-ratio extractor returns two test statistics $t_{\text{cp}}$ and $t_{\text{ncp}}$. The decision rule is:

$$ d_c^{(k)} = \begin{cases} \texttt{"NCP"} & \text{if } t_{\text{cp}} = \texttt{NA} \quad (\texttt{filter_info_undefined_ncp}) \\ \texttt{"CP"} & \text{if } t_{\text{cp}} > z_{\text{crit}} \quad (\texttt{filter_info_high}) \\ \texttt{"NCP"} & \text{if } t_{\text{ncp}} < -z_{\text{crit}} \quad (\texttt{filter_info_low}) \\ \texttt{"NCP"} & \text{otherwise} \quad (\texttt{filter_info_ambiguous_ncp}) \end{cases} $$

For component $a$, the effective coefficient matrix per draw is a_coef[, k, 1:J_a_free[k]] and the reference scale is sigma_a[, k]. For component $W$, the effective coefficient at coordinate $jj$ per draw $t$ is:

$$ \text{eff}[t, jj] = \sum_{m=1}^{W_{\text{per}_k_\text{dim}}} \left(\theta_{\text{ref}}[t, k]^m - \theta_{\text{anchor}}[k]^m\right) W_{\text{raw}}!\left[t,; (k-1),W_{\text{per}_k_\text{dim}} + m,; jj\right] \sigma_W[t] $$

and the per-draw reference scale is:

$$ \sigma_W[t] \sqrt{\sum_{m=1}^{W_{\text{per}_k_\text{dim}}} \left(\theta_{\text{ref}}[t, k]^m - \theta_{\text{anchor}}[k]^m\right)^2} $$

Aggregation. Per-coordinate decisions are passed to new_gdpar_preflight_report, which produces a global per-component decision according to aggregation. The global decision is then converted to a logical (TRUE = CP) via decision_to_logical.

Returns

A named list with elements:

Element Type Description
cp_a logical scalar Aggregated global CP/NCP decision for component $a$ (TRUE = CP).
cp_W logical scalar Aggregated global CP/NCP decision for component $W$ (TRUE = CP).
cp_a_per_k logical vector (length $p$) Per-coordinate CP/NCP decision for component $a$.
cp_W_per_k logical vector (length $p$) Per-coordinate CP/NCP decision for component $W$.
cs_model_ncp cmdstanr model object or NULL Compiled NCP Stan model; NULL if no preflight was run (no component needed per-$k$ processing).
report gdpar_preflight_report S3 object containing the full per-coordinate, per-component diagnostic table and settings.

Notes

  • Input validation. assert_inherits is called on prior (must be gdpar_prior) and amm (must be amm_spec). If amm$p is NULL or < 1, the function aborts with a gdpar_internal_error class error. If aggregation is not one of the three allowed values, it aborts with a gdpar_input_error class error carrying a data field with received.
  • Stan code generation. The NCP Stan template is generated via generate_stan_code_multi(prior, cp_a = FALSE, cp_W = FALSE), written to a tempfile via write_stan_to_tempfile, and compiled with cmdstanr::cmdstan_model.
  • Sampling. Delegated to run_preflight_sample(cs_model_ncp, stan_data, preflight_seed, verbose). The total number of transitions is hardcoded as 2L * 200L (2 chains × 200 post-warmup draws), matching the settings recorded in the report (n_chains = 2L, n_warmup = 200L, n_sampling = 200L, adapt_delta = 0.95, max_treedepth = 10L).
  • Diagnostics. diag <- pre_fit$diagnostic_summary(quiet = TRUE) is called on the fit object. n_divergent is sum(diag$num_divergent); div_pct is n_divergent / total_transitions; ebfmi_min is min(diag$ebfmi).
  • Null coalescing. Uses %||% for stan_data fields (J_a_per_k, W_per_k_dim, d, theta_anchor), defaulting to 0L, 0L, 0L, and as.double(stan_data$theta_anchor) respectively.
  • Offset computation. a_raw_offset_per_k is c(0L, cumsum(J_a_free_per_k)[-p]), giving the 0-based start index of each $k$'s block in the flattened a_raw vector. Note that cumsum is taken over all $p$ elements and the last element is dropped, so the result has length $p$.
  • needs_W_per_k is constructed as rep(has_W & W_per_k_dim >= 1L & d >= 1L, p), i.e., the same scalar condition replicated $p$ times.
  • No preflight path. If !any(needs_a_per_k) && !any(needs_W_per_k), no Stan model is compiled or sampled; cs_model_ncp remains NULL, used_preflight remains FALSE, and all decisions remain "absent" with NA test statistics.
  • Report construction. The per_dim data frame has $2p$ rows ($p$ for component a, $p$ for component W) with columns: component, dim, decision, decision_reason, n_divergent, div_pct, ebfmi_min, t_attribution, t_info_cp, t_info_ncp. The aggregation argument is passed as method to new_gdpar_preflight_report.
  • Global decision extraction. The global decision is read from report$global$global_decision filtered by component, then converted to logical via decision_to_logical.
  • Phase H.1 contract. The "any_ncp" and "majority" strategies are wired through to the existing multivariate Stan template (single CP/NCP pair per component). The "per_k" strategy requires Phase H.2 per-$k$ Stan template wiring and is rejected at the resolver level (not within this function).
  • Side effects. Writes a Stan file to a temporary directory; compiles a cmdstan model; potentially writes stderr to a tempfile (when verbose = FALSE).

preflight_a_info_per_k_from_fit(fit, k, J_a_free_k, a_raw_offset_k, tau_cp, tau_ncp)

Purpose

Per-coordinate ($k$) Path B′ info-ratio extractor for the additive component $a$. It extracts the per-draw free additive coefficients a_coef[k, 1:J_a_free_k] and the per-draw scale sigma_a[k] from a preflight cmdstanr fit object, then delegates the info-ratio z-test computation to preflight_info_ratio_t.

Arguments

Argument Type Meaning
fit cmdstanr fit object The preflight model fit from which posterior draws are extracted.
k integer scalar Coordinate index (1-based) into the $p$-dimensional parameter space.
J_a_free_k integer scalar Number of free additive coefficients for coordinate $k$.
a_raw_offset_k integer scalar Offset into the flattened a_raw vector for coordinate $k$. Accepted but not used in the function body.
tau_cp numeric scalar Upper null-hypothesis value for the info-ratio z-test.
tau_ncp numeric scalar Lower null-hypothesis value for the info-ratio z-test.

Mathematics

Given $J = J_a^{\text{free}}(k)$ free coefficients, the function extracts per-draw posterior samples:

$$ \sigma_a^{(k)}[t], \quad t = 1, \dots, T $$

$$ \mathbf{a}_{\text{coef}}^{(k)}[t, j], \quad j = 1, \dots, J $$

where $T$ is the total number of posterior draws. These are passed to preflight_info_ratio_t as effective_coef (an $T \times J$ matrix) and reference_scale_per_draw (a length-$T$ vector), with n_chains = 2L. The info-ratio z-test then produces two statistics $t_{\text{cp}}$ and $t_{\text{ncp}}$.

Returns

A named list with two elements:

Element Type Description
t_cp numeric scalar or NA_real_ Info-ratio z-test statistic under the CP null hypothesis.
t_ncp numeric scalar or NA_real_ Info-ratio z-test statistic under the NCP null hypothesis.

Returns list(t_cp = NA_real_, t_ncp = NA_real_) in any of the following degenerate cases:

  • J_a_free_k < 1L
  • The draw extraction for sigma_a[k] raises an error or returns zero columns
  • Any element of sigma_vec is NA or $\leq 0$
  • The draw extraction for a_coef[k, 1:J_a_free_k] raises an error or returns a matrix with ncol != J_a_free_k

Notes

  • Variable naming. Stan variables are extracted using 1-based Stan indexing: sprintf("sigma_a[%d]", k) for the scale and sprintf("a_coef[%d,%d]", k, seq_len(J_a_free_k)) for the coefficient vector.
  • Error handling. Both fit$draws() calls are wrapped in tryCatch with an error handler returning NULL. A subsequent is.null check (or ncol check) triggers the NA return path.
  • Unused argument. a_raw_offset_k is part of the signature (for symmetry with the $W$ counterpart preflight_W_info_per_k_from_fit) but is not referenced in the function body.
  • Draw format. Both draw extractions request format = "draws_matrix". The coefficient matrix is additionally coerced via as.matrix().
  • Hardcoded chain count. n_chains = 2L is passed to preflight_info_ratio_t, matching the preflight configuration in preflight_parametrization_multi.
  • No side effects. The function is purely computational; it reads from the fit object but does not modify it or any external state.

preflight_W_info_per_k_from_fit(fit, k, W_per_k_dim, d, theta_anchor_k, tau_cp, tau_ncp)

Purpose

Per-component-$k$ information-ratio diagnostic for the multiplicative $W$ block in the multivariate (multi) pre-flight pipeline. It extracts posterior draws of theta_ref[k], sigma_W[1], and the $k$-th row-block of W_raw from a pre-flight Stan fit, reconstructs the effective coefficient contribution of that block along each coordinate, and delegates the final CP/NCP threshold computation to preflight_info_ratio_t. This is the multivariate analogue of the per-$k$ $W$ information diagnostic used by preflight_parametrization_multi.

Arguments

Argument Type Meaning
fit Stan fit object A fitted model object exposing $draws(variables, format) and (transitively, via preflight_info_ratio_t) sampler diagnostics.
k integer scalar 1-based index of the component whose $W$ block is being diagnosed.
W_per_k_dim integer scalar Number of rows in the W_raw matrix belonging to component $k$ (the polynomial basis dimension for that block).
d integer scalar Number of columns of W_raw (the coordinate / response dimension).
theta_anchor_k numeric scalar The reference-anchor value $\theta_{\text{anchor},k}$ used to define the basis displacement.
tau_cp numeric scalar CP-side threshold passed through to preflight_info_ratio_t.
tau_ncp numeric scalar NCP-side threshold passed through to preflight_info_ratio_t.

Mathematics

Let $S$ denote the number of posterior draws. The function extracts:

$$ \theta^{(s)} = \texttt{theta_ref}[k]^{(s)}, \qquad \sigma^{(s)} = \texttt{sigma_W}[1]^{(s)}, \qquad s = 1,\dots,S. $$

The $k$-th block occupies rows $r_m = (k-1)\,W_{\text{per\_k\_dim}} + m$ for $m = 1,\dots,W_{\text{per\_k\_dim}}$ of W_raw. For each draw $s$, basis coordinate $m$, and output coordinate $j \in \{1,\dots,d\}$, the basis displacement and effective coefficient are

$$ \Delta_m^{(s)} = \bigl(\theta^{(s)}\bigr)^{m} - \theta_{\text{anchor},k}^{,m}, $$

$$ \text{basis_norm}^{(s)} = \sqrt{\sum_{m=1}^{W_{\text{per_k_dim}}} \bigl(\Delta_m^{(s)}\bigr)^2}, $$

$$ \text{reference_scale}^{(s)} = \sigma^{(s)} \cdot \text{basis_norm}^{(s)}, $$

$$ \text{eff_coef}^{(s)}_{j} = \sum_{m=1}^{W_{\text{per_k_dim}}} \Delta_m^{(s)} \cdot W_{\text{raw}}[r_m, j]^{(s)} \cdot \sigma^{(s)}. $$

The pair $(\text{eff\_coef},\, \text{reference\_scale})$ is forwarded to preflight_info_ratio_t with n_chains = 2L.

Returns

A list with elements t_cp and t_ncp (both NA_real_ on every early-exit path, otherwise the structure returned by preflight_info_ratio_t).

Notes

  • Early-exit (returns list(t_cp = NA_real_, t_ncp = NA_real_)) occurs when:
    • W_per_k_dim < 1L or d < 1L;
    • fit$draws(...) for theta_ref[k] or sigma_W[1] raises an error or returns zero columns;
    • any element of theta_vec or sigma_vec is NA, or any sigma_vec <= 0;
    • fit$draws(...) for the W_raw block raises an error or returns a matrix whose column count differs from W_per_k_dim * d;
    • any element of basis_norm is non-finite or non-positive.
  • tryCatch wrappers suppress errors from $draws() and coerce them to NULL.
  • basis_diff is defensively re-shaped into a matrix if vapply returns a non-matrix (which can occur when W_per_k_dim == 1L).
  • n_chains is hard-coded to 2L regardless of the actual number of chains in fit.
  • No side effects beyond reading from fit.

preflight_attribution_score_a_per_k_from_fit(fit, k, J_a_free_k, a_raw_offset_k)

Purpose

Per-component-$k$ attribution score (Path B′ filter 1) for the additive component $a$. It extracts sigma_a[k] and the flat-packed free-coefficient section a_raw[a_raw_offset_k + 1 : J_a_free_k] from the pre-flight fit, computes the per-draw Euclidean norm of that section, retrieves the divergent-transition indicator, and delegates to preflight_attribution_score.

Arguments

Argument Type Meaning
fit Stan fit object Pre-flight fit exposing $draws(variables, format) and $sampler_diagnostics(format).
k integer scalar 1-based component index used to select sigma_a[k].
J_a_free_k integer scalar Number of free coefficients in the $a$ section for component $k$.
a_raw_offset_k integer scalar 0-based offset into the flat a_raw vector at which component $k$'s section begins.

Mathematics

Let $S$ be the number of draws. The extracted quantities are

$$ \sigma_a^{(s)} = \texttt{sigma_a}[k]^{(s)}, \qquad \mathbf{a}^{(s)} = \bigl(\texttt{a_raw}[\text{offset}_k + 1]^{(s)}, \dots, \texttt{a_raw}[\text{offset}_k + J]^{(s)}\bigr)^\top, $$

$$ |\mathbf{a}^{(s)}| = \sqrt{\sum_{j=1}^{J} \bigl(a^{(s)}_j\bigr)^2}. $$

The score is then

$$ \text{score} = \texttt{preflight_attribution_score}!\bigl(\log \sigma_a^{(s)},; |\mathbf{a}^{(s)}|,; \text{divergent}^{(s)}\bigr). $$

Returns

A numeric scalar — the attribution score — or NA_real_ on any early-exit path.

Notes

  • Early-exit (returns NA_real_) when:
    • J_a_free_k < 1L;
    • $draws() for sigma_a[k] errors or returns zero columns;
    • any sigma_vec <= 0 or any sigma_vec is NA;
    • $draws() for the a_raw section errors or returns a column count $\neq J_{a\_free\_k}$;
    • $sampler_diagnostics() errors, returns NULL, or lacks a "divergent__" column.
  • The offset convention is 0-based: the actual Stan indices queried are a_raw_offset_k + 1, …, a_raw_offset_k + J_a_free_k.
  • divergent is coerced to logical via as.logical.
  • No side effects.

preflight_attribution_score_W_per_k_from_fit(fit, k, W_per_k_dim, d)

Purpose

Per-component-$k$ attribution score (Path B′ filter 1) for the multiplicative component $W$. It extracts the global sigma_W[1], selects the rows of W_raw belonging to the $k$-th block, computes the per-draw Frobenius norm of that block, retrieves divergent-transition indicators, and delegates to preflight_attribution_score.

Arguments

Argument Type Meaning
fit Stan fit object Pre-flight fit exposing $draws and $sampler_diagnostics.
k integer scalar 1-based component index selecting the row-block of W_raw.
W_per_k_dim integer scalar Number of rows in the $k$-th block of W_raw.
d integer scalar Number of columns of W_raw.

Mathematics

With $S$ draws and block rows $r_m = (k-1)\,W_{\text{per\_k\_dim}} + m$:

$$ \sigma_W^{(s)} = \texttt{sigma_W}[1]^{(s)}, \qquad \mathbf{W}^{(s)} = \bigl[\texttt{W_raw}[r_m, j]^{(s)}\bigr]_{m=1,\dots,W_{\text{per_k_dim}};; j=1,\dots,d}, $$

$$ |\mathbf{W}^{(s)}|_F = \sqrt{\sum_{m=1}^{W_{\text{per_k_dim}}} \sum_{j=1}^{d} \bigl(W^{(s)}_{m,j}\bigr)^2}. $$

The score is

$$ \text{score} = \texttt{preflight_attribution_score}!\bigl(\log \sigma_W^{(s)},; |\mathbf{W}^{(s)}|_F,; \text{divergent}^{(s)}\bigr). $$

Returns

A numeric scalar — the attribution score — or NA_real_ on any early-exit path.

Notes

  • Early-exit (returns NA_real_) under the same structural conditions as preflight_attribution_score_a_per_k_from_fit, mutatis mutandis for sigma_W[1] and the W_raw block.
  • The variable names passed to $draws() are constructed via outer(block_rows, seq_len(d), ...) producing sprintf("W_raw[%d,%d]", r, c) strings; the resulting vector is flattened by as.vector, so column ordering in the returned matrix follows row-major enumeration of (r, c).
  • W_norm is computed as sqrt(rowSums(W_block_mat^2)), i.e. the Frobenius norm per draw.
  • No side effects.

decision_to_logical(x)

Purpose

Maps a vector of parametrization-decision strings to a logical CP-flag vector. Used internally to translate pre-flight decision labels into the boolean cp_* flags consumed by downstream resolution logic.

Arguments

Argument Type Meaning
x character vector Each element is one of "CP", "NCP", "absent", "per_k".

Mathematics

The mapping is

$$ \texttt{decision_to_logical}(v) = \begin{cases} \texttt{TRUE} & v = \text{"CP"}, \\ \texttt{FALSE} & v = \text{"NCP"}, \\ \texttt{FALSE} & v = \text{"absent"}, \\ \texttt{NA} & v = \text{"per_k"}. \end{cases} $$

Returns

A logical vector of the same length as x (with possible NA values), without names (USE.NAMES = FALSE).

Notes

  • Any value not in $\{\text{"CP"}, \text{"NCP"}, \text{"absent"}, \text{"per\_k"}\}$ triggers gdpar_abort with class = "gdpar_internal_error" and data = list(value = v), carrying the message "Unknown decision value '<v>'.".
  • The "per_k" sentinel maps to NA; per the docstring, the resolver is expected to translate this into an unsupported-feature error in Phase H.1 (though the actual translation is performed elsewhere).
  • Implemented via vapply with switch, ensuring a scalar logical per element.

`%||%`(a, b)

Purpose

Local NULL-coalescing operator. Provides a fallback value when a is NULL, without depending on rlang or base-R %||% availability across R versions.

Arguments

Argument Type Meaning
a any Primary value; tested for NULL.
b any Fallback value returned when a is NULL.

Mathematics

$$ a ;%||%; b = \begin{cases} b & \text{if } a \text{ is } \texttt{NULL}, \ a & \text{otherwise.} \end{cases} $$

Returns

b if is.null(a) is TRUE; otherwise a.

Notes

  • Defined as a backtick-quoted infix function `%||%`, so it is callable as a %||% b within the enclosing namespace.
  • Only tests for NULL; does not handle zero-length vectors, NA, or FALSE as "empty".
  • No side effects.

resolve_parametrization_multi(parametrization, parametrization_a, parametrization_W, parametrization_aggregation, prior, stan_data, amm, preflight_seed, verbose, tau_cp = 5, tau_ncp = 2)

Purpose

Multivariate counterpart of resolve_parametrization(). Combines the user-supplied global parametrization, per-component scalar overrides parametrization_a / parametrization_W, and an optional pre-flight diagnostic to produce resolved CP/NCP flags (both global scalars and per-$k$ vectors of length $p$) for the additive component $a$ and the multiplicative component $W$. Also enforces aggregation-strategy validation and emits a structured informational message when heterogeneous per-$k$ $W$ decisions cannot be honored by the single-global-sigma_W[1] model.

Arguments

Argument Type Meaning
parametrization character scalar One of "auto", "ncp", "cp" (validated by the caller).
parametrization_a NULL or character scalar Per-component override for $a$; NULL or "ncp" expected. The docstring states "cp" is rejected with gdpar_unsupported_feature_error, but the code itself does not perform this rejection — it merely sets cp_a_user = (parametrization_a == "cp").
parametrization_W character scalar or NULL Per-component override for $W$; NULL, "ncp", or "cp". "cp" is honored.
parametrization_aggregation character scalar or NULL One of "any_ncp", "majority", "per_k"; NULL defaults to "any_ncp". The docstring states "per_k" is rejected with gdpar_unsupported_feature_error, but the code accepts "per_k" and handles it by setting cp_a_resolved = NA.
prior list Prior specification passed to preflight_parametrization_multi.
stan_data list Stan data list passed to preflight_parametrization_multi.
amm list Adaptive model metadata; must contain element p (number of components).
preflight_seed integer or NULL Seed for the pre-flight fit.
verbose logical Verbosity flag forwarded to the pre-flight.
tau_cp numeric scalar CP threshold (default 5); forwarded to preflight_parametrization_multi.
tau_ncp numeric scalar NCP threshold (default 2); forwarded to preflight_parametrization_multi.

Mathematics

Let $p = \texttt{amm\$p}$. The user-level and global-level CP flags are derived as

$$ \text{cp_a_user} = \begin{cases} \mathbb{1}[\texttt{parametrization_a} = \text{"cp"}] & \texttt{parametrization_a} \neq \texttt{NULL}, \ \texttt{NA} & \text{otherwise}, \end{cases} $$

$$ \text{cp_W_user} = \begin{cases} \mathbb{1}[\texttt{parametrization_W} = \text{"cp"}] & \texttt{parametrization_W} \neq \texttt{NULL}, \ \texttt{NA} & \text{otherwise}, \end{cases} $$

$$ \text{cp_a_global} = \text{cp_W_global} = \begin{cases} \texttt{FALSE} & \texttt{parametrization} = \text{"ncp"}, \ \texttt{TRUE} & \texttt{parametrization} = \text{"cp"}, \ \texttt{NA} & \texttt{parametrization} = \text{"auto"}. \end{cases} $$

A pre-flight is required iff both the user override and the global setting are indeterminate:

$$ \text{needs_pf_a} = \texttt{is.na}(\text{cp_a_user}) \wedge \texttt{is.na}(\text{cp_a_global}), \quad \text{needs_pf_W} = \texttt{is.na}(\text{cp_W_user}) \wedge \texttt{is.na}(\text{cp_W_global}). $$

When the pre-flight runs, each resolved flag prefers the user override when non-NA and falls back to the pre-flight result otherwise; per-$k$ vectors are either broadcast from the user scalar or taken from pf$cp_a_per_k / pf$cp_W_per_k. When no pre-flight is needed, per-$k$ vectors are broadcast as $\texttt{rep}(\text{cp}, p)$.

For the $a$ component, under aggregation = "per_k" the global cp_a_resolved is set to NA (preserving the per-$k$ vector); otherwise, if the per-$k$ vector is heterogeneous it is overwritten by $\texttt{rep}(\text{cp\_a\_resolved}, p)$, enforcing a uniform branch.

For the $W$ component, because the model uses a single global sigma_W[1], heterogeneous per-$k$ $W$ decisions cannot be honored at the sampler level. When length(unique(cp_W_per_k_resolved)) > 1L, a structured gdpar_inform message (class "gdpar_W_per_k_heterogeneous_message") is emitted and cp_W_per_k_resolved is overwritten by $\texttt{rep}(\text{cp\_W\_resolved}, p)$.

Returns

A list with elements:

Element Type Description
cp_a logical scalar (possibly NA) Resolved global CP flag for $a$.
cp_W logical scalar Resolved global CP flag for $W$.
cp_a_per_k logical vector (length $p$) Per-component CP flags for $a$.
cp_W_per_k logical vector (length $p$) Per-component CP flags for $W$.
aggregation character scalar The (possibly defaulted) aggregation strategy.
report gdpar_preflight_report or NULL Pre-flight report when the pre-flight ran; NULL otherwise.

All logical elements are coerced via as.logical before return.

Notes

  • Aggregation validation: If parametrization_aggregation (after defaulting) is not in {"any_ncp", "majority", "per_k"}, gdpar_abort is called with class = "gdpar_input_error" and data = list(received = parametrization_aggregation).
  • Docstring vs. code discrepancies: The docstring asserts that parametrization_a = "cp" is rejected with gdpar_unsupported_feature_error and that aggregation = "per_k" is likewise rejected. The code does neither: "cp" for parametrization_a is silently accepted (setting cp_a_user = TRUE), and "per_k" aggregation is accepted and handled by setting cp_a_resolved = NA. These appear to be forward-looking contract notes for Phase H.2 rather than enforced behavior in this code.
  • Pre-flight invocation: When needs_pf_a || needs_pf_W, preflight_parametrization_multi is called with aggregation = parametrization_aggregation and the supplied tau_cp / tau_ncp.
  • Heterogeneous $W$ message: The gdpar_inform call carries class = "gdpar_W_per_k_heterogeneous_message" and data = list(cp_W_per_k = cp_W_per_k_resolved) (the pre-overwrite vector). The message text documents that Bloque 8 may promote sigma_W to array[p].
  • No direct side effects other than the potential gdpar_inform message and the pre-flight fit (which itself may have side effects such as file I/O or console output depending on verbose).
  • The function does not validate parametrization itself; this is assumed to have been done by the caller.

R/preflight_report.R

validate_preflight_per_dim(df)

Purpose Internal schema validator for the per-dimension decision data frame produced by the multivariate preflight (Path B' applied per coordinate). Called before the data frame is stored inside a gdpar_preflight_report object. Signals a condition of class gdpar_input_error on any violation; returns invisible(TRUE) on success.

Arguments

Argument Type Meaning
df data.frame The per-coordinate, per-component decision table. Must contain exactly the columns listed in .preflight_per_dim_cols: "component", "dim", "decision", "decision_reason", "n_divergent", "div_pct", "ebfmi_min", "t_attribution", "t_info_cp", "t_info_ncp".

Mathematics None. Purely structural validation.

Algorithm — validation checks performed in order

  1. df must be a data.frame; otherwise abort with class gdpar_input_error and a data field list(received = class(df)).
  2. Every column in .preflight_per_dim_cols must be present in colnames(df). The set of missing columns is reported in the abort message and in data$missing.
  3. df$component must be character with every value in $\{\texttt{"a"},\; \texttt{"W"}\}$.
  4. df$dim must be integer, every value $\ge 1$, and no NA.
  5. df$decision must be character with every value in $\{\texttt{"CP"},\; \texttt{"NCP"},\; \texttt{"absent"}\}$ (the levels stored in .preflight_decision_levels).
  6. df$decision_reason must be character (no further content check).
  7. df$n_divergent must be integer.
  8. Each of df$div_pct, df$ebfmi_min, df$t_attribution, df$t_info_cp, df$t_info_ncp must be numeric.
  9. No duplicate (component, dim) pairs are allowed (checked via duplicated(df[, c("component", "dim")])).

Returns invisible(TRUE) on success. Never returns a visible value; aborts via gdpar_abort() on failure.

Notes

  • The function is internal (@noRd, @keywords internal). It is not exported.
  • Side effect: raises a condition of class gdpar_input_error (produced by gdpar_abort) if any check fails. The condition object carries a data attribute with diagnostic details (e.g., the set of missing columns, the received class of df).
  • Column component is restricted to the two model components of the HSGP reference-anchored parameterization: $\mathbf{a}$ (intercept) and $\mathbf{W}$ (basis weights).
  • The function does not check that dim values form a contiguous sequence or that they are consistent across components; it only checks positivity and absence of NA.

validate_preflight_global(df)

Purpose Internal schema validator for the aggregated (global-level) decision data frame that summarizes per-component CP/NCP decisions across all coordinates. Companion to validate_preflight_per_dim; called before storing the global summary inside a gdpar_preflight_report object.

Arguments

Argument Type Meaning
df data.frame The per-component global decision table. Must contain exactly the columns in .preflight_global_cols: "component", "global_decision", "agreement", "method".

Mathematics None. Purely structural validation.

Algorithm — validation checks performed in order

  1. df must be a data.frame.
  2. Every column in .preflight_global_cols must be present.
  3. df$component must be character with every value in $\{\texttt{"a"},\; \texttt{"W"}\}$.
  4. df$global_decision must be character with every value in $\{\texttt{"CP"},\; \texttt{"NCP"},\; \texttt{"per\_k"},\; \texttt{"absent"}\}$ (.preflight_global_decision_levels).
  5. df$agreement must be numeric. Additionally, every finite (non-NA, non-Inf) value must lie in $[0, 1]$. NA and Inf values are tolerated.
  6. df$method must be character with every value in $\{\texttt{"any\_ncp"},\; \texttt{"majority"},\; \texttt{"per\_k"}\}$ (.preflight_aggregation_methods).
  7. No duplicate component entries are allowed.

Returns invisible(TRUE) on success. Aborts with gdpar_input_error on failure.

Notes

  • Internal, not exported.
  • The agreement column is allowed to contain NA (e.g., when all per-dim decisions are "absent" for a component) and $\pm\infty$, but finite non-NA values are clamped to $[0,1]$ by the validator.
  • Duplicate component is forbidden because each component ($\mathbf{a}$, $\mathbf{W}$) may appear at most once in the global summary.

aggregate_preflight_one_component(decisions, method)

Purpose Aggregates the vector of per-dimension decisions for a single model component ($\mathbf{a}$ or $\mathbf{W}$) into a single global decision according to the chosen aggregation strategy. This is the core decision-merging logic for the multivariate preflight report.

Arguments

Argument Type Meaning
decisions character vector Per-dimension decision levels for one component. Each element is "CP", "NCP", or "absent".
method character scalar Aggregation method, one of "any_ncp", "majority", "per_k".

Mathematics

Let $\mathcal{E} = \{d_i : d_i \neq \texttt{"absent"}\}$ be the set of effective (non-absent) decisions.

Strategy "any_ncp" (conservative):

$$\text{global} = \begin{cases} \texttt{"CP"} & \text{if } \forall, d \in \mathcal{E},; d = \texttt{"CP"} \ \texttt{"NCP"} & \text{otherwise} \end{cases}$$

Strategy "majority":

Let $n_{\text{CP}} = |\{d \in \mathcal{E} : d = \texttt{"CP"}\}|$ and $n_{\text{NCP}} = |\{d \in \mathcal{E} : d = \texttt{"NCP"}\}|$.

$$\text{global} = \begin{cases} \texttt{"CP"} & \text{if } n_{\text{CP}} > n_{\text{NCP}} \ \texttt{"NCP"} & \text{otherwise (including ties)} \end{cases}$$

Ties break conservatively toward NCP.

Strategy "per_k":

$$\text{global} = \texttt{"per_k"}$$

whenever $|\mathcal{E}| &gt; 0$. The sentinel value "per_k" signals downstream codegen (Phase H.2) to consume the per-coordinate decision vector directly in the Stan template rather than a single aggregated decision.

Absence rule (all methods): If $|\mathcal{E}| = 0$ (all decisions are "absent"), the function returns "absent" immediately, before dispatching on method.

Returns A length-1 character scalar: "CP", "NCP", "per_k", or "absent".

Notes

  • Internal, not exported.
  • Aborts with gdpar_input_error if method is not one of the three recognized strings (the default branch of switch).
  • The "absent" level is filtered out before aggregation. This handles the case where a component has no informative data (e.g., zero divergences and no effective diagnostic thresholds reached).
  • The "majority" tie-breaking toward NCP is intentional and documented as conservative: NCP geometry is monotonically safer in the worst case.

compute_preflight_agreement(decisions, global_decision)

Purpose Computes the agreement score for a single component — the fraction of effective per-dim decisions that match the aggregated global decision. For the "per_k" strategy (where there is no single global decision), it instead reports the frequency of the modal per-dim decision as a measure of uniformity.

Arguments

Argument Type Meaning
decisions character vector Per-dimension decision levels for one component (may include "absent").
global_decision character scalar The aggregated global decision for this component (from aggregate_preflight_one_component).

Mathematics

Let $\mathcal{E} = \{d_i : d_i \neq \texttt{"absent"}\}$ with $|\mathcal{E}| = n_{\text{eff}}$.

Case 1: $n_{\text{eff}} = 0$$\text{agreement} = \texttt{NA}$.

Case 2: $\texttt{global\_decision} = \texttt{"per\_k"}$:

$$\text{agreement} = \frac{\max!\bigl(n_{\text{CP}},; n_{\text{NCP}}\bigr)}{n_{\text{eff}}}$$

where $n_{\text{CP}} = |\{d \in \mathcal{E} : d = \texttt{"CP"}\}|$ and $n_{\text{NCP}} = |\{d \in \mathcal{E} : d = \texttt{"NCP"}\}|$. This is the frequency of the modal effective decision, capturing whether per-dim decisions are unanimous or mixed.

Case 3: $\texttt{global\_decision} = \texttt{"absent"}$$\text{agreement} = \texttt{NA}$.

Case 4: Otherwise (global decision is "CP" or "NCP"):

$$\text{agreement} = \frac{1}{n_{\text{eff}}} \sum_{d \in \mathcal{E}} \mathbf{1}!\bigl[d = \texttt{global_decision}\bigr]$$

i.e., the proportion of effective decisions equal to the global decision.

Returns A length-1 numeric scalar (or NA_real_ when no effective decisions exist or the global decision is "absent").

Notes

  • Internal, not exported.
  • For "per_k" the agreement is always $\ge 0.5$ when there are effective decisions (since the mode is the majority or at least a tie, and max picks the larger count; with ties it equals exactly 0.5).
  • For "any_ncp" or "majority" strategies, agreement $\in [0,1]$ and equals 1.0 when all effective decisions agree with the global decision.
  • The function uses identical() (not ==) to compare global_decision against "per_k" and "absent", which is safe for length-1 character comparisons and handles NULL gracefully.

build_preflight_global(per_dim, method)

Purpose Constructs the global (aggregated) decision data frame from the per-dim decision data frame and the chosen aggregation method. Iterates over all unique components present in per_dim, applies aggregate_preflight_one_component and compute_preflight_agreement for each, and returns a data frame suitable for storage in the $global slot of a gdpar_preflight_report object.

Arguments

Argument Type Meaning
per_dim data.frame The validated per-coordinate, per-component decision table (must conform to the schema enforced by validate_preflight_per_dim). Must contain at least columns "component" and "decision".
method character scalar Aggregation method, one of "any_ncp", "majority", "per_k".

Mathematics None beyond what is delegated to aggregate_preflight_one_component and compute_preflight_agreement (see those subsections).

Algorithm

  1. Validate that method is in .preflight_aggregation_methods; abort with gdpar_input_error otherwise.
  2. Extract the unique components from per_dim$component (preserving encounter order, which is canonically c("a", "W") when both are present).
  3. Allocate a data.frame with length(comps) rows and columns component (character), global_decision (character), agreement (numeric), method (character, recycled from the scalar method).
  4. For each component $i$:
    • Extract the decision column for rows where per_dim$component == comps[i].
    • Compute global_decision[i] via aggregate_preflight_one_component(rows, method).
    • Compute agreement[i] via compute_preflight_agreement(rows, global_decision[i]).
  5. Return the assembled global data frame.

Returns A data.frame with columns component, global_decision, agreement, method and one row per unique component in per_dim, in the same order as encountered in per_dim.

Notes

  • Internal, not exported.
  • The returned data frame conforms to the schema enforced by validate_preflight_global (can be validated downstream).
  • Component ordering follows unique(per_dim$component), which preserves first-encounter order. The canonical order is c("a", "W") when both components are present, but the function does not enforce this — it relies on the input ordering.
  • The method column is filled uniformly for all rows, reflecting that a single aggregation strategy is applied across all components within one report.
  • The function does not validate per_dim itself; callers (the constructor in section 2) are expected to call validate_preflight_per_dim first.

new_gdpar_preflight_report(per_dim, method = "any_ncp", settings = list())

Purpose Internal constructor (not exported) that creates and validates a gdpar_preflight_report S3 object. It assembles a structured list containing per-dimension data, a globally aggregated summary, the aggregation method, and user-supplied settings.

Arguments

Argument Type Meaning
per_dim data frame A per-(component, dimension) data frame produced elsewhere (e.g. by build_preflight_per_dim). Its structure is validated by validate_preflight_per_dim.
method character scalar (default "any_ncp") The aggregation method used to collapse per-dimension decisions into a single per-component global decision. Must be one of the values stored in the internal vector .preflight_aggregation_methods.
settings list (default list()) Arbitrary named settings that accompanied the preflight analysis (e.g. thresholds, tuning parameters). Stored verbatim for later retrieval.

Returns An S3 object of class "gdpar_preflight_report" (a named list) with four elements: per_dim, global, method, settings.

Notes

  • Input validation is strict: method must be a single character string belonging to .preflight_aggregation_methods; violations trigger gdpar_abort with class "gdpar_input_error" and a data list containing the received value.
  • settings must be a list; violations also raise "gdpar_input_error".
  • per_dim is validated by validate_preflight_per_dim (not shown in this section).
  • After validation, build_preflight_global constructs the aggregated per-component data frame, which is then itself validated by validate_preflight_global.
  • No mathematics; purely structural assembly and validation.

preflight_per_dim(report)

Purpose Exported accessor that extracts the per-dimension data frame from a gdpar_preflight_report object, enabling downstream dplyr/ggplot2 analysis.

Arguments

Argument Type Meaning
report gdpar_preflight_report A validated preflight report object.

Returns The per_dim component: a data frame with one row per (component, dim) pair.

Notes

  • Calls assert_inherits to confirm report is of class "gdpar_preflight_report". If not, an assertion error is raised.

preflight_global_decision(report)

Purpose Exported accessor that extracts the per-component aggregated summary from a gdpar_preflight_report object.

Arguments

Argument Type Meaning
report gdpar_preflight_report A validated preflight report object.

Returns The global component: a data frame with one row per component, containing the aggregated global decision, the agreement share, and related metadata.

Notes

  • Calls assert_inherits to confirm class membership. Assertion failure raises an error.

print.gdpar_preflight_report(x, level = c("global", "dim", "both"), ...)

Purpose S3 print method for gdpar_preflight_report. Produces a human-readable console summary with selectable verbosity.

Arguments

Argument Type Meaning
x gdpar_preflight_report The report to print.
level character scalar One of "global" (default), "dim", or "both". Controls which sections are printed.
... Unused; present for S3 generic signature compliance.

Returns Invisibly returns x (the original report object).

Notes

  • The header block always prints the number of unique dimensions (p), the aggregation method, and the list of component names.
  • When level is "global" or "both", delegates to print_preflight_global_block(x$global).
  • When level is "dim" or "both", delegates to print_preflight_per_dim_block(x$per_dim).
  • When level is "global", a hint is printed instructing the user to call print(x, level = "dim") or as.data.frame(x) for more detail.
  • match.arg is used for level, so partial matching is allowed.

format_preflight_table(cols, indent = " ")

Purpose Internal (not exported) utility that formats a named list of equal-length vectors into a fixed-width, left-padded table of character strings suitable for console output.

Arguments

Argument Type Meaning
cols named list of vectors Each element is a column; all must have the same length. Column names become headers. Numeric values are formatted via formatC(..., format = "g", digits = 4). NA values become the string "NA".
indent character scalar (default " ", four spaces) Prefix prepended to every output line.

Returns A character vector whose first element is the header line and subsequent elements are the body rows, each prefixed by indent. If cols is empty or has zero rows, returns character().

Notes

  • This is a pure formatting helper; it does no mathematical computation.
  • Numeric columns are formatted with formatC (general format, 4 significant digits); all other types are coerced with as.character.
  • Column widths are computed as max(nchar(header), max(nchar(body))) and used for right-padding (left-alignment via flag = "-").

print_preflight_global_block(global)

Purpose Internal (not exported) helper that pretty-prints the global (per-component aggregated) section of a preflight report to the console.

Arguments

Argument Type Meaning
global data frame The global element of a gdpar_preflight_report, containing columns component, global_decision, agreement, and (implicitly derived) a flag column.

Returns NULL (invisibly). Side effect is console output.

Notes

  • If global has zero rows, prints "(no components)" and returns early.
  • A flag column is computed inline using thresholding on global$agreement:
    • NA or non-finite → "" (empty string)
    • $\text{agreement} \geq 1 - 10^{-12}$"uniform"
    • $\text{agreement} \leq 0.5 + 10^{-12}$"split"
    • Otherwise → "mixed"
  • These thresholds treat near-unanimity and near-even-split as the boundary cases, with a tiny tolerance ($10^{-12}$) for floating-point imprecision.
  • The agreement column is formatted to 2 decimal places (formatC(..., format = "f", digits = 2)).
  • Delegates actual table formatting to format_preflight_table.

print_preflight_per_dim_block(per_dim)

Purpose Internal (not exported) helper that pretty-prints the per-dimension section of a preflight report to the console.

Arguments

Argument Type Meaning
per_dim data frame The per_dim element of a gdpar_preflight_report, with columns including component, dim, decision, decision_reason, t_info_cp, t_info_ncp.

Returns NULL (invisibly). Side effect is console output.

Notes

  • If per_dim has zero rows, prints "(no per-dim rows)" and returns early.
  • t_info_cp and t_info_ncp are formatted with formatC(..., format = "g", digits = 3) (3 significant digits).
  • Delegates table formatting to format_preflight_table.

summary.gdpar_preflight_report(object, ...)

Purpose S3 summary method for gdpar_preflight_report. Returns a structured list of summary statistics about the report.

Arguments

Argument Type Meaning
object gdpar_preflight_report The report to summarize.
... Unused; present for S3 generic signature compliance.

Returns A named list with elements:

Element Type Meaning
n_components integer Number of rows in object$global (i.e. number of components).
n_dims integer Number of unique dimension values in object$per_dim$dim.
method character The aggregation method stored in the report.
per_component data frame The full object$global data frame.
overall_agreement numeric Mean of the finite (non-NA, finite) values in object$global$agreement. If no finite values exist, NA_real_.
per_dim_counts named integer vector Counts of CP, NCP, and absent decisions across all (component, dim) rows.
settings list The stored settings list.

Notes

  • assert_inherits enforces class membership.
  • The overall_agreement is computed as $\bar{a} = \frac{1}{|\{i : a_i \text{ finite}\}|}\sum_{\substack{i \\ a_i \text{ finite}}} a_i$ where $a_i$ is the agreement share for component $i$. If the set of finite agreements is empty, the result is NA_real_.

as.data.frame.gdpar_preflight_report(x, row.names = NULL, optional = FALSE, ...)

Purpose S3 as.data.frame method that returns the tidy per-dimension data frame from a preflight report, suitable for programmatic subsetting and analysis.

Arguments

Argument Type Meaning
x gdpar_preflight_report The report to convert.
row.names NULL or character vector If non-NULL, applied as the row names of the returned data frame.
optional logical Ignored; present for S3 generic compatibility with as.data.frame.
... Unused; present for S3 generic signature compliance.

Returns The x$per_dim data frame (with one row per (component, dim) pair), optionally with custom row names.

Notes

  • assert_inherits enforces class membership.
  • The optional argument is accepted but never consulted, consistent with the base as.data.frame contract.

format.gdpar_preflight_report(x, ...)

Purpose S3 format method that produces a compact one-line character representation of the report, suitable for embedding in other strings or contexts where a full print is undesirable.

Arguments

Argument Type Meaning
x gdpar_preflight_report The report to format.
... Unused; present for S3 generic signature compliance.

Returns A character scalar of the form:

<gdpar_preflight_report: p={n_dims}, aggregation={method}, components={comma-separated names}>

where n_dims is the count of unique dimensions, method is the aggregation method string, and the component names are the unique values from x$per_dim$component joined by commas (no spaces).

Notes

  • No class assertion is performed here (unlike the other methods), relying on correct S3 dispatch.
  • No trailing newline is included in the returned string.

R/preflight.R

preflight_parametrization(prior, stan_data, amm, preflight_seed = NULL, verbose = FALSE, tau_cp = 5, tau_ncp = 2)

Purpose

Central pre-flight diagnostic routine that decides, per model component, whether the long fit should use the centered parametrization (CP) or the non-centered parametrization (NCP). It compiles a short NCP version of the Path 1 Stan model, draws a fixed-length sample, and applies three sequential filters—divergence attribution, E-BFMI, and info-ratio (Path B′)—to produce Boolean CP flags for the additive component a and the modulating component W.

Arguments

Argument Type Meaning
prior gdpar_prior (S3) Prior specification object; validated via assert_inherits.
stan_data list Stan data list assembled by assemble_stan_data; must contain J_a, dim_W, d, and theta_anchor.
amm amm_spec (S3) AMM specification object; amm$a and amm$W may be NULL if the component is absent.
preflight_seed integer or NULL Seed forwarded to the pre-flight sampler; if NULL, no seed is set.
verbose logical If FALSE, pre-flight stderr is sunk to a tempfile and discarded.
tau_cp numeric scalar (default 5) Upper null hypothesis value for the info-ratio z-test (filter 3).
tau_ncp numeric scalar (default 2) Lower null hypothesis value for the info-ratio z-test (filter 3).

Mathematics

The routine evaluates three filters in strict priority order. Let $n_{\text{div}}$ denote the number of divergent transitions among $T = 400$ total transitions (2 chains × 200 sampling iterations), and let $\alpha = 0.025$.

Filter 1 — Divergence attribution (applied only when $n_{\text{div}} \geq 1$). For each component $X \in \{a, W\}$, define the per-transition score

$$g_t = -, z_{\log\sigma_X}(t) \cdot z_{\lVert X_{\text{raw}}\rVert}(t),$$

where $z_{\log\sigma_X}(t)$ and $z_{\lVert X_{\text{raw}}\rVert}(t)$ are z-scores computed over all $T$ transitions. The test statistic is the centered mean of $g_t$ over divergent transitions divided by a standard error estimated from all transitions:

$$S_X = \frac{\bar{g}_{\text{div}} - \bar{g}_{\text{all}}}{\sqrt{\mathrm{Var}(g),/,n_{\text{div}}}},$$

where $\bar{g}_{\text{all}}$ is the global posterior mean (centering under $H_0\colon E[S_{\text{centered}}] = 0$) and $\mathrm{Var}(g)$ is the variance of $g_t$ over all transitions. The one-sided critical value is

$$c = t_{1-\alpha,;\max(1,, n_{\text{div}}-1)}.$$

If $S_X &gt; c$, the component is switched to CP.

Filter 2 — E-BFMI. If $\min(\text{EBFMI}) &lt; 0.3$, any component still undecided (i.e., NA) is forced to CP. This pathology is not attributable per-component.

Filter 3 — Info ratio (Path B′). For each undecided component, a per-draw effective coefficient $\mathrm{eff}[t, jj]$ and a per-draw reference scale $\mathrm{ref}[t]$ are constructed, where $\mathrm{ref}[t]$ is the conditional prior standard deviation of $\mathrm{eff}[t, jj]$ given the hyperparameters at draw $t$. The per-coordinate log info ratio is

$$\ell_j = \log!\left(\frac{\overline{\mathrm{ref}}}{\mathrm{sd}(\mathrm{eff}_j)}\right),$$

averaged across coordinates to yield $\bar{\ell}$. Its standard error $\widehat{\mathrm{SE}}(\bar{\ell})$ is estimated by a chain-aware block bootstrap with block size 10. Two asymptotic z-tests are performed with critical value $z_{1-\alpha} = \Phi^{-1}(0.975)$:

  • Upper test (toward CP): $z_{\text{cp}} = (\bar{\ell} - \log\tau_{\text{cp}})\,/\,\widehat{\mathrm{SE}}$. Reject toward CP if $z_{\text{cp}} &gt; z_{1-\alpha}$.
  • Lower test (toward NCP): $z_{\text{ncp}} = (\bar{\ell} - \log\tau_{\text{ncp}})\,/\,\widehat{\mathrm{SE}}$. Reject toward NCP if $z_{\text{ncp}} &lt; -z_{1-\alpha}$.

Decision logic for filter 3:

Condition Decision Reason string
$z_{\text{cp}}$ is NA NCP (FALSE) filter_info_undefined_ncp
$z_{\text{cp}} &gt; z_{1-\alpha}$ CP (TRUE) filter_info_high
$z_{\text{ncp}} &lt; -z_{1-\alpha}$ NCP (FALSE) filter_info_low
otherwise NCP (FALSE) filter_info_ambiguous_ncp

Returns

A list with four elements:

Element Type Description
cp_a logical Whether component a should use CP.
cp_W logical Whether component W should use CP.
cs_model_ncp compiled cmdstan model object or NULL The compiled NCP model, suitable for reuse if both decisions are NCP; NULL when neither component is needed.
meta list Diagnostic statistics and textual reasons (see below).

The meta list contains:

  • used_preflight (logical): whether the pre-flight sample was actually run.
  • n_divergent (integer): total divergent transitions.
  • div_pct (numeric): n_divergent / 400.
  • ebfmi_min (numeric): minimum E-BFMI across chains.
  • t_attribution_a, t_attribution_W (numeric): filter 1 t-statistics.
  • t_info_cp_a, t_info_ncp_a, t_info_cp_W, t_info_ncp_W (numeric): filter 3 z-statistics.
  • decision_reason_a, decision_reason_W (character): textual reason per component; "absent_or_degenerate" if the component is not needed; "" initialized when needed but not yet decided.

Notes

  • Early exit: If needs_a and needs_W are both FALSE (i.e., neither component has a non-degenerate dimension), the function returns immediately with cp_a = FALSE, cp_W = FALSE, cs_model_ncp = NULL, and meta$used_preflight = FALSE. The reason fields are set to "absent_or_degenerate".
  • Component degeneracy: needs_a requires has_a && J_a_free >= 1L where J_a_free = J_a if has_a && J_a >= 1L, else 0L. needs_W requires has_W && dim_W >= 1L && d >= 1L. The coordinate count for W is n_coords_W = dim_W * d.
  • Fixed pre-flight settings: adapt_delta = 0.95, max_treedepth = 10, 200 warmup + 200 sampling, 2 chains. These are independent of the user's long-fit settings.
  • Stan code generation: Calls generate_stan_code(prior, cp_a = FALSE, cp_W = FALSE) to produce the NCP model source, then write_stan_to_tempfile and cmdstanr::cmdstan_model for compilation.
  • Filter priority: Filter 1 is applied first (only if $n_{\text{div}} \geq 1$); filter 2 next; filter 3 last. A component decided by an earlier filter is not re-evaluated.
  • Fallback: If a component remains NA after all filters (e.g., needs_a is TRUE but n_div = 0 and E-BFMI ≥ 0.3 and the info-ratio call was somehow skipped), it defaults to FALSE (NCP).
  • Errors raised: assert_inherits will error if prior is not a gdpar_prior or amm is not an amm_spec.
  • Side effects: Writes a Stan model to a tempfile, compiles it via cmdstanr, and runs MCMC sampling. If verbose = FALSE, stderr is temporarily redirected.

run_preflight_sample(cs_model, stan_data, seed, verbose)

Purpose

Wrapper around cs_model$sample() that enforces the fixed pre-flight MCMC settings and optionally suppresses stderr output by redirecting the message stream to a temporary log file.

Arguments

Argument Type Meaning
cs_model compiled cmdstan model object The model to sample from (produced by cmdstanr::cmdstan_model).
stan_data list Stan data list passed as data.
seed integer or NULL Random seed; if NULL, no seed argument is passed.
verbose logical If TRUE, show_messages and show_exceptions are enabled and stderr is not redirected.

Mathematics

The sampler is invoked with fixed settings:

$$n_{\text{chains}} = 2,\quad n_{\text{warmup}} = 200,\quad n_{\text{sampling}} = 200,\quad \delta_{\text{adapt}} = 0.95,\quad t_{\max} = 10.$$

Returns

The return value of cs_model$sample(...), which is a CmdStanMCMC object (as produced by cmdstanr).

Notes

  • Stderr suppression: When verbose = FALSE, a temporary file with extension .log is created, a file connection is opened in write-text mode ("wt"), and sink(msg_con, type = "message") is called. An on.exit handler (with after = FALSE) guarantees restoration: it checks sink.number(type = "message") > 0L before calling sink(NULL, type = "message"), closes the connection if open, and unlinks the temp file. This ensures cleanup even if sample() throws an error.
  • Seed handling: If seed is not NULL, it is coerced to integer via as.integer(seed) and added to the args list.
  • Refresh: Always set to 0L (no progress output).
  • No side effects beyond sampling: The function does not modify global state; the only side effect is the temporary file creation/deletion when verbose = FALSE.

preflight_attribution_score_from_fit(fit, component)

Purpose

Extracts the inputs needed for the divergence-attribution t-statistic (filter 1) from a pre-flight CmdStanMCMC fit object: draws of the component's scale parameter, draws of the raw (non-centered) component, and the divergent-transition flag. Delegates the actual t-statistic computation to preflight_attribution_score().

Arguments

Argument Type Meaning
fit CmdStanMCMC The pre-flight fit object from run_preflight_sample.
component character Either "a" or "W"; selects which component's variables to extract.

Mathematics

The function maps the component name to Stan variable names and computes:

  1. Log-scale vector: $\log\sigma_X$ from the first column of the sigma_X draws matrix.
  2. Raw norm: For each transition $t$,

$$\lVert X_{\text{raw}}(t)\rVert = \sqrt{\sum_{k=1}^{K} X_{\text{raw},k}(t)^2},$$

where $K$ is the number of columns in the raw draws matrix (i.e., the dimensionality of the component).

  1. Divergent flag: Boolean vector from the divergent__ sampler diagnostic column.

These three vectors are passed to preflight_attribution_score(log(sigma_vec), raw_norm, divergent).

Returns

A numeric scalar: the attribution t-statistic computed by preflight_attribution_score(), or NA_real_ if the statistic cannot be computed.

Notes

  • Variable mapping: "a"sigma_a, a_raw; "W"sigma_W, W_raw. Any other component value returns NA_real_ immediately.
  • Defensive extraction: Each fit$draws(...) and fit$sampler_diagnostics(...) call is wrapped in tryCatch with an error handler returning NULL. If any extraction returns NULL or has zero columns, the function returns NA_real_.
  • Validity checks on sigma: If any element of sigma_vec is $\leq 0$ or NA, the function returns NA_real_ (since log(sigma) would be undefined or non-finite).
  • Diagnostic column check: If the sampler diagnostics matrix does not contain a column named "divergent__", the function returns NA_real_.
  • Only first sigma column used: sigma_vec <- as.numeric(draws_sigma[, 1]) — even if sigma_X is vector-valued in Stan, only the first element is used for the attribution score.
  • Raw norm: Computed as sqrt(rowSums(raw_mat^2)) over the full raw draws matrix, yielding one scalar norm per transition.
  • No side effects: Pure extraction and delegation; no modification of fit or global state.

← Part IV — Exhaustive Function Reference (5/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (7/7) →

Clone this wiki locally