-
Notifications
You must be signed in to change notification settings - Fork 0
Part IV Function Reference 6
← Part IV — Exhaustive Function Reference (5/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (7/7) →
Purpose
Compares an observed gdpar fit snapshot against a persisted golden reference snapshot for regression testing. It evaluates four distinct layers (structural, discrete, continuous, sanity) and returns a comprehensive report of any deviations that exceed principled tolerances.
Arguments
-
observed: Alistwith the schema produced bygdpar_snapshot_fit()containing fieldsstructural,discrete,continuous,sanity, andparametrization_resolved. -
golden: Alistof identical schema (typically loaded from an archived reference snapshot viareadRDS()). -
k_sigma:numericscalar; multiplier for the Monte Carlo standard error tolerance in layer C (continuous). Default is3. -
sanity_floor: Optionalnamed listoverriding the default absolute thresholds for layer D (sanity). Defaults tolist(rhat_max = 1.05, ess_bulk_min = 100, ess_tail_min = 100, divergent_pct = 0.01, ebfmi_min = 0.3).
Mathematics
For layer C (continuous), the comparison criterion for each continuous variable
Returns
A list with:
-
passed:logicalscalar;TRUEif and only if no failures are detected in any layer. -
failures: Adata.framewith one row per failure. Columns:layer(character),item(character),expected(character, compactly formatted),observed(character, compactly formatted),delta(numeric),threshold(numeric),severity(character). -
by_layer: Named integer vector of failure counts per layer (structural,discrete,continuous,sanity). Missing layers are reported as0.
Notes
- Both
observedandgoldenmust be lists; otherwise an error of classgdpar_internal_erroris raised viagdpar_abort(). - All four layers are evaluated regardless of earlier failures; failures are aggregated, not short-circuited.
- The function is marked experimental. The snapshot schema is versioned at
schema_version = 1Land may change in future releases. - The tolerance contract (
k_sigmaand sanity floors) is subject to refinement until the first stable release. - No S3 dispatch is involved; it is a plain function.
Purpose
Internal helper to format any R value into a compact one-line string suitable for inclusion in the comparison report's expected and observed columns.
Arguments
-
x: Any R object.
Returns
A character string:
-
"<NULL>"ifxisNULL. -
"<empty>"iflength(x) == 0. -
"<list:N>"ifxis a list of lengthN. - For character or logical vectors, elements are collapsed with commas.
- For numeric vectors of length ≤ 4, values are formatted with
formatC(digits = 6, format = "g")and trimmed of whitespace before collapsing with commas. - For numeric vectors of length > 4, returns
"<num[N]>"whereNis the length. - For all other objects, uses
as.character()and collapses with commas.
Notes
- This is an internal function (not exported) and is used solely to generate human-readable representations for the comparison report.
- No errors are raised; it handles any input gracefully.
Purpose
Extracts a four-layer reproducibility snapshot from a fitted gdpar_fit object. The snapshot is the canonical reference object for regression testing and is designed to be persisted via saveRDS() and later compared with gdpar_golden_compare().
Arguments
-
fit: Agdpar_fitobject returned bygdpar().
Returns
A list with:
-
structural: Class signatures, slot shapes, and column names of the fit object (built bybuild_structural_snapshot_golden()). -
discrete: Integer sampler diagnostics:n_divergent(total divergences),treedepth_max_n(number of transitions hitting maximum treedepth),n_leapfrog_total_per_chain(vector of total leapfrog steps per chain),treedepth_max_value(maximum treedepth observed),ebfmi_min(minimum E-BFMI across chains). -
continuous: Named list of per-variable posterior summaries for variablestheta_ref,a_coef,sigma_a,sigma_y,lp__(if present). Each entry is a named list of elements (e.g.,theta_ref[1]) with fields:mean,sd,ess_bulk,ess_tail,rhat,mc_se(Monte Carlo standard error =sd / sqrt(ess_bulk)). -
sanity: Aggregated convergence floors (built byextract_sanity_golden()). -
parametrization_resolved: List containing resolved CP/NCP flags (cp_a,cp_W,cp_a_per_k,cp_W_per_k) and the aggregation method (aggregation) fromfit$parametrization$meta.
Notes
- The function asserts that
fitinherits from classgdpar_fit. - The snapshot schema is versioned at
schema_version = 1L. - The function is marked experimental.
Purpose
Internal helper to compute per-element posterior summaries for all elements of a given variable (or variable prefix) in a draws_array object.
Arguments
-
draws_array: Adraws_arrayobject (from theposteriorpackage) containing the MCMC draws. -
var_name: Character scalar specifying the variable name or prefix (e.g.,"theta_ref"). Elements are identified via regex^var_name($|\\[).
Mathematics
For each element
- Mean:
$\bar{v} = \frac{1}{N} \sum_{i=1}^N v_i$ - Standard deviation:
$\text{sd}(v) = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (v_i - \bar{v})^2}$ - Bulk effective sample size (ESS):
$\text{ESS}_{\text{bulk}}$ - Tail effective sample size (ESS):
$\text{ESS}_{\text{tail}}$ - Potential scale reduction factor (
$\hat{R}$ ) - Monte Carlo standard error:
$\text{MC\_SE} = \frac{\text{sd}(v)}{\sqrt{\text{ESS}_{\text{bulk}}}}$
Returns
-
NULLif no elements of the variable are found. - Otherwise, a named list where each element corresponds to a variable element (e.g.,
"theta_ref[1]"). Each element is a list with components:mean,sd,ess_bulk,ess_tail,rhat,mc_se(all numeric scalars).
Notes
- Uses
posterior::summarise_draws()with custom summary functions for mean, sd, ESS bulk, ESS tail, and rhat. - The variable matching is done via regex, so
"theta_ref"matches"theta_ref","theta_ref[1]","theta_ref[2]", etc.
Purpose
Internal helper to extract discrete (integer-valued) sampler diagnostics from a gdpar_fit object.
Arguments
-
fit: Agdpar_fitobject.
Returns
A named list:
-
n_divergent: Integer total number of divergent transitions across all chains. -
treedepth_max_n: Integer count of transitions that hit the maximum treedepth. -
n_leapfrog_total_per_chain: Integer vector of total leapfrog steps per chain (length = number of chains). -
treedepth_max_value: Integer maximum treedepth observed across all draws and chains. -
ebfmi_min: Numeric scalar minimum E-BFMI across chains.
Notes
- Relies on
cmdstanrmethods:$sampler_diagnostics()and$diagnostic_summary(). - The
ebfmi_minis computed as the minimum across chains of the E-BFMI values returned by$diagnostic_summary().
Purpose Computes a set of summary statistics for MCMC diagnostics from a draws array and a discrete diagnostics object. This function aggregates key sampling quality metrics to be compared against floor values in sanity checks.
Arguments
-
draws_array: A 3D array of MCMC draws (iterations × chains × parameters), typically fromposterior::as_draws_array. -
discrete: A list containing discrete diagnostics from the sampler (e.g., fromrstan::get_sampler_params). Expected to includen_leapfrog_total_per_chain,n_divergent, andebfmi_min.
Mathematics The function computes the following:
- For each parameter, the R-hat (
$\hat{R}$ ), bulk effective sample size (ESS), and tail ESS using theposteriorpackage. - The maximum R-hat across all parameters: $$ \hat{R}{\text{max}} = \max{\theta} \hat{R}(\theta) $$
- The minimum bulk ESS: $$ \text{ESS}{\text{bulk,min}} = \min{\theta} \text{ESS}_{\text{bulk}}(\theta) $$
- The minimum tail ESS: $$ \text{ESS}{\text{tail,min}} = \min{\theta} \text{ESS}_{\text{tail}}(\theta) $$
- The divergent transition percentage:
$$ \text{divergent%} = \frac{n_{\text{divergent}}}{n_{\text{chains}} \times n_{\text{sampling}}} $$
where
$n_{\text{sampling}}$ is the number of post-warmup draws per chain. - The minimum E-BFMI (energy Bayesian fraction of missing information): $$ \text{E-BFMI}{\text{min}} = \min{\text{chains}} \text{E-BFMI} $$
Returns
A named list with numeric elements: rhat_max, ess_bulk_min, ess_tail_min, divergent_pct, and ebfmi_min.
Notes
-
n_samplingis computed asndraws(draws_array) / n_chains. Ifn_leapfrog_total_per_chainlength does not match the number of chains in the draws array, the divergent percentage may be misinterpreted. - Parameters with missing R-hat values are excluded from the max R-hat computation (
na.rm = TRUE). - This is an internal function; not exported.
Purpose Captures a structural snapshot of a fitted gdpar model object. This snapshot is used to detect silent API changes (e.g., changes in class, slot shapes, or column names) before running numerical comparisons.
Arguments
-
fit: A fitted model object of a class recognized by thegdparpackage (typically with acoefmethod).
Mathematics None.
Returns A named list containing:
-
fit_class: The class offit. -
coef_class: The class of thecoef(fit)object. -
p: An integer, the number of parameters fromcoef(fit)$p. -
summary_stats: The summary statistics fromcoef(fit). -
components: A character vector of names incoef(fit)excluding "p" and "summary_stats". -
theta_ref_cols: Column names of thetheta_refmatrix in the coefficients. -
theta_ref_nrow: Number of rows intheta_ref. -
a_class,a_length,a_per_k_cols: Class, length, and per-component column names of theacomponent (or NULL/NA). -
b_class,b_length: Class and length of thebcomponent (or NULL/NA). -
W_class,W_length: Class and length of theWcomponent (or NULL/NA). -
parametrization_keys: A sorted character vector of names infit$parametrization.
Notes
- This is an internal function; not exported.
- The snapshot is designed for use in
golden_compare_structural.
Purpose Performs a structural comparison (Layer A) between an observed and an expected structural snapshot. It reports differences in classes, dimensions, and component names.
Arguments
-
obs: The observed structural snapshot (a list frombuild_structural_snapshot_golden). -
exp: The expected (golden) structural snapshot (a list frombuild_structural_snapshot_golden). -
add: A callback function that records differences. It is called asadd(layer, key, expected, observed, ...).
Mathematics None.
Returns Invisible NULL. Side-effect: calls add for each detected discrepancy.
Notes
- If both
obsandexpare NULL, returns immediately. - If only one is NULL, records a
snapshot_presentmismatch. - Compares scalars (integer/character) by identity, ordered vectors by identity, and sets (like
componentsandparametrization_keys) by set equality. - The
a_per_k_colsfield is compared by identity after formatting withformat_compact(not defined here; presumably a helper). - This is an internal function; not exported.
Purpose Performs a discrete comparison (Layer B) of sampler diagnostics between observed and expected snapshots. It checks integer-valued diagnostics for bit-exact equality.
Arguments
-
obs: The observed discrete diagnostics snapshot (a list). -
exp: The expected discrete diagnostics snapshot (a list). -
add: A callback function that records differences.
Mathematics None.
Returns Invisible NULL. Side-effect: calls add for each detected discrepancy.
Notes
- If both
obsandexpare NULL, returns immediately. - If only one is NULL, records a
snapshot_presentmismatch. - Checks the following integer keys:
n_divergent,treedepth_max_n,treedepth_max_value. - Also checks
n_leapfrog_total_per_chainif both are non-NULL, but note that this field may be a vector (one per chain). The comparison usesas.integerand identity, so it must match exactly in length and values. - The
deltafield inaddis set asobs[[key]] - exp[[key]]for integer keys, but note that forn_leapfrog_total_per_chainthis delta is not explicitly set (the defaultaddcall does not includedelta). - This is an internal function; not exported.
Purpose Performs a continuous comparison (Layer C) of parameter summaries. It flags any parameter whose observed mean deviates from the expected mean by more than k_sigma times the expected Monte Carlo standard error (MC_SE).
Arguments
-
obs: The observed continuous summaries (a nested list: group → variable → summary withmeanandmc_se). -
exp: The expected continuous summaries (same structure). -
k_sigma: A numeric multiplier for the MC_SE threshold. -
add: A callback function that records failures.
Mathematics
For each common group and variable, the comparison is:
$$
|\text{obs.mean} - \text{exp.mean}| > k_{\sigma} \times \max(\text{exp.mc_se}, \epsilon)
$$
where
Returns Invisible NULL. Side-effect: calls add for any parameter failing the threshold.
Notes
- If both
obsandexpare NULL, returns immediately. - If only one is NULL, records a
snapshot_presentmismatch. - Only groups and variables present in both
obsandexpare compared. - Parameters with missing
meanormc_sein either snapshot are skipped. - The threshold is the maximum of
k_sigma * e$mc_seand a tiny epsilon to prevent zero thresholds whenmc_seis zero. - The
severityof a failure is set to "fail". - This is an internal function; not exported.
Purpose Performs a sanity check (Layer D) against absolute floor values for MCMC diagnostics. It flags when observed diagnostics are worse than the floor.
Arguments
-
obs: The observed sanity diagnostics (a list withrhat_max,ess_bulk_min,ess_tail_min,divergent_pct,ebfmi_min). -
floor: The floor (minimum acceptable) values for the same diagnostics. -
add: A callback function that records failures.
Mathematics The checks are:
obs$rhat_max > floor$rhat_maxobs$ess_bulk_min < floor$ess_bulk_minobs$ess_tail_min < floor$ess_tail_minobs$divergent_pct > floor$divergent_pctobs$ebfmi_min < floor$ebfmi_min
Returns Invisible NULL. Side-effect: calls add for each violated floor.
Notes
- If
obsis NULL, records asnapshot_presentmismatch. - Only fields present in
obs(non-NULL) are compared. - The
deltaandthresholdfields in theaddcall are set accordingly (delta = obs - floor, threshold = floor value). - This is an internal function; not exported.
Purpose
Returns the filesystem path to the golden manifest CSV (golden_manifest.csv), the reproducibility-tracking file for golden snapshots. All other manifest helpers delegate to this function for path resolution.
Arguments
-
root_dir(character(1)orNULL): Explicit root directory in which the manifest is expected to reside. WhenNULL, the function searches a fixed set of candidate directories.
Returns
A character(1) path string. The resolution logic is:
- If
root_dirisNULL, iterate over the candidate vectorc("tests/testthat/data", "data", "."); returnfile.path(candidate, "golden_manifest.csv")for the first candidate for whichdir.exists()isTRUE. If no candidate directory exists, return"golden_manifest.csv"(a bare filename in the current working directory). - If
root_diris supplied, returnfile.path(root_dir, "golden_manifest.csv")unconditionally—no existence check is performed.
Notes
No directories are created. The candidate list is hardcoded; there is no fallback to recursive search.
Purpose
Returns the canonical, ordered column schema of the golden manifest CSV. Used by init/read/add helpers to guarantee schema consistency.
Arguments
None.
Returns
A character vector of length 16, in order:
"family", "K", "basis_type", "p", "sub_phase", "bootstrap_date",
"cmdstan_version", "R_version", "DHARMa_version", "fit_code_hash",
"seed", "n_chains", "n_iter_warmup", "n_iter_sampling",
"rds_path", "n_obs"
Notes
Pure constant function; no side effects, no I/O.
Purpose
Ensures the manifest CSV exists with the canonical header. If the file is already present, the function is a no-op; otherwise it creates a zero-row CSV with the canonical columns.
Arguments
-
path(character(1)orNULL): Path to the manifest CSV. IfNULL, delegates to.gdpar_golden_manifest_path().
Returns
Invisibly returns path (the resolved path string), regardless of whether the file was created or already existed.
Notes
- Existence is checked via
file.exists(path). - When creating, an empty
data.frameis built withmatrix(character(0), nrow = 0L, ncol = length(header), dimnames = list(NULL, header)), coerced viaas.data.frame(..., stringsAsFactors = FALSE), and written withutils::write.csv(empty, path, row.names = FALSE). - Does not create parent directories; if the parent directory does not exist,
write.csvwill error.
Purpose
Reads the manifest CSV into a data.frame. Returns a schema-conformant empty frame if the file does not exist.
Arguments
-
path(character(1)orNULL): Path to the manifest CSV. IfNULL, delegates to.gdpar_golden_manifest_path().
Returns
A data.frame:
- If
file.exists(path)isFALSE: an empty frame with 0 rows and the 16 canonical columns (all character), constructed identically to.gdpar_golden_manifest_init's empty frame. - If the file exists:
utils::read.csv(path, stringsAsFactors = FALSE).
Notes
Unlike .gdpar_golden_manifest_init, this function does not create the file when it is absent—it merely returns an empty in-memory frame. The returned frame's column types depend on what read.csv infers; only the absent-file branch guarantees all-character columns.
Purpose
Appends a single row to the manifest CSV with idempotent upsert semantics on the composite key (family, K, basis_type, p): any pre-existing row sharing the same key is removed before the new row is appended.
Arguments
-
row_data(namedlist): Field values for the new manifest row. Must contain at least the four key fieldsfamily,K,basis_type,p. Additional fields matching the canonical column set are written; non-matching fields are silently dropped. -
path(character(1)orNULL): Path to the manifest CSV. IfNULL, delegates to.gdpar_golden_manifest_path().
Mathematics
The deduplication key is a pipe-delimited concatenation:
For each existing row
Returns
Invisibly returns path.
Notes
- Calls
.gdpar_golden_manifest_init(path)first, guaranteeing the file exists. - Validates that
row_datacontains all four required keys (family,K,basis_type,p). If any are missing, callsgdpar_abort()with class"gdpar_internal_error"and a message of the form"Golden manifest row is missing required keys: 'x', 'y'."(missing keys formatted withsQuoteand comma-collapsed). - Constructs a 1-row
data.frameinitialized entirely toNA_character_across all 16 canonical columns, then fills in values fromrow_datafor matching column names viaas.character(). - Deduplication: builds
key_oldfrom the existing frame's columns andkey_newfrom the new row's columns; filters withkeep <- key_old != key_new; subsetscurrent[keep, , drop = FALSE]. - Writes the combined frame via
utils::write.csv(combined, path, row.names = FALSE). - Column ordering in the output is governed by the canonical column vector because
row_dfis constructed with thatdimnamesorder andcurrentwas originally written in that order (thoughread.csvmay reorder if the file on disk differs).
Purpose
Computes a reproducibility identifier (hash) for the fit-code closure that generated a golden snapshot, enabling drift detection between the generator at bootstrap time and at audit time.
Arguments
-
fit_code_fn(function): A closure whose deparsed source text serves as the hash input.
Mathematics
Returns
A character(1) string: the SHA-256 hex digest of txt when the digest package is installed, otherwise the fallback "nohash_<n>" where <n> is nchar(txt).
Notes
- Availability is checked with
requireNamespace("digest", quietly = TRUE). - The fallback hash is not collision-resistant; it encodes only the string length, so two distinct closures of identical deparsed length would produce the same fallback identifier.
-
deparse()may truncate or reformat long closures depending on R'sdeparse.max.lines/ width options; the hash is therefore sensitive to the active R session's deparse settings.
Purpose
Returns the installed cmdstan version string for inclusion in manifest metadata, or NA when cmdstanr is unavailable or the version query fails.
Arguments
None.
Returns
A character(1): as.character(cmdstanr::cmdstan_version()) if cmdstanr is installed and the call succeeds; NA_character_ otherwise.
Notes
- Availability checked via
requireNamespace("cmdstanr", quietly = TRUE). - The
cmdstan_version()call is wrapped intryCatchwith anerrorhandler returningNA_character_, so a broken cmdstan installation yieldsNArather than a propagated error. - The result is coerced with
as.character(), so a numeric ornumeric_versionresult is stringified.
Purpose
Returns the installed DHARMa package version string for manifest metadata, or NA when DHARMa is unavailable.
Arguments
None.
Returns
A character(1): as.character(utils::packageVersion("DHARMa")) if DHARMa is installed; NA_character_ otherwise.
Notes
- Availability checked via
requireNamespace("DHARMa", quietly = TRUE). - Unlike
.gdpar_cmdstan_version(), there is notryCatchwrapper;packageVersionis expected not to throw once the namespace check passes.
Purpose
Generates the canonical RDS filename for a (non-EB) golden snapshot, keyed by the quadruple (family, K, basis_type, p). Matches the naming convention of the frozen goldens from Sub-phases 8.3.4 / 8.3.5a / 8.3.3 Unit 3.
Arguments
-
family(character(1)): Canonical family name (e.g.,"gaussian"). -
K(scalar, numeric/integer): Number of mixture components. -
basis_type(character(1)): Basis type label; defaults to"polynomial". -
p(scalar, numeric/integer): Basis dimension / polynomial degree; defaults to1L.
Mathematics
Returns
A character(1) filename string ending in .rds.
Notes
- The
basis_typesuffix is omitted only whenbasis_type == "polynomial"(exact string equality). Any other value—including"Polynomial"with different casing—triggers the suffix. - The
psuffix is omitted whenp <= 1L(the check isp > 1L). A non-integerpsuch as1.5would trigger the suffix. -
Kis interpolated twice into the base name (both before and after the family name), producing patterns likegolden_K2_gaussian_K2.rds.
Purpose
Generates the canonical RDS filename for an EB (empirical-Bayes) golden snapshot, keyed by the (family, regime) pair of Sub-phase 8.6.B.
Arguments
-
family(character(1)): Canonical family name (e.g.,"gaussian","zip"). -
regime(character(1)): Regime label identifying the configuration; defaults to"poly". Typical values:"poly","bspline","guard".
Returns
A character(1) of the form:
Notes
No conditional logic; the pattern is unconditional. Examples: golden_eb_gaussian_poly.rds, golden_eb_gaussian_bspline.rds, golden_eb_zip_guard.rds.
Purpose
Returns the canonical 17-configuration roster for Sub-phase 8.6.B (Charter Section 3.2 / parcial handoff Section 5.1 / D33 of the closure session). The roster partitions into 5 fitable and 12 guard-rejected configurations, each annotated with family, regime, status, guard class, and rationale.
Arguments
None.
Returns
A list of length 17, where each element is a named list with exactly five components:
| Component | Type | Description |
|---|---|---|
family |
character(1) |
Canonical family name |
regime |
character(1) |
Filename suffix |
status |
character(1) |
"fitable" or "guard"
|
guard_class |
character(1) or NA_character_
|
NA_character_ for fitable; canonical error class for guarded |
rationale |
character(1) |
One-line description |
The 17 entries, in order:
Fitable (5):
| # | family | regime | status | guard_class | rationale |
|---|---|---|---|---|---|
| 1 | gaussian |
poly |
fitable |
NA_character_ |
K=1, polynomial W, identity link |
| 2 | poisson |
poly |
fitable |
NA_character_ |
K=1, polynomial W, log link |
| 3 | neg_binomial_2 |
poly |
fitable |
NA_character_ |
K=1 with phi as global dispersion, log link |
| 4 | bernoulli |
poly |
fitable |
NA_character_ |
K=1, polynomial W, logit link |
| 5 | gaussian |
bspline |
fitable |
NA_character_ |
K=1, B-spline W (Sub-phase 8.3.8 path) |
Guard-rejected, stan_id D33 (5):
| # | family | regime | status | guard_class | rationale |
|---|---|---|---|---|---|
| 6 | beta |
guard |
guard |
gdpar_unsupported_feature_error |
stan_id=5 outside EB-supported set (D33) |
| 7 | gamma |
guard |
guard |
gdpar_unsupported_feature_error |
stan_id=6 outside EB-supported set (D33) |
| 8 | student_t |
guard |
guard |
gdpar_unsupported_feature_error |
stan_id=8 outside EB-supported set (D33) |
| 9 | tweedie |
guard |
guard |
gdpar_unsupported_feature_error |
stan_id=9 outside EB-supported set (D33) |
| 10 | lognormal_loc_scale |
guard |
guard |
gdpar_input_error |
not in gdpar_family() registry (K>=2 only) |
Guard-rejected, min_K (4):
| # | family | regime | status | guard_class | rationale |
|---|---|---|---|---|---|
| 11 | zip |
guard |
guard |
gdpar_unsupported_feature_error |
min_K>1 (mixture; deferred to 8.6.C) |
| 12 | zinb |
guard |
guard |
gdpar_unsupported_feature_error |
min_K>1 (mixture; deferred to 8.6.C) |
| 13 | hurdle_poisson |
guard |
guard |
gdpar_unsupported_feature_error |
min_K>1 (hurdle; deferred to 8.6.C) |
| 14 | hurdle_neg_binomial_2 |
guard |
guard |
gdpar_unsupported_feature_error |
min_K>1 (hurdle; deferred to 8.6.C) |
Guard-rejected, heterogeneous amm K>1 (3):
| # | family | regime | status | guard_class | rationale |
|---|---|---|---|---|---|
| 15 | het_beta_gamma_K2 |
guard |
guard |
gdpar_unsupported_feature_error |
heterogeneous K=2 amm list; deferred to 8.6.C |
| 16 | het_tweedie_K3 |
guard |
guard |
gdpar_unsupported_feature_error |
heterogeneous K=3 amm list; deferred to 8.6.C |
| 17 | het_student_t_K3 |
guard |
guard |
gdpar_unsupported_feature_error |
heterogeneous K=3 amm list; deferred to 8.6.C |
Notes
Pure data function; no I/O, no side effects. The roster is entirely hardcoded. Entry 10 (lognormal_loc_scale) is the only guard entry whose guard_class is gdpar_input_error rather than gdpar_unsupported_feature_error.
Purpose (from roxygen preamble only; function body not present in this section)
Intended to return the canonical 17-configuration roster for Sub-phase 8.6.C (Charter Section 3.3, apertura Section 4.4, D34 canonization), covering two multivariate paths:
-
Path A (
$K = 1,\; p > 1$ ): 6 fitable (Gaussian / Poisson / Negative Binomial / Bernoulli onamm_eb_marginal_multi.stan; stan_id in$\{1,2,3,4\}$ ) + 4 guard (Beta / Gamma / Student-t / Tweedie, rejected because Path A's supported set inheritsamm_distrib_multi.stan$\{1,2,3,4\}$ ). -
Path B (
$K > 1,\; p = 1$ ): 7 fitable spanning Beta K=2, Gamma K=2, Student-t K=3, Tweedie K=3, Lognormal-loc-scale K=2, ZIP K=2, and heterogeneous Gauss+Beta K=2.
Each entry is documented as carrying: family, regime ("polyP2" / "polyP3" / "bsplineP2" / "K2" / "K3" / "pathA_p2_guard"), status ("fitable" / "guard"), path ("A" or "B"), K, p, guard_class, rationale. The documented filename pattern is golden_eb_<family>_path<A|B>_<regime>.rds.
Notes
Only the roxygen documentation block is present in this section; the function definition (function(...) { ... }) is not included. The behavior described above reflects the roxygen comments and cannot be verified against source in this section.
Purpose
Factory that returns the canonical roster (a hard-coded list of configuration descriptors) for Sub-phase 8.6.C of the golden-reference test infrastructure. This roster enumerates the family/regime/path/K/p combinations that are either "fitable" (i.e. backed by a real Stan model and eligible for a golden RDS artifact) or "guard" (i.e. explicitly rejected with a documented error class) across two disjoint modeling paths:
-
Path A — single-regime models with
$K = 1$ and polynomial/B-spline order$p > 1$ . -
Path B — multi-regime models with
$K > 1$ and$p = 1$ .
The roster is consumed downstream by golden-artifact generation and validation logic to iterate over the supported and guarded configuration space without colliding with the 8.6.B or 8.6.D namespaces.
Arguments
None. The function is a zero-argument closure that returns a literal data structure.
Returns
A list of length 17, each element being a named list with the following fields:
| Field | Type | Meaning |
|---|---|---|
family |
character |
Canonical distribution family name (e.g. "gaussian", "neg_binomial_2", "het_gauss_beta_K2"). |
regime |
character |
Filename suffix / regime tag (e.g. "polyP2", "bsplineP2", "polyP3", "K2", "K3", "polyP2_guard"). |
status |
character |
Either "fitable" (real Stan-backed configuration) or "guard" (explicitly rejected configuration). |
path |
character |
Modeling path tag: "A" or "B". |
K |
integer |
Number of regimes. 1L for all Path A entries; 2L or 3L for Path B entries. |
p |
integer |
Basis-function order. 2L or 3L for Path A; 1L for Path B. |
guard_class |
character |
NA_character_ for fitable entries; "gdpar_unsupported_feature_error" for guard entries. |
rationale |
character |
Human-readable description embedding the corresponding stan_id and the path/K/p summary. |
The 17 entries decompose as:
-
Path A fitable (6) —
gaussian/polyP2(stan_id 1),gaussian/bsplineP2(stan_id 1),gaussian/polyP3(stan_id 1),poisson/polyP2(stan_id 2),neg_binomial_2/polyP2(stan_id 3),bernoulli/polyP2(stan_id 4). All haveK = 1L,p = 2L(exceptpolyP3which hasp = 3L). -
Path A guard (4) —
beta/polyP2_guard(stan_id 5),gamma/polyP2_guard(stan_id 6),student_t/polyP2_guard(stan_id 8),tweedie/polyP2_guard(stan_id 9). All haveK = 1L,p = 2L,guard_class = "gdpar_unsupported_feature_error", with rationale text "outside Path A supported set (K=1)". -
Path B fitable (7) —
beta/K2(stan_id 5,$K=2$ ),gamma/K2(stan_id 6,$K=2$ ),student_t/K3(stan_id 8,$K=3$ ),tweedie/K3(stan_id 9,$K=3$ ),lognormal_loc_scale/K2(stan_id 7,$K=2$ ),zip/K2(stan_id 10,$K=2$ ),het_gauss_beta_K2/K2(named-list family,$K=2$ ). All havep = 1L.
Notes
- The roster is purely declarative; it performs no validation, I/O, or model fitting. Side effects: none.
- The
stan_idvalues cited in therationalestrings are documentation-only metadata; they are not stored as a separate field. - The heterogeneous-family entry
het_gauss_beta_K2uses the family name itself suffixed with_K2, distinguishing it from the Path Bhet_gauss_betaentry that appears in other rosters. - The Path A guard entries document families that are supported under Path B (e.g.
beta,gamma,student_t,tweedie) but are rejected when combined with$K = 1$ and$p > 1$ . - No errors are raised by this function under any input (it takes no input).
Purpose
Generates the canonical on-disk filename for an empirical-Bayes (EB) golden RDS artifact belonging to Sub-phase 8.6.C. The path tag ("A" or "B") is embedded in the basename so that 8.6.C artifacts do not collide on disk with the parallel 8.6.B namespace produced by .gdpar_golden_eb_rds_name.
Arguments
| Argument | Type | Meaning |
|---|---|---|
family |
character (scalar) |
Canonical distribution family name, e.g. "gaussian", "beta", "het_gauss_beta_K2". |
path |
character (scalar) |
Modeling path tag, expected to be "A" or "B". Inserted verbatim after path in the format string. |
regime |
character (scalar) |
Regime/filename suffix, e.g. "polyP2", "bsplineP2", "K2", "polyP2_guard". |
Mathematics
The filename is produced by the format string
yielding, for example, golden_eb_gaussian_pathA_polyP2.rds or golden_eb_het_gauss_beta_K2_pathB_K2.rds.
Returns
A length-1 character string — the canonical RDS filename (basename only, no directory prefix).
Notes
- No validation of
family,path, orregimeis performed; any string is substituted verbatim. - The function is
@keywords internaland@noRd; it is not exported and not documented in the user-facing manual. - Because the path tag is embedded, two configurations sharing the same
(family, regime)but different paths (e.g. a hypotheticalgaussian/polyP2under Path A vs. Path C) produce distinct filenames. - The extension is always
.rds.
Purpose
Factory that returns the canonical roster for Sub-phase 8.6.D, covering Path C — models with
The initial iteration covers stan_id ∈ {1, 3} (Gaussian and NB2) at
Arguments
None.
Returns
A list of length 8, each element a named list with fields family, regime, status, path, K, p, guard_class, and rationale (same schema as .gdpar_golden_roster_8_6_C). All entries have path = "C".
The 8 entries decompose as:
-
Path C fitable (4) — all with
guard_class = NA_character_:-
gaussian/polyP2,$K = 2$ ,$p = 2$ (stan_id 1) -
gaussian/polyP3,$K = 2$ ,$p = 3$ (stan_id 1) -
neg_binomial_2/polyP2,$K = 2$ ,$p = 2$ (stan_id 3) -
neg_binomial_2/polyP3,$K = 2$ ,$p = 3$ (stan_id 3)
-
-
Path C guard (4) — all with
guard_class = "gdpar_unsupported_feature_error"and rationale citing "deferred under D40' (numerical caveat 6.1)":-
beta/polyP2_guard,$K = 2$ ,$p = 2$ (stan_id 5) -
gamma/polyP2_guard,$K = 2$ ,$p = 2$ (stan_id 6) -
student_t/polyP2_guard,$K = 3$ ,$p = 2$ (stan_id 8) -
zip/polyP2_guard,$K = 2$ ,$p = 2$ (stan_id 10)
-
Notes
- The
student_tguard entry is the only Path C entry with$K = 3$ ; all others have$K = 2$ . This mirrors the Path B roster wherestudent_tis registered at$K = 3$ . - Heterogeneous-
$p$ across slots is explicitly not represented in this initial roster; the roxygen documents it as "follow-on debt against Block 9.x." - The roxygen references
stan_idvalues{5, 6, 7, 8, 9, 10, 11, 12, 13}as "other Path B families" that are guard-rejected; however, the actual returned list only contains guard entries for stan_ids 5, 6, 8, and 10. The broader set mentioned in the documentation describes the conceptual deferred space, not the literal list contents. - Filename pattern for artifacts derived from this roster is
golden_eb_<family>_pathC_<regime>.rdsvia.gdpar_golden_eb_rds_name_8_6_D. - No side effects; no errors raised.
Purpose
Generates the canonical on-disk filename for an EB golden RDS artifact belonging to Sub-phase 8.6.D (Path C). Mirrors .gdpar_golden_eb_rds_name_8_6_C but hardcodes the path tag to "C", accepting only family and regime as arguments. This ensures the 8.6.B, 8.6.C, and 8.6.D namespaces remain disjoint on disk.
Arguments
| Argument | Type | Meaning |
|---|---|---|
family |
character (scalar) |
Canonical distribution family name, e.g. "gaussian", "neg_binomial_2", "beta". |
regime |
character (scalar) |
Regime/filename suffix, e.g. "polyP2", "polyP3", "polyP2_guard". |
Mathematics
Returns
A length-1 character string — the canonical RDS basename, e.g. golden_eb_gaussian_pathC_polyP2.rds or golden_eb_beta_pathC_polyP2_guard.rds.
Notes
- Unlike
.gdpar_golden_eb_rds_name_8_6_C, this function does not accept apathargument; the path is fixed to"C". - No input validation is performed.
- The function is
@keywords internaland@noRd. - The extension is always
.rds.
Purpose
Factory that returns the canonical 14-configuration roster for sub-phase 8.3.9, described in the roxygen as "Decision D1 = (1D), 13 atrasados + K=1+p=2 polynomial baseline." This roster aggregates the "atrasados" (deferred/late) configurations from sub-phases 8.3.4 through 8.3.8 plus the 8.3.9 baseline, providing a unified iteration surface for golden-artifact generation across the heterogeneous family/basis/K/p space.
Arguments
None.
Returns
A list of length 14. Each element is a named list with fields:
| Field | Type | Meaning |
|---|---|---|
family |
character |
Canonical distribution family name. |
K |
integer |
Number of regimes. |
basis_type |
character |
Either "polynomial" or "bspline". |
p |
integer |
Basis-function order. |
sub_phase |
character |
Originating sub-phase tag, e.g. "8.3.4", "8.3.5a", "8.3.6", "8.3.7", "8.3.8", "8.3.9". |
The 14 entries, in order, are:
| # | family |
K |
basis_type |
p |
sub_phase |
|---|---|---|---|---|---|
| 1 | lognormal_loc_scale |
2 | polynomial | 1 | 8.3.4 |
| 2 | student_t |
3 | polynomial | 1 | 8.3.5a |
| 3 | tweedie |
3 | polynomial | 1 | 8.3.5b |
| 4 | zip |
2 | polynomial | 1 | 8.3.6 |
| 5 | zinb |
3 | polynomial | 1 | 8.3.6 |
| 6 | hurdle_poisson |
2 | polynomial | 1 | 8.3.6 |
| 7 | hurdle_neg_binomial_2 |
3 | polynomial | 1 | 8.3.6 |
| 8 | het_gauss_beta |
2 | polynomial | 1 | 8.3.7 |
| 9 | het_gauss_gamma |
2 | polynomial | 1 | 8.3.7 |
| 10 | het_nb_beta |
2 | polynomial | 1 | 8.3.7 |
| 11 | gaussian |
1 | bspline | 1 | 8.3.8 |
| 12 | het_gauss_beta |
2 | bspline | 1 | 8.3.8 |
| 13 | gaussian |
1 | bspline | 2 | 8.3.8 |
| 14 | gaussian |
1 | polynomial | 2 | 8.3.9 |
Notes
- The roxygen comment for entry 1 notes that
gaussian,beta, andgamma"already exist on disk" and are therefore excluded from the 8.3.4 atrasados; onlylognormal_loc_scaleat$K = 2$ is registered as the remaining gap. - This roster uses a different schema from the 8.6.C / 8.6.D rosters: it has no
status,path,guard_class, orrationalefields, and it addsbasis_typeandsub_phasefields instead. All entries are implicitly "fitable" (there are no guard entries). - Sub-phase 8.3.6 contributes 4 mixture-family entries (ZIP, ZINB, hurdle-Poisson, hurdle-NB2), alternating
$K = 2$ and$K = 3$ . - Sub-phase 8.3.7 contributes 3 heterogeneous-family entries, all at
$K = 2$ . - Sub-phase 8.3.8 contributes 3 B-spline entries; notably entry 13 (
gaussian,$K = 1$ , bspline,$p = 2$ ) is the only entry in this roster with$p = 2$ besides the 8.3.9 baseline. - The 8.3.9 baseline (entry 14) is the only entry with
basis_type = "polynomial"and$p = 2$ ; the roxygen describes it as "filling the last gap." - No side effects; no errors raised; no input validation (zero-argument function).
gdpar_ksd_joint(eb_fit, fb_fit, kernel = c("imq", "rbf"), bandwidth = c("median", "fixed"), bandwidth_value = NULL, beta = -0.5, ess_weighted = FALSE, seed = NULL, ...)
Purpose
Computes the joint kernel Stein discrepancy (KSD) between the EB (empirical Bayes) and FB (full Bayes) posteriors. This is a density-free spectral metric for multivariate discrepancy between the two posterior distributions, addressing the limitation of marginal total-variation (TV) distance in capturing joint dependence structure.
Arguments
-
eb_fit: Object of classgdpar_eb_fit(fromgdpar_eb()). -
fb_fit: Object of classgdpar_fit(fromgdpar()). Must be fitted on the same dataset. -
kernel: Character scalar; kernel type:"imq"(inverse multi-quadric, default) or"rbf"(radial basis function). -
bandwidth: Character scalar; bandwidth selection:"median"(median heuristic on FB pairwise squared distances, default) or"fixed". -
bandwidth_value: Numeric scalar; bandwidth value (squared units) whenbandwidth = "fixed". Defaults to1.0ifNULL. -
beta: Numeric scalar in (-1, 0); exponent for IMQ kernel. Default -0.5. Ignored for RBF. -
ess_weighted: Logical scalar; ifTRUE, thin both posteriors to common effective sample size (ESS) count. DefaultFALSE. -
seed: Integer scalar orNULL; RNG seed for thinning wheness_weighted = TRUE. -
...: Reserved; currently unused.
Mathematics
The KSD uses an empirical Gaussian target
For kernel
- IMQ:
$k(x, y) = (h + \|x - y\|^2)^\beta$ ,$\beta \in (-1, 0)$ , default$\beta = -1/2$ . - RBF:
$k(x, y) = \exp(-\|x - y\|^2 / (2h))$ .
The Stein kernel under Gaussian target is: $$ k_p(x, y) = \langle s(x), s(y) \rangle k(x, y) + \langle s(x), \nabla_y k(x, y) \rangle + \langle s(y), \nabla_x k(x, y) \rangle + \mathrm{tr}(\nabla_x \nabla_y^\top k(x, y)). $$
The KSD V-statistic is:
$$
\mathrm{KSD}(Q, P) = \sqrt{\max\left{0,, n^{-2} \sum_{i, j = 1}^{n} k_p(x_i, x_j)\right}}, \qquad x_i \sim Q.
$$
Here ess_weighted = TRUE), and
When ess_weighted = TRUE, both posteriors are thinned to posterior::ess_basic.
Returns
An object of class gdpar_ksd_joint (list) with components:
-
ksd_value: Numeric scalar;$\mathrm{KSD}$ (square root, clamped$\ge 0$ ). -
ksd_squared: Numeric scalar; raw V-statistic before clamping. -
kernel,bandwidth,bandwidth_value,beta: Settings used. -
n_eb_draws,n_fb_draws,n_dim: Integer scalars; sample sizes and dimension after thinning. -
target_mu,target_Sigma: Empirical mean and covariance of FB draws (Gaussian target). -
ess_weighted,thinned_to: Logical and integer; thinning configuration. -
vars: Character vector; common$\xi$ -variables used. -
call: The matched call.
Notes
- Raises errors (via
gdpar_abort()) if: inputs are not of correct classes;$\beta$ not in (-1, 0);bandwidth_valuenon-positive whenbandwidth = "fixed";ess_weightedorseednot scalar logical/numeric; no common$\xi$ -variables; ESS-thinned$n_{\text{eff}} < 2$ . - Side effects: sets RNG seed if
seedandess_weighted = TRUE. - S3 methods:
print.gdpar_ksd_jointandsummary.gdpar_ksd_jointexist (documented separately).
.gdpar_ksd_validate_inputs(eb_fit, fb_fit, kernel, bandwidth, bandwidth_value, beta, ess_weighted, seed)
Purpose
Internal validation helper for gdpar_ksd_joint(). Checks argument types and ranges, aborting with informative errors on invalid inputs.
Arguments
-
eb_fit: Expected to begdpar_eb_fit; validated by class. -
fb_fit: Expected to begdpar_fit; validated by class. -
kernel: Character scalar; validated bymatch.arg()upstream. -
bandwidth: Character scalar; validated bymatch.arg()upstream. -
bandwidth_value: Numeric scalar orNULL; if non-NULL andbandwidth = "fixed", must be positive. -
beta: Numeric scalar; must be in (-1, 0). -
ess_weighted: Logical scalar; must beTRUEorFALSE. -
seed: Numeric scalar orNULL; if non-NULL, must be a single numeric.
Mathematics
None.
Returns
NULL invisibly (called for side effects).
Notes
- Raises errors via
gdpar_abort()with class"gdpar_input_error"and relevant data. - Does not check consistency of
eb_fitandfb_fit(e.g., same dataset) – that is left to the caller.
Purpose. Computes a robust, scale-equivariant bandwidth heuristic for kernel-based discrepancy measures. Given a matrix of samples, it returns the median of all pairwise squared Euclidean distances among distinct pairs, serving as the default bandwidth
Arguments.
| Argument | Type | Meaning |
|---|---|---|
X |
matrix |
Sample matrix whose rows are observations in |
Mathematics. Let X. The function first constructs the
implemented via the identity outer(ss, ss, "+") - 2 * tcrossprod(X) where ss = rowSums(X * X) stores
Returns. A numeric scalar:
-
1.0if$n < 2$ (too few samples to form a pair). -
1.0if the computed median is non-finite or$\le 0$ (degenerate configuration). - Otherwise, the median pairwise squared distance (always
$> 0$ under the guard).
Notes.
- Purely internal (not exported, no
@exporttag). - The clamping
D2[D2 < 0.0] <- 0.0guards against floating-point negativity from the algebraic identity when points nearly coincide. - No side effects; does not modify its argument.
Purpose. Computes the inverse of a covariance (or precision-related) matrix
Arguments.
| Argument | Type | Meaning |
|---|---|---|
Sigma |
matrix |
Symmetric positive-(semi)definite covariance matrix to be inverted. |
Mathematics. The function attempts to compute
- First try:
$\Sigma^{-1} = (\mathbf{L}\mathbf{L}^\top)^{-1}$ usingchol2inv(chol(Sigma)). - On failure, form the ridged matrix:
$$\tilde{\Sigma} = \Sigma + \varepsilon,\mathbf{I}_d, \qquad \varepsilon = \max!\bigl(10^{-8},; 10^{-10}\cdot \mathrm{mean}(\mathrm{diag}(\Sigma))\bigr)$$ and retrychol2inv(chol(Sigma_ridged)). - If the ridged inversion also fails, the function aborts.
Returns. A matrix
Notes.
- Raises a fatal
gdpar_internal_error(viagdpar_abort) if even the ridged matrix is numerically singular. - The ridge
$\varepsilon$ scales with the average diagonal of$\Sigma$ , making it relative to the data scale, with an absolute floor of$10^{-8}$ . - Internal only; not exported.
Purpose. Safely computes the minimum effective sample size (ESS) across all columns (variables) of a draws matrix. Used to decide whether and how much to thin MCMC draws before computing the KSD, guarding against version-incompatible calls to the posterior package.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
draws_mat |
matrix or draws_matrix
|
Matrix of posterior draws (rows = iterations, columns = variables), typically from the feedback (FB) model. |
Mathematics. Let
computed via posterior::ess_basic. The comment references fix B9.4 G.iv/H.iv: the call is routed through posterior::summarise_draws() to avoid ambiguous method dispatch of posterior::ess_basic() on raw matrices across package versions.
Returns. A numeric scalar:
- If
posterior::summarise_draws()succeeds and yields a valid"ess_basic"column (data frame) or a numeric vector, the minimum positive finite value is returned. - If the call fails, returns non-finite or empty, or
ess_basiccannot be extracted, the function falls back tonrow(draws_mat)(i.e., the total number of draws, an upper bound).
Notes.
- Internal only.
- The tryCatch guards against the absence of the
posteriorpackage or API changes. The fallbacknrow(draws_mat)means that in a degraded setting, no thinning based on ESS occurs. - Values
$\le 0$ or non-finite are filtered out before taking the minimum.
Purpose. Computes the V-statistic numerator
Arguments.
| Argument | Type | Meaning |
|---|---|---|
X |
matrix |
Sample matrix (posterior draws in the common- |
mu |
numeric (length |
Mean vector |
Lambda |
matrix |
Precision matrix |
kernel |
character |
Base kernel type: either "imq" (inverse multi-quadratic) or "rbf" (radial basis function / Gaussian). |
h |
numeric scalar |
Bandwidth parameter |
beta |
numeric scalar |
Exponent kernel = "rbf". |
Mathematics. Let
The function computes the full
-
$\mathbf{S}_i = s(\mathbf{x}_i) = -\Lambda\,(\mathbf{x}_i - \boldsymbol{\mu})$ , stored inS($n \times d$ ). -
$D^2_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|^2$ (squared distance matrix, clamped$\ge 0$ ). -
$\mathbf{S}\mathbf{X}^\top$ ,$\mathbf{X}\mathbf{S}^\top$ ,$\mathbf{S}\mathbf{S}^\top$ , and$\mathbf{s}_i^\top \mathbf{x}_i$ (diagonal products).
For the IMQ kernel
Let
where
For the RBF kernel
(The signs on
The function returns
Returns. A numeric scalar: the V-statistic numerator
Notes.
- Cost is
$O(n^2 d^2)$ due to the dense$n \times n$ matrix products$\mathbf{S}\mathbf{X}^\top$ ,$\mathbf{X}\mathbf{S}^\top$ ,$\mathbf{S}\mathbf{S}^\top$ . - The
betaargument is only meaningful whenkernel == "imq"; for"rbf"it is accepted but unused. - Negative squared distances in
$D^2$ are clamped to zero as in the other internal helpers. - Internal only; not exported.
Purpose. S3 print method for objects of class "gdpar_ksd_joint", producing a human-readable summary of the joint KSD result to the console.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_ksd_joint |
The KSD joint object, a named list produced by the main KSD computation. Expected components: ksd_value, kernel, beta, bandwidth_value, bandwidth, n_dim, n_eb_draws, n_fb_draws, ess_weighted, thinned_to. |
digits |
integer scalar |
Number of significant digits for formatting numeric values. Default 4L. |
... |
Reserved for future arguments; currently unused. |
Returns. Invisibly returns x (the input object), following standard R print-method convention.
Notes.
- Exported (
@export); registered as an S3 method forprinton class"gdpar_ksd_joint". - Conditionally prints the
betaparameter line only whenx$kernel == "imq". - Conditionally prints the ESS-weighted thinning line only when
x$ess_weightedisTRUE.
Purpose. S3 summary method for "gdpar_ksd_joint" objects. Constructs a structured summary list containing the KSD value, squared KSD, kernel details, dimensionality, variable names, and an interpretation string.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_ksd_joint |
The KSD joint result object. Expected components: ksd_value, ksd_squared, kernel, bandwidth_value, n_dim, vars. |
... |
Reserved; currently unused. |
Returns. An object of class c("summary.gdpar_ksd_joint", "list") with components:
| Component | Type | Meaning |
|---|---|---|
ksd_value |
numeric |
|
ksd_squared |
numeric |
|
kernel |
character |
Kernel type used ("imq" or "rbf"). |
bandwidth_value |
numeric |
Bandwidth value |
n_dim |
integer |
Dimension |
vars |
character vector |
Names of the common- |
interpretation |
character scalar |
Explanatory text: a value close to zero indicates EB matches the empirical Gaussian fit of FB on the joint gdpar_compare_eb_fb()'s marginal TV table. |
Notes.
- Exported.
- Does not print anything itself; use
print()on the returned object for formatted display.
Purpose. S3 print method for "summary.gdpar_ksd_joint" objects. Produces a formatted, human-readable display of the KSD summary including all key quantities and an interpretation paragraph.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
x |
summary.gdpar_ksd_joint |
Summary object created by summary.gdpar_ksd_joint(). |
digits |
integer scalar |
Number of significant digits for numeric formatting. Default 4L. |
... |
Reserved; currently unused. |
Returns. Invisibly returns x, following standard R print-method convention.
Notes.
- Exported (
@export). - Uses
strwrap()withwidth = 70L,indent = 0L,exdent = 2Lto wrap the interpretation paragraph neatly at 70 characters with a 2-character hanging indent. - Displays: KSD value (V-stat sqrt), KSD squared, kernel type, bandwidth, dimension, variable names, and the interpretation block.
gdpar_meta_learner_adapter(name, fit_predict_fun, predict_fun = NULL, requires_r = character(0L), requires_py = character(0L), native_ci = FALSE, description = NULL)
Purpose
Constructor of the pluggable adapter contract through which gdpar_compare_meta_learners dispatches to external meta-learner implementations (e.g. grf, EconML, DoubleML, or custom doubly-robust estimators). The returned object is a small S3 list of class c("gdpar_meta_learner_adapter", "list") that bundles a mandatory fit-and-predict closure, an optional re-prediction closure, dependency manifests for R packages and Python modules, a flag declaring native credible-interval support, and a free-form description. The comparator orchestrates adapters by invoking fit_predict_fun for the full fit-and-predict cycle and, when available, predict_fun for re-evaluation on a new grid without re-fitting; when predict_fun is NULL, the comparator falls back to fit_predict_fun and emits a gdpar_diagnostic_warning announcing the re-fit.
Arguments
-
name: character scalar. Adapter identifier; must be non-empty (nzchar(name)) and is expected to be unique within a single call togdpar_compare_meta_learners(uniqueness is enforced by the comparator, not by this constructor). -
fit_predict_fun: function. Mandatory closure with signaturefunction(X, Y, T, X_newdata, level, seed_run)that fits the meta-learner on(X, Y, T)and predicts the CATE onX_newdata.XandX_newdataare data frames of covariates (the adapter is responsible for any internal conversion to matrices or to language-native arrays such asnumpy.ndarray);Yis a numeric vector;Tis a 0/1 integer vector;levelis the nominal credible level inherited from the bridge (default 0.95);seed_runis the per-method seed propagated by the comparator. The closure must return a list with componentscate_mean(numeric vector),cate_ci(numeric matrix[n_newdata, 2]with columnslower,upper, orNULL),state(opaque object cached for the optionalpredict_fun; may beNULL), andnotes(character vector of method-specific diagnostics/warnings). -
predict_fun: function orNULL(defaultNULL). Optional closure with signaturefunction(state, X_newdata, level)that re-evaluates the meta-learner onX_newdatausing the cachedstateproduced byfit_predict_fun. Must return a list with componentscate_meanandcate_ci(same shape contract asfit_predict_fun).NULLsignals that the adapter does not support re-prediction. -
requires_r: character vector. R packages the adapter needs (checked viarequireNamespacebefore the fit). Defaultcharacter(0L). -
requires_py: character vector. Python modules the adapter needs (checked viareticulate::py_module_availablebefore the fit, which itself requires reticulate). Defaultcharacter(0L). -
native_ci: logical scalar.TRUEwhen the adapter returns a native CI incate_ci;FALSEwhen it does not (the comparator does not synthesize intervals in that case and the slot is left atNULL). Must be non-NA. -
description:NULLor character scalar. One-line human-readable description used by theprintmethod. DefaultNULL.
Returns
An object of class c("gdpar_meta_learner_adapter", "list") with components name, fit_predict_fun, predict_fun, requires_r, requires_py, native_ci (coerced via as.logical(native_ci)), and description.
Notes
The constructor performs strict input validation and raises errors via gdpar_abort with S3 class "gdpar_input_error":
-
name: must satisfyis.character(name) && length(name) == 1L && nzchar(name); otherwisegdpar_abortis called withdata = list(received = name). -
fit_predict_fun: must satisfyis.function(fit_predict_fun). -
predict_fun: must beNULLor satisfyis.function(predict_fun). -
requires_r: must satisfyis.character(requires_r)(length is not constrained). -
requires_py: must satisfyis.character(requires_py). -
native_ci: must satisfyis.logical(native_ci) && length(native_ci) == 1L && !is.na(native_ci). -
description: must beNULLor a character scalar (is.character(description) && length(description) == 1L).
No side effects beyond the construction of the list and assignment of the S3 class. The function does not validate the body or signature of fit_predict_fun/predict_fun; conformance to the documented contract is the caller's responsibility.
Purpose
Predicate testing whether an object inherits from class "gdpar_meta_learner_adapter". Used to guard comparator internals and user-facing code against malformed adapter arguments.
Arguments
-
x: any R object to test.
Returns
TRUE when inherits(x, "gdpar_meta_learner_adapter") is TRUE, FALSE otherwise. Always a length-1 logical scalar (never NA, because inherits never returns NA).
Notes
No validation, no errors, no side effects. The test is purely class-based; an object that happens to be a list with the right components but lacking the S3 class attribute will return FALSE.
Purpose
S3 print method for objects of class gdpar_meta_learner_adapter. Renders a compact human-readable summary used by the print generic and by the example workflow in gdpar_meta_learner_adapter.
Arguments
-
x: agdpar_meta_learner_adapterobject. -
...: unused; present for S3 generic compatibility.
Returns
Invisibly returns x.
Notes
Emits the following lines to standard output via cat:
-
<gdpar_meta_learner_adapter>(header). -
name :followed byx$name. - If
!is.null(x$description):description :followed byx$description. -
requires (R) :followed by either<none>(whenlength(x$requires_r) == 0L) orpaste(x$requires_r, collapse = ", "). -
requires (Python) :followed by either<none>(whenlength(x$requires_py) == 0L) orpaste(x$requires_py, collapse = ", "). -
native CI :followed byx$native_ci. -
supports predict :followed by!is.null(x$predict_fun).
No validation of x is performed; passing a non-adapter will either succeed (printing whatever fields happen to be present) or error with a standard R subsetting error. The ... argument is silently ignored.
Purpose
Internal helper that verifies whether the R packages and Python modules declared in an adapter's requires_r and requires_py fields are available in the current session. Used by the comparator before dispatching to fit_predict_fun so that missing-dependency failures surface as structured diagnostics rather than opaque downstream errors.
Arguments
-
adapter: agdpar_meta_learner_adapterobject (or any list withrequires_randrequires_pycharacter components). No class check is performed.
Returns
A list with three components:
-
ok: logical scalar.TRUEiff bothmissing_randmissing_pyhave length zero. -
missing_r: character vector of entries fromadapter$requires_rfor whichrequireNamespace(pkg, quietly = TRUE)returnsFALSE. -
missing_py: character vector of entries fromadapter$requires_pydeemed unavailable (see algorithm below).
Mathematics
The availability predicate for R packages is
and the overall R-side status is
For Python, the predicate is conditional on reticulate itself being present:
with each call wrapped in tryCatch(..., error = function(e) FALSE), and
The final status is
Notes
The Python branch proceeds in three cases:
-
Empty
requires_py:missing_pyis set tocharacter(0L)unconditionally; reticulate is not loaded. -
Non-empty
requires_pybut reticulate unavailable:missing_pyis set to the entirety ofadapter$requires_py(i.e. every declared module is reported as missing) without attempting to load Python. -
Non-empty
requires_pyand reticulate available: the helper first checks, viaexists("py_require", envir = asNamespace("reticulate"), inherits = FALSE), whether reticulate exposes the newerpy_requireAPI; if so, it callsreticulate::py_require(adapter$requires_py)inside atryCatch(..., error = function(e) NULL)(errors are silently swallowed and the return value is discarded — this call is a best-effort hint to reticulate's environment manager, not a hard gate). Then each module inadapter$requires_pyis tested withreticulate::py_module_available(mod), again wrapped intryCatch(..., error = function(e) FALSE); modules for which the call returnsFALSEor errors are collected intomissing_py.
The function is marked @keywords internal and @noRd; it is not exported and is intended only for use within the package. It performs no validation of adapter beyond subsetting requires_r and requires_py; a malformed adapter lacking these components will trigger an R subsetting error.
Purpose Provides a concise printed summary of a fitted gdpar_fit object, displaying key model attributes and convergence diagnostics.
Arguments
-
x: An object of classgdpar_fit. -
...: Unused; present for S3 generic compatibility. Mathematics None. Returns Invisibly returns the input objectx. Notes - This is the S3
printmethod forgdpar_fitobjects. - The output includes the model path, family and link, AMM level, anchor value(s), number of observations (inferred from design matrices), identifiability report (if present), and convergence diagnostics (converged status, maximum R-hat, minimum bulk ESS, divergent count) when available.
- For multivariate models (
p > 1), the anchor is printed as a vector.
Purpose Returns a posterior summary table for user-facing parameters of a fitted gdpar_fit object.
Arguments
-
object: An object of classgdpar_fit. -
...: Unused; present for S3 generic compatibility. Mathematics None. Returns A data frame of posterior summaries (mean, median, standard deviation, 5% and 95% quantiles, R-hat, ESS bulk and tail) for the selected parameters. Notes - This is the S3
summarymethod forgdpar_fitobjects. - Requires the suggested package
posterior. - Parameters are filtered to exclude internal variables (
eta,log_lik,y_pred,theta_i,a_coef,b_coef,a_raw,b_raw,W_raw). If no user-facing parameters remain, defaults totheta_refor the first variable. - Uses
posterior::summarise_drawswith default summary and convergence measures.
predict.gdpar_fit(object, newdata = NULL, type = c("theta_i", "linear_predictor", "response"), summary = c("draws", "mean_se", "quantiles"), ...)
Purpose Computes posterior draws of individual parameters gdpar_fit model, optionally on the response scale.
Arguments
-
object: An object of classgdpar_fit. -
newdata: Optional data frame for prediction. WhenNULL(default), predictions are computed on training data using stored Stan-generatedtheta_idraws. -
type: Character scalar:"theta_i"or"linear_predictor"(synonyms) for the linear predictor scale;"response"for the inverse-link-transformed scale. -
summary: Character scalar:"draws"(default) returns full draws as a matrix/array;"mean_se"returns posterior mean and standard error per observation;"quantiles"returns 5%, 50%, and 95% quantiles per observation. -
...: Unused; present for S3 generic compatibility. Mathematics The individual parameters are defined as:$$\theta_i = \theta_{\text{ref}} + \Delta(x_i, \theta_{\text{ref}})$$ where$\Delta$ is the deviation function from the AMM specification. Whentype = "response", the family's inverse link function$g^{-1}(\cdot)$ is applied to$\theta_i$ . Returns - For
summary = "draws": A numeric matrix (draws$\times$ observations) for univariate models; an array of dimensions$(S, n, p)$ for multivariate models ($p > 1$ ); or an array of dimensions$(S, n, K)$ for$K > 1$ models. - For
summary = "mean_se": A data frame with columnsmeanandsefor univariate models; a named list of such data frames for multivariate or$K > 1$ models. - For
summary = "quantiles": A data frame with columnsq05,q50,q95for univariate models; a named list of such data frames for multivariate or$K > 1$ models. Notes - This is the S3
predictmethod forgdpar_fitobjects. - Dispatches to internal helper functions based on model structure (multivariate with
$p > 1$ or$K > 1$ ). - For
newdata, centering parameters (column means of$Z_a$ ,$Z_b$ , and column means/standard deviations of$X$ ) from training are used to rebuild design matrices. - For
type = "response", the inverse link is applied per-draw; reported quantiles are of the response-scale distribution. - Requires the suggested package
posterior.
Purpose Internal helper for predict.gdpar_fit handling multivariate models (
-
object: Agdpar_fitobject with$p > 1$ . -
newdata: EitherNULL(in-sample) or a data frame for out-of-sample prediction. -
type: Character scalar:"theta_i","linear_predictor", or"response". -
summary: Character scalar:"draws","mean_se", or"quantiles". -
draws: The posterior draws object from the Stan fit. Mathematics For in-sample predictions, the generated quantity$\theta_i$ (stored as a matrix with columns for each$(i,k)$ pair) is reshaped into an array of dimensions$(S, n, p)$ . For out-of-sample,$\eta_{i,k}$ is rebuilt per coordinate from draws of$\theta_{\text{ref}}$ ,$a_{\text{coef}}$ ,$c_b$ ,$W_{\text{raw}}$ , and$\sigma_W$ . Returns - For
summary = "draws": An array of dimensions$(S, n, p)$ (draws$\times$ observations$\times$ dimensions). - For
summary = "mean_se": A named list of length$p$ , each element a data frame withmeanandse. - For
summary = "quantiles": A named list of length$p$ , each element a data frame withq05,q50,q95. Notes - Internal function, not exported.
- Parses variable names of the form
theta_i[i,k]from draws to map to array indices. - For
type = "response", applies the$k$ -th component's inverse link fromobject$family$families[[k]]$inv_link. - Handles grouped data via
predict_from_newdata_grouped_multi.
Purpose Internal helper for predict.gdpar_fit handling distributional regression models with
-
object: Agdpar_fitobject with$K > 1$ . -
newdata: EitherNULL(in-sample) or a data frame for out-of-sample prediction. -
type: Character scalar:"theta_i","linear_predictor", or"response". -
summary: Character scalar:"draws","mean_se", or"quantiles". -
draws: The posterior draws object from the Stan fit. Mathematics For in-sample, the generated quantity$\theta_{i,k}$ (stored as a matrix of dimensions$n \times K$ ) is reshaped into an array of dimensions$(S, n, K)$ . For out-of-sample,$\eta_{i,k}$ is rebuilt per distributional parameter$k$ from draws of$\theta_{\text{ref},k}$ ,$a_{\text{coef},k}$ ,$c_{b,k}$ , and the globally shared$W_{\text{raw}}$ and$\sigma_W$ . Returns - For
summary = "draws": An array of dimensions$(S, n, K)$ . - For
summary = "mean_se": A named list of length$K$ , each element a data frame withmeanandse. - For
summary = "quantiles": A named list of length$K$ , each element a data frame withq05,q50,q95. Notes - Internal function, not exported.
- For
type = "response", applies the canonical inverse link for each distributional parameter$k$ as specified inobject$family$param_specs[[k]]$inv_link(e.g., identity for location, log for dispersion). - Out-of-sample predictions with grouping or B-spline bases on newdata are not yet supported and raise
gdpar_unsupported_feature_error. - Mirrors
predict_gdpar_fit_multibut iterates over$K$ slots.
Purpose Computes predictions for a gdpar_fit object with K > 1 (K-individual model). It reconstructs posterior draws of the linear predictor or response for each slot (individual) using the provided posterior draws, optionally transforming them via inverse link functions and summarizing them.
Arguments
-
object: Agdpar_fitobject (list) containing model information, includingK,slot_names,family, andamm. -
newdata: EitherNULL(use training data) or a data frame of new covariates for prediction. -
type: Character string, either"linear_predictor"(default, returns linear predictor) or"response"(applies inverse link functions). -
summary: Character string, either"draws"(returns the full array of draws),"mean_se"(returns mean and standard error), or"quantile"(returns 5%, 50%, and 95% quantiles). -
draws: Adrawsobject (from theposteriorpackage) containing posterior samples, including variablestheta_i_k,theta_ref_k,a_coef_k,c_b_k,W_raw, andsigma_Was applicable.
Mathematics
When newdata is NULL, the function extracts draws of theta_i_k[i,k] from draws. When newdata is provided, it calls predict_from_newdata_K (not defined in this section) to compute type = "response", each slot's linear predictor is transformed by the inverse link function from object$family$param_specs[[k]]$inv_link.
Returns
- If
summary = "draws": a 3-dimensional array of dimensions(S, n, K)(S = number of draws, n = number of observations, K = number of slots) with dimension namesslot_namesfor the third dimension. - If
summary = "mean_se": a list of lengthK(names =slot_names) where each element is a data frame with columnsmeanandse. - If
summary = "quantile": a list of lengthK(names =slot_names) where each element is a data frame with columnsq05,q50,q95.
Notes
- The function requires the
posteriorpackage (suggested dependency) to extract draws. - When
newdataisNULL, it parses variable names oftheta_i_kusing the regex^theta_i_k\\[(\\d+),(\\d+)\\]$to extract indices. If parsing fails, it raises agdpar_internal_error. - For
type = "response", ifobject$family$param_specsisNULLor has incorrect length, it raises agdpar_internal_erroradvising to usetype = "linear_predictor". - The function does not modify
objectordraws.
Purpose Internal helper that computes the linear predictor p = 1) dynamic parameter model. It uses the centering parameters from the training design and posterior draws of theta_ref, a_coef, b_coef, W_raw, and sigma_W.
Arguments
-
object: Agdpar_fitobject (list) withamm(model specification) anddesign(training design matrices and centering constants). -
newdata: Data frame with covariates required by the AMM specification. -
draws: Adrawsobject (fromposterior) containing posterior samples.
Mathematics
For each posterior draw
-
$Z_a$ and$Z_b$ are design matrices for the additive and multiplicative components, centered using training means. -
$X$ is the standardized modulating covariate matrix (centered and scaled using training means and standard deviations). -
$\theta_{\mathrm{ref}}$ is the reference parameter draw. -
$a_{\mathrm{coef}}$ ,$b_{\mathrm{coef}}$ are coefficient draws. -
$W_{\mathrm{raw}}$ are draws of the modulating weights, scaled by$\sigma_W$ (if non-centered parameterization). -
$\theta_{\mathrm{anchor}}$ is the fixed anchor parameter.
Returns
A matrix of dimensions (S, n_new) (S = number of draws, n_new = number of rows in newdata) containing the linear predictor draws.
Notes
- This function is internal (not exported) and is called by
predict_gdpar_fitfor the univariate case. - It assumes the model has a polynomial W basis (
amm$W$type == "polynomial"). If the modulating component exists but is not polynomial, it is ignored (thehas_Wcondition ensures only polynomial W is used). - If
amm$aoramm$bareNULL, the corresponding design matrices are set to zero-column matrices. - The function loops over each draw
$s$ in R, which may be slow for large S; however, it is intended for prediction on new data where vectorization across draws is non-trivial due to the multiplicative term involving$\theta_{\mathrm{ref}}$ .
Purpose Internal helper that computes the linear predictor p > 1) dynamic parameter model. It rebuilds per-coordinate predictors using per-coordinate design matrices and centering, and a globally shared modulating component.
Arguments
-
object: Agdpar_fitobject withp > 1(number of dimensions) and per-dimension AMM specifications inamm$dims. -
newdata: Data frame with covariates required by every dimension's AMM specification. -
draws: Adrawsobject (fromposterior) containing posterior samples.
Mathematics
For each posterior draw
-
$Z_{a,k}$ and$Z_{b,k}$ are per-dimension design matrices, centered using per-dimension training means. -
$X$ is the standardized modulating covariate matrix (shared across dimensions), centered and scaled using global training means and standard deviations. -
$\theta_{\mathrm{ref},k}$ ,$a_{\mathrm{coef},k}$ ,$c_{b,k}$ are per-dimension draws. -
$W_{\mathrm{raw}}$ is a matrix of dimension$\dim_W \times d$ (where$d$ is the number of modulating covariates), with rows partitioned by dimension: rows$(k-1)\cdot(\dim_W/p) + 1$ to$k\cdot(\dim_W/p)$ correspond to dimension$k$ . -
$\sigma_W$ is a scalar multiplier (if non-centered parameterization; omitted if centered). -
$\theta_{\mathrm{anchor},k}$ is the per-dimension anchor.
Returns
A 3-dimensional array of dimensions (S, n_new, p) (S = number of draws, n_new = number of rows in newdata, p = number of dimensions).
Notes
- This function is internal and called by
predict_gdpar_fitfor multivariate models. - It uses helper functions (not shown in this section) to extract draws:
.extract_theta_ref_multi_flat,posterior_array_var_to_list,posterior_var_to_matrix_3d,posterior_var_to_array_2d. - The modulating component is only included if
amm$Wexists, has type"polynomial", andXhas columns. The global standardization ofXis applied. - If the centered parameterization for W is used (
cp_W = TRUE),sigma_Wis not multiplied. - The function loops over draws and dimensions in R, which may be slow but is necessary due to the complex dependencies.
Purpose Internal helper that generates posterior predictions on new covariate data for K-individual fits (K > 1). It mirrors the Stan-side linear predictor newdata. Called by the predict.gdpar_fit S3 method when newdata is non-NULL and the fit has K > 1.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit list |
The fitted model object. Must contain K, amm_list_canonical, design_K, anchor, and parametrization. |
newdata |
data.frame |
New covariate observations at which to predict. Must contain every column referenced by the model's fixed-effect (a), multiplicative-effect (b/c_b), and modulator (W) formula terms, plus X_names columns when a W basis is active. |
draws |
draws_array / draws_matrix (from posterior) |
Posterior draws extracted from the fitted model (e.g., via as_draws_array(fit$stanfit)). Must contain the variables theta_ref_k, and conditionally a_coef_k, c_b_k, W_raw, and sigma_W. |
Mathematics
The linear predictor for individual
where
-
$\theta_{\mathrm{ref},k}^{(s)}$ is the reference-point parameter for slot$k$ at draw$s$ . -
$\mathbf{Z}_{a,k}$ and$\mathbf{Z}_{b,k}$ are design matrices for the additive and multiplicative effects of slot$k$ , each column-mean-centred using the training means stored indesign_K$Z_a_k_means_list[[k]]anddesign_K$Z_b_k_means_list[[k]]. -
$\mathbf{a}_k^{(s)}$ and$\mathbf{c}_{b,k}^{(s)}$ are the corresponding coefficient vectors at draw$s$ . -
$\mathbf{X}_i$ is the vector of modulator covariates for observation$i$ , standardised (centred then divided by training SDs) viadesign_K$X_meansanddesign_K$X_sds.
The modulating coefficient vector is built from the polynomial basis difference
where dim_W) is the number of polynomial basis functions and cp_W is FALSE, each entry is additionally scaled by
The anchor
Algorithm outline
-
Validation — Abort with
gdpar_unsupported_feature_errorif the fit has grouping (J_groups > 1), or if theWbasis type is not"polynomial"(e.g. B-spline). Both are queued for a future release (Session 8.4). -
Build newdata design matrices — For each slot
$k \in \{1,\dots,K\}$ , constructZ_a_k_new[[k]]andZ_b_k_new[[k]]fromnewdatausing the stored formula objects, then centre each column by the training mean. If a slot has noaorbterm, the corresponding matrix has zero columns. Likewise, build the standardised modulator matrixX_newfromdesign_K$X_names, centre, and scale. -
Extract posterior draws — Pull
theta_ref_kinto an$(S \times K)$ matrix; pulla_coef_kandc_b_kinto per-slot lists of$(S \times J_k)$ matrices (viaposterior_array_var_to_list); pullW_rawinto an$(S \times d_W \times d)$ array and optionallysigma_Winto an$(S \times 1)$ matrix. -
Loop over draws and slots — For each draw
$s$ and slot$k$ , accumulate the linear predictor$\eta_k$ and, if aWbasis is active, compute the polynomial basis difference and the dot product withW_raw. - Return the filled array.
Returns A numeric 3-dimensional array of dimensions (S, n_new, K) where S = number of posterior draws, n_new = nrow(newdata), and K = number of individual slots. Entry [s, i, k] is the linear predictor
Notes
- Requires the posterior package (loaded via
require_suggested). -
Unsupported features that raise errors: (a)
J_groups > 1(grouped K-individual fits); (b) non-polynomialWbasis types (e.g.,"B-spline"). Both producegdpar_unsupported_feature_error. - The
Wbasis is declared global across all$K$ slots (per design decision "Scope of W: global"); thereforeW_rawandsigma_Ware shared, and the same modulating contribution enters every slot's$\eta_k$ . - When
has_WisTRUE, the function inspects only the first active slot'sWspecification to determinedim_Wandtype. - Newdata design matrices are centred/scaled using training statistics (
Z_a_k_means_list,Z_b_k_means_list,X_means,X_sds), not the new data's own moments. - This function is not exported and is not an S3 method.
Purpose Internal helper that extracts a one-dimensional Stan variable declared as real var[p] (e.g., theta_ref[k], sigma_W[1]) from a posterior draws object and reshapes it into an
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix
|
Posterior draws object (from posterior). |
var_name |
character scalar | Base name of the Stan variable (e.g., "theta_ref_k", "sigma_W"). |
p |
integer scalar | Expected number of elements (length of the Stan vector/array). |
Mathematics
No closed-form formula; this is a reshape/reindex operation. It maps column var_name[idx[c]]) to column idx[c] of the output matrix, where idx[c] is parsed via the regex ^var_name\[(\d+)\]$.
Algorithm
- Subset
drawsto variables matchingvar_nameviaposterior::subset_draws. - Convert to a plain
(S × C)matrix viaposterior::as_draws_matrix(stripped of class). - Parse each variable name with the regex
^var_name\[(\d+)\]$to extract the integer index. - Validate: abort with
gdpar_internal_errorif any name fails to match, or if the number of matched entries$\neq p$ . - Allocate an
$(S \times p)$ output matrix filled withNA_real_. For each column$c$ of the raw matrix, assign it to columnidx[c]of the output.
Returns A numeric matrix of dimensions (S, p) where S = number of posterior draws. Column j contains the draws for var_name[j].
Notes
- Abort condition: names that fail regex parsing →
gdpar_internal_errorwith message "failed to parse … variable names from draws". - Abort condition:
length(idx) != p→gdpar_internal_errorwith message "expected %d entries … found %d". - The function is robust to arbitrary column ordering in the posterior draws object because it reindexes by the parsed integer.
- Not exported; not an S3 method.
Purpose Internal helper that extracts a two-dimensional Stan variable declared as matrix[rows, cols] var (e.g., matrix[dim_W, d] W_raw) from a posterior draws object and reshapes it into a 3-dimensional R array of dimensions (S, rows, cols), correctly mapping each Stan-indexed element [r, c] to the appropriate array slice.
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix
|
Posterior draws object. |
var_name |
character scalar | Base name of the Stan matrix variable (e.g., "W_raw"). |
rows |
integer scalar | Number of rows declared in Stan. |
cols |
integer scalar | Number of columns declared in Stan. |
Mathematics
No formula; reshaping only. The element drawn from variable name var_name[r, c] is placed at array index [s, r, c] for each draw
Algorithm
- Subset and convert to a plain matrix as in
posterior_var_to_array_2d. - Parse variable names with regex
^var_name\[(\d+),(\d+)\]$extracting row indexidx_rand column indexidx_c. - Validate: abort with
gdpar_internal_errorif any name fails to match. - Allocate an
$(S \times \texttt{rows} \times \texttt{cols})$ array filled withNA_real_. For each raw matrix column$c$ , assign it to[s, idx_r[c], idx_c[c]]across all draws$s$ .
Returns A numeric 3-dimensional array of dimensions (S, rows, cols).
Notes
- Abort condition: names that fail regex parsing →
gdpar_internal_error. - The
rowsandcolsarguments are used only for allocating the output array; the actual row/column mapping comes from the parsed indices, so the function is safe even if the posterior package reorders columns. - Not exported; not an S3 method.
Purpose Internal helper that extracts a Stan variable declared as array[p] vector[J_max] var (i.e., a ragged array of vectors, e.g., a_coef[k][j], c_b[k][j]) from a posterior draws object and returns a list of length ncols[k] columns is necessary because the Stan template pads each vector to a common J_max, and the trailing padded zeros must be excluded.
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix
|
Posterior draws object. |
var_name |
character scalar | Base name of the Stan array-of-vectors variable (e.g., "a_coef_k", "c_b_k"). |
p |
integer scalar | Number of individual slots |
ncols |
integer vector of length p
|
Number of effective (non-padded) elements per slot |
Mathematics
No formula; extraction and reshaping. For each draw var_name[k, j].
Algorithm
- Subset and convert to a plain matrix.
- Parse variable names with regex
^var_name\[(\d+),(\d+)\]$extractingidx_k(slot index) andidx_j(within-slot index). - Validate: abort with
gdpar_internal_errorif any name fails to match. - Allocate a list of length
$p$ ; the$k$ -th entry is an$(S \times \texttt{ncols}[k])$ matrix ofNA_real_. - For each raw matrix column
$c$ , ifidx_j[c] <= ncols[idx_k[c]], assign the entire column of draws toout[[k]][, j]; otherwise skip (padded column).
Returns A list of length p. The S rows and ncols[k] columns containing posterior draws of the
Notes
- Padding columns (
idx_j > ncols[k]) are silently ignored; this is by design since the Stan template pads to a commonJ_max. - Abort condition: names that fail regex parsing →
gdpar_internal_error. - Not exported; not an S3 method.
Purpose S3 method for the coef generic, dispatched on class gdpar_fit. Returns posterior summary statistics (mean and 5 % / 50 % / 95 % quantiles) of the model's key parameters: the reference point(s) K > 1), returns a named list of per-slot gdpar_coef objects.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit |
A fitted General Dynamic Parameter model. |
... |
— | Unused; present for S3 generic signature compatibility. |
Mathematics
The reported modulating coefficients use the natural-scale product
when cp_W is FALSE, so that summaries are on the modulating scale rather than the centred-parameterisation scale. When cp_W is TRUE, W_raw is used directly.
For K-individual fits, the W basis is global (shared across all W component is therefore attached identically to every slot's returned gdpar_coef object. The per-coordinate multiplicative slot draws from c_b_k are already on the
Returns
| Fit type | Return value |
|---|---|
Scalar (p = 1) or multi (p > 1) |
An object of class gdpar_coef (a list with components theta_ref, a, b, W). |
K-individual (K > 1) |
A named list of length K, each element a gdpar_coef object with p = 1L. |
The helper as.data.frame() can be applied to a single gdpar_coef to obtain a long-tidy table; for K-individual fits, lapply(coef(fit), as.data.frame) is recommended.
Notes
- Exported S3 method (registered via
@export). - K-individual fits with grouping (
J_groups > 1) are not yet supported and raisegdpar_unsupported_feature_error(queued for Session 8.4). - The
...argument is not used but is required by thecoefgeneric signature. - The function body is defined in another section of this file; only the roxygen documentation is present in this section.
Purpose
S3 method for the coef generic dispatching on objects of class gdpar_fit. Extracts posterior draws from a fitted general dynamic parameter model and restructures them into a gdpar_coef object (scalar/multivariate case) or a named list of gdpar_coef objects (K-individual case). This is the primary user-facing entry point for obtaining coefficient summaries from a fitted model.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit |
A fitted model object produced by the package's fitting machinery. Must contain at least fit (a CmdStanMCMC or similar object), p or K, and the requisite amm, design, and parametrization metadata. |
... |
— | Ignored; present for S3 generic compatibility. |
Algorithm
The method performs a three-way dispatch based on the model structure:
-
Guard: Assert
objectinherits from"gdpar_fit"(raises an error otherwise). -
Require
posteriorpackage viarequire_suggested()(conditionally loads theposteriorpackage for extracting draws). -
Read
K: Ifobject[["K"]]is non-NULL and$K > 1$ , delegate entirely tocoef_K_to_gdpar_coef_list(object)and return its result immediately. -
Read
p: Otherwise, readobject[["p"]]; default to$p = 1$ if NULL, coerced to integer. -
Extract draws via
object$fit$draws()(calls theposteriordraws accessor on the embedded Stan fit). - If
$p = 1$ : callcoef_scalar_to_gdpar_coef(object, draws). - If
$p > 1$ : callcoef_multi_to_gdpar_coef(object, draws).
Returns
- A
gdpar_coefS3 object (scalar or multivariate path), or a named list ofgdpar_coefobjects (K-individual path).
Notes
- This is an S3 method dispatched on the
"gdpar_fit"class. Callingcoef(object)on anygdpar_fitwill route here. - The method does not perform any summarisation (e.g., posterior means, credible intervals); it returns the full posterior draws packed into tidy data frames inside
gdpar_coef. - If
K > 1and grouping information is present, the call aborts at the internal helper with agdpar_unsupported_feature_error(seecoef_K_to_gdpar_coef_listbelow). - Requires the suggested package
posterior; if unavailable, an informative error is raised byrequire_suggested.
Purpose
Internal helper that builds a named list of gdpar_coef objects (one per individual/slot) from a gdpar_coef object with gdpar_coef S3 contract for each slot and wrapping the results in a plain named list.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit |
A fitted model object with K > 1. Must contain design_K, amm_list_canonical, slot_names, fit, and parametrization metadata. |
Algorithm
-
Require
posteriorpackage. -
Parse
Kfromobject[["K"]](coerced to integer). -
Grouping guard: If
object$group_infois non-NULL, abort withgdpar_unsupported_feature_errorindicating that grouped$K$ -individual fits ($J_{\text{groups}} > 1$ ) are queued for Session 8.4 and not yet implemented. -
Extract metadata: Retrieve
design_K(per-slot design specifications),amm_list_canonical(per-slot AMM specifications), andslot_names(defaulting toslot_1, …,slot_Kif NULL or length mismatch). -
Extract draws via
object$fit$draws(). -
Extract
$\theta_{\text{ref},k}$ : Usingposterior_var_to_matrix_3don Stan variable"theta_ref_k"(shape:$S \times 1 \times K$ , where$S$ is the number of posterior draws). Reshape to an$S \times K$ matrix. -
Determine per-slot column counts for
$a$ and$b$ terms: $$\text{ncols_a}[k] = \lvert \texttt{design_KZ_a_k_names_list}k \rvert, \quad \text{ncols_b}[k] = \lvert \texttt{design_KZ_b_k_names_list}k \rvert$$ -
Extract
$a$ coefficients: If any slot has$a$ terms, extract"a_coef_k"viaposterior_array_var_to_list(draws, "a_coef_k", K, ncols = ncols_a). Padding columns (those beyond a slot's actual design width) are silently skipped by the helper. -
Extract
$b$ coefficients: If any slot has$b$ terms, extract"c_b_k"viaposterior_array_var_to_list(draws, "c_b_k", K, ncols = ncols_b). Note that$b$ is recovered from the linear-reparametrization variablec_b_k, notb_coef_k(Recovery 2, handoff 4 addendum). -
Extract global
$W$ block: Determine which slots declare a non-NULLWin their AMM. If any do anddesign_K$X_namesis non-empty:- Identify the first active slot's
Wspecification. - Verify it is
"polynomial"with$\dim_W > 0$ . - Extract
"W_raw"as an$S \times \dim_W \times d_x$ 3D array viaposterior_var_to_matrix_3d. - If not centered parametrization (
cp_Wis FALSE): extract"sigma_W"and scale each draw's slice:$$W_{\text{eff}}[s, \cdot, \cdot] = W_{\text{raw}}[s, \cdot, \cdot] \cdot \sigma_W[s, 1]$$ - Flatten to an
$S \times (\dim_W \cdot d_x)$ matrix and callbuild_coef_W_dfto produce a tidy data frame. - This
W_global_dfis replicated into every slot'sgdpar_coef(since the K-individual Stan template shares$W$ globally).
- Identify the first active slot's
-
Assemble per-slot
gdpar_coef: For$k = 1, \dots, K$ :- Build
theta_ref_df_kviabuild_coef_theta_ref_dffrom the$k$ -th column oftheta_ref_mat(scalar,$p = 1$ ). - If
$\text{ncols\_a}[k] > 0$ : wrap in a one-element list viabuild_coef_term_df(a_dr_list[[k]], term_names = design_K$Z_a_k_names_list[[k]]). - If
$\text{ncols\_b}[k] > 0$ : similarly wrap viabuild_coef_term_df(b_dr_list[[k]], ...). - If
W_global_dfis non-NULL: wrap in a one-element list. - Call
new_gdpar_coef(...)with all components,$p = 1$ ,mu_theta_ref = NULL,sigma_theta_ref = NULL,J_groups = 1L,group_levels = NULL.
- Build
-
Name the list with
slot_namesand return.
Mathematics
For each slot
The globally shared weight matrix (per draw
is replicated identically across all
Returns
A named list of length slot_names and whose elements are gdpar_coef S3 objects, each with
Notes
- Raises
gdpar_unsupported_feature_error(viagdpar_abort) ifobject$group_infois non-NULL (i.e.,$J_{\text{groups}} > 1$ ). This combination is queued for implementation in Session 8.4. - The
$b$ slot uses the linear-reparametrization variablec_b_krather thanb_coef_k, consistent with Recovery 2 / handoff 4 addendum convention. - Padding columns in
a_coef_kandc_b_karrays (when slots have different design widths) are silently skipped byposterior_array_var_to_list. - The
Wblock is replicated across all slots (global$W$ ). Per-slot$W$ differentiation is not supported in the$K$ -individual template. - Marked
@keywords internal @noRd; not exported and not documented in the user manual.
Purpose
Internal helper that builds a single gdpar_coef object from a scalar (coef.gdpar_fit.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit |
A fitted model object with p = 1. Must contain group_info (or NULL), amm, design, and parametrization metadata. |
draws |
draws_array / draws_matrix
|
Posterior draws extracted from object$fit$draws(). |
Algorithm
-
Determine grouping:
J_groups= number of group levels (default 1 if nogroup_info);group_levels= the level labels (or NULL). -
Extract
$\theta_{\text{ref}}$ : Call.extract_theta_ref_uni_grouped(draws, J_groups)to obtain a 3D array of draws (shape$S \times p \times J_{\text{groups}}$ ). Build a tidy data frame viabuild_coef_theta_ref_df_grouped. -
Extract hyperparameters (grouped models only,
group_infonon-NULL):-
$\mu_{\theta_{\text{ref}}}$ : via.extract_mu_sigma_theta_ref(draws, "mu_theta_ref", p = 1), thenbuild_coef_hyper_df(mu_dr, p = 1). -
$\sigma_{\theta_{\text{ref}}}$ : via.extract_mu_sigma_theta_ref(draws, "sigma_theta_ref", p = 1), thenbuild_coef_hyper_df(sigma_dr, p = 1).
-
-
Extract
$a$ coefficients: Ifobject$amm$ais non-NULL, extract"a_coef"as a draws matrix, then build a tidy data frame viabuild_coef_term_df(a_dr, object$design$Z_a_names). Wrap in a one-element list. -
Extract
$b$ coefficients: Ifobject$amm$bis non-NULL, extract"b_coef"as a draws matrix, thenbuild_coef_term_df(b_dr, object$design$Z_b_names). Wrap in a one-element list. -
Extract
$W$ block: Ifobject$amm$Wis non-NULL:- Extract
"W_raw"as a draws matrix. - If not centered parametrization (
cp_Wis FALSE): extract"sigma_W"and compute:$$W_{\text{eff}} = W_{\text{raw}} \cdot \sigma_W$$ where$\sigma_W$ is broadcast from an$S \times 1$ matrix to a scalar per draw. - Otherwise
$W_{\text{eff}} = W_{\text{raw}}$ . - Build a tidy data frame via
build_coef_W_df(W_eff, basis_dim = object$amm$W$dim, x_names = object$design$X_names). Wrap in a one-element list.
- Extract
-
Assemble via
new_gdpar_coef(theta_ref, a, b, W, p = 1, mu_theta_ref, sigma_theta_ref, J_groups, group_levels).
Mathematics
For a scalar model (
In the grouped hierarchical case, the prior on the reference parameter is:
The non-centered
Returns
A single gdpar_coef S3 object with
Notes
- For grouped models (
$J_{\text{groups}} > 1$ ), thetheta_refdata frame contains one column per group level, and the hyperparameter data frames (mu_theta_ref,sigma_theta_ref) are non-NULL. - For ungrouped models (
$J_{\text{groups}} = 1$ ),mu_theta_refandsigma_theta_refare NULL, andgroup_levelsis NULL. -
aandbare each wrapped in a one-element list (matching the list-of-length-$p$ convention used innew_gdpar_coef, even though$p = 1$ ). - The
$b$ coefficient is extracted from Stan variable"b_coef"(unlike the$K$ -individual path which uses"c_b_k"). - Marked
@keywords internal @noRd; not exported.
Purpose
Internal helper that builds a single gdpar_coef object from a multivariate (
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_fit |
A fitted model object with p > 1. Must contain amm, design, p, group_info (or NULL), and parametrization. |
draws |
draws_array / draws_matrix
|
Posterior draws extracted from object$fit$draws(). |
Algorithm
-
Parse metadata: Extract
amm,design, integerp,J_groups(default 1),group_levels(default NULL). -
Extract
$\theta_{\text{ref}}$ : Call.extract_theta_ref_multi_grouped(draws, J_groups, p)to obtain a 3D array (shape$S \times p \times J_{\text{groups}}$ ). Build a grouped data frame viabuild_coef_theta_ref_df_grouped. -
Extract hyperparameters (grouped models only):
-
$\mu_{\theta_{\text{ref}}}$ :.extract_mu_sigma_theta_ref(draws, "mu_theta_ref", p)→build_coef_hyper_df(mu_dr, p). -
$\sigma_{\theta_{\text{ref}}}$ :.extract_mu_sigma_theta_ref(draws, "sigma_theta_ref", p)→build_coef_hyper_df(sigma_dr, p).
-
-
Extract
$a$ coefficients (per-dimension): Check if any dimension declares an$a$ term (has_a_global). If so:- Compute per-dimension column counts: $$\text{ncols_a}[k] = \lvert \texttt{designZ_a_names_list}k \rvert, \quad k = 1, \dots, p$$
- Extract
"a_coef"viaposterior_array_var_to_list(draws, "a_coef", p, ncols = ncols_a). - For each
$k$ : if$\text{ncols\_a}[k] > 0$ , callbuild_coef_term_df(a_dr_list[[k]], term_names = design$Z_a_names_list[[k]]); otherwiseNULL. - Result is a length-
$p$ list (with possible NULL entries).
-
Extract
$b$ coefficients (per-dimension): Check if any dimension declares a$b$ term (has_b_global). If so:- Compute per-dimension column counts: $$\text{ncols_b}[k] = \lvert \texttt{designZ_b_names_list}k \rvert, \quad k = 1, \dots, p$$
- Extract
"c_b"viaposterior_array_var_to_list(draws, "c_b", p, ncols = ncols_b). - Build per-dimension data frames (NULL if empty). Result is a length-
$p$ list. -
Note: Uses
"c_b"(linear-reparametrization variable), not"b_coef".
-
Extract
$W$ block (per-dimension sub-blocks): Active ifamm$Wis non-NULL,design$X_namesis non-empty, andamm$W$type == "polynomial". If so:- Compute per-dimension weight basis dimension:
$$W_{\text{per_k_dim}} = \left\lfloor \frac{\dim_W}{p} \right\rfloor$$ - Extract
"W_raw"as an$S \times \dim_W \times d_x$ 3D array. - If not centered parametrization: extract
"sigma_W". - For each
$k = 1, \dots, p$ :- If
$W_{\text{per\_k\_dim}} = 0$ :NULL. - Otherwise extract the row block
$[(k-1) \cdot W_{\text{per\_k\_dim}} + 1,\; k \cdot W_{\text{per\_k\_dim}}]$ from the$\dim_W$ axis. - Scale by
$\sigma_W$ if not centered:$$W_{\text{block}}[s] = W_{\text{raw,block}}[s] \cdot \sigma_W[s]$$ - Flatten and build data frame via
build_coef_W_df.
- If
- Result is a length-
$p$ list.
- Compute per-dimension weight basis dimension:
-
Assemble via
new_gdpar_coef(theta_ref, a, b, W, p, mu_theta_ref, sigma_theta_ref, J_groups, group_levels).
Mathematics
For a multivariate model with
Under the non-centered parametrization: $$\mathbf{W}^{(k)}{\text{eff}}[s] = \mathbf{W}^{(k)}{\text{raw}}[s] \cdot \sigma_W[s]$$
The
where
Returns
A single gdpar_coef S3 object with a, b, and W slots are each length-
Notes
- Uses
"c_b"(not"b_coef") for the$b$ coefficients, consistent with the Stan-side linear reparametrization convention (Recovery 2). - The
$a$ and$b$ extraction uses the global flag (has_a_global,has_b_global): if any dimension declares the term, the array is extracted for all$p$ dimensions; dimensions with zero columns yield NULL list entries. - The weight matrix is physically stored as a single 3D array in Stan and logically partitioned into per-dimension blocks in R.
- Integer division
$\lfloor \dim_W / p \rfloor$ determines per-dimension block size; if$\dim_W$ is not evenly divisible by$p$ , any remainder rows would be discarded (though in practice the design enforces divisibility). - For grouped models, hyperparameter data frames are included; for ungrouped models, they are NULL.
- Marked
@keywords internal @noRd; not exported.
Note: The section also contains roxygen documentation for print.gdpar_diagnostics(x, ...) (an exported S3 print method), but the function body is not defined in this section and appears in a subsequent section (5 of 6). It is therefore not documented here.
Purpose S3 print method for objects of class gdpar_diagnostics. Provides a concise, human-readable summary of MCMC diagnostic quantities stored in the object, including convergence status,
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
list with class gdpar_diagnostics
|
A diagnostics object whose named elements are converged (logical), rhat_max (numeric or NA), ess_bulk_min (numeric or NA), ess_tail_min (numeric or NA), divergent_count (integer), treedepth_saturated (integer), efmi_min (numeric or NA), and messages (character vector, possibly length-0). |
... |
Additional arguments passed to print; ignored. |
Returns x invisibly (invisible(x)). The function is called for its side effect of printing to the console.
Notes
- Uses
catwithsep = ""so no extra whitespace is introduced between label and value. - Numeric diagnostics (
rhat_max,ess_bulk_min,ess_tail_min,efmi_min) that areNAare printed as the literal string"NA"rather than being passed toformat. - When
length(x$messages) > 0L, messages are printed as a bulleted list with"- "prefix and indented four spaces. - No S3 dispatch beyond the class name;
UseMethodis not called explicitly here becauseprintis already a generic.
Purpose Internal helper that extracts the posterior draws of theta_ref from a univariate Stan model with grouping. The Stan template declares theta_ref as vector[J_groups], which the posterior package exposes as scalar variables theta_ref[1], theta_ref[2], …, theta_ref[J_groups]. The helper reassembles these into a draws
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix (posterior) |
Posterior draws object from a fitted univariate grouped Stan model. |
J_groups |
integer scalar | Expected number of groups. |
Returns A numeric matrix of dimension
Notes
- Uses regex
^theta_ref\\[(\\d+)\\]$to parse integer group indices from variable names. - Calls
gdpar_abortwith class"gdpar_internal_error"if:- Any parsed index is
NA. - The number of parsed columns
$\neq J_\text{groups}$ .
- Any parsed index is
- Internally converts the
posteriordraws subset to a plain matrix viaunclass(posterior::as_draws_matrix(...)). - Uses a
forloop to place each column of the raw matrix into the correct column ofoutbased on the parsed index, so ordering is deterministic even if Stan/theposteriorpackage emits columns in a different order.
Purpose Internal helper that extracts the posterior draws of theta_ref from a multivariate Stan model with grouping. The Stan template declares theta_ref as array[J_groups] vector[p], exposed by the posterior package as variables theta_ref[g, k]. The helper reassembles these into a three-dimensional array.
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix (posterior) |
Posterior draws object from a fitted multivariate grouped Stan model. |
J_groups |
integer scalar | Expected number of groups. |
p |
integer scalar | Dimensionality of the multivariate parameter space. |
Returns A numeric array of dimension
Notes
- Uses regex
^theta_ref\\[(\\d+),\\s*(\\d+)\\]$to parse group index$g$ and coordinate index$k$ . - Expected number of raw columns is
$J_\text{groups} \times p$ . - Calls
gdpar_abortwith class"gdpar_internal_error"if any parsed index isNAor the total count does not match. - Iterates over columns of the raw matrix, placing each into the correct
[s, g, k]slice of the output array.
Purpose Convenience wrapper for the no-grouping branch of the multivariate Stan template. When .extract_theta_ref_multi_grouped has a trivial first axis. This function collapses
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix (posterior) |
Posterior draws object. |
J_groups |
integer scalar | Must be exactly 1L. |
p |
integer scalar | Dimensionality of the multivariate parameter. |
Returns A numeric matrix of dimension arr[, 1L, ] where arr is the
Notes
- Calls
.extract_theta_ref_multi_groupedinternally and then drops the singleton group axis. - Raises
gdpar_abort(class"gdpar_internal_error") if$J_\text{groups} \neq 1$ , guarding against misuse.
Purpose Internal helper that extracts posterior draws of either mu_theta_ref or sigma_theta_ref (hierarchical hyperparameters of the group-level reference anchor). These Stan variables are emitted only when use_groups = 1. In the univariate template they are declared array[1] real, yielding variables mu_theta_ref[1]; in the multivariate template they are array[1] vector[p], yielding mu_theta_ref[1, k]. The helper detects both shapes and normalises the output to a matrix of dimension
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws |
draws_array / draws_matrix (posterior) |
Posterior draws object. |
var_name |
character scalar | Either "mu_theta_ref" or "sigma_theta_ref". |
p |
integer scalar | Dimensionality of the parameter space ( |
Returns A numeric matrix of dimension
Notes
- First attempts to match a two-index pattern
^<var_name>\\[1,\\s*(\\d+)\\]$(multivariate). If any matches succeed, it expects exactly$p$ such columns and assembles them into the matrix by coordinate index. - If no two-index matches are found, falls back to the one-index pattern
^<var_name>\\[(\\d+)\\]$(univariate). It expects exactly one match and returns a single-column matrix. - Calls
gdpar_abort(class"gdpar_internal_error") if:- The two-index path finds a count
$\neq p$ . - The one-index path finds 0 or
$> 1$ matches, or any parsed index isNA.
- The two-index path finds a count
- The
var_nameparameter allows the same logic to serve bothmu_theta_refandsigma_theta_ref.
Purpose Internal function that generates posterior predictions of predict_from_newdata but respects the per-group anchor structure of Block 6.5: known groups reuse their group-specific theta_ref draws; unseen groups fall back to the hierarchical prior predictive.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar fitted model object |
Must contain amm, design, group_info, and anchor components. |
newdata |
data.frame |
New covariate data; must include the grouping variable group_info$var_name. |
draws |
draws_matrix / draws_array (posterior) |
Posterior draws from the fitted Stan model. |
Algorithm / Mathematics
Let newdata.
1. Group resolution. For each row newdata, mapped against the factor levels observed at fit time. Rows whose level was never seen are flagged unseen.
2. Per-draw loop (
For each observation
Then the linear predictor is assembled:
$$ \eta_i^{(s)} = \theta_{\text{ref},i}^{(s)}
- \mathbf{z}{a,i}^{\top},\mathbf{a}{\text{coef}}^{(s)}
- \mathbf{z}_{b,i}^{\top},\mathbf{c}_b^{(s)}
- \text{W-contribution}_i^{(s)} $$
where newdata for the varying-intercept and varying-slope sub-models respectively, b_coef; see decision D4 of Block 6.5).
3. W-contribution (only if amm$W is non-null and of type "polynomial"). Let W_raw[s, ],
where
Design matrix construction for newdata:
-
Z_a: model matrix foramm$aonnewdata, centered column-wise by the training meansdesign_train$Z_a_means. -
Z_b: model matrix foramm$bonnewdata, centered column-wise bydesign_train$Z_b_means. -
X: columns ofnewdatamatchingdesign_train$X_names, centered byX_meansand scaled byX_sds(training statistics). - Empty matrices (zero columns) are created if the corresponding sub-model is absent.
Returns A numeric matrix of dimension nrow(newdata). Entry
Notes
- Calls
require_suggested("posterior", ...)to ensure theposteriorpackage is available. - Raises
gdpar_abort(class"gdpar_input_error") if the grouping variable is not found innewdata. - If any rows contain unseen group levels, a warning of class
"gdpar_predict_unseen_group_warning"is issued viagdpar_warn, naming the unseen levels. Those rows receive draws from the marginal prior predictive rather than a group-specific posterior draw. -
theta_ref_dris extracted via.extract_theta_ref_uni_grouped,mu_drandsigma_drvia.extract_mu_sigma_theta_refwithp = 1L. - Population-level coefficients (
a_coef,c_b,W_raw,sigma_W) are extracted fromdrawsonly if the corresponding sub-model is present; otherwise set toNULLand the corresponding block is skipped. - The
forloop over$s$ is the outermost loop, so the entire prediction for draw$s$ is computed before moving to$s+1$ . This includes random number generation for unseen groups (viastats::rnorm), meaning the RNG state matters and results are reproducible only if the seed is set beforehand.
Purpose
Generates posterior predictive draws of the linear predictor for new data from a fitted generalized dynamic parameter (GDPAR) model with multiple groups. It is an internal workhorse used by the predict.gdpar method for grouped models. It handles unseen groups by drawing their reference parameters from the marginal prior predictive distribution.
Arguments
-
object: A fitted model object of class"gdpar"containing theamm(Anchored Model Matrix),design,p,group_info,parametrization, andanchorcomponents. -
newdata: A data frame containing new observations. Must include the same grouping variable used during model fitting. -
draws: A posterior draws object (e.g., from theposteriorpackage) containing samples of all model parameters.
Mathematics
For each posterior draw
-
Reference Parameter
Let$g(i)$ be the group of observation$i$ . If$g(i)$ is a seen group,
$$\theta_{\text{ref}, s, i, k} = \theta_{\text{ref}, s, g(i), k}^{(\text{train})}$$
where$\theta_{\text{ref}, s, g(i), k}^{(\text{train})}$ is the group‑specific reference draw.
If$g(i)$ is unseen,$\theta_{\text{ref}, s, i, k}$ is drawn from the prior predictive:
$$\theta_{\text{ref}, s, i, k} \sim \mathcal{N}\left( \mu_{\theta_{\text{ref}}, s, k},; \sigma_{\theta_{\text{ref}}, s, k} \right).$$ -
Linear Additive Terms
Add the contributions from theaandbdesign matrices:
$$\eta_{s,i,k} ;+=; Z_{a,i,k}^\top , a_{\text{coef}, s, k} ;+; Z_{b,i,k}^\top , c_{b, s, k}$$
where$Z_{a,i,k}$ and$Z_{b,i,k}$ are centered (mean‑subtracted) design vectors for theaandbsub‑models of dimension$k$ , and$a_{\text{coef}, s, k}$ ,$c_{b, s, k}$ are the corresponding coefficient draws. -
Polynomial Basis Term (if present)
If the model contains a polynomial basis matrix$\mathbf{W}$ , then for each dimension$k$ let$\theta_{\text{anchor},k}$ be the fixed anchor point. Define the basis difference vector
$$\Delta_{s,i,k} = \begin{bmatrix} \theta_{\text{ref}, s, i, k}^{1} - \theta_{\text{anchor},k}^{1} \ \theta_{\text{ref}, s, i, k}^{2} - \theta_{\text{anchor},k}^{2} \ \vdots \ \theta_{\text{ref}, s, i, k}^{W_{\text{per}_k}} - \theta_{\text{anchor},k}^{W_{\text{per}_k}} \end{bmatrix}$$
where$W_{\text{per}\_k}$ is the number of polynomial terms per dimension. Let$W_{s,k}$ be the block of the weight matrix draw corresponding to dimension$k$ . Then
$$\eta_{s,i,k} ;+=; X_i^\top \left( W_{s,k}^\top , \Delta_{s,i,k} \right).$$
If the model uses a non‑centered parametrization (cp_W = FALSE), the term is additionally scaled by$\sigma_{W,s}$ .
Returns
A 3‑dimensional array of size newdata, and [s, i, k] is the linear predictor
Notes
-
Side effect: Loads the
posteriorpackage viarequire_suggested(). -
Error handling: Raises a
gdpar_input_errorif the required grouping variable is missing fromnewdata. -
Warning: Issues a
gdpar_predict_unseen_group_warningif any level of the grouping variable innewdatawas not present during training. -
Unseen groups: Their reference parameters are drawn i.i.d. from the prior predictive
$\mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}})$ for each draw, ignoring any group‑level posterior shrinkage. -
Design matrix centering: The
Z_aandZ_bmatrices are centered using the training‑set column means stored indesign_train$Z_a_means_listandZ_b_means_list. -
Polynomial basis: Only active when the model’s
amm$Wis present and of type"polynomial". The number of terms per dimension ($W_{\text{per}\_k}$ ) is computed asdim_W / p. -
Parametrization: The scaling of the
$W$ basis by$\sigma_W$ depends on the logical flagobject$parametrization$cp_W. -
S3 dispatch: Not directly dispatched; called internally by
predict.gdparwhen the model is grouped and multi‑dimensional.
preflight_parametrization_multi(prior, stan_data, amm, preflight_seed = NULL, verbose = FALSE, tau_cp = 5, tau_ncp = 2, aggregation = "any_ncp")
Purpose
Multivariate (preflight_parametrization(). The function compiles and samples a short NCP version of the multivariate Stan template, then applies a three-filter cascade—divergence attribution, E-BFMI, and info-ratio z-test (Path B′)—to each coordinate aggregation strategy, and a full report is returned as a gdpar_preflight_report S3 object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
prior |
gdpar_prior |
Prior specification object; validated via assert_inherits. |
stan_data |
list | Data list assembled by .assemble_stan_data_multi; must contain fields use_a, use_W, J_a_per_k, W_per_k_dim, d, theta_anchor. |
amm |
amm_spec |
AMM specification object; must have amm$p set to an integer |
preflight_seed |
integer or NULL
|
Random seed passed to run_preflight_sample. |
verbose |
logical | If FALSE, pre-flight stderr is sunk to a tempfile and discarded. |
tau_cp |
numeric scalar | Upper null-hypothesis value for the info-ratio z-test. Default 5 (handoff-canonical). |
tau_ncp |
numeric scalar | Lower null-hypothesis value for the info-ratio z-test. Default 2 (handoff-canonical). |
aggregation |
character scalar | One of c("any_ncp", "majority", "per_k"). Default "any_ncp". Controls how per-coordinate decisions are aggregated into per-component decisions inside the report. |
Mathematics
Let "absent" when the component is unused or degenerate for that
Preliminary quantities.
A coordinate
Critical values (computed once,
where
Filter 1 — Divergence attribution (per coordinate). Applied only when preflight_attribution_score_a_per_k_from_fit using
Filter 2 — E-BFMI (global). If
Filter 3 — Info-ratio z-test (Path B′, per coordinate). For each undecided coordinate
For component a_coef[, k, 1:J_a_free[k]] and the reference scale is sigma_a[, k]. For component
and the per-draw reference scale is:
Aggregation. Per-coordinate decisions are passed to new_gdpar_preflight_report, which produces a global per-component decision according to aggregation. The global decision is then converted to a logical (TRUE = CP) via decision_to_logical.
Returns
A named list with elements:
| Element | Type | Description |
|---|---|---|
cp_a |
logical scalar | Aggregated global CP/NCP decision for component TRUE = CP). |
cp_W |
logical scalar | Aggregated global CP/NCP decision for component TRUE = CP). |
cp_a_per_k |
logical vector (length |
Per-coordinate CP/NCP decision for component |
cp_W_per_k |
logical vector (length |
Per-coordinate CP/NCP decision for component |
cs_model_ncp |
cmdstanr model object or NULL
|
Compiled NCP Stan model; NULL if no preflight was run (no component needed per- |
report |
gdpar_preflight_report |
S3 object containing the full per-coordinate, per-component diagnostic table and settings. |
Notes
-
Input validation.
assert_inheritsis called onprior(must begdpar_prior) andamm(must beamm_spec). Ifamm$pisNULLor< 1, the function aborts with agdpar_internal_errorclass error. Ifaggregationis not one of the three allowed values, it aborts with agdpar_input_errorclass error carrying adatafield withreceived. -
Stan code generation. The NCP Stan template is generated via
generate_stan_code_multi(prior, cp_a = FALSE, cp_W = FALSE), written to a tempfile viawrite_stan_to_tempfile, and compiled withcmdstanr::cmdstan_model. -
Sampling. Delegated to
run_preflight_sample(cs_model_ncp, stan_data, preflight_seed, verbose). The total number of transitions is hardcoded as2L * 200L(2 chains × 200 post-warmup draws), matching the settings recorded in the report (n_chains = 2L,n_warmup = 200L,n_sampling = 200L,adapt_delta = 0.95,max_treedepth = 10L). -
Diagnostics.
diag <- pre_fit$diagnostic_summary(quiet = TRUE)is called on the fit object.n_divergentissum(diag$num_divergent);div_pctisn_divergent / total_transitions;ebfmi_minismin(diag$ebfmi). -
Null coalescing. Uses
%||%forstan_datafields (J_a_per_k,W_per_k_dim,d,theta_anchor), defaulting to0L,0L,0L, andas.double(stan_data$theta_anchor)respectively. -
Offset computation.
a_raw_offset_per_kisc(0L, cumsum(J_a_free_per_k)[-p]), giving the 0-based start index of each$k$ 's block in the flatteneda_rawvector. Note thatcumsumis taken over all$p$ elements and the last element is dropped, so the result has length$p$ . -
needs_W_per_kis constructed asrep(has_W & W_per_k_dim >= 1L & d >= 1L, p), i.e., the same scalar condition replicated$p$ times. -
No preflight path. If
!any(needs_a_per_k) && !any(needs_W_per_k), no Stan model is compiled or sampled;cs_model_ncpremainsNULL,used_preflightremainsFALSE, and all decisions remain"absent"withNAtest statistics. -
Report construction. The
per_dimdata frame has$2p$ rows ($p$ for componenta,$p$ for componentW) with columns:component,dim,decision,decision_reason,n_divergent,div_pct,ebfmi_min,t_attribution,t_info_cp,t_info_ncp. Theaggregationargument is passed asmethodtonew_gdpar_preflight_report. -
Global decision extraction. The global decision is read from
report$global$global_decisionfiltered by component, then converted to logical viadecision_to_logical. -
Phase H.1 contract. The
"any_ncp"and"majority"strategies are wired through to the existing multivariate Stan template (single CP/NCP pair per component). The"per_k"strategy requires Phase H.2 per-$k$ Stan template wiring and is rejected at the resolver level (not within this function). -
Side effects. Writes a Stan file to a temporary directory; compiles a cmdstan model; potentially writes stderr to a tempfile (when
verbose = FALSE).
Purpose
Per-coordinate (a_coef[k, 1:J_a_free_k] and the per-draw scale sigma_a[k] from a preflight cmdstanr fit object, then delegates the info-ratio z-test computation to preflight_info_ratio_t.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
cmdstanr fit object | The preflight model fit from which posterior draws are extracted. |
k |
integer scalar | Coordinate index (1-based) into the |
J_a_free_k |
integer scalar | Number of free additive coefficients for coordinate |
a_raw_offset_k |
integer scalar | Offset into the flattened a_raw vector for coordinate |
tau_cp |
numeric scalar | Upper null-hypothesis value for the info-ratio z-test. |
tau_ncp |
numeric scalar | Lower null-hypothesis value for the info-ratio z-test. |
Mathematics
Given
where preflight_info_ratio_t as effective_coef (an reference_scale_per_draw (a length-n_chains = 2L. The info-ratio z-test then produces two statistics
Returns
A named list with two elements:
| Element | Type | Description |
|---|---|---|
t_cp |
numeric scalar or NA_real_
|
Info-ratio z-test statistic under the CP null hypothesis. |
t_ncp |
numeric scalar or NA_real_
|
Info-ratio z-test statistic under the NCP null hypothesis. |
Returns list(t_cp = NA_real_, t_ncp = NA_real_) in any of the following degenerate cases:
J_a_free_k < 1L- The draw extraction for
sigma_a[k]raises an error or returns zero columns - Any element of
sigma_vecisNAor$\leq 0$ - The draw extraction for
a_coef[k, 1:J_a_free_k]raises an error or returns a matrix withncol != J_a_free_k
Notes
-
Variable naming. Stan variables are extracted using 1-based Stan indexing:
sprintf("sigma_a[%d]", k)for the scale andsprintf("a_coef[%d,%d]", k, seq_len(J_a_free_k))for the coefficient vector. -
Error handling. Both
fit$draws()calls are wrapped intryCatchwith an error handler returningNULL. A subsequentis.nullcheck (orncolcheck) triggers the NA return path. -
Unused argument.
a_raw_offset_kis part of the signature (for symmetry with the$W$ counterpartpreflight_W_info_per_k_from_fit) but is not referenced in the function body. -
Draw format. Both draw extractions request
format = "draws_matrix". The coefficient matrix is additionally coerced viaas.matrix(). -
Hardcoded chain count.
n_chains = 2Lis passed topreflight_info_ratio_t, matching the preflight configuration inpreflight_parametrization_multi. - No side effects. The function is purely computational; it reads from the fit object but does not modify it or any external state.
Purpose
Per-component-theta_ref[k], sigma_W[1], and the W_raw from a pre-flight Stan fit, reconstructs the effective coefficient contribution of that block along each coordinate, and delegates the final CP/NCP threshold computation to preflight_info_ratio_t. This is the multivariate analogue of the per-preflight_parametrization_multi.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
Stan fit object | A fitted model object exposing $draws(variables, format) and (transitively, via preflight_info_ratio_t) sampler diagnostics. |
k |
integer scalar | 1-based index of the component whose |
W_per_k_dim |
integer scalar | Number of rows in the W_raw matrix belonging to component |
d |
integer scalar | Number of columns of W_raw (the coordinate / response dimension). |
theta_anchor_k |
numeric scalar | The reference-anchor value |
tau_cp |
numeric scalar | CP-side threshold passed through to preflight_info_ratio_t. |
tau_ncp |
numeric scalar | NCP-side threshold passed through to preflight_info_ratio_t. |
Mathematics
Let
The W_raw. For each draw
The pair preflight_info_ratio_t with n_chains = 2L.
Returns
A list with elements t_cp and t_ncp (both NA_real_ on every early-exit path, otherwise the structure returned by preflight_info_ratio_t).
Notes
- Early-exit (returns
list(t_cp = NA_real_, t_ncp = NA_real_)) occurs when:-
W_per_k_dim < 1Lord < 1L; -
fit$draws(...)fortheta_ref[k]orsigma_W[1]raises an error or returns zero columns; - any element of
theta_vecorsigma_vecisNA, or anysigma_vec <= 0; -
fit$draws(...)for theW_rawblock raises an error or returns a matrix whose column count differs fromW_per_k_dim * d; - any element of
basis_normis non-finite or non-positive.
-
-
tryCatchwrappers suppress errors from$draws()and coerce them toNULL. -
basis_diffis defensively re-shaped into a matrix ifvapplyreturns a non-matrix (which can occur whenW_per_k_dim == 1L). -
n_chainsis hard-coded to2Lregardless of the actual number of chains infit. - No side effects beyond reading from
fit.
Purpose
Per-component-sigma_a[k] and the flat-packed free-coefficient section a_raw[a_raw_offset_k + 1 : J_a_free_k] from the pre-flight fit, computes the per-draw Euclidean norm of that section, retrieves the divergent-transition indicator, and delegates to preflight_attribution_score.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
Stan fit object | Pre-flight fit exposing $draws(variables, format) and $sampler_diagnostics(format). |
k |
integer scalar | 1-based component index used to select sigma_a[k]. |
J_a_free_k |
integer scalar | Number of free coefficients in the |
a_raw_offset_k |
integer scalar | 0-based offset into the flat a_raw vector at which component |
Mathematics
Let
The score is then
Returns
A numeric scalar — the attribution score — or NA_real_ on any early-exit path.
Notes
- Early-exit (returns
NA_real_) when:-
J_a_free_k < 1L; -
$draws()forsigma_a[k]errors or returns zero columns; - any
sigma_vec <= 0or anysigma_vecisNA; -
$draws()for thea_rawsection errors or returns a column count$\neq J_{a\_free\_k}$ ; -
$sampler_diagnostics()errors, returnsNULL, or lacks a"divergent__"column.
-
- The offset convention is 0-based: the actual Stan indices queried are
a_raw_offset_k + 1, …,a_raw_offset_k + J_a_free_k. -
divergentis coerced to logical viaas.logical. - No side effects.
Purpose
Per-component-sigma_W[1], selects the rows of W_raw belonging to the preflight_attribution_score.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
Stan fit object | Pre-flight fit exposing $draws and $sampler_diagnostics. |
k |
integer scalar | 1-based component index selecting the row-block of W_raw. |
W_per_k_dim |
integer scalar | Number of rows in the W_raw. |
d |
integer scalar | Number of columns of W_raw. |
Mathematics
With
The score is
Returns
A numeric scalar — the attribution score — or NA_real_ on any early-exit path.
Notes
- Early-exit (returns
NA_real_) under the same structural conditions aspreflight_attribution_score_a_per_k_from_fit, mutatis mutandis forsigma_W[1]and theW_rawblock. - The variable names passed to
$draws()are constructed viaouter(block_rows, seq_len(d), ...)producingsprintf("W_raw[%d,%d]", r, c)strings; the resulting vector is flattened byas.vector, so column ordering in the returned matrix follows row-major enumeration of(r, c). -
W_normis computed assqrt(rowSums(W_block_mat^2)), i.e. the Frobenius norm per draw. - No side effects.
Purpose
Maps a vector of parametrization-decision strings to a logical CP-flag vector. Used internally to translate pre-flight decision labels into the boolean cp_* flags consumed by downstream resolution logic.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
character vector | Each element is one of "CP", "NCP", "absent", "per_k". |
Mathematics
The mapping is
Returns
A logical vector of the same length as x (with possible NA values), without names (USE.NAMES = FALSE).
Notes
- Any value not in
$\{\text{"CP"}, \text{"NCP"}, \text{"absent"}, \text{"per\_k"}\}$ triggersgdpar_abortwithclass = "gdpar_internal_error"anddata = list(value = v), carrying the message"Unknown decision value '<v>'.". - The
"per_k"sentinel maps toNA; per the docstring, the resolver is expected to translate this into an unsupported-feature error in Phase H.1 (though the actual translation is performed elsewhere). - Implemented via
vapplywithswitch, ensuring a scalar logical per element.
Purpose
Local NULL-coalescing operator. Provides a fallback value when a is NULL, without depending on rlang or base-R %||% availability across R versions.
Arguments
| Argument | Type | Meaning |
|---|---|---|
a |
any | Primary value; tested for NULL. |
b |
any | Fallback value returned when a is NULL. |
Mathematics
Returns
b if is.null(a) is TRUE; otherwise a.
Notes
- Defined as a backtick-quoted infix function
`%||%`, so it is callable asa %||% bwithin the enclosing namespace. - Only tests for
NULL; does not handle zero-length vectors,NA, orFALSEas "empty". - No side effects.
resolve_parametrization_multi(parametrization, parametrization_a, parametrization_W, parametrization_aggregation, prior, stan_data, amm, preflight_seed, verbose, tau_cp = 5, tau_ncp = 2)
Purpose
Multivariate counterpart of resolve_parametrization(). Combines the user-supplied global parametrization, per-component scalar overrides parametrization_a / parametrization_W, and an optional pre-flight diagnostic to produce resolved CP/NCP flags (both global scalars and per-sigma_W[1] model.
Arguments
| Argument | Type | Meaning |
|---|---|---|
parametrization |
character scalar | One of "auto", "ncp", "cp" (validated by the caller). |
parametrization_a |
NULL or character scalar |
Per-component override for NULL or "ncp" expected. The docstring states "cp" is rejected with gdpar_unsupported_feature_error, but the code itself does not perform this rejection — it merely sets cp_a_user = (parametrization_a == "cp"). |
parametrization_W |
character scalar or NULL
|
Per-component override for NULL, "ncp", or "cp". "cp" is honored. |
parametrization_aggregation |
character scalar or NULL
|
One of "any_ncp", "majority", "per_k"; NULL defaults to "any_ncp". The docstring states "per_k" is rejected with gdpar_unsupported_feature_error, but the code accepts "per_k" and handles it by setting cp_a_resolved = NA. |
prior |
list | Prior specification passed to preflight_parametrization_multi. |
stan_data |
list | Stan data list passed to preflight_parametrization_multi. |
amm |
list | Adaptive model metadata; must contain element p (number of components). |
preflight_seed |
integer or NULL
|
Seed for the pre-flight fit. |
verbose |
logical | Verbosity flag forwarded to the pre-flight. |
tau_cp |
numeric scalar | CP threshold (default 5); forwarded to preflight_parametrization_multi. |
tau_ncp |
numeric scalar | NCP threshold (default 2); forwarded to preflight_parametrization_multi. |
Mathematics
Let
A pre-flight is required iff both the user override and the global setting are indeterminate:
When the pre-flight runs, each resolved flag prefers the user override when non-NA and falls back to the pre-flight result otherwise; per-pf$cp_a_per_k / pf$cp_W_per_k. When no pre-flight is needed, per-
For the aggregation = "per_k" the global cp_a_resolved is set to NA (preserving the per-
For the sigma_W[1], heterogeneous per-length(unique(cp_W_per_k_resolved)) > 1L, a structured gdpar_inform message (class "gdpar_W_per_k_heterogeneous_message") is emitted and cp_W_per_k_resolved is overwritten by
Returns
A list with elements:
| Element | Type | Description |
|---|---|---|
cp_a |
logical scalar (possibly NA) |
Resolved global CP flag for |
cp_W |
logical scalar | Resolved global CP flag for |
cp_a_per_k |
logical vector (length |
Per-component CP flags for |
cp_W_per_k |
logical vector (length |
Per-component CP flags for |
aggregation |
character scalar | The (possibly defaulted) aggregation strategy. |
report |
gdpar_preflight_report or NULL
|
Pre-flight report when the pre-flight ran; NULL otherwise. |
All logical elements are coerced via as.logical before return.
Notes
-
Aggregation validation: If
parametrization_aggregation(after defaulting) is not in{"any_ncp", "majority", "per_k"},gdpar_abortis called withclass = "gdpar_input_error"anddata = list(received = parametrization_aggregation). -
Docstring vs. code discrepancies: The docstring asserts that
parametrization_a = "cp"is rejected withgdpar_unsupported_feature_errorand thataggregation = "per_k"is likewise rejected. The code does neither:"cp"forparametrization_ais silently accepted (settingcp_a_user = TRUE), and"per_k"aggregation is accepted and handled by settingcp_a_resolved = NA. These appear to be forward-looking contract notes for Phase H.2 rather than enforced behavior in this code. -
Pre-flight invocation: When
needs_pf_a || needs_pf_W,preflight_parametrization_multiis called withaggregation = parametrization_aggregationand the suppliedtau_cp/tau_ncp. -
Heterogeneous
$W$ message: Thegdpar_informcall carriesclass = "gdpar_W_per_k_heterogeneous_message"anddata = list(cp_W_per_k = cp_W_per_k_resolved)(the pre-overwrite vector). The message text documents that Bloque 8 may promotesigma_Wtoarray[p]. -
No direct side effects other than the potential
gdpar_informmessage and the pre-flight fit (which itself may have side effects such as file I/O or console output depending onverbose). - The function does not validate
parametrizationitself; this is assumed to have been done by the caller.
Purpose Internal schema validator for the per-dimension decision data frame produced by the multivariate preflight (Path B' applied per coordinate). Called before the data frame is stored inside a gdpar_preflight_report object. Signals a condition of class gdpar_input_error on any violation; returns invisible(TRUE) on success.
Arguments
| Argument | Type | Meaning |
|---|---|---|
df |
data.frame |
The per-coordinate, per-component decision table. Must contain exactly the columns listed in .preflight_per_dim_cols: "component", "dim", "decision", "decision_reason", "n_divergent", "div_pct", "ebfmi_min", "t_attribution", "t_info_cp", "t_info_ncp". |
Mathematics None. Purely structural validation.
Algorithm — validation checks performed in order
-
dfmust be adata.frame; otherwise abort with classgdpar_input_errorand adatafieldlist(received = class(df)). - Every column in
.preflight_per_dim_colsmust be present incolnames(df). The set of missing columns is reported in the abort message and indata$missing. -
df$componentmust be character with every value in$\{\texttt{"a"},\; \texttt{"W"}\}$ . -
df$dimmust be integer, every value$\ge 1$ , and noNA. -
df$decisionmust be character with every value in$\{\texttt{"CP"},\; \texttt{"NCP"},\; \texttt{"absent"}\}$ (the levels stored in.preflight_decision_levels). -
df$decision_reasonmust be character (no further content check). -
df$n_divergentmust be integer. - Each of
df$div_pct,df$ebfmi_min,df$t_attribution,df$t_info_cp,df$t_info_ncpmust be numeric. - No duplicate
(component, dim)pairs are allowed (checked viaduplicated(df[, c("component", "dim")])).
Returns invisible(TRUE) on success. Never returns a visible value; aborts via gdpar_abort() on failure.
Notes
- The function is internal (
@noRd,@keywords internal). It is not exported. - Side effect: raises a condition of class
gdpar_input_error(produced bygdpar_abort) if any check fails. The condition object carries adataattribute with diagnostic details (e.g., the set of missing columns, the received class ofdf). - Column
componentis restricted to the two model components of the HSGP reference-anchored parameterization:$\mathbf{a}$ (intercept) and$\mathbf{W}$ (basis weights). - The function does not check that
dimvalues form a contiguous sequence or that they are consistent across components; it only checks positivity and absence ofNA.
Purpose Internal schema validator for the aggregated (global-level) decision data frame that summarizes per-component CP/NCP decisions across all coordinates. Companion to validate_preflight_per_dim; called before storing the global summary inside a gdpar_preflight_report object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
df |
data.frame |
The per-component global decision table. Must contain exactly the columns in .preflight_global_cols: "component", "global_decision", "agreement", "method". |
Mathematics None. Purely structural validation.
Algorithm — validation checks performed in order
-
dfmust be adata.frame. - Every column in
.preflight_global_colsmust be present. -
df$componentmust be character with every value in$\{\texttt{"a"},\; \texttt{"W"}\}$ . -
df$global_decisionmust be character with every value in$\{\texttt{"CP"},\; \texttt{"NCP"},\; \texttt{"per\_k"},\; \texttt{"absent"}\}$ (.preflight_global_decision_levels). -
df$agreementmust be numeric. Additionally, every finite (non-NA, non-Inf) value must lie in$[0, 1]$ .NAandInfvalues are tolerated. -
df$methodmust be character with every value in$\{\texttt{"any\_ncp"},\; \texttt{"majority"},\; \texttt{"per\_k"}\}$ (.preflight_aggregation_methods). - No duplicate
componententries are allowed.
Returns invisible(TRUE) on success. Aborts with gdpar_input_error on failure.
Notes
- Internal, not exported.
- The
agreementcolumn is allowed to containNA(e.g., when all per-dim decisions are"absent"for a component) and$\pm\infty$ , but finite non-NAvalues are clamped to$[0,1]$ by the validator. - Duplicate
componentis forbidden because each component ($\mathbf{a}$ ,$\mathbf{W}$ ) may appear at most once in the global summary.
Purpose Aggregates the vector of per-dimension decisions for a single model component (
Arguments
| Argument | Type | Meaning |
|---|---|---|
decisions |
character vector |
Per-dimension decision levels for one component. Each element is "CP", "NCP", or "absent". |
method |
character scalar |
Aggregation method, one of "any_ncp", "majority", "per_k". |
Mathematics
Let
Strategy "any_ncp" (conservative):
Strategy "majority":
Let
Ties break conservatively toward NCP.
Strategy "per_k":
whenever "per_k" signals downstream codegen (Phase H.2) to consume the per-coordinate decision vector directly in the Stan template rather than a single aggregated decision.
Absence rule (all methods): If "absent"), the function returns "absent" immediately, before dispatching on method.
Returns A length-1 character scalar: "CP", "NCP", "per_k", or "absent".
Notes
- Internal, not exported.
- Aborts with
gdpar_input_errorifmethodis not one of the three recognized strings (the default branch ofswitch). - The
"absent"level is filtered out before aggregation. This handles the case where a component has no informative data (e.g., zero divergences and no effective diagnostic thresholds reached). - The
"majority"tie-breaking toward NCP is intentional and documented as conservative: NCP geometry is monotonically safer in the worst case.
Purpose Computes the agreement score for a single component — the fraction of effective per-dim decisions that match the aggregated global decision. For the "per_k" strategy (where there is no single global decision), it instead reports the frequency of the modal per-dim decision as a measure of uniformity.
Arguments
| Argument | Type | Meaning |
|---|---|---|
decisions |
character vector |
Per-dimension decision levels for one component (may include "absent"). |
global_decision |
character scalar |
The aggregated global decision for this component (from aggregate_preflight_one_component). |
Mathematics
Let
Case 1:
Case 2:
where
Case 3:
Case 4: Otherwise (global decision is "CP" or "NCP"):
i.e., the proportion of effective decisions equal to the global decision.
Returns A length-1 numeric scalar (or NA_real_ when no effective decisions exist or the global decision is "absent").
Notes
- Internal, not exported.
- For
"per_k"the agreement is always$\ge 0.5$ when there are effective decisions (since the mode is the majority or at least a tie, andmaxpicks the larger count; with ties it equals exactly 0.5). - For
"any_ncp"or"majority"strategies, agreement$\in [0,1]$ and equals 1.0 when all effective decisions agree with the global decision. - The function uses
identical()(not==) to compareglobal_decisionagainst"per_k"and"absent", which is safe for length-1 character comparisons and handlesNULLgracefully.
Purpose Constructs the global (aggregated) decision data frame from the per-dim decision data frame and the chosen aggregation method. Iterates over all unique components present in per_dim, applies aggregate_preflight_one_component and compute_preflight_agreement for each, and returns a data frame suitable for storage in the $global slot of a gdpar_preflight_report object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
per_dim |
data.frame |
The validated per-coordinate, per-component decision table (must conform to the schema enforced by validate_preflight_per_dim). Must contain at least columns "component" and "decision". |
method |
character scalar |
Aggregation method, one of "any_ncp", "majority", "per_k". |
Mathematics None beyond what is delegated to aggregate_preflight_one_component and compute_preflight_agreement (see those subsections).
Algorithm
- Validate that
methodis in.preflight_aggregation_methods; abort withgdpar_input_errorotherwise. - Extract the unique components from
per_dim$component(preserving encounter order, which is canonicallyc("a", "W")when both are present). - Allocate a
data.framewithlength(comps)rows and columnscomponent(character),global_decision(character),agreement(numeric),method(character, recycled from the scalarmethod). - For each component
$i$ :- Extract the
decisioncolumn for rows whereper_dim$component == comps[i]. - Compute
global_decision[i]viaaggregate_preflight_one_component(rows, method). - Compute
agreement[i]viacompute_preflight_agreement(rows, global_decision[i]).
- Extract the
- Return the assembled
globaldata frame.
Returns A data.frame with columns component, global_decision, agreement, method and one row per unique component in per_dim, in the same order as encountered in per_dim.
Notes
- Internal, not exported.
- The returned data frame conforms to the schema enforced by
validate_preflight_global(can be validated downstream). - Component ordering follows
unique(per_dim$component), which preserves first-encounter order. The canonical order isc("a", "W")when both components are present, but the function does not enforce this — it relies on the input ordering. - The
methodcolumn is filled uniformly for all rows, reflecting that a single aggregation strategy is applied across all components within one report. - The function does not validate
per_dimitself; callers (the constructor in section 2) are expected to callvalidate_preflight_per_dimfirst.
Purpose Internal constructor (not exported) that creates and validates a gdpar_preflight_report S3 object. It assembles a structured list containing per-dimension data, a globally aggregated summary, the aggregation method, and user-supplied settings.
Arguments
| Argument | Type | Meaning |
|---|---|---|
per_dim |
data frame | A per-(component, dimension) data frame produced elsewhere (e.g. by build_preflight_per_dim). Its structure is validated by validate_preflight_per_dim. |
method |
character scalar (default "any_ncp") |
The aggregation method used to collapse per-dimension decisions into a single per-component global decision. Must be one of the values stored in the internal vector .preflight_aggregation_methods. |
settings |
list (default list()) |
Arbitrary named settings that accompanied the preflight analysis (e.g. thresholds, tuning parameters). Stored verbatim for later retrieval. |
Returns An S3 object of class "gdpar_preflight_report" (a named list) with four elements: per_dim, global, method, settings.
Notes
- Input validation is strict:
methodmust be a single character string belonging to.preflight_aggregation_methods; violations triggergdpar_abortwith class"gdpar_input_error"and adatalist containing the received value. -
settingsmust be a list; violations also raise"gdpar_input_error". -
per_dimis validated byvalidate_preflight_per_dim(not shown in this section). - After validation,
build_preflight_globalconstructs the aggregated per-component data frame, which is then itself validated byvalidate_preflight_global. - No mathematics; purely structural assembly and validation.
Purpose Exported accessor that extracts the per-dimension data frame from a gdpar_preflight_report object, enabling downstream dplyr/ggplot2 analysis.
Arguments
| Argument | Type | Meaning |
|---|---|---|
report |
gdpar_preflight_report |
A validated preflight report object. |
Returns The per_dim component: a data frame with one row per (component, dim) pair.
Notes
- Calls
assert_inheritsto confirmreportis of class"gdpar_preflight_report". If not, an assertion error is raised.
Purpose Exported accessor that extracts the per-component aggregated summary from a gdpar_preflight_report object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
report |
gdpar_preflight_report |
A validated preflight report object. |
Returns The global component: a data frame with one row per component, containing the aggregated global decision, the agreement share, and related metadata.
Notes
- Calls
assert_inheritsto confirm class membership. Assertion failure raises an error.
Purpose S3 print method for gdpar_preflight_report. Produces a human-readable console summary with selectable verbosity.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_preflight_report |
The report to print. |
level |
character scalar | One of "global" (default), "dim", or "both". Controls which sections are printed. |
... |
Unused; present for S3 generic signature compliance. |
Returns Invisibly returns x (the original report object).
Notes
- The header block always prints the number of unique dimensions (
p), the aggregation method, and the list of component names. - When
levelis"global"or"both", delegates toprint_preflight_global_block(x$global). - When
levelis"dim"or"both", delegates toprint_preflight_per_dim_block(x$per_dim). - When
levelis"global", a hint is printed instructing the user to callprint(x, level = "dim")oras.data.frame(x)for more detail. -
match.argis used forlevel, so partial matching is allowed.
Purpose Internal (not exported) utility that formats a named list of equal-length vectors into a fixed-width, left-padded table of character strings suitable for console output.
Arguments
| Argument | Type | Meaning |
|---|---|---|
cols |
named list of vectors | Each element is a column; all must have the same length. Column names become headers. Numeric values are formatted via formatC(..., format = "g", digits = 4). NA values become the string "NA". |
indent |
character scalar (default " ", four spaces) |
Prefix prepended to every output line. |
Returns A character vector whose first element is the header line and subsequent elements are the body rows, each prefixed by indent. If cols is empty or has zero rows, returns character().
Notes
- This is a pure formatting helper; it does no mathematical computation.
- Numeric columns are formatted with
formatC(general format, 4 significant digits); all other types are coerced withas.character. - Column widths are computed as
max(nchar(header), max(nchar(body)))and used for right-padding (left-alignment viaflag = "-").
Purpose Internal (not exported) helper that pretty-prints the global (per-component aggregated) section of a preflight report to the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
global |
data frame | The global element of a gdpar_preflight_report, containing columns component, global_decision, agreement, and (implicitly derived) a flag column. |
Returns NULL (invisibly). Side effect is console output.
Notes
- If
globalhas zero rows, prints"(no components)"and returns early. - A
flagcolumn is computed inline using thresholding onglobal$agreement:-
NAor non-finite →""(empty string) -
$\text{agreement} \geq 1 - 10^{-12}$ →"uniform" -
$\text{agreement} \leq 0.5 + 10^{-12}$ →"split" - Otherwise →
"mixed"
-
- These thresholds treat near-unanimity and near-even-split as the boundary cases, with a tiny tolerance (
$10^{-12}$ ) for floating-point imprecision. - The
agreementcolumn is formatted to 2 decimal places (formatC(..., format = "f", digits = 2)). - Delegates actual table formatting to
format_preflight_table.
Purpose Internal (not exported) helper that pretty-prints the per-dimension section of a preflight report to the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
per_dim |
data frame | The per_dim element of a gdpar_preflight_report, with columns including component, dim, decision, decision_reason, t_info_cp, t_info_ncp. |
Returns NULL (invisibly). Side effect is console output.
Notes
- If
per_dimhas zero rows, prints"(no per-dim rows)"and returns early. -
t_info_cpandt_info_ncpare formatted withformatC(..., format = "g", digits = 3)(3 significant digits). - Delegates table formatting to
format_preflight_table.
Purpose S3 summary method for gdpar_preflight_report. Returns a structured list of summary statistics about the report.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_preflight_report |
The report to summarize. |
... |
Unused; present for S3 generic signature compliance. |
Returns A named list with elements:
| Element | Type | Meaning |
|---|---|---|
n_components |
integer | Number of rows in object$global (i.e. number of components). |
n_dims |
integer | Number of unique dimension values in object$per_dim$dim. |
method |
character | The aggregation method stored in the report. |
per_component |
data frame | The full object$global data frame. |
overall_agreement |
numeric | Mean of the finite (non-NA, finite) values in object$global$agreement. If no finite values exist, NA_real_. |
per_dim_counts |
named integer vector | Counts of CP, NCP, and absent decisions across all (component, dim) rows. |
settings |
list | The stored settings list. |
Notes
-
assert_inheritsenforces class membership. - The
overall_agreementis computed as$\bar{a} = \frac{1}{|\{i : a_i \text{ finite}\}|}\sum_{\substack{i \\ a_i \text{ finite}}} a_i$ where$a_i$ is the agreement share for component$i$ . If the set of finite agreements is empty, the result isNA_real_.
Purpose S3 as.data.frame method that returns the tidy per-dimension data frame from a preflight report, suitable for programmatic subsetting and analysis.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_preflight_report |
The report to convert. |
row.names |
NULL or character vector |
If non-NULL, applied as the row names of the returned data frame. |
optional |
logical | Ignored; present for S3 generic compatibility with as.data.frame. |
... |
Unused; present for S3 generic signature compliance. |
Returns The x$per_dim data frame (with one row per (component, dim) pair), optionally with custom row names.
Notes
-
assert_inheritsenforces class membership. - The
optionalargument is accepted but never consulted, consistent with the baseas.data.framecontract.
Purpose S3 format method that produces a compact one-line character representation of the report, suitable for embedding in other strings or contexts where a full print is undesirable.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_preflight_report |
The report to format. |
... |
Unused; present for S3 generic signature compliance. |
Returns A character scalar of the form:
<gdpar_preflight_report: p={n_dims}, aggregation={method}, components={comma-separated names}>
where n_dims is the count of unique dimensions, method is the aggregation method string, and the component names are the unique values from x$per_dim$component joined by commas (no spaces).
Notes
- No class assertion is performed here (unlike the other methods), relying on correct S3 dispatch.
- No trailing newline is included in the returned string.
preflight_parametrization(prior, stan_data, amm, preflight_seed = NULL, verbose = FALSE, tau_cp = 5, tau_ncp = 2)
Purpose
Central pre-flight diagnostic routine that decides, per model component, whether the long fit should use the centered parametrization (CP) or the non-centered parametrization (NCP). It compiles a short NCP version of the Path 1 Stan model, draws a fixed-length sample, and applies three sequential filters—divergence attribution, E-BFMI, and info-ratio (Path B′)—to produce Boolean CP flags for the additive component a and the modulating component W.
Arguments
| Argument | Type | Meaning |
|---|---|---|
prior |
gdpar_prior (S3) |
Prior specification object; validated via assert_inherits. |
stan_data |
list | Stan data list assembled by assemble_stan_data; must contain J_a, dim_W, d, and theta_anchor. |
amm |
amm_spec (S3) |
AMM specification object; amm$a and amm$W may be NULL if the component is absent. |
preflight_seed |
integer or NULL
|
Seed forwarded to the pre-flight sampler; if NULL, no seed is set. |
verbose |
logical | If FALSE, pre-flight stderr is sunk to a tempfile and discarded. |
tau_cp |
numeric scalar (default 5) | Upper null hypothesis value for the info-ratio z-test (filter 3). |
tau_ncp |
numeric scalar (default 2) | Lower null hypothesis value for the info-ratio z-test (filter 3). |
Mathematics
The routine evaluates three filters in strict priority order. Let
Filter 1 — Divergence attribution (applied only when
where
where
If
Filter 2 — E-BFMI. If NA) is forced to CP. This pathology is not attributable per-component.
Filter 3 — Info ratio (Path B′). For each undecided component, a per-draw effective coefficient
averaged across coordinates to yield
-
Upper test (toward CP):
$z_{\text{cp}} = (\bar{\ell} - \log\tau_{\text{cp}})\,/\,\widehat{\mathrm{SE}}$ . Reject toward CP if$z_{\text{cp}} > z_{1-\alpha}$ . -
Lower test (toward NCP):
$z_{\text{ncp}} = (\bar{\ell} - \log\tau_{\text{ncp}})\,/\,\widehat{\mathrm{SE}}$ . Reject toward NCP if$z_{\text{ncp}} < -z_{1-\alpha}$ .
Decision logic for filter 3:
| Condition | Decision | Reason string |
|---|---|---|
NA
|
NCP (FALSE) |
filter_info_undefined_ncp |
CP (TRUE) |
filter_info_high |
|
NCP (FALSE) |
filter_info_low |
|
| otherwise | NCP (FALSE) |
filter_info_ambiguous_ncp |
Returns
A list with four elements:
| Element | Type | Description |
|---|---|---|
cp_a |
logical | Whether component a should use CP. |
cp_W |
logical | Whether component W should use CP. |
cs_model_ncp |
compiled cmdstan model object or NULL
|
The compiled NCP model, suitable for reuse if both decisions are NCP; NULL when neither component is needed. |
meta |
list | Diagnostic statistics and textual reasons (see below). |
The meta list contains:
-
used_preflight(logical): whether the pre-flight sample was actually run. -
n_divergent(integer): total divergent transitions. -
div_pct(numeric):n_divergent / 400. -
ebfmi_min(numeric): minimum E-BFMI across chains. -
t_attribution_a,t_attribution_W(numeric): filter 1 t-statistics. -
t_info_cp_a,t_info_ncp_a,t_info_cp_W,t_info_ncp_W(numeric): filter 3 z-statistics. -
decision_reason_a,decision_reason_W(character): textual reason per component;"absent_or_degenerate"if the component is not needed;""initialized when needed but not yet decided.
Notes
-
Early exit: If
needs_aandneeds_Ware bothFALSE(i.e., neither component has a non-degenerate dimension), the function returns immediately withcp_a = FALSE,cp_W = FALSE,cs_model_ncp = NULL, andmeta$used_preflight = FALSE. The reason fields are set to"absent_or_degenerate". -
Component degeneracy:
needs_arequireshas_a && J_a_free >= 1LwhereJ_a_free = J_aifhas_a && J_a >= 1L, else0L.needs_Wrequireshas_W && dim_W >= 1L && d >= 1L. The coordinate count forWisn_coords_W = dim_W * d. -
Fixed pre-flight settings:
adapt_delta = 0.95,max_treedepth = 10, 200 warmup + 200 sampling, 2 chains. These are independent of the user's long-fit settings. -
Stan code generation: Calls
generate_stan_code(prior, cp_a = FALSE, cp_W = FALSE)to produce the NCP model source, thenwrite_stan_to_tempfileandcmdstanr::cmdstan_modelfor compilation. -
Filter priority: Filter 1 is applied first (only if
$n_{\text{div}} \geq 1$ ); filter 2 next; filter 3 last. A component decided by an earlier filter is not re-evaluated. -
Fallback: If a component remains
NAafter all filters (e.g.,needs_aisTRUEbutn_div = 0and E-BFMI ≥ 0.3 and the info-ratio call was somehow skipped), it defaults toFALSE(NCP). -
Errors raised:
assert_inheritswill error ifprioris not agdpar_priororammis not anamm_spec. -
Side effects: Writes a Stan model to a tempfile, compiles it via
cmdstanr, and runs MCMC sampling. Ifverbose = FALSE, stderr is temporarily redirected.
Purpose
Wrapper around cs_model$sample() that enforces the fixed pre-flight MCMC settings and optionally suppresses stderr output by redirecting the message stream to a temporary log file.
Arguments
| Argument | Type | Meaning |
|---|---|---|
cs_model |
compiled cmdstan model object | The model to sample from (produced by cmdstanr::cmdstan_model). |
stan_data |
list | Stan data list passed as data. |
seed |
integer or NULL
|
Random seed; if NULL, no seed argument is passed. |
verbose |
logical | If TRUE, show_messages and show_exceptions are enabled and stderr is not redirected. |
Mathematics
The sampler is invoked with fixed settings:
Returns
The return value of cs_model$sample(...), which is a CmdStanMCMC object (as produced by cmdstanr).
Notes
-
Stderr suppression: When
verbose = FALSE, a temporary file with extension.logis created, a file connection is opened in write-text mode ("wt"), andsink(msg_con, type = "message")is called. Anon.exithandler (withafter = FALSE) guarantees restoration: it checkssink.number(type = "message") > 0Lbefore callingsink(NULL, type = "message"), closes the connection if open, and unlinks the temp file. This ensures cleanup even ifsample()throws an error. -
Seed handling: If
seedis notNULL, it is coerced to integer viaas.integer(seed)and added to the args list. -
Refresh: Always set to
0L(no progress output). -
No side effects beyond sampling: The function does not modify global state; the only side effect is the temporary file creation/deletion when
verbose = FALSE.
Purpose
Extracts the inputs needed for the divergence-attribution t-statistic (filter 1) from a pre-flight CmdStanMCMC fit object: draws of the component's scale parameter, draws of the raw (non-centered) component, and the divergent-transition flag. Delegates the actual t-statistic computation to preflight_attribution_score().
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
CmdStanMCMC |
The pre-flight fit object from run_preflight_sample. |
component |
character | Either "a" or "W"; selects which component's variables to extract. |
Mathematics
The function maps the component name to Stan variable names and computes:
-
Log-scale vector:
$\log\sigma_X$ from the first column of thesigma_Xdraws matrix. -
Raw norm: For each transition
$t$ ,
where
-
Divergent flag: Boolean vector from the
divergent__sampler diagnostic column.
These three vectors are passed to preflight_attribution_score(log(sigma_vec), raw_norm, divergent).
Returns
A numeric scalar: the attribution t-statistic computed by preflight_attribution_score(), or NA_real_ if the statistic cannot be computed.
Notes
-
Variable mapping:
"a"→sigma_a,a_raw;"W"→sigma_W,W_raw. Any othercomponentvalue returnsNA_real_immediately. -
Defensive extraction: Each
fit$draws(...)andfit$sampler_diagnostics(...)call is wrapped intryCatchwith an error handler returningNULL. If any extraction returnsNULLor has zero columns, the function returnsNA_real_. -
Validity checks on sigma: If any element of
sigma_vecis$\leq 0$ orNA, the function returnsNA_real_(sincelog(sigma)would be undefined or non-finite). -
Diagnostic column check: If the sampler diagnostics matrix does not contain a column named
"divergent__", the function returnsNA_real_. -
Only first sigma column used:
sigma_vec <- as.numeric(draws_sigma[, 1])— even ifsigma_Xis vector-valued in Stan, only the first element is used for the attribution score. -
Raw norm: Computed as
sqrt(rowSums(raw_mat^2))over the full raw draws matrix, yielding one scalar norm per transition. -
No side effects: Pure extraction and delegation; no modification of
fitor global state.
← Part IV — Exhaustive Function Reference (5/7) · gdpar Wiki Home · Part IV — Exhaustive Function Reference (7/7) →
- Part I — Conceptual Framework
- Part II — Mathematical Foundations
- Part III — Computational Architecture
- Part IV — Exhaustive Function Reference (1/7)
- Part IV — Exhaustive Function Reference (2/7)
- Part IV — Exhaustive Function Reference (3/7)
- Part IV — Exhaustive Function Reference (4/7)
- Part IV — Exhaustive Function Reference (5/7)
- Part IV — Exhaustive Function Reference (6/7)
- Part IV — Exhaustive Function Reference (7/7)
- Part V — Stan Templates (1/3)
- Part V — Stan Templates (2/3)
- Part V — Stan Templates (3/3)
- Part VI — Data, Benchmarks, Tests & References