Part IV Function Reference 4
Purpose
Internal helper that deduplicates consecutive message blocks from captured stderr lines. It is used by gdpar() when verbose = FALSE to silence the duplicate divergence warnings emitted by cmdstanr during sampling. The function splits the captured lines into blocks delimited by empty lines, retains only the first occurrence of each unique block (compared by full text), and returns a flattened character vector suitable for re-emission via message().
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `lines` | Character vector | Captured stderr lines from cmdstanr sampling. |
Mathematics
The algorithm operates in two passes:
- Block segmentation. Given the input vector $\mathbf{l} = (l_1, l_2, \dots, l_n)$, consecutive non-empty strings are accumulated into a block $B_i$. An empty string acts as a block delimiter: when encountered and the current accumulator is non-empty, the accumulator is closed as a completed block. Leading empty lines (before any non-empty content) and consecutive empty lines between blocks are silently consumed because the accumulator is empty at those points. Trailing empty lines after the last non-empty content do not produce a block.
- Deduplication. For each block $B_i$ in order of first appearance, a key $k_i = \text{paste}(B_i, \text{collapse} = \text{"\textbackslash n"})$ is computed. If $k_i \notin S$, where $S$ is the set of previously seen keys, the block's lines (followed by an empty-string separator) are appended to the output and $k_i$ is added to $S$. Otherwise the block is dropped.
A final trimming step removes the trailing empty-string separator from the output if present.
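The two passes and the final trim can be sketched in base R as follows. This is a minimal reconstruction from the description above, not the package source; the name `dedup_blocks_sketch` is hypothetical.

```r
# Hypothetical re-implementation of the described algorithm (not the package source).
dedup_blocks_sketch <- function(lines) {
  if (length(lines) == 0L) return(character(0))

  # Pass 1: segment into blocks delimited by empty strings.
  blocks <- list()
  current <- character(0)
  for (ln in lines) {
    if (identical(ln, "")) {
      if (length(current) > 0L) {
        blocks[[length(blocks) + 1L]] <- current
        current <- character(0)
      }
    } else {
      current <- c(current, ln)
    }
  }
  # Close a block that is not followed by a delimiter.
  if (length(current) > 0L) blocks[[length(blocks) + 1L]] <- current

  # Pass 2: keep the first occurrence of each block, keyed by its full text.
  seen <- character(0)
  out <- character(0)
  for (b in blocks) {
    key <- paste(b, collapse = "\n")
    if (!(key %in% seen)) {
      out <- c(out, b, "")
      seen <- c(seen, key)
    }
  }

  # Final trim: drop the trailing separator, if any.
  if (length(out) > 0L && identical(out[length(out)], "")) {
    out <- out[-length(out)]
  }
  out
}

deduped <- dedup_blocks_sketch(c("", "a", "b", "", "", "c", "", "a", "b", ""))
```

Here the second `a`/`b` block is dropped because its key matches the first, and the leading and doubled empty lines produce no blocks.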
Returns
A character vector. If length(lines) == 0L, returns character(0). Otherwise returns the deduplicated blocks flattened into a single character vector, with blocks separated by empty strings (""), and the trailing separator removed. Ready for line-by-line re-emission via message().
Notes
- Marked `@keywords internal` and `@noRd`; not exported.
- The comparison key uses `paste(b, collapse = "\n")`, so blocks are compared by their full multi-line text content, not line-by-line.
- The function does not modify its input; it builds new vectors via concatenation.
- No side effects; no errors raised on any input (including `NULL`, for which `length(NULL) == 0L` is `TRUE`, so `character(0)` is returned).
gdpar(formula, family = gdpar_family("gaussian"), amm = amm_spec(), W = NULL, data, prior = NULL, path = c("bayes", "vcm", "hyper"), anchor = "prior_mean", skip_id_check = FALSE, chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L, adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L, verbose = TRUE, seed = NULL, group = NULL, parametrization = c("auto", "ncp", "cp"), parametrization_a = NULL, parametrization_W = NULL, parametrization_aggregation = NULL, id_check_rigor = c("full", "fast"), ...)
Purpose
Main exported entry point of the gdpar package. Fits the AMM (Additive–Multiplicative Modulation) canonical decomposition of the individual parameter via Path 1 (hierarchical Bayesian inference through Stan/cmdstanr). The function orchestrates input validation, covariate standardization, basis-restricted identifiability diagnostics, Stan model code generation, compilation, HMC/NUTS sampling, and convergence diagnostic collection. It also dispatches to internal multivariate (.gdpar_multi), K-individual (.gdpar_K), and placeholder paths (Path 2/Path 3) based on the input configuration.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `formula` | Formula or `gdpar_formula_set` | Two-sided formula (`y ~ ...`) or a `gdpar_formula_set` object. In the legacy two-sided form, the LHS is the outcome variable; the RHS either lists covariates entering the modulating component as the linear factor (… `amm`), or contains AMM wrapper calls `a(...)`, `b(...)`, `W()`. A `gdpar_formula_set` is the multi-parameter form with one slot per K-individual parameter. |
| `family` | `gdpar_family`, `gdpar_family_multi`, or named list of `gdpar_family` | Response distribution. Defaults to `gdpar_family("gaussian")`. A named list of `gdpar_family` objects triggers the heterogeneous-families-per-slot path (sub-phase 8.3.7), requiring … |
| `amm` | `amm_spec` or named list of `amm_spec` | AMM specification. Defaults to `amm_spec()` (Level 0 degenerate). A named list triggers the K-individual low-level path. Must remain at default when `formula` is a `gdpar_formula_set` or contains AMM wrapper calls in the RHS. |
| `W` | `W_basis` or `NULL` | Modulating basis for K-individual paths. `NULL` by default. Not used in the legacy single-`amm_spec` path (where the basis travels via `amm$W`); supplying a non-`NULL` `W` there raises an error. |
| `data` | Data frame | Data containing variables referenced by `formula` and `amm`. |
| `prior` | `gdpar_prior` or `NULL` | Prior specification. Defaults to `gdpar_prior()` when `NULL`. |
| `path` | Character scalar | Estimation path: `"bayes"` (Path 1, default), `"vcm"` (Path 2), `"hyper"` (Path 3). Only `"bayes"` is implemented; the other two raise `gdpar_unsupported_feature_error`. |
| `anchor` | Numeric scalar or character | Anchor value: a numeric scalar, `"prior_mean"` (default), or `"empirical_y"` (linkfun applied to the outcome mean). |
| `skip_id_check` | Logical scalar | If `TRUE`, skip the Gram-matrix identifiability diagnostic. Default `FALSE`. |
| `chains` | Integer scalar | Number of HMC chains. Default `4L`. |
| `iter_warmup` | Integer scalar | Warmup iterations per chain. Default `1000L`. |
| `iter_sampling` | Integer scalar | Sampling iterations per chain. Default `1000L`. |
| `adapt_delta` | Numeric scalar | Target acceptance probability for NUTS. Default `0.95`. Validated to be in $[0.5, 0.999]$. |
| `max_treedepth` | Integer scalar | Maximum NUTS tree depth. Default `12L`. |
| `refresh` | Integer scalar | Progress-reporting frequency passed to cmdstanr. Default `100L`. Must be non-negative. |
| `verbose` | Logical scalar | Verbosity of package-level informational messages. Default `TRUE`. Also controls `show_messages` and `show_exceptions` in cmdstanr. |
| `seed` | Integer scalar or `NULL` | Random seed for cmdstanr. Default `NULL`. Also forwarded to the pre-flight diagnostic when `parametrization = "auto"`. |
| `group` | One-sided formula or `NULL` | Grouping variable for per-group hierarchical estimation of `theta_ref`. Default `NULL` (single global `theta_ref`). Only single-variable one-sided formulas are accepted (e.g., `~ species`). |
| `parametrization` | Character scalar | CP/NCP parametrization selection: `"auto"` (default, runs pre-flight), `"ncp"` (force non-centered), `"cp"` (force centered). |
| `parametrization_a` | Character scalar or `NULL` | Override for component `a`: `"ncp"` or `"cp"`. `NULL` inherits from `parametrization`. |
| `parametrization_W` | Character scalar or `NULL` | Override for component `W`: `"ncp"` or `"cp"`. `NULL` inherits from `parametrization`. If both `parametrization_a` and `parametrization_W` are explicit, pre-flight is skipped. |
| `parametrization_aggregation` | Character scalar or `NULL` | Aggregation rule for multivariate per-coordinate CP/NCP decisions: `"any_ncp"`, `"majority"`, or `"per_k"`. Only forwarded to `.gdpar_multi`; not used in the scalar or K-individual paths. |
| `id_check_rigor` | Character scalar | Identifiability check strictness: `"full"` (default) or `"fast"`. Forwarded to `.gdpar_K` and `.gdpar_multi`; not directly used in the scalar path. |
| `...` | | Additional named arguments forwarded to the underlying cmdstanr sampler (via `sample_args`), and also forwarded to `.gdpar_K` / `.gdpar_multi`. |
Mathematics
AMM canonical decomposition. The individual parameter
where
Anchor parametrization. The anchor value
Changing the anchor changes the parametrization but not the data-generating model.
Group-level random intercept. When group is supplied,
Identifiability diagnostic test point. The diagnostic test point
This avoids a degenerate zero test point when the anchor is numerically zero and a multiplicative component
Covariate standardization. Covariates entering the additive basis
Parametrization pre-flight (Path B'). When parametrization = "auto", a short pre-flight NCP fit is run and per-component CP/NCP decisions are made via three sequential filters:
- A divergence-attribution $t$-test.
- An E-BFMI threshold check.
- A chain-aware block-bootstrap $z$-test on the posterior-to-prior contraction of the effective coefficient.
Returns
An object of class c("gdpar_fit", "list") with components:
| Component | Type | Description |
|---|---|---|
| `fit` | `CmdStanFit` object | The underlying cmdstanr fit object. |
| `amm` | `amm_spec` | The (possibly materialized) AMM specification used. |
| `family` | `gdpar_family` / `gdpar_family_multi` | The (possibly promoted) family object. |
| `prior` | `gdpar_prior` | The prior specification. |
| `design` | List | The AMM design matrices and metadata from `build_amm_design()`. |
| `anchor` | Numeric scalar | The resolved anchor value. |
| `stan_data` | List | The data list passed to Stan. |
| `identifiability_report` | List or `NULL` | Report from `gdpar_check_identifiability()`, or `NULL` if `skip_id_check = TRUE`. |
| `diagnostics` | List | Convergence diagnostics from `compute_diagnostics()`. |
| `parametrization` | List | Contains `cp_a` (logical), `cp_W` (logical), and `meta` (pre-flight diagnostic statistics when applicable). |
| `group_info` | List or `NULL` | Group information from `.resolve_group_argument()`. |
| `call` | `call` | The matched call. |
| `path` | Character scalar | The resolved estimation path (`"bayes"`). |
Notes
Dispatch logic. The function has three major dispatch branches before reaching the scalar Stan pipeline:
- K-individual paths: triggered when `formula` is a `gdpar_formula_set`, when `amm` is a named list of `amm_spec` objects, or when `formula` is a classic two-sided formula whose RHS contains AMM wrapper calls (`a()`/`b()`/`W()`, detected by `.gdpar_rhs_has_amm_calls()`). In the formula-set and AMM-wrapper sub-paths, `amm` must be at its default `amm_spec()`. The formula set / AMM wrappers are converted to `amm_list_canonical` via `.gdpar_formula_set_to_amm_spec_list()`. If $K > 1$, the function delegates to `.gdpar_K()`. If $K = 1$, the single `amm_spec` is extracted, the formula is reconstructed from `union(all.vars(amm$a), all.vars(amm$b))`, and execution falls through to the scalar path.
- Multivariate path: triggered when `amm$p > 1L` (and `amm` is a single `amm_spec`). Delegates to `.gdpar_multi()`.
- Scalar univariate path: the main body, for `amm$p` being `NULL` or `1L`.
Heterogeneous family detection. A named list is recognized as a heterogeneous family specification when all of the following hold: it satisfies `is.list()`, does not inherit from `gdpar_family` or `gdpar_family_multi`, has non-`NULL` names, all names are non-empty (`nzchar`), no names are duplicated, and every element inherits from `gdpar_family`. If so, the list is resolved via `.gdpar_resolve_heterogeneous_family_K()` (producing a `location_family` and a `family_id_k_vector`); otherwise `.gdpar_promote_scope_per_observation()` is used.
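The recognition rule can be written as a single predicate. The sketch below is an illustration only: `is_hetero_family_list_sketch` and the stand-in constructor `fam()` are hypothetical names, and real family objects would come from `gdpar_family()`.

```r
# Hypothetical predicate mirroring the conditions listed above.
is_hetero_family_list_sketch <- function(x) {
  is.list(x) &&
    !inherits(x, c("gdpar_family", "gdpar_family_multi")) &&
    !is.null(names(x)) &&
    all(nzchar(names(x))) &&
    !anyDuplicated(names(x)) &&
    all(vapply(x, inherits, logical(1), what = "gdpar_family"))
}

# Stand-in objects carrying only the class attribute (assumption for the demo).
fam <- function(name) structure(list(name = name), class = "gdpar_family")

named_ok   <- is_hetero_family_list_sketch(list(mu = fam("gaussian"), sigma = fam("gamma")))
unnamed    <- is_hetero_family_list_sketch(list(fam("gaussian"), fam("gamma")))
duplicated <- is_hetero_family_list_sketch(list(mu = fam("gaussian"), mu = fam("gamma")))
single_fam <- is_hetero_family_list_sketch(fam("gaussian"))
```

Only the first call satisfies every condition; the others fail on missing names, duplicated names, and the class exclusion respectively.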
Errors raised (via gdpar_abort):
| Condition | Error class |
|---|---|
| `path = "vcm"` | `gdpar_unsupported_feature_error` |
| `path = "hyper"` | `gdpar_unsupported_feature_error` |
| `formula` is a `gdpar_formula_set` but `amm` is not at default | `gdpar_input_error` |
| Named list `amm` with empty slot name | `gdpar_input_error` |
| Named list `amm` entry not inheriting `amm_spec` | `gdpar_input_error` (with `data` containing `slot` and `received`) |
| Named list `amm` with duplicated names | `gdpar_input_error` |
| Named list `amm` but `formula` not a two-sided formula | `gdpar_input_error` |
| Classic formula with AMM calls but `amm` not at default | `gdpar_input_error` |
| `W` supplied in the legacy single-`amm_spec` path | `gdpar_input_error` |
| `formula` not a two-sided formula | `gdpar_input_error` |
| `gdpar_family_multi` with `amm$p` `NULL` or 1 | `gdpar_input_error` (with `data` containing `family_class` and `amm_p`) |
| Heterogeneous family with … | `gdpar_input_error` (with `data` containing `K`) |
| `refresh` not a non-negative numeric scalar | `gdpar_input_error` |
| `verbose` not a logical scalar | `gdpar_input_error` |
| Outcome variable not in `data` | `gdpar_input_error` |
| Outcome contains non-finite values (`NA`, `NaN`, `Inf`) | `gdpar_input_error` |
| Identifiability check fails | `gdpar_identifiability_error` (with `data` containing `report`) |
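Because every error carries a class, callers can handle them selectively with `tryCatch()`. The sketch below illustrates the pattern with base R's `errorCondition()`; the internals of `gdpar_abort` are not shown in this section, so `abort_sketch` and `check_outcome_sketch` are hypothetical stand-ins rather than the package's implementation.

```r
# Hypothetical stand-in for gdpar_abort(): signals a classed error with a data payload.
abort_sketch <- function(message, class, data = list()) {
  stop(errorCondition(message, data = data, class = class, call = NULL))
}

# Hypothetical outcome checks matching two rows of the table above.
check_outcome_sketch <- function(data, outcome_name) {
  if (!outcome_name %in% names(data)) {
    abort_sketch("Outcome variable not found in `data`.", "gdpar_input_error",
                 data = list(outcome = outcome_name))
  }
  if (any(!is.finite(data[[outcome_name]]))) {
    abort_sketch("Outcome contains non-finite values.", "gdpar_input_error")
  }
  invisible(TRUE)
}

# A caller catches by class and inspects the message or the data payload.
msg <- tryCatch(
  check_outcome_sketch(data.frame(y = c(1, NA)), "y"),
  gdpar_input_error = function(e) conditionMessage(e)
)
missing_name <- tryCatch(
  check_outcome_sketch(data.frame(y = 1), "z"),
  gdpar_input_error = function(e) e$data$outcome
)
```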
Assertion checks (via assert_* helpers):
- `family` must inherit from `c("gdpar_family", "gdpar_family_multi")` (unless it is a recognized named list).
- `amm` must inherit from `"amm_spec"`.
- `data` must be a data frame.
- `prior` must inherit from `"gdpar_prior"`.
- `chains`, `iter_warmup`, `iter_sampling`, `max_treedepth` must be count scalars.
- `adapt_delta` must be a numeric scalar in $[0.5, 0.999]$.
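The scalar checks amount to small predicates. The sketch below assumes that a "count" means a non-negative whole number, which this section does not spell out; the helper names are hypothetical and are not the package's `assert_*` functions.

```r
# Hypothetical predicate: a single non-negative whole number (assumed meaning of "count").
is_count_sketch <- function(x) {
  is.numeric(x) && length(x) == 1L && !is.na(x) && x >= 0 && x == round(x)
}

# Hypothetical predicate: a single numeric value inside a closed interval.
in_range_sketch <- function(x, lower, upper) {
  is.numeric(x) && length(x) == 1L && !is.na(x) && x >= lower && x <= upper
}

chains_ok      <- is_count_sketch(4L)
chains_bad     <- is_count_sketch(1.5)
adapt_delta_ok <- in_range_sketch(0.95, 0.5, 0.999)
adapt_delta_lo <- in_range_sketch(0.4, 0.5, 0.999)
```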
Side effects:
- Calls `require_suggested("cmdstanr", ...)`; aborts if `cmdstanr` is not installed.
- Writes Stan source code to a temporary file via `write_stan_to_tempfile()`.
- Compiles the Stan model via `cmdstanr::cmdstan_model()`.
- Performs HMC/NUTS sampling via `cs_model$sample()`.
- When `verbose = FALSE`: creates a temporary `.log` file, opens a message sink, redirects `stderr` to the file during sampling, then on exit reads the file, deduplicates blocks via `dedup_message_blocks()`, and re-emits them via `message()`. The `on.exit` handler is registered with `add = TRUE, after = FALSE`.
- When `verbose = TRUE`: passes `show_messages = TRUE` and `show_exceptions = TRUE` to the sampler.
- Emits informational messages via `gdpar_inform()` for: skipped identifiability check, D-ID conditional status, and group resolution.
- Calls `.check_group_aliasing_c7()` when a group is supplied, enforcing condition C7 of Block 6.5 (no perfect aliasing of `a`/`b` columns with the group indicator).
- The `...` arguments are merged into `sample_args` by name, potentially overriding default sampler settings.
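Merging by name means a caller-supplied entry replaces the default of the same name, and any new name is appended. A minimal sketch of that rule follows; `merge_sample_args_sketch` is a hypothetical helper and the default values are chosen only for illustration.

```r
# Hypothetical helper: named extras overwrite or extend the default argument list.
merge_sample_args_sketch <- function(defaults, ...) {
  extra <- list(...)
  for (nm in names(extra)) {
    # Skip unnamed extras; only named entries can be merged by name.
    if (nzchar(nm)) defaults[[nm]] <- extra[[nm]]
  }
  defaults
}

defaults <- list(chains = 4L, adapt_delta = 0.95, refresh = 100L)
merged <- merge_sample_args_sketch(defaults, adapt_delta = 0.99, thin = 2L)
```

In this example `adapt_delta` is overridden in place and `thin` is added at the end, while `chains` and `refresh` keep their defaults.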
D-ID status message. When family$did_status == "holds_under_condition" and verbose = TRUE, a message is emitted with the family name, the D-ID condition, and the D-ID reference. The package documents but does not verify this condition from data.
Group argument. Resolved via .resolve_group_argument(group, data, n = length(y), verbose = verbose). If non-NULL, the group_id integer vector is passed to assemble_stan_data() and .check_group_aliasing_c7() is called with the design, group ID, and variable name.
Parametrization resolution. resolve_parametrization() is called with parametrization, parametrization_a, parametrization_W, prior, stan_data, amm, preflight_seed = seed, and verbose. Note that parametrization_aggregation is NOT passed to resolve_parametrization() in the scalar path. The resolved cp_a and cp_W flags are forwarded to generate_stan_code().
Stan code generation. generate_stan_code(prior, cp_a, cp_W) substitutes prior placeholders into the static template at inst/stan/amm_main.stan.
W basis materialization. When amm$W is non-NULL, it is materialized via materialize_W_basis(amm$W, p = 1L) before design assembly.
Formula RHS processing. The RHS is extracted as formula[c(1L, 3L)] and updated with ~ . + 0 (removing the intercept), then passed as formula_rhs to build_amm_design() and gdpar_check_identifiability().
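The two formula operations named above are standard base R and can be tried in isolation. The example formula and data frame below are illustrative and do not come from the package.

```r
f <- y ~ x1 + x2

# Keep the `~` operator and the RHS: a one-sided formula with the same terms.
rhs <- f[c(1L, 3L)]

# Append `+ 0` to suppress the intercept.
rhs_no_int <- update(rhs, ~ . + 0)

# The terms object records whether an intercept is present (0 means none).
intercept_flag <- attr(terms(rhs_no_int), "intercept")

# The resulting design matrix has one column per covariate and no intercept column.
X <- model.matrix(rhs_no_int, data.frame(x1 = 1:3, x2 = c(2, 5, 7)))
```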
S3 dispatch. The returned object has class c("gdpar_fit", "list"), enabling S3 method dispatch for print, summary, etc. (defined elsewhere in the package).
Purpose
A closure defined inside gdpar() (not a top-level function). It is registered as an on.exit handler (with add = TRUE, after = FALSE) when verbose = FALSE, to flush the stderr messages captured during cmdstanr sampling. It un-sinks the message stream, closes the temporary log file connection, reads and deduplicates the captured lines, re-emits them via message(), and cleans up the temp file.
Arguments
None. The function captures msg_file (the temp file path) and msg_con (the file connection) from the enclosing scope of gdpar().
Returns
NULL (invisibly). Called for its side effects.
Notes
- This is a local closure, not a package-level function. It is defined only when `verbose = FALSE` within `gdpar()`.
- The function first checks `sink.number(type = "message") > 0L` before un-sinking, guarding against double-un-sink.
- It checks `isOpen(msg_con)` before closing.
- File reading is wrapped in `tryCatch`: on error, `character(0)` is returned, and no messages are re-emitted.
- The temp file is deleted via `unlink(msg_file)` regardless of read success.
- Captured lines are deduplicated via `dedup_message_blocks()` before re-emission.
- Each line of the deduplicated output is re-emitted via `message(line)`.
- Registered via `on.exit(flush_captured(), add = TRUE, after = FALSE)`, ensuring it runs before other `on.exit` handlers added earlier.
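The capture-and-flush pattern can be sketched with base R alone. This is an illustration under stated assumptions, not the closure from `gdpar()`: `run_quietly_sketch` is a hypothetical wrapper, `unique()` stands in for `dedup_message_blocks()`, and the message stream is reset unconditionally rather than behind the `sink.number()` check.

```r
# Hypothetical wrapper: divert stderr while `noisy_fn` runs, then replay it once on exit.
run_quietly_sketch <- function(noisy_fn) {
  msg_file <- tempfile(fileext = ".log")
  msg_con <- file(msg_file, open = "wt")
  sink(msg_con, type = "message")

  flush_captured <- function() {
    # Point the message stream back at stderr (harmless if already restored).
    sink(type = "message")
    if (isOpen(msg_con)) close(msg_con)
    captured <- tryCatch(
      readLines(msg_file, warn = FALSE),
      error = function(e) character(0)
    )
    unlink(msg_file)
    # Stand-in for dedup_message_blocks(): drop repeated lines.
    for (line in unique(captured)) message(line)
    invisible(NULL)
  }
  # after = FALSE places this handler ahead of any registered earlier.
  on.exit(flush_captured(), add = TRUE, after = FALSE)

  noisy_fn()
}

noisy <- function() {
  cat("warn A\n", file = stderr())
  cat("warn A\n", file = stderr())
  42
}
```

Calling `run_quietly_sketch(noisy)` returns the value of `noisy()` and replays the repeated line a single time through `message()`, so it can still be intercepted by condition handlers.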
.gdpar_multi(formula, family, amm, data, prior, anchor, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group = NULL, parametrization, parametrization_a, parametrization_W, parametrization_aggregation = NULL, id_check_rigor = "full", call, ...)
Purpose
Internal workhorse implementing the multivariate ($p > 1$) path of `gdpar()`. It validates inputs, promotes a univariate family to a multivariate one when necessary, assembles the AMM design matrix, resolves the anchor and the CP/NCP parametrization, runs a basis-restricted identifiability check, generates and compiles Stan code via cmdstanr, executes HMC/NUTS sampling, computes posterior diagnostics, and returns a `gdpar_fit` object. It is the multivariate counterpart of the K-individual path ($K > 1$, $p = 1$) described by the roxygen block that trails this section.
Arguments
- `formula`: `formula` (length 3, two-sided). The LHS names the outcome column in `data`; the RHS is rebuilt internally with the intercept suppressed.
- `family`: object of class `gdpar_family` or `gdpar_family_multi`. A bare `gdpar_family` is auto-promoted to `gdpar_family_multi` with `p = amm$p` when `amm$p > 1`.
- `amm`: list representing the AMM (Additive–Multiplicative Modulation) specification; must expose integer element `$p` (the outcome dimension) and may expose `$W` (basis) and `$b` (offset). `$W`, if present, is materialized through `materialize_W_basis()`.
- `data`: `data.frame` containing the outcome column and any RHS variables.
- `prior`: `gdpar_prior` object; if `NULL`, replaced by `gdpar_prior()`.
- `anchor`: anchor specification forwarded to `resolve_anchor_multi()`.
- `skip_id_check`: `logical` scalar; when `TRUE`, the basis-restricted identifiability check (C1–C4 + C4-bis) is bypassed.
- `chains`: non-negative integer scalar; number of MCMC chains.
- `iter_warmup`: non-negative integer scalar; warmup iterations per chain.
- `iter_sampling`: non-negative integer scalar; sampling iterations per chain.
- `adapt_delta`: numeric scalar in $[0.5, 0.999]$; target acceptance probability for NUTS step-size adaptation.
- `max_treedepth`: non-negative integer scalar; NUTS maximum tree depth.
- `refresh`: non-negative numeric scalar; Stan progress refresh interval.
- `verbose`: `logical` scalar; toggles informative messages, Stan `show_messages`/`show_exceptions`, and message-sink behavior.
- `seed`: `integer` scalar or `NULL`; PRNG seed forwarded to `cmdstanr` and to the parametrization preflight.
- `group`: `NULL` or group specification; resolved by `.resolve_group_argument()`.
- `parametrization`: forwarded to `resolve_parametrization_multi()`; uniform CP/NCP flag or `"auto"`.
- `parametrization_a`: forwarded to `resolve_parametrization_multi()`; per-component $a$ parametrization override.
- `parametrization_W`: forwarded to `resolve_parametrization_multi()`; per-component $W$ parametrization override.
- `parametrization_aggregation`: `NULL` (defaulting internally to `"any_ncp"`) or a string; rule for aggregating per-coordinate CP/NCP decisions.
- `id_check_rigor`: string (default `"full"`); rigor level passed to `gdpar_check_identifiability()`.
- `call`: language object; the original call, stored verbatim in the returned object.
- `...`: extra named arguments merged into the `sample_args` list passed to the `$sample()` method of the compiled cmdstanr model.
Mathematics
The function operationalizes the multivariate Path-1 Bayesian model. Let
The diagnostic test point for the identifiability check is chosen heuristically as
i.e. when the anchor is numerically zero and a basis offset $b is present, the all-ones vector is substituted to avoid a degenerate diagnostic. The identifiability report is produced by gdpar_check_identifiability() at rigor id_check_rigor; failure raises a gdpar_identifiability_error.
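The substitution rule described above can be sketched as a small helper. The tolerance used to decide "numerically zero" and the choice to require every coordinate to be zero are assumptions made for this illustration; the section does not state either, and `choose_test_point_sketch` is a hypothetical name.

```r
# Hypothetical helper: swap a numerically zero anchor for the all-ones vector
# when a `$b` component is present; otherwise test at the anchor itself.
# `tol` and the all-coordinates rule are assumptions, not taken from the package.
choose_test_point_sketch <- function(anchor, has_b,
                                     tol = sqrt(.Machine$double.eps)) {
  if (has_b && all(abs(anchor) < tol)) rep(1, length(anchor)) else anchor
}

zero_with_b    <- choose_test_point_sketch(c(0, 0), has_b = TRUE)
zero_without_b <- choose_test_point_sketch(c(0, 0), has_b = FALSE)
nonzero_with_b <- choose_test_point_sketch(c(0.5, 0), has_b = TRUE)
```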
The CP/NCP decision is resolved per coordinate by `resolve_parametrization_multi()`, yielding boolean vectors `cp_a_per_k` and `cp_W_per_k` plus uniform flags `cp_a`/`cp_W`. When the per-coordinate $a$ decisions are non-constant,
activating the segment()-based per-coordinate prior wiring in the generated Stan code. The metadata note records that uniform-NCP and uniform-CP branches compile to byte-identical code relative to the "H.1 template", and that the mode is "multivariate_phase_h" (Phase H.2 preflight).
Returns
A list with S3 class c("gdpar_fit", "list") containing:
- `fit`: the `cmdstanr` fit object returned by `$sample()`.
- `amm`: the (possibly `$W`-materialized) `amm` input.
- `family`: the (possibly promoted) `gdpar_family_multi` object.
- `prior`: the `gdpar_prior` object used.
- `design`: the AMM design object from `build_amm_design()`.
- `anchor`: the resolved anchor value (numeric vector of length $p$).
- `stan_data`: the assembled Stan data list.
- `identifiability_report`: the report from `gdpar_check_identifiability()`, or `NULL` when `skip_id_check = TRUE`.
- `diagnostics`: output of `compute_diagnostics(fit, verbose)`.
- `parametrization`: a list with elements `cp_a`, `cp_W`, `cp_a_per_k`, `cp_W_per_k`, `report`, and `meta` (containing `mode`, `note`, `aggregation`, and `requested`).
- `group_info`: result of `.resolve_group_argument()` (possibly `NULL`).
- `call`: the stored `call`.
- `path`: the string `"bayes"`.
- `p`: the integer outcome dimension.
Notes
- Errors raised (class `gdpar_input_error`): `formula` not a length-3 formula; `data` not a data frame; `prior` not inheriting `gdpar_prior`; `chains`/`iter_warmup`/`iter_sampling`/`max_treedepth` not non-negative integer scalars; `adapt_delta` outside $[0.5, 0.999]$; `refresh` not a non-negative numeric scalar; `verbose` not a logical scalar; `family` not promotable to `gdpar_family_multi`; `family$p != amm$p`; outcome name absent from `data`; outcome not a matrix/array; outcome containing any non-finite (`NA`, `NaN`, `Inf`) entry (Path 1 does not impute); outcome column count differing from $p$.
- Errors raised (class `gdpar_identifiability_error`): basis-restricted identifiability check (C1–C4 + C4-bis) fails at the diagnostic test point; the error `data` field carries the full `report`.
- Family auto-promotion: when `family` is a bare `gdpar_family` (not `gdpar_family_multi`) and `amm$p > 1`, it is promoted via `gdpar_family_multi(family, p = p)` and a `gdpar_family_promotion_message` is emitted when `verbose`.
- D-ID informational message: when `family$did_status == "holds_under_condition"` and `verbose`, a `gdpar_did_message` is emitted reporting the condition and reference; the package documents but does not verify the condition from data.
- Group aliasing: when a group is supplied, `.check_group_aliasing_c7()` is invoked against the design and group id before Stan data assembly.
- Message capture under `verbose = FALSE`: a temporary `.log` file is opened, `sink(..., type = "message")` is engaged, and an `on.exit` handler `flush_captured()` (a closure defined inside the function) closes the sink, reads the file, deduplicates blocks via `dedup_message_blocks()`, emits each surviving line through `message()`, and unlinks the temp file. The handler is registered with `add = TRUE, after = FALSE`.
- Extra arguments: any `...` names overwrite or extend `sample_args` entries, allowing caller override of the sampler's `$sample()` parameters.
- S3 dispatch: the returned object is dispatched on class `gdpar_fit`; no methods are defined within this section.
- Trailing roxygen block: the section ends with a `@noRd`/`@keywords internal` documentation block describing a not-yet-defined K-individual path ($K > 1, p = 1$) targeting the `amm_distrib_K.stan` template via `.build_amm_design_K()`, `.assemble_stan_data_K()`, and `generate_stan_code_K()`; that function's body is not present in this section and is therefore not described here.
.gdpar_K(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group = NULL, parametrization, parametrization_a, parametrization_W, id_check_rigor = "full", family_id_k_vector = NULL, call, ...)
Purpose
Primary internal entry point for Path 1 (K-individual hierarchical Bayesian) model fitting. Validates all user-supplied arguments, delegates model/data assembly to .gdpar_K_build(), invokes cmdstanr MCMC sampling, computes post-sampling diagnostics and post-fit identifiability (information-contraction) checks, and assembles the final gdpar_fit S3 object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm_list_canonical` | list | Canonical AMM (Additive–Multiplicative Modulation) specifications, one per K slot. |
| `family` | `gdpar_family` | Response distribution family with K-individual slots promoted to per-observation scope. |
| `data` | data.frame | Model data containing the outcome and all covariates. |
| `prior` | `gdpar_prior` or `NULL` | Prior specification; if `NULL`, defaults to `gdpar_prior()`. |
| `anchor` | numeric scalar, numeric vector (length K), or character (`"prior_mean"`, `"empirical_y"`) | Reference-anchor value(s) for the K slots. |
| `outcome_name` | character (length 1) | Name of the outcome column in `data`. |
| `formula_env` | environment | Environment in which the model formula is constructed. |
| `skip_id_check` | logical | If `TRUE`, skip basis-restricted identifiability checks. |
| `chains` | integer (count) | Number of MCMC chains. |
| `iter_warmup` | integer (count) | Warmup iterations per chain. |
| `iter_sampling` | integer (count) | Post-warmup sampling iterations per chain. |
| `adapt_delta` | numeric scalar in $[0.5, 0.999]$ | Target acceptance probability for NUTS step-size adaptation. |
| `max_treedepth` | integer (count) | Maximum NUTS tree depth. |
| `refresh` | non-negative numeric scalar | Stan progress-report refresh interval. |
| `verbose` | logical scalar | Whether to show Stan messages and informational diagnostics. |
| `seed` | integer or `NULL` | Random seed for reproducibility. |
| `group` | `NULL` or group specification | Grouping structure for hierarchical partial pooling. |
| `parametrization` | character | Global CP/NCP parametrization mode (`"cp"` or `"ncp"`). |
| `parametrization_a` | character or `NULL` | Per-component parametrization for the `a` coefficients; overrides `parametrization` when non-`NULL`. |
| `parametrization_W` | character or `NULL` | Per-component parametrization for the `W` basis; overrides `parametrization` when non-`NULL`. |
| `id_check_rigor` | character (default `"full"`) | Rigor level for the K-level identifiability check. |
| `family_id_k_vector` | `NULL` or vector | Per-slot family ID vector. |
| `call` | call | The original matched call (stored in the returned object). |
| `...` | named arguments | Extra arguments forwarded to `cs_model$sample()`, overriding defaults. |
Mathematics
The function orchestrates Bayesian inference for a K-individual dynamic-parameter model. Each slot adapt_delta and maximum tree depth max_treedepth.
Post-fit, an information-contraction diagnostic is computed per slot. Let
Slots with
Returns
A list with S3 class c("gdpar_fit", "list") containing:
| Element | Type | Description |
|---|---|---|
| `fit` | `CmdStanFit` object | The cmdstanr fit object holding the raw posterior draws. |
| `amm_list_canonical` | list | Canonical AMM list (possibly with materialized $W$ bases). |
| `family` | `gdpar_family` | Resolved family object. |
| `prior` | `gdpar_prior` | Resolved prior object. |
| `design_K` | list | K-individual design (matrices …). |
| `anchor` | numeric vector (length K) | Resolved anchor values. |
| `stan_data` | list | Stan data list passed to the sampler. |
| `identifiability_report` | list or `NULL` | Pre-fit per-slot identifiability reports with K-level attribute. |
| `identifiability_post_fit` | list | Post-fit information-contraction report. |
| `diagnostics` | list | MCMC convergence diagnostics (R-hat, ESS, divergences, etc.). |
| `parametrization` | list | Resolved CP/NCP flags (`cp_a`, `cp_W`, `cp_a_per_K`, `meta`). |
| `group_info` | list or `NULL` | Resolved grouping information. |
| `call` | call | Original function call. |
| `path` | character | Always `"bayes"`. |
| `K` | integer | Number of slots. |
| `slot_names` | character vector | Canonical slot names. |
Notes
- Input validation: Asserts `data` is a data.frame, `prior` inherits from `gdpar_prior`, `chains`/`iter_warmup`/`iter_sampling`/`max_treedepth` are counts, `adapt_delta` is in $[0.5, 0.999]$, `refresh` is a non-negative numeric scalar, and `verbose` is a logical scalar. Violations raise errors of class `gdpar_input_error`.
- Suggested package: Requires `cmdstanr` (via `require_suggested`).
- Message capture: When `verbose = FALSE`, Stan messages are redirected to a temporary `.log` file via `sink()`. A closure `flush_captured` (registered on `on.exit`) closes the sink, reads the file, deduplicates message blocks via `dedup_message_blocks()`, and replays them through `message()`.
- Extra arguments: Named elements in `...` are merged into `sample_args` after all defaults are set, so they can override any sampling argument.
- Seed: Only added to `sample_args` when non-`NULL`; coerced to integer.
- Init: Only added to `sample_args` when `bd$init` is non-`NULL` (Tweedie family case).
- Post-fit warnings: If any slot has `status == "information_error"`, a warning of class `gdpar_information_error` is issued (always). If any slot has `status == "warn"` and `verbose` is `TRUE`, a warning of class `gdpar_information_warning` is issued.
- Side effects: Creates (and cleans up) a temporary log file when `verbose = FALSE`. Calls `do.call(cs_model$sample, sample_args)`, which writes to disk and may print to console.
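The post-fit warning rule can be illustrated with base R's `warningCondition()`. How the package actually constructs these warnings is not shown here, so `warn_post_fit_sketch`, its message texts, and the example status vector are hypothetical.

```r
# Hypothetical helper: one classed warning per status category.
warn_post_fit_sketch <- function(status, verbose = TRUE) {
  if (any(status == "information_error")) {
    # Issued regardless of `verbose`.
    warning(warningCondition(
      "At least one slot failed the information-contraction check.",
      class = "gdpar_information_error", call = NULL
    ))
  }
  if (verbose && any(status == "warn")) {
    # Issued only when `verbose` is TRUE.
    warning(warningCondition(
      "At least one slot shows weak information contraction.",
      class = "gdpar_information_warning", call = NULL
    ))
  }
  invisible(status)
}

# Collect the classes of the warnings raised for a given status vector.
collect_warning_classes <- function(status, verbose) {
  seen <- character(0)
  withCallingHandlers(
    warn_post_fit_sketch(status, verbose = verbose),
    warning = function(w) {
      seen <<- c(seen, class(w)[1L])
      invokeRestart("muffleWarning")
    }
  )
  seen
}

status <- c(mu = "ok", sigma = "information_error", nu = "warn")
quiet_classes   <- collect_warning_classes(status, verbose = FALSE)
verbose_classes <- collect_warning_classes(status, verbose = TRUE)
```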
.gdpar_K_build(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, skip_id_check, verbose, group = NULL, parametrization, parametrization_a, parametrization_W, id_check_rigor = "full", family_id_k_vector = NULL, compile_model_methods = FALSE)
Purpose
Behaviour-preserving extraction of the build phase of .gdpar_K() — everything up to but not including the call to cs_model$sample(). Shared between .gdpar_K() (which passes compile_model_methods = FALSE) and gdpar_geom_fit() (which passes compile_model_methods = TRUE to expose $log_prob/$grad_log_prob/$hessian for the geometry engine). This ensures the model and data are built and compiled exactly once with no duplication.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm_list_canonical` | list | Canonical AMM specifications, one per K slot. |
| `family` | `gdpar_family` | Response distribution family. |
| `data` | data.frame | Model data. |
| `prior` | `gdpar_prior` or `NULL` | Prior specification; defaults to `gdpar_prior()` if `NULL`. |
| `anchor` | numeric or character | Anchor specification for the K slots. |
| `outcome_name` | character | Name of the outcome column. |
| `formula_env` | environment | Environment for formula construction. |
| `skip_id_check` | logical | Whether to skip identifiability checks. |
| `verbose` | logical | Verbosity flag for informational messages. |
| `group` | `NULL` or group spec | Grouping structure. |
| `parametrization` | character | Global CP/NCP mode. |
| `parametrization_a` | character or `NULL` | Per-component `a` parametrization. |
| `parametrization_W` | character or `NULL` | Per-component `W` parametrization. |
| `id_check_rigor` | character (default `"full"`) | Rigor for K-level identifiability check. |
| `family_id_k_vector` | `NULL` or vector | Per-slot family ID vector. |
| `compile_model_methods` | logical (default `FALSE`) | If `TRUE`, compiles with `compile_model_methods = TRUE` to expose log-prob/gradient/Hessian methods. |
Mathematics
Design construction. The union of all covariate variables across the K AMM slots is collected:
A model formula .build_amm_design_K().
Anchor resolution. The anchor is resolved via resolve_anchor_K(), producing a length-K numeric vector
Per-slot identifiability check. For each slot
This avoids testing at a degenerate zero anchor when a dispersion/basis term gdpar_check_identifiability().
K-level identifiability check. A cross-slot check (via .check_identifiability_K()) verifies:
- D-B3: per-slot $Z_a^{(k)}$ rank sufficiency.
- D-B2: cross-slot extended Gram matrix rank.
Parametrization resolution. CP/NCP flags are resolved with per-component overrides:
and analogously for cp_W. Per-slot parametrization (cp_a_per_K) is NULL (deferred to a future sub-phase).
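The override rule documented for `parametrization_a` and `parametrization_W` (a non-`NULL` per-component value takes precedence over the global mode) can be sketched as below. The helper name is hypothetical, and the sketch assumes the global mode has already been narrowed to `"cp"` or `"ncp"` before this step.

```r
# Hypothetical helper: TRUE means centered (CP), FALSE means non-centered (NCP).
# Assumes `global` is already "cp" or "ncp"; "auto" is not handled here.
resolve_cp_flag_sketch <- function(global, override = NULL) {
  mode <- if (is.null(override)) global else override
  stopifnot(length(mode) == 1L, mode %in% c("cp", "ncp"))
  identical(mode, "cp")
}

cp_a <- resolve_cp_flag_sketch("ncp", override = "cp")  # override wins
cp_W <- resolve_cp_flag_sketch("ncp")                   # inherits the global mode
```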
Tweedie init. For the Tweedie family (family$stan_id == 9), the structural constraint
Returns
A list with four top-level elements:
| Element | Type | Description |
|---|---|---|
| `cs_model` | `CmdStanModel` | Compiled Stan model. |
| `stan_data` | list | Stan data list. |
| `init` | function or `NULL` | Per-chain init function (Tweedie only); `NULL` otherwise. |
| `meta` | list | Metadata list (see below). |
The meta sub-list contains:
| Element | Type | Description |
|---|---|---|
| `amm_list_canonical` | list | AMM list (with materialized $W$ bases). |
| `family` | `gdpar_family` | Resolved family. |
| `prior` | `gdpar_prior` | Resolved prior. |
| `design_K` | list | K-individual design. |
| `anchor_value` | numeric vector (length K) | Resolved anchor values. |
| `id_report` | list or `NULL` | Per-slot identifiability reports with `K_level` attribute. |
| `group_info` | list or `NULL` | Resolved group information. |
| `parametrization_resolved` | list | Resolved CP/NCP flags and metadata. |
| `slot_names` | character vector | Canonical slot names. |
| `K` | integer | Number of slots. |
| `stan_src` | character | Generated Stan source code. |
Notes
- Outcome validation: Aborts with `gdpar_input_error` if `outcome_name` is not a column in `data`, if the outcome is matrix/array-valued (Path 1 with $K > 1, p = 1$ requires a length-$n$ vector), or if the outcome contains any non-finite values (`NA`, `NaN`, `Inf`). Path 1 does not impute.
- $W$ basis materialization: For each slot, if $W$ is non-`NULL`, it is materialized via `materialize_W_basis(W, p = 1L)`.
- Identifiability skip: When `skip_id_check = TRUE` and `verbose = TRUE`, an informational message of class `gdpar_identifiability_message` is issued; `id_report` is set to `NULL`.
- Identifiability failure: Per-slot failures abort with class `gdpar_identifiability_error` (including the slot name and report in `data`). K-level failures abort similarly, identifying whether the failure was in the per-slot rank check (D-B3) or the cross-slot extended Gram check (D-B2).
- D-ID conditional families: When `family$did_status == "holds_under_condition"` and `verbose = TRUE`, an informational message of class `gdpar_did_message` is issued documenting the condition and reference. The package does not verify the condition from data.
- Group aliasing: When groups are present, `.check_group_aliasing_c7()` is called for each slot's design ($Z_a^{(k)}$, $Z_b^{(k)}$, $X$) against the group ID.
- Model compilation: When `compile_model_methods = FALSE`, calls `cmdstanr::cmdstan_model(stan_path)` (bit-identical to pre-refactor behavior). When `TRUE`, calls `cmdstanr::cmdstan_model(stan_path, compile_model_methods = TRUE)`.
- Stan code generation: Delegates to `generate_stan_code_K()` with the resolved prior, CP/NCP flags, and family.
- Tempfile: Stan source is written to a temporary file via `write_stan_to_tempfile()`.
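The outcome checks described in the first note can be sketched in base R. This is an illustration only: `validate_outcome_sketch` is a hypothetical name, and plain `stop()` stands in for the package's `gdpar_input_error` condition.

```r
# Hypothetical stand-in for the documented outcome validation.
# Plain stop() replaces the package's gdpar_input_error condition.
validate_outcome_sketch <- function(data, outcome_name) {
  if (!outcome_name %in% colnames(data)) {
    stop("outcome '", outcome_name, "' is not a column in data")
  }
  y <- data[[outcome_name]]
  if (is.matrix(y) || is.array(y)) {
    stop("outcome must be a length-n vector, not a matrix/array")
  }
  if (any(!is.finite(y))) {
    stop("outcome contains non-finite values (NA, NaN, Inf)")
  }
  y
}

d <- data.frame(y = c(1.5, 2, 3), x = c(0, 1, 0))
y_ok <- validate_outcome_sketch(d, "y")

d_bad <- d
d_bad$y[2] <- NA
res_bad <- tryCatch(validate_outcome_sketch(d_bad, "y"),
                    error = function(e) "rejected")
```

The check rejects rather than imputes, matching the note that Path 1 does not impute.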
Purpose
Closure defined inside .gdpar_K() when verbose = FALSE. Registered on on.exit() to cleanly close the message sink, read the captured Stan messages from the temporary log file, deduplicate them, and replay them through message().
Arguments
None. Captures msg_con (the file connection) and msg_file (the temp file path) from the enclosing scope.
Returns
NULL (invisible). Side effect: messages are replayed to the R message stream.
Notes
- Checks `sink.number(type = "message") > 0L` before closing the sink to avoid errors if the sink was already closed.
- Checks `isOpen(msg_con)` before closing the connection.
- Reads the log file with `readLines(msg_file, warn = FALSE)`; on error, returns `character(0)`.
- Unlinks the temporary file after reading.
- Passes captured lines through `dedup_message_blocks()` before replaying.
- Registered via `on.exit(flush_captured(), add = TRUE, after = FALSE)` so it executes before any other `on.exit` handlers.
Documentation for resolve_anchor_K appears at the end of this section (roxygen block only); its function definition is not present in this section and will be documented in a subsequent section.
Purpose
Resolves the user-supplied anchor argument for the K-slot (multi-parameter-per-response) path of gdpar. It returns a length-K numeric vector on the linear-predictor scale, one anchor value per parameter slot (e.g. location, scale, shape). This is the slot-aware counterpart of the simpler resolve_anchor.
Arguments
- `anchor`: A numeric scalar, a numeric vector of length `K`, or a single character string (`"prior_mean"` or `"empirical_y"`).
- `family`: A `gdpar_family` object. Only used when `anchor = "empirical_y"`; its `$linkfun` component is applied to the outcome mean for the location (first) slot.
- `y`: Numeric vector of outcomes. Used only when `anchor = "empirical_y"`.
- `prior`: A `gdpar_prior` object. Accepted as part of the signature but not referenced in the function body.
- `slot_names`: Character vector of length `K` giving the names of the parameter slots. Used for naming the output and for validating a named `anchor` vector.
- `verbose`: Logical scalar. If `TRUE`, an informational message is issued when the empirical anchor is used.
Mathematics
For "prior_mean":
For "empirical_y":
where family$linkfun. The returned vector is:
i.e. only the first slot (location) receives the link-transformed mean; all remaining slots are anchored at
Returns
A numeric vector of length K with names set to slot_names.
Notes
- Scalar broadcast: A finite numeric scalar is recycled to all `K` slots.
- Named vector validation: If `anchor` is a length-`K` numeric vector with at least one non-empty name, the names must form the same set as `slot_names` (order-independent, checked via `setequal`). A mismatch raises a `gdpar_input_error` with `data = list(received_names = ..., expected_names = ...)`.
- Unnamed vector: A length-`K` numeric vector with no names is accepted positionally.
- Link failure: If `family$linkfun(yb)` errors, the error is caught and re-raised as `gdpar_input_error` with the original condition message.
- Invalid anchor: Any input not matching the accepted patterns raises `gdpar_input_error` with `data = list(received = anchor, K = K, slot_names = slot_names)`.
- The `prior` argument is unused in the body.
- When `verbose` is `TRUE` and `"empirical_y"` is used, a message of class `"gdpar_anchor_message"` is emitted via `gdpar_inform`.
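The numeric branches of the notes above (scalar broadcast, named-vector validation, positional acceptance) can be sketched as follows. `resolve_numeric_anchor_sketch` is a hypothetical name, plain `stop()` replaces `gdpar_input_error`, and reordering a named vector into `slot_names` order is an assumption that the source does not state.

```r
# Hypothetical sketch of the numeric anchor branches; not the package code.
resolve_numeric_anchor_sketch <- function(anchor, slot_names) {
  K <- length(slot_names)
  if (!is.numeric(anchor) || any(!is.finite(anchor))) {
    stop("anchor must be finite numeric")
  }
  if (length(anchor) == 1L) {
    out <- rep(as.numeric(anchor), K)             # scalar broadcast
  } else if (length(anchor) == K) {
    nm <- names(anchor)
    if (!is.null(nm) && any(nzchar(nm))) {
      if (!setequal(nm, slot_names)) {
        stop("anchor names do not match slot names")
      }
      out <- as.numeric(anchor[slot_names])       # assumption: reorder by name
    } else {
      out <- as.numeric(anchor)                   # unnamed: positional
    }
  } else {
    stop("anchor must have length 1 or K")
  }
  names(out) <- slot_names
  out
}

slots <- c("mu", "sigma")
a_scalar <- resolve_numeric_anchor_sketch(0.5, slots)
a_named  <- resolve_numeric_anchor_sketch(c(sigma = 1, mu = 2), slots)
a_bad    <- tryCatch(resolve_numeric_anchor_sketch(c(mu = 1, nu = 2), slots),
                     error = function(e) "rejected")
```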
Purpose
Multivariate-outcome counterpart of resolve_anchor. Resolves the anchor for a model with p outcome coordinates, returning a length-p numeric vector on the linear-predictor scale.
Arguments
- `anchor`: A numeric scalar, a numeric vector of length `p`, or a single character string (`"prior_mean"` or `"empirical_y"`).
- `family`: A `gdpar_family_multi` object. Must contain a `families` component that is a list of length `p`, each element providing a `$linkfun`.
- `y`: Numeric matrix of outcomes (n × p). Used only when `anchor = "empirical_y"`.
- `prior`: A `gdpar_prior` object. Accepted but not referenced in the body.
- `p`: Integer scalar; the number of outcome coordinates (length of `theta_ref`).
- `verbose`: Logical scalar. If `TRUE`, an informational message is issued when the empirical anchor is used.
Mathematics
For "prior_mean":
For "empirical_y":
where family$families[[k]]$linkfun. The returned vector is:
Returns
A numeric vector of length p (unnamed).
Notes
- Scalar broadcast: A finite numeric scalar is recycled to length `p`.
- Vector pass-through: A length-`p` finite numeric vector is returned as-is (coerced to double), without naming or name validation (unlike `resolve_anchor_K`).
- Per-coordinate link: Each coordinate's link function is applied independently via `vapply`. If any `fam_k$linkfun(yb[k])` errors, it is caught and re-raised as `gdpar_input_error` identifying the failing coordinate index `k`.
- Invalid anchor: Raises `gdpar_input_error` with `data = list(received = anchor, p = p)`.
- The `prior` argument is unused in the body.
- When `verbose` is `TRUE` and `"empirical_y"` is used, a message of class `"gdpar_anchor_message"` is emitted showing the formatted anchor values.
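A minimal sketch of the per-coordinate link step, assuming `yb` holds the column means of the outcome matrix (the source names `yb[k]` but does not show how it is computed). `empirical_anchor_multi_sketch` is a hypothetical name, and base `stats` family objects stand in for the entries of `family$families`.

```r
# Hypothetical sketch; base R family objects stand in for gdpar families.
empirical_anchor_multi_sketch <- function(y, families) {
  yb <- colMeans(y)  # assumption: per-coordinate outcome means
  vapply(seq_along(families), function(k) {
    tryCatch(
      families[[k]]$linkfun(yb[[k]]),
      error = function(e) {
        # Re-raise, identifying the failing coordinate index k
        stop("link failed for coordinate ", k, ": ", conditionMessage(e),
             call. = FALSE)
      }
    )
  }, numeric(1))
}

y <- cbind(c(1, 3), c(2, 2))
fams <- list(stats::gaussian(), stats::poisson())  # identity and log links
anchor_multi <- empirical_anchor_multi_sketch(y, fams)
```

Using `vapply` with `numeric(1)` guarantees a length-`p` numeric result or an error, never a silently ragged list.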
Purpose
Simplest (univariate) anchor resolver. Parses the user-supplied anchor argument and returns a single numeric value on the linear-predictor scale.
Arguments
- `anchor`: A numeric scalar, or a single character string (`"prior_mean"` or `"empirical_y"`).
- `family`: A `gdpar_family` object. Must provide `$linkfun` (used for `"empirical_y"`) and `$link` (a string naming the link, used in the verbose message).
- `y`: Numeric vector of outcomes. Used only when `anchor = "empirical_y"`.
- `prior`: A `gdpar_prior` object. Accepted but not referenced in the body.
- `verbose`: Logical scalar. If `TRUE`, an informational message is issued when the empirical anchor is used.
Mathematics
For "prior_mean":
For "empirical_y":
where family$linkfun.
Returns
A numeric scalar.
Notes
- Unlike `resolve_anchor_K` and `resolve_anchor_multi`, this function does not accept a vector `anchor`; only a scalar or string is valid.
- If `family$linkfun(yb)` errors, the error is caught and re-raised as `gdpar_input_error` with the original condition message.
- Invalid anchor: Raises `gdpar_input_error` with `data = list(received = anchor)`.
- The `prior` argument is unused in the body.
- When `verbose` is `TRUE` and `"empirical_y"` is used, a message of class `"gdpar_anchor_message"` is emitted, referencing `family$link` (the link name) and the computed anchor value.
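The catch-and-re-raise behavior for a failing link can be sketched in base R. `empirical_anchor_sketch` is a hypothetical name, `yb` is taken to be `mean(y)` as an assumption, and the classed condition is built by hand rather than through the package's `gdpar_abort()`.

```r
# Hypothetical sketch of the "empirical_y" branch with error re-raising.
empirical_anchor_sketch <- function(y, family) {
  yb <- mean(y)  # assumption: the outcome mean
  tryCatch(
    family$linkfun(yb),
    error = function(e) {
      # Re-raise under the documented class, keeping the original message
      cond <- structure(
        class = c("gdpar_input_error", "error", "condition"),
        list(message = conditionMessage(e), call = NULL)
      )
      stop(cond)
    }
  )
}

anchor_log <- empirical_anchor_sketch(c(1, 2, 3), stats::poisson())

bad_family <- list(linkfun = function(mu) stop("link undefined"))
caught <- tryCatch(
  empirical_anchor_sketch(1, bad_family),
  gdpar_input_error = function(e) conditionMessage(e)
)
```

Callers can then handle the failure by class, as the last call shows, instead of matching on message text.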
Purpose
Collects convergence diagnostics from a cmdstanr fit object at fit time. Extracts R-hat and effective sample size (bulk and tail) summaries via the posterior package, counts divergent transitions and tree-depth saturations from sampler diagnostics, and optionally warns when thresholds are violated.
Arguments
- `fit`: A cmdstanr fit object. Must provide `$draws()` and `$diagnostic_summary()`.
- `verbose`: Logical scalar (default `TRUE`). If `TRUE` and convergence thresholds are violated, individual warnings are issued.
Mathematics
Let ^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw). The convergence criteria are:
where
Returns
An object of class c("gdpar_diagnostics", "list") with the following named components:
| Component | Type | Description |
|---|---|---|
| `rhat_max` | numeric (scalar, possibly `NA_real_`) | Maximum R-hat across retained variables |
| `ess_bulk_min` | numeric (scalar, possibly `NA_real_`) | Minimum bulk ESS across retained variables |
| `ess_tail_min` | numeric (scalar, possibly `NA_real_`) | Minimum tail ESS across retained variables |
| `divergent_count` | integer | Total divergent transitions |
| `divergent_relative` | numeric (possibly `NA_real_`) | `divergent_count / total_iter` |
| `treedepth_saturated` | integer | Total tree-depth saturations |
| `treedepth_relative` | numeric (possibly `NA_real_`) | `treedepth_saturated / total_iter` |
| `efmi_min` | numeric (possibly `NA_real_`) | Minimum E-BFMI across chains |
| `converged` | logical | Whether all thresholds are satisfied |
| `summary` | `data.frame` | Full output of `posterior::summarise_draws` on retained variables |
| `messages` | character vector | Warning messages (populated only when `verbose && !converged`; otherwise empty) |
Notes
- Dependency check: Calls `require_suggested("posterior", "summarize posterior draws")` at entry.
- Variable filtering: Variables whose names match the regex `^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)` are excluded from convergence summaries. If this leaves zero variables, the function falls back to `"theta_ref"` (if present), then to the first variable in the draws.
- Summary measures: Uses `posterior::default_convergence_measures()` appended to the `summarise_draws` call.
- Diagnostic summary: `fit$diagnostic_summary()` is called inside `tryCatch`; if it errors or returns `NULL`, the divergent and treedepth counts default to `0L` and `efmi_min` defaults to `NA_real_`.
- Total iterations: Computed as `posterior::niterations(draws) * posterior::nchains(draws)` inside `tryCatch`; if this fails or is non-finite, set to `NA_integer_`, and the relative divergence/treedepth rates become `NA_real_`.
- Warning behavior: When `verbose = TRUE` and `converged = FALSE`, each violated threshold produces a separate warning via `gdpar_warn` with class `"gdpar_diagnostic_warning"`. The same messages are collected in the `messages` component.
- Side effects: May emit up to five warnings (one per violated threshold) when `verbose` is `TRUE`.
- S3 class: The returned object has class `c("gdpar_diagnostics", "list")`; no S3 methods for this class are defined in this section.
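The variable-filtering rule and its fallback chain can be illustrated on a plain character vector of variable names. `retain_vars_sketch` is a hypothetical name; only the regex and the fallback order are taken from the note above.

```r
# Hypothetical sketch of the documented variable filter and fallback.
retain_vars_sketch <- function(vars) {
  pat <- "^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)"
  keep <- vars[!grepl(pat, vars)]
  if (length(keep) == 0L) {
    # Fallback order as documented: "theta_ref" if present, else first variable
    keep <- if ("theta_ref" %in% vars) "theta_ref" else vars[[1L]]
  }
  keep
}

kept_some <- retain_vars_sketch(c("eta[1]", "sigma", "log_lik[1]", "theta_ref"))
kept_none <- retain_vars_sketch(c("eta[1]", "log_lik[1]"))
```

Because the pattern is anchored at the start of the name, `theta_ref` is not excluded by the `eta` or `theta_i` alternatives.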
Purpose
Exported accessor that retrieves the precomputed diagnostics component stored inside a fitted gdpar_fit object. It is the user-facing entry point for inspecting sampler-level diagnostics (e.g., divergences, R-hat, ESS, max treedepth) that were attached to the fit during model estimation. The function performs no recomputation; it simply validates the class of its input and returns the stored list element named diagnostics.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | S3 object of class `"gdpar_fit"` | A fitted model object produced by the package's main estimation routine. Must contain a `diagnostics` element. |
Returns
The object fit$diagnostics, returned as-is. The exact structure is whatever was attached at fit time (typically a named list of diagnostic quantities); this function does not coerce or reshape it.
Notes
- Calls `assert_inherits(fit, "gdpar_fit", "fit")` before accessing the element. If `fit` does not inherit from class `"gdpar_fit"`, an error is raised (with the argument name `"fit"` used in the message). The error is propagated from `assert_inherits`; no custom class is set by `diagnostics` itself.
- No side effects: the function is pure with respect to `fit`.
- S3 dispatch: none. This is an ordinary (non-generic) function; there is no `UseMethod` call, so subclasses of `gdpar_fit` cannot override behavior.
- If `fit$diagnostics` is `NULL` or absent, `NULL` is returned silently (R's standard behavior for accessing a missing list element); no further validation of the element's presence is performed.
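The accessor's behavior, including the silent `NULL` for a missing element, can be reproduced with a stand-in. `diagnostics_sketch` is a hypothetical name and a plain `inherits()` check replaces the package's `assert_inherits()`.

```r
# Hypothetical stand-in for the accessor; inherits() replaces assert_inherits().
diagnostics_sketch <- function(fit) {
  if (!inherits(fit, "gdpar_fit")) {
    stop("'fit' must inherit from class 'gdpar_fit'")
  }
  fit$diagnostics  # NULL if the element is absent
}

fit_ok    <- structure(list(diagnostics = list(rhat_max = 1.01)),
                       class = "gdpar_fit")
fit_empty <- structure(list(), class = "gdpar_fit")

diag_ok    <- diagnostics_sketch(fit_ok)
diag_empty <- diagnostics_sketch(fit_empty)
```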
Purpose
Internal helper that translates the user-facing group argument — either NULL or a one-sided formula such as ~ species — into the integer grouping vector consumed by the Stan template underlying gdpar. It validates the formula, extracts the referenced variable from data, coerces it to a factor, and returns the integer codes together with metadata (variable name and level labels). When verbose is TRUE and at least one group has fewer than five observations, it emits an informational warning about the dominance of hierarchical shrinkage in small groups.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `group` | `NULL` or `formula` of length 2 (one-sided) | The user-supplied grouping specification. `NULL` indicates no grouping (function returns `NULL` immediately). A one-sided formula must contain exactly one variable name on its right-hand side. |
| `data` | `data.frame` | The user-supplied data frame from which the grouping variable is extracted via `data[[var_name]]`. |
| `n` | `integer` (scalar) | The sample size of the outcome variable, used to validate that the grouping variable has matching length. |
| `verbose` | `logical` (scalar, default `TRUE`) | When `TRUE`, enables the small-group informational warning. When `FALSE` (or any non-`TRUE` value, tested via `isTRUE`), the warning is suppressed. |
Mathematics
Let
In that regime, the hierarchical anchor
Returns
- If `group` is `NULL`: returns `NULL`.
- Otherwise: a named `list` with three components:
  - `group_id`: `integer` vector of length $n$, with values in $\{1, \dots, J\}$, obtained via `as.integer(as.factor(raw))`. The mapping from raw values to integers follows `as.factor`'s level ordering (alphabetical by default for character inputs, or the factor's pre-existing level order).
  - `var_name`: `character` scalar, the single variable name extracted from the formula via `all.vars(group)[[1]]`.
  - `levels`: `character` vector, `levels(fac)`, i.e., the original group level labels in factor order.
Notes
- Formula validation. If `group` is not `NULL`, it must satisfy `inherits(group, "formula")` and have `length(group) == 2L` (one-sided formulas have length 2: `~` plus RHS). Violations raise: `gdpar_abort("Argument 'group' must be a one-sided formula such as ~ species.", class = "gdpar_input_error")`
- Single-variable check. `all.vars(group)` must yield exactly one variable. If zero or more than one are found, an error of class `"gdpar_input_error"` is raised, with `data = list(variables_found = vars)` attached for programmatic inspection.
- Column existence. The extracted `var_name` must appear in `colnames(data)`; otherwise an error of class `"gdpar_input_error"` is raised with `data = list(missing_variable = var_name)`.
- Length consistency. `length(raw)` must equal `n`. A mismatch raises an error of class `"gdpar_input_error"` reporting both lengths.
- Missing values. Any `NA` in `raw` is fatal: an error of class `"gdpar_input_error"` is raised reporting the count of `NA` values via `sum(is.na(raw))`. The message instructs the user to remove or impute before fitting.
- Zero levels. After `as.factor`, if `nlevels(fac) < 1L` (defensive guard; under normal R semantics this cannot occur for a non-`NA` vector), an error of class `"gdpar_input_error"` is raised.
- Small-group warning. When `isTRUE(verbose)` and `length(counts) > 0L` and `min(counts) < 5L`, a warning is emitted via `gdpar_warn` with class `"gdpar_grouping_warning"`. The message includes `var_name`, `min(counts)`, and `J_groups`. The warning is purely informational (not an error) and does not abort resolution.
- Coercion semantics. The raw variable is coerced via `as.factor(raw)`, so numeric grouping variables are treated by their printed values; `factor` inputs retain their level order; `character` inputs are sorted alphabetically by `as.factor`.
- No side effects beyond the optional warning. The function does not modify `data` or any global state.
- Not exported. Marked `@keywords internal` and `@noRd`; intended solely for internal use within the package's fitting pipeline.
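The validation sequence above can be sketched end to end in base R. `resolve_group_sketch` is a hypothetical name; plain `stop()` and `warning()` stand in for `gdpar_abort()` and `gdpar_warn()`, and the defensive zero-level guard is omitted.

```r
# Hypothetical sketch of the group resolution steps; not the package code.
resolve_group_sketch <- function(group, data, n, verbose = TRUE) {
  if (is.null(group)) return(NULL)
  if (!inherits(group, "formula") || length(group) != 2L) {
    stop("Argument 'group' must be a one-sided formula such as ~ species.")
  }
  vars <- all.vars(group)
  if (length(vars) != 1L) stop("'group' must reference exactly one variable")
  var_name <- vars[[1L]]
  if (!var_name %in% colnames(data)) stop("variable not found: ", var_name)
  raw <- data[[var_name]]
  if (length(raw) != n) stop("length mismatch: ", length(raw), " vs ", n)
  if (anyNA(raw)) stop(sum(is.na(raw)), " NA value(s) in grouping variable")
  fac <- as.factor(raw)
  counts <- table(fac)
  if (isTRUE(verbose) && length(counts) > 0L && min(counts) < 5L) {
    warning("smallest group in '", var_name, "' has ", min(counts),
            " observation(s)")
  }
  list(group_id = as.integer(fac), var_name = var_name, levels = levels(fac))
}

d <- data.frame(species = c("b", "a", "b"))
g <- resolve_group_sketch(~ species, d, n = 3L, verbose = FALSE)
g_null <- resolve_group_sketch(NULL, d, n = 3L)
```

With character input, `"a"` maps to 1 and `"b"` to 2 regardless of row order, which is the alphabetical level ordering described under coercion semantics.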
Purpose
Internal helper that enables the standalone log_prob, grad_log_prob, and optionally hessian methods on a CmdStanFit / CmdStanMCMC object. It is called by gdpar_geom_bridge() to prepare a fitted cmdstan object for consumption by the geometry engine. The call is idempotent: init_model_methods() is a no-op when the methods are already attached.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `CmdStanMCMC` or `CmdStanFit` | A cmdstan fit object whose compiled executable should have the standalone methods exposed. |
| `hessian` | logical (length 1) | Whether to request compilation of the standalone Hessian (needed by the Riemannian / SoftAbs geometry level). Defaults to `TRUE`. |
| `seed` | integer-coercible (length 1) | Forwarded to `fit$init_model_methods(seed = ...)`. Seeds any internal RNG inside the standalone methods. Defaults to `1L`. |
Mathematics
None; this is a plumbing / compilation-gating function.
Algorithm (two-stage fallback)
1. Attempt `fit$init_model_methods(seed, verbose = FALSE, hessian = TRUE)`.
2. If that succeeds, return `list(has_hessian = TRUE)`.
3. If `hessian = TRUE` was requested and step 1 failed, retry with `hessian = FALSE`.
   - If the retry succeeds, emit a `gdpar_geometry_warning` via `gdpar_warn()` and return `list(has_hessian = FALSE)`.
   - This fallback leaves the Euclidean, dense, and sub-Riemannian (expected-Fisher) levels fully operational; only the Riemannian SoftAbs level is disabled.
4. If both attempts fail (or `hessian = FALSE` was requested and the single attempt failed), call `gdpar_abort()` with class `"gdpar_input_error"`, surfacing the original error message. A C++ toolchain capable of compiling the model methods is required.
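The two-stage fallback can be sketched without cmdstanr by passing in a stand-in initializer. `enable_methods_sketch` and `init_fun` are hypothetical; in this sketch the first attempt uses the requested `hessian` value (an assumption about how the requested flag is threaded through), and plain `warning()` and `stop()` replace `gdpar_warn()` and `gdpar_abort()`.

```r
# Hypothetical sketch of the two-stage fallback; init_fun stands in for
# fit$init_model_methods().
enable_methods_sketch <- function(init_fun, hessian = TRUE) {
  # Convert errors to condition objects so the caller can decide what to do
  try_init <- function(h) {
    tryCatch({ init_fun(hessian = h); NULL }, error = function(e) e)
  }
  err <- try_init(hessian)
  if (is.null(err)) return(invisible(list(has_hessian = hessian)))
  if (hessian) {
    err_retry <- try_init(FALSE)
    if (is.null(err_retry)) {
      warning("Hessian method unavailable; continuing without it")
      return(invisible(list(has_hessian = FALSE)))
    }
  }
  stop("could not enable model methods: ", conditionMessage(err))
}

# Stand-in initializer that fails only when the Hessian is requested
fake_init <- function(hessian) {
  if (hessian) stop("no hessian") else invisible(NULL)
}
always_fail <- function(hessian) stop("no toolchain")

res_fallback <- suppressWarnings(enable_methods_sketch(fake_init, hessian = TRUE))
res_plain    <- enable_methods_sketch(fake_init, hessian = FALSE)
res_fatal    <- tryCatch(enable_methods_sketch(always_fail), error = function(e) "fatal")
```

Only the Hessian failure is tolerated; a failure of the second attempt surfaces the original error message.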
Returns
invisible(list(has_hessian = logical(1))) — a single-element list recording whether the Hessian method is available. Returned invisibly.
Notes
- The `tryCatch` / `try_init` wrapper converts errors to condition objects rather than propagating them, enabling the two-stage fallback.
- A failure to expose `log_prob` / `grad_log_prob` at all is fatal (`gdpar_abort`); only Hessian failure is tolerated.
- Side effects: may emit a `gdpar_geometry_warning` via `gdpar_warn()` on Hessian fallback.
Purpose
Internal helper that extracts the unconstrained-dimension and a posterior-mean warm-start vector from a cmdstan fit with methods already enabled. The geometry engine operates on the unconstrained (real-vector) scale, so this provides both dim (the dimension of that space) and reference (the posterior mean in that space, used as a warm-start for position-dependent levels).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `CmdStanMCMC` or `CmdStanFit` | A methods-enabled cmdstan fit object. Must support `$unconstrain_draws()`. |
Mathematics
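Given the matrix of unconstrained draws $U \in \mathbb{R}^{S \times d}$ (one row per draw), the function returns $\text{dim} = d$ and $\text{reference}_j = \frac{1}{S} \sum_{s=1}^{S} U_{sj}$ for $j = 1, \dots, d$.
That is, the reference is the column-wise mean of the unconstrained draws (the posterior mean on the unconstrained scale).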
Returns
A named list with two elements:
| Element | Type | Meaning |
|---|---|---|
| `dim` | integer (scalar) | Number of columns of the unconstrained draws matrix. |
| `reference` | numeric vector (length `dim`) | Posterior mean on the unconstrained scale. |
Notes
- Calls `posterior::as_draws_matrix(fit$unconstrain_draws())` then coerces to a base `matrix` via `as.matrix()`.
- If `fit$unconstrain_draws()` errors, `gdpar_abort()` is called with class `"gdpar_input_error"`, forwarding the original message.
- The `reference` vector is `unname()`-d to strip any column names.
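On a plain matrix, the summary reduces to a column count and a column-wise mean. The sketch below uses a hand-made matrix in place of `fit$unconstrain_draws()`; `unconstrained_summary_sketch` is a hypothetical name.

```r
# Hypothetical sketch: a plain matrix stands in for the unconstrained draws.
unconstrained_summary_sketch <- function(draws) {
  draws <- as.matrix(draws)
  list(
    dim = ncol(draws),                     # unconstrained dimension
    reference = unname(colMeans(draws))    # posterior mean, names stripped
  )
}

# Two draws (rows) of a two-dimensional unconstrained parameter
U <- matrix(c(1, 3, 10, 20), nrow = 2, dimnames = list(NULL, c("a", "b")))
s <- unconstrained_summary_sketch(U)
```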
.gdpar_geom_bridge_core(model, stan_data, dim, fisher = NULL, reference = NULL, engine_fit = NULL, has_hessian = TRUE, extra = list())
Purpose
Internal factory that assembles the gdpar_geom_bridge list object from its constituent parts. Called by both gdpar_geom_bridge() (the post-hoc bridge) and gdpar_geom_fit() (the one-call entry). It constructs the target (for re-sampling diagnostics) and geom_target (for the geometry engine) components and attaches metadata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `model` | `CmdStanModel` | A compiled cmdstan model (with methods) that can be re-sampled. Used for the diagnostic `target`. |
| `stan_data` | named list | The Stan data list for the model. |
| `dim` | integer-coercible (scalar) | The unconstrained dimension $d$. |
| `fisher` | function or `NULL` | An optional expected-Fisher function. Must be `NULL` or a function; validated at entry. |
| `reference` | numeric vector or `NULL` | The unconstrained reference position (warm-start). Can be `NULL` if no warm-start is available. |
| `engine_fit` | `CmdStanMCMC` / `CmdStanFit` or `NULL` | A methods-enabled cmdstan fit to back `geom_target`. When `NULL`, the `geom_target` is derived from `model` instead (a cheap one-iteration sample would expose the methods). |
| `has_hessian` | logical (scalar) | Whether the Hessian method is available. When `FALSE`, the `$hessian` closure in `geom_target` is set to `NULL` so that the SoftAbs level degrades gracefully instead of erroring at call time. |
| `extra` | named list | Additional elements to merge into the returned object (e.g., extra metadata). |
Returns
An object of class c("gdpar_geom_bridge", "list") containing:
| Element | Type / Class | Meaning |
|---|---|---|
| `target` | list with `$model`, `$dim`, `$data` | The re-samplable diagnostic target. |
| `geom_target` | `gdpar_geom_target` object | The engine sampling target on the unconstrained scale (exposes `$log_prob`, `$grad_log_prob`, and optionally `$hessian`). |
| `fisher` | function or `NULL` | The expected-Fisher function, if supplied. |
| `reference` | numeric vector or `NULL` | Warm-start position. |
| `dim` | integer | The unconstrained dimension. |
| `model` | `CmdStanModel` | The methods-enabled compiled model. |
| `stan_data` | named list | The Stan data list. |
| `...` | any | Any extra elements from the `extra` argument. |
Notes
- `fisher` is validated: if non-`NULL` and not a function, `gdpar_abort()` is called with class `"gdpar_input_error"`.
- `geom_target` is created by calling `gdpar_geom_target(object = src, dim = dim, data = stan_data)` where `src` is `engine_fit` when non-`NULL`, otherwise `model`.
- When `has_hessian` is `FALSE`, `geom_target$hessian` is explicitly set to `NULL` so that downstream consumers (the SoftAbs level) degrade gracefully.
- The `dim` argument is coerced to integer in both the `target` sub-list and the top-level object.
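A rough sketch of the assembly step, under several assumptions: `bridge_core_sketch` is a hypothetical name, the `geom_target` is passed in as a plain list instead of being built by `gdpar_geom_target()`, and `extra` is merged by simple concatenation, which the source does not confirm.

```r
# Hypothetical sketch of the bridge assembly; plain lists stand in for
# cmdstanr objects and for the gdpar_geom_target object.
bridge_core_sketch <- function(model, stan_data, dim, fisher = NULL,
                               reference = NULL, geom_target = list(),
                               has_hessian = TRUE, extra = list()) {
  if (!is.null(fisher) && !is.function(fisher)) {
    stop("'fisher' must be NULL or a function")
  }
  dim <- as.integer(dim)
  if (!has_hessian) {
    # list(NULL) keeps the element present but NULL; `$<-` would drop it
    geom_target["hessian"] <- list(NULL)
  }
  out <- c(
    list(target = list(model = model, dim = dim, data = stan_data),
         geom_target = geom_target, fisher = fisher, reference = reference,
         dim = dim, model = model, stan_data = stan_data),
    extra  # assumption: extras are appended after the core elements
  )
  class(out) <- c("gdpar_geom_bridge", "list")
  out
}

b <- bridge_core_sketch(
  model = "model-placeholder", stan_data = list(N = 10L), dim = 3,
  geom_target = list(log_prob = identity, hessian = identity),
  has_hessian = FALSE, extra = list(note = "x")
)
```

Either way of nulling the Hessian makes `b$geom_target$hessian` evaluate to `NULL`; the `list(NULL)` form used here additionally keeps the name in the list.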
Purpose
Exported function. The durable, path-agnostic core of the Block RG integration (RG.6 part ii). Takes an already-fitted gdpar_fit object, enables standalone log-probability / gradient / Hessian methods on its cmdstan fit, derives the unconstrained dimension and posterior-mean warm-start, recompiles a re-samplable CmdStanModel from the fit's own Stan source, and returns a gdpar_geom_bridge object consumable by gdpar_geom_orchestrate(). It never touches the gdpar() default fit path, so the golden tests remain bit-identical.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A fitted gdpar model (result of `gdpar()`). Must carry `$fit` (a `CmdStanMCMC` / `CmdStanFit`) and `$stan_data`. |
| `fisher` | function or `NULL` | Optional function of the unconstrained position (the expected-Fisher function); see `gdpar_geom_fisher_simulator()` when no closed form exists. |
| `reference` | numeric vector or `NULL` | Optional unconstrained reference position. Defaults to the posterior mean extracted from the fit's draws. If supplied, must have length equal to the unconstrained dimension $d$. |
| `hessian` | logical (length 1) | Whether to compile the standalone Hessian method (needed by the Riemannian SoftAbs level). Defaults to `TRUE`. |
| `methods_seed` | integer (length 1) | Seed forwarded to `init_model_methods()`. Defaults to `1L`. |
| `...` | - | Reserved for future extension; currently unused. |
Algorithm
1. Input validation: verify `object` inherits from `"gdpar_fit"`, that `$fit` is a `CmdStanMCMC` / `CmdStanFit`, that `$stan_data` is non-`NULL`, that `reference` (if non-`NULL`) is numeric.
2. Dependency check: call `require_suggested("cmdstanr", ...)` and `require_suggested("posterior", ...)`.
3. Enable methods: call `.gdpar_geom_enable_methods(csfit, hessian, seed)` on the fitted cmdstan object. This returns `meth` with `meth$has_hessian`.
4. Unconstrained summary: call `.gdpar_geom_unconstrained_summary(csfit)` to obtain the dimension $d$ and the posterior-mean reference. If `reference` was `NULL`, use the posterior mean; otherwise validate its length equals $d$.
5. Recompile model: call `cmdstanr::cmdstan_model(cmdstanr::write_stan_file(csfit$code()), compile_model_methods = TRUE)` to get a fresh, re-samplable `CmdStanModel` from the fit's Stan source. Cmdstanr's content-hash cache makes this a cache hit when the methods variant already exists.
6. Assemble bridge: call `.gdpar_geom_bridge_core(model, stan_data, d, fisher, reference, engine_fit = csfit, has_hessian = meth$has_hessian)`.
Returns
An object of class c("gdpar_geom_bridge", "list") — see .gdpar_geom_bridge_core() for the full structure. Feed (target, geom_target, fisher, reference) to gdpar_geom_orchestrate().
Notes
- Path-agnostic: works for K-individual, multi-coordinate, and single-coordinate `gdpar_fit` objects, provided they carry `$fit` and `$stan_data`.
- Uses the fit's own `$code()` Stan source for recompilation, guaranteeing model identity.
- Cmdstanr's hash-based compilation cache means recompilation is typically a no-op cache hit.
- Errors at any validation or compilation step raise `gdpar_abort()` with class `"gdpar_input_error"`.
- This function is the tool that RG.7 points at the real Tweedie count of benchmark 9.2.O.
Purpose
S3 print method for objects of class "gdpar_geom_bridge". Provides a concise, human-readable summary of the bridge's key properties.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_geom_bridge` | The bridge object to print. |
| `...` | - | Unused; present for S3 method compatibility. |
Output (to console)
<gdpar_geom_bridge>
unconstrained dim: <d>
fisher supplied: <TRUE/FALSE>
reference: <description>
feed (target, geom_target, fisher, reference) to gdpar_geom_orchestrate()
Where <description> is:
- `"zeros (default)"` when `x$reference` is `NULL`.
- `"length <n> (warm-start)"` when `x$reference` is non-`NULL`, with `<n>` = `length(x$reference)`.
Returns
invisible(x) — the input object, unchanged.
Notes
Pure side-effect (console output). No validation beyond what cat() and is.null() provide.
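A sketch of a print routine that produces output of the documented shape. `print_bridge_sketch` is a hypothetical name (defined as an ordinary function here rather than registered as an S3 method), and the exact spacing of the real output is an assumption.

```r
# Hypothetical sketch of the print logic; spacing is approximate.
print_bridge_sketch <- function(x, ...) {
  ref <- if (is.null(x$reference)) {
    "zeros (default)"
  } else {
    sprintf("length %d (warm-start)", length(x$reference))
  }
  cat("<gdpar_geom_bridge>\n")
  cat("  unconstrained dim: ", x$dim, "\n", sep = "")
  cat("  fisher supplied: ", !is.null(x$fisher), "\n", sep = "")
  cat("  reference: ", ref, "\n", sep = "")
  cat("  feed (target, geom_target, fisher, reference) to gdpar_geom_orchestrate()\n")
  invisible(x)
}

bridge <- list(dim = 2L, fisher = NULL, reference = c(0.1, -0.3))
out <- capture.output(res <- print_bridge_sketch(bridge))
```

Returning `invisible(x)` keeps the function usable in pipelines without printing the object a second time.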
Purpose
Internal helper used by `gdpar_geom_fit()` to resolve the front-end inputs for the K-individual (distributional) model-building path. It reuses the same resolution logic as `gdpar()` without touching `gdpar()` itself, and returns the arguments that `.gdpar_K_build()` needs. Scoped to $K > 1$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `formula` | `gdpar_formula_set`, `formula`, or other | A `gdpar_bf()` formula set, a classic formula with AMM wrapper calls on the RHS, or (with a named `amm` list) a two-sided `y ~ ...` formula. |
| `family` | `gdpar_family`, `gdpar_family_multi`, or named list of `gdpar_family` | The distribution family. A single `gdpar_family` is broadcast across all K slots; a named list enables heterogeneous families per slot. |
| `amm` | `amm_spec` (default) or named list of `amm_spec` | The additive multiresolution specification. When `formula` is a `gdpar_formula_set`, `amm` must stay at its default. When a named list of `amm_spec`, it is used directly. |
| `W` | modulating basis or `NULL` | Optional modulating basis for the K-individual paths. |
| `data` | data frame | The data (not directly used in resolution but part of the interface contract). |
Algorithm / Resolution Logic
The function handles three mutually exclusive input patterns:
Case 1 — formula is a gdpar_formula_set:
- `amm` must be at its default (`amm_spec()`); otherwise error.
- Derive `amm_list_canonical` via `.gdpar_formula_set_to_amm_spec_list(formula, W)`.
- Extract `outcome_name` from `formula$outcome` and `formula_env` from `formula$env`.
Case 2 — amm is a named list of amm_spec:
- Each entry must be a named `amm_spec`; validated in a loop.
- `formula` must be a two-sided formula (`length(formula) == 3L`); otherwise error.
- `amm_list_canonical <- amm`.
- `outcome_name` extracted from `formula[[2L]]`.
- `formula_env <- environment(formula)`.
Case 3 — Classic formula with AMM wrapper calls on the RHS:
- Detected via `.gdpar_rhs_has_amm_calls(formula)`.
- `amm` must be at its default; otherwise error.
- Derive the first eligible parameter name from `family$param_specs[[1]]$name`.
- Build a `gdpar_formula_set` via `do.call(gdpar_formula_set, list(formula))` named with that parameter.
- Derive `amm_list_canonical` from the formula set.
- Extract `outcome_name` and `formula_env` from the formula set.
If none of the three cases applies, gdpar_abort() is called with a message indicating that only the K-individual path is supported (RG.6 scope).
Post-resolution checks:
- `K <- length(amm_list_canonical)`. If $K \leq 1$, abort: `gdpar_geom_fit()` requires $K > 1$.
- Family resolution:
  - If `family` is a named list of `gdpar_family` (detected by checking it's a list, not a `gdpar_family` or `gdpar_family_multi`, has non-empty unique names, and every element inherits from `"gdpar_family"`): call `.gdpar_resolve_heterogeneous_family_K(family, names(amm_list_canonical))` to get `family_promoted` and `family_id_k_vector`.
  - Otherwise: call `.gdpar_promote_scope_per_observation(family, names(amm_list_canonical))` to get a single `family_promoted`; `family_id_k_vector` is `NULL`.
Returns
A named list:
| Element | Type | Meaning |
|---|---|---|
| `amm_list_canonical` | named list of `amm_spec` | Canonical AMM specification list (length $K$). |
| `family` | `gdpar_family` or `gdpar_family_multi` | The resolved (possibly promoted) family. |
| `outcome_name` | character (length 1) | Name of the response variable. |
| `formula_env` | environment | The environment associated with the formula (for evaluating terms). |
| `family_id_k_vector` | integer vector or `NULL` | Per-slot family index for heterogeneous family lists; `NULL` when a single family is broadcast. |
| `K` | integer (scalar) | Number of distributional slots ($K > 1$). |
Notes
- This function is the shared seam between `gdpar_geom_fit()` and `gdpar()`'s internal K path: it reuses `.gdpar_K_build()` as the single source of model assembly.
- The `family_is_named_list` detection checks: `is.list(family)`, not inheriting from `"gdpar_family"` or `"gdpar_family_multi"`, non-`NULL` names, all names non-empty (`nzchar`), no duplicated names, and every element inheriting from `"gdpar_family"`.
- All error paths use `gdpar_abort()` with class `"gdpar_input_error"`.
- The error for $K \leq 1$ includes `data = list(K = K)` for programmatic access to the slot count.
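The `family_is_named_list` detection listed above translates almost directly into a base R predicate. `family_is_named_list_sketch` is a hypothetical name, and empty classed lists stand in for real `gdpar_family` objects.

```r
# Hypothetical sketch of the named-list-of-families detection.
family_is_named_list_sketch <- function(family) {
  nms <- names(family)
  is.list(family) &&
    !inherits(family, c("gdpar_family", "gdpar_family_multi")) &&
    !is.null(nms) &&
    all(nzchar(nms)) &&
    !anyDuplicated(nms) &&
    all(vapply(family, inherits, logical(1), what = "gdpar_family"))
}

# Empty classed list used as a stand-in for a real family object
fam <- structure(list(), class = "gdpar_family")

is_hetero   <- family_is_named_list_sketch(list(mu = fam, sigma = fam))
is_single   <- family_is_named_list_sketch(fam)
is_unnamed  <- family_is_named_list_sketch(list(fam, fam))
is_dup_name <- family_is_named_list_sketch(list(mu = fam, mu = fam))
```

Short-circuit `&&` matters here: the name checks run only after `names(family)` is known to be non-`NULL`.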
gdpar_geom_fit(formula, family, amm, W, data, prior, anchor, skip_id_check, parametrization, parametrization_a, parametrization_W, id_check_rigor, group, fisher, budget, criteria, entry_level, level_map, reference, speed, rest_mass, laplace_fallback, laplace_draws, n_grid, seed, verbose, ...)
Purpose
High-level entry point for fitting General Dynamic Parameter (GDP) models using geometry-adaptive optimization (opt-in controller) with reference anchoring. Builds the hierarchical Bayesian model once, then runs the geometry-adaptive controller (an alternative to the default gdpar() fit path). Requires cmdstanr and posterior packages.
Arguments
- `formula`: `formula` – Model formula specifying response and predictors.
- `family`: `gdpar_family` – Distribution family (default: `gdpar_family("gaussian")`).
- `amm`: `amm_spec` – Adaptive Mixture of Multivariate normals specification (default: `amm_spec()`).
- `W`: `matrix` or `NULL` – Weight matrix for the model (default: `NULL`).
- `data`: `data.frame` – Data frame containing variables referenced in `formula`.
- `prior`: `list` or `NULL` – Prior specifications for parameters (default: `NULL`).
- `anchor`: `character` or `numeric` – Anchor for reference (default: `"prior_mean"`).
- `skip_id_check`: `logical` – If `TRUE`, skip identity checks (default: `FALSE`).
- `parametrization`: `character` – One of `"auto"`, `"ncp"` (non-centered), `"cp"` (centered); default `"auto"`.
- `parametrization_a`: `character` or `NULL` – Overrides `parametrization` for `a` parameters; must be `"ncp"` or `"cp"` if set.
- `parametrization_W`: `character` or `NULL` – Overrides `parametrization` for `W` parameters; must be `"ncp"` or `"cp"` if set.
- `id_check_rigor`: `character` – One of `"full"` or `"fast"` (default: `"full"`).
- `group`: `character` or `NULL` – Grouping variable for hierarchical structure (default: `NULL`).
- `fisher`: `matrix`, `list`, or `NULL` – Precomputed Fisher information or specification (default: `NULL`).
- `budget`: `numeric` or `NULL` – Computational budget for optimizer (default: `NULL`).
- `criteria`: `list` or `NULL` – Convergence criteria for optimizer (default: `NULL`).
- `entry_level`: `integer` or `NULL` – Initial level in hierarchy for geometry controller (default: `NULL`).
- `level_map`: `list` or `NULL` – Mapping of levels for geometry controller (default: `NULL`).
- `reference`: `numeric` or `NULL` – Reference point for anchoring (default: `NULL`).
- `speed`: `numeric` – Speed parameter for geometry controller (default: `10`).
- `rest_mass`: `numeric` – Rest mass parameter for geometry controller (default: `1`).
- `laplace_fallback`: `logical` – If `TRUE`, use Laplace approximation as fallback (default: `FALSE`).
- `laplace_draws`: `integer` – Number of draws from Laplace fallback (default: `0L`).
- `n_grid`: `integer` or `NULL` – Grid size for geometry controller (default: `NULL`).
- `seed`: `integer` – Random seed for reproducibility (default: `20260603L`).
- `verbose`: `logical` – If `TRUE`, print progress messages (default: `TRUE`).
- `...`: Additional arguments passed to internal functions.
Mathematics
The function orchestrates a two-stage process:
1. Model building: Constructs a Stan model via `.gdpar_K_build`, yielding a compiled model and data.
2. Geometry-adaptive fitting:
   - A throwaway single-iteration MCMC sample (`chains=1, iter_warmup=1, iter_sampling=1`) extracts the unconstrained parameter dimension $d$ and enables model methods (e.g., Hessian).
   - A bridge object (`.gdpar_geom_bridge_core`) defines the target density $\pi(\theta)$ and geometry-informed target $\tilde{\pi}(\theta)$, optionally using a Fisher information matrix $\mathcal{I}(\theta)$.
   - The geometry controller (`gdpar_geom_orchestrate`) performs adaptive sampling over a hierarchy of levels, using speed $v$ and rest mass $m$ to navigate the parameter space. It may fall back to Laplace approximation $\mathcal{N}(\hat{\theta}, \mathcal{I}(\hat{\theta})^{-1})$ if requested.
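The Laplace fallback distribution $\mathcal{N}(\hat{\theta}, \mathcal{I}(\hat{\theta})^{-1})$ can be sampled directly from the Fisher information through its Cholesky factor, without forming the inverse. The sketch below shows one way to do this in base R; the function name `laplace_draws_sketch()` and its arguments are assumptions for illustration, and the package's own fallback may be implemented differently.

```r
# Hypothetical sketch: draw from N(theta_hat, fisher^{-1}) via Cholesky.
laplace_draws_sketch <- function(theta_hat, fisher, n_draws, seed = 1L) {
  set.seed(seed)
  d <- length(theta_hat)
  # chol() returns upper-triangular R with t(R) %*% R == fisher.
  R <- chol(fisher)
  # Standard-normal noise, one column per draw.
  z <- matrix(stats::rnorm(n_draws * d), nrow = d)
  # Solving R x = z gives x with covariance R^{-1} R^{-T} = fisher^{-1}.
  x <- backsolve(R, z)
  # Shift by the mode and return draws as rows (samples x parameters).
  t(x + theta_hat)
}

draws <- laplace_draws_sketch(c(1, -2), diag(c(4, 1)), n_draws = 1000L)
dim(draws)  # 1000 2
```

Working with the Cholesky factor of the information matrix keeps the computation to one factorisation and a triangular solve, which is usually more stable than inverting $\mathcal{I}(\hat{\theta})$ explicitly.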
Returns
An object of class gdpar_geom_fit (inheriting from list) with elements:
- `orchestration`: Output from `gdpar_geom_orchestrate`.
- `bridge`: Bridge object from `.gdpar_geom_bridge_core`.
- `status`: Character scalar – optimization status (e.g., `"resolved"`).
- `level`: Numeric scalar – level at which resolution occurred.
- `metric`: List – geometry metric used.
- `draws`: Matrix – posterior draws (samples × parameters).
- `laplace`: List – Laplace fallback results (if used).
- `certificate`: List – verification certificate (if applicable).
- `stan_data`: List – data passed to Stan.
- `family`: `gdpar_family` – family object.
- `K`: Integer – number of distributional slots.
- `slot_names`: Character vector – model slot names.
- `call`: Matched call.
Notes
- Requires packages `cmdstanr` and `posterior`; raises an error if either is missing.
- The throwaway sample discards all draws; its purpose is only to initialize methods and compute $d$.
- `parametrization="auto"` lets internal logic choose between non-centered (`ncp`) and centered (`cp`) parametrization.
- The geometry controller adapts over levels; `entry_level` and `level_map` control this hierarchy.
- If `laplace_fallback=TRUE`, uses `laplace_draws` posterior samples from the Laplace approximation.
- `seed` ensures reproducibility of the geometry controller's internal sampling.
- The printed message states that this function never alters the default `gdpar()` fit path.
Purpose
S3 print method for gdpar_geom_fit objects. Provides a concise summary of the fit.
Arguments
- `x`: `gdpar_geom_fit` – Object to print.
- `...`: Additional arguments (currently unused).
Mathematics
None.
Returns
Invisibly returns the input object x.
Notes
- Dispatches automatically for objects of class `gdpar_geom_fit`.
- Output varies based on `x$status`:
  - If `status == "resolved"`, prints the level and draws dimensions (if draws exist).
  - Otherwise, prints verdict and prescription count from `x$certificate` (if present).
- If `x$laplace` is not `NULL`, appends the Laplace fit quality label.
- Does not modify the object; side effect is printing to console.
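The branching described in these notes follows the usual S3 print-method pattern. The sketch below is illustrative only: the class name `geom_fit_sketch` and the field names `certificate$verdict`, `certificate$prescriptions`, and `laplace$quality` are hypothetical, since the source does not name the certificate or Laplace fields.

```r
# Hypothetical print method on a stand-in class (not the package's method).
print.geom_fit_sketch <- function(x, ...) {
  cat("Geometry-adaptive fit, status:", x$status, "\n")
  if (identical(x$status, "resolved")) {
    cat("  resolved at level", x$level, "\n")
    if (!is.null(x$draws)) {
      cat("  draws:", nrow(x$draws), "x", ncol(x$draws), "\n")
    }
  } else if (!is.null(x$certificate)) {
    # Field names below are assumptions for illustration.
    cat("  verdict:", x$certificate$verdict, "\n")
    cat("  prescriptions:", length(x$certificate$prescriptions), "\n")
  }
  if (!is.null(x$laplace)) {
    cat("  Laplace fit quality:", x$laplace$quality, "\n")  # assumed field
  }
  invisible(x)  # print methods return their input invisibly
}

fit <- structure(
  list(status = "resolved", level = 2, draws = matrix(0, 100, 3)),
  class = c("geom_fit_sketch", "list")
)
print(fit)
```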
Purpose
Returns the default decision thresholds for the rule-based classifier inside gdpar_geometry_diagnostic. These are exposed as data rather than hard-coded so that the calibration of Block RG.1 can be tuned against gdpar_geometry_suite and so users can re-calibrate for their own settings.
Arguments
None.
Returns
A named list of 13 numeric scalars:
| Element | Default | Role |
|---|---|---|
| `divergent_rate_high` | 0.01 | Upper tolerable divergence rate before a pathology is flagged. |
| `funnel_ebfmi_low` | 0.35 | Minimum E-BFMI below which a funnel-type energy pathology is suspected. |
| `heavy_cond_max` | 25 | Maximum tolerable condition number for the heavy-tail class. |
| `treedepth_sat_high` | 0.20 | Upper tolerable fraction of iterations hitting `max_treedepth`. |
| `condition_high` | 12 | Condition-number cut above which anisotropy is flagged. |
| `step_scale_ratio_low` | 0.10 | Lower cut on the adapted-step-to-scale ratio. |
| `nslope_grows` | 0.80 | Slope threshold for the difficulty-vs-$n$ curve. |
| `flat_var_high` | 600 | Variance cut above which a flat-direction pathology is flagged. |
| `boundary_prox_high` | 0.02 | Fraction of draws within `boundary_eps` of a boundary above which boundary pile-up is flagged. |
| `boundary_eps` | 0.01 | Distance defining proximity to a parameter boundary. |
| `multimodal_high` | 2.5 | Cut on the multimodality signal above which multimodality is flagged. |
| `heavy_kurtosis_high` | 1.8 | Excess-kurtosis cut above which heavy tails are flagged. |
| `target_ess` | 400 | Target effective sample size used for cost extrapolation. |
Notes
- The defaults were calibrated in sub-phase RG.1.c (session B9.23) against a synthetic suite over a difficulty × pilot-budget × replica grid. Thresholds were proposed by a data-driven Youden cut, regularised to interpretable values on a calibration fold, and validated out-of-sample on a held-out fold (held-out balanced accuracy rose from 0.60 to 0.89).
- Versus the initial hand-set values, several cuts were adjusted to compensate for signal attenuation in the short pilots: `condition_high` 50 → 12, `heavy_kurtosis_high` 3 → 1.8, `boundary_prox_high` 0.10 → 0.02, `nslope_grows` 0.30 → 0.80, `funnel_ebfmi_low` 0.25 → 0.35, `heavy_cond_max` 8 → 25, `flat_var_high` 1000 → 600, `multimodal_high` 2 → 2.5.
- The calibration is against an idealised synthetic suite and is not claimed to be optimal on real posteriors. The funnel and heavy-tail classes remain mutually confusable (per-class recall ≈ 0.6–0.7).
- No side effects; no errors raised; pure data factory.
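Because the thresholds are plain data, re-calibration amounts to editing a list. The sketch below reproduces the documented defaults as a literal list so that it runs without the package (in practice the list would come from `gdpar_geometry_thresholds()`), then overrides two entries with `utils::modifyList()`. The override values are arbitrary examples, not recommendations.

```r
# Documented defaults, copied from the table in this section.
default_thresholds <- list(
  divergent_rate_high  = 0.01,
  funnel_ebfmi_low     = 0.35,
  heavy_cond_max       = 25,
  treedepth_sat_high   = 0.20,
  condition_high       = 12,
  step_scale_ratio_low = 0.10,
  nslope_grows         = 0.80,
  flat_var_high        = 600,
  boundary_prox_high   = 0.02,
  boundary_eps         = 0.01,
  multimodal_high      = 2.5,
  heavy_kurtosis_high  = 1.8,
  target_ess           = 400
)

# Override selected cuts; all other entries keep their defaults.
custom <- utils::modifyList(
  default_thresholds,
  list(condition_high = 20, target_ess = 1000)
)

custom$condition_high  # 20
length(custom)         # 13
```

Starting from the full default list and modifying it keeps all 13 fields present, which matters if the classifier reads every threshold by name.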
gdpar_geometry_diagnostic(target, n_grid = NULL, difficulty = NULL, pilot_warmup = 150L, pilot_sampling = 150L, chains = 4L, adapt_delta = 0.8, max_treedepth = 10L, seed = 20260602L, thresholds = NULL, verbose = TRUE, ...)
Purpose
Top-level exported entry point of the geometry-diagnostic subsystem. It is an opt-in forensic probe: it runs a sequence of short pilot Stan fits at one or more data sizes and returns an object of class `gdpar_geometry_diagnostic`. It does not modify any fit.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `target` | various (see `.gdpar_geom_normalize_target`) | The posterior target to probe: a `gdpar_geometry_target` suite object, a `list(type = "gdpar", formula, amm, data)` gdpar specification, a `list(stan_code, stan_data[, data_n_fn, bounds])` raw Stan target, or a `list(model, data[, data_n_fn, bounds])` with a compiled `CmdStanModel`. |
| `n_grid` | numeric / `NULL` | Grid of data sizes at which to run pilots. If `NULL`, the default grid supplied by the normalised target is used. Coerced to numeric, then sorted (`sort`) and de-duplicated (`unique`). |
| `difficulty` | various / `NULL` | Difficulty level forwarded to a suite target's `make(n, diff)`. Ignored for non-suite targets. |
| `pilot_warmup` | integer (count) | Warmup iterations per pilot fit. Asserted to be a strictly positive integer via `assert_count`. |
| `pilot_sampling` | integer (count) | Sampling (post-warmup) iterations per pilot fit. Asserted via `assert_count`. |
| `chains` | integer (count) | Number of Markov chains per pilot. Asserted via `assert_count`. |
| `adapt_delta` | numeric scalar in $(0, 1)$ | Stan `adapt_delta`. Asserted via `assert_numeric_scalar` with `lower = 0`, `upper = 1`. |
| `max_treedepth` | integer (count) | Stan `max_treedepth`. Asserted via `assert_count`. |
| `seed` | integer (count) | Base RNG seed. Pilot $i$ uses `seed + i`. Asserted via `assert_count`. |
| `thresholds` | list / `NULL` | Classification thresholds. If `NULL`, defaults are obtained from `gdpar_geometry_thresholds()`. |
| `verbose` | logical scalar | If `TRUE`, emits an informational `gdpar_optin_message` describing the probe before running. Manually validated (must be length-1 logical). |
| `...` | forwarded | Extra arguments passed to `.gdpar_geom_run_pilot` (and thence to the Stan sampler / target `make`). |
Mathematics
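For each element $n_i$ of `n_grid` ($i = 1, \dots, K$), one pilot is run via `.gdpar_geom_run_pilot` at size $n_i$ with seed `seed + i`.
Each pilot yields a `signals` sub-list; these are stacked row-wise into a data frame `signals`, one row per pilot.
A difficulty curve is then estimated by `.gdpar_geom_difficulty_curve` from `signals` and `thresholds`.
Classification uses the largest pilot's signals (`pilot_max = pilots[[K]]`), except that multimodality is aggregated as the maximum over all pilots (with `na.rm = TRUE`).
The classification `cls` and the cost are then produced by `.gdpar_geom_classify` and `.gdpar_geom_cost`, respectively.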
Returns
A list with S3 class c("gdpar_geometry_diagnostic", "list") and elements:
| Element | Content |
|---|---|
| `pathology` | Classified pathology string (from `cls$pathology`). |
| `confidence` | Classification confidence (from `cls$confidence`). |
| `recommended_geometry` | Recommended remedy (from `cls$remedy`). |
| `geometry_level` | Recommended sampler level (from `cls$level`). |
| `signals` | Data frame of stacked per-pilot signals. |
| `difficulty_curve` | Result of `.gdpar_geom_difficulty_curve`. |
| `culprit` | Culprit information from the largest pilot (`pilot_max$culprit`). |
| `cost` | Result of `.gdpar_geom_cost`. |
| `rule_trace` | Trace from the classifier (`cls$trace`). |
| `reproducibility` | A list: `seed`, `n_grid`, `controls` (warmup/sampling/chains/adapt_delta/max_treedepth), `target_id` (`norm$meta$id`), `gdpar_version` (`utils::packageVersion("gdpar")` as character), `cmdstan_version` (`cmdstanr::cmdstan_version()` wrapped in `tryCatch`, yielding `NA_character_` on error). |
| `ground_truth` | Present only if `norm$meta$ground_truth` is non-`NULL`; copied verbatim. |
| `correct` | Present only if ground truth exists; `identical(cls$pathology, norm$meta$ground_truth$pathology)`. |
Notes
- Input validation order: `assert_count` is called on `pilot_warmup`, `pilot_sampling`, `chains`, `max_treedepth`, and `seed`; `assert_numeric_scalar` on `adapt_delta` (bounds $(0,1)$); `verbose` is checked manually (length-1 logical, else `gdpar_abort` with class `gdpar_input_error`).
- Suggested-package gating: `require_suggested("cmdstanr", ...)` and `require_suggested("posterior", ...)` are called before any work begins.
- Threshold defaulting: when `thresholds` is `NULL`, `gdpar_geometry_thresholds()` is called exactly once.
- Grid coercion: `n_grid` (whether user-supplied or defaulted) is passed through `sort(unique(as.numeric(n_grid)))`, so non-numeric input will be coerced (and may error inside `as.numeric`), duplicates are removed, and order is ascending.
- Verbose message: the informational message reports `length(n_grid)` pilot fits, `length(n_grid)` sizes, `chains` chains, and `pilot_warmup + pilot_sampling` iterations. It is emitted with class `gdpar_optin_message` via `gdpar_inform`.
- Multimodality aggregation: `suppressWarnings(max(signals$multimodality, na.rm = TRUE))` is used; if all values are `NA`, this yields `-Inf` (with the warning suppressed).
- Ground-truth comparison: `correct` uses `identical`, so the classified pathology string must match the ground-truth string exactly (including type and encoding).
- No side effects beyond the optional informational message and the Stan fits launched by `.gdpar_geom_run_pilot`.
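Two of the behaviours noted here are plain base-R idioms and can be checked in isolation. The snippet below uses made-up input values to show the grid coercion and the `-Inf` edge case of the multimodality aggregation.

```r
# Grid coercion: integer input with a duplicate and arbitrary order.
n_grid <- c(400L, 100L, 400L, 200L)
n_grid_clean <- sort(unique(as.numeric(n_grid)))
n_grid_clean  # 100 200 400

# Multimodality aggregation when every pilot failed (all NA).
multimodality <- c(NA_real_, NA_real_, NA_real_)
agg <- suppressWarnings(max(multimodality, na.rm = TRUE))
agg             # -Inf
is.finite(agg)  # FALSE

# With at least one finite value, the maximum is returned as usual.
max(c(NA_real_, 0.4, 1.7), na.rm = TRUE)  # 1.7
```

Callers that consume the aggregated multimodality value may therefore want an `is.finite()` check before comparing it against a threshold.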
Purpose
Internal three-way (effectively four-way) adapter that normalises the heterogeneous target argument into a uniform list consumed by the pilot runner. It inspects the class and structure of target and dispatches to one of four branches: suite target, gdpar specification, raw Stan code, or compiled CmdStanModel.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `target` | various | The user-supplied target. See the four recognised forms below. |
| `difficulty` | various / `NULL` | Difficulty level; only consumed by the suite-target branch. |
Mathematics
For the gdpar-specification branch, the default grid is three logarithmically spaced points between a lower bound and the full data size:
where
Returns
A normalised list whose common fields are kind (a string tag), n_grid_default (numeric vector of default grid sizes), and meta (a list with id, bounds, ground_truth). Branch-specific additional fields:
| Branch | `kind` | `n_grid_default` | `meta` | Extra fields |
|---|---|---|---|---|
| `inherits(target, "gdpar_geometry_target")` | `"suite"` | `target$n_grid` | `id = target$id`, `bounds = target$bounds`, `ground_truth = list(pathology, geometry_remedy, culprit, difficulty_scales_with_n)` extracted from the target | `make = function(n) target$make(n, diff)` where `diff = difficulty %\|\|% …` |
| `is.list(target) && identical(target$type, "gdpar")` | `"gdpar"` | Log-spaced 3-point grid (formula above) | `id = "gdpar_spec"`, `bounds = NULL`, `ground_truth = NULL` | `gdpar_target = target`, `n_full = nrow(target$data)` |
| `is.list(target) && !is.null(target$stan_code)` | `"stan"` | `1` | `id = "stan_target"`, `bounds = target$bounds %\|\|% …` | `stan_code`, `data_n_fn` |
| `is.list(target) && (inherits(target$model, "CmdStanModel") \|\| !is.null(target$model))` | `"model"` | `1` | `id = "cmdstan_model"`, `bounds = target$bounds %\|\|% …` | `model`, `data_n_fn` |
Notes
- Branch order matters: the suite-target branch is tested first (via `inherits`), then the gdpar-specification branch (via `identical(target$type, "gdpar")`), then the `stan_code` branch, then the `model` branch. A list that satisfies multiple conditions is dispatched to the earliest matching branch.
- gdpar-specification validation: if `target$data` or `target$formula` is `NULL`, the function aborts with class `gdpar_input_error` and message `"A gdpar target must supply at least 'formula' and 'data'."`.
- Model-branch guard: the condition `inherits(target$model, "CmdStanModel") || !is.null(target$model)` is logically equivalent to `!is.null(target$model)` (since a non-`NULL` value satisfies the second disjunct regardless of class). The `inherits` check is therefore redundant but harmless.
- Unrecognised target: if none of the four branches matches, the function calls `gdpar_abort` with class `gdpar_input_error` and a message listing the four accepted forms.
- `%||%` operator: used for `bounds` and `data_n_fn` defaults; the comment notes it is "defined canonically in `R/preflight_multi.R`".
- `difficulty` for non-suite branches: silently ignored.
- `ground_truth` for non-suite branches: always `NULL`, meaning the returned diagnostic object will not carry `ground_truth` or `correct` fields.
- `data_n_fn` default: for the `stan` and `model` branches, if `target$data_n_fn` is not supplied, a closure is created that ignores its argument `n` and returns the fixed `target$stan_data` (or `target$data`). This means the raw-Stan and compiled-model branches are effectively single-$n$ targets (`n_grid_default = 1`) unless the user supplies a custom `data_n_fn`.
- No side effects; pure transformation (the `make` closure captures `target` and `diff` but is not invoked here).
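The branch order and the `data_n_fn` default can be illustrated with a small stand-alone sketch. The helper `kind_of_target()` is hypothetical and returns only the `kind` tag; the `%||%` definition is the conventional null-coalescing form and is assumed to match the package's.

```r
# Conventional null-coalescing operator (assumed to match the package's).
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical helper: returns only the kind tag, in the documented order.
kind_of_target <- function(target) {
  if (inherits(target, "gdpar_geometry_target")) return("suite")
  if (is.list(target) && identical(target$type, "gdpar")) return("gdpar")
  if (is.list(target) && !is.null(target$stan_code)) return("stan")
  if (is.list(target) && !is.null(target$model)) return("model")
  stop("unrecognised target")
}

target <- list(
  stan_code = "parameters { real x; } model { x ~ normal(0, 1); }",
  stan_data = list(N = 0L)
)

# Default data function: ignores n and returns the fixed data.
data_n_fn <- target$data_n_fn %||% function(n) target$stan_data

kind_of_target(target)                                    # "stan"
kind_of_target(list(type = "gdpar", stan_code = "..."))   # "gdpar" wins
identical(data_n_fn(10), data_n_fn(1000))                 # TRUE
```

The last line shows why such targets behave as single-$n$ targets: the data returned does not change with the knob value.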
Purpose
Top-level pilot runner for the geometry diagnostic. It attempts to draw a posterior sample at a given knob size n_knob, guards against sampling failures, and—on success—delegates to signal extraction and culprit localization. On failure it returns a sentinel structure so downstream code can proceed uniformly.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `norm` | list | A "norm" descriptor of the target model. Must contain `$kind` (character), `$meta$bounds` (parameter bounds), and kind-specific fields consumed by `.gdpar_geom_sample`. |
| `n_knob` | numeric / integer | The knob value (typically data size) at which to run the pilot. |
| `controls` | list | Sampling control parameters. Must include `$max_treedepth`; other fields (`$chains`, `$warmup`, `$sampling`, `$adapt_delta`) are forwarded to the sampler. |
| `seed` | integer | Random seed passed to the sampler. |
| `...` | any | Extra arguments forwarded to `.gdpar_geom_sample` and ultimately to the underlying sampler or `gdpar()`. |
Returns
A list with elements:
| Element | Type | Description |
|---|---|---|
| `failed` | logical | `TRUE` if sampling raised an error; `FALSE` otherwise. |
| `signals` | list | Diagnostic signal list (real signals on success, all-`NA` via `.gdpar_geom_na_signals()` on failure). |
| `culprit` | list | Culprit localization (real on success, empty via `.gdpar_geom_empty_culprit()` on failure). |
| `elapsed` | numeric | Elapsed wall-clock seconds (present only when `failed = FALSE`). |
Notes
- Sampling errors are caught with `tryCatch`. On error, `gdpar_warn()` is called with class `"gdpar_diagnostic_warning"` and a message of the form `"Geometry pilot at n = <n_knob> failed: <message>."`.
- The `format(n_knob)` call in the warning message uses base R's generic `format`, so the rendered string depends on the class of `n_knob`.
- `.gdpar_geom_empty_culprit()` is defined in another section; it is invoked here but not defined here.
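The error-to-sentinel pattern described here can be sketched with base R conditions. In this illustration `run_pilot_sketch()` and its `sample_fn` argument are hypothetical, base `warning()` stands in for `gdpar_warn()`, and the sentinel carries a single `NA` signal rather than the full signal template.

```r
# Hypothetical sketch of the guard: errors become a warning plus a sentinel.
run_pilot_sketch <- function(sample_fn, n_knob) {
  res <- tryCatch(
    sample_fn(n_knob),
    error = function(e) {
      # Stand-in for gdpar_warn(); message format follows the documented one.
      warning(
        sprintf("Geometry pilot at n = %s failed: %s.",
                format(n_knob), conditionMessage(e)),
        call. = FALSE
      )
      NULL
    }
  )
  if (is.null(res)) {
    # Failure sentinel: same shape as a success, minus `elapsed`.
    return(list(
      failed  = TRUE,
      signals = list(condition_number = NA_real_),
      culprit = data.frame(parameter = character(0), mechanism = character(0),
                           score = numeric(0), stringsAsFactors = FALSE)
    ))
  }
  list(failed = FALSE, signals = res$signals,
       culprit = res$culprit, elapsed = res$elapsed)
}

bad <- suppressWarnings(run_pilot_sketch(function(n) stop("boom"), 100))
bad$failed  # TRUE
```

Returning a structurally complete sentinel lets the caller stack pilot results without special-casing failed sizes.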
Purpose
Dispatches posterior sampling according to norm$kind. Supports four model kinds: "gdpar" (delegates to .gdpar_geom_sample_gdpar), "suite" (builds a Stan instance from a generator), "stan" (uses pre-supplied Stan code and a data-size function), and a fallback "model" kind (uses a pre-compiled cmdstanr model object).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `norm` | list | Model descriptor. Fields used depend on `norm$kind` (see below). |
| `n_knob` | numeric / integer | Knob value passed to the data-generating function or used for subsetting. |
| `controls` | list | Must contain `$chains`, `$warmup`, `$sampling`, `$adapt_delta`, `$max_treedepth`. |
| `seed` | integer | Seed for the sampler. |
| `...` | any | Extra arguments forwarded to `mod$sample()` or `gdpar()`. |
Returns
The return value of .gdpar_geom_pack_fit(cs_fit, elapsed): a list with cs_fit, pm, chain_id, and elapsed.
Notes
- `norm$kind == "gdpar"`: immediately returns `.gdpar_geom_sample_gdpar(...)`.
- `norm$kind == "suite"`: calls `norm$make(n_knob)` to obtain a list with `$stan_code` and `$stan_data`; compiles via `.gdpar_geom_compile(code)`.
- `norm$kind == "stan"`: uses `norm$stan_code` directly and calls `norm$data_n_fn(n_knob)` for data; compiles via `.gdpar_geom_compile(code)`.
- Fallback (e.g. `"model"`): uses `norm$model` (a pre-compiled cmdstanr model) and `norm$data_n_fn(n_knob)` for data. No compilation step.
- The `mod$sample()` call sets `refresh = 0`, `show_messages = FALSE`, `show_exceptions = FALSE`, and `parallel_chains = controls$chains` (i.e., all chains run in parallel).
- Elapsed time is measured with `Sys.time()` / `difftime(..., units = "secs")` and coerced to numeric.
Purpose
Specialized sampler for norm$kind == "gdpar". Subsets the full gdpar target dataset to n_knob rows (randomly), assembles an argument list from the stored target, and invokes gdpar() via do.call.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `norm` | list | Must contain `$gdpar_target` (a list of arguments for `gdpar()`, including `$data`) and `$n_full` (integer, total number of data rows). |
| `n_knob` | numeric / integer | Desired data subset size. |
| `controls` | list | Sampling controls (`$warmup`, `$sampling`, `$chains`, `$adapt_delta`, `$max_treedepth`). |
| `seed` | integer | Seed for `gdpar()`. |
| `...` | any | Extra arguments merged into the `gdpar()` call. |
Mathematics
Row indices are drawn uniformly without replacement:
and the data is subsetted as
Returns
The return value of .gdpar_geom_pack_fit(fit$fit, elapsed), where fit is the object returned by gdpar().
Notes
- The argument list is built by copying `norm$gdpar_target` and then overriding: `type` is set to `NULL` (removed), `data` is replaced by the subset, and all control/seed fields are set from `controls` / `seed`.
- `skip_id_check = TRUE` and `verbose = FALSE` are hardcoded into the args, suppressing gdpar's internal identifier validation and verbose output.
- `refresh` is set to `0L`.
- Extra arguments in `...` are appended via `c(args, list(...))` and thus override any same-named fields already in `args`.
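The random row subsetting performed before the `gdpar()` call can be sketched as follows. The helper `subset_rows_sketch()` is hypothetical; it assumes `n_knob` does not exceed the number of rows and uses `set.seed()` for reproducibility, neither of which is confirmed by the source.

```r
# Hypothetical sketch: uniform row subset without replacement.
subset_rows_sketch <- function(data, n_knob, seed) {
  set.seed(seed)  # assumption: how the real function seeds is not documented
  idx <- sample.int(nrow(data), size = n_knob)  # no replacement by default
  data[idx, , drop = FALSE]
}

full <- data.frame(id = 1:50, y = (1:50)^2)
sub <- subset_rows_sketch(full, n_knob = 10, seed = 1L)
nrow(sub)               # 10
anyDuplicated(sub$id)   # 0: each row appears at most once
```

Using `drop = FALSE` keeps the result a data frame even when the input has a single column.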
Purpose
Normalizes a cmdstanr fit object into a compact list containing the posterior draws matrix (excluding lp__), per-draw chain IDs, and elapsed time. This standardized structure is consumed by signal extraction and culprit localization.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cs_fit` | CmdStanMCMC | A cmdstanr fit object returned by `mod$sample()` or `gdpar()$fit`. |
| `elapsed` | numeric | Wall-clock seconds elapsed during sampling. |
Returns
A list:
| Element | Type | Description |
|---|---|---|
| `cs_fit` | CmdStanMCMC | The original fit object (retained for later diagnostic queries). |
| `pm` | numeric matrix | Posterior draws matrix; rows = draws, columns = parameters (excluding `lp__`). |
| `chain_id` | integer vector | Chain identifier for each draw (from `posterior::as_draws_df()$.chain`). |
| `elapsed` | numeric | Passed through unchanged. |
Notes
- `lp__` is explicitly removed from the variable set via `setdiff(vars, "lp__")`.
- `posterior::as_draws_matrix()` followed by `as.matrix()` ensures `pm` is a plain numeric matrix.
- The chain ID extraction relies on the `.chain` column produced by `posterior::as_draws_df()`.
Purpose
Writes Stan source code to a file and compiles it into a cmdstanr model object. Relies on cmdstanr's file-hash caching so that repeated diagnostic calls with identical code do not recompile.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `stan_code` | character (scalar) | Stan model source code as a string. |
Returns
A CmdStanModel object.
Notes
- Uses `cmdstanr::write_stan_file(stan_code)` to materialize the code to disk (typically a temp file with content hashing).
- Then `cmdstanr::cmdstan_model(f)` compiles (or retrieves the cached compilation).
- No explicit error handling; compilation failures propagate to the caller.
Purpose
Extracts a comprehensive set of size-invariant geometric diagnostic signals from a single pilot fit. These signals characterize sampler health (divergences, tree-depth saturation, EBFMI), posterior geometry (condition number, step-scale ratio, multimodality, heavy kurtosis, boundary proximity), and computational cost (leapfrog count, elapsed time).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cs_fit` | CmdStanMCMC | The cmdstanr fit object. |
| `pm` | numeric matrix | Posterior draws matrix (rows = draws, cols = parameters, `lp__` excluded). |
| `chain_id` | integer vector | Chain ID per draw. |
| `bounds` | list / matrix | Parameter bounds, forwarded to `.gdpar_geom_boundary_prox()`. |
| `max_treedepth` | integer | Maximum NUTS tree depth; used to compute treedepth saturation rate. |
| `elapsed` | numeric | Elapsed wall-clock seconds. |
Mathematics
Let
Sampler diagnostics:
Covariance-based geometry:
Returns
A named list:
| Element | Type | Description |
|---|---|---|
| `divergent_rate` | numeric | Fraction of divergent transitions. |
| `ebfmi_min` | numeric | Minimum EBFMI across chains. |
| `treedepth_sat_rate` | numeric | Fraction of draws hitting `max_treedepth`. |
| `condition_number` | numeric | Condition number of the posterior covariance; `NA_real_` if eigen-decomposition failed. |
| `lambda_max_cov` | numeric | Largest eigenvalue of the posterior covariance. |
| `step_scale_ratio` | numeric | Mean step size divided by smallest marginal SD. |
| `mean_leapfrog` | numeric | Mean number of leapfrog steps per draw. |
| `multimodality` | numeric | From `.gdpar_geom_multimodality()`. |
| `heavy_kurtosis` | numeric | From `.gdpar_geom_heavy_kurtosis()` (defined in another section). |
| `boundary_proximity` | numeric | From `.gdpar_geom_boundary_prox()` (defined in another section). |
| `elapsed` | numeric | Passed through. |
| `n_sampling` | integer | Total number of post-warmup draws. |
Notes
- Sampler diagnostics are retrieved via `cs_fit$sampler_diagnostics(format = "draws_df")` and `cs_fit$diagnostic_summary(quiet = TRUE)`.
- `ebfmi_min` is wrapped in `suppressWarnings(min(ds$ebfmi))`.
- Eigen-decomposition uses `eigen(cv, symmetric = TRUE, only.values = TRUE)` inside a `tryCatch` that returns `NA_real_` on error. If all eigenvalues are `NA`, both `lambda_max` and `lambda_min` are set to `NA_real_`, and `condition_number` is `NA_real_`.
- `lambda_min` is clamped to be non-negative via `max(min(ev), 0)`.
- The condition number denominator uses `max(lambda_min, .Machine$double.eps)` to avoid division by zero.
- Marginal SDs use `sqrt(pmax(diag(cv), .Machine$double.eps))` to avoid zero or negative variances from numerical issues.
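The covariance-based part of the signal extraction can be reproduced from a draws matrix alone. The sketch below follows the guards listed in the notes; the function name `cov_geometry_sketch()` and the `mean_stepsize` argument (standing in for the mean adapted step size read from the sampler diagnostics) are assumptions.

```r
# Hypothetical sketch of the covariance-based signals, using the noted guards.
cov_geometry_sketch <- function(pm, mean_stepsize) {
  cv <- stats::cov(pm)
  ev <- tryCatch(
    eigen(cv, symmetric = TRUE, only.values = TRUE)$values,
    error = function(e) NA_real_
  )
  if (all(is.na(ev))) {
    lambda_max <- NA_real_
    lambda_min <- NA_real_
  } else {
    lambda_max <- max(ev)
    lambda_min <- max(min(ev), 0)  # clamp tiny negative eigenvalues to zero
  }
  # Guarded denominator avoids division by zero for singular covariances.
  condition_number <- lambda_max / max(lambda_min, .Machine$double.eps)
  # Guarded marginal SDs avoid sqrt() of zero or negative variances.
  sds <- sqrt(pmax(diag(cv), .Machine$double.eps))
  list(
    condition_number = condition_number,
    lambda_max_cov   = lambda_max,
    step_scale_ratio = mean_stepsize / min(sds)
  )
}

# Two uncorrelated columns with variances 16/3 and 4/3.
pm <- cbind(a = c(-2, 2, -2, 2), b = c(-1, -1, 1, 1))
cov_geometry_sketch(pm, mean_stepsize = 0.5)$condition_number  # 4
```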
Purpose
Returns a template signal list with every field set to NA_real_. Used by .gdpar_geom_run_pilot when sampling fails, so that downstream consumers receive a structurally complete (but missing-valued) signal object.
Arguments
None.
Returns
A named list with all twelve signal fields set to NA_real_:
divergent_rate, ebfmi_min, treedepth_sat_rate, condition_number, lambda_max_cov, step_scale_ratio, mean_leapfrog, multimodality, heavy_kurtosis, boundary_proximity, elapsed, n_sampling.
Notes
- The field names and order exactly mirror those produced by `.gdpar_geom_extract_signals()`.
- All values are explicitly `NA_real_` (typed missing), not `NA` or `NA_integer_`.
Purpose
Computes a between-chain mean-separation signal relative to within-chain spread. Chains that settle in different modes produce large between-chain mean differences, inflating this metric. The design is intended to be robust to short pilot runs.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `pm` | numeric matrix | Posterior draws matrix; rows = draws, columns = parameters. |
| `chain_id` | integer vector | Chain identifier for each row of `pm`. Length must equal `nrow(pm)`. |
Mathematics
Let
If
The returned multimodality signal is:
Returns
A single numeric value. Returns 0 if fewer than 2 unique chains are present.
Notes
- Early return of `0` (not `NA`) when `length(unique(chain_id)) < 2`.
- Per-chain means and SDs are computed with `vapply` over the unique chain set; the order of chains is determined by `unique(chain_id)`.
- Within-chain spread `mean(csds, na.rm = TRUE)` averages the per-chain standard deviations (not a pooled variance).
- The guard `!is.finite(within) || within <= 0` returns `0` for that coordinate, preventing division by zero or non-finite values.
- The final `max(scores, na.rm = TRUE)` tolerates `NA` scores from individual coordinates.
Purpose
Computes a scalar summary of tail heaviness across all parameter marginals in a posterior matrix. Used by the classifier as the heavy_kurtosis signal to detect funnel-neck or heavy-tail pathologies.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `pm` | numeric matrix | Posterior draws matrix with rows = draws, columns = parameters. |
Mathematics
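For each column $x$ of `pm` with $N$ draws, let $m_2 = \frac{1}{N}\sum_{i}(x_i - \bar{x})^2$ and $m_4 = \frac{1}{N}\sum_{i}(x_i - \bar{x})^4$; the column's excess kurtosis is $m_4 / m_2^2 - 3$.
The returned value is the maximum of these per-column values.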
Returns
A single numeric scalar — the maximum excess kurtosis over all parameter columns.
Notes
- Uses `vapply` with `numeric(1)`, so type stability is guaranteed.
- `na.rm = TRUE` in the final `max` means columns that produced `NA` or `NaN` do not propagate; all-constant columns return 0 rather than `NA`.
- The variance estimator is the biased (population) estimator, not the sample ($N-1$) estimator, so kurtosis values are slightly inflated relative to the unbiased form for small $N$.
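A compact sketch of the signal, consistent with the notes (population variance, zero for constant columns, maximum over columns), is given below. The function name `heavy_kurtosis_sketch()` is hypothetical and the exact form of the package's constant-column guard is an assumption.

```r
# Hypothetical sketch: maximum excess kurtosis over the columns of pm.
heavy_kurtosis_sketch <- function(pm) {
  per_col <- vapply(seq_len(ncol(pm)), function(j) {
    x <- pm[, j]
    centred <- x - mean(x)
    m2 <- mean(centred^2)  # biased (population) variance
    # Assumed guard: constant or degenerate columns contribute 0.
    if (!is.finite(m2) || m2 <= 0) return(0)
    mean(centred^4) / m2^2 - 3
  }, numeric(1))
  max(per_col, na.rm = TRUE)
}

pm <- cbind(
  two_point = rep(c(-1, 1), 4),               # excess kurtosis -2
  spiky     = c(-2, 0, 0, 0, 0, 0, 0, 2),     # excess kurtosis 1
  constant  = rep(5, 8)                       # guarded to 0
)
heavy_kurtosis_sketch(pm)  # 1
```

Because the result is a maximum, one heavy-tailed marginal is enough to raise the signal, regardless of how many well-behaved columns accompany it.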
Purpose
Measures how tightly the posterior draws of bounded parameters pile against their declared lower or upper bound. Returns the worst-case (maximum) proximity fraction across all bounded parameters, used as the boundary_proximity signal.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cs_fit` | CmdStanFit-like object | A fitted model object exposing a `draws()` method compatible with the posterior package. |
| `bounds` | named list of length-2 numeric vectors, or `NULL` | Each element is `c(lo, hi)` giving the declared bound for the parameter named by the list element. |
| `eps` | numeric scalar (default `0.01`) | Relative tolerance: a draw is "near" a bound if it lies within `eps * (hi - lo)` of that bound. |
Mathematics
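For each bounded parameter with declared bounds $(lo, hi)$ and range $r = hi - lo$, a draw $x$ counts as near the lower bound when $x \le lo + \varepsilon r$ and near the upper bound when $x \ge hi - \varepsilon r$, with $\varepsilon =$ `eps`.
The per-parameter scores are reduced to a single value by taking the maximum across bounded parameters (with `na.rm = TRUE`).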
Returns
A numeric scalar in $[0, 1]$. Returns `0` if `bounds` is `NULL` or empty, or if no bounded parameter name matches a variable in the draws.
Notes
- Variables are retrieved via `posterior::variables(draws_arr)`. Parameters in `bounds` that are absent from the draws silently contribute `0`.
- `posterior::subset_draws` and `posterior::as_draws_matrix` are used to extract a single variable; the result is coerced with `as.numeric`.
- If `hi - lo` is zero or negative, `rng` is non-positive and the comparisons `x <= lo + eps * rng` / `x >= hi - eps * rng` still execute but the geometric meaning is undefined.
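The proximity computation can be sketched on plain numeric vectors. In this illustration `boundary_prox_sketch()` takes a named list of draws instead of a fit object, and the per-parameter score is assumed to be the larger of the lower-bound and upper-bound fractions; the source does not state how the two sides are combined.

```r
# Hypothetical sketch operating on a named list of numeric draw vectors.
boundary_prox_sketch <- function(draws, bounds, eps = 0.01) {
  if (is.null(bounds) || length(bounds) == 0L) return(0)
  scores <- vapply(names(bounds), function(nm) {
    # Bounded parameters absent from the draws contribute 0.
    if (!nm %in% names(draws)) return(0)
    x <- as.numeric(draws[[nm]])
    lo <- bounds[[nm]][1]
    hi <- bounds[[nm]][2]
    rng <- hi - lo
    near_lo <- mean(x <= lo + eps * rng)
    near_hi <- mean(x >= hi - eps * rng)
    # ASSUMPTION: per-parameter score is the larger of the two fractions.
    max(near_lo, near_hi)
  }, numeric(1))
  max(scores, na.rm = TRUE)
}

draws <- list(
  sigma = c(0.001, 0.002, 0.5, 0.9),  # two of four draws hug the lower bound
  rho   = c(0.1, 0.2, 0.3, 0.4)       # well inside (-1, 1)
)
bounds <- list(sigma = c(0, 1), rho = c(-1, 1), tau = c(0, 1))
boundary_prox_sketch(draws, bounds)  # 0.5
```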
Purpose
Factory function returning an empty culprit-localisation data frame with the correct schema, used as a sentinel when no culprit is identified.
Arguments
None.
Returns
A data.frame with zero rows and three columns:
| Column | Type |
|---|---|
| `parameter` | character |
| `mechanism` | character |
| `score` | numeric |
Constructed with stringsAsFactors = FALSE.
Notes
- No side effects. Pure constructor.
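An equivalent zero-row constructor is short enough to show in full. The name `empty_culprit_sketch()` is hypothetical; the schema follows the table in this section.

```r
# Hypothetical equivalent of the empty-culprit factory.
empty_culprit_sketch <- function() {
  data.frame(
    parameter = character(0),
    mechanism = character(0),
    score     = numeric(0),
    stringsAsFactors = FALSE
  )
}

empty <- empty_culprit_sketch()
dim(empty)  # 0 3

# The typed empty frame binds cleanly with real culprit rows.
rbind(empty, data.frame(parameter = "sigma", mechanism = "boundary_pile",
                        score = 0.4, stringsAsFactors = FALSE))
```

Keeping the column types explicit means downstream code can sort or filter on `score` without checking whether any culprit was found.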
Purpose
Localises the geometric "culprit" parameters responsible for a detected pathology by combining three independent mechanisms: (a) flat/anisotropic directions from the covariance spectrum, (b) divergence-neck parameters that separate divergent from non-divergent draws, and (c) boundary-piling of bounded parameters.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cs_fit` | CmdStanFit-like object | Provides `sampler_diagnostics(format = "draws_df")` for divergence information and `draws()` (via `.gdpar_geom_boundary_prox`). |
| `pm` | numeric matrix | Posterior draws matrix (rows = draws, columns = parameters). Column names are used as parameter names. |
| `chain_id` | (any) | Accepted but never used in the function body. |
| `bounds` | named list of length-2 numeric vectors, or `NULL` | Declared parameter bounds, as in `.gdpar_geom_boundary_prox`. |
Mathematics
(a) Flat / anisotropic direction. Let eigen(..., symmetric = TRUE)). Let
(b) Divergence neck. Let
Parameters with "divergence_neck" and score
(c) Boundary pile. For each bounded parameter, .gdpar_geom_boundary_prox is called with a single-element bounds list. Parameters with proximity "boundary_pile".
Returns
A data.frame with columns parameter (character), mechanism (character), score (numeric), sorted by score in decreasing order, with rownames set to NULL. If no culprit rows are produced, returns .gdpar_geom_empty_culprit().
Mechanism labels:
| Mechanism string | Source |
|---|---|
| `"flat_or_anisotropic_direction"` | Covariance eigenvector (a) |
| `"divergence_neck"` | Divergence separation (b) |
| `"boundary_pile"` | Boundary proximity (c) |
Notes
- The eigendecomposition is wrapped in `tryCatch`; if it fails, mechanism (a) is silently skipped.
- Mechanism (b) is only evaluated when `any(div) && !all(div)`, i.e., at least one but not all draws are divergent. If all draws diverge, the mechanism is skipped entirely.
- In mechanism (b), columns with non-finite or non-positive `sd(x)` are skipped via `next`.
- In mechanism (c), `.gdpar_geom_boundary_prox` is called once per bounded parameter with a single-element sublist `bounds[nm]`, so the `eps` default of `0.01` is always used.
- `chain_id` is a dead parameter: it is accepted for interface consistency but has no effect on the output.
Purpose
Fits a log–log regression of condition number against data size $n$ and derives the `grows_with_n` flag consumed by the classifier.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `signals` | data.frame | Must contain columns `failed` (logical), `condition_number` (numeric), and `n` (numeric). Typically a stacked table of signals across multiple fit sizes. |
| `thresholds` | named list | Must contain `nslope_grows` (numeric): the slope threshold above which difficulty is declared to grow with $n$. |
Mathematics
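Rows are filtered to those with `!failed`, finite `condition_number`, and finite `n`. Let the filtered set be $s$. If $s$ contains fewer than two distinct values of $n$, the slope and the flag are returned as `NA`. Otherwise, an ordinary least-squares fit is performed:

$$\log(\text{condition\_number}) = \alpha + \beta \log(n) + \varepsilon$$

The slope $\hat{\beta}$ is extracted as `coef(fit)[2]`. The flag `grows_with_n` is `TRUE` iff $\hat{\beta}$ exceeds `thresholds$nslope_grows`.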
Returns
A named list:
| Element | Type | Meaning |
|---|---|---|
| `slope` | numeric | The fitted log–log slope (the exponent of $n$); `NA_real_` if insufficient data. |
| `grows_with_n` | logical or `NA` | `TRUE` if slope exceeds threshold; `NA` if insufficient data. |
| `note` | character or `NULL` | `"need at least two distinct sizes"` if insufficient data; `NULL` otherwise. |
Notes
- The filtering predicate uses `is.finite` on both `condition_number` and `n`, so `NA`, `NaN`, `Inf`, and `-Inf` rows are excluded.
- `length(unique(s$n)) < 2L` checks for at least two distinct sizes; duplicate sizes are allowed in the regression.
- `stats::lm` is called without error handling. The degenerate case of all-identical `n` after filtering never reaches `lm`, because the distinct-sizes guard returns first.
- `unname` is applied to the slope coefficient to strip the `"log(n)"` name.
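The filtering, guard, and fit described above can be sketched as follows. This is a minimal illustration assembled from the description in this section, not the package source. The function name `difficulty_curve_sketch` is hypothetical, and `failed` is assumed to contain no `NA` values.

```r
# Hypothetical re-implementation of the documented steps.
difficulty_curve_sketch <- function(signals, thresholds) {
  # Keep successful fits with finite condition number and finite size.
  keep <- !signals$failed &
    is.finite(signals$condition_number) &
    is.finite(signals$n)
  s <- signals[keep, , drop = FALSE]

  # Guard: a slope needs at least two distinct sizes.
  if (length(unique(s$n)) < 2L) {
    return(list(slope = NA_real_, grows_with_n = NA,
                note = "need at least two distinct sizes"))
  }

  # Log-log OLS; the second coefficient is the exponent of n.
  fit <- stats::lm(log(condition_number) ~ log(n), data = s)
  slope <- unname(stats::coef(fit)[2])
  list(slope = slope,
       grows_with_n = slope > thresholds$nslope_grows,
       note = NULL)
}

# Condition number proportional to n, so the fitted slope is close to 1.
# The Inf row and the failed row are filtered out before the fit.
sig <- data.frame(
  failed = c(FALSE, FALSE, FALSE, TRUE),
  condition_number = c(10, 100, 1000, Inf),
  n = c(10, 100, 1000, 10000)
)
res <- difficulty_curve_sketch(sig, list(nslope_grows = 0.5))
one_size <- difficulty_curve_sketch(sig[1, ], list(nslope_grows = 0.5))
```

The threshold value `0.5` is an arbitrary choice for the example; the package's default is not stated in this section.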
Purpose
Maps a pathology label to a recommended remedy (metric / sampler strategy) and a difficulty level. Called by .gdpar_geom_classify via its inner decide closure.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `pathology` | character scalar | One of the recognised pathology labels (see table below). |
Returns
A named list with two elements:
| Element | Type | Meaning |
|---|---|---|
| `remedy` | character | The recommended remedy string. |
| `level` | integer | A difficulty/severity level (can be `NA_integer_`). |
Mapping table:
| `pathology` | `remedy` | `level` |
|---|---|---|
| `"isotropic"` | `"euclidean_diagonal"` | `0L` |
| `"anisotropic"` | `"euclidean_dense"` | `1L` |
| `"funnel"` | `"riemannian"` | `3L` |
| `"heavy_tails"` | `"finsler_relativistic"` | `4L` |
| `"quasi_deterministic"` | `"sub_riemannian"` | `5L` |
| `"multimodal"` | `"tempering"` | `6L` |
| `"boundary"` | `"boundary_reparam"` | `6L` |
| `"flat_direction"` | `"reparam_eliminate"` | `-1L` |
| (any other) | `"unknown"` | `NA_integer_` |
Notes
- Implemented via `switch` with a default fallback, so unrecognised inputs never error.
- The level values are not contiguous (level `2` is unused), suggesting an ordinal scale where some slots are reserved or deprecated.
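The mapping table translates directly into a `switch` with an unnamed default. The sketch below is a minimal illustration under that reading; the name `remedy_for_sketch` is hypothetical and the body is reconstructed from the table, not copied from the package.

```r
# Hypothetical sketch of the pathology-to-remedy lookup.
remedy_for_sketch <- function(pathology) {
  switch(pathology,
    isotropic           = list(remedy = "euclidean_diagonal",   level = 0L),
    anisotropic         = list(remedy = "euclidean_dense",      level = 1L),
    funnel              = list(remedy = "riemannian",           level = 3L),
    heavy_tails         = list(remedy = "finsler_relativistic", level = 4L),
    quasi_deterministic = list(remedy = "sub_riemannian",       level = 5L),
    multimodal          = list(remedy = "tempering",            level = 6L),
    boundary            = list(remedy = "boundary_reparam",     level = 6L),
    flat_direction      = list(remedy = "reparam_eliminate",    level = -1L),
    # Unnamed last argument: the default for any unrecognised label.
    list(remedy = "unknown", level = NA_integer_)
  )
}

known <- remedy_for_sketch("funnel")
other <- remedy_for_sketch("not_a_pathology")
```

Because the last argument to `switch` is unnamed, any label that matches no case falls through to the `"unknown"` entry instead of raising an error.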
Purpose
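The central rule-based classifier that consumes all geometric signals, the difficulty-vs-$n$ curve, and the thresholds, and returns a pathology label together with its confidence, remedy, level, and a decision trace.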
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `sig` | named list | Signal values. Expected elements: `heavy_kurtosis`, `condition_number`, `boundary_proximity`, `multimodality`, `lambda_max_cov`, `ebfmi_min`, `divergent_rate`, `step_scale_ratio`. |
| `ncurve` | named list | Output of `.gdpar_geom_difficulty_curve`; must contain `grows_with_n`. |
| `culprit` | (any) | Accepted but never used in the function body. |
| `th` | named list | Threshold values. Expected elements: `heavy_kurtosis_high`, `condition_high`, `boundary_prox_high`, `multimodal_high`, `flat_var_high`, `funnel_ebfmi_low`, `divergent_rate_high`, `step_scale_ratio_low`, `heavy_cond_max`, `nslope_grows` (the latter is used only indirectly via `ncurve`). |
Mathematics
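Two boolean predicates are precomputed: `cond_high` and `kurt_high`, which flag whether `sig$condition_number` and `sig$heavy_kurtosis` reach their thresholds `th$condition_high` and `th$heavy_kurtosis_high`.
The confidence mapping function `sq(x, ref)` maps a signal value `x`, relative to its reference threshold `ref`, to a confidence value; it returns `0.5` for non-finite inputs.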
The classification rules are evaluated in strict priority order (first match wins):
| Priority | Pathology | Condition | Confidence |
|---|---|---|---|
| 1 | `boundary` | `sig$boundary_proximity >= th$boundary_prox_high` | `sq(boundary_proximity, boundary_prox_high)` |
| 2 | `multimodal` | `sig$multimodality >= th$multimodal_high` | `sq(multimodality, multimodal_high)` |
| 3 | `flat_direction` | `sig$lambda_max_cov >= th$flat_var_high` AND `!grows` | `sq(lambda_max_cov, flat_var_high)` |
| 4 | `quasi_deterministic` | `cond_high` AND `grows` | `sq(condition_number, condition_high)` |
| 5 | `anisotropic` | `cond_high` AND `!kurt_high` | `sq(condition_number, condition_high)` |
| 6 | `funnel` | (`kurt_high` AND `sig$ebfmi_min <= th$funnel_ebfmi_low`) OR (`sig$divergent_rate >= th$divergent_rate_high` AND `sig$step_scale_ratio <= th$step_scale_ratio_low`) | `sq(heavy_kurtosis, heavy_kurtosis_high)` if energy branch; `sq(divergent_rate, divergent_rate_high)` if divergence branch |
| 7 | `heavy_tails` | `kurt_high` AND `sig$condition_number <= th$heavy_cond_max` | `sq(heavy_kurtosis, heavy_kurtosis_high)` |
| 8 | `anisotropic` (residual) | `cond_high` (n-curve unknown) | `0.4` (fixed) |
| 9 | `isotropic` (default) | no threshold exceeded | `0.6` (fixed) |
Returns
A named list:
| Element | Type | Meaning |
|---|---|---|
| `pathology` | character | The matched pathology label. |
| `confidence` | numeric | Confidence in the matched pathology. |
| `remedy` | character | Remedy string from `.gdpar_geom_remedy_for`. |
| `level` | integer or `NA_integer_` | Difficulty level from `.gdpar_geom_remedy_for`. |
| `trace` | character vector | Human-readable strings documenting which rule fired and why. |
Notes
- The inner function `decide(p, conf)` closes over the `trace` variable in the enclosing scope. Each rule appends a trace string to `trace` before calling `decide`, so the returned `trace` reflects the path taken.
- `culprit` is a dead parameter: accepted for interface consistency but never referenced.
- All threshold comparisons use `isTRUE(...)` wrappers, which means `NA` comparisons safely evaluate to `FALSE` rather than propagating `NA`. This is critical: if any signal is `NA`, the corresponding rule is simply skipped.
- Rule 6 (funnel) has two sub-conditions: `funnel_energy` (high kurtosis + low E-BFMI) and `funnel_div` (high divergence rate + low step/scale ratio). If both are true, `funnel_energy` takes precedence in the trace message and confidence computation because the `if (funnel_energy)` branch in the trace `if`-`else` is evaluated first.
- Rule 8 catches the case where `cond_high` is true but `grows` is `NA` (n-curve unknown) and `kurt_high` is false: rules 4, 5, and 6 all fail, but rule 8 fires with a fixed confidence of `0.4`.
- Rule 9 (isotropic) fires with a fixed confidence of `0.6` when no pathology threshold is exceeded. This is higher than rule 8's `0.4`, reflecting higher confidence in a clean diagnosis than in a residual one.
- The `sq` function's default return of `0.5` for non-finite inputs means that if a signal is `NA` but the threshold comparison somehow passed (which `isTRUE` prevents), the confidence would be `0.5`.
- The function does not modify `sig`, `ncurve`, `culprit`, or `th`; it is side-effect-free.
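The first-match-wins ordering and the `isTRUE(...)` wrappers can be sketched as below. This illustration follows the priority table as written and returns only the pathology label; confidence, remedy, and trace are omitted. The function name, the definitions of `kurt_high` and `cond_high` (assumed to be `>=` comparisons), and every threshold value in `th` are assumptions made for the example.

```r
# Hypothetical sketch of the rule ordering; returns the label only.
classify_label_sketch <- function(sig, grows, th) {
  # ASSUMPTION: both predicates are ">=" comparisons against their thresholds.
  kurt_high <- isTRUE(sig$heavy_kurtosis >= th$heavy_kurtosis_high)
  cond_high <- isTRUE(sig$condition_number >= th$condition_high)
  funnel_energy <- kurt_high && isTRUE(sig$ebfmi_min <= th$funnel_ebfmi_low)
  funnel_div <- isTRUE(sig$divergent_rate >= th$divergent_rate_high) &&
    isTRUE(sig$step_scale_ratio <= th$step_scale_ratio_low)

  # Rules are tried in priority order; an NA signal makes isTRUE() FALSE.
  if (isTRUE(sig$boundary_proximity >= th$boundary_prox_high)) return("boundary")
  if (isTRUE(sig$multimodality >= th$multimodal_high)) return("multimodal")
  if (isTRUE(sig$lambda_max_cov >= th$flat_var_high) && isTRUE(!grows)) {
    return("flat_direction")
  }
  if (cond_high && isTRUE(grows)) return("quasi_deterministic")
  if (cond_high && !kurt_high) return("anisotropic")
  if (funnel_energy || funnel_div) return("funnel")
  if (kurt_high && isTRUE(sig$condition_number <= th$heavy_cond_max)) {
    return("heavy_tails")
  }
  if (cond_high) return("anisotropic")  # residual rule
  "isotropic"                           # default rule
}

# Illustrative thresholds only; not the package defaults.
th <- list(heavy_kurtosis_high = 6, condition_high = 100,
           boundary_prox_high = 0.5, multimodal_high = 0.5,
           flat_var_high = 1e4, funnel_ebfmi_low = 0.3,
           divergent_rate_high = 0.05, step_scale_ratio_low = 0.1,
           heavy_cond_max = 50)
blank <- list(heavy_kurtosis = NA_real_, condition_number = NA_real_,
              boundary_proximity = NA_real_, multimodality = NA_real_,
              lambda_max_cov = NA_real_, ebfmi_min = NA_real_,
              divergent_rate = NA_real_, step_scale_ratio = NA_real_)

all_na <- classify_label_sketch(blank, NA, th)
both <- classify_label_sketch(
  utils::modifyList(blank, list(boundary_proximity = 0.9, multimodality = 0.9)),
  NA, th
)
ill_cond <- utils::modifyList(blank, list(condition_number = 500))
```

When every signal is `NA`, each `isTRUE()` check is `FALSE` and the default label is returned. When both the boundary and multimodality thresholds are exceeded, the boundary rule wins because it is tested first.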
Purpose
Internal helper for geometry diagnostics. Computes a compact cost and tractability summary from a pilot run’s sampler signals and from the difficulty-vs-sample-size curve.
Arguments
- `pilot_max`: list. Expected components:
  - `failed`: scalar logical or any value tested by `isTRUE`; `TRUE` indicates the pilot run failed.
  - `signals`: list with numeric scalar fields:
    - `elapsed`: elapsed time for the sampling phase.
    - `n_sampling`: number of sampling draws.
    - `mean_leapfrog`: mean leapfrog steps.
    - `treedepth_sat_rate`: tree-depth saturation rate.
- `ncurve`: list. Expected component `grows_with_n`, tested by `isTRUE`; indicates whether difficulty grows with sample size.
Mathematics
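Let `sig = pilot_max$signals`, `sat = sig$treedepth_sat_rate`, and

$$\text{per\_1000} = 1000 \cdot \frac{\text{elapsed}}{\max(\text{n\_sampling},\ 1)}$$

Tractability is classified as one of `"unknown"`, `"tractable"`, `"expensive"`, or `"intractable (escalate geometry or certify limit)"`.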
Returns
A list with elements:
- `seconds_per_1000_draws`: numeric scalar, or `NA_real_` in the failure branch.
- `mean_leapfrog`: `sig$mean_leapfrog`, or `NA_real_` in the failure branch.
- `treedepth_saturation`: `sig$treedepth_sat_rate`, or `NA_real_` in the failure branch.
- `tractability`: character string; one of `"unknown"`, `"tractable"`, `"expensive"`, or `"intractable (escalate geometry or certify limit)"`.
Notes
- The failure branch is triggered when `isTRUE(pilot_max$failed)` or `!is.finite(sig$elapsed)`.
- The `max(sig$n_sampling, 1)` guard protects against zero draws, but not against `NA`; if `n_sampling` is `NA`, `per_1000` becomes `NA`.
- `isTRUE(sat >= 0.5)` requires `sat` to be a length-one numeric/logical value; `NA` or a length other than one yields `FALSE`, causing classification to fall through to less severe branches.
- The function assumes `pilot_max$signals` exists; if `failed` is not `TRUE` and `sig$elapsed` is absent or `NULL`, the finite check may error.
- Internal function; no S3 dispatch.
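The failure branch and the zero-draw guard can be illustrated with a short sketch. It covers only the per-1000-draws rate; the tractability thresholds are not reproduced here. The function name is hypothetical, and the rate formula is inferred from the element name `seconds_per_1000_draws` and the `max(sig$n_sampling, 1)` guard, so treat it as an assumption.

```r
# Hypothetical sketch of the rate computation and its guards.
cost_rate_sketch <- function(pilot_max) {
  sig <- pilot_max$signals
  # Failure branch: a failed pilot or a non-finite elapsed time.
  if (isTRUE(pilot_max$failed) || !is.finite(sig$elapsed)) return(NA_real_)
  # ASSUMPTION: seconds per 1000 draws, with a guard against zero draws.
  1000 * sig$elapsed / max(sig$n_sampling, 1)
}

ok <- cost_rate_sketch(list(failed = FALSE,
                            signals = list(elapsed = 12, n_sampling = 400)))
zero_draws <- cost_rate_sketch(list(failed = FALSE,
                                    signals = list(elapsed = 12, n_sampling = 0)))
na_draws <- cost_rate_sketch(list(failed = FALSE,
                                  signals = list(elapsed = 12, n_sampling = NA_real_)))
failed <- cost_rate_sketch(list(failed = TRUE, signals = list()))
```

Here `ok` is 30 seconds per 1000 draws and `zero_draws` is 12000 because the divisor is clamped to 1. `na_draws` is `NA`, since `max(NA, 1)` is `NA`, which matches the note that the guard does not protect against a missing `n_sampling`.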
Purpose
S3 print method for objects of class gdpar_geometry_diagnostic. Renders a compact console summary of the diagnosed pathology, recommended geometry, difficulty curve, computational cost, culprit parameters, and optional ground-truth comparison.
Arguments
- `x`: object of class `gdpar_geometry_diagnostic`, expected to be a list with components:
  - `pathology`: character string.
  - `confidence`: numeric scalar.
  - `recommended_geometry`: character string.
  - `geometry_level`: numeric or character geometry level.
  - `difficulty_curve`: list with optional `slope` and `grows_with_n`.
  - `cost`: list with `seconds_per_1000_draws` and `tractability`.
  - `culprit`: data frame with column `parameter`; may have zero rows.
  - `correct`: `NULL`, or a scalar logical/`NA` indicating whether classification was correct.
  - `ground_truth`: list with `pathology`, used when `correct` is non-`NULL`.
- `...`: unused; present for S3 generic compatibility.
Returns
Invisibly returns x.
Notes
Writes output to the console via cat. Printed lines include:
- Header: `<gdpar_geometry_diagnostic>`.
- Pathology with confidence formatted using `digits = 2`.
- Recommended geometry and geometry level.
- An `n-curve` line only if `x$difficulty_curve$slope` is not `NULL`; `slope` is formatted with `digits = 3`, followed by `grows_with_n`.
- Cost line with `seconds_per_1000_draws` formatted with `digits = 3` and `tractability`.
- Culprit line only if `nrow(x$culprit) > 0L`; prints up to the first three values of `x$culprit$parameter` using `utils::head(..., 3)`, comma-separated.
- Ground-truth line only if `!is.null(x$correct)`; this includes `FALSE` and `NA` values, not only `TRUE`, and accesses `x$ground_truth$pathology`.
Edge cases: if x$culprit is NULL, nrow(x$culprit) is NULL and the if condition errors due to length zero; the method assumes culprit is a data frame. If x$correct is non-NULL but x$ground_truth is missing, accessing x$ground_truth$pathology will error. The method does not validate the class or required fields. Exported as an S3 method.
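The `NULL` culprit edge case can be reproduced in isolation. The sketch below uses two hypothetical helpers: one mirrors the documented `nrow(x$culprit) > 0L` check, the other shows one possible defensive variant using `isTRUE`. Neither is taken from the package.

```r
# Mirrors the documented check; errors when culprit is NULL.
has_rows_strict <- function(culprit) {
  if (nrow(culprit) > 0L) TRUE else FALSE
}

# One possible guard (an illustration, not the package's behaviour):
# isTRUE() turns the zero-length comparison into FALSE.
has_rows_guarded <- function(culprit) isTRUE(nrow(culprit) > 0L)

# nrow(NULL) is NULL, so the comparison has length zero and `if` fails.
msg <- tryCatch(has_rows_strict(NULL), error = function(e) conditionMessage(e))

empty_df <- data.frame(parameter = character(0))
one_row <- data.frame(parameter = "sigma")
```

`msg` holds the error message raised by `if` on a zero-length condition, while `has_rows_guarded(NULL)` returns `FALSE`. Both helpers agree on real data frames, including the zero-row case.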