General Dynamic Parameter Models via Reference Anchoring
Package version: 0.1.0 · License: GPL (≥ 3) · Author: José Mauricio Gómez Julián (ORCID 0009-0000-2412-3150) · Repository: https://github.com/IsadoreNabi/gdpar
This is an exhaustive, self-contained technical reference for the gdpar R package. It documents, in three layers and at maximal depth:
- Conceptualization — the cognitive and statistical idea behind the framework, its desiderata, and the family of problems it addresses (Part I).
- Mathematics — the canonical decomposition, identifiability theory, the three estimation paths and their asymptotics, the distributional-family algebra, the spline bases, the causal-inference bridge, the geometry-adaptive sampling engine, and the dependence-robust inference machinery (Part II).
- Computation — how the mathematics is realized in code: the Stan code generator and every Stan template (line by line), the fitting engines, the family code generation, the geometry engine, and every one of the package's 469 functions organized by module (Parts III–V).
Part VI and the appendices cover the bundled data, the benchmark harness, the test suite, the symbol glossary, and the bibliographic anchors.
| Part | Content |
|---|---|
| I | Conceptual framework: motivation, the anchoring equation, the AMM form, the three paths, EB vs FB, distributional regression, the causal bridge, geometric robustness, dependence-robust inference |
| II | Mathematical foundations: AMM algebra, identifiability theorems (C1–C7), parametrizations (CP/NCP, linear reparametrization), asymptotics of Paths 1/2/3, Empirical-Bayes theory (Theorems 7A–7D) and its multivariate extension, families and links, B-spline W bases, grouped references, causal identification, geometric metrics, dependence diagnostics and block bootstrap |
| III | Computational architecture: the Stan code generator, the fitting engines (gdpar, .gdpar_multi, .gdpar_K), the Empirical-Bayes engine, the family/codegen layer, the geometry engine/orchestrator |
| IV | Exhaustive function reference: all 44 R source files, all 469 functions, grouped by module, each with purpose, signature, arguments, mathematics, and return value |
| V | Stan templates: all 13 .stan files, block by block |
| VI | Data, benchmarks, tests, appendices, references |
Mathematics is written in GitHub-flavored LaTeX: inline as $ … $ and display as $$ … $$. Code identifiers, file names, and Stan symbols are in monospace. Internal (non-exported) R functions are named with a leading dot, e.g. `.gdpar_multi`; exported functions have no leading dot, e.g. `gdpar`.
The single most important object in the entire package is the reference-anchoring decomposition

$$\theta_i=\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),$$

read throughout as: the parameter of individual $i$ equals a population reference plus a deviation that is itself a function of the individual's covariates and of the reference. Every layer of the package (conceptual, mathematical, computational) is an elaboration of this one equation.
| Symbol | Meaning |
|---|---|
| $\theta_i$ | parameter (vector) for individual / observation $i$ |
| $\theta_{\text{ref}}$ | population reference parameter |
| $\Delta$ | individual deviation function |
| $x_i$ | observable covariates of individual $i$ |
| $a$ | additive component of the AMM deviation |
| $b$ | multiplicative (Hadamard) component of the AMM deviation |
| $W$ | reference-modulated mixing matrix of the AMM deviation |
| $\odot$ | Hadamard (elementwise) product |
| $K$ | number of distributional parameter slots |
| $p$ | dimension of the parameter at a slot (number of coordinates of $\theta_{\text{ref}}$) |
| $\mathcal D(\theta_i)$ | observation distribution given $\theta_i$ |
| | EB hyperparameter / canonical-piece reduced parameter vector |
| | link and inverse-link (response) function of a slot |
| | Riemannian metric tensor used by the geometry engine |
| $\phi$ | temporal persistence (AR) parameter; also hypernetwork weights in Path 3 |
| $\pi_i$ | structural-zero probability for individual $i$ |
The framework originates in an observation about expert human prediction under uncertainty and consequence: a driver executing fast overtaking maneuvers at narrow distances without crashing. Decomposed, the expert's predictive process has three stages:
- A population reference. The driver carries an internal model of the average driver — typical reaction times, modal aggressiveness, characteristic acceleration/braking patterns. This functions as a prior.
- Rapid estimation of individual deviation. In a one-to-two-second window, the driver reads signals from the specific other driver (style, relative speed, vehicle type, micro-movements) and estimates how and how much that driver departs from the average.
- Conditional prediction and decision. With the individual profile expressed as a deviation from the average, the driver predicts the other's behavior and decides the maneuver.
A conventional model in its simplest form predicts with parameters that are fixed once estimated and shared by every individual.
More elaborate models do let parameters vary across individuals — hierarchical models with covariate-dependent random effects, varying-coefficient models, hypernetworks, mixtures of experts, state-space models. The framework's contribution is therefore not the absence of antecedents but the explicit, canonical formulation of a specific structural pattern those antecedents do not typically make explicit:
Conventional predictive models do not typically formulate individual parameters as explicit deviations from a population reference where the deviation function depends both on the individual's observable characteristics and on the reference itself.
The substantive content unpacks into three commitments:
- $\theta_{\text{ref}}$ is always present as an explicit anchor in the predictive equation.
- $\Delta_i$ is computed as a function of observable signals $x_i$.
- The individual prediction emerges from $\theta_i=\theta_{\text{ref}}+\Delta_i$, not from $\theta_i$ estimated independently per individual nor from $\theta_{\text{ref}}$ shifted by a structurally separate term.
The decisive property is the structural dependence of $\Delta$ on $\theta_{\text{ref}}$.
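For each observation $i$,

$$\theta_i=\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),\qquad Y_i\mid\theta_i\sim\mathcal D(\theta_i),$$

with $\theta_{\text{ref}}$ the population reference, $\Delta$ the deviation function, $x_i$ the observable covariates, and $\mathcal D$ the observation distribution.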
A model in this framework is fully specified by three design choices:
- How $\theta_{\text{ref}}$ is estimated: from the full sample, from a reference subset, or as a hyperparameter with its own distribution (this becomes the `anchor` argument in code, and the EB-vs-FB distinction).
- How $\Delta$ is estimated: parametric/linear, non-parametric (splines), or as a neural network output (the three paths).
- What distribution is assumed for $Y_i\mid\theta_i$: this determines the family of problems addressed (the distributional families).
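The canonical functional form for $\Delta$ is the AMM form

$$\Delta(x_i,\theta_{\text{ref}})=a(x_i)+b(x_i)\odot\theta_{\text{ref}}+W(\theta_{\text{ref}})\,x_i,$$

developed formally in Part II. Read componentwise: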
- Anchoring. With no individual information ($x_i$ unobserved or non-informative), $\Delta\to 0$ and $\theta_i\to\theta_{\text{ref}}$: the model collapses to the population baseline.
- Individuation. With rich individual information, $\Delta$ can be large and parameters move away from the reference as far as the evidence justifies.
- Transferability. When $\theta_{\text{ref}}$ changes (new population), deviations are recomputed coherently because $\Delta$ is a function of $\theta_{\text{ref}}$.
- Generality. The form subsumes, as special cases, fixed effects ($\Delta\equiv 0$), classical random effects ($\Delta$ independent of $\theta_{\text{ref}}$), and random coefficients ($\Delta$ linear in $x_i$).
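The anchoring and individuation properties can be checked numerically on a toy instance of the AMM form $\theta_i=\theta_{\text{ref}}+a(x_i)+b(x_i)\odot\theta_{\text{ref}}+W(\theta_{\text{ref}})\,x_i$. The sketch below is a plain-Python illustration, not package code: the function name `amm_theta`, the linear choices $a(x)=Ax$ and $b(x)=Bx$, and the particular `W_fn` are all assumptions made for the example.

```python
def amm_theta(theta_ref, x, A, B, W_fn):
    """Evaluate theta_ref + a(x) + b(x) * theta_ref + W(theta_ref) x.

    Hypothetical helper: a(x) = A x and b(x) = B x are taken linear here,
    and W_fn maps the reference to a p x d matrix.
    """
    p, d = len(theta_ref), len(x)
    a = [sum(A[j][k] * x[k] for k in range(d)) for j in range(p)]
    b = [sum(B[j][k] * x[k] for k in range(d)) for j in range(p)]
    W = W_fn(theta_ref)
    wx = [sum(W[j][k] * x[k] for k in range(d)) for j in range(p)]
    # The Hadamard product b(x) ⊙ theta_ref is the elementwise b[j] * theta_ref[j].
    return [theta_ref[j] + a[j] + b[j] * theta_ref[j] + wx[j] for j in range(p)]


theta_ref = [1.0, 2.0]
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[0.1, 0.0], [0.0, 0.1]]
W_fn = lambda th: [[0.2 * th[0], 0.0], [0.0, 0.2 * th[1]]]

# Anchoring: uninformative (centered, zero) covariates return the reference.
anchored = amm_theta(theta_ref, [0.0, 0.0], A, B, W_fn)
# Individuation: informative covariates move the parameter away from it.
individual = amm_theta(theta_ref, [1.0, -1.0], A, B, W_fn)
```

With these inputs the anchored call returns the reference itself, while the second call moves both coordinates away from it by an amount that depends on `theta_ref` through the multiplicative and modulating terms.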
The same canonical principle is realized through three complementary engines, differing in how $\Delta$ is estimated:
- Path 1: Hierarchical Bayesian (Stan). A three-level hierarchy: a population level $\theta_{\text{ref}}\sim p(\theta\mid\text{hyper})$, an individual level $\theta_i\mid x_i \sim \mathcal N(\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),\Sigma_i)$ (with $\Sigma_i$ possibly covariate-dependent, i.e. individual heteroscedasticity), and an observation level $Y_i\mid\theta_i\sim\mathcal D(\theta_i)$. It is the path most faithful to the cognitive analogy and gives native full-posterior uncertainty. This is the operational path in `gdpar`.
- Path 2: Varying-coefficient models (penalized splines). The frequentist version: $Y_i = x_i^\top\beta(z_i)+\varepsilon_i$, with the reference recovered as $\beta_{\text{ref}}=\beta(\bar z)$ and the deviation as $\Delta_i=\beta(z_i)-\beta(\bar z)$. It offers maximal interpretability but suffers the curse of dimensionality. Conceptual in `gdpar 0.1.0` (see below).
- Path 3: Conditional parameter networks (hypernetworks / amortized inference). A neural network generates the individual parameters, $\theta_i=h_\phi(x_i,\theta_{\text{ref}})$, with anchoring enforced both by feeding $\theta_{\text{ref}}$ as an explicit input and by a regularizer $\lambda\lVert\theta_i-\theta_{\text{ref}}\rVert^2$. It offers arbitrary nonlinearity at the lowest interpretability. Conceptual in `gdpar 0.1.0`.
Implementation status. In `gdpar 0.1.0` only Path 1 (Hierarchical Bayesian) is operational, and it is the default. Calls of the form `gdpar(..., path = "vcm")` or `gdpar(..., path = "hyper")` abort with a `gdpar_unsupported_feature_error`. The asymptotic theory for all three paths is nonetheless developed to reference grade (Part II, from vignettes `v04`–`v06`), so the package's mathematical scope exceeds its current executable surface by design.
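Because Path 2 is not executable in the package, its reference/deviation reading can only be illustrated outside it. The sketch below assumes a coefficient function $\beta(\cdot)$ is already available (here a hand-written toy function rather than a fitted penalized spline) and recovers the reference as $\beta(\bar z)$ and the deviations as $\beta(z_i)-\beta(\bar z)$. The name `vcm_decompose` is hypothetical.

```python
def vcm_decompose(beta, z):
    """Split a varying coefficient into reference and per-observation deviations.

    beta: callable mapping a scalar z to a list of coefficients (assumed given).
    z:    list of scalar effect modifiers, one per observation.
    """
    z_bar = sum(z) / len(z)
    beta_ref = beta(z_bar)  # reference: the coefficient at the mean modifier
    deviations = [
        [b_i - b_r for b_i, b_r in zip(beta(z_i), beta_ref)] for z_i in z
    ]
    return beta_ref, deviations


# Toy coefficient function standing in for a fitted spline.
toy_beta = lambda z: [1.0 + 0.5 * z, 2.0 - z * z]
beta_ref, deviations = vcm_decompose(toy_beta, [0.0, 1.0, 2.0])
```

The observation sitting exactly at $\bar z$ has zero deviation by construction; note that for a nonlinear $\beta$ the deviations need not average to zero across observations.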
Comparative summary (from the framework overview):
| Criterion | Path 1 Bayesian | Path 2 VCM | Path 3 Hypernetwork |
|---|---|---|---|
| Fidelity to cognitive analogy | High | Moderate | Moderate |
| Theoretical rigor | Very high | High | Moderate |
| Interpretability | High | Very high | Low |
| Expressive capacity of $\Delta$ | Moderate (parametric) | Moderate–high | Arbitrarily high |
| Scalability to high dimension | Moderate | Low (curse of dim.) | High |
| Uncertainty quantification | Native (full posteriors) | Asymptotic (CIs) | Requires extensions |
| Primary tools | Stan, cmdstanr | mgcv, splines | torch |
Within Path 1, the reference $\theta_{\text{ref}}$ and the hyperparameters can be handled in two ways:
- Full Bayes (FB). Everything (reference, deviation coefficients, hyperparameters) is given priors and sampled jointly by HMC. Output: full joint posterior. This is `gdpar()`.
- Empirical Bayes (EB). The hyperparameters (and, in the marginal variant, the reference itself) are estimated by maximizing a marginal likelihood / MAP objective; the remaining parameters are then inferred conditionally. Far cheaper, with an explicit and analyzable contraction/asymptotic story. This is `gdpar_eb()`.
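The package treats EB and FB as parallel, comparable estimation routes and ships an explicit comparator (`gdpar_compare_eb_fb`) that quantifies how much the two agree, through the marginal total-variation distance of Theorem 7A and the per-cell interval-width ratio of Proposition 7B.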
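The framework is not restricted to modelling a mean. A distribution can have several parameters (location, scale, shape, tail index, zero-inflation probability), and each of these is a slot that can carry its own AMM decomposition. The package indexes slots by $k=1,\dots,K$,

$$\theta_i^{(k)}=\theta_{\text{ref}}^{(k)}+\Delta^{(k)}\big(x_i,\theta_{\text{ref}}^{(k)}\big),\qquad k=1,\dots,K,$$

each on its own link scale. Zero-inflation receives the framework's distinctive dual deviation: both the count parameter and the structural-zero probability $\pi_i$ carry their own anchored deviation.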
Because the AMM form produces individual parameters, it is naturally positioned for individual treatment-effect estimation. The package provides a T-learner causal bridge: fit the AMM model separately under treatment and control, then read individual conditional average treatment effects (CATE) / individual treatment effects (ITE) as the difference of the anchored individual predictions. A second layer compares this AMM-based learner against external meta-learners (via pluggable adapters to grf in R and EconML in Python through reticulate), so the framework's causal claims are benchmarked, not asserted.
Hierarchical AMM posteriors can be geometrically hostile to standard HMC — funnels, near-determinism, heavy tails, multimodality. The package contains an opt-in geometry-adaptive sampling engine whose default is bit-identical to ordinary sampling and which, when enabled, climbs a ladder of increasingly powerful geometries: Euclidean → Riemannian (Fisher / SoftAbs) → sub-Riemannian → Finsler/relativistic, governed by a certifying orchestrator that diagnoses the pathology, selects a metric, tunes the integrator, and emits a certificate. A Laplace fallback provides a plug-in posterior (and ELPD on par with mgcv-REML / INLA-Laplace) when full sampling is certified infeasible.
gdpar does not model temporal or spatial dependence in its point structure (that is deferred, by design, to a future "Block 10"); instead it makes the inference robust to dependence that is present but unmodelled. It ships:
- Diagnostics that convert invisible iid risk into measured quantities: lag-1 autocorrelation, Durbin–Watson, and Ljung–Box on residuals (temporal); Moran's $I$ with permutation and analytic Cliff–Ord variants (spatial).
- Robust standard errors / intervals via block bootstrap (moving/circular blocks in time, tiled randomized-origin blocks in space) with data-driven block lengths (Politis–White flat-top automatic length in time; a custom subsampling calibration in space).
Point estimates are unchanged; only the uncertainty is made robust. The honesty is explicit: the dependence is not modelled, the inference is merely made valid in its presence.
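To make the temporal side concrete, here is a small standard-library sketch of two of the named diagnostics (lag-1 autocorrelation and the Durbin–Watson statistic) and of a moving-block bootstrap standard error for a residual mean. It is an illustration under simplifying assumptions, not the package's implementation: the function names are hypothetical, the block length is passed by hand rather than chosen by the Politis–White rule, and only the mean is bootstrapped.

```python
import random


def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(r)
    m = sum(r) / n
    num = sum((r[t] - m) * (r[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in r)
    return num / den


def durbin_watson(r):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals."""
    num = sum((r[t] - r[t - 1]) ** 2 for t in range(1, len(r)))
    den = sum(v * v for v in r)
    return num / den


def moving_block_bootstrap_se(r, block_len, n_boot=500, seed=0):
    """Standard error of the mean from resampling overlapping blocks."""
    rng = random.Random(seed)
    n = len(r)
    starts = range(n - block_len + 1)      # all admissible block origins
    n_blocks = -(-n // block_len)          # ceil(n / block_len)
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(r[s:s + block_len])
        sample = sample[:n]                # trim to the original length
        means.append(sum(sample) / n)
    centre = sum(means) / n_boot
    return (sum((m - centre) ** 2 for m in means) / (n_boot - 1)) ** 0.5


alternating = [1.0, -1.0] * 5              # strongly negatively dependent
rho1 = lag1_autocorr(alternating)
dw = durbin_watson(alternating)
se = moving_block_bootstrap_se(alternating, block_len=4)
```

On the alternating series the lag-1 autocorrelation is strongly negative and Durbin–Watson is far above 2, the kind of signal that would motivate a block-based rather than an iid standard error.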
This part develops, at reference grade, the mathematics the package implements: the AMM decomposition and its identifiability theory (§II.1), the asymptotic theory of the three paths (§II.2), the Empirical-Bayes theory and its multivariate extension (§II.3), the distributional families and links (§II.4), the B-spline W bases (§II.5), grouped references (§II.6), the causal-inference bridge (§II.7), the geometry-adaptive sampling metrics (§II.8), and the dependence-robust inference machinery (§II.9). The identifiability material corresponds to package source R/check_identifiability.R, R/preflight*.R, R/amm_spec.R; the conditions stated here are enforced in code and the cross-references make the link explicit.
There are three layers of identifiability:
- (L1) Algebraic-functional identifiability: does the latent function $\theta_i(\cdot)$ determine the components $(\theta_{\text{ref}},a,b,W)$? (Theorem 1A.)
- (L2) Statistical identifiability: do the observable data $\{(x_i,y_i)\}$ determine them? (Lemma 1B, via a hypothesis on the response family.)
- (L3) Numerical verifiability: in a chosen finite basis, can a runtime diagnostic detect identifiability or its failure? (Proposition 1C, the Gram-matrix check.)
The space of deviation forms is stratified by joint polynomial order in $(x_i,\theta_{\text{ref}})$:

| Level | Form | Joint order | Status |
|---|---|---|---|
| 0 (degenerate) | $\Delta\equiv 0$ | | standard non-mixed regression |
| 1 (linear additive) | $a(x_i)$ | | classical random coefficients; no reference-dependence |
| 2 (canonical AMM) | $a(x_i)+b(x_i)\odot\theta_{\text{ref}}+W(\theta_{\text{ref}})\,x_i$ | mixed | the canonical default |
| 2.5 (full-matrix mult.) | | | Level 2 is |
| 3 (quadratic) | adds | up to | |
| | | | hypernetwork; Proposition 1F |
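The canonical Level-2 AMM is

$$\theta_i=\theta_{\text{ref}}+a(x_i)+b(x_i)\odot\theta_{\text{ref}}+W(\theta_{\text{ref}})\,x_i,$$

with $a\in\mathcal F_a$ and $b\in\mathcal F_b$ mapping covariates $x_i\in\mathbb R^d$ to $\mathbb R^p$, and $W\in\mathcal F_W$ mapping the reference to a matrix in $\mathbb R^{p\times d}$.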
Approximation (Scheme 1D). On compact
| Condition | Meaning |
|---|---|
| (C1) | covariates centered |
| (C2) | additive component centered |
| (C3) | multiplicative component centered |
| (C4) | modulating matrix anchored |
| (C5) | support/integrability |
| (C6) | non-degeneracy of the reference |
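The joint consequence of (C1)–(C4) is the centering of the framework: the deviation averages to zero over the covariate distribution $\mu$, so the population mean of the individual parameters is the reference,

$$\mathbb E_\mu\big[\Delta(X,\theta_{\text{ref}})\big]=0\quad\Longleftrightarrow\quad\mathbb E_\mu[\theta_i]=\theta_{\text{ref}}.$$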
In the implementation, (C2)/(C3) are enforced empirically by column-centering the design matrices $Z_a$ and $Z_b$.
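The effect of column-centering is easy to verify on a toy design: once every column of a design matrix has empirical mean zero, any linear component built from it also has empirical mean zero, whatever the coefficients. The sketch below is an assumption-laden illustration (a linear additive component with hand-picked coefficients, hypothetical function names), not the package's code.

```python
def center_columns(Z):
    """Subtract each column's empirical mean from that column."""
    n, d = len(Z), len(Z[0])
    means = [sum(row[k] for row in Z) / n for k in range(d)]
    return [[row[k] - means[k] for k in range(d)] for row in Z]


def linear_component(Z_centered, coef):
    """Per-observation value of a linear component Z_centered @ coef."""
    return [sum(z * c for z, c in zip(row, coef)) for row in Z_centered]


Z = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
Zc = center_columns(Z)
a_values = linear_component(Zc, [0.5, -0.1])   # arbitrary coefficients
a_mean = sum(a_values) / len(a_values)
```

The empirical mean of the component is zero up to rounding, which is the sample analogue of the centering conditions on the additive and multiplicative components.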
Let
Abstract FIC at $\theta_*$. The spaces $\mathcal S_a$, $\mathcal S_b(\theta_*)$, $\mathcal S_W$ are linearly independent in $L^2(\mu,\mathbb R^p)$: $f_a+f_b+f_W=0\Rightarrow f_a=f_b=f_W=0$.
The basis-restricted FIC (
Under (LIN) ($\mathcal F_a,\mathcal F_b,\mathcal F_W$ finite-dimensional linear subspaces), (C1)–(C5), (C6) at $\theta_*$, and Abstract FIC at $\theta_*$, the latent function $\theta_i^*(\cdot)=\theta_*+a(\cdot)+b(\cdot)\odot\theta_*+W(\theta_*)\,\cdot$ uniquely determines $(\theta_*,a,b,W(\theta_*))$.
The proof has three steps: (1) taking
Necessity of FIC requires, in addition to (LIN) and (C1)–(C6), the hypothesis (EVAL): point evaluation $E_{\theta_*}:\mathcal F_W\to\mathbb R^{p\times d}$, $W\mapsto W(\theta_*)$, is surjective. If Abstract FIC fails (and (EVAL) holds), two distinct admissible tuples produce the same latent function, which is explicit non-identifiability. (LIN) lets perturbations stay in the classes; (EVAL) realizes the required
Lemma 1B. Under Theorem 1A's hypotheses plus (D-ID) (the response family $\{\mathcal D(\theta)\}$ is identifiable in $\theta$: $\mathcal D(\theta)=\mathcal D(\theta')\Rightarrow\theta=\theta'$), the joint law of $(X_i,Y_i)$ determines $(\theta_*,a,b,W(\theta_*))$.
(D-ID) holds for one-parameter exponential families with canonical link, full-rank multi-parameter exponential families, ZIP/ZINB under independent variation of
Fix a basis
For the chosen finite representation $B$, $\text{FIC}_B$ at $\theta_*$ holds iff $\mathbf G(\theta_*)$ is non-singular (Gram non-singularity $=$ column linear independence).
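The equivalence "Gram non-singular iff columns linearly independent" can be exercised on a toy basis. The sketch below builds the Gram matrix of a set of evaluated basis columns and tests non-singularity by pivoted Gaussian elimination with a relative tolerance. This is a swapped-in technique chosen to stay dependency-free: it is not claimed to be how the package's check works, and the function names and the tolerance value are assumptions.

```python
def gram(columns):
    """Gram matrix of a list of equally long columns (inner products)."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
            for ci in columns]


def is_nonsingular(G, tol=1e-8):
    """Pivoted Gaussian elimination; a pivot below tol * scale means singular.

    tol is an illustrative value, not a documented package default.
    """
    n = len(G)
    M = [row[:] for row in G]
    scale = max(abs(v) for row in M for v in row) or 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) <= tol * scale:
            return False
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return True


independent = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
collinear = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # second = 2 * first
ok = is_nonsingular(gram(independent))
bad = is_nonsingular(gram(collinear))
```

A production check would more plausibly inspect eigenvalues or a condition number so that near-singularity is graded rather than binary; the elimination test only captures the yes/no content of the proposition.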
Caveats: it diagnoses basis-restricted FIC, not abstract FIC; tol (default gdpar_check_identifiability().
For
(C4-bis) Coord-wise structural disjointness. For every $k$: $\mathrm{names}(Z_{a,k})\cap\mathrm{names}(X)=\emptyset$.
This is necessary but not sufficient (overlap enables aliasing; regularization can suppress it). The extended Gram matrix cannot detect it: at a fixed `check_C4_bis_per_k()`) from post-fit posterior-geometry forensics (divergences, low ESS, high `gdpar_check_identifiability(..., rigor=)` offers `"full"` (default; aborts on overlap) and `"fast"` (warns of class `gdpar_c4bis_overlap_warning`, for users who intend the overlap and regularize the `gdpar_prior()`). The per-$k$ breakdown (`passed`, `lambda_min/max`, `condition_number`, `shared_cols`, `collinear_directions`) is in `report$c4_bis$per_k`.
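The disjointness condition itself is a purely structural statement about column names, so it can be sketched with set intersections. The helper names below are hypothetical and the per-coordinate input format (a dict from coordinate label to column names) is an assumption made for the example; only the condition $\mathrm{names}(Z_{a,k})\cap\mathrm{names}(X)=\emptyset$ comes from the text.

```python
def c4_bis_shared_columns(z_a_names_per_k, x_names):
    """For each coordinate k, list the names shared between Z_{a,k} and X."""
    x = set(x_names)
    return {k: sorted(set(names) & x) for k, names in z_a_names_per_k.items()}


def c4_bis_passed(z_a_names_per_k, x_names):
    """True when no coordinate shares a column name with X."""
    shared = c4_bis_shared_columns(z_a_names_per_k, x_names)
    return all(len(cols) == 0 for cols in shared.values())


z_a = {"k1": ["age", "dose"], "k2": ["site"]}
shared = c4_bis_shared_columns(z_a, ["dose", "weight"])
passed = c4_bis_passed(z_a, ["dose", "weight"])
```

Here coordinate `k1` shares the column `dose` with `X`, so the structural check fails for that coordinate only, mirroring the per-coordinate reporting described above.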
Block 6.5 promotes the reference to a per-group anchor (`group = ~ species`; `group = NULL` reduces bit-exactly to the single-anchor regime). If
(C7). When `use_groups = 1`, $\mathrm{rank}([G\mid Z_a])=\mathrm{ncol}(G)+\mathrm{ncol}(Z_a)$, and likewise for $Z_b$.
Enforced pre-fit by `.check_group_aliasing_c7()` in two layers: (1) within-group variance per column (catches constant-per-group / `factor(group)` aliases), (2) joint QR rank of the normalized $[G\mid Z]$ block; a violation raises a `gdpar_input_error` naming the columns. Together with C1–C4 (global Gram) and C4-bis (cross-component per coord), C7 completes a three-tier pre-flight; the post-fit forensic remains the posterior geometry.
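The rank condition in (C7) can be illustrated on a four-observation toy problem. The sketch below forms the group-indicator matrix $G$, binds it to a design block $Z$, and compares the rank of $[G\mid Z]$ with $\mathrm{ncol}(G)+\mathrm{ncol}(Z)$. It uses Gaussian elimination instead of a QR decomposition to stay dependency-free, skips the within-group variance layer, and all function names and the tolerance are assumptions.

```python
def matrix_rank(M, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        piv = max(range(rank, n_rows), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) <= tol:
            continue                      # no usable pivot in this column
        A[rank], A[piv] = A[piv], A[rank]
        for r in range(rank + 1, n_rows):
            f = A[r][c] / A[rank][c]
            for k in range(c, n_cols):
                A[r][k] -= f * A[rank][k]
        rank += 1
    return rank


def group_indicators(groups):
    """One 0/1 indicator column per group level."""
    levels = sorted(set(groups))
    return [[1.0 if g == lv else 0.0 for lv in levels] for g in groups]


def c7_holds(groups, Z):
    """rank([G | Z]) == ncol(G) + ncol(Z)."""
    G = group_indicators(groups)
    GZ = [g_row + list(z_row) for g_row, z_row in zip(G, Z)]
    return matrix_rank(GZ) == len(G[0]) + len(Z[0])


groups = ["a", "a", "b", "b"]
aliased = c7_holds(groups, [[1.0], [1.0], [2.0], [2.0]])     # constant per group
clean = c7_holds(groups, [[0.5], [-0.5], [1.0], [-1.0]])     # varies within group
```

A column that is constant within each group is a linear combination of the indicator columns, so the joint rank drops and the condition fails; a column with within-group variation passes.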
In the Bayesian setting
- (a) $W=W'$ $\pi_\Theta$-a.e. on $\mathrm{supp}(\pi_\Theta)$;
- (b) $W=W'$ in $L^2(\pi_\Theta;\mathbb R^{p\times d})$ (the recommended default conclusion);
- (c) $W=W'$ in $C(\overline{\mathrm{supp}(\pi_\Theta)})$, additionally requiring (BAY-1) support $=$ closure of a connected open $U$, (BAY-2) $W$ continuous (subsumed by (C5)), (BAY-3) $\pi_\Theta$ charges every non-empty open subset of $U$ (so the a.e.-identification set is dense). These are automatic for absolutely continuous priors with positive density on a connected open set.
No tier identifies
For the hypernetwork,
When Path 3 diverges from Path 1 on the same data, a four-step empirical protocol discriminates "richer structure" from "undetected non-identifiability": (1) stability across
Standard models are Theorem-1A special cases verified to satisfy (LIN): standard regression (Level 0), random coefficients (Level 1, identifiability
Component selection over the eight restrictions
The package develops asymptotics for all three paths to reference grade (only Path 1 is executable). The reference text throughout is Ghosal & van der Vaart (2017); AMM-specific theorems are specializations, with explicit statements of what is established, what the AMM specialization costs in extra hypotheses, and what remains open.
The Path-1 model places priors on every component:
Two distances, no global equivalence. Hellinger
The three asymptotic layers parallel the three identifiability layers:
- (L1) Posterior consistency: $\Pi_n(\{d(\eta,\eta_*)>\varepsilon\})\xrightarrow{P}0$ for every $\varepsilon>0$.
- (L2) Contraction rate: there exists $\varepsilon_n\to0$ with $n\varepsilon_n^2\to\infty$ such that $\Pi_n(\{d>M\varepsilon_n\})\xrightarrow{P}0$.
- (L3) Bernstein–von Mises: $\Pi_n(\sqrt n(\eta-\widehat\eta_n)\in\cdot)\xrightarrow{w}\mathcal N(0,I_*^{-1})$ in total variation.
Standing asymptotic hypotheses (additional to C1–C6, LIN, D-ID, IID): (PRIOR-KL) $\pi(B_\varepsilon(\eta_*))>0$ for all $\varepsilon$ (KL-ball $B_\varepsilon=\{K(\eta_*,\eta)\le\varepsilon^2,\ V(\eta_*,\eta)\le\varepsilon^2\}$); (PRIOR-THICK) $\pi(B_{\varepsilon_n}(\eta_*))\ge e^{-C_1 n\varepsilon_n^2}$; (SIEVE) sieves
Theorem 4A (posterior consistency). Under C1–C6, LIN, D-ID, the Block-2 regularity (HOM)+(REG)+(IID), (PRIOR-KL), (TEST), and finite bracketing entropy on sieves with $\pi(\Theta_n^c)\to0$: $\Pi_n(\{d_H(\eta,\eta_*)>\varepsilon\})\xrightarrow{P_{\eta_*}}0$. (Schwartz 1965 specialized to AMM; the novelty is verifying PRIOR-KL and entropy for the product prior, which under LIN reduces to prior positivity at $\eta_*$.)
Theorem 4A discharges (REG-EST) of Block 2 in average-error form: the posterior-mean individual parameter
Theorem 4B (contraction rate). Adding (PRIOR-THICK) and (SIEVE) for $\varepsilon_n\to0$, $n\varepsilon_n^2\to\infty$: $\Pi_n(\{d_H>M\varepsilon_n\})\xrightarrow{P}0$ (Ghosal–Ghosh–van der Vaart 2000 specialized to AMM).
Rates by Level: Level 0/1 parametric
Theorem 4C (Bernstein–von Mises). For finite-dim parametric AMM (Levels 0/1/2 with finite-dim classes), under 4A+4B, (LAN), and a $\sqrt n$-consistent MLE, the posterior is asymptotically $\mathcal N(\widehat\eta_n,n^{-1}I_*^{-1})$ in total variation. Consequence: Bayesian credible intervals and asymptotic frequentist CIs coincide in the limit, which justifies reporting credible intervals as the primary uncertainty.
Proposition 4C-semi (semiparametric BvM). With parametric $\theta_{\text{ref}}$ and non-parametric $(a,b,W)$, under Castillo–Rousseau (2015) conditions ($\sqrt n$-recoverability + least-favorable-direction-aware prior), the marginal posterior of $\theta_{\text{ref}}$ is $\sqrt n$-asymptotically normal at the semiparametric efficiency bound $V_*$. Tight scope: this is only for the marginal of $\theta_{\text{ref}}$; the function-valued $(a,b,W)$ need not be asymptotically Gaussian in a function-space metric, and the library reports their intervals as posterior quantiles (function-space credible balls), never as $\sqrt n$ Gaussian intervals.
Open questions explicitly recognized: (O1) full BvM for non-parametric components (only partial Sobolev-topology results exist); (O2) adaptive contraction rates for general AMM Path-1 priors; (O3) misspecification asymptotics under failure of (HOM)/(REG) (contraction to a KL-projection pseudo-true parameter, Kleijn–van der Vaart 2012). Implementation diagnostics: prior KL-support report, Stan
These paths are not executable in gdpar 0.1.0 but carry reference-grade asymptotics:
- Path 2 (VCM, vignette `v05`). Frequentist penalized-spline asymptotics: pointwise and uniform consistency of $\widehat\beta(\cdot)$, asymptotic normality at the spline rate $n^{-\beta/(2\beta+1)}$, with the reference recovered as $\beta(\bar z)$ and the deviation as $\beta(z)-\beta(\bar z)$; conditions specialize Fan–Zhang (2008), Stone (1985), Wood (2017). The curse of dimensionality in $z$ is the binding limitation.
- Path 3 (hypernetwork, vignette `v06`). Only partial results are available for Bayesian neural networks: consistency under the Neural-Tangent-Kernel regime (Jacot et al. 2018; Bach 2017), PAC-Bayes generalization bounds (Dziugaite–Roy 2017), and an explicit acknowledgement that BvM and contraction rates are open (Hron et al. 2020). Universal approximation gives density, not identifiability; the function-level identifiability of $\Phi_\phi$ and its contraction are open (cf. Proposition 1F).
The cross-path consistency: parametric AMM gets the full
Partition
Theorem 7A (first-order equivalence). Under regularity + the three hypotheses, EB and FB lower-level posteriors agree asymptotically. Regime A (finite-dim parametric AMM): $d_{\text{TV}}(\Pi_n^{\text{EB}},\Pi_n^{\text{FB}})\xrightarrow{P}0$. Regime B (non-parametric AMM): TV is too strong; convergence holds for smooth bounded $L^2(\mu)$-Lipschitz functionals (equivalently Wasserstein-1/Prokhorov on the joint posterior). Specializes Petrone–Rousseau–Scricciolo (2014) and Rousseau–Szabo (2017). Practical content: for large $n$ and a weak prior, EB and FB give essentially the same posterior over $\xi$; the choice is then computational/methodological.
Proposition 7B (higher-order coverage). Under Edgeworth-type expansion conditions (Bickel–Ghosh 1990), EB credible intervals for a smooth functional $g(\xi)$ under-cover by $O(n^{-1})$: $\mathbb P(g(\xi^*)\in\mathrm{CI}_n^{\text{EB},\alpha})=(1-\alpha)-C_{g,\alpha}\,n^{-1}+o(n^{-1})$, with $C_{g,\alpha}\approx(g'(\xi^*))^2/I_{\theta\theta}^{\text{marg}}\cdot\kappa(\alpha)$ (larger when $g$ is sensitive to $\theta_{\text{ref}}$, smaller when $\theta_{\text{ref}}$ is well-identified); FB covers to first order. The library applies a post-hoc inflation $\sqrt{1+C_{g,\alpha}/(n-q)}$ (argument `eb_correction=TRUE`), explicitly approximate.
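The inflation factor $\sqrt{1+C_{g,\alpha}/(n-q)}$ is simple enough to spell out. The sketch below computes it and applies it to an interval by widening the half-width around the midpoint. Only the factor itself comes from the text; widening symmetrically about the midpoint, the function names, and the numeric inputs are assumptions made for illustration.

```python
import math


def eb_inflation(c_g_alpha, n, q):
    """Post-hoc width inflation sqrt(1 + C / (n - q)); requires n > q."""
    if n <= q:
        raise ValueError("n must exceed q")
    return math.sqrt(1.0 + c_g_alpha / (n - q))


def inflate_interval(lower, upper, c_g_alpha, n, q):
    """Widen an interval about its midpoint (assumed convention)."""
    centre = 0.5 * (lower + upper)
    half = 0.5 * (upper - lower) * eb_inflation(c_g_alpha, n, q)
    return centre - half, centre + half


factor_large_n = eb_inflation(3.0, n=103, q=3)     # sqrt(1.03), barely above 1
widened = inflate_interval(-1.0, 1.0, c_g_alpha=3.0, n=4, q=3)
```

The factor tends to 1 as $n$ grows, consistent with the $O(n^{-1})$ under-coverage vanishing asymptotically; for tiny $n-q$ the widening is substantial.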
Theorem 7C (compound decision, Robbins–Efron). For $K$ exchangeable units, $\frac1K\sum_k\mathbb E[(\widehat\xi_k^{\text{EB}}-\xi_k^*)^2]\le\frac1K\sum_k\mathbb E[(\widehat\xi_k^{\text{FB}}-\xi_k^*)^2]+B_K$ with $B_K\le\frac{C_1}{K}\mathbb E[(\widehat\theta_{\text{ref}}^{\text{EB}}-\theta_{\text{ref}}^*)^2]\to0$: EB risk approaches FB risk as $K\to\infty$ (squared-error loss on point estimates only; EB intervals still under-cover per 7B).
Proposition 7D. EB and FB differ substantially when: (i) small $I_{\theta\theta}^{\text{marg}}$ (poorly identified upper level); (ii) strongly informative $\pi_\Theta$; (iii) multimodal $L_n^{\text{marg}}$; (iv) misspecified $\pi_\xi$ (EB regularizes by tuning $\theta_{\text{ref}}$, FB does not).
Default is FB (gdpar()); EB (gdpar_eb()) is opt-in. EB's independent methodological advantages: honest avoidance of a prior on
Multivariate / multi-slot extension (vignette v07b): Theorem 7A* (to
The four gdpar_eb() path regimes (dispatched from the resolved $(K,p)$), each with a marginal+conditional Stan template pair and its own Proposition-7B correction:
| Regime | $(K,p)$ | Stan template pair | 7B correction |
|---|---|---|---|
| Base | $K=1$, $p=1$ | `amm_eb_marginal.stan` + `amm_eb_conditional.stan` | scalar |
| Path A | $K=1$, $p>1$ | `amm_eb_marginal_multi.stan` + `..._conditional_multi.stan` | matricial 7B* |
| Path B | $K>1$, $p=1$ | `amm_eb_marginal_K.stan` + `..._conditional_K.stan` | per-slot scalar |
| Path C | $K>1$, $p>1$ | `amm_eb_marginal_KxP.stan` + `..._conditional_KxP.stan` | tensor |
The canonical EB recipe (three steps): (i) marginal-likelihood maximization for cmdstanr::laplace() with multi-start + Levenberg–Marquardt ridge + condition-number guard — the anti-fragility strategy); (ii) plug gdpar_compare_eb_fb() operationally verifies Theorem 7A (marginal TV) and Proposition 7B (per-cell width ratio).
Hierarchical AMM posteriors can suffer the classic centered/non-centered pathology of hierarchical models. The package treats the parametrization of the multiplicative interaction R/preflight.R, R/preflight_multi.R, R/contraction_diagnostic.R.
For the multiplicative term, write the contribution of coordinate involving the reference as
- Centered (CP). Sample $b_{\text{coef}}$ directly; the term $\theta_{\text{ref}}\cdot b_{\text{coef}}$ couples $b_{\text{coef}}$ to $\theta_{\text{ref}}$ multiplicatively. CP mixes well when the data are informative about the interaction (the likelihood dominates the funnel).
- Non-centered (NCP). Reparametrize $b_{\text{coef}}=\mu_b+\tau_b\,\tilde b$ with $\tilde b\sim\mathrm{Normal}(0,1)$, decoupling the prior geometry; NCP mixes well when the data are weakly informative (prior-dominated funnel).
- Linear reparametrization. Sample the product $c_b=\theta_{\text{ref}}\cdot b_{\text{coef}}$ directly as a linear coordinate, sidestepping the bilinear $(\theta_{\text{ref}},b)$ geometry altogether. This is the package's resolution of the deeper diagnosis (below): the root cause of the funnel is the non-linear $(\theta_{\text{ref}},b)$ parametrization, not the centering per se; sampling the linear product removes the bilinearity at the source.
The diagnosis that led here proceeded in three iterations: NCP did not cure it, CP did not cure it, and the residual pathology was traced to the bilinear coordinate; the final fix samples the linear product $c_b$ directly.
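The non-centered and linear reparametrizations are changes of coordinates, which a few lines of plain Python can make tangible. The sketch below draws $\tilde b\sim\mathrm{Normal}(0,1)$ and maps it to $b_{\text{coef}}=\mu_b+\tau_b\,\tilde b$, then shows the round trip between $b_{\text{coef}}$ and the linear coordinate $c_b=\theta_{\text{ref}}\cdot b_{\text{coef}}$. It is a scalar toy with hypothetical function names and made-up values, and it says nothing about how the Stan templates implement these coordinates.

```python
import random
import statistics


def draw_b_ncp(mu_b, tau_b, n_draws, seed=0):
    """Non-centered draws: sample b_tilde ~ N(0, 1), then shift and scale."""
    rng = random.Random(seed)
    b_tilde = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    return [mu_b + tau_b * t for t in b_tilde]


def to_linear_coordinate(theta_ref, b_coef):
    """Linear coordinate c_b = theta_ref * b_coef."""
    return theta_ref * b_coef


def from_linear_coordinate(theta_ref, c_b):
    """Recover b_coef from c_b; only defined for theta_ref != 0."""
    return c_b / theta_ref


draws = draw_b_ncp(mu_b=2.0, tau_b=0.5, n_draws=20000)
sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)
c_b = to_linear_coordinate(4.0, 0.25)
```

The transformed draws have mean close to $\mu_b$ and standard deviation close to $\tau_b$, so the non-centered coordinates describe the same prior. The inverse map also makes visible why the linear coordinate needs a non-degenerate reference: recovering $b_{\text{coef}}$ divides by $\theta_{\text{ref}}$.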
The choice among CP/NCP/(linear) is made by a pre-flight procedure that runs a short pilot and computes an information ratio discriminating prior-dominated from likelihood-dominated regimes, then dispatches:
- `preflight_parametrization()` (scalar) / `preflight_parametrization_multi()` ($p>1$, per coordinate) run the pilot sample and the attribution/info-ratio computation.
- The information ratio contrasts how much of the posterior variation in the interaction is attributable to the data vs the prior; an asymptotic $z$/$t$-style test on it, evaluated against thresholds, picks the regime. Defaults: `tau_cp = 5`, `tau_ncp = 2` (the boundary thresholds of the ratio); a high ratio (data-informative) selects CP, a low ratio selects NCP.
- The variance/score machinery is made dependence-aware: effective weights + a chain-aware block bootstrap + an asymptotic z-test give the ratio's sampling distribution without assuming independence of the pilot draws (Path B′ canonical design). The per-coordinate variant decides each coordinate of $\theta_{\text{ref}}\in\mathbb R^p$ separately and resolves a global vs per-dimension decision.
- `resolve_parametrization()` / `resolve_parametrization_multi()` turn the diagnostic verdict into the concrete Stan-side toggle used by the code generator. `preflight_global_decision()` and `preflight_per_dim()` are the exported user-facing entry points; `decision_to_logical()` maps the verdict to the boolean the template consumes.
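The threshold logic in the second bullet can be written as a tiny dispatcher. The sketch below keeps only what the text states (defaults `tau_cp = 5` and `tau_ncp = 2`, high ratio selects CP, low ratio selects NCP) and is otherwise hypothetical: the function name, the inclusive comparisons, and the `"undecided"` label for ratios between the two thresholds are assumptions, and the statistical test that the package wraps around the ratio is omitted.

```python
def choose_parametrization(info_ratio, tau_cp=5.0, tau_ncp=2.0):
    """Map an information ratio to a parametrization label.

    "undecided" is a placeholder for the band between the thresholds,
    whose handling is not specified in the text.
    """
    if tau_ncp > tau_cp:
        raise ValueError("tau_ncp must not exceed tau_cp")
    if info_ratio >= tau_cp:
        return "cp"        # data-informative: centered
    if info_ratio <= tau_ncp:
        return "ncp"       # prior-dominated: non-centered
    return "undecided"


verdicts = [choose_parametrization(r) for r in (8.0, 1.0, 3.5)]
```

In a per-coordinate setting the same rule would be applied to each coordinate's ratio in turn, producing one label per coordinate.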
Confounding-induced NCP preference is treated as correct, not a defect: when a covariate confounds the reference, the prior-dominated geometry genuinely calls for NCP, and the diagnostic is designed to detect exactly that. The whole apparatus is data-driven: no parametrization is hard-coded; the knob is set from a declared, reproducible statistic of a pilot fit.
gdpar_contraction_diagnostic() (R/contraction_diagnostic.R) operationalizes the empirical contraction-rate verification of Theorem 4B (§II.2.1): it refits at increasing sample sizes and tracks the posterior credible-set diameter, flagging deviations from the predicted rate
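The idea of checking an empirical contraction rate can be sketched without any refitting by using a model whose posterior is known in closed form. The example below takes a conjugate normal mean model with known noise, evaluates the posterior standard deviation at increasing sample sizes, and estimates the log-log slope, which should sit near the parametric value of $-1/2$. Everything here (the conjugate model standing in for refits, the function names, the sample sizes) is an assumption for illustration and not the diagnostic's actual procedure.

```python
import math


def posterior_sd(n, sigma=1.0, prior_sd=10.0):
    """Posterior sd of a normal mean: known noise sd, normal prior."""
    return math.sqrt(1.0 / (1.0 / prior_sd ** 2 + n / sigma ** 2))


def loglog_slope(ns, widths):
    """Least-squares slope of log(width) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(w) for w in widths]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


sample_sizes = [100, 200, 400, 800, 1600]
widths = [posterior_sd(n) for n in sample_sizes]
slope = loglog_slope(sample_sizes, widths)
```

A slope markedly shallower than the predicted rate would be the kind of deviation a contraction diagnostic is meant to flag; here the conjugate model reproduces the parametric rate almost exactly.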
A family is represented internally as an ordered list of per-parameter specifications (`gdpar_param_spec`), one per slot name. Each specification carries a `link` (and its inverse), a `family_role` ∈ {`location`, `scale`, `shape`, `df`, `mixture_pi`, `power`}, a `support` (`real_line`/`positive_real`/`unit_interval`/`bounded_open`), an identifiability descriptor (`did_status` ∈ {`holds`, `holds_under_condition`, `user_responsible`} with `condition` + `reference`, i.e. the (D-ID) hypothesis of Lemma 1B made per-slot), a canonical prior kind, and a `scope` (`per_observation` or `population`). The link factory implements exactly three links: identity, log, and logit.
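A link factory of this kind is small enough to sketch in full. The version below assumes the three links are identity, log, and logit (the ones the roster description assigns to real-line, positive-real, and unit-interval slots) and returns each as a (link, inverse-link) pair. The names `LINKS` and `make_link` are hypothetical and the code is a Python illustration, not the package's R factory.

```python
import math

# Each entry is (link, inverse_link); the choice of three links is assumed.
LINKS = {
    "identity": (lambda theta: theta, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda theta: math.log(theta / (1.0 - theta)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def make_link(name):
    """Return the (link, inverse_link) pair for a supported link name."""
    if name not in LINKS:
        raise ValueError(f"unsupported link: {name!r}")
    return LINKS[name]


log_link, log_inv = make_link("log")
logit_link, logit_inv = make_link("logit")
round_trip_scale = log_inv(log_link(2.5))      # positive-real slot
round_trip_prob = logit_inv(logit_link(0.2))   # unit-interval slot
```

Keeping link and inverse together is what lets a slot be modelled on an unconstrained scale and reported on its natural scale.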
The built-in roster (location slot gets the user link; auxiliaries get fixed links — log for positive-real, logit for unit-interval, identity for the Tweedie power):
| Family | Slots (role, link) |
|---|---|
| `gaussian` | |
| `poisson` | |
| `neg_binomial_2` | |
| `bernoulli` | |
| `beta` | |
| `gamma` | |
| `student_t` | |
| `tweedie` | |
| `zip` | |
| `zinb` | |
| `hurdle_poisson` | |
| `hurdle_neg_binomial_2` | |
Each family carries an integer stan_id selecting the likelihood branch in the Stan templates (Part V). Auxiliary slots default to population scope and can be promoted to per_observation (their own AMM decomposition) by the user — that is distributional regression: gdpar_family_custom, gdpar_family_custom_K) where the user declares the likelihood and accepts responsibility for (D-ID) via did_override. The exported gdpar_family, gdpar_family_multi (the R/families.R; per-likelihood Stan math: Part V.
The modulating component W_basis(type, degree, knots, df, boundary_knots, basis_fn, dim, p) with three types:
- `polynomial` (default degree 1): block-by-coordinate powers, no cross-terms,
  $$\text{eval}(\theta)=\big(\theta_1,\theta_1^2,\dots,\theta_1^{\deg},\ \theta_2,\dots,\theta_2^{\deg},\ \dots,\ \theta_{p},\dots,\theta_{p}^{\deg}\big);$$
- `bspline` (default degree 3): per-coordinate B-spline bases concatenated, $\text{eval}(\theta)=(B(\theta_1),\dots,B(\theta_p))$, with Cox–de Boor evaluation performed Stan-side for differentiability inside HMC; the R side (`.gdpar_resolve_bspline_knots`, `.gdpar_bspline_knots_full`, `.gdpar_validate_bspline_boundary_range`) resolves interior knots (from `knots` or `df`) and boundary knots and validates the reference range; `boundary_knots` defaults to `range(c(knots, tk))`;
- `user`: an arbitrary `basis_fn` of declared `dim`.
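The polynomial layout can be illustrated with a few lines of base R. This is a minimal sketch of the block-by-coordinate ordering only; `poly_block_eval` is a hypothetical helper introduced for illustration and is not the package's evaluation code.

```r
# Hypothetical helper (not part of gdpar): block-by-coordinate polynomial
# features with no cross-terms, ordered coordinate by coordinate.
poly_block_eval <- function(theta, degree = 1L) {
  stopifnot(is.numeric(theta), length(theta) >= 1L, degree >= 1L)
  # For each coordinate theta_j emit theta_j, theta_j^2, ..., theta_j^degree.
  unlist(lapply(theta, function(t) t^seq_len(degree)))
}

poly_block_eval(c(2, 3), degree = 3L)
# 2 4 8 3 9 27
```

With `degree = 1L` the function returns `theta` itself, which matches the default polynomial degree stated above.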
materialize_W_basis() populates dim, p, and block_indices (the per-coordinate column blocks) once as_per_k() reshapes to per-coordinate form. The anchoring constraint (C4) is imposed through the apply_W_basis_diff mechanism (selected by a W_type_id). Detail: Part IV, R/W_basis.R; Stan-side Cox–de Boor: Part V.
Activated by group = ~ factor in gdpar(), the scalar/coord-wise reference is promoted to a per-group anchor; group = NULL reduces bit-exactly to the single-anchor regime. The anti-aliasing condition (C7) (§II.1.9) is enforced pre-fit by .check_group_aliasing_c7(). Anchor resolution across regimes is handled by resolve_anchor, resolve_anchor_multi, resolve_anchor_K; extraction of grouped references by the .extract_theta_ref_*_grouped helpers (Part IV, R/gdpar.R, R/methods.R).
Because the AMM produces individual parameters, individual treatment effects follow directly. gdpar_causal_bridge() implements a T-learner: fit the anchored model separately under treatment and control and read the individual CATE/ITE as the difference of the anchored individual predictions under treatment and under control (summarized by .summarize_cate).
A battery of pre-checks (.check_bridge_path/_hierarchical/_family/_dim/_amm/_anchor) guards the bridge's applicability. The second layer, gdpar_compare_meta_learners(), benchmarks the AMM learner against external meta-learners through a pluggable adapter registry: gdpar_adapter_grf (R, generalized random forests) and gdpar_adapter_econml (Python EconML via reticulate, e.g. CausalForestDML realizing orthogonal/DML CATE). Adapters honor a two-layer contract (fit_predict_fun + optional predict_fun). Detail: Part IV, R/causal_bridge.R, R/compare_meta_learners.R, R/adapter_*.R; theory: vignettes v08, v08b, v08c.
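The T-learner logic can be sketched without the package. The example below uses `lm()` as a stand-in for the anchored model, which is an assumption made purely for illustration; the simulated data and all object names are hypothetical.

```r
# T-learner sketch: lm() stands in for the anchored model (assumption).
set.seed(1)
n  <- 400
x  <- runif(n)
tr <- rbinom(n, 1, 0.5)                           # treatment indicator
y  <- 1 + x + tr * (2 * x) + rnorm(n, sd = 0.1)   # true CATE(x) = 2 * x
d  <- data.frame(y = y, x = x, tr = tr)

fit1 <- lm(y ~ x, data = d[d$tr == 1, ])          # fitted on treated units only
fit0 <- lm(y ~ x, data = d[d$tr == 0, ])          # fitted on control units only

# CATE estimate = prediction under treatment minus prediction under control.
grid     <- data.frame(x = c(0.25, 0.75))
cate_hat <- predict(fit1, grid) - predict(fit0, grid)
cate_hat
```

The two fits never share parameters, which is what distinguishes a T-learner from a single model with a treatment interaction.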
For geometrically hostile posteriors (funnels, near-determinism, heavy tails, multimodality), the package ships an opt-in geometry engine whose default path is bit-identical to ordinary HMC. When enabled it climbs a ladder of metrics:
- Euclidean (`gdpar_geom_metric_euclidean`): baseline mass matrix;
- Riemannian / Fisher (`gdpar_geom_metric_gp_fisher`, `gdpar_geom_metric_riemannian`) with the SoftAbs regularization of the Hessian eigenvalues, $\lambda\mapsto\lambda\coth(\alpha\lambda)$, giving a positive-definite metric from a possibly-indefinite Hessian; log-Cholesky parametrization of $G$ and its differentials ($dM/d\psi$);
- sub-Riemannian (`gdpar_geom_metric_subriemannian`): a degenerate/constrained metric flowing along admissible directions;
- relativistic / Finsler (`gdpar_geom_metric_relativistic`): a relativistic kinetic energy capping velocities (heavy-tail/ill-conditioning robustness), with its own radial integrator.
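The SoftAbs map on the Riemannian rung is easy to check numerically. The following is a minimal base-R sketch of the eigenvalue map $\lambda\mapsto\lambda\coth(\alpha\lambda)$ applied to a symmetric matrix; `softabs` and `softabs_metric` are hypothetical names, and the package's own metric construction is not reproduced here.

```r
# SoftAbs eigenvalue map: lambda * coth(alpha * lambda), written via tanh().
softabs <- function(lambda, alpha = 1e6) {
  out <- lambda / tanh(alpha * lambda)
  out[lambda == 0] <- 1 / alpha      # limiting value as lambda -> 0
  out
}

# Apply the map to the spectrum of a (possibly indefinite) symmetric matrix.
softabs_metric <- function(H, alpha = 1e6) {
  eig <- eigen((H + t(H)) / 2, symmetric = TRUE)
  Q   <- eig$vectors
  Q %*% diag(softabs(eig$values, alpha), nrow = nrow(H)) %*% t(Q)
}

H <- matrix(c(2, 0, 0, -3), nrow = 2)    # indefinite: eigenvalues 2 and -3
G <- softabs_metric(H, alpha = 10)
eigen(G, symmetric = TRUE)$values        # approximately 3 and 2, all positive
```

For large $\alpha$ the map approaches $|\lambda|$, so the negative curvature direction is reflected rather than discarded.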
A certifying orchestrator (gdpar_geom_orchestrate, with _criteria and _budget variants) diagnoses the pathology (gdpar_geometry_diagnostic: multimodality, heavy kurtosis, boundary proximity, difficulty curve), selects an entry rung, builds and tunes the metric, and certifies the outcome (gdpar_geom_certificate). A Laplace fallback (gdpar_geom_laplace: Newton/Laplace climb, observed information, unconstrained draws, fit-quality label) provides a plug-in posterior and an ELPD on par with mgcv-REML / INLA-Laplace when full sampling is certified infeasible (the resolution of the near-deterministic Tweedie case). Detail: Part IV, R/geometry_engine.R, R/geometry_orchestrator.R, R/geometry_suite.R, R/geometry_bridge.R, R/geometry_laplace.R, R/geometry_diagnostic.R; theory: vignette vop08.
gdpar does not model dependence in its point structure; it makes the inference robust to unmodelled dependence (point estimates unchanged, only uncertainty made robust). Two axes:
- Temporal (`gdpar_dependence_diagnostic`, `gdpar_dependence_robust`): diagnostics = lag-1 autocorrelation, Durbin–Watson, Ljung–Box on residuals; robust SE/intervals via moving/circular block bootstrap with a data-driven block length. The automatic length is the Politis–White (2004) + Patton–Politis–White (2009) flat-top rule (a base-R hand-roll equal to `np::b.star`): flat-top autocovariance + adaptive bandwidth $b=(2\hat g^2/D)^{1/3}n^{1/3}$, $D=\tfrac43 \text{spec}^2$; default `block_length = NULL` reduces bit-exactly to the $n^{1/3}$ Künsch rate.
- Spatial (`gdpar_spatial_dependence_diagnostic`, `gdpar_spatial_dependence_robust`): diagnostic = Moran's $I$ (hand-roll, no `spdep`/`sf`) with knn row-standardized / distance-band / user-supplied weights, two-sided permutation default + analytic Cliff–Ord option; robust inference via tiled randomized-origin spatial block bootstrap (Politis–Romano–Lahiri) with variance-optimal block side $g=\max(2,\lceil n^{1/4}\rceil)$ in $d=2$, plus a data-driven subsampling calibration of $g$.
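For the spatial diagnostic, Moran's $I$ is short enough to hand-roll in base R. The sketch below is the textbook statistic $I=(n/S_0)\,z^\top W z / z^\top z$ with a user-supplied weight matrix; `morans_i` is a hypothetical name and the package's own implementation (weights construction, permutation test) is not shown.

```r
# Textbook Moran's I for a numeric vector x and an n x n weight matrix W.
morans_i <- function(x, W) {
  stopifnot(is.matrix(W), nrow(W) == length(x), ncol(W) == length(x))
  z  <- x - mean(x)            # centred values
  n  <- length(x)
  s0 <- sum(W)                 # total weight
  (n / s0) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

# Four sites on a line with binary adjacency weights.
W <- matrix(0, 4, 4)
for (i in 1:3) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

morans_i(c(1, -1, 1, -1), W)   # -1: neighbours are perfectly alternating
```

Values near zero indicate no spatial autocorrelation; a permutation reference distribution can be obtained by recomputing the statistic on shuffled copies of `x`.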
Both axes work on the EB scalar path and the FB path (gdpar_eb_fit and gdpar_fit, respectively) via a shared engine (.gdpar_dependence_robust_engine) with class-dispatched estimate/SE/residual extractors and a frozen RNG contract that keeps the EB route bit-exact. Detail: Part IV, R/dependence_robust.R; theory: vignette vop09.
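The temporal route rests on the moving block bootstrap. As a minimal sketch, assuming the statistic of interest is a sample mean and using the fixed $n^{1/3}$ block length rather than the flat-top rule, the resampling step looks as follows; `mbb_se_mean` is a hypothetical helper, not the package's engine.

```r
# Moving block bootstrap standard error of the mean (illustrative only).
mbb_se_mean <- function(x, block_length = ceiling(length(x)^(1 / 3)), B = 500L) {
  n <- length(x)
  l <- as.integer(block_length)
  stopifnot(l >= 1L, l <= n)
  n_blocks   <- ceiling(n / l)
  starts_max <- n - l + 1L               # last admissible block start
  stats <- replicate(B, {
    starts <- sample.int(starts_max, n_blocks, replace = TRUE)
    # Each column of outer() is one block of l consecutive indices.
    idx <- as.vector(outer(seq_len(l) - 1L, starts, `+`))[seq_len(n)]
    mean(x[idx])
  })
  sd(stats)
}

set.seed(42)
x        <- as.numeric(arima.sim(list(ar = 0.6), n = 300))
se_block <- mbb_se_mean(x)
se_iid   <- sd(x) / sqrt(length(x))      # naive SE that ignores dependence
c(block = se_block, iid = se_iid)
```

Resampling whole blocks preserves short-range dependence inside each block, which is why the point estimate is unchanged and only the uncertainty is affected.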
This part is the map of how a model flows from a user call to posterior summaries. Every function named here is documented in full in Part IV; the Stan templates in Part V.
A call to gdpar(formula, family, amm, data, group = NULL, path = "bayesian", ...) proceeds through:
1. Specification resolution. The response/formula is parsed; the AMM design is built. The user may supply the AMM three ways: a single `amm_spec()` (scalar/coord-wise), a named list of specs, or a `gdpar_formula_set()` / `gdpar_bf()` (a brms-like multi-slot formula DSL parsed by `.gdpar_parse_amm_formula`). `dims_spec` (`dimwise` / `override` / `resolve_dims_spec`) declares per-coordinate ($p>1$) component formulas. The family is resolved (`gdpar_family` / `gdpar_family_multi` / heterogeneous) into the per-slot `gdpar_param_spec` list, and the resolved $(K,p)$ pair determines the path regime.
2. Design construction. `build_amm_design` (and `.build_amm_design_multi`, `.build_amm_design_K`) assemble the centered design matrices $Z_a,Z_b$ (enforcing (C2)/(C3) by column-centering) and the modulating design $X$ with the `W` basis materialized.
3. Identifiability pre-flight. `gdpar_check_identifiability()` runs the Gram-matrix check (Prop. 1C), the C4-bis cross-component check for $p>1$ (`check_C4_bis_per_k`), the per-slot checks for $K>1$, and, when `group` is set, the C7 anti-aliasing check (`.check_group_aliasing_c7`). Failure aborts with `gdpar_identifiability_error` unless `skip_id_check = TRUE`.
4. Parametrization pre-flight. The CP/NCP/(linear) decision is data-driven (§II.6), taken via `preflight_parametrization*` → `resolve_parametrization*`.
5. Code generation. `R/stan_codegen.R` assembles the Stan program string for the resolved $(K,p,\text{family},W,\text{parametrization},\text{group})$ from the canonical Stan pieces, then compiles via `cmdstanr`.
6. Sampling. HMC runs (optionally through the geometry engine, §II.9). For $K=1,p=1$ the path is `gdpar()` proper; $p>1$ dispatches to `.gdpar_multi`, $K>1$ to `.gdpar_K` / `.gdpar_K_build`.
7. Diagnostics + packaging. `compute_diagnostics` collects $\widehat R$, ESS, divergences (with the all-NA $\to-\infty$ + warning guard of D77); `diagnostics()` exposes them; the result is a `gdpar_fit` object.
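The column-centering mentioned in the design-construction step can be illustrated in base R. This is a sketch of what centering a design matrix amounts to, under the assumption that each column is shifted by its sample mean; it is not the code of `build_amm_design`, and the data are simulated for illustration.

```r
# Column-centering a design matrix (illustrative data and names).
set.seed(7)
d <- data.frame(x1 = rnorm(50, mean = 3), x2 = runif(50))

# Build the design from a one-sided formula and drop the intercept column.
Z <- model.matrix(~ x1 + x2, data = d)[, -1, drop = FALSE]

# Subtract each column's mean so every column sums to zero.
Z_centered <- sweep(Z, 2, colMeans(Z))
colMeans(Z_centered)
```

After centering, a constant shift can no longer be absorbed by the columns of the design, which is the role the text assigns to (C2)/(C3).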
gdpar_eb() replaces steps 5–7 with the three-step EB recipe (§II.3): Laplace marginal-likelihood maximization for
R/stan_codegen.R is a string-assembly compiler: it does not ship one monolithic Stan file per case but composes a program from canonical pieces under inst/stan/_canonical_pieces/ (helpers, the likelihood branch selected by stan_id), the chosen W evaluation (W_type_id: polynomial / Stan-side Cox–de Boor B-spline / differenced anchoring), and the parametrization toggle (CP/NCP/linear).
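The string-assembly idea can be sketched in a few lines. The pieces, their names, and `assemble_program` below are all hypothetical and far simpler than the package's canonical pieces; the sketch only shows how a program string is selected and concatenated from named fragments.

```r
# Hypothetical Stan fragments keyed by name (not the package's pieces).
pieces <- list(
  data           = "data {\n  int<lower=1> N;\n  vector[N] y;\n}",
  parameters     = "parameters {\n  real mu;\n  real<lower=0> sigma;\n}",
  model_gaussian = "model {\n  y ~ normal(mu, sigma);\n}",
  model_student  = "model {\n  y ~ student_t(4, mu, sigma);\n}"
)

# Pick the likelihood fragment by name and concatenate the blocks in order.
assemble_program <- function(pieces, likelihood = c("gaussian", "student")) {
  likelihood <- match.arg(likelihood)
  keys <- c("data", "parameters", paste0("model_", likelihood))
  paste(unlist(pieces[keys]), collapse = "\n")
}

prog <- assemble_program(pieces, "student")
cat(prog, "\n")
```

The resulting string is what a compiler front end such as `cmdstanr` would receive; composing from fragments keeps each variant in one place instead of duplicating whole programs.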
- `R/gdpar.R`: the orchestrator, comprising `gdpar()` (entry), `.gdpar_multi` ($p>1$), `.gdpar_K` / `.gdpar_K_build` ($K>1$), anchor resolution, `compute_diagnostics`, and `dedup_message_blocks` (clean console output).
- `R/eb.R`: the Empirical-Bayes engine (largest file, 3196 lines), with the four-regime marginal/conditional fit drivers, the Laplace anti-fragility machinery, and the EB `gdpar_eb_fit` packaging.
- `R/families.R`: the family/codegen-facing layer, covering param-spec construction, link factory, per-slot scope promotion, heterogeneous-family resolution, and inverse-link-id computation per slot.
- `R/geometry_*.R`: the geometry engine, orchestrator, suite, bridge, Laplace, diagnostic (§II.9).
- `R/dependence_robust.R`: the dependence-robust inference engine (§II.10).
The user-facing objects and their methods (full list in NAMESPACE; detail in Part IV):
- `gdpar_fit`: `print`, `summary`, `predict` (with the $p>1$ array path and $K>1$ per-slot inverse-link path, plus grouped/newdata variants), `coef` (→ `gdpar_coef`), `residuals` (Dunn–Smyth / deviance / quantile, with DHARMa integration), `pp_check`, `gdpar_loo`, `gdpar_posterior_predict`, `gdpar_dharma_object`.
- `gdpar_eb_fit`: `print`, `summary`, `coef`, `predict`.
- `gdpar_coef`: `print`, `summary`, `format`, `as.data.frame`.
- causal/comparison: `gdpar_causal_bridge`, `gdpar_meta_learner_comparison`, `gdpar_eb_fb_comparison`, each with `print`/`summary`/`predict` as applicable.
- specs/reports: `amm_spec`, `amm_builder`, `W_basis`, `dims_spec`, `gdpar_family(_multi)`, `gdpar_formula_set`, `gdpar_param_spec`, and the diagnostic reports (`gdpar_identifiability_report`, `gdpar_preflight_report`, `gdpar_contraction_report`, `gdpar_bvm_report`, `gdpar_dependence_diagnostic`, `gdpar_spatial_dependence_*`, `gdpar_ksd_joint`, `gdpar_geometry_diagnostic`, geometry certificates) all carry `print` (and where useful `summary`/`format`/`as.data.frame`).
- `R/amm_serialize.R`: `amm_save_spec` / `amm_load_spec` (a text round-trip of an AMM spec: formula/char/numeric/W-record (de)serializers); `gdpar_snapshot_fit` snapshots a fit.
- `R/golden_compare.R`, `R/golden_helpers.R`: the golden-regression machinery (manifest, roster, structural/discrete/continuous/sanity comparators, code-hash + toolchain-version stamping) underpinning the test tiers of §VI.3.
- `R/ksd_joint.R`: a kernelized Stein discrepancy joint diagnostic.
- `R/bvm_check.R`: the Bernstein–von Mises calibration check (§II.2.1).
- `R/contraction_diagnostic.R`: empirical contraction (§II.6.3).
- `R/gdpar_loo.R`: PSIS-LOO aggregation.
- `R/preflight*.R`: the parametrization pre-flight (§II.6).
- `R/utils-*.R`: condition system (`gdpar_abort`/`warn`/`inform`, `require_suggested`) and input validators (`assert_*`).
Documentation of every function in all 44 R source files, generated by the
GLM-5.2 and MiMo-V2.5-Pro lineages under a faithful-to-source spec and audited
(guaranteed floor on the mathematically dense modules: families, W_basis,
geometry_*, dependence_robust, stan_codegen).
gdpar_adapter_econml(estimator = "CausalForestDML", n_estimators = 1000L, model_y = NULL, model_t = NULL, seed = NULL)
Purpose Creates and returns a gdpar_meta_learner_adapter object that wraps an EconML estimator via the reticulate package, for use with gdpar_compare_meta_learners. It enables causal inference using the Orthogonal Double/Debiased Machine Learning (DML) framework, specifically the CausalForestDML estimator from the EconML Python package.
Arguments
- `estimator`: Character scalar. Identifies the EconML estimator to use. Currently only `"CausalForestDML"` is supported in this package version.
- `n_estimators`: Integer scalar. The number of trees in the `CausalForestDML` forest.
- `model_y`: Optional Python model object. The outcome model used in the first stage of the DML procedure. If `NULL`, EconML's default (a gradient-boosted tree) is used.
- `model_t`: Optional Python model object. The treatment model used in the first stage of the DML procedure. If `NULL`, EconML's default is used.
- `seed`: Optional integer scalar. A random seed for reproducibility, passed to the EconML estimator's `random_state` parameter. Must be between 1 and `.Machine$integer.max`.
Mathematics
The CausalForestDML estimator implements the Orthogonal Double Machine Learning (DML) framework for estimating the Conditional Average Treatment Effect (CATE), $\tau(x)=\mathbb{E}[Y(1)-Y(0)\mid X=x]$. The procedure involves:
1. Estimating nuisance parameters $\eta_0$, e.g., the outcome model $\mathbb{E}[Y|X, T]$ and the propensity score $\mathbb{P}(T=1|X)$, using flexible machine learning.
2. Constructing the pseudo-outcome (the "DML residual"):
   $$\psi(W; \eta) = \left[ \frac{T - \hat{\mathbb{P}}(T=1|X)}{\hat{\mathbb{P}}(T=1|X)(1 - \hat{\mathbb{P}}(T=1|X))} \right] \left( Y - \hat{\mathbb{E}}[Y|X, T] \right)$$
3. Estimating the CATE by regressing $\psi(W; \hat{\eta})$ on $X$ using a causal forest.
Confidence intervals are constructed using the effect_interval method of the fitted EconML estimator, which provides asymptotically valid intervals based on the forest's variance estimation.
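The pseudo-outcome formula in this section can be transcribed directly into R. The sketch below assumes the nuisance estimates are already available as numeric vectors (`e_hat` for the propensity score, `m_hat` for the outcome model); `dml_pseudo_outcome` is a hypothetical helper and does not reproduce EconML's internals.

```r
# Pseudo-outcome as written in the formula above:
#   (T - e_hat) / (e_hat * (1 - e_hat)) * (Y - m_hat)
dml_pseudo_outcome <- function(y, t, e_hat, m_hat) {
  stopifnot(all(e_hat > 0 & e_hat < 1))   # propensities must be interior
  (t - e_hat) / (e_hat * (1 - e_hat)) * (y - m_hat)
}

dml_pseudo_outcome(
  y     = c(3, 1),
  t     = c(1, 0),
  e_hat = c(0.5, 0.5),
  m_hat = c(1, 2)
)
# 4 2
```

Propensity estimates close to 0 or 1 inflate the weight in the first factor, which is why the guard on `e_hat` is worth keeping in any real use.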
Returns
A gdpar_meta_learner_adapter list object with the following components:
- `name`: `"econml"`
- `fit_predict_fun`: A function that fits the model and returns CATE estimates and confidence intervals.
- `predict_fun`: A function that predicts from a fitted model without re-fitting.
- `requires_r`: `"reticulate"`
- `requires_py`: `"econml"`
- `native_ci`: `TRUE`
- `description`: A character string summarizing the estimator and its settings.
Notes
- The Python module `econml` must be installed in the active `reticulate` environment. If not found, the function aborts with a `gdpar_missing_dependency_error`.
- The `state` object returned by `fit_predict_fun` contains a reference to a Python object. This reference does not survive serialization or an R session restart (e.g., a `saveRDS()` / `readRDS()` round-trip). Using `predict_fun` on such a state will result in a `gdpar_unsupported_feature_error`.
- The function validates input arguments using `assert_count` and `assert_numeric_scalar` (internal validation functions not shown).
- The internal `.econml_to_matrix` function is called to convert covariate data frames to numeric matrices compatible with EconML.
.econml_to_matrix(df, template)
Purpose Converts a data frame of covariates into a numeric matrix suitable for input to the EconML Python package. It also manages a template to ensure consistent encoding of factor levels between training and prediction data.
Arguments
- `df`: A data frame or object coercible to a data frame. Contains the covariates.
- `template`: A list or `NULL`. If `NULL`, the function creates a new template from `df`. If provided, it ensures `df` is processed to match the template's column structure.
Mathematics
The conversion process applies the following transformation to the covariates:
- Character columns are converted to factors.
- The data frame is converted to a model matrix using the formula `~ . - 1`, which expands factor variables into dummy (one-hot) variables without an intercept. Under R's default treatment contrasts, the first factor with $k$ levels yields $k$ binary indicator columns; each further factor yields $k-1$ columns, its first level serving as the baseline.

The resulting matrix is what is passed to the EconML estimator's fit and effect methods.
Returns A numeric matrix with the following attributes:
- `colnames`: The column names of the matrix.
- `factor_levels`: A list mapping original factor column names to their levels. If the column is not a factor, the entry is `NULL`.
- `template`: The template list (either created or passed in) is attached as an attribute to the matrix. This template is stored in the adapter's `state` to ensure consistent data processing during prediction.
Notes
- If `template` is `NULL` (i.e., during model fitting), the function requires at least one column in `df`. Otherwise, it aborts with a `gdpar_input_error`.
- If `template` is provided (i.e., during prediction), the function enforces that all factor variables in the template exist in `df` and have the same levels. It also ensures the resulting model matrix has exactly the same columns as the template. Incompatibilities (missing or extra columns) result in a `gdpar_input_error`.
- The function uses `stats::model.matrix` for the conversion, which handles factor expansion and removal of intercepts.
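The behaviour of the `~ . - 1` expansion is worth seeing on a small data frame with two factors. The example uses only `stats::model.matrix`; the data frame and its column names are made up for illustration.

```r
# Two factor columns and one numeric column (illustrative data).
train <- data.frame(
  f = c("a", "b", "c", "a"),
  g = c("x", "y", "x", "y"),
  z = c(1, 2, 3, 4),
  stringsAsFactors = TRUE
)

# No-intercept expansion: the first factor gets one column per level,
# the second factor drops its baseline level under treatment contrasts.
mm <- stats::model.matrix(~ . - 1, data = train)
colnames(mm)
# "fa" "fb" "fc" "gy" "z"
```

Only the first factor is fully one-hot encoded, so the column count depends on column order in the data frame, which is one reason a stored column template is useful at prediction time.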
gdpar_adapter_grf(num_trees = 2000L, sample_fraction = 0.5, mtry = NULL, honesty = TRUE, seed = NULL)
Purpose
Factory that constructs a gdpar_meta_learner_adapter object wrapping the R-side causal forest estimator grf::causal_forest (Athey, Tibshirani, and Wager, 2019) for use with gdpar_compare_meta_learners. The adapter populates both the mandatory fit_predict_fun (fit + predict in one call) and the optional predict_fun (reuse a cached forest on a fresh evaluation grid without refitting). It advertises requires_r = "grf" and native_ci = TRUE because grf's built-in variance estimator supplies confidence intervals via the normal approximation.
Arguments
- `num_trees`: Integer scalar; number of trees in the forest. Default `2000L`, matching `grf`'s default.
- `sample_fraction`: Numeric scalar in $(0, 0.5]$; fraction of the training sample drawn for each tree. Default `0.5`.
- `mtry`: Optional integer scalar; number of candidate variables per split. Default `NULL` delegates to `grf`'s own default ($\min(\lceil\sqrt{p} + 20\rceil, p)$).
- `honesty`: Logical scalar; whether to use honest splitting. Default `TRUE` (recommended; `grf` CIs are valid only under honesty).
- `seed`: Optional integer scalar; seed propagated to `grf`'s internal RNG when the comparator's `seed_run` is `NULL`. Default `NULL`.
Mathematics
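Native confidence intervals are obtained by the normal approximation
$$\hat\tau(x) \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}(x), \qquad \alpha = 1 - \text{level},$$
where $\widehat{\mathrm{se}}(x)$ is derived from $\hat\sigma^2(x)$, grf's built-in variance estimate obtained via predict(..., estimate.variance = TRUE). The standard error is clamped at zero:
$$\widehat{\mathrm{se}}(x) = \sqrt{\max\{\hat\sigma^2(x),\,0\}}.$$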
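The interval construction is a few lines of base R once point predictions and variance estimates are in hand. The sketch below takes them as plain numeric vectors, so it runs without `grf`; `normal_ci` is a hypothetical helper name.

```r
# Normal-approximation interval with the variance clamped at zero.
normal_ci <- function(pred, var_est, level = 0.95) {
  se <- sqrt(pmax(as.numeric(var_est), 0))   # guard against tiny negatives
  z  <- stats::qnorm(1 - (1 - level) / 2)    # two-sided normal quantile
  cbind(lower = pred - z * se, upper = pred + z * se)
}

# Second variance is a small negative numerical artifact.
ci <- normal_ci(pred = c(1, 2), var_est = c(0.04, -1e-12), level = 0.95)
ci
```

In the second row the clamp turns the negative variance into a zero standard error, so the interval collapses to the point prediction instead of producing `NaN`.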
Returns
A gdpar_meta_learner_adapter object (constructed via gdpar_meta_learner_adapter) with fields:
- `name = "grf"`,
- `fit_predict_fun`: closure capturing the hyperparameter list `hp`,
- `predict_fun`: closure independent of `hp`,
- `requires_r = "grf"`,
- `native_ci = TRUE`,
- `description`: a string of the form `"grf::causal_forest (num_trees = <n>, honesty = <h>) with normal-approximation CIs from estimate.variance."`.
Notes
- Validation sequence:
  - `assert_count(num_trees, "num_trees")`.
  - `assert_numeric_scalar(sample_fraction, "sample_fraction", lower = 1e-3, upper = 0.5)`.
  - If `mtry` is non-`NULL`, `assert_count(mtry, "mtry")`.
  - `honesty` must be a length-1 non-`NA` logical; otherwise `gdpar_abort` raises a `gdpar_input_error` with message `"Argument 'honesty' must be a non-NA logical scalar."`.
  - If `seed` is non-`NULL`, `assert_numeric_scalar(seed, "seed", lower = 1, upper = .Machine$integer.max)`.
- Hyperparameters are coerced and stored in a list `hp`: `num_trees` → `integer`, `sample_fraction` → `numeric`, `mtry` → `integer` (or `NULL`), `honesty` → `logical`, `seed` → `integer` (or `NULL`).
- The `fit_predict_fun` and `predict_fun` closures are defined locally and passed to `gdpar_meta_learner_adapter`; see their dedicated subsections below.
- Side effects: none beyond construction of the adapter object. The closures themselves perform fitting/prediction only when invoked.
fit_predict_fun(X, Y, T, X_newdata, level, seed_run)
Purpose
Mandatory adapter entry point: fit a grf::causal_forest on the training triple (X, Y, T) and immediately predict the conditional average treatment effect (CATE) on X_newdata together with native normal-approximation confidence intervals at confidence level level. Returns both the predictions and a state object that allows predict_fun to reuse the fitted forest.
Arguments
- `X`: Covariate data (data frame or coercible) for the training sample; passed to `.grf_to_matrix`.
- `Y`: Numeric outcome vector; coerced via `as.numeric(Y)` and passed as `Y` to `grf::causal_forest`.
- `T`: Numeric treatment indicator vector; coerced via `as.numeric(T)` and passed as `W` to `grf::causal_forest`.
- `X_newdata`: Covariate data for prediction; converted via `.grf_to_matrix(X_newdata, template = attr(X_mat, "template"))` so its design aligns with the training design.
- `level`: Numeric scalar in $(0, 1)$; confidence level for the CIs.
- `seed_run`: Optional integer scalar; seed supplied by the comparator. If non-`NULL` it overrides `hp$seed`.
Mathematics
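The causal forest is fit by do.call(grf::causal_forest, args) with args = list(X = X_mat, Y = as.numeric(Y), W = as.numeric(T), num.trees = hp$num_trees, sample.fraction = hp$sample_fraction, honesty = hp$honesty), conditionally augmented with mtry (if hp$mtry non-NULL) and seed (if eff_seed non-NULL). Predictions use estimate.variance = TRUE, yielding point predictions $\hat\tau(x)$ and variance estimates $\hat\sigma^2(x)$, from which the interval $\hat\tau(x) \pm z_{1-\alpha/2}\sqrt{\max\{\hat\sigma^2(x),\,0\}}$ with $\alpha = 1 - \text{level}$ is formed.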
Returns
A list with elements:
- `cate_mean`: numeric vector of point predictions `as.numeric(pred$predictions)`.
- `cate_ci`: numeric matrix with columns `lower` and `upper`, row-aligned with `cate_mean`.
- `state`: list with `forest` (the fitted `causal_forest` object `cf`) and `template` (the column template extracted from `attr(X_mat, "template")`).
- `notes`: `character(0L)` (empty).
Notes
- Calls `require_suggested("grf", "fit gdpar_adapter_grf")` to ensure the suggested package is available; this is expected to error if `grf` is not installed.
- `eff_seed` is resolved as `seed_run` (coerced to integer) when non-`NULL`, otherwise falls back to `hp$seed`.
- The variance estimates are clamped at zero via `pmax(as.numeric(pred$variance.estimates), 0)` before taking the square root, guarding against tiny negative numerical artifacts.
- The `state$forest` object retains `grf`'s internal RNG state and trained trees; it is intended to be passed back to `predict_fun`.
- Side effects: triggers `grf::causal_forest` fitting (RNG consumption if `eff_seed` is set) and a prediction call.
predict_fun(state, X_newdata, level)
Purpose
Optional adapter entry point: reuse a previously fitted causal forest stored in state to predict CATEs (with native CIs) on a fresh evaluation grid X_newdata, without refitting.
Arguments
- `state`: List previously produced by `fit_predict_fun`, expected to contain `forest` (a fitted `grf::causal_forest` object) and `template` (column template for design alignment).
- `X_newdata`: Covariate data for prediction; converted via `.grf_to_matrix(X_newdata, template = state$template)`.
- `level`: Numeric scalar in $(0, 1)$; confidence level for the CIs.
Mathematics
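Identical normal-approximation CI construction as in fit_predict_fun: $\hat\tau(x) \pm z_{1-\alpha/2}\sqrt{\max\{\hat\sigma^2(x),\,0\}}$, with $\alpha = 1 - \text{level}$ and $\hat\sigma^2(x)$ the cached forest's variance estimate.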
Returns
A list with elements:
- `cate_mean`: numeric vector of point predictions.
- `cate_ci`: numeric matrix with columns `lower` and `upper`.
(No state or notes elements are returned, unlike fit_predict_fun.)
Notes
- Dependency check is performed inline via `requireNamespace("grf", quietly = TRUE)`; on failure, `gdpar_abort` raises a `gdpar_missing_dependency_error` with `data = list(package = "grf")` and message `"Package 'grf' is required to reuse a cached causal_forest state."`.
- If `state` is `NULL` or `state$forest` is `NULL`, `gdpar_abort` raises a `gdpar_internal_error` with message `"Cached state for the grf adapter is empty; refit before predicting."`.
- Uses `stats::predict` (not `grf:::predict.causal_forest` directly) so S3 dispatch resolves the method.
- Variance estimates are clamped at zero via `pmax(..., 0)` before the square root.
- Side effects: triggers a `grf` prediction call (no refitting, no RNG consumption).
.grf_to_matrix(df, template)
Purpose
Internal helper that converts a covariate data frame into a fully numeric design matrix suitable for grf::causal_forest. Character columns are coerced to factors, factors are expanded via model.matrix(~ . - 1, ...), and numeric columns pass through unchanged. A template attribute records the column structure of the first call so subsequent calls on X_newdata align identically; the function aborts when a new column appears or a previously observed factor level is missing.
Arguments
- `df`: A data frame, or an object coercible via `as.data.frame(df, stringsAsFactors = FALSE)`. Contains the covariates.
- `template`: Optional list with elements `colnames` (character vector of expected design-matrix column names) and `factor_levels` (named list mapping original data-frame column names to either their factor levels or `NULL` for non-factor columns). If `NULL`, a new template is built from `df`.
Mathematics
The design matrix is constructed as model.matrix(~ . - 1, ...), which expands each factor column into indicator columns and passes numeric columns through unchanged. When a template is supplied, factor columns are first re-leveled to the template's levels via factor(df[[j]], levels = template$factor_levels[[j]]), the resulting matrix is checked for set-equality of column names against template$colnames, and finally reordered as mm[, template$colnames, drop = FALSE].
Returns
A numeric matrix with attribute "template" (a list with colnames and factor_levels). When template = NULL on input, the returned template is freshly built from df; otherwise the input template is attached unchanged to the returned matrix.
Notes
- If `df` is not a data frame, it is coerced via `as.data.frame(df, stringsAsFactors = FALSE)`.
- A `for` loop over `seq_along(df)` coerces any `character` column to `factor` via `as.factor`.
- Template-`NULL` branch:
  - If `ncol(df) == 0L`, `gdpar_abort` raises a `gdpar_input_error` with message `"gdpar_adapter_grf requires at least one covariate; received a 0-column data frame."`.
  - `template$factor_levels` is built by `lapply(df, function(col) if (is.factor(col)) levels(col) else NULL)`, so non-factor columns map to `NULL`.
- Template-non-`NULL` branch:
  - For each name `j` in `names(template$factor_levels)` whose entry is non-`NULL` (i.e., a factor column in the original training data):
    - If `j` is not in `colnames(df)`, `gdpar_abort` raises a `gdpar_input_error` with message `"Covariate '<j>' missing from newdata for the grf adapter."` and `data = list(missing = j)`.
    - Otherwise `df[[j]]` is re-factored with `factor(df[[j]], levels = template$factor_levels[[j]])`; this silently drops unseen levels and introduces `NA` for values not in the template levels.
  - After building `mm`, if `!setequal(colnames(mm), template$colnames)`:
    - `missing_cols <- setdiff(template$colnames, colnames(mm))`.
    - `extra_cols <- setdiff(colnames(mm), template$colnames)`.
    - `gdpar_abort` raises a `gdpar_input_error` with a formatted message listing missing and extra columns (using `"<none>"` when a set is empty) and `data = list(missing = missing_cols, extra = extra_cols)`.
  - On success, columns are reordered to exactly match `template$colnames` via `mm[, template$colnames, drop = FALSE]`.
- The `template` attribute is set on the returned matrix via `attr(mm, "template") <- template`, enabling chained calls (e.g., `fit_predict_fun` reads `attr(X_mat, "template")` and passes it to the `X_newdata` conversion).
- No S3 dispatch; this is a plain internal function.
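The template-alignment step can be sketched with base R alone. The code below mirrors the described sequence (re-level, expand, set-equality check, reorder) under simplifying assumptions; `align_to_template` and the toy data are hypothetical, and plain `stop()` stands in for the package's condition system.

```r
# Check column set-equality against a template, then reorder to match it.
align_to_template <- function(mm, template_cols) {
  if (!setequal(colnames(mm), template_cols)) {
    fmt <- function(v) if (length(v)) paste(v, collapse = ", ") else "<none>"
    stop(sprintf("Design mismatch: missing = %s; extra = %s",
                 fmt(setdiff(template_cols, colnames(mm))),
                 fmt(setdiff(colnames(mm), template_cols))),
         call. = FALSE)
  }
  mm[, template_cols, drop = FALSE]
}

# Training design fixes the factor levels and the column order.
train       <- data.frame(f = factor(c("a", "b", "c")), z = c(1, 2, 3))
tmpl_levels <- levels(train$f)
tmpl_cols   <- colnames(model.matrix(~ . - 1, data = train))

# New data: different column order and only a subset of levels observed.
new   <- data.frame(z = c(10, 20), f = c("c", "a"))
new$f <- factor(new$f, levels = tmpl_levels)   # re-level to training levels
aligned <- align_to_template(model.matrix(~ . - 1, data = new), tmpl_cols)
colnames(aligned)
```

Re-leveling before expansion is what keeps the indicator columns for unobserved levels present (as all-zero columns), so the set-equality check passes even when the new data do not contain every level.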
Purpose
Initialises a chainable builder object of class amm_builder that serves as an incremental specification container for an Adaptive Moderated Model (AMM). The builder is a programmatic alternative to the single-call amm_spec() constructor; it accumulates per-dimension additive (a) and multiplicative (b) basis formulas, a global modulating basis W, and optional covariate names x_vars, and is ultimately converted into an amm_spec via as_amm_spec().
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `p` | Scalar positive integer (coerced to integer) | Dimension of the per-individual parameter vector $\theta_i$. Default `1L` (scalar/univariate path). Must satisfy `p ≥ 1`. |
Mathematics None. The function performs only object construction.
Returns
An S3 object of class c("amm_builder", "list") with four components:
- `p`: `integer`, the value of the `p` argument.
- `dims`: a `dims_spec` object (from `dimwise(a = NULL, b = NULL)`) representing per-dimension additive/multiplicative basis specifications with a `NULL` base and no overrides.
- `W`: `NULL` (no modulating basis set yet).
- `x_vars`: `NULL` (no covariate names set yet).
Notes
- `p` is validated by `assert_count(p, "p")`, which enforces a single positive integer value. Non-integer numerics are silently coerced to `integer` via `as.integer(p)`.
- The resulting builder is intended to be updated through successive `amm_set_*()` calls and finalised by `as_amm_spec()`. The builder is a plain list, so under R's copy-on-modify semantics each setter returns a modified copy rather than mutating its argument in place; the returned value must be reassigned or piped onward.
- When `p = 1L`, finalisation through `as_amm_spec()` resolves the embedded `dims_spec` to a scalar entry and invokes the scalar AMM path; when `p > 1L`, the `dims_spec` is forwarded directly to the multivariate path. This bifurcation happens at finalisation time, not at build time.
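The copy-on-modify point has a practical consequence that a toy example makes concrete. The constructor and setter below are hypothetical stand-ins written for this illustration; they only imitate the pattern of a list-based builder whose setters return the modified object invisibly.

```r
# Toy list-based builder (hypothetical; not the package's functions).
toy_builder <- function(p = 1L) {
  structure(list(p = as.integer(p), W = NULL), class = c("toy_builder", "list"))
}

# Setter modifies a local copy and returns it invisibly.
toy_set_W <- function(builder, W) {
  builder$W <- W
  invisible(builder)
}

b0 <- toy_builder(2L)
toy_set_W(b0, "basis")        # return value discarded
is.null(b0$W)                 # TRUE: b0 itself was not changed

b1 <- toy_set_W(b0, "basis")  # reassign (or pipe) to keep the update
b1$W
```

Chaining with the native pipe, for example `toy_builder(2L) |> toy_set_W("basis")`, works for the same reason: each step passes its returned copy to the next.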
Purpose
Replaces the base (uniform) additive basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension that has no per-dimension override (registered via amm_set_a()). Passing NULL disables the additive component on the base layer.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `a` | One-sided formula (`~ ...`) or `NULL` | The new base additive basis. `NULL` means no additive component on the base layer. |
Returns
The mutated amm_builder object (returned invisibly for pipe compatibility). The mutation is to builder$dims$base$a.
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- `a` is validated by `assert_one_sided_formula(a, "a", allow_null = TRUE)`, which enforces a one-sided formula or `NULL`.
- The mutation `builder$dims$base$a <- a` directly overwrites the base additive slot. Any per-dimension overrides previously registered via `amm_set_a()` are preserved because they are stored separately in the `dims_spec` override layer (see `override()`).
Purpose
Replaces the base (uniform) multiplicative basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension that has no per-dimension override (registered via amm_set_b()). Passing NULL disables the multiplicative component on the base layer.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `b` | One-sided formula (`~ ...`) or `NULL` | The new base multiplicative basis. `NULL` means no multiplicative component on the base layer. |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$dims$base$b.
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- `b` is validated by `assert_one_sided_formula(b, "b", allow_null = TRUE)`.
- Per-dimension overrides previously registered via `amm_set_b()` are preserved; only the base layer is changed.
Purpose
Registers a per-dimension override of the additive component for a specific dimension index $k$. The override takes precedence over the base layer (set via amm_set_a_uniform()); a repeated call at index k replaces the previous override for that dimension.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `k` | Positive integer in `1:p` | The dimension index to override. Validated against `builder$p` as the upper bound. |
| `a` | One-sided formula (`~ ...`) or `NULL` | The override additive basis for dimension $k$. `NULL` disables the additive component for that dimension. |
Returns
The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, a = a).
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- `k` is validated by `assert_count(k, "k", max = builder$p)`, which enforces a single positive integer $\leq p$.
- `a` is validated by `assert_one_sided_formula(a, "a", allow_null = TRUE)`.
- The override is applied via `override()`, which layers a per-dimension specification on top of the existing `dims_spec`. Overrides survive subsequent uniform changes (e.g., a later `amm_set_a_uniform()` call will not erase overrides registered here).
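The base-plus-override layering can be pictured with a small list structure. Everything in the sketch below (`toy_dims`, `toy_override`, `toy_resolve_a`, and the internal layout) is hypothetical and chosen only to show why an override survives a later change to the base layer; the actual `dims_spec` representation is not documented here.

```r
# Hypothetical two-layer spec: one base formula plus per-dimension overrides.
toy_dims <- list(base = list(a = ~ x1), overrides = list())

# Register (or replace) the override for dimension k.
toy_override <- function(dims, k, a) {
  dims$overrides[[as.character(k)]] <- list(a = a)
  dims
}

# Resolve dimension k: an override wins, otherwise fall back to the base.
toy_resolve_a <- function(dims, k) {
  ov <- dims$overrides[[as.character(k)]]
  if (!is.null(ov)) ov$a else dims$base$a
}

d <- toy_override(toy_dims, k = 2, a = ~ x2)
d$base$a <- ~ x3                      # later uniform change to the base layer

deparse(toy_resolve_a(d, 1))          # "~x3": dimension 1 follows the base
deparse(toy_resolve_a(d, 2))          # "~x2": the override is untouched
```

Because the two layers are stored separately, the order in which uniform and per-dimension setters are called does not matter for dimensions that carry an override.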
Purpose
Registers a per-dimension override of the multiplicative component for a specific dimension index $k$. The override takes precedence over the base layer (set via amm_set_b_uniform()); a repeated call at index k replaces the previous override for that dimension.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `k` | Positive integer in `1:p` | The dimension index to override. Validated against `builder$p` as the upper bound. |
| `b` | One-sided formula (`~ ...`) or `NULL` | The override multiplicative basis for dimension $k$. `NULL` disables the multiplicative component for that dimension. |
Returns
The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, b = b).
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- `k` is validated by `assert_count(k, "k", max = builder$p)`.
- `b` is validated by `assert_one_sided_formula(b, "b", allow_null = TRUE)`.
- The override is applied via `override()`, which layers a per-dimension specification on top of the existing `dims_spec`. Overrides survive subsequent uniform changes.
Purpose
Stores a W_basis object as the global modulating basis of the specification under construction. The modulating component is shared by all dimensions of $\theta_i$. Passing NULL disables the modulating component.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `W` | `W_basis` object or `NULL` | The modulating basis to store. `NULL` clears any previously set basis. |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$W.
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- If `W` is non-`NULL`, it is validated by `assert_inherits(W, "W_basis", "W")`.
- The modulating component is global to all dimensions of $\theta_i$; there is no per-dimension setter for $W$. Declaring $W$ per-dimension would restrict the model to the separable sub-class, which is rejected by construction in the package design.
- `W` is stored as a single top-level slot of the builder, not inside `dims`.
Purpose
Records a character vector of covariate names that enter the modulating component as the linear factor $x$, or `NULL`. The recorded value is forwarded to `amm_spec()` at finalisation time. When `NULL`, the package derives covariate names from the right-hand side of the model formula passed to `gdpar()`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `builder` | `amm_builder` object | The builder to modify. |
| `x_vars` | Character vector (length $\geq 1$) or `NULL` | Names of the covariates used in the modulating component. `NULL` defers covariate identification to `gdpar()`. |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$x_vars.
Notes
- `builder` is validated by `assert_inherits(builder, "amm_builder", "builder")`.
- If `x_vars` is non-`NULL`, the function performs a manual validation check: it must satisfy `is.character(x_vars)` and `length(x_vars) >= 1L`. On failure, `gdpar_abort()` is called with class `"gdpar_input_error"` and a data list containing the argument name and the received value.
- This is the only setter in the builder suite that performs its own argument validation via `gdpar_abort()` rather than delegating to an `assert_*` helper, because the validation logic (non-empty character vector or `NULL`) does not match any existing assertion primitive.
Purpose
Converts an amm_builder object into a finalized amm_spec specification object. This is the constructor that finalizes the builder's accumulated configuration, resolving any pending dimension specifications and handling the special case of a univariate AMM (p=1).
Arguments
- `builder`: An object of class `amm_builder` containing the accumulated model configuration (e.g., per-dimension specifications `dims`, modulating basis `W`, predictor variables `x_vars`, and dimension `p`).
Returns
An object of class `amm_spec` representing the fully specified AMM.
Notes
- The function first asserts that `builder` is indeed an `amm_builder` object.
- For the univariate case (`p == 1`), it explicitly resolves the `dims` specification with `resolve_dims_spec` to extract the scalar-path formulas `a` and `b`. The resulting `amm_spec` object carries these as its `a` and `b` components.
- For the multivariate case (`p > 1`), the `dims` list is passed directly to the `amm_spec` constructor without immediate resolution.
- The `amm_spec` object is constructed by calling the `amm_spec()` constructor with the relevant components from the builder.
Purpose
An S3 print method for objects of class `amm_builder`. It provides a human-readable summary of the builder's current configuration, including the dimension `p`, the base components `a` and `b`, any per-dimension overrides, the modulating basis, and the predictor variables.
Arguments
- `x`: An object of class `amm_builder` to be printed.
- `...`: Additional arguments (unused, but required for S3 method compatibility).
Returns
Invisibly returns the input object `x`.
Notes
- This is an exported S3 method.
- The output is printed to the console via `cat()`.
- The summary includes:
  - The dimension `p`.
  - The base components `a` and `b`. These are printed as `NULL` if they have not been set.
  - Any per-dimension override specifications, keyed by the dimension index `k`. Overrides are printed if they exist; otherwise `<none>` is shown.
  - The modulating basis specification `W`. If present, it is printed as `W_basis(type = '<type>')`.
  - The predictor variable names `x_vars`. If `NULL`, the output notes that they are inherited from the `gdpar()` formula.
- The function handles `NULL` values gracefully by printing `"NULL"` instead of attempting to print `NULL` directly.
- The overrides are printed in ascending order of the integer key `k`.
Purpose
Serializes an amm_spec object (the constructor-input representation of an AMM specification) into a canonical, human-readable plain-text file. The format is designed for version control, archival, and bit-exact reproducibility. The file is intended to be round-tripped by amm_load_spec, which parses it with a dedicated lexical parser (no source/eval), making the serialized form safe to load from untrusted locations.
Arguments
- `spec`: Object of class `amm_spec`. The specification to serialize. Only the constructor inputs are recorded; any materialized state (e.g., a `W_basis` materialized at a specific $\theta_{\mathrm{ref}}$ via the internal `materialize_W_basis`) is deliberately not written, so that the reconstructed object is the unmaterialized form normally produced by `amm_spec`.
- `path`: Character scalar. The destination file path. Must be a single non-empty string.
Returns
Invisibly returns path (the character scalar that was passed in), after writing the canonical text representation to that file via writeLines.
Notes
Validation and errors:
- `spec` is checked with `assert_inherits(spec, "amm_spec", "spec")`; a failure raises an error (dispatched by `assert_inherits`).
- `path` is validated explicitly: it must satisfy `is.character(path)`, `length(path) == 1L`, and `nzchar(path)`. If any of these fail, the function aborts via `gdpar_abort` with class `"gdpar_input_error"` and a `data` list containing `argument = "path"` and `received = path`.
- If `spec[["W"]]` is non-`NULL` and `identical(W[["type"]], "user")`, the function aborts with a `gdpar_input_error` (no `data` field). User-defined `W_basis` objects cannot be canonized because the evaluator is an arbitrary R function.
Serialization logic (line-by-line construction):
- The package version is obtained from `utils::packageVersion("gdpar")` and emitted as the mandatory header line: `# gdpar_spec_version: <version>`.
- The dimension count is emitted as `p: <spec[["p"]]>`.
- Scalar path (`spec[["p"]] == 1L`, checked with `isTRUE`): the `a` and `b` one-sided formulas are serialized via the internal helper `.serialize_formula` and emitted as `a: <literal>` and `b: <literal>`.
- Multivariate path (`spec[["p"]] > 1L`): `a` and `b` are both written as the literal string `NULL` (the per-dimension formulas are emitted separately below).
- `x_vars` is serialized via `.serialize_char_vec` and emitted as `x_vars: <literal>` (handles `NULL` and character vectors).
- `W` block:
  - If `W` is `NULL`: emits `W.type: NULL`.
  - If `W[["type"]]` is `"polynomial"`: emits `W.type: polynomial` followed by `W.degree: <as.integer(W[["degree"]])>`.
  - If `W[["type"]]` is `"bspline"`: emits `W.type: bspline`, then `W.degree: <as.integer(W[["degree"]])>`. Additionally, if `W[["knots"]]` is non-`NULL`, emits `W.knots: <.serialize_num_vec(W[["knots"]])>`; if `W[["df"]]` is non-`NULL`, emits `W.df: <as.integer(W[["df"]])>`. Either, both, or neither of `W.knots`/`W.df` may appear depending on which fields are populated.
- Multivariate per-dimension entries (`spec[["p"]] > 1L`, checked with `isTRUE`): for each `k` in `seq_len(spec[["p"]])`, the function retrieves `entry <- spec[["dims"]][[k]]` and emits two lines, `dims.<k>.a: <.serialize_formula(entry[["a"]])>` and `dims.<k>.b: <.serialize_formula(entry[["b"]])>`. The index `k` is interpolated directly into the key name, producing keys such as `dims.1.a`, `dims.1.b`, `dims.2.a`, etc.
Side effects: Writes a text file to path via writeLines(lines, con = path). The file is overwritten if it exists.
Version policy: The version header records the running package version exactly. The loader (amm_load_spec) checks this strictly; until the first stable release, any mismatch is treated as an error.
Format grammar summary (as emitted by this function):
| Key | Value grammar |
|---|---|
| `# gdpar_spec_version:` | Package version string (header line) |
| `p` | Positive integer |
| `a`, `b` | `NULL` or one-sided formula literal (e.g. `~ x1 + x2`) |
| `x_vars` | `NULL` or `c("x1", "x2", ...)` |
| `W.type` | `NULL`, `polynomial`, or `bspline` |
| `W.degree` | Positive integer (present for `polynomial` and `bspline`) |
| `W.knots` | `c(...)` of numerics (present only for `bspline` with interior knots) |
| `W.df` | Positive integer (present only for `bspline` with `df`) |
| `dims.K.a`, `dims.K.b` | Same grammar as `a`, `b`, for `K` in `1:p` (emitted only when `p > 1`) |
Purpose
Reads a canonical gdpar specification file from disk and parses it into an amm_spec object. The function is the deserialization counterpart to the package's spec writer: it enforces a strict, line-oriented key: value grammar, validates a version header against the currently loaded package version, and dispatches to amm_spec() either in the univariate (p = 1) form (with scalar a/b) or the multivariate (p > 1) form (with a dims list of per-dimension a/b pairs).
Arguments
- `path`: `character` scalar (length 1, non-empty). Filesystem path to a canonical gdpar spec file. Must exist on disk.
Mathematics
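The parser implements a deterministic finite scan over the file's lines. Let $\ell_j$ denote the $j$-th line that is neither blank nor a comment, and let $c_j$ be the position of the first `:` in $\ell_j$. The line is read as
$$\ell_j = \mathrm{key}_j \; \texttt{:} \; \mathrm{value}_j,$$
and the key/value pair is
$$(\mathrm{key}_j, \mathrm{value}_j) = \big(\ell_j[1 : (c_j - 1)],\; \ell_j[(c_j + 1) : |\ell_j|]\big).$$
The recognised key set is
$$\mathcal{K} = \{\texttt{p}, \texttt{a}, \texttt{b}, \texttt{x\_vars}, \texttt{W.type}, \texttt{W.degree}, \texttt{W.knots}, \texttt{W.df}\} \cup \{\texttt{dims.}k\texttt{.a},\ \texttt{dims.}k\texttt{.b} : k \geq 1\}.$$
For $p = 1$ the file must contain no `dims.*.*` key; for $p > 1$ the `dims` indices are exactly $\{1, \dots, p\}$ and the scalar `a`/`b` keys must be absent (i.e. parse to `NULL`).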
Returns
An amm_spec object constructed by amm_spec():
- When `p == 1L`: `amm_spec(a = a_scalar, b = b_scalar, W = W, x_vars = x_vars, p = 1L)`, where `a_scalar` and `b_scalar` are parsed formula objects (or `NULL`) and `W` is the parsed modulating-basis specification returned by `.parse_W_records()`.
- When `p > 1L`: `amm_spec(W = W, x_vars = x_vars, p = p_val, dims = dims_list)`, where `dims_list` is a list of length `p_val` whose $k$-th element is `list(a = a_k, b = b_k)` with `a_k`, `b_k` parsed formula objects.
Notes
- All validation failures are raised via `gdpar_abort()` with class `"gdpar_input_error"`; condition `data` payloads are attached where contextual (e.g. `argument`, `received`, `path`, `line`, `raw`, `key`, `file_version`, `package_version`).
- Input validation of `path`: rejects non-character, non-scalar, or empty-string inputs before any I/O.
- Version header: the file must contain a line matching the regex `^\s*#\s*gdpar_spec_version\s*:`. The extracted (whitespace-trimmed) value must be `identical()` to `as.character(utils::packageVersion("gdpar"))`; any mismatch aborts, citing bit-exact reproducibility concerns across development releases.
- Comment and blank lines are skipped during record parsing; only the first `:` (fixed-match) on each remaining line is used as the key/value separator.
- Duplicate-key detection is performed against an accumulating `records` list; the error message reports both the current line and the line of the first occurrence.
- Unknown-key rejection: any key not in `recognised_prefixes` and not matching `^dims\.[0-9]+\.[ab]$` aborts.
- Required-key enforcement: `p`, `a`, `b` are mandatory for the univariate branch; for the multivariate branch, `a`/`b` must be `NULL` (i.e. `.parse_formula()` returned `NULL`) and `dims.K.a`/`dims.K.b` are required for every $k \in \{1, \dots, p\}$.
- `p` is parsed via `.parse_int()` and must satisfy $p \geq 1$.
- `x_vars` is optional; when absent it is passed as `NULL`.
- The `W` block is delegated entirely to `.parse_W_records(records)`.
- Multivariate consistency: when `p == 1L`, any `dims.*` key triggers an abort listing the offending keys (via `sQuote`). When `p > 1L`, after building `dims_list`, the function recomputes the set of `dims.` keys and subtracts those matching `^dims\.[1-(p-1)]\.[ab]$|^dims\.p\.[ab]$` (i.e. the regex `^dims\.[1-%d]\.[ab]$|^dims\.%d\.[ab]$` with `p_val - 1L` and `p_val` substituted). Each remaining key is parsed for its integer index, and the function aborts if the index is `NA`, less than 1, or greater than `p_val`, or if the key does not match the `dims.K.{a,b}` shape at all.
- Side effects: reads from the filesystem via `readLines(path, warn = FALSE)`; performs no writes.
- No S3 dispatch is performed inside this function; the returned object's class is determined by `amm_spec()`.
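The record-scanning rules in the notes above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name `scan_spec_records`, its error messages, and the example file contents are hypothetical, and only the rules themselves (the version-header regex, skipping blank and comment lines, splitting on the first `:`, rejecting duplicate keys) are taken from the notes.

```r
# Hypothetical helper: scan "key: value" records from the lines of a spec file.
scan_spec_records <- function(lines) {
  # The version header is a comment line matching the documented regex.
  header_rx <- "^\\s*#\\s*gdpar_spec_version\\s*:"
  header <- grep(header_rx, lines, value = TRUE)
  if (length(header) == 0L) stop("missing gdpar_spec_version header")
  version <- trimws(sub(header_rx, "", header[[1L]]))

  records <- list()
  for (i in seq_along(lines)) {
    line <- trimws(lines[[i]])
    # Skip blank lines and comment lines (the header is itself a comment).
    if (!nzchar(line) || startsWith(line, "#")) next
    # Only the first ":" separates the key from the value.
    pos <- regexpr(":", line, fixed = TRUE)
    if (pos < 1L) stop(sprintf("line %d: expected 'key: value'", i))
    key <- trimws(substr(line, 1L, pos - 1L))
    value <- trimws(substr(line, pos + 1L, nchar(line)))
    if (!is.null(records[[key]])) {
      stop(sprintf("line %d: duplicate key '%s' (first seen on line %d)",
                   i, key, records[[key]]$line))
    }
    records[[key]] <- list(value = value, line = i)
  }
  list(version = version, records = records)
}

# Illustrative file contents that follow the documented grammar.
example_lines <- c(
  "# gdpar_spec_version: 0.1.0",
  "p: 1",
  "a: ~ x1 + x2",
  "b: NULL",
  "x_vars: c(\"x1\", \"x2\")",
  "W.type: polynomial",
  "W.degree: 2"
)
parsed <- scan_spec_records(example_lines)
```

Storing the line number next to each value is what lets a duplicate-key error report both the current line and the line of the first occurrence.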
Purpose
Serializes an R formula object into a single-line character string suitable for writing into the canonical text file format used by gdpar. This is the inverse of .parse_formula.
Arguments
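- `f`: `formula` or `NULL`. The formula to serialize. May be `NULL`, in which case the literal string `"NULL"` is produced.
Mathematics
No numerical formula. The transformation is:
$$\mathrm{serialize}(f) = \begin{cases} \texttt{"NULL"} & \text{if } f \text{ is } \texttt{NULL}, \\ \text{the elements of } \texttt{deparse}(f) \text{ joined by single spaces} & \text{otherwise.} \end{cases}$$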
Returns
A length-one character string. When f is NULL, returns "NULL". Otherwise returns the deparsed formula collapsed onto a single line with elements separated by single spaces.
Notes
- Uses `deparse(..., width.cutoff = 500L)` to reduce line-wrapping in the deparsed output, then collapses any resulting multi-element character vector with spaces.
- No validation is performed on the structure of `f` (e.g., one-sided vs. two-sided); that is the responsibility of the parser on read-back.
Purpose
Parses a character string back into an R formula object, enforcing that it is either NULL or a one-sided formula beginning with ~. Used when reading canonical configuration files.
Arguments
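- `value`: `character(1)`. The raw string read from the file for the given key.
- `key`: `character(1)`. The configuration key name, used in error messages.
- `line_no`: `integer(1)` or `numeric`. The line number in the source file where `value` appeared, used in error messages.
Mathematics
No numerical formula. The validation logic is, with $v = \mathrm{trim}(\texttt{value})$:
$$\mathrm{parse}(v) = \begin{cases} \texttt{NULL} & \text{if } v = \texttt{"NULL"}, \\ \texttt{as.formula}(v) & \text{if } v \text{ starts with a tilde and the parsed formula has length 2}, \\ \text{error} & \text{otherwise.} \end{cases}$$
Returns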
NULL or a one-sided formula object of length 2 (i.e., ~ rhs).
Notes
- Trims whitespace from `value` before any comparison.
- If the trimmed value is exactly `"NULL"`, returns `NULL` immediately.
- If the value does not start with `"~"`, raises an error of class `"gdpar_input_error"` with `data = list(key = key, value = value, line = line_no)`.
- Wraps `stats::as.formula(value)` in `tryCatch`; on parse failure, raises a `"gdpar_input_error"` (without the `data` field).
- After successful parsing, checks `length(out) != 2L`; a two-sided formula (length 3) triggers a `"gdpar_input_error"`.
- All errors are raised via `gdpar_abort`.
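The serializer and parser described above form a round trip. The following base-R sketch reproduces the documented steps; the function names `serialize_formula_sketch` and `parse_formula_sketch` are hypothetical stand-ins, and plain `stop()` replaces the package's structured `gdpar_abort()` errors.

```r
# Hypothetical stand-ins for the internal serializer/parser pair.
serialize_formula_sketch <- function(f) {
  if (is.null(f)) return("NULL")
  # A wide cutoff keeps deparse() from wrapping; collapse any remaining pieces.
  paste(deparse(f, width.cutoff = 500L), collapse = " ")
}

parse_formula_sketch <- function(value) {
  v <- trimws(value)
  if (identical(v, "NULL")) return(NULL)
  if (!startsWith(v, "~")) stop("expected NULL or a one-sided formula")
  out <- tryCatch(
    stats::as.formula(v),
    error = function(e) stop("could not parse formula")
  )
  # A one-sided formula has length 2; a two-sided one has length 3.
  if (length(out) != 2L) stop("formula must be one-sided")
  out
}

f <- ~ x1 + x2
s <- serialize_formula_sketch(f)
g <- parse_formula_sketch(s)
```

The parsed formula carries a different environment from the original, so comparing `all.vars(g)` or `deparse(g)` is a more meaningful round-trip check than `identical(f, g)`.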
Purpose
Serializes a character vector (or NULL) into a c(...) literal string with each element double-quoted, suitable for the canonical file format. This is the inverse of .parse_char_vec.
Arguments
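- `x`: `character` vector or `NULL`. The vector to serialize.
Mathematics
$$\mathrm{serialize}(x) = \begin{cases} \texttt{"NULL"} & \text{if } x \text{ is } \texttt{NULL}, \\ \texttt{c("}x_1\texttt{", "}x_2\texttt{", }\ldots\texttt{, "}x_n\texttt{")} & \text{otherwise,} \end{cases}$$
subject to the constraint that no element of $x$ contains `"`, `\`, `\n`, or `\r`.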
Returns
A length-one character string. "NULL" for NULL input; otherwise a c(...) literal with quoted entries.
Notes
- Before serialization, checks every element of `x` for the regex pattern `["\\\n\r]` (double quote, backslash, newline, carriage return). If any match is found, raises a `"gdpar_input_error"` via `gdpar_abort` with a message indicating that double quotes, backslashes, or newlines are not permitted.
- Each element is formatted via `sprintf("\"%s\"", x_i)`.
- Elements are joined with `", "`.
- An empty (length-zero) non-`NULL` character vector produces the string `c()`.
Purpose
Parses a c(...) literal string back into a character vector, or recognizes "NULL". Inverse of .serialize_char_vec.
Arguments
- `value`: `character(1)`. The raw string from the file.
- `key`: `character(1)`. Configuration key name for error messages.
- `line_no`: `integer(1)` or `numeric`. Source line number for error messages.
Mathematics
The extraction proceeds as:
- Trim `value`; if equal to `"NULL"`, return `NULL`.
- Require the regex `^c\(.*\)$`; otherwise error.
- Extract the inner content: `inner = sub("^c\\((.*)\\)$", "\\1", v)`.
- Find all quoted tokens via the regex `"([^"]*)"` on `inner`.
- Strip the surrounding double quotes from each match.
Returns
NULL, character(0), or a character vector of the parsed quoted tokens.
Notes
- If the trimmed value is `"NULL"`, returns `NULL`.
- If the value does not match `^c\(.*\)$`, raises a `"gdpar_input_error"`.
- After extracting `inner`, if there are zero regex matches:
  - If `inner` is empty or whitespace-only (`!nzchar(trimws(inner))`), returns `character(0)`.
  - Otherwise raises a `"gdpar_input_error"` indicating quoted tokens were expected.
- Quoted tokens are matched with the regex `"([^"]*)"` and then the double quotes are removed via `gsub("\"", "", matches, fixed = TRUE)`.
- The parser does not handle escaped quotes inside tokens; this is consistent with the serializer's prohibition on double quotes within elements.
Purpose
Serializes a numeric vector (or NULL) into a c(...) literal string using high-precision formatting. Inverse of .parse_num_vec.
Arguments
- `x`: `numeric` vector or `NULL`. The vector to serialize.
Mathematics
Each element is formatted with 17 significant digits:
$$s_i = \texttt{sprintf("\%.17g", } x_i \texttt{)}, \qquad i = 1, \dots, n,$$
then joined as
$$\texttt{c(} s_1 \texttt{, } s_2 \texttt{, } \ldots \texttt{, } s_n \texttt{)}.$$
Returns
A length-one character string:
- `"NULL"` if `x` is `NULL`.
- `"c()"` if `x` has length 0.
- Otherwise a `c(...)` literal with `%.17g`-formatted numbers.
Notes
- The `%.17g` format provides enough precision to round-trip IEEE-754 double-precision values in most cases.
- No validation is performed on finiteness or `NA`/`NaN` values. `sprintf("%.17g", NA)` yields `"NA"`, which fails on re-parse via `.parse_num_vec` because `as.numeric("NA")` is `NA` and triggers the non-numeric error. `sprintf("%.17g", Inf)` yields `"Inf"`, which re-parses successfully; non-finite values are not checked here.
Purpose
Parses a c(...) literal string back into a numeric vector, or recognizes "NULL". Inverse of .serialize_num_vec.
Arguments
- `value`: `character(1)`. The raw string from the file.
- `key`: `character(1)`. Configuration key name for error messages.
- `line_no`: `integer(1)` or `numeric`. Source line number for error messages.
Mathematics
- Trim `value`; if `"NULL"`, return `NULL`.
- Require `^c\(.*\)$`; otherwise error.
- Extract inner: `inner = trimws(sub("^c\\((.*)\\)$", "\\1", v))`.
- If `inner` is empty, return `numeric(0)`.
- Split on `,` (fixed), trim each part, coerce with `as.numeric`.
- If any result is `NA`, error.
Returns
NULL, numeric(0), or a numeric vector.
Notes
- Uses `suppressWarnings(as.numeric(parts))` to silence coercion warnings; the presence of `NA` in the result is then checked explicitly.
- If any token fails to coerce (producing `NA`), raises a `"gdpar_input_error"` via `gdpar_abort`.
- Splitting is done with `strsplit(inner, ",", fixed = TRUE)`. Commas inside quoted strings would be split incorrectly, but numeric vectors do not contain quoted strings, so this is safe.
- `Inf` and `-Inf` parse successfully via `as.numeric` and pass the `is.na` check (since `is.na(Inf)` is `FALSE`). `NaN` also parses, but is rejected because `is.na(NaN)` is `TRUE`.
Purpose
Parses a character string into a single integer value, with strict validation.
Arguments
- `value`: `character(1)`. The raw string from the file.
- `key`: `character(1)`. Configuration key name for error messages.
- `line_no`: `integer(1)` or `numeric`. Source line number for error messages.
Mathematics
$$v = \texttt{as.numeric}(\texttt{value}),$$
then validate:
$$\texttt{is.finite}(v) \;\wedge\; v = \texttt{as.integer}(v).$$
If valid, return $\texttt{as.integer}(v)$.
Returns
A length-one integer.
Notes
- Uses `suppressWarnings(as.numeric(value))` to silence coercion warnings.
- The check `v != as.integer(v)` rejects non-integer-valued numerics (e.g., `3.5`). Note that `as.integer(v)` truncates toward zero, so this comparison effectively requires $v$ to be a whole number.
- `is.finite(v)` rejects `Inf`, `-Inf`, `NaN`, and `NA`.
- On any validation failure, raises a `"gdpar_input_error"` via `gdpar_abort`.
- The returned value is explicitly `as.integer(v)`, so it is of R type `integer` (not `numeric`).
Purpose
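Parses the collection of `W.*` configuration records (for the basis-function specification of the dynamic parameter model) and constructs a `W_basis` object. This is the central dispatcher for reconstructing the basis $W$ from a canonical file.
Arguments
- `records`: a named `list` of record objects. Each record is expected to be a list with elements `value` (character string) and `line` (line number). The relevant keys are `"W.type"`, `"W.degree"`, `"W.knots"`, and `"W.df"`.
Mathematics
The basis $W$ is selected by `W.type`: a polynomial basis of degree `W.degree`, or a B-spline basis of degree `W.degree` determined by either explicit knots (`W.knots`) or degrees of freedom (`W.df`).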
Returns
NULL, or a W_basis object constructed by W_basis(...).
Notes
- If `records[["W.type"]]` is `NULL` or its `value` is `"NULL"`, returns `NULL` immediately (no basis).
- `W.type = "user"` is explicitly rejected with a `"gdpar_input_error"` because user-defined bases reference arbitrary R functions that cannot be serialized in the canonical format.
- Only `"polynomial"` and `"bspline"` are accepted as `W.type` values; anything else raises a `"gdpar_input_error"` referencing the line number from `Wt_rec[["line"]]`.
- `W.degree` is required for both supported types. If `records[["W.degree"]]` is `NULL`, an error is raised. The degree is parsed via `.parse_int`.
- For `W.type = "polynomial"`: returns `W_basis(type = "polynomial", degree = W_degree)`. No further keys are consulted.
- For `W.type = "bspline"`:
  - `W.knots` and `W.df` are mutually exclusive. If both are present, a `"gdpar_input_error"` is raised.
  - If `W.knots` is present, it is parsed via `.parse_num_vec` and passed as `knots` to `W_basis`.
  - If `W.df` is present, it is parsed via `.parse_int` and passed as `df` to `W_basis`.
  - If neither `W.knots` nor `W.df` is present, a `"gdpar_input_error"` is raised stating that one must be supplied.
- All errors are raised via `gdpar_abort` with class `"gdpar_input_error"`.
- This function delegates the actual basis construction to the `W_basis` constructor, which is not defined in this section.
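The dispatch rules above can be sketched in base R. To stay self-contained, this sketch returns a plain list describing the basis instead of calling the package's `W_basis()` constructor; the name `dispatch_W_sketch`, the returned list layout, and the error messages are all hypothetical, and the knots are left as the raw string rather than parsed.

```r
dispatch_W_sketch <- function(records) {
  type_rec <- records[["W.type"]]
  # No W.type record, or an explicit NULL, means no modulating basis.
  if (is.null(type_rec) || identical(type_rec$value, "NULL")) return(NULL)
  type <- type_rec$value
  if (!type %in% c("polynomial", "bspline")) {
    stop(sprintf("line %d: unsupported W.type '%s'", type_rec$line, type))
  }
  if (is.null(records[["W.degree"]])) stop("W.degree is required")
  degree <- as.integer(records[["W.degree"]]$value)
  if (identical(type, "polynomial")) {
    return(list(type = type, degree = degree))
  }
  # B-spline: exactly one of W.knots / W.df must be present.
  has_knots <- !is.null(records[["W.knots"]])
  has_df <- !is.null(records[["W.df"]])
  if (has_knots && has_df) stop("W.knots and W.df are mutually exclusive")
  if (!has_knots && !has_df) stop("one of W.knots or W.df must be supplied")
  if (has_knots) {
    list(type = type, degree = degree, knots = records[["W.knots"]]$value)
  } else {
    list(type = type, degree = degree,
         df = as.integer(records[["W.df"]]$value))
  }
}

recs <- list(
  W.type = list(value = "bspline", line = 6L),
  W.degree = list(value = "3", line = 7L),
  W.df = list(value = "5", line = 8L)
)
basis <- dispatch_W_sketch(recs)
```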
Purpose
Constructs a specification object that declares which components of the Additive-Multiplicative-Modulated (AMM) canonical form are active and how each is parametrized. The returned `amm_spec` object is the primary input consumed by `gdpar()` to assemble design matrices and the Stan model. Two mutually exclusive construction paths exist, selected by `p`: the scalar path (`p = 1L`, default) accepts `a` and `b` directly as one-sided formulas; the multivariate path (`p > 1L`) requires the `dims` argument instead and forbids `a`/`b`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `a` | One-sided formula or `NULL` | Scalar path only. Basis for the additive component; its design matrix is built with `stats::model.matrix`. `NULL` disables the additive component. Must be `NULL` when `p > 1L`. |
| `b` | One-sided formula or `NULL` | Scalar path only. Basis for the multiplicative component, specified in the same way as `a`. Must be `NULL` when `p > 1L`. |
| `W` | Object of class `W_basis` or `NULL` | Basis for the modulating component. Independent of `p` because it couples all dimensions of $\theta_i$. `NULL` disables the modulating component. Validated with `assert_inherits`. |
| `x_vars` | Character vector or `NULL` | Names of covariates entering the modulating component as the linear factor $x$. When `NULL`, `gdpar()` later uses the covariates from the right-hand side of the model formula. Must be non-empty character when supplied. |
| `p` | Integer | Dimension of the per-individual parameter vector $\theta_i$. Defaults to `1L`. Coerced to integer via `as.integer` after validation with `assert_count`. |
| `dims` | `dims_spec` object, plain list of length `p`, or `NULL` | Multivariate path only. Each entry is a list with components `a` (one-sided formula or `NULL`) and `b` (one-sided formula or `NULL`). Must be `NULL` when `p == 1L`. Bare formulas are rejected to prevent silent recycling. If a `dims_spec` object (from `dimwise()`), it is resolved via `resolve_dims_spec(dims, p)`. |
Mathematics
The AMM level of the specification is inferred from the (non-)nullity of the components across all dimensions.
On the scalar path the check is direct on `a`, `b`, `W`; on the multivariate path it is computed over the resolved per-dimension list:
$$\texttt{any\_a} = \bigvee_{k=1}^{p} \lnot\, \texttt{is.null}(\texttt{resolved}_k\texttt{\$a}), \qquad \texttt{any\_b} = \bigvee_{k=1}^{p} \lnot\, \texttt{is.null}(\texttt{resolved}_k\texttt{\$b}).$$
The design-matrix centering conditions (C2) and (C3) are not enforced inside this constructor; they are enforced empirically when the design matrices are built and column-centred.
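The multivariate check above is a logical OR across dimensions, which is short to express in base R. The `resolved` list below is a hypothetical stand-in for the output of `resolve_dims_spec()`, with one `list(a = ..., b = ...)` entry per dimension.

```r
# Hypothetical resolved per-dimension list for p = 3.
resolved <- list(
  list(a = ~ x1, b = NULL),
  list(a = NULL, b = NULL),
  list(a = ~ x1 + x2, b = NULL)
)

# TRUE as soon as one dimension has a non-NULL component.
any_a <- any(vapply(resolved, function(e) !is.null(e$a), logical(1)))
any_b <- any(vapply(resolved, function(e) !is.null(e$b), logical(1)))
```

Here `any_a` is `TRUE` because dimensions 1 and 3 declare an additive basis, while `any_b` is `FALSE` because no dimension declares a multiplicative one.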
Returns
An S3 object of class c("amm_spec", "list") with the following components:
| Component | Type | Description |
|---|---|---|
| `a` | One-sided formula or `NULL` | Populated on the scalar path; `NULL` on the multivariate path. |
| `b` | One-sided formula or `NULL` | Populated on the scalar path; `NULL` on the multivariate path. |
| `W` | `W_basis` object or `NULL` | The modulating basis, passed through verbatim. |
| `x_vars` | Character vector or `NULL` | Passed through verbatim. |
| `level` | Integer (`0L`, `1L`, or `2L`) | AMM hierarchy level as defined above. |
| `p` | Integer | Dimension of $\theta_i$. |
| `dims` | `NULL` or list of length `p` | `NULL` on the scalar path. On the multivariate path, a resolved list of length `p` where each element is `list(a = ..., b = ...)`. |
Notes
- Scalar path (`p == 1L`): `a` and `b` are validated with `assert_one_sided_formula(..., allow_null = TRUE)`. If `dims` is non-`NULL`, an error of class `gdpar_input_error` is raised.
- Multivariate path (`p > 1L`): If either `a` or `b` is non-`NULL`, an error is raised directing the user to `dims`. If `dims` is `NULL`, an error is raised.
- Bare formula guard: If `dims` inherits from `"formula"`, a specific error is raised to prevent silent recycling across dimensions.
- Plain list validation for `dims`: The list must have length exactly `p`; each entry must itself be a list. Missing `a`/`b` names default to `NULL`. Each formula entry is validated with `assert_one_sided_formula`.
- `dims_spec` resolution: When `dims` is a `dims_spec` object (produced by `dimwise()` and possibly composed with `override()`), it is resolved via the internal `resolve_dims_spec(dims, p)` function.
- Unknown class for `dims`: If `dims` is not `NULL`, not a formula, not a `dims_spec`, and not a list, an error of class `gdpar_input_error` is raised.
- No cross-component identifiability check: The constructor does not detect non-identifiability between $a$, $b$, and $W$ components; this is deferred to `gdpar_check_identifiability()`, which is called automatically before fitting.
- Linearity assumption (LIN): Holds automatically for formula-based bases (linear subspaces of $L^2_0(\mu)$) and for polynomial/B-spline `W_basis` types.
- Dependencies: Uses `assert_count`, `assert_one_sided_formula`, `assert_inherits` (all internal assertion helpers), `gdpar_abort` (structured error signalling), and `resolve_dims_spec`.
Purpose
Constructs the centered design matrices ($Z_a$, $Z_b$, $X$) for a scalar (`p = 1`) AMM specification; when `p > 1` it dispatches to `.build_amm_design_multi()`. This function prepares the covariate matrices for the static and modulating components of the AMM, centering them and standardising the modulating covariates.
Arguments
- `amm` (`amm_spec`): An object of class `"amm_spec"` representing the AMM specification.
- `data` (`data.frame`): The data frame containing the variables referenced by the AMM specification.
- `formula_rhs` (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor `x` variables when `amm$x_vars` is `NULL`.
Mathematics
The function implements the following operations:
- Centering the static components: For the formula `amm$a` (and similarly `amm$b`), the design matrix $Z_a^{\text{full}}$ is constructed. It is then column-centred by subtracting its column means: $$Z_a = Z_a^{\text{full}} - \mathbf{1}\,\bar{Z}_a^{\top}$$ where $\bar{Z}_a$ is the vector of column means of $Z_a^{\text{full}}$. The same is done for $Z_b$.
- Standardising the modulating component: When the modulating component `amm$W` is active, the covariate matrix $X^{\text{full}}$ is built from `x_vars`. It is then centred and scaled column by column to have zero mean and unit standard deviation (using the sample standard deviation): $$X = \left( X^{\text{full}} - \mathbf{1}\,\bar{X}^{\top} \right) \operatorname{diag}\!\left(\frac{1}{s_X}\right)$$ where $\bar{X}$ is the vector of column means and $s_X$ is the vector of column sample standard deviations.
Returns
A list with components:
- `Z_a` (matrix): Centred design matrix for the static component `a` (rows = observations, columns = terms from `a`). If `a` is `NULL`, a 0-column matrix of the correct row count.
- `Z_b` (matrix): Centred design matrix for the static component `b`.
- `X` (matrix): Standardised (centred and scaled) design matrix for the modulating component. If `W` is inactive, a 0-column matrix.
- `Z_a_means` (numeric): Column means of the raw (uncentred) design matrix for `a`.
- `Z_b_means` (numeric): Column means for `b`.
- `X_means` (numeric): Column means of the raw covariate matrix for `x`.
- `X_sds` (numeric): Column standard deviations of the centred covariate matrix for `x`.
- `Z_a_names` (character): Column names of `Z_a`.
- `Z_b_names` (character): Column names of `Z_b`.
- `X_names` (character): Column names of `X` (the `x_vars`).
Notes
- If `amm$p > 1`, the function immediately returns the result of `.build_amm_design_multi(amm, data, formula_rhs)`.
- The function performs strict input validation via `assert_inherits` and `assert_data_frame`.
- Raises a `gdpar_input_error` if any covariate needed by `a`, `b`, or `x_vars` contains missing values (`NA`). The error message specifies that "Path 1 does not impute".
- Raises a `gdpar_input_error` if the modulating component `W` is active but no `x_vars` could be identified (from `amm$x_vars`, `formula_rhs`, or the formula terms).
- Raises a `gdpar_input_error` if any required `x_vars` are missing from `data`.
- Raises a `gdpar_input_error` if any `x_vars` are constant (zero standard deviation after centring).
- The design matrices are constructed using `stats::model.matrix` with the formula updated to `~ . + 0` to suppress the intercept column.
- The returned `Z_a`, `Z_b`, and `X` are always plain matrices (not data frames or tibbles).
Purpose
Internal workhorse for building per-coordinate centred design matrices ($Z_{a,k}$, $Z_{b,k}$) together with the shared modulating matrix $X$ when `p > 1`.
Arguments
- `amm` (`amm_spec`): An object of class `"amm_spec"` with `amm$p > 1`. The per-coordinate specifications are stored in `amm$dims`.
- `data` (`data.frame`): The data frame containing the variables referenced by the AMM specification.
- `formula_rhs` (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor `x` variables when `amm$x_vars` is `NULL`.
Mathematics
The algorithm is an extension of the univariate case (`build_amm_design`) to $p > 1$ coordinates:
- Collect needed variables: Iterate over all coordinates $k$ and collect all variables referenced in `amm$dims[[k]]$a` and `amm$dims[[k]]$b`. The union of these, plus `amm$x_vars`, forms the set of variables that must be present and complete (no `NA`s) in `data`.
- Per-coordinate centred design matrices: For each coordinate $k$:
  - If the formula `a_k` is not `NULL`, construct its design matrix $Z_{a,k}^{\text{full}}$ and centre it: $$Z_{a,k} = Z_{a,k}^{\text{full}} - \mathbf{1}\,\bar{Z}_{a,k}^{\top}$$
  - Similarly for $Z_{b,k}$ from `b_k`.
  - If `a_k` or `b_k` is `NULL`, the corresponding matrix is set to a 0-column matrix.
- Shared modulating matrix $X$: The construction is identical to the univariate case: identify `x_vars`, build $X^{\text{full}}$, centre it, and scale each column to unit sample standard deviation. The resulting $X$ is shared across all $p$ coordinates.
Returns
A list with components:
- `p` (integer): The number of coordinates.
- `Z_a_list` (list of matrices): Length-`p` list; each element is the centred design matrix for coordinate $k$'s static component `a`.
- `Z_b_list` (list of matrices): Length-`p` list; each element is the centred design matrix for coordinate $k$'s static component `b`.
- `X` (matrix): The shared standardised modulating matrix (0 columns if `W` is inactive).
- `Z_a_means_list` (list of numeric vectors): Column means for each raw $Z_{a,k}^{\text{full}}$.
- `Z_b_means_list` (list of numeric vectors): Column means for each raw $Z_{b,k}^{\text{full}}$.
- `X_means` (numeric): Column means of the raw $X^{\text{full}}$.
- `X_sds` (numeric): Column standard deviations of the centred $X$.
- `Z_a_names_list` (list of character vectors): Column names for each $Z_{a,k}$.
- `Z_b_names_list` (list of character vectors): Column names for each $Z_{b,k}$.
- `X_names` (character): Column names of `X` (the `x_vars`).
Notes
- This function is internal and not exported (hence the leading dot).
- It performs the same input validation and error checks as `build_amm_design`: missing values in needed variables, identification of `x_vars` when `W` is active, missing `x_vars` in `data`, and constant covariates in `x_vars`.
- The per-coordinate matrices in `Z_a_list` and `Z_b_list` may have different column counts (ragged arrays). Downstream assembly (e.g., for Stan) must handle this raggedness.
- The function does not compute any group-level random effects or parameter transformations; it only constructs the design matrices from the data and specifications.
Purpose Constructs centered design matrices and metadata for a multi‑individual (K > 1) AMM specification. Processes a list of canonical amm_spec objects (one per individual) and a dataset to produce centered design matrices for additive (a) and multiplicative (b) components, plus a scaled matrix for the modulating component (W) if present.
Arguments
- `amm_list_canonical` (list): A named list of length ≥ 2. Each element must be an object of class `amm_spec` with `p = 1` (the K > 1, p > 1 regime is unsupported). Slots must have non‑empty names.
- `data` (data.frame): Data frame containing the variables used in the formulas. Must have no missing values in the required covariates.
- `formula_rhs` (formula or character): Right‑hand side of the model formula. Used as a fallback to determine the set of covariates for the modulating component if `x_vars` is not specified and the union of variables from the `a`/`b` formulas is empty.
Mathematics
For each individual $k = 1, \dots, K$:
- Additive component: If the formula $a_k$ is not `NULL`, build the design matrix $Z^{(a)}_{\text{full}, k}$ via `model.matrix`. Compute the column means $\bar{z}^{(a)}_k$ and center: $$Z^{(a)}_k = Z^{(a)}_{\text{full}, k} - \mathbf{1}_n \bar{z}^{(a)\top}_k.$$ If $a_k$ is `NULL`, set $Z^{(a)}_k$ to an $n \times 0$ matrix.
- Multiplicative component: Analogously, from $b_k$, produce $Z^{(b)}_k$ after centering.
- Modulating component (if any $W$ is non‑`NULL`): Determine the variable set $\mathbf{x}$ (explicit `x_vars`, or the union of variables from all $a_k$ and $b_k$, or from `formula_rhs`). Form $X_{\text{full}}$ from the data, then center and scale: $$\bar{x} = \frac{1}{n} X_{\text{full}}^\top \mathbf{1}_n,$$ $$X_c = X_{\text{full}} - \mathbf{1}_n \bar{x}^\top,$$ $$s_x = \sqrt{\operatorname{diag}\left(\frac{1}{n-1} X_c^\top X_c\right)},$$ $$X = X_c \cdot \operatorname{diag}(1/s_x),$$ where the square root and the reciprocal act element-wise. Each column of $X$ has zero mean and unit sample standard deviation.
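The centering and scaling steps can be sketched in base R on toy data. This is only an illustration under stated assumptions: the data frame, its column names, and the choice to drop the intercept column before centering are hypothetical and not taken from the package source.

```r
# Toy data; the column names x1, x2, z are illustrative only.
set.seed(1)
dat <- data.frame(x1 = rnorm(20), x2 = runif(20), z = rnorm(20))

# Static component: build the design matrix, drop the intercept column
# (an assumption of this sketch), then centre each column at its mean.
Z_full  <- stats::model.matrix(~ z, data = dat)[, -1, drop = FALSE]
Z_means <- colMeans(Z_full)
Z       <- sweep(Z_full, 2L, Z_means, FUN = "-")

# Modulating component: centre, then scale by the (n - 1)-denominator SD.
X_full  <- as.matrix(dat[, c("x1", "x2")])
X_means <- colMeans(X_full)
X_c     <- sweep(X_full, 2L, X_means, FUN = "-")
X_sds   <- apply(X_c, 2L, stats::sd)
if (any(X_sds == 0)) stop("constant covariate in the modulating component")
X       <- sweep(X_c, 2L, X_sds, FUN = "/")
```

Keeping `Z_means`, `X_means`, and `X_sds` alongside the matrices is what allows new data to be transformed with the training-time constants at prediction time.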
Returns
A list with components:
- `K` (integer): Number of individuals.
- `slot_names` (character vector): Names of the input list slots.
- `Z_a_k_list` (list of matrices): Centered design matrices for additive components (each $n \times p_a^{(k)}$, where $p_a^{(k)}$ is the number of columns from $a_k$; $n \times 0$ if absent).
- `Z_b_k_list` (list of matrices): Centered design matrices for multiplicative components.
- `X` (matrix): Centered and scaled design matrix for the modulating component ($n \times q$; $n \times 0$ if no $W$ is active).
- `Z_a_k_means_list` (list of numeric vectors): Column means of the additive design matrices before centering.
- `Z_b_k_means_list` (list of numeric vectors): Column means of the multiplicative design matrices before centering.
- `X_means` (numeric vector): Column means of the modulating component matrix (empty if $q = 0$).
- `X_sds` (numeric vector): Column standard deviations of $X_c$ (empty if $q = 0$).
- `Z_a_k_names_list` (list of character vectors): Column names of the additive design matrices.
- `Z_b_k_names_list` (list of character vectors): Column names of the multiplicative design matrices.
- `X_names` (character vector): Names of the variables in the modulating component.
Notes
- Raises errors via `gdpar_abort` if:
  - `amm_list_canonical` is not a list of length ≥ 2 or has empty/missing names.
  - Any `amm_spec` has `p > 1` (unsupported feature error).
  - `data` is not a data frame.
  - Missing values exist in any required covariate (no imputation).
  - The modulating component is active but no covariates are identified.
  - Required variables for the modulating component are missing in `data`.
  - Any variable in the modulating component is constant (zero standard deviation).
- Uses `assert_inherits` to verify each list element is an `amm_spec`.
- Uses `stats::model.matrix` and `sweep` for matrix construction and centering.
Purpose
S3 method that prints a concise summary of an `amm_spec` object to the console, displaying its key specifications: level, dimension `p`, formulas for the additive/multiplicative components, the modulating component, and `x_vars` if present.
Arguments
- `x` (`amm_spec`): An object of class `amm_spec` to print.
- `...` (any): Additional arguments (unused; present for S3 generic compatibility).
Returns
Invisibly returns the input object `x`.
Notes
- Output format: a header `<amm_spec> AMM Level <level>`, followed by the labelled fields `p (dim theta_i) : <p>`, `a (additive) : <formula or NULL>`, `b (multiplicative) : <formula or NULL>`, `W (modulating) : <W_basis(...) or NULL>`, and `x_vars : <comma-separated list, if any>`. If `p > 1`, instead of separate `a` and `b` lines, it prints a table of per‑dimension formulas from `x$dims`.
- Formulas are deparsed to character strings for display.
- The method is exported and can be called directly on `amm_spec` objects or via `print()`.
Purpose
Exported methodological-audit function. Empirically verifies the conclusion of the Bernstein–von Mises theorem (Theorem 4C of Block 4) for a fitted Path 1 Bayesian gdpar model by comparing Bayesian posterior credible intervals with Hessian-based asymptotic confidence intervals obtained from a Laplace approximation around the maximum likelihood estimator on a prior-stripped Stan model. The function is opt-in, computationally expensive, and does not modify the input fit.
Arguments
- `fit`: object of S3 class `gdpar_fit` (produced by `gdpar()` with `path = "bayes"`). Must contain `$fit` (a `cmdstanr` fit object with a `draws()` method), `$stan_data` (a list with possible field `use_groups`), and `$prior` (passed to `generate_stan_code`).
- `parameters`: optional character vector of parameter names to include in the comparison. When `NULL` (default), defaults to the user-facing parameters that the prior-stripped likelihood identifies, obtained by filtering `posterior::variables(draws)` against an exclusion regex.
- `level`: numeric scalar in $[0,1]$ giving the nominal credible/confidence level. Defaults to `0.95`.
- `verbose`: logical scalar; when `TRUE`, prints an estimated-cost / opt-in message before starting. Defaults to `TRUE`.
Mathematics
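Let $\ell$ denote `level` and set $\alpha = 1 - \ell$.
For a nominal level $\ell$, both intervals are equal-tailed, with endpoints at the $\alpha/2$ and $1 - \alpha/2$ quantiles.
For each parameter, the Bayesian credible interval is $[b_L, b_U]$, taken from the posterior draws, and the asymptotic interval is $[a_L, a_U]$, taken from the Laplace-approximation draws. The width ratio and discrepancy are $$r = \frac{b_U - b_L}{a_U - a_L}, \qquad d = |\log r|.$$
A width ratio is flagged as suspicious when $r \notin [0.5, 2]$.
The MLE is obtained on the constrained (natural) scale because Stan's optimizer is invoked with `jacobian = FALSE`; the same convention is used for `cmdstanr::laplace`, so the Hessian-based covariance corresponds to the inverse observed information on the natural scale rather than the unconstrained scale.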
Returns
A list of S3 class c("gdpar_bvm_report", "list") with components:
- `table`: a data frame with one row per selected parameter and columns
  - `variable`: parameter name,
  - `bayes_mean`, `bayes_lower`, `bayes_upper`, `bayes_width`: posterior mean, lower/upper quantiles, and width $b_U - b_L$,
  - `asymp_mean`, `asymp_lower`, `asymp_upper`, `asymp_width`: Laplace-approximation mean, lower/upper quantiles, and width $a_U - a_L$ (all `NA_real_` if the Laplace step failed),
  - `width_ratio`: $r = \texttt{bayes\_width} / \texttt{asymp\_width}$ (`NA_real_` if the Laplace step failed).
- `discrepancy`: numeric vector of length `length(parameters)` with $d = |\log r|$ (`NA_real_` if the Laplace step failed).
- `level`: the input `level`.
- `warnings`: character vector of warning messages. Set to `"Laplace approximation failed; asymptotic comparison unavailable."` when the Laplace step fails; otherwise appended with one entry of the form `"Width ratio outside [0.5, 2] for: <vars>."` whenever any ratio falls outside $[0.5, 2]$.
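The width-ratio comparison reported in `table` can be illustrated with a short base-R sketch. The two draw vectors below are simulated stand-ins (hypothetical data, not output of `gdpar_bvm_check`), and the helper `interval()` is a name introduced only for this example.

```r
set.seed(42)
level <- 0.95
alpha <- 1 - level

# Simulated stand-ins for posterior draws and Laplace-approximation draws
# of a single parameter.
bayes_draws <- rnorm(4000, mean = 1.0, sd = 0.20)
asymp_draws <- rnorm(4000, mean = 1.0, sd = 0.21)

# Equal-tailed interval at probabilities alpha/2 and 1 - alpha/2.
interval <- function(x, alpha) {
  stats::quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}
b <- interval(bayes_draws, alpha)
a <- interval(asymp_draws, alpha)

width_ratio <- (b[2] - b[1]) / (a[2] - a[1])   # r
discrepancy <- abs(log(width_ratio))           # d = |log r|
suspicious  <- width_ratio < 0.5 || width_ratio > 2
```

Taking the absolute log makes the discrepancy symmetric: a ratio of 2 and a ratio of 0.5 are equally far from agreement.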
Notes
- Input validation:
  - `assert_inherits(fit, "gdpar_fit", "fit")` is called first.
  - `assert_numeric_scalar(level, "level", lower = 0, upper = 1)` enforces $0 \le \texttt{level} \le 1$.
  - `verbose` is checked manually: if `!is.logical(verbose) || length(verbose) != 1L`, the function aborts with `gdpar_abort(..., class = "gdpar_input_error")`.
- Hierarchical fits are rejected: if `fit$stan_data$use_groups` is non-null and equals `1L` (after `as.integer`), the function aborts with class `gdpar_unsupported_feature_error` and a message explaining that Theorem 4C does not cover per-group random anchors.
- Suggested-package dependencies `cmdstanr` (for optimization and Laplace) and `posterior` (for draw extraction/summarisation) are required via `require_suggested`; missing packages raise an error.
- The opt-in message is emitted through `gdpar_inform` with class `"gdpar_optin_message"` when `verbose = TRUE`.
- Candidate parameters are obtained from `posterior::variables(draws)` after excluding names matching the regex `^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw|c_b|c_b_raw|mu_theta_ref|sigma_theta_ref|sigma_a|sigma_b|sigma_W)`. This excludes latent noise, log-likelihood, predictions, per-observation anchors, random-effect coefficients and raws, centering constants, prior hyperparameters, and hierarchical scales (`sigma_a`, `sigma_b`, `sigma_W`), leaving user-facing identified parameters such as `theta_ref`, `sigma_y`, and `phi`.
- When `parameters` is supplied by the user, any requested name not in `candidate_vars` triggers `gdpar_abort` with class `gdpar_input_error` and a message listing the missing names wrapped in `sQuote`.
- Stan code is regenerated from `fit$prior` via `generate_stan_code(fit$prior, mle = TRUE)` (prior block stripped using the `// BEGIN PRIORS` / `// END PRIORS` markers) and written to a temporary file via `write_stan_to_tempfile`.
- The MLE is found by `cs_model$optimize(data = stan_data, algorithm = "lbfgs", refresh = 0, history_size = 5, init_alpha = 0.001, iter = 2000, jacobian = FALSE)`.
- The Laplace step is wrapped in `tryCatch`. On error, `gdpar_warn` is emitted with class `"gdpar_diagnostic_warning"` carrying the message `"Laplace approximation failed: <conditionMessage>. Falling back to MLE-only summary."`, and the asymptotic columns of `table` plus `discrepancy` are filled with `NA_real_`.
- Bayesian and Laplace summaries are both produced with `posterior::summarise_draws` using custom `mean`, `sd`, `q_lower`, `q_upper` functions; `q_lower` and `q_upper` use `stats::quantile` with `names = FALSE` at probabilities `alpha/2` and `1 - alpha/2` respectively.
- The function does not modify `fit`; the returned report is purely informational.
- Side effects: writes a Stan file to a temporary location, invokes `cmdstanr::cmdstan_model`, `optimize`, and (optionally) `laplace`; may emit one opt-in message, one diagnostic warning, and one or more width-ratio warnings.
Purpose
Exported S3 print method for objects of class gdpar_bvm_report produced by gdpar_bvm_check. Renders a human-readable summary of the calibration comparison.
Arguments
- `x`: an object of S3 class `gdpar_bvm_report`.
- `...`: unused; present for S3 generic compatibility.
Returns
Invisibly returns `x`.
Notes
- Prints a header line of the form `<gdpar_bvm_report> level = <level>` followed by a newline.
- Prints `x$table` via `print` with `row.names = FALSE`.
- If `length(x$warnings) > 0L`, prints a blank line, the heading `Warnings:`, and each warning prefixed by `-` on its own line.
- Dispatched through the S3 generic `print` based on the first class element `"gdpar_bvm_report"`.
- No validation of `x` is performed; the function assumes the object was constructed by `gdpar_bvm_check`.
Purpose
S3 print method for objects of class gdpar_causal_bridge. Produces a concise, human-readable summary of the bridge's structural metadata and first few CATE estimates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_causal_bridge` | The bridge object to print. |
| `...` | (unused) | Present for S3 generic compatibility; ignored. |
Returns
Invisibly returns `x`.
Notes
- The method extracts the treatment-model family (`x$fits$treat$family`) and prints both its `name` and `link`.
- It uses `%||%` (null-coalescing) to default `K` and `p` to `1L` when the corresponding slot is absent in `x$fits$treat`.
- `K` is printed only when `K > 1`; otherwise `p` is printed only when `p > 1`. If both are 1, neither label is emitted.
- The anchor vector is formatted with `digits = 4` and displayed as a comma-separated bracketed list.
- `x$meta$newdata_source` defaults to `"<unknown>"` when `NULL`.
- The head of `cate_mean` is displayed via the internal helper `.bridge_format_head`. If `x$warnings` is a non-empty character vector, each warning is printed on its own indented line under a "Warnings:" heading.
- Side effect: writes to the console via `cat`.
Purpose
Internal helper that formats the first few elements (or rows) of the cate_mean vector/matrix for display in print.gdpar_causal_bridge.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cate_mean` | numeric vector or numeric matrix | The posterior mean CATE values. A vector for scalar bridges; a matrix for multivariate/K-individual bridges. |
| `n_show` | integer, default `6L` | Maximum number of elements or rows to display. |
Returns
A single character string suitable for `cat`.
Notes
- Matrix path (`is.matrix(cate_mean)` is `TRUE`): `n = nrow(cate_mean)` observations are assumed; `n_show` is clamped to `min(n_show, n)`. Each row `i` is formatted as `[v1, v2, ...]` with `digits = 3` via `format`. Rows are concatenated with `"; "` separators.
- Vector path: `n = length(cate_mean)` observations are assumed; `n_show` is clamped similarly. The first `n_show` elements are formatted with `digits = 3` and joined with `", "`.
- Marked `@keywords internal` and `@noRd`; not exported.
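A minimal sketch of the two formatting paths, written from the description above rather than from the package source. The function name `format_head` and the example inputs are hypothetical.

```r
# Hypothetical re-implementation of the head formatter described above.
format_head <- function(cate_mean, n_show = 6L) {
  if (is.matrix(cate_mean)) {
    n_show <- min(n_show, nrow(cate_mean))
    rows <- vapply(seq_len(n_show), function(i) {
      # One bracketed, comma-separated group per row.
      paste0("[", paste(format(cate_mean[i, ], digits = 3), collapse = ", "), "]")
    }, character(1L))
    return(paste(rows, collapse = "; "))
  }
  n_show <- min(n_show, length(cate_mean))
  paste(format(cate_mean[seq_len(n_show)], digits = 3), collapse = ", ")
}

out_vec <- format_head(c(0.12345, 1.5, 2))
out_mat <- format_head(matrix(seq(0.1, 0.6, by = 0.1), nrow = 3), n_show = 2L)
```

Clamping `n_show` before indexing avoids `NA` padding when fewer than `n_show` values are available.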
Purpose
S3 summary method for gdpar_causal_bridge objects. Constructs a structured summary containing a per-observation table of posterior CATE (mean and credible interval), the marginal Average Treatment Effect (ATE) and its credible interval, and ancillary metadata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_causal_bridge` | The bridge to summarise. |
| `...` | (unused) | Present for S3 generic compatibility; ignored. |
Returns
An object of class `c("summary.gdpar_causal_bridge", "list")` with components:
| Component | Structure | Description |
|---|---|---|
| `table` | `data.frame` | Per-observation (and per-slot) CATE table (see below). |
| `ate` | named numeric vector | Marginal ATE. Scalar for K=1, p=1; length-K vector otherwise. |
| `ate_ci` | numeric matrix (K × 2) | Credible bounds for the marginal ATE. Columns `lower`, `upper`. Row names are slot names (or `"ate"` for the scalar case). |
| `level` | numeric scalar | Credible level used. |
| `type` | character scalar | Type of bridge (e.g. `"response"`, `"link"`). |
| `n_draws` | integer scalar | Number of posterior draws. |
| `n_obs` | integer scalar | Number of evaluation observations. |
Mathematics
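Let $S$ be the number of posterior draws, $n$ the number of evaluation observations, and $\alpha = 1 - \texttt{level}$.
Scalar bridge (`cate_draws` is an $S \times n$ matrix with entries $\tau^{(s)}_i$): the marginal ATE and its per-draw version are $$\widehat{\text{ATE}} = \frac{1}{S n}\sum_{s=1}^{S}\sum_{i=1}^{n} \tau^{(s)}_i, \qquad \text{ATE}^{(s)} = \frac{1}{n}\sum_{i=1}^{n} \tau^{(s)}_i.$$
The ATE credible interval is $$\bigl[Q_{\alpha/2}\{\text{ATE}^{(s)}\},\; Q_{1-\alpha/2}\{\text{ATE}^{(s)}\}\bigr],$$
where $Q_q$ denotes the empirical $q$-quantile over draws (computed with `stats::quantile`).
Multivariate / K-individual bridge (`cate_draws` is an $S \times n \times K$ array):
For each slot $k = 1, \dots, K$, the same two quantities are computed from the slice `cate_draws[, , k]`, giving $\widehat{\text{ATE}}_k$ and its credible interval.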
Notes
- Calls `assert_inherits(object, "gdpar_causal_bridge", "object")` to validate input.
- The scalar branch is taken when `is.matrix(cate_draws)` is `TRUE`; the table then has columns `observation`, `cate_mean`, `cate_lower`, `cate_upper`. The `cate_mean` and `cate_ci` components of the bridge object (`object$cate_mean`, `object$cate_ci`) are assumed pre-computed.
- The array branch is taken otherwise. Slot names are taken from `object$meta$dim_names`; if `NULL`, they default to `"dim_1"`, `"dim_2"`, …, `"dim_K"`. The table gains a `slot` column and has `K × n` rows in total.
- The marginal ATE for the scalar case is `mean(cate_draws)` (i.e. the grand mean over both draws and observations), and the ATE CI uses `rowMeans(cate_draws)` (a length-$S$ vector of per-draw cross-observation means).
- For the array case, `ate_vec[k]` is computed as `mean(cate_draws[, , k])` (grand mean over all draws and observations for slot $k$). The CI for slot $k$ uses `apply(cate_draws[, , k, drop = FALSE], 1L, mean)` to produce a length-$S$ vector of per-draw means, then takes quantiles of that.
- The `ate_mat` matrix in the scalar path has a single row named `"ate"`.
- Class is set to `c("summary.gdpar_causal_bridge", "list")`.
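The ATE computations described in these notes can be sketched in base R with simulated draws. The arrays below are hypothetical stand-ins for `cate_draws`, and the equal-tailed quantile probabilities derived from `level` are an assumption of this sketch.

```r
set.seed(7)
S <- 500; n <- 8; level <- 0.9
alpha <- 1 - level
probs <- c(alpha / 2, 1 - alpha / 2)

# Scalar bridge: S x n matrix of simulated CATE draws.
cate_draws   <- matrix(rnorm(S * n, mean = 0.3, sd = 0.1), nrow = S, ncol = n)
ate          <- mean(cate_draws)       # grand mean over draws and observations
ate_per_draw <- rowMeans(cate_draws)   # length-S vector of per-draw means
ate_ci       <- stats::quantile(ate_per_draw, probs, names = FALSE)

# Array bridge: S x n x K array; one ATE and one interval per slot.
K <- 2
cate_arr <- array(rnorm(S * n * K), dim = c(S, n, K))
ate_vec  <- vapply(seq_len(K), function(k) mean(cate_arr[, , k]), numeric(1L))
ate_mat  <- t(vapply(seq_len(K), function(k) {
  per_draw <- apply(cate_arr[, , k, drop = FALSE], 1L, mean)
  stats::quantile(per_draw, probs, names = FALSE)
}, numeric(2L)))
colnames(ate_mat) <- c("lower", "upper")
```

Averaging within each draw before taking quantiles is what makes the interval describe uncertainty in the average effect rather than the spread of individual effects.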
Purpose
S3 print method for summary.gdpar_causal_bridge objects. Displays bridge metadata, the marginal ATE table, and the first 10 rows of the per-observation CATE table.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `summary.gdpar_causal_bridge` | The summary object to print. |
| `...` | (unused) | Present for S3 generic compatibility; ignored. |
Returns
Invisibly returns `x`.
Notes
- Builds an `ate_df` data frame with columns `slot`, `mean`, `lower`, `upper` from `x$ate` (coerced to an unnamed vector) and `x$ate_ci`. This data frame is printed with `row.names = FALSE`.
- The per-observation CATE table is truncated to the first 10 rows via `utils::head(x$table, 10L)`.
- Side effect: writes to the console via `cat` and `print`.
predict.gdpar_causal_bridge(object, newdata, level = NULL, summary = c("all", "draws", "mean_ci"), ...)
Purpose
S3 predict method for gdpar_causal_bridge objects. Recomputes the per-observation Conditional Average Treatment Effect (CATE) on a new evaluation grid by leveraging the treatment and control model fits already stored in the bridge. Structural compatibility of the two fits is not re-checked.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_causal_bridge` | The bridge whose fits are used for prediction. |
| `newdata` | `data.frame` | New covariate grid on which to evaluate the CATE. Required. |
| `level` | numeric scalar in $(0, 1)$, or `NULL` | Credible level for the new credible intervals. If `NULL` (default), uses `object$level`. |
| `summary` | character scalar, one of `"all"`, `"draws"`, `"mean_ci"` | Controls the structure of the returned value (see Returns). |
| `...` | (unused) | Present for S3 generic compatibility; ignored. |
Mathematics
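Let $\hat{\tau}_{\text{treat}}^{(s)}(x_i)$ and $\hat{\tau}_{\text{ctrl}}^{(s)}(x_i)$ denote the posterior draws for observation $i$ and draw $s = 1, \dots, S$ from the treatment and control fits, on the scale selected by `object$type`. The CATE draws are $$\tau^{(s)}(x_i) = \hat{\tau}_{\text{treat}}^{(s)}(x_i) - \hat{\tau}_{\text{ctrl}}^{(s)}(x_i).$$
The summary statistics are $$\bar{\tau}(x_i) = \frac{1}{S}\sum_{s=1}^{S} \tau^{(s)}(x_i), \qquad \text{CI}(x_i) = \bigl[Q_{q_l}\{\tau^{(s)}(x_i)\},\; Q_{q_u}\{\tau^{(s)}(x_i)\}\bigr],$$
where $Q_q$ is the empirical $q$-quantile over draws, $q_l = (1 - \texttt{level})/2$, and $q_u = 1 - q_l$.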
Returns
| `summary` value | Return structure |
|---|---|
| `"all"` | A list with components `cate_draws` (array/matrix of posterior CATE draws), `cate_mean` (numeric vector/matrix of posterior means), `cate_ci` (array/matrix of credible bounds), `n_draws` (integer scalar), `n_obs` (integer scalar). |
| `"draws"` | The raw `cate_draws` object (matrix or array), returned early. |
| `"mean_ci"` | A list with components `cate_mean` and `cate_ci`. |
Notes
- Validates `object` via `assert_inherits` and `newdata` via `assert_data_frame`.
- `level` is validated by `assert_numeric_scalar` when non-`NULL`, with bounds `(1e-3, 1 − 1e-3)` exclusive.
- `summary` is matched via `match.arg`; partial matching is allowed by R convention.
- The `type` slot is extracted from `object$type` and forwarded to both `stats::predict` calls with `summary = "draws"`. This means both the treatment and control predictions return raw posterior draws.
- Draw alignment is performed by the internal helper `.align_bridge_draws(pred_t, pred_c)`, which returns a list with aligned `$treat` and `$ctrl` arrays and the common draw count `$S`.
- CATE is computed as the element-wise difference `aligned$treat - aligned$ctrl`.
- Summarisation (mean and CI) is delegated to the internal helper `.summarize_cate(cate_draws, ql, qu)`.
- The return value for `summary = "all"` mirrors the `cate_*` slot structure of a freshly constructed `gdpar_causal_bridge`, enabling downstream methods (e.g. `summary`, `print`) to operate on the result if it is wrapped appropriately.
gdpar_causal_bridge(fit_treat, fit_ctrl, newdata = NULL, type = c("response", "theta_i", "linear_predictor"), level = 0.95, ...)
Purpose
Exported constructor for the T-learner causal bridge between two independent gdpar_fit objects. It estimates the conditional average treatment effect (CATE) as the per-observation, per-draw difference of the posterior predictive distributions from the treatment-arm fit and the control-arm fit, evaluated on a common evaluation set. The function does not modify either input fit and performs no causal adjustment beyond what is encoded in the two AMM specifications; the no-unmeasured-confounding assumption within each arm is the user's responsibility.
Arguments
- `fit_treat` (`gdpar_fit`): A fit object produced by `gdpar()` for the treatment arm. Must be a Path 1 (`path = "bayes"`) Bayesian fit.
- `fit_ctrl` (`gdpar_fit`): A fit object produced by `gdpar()` for the control arm. Must share the family, anchor, AMM level, and covariate structure of `fit_treat`.
- `newdata` (`data.frame` or `NULL`): Optional evaluation data frame. When `NULL` (default), the function attempts to recover each arm's training data by evaluating the captured `data` argument of each fit's call in the caller's environment; if both recoveries succeed and the two data frames share column structure, their `rbind` is used. Otherwise the function aborts and requests an explicit `newdata`.
- `type` (character): Scalar selecting the prediction scale. Matched via `match.arg` against `c("response", "theta_i", "linear_predictor")`. `"response"` applies the inverse link per draw; `"theta_i"` and `"linear_predictor"` are synonyms selecting the linear predictor of the individual parameter.
- `level` (numeric): Scalar in $(0, 1)$ giving the nominal credible level for per-observation CATE intervals. Defaults to `0.95`. Validated to lie in $[10^{-3},\, 1 - 10^{-3}]$.
- `...`: Reserved for future arguments; currently unused.
Mathematics
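For each posterior draw $s = 1, \dots, S$ and observation $i = 1, \dots, n$, the CATE draw is $$\tau^{(s)}(x_i) = \mu_{\text{treat}}^{(s)}(x_i) - \mu_{\text{ctrl}}^{(s)}(x_i),$$
where $\mu_{\text{arm}}^{(s)}(x_i)$ is the arm's draw-$s$ prediction on the scale selected by `type` at $x_i$.
Because the two fits are independent (sampled from disjoint data subsets), the joint posterior factorizes: $$p(\vartheta_{\text{treat}}, \vartheta_{\text{ctrl}} \mid \mathcal{D}_{\text{treat}}, \mathcal{D}_{\text{ctrl}}) = p(\vartheta_{\text{treat}} \mid \mathcal{D}_{\text{treat}})\, p(\vartheta_{\text{ctrl}} \mid \mathcal{D}_{\text{ctrl}}),$$
so any pairing of marginal draws is a valid sample from the joint. When the two fits differ in number of draws, the function trims to $$S = \min(S_{\text{treat}}, S_{\text{ctrl}})$$
and emits a `gdpar_diagnostic_warning`.
For multivariate ($p > 1$) or K-individual ($K > 1$) fits, `predict.gdpar_fit` returns a 3-array of shape $[S, n, \text{dim}]$, and the same element-wise difference yields a 3-array `cate_draws`. For `type = "response"`, the canonical inverse link of each coordinate or slot is applied by `predict.gdpar_fit` before differencing.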
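The differencing and trimming logic can be illustrated with simulated prediction matrices. This is a sketch under assumptions: the matrices stand in for the draw-level output of the two `predict` calls, and keeping the first `S` draws of each arm is a choice made here for illustration, since the text does not say which draws the package retains.

```r
set.seed(3)
n <- 5
pred_t <- matrix(rnorm(1000 * n, mean = 2), nrow = 1000)  # treatment arm, 1000 draws
pred_c <- matrix(rnorm(800 * n,  mean = 1), nrow = 800)   # control arm, 800 draws

# Trim both arms to the common number of draws and record a notification.
S <- min(nrow(pred_t), nrow(pred_c))
warnings <- character(0L)
if (nrow(pred_t) != nrow(pred_c)) {
  warnings <- sprintf("Draw counts differ; trimmed both arms to %d draws.", S)
}

# Per-draw, per-observation CATE, then posterior mean and 95% interval.
cate_draws <- pred_t[seq_len(S), , drop = FALSE] - pred_c[seq_len(S), , drop = FALSE]
cate_mean  <- colMeans(cate_draws)
cate_ci    <- apply(cate_draws, 2L, stats::quantile,
                    probs = c(0.025, 0.975), names = FALSE)
```

Because the arms are fitted independently, pairing draw `s` of one arm with draw `s` of the other is as valid as any other pairing, so no matching step is needed before subtracting.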
Returns
An object of class c("gdpar_causal_bridge", "list") with components:
- `cate_draws`: `matrix` $[S, n]$ (scalar fits) or `array` $[S, n, \text{dim}]$ (multivariate / K-individual fits): per-draw CATE values.
- `cate_mean`: posterior mean of the CATE per observation (and per dimension when applicable).
- `cate_ci`: credible interval bounds at the requested `level`.
- `newdata`: the resolved evaluation data frame.
- `id_check`: `list(treat = fit_treat$identifiability_report, ctrl = fit_ctrl$identifiability_report)`.
- `fits`: `list(treat = fit_treat, ctrl = fit_ctrl)`.
- `type`: the matched `type` string.
- `level`: the numeric credible level.
- `n_draws`: `S`, the (possibly trimmed) number of aligned draws.
- `n_obs`: `nrow(newdata_resolved)`.
- `call`: the matched call (`match.call()`).
- `warnings`: `character` vector of fallback notifications (e.g., posterior-draw trimming); empty on the happy path.
- `meta`: `list(dim_kind, dim_size, dim_names, newdata_source)` where `newdata_source` is the `"bridge_source"` attribute of the resolved `newdata`.
Notes
- Calls six internal validators in sequence: `.check_bridge_path`, `.check_bridge_hierarchical`, `.check_bridge_family`, `.check_bridge_dim`, `.check_bridge_amm`, `.check_bridge_anchor`. Only the first three are defined in this section; the remaining three, along with `.resolve_bridge_newdata`, `.align_bridge_draws`, and `.summarize_cate`, are defined in subsequent sections.
- Aborts with condition class `gdpar_unsupported_feature_error` (via `gdpar_abort`) when structural compatibility checks fail.
- The `assert_inherits` calls validate that both `fit_treat` and `fit_ctrl` inherit from `"gdpar_fit"` before any other logic executes.
- `assert_numeric_scalar` enforces `level` within $[10^{-3},\; 1 - 10^{-3}]$.
- The `newdata` resolution uses `parent.frame()` as the evaluation environment for recovering captured `data` arguments.
- Predictions are obtained via `stats::predict(fit, newdata = ..., type = type, summary = "draws")`, dispatching to `predict.gdpar_fit`.
- The `warnings` field is set to `character(0L)` when `aligned$warning` is `NA`, otherwise to the warning string.
- (C7) anti-aliasing of Block 6.5 is not invoked because the hierarchical guard rules out the regime in which it applies.
- Companion S3 methods `print.gdpar_causal_bridge` and `summary.gdpar_causal_bridge` are documented elsewhere.
Purpose
Internal validator asserting that both fits were produced under Path 1 (path = "bayes"). T-learner support for Paths 2/3 is not implemented.
Arguments
- `fit_treat` (`gdpar_fit`): Treatment-arm fit.
- `fit_ctrl` (`gdpar_fit`): Control-arm fit.
Mathematics
Each fit's `path` slot is read with a fallback default of `"bayes"`: $\text{path}_{\text{arm}}$ is `fit_arm$path %||% "bayes"` for each arm.
The check passes if and only if $\text{path}_{\text{treat}} \equiv \text{"bayes"}$ and $\text{path}_{\text{ctrl}} \equiv \text{"bayes"}$ (via `identical`).
Returns
`invisible(NULL)` on success.
Notes
- On failure, calls `gdpar_abort` with `class = "gdpar_unsupported_feature_error"` and a `data` list containing `path_treat` and `path_ctrl`.
- The `%||%` operator provides the default `"bayes"` when `fit$path` is `NULL`, meaning a fit lacking an explicit `path` slot is treated as Path 1.
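A minimal sketch of this guard in base R. The names `check_path` and `abort_unsupported` are hypothetical, the path value `"eb"` is made up for the failing case, and the classed condition raised with `stop()` merely stands in for `gdpar_abort`, whose implementation is not shown in this section.

```r
# Null-coalescing operator, defined locally so the sketch is self-contained.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-in for gdpar_abort(): raise a classed error with a payload.
abort_unsupported <- function(message, data = list()) {
  stop(structure(
    list(message = message, call = NULL, data = data),
    class = c("gdpar_unsupported_feature_error", "error", "condition")
  ))
}

check_path <- function(fit_treat, fit_ctrl) {
  path_treat <- fit_treat$path %||% "bayes"   # missing slot is treated as Path 1
  path_ctrl  <- fit_ctrl$path  %||% "bayes"
  if (!identical(path_treat, "bayes") || !identical(path_ctrl, "bayes")) {
    abort_unsupported(
      "Both fits must be Path 1 (path = \"bayes\").",
      data = list(path_treat = path_treat, path_ctrl = path_ctrl)
    )
  }
  invisible(NULL)
}

ok  <- check_path(list(), list(path = "bayes"))
err <- tryCatch(check_path(list(path = "eb"), list()), error = function(e) e)
```

Callers can then handle the failure by class, for example with `tryCatch(..., gdpar_unsupported_feature_error = function(e) e$data)`.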
Purpose
Internal validator asserting that neither fit was sampled in the hierarchical (grouped) regime. The T-learner difference of per-group anchors is not defined in the canonical formulation and is queued for a future sub-phase.
Arguments
- `fit_treat` (`gdpar_fit`): Treatment-arm fit.
- `fit_ctrl` (`gdpar_fit`): Control-arm fit.
Mathematics
A fit is classified as grouped when `fit$stan_data$use_groups` is non-`NULL` and `as.integer(fit$stan_data$use_groups)` equals `1L`.
The check passes if and only if $\neg\,\text{is\_grouped}(\text{fit}_{\text{treat}}) \;\wedge\; \neg\,\text{is\_grouped}(\text{fit}_{\text{ctrl}})$.
Returns
`invisible(NULL)` on success.
Notes
- Defines a local closure `is_grouped(fit)` that tests `fit$stan_data$use_groups` for non-NULL and integer equality to `1L` (via `as.integer(...)`).
- On failure, calls `gdpar_abort` with `class = "gdpar_unsupported_feature_error"` and a `data` list containing `treat_grouped` and `ctrl_grouped` (logical scalars).
- Uses the same condition class as `gdpar_bvm_check` so user code handling unsupported-feature errors covers both helpers.
- The error message advises refitting each arm without the `group` argument.
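The grouped-fit test is small enough to sketch directly. The plain lists below are hypothetical stand-ins for `gdpar_fit` objects and carry only the slot the test reads.

```r
# TRUE only when use_groups is present and coerces to integer 1.
is_grouped <- function(fit) {
  ug <- fit$stan_data$use_groups
  !is.null(ug) && as.integer(ug) == 1L
}

fit_flat    <- list(stan_data = list(use_groups = 0L))
fit_grouped <- list(stan_data = list(use_groups = 1))   # numeric 1 also counts
fit_bare    <- list()                                   # no stan_data at all

passes <- !is_grouped(fit_flat) && !is_grouped(fit_bare)
```

The `!is.null()` test short-circuits before `as.integer()` is evaluated, so a fit without `stan_data` is classified as ungrouped rather than raising an error.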
Purpose
Internal validator asserting that the two fits share compatible family identifiers—both the top-level family name and link, and, when param_specs are present (multivariate / K-individual families), the per-slot family identifiers.
Arguments
- `fit_treat` (`gdpar_fit`): Treatment-arm fit.
- `fit_ctrl` (`gdpar_fit`): Control-arm fit.
Mathematics
Let $\text{fam}_{\text{arm}}$ denote `fit_arm$family`. The top-level check requires $$\text{name}(\text{fam}_{\text{treat}}) \equiv \text{name}(\text{fam}_{\text{ctrl}}) \quad\text{and}\quad \text{link}(\text{fam}_{\text{treat}}) \equiv \text{link}(\text{fam}_{\text{ctrl}}).$$
When `param_specs` is non-NULL on either fit, let $n_{\text{arm}}$ be the length of `fam_arm$param_specs` (defaulting to an empty list via `%||%`). The slot-count check requires $n_{\text{treat}} = n_{\text{ctrl}}$. The per-slot identifier vector for each arm is
$$ \text{names}_{\text{arm}}[k] = \text{as.character}\bigl(\text{ps}_{\text{arm}}[[k]]\text{\$family\_id} \mathbin{\%||\%} \text{ps}_{\text{arm}}[[k]]\text{\$name} \mathbin{\%||\%} \text{NA\_character\_}\bigr), \quad k = 1, \dots, n_{\text{arm}}, $$
and the slot-identifier check requires $\text{names}_{\text{treat}} \equiv \text{names}_{\text{ctrl}}$ (via `identical`).
Returns
`invisible(NULL)` on success.
Notes
- Performs three sequential aborts, all with `class = "gdpar_unsupported_feature_error"`:
  - Top-level family name or link mismatch: `data` contains `family_treat`, `family_ctrl`, `link_treat`, `link_ctrl`.
  - Slot count mismatch: `data` contains `n_slots_treat`, `n_slots_ctrl`.
  - Per-slot family identifier mismatch: `data` contains `slot_families_treat`, `slot_families_ctrl` (character vectors).
- The `param_specs` branch is entered when `!is.null(ps_t) || !is.null(ps_c)`, meaning if either fit has `param_specs`, both are expected to have them and be compared.
- Per-slot identifier extraction uses the chain `s$family_id %||% s$name %||% NA_character_`, so a slot lacking both `family_id` and `name` yields `NA_character_`.
- `vapply` with `character(1L)` is used for type-safe extraction of per-slot identifiers.
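The per-slot identifier extraction can be sketched as follows. The helper name `slot_ids` and the example slot lists are hypothetical; only the `%||%` chain and the `vapply` template come from the description above.

```r
# Null-coalescing operator, defined locally so the sketch is self-contained.
`%||%` <- function(x, y) if (is.null(x)) y else x

# One identifier per slot: family_id if present, else name, else NA.
slot_ids <- function(param_specs) {
  vapply(
    param_specs %||% list(),
    function(s) as.character(s$family_id %||% s$name %||% NA_character_),
    character(1L)
  )
}

ps_treat <- list(list(family_id = "gaussian"), list(name = "gamma"), list())
ps_ctrl  <- list(list(family_id = "gaussian"), list(name = "gamma"), list())

same_count <- length(ps_treat) == length(ps_ctrl)
same_slots <- identical(slot_ids(ps_treat), slot_ids(ps_ctrl))
```

Using `vapply` with a `character(1L)` template guarantees a character vector even for an empty slot list, which keeps the `identical()` comparison well defined.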
Purpose
Guard function for `gdpar_causal_bridge` that verifies the two fitted model objects share identical structural dimensions $K$ and $p$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_treat` | list (fitted model object) | The treatment-arm fit; must contain (or default) elements `K` and `p`. |
| `fit_ctrl` | list (fitted model object) | The control-arm fit; must contain (or default) elements `K` and `p`. |
Mathematics
The check enforces the equality constraints $$K_{\text{treat}} = K_{\text{ctrl}}, \qquad p_{\text{treat}} = p_{\text{ctrl}}.$$
Each dimension is recovered with a null-coalescing fallback to `1L`: $K_{\text{arm}}$ is `fit_arm$K %||% 1L`,
and analogously for $p_{\text{arm}}$.
Returns
`invisible(NULL)` when the dimensions match. Otherwise it never returns: it calls `gdpar_abort` with class `"gdpar_unsupported_feature_error"` and a `data` payload listing `K_treat`, `K_ctrl`, `p_treat`, `p_ctrl`.
Notes
- Uses the `%||%` operator, so a missing `K` or `p` slot silently defaults to `1L` rather than erroring.
- The error condition is tagged `"gdpar_unsupported_feature_error"`, signalling that mismatched dimensions are a feature limitation rather than a user input typo.
Purpose
Asserts that the two fits are compatible at the AMM (Additive Modulating Model) specification level. Three facets are compared: (1) the AMM level (structural composition of the spec slots a/b/W), (2) the modulating basis type (polynomial vs. B-spline), and (3) the covariate column structure of the AMM design. This guarantees that the predict path on newdata reuses the same algorithm on both arms.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_treat` | list (fitted model object) | Treatment-arm fit; expected to carry an `amm` element or a fallback `amm_list_canonical`. |
| `fit_ctrl` | list (fitted model object) | Control-arm fit; same expectation. |
Mathematics
Let $\text{amm}_{\text{treat}}$ and $\text{amm}_{\text{ctrl}}$ denote the AMM specifications of the two fits. The check requires that the level, the $W$ type, and the covariate structure returned by the three helpers are `identical()` across arms.
The AMM spec is resolved via null-coalescing: each arm uses `fit_arm$amm %||% fit_arm$amm_list_canonical`.
Returns
`invisible(NULL)` on success. On any mismatch, calls `gdpar_abort`:
- Missing AMM on either fit → class `"gdpar_internal_error"`.
- Level mismatch → class `"gdpar_unsupported_feature_error"`, `data = list(level_treat, level_ctrl)`.
- W-type mismatch → class `"gdpar_unsupported_feature_error"`, `data = list(W_type_treat, W_type_ctrl)`. The formatted message substitutes `"<none>"` for a `NULL` type.
- Covariate mismatch → class `"gdpar_unsupported_feature_error"`. The message lists the symmetric set difference of covariate component names, $(V_{\text{treat}} \setminus V_{\text{ctrl}}) \cup (V_{\text{ctrl}} \setminus V_{\text{treat}})$, where $V_{\text{arm}}$ is the set of names on each arm.
Notes
- Dispatches to three helper functions: `.bridge_amm_level`, `.bridge_amm_W_type`, `.bridge_amm_covariates`.
- The covariate comparison uses `identical()`, so ordering of list elements matters for the check to pass.
- A missing AMM on either fit is treated as an internal error (not a user error), reflecting the expectation that fits should always carry an AMM spec.
Purpose
Asserts that the reference anchors stored in the two fits are numerically identical (within tolerance). The anchor enters the modulating term of the AMM.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_treat` | list (fitted model object) | Treatment-arm fit; must contain an `anchor` element (numeric vector). |
| `fit_ctrl` | list (fitted model object) | Control-arm fit; must contain an `anchor` element (numeric vector). |
Mathematics
Given anchor vectors $a_t$ and $a_c$, the function first requires that they have equal length.
Then it computes element-wise absolute differences $\delta_j = |a_{t,j} - a_{c,j}|$ and a per-element scale $s_j$ (built with `pmax`),
and requires $\delta_j \le 10^{-8}\, s_j$ for every $j$.
This combines relative tolerance (when the anchor magnitudes are large) with absolute tolerance (when they are small).
Returns
`invisible(NULL)` when anchors match. Otherwise calls `gdpar_abort`:
- Length mismatch → class `"gdpar_unsupported_feature_error"`, `data = list(anchor_treat_length, anchor_ctrl_length)`.
- Value mismatch → class `"gdpar_unsupported_feature_error"`, `data = list(anchor_treat, anchor_ctrl)`. The message formats anchor values with `digits = 6` and instructs the user to refit one arm anchored to the other's value.
Notes
- The tolerance is fixed at `1e-8` and is not configurable.
- `pmax` is used (element-wise parallel max), so the scale vector has the same length as the anchors.
- If either `anchor` slot is `NULL`, the subtraction `a_t - a_c` will error in base R before the tolerance check; this is not explicitly guarded.
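A combined relative/absolute tolerance test of this kind can be sketched as below. The exact form of the scale vector is not given in the text, so `pmax(abs(a_t), abs(a_c), 1)` is an assumption of this sketch, and the function name `anchors_match` is hypothetical.

```r
anchors_match <- function(a_t, a_c, tol = 1e-8) {
  if (length(a_t) != length(a_c)) return(FALSE)   # length mismatch fails outright
  delta <- abs(a_t - a_c)
  # Assumed scale: relative to the larger magnitude, but never below 1,
  # so the test degrades to an absolute tolerance near zero.
  scale <- pmax(abs(a_t), abs(a_c), 1)
  all(delta <= tol * scale)
}

large_ok  <- anchors_match(c(1e6, 0), c(1e6 + 1e-3, 0))  # relative regime
small_bad <- anchors_match(0.5, 0.5 + 1e-6)              # absolute regime
len_bad   <- anchors_match(c(1, 2), 1)
```

Flooring the scale at 1 avoids the usual failure of purely relative tolerances, where two anchors that are both essentially zero can never be judged equal.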
Purpose
Infers the AMM level from an amm_spec object or a list of amm_spec objects. The level encodes the structural composition of the spec (presence/absence of a, b, W components).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm` | `amm_spec` or list of `amm_spec` | The AMM specification (single, or per-slot when $K > 1$). |
Returns
- If `amm` inherits from `"amm_spec"`: `as.integer(amm$level)` (scalar integer).
- If `amm` is a list: an integer vector of the same length, where each element is `as.integer(a$level)` if the element inherits from `"amm_spec"`, otherwise `NA_integer_`.
- Otherwise: `NA_integer_`.
Notes
- No error is raised for non-`amm_spec` inputs; the function returns `NA_integer_` silently.
- The return type is always integer (via `as.integer` and `NA_integer_`), making the output suitable for `identical()` comparison in `.check_bridge_amm`.
Purpose
Extracts the modulating basis type (e.g., "polynomial" or "bspline") from the W sub-component of an AMM spec. This ensures the predict path on new data reuses the same basis expansion algorithm on both arms.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm` | `amm_spec` or list of `amm_spec` | The AMM specification. |
Returns
- If `amm` inherits from `"amm_spec"`:
  - `NULL` when `amm$W` is `NULL` (no modulating term).
  - `as.character(amm$W$type)` otherwise.
- If `amm` is a list:
  - A character vector of length `length(amm)`, where each element is `as.character(a$W$type)` if the element is an `amm_spec` with a non-null `W`, otherwise `NA_character_`.
  - If all elements are `NA`, returns `NULL`.
  - Otherwise returns the full character vector (including `NA` entries).
- Otherwise: `NULL`.
Notes
- The "all NA → NULL" collapse means a list of specs with no `W` on any slot is indistinguishable from a single spec with no `W`.
- When some slots have `W` and others do not, the returned vector contains `NA_character_` entries, which will cause `.check_bridge_amm` to fail the `identical()` test if the two arms differ in which slots lack `W`.
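The return rules above, including the "all NA → NULL" collapse, can be sketched as follows. The function name `w_type_of` and the toy specs are hypothetical; the toy objects carry only a `W` slot and the `"amm_spec"` class, not the full structure of a real spec.

```r
w_type_of <- function(amm) {
  # Single spec: checked first, because an amm_spec is itself a list.
  if (inherits(amm, "amm_spec")) {
    if (is.null(amm$W)) return(NULL)
    return(as.character(amm$W$type))
  }
  if (is.list(amm)) {
    types <- vapply(amm, function(a) {
      if (inherits(a, "amm_spec") && !is.null(a$W)) as.character(a$W$type)
      else NA_character_
    }, character(1L))
    if (all(is.na(types))) return(NULL)   # collapse: no slot has a W
    return(types)
  }
  NULL
}

spec_w <- structure(list(W = list(type = "bspline")), class = "amm_spec")
spec_0 <- structure(list(W = NULL), class = "amm_spec")

mixed <- w_type_of(list(spec_w, spec_0))
```

Testing `inherits()` before `is.list()` matters here: reversing the order would make the function iterate over the slots of a single spec.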
Purpose
Extracts the covariate names for each AMM component (a, b, and x_vars) from an AMM spec. This is used to verify that both arms share the same covariate column structure in their AMM designs.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm` | `amm_spec` or list of `amm_spec` | The AMM specification. |
Mathematics
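For a single `amm_spec` object `amm`, the extracted names are

$$\text{a_vars} = \texttt{all.vars}(\text{amm}.a), \qquad \text{b_vars} = \texttt{all.vars}(\text{amm}.b), \qquad \text{x_vars} = \text{amm}.\text{x_vars},$$

where `all.vars()` parses an R formula/expression and returns the variable names referenced in it. Each component defaults to `character(0L)` when the corresponding slot is `NULL`.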
Returns
- If `amm` inherits from `"amm_spec"`: a named list with elements `a_vars`, `b_vars`, `x_vars` (each a character vector).
- If `amm` is a list: a list of such named lists (one per element of `amm`).
- Otherwise: an empty `list()`.
Notes
- `all.vars()` is applied to `a$a` and `a$b`, which are expected to be formula-like objects (formulas, calls, or expressions). If they are character strings, `all.vars()` will return `character(0)`.
- The `x_vars` slot is taken as-is (not parsed), so it must already be a character vector.
- The returned structure is compared with `identical()` in `.check_bridge_amm`, so the order of `x_vars` and the order of list elements matter.
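The behaviour of `all.vars()` on formulas, `NULL`, and character strings can be checked directly. The helper `amm_vars_sketch` and the covariate names below are hypothetical; the sketch only assumes the three slots `a`, `b`, and `x_vars` named in this section.

```r
# Hypothetical extractor for one spec; mirrors the description above.
amm_vars_sketch <- function(a) {
  x_vars <- a$x_vars
  if (is.null(x_vars)) x_vars <- character(0L)
  list(
    a_vars = all.vars(a$a),  # parsed from a formula-like object
    b_vars = all.vars(a$b),  # all.vars(NULL) is character(0)
    x_vars = x_vars          # taken as-is, not parsed
  )
}

# Toy spec with made-up covariate names.
spec <- structure(
  list(a = ~ age + log(income), b = NULL, x_vars = c("z1", "z2")),
  class = "amm_spec"
)
vars <- amm_vars_sketch(spec)

# A character string is not parsed, so no variable names are found.
from_string <- all.vars("age")

# Order matters under identical(): same names, different order.
same_order <- identical(c("z1", "z2"), c("z2", "z1"))
```

Function names such as `log` are not reported by `all.vars()`, so `a_vars` contains only `age` and `income`.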
Purpose
Resolves the evaluation grid (newdata) on which the CATE will be computed. When the user supplies newdata, it is used directly. When newdata is NULL, the function attempts to recover both arms' training data by evaluating the captured data argument from each fit's call, then rbinds them into a single data frame.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_treat` | list (fitted model object) | Treatment-arm fit; its `$call$data` is evaluated if `newdata` is `NULL`. |
| `fit_ctrl` | list (fitted model object) | Control-arm fit; same recovery logic. |
| `newdata` | `data.frame` or `NULL` | User-supplied evaluation grid. If non-`NULL`, it is validated and returned. |
| `eval_env` | environment | The environment in which to evaluate `fit$call$data` when recovering training data. |
Returns
A data.frame with an attribute "bridge_source":
- `"user"`: when `newdata` was supplied by the user (passed through after `assert_data_frame` validation).
- `"training_rbind"`: when `newdata` was `NULL` and both arms' training data were successfully recovered and `rbind`-ed.
The rbind uses the column order of the treatment arm (data_t) for both arms, ensuring consistent column alignment.
Notes
- Recovery logic: The inner function `recover(fit, arm_label)` extracts `fit$call$data` and evaluates it in `eval_env` inside a `tryCatch`. Any evaluation error silently yields `NULL`. The `arm_label` argument is accepted but never used in the body.
- Column matching: Column names are compared after sorting (`sort(colnames(...))`), so column order may differ between arms but the set of columns must be identical. The `rbind` itself uses `data_t`'s column order for both arms, so if `data_c` has its columns in a different order, they are reordered to match `data_t` before binding.
- Error conditions (all via `gdpar_abort`):
  - Recovery failure (either arm's data is `NULL` or not a data frame) → class `"gdpar_input_error"`, `data = list(treat_recovered, ctrl_recovered)`.
  - Column structure mismatch → class `"gdpar_input_error"`, `data = list(colnames_treat, colnames_ctrl)`.
- `assert_data_frame(newdata, "newdata")` is called on user-supplied `newdata`; this is presumably a checkmate-style assertion (not defined in this section).
- The `rbind` is performed with `drop = FALSE` subsetting, preserving data frame structure even for single-column data.
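The column-matching and binding rules can be illustrated on two toy data frames. This is a sketch under stated assumptions: `bind_arms_sketch` is a hypothetical name, and a plain `stop()` stands in for the package's `gdpar_abort`.

```r
# Hypothetical sketch of the training-data fallback; stop() replaces gdpar_abort.
bind_arms_sketch <- function(data_t, data_c) {
  cols_t <- colnames(data_t)
  # Same set of columns required; order may differ between arms.
  if (!identical(sort(cols_t), sort(colnames(data_c)))) {
    stop("column structure mismatch between arms")
  }
  # Both arms are subset in the treatment arm's column order before binding.
  out <- rbind(
    data_t[, cols_t, drop = FALSE],
    data_c[, cols_t, drop = FALSE]
  )
  attr(out, "bridge_source") <- "training_rbind"
  out
}

data_t <- data.frame(x = 1:2, y = c(0.5, 1.5))
data_c <- data.frame(y = c(2.5, 3.5), x = 3:4)  # same columns, other order
grid <- bind_arms_sketch(data_t, data_c)
```

The resulting grid has four rows with columns in the order `x`, `y`, regardless of how the control arm stored them.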
Purpose
Aligns the posterior draw arrays of the two arms by trimming the longer set to match the shorter. This is necessary because the two fits may have been run with different numbers of posterior draws ($S_t \neq S_c$).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `pred_t` | matrix `[S_t, n]` or array `[S_t, n, dim]` | Posterior draws from the treatment arm. |
| `pred_c` | matrix `[S_c, n]` or array `[S_c, n, dim]` | Posterior draws from the control arm. |
Mathematics
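The aligned draw count is:

$$S = \min(S_t, S_c)$$

When $S_t \neq S_c$, the longer input is trimmed to its first $S$ draws along the first axis: rows $1, \dots, S$ for a matrix, and for an array of dimension `[S, n, dim]` the first $S$ indices of the first axis with all remaining axes kept whole.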
Returns
A named list with four components:
| Component | Type | Description |
|---|---|---|
| `treat` | matrix or array | Trimmed treatment draws, first axis of length $S$. |
| `ctrl` | matrix or array | Trimmed control draws, first axis of length $S$. |
| `S` | integer | The aligned draw count $S = \min(S_t, S_c)$. |
| `warning` | character scalar | The trimming notification message when $S_t \neq S_c$; `NA_character_` otherwise. |
Notes
- Draw count inference: Each arm's draw count is inferred from `nrow()` (for matrices) or `dim()[1L]` (for arrays). If `pred_t` is a matrix and `pred_c` is an array (or vice versa), the function does not explicitly check for type consistency; it will still compute $S_t$ and $S_c$, but the trimming logic branches on `is.matrix()` per input.
- Warning emission: When trimming occurs, `gdpar_warn` is called with class `"gdpar_diagnostic_warning"` and `data = list(S_treat, S_ctrl, S)`. The same message string is stored in the `warning` field for persistence.
- Trimming implementation: The inner function `trim_first_axis(arr, S)` handles both matrices (simple row indexing with `drop = FALSE`) and arrays (constructs an index list with `seq_len(S)` for the first axis and `quote(expr =)`, i.e. an empty subscript, for all remaining axes, then calls ``do.call(`[`, ...)`` with `drop = FALSE`). The use of `quote(expr =)` produces a missing/empty argument in the subscript list, which selects all elements along that dimension.
- Persistence: The `warning` field is designed to be persisted by the constructor into the `$warnings` slot of the resulting `gdpar_causal_bridge` object, so the print method can surface the fallback notification (referenced as "D48 canonical norm; D50 of Sesion 18 Etapa 2 of Sesion 8.4" in the source comments).
- No error is raised if the two inputs have different shapes beyond the first axis; the function only aligns the first (draw) axis.
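The empty-subscript trick described in the trimming note can be sketched as follows. The body is an illustrative reconstruction from the description, not a copy of the package's inner function, and the toy arrays are made up.

```r
# Illustrative reconstruction of the first-axis trim described above.
trim_first_axis <- function(arr, S) {
  if (is.matrix(arr)) {
    return(arr[seq_len(S), , drop = FALSE])
  }
  n_rest <- length(dim(arr)) - 1L
  # quote(expr = ) yields the empty symbol, which `[` treats as "take everything".
  args <- c(
    list(arr, seq_len(S)),
    rep(list(quote(expr = )), n_rest),
    list(drop = FALSE)
  )
  do.call(`[`, args)
}

# Toy draws: 4 treatment draws vs 2 control draws, 3 units, 2 slots.
pred_t <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))
pred_c <- array(seq_len(2 * 3 * 2), dim = c(2, 3, 2))
S <- min(dim(pred_t)[1L], dim(pred_c)[1L])
trimmed_t <- trim_first_axis(pred_t, S)

# Matrix inputs take the simpler row-indexing branch.
trimmed_m <- trim_first_axis(matrix(1:12, nrow = 4), 3L)
```

Building the subscript list programmatically is what lets one helper serve arrays of any rank without hard-coding the number of commas.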
Purpose
Internal helper that converts raw posterior draws of a Conditional Average Treatment Effect (CATE) quantity into a compact summary consisting of the posterior mean and a lower/upper credible-interval bound per unit (and, in the multi-component case, per component slot). It is the canonical summarizer used downstream by the causal-bridge machinery to produce user-facing CATE summaries; it also tags the output with metadata (dim_kind, dim_size, dim_names) describing the structure of the summarized quantity so that downstream printers/extractors can dispatch correctly.
Arguments
- `cate_draws`: `numeric` posterior draws of the CATE. Two shapes are supported:
  - a `matrix` with rows indexing posterior draws ($S$) and columns indexing units ($n$), i.e. `dim = c(S, n)`;
  - a 3-dimensional `array` with `dim = c(S, n, K)`, where the first dimension indexes draws, the second indexes units, and the third indexes component "slots" (e.g. multiple treatment arms, multiple outcome dimensions, or per-dimension effects).
- `ql`: `numeric` scalar in $[0,1]$. Lower-tail quantile probability used for the credible interval.
- `qu`: `numeric` scalar in $[0,1]$. Upper-tail quantile probability used for the credible interval.
Mathematics
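For the matrix case (one scalar CATE per unit), let $\tau_i^{(s)}$ denote posterior draw $s = 1, \dots, S$ for unit $i = 1, \dots, n$. The summary is

$$\bar{\tau}_i = \frac{1}{S}\sum_{s=1}^{S} \tau_i^{(s)}, \qquad \mathrm{ci}_i = \bigl(\hat{Q}_i(q_l),\; \hat{Q}_i(q_u)\bigr),$$

where $\hat{Q}_i(q)$ is the empirical $q$-quantile of the draws $\tau_i^{(1)}, \dots, \tau_i^{(S)}$ (computed by `stats::quantile` with `names = FALSE`).

For the 3-D array case, let $\tau_{ik}^{(s)}$ denote draw $s$ for unit $i$ and slot $k = 1, \dots, K$; the same mean and quantiles are computed separately for every pair $(i, k)$.

The `dim_kind` tag is assigned by a heuristic on the slot names: `"multi"` when `slot_names` is `NULL` or follows the package-internal `dim_` naming convention, and `"K_individual"` otherwise.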
Returns
A list with the following components, whose shapes depend on the input:
- Matrix input:
  - `mean`: `numeric` vector of length $n$ (column means of `cate_draws`).
  - `ci`: `numeric` $n \times 2$ `matrix` with columns named `"lower"` and `"upper"`; column 1 holds the `ql`-quantiles, column 2 the `qu`-quantiles.
  - `dim_kind`: `character` scalar, hard-coded to `"scalar"`.
  - `dim_size`: `integer` scalar, hard-coded to `1L`.
  - `dim_names`: `NULL`.
- 3-D array input:
  - `mean`: `numeric` $n \times K$ `matrix` with `dimnames` set to `list(NULL, slot_names)` (i.e. unit dimension unnamed, slot dimension named after the third-dimension names of `cate_draws`).
  - `ci`: `numeric` $n \times K \times 2$ `array` with `dimnames = list(NULL, slot_names, c("lower", "upper"))`; slice `[, , 1]` holds the `ql`-quantiles, slice `[, , 2]` the `qu`-quantiles.
  - `dim_kind`: `character` scalar, either `"multi"` or `"K_individual"` per the heuristic above.
  - `dim_size`: `integer` scalar equal to $K$ (`dim(cate_draws)[3L]`).
  - `dim_names`: the third-dimension names of `cate_draws` (`dimnames(cate_draws)[[3L]]`), which may be `NULL`.
Notes
- The function is internal (leading dot) and is not exported.
- Shape dispatch is performed by testing `is.matrix(cate_draws)` first, then `length(dim(cate_draws)) == 3L`. Any other shape (e.g. a plain vector, a 1-D array, or a 4+-D array) triggers an error via `gdpar_abort` with `class = "gdpar_internal_error"` and a `data` field carrying `dim(cate_draws)`. Note that in R every object with a length-2 `dim` attribute satisfies `is.matrix()`, so all 2-D inputs take the matrix branch. The error message is the literal string `"Internal error: unsupported shape for cate_draws."`.
- Quantiles are computed with `stats::quantile` and `names = FALSE`; the default quantile type (type 7) of `stats::quantile` is therefore in effect.
- In the matrix branch, the two quantile rows returned by `apply` are re-assembled into an $n \times 2$ matrix explicitly (rather than transposed), so the orientation of `ci` is guaranteed regardless of how `apply` lays out its result.
- In the 3-D branch, `apply` is called separately for the lower and upper quantiles (with scalar `probs`), producing two $n \times K$ matrices that are then written into the two slices of the output array. This avoids any ambiguity about the shape of `apply`'s output when `probs` has length 1.
- `dim_kind` is purely a metadata tag derived from the slot names; it does not affect the numerical content of `mean` or `ci`. The `"multi"` tag is intended to flag either anonymous component dimensions (`slot_names` is `NULL`) or component dimensions whose names follow the package-internal `dim_` naming convention, while `"K_individual"` flags named per-individual components.
- The function has no side effects and performs no S3 dispatch of its own; it is a plain closure.
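The matrix branch can be sketched in a few lines. The function name `summarise_matrix_draws` is hypothetical and the metadata fields are omitted; the sketch only shows the column means and the explicit re-assembly of the quantile rows into an $n \times 2$ matrix.

```r
# Hypothetical sketch of the matrix branch (metadata fields omitted).
summarise_matrix_draws <- function(draws, ql, qu) {
  stopifnot(is.matrix(draws))
  # apply() returns a 2 x n matrix: row 1 = ql-quantiles, row 2 = qu-quantiles.
  q <- apply(draws, 2L, stats::quantile, probs = c(ql, qu), names = FALSE)
  # Re-assemble explicitly so that units index rows and bounds index columns.
  ci <- cbind(lower = q[1L, ], upper = q[2L, ])
  list(mean = colMeans(draws), ci = ci)
}

# Toy draws: 5 posterior draws (rows) for 2 units (columns).
draws <- cbind(c(1, 2, 3, 4, 5), c(10, 20, 30, 40, 50))
out <- summarise_matrix_draws(draws, ql = 0, qu = 1)
```

With `ql = 0` and `qu = 1` the interval bounds reduce to each unit's minimum and maximum draw, which makes the orientation of `ci` easy to verify by eye.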
gdpar_check_identifiability(amm, data, theta_ref_init = NULL, formula_rhs = NULL, family = NULL, tol = 1e-8, rigor = c("full", "fast"))
Purpose
Diagnose whether the chosen finite parametric representation of the AMM canonical form satisfies the basis‑restricted Functional Independence Condition at a candidate population reference point. This pre‑fit check verifies parameter identifiability by examining the empirical Gram matrix of the extended design matrix.
Arguments
- `amm`: Object of class `amm_spec` (created by `amm_spec`) defining bases for additive, multiplicative, and modulating components.
- `data`: Data frame containing variables referenced in `amm`; covariates are centered internally before computing the Gram matrix.
- `theta_ref_init`: Numeric vector of length p (dimension of the population reference) at which the diagnostic is evaluated. Defaults to a zero vector (if multiplicative component `b` is absent) or a vector of ones (if `b` is present).
- `formula_rhs`: Optional formula or character vector specifying covariates for the modulating linear factor `x`; defaults to `amm$x_vars`.
- `tol`: Numeric scalar ∈ (0,1) setting the relative condition‑number tolerance. Failure is flagged when λ_min < `tol` × λ_max. Default `1e-8`.
- `family`: Optional `gdpar_family` or `gdpar_family_multi` object; if supplied, triggers parameter‑level identifiability (D‑ID) pre‑fit checks.
- `rigor`: Character scalar, `"full"` (default) or `"fast"`, controlling the C4‑bis cross‑component check for p > 1 specs. Ignored when p = 1.
Mathematics
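The function builds an extended design matrix $Z$ from the active design blocks (additive, multiplicative scaled by `theta_ref_init`, and modulating) and forms its empirical Gram matrix $G$.

Its eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$ must satisfy

$$\lambda_{\min} \;\ge\; \text{tol} \cdot \lambda_{\max},$$

equivalent to the condition number $\kappa = \lambda_{\max} / \lambda_{\min}$ not exceeding $1/\text{tol}$.

For p > 1, a per‑coordinate C4‑bis check is performed (if applicable) by calling `check_C4_bis_per_k`, and a D‑ID pre‑fit check is performed via `.check_did_pre_fit`.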
Returns
An object of class gdpar_identifiability_report (a list) containing:
| Component | Description |
|---|---|
| `passed` | Logical; `TRUE` iff all applicable checks (Gram, C4‑bis, D‑ID) pass. |
| `lambda_min`, `lambda_max`, `condition_number` | Eigenvalues and condition number of the Gram matrix (if computed). |
| `collinear_directions` | List of near‑zero eigenvectors projected onto basis columns (if Gram check fails); otherwise `NULL`. |
| `theta_ref_used` | The `theta_ref_init` used. |
| `tol_used`, `rigor_used` | The `tol` and `rigor` used. |
| `column_labels` | Character vector labeling columns of the extended design matrix. |
| `c4_bis` | Result of the C4‑bis per‑coordinate check (if performed); else `NULL`. |
| `did_pre_fit` | Result of the D‑ID pre‑fit check (if performed); else `NULL`. |
| `message` | Human‑readable summary. |
A print method formats the report.
Notes
- Trivial cases: If `amm$level == 0L` or there are no active design blocks, the function returns `passed = TRUE` with a message and `NA` eigenvalues.
- Input validation: Uses `assert_inherits`, `assert_data_frame`, `assert_numeric_scalar`, and `match.arg`; raises `gdpar_input_error` on invalid inputs.
- Warning: If `theta_ref_init` is zero in every coordinate and `amm$b` is non‑null, a `gdpar_diagnostic_warning` is issued because the multiplicative block vanishes trivially.
- Zero‑norm columns: If any column of $Z$ is zero after centering, the function returns `passed = FALSE` with `lambda_min = 0` and `condition_number = Inf`.
- Multivariate path: When `design$Z_a_list` exists and `design$Z_a` is `NULL`, the function delegates to `check_identifiability_multi` (defined elsewhere) and returns its result.
- Extended matrix construction: For p > 1, the multiplicative block is expanded as $Z_b \cdot \theta_{\text{ref}}[k]$ for each coordinate k, producing column labels `b*theta[k]:...`.
- C4‑bis and D‑ID checks: Only performed when p > 1 and the necessary design components exist. The `rigor` argument controls failure vs. warning on column‑name overlap between additive and modulating blocks.
- Condition number computation: Uses `eigen(G, symmetric = TRUE)` and guards against division by zero with `max(lambda_min, .Machine$double.eps)`.
Purpose
Top-level multivariate identifiability report. Combines two pre-fit layers—(i) the per-coordinate C4-bis functional independence check (check_C4_bis_per_k) and (ii) the pre-fit parameter identifiability / D-ID layer (.check_did_pre_fit)—into a single consolidated "gdpar_identifiability_report" object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm` | `amm_spec` | The model specification; forwarded to sub-checks (used to read `W$dim` and `p`). |
| `design` | list | Design structure returned by `build_amm_design`; forwarded to sub-checks. |
| `theta_ref` | numeric vector (length `p`) | Reference (initial) parameter vector at which identifiability is evaluated. |
| `tol` | numeric scalar | Tolerance for the eigenvalue-ratio (condition-number) criterion in C4-bis. |
| `rigor` | character scalar (`"fast"` or `"full"`) | Controls depth of the C4-bis check; forwarded to both sub-checks. |
| `family` | `gdpar_family`, `gdpar_family_multi`, or `NULL` | Family object for the D-ID layer; `NULL` skips that layer entirely. |
Mathematics
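The function aggregates per-coordinate eigenvalue diagnostics. Denote the smallest and largest eigenvalues returned by coordinate $k$ as $\lambda_{\min}^{(k)}$ and $\lambda_{\max}^{(k)}$, and its condition number as $\kappa^{(k)}$. The report-level values are

$$\lambda_{\min} = \min_k \lambda_{\min}^{(k)}, \qquad \lambda_{\max} = \max_k \lambda_{\max}^{(k)}, \qquad \kappa = \max_k \kappa^{(k)},$$

with `NA_real_` propagated when every per-$k$ value is `NA`.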
Returns
A list of class c("gdpar_identifiability_report", "list") with elements:
| Element | Type / Content |
|---|---|
| `passed` | Logical scalar; `TRUE` iff both `passed_c4_bis` and `passed_did` are `TRUE`. |
| `lambda_min` | Numeric; global minimum eigenvalue across coordinates (or `NA_real_`). |
| `lambda_max` | Numeric; global maximum eigenvalue across coordinates (or `NA_real_`). |
| `condition_number` | Numeric; worst (maximum) condition number across coordinates (or `NA_real_`). |
| `collinear_directions` | Always `NULL` at this level (populated in per-k entries). |
| `theta_ref_used` | Numeric vector; echo of `theta_ref`. |
| `tol_used` | Numeric; echo of `tol`. |
| `rigor_used` | Character; echo of `rigor`. |
| `column_labels` | `character(0)`. |
| `c4_bis` | Full return value of `check_C4_bis_per_k`. |
| `did_pre_fit` | Return value of `.check_did_pre_fit` (or `NULL`). |
| `message` | Character scalar summarising pass/fail with human-readable explanation. |
Notes
- When `family` is `NULL`, `.check_did_pre_fit` is not called (`did_pre_fit` is set to `NULL`) and `passed_did` defaults to `TRUE`, so the overall result depends solely on C4-bis.
- The message distinguishes three cases: all-pass, C4-bis failure, and D-ID failure.
- `min`/`max` over `lmins`/`lmaxs`/`conds` uses `na.rm = TRUE` but falls back to `NA_real_` when every element is `NA` (guarded by `all(is.na(...))`).
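The guard in the last note matters because `min()` and `max()` with `na.rm = TRUE` return `Inf` or `-Inf` (with a warning) on an all-`NA` input. A minimal sketch of the guarded aggregation, with a hypothetical helper name and made-up per-coordinate values:

```r
# Hypothetical helper: aggregate per-coordinate diagnostics with an NA fallback.
agg_or_na <- function(x, f) {
  if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE)
}

# Made-up per-coordinate values; coordinate 2 was skipped (NA).
lmins <- c(0.20, NA, 0.05)
lmaxs <- c(1.10, NA, 0.90)
conds <- c(5.5, NA, 18.0)

lambda_min <- agg_or_na(lmins, min)        # global minimum
lambda_max <- agg_or_na(lmaxs, max)        # global maximum
condition_number <- agg_or_na(conds, max)  # worst condition number

# Every coordinate skipped (e.g. under "fast" rigor): NA is propagated.
all_skipped <- agg_or_na(c(NA_real_, NA_real_), min)
```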
Purpose
Internal per-coordinate cross-component identifiability check (C4-bis). For each coordinate $k$, it runs either a column-name overlap check only (`rigor = "fast"`) or a full Gram-matrix rank test (`rigor = "full"`) on the additive design matrix $\mathbf{Z}_a[k]$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `design` | list | Must contain `Z_a_list`, `Z_a_names_list`, `X`, `X_names`. |
| `amm` | `amm_spec` | Model specification; `amm$W$dim` is read to derive the per-k block dimension. |
| `theta_ref` | numeric vector (length `p`) | Reference parameter vector; used to determine `p`. |
| `rigor` | character scalar (`"fast"` or `"full"`) | Selects the check strategy (see below). |
| `tol` | numeric scalar | Eigenvalue-ratio tolerance; a coordinate passes iff $\lambda_{\min}^{(k)} \ge \text{tol} \cdot \lambda_{\max}^{(k)}$. |
Mathematics
Let $n$ denote the number of rows of the design and $p$ the length of `theta_ref`. For each coordinate $k = 1, \dots, p$:

- Extract the additive design sub-matrix $\mathbf{Z}_a[k]$ and normalise each column to unit $\ell_2$ norm:

  $$\widetilde{\mathbf{Z}}_a[k] = \mathbf{Z}_a[k]\,\mathrm{diag}\!\bigl(\lVert \mathbf{Z}_a[k]_{\cdot j}\rVert_2^{-1}\bigr)$$

- Form the (sample) Gram matrix of the normalised columns:

  $$\mathbf{G}_k = \frac{1}{n}\,\widetilde{\mathbf{Z}}_a[k]^{\!\top}\,\widetilde{\mathbf{Z}}_a[k]$$

- Compute the eigendecomposition $\mathbf{G}_k = \mathbf{V}\,\mathrm{diag}(\lambda)\,\mathbf{V}^{\!\top}$. The coordinate passes the rank sub-check iff:

  $$\lambda_{\min}^{(k)} \;\ge\; \text{tol}\;\cdot\;\lambda_{\max}^{(k)}$$

- Structural overlap sub-check: independently compute $\text{shared} = \mathrm{colnames}\!\bigl(\mathbf{Z}_a[k]\bigr) \;\cap\; \mathrm{colnames}(\mathbf{X})$. Under `rigor = "full"`, any non-empty intersection causes a fail.

Under `rigor = "fast"`, overlap triggers only a `gdpar_c4bis_overlap_warning` (not a failure), and the rank check is skipped entirely: every coordinate is marked `passed = TRUE` with `NA` eigenvalue diagnostics.
Algorithm (rigor = "full")
for k in 1..p:
shared ← intersect(colnames(Z_a[k]), colnames(X))
if ncol(Z_a[k]) == 0 → pass trivially
col_norms ← √(colSums(Z_a[k]²))
if any(col_norms == 0) → fail, record zero-norm columns
Z_k_n ← Z_a[k] / col_norms # column-normalise
G_k ← crossprod(Z_k_n) / nrow(Z_k_n)
eig_k ← eigen(G_k, symmetric = TRUE)
λ_min ← min(eig_k$values)
λ_max ← max(eig_k$values)
κ ← λ_max / max(λ_min, ε_machine)
passed_rank ← (λ_min ≥ tol · λ_max)
passed_overlap ← (shared is empty)
passed_k ← passed_rank AND passed_overlap
collinear_directions:
if !passed_rank → eigenvectors with eigenvalue < tol·λ_max
if !passed_overlap → shared columns labelled
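The pseudocode above translates fairly directly into R for a single coordinate. The sketch below is an assumption-laden illustration rather than the package's code: `check_one_coord` is a hypothetical name, the collinear-direction report is omitted, and `sweep()` is used for the column normalisation because a plain `Z / col_norms` in R would recycle the norms down rows instead of across columns.

```r
# Hypothetical single-coordinate version of the "full" rigor check.
check_one_coord <- function(Z_k, X_names, tol = 1e-8) {
  shared <- intersect(colnames(Z_k), X_names)
  if (ncol(Z_k) == 0L) {
    return(list(passed = TRUE, lambda_min = NA_real_, shared_cols = shared))
  }
  col_norms <- sqrt(colSums(Z_k^2))
  if (any(col_norms == 0)) {
    return(list(passed = FALSE, lambda_min = 0, condition_number = Inf,
                shared_cols = shared))
  }
  Z_n <- sweep(Z_k, 2L, col_norms, "/")   # column-normalise
  G <- crossprod(Z_n) / nrow(Z_n)          # normalised Gram matrix
  ev <- eigen(G, symmetric = TRUE)$values
  lmin <- min(ev)
  lmax <- max(ev)
  passed_rank <- lmin >= tol * lmax
  passed_overlap <- length(shared) == 0L
  list(
    passed = passed_rank && passed_overlap,
    lambda_min = lmin,
    lambda_max = lmax,
    condition_number = lmax / max(lmin, .Machine$double.eps),
    shared_cols = shared
  )
}

x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(1, -1, 2, -2, 3, -3)
res_ok <- check_one_coord(cbind(s1 = x1, s2 = x2), X_names = "w")
res_collinear <- check_one_coord(cbind(s1 = x1, s2 = 2 * x1), X_names = "w")
res_overlap <- check_one_coord(cbind(s1 = x1, s2 = x2), X_names = "s2")
```

The second call fails the rank sub-check because its two columns are proportional; the third has a well-conditioned Gram matrix but fails on the shared column name `s2`.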
Returns
A list with elements:
| Element | Type / Content |
|---|---|
| `rigor` | Character; echo of the input `rigor`. |
| `per_k` | List of length $p$, one entry per coordinate. |
Each per_k[[k]] entry contains:
| Sub-element | Type / Content |
|---|---|
| `passed` | Logical; `TRUE` if this coordinate passes both sub-checks. |
| `rigor` | Character (`"fast"` or `"full"`). |
| `lambda_min` | Numeric; smallest eigenvalue of $\mathbf{G}_k$ (or `NA_real_`). |
| `lambda_max` | Numeric; largest eigenvalue of $\mathbf{G}_k$ (or `NA_real_`). |
| `condition_number` | Numeric; $\kappa = \lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{machine}})$ (or `NA_real_`). |
| `shared_cols` | Character vector; column names shared between $\mathbf{Z}_a[k]$ and $\mathbf{X}$. |
| `collinear_directions` | List of lists (or `NULL`); each sub-list has `eigenvalue`, `columns`, `coefficients` (eigenvector loadings, thresholded by absolute value). |
| `coord` | Integer; the coordinate index $k$. |
| `message` | (Only when `ncol(Z_a_k) == 0` under `"full"`): `"empty Z_a[k]: trivially passes."` |
Notes
- Zero-norm column detection (full rigor): If any column of $\mathbf{Z}_a[k]$ has zero $\ell_2$ norm the coordinate immediately fails with $\lambda_{\min}=0$, $\kappa=\infty$, and the offending columns are recorded in `collinear_directions` under the label `"zero-norm columns in Z_a[k]"` with coefficient `1`.
- Fast rigor warning aggregation: Overlap detected across coordinates is accumulated in `overlap_acc` and emitted as a single `gdpar_c4bis_overlap_warning` (via `gdpar_warn`) after the loop. The warning includes a `data$shared_cols_by_coord` slot listing shared columns per coordinate.
- Modulating-component design $\mathbf{X}$ not used in rank check: The source code documents that the extended Gram matrix $[\mathbf{Z}_a[k] \mid \theta_{\text{ref}}[k]^m \mathbf{X}]$ was considered but rejected because at a fixed $\theta_{\text{ref}}$ the $W$-block columns are scalar multiples of $\mathbf{X}$ per coordinate $l$, yielding rank 1 by construction. Cross-component non-identifiability is therefore deferred to post-fit diagnostics (divergences, low ESS, high $\hat{R}$).
- Column naming convention: Collinear-direction column labels are prefixed with `"a{k}:"` (e.g., `"a3:smooth_term"`).
- Default anchor: The comment states the anchor is taken as zero, consistent with the package default on the linear-predictor scale.
- `%||%` usage: `X_names`, `colnames(Z_a_k)`, `lambda_min`, `lambda_max`, and `condition_number` all use the null-coalescing operator `%||%` to supply fallback values.
Purpose

Pre-fit validation of Data-Identifying Design (DID) conditions for individual-scope parameters within a `gdpar_family` or `gdpar_family_multi` object. Called before the Stan fit to verify that all per-observation or per-group parameter specifications have an explicit DID declaration and, when `rigor == "full"` and $K \geq 2$, that their `prior_canonical_kind` values are symbolically separable.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `family` | `gdpar_family`, `gdpar_family_multi`, or `NULL` | The family object whose `param_specs` are inspected. If `NULL`, the function returns `NULL` immediately. |
| `design` | any | Present in the signature but never referenced in the function body; reserved for future use or passed through for call-site compatibility. |
| `theta_ref` | any | Present in the signature but never referenced in the function body; same rationale as `design`. |
| `rigor` | character scalar | `"full"` activates the symbolic-separability sub-check when $K \geq 2$; otherwise the sub-check is skipped and separability is treated as `TRUE`. |
Algorithm
- Family extraction. If `family` inherits from `"gdpar_family_multi"`, the first element `family$families[[1]]` is taken as `base_family`; if it inherits from `"gdpar_family"`, `base_family <- family`; otherwise return `NULL`.
- Individual-scope filtering. From `base_family$param_specs`, retain only specs whose `scope` is `"per_observation"` or `"per_group"`. Let $K$ be the count of such specs.
- Per-parameter metadata. For each retained spec $s$, extract the fields `name`, `scope`, `did_status`, `did_condition`, `did_reference`, `prior_canonical_kind`.
- DID declarative check.

  $$\text{passed_did} = \bigwedge_{k=1}^{K} \bigl(s_k.\text{did_status} \in \{\texttt{"holds"},\;\texttt{"holds_under_condition"}\}\bigr)$$

- Symbolic separability (conditional on $K \geq 2$ and `rigor == "full"`). Collect the vector of `prior_canonical_kind` values $\mathbf{p} = (p_1, \dots, p_K)$. An element $p_i$ is overlapping if $p_i = p_j$ for some $j \neq i$:

  $$\text{overlap}_i = \bigl(\exists\, j \neq i : p_j = p_i\bigr)$$

  This is detected via `duplicated(p) | duplicated(p, fromLast = TRUE)`. If any overlap exists, `passed_separability <- FALSE` and the offending kinds/names are recorded. Otherwise the check passes.
- Final verdict.

  $$\text{passed} = \text{passed_did} \;\wedge\; \text{passed_separability}$$
Returns
A list (or NULL if family is NULL or not a recognized gdpar family):
| Field | Type | Meaning |
|---|---|---|
| `passed` | logical scalar | `TRUE` iff DID declarations are valid and separability holds |
| `K` | integer scalar | Number of individual-scope parameter specs |
| `per_param` | list of length $K$ | Per-parameter metadata (`name`, `scope`, `did_status`, `did_condition`, `did_reference`, `prior_canonical_kind`) |
| `symbolic_separability` | `NULL` or list | `NULL` when not evaluated; otherwise a list with `passed`, `overlapping_kinds`, `overlapping_names`, `message` |
| `rigor` | character scalar | Echo of the input `rigor` |
Notes
- `design` and `theta_ref` are accepted but completely unused; this allows a uniform calling convention across identifiability-check helpers.
- For a `gdpar_family_multi` object only the first family (`family$families[[1]]`) is inspected; sub-families at index $\geq 2$ are ignored.
- If `base_family$param_specs` is `NULL`, the function returns `NULL` (not a list with `passed = TRUE`), signaling that DID checking is not applicable.
- No errors are raised; all conditions are evaluated non-destructively.
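The two-sided `duplicated()` idiom from the separability step flags every member of a duplicated group, not just the later occurrences. A minimal sketch, in which the parameter names and kind labels are invented for illustration:

```r
# Invented parameter names and prior kinds; only the idiom is from the text.
kinds <- c(mu = "location", sigma = "scale", nu = "scale")

# duplicated() alone misses the first occurrence; OR-ing with fromLast = TRUE
# marks every element that shares its value with another element.
overlap <- duplicated(kinds) | duplicated(kinds, fromLast = TRUE)

passed_separability <- !any(overlap)
overlapping_kinds <- unique(unname(kinds[overlap]))
overlapping_names <- names(kinds)[overlap]

# Combined verdict, assuming the declarative DID check passed.
passed_did <- TRUE
passed <- passed_did && passed_separability
```

In this toy case `sigma` and `nu` share the kind `"scale"`, so both are reported and the overall verdict is a failure.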
Purpose

Implements layer D-B3 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check of the per-slot additive design matrix $Z_a^{(k)}$, which is assembled by `.build_amm_design_K()` before reaching this helper. The check detects zero-norm columns (structural rank deficiency) and, under `rigor == "full"`, computes the normalized Gram condition number and flags slots whose minimum eigenvalue falls below $\text{tol} \cdot \lambda_{\max}$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `design_K` | list | Output of `.build_amm_design_K()`; must contain `Z_a_k_list` (list of $n \times p_k$ matrices), `Z_a_k_names_list` (list of column-name vectors), `slot_names` (character vector of length $K$). |
| `rigor` | character scalar | `"fast"` skips the Gram eigendecomposition (rank is checked only via zero-norm detection); `"full"` performs the full eigendecomposition. |
| `tol` | numeric scalar | Threshold for the eigenvalue-ratio criterion. A slot passes when $\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}$. |
Mathematics
For each slot $k$ with design matrix $Z_a^{(k)} \in \mathbb{R}^{n \times p_k}$ and columns $z_1, \dots, z_{p_k}$:

- Column norms. $c_j = \lVert z_j \rVert_2 = \sqrt{\sum_{i=1}^n z_{ij}^2}$ for $j = 1, \dots, p_k$. Any $c_j = 0$ signals an identically-zero column and immediately fails the slot.
- Column normalization (full rigor).

  $$\widetilde{Z} = Z_a^{(k)} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_{p_k}^{-1})$$

- Normalized Gram matrix.

  $$G = \frac{1}{n}\,\widetilde{Z}^\top \widetilde{Z} \;\in\; \mathbb{R}^{p_k \times p_k}$$

- Eigendecomposition. $G = V \Lambda V^\top$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_{p_k} \geq 0$.
- Rank criterion.

  $$\text{ok} = \bigl(\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}\bigr)$$

  where $\lambda_{\min} = \lambda_{p_k}$ and $\lambda_{\max} = \lambda_1$.
- Collinear-direction extraction. For each eigenvalue $\lambda_j < \text{tol} \cdot \lambda_{\max}$, the corresponding eigenvector $v^{(j)}$ is inspected: components with $|v^{(j)}_i| > 10^{-3}$ are retained and sorted by descending absolute value, yielding the collinear direction report.
Returns
A list:
| Field | Type | Meaning |
|---|---|---|
| `passed` | logical scalar | `TRUE` iff every slot passes |
| `rigor` | character scalar | Echo of input |
| `per_slot` | list of length $K$ (named by `slot_names`) | Per-slot reports (see below) |
Each element of per_slot is a list:
| Field | Type | Meaning |
|---|---|---|
| `slot` | character | Slot name |
| `passed` | logical | Slot-level pass/fail |
| `rigor` | character | Effective rigor for this slot |
| `lambda_min` | numeric | Minimum eigenvalue (`NA_real_` if not computed) |
| `lambda_max` | numeric | Maximum eigenvalue (`NA_real_` if not computed) |
| `condition_number` | numeric | $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; `Inf` if zero-norm columns detected |
| `collinear_columns` | `NULL` or list | `NULL` when passed; under `rigor == "fast"` with zero columns, a character vector of offending column names; under `rigor == "full"` a list of sub-lists each with `eigenvalue`, `columns`, `coefficients` |
| `message` | character | Human-readable diagnostic |
Notes
- Empty matrix shortcut. If $p_k = 0$, the slot passes trivially with `lambda_min = lambda_max = condition_number = NA_real_`.
- Zero-norm columns. Detected before any eigendecomposition. The offending column names are taken from `Z_a_k_names_list[[k]]`, falling back to `colnames(Z_a_k)`, falling back to `character(ncol(Z_a_k))`.
- `rigor == "fast"` with no zero-norm columns. Passes unconditionally without Gram computation; diagnostics are `NA_real_`.
- Numerical safeguard. The condition number divides by $\max(\lambda_{\min}, \epsilon_{\text{mach}})$ where $\epsilon_{\text{mach}}$ is `.Machine$double.eps`, preventing division by zero.
- No S3 dispatch; purely internal utility.
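The collinear-direction extraction step can be sketched on a toy slot in which one column duplicates another. The function name `collinear_directions_sketch`, the `keep` argument, and the column names are assumptions for illustration; the $10^{-3}$ threshold and the descending sort follow the description above.

```r
# Hypothetical sketch of the collinear-direction report for one Gram matrix.
collinear_directions_sketch <- function(G, col_names, tol = 1e-8, keep = 1e-3) {
  eg <- eigen(G, symmetric = TRUE)
  lmax <- max(eg$values)
  small <- which(eg$values < tol * lmax)  # eigenvalues below the tolerance band
  lapply(small, function(j) {
    v <- eg$vectors[, j]
    idx <- which(abs(v) > keep)                        # drop negligible loadings
    idx <- idx[order(abs(v[idx]), decreasing = TRUE)]  # largest loadings first
    list(eigenvalue = eg$values[j], columns = col_names[idx], coefficients = v[idx])
  })
}

# Toy slot: column "c" duplicates column "a"; "b" is unrelated.
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(1, -1, 2, -2, 3, -3)
Z <- cbind(a = x1, b = x2, c = x1)
Z_n <- sweep(Z, 2L, sqrt(colSums(Z^2)), "/")
G <- crossprod(Z_n) / nrow(Z_n)
dirs <- collinear_directions_sketch(G, colnames(Z))
```

The single reported direction loads on `a` and `c` with opposite signs and roughly equal magnitude, while the loading on `b` falls below the threshold and is dropped.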
Purpose

Implements layer D-B2 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check on the column-wise concatenation of per-slot additive design matrices, $Z_{\text{joint}}$. Even when every slot passes individually (see `.check_Z_a_K_per_slot`), the joint matrix can be rank-deficient when the same covariate appears in multiple slots with linearly equivalent designs. This helper detects such cross-slot collinearity.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `design_K` | list | Output of `.build_amm_design_K()`; same structure as for `.check_Z_a_K_per_slot`. |
| `rigor` | character scalar | `"fast"` returns a structural pass without eigendecomposition; `"full"` computes the Gram eigendecomposition of the joint matrix. |
| `tol` | numeric scalar | Eigenvalue-ratio threshold: $\lambda_{\min}(G) \geq \text{tol} \cdot \lambda_{\max}(G)$. |
Mathematics
- Joint construction. For each slot $k$ with $p_k > 0$ columns, prefix every column name with `"{slot_name}:"` and horizontally concatenate:

  $$Z_{\text{joint}} = \bigl[\, Z_a^{(1)} \;\big|\; Z_a^{(2)} \;\big|\; \cdots \;\big|\; Z_a^{(K)}\,\bigr] \;\in\; \mathbb{R}^{n \times P}$$

  where $P = \sum_{k: p_k > 0} p_k$.
- Zero-norm detection. Column norms $c_j = \lVert z_j \rVert_2$. Any $c_j = 0$ immediately fails the check.
- Column normalization and Gram matrix (full rigor).

  $$\widetilde{Z}_{\text{joint}} = Z_{\text{joint}} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_P^{-1}), \qquad G = \frac{1}{n}\,\widetilde{Z}_{\text{joint}}^\top \widetilde{Z}_{\text{joint}}$$

- Eigendecomposition and rank criterion. Identical to the per-slot check:

  $$\text{ok} = \bigl(\lambda_{\min}(G) \geq \text{tol} \cdot \lambda_{\max}(G)\bigr)$$

- Collinear-direction extraction. Same thresholding procedure as `.check_Z_a_K_per_slot`: for each eigenvalue below the tolerance band, retain eigenvector components with $|v_i| > 10^{-3}$, sorted by descending absolute value. Column names are prefixed with slot identifiers so the report identifies which slot contributes each column.
Returns
A list:
| Field | Type | Meaning |
|---|---|---|
| `passed` | logical scalar | `TRUE` iff the joint matrix passes the rank criterion |
| `rigor` | character scalar | Echo of input |
| `lambda_min` | numeric | Minimum eigenvalue of $G$ (`NA_real_` if not computed) |
| `lambda_max` | numeric | Maximum eigenvalue of $G$ (`NA_real_` if not computed) |
| `condition_number` | numeric | $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; `Inf` if zero-norm columns |
| `collinear_directions` | `NULL` or list | `NULL` when passed; otherwise a list of sub-lists each with `eigenvalue`, `columns` (prefixed names), `coefficients` |
| `total_columns` | integer | $P$, the number of columns of $Z_{\text{joint}}$ |
| `message` | character | Human-readable diagnostic |
Notes
- Empty joint design. If every slot has $p_k = 0$, the function returns `passed = TRUE` with `total_columns = 0L` and `NA` diagnostics.
- `rigor == "fast"`. After concatenation (to compute `total_columns`), the function returns a structural pass without eigendecomposition. Note that zero-norm detection is also skipped under `"fast"`, unlike `.check_Z_a_K_per_slot`, which at least checks column norms under `"fast"`.
- Zero-norm columns under `"full"`. Detected before eigendecomposition; the returned `collinear_directions` contains a single synthetic entry with `eigenvalue = 0` and `coefficients = rep(1, length(bad))`, since the true null direction is degenerate.
- Column-name resolution. For each slot $k$, names are sourced from `Z_a_k_names_list[[k]]`, then `colnames(Z_a_k)`, then auto-generated `"col1"`, `"col2"`, … via `paste0("col", seq_len(ncol(Z_a_k)))`.
- No S3 dispatch; purely internal. The returned list is a component of the combined report produced by the companion function `.check_identifiability_K`.
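A toy example makes the motivation concrete: two slots that each pass a per-slot rank check can still produce a rank-deficient joint design. The slot names `mu` and `sigma`, the covariate `age`, and the helper `gram_ok` are all invented for this sketch.

```r
# Invented slots: the same covariate enters both, up to a scale factor.
x <- c(1, 2, 3, 4, 5, 6)
slots <- list(mu = cbind(age = x), sigma = cbind(age = 3 * x))

# Hypothetical helper applying the normalised-Gram rank criterion.
gram_ok <- function(Z, tol = 1e-8) {
  Z_n <- sweep(Z, 2L, sqrt(colSums(Z^2)), "/")
  ev <- eigen(crossprod(Z_n) / nrow(Z_n), symmetric = TRUE)$values
  min(ev) >= tol * max(ev)
}

# Prefix column names with the slot name, then concatenate column-wise.
parts <- lapply(names(slots), function(s) {
  Z <- slots[[s]]
  colnames(Z) <- paste0(s, ":", colnames(Z))
  Z
})
Z_joint <- do.call(cbind, parts)

per_slot_ok <- vapply(slots, gram_ok, logical(1L))  # each slot on its own
joint_ok <- gram_ok(Z_joint)                        # concatenated design
```

Each single-column slot trivially has full rank, yet after normalisation the two joint columns coincide, so the joint criterion fails. The slot prefixes keep the two `age` columns distinguishable in the report.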
Purpose

Top-level orchestrator for K-individual identifiability checks in the gdpar pipeline. It aggregates two independent sub-checks, per-slot rank of the anchor design blocks (C1–C3 conditions) and cross-slot Gram-matrix conditioning (C4-bis condition), into a single logical pass/fail verdict. This function is called during the pre-fit decision layer to determine whether the anchor parameterisation is structurally identifiable.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `design_K` | `list` | Design structure returned by the AMM design builders. Must contain a `Z_a_k_list` component (list of per-slot anchor design matrices). Passed directly to the two sub-check helpers. |
| `rigor` | `character` scalar | Checking rigor; matched against `c("full", "fast")` via `match.arg`. `"full"` runs the Gram eigendecomposition in both sub-checks; `"fast"` skips it (the per-slot check still detects zero-norm columns, while the cross-slot check returns a structural pass). Defaults to `"full"`. |
| `tol` | `numeric` scalar | Numerical tolerance (eigenvalue ratio threshold) for rank decisions. Defaults to `1e-8`. Passed through to both sub-check functions. |
Mathematics This function does not implement a formula directly; it delegates to:
- `.check_Z_a_K_per_slot(design_K, rigor, tol)`: verifies that each per-slot anchor design matrix $Z_{a,k}$ has full column rank (the C1–C3 conditions).
- `.check_C4_bis_K_cross_slot(design_K, rigor, tol)`: verifies that the cross-slot block Gram matrix is non-singular (the C4-bis condition), i.e., that no linear combination of columns across different $k$-slots creates a collinearity.
The overall verdict is the logical conjunction: the identifiability check passes if and only if both sub-checks pass.
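A rough sketch of this conjunction is given below. The two sub-checks are simplified stand-ins written for illustration (an eigenvalue-ratio test on each block's Gram matrix and on the concatenated blocks); they are assumptions, not the bodies of `.check_Z_a_K_per_slot` or `.check_C4_bis_K_cross_slot`, and all function names ending in `_sketch` are hypothetical.

```r
# Stand-in for the per-slot rank check: smallest/largest eigenvalue ratio of Z'Z.
per_slot_rank_sketch <- function(Z_list, tol = 1e-8) {
  ok <- vapply(Z_list, function(Z) {
    ev <- eigen(crossprod(Z), symmetric = TRUE, only.values = TRUE)$values
    isTRUE(min(ev) / max(ev) > tol)
  }, logical(1))
  list(passed = all(ok), per_slot = ok)
}

# Stand-in for the cross-slot check: same test on the column-bound blocks.
cross_slot_sketch <- function(Z_list, tol = 1e-8) {
  ev <- eigen(crossprod(do.call(cbind, Z_list)),
              symmetric = TRUE, only.values = TRUE)$values
  list(passed = isTRUE(min(ev) / max(ev) > tol))
}

# Orchestrator: the verdict is the logical conjunction of both sub-checks.
check_identifiability_sketch <- function(design_K, rigor = c("full", "fast"),
                                         tol = 1e-8) {
  rigor <- match.arg(rigor)  # errors before any sub-check on an unknown value
  per_slot <- per_slot_rank_sketch(design_K$Z_a_k_list, tol)
  cross <- cross_slot_sketch(design_K$Z_a_k_list, tol)
  list(passed = isTRUE(per_slot$passed) && isTRUE(cross$passed),
       rigor = rigor, tol = tol, K = length(design_K$Z_a_k_list),
       per_slot_rank = per_slot, cross_slot_gram = cross)
}

# Each block has full column rank, but the column x is shared across slots.
x <- 1:10
design <- list(Z_a_k_list = list(cbind(1, x), cbind(x, x^2)))
res <- check_identifiability_sketch(design)
res$passed
```

In this toy design each slot passes on its own while the cross-slot test fails, which is the situation the conjunction is meant to catch.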
Returns A named list with components:
| Component | Type | Description |
|---|---|---|
| `passed` | logical | `TRUE` iff both `per_slot_rank$passed` and `cross_slot_gram$passed` are `TRUE`. |
| `rigor` | character | The resolved rigor level. |
| `tol` | numeric | The tolerance used. |
| `K` | integer | Number of K-individual slots, derived as `length(design_K$Z_a_k_list)`. |
| `per_slot_rank` | list | Full return value of `.check_Z_a_K_per_slot`. |
| `cross_slot_gram` | list | Full return value of `.check_C4_bis_K_cross_slot`. |
Notes
- `rigor` is validated by `match.arg`; an unrecognised value raises an error before either sub-check executes.
- Both sub-checks receive the same `rigor` and `tol` values, ensuring consistent stringency.
- `K` is extracted from the length of `Z_a_k_list`; if that component is missing or empty, `K` will be `0` and the sub-checks must handle that case internally.
- No side effects; purely computational with no global state modification.
Purpose Implements Block D-B1 of sub-phase 8.3.4: post-fit information contraction analysis per K-individual slot. For each slot's anchor parameter, it computes the prior-to-posterior variance contraction ratio and assigns the slot a status (`"pass"`, `"warn"`, `"information_error"`, or `"skipped"`).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | cmdstanr fit object | The output of `cs_model$sample()`. Used to extract posterior draws via `fit$draws(format = "draws_matrix")`. |
| `family` | list (`gdpar_family`) | A `gdpar_family` object whose K-individual slots have been promoted to per-observation scope by `.gdpar_promote_scope_per_observation()`. Its `param_specs` component is consulted for `prior_canonical_kind` per slot. |
| `slot_names` | character vector | Canonical slot names, length $K$. |
| `use_groups` | integer scalar (0 or 1) | Whether the fit used per-group hierarchical anchors. Determines the Stan parameter root: `"mu_theta_ref_k"` when 1, `"theta_ref_k"` when 0. |
| `prior` | list (`gdpar_prior`) | A `gdpar_prior` object; its `priors_by_kind` overrides are documented as potentially consulted but are currently inert (the helper falls back to canonical kinds). |
Mathematics
The contraction for slot $k$ is computed from $\mathrm{Var}_{\text{post}}$, the sample variance of the posterior draws, and $\mathrm{Var}_{\text{prior}}$, obtained from `.gdpar_canonical_prior_variance(kind)` for the slot's canonical prior kind.
Decision thresholds (from sub-phase 8.3.4 scoping): `warn = 0.5` and `information_error = 0.1`.
Returns A named list with components:
| Component | Type | Description |
|---|---|---|
| `passed` | logical | `TRUE` iff no slot triggered `"warn"` or `"information_error"`. |
| `any_warn` | logical | `TRUE` if any slot has status `"warn"`. |
| `any_info_error` | logical | `TRUE` if any slot has status `"information_error"`. |
| `thresholds` | named numeric vector | `c(warn = 0.5, information_error = 0.1)`. |
| `per_slot` | named list of length $K$ | Per-slot diagnostic results (see below). |
Each element of per_slot is a named list:
| Component | Type | Description |
|---|---|---|
| `slot` | character | The slot name from `slot_names[k]`. |
| `var_post` | numeric | Posterior sample variance (or `NA_real_` if skipped). |
| `var_prior` | numeric | Prior variance from canonical kind (or `NA_real_`). |
| `contraction` | numeric | Contraction value (`NA_real_` if skipped/non-finite). |
| `status` | character | One of `"pass"`, `"warn"`, `"information_error"`, `"skipped"`. |
| `message` | character | Diagnostic message string. |
Notes
- Early return on missing draws: If `fit$draws(format = "draws_matrix")` throws an error (caught via `tryCatch`), the function returns immediately with `passed = TRUE` and all slots marked `"skipped"` with message `"fit$draws() unavailable"`. This is a defensive fallback: no diagnostic is raised when draws are inaccessible.
- Parameter name construction: The Stan parameter name is `paste0(param_root, "[1,", k, "]")` for both `use_groups == 0` and `use_groups == 1` (the index structure is `[1, k]` in both cases, using the first row). The root is `"theta_ref_k"` or `"mu_theta_ref_k"` depending on `use_groups`.
- Spec resolution: The function attempts to filter `family$param_specs` for specs with `scope` in `c("per_observation", "per_group")`. If the filtered length does not equal $K$, it falls back to taking the first $K$ specs positionally (`family$param_specs[seq_len(K)]`).
- Non-finite contraction: If `var_prior` is `Inf`, `0`, or the ratio produces `NaN`/`Inf`, the contraction is `NA_real_` and status is `"skipped"` with message `"non-finite contraction"`.
- Insufficient draws: If the draws column exists but has fewer than 4 elements (`length(draws_k) < 4L`), the slot is skipped with message `"draws for '<param>' unavailable"` (even though draws technically exist, the sample is too small for a reliable variance estimate).
- No canonical variance: If `.gdpar_canonical_prior_variance(kind)` returns a non-finite value, the slot is skipped with message reporting the kind.
- The information-error raises a `gdpar_information_error` warning class elsewhere in the pipeline; this function itself only classifies and does not emit warnings or errors.
- The `prior` argument is accepted for future extensibility but is currently unused within the function body.
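The parameter-name construction and the insufficient-draws rule from the notes above can be illustrated with a short sketch. It assumes the draws are available as a plain numeric matrix with Stan-style column names; `slot_posterior_variance` and the `"computed"` status label are hypothetical, and the contraction formula itself is deliberately left out because its exact form is not stated here.

```r
# Hypothetical sketch: per-slot posterior variance with the documented
# parameter naming and the "fewer than 4 draws" skip rule.
slot_posterior_variance <- function(draws, K, use_groups = 0L) {
  param_root <- if (use_groups == 1L) "mu_theta_ref_k" else "theta_ref_k"
  lapply(seq_len(K), function(k) {
    # Index structure is [1, k] in both the grouped and ungrouped case.
    param <- paste0(param_root, "[1,", k, "]")
    draws_k <- if (param %in% colnames(draws)) as.numeric(draws[, param]) else NULL
    if (is.null(draws_k) || length(draws_k) < 4L) {
      return(list(param = param, var_post = NA_real_, status = "skipped"))
    }
    # "computed" is an illustrative label, not one of the package's statuses.
    list(param = param, var_post = stats::var(draws_k), status = "computed")
  })
}

draws <- matrix(c(1, 2, 3, 4, 5), ncol = 1,
                dimnames = list(NULL, "theta_ref_k[1,1]"))
out <- slot_posterior_variance(draws, K = 2L)
out[[1]]$var_post  # sample variance of the five draws
out[[2]]$status    # second slot has no matching column
```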
Purpose S3 print method for objects of class "gdpar_identifiability_report". Renders a human-readable summary of the identifiability diagnostic to the console, covering all possible sub-report sections: eigenvalue/condition-number diagnostics, collinear directions (C1–C4), C4-bis per-coordinate cross-component checks, and D-ID pre-fit parameter analysis. Exported for user convenience.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list with S3 class `"gdpar_identifiability_report"` | The identifiability report object. Expected components are detailed in the Returns section below. |
| `...` | (unused) | Present for S3 generic compatibility; ignored. |
Returns Invisibly returns x (the input object), following standard R print-method conventions. The primary effect is the side effect of printing formatted text to the console via cat().
Notes
- Top-level fields: Always prints `x$passed`. Prints `lambda_min`, `lambda_max`, `condition_number`, and `tol_used` only if `x$lambda_min` is not `NA`. Prints `x$rigor_used` if non-`NULL`. Prints `x$message` unconditionally.
- Collinear directions (C1–C4): If `x$passed` is `FALSE` and `x$collinear_directions` is non-`NULL`, iterates over each direction entry, printing the eigenvalue and each column/coefficient pair in the direction vector. Each entry `d` is expected to have `d$eigenvalue` (numeric), `d$columns` (character vector), and `d$coefficients` (numeric vector of same length as `d$columns`).
- C4-bis section: If `x$c4_bis` is non-`NULL`, iterates over `x$c4_bis$per_k`. Each entry `pk` is expected to have `pk$coord`, `pk$rigor`, `pk$passed`, `pk$condition_number` (may be `Inf`), `pk$shared_cols` (character vector, may be empty), and optionally `pk$collinear_directions` with the same structure as above.
- D-ID pre-fit section: If `x$did_pre_fit` is non-`NULL`, prints the total `K`, overall `passed` flag, and per-parameter details (`name`, `scope`, `prior_canonical_kind`, `did_status`, `did_condition`). Also prints `symbolic_separability` if present, including `passed` and `overlapping_kinds`.
- Formatting: Uses `format(..., digits = 4)` for eigenvalues and condition numbers, `format(..., digits = 3, width = 7)` for direction coefficients. Indentation uses fixed whitespace strings.
- No validation is performed on the structure of `x`; missing components are guarded by `is.null`/`is.na` checks. If `x` lacks expected fields, the corresponding section is silently omitted.
- The function does not raise errors under normal conditions; it is purely presentational.
Purpose
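Checks the identifiability condition (C7) of Block 6.5 for group aliasing. Ensures that columns of the design matrices for the $a$ and $b$ components are not aliased with the group indicators, either directly (a column constant within every group) or indirectly (a rank deficiency of the joint matrix).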
Arguments
- `design`: A list containing design matrices. In a univariate setting it should have elements `Z_a`, `Z_b`, `Z_a_names`, `Z_b_names`. In a multi‑coordinate setting it should have `Z_a_list`, `Z_b_list`, `Z_a_names_list`, `Z_b_names_list` (each a list of length $p$, the number of coordinates).
- `group_id`: A vector of group identifiers. Converted internally to an integer factor. If `NULL` or with fewer than two levels, the check is skipped.
- `group_var_name`: A character string naming the grouping variable (used in error messages).
- `tol`: Numeric tolerance for comparing variances and QR‑rank deficiency (default `1e-8`).
Mathematics
The function implements two checks per design block (see .check_c7_one_block):
- Within‑group variance test: For each column $j$ of a design matrix $Z$, compute the within‑group variance $s_{jg}^2$ for each group $g$. If $s_{jg}^2 \le \text{tol}$ for all groups (i.e., the column is constant within every group), identifiability is violated.
- Rank test: Let $G$ be the indicator matrix for the groups. The combined matrix $M = [G \mid Z]$ must have full column rank. If $\mathrm{rank}(M) < \text{ncol}(M)$ (after column normalization), there exists an indirect alias between the group anchor and a linear combination of the design columns.
Returns
`invisible(NULL)`. The function is called for its side effects (raising errors).
Notes
- Internal function (not exported, leading dot).
- If `group_id` is `NULL` or has fewer than two levels, the function returns immediately without checks.
- For multi‑coordinate designs (`has_multi_design` is `TRUE`), the check is applied to each coordinate block separately, iterating over `seq_len(p)` where `p = length(design$Z_a_list)`.
- For each block, it calls `.check_c7_one_block` with the appropriate design sub‑matrix, column names, and coordinate index.
- Errors are raised via `gdpar_abort()` with class `"gdpar_input_error"` and a structured data list containing the component, coordinate (if any), group variable name, and (for variance violations) the names of the aliased columns.
Purpose
Internal helper that applies the two‑layer aliasing check (C7) to a single design block (one component and one coordinate; `coord = NA` for univariate). It performs the within‑group variance test and the joint rank test.
Arguments
- `Z`: Numeric design matrix (may have zero columns or rows). If `NULL`, zero columns, or zero rows, the function returns immediately.
- `Z_names`: Character vector of column names for `Z`, used in error messages to identify aliased columns.
- `component`: Character string, either `"a"` or `"b"`, indicating which model component the block belongs to.
- `coord`: Integer coordinate index $k$ (for multi‑coordinate designs) or `NA_integer_` for univariate. Used only for error message formatting.
- `group_int`: Integer vector of group memberships (values in $1, \dots, J_{\text{groups}}$) for each observation.
- `J_groups`: Integer number of groups. (Received but not used directly in the computations; group structure is encoded in `group_int` and the indicator matrix $G$.)
- `group_var_name`: Character string naming the grouping variable (for error messages).
- `tol`: Numeric tolerance for variance comparisons and QR‑rank deficiency.
Mathematics
- Within‑group variance test: For each column $j = 1, \dots, p$ (where $p = \text{ncol}(Z)$):

  $$ s_{jg}^2 = \begin{cases} 0 & \text{if group } g \text{ has fewer than 2 observations}, \\ \frac{1}{|g|-1} \sum_{i \in g} (Z_{ij} - \bar{Z}_{\cdot j g})^2 & \text{otherwise}, \end{cases} $$

  where $\bar{Z}_{\cdot j g}$ is the mean of column $j$ within group $g$. Let $m_j = \max_{g = 1, \dots, J_{\text{groups}}} s_{jg}^2$. If $m_j \le \text{tol}$, column $j$ is flagged as constant within every group.
- Rank test: Construct the group indicator matrix $G \in \mathbb{R}^{n \times J_{\text{groups}}}$ via `model.matrix(~ as.factor(group_int) + 0)`. Form $M = [G \mid Z]$. Normalize each column of $M$ by its Euclidean norm (with zero norms replaced by 1 to avoid division by zero). Compute the QR decomposition of the normalized matrix: $\text{qr}(M_{\text{norm}})$. If $\mathrm{rank}(M_{\text{norm}}) < \text{ncol}(M_{\text{norm}})$, the joint matrix is rank‑deficient, indicating an indirect alias.
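The two tests can be sketched in a few lines of base R. This is an illustrative approximation under stated assumptions: `check_c7_sketch` is a hypothetical name, it returns its findings as a list instead of raising errors through `gdpar_abort()`, and passing `tol` to `qr()` is an assumption about how the tolerance is applied.

```r
# Hypothetical sketch of the two-layer C7 check on one design block.
check_c7_sketch <- function(Z, group_int, tol = 1e-8) {
  # Layer 1: maximum within-group variance per column
  # (groups with fewer than 2 observations contribute 0).
  m <- apply(Z, 2, function(col) {
    max(tapply(col, group_int, function(v) if (length(v) < 2L) 0 else stats::var(v)))
  })
  constant_cols <- which(m <= tol)

  # Layer 2: rank of the column-normalized joint matrix [G | Z].
  G <- stats::model.matrix(~ as.factor(group_int) + 0)
  M <- cbind(G, Z)
  norms <- sqrt(colSums(M^2))
  norms[norms == 0] <- 1  # avoid division by zero
  M_norm <- sweep(M, 2, norms, "/")
  list(constant_cols = constant_cols,
       rank = qr(M_norm, tol = tol)$rank,
       ncol = ncol(M_norm))
}

g <- rep(1:2, each = 3)
Z <- cbind(c(5, 5, 5, 7, 7, 7),   # constant within each group: direct alias
           c(1, 2, 3, 1, 2, 3))   # varies within groups
res <- check_c7_sketch(Z, g)
res$constant_cols
res$rank < res$ncol  # the constant column also makes [G | Z] rank-deficient
```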
Returns
invisible(NULL) if all checks pass.
Notes
- Internal function (not exported, leading dot), called only by `.check_group_aliasing_c7`.
- Early returns with `invisible(NULL)` if:
  - `Z` is `NULL`,
  - `ncol(Z) == 0L`,
  - `nrow(Z) == 0L`.
- Two possible error conditions:
  - Direct aliasing (constant columns): If any column has within‑group variance $\le \text{tol}$, `gdpar_abort()` is called with a message listing the offending column names. The error data includes the component, coordinate, group variable name, and `aliased_columns`.
  - Indirect aliasing (rank deficiency): If the rank of the normalized combined matrix is less than the number of columns, `gdpar_abort()` is called with a message including the component, coordinate, group variable name, the observed rank, and the number of columns. The error data includes `rank` and `ncol`.
- Errors are of class `"gdpar_input_error"` and contain a `data` list for programmatic handling.
- The argument `J_groups` is passed but not used in any computation; the group structure is fully represented by `group_int` and the generated indicator matrix $G$.
This file defines three S3 methods supporting the comparison of Empirical-Bayes (EB) and Fully-Bayes (FB) estimation paths produced elsewhere in the gdpar package. The methods provide console printing, a structured summary, and printing of that summary.
Purpose
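S3 print method for objects of class `gdpar_eb_fb_comparison`. Produces a concise human-readable console summary of an EB-vs-FB comparison, including the estimation families and paths involved, the number of common $\xi$ parameters, the minimum, median, and maximum of the marginal TV values and of the width ratios, a preview of the $\theta$-diff table, and any warnings.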
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_eb_fb_comparison` (list) | The comparison object to display. Expected components: `family_eb`, `family_fb` (character), `path_eb`, `path_fb` (character), `level` (numeric), `tv_bins` (integer scalar), `n_common_params` (integer scalar), `tv_table` (data frame with a `tv` column, possibly `NULL`), `coverage_table` (data frame with a `width_ratio` column, possibly `NULL`), `theta_diff_table` (data frame, possibly `NULL`), `warnings` (character vector). |
| `digits` | integer scalar | Passed to `format()` for numeric formatting; defaults to `3L`. |
| `...` | (any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
No mathematical formula is implemented. The method computes order statistics over finite subsets of two distributions:
- TV values: $\{v : v \in \mathbb{R}\}$ over the entries $v$ of `tv_table$tv`, reporting $\min$, $\mathrm{median}$, $\max$.
- Width ratios: $\{r : r \in \mathbb{R}\}$ over the entries $r$ of `coverage_table$width_ratio`, reporting $\min$, $\mathrm{median}$, $\max$.
Non-finite values (NA, NaN, Inf, -Inf) are excluded via is.finite() before computing summary statistics.
Returns
The object x, returned invisibly (via invisible(x)). The primary effect is console output.
Notes
- The `level` component is formatted with `format(x$level, digits = digits)`; its type is not validated.
- If `tv_table` is `NULL` or has zero rows, or if all `tv` values are non-finite, the marginal TV line is silently omitted.
- If `coverage_table` is `NULL` or has zero rows, or if all `width_ratio` values are non-finite, the width-ratio line is silently omitted.
- The $\theta$-diff preview uses `utils::head(x$theta_diff_table, 6L)` and is passed through `format(..., digits = digits)` before `print()`.
- Warnings are printed one per line, prefixed with `" - "`.
- No validation or error-checking is performed on the structure of `x`; missing components would propagate as R errors (e.g., `$` on `NULL`).
Purpose
S3 summary method for objects of class gdpar_eb_fb_comparison. Constructs a structured list suitable for programmatic access and for the canonical print.summary.gdpar_eb_fb_comparison method. Aggregates the TV table and the coverage (width-ratio) table into seven-point summary statistics (count, min, 25th percentile, median, 75th percentile, max, mean).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fb_comparison` (list) | The comparison object to summarize. Same expected components as for `print.gdpar_eb_fb_comparison`, plus optionally `call`. |
| `...` | (any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
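For the TV distribution, let $V$ be the set of finite entries of `tv_table$tv`. If $|V| > 0$,

$$\texttt{tv\_summary} = \bigl(n = |V|,\ \min V,\ Q_{0.25}(V),\ \mathrm{median}(V),\ Q_{0.75}(V),\ \max V,\ \bar{V}\bigr),$$

where $Q_p$ is the sample quantile at probability $p$ (computed with `stats::quantile`) and $\bar{V}$ is the arithmetic mean.
For the width-ratio distribution, let $R$ be the set of finite entries of `coverage_table$width_ratio`. If $|R| > 0$, `coverage_summary` is built from $R$ in the same way.
If either finite subset is empty, the corresponding summary element is set to `NULL`.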
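A minimal sketch of such a seven-point summary follows. The function name `seven_point_summary` is hypothetical; the fields, the finite-value filter, the `NULL` result for an empty subset, and the use of `stats::quantile` with `unname()` follow the description in this section.

```r
# Hypothetical sketch of the seven-point summary over the finite values of x.
seven_point_summary <- function(x) {
  x <- x[is.finite(x)]                 # drops NA, NaN, Inf, -Inf
  if (length(x) == 0L) return(NULL)    # empty finite subset
  list(
    n      = length(x),
    min    = min(x),
    q25    = unname(stats::quantile(x, 0.25)),  # default type = 7
    median = stats::median(x),
    q75    = unname(stats::quantile(x, 0.75)),
    max    = max(x),
    mean   = mean(x)
  )
}

s <- seven_point_summary(c(1, 2, 3, 4, NA, Inf))
str(s)
```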
Returns
A list of class c("summary.gdpar_eb_fb_comparison", "list") with the following components:
| Component | Type | Description |
|---|---|---|
| `family_eb` | character | Copied from `object$family_eb`. |
| `family_fb` | character | Copied from `object$family_fb`. |
| `path_eb` | character | Copied from `object$path_eb`. |
| `path_fb` | character | Copied from `object$path_fb`. |
| `level` | (inherits from `object$level`) | Copied from `object$level`. |
| `tv_bins` | integer | Copied from `object$tv_bins`. |
| `n_common_params` | integer | Copied from `object$n_common_params`. |
| `n_anchor_cells` | integer | `0L` if `object$theta_diff_table` is `NULL`, otherwise `nrow(object$theta_diff_table)`. |
| `tv_summary` | list or `NULL` | Seven-element list (`n`, `min`, `q25`, `median`, `q75`, `max`, `mean`) or `NULL`. |
| `coverage_summary` | list or `NULL` | Seven-element list (same structure) or `NULL`. |
| `theta_diff_table` | data frame or `NULL` | Copied by reference from `object$theta_diff_table`. |
| `tv_table` | data frame or `NULL` | Copied by reference from `object$tv_table`. |
| `coverage_table` | data frame or `NULL` | Copied by reference from `object$coverage_table`. |
| `warnings` | character | `object$warnings`, with a default supplied through the null-coalescing operator when it is `NULL`. |
| `call` | (any) | Copied from `object$call`. |
Notes
- The `%||%` infix operator is used for the `warnings` default; this operator is assumed to be defined elsewhere in the package (not in this file). Under standard `` rlang::`%||%` `` semantics, it returns the left-hand side if it is not `NULL`, otherwise the right-hand side.
- Quantiles are computed with `stats::quantile` at probabilities $0.25$ and $0.75$; the default `type = 7` interpolation is used. The `names` attribute of the scalar result is stripped via `unname()`.
- The `q25` and `q75` fields are unnamed scalars; all other numeric fields inherit names from `min()`, `stats::median()`, `max()`, and `mean()` (typically `NULL` for these functions on atomic vectors).
- No copy is made of the table data frames; they are assigned by reference into the output list.
- The output class vector is `c("summary.gdpar_eb_fb_comparison", "list")`, enabling dispatch on both the specific class and the implicit `list` class.
Purpose
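S3 print method for objects of class `summary.gdpar_eb_fb_comparison`. Renders the structured summary to the console, displaying the EB/FB families and paths, the level, the count of common $\xi$ parameters and anchor cells, the TV and width-ratio summary blocks, the full $\theta$-diff table, and any warnings.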
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `summary.gdpar_eb_fb_comparison` (list) | The summary object to display. Expected components: `family_eb`, `family_fb`, `path_eb`, `path_fb` (character), `level` (numeric), `n_common_params` (integer scalar), `n_anchor_cells` (integer scalar), `tv_summary` (list or `NULL`), `coverage_summary` (list or `NULL`), `theta_diff_table` (data frame or `NULL`), `warnings` (character vector). |
| `digits` | integer scalar | Passed to `format()` for numeric formatting; defaults to `3L`. |
| `...` | (any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
No mathematical formula is implemented. The method formats and displays pre-computed summary statistics. For each distribution (TV and width-ratio), it prints the stored statistics from the corresponding summary list.
Returns
The object x, returned invisibly (via invisible(x)). The primary effect is console output.
Notes
- Unlike `print.gdpar_eb_fb_comparison` (which shows only the first six rows of `theta_diff_table`), this method prints the full `theta_diff_table` via `print(format(x$theta_diff_table, digits = digits))`.
- The `tv_summary` and `coverage_summary` blocks are each printed only if the corresponding component is non-`NULL`. Each block includes the sample size `n` in its header.
- The `level` is formatted with `format(x$level, digits = digits)`.
- Warnings are printed one per line, prefixed with `" - "`, only if `length(x$warnings) > 0L`.
- The `tv_bins` component is not printed by this method (unlike `print.gdpar_eb_fb_comparison`), even though it is present in the summary object.
- No structural validation of `x` is performed; missing or mistyped components would propagate as R errors.
Purpose (role in the package).
Exported orchestrator for Sub-phase 8.6.E (Charter Section 3.5, decision 2.5 Trio of vignettes). It produces a descriptive operational comparison between an Empirical-Bayes fit (gdpar_eb_fit, from gdpar_eb()) and a Fully-Bayes fit (gdpar_fit, from gdpar()) fitted on the same dataset. It does not assert algorithmic equivalence and does not test hypotheses across the two inferential frames. It computes three tables:
- Per-anchor-cell differences in the population anchor $\theta_{\text{ref}}$.
- Marginal empirical total-variation (TV) distance between the lower-level posteriors of $\xi = (a,\, b,\, W,\, \text{dispersion})$, parameter by parameter.
- Operational verification of the higher-order coverage discrepancy (v07 Section 6, Proposition 7B scalar / 7B* matricial / 7B* tensorial) on the nominal EB and FB credible intervals.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `eb_fit` | object of class `gdpar_eb_fit` | Empirical-Bayes fit produced by `gdpar_eb()`. Covers all four path regimes: K = 1 + p = 1; Path A (K = 1 + p > 1); Path B (K > 1 + p = 1); Path C (K > 1 + p > 1, via the K × p tensor extension of Sub-phase 8.6.D). |
| `fb_fit` | object of class `gdpar_fit` | Fully-Bayes fit produced by `gdpar()`. Must have been fitted on the same dataset (same outcome, same covariates, same K / p regime). The comparator does not refit either model. |
| `level` | numeric scalar | Credible-interval level for coverage-discrepancy reporting. Must lie in $(0, 1)$; defaults to `0.95`. |
| `tv_bins` | integer scalar | Number of histogram bins used to approximate the marginal TV distance per parameter. Must be $\geq 5$; defaults to `30L`. Larger values give a finer empirical TV but require more draws per parameter for stability. |
| `...` | (any) | Reserved for future arguments; currently unused. |
Mathematics
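The marginal total-variation distance between two distributions $P$ and $Q$ is approximated per parameter from histograms with `tv_bins` bins:

$$\widehat{\mathrm{TV}}(P, Q) = \frac{1}{2} \sum_{b=1}^{B} \bigl\lvert \hat{p}_b - \hat{q}_b \bigr\rvert,$$

where $B$ is `tv_bins` and $\hat{p}_b$, $\hat{q}_b$ are the empirical proportions of EB and FB draws falling in bin $b$.
The coverage-discrepancy table compares EB-nominal vs FB-nominal credible-interval widths at level `level` per anchor cell, operationally verifying the higher-order coverage discrepancy of Proposition 7B.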
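A histogram-based empirical TV between two sets of draws can be sketched as below. This is a generic illustration, not the package's routine: `tv_hist_sketch` is a hypothetical name, and the choice of equal-width bins spanning the pooled range of both samples is an assumption.

```r
# Hypothetical sketch: empirical total-variation distance from shared histogram bins.
tv_hist_sketch <- function(x, y, bins = 30L) {
  x <- x[is.finite(x)]
  y <- y[is.finite(y)]
  rng <- range(c(x, y))
  if (rng[1] == rng[2]) return(0)  # both samples sit on a single point
  breaks <- seq(rng[1], rng[2], length.out = bins + 1L)
  bin_props <- function(v) {
    idx <- findInterval(v, breaks, rightmost.closed = TRUE)
    idx <- pmin(pmax(idx, 1L), bins)   # guard against boundary rounding
    tabulate(idx, nbins = bins) / length(v)
  }
  0.5 * sum(abs(bin_props(x) - bin_props(y)))
}

set.seed(1)
eb_draws <- rnorm(2000, mean = 0)
fb_draws <- rnorm(2000, mean = 0.5)
tv_hist_sketch(eb_draws, fb_draws, bins = 30L)
```

The estimate lies in $[0, 1]$: identical samples give 0 and samples with no shared bin give 1. With few draws per bin the estimate is noisy, which matches the remark that larger `tv_bins` values need more draws.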
Returns
An object of class c("gdpar_eb_fb_comparison", "list") with components:
| Component | Type / Structure | Meaning |
|---|---|---|
| `theta_diff_table` | data.frame or `NULL` | Per-anchor-cell comparison of EB vs FB $\theta_{\text{ref}}$, built by `.gdpar_eb_fb_theta_diff_table`. |
| `tv_table` | data.frame or `NULL` | Marginal TV distance per common $\xi$ parameter; `NULL` when EB or FB draws are unavailable or there are zero common parameter names. |
| `coverage_table` | data.frame or `NULL` | Coverage-discrepancy table (EB-nominal vs FB-nominal credible-interval widths per anchor cell). |
| `level` | numeric scalar | Echo of the `level` input. |
| `tv_bins` | integer scalar | Echo of the `tv_bins` input. |
| `n_common_params` | integer | Number of rows in `tv_table` (or `0L` if `tv_table` is `NULL`). |
| `path_eb` | character | Path identifier from `eb_fit$path`, defaulting to `"eb"`. |
| `path_fb` | character | Path identifier from `fb_fit$path`, defaulting to `"fb"`. |
| `family_eb` | character | `eb_fit$family$name`, or `NA_character_` if absent. |
| `family_fb` | character | `fb_fit$family$name`, or `NA_character_` if absent. |
| `call` | call | The matched call. |
| `warnings` | character vector | Accumulated fallback notification messages from helper extractors. Empty (`character(0L)`) in the happy path. |
| `meta` | list | Contains `mode = "compare_eb_fb"` and a human-readable `note` summarizing the TV and coverage methodology. |
Companion S3 methods print.gdpar_eb_fb_comparison and summary.gdpar_eb_fb_comparison are documented separately.
Notes
- Input validation. `eb_fit` must inherit `"gdpar_eb_fit"`; `fb_fit` must inherit `"gdpar_fit"`. `level` must be a single numeric value in $(0,1)$. `tv_bins` must be a single numeric value $\geq 5$ (coerced to integer). Violations raise a `gdpar_input_error` via `gdpar_abort()`.
- Required namespace. The `posterior` package is required (suggested dependency); an informative error is raised if absent.
- Warning accumulation. A local `emit()` closure accumulates messages into `warnings_msg` (and also calls `gdpar_warn()` for each). Six distinct fallback conditions are checked, including:
  - All-FB `theta_ref` draws are `NA` (unknown template convention or empty draws).
  - EB $\xi$ draws are `NULL`.
  - FB $\xi$ draws are `NULL`.
  - TV table is `NULL` despite both draw sets being non-`NULL` (zero common parameter names).
  - All-FB widths in the coverage table are `NA`.
- FB conditional_fit fallback. For the FB $\xi$ draws extraction, the function first tries `fb_fit$conditional_fit` and falls back to `fb_fit$fit` via the `%||%` null-coalescing operator.
- Path uniformity. All four EB regimes are handled uniformly by the helpers. For Path C the `theta_ref_kp_hat` tensor is flattened to a length-$Kp$ vector keyed by (slot, coord); the joint $K \times p$ inflation tensor is reported in the coverage table per cell via diagonal-block entries.
- S3 dispatch. The returned object carries class `c("gdpar_eb_fb_comparison", "list")` for dispatch by the companion print/summary methods.
Purpose (role in the package).
Internal helper that builds the `theta_diff_table` component of the comparison object. It extracts the EB anchor point estimates and their standard errors, attempts to extract FB posterior draws of $\theta_{\text{ref}}$ via `.gdpar_eb_fb_extract_theta_ref_draws_fb()`, and computes per-anchor-cell differences. The row key structure varies by path regime.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `eb_fit` | object of class `gdpar_eb_fit` | Empirical-Bayes fit. Path regime is inferred from `eb_fit$path` (checked for `"eb_KxP"` for Path C). Contains `theta_ref_kp_hat` / `theta_ref_kp_se` (Path C), or `theta_ref_hat` / `theta_ref_se` (other paths). |
| `fb_fit` | object of class `gdpar_fit` | Fully-Bayes fit. Passed to the FB draws extractor. |
| `level` | numeric scalar in $(0, 1)$ | Credible-interval level. Accepted by the function signature but not used in the current body (reserved for future use or passed downstream elsewhere). |
Mathematics
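For each anchor cell $i$:

$$\texttt{diff}_i = \texttt{eb\_estimate}_i - \texttt{fb\_mean}_i, \qquad \texttt{diff\_rel}_i = \frac{\texttt{diff}_i}{\texttt{fb\_se}_i},$$

where $\texttt{diff\_rel}_i$ is `NA_real_` when $\texttt{fb\_se}_i$ is not finite or $\leq 0$.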
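The per-cell difference and its guarded standardized version can be sketched for the flat (non-Path-C) case as follows. `theta_diff_sketch` is a hypothetical name and the inputs are plain numeric vectors rather than fit objects; the column names and the guard on `fb_se` follow the table described in this section.

```r
# Hypothetical sketch of the flat theta-diff table.
theta_diff_sketch <- function(eb_estimate, eb_se, fb_mean, fb_se) {
  diff <- eb_estimate - fb_mean
  # Standardize only where the FB posterior SD is finite and strictly positive.
  ok <- is.finite(fb_se) & fb_se > 0
  diff_rel <- ifelse(ok, diff / fb_se, NA_real_)
  data.frame(cell = seq_along(eb_estimate),
             eb_estimate = eb_estimate, eb_se = eb_se,
             fb_mean = fb_mean, fb_se = fb_se,
             diff = diff, diff_rel = diff_rel)
}

tab <- theta_diff_sketch(eb_estimate = c(1.0, 2.0, 3.0),
                         eb_se       = c(0.1, 0.1, 0.1),
                         fb_mean     = c(0.5, 2.0, NA),
                         fb_se       = c(0.25, 0, NA))
tab
```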
Returns
A data.frame whose structure depends on path regime:
- Path C (`eb_fit$path == "eb_KxP"`, i.e. $K > 1$ and $p > 1$): One row per $(g, k, c)$ triple ($g = 1,\ldots,J$ groups; $k = 1,\ldots,K$ slots; $c = 1,\ldots,p$ coordinates). Columns:

  | Column | Type | Meaning |
  |---|---|---|
  | `group` | integer | Group index $g$. |
  | `slot` | character | Slot name from `eb_fit$slot_names[k]`. |
  | `coord` | integer | Coordinate index $c$. |
  | `eb_estimate` | numeric | `eb_fit$theta_ref_kp_hat[g, k, c]`. |
  | `eb_se` | numeric | `eb_fit$theta_ref_kp_se[g, k, c]`. |
  | `fb_mean` | numeric | Posterior mean of FB draws for cell $(g,k,c)$; `NA_real_` if draws unavailable. |
  | `fb_se` | numeric | Posterior SD of FB draws; `NA_real_` if draws unavailable. |
  | `diff` | numeric | `eb_estimate - fb_mean`. |
  | `diff_rel` | numeric | `(eb_estimate - fb_mean) / fb_se`, or `NA_real_` when `fb_se` is not finite or $\leq 0$. |

- Non-Path-C (K = 1 + p = 1, Path A, Path B): One row per anchor cell, indexed sequentially. Columns:

  | Column | Type | Meaning |
  |---|---|---|
  | `cell` | integer | Sequential cell index $1,\ldots,n_{\text{cells}}$. |
  | `eb_estimate` | numeric | `as.numeric(eb_fit$theta_ref_hat)[i]`. |
  | `eb_se` | numeric | `as.numeric(eb_fit$theta_ref_se)[i]`. |
  | `fb_mean` | numeric | From `fb_draws$flat$means[i]`; `NA_real_` if absent. |
  | `fb_se` | numeric | From `fb_draws$flat$ses[i]`; `NA_real_` if absent. |
  | `diff` | numeric | `eb_estimate - fb_mean`. |
  | `diff_rel` | numeric | Conditional ratio as above; uses vectorized `ifelse`. |
Notes
- FB draws extraction. Calls `.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)` inside a `tryCatch` that silently returns `NULL` on any error. When `NULL`, all FB columns are filled with `NA_real_`.
- Path C iteration. Uses a triple nested `for` loop over $(g, k, c)$, pre-allocating a list of length $J \cdot K \cdot p$. Each list element is a single-row `data.frame`, assembled at the end via `do.call(rbind, rows)`. FB draws for each cell are accessed as `fb_draws$kp[[g]][[k]][, c]`.
- Non-Path-C path. EB estimates and SEs are coerced to numeric vectors via `as.numeric()`. FB means and SEs are taken from `fb_draws$flat$means` and `fb_draws$flat$ses`, with length truncated to `min(n_cells, length(fb_draws$flat$means))`.
- Edge case, zero-length FB draws. For Path C, if a cell's FB draw vector is `NULL` or has `length == 0`, both `fb_mean` and `fb_se` are set to `NA_real_`.
- Edge case, zero FB SE. `diff_rel` is `NA_real_` whenever `fb_se` is `NA`, not finite, or $\leq 0$.
- The `level` argument is accepted but not used in the computation within this helper; it is present for interface consistency with the other helpers.
Purpose (role in the package).
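Internal helper that extracts the FB posterior draws of $\theta_{\text{ref}}$ from a `gdpar_fit` object in a path-aware manner. Used by `.gdpar_eb_fb_theta_diff_table` to obtain the FB posterior summaries (means and standard deviations) for each anchor cell.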
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fb_fit` | object of class `gdpar_fit` | Fully-Bayes fit whose Stan draws are to be inspected for $\theta_{\text{ref}}$ variables. |
Mathematics
No formula is implemented by this helper; it is a pure data-extraction utility that retrieves posterior draws from the Stan output using the canonical posterior::as_draws_matrix interface.
Returns
A list whose structure depends on the path regime of fb_fit:
| Path regime | Component | Structure | Meaning |
|---|---|---|---|
| Non-Path-C (K = 1 + p = 1 / Path A / Path B) | `flat` | Named list with elements `means` (numeric vector) and `ses` (numeric vector) | Posterior means and standard deviations of $\theta_{\text{ref}}$ under the `theta_ref[...]` or `theta_ref_k[...]` conventions. |
| Path C (K > 1 + p > 1) | `kp` | Nested list: `kp[[g]][[k]]` is a matrix with $S$ rows (draws) and $p$ columns (coordinates) | Posterior draws of $\theta_{\text{ref}}$ under the `theta_ref_kp[...]` convention. |
Returns NULL when extraction fails (draws are not present in the recognized variable-name convention, or fb_fit lacks draw data).
Notes
- Path C debt. The documentation explicitly notes that the K × p FB template for Path C is itself a follow-on debt of the 8.4 unification effort per Charter and the `project_gdpar_deuda_8_4_unificacion_stan` debt item.
- Fail-silent design. The function returns `NULL` on failure rather than raising an error. The calling orchestrator (`gdpar_compare_eb_fb`) detects this via downstream `NULL` checks and emits structured warnings through the `emit()` closure.
- `@keywords internal` / `@noRd`. This function is internal and does not generate an `.Rd` help page.
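Purpose Extracts the posterior draws of reference-anchor parameters ($\theta_{\mathrm{ref}}$) from a Fully-Bayes fit, dispatching on the variable-naming convention. Returns `NULL` if no $\theta_{\mathrm{ref}}$ variables are found.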
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fb_fit` | list | A fitted FB model object. The function first looks for `fb_fit$conditional_fit` (an EB conditional fit embedded within the FB workflow) and falls back to `fb_fit$fit` (the raw FB fit). Each is expected to possess a `$draws()` method returning a posterior draws object. |
Mathematics
No closed-form formula is implemented. The function is a dispatch-and-extraction routine that selects the appropriate draws-dimension based on variable naming:
- Path C convention: variables matching `^theta_ref_kp\[` (three-index tensor $\theta_{\mathrm{ref\_kp}}[g,k,c]$).
- Path B convention: variables matching `^theta_ref_k\[` (one-dimensional, per-slot).
- Path A / default convention: variables matching `^theta_ref(\[|$)` (scalar or simple vector).
For the "flat" returns (Paths A and B), the function computes:
$$\bar{\theta}_j = \frac{1}{S}\sum_{s=1}^{S} \theta_j^{(s)}, \qquad \mathrm{sd}_j = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\bigl(\theta_j^{(s)} - \bar{\theta}_j\bigr)^2}$$
where $S$ is the number of posterior draws and $\theta_j^{(s)}$ is the $s$-th draw of cell $j$.
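The priority-ordered detection and the flat summaries can be illustrated on a plain numeric matrix with named columns. This is a simplified sketch: `detect_theta_ref_sketch` is a hypothetical name, it works on a base matrix rather than a `posterior` draws object, and it summarizes every convention as means and SDs, whereas the documented function returns a nested `kp` list for Path C.

```r
# Hypothetical sketch: pick the first matching naming convention, then summarize.
detect_theta_ref_sketch <- function(mat) {
  vars <- colnames(mat)
  if (is.null(vars) || length(vars) == 0L) return(NULL)
  # Checked in priority order: Path C (kp), Path B (k), Path A / default.
  patterns <- c(kp = "^theta_ref_kp\\[",
                k = "^theta_ref_k\\[",
                default = "^theta_ref(\\[|$)")
  for (conv in names(patterns)) {
    hit <- grep(patterns[[conv]], vars, value = TRUE)
    if (length(hit) > 0L) {
      sub <- mat[, hit, drop = FALSE]
      return(list(convention = conv,
                  means = unname(colMeans(sub)),
                  ses = unname(apply(sub, 2, stats::sd))))
    }
  }
  NULL  # no theta_ref variables found
}

mat <- cbind(c(1, 2, 3), c(2, 4, 6), c(0, 0, 0))
colnames(mat) <- c("theta_ref_k[1,1]", "theta_ref_k[1,2]", "sigma")
detect_theta_ref_sketch(mat)
```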
Returns
- `NULL` if no draws object is available, if `$draws()` errors, if variable names are empty, or if no $\theta_{\mathrm{ref}}$ variables are found.
- `list(kp = <nested list>)` (Path C): a nested list produced by `.gdpar_eb_fb_unpack_kp`, indexed as `kp[[g]][[k]][, c]` with $g=1,\dots,J$ (groups), $k=1,\dots,K$ (slots), $c=1,\dots,p$ (coordinates), each cell containing a numeric vector of posterior draws.
- `list(flat = list(means = <numeric>, ses = <numeric>))` (Paths A/B): unnamed numeric vectors of posterior means and standard deviations, one element per $\theta_{\mathrm{ref}}$ cell.
Notes
- Uses the null-coalescing operator `%||%` (from `rlang`) to choose `conditional_fit` over `fit`.
- Both the `$draws()` call and the `dimnames()` access are wrapped in `tryCatch`, silently returning `NULL` on error; this is a defensive pattern for partially-initialized or incomplete fit objects.
- Variable detection proceeds in priority order: Path C (kp) is checked first, then Path B (k), then Path A (default). Only the first match is returned.
- The comment notes that the Path C "kp" branch is not consumed by the diff table when the EB fit is $K=1, p=1$ (Path A); only the kp branch itself touches that structure.
- Depends on the `posterior` package for `as_draws_matrix` and `subset_draws`.
Purpose
Unpacks a draws matrix containing Path C-style `theta_ref_kp[g,k,c]` columns into a nested list indexed by group, slot, and coordinate.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `mat` | `posterior::draws_matrix` | A draws matrix whose columns are the `theta_ref_kp[g,k,c]` variables. |
| `vars_c` | character | Character vector of variable names matching the `theta_ref_kp[...]` pattern (used for parsing index triples). |
Mathematics
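The function reconstructs the three-index tensor $\theta_{\mathrm{ref}_kp}[g,k,c]$ from flat column names. For each variable name of the form `theta_ref_kp[g,k,c]`, the integer indices $(g, k, c)$ are parsed from the bracketed suffix.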
Returns
A nested list kp of depth 3:
- `kp`: list of length $J$ (groups/observations).
- `kp[[g]]`: list of length $K$ (slots).
- `kp[[g]][[k]]`: matrix of dimension $S \times p$ ($S$ = number of posterior draws, $p$ = number of coordinates). If a particular `theta_ref_kp[g,k,c]` column is absent from `mat`, that column is filled with `NA_real_`.
Notes
- The regex `"\\[(\\d+),(\\d+),(\\d+)\\]"` with `regexec` captures exactly three comma-separated integers inside square brackets. If a parsed match has fewer than 4 elements (the full match plus 3 groups), `c(NA, NA, NA)` is substituted.
- The result matrix `kp[[g]][[k]]` is initialized to `NA_real_` before filling, so any missing column in `mat` yields `NA` draws for that cell rather than an error.
- This function is marked `@keywords internal` and `@noRd`; it is not exported.
- Uses `sprintf` to reconstruct the expected column name for lookup in `mat`.
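The unpacking logic described above can be sketched in base R. This is an illustrative re-implementation, not the package's code: the name `unpack_kp_sketch()` is hypothetical, a plain numeric matrix stands in for a `posterior::draws_matrix`, and taking $J$, $K$, $p$ as the largest parsed indices is an assumption.

```r
# Hypothetical sketch of the unpacking step; not the package's implementation.
unpack_kp_sketch <- function(mat, vars_c) {
  m <- regmatches(vars_c, regexec("\\[(\\d+),(\\d+),(\\d+)\\]", vars_c))
  idx <- vapply(m, function(x) {
    # full match plus three groups expected; otherwise substitute NAs
    if (length(x) < 4L) rep(NA_integer_, 3L) else as.integer(x[2:4])
  }, integer(3L))                      # 3 x length(vars_c); rows are g, k, c
  J <- max(idx[1L, ], na.rm = TRUE)    # assumption: sizes = largest index seen
  K <- max(idx[2L, ], na.rm = TRUE)
  p <- max(idx[3L, ], na.rm = TRUE)
  lapply(seq_len(J), function(g) {
    lapply(seq_len(K), function(k) {
      cell <- matrix(NA_real_, nrow = nrow(mat), ncol = p)  # NA until filled
      for (cc in seq_len(p)) {
        nm <- sprintf("theta_ref_kp[%d,%d,%d]", g, k, cc)   # expected column name
        if (nm %in% colnames(mat)) cell[, cc] <- mat[, nm]
      }
      cell
    })
  })
}

vars <- c("theta_ref_kp[1,1,1]", "theta_ref_kp[1,1,2]", "theta_ref_kp[2,1,1]")
mat  <- matrix(seq_len(12), nrow = 4, dimnames = list(NULL, vars))
kp   <- unpack_kp_sketch(mat, vars)
dim(kp[[2]][[1]])    # 4 draws by 2 coordinates
kp[[2]][[1]][, 2]    # column absent from mat, so every draw is NA
```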
Purpose
Extracts the posterior draws of the "xi" parameter vector (the non-anchor model parameters, such as fixed-effect coefficients), excluding reference-anchor parameters, raw and generated quantities, and `lp__`. This function is used by both the EB conditional fit and the FB fit.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_obj` | list | A fitted model object (either EB conditional or FB) expected to have a `$draws()` method returning a posterior draws object. May be `NULL`. |
Mathematics
No formula is implemented. The function performs a filtering operation on variable names. Variables are retained if they do not match any of the following exclusion patterns (joined by |):
| Pattern | Excluded variables |
|---|---|
| `^lp__$` | Log posterior density |
| `^theta_ref` | Reference-anchor parameters (all variants) |
| `^mu_theta_ref` | Mean of reference-anchor hyperparameters |
| `^sigma_theta_ref` | SD of reference-anchor hyperparameters |
| `^eta` | Linear predictor generated quantities |
| `^eta_kp` | Path C linear predictor |
| `^log_lik` | Pointwise log-likelihood (LOO) |
| `^y_pred` | Posterior predictive draws |
| `^theta_i` | Individual-level random effects |
| `^a_raw` | Raw (non-centered) fixed-effect coefficients |
| `^c_b_raw` | Raw covariance parameters |
| `^c_b_kp_raw` | Path C raw covariance parameters |
| `^W_raw` | Raw covariance matrix elements |
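The exclusion filter amounts to one alternation regex applied to the variable names. A minimal sketch, assuming illustrative variable names (`a[1]`, `sigma_y`) that are not taken from the package's Stan templates:

```r
# Patterns from the table above, joined by "|" into a single regex.
exclude <- paste(
  c("^lp__$", "^theta_ref", "^mu_theta_ref", "^sigma_theta_ref",
    "^eta", "^eta_kp", "^log_lik", "^y_pred", "^theta_i",
    "^a_raw", "^c_b_raw", "^c_b_kp_raw", "^W_raw"),
  collapse = "|"
)

# Hypothetical variable names, for illustration only.
vars <- c("lp__", "a[1]", "a_raw[1]", "theta_ref[1]",
          "sigma_y", "eta[3]", "log_lik[2]")

keep <- vars[!grepl(exclude, vars)]
keep  # "a[1]" "sigma_y"
```

Because `^eta` already matches any name beginning with `eta_kp`, the `^eta_kp` entry is redundant as a regex; it documents intent rather than changing the result.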
Returns
- `NULL` if `fit_obj` is `NULL`, if `$draws()` errors, if variable names are empty, or if no variables survive the exclusion filter.
- A `posterior::draws_matrix` containing only the retained "xi" columns.
Notes
- This function is the complement of `.gdpar_eb_fb_extract_theta_ref_draws_fb`: that function extracts only $\theta_{\mathrm{ref}}$ draws, while this one extracts everything except $\theta_{\mathrm{ref}}$ and generated quantities.
- Both `$draws()` and `dimnames()` accesses are wrapped in `tryCatch`, silently returning `NULL` on error.
- The exclusion list is hard-coded; adding new generated-quantity prefixes would require editing the `paste(...)` call.
- Marked `@keywords internal`, `@noRd`, not exported.
Purpose
Computes a marginal Total Variation (TV) distance between the EB and FB posterior distributions for each parameter common to both draws objects. This provides a diagnostic for how closely the EB approximation matches the full FB posterior on a per-parameter basis.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `draws_eb` | `posterior::draws_matrix` | Posterior draws from the EB (Empirical Bayes) fit. Column names are parameter names. |
| `draws_fb` | `posterior::draws_matrix` | Posterior draws from the FB (Full Bayes) fit. Column names are parameter names. |
| `tv_bins` | integer scalar | Number of bins for the shared histogram grid used in the TV computation. |
Mathematics
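For each parameter $\psi$ common to both draws objects, a shared grid is built over the combined range `rng` of the EB and FB draws.
The grid is divided into $B$ = `tv_bins` bins.
Histogram counts on this grid are normalized to bin proportions $\hat{p}_{\mathrm{eb},j}$ and $\hat{p}_{\mathrm{fb},j}$, $j = 1, \dots, B$.
The marginal TV distance is then the histogram plug-in estimator: $$\widehat{\mathrm{TV}}(\psi) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat{p}_{\mathrm{eb},j} - \hat{p}_{\mathrm{fb},j}\bigr|$$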
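A minimal sketch of the histogram plug-in estimator for a single parameter, using `graphics::hist(..., plot = FALSE)` for bin counting. The function name `tv_hist()` is hypothetical, and equal-width bins over the combined range are an assumption about how the shared grid is laid out.

```r
# Hypothetical single-parameter sketch; not the package's implementation.
tv_hist <- function(x_eb, x_fb, tv_bins = 30L) {
  rng <- range(c(x_eb, x_fb))
  # degenerate or non-finite range: no meaningful histogram
  if (!all(is.finite(rng)) || diff(rng) <= 0) return(NA_real_)
  # assumption: equal-width bins spanning the combined range
  breaks <- seq(rng[1L], rng[2L], length.out = tv_bins + 1L)
  p_eb <- graphics::hist(x_eb, breaks = breaks, plot = FALSE)$counts / length(x_eb)
  p_fb <- graphics::hist(x_fb, breaks = breaks, plot = FALSE)$counts / length(x_fb)
  0.5 * sum(abs(p_eb - p_fb))
}

tv_hist(c(0, 0.1, 0.2), c(0, 0.1, 0.2))          # identical samples: 0
tv_hist(c(0, 0.1), c(0.9, 1), tv_bins = 10L)     # disjoint support: 1
tv_hist(c(1, 1), c(1, 1))                        # zero-width range: NA
```

The estimate depends on `tv_bins`: too few bins hide differences, while too many make the estimate noisy for small draw counts.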
Returns
- `NULL` if either `draws_eb` or `draws_fb` is `NULL`, or if no common column names exist.
- A `data.frame` with one row per common parameter and columns:

| Column | Type | Meaning |
|---|---|---|
| `parameter` | character | Parameter name (column name in the draws objects). |
| `tv` | numeric | Marginal TV distance (`NA_real_` if the range is degenerate or non-finite). |
| `n_eb` | integer | Number of EB draws for this parameter. |
| `n_fb` | integer | Number of FB draws for this parameter. |
| `mean_eb` | numeric | Posterior mean from EB draws. |
| `mean_fb` | numeric | Posterior mean from FB draws. |
Notes
- If the combined range `rng` contains non-finite values or has zero width (`diff(rng) <= 0`), `tv` is set to `NA_real_` for that parameter.
- `intersect(colnames(draws_eb), colnames(draws_fb))` determines common parameters; order follows `draws_eb` column order.
- Uses `graphics::hist(..., plot = FALSE)` solely for bin counting; no plot is produced.
- The rows are assembled via `do.call(rbind, rows)` after a loop over common parameters.
- Marked `@keywords internal`, `@noRd`, not exported.
Purpose
Builds a coverage diagnostic table comparing credible-interval widths between the EB and FB posteriors for each $\theta_{\mathrm{ref}}$ cell.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `eb_fit` | list | A fitted EB model object. Must contain `theta_ref_hat`, `theta_ref_se`, and for Path C: `theta_ref_kp_hat`, `theta_ref_kp_se`, `K`, `p`, `slot_names`. May contain `correction_applied` (logical), `eb_correction_constant` (scalar), `correction_tensor_constant` (matrix), and `path` (character). |
| `fb_fit` | list | A fitted FB model object, passed to `.gdpar_eb_fb_extract_theta_ref_draws_fb`. |
| `level` | numeric scalar | Nominal credible level (e.g., 0.95). |
Mathematics
The significance level and critical value are:
$$\alpha = 1 - \mathrm{level}, \qquad z = \Phi^{-1}\bigl(1 - \alpha/2\bigr)$$
where $\Phi^{-1}$ is the standard normal quantile function.
Path C (`eb_fit$path == "eb_KxP"`): For each cell $(g, k, c)$:
- The EB standard error is $\mathrm{se}_{g,k,c}^{\mathrm{eb}}$ = `eb_fit$theta_ref_kp_se[g,k,c]`.
- If the EB correction was applied and the correction tensor $\mathbf{T} \in \mathbb{R}^{K \times p \times p}$ is available with finite entries, the inflation factor is: $$\mathrm{inflate}_{k,c} = \sqrt{1 + \frac{T[k,c,c]}{\max(1, J)}}$$ Otherwise $\mathrm{inflate}_{k,c} = 1$.
- The EB credible-interval width is: $$w_{g,k,c}^{\mathrm{eb}} = 2z \cdot \mathrm{se}_{g,k,c}^{\mathrm{eb}} \cdot \mathrm{inflate}_{k,c}$$
- The FB credible-interval width is: $$w_{g,k,c}^{\mathrm{fb}} = 2z \cdot \mathrm{sd}\bigl(\theta_{\mathrm{ref}_kp}^{(s)}[g,k,c]\bigr)$$ computed from the FB posterior draws (via `.gdpar_eb_fb_extract_theta_ref_draws_fb`), or `NA_real_` if unavailable.
- The width ratio is: $$R_{g,k,c} = \frac{w_{g,k,c}^{\mathrm{eb}}}{w_{g,k,c}^{\mathrm{fb}}}$$ set to `NA_real_` if $w^{\mathrm{fb}}$ is non-finite or zero.

Path A / Path B (non-Path-C): For each cell $j = 1, \dots, J_{\mathrm{flat}}$:
- The EB standard error is $\mathrm{se}_j^{\mathrm{eb}}$ = `eb_fit$theta_ref_se[j]`.
- If `eb_fit$correction_applied` is `TRUE`, the scalar inflation factor is: $$\mathrm{inflate} = \sqrt{1 + \frac{c_{\mathrm{eb}}}{\max(1, J_{\mathrm{flat}})}}$$ where $c_{\mathrm{eb}}$ = `eb_fit$eb_correction_constant` (defaulting to 0 if `NULL`). Otherwise $\mathrm{inflate} = 1$.
- EB width: $w_j^{\mathrm{eb}} = 2z \cdot \mathrm{se}_j^{\mathrm{eb}} \cdot \mathrm{inflate}$.
- FB width: $w_j^{\mathrm{fb}} = 2z \cdot \mathrm{sd}_{j}^{\mathrm{fb}}$ from the flat FB draws, or `NA_real_` if the FB draws are shorter or unavailable.
- The width ratio is computed element-wise with `ifelse`, yielding `NA_real_` when $w^{\mathrm{fb}}$ is non-finite or $\leq 0$.
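The flat (Path A / Path B) computation can be sketched as follows. The function name `eb_fb_width_ratio()` and its arguments are hypothetical, the normal critical value `qnorm(1 - alpha / 2)` is an assumption about how $z$ is obtained, and the sketch assumes `sd_fb` has the same length as `se_eb`.

```r
# Hypothetical sketch of the flat-path width comparison; not the package's code.
eb_fb_width_ratio <- function(se_eb, sd_fb, level = 0.95,
                              correction_applied = FALSE, c_eb = 0) {
  z      <- stats::qnorm(1 - (1 - level) / 2)   # assumption: normal critical value
  J_flat <- length(se_eb)                       # number of flat cells
  inflate <- if (isTRUE(correction_applied)) sqrt(1 + c_eb / max(1, J_flat)) else 1
  w_eb <- 2 * z * se_eb * inflate
  w_fb <- 2 * z * sd_fb
  # ratio is undefined when the FB width is non-finite or not positive
  ratio <- ifelse(is.finite(w_fb) & w_fb > 0, w_eb / w_fb, NA_real_)
  data.frame(cell = seq_along(se_eb), eb_width = w_eb, fb_width = w_fb,
             width_ratio = ratio, inflation = inflate)
}

out <- eb_fb_width_ratio(se_eb = c(0.1, 0.2), sd_fb = c(0.1, 0),
                         correction_applied = TRUE, c_eb = 2)
out$width_ratio   # first cell: sqrt(2); second cell: NA (zero FB width)
```

Since $z$ multiplies both widths, the ratio itself does not depend on `level`; only the reported widths do.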
Returns
- `NULL` is never explicitly returned (unlike the other functions in this section); instead the function always returns a `data.frame`.
- Path C: A `data.frame` with $J \times K \times p$ rows and columns:

| Column | Type | Meaning |
|---|---|---|
| `group` | integer | Group index $g$. |
| `slot` | character | Slot name from `eb_fit$slot_names[k]`. |
| `coord` | integer | Coordinate index $c$. |
| `eb_width` | numeric | EB credible-interval width. |
| `fb_width` | numeric | FB credible-interval width (or `NA`). |
| `width_ratio` | numeric | Ratio $R_{g,k,c}$ of EB to FB width (or `NA`). |
| `inflation` | numeric | The correction inflation factor $\mathrm{inflate}_{k,c}$. |

- Path A / Path B: A `data.frame` with $J_{\mathrm{flat}}$ rows and columns:

| Column | Type | Meaning |
|---|---|---|
| `cell` | integer | Sequential cell index $j$. |
| `eb_width` | numeric | EB credible-interval width. |
| `fb_width` | numeric | FB credible-interval width (or `NA`). |
| `width_ratio` | numeric | Ratio of EB to FB width (or `NA`). |
| `inflation` | numeric | Scalar correction inflation factor. |
Notes
- Path C vs. non-Path-C dispatch is determined by `identical(eb_fit$path, "eb_KxP")`.
- The FB draws extraction (`.gdpar_eb_fb_extract_theta_ref_draws_fb`) is wrapped in `tryCatch`; on error `fb_draws` is set to `NULL`, and all FB widths will be `NA_real_`.
- For Path C, the correction tensor entry `tensor[k, c, c]` is checked with `all(is.finite(tensor[k, c, c]))` before computing the inflation; if the tensor is `NULL` or the entry is non-finite, inflation defaults to 1.
- For Path A/B, `eb_fit$eb_correction_constant` defaults to 0 via `%||%` when `NULL`.
- The `width_ratio` is the key diagnostic: a ratio significantly above 1 indicates the EB credible intervals are wider than the FB intervals, consistent with the $O(n^{-1})$ under-coverage correction described in the referenced theoretical results.
- Marked `@keywords internal`, `@noRd`, not exported.
Purpose
S3 print method for objects of class gdpar_meta_learner_comparison. Renders a concise human-readable summary of a meta-learner comparison: the bridge identifier, observation/method counts, the credible level, per-external-method metadata (native CI availability, elapsed time, note count, presence of a predict_fun), and a head view of the three concordance matrices (RMSE, Pearson, MAD).
Arguments
- `x`: `gdpar_meta_learner_comparison`. The comparison object to display. Expected to contain components `n_obs`, `n_methods`, `level`, `external` (a named list of per-method adapter result lists, each with `native_ci`, `time_sec`, `notes`, `has_predict_fun`), and `comparison` (a list with `rmse`, `pearson`, `mad` matrices).
- `...`: any. Unused; present for S3 generic compatibility. Silently ignored.
Mathematics
None.
Returns
Invisibly returns x (via invisible(x)). The side effect is console output.
Notes
- The method does not validate `x`; it assumes the structure is present. Missing components would propagate as errors from `cat`/`print`.
- Per-method line format is fixed by `sprintf`: `"- %-12s native_ci = %s time = %.3f s notes = %d predict = %s\n"`. `native_ci` and `has_predict_fun` are coerced to character by `sprintf`'s `%s` (typically `"TRUE"`/`"FALSE"`).
- The three concordance matrices are printed with `round(..., 4L)`; they are expected to be square numeric matrices with shared row/column names of length $m = 1 + n_{\text{methods}}$ (bridge plus externals).
- S3 dispatch is triggered by `print(x)` when `class(x)` contains `"gdpar_meta_learner_comparison"`.
Purpose
S3 summary method for gdpar_meta_learner_comparison objects. Constructs a structured long-format summary containing: per-method ATE point estimates, per-method ATE CI bounds (averaged from per-observation native CIs when available, otherwise NA_real_), the three concordance matrices pivoted into long form, and per-method timing/CI-availability metadata.
Arguments
- `object`: `gdpar_meta_learner_comparison`. The comparison object. Must contain `bridge_cate` (with `cate_mean` and optionally `cate_ci`), `external` (named list of adapter results, each with `cate_mean`, optionally `cate_ci`, `time_sec`, `native_ci`), `comparison` (with `rmse`, `pearson`, `mad`), `level`, `n_obs`, `n_methods`.
- `...`: any. Unused; present for S3 generic compatibility.
Mathematics
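Per-method ATE is the sample mean of the per-observation CATE posterior means:
$$\widehat{\mathrm{ATE}}_m = \frac{1}{n}\sum_{i=1}^{n} \hat{\tau}_m(x_i)$$
where $n$ = `n_obs` and $\hat{\tau}_m(x_i)$ is element $i$ of method $m$'s `cate_mean`.
When native per-observation CIs are present, the ATE CI bounds are likewise the sample means of the per-observation lower and upper bounds:
$$\widehat{\mathrm{ATE}}_m^{\mathrm{lower}} = \frac{1}{n}\sum_{i=1}^{n} \ell_m(x_i), \qquad \widehat{\mathrm{ATE}}_m^{\mathrm{upper}} = \frac{1}{n}\sum_{i=1}^{n} u_m(x_i)$$
with $\ell_m(x_i)$ and $u_m(x_i)$ the per-observation bounds stored in `cate_ci`.
Otherwise both bounds are `NA_real_`.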
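The averaging can be illustrated with a small sketch. The helper `ate_row()` is hypothetical, and the layout of `cate_ci` as an $n \times 2$ matrix with the lower bound in column 1 and the upper bound in column 2 is an assumption, not something the documentation confirms.

```r
# Hypothetical helper: one row of an ATE table from per-observation CATE output.
# Assumption: cate_ci is an n x 2 matrix (column 1 = lower, column 2 = upper).
ate_row <- function(method, cate_mean, cate_ci = NULL) {
  data.frame(
    method    = method,
    ate       = mean(cate_mean),
    ate_lower = if (is.null(cate_ci)) NA_real_ else mean(cate_ci[, 1L]),
    ate_upper = if (is.null(cate_ci)) NA_real_ else mean(cate_ci[, 2L]),
    stringsAsFactors = FALSE
  )
}

# Toy inputs, invented for illustration.
bridge <- list(cate_mean = c(1, 2, 3), cate_ci = cbind(c(0, 1, 2), c(2, 3, 4)))
ext    <- list(cate_mean = c(1.5, 2.5, 3.5), cate_ci = NULL)

ate_table <- rbind(
  ate_row("bridge", bridge$cate_mean, bridge$cate_ci),
  ate_row("external_1", ext$cate_mean, ext$cate_ci)
)
ate_table
```

Averaged per-observation bounds summarize the pointwise intervals; they are not an interval for the ATE itself, which would require the joint uncertainty across observations.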
Returns
A list of class c("summary.gdpar_meta_learner_comparison", "list") with components:
- `ate_table`: `data.frame` with columns `method` (character: `"bridge"` followed by `names(object$external)`), `ate` (numeric), `ate_lower` (numeric, possibly `NA`), `ate_upper` (numeric, possibly `NA`).
- `metrics`: long-format `data.frame` produced by `.comparison_long(object$comparison)` with columns `method_i`, `method_j`, `rmse`, `pearson`, `mad` (off-diagonal rows only).
- `timing`: `data.frame` with columns `method` (external method names only; the bridge is excluded), `time_sec` (numeric), `native_ci` (logical).
- `level`, `n_obs`, `n_methods`: copied verbatim from `object`.
Notes
- Calls `assert_inherits(object, "gdpar_meta_learner_comparison", "object")`; raises an error (presumably of class `gdpar_input_error` per package convention) if the class is absent.
- Bridge ATE CI bounds are populated only if `object$bridge_cate$cate_ci` is non-`NULL`; otherwise they remain `NA_real_`.
- External ATE CI bounds are populated per-method only if `e$cate_ci` is non-`NULL`; the code does not consult `e$native_ci` here, only the presence of `cate_ci`.
- `ate_vec`, `ate_lower`, `ate_upper` are named numeric vectors initialized with `stats::setNames`; the bridge slot is filled first, then external slots in iteration order.
- The `timing` data frame excludes the bridge (no timing recorded for it).
- S3 dispatch is triggered by `summary(object)` when `class(object)` contains `"gdpar_meta_learner_comparison"`.
Purpose
S3 print method for objects of class summary.gdpar_meta_learner_comparison. Prints the credible level, observation count, method count, the ATE table, the timing/CI-availability table, and the first 20 rows of the long-format pairwise concordance metrics.
Arguments
- `x`: `summary.gdpar_meta_learner_comparison`. The summary object produced by `summary.gdpar_meta_learner_comparison`. Expected components: `level`, `n_obs`, `n_methods`, `ate_table`, `timing`, `metrics`.
- `...`: any. Unused; present for S3 generic compatibility.
Mathematics
None.
Returns
Invisibly returns x (via invisible(x)). The side effect is console output.
Notes
- Does not validate `x`.
- `ate_table` and `timing` are printed with `row.names = FALSE`.
- `metrics` is truncated to its first 20 rows via `utils::head(x$metrics, 20L)`; if `nrow(x$metrics) > 20L`, a `sprintf` line of the form `" ... (%d more rows)\n"` is emitted with the count of omitted rows.
- S3 dispatch is triggered by `print(x)` when `class(x)` contains `"summary.gdpar_meta_learner_comparison"`.
Purpose
Internal helper that pivots the three square concordance matrices (`rmse`, `pearson`, `mad`) stored in a comparison object into a single long-format `data.frame` containing one row per ordered off-diagonal method pair $(i, j)$, $i \neq j$.
Arguments
- `comparison`: list. Must contain numeric matrix components `rmse`, `pearson`, `mad` with identical dimensions and shared `rownames`/`colnames`. Row names are read from `rownames(comparison$rmse)`.
Mathematics
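Given $m$ methods with names `nms`, each ordered pair $(i, j)$ with $i \neq j$ contributes one row holding the $(i, j)$ entries of the three matrices.
Diagonal entries ($i = j$) are skipped via `if (i == j) next`. The total number of rows is $m(m-1)$.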
Returns
A data.frame with columns method_i (character), method_j (character), rmse (numeric), pearson (numeric), mad (numeric), constructed by do.call(rbind, out_rows) over a list of single-row data frames. stringsAsFactors = FALSE is set on each constituent.
Notes
- Marked `@keywords internal` and `@noRd`; not exported.
- Iteration uses `seq_along(nms)` for both `i` and `j`, so the order is row-major over the upper and lower triangles combined (i.e., both $(i, j)$ and $(j, i)$ appear, but $(i, i)$ is excluded).
- Assumes `rownames(rmse)` is non-`NULL` and that all three matrices share the same dimension and names; no consistency check is performed.
- The list `out_rows` is filled by assignment at an incrementing index `k`. Skipped diagonal pairs never assign a slot, and because `k` is only incremented alongside an assignment, no `NULL` slots are produced.
- Returns `NULL` (from `do.call(rbind, list())`) if `nms` is empty.
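The pivot can be sketched in a few lines of base R. This is an illustrative re-implementation under the assumptions stated in the notes (shared names, square matrices); the name `comparison_long_sketch()` is hypothetical.

```r
# Hypothetical sketch of the long-format pivot; not the package's code.
comparison_long_sketch <- function(comparison) {
  nms <- rownames(comparison$rmse)
  out_rows <- list()
  k <- 0L
  for (i in seq_along(nms)) {
    for (j in seq_along(nms)) {
      if (i == j) next              # skip the diagonal
      k <- k + 1L                   # advance only when a row is stored
      out_rows[[k]] <- data.frame(
        method_i = nms[i], method_j = nms[j],
        rmse    = comparison$rmse[i, j],
        pearson = comparison$pearson[i, j],
        mad     = comparison$mad[i, j],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out_rows)          # NULL when there are no rows
}

# Toy 3-method comparison, invented for illustration.
nms <- c("bridge", "m1", "m2")
mk  <- function(v) matrix(v, 3, 3, dimnames = list(nms, nms))
cmp <- list(rmse = mk(1:9), pearson = mk(0.1 * (1:9)), mad = mk(9:1))
long <- comparison_long_sketch(cmp)
nrow(long)   # 3 * (3 - 1) = 6 ordered off-diagonal pairs
```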
predict.gdpar_meta_learner_comparison(object, newdata, level = NULL, bridge = NULL, data = NULL, ...)
Purpose
S3 predict method for gdpar_meta_learner_comparison objects. Re-evaluates the CATE on a new covariate grid newdata for the bridge component and for every external adapter. Adapters exposing a predict_fun reuse their cached fitted state without refitting; adapters without a usable predict_fun (or whose predict_fun errors) are flagged for refit, their cate_mean is filled with NA_real_, and a gdpar_diagnostic_warning is emitted. The bridge is re-evaluated via predict.gdpar_causal_bridge when real fits are present, otherwise falls back to cached cate_mean/cate_ci only when newdata matches the original observation count.
Arguments
- `object`: `gdpar_meta_learner_comparison`. The comparison object. Must contain `level`, `bridge` (a `gdpar_causal_bridge` or `NULL`), and `external` (named list of adapter results, each possibly containing `predict_fun`, `state`, `native_ci`, `notes`).
- `newdata`: `data.frame`. Required. The new evaluation grid. Must be a data frame (asserted by `assert_data_frame`).
- `level`: `numeric(1)` or `NULL`. Optional credible level in $(10^{-3}, 1 - 10^{-3})$ overriding `object$level`. Defaults to `NULL`, which reuses `object$level`. Validated by `assert_numeric_scalar(level, "level", lower = 1e-3, upper = 1 - 1e-3)` when non-`NULL`.
- `bridge`: `gdpar_causal_bridge` or `NULL`. Optional replacement bridge object used instead of `object$bridge`. Defaults to `NULL` (use cached bridge). Useful when the cached bridge was stripped (e.g., after a `saveRDS` round-trip that lost the two fits).
- `data`: named list with components `X`, `T`, `Y` (and optionally `X_newdata`) or `NULL`. Reserved for the case of a forced re-fit. Defaults to `NULL`. Note: the current implementation does not consume `data` at all; it is accepted but never referenced in the body.
- `...`: any. Reserved for future arguments; currently unused.
Mathematics
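Let $X^{\mathrm{new}}$ denote the covariates extracted from `newdata`. For each external method with a usable `predict_fun`, the CATE means are taken from the `cate_mean` component of `predict_fun(state, X_newdata, level)`.
The bridge prediction is `stats::predict(bridge_obj, newdata = newdata, level = level, summary = "mean_ci")`
when real fits are present. The concordance metrics are then recomputed over the list of method-specific CATE means via `.compute_comparison_metrics(cate_list)`.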
Returns
A list of class c("predict.gdpar_meta_learner_comparison", "list") with components:
-
bridge: list withcate_meanandcate_ci(frombridge_pred). -
external: named list mirroringobject$externalnames; each entry is a list withcate_mean(numeric, possibly allNA_real_),cate_ci(matrix orNULL),method(character),native_ci(logical),time_sec(NA_real_),notes(character vector, augmented with a status message). -
comparison: result of.compute_comparison_metrics(cate_list)— a list of concordance matrices. -
newdata: the inputnewdata(stored verbatim). -
level: the resolved numeric level.
Notes
- Calls `assert_inherits(object, "gdpar_meta_learner_comparison", "object")` and `assert_data_frame(newdata, "newdata")` up front.
- If `bridge_obj` (resolved from `bridge` or `object$bridge`) does not inherit from `"gdpar_causal_bridge"`, the function aborts via `gdpar_abort` with class `"gdpar_input_error"` and a message instructing the user to pass a bridge via the `bridge` argument.
- The outcome name is recovered via `.bridge_outcome_name(bridge_obj$fits$treat, bridge_obj$fits$ctrl)`, and covariates are extracted from `newdata` via `.extract_covariates(newdata, outcome_name)` (presumably dropping the outcome column).
- Bridge re-evaluation branches on `has_real_fits`:
  - If both `bridge_obj$fits$treat$fit` and `bridge_obj$fits$ctrl$fit` are non-`NULL`, calls `stats::predict(bridge_obj, newdata = newdata, level = level, summary = "mean_ci")`.
  - Otherwise, falls back to cached `bridge_obj$cate_mean`/`bridge_obj$cate_ci` only if `nrow(newdata) == bridge_obj$n_obs`; otherwise `cate_mean` is `rep(NA_real_, nrow(newdata))` and `cate_ci` is `NULL`.
- For each external method:
  - If `e$predict_fun` is a function, it is invoked as `pf(state = e$state, X_newdata = X_newdata, level = level)` inside `tryCatch`. On success, `cate_mean` is coerced via `as.numeric(out$cate_mean)`, `cate_ci` is taken as `out$cate_ci`, `native_ci` is `e$native_ci && !is.null(out$cate_ci)`, and `notes` is augmented with `"reused cached state via predict_fun"`. On error, the method is added to `needs_refit`, `cate_mean` is `rep(NA_real_, nrow(newdata))`, `cate_ci` is `NULL`, `native_ci` is `FALSE`, and `notes` is augmented with `"predict_fun failed: <message>"`.
  - If no `predict_fun`, the method is added to `needs_refit`, `cate_mean` is `rep(NA_real_, nrow(newdata))`, `cate_ci` is `NULL`, `native_ci` is `FALSE`, and `notes` is augmented with `"predict_fun unavailable; a full refit would be required"`.
- If `length(needs_refit) > 0L`, a warning of class `"gdpar_diagnostic_warning"` is emitted via `gdpar_warn` with `data = list(needs_refit = needs_refit)` and a message listing the affected adapters, advising the user to rebuild the comparison with `gdpar_compare_meta_learners()`.
- `time_sec` is always set to `NA_real_` for every external entry (no timing is recorded for prediction).
- The `data` argument is declared and documented but not used in the body; no refit path is actually implemented despite the documentation mentioning `fit_predict_fun`. The function only reuses cached state or returns `NA` predictions.
- `cate_list` is constructed as `c(list(bridge = as.numeric(bridge_pred$cate_mean)), lapply(external, function(e) e$cate_mean))` and passed to `.compute_comparison_metrics`; the resulting matrices therefore have row/column names `"bridge"` followed by the external method names (subject to `.compute_comparison_metrics`'s naming behavior).
- S3 dispatch is triggered by `predict(object, newdata = ...)` when `class(object)` contains `"gdpar_meta_learner_comparison"`.
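The per-adapter branch (reuse cached state, or flag for refit) can be sketched as below. The function `repredict_external()` and the toy adapters are hypothetical; only the `predict_fun(state, X_newdata, level)` calling convention and the note strings come from the description above, and the extra `needs_refit` flag in the returned list is an illustrative simplification.

```r
# Hypothetical sketch of the per-adapter re-prediction branch.
repredict_external <- function(e, X_newdata, level) {
  n <- nrow(X_newdata)
  failed <- function(msg) {
    list(cate_mean = rep(NA_real_, n), cate_ci = NULL, native_ci = FALSE,
         notes = c(e$notes, msg), needs_refit = TRUE)
  }
  if (!is.function(e$predict_fun)) {
    return(failed("predict_fun unavailable; a full refit would be required"))
  }
  tryCatch({
    out <- e$predict_fun(state = e$state, X_newdata = X_newdata, level = level)
    list(cate_mean = as.numeric(out$cate_mean), cate_ci = out$cate_ci,
         native_ci = isTRUE(e$native_ci) && !is.null(out$cate_ci),
         notes = c(e$notes, "reused cached state via predict_fun"),
         needs_refit = FALSE)
  }, error = function(err) {
    failed(paste0("predict_fun failed: ", conditionMessage(err)))
  })
}

# Toy adapters, invented for illustration.
X_new  <- data.frame(x = c(1, 2, 3))
ok     <- list(state = list(slope = 2), native_ci = TRUE, notes = character(0),
               predict_fun = function(state, X_newdata, level)
                 list(cate_mean = state$slope * X_newdata$x, cate_ci = NULL))
broken <- list(state = NULL, native_ci = TRUE, notes = character(0),
               predict_fun = function(state, X_newdata, level) stop("boom"))
bare   <- list(state = NULL, native_ci = FALSE, notes = character(0))

res <- lapply(list(ok = ok, broken = broken, bare = bare),
              repredict_external, X_newdata = X_new, level = 0.95)
names(Filter(function(r) r$needs_refit, res))   # adapters that would need a refit
```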
Purpose
Orchestrates a descriptive comparison of the T-learner (AMM-side) embedded in a gdpar_causal_bridge object against one or more external meta-learners (e.g., grf, EconML). It evaluates each method on a common evaluation grid, reports point/posterior CATE estimates and their native confidence intervals, and computes three concordance metrics (RMSE, Pearson correlation, mean absolute discrepancy) between every ordered pair of methods on their point/posterior CATE estimates. It does not perform hypothesis tests.
Arguments
- `bridge`: Object of class `gdpar_causal_bridge` (from `gdpar_causal_bridge()`). Contains two fitted `gdpar` objects (treatment and control arms), precomputed CATE estimates, and metadata.
- `methods`: Non-empty list of `gdpar_meta_learner_adapter` objects. Each adapter wraps a specific external meta-learner implementation (e.g., `gdpar_adapter_grf()`).
- `newdata`: Optional `data.frame` on which to evaluate CATE. Defaults to the evaluation grid stored in `bridge$newdata`.
- `data`: Optional list with components `X` (covariate `data.frame`), `T` (integer 0/1 treatment vector), `Y` (numeric outcome vector). Used to supply training data explicitly if it cannot be recovered from the bridge's stored calls (e.g., when the original data is not in the calling environment).
- `seed`: Optional integer scalar. Propagated to each adapter's `fit_predict_fun` as `seed_run` for reproducibility.
- `...`: Reserved for future arguments; currently unused.
Mathematics
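For every ordered pair of methods $(a, b)$ (including `"bridge"`), computes the following concordance metrics on the point/posterior CATE estimates $\hat{\tau}_a, \hat{\tau}_b \in \mathbb{R}^{n}$ ($n$ = `n_obs`):
$$\mathrm{RMSE}_{ab} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{\tau}_a(x_i) - \hat{\tau}_b(x_i)\bigr)^2}, \qquad \mathrm{Pearson}_{ab} = \mathrm{cor}\bigl(\hat{\tau}_a, \hat{\tau}_b\bigr), \qquad \mathrm{MAD}_{ab} = \frac{1}{n}\sum_{i=1}^{n}\bigl|\hat{\tau}_a(x_i) - \hat{\tau}_b(x_i)\bigr|$$
Confidence intervals are not pooled because their inferential origins (Bayesian posterior, asymptotic, bootstrap) are heterogeneous.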
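A minimal sketch of the three pairwise metrics on a named list of CATE vectors. The function `concordance_sketch()` is hypothetical and is not the package's `.compute_comparison_metrics`; it uses the textbook definitions of RMSE, Pearson correlation, and mean absolute discrepancy, and the toy values are invented.

```r
# Hypothetical sketch of pairwise concordance metrics; textbook definitions.
concordance_sketch <- function(cate_list) {
  nms   <- names(cate_list)
  m     <- length(nms)
  blank <- matrix(NA_real_, m, m, dimnames = list(nms, nms))
  out   <- list(rmse = blank, pearson = blank, mad = blank)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      d <- cate_list[[i]] - cate_list[[j]]      # pointwise discrepancy
      out$rmse[i, j]    <- sqrt(mean(d^2))
      out$pearson[i, j] <- stats::cor(cate_list[[i]], cate_list[[j]])
      out$mad[i, j]     <- mean(abs(d))
    }
  }
  out
}

cmp <- concordance_sketch(list(bridge = c(1, 2, 3), m1 = c(2, 3, 4)))
cmp$rmse["bridge", "m1"]      # constant offset of 1, so RMSE = 1
cmp$pearson["bridge", "m1"]   # perfectly correlated despite the offset
```

The toy case shows why the metrics are reported together: a constant shift between two methods leaves the Pearson correlation at 1 while RMSE and MAD both register the discrepancy.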
Returns
An object of class gdpar_meta_learner_comparison (a list) with components:
- `bridge_cate`: List with `cate_mean` (numeric vector of bridge CATE point estimates) and `cate_ci` (matrix of bridge credible intervals).
- `bridge`: The original `gdpar_causal_bridge` object.
- `external`: Named list of results for each external adapter. Each element contains `cate_mean`, `cate_ci`, `method`, `native_ci` (logical), `time_sec`, `notes`, `state` (from the adapter), and the adapter's `predict_fun`/`fit_predict_fun` if provided.
- `comparison`: List of concordance matrices (`rmse`, `pearson`, `mad`) between all method pairs.
- `newdata`: The evaluation grid (`data.frame`) used.
- `level`: The confidence level (numeric) used for intervals.
- `n_obs`: Integer number of evaluation points.
- `n_methods`: Integer total number of methods compared (bridge + external).
- `call`: The matched call.
- `meta`: List of metadata including package version, timestamp, seed, original bridge call, and adapter specifications.
Notes
- Scalar-outcome restriction: Rejects bridges with `dim_kind != "scalar"` (i.e., distributional or multivariate regression) via `.guard_scalar_outcome()`.
- Method names: If the `methods` list is unnamed, names are taken from each adapter's `$name` field. Duplicate names cause an error.
- Dataset recovery: If `data` is `NULL`, the function attempts to reconstruct the training data from the bridge's stored calls using `.assemble_bridge_dataset()`. If this fails, the user must supply `data` explicitly.
- Adapter validation: Each adapter is checked for unmet software requirements (R packages, Python modules) via `.check_adapter_requirements()`. Missing dependencies cause a `gdpar_missing_dependency_error`.
- Adapter output validation: Results from each adapter are checked with `.validate_adapter_output()` for correct length and structure.
- Bridge CATE recomputation: If `newdata` differs from the bridge's original grid and lengths mismatch, the bridge CATE is re-predicted using `stats::predict(bridge, ...)`.
Purpose
Internal validation function ensuring the bridge was constructed from scalar-outcome fits (i.e., dim_kind == "scalar"). Rejects bridges from distributional regression (K > 1) or multivariate response (p > 1) with a specific error, as multi-output external adapters are not supported in the current scope (Sub-phase 8.5.B).
Arguments
- `bridge`: Object of class `gdpar_causal_bridge`. Its `$meta$dim_kind` component is inspected.
Returns
invisible(NULL) if the bridge is scalar. Otherwise, raises a gdpar_unsupported_feature_error.
Notes
- Uses the null-coalescing operator `%||%` to default to `"scalar"` if `dim_kind` is missing.
- The error message references "Sub-phase 8.5.B" and queues multi-output support for "Block 9" per the package roadmap.
Purpose
Internal helper that constructs the unified training dataset (X, T, Y) required by external meta-learner adapters. It either uses an explicitly provided data argument or attempts to recover the training data from the bridge's stored fits by evaluating their captured call objects in the specified environment.
Arguments
- `bridge`: `gdpar_causal_bridge` object.
- `newdata`: `data.frame` of evaluation covariates (the CATE grid).
- `data`: Optional list with components `X`, `T`, `Y`. If supplied, it is used directly.
- `eval_env`: Environment in which to evaluate the bridge's stored calls (typically `parent.frame()` of the caller).
Mathematics
When data is NULL, the algorithm for each arm (treatment, control) is:
- Recover the arm's training dataset via `eval(fit$call$data, eval_env)`.
- Identify the outcome variable name from the LHS of `fit$call$formula`.
- Extract the covariate matrix `X_arm` (all columns except the outcome) and outcome vector `Y_arm`.
- Create treatment indicator `T_arm = 1L` (treatment) or `0L` (control).
- Stack the two arms row-wise to form `(X, T, Y)`.
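The stacking step can be illustrated on two toy arm data frames. The helper `stack_arms()` and the toy data are hypothetical; the sketch assumes both arms share the same columns and that the outcome name is already known.

```r
# Hypothetical sketch of stacking two recovered arm datasets into (X, T, Y).
stack_arms <- function(d_treat, d_ctrl, outcome) {
  covs <- setdiff(names(d_treat), outcome)       # every column except the outcome
  list(
    X = rbind(d_treat[, covs, drop = FALSE], d_ctrl[, covs, drop = FALSE]),
    T = c(rep(1L, nrow(d_treat)), rep(0L, nrow(d_ctrl))),   # treatment arm first
    Y = as.numeric(c(d_treat[[outcome]], d_ctrl[[outcome]]))
  )
}

# Toy arms, invented for illustration.
d_treat <- data.frame(y = c(3.1, 2.9), x1 = c(0.5, 1.5), x2 = c(1, 0))
d_ctrl  <- data.frame(y = c(1.0, 1.2, 0.8), x1 = c(0.2, 0.9, 1.1), x2 = c(0, 1, 1))

stacked <- stack_arms(d_treat, d_ctrl, outcome = "y")
stacked$T          # 1 1 0 0 0
names(stacked$X)   # "x1" "x2"
```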
Returns
A list with components:
- `X`: `data.frame` of stacked covariates (rows = training observations from both arms).
- `T`: Integer vector of treatment indicators (0/1).
- `Y`: Numeric vector of outcomes.
- `X_newdata`: The supplied `newdata` (unchanged).
- `outcome_name`: Character string of the outcome variable name.
Notes
- If evaluation of a fit's `call$data` fails (e.g., the object is not in `eval_env`), the function aborts with a `gdpar_input_error` advising the user to pass `data` explicitly.
- The helper ensures the covariate column order and types are consistent between training and evaluation data.
- The function is responsible for ensuring that the stacked dataset aligns with the external adapter's expectations (i.e., `X` is a data frame, `T` is integer 0/1, `Y` is numeric).
Purpose
Assembles the unified training dataset and newdata covariate matrix required by the meta-learner comparison machinery. It either accepts an explicitly supplied data argument (a list with X, T, Y, and optionally X_newdata) or attempts to recover the original training data from the captured call objects inside the treatment and control fits of a bridge object. In both paths it validates consistency, combines arms, and returns a standardized list.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `bridge` | list | A bridge object with component `fits$treat` and `fits$ctrl`, each a fitted model object that carries a `$call` element. |
| `newdata` | `data.frame` (or coercible) | New covariate data for which predictions will be compared. Must contain every covariate column present in the training data. |
| `data` | list or `NULL` | Optional explicit data. If non-`NULL` it must be a named list with components `X` (covariate matrix/data.frame), `T` (integer treatment indicator, values 0/1), `Y` (numeric outcome), and optionally `X_newdata` (covariate data.frame for newdata; if absent, covariates are extracted from `newdata`). |
| `eval_env` | environment | The environment in which fit call expressions (e.g. `cl$data`) are evaluated when recovering training data from the fitted objects. |
Mathematics
No formula is implemented. The function performs data assembly only.
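Training data from the two arms are row-bound, with treatment indicators prepended:
$$X = \begin{bmatrix} X_{\mathrm{treat}} \\ X_{\mathrm{ctrl}} \end{bmatrix}, \qquad T = \bigl(\mathbf{1}_{n_{\mathrm{treat}}}, \mathbf{0}_{n_{\mathrm{ctrl}}}\bigr), \qquad Y = \bigl(Y_{\mathrm{treat}}, Y_{\mathrm{ctrl}}\bigr)$$
where $n_{\mathrm{treat}}$ and $n_{\mathrm{ctrl}}$ are the numbers of training rows in the treatment and control arms.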
Returns
A named list with five components:
| Component | Type | Description |
|---|---|---|
| `X` | `data.frame` | Training covariates (outcome column removed). Row count equals the total number of training rows across both arms. |
| `T` | integer vector | Treatment indicator, one entry per training row: `1L` (treatment arm first) then `0L` (control arm). |
| `Y` | numeric vector | Outcome values, treatment arm first then control arm, with length equal to the row count of `X`. |
| `X_newdata` | `data.frame` | Covariates for `newdata`, column subset/reordered to match `X`. Row count equals `nrow(newdata)`. |
| `outcome_name` | character scalar | The name of the outcome variable, inferred by `.bridge_outcome_name()`. |
Notes
- Explicit data path (`data` is non-`NULL`):
  - `data` must be a list with named components `X`, `T`, `Y`. If any is missing, a `gdpar_input_error` is raised.
  - `X` is coerced to `data.frame` if it is not one already (with `stringsAsFactors = FALSE`).
  - `T` is coerced to `integer`; `Y` to `numeric`.
  - Lengths of `T`, `Y`, and `nrow(X)` must agree; otherwise a `gdpar_input_error` is raised with a diagnostic `sprintf` message.
  - `T` must contain only `0L` and `1L`; otherwise a `gdpar_input_error` is raised.
  - If `data$X_newdata` is present it is used (coerced to `data.frame` if needed); otherwise covariates are extracted from `newdata` by calling `.extract_covariates(newdata, outcome_name)`.
  - The function returns immediately via `return(...)` without any data-recovery attempt.
- Recovery path (`data` is `NULL`):
  - The internal function `recover(fit)` evaluates `fit$call$data` in `eval_env`. If the call is `NULL`, the data component is `NULL`, or evaluation throws an error, `NULL` is returned.
  - If either recovered data is `NULL` or not a `data.frame`, a `gdpar_input_error` is raised with extra data fields `treat_recovered` and `ctrl_recovered`.
  - The outcome variable name (from `.bridge_outcome_name()`) must appear as a column in both recovered data frames; otherwise a `gdpar_input_error` is raised.
  - The column sets of the two recovered data frames must be identical (after sorting); otherwise a `gdpar_input_error` is raised listing both column sets.
  - Both data frames are subset to their common columns, then row-bound. The outcome column is removed from `X` via `.extract_covariates()`.
  - `newdata` covariates are extracted and checked for missing columns present in `X`; if any are missing a `gdpar_input_error` is raised listing them. `X_newdata` is then reordered/subset to match `X`'s columns exactly.
- All errors are raised via `gdpar_abort()` with appropriate condition classes (`gdpar_input_error`, `gdpar_unsupported_feature_error`).
Purpose
Infers the name of the outcome (response) variable from the captured formula calls of the treatment and control fits within a bridge object. This is used internally by .assemble_bridge_dataset() and other comparison functions.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit_t` | fitted model object | Treatment-arm fit. Must carry a `$call` element, ideally with a `$formula` component. |
| `fit_c` | fitted model object | Control-arm fit. Same expectations as `fit_t`. |
Mathematics
No formula. The function performs formula introspection.
Returns
A single character string giving the outcome variable name. If both fits resolve to the same name, that name is returned. If only one resolves, the resolved name is returned. If neither resolves or they disagree, an error is raised (see Notes).
Notes
- The internal helper `pick(fit)` attempts three strategies in order to extract the LHS of a two-sided formula from `fit$call$formula` (written `cl$formula` below):
  - Evaluate `cl$formula` in the environment of `fit$call`. If the result is a `formula` of length 3 (two-sided), extract `as.character(fm[[2L]])`.
  - If `cl$formula` itself is a `call` or `name`, attempt to evaluate it with bare `eval()`. If the result is a two-sided formula, extract the LHS.
  - If `cl$formula` is a call of length 3 whose first element is `~`, directly extract `as.character(cl$formula[[2L]])`.
  - If all strategies fail, `NA_character_` is returned.
- If both `n_t` and `n_c` (the names resolved by `pick()` for the treatment and control fits) are `NA`, a `gdpar_input_error` is raised advising the user to pass an explicit `data` argument.
- If both are non-`NA` but differ (`!identical(n_t, n_c)`), a `gdpar_unsupported_feature_error` is raised listing both names. This means the two fits must model the same outcome variable.
- If exactly one is `NA`, the non-`NA` value is returned (i.e., `n_c` when `n_t` is `NA`, otherwise `n_t`).
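The formula-introspection strategies described in the notes above can be illustrated with a minimal sketch. The helper name `lhs_name_sketch()` and its exact control flow are assumptions made for illustration; it is not the package's `pick()` and covers only the evaluate-then-inspect idea.

```r
# Hypothetical helper (not part of gdpar): recover the response name
# from the `formula` argument of a captured model call.
lhs_name_sketch <- function(cl, env = parent.frame()) {
  # Strategy A: evaluate the captured formula expression.
  fm <- tryCatch(eval(cl$formula, envir = env), error = function(e) NULL)
  if (inherits(fm, "formula") && length(fm) == 3L) {
    return(as.character(fm[[2L]]))
  }
  # Strategy B: inspect the unevaluated `~` call directly.
  raw <- cl$formula
  if (is.call(raw) && length(raw) == 3L && identical(raw[[1L]], as.name("~"))) {
    return(as.character(raw[[2L]]))
  }
  NA_character_
}

two_sided <- quote(lm(formula = y ~ x1 + x2, data = d))
one_sided <- quote(f(formula = ~ x1))
lhs_name_sketch(two_sided)
lhs_name_sketch(one_sided)
```

A two-sided formula yields the outcome name, while a one-sided formula has no LHS and falls through to `NA_character_`, which is the case where the caller would need an explicit `data` argument.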
Purpose Removes the outcome column from a data frame (or object coercible to one) and returns only the covariate columns. Used throughout the comparison pipeline to separate predictors from response.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `df` | `data.frame` or coercible | A data frame whose columns include covariates and the outcome. |
| `outcome_name` | character scalar | The name of the outcome column to drop. |
Mathematics
None.
Returns
A data.frame containing all columns of df except the one named outcome_name. If outcome_name is not present in colnames(df), all columns are returned (since setdiff returns the full set). The drop = FALSE argument ensures the result is always a data frame even if a single column remains.
Notes
- If `df` is not already a `data.frame`, it is coerced via `as.data.frame(df, stringsAsFactors = FALSE)`.
- This is a utility function; no errors are raised by it directly.
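The behaviour described above (coercion, `setdiff` on column names, `drop = FALSE`) can be sketched in a few lines. The function name `extract_covariates_sketch()` is hypothetical; the body is an assumption reconstructed from the description, not the package source.

```r
# Hypothetical re-implementation based on the description above.
extract_covariates_sketch <- function(df, outcome_name) {
  if (!is.data.frame(df)) {
    df <- as.data.frame(df, stringsAsFactors = FALSE)
  }
  keep <- setdiff(colnames(df), outcome_name)  # full set if outcome is absent
  df[, keep, drop = FALSE]                     # stays a data frame with one column
}

d <- data.frame(y = c(1.5, 2.0, 0.7), x = c(10, 20, 30))
covs <- extract_covariates_sketch(d, "y")
same <- extract_covariates_sketch(d, "not_a_column")
```

Here `covs` keeps only `x` and remains a data frame, and `same` returns every column because the named outcome is not present.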
Purpose Validates that the return value of a meta-learner adapter conforms to the expected shape and types. Called internally after each adapter invocation to enforce the adapter interface contract.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `result` | `list` | The object returned by an adapter. Must contain at minimum a `cate_mean` component. May optionally contain `cate_ci`. |
| `n_newdata` | integer scalar | The expected number of rows (observations) in the newdata, i.e., the required length of `cate_mean` and row count of `cate_ci`. |
| `adapter_name` | character scalar | Human-readable name of the adapter, used in error messages. |
Mathematics
None.
Returns
Invisibly returns NULL (invisible(NULL)). Side-effect–only function: raises errors if validation fails.
Notes
- First check: `result` must be a `list` and must have an element named `"cate_mean"`. If not, a `gdpar_internal_error` is raised.
- Second check: `result$cate_mean` must be a `numeric` vector of length exactly `n_newdata`. If not, a `gdpar_internal_error` is raised with a diagnostic `sprintf` message.
- Third check (conditional): if `result$cate_ci` is non-`NULL`, it must be a `matrix` with `nrow == n_newdata` and `ncol == 2L` (lower and upper bounds). If not, a `gdpar_internal_error` is raised reporting the actual dimensions.
- All errors use class `"gdpar_internal_error"`, indicating a programming error in the adapter rather than user input.
Purpose Computes three pairwise concordance/similarity matrices across a list of CATE (Conditional Average Treatment Effect) estimate vectors. These matrices quantify the agreement between different meta-learner methods on the same newdata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `cate_list` | list of numeric vectors | Each element is a numeric vector of CATE predictions for the same newdata observations. All vectors must have the same length. List element names (if present) are used as row/column labels; otherwise names `m1`, `m2`, … are generated. |
Mathematics
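Let $\hat\tau^{(1)}, \dots, \hat\tau^{(m)} \in \mathbb{R}^{n}$ denote the $m$ CATE vectors in `cate_list`, each evaluated on the same $n$ newdata observations.

Root Mean Squared Error (RMSE):

$$\text{RMSE}_{ij} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\hat\tau^{(i)}_k - \hat\tau^{(j)}_k\right)^2}, \qquad i \neq j.$$

Diagonal: $\text{RMSE}_{ii} = 0$.

Mean Absolute Deviation (MAD):

$$\text{MAD}_{ij} = \frac{1}{n}\sum_{k=1}^{n}\left|\hat\tau^{(i)}_k - \hat\tau^{(j)}_k\right|, \qquad i \neq j.$$

Diagonal: $\text{MAD}_{ii} = 0$.

Pearson Correlation:

$$\text{Pearson}_{ij} = \operatorname{cor}\left(\hat\tau^{(i)}, \hat\tau^{(j)}\right), \qquad i \neq j.$$

Diagonal: $\text{Pearson}_{ii} = 1$. Note that correlation is computed only for $i < j$ and then copied symmetrically: $\text{Pearson}_{ji} = \text{Pearson}_{ij}$.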
Returns
A named list with three components:
| Component | Type | Description |
|---|---|---|
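| `rmse` | square numeric matrix (one row and column per method) | Pairwise RMSE. Diagonal is 0. Symmetric. Dimnames are the method names. |
| `pearson` | square numeric matrix (one row and column per method) | Pairwise Pearson correlation. Diagonal is 1. Symmetric. Dimnames are the method names. |
| `mad` | square numeric matrix (one row and column per method) | Pairwise MAD. Diagonal is 0. Symmetric. Dimnames are the method names. |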
Notes
- All CATE vectors in `cate_list` are column-bound into a single matrix `M` via `do.call(cbind, cate_list)`. This requires all vectors to have the same length; no explicit check is performed, so `cbind` recycles shorter vectors (warning only when the longest length is not a multiple of a shorter one) rather than failing.
- If `cate_list` has no names, synthetic names `"m1"`, `"m2"`, … are assigned and propagated to `dimnames`.
- The double loop iterates over all $(i, j)$ pairs with $i \neq j$. For RMSE and MAD, each off-diagonal entry is written once. For Pearson, the loop only computes the correlation when $i < j$ (using `stats::cor()`) and mirrors the value to $[j, i]$. This avoids redundant correlation calls.
- `stats::cor()` is wrapped in `suppressWarnings()` to silence warnings about constant vectors (which yield `NA` correlations).
- All three matrices are symmetric. For RMSE and MAD the loop visits both orderings (only `i == j` is skipped), so `rmse[i, j]` and `rmse[j, i]` are computed separately; because both formulas depend on the elementwise differences only through a square or an absolute value, the two entries agree. The Pearson matrix is symmetric by construction because only $i < j$ is computed and mirrored.
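The loop structure described in the notes above can be sketched as follows. The function name `cate_concordance_sketch()` and the toy inputs are assumptions for illustration; the sketch follows the description rather than reproducing the package source.

```r
# Hypothetical sketch of the pairwise concordance computation.
cate_concordance_sketch <- function(cate_list) {
  if (is.null(names(cate_list))) {
    names(cate_list) <- paste0("m", seq_along(cate_list))
  }
  M <- do.call(cbind, cate_list)   # one column per method
  m <- ncol(M)
  nm <- colnames(M)
  rmse <- matrix(0, m, m, dimnames = list(nm, nm))
  mad <- matrix(0, m, m, dimnames = list(nm, nm))
  pearson <- diag(1, m)
  dimnames(pearson) <- list(nm, nm)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      if (i == j) next
      d <- M[, i] - M[, j]
      rmse[i, j] <- sqrt(mean(d^2))
      mad[i, j] <- mean(abs(d))
      if (i < j) {
        # Constant vectors give NA with a warning; silence the warning only.
        r <- suppressWarnings(stats::cor(M[, i], M[, j]))
        pearson[i, j] <- r
        pearson[j, i] <- r
      }
    }
  }
  list(rmse = rmse, pearson = pearson, mad = mad)
}

toy <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(1, 1, 1))
res <- cate_concordance_sketch(toy)
```

In the toy input, methods `a` and `b` differ by a constant shift, so their RMSE and MAD coincide while their correlation is perfect; method `c` is constant, so its correlations are `NA`.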
gdpar_contraction_diagnostic(fit, data, sizes = NULL, replicates = 1L, parameters = NULL, level = 0.95, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose
Empirical posterior contraction-rate diagnostic for a fitted Path 1 (Bayesian) gdpar model. It is an opt-in, computationally expensive methodological audit tool that refits the model at multiple subsample sizes, records the median posterior credible-interval width across user-facing parameters at each size, and fits an ordinary-least-squares regression of log-width on log-sample-size. The estimated slope is compared against the theoretical parametric contraction rate (widths shrinking like $n^{-1/2}$, i.e. a slope of $-1/2$). The function does not modify `fit`; it returns a standalone report.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_fit` | A fitted model object produced by `gdpar` with `path = "bayes"`. Must inherit from class `"gdpar_fit"`. The original `fit$call` is extracted and modified to produce subsampled refits. |
| `data` | data frame | The data frame originally passed to `gdpar`, or another data frame compatible with the AMM specification of `fit`. Its row count $n$ is the upper bound on admissible subsample sizes. |
| `sizes` | `NULL` or numeric vector | Subsample sizes at which to refit. If `NULL` (default), a length-five geometric sequence of sizes is generated. User-supplied entries must lie between 5 and $n$. |
| `replicates` | integer scalar (count) | Number of independent subsamples drawn per size. Defaults to `1L`. Higher values reduce Monte Carlo variance of the log-width curve at additional computational cost. Must be a non-negative integer (validated by `assert_count`). |
| `parameters` | `NULL` or character vector | Optional explicit list of posterior variable names to include in the credible-width calculation. If `NULL` (default), the function auto-selects user-facing parameters by filtering out variables matching the internal ignore pattern. |
| `level` | numeric scalar between 0 and 1 | Nominal credible level for interval-width computation. Defaults to `0.95`. The interval is formed from the $\alpha/2$ and $1 - \alpha/2$ posterior quantiles, with $\alpha = 1 - \text{level}$. |
| `iter_warmup` | integer scalar (count) | Warmup iterations for each refit. Defaults to `500L`. Forwarded to `gdpar` via the modified call. |
| `iter_sampling` | integer scalar (count) | Sampling iterations for each refit. Defaults to `500L`. Forwarded to `gdpar` via the modified call. |
| `chains` | integer scalar (count) | Number of MCMC chains per refit. Defaults to `2L`. Forwarded to `gdpar` via the modified call. |
| `verbose` | logical scalar (length 1) | If `TRUE`, prints a cost message via `gdpar_inform` before starting the refits. Defaults to `TRUE`. Must be a single logical value. |
| `...` | any | Additional arguments forwarded to `gdpar` through the modified refit call. |
Mathematics
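The diagnostic fits the linear regression

$$\log w_c = \alpha_0 + \beta \log n_c + \varepsilon_c,$$

where $c$ indexes the (size, replicate) cells with a successful refit, $n_c$ is the subsample size, and $w_c$ is the median credible-interval width

$$w_c = \operatorname{median}_{j}\left(q_{j,\,1-\alpha/2} - q_{j,\,\alpha/2}\right),$$

with $q_{j,p}$ the posterior $p$-quantile of parameter $j$ and $\alpha = 1 - \text{level}$.

The slope $\beta$ is estimated by OLS via `stats::lm`. An approximate 95% confidence interval for $\beta$ is built from $\hat\beta$ and its standard error.

The verdict logic compares this interval against the theoretical target slope of $-1/2$.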
The first branch tests whether the 95% CI overlaps the interval
The default subsample sizes, when sizes = NULL, are generated as
Returns
A list of class c("gdpar_contraction_report", "list") with components:
| Component | Type | Description |
|---|---|---|
| `table` | data frame | Columns `n` (subsample size), `replicate` (replicate index), `median_width` (median credible-interval width, `NA_real_` if the refit failed). One row per (size, replicate) cell. |
| `slope_estimate` | numeric scalar | OLS slope estimate from `lm(log_w ~ log_n)`, with names stripped via `unname`. |
| `slope_se` | numeric scalar | Standard error of the slope estimate. |
| `slope_ci_lower` | numeric scalar | Lower bound of the approximate 95% confidence interval for the slope. |
| `slope_ci_upper` | numeric scalar | Upper bound of the approximate 95% confidence interval for the slope. |
| `verdict` | character | One of three verdict strings (see Mathematics). |
| `level` | numeric scalar | The credible level used (echoed from the `level` argument). |
| `warnings` | character vector | Per-refit failure messages; empty if all refits succeeded. |
Notes
- Input validation. Calls `assert_inherits(fit, "gdpar_fit", ...)`, `assert_data_frame(data, ...)`, `assert_count(replicates, ...)`, `assert_numeric_scalar(level, ..., lower = 0, upper = 1)`, `assert_count(iter_warmup, ...)`, `assert_count(iter_sampling, ...)`, `assert_count(chains, ...)`. The `verbose` argument is checked inline: if it is not a length-1 logical, `gdpar_abort` is called with class `"gdpar_input_error"`. The `sizes` argument, when non-`NULL`, is validated inline: if it is not numeric, or if any entry is $< 5$ or $> n$, `gdpar_abort` is called with class `"gdpar_input_error"` and a message formatted via `sprintf`.
- Suggested-package dependencies. Calls `require_suggested("cmdstanr", ...)` and `require_suggested("posterior", ...)`. If either is unavailable, an error is raised by that helper.
- Cost message. When `verbose = TRUE`, emits a `gdpar_inform` message of class `"gdpar_optin_message"` stating the total number of refits (`length(sizes) * replicates`).
- Refit call construction. The original `fit$call` is copied and modified: `data` is set to `quote(sub)`; `iter_warmup`, `iter_sampling`, `chains` are overwritten from the corresponding arguments; `verbose` is set to `FALSE`; `refresh` is set to `0L`; `skip_id_check` is set to `TRUE`. The modified call is `eval`-uated in a freshly created environment (`new.env(parent = parent.frame())`) in which the symbol `sub` is bound to the subsampled data frame. The local variable `call_data_arg_name <- "data"` is assigned but never used.
- Subsampling. For each `(size, replicate)` cell, `sample.int(n, size = sz)` draws a simple random sample without replacement. Despite the documentation mentioning "stratified by row order," the code performs uniform random sampling with no stratification.
- Refit failure handling. Each refit is wrapped in `tryCatch`. On error, `refit_failure_msg` is populated (via `<<-` inside the error handler) with a formatted message including the size, replicate, and `conditionMessage(e)`. A `gdpar_warn` of class `"gdpar_diagnostic_warning"` is emitted, the message is appended to `warnings_msg`, and a row with `median_width = NA_real_` is recorded. The loop then continues to the next cell.
- Variable selection. Posterior variables are retrieved via `posterior::variables(draws)`. The ignore pattern `"^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)"` is applied via `grepl` to exclude internal/auxiliary variables. If `parameters` is `NULL`, the filtered set (`candidate_vars`) is used; otherwise `parameters` is used directly without validation against available variables.
- Width computation. `posterior::summarise_draws` is called on `posterior::subset_draws(draws, variable = use_vars)` with two custom summary functions `q_lower` and `q_upper` that wrap `stats::quantile` at probabilities $\alpha/2$ and $1 - \alpha/2$ respectively (with `names = FALSE`). Widths are computed as the element-wise difference `q_upper - q_lower`, and the cell's `median_width` is `stats::median(widths)`.
- Minimum data requirement. After removing `NA` rows, if fewer than 3 successful refits remain, `gdpar_abort` is called with class `"gdpar_diagnostic_error"` and message `"Not enough successful refits to estimate the contraction slope."`.
- Regression. `stats::lm(log_w ~ log_n)` is fit on the non-`NA` subset. Coefficients and standard errors are extracted from `stats::coef(reg)` and `summary(reg)$coefficients[, "Std. Error"]` respectively, indexing by the name `"log_n"`.
- Side effects. May print a cost message (`gdpar_inform`), emit per-refit warnings (`gdpar_warn`), and perform `length(sizes) * replicates` full Bayesian refits via `cmdstanr` (through `gdpar`).
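The regression step of the diagnostic can be illustrated on simulated widths, without running any Stan refit. Everything in the sketch below is an assumption for illustration: the data frame `tab` stands in for the report's `table` component, its values are simulated so that widths shrink like $n^{-1/2}$, and the normal-approximation interval is one plausible way to form an approximate 95% interval (the package may use a different multiplier).

```r
# Simulated stand-in for the report table (hypothetical values).
set.seed(1)
tab <- data.frame(n = rep(c(50, 100, 200, 400, 800), each = 2L))
tab$median_width <- 3 * tab$n^(-0.5) * exp(stats::rnorm(nrow(tab), sd = 0.02))
tab$median_width[3] <- NA_real_   # mimic one failed refit

# Drop failed cells, then regress log-width on log-sample-size.
ok <- !is.na(tab$median_width)
log_w <- log(tab$median_width[ok])
log_n <- log(tab$n[ok])
reg <- stats::lm(log_w ~ log_n)

slope <- unname(stats::coef(reg)["log_n"])
slope_se <- unname(summary(reg)$coefficients["log_n", "Std. Error"])

# Assumption: normal-approximation interval around the slope.
slope_ci <- slope + c(-1, 1) * stats::qnorm(0.975) * slope_se
```

With widths generated at the parametric rate, the fitted slope lands close to $-1/2$; a slope much closer to zero would indicate slower-than-parametric contraction.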
Purpose
S3 print method for objects of class gdpar_contraction_report. Produces a human-readable summary of the contraction-rate diagnostic report, including the per-cell table, the estimated slope with standard error and 95% confidence interval, the verdict string, and any recorded warnings.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_contraction_report` | The report object returned by `gdpar_contraction_diagnostic`. |
| `...` | any | Unused; present for S3 generic compatibility. |
Returns
Invisibly returns x (via invisible(x)).
Notes
- Output format. Prints, in order:
  - A header line `"<gdpar_contraction_report> level = <level>"` (using `cat` with `sep = ""`).
  - The `table` component via `print(x$table, row.names = FALSE)`.
  - A blank line, then `"Slope estimate (log_width ~ log_n): <slope> (SE = <se>)"` with values formatted to 3 significant digits via `format(..., digits = 3)`.
  - `"95% CI: [<lower>, <upper>]"` with values formatted to 3 significant digits.
  - `"Verdict: <verdict>"`.
  - If `length(x$warnings) > 0L`, a blank line, the header `"Warnings:"`, and each warning prefixed with `" - "`.
- S3 dispatch. Registered as the print method for class `gdpar_contraction_report`; dispatched automatically when such an object is printed at the console.
- No side effects beyond console output.
Purpose. Extracts the observed scalar outcome vector from a scalar Empirical-Bayes fit (gdpar_eb_fit) by reading the Stan data bundle stored in object$stan_data. It serves as the canonical accessor for the response used downstream by dependence diagnostics (e.g., residual-based Moran's I or block-bootstrap refit engines). Aborts for non-scalar outcomes (multivariate p > 1 or multi-slot K > 1), which are explicitly deferred in this sub-block.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object whose `stan_data` list contains the outcome vector. |
Returns. A numeric vector (as.numeric(y_raw)), the observed outcome values. If the Stan data stored a real-valued response (y_real) that is used; otherwise y_int (count / Bernoulli families) is used. The result is always coerced to numeric.
Notes.
- Reads `object$stan_data$y_real` first; if `NULL`, falls back to `object$stan_data$y_int`. If both are `NULL`, raises a `gdpar_internal_error` via `gdpar_abort()`.
- If `y_raw` is a matrix with more than one column (`ncol(y_raw) > 1L`), raises a `gdpar_unsupported_feature_error`, because multivariate (`p > 1`) outcomes are deferred.
- Multi-slot (`K > 1`) outcomes are not checked here directly (that is handled by `.gdpar_assert_scalar_eb()`), but the matrix-column check implicitly guards against multi-column outcome matrices.
- No S3 dispatch; purely internal.
Purpose. Validates that object is a scalar Empirical-Bayes fit (gdpar_eb_fit) suitable for dependence-robust inference. Checks three conditions: (i) the object inherits from gdpar_eb_fit, (ii) it has no heterogeneous-family list (K > 1), and (iii) its conditional HMC fit is present.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` | The fit object to validate. |
| `arg_name` | `character` (length 1) | Name of the argument, used in error messages. Defaults to `"object"`. |
Returns. invisible(object) — the same object, if all checks pass.
Notes.
- Calls `assert_inherits(object, "gdpar_eb_fit", arg_name)` first; this is an external assertion helper that aborts with an appropriate class if the check fails.
- If `object$family$families` is non-`NULL`, this indicates heterogeneous families (`K > 1`), and a `gdpar_unsupported_feature_error` is raised.
- If `object$conditional_fit` is `NULL`, a `gdpar_internal_error` is raised because the conditional HMC fit is required for downstream residual extraction.
- Returns invisibly to support use as a guard clause.
Purpose. The Axis 2 gate (decision D102): validates that a fit object is a scalar fit on either the Empirical-Bayes or the full-Bayes path, suitable for dependence-robust inference. For EB fits it delegates verbatim to .gdpar_assert_scalar_eb(). For full-Bayes fits (gdpar_fit) it checks the path class and presence of the HMC fit. Any other class is rejected.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | The fit object to validate. |
| `arg_name` | `character` (length 1) | Name of the argument, used in error messages. Defaults to `"object"`. |
Returns. invisible(object) — the same object, if all checks pass.
Notes.
- If `object` inherits from `gdpar_eb_fit`, delegates to `.gdpar_assert_scalar_eb(object, arg_name)` and returns its result. This preserves byte-identical EB-path behaviour.
- If `object` inherits from `gdpar_fit`:
  - Calls `.gdpar_fit_path_class(object)` (an internal helper elsewhere in the package) and asserts the result is `"scalar"`. If not, raises `gdpar_unsupported_feature_error` (multivariate `p > 1` and `K > 1` full-Bayes fits are deferred).
  - If `object$fit` is `NULL`, raises `gdpar_internal_error` (the HMC fit is missing).
- If `object` is neither `gdpar_eb_fit` nor `gdpar_fit`, raises `gdpar_input_error` with a message naming the offending argument.
- Returns invisibly.
Purpose. Extracts the EB point-estimate vector from a scalar Empirical-Bayes fit and flattens it into a single named numeric vector. This is the EB touchpoint of the block-bootstrap engine: the same extraction is performed on each bootstrap refit, and column alignment across refits depends on the name stability guaranteed here.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object. |
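Mathematics. No formula per se, but the extraction order is deterministic and fixed:

$$\hat{\vartheta} = \left(\hat\theta_{\text{ref}}^{\top},\ \hat a^{\top},\ \hat b^{\top},\ \hat W^{\top}\right)^{\top},$$

where each component is a sub-vector of the named coefficients returned by `coef.gdpar_eb_fit()`. The concatenation order is: `theta_ref`, then `a`, then `b`, then `W`.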
Returns. A named numeric vector containing all EB point estimates. Names follow the convention "theta_ref" or "theta_ref[1]" etc. for theta_ref, and "a[1]", "b[1]", "W[1]" etc. for the remaining components (unless the coef() result already provides names).
Notes.
- Calls `stats::coef(fit)` to obtain the structured coefficient list.
- Iterates over components `"theta_ref"`, `"a"`, `"b"`, `"W"` in that fixed order.
- If a component's `$estimate` field is `NULL`, it is silently skipped.
- If names are `NULL`, synthetic names of the form `"<comp>[<index>]"` are generated. For `theta_ref` of length 1, the name is simply `"theta_ref"`.
- If no estimates can be extracted (all components `NULL`), raises `gdpar_internal_error`.
- The result of `do.call(c, unname(parts))` concatenates the named sub-vectors while preserving names.
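The flattening logic in the notes above can be sketched on a mock coefficient list. The function name `flatten_coef_sketch()` and the mock list `cf` are assumptions for illustration; a real `coef()` result from the package may carry additional fields.

```r
# Hypothetical sketch: flatten a structured coefficient list into one
# named numeric vector with a fixed component order.
flatten_coef_sketch <- function(cf) {
  parts <- list()
  for (comp in c("theta_ref", "a", "b", "W")) {
    est <- cf[[comp]]$estimate
    if (is.null(est)) next                      # inactive component: skip
    if (is.null(names(est))) {
      names(est) <- if (comp == "theta_ref" && length(est) == 1L) {
        "theta_ref"
      } else {
        sprintf("%s[%d]", comp, seq_along(est))
      }
    }
    parts[[comp]] <- est
  }
  if (length(parts) == 0L) stop("no estimates could be extracted")
  # unname() keeps c() from prefixing element names with the component name.
  do.call(c, unname(parts))
}

# Mock coefficient list (hypothetical values); component `b` is inactive.
cf <- list(
  theta_ref = list(estimate = 0.3),
  a = list(estimate = c(1.2, -0.4)),
  b = NULL,
  W = list(estimate = c(0.1, 0.2, 0.05))
)
v <- flatten_coef_sketch(cf)
```

Because the names depend only on the component order and lengths, two refits of the same specification produce vectors whose entries line up by position and by name, which is what bootstrap column alignment relies on.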
Purpose. Mirrors .gdpar_eb_estimate_vector() but extracts the model-based (Laplace / conditional posterior) standard errors instead of point estimates. The resulting vector is name-aligned with the estimate vector, enabling ratio computations such as se_ratio = robust_se / model_se.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `fit` | `gdpar_eb_fit` | A scalar Empirical-Bayes fit object. |
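Mathematics. The model SE for each coefficient $\theta_j$ is its Laplace / conditional-posterior standard deviation,

$$\text{SE}_{\text{model}}(\hat\theta_j) = \widehat{\text{sd}}\left(\theta_j \mid y\right),$$

as stored in `coef(fit)$<component>$se`.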
Returns. A named numeric vector of the same length and name structure as .gdpar_eb_estimate_vector(fit). If a component's $se field is NULL but its $estimate field is non-NULL, the corresponding entries are filled with NA_real_.
Notes.
- Reads `$se` fields from the `coef(fit)` list. If `$se` is `NULL` for a given component but `$estimate` is present, fills with `NA_real_` (length-matched via `rep(NA_real_, length(est))`).
- Uses `$estimate` (not `$se`) to determine names and presence, ensuring alignment with the estimate vector.
- Iterates over `theta_ref`, `a`, `b`, `W` in the same fixed order as `.gdpar_eb_estimate_vector()`.
- Returns a `do.call(c, unname(parts))` result, identical in structure to the estimate vector.
Purpose. Extracts the posterior draws of the AMM coefficients from a scalar full-Bayes fit (gdpar_fit) as a single $S \times P$ numeric matrix, with one row per posterior draw and one column per coefficient.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object (already validated by `.gdpar_assert_scalar_dep()`). |
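Mathematics. Let $S$ be the number of posterior draws and $P$ the number of extracted AMM coefficients. The function returns

$$\Theta = \left[\theta_j^{(s)}\right]_{s = 1,\dots,S;\ j = 1,\dots,P} \in \mathbb{R}^{S \times P},$$

where the columns correspond to the Stan variables `theta_ref`, `a_coef`, `b_coef`, and `W_raw`, in that order, each included only if the corresponding AMM component is active.
Returns. An $S \times P$ numeric matrix (a plain matrix, with the `draws_matrix` class stripped) whose columns carry Stan variable names (e.g., `"theta_ref[1]"`, `"a_coef[1]"`). Row count equals the number of posterior draws.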
Notes.
- Requires the suggested package `posterior`; calls `require_suggested("posterior", "extract posterior draws")`, which aborts with an informative message if unavailable.
- Reads draws via `object$fit$draws()` (the raw CmdStan / Stan fit object).
- Variables included: always `"theta_ref"`; additionally `"a_coef"` if `object$amm$a` is non-`NULL`; `"b_coef"` if `object$amm$b` is non-`NULL`; `"W_raw"` if `object$amm$W` is non-`NULL`.
- Uses raw `W_raw` draws (not `sigma_W`-scaled effective weights), matching the EB extractor's use of raw `W_raw` conditional estimates. This is a deliberate parity choice (decision D102).
- Excludes hyperparameters (`mu_theta_ref`, `sigma_theta_ref`) for EB/FB parity.
- If the resulting matrix is `NULL` or has zero columns, raises `gdpar_internal_error`.
- Calls `unclass()` on the result to strip the `draws_matrix` class, returning a plain numeric matrix.
Purpose. Computes the full-Bayes point-estimate vector as the posterior mean of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_estimate_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object. |
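Mathematics. For each coefficient $j = 1, \dots, P$,

$$\hat\theta_j = \frac{1}{S}\sum_{s=1}^{S}\theta_j^{(s)},$$

where $\theta_j^{(s)}$ is the $s$-th posterior draw of coefficient $j$ and $S$ is the number of draws.
Returns. A named numeric vector of length $P$ whose names are the Stan variable names (`"theta_ref[1]"`, `"a_coef[1]"`, etc.).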
Notes.
- Calls `.gdpar_fb_coef_draws_matrix(object)` to obtain the $S \times P$ matrix, then computes column means via `colMeans(mat)`.
- Names are set from `colnames(mat)`, which are the Stan variable names.
Purpose. Computes the full-Bayes model-based standard error vector as the posterior standard deviation of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_model_se_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` | A scalar full-Bayes fit object. |
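Mathematics. For each coefficient $j = 1, \dots, P$,

$$\text{SE}_{\text{model}}(\hat\theta_j) = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\left(\theta_j^{(s)} - \bar\theta_j\right)^2},$$

where $\bar\theta_j = \frac{1}{S}\sum_{s=1}^{S}\theta_j^{(s)}$ is the posterior mean of coefficient $j$.
Returns. A named numeric vector of length $P$, name-aligned with `.gdpar_fb_estimate_vector(object)`. Names are taken from `colnames(mat)`.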
Notes.
- Calls `.gdpar_fb_coef_draws_matrix(object)` then applies `apply(mat, 2L, stats::sd)` to compute column-wise standard deviations.
- Uses `stats::sd` (which divides by $S - 1$, Bessel-corrected).
- The "model SE" here is the posterior SD, which is like-for-like with the EB Laplace SD, so the `se_ratio = robust_se / model_se` comparison is an SD-vs-SD ratio on both paths.
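The two full-Bayes summaries (posterior mean and posterior SD per column) can be reproduced on any draws matrix. The matrix `mat` below is simulated and its column names are only illustrative stand-ins for Stan variable names; no fitted model is involved.

```r
# Simulated S x P draws matrix (hypothetical values and column names).
set.seed(42)
S <- 1000L
mat <- cbind(
  "theta_ref[1]" = stats::rnorm(S, mean = 0.5, sd = 0.1),
  "a_coef[1]"    = stats::rnorm(S, mean = -1.0, sd = 0.3)
)

est <- colMeans(mat)                      # posterior means, one per coefficient
model_se <- apply(mat, 2L, stats::sd)     # posterior SDs (divisor S - 1)

# Manual check of the Bessel-corrected SD for the first column.
manual_sd <- sqrt(sum((mat[, 1] - mean(mat[, 1]))^2) / (S - 1L))
```

Both vectors inherit their names from `colnames(mat)`, so they are name-aligned without any extra bookkeeping.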
Purpose. Class-dispatched accessor for the point-estimate vector, the first touchpoint of the shared block-bootstrap engine. For a gdpar_eb_fit it delegates to .gdpar_eb_estimate_vector() (byte-identical EB path); for a gdpar_fit it delegates to .gdpar_fb_estimate_vector().
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A validated scalar fit object (EB or full-Bayes). |
Mathematics. See .gdpar_eb_estimate_vector() and .gdpar_fb_estimate_vector().
Returns. A named numeric vector of AMM coefficient point estimates, regardless of path.
Notes.
- Dispatch is via `inherits(object, "gdpar_eb_fit")` (manual S3-style, not formal `UseMethod`).
- If the object is a `gdpar_eb_fit`, calls and returns `.gdpar_eb_estimate_vector(object)` verbatim, preserving regression-gate compatibility.
- Otherwise (assumed `gdpar_fit`), calls `.gdpar_fb_estimate_vector(object)`.
- Column names are stable across refits of the same model specification, which is critical for the block-bootstrap column alignment.
Purpose. Class-dispatched accessor for the model-based standard error vector, the second touchpoint of the shared block-bootstrap engine. EB path: Laplace / conditional posterior SD (verbatim). Full-Bayes path: posterior SD per coefficient. In both cases the "model SE" is a within-model (posterior / Laplace) standard deviation.
Arguments.
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A validated scalar fit object (EB or full-Bayes). |
Mathematics. See .gdpar_eb_model_se_vector() and .gdpar_fb_model_se_vector().
Returns. A named numeric vector of model-based standard errors, name-aligned with .gdpar_dep_estimate_vector(object).
Notes.
- Same dispatch pattern as `.gdpar_dep_estimate_vector()`: `inherits(object, "gdpar_eb_fit")` triggers the EB path; otherwise full-Bayes.
- The resulting vector is used in computing `se_ratio = robust_se / model_se`, and because both EB and full-Bayes model SEs are standard deviations (SD-vs-SD), the ratio is a like-for-like comparison.
- Name alignment with the estimate vector is guaranteed by the internal extractors.
Purpose Computes the default block length for block bootstrap resampling using the cube-root rate $n^{1/3}$, the rate-optimal order for the moving block bootstrap. The data-driven constant of Politis & White (2004) is a deferred refinement; this default provides only the correct rate, not the optimal constant.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `n` | integer-coercible scalar | Sample size (number of observations). |
Mathematics
Implements the rate:

$$\ell_{\text{default}}(n) = \max\left\{1,\ \operatorname{round}\left(n^{1/3}\right)\right\},$$

where the rounding and the lower bound produce an integer $\ell_{\text{default}} \geq 1$.
Returns A single integer: the default block length.
Notes The `as.integer(round(...))` call rounds to the nearest integer and coerces the result to integer type; the outer `max(1L, ...)` guarantees the result is at least 1 even when `n = 0` or `n = 1`.
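As a quick illustration of the cube-root default, the rule fits in one line. The function name `default_block_length_sketch()` is hypothetical; the body follows the `max(1L, as.integer(round(...)))` pattern described in the notes.

```r
# Hypothetical one-line version of the cube-root rate default.
default_block_length_sketch <- function(n) {
  max(1L, as.integer(round(n^(1 / 3))))
}

lens <- vapply(c(0, 1, 27, 100, 1000), default_block_length_sketch, integer(1))
```

The values grow slowly with `n`: a thousand observations give blocks of about ten, and the lower bound only matters for degenerate sample sizes.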
Purpose Predicate that tests whether a block-size argument is the literal character string "auto", distinguishing the data-driven Politis–White path from a fixed integer or the NULL rate default. Shared by the temporal and spatial robust estimators.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | any R object | The block-length (or block-size) argument to inspect. |
Returns A logical scalar: TRUE if x is exactly the character string "auto" (length-1 character, not NA), FALSE otherwise.
Notes The compound guard is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto") is deliberately strict: a factor, an NA_character_, or a character vector of length ≠ 1 all return FALSE. No side effects.
Purpose Evaluates the flat-top lag window (kernel) of Politis (2003) / Politis & White (2004), vectorised over its argument. Used inside the Politis–White block-length selector to compute the windowed spectral estimates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `s` | numeric vector | Scaled lag values $k/M$. |
Mathematics

$$\lambda(s) = \begin{cases} 1, & |s| \le 1/2, \\ 2\,(1 - |s|), & 1/2 < |s| \le 1, \\ 0, & |s| > 1. \end{cases}$$

Returns A numeric vector of the same length as `s` containing $\lambda(s)$ evaluated elementwise.
Notes Vectorised via nested ifelse over abs(s). No input validation; non-finite or NA inputs propagate NA.
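A minimal sketch of the trapezoidal flat-top window, written with nested `ifelse` as the notes describe. The function name `flat_top_sketch()` is hypothetical, and the piecewise definition used is the published Politis–White trapezoid.

```r
# Hypothetical vectorised flat-top (trapezoidal) lag window.
flat_top_sketch <- function(s) {
  a <- abs(s)
  ifelse(a <= 0.5, 1, ifelse(a <= 1, 2 * (1 - a), 0))
}

w <- flat_top_sketch(c(0, 0.5, 0.75, 1, 1.5, -0.75))
```

The window is flat at 1 near the origin, decays linearly to 0 at $|s| = 1$, and is symmetric in `s`; `NA` inputs propagate as `NA` because no validation is performed.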
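Purpose Determines the adaptive bandwidth $\hat m$ from the sample autocorrelations by locating the first run of `Kn` consecutive negligible lags (the "first insignificant run" rule). Factored out for direct unit testing.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `rho` | numeric vector | Sample autocorrelations at lags $1, \dots, M_{\max}$. |
| `Kn` | integer scalar | Number of consecutive insignificant lags required to declare the bandwidth; typically $K_N = \max(5, \lceil \log_{10} n \rceil)$. |
| `crit` | numeric scalar | Critical value for the significance test; a lag $k$ is insignificant when $\lvert\hat\rho(k)\rvert < \text{crit}$. |
Mathematics
Returns the smallest integer $\hat m$ identified by the first run of $K_n$ consecutive lags satisfying $\lvert\hat\rho(k)\rvert < \text{crit}$. If no such run exists in $1, \dots, M_{\max}$,

$$\hat m = \max\left\{k : \lvert\hat\rho(k)\rvert \ge \text{crit}\right\},$$

i.e., the largest significant lag. If every lag is insignificant, $\hat m = 1$.
Returns An integer scalar: the estimated bandwidth $\hat m$.
Notes Early-return inside the `for` loop at the first qualifying run. The function operates on a logical vector `insig <- abs(rho) < crit` with one entry per lag, and returns `1L` as a safe minimum when all autocorrelations are negligible (near-white noise).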
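The run-detection rule can be sketched as below. The function name `pw_mhat_sketch()` is hypothetical, and one indexing convention is an assumption: the sketch reports the first lag of the qualifying run, which may differ by one from the convention used inside the package.

```r
# Hypothetical sketch of the "first insignificant run" rule.
# Assumption: the returned value is the first lag of the run.
pw_mhat_sketch <- function(rho, Kn, crit) {
  insig <- abs(rho) < crit
  n_lags <- length(insig)
  if (n_lags >= Kn) {
    for (m in seq_len(n_lags - Kn + 1L)) {
      if (all(insig[m:(m + Kn - 1L)])) return(as.integer(m))  # early return
    }
  }
  # No qualifying run: fall back to the largest significant lag, if any.
  if (any(!insig)) return(as.integer(max(which(!insig))))
  1L  # all lags negligible and too few lags to form a run
}

m_run  <- pw_mhat_sketch(c(0.8, 0.5, 0.3, 0.05, 0.02, 0.01, 0.03, 0), Kn = 3L, crit = 0.1)
m_last <- pw_mhat_sketch(c(0.5, 0.4, 0.3), Kn = 2L, crit = 0.1)
m_min  <- pw_mhat_sketch(c(0.01, 0.02), Kn = 5L, crit = 0.1)
```

The three calls exercise the three exits: a run found at lag 4, no run so the largest significant lag (3) is used, and the safe minimum of 1 when everything is negligible.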
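Purpose Computes the data-driven optimal block length for the block bootstrap with the Politis–White selector, falling back to the rate default when the data-driven rule cannot be applied.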
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `resid` | numeric vector | Residuals of the fitted working-independence model, already in the (temporal or spatial) bootstrap ordering. |
| `c_thresh` | numeric scalar | Critical-value multiplier for the adaptive bandwidth test. Default `qnorm(0.975)` ≈ 1.96, matching `np::b.star`. |
Mathematics
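- Bandwidth selection. Compute sample autocorrelations $\hat\rho(1),\dots,\hat\rho(M_{\max})$, where $M_{\max} = \min(\lceil\sqrt{n}\rceil + K_N,\ n-1)$ and $K_N = \max(5, \lceil\log_{10} n\rceil)$. The adaptive bandwidth $\hat{m}$ is found by `.gdpar_pw_mhat` with critical value

  $$\text{crit} = c_{\text{thresh}}\sqrt{\log_{10} n / n}.$$

  Set $M = 2\hat m$.
- Spectral estimates. Recompute autocovariances $\hat{R}(k)$ for $k = 0,\dots,M$. Apply the flat-top window $\lambda(k/M)$:

  $$\hat G = \sum_{k=-M}^{M} \lambda(k/M)\,|k|\,\hat R(k), \qquad \widehat{\text{spec}} = \sum_{k=-M}^{M} \lambda(k/M)\,\hat R(k).$$

- Optimal block length. For overlapping (moving/circular) block bootstrap the variance constant is $D = \tfrac{4}{3}\,\widehat{\text{spec}}^2$ (Lahiri 2003), giving

  $$\hat b = \left(\frac{2\,\hat G^2}{D}\right)^{1/3} n^{1/3}.$$

- Capping. The final integer block length is obtained by rounding $\hat b$ and capping it.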
Returns A list with components:
| Component | Type | Meaning |
|---|---|---|
| `block_length` | integer | The selected block length. |
| `method` | character | `"auto"` if the data-driven rule succeeded; `"rate"` if the fallback was used. |
| `reason` | character | Human-readable description of the selection. |
Notes Five fallback paths return the method = "rate": (i)
Purpose Generates a resampled index vector of length n for a single temporal block bootstrap replicate. Draws ceiling(n / block_length) contiguous blocks with replacement, concatenates them, and truncates to length n. Supports both the moving (Künsch 1989) and circular (Politis & Romano 1992) block schemes.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `n` | integer-coercible scalar | Sample size. |
| `block_length` | integer-coercible scalar | Length of each contiguous block; must be in $[1, n]$. |
| `type` | character | `"moving"` (default) or `"circular"`. Matched via `match.arg`. |
Mathematics
Let $\ell$ denote `block_length` and $B = \lceil n/\ell \rceil$ the number of blocks drawn.
- Moving block bootstrap (`"moving"`): block start positions are drawn uniformly from $\{1, 2, \dots, n - \ell + 1\}$. Each block $b$ contributes indices $s_b, s_b+1, \dots, s_b + \ell - 1$.
- Circular block bootstrap (`"circular"`): block start positions are drawn uniformly from $\{1, 2, \dots, n\}$. Indices wrap around modulo $n$: the raw index $i$ maps to $((i-1) \bmod n) + 1$.

The output is the first $n$ entries of the concatenated block indices.
Returns An integer vector of length `n` containing resampled observation indices in $\{1, \dots, n\}$.
Notes
- Raises an abort (class `"gdpar_input_error"`) via `gdpar_abort()` if `block_length` is outside $[1, n]$.
- The circular scheme gives every observation equal expected resampling weight, whereas the moving scheme slightly down-weights observations near the boundaries.
- This is the single-chain sibling of a multi-chain MCMC-draw block bootstrap resampler (`block_bootstrap_indices()`) documented elsewhere.
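The two schemes described above can be sketched in a few lines of base R. This is an illustration under the stated definitions, not the package's internal code; `block_indices_sketch()` is a hypothetical name.

```r
# Hypothetical stand-in for the internal resampler described above.
block_indices_sketch <- function(n, block_length,
                                 type = c("moving", "circular")) {
  type <- match.arg(type)
  n <- as.integer(n)
  ell <- as.integer(block_length)
  stopifnot(ell >= 1L, ell <= n)
  n_blocks <- ceiling(n / ell)
  # Moving blocks may start at 1..(n - ell + 1); circular blocks anywhere.
  n_starts <- if (type == "moving") n - ell + 1L else n
  starts <- sample.int(n_starts, n_blocks, replace = TRUE)
  # Expand each start s_b into s_b, s_b + 1, ..., s_b + ell - 1 (column-wise).
  raw <- as.vector(outer(seq_len(ell) - 1L, starts, `+`))
  # Wrap modulo n; a no-op for moving blocks, which never overrun n.
  idx <- ((raw - 1L) %% n) + 1L
  idx[seq_len(n)]  # truncate to exactly n indices
}

set.seed(1)
block_indices_sketch(10, 3, "moving")
block_indices_sketch(10, 4, "circular")
```

Within each block the indices are consecutive, modulo $n$ in the circular case, so short-range dependence inside a block is preserved in the resample.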
Purpose Computes residuals of a scalar fit (Empirical-Bayes or full-Bayes) for use in the dependence diagnostics. Shared by the temporal diagnostic (gdpar_dependence_diagnostic) and the spatial diagnostic (gdpar_spatial_dependence_diagnostic) to ensure a single, consistent residual definition (design decision D100).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A scalar fitted model object. |
| `residual_type` | character | One of `"quantile"`, `"response"`, `"pearson"`, `"deviance"`. |
| `randomize_seed` | integer or `NULL` | Seed for reproducibility of randomized quantile residuals for discrete families; ignored for continuous families. |
Returns A numeric vector of residuals of length $n$.
Notes
- Full-Bayes branch (`gdpar_fit` that is not a `gdpar_eb_fit`): delegates entirely to the S3 method `stats::residuals(object, type = residual_type, randomize_seed = randomize_seed)`, which internally uses the posterior predictive draws and `.gdpar_residuals_dispatch()` (design decision D102).
- Empirical-Bayes branch: extracts the scalar observed outcome via `.gdpar_eb_scalar_y_obs(object)`, obtains response-type predictions via `stats::predict(object, type = "response")`, reads the family name from `object$family$name`, and dispatches to `.gdpar_residuals_dispatch()`.
gdpar_dependence_diagnostic(object, index = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), max_lag = NULL, level = 0.95, randomize_seed = NULL, ...)
Purpose (Exported.) Quantifies serial (temporal) dependence in the residuals of a scalar Path 1 Empirical-Bayes or full-Bayes fit. The diagnostic is the gate for `gdpar_dependence_robust()`: it makes violations of the conditional-independence assumption visible and measurable before any block-bootstrap remedy is applied. Only scalar fits ($K=1$, $p=1$) are accepted.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A scalar fitted model. |
| `index` | numeric vector or `NULL` | Temporal (or one-dimensional) ordering of observations. If non-`NULL`, residuals are sorted by `order(index)` before statistics are computed. Must have length $n$. |
| `residual_type` | character | Residual type: `"quantile"` (default; Dunn-Smyth / randomized quantile residuals), `"response"`, `"pearson"`, or `"deviance"`. |
| `max_lag` | integer or `NULL` | Maximum lag for the Ljung–Box test. When `NULL` (default), a default lag is chosen internally. |
| `level` | numeric in $(0, 1)$ | Confidence level for the verdict. Dependence is flagged when a p-value is below $1 - \texttt{level}$. Default `0.95`. |
| `randomize_seed` | integer or `NULL` | Seed for randomized quantile residuals (discrete families). |
| `...` | — | Unused; present for signature stability. |
Mathematics
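- Lag-1 autocorrelation. Let $r_1, \dots, r_n$ be the (optionally re-ordered) residuals, $\bar{r}$ their mean, and $\tilde{r}_t = r_t - \bar{r}$. Then $$ \hat\rho_1 = \frac{\sum_{t=2}^{n} \tilde{r}_t\,\tilde{r}_{t-1}}{\sum_{t=1}^{n} \tilde{r}_t^{2}} $$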
The approximate one-sided p-value under the null
- Durbin–Watson statistic. Reported descriptively (not as a formal test):
Values near 2 indicate no first-order autocorrelation.
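- Ljung–Box test. The omnibus test across lags $1, \dots, h$ (where $h = \texttt{max\_lag}$) is $$ Q = n(n+2) \sum_{k=1}^{h} \frac{\hat\rho_k^{2}}{n-k} $$ where $\hat\rho_k$ is the lag-$k$ sample autocorrelation,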
computed via stats::Box.test(..., type = "Ljung-Box", fitdf = 0). The degrees of freedom are not reduced by the number of estimated model coefficients (fitdf = 0), making the test mildly optimistic for residuals of a fitted model.
- Verdict. Dependence is flagged when $p_{\text{Ljung-Box}} < 1 - \texttt{level}$.
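The statistics above can be approximated in a few lines of base R. This is a sketch under the standard textbook definitions, not the package's code: `dependence_stats_sketch()` is a hypothetical helper, the Durbin–Watson ratio is computed on centred residuals (an assumption of this sketch), and the lag-1 p-value is omitted.

```r
# Hypothetical helper mirroring the quantities listed above.
dependence_stats_sketch <- function(r, max_lag, level = 0.95) {
  n <- length(r)
  r_c <- r - mean(r)                 # centred residuals
  denom <- sum(r_c^2)
  stopifnot(denom > 0)               # zero-variance residuals are rejected
  lag1 <- sum(r_c[-1] * r_c[-n]) / denom
  dw <- sum(diff(r_c)^2) / denom     # descriptive only; near 2 = no lag-1 dependence
  lb <- stats::Box.test(r, lag = max_lag, type = "Ljung-Box", fitdf = 0)
  list(
    lag1_autocorr = lag1,
    durbin_watson = dw,
    ljung_box_statistic = unname(lb$statistic),
    ljung_box_df = unname(lb$parameter),
    ljung_box_p_value = lb$p.value,
    flagged = lb$p.value < 1 - level
  )
}

set.seed(42)
ar1 <- as.numeric(stats::filter(stats::rnorm(300), 0.7, method = "recursive"))
str(dependence_stats_sketch(ar1, max_lag = 10))
```

On a strongly autocorrelated series such as the simulated AR(1) above, the lag-1 autocorrelation is large, the Durbin–Watson ratio falls well below 2, and the Ljung–Box verdict flags dependence.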
Returns An object of class c("gdpar_dependence_diagnostic", "list") with components:
| Component | Type | Meaning |
|---|---|---|
| `residual_type` | character | The residual type used. |
| `n` | integer | Number of residuals. |
| `max_lag` | integer | Maximum lag used for the Ljung–Box test. |
| `lag1_autocorr` | numeric | Lag-1 autocorrelation of the residuals. |
| `lag1_p_value` | numeric | Two-sided p-value for the lag-1 autocorrelation. |
| `durbin_watson` | numeric | Durbin–Watson statistic. |
| `ljung_box_statistic` | numeric | Ljung–Box test statistic. |
| `ljung_box_df` | integer | Degrees of freedom of the $\chi^2$ reference distribution. |
| `ljung_box_p_value` | numeric | P-value of the Ljung–Box test. |
| `level` | numeric | Confidence level used. |
| `index_supplied` | logical | Whether `index` was non-`NULL`. |
| `verdict` | character | Human-readable verdict string. |
A print method (S3 dispatch on "gdpar_dependence_diagnostic") is provided for formatted output.
Notes
- Input validation. Calls `.gdpar_assert_scalar_dep(object, "object")` to ensure the fit is scalar. Validates `level` via `assert_numeric_scalar(level, ..., lower = 0, upper = 1)`. Requires the posterior package (suggested dependency) for extracting posterior draws.
- Abort conditions. Raises `gdpar_abort` with class `"gdpar_input_error"` if `index` has the wrong length or `max_lag` is outside $[1, n-1]$. Raises class `"gdpar_diagnostic_error"` if all residuals have zero variance (`denom <= 0`).
- S3 method note. The returned object carries class `"gdpar_dependence_diagnostic"` as its primary class, enabling `print.gdpar_dependence_diagnostic()` dispatch.
- Scope. Only scalar ($K=1$, $p=1$) fits are accepted. Spatial dependence is handled by the sibling `gdpar_spatial_dependence_diagnostic()`.
Purpose (Exported S3 method.) Provides a human-readable formatted summary of a gdpar_dependence_diagnostic object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_dependence_diagnostic` | The diagnostic object to print. |
| `digits` | integer | Number of significant digits for the printed statistics. Default `3L`. |
| `...` | — | Unused; present for S3 generic compatibility. |
Returns Invisibly returns `x`.
Notes The method is registered via `@export` for S3 dispatch on the `"gdpar_dependence_diagnostic"` class.
Purpose S3 print method for objects of class gdpar_dependence_diagnostic. Produces a human-readable, multi-line textual summary of the serial-dependence diagnostic battery (autocorrelation, Durbin–Watson, Ljung–Box) attached to a fitted model.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list (S3 class `gdpar_dependence_diagnostic`) | The diagnostic object produced by `gdpar_dependence_diagnostic()`. Required fields: `$residual_type`, `$index_supplied`, `$n`, `$lag1_autocorr`, `$lag1_p_value`, `$durbin_watson`, `$ljung_box_df`, `$ljung_box_statistic`, `$ljung_box_p_value`, `$verdict`. |
| `digits` | integer (default `3L`) | Number of significant digits used by `format()` when printing numeric quantities. |
| `...` | — | Ignored; present for S3 method compatibility. |
Returns invisible(x) — the input object, invisibly, following standard R print-method convention.
Notes
- All output is emitted via `cat()` to the console (stdout). No value is returned visibly.
- The print method checks `x$index_supplied` to annotate whether the residuals were ordered by a user-supplied index or by natural row order.
- If `x$index_supplied` is `TRUE`, the residual-type line reads `"(ordered by supplied index)"`; otherwise `"(natural row order)"`.
- No validation of `x` fields is performed; missing or `NULL` fields will produce blank output segments.
.gdpar_dependence_robust_engine(object, data, resample_fun, B, level, seed, iter_warmup, iter_sampling, chains, verbose, verbose_msg, caller_env, ...)
Purpose Internal (non-exported) shared block-bootstrap-by-refit engine. Factors out the entire resampling loop, seed management, bootstrap-SE and percentile-interval assembly, and per-refit convergence accounting that is common to the temporal (gdpar_dependence_robust) and spatial (gdpar_spatial_dependence_robust) robust-inference wrappers. The two public entry points differ only in their resample_fun and in the descriptive metadata they attach; everything downstream of the resample is handled here identically. Serves both the Empirical-Bayes (gdpar_eb_fit) and full-Bayes (gdpar_fit) paths through class-dispatched extractors (decision D102).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | list (S3 class `gdpar_eb_fit` or `gdpar_fit`) | A scalar Path 1 fit object ($K = 1$, $p = 1$). Must carry `$call` (the original fitting call), and must be dispatchable by `.gdpar_dep_estimate_vector()` and `.gdpar_dep_model_se_vector()`. |
| `data` | data.frame | The original data frame passed to the fitting function. Each bootstrap iteration indexes rows of this data frame. |
| `resample_fun` | nullary function | A closure with no arguments that returns an integer vector of length `nrow(data)`: the row indices for one bootstrap resample. For temporal bootstrapping this wraps `.gdpar_block_bootstrap_data_indices()` (moving or circular blocks ordered by `index`); for spatial bootstrapping it returns a spatial-block index vector. |
| `B` | integer scalar | Number of bootstrap refits to perform. |
| `level` | numeric scalar in $(0, 1)$ | Confidence level for the percentile interval (e.g. `0.95`). |
| `seed` | integer or `NULL` | Optional RNG seed. When non-`NULL`, `set.seed(as.integer(seed))` is called once before the loop, ensuring reproducibility of both the per-refit Stan seeds and the `resample_fun()` draws. |
| `iter_warmup` | integer scalar | Number of warmup (burn-in) iterations passed to each Stan refit. |
| `iter_sampling` | integer scalar | Number of post-warmup sampling iterations passed to each Stan refit. |
| `chains` | integer scalar | Number of HMC chains for each refit. |
| `verbose` | logical scalar | When `TRUE`, emits `verbose_msg` once at the start via `gdpar_inform()`. |
| `verbose_msg` | character or `NULL` | Pre-formatted cost message printed when `verbose` is `TRUE`. |
| `caller_env` | environment | The environment (typically the public wrapper's `parent.frame()`) in which each refit call is evaluated, so that model symbols resolve exactly as for a direct `gdpar_eb()` or `gdpar()` call. |
| `...` | — | Not forwarded directly; present for extensibility. |
RNG-consumption contract. The engine's random-number consumption order is frozen for reproducibility:
- `set.seed(seed)` (when `seed` is non-`NULL`);
- $B$ per-refit Stan seeds drawn via `sample.int(.Machine$integer.max, B)`; these are assigned deterministically to iteration $b = 1, \ldots, B$;
- One call to `resample_fun()` per iteration $b$.
Point-estimate extraction. For each successful refit $b$, the engine calls `.gdpar_dep_estimate_vector(fit_b)`, which returns a named numeric vector of all AMM coefficients (`theta_ref`, `a_coef`, `b_coef`, `W_raw`, etc.).
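Robust standard error. Let $\hat{\theta}_j^{(b)}$ denote the estimate of parameter $j$ from bootstrap replicate $b$, and let $B_{\text{ok}}$ be the number of replicates with no errors and no `NA` coefficients. Then:
$$ \widehat{\text{SE}}_{\text{robust},j} = \sqrt{\frac{1}{B_{\text{ok}} - 1} \sum_{b=1}^{B_{\text{ok}}} \bigl(\hat{\theta}_j^{(b)} - \bar{\theta}_j\bigr)^2} $$
where $\bar{\theta}_j = B_{\text{ok}}^{-1} \sum_{b=1}^{B_{\text{ok}}} \hat{\theta}_j^{(b)}$ is the mean of the successful bootstrap estimates.
Percentile confidence interval. For level $1 - \alpha$, the interval for parameter $j$ is
$$ \bigl[\, q_j(\alpha/2),\; q_j(1 - \alpha/2) \,\bigr] $$
where $q_j(p)$ is the empirical $p$-quantile of the successful bootstrap estimates, computed via `stats::quantile(..., probs = c(alpha/2, 1 - alpha/2), names = FALSE)`.
SE ratio. The ratio comparing robust and model-based uncertainty:
$$ \text{se\_ratio}_j = \frac{\widehat{\text{SE}}_{\text{robust},j}}{\text{SE}_{\text{model},j}} $$
A ratio $> 1$ signals that the model-based SE understates the sampling variability due to dependence.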
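The summary step can be sketched as follows, assuming the refit estimates have been collected in a matrix with one row per replicate and one named column per coefficient. The function name `summarise_boot_sketch()` and the input layout are assumptions for illustration, not the engine's internals.

```r
# `boot`: hypothetical B x p matrix of refit estimates with column names;
# a row containing NA represents a failed or incomplete refit.
summarise_boot_sketch <- function(boot, model_se, level = 0.95) {
  ok <- stats::complete.cases(boot)
  boot_ok <- boot[ok, , drop = FALSE]
  stopifnot(nrow(boot_ok) >= 2L)      # an SD needs at least two replicates
  alpha <- 1 - level
  robust_se <- apply(boot_ok, 2, stats::sd)
  ci <- apply(boot_ok, 2, stats::quantile,
              probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(
    parameter = colnames(boot),
    model_se = model_se,
    robust_se = unname(robust_se),
    se_ratio = unname(robust_se) / model_se,
    ci_lower = unname(ci[1, ]),
    ci_upper = unname(ci[2, ]),
    row.names = NULL
  )
}

boot <- matrix(c(1, 2, 3, 4, NA, 10, 20, 30, 40, 50), ncol = 2,
               dimnames = list(NULL, c("a", "b")))
summarise_boot_sketch(boot, model_se = c(1, 10))
```

In this toy input the fifth row is dropped because one coefficient is `NA`, so the summaries are computed from four replicates.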
Convergence diagnostics. Per-refit convergence fields are aggregated over the refits:
- $\text{max\_rhat} = \max_b \hat{R}^{(b)}_{\max}$ (maximum across all refits of the per-refit maximum split-$\hat{R}$);
- $\text{min\_ess\_bulk} = \min_b \text{ESS}_{\text{bulk},\min}^{(b)}$ (minimum across all refits of the per-refit minimum bulk ESS);
- $\text{n\_divergent\_refits} = |\{b : D^{(b)} > 0\}|$ (number of refits with at least one divergent transition);
- $\text{n\_high\_rhat\_refits} = |\{b : \hat{R}^{(b)}_{\max} > 1.05\}|$ (number of refits with max R-hat exceeding the 1.05 threshold).
The R-hat threshold iter_warmup/iter_sampling.
Returns A list with components:
| Component | Type | Description |
|---|---|---|
| `table` | data.frame | One row per AMM coefficient, columns: `parameter` (character), `estimate` (original point estimate), `model_se` (Laplace SD or posterior SD), `robust_se` (bootstrap SD), `se_ratio` (`robust_se / model_se`), `ci_lower`, `ci_upper` (percentile interval endpoints). |
| `B_ok` | integer | Number of successful bootstrap refits (no errors, all coefficients non-`NA`). |
| `seed` | integer | The supplied seed, or `NA_integer_` if `seed` was `NULL`. |
| `warnings` | character vector | Accumulated warning messages (refit failures, convergence issues). Zero-length if clean. |
| `refit_diagnostics` | list | Aggregate convergence summary: `max_rhat` (numeric), `min_ess_bulk` (numeric), `n_divergent_refits` (integer), `n_high_rhat_refits` (integer), `rhat_threshold` (numeric, always 1.05). |
Notes
- No refit exclusion. Under-converged or divergent refits are never excluded or down-weighted. The rationale (documented in source decision D102) is that excluding under-converged refits is non-random — it removes precisely the data configurations the bootstrap is meant to probe — and would bias the SE. Both R-hat breaches and divergence counts are reported as diagnostics only.
- Error handling. If a refit raises an error, the error message is captured via `tryCatch`, stored in `warnings_msg`, and the iteration is skipped (`next`). If fewer than 2 refits succeed (`B_ok < 2`), the engine calls `gdpar_abort()` with class `"gdpar_diagnostic_error"`, aborting the run.
- Parameter alignment. Only parameters common to the original fit's `param_names` and each refit's estimate vector are recorded in `boot[b, ]`. This handles the (rare) case where a refit produces a partial coefficient vector.
- Refit call construction. The refit call is `object$call` with fields overridden: `data` → `sub` (the resampled data), `iter_warmup`, `iter_sampling`, `chains`, `verbose` → `FALSE`, `refresh` → `0L`, `skip_id_check` → `TRUE`, `seed` → `refit_seeds[b]`. A new environment `env` is created with `parent = caller_env` and `env$sub <- sub`, so the symbol `sub` resolves inside the call.
- Diagnostics path-agnostic. Both `gdpar_eb_fit` and `gdpar_fit` objects carry a `$diagnostics` slot with fields `rhat_max`, `ess_bulk_min`, `divergent_count`. The engine reads whichever is present.
- Byte-identical EB path. On the Empirical-Bayes path, the dispatch to `.gdpar_dep_estimate_vector` / `.gdpar_dep_model_se_vector` resolves to the original EB helpers, and the refit is a `gdpar_eb()` call, so the engine's output is bit-for-bit identical to the pre-D102 temporal-only implementation.
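The call-rewriting pattern in the note on refit call construction can be reproduced with a toy model. In this sketch an `lm()` fit stands in for the gdpar fit and only `data` is overridden; `refit_on_resample_sketch()` is a hypothetical helper, not a package function.

```r
# Toy stand-in: a stored lm() call plays the role of object$call.
refit_on_resample_sketch <- function(fit_call, data, idx, caller_env) {
  sub <- data[idx, , drop = FALSE]   # resampled rows
  call_b <- fit_call
  call_b$data <- quote(sub)          # point the call at the symbol `sub`
  env <- new.env(parent = caller_env)
  env$sub <- sub                     # so `sub` resolves when the call is evaluated
  eval(call_b, envir = env)          # other symbols resolve via caller_env
}

d <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
fit <- stats::lm(y ~ x, data = d)
refit <- refit_on_resample_sketch(fit$call, d, idx = c(1:5, 1:5),
                                  caller_env = environment())
stats::coef(refit)
```

Evaluating in a child of the caller's environment keeps every other symbol in the original call (formula objects, helper functions) resolvable exactly as it was for the first fit.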
gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = "quantile", randomize_seed = NULL, type = "moving", B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = FALSE, ...)
Purpose Public, exported entry point for dependence-robust standard errors via a temporal block bootstrap. Re-estimates the uncertainty of a scalar Path 1 Empirical-Bayes or full-Bayes fit so that it is robust to temporal (serial) dependence in the data, without modelling that dependence. It refits the model on $B$ block-bootstrap resamples of the data (ordered by `index`), and reports the bootstrap standard deviation and percentile intervals of each AMM coefficient alongside the model-based (Laplace / posterior) standard errors. This implements the working-independence + robust-variance stance of Liang & Zeger (1986): the point estimates are unchanged (consistent when the mean structure is correct, though not efficient); only the reported uncertainty is made dependence-robust. Delegates the core loop to `.gdpar_dependence_robust_engine()`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | S3 object (`gdpar_eb_fit` or `gdpar_fit`) | A scalar Path 1 fit ($K = 1$, $p = 1$): either from `gdpar_eb()` (Empirical Bayes) or `gdpar()` (full Bayes). |
| `data` | data.frame | The data frame originally passed to the fitting function. The fit object does not store the data (to stay lightweight), so it must be re-supplied. It is resampled by contiguous blocks and the model is refit on each resample. |
| `index` | numeric/integer vector of length $n$, or `NULL` | Optional temporal ordering of the rows of `data`. Data are sorted by `order(index)` so that contiguous blocks correspond to contiguous time. When `NULL` (default), the natural row order is assumed to be the temporal order. |
| `block_length` | `NULL`, positive integer, or `"auto"` | Block size for the bootstrap. `NULL` (default): uses the rate-optimal $\max(1, \lfloor n^{1/3} \rceil)$. `"auto"`: selects the block length in a data-driven way via the Politis & White (2004) automatic rule (with the Patton, Politis & White 2009 correction), computed from the fitted residuals (no extra refit), falling back to the rate-optimal formula on a degenerate series. The chosen value and method are reported in the result. |
| `residual_type` | character, one of `"quantile"` (default), `"response"`, `"pearson"`, `"deviance"` | Type of residuals fed to the Politis & White automatic block-length selector. Used only when `block_length = "auto"`; ignored otherwise. `"quantile"` refers to Dunn–Smyth randomized quantile residuals. |
| `randomize_seed` | integer or `NULL` | Optional seed for the randomized quantile residuals of discrete families. Used only by the `"auto"` block-length selector for reproducibility of the block-length choice; ignored otherwise. |
| `type` | character, one of `"moving"` (default) or `"circular"` | Type of block bootstrap. `"moving"` uses overlapping blocks that slide along the series; `"circular"` wraps the series into a circle. |
| `B` | integer scalar (default `199L`) | Number of bootstrap refits. |
| `level` | numeric scalar in $(0, 1)$ (default `0.95`) | Confidence level for the percentile interval. |
| `seed` | integer or `NULL` | Optional RNG seed controlling both the block resampling and deterministically derived per-refit Stan seeds, for full reproducibility. |
| `iter_warmup` | integer scalar (default `500L`) | Number of warmup iterations per refit. Defaults are deliberately short to keep cost manageable. |
| `iter_sampling` | integer scalar (default `500L`) | Number of post-warmup sampling iterations per refit. |
| `chains` | integer scalar (default `2L`) | Number of HMC chains per refit. |
| `verbose` | logical scalar (default `FALSE`) | When `TRUE`, prints an opt-in cost message once. |
| `...` | — | Additional arguments forwarded to `gdpar_eb()` (or `gdpar()`) for every refit. |
The function applies the Liang & Zeger (1986) working-independence / sandwich-variance paradigm to the gdpar model class. The key quantities are:
- Block-length selection. Under the rate-optimal default: $$ L = \max\!\bigl(1,\, \lfloor n^{1/3} \rceil\bigr) $$ Under `"auto"`, the Politis & White (2004) algorithm estimates the optimal block length from the spectral density at frequency zero of the fitted residuals, with the Patton–Politis–White (2009) bias correction.
- Moving block bootstrap. For series length $n$ and block length $L$, the moving block bootstrap draws $\lceil n/L \rceil$ contiguous blocks of length $L$ uniformly at random (with replacement) from the $n - L + 1$ possible overlapping blocks, concatenates them, and truncates the result to length $n$.
- Circular block bootstrap. The series is wrapped into a circle; $n - L + 1$ is replaced by $n$ possible blocks, eliminating edge effects.
- Robust SE. Computed by the engine: $$ \widehat{\text{SE}}_{\text{robust}} = \text{SD}\bigl(\hat\theta^{(1)}, \ldots, \hat\theta^{(B_{\text{ok}})}\bigr) $$
- SE ratio. $$ \text{se\_ratio} = \frac{\widehat{\text{SE}}_{\text{robust}}}{\text{SE}_{\text{model}}} $$ Values $> 1$ signal that the model-based SE understates true sampling variability due to dependence.
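As a quick illustration of the rate-optimal default above, the following sketch evaluates $L = \max(1, \lfloor n^{1/3} \rceil)$ for a few sample sizes. It assumes that $\lfloor\cdot\rceil$ means rounding to the nearest integer (R's `round()`); `rate_block_length_sketch()` is a hypothetical name, not a package function.

```r
# Hypothetical helper: rate-optimal default block length.
rate_block_length_sketch <- function(n) {
  # Round n^(1/3) to the nearest integer, never below one observation per block.
  max(1L, as.integer(round(n^(1 / 3))))
}

sapply(c(1, 50, 200, 1000), rate_block_length_sketch)
```

The block length grows slowly with $n$, so the number of blocks per resample, roughly $n^{2/3}$, still increases with the sample size.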
Returns A list of S3 class gdpar_dependence_robust with components:
| Component | Type | Description |
|---|---|---|
| `table` | data.frame | One row per AMM coefficient; columns: `parameter`, `estimate`, `model_se`, `robust_se`, `se_ratio`, `ci_lower`, `ci_upper`. |
| `block_length` | integer | The chosen block length. |
| `block_length_method` | character | One of `"rate"` (rate-optimal formula, also flags fallback from `"auto"`), `"fixed"` (user-supplied), `"auto"` (Politis–White). |
| `type` | character | `"moving"` or `"circular"`. |
| `B` | integer | Requested number of bootstrap replications. |
| `B_ok` | integer | Number of successful refits. |
| `level` | numeric | Confidence level used. |
| `index_supplied` | logical | Whether the user supplied an `index` vector. |
| `seed` | integer | The supplied seed, or `NA_integer_`. |
| `warnings` | character vector | Accumulated warning messages (refit failures, convergence issues). |
| `refit_diagnostics` | list | Aggregate per-refit convergence: `max_rhat`, `min_ess_bulk`, `n_divergent_refits`, `n_high_rhat_refits`, `rhat_threshold`. |
A print method (defined elsewhere) provides a human-readable summary.
Notes
- Empirical-Bayes vs. full-Bayes parity. Both paths are supported (decision D102). On the EB path, `estimate` is the Laplace/conditional-posterior mean and `model_se` is its SD; on the full-Bayes path, `estimate` is the posterior mean and `model_se` is the posterior SD. The posterior mean (not median) is used for parity and to keep the SE ratio a dimensionless SD-vs-SD ratio without undeclared normal-scaling constants.
- Full-Bayes caveats. (1) Each full-Bayes refit runs full HMC (costly). (2) Finite-iteration refits carry Monte-Carlo error in their posterior mean, which slightly and conservatively inflates `robust_se`. (3) Under an informative prior the bootstrap SD can be smaller than the full-Bayes posterior SD even under correct independent specification ($\text{se\_ratio} < 1$), because the prior concentrates the posterior beyond what the data alone support; this is benign regularization, not overstatement.
- Scope limitation. The bootstrap delivers robust variance, not better point estimates. It is valid for weak / short-range dependence relative to `block_length`; it does not rescue long-memory or unit-root processes.
- Dependencies. Uses `cmdstanr` for refits and `posterior` to extract coefficient estimates.
- Exported. This function is exported from the package namespace (present in `NAMESPACE`).
gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, type = c("moving", "circular"), B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose Top-level exported function that performs a dependence-robust uncertainty audit for a fitted gdpar model via block bootstrap. It re-estimates standard errors (and confidence intervals) of model coefficients to account for temporal dependence in the residuals, without changing point estimates. The method repeatedly refits the model on block-bootstrap resamples of the original data.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_fit` or `gdpar_eb_fit` or compatible | The fitted model object whose uncertainty is to be audited. |
| `data` | `data.frame` | The original data frame used in fitting. Must be row-aligned with the model. |
| `index` | numeric or `NULL` | Optional temporal ordering variable. If non-`NULL`, data and residuals are sorted by this index before blocking. Must have length equal to `nrow(data)`. |
| `block_length` | `NULL`, positive integer, or `"auto"` | Block length for the moving/circular block bootstrap. `NULL` uses the default rate $\lfloor n^{1/3} \rceil$ (at least 1); `"auto"` selects the block length data-adaptively via the Politis–White (2004) plug-in method on the residuals. |
| `residual_type` | character scalar, one of `"quantile"`, `"response"`, `"pearson"`, `"deviance"` | Type of residual used when `block_length = "auto"` for the Politis–White plug-in (and for spatial diagnostics). Matched via `match.arg`. |
| `randomize_seed` | integer or `NULL` | Seed for randomized quantile residuals (used only if `residual_type = "quantile"`). |
| `type` | character scalar, one of `"moving"`, `"circular"` | Block-bootstrap scheme. `"moving"` uses overlapping blocks of length `block_length`; `"circular"` wraps the data end-to-end. Matched via `match.arg`. |
| `B` | positive integer | Number of bootstrap replicates (default 199). |
| `level` | numeric in $(0, 1)$ | Confidence level for percentile-based intervals (default 0.95). |
| `seed` | integer or `NULL` | Master seed passed to the engine for reproducibility. |
| `iter_warmup` | positive integer | Stan warmup iterations per refit. |
| `iter_sampling` | positive integer | Stan sampling iterations per refit. |
| `chains` | positive integer | Number of MCMC chains per refit. |
| `verbose` | logical scalar | If `TRUE`, prints an informational banner describing the audit before computation begins. |
| `...` | — | Additional arguments passed through to `.gdpar_dependence_robust_engine` and ultimately to the Stan refit. |
Mathematics
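Default block length (rate method):
When `block_length` is `NULL`, the default block length is set to
$$ L = \max\!\bigl(1,\, \lfloor n^{1/3} \rceil\bigr) $$
where $n$ is the number of rows of `data`.
Block bootstrap:
For each of the $B$ replicates, row indices are drawn by `.gdpar_block_bootstrap_data_indices(n, block_length, type)`. If `type = "moving"`, consecutive blocks of length `block_length` are drawn from the ordered series; if `type = "circular"`, the data are conceptually wrapped in a circle.
Auto block length (Politis–White):
When `block_length = "auto"`, residuals of type `residual_type` are passed to the Politis–White selector, which returns `$block_length`, `$method`, and `$reason`.
Robust standard error:
The block-bootstrap standard error of each coefficient is the sample standard deviation of the $B_{\text{ok}}$ successful bootstrap estimates. The ratio
$$ \text{se\_ratio} = \frac{\widehat{\text{SE}}_{\text{robust}}}{\text{SE}_{\text{model}}} $$
measures how much the model-based uncertainty understates the dependence-robust uncertainty; values $> 1$ signal understatement.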
Returns An object of class c("gdpar_dependence_robust", "list") with the following components:
| Component | Type | Meaning |
|---|---|---|
| `table` | `data.frame` | Coefficient table with robust SEs, model SEs, `se_ratio`, and confidence intervals at the requested level. |
| `block_length` | integer | The block length used (after resolution of `NULL` or `"auto"`). |
| `block_length_method` | character | One of `"fixed"`, `"rate"`, or the method string returned by Politis–White. |
| `type` | character | The bootstrap scheme used (`"moving"` or `"circular"`). |
| `B` | integer | Requested number of replicates. |
| `B_ok` | integer | Number of replicates that completed successfully. |
| `level` | numeric | Confidence level. |
| `index_supplied` | logical | Whether the caller supplied an ordering index. |
| `seed` | integer or `NULL` | Seed actually used by the engine. |
| `warnings` | character vector | Accumulated warning messages from failed or slow refits. |
| `refit_diagnostics` | list or `NULL` | Aggregate convergence diagnostics across all refits (max R-hat, min ESS, divergent transitions, high-R-hat count). |
Notes
- The function requires the cmdstanr and posterior packages; if absent, a suggestion-error is raised.
- Validation errors (`class = "gdpar_input_error"`) are raised for: non-scalar `object`, non-data-frame `data`, mismatched `index` length, invalid `block_length` (non-`NULL`, non-integer, non-`"auto"`), `block_length` outside $[1, n]$, non-scalar logical `verbose`.
- If `index` is non-`NULL`, both `data` and (internally) residuals are reordered by `order(index)` before blocking, ensuring temporal coherence.
- The function detects whether `object` inherits from `"gdpar_fit"` but not `"gdpar_eb_fit"` (i.e., is a full-Bayes fit) and adjusts the verbose message to warn that full HMC refits are markedly more expensive.
- The resample-generating closure `resample_fun` is created in the local environment and passed to the engine.
- `caller_env <- parent.frame()` is captured so the engine can re-evaluate expressions in the caller's scope if needed.
Purpose S3 print method for objects of class gdpar_dependence_robust. Renders a human-readable summary of the block-bootstrap audit results to the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_dependence_robust` | The object to print. |
| `digits` | integer scalar (default 3) | Number of significant digits for formatting numeric columns in the table. |
| `...` | — | Unused; present for S3 generic compatibility. |
Returns Invisibly returns x (the input object).
Notes
- Prints the bootstrap scheme (`"moving"` or `"circular"`), block length (with provenance label: `"auto: Politis-White"`, `"rate: n^(1/3)"`, or blank for fixed), $B$, $B_{\text{ok}}$, index-supplied status, and confidence level.
- The label for `block_length_method` uses a `switch` with four branches: `"auto"`, `"rate"`, `"fixed"`, and a default empty string; the `%||%` operator defaults to `"fixed"` if the component is `NULL`.
- Numeric columns of the table are formatted with `format(col, digits = digits)`.
- Appends an explanatory note about the `se_ratio` interpretation.
- Calls `.gdpar_print_refit_diagnostics()` to print convergence diagnostics.
- If warnings are present, prints up to 5, with a count of remaining suppressed warnings.
Purpose Internal helper that prints aggregate per-refit convergence diagnostics (max R-hat, min ESS bulk, divergent transition count, high-R-hat refit count) to the console. Called by print.gdpar_dependence_robust.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `rd` | list or `NULL` | The `refit_diagnostics` component of a `gdpar_dependence_robust` object. |
| `digits` | integer scalar (default 3) | Number of significant digits for formatting. |
Returns invisible(NULL) in all cases.
Notes
- Returns early (silently) if `rd` is `NULL`.
- Also returns early if `rd$max_rhat` is `NULL` or non-finite (`!is.finite(mr)`).
- Uses `%||%` to default missing components to `NA_real_`, `0L`, or `1.05` as appropriate.
- Prints a single formatted line showing: max R-hat, min ESS (bulk), number of divergent refits, number of refits with R-hat above a threshold (default 1.05).
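Purpose Internal function returning the variance-optimal default number of grid cells per axis, $g$, for the spatial block bootstrap.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `n` | integer | Number of spatial observations. |
Mathematics
Implements the $n^{1/4}$ rate for the number of cells per axis, yielding
$$ g = \max\bigl(2,\ \operatorname{round}(n^{1/4})\bigr) $$
Returns Integer scalar $g \geq 2$.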
Notes
- Uses `round(n^(1/4))` then coerces to integer, with a floor of 2.
- The documentation references decision D100 as the registered dissent.
Purpose Internal function that validates and coerces a coordinate matrix into a numeric $n \times 2$ matrix.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `coords` | data.frame, matrix, or other | Coordinate input to validate. |
| `n` | integer | Expected number of rows (observations). |
| `arg` | character (default `"coords"`) | Name of the argument for error messages. |
Returns A numeric matrix with exactly 2 columns and n rows, with no non-finite values.
Notes
- If `coords` is a `data.frame`, it is coerced via `as.matrix()`.
- Raises `gdpar_abort` (class `"gdpar_input_error"`) if:
  - `coords` is not a numeric matrix after coercion.
  - `coords` does not have exactly 2 columns.
  - `nrow(coords) != n`.
  - `coords` contains any non-finite values (`NA`, `NaN`, `Inf`, `-Inf`).
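Purpose Internal function constructing a binary $k$-nearest-neighbour adjacency matrix from spatial coordinates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `coords` | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated). |
| `k` | positive integer | Number of nearest neighbours. |
Mathematics
Computes the $n \times n$ matrix $D$ of pairwise Euclidean distances via `stats::dist`. For each observation $i$, the $k$ nearest neighbours are identified with `order`, and the corresponding entries of the adjacency matrix $W$ are set to 1: $W_{ij} = 1$ if $j$ is among the $k$ nearest neighbours of $i$, and $W_{ij} = 0$ otherwise.
The resulting $W$ is in general not symmetric.
Returns An $n \times n$ binary matrix $W$.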
Notes
- All $n$ rows of $W$ are initialized to zero, then row-by-row the $k$ nearest neighbours are set to 1.
- Because `order` breaks ties by position, duplicate coordinates are handled deterministically.
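The construction described above can be sketched as follows. It is an illustration, not the package's code: `knn_adjacency_sketch()` is a hypothetical name, and excluding each observation from its own neighbour set by setting the diagonal of the distance matrix to `Inf` is an assumption of this sketch.

```r
# Hypothetical helper: binary k-nearest-neighbour adjacency matrix.
knn_adjacency_sketch <- function(coords, k) {
  D <- as.matrix(stats::dist(coords))   # n x n Euclidean distances
  diag(D) <- Inf                        # assumption: never pick self as neighbour
  n <- nrow(D)
  W <- matrix(0, n, n)                  # all rows start at zero
  for (i in seq_len(n)) {
    # order() breaks distance ties by position, so the result is deterministic.
    W[i, order(D[i, ])[seq_len(k)]] <- 1
  }
  W
}

coords <- cbind(c(0, 1, 2, 10), c(0, 0, 0, 0))
knn_adjacency_sketch(coords, k = 1)
```

In this example the isolated fourth point names the third as its neighbour but not the reverse, which is why $k$-nearest-neighbour weights are generally asymmetric.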
Purpose Internal function constructing a binary distance-band adjacency matrix. The threshold is data-driven: the smallest distance that leaves no observation isolated.
Arguments
| Argument | Type | Meaning |
|---|---|---|
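| `coords` | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated). |
Mathematics
- Compute the $n \times n$ Euclidean distance matrix $D$.
- Set the diagonal $D_{ii} = \infty$.
- The bandwidth threshold is $$ d^{*} = \max_{i} \min_{j \neq i} D_{ij} $$ i.e., the maximum over all points of their nearest-neighbour distance. This ensures every point has at least one neighbour.
- The adjacency matrix is $$ W_{ij} = \mathbb{1}\{D_{ij} \leq d^{*}\} \ \text{for } i \neq j, \qquad W_{ii} = 0 $$
Returns An $n \times n$ symmetric binary adjacency matrix $W$.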
Notes
- Described as a "declared data-driven heuristic."
- The resulting $W$ is symmetric because Euclidean distance is symmetric.
- The diagonal is explicitly zeroed after the threshold comparison.
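A minimal sketch of the distance-band steps listed above, using only base R. The name `distance_band_sketch()` is hypothetical, and returning the threshold alongside $W$ is a convenience of this example rather than the package's return value.

```r
# Hypothetical helper: distance-band adjacency with a data-driven threshold.
distance_band_sketch <- function(coords) {
  D <- as.matrix(stats::dist(coords))
  diag(D) <- Inf                         # ignore self-distances
  d_star <- max(apply(D, 1, min))        # largest nearest-neighbour distance
  W <- (D <= d_star) * 1                 # neighbours within the band
  diag(W) <- 0                           # explicit zero diagonal
  dimnames(W) <- NULL
  list(W = W, threshold = d_star)
}

coords <- cbind(c(0, 1, 2, 5), c(0, 0, 0, 0))
distance_band_sketch(coords)
```

A single remote point inflates the threshold for everyone, so clustered points can end up with many neighbours; that trade-off is inherent to a global band.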
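Purpose Internal function computing Moran's $I$ statistic of spatial autocorrelation for a residual vector and a spatial weights matrix.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `resid` | numeric vector of length $n$ | Residuals (row-aligned with the weights matrix). |
| `W` | $n \times n$ numeric matrix | Spatial weights (binary adjacency or otherwise; need not be symmetric). |
| `S0` | numeric scalar (default `sum(W)`) | The sum of all weights, $S_0 = \sum_{i,j} w_{ij}$. |
Mathematics
Let $z_i = r_i - \bar{r}$ be the centred residuals. Then
$$ I = \frac{n}{S_0} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^{2}} $$
In vector notation, with $\mathbf{z} = (z_1, \dots, z_n)^\top$,
$$ I = \frac{n}{S_0} \cdot \frac{\mathbf{z}^\top W \mathbf{z}}{\mathbf{z}^\top \mathbf{z}} $$
The implementation computes $W\mathbf{z}$ (`W %*% z`), then takes the elementwise product $\mathbf{z} \odot (W\mathbf{z})$ and sums it to obtain the numerator.
Returns A numeric scalar: the Moran's $I$ statistic.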
Notes
- Under the null hypothesis of no spatial autocorrelation, $E[I] = -1/(n-1)$. Values well above $-1/(n-1)$ (approaching 1 under row-standardised weights) indicate positive spatial autocorrelation; values near $-1/(n-1)$ indicate none; values well below it indicate negative autocorrelation.
- The formula as implemented handles asymmetric $W$ correctly because $\sum_{ij} w_{ij} z_i z_j = \mathbf{z}^\top W \mathbf{z}$ does not require symmetry.
- No $p$-value or reference distribution is computed here; this is a pure computational helper.
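A minimal sketch of the statistic, assuming the standard definition of Moran's $I$ on centred residuals; `morans_i_sketch` is a hypothetical name and not the package's internal helper.

```r
# Hypothetical sketch: Moran's I = (n / S0) * (z' W z) / (z' z).
morans_i_sketch <- function(resid, W, S0 = sum(W)) {
  z <- resid - mean(resid)          # centre the residuals
  n <- length(z)
  Wz <- as.vector(W %*% z)          # spatial lag of z
  (n / S0) * sum(z * Wz) / sum(z^2) # elementwise product, then sum
}

# Chain of four locations (1-2, 2-3, 3-4) with a linear trend in the residuals.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
I <- morans_i_sketch(c(1, 2, 3, 4), W)
I                    # positive: neighbouring residuals are alike
-1 / (4 - 1)         # null expectation, for comparison
```

Working through the arithmetic by hand: $\mathbf{z} = (-1.5, -0.5, 0.5, 1.5)$, $\mathbf{z}^\top W \mathbf{z} = 2.5$, $\mathbf{z}^\top \mathbf{z} = 5$, $S_0 = 6$, so $I = (4/6)(2.5/5) = 1/3$.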
.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)
Purpose Internal function generating a length-$n$ vector of resampled row indices for a spatial block bootstrap. The spatial analogue of `.gdpar_block_bootstrap_data_indices()` for 2-D coordinates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `coords` | numeric matrix ($n \times 2$) | Spatial coordinates (assumed validated and row-aligned). |
| `g` | positive integer | Number of grid cells per axis. |
| `scheme` | character, `"tiled"` or `"moving"` | Spatial bootstrap scheme. |
| `random_origin` | logical | If `TRUE`, the grid origin is randomized per replicate (Politis–Romano–Lahiri randomized partition). |
| `mins` | numeric vector of length 2 | Coordinate minima per axis (bounding-box lower corner). |
| `ranges` | numeric vector of length 2 | Coordinate range per axis (bounding-box extent). |
Mathematics
Cell side lengths: $\mathbf{L} = (L_x, L_y) = \mathbf{ranges} / g$.
Tiled scheme:
- Set the origin $\mathbf{o}$. If `random_origin = TRUE`, draw $\mathbf{u} \sim U(0,1)^2$ and set $\mathbf{o} = \mathbf{mins} - \mathbf{u} \odot \mathbf{L}$; otherwise $\mathbf{o} = \mathbf{mins}$.
- Assign each observation $i$ to a cell:
  $$c_{x,i} = \left\lfloor \frac{x_i - o_x}{L_x} \right\rfloor, \qquad c_{y,i} = \left\lfloor \frac{y_i - o_y}{L_y} \right\rfloor.$$
- Group observations by cell label $(c_{x,i}, c_{y,i})$.
- Sample cells with replacement (uniformly) and concatenate their member indices until $\geq n$ indices accumulate. Truncate to exactly $n$.
Moving scheme:
- Repeatedly draw a random seed point $\mathbf{s}$ from the data.
- Draw $\mathbf{u} \sim U(0,1)^2$ and set the block origin $\mathbf{o} = \mathbf{s} - \mathbf{u} \odot \mathbf{L}$.
- Collect all observations within the axis-aligned square $[\mathbf{o},\, \mathbf{o} + \mathbf{L})$.
- Append to the output until $\geq n$ indices accumulate. Truncate to exactly $n$.
Returns An integer vector of length $n$ of resampled row indices.
Notes
- In the tiled scheme, only non-empty cells can be drawn: empty cells are implicitly excluded because `split` only creates groups for observed cell labels.
- In the moving scheme, every block is non-empty because it is anchored at a sampled observation: since $\mathbf{u} \in (0,1)^2$, the seed point $\mathbf{s} = \mathbf{o} + \mathbf{u} \odot \mathbf{L}$ always lies inside $[\mathbf{o},\, \mathbf{o} + \mathbf{L})$.
- The `random_origin` mechanism implements the Politis–Romano–Lahiri randomized partition to break grid-alignment artifacts.
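The tiled scheme can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: the name `tiled_block_indices_sketch` is hypothetical, and the cell labels use a plain `floor()` of the offset from the origin.

```r
# Hypothetical sketch of the tiled spatial block bootstrap index generator.
tiled_block_indices_sketch <- function(coords, g, mins, ranges,
                                       random_origin = TRUE) {
  n <- nrow(coords)
  L <- ranges / g                                    # cell side lengths per axis
  o <- if (random_origin) mins - stats::runif(2) * L else mins
  cx <- floor((coords[, 1] - o[1]) / L[1])           # cell label along x
  cy <- floor((coords[, 2] - o[2]) / L[2])           # cell label along y
  cells <- split(seq_len(n), paste(cx, cy, sep = "_"))  # non-empty cells only
  out <- integer(0)
  while (length(out) < n) {                          # draw cells with replacement
    pick <- sample.int(length(cells), 1L)
    out <- c(out, cells[[pick]])
  }
  out[seq_len(n)]                                    # truncate to exactly n
}

set.seed(1)
coords <- cbind(runif(50), runif(50))
mins <- apply(coords, 2, min)
ranges <- apply(coords, 2, max) - mins
idx <- tiled_block_indices_sketch(coords, g = 3, mins, ranges)
length(idx)   # 50 resampled row indices, whole cells at a time
```

Because whole cells are appended, nearby observations enter the resample together, which is what preserves short-range dependence inside each block.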
.gdpar_spatial_block_length_auto(coords, resid, scheme, random_origin, mins, ranges, B0 = 200L, var_const = 1, seed = NULL)
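Purpose Internal workhorse that selects the spatial block size $g$ (grid cells per axis) from the data, falling back to the rate-based default of `.gdpar_spatial_default_g` whenever the data-driven selection is unavailable or unreliable.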
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `coords` | numeric $n \times 2$ matrix | Spatial coordinates (columns = $x$, $y$), row-aligned with `resid`. |
| `resid` | numeric vector of length $n$ | Model residuals (centred internally: $z = r - \bar r$). |
| `scheme` | character string | Block-tile scheme identifier forwarded to `.gdpar_spatial_block_indices` (controls how spatial blocks are laid out relative to `random_origin`, `mins`, `ranges`). |
| `random_origin` | logical scalar | Whether to randomise the grid origin in each bootstrap replicate (forwarded to `.gdpar_spatial_block_indices`). |
| `mins` | numeric vector of length 2 | Minimum coordinate values $(\min x,\ \min y)$. |
| `ranges` | numeric vector of length 2 | Coordinate ranges $(\max x - \min x,\ \max y - \min y)$. |
| `B0` | integer (default `200L`) | Number of Monte Carlo block-bootstrap replicates per candidate $g$. |
| `var_const` | numeric scalar (default `1`) | Multiplicative constant $c$ in the variance term of the MSE criterion. |
| `seed` | integer or `NULL` | Optional seed set via `set.seed()` before the bootstrap loop for reproducibility. |
Mathematics
The procedure operates as follows.
- Default fallback. Compute $g_{\text{def}} = \lfloor n^{1/4} \rfloor$ via `.gdpar_spatial_default_g(n)`. Early returns use $g_{\text{def}}$ when:
  - $n < 25$,
  - coordinate spread is degenerate ($\text{sd}(x) \le 0$ or $\text{sd}(y) \le 0$),
  - fewer than 3 valid grid points exist,
  - bootstrap variances are non-finite or all zero,
  - the MSE criterion is non-finite, or
  - the MSE minimum falls on the largest-$g$ (smallest-block) boundary.
- Design matrix for a spatial-mean surrogate. Construct an $n \times 3$ matrix
  $$\mathbf{D}_{\text{surr}} = \bigl[\,\mathbf{1},\; (x - \bar x)/s_x,\; (y - \bar y)/s_y\,\bigr],$$
  where $s_x, s_y$ are the coordinate standard deviations.
- Candidate grid. Define bounds
  $$g_{\text{lo}} = \max\!\bigl(2,\;\lfloor 0.5\, g_{\text{def}} \rfloor\bigr), \qquad g_{\text{hi}} = \min\!\bigl(\lfloor 3\, g_{\text{def}} \rfloor,\; \lfloor \sqrt{n/3}\,\rfloor\bigr).$$
  Generate 6 points on a log-spaced grid in $[g_{\text{lo}},\, g_{\text{hi}}]$, round to unique integers, and retain only $g \ge 2$ with average cell occupancy $n/g^2 \ge 3$.
- Bootstrap variance per $g$. For each candidate $g$ and each replicate $b = 1,\dots,B_0$:
  - Draw a spatial block index vector $\mathcal{I}_b$ from `.gdpar_spatial_block_indices`.
  - Compute a $3$-vector of block-level spatial-mean statistics:
    $$\mathbf{T}_b = \frac{1}{n}\,\mathbf{D}_{\text{surr}}[\mathcal{I}_b,\,]^\top\, z[\mathcal{I}_b].$$
  - Aggregate across coordinates:
    $$V_g = \sum_{j=1}^{3} \mathrm{Var}_{b}(T_{b,j}).$$
  - Also count the number of unique occupied tiles $n_{\text{tiles}}(g)$.
- MSE criterion. Smooth $V_g$ with a running median ($k=3$): $\tilde V_g = \mathrm{runmed}(V_g,\, 3)$. Anchor at $g_{\min}$ (the smallest candidate, i.e. the largest blocks). Then:
  $$\text{bias}^2(g) = \bigl(\tilde V_g - \tilde V_{g_{\min}}\bigr)^2, \qquad \text{var}(g) = c\;\frac{\tilde V_g^{\,2}}{n_{\text{tiles}}(g)},$$
  $$\text{MSE}(g) = \text{bias}^2(g) + \text{var}(g).$$
  The variance term reflects the inverse-number-of-blocks scaling (Lahiri 2003), not the Monte Carlo noise from finite $B_0$.
- Selection. $g^* = \arg\min_g \text{MSE}(g)$. If $g^*$ equals the last (smallest-block) grid element, the procedure bails out to the $n^{1/4}$ default (anticonservative boundary).
Returns A named list with three elements:
| Element | Type | Meaning |
|---|---|---|
| `block_size` | integer | The chosen $g$. |
| `method` | character | `"auto"` if the data-driven selection succeeded; `"rate"` on any fallback. |
| `reason` | character | Human-readable explanation: on success, a formatted string that includes the candidate grid; on fallback, the cause of the fallback. |
Notes
- Fallback cascade. There are six distinct early-return paths, all returning `method = "rate"` via the inner `fb()` helper, each with a different `reason` string. The function never errors; it always returns a valid list.
- Bootstrap machinery. All block-index generation delegates to `.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)`, which implements the spatial tiling and optional random-origin jitter.
- Side effects. Calls `set.seed()` when `seed` is non-`NULL`. No other side effects.
- No S3 dispatch. This is an internal utility, not an S3 generic or method.
- Cell-occupancy bound. The upper cap on $g$ enforces $n/g^2 \ge 3$ (at least ~3 observations per cell on average), a validity constraint for within-cell resampling. This is deliberately looser than the $n^{1/3}$ rate sometimes seen in the spatial bootstrap literature.
- Running median smoothing. `stats::runmed(..., k = 3L)` uses a centred running median with the default `endrule = "median"`, so interior values are medians of three neighbours while the first and last values are set by Tukey's end-point rule (see `stats::smoothEnds`).
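The candidate grid and the MSE criterion can be illustrated with a short base-R sketch. Both function names are hypothetical, the variance profile `V_toy` is a made-up input used only to exercise the code, and taking `n_tiles = g^2` assumes every cell is occupied; none of this reproduces the package's bootstrap step.

```r
# Hypothetical sketch: candidate grid for g (cells per axis).
candidate_grid_sketch <- function(n, n_points = 6L) {
  g_def <- floor(n^(1 / 4))
  g_lo <- max(2, floor(0.5 * g_def))
  g_hi <- min(floor(3 * g_def), floor(sqrt(n / 3)))
  if (g_hi < g_lo) return(integer(0))               # no valid candidates
  g <- exp(seq(log(g_lo), log(g_hi), length.out = n_points))  # log-spaced
  g <- unique(as.integer(round(g)))
  g[g >= 2 & n / g^2 >= 3]                          # occupancy constraint
}

# Hypothetical sketch: MSE criterion on running-median-smoothed variances.
# V and n_tiles must be aligned with g, sorted from smallest to largest g.
select_g_sketch <- function(g, V, n_tiles, c_var = 1) {
  V_s <- as.numeric(stats::runmed(V, k = 3L))       # smooth V_g
  bias2 <- (V_s - V_s[1])^2                         # anchor: smallest g
  mse <- bias2 + c_var * V_s^2 / n_tiles
  g[which.min(mse)]
}

g <- candidate_grid_sketch(400)
V_toy <- 1 / g                                      # toy variance profile
g_star <- select_g_sketch(g, V_toy, n_tiles = g^2)  # assumes all cells occupied
```

In the procedure described above, a `g_star` equal to the last grid element would trigger the fallback to the $n^{1/4}$ default rather than being returned.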
gdpar_spatial_dependence_diagnostic(object, coords, W = NULL, weights = c("knn", "distance"), k = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), test = c("permutation", "analytic"), n_perm = 999L, level = 0.95, randomize_seed = NULL, seed = NULL, ...)
Purpose Exported diagnostic that quantifies spatial autocorrelation in the residuals of a scalar ($K = 1$, $p = 1$) Path 1 fit using Moran's $I$. It is the spatial counterpart of `gdpar_dependence_diagnostic` and the recommended gate before calling `gdpar_spatial_dependence_robust`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | A scalar Path 1 fit ($K = 1$, $p = 1$), checked by `.gdpar_assert_scalar_dep`. |
| `coords` | numeric $n \times 2$ matrix | Spatial coordinates, row-aligned with training data. Validated by `.gdpar_validate_coords`. |
| `W` | numeric $n \times n$ matrix or `NULL` | User-supplied spatial weight matrix. Overrides `weights`/`k`. Diagonal is zeroed internally. Row-standardized before use. |
| `weights` | character, one of `"knn"` (default) or `"distance"` | Neighbourhood construction method when `W` is `NULL`. `"knn"` = $k$-nearest-neighbour adjacency; `"distance"` = distance band whose threshold is the smallest that isolates no location. Both produce row-standardized weights. Ignored when `W` is supplied. |
| `k` | integer or `NULL` | Number of neighbours for `weights = "knn"`. When `NULL`, a default heuristic chooses $k$. |
| `residual_type` | character, one of `"quantile"` (default), `"response"`, `"pearson"`, `"deviance"` | Type of residual extracted from `object` via `.gdpar_dependence_residuals`. `"quantile"` = randomized quantile (Dunn–Smyth) residuals. |
| `test` | character, one of `"permutation"` (default) or `"analytic"` | Hypothesis test for Moran's $I$. `"permutation"` = location-relabelling permutation test (two-sided via $\lvert I - E[I] \rvert$); `"analytic"` = Cliff–Ord normal approximation. |
| `n_perm` | integer (default `999L`) | Number of permutations for `test = "permutation"`. Capped below $n!$ when $n \le 10$. |
| `level` | numeric in $(0, 1)$ (default `0.95`) | Confidence level used to convert the $p$-value into the verdict. |
| `randomize_seed` | integer or `NULL` | Seed for randomized quantile residuals of discrete families; ignored otherwise. |
| `seed` | integer or `NULL` | Seed for the permutation test (reproducibility). |
| `...` | — | Unused; present for signature stability. |
Mathematics
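The function implements the following:
Moran's $I$. With centred residuals $\mathbf{z}$ and row-standardized weights $\tilde W$ (so that $S_0 = n$),
$$I_{\text{obs}} = \frac{\mathbf{z}^\top \tilde W \mathbf{z}}{\mathbf{z}^\top \mathbf{z}}, \qquad E[I] = -\frac{1}{n-1}.$$
Permutation test. For each of `n_perm` permutations $\pi_b$ of the location labels, compute $I_{\pi_b}$ from the relabelled residuals $z_{\pi_b(i)}$. The two-sided $p$-value is
$$p = \frac{1 + \#\{b : |I_{\pi_b} - E[I]| \ge |I_{\text{obs}} - E[I]|\}}{\text{n\_perm} + 1}.$$
Analytic (Cliff–Ord) normal approximation. Define:
$$Z = \frac{I_{\text{obs}} - E[I]}{\sqrt{\mathrm{Var}[I]}}, \qquad p = 2\,\bigl(1 - \Phi(|Z|)\bigr),$$
where $\mathrm{Var}[I]$ is the analytic variance of $I$ (returned as `var_i`), $Z$ is returned as `z`, and $\Phi$ is the standard normal distribution function.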
Returns A list of class c("gdpar_spatial_dependence_diagnostic", "list") with components:
| Component | Type | Meaning |
|---|---|---|
| `residual_type` | character | The residual type used. |
| `n` | integer | Number of observations. |
| `weights` | character | `"user"` if `W` was supplied, otherwise the matched `weights` argument. |
| `k` | integer | Number of neighbours used; `NA_integer_` if not applicable. |
| `style` | character | Always `"W"` (row-standardized). |
| `n_zero_weight` | integer | Number of locations with zero row sum in the raw weight matrix. |
| `morans_i` | numeric | Observed Moran's $I$; `NA` if any location has zero weight. |
| `expected_i` | numeric | $E[I]$ under the null; `NA` if undefined. |
| `var_i` | numeric | Analytic variance of $I$; `NA` when `test = "permutation"` or the analytic variance is non-positive. |
| `z` | numeric | Analytic $z$-score; `NA` when not computed. |
| `p_value` | numeric | Two-sided $p$-value; `NA` if the test is undefined. |
| `test` | character | `"permutation"` or `"analytic"`. |
| `n_perm` | integer | Effective number of permutations (may be less than requested for tiny $n$); `NA` for the analytic test. |
| `level` | numeric | Confidence level used. |
| `verdict` | character | Human-readable summary string. Three forms: (1) `"...Undefined..."` if zero-weight locations exist, (2) `"Spatial dependence detected..."` if the $p$-value falls below $1 - \text{level}$, (3) `"No evidence against spatial independence..."` otherwise. Also handles the case where the $p$-value is `NA`. |
Notes
- S3 dispatch. This is an exported function (not a method). A `print` method for the returned object is defined immediately below.
- Guards and assertions.
  - `.gdpar_assert_scalar_dep(object, "object")` enforces that the fit is scalar ($K = 1$, $p = 1$).
  - `stats::var(resid) <= 0` triggers an abort via `gdpar_abort` with class `"gdpar_diagnostic_error"`.
  - If `W` is supplied, it must be a numeric $n \times n$ matrix with all finite values; violations abort with class `"gdpar_input_error"`.
  - `k` must satisfy $1 \le k \le n - 1$; otherwise abort with class `"gdpar_input_error"`.
- Zero-weight locations. If any location has zero row sum after weight construction, a `warning` is emitted, `morans_i` is set to `NA`, and the verdict reports `"Undefined"`. With kNN and $k \ge 1$ this should never occur (every point gets at least one neighbour).
- Small-sample warnings. For `test = "permutation"`:
  - $n < 20$: hard warning ("very small… treat the p-value as indicative only").
  - $n < 50$: soft warning ("small… approximate").
- Permutation cap. `max_distinct` is `factorial(n)` for $n \le 10$ and `Inf` otherwise. `n_perm_eff` = `min(n_perm, max(1, max_distinct - 1))`.
- Analytic test asymmetry warning. If the row-standardized weight matrix `Wn` is not symmetric (which is typical for kNN and distance-band weights), a warning recommends `test = "permutation"`.
- Side effects. Calls `set.seed()` when `seed` is non-`NULL` (inside the permutation loop). Calls `require_suggested("posterior", ...)` to ensure the `posterior` package is available for extracting posterior draws.
- Coordinate handling. Coordinates are validated by `.gdpar_validate_coords`. They are treated as Euclidean; lon/lat data must be projected before calling this function.
- Residual extraction. Delegates to `.gdpar_dependence_residuals(object, residual_type, randomize_seed)`.
- Weight construction. kNN via `.gdpar_knn_adjacency(coords, k)`; distance-band via `.gdpar_distance_band_adjacency(coords)`. Both return raw (unstandardized) adjacency matrices.
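The permutation test can be sketched in base R as follows. This is a simplified stand-in, not the exported function: `moran_perm_test_sketch` is a hypothetical name, it takes a residual vector directly instead of a fitted object, and it omits the guards, warnings and permutation cap listed above.

```r
# Hypothetical sketch of a two-sided permutation test for Moran's I.
moran_perm_test_sketch <- function(resid, W, n_perm = 999L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  Wn <- W / rowSums(W)                     # row-standardize (row i / row sum i)
  z <- resid - mean(resid)
  n <- length(z)
  moran <- function(v) sum(v * as.vector(Wn %*% v)) / sum(v^2)  # S0 = n
  I_obs <- moran(z)
  E_I <- -1 / (n - 1)
  I_perm <- replicate(n_perm, moran(z[sample.int(n)]))  # relabel locations
  p <- (1 + sum(abs(I_perm - E_I) >= abs(I_obs - E_I))) / (n_perm + 1)
  list(morans_i = I_obs, expected_i = E_I, p_value = p)
}

# Chain of 30 locations with a strong trend in the residuals.
n <- 30
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)
res <- moran_perm_test_sketch(as.numeric(1:n), W, n_perm = 199L, seed = 42)
res$morans_i   # close to 1: neighbouring residuals are nearly identical
res$p_value    # bounded below by 1 / (n_perm + 1)
```

The `+ 1` in numerator and denominator counts the observed labelling as one of the permutations, so the reported $p$-value can never be exactly zero.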
Purpose S3 print method for objects of class gdpar_spatial_dependence_diagnostic. Provides a human-readable summary of the spatial dependence diagnostic to the console.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_spatial_dependence_diagnostic` | The diagnostic object to print. |
| `digits` | integer (default `3L`) | Number of significant digits for printed statistics. |
| `...` | — | Unused; present for S3 generic compatibility. |
Mathematics None.
Returns Invisibly returns x (the input object unchanged), following standard R print method conventions.
Notes
- Export. Declared with `@export` in the roxygen header, so it is exported and registered as an S3 method for the `print` generic on class `gdpar_spatial_dependence_diagnostic`.
- Body not shown. This entry is based on the roxygen documentation only; the function body, and therefore the exact layout of the printed output, is not covered here.
Purpose S3 print method for objects of class gdpar_spatial_dependence_diagnostic. Produces a formatted console summary of the spatial dependence diagnostic (Moran's I test on model residuals, optionally on a k-nearest-neighbour or distance-band spatial weight matrix).
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | `gdpar_spatial_dependence_diagnostic` list | The diagnostic object to print. Expected components: `residual_type` (character), `n` (integer), `weights` (character, one of `"knn"`, `"distance"`, or other for user-supplied), `k` (integer, number of neighbours when `weights == "knn"`), `morans_i` (numeric or `NA`), `expected_i` (numeric, $E[I]$), `test` (character, `"analytic"` or `"permutation"`), `z` (numeric, analytic z-score), `p_value` (numeric), `n_perm` (integer, number of permutation draws), `verdict` (character). |
| `digits` | integer scalar (default `3L`) | Number of significant digits used by `format()` when printing numeric statistics. |
| `...` | — | Absorbed for S3 method compatibility; unused. |
Returns `x` invisibly, so the call can sit in a pipeline or script without the object being auto-printed a second time.
Notes
- When `x$morans_i` is `NA`, the method prints `"Moran's I : undefined"` and skips printing the expected value, test statistic, and p-value entirely (regardless of `x$test`).
- When `x$weights` is `"knn"`, the printed label includes the value of `x$k` via `sprintf`. When `"distance"`, it prints a fixed label. Any other value falls through to `"user-supplied, row-standardized"`.
- Output is directed to the console via `cat()` with `sep = ""`.
- No side effects beyond printing; no error handling.
gdpar_spatial_dependence_robust(object, data, coords, block_size = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, scheme = c("tiled", "moving"), random_origin = TRUE, B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)
Purpose Re-estimates the uncertainty (standard errors and percentile confidence intervals) of every AMM coefficient from a scalar Path 1 fit so that inference is robust to unmodelled spatial dependence in the data. The point estimates themselves are unchanged; only the reported uncertainty is adjusted. This implements the working-independence + robust-variance stance of Liang & Zeger (1986) via a spatial block bootstrap: the model is refitted on B spatial block-bootstrap resamples over the observed coords, and the bootstrap standard deviation and percentile intervals of each coefficient replace (or supplement) the model-based (Laplace / posterior) standard errors. The function is the spatial counterpart of gdpar_dependence_robust (temporal); both share one internal refit engine (.gdpar_dependence_robust_engine).
Arguments
| Argument | Type | Default | Meaning |
|---|---|---|---|
| `object` | `gdpar_eb_fit` or `gdpar_fit` | — | A scalar Path 1 fit ($K = 1$, $p = 1$): either an Empirical-Bayes fit from `gdpar_eb()` or a full-Bayes fit from `gdpar()`. Validated by `.gdpar_assert_scalar_dep`. |
| `data` | data frame | — | The original data frame passed to the fitting function. Must be row-aligned with `coords`. It is resampled by spatial blocks and the model is refit on each resample. Validated by `assert_data_frame`. |
| `coords` | numeric matrix or data frame ($n \times 2$) | — | Spatial coordinates, row-aligned with `data`. Validated and coerced by `.gdpar_validate_coords`. |
| `block_size` | `NULL`, positive integer, or `"auto"` | `NULL` | Number of grid cells per axis, $g$. `NULL`: variance-optimal rate $g = \lfloor n^{1/4} \rfloor$. Positive integer: used as supplied. `"auto"`: data-driven calibration over a grid of candidate $g$ values. |
| `residual_type` | character (one of `"quantile"`, `"response"`, `"pearson"`, `"deviance"`) | `"quantile"` | Type of residuals used only when `block_size = "auto"` to feed the data-driven block-size selector. `"quantile"` gives Dunn-Smyth randomized quantile residuals. Ignored when `block_size` is `NULL` or a fixed integer. |
| `randomize_seed` | `NULL` or integer | `NULL` | Optional seed for reproducibility of the randomized quantile residuals of discrete families. Used only by the `"auto"` block-size selector; ignored otherwise. |
| `scheme` | character (one of `"tiled"`, `"moving"`) | `"tiled"` | Resampling scheme. `"tiled"`: non-overlapping rectangular cells. `"moving"`: overlapping square blocks anchored on sampled observation points. |
| `random_origin` | logical scalar | `TRUE` | When `TRUE` and `scheme = "tiled"`, the grid origin is randomly shifted per bootstrap replicate (Politis-Romano-Lahiri circular block idea adapted to 2-D), breaking deterministic boundary artifacts at the cost of one extra random draw per refit. |
| `B` | integer | `199L` | Number of bootstrap refits. Validated by `assert_count`. |
| `level` | numeric in $(0, 1)$ | `0.95` | Level for the percentile confidence intervals. Validated by `assert_numeric_scalar`. |
| `seed` | `NULL` or integer | `NULL` | Optional seed controlling the block resampling and per-refit Stan seeds for reproducibility. Passed through to `.gdpar_dependence_robust_engine`. |
| `iter_warmup` | integer | `500L` | Warmup iterations per refit's conditional HMC. |
| `iter_sampling` | integer | `500L` | Sampling iterations per refit's conditional HMC. |
| `chains` | integer | `2L` | Number of chains per refit. |
| `verbose` | logical scalar | `TRUE` | When `TRUE`, prints an opt-in cost message describing the number of refits, grid dimensions, scheme, and whether the full-Bayes path is in use. |
| `...` | — | — | Additional arguments absorbed for forward compatibility (passed to `.gdpar_dependence_robust_engine`). |
Mathematics
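Default block-size rate (decision D100). The criterion behind the variance-optimal default is minimised at a number of cells per axis of order $n^{1/4}$, which gives the default $g = \lfloor n^{1/4} \rfloor$ (computed by `.gdpar_spatial_default_g`).
Resampling.
- Tiled scheme: non-empty cells are sampled with replacement; the resample is truncated to $n$ observations (introducing a negative bias $O(1/n)$, negligible). When `random_origin = TRUE`, the grid origin is shifted by a random sub-cell offset per replicate.
- Moving scheme: overlapping blocks, each the size of one grid cell at resolution $g$, are anchored on sampled observation locations, guaranteeing non-empty blocks.
Data-driven block size (`block_size = "auto"`). The selection is delegated to `.gdpar_spatial_block_length_auto`, which calibrates $g$ on the bootstrap variance of the surrogate statistics built from $\bigl[\mathbf{1},\ (x - \bar x)/s_x,\ (y - \bar y)/s_y\bigr]$ (the influence directions of the coefficient).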
Returns A list of class c("gdpar_spatial_dependence_robust", "list") with the following components:
| Component | Type | Description |
|---|---|---|
| `table` | data frame | One row per AMM coefficient. Columns: `estimate` (original point estimate, unchanged), `model_se` (model-based SE), `robust_se` (bootstrap SD), `se_ratio` (`robust_se / model_se`), `ci_lower` and `ci_upper` (percentile interval at `level`). |
| `block_size` | integer | The chosen $g$ (grid cells per axis). |
| `block_size_method` | character | One of `"rate"` (variance-optimal default; also returned when `"auto"` falls back), `"fixed"` (user-supplied integer), or `"auto"` (data-driven calibration succeeded). |
| `scheme` | character | The resampling scheme used (`"tiled"` or `"moving"`). |
| `random_origin` | logical | Whether random grid-origin shifts were used (relevant only for `"tiled"`). |
| `n_tiles` | integer | Number of unique spatial cells at the chosen resolution. |
| `B` | integer | Requested number of bootstrap refits. |
| `B_ok` | integer | Number of refits that successfully converged (from `.gdpar_dependence_robust_engine`). |
| `level` | numeric | The percentile-interval level. |
| `seed` | integer or `NULL` | The seed actually used (may be supplied or internally generated by the engine). |
| `warnings` | character vector | Accumulated warning messages (single-cell warning if `n_tiles <= 1`, plus any from the refit engine). |
| `refit_diagnostics` | list | Aggregate per-refit convergence diagnostics, structured as in `gdpar_dependence_robust`. |
A print method is declared (signature print.gdpar_spatial_dependence_robust(x, digits, ...); body in another section).
Notes
- Input validation. `object` is checked by `.gdpar_assert_scalar_dep` (must be a scalar Path 1 fit). `coords` is validated by `.gdpar_validate_coords` for dimension and type. Collinear coordinates (zero range on either axis) raise an error via `gdpar_abort` (class `gdpar_input_error`). `block_size` must be `NULL`, a positive integer, or the string `"auto"`; any other string triggers an error. `random_origin` and `verbose` must be logical scalars. `B`, `iter_warmup`, `iter_sampling`, `chains` are validated by `assert_count`. `level` is validated as a numeric scalar in $(0, 1)$.
- Single-cell warning. If all locations fall into a single spatial cell at the chosen resolution (`n_tiles <= 1`), a warning is emitted and the bootstrap SE will collapse toward zero. The warning message is stored in `warnings_pre` and appended to the returned `warnings` vector.
- Full-Bayes detection. The function detects whether `object` is a full-Bayes fit (`inherits(object, "gdpar_fit") && !inherits(object, "gdpar_eb_fit")`) to adjust the verbose cost message accordingly (full-Bayes refits use full HMC and are markedly more expensive).
- Suggested dependencies. Requires `cmdstanr` (for Stan refits) and `posterior` (for extracting posterior draws); both are loaded via `require_suggested`.
- Internal engine. The actual bootstrap loop is delegated to `.gdpar_dependence_robust_engine`, which receives a `resample_fun` closure that calls `.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)` to generate block indices for each replicate.
- Coordinate pre-processing. `mins` (per-axis minima) and `ranges` (per-axis ranges) are computed from `coords` and used throughout cell assignment and the resampling closure.
- `...` arguments. Forwarded to `.gdpar_dependence_robust_engine`; currently absorbed for compatibility with the temporal sibling `gdpar_dependence_robust`.
- No dependence modelling. The function does not model spatial dependence; it only makes inference robust to it. Valid for weak / short-range spatial dependence relative to block size; does not rescue strong long-range dependence.
- Isotropic block. A single isotropic $g$ is used for both coordinate axes; strongly anisotropic residual dependence is a documented limitation.
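To make the structure of the returned `table` concrete, the sketch below turns a matrix of bootstrap refit estimates into robust standard errors, the `se_ratio`, and percentile intervals. The function name `robust_table_sketch` is hypothetical, and the bootstrap draws are simulated stand-ins for real refits; only the column definitions follow the description above.

```r
# Hypothetical sketch: summarise bootstrap refits into the coefficient table.
# boot_draws: B x (number of coefficients) matrix, one row per successful refit.
robust_table_sketch <- function(estimate, model_se, boot_draws, level = 0.95) {
  alpha <- 1 - level
  robust_se <- apply(boot_draws, 2, stats::sd)          # bootstrap SD
  ci <- apply(boot_draws, 2, stats::quantile,
              probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(
    estimate  = estimate,                               # point estimate, unchanged
    model_se  = model_se,
    robust_se = robust_se,
    se_ratio  = robust_se / model_se,
    ci_lower  = ci[1, ],
    ci_upper  = ci[2, ]
  )
}

set.seed(1)
# Simulated stand-in for B = 199 refit estimates of two coefficients.
boot_draws <- cbind(alpha = rnorm(199, 1, 0.2), beta = rnorm(199, -0.5, 0.1))
tab <- robust_table_sketch(estimate = c(alpha = 1, beta = -0.5),
                           model_se = c(0.1, 0.1),
                           boot_draws = boot_draws)
tab
```

An `se_ratio` above 1 in such a table is the signal that the model-based standard errors understate the dependence-robust uncertainty.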
print.gdpar_spatial_dependence_robust(x, digits, ...)
Purpose
S3 print method for objects of class "gdpar_spatial_dependence_robust". Renders a human-readable summary of spatial-dependence-robust inference results produced by spatial block-bootstrap variance estimation. It displays the block-bootstrap configuration (grid size, scheme, number of non-empty tiles), the number of refits performed and succeeded, the confidence level, a formatted table of coefficient estimates with model-based and robust standard errors and their ratio, a brief interpretation of the se_ratio, refit diagnostics, and any stored warnings.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `x` | list (S3 class `"gdpar_spatial_dependence_robust"`) | The spatial-dependence-robust result object. Expected to contain the named elements `scheme`, `block_size_method`, `block_size`, `random_origin`, `n_tiles`, `B`, `B_ok`, `level`, `table`, `refit_diagnostics`, and `warnings`. |
| `digits` | `integer(1)` (default `3L`) | Number of significant digits used when formatting numeric columns of the coefficient table via `format()`. |
| `...` | — | Additional arguments from the `print()` call; accepted for S3 method signature compatibility but not used in the body. |
Returns
Invisibly returns x, the original input object (via invisible(x)). The primary effect is the side effect of printing to the console.
Notes
- S3 dispatch. This is an S3 method registered for the generic `print` on objects of class `"gdpar_spatial_dependence_robust"`. Standard `print()` dispatch applies.
- Header line. Prints the scheme name (`"tiled"` or `"moving"`) followed by `" spatial block bootstrap"`.
- Grid description. Prints `block_size × block_size` cells, then:
  - `block_size_method = "auto"` appends `" (auto: data-driven calibration)"`.
  - `block_size_method = "rate"` appends `" (rate: n^(1/4))"`.
  - `"fixed"` or any other value (including `NULL` via the `%||%` fallback) appends nothing.
  - If `random_origin` is `TRUE` and `scheme` is exactly `"tiled"`, the note `" (randomized origin)"` is also appended.
  - The count of non-empty tiles (`n_tiles`) is always shown.
- Refit summary. Displays the total number of bootstrap refits (`B`) and how many completed successfully (`B_ok`).
- Confidence level. Prints `level` (a numeric probability, e.g. `0.95`).
- Coefficient table formatting. Columns of `x$table` that are numeric are re-formatted with `format(col, digits = digits)`. The table is then printed with `row.names = FALSE`.
- Interpretation hint. A short explanatory sentence is printed: the `se_ratio` is defined as `robust_se / model_se`; a ratio greater than 1 indicates that the model-based standard errors understate the spatial-dependence-robust uncertainty.
- Refit diagnostics. Delegates to `.gdpar_print_refit_diagnostics(x$refit_diagnostics, digits)` (an internal helper defined elsewhere) to print any additional convergence or numerical diagnostics from the bootstrap refits.
- Warnings. If `x$warnings` is a non-empty character vector, up to 5 warnings are printed, each preceded by `" - "`. If more than 5 exist, a count of remaining warnings is appended.
- Edge cases. If `block_size_method` is `NULL`, the `%||%` (null-coalescing) operator defaults to `"fixed"`, so no calibration label is printed. If `scheme` is not `"tiled"` or `random_origin` is not `TRUE`, the randomized-origin note is omitted. If `x$warnings` has length zero or is `NULL`, the warnings section is skipped entirely (the `if` guards handle this).
Purpose
Constructor for the `dims_spec` S3 class. It broadcasts a single uniform per-component specification (additive basis `a`, multiplicative basis `b`) across all $p$ dimensions of $\theta_i$ in a multivariate AMM specification (`amm_spec`). It is the intended value of the `dims` argument of `amm_spec` when every dimension shares the same override.
Arguments
- `a`: `NULL` or a one-sided `formula`. The additive basis applied uniformly to every dimension $k = 1, \ldots, p$ of $\theta_i$. `NULL` disables the additive component for all dimensions.
- `b`: `NULL` or a one-sided `formula`. The multiplicative basis applied uniformly to every dimension. `NULL` disables the multiplicative component for all dimensions.
Mathematics
The object encodes, for the canonical AMM form
$$
\theta_i[k] = \theta_{\mathrm{ref}}[k] + a_k(x_i) + b_k(x_i)\,\theta_{\mathrm{ref}}[k] + \bigl(W_k(\theta_{\mathrm{ref}}) - W_k(\theta_{\mathrm{anchor}})\bigr)\,x_i, \quad k = 1, \ldots, p,
$$
the per-dimension, covariate-only pieces $a_k$ and $b_k$, here with $a_k = a$ and $b_k = b$ for every $k$. The $W$ term is not part of `dims_spec` because it couples all dimensions of $\theta_{\mathrm{ref}}$.
Returns
A list of class c("dims_spec", "list") with two components:
- `base`: a list `list(a = a, b = b)` holding the uniform template.
- `overrides`: an empty list `list()`, to be populated by `override` and keyed by dimension index.
Notes
- Both arguments may simultaneously be `NULL`; this is permitted and yields a `dims_spec` whose base disables both components.
- Validation is delegated to `assert_one_sided_formula(., allow_null = TRUE)` for each of `a` and `b`; malformed formulas abort there.
- The dimension $p$ is intentionally not stored; coherence with $p$, the multivariate $W$ basis, and any overrides is validated later by `amm_spec`.
- Bare formulas passed directly to `amm_spec`'s `dims` argument when $p > 1$ are rejected; wrapping in `dimwise()` is the explicit opt-in to broadcasting.
Purpose
Attach a per-dimension override to an existing dims_spec, replacing the additive and/or multiplicative formula for a single dimension index k while leaving the base template and other dimensions untouched. Overrides compose across multiple calls and overwrite on repeated k.
Arguments
- `dims`: a `dims_spec` object (produced by `dimwise`).
- `k`: a positive integer scalar. The dimension index to override. Coherence with the global $p$ is checked later by `amm_spec`/`resolve_dims_spec`, not here.
- `a`: optional. A one-sided `formula` replacing the additive basis for dimension `k`, or `NULL` to disable the additive component for that dimension only. Missing (omitted) means "inherit from base"; explicitly `NULL` means "disable for this dimension".
- `b`: optional. Same semantics as `a` for the multiplicative basis.
Mathematics
For the overridden dimension $k$,
$$
a_k = \begin{cases} a^{\text{ov}} & \text{if \texttt{a} supplied} \\ a^{\text{base}} & \text{if \texttt{a} missing} \end{cases}, \qquad
b_k = \begin{cases} b^{\text{ov}} & \text{if \texttt{b} supplied} \\ b^{\text{base}} & \text{if \texttt{b} missing} \end{cases},
$$
where a supplied value of `NULL` is interpreted as "disabled" (a valid replacement for the formula), distinct from "missing" (inherit).
Returns
A new `dims_spec` (a modified copy of `dims`; the input is not mutated because R lists have copy-on-modify semantics) with the override registered under the character key `as.character(as.integer(k))` in `dims$overrides`. Each override entry is a list with components `a`, `b`, `a_set` (logical), `b_set` (logical). Calling `override` twice with the same `k` replaces the prior entry for that index.
Notes
- `assert_inherits(dims, "dims_spec", "dims")` is called first; non-`dims_spec` input aborts.
- `assert_count(k, "k")` enforces that `k` is a positive integer scalar.
- If both `a` and `b` are missing, the function aborts via `gdpar_abort` with `class = "gdpar_input_error"` and the message: "override(): at least one of 'a' or 'b' must be supplied. To leave a dimension unchanged, do not call override() for it."
- The missing-vs-`NULL` distinction is implemented with `base::missing()`. When `a` is supplied, `assert_one_sided_formula(a, "a", allow_null = TRUE)` is run, then `ov["a"] <- list(a)` (the `[<-`-with-list idiom is used so that assigning `NULL` retains the element rather than deleting it) and `ov$a_set <- TRUE`. Symmetrically for `b`.
- If no prior override exists for `k`, a fresh entry `list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE)` is seeded before applying the supplied arguments, so unsupplied components correctly remain flagged as unset and will inherit from the base at resolution time.
- Range validation of `k` against $p$ is not performed here; it is deferred to `resolve_dims_spec`.
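The two base-R idioms that the missing-vs-`NULL` distinction relies on can be demonstrated in isolation. The helper `override_flags_sketch` is a hypothetical name used only for this illustration; it is not part of the package.

```r
# Assigning NULL with `$<-` deletes the element ...
ov_deleted <- list(a = ~ x1, b = ~ x2)
ov_deleted$a <- NULL
names(ov_deleted)          # "b"

# ... whereas `[<-` with a list keeps it, holding the value NULL.
ov_kept <- list(a = ~ x1, b = ~ x2)
ov_kept["a"] <- list(NULL)
names(ov_kept)             # "a" "b"
is.null(ov_kept$a)         # TRUE

# base::missing() separates "argument omitted" from "argument given as NULL".
override_flags_sketch <- function(a, b) {
  c(a_set = !missing(a), b_set = !missing(b))
}
flags <- override_flags_sketch(a = NULL)
flags                      # a_set is TRUE, b_set is FALSE
```

Without the `[<-` form, an explicit `a = NULL` override would silently vanish from the entry and become indistinguishable from "inherit from base".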
Purpose
Internal resolver that flattens a dims_spec into the canonical per-dimension representation consumed by amm_spec: a length-p list of list(a, b) pairs, with overrides applied on top of the base template.
Arguments
- `dims`: a `dims_spec` object.
- `p`: a positive integer scalar, the global dimension.
Mathematics
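For each $k = 1, \ldots, p$: $a_k = a^{\text{ov}}_k$ if `ov$a_set` is `TRUE`, else $a_k = a^{\text{base}}$; symmetrically, $b_k = b^{\text{ov}}_k$ if `ov$b_set` is `TRUE`, else $b_k = b^{\text{base}}$.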
Returns
A list of length p. Each entry is list(a = a_k, b = b_k) where a_k/b_k are each either a one-sided formula or NULL.
Notes
- `assert_inherits(dims, "dims_spec", "dims")` and `assert_count(p, "p")` are run first.
- Before flattening, every override key is parsed with `suppressWarnings(as.integer(key))`. Any key that is `NA`, `< 1`, or `> p` is collected into `bad`; if `bad` is non-empty, the function aborts via `gdpar_abort` with `class = "gdpar_input_error"`, a `sprintf` message listing the bad keys and the valid range `1:p`, and a `data` field `list(bad_keys = bad, p = p)`.
- The flattening loop iterates `seq_len(p)`; for each `k` it starts from `dims$base$a`/`dims$base$b`, then if `dims$overrides[[as.character(k)]]` is non-`NULL` it conditionally replaces `a_k` when `isTRUE(ov$a_set)` and `b_k` when `isTRUE(ov$b_set)`. Unset components therefore inherit from the base, realising the missing-vs-`NULL` semantics established by `override`.
- Marked `@keywords internal`/`@noRd`; not exported.
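The key validation and flattening loop described above can be sketched as follows. `resolve_sketch()` is a hypothetical illustration on plain lists; it uses `stop()` in place of `gdpar_abort` and skips the class assertions.

```r
# Hypothetical sketch of the resolver: validate override keys, then flatten.
resolve_sketch <- function(dims, p) {
  keys <- suppressWarnings(as.integer(names(dims$overrides)))
  bad <- names(dims$overrides)[is.na(keys) | keys < 1L | keys > p]
  if (length(bad) > 0L) {
    stop(sprintf("override keys outside 1:%d: %s", p, paste(bad, collapse = ", ")))
  }
  lapply(seq_len(p), function(k) {
    out <- list(a = dims$base$a, b = dims$base$b)  # start from the base template
    ov <- dims$overrides[[as.character(k)]]
    if (!is.null(ov)) {
      if (isTRUE(ov$a_set)) out["a"] <- list(ov$a)  # may legitimately be NULL
      if (isTRUE(ov$b_set)) out["b"] <- list(ov$b)
    }
    out
  })
}

dims <- list(
  base = list(a = ~ x, b = ~ z),
  overrides = list("2" = list(a = NULL, b = NULL, a_set = TRUE, b_set = FALSE))
)
res <- resolve_sketch(dims, p = 3L)
```

In this example `res[[2]]$a` is `NULL` because the override set it explicitly, while `res[[2]]$b` inherits `~ z` from the base because `b_set` is `FALSE`.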
Purpose
S3 print method for objects of class dims_spec. Renders a compact human-readable summary of the base template and any registered overrides.
Arguments
- `x`: a `dims_spec` object.
- `...`: ignored; present for S3 generic compatibility.
Returns
Invisibly returns x.
Notes
- Output layout:
  - Header line `<dims_spec>`.
  - `base:` section printing `a : <deparse(formula) or "NULL">` and `b : <deparse(formula) or "NULL">`. Formula deparsing uses `base::deparse`.
  - If `length(x$overrides) > 0L`, an `overrides:` section enumerating each override. Keys are sorted by their integer value (`sort(as.integer(names(x$overrides)))`) and printed as `k = <int> : <parts>`, where `<parts>` is the semicolon-joined set of `a = <deparse or "NULL">` (only if `isTRUE(ov$a_set)`) and `b = <deparse or "NULL">` (only if `isTRUE(ov$b_set)`). Unset components are omitted from the line.
  - If there are no overrides, prints `overrides: <none>`.
- The method is exported (the generic `print` is dispatched via `UseMethod` on class `"dims_spec"`, which sits before `"list"` in the class vector).
- No validation of `x` is performed; passing a malformed object may produce confusing output or errors from `deparse`/subsetting.
Purpose
Provides a concise, human-readable console summary of a fitted Empirical-Bayes model object. This S3 print method dispatches on objects of class gdpar_eb_fit, displaying key model characteristics, parameter estimates, numerical diagnostics of the Laplace approximation, and conditional HMC diagnostics.
Arguments
- `x`: A `gdpar_eb_fit` object, the result of an Empirical-Bayes fitting procedure.
- `digits`: Integer scalar. Controls the number of digits for numeric formatting via `format()`. Defaults to `3L`.
- `...`: Additional arguments (unused; included for S3 method consistency).
Mathematics
No explicit mathematical formula is implemented. The method presents estimates and standard errors computed elsewhere.
Returns
Invisibly returns the input object x (type gdpar_eb_fit).
Notes
- S3 Dispatch: Invoked by `print()` when the first argument is of class `gdpar_eb_fit`.
- Conditional Output: The printed output adapts based on the `path` component of the object. For `"eb_KxP"` (Path C, the K×p regime), it prints a multi-dimensional array of estimates (`theta_ref_kp_hat`, `theta_ref_kp_se`), slot names, and per-slot condition numbers. For other paths, it prints the scalars or vectors `theta_ref_hat` and `theta_ref_se`.
- Side Effects: Writes directly to the console via `cat()`.
- NULL-safe Access: Uses the `%||%` operator (likely from rlang) to provide default values for potentially `NULL` components (e.g., `x$family$name`), preventing errors during formatting.
- Diagnostics: Displays numerical diagnostics (`diagnostics_numerical`) and, if available, one-line conditional HMC diagnostics (`diagnostics`).
Purpose
Constructs a structured summary of an Empirical-Bayes fit suitable for programmatic access and further printing. This S3 summary method computes credible intervals, optionally applying the Proposition 7B scalar or tensor correction, and extracts a summary of the conditional posterior if available.
Arguments
- `object`: A `gdpar_eb_fit` object.
- `level`: Numeric scalar in the interval (0, 1). Specifies the probability level for credible intervals. Defaults to 0.95.
- `...`: Additional arguments (unused).
Mathematics
- Credible Interval Inflation (Correction): The standard error (`se`) is multiplied by an inflation factor `inflate` to widen the credible interval, accounting for the uncertainty of the reference anchor.
  - Scalar Correction (Path C off):
    $$ \text{inflate} = \sqrt{1 + \frac{C}{\max(1, J)}} $$
    where $C$ is the constant `object$eb_correction_constant` and $J$ is the number of groups.
  - Tensor Correction (Path C on): For each group $g$, slot $k$, and coordinate $c$:
    $$ \text{inflate}_{k,c} = \sqrt{1 + \frac{\mathbf{T}[k, c, c]}{\max(1, J)}} $$
    where $\mathbf{T}$ is the `object$correction_tensor_constant`.
- Credible Interval Calculation: The $(1-\alpha)$ credible interval is:
  $$ \text{estimate} \pm z_{1-\alpha/2} \cdot \text{se} \cdot \text{inflate} $$
  where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\alpha = 1 - \text{level}$.
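As a worked illustration of the formulas above, the following sketch computes the scalar and tensor inflation factors and the resulting intervals in base R. The helper names `eb_interval_sketch()` and `tensor_inflate_sketch()` and all numeric inputs are invented for this example; they are not package functions.

```r
# Hypothetical helper: scalar inflation and normal-quantile interval.
eb_interval_sketch <- function(estimate, se, C, J, level = 0.95) {
  stopifnot(is.numeric(level), length(level) == 1L, level > 0, level < 1)
  inflate <- sqrt(1 + C / max(1, J))
  z <- stats::qnorm(1 - (1 - level) / 2)
  half <- z * se * inflate
  data.frame(estimate = estimate, se = se,
             lower = estimate - half, upper = estimate + half,
             inflate = inflate)
}

# Hypothetical helper: per-slot, per-coordinate inflation from the
# diagonal entries T[k, c, c] of a K x p x p array.
tensor_inflate_sketch <- function(T_arr, J) {
  K <- dim(T_arr)[1]
  p <- dim(T_arr)[2]
  out <- matrix(NA_real_, nrow = K, ncol = p)
  for (k in seq_len(K)) {
    Tk <- matrix(T_arr[k, , ], nrow = p, ncol = p)  # safe when p == 1
    out[k, ] <- sqrt(1 + diag(Tk) / max(1, J))
  }
  out
}

# Made-up inputs: C = 3 and J = 12 give inflate = sqrt(1.25).
tab <- eb_interval_sketch(estimate = c(0.8, 1.2), se = c(0.10, 0.15), C = 3, J = 12)

T_arr <- array(0, dim = c(2, 2, 2))
T_arr[1, 1, 1] <- 3
infl <- tensor_inflate_sketch(T_arr, J = 12)
```

With a zero correction entry the factor is exactly 1, so the interval reduces to the uncorrected normal-approximation interval.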
Returns
An object of class summary.gdpar_eb_fit, which is a list containing:
- `theta_table`: A `data.frame` (or array for Path C) of estimates, standard errors, lower/upper interval bounds, and inflation factors.
- `conditional_summary`: A posterior summary (from the `posterior` package) of the conditional model fit, if available. Otherwise `NULL`.
- `correction_applied`: Logical flag indicating if an EB correction was applied.
- `correction_constant` (non-Path C) or `correction_tensor` (Path C): The correction value(s) used.
- `inflation_factor`: The computed inflation factor(s).
- `level`, `family`, `link`, `J_groups`, `K_slots`, `p_dim`, `slot_names` (Path C), `diagnostics_numerical`, `diagnostics_hmc`, `path`, `call`: Various model metadata.
Notes
- Input Validation: Raises a `gdpar_input_error` (via `gdpar_abort()`) if `level` is not a single numeric value in (0, 1).
- Conditional Posterior Extraction: Attempts to extract and summarize the conditional posterior draws using `posterior::summarise_draws()`. Filters out latent parameters (e.g., `eta`, `log_lik`) by pattern matching. Errors are silently caught, returning `NULL`.
- Path Dependency: The structure of the returned summary, especially `theta_table`, differs significantly between the scalar (non-Path C) and tensor (Path C, `eb_KxP`) regimes.
Purpose
Formats and prints the summary of an Empirical-Bayes fit produced by summary.gdpar_eb_fit(). This S3 print method provides a detailed, human-readable display of the summary object.
Arguments
- `x`: A `summary.gdpar_eb_fit` object.
- `digits`: Integer scalar for numeric formatting. Defaults to `3L`.
- `...`: Additional arguments (unused).
Mathematics
No new calculations; presents the pre-computed values from the summary object.
Returns
Invisibly returns the input summary object x.
Notes
- S3 Dispatch: Invoked by `print()` when the first argument is of class `summary.gdpar_eb_fit`.
- Path-Dependent Output: Prints different sections depending on whether `x$path` is `"eb_KxP"` (Path C). For Path C, it prints the tensor-based correction details and the full `theta_table`. For other paths, it prints the scalar correction constant and inflation factor.
- Conditional Summary Display: If available, prints the first 8 rows of the conditional posterior summary for a quick overview.
- Side Effects: Writes directly to the console via `cat()` and `print()`.
Purpose
Extracts coefficient estimates from a fitted empirical Bayes General Dynamic Parameter model (gdpar_eb_fit object). It returns the reference parameter estimates and, if a conditional HMC fit is available, the conditional model parameters (random effects, fixed effects, and raw W parameters).
Arguments
- `object`: A `gdpar_eb_fit` object resulting from a call to a fitting function (e.g., `gdpar_eb`).
- `...`: Additional arguments (currently unused).
Mathematics
No new mathematical operations. It extracts precomputed quantities:
- $\widehat{\theta}_{\text{ref}}^{\text{EB}}$: The empirical Bayes estimate of the reference parameter.
- $\text{SE}(\widehat{\theta}_{\text{ref}})$: Its standard error.
- $\text{Cov}(\widehat{\theta}_{\text{ref}})$: Its covariance matrix.
- For conditional parameters, it extracts posterior means and standard deviations from HMC draws:
  $$ \widehat{\mu}_a = \frac{1}{S} \sum_{s=1}^S a^{(s)}, \quad \text{SD}(a) = \sqrt{\frac{1}{S-1} \sum_{s=1}^S (a^{(s)} - \widehat{\mu}_a)^2} $$
  (analogous for `b` and `W`), where $S$ is the number of posterior draws.
Returns
A list of class c("gdpar_coef_eb", "gdpar_coef", "list") with components:
- `theta_ref`: A list containing:
  - `method`: Character `"EB"`.
  - `estimate`: Numeric scalar, $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
  - `se`: Numeric scalar, standard error.
  - `cov`: Numeric matrix, covariance matrix.
  - `eb_correction_applied`: Logical, whether an EB correction was applied.
  - `eb_correction_constant`: Numeric, the constant used for EB correction (if any).
- If `object$conditional_fit` exists and the `posterior` package is available:
  - `a`: List with `estimate` (vector of means) and `se` (vector of SDs) for `a_coef` parameters.
  - `b`: List with `estimate` and `se` for `b_coef` parameters.
  - `W`: List with `estimate` and `se` for `W_raw` parameters.
Notes
- S3 method for class `gdpar_eb_fit`.
- The conditional parameters (`a`, `b`, `W`) are only extracted if the `posterior` package is available and the conditional fit object contains draws.
- The helper function `pick(pat)` uses a regex pattern `pat` to match variable names in the posterior draws and returns their means and SDs.
- The output class inherits from `gdpar_coef`, allowing use of generic coefficient methods.
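The pattern-based extraction performed by `pick(pat)` can be approximated in base R on a plain draws matrix. The function `pick_sketch()`, the simulated draws, and the exact regex are assumptions made for illustration; the package operates on `posterior` draws objects rather than a bare matrix.

```r
# Hypothetical sketch: select columns by regex, return means and SDs.
pick_sketch <- function(draws, pat) {
  idx <- grep(pat, colnames(draws))
  if (length(idx) == 0L) return(NULL)
  sub <- draws[, idx, drop = FALSE]
  list(estimate = colMeans(sub), se = apply(sub, 2, stats::sd))
}

# Simulated S x (number of variables) matrix standing in for HMC draws.
set.seed(1)
draws <- cbind(
  "a_coef[1]" = rnorm(200, mean = 1),
  "a_coef[2]" = rnorm(200, mean = -1),
  "b_coef[1]" = rnorm(200),
  "lp__"      = rnorm(200)
)

a_hat <- pick_sketch(draws, "^a_coef\\[")  # matches both a_coef columns
w_hat <- pick_sketch(draws, "^W_raw\\[")   # no match in this toy matrix: NULL
```

`stats::sd` uses the $S - 1$ denominator, which matches the SD formula in the Mathematics section.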
predict.gdpar_eb_fit(object, newdata = NULL, type = c("response", "linear_predictor"), level = 0.95, ...)
Purpose
Computes posterior predictions from the conditional HMC model fit at the plug-in empirical Bayes estimate $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
Arguments
- `object`: A `gdpar_eb_fit` object.
- `newdata`: Optional data frame with the same variables as training data. Currently must be `NULL` (in-sample prediction).
- `type`: Character string specifying prediction scale:
  - `"response"` (default): Predictions on the response scale via the family's inverse-link function ($y$).
  - `"linear_predictor"`: Predictions on the linear predictor scale ($\eta$).
- `level`: Numeric scalar in $(0,1)$ for the credible interval width. Defaults to $0.95$.
- `...`: Additional arguments (currently unused).
Mathematics
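Let $\eta_i^{(s)}$, $s = 1, \dots, S$, denote the posterior draws of the linear predictor for observation $i$ (for `type = "response"`, the draws $y_i^{(s)}$ on the response scale are used instead). For each $i$:
- Mean: $\bar{\eta}_i = \frac{1}{S} \sum_{s=1}^S \eta_i^{(s)}$ (or $\bar{y}_i$ for response).
- Credible interval bounds: $Q_{\alpha/2}(\eta_i)$ and $Q_{1-\alpha/2}(\eta_i)$, where $\alpha = 1 - \text{level}$ and $Q$ denotes the sample quantile.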
Returns
A list with components:
- `mean`: Numeric vector of posterior predictive means (length $n$).
- `lower`: Numeric vector of lower credible interval bounds (length $n$).
- `upper`: Numeric vector of upper credible interval bounds (length $n$).
- `draws`: Numeric matrix of posterior predictive draws (dimensions $S \times n$).
- `level`: The credible interval level used.
- `type`: The prediction type (`"response"` or `"linear_predictor"`).
Notes
- S3 method for class `gdpar_eb_fit`.
- If `newdata` is not `NULL`, an error of class `"gdpar_unsupported_feature_error"` is raised, stating that out-of-sample prediction is not yet implemented (deferred to Sub-phase 8.6.C).
- Requires the `posterior` package to extract and manipulate HMC draws.
- The function searches for variables in the conditional fit's draws matching `"^eta\\["` (for linear predictor) or `"^y_pred\\["` (for response). If none are found, an internal error is raised.
- Quantiles are computed using `stats::quantile` with `names = FALSE`.
- The `draws` matrix is transposed from the `posterior` draws matrix format to $S \times n$.
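The mean and quantile summaries described above can be reproduced on any $S \times n$ matrix of draws. The sketch below uses simulated linear-predictor draws; `predict_summary_sketch()` is a hypothetical name and the simulation settings are arbitrary.

```r
# Hypothetical sketch: column-wise posterior mean and equal-tailed bounds.
predict_summary_sketch <- function(draws, level = 0.95) {
  alpha <- 1 - level
  list(
    mean  = colMeans(draws),
    lower = apply(draws, 2, stats::quantile, probs = alpha / 2, names = FALSE),
    upper = apply(draws, 2, stats::quantile, probs = 1 - alpha / 2, names = FALSE),
    draws = draws,
    level = level
  )
}

# Simulated S = 500 draws for n = 3 observations (one column per observation).
set.seed(42)
eta_draws <- matrix(
  rnorm(500 * 3, mean = rep(c(-1, 0, 2), each = 500)),
  nrow = 500
)
pred <- predict_summary_sketch(eta_draws, level = 0.90)
```

Each column of `eta_draws` plays the role of the draws $\eta_i^{(1)}, \dots, \eta_i^{(S)}$ for one observation, so the returned vectors have length $n = 3$.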
gdpar_eb(formula, family = gdpar_family("gaussian"), amm = amm_spec(), W = NULL, data, prior = NULL, anchor = "prior_mean", skip_id_check = FALSE, chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L, adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L, verbose = TRUE, seed = NULL, group = NULL, parametrization = c("auto", "ncp", "cp"), id_check_rigor = c("full", "fast"), eb_correction = TRUE, laplace_control = list(), ...)
Purpose
Exported main entry point for Path 1 Empirical-Bayes (EB) estimation of the AMM canonical model. It is the EB counterpart of gdpar(). The function implements a three-step pipeline:
- Step (i): Estimate the population reference $\theta_{\text{ref}}$ by maximizing the marginal (Type II) likelihood via Laplace approximation (`cmdstanr::laplace()`), with multi-start optimization and adaptive Levenberg–Marquardt ridge perturbation for numerical anti-fragility.
- Step (iii): Sample the lower-level parameters $\xi = (a, b, W, \sigma_*, \phi)$ from the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$ via HMC (`cmdstanr::sample()`).
- Optionally apply the scalar Proposition 7B coverage-discrepancy inflation factor to the conditional credible intervals.

The function dispatches across three path regimes based on the resolved $(K, p)$:

- Path A (8.6.B): $K = 1$, $p = 1$; the base regime executed inline in the function body.
- Path B (8.6.C): $K > 1$, $p = 1$; delegated to `.gdpar_eb_run_K()`.
- Path C (8.6.D): $K > 1$ and any slot with $p > 1$; delegated to `.gdpar_eb_run_KxP()`.
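The regime selection can be summarised as a small decision function. This is only a sketch of the routing rule stated above; `eb_path_sketch()` and its return labels are invented for illustration and do not call any package internals.

```r
# Hypothetical sketch of the (K, p) routing rule.
eb_path_sketch <- function(K, p_per_slot) {
  K <- as.integer(K)
  p_per_slot <- as.integer(p_per_slot)
  if (K == 1L) {
    "A"   # executed inline in gdpar_eb()
  } else if (any(p_per_slot > 1L)) {
    "C"   # delegated to .gdpar_eb_run_KxP()
  } else {
    "B"   # delegated to .gdpar_eb_run_K()
  }
}

eb_path_sketch(K = 1, p_per_slot = 1)
eb_path_sketch(K = 3, p_per_slot = c(1, 1, 1))
eb_path_sketch(K = 2, p_per_slot = c(1, 4))
```

Note that a single slot with $p > 1$ is enough to select Path C when $K > 1$.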
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `formula` | Two-sided formula or `gdpar_formula_set` | Outcome and RHS specification. Same semantics as `gdpar()`'s `formula`. When it inherits from `"gdpar_formula_set"`, the K-input dispatch fires. |
| `family` | `gdpar_family` object or named list | Distributional family. Sub-phase 8.6.B supports `stan_id` in `c(1, 2, 3, 4)` (Gaussian, Poisson, neg-binomial-2, Bernoulli) for the base regime ($K = 1$, $p = 1$). When it is a named list (not inheriting from `gdpar_family` or `gdpar_family_multi`), it is treated as a multi-family input for K-input dispatch. |
| `amm` | `amm_spec` or named list of `amm_spec` | AMM specification. Must have `amm$p == 1L` for the base regime; a named list of `amm_spec` triggers K-input dispatch. |
| `W` | `W_basis` object or `NULL` | Optional modulating basis (polynomial or B-spline). |
| `data` | `data.frame` | Data frame containing all variables referenced by `formula` and `amm`. |
| `prior` | `gdpar_prior` object or `NULL` | Prior specification. When `NULL`, defaults via `gdpar_prior()` are used. |
| `anchor` | Numeric scalar, `"prior_mean"`, or `"empirical_y"` | Anchor value. Default `"prior_mean"`. |
| `skip_id_check` | Logical scalar | If `TRUE`, skips the basis-restricted identifiability check. |
| `chains` | Integer scalar | Number of HMC chains for Step (iii). Default `4L`. |
| `iter_warmup` | Integer scalar | HMC warmup iterations per chain. Default `1000L`. |
| `iter_sampling` | Integer scalar | HMC sampling iterations per chain. Default `1000L`. |
| `adapt_delta` | Numeric scalar | HMC `adapt_delta`. Default `0.95`. |
| `max_treedepth` | Integer scalar | HMC maximum tree depth. Default `12L`. |
| `refresh` | Integer scalar | HMC refresh interval. Default `100L`. |
| `verbose` | Logical scalar | Controls diagnostic messages and `show_messages`/`show_exceptions` in HMC. |
| `seed` | Integer scalar or `NULL` | Random seed for reproducibility (Laplace multi-start, parametrization pre-flight, and HMC). |
| `group` | One-sided formula or `NULL` | Grouping variable specification. |
| `parametrization` | Character scalar | One of `"auto"` (default), `"ncp"`, `"cp"`. Selects CP/NCP sampling parametrization for additive and modulating components in Step (iii). `"auto"` triggers a pre-flight diagnostic via `resolve_parametrization()`. |
| `id_check_rigor` | Character scalar | One of `"full"` or `"fast"`. Matched but not otherwise consumed in this function body (forwarded to K-path orchestrators). |
| `eb_correction` | Logical scalar | If `TRUE` (default), applies the scalar Proposition 7B inflation factor to conditional credible intervals. If `FALSE`, issues a `gdpar_diagnostic_warning` about the expected coverage discrepancy. |
| `laplace_control` | Named list | Controls for Step (i) Laplace approximation and anti-fragility. Recognized entries: `multi_start_M` (default 5), `kappa_threshold` (default 1e10), `ridge_init` (default 1e-6), `epsilon_lm` (default `sqrt(.Machine$double.eps)`), `ridge_max_iter` (default 10), `ridge_grow_factor` (default 10.0), `laplace_draws` (default 1000), `optim_algorithm` (default `"lbfgs"`). Resolved by `.gdpar_eb_resolve_laplace_control()`. |
| `...` | Additional arguments | Forwarded to the underlying HMC sampler (`conditional_model$sample()`) in Step (iii). |
Mathematics
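The EB estimator maximizes the marginal (Type II) log-likelihood:
$$ \widehat{\theta}_{\text{ref}}^{\text{EB}} = \arg\max_{\theta_{\text{ref}}} \ \log \int p(y, \xi \mid \theta_{\text{ref}}) \, d\xi $$
The integral is approximated by the Laplace method: for each candidate $\theta_{\text{ref}}$,
$$ \log \int p(y, \xi \mid \theta_{\text{ref}}) \, d\xi \approx \log p(y, \hat{\xi} \mid \theta_{\text{ref}}) + \frac{d}{2} \log(2\pi) - \frac{1}{2} \log \det H(\theta_{\text{ref}}) $$
where $\hat{\xi} = \arg\max_{\xi} \log p(y, \xi \mid \theta_{\text{ref}})$, $H(\theta_{\text{ref}})$ is the negative Hessian of $\log p(y, \xi \mid \theta_{\text{ref}})$ with respect to $\xi$ evaluated at $\hat{\xi}$, and $d$ is the dimension of $\xi$.
Given $\widehat{\theta}_{\text{ref}}^{\text{EB}}$, the lower-level parameters follow the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$, sampled via HMC in Step (iii).
When `eb_correction = TRUE`, the scalar Proposition 7B inflation constant $C$ widens the conditional credible intervals by the factor $\sqrt{1 + C / \max(1, J)}$, where $J$ is the number of groups.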
Returns
An object of class c("gdpar_eb_fit", "list") with the following named components:
| Component | Type | Description |
|---|---|---|
| `theta_ref_hat` | Numeric vector (length `J_groups`) | EB point estimates of $\theta_{\text{ref}}$. |
| `theta_ref_se` | Numeric vector (length `J_groups`) | Marginal standard errors from the Laplace covariance. |
| `conditional_fit` | cmdstanr fit object | The HMC fit from Step (iii). |
| `amm` | `amm_spec` | The resolved AMM specification. |
| `family` | `gdpar_family` | The resolved family object. |
| `prior` | `gdpar_prior` | The resolved prior. |
| `design` | AMM design object | Built by `build_amm_design()`. |
| `anchor` | Numeric scalar | The resolved anchor value. |
| `stan_data` | Named list | The Stan data list (includes `K_slots`, `p_dim`). |
| `identifiability_report` | Report object or `NULL` | Result of `gdpar_check_identifiability()`; `NULL` when `skip_id_check = TRUE`. |
| `diagnostics` | `gdpar_diagnostics` | Diagnostics from the conditional HMC fit, computed by `compute_diagnostics()`. |
| `diagnostics_numerical` | Named list | Numerical diagnostics from the Laplace step: `kappa`, `lm_perturbation`, `lm_n_iter`, `lm_status` (one of `"not_needed"`, `"converged"`, `"exhausted"`), `kappa_post_ridge`, `multi_start_dispersion`, `marginal_log_lik_history`. For Path C, slot-vectorized counterparts (`kappa_per_slot`, `lm_lambda_per_slot`, `lm_n_iter_per_slot`, `lm_status_per_slot`) replace the scalars. |
| `parametrization` | Named list | Contains `cp_a` (logical), `cp_W` (logical), and `meta` (metadata from `resolve_parametrization()`). |
| `group_info` | Group info object or `NULL` | Resolved grouping information. |
| `correction_applied` | Logical scalar | Whether the Proposition 7B correction was applied. |
| `eb_correction_constant` | Numeric scalar | The inflation constant when `eb_correction = TRUE`; `NA_real_` otherwise. |
| `call` | `call` | The matched call. |
| `path` | Character scalar | Always `"eb"`. |
Notes
- Argument matching: `parametrization` and `id_check_rigor` are resolved via `match.arg()` at function entry. `call` is captured via `match.call()`.
- Input validation: Delegates to `.gdpar_eb_validate_inputs()` (defined in a subsequent section) for type discipline of `formula`, `family`, `amm`, `data`, `eb_correction`, and `laplace_control`. If `prior` is `NULL`, it is replaced by `gdpar_prior()`; then `assert_inherits()` enforces class `"gdpar_prior"`.
- cmdstanr dependency: `require_suggested("cmdstanr", ...)` is called to ensure the suggested package is available. The Laplace method requires cmdstanr ≥ 0.7.0.
- K-input dispatch: Four boolean flags detect multi-slot input patterns:
  - `.formula_set_input`: `formula` inherits `"gdpar_formula_set"`.
  - `.amm_list_input`: `amm` is a list, does not inherit `"amm_spec"`, and has non-`NULL` names.
  - `.classic_with_amm_calls`: `formula` is a standard two-sided formula (length 3) whose RHS contains `a()`/`b()`/`W()` calls, detected by `.gdpar_rhs_has_amm_calls()`.
  - `.family_is_named_list`: `family` is a named list not inheriting `"gdpar_family"` or `"gdpar_family_multi"`.

  When any of these fires, `.gdpar_eb_resolve_K_inputs()` builds `amm_list_canonical`, `family_promoted`, `outcome_name`, `formula_env`, and `family_id_k_vector`. If resolved $K > 1$, the function checks whether any slot has $p > 1$ (`.any_slot_p_gt1`); if so, it returns `.gdpar_eb_run_KxP()` (Path C), otherwise `.gdpar_eb_run_K()` (Path B). If $K = 1$, the single `amm_spec` is unwrapped from `amm_list_canonical[[1]]`, `family` is replaced by `family_promoted`, and a new `formula` is reconstructed from the union of `all.vars(amm$a)` and `all.vars(amm$b)` (or `"1"` if both are empty), using `K_inputs$formula_env` as the environment.
- Path A (K = 1) pipeline: After K-input resolution (or if no K-input pattern fired), the function proceeds inline:
  - `p_resolved` is read from `amm$p` (defaulting to `1L` if absent). `K_resolved` is always `1L`.
  - `.gdpar_eb_check_stan_id_for_path()` validates the family's `stan_id` against the resolved $(K, p)$.
  - The outcome variable name is extracted from `formula[[2]]`. If not found in `data`, a `gdpar_input_error` is raised. Non-finite values (`NA`, `NaN`, `Inf`) in the outcome trigger a `gdpar_input_error` with a count.
  - The RHS formula is extracted as `formula[c(1L, 3L)]` and updated with `~ . + 0` (no intercept).
  - If `amm$W` is non-`NULL`, it is materialized via `materialize_W_basis(amm$W, p = p_resolved)`.
  - The AMM design is built via `build_amm_design(amm, data, formula_rhs = rhs)`.
  - The anchor is resolved via `resolve_anchor(anchor, family, y, prior, verbose)`.
  - Unless `skip_id_check = TRUE`, `gdpar_check_identifiability()` is called with `theta_ref_init` set to `1` when `amm$b` is non-`NULL` and `abs(anchor_value) < 1e-8`, otherwise `anchor_value`. If the check does not pass, a `gdpar_identifiability_error` is raised with the report attached in `data = list(report = rep)`.
  - Group resolution via `.resolve_group_argument()`. If a group is present, `.check_group_aliasing_c7()` is called.
  - Stan data is assembled via `assemble_stan_data()`. `stan_data$K_slots` and `stan_data$p_dim` are set to the resolved integers.
  - Parametrization is resolved via `resolve_parametrization()` (which may run a pre-flight diagnostic when `parametrization = "auto"`).
  - The marginal Stan model source is generated by `.gdpar_eb_generate_stan_marginal()`, written to a tempfile via `write_stan_to_tempfile()`, and compiled via `cmdstanr::cmdstan_model()`.
  - The marginal likelihood is maximized by `.gdpar_eb_maximize_marginal()`, returning `theta_ref_hat`, `theta_ref_se`, and `diagnostics`.
  - The conditional Stan model source is generated by `.gdpar_eb_generate_stan_conditional()`, written and compiled analogously.
  - `stan_data_cond` is a copy of `stan_data` with `theta_ref_data` set: when $p > 1$ and `length(theta_hat_loc) == J_groups * p`, it is reshaped to a `J_groups` × `p` matrix (column-major, `byrow = FALSE`); otherwise it is passed as a flat numeric vector.
  - HMC sampling is invoked via `do.call(conditional_model$sample, sample_args)`. Extra arguments from `...` are merged into `sample_args`, potentially overriding defaults. `seed` is included only when non-`NULL`.
  - Diagnostics are computed via `compute_diagnostics(fit_cond, verbose = verbose)`.
  - The EB correction is computed by `.gdpar_eb_apply_correction()`.
- Errors raised:
  - `gdpar_input_error`: outcome variable not in `data`; outcome contains non-finite values.
  - `gdpar_identifiability_error`: basis-restricted identifiability check failed (with `data = list(report = rep)`).
  - `gdpar_unsupported_feature_error`: raised by `.gdpar_eb_check_stan_id_for_path()` for unsupported `stan_id`/$(K, p)$ combinations (as documented; the actual raise is inside the helper).
  - `gdpar_eb_numerical_error`: raised by `.gdpar_eb_maximize_marginal()` when the condition number exceeds `kappa_threshold` after adaptive ridge (as documented; the actual raise is inside the helper).
- Side effects: Writes Stan source files to temporary files on disk; compiles Stan models (may invoke the C++ toolchain); runs optimization and HMC sampling (may produce console output controlled by `verbose`/`refresh`).
- S3 dispatch: The returned object has class `c("gdpar_eb_fit", "list")`. No S3 methods for this class are defined in this section.
Purpose
Top-level input validator for the EB (Empirical Bayes) correction pipeline. Called before any dispatch to verify that every public argument conforms to the expected type and structure. Guards the entry point of the EB path and prevents downstream functions from receiving malformed inputs.
Arguments
- `formula` (any): Must be either a two-sided R formula of length 3 (`y ~ ...`) or an object inheriting from class `"gdpar_formula_set"`.
- `family` (any): Must be one of: an object inheriting from `"gdpar_family"`, an object inheriting from `"gdpar_family_multi"` (Path A, $p > 1$), or a named list whose every element inherits from `"gdpar_family"` with no duplicated or empty names (Path B heterogeneous $K$, sub-phase 8.3.7 pattern).
- `amm` (any): Must be an object inheriting from `"amm_spec"` or a named list (whose elements are expected to be `"amm_spec"` objects) for Path B $K > 1$.
- `data` (any): Must be a `data.frame`.
- `eb_correction` (any): Must be a single, non-`NA` logical value (`TRUE` or `FALSE`).
- `laplace_control` (any): Must be a list (possibly empty, possibly unnamed at this stage; naming is enforced downstream in `.gdpar_eb_resolve_laplace_control`).
Returns
invisible(NULL). The function is called for its side effect of raising errors on invalid input.
Notes
- Raises an error of class `"gdpar_input_error"` (via `gdpar_abort`) for each validation failure, with a condition `data` field carrying `received_class` where applicable.
- The `formula` check first tests `inherits(formula, "gdpar_formula_set")`; if that fails, it requires `inherits(formula, "formula")` and `length(formula) == 3L`.
- The `family` named-list detection (Path B) requires all of: `is.list(family)`, not inheriting from `"gdpar_family"` or `"gdpar_family_multi"`, non-null names, all names non-empty (`nzchar`), no duplicated names (`anyDuplicated == 0L`), and every element inheriting from `"gdpar_family"` (checked via `vapply`).
- The `amm` named-list detection requires `is.list(amm)`, not inheriting from `"amm_spec"`, and non-null names.
- The $K > 1$ + $p > 1$ guard (Path C) is explicitly released per Sub-phase 8.6.D (Session 13b, 2026-05-25); Path C is routed to `.gdpar_eb_run_KxP()` in the dispatcher. Per-path `stan_id` checks are deferred to `.gdpar_eb_check_stan_id_for_path()`.
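The named-list detection for `family` combines several base R predicates. The sketch below strings them together in the order listed in the notes; `is_family_named_list_sketch()` and the toy constructor `fam()` are hypothetical and exist only for this example.

```r
# Hypothetical sketch of the family named-list predicate.
is_family_named_list_sketch <- function(family) {
  nms <- names(family)
  is.list(family) &&
    !inherits(family, c("gdpar_family", "gdpar_family_multi")) &&
    !is.null(nms) &&
    all(nzchar(nms)) &&
    anyDuplicated(nms) == 0L &&
    all(vapply(family, inherits, logical(1), what = "gdpar_family"))
}

# Toy constructor producing a list that carries the "gdpar_family" class.
fam <- function(name) structure(list(name = name), class = c("gdpar_family", "list"))

ok      <- is_family_named_list_sketch(list(mu = fam("gaussian"), sigma = fam("gamma")))
single  <- is_family_named_list_sketch(fam("gaussian"))                   # a family, not a list of them
unnamed <- is_family_named_list_sketch(list(fam("gaussian")))             # names are NULL
dupes   <- is_family_named_list_sketch(list(mu = fam("a"), mu = fam("b")))
mixed   <- is_family_named_list_sketch(list(mu = fam("a"), sigma = "gamma"))
```

Because `&&` short-circuits, the `vapply` check only runs once the cheaper structural tests have passed.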
Purpose
Enforces the per-path supported stan_id set for the EB Stan templates, depending on the resolved family_id_k_vector data field.
Arguments
- `family` (list): A single family specification object. Must contain `$stan_id` (coercible to integer) and `$name` (character) fields.
- `K` (integer/numeric): The resolved number of mixture components.
- `p` (integer/numeric): The resolved number of parametric coordinates.
Mathematics
The supported stan_id sets by regime:
Note that the
Returns
invisible(NULL) on success.
Notes
- If `family$stan_id` is `NULL`, the function returns immediately without checking (short-circuit).
- `stan_id` is coerced via `as.integer()`.
- On failure, raises an error of class `"gdpar_unsupported_feature_error"` via `gdpar_abort`, with a condition `data` list containing `family`, `stan_id`, `K`, `p`, and `supported`.
- Under Path C ($K > 1$, $p > 1$), the dispatcher is expected to iterate this check across the $K$ slots before assembling the `family_id_k_vector` data field.
- The deferred Path B set $\{5, 6, 7, 8, 9, 10, 11, 12, 13\}$ (Beta, Gamma, Lognormal_loc_scale, Student-t, Tweedie, ZIP, ZINB, Hurdle-Poisson, Hurdle-NB) for the $K > 1, p > 1$ regime is queued for a later iteration of Sub-phase 8.6.D.
Purpose
Merges a user-supplied laplace_control list with documented defaults, coercing types and validating bounds. Produces the fully resolved control list consumed by downstream Laplace approximation and ridge-perturbation routines.
Arguments
- `user` (list): User-supplied control parameters. May be empty. If non-empty, every entry must be named.
Mathematics
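Default values are those listed in the Returns table below. The default of `epsilon_lm` is `sqrt(.Machine$double.eps)`, where `.Machine$double.eps` is the machine epsilon for doubles.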
Returns
A named list with the following entries, all type-coerced:
| Field | Type | Default |
|---|---|---|
| `multi_start_M` | integer | `5L` |
| `kappa_threshold` | double | `1e10` |
| `ridge_init` | double | `1e-6` |
| `laplace_draws` | integer | `1000L` |
| `optim_algorithm` | character | `"lbfgs"` |
| `epsilon_lm` | double | `sqrt(.Machine$double.eps)` |
| `ridge_max_iter` | integer | `10L` |
| `ridge_grow_factor` | double | `10.0` |
User-supplied values for recognized names override defaults; unrecognized names are dropped after a warning.
Notes
- If `user` is empty (`length(user) == 0L`), returns the defaults list directly.
- If `user` is non-empty but has `NULL` names or any empty (`!nzchar`) names, raises an error of class `"gdpar_input_error"`.
- Unknown entries (not in `names(defaults)`) trigger a soft warning of class `"gdpar_diagnostic_warning"` via `gdpar_warn` and are dropped from the output.
- Post-merge type coercion: `multi_start_M`, `laplace_draws`, `ridge_max_iter` are coerced via `as.integer()`; `kappa_threshold`, `ridge_init`, `epsilon_lm`, `ridge_grow_factor` via `as.double()`. `optim_algorithm` is left as-is.
- Validation bounds (each raises `"gdpar_input_error"` on failure):
  - `multi_start_M >= 1L`
  - `kappa_threshold > 0`
  - `epsilon_lm > 0`
  - `ridge_max_iter >= 1L`
  - `ridge_grow_factor > 1`
- `laplace_draws` is coerced to integer but not bounds-checked in this function.
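The merge, coerce, and validate sequence above can be sketched in base R. `resolve_laplace_control_sketch()` is a hypothetical re-implementation for illustration: it uses `utils::modifyList()`, `warning()`, and `stop()` where the package uses its own `gdpar_warn` and `gdpar_abort` conditions, and the way the package performs the merge internally is not documented here.

```r
# Hypothetical sketch: merge user controls with defaults, coerce, validate.
resolve_laplace_control_sketch <- function(user = list()) {
  defaults <- list(
    multi_start_M = 5L, kappa_threshold = 1e10, ridge_init = 1e-6,
    laplace_draws = 1000L, optim_algorithm = "lbfgs",
    epsilon_lm = sqrt(.Machine$double.eps),
    ridge_max_iter = 10L, ridge_grow_factor = 10.0
  )
  if (length(user) == 0L) return(defaults)
  nms <- names(user)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("every laplace_control entry must be named")
  }
  unknown <- setdiff(nms, names(defaults))
  if (length(unknown) > 0L) {
    warning("ignoring unknown laplace_control entries: ",
            paste(unknown, collapse = ", "))
  }
  out <- utils::modifyList(defaults, user[intersect(nms, names(defaults))])
  for (nm in c("multi_start_M", "laplace_draws", "ridge_max_iter")) {
    out[[nm]] <- as.integer(out[[nm]])
  }
  for (nm in c("kappa_threshold", "ridge_init", "epsilon_lm", "ridge_grow_factor")) {
    out[[nm]] <- as.double(out[[nm]])
  }
  if (out$multi_start_M < 1L) stop("multi_start_M must be >= 1")
  if (out$kappa_threshold <= 0) stop("kappa_threshold must be > 0")
  if (out$epsilon_lm <= 0) stop("epsilon_lm must be > 0")
  if (out$ridge_max_iter < 1L) stop("ridge_max_iter must be >= 1")
  if (out$ridge_grow_factor <= 1) stop("ridge_grow_factor must be > 1")
  out
}

# A double 8 is coerced to integer; the unknown entry "foo" is dropped.
ctl <- suppressWarnings(
  resolve_laplace_control_sketch(list(multi_start_M = 8, foo = 1))
)
```

As in the documented behaviour, `laplace_draws` is coerced but not bounds-checked in this sketch.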
Purpose
Adaptive Levenberg-Marquardt ridge perturbation for the empirical posterior covariance matrix returned by cmdstanr::laplace(). Implements component 2 of the four-component anti-fragility strategy, extending the single-step ridge of Sub-phase 8.6.B into an iterative geometric-growth loop.
Arguments
- `cov` (numeric matrix): A square symmetric matrix, the empirical posterior covariance. For Path C this is a per-slot block; for Path A/B it is the full $\theta_{\text{ref}}$ covariance.
- `control` (list): A resolved `laplace_control` list (as produced by `.gdpar_eb_resolve_laplace_control`). Must contain `$ridge_init`, `$ridge_max_iter`, `$ridge_grow_factor`, `$kappa_threshold`, and `$epsilon_lm`.
Mathematics
Let cov) of dimension
Trigger condition. Ridge perturbation is needed if:
where
If not needed: returns the original matrix with status "not_needed" and
Adaptive loop. Starting with
Compute eigenvalues
Convergence: If "converged".
Growth: Otherwise,
Exhaustion: If the loop completes "exhausted" and
Returns
A list with fields:
| Field | Type | Description |
|---|---|---|
| `cov_perturbed` | numeric matrix | The (possibly ridged) covariance. Equals `cov` when `status = "not_needed"`; equals the last `cov_try` otherwise. |
| `lambda_used` | numeric | Final effective ridge; `0` when `status = "not_needed"`. |
| `n_iter` | integer | Number of iterations performed. `0L` when `status = "not_needed"`. Equals `control$ridge_max_iter` when `"exhausted"`. |
| `kappa_post` | numeric | Condition number after perturbation: the original condition number when `status = "not_needed"`; `Inf` when `status = "exhausted"` and the final eigenvalues are non-finite or non-positive. |
| `status` | character | One of `c("not_needed", "converged", "exhausted")`. |
Notes
- Eigenvalue computation uses `eigen(cov, symmetric = TRUE, only.values = TRUE)` wrapped in `tryCatch`; if it errors, eigenvalues are set to `NA_real_`, which triggers the ridge path.
- The determinant is computed as `prod(eigs0)` only when all eigenvalues are finite; otherwise `det_val` is `NA_real_` and the determinant-based trigger is skipped (but the eigenvalue-based trigger may still fire).
- `trace_mean` is clamped to at least $10^{-12}$ to avoid a zero floor when the diagonal is near-zero.
- The `lambda_eff` floor of $10^{-3} \cdot \bar{d}$ is applied inside every iteration, so even if `control$ridge_init` is very small, the effective ridge is bounded below by the trace-mean-scaled floor.
- When `status = "exhausted"`, the returned `cov_perturbed` is the last attempted matrix (which may or may not be positive-definite), and `kappa_post` is `Inf` if the final eigenvalues are non-finite or non-positive.
- No error is raised on exhaustion; the caller is expected to inspect `status`.
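The overall shape of such an adaptive ridge loop can be sketched as follows. This is not the package's implementation: the exact trigger and convergence inequalities are assumptions made for the example (trigger when the condition number is non-finite or above `kappa_threshold`, or when the absolute determinant falls below `epsilon_lm`; convergence when the condition number drops to `kappa_threshold` or below). Only the trace-mean floor, the geometric growth, and the returned fields follow the description above. `adaptive_ridge_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of an adaptive ridge loop; trigger/convergence rules are assumed.
adaptive_ridge_sketch <- function(cov, control) {
  safe_eigs <- function(m) {
    tryCatch(eigen(m, symmetric = TRUE, only.values = TRUE)$values,
             error = function(e) NA_real_)
  }
  kappa_of <- function(e) {
    if (all(is.finite(e)) && all(e > 0)) max(e) / min(e) else Inf
  }
  d <- nrow(cov)
  eigs0 <- safe_eigs(cov)
  kappa0 <- kappa_of(eigs0)
  det_val <- if (all(is.finite(eigs0))) prod(eigs0) else NA_real_
  det_small <- !is.na(det_val) && abs(det_val) < control$epsilon_lm  # assumed form
  needed <- !is.finite(kappa0) || kappa0 > control$kappa_threshold || det_small
  if (!needed) {
    return(list(cov_perturbed = cov, lambda_used = 0, n_iter = 0L,
                kappa_post = kappa0, status = "not_needed"))
  }
  trace_mean <- max(sum(diag(cov)) / d, 1e-12)
  lambda <- control$ridge_init
  for (it in seq_len(control$ridge_max_iter)) {
    lambda_eff <- max(lambda, 1e-3 * trace_mean)  # trace-mean-scaled floor
    cov_try <- cov + lambda_eff * diag(d)
    kappa_try <- kappa_of(safe_eigs(cov_try))
    if (kappa_try <= control$kappa_threshold) {
      return(list(cov_perturbed = cov_try, lambda_used = lambda_eff,
                  n_iter = it, kappa_post = kappa_try, status = "converged"))
    }
    lambda <- lambda * control$ridge_grow_factor  # geometric growth
  }
  list(cov_perturbed = cov_try, lambda_used = lambda_eff,
       n_iter = as.integer(control$ridge_max_iter),
       kappa_post = kappa_try, status = "exhausted")
}

ctl <- list(ridge_init = 1e-6, ridge_max_iter = 10L, ridge_grow_factor = 10,
            kappa_threshold = 1e10, epsilon_lm = sqrt(.Machine$double.eps))
res_ok   <- adaptive_ridge_sketch(diag(c(1, 2)), ctl)    # well conditioned
res_sing <- adaptive_ridge_sketch(matrix(1, 2, 2), ctl)  # rank-deficient input
```

For the rank-deficient input, the trace mean is 1, so the floor $10^{-3} \cdot \bar{d}$ dominates `ridge_init` and the first attempted ridge is $10^{-3}$.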
.gdpar_eb_generate_stan_marginal(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)
Purpose
Dispatches to the correct Stan template generator for the EB marginal model, that is, the model in which `theta_ref` (or `theta_ref_k`) lives in the `parameters{}` block and is assigned an anchor prior in `model{}`. This corresponds to Step (i)/(ii) of the EB workflow where the marginal log-likelihood is maximised to obtain the empirical-Bayes anchor estimate. The function selects among four template paths based on the resolved dimensions $(K, p)$.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `prior` | list | Prior specification list. Expected elements (consumed downstream by the renderer): `theta_ref`, `sigma_theta_ref`, `sigma_a`, `sigma_b`, `sigma_W`, `sigma_y`, `phi`. |
| `cp_a` | logical (default `FALSE`) | Centered-parameterization flag for `a`. When `TRUE`, `a` is scaled directly by `sigma_a`; when `FALSE`, a non-centered `* sigma_a[1]` scaling is applied. |
| `cp_W` | logical (default `FALSE`) | Centered-parameterization flag for `W`. Semantics mirror `cp_a`. |
| `K` | integer (default `1L`) | Number of K-slots (groups/series). Coerced to integer at entry. |
| `p` | integer (default `1L`) | Coordinate dimension of the response. Coerced to integer at entry. |
| `family` | `NULL` or family object | Passed only to the Path B ($K > 1$, $p = 1$) generator `generate_stan_code_K`. |
| `cp_a_per_k` | `NULL` or logical | Per-k centered-parameterization flag for `a`, forwarded to `generate_stan_code_multi` (Path A). |
| `cp_a_per_K` | `NULL` or logical | Per-K centered-parameterization flag for `a`, forwarded to `generate_stan_code_K` (Path B). |
Mathematics
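The dispatch is a partition of the $(K, p)$ space into four regimes: $K = 1, p = 1$ and $K > 1, p > 1$ (both rendered from EB templates), $K = 1, p > 1$ (handled by `generate_stan_code_multi`), and $K > 1, p = 1$ (handled by `generate_stan_code_K`).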
Returns
A character string containing the rendered Stan model code. For the $K = 1, p = 1$ and $K > 1, p > 1$ paths it is produced by `.gdpar_eb_render_template`; for the other two paths it is produced by `generate_stan_code_multi` or `generate_stan_code_K` respectively.
Notes
- $K$ and $p$ are coerced to integer immediately upon entry (`as.integer`).
- The Path C template ($K > 1 \wedge p > 1$) has a restricted placeholder set: only `theta_ref`, `sigma_theta_ref`, `sigma_a`, `sigma_b`, `sigma_y`, `phi` are present. The placeholders `{{A_SCALE}}`, `{{A_PRIOR}}`, `{{W_SCALE}}`, `{{W_PRIOR}}` are absent because the NCP (non-centered parameterization) is hardcoded per slot per coordinate and `W` is disabled (decision D39).
- The function does not itself raise errors; any errors propagate from the downstream generators/renderers.
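The four-way dispatch on $(K, p)$ can be pictured with a short R sketch. The helper name `eb_dispatch_path` and its return labels are hypothetical and are not part of the package; the assumption that Path A corresponds to $K = 1$, $p > 1$ is inferred from the argument table rather than stated outright.

```r
# Hypothetical illustration of the (K, p) dispatch; not the package source.
eb_dispatch_path <- function(K = 1L, p = 1L) {
  K <- as.integer(K)  # both dimensions are coerced at entry
  p <- as.integer(p)
  if (K > 1L && p > 1L) {
    "path_C"  # K x P template with the restricted placeholder set
  } else if (K > 1L) {
    "path_B"  # would delegate to generate_stan_code_K
  } else if (p > 1L) {
    "path_A"  # would delegate to generate_stan_code_multi (assumed K = 1)
  } else {
    "base"    # would be rendered from the scalar EB template
  }
}

paths <- vapply(
  list(c(1, 1), c(1, 3), c(4, 1), c(4, 3)),
  function(kp) eb_dispatch_path(kp[1], kp[2]),
  character(1)
)
print(paths)
```

Because the four conditions are mutually exclusive and exhaustive for positive integers, every $(K, p)$ pair reaches exactly one generator.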
.gdpar_eb_generate_stan_conditional(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)
Purpose
Companion of .gdpar_eb_generate_stan_marginal for Step (iii) of the EB workflow. Generates the EB conditional Stan model, in which theta_ref (or theta_ref_k) has been moved from parameters{} to data{} and the anchor priors are dropped from model{}. The dispatch table is structurally identical to the marginal helper; only the template names differ.
Arguments
Identical to .gdpar_eb_generate_stan_marginal (same names, types, defaults, and meanings).
Mathematics
Returns
A character string of rendered Stan model code, sourced from the same generators as the marginal path but with conditional template names.
Notes
- The conditional templates share the same placeholder set as their marginal counterparts, except that anchor-prior placeholders are consumed only in the marginal path (the conditional path drops them from `model{}`).
- $K$ and $p$ are coerced to integer at entry.
- No errors are raised directly; all are delegated downstream.
.gdpar_eb_render_template(template_name, prior, cp_a, cp_W)
Purpose
Shared low-level renderer for the EB Stan template family. Reproduces the placeholder-substitution logic of generate_stan_code() but restricted to EB templates. It (1) translates legacy single-template names to their canonical-piece equivalents, (2) locates the template file in the installed package or falls back to inst/stan/, (3) injects the canonical helpers piece when the // {{CANONICAL_HELPERS}} marker is present, (4) performs all {{...}} substitutions, and (5) aborts with a structured error if any placeholder remains un-substituted.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `template_name` | character | Base name of the `.stan` template file (e.g. `"amm_eb_marginal.stan"`). |
| `prior` | list | Prior specification list; the renderer reads `prior$theta_ref`, `prior$sigma_theta_ref`, `prior$sigma_a`, `prior$sigma_b`, `prior$sigma_W`, `prior$sigma_y`, `prior$phi`. |
| `cp_a` | logical | Centered-parameterization flag for `a`. Controls the values substituted for `{{A_SCALE}}` and `{{A_PRIOR}}`. |
| `cp_W` | logical | Centered-parameterization flag for `W`. Controls the values substituted for `{{W_SCALE}}` and `{{W_PRIOR}}`. |
Mathematics
The placeholder substitution map is:
| Placeholder | Value when `cp_* = TRUE` | Value when `cp_* = FALSE` |
|---|---|---|
| `{{A_SCALE}}` | `""` | `" * sigma_a[1]"` |
| `{{A_PRIOR}}` | `"normal(0, sigma_a[1])"` | `"normal(0, 1)"` |
| `{{W_SCALE}}` | `""` | `" * sigma_W[1]"` |
| `{{W_PRIOR}}` | `"normal(0, sigma_W[1])"` | `"normal(0, 1)"` |
The prior placeholders map directly: `{{PRIOR_THETA_REF}}` → `prior$theta_ref`, `{{PRIOR_SIGMA_THETA_REF}}` → `prior$sigma_theta_ref`, `{{PRIOR_SIGMA_A}}` → `prior$sigma_a`, `{{PRIOR_SIGMA_B}}` → `prior$sigma_b`, `{{PRIOR_SIGMA_W}}` → `prior$sigma_W`, `{{PRIOR_SIGMA_Y}}` → `prior$sigma_y`, `{{PRIOR_PHI}}` → `prior$phi`.
Returns
A character string: the fully substituted Stan source code.
Notes
- Template name translation: `"amm_eb_marginal.stan"` is mapped to `"amm_canonical_eb_marginal.stan"` and `"amm_eb_conditional.stan"` is mapped to `"amm_canonical_eb_conditional.stan"`. All other template names (including the KxP templates) pass through unchanged.
- File location: If the effective template name starts with `"amm_canonical_"`, the file is sought in `system.file("stan", "_canonical_pieces", ...)` with a fallback to `file.path("inst", "stan", "_canonical_pieces", ...)`. Otherwise it is sought in `system.file("stan", ...)` with a fallback to `file.path("inst", "stan", ...)`.
- Helpers injection: If the template source contains the literal `// {{CANONICAL_HELPERS}}`, the file `amm_canonical_helpers.stan` is read from the same `_canonical_pieces` directory and substituted in place. Templates without this marker (e.g. the KxP EB templates) pass through unchanged.
- Error (template not found): If the resolved `template_path` does not exist, calls `gdpar_abort` with class `"gdpar_internal_error"` and message `"Stan template file '<name>' not found."`.
- Error (helpers not found): If the helpers piece file does not exist, calls `gdpar_abort` with class `"gdpar_internal_error"`.
- Error (unsubstituted placeholder): After all substitutions, if the string still contains `"{{"`, the first match of `\{\{[A-Za-z0-9_]+\}\}` is extracted via `regmatches`/`regexpr` and passed to `gdpar_abort` with class `"gdpar_internal_error"` and message `"Unsubstituted placeholder remains in EB Stan code: <leftover>"`.
- All `gsub` calls use `fixed = TRUE`, so placeholders are treated as literal strings.
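The substitution and leftover check described above can be sketched in a few lines of base R. The helper `render_sketch()` and the two-line template string are made up for illustration; only the placeholder names, the substituted values, and the error message text come from the documentation, and the real renderer raises its error through `gdpar_abort` rather than `stop()`.

```r
# Minimal sketch of fixed-string placeholder substitution (not the package code).
render_sketch <- function(template, cp_a = FALSE) {
  subs <- c(
    "{{A_SCALE}}" = if (cp_a) "" else " * sigma_a[1]",
    "{{A_PRIOR}}" = if (cp_a) "normal(0, sigma_a[1])" else "normal(0, 1)"
  )
  out <- template
  for (ph in names(subs)) {
    # fixed = TRUE: the braces are literal characters, not regex syntax
    out <- gsub(ph, subs[[ph]], out, fixed = TRUE)
  }
  if (grepl("{{", out, fixed = TRUE)) {
    # report the first placeholder that survived substitution
    leftover <- regmatches(out, regexpr("\\{\\{[A-Za-z0-9_]+\\}\\}", out))
    stop("Unsubstituted placeholder remains in EB Stan code: ", leftover)
  }
  out
}

tmpl <- "a = a_raw{{A_SCALE}};\na_raw ~ {{A_PRIOR}};"
ncp  <- render_sketch(tmpl, cp_a = FALSE)
cp   <- render_sketch(tmpl, cp_a = TRUE)
bad  <- tryCatch(render_sketch("x ~ {{UNKNOWN}};"), error = conditionMessage)
cat(ncp, cp, bad, sep = "\n---\n")
```

Failing on any surviving `{{...}}` keeps a template from reaching the Stan compiler with a placeholder that no substitution rule covers.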
.gdpar_eb_maximize_marginal(model, stan_data, control, seed, verbose)
Purpose
Implements Step (i) of the EB workflow with the anti-fragility strategy of Charter Section 2.8. Runs cmdstanr::optimize() followed by cmdstanr::laplace() on the marginal EB Stan model with multi_start_M independent random inits, retains the init with the highest log-marginal approximation, applies an adaptive Levenberg–Marquardt ridge if the Hessian-derived covariance is ill-conditioned, and assembles the diagnostics needed by the gdpar_eb_fit$diagnostics_numerical slot.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `model` | `CmdStanModel` | A compiled cmdstanr model object exposing `$optimize()` and `$laplace()` methods. |
| `stan_data` | list | Data list for Stan. Must contain `J_groups` (integer, number of groups). For path dispatch, may contain `p_dim` (integer, coordinate dimension) and `K_slots` (integer, number of K-slots). |
| `control` | list | Control parameters. Must contain: `multi_start_M` (integer, number of multi-start inits), `optim_algorithm` (character, passed to optimize), `laplace_draws` (integer, number of Laplace draws), `kappa_threshold` (numeric, condition-number gate). |
| `seed` | `NULL` or integer | Base random seed. When non-`NULL`, per-init seeds are `as.integer(seed) + m` for optimize and `as.integer(seed) + 1000L` for Laplace. |
| `verbose` | logical | When `TRUE`, emits informational messages about failed inits and multimodality warnings. |
Mathematics
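Multi-start optimization. `optimize()` is run from `multi_start_M` independent inits, and the `lp__` value of each run is recorded (`NA` for a failed init). The best init is the one with the largest finite `lp__` value.
Laplace approximation. At the best mode, `laplace()` is run with `mode = best_opt` and `jacobian = TRUE`; `theta_ref_hat` is the column mean and `theta_ref_cov` the sample covariance of the resulting `theta_ref` draws.
Adaptive Levenberg–Marquardt ridge. If the covariance is ill-conditioned, `.gdpar_eb_lm_perturb` perturbs it with a ridge $\lambda$ and repeats until the condition number is acceptable or `ridge_max_iter` is reached, in which case the status is `"exhausted"`.
Condition-number gate. The final covariance is accepted only if $\kappa_{\text{post}} \le \kappa_{\text{threshold}}$ and the LM status is not `"exhausted"`.
Multi-start dispersion. Computed over the finite `lp__` values (`NA` if fewer than two are finite). A dispersion exceeding $0.05$ triggers a multimodality warning when `verbose = TRUE`.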
Path-aware variable extraction. The theta_ref variable names extracted from the Laplace draws depend on the path:
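| Path | Condition | Variable pattern | Expected count |
|---|---|---|---|
| Base | $K = 1$, $p = 1$ | `theta_ref[1]`, …, `theta_ref[J]` (or `theta_ref` if $J = 1$) | $J$ |
| Path A | $p > 1$ | `theta_ref[j,k]` for $j = 1, \dots, J$, $k = 1, \dots, p$ | $J \cdot p$ |
| Path B | $K > 1$ | `theta_ref_k[j,k]` for $j = 1, \dots, J$, $k = 1, \dots, K$ | $J \cdot K$ |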
Returns
A list with components:
| Component | Type | Description |
|---|---|---|
| `theta_ref_hat` | numeric vector (length $J$, $J \cdot p$, or $J \cdot K$ depending on the path) | Posterior mean of `theta_ref` from the Laplace draws (`colMeans` of the draws matrix). |
| `theta_ref_se` | numeric vector (same length) | Standard errors of the elements of `theta_ref_hat`. |
| `theta_ref_cov` | matrix (square, one row and column per element of `theta_ref_hat`) | Covariance matrix (possibly ridged). |
| `diagnostics` | named list | See below. |
The diagnostics list contains:
| Element | Type | Description |
|---|---|---|
| `kappa` | numeric | Post-ridge condition number of the covariance. |
| `lm_perturbation` | numeric | The ridge $\lambda$ applied (`lambda_used`). |
| `lm_n_iter` | integer | Number of LM ridge iterations. |
| `lm_status` | character | Status from `.gdpar_eb_lm_perturb` (e.g. `"ok"` or `"exhausted"`). |
| `kappa_post_ridge` | numeric | Duplicate of `kappa` (from `lm_out$kappa_post`). |
| `multi_start_dispersion` | numeric | Dispersion of the finite `lp__` values; `NA` if fewer than 2 finite values. |
| `marginal_log_lik_history` | numeric vector (length `multi_start_M`) | `lp__` from each init; `NA` for failed inits. |
| `best_init_index` | integer | Index of the init with the largest finite `lp__`. |
Notes
- Init dispatch: The flag `is_multi_or_K` is `TRUE` when `stan_data$p_dim > 1L` or `stan_data$K_slots > 1L`. In that case, `init_m` is set to `NULL` (cmdstanr's default unconstrained-space random sampler is used). Otherwise, `.gdpar_eb_make_random_init(stan_data, seed_offset = m, base_seed = seed)` is called. Each multi-start iteration uses a distinct seed offset, preserving reproducibility.
- Optimize call: `jacobian = TRUE` is always set (required for downstream `laplace()` to match the unconstrained-scale convention). When `init_m` is non-`NULL`, it is wrapped as `list(init_m)` (single chain). When `seed` is non-`NULL`, the per-init seed is `as.integer(seed) + m`.
- Laplace call: Uses `mode = best_opt`, `jacobian = TRUE`, `draws = control$laplace_draws`. Seed (if non-`NULL`) is `as.integer(seed) + 1000L`.
- Error (all inits fail): If `best_opt` is `NULL` (every `optimize()` call failed or returned `NULL`), calls `gdpar_abort` with class `"gdpar_unsupported_feature_error"`, a message recommending `gdpar()` (FB), and `data = list(history_lp = history_lp)`.
- Error (Laplace fails): If `model$laplace()` returns `NULL` (error caught), calls `gdpar_abort` with class `"gdpar_eb_numerical_error"`, a message about a singular/non-PD Hessian at the candidate MAP, and `data = list(history_lp, best_idx)`.
- Error (missing theta_ref variables, Path B): If the number of `theta_ref_k[j,k]` variables found in the draws does not equal $J \cdot K$, calls `gdpar_abort` with class `"gdpar_internal_error"`.
- Error (missing theta_ref variables, Path A): If the number of `theta_ref[j,k]` variables found does not equal $J \cdot p$, calls `gdpar_abort` with class `"gdpar_internal_error"`.
- Error (missing theta_ref variables, Base): If no `theta_ref[...]` variables are found and $J = 1$ does not rescue via the bare `theta_ref` name, calls `gdpar_abort` with class `"gdpar_internal_error"` and message `"theta_ref variable not found in Laplace draws output."`.
- Error (kappa exceeds threshold): If $\kappa_{\text{post}} > \kappa_{\text{threshold}}$ or `lm_out$status == "exhausted"`, calls `gdpar_abort` with class `"gdpar_eb_numerical_error"`, a detailed message including $\kappa$, threshold, LM status, iteration count, $\lambda$, and smallest eigenvalue, and `data` containing `kappa`, `eigenvalues`, `history_lp`, `lm_status`, `lm_n_iter`, `lm_lambda`.
- Warning (multimodality): When `dispersion > 0.05` and `verbose = TRUE`, calls `gdpar_warn` with class `"gdpar_diagnostic_warning"` and `data = list(dispersion, history_lp)`.
- Covariance computation: If the draws matrix has more than one column, `stats::cov(theta_mat)` is used; otherwise a $1 \times 1$ matrix from `stats::var(theta_mat[, 1])`.
- Eigenvalue computation: `eigen(theta_cov, symmetric = TRUE, only.values = TRUE)` is attempted in a `tryCatch`; on error it returns `NA_real_`. The minimum eigenvalue is reported in the kappa-exceeds-threshold error message.
- Verbose messages: Failed `optimize()` calls emit a `gdpar_inform` with class `"gdpar_eb_message"` when `verbose = TRUE`.
- The `%||%` operator is used for the `all_vars` fallback (`dimnames(draws)$variable %||% character(0L)`).
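The best-init selection and the condition-number gate can be illustrated without cmdstanr. Everything below is a sketch on invented numbers: `history_lp`, `theta_cov`, and the threshold value are made up, and taking the condition number as the ratio of the extreme eigenvalues is an assumption about how $\kappa$ is defined.

```r
# lp__ per init; NA marks a failed optimize() call (values are invented).
history_lp <- c(-120.4, NA, -118.9, -119.2)

# Best init: largest finite lp__ among the inits that succeeded.
finite_idx <- which(is.finite(history_lp))
best_idx   <- finite_idx[which.max(history_lp[finite_idx])]

# Condition number of a small covariance from its eigenvalues.
theta_cov <- matrix(c(1.0, 0.2, 0.2, 0.5), nrow = 2)
ev <- tryCatch(
  eigen(theta_cov, symmetric = TRUE, only.values = TRUE)$values,
  error = function(e) NA_real_
)
# Non-finite or non-positive eigenvalues are reported as an infinite kappa.
kappa <- if (all(is.finite(ev)) && min(ev) > 0) max(ev) / min(ev) else Inf

kappa_threshold <- 1e8  # illustrative value only, not a package default
accepted <- kappa <= kappa_threshold
print(list(best_idx = best_idx, kappa = kappa, accepted = accepted))
```

Indexing through `finite_idx` matters: calling `which.max()` directly on a vector that is entirely `NA` returns `integer(0)`, which is the situation the "all inits fail" error guards against.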
.gdpar_eb_make_random_init(stan_data, seed_offset = 1L, base_seed = NULL)
Purpose
Generates a list of random initial values for the Stan HMC sampler in the Empirical Bayes (EB) workflow. The structure of the returned list is conditioned on the flags and dimensions carried in stan_data, so that only parameters relevant to the configured model are initialised.
Arguments
- `stan_data` (list): The data list prepared for Stan. The following fields are consulted:
  - `J_groups` (integer): number of reference-parameter groups $J$.
  - `use_groups` (integer flag, 0/1): whether group-level hyperparameters are active.
  - `use_a` (integer flag, 0/1): whether the $a$ AMM component is active.
  - `J_a` (integer): dimension of the $a$ component.
  - `use_b` (integer flag, 0/1): whether the $b$ AMM component is active.
  - `J_b` (integer): dimension of the $b$ component.
  - `use_W` (integer flag, 0/1): whether the $W$ AMM component is active.
  - `dim_W` (integer): row dimension of the $W$ matrix.
  - `d` (integer): column dimension of the $W$ matrix (latent dimension).
  - `use_dispersion_y` (integer flag, 0/1): whether an observation-level dispersion is active.
  - `use_dispersion_phi` (integer flag, 0/1): whether a $\phi$ dispersion parameter is active.
- `seed_offset` (integer, default `1L`): integer added to `base_seed` to derive the RNG seed.
- `base_seed` (integer or `NULL`, default `NULL`): base seed. If `NULL`, the global RNG state is left untouched.
Mathematics
When `base_seed` is non-`NULL`, the effective seed is `base_seed + seed_offset`.
The draws are:
and, when the corresponding flag is set:
Returns
A named list suitable for passing as init to a Stan sampler. Scalar parameters are wrapped in 1-element arrays via as.array(); W_raw is a matrix; theta_ref, a_raw, and c_b_raw are numeric vectors.
Notes
- When `base_seed` is non-`NULL`, the function calls `set.seed(rng_seed)` and registers an `on.exit` handler that restores the prior `.Random.seed` state in `.GlobalEnv` (if it existed) upon return. If `.Random.seed` did not exist in `.GlobalEnv`, the handler does nothing (the seed set by `set.seed` persists).
- The `on.exit` handler is registered with `add = TRUE`, so it composes with any pre-existing exit handlers.
- Flags are tested with `isTRUE(... == 1L)`, so a flag is active only when it is a single value that compares equal to one (`1L`, `1`, or `TRUE`); `NULL`, `NA`, `0`, and vectors of length greater than one are treated as inactive.
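Two of the notes above rest on base R behaviour that is easy to check directly: saving and restoring `.Random.seed` around a seeded draw, and the way `isTRUE(x == 1L)` evaluates for different inputs. The helper names `with_local_seed` and `flag_active` are hypothetical; the sketch only mirrors the documented pattern and is not the package implementation.

```r
# Run `fun` under a fixed seed, then restore the caller's RNG state.
with_local_seed <- function(rng_seed, fun) {
  if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
    old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    # add = TRUE composes with any exit handlers registered earlier
    on.exit(assign(".Random.seed", old_seed, envir = .GlobalEnv), add = TRUE)
  }
  set.seed(rng_seed)
  fun()
}

set.seed(1)                      # make sure a global RNG state exists
before <- .Random.seed
x1 <- with_local_seed(42L + 1L, function() rnorm(3))  # base_seed + seed_offset
x2 <- with_local_seed(42L + 1L, function() rnorm(3))
after <- .Random.seed

# How the flag test behaves: == compares by value, isTRUE() needs one TRUE.
flag_active <- function(x) isTRUE(x == 1L)
flags <- c(
  int_one  = flag_active(1L),
  dbl_one  = flag_active(1),
  lgl_true = flag_active(TRUE),
  zero     = flag_active(0L),
  missing  = flag_active(NULL),
  na       = flag_active(NA)
)
print(flags)
```

The two seeded calls return identical draws, and the global RNG state after the calls equals the state before them.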
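.gdpar_eb_apply_correction(eb_correction, laplace_result, stan_data, p = 1L, verbose)
Purpose
Entry point for the Proposition 7B coverage-discrepancy correction in the EB workflow. In the scalar regime ($p = 1$) the correction constant is computed here; for $p > 1$ the call is delegated to `.gdpar_eb_correction_matrix()`. The correction is not applied to the raw draws here; only the scaling object is returned for downstream S3 methods.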
Arguments
- `eb_correction` (logical): whether the correction should be applied.
- `laplace_result` (list): result of the Laplace approximation step. Must contain `theta_ref_cov` (a matrix, or at least an indexable object for the `[1L, 1L]` element in the scalar path).
- `stan_data` (list): the Stan data list. Passed through but not directly used in the scalar computation.
- `p` (integer, default `1L`): dimensionality of the reference parameter for the correction.
- `verbose` (logical): whether to emit a diagnostic warning when the correction is disabled.
Mathematics
Scalar form (
For the default identity functional
with
Returns
A list with two elements:
- `applied` (logical): `TRUE` if the correction was successfully computed, `FALSE` otherwise.
- `constant` (numeric scalar): the scalar correction $C_{g,\alpha}$ when `applied = TRUE`; `NA_real_` otherwise.
When $p > 1$, the return value is whatever `.gdpar_eb_correction_matrix()` produces (a matrix-valued `constant`).
Notes
- If `eb_correction` is `FALSE` and `verbose` is `TRUE`, a warning is issued via `gdpar_warn()` with class `"gdpar_diagnostic_warning"`, stating that intervals will use nominal coverage and may under-cover by $O(n^{-1})$.
- The marginal variance is extracted as `laplace_result$theta_ref_cov[1L, 1L]` inside a `tryCatch`; any error yields `NA_real_`.
- If the marginal variance is not finite or is $\leq 0$, the function returns `applied = FALSE, constant = NA_real_` silently (no warning).
- For $p > 1$, `p` is coerced to integer before the delegation check.
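.gdpar_eb_correction_matrix(eb_correction, laplace_result, stan_data, p = 1L, verbose)
Purpose
Computes the matrix-valued Proposition 7B* coverage-discrepancy correction for the multivariate regime ($p > 1$). It is called from `.gdpar_eb_apply_correction()` and implements v07b Section 5.1.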
Arguments
- `eb_correction` (logical): whether the correction should be applied.
- `laplace_result` (list): Laplace approximation result containing `theta_ref_cov`.
- `stan_data` (list): Stan data list (passed through, not used in computation).
- `p` (integer, default `1L`): dimension of the reference parameter.
- `verbose` (logical): intended for diagnostics; accepted but not otherwise used in the body.
Mathematics
Matrix form (Proposition 7B*, v07b Section 5.1):
For the default identity functional
with
Returns
A list with two elements:
- `applied` (logical): `TRUE` if the matrix correction was successfully computed.
- `constant` (matrix): the $p \times p$ (or matching `cov_mat` dimension) correction matrix when `applied = TRUE`; an `NA_real_` matrix of appropriate size otherwise.
Notes
- The function falls back silently to `applied = FALSE` with an NA matrix in the following cases:
  - `eb_correction` is not `TRUE`.
  - `laplace_result$theta_ref_cov` is `NULL`, not a matrix, or non-square (extraction wrapped in `tryCatch` returning `NULL` on error).
  - Any element of `cov_mat` is non-finite.
  - Eigenvalues of `cov_mat` (computed via `eigen(..., symmetric = TRUE, only.values = TRUE)`) are non-finite, or any eigenvalue is $< -10^{-10}$ (i.e., the matrix is not positive semi-definite within tolerance).
- When the PSD check fails, the returned NA matrix has dimensions matching `nrow(cov_mat)`/`ncol(cov_mat)`, not necessarily `p`.
- The eigenvalue extraction is wrapped in `tryCatch` returning `NA_real_` on error, which then triggers the non-finite check.
- Downstream S3 methods are expected to fall back to nominal credible intervals when `applied = FALSE`.
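The validation cascade listed above (shape, finiteness, eigenvalues within a $-10^{-10}$ tolerance) can be written as one small predicate. The function name `is_psd_within_tol` is hypothetical and the sketch deliberately stops at the checks; it does not compute the correction matrix itself.

```r
# Sketch of the documented covariance checks (not the package code).
is_psd_within_tol <- function(cov_mat, tol = 1e-10) {
  # shape: must be a square matrix
  if (is.null(cov_mat) || !is.matrix(cov_mat) || nrow(cov_mat) != ncol(cov_mat)) {
    return(FALSE)
  }
  # entries: all finite
  if (any(!is.finite(cov_mat))) return(FALSE)
  # eigenvalues: an eigen() error becomes NA and fails the finiteness test
  ev <- tryCatch(
    eigen(cov_mat, symmetric = TRUE, only.values = TRUE)$values,
    error = function(e) NA_real_
  )
  all(is.finite(ev)) && all(ev >= -tol)
}

ok      <- is_psd_within_tol(diag(c(2, 1)))
non_psd <- is_psd_within_tol(matrix(c(1, 2, 2, 1), 2))  # eigenvalues 3 and -1
non_fin <- is_psd_within_tol(matrix(c(1, NA, NA, 1), 2))
not_mat <- is_psd_within_tol(NULL)
print(c(ok = ok, non_psd = non_psd, non_fin = non_fin, not_mat = not_mat))
```

A small negative tolerance rather than a strict `>= 0` test lets covariances with rounding-level negative eigenvalues pass.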
.gdpar_eb_resolve_K_inputs(formula, amm, W, family, formula_set_input, amm_list_input, classic_with_amm_calls, family_is_named_list)
Purpose
Resolves the three possible K-input patterns (formula set, named list of amm_spec, or classic formula with AMM wrapper calls) into a single canonical amm_list_canonical, and promotes the family scope accordingly. This mirrors the K-input dispatch logic of gdpar() and is the EB-path companion of .gdpar_K. The logic is intentionally duplicated rather than refactored to preserve bit-exact behaviour of golden tests.
Arguments
- `formula` (formula or `gdpar_formula_set`): the model formula or formula set.
- `amm` (`amm_spec` or named list of `amm_spec`): the AMM specification(s).
- `W` (matrix or `NULL`): the $W$ matrix passed to AMM construction.
- `family` (`gdpar_family` or named list of `gdpar_family`): the response family specification.
- `formula_set_input` (logical): whether `formula` is a `gdpar_formula_set`.
- `amm_list_input` (logical): whether `amm` is a named list of `amm_spec`.
- `classic_with_amm_calls` (logical): whether the formula RHS contains `a()`/`b()`/`W()` wrapper calls.
- `family_is_named_list` (logical): whether `family` is a named list (heterogeneous K-slot pattern).
Returns
A list with elements:
- `amm_list_canonical` (named list of `amm_spec`): the resolved canonical AMM specifications, one per K-slot.
- `K` (integer): length of `amm_list_canonical`.
- `outcome_name` (character): the name of the outcome variable extracted from the formula.
- `formula_env` (environment): the environment associated with the formula.
- `family_promoted`: the family object after scope promotion (either a promoted `gdpar_family` or a heterogeneous family structure).
- `family_id_k_vector` (integer vector or `NULL`): per-observation family IDs when the heterogeneous path is taken; `NULL` otherwise.
Notes
Three dispatch branches, evaluated in order:
- `formula_set_input` branch: `amm` must be the default `amm_spec()` (checked via `.gdpar_is_default_amm_spec()`); otherwise an error of class `"gdpar_input_error"` is raised. The canonical list is built by `.gdpar_formula_set_to_amm_spec_list(formula, W)`. `outcome_name` and `formula_env` are taken from `formula$outcome` and `formula$env`.
- `amm_list_input` branch: `amm` is used directly as `amm_list_canonical`. Each slot name must be non-empty (checked via `nzchar()`), each entry must inherit from class `"amm_spec"`, and slot names must be unique (`anyDuplicated(...) == 0L`). `formula` must be a two-sided formula (`length(formula) == 3L`). Violations raise `"gdpar_input_error"`. `outcome_name` is `as.character(formula[[2L]])`; `formula_env` is `environment(formula)`.
- Classic (else) branch: `amm` must be the default `amm_spec()`. The first eligible parameter name is extracted from `family`: either `family[[1L]]$param_specs[[1L]]$name` (if `family_is_named_list`) or `family$param_specs[[1L]]$name`. A `gdpar_formula_set` is constructed via `do.call(gdpar_formula_set, args_for_fs)` with the formula named by that parameter, then `.gdpar_formula_set_to_amm_spec_list(fs, W)` builds the canonical list.
After resolution, the family scope is promoted according to $K$:
- $K > 1$: If `family_is_named_list`, calls `.gdpar_resolve_heterogeneous_family_K(family, names(amm_list_canonical))` and unpacks `location_family` and `family_id_k_vector`. Otherwise calls `.gdpar_promote_scope_per_observation(family, names(amm_list_canonical))` with `family_id_k_vector = NULL`.
- $K = 1$: If `family_is_named_list`, raises `"gdpar_input_error"` (heterogeneous path requires $K \geq 2$). Otherwise calls `.gdpar_promote_scope_per_observation(family, k_name)` with `family_id_k_vector = NULL`.
Errors raised (all via `gdpar_abort` with `class = "gdpar_input_error"`):
- Formula set path with non-default `amm`.
- Named-list `amm` with empty slot name, non-`amm_spec` entry, or duplicated names.
- Named-list `amm` with `formula` that is not a two-sided formula.
- Classic path with non-default `amm`.
- Heterogeneous family (`family_is_named_list = TRUE`) resolved to $K = 1$.
The data field of the abort is populated for some errors (e.g., list(slot = ..., received = ...) and list(K = K)).
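The checks on the named-list branch are ordinary base R idioms and can be tried on stand-in objects. In the sketch below `validate_amm_list` is a hypothetical helper, the `amm_spec` objects are empty lists carrying only the class attribute, and plain `stop()` replaces the package's `gdpar_abort`.

```r
# Sketch of the named-list validation (illustrative; not the package code).
validate_amm_list <- function(amm, formula) {
  nms <- names(amm)
  if (is.null(nms) || any(!nzchar(nms))) stop("every slot needs a non-empty name")
  if (anyDuplicated(nms) != 0L) stop("slot names must be unique")
  is_spec <- vapply(amm, inherits, logical(1), what = "amm_spec")
  if (!all(is_spec)) stop("every entry must inherit from 'amm_spec'")
  # a two-sided formula has three elements: `~`, LHS, RHS
  if (length(formula) != 3L) stop("formula must be two-sided")
  list(
    K            = length(amm),
    outcome_name = as.character(formula[[2L]]),
    formula_env  = environment(formula)
  )
}

spec <- structure(list(), class = "amm_spec")  # stand-in object
res  <- validate_amm_list(list(mu = spec, sigma = spec), y ~ x)
dup  <- tryCatch(validate_amm_list(list(mu = spec, mu = spec), y ~ x),
                 error = conditionMessage)
one  <- tryCatch(validate_amm_list(list(mu = spec), ~ x),
                 error = conditionMessage)
print(res$K); print(res$outcome_name); print(dup); print(one)
```

Checking `length(formula) == 3L` is what makes `formula[[2L]]` safe to read as the outcome, since in a one-sided formula the second element is the right-hand side.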
.gdpar_eb_run_K(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)
Purpose
Primary orchestrator for the Empirical-Bayes ("eb") estimation path under the regime $K > 1$, $p = 1$ (Path B). Returns an object of class `gdpar_eb_fit`.
Arguments
| Argument | Type | Meaning |
|---|---|---|
| `amm_list_canonical` | named list of length $K$ | Canonical AMM (anchor model matrix) specifications. Each element is a list potentially containing `$a` (formula for the $a$ component), `$b` (formula for the $b$ component), and `$W` (pre-specified basis matrix or formula). Names become `slot_names`. |
| `family` | list / character | Response family specification passed to Stan code generators and data assemblers. |
| `data` | data.frame | Data containing the outcome column and all covariates referenced in slot formulas. |
| `prior` | list | Prior specification passed to Stan code generators. |
| `anchor` | various | Anchor specification (scalar, vector, or special keyword) resolved by `resolve_anchor_K`. |
| `outcome_name` | character | Name of the outcome column in `data`. |
| `formula_env` | environment | Environment attached to all internally constructed formulas via `stats::as.formula(..., env = formula_env)`. |
| `family_id_k_vector` | integer vector | Per-slot family identifiers, forwarded to `.assemble_stan_data_K`. |
| `skip_id_check` | logical | If `TRUE`, all identifiability checks (per-slot and $K$-level) are skipped and `id_report` is set to `NULL`. |
| `chains` | numeric/integer | Number of MCMC chains for the conditional model; coerced to integer. |
| `iter_warmup` | numeric/integer | Warmup iterations; coerced to integer. |
| `iter_sampling` | numeric/integer | Sampling iterations; coerced to integer. |
| `adapt_delta` | numeric | Stan NUTS `adapt_delta` control parameter. |
| `max_treedepth` | numeric/integer | Stan NUTS maximum tree depth; coerced to integer. |
| `refresh` | numeric/integer | Stan output refresh interval; coerced to integer. |
| `verbose` | logical | Controls `show_messages`, `show_exceptions` in Stan sampling, and verbosity of helper calls. |
| `seed` | integer or `NULL` | Random seed for Stan sampling and Laplace maximization. If non-`NULL`, coerced to integer. |
| `group` | various | Grouping specification for hierarchical structure, resolved by `.resolve_group_argument`. |
| `parametrization` | character | Requested parametrization (`"cp"` for centered, otherwise non-centered). In Path B both `cp_a` and `cp_W` are set uniformly across all $K$ slots. |
| `id_check_rigor` | various | Rigor level forwarded to `.check_identifiability_K` for the $K$-level check. |
| `eb_correction` | logical | Whether to apply the Proposition 7B coverage-discrepancy correction. |
| `laplace_control` | list | Control parameters forwarded to `.gdpar_eb_maximize_marginal`. |
| `call` | call | The original top-level function call, stored in the returned object. |
| `...` | any | Extra named arguments merged into the `sample_args` list passed to cmdstanr's `$sample()` method, potentially overriding defaults. |
Mathematics
The function implements a two-stage Empirical Bayes estimator:
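Stage 1: Laplace marginal maximization. The marginal Stan model is generated and compiled, and the marginal log-posterior of the anchor parameters $\boldsymbol{\theta}_{\mathrm{ref}} \in \mathbb{R}^{J_{\mathrm{groups}} \times K}$ (the per-group, per-slot anchors) is maximized. The Laplace helper returns the mode $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ (`theta_ref_hat`) together with its standard errors and covariance.
Stage 2: Conditional MCMC. The conditional Stan model is generated, compiled, and sampled with the anchors fixed at $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ (passed as data in `theta_ref_k_data`), so the draws target the posterior of the remaining parameters conditional on $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$.
EB correction (Proposition 7B, scalar form at $p = 1$). The correction constant is computed by `.gdpar_eb_apply_correction` from $\Sigma^{\mathrm{marg}}_{\theta_{\mathrm{ref},k}}$, the marginal variance of the anchor taken from the Laplace covariance.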
Identifiability diagnostic test point. For each slot
This avoids testing identifiability at a degenerate zero anchor when a
Returns
A list with S3 class c("gdpar_eb_fit", "list") containing:
| Element | Type / Structure | Description |
|---|---|---|
| `theta_ref_hat` | numeric | Laplace point estimate of the anchor (flat vector of length $J_{\mathrm{groups}} \cdot K$). |
| `theta_ref_se` | numeric | Standard error of the Laplace estimate. |
| `conditional_fit` | `CmdStanMCMC` | The cmdstanr fit object from the conditional model. |
| `amm_list_canonical` | named list | The input AMM list (with `$W` slots materialized). |
| `family` | - | The input family. |
| `prior` | - | The input prior. |
| `design_K` | list | Design structure from `.build_amm_design_K`, containing `Z_a_k_list`, `Z_b_k_list`, `X`, etc. |
| `anchor` | numeric vector | Resolved anchor values returned by `resolve_anchor_K`. |
| `stan_data` | list | Assembled Stan data list from `.assemble_stan_data_K`, augmented with `K_slots` and `p_dim`. |
| `identifiability_report` | named list or `NULL` | Per-slot identifiability reports (named by `slot_names`) with a `K_level` attribute; `NULL` when `skip_id_check = TRUE`. |
| `diagnostics` | - | MCMC diagnostics from `compute_diagnostics`. |
| `diagnostics_numerical` | - | Laplace optimizer diagnostics from `laplace_result$diagnostics`. |
| `parametrization` | list | Resolved parametrization with elements `cp_a` (logical), `cp_W` (logical), `cp_a_per_K` (`NULL`), and `meta` (list with `mode = "eb_K_path_B"`, `note`, `requested`). |
| `group_info` | list or `NULL` | Resolved group information from `.resolve_group_argument`. |
| `correction_applied` | logical | Whether the EB correction was applied. |
| `eb_correction_constant` | - | The correction constant from `.gdpar_eb_apply_correction`. |
| `call` | call | The original function call. |
| `path` | character | Always `"eb"`. |
| `K` | integer | Number of slots. |
| `slot_names` | character | Names of the AMM list elements. |
Notes
Input validation errors (class gdpar_input_error):
- If `outcome_name` is not a column in `data`.
- If the outcome `y` is a matrix or an array with `length(dim(y)) > 1` (Path B requires a length-$n$ univariate vector shared across all $K$ slots).
- If `y` contains any non-finite values: for numeric `y`, any `!is.finite(y)` (`NA`, `NaN`, `Inf`); for non-numeric `y`, any `is.na(y)`.
Identifiability errors (class `gdpar_identifiability_error`):
- If any per-slot check `gdpar_check_identifiability` returns `rep_k$passed != TRUE`, the error `data` field contains `slot` (name) and `report` (the full report).
- If the $K$-level check `.check_identifiability_K` returns `passed != TRUE`, the error `data` field contains `report` (the $K$-level report).
- Both are bypassed when `skip_id_check = TRUE`.
Formula construction:
- The union of all variables across all slots' `$a` and `$b` formulas is collected. If empty, the RHS is `"1"`; otherwise it is `paste(union_vars, collapse = " + ")`.
- The full formula is `outcome_name ~ rhs_str` with `env = formula_env`.
- The RHS is extracted as `formula_full[c(1L, 3L)]` (a one-sided formula) and updated with `~ . + 0` to remove the intercept.
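The formula-construction steps can be reproduced on toy slot specifications. The `slots` list, the outcome name `y`, and the use of `all.vars()` to collect variable names are assumptions made for this sketch; the documentation does not say how the package gathers the variables.

```r
# Toy per-slot specifications; a NULL entry means the component is absent.
slots <- list(
  mu    = list(a = ~ x1 + x2, b = ~ x2),
  sigma = list(a = NULL,      b = ~ x3)
)

vars_of <- function(f) if (is.null(f)) character(0) else all.vars(f)
union_vars <- unique(unlist(
  lapply(slots, function(s) c(vars_of(s$a), vars_of(s$b)))
))

# Intercept-only RHS when no slot references a covariate.
rhs_str <- if (length(union_vars) == 0L) "1" else paste(union_vars, collapse = " + ")

formula_env  <- environment()
formula_full <- stats::as.formula(paste("y", "~", rhs_str), env = formula_env)

# Drop the LHS (elements 1 and 3 are `~` and the RHS), then remove the intercept.
formula_rhs <- stats::update(formula_full[c(1L, 3L)], ~ . + 0)
print(formula_full)
print(formula_rhs)
```

Subsetting a formula with `c(1L, 3L)` keeps its class and environment, so the one-sided result can be passed straight to `update()`.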
W basis materialization:
- For each slot with a non-`NULL` `$W` element, `materialize_W_basis(amm_list_canonical[[k]]$W, p = 1L)` is called in place, mutating `amm_list_canonical`.
Parametrization resolution:
- Both `cp_a` and `cp_W` are set to `identical(parametrization, "cp")`, meaning the same parametrization is applied uniformly across all $K$ slots. The `meta$note` explicitly states that per-slot preflight (`cp_a_per_K`) is queued but not yet implemented.
`theta_ref_k_data` reshaping:
- `theta_hat` is extracted as `as.numeric(laplace_result$theta_ref_hat)`.
- `J_groups_loc` is read from `stan_data$J_groups` and coerced to integer.
- The if/else branches both produce `matrix(theta_hat, nrow = J_groups_loc, ncol = K, byrow = FALSE)`; the two branches are functionally identical, suggesting a placeholder for future differentiation.
- The resulting matrix is assigned to `stan_data_cond$theta_ref_k_data`, intended for Stan's `array[J_groups] vector[K]` consumer.
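The reshape from the flat Laplace estimate to the $J_{\mathrm{groups}} \times K$ matrix is a plain column-major fill. The numbers below are invented, and the comment about the group index varying fastest is an assumption about how the flat vector is ordered.

```r
J_groups <- 3L
K        <- 2L

# Flat estimate of length J * K; assumed ordering: group index varies fastest,
# i.e. (j=1,k=1), (j=2,k=1), (j=3,k=1), (j=1,k=2), ...
theta_hat <- c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3)

# byrow = FALSE fills column by column, so column k holds slot k.
theta_ref_k_data <- matrix(theta_hat, nrow = J_groups, ncol = K, byrow = FALSE)
print(theta_ref_k_data)
```

With this layout, row $j$ of the matrix is the length-$K$ anchor vector for group $j$.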
Stan model lifecycle:
- The marginal Stan source is generated by `.gdpar_eb_generate_stan_marginal`, written to a tempfile via `write_stan_to_tempfile`, and compiled with `cmdstanr::cmdstan_model`.
- The conditional Stan source is generated by `.gdpar_eb_generate_stan_conditional` and undergoes the same write-and-compile cycle.
- Both tempfile paths are transient (side effect on the filesystem).
Sample arguments:
- The `sample_args` list is constructed with explicit integer coercions for `chains`, `iter_warmup`, `iter_sampling`, `max_treedepth`, and `refresh`.
- `adapt_delta` is passed without coercion.
- `show_messages` and `show_exceptions` are both set to `verbose`.
- `seed` is added only if non-`NULL`.
- Extra arguments from `...` are merged into `sample_args` by name, potentially overriding any of the above defaults.
- Sampling is invoked via `do.call(conditional_model$sample, sample_args)`.
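The merge-by-name and `do.call` pattern can be shown with a stand-in for the sampler. `fake_sample` is a hypothetical function that just echoes its arguments, and the `extra` list plays the role of whatever `...` carries; none of the values are package defaults.

```r
verbose <- FALSE
seed    <- 7

# Defaults with the documented integer coercions.
sample_args <- list(
  chains          = as.integer(4),
  iter_warmup     = as.integer(1000),
  iter_sampling   = as.integer(1000),
  adapt_delta     = 0.9,            # passed through without coercion
  max_treedepth   = as.integer(12),
  refresh         = as.integer(0),
  show_messages   = verbose,
  show_exceptions = verbose
)
if (!is.null(seed)) sample_args$seed <- as.integer(seed)

# Extra arguments are merged by name: existing entries are overwritten,
# new ones are appended.
extra <- list(adapt_delta = 0.99, thin = 2L)
for (nm in names(extra)) sample_args[[nm]] <- extra[[nm]]

fake_sample <- function(...) list(...)  # stand-in for conditional_model$sample
out <- do.call(fake_sample, sample_args)
str(out)
```

Merging by name means a later entry silently wins, which is why the documentation flags that `...` can override the defaults.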
Group aliasing:
- When `group_info` is non-`NULL`, `.check_group_aliasing_c7` is called for each slot $k$ with a design list containing `Z_a = design_K$Z_a_k_list[[k]]`, `Z_b = design_K$Z_b_k_list[[k]]`, and `X = design_K$X`.
Trailing roxygen block:
- The section concludes with a roxygen `@noRd` documentation block for an internal function implementing the tensor-valued Proposition 7B* correction under $K > 1$ and $p > 1$. The function itself is not defined in this section; its documented signature includes parameters `eb_correction`, `laplace_result_per_slot`, `K`, `p`, and `verbose`, and it returns a list with `applied`, `constant` (3D array of dimension $[K, p, p]$), and `slot_dispositions`. This function appears in a subsequent section.
.gdpar_eb_correction_tensor(eb_correction, laplace_result_per_slot, K = 2L, p = 1L, verbose)
Purpose
Builds a three-dimensional correction tensor for the Path C empirical-Bayes (EB) regime. The tensor scales each slot's reference-parameter covariance (extracted from per-slot Laplace results) by a fixed multiplier and is consumed downstream by S3 coverage methods. If any slot fails validation, the entire correction is disabled and downstream methods fall back to nominal coverage.
Arguments
- `eb_correction`: logical scalar (or any value testable by `isTRUE`). When not `TRUE`, the function short-circuits and returns a disabled result with no slot processing.
- `laplace_result_per_slot`: list of length `K`. Each element is a Laplace-fit result object expected to contain a `theta_ref_cov_k` field holding the $p \times p$ covariance of the reference parameters for that slot.
- `K`: integer-ish scalar; number of slots. Coerced to integer. Default `2L`.
- `p`: integer-ish scalar; number of coordinates per slot. Coerced to integer. Default `1L`.
- `verbose`: logical scalar; when `TRUE` and at least one slot fails, a diagnostic warning is emitted via `gdpar_warn`.
Mathematics
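For each slot $k = 1, \dots, K$, the slice `constant[k, , ]` of the tensor is the fixed multiplier `kappa_alpha_95` times the `theta_ref_cov_k` matrix for slot $k$.
The positive-semidefinite check uses eigenvalues from `eigen(..., symmetric = TRUE, only.values = TRUE)`; a slot is rejected if any eigenvalue fails the check, including the `NA_real_` values produced when the decomposition errors.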
Returns
A named list with three components:
- `applied`: logical scalar. `TRUE` only if every slot passed validation and the tensor was filled.
- `constant`: numeric array of dimensions `c(K, p, p)`. When `applied = TRUE`, filled with the scaled covariances; otherwise filled with `NA_real_` (the "empty tensor").
- `slot_dispositions`: named character vector of length `K` (names are `seq_len(K)` coerced to character). Each entry is one of: `"disabled"` (correction globally off), `"missing"` (covariance absent or wrong shape), `"non_finite"` (covariance contains non-finite entries), `"non_psd"` (eigenvalue check failed), or `"ok"`.
Notes
- The multiplier `kappa_alpha_95` is hardcoded to `1.92` (not the exact 1.959964… standard-normal 97.5th percentile).
- When `eb_correction` is not `TRUE`, the returned `slot_dispositions` are all `"disabled"` and names are set via `setNames(rep("disabled", K), seq_len(K))`.
- When any slot fails, `any_failed` is set, the function returns `applied = FALSE` with an empty (`NA`) tensor, and, if `verbose` is `TRUE`, a warning of class `"gdpar_diagnostic_warning"` is emitted via `gdpar_warn` summarising the count and unique failure types.
- The PSD eigen-decomposition is wrapped in `tryCatch`; an error from `eigen` yields `NA_real_` values, which then trigger the `"non_psd"` disposition.
- The covariance shape check requires `is.matrix(cov_k)`, `nrow == p`, and `ncol == p`.
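The per-slot validation described in these notes can be sketched as follows. This is a minimal illustration, not the package source: the helper name `classify_slot_cov` is hypothetical, and the rejection rule (any `NA` or strictly negative eigenvalue, with no tolerance) is an assumption. Only the disposition labels, the shape check, and the `eigen(..., symmetric = TRUE, only.values = TRUE)` call wrapped in `tryCatch` come from the text.

```r
# Hypothetical helper (not part of gdpar): classify one slot covariance.
classify_slot_cov <- function(cov_k, p) {
  # Shape check: must be a p x p matrix.
  if (is.null(cov_k) || !is.matrix(cov_k) ||
      nrow(cov_k) != p || ncol(cov_k) != p) {
    return("missing")
  }
  if (!all(is.finite(cov_k))) {
    return("non_finite")
  }
  # An error inside eigen() is mapped to NA eigenvalues.
  ev <- tryCatch(
    eigen(cov_k, symmetric = TRUE, only.values = TRUE)$values,
    error = function(e) rep(NA_real_, p)
  )
  # Assumed rule: any NA or negative eigenvalue rejects the slot.
  if (anyNA(ev) || any(ev < 0)) {
    return("non_psd")
  }
  "ok"
}

covs <- list(
  diag(2),                          # valid
  matrix(c(1, 2, 2, 1), 2),         # eigenvalues 3 and -1
  matrix(c(1, NA, NA, 1), 2),       # non-finite entries
  NULL                              # absent
)
dispositions <- vapply(covs, classify_slot_cov, character(1), p = 2L)
names(dispositions) <- seq_along(covs)
```

Because a single failing slot disables the whole correction, a caller following the documented behaviour would test `all(dispositions == "ok")` before filling the tensor.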
Purpose
Local helper that allocates a fresh $K \times p \times p$ array filled with `NA_real_`, used as the default/disabled `constant` tensor.
Arguments
None. Captures `K` and `p` from the enclosing `.gdpar_eb_correction_tensor` scope.
Returns
A numeric array of dimensions `c(K, p, p)` filled entirely with `NA_real_`.
Notes
Defined as a closure; not accessible outside its parent function.
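A minimal sketch of this closure pattern, assuming a hypothetical parent function `make_tensor_parent` and a hypothetical closure name `empty_tensor` (neither name is taken from the package):

```r
# Hypothetical enclosing function, used only to show the closure pattern.
make_tensor_parent <- function(K = 2L, p = 1L) {
  K <- as.integer(K)
  p <- as.integer(p)
  # Closure: takes no arguments and reads K and p from the enclosing scope.
  empty_tensor <- function() array(NA_real_, dim = c(K, p, p))
  empty_tensor()
}

tens <- make_tensor_parent(K = 3L, p = 2L)
dim(tens)
```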
Purpose
Constructs the per-slot multivariate (ragged) design matrices for the Path C $K \times p$ regime. Iterates over the `K` slots of a canonicalised `amm_spec` list, enforces a homogeneous $p \geq 2$ across slots, and calls `.build_amm_design_multi()` for each slot. The returned structure is the direct input consumed by `.assemble_stan_data_KxP()`.
Arguments
- `amm_list_canonical`: named list of length $K \geq 2$ of `amm_spec` objects. Each object must carry a `$p` field (defaulting to `1L` via `%||%` if absent) that is $\geq 2$ and identical across all slots.
- `data`: data frame containing the variables referenced by the per-slot AMM specifications. Validated by `assert_data_frame()`.
- `formula_rhs`: two-sided formula identifying the covariate columns of `data` used as the linear factor $x$. Passed through verbatim to `.build_amm_design_multi()` for each slot.
Returns
A named list with:
- `K`: integer scalar; number of slots.
- `p`: integer scalar; the homogeneous coordinate dimension (taken from the first slot).
- `slot_names`: character vector of length `K`; the `names()` of `amm_list_canonical`.
- `design_per_slot`: named list of length `K`. Each entry is the list returned by `.build_amm_design_multi(a_k, data, formula_rhs)` for that slot's `amm_spec`.
Notes
- Aborts with class `"gdpar_internal_error"` via `gdpar_abort` if `amm_list_canonical` is not a list or has length `< 2`.
- Aborts if any element lacks a non-empty name (`is.null(slot_names)` or `any(!nzchar(slot_names))`).
- Aborts if the per-slot `$p` values are not all $\geq 2$ or not all identical. The error message includes the comma-separated `p_per_slot` vector.
- Each slot is validated with `assert_inherits(a_k, "amm_spec", ...)` before delegation.
- The `$p` extraction uses `a$p %||% 1L`, so a missing `$p` field is treated as `1L`, which then triggers the homogeneous-$p \geq 2$ abort.
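The interaction between the `%||%` default and the homogeneity check can be illustrated with a small sketch. The function `check_homogeneous_p` is hypothetical, plain lists stand in for `amm_spec` objects, and `stop()` stands in for `gdpar_abort`; only the `a$p %||% 1L` extraction and the "all $\geq 2$ and identical" rule are taken from the notes.

```r
# Local null-coalescing operator (base R >= 4.4.0 also provides %||%).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-in for the homogeneity check; stop() replaces gdpar_abort().
check_homogeneous_p <- function(amm_list) {
  p_per_slot <- vapply(amm_list, function(a) as.integer(a$p %||% 1L), integer(1))
  if (any(p_per_slot < 2L) || length(unique(p_per_slot)) != 1L) {
    stop("per-slot p must be >= 2 and identical; got: ",
         paste(p_per_slot, collapse = ", "))
  }
  unname(p_per_slot[1L])
}

ok  <- list(mu = list(p = 3L), sigma = list(p = 3L))
bad <- list(mu = list(p = 3L), sigma = list())   # missing $p is treated as 1L

p_ok    <- check_homogeneous_p(ok)
bad_msg <- tryCatch(check_homogeneous_p(bad),
                    error = function(e) conditionMessage(e))
```

In the `bad` case the missing field becomes `1L`, so the slot fails the $p \geq 2$ condition rather than producing a separate "missing field" error.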
.assemble_stan_data_KxP(design_KxP, family, amm_list_canonical, y_matrix, theta_anchor_kp, group_id = NULL, path = c("EB", "FB"), cp_W = FALSE)
Purpose
Assembles the complete named-list data block consumed by the Path C Stan templates (`amm_eb_marginal_KxP.stan` / `amm_eb_conditional_KxP.stan` for the EB path; `amm_canonical_pmulti_KxP.stan` for the FB path). Dispatches on `path` to enforce or lift the Sub-phase 8.6.D first-iteration restrictions: EB hardcodes `use_W = 0` and restricts `stan_id` to $\{1, 3\}$; FB admits `stan_id` $\in \{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$ and enables $W$ when any slot declares it.
Arguments
- `design_KxP`: list returned by `.build_amm_design_KxP()`. Must contain `$design_per_slot`, `$K`, and `$p`.
- `family`: promoted `gdpar_family` object (validated by `assert_inherits`). Must carry a `$stan_id` field and a `$name` field.
- `amm_list_canonical`: named list of `K` `amm_spec` objects with $p \geq 2$ per slot. Used to extract per-slot `use_a` / `use_b` flags and (FB path) $W$ metadata.
- `y_matrix`: numeric or integer matrix of outcomes, shape $n \times p$.
- `theta_anchor_kp`: numeric matrix of shape $K \times p$; per-slot per-coordinate anchors on the linear-predictor scale.
- `group_id`: optional integer vector of length $n$. Resolved via `.resolve_group_id()`.
- `path`: character scalar; one of `"EB"` or `"FB"` (resolved by `match.arg`). Default `"EB"`.
- `cp_W`: logical scalar. Present in the signature but not referenced anywhere in the function body.
Mathematics
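Per-slot per-coordinate design matrices are packed into 4D arrays with zero-padding. For slot $k$ and coordinate $c$, the matrix $Z_a^{(k,c)}$ with $J_{a,\text{per\_kp}}[k,c]$ columns is right-padded to $J_{a,\max}$ columns:

$$\tilde{Z}_a^{(k,c)} = \big[\, Z_a^{(k,c)} \;\; 0_{\,n \times (J_{a,\max} - J_{a,\text{per\_kp}}[k,c])} \,\big],$$

where

$$J_{a,\max} = \max_{k,c} J_{a,\text{per\_kp}}[k,c] \quad \text{and} \quad J_{b,\max} = \max_{k,c} J_{b,\text{per\_kp}}[k,c],$$

with $\tilde{Z}_b^{(k,c)}$ built analogously.

For the FB path with `use_W = 1`, the $W$ block is sized with $W_{\text{per\_kj\_dim}}$ = `amm$W$dim`, taken from the first slot that declares $W$.

The `use_a_k` flags are computed per slot from the `use_a` flag of the corresponding `amm_spec`, coerced to integer, and analogously for `use_b_k`.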
Returns
A named list. The base list (returned for both paths) contains:
| Field | Type | Description |
|---|---|---|
| `n` | integer | Number of observations. |
| `K` | integer | Number of slots. |
| `p` | integer | Coordinate dimension. |
| `family_id_k_vector` | integer vector (length `K`) | Homogeneous `stan_id` replicated `K` times. |
| `inv_link_id_per_slot` | integer vector (length `K`) | Computed by `.gdpar_compute_inv_link_id_per_slot()`. |
| `use_a_k` | integer vector (length `K`) | Per-slot indicator that the $a$ component is active. |
| `use_b_k` | integer vector (length `K`) | Per-slot indicator that the $b$ component is active. |
| `use_W` | integer scalar | `0L` (EB) or `as.integer(any_W)` (FB). |
| `J_a_max` | integer | Maximum $J_a$ over all (slot, coordinate) pairs. |
| `J_b_max` | integer | Maximum $J_b$ over all (slot, coordinate) pairs. |
| `J_a_per_kp` | integer matrix ($K \times p$) | Per-slot per-coord $J_a$. |
| `J_b_per_kp` | integer matrix ($K \times p$) | Per-slot per-coord $J_b$. |
| `Z_a_kp` | numeric array (4D) | Padded $Z_a$ design matrices. |
| `Z_b_kp` | numeric array (4D) | Padded $Z_b$ design matrices. |
| `y_real` | numeric matrix ($n \times p$) | Real-valued outcomes (or zeros if `needs_real` is `FALSE`). |
| `y_int` | integer matrix ($n \times p$) | Integer-valued outcomes (or zeros if `needs_int` is `FALSE`). |
| `theta_anchor_kp` | list of `K` double vectors (each length `p`) | Row-wise decomposition of the input matrix. |
| `use_dispersion_y_k` | integer vector (length `K`) | Always zero in both paths. |
| `use_dispersion_phi_k` | integer vector (length `K`) | Always zero in both paths. |
| `use_groups` | (from `.resolve_group_id`) | Group flag. |
| `J_groups` | (from `.resolve_group_id`) | Number of groups. |
| `group_id` | (from `.resolve_group_id`) | Group index vector. |
| `K_slots` | integer | Redundant copy of `K`. |
| `p_dim` | integer | Redundant copy of `p`. |
For the FB path only, the list is extended (`c(base_list, ...)`) with:
| Field | Type | Description |
|---|---|---|
| `dim_W` | integer | Total $W$ dimension (`0L` when $W$ is not in use). |
| `d` | integer | Number of columns in the shared design matrix $X$. |
| `W_per_kj_dim` | integer | Per-(slot, coord) basis dimension. |
| `X` | numeric matrix ($n \times d$) | Shared linear-factor design matrix. |
| `W_type_id` | (from `.gdpar_resolve_W_stan_data`) | $W$ basis type identifier. |
| `W_n_knots_full` | (from `.gdpar_resolve_W_stan_data`) | Knot count. |
| `W_knots_full` | (from `.gdpar_resolve_W_stan_data`) | Knot vector. |
| `W_degree` | (from `.gdpar_resolve_W_stan_data`) | Spline degree. |
Notes
- EB path restrictions: `use_W` is hardcoded to `0L`; `stan_id` must be in $\{1, 3\}$ (Gaussian or Negative Binomial), otherwise a `"gdpar_unsupported_feature_error"` is raised. If any slot declares `W != NULL` on the EB path, a `"gdpar_unsupported_feature_error"` is raised.
- FB path extensions: `stan_id` must be in $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$; otherwise a `"gdpar_unsupported_feature_error"` is raised. $W$ is enabled if any slot declares it; the first such slot's `$W` object defines the basis metadata (shared globally).
- Outcome validation: for count families (`stan_id` $\in \{3, 10, 11, 12, 13\}$), every entry of `y_matrix` must be a finite, non-negative integer; otherwise a `"gdpar_input_error"` is raised. For continuous families (`stan_id` $\in \{1, 5, 6, 7, 8, 9\}$), every entry must be finite.
- `y_real` / `y_int` population: `needs_real` is `TRUE` for `stan_id` $\in \{1, 5, 6, 7, 8, 9\}$; `needs_int` is `TRUE` for `stan_id` $\in \{3, 10, 11, 12, 13\}$. The unused matrix is zero-filled.
- `theta_anchor_kp` is validated as a $K \times p$ matrix and then decomposed row-wise into a list of `K` length-`p` double vectors via `lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ]))`.
- `family_id_k_vector` is `rep(as.integer(stan_id), K)`, homogeneous across slots regardless of path.
- `use_dispersion_y_k` / `use_dispersion_phi_k` are zero vectors in both paths (the FB comment notes future B9.7+ may lift this).
- `cp_W` is accepted as a parameter but never read.
- Internal errors (class `"gdpar_internal_error"`) are raised for: invalid `design_KxP` structure; `K < 2` or `p < 2`; `y_matrix` not a matrix; `y_matrix` column count mismatch; `theta_anchor_kp` shape mismatch; FB path with `dim_W <= 0` when `use_W == 1`.
- Calls `.resolve_group_id()`, `.gdpar_compute_inv_link_id_per_slot()`, and (FB only) `.gdpar_resolve_W_stan_data()`.
- The `pad_to` local helper (see below) handles zero-padding of design matrices.
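The outcome validation and the `y_real` / `y_int` population rules can be sketched as below. The helper `split_outcomes` is hypothetical and uses plain `stop()` in place of the package's typed errors; the two `stan_id` sets are the ones listed in the notes.

```r
# stan_id sets taken from the notes; the helper itself is hypothetical.
count_ids <- c(3L, 10L, 11L, 12L, 13L)
cont_ids  <- c(1L, 5L, 6L, 7L, 8L, 9L)

split_outcomes <- function(y_matrix, stan_id) {
  stopifnot(is.matrix(y_matrix))
  needs_int  <- stan_id %in% count_ids
  needs_real <- stan_id %in% cont_ids
  if (!all(is.finite(y_matrix))) {
    stop("y_matrix must be finite")
  }
  if (needs_int && any(y_matrix < 0 | y_matrix != round(y_matrix))) {
    stop("count families need non-negative integer outcomes")
  }
  zeros  <- matrix(0, nrow(y_matrix), ncol(y_matrix))
  y_real <- if (needs_real) y_matrix else zeros   # unused matrix is zero-filled
  y_int  <- if (needs_int) y_matrix else zeros
  storage.mode(y_real) <- "double"
  storage.mode(y_int)  <- "integer"
  list(y_real = y_real, y_int = y_int)
}

y   <- matrix(c(0, 2, 5, 1), nrow = 2)
out <- split_outcomes(y, stan_id = 3L)   # count family: y_int filled, y_real zeros
neg <- tryCatch(split_outcomes(matrix(-1, 1, 1), stan_id = 3L),
                error = function(e) "rejected")
```

Passing both matrices regardless of family keeps the Stan data block shape fixed; the template can then read whichever one matches the family.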
Purpose
Zero-pads a design matrix `z` to `target_cols` columns. If `target_cols` is `0L`, returns a matrix with `n_rows` rows and 0 columns. If `z` already has at least `target_cols` columns, returns `z` unchanged. Otherwise right-pads with a zero matrix.
Arguments
- `z`: numeric matrix; the per-slot per-coordinate design matrix to pad.
- `target_cols`: integer scalar; the target column count ($J_{a,\max}$ or $J_{b,\max}$).
- `n_rows`: integer scalar; the number of rows to use when `target_cols == 0L` (i.e., $n$).
Returns
A numeric matrix with `nrow(z)` rows and `max(ncol(z), target_cols)` columns (or `n_rows` rows and 0 columns when `target_cols == 0L`).
Notes
Defined as a local closure inside `.assemble_stan_data_KxP`; captures nothing from the enclosing scope (all inputs are explicit arguments). When `target_cols > 0L` but `ncol(z) >= target_cols`, `z` is returned as-is (no truncation occurs even if `z` has more columns than the target).
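The three branches described above can be written out as a short sketch. It is reconstructed from this description only and may differ from the package source in details such as storage mode.

```r
# Sketch reconstructed from the description above; not the package source.
pad_to <- function(z, target_cols, n_rows) {
  if (target_cols == 0L) {
    return(matrix(0, nrow = n_rows, ncol = 0L))
  }
  if (ncol(z) >= target_cols) {
    return(z)  # never truncates
  }
  cbind(z, matrix(0, nrow = nrow(z), ncol = target_cols - ncol(z)))
}

z      <- matrix(1:6, nrow = 3)                       # 3 x 2
padded <- pad_to(z, target_cols = 4L, n_rows = 3L)    # 3 x 4, last two columns zero
wide   <- pad_to(z, target_cols = 1L, n_rows = 3L)    # returned unchanged
none   <- pad_to(z, target_cols = 0L, n_rows = 3L)    # 3 x 0
```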
Purpose
Internal helper that fabricates a random initial-values list for the cmdstanr optimizer / Laplace approximation in the Path C K×p EB workflow. The returned list conforms to the cmdstanr automatic packing convention for the `theta_ref_kp` parameter (a 3D array `[J, K, p]`) and conditionally emits the auxiliary scale / raw-coefficient parameters that the K×p Stan template exposes when group structure or free `a`/`b` coefficients are active.
Arguments
- `stan_data`: list. The Stan data environment. The following fields are consulted (via null-coalescing `%||%`): `K_slots` (fallback `K`), `p_dim` (fallback `p`), `J_groups` (fallback `1L`), `use_groups` (fallback `0L`), `use_a_k`, `use_b_k`, `J_a_per_kp`, `J_b_per_kp`.
- `seed_offset`: integer scalar, default `1L`. Integer added to `base_seed` to derive the per-start RNG seed, enabling distinct inits across multi-start iterations.
- `base_seed`: integer scalar or `NULL`. When non-`NULL`, the function seeds the global RNG with `as.integer(base_seed) + seed_offset` and restores the prior `.Random.seed` state on exit. When `NULL`, no seeding is performed and the global RNG state is untouched.
Mathematics
The RNG seed (applied only when `base_seed` is non-`NULL`) is

$$\text{rng\_seed} = \texttt{as.integer(base\_seed)} + \texttt{seed\_offset}.$$

Draws produced (all i.i.d. unless noted):
- `theta_ref_kp[g,k,c]` $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[J, K, p]$.
- When `use_groups == 1`:
  - `mu_theta_ref_kp[1,k,c]` $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[1, K, p]$.
  - `sigma_theta_ref_kp[1,k,c]` $= |\mathcal{N}(0.5,\, 0.05^2)|$, shape $[1, K, p]$.
- When any `use_a_k == 1`:
  - `sigma_a_k[s]` $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $s = 1, \dots, n_{\sigma_a}$, where $n_{\sigma_a}$ is the count of slots $k$ satisfying `use_a_k[k] == 1` and $\sum_{c} \mathbf{1}\{J_{a,\text{per\_kp}}[k,c] > 0\} > 0$.
  - `a_raw[j]` $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{a,\text{per\_kp}}[k,c]$.
- When any `use_b_k == 1`:
  - `sigma_b_k[k]` $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $k = 1, \dots, K$.
  - `c_b_kp_raw[j]` $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{b,\text{per\_kp}}[k,c]$.
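As a worked illustration of the count $n_{\sigma_a}$ and the resulting draw lengths, the sketch below uses made-up values for `K`, `p`, `use_a_k`, and `J_a_per_kp`. The variable names `jap`, `slot_free_a`, and `total_J_a_free` are chosen for the example, and the row-major layout of `J_a_per_kp` is assumed.

```r
K <- 3L
p <- 2L
use_a_k <- c(1L, 1L, 0L)

# Flat per-(slot, coordinate) counts, assumed to be stored row-major.
J_a_per_kp <- c(2L, 0L,   # slot 1: flag on, one coordinate with free a
                0L, 0L,   # slot 2: flag on, but no free a columns
                0L, 0L)   # slot 3: flag off
jap <- matrix(as.integer(J_a_per_kp), nrow = K, ncol = p, byrow = TRUE)

# A slot counts only if its flag is on AND some coordinate has J_a > 0.
slot_free_a    <- (use_a_k == 1L) & (rowSums(jap > 0L) > 0L)
n_sigma_a      <- sum(slot_free_a)
total_J_a_free <- sum(jap)

set.seed(1)
sigma_a_k <- as.array(0.1 + abs(rnorm(n_sigma_a, mean = 0, sd = 0.02)))
a_raw     <- rnorm(total_J_a_free, mean = 0, sd = 0.1)
```

Here only slot 1 qualifies, so `sigma_a_k` has length 1 while `a_raw` has length 2.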
Returns
A named list. Always contains `theta_ref_kp` (a 3D numeric array of dim `c(J, K, p)`). Conditionally also contains:
- `mu_theta_ref_kp`: 3D array `[1, K, p]` (only when `use_groups == 1`).
- `sigma_theta_ref_kp`: 3D array `[1, K, p]` (only when `use_groups == 1`).
- `sigma_a_k`: 1D numeric array of length `n_sigma_a` (only when `any_use_a == 1` and `n_sigma_a > 0`).
- `a_raw`: numeric vector of length `total_J_a_free` (only when `any_use_a == 1` and `total_J_a_free > 0`).
- `sigma_b_k`: 1D numeric array of length `K` (only when `any_use_b == 1`).
- `c_b_kp_raw`: numeric vector of length `total_J_b_free` (only when `any_use_b == 1` and `total_J_b_free > 0`).
Notes
- Side effect: when `base_seed` is non-`NULL`, the global `.Random.seed` is overwritten via `set.seed(rng_seed)` and restored on function exit through an `on.exit` handler. If `.Random.seed` did not previously exist in `.GlobalEnv`, the handler performs no restoration (the seed state is left as set).
- The slot-free-a mask is computed by coercing `stan_data$J_a_per_kp` to an integer matrix of shape `K × p` (row-major), then taking `rowSums(.jap > 0L) > 0L` intersected with `use_a_k == 1L`. This mirrors the `n_sigma_a` transformed-data quantity of the K×P Stan template; when every slot carries free `a` coefficients, $n_{\sigma_a} = K$ and the draw count is bit-identical to the unconditional case.
- `sigma_a_k` and `sigma_b_k` are wrapped with `as.array` to ensure 1D-array typing expected by cmdstanr init packing.
- No errors are raised by this function; malformed `stan_data` would propagate as errors from downstream coercions (e.g. `as.integer`, `matrix`).
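The seed-and-restore side effect can be sketched in a reduced form. The function `make_init_sketch` is hypothetical and draws only `theta_ref_kp`; the seeding rule `as.integer(base_seed) + seed_offset` and the `on.exit` restoration follow the description above.

```r
# Hypothetical reduced version: only theta_ref_kp is drawn.
make_init_sketch <- function(J, K, p, seed_offset = 1L, base_seed = NULL) {
  if (!is.null(base_seed)) {
    # Restore the caller's RNG state on exit, but only if one existed.
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
      on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
    }
    set.seed(as.integer(base_seed) + seed_offset)
  }
  list(theta_ref_kp = array(rnorm(J * K * p, mean = 0, sd = 0.1),
                            dim = c(J, K, p)))
}

set.seed(42)                       # make sure a global RNG state exists
state_before <- .Random.seed
init1       <- make_init_sketch(1L, 2L, 3L, seed_offset = 1L, base_seed = 100L)
init1_again <- make_init_sketch(1L, 2L, 3L, seed_offset = 1L, base_seed = 100L)
init2       <- make_init_sketch(1L, 2L, 3L, seed_offset = 2L, base_seed = 100L)
state_after <- .Random.seed
```

Varying `seed_offset` per start gives distinct but reproducible inits, while the caller's RNG stream is left where it was.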
Purpose
Step (i) of the EB workflow under Path C, specialized for the K×p regime. Runs a multi-start joint Laplace approximation over the full `theta_ref_kp` anchor tensor of shape `[J_groups, K, p]`, selects the best init by marginal log-likelihood, draws from the Laplace approximation, extracts per-slot $p \times p$ covariances, and builds the correction tensor via `.gdpar_eb_correction_tensor()`.
Arguments
- `model`: cmdstanr model object. Must expose `$optimize` and `$laplace` methods.
- `stan_data`: list. Stan data list; must contain `J_groups`, `K_slots`, `p_dim`, plus whatever fields `.gdpar_eb_make_random_init_KxP` requires.
- `control`: list. Must contain at least: `multi_start_M` (integer, number of starts), `optim_algorithm` (passed to `model$optimize`), `laplace_draws` (integer, number of Laplace draws), `kappa_threshold` (numeric, condition-number gate), and any fields consumed by `.gdpar_eb_lm_perturb`.
- `seed`: integer scalar or `NULL`. Base seed for reproducibility; propagated to both the init generator and cmdstanr.
- `verbose`: logical. Controls emission of informational messages via `gdpar_inform` and `gdpar_warn`.
Mathematics
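Multi-start optimization. For $m = 1, \dots, M$ with $M$ = `control$multi_start_M`, a random init is generated and passed to `model$optimize` with algorithm `control$optim_algorithm` and an optimizer seed derived from `seed` (when `seed` is non-`NULL`). The marginal log-likelihood of each start is $\ell_m$ = `opt_m$mle()["lp__"]`. The best start is

$$m^{*} = \arg\max_{m} \ell_m,$$

where inits that errored are skipped (their $\ell_m$ is `NA_real_`).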
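The selection rule can be illustrated without cmdstanr. In this sketch `fit_one` is a hypothetical stand-in for one optimizer run (it returns a toy objective value and fails on purpose for one start); the pattern of mapping errors to `NA_real_` and skipping them follows the description above.

```r
# Hypothetical stand-in for a single optimizer run.
fit_one <- function(m) {
  if (m == 2L) stop("simulated optimizer failure")
  -((m - 3)^2)   # toy objective value, maximal at m = 3
}

M <- 4L
ell <- vapply(seq_len(M), function(m) {
  # An errored start contributes NA_real_ instead of aborting the loop.
  tryCatch(fit_one(m), error = function(e) NA_real_)
}, numeric(1))

# which.max() ignores NA entries, so errored starts are skipped.
best <- which.max(ell)
if (length(best) == 0L) stop("all starts failed")
```

If every start fails, `which.max()` returns a zero-length result, which is why the sketch checks `length(best)` before using it.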
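Laplace approximation. Given the mode of the best start, `model$laplace` produces draws $\theta_{\text{ref}}^{(s)}$, $s = 1, \dots, S$, with $S$ = `control$laplace_draws` and a Laplace seed derived from `seed` (when `seed` is non-`NULL`).
Posterior mean of the anchor tensor:

$$\hat{\theta}_{\text{ref}}[g,k,c] = \frac{1}{S} \sum_{s=1}^{S} \theta_{\text{ref}}^{(s)}[g,k,c].$$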
Per-slot covariance. For slot
where .gdpar_eb_lm_perturb, yielding a ridge-perturbed $\tilde{\Sigm