José Mauricio Gómez Julián edited this page Jun 28, 2026 · 6 revisions

gdpar — Complete Technical Wiki

General Dynamic Parameter Models via Reference Anchoring

Package version: 0.1.0 · License: GPL (≥ 3) · Author: José Mauricio Gómez Julián (ORCID 0009-0000-2412-3150) · Repository: https://github.com/IsadoreNabi/gdpar


0. About this document

This is an exhaustive, self-contained technical reference for the gdpar R package. It documents, in three layers and at maximal depth:

  1. Conceptualization — the cognitive and statistical idea behind the framework, its desiderata, and the family of problems it addresses (Part I).
  2. Mathematics — the canonical decomposition, identifiability theory, the three estimation paths and their asymptotics, the distributional-family algebra, the spline bases, the causal-inference bridge, the geometry-adaptive sampling engine, and the dependence-robust inference machinery (Part II).
  3. Computation — how the mathematics is realized in code: the Stan code generator and every Stan template (line by line), the fitting engines, the family code generation, the geometry engine, and every one of the package's 469 functions organized by module (Parts III–V).

Part VI and the appendices cover the bundled data, the benchmark harness, the test suite, the symbol glossary, and the bibliographic anchors.

0.1 How this document is organized

| Part | Content |
| --- | --- |
| I | Conceptual framework: motivation, the anchoring equation, the AMM form, the three paths, EB vs FB, distributional regression, the causal bridge, geometric robustness, dependence-robust inference |
| II | Mathematical foundations: AMM algebra, identifiability theorems (C1–C7), parametrizations (CP/NCP, linear reparametrization), asymptotics of Paths 1/2/3, Empirical-Bayes theory (Theorems 7A–7D) and its multivariate extension, families and links, B-spline W bases, grouped references, causal identification, geometric metrics, dependence diagnostics and block bootstrap |
| III | Computational architecture: the Stan code generator, the fitting engines (gdpar, .gdpar_multi, .gdpar_K), the Empirical-Bayes engine, the family/codegen layer, the geometry engine/orchestrator |
| IV | Exhaustive function reference: all 44 R source files, all 469 functions, grouped by module, each with purpose, signature, arguments, mathematics, and return value |
| V | Stan templates: all 13 .stan files, block by block |
| VI | Data, benchmarks, tests, appendices, references |

0.2 Notation and conventions

Mathematics is written in GitHub-flavored LaTeX: inline as $ … $ and display as $$ … $$. Code identifiers, file names, and Stan symbols are in monospace. Internal (non-exported) R functions are named with a leading dot, e.g. .gdpar_multi; exported functions have no leading dot, e.g. gdpar.

The single most important object in the entire package is the reference-anchoring decomposition

$$\theta_i \;=\; \theta_{\text{ref}} \;+\; \Delta(x_i,\; \theta_{\text{ref}}),$$

read throughout as: the parameter of individual $i$ equals a population reference plus a deviation that is itself a function of the individual's covariates and of the reference. Every layer of the package — conceptual, mathematical, computational — is an elaboration of this one equation.

0.3 Global symbol table

| Symbol | Meaning |
| --- | --- |
| $\theta_i$ | parameter (vector) for individual / observation $i$ |
| $\theta_{\text{ref}}$ | population reference parameter, $\theta_{\text{ref}} = \mathbb{E}[\theta]$ |
| $\Delta(x_i,\theta_{\text{ref}})$ | individual deviation function |
| $x_i$ | observable covariates of individual $i$ |
| $a(x)$ | additive component of the AMM deviation |
| $b(x)$ | multiplicative (Hadamard) component of the AMM deviation |
| $W(\theta_{\text{ref}})$ | reference-modulated mixing matrix of the AMM deviation |
| $\odot$ | Hadamard (elementwise) product |
| $K$ | number of distributional parameter slots (e.g. $\mu,\sigma,\nu,\pi,\dots$) |
| $p$ | dimension of the parameter at a slot (number of coordinates of $\theta$) |
| $\mathcal{D}(\theta_i)$ | observation distribution given $\theta_i$ |
| $\xi$ | EB hyperparameter / canonical-piece reduced parameter vector |
| $g(\cdot)$, $g^{-1}(\cdot)$ | link and inverse-link (response) function of a slot |
| $G(\theta)$ | Riemannian metric tensor used by the geometry engine |
| $\phi$ | temporal persistence (AR) parameter; also hypernetwork weights in Path 3 |
| $\pi_i$ | structural-zero probability for individual $i$ (zero-inflation / hurdle) |

Part I — Conceptual Framework

I.1 Motivation: from cognition to statistics

The framework originates in an observation about expert human prediction under uncertainty and consequence: a driver executing fast overtaking maneuvers at narrow distances without crashing. Decomposed, the expert's predictive process has three stages:

  1. A population reference. The driver carries an internal model of the average driver — typical reaction times, modal aggressiveness, characteristic acceleration/braking patterns. This functions as a prior.
  2. Rapid estimation of individual deviation. In a one-to-two-second window, the driver reads signals from the specific other driver (style, relative speed, vehicle type, micro-movements) and estimates how and how much that driver departs from the average.
  3. Conditional prediction and decision. With the individual profile expressed as a deviation from the average, the driver predicts the other's behavior and decides the maneuver.

A conventional model in its simplest form predicts with parameters that are fixed once estimated,

$$\hat{y}_i = f(x_i;\hat\theta),\qquad \hat\theta \text{ identical for all } i.$$

More elaborate models do let parameters vary across individuals — hierarchical models with covariate-dependent random effects, varying-coefficient models, hypernetworks, mixtures of experts, state-space models. The framework's contribution is therefore not the absence of antecedents but the explicit, canonical formulation of a specific structural pattern those antecedents do not typically make explicit:

Conventional predictive models do not typically formulate individual parameters as explicit deviations from a population reference where the deviation function depends both on the individual's observable characteristics and on the reference itself.

The substantive content unpacks into three commitments:

  • $\theta_{\text{ref}}$ is always present as an explicit anchor in the predictive equation.
  • $\Delta_i$ is computed as a function of observable signals $x_i$.
  • the individual prediction emerges from $\theta_i=\theta_{\text{ref}}+\Delta_i$, not from $\theta_i$ estimated independently per individual nor from $\theta_{\text{ref}}$ shifted by a structurally separate term.

The decisive property is the structural dependence of $\Delta$ on $\theta_{\text{ref}}$: $\Delta_i = \Delta(x_i,\theta_{\text{ref}})$. If the reference changes — e.g. the model is transferred to a new population — the deviation function behaves differently because $\theta_{\text{ref}}$ is one of its arguments. This is the precise mathematical content the AMM canonical form (Part II) makes operational.

I.2 The fundamental equation

For each observation $i$ with covariates $x_i$,

$$\boxed{\theta_i \;=\; \theta_{\text{ref}} \;+\; \Delta(x_i,\;\theta_{\text{ref}})},$$

with $\theta_{\text{ref}}=\mathbb{E}[\theta]$ the population reference (the "average driver", or in general the reference entity of the system) and $\Delta$ a deviation function. The individual prediction is

$$\hat y_i \;=\; f\big(x_i;\,\theta_i\big) \;=\; f\big(x_i;\,\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})\big).$$

A model in this framework is fully specified by three design choices:

  1. How $\theta_{\text{ref}}$ is estimated — from the full sample, from a reference subset, or as a hyperparameter with its own distribution (this becomes the anchor argument in code, and the EB-vs-FB distinction).
  2. How $\Delta$ is estimated — parametric/linear, non-parametric (splines), or as a neural network output (the three paths).
  3. What distribution is assumed for $Y_i\mid\theta_i$ — this determines the family of problems addressed (the distributional families).

The canonical functional form for $\Delta$ is the Additive–Multiplicative–Modulated (AMM) form,

$$\Delta(x,\theta_{\text{ref}}) \;=\; a(x) \;+\; b(x)\odot\theta_{\text{ref}} \;+\; W(\theta_{\text{ref}})\,x,$$

developed formally in Part II. Read componentwise: $a(x)$ is a pure additive shift; $b(x)\odot\theta_{\text{ref}}$ scales the reference elementwise by a covariate-dependent factor; $W(\theta_{\text{ref}})\,x$ mixes covariates through a matrix that itself depends on the reference. The third term is what encodes "the deviation depends on the reference."
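The three terms can be made concrete with a small numerical sketch. It assumes toy choices that are not part of gdpar: linear $a$ and $b$, and a $W$ that is linear in $\theta_{\text{ref}}-\theta_0$ so that it vanishes at the anchor. All function names and coefficients are hypothetical.

```python
# Toy AMM deviation with p = 2 parameter coordinates and d = 2 covariates.
# Every function and coefficient is a hypothetical illustration, not gdpar code.

def a(x):
    # additive shift: depends on the covariates only
    return [0.5 * x[0], -0.2 * x[1]]

def b(x):
    # multiplicative factor: scales theta_ref elementwise
    return [0.1 * x[0], 0.3 * x[1]]

def W(theta_ref, theta_0=(1.0, 2.0)):
    # modulated mixing matrix (p x d), linear in theta_ref - theta_0,
    # so W(theta_0) is the zero matrix
    d0 = theta_ref[0] - theta_0[0]
    d1 = theta_ref[1] - theta_0[1]
    return [[0.4 * d0, 0.0],
            [0.0, -0.5 * d1]]

def delta(x, theta_ref):
    ax, bx, Wm = a(x), b(x), W(theta_ref)
    return [
        ax[k] + bx[k] * theta_ref[k] + sum(Wm[k][j] * x[j] for j in range(len(x)))
        for k in range(len(theta_ref))
    ]

def theta_i(x, theta_ref):
    dev = delta(x, theta_ref)
    return [theta_ref[k] + dev[k] for k in range(len(theta_ref))]

x = [1.0, -1.0]
at_anchor = theta_i(x, [1.0, 2.0])   # the W term vanishes at the anchor
shifted = theta_i(x, [2.0, 3.0])     # same covariates, different reference
no_signal = delta([0.0, 0.0], [2.0, 3.0])
```

With the same covariates, moving the reference changes the deviation itself (through both the multiplicative and the modulated term), and a zero covariate vector gives a zero deviation in this toy setup.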

I.3 Desiderata satisfied by the form

  1. Anchoring. With no individual information ($x_i$ unobserved or non-informative), $\Delta\to 0$ and $\theta_i\to\theta_{\text{ref}}$: the model collapses to the population baseline.
  2. Individuation. With rich individual information, $\Delta$ can be large and parameters move away from the reference as far as the evidence justifies.
  3. Transferability. When $\theta_{\text{ref}}$ changes (new population), deviations are recomputed coherently because $\Delta$ is a function of $\theta_{\text{ref}}$.
  4. Generality. The form subsumes, as special cases, fixed effects ($\Delta\equiv 0$), classical random effects ($\Delta$ independent of $\theta_{\text{ref}}$), and random coefficients ($\Delta$ linear in $x_i$).

I.4 The three estimation paths

The same canonical principle is realized through three complementary engines, differing in how $\Delta$ is estimated and in their interpretability/expressivity trade-offs.

  • Path 1 — Hierarchical Bayesian (Stan). A three-level hierarchy: a population level $\theta_{\text{ref}}\sim p(\theta\mid\text{hyper})$, an individual level $\theta_i\mid x_i \sim \mathcal N(\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),\Sigma_i)$ (with $\Sigma_i$ possibly covariate-dependent, i.e. individual heteroscedasticity), and an observation level $Y_i\mid\theta_i\sim\mathcal D(\theta_i)$. Most faithful to the cognitive analogy, native full-posterior uncertainty. This is the operational path in gdpar.
  • Path 2 — Varying-coefficient models (penalized splines). The frequentist version: $Y_i = x_i^\top\beta(z_i)+\varepsilon_i$, with the reference recovered as $\beta_{\text{ref}}=\beta(\bar z)$ and the deviation as $\Delta_i=\beta(z_i)-\beta(\bar z)$. Maximal interpretability; suffers the curse of dimensionality. Conceptual in gdpar 0.1.0 (see below).
  • Path 3 — Conditional parameter networks (hypernetworks / amortized inference). A neural network generates the individual parameters, $\theta_i=h_\phi(x_i,\theta_{\text{ref}})$, with anchoring enforced both by feeding $\theta_{\text{ref}}$ as an explicit input and by a regularizer $\lambda\lVert\theta_i-\theta_{\text{ref}}\rVert^2$. Arbitrary nonlinearity, lowest interpretability. Conceptual in gdpar 0.1.0.

Implementation status. In gdpar 0.1.0 only Path 1 (Hierarchical Bayesian) is operational and is the default. Calls of the form gdpar(..., path = "vcm") or gdpar(..., path = "hyper") abort with a gdpar_unsupported_feature_error. The asymptotic theory for all three paths is nonetheless developed to reference grade (Part II, from vignettes v04–v06), so the package's mathematical scope exceeds its current executable surface by design.

Comparative summary (from the framework overview):

| Criterion | Path 1 Bayesian | Path 2 VCM | Path 3 Hypernetwork |
| --- | --- | --- | --- |
| Fidelity to cognitive analogy | High | Moderate | Moderate |
| Theoretical rigor | Very high | High | Moderate |
| Interpretability | High | Very high | Low |
| Expressive capacity of $\Delta$ | Moderate (parametric) | Moderate–high | Arbitrarily high |
| Scalability to high dimension | Moderate | Low (curse of dim.) | High |
| Uncertainty quantification | Native (full posteriors) | Asymptotic (CIs) | Requires extensions |
| Primary tools | Stan, cmdstanr | mgcv, splines | torch |

I.5 Empirical Bayes vs full Bayes

Within Path 1, the reference $\theta_{\text{ref}}$ and the hyperparameters can be handled in two regimes:

  • Full Bayes (FB). Everything — reference, deviation coefficients, hyperparameters — is given priors and sampled jointly by HMC. Output: full joint posterior. This is gdpar().
  • Empirical Bayes (EB). The hyperparameters (and, in the marginal variant, the reference itself) are estimated by maximizing a marginal likelihood / MAP objective; the remaining parameters are then inferred conditionally. Far cheaper, with an explicit and analyzable contraction/asymptotic story. This is gdpar_eb().

The package treats EB and FB as parallel, comparable estimation routes and ships an explicit comparator (gdpar_compare_eb_fb) that quantifies how much the two agree on $\theta_{\text{ref}}$, on the reduced parameters $\xi$, and on coverage. The EB theory and its multivariate generalization (Theorems 7A–7D, Part II) are first-class, not afterthoughts.

I.6 Distributional regression: parameter slots

The framework is not restricted to modelling a mean. A distribution can have several parameters — location, scale, shape, tail index, zero-inflation probability — and each of these is a slot that can carry its own AMM decomposition. The package indexes slots by $k=1,\dots,K$. Examples: Gaussian ($K=2$: $\mu,\sigma$), Student-$t$ ($K=3$: $\mu,\sigma,\nu$), Tweedie ($K=3$: $\mu,\phi,p$), zero-inflated and hurdle families (an extra $\pi$ slot), and heterogeneous per-slot families where different slots take different sub-families. Each slot has its own link $g_k$, its own support, and its own AMM design. This is the distributional-regression generalization of the anchoring equation:

$$\theta_i^{(k)} = \theta_{\text{ref}}^{(k)} + \Delta^{(k)}(x_i,\theta_{\text{ref}}^{(k)}),\qquad k=1,\dots,K,$$

each on its own link scale. Zero-inflation receives the framework's distinctive dual deviation: both $\pi_i$ and the count parameter $\theta_i$ are anchored to their respective references,

$$\operatorname{logit}(\pi_i)=\operatorname{logit}(\pi_{\text{ref}})+\Delta_\pi(x_i,\pi_{\text{ref}}),\qquad \theta_i=\theta_{\text{ref}}+\Delta_\theta(x_i,\theta_{\text{ref}}).$$
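A minimal sketch of the dual deviation for a zero-inflated Poisson. It assumes a log link on the count slot and takes the two deviations as given numbers. The function names are hypothetical and nothing here calls gdpar.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def anchored_zero_prob(pi_ref, delta_pi):
    # logit(pi_i) = logit(pi_ref) + Delta_pi
    return inv_logit(logit(pi_ref) + delta_pi)

def anchored_log_rate(log_rate_ref, delta_theta):
    # count slot on its (assumed) log link: theta_i = theta_ref + Delta_theta
    return log_rate_ref + delta_theta

def zip_mean(pi_i, log_rate_i):
    # mean of a zero-inflated Poisson: (1 - pi) * lambda
    return (1.0 - pi_i) * math.exp(log_rate_i)

pi_ref, log_rate_ref = 0.30, math.log(4.0)

# An "average" individual: both deviations are zero.
baseline = zip_mean(anchored_zero_prob(pi_ref, 0.0),
                    anchored_log_rate(log_rate_ref, 0.0))

# An individual with more structural zeros and a higher count rate.
pi_i = anchored_zero_prob(pi_ref, 0.8)
individual = zip_mean(pi_i, anchored_log_rate(log_rate_ref, 0.5))
```

Because each slot moves on its own link scale, $\pi_i$ stays in $(0,1)$ and the rate stays positive for any real-valued deviation.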

I.7 The causal bridge (CATE / ITE)

Because the AMM form produces individual parameters, it is naturally positioned for individual treatment-effect estimation. The package provides a T-learner causal bridge: fit the AMM model separately under treatment and control, then read individual conditional average treatment effects (CATE) / individual treatment effects (ITE) as the difference of the anchored individual predictions. A second layer compares this AMM-based learner against external meta-learners (via pluggable adapters to grf in R and EconML in Python through reticulate), so the framework's causal claims are benchmarked, not asserted.
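The T-learner logic can be sketched independently of the package. In this illustration a per-arm least-squares line stands in for the per-arm AMM fit, and the data are made up. It shows only the "fit each arm, subtract the predictions" step and calls neither gdpar, grf, nor EconML.

```python
def fit_line(xs, ys):
    # ordinary least squares for y = intercept + slope * x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def t_learner(x, t, y):
    # Fit one outcome model per arm, then difference the predictions.
    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    i1, s1 = fit_line(x1, y1)
    i0, s0 = fit_line(x0, y0)
    return lambda x_new: (i1 + s1 * x_new) - (i0 + s0 * x_new)

# Toy data: control outcome 1 + x, treated outcome 2 + 3x, so CATE(x) = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
t = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 2.0, 5.0, 8.0, 11.0]

cate = t_learner(x, t, y)
effect_at_2 = cate(2.0)
```

In the package's setting each arm's model would return anchored individual parameters rather than a fitted line; the subtraction step is the same.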

I.8 Geometric robustness of sampling (opt-in)

Hierarchical AMM posteriors can be geometrically hostile to standard HMC — funnels, near-determinism, heavy tails, multimodality. The package contains an opt-in geometry-adaptive sampling engine whose default is bit-identical to ordinary sampling and which, when enabled, climbs a ladder of increasingly powerful geometries: Euclidean → Riemannian (Fisher / SoftAbs) → sub-Riemannian → Finsler/relativistic, governed by a certifying orchestrator that diagnoses the pathology, selects a metric, tunes the integrator, and emits a certificate. A Laplace fallback provides a plug-in posterior (and ELPD on par with mgcv-REML / INLA-Laplace) when full sampling is certified infeasible.

I.9 Dependence-robust inference

gdpar does not model temporal or spatial dependence in its point structure (that is deferred, by design, to a future "Block 10"); instead it makes the inference robust to dependence that is present but unmodelled. It ships:

  • diagnostics that convert invisible iid risk into measured quantities — lag-1 autocorrelation, Durbin–Watson, Ljung–Box on residuals (temporal); Moran's $I$ with permutation and analytic Cliff–Ord variants (spatial);
  • robust standard errors / intervals via block bootstrap — moving/circular blocks in time, tiled randomized-origin blocks in space — with data-driven block lengths (Politis–White flat-top automatic length in time; a custom subsampling calibration in space).

Point estimates are unchanged; only the uncertainty is made robust. The honesty is explicit: the dependence is not modelled, the inference is merely made valid in its presence.
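A rough sketch of the temporal side, on a simulated residual series: lag-1 autocorrelation, the Durbin–Watson statistic, and a moving-block bootstrap standard error for the mean. The block length is fixed by hand here, whereas the package is described as choosing it automatically. The function names and the AR(1) toy series are assumptions for illustration.

```python
import random

def lag1_autocorrelation(e):
    n = len(e)
    m = sum(e) / n
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in e)
    return num / den

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def moving_block_se(x, block_len, n_boot=400, seed=1):
    # Standard error of the sample mean under a moving-block bootstrap.
    rng = random.Random(seed)
    n = len(x)
    n_starts = n - block_len + 1
    n_blocks = -(-n // block_len)          # ceil(n / block_len)
    means = []
    for _ in range(n_boot):
        resample = []
        for _ in range(n_blocks):
            s = rng.randrange(n_starts)    # random block origin
            resample.extend(x[s:s + block_len])
        resample = resample[:n]            # trim to the original length
        means.append(sum(resample) / n)
    centre = sum(means) / n_boot
    return (sum((m - centre) ** 2 for m in means) / (n_boot - 1)) ** 0.5

# Toy residuals: AR(1) with persistence 0.8 (a made-up example series).
rng = random.Random(0)
resid = [0.0]
for _ in range(399):
    resid.append(0.8 * resid[-1] + rng.gauss(0.0, 1.0))

r1 = lag1_autocorrelation(resid)
dw = durbin_watson(resid)
se_iid = moving_block_se(resid, block_len=1)     # blocks of 1 = iid bootstrap
se_block = moving_block_se(resid, block_len=20)  # blocks preserve local dependence
```

With positively autocorrelated residuals the block-based standard error exceeds the iid one, while the point estimate (the sample mean) is the same in both cases.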


Part II — Mathematical Foundations

This part develops, at reference grade, the mathematics the package implements: the AMM decomposition and its identifiability theory (§II.1), the asymptotic theory of the three paths (§II.2), the Empirical-Bayes theory and its multivariate extension (§II.3), the distributional families and links (§II.4), the B-spline W bases (§II.5), grouped references (§II.6), the causal-inference bridge (§II.7), the geometry-adaptive sampling metrics (§II.8), and the dependence-robust inference machinery (§II.9). The identifiability material corresponds to package source R/check_identifiability.R, R/preflight*.R, R/amm_spec.R; the conditions stated here are enforced in code and the cross-references make the link explicit.

II.1 The AMM canonical form and identifiability

II.1.1 Setting

There are $n$ units $i=1,\dots,n$. Each has covariates $x_i\in\mathcal X\subseteq\mathbb R^d$ drawn i.i.d. from a population law $\mu$, a latent parameter vector $\theta_i\in\mathbb R^p$, and an outcome $y_i\sim\mathcal D(\theta_i)$. The framework posits $\theta_i=\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})$ with $\theta_{\text{ref}}\in\Theta\subseteq\mathbb R^p$ and $\Delta:\mathcal X\times\Theta\to\mathbb R^p$. Two questions are answered: what form does $\Delta$ take, and when are $(\theta_{\text{ref}},\Delta)$ recoverable from data. The second is split into three logically distinct layers kept separate throughout:

  • (L1) Algebraic-functional identifiability — does the latent function $\theta_i(\cdot)$ determine the components $(\theta_{\text{ref}},a,b,W)$? (Theorem 1A.)
  • (L2) Statistical identifiability — do the observable data $\{(x_i,y_i)\}$ determine them? (Lemma 1B, via a hypothesis on the response family.)
  • (L3) Numerical verifiability — in a chosen finite basis, can a runtime diagnostic detect identifiability or its failure? (Proposition 1C, the Gram-matrix check.)

II.1.2 The AMM hierarchy

The space of deviation forms is stratified by joint polynomial order in $(x,\theta_{\text{ref}})$:

| Level | Form | Joint order | Status |
| --- | --- | --- | --- |
| 0 (degenerate) | $\Delta=0$ | $(0,0)$ | standard non-mixed regression |
| 1 (linear additive) | $\Delta=Ax$, $A$ fixed | $(1,0)$ | classical random coefficients; no reference-dependence |
| 2 (canonical AMM) | $\Delta=a(x)+b(x)\odot\theta_{\text{ref}}+W(\theta_{\text{ref}})x$ | mixed | the canonical default |
| 2.5 (full-matrix mult.) | $\Delta=a(x)+B(x)\theta_{\text{ref}}+W(\theta_{\text{ref}})x$ | | Level 2 is $B=\operatorname{diag}(b)$ |
| 3 (quadratic) | adds $x^\top Q_k\,x$, $\theta^\top R_k\,\theta$, $\theta\odot W x$ | up to $(1,2)$ | |
| $K$ (polynomial closure) | $\sum_{1\le j+k\le K,\,j\ge1}\mathcal T_{j,k}(x^{\otimes j},\theta_{\text{ref}}^{\otimes k})$ | $\le K$ | $j=0$ terms excluded (non-individuating) |
| $\infty$ (universal) | $\Delta=\Phi(x,\theta_{\text{ref}})$ | arbitrary measurable | hypernetwork; Proposition 1F |

The canonical Level-2 AMM is

$$\boxed{\;\Delta(x,\theta_{\text{ref}})=\underbrace{a(x)}_{\text{additive}}+\underbrace{b(x)\odot\theta_{\text{ref}}}_{\text{multiplicative (Hadamard)}}+\underbrace{W(\theta_{\text{ref}})\,x}_{\text{modulated}}\;}$$

with $a,b:\mathcal X\to\mathbb R^p$ measurable and $W:\Theta\to\mathbb R^{p\times d}$ continuous with $W(\theta_0)=0$ at an anchor $\theta_0$. It is singled out by (P1) minimality of reference-dependence (smallest extension of Level 1 in which $\theta_{\text{ref}}$ enters nontrivially), (P2) separability of its three mechanisms, (P3) computational tractability (closed-form gradients, finite sufficient statistics, tractable HMC).

Approximation (Scheme 1D). On compact $\mathcal X,\Theta$, for continuous $\Delta$ and any $\varepsilon>0$ there is a finite $K$ and a Level-$K$ form with polynomial $a,b,W$ uniformly $\varepsilon$-close (Stone–Weierstrass). Flagged caveats: this is an approximation, not an identifiability statement; it covers only the polynomial restriction; it needs compactness; it gives no rate.

II.1.3 Standing assumptions (the six conditions enforced in code)

| Condition | Statement | Meaning |
| --- | --- | --- |
| (C1) | $\mathbb E_\mu[X]=0$ | covariates centered |
| (C2) | $\mathbb E_\mu[a(X)]=0$ | additive component centered |
| (C3) | $\mathbb E_\mu[b(X)]=0$ | multiplicative component centered |
| (C4) | $W(\theta_0)=0$ for some $\theta_0\in\Theta$ | modulating matrix anchored |
| (C5) | $\operatorname{Cov}_\mu(X)$ full rank $d$; $a,b\in L^2$; $W\in C(\Theta)$ | bounded support/integrability |
| (C6) | $\theta_*\in\Theta^\circ=\{\theta:\theta_k\neq0\ \forall k\}$ | non-degeneracy of the reference |

The joint consequence of (C1)–(C4) is the centering of the framework: $\mathbb E_\mu[\Delta(X,\theta_{\text{ref}})]=0$ for every $\theta_{\text{ref}}$ — when an individual is "average", its deviation is zero. (C6) is genuine: a zero coordinate of $\theta_*$ kills the multiplicative term there and $b$ becomes unidentified in that coordinate; it is met for parameters bounded away from zero (rates, scales, precisions) or enforced by reparametrization ($\log\theta_{\text{ref}}$).
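The centering identity can be checked term by term from the canonical Level-2 form, holding $\theta_{\text{ref}}$ fixed and using linearity of expectation:

$$\mathbb E_\mu[\Delta(X,\theta_{\text{ref}})] = \underbrace{\mathbb E_\mu[a(X)]}_{=0\ \text{by (C2)}} + \underbrace{\mathbb E_\mu[b(X)]}_{=0\ \text{by (C3)}}\odot\theta_{\text{ref}} + W(\theta_{\text{ref}})\,\underbrace{\mathbb E_\mu[X]}_{=0\ \text{by (C1)}} = 0.$$

In this computation (C4) does not enter the expectation; its role is to fix the point $\theta_0$ at which the modulated term itself vanishes.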

In the implementation, (C2)/(C3) are enforced empirically by column-centering the design matrices $Z_a,Z_b$ in the R-side AMM design constructor (so $\operatorname{colMeans}(Z_a)=0$ exactly); no sum-to-zero constraint is imposed on the coefficients $a_{\text{coef}},b_{\text{coef}}$ (that would wrongly exclude effects with non-zero coefficient mean). (C4) is enforced by the parametrization $W(\theta)=W_0(\theta)-W_0(\theta_0)$.
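A minimal sketch of the column-centering step, assuming a design stored as a list of rows. The helper name and the numbers are hypothetical. It also illustrates why no sum-to-zero constraint on the coefficients is needed: once the columns are centered, the fitted component has sample mean zero for any coefficient vector.

```python
def center_columns(Z):
    # Subtract each column's sample mean, so colMeans(Z) = 0 afterwards.
    n = len(Z)
    means = [sum(row[j] for row in Z) / n for j in range(len(Z[0]))]
    centered = [[row[j] - means[j] for j in range(len(row))] for row in Z]
    return centered, means

Z_a = [[1.0, 10.0],
       [2.0, 14.0],
       [6.0, 12.0]]
Z_a_centered, col_means = center_columns(Z_a)

# Coefficients with a deliberately non-zero mean (2.5).
a_coef = [2.0, 3.0]
a_values = [sum(z * c for z, c in zip(row, a_coef)) for row in Z_a_centered]
sample_mean_a = sum(a_values) / len(a_values)

centered_col_means = [
    sum(row[j] for row in Z_a_centered) / len(Z_a_centered) for j in range(2)
]
```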

II.1.4 Functional Independence Condition (FIC)

Let $\mathcal F_a,\mathcal F_b\subseteq L^2_0(\mu,\mathbb R^p)$, $\mathcal F_W\subseteq C(\Theta,\mathbb R^{p\times d})$ be the component classes. Define the subspaces $\mathcal S_a=\mathcal F_a$, $\mathcal S_b(\theta_*)=\{x\mapsto h(x)\odot\theta_*:h\in\mathcal F_b\}$, $\mathcal S_W=\{x\mapsto Mx:M\in\mathbb R^{p\times d}\}$.

Abstract FIC at $\theta_*$. $\mathcal S_a,\mathcal S_b(\theta_*),\mathcal S_W$ are linearly independent in $L^2(\mu,\mathbb R^p)$: $f_a+f_b+f_W=0\Rightarrow f_a=f_b=f_W=0$.

The basis-restricted FIC ($\text{FIC}_B$) is the same condition restricted to a finite basis $B=(B_a,B_b)$; it is strictly weaker than abstract FIC when the function classes are infinite-dimensional, and coincides when $B$ exhausts them.

II.1.5 Theorem 1A (algebraic-functional identifiability)

Under (LIN) ($\mathcal F_a,\mathcal F_b,\mathcal F_W$ finite-dimensional linear subspaces), (C1)–(C5), (C6) at $\theta_*$, and Abstract FIC at $\theta_*$, the latent function $\theta_i^*(\cdot)=\theta_*+a(\cdot)+b(\cdot)\odot\theta_*+W(\theta_*)\cdot$ uniquely determines $(\theta_*,a,b,W(\theta_*))$.

The proof has three steps: (1) taking $\mathbb E_\mu$ of both representations and using (C1)–(C3) forces $\theta_*=\theta_*'$ (centering pins $\theta_{\text{ref}}$ to the marginal mean of $\theta_i(X)$); (2) subtract the common reference, form differences $\delta_a,\delta_b,\delta_W$, which by (LIN) remain in the classes, giving $\delta_a(x)+\delta_b(x)\odot\theta_*+\delta_W x=0$; (3) Abstract FIC forces each summand to zero — $\delta_a=0$ directly, $\delta_b=0$ using (C6), $\delta_W=0$ using (C5) full-rank $X$. Scope: $W$ is pinned only at the single point $\theta_*$; functional identification of $W$ on the prior support is Theorem 1E.
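Written out, steps (1) and (2) read as follows. The primed tuple $(\theta_*',a',b',W')$ denotes a second admissible representation of the same latent function, so that for $\mu$-almost every $x$

$$\theta_*+a(x)+b(x)\odot\theta_*+W(\theta_*)\,x \;=\; \theta_*'+a'(x)+b'(x)\odot\theta_*'+W'(\theta_*')\,x.$$

Taking $\mathbb E_\mu$ on both sides, every term except the references vanishes by (C1)–(C3), which leaves

$$\theta_* \;=\; \theta_*'.$$

Substituting back and setting $\delta_a=a-a'$, $\delta_b=b-b'$, $\delta_W=W(\theta_*)-W'(\theta_*)$ gives the equation to which Abstract FIC is applied:

$$\delta_a(x)+\delta_b(x)\odot\theta_*+\delta_W\,x \;=\; 0.$$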

Necessity of FIC requires, in addition to (LIN) and (C1)–(C6), the hypothesis (EVAL): point evaluation $E_{\theta_*}:\mathcal F_W\to\mathbb R^{p\times d}$, $W\mapsto W(\theta_*)$, is surjective. If Abstract FIC fails (and (EVAL) holds), two distinct admissible tuples produce the same latent function — explicit non-identifiability. (LIN) lets perturbations stay in the classes; (EVAL) realizes the required $W$-shift at the anchor. Both are checkable in the finite representation and the library checks (EVAL) at the chosen anchor before fitting.

II.1.6 Lemma 1B (statistical bridge) and Corollary 1

Lemma 1B. Under Theorem 1A's hypotheses plus (D-ID) (the response family $\{\mathcal D(\theta)\}$ is identifiable in $\theta$: $\mathcal D(\theta)=\mathcal D(\theta')\Rightarrow\theta=\theta'$), the joint law of $(X_i,Y_i)$ determines $(\theta_*,a,b,W(\theta_*))$.

(D-ID) holds for one-parameter exponential families with canonical link, full-rank multi-parameter exponential families, ZIP/ZINB under independent variation of $\pi$ and $\theta$ (can fail at $\pi=1$), identifiable finite mixtures, and copula-linked multivariates under joint marginal+copula identifiability. (D-ID) is a hypothesis on the modelling setup, to be verified per response family, not a theorem of the framework. Corollary 1 restates the centering of the framework — $\mathbb E_\mu[\Delta(X,\theta_{\text{ref}})]=0$ — i.e. the Anchoring property is a theorem.

II.1.7 Proposition 1C (numerical verifiability — the Gram-matrix check)

Fix a basis $B$. The extended design matrix at $\theta_*$ is $\mathbf Z_n(\theta_*)=[\,\Phi_a(X_{1:n}),\ \Phi_b(X_{1:n})\odot\theta_*,\ X_{1:n}\,]\in\mathbb R^{np\times(J_a+J_b+d)}$, and $\mathbf G_n(\theta_*)=n^{-1}\mathbf Z_n^\top\mathbf Z_n\to\mathbf G(\theta_*)$ in probability.

For the chosen finite representation $B$, $\text{FIC}_B$ at $\theta_*$ holds iff $\mathbf G(\theta_*)$ is non-singular (Gram non-singularity $=$ column linear independence).

Caveats: it diagnoses basis-restricted FIC, not abstract FIC; $\mathbf G_n\to\mathbf G$ only asymptotically (finite-sample near-singularity is a warning); singular $\mathbf G_n$ means non-identifiability within $B$. Operationally, before fitting the implementation computes the smallest eigenvalue and condition number of $\mathbf G_n(\theta_*)$; if $\lambda_{\min}<$ tol (default $10^{-8}$) it aborts and reports the dependent directions. This is gdpar_check_identifiability().
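The check can be sketched for a scalar parameter ($p=1$) in plain Python. This is an illustration under stated assumptions, not the code behind gdpar_check_identifiability(): the helper names are hypothetical, the eigenvalues come from a small cyclic Jacobi routine, and the toy bases ($x^2-2$, $x^3$) are made up.

```python
import math

def gram(Z):
    # G_n = Z^T Z / n for a design stored as a list of rows
    n, m = len(Z), len(Z[0])
    return [[sum(Z[i][r] * Z[i][c] for i in range(n)) / n for c in range(m)]
            for r in range(m)]

def symmetric_eigenvalues(A, sweeps=100, tol=1e-24):
    # Cyclic Jacobi rotations; adequate for small symmetric matrices
    A = [row[:] for row in A]
    n = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                th = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))

def gram_check(phi_a, phi_b, X, theta_star, tol=1e-8):
    # Extended design [Phi_a | Phi_b * theta_* | X] for scalar theta (p = 1)
    Z = [ra + [v * theta_star for v in rb] + rx
         for ra, rb, rx in zip(phi_a, phi_b, X)]
    eig = symmetric_eigenvalues(gram(Z))
    lam_min, lam_max = eig[0], eig[-1]
    cond = lam_max / lam_min if lam_min > 0 else float("inf")
    return {"lambda_min": lam_min, "condition_number": cond,
            "identified": lam_min >= tol}

x = [-2.0, -1.0, 0.0, 1.0, 2.0]            # centered covariate
X = [[v] for v in x]
phi_b = [[v ** 3] for v in x]              # centered cubic basis for b
ok = gram_check([[v * v - 2.0] for v in x], phi_b, X, theta_star=1.5)
aliased = gram_check([[v] for v in x], phi_b, X, theta_star=1.5)
```

In the second call the additive basis repeats the raw covariate column, so two columns of the extended design coincide, the smallest eigenvalue collapses to numerical zero, and the check fails within that basis.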

II.1.8 Condition C4-bis (cross-coordinate aliasing for $p>1$)

For $p>1$ with the coord-wise factorization, coordinate $k$ contributes $\eta_{i,k}=\theta_{\text{ref},k}+Z_{a,k}[i,\cdot]a_{\text{coef},k}+\sum_j(\theta_{\text{ref},k}^j-\theta_{\text{anchor},k}^j)W_{\text{raw}}[r_{k,j},\cdot]X[i,\cdot]^\top+\cdots$. If $Z_{a,k}$ and the modulating design $X$ share a column by name, a perturbation $a_k'(x)=a_k(x)+c\,x_1$, $W_{k,x_1}'(\theta)=W_{k,x_1}(\theta)-c/(\theta_k^1-\theta_{\text{anchor},k}^1)$ leaves $\eta$ invariant at the diagnostic's single $\theta_{\text{ref},k}$ — aliasing.

(C4-bis) Coord-wise structural disjointness. For every $k$: $\operatorname{names}(Z_{a,k})\cap\operatorname{names}(X)=\emptyset$.

This is necessary but not sufficient (overlap enables aliasing; regularization can suppress it). The extended Gram matrix cannot detect it: at a fixed $\theta_{\text{ref}}$ the $W_{\text{per\_k\_dim}}$ columns collapse to scalar multiples of $X$ (false-positive rank deficit), and a Jacobian check at $W=0$ erases the very signal. So the package separates pre-fit (rank of $Z_{a,k}$ alone + name overlap with $X$, in check_C4_bis_per_k()) from post-fit posterior-geometry forensics (divergences, low ESS, high $\widehat R$). gdpar_check_identifiability(..., rigor=) offers "full" (default; aborts on overlap) and "fast" (warns of class gdpar_c4bis_overlap_warning, for users who intend the overlap and regularize the $W$ block via gdpar_prior()). The per-$k$ breakdown (passed, lambda_min/max, condition_number, shared_cols, collinear_directions) is in `report$c4_bis$per_k`.
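Both halves of the argument can be illustrated in a few lines: the name-overlap test itself, and the reason a single-reference diagnostic cannot see the alias. The sketch assumes one coordinate, one shared covariate $x_1$ and a first-order modulating term. All names and numbers are hypothetical and independent of check_C4_bis_per_k().

```python
def shared_columns(za_names_per_k, x_names):
    # Per coordinate k: column names that Z_{a,k} shares with X
    x_set = set(x_names)
    return {k: sorted(set(names) & x_set) for k, names in za_names_per_k.items()}

def eta(theta_ref, theta_anchor, a_coef, w, x1):
    # theta_ref + a_coef * x1 + (theta_ref - theta_anchor) * w * x1
    return theta_ref + a_coef * x1 + (theta_ref - theta_anchor) * w * x1

theta_ref, theta_anchor = 1.5, 1.0
a_coef, w, c, x1 = 0.7, 0.2, 0.4, 2.0

# Perturbation from the text: shift a by c * x1, shift W by -c / (theta_ref - theta_anchor)
w_shifted = w - c / (theta_ref - theta_anchor)

original = eta(theta_ref, theta_anchor, a_coef, w, x1)
perturbed = eta(theta_ref, theta_anchor, a_coef + c, w_shifted, x1)

# At a different reference value the two parameter sets no longer agree.
gap_elsewhere = (eta(2.5, theta_anchor, a_coef, w, x1)
                 - eta(2.5, theta_anchor, a_coef + c, w_shifted, x1))

overlap = shared_columns({1: ["age", "x1"], 2: ["age"]}, ["x1", "x2"])
```

The two parameter sets give the same linear predictor at the diagnostic's reference value and differ elsewhere, which is why the pre-fit check looks at column names rather than at a Gram matrix evaluated at a single $\theta_{\text{ref}}$.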

II.1.9 Condition C7 (per-group anchor anti-aliasing)

Block 6.5 promotes the reference to a per-group anchor $\theta_{\text{ref}}[g]$, $g=1,\dots,J_{\text{groups}}$, drawn from $\mathrm{Normal}(\mu_{\theta_{\text{ref}}},\sigma_{\theta_{\text{ref}}})$ (activated by group = ~ species; group = NULL reduces bit-exactly to the single-anchor regime). If $Z_a$ (or $Z_b$) has a column rank-deficient with the per-group indicator $G$ (e.g. constant within each group), the model is non-identified along $\theta_{\text{ref}}[g]\mapsto\theta_{\text{ref}}[g]+c$, $a\mapsto a-c\,e_{\text{col}}$.

(C7). When use_groups = 1, $\operatorname{rank}([G\mid Z_a])=\operatorname{ncol}(G)+\operatorname{ncol}(Z_a)$ and likewise for $Z_b$.

Enforced pre-fit by .check_group_aliasing_c7() in two layers: (1) within-group variance per column (catches constant-per-group / factor(group) aliases), (2) joint QR rank of normalized $[G\mid Z]$ (catches indirect aliases). Violation aborts with gdpar_input_error naming the columns. Together with C1–C4 (global Gram) and C4-bis (cross-component per coord), C7 completes a three-tier pre-flight; the post-fit forensic remains the posterior geometry.
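A sketch of the two layers on a toy design. It is not the code of .check_group_aliasing_c7(): the helper names are hypothetical, the columns are not normalized, and the joint rank is computed by Gaussian elimination with pivoting rather than by QR.

```python
def constant_within_groups(column, groups, tol=1e-12):
    # Layer 1: is the within-group variance zero in every group?
    by_group = {}
    for v, g in zip(column, groups):
        by_group.setdefault(g, []).append(v)
    for vals in by_group.values():
        m = sum(vals) / len(vals)
        if sum((v - m) ** 2 for v in vals) / len(vals) > tol:
            return False
    return True

def matrix_rank(M, tol=1e-10):
    # Layer 2 helper: rank by Gaussian elimination with partial pivoting
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            f = A[r][c] / A[rank][c]
            for k in range(c, cols):
                A[r][k] -= f * A[rank][k]
        rank += 1
        if rank == rows:
            break
    return rank

groups = ["a", "a", "b", "b", "c", "c"]
levels = sorted(set(groups))
G = [[1.0 if g == lev else 0.0 for lev in levels] for g in groups]

x = [0.1, -0.3, 0.5, 0.2, -0.4, -0.1]        # varies within groups
site_mean = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]   # constant within each group

flagged = [name for name, col in [("x", x), ("site_mean", site_mean)]
           if constant_within_groups(col, groups)]

rank_ok = matrix_rank([g + [xi] for g, xi in zip(G, x)])
rank_bad = matrix_rank([g + [xi, si] for g, xi, si in zip(G, x, site_mean)])
```

With three groups, $[G\mid x]$ reaches the full rank $3+1$, whereas adding the group-constant column leaves the rank at 4 instead of the required $3+2$, which is the (C7) violation.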

II.1.10 Theorem 1E (functional identifiability of $W$ on the prior support)

In the Bayesian setting $\theta_{\text{ref}}\sim\pi_\Theta$. Theorem 1A pins $W$ only at $\theta_*$; outside $\operatorname{supp}(\pi_\Theta)$ no information about $W$ accrues. Three tiers, under Theorem 1A $\pi_\Theta$-a.e. and (D-ID):

  • (a) $W=W'$ $\pi_\Theta$-a.e. on $\operatorname{supp}(\pi_\Theta)$;
  • (b) $W=W'$ in $L^2(\pi_\Theta;\mathbb R^{p\times d})$ (the recommended default conclusion);
  • (c) $W=W'$ in $C(\overline{\operatorname{supp}(\pi_\Theta)})$, additionally requiring (BAY-1) support $=$ closure of a connected open $U$, (BAY-2) $W$ continuous (subsumed by (C5)), (BAY-3) $\pi_\Theta$ charges every non-empty open subset of $U$ (so the a.e.-identification set is dense). These are automatic for absolutely continuous priors with positive density on a connected open set.

No tier identifies $W$ outside $\overline{\operatorname{supp}(\pi_\Theta)}$: out-of-support reference predictions rely entirely on the prior and must be reported as such.

II.1.11 Proposition 1F (hypernetwork, Path 3) and the discrimination protocol

For the hypernetwork, $\mathcal F_a,\mathcal F_b,\mathcal F_W$ are nonlinear manifolds (images of feedforward nets), so (LIN) fails and Theorem 1A does not transfer. Then: (i) the realized function $\Phi_\phi(x,\theta)=a_\phi(x)+b_\phi(x)\odot\theta+W_\phi(\theta)x$ is the object of inference; (ii) the weights $\phi$ are not identifiable (permutation/sign/rescaling symmetries); (iii) under (D-ID), $\Phi_\phi$ is identifiable up to $L^2(\mu\otimes\pi_\Theta)$-equivalence — a function-level claim strictly weaker than identifiability of the AMM decomposition into $(a_\phi,b_\phi,W_\phi)$, which is an open question. Universal approximation (Hornik 1991; Pinkus 1999) gives density, not identifiability — the package records this distinction explicitly.

When Path 3 diverges from Path 1 on the same data, a four-step empirical protocol discriminates "richer structure" from "undetected non-identifiability": (1) stability across $\ge5$ random seeds (across-run SD vs within-run SD; "stable" iff $\le0.10\times$); (2) stratified $k$-fold ELPD comparison ($\overline{\Delta\text{elpd}}>2\text{SE}$ and positive in $\ge80\%$ folds $\Rightarrow$ real structure); (3) posterior-predictive calibration at $\alpha\in\{0.05,0.20,0.50\}$; (4) component-wise decomposition of $\Delta\theta(x)$ into additive/multiplicative/modulated shares. A decision rule (5-row table) maps the joint outcome to a verdict (richer structure / mixed / no advantage / non-identifiability / model misspecified). The protocol is evidence-weighted, never a formal certificate.
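The thresholds in steps (1) and (2) can be written down as two small predicates. This is only a sketch: the function names are hypothetical, the fold differences are made-up numbers, and the standard error is assumed to be the standard error of the mean across folds, which the text does not specify.

```python
import statistics

def is_stable(across_run_sd, within_run_sd, ratio=0.10):
    # Step 1: across-seed SD is at most 0.10 x the within-run SD
    return across_run_sd <= ratio * within_run_sd

def elpd_favours_richer(fold_diffs, se_mult=2.0, min_share=0.80):
    # Step 2: mean ELPD gain exceeds 2 SE and is positive in >= 80% of folds.
    # Assumption: SE = sample SD of the fold differences / sqrt(number of folds).
    k = len(fold_diffs)
    mean_diff = sum(fold_diffs) / k
    se = statistics.stdev(fold_diffs) / k ** 0.5
    share_positive = sum(d > 0 for d in fold_diffs) / k
    return mean_diff > se_mult * se and share_positive >= min_share

consistent_gain = elpd_favours_richer([3.1, 2.4, 4.0, 1.8, 2.9])
noisy_gain = elpd_favours_richer([3.0, -2.5, 0.4, -0.2, 0.1])
stable = is_stable(across_run_sd=0.02, within_run_sd=0.5)
unstable = is_stable(across_run_sd=0.20, within_run_sd=0.5)
```

These predicates cover only two of the four steps; the verdict in the text combines them with the calibration and decomposition results.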

II.1.12 Special cases and component selection

Standard models are Theorem-1A special cases verified to satisfy (LIN): standard regression (Level 0), random coefficients (Level 1, identifiability $\Leftrightarrow$ $\operatorname{Cov}_\mu(X)$ full rank), varying-coefficient (Level 2, $b\equiv0$, spline classes), hierarchical Bayes with multiplicative interaction (Level 2, $W\equiv0$; (C6) critical), full canonical AMM (Level 2). The hypernetwork falls outside (LIN), into Proposition 1F.

Component selection over the eight restrictions $M_S$, $S\subseteq\{a,b,W\}$, uses $\widehat{\text{elpd}}_{\text{loo}}$ (Vehtari–Gelman–Gabry 2017) with differences $>2$ SE substantive; a stratified PSIS-LOO computed within $x$-quantile/categorical subgroups localizes structural heterogeneity and is reported by default. Frequentist (Path 2) selection uses generalized LRTs on effective d.f. (Wood 2017); Path 3 compares post-training component norms $\lVert a_\phi\rVert,\lVert b_\phi\rVert,\lVert W_\phi\rVert$ to the regularization scale to prune dormant components.


II.2 Asymptotic theory of the three paths

The package develops asymptotics for all three paths to reference grade (only Path 1 is executable). The reference text throughout is Ghosal & van der Vaart (2017); AMM-specific theorems are specializations, with explicit statements of what is established, what the AMM specialization costs in extra hypotheses, and what remains open.

II.2.1 Path 1 (Hierarchical Bayesian) — three layers

The Path-1 model places priors on every component: $\theta_{\text{ref}}\sim\pi_\Theta$, $a\sim\pi_a$, $b\sim\pi_b$, $W\sim\pi_W$, joint prior $\pi=\pi_\Theta\otimes\pi_a\otimes\pi_b\otimes\pi_W$, posterior $\Pi_n\propto\pi\prod_i p(y_i\mid\theta_i)$. Write $\eta=(\theta_{\text{ref}},a,b,W)$, true value $\eta_*$. Two design regimes are distinguished because conclusions differ: (R-RANDOM) $X_i\overset{\text{iid}}\sim\mu$ (default) and (R-COND) fixed/conditional-on-design; they coincide under (R-EQUIV) (positive density of $\mu$ + uniformly Glivenko–Cantelli classes), automatic for finite-dim parametric classes.

Two distances, no global equivalence. Hellinger $d_H$ on the data distribution (the natural metric for Bayesian contraction) and $L^2(\mu)$ on the deviation function (the interpretable metric for the components). They are locally equivalent at $\eta_*$ only in the parametric smooth full-rank-Fisher case ($c_1 d_{L^2}\le d_H\le c_2 d_{L^2}$ on a neighborhood); globally and in non-parametric cases they can diverge. Theorems 4A/4B are stated in $d_H$ and specialized to $d_{L^2}$ only locally.

The three asymptotic layers parallel the three identifiability layers:

  • (L1) Posterior consistency: $\Pi_n(\{d(\eta,\eta_*)>\varepsilon\})\xrightarrow{P}0$ for every $\varepsilon>0$.
  • (L2) Contraction rate: $\exists\,\varepsilon_n\to0$, $n\varepsilon_n^2\to\infty$, with $\Pi_n(\{d>M\varepsilon_n\})\xrightarrow{P}0$.
  • (L3) Bernstein–von Mises: $\Pi_n(\sqrt n(\eta-\widehat\eta_n)\in\cdot)\xrightarrow{w}\mathcal N(0,I_*^{-1})$ in total variation.

Standing asymptotic hypotheses (additional to C1–C6, LIN, D-ID, IID): (PRIOR-KL) $\pi(B_\varepsilon(\eta_*))>0$ for all $\varepsilon>0$ (KL-ball $B_\varepsilon(\eta_*)=\{\eta: K(\eta_*,\eta)\le\varepsilon^2,\ V(\eta_*,\eta)\le\varepsilon^2\}$); (PRIOR-THICK) $\pi(B_{\varepsilon_n}(\eta_*))\ge e^{-C_1 n\varepsilon_n^2}$; (SIEVE) sieves $\Theta_n$ with $\pi(\Theta_n^c)\le e^{-C_2 n\varepsilon_n^2}$ and bracketing entropy $\log N_{[]}(\varepsilon_n,\Theta_n,d_H)\le C_3 n\varepsilon_n^2$; (TEST) tests with exponential type-I/II error decay; (LAN) local asymptotic normality at $\eta_*$ with non-singular Fisher information $I_*$.

Theorem 4A (posterior consistency). Under C1–C6, LIN, D-ID, the Block-2 regularity (HOM)+(REG)+(IID), (PRIOR-KL), (TEST), and finite bracketing entropy on sieves with $\pi(\Theta_n^c)\to0$: $\Pi_n(\{d_H(\eta,\eta_*)>\varepsilon\})\xrightarrow{P_{\eta_*}}0$. (Schwartz 1965 specialized to AMM; novelty is verifying PRIOR-KL and entropy for the product prior, which under LIN reduces to prior positivity at $\eta_*$.)

Theorem 4A discharges (REG-EST) of Block 2 in average-error form: the posterior-mean individual parameter $\widehat\theta_i^{\text{Bayes}}=\int[\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})]\,d\Pi_n$ satisfies $\frac1n\sum_i\|\widehat\theta_i^{\text{Bayes}}-\theta_*(x_i,\theta_{\text{ref}})\|\xrightarrow{P}0$. The uniform-in-$i$ version needs sup-norm contraction (extra hypotheses, not implied by 4A).

Theorem 4B (contraction rate). Adding (PRIOR-THICK) and (SIEVE) for $\varepsilon_n\to0$, $n\varepsilon_n^2\to\infty$: $\Pi_n(\{d_H>M\varepsilon_n\})\xrightarrow{P}0$ (Ghosal–Ghosh–van der Vaart 2000 specialized to AMM).

Rates by Level: Level 0/1 parametric $\varepsilon_n=n^{-1/2}$; VCM fixed-$J$ splines $(J/n)^{1/2}$, adaptive matched-smoothness $n^{-\beta/(2\beta+d)}$; Level 2 parametric $n^{-1/2}$, non-parametric determined by the slowest component subject to prior-matching caveats (the joint rate need not be a simple combination of component rates — numerical verification recommended). The library implements empirical contraction verification: refit at $n_1<\dots<n_K$, track credible-set diameters, and warn when the empirical rate diverges from prediction (signaling prior or model misspecification).
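
The empirical contraction check can be pictured with a toy case where the answer is known in closed form. The sketch below is base R and does not call the package: it uses a normal mean with known variance and a flat prior, where the 95% credible-interval width is deterministic, regresses log-width on log-$n$, and compares the slope with the parametric prediction $-1/2$. The function name `credible_width` and the warning tolerance are assumptions made for illustration.

```r
# Normal mean, known sigma, flat prior: the posterior is Normal(ybar, sigma^2 / n),
# so the credible-interval width depends on n only.
credible_width <- function(n, sigma = 1, level = 0.95) {
  2 * qnorm(1 - (1 - level) / 2) * sigma / sqrt(n)
}

n_grid <- c(100, 200, 400, 800, 1600)   # n_1 < ... < n_K
widths <- credible_width(n_grid)

# Empirical rate = slope of log(width) on log(n); parametric prediction is -1/2.
empirical_rate <- unname(coef(lm(log(widths) ~ log(n_grid)))[2])
predicted_rate <- -0.5

# Hypothetical tolerance for flagging a mismatch.
if (abs(empirical_rate - predicted_rate) > 0.1) {
  warning("empirical contraction rate deviates from the predicted rate")
}
print(empirical_rate)
```

In a real fit the widths would come from refitting the model at each $n_k$, so the slope would be noisy and the tolerance would need to reflect that.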

Theorem 4C (Bernstein–von Mises). For finite-dim parametric AMM (Levels 0/1/2 with finite-dim classes), under 4A+4B, (LAN), and a consistent $\sqrt n$-MLE: the posterior is asymptotically $\mathcal N(\widehat\eta_n,n^{-1}I_*^{-1})$ in total variation. Consequence: Bayesian credible intervals and asymptotic frequentist CIs coincide in the limit — this justifies reporting credible intervals as the primary uncertainty.

Proposition 4C-semi (semiparametric BvM). With parametric $\theta_{\text{ref}}$ and non-parametric $(a,b,W)$, under Castillo–Rousseau (2015) conditions ($\sqrt n$-recoverability + least-favorable-direction-aware prior), the marginal posterior of $\theta_{\text{ref}}$ is $\sqrt n$-asymptotically normal at the semiparametric efficiency bound $V_*$. Tight scope: this is only for the marginal of $\theta_{\text{ref}}$ — the function-valued $(a,b,W)$ need not be asymptotically Gaussian in a function-space metric, and the library reports their intervals as posterior quantiles (function-space credible balls), never as $\sqrt n$ Gaussian intervals.

Open questions explicitly recognized: (O1) full BvM for non-parametric components (only partial Sobolev-topology results exist); (O2) adaptive contraction rates for general AMM Path-1 priors; (O3) misspecification asymptotics under failure of (HOM)/(REG) (contraction to a KL-projection pseudo-true parameter, Kleijn–van der Vaart 2012). Implementation diagnostics: prior KL-support report, Stan $\widehat R$/ESS/divergences as indirect (TEST)/(SIEVE) checks, and a BvM calibration check comparing credible vs ML-Hessian intervals.

II.2.2 Path 2 (Varying-coefficient) and Path 3 (hypernetwork) — reference theory

These paths are not executable in gdpar 0.1.0 but carry reference-grade asymptotics:

  • Path 2 (VCM, vignette v05). Frequentist penalized-spline asymptotics: pointwise and uniform consistency of $\widehat\beta(\cdot)$, asymptotic normality at the spline rate $n^{-\beta/(2\beta+1)}$, with the reference recovered as $\beta(\bar z)$ and the deviation as $\beta(z)-\beta(\bar z)$; conditions specialize Fan–Zhang (2008), Stone (1985), Wood (2017). The curse of dimensionality in $z$ is the binding limitation.
  • Path 3 (hypernetwork, vignette v06). Only partial results are available for Bayesian neural networks: consistency under the Neural-Tangent-Kernel regime (Jacot et al. 2018; Bach 2017), PAC-Bayes generalization bounds (Dziugaite–Roy 2017), and an explicit acknowledgement that BvM and contraction rates are open (Hron et al. 2020). Universal approximation gives density, not identifiability; the function-level identifiability of $\Phi_\phi$ and its contraction are open (cf. Proposition 1F).

The cross-path consistency: parametric AMM gets the full $n^{-1/2}$ rate and full BvM; non-parametric AMM gets the smoothness-dependent rate and semiparametric BvM for the parametric subset only.


II.3 Empirical Bayes vs full Bayes

Partition $\eta=(\theta_{\text{ref}},\xi)$ with upper-level reference $\theta_{\text{ref}}$ and lower-level $\xi=(a,b,W)$. EB maximizes the marginal likelihood $L_n^{\text{marg}}(\theta_{\text{ref}})=\int p(Y_{1:n}\mid\theta_{\text{ref}},\xi)\pi_\xi(\xi)\,d\xi$ to get $\widehat\theta_{\text{ref}}^{\text{EB}}=\arg\max L_n^{\text{marg}}$, then conditions: $\Pi_n^{\text{EB}}(\xi)\propto p(Y_{1:n}\mid\widehat\theta_{\text{ref}}^{\text{EB}},\xi)\pi_\xi(\xi)$. FB gives $\theta_{\text{ref}}$ a prior $\pi_\Theta$ and integrates it out. Comparison metric: $d_{\text{TV}}(\Pi_n^{\text{EB}},\Pi_n^{\text{FB}})$. Standing hypotheses: (EB-MARG-ID) unique marginal-likelihood maximum with non-singular $I_{\theta\theta}^{\text{marg}}$; (PRIOR-FB-WEAK) $\operatorname{Var}_n(\theta_{\text{ref}}^*)/\operatorname{Var}_\pi(\theta_{\text{ref}})\to0$; (HIER-COMPLEX) bounded upper-level hyperparameters.

Theorem 7A (first-order equivalence). Under regularity + the three hypotheses, EB and FB lower-level posteriors agree asymptotically. Regime A (finite-dim parametric AMM): $d_{\text{TV}}(\Pi_n^{\text{EB}},\Pi_n^{\text{FB}})\xrightarrow{P}0$. Regime B (non-parametric AMM): TV is too strong; convergence holds for smooth bounded $L^2(\mu)$-Lipschitz functionals (equivalently Wasserstein-1/Prokhorov on the joint posterior). Specializes Petrone–Rousseau–Scricciolo (2014) and Rousseau–Szabo (2017). Practical content: for large $n$ + weak prior, EB and FB give essentially the same posterior over $\xi$; the choice is then computational/methodological.

Proposition 7B (higher-order coverage). Under Edgeworth-type expansion conditions (Bickel–Ghosh 1990), EB credible intervals for a smooth functional $g(\xi)$ under-cover by $O(n^{-1})$: $\mathbb P(g(\xi^*)\in\mathrm{CI}_n^{\text{EB},\alpha})=(1-\alpha)-C_{g,\alpha}n^{-1}+o(n^{-1})$, with $C_{g,\alpha}\approx(g'(\xi^*))^2/I_{\theta\theta}^{\text{marg}}\cdot\kappa(\alpha)$ (larger when $g$ is sensitive to $\theta_{\text{ref}}$, smaller when $\theta_{\text{ref}}$ is well-identified); FB covers to first order. The library applies a post-hoc inflation $\sqrt{1+C_{g,\alpha}/(n-q)}$ (argument eb_correction=TRUE), explicitly approximate.
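
To make the size of the inflation factor concrete, the sketch below applies $\sqrt{1+C_{g,\alpha}/(n-q)}$ to a symmetric interval in base R. It is illustrative only: `inflate_interval` is a hypothetical helper, the value of $C_{g,\alpha}$ is supplied by hand, and widening the interval about its midpoint is an assumption about how the factor is applied, since the text does not specify it.

```r
# Hypothetical helper: widen an interval about its midpoint by
# sqrt(1 + C / (n - q)); C, n and q are supplied by the caller.
inflate_interval <- function(lower, upper, C, n, q) {
  stopifnot(n > q, C >= 0, upper >= lower)
  infl <- sqrt(1 + C / (n - q))
  mid  <- (lower + upper) / 2
  half <- (upper - lower) / 2 * infl
  c(lower = mid - half, upper = mid + half, factor = infl)
}

res <- inflate_interval(-1, 1, C = 3, n = 103, q = 3)
print(res)
```

With $C=3$ and $n-q=100$ the factor is $\sqrt{1.03}$, a widening of about 1.5%, which matches the $O(n^{-1})$ order of the under-coverage.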

Theorem 7C (compound decision, Robbins–Efron). For $K$ exchangeable units, $\frac1K\sum_k\mathbb E[(\widehat\xi_k^{\text{EB}}-\xi_k^*)^2]\le\frac1K\sum_k\mathbb E[(\widehat\xi_k^{\text{FB}}-\xi_k^*)^2]+B_K$ with $B_K\le\frac{C_1}{K}\mathbb E[(\widehat\theta_{\text{ref}}^{\text{EB}}-\theta_{\text{ref}}^*)^2]\to0$: EB risk approaches FB risk as $K\to\infty$ (squared-error loss on point estimates only; coverage still under-covers per 7B).

Proposition 7D. EB and FB differ substantially when: (i) small $I_{\theta\theta}^{\text{marg}}$ (poorly identified upper level); (ii) strongly informative $\pi_\Theta$; (iii) multimodal $L_n^{\text{marg}}$; (iv) misspecified $\pi_\xi$ (EB regularizes by tuning $\theta_{\text{ref}}$, FB does not).

Default is FB (gdpar()); EB (gdpar_eb()) is opt-in. EB's independent methodological advantages: honest avoidance of a prior on $\theta_{\text{ref}}$, and a deterministic, reproducible, frequentist-interpretable $\widehat\theta_{\text{ref}}^{\text{EB}}$.

Multivariate / multi-slot extension (vignette v07b): Theorem 7A* (to $\theta_{\text{ref}}\in\mathbb R^p$), Proposition 7B* (matricial coverage correction), Theorem 7C* (compound multi-slot, $K>1$), Proposition 7D* (four conditions extended), and open question O5*-EBFB on the anti-fragility of the Laplace Hessian under $p>1,K>1$.

The four gdpar_eb() path regimes (dispatched from the resolved $(K,p)$), each with a marginal+conditional Stan template pair and its own Proposition-7B correction:

| Regime | $(K,p)$ | Stan template pair | 7B correction |
|---|---|---|---|
| Base | $K=1,p=1$ | amm_eb_marginal.stan + amm_eb_conditional.stan | scalar |
| Path A | $K=1,p>1$ | amm_eb_marginal_multi.stan + ..._conditional_multi.stan | matricial 7B* |
| Path B | $K>1,p=1$ | amm_eb_marginal_K.stan + ..._conditional_K.stan | per-slot scalar |
| Path C | $K>1,p>1$ | amm_eb_marginal_KxP.stan + ..._conditional_KxP.stan | tensor $\mathbb R^{K\times p\times p}$ |

The canonical EB recipe (three steps): (i) marginal-likelihood maximization for $\widehat\theta_{\text{ref}}^{\text{EB}}$ via Laplace (cmdstanr::laplace() with multi-start + Levenberg–Marquardt ridge + condition-number guard — the anti-fragility strategy); (ii) plug $\widehat\theta_{\text{ref}}^{\text{EB}}$ as data into a conditional Stan model; (iii) sample the conditional posterior of $\xi$. The companion gdpar_compare_eb_fb() operationally verifies Theorem 7A (marginal TV) and Proposition 7B (per-cell width ratio).
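
The three steps can be followed end to end on a toy normal-normal hierarchy where everything is available in closed form. The base-R sketch below is a stand-in under stated assumptions: it replaces the Laplace and Stan machinery with a one-dimensional `optimize()` call and an analytic conditional posterior, and all variable names are illustrative rather than package objects.

```r
set.seed(1)
K <- 50
theta_ref_true <- 2
xi <- rnorm(K, mean = theta_ref_true, sd = 1)   # lower-level parameters
y  <- rnorm(K, mean = xi, sd = 1)               # one observation per unit

# (i) Marginal likelihood: integrating xi out gives y_k ~ Normal(theta_ref, sqrt(2)).
neg_log_marginal <- function(theta_ref) {
  -sum(dnorm(y, mean = theta_ref, sd = sqrt(2), log = TRUE))
}
theta_ref_eb <- optimize(neg_log_marginal, interval = range(y))$minimum

# (ii) Plug the estimate in as a fixed value.
# (iii) Conditional posterior of each xi_k given theta_ref_eb, in closed form:
#       Normal((y_k + theta_ref_eb) / 2, sqrt(1 / 2)).
xi_post_mean <- (y + theta_ref_eb) / 2
xi_post_sd   <- sqrt(1 / 2)

print(c(eb = theta_ref_eb, sample_mean = mean(y)))
```

In this toy model the marginal maximizer coincides with the sample mean, which gives a quick sanity check on the optimizer.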


II.6 Parametrizations and the pre-flight decision (CP / NCP / linear)

Hierarchical AMM posteriors can suffer the classic centered/non-centered pathology of hierarchical models. The package treats the parametrization of the multiplicative interaction $b(x)\odot\theta_{\text{ref}}$ as a first-class, data-driven decision rather than a fixed choice, and selects it pre-fit. Source: R/preflight.R, R/preflight_multi.R, R/contraction_diagnostic.R.

II.6.1 The three parametrizations

For the multiplicative term, write the contribution of the coordinate involving the reference as $c_b=\theta_{\text{ref}}\cdot b_{\text{coef}}$ on the linear-predictor scale. Three parametrizations are available:

  • Centered (CP). Sample $b_{\text{coef}}$ directly; the term $\theta_{\text{ref}}\cdot b_{\text{coef}}$ couples $b_{\text{coef}}$ to $\theta_{\text{ref}}$ multiplicatively. CP mixes well when the data are informative about the interaction (the likelihood dominates the funnel).
  • Non-centered (NCP). Reparametrize $b_{\text{coef}}=\mu_b+\tau_b\,\tilde b$ with $\tilde b\sim\mathrm{Normal}(0,1)$, decoupling the prior geometry; NCP mixes well when the data are weakly informative (prior-dominated funnel).
  • Linear reparametrization. Sample the product $c_b=\theta_{\text{ref}}\cdot b_{\text{coef}}$ directly as a linear coordinate, sidestepping the bilinear $(\theta_{\text{ref}},b)$ geometry altogether. This is the package's resolution of the deeper diagnosis (below): the root cause of the funnel is the non-linear $(\theta_{\text{ref}},b)$ parametrization, not the centering per se; sampling the linear product removes the bilinearity at the source.
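
The relationship between the three coordinates can be checked by simulation. The base-R sketch below draws from the prior only (no likelihood, no HMC), so it shows that the centered and non-centered forms describe the same distribution for $b_{\text{coef}}$ and that $c_b$ is the product coordinate; the numerical values of `mu_b`, `tau_b` and `theta_ref` are arbitrary choices for illustration.

```r
set.seed(42)
mu_b    <- 0.5
tau_b   <- 2
n_draws <- 1e5

# Centered: draw b_coef directly from its prior.
b_cp <- rnorm(n_draws, mean = mu_b, sd = tau_b)

# Non-centered: draw a standard-normal b_tilde, then shift and scale.
b_tilde <- rnorm(n_draws)
b_ncp   <- mu_b + tau_b * b_tilde

# Linear coordinate: the product c_b = theta_ref * b_coef itself.
# For a fixed theta_ref it is a plain rescaling of b_coef.
theta_ref <- 1.5
c_b <- theta_ref * b_ncp

print(c(mean_cp = mean(b_cp), mean_ncp = mean(b_ncp),
        sd_cp = sd(b_cp), sd_ncp = sd(b_ncp)))
```

The two parametrizations differ only in the geometry the sampler sees, not in the model they define.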

The diagnosis that led here proceeded in three iterations — NCP did not cure it, CP did not cure it, and the residual pathology was traced to the bilinear coordinate; the final fix samples $c_b$ linearly. This is the maximum-robustness, root-cause resolution rather than a symptom patch.

II.6.2 The pre-flight decision

The choice among CP/NCP/(linear) is made by a pre-flight procedure that runs a short pilot and computes an information ratio discriminating prior-dominated from likelihood-dominated regimes, then dispatches:

  • preflight_parametrization() (scalar) / preflight_parametrization_multi() ($p>1$, per coordinate) run the pilot sample and the attribution/info-ratio computation.
  • The information ratio contrasts how much of the posterior variation in the interaction is attributable to the data vs the prior; an asymptotic $z$/$t$-style test on it, evaluated against thresholds, picks the regime. Defaults: tau_cp = 5, tau_ncp = 2 (the boundary thresholds of the ratio); a high ratio (data-informative) selects CP, a low ratio selects NCP.
  • The variance/score machinery is made dependence-aware: effective weights + a chain-aware block bootstrap + an asymptotic z-test give the ratio's sampling distribution without assuming independence of the pilot draws (Path B′ canonical design). The per-coordinate variant decides each coordinate of $\theta_{\text{ref}}\in\mathbb R^p$ separately and resolves a global vs per-dimension decision.
  • resolve_parametrization() / resolve_parametrization_multi() turn the diagnostic verdict into the concrete Stan-side toggle used by the code generator. preflight_global_decision() and preflight_per_dim() are the exported user-facing entry points; decision_to_logical() maps the verdict to the boolean the template consumes.
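
The threshold logic can be sketched as a small dispatch function. This is a hypothetical illustration, not the package's implementation: `choose_parametrization` is an invented name, the defaults mirror `tau_cp = 5` and `tau_ncp = 2` from the text, and both the inclusive comparisons and the label returned between the two thresholds are assumptions, because the text does not say how that band is resolved.

```r
# Hypothetical dispatch on the information ratio.
# High ratio (data-informative) -> CP; low ratio (prior-dominated) -> NCP.
# The band between the thresholds is NOT specified in the text and is
# returned as "undetermined" here.
choose_parametrization <- function(info_ratio, tau_cp = 5, tau_ncp = 2) {
  stopifnot(is.numeric(info_ratio), length(info_ratio) == 1, tau_ncp < tau_cp)
  if (info_ratio >= tau_cp) {
    "CP"
  } else if (info_ratio <= tau_ncp) {
    "NCP"
  } else {
    "undetermined"
  }
}

verdicts <- vapply(c(0.5, 3, 12), choose_parametrization, character(1))
print(verdicts)
```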

Confounding-induced NCP preference is treated as correct, not a defect: when a covariate confounds the reference, the prior-dominated geometry genuinely calls for NCP, and the diagnostic is designed to detect exactly that. The whole apparatus is data-driven: no parametrization is hard-coded; the knob is set from a declared, reproducible statistic of a pilot fit.

II.6.3 The contraction diagnostic

gdpar_contraction_diagnostic() (R/contraction_diagnostic.R) operationalizes the empirical contraction-rate verification of Theorem 4B (§II.2.1): it refits at increasing sample sizes and tracks the posterior credible-set diameter, flagging deviations from the predicted rate $\varepsilon_n$ as evidence of prior or model misspecification. It is the runtime instrument behind the asymptotic theory's "numerical verification of contraction".


II.4 Distributional families and links

A family is represented internally as an ordered list of per-parameter specifications (gdpar_param_spec), one per slot $k$. Each spec carries: a name ($\mu,\sigma,\phi,\nu,\pi,p,\dots$), a link (and its inverse), a family_role ∈ {location, scale, shape, df, mixture_pi, power}, a support (real_line/positive_real/unit_interval/bounded_open), an identifiability descriptor (did_status ∈ {holds, holds_under_condition, user_responsible} with condition + reference, i.e. the (D-ID) hypothesis of Lemma 1B made per-slot), a canonical prior kind, and a scope (per_observation or population). The link factory implements exactly three links:

$$\text{identity: } g(\mu)=\mu,\ g^{-1}(\eta)=\eta;\qquad \text{log: } g(\mu)=\log\mu,\ g^{-1}(\eta)=e^{\eta};\qquad \text{logit: } g(\mu)=\log\tfrac{\mu}{1-\mu},\ g^{-1}(\eta)=\tfrac{1}{1+e^{-\eta}}.$$
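
A link factory of this shape is easy to write down. The following base-R sketch mirrors the three formulas above; `make_link` and the field names `link` and `inverse` are illustrative choices and are not claimed to match the package's internal constructor.

```r
# Hypothetical link factory: returns the link g and its inverse g^{-1}.
make_link <- function(name) {
  switch(name,
    identity = list(name = name,
                    link    = function(mu) mu,
                    inverse = function(eta) eta),
    log      = list(name = name,
                    link    = function(mu) log(mu),
                    inverse = function(eta) exp(eta)),
    logit    = list(name = name,
                    link    = function(mu) log(mu / (1 - mu)),
                    inverse = function(eta) 1 / (1 + exp(-eta))),
    stop("unsupported link: ", name)
  )
}

# Round trip on the logit link: g^{-1}(g(mu)) recovers mu.
lk <- make_link("logit")
mu <- c(0.1, 0.5, 0.9)
print(lk$inverse(lk$link(mu)))
```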

The built-in roster (location slot gets the user link; auxiliaries get fixed links — log for positive-real, logit for unit-interval, identity for the Tweedie power):

| Family | Slots (role, link) |
|---|---|
| gaussian | $\mu$ (location), $\sigma$ (scale, log) |
| poisson | $\mu$ (location) |
| neg_binomial_2 | $\mu$ (location), $\phi$ (scale, log) |
| bernoulli | $\mu$ (location) |
| beta | $\mu$ (location), $\phi$ (scale, log) |
| gamma | $\mu$ (location), shape (log) |
| student_t | $\mu$, $\sigma$ (log), $\nu$ (df, log) |
| tweedie | $\mu$ (log), $\phi$ (log), $p$ (power, identity, bounded $(1,2)$) |
| zip | $\mu$ (location), $\pi$ (mixture_pi, logit) |
| zinb | $\mu$, $\phi$ (log), $\pi$ (logit) |
| hurdle_poisson | $\mu$, $\pi$ (logit) |
| hurdle_neg_binomial_2 | $\mu$, $\phi$ (log), $\pi$ (logit) |

Each family carries an integer stan_id selecting the likelihood branch in the Stan templates (Part V). Auxiliary slots default to population scope and can be promoted to per_observation (their own AMM decomposition) by the user — that is distributional regression: $\theta_i^{(k)}=\theta_{\text{ref}}^{(k)}+\Delta^{(k)}(x_i,\theta_{\text{ref}}^{(k)})$ on the slot's link scale. The package also supports heterogeneous per-slot families (different sub-family per slot) and custom families (gdpar_family_custom, gdpar_family_custom_K) where the user declares the likelihood and accepts responsibility for (D-ID) via did_override. The exported gdpar_family, gdpar_family_multi (the $p>1$ multivariate slot constructor) and the print methods complete the API. Zero-inflation/hurdle realize the dual deviation of §I.6 (both $\pi$ and the count parameter anchored). Full per-function detail: Part IV, R/families.R; per-likelihood Stan math: Part V.

II.5 The modulating basis W

The modulating component $W(\theta_{\text{ref}})x$ draws $W$ from a finite-dimensional basis built by W_basis(type, degree, knots, df, boundary_knots, basis_fn, dim, p) with three types:

  • polynomial (default degree 1): block-by-coordinate powers, no cross-terms, $$\text{eval}(\theta)=\big(\theta_1,\theta_1^2,\dots,\theta_1^{\deg},\ \theta_2,\dots,\theta_2^{\deg},\ \dots,\ \theta_{p},\dots,\theta_{p}^{\deg}\big);$$
  • bspline (default degree 3): per-coordinate B-spline bases concatenated, $\text{eval}(\theta)=(B(\theta_1),\dots,B(\theta_p))$, with Cox–de Boor evaluation performed Stan-side for differentiability inside HMC; the R side (.gdpar_resolve_bspline_knots, .gdpar_bspline_knots_full, .gdpar_validate_bspline_boundary_range) resolves interior knots (from knots or df) and boundary knots and validates the reference range; boundary_knots defaults to range(c(knots, tk));
  • user: an arbitrary basis_fn of declared dim.

materialize_W_basis() populates dim, p, and block_indices (the per-coordinate column blocks) once $p$ is known; as_per_k() reshapes to per-coordinate form. The anchoring constraint (C4) $W(\theta_0)=0$ is enforced by the differenced parametrization $W(\theta)=W_0(\theta)-W_0(\theta_0)$ (the apply_W_basis_diff mechanism, selected by a W_type_id). Detail: Part IV, R/W_basis.R; Stan-side Cox–de Boor: Part V.
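
The polynomial evaluation order and the differencing trick can both be seen in a few lines of base R. The sketch is illustrative: `poly_basis` and `poly_basis_diff` are hypothetical names standing in for the package's basis evaluation, and only the polynomial type is covered.

```r
# Block-by-coordinate powers, no cross-terms:
# (theta_1, ..., theta_1^deg, theta_2, ..., theta_2^deg, ..., theta_p^deg).
poly_basis <- function(theta, degree = 1L) {
  unlist(lapply(theta, function(t) t^seq_len(degree)))
}

# Differenced basis: W0(theta) - W0(theta_0). Any linear combination of
# these columns is exactly zero at theta = theta_0, which is the anchoring
# constraint W(theta_0) = 0.
poly_basis_diff <- function(theta, theta_0, degree = 1L) {
  poly_basis(theta, degree) - poly_basis(theta_0, degree)
}

print(poly_basis(c(2, 3), degree = 2))                 # 2 4 3 9
print(poly_basis_diff(c(2, 3), c(2, 3), degree = 2))   # 0 0 0 0
```

Because the constraint holds column by column, it survives whatever coefficients the sampler assigns to the basis.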

II.7 Grouped references (per-group anchors)

Activated by group = ~ factor in gdpar(), the scalar/coord-wise reference is promoted to a per-group anchor $\theta_{\text{ref}}[g]\sim\mathrm{Normal}(\mu_{\theta_{\text{ref}}},\sigma_{\theta_{\text{ref}}})$, $g=1,\dots,J_{\text{groups}}$, with its own hierarchical hyperprior ($\sigma_{\theta_{\text{ref}}}$ a learned between-group scale). group = NULL reduces bit-exactly to the single-anchor regime. The anti-aliasing condition (C7) (§II.1.9) is enforced pre-fit by .check_group_aliasing_c7(). Anchor resolution across regimes is handled by resolve_anchor, resolve_anchor_multi, resolve_anchor_K; extraction of grouped references by the .extract_theta_ref_*_grouped helpers (Part IV, R/gdpar.R, R/methods.R).

II.8 The causal bridge (CATE / ITE)

Because the AMM produces individual parameters, individual treatment effects follow directly. gdpar_causal_bridge() implements a T-learner: fit the anchored model separately under treatment and control and read the individual CATE/ITE as the difference of anchored individual predictions, $$\widehat\tau(x_i)=\widehat\mu_1(x_i)-\widehat\mu_0(x_i),$$ with posterior draws giving full uncertainty on $\tau(x_i)$ (summarized by .summarize_cate). A battery of pre-checks (.check_bridge_path/_hierarchical/_family/_dim/_amm/_anchor) guards the bridge's applicability. The second layer, gdpar_compare_meta_learners(), benchmarks the AMM learner against external meta-learners through a pluggable adapter registry: gdpar_adapter_grf (R, generalized random forests) and gdpar_adapter_econml (Python EconML via reticulate, e.g. CausalForestDML realizing orthogonal/DML CATE). Adapters honor a two-layer contract (fit_predict_fun + optional predict_fun). Detail: Part IV, R/causal_bridge.R, R/compare_meta_learners.R, R/adapter_*.R; theory: vignettes v08, v08b, v08c.
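
The T-learner logic itself is independent of the model used in each arm. The base-R sketch below uses plain `lm()` fits on simulated data as a stand-in for the anchored Bayesian model, so it gives point estimates only and no posterior draws; the data-generating process and all names are assumptions made for the example.

```r
set.seed(7)
n        <- 400
x        <- runif(n, -1, 1)
treat    <- rbinom(n, 1, 0.5)
tau_true <- 1 + 2 * x                       # heterogeneous treatment effect
y        <- 0.5 * x + treat * tau_true + rnorm(n, sd = 0.1)
dat      <- data.frame(y = y, x = x, treat = treat)

# T-learner: one outcome model per arm (lm as a simple stand-in).
fit1 <- lm(y ~ x, data = dat[dat$treat == 1, ])
fit0 <- lm(y ~ x, data = dat[dat$treat == 0, ])

# Individual effect estimate = difference of the two arms' predictions at x_i.
tau_hat <- predict(fit1, newdata = dat) - predict(fit0, newdata = dat)

print(mean(abs(tau_hat - tau_true)))
```

With a Bayesian fit in each arm, the same subtraction applied draw by draw would give the posterior of $\tau(x_i)$.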

II.9 Geometry-adaptive sampling (opt-in)

For geometrically hostile posteriors (funnels, near-determinism, heavy tails, multimodality), the package ships an opt-in geometry engine whose default path is bit-identical to ordinary HMC. When enabled it climbs a ladder of metrics $G(\theta)$ for Riemannian-manifold HMC:

  • Euclidean (gdpar_geom_metric_euclidean) — baseline mass matrix;
  • Riemannian / Fisher (gdpar_geom_metric_gp_fisher, gdpar_geom_metric_riemannian) with the SoftAbs regularization of the Hessian eigenvalues, $\lambda\mapsto\lambda\coth(\alpha\lambda)$, giving a positive-definite metric from a possibly-indefinite Hessian; log-Cholesky parametrization of $G$ and its differentials ($dM/d\psi$);
  • sub-Riemannian (gdpar_geom_metric_subriemannian) — a degenerate/constrained metric flowing along admissible directions;
  • relativistic / Finsler (gdpar_geom_metric_relativistic) — a relativistic kinetic energy capping velocities (heavy-tail/ill-conditioning robustness), with its own radial integrator.
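
The SoftAbs map on the Riemannian rung is the easiest of these to verify numerically. The base-R sketch below applies $\lambda\mapsto\lambda\coth(\alpha\lambda)$ to the eigenvalues of an indefinite symmetric matrix and rebuilds the metric; `softabs_metric` is a hypothetical name, and the small-argument guard (using the limit $1/\alpha$ at $\lambda=0$) is an implementation choice of this sketch.

```r
# SoftAbs regularization of a symmetric, possibly indefinite, matrix H.
softabs_metric <- function(H, alpha = 1e6) {
  eig    <- eigen((H + t(H)) / 2, symmetric = TRUE)
  lambda <- eig$values
  # lambda * coth(alpha * lambda); the limit at lambda = 0 is 1 / alpha.
  soft <- ifelse(abs(alpha * lambda) < 1e-8,
                 1 / alpha,
                 lambda / tanh(alpha * lambda))
  eig$vectors %*% diag(soft, nrow = length(soft)) %*% t(eig$vectors)
}

H <- matrix(c(2, 0, 0, -3), 2, 2)     # indefinite: eigenvalues 2 and -3
G <- softabs_metric(H, alpha = 10)
ev <- eigen(G, symmetric = TRUE)$values
print(ev)
```

For large $\alpha|\lambda|$ the map behaves like $|\lambda|$, so the negative eigenvalue is reflected to its absolute value while eigenvectors are preserved.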

A certifying orchestrator (gdpar_geom_orchestrate, with _criteria and _budget variants) diagnoses the pathology (gdpar_geometry_diagnostic: multimodality, heavy kurtosis, boundary proximity, difficulty curve), selects an entry rung, builds the metric, tunes $\varepsilon$, runs the success gate, and emits a certificate (gdpar_geom_certificate). A Laplace fallback (gdpar_geom_laplace: Newton/Laplace climb, observed information, unconstrained draws, fit-quality label) provides a plug-in posterior and an ELPD on par with mgcv-REML / INLA-Laplace when full sampling is certified infeasible (the resolution of the near-deterministic Tweedie case). Detail: Part IV, R/geometry_engine.R, R/geometry_orchestrator.R, R/geometry_suite.R, R/geometry_bridge.R, R/geometry_laplace.R, R/geometry_diagnostic.R; theory: vignette vop08.

II.10 Dependence-robust inference

gdpar does not model dependence in its point structure; it makes the inference robust to unmodelled dependence (point estimates unchanged, only uncertainty made robust). Two axes:

  • Temporal (gdpar_dependence_diagnostic, gdpar_dependence_robust): diagnostics = lag-1 autocorrelation, Durbin–Watson, Ljung–Box on residuals; robust SE/intervals via moving/circular block bootstrap with a data-driven block length. The automatic length is the Politis–White (2004) + Patton–Politis–White (2009) flat-top rule (a base-R hand-roll equal to np::b.star): flat-top autocovariance + adaptive bandwidth $b=(2\hat g^2/D)^{1/3}n^{1/3}$, $D=\tfrac43 \text{spec}^2$; default block_length=NULL reduces bit-exactly to the $n^{1/3}$ Künsch rate.
  • Spatial (gdpar_spatial_dependence_diagnostic, gdpar_spatial_dependence_robust): diagnostic = Moran's $I$ (hand-roll, no spdep/sf) with knn row-standardized / distance-band / user-supplied weights, two-sided permutation default + analytic Cliff–Ord option; robust inference via tiled randomized-origin spatial block bootstrap (Politis–Romano–Lahiri) with variance-optimal block side $g=\max(2,\lceil n^{1/4}\rceil)$ in $d=2$, plus a data-driven subsampling calibration of $g$.
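
Both axes rest on standard constructions that can be written in a few lines of base R. The sketch below is a simplified illustration, not the package's engine: `morans_i` computes Moran's $I$ with row-standardized weights, and `cbb_se_mean` is a circular block bootstrap for the standard error of a mean with a fixed block length of $\lceil n^{1/3}\rceil$ rather than the data-driven flat-top rule. Both function names are hypothetical.

```r
# Moran's I with row-standardized weights (W must have a zero diagonal
# and no empty rows).
morans_i <- function(x, W) {
  W <- W / rowSums(W)                      # row-standardize
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Circular block bootstrap SE of the mean.
cbb_se_mean <- function(x, block_length = ceiling(length(x)^(1 / 3)), B = 500L) {
  n <- length(x)
  n_blocks <- ceiling(n / block_length)
  boot_means <- replicate(B, {
    starts <- sample.int(n, n_blocks, replace = TRUE)
    # Each block wraps around the end of the series (circular indexing).
    idx <- unlist(lapply(starts, function(s) {
      ((s - 1 + seq_len(block_length) - 1) %% n) + 1
    }))
    mean(x[idx[seq_len(n)]])
  })
  sd(boot_means)
}

# Four points on a line with nearest-neighbour adjacency.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
moran <- morans_i(1:4, A)

set.seed(11)
series <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))
se_cbb <- cbb_se_mean(series)
print(c(moran = moran, se_cbb = se_cbb, se_iid = sd(series) / sqrt(200)))
```

For positively autocorrelated data the block-bootstrap standard error is typically larger than the i.i.d. formula, which is the sense in which only the uncertainty, not the point estimate, changes.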

Both axes work on the EB scalar path and the FB path (gdpar_fit and gdpar_eb_fit) via a shared engine (.gdpar_dependence_robust_engine) with class-dispatched estimate/SE/residual extractors and a frozen RNG contract that keeps the EB route bit-exact. Detail: Part IV, R/dependence_robust.R; theory: vignette vop09.


Part III — Computational Architecture

This part is the map of how a model flows from a user call to posterior summaries. Every function named here is documented in full in Part IV; the Stan templates in Part V.

III.1 The fitting pipeline

A call to gdpar(formula, family, amm, data, group = NULL, path = "bayesian", ...) proceeds through:

  1. Specification resolution. The response/formula is parsed; the AMM design is built. The user may supply the AMM three ways: a single amm_spec() (scalar/coord-wise), a named list of specs, or a gdpar_formula_set() / gdpar_bf() (a brms-like multi-slot formula DSL parsed by .gdpar_parse_amm_formula). dims_spec (dimwise/override/resolve_dims_spec) declares per-coordinate ($p>1$) component formulas. The family is resolved (gdpar_family/gdpar_family_multi/heterogeneous) into the per-slot gdpar_param_spec list, and the resolved $(K,p)$ pair determines the path regime.
  2. Design construction. build_amm_design (and .build_amm_design_multi, .build_amm_design_K) assemble the centered design matrices $Z_a,Z_b$ (enforcing (C2)/(C3) by column-centering) and the modulating design $X$ with the W basis materialized.
  3. Identifiability pre-flight. gdpar_check_identifiability() runs the Gram-matrix check (Prop. 1C), the C4-bis cross-component check for $p>1$ (check_C4_bis_per_k), the per-slot checks for $K>1$, and — when group is set — the C7 anti-aliasing check (.check_group_aliasing_c7). Failure aborts with gdpar_identifiability_error unless skip_id_check = TRUE.
  4. Parametrization pre-flight. The CP/NCP/(linear) decision is taken data-drivenly (§II.6) via preflight_parametrization* → resolve_parametrization*.
  5. Code generation. R/stan_codegen.R assembles the Stan program string for the resolved $(K,p,\text{family},W,\text{parametrization},\text{group})$ from the canonical Stan pieces, then compiles via cmdstanr.
  6. Sampling. HMC runs (optionally through the geometry engine, §II.9). For $K=1,p=1$ the path is gdpar() proper; $p>1$ dispatches to .gdpar_multi, $K>1$ to .gdpar_K/.gdpar_K_build.
  7. Diagnostics + packaging. compute_diagnostics collects $\widehat R$, ESS, divergences (with the all-NA $\to-\infty$+warning guard of D77); diagnostics() exposes them; the result is a gdpar_fit object.
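
Steps 2 and 3 can be pictured with a generic numerical check. The base-R sketch below column-centers a small design matrix and inspects the smallest eigenvalue of its Gram matrix; it is a generic rank check offered as an illustration, not a reproduction of the Proposition 1C test, and the tolerance `1e-8` is an arbitrary assumption.

```r
set.seed(3)
Z <- cbind(z1 = rnorm(100, mean = 5), z2 = runif(100))

# Column-centering: every column has mean zero afterwards.
Z_centered <- scale(Z, center = TRUE, scale = FALSE)

# Gram matrix of the centered design; a strictly positive smallest
# eigenvalue means the centered columns are linearly independent.
gram    <- crossprod(Z_centered) / nrow(Z_centered)
min_eig <- min(eigen(gram, symmetric = TRUE, only.values = TRUE)$values)
full_rank <- min_eig > 1e-8     # hypothetical tolerance

# A duplicated column makes the Gram matrix singular.
Z_bad    <- scale(cbind(Z, z3 = Z[, "z1"]), center = TRUE, scale = FALSE)
min_bad  <- min(eigen(crossprod(Z_bad) / nrow(Z_bad), symmetric = TRUE,
                      only.values = TRUE)$values)

print(c(min_eig = min_eig, min_bad = min_bad))
```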

gdpar_eb() replaces steps 5–7 with the three-step EB recipe (§II.3): Laplace marginal-likelihood maximization for $\widehat\theta_{\text{ref}}^{\text{EB}}$ (multi-start + Levenberg–Marquardt ridge + condition-number guard), plug-in as data, conditional sampling — across the four regimes (Base/A/B/C) and their marginal+conditional template pairs.

III.2 The code generator

R/stan_codegen.R is a string-assembly compiler: it does not ship one monolithic Stan file per case but composes a program from canonical pieces under inst/stan/_canonical_pieces/ (helpers, the $p=1$ and $p\ge1$ AMM blocks, the EB marginal/conditional blocks, the distributional-$K$ block) plus the dispatched family likelihood branch (selected by stan_id), the chosen W evaluation (W_type_id: polynomial / Stan-side Cox–de Boor B-spline / differenced anchoring), the parametrization toggle (CP/NCP/linear $c_b$), and the optional per-group anchor block. The unification of the $p=1$ and $p\ge1$ templates is the standing technical-debt item "deuda 8.4". Marker-based codegen also produces the MAP/MLE variants (strip-and-mark) used by the EB Laplace step.
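
The string-assembly idea can be shown with a toy generator. Everything in the base-R sketch below is hypothetical: the piece names, the placeholder comments, and the `assemble_model_block` function are invented for illustration, the real pieces live under inst/stan/_canonical_pieces/, and the output is a fragment rather than a complete Stan program.

```r
# Hypothetical pieces keyed by the resolved options.
pieces <- list(
  likelihood = c(gaussian = "y ~ normal(eta, sigma);",
                 poisson  = "y ~ poisson_log(eta);"),
  b_term     = c(CP     = "// <centered interaction block goes here>",
                 NCP    = "// <non-centered interaction block goes here>",
                 linear = "// <linear c_b block goes here>")
)

# Compose a model-block string from the selected pieces.
assemble_model_block <- function(family, parametrization) {
  stopifnot(family %in% names(pieces$likelihood),
            parametrization %in% names(pieces$b_term))
  paste(c("model {",
          paste0("  ", pieces$b_term[[parametrization]]),
          paste0("  ", pieces$likelihood[[family]]),
          "}"),
        collapse = "\n")
}

stan_fragment <- assemble_model_block("poisson", "NCP")
cat(stan_fragment, "\n")
```

Composing from keyed pieces keeps each combination of family and parametrization out of a separate file, which is the design motive the paragraph above describes.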

III.3 The fitting engines

  • R/gdpar.R — the orchestrator: gdpar() (entry), .gdpar_multi ($p>1$), .gdpar_K / .gdpar_K_build ($K>1$), anchor resolution, compute_diagnostics, dedup_message_blocks (clean console output).
  • R/eb.R — the Empirical-Bayes engine (largest file, 3196 lines): the four-regime marginal/conditional fit drivers, the Laplace anti-fragility machinery, the EB gdpar_eb_fit packaging.
  • R/families.R — the family/codegen-facing layer: param-spec construction, link factory, per-slot scope promotion, heterogeneous-family resolution, inverse-link-id computation per slot.
  • R/geometry_*.R — the geometry engine, orchestrator, suite, bridge, Laplace, diagnostic (§II.9).
  • R/dependence_robust.R — the dependence-robust inference engine (§II.10).

III.4 The S3 method surface

The user-facing objects and their methods (full list in NAMESPACE; detail in Part IV):

  • gdpar_fit: print, summary, predict (with the $p>1$ array path and $K>1$ per-slot inverse-link path, plus grouped/newdata variants), coef (→ gdpar_coef), residuals (Dunn–Smyth / deviance / quantile, with DHARMa integration), pp_check, gdpar_loo, gdpar_posterior_predict, gdpar_dharma_object.
  • gdpar_eb_fit: print, summary, coef, predict.
  • gdpar_coef: print, summary, format, as.data.frame.
  • causal/comparison: gdpar_causal_bridge, gdpar_meta_learner_comparison, gdpar_eb_fb_comparison each with print/summary/predict as applicable.
  • specs/reports: amm_spec, amm_builder, W_basis, dims_spec, gdpar_family(_multi), gdpar_formula_set, gdpar_param_spec, and the diagnostic reports (gdpar_identifiability_report, gdpar_preflight_report, gdpar_contraction_report, gdpar_bvm_report, gdpar_dependence_diagnostic, gdpar_spatial_dependence_*, gdpar_ksd_joint, gdpar_geometry_diagnostic, geometry certificates) all carry print (and where useful summary/format/as.data.frame).

III.5 Serialization, golden tests, validation utilities

  • R/amm_serialize.R: amm_save_spec/amm_load_spec (a text round-trip of an AMM spec: formula/char/numeric/W-record (de)serializers); gdpar_snapshot_fit snapshots a fit.
  • R/golden_compare.R, R/golden_helpers.R — the golden-regression machinery (manifest, roster, structural/discrete/continuous/sanity comparators, code-hash + toolchain-version stamping) underpinning the test tiers of §VI.3.
  • R/ksd_joint.R — a kernelized Stein discrepancy joint diagnostic; R/bvm_check.R — the Bernstein–von Mises calibration check (§II.2.1); R/contraction_diagnostic.R — empirical contraction (§II.6.3); R/gdpar_loo.R — PSIS-LOO aggregation; R/preflight*.R — the parametrization pre-flight (§II.6); R/utils-*.R — condition system (gdpar_abort/warn/inform, require_suggested) and input validators (assert_*).

Part IV — Exhaustive Function Reference

Documentation of every function in all 44 R source files, generated by the GLM-5.2 and MiMo-V2.5-Pro lineages under a faithful-to-source spec and audited (guaranteed floor on the mathematically dense modules: families, W_basis, geometry_*, dependence_robust, stan_codegen).

R/adapter_econml.R

gdpar_adapter_econml(estimator = "CausalForestDML", n_estimators = 1000L, model_y = NULL, model_t = NULL, seed = NULL)

Purpose Creates and returns a gdpar_meta_learner_adapter object that wraps an EconML estimator via the reticulate package, for use with gdpar_compare_meta_learners. It enables causal inference using the Orthogonal Double/Debiased Machine Learning (DML) framework, specifically the CausalForestDML estimator from the EconML Python package.

Arguments

  • estimator Character scalar. Identifies the EconML estimator to use. Currently only "CausalForestDML" is supported in this package version.
  • n_estimators Integer scalar. The number of trees in the CausalForestDML forest.
  • model_y Optional Python model object. The outcome model used in the first stage of the DML procedure. If NULL, EconML's default (a gradient-boosted tree) is used.
  • model_t Optional Python model object. The treatment model used in the first stage of the DML procedure. If NULL, EconML's default is used.
  • seed Optional integer scalar. A random seed for reproducibility, passed to the EconML estimator's random_state parameter. Must be between 1 and .Machine$integer.max.

Mathematics The CausalForestDML estimator implements the Orthogonal Double Machine Learning (DML) framework for estimating the Conditional Average Treatment Effect (CATE), $\tau(x)$, at a covariate vector $x$. For a binary treatment $T$, outcome $Y$, and covariates $X$, the CATE is defined as: $$\tau(x) = \mathbb{E}[Y(1) - Y(0) | X = x]$$ where $Y(t)$ denotes the potential outcome under treatment $t$. DML proceeds by:

  1. Estimating the nuisance functions, namely the outcome regression $m(x) = \mathbb{E}[Y | X = x]$ and the propensity score $e(x) = \mathbb{P}(T=1 | X = x)$, using flexible machine learning with cross-fitting.
  2. Forming the first-stage residuals $\tilde{Y} = Y - \hat{m}(X)$ and $\tilde{T} = T - \hat{e}(X)$.
  3. Estimating the CATE with a causal forest that solves the local residual-on-residual moment equation $$\mathbb{E}\big[(\tilde{Y} - \tau(x)\,\tilde{T})\,\tilde{T} \,\big|\, X = x\big] = 0.$$

Confidence intervals are constructed using the effect_interval method of the fitted EconML estimator, which provides asymptotically valid intervals based on the forest's variance estimation.

Returns A gdpar_meta_learner_adapter list object with the following components:

  • name: "econml"
  • fit_predict_fun: A function that fits the model and returns CATE estimates and confidence intervals.
  • predict_fun: A function that predicts from a fitted model without re-fitting.
  • requires_r: "reticulate"
  • requires_py: "econml"
  • native_ci: TRUE
  • description: A character string summarizing the estimator and its settings.

Notes

  • The Python module econml must be installed in the active reticulate environment. If not found, the function aborts with a gdpar_missing_dependency_error.
  • The state object returned by fit_predict_fun contains a reference to a Python object. This reference is invalidated if the R session is restarted or the state is serialized and restored (e.g., via saveRDS and readRDS). Using predict_fun on such a state will result in a gdpar_unsupported_feature_error.
  • The function validates input arguments using assert_count and assert_numeric_scalar (internal validation functions not shown).
  • The internal .econml_to_matrix function is called to convert covariate data frames to numeric matrices compatible with EconML.

.econml_to_matrix(df, template = NULL)

Purpose Converts a data frame of covariates into a numeric matrix suitable for input to the EconML Python package. It also manages a template to ensure consistent encoding of factor levels between training and prediction data.

Arguments

  • df A data frame or object coercible to a data frame. Contains the covariates.
  • template A list or NULL. If NULL, the function creates a new template from df. If provided, it ensures df is processed to match the template's column structure.

Mathematics The conversion process applies the following transformation to each covariate:

  1. Character columns are converted to factors.
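  2. The data frame is converted to a model matrix using the formula ~ . - 1, which expands factor variables into dummy (one-hot) variables without an intercept. A factor with $k$ levels yields $k$ binary indicator columns when it is the first factor in the formula; any further factor is coded with R's default contrasts and yields $k - 1$ columns.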

The resulting matrix $X$ is used in the EconML estimator's fit and effect methods.
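
The two steps above can be sketched in base R on a made-up data frame. The column names `age` and `region` are illustrative assumptions, not part of the package, and the snippet does not call the internal helper itself:

```r
# Toy covariates: one numeric column and one character column.
df <- data.frame(
  age    = c(31, 45, 52),
  region = c("north", "south", "east"),
  stringsAsFactors = FALSE
)

# Step 1: coerce character columns to factors.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], as.factor)

# Step 2: expand to a numeric design matrix without an intercept.
X <- stats::model.matrix(~ . - 1, data = df)

colnames(X)
dim(X)
```

The numeric column passes through unchanged, and the single factor is expanded into one indicator column per level.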

Returns A numeric matrix with the following attributes:

  • colnames: The column names of the matrix.
  • factor_levels: A list mapping original factor column names to their levels. If the column is not a factor, the entry is NULL.
  • template: The template list (either created or passed in) is attached as an attribute to the matrix. This template is stored in the adapter's state to ensure consistent data processing during prediction.

Notes

  • If template is NULL (i.e., during model fitting), the function requires at least one column in df. Otherwise, it aborts with a gdpar_input_error.
  • If template is provided (i.e., during prediction), the function enforces that all factor variables in the template exist in df and have the same levels. It also ensures the resulting model matrix has exactly the same columns as the template. Incompatibilities (missing or extra columns) result in a gdpar_input_error.
  • The function uses stats::model.matrix for the conversion, which handles factor expansion and removal of intercepts.

R/adapter_grf.R

gdpar_adapter_grf(num_trees = 2000L, sample_fraction = 0.5, mtry = NULL, honesty = TRUE, seed = NULL)

Purpose

Factory that constructs a gdpar_meta_learner_adapter object wrapping the R-side causal forest estimator grf::causal_forest (Athey, Tibshirani, and Wager, 2019) for use with gdpar_compare_meta_learners. The adapter populates both the mandatory fit_predict_fun (fit + predict in one call) and the optional predict_fun (reuse a cached forest on a fresh evaluation grid without refitting). It advertises requires_r = "grf" and native_ci = TRUE because grf's built-in variance estimator supplies confidence intervals via the normal approximation.

Arguments

  • num_trees: Integer scalar; number of trees in the forest. Default 2000L, matching grf's default.
  • sample_fraction: Numeric scalar in $(0, 0.5]$ (validated against the bounds $10^{-3}$ and $0.5$); fraction of the training sample drawn for each tree. Default 0.5.
  • mtry: Optional integer scalar; number of candidate variables per split. Default NULL delegates to grf's own default ($\min(\lceil\sqrt{p} + 20\rceil, p)$).
  • honesty: Logical scalar; whether to use honest splitting. Default TRUE (recommended; grf CIs are valid only under honesty).
  • seed: Optional integer scalar; seed propagated to grf's internal RNG when the comparator's seed_run is NULL. Default NULL.

Mathematics

Native confidence intervals are obtained by the normal approximation

$$ \big[\,\widehat\tau(x) - z_{1-\alpha/2}\cdot\sqrt{\widehat{\mathrm{Var}}(\widehat\tau(x))},\;\; \widehat\tau(x) + z_{1-\alpha/2}\cdot\sqrt{\widehat{\mathrm{Var}}(\widehat\tau(x))}\,\big], $$

where $\alpha = 1 - \text{level}$, $z_{1-\alpha/2} = \Phi^{-1}(1 - (1-\text{level})/2)$, and $\widehat{\mathrm{Var}}(\widehat\tau(x))$ is grf's built-in variance estimate obtained via predict(..., estimate.variance = TRUE). The standard error is clamped at zero: $\mathrm{se}(x) = \sqrt{\max(\widehat{\mathrm{Var}}(\widehat\tau(x)), 0)}$.
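
As a minimal sketch of this interval construction in base R: the function name `normal_ci` and the input vectors are hypothetical stand-ins for grf's point predictions and variance estimates, and no grf call is made.

```r
# Normal-approximation CI with the variance clamped at zero.
# `tau_hat` and `var_hat` stand in for grf's predictions and variance estimates.
normal_ci <- function(tau_hat, var_hat, level = 0.95) {
  stopifnot(level > 0, level < 1, length(tau_hat) == length(var_hat))
  z  <- stats::qnorm(1 - (1 - level) / 2)
  se <- sqrt(pmax(var_hat, 0))  # guard against tiny negative estimates
  cbind(lower = tau_hat - z * se, upper = tau_hat + z * se)
}

# Second variance is a tiny negative artifact; its interval collapses to a point.
ci <- normal_ci(tau_hat = c(0.20, -0.10), var_hat = c(0.04, -1e-12))
ci
```

Clamping before the square root avoids `NaN` standard errors when a variance estimate is slightly negative.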

Returns

A gdpar_meta_learner_adapter object (constructed via gdpar_meta_learner_adapter) with fields:

  • name = "grf",
  • fit_predict_fun: closure capturing the hyperparameter list hp,
  • predict_fun: closure independent of hp,
  • requires_r = "grf",
  • native_ci = TRUE,
  • description: a string of the form "grf::causal_forest (num_trees = <n>, honesty = <h>) with normal-approximation CIs from estimate.variance.".

Notes

  • Validation sequence:
    1. assert_count(num_trees, "num_trees").
    2. assert_numeric_scalar(sample_fraction, "sample_fraction", lower = 1e-3, upper = 0.5).
    3. If mtry is non-NULL, assert_count(mtry, "mtry").
    4. honesty must be a length-1 non-NA logical; otherwise gdpar_abort raises a gdpar_input_error with message "Argument 'honesty' must be a non-NA logical scalar.".
    5. If seed is non-NULL, assert_numeric_scalar(seed, "seed", lower = 1, upper = .Machine$integer.max).
  • Hyperparameters are coerced and stored in a list hp: num_trees → integer, sample_fraction → numeric, mtry → integer (or NULL), honesty → logical, seed → integer (or NULL).
  • The fit_predict_fun and predict_fun closures are defined locally and passed to gdpar_meta_learner_adapter; see their dedicated subsections below.
  • Side effects: none beyond construction of the adapter object. The closures themselves perform fitting/prediction only when invoked.

fit_predict_fun(X, Y, T, X_newdata, level, seed_run) (closure defined inside gdpar_adapter_grf)

Purpose

Mandatory adapter entry point: fit a grf::causal_forest on the training triple (X, Y, T) and immediately predict the conditional average treatment effect (CATE) on X_newdata together with native normal-approximation confidence intervals at confidence level level. Returns both the predictions and a state object that allows predict_fun to reuse the fitted forest.

Arguments

  • X: Covariate data (data frame or coercible) for the training sample; passed to .grf_to_matrix.
  • Y: Numeric outcome vector; coerced via as.numeric(Y) and passed as Y to grf::causal_forest.
  • T: Numeric treatment indicator vector; coerced via as.numeric(T) and passed as W to grf::causal_forest.
  • X_newdata: Covariate data for prediction; converted via .grf_to_matrix(X_newdata, template = attr(X_mat, "template")) so its design aligns with the training design.
  • level: Numeric scalar in $(0, 1)$; confidence level for the CIs.
  • seed_run: Optional integer scalar; seed supplied by the comparator. If non-NULL it overrides hp$seed.

Mathematics

The causal forest is fit by do.call(grf::causal_forest, args) with args = list(X = X_mat, Y = as.numeric(Y), W = as.numeric(T), num.trees = hp$num_trees, sample.fraction = hp$sample_fraction, honesty = hp$honesty), conditionally augmented with mtry (if hp$mtry non-NULL) and seed (if eff_seed non-NULL). Predictions use estimate.variance = TRUE, yielding $\widehat\tau(x)$ and $\widehat{\mathrm{Var}}(\widehat\tau(x))$. Then

$$ z = \Phi^{-1}\!\left(1 - \frac{1 - \text{level}}{2}\right), \qquad \mathrm{se}(x) = \sqrt{\max\!\big(\widehat{\mathrm{Var}}(\widehat\tau(x)),\, 0\big)}, $$

$$ \text{CI}(x) = \big[\,\widehat\tau(x) - z\,\mathrm{se}(x),\;\; \widehat\tau(x) + z\,\mathrm{se}(x)\,\big]. $$

Returns

A list with elements:

  • cate_mean: numeric vector of point predictions as.numeric(pred$predictions).
  • cate_ci: numeric matrix with columns lower and upper, row-aligned with cate_mean.
  • state: list with forest (the fitted causal_forest object cf) and template (the column template extracted from attr(X_mat, "template")).
  • notes: character(0L) (empty).

Notes

  • Calls require_suggested("grf", "fit gdpar_adapter_grf") to ensure the suggested package is available; this is expected to error if grf is not installed.
  • eff_seed is resolved as seed_run (coerced to integer) when non-NULL, otherwise falls back to hp$seed.
  • The variance estimates are clamped at zero via pmax(as.numeric(pred$variance.estimates), 0) before taking the square root, guarding against tiny negative numerical artifacts.
  • The state$forest object retains grf's internal RNG state and trained trees; it is intended to be passed back to predict_fun.
  • Side effects: triggers grf::causal_forest fitting (RNG consumption if eff_seed is set) and a prediction call.
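
The seed resolution and the conditional argument assembly described above can be sketched as follows. `build_cf_args` is a hypothetical helper written for illustration; it only builds the argument list that would be handed to `do.call` and does not call grf.

```r
# Hypothetical helper mirroring the described logic (not the package source).
build_cf_args <- function(X_mat, Y, T, hp, seed_run = NULL) {
  # seed_run, when supplied, takes precedence over the adapter-level seed.
  eff_seed <- if (!is.null(seed_run)) as.integer(seed_run) else hp$seed
  args <- list(
    X = X_mat, Y = as.numeric(Y), W = as.numeric(T),
    num.trees = hp$num_trees,
    sample.fraction = hp$sample_fraction,
    honesty = hp$honesty
  )
  # Optional entries are added only when set, so grf's own defaults apply otherwise.
  if (!is.null(hp$mtry)) args$mtry <- hp$mtry
  if (!is.null(eff_seed)) args$seed <- eff_seed
  args
}

set.seed(1)
hp <- list(num_trees = 2000L, sample_fraction = 0.5, mtry = NULL,
           honesty = TRUE, seed = 11L)
X_mat <- matrix(rnorm(20), ncol = 2)
args <- build_cf_args(X_mat, Y = rnorm(10), T = rbinom(10, 1, 0.5),
                      hp = hp, seed_run = 42)
names(args)
```

Building the list first and dispatching through `do.call` keeps unset options out of the call entirely rather than passing explicit `NULL` values.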

predict_fun(state, X_newdata, level) (closure defined inside gdpar_adapter_grf)

Purpose

Optional adapter entry point: reuse a previously fitted causal forest stored in state to predict CATEs (with native CIs) on a fresh evaluation grid X_newdata, without refitting.

Arguments

  • state: List previously produced by fit_predict_fun, expected to contain forest (a fitted grf::causal_forest object) and template (column template for design alignment).
  • X_newdata: Covariate data for prediction; converted via .grf_to_matrix(X_newdata, template = state$template).
  • level: Numeric scalar in $(0, 1)$; confidence level for the CIs.

Mathematics

Identical normal-approximation CI construction as in fit_predict_fun:

$$ z = \Phi^{-1}\!\left(1 - \frac{1 - \text{level}}{2}\right), \qquad \mathrm{se}(x) = \sqrt{\max\!\big(\widehat{\mathrm{Var}}(\widehat\tau(x)),\, 0\big)}, $$

$$ \text{CI}(x) = \big[\,\widehat\tau(x) - z\,\mathrm{se}(x),\;\; \widehat\tau(x) + z\,\mathrm{se}(x)\,\big]. $$

Returns

A list with elements:

  • cate_mean: numeric vector of point predictions.
  • cate_ci: numeric matrix with columns lower and upper.

(No state or notes elements are returned, unlike fit_predict_fun.)

Notes

  • Dependency check is performed inline via requireNamespace("grf", quietly = TRUE); on failure, gdpar_abort raises a gdpar_missing_dependency_error with data = list(package = "grf") and message "Package 'grf' is required to reuse a cached causal_forest state.".
  • If state is NULL or state$forest is NULL, gdpar_abort raises a gdpar_internal_error with message "Cached state for the grf adapter is empty; refit before predicting.".
  • Uses stats::predict (not grf:::predict.causal_forest directly) so S3 dispatch resolves the method.
  • Variance estimates are clamped at zero via pmax(..., 0) before the square root.
  • Side effects: triggers a grf prediction call (no refitting, no RNG consumption).

.grf_to_matrix(df, template = NULL)

Purpose

Internal helper that converts a covariate data frame into a fully numeric design matrix suitable for grf::causal_forest. Character columns are coerced to factors, factors are expanded via model.matrix(~ . - 1, ...), and numeric columns pass through unchanged. A template attribute records the column structure of the first call so subsequent calls on X_newdata align identically; the function aborts when a factor column from the training data is absent from the new data, or when the resulting design columns differ from the template (missing or extra columns).

Arguments

  • df: A data frame, or an object coercible via as.data.frame(df, stringsAsFactors = FALSE). Contains the covariates.
  • template: Optional list with elements colnames (character vector of expected design-matrix column names) and factor_levels (named list mapping original data-frame column names to either their factor levels or NULL for non-factor columns). If NULL, a new template is built from df.

Mathematics

The design matrix is constructed as

$$ \mathbf{X}_{\text{mat}} = \texttt{model.matrix}(\sim .\, - 1,\; \text{data} = \text{df}), $$

which drops the intercept and expands factors into indicator columns: the first factor in the formula receives all $K$ of its level indicators, while any further factor is coded with R's default contrasts and contributes $K - 1$ columns. When a template is supplied, factor columns are first re-leveled to the template's levels via factor(df[[j]], levels = template$factor_levels[[j]]), the resulting matrix is checked for set-equality of column names against template$colnames, and finally reordered as mm[, template$colnames, drop = FALSE].

Returns

A numeric matrix with attribute "template" (a list with colnames and factor_levels). When template = NULL on input, the returned template is freshly built from df; otherwise the input template is attached unchanged to the returned matrix.

Notes

  • If df is not a data frame, it is coerced via as.data.frame(df, stringsAsFactors = FALSE).
  • A for loop over seq_along(df) coerces any character column to factor via as.factor.
  • Template-NULL branch:
    • If ncol(df) == 0L, gdpar_abort raises a gdpar_input_error with message "gdpar_adapter_grf requires at least one covariate; received a 0-column data frame.".
    • template$factor_levels is built by lapply(df, function(col) if (is.factor(col)) levels(col) else NULL), so non-factor columns map to NULL.
  • Template-non-NULL branch:
    • For each name j in names(template$factor_levels) whose entry is non-NULL (i.e., a factor column in the original training data):
      • If j is not in colnames(df), gdpar_abort raises a gdpar_input_error with message "Covariate '<j>' missing from newdata for the grf adapter." and data = list(missing = j).
      • Otherwise df[[j]] is re-factored with factor(df[[j]], levels = template$factor_levels[[j]]); this silently drops unseen levels and introduces NA for values not in the template levels.
    • After building mm, if !setequal(colnames(mm), template$colnames):
      • missing_cols <- setdiff(template$colnames, colnames(mm)).
      • extra_cols <- setdiff(colnames(mm), template$colnames).
      • gdpar_abort raises a gdpar_input_error with a formatted message listing missing and extra columns (using "<none>" when a set is empty) and data = list(missing = missing_cols, extra = extra_cols).
    • On success, columns are reordered to exactly match template$colnames via mm[, template$colnames, drop = FALSE].
  • The template attribute is set on the returned matrix via attr(mm, "template") <- template, enabling chained calls (e.g., fit_predict_fun reads attr(X_mat, "template") and passes it to the X_newdata conversion).
  • No S3 dispatch; this is a plain internal function.
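
The template mechanism in the notes above can be illustrated with a compact re-implementation. This is a sketch under stated assumptions, not the package source: `to_matrix` is a hypothetical name, and plain `stop()` calls replace the classed `gdpar_abort` errors.

```r
# Illustrative re-implementation of the described logic (not the package source).
to_matrix <- function(df, template = NULL) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  for (j in seq_along(df)) {
    if (is.character(df[[j]])) df[[j]] <- as.factor(df[[j]])
  }
  if (is.null(template)) {
    if (ncol(df) == 0L) stop("at least one covariate is required")
    # Non-factor columns map to NULL, factor columns to their levels.
    levs <- lapply(df, function(col) if (is.factor(col)) levels(col) else NULL)
    mm <- stats::model.matrix(~ . - 1, data = df)
    template <- list(colnames = colnames(mm), factor_levels = levs)
  } else {
    for (j in names(template$factor_levels)) {
      lv <- template$factor_levels[[j]]
      if (is.null(lv)) next
      if (!j %in% colnames(df)) stop("covariate '", j, "' missing from newdata")
      df[[j]] <- factor(df[[j]], levels = lv)  # re-level to the training levels
    }
    mm <- stats::model.matrix(~ . - 1, data = df)
    if (!setequal(colnames(mm), template$colnames)) {
      stop("design columns do not match the template")
    }
    mm <- mm[, template$colnames, drop = FALSE]  # restore training column order
  }
  attr(mm, "template") <- template
  mm
}

train   <- data.frame(g = c("a", "b", "c"), x = c(1, 2, 3))
X_mat   <- to_matrix(train)
newdata <- data.frame(x = c(4, 5), g = c("b", "a"))  # different column order
X_new   <- to_matrix(newdata, template = attr(X_mat, "template"))
colnames(X_new)
```

Because the factor is re-leveled before expansion, a level that is unobserved in `newdata` still produces its (all-zero) indicator column, so the prediction design keeps the training layout.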

R/amm_build.R

amm_build(p = 1L)

Purpose Initialises a chainable builder object of class amm_builder that serves as an incremental specification container for an Adaptive Moderated Model (AMM). The builder is a programmatic alternative to the single-call amm_spec() constructor; it accumulates per-dimension additive (a) and multiplicative (b) basis formulas, a global modulating basis W, and optional covariate names x_vars, and is ultimately converted into an amm_spec via as_amm_spec().

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| p | Scalar positive integer (coerced to integer) | Dimension of the per-individual parameter vector $\theta_i \in \mathbb{R}^p$. Defaults to 1L (scalar/univariate path). Must satisfy p ≥ 1. |

Mathematics None. The function performs only object construction.

Returns An S3 object of class c("amm_builder", "list") with four components:

  • p: integer, the value of the p argument.
  • dims: a dims_spec object (from dimwise(a = NULL, b = NULL)) representing per-dimension additive/multiplicative basis specifications with a NULL base and no overrides.
  • W: NULL (no modulating basis set yet).
  • x_vars: NULL (no covariate names set yet).

Notes

  • p is validated by assert_count(p, "p"), which enforces a single positive integer value. Non-integer numerics are silently coerced to integer via as.integer(p).
  • The resulting builder is intended to be updated through successive amm_set_*() calls and finalised by as_amm_spec(). The builder is a plain list, so each setter modifies a copy under R's copy-on-modify semantics and returns it; the original object is left unchanged. The returned value must therefore be piped onward or reassigned.
  • When p = 1L, finalisation through as_amm_spec() resolves the embedded dims_spec to a scalar entry and invokes the scalar AMM path; when p > 1L, the dims_spec is forwarded directly to the multivariate path. This bifurcation happens at finalisation time, not at build time.
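
The copy-on-modify behaviour noted above is easy to see with a toy builder. The functions `toy_build` and `toy_set_x_vars` below are hypothetical stand-ins written for this sketch; they are not the package's `amm_build` or `amm_set_x_vars`.

```r
# Toy stand-ins for the builder and one setter (hypothetical, not the package code).
toy_build <- function(p = 1L) {
  structure(list(p = as.integer(p), W = NULL, x_vars = NULL),
            class = c("toy_builder", "list"))
}
toy_set_x_vars <- function(builder, x_vars) {
  builder$x_vars <- x_vars  # modifies the function's local copy only
  invisible(builder)
}

b0 <- toy_build(p = 2L)
toy_set_x_vars(b0, c("x1", "x2"))        # result discarded: b0 is unchanged
b1 <- toy_set_x_vars(b0, c("x1", "x2"))  # result kept

is.null(b0$x_vars)
b1$x_vars
```

The same pattern is why a chain of setters is written either as a pipeline or with the result reassigned at each step.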

amm_set_a_uniform(builder, a)

Purpose Replaces the base (uniform) additive basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension $k = 1, \dots, p$ that does not carry an explicit per-dimension override (set via amm_set_a()). Passing NULL disables the additive component on the base layer.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| a | One-sided formula (~ ...) or NULL | The new base additive basis. NULL means no additive component on the base layer. |

Returns The mutated amm_builder object (returned invisibly for pipe compatibility). The mutation is to builder$dims$base$a.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • a is validated by assert_one_sided_formula(a, "a", allow_null = TRUE), which enforces a one-sided formula or NULL.
  • The mutation builder$dims$base$a <- a directly overwrites the base additive slot. Any per-dimension overrides previously registered via amm_set_a() are preserved because they are stored separately in the dims_spec override layer (see override()).

amm_set_b_uniform(builder, b)

Purpose Replaces the base (uniform) multiplicative basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension $k = 1, \dots, p$ that does not carry an explicit per-dimension override (set via amm_set_b()). Passing NULL disables the multiplicative component on the base layer.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| b | One-sided formula (~ ...) or NULL | The new base multiplicative basis. NULL means no multiplicative component on the base layer. |

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$dims$base$b.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • b is validated by assert_one_sided_formula(b, "b", allow_null = TRUE).
  • Per-dimension overrides previously registered via amm_set_b() are preserved; only the base layer is changed.

amm_set_a(builder, k, a)

Purpose Registers a per-dimension override of the additive component for a specific dimension index $k \in \{1, \dots, p\}$. This override takes precedence over the uniform base (set by amm_set_a_uniform()) at index $k$ only. Calling this function twice with the same k replaces the previous override for that dimension.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| k | Positive integer in 1:p | The dimension index to override. Validated against builder$p as the upper bound. |
| a | One-sided formula (~ ...) or NULL | The override additive basis for dimension $k$. NULL disables the additive component for that dimension. |

Returns The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, a = a).

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • k is validated by assert_count(k, "k", max = builder$p), which enforces a single positive integer $\leq p$.
  • a is validated by assert_one_sided_formula(a, "a", allow_null = TRUE).
  • The override is applied via override(), which layers a per-dimension specification on top of the existing dims_spec. Overrides survive subsequent uniform changes (e.g., a later amm_set_a_uniform() call will not erase overrides registered here).

amm_set_b(builder, k, b)

Purpose Registers a per-dimension override of the multiplicative component for a specific dimension index $k \in \{1, \dots, p\}$. This override takes precedence over the uniform base (set by amm_set_b_uniform()) at index $k$ only. Calling this function twice with the same k replaces the previous override for that dimension.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| k | Positive integer in 1:p | The dimension index to override. Validated against builder$p as the upper bound. |
| b | One-sided formula (~ ...) or NULL | The override multiplicative basis for dimension $k$. NULL disables the multiplicative component for that dimension. |

Returns The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, b = b).

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • k is validated by assert_count(k, "k", max = builder$p).
  • b is validated by assert_one_sided_formula(b, "b", allow_null = TRUE).
  • The override is applied via override(), which layers a per-dimension specification on top of the existing dims_spec. Overrides survive subsequent uniform changes.
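
The precedence rule shared by the `a` and `b` setters (a per-dimension override wins over the uniform base, and survives later base changes) can be sketched with a toy two-layer structure. The field names `base` and `overrides` and the helpers `toy_override` and `toy_resolve` are assumptions made for illustration; the internal layout of a real `dims_spec` may differ.

```r
# Hypothetical two-layer structure: one base entry plus overrides keyed by
# dimension index. Field names are illustrative, not the package's layout.
toy_dims <- list(base = list(a = ~ x1, b = NULL), overrides = list())

toy_override <- function(dims, k, ...) {
  key   <- as.character(k)
  entry <- dims$overrides[[key]]
  if (is.null(entry)) entry <- list()
  new <- list(...)
  for (nm in names(new)) entry[nm] <- list(new[[nm]])  # keeps explicit NULLs
  dims$overrides[[key]] <- entry
  dims
}

toy_resolve <- function(dims, k) {
  out <- dims$base
  ov  <- dims$overrides[[as.character(k)]]
  for (nm in names(ov)) out[nm] <- list(ov[[nm]])  # override wins where present
  out
}

d <- toy_override(toy_dims, k = 2, a = ~ x1 + x2)
d$base$a <- ~ x3       # a later uniform change touches only the base layer
toy_resolve(d, 1)$a    # dimension 1 follows the base
toy_resolve(d, 2)$a    # dimension 2 keeps its override
```

Storing overrides separately from the base is what lets a later uniform change leave them intact.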

amm_set_W(builder, W)

Purpose Stores a W_basis object as the global modulating basis of the specification under construction. The modulating component $W(\theta_i)$ is a function of $\theta_i$ shared across all dimensions and enters the response model as a linear factor multiplying a covariate vector $x$:

$$\text{response} \sim W(\theta_i)\, x$$

Passing NULL disables the modulating component.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| W | W_basis object or NULL | The modulating basis to store. NULL clears any previously set basis. |

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$W.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • If W is non-NULL, it is validated by assert_inherits(W, "W_basis", "W").
  • The modulating component is global to all dimensions of $\theta_i$; there is no per-dimension setter for $W$. Declaring $W$ per-dimension would restrict the model to the separable sub-class, which is rejected by construction in the package design.
  • W is stored as a single top-level slot of the builder, not inside dims.

amm_set_x_vars(builder, x_vars)

Purpose Records a character vector of covariate names that enter the modulating component as the linear factor $x$ in $W(\theta_i)\, x$, or clears it by passing NULL. The recorded value is forwarded to amm_spec() at finalisation time. When NULL, the package derives covariate names from the right-hand side of the model formula passed to gdpar().

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| builder | amm_builder object | The builder to modify. |
| x_vars | Character vector (length $\geq 1$) or NULL | Names of the covariates used in the modulating component. NULL defers covariate identification to gdpar(). |

Returns The mutated amm_builder object (returned invisibly). The mutation is to builder$x_vars.

Notes

  • builder is validated by assert_inherits(builder, "amm_builder", "builder").
  • If x_vars is non-NULL, the function performs a manual validation check: it must be is.character(x_vars) and length(x_vars) >= 1L. On failure, gdpar_abort() is called with class "gdpar_input_error" and a data list containing the argument name and the received value.
  • This is the only setter in the builder suite that performs its own argument validation via gdpar_abort() rather than delegating to an assert_* helper, because the validation logic (non-empty character vector or NULL) does not match any existing assertion primitive.
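
A minimal base-R sketch of this kind of manual check, raising a classed condition that carries a data payload. `toy_abort` is a hypothetical stand-in for `gdpar_abort`, whose real signature is not shown here, and the message text is invented for the example.

```r
# Hypothetical stand-in for a classed abort helper, in base R only.
toy_abort <- function(message, cls, data = list()) {
  cond <- structure(
    list(message = message, call = NULL, data = data),
    class = c(cls, "error", "condition")
  )
  stop(cond)
}

check_x_vars <- function(x_vars) {
  # NULL is allowed; otherwise require a non-empty character vector.
  if (!is.null(x_vars) && !(is.character(x_vars) && length(x_vars) >= 1L)) {
    toy_abort(
      "Argument 'x_vars' must be a non-empty character vector or NULL.",
      cls  = "gdpar_input_error",
      data = list(argument = "x_vars", received = x_vars)
    )
  }
  invisible(x_vars)
}

# Callers can catch the error by class and inspect the attached data.
res <- tryCatch(check_x_vars(42), gdpar_input_error = function(e) e$data)
res$argument
```

Attaching the offending argument name and value to the condition lets calling code react programmatically instead of parsing the message.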


as_amm_spec(builder)

Purpose
Converts an amm_builder object into a finalized amm_spec specification object. This is the constructor that finalizes the builder's accumulated configuration, resolving any pending dimension specifications and handling the special case of a univariate AMM (p=1).

Arguments

  • builder: An object of class amm_builder containing the accumulated model configuration (the per-dimension specification dims, the modulating basis W, the covariate names x_vars, and the parameter dimension p).

Returns
An object of class amm_spec representing the fully specified AMM model.

Notes

  • The function first asserts that builder is indeed an amm_builder object.
  • For the univariate case (p == 1), it explicitly resolves the dims specification with resolve_dims_spec to extract the scalar-path additive and multiplicative basis formulas a and b. The resulting amm_spec object carries these as its a and b entries.
  • For the multivariate case (p > 1), the dims specification is passed directly to the amm_spec constructor without immediate resolution.
  • The amm_spec object is constructed by calling the amm_spec() constructor with the relevant components from the builder.

print.amm_builder(x, ...)

Purpose
An S3 print method for objects of class amm_builder. It provides a human-readable summary of the builder's current configuration, including the dimension p, the base additive and multiplicative formulas, any per-dimension overrides, the modulating basis, and the covariate names.

Arguments

  • x: An object of class amm_builder to be printed.
  • ...: Additional arguments (unused, but required for S3 method compatibility).

Returns
Invisibly returns the input object x.

Notes

  • This is an exported S3 method.
  • The output is printed to the console via cat().
  • The summary includes:
    • The parameter dimension p.
    • The base additive and multiplicative formulas a and b. These are printed as NULL if they have not been set.
    • Any per-dimension overrides, keyed by the dimension index k. Overrides are printed if they exist, otherwise <none> is shown.
    • The modulating basis W. If present, it is printed as W_basis(type = '<type>').
    • The covariate names x_vars. If NULL, it notes they are inherited from the gdpar() formula.
  • The function handles NULL values gracefully by printing "NULL" instead of attempting to print NULL directly.
  • The overrides are printed in ascending order of the integer key k.

R/amm_serialize.R

amm_save_spec(spec, path)

Purpose

Serializes an amm_spec object (the constructor-input representation of an AMM specification) into a canonical, human-readable plain-text file. The format is designed for version control, archival, and bit-exact reproducibility. The file is intended to be round-tripped by amm_load_spec, which parses it with a dedicated lexical parser (no source/eval), making the serialized form safe to load from untrusted locations.

Arguments

  • spec: Object of class amm_spec. The specification to serialize. Only the constructor inputs are recorded; any materialized state (e.g., a W_basis materialized at a specific $\theta_{\mathrm{ref}}$ via the internal materialize_W_basis) is deliberately not written, so that the reconstructed object is the unmaterialized form normally produced by amm_spec.
  • path: Character scalar. The destination file path. Must be a single non-empty string.

Returns

Invisibly returns path (the character scalar that was passed in), after writing the canonical text representation to that file via writeLines.

Notes

Validation and errors:

  • spec is checked with assert_inherits(spec, "amm_spec", "spec"); a failure raises an error (dispatched by assert_inherits).
  • path is validated explicitly: it must satisfy is.character(path), length(path) == 1L, and nzchar(path). If any of these fail, the function aborts via gdpar_abort with class "gdpar_input_error" and a data list containing argument = "path" and received = path.
  • If spec[["W"]] is non-NULL and identical(W[["type"]], "user"), the function aborts with a gdpar_input_error (no data field). User-defined W_basis objects cannot be canonized because the evaluator is an arbitrary R function.

Serialization logic (line-by-line construction):

  1. The package version is obtained from utils::packageVersion("gdpar") and emitted as the mandatory header line:
    # gdpar_spec_version: <version>
    
  2. The dimension count is emitted as p: <spec[["p"]]>.
  3. Scalar path (spec[["p"]] == 1L, checked with isTRUE): the a and b one-sided formulas are serialized via the internal helper .serialize_formula and emitted as a: <literal> and b: <literal>.
  4. Multivariate path (spec[["p"]] > 1L): a and b are both written as the literal string NULL (the per-dimension formulas are emitted separately below).
  5. x_vars is serialized via .serialize_char_vec and emitted as x_vars: <literal> (handles NULL and character vectors).
  6. W block:
    • If W is NULL: emits W.type: NULL.
    • If W[["type"]] is "polynomial": emits W.type: polynomial followed by W.degree: <as.integer(W[["degree"]])>.
    • If W[["type"]] is "bspline": emits W.type: bspline, then W.degree: <as.integer(W[["degree"]])>. Additionally, if W[["knots"]] is non-NULL, emits W.knots: <.serialize_num_vec(W[["knots"]])>; if W[["df"]] is non-NULL, emits W.df: <as.integer(W[["df"]])>. Either, both, or neither of W.knots/W.df may appear depending on which fields are populated.
  7. Multivariate per-dimension entries (spec[["p"]] > 1L, checked with isTRUE): for each k in seq_len(spec[["p"]]), the function retrieves entry <- spec[["dims"]][[k]] and emits two lines:
    dims.<k>.a: <.serialize_formula(entry[["a"]])>
    dims.<k>.b: <.serialize_formula(entry[["b"]])>
    
    The index k is interpolated directly into the key name, producing keys such as dims.1.a, dims.1.b, dims.2.a, etc.

Side effects: Writes a text file to path via writeLines(lines, con = path). The file is overwritten if it exists.

Version policy: The version header records the running package version exactly. The loader (amm_load_spec) checks this strictly; until the first stable release, any mismatch is treated as an error.

Format grammar summary (as emitted by this function):

| Key | Value grammar |
| --- | --- |
| # gdpar_spec_version: | Package version string (header line) |
| p | Positive integer |
| a, b | NULL or one-sided formula literal (e.g. ~ x1 + x2) |
| x_vars | NULL or c("x1", "x2", ...) |
| W.type | NULL, polynomial, or bspline |
| W.degree | Positive integer (present for polynomial and bspline) |
| W.knots | c(...) of numerics (present only for bspline with interior knots) |
| W.df | Positive integer (present only for bspline with df) |
| dims.K.a, dims.K.b | Same grammar as a, b, for K in 1:p (emitted only when p > 1) |

amm_load_spec(path)

Purpose

Reads a canonical gdpar specification file from disk and parses it into an amm_spec object. The function is the deserialization counterpart to the package's spec writer: it enforces a strict, line-oriented key: value grammar, validates a version header against the currently loaded package version, and dispatches to amm_spec() either in the univariate (p = 1) form (with scalar a/b) or the multivariate (p > 1) form (with a dims list of per-dimension a/b pairs).

Arguments

  • path : character scalar (length 1, non-empty). Filesystem path to a canonical gdpar spec file. Must exist on disk.

Mathematics

The parser implements a deterministic finite scan over the file's lines. Let $L = (l_1, \dots, l_n)$ be the raw lines. Define the comment predicate $C(l) \equiv \mathrm{trimws}(l) \text{ starts with } \texttt{\#}$ and the blank predicate $B(l) \equiv \mathrm{nzchar}(\mathrm{trimws}(l)) = \mathrm{FALSE}$. For each non-skipped line $l_i$, the split position is

$$ \mathrm{pos}_i = \mathrm{regexpr}(\texttt{":"}, l_i, \mathrm{fixed}=\mathrm{TRUE}), $$

and the key/value pair is

$$ k_i = \mathrm{trimws}(\mathrm{substr}(l_i, 1, \mathrm{pos}_i - 1)), \qquad v_i = \mathrm{trimws}(\mathrm{substr}(l_i, \mathrm{pos}_i + 1, |l_i|)). $$

The recognised key set is

$$ \mathcal{K} = \{\texttt{p}, \texttt{a}, \texttt{b}, \texttt{x\_vars}, \texttt{W.type}, \texttt{W.degree}, \texttt{W.knots}, \texttt{W.df}\} \cup \{\texttt{dims.}k\texttt{.a},\ \texttt{dims.}k\texttt{.b} : k \in \mathbb{Z}_{\geq 1}\}. $$

For $p = 1$, the admissible record set excludes any dims.*.* key; for $p > 1$, the admissible dims indices are exactly $\{1, \dots, p\}$ and the scalar a/b keys must be absent (i.e. parse to NULL).
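A minimal sketch of the scan and the first-colon split, using the same base R calls named in the formulas above. The helper name split_record and the sample lines are hypothetical.

```r
# Hypothetical sketch of the key/value split on the first ':' of a line.
split_record <- function(line) {
  pos <- as.integer(regexpr(":", line, fixed = TRUE))
  if (pos < 1L) stop("no ':' separator found")
  list(
    key   = trimws(substr(line, 1L, pos - 1L)),
    value = trimws(substr(line, pos + 1L, nchar(line)))
  )
}

raw <- c("# gdpar_spec_version: 0.1.0", "", "p: 1", "x_vars: c(\"a:b\", \"z\")")

# Skip comment lines and blank lines before splitting.
trimmed <- trimws(raw)
keep <- nzchar(trimmed) & !startsWith(trimmed, "#")
records <- lapply(raw[keep], split_record)
str(records)
```

Because only the first colon is used, a value that itself contains a colon (as in the last sample line) is kept intact.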

Returns

An amm_spec object constructed by amm_spec():

  • When p == 1L: amm_spec(a = a_scalar, b = b_scalar, W = W, x_vars = x_vars, p = 1L), where a_scalar and b_scalar are parsed formula objects (or NULL) and W is the parsed weight-basis specification returned by .parse_W_records().
  • When p > 1L: amm_spec(W = W, x_vars = x_vars, p = p_val, dims = dims_list), where dims_list is a list of length p_val whose $k$-th element is list(a = a_k, b = b_k) with a_k, b_k parsed formula objects.

Notes

  • All validation failures are raised via gdpar_abort() with class "gdpar_input_error"; condition data payloads are attached where contextual (e.g. argument, received, path, line, raw, key, file_version, package_version).
  • Input validation of path: rejects non-character, non-scalar, or empty-string inputs before any I/O.
  • Version header: the file must contain a line matching the regex ^\s*#\s*gdpar_spec_version\s*:. The extracted (whitespace-trimmed) value must be identical() to as.character(utils::packageVersion("gdpar")); any mismatch aborts, citing bit-exact reproducibility concerns across development releases.
  • Comment and blank lines are skipped during record parsing; only the first : (fixed-match) on each remaining line is used as the key/value separator.
  • Duplicate-key detection is performed against an accumulating records list; the error message reports both the current line and the line of the first occurrence.
  • Unknown-key rejection: any key not in recognised_prefixes and not matching ^dims\.[0-9]+\.[ab]$ aborts.
  • Required-key enforcement: p, a, b are mandatory for the univariate branch; for the multivariate branch, a/b must be NULL (i.e. .parse_formula() returned NULL) and dims.K.a / dims.K.b are required for every $K \in \{1, \dots, p\}$.
  • p is parsed via .parse_int() and must satisfy $p \geq 1$.
  • x_vars is optional; when absent it is passed as NULL.
  • The W block is delegated entirely to .parse_W_records(records).
  • Multivariate consistency: when p == 1L, any dims.* key triggers an abort listing the offending keys (via sQuote). When p > 1L, after building dims_list, the function recomputes the set of dims. keys and subtracts those matching ^dims\.[1-(p-1)]\.[ab]$|^dims\.p\.[ab]$ (i.e. the regex ^dims\.[1-%d]\.[ab]$|^dims\.%d\.[ab]$ with p_val - 1L and p_val substituted); any remainder is parsed for its integer index and aborted if the index is NA, less than 1, or greater than p_val, or if the key does not match the dims.K.{a,b} shape at all.
  • Side effects: reads from the filesystem via readLines(path, warn = FALSE); performs no writes.
  • No S3 dispatch is performed inside this function; the returned object's class is determined by amm_spec().
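The strict version check described in the notes can be sketched as below. The function name check_version_header and its package_version argument are assumptions for illustration; the real loader compares against as.character(utils::packageVersion("gdpar")) and signals through gdpar_abort() rather than stop().

```r
# Hypothetical sketch of the version-header check.
check_version_header <- function(lines, package_version) {
  pattern <- "^\\s*#\\s*gdpar_spec_version\\s*:"
  hit <- grep(pattern, lines, perl = TRUE)
  if (length(hit) == 0L) stop("missing gdpar_spec_version header")
  # Drop the matched prefix, keep the trimmed remainder as the file version.
  file_version <- trimws(sub(pattern, "", lines[hit[1L]], perl = TRUE))
  if (!identical(file_version, package_version)) {
    stop(sprintf("spec version '%s' does not match package version '%s'",
                 file_version, package_version))
  }
  invisible(file_version)
}

ok <- check_version_header(c("  # gdpar_spec_version : 0.1.0 ", "p: 1"), "0.1.0")
bad <- try(check_version_header("# gdpar_spec_version: 0.0.9", "0.1.0"), silent = TRUE)
```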

Serialization and Parsing Helpers (Formula, Vectors, Integers, W Records)

.serialize_formula(f)

Purpose

Serializes an R formula object into a single-line character string suitable for writing into the canonical text file format used by gdpar. This is the inverse of .parse_formula.

Arguments

  • f: formula or NULL. The formula to serialize. May be NULL, in which case the literal string "NULL" is produced.

Mathematics

No numerical formula. The transformation is:

$$ \text{serialize}(f) = \begin{cases} \texttt{"NULL"} & \text{if } f = \texttt{NULL}, \\ \texttt{paste}(\texttt{deparse}(f,\ \texttt{width.cutoff}=500),\ \texttt{collapse}=\texttt{" "}) & \text{otherwise.} \end{cases} $$

Returns

A length-one character string. When f is NULL, returns "NULL". Otherwise returns the deparsed formula collapsed onto a single line with elements separated by single spaces.

Notes

  • Uses deparse(..., width.cutoff = 500L) to reduce line-wrapping in the deparsed output, then collapses any resulting multi-element character vector with spaces.
  • No validation is performed on the structure of f (e.g., one-sided vs. two-sided); that is the responsibility of the parser on read-back.
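The transformation is small enough to restate directly. The sketch below uses a hypothetical name, serialize_formula_sketch, and follows the case split given under Mathematics.

```r
# Hypothetical sketch of the formula serializer.
serialize_formula_sketch <- function(f) {
  if (is.null(f)) return("NULL")
  # deparse() may return several strings for long formulas; join them.
  paste(deparse(f, width.cutoff = 500L), collapse = " ")
}

s_null <- serialize_formula_sketch(NULL)
s_form <- serialize_formula_sketch(~ x1 + log(x2))
print(c(s_null, s_form))
```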

.parse_formula(value, key, line_no)

Purpose

Parses a character string back into an R formula object, enforcing that it is either NULL or a one-sided formula beginning with ~. Used when reading canonical configuration files.

Arguments

  • value: character(1). The raw string read from the file for the given key.
  • key: character(1). The configuration key name, used in error messages.
  • line_no: integer(1) or numeric. The line number in the source file where value appeared, used in error messages.

Mathematics

No numerical formula. The validation logic is:

$$ \text{parse}(v) = \begin{cases} \texttt{NULL} & \text{if } \texttt{trimws}(v) = \texttt{"NULL"}, \\ \texttt{as.formula}(v) & \text{if } v \text{ starts with } \sim \text{ and } |\texttt{as.formula}(v)| = 2, \\ \text{error} & \text{otherwise.} \end{cases} $$

Returns

NULL or a one-sided formula object of length 2 (i.e., ~ rhs).

Notes

  • Trims whitespace from value before any comparison.
  • If the trimmed value is exactly "NULL", returns NULL immediately.
  • If the value does not start with "~", raises an error of class "gdpar_input_error" with data = list(key = key, value = value, line = line_no).
  • Wraps stats::as.formula(value) in tryCatch; on parse failure, raises a "gdpar_input_error" (without the data field).
  • After successful parsing, checks length(out) != 2L; a two-sided formula (length 3) triggers a "gdpar_input_error".
  • All errors are raised via gdpar_abort.
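A sketch of the read-back side under the same rules: accept "NULL", require a leading ~, and reject anything that does not parse to a length-2 (one-sided) formula. The name parse_formula_sketch is hypothetical, and stop() stands in for gdpar_abort().

```r
# Hypothetical sketch of the formula parser.
parse_formula_sketch <- function(value) {
  value <- trimws(value)
  if (identical(value, "NULL")) return(NULL)
  if (!startsWith(value, "~")) {
    stop("expected NULL or a one-sided formula starting with '~'")
  }
  out <- tryCatch(
    stats::as.formula(value),
    error = function(e) stop("could not parse formula: ", conditionMessage(e))
  )
  # A one-sided formula has length 2; a two-sided one has length 3.
  if (length(out) != 2L) stop("formula must be one-sided")
  out
}

f <- parse_formula_sketch("  ~ x1 + x2 ")
n <- parse_formula_sketch("NULL")
two_sided <- try(parse_formula_sketch("y ~ x1"), silent = TRUE)
```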

.serialize_char_vec(x)

Purpose

Serializes a character vector (or NULL) into a c(...) literal string with each element double-quoted, suitable for the canonical file format. This is the inverse of .parse_char_vec.

Arguments

  • x: character vector or NULL. The vector to serialize.

Mathematics

$$ \text{serialize}(x) = \begin{cases} \texttt{"NULL"} & \text{if } x = \texttt{NULL}, \\ \texttt{c("}x_1\texttt{", "}x_2\texttt{", }\ldots\texttt{")} & \text{otherwise,} \end{cases} $$

subject to the constraint that no element of $x$ contains ", \, \n, or \r.

Returns

A length-one character string. "NULL" for NULL input; otherwise a c(...) literal with quoted entries.

Notes

  • Before serialization, checks every element of x for the regex pattern ["\\\n\r] (double quote, backslash, newline, carriage return). If any match is found, raises a "gdpar_input_error" via gdpar_abort with the message indicating that double quotes, backslashes, or newlines are not permitted.
  • Each element is formatted via sprintf("\"%s\"", x_i).
  • Elements are joined with ", ".
  • An empty (length-zero) non-NULL character vector produces the string c().
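A sketch of the serializer, including the forbidden-character guard and the empty-vector case. The name serialize_char_vec_sketch is hypothetical; the guard is written with fixed-string matching instead of the single regex quoted above, which is a deliberate simplification.

```r
# Hypothetical sketch of the character-vector serializer.
serialize_char_vec_sketch <- function(x) {
  if (is.null(x)) return("NULL")
  # Reject double quote, backslash, newline, carriage return.
  forbidden <- c("\"", "\\", "\n", "\r")
  bad <- vapply(forbidden,
                function(ch) any(grepl(ch, x, fixed = TRUE)),
                logical(1))
  if (any(bad)) stop("double quotes, backslashes, and newlines are not permitted")
  sprintf("c(%s)", paste(sprintf("\"%s\"", x), collapse = ", "))
}

lit_two   <- serialize_char_vec_sketch(c("x1", "x2"))
lit_empty <- serialize_char_vec_sketch(character(0))
lit_null  <- serialize_char_vec_sketch(NULL)
```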

.parse_char_vec(value, key, line_no)

Purpose

Parses a c(...) literal string back into a character vector, or recognizes "NULL". Inverse of .serialize_char_vec.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

The extraction proceeds as:

  1. Trim value; if equal to "NULL", return NULL.
  2. Require the regex ^c\(.*\)$; otherwise error.
  3. Extract the inner content: inner = sub("^c\\((.*)\\)$", "\\1", v).
  4. Find all quoted tokens via the regex "([^"]*)" on inner.
  5. Strip the surrounding double quotes from each match.

Returns

NULL, character(0), or a character vector of the parsed quoted tokens.

Notes

  • If the trimmed value is "NULL", returns NULL.
  • If the value does not match ^c\(.*\)$, raises a "gdpar_input_error".
  • After extracting inner, if there are zero regex matches:
    • If inner is empty or whitespace-only (!nzchar(trimws(inner))), returns character(0).
    • Otherwise raises a "gdpar_input_error" indicating quoted tokens were expected.
  • Quoted tokens are matched with the regex "([^"]*)" and then the double quotes are removed via gsub("\"", "", matches, fixed = TRUE).
  • The parser does not handle escaped quotes inside tokens; this is consistent with the serializer's prohibition on double quotes within elements.
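The five extraction steps map onto base R roughly as follows. The name parse_char_vec_sketch is hypothetical and stop() stands in for gdpar_abort().

```r
# Hypothetical sketch of the character-vector parser.
parse_char_vec_sketch <- function(value) {
  value <- trimws(value)
  if (identical(value, "NULL")) return(NULL)
  if (!grepl("^c\\(.*\\)$", value)) stop("expected NULL or a c(...) literal")
  inner <- sub("^c\\((.*)\\)$", "\\1", value)
  # All double-quoted tokens inside the parentheses.
  matches <- regmatches(inner, gregexpr("\"([^\"]*)\"", inner))[[1L]]
  if (length(matches) == 0L) {
    if (!nzchar(trimws(inner))) return(character(0))
    stop("expected double-quoted tokens inside c(...)")
  }
  gsub("\"", "", matches, fixed = TRUE)
}

v_two   <- parse_char_vec_sketch("c(\"x1\", \"x2\")")
v_empty <- parse_char_vec_sketch("c()")
v_null  <- parse_char_vec_sketch(" NULL ")
```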

.serialize_num_vec(x)

Purpose

Serializes a numeric vector (or NULL) into a c(...) literal string using high-precision formatting. Inverse of .parse_num_vec.

Arguments

  • x: numeric vector or NULL. The vector to serialize.

Mathematics

Each element is formatted with 17 significant digits:

$$ x_i \mapsto \texttt{sprintf}(\texttt{"\%.17g"},\ x_i), $$

then joined as $\texttt{c(}x_1\texttt{, }x_2\texttt{, }\ldots\texttt{)}$.

Returns

A length-one character string:

  • "NULL" if x is NULL.
  • "c()" if x has length 0.
  • Otherwise a c(...) literal with %.17g-formatted numbers.

Notes

  • The %.17g format emits 17 significant decimal digits, which is enough to identify any finite IEEE-754 double uniquely; an exact round trip additionally requires that the read-back conversion be correctly rounded.
  • No validation is performed on finiteness or NA/NaN values. sprintf("%.17g", NA) yields "NA" and sprintf("%.17g", NaN) yields "NaN"; both fail on re-parse via .parse_num_vec, because the coerced value satisfies is.na() and triggers the non-numeric error. sprintf("%.17g", Inf) yields "Inf", which re-parses successfully, although non-finite values are not checked here.
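A sketch of the numeric serializer with a quick read-back. The name serialize_num_vec_sketch is hypothetical, and the read-back line is an ad hoc inline parse used only for this demonstration.

```r
# Hypothetical sketch of the numeric-vector serializer.
serialize_num_vec_sketch <- function(x) {
  if (is.null(x)) return("NULL")
  # 17 significant digits per element; a length-0 input yields "c()".
  sprintf("c(%s)", paste(sprintf("%.17g", x), collapse = ", "))
}

x <- c(0.5, 2, -1.5, 0.1)
lit <- serialize_num_vec_sketch(x)

# Ad hoc read-back: strip the c(...) wrapper, split on commas, coerce.
inner <- sub("^c\\((.*)\\)$", "\\1", lit)
back <- as.numeric(trimws(strsplit(inner, ",", fixed = TRUE)[[1L]]))

print(lit)
print(identical(back, x))
```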

.parse_num_vec(value, key, line_no)

Purpose

Parses a c(...) literal string back into a numeric vector, or recognizes "NULL". Inverse of .serialize_num_vec.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

  1. Trim value; if "NULL", return NULL.
  2. Require ^c\(.*\)$; otherwise error.
  3. Extract inner: inner = trimws(sub("^c\\((.*)\\)$", "\\1", v)).
  4. If inner is empty, return numeric(0).
  5. Split on , (fixed), trim each part, coerce with as.numeric.
  6. If any result is NA, error.

Returns

NULL, numeric(0), or a numeric vector.

Notes

  • Uses suppressWarnings(as.numeric(parts)) to silence coercion warnings; the presence of NA in the result is then checked explicitly.
  • If any token fails to coerce (producing NA), raises a "gdpar_input_error" via gdpar_abort.
  • Splitting is done with strsplit(inner, ",", fixed = TRUE), so commas inside quoted strings would cause issues — but numeric vectors do not contain quoted strings, so this is safe.
  • Non-finite tokens are treated unevenly: Inf and -Inf coerce via as.numeric and pass the NA check (is.na(Inf) is FALSE), so they are accepted; NaN also coerces, but is rejected because is.na(NaN) is TRUE.
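The six parsing steps, including the uneven handling of non-finite tokens, can be sketched as follows. The name parse_num_vec_sketch is hypothetical and stop() stands in for gdpar_abort().

```r
# Hypothetical sketch of the numeric-vector parser.
parse_num_vec_sketch <- function(value) {
  value <- trimws(value)
  if (identical(value, "NULL")) return(NULL)
  if (!grepl("^c\\(.*\\)$", value)) stop("expected NULL or a c(...) literal")
  inner <- trimws(sub("^c\\((.*)\\)$", "\\1", value))
  if (!nzchar(inner)) return(numeric(0))
  parts <- trimws(strsplit(inner, ",", fixed = TRUE)[[1L]])
  out <- suppressWarnings(as.numeric(parts))
  # anyNA() is TRUE for both NA and NaN, so "NaN" tokens are rejected too.
  if (anyNA(out)) stop("non-numeric token in c(...) literal")
  out
}

n_ok    <- parse_num_vec_sketch("c(1, 2.5, -3e2)")
n_inf   <- parse_num_vec_sketch("c(1, Inf)")
n_empty <- parse_num_vec_sketch("c()")
n_nan   <- try(parse_num_vec_sketch("c(1, NaN)"), silent = TRUE)
```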

.parse_int(value, key, line_no)

Purpose

Parses a character string into a single integer value, with strict validation.

Arguments

  • value: character(1). The raw string from the file.
  • key: character(1). Configuration key name for error messages.
  • line_no: integer(1) or numeric. Source line number for error messages.

Mathematics

$$ v = \texttt{as.numeric}(\texttt{trimws}(\text{value})), $$

then validate:

$$ \text{valid} \iff \neg\,\texttt{is.na}(v) \;\wedge\; \texttt{is.finite}(v) \;\wedge\; v = \texttt{as.integer}(v). $$

If valid, return $\texttt{as.integer}(v)$.

Returns

A length-one integer.

Notes

  • Uses suppressWarnings(as.numeric(value)) to silence coercion warnings.
  • The check v != as.integer(v) rejects non-integer-valued numerics (e.g., 3.5). Note that as.integer(v) truncates toward zero, so this comparison effectively requires $v$ to be a whole number.
  • is.finite(v) rejects Inf, -Inf, NaN, and NA.
  • On any validation failure, raises a "gdpar_input_error" via gdpar_abort.
  • The returned value is explicitly as.integer(v), so it is of R type integer (not numeric).
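A sketch of the validation rule. The name parse_int_sketch is hypothetical; wrapping the equality in isTRUE() is a defensive choice of this sketch (it turns an NA comparison into a clean rejection) and is not claimed to match the package code.

```r
# Hypothetical sketch of the strict integer parser.
parse_int_sketch <- function(value) {
  v <- suppressWarnings(as.numeric(trimws(value)))
  ok <- !is.na(v) && is.finite(v) &&
    isTRUE(v == suppressWarnings(as.integer(v)))
  if (!ok) stop("expected a single integer value")
  as.integer(v)
}

i_plain  <- parse_int_sketch("3")
i_padded <- parse_int_sketch(" 12 ")
i_frac   <- try(parse_int_sketch("3.5"), silent = TRUE)
```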

.parse_W_records(records)

Purpose

Parses the collection of W.* configuration records (for the basis-function specification of the dynamic parameter model) and constructs a W_basis object. This is the central dispatcher for reconstructing the basis $W$ from the canonical file format.

Arguments

  • records: a named list of record objects. Each record is expected to be a list with elements value (character string) and line (line number). The relevant keys are "W.type", "W.degree", "W.knots", and "W.df".

Mathematics

The basis $W$ is constructed according to:

$$ W = \begin{cases} \texttt{NULL} & \text{if } \texttt{W.type} \text{ is absent or } \texttt{"NULL"}, \\ \text{error} & \text{if } \texttt{W.type} = \texttt{"user"}, \\ W_{\text{poly}}(\text{degree}) & \text{if } \texttt{W.type} = \texttt{"polynomial"}, \\ W_{\text{bspline}}(\text{degree},\, \text{knots}) & \text{if } \texttt{W.type} = \texttt{"bspline"} \text{ and } \texttt{W.knots} \text{ given}, \\ W_{\text{bspline}}(\text{degree},\, \text{df}) & \text{if } \texttt{W.type} = \texttt{"bspline"} \text{ and } \texttt{W.df} \text{ given}. \end{cases} $$

Returns

NULL, or a W_basis object constructed by W_basis(...).

Notes

  • If records[["W.type"]] is NULL or its value is "NULL", returns NULL immediately (no basis).
  • W.type = "user" is explicitly rejected with a "gdpar_input_error" because user-defined bases reference arbitrary R functions that cannot be serialized in the canonical format.
  • Only "polynomial" and "bspline" are accepted as W.type values; anything else raises a "gdpar_input_error" referencing the line number from Wt_rec[["line"]].
  • W.degree is required for both supported types. If records[["W.degree"]] is NULL, an error is raised. The degree is parsed via .parse_int.
  • For W.type = "polynomial": returns W_basis(type = "polynomial", degree = W_degree). No further keys are consulted.
  • For W.type = "bspline":
    • W.knots and W.df are mutually exclusive. If both are present, a "gdpar_input_error" is raised.
    • If W.knots is present, it is parsed via .parse_num_vec and passed as knots to W_basis.
    • If W.df is present, it is parsed via .parse_int and passed as df to W_basis.
    • If neither W.knots nor W.df is present, a "gdpar_input_error" is raised stating that one must be supplied.
  • All errors are raised via gdpar_abort with class "gdpar_input_error".
  • This function delegates the actual basis construction to the W_basis constructor, which is not defined in this section.
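The dispatch rules can be sketched as below. Everything here is illustrative: parse_W_sketch is a hypothetical name, the result is a plain list standing in for a W_basis object, the value parsing is inlined rather than delegated to .parse_int / .parse_num_vec, and stop() replaces gdpar_abort().

```r
# Hypothetical sketch of the W.* record dispatcher.
parse_W_sketch <- function(records) {
  get <- function(key) {
    rec <- records[[key]]
    if (is.null(rec)) NULL else trimws(rec[["value"]])
  }
  type <- get("W.type")
  if (is.null(type) || identical(type, "NULL")) return(NULL)
  if (identical(type, "user")) stop("user-defined bases cannot be serialized")
  if (!type %in% c("polynomial", "bspline")) stop("unsupported W.type: ", type)

  degree <- get("W.degree")
  if (is.null(degree)) stop("W.degree is required")
  degree <- as.integer(degree)
  if (identical(type, "polynomial")) return(list(type = type, degree = degree))

  knots <- get("W.knots")
  df <- get("W.df")
  if (!is.null(knots) && !is.null(df)) stop("W.knots and W.df are mutually exclusive")
  if (is.null(knots) && is.null(df)) stop("one of W.knots or W.df must be supplied")
  if (!is.null(knots)) {
    inner <- sub("^c\\((.*)\\)$", "\\1", knots)
    vals <- as.numeric(trimws(strsplit(inner, ",", fixed = TRUE)[[1L]]))
    return(list(type = type, degree = degree, knots = vals))
  }
  list(type = type, degree = degree, df = as.integer(df))
}

rec <- function(value, line) list(value = value, line = line)
w_poly <- parse_W_sketch(list(W.type = rec("polynomial", 6), W.degree = rec("2", 7)))
w_bs <- parse_W_sketch(list(W.type = rec("bspline", 6), W.degree = rec("3", 7),
                            W.knots = rec("c(0.25, 0.5)", 8)))
```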

R/amm_spec.R

amm_spec(a = NULL, b = NULL, W = NULL, x_vars = NULL, p = 1L, dims = NULL)

Purpose Constructs a specification object that declares which components of the Additive-Multiplicative-Modulated (AMM) canonical form

$$\Delta(x, \theta) = a(x) + b(x) \odot \theta + W(\theta)\, x$$

are active and how each is parametrized. The returned amm_spec object is the primary input consumed by gdpar() to assemble design matrices and the Stan model. Two mutually exclusive construction paths exist, selected by p: the scalar path (p = 1L, default) accepts a and b directly as one-sided formulas; the multivariate path (p > 1L) requires the dims argument instead and forbids a/b.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| a | One-sided formula or NULL | Scalar path only. Basis for the additive component $a(x)$. Evaluated without an intercept via stats::model.matrix. NULL disables the additive component. Must be NULL when p > 1L. |
| b | One-sided formula or NULL | Scalar path only. Basis for the multiplicative component $b(x)$. Same semantics as a. Must be NULL when p > 1L. |
| W | Object of class W_basis or NULL | Basis for the modulating component $W(\theta)$. Declared once regardless of p because it couples all dimensions of $\theta_{\mathrm{ref}}$. NULL disables the modulating component. Validated with assert_inherits. |
| x_vars | Character vector or NULL | Names of covariates entering the modulating component as the linear factor $x$ in $W(\theta)\,x$. When NULL, gdpar() later uses the covariates from the right-hand side of the model formula. Must be non-empty character when supplied. |
| p | Integer $\geq 1$ | Dimension of the per-individual parameter vector $\theta_i$. Defaults to 1L. Coerced to integer via as.integer after validation with assert_count. |
| dims | dims_spec object, plain list of length p, or NULL | Multivariate path only. Each entry is a list with components a (one-sided formula or NULL) and b (one-sided formula or NULL). Must be NULL when p == 1L. Bare formulas are rejected to prevent silent recycling. If a dims_spec object (from dimwise()), it is resolved via resolve_dims_spec(dims, p). |

Mathematics

The AMM level of the specification is inferred from the (non-)nullity of the components across all dimensions:

$$ \text{level} = \begin{cases} 0 & \text{if every component is } \texttt{NULL} \text{ across all dimensions,} \\ 1 & \text{if only } a \text{ is active in any dimension and neither } b \text{ nor } W \text{ is active anywhere,} \\ 2 & \text{otherwise.} \end{cases} $$

On the scalar path the check is direct on a, b, W; on the multivariate path it is computed over the resolved per-dimension list:

$$\texttt{any\_a} = \bigvee_{k=1}^{p} \lnot\,\texttt{is.null}(\texttt{resolved}_k\texttt{\$a}), \qquad \texttt{any\_b} = \bigvee_{k=1}^{p} \lnot\,\texttt{is.null}(\texttt{resolved}_k\texttt{\$b}).$$
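The level rule can be sketched over a resolved per-dimension list; the scalar path is the special case of a list of length one. The name amm_level_sketch and the plain-list stand-in for a W_basis object are assumptions of this example.

```r
# Hypothetical sketch of the AMM level inference.
amm_level_sketch <- function(resolved, W = NULL) {
  any_a <- any(vapply(resolved, function(d) !is.null(d$a), logical(1)))
  any_b <- any(vapply(resolved, function(d) !is.null(d$b), logical(1)))
  has_W <- !is.null(W)
  if (!any_a && !any_b && !has_W) return(0L)  # nothing active
  if (any_a && !any_b && !has_W) return(1L)   # additive component only
  2L                                          # b and/or W active somewhere
}

dims_a_only <- list(list(a = ~ x1, b = NULL), list(a = NULL, b = NULL))
dims_with_b <- list(list(a = ~ x1, b = NULL), list(a = NULL, b = ~ x2))

lvl0 <- amm_level_sketch(list(list(a = NULL, b = NULL)))
lvl1 <- amm_level_sketch(dims_a_only)
lvl2 <- amm_level_sketch(dims_with_b)
lvlW <- amm_level_sketch(list(list(a = NULL, b = NULL)),
                         W = list(type = "polynomial", degree = 2L))
```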

The design-matrix centering conditions (C2) and (C3) are not enforced inside this constructor; they are enforced empirically when the design matrices $Z_a$ and $Z_b$ are built by the (internal, downstream) design-matrix materializer, which subtracts column-wise sample means so that $\operatorname{colMeans}(Z_a) = \mathbf{0}$ and $\operatorname{colMeans}(Z_b) = \mathbf{0}$ exactly.

Returns An S3 object of class c("amm_spec", "list") with the following components:

| Component | Type | Description |
| --- | --- | --- |
| a | One-sided formula or NULL | Populated on the scalar path; NULL on the multivariate path. |
| b | One-sided formula or NULL | Populated on the scalar path; NULL on the multivariate path. |
| W | W_basis object or NULL | The modulating basis, passed through verbatim. |
| x_vars | Character vector or NULL | Passed through verbatim. |
| level | Integer (0L, 1L, or 2L) | AMM hierarchy level as defined above. |
| p | Integer | Dimension of $\theta_i$. |
| dims | NULL or list of length p | NULL on the scalar path. On the multivariate path, a resolved list of length p where each element is list(a = ..., b = ...). |

Notes

  • Scalar path (p == 1L): a and b are validated with assert_one_sided_formula(..., allow_null = TRUE). If dims is non-NULL, an error of class gdpar_input_error is raised.
  • Multivariate path (p > 1L): If either a or b is non-NULL, an error is raised directing the user to dims. If dims is NULL, an error is raised.
  • Bare formula guard: If dims inherits from "formula", a specific error is raised to prevent silent recycling across dimensions.
  • Plain list validation for dims: The list must have length exactly p; each entry must itself be a list. Missing a/b names default to NULL. Each formula entry is validated with assert_one_sided_formula.
  • dims_spec resolution: When dims is a dims_spec object (produced by dimwise() and possibly composed with override()), it is resolved via the internal resolve_dims_spec(dims, p) function.
  • Unknown class for dims: If dims is not NULL, not a formula, not a dims_spec, and not a list, an error of class gdpar_input_error is raised.
  • No cross-component identifiability check: The constructor does not detect non-identifiability between $a$, $b$, and $W$ components; this is deferred to gdpar_check_identifiability() which is called automatically before fitting.
  • Linearity assumption (LIN): Holds automatically for formula-based bases (linear subspaces of $L^2_0(\mu)$) and for polynomial/B-spline W_basis types.
  • Dependencies: Uses assert_count, assert_one_sided_formula, assert_inherits (all internal assertion helpers), gdpar_abort (structured error signalling), and resolve_dims_spec.

build_amm_design(amm, data, formula_rhs)

Purpose Constructs the centered design matrices ($Z_a$, $Z_b$, and optionally $X$) for an Additive-Multiplicative-Modulated (AMM) specification with $p=1$ coordinate. For $p>1$ it delegates to .build_amm_design_multi(). This function prepares the covariate matrices for the static and modulating components of the AMM, centering all of them and additionally scaling the modulating covariates to unit sample standard deviation.

Arguments

  • amm (amm_spec): An object of class "amm_spec" representing the AMM specification.
  • data (data.frame): The data frame containing the variables referenced by the AMM specification.
  • formula_rhs (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor x variables when amm$x_vars is NULL.

Mathematics The function implements the following operations:

  1. Centering the static components:
    For the static intercept formula amm$a (and similarly amm$b), the design matrix $Z_a^\text{full}$ is constructed. It is then column-centred by subtracting its column means: $$ Z_a = Z_a^\text{full} - \mathbf{1} \cdot \bar{Z}_a^\text{T} $$ where $\bar{Z}_a$ is the vector of column means of $Z_a^\text{full}$. The same is done for $Z_b$.

  2. Standardising the modulating component:
    When the modulating component amm$W is active, the covariate matrix $X^\text{full}$ is built from x_vars. It is then centred and scaled column by column to have zero mean and unit sample standard deviation: $$ X = \left( X^\text{full} - \mathbf{1} \cdot \bar{X}^\text{T} \right) \text{diag}\left(\frac{1}{s_X}\right) $$ where $\bar{X}$ is the vector of column means and $s_X$ is the vector of column sample standard deviations. The scaling matrix multiplies on the right, so each column is divided by its own standard deviation.
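Both operations can be sketched with base R on simulated data. The data frame, the formula, and the choice of g as the single modulating covariate are assumptions made for the example; only stats::model.matrix with a ~ . + 0 update and column-wise centring/scaling are taken from the description.

```r
# Hypothetical sketch: centred Z_a and standardised X on toy data.
set.seed(1)
data <- data.frame(x1 = rnorm(20), x2 = runif(20), g = rnorm(20, 5, 2))

# Static component: intercept-free design matrix, then column centring.
a <- ~ x1 + x2
Z_a_full <- stats::model.matrix(stats::update(a, ~ . + 0), data)
Z_a_means <- colMeans(Z_a_full)
Z_a <- sweep(Z_a_full, 2L, Z_a_means, "-")

# Modulating covariates: centre, then divide each column by its sample sd.
x_vars <- "g"
X_full <- as.matrix(data[, x_vars, drop = FALSE])
X_means <- colMeans(X_full)
X_c <- sweep(X_full, 2L, X_means, "-")
X_sds <- apply(X_c, 2L, stats::sd)
X <- sweep(X_c, 2L, X_sds, "/")

print(round(colMeans(Z_a), 12))
print(apply(X, 2L, stats::sd))
```

Keeping the means and standard deviations alongside the matrices is what allows new data to be mapped onto the same scale later.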

Returns A list with components:

  • Z_a (matrix): Centred design matrix for the static component a (rows = observations, columns = terms from a). If a is NULL, a 0-column matrix of the correct row count.
  • Z_b (matrix): Centred design matrix for the static component b.
  • X (matrix): Standardised (centred and scaled) design matrix for the modulating component. If W is inactive, a 0-column matrix.
  • Z_a_means (numeric): Column means of the raw (uncentred) design matrix for a.
  • Z_b_means (numeric): Column means for b.
  • X_means (numeric): Column means of the raw covariate matrix for x.
  • X_sds (numeric): Column standard deviations of the centred covariate matrix for x.
  • Z_a_names (character): Column names of Z_a.
  • Z_b_names (character): Column names of Z_b.
  • X_names (character): Column names of X (the x_vars).

Notes

  • If amm$p > 1, the function immediately returns the result of .build_amm_design_multi(amm, data, formula_rhs).
  • The function performs strict input validation via assert_inherits and assert_data_frame.
  • Raises a gdpar_input_error if any covariate needed by a, b, or x_vars contains missing values (NA). The error message specifies that "Path 1 does not impute".
  • Raises a gdpar_input_error if the modulating component W is active but no x_vars could be identified (from amm$x_vars, formula_rhs, or the formula terms).
  • Raises a gdpar_input_error if any required x_vars are missing from data.
  • Raises a gdpar_input_error if any x_vars are constant (zero standard deviation after centring).
  • The design matrices are constructed using stats::model.matrix with the formula updated to ~ . + 0 to suppress the intercept column.
  • The returned Z_a, Z_b, and X are always plain matrices (not data frames or tibbles).

.build_amm_design_multi(amm, data, formula_rhs)

Purpose Internal workhorse for building per-coordinate centred design matrices ($Z_{a,k}$, $Z_{b,k}$) and a shared modulating matrix ($X$) for a multivariate AMM specification with $p > 1$. The modulating component is shared across coordinates because it depends only on the global covariate vector $x$.

Arguments

  • amm (amm_spec): An object of class "amm_spec" with amm$p > 1. The per-coordinate specifications are stored in amm$dims.
  • data (data.frame): The data frame containing the variables referenced by the AMM specification.
  • formula_rhs (formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factor x variables when amm$x_vars is NULL.

Mathematics The algorithm is an extension of the univariate case ($p=1$) to $k=1,\dots,p$ coordinates:

  1. Collect needed variables:
    Iterate over all coordinates $k$ and collect all variables referenced in amm$dims[[k]]$a and amm$dims[[k]]$b. The union of these, plus amm$x_vars, forms the set of variables that must be present and complete (no NAs) in data.

  2. Per-coordinate centred design matrices:
    For each coordinate $k$:

    • If the static intercept formula a_k is not NULL, construct its design matrix $Z_{a,k}^\text{full}$ and centre it: $$ Z_{a,k} = Z_{a,k}^\text{full} - \mathbf{1} \cdot \bar{Z}_{a,k}^\text{T} $$
    • Similarly for $Z_{b,k}$ from b_k.
    • If a_k or b_k is NULL, the corresponding matrix is set to a 0-column matrix.
  3. Shared modulating matrix $X$:
    The construction is identical to the univariate case: identify x_vars, build $X^\text{full}$, centre each column, and scale it to unit sample standard deviation. The resulting $X$ is shared across all $p$ coordinates.
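The per-coordinate step, and the ragged column counts it produces, can be sketched as follows. The helper centre_design and the toy dims list are hypothetical; the shared matrix $X$ is omitted because it is built exactly as in the univariate case.

```r
# Hypothetical sketch: per-coordinate centred design matrices.
set.seed(7)
data <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
dims <- list(
  list(a = ~ x1 + x2, b = NULL),
  list(a = NULL,      b = ~ x1)
)

centre_design <- function(f, data) {
  # NULL formula: a matrix with the right row count and zero columns.
  if (is.null(f)) return(matrix(0, nrow(data), 0L))
  Z <- stats::model.matrix(stats::update(f, ~ . + 0), data)
  sweep(Z, 2L, colMeans(Z), "-")
}

Z_a_list <- lapply(dims, function(d) centre_design(d$a, data))
Z_b_list <- lapply(dims, function(d) centre_design(d$b, data))

# Column counts differ across coordinates (ragged lists).
a_cols <- vapply(Z_a_list, ncol, integer(1))
b_cols <- vapply(Z_b_list, ncol, integer(1))
print(rbind(a_cols, b_cols))
```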

Returns A list with components:

  • p (integer): The number of coordinates.
  • Z_a_list (list of matrices): Length p list; each element is the centred design matrix for coordinate $k$'s static component a.
  • Z_b_list (list of matrices): Length p list; each element is the centred design matrix for coordinate $k$'s static component b.
  • X (matrix): The shared standardised modulating matrix (0 columns if W is inactive).
  • Z_a_means_list (list of numeric vectors): Column means for each raw $Z_{a,k}^\text{full}$.
  • Z_b_means_list (list of numeric vectors): Column means for each raw $Z_{b,k}^\text{full}$.
  • X_means (numeric): Column means of the raw $X^\text{full}$.
  • X_sds (numeric): Column standard deviations of the centred $X$.
  • Z_a_names_list (list of character vectors): Column names for each $Z_{a,k}$.
  • Z_b_names_list (list of character vectors): Column names for each $Z_{b,k}$.
  • X_names (character): Column names of X (the x_vars).

Notes

  • This function is internal and not exported (hence the leading dot).
  • It performs the same input validation and error checks as build_amm_design: missing values in needed variables, identification of x_vars when W is active, missing x_vars in data, and constant covariates in x_vars.
  • The per-coordinate matrices in Z_a_list and Z_b_list may have different column counts (ragged arrays). Downstream assembly (e.g., for Stan) must handle this raggedness.
  • The function does not compute any group-level random effects or parameter transformations; it only constructs the design matrices from the data and specifications.

.build_amm_design_K(amm_list_canonical, data, formula_rhs)

Purpose Constructs centered design matrices and metadata for a multi‑individual (K > 1) AMM specification. Processes a list of canonical amm_spec objects (one per individual) and a dataset to produce centered design matrices for additive (a) and multiplicative (b) components, plus a scaled matrix for the modulating component (W) if present.

Arguments

  • amm_list_canonical (list): A named list of length ≥ 2. Each element must be an object of class amm_spec with p = 1 (the K > 1, p > 1 regime is unsupported). Slots must have non‑empty names.
  • data (data.frame): Data frame containing the variables used in the formulas. Must have no missing values in the required covariates.
  • formula_rhs (formula or character): Right‑hand side of the model formula. Used as a fallback to determine the set of covariates for the modulating component if x_vars is not specified and the union of variables from a/b formulas is empty.

Mathematics

For each individual $k = 1, \dots, K$:

  1. Additive component:
    If the formula $a_k$ is not NULL, build the design matrix $Z^{(a)}_{\text{full}, k}$ via model.matrix. Compute column means $\bar{z}^{(a)}_k$ and center: $$Z^{(a)}_k = Z^{(a)}_{\text{full}, k} - \mathbf{1}_n \bar{z}^{(a)\top}_k.$$ If $a_k$ is NULL, set $Z^{(a)}_k$ to an $n \times 0$ matrix.

  2. Multiplicative component:
    Analogously, from $b_k$, produce $Z^{(b)}_k$ after centering.

  3. Modulating component (if any $W$ is non‑NULL):
    Determine the variable set $\mathbf{x}$ (explicit x_vars or union of variables from all $a_k$, $b_k$, or from formula_rhs).
    Form $X_{\text{full}}$ from the data, then center and scale: $$\bar{x} = \frac{1}{n} X_{\text{full}}^\top \mathbf{1}_n,$$ $$X_c = X_{\text{full}} - \mathbf{1}_n \bar{x}^\top,$$ $$s_x = \sqrt{\operatorname{diag}\left(\frac{1}{n-1} X_c^\top X_c\right)} \quad \text{(elementwise square root)},$$ $$X = X_c \cdot \text{diag}(1/s_x).$$ Each column of $X$ has zero mean and unit sample standard deviation.
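The matrix expressions for the modulating component can be checked numerically against base R's scale(), which centres by column means and divides by the sample standard deviation. The simulated matrix and all variable names are assumptions of this sketch.

```r
# Hypothetical check of the centring and scaling formulas on toy data.
set.seed(42)
n <- 30
X_full <- cbind(u = rnorm(n, 10, 3), v = rexp(n))

x_bar <- colMeans(X_full)                        # (1/n) X_full' 1_n
X_c <- X_full - matrix(1, n, 1) %*% t(x_bar)     # X_full - 1_n x_bar'
s_x <- sqrt(diag(crossprod(X_c)) / (n - 1))      # column sample sds
X <- X_c %*% diag(1 / s_x, nrow = length(s_x))   # right-multiply to scale columns

# Largest absolute difference from the reference standardisation.
max_diff <- max(abs(X - scale(X_full)))
print(max_diff)
```

The nrow argument to diag() guards the single-column case, where diag(1 / s_x) would otherwise be read as a request for an identity matrix.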

Returns
A list with components:

  • K (integer): Number of individuals.
  • slot_names (character vector): Names of the input list slots.
  • Z_a_k_list (list of matrices): Centered design matrices for additive components (each $n \times p_a^{(k)}$, where $p_a^{(k)}$ is the number of columns from $a_k$; $n \times 0$ if absent).
  • Z_b_k_list (list of matrices): Centered design matrices for multiplicative components.
  • X (matrix): Centered and scaled design matrix for the modulating component ($n \times q$; $n \times 0$ if no $W$ is active).
  • Z_a_k_means_list (list of numeric vectors): Column means of the additive design matrices before centering.
  • Z_b_k_means_list (list of numeric vectors): Column means of the multiplicative design matrices.
  • X_means (numeric vector): Column means of the modulating component matrix (empty if $q = 0$).
  • X_sds (numeric vector): Column standard deviations of $X_c$ (empty if $q = 0$).
  • Z_a_k_names_list (list of character vectors): Column names of the additive design matrices.
  • Z_b_k_names_list (list of character vectors): Column names of the multiplicative design matrices.
  • X_names (character vector): Names of the variables in the modulating component.

Notes

  • Raises errors via gdpar_abort if:
    • amm_list_canonical is not a list of length ≥ 2 or has empty/missing names.
    • Any amm_spec has p > 1 (unsupported feature error).
    • data is not a data frame.
    • Missing values exist in any required covariate (no imputation).
    • The modulating component is active but no covariates are identified.
    • Required variables for the modulating component are missing in data.
    • Any variable in the modulating component is constant (zero standard deviation).
  • Uses assert_inherits to verify each list element is an amm_spec.
  • Uses stats::model.matrix and sweep for matrix construction and centering.

print.amm_spec(x, ...)

Purpose S3 method that prints a concise summary of an amm_spec object to the console, displaying its key specifications: level, dimension p, formulas for additive/multiplicative components, modulating component, and x_vars if present.

Arguments

  • x (amm_spec): An object of class amm_spec to print.
  • ... (any): Additional arguments (unused; present for S3 generic compatibility).

Returns Invisibly returns the input object x.

Notes

  • Output format:
    <amm_spec> AMM Level <level>
      p (dim theta_i)    : <p>
      a (additive)       : <formula or NULL>
      b (multiplicative) : <formula or NULL>
      W (modulating)     : <W_basis(...) or NULL>
      x_vars             : <comma-separated list, if any>
    
    If p > 1, instead of separate a and b lines, it prints a table of per‑dimension formulas from x$dims.
  • Formulas are deparsed to character strings for display.
  • The method is exported and can be called directly on amm_spec objects or via print().

R/bvm_check.R

gdpar_bvm_check(fit, parameters = NULL, level = 0.95, verbose = TRUE)

Purpose

Exported methodological-audit function. Empirically verifies the conclusion of the Bernstein–von Mises theorem (Theorem 4C of Block 4) for a fitted Path 1 Bayesian gdpar model by comparing Bayesian posterior credible intervals with Hessian-based asymptotic confidence intervals obtained from a Laplace approximation around the maximum likelihood estimator on a prior-stripped Stan model. The function is opt-in, computationally expensive, and does not modify the input fit.

Arguments

  • fit: object of S3 class gdpar_fit (produced by gdpar() with path = "bayes"). Must contain $fit (a cmdstanr fit object with a draws() method), $stan_data (a list with possible field use_groups), and $prior (passed to generate_stan_code).
  • parameters: optional character vector of parameter names to include in the comparison. When NULL (default), defaults to the user-facing parameters that the prior-stripped likelihood identifies, obtained by filtering posterior::variables(draws) against an exclusion regex.
  • level: numeric scalar in $[0,1]$ giving the nominal credible/confidence level. Defaults to 0.95.
  • verbose: logical scalar; when TRUE, prints an estimated-cost / opt-in message before starting. Defaults to TRUE.

Mathematics

Let $\theta$ denote the finite-dimensional parameter vector of the parametric AMM specification. Theorem 4C states that, under the LAN condition with non-singular Fisher information $\mathcal{I}(\theta_0)$ at the true parameter $\theta_0$, the posterior converges in total variation:

$$ \pi(\theta \mid y) \xrightarrow{TV} \mathcal{N}\!\left(\hat\theta_{\mathrm{MLE}},\; \frac{1}{n}\mathcal{I}(\theta_0)^{-1}\right). $$

For a nominal level $1-\alpha$ with $\alpha = 1 - \texttt{level}$, the function computes two-sided intervals with lower and upper quantiles

$$ q_L = \alpha/2, \qquad q_U = 1 - \alpha/2. $$

For each parameter, the Bayesian credible interval is $[b_L, b_U]$ from posterior quantiles and the asymptotic confidence interval is $[a_L, a_U]$ from the Laplace approximation around the MLE. The interval-width ratio and discrepancy are

$$ r = \frac{b_U - b_L}{a_U - a_L}, \qquad d = \left|\log r\right|. $$

A width ratio is flagged as suspicious when $r < 0.5$ or $r > 2$.

The MLE is obtained on the constrained (natural) scale because Stan's optimizer is invoked with jacobian = FALSE; the same convention is used for cmdstanr::laplace, so the Hessian-based covariance corresponds to the inverse observed information on the natural scale rather than the unconstrained scale.
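As a small illustration of the width-ratio diagnostic, the sketch below computes $r$, $d = |\log r|$, and the suspicious flag from interval endpoints. The function name width_diagnostics and the numeric inputs are hypothetical; the package computes these quantities internally from posterior and Laplace draws.

```r
# Hypothetical helper: width ratio, log-discrepancy, and the [0.5, 2] flag.
width_diagnostics <- function(bayes_lower, bayes_upper, asymp_lower, asymp_upper) {
  r <- (bayes_upper - bayes_lower) / (asymp_upper - asymp_lower)
  data.frame(
    width_ratio = r,
    discrepancy = abs(log(r)),
    suspicious  = r < 0.5 | r > 2
  )
}

# Two made-up parameters: equal widths, and a credible interval four times wider.
out <- width_diagnostics(
  bayes_lower = c(-1, 0), bayes_upper = c(1, 4),
  asymp_lower = c(-1, 0), asymp_upper = c(1, 1)
)
```

Because $d$ is symmetric in $\log r$, a credible interval twice as wide and one half as wide as the asymptotic interval receive the same discrepancy, which matches the symmetric flagging thresholds 0.5 and 2.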

Returns

A list of S3 class c("gdpar_bvm_report", "list") with components:

  • table: a data frame with one row per selected parameter and columns
    • variable: parameter name,
    • bayes_mean, bayes_lower, bayes_upper, bayes_width: posterior mean, lower/upper quantiles, and width $b_U - b_L$,
    • asymp_mean, asymp_lower, asymp_upper, asymp_width: Laplace-approximation mean, lower/upper quantiles, and width $a_U - a_L$ (all NA_real_ if the Laplace step failed),
    • width_ratio: $r = \texttt{bayes_width} / \texttt{asymp_width}$ (NA_real_ if the Laplace step failed).
  • discrepancy: numeric vector of length length(parameters) with $d = |\log r|$ (NA_real_ if the Laplace step failed).
  • level: the input level.
  • warnings: character vector of warning messages. Set to "Laplace approximation failed; asymptotic comparison unavailable." when the Laplace step fails; otherwise appended with one entry of the form "Width ratio outside [0.5, 2] for: <vars>." whenever any ratio falls outside $[0.5, 2]$.

Notes

  • Input validation:
    • assert_inherits(fit, "gdpar_fit", "fit") is called first.
    • assert_numeric_scalar(level, "level", lower = 0, upper = 1) enforces $0 \le \texttt{level} \le 1$.
    • verbose is checked manually: if !is.logical(verbose) || length(verbose) != 1L, the function aborts with gdpar_abort(..., class = "gdpar_input_error").
  • Hierarchical fits are rejected: if fit$stan_data$use_groups is non-null and equals 1L (after as.integer), the function aborts with class gdpar_unsupported_feature_error and a message explaining that Theorem 4C does not cover per-group random anchors.
  • Suggested-package dependencies cmdstanr (for optimization and Laplace) and posterior (for draw extraction/summarisation) are required via require_suggested; missing packages raise an error.
  • The opt-in message is emitted through gdpar_inform with class "gdpar_optin_message" when verbose = TRUE.
  • Candidate parameters are obtained from posterior::variables(draws) after excluding names matching the regex
    ^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw|c_b|c_b_raw|mu_theta_ref|sigma_theta_ref|sigma_a|sigma_b|sigma_W)
    
    This excludes latent noise, log-likelihood, predictions, per-observation anchors, random-effect coefficients and raws, centering constants, prior hyperparameters, and hierarchical scales (sigma_a, sigma_b, sigma_W), leaving user-facing identified parameters such as theta_ref, sigma_y, and phi.
  • When parameters is supplied by the user, any requested name not in candidate_vars triggers gdpar_abort with class gdpar_input_error and a message listing the missing names wrapped in sQuote.
  • Stan code is regenerated from fit$prior via generate_stan_code(fit$prior, mle = TRUE) (prior block stripped using the // BEGIN PRIORS / // END PRIORS markers) and written to a temporary file via write_stan_to_tempfile.
  • The MLE is found by cs_model$optimize(data = stan_data, algorithm = "lbfgs", refresh = 0, history_size = 5, init_alpha = 0.001, iter = 2000, jacobian = FALSE).
  • The Laplace step is wrapped in tryCatch. On error, gdpar_warn is emitted with class "gdpar_diagnostic_warning" carrying the message "Laplace approximation failed: <conditionMessage>. Falling back to MLE-only summary.", and the asymptotic columns of table plus discrepancy are filled with NA_real_.
  • Bayesian and Laplace summaries are both produced with posterior::summarise_draws using custom mean, sd, q_lower, q_upper functions; q_lower and q_upper use stats::quantile with names = FALSE at probabilities alpha/2 and 1 - alpha/2 respectively.
  • The function does not modify fit; the returned report is purely informational.
  • Side effects: writes a Stan file to a temporary location, invokes cmdstanr::cmdstan_model, optimize, and (optionally) laplace; may emit one opt-in message, one diagnostic warning, and one or more width-ratio warnings.
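The candidate-parameter filter described above can be reproduced with grepl. The sketch below applies the exclusion regex quoted in the Notes to a hypothetical vector of variable names (the names are made up for illustration; in the package they come from posterior::variables(draws)).

```r
# Exclusion pattern as quoted in the Notes.
exclude <- paste0(
  "^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw|c_b|c_b_raw|",
  "mu_theta_ref|sigma_theta_ref|sigma_a|sigma_b|sigma_W)"
)

# Hypothetical variable names standing in for posterior::variables(draws).
vars <- c("theta_ref", "sigma_y", "phi", "eta[1]", "log_lik[3]",
          "a_coef[1]", "sigma_a", "theta_i[2]")

# Keep every name that does not start with an excluded prefix.
candidate_vars <- vars[!grepl(exclude, vars)]
```

Note that the pattern is anchored only at the start of the name, so it acts as a prefix match: any variable whose name begins with one of the listed tokens is dropped, including indexed elements such as eta[1].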

print.gdpar_bvm_report(x, ...)

Purpose

Exported S3 print method for objects of class gdpar_bvm_report produced by gdpar_bvm_check. Renders a human-readable summary of the calibration comparison.

Arguments

  • x: an object of S3 class gdpar_bvm_report.
  • ...: unused; present for S3 generic compatibility.

Returns

Invisibly returns x.

Notes

  • Prints a header line of the form <gdpar_bvm_report> level = <level> followed by a newline.
  • Prints x$table via print with row.names = FALSE.
  • If length(x$warnings) > 0L, prints a blank line, the heading Warnings:, and each warning prefixed by - on its own line.
  • Dispatched through the S3 generic print based on the first class element "gdpar_bvm_report".
  • No validation of x is performed; the function assumes the object was constructed by gdpar_bvm_check.

R/causal_bridge_methods.R

print.gdpar_causal_bridge(x, ...)

Purpose S3 print method for objects of class gdpar_causal_bridge. Produces a concise, human-readable summary of the bridge's structural metadata and first few CATE estimates.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| x | gdpar_causal_bridge | The bridge object to print. |
| ... | (unused) | Present for S3 generic compatibility; ignored. |

Returns Invisibly returns x.

Notes

  • The method extracts the treatment-model family (x$fits$treat$family) and prints both its name and link.
  • It uses %||% (null-coalescing) to default K and p to 1L when the corresponding slot is absent in x$fits$treat.
  • K is printed only when K > 1; otherwise p is printed only when p > 1. If both are 1, neither label is emitted.
  • The anchor vector is formatted with digits = 4 and displayed as a comma-separated bracketed list.
  • x$meta$newdata_source defaults to "<unknown>" when NULL.
  • The head of cate_mean is displayed via the internal helper .bridge_format_head. If x$warnings is a non-empty character vector, each warning is printed on its own indented line under a "Warnings:" heading.
  • Side effect: writes to the console via cat.

.bridge_format_head(cate_mean, n_show = 6L)

Purpose Internal helper that formats the first few elements (or rows) of the cate_mean vector/matrix for display in print.gdpar_causal_bridge.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| cate_mean | numeric vector or numeric matrix | The posterior mean CATE values. A vector for scalar bridges; a matrix for multivariate/K-individual bridges. |
| n_show | integer, default 6L | Maximum number of elements or rows to display. |

Returns A single character string suitable for cat.

Notes

  • Matrix path (is.matrix(cate_mean) is TRUE): nrow(cate_mean) observations are assumed; n_show is clamped to min(n_show, n). Each row i is formatted as [v1, v2, ...] with digits = 3 via format. Rows are concatenated with "; " separators.
  • Vector path: length(cate_mean) observations are assumed; n_show is clamped similarly. The first n_show elements are formatted with digits = 3 and joined with ", ".
  • Marked @keywords internal and @noRd; not exported.
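The two formatting paths can be sketched as follows. This is an illustrative reimplementation under the behaviour described in the Notes, with the hypothetical name format_head_sketch; it is not the package's .bridge_format_head.

```r
# Hypothetical sketch of the vector and matrix formatting paths.
format_head_sketch <- function(cate_mean, n_show = 6L) {
  if (is.matrix(cate_mean)) {
    n_show <- min(n_show, nrow(cate_mean))        # clamp to available rows
    rows <- vapply(seq_len(n_show), function(i) {
      paste0("[", paste(format(cate_mean[i, ], digits = 3), collapse = ", "), "]")
    }, character(1L))
    paste(rows, collapse = "; ")                  # rows separated by "; "
  } else {
    n_show <- min(n_show, length(cate_mean))      # clamp to available elements
    paste(format(cate_mean[seq_len(n_show)], digits = 3), collapse = ", ")
  }
}

vec_out <- format_head_sketch(c(1, 2, 3), n_show = 2L)
mat_out <- format_head_sketch(matrix(1:4, nrow = 2))
```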

summary.gdpar_causal_bridge(object, ...)

Purpose S3 summary method for gdpar_causal_bridge objects. Constructs a structured summary containing a per-observation table of posterior CATE (mean and credible interval), the marginal Average Treatment Effect (ATE) and its credible interval, and ancillary metadata.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| object | gdpar_causal_bridge | The bridge to summarise. |
| ... | (unused) | Present for S3 generic compatibility; ignored. |

Returns An object of class c("summary.gdpar_causal_bridge", "list") with components:

| Component | Structure | Description |
|---|---|---|
| table | data.frame | Per-observation (and per-slot) CATE table (see below). |
| ate | named numeric vector | Marginal ATE. Scalar for K=1,p=1; length-K vector otherwise. |
| ate_ci | numeric matrix (K × 2) | Credible bounds for the marginal ATE. Columns lower, upper. Row names are slot names (or "ate" for scalar case). |
| level | numeric scalar | Credible level used. |
| type | character scalar | Type of bridge (e.g. "response", "link"). |
| n_draws | integer scalar | Number of posterior draws. |
| n_obs | integer scalar | Number of evaluation observations. |

Mathematics

Let $S$ denote the number of posterior draws and $n$ the number of observations. The credible level is $\ell$ and the tail probabilities are

$$\alpha = 1 - \ell, \quad q_L = \frac{\alpha}{2}, \quad q_U = 1 - \frac{\alpha}{2}.$$

Scalar bridge (cate_draws is an $S \times n$ matrix):

$$\text{ATE} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_mean}_i, \qquad \text{ATE draws} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_draws}_{s,i}, \quad s=1,\dots,S.$$

The ATE credible interval is

$$\text{CI}_{\text{ATE}} = \bigl[\,Q_{q_L}(\text{ATE draws}),\; Q_{q_U}(\text{ATE draws})\,\bigr],$$

where $Q_p$ denotes the empirical $p$-quantile (via stats::quantile).

Multivariate / K-individual bridge (cate_draws is an $S \times n \times K$ array):

For each slot $k = 1,\dots,K$:

$$\text{ATE}_k = \frac{1}{n S}\sum_{s=1}^{S}\sum_{i=1}^{n} \text{cate_draws}_{s,i,k}, \qquad \bar{\tau}_{s,k} = \frac{1}{n}\sum_{i=1}^{n} \text{cate_draws}_{s,i,k}.$$

$$\text{CI}_{\text{ATE},k} = \bigl[\,Q_{q_L}(\bar{\tau}_{\cdot,k}),\; Q_{q_U}(\bar{\tau}_{\cdot,k})\,\bigr].$$

Notes

  • Calls assert_inherits(object, "gdpar_causal_bridge", "object") to validate input.
  • The scalar branch is taken when is.matrix(cate_draws) is TRUE; the table then has columns observation, cate_mean, cate_lower, cate_upper. The cate_mean and cate_ci components of the bridge object (object$cate_mean, object$cate_ci) are assumed pre-computed.
  • The array branch is taken otherwise. Slot names are taken from object$meta$dim_names; if NULL, they default to "dim_1", "dim_2", …, "dim_K". The table gains a slot column and has K × n rows total.
  • The marginal ATE for the scalar case is mean(cate_draws) (i.e. the grand mean over both draws and observations), and the ATE CI uses rowMeans(cate_draws) (a length-$S$ vector of per-draw cross-observation means).
  • For the array case, ate_vec[k] is computed as mean(cate_draws[, , k]) (grand mean over all draws and observations for slot $k$). The CI for slot $k$ uses apply(cate_draws[, , k, drop = FALSE], 1L, mean) to produce a length-$S$ vector of per-draw means, then takes quantiles of that.
  • The ate_mat matrix in the scalar path has a single row named "ate".
  • Class is set to c("summary.gdpar_causal_bridge", "list").
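The ATE computations described in the Notes can be sketched in base R on simulated draws. Everything below is a hypothetical stand-in (random numbers in place of real posterior draws, and made-up sizes S, n, K); it only mirrors the mean, rowMeans, and apply calls named above.

```r
set.seed(1)
S <- 200L; n <- 5L; K <- 2L
level <- 0.95
alpha <- 1 - level
probs <- c(alpha / 2, 1 - alpha / 2)

# Scalar bridge: S x n matrix of simulated CATE draws.
cate_draws <- matrix(rnorm(S * n, mean = 1), nrow = S, ncol = n)
ate        <- mean(cate_draws)       # grand mean over draws and observations
ate_draws  <- rowMeans(cate_draws)   # one cross-observation mean per draw
ate_ci     <- stats::quantile(ate_draws, probs = probs, names = FALSE)

# Array bridge: S x n x K array, summarised slot by slot.
cate_arr <- array(rnorm(S * n * K), dim = c(S, n, K))
ate_vec  <- vapply(seq_len(K), function(k) mean(cate_arr[, , k]), numeric(1L))
ate_mat  <- t(vapply(seq_len(K), function(k) {
  per_draw <- apply(cate_arr[, , k, drop = FALSE], 1L, mean)  # length-S vector
  stats::quantile(per_draw, probs = probs, names = FALSE)
}, numeric(2L)))
colnames(ate_mat) <- c("lower", "upper")
```

The point estimate averages over both draws and observations, whereas the interval is built from the per-draw averages, so the interval reflects posterior uncertainty about the sample-average effect rather than heterogeneity across observations.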

print.summary.gdpar_causal_bridge(x, ...)

Purpose S3 print method for summary.gdpar_causal_bridge objects. Displays bridge metadata, the marginal ATE table, and the first 10 rows of the per-observation CATE table.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| x | summary.gdpar_causal_bridge | The summary object to print. |
| ... | (unused) | Present for S3 generic compatibility; ignored. |

Returns Invisibly returns x.

Notes

  • Builds an ate_df data frame with columns slot, mean, lower, upper from x$ate (coerced to unnamed vector) and x$ate_ci. This data frame is printed with row.names = FALSE.
  • The per-observation CATE table is truncated to the first 10 rows via utils::head(x$table, 10L).
  • Side effect: writes to the console via cat and print.

predict.gdpar_causal_bridge(object, newdata, level = NULL, summary = c("all", "draws", "mean_ci"), ...)

Purpose S3 predict method for gdpar_causal_bridge objects. Recomputes the per-observation Conditional Average Treatment Effect (CATE) on a new evaluation grid by leveraging the treatment and control model fits already stored in the bridge. Structural compatibility of the two fits is not re-checked.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| object | gdpar_causal_bridge | The bridge whose fits are used for prediction. |
| newdata | data.frame | New covariate grid on which to evaluate the CATE. Required. |
| level | numeric scalar in $(0,1)$ or NULL | Credible level for the new credible intervals. If NULL (default), uses object$level. |
| summary | character scalar, one of "all", "draws", "mean_ci" | Controls the structure of the returned value (see Returns). |
| ... | (unused) | Present for S3 generic compatibility; ignored. |

Mathematics

Let $\hat{\tau}_{\text{treat}}(x_i)$ and $\hat{\tau}_{\text{ctrl}}(x_i)$ denote the posterior draws for observation $x_i$ under the treatment and control fits respectively, aligned to the same set of $S$ draws. Then

$$\text{CATE}(x_i) = \hat{\tau}_{\text{treat}}(x_i) - \hat{\tau}_{\text{ctrl}}(x_i).$$

The summary statistics are

$$\text{cate_mean}_i = \frac{1}{S}\sum_{s=1}^{S} \text{CATE}_s(x_i), \qquad \text{cate_ci}_i = \bigl[\,Q_{q_L}(\text{CATE}_{\cdot}(x_i)),\; Q_{q_U}(\text{CATE}_{\cdot}(x_i))\,\bigr],$$

where $q_L = \alpha/2$, $q_U = 1 - \alpha/2$, and $\alpha = 1 - \text{level}$.
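For the scalar case, these summary statistics reduce to column-wise means and quantiles of an $S \times n$ matrix. A minimal sketch follows, using the hypothetical name summarize_cate_sketch (the package's own .summarize_cate is internal and may differ) and a small deterministic matrix in place of real draws.

```r
# Hypothetical sketch: per-observation mean and quantile bounds of CATE draws.
summarize_cate_sketch <- function(cate_draws, ql, qu) {
  ci <- apply(cate_draws, 2L, stats::quantile, probs = c(ql, qu), names = FALSE)
  list(
    cate_mean = colMeans(cate_draws),  # length-n vector
    cate_ci   = t(ci)                  # n x 2 matrix: lower, upper
  )
}

# Deterministic stand-in for S = 5 draws at n = 2 observations.
cate_draws <- cbind(1:5, 11:15)
level <- 0.5
alpha <- 1 - level
res <- summarize_cate_sketch(cate_draws, ql = alpha / 2, qu = 1 - alpha / 2)
```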

Returns

| summary value | Return structure |
|---|---|
| "all" | A list with components cate_draws (array/matrix of posterior CATE draws), cate_mean (numeric vector/matrix of posterior means), cate_ci (array/matrix of credible bounds), n_draws (integer scalar), n_obs (integer scalar). |
| "draws" | The raw cate_draws object (matrix or array), returned early. |
| "mean_ci" | A list with components cate_mean and cate_ci. |

Notes

  • Validates object via assert_inherits and newdata via assert_data_frame.
  • level is validated by assert_numeric_scalar when non-NULL, with bounds (1e-3, 1 − 1e-3) exclusive.
  • summary is matched via match.arg; partial matching is allowed by R convention.
  • The type slot is extracted from object$type and forwarded to both stats::predict calls with summary = "draws". This means both the treatment and control predictions return raw posterior draws.
  • Draw alignment is performed by the internal helper .align_bridge_draws(pred_t, pred_c), which returns a list with aligned $treat and $ctrl arrays and the common draw count $S.
  • CATE is computed as the element-wise difference aligned$treat - aligned$ctrl.
  • Summarisation (mean and CI) is delegated to the internal helper .summarize_cate(cate_draws, ql, qu).
  • The return value for summary = "all" mirrors the cate_* slot structure of a freshly constructed gdpar_causal_bridge, enabling downstream methods (e.g. summary, print) to operate on the result if it is wrapped appropriately.

R/causal_bridge.R

gdpar_causal_bridge(fit_treat, fit_ctrl, newdata = NULL, type = c("response", "theta_i", "linear_predictor"), level = 0.95, ...)

Purpose

Exported constructor for the T-learner causal bridge between two independent gdpar_fit objects. It estimates the conditional average treatment effect (CATE) as the per-observation, per-draw difference of the posterior predictive distributions from the treatment-arm fit and the control-arm fit, evaluated on a common evaluation set. The function does not modify either input fit and performs no causal adjustment beyond what is encoded in the two AMM specifications; the no-unmeasured-confounding assumption within each arm is the user's responsibility.

Arguments

  • fit_treat — (gdpar_fit) A fit object produced by gdpar() for the treatment arm. Must be a Path 1 (path = "bayes") Bayesian fit.
  • fit_ctrl — (gdpar_fit) A fit object produced by gdpar() for the control arm. Must share the family, anchor, AMM level, and covariate structure of fit_treat.
  • newdata — (data.frame or NULL) Optional evaluation data frame. When NULL (default), the function attempts to recover each arm's training data by evaluating the captured data argument of each fit's call in the caller's environment; if both recoveries succeed and the two data frames share column structure, their rbind is used. Otherwise the function aborts and requests an explicit newdata.
  • type — (character) Scalar selecting the prediction scale. Matched via match.arg against c("response", "theta_i", "linear_predictor"). "response" applies the inverse link per draw; "theta_i" and "linear_predictor" are synonyms selecting the linear predictor of the individual parameter.
  • level — (numeric) Scalar in $(0, 1)$ giving the nominal credible level for per-observation CATE intervals. Defaults to 0.95. Validated to lie in $[10^{-3},\, 1 - 10^{-3}]$.
  • ... — Reserved for future arguments; currently unused.

Mathematics

For each posterior draw $s = 1, \dots, S$ and observation $i = 1, \dots, n_{\text{new}}$, the bridge computes

$$ \widehat{\tau}^{(s)}_i = \hat{\mu}^{(s)}_{\text{treat}}(x_i) - \hat{\mu}^{(s)}_{\text{ctrl}}(x_i), $$

where $\hat{\mu}^{(s)}_{\text{arm}}(x)$ is the posterior prediction of the chosen type at $x$. The marginal posterior of the CATE at each $x_i$ is summarized by the empirical mean and the $(\alpha/2,\; 1-\alpha/2)$ quantiles with

$$ \alpha = 1 - \text{level}. $$

Because the two fits are independent (sampled from disjoint data subsets), the joint posterior factorizes:

$$ p(\theta_{\text{treat}}, \theta_{\text{ctrl}} \mid \text{data}) = p(\theta_{\text{treat}} \mid \text{data}_{\text{treat}}) \cdot p(\theta_{\text{ctrl}} \mid \text{data}_{\text{ctrl}}), $$

so any pairing of marginal draws is a valid sample from the joint. When the two fits differ in number of draws, the function trims to

$$ S = \min(S_{\text{treat}},\; S_{\text{ctrl}}) $$

and emits a gdpar_diagnostic_warning.
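The trimming and differencing step can be sketched for the scalar case as below. The helper name align_draws_sketch is hypothetical, the inputs are simulated, and the choice to keep the first $S$ draws of each arm is an assumption made for illustration; the source does not state which draws the package retains.

```r
# Hypothetical stand-in for the alignment step; not the package's internal helper.
align_draws_sketch <- function(pred_t, pred_c) {
  S_t <- nrow(pred_t)
  S_c <- nrow(pred_c)
  S   <- min(S_t, S_c)
  msg <- if (S_t != S_c) {
    sprintf("Draw counts differ (%d vs %d); trimmed to %d.", S_t, S_c, S)
  } else {
    NA_character_
  }
  list(
    treat   = pred_t[seq_len(S), , drop = FALSE],  # assumption: keep the first S draws
    ctrl    = pred_c[seq_len(S), , drop = FALSE],
    S       = S,
    warning = msg
  )
}

set.seed(42)
pred_t <- matrix(rnorm(8 * 3, mean = 2), nrow = 8)  # 8 draws, 3 observations
pred_c <- matrix(rnorm(6 * 3, mean = 1), nrow = 6)  # 6 draws, 3 observations

aligned    <- align_draws_sketch(pred_t, pred_c)
cate_draws <- aligned$treat - aligned$ctrl          # S x n matrix of per-draw CATEs
```

Because the joint posterior factorizes, pairing draw $s$ of one arm with draw $s$ of the other is as valid as any other pairing, so trimming by position loses only Monte Carlo precision.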

For multivariate ($p > 1$) or distributional ($K > 1$) fits, predict.gdpar_fit returns a 3-array of shape $[S, n, \text{dim}]$; the CATE is computed elementwise, and the per-coordinate or per-slot CATEs occupy the last dimension of cate_draws. For type = "response", the canonical inverse link of each coordinate or slot is applied by predict.gdpar_fit before differencing.

Returns

An object of class c("gdpar_causal_bridge", "list") with components:

  • cate_draws — matrix $[S, n]$ (scalar fits) or array $[S, n, \text{dim}]$ (multivariate / K-individual fits): per-draw CATE values.
  • cate_mean — posterior mean of the CATE per observation (and per dimension when applicable).
  • cate_ci — credible interval bounds at the requested level.
  • newdata — the resolved evaluation data frame.
  • id_check — list(treat = fit_treat$identifiability_report, ctrl = fit_ctrl$identifiability_report).
  • fits — list(treat = fit_treat, ctrl = fit_ctrl).
  • type — the matched type string.
  • level — the numeric credible level.
  • n_draws — S, the (possibly trimmed) number of aligned draws.
  • n_obs — nrow(newdata_resolved).
  • call — the matched call (match.call()).
  • warnings — character vector of fallback notifications (e.g., posterior-draw trimming); empty on the happy path.
  • meta — list(dim_kind, dim_size, dim_names, newdata_source) where newdata_source is the "bridge_source" attribute of the resolved newdata.

Notes

  • Calls six internal validators in sequence: .check_bridge_path, .check_bridge_hierarchical, .check_bridge_family, .check_bridge_dim, .check_bridge_amm, .check_bridge_anchor. Only the first three are defined in this section; the remaining three and .resolve_bridge_newdata, .align_bridge_draws, .summarize_cate are defined in subsequent sections.
  • Aborts with condition class gdpar_unsupported_feature_error (via gdpar_abort) when structural compatibility checks fail.
  • The assert_inherits calls validate that both fit_treat and fit_ctrl inherit from "gdpar_fit" before any other logic executes.
  • assert_numeric_scalar enforces level within $[10^{-3},\; 1 - 10^{-3}]$.
  • The newdata resolution uses parent.frame() as the evaluation environment for recovering captured data arguments.
  • Predictions are obtained via stats::predict(fit, newdata = ..., type = type, summary = "draws"), dispatching to predict.gdpar_fit.
  • The warnings field is set to character(0L) when aligned$warning is NA, otherwise to the warning string.
  • (C7) anti-aliasing of Block 6.5 is not invoked because the hierarchical guard rules out the regime in which it applies.
  • Companion S3 methods print.gdpar_causal_bridge and summary.gdpar_causal_bridge are documented elsewhere.

.check_bridge_path(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that both fits were produced under Path 1 (path = "bayes"). T-learner support for Paths 2/3 is not implemented.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

Each fit's path slot is read with a fallback default of "bayes":

$$ \text{path}_{\text{arm}} = \text{fit}_{\text{arm}}\text{\$path} \;\text{\%||\%}\; \text{"bayes"}. $$

The check passes if and only if $\text{path}_{\text{treat}} \equiv \text{"bayes"}$ and $\text{path}_{\text{ctrl}} \equiv \text{"bayes"}$ (via identical).

Returns

invisible(NULL) on success.

Notes

  • On failure, calls gdpar_abort with class = "gdpar_unsupported_feature_error" and a data list containing path_treat and path_ctrl.
  • The %||% operator provides the default "bayes" when fit$path is NULL, meaning a fit lacking an explicit path slot is treated as Path 1.
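The fallback behaviour is easy to see with a local definition of the null-coalescing operator. The sketch below uses plain lists as hypothetical stand-ins for gdpar_fit objects, and the path value "other" is made up purely to show the failing case.

```r
# Local definition of the null-coalescing operator (also available in recent base R).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Plain lists standing in for fits; these are not real gdpar_fit objects.
fit_explicit <- list(path = "bayes")
fit_missing  <- list()                  # no path slot at all
fit_other    <- list(path = "other")    # hypothetical non-Path-1 value

path_of <- function(fit) fit$path %||% "bayes"

both_ok  <- identical(path_of(fit_explicit), "bayes") &&
            identical(path_of(fit_missing), "bayes")
mismatch <- identical(path_of(fit_other), "bayes")
```

A consequence of the default is that an object with no path slot passes the check, so the validator relies on fits from other paths always recording their path explicitly.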

.check_bridge_hierarchical(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that neither fit was sampled in the hierarchical (grouped) regime. The T-learner difference of per-group anchors is not defined in the canonical formulation and is queued for a future sub-phase.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

A fit is classified as grouped when

$$ \text{is_grouped}(\text{fit}) = \bigl(\text{fit\$stan_data\$use_groups} \neq \text{NULL}\bigr) \;\wedge\; \bigl(\lfloor \text{fit\$stan_data\$use_groups} \rfloor = 1\bigr). $$

The check passes if and only if $\neg\,\text{is_grouped}(\text{fit}_{\text{treat}}) \;\wedge\; \neg\,\text{is_grouped}(\text{fit}_{\text{ctrl}})$.

Returns

invisible(NULL) on success.

Notes

  • Defines a local closure is_grouped(fit) that tests fit$stan_data$use_groups for non-NULL and integer equality to 1L (via as.integer(...)).
  • On failure, calls gdpar_abort with class = "gdpar_unsupported_feature_error" and a data list containing treat_grouped and ctrl_grouped (logical scalars).
  • Uses the same condition class as gdpar_bvm_check so user code handling unsupported-feature errors covers both helpers.
  • The error message advises refitting each arm without the group argument.
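A minimal sketch of the grouped-fit test, following the description of the local closure in the Notes. The fits here are plain lists used as hypothetical stand-ins; only the stan_data$use_groups field named in the source is assumed.

```r
# Sketch of the closure described above: non-NULL and integer-equal to 1L.
is_grouped <- function(fit) {
  ug <- fit$stan_data$use_groups
  !is.null(ug) && as.integer(ug) == 1L
}

# Hypothetical stand-ins for fits.
fit_grouped   <- list(stan_data = list(use_groups = 1))   # stored as double
fit_ungrouped <- list(stan_data = list(use_groups = 0L))
fit_no_field  <- list(stan_data = list())

check_passes <- !is_grouped(fit_ungrouped) && !is_grouped(fit_no_field)
```

The short-circuiting && matters here: without it, comparing a NULL value to 1L would produce a zero-length logical and fail inside an if condition.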

.check_bridge_family(fit_treat, fit_ctrl)

Purpose

Internal validator asserting that the two fits share compatible family identifiers—both the top-level family name and link, and, when param_specs are present (multivariate / K-individual families), the per-slot family identifiers.

Arguments

  • fit_treat — (gdpar_fit) Treatment-arm fit.
  • fit_ctrl — (gdpar_fit) Control-arm fit.

Mathematics

Let $\text{fam}_{\text{arm}} = \text{fit}_{\text{arm}}\text{\$family}$. The top-level check requires

$$ \text{fam}_{\text{treat}}\text{\$name} \equiv \text{fam}_{\text{ctrl}}\text{\$name} \;\wedge\; \text{fam}_{\text{treat}}\text{\$link} \equiv \text{fam}_{\text{ctrl}}\text{\$link}. $$

When param_specs is non-NULL on either fit, let $\text{ps}_{\text{arm}} = \text{fam}_{\text{arm}}\text{\$param_specs}$ (defaulting to an empty list via %||%) and $n_{\text{arm}} = |\text{ps}_{\text{arm}}|$. The slot-count check requires $n_{\text{treat}} = n_{\text{ctrl}}$. The per-slot identifier vector for each arm is

$$ \text{names}_{\text{arm}}[k] = \text{as.character}\bigl(\text{ps}_{\text{arm}}[[k]]\text{\$family_id} \;\text{\%||\%}\; \text{ps}_{\text{arm}}[[k]]\text{\$name} \;\text{\%||\%}\; \text{NA_character_}\bigr), \quad k = 1, \dots, n_{\text{arm}}, $$

and the slot-identifier check requires $\text{names}_{\text{treat}} \equiv \text{names}_{\text{ctrl}}$ (via identical).

Returns

invisible(NULL) on success.

Notes

  • Performs three sequential aborts, all with class = "gdpar_unsupported_feature_error":
    1. Top-level family name or link mismatch — data contains family_treat, family_ctrl, link_treat, link_ctrl.
    2. Slot count mismatch — data contains n_slots_treat, n_slots_ctrl.
    3. Per-slot family identifier mismatch — data contains slot_families_treat, slot_families_ctrl (character vectors).
  • The param_specs branch is entered when !is.null(ps_t) || !is.null(ps_c), meaning if either fit has param_specs, both are expected to have them and be compared.
  • Per-slot identifier extraction uses the chain s$family_id %||% s$name %||% NA_character_, so a slot lacking both family_id and name yields NA_character_.
  • vapply with character(1L) is used for type-safe extraction of per-slot identifiers.
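The per-slot identifier chain can be sketched as follows. The helper name slot_ids and the family labels "gaussian" and "gamma" are hypothetical illustrations; only the fields family_id and name and the fallback order come from the description above.

```r
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical sketch: one identifier per slot, with the documented fallback chain.
slot_ids <- function(param_specs) {
  vapply(
    param_specs %||% list(),
    function(s) as.character(s$family_id %||% s$name %||% NA_character_),
    character(1L)
  )
}

# Made-up slot lists: family_id present, only name present, neither present.
ps_treat <- list(list(family_id = "gaussian"), list(name = "gamma"), list())
ps_ctrl  <- list(list(family_id = "gaussian"), list(name = "gamma"), list())

ids_treat   <- slot_ids(ps_treat)
slots_agree <- identical(ids_treat, slot_ids(ps_ctrl))
```

Since identical treats two NA_character_ entries as equal, two arms that both lack identifiers in the same slot would pass this comparison.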

Internal Functions (Section 2)

.check_bridge_dim(fit_treat, fit_ctrl)

Purpose

Guard function for gdpar_causal_bridge that verifies the two fitted model objects share identical structural dimensions $(K, p)$, where $K$ is the number of AMM components (or mixture/strata slots) and $p$ is the parameter dimension. This is a prerequisite for any per-draw contrast between the treatment and control arms.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| fit_treat | list (fitted model object) | The treatment-arm fit; must contain (or default) elements K and p. |
| fit_ctrl | list (fitted model object) | The control-arm fit; must contain (or default) elements K and p. |

Mathematics

The check enforces the equality constraints

$$ K_{\text{treat}} = K_{\text{ctrl}}, \qquad p_{\text{treat}} = p_{\text{ctrl}}. $$

Each dimension is recovered with a null-coalescing fallback to 1L:

$$ K_t = \text{fit_treat\$K} \;\text{\%||\%}\; 1, \qquad K_c = \text{fit_ctrl\$K} \;\text{\%||\%}\; 1, $$

and analogously for $p_t, p_c$.

Returns

invisible(NULL) when the dimensions match. Otherwise it never returns: it calls gdpar_abort with class "gdpar_unsupported_feature_error" and a data payload listing K_treat, K_ctrl, p_treat, p_ctrl.

Notes

  • Uses the %||% operator, so a missing K or p slot silently defaults to 1L rather than erroring.
  • The error condition is tagged "gdpar_unsupported_feature_error", signalling that mismatched dimensions are a feature limitation rather than a user input typo.

.check_bridge_amm(fit_treat, fit_ctrl)

Purpose

Asserts that the two fits are compatible at the AMM (Additive Modulating Model) specification level. Three facets are compared: (1) the AMM level (structural composition of the spec slots a/b/W), (2) the modulating basis type (polynomial vs. B-spline), and (3) the covariate column structure of the AMM design. This guarantees that the predict path on newdata reuses the same algorithm on both arms.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| fit_treat | list (fitted model object) | Treatment-arm fit; expected to carry an amm element or a fallback amm_list_canonical. |
| fit_ctrl | list (fitted model object) | Control-arm fit; same expectation. |

Mathematics

Let $\mathcal{A}_t$ and $\mathcal{A}_c$ denote the AMM specs of the two arms. The function verifies three identities:

$$ \text{level}(\mathcal{A}_t) \equiv \text{level}(\mathcal{A}_c), $$

$$ \text{W_type}(\mathcal{A}_t) \equiv \text{W_type}(\mathcal{A}_c), $$

$$ \text{covariates}(\mathcal{A}_t) \equiv \text{covariates}(\mathcal{A}_c). $$

The AMM spec is resolved via null-coalescing:

$$ \mathcal{A} = \text{fit\$amm} \;\text{\%||\%}\; \text{fit\$amm_list_canonical}. $$

Returns

invisible(NULL) on success. On any mismatch, calls gdpar_abort:

  • Missing AMM on either fit → class "gdpar_internal_error".
  • Level mismatch → class "gdpar_unsupported_feature_error", data = list(level_treat, level_ctrl).
  • W-type mismatch → class "gdpar_unsupported_feature_error", data = list(W_type_treat, W_type_ctrl). The formatted message substitutes "<none>" for a NULL type.
  • Covariate mismatch → class "gdpar_unsupported_feature_error". The message lists the symmetric set difference of covariate component names:

$$ \bigl(\,\text{names}(\text{cov}_t) \cup \text{names}(\text{cov}_c)\,\bigr) \;\setminus\; \bigl(\,\text{names}(\text{cov}_t) \cap \text{names}(\text{cov}_c)\,\bigr). $$

Notes

  • Dispatches to three helper functions: .bridge_amm_level, .bridge_amm_W_type, .bridge_amm_covariates.
  • The covariate comparison uses identical(), so ordering of list elements matters for the check to pass.
  • A missing AMM on either fit is treated as an internal error (not a user error), reflecting the expectation that fits should always carry an AMM spec.
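The symmetric set difference reported in the covariate-mismatch message, and the order sensitivity of identical(), can both be shown in a few lines of base R. The covariate lists below are hypothetical; their component names (a, b, W) and contents are illustrative and do not reflect the actual return structure of .bridge_amm_covariates.

```r
# Hypothetical per-arm covariate structures.
cov_t <- list(a = c("x1", "x2"), W = "z")
cov_c <- list(a = c("x1", "x2"), b = "x3")

# Names present in exactly one of the two arms.
sym_diff <- setdiff(
  union(names(cov_t), names(cov_c)),
  intersect(names(cov_t), names(cov_c))
)

# identical() is sensitive to element order, not only to content.
same_content_reordered <- identical(
  list(a = "x1", b = "x2"),
  list(b = "x2", a = "x1")
)
```

If two arms carry the same component names but different contents or ordering, the symmetric difference of names is empty even though the identical() check fails, so an empty list in the message does not by itself mean the structures agree.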

.check_bridge_anchor(fit_treat, fit_ctrl)

Purpose

Asserts that the reference anchors stored in the two fits are numerically identical (within tolerance). The anchor enters the modulating term as $\theta_{\text{ref}}^k - \text{anchor}^k$, so a mismatch between arms changes the semantic meaning of the predicted $\theta_i$ and therefore of the CATE.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) Treatment-arm fit; must contain an anchor element (numeric vector).
fit_ctrl list (fitted model object) Control-arm fit; must contain an anchor element (numeric vector).

Mathematics

Given anchor vectors $a_t$ and $a_c$, the function first checks length equality:

$$ \text{length}(a_t) = \text{length}(a_c). $$

Then it computes element-wise absolute differences and a per-element scale:

$$ d_i = |a_{t,i} - a_{c,i}|, \qquad s_i = \max\bigl(|a_{t,i}|,\; |a_{c,i}|,\; 1\bigr), $$

and requires

$$ d_i \le 10^{-8} \cdot s_i \quad \forall\, i. $$

This combines relative tolerance (when $|a_{t,i}|$ or $|a_{c,i}|$ exceeds 1) with absolute tolerance (floor at 1).

Returns

invisible(NULL) when anchors match. Otherwise calls gdpar_abort:

  • Length mismatch → class "gdpar_unsupported_feature_error", data = list(anchor_treat_length, anchor_ctrl_length).
  • Value mismatch → class "gdpar_unsupported_feature_error", data = list(anchor_treat, anchor_ctrl). The message formats anchor values with digits = 6 and instructs the user to refit one arm anchored to the other's value.

Notes

  • The tolerance is fixed at 1e-8 and is not configurable.
  • pmax is used (element-wise parallel max), so the scale vector has the same length as the anchors.
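  • NULL anchor slots are not explicitly guarded. A NULL on one arm has length 0, so it is normally reported as a length mismatch; if both slots are NULL the length check passes, and any failure then comes from base R rather than from gdpar_abort (the subtraction itself returns numeric(0), but abs(NULL) raises an error).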
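The tolerance rule can be sketched in a few lines of R. `anchors_match` is a hypothetical name used only for illustration; the package raises classed errors via gdpar_abort instead of returning a logical.

```r
# Sketch of the mixed relative/absolute tolerance described above.
# `anchors_match` is a hypothetical helper, not a package function.
anchors_match <- function(a_t, a_c, tol = 1e-8) {
  if (length(a_t) != length(a_c)) return(FALSE)
  d <- abs(a_t - a_c)
  s <- pmax(abs(a_t), abs(a_c), 1)  # element-wise scale, floored at 1
  all(d <= tol * s)
}

# Small anchors are compared absolutely, large anchors relatively.
ok  <- anchors_match(c(0, 1e6), c(5e-9, 1e6 + 5e-3))
bad <- anchors_match(c(0, 1e6), c(2e-8, 1e6))
```

In the first call the difference of 5e-3 on an anchor of magnitude 1e6 is accepted because the allowed gap scales to about 1e-2; in the second call a gap of 2e-8 on an anchor near zero exceeds the absolute floor of 1e-8.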

.bridge_amm_level(amm)

Purpose

Infers the AMM level from an amm_spec object or a list of amm_spec objects. The level encodes the structural composition of the spec (presence/absence of a, b, W components).

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification (single or per-slot when $K > 1$).

Returns

  • If amm inherits from "amm_spec": as.integer(amm$level) (scalar integer).
  • If amm is a list: an integer vector of the same length, where each element is as.integer(a$level) if the element inherits from "amm_spec", otherwise NA_integer_.
  • Otherwise: NA_integer_.

Notes

  • No error is raised for non-amm_spec inputs; the function returns NA_integer_ silently.
  • The return type is always integer (via as.integer and NA_integer_), making the output suitable for identical() comparison in .check_bridge_amm.
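The dispatch described above can be sketched as below. `amm_level` and the `spec` constructor are hypothetical stand-ins, assuming only that an amm_spec is a classed list with a level element.

```r
# Hypothetical re-implementation of the three-way dispatch.
amm_level <- function(amm) {
  if (inherits(amm, "amm_spec")) return(as.integer(amm$level))
  if (is.list(amm)) {
    return(vapply(amm, function(a) {
      if (inherits(a, "amm_spec")) as.integer(a$level) else NA_integer_
    }, integer(1)))
  }
  NA_integer_
}

# Toy constructor for illustration only.
spec <- function(level) structure(list(level = level), class = "amm_spec")

single  <- amm_level(spec(2))                      # scalar integer
per_k   <- amm_level(list(spec(1), "not a spec"))  # NA for the non-spec entry
neither <- amm_level("oops")                       # silent NA_integer_
```

The integer coercion matters because identical(2L, 2) is FALSE in R: a level stored as a double on one arm and as an integer on the other would otherwise fail the comparison.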

.bridge_amm_W_type(amm)

Purpose

Extracts the modulating basis type (e.g., "polynomial" or "bspline") from the W sub-component of an AMM spec. This ensures the predict path on new data reuses the same basis expansion algorithm on both arms.

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification.

Returns

  • If amm inherits from "amm_spec":
    • NULL when amm$W is NULL (no modulating term).
    • as.character(amm$W$type) otherwise.
  • If amm is a list:
    • A character vector of length length(amm), where each element is as.character(a$W$type) if the element is an amm_spec with a non-null W, otherwise NA_character_.
    • If all elements are NA, returns NULL.
    • Otherwise returns the full character vector (including NA entries).
  • Otherwise: NULL.

Notes

  • The "all NA → NULL" collapse means a list of specs with no W on any slot is indistinguishable from a single spec with no W.
  • When some slots have W and others do not, the returned vector contains NA_character_ entries, which will cause .check_bridge_amm to fail the identical() test if the two arms differ in which slots lack W.
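A minimal sketch of the return rules, including the "all NA → NULL" collapse, is given below. `w_type` and `spec` are hypothetical names; the only assumption about amm_spec is that W is either NULL or a list with a type element.

```r
# Hypothetical re-implementation of the W-type extraction rules.
w_type <- function(amm) {
  if (inherits(amm, "amm_spec")) {
    if (is.null(amm$W)) return(NULL)
    return(as.character(amm$W$type))
  }
  if (is.list(amm)) {
    out <- vapply(amm, function(a) {
      if (inherits(a, "amm_spec") && !is.null(a$W)) as.character(a$W$type) else NA_character_
    }, character(1))
    if (all(is.na(out))) return(NULL)  # no slot carries a W
    return(out)
  }
  NULL
}

# Toy constructor: W is NULL unless a type is given.
spec <- function(type = NULL) {
  structure(list(W = if (!is.null(type)) list(type = type)), class = "amm_spec")
}

none_single <- w_type(spec())
none_list   <- w_type(list(spec(), spec()))
mixed       <- w_type(list(spec("bspline"), spec()))
```

`none_single` and `none_list` are both NULL, which is the indistinguishability noted above, while `mixed` keeps an NA entry for the slot without W.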

.bridge_amm_covariates(amm)

Purpose

Extracts the covariate names for each AMM component (a, b, and x_vars) from an AMM spec. This is used to verify that both arms share the same covariate column structure in their AMM designs.

Arguments

Argument Type Meaning
amm amm_spec or list of amm_spec The AMM specification.

Mathematics

For a single amm_spec $a$, the function extracts:

$$ \texttt{a\_vars} = \texttt{all.vars(a\$a)}, \qquad \texttt{b\_vars} = \texttt{all.vars(a\$b)}, \qquad \texttt{x\_vars} = \texttt{a\$x\_vars}, $$

where all.vars() parses an R formula/expression and returns the variable names referenced in it. Each component defaults to character(0L) when the corresponding slot is NULL.

Returns

  • If amm inherits from "amm_spec": a named list with elements a_vars, b_vars, x_vars (each a character vector).
  • If amm is a list: a list of such named lists (one per element of amm).
  • Otherwise: an empty list().

Notes

  • all.vars() is applied to a$a and a$b, which are expected to be formula-like objects (formulas, calls, or expressions). If they are character strings, all.vars() will return character(0).
  • The x_vars slot is taken as-is (not parsed), so it must already be a character vector.
  • The returned structure is compared with identical() in .check_bridge_amm, so the order of x_vars and the order of list elements matter.
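The behaviour of all.vars() that these notes rely on can be checked directly in base R. The variable names below are arbitrary examples.

```r
# all.vars() walks a formula or call and returns the symbols it references.
from_formula <- all.vars(~ age + log(income))
from_call    <- all.vars(quote(x1 * x2))

# A character string is not parsed, so no variables are found.
from_string  <- all.vars("age + income")

# identical() is order-sensitive, so permuted x_vars do not compare equal.
same_order <- identical(list(x_vars = c("u", "v")), list(x_vars = c("u", "v")))
permuted   <- identical(list(x_vars = c("u", "v")), list(x_vars = c("v", "u")))
```

Function names such as log are not reported by all.vars(), which is why `from_formula` contains only the two covariates.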

.resolve_bridge_newdata(fit_treat, fit_ctrl, newdata, eval_env)

Purpose

Resolves the evaluation grid (newdata) on which the CATE will be computed. When the user supplies newdata, it is used directly. When newdata is NULL, the function attempts to recover both arms' training data by evaluating the captured data argument from each fit's call, then rbinds them into a single data frame.

Arguments

Argument Type Meaning
fit_treat list (fitted model object) Treatment-arm fit; its $call$data is evaluated if newdata is NULL.
fit_ctrl list (fitted model object) Control-arm fit; same recovery logic.
newdata data.frame or NULL User-supplied evaluation grid. If non-NULL, it is validated and returned.
eval_env environment The environment in which to evaluate fit$call$data when recovering training data.

Returns

A data.frame with an attribute "bridge_source":

  • "user" — when newdata was supplied by the user (passed through after assert_data_frame validation).
  • "training_rbind" — when newdata was NULL and both arms' training data were successfully recovered and rbind-ed.

The rbind uses the column order of the treatment arm (data_t) for both arms, ensuring consistent column alignment.

Notes

  • Recovery logic: The inner function recover(fit, arm_label) extracts fit$call$data and evaluates it in eval_env inside a tryCatch. Any evaluation error silently yields NULL. The arm_label argument is accepted but never used in the body.
  • Column matching: Column names are compared after sorting (sort(colnames(...))), so column order may differ between arms but the set of columns must be identical. However, the rbind itself uses data_t's column order for both, so if data_c has columns in a different order, they are subsetted to match data_t's order before binding.
  • Error conditions (all via gdpar_abort):
    • Recovery failure (either arm's data is NULL or not a data frame) → class "gdpar_input_error", data = list(treat_recovered, ctrl_recovered).
    • Column structure mismatch → class "gdpar_input_error", data = list(colnames_treat, colnames_ctrl).
  • assert_data_frame(newdata, "newdata") is called on user-supplied newdata; this is presumably a checkmate-style assertion (not defined in this section).
  • The rbind is performed with drop = FALSE subsetting, preserving data frame structure even for single-column data.
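The recovery and binding steps can be sketched as follows. The `recover` signature is simplified (the unused arm label is dropped) and the data frames are toy inputs; this is an illustration of the described logic, not the package source.

```r
# Evaluate a captured `data` expression; any error yields NULL.
recover <- function(call_data, env) {
  tryCatch(eval(call_data, env), error = function(e) NULL)
}

env <- new.env()
assign("train_t", data.frame(x = 1:2, y = c(0.1, 0.2)), envir = env)
assign("train_c", data.frame(y = c(0.3, 0.4), x = 3:4), envir = env)  # other column order

data_t <- recover(quote(train_t), env)
data_c <- recover(quote(train_c), env)
lost   <- recover(quote(object_that_was_removed_xyz), env)  # NULL, no error

# Same column set (order-insensitive), then bind in the treatment arm's order.
same_set <- identical(sort(colnames(data_t)), sort(colnames(data_c)))
cols <- colnames(data_t)
grid <- rbind(data_t[, cols, drop = FALSE], data_c[, cols, drop = FALSE])
attr(grid, "bridge_source") <- "training_rbind"
```

Because evaluation failures are swallowed, a training data frame that has been removed from the evaluation environment is only noticed when the NULL result is checked afterwards.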

.align_bridge_draws(pred_t, pred_c)

Purpose

Aligns the posterior draw arrays of the two arms by trimming the longer set to match the shorter. This is necessary because the two fits may have been run with different numbers of posterior draws ($S$). When trimming occurs, a diagnostic warning is emitted and the trimming notification is returned for persistence in the final bridge object.

Arguments

Argument Type Meaning
pred_t matrix [S_t, n] or array [S_t, n, dim] Posterior draws from the treatment arm.
pred_c matrix [S_c, n] or array [S_c, n, dim] Posterior draws from the control arm.

Mathematics

The aligned draw count is:

$$ S = \min(S_t, S_c). $$

When $S_t \neq S_c$, the longer array is subsetted to its first $S$ rows along the first axis (the draw axis). The trimming operation for a matrix is:

$$ \text{pred}^{\prime} = \text{pred}[\,1{:}S,\, \cdot\,], $$

and for an array of dimension $d$:

$$ \text{pred}^{\prime} = \text{pred}[\,1{:}S,\, \cdot\,,\, \ldots,\, \cdot\,]. $$

Returns

A named list with four components:

Component Type Description
treat matrix or array Trimmed treatment draws, first axis of length $S$.
ctrl matrix or array Trimmed control draws, first axis of length $S$.
S integer The aligned draw count $\min(S_t, S_c)$.
warning character scalar The trimming notification message when $S_t \neq S_c$; NA_character_ otherwise.

Notes

  • Draw count inference: $S$ is inferred from nrow() (for matrices) or dim()[1L] (for arrays). If pred_t is a matrix and pred_c is an array (or vice versa), the function does not explicitly check for type consistency; it will still compute $S_t$ and $S_c$ but the trimming logic branches on is.matrix() per input.
  • Warning emission: When trimming occurs, gdpar_warn is called with class "gdpar_diagnostic_warning" and data = list(S_treat, S_ctrl, S). The same message string is stored in the warning field for persistence.
  • Trimming implementation: The inner function trim_first_axis(arr, S) handles both matrices (simple row indexing with drop = FALSE) and arrays. For arrays it constructs an index list with seq_len(S) for the first axis and quote(expr = ) (i.e., an empty subscript) for every remaining axis, then calls do.call(`[`, ...) with drop = FALSE. The use of quote(expr = ) produces a missing/empty argument in the subscript list, which selects all elements along that dimension.
  • Persistence: The warning field is designed to be persisted by the constructor into the $warnings slot of the resulting gdpar_causal_bridge object, so the print method can surface the fallback notification (referenced as "D48 canonical norm; D50 of Sesion 18 Etapa 2 of Sesion 8.4" in the source comments).
  • No error is raised if the two inputs have different shapes beyond the first axis; the function only aligns the first (draw) axis.
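The empty-subscript idiom can be sketched as below. The function name follows the text, but the body is a reconstruction from the description, not a copy of the package source.

```r
# Keep the first S entries along axis 1 for a matrix or a higher-order array.
trim_first_axis <- function(arr, S) {
  if (is.matrix(arr)) return(arr[seq_len(S), , drop = FALSE])
  idx <- c(
    list(arr, seq_len(S)),                              # object and first-axis index
    rep(list(quote(expr = )), length(dim(arr)) - 1L),  # empty subscript per other axis
    list(drop = FALSE)
  )
  do.call(`[`, idx)
}

pred_t <- array(seq_len(24), dim = c(4, 3, 2))  # S_t = 4
pred_c <- array(seq_len(12), dim = c(2, 3, 2))  # S_c = 2
S <- min(dim(pred_t)[1L], dim(pred_c)[1L])

trimmed_t <- trim_first_axis(pred_t, S)
trimmed_m <- trim_first_axis(matrix(1:6, nrow = 3), 1)
```

Building the subscript list programmatically is what lets one helper serve arrays of any order; the call is equivalent to writing pred_t[1:2, , , drop = FALSE] by hand.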

.summarize_cate(cate_draws, ql, qu)

Purpose

Internal helper that converts raw posterior draws of a Conditional Average Treatment Effect (CATE) quantity into a compact summary consisting of the posterior mean and a lower/upper credible-interval bound per unit (and, in the multi-component case, per component slot). It is the canonical summarizer used downstream by the causal-bridge machinery to produce user-facing CATE summaries; it also tags the output with metadata (dim_kind, dim_size, dim_names) describing the structure of the summarized quantity so that downstream printers/extractors can dispatch correctly.

Arguments

  • cate_draws : numeric posterior draws of the CATE. Two shapes are supported:
    • a matrix with rows indexing posterior draws ($S$) and columns indexing units ($n$), i.e. dim = c(S, n);
    • a 3-dimensional array with dim = c(S, n, K), where the first dimension indexes draws, the second indexes units, and the third indexes component "slots" (e.g. multiple treatment arms, multiple outcome dimensions, or per-dimension effects).
  • ql : numeric scalar in $[0,1]$. Lower-tail quantile probability used for the credible interval.
  • qu : numeric scalar in $[0,1]$. Upper-tail quantile probability used for the credible interval.

Mathematics

For the matrix case (one scalar CATE per unit), let $\theta^{(s)}_i$ denote draw $s$ for unit $i$, $s=1,\dots,S$, $i=1,\dots,n$. The function computes

$$ \hat{\mu}_i = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_i, \qquad q^{\,l}_i = F^{-1}_{\theta_i}(q_l), \qquad q^{\,u}_i = F^{-1}_{\theta_i}(q_u), $$

where $F^{-1}_{\theta_i}$ is the empirical quantile function of the sample $\{\theta^{(s)}_i\}_{s=1}^{S}$ (as computed by stats::quantile with names = FALSE).

For the 3-D array case, let $\theta^{(s)}_{i,k}$ denote draw $s$ for unit $i$ and slot $k$, $k=1,\dots,K$. The function computes, for each $(i,k)$,

$$ \hat{\mu}_{i,k} = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_{i,k}, \qquad q^{\,l}_{i,k} = F^{-1}_{\theta_{i,k}}(q_l), \qquad q^{\,u}_{i,k} = F^{-1}_{\theta_{i,k}}(q_u). $$

The dim_kind tag is assigned by the heuristic

$$ \text{dim_kind} = \begin{cases} \texttt{"multi"}, & \text{if } \texttt{slot_names} \text{ is } \texttt{NULL} \text{ or every slot name matches } \texttt{\textasciicircum dim_},\\ \texttt{"K_individual"}, & \text{otherwise.} \end{cases} $$

Returns

A list with the following components, whose shapes depend on the input:

  • Matrix input:

    • mean : numeric vector of length $n$ (column means of cate_draws).
    • ci : numeric $n \times 2$ matrix with columns named "lower" and "upper"; column 1 holds the ql-quantiles, column 2 the qu-quantiles.
    • dim_kind : character scalar, hard-coded to "scalar".
    • dim_size : integer scalar, hard-coded to 1L.
    • dim_names : NULL.
  • 3-D array input:

    • mean : numeric $n \times K$ matrix with dimnames set to list(NULL, slot_names) (i.e. unit dimension unnamed, slot dimension named after the third-dimension names of cate_draws).
    • ci : numeric $n \times K \times 2$ array with dimnames = list(NULL, slot_names, c("lower", "upper")); slice [, , 1] holds the ql-quantiles, slice [, , 2] the qu-quantiles.
    • dim_kind : character scalar, either "multi" or "K_individual" per the heuristic above.
    • dim_size : integer scalar equal to $K$ (dim(cate_draws)[3L]).
    • dim_names : the third-dimension names of cate_draws (dimnames(cate_draws)[[3L]]), which may be NULL.

Notes

  • The function is internal (leading dot) and is not exported.
  • Shape dispatch is performed by testing is.matrix(cate_draws) first, then length(dim(cate_draws)) == 3L. Any other shape (e.g. a vector, a 2-D non-matrix array, or a 4+-D array) triggers an error via gdpar_abort with class = "gdpar_internal_error" and a data field carrying dim(cate_draws). The error message is the literal string "Internal error: unsupported shape for cate_draws.".
  • Quantiles are computed with stats::quantile and names = FALSE; the default quantile type (type 7) of stats::quantile is therefore in effect.
  • In the matrix branch, the two quantile rows returned by apply are re-assembled into an $n \times 2$ matrix explicitly (rather than transposed), so the orientation of ci is guaranteed regardless of how apply lays out its result.
  • In the 3-D branch, apply is called separately for the lower and upper quantiles (with scalar probs), producing two $n \times K$ matrices that are then written into the two slices of the output array. This avoids any ambiguity about the shape of apply's output when probs has length 1.
  • dim_kind is purely a metadata tag derived from the slot names; it does not affect the numerical content of mean or ci. The "multi" tag is intended to flag either anonymous component dimensions (slot_names is NULL) or component dimensions whose names follow the package-internal dim_ naming convention, while "K_individual" flags named per-individual components.
  • The function has no side effects and performs no S3 dispatch of its own; it is a plain closure.
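The two branches and the dim_kind heuristic can be sketched on simulated draws. The data are synthetic and the `dim_kind` helper is a hypothetical name for the rule stated above.

```r
set.seed(1)
ql <- 0.025
qu <- 0.975

# Matrix branch: S = 200 draws (rows) for n = 3 units (columns).
draws <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
q <- apply(draws, 2, stats::quantile, probs = c(ql, qu), names = FALSE)  # 2 x n
mean_vec <- colMeans(draws)
ci <- cbind(lower = q[1, ], upper = q[2, ])  # assembled explicitly as n x 2

# 3-D branch: one apply() call per bound, each returning an n x K matrix.
arr <- array(rnorm(200 * 3 * 2), dim = c(200, 3, 2))
lower <- apply(arr, c(2, 3), stats::quantile, probs = ql, names = FALSE)
upper <- apply(arr, c(2, 3), stats::quantile, probs = qu, names = FALSE)

# Metadata tag derived from the slot names only.
dim_kind <- function(slot_names) {
  if (is.null(slot_names) || all(grepl("^dim_", slot_names))) "multi" else "K_individual"
}
```

With a length-2 probs vector apply() returns the quantiles in rows, which is why the matrix branch rebuilds ci column by column instead of relying on the layout of the result.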

R/check_identifiability.R

gdpar_check_identifiability(amm, data, theta_ref_init = NULL, formula_rhs = NULL, family = NULL, tol = 1e-8, rigor = c("full", "fast"))

Purpose
Diagnose whether the chosen finite parametric representation of the AMM canonical form satisfies the basis‑restricted Functional Independence Condition at a candidate population reference point. This pre‑fit check verifies parameter identifiability by examining the empirical Gram matrix of the extended design matrix.

Arguments

  • amm: Object of class amm_spec (created by amm_spec) defining bases for additive, multiplicative, and modulating components.
  • data: Data frame containing variables referenced in amm; covariates are centered internally before computing the Gram matrix.
  • theta_ref_init: Numeric vector of length p (dimension of the population reference) at which the diagnostic is evaluated. Defaults to a zero vector (if multiplicative component b is absent) or a vector of ones (if b is present).
  • formula_rhs: Optional formula or character vector specifying covariates for the modulating linear factor x; defaults to amm$x_vars.
  • tol: Numeric scalar ∈ (0,1) setting the relative condition‑number tolerance. Failure is flagged when λ_min < tol × λ_max. Default 1e-8.
  • family: Optional gdpar_family or gdpar_family_multi object; if supplied, triggers parameter‑level identifiability (D‑ID) pre‑fit checks.
  • rigor: Character scalar, "full" (default) or "fast", controlling the C4‑bis cross‑component check for p > 1 specs. Ignored when p = 1.

Mathematics
The function builds an extended design matrix $Z$ by column‑binding the design matrices for the additive ($Z_a$), anchored multiplicative ($Z_b \cdot \theta_{\text{ref}}$), and modulating ($X$) components. Each column of $Z$ is normalized to unit Euclidean norm, yielding $Z_{\text{norm}}$. The empirical Gram matrix is

$$ G = \frac{1}{n} Z_{\text{norm}}^\top Z_{\text{norm}}. $$

Its eigenvalues $\lambda_{\max}$ and $\lambda_{\min}$ are computed. The relative condition criterion is

$$ \frac{\lambda_{\min}}{\lambda_{\max}} \ge \text{tol}, $$

equivalent to the condition number $\kappa = \lambda_{\max} / \lambda_{\min} \le 1/\text{tol}$.
For p > 1, a per‑coordinate C4‑bis check is performed (if applicable) by calling check_C4_bis_per_k, and a D‑ID pre‑fit check is performed via .check_did_pre_fit.

Returns
An object of class gdpar_identifiability_report (a list) containing:

Component Description
passed Logical; TRUE iff all applicable checks (Gram, C4‑bis, D‑ID) pass.
lambda_min, lambda_max, condition_number Eigenvalues and condition number of the Gram matrix (if computed).
collinear_directions List of near‑zero eigenvectors projected onto basis columns (if Gram check fails); otherwise NULL.
theta_ref_used The theta_ref_init used.
tol_used, rigor_used The tol and rigor used.
column_labels Character vector labeling columns of the extended design matrix.
c4_bis Result of the C4‑bis per‑coordinate check (if performed); else NULL.
did_pre_fit Result of the D‑ID pre‑fit check (if performed); else NULL.
message Human‑readable summary.

A print method formats the report.

Notes

  • Trivial cases: If amm$level == 0L or there are no active design blocks, the function returns passed = TRUE with a message and NA eigenvalues.
  • Input validation: Uses assert_inherits, assert_data_frame, assert_numeric_scalar, and match.arg; raises gdpar_input_error on invalid inputs.
  • Warning: If theta_ref_init is zero in every coordinate and amm$b is non‑null, a gdpar_diagnostic_warning is issued because the multiplicative block vanishes trivially.
  • Zero‑norm columns: If any column of $Z$ is zero after centering, the function returns passed = FALSE with lambda_min = 0 and condition_number = Inf.
  • Multivariate path: When design$Z_a_list exists and design$Z_a is NULL, the function delegates to check_identifiability_multi (defined elsewhere) and returns its result.
  • Extended matrix construction: For p > 1, the multiplicative block is expanded as $Z_b \cdot \theta_{\text{ref}}[k]$ for each coordinate k, producing column labels b*theta[k]:....
  • C4‑bis and D‑ID checks: Only performed when p > 1 and the necessary design components exist. The rigor argument controls failure vs. warning on column‑name overlap between additive and modulating blocks.
  • Condition number computation: Uses eigen(G, symmetric = TRUE) and guards against division by zero with max(lambda_min, .Machine$double.eps).
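The Gram criterion can be sketched on a toy design. `gram_check` is a hypothetical helper written from the description above; it omits the report structure, the zero-norm guard, and the block construction that the real function performs.

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)

# Centred designs: one well conditioned, one with an exact linear dependency.
Z_ok  <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)
Z_bad <- scale(cbind(x1, x2, x3 = x1 + x2), center = TRUE, scale = FALSE)

gram_check <- function(Z, tol = 1e-8) {
  Zn <- sweep(Z, 2, sqrt(colSums(Z^2)), "/")  # unit Euclidean norm per column
  ev <- eigen(crossprod(Zn) / nrow(Zn), symmetric = TRUE)$values
  lmin <- min(ev)
  lmax <- max(ev)
  list(
    passed = lmin >= tol * lmax,
    condition_number = lmax / max(lmin, .Machine$double.eps)  # guarded division
  )
}

res_ok  <- gram_check(Z_ok)
res_bad <- gram_check(Z_bad)
```

Because the criterion is a ratio of eigenvalues, the 1/n factor in the Gram matrix does not change the verdict; it only rescales the reported eigenvalues.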

check_identifiability_multi(amm, design, theta_ref, tol, rigor, family = NULL)

Purpose

Top-level multivariate identifiability report. Combines two pre-fit layers, (i) the per-coordinate C4-bis functional independence check (check_C4_bis_per_k) and (ii) the pre-fit parameter identifiability / D-ID layer (.check_did_pre_fit), into a single consolidated "gdpar_identifiability_report" object.

Arguments

Argument Type Meaning
amm amm_spec The model specification; forwarded to sub-checks (used to read W$dim and p).
design list Design structure returned by build_amm_design; forwarded to sub-checks.
theta_ref numeric vector (length p) Reference (initial) parameter vector at which identifiability is evaluated.
tol numeric scalar Tolerance for the eigenvalue-ratio (condition-number) criterion in C4-bis.
rigor character scalar ("fast" or "full") Controls depth of the C4-bis check; forwarded to both sub-checks.
family gdpar_family, gdpar_family_multi, or NULL Family object for the D-ID layer; NULL skips that layer entirely.

Mathematics

The function aggregates per-coordinate eigenvalue diagnostics. Denote the smallest and largest eigenvalues returned by coordinate $k$ as $\lambda_{\min}^{(k)}$ and $\lambda_{\max}^{(k)}$, and its condition number $\kappa^{(k)} = \lambda_{\max}^{(k)} / \lambda_{\min}^{(k)}$. The report-level summaries are:

$$ \lambda_{\min}^{\star} = \min_{k}\;\lambda_{\min}^{(k)}, \qquad \lambda_{\max}^{\star} = \max_{k}\;\lambda_{\max}^{(k)}, \qquad \kappa^{\star} = \max_{k}\;\kappa^{(k)} $$

with NA_real_ propagated when every per-$k$ value is NA.

Returns

A list of class c("gdpar_identifiability_report", "list") with elements:

Element Type / Content
passed Logical scalar; TRUE iff both passed_c4_bis and passed_did are TRUE.
lambda_min Numeric; global minimum eigenvalue across coordinates (or NA_real_).
lambda_max Numeric; global maximum eigenvalue across coordinates (or NA_real_).
condition_number Numeric; worst (maximum) condition number across coordinates (or NA_real_).
collinear_directions Always NULL at this level (populated in per-k entries).
theta_ref_used Numeric vector; echo of theta_ref.
tol_used Numeric; echo of tol.
rigor_used Character; echo of rigor.
column_labels character(0).
c4_bis Full return value of check_C4_bis_per_k.
did_pre_fit Return value of .check_did_pre_fit (or NULL).
message Character scalar summarising pass/fail with human-readable explanation.

Notes

  • When family is NULL, .check_did_pre_fit is not called (did_pre_fit is set to NULL) and passed_did defaults to TRUE, so the overall result depends solely on C4-bis.
  • The message distinguishes three cases: all-pass, C4-bis failure, and D-ID failure.
  • min/max over lmins/lmaxs/conds uses na.rm = TRUE but falls back to NA_real_ when every element is NA (guarded by all(is.na(...))).
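The NA guard matters because base R's min() and max() return Inf or -Inf with a warning when na.rm = TRUE leaves nothing to aggregate. A minimal sketch, with `agg` as a hypothetical helper name:

```r
# Aggregate with na.rm = TRUE, but return NA_real_ when nothing is observed.
agg <- function(x, f) if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE)

lmins <- c(0.2, NA, 0.05)
conds <- c(5, NA, 20)

lambda_min_star <- agg(lmins, min)
kappa_star      <- agg(conds, max)
all_missing     <- agg(c(NA_real_, NA_real_), max)
```

This corresponds to the case where every coordinate ran under "fast" rigor and no eigenvalues were computed.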

check_C4_bis_per_k(design, amm, theta_ref, rigor, tol)

Purpose

Internal per-coordinate cross-component identifiability check (C4-bis). For each coordinate $k = 1,\dots,p$ it performs a structural column-name overlap test against the modulating-component design $\mathbf{X}$ and, under rigor = "full", additionally a Gram-matrix rank test on the additive design matrix $\mathbf{Z}_a[k]$. It thereby detects collinearity within the additive component and column-name overlap with $\mathbf{X}$.

Arguments

Argument Type Meaning
design list Must contain Z_a_list, Z_a_names_list, X, X_names.
amm amm_spec Model specification; amm$W$dim is read to derive the per-k block dimension.
theta_ref numeric vector (length p) Reference parameter vector; used to determine p.
rigor character scalar ("fast" or "full") Selects the check strategy (see below).
tol numeric scalar Eigenvalue-ratio tolerance; a coordinate passes iff $\lambda_{\min} \ge \text{tol}\cdot\lambda_{\max}$.

Mathematics

Let $p = \text{length}(\theta_{\text{ref}})$ and $W_{\text{per-k}} = \lfloor \dim(W) / p \rfloor$.

For each coordinate $k$:

  1. Extract the additive design sub-matrix $\mathbf{Z}_a[k]$ (column-normalised to unit $\ell_2$ norm):

    $$\widetilde{\mathbf{Z}}_a[k] = \mathbf{Z}_a[k]\,\operatorname{diag}\!\bigl(\lVert \mathbf{Z}_a[k]_{\cdot j}\rVert_2^{-1}\bigr)$$

  2. Form the (sample) correlation/Gram matrix:

    $$\mathbf{G}_k = \frac{1}{n}\,\widetilde{\mathbf{Z}}_a[k]^{\!\top}\,\widetilde{\mathbf{Z}}_a[k]$$

  3. Compute the eigendecomposition $\mathbf{G}_k = \mathbf{V}\,\operatorname{diag}(\lambda)\,\mathbf{V}^{\!\top}$. The coordinate passes the rank sub-check iff:

    $$\lambda_{\min}^{(k)} \;\ge\; \text{tol}\;\cdot\;\lambda_{\max}^{(k)}$$

  4. Structural overlap sub-check: Independently compute $\text{shared} = \operatorname{colnames}\!\bigl(\mathbf{Z}_a[k]\bigr) \;\cap\; \operatorname{colnames}(\mathbf{X})$. Under rigor = "full", any non-empty intersection causes a fail.

    Under rigor = "fast", overlap triggers only a gdpar_c4bis_overlap_warning (not a failure), and the rank check is skipped entirely—every coordinate is marked passed = TRUE with NA eigenvalue diagnostics.

Algorithm (rigor = "full")

for k in 1..p:
    shared ← intersect(colnames(Z_a[k]), colnames(X))
    if ncol(Z_a[k]) == 0 → pass trivially
    col_norms ← √(colSums(Z_a[k]²))
    if any(col_norms == 0) → fail, record zero-norm columns
    Z_k_n ← Z_a[k] / col_norms          # column-normalise
    G_k ← crossprod(Z_k_n) / nrow(Z_k_n)
    eig_k ← eigen(G_k, symmetric = TRUE)
    λ_min ← min(eig_k$values)
    λ_max ← max(eig_k$values)
    κ ← λ_max / max(λ_min, ε_machine)
    passed_rank ← (λ_min ≥ tol · λ_max)
    passed_overlap ← (shared is empty)
    passed_k ← passed_rank AND passed_overlap
    collinear_directions:
        if !passed_rank  → eigenvectors with eigenvalue < tol·λ_max
        if !passed_overlap → shared columns labelled
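The loop body above translates to R roughly as follows. `check_k` is a hypothetical single-coordinate helper written from the pseudocode; it returns a reduced result and leaves out the collinear-direction bookkeeping.

```r
# One coordinate of the full-rigor check: rank sub-check AND overlap sub-check.
check_k <- function(Z_k, X_names, tol = 1e-8) {
  shared <- intersect(colnames(Z_k), X_names)
  norms <- sqrt(colSums(Z_k^2))
  if (any(norms == 0)) {
    return(list(passed = FALSE, shared_cols = shared, lambda_min = 0))
  }
  # sweep() divides column j by norms[j]; a bare Z_k / norms would recycle
  # the vector down the rows instead.
  Zn <- sweep(Z_k, 2, norms, "/")
  ev <- eigen(crossprod(Zn) / nrow(Zn), symmetric = TRUE)$values
  passed_rank <- min(ev) >= tol * max(ev)
  passed_overlap <- length(shared) == 0L
  list(passed = passed_rank && passed_overlap,
       shared_cols = shared, lambda_min = min(ev))
}

set.seed(3)
Z_k <- cbind(age = rnorm(30), bmi = rnorm(30))
clean    <- check_k(Z_k, X_names = "dose")
overlap  <- check_k(Z_k, X_names = c("age", "dose"))
zero_col <- check_k(cbind(age = rnorm(30), zero = 0), X_names = "dose")
```

The `overlap` case fails even though its Gram matrix is well conditioned, which shows that under "full" rigor the name-based sub-check can fail a coordinate on its own.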

Returns

A list with elements:

Element Type / Content
rigor Character; echo of the input rigor.
per_k List of length $p$; each entry is a list:

Each per_k[[k]] entry contains:

Sub-element Type / Content
passed Logical; TRUE if this coordinate passes both sub-checks.
rigor Character ("fast" or "full").
lambda_min Numeric; smallest eigenvalue of $\mathbf{G}_k$ (or NA_real_).
lambda_max Numeric; largest eigenvalue of $\mathbf{G}_k$ (or NA_real_).
condition_number Numeric; $\kappa^{(k)} = \lambda_{\max}^{(k)} / \max(\lambda_{\min}^{(k)},\,\varepsilon)$ (or NA_real_).
shared_cols Character vector; column names shared between $\mathbf{Z}_a[k]$ and $\mathbf{X}$.
collinear_directions List of lists (or NULL); each sub-list has eigenvalue, columns, coefficients (eigenvector loadings).
coord Integer; the coordinate index $k$.
message (Only when ncol(Z_a_k) == 0 under "full"): "empty Z_a[k]: trivially passes."

Notes

  • Zero-norm column detection (full rigor): If any column of $\mathbf{Z}_a[k]$ has zero $\ell_2$ norm the coordinate immediately fails with $\lambda_{\min}=0$, $\kappa=\infty$, and the offending columns are recorded in collinear_directions under the label "zero-norm columns in Z_a[k]" with coefficient 1.
  • Fast rigor warning aggregation: Overlap detected across coordinates is accumulated in overlap_acc and emitted as a single gdpar_c4bis_overlap_warning (via gdpar_warn) after the loop. The warning includes a data$shared_cols_by_coord slot listing shared columns per coordinate.
  • Modulating-component design $\mathbf{X}$ not used in rank check: The source code documents that the extended Gram matrix $[\mathbf{Z}_a[k] \mid \theta_{\text{ref}}[k]^m \mathbf{X}]$ was considered but rejected because at a fixed $\theta_{\text{ref}}$ the $W$-block columns are scalar multiples of $\mathbf{X}$ per coordinate $l$, yielding rank 1 by construction. Cross-component non-identifiability is therefore deferred to post-fit diagnostics (divergences, low ESS, high $\hat{R}$).
  • Column naming convention: Collinear-direction column labels are prefixed with "a{k}:" (e.g., "a3:smooth_term").
  • Default anchor: The comment states the anchor is taken as zero, consistent with the package default on the linear-predictor scale.
  • %||% usage: X_names, colnames(Z_a_k), lambda_min, lambda_max, and condition_number all use the null-coalescing operator %||% to supply fallback values.

.check_did_pre_fit(family, design, theta_ref, rigor)

Purpose

Pre-fit validation of Data-Identifying Design (DID) conditions for individual-scope parameters within a gdpar_family or gdpar_family_multi object. Called before the Stan fit to verify that all per-observation or per-group parameter specifications have an explicit DID declaration and, when rigor == "full" and $K \geq 2$, that the parameters exhibit symbolic separability (distinct canonical prior kinds).

Arguments

Argument Type Meaning
family gdpar_family, gdpar_family_multi, or NULL The family object whose param_specs are inspected. If NULL, the function returns NULL immediately.
design any Present in the signature but never referenced in the function body; reserved for future use or passed through for call-site compatibility.
theta_ref any Present in the signature but never referenced in the function body; same rationale as design.
rigor character scalar "full" activates the symbolic-separability sub-check when $K \geq 2$; any other value skips it (separability defaults to TRUE).

Algorithm

  1. Family extraction. If family inherits from "gdpar_family_multi", the first element family$families[[1]] is taken as base_family; if it inherits from "gdpar_family", base_family <- family; otherwise return NULL.

  2. Individual-scope filtering. From base_family$param_specs, retain only specs whose scope is "per_observation" or "per_group". Let $K$ be the count of such specs.

  3. Per-parameter metadata. For each retained spec $s$, extract the fields name, scope, did_status, did_condition, did_reference, prior_canonical_kind.

  4. DID declarative check. $$ \text{passed\_did} = \bigwedge_{k=1}^{K} \bigl(s_k.\text{did\_status} \in \{\texttt{"holds"},\;\texttt{"holds\_under\_condition"}\}\bigr) $$

  5. Symbolic separability (conditional on $K \geq 2$ and rigor == "full"). Collect the vector of prior_canonical_kind values $\mathbf{p} = (p_1, \dots, p_K)$. An element $p_i$ is overlapping if $p_i = p_j$ for some $j \neq i$: $$ \text{overlap}_i = \bigl(\exists\, j \neq i : p_j = p_i\bigr) $$ This is detected via duplicated(p) | duplicated(p, fromLast = TRUE). If any overlap exists, passed_separability <- FALSE and the offending kinds/names are recorded. Otherwise the check passes.

  6. Final verdict. $$ \text{passed} = \text{passed\_did} \;\wedge\; \text{passed\_separability} $$

Returns

A list (or NULL if family is NULL or not a recognized gdpar family):

Field Type Meaning
passed logical scalar TRUE iff DID declarations are valid and separability holds
K integer scalar Number of individual-scope parameter specs
per_param list of length $K$ Per-parameter metadata (name, scope, did_status, did_condition, did_reference, prior_canonical_kind)
symbolic_separability NULL or list NULL when not evaluated; otherwise a list with passed, overlapping_kinds, overlapping_names, message
rigor character scalar Echo of the input rigor

Notes

  • design and theta_ref are accepted but completely unused; this allows a uniform calling convention across identifiability-check helpers.
  • For a gdpar_family_multi object only the first family (family$families[[1]]) is inspected; sub-families at index $\geq 2$ are ignored.
  • If base_family$param_specs is NULL, the function returns NULL (not a list with passed = TRUE), signaling that DID checking is not applicable.
  • No errors are raised; all conditions are evaluated non-destructively.
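Steps 4 to 6 of the algorithm can be sketched on toy metadata. The parameter names and the kind labels ("location", "scale") are invented for illustration and are not taken from the package.

```r
# Hypothetical per-parameter metadata for K = 3 individual-scope parameters.
kinds <- c(mu = "location", sigma = "scale", nu = "scale")
did_status <- c(mu = "holds", sigma = "holds_under_condition", nu = "holds")

# Step 4: every declaration must be one of the two accepted statuses.
passed_did <- all(did_status %in% c("holds", "holds_under_condition"))

# Step 5: flag every element whose kind occurs more than once.
# duplicated() alone misses the first occurrence; fromLast = TRUE catches it.
overlap <- duplicated(kinds) | duplicated(kinds, fromLast = TRUE)
overlapping_kinds <- unique(unname(kinds[overlap]))
overlapping_names <- names(kinds)[overlap]
passed_separability <- !any(overlap)

# Step 6: final verdict.
passed <- passed_did && passed_separability
```

In this example the declarations are all valid, but sigma and nu share a kind, so the overall verdict is FALSE on separability grounds alone.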

.check_Z_a_K_per_slot(design_K, rigor, tol)

Purpose

Implements layer D-B3 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check of the per-slot additive design matrix $Z_a^{(k)}$ for each slot $k = 1, \dots, K$ in the $K$-individual parameter path. Each $Z_a^{(k)}$ is already column-centered by .build_amm_design_K() before reaching this helper. The check detects zero-norm columns (structural rank deficiency) and, under rigor == "full", computes the normalized Gram condition number and flags slots whose minimum eigenvalue falls below $\text{tol} \cdot \lambda_{\max}$.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| design_K | list | Output of .build_amm_design_K(); must contain Z_a_k_list (list of $K$ matrices), Z_a_k_names_list (list of column-name vectors), slot_names (character vector of length $K$). |
| rigor | character scalar | "fast" skips the Gram eigendecomposition (rank is checked only via zero-norm detection); "full" performs the full eigendecomposition. |
| tol | numeric scalar | Threshold for the eigenvalue-ratio criterion. A slot passes when $\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}$. |

Mathematics

For each slot $k$ with non-empty $Z_a^{(k)} \in \mathbb{R}^{n \times p_k}$:

  1. Column norms. $c_j = \lVert z_j \rVert_2 = \sqrt{\sum_{i=1}^n z_{ij}^2}$ for $j = 1, \dots, p_k$. Any $c_j = 0$ signals an identically-zero column and immediately fails the slot.

  2. Column normalization (full rigor). $$ \widetilde{Z} = Z_a^{(k)} \cdot \operatorname{diag}(c_1^{-1}, \dots, c_{p_k}^{-1}) $$

  3. Normalized Gram matrix. $$ G = \frac{1}{n}\,\widetilde{Z}^\top \widetilde{Z} \;\in\; \mathbb{R}^{p_k \times p_k} $$

  4. Eigendecomposition. $G = V \Lambda V^\top$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_{p_k} \geq 0$.

  5. Rank criterion. $$ \text{ok} = \bigl(\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}\bigr) $$ where $\lambda_{\min} = \lambda_{p_k}$ and $\lambda_{\max} = \lambda_1$.

  6. Collinear-direction extraction. For each eigenvalue $\lambda_j < \text{tol} \cdot \lambda_{\max}$, the corresponding eigenvector $v^{(j)}$ is inspected: components with $|v^{(j)}_i| > 10^{-3}$ are retained and sorted by descending absolute value, yielding the collinear direction report.
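
Steps 1 to 5 can be sketched in a few lines of base R. This is an illustrative reconstruction from the description above, not the package's source: the helper name `check_block_rank` and the simulated designs are assumptions introduced for the example.

```r
# Hypothetical helper (not gdpar code): normalized-Gram rank check
# for a single column-centered design block Z.
check_block_rank <- function(Z, tol = 1e-8) {
  norms <- sqrt(colSums(Z^2))                  # step 1: column norms c_j
  if (any(norms == 0)) {                       # zero-norm column: fail at once
    return(list(passed = FALSE, lambda_min = NA_real_,
                lambda_max = NA_real_, condition_number = Inf))
  }
  Z_tilde <- sweep(Z, 2, norms, "/")           # step 2: unit-norm columns
  G <- crossprod(Z_tilde) / nrow(Z)            # step 3: normalized Gram matrix
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values  # step 4
  lambda_max <- max(ev)
  lambda_min <- min(ev)
  list(
    passed = lambda_min >= tol * lambda_max,   # step 5: rank criterion
    lambda_min = lambda_min,
    lambda_max = lambda_max,
    condition_number = lambda_max / max(lambda_min, .Machine$double.eps)
  )
}

set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
Z_ok  <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)
Z_bad <- scale(cbind(x1, x2, x3 = x1 + x2), center = TRUE, scale = FALSE)

check_block_rank(Z_ok)$passed    # two independent columns
check_block_rank(Z_bad)$passed   # x3 is an exact linear combination
```

Normalizing the columns first makes the eigenvalue ratio insensitive to the measurement scale of each covariate, so a single `tol` can be applied to every slot.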

Returns

A list:

| Field | Type | Meaning |
|---|---|---|
| passed | logical scalar | TRUE iff every slot passes |
| rigor | character scalar | Echo of input |
| per_slot | list of length $K$ (names = slot_names) | Per-slot reports (see below) |

Each element of per_slot is a list:

| Field | Type | Meaning |
|---|---|---|
| slot | character | Slot name |
| passed | logical | Slot-level pass/fail |
| rigor | character | Effective rigor for this slot |
| lambda_min | numeric | Minimum eigenvalue (NA_real_ if not computed) |
| lambda_max | numeric | Maximum eigenvalue (NA_real_ if not computed) |
| condition_number | numeric | $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; Inf if zero-norm columns detected |
| collinear_columns | NULL or list | NULL when passed; under rigor == "fast" with zero columns, a character vector of offending column names; under rigor == "full" a list of sub-lists each with eigenvalue, columns, coefficients |
| message | character | Human-readable diagnostic |

Notes

  • Empty matrix shortcut. If $p_k = 0$, the slot passes trivially with lambda_min = lambda_max = condition_number = NA_real_.
  • Zero-norm columns. Detected before any eigendecomposition. The offending column names are taken from Z_a_k_names_list[[k]], falling back to colnames(Z_a_k), falling back to character(ncol(Z_a_k)).
  • rigor == "fast" with no zero-norm columns. Passes unconditionally without Gram computation; diagnostics are NA_real_.
  • Numerical safeguard. The condition number divides by $\max(\lambda_{\min}, \epsilon_{\text{mach}})$ where $\epsilon_{\text{mach}}$ is .Machine$double.eps, preventing division by zero.
  • No S3 dispatch; purely internal utility.

.check_C4_bis_K_cross_slot(design_K, rigor, tol)

Purpose Implements layer D-B2 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check on the column-wise concatenation of per-slot additive design matrices $Z_a^{(1)}, \dots, Z_a^{(K)}$. Even when each individual $Z_a^{(k)}$ is full column rank (verified by .check_Z_a_K_per_slot), the joint matrix can be rank-deficient when the same covariate appears in multiple slots with linearly equivalent designs. This helper detects such cross-slot collinearity.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| design_K | list | Output of .build_amm_design_K(); same structure as for .check_Z_a_K_per_slot. |
| rigor | character scalar | "fast" returns a structural pass without eigendecomposition; "full" computes the Gram eigendecomposition of the joint matrix. |
| tol | numeric scalar | Eigenvalue-ratio threshold: $\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}$. |

Mathematics

  1. Joint construction. For each slot $k$ with $p_k > 0$ columns, prefix every column name with "{slot_name}:" and horizontally concatenate: $$ Z_{\text{joint}} = \bigl[\, Z_a^{(1)} \;\big|\; Z_a^{(2)} \;\big|\; \cdots \;\big|\; Z_a^{(K)} \,\bigr] \;\in\; \mathbb{R}^{n \times P} $$ where $P = \sum_{k:\, p_k > 0} p_k$.

  2. Zero-norm detection. Column norms $c_j = \lVert z_j \rVert_2$. Any $c_j = 0$ immediately fails the check.

  3. Column normalization and Gram matrix (full rigor). $$ \widetilde{Z}_{\text{joint}} = Z_{\text{joint}} \cdot \operatorname{diag}(c_1^{-1}, \dots, c_P^{-1}), \qquad G = \frac{1}{n}\,\widetilde{Z}_{\text{joint}}^\top \widetilde{Z}_{\text{joint}} $$

  4. Eigendecomposition and rank criterion. Identical to the per-slot check: $$ \text{ok} = \bigl(\lambda_{\min}(G) \geq \text{tol} \cdot \lambda_{\max}(G)\bigr) $$

  5. Collinear-direction extraction. Same thresholding procedure as .check_Z_a_K_per_slot: for each eigenvalue below the tolerance band, retain eigenvector components with $|v_i| > 10^{-3}$, sorted by descending absolute value. Column names are prefixed with slot identifiers so the report identifies which slot contributes each column.
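
The following base-R sketch illustrates why a joint check is needed even when every slot passes on its own. The slot names `mu` and `sigma` and the simulated covariates are assumptions made for the example; the code mirrors the steps above rather than reproducing the package's implementation.

```r
# Illustrative only: two slots whose designs share the same centered covariate x.
set.seed(2)
x <- rnorm(40)
z <- rnorm(40)
Z_list <- list(
  mu    = scale(cbind(x = x, z = z), center = TRUE, scale = FALSE),
  sigma = scale(cbind(x = x),        center = TRUE, scale = FALSE)
)

# Step 1: prefix column names with the slot name, then concatenate column-wise.
Z_joint <- do.call(cbind, lapply(names(Z_list), function(s) {
  Zk <- Z_list[[s]]
  colnames(Zk) <- paste0(s, ":", colnames(Zk))
  Zk
}))

# Steps 2 to 4: column norms, normalized Gram matrix, eigendecomposition.
norms <- sqrt(colSums(Z_joint^2))
G <- crossprod(sweep(Z_joint, 2, norms, "/")) / nrow(Z_joint)
eg <- eigen(G, symmetric = TRUE)
tol <- 1e-8
deficient <- which(eg$values < tol * max(eg$values))

# Step 5: report the columns that load on each near-null direction.
for (j in deficient) {
  v <- eg$vectors[, j]
  keep <- which(abs(v) > 1e-3)
  keep <- keep[order(abs(v[keep]), decreasing = TRUE)]
  print(data.frame(column = colnames(Z_joint)[keep], coefficient = v[keep]))
}
```

Each slot is full column rank in isolation, yet the joint matrix has one null direction loading on `mu:x` and `sigma:x`; the slot prefixes are what make that report readable.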

Returns

A list:

| Field | Type | Meaning |
|---|---|---|
| passed | logical scalar | TRUE iff the joint matrix passes the rank criterion |
| rigor | character scalar | Echo of input |
| lambda_min | numeric | Minimum eigenvalue of $G$ (NA_real_ if not computed) |
| lambda_max | numeric | Maximum eigenvalue of $G$ (NA_real_ if not computed) |
| condition_number | numeric | $\lambda_{\max} / \max(\lambda_{\min}, \epsilon_{\text{mach}})$; Inf if zero-norm columns |
| collinear_directions | NULL or list | NULL when passed; otherwise a list of sub-lists each with eigenvalue, columns (prefixed names), coefficients |
| total_columns | integer | $P$, the total number of columns in $Z_{\text{joint}}$ |
| message | character | Human-readable diagnostic |

Notes

  • Empty joint design. If every slot has $p_k = 0$, the function returns passed = TRUE with total_columns = 0L and NA diagnostics.
  • rigor == "fast". After concatenation (to compute total_columns), the function returns a structural pass without eigendecomposition. Note that zero-norm detection is also skipped under "fast" — unlike .check_Z_a_K_per_slot which at least checks column norms under "fast".
  • Zero-norm columns under "full". Detected before eigendecomposition; the returned collinear_directions contains a single synthetic entry with eigenvalue = 0 and coefficients = rep(1, length(bad)), since the true null direction is degenerate.
  • Column-name resolution. For each slot $k$, names are sourced from Z_a_k_names_list[[k]], then colnames(Z_a_k), then auto-generated "col1", "col2", … via paste0("col", seq_len(ncol(Z_a_k))).
  • No S3 dispatch; purely internal. The returned list is a component of the combined report produced by the companion function .check_identifiability_K.

.check_identifiability_K(design_K, rigor = "full", tol = 1e-8)

Purpose Top-level orchestrator for K-individual identifiability checks in the gdpar pipeline. It aggregates two independent sub-checks—per-slot rank of the anchor design blocks (C1–C3 conditions) and cross-slot Gram-matrix conditioning (C4-bis condition)—into a single logical pass/fail verdict. This function is called during the pre-fit decision layer to determine whether the anchor parameterisation is structurally identifiable.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| design_K | list | Design structure returned by the AMM design builders. Must contain a Z_a_k_list component (list of per-slot anchor design matrices). Passed directly to the two sub-check helpers. |
| rigor | character scalar | Checking rigor; matched against c("full", "fast") via match.arg. "full" runs the Gram eigendecomposition in both sub-checks; "fast" skips it, so the per-slot check relies on zero-norm column detection only and the cross-slot check returns a structural pass. Defaults to "full". |
| tol | numeric scalar | Numerical tolerance (eigenvalue ratio threshold) for rank decisions. Defaults to 1e-8. Passed through to both sub-check functions. |

Mathematics This function does not implement a formula directly; it delegates to:

  1. .check_Z_a_K_per_slot(design_K, rigor, tol): verifies that each per-slot anchor design matrix $Z_a^{(k)}$ has full column rank (the C1–C3 conditions).
  2. .check_C4_bis_K_cross_slot(design_K, rigor, tol) — verifies that the cross-slot block Gram matrix is non-singular (the C4-bis condition), i.e., that no linear combination of columns across different $k$-slots creates a collinearity.

The overall verdict is the logical conjunction: the identifiability check passes if and only if both sub-checks pass.

Returns A named list with components:

| Component | Type | Description |
|---|---|---|
| passed | logical | TRUE iff both per_slot_rank$passed and cross_slot_gram$passed are TRUE. |
| rigor | character | The resolved rigor level. |
| tol | numeric | The tolerance used. |
| K | integer | Number of K-individual slots, derived as length(design_K$Z_a_k_list). |
| per_slot_rank | list | Full return value of .check_Z_a_K_per_slot. |
| cross_slot_gram | list | Full return value of .check_C4_bis_K_cross_slot. |

Notes

  • rigor is validated by match.arg; an unrecognised value raises an error before either sub-check executes.
  • Both sub-checks receive the same rigor and tol values, ensuring consistent stringency.
  • K is extracted from the length of Z_a_k_list; if that component is missing or empty, K will be 0 and the sub-checks must handle that case internally.
  • No side effects; purely computational with no global state modification.

.compute_info_ratio_K(fit, family, slot_names, use_groups, prior)

Purpose Implements Block D-B1 of sub-phase 8.3.4: post-fit information contraction analysis per K-individual slot. For each slot's anchor parameter, it computes the prior-to-posterior variance contraction ratio $C_k$ and classifies the result as pass, warn, or information-error. This diagnostic detects slots where the data is essentially uninformative about the anchor, so the posterior merely recovers the prior. The function is internal and invoked after Stan sampling completes.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| fit | cmdstanr fit object | The output of cs_model$sample(). Used to extract posterior draws via fit$draws(format = "draws_matrix"). |
| family | list (gdpar_family) | A gdpar_family object whose K-individual slots have been promoted to per-observation scope by .gdpar_promote_scope_per_observation(). Its param_specs component is consulted for prior_canonical_kind per slot. |
| slot_names | character vector | Canonical slot names, length $K$. Used to name output entries and to construct Stan parameter names. |
| use_groups | integer scalar (0 or 1) | Whether the fit used per-group hierarchical anchors. Determines the Stan parameter root: "mu_theta_ref_k" when 1, "theta_ref_k" when 0. |
| prior | list (gdpar_prior) | A gdpar_prior object; its priors_by_kind overrides are documented as potentially consulted but are currently inert (the helper falls back to canonical kinds). |

Mathematics

The contraction for slot $k$ is:

$$C_k = 1 - \frac{\operatorname{Var}_{\text{post}}(\theta_{\text{ref},k})}{\operatorname{Var}_{\text{prior}}(\theta_{\text{ref},k})}$$

where $\operatorname{Var}_{\text{post}}$ is the sample variance of the posterior draws and $\operatorname{Var}_{\text{prior}}$ is obtained from .gdpar_canonical_prior_variance(kind) for the slot's canonical prior kind.

Decision thresholds (from sub-phase 8.3.4 scoping):

$$\text{status} = \begin{cases} \texttt{"pass"} & \text{if } C_k \ge 0.5 \\ \texttt{"warn"} & \text{if } 0.1 \le C_k < 0.5 \\ \texttt{"information_error"} & \text{if } C_k < 0.1 \end{cases}$$
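
As a worked illustration of the contraction ratio and its thresholds, here is a minimal stand-alone classifier. The function name `classify_contraction` and the prior variances used in the calls are assumptions for the example; the real helper obtains the prior variance from .gdpar_canonical_prior_variance(kind).

```r
# Hypothetical classifier mirroring the thresholds above (not the package's code).
classify_contraction <- function(draws, var_prior) {
  # Too few draws or an unusable prior variance: no verdict.
  if (length(draws) < 4L || !is.finite(var_prior) || var_prior <= 0) {
    return(list(contraction = NA_real_, status = "skipped"))
  }
  C_k <- 1 - stats::var(draws) / var_prior
  if (!is.finite(C_k)) {
    return(list(contraction = NA_real_, status = "skipped"))
  }
  status <- if (C_k >= 0.5) {
    "pass"
  } else if (C_k >= 0.1) {
    "warn"
  } else {
    "information_error"
  }
  list(contraction = C_k, status = status)
}

draws <- c(-1, -0.5, 0, 0.5, 1)            # sample variance 0.625
classify_contraction(draws, var_prior = 25)    # C_k = 0.975
classify_contraction(draws, var_prior = 1)     # C_k = 0.375
classify_contraction(draws, var_prior = 0.65)  # C_k is about 0.038
classify_contraction(draws, var_prior = Inf)   # no finite prior variance
```

A contraction near 1 means the posterior is much tighter than the prior; a value near 0 (or negative) means the data added essentially nothing for that slot.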

Returns A named list with components:

| Component | Type | Description |
|---|---|---|
| passed | logical | TRUE iff no slot triggered "warn" or "information_error". |
| any_warn | logical | TRUE if any slot has status "warn". |
| any_info_error | logical | TRUE if any slot has status "information_error". |
| thresholds | named numeric vector | c(warn = 0.5, information_error = 0.1). |
| per_slot | named list of length $K$ | Per-slot diagnostic results (see below). |

Each element of per_slot is a named list:

| Component | Type | Description |
|---|---|---|
| slot | character | The slot name from slot_names[k]. |
| var_post | numeric | Posterior sample variance (or NA_real_ if skipped). |
| var_prior | numeric | Prior variance from canonical kind (or NA_real_). |
| contraction | numeric | $C_k$ value (or NA_real_ if skipped/non-finite). |
| status | character | One of "pass", "warn", "information_error", "skipped". |
| message | character | Diagnostic message string. |

Notes

  • Early return on missing draws: If fit$draws(format = "draws_matrix") throws an error (caught via tryCatch), the function returns immediately with passed = TRUE and all slots marked "skipped" with message "fit$draws() unavailable". This is a defensive fallback—no diagnostic is raised when draws are inaccessible.
  • Parameter name construction: The Stan parameter name is paste0(param_root, "[1,", k, "]") for both use_groups == 0 and use_groups == 1 (the index structure is [1, k] in both cases, using the first row). The root is "theta_ref_k" or "mu_theta_ref_k" depending on use_groups.
  • Spec resolution: The function attempts to filter family$param_specs for specs with scope in c("per_observation", "per_group"). If the filtered length does not equal $K$, it falls back to taking the first $K$ specs positionally (family$param_specs[seq_len(K)]).
  • Non-finite contraction: If var_prior is Inf, 0, or the ratio produces NaN/Inf, the contraction is NA_real_ and status is "skipped" with message "non-finite contraction".
  • Insufficient draws: If the draws column exists but has fewer than 4 elements (length(draws_k) < 4L), the slot is skipped with message "draws for '<param>' unavailable" (even though draws technically exist, the sample is too small for a reliable variance estimate).
  • No canonical variance: If .gdpar_canonical_prior_variance(kind) returns a non-finite value, the slot is skipped with message reporting the kind.
  • This function only classifies: it does not itself emit warnings or errors. A condition of class gdpar_information_error is raised elsewhere in the pipeline when a slot has status "information_error".
  • The prior argument is accepted for future extensibility but is currently unused within the function body.

print.gdpar_identifiability_report(x, ...)

Purpose S3 print method for objects of class "gdpar_identifiability_report". Renders a human-readable summary of the identifiability diagnostic to the console, covering all possible sub-report sections: eigenvalue/condition-number diagnostics, collinear directions (C1–C4), C4-bis per-coordinate cross-component checks, and D-ID pre-fit parameter analysis. Exported for user convenience.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| x | list with S3 class "gdpar_identifiability_report" | The identifiability report object. Expected components are detailed in the Returns section below. |
| ... | (unused) | Present for S3 generic compatibility; ignored. |

Returns Invisibly returns x (the input object), following standard R print-method conventions. The primary effect is the side effect of printing formatted text to the console via cat().

Notes

  • Top-level fields: Always prints x$passed. Prints lambda_min, lambda_max, condition_number, and tol_used only if x$lambda_min is not NA. Prints x$rigor_used if non-NULL. Prints x$message unconditionally.
  • Collinear directions (C1–C4): If x$passed is FALSE and x$collinear_directions is non-NULL, iterates over each direction entry, printing the eigenvalue and each column/coefficient pair in the direction vector. Each entry d is expected to have d$eigenvalue (numeric), d$columns (character vector), and d$coefficients (numeric vector of same length as d$columns).
  • C4-bis section: If x$c4_bis is non-NULL, iterates over x$c4_bis$per_k. Each entry pk is expected to have pk$coord, pk$rigor, pk$passed, pk$condition_number (may be Inf), pk$shared_cols (character vector, may be empty), and optionally pk$collinear_directions with the same structure as above.
  • D-ID pre-fit section: If x$did_pre_fit is non-NULL, prints the total K, overall passed flag, and per-parameter details (name, scope, prior_canonical_kind, did_status, did_condition). Also prints symbolic_separability if present, including passed and overlapping_kinds.
  • Formatting: Uses format(..., digits = 4) for eigenvalues and condition numbers, format(..., digits = 3, width = 7) for direction coefficients. Indentation uses four fixed literal whitespace strings.
  • No validation is performed on the structure of x; missing components are guarded by is.null / is.na checks. If x lacks expected fields, the corresponding section is silently omitted.
  • The function does not raise errors under normal conditions; it is purely presentational.


.check_group_aliasing_c7(design, group_id, group_var_name, tol = 1e-8)

Purpose
Checks the identifiability condition (C7) of Block 6.5 for group aliasing. Ensures that columns of the design matrices for the $a$ or $b$ components are not constant within groups (which would alias with the per-group anchor $\theta_{\text{ref}[g]}$) and that the combined matrix of group indicators and design columns has full column rank (no indirect aliasing). This function orchestrates per-block checks, handling both univariate and multi‑coordinate designs.

Arguments

  • design: A list containing design matrices. In a univariate setting it should have elements Z_a, Z_b, Z_a_names, Z_b_names. In a multi‑coordinate setting it should have Z_a_list, Z_b_list, Z_a_names_list, Z_b_names_list (each a list of length $p$, the number of coordinates).
  • group_id: A vector of group identifiers. Converted internally to an integer factor. If NULL or with fewer than two levels, the check is skipped.
  • group_var_name: A character string naming the grouping variable (used in error messages).
  • tol: Numeric tolerance for comparing variances and QR‑rank deficiency (default 1e-8).

Mathematics
The function implements two checks per design block (see .check_c7_one_block):

  1. Within‑group variance test: For each column $j$ of a design matrix $Z$, compute the within‑group variance $s_{jg}^2$ for each group $g$. If $s_{jg}^2 \le \text{tol}$ for all groups (i.e., the column is constant within every group), identifiability is violated.
  2. Rank test: Let $G$ be the indicator matrix for the groups. The combined matrix $M = [G \mid Z]$ must have full column rank. If $\operatorname{rank}(M) < \text{ncol}(M)$ (after column normalization), there exists an indirect alias between the group anchor and a linear combination of the design columns.

Returns
NULL, invisibly (invisible(NULL)). The function is called for its side effects (raising an error when a check fails).

Notes

  • Internal function (not exported, leading dot).
  • If group_id is NULL or has fewer than two levels, the function returns immediately without checks.
  • For multi‑coordinate designs (has_multi_design is TRUE), the check is applied to each coordinate block separately, iterating over seq_len(p) where p = length(design$Z_a_list).
  • For each block, it calls .check_c7_one_block with the appropriate design sub‑matrix, column names, and coordinate index.
  • Errors are raised via gdpar_abort() with class "gdpar_input_error" and a structured data list containing the component, coordinate (if any), group variable name, and (for variance violations) the names of the aliased columns.

.check_c7_one_block(Z, Z_names, component, coord, group_int, J_groups, group_var_name, tol)

Purpose
Internal helper that applies the two‑layer aliasing check (C7) to a single design block $Z$ (either $Z_a$ or $Z_b$, for a specific coordinate $k$ in a multi‑coordinate design or with coord = NA for univariate). It performs the within‑group variance test and the joint rank test.

Arguments

  • Z: Numeric design matrix (may have zero columns or rows). If NULL, zero columns, or zero rows, the function returns immediately.
  • Z_names: Character vector of column names for Z, used in error messages to identify aliased columns.
  • component: Character string, either "a" or "b", indicating which model component the block belongs to.
  • coord: Integer coordinate index $k$ (for multi‑coordinate designs) or NA_integer_ for univariate. Used only for error message formatting.
  • group_int: Integer vector of group memberships (values in $1, \dots, J_{\text{groups}}$) for each observation.
  • J_groups: Integer number of groups. (Received but not used directly in the computations; group structure is encoded in group_int and the indicator matrix $G$.)
  • group_var_name: Character string naming the grouping variable (for error messages).
  • tol: Numeric tolerance for variance comparisons and QR‑rank deficiency.

Mathematics

  1. Within‑group variance test:
    For each column $j = 1, \dots, p$ (where $p = \text{ncol}(Z)$): $$ s_{jg}^2 = \begin{cases} 0 & \text{if group } g \text{ has fewer than 2 observations}, \\ \dfrac{1}{|g|-1} \sum_{i \in g} \bigl(Z_{ij} - \bar{Z}_{\cdot j g}\bigr)^2 & \text{otherwise}, \end{cases} $$ where $\bar{Z}_{\cdot j g}$ is the mean of column $j$ within group $g$.
    Let $m_j = \max_{1 \le g \le J_{\text{groups}}} s_{jg}^2$.
    If $m_j \le \text{tol}$, column $j$ is flagged as constant within every group.
  2. Rank test:
    Construct the group indicator matrix $G \in \mathbb{R}^{n \times J_{\text{groups}}}$ via model.matrix(~ as.factor(group_int) + 0).
    Form $M = [G \mid Z]$. Normalize each column of $M$ by its Euclidean norm (with zero norms replaced by 1 to avoid division by zero).
    Compute the QR decomposition of the normalized matrix: $\text{qr}(M_{\text{norm}})$.
    If $\operatorname{rank}(M_{\text{norm}}) < \text{ncol}(M_{\text{norm}})$, the joint matrix is rank‑deficient, indicating an indirect alias.
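
The two layers can be sketched in base R as follows. The helper name `c7_block` and the toy data are assumptions for the example, and passing `tol` to `qr()` is one plausible reading of "QR-rank deficiency" rather than a confirmed detail of the implementation; the real helper raises errors instead of returning a list.

```r
# Hypothetical sketch of the two C7 layers for one design block.
c7_block <- function(Z, group_int, tol = 1e-8) {
  # Layer 1: largest within-group variance per column (0 for singleton groups).
  max_within_var <- apply(Z, 2, function(col) {
    max(tapply(col, group_int, function(v) {
      if (length(v) < 2L) 0 else stats::var(v)
    }))
  })
  constant_cols <- which(max_within_var <= tol)

  # Layer 2: QR rank of the column-normalized matrix [G | Z].
  G <- stats::model.matrix(~ as.factor(group_int) + 0)
  M <- cbind(G, Z)
  norms <- sqrt(colSums(M^2))
  norms[norms == 0] <- 1                      # avoid division by zero
  M_norm <- sweep(M, 2, norms, "/")
  list(
    constant_cols = constant_cols,
    rank = qr(M_norm, tol = tol)$rank,
    ncol = ncol(M_norm)
  )
}

group_int <- rep(1:3, each = 4)
x_within  <- rep(c(-1.5, -0.5, 0.5, 1.5), times = 3)  # varies inside every group
x_between <- c(10, 20, 30)[group_int]                 # constant inside every group

res_ok  <- c7_block(cbind(x_within), group_int)
res_bad <- c7_block(cbind(x_between), group_int)
```

In `res_bad` both layers fire: the column has zero variance inside each group, and it is also an exact linear combination of the group indicators, so the joint rank is 3 for a 4-column matrix.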

Returns
invisible(NULL) if all checks pass.

Notes

  • Internal function (not exported, leading dot), called only by .check_group_aliasing_c7.
  • Early returns with invisible(NULL) if:
    • Z is NULL,
    • ncol(Z) == 0L,
    • nrow(Z) == 0L.
  • Two possible error conditions:
    1. Direct aliasing (constant columns): If any column has within‑group variance $\le \text{tol}$, gdpar_abort() is called with a message listing the offending column names. The error data includes the component, coordinate, group variable name, and aliased_columns.
    2. Indirect aliasing (rank deficiency): If the rank of the normalized combined matrix is less than the number of columns, gdpar_abort() is called with a message including the component, coordinate, group variable name, the observed rank, and the number of columns. The error data includes rank and ncol.
  • Errors are of class "gdpar_input_error" and contain a data list for programmatic handling.
  • The argument J_groups is passed but not used in any computation; the group structure is fully represented by group_int and the generated indicator matrix $G$.
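
Because the errors carry a class and a data list, callers can handle them programmatically. The sketch below uses a base-R stand-in for the error constructor, since the signature of gdpar_abort() is not shown here; the field name `data` on the condition object is likewise an assumption.

```r
# Stand-in for the package's error constructor (gdpar_abort() is not reproduced).
raise_input_error <- function(msg, data) {
  stop(structure(
    class = c("gdpar_input_error", "error", "condition"),
    list(message = msg, call = NULL, data = data)
  ))
}

# Catch only errors of class "gdpar_input_error" and read the structured payload.
handled <- tryCatch(
  raise_input_error(
    "column constant within groups",
    list(component = "a", aliased_columns = "x_between")
  ),
  gdpar_input_error = function(e) e$data$aliased_columns
)
handled
```

Dispatching on the condition class rather than on the message text keeps calling code robust to changes in wording.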

R/compare_eb_fb_methods.R

S3 Methods for gdpar_eb_fb_comparison Objects

This file defines three S3 methods supporting the comparison of Empirical-Bayes (EB) and Fully-Bayes (FB) estimation paths produced elsewhere in the gdpar package. The methods provide console printing, a structured summary, and printing of that summary.


print.gdpar_eb_fb_comparison(x, digits = 3L, ...)

Purpose

S3 print method for objects of class gdpar_eb_fb_comparison. Produces a concise human-readable console summary of an EB-vs-FB comparison, including the estimation families and paths involved, the number of common $\xi$ parameters, marginal summary statistics of the Total Variation (TV) distribution and the EB/FB interval width-ratio distribution, and the first six rows of the per-anchor $\theta$-diff table. Any stored warnings are appended at the end.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| x | gdpar_eb_fb_comparison (list) | The comparison object to display. Expected components: family_eb, family_fb (character), path_eb, path_fb (character), level (numeric), tv_bins (integer scalar), n_common_params (integer scalar), tv_table (data frame with a tv column, possibly NULL), coverage_table (data frame with a width_ratio column, possibly NULL), theta_diff_table (data frame, possibly NULL), warnings (character vector). |
| digits | integer scalar | Passed to format() for numeric formatting; defaults to 3L. |
| ... | (any) | Unused; absorbed for S3 generic compatibility. |

Mathematics

No mathematical formula is implemented. The method computes order statistics over finite subsets of two distributions:

  • TV values: the finite entries of tv_table$tv, $\{\, v : v \in \mathbb{R} \,\}$, reporting $\min$, $\operatorname{median}$, $\max$.
  • Width ratios: the finite entries of coverage_table$width_ratio, $\{\, r : r \in \mathbb{R} \,\}$, reporting $\min$, $\operatorname{median}$, $\max$.

Non-finite values (NA, NaN, Inf, -Inf) are excluded via is.finite() before computing summary statistics.

Returns

The object x, returned invisibly (via invisible(x)). The primary effect is console output.

Notes

  • The level component is formatted with format(x$level, digits = digits); its type is not validated.
  • If tv_table is NULL or has zero rows, or if all tv values are non-finite, the marginal TV line is silently omitted.
  • If coverage_table is NULL or has zero rows, or if all width_ratio values are non-finite, the width-ratio line is silently omitted.
  • The $\theta$-diff preview uses utils::head(x$theta_diff_table, 6L) and is passed through format(..., digits = digits) before print().
  • Warnings are printed one per line, prefixed with " - ".
  • No validation or error-checking is performed on the structure of x; missing components would propagate as R errors (e.g., $ on NULL).

summary.gdpar_eb_fb_comparison(object, ...)

Purpose

S3 summary method for objects of class gdpar_eb_fb_comparison. Constructs a structured list suitable for programmatic access and for the canonical print.summary.gdpar_eb_fb_comparison method. Aggregates the TV table and the coverage (width-ratio) table into seven-point summary statistics (count, min, 25th percentile, median, 75th percentile, max, mean).

Arguments

| Argument | Type | Meaning |
|---|---|---|
| object | gdpar_eb_fb_comparison (list) | The comparison object to summarize. Same expected components as for print.gdpar_eb_fb_comparison, plus optionally call. |
| ... | (any) | Unused; absorbed for S3 generic compatibility. |

Mathematics

For the TV distribution, let $V = \{\, v : v \in \mathbb{R} \,\}$ be the finite entries of tv_table$tv. If $|V| > 0$, the summary stores:

$$ \texttt{tv\_summary} = \bigl(n,\; \min(V),\; Q_{0.25}(V),\; \operatorname{median}(V),\; Q_{0.75}(V),\; \max(V),\; \bar{V}\bigr) $$

where $Q_p$ denotes the empirical quantile at probability $p$ (computed via stats::quantile) and $\bar{V} = \frac{1}{n}\sum_{i} v_i$.

For the width-ratio distribution, let $R = \{\, r : r \in \mathbb{R} \,\}$ be the finite entries of coverage_table$width_ratio. If $|R| > 0$, the summary stores the analogous seven statistics:

$$ \texttt{coverage\_summary} = \bigl(n,\; \min(R),\; Q_{0.25}(R),\; \operatorname{median}(R),\; Q_{0.75}(R),\; \max(R),\; \bar{R}\bigr) $$

If either finite subset is empty, the corresponding summary element is set to NULL.
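
A minimal sketch of the seven-point summary, assuming a hypothetical helper named `seven_point` (the package's internal code may be organized differently). It filters with is.finite(), returns NULL for an empty finite subset, and uses the default type-7 quantiles of stats::quantile.

```r
# Hypothetical helper reproducing the seven statistics described above.
seven_point <- function(v) {
  v <- v[is.finite(v)]                 # drops NA, NaN, Inf and -Inf
  if (length(v) == 0L) return(NULL)    # empty finite subset
  list(
    n      = length(v),
    min    = min(v),
    q25    = unname(stats::quantile(v, 0.25)),
    median = stats::median(v),
    q75    = unname(stats::quantile(v, 0.75)),
    max    = max(v),
    mean   = mean(v)
  )
}

tv <- c(0.02, 0.10, NA, 0.05, Inf, 0.30)  # made-up TV values for illustration
s <- seven_point(tv)
str(s)
seven_point(c(NA, Inf))                   # NULL: nothing finite to summarize
```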

Returns

A list of class c("summary.gdpar_eb_fb_comparison", "list") with the following components:

| Component | Type | Description |
|---|---|---|
| family_eb | character | Copied from object$family_eb. |
| family_fb | character | Copied from object$family_fb. |
| path_eb | character | Copied from object$path_eb. |
| path_fb | character | Copied from object$path_fb. |
| level | (inherits from object$level) | Copied from object$level. |
| tv_bins | integer | Copied from object$tv_bins. |
| n_common_params | integer | Copied from object$n_common_params. |
| n_anchor_cells | integer | 0L if object$theta_diff_table is NULL, otherwise nrow(object$theta_diff_table). |
| tv_summary | list or NULL | Seven-element list (n, min, q25, median, q75, max, mean) or NULL. |
| coverage_summary | list or NULL | Seven-element list (same structure) or NULL. |
| theta_diff_table | data frame or NULL | Copied by reference from object$theta_diff_table. |
| tv_table | data frame or NULL | Copied by reference from object$tv_table. |
| coverage_table | data frame or NULL | Copied by reference from object$coverage_table. |
| warnings | character | object$warnings, with a fallback default applied when it is NULL (see Notes). |
| call | (any) | Copied from object$call. |

Notes

  • The %||% infix operator supplies the warnings default; it is assumed to be defined elsewhere in the package (not in this file). Under the standard semantics of rlang's %||%, it returns the left-hand side if that is not NULL, and the right-hand side otherwise.
  • Quantiles are computed with stats::quantile at probabilities $0.25$ and $0.75$; the default type = 7 interpolation is used. The names attribute of the scalar result is stripped via unname().
  • The q25 and q75 fields are unnamed scalars; all other numeric fields inherit names from min(), stats::median(), max(), and mean() (typically NULL for these functions on atomic vectors).
  • No copy is made of the table data frames; they are assigned by reference into the output list.
  • The output class vector is c("summary.gdpar_eb_fb_comparison", "list"), enabling dispatch on both the specific class and the implicit list class.
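
For readers unfamiliar with the null-default operator mentioned in the notes, the following is a minimal definition with the semantics described there. It is shown only to illustrate the behaviour; the default value `character(0)` in the calls is an assumption, since the right-hand side used by the package is not stated here.

```r
# Minimal null-default operator: left-hand side unless it is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% character(0)            # falls back to the right-hand side
"slot 2 skipped" %||% character(0)  # keeps the left-hand side
```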

print.summary.gdpar_eb_fb_comparison(x, digits = 3L, ...)

Purpose

S3 print method for objects of class summary.gdpar_eb_fb_comparison. Renders the structured summary to the console, displaying the EB/FB families and paths, the level, the count of common $\xi$ parameters and anchor cells, the full five-point quantile summary plus mean for both the marginal TV distribution and the EB/FB width-ratio distribution, the complete $\theta$-diff table, and any stored warnings.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| x | summary.gdpar_eb_fb_comparison (list) | The summary object to display. Expected components: family_eb, family_fb, path_eb, path_fb (character), level (numeric), n_common_params (integer scalar), n_anchor_cells (integer scalar), tv_summary (list or NULL), coverage_summary (list or NULL), theta_diff_table (data frame or NULL), warnings (character vector). |
| digits | integer scalar | Passed to format() for numeric formatting; defaults to 3L. |
| ... | (any) | Unused; absorbed for S3 generic compatibility. |

Mathematics

No mathematical formula is implemented. The method formats and displays pre-computed summary statistics. For each distribution (TV and width-ratio), it prints:

$$ \min,\; Q_{0.25},\; \operatorname{median},\; Q_{0.75},\; \max $$

on one line, followed by $\bar{X}$ (the mean) on a separate line.

Returns

The object x, returned invisibly (via invisible(x)). The primary effect is console output.

Notes

  • Unlike print.gdpar_eb_fb_comparison (which shows only the first six rows of theta_diff_table), this method prints the full theta_diff_table via print(format(x$theta_diff_table, digits = digits)).
  • The tv_summary and coverage_summary blocks are each printed only if the corresponding component is non-NULL. Each block includes the sample size n in its header.
  • The level is formatted with format(x$level, digits = digits).
  • Warnings are printed one per line, prefixed with " - ", only if length(x$warnings) > 0L.
  • The tv_bins component is not printed by this method (unlike print.gdpar_eb_fb_comparison), even though it is present in the summary object.
  • No structural validation of x is performed; missing or mistyped components would propagate as R errors.

R/compare_eb_fb.R

gdpar_compare_eb_fb(eb_fit, fb_fit, level = 0.95, tv_bins = 30L, ...)

Purpose (role in the package). Exported orchestrator for Sub-phase 8.6.E (Charter Section 3.5, decision 2.5 Trio of vignettes). It produces a descriptive operational comparison between an Empirical-Bayes fit (gdpar_eb_fit, from gdpar_eb()) and a Fully-Bayes fit (gdpar_fit, from gdpar()) fitted on the same dataset. It does not assert algorithmic equivalence and does not test hypotheses across the two inferential frames. It computes three tables:

  1. Per-anchor-cell differences in the population anchor $\theta_{\text{ref}}$.
  2. Marginal empirical total-variation (TV) distance between the lower-level posteriors of $\xi = (a,\, b,\, W,\, \text{dispersion})$, parameter by parameter.
  3. Operational verification of the higher-order coverage discrepancy (v07 Section 6, Proposition 7B scalar / 7B* matricial / 7B* tensorial) on the nominal EB and FB credible intervals.

Arguments

Argument Type Meaning
eb_fit object of class gdpar_eb_fit Empirical-Bayes fit produced by gdpar_eb(). Covers all four path regimes: K = 1 + p = 1; Path A (K = 1 + p > 1); Path B (K > 1 + p = 1); Path C (K > 1 + p > 1, via the K × p tensor extension of Sub-phase 8.6.D).
fb_fit object of class gdpar_fit Fully-Bayes fit produced by gdpar(). Must have been fitted on the same dataset (same outcome, same covariates, same K / p regime). The comparator does not refit either model.
level numeric scalar Credible-interval level for coverage-discrepancy reporting. Must lie in $(0,\,1)$. Defaults to 0.95.
tv_bins integer scalar Number of histogram bins used to approximate the marginal TV distance per parameter. Must be $\geq 5$. Defaults to 30L. Larger values give a finer empirical TV but require more draws per parameter for stability.
... (any) Reserved for future arguments; currently unused.

Mathematics

The marginal total-variation distance between two distributions $P$ and $Q$ on a single scalar parameter is approximated via a histogram-based plug-in estimator:

$$\widehat{\text{TV}}(P,Q) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat p_j - \hat q_j\bigr|,$$

where $B = \texttt{tv\_bins}$ is the number of bins, and $\hat p_j$ and $\hat q_j$ are the relative bin counts (on a common breakpoint grid) of the EB and FB posterior draws, respectively. No finite-sample correction is applied. Joint TV across the high-dimensional $\xi$ is out of scope (it would require kernel Stein discrepancy or similar density-free metrics); the marginal TV reported per parameter is the operational proxy recommended in v07 Section 11.1.

The coverage-discrepancy table compares EB-nominal vs FB-nominal credible-interval widths at level level per anchor cell, operationally verifying the $\mathcal{O}(n^{-1})$ under-cover claim of Proposition 7B.

Returns

An object of class c("gdpar_eb_fb_comparison", "list") with components:

Component Type / Structure Meaning
theta_diff_table data.frame or NULL Per-anchor-cell comparison of EB vs FB $\theta_{\text{ref}}$ estimates. See .gdpar_eb_fb_theta_diff_table.
tv_table data.frame or NULL Marginal TV distance per common $\xi$ parameter. NULL when EB or FB draws are unavailable or there are zero common parameter names.
coverage_table data.frame or NULL Coverage-discrepancy table (EB-nominal vs FB-nominal CI widths per anchor cell).
level numeric scalar Echo of the level input.
tv_bins integer scalar Echo of the tv_bins input.
n_common_params integer Number of rows in tv_table (or 0L if tv_table is NULL).
path_eb character Path identifier from eb_fit$path, defaulting to "eb".
path_fb character Path identifier from fb_fit$path, defaulting to "fb".
family_eb character eb_fit$family$name, or NA_character_ if absent.
family_fb character fb_fit$family$name, or NA_character_ if absent.
call call The matched call.
warnings character vector Accumulated fallback notification messages from helper extractors. Empty (character(0L)) in the happy path.
meta list Contains mode = "compare_eb_fb" and a human-readable note summarizing the TV and coverage methodology.

Companion S3 methods print.gdpar_eb_fb_comparison and summary.gdpar_eb_fb_comparison are documented separately.

Notes

  • Input validation. eb_fit must inherit "gdpar_eb_fit"; fb_fit must inherit "gdpar_fit". level must be a single numeric value in $(0,1)$. tv_bins must be a single numeric value $\geq 5$ (coerced to integer). Violations raise a gdpar_input_error via gdpar_abort().
  • Required namespace. The posterior package is required (suggested dependency); an informative error is raised if absent.
  • Warning accumulation. A local emit() closure accumulates messages into warnings_msg (and also calls gdpar_warn() for each). The following fallback conditions are checked:
    1. All FB theta_ref draws are NA (unknown template convention or empty draws).
    2. EB $\xi$ draws are NULL.
    3. FB $\xi$ draws are NULL.
    4. TV table is NULL despite both draw sets being non-NULL (zero common parameter names).
    5. All FB widths in the coverage table are NA.
  • FB conditional_fit fallback. For the FB $\xi$ draws extraction, the function first tries fb_fit$conditional_fit and falls back to fb_fit$fit via the %||% null-coalescing operator.
  • Path uniformity. All four EB regimes are handled uniformly by the helpers. For Path C the theta_ref_kp_hat tensor is flattened to a length-$Kp$ vector keyed by (slot, coord); the joint $K \times p$ inflation tensor is reported in the coverage table per cell via diagonal-block entries.
  • S3 dispatch. The returned object carries class c("gdpar_eb_fb_comparison", "list") for dispatch by the companion print/summary methods.

.gdpar_eb_fb_theta_diff_table(eb_fit, fb_fit, level)

Purpose (role in the package). Internal helper that builds the theta_diff_table component of the comparison object. It extracts the EB anchor point estimates and their standard errors, attempts to extract FB posterior draws of $\theta_{\text{ref}}$ via .gdpar_eb_fb_extract_theta_ref_draws_fb(), and computes per-anchor-cell differences. The row key structure varies by path regime.

Arguments

Argument Type Meaning
eb_fit object of class gdpar_eb_fit Empirical-Bayes fit. Path regime is inferred from eb_fit$path (checked for "eb_KxP" for Path C). Contains theta_ref_kp_hat / theta_ref_kp_se (Path C), or theta_ref_hat / theta_ref_se (other paths).
fb_fit object of class gdpar_fit Fully-Bayes fit. Passed to the FB draws extractor.
level numeric scalar in $(0,1)$ Credible-interval level. Accepted by the function signature but not used in the current body (reserved for future use or passed downstream elsewhere).

Mathematics

For each anchor cell $i$:

$$\texttt{diff}_i \;=\; \widehat{\theta}_{\text{ref},i}^{\,\text{EB}} \;-\; \overline{\theta}_{\text{ref},i}^{\,\text{FB}},$$

$$\texttt{diff\_rel}_i \;=\; \frac{\widehat{\theta}_{\text{ref},i}^{\,\text{EB}} \;-\; \overline{\theta}_{\text{ref},i}^{\,\text{FB}}}{\text{sd}\!\bigl(\theta_{\text{ref},i}^{\,\text{FB draws}}\bigr)},$$

where $\overline{\theta}_{\text{ref},i}^{\,\text{FB}}$ and $\text{sd}(\cdot)$ are the posterior mean and posterior standard deviation of the FB draws for cell $i$. The relative difference $\texttt{diff\_rel}_i$ is set to NA_real_ when $\text{sd}(\cdot)$ is not finite or $\leq 0$.
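The two formulas can be illustrated with a minimal sketch. The input vectors below are hypothetical values chosen for illustration; their names mirror the table columns documented under Returns, but the code is not taken from the package source.

```r
# Hypothetical inputs: three anchor cells.
eb_estimate <- c(0.50, 1.20, -0.30)  # EB point estimates
fb_mean     <- c(0.45, 1.25, -0.30)  # FB posterior means
fb_se       <- c(0.10, 0.00, NA)     # FB posterior SDs (one zero, one missing)

# Absolute difference per cell.
diff <- eb_estimate - fb_mean

# Guard the ratio: NA when the FB SD is missing, non-finite, or not positive.
ok <- is.finite(fb_se) & fb_se > 0
diff_rel <- ifelse(ok, diff / fb_se, NA_real_)

data.frame(cell = seq_along(diff), eb_estimate, fb_mean, fb_se, diff, diff_rel)
```

Only the first cell has a usable FB standard deviation, so only that row carries a finite diff_rel; the other two rows are NA by the guard.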

Returns

A data.frame whose structure depends on path regime:

  • Path C (eb_fit$path == "eb_KxP", i.e. $K > 1$ and $p > 1$): One row per $(g,\,k,\,c)$ triple ($g = 1,\ldots,J$ groups; $k = 1,\ldots,K$ slots; $c = 1,\ldots,p$ coordinates). Columns:

    Column Type Meaning
    group integer Group index $g$.
    slot character Slot name from eb_fit$slot_names[k].
    coord integer Coordinate index $c$.
    eb_estimate numeric eb_fit$theta_ref_kp_hat[g, k, c].
    eb_se numeric eb_fit$theta_ref_kp_se[g, k, c].
    fb_mean numeric Posterior mean of FB draws for cell $(g,k,c)$; NA_real_ if draws unavailable.
    fb_se numeric Posterior SD of FB draws; NA_real_ if draws unavailable.
    diff numeric $\texttt{eb_estimate} - \texttt{fb_mean}$.
    diff_rel numeric $(\texttt{eb_estimate} - \texttt{fb_mean}) / \texttt{fb_se}$, or NA_real_ when fb_se is not finite or $\leq 0$.
  • Non-Path-C (K = 1 + p = 1, Path A, Path B): One row per anchor cell, indexed sequentially. Columns:

    Column Type Meaning
    cell integer Sequential cell index $1,\ldots,n_{\text{cells}}$.
    eb_estimate numeric as.numeric(eb_fit$theta_ref_hat)[i].
    eb_se numeric as.numeric(eb_fit$theta_ref_se)[i].
    fb_mean numeric From fb_draws$flat$means[i]; NA_real_ if absent.
    fb_se numeric From fb_draws$flat$ses[i]; NA_real_ if absent.
    diff numeric $\texttt{eb_estimate} - \texttt{fb_mean}$.
    diff_rel numeric Conditional ratio as above; uses vectorized ifelse.

Notes

  • FB draws extraction. Calls .gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit) inside a tryCatch that silently returns NULL on any error. When NULL, all FB columns are filled with NA_real_.
  • Path C iteration. Uses a triple nested for loop over $(g, k, c)$, pre-allocating a list of length $J \cdot K \cdot p$. Each list element is a single-row data.frame, assembled at the end via do.call(rbind, rows). FB draws for each cell are accessed as fb_draws$kp[[g]][[k]][, c].
  • Non-Path-C path. EB estimates and SEs are coerced to numeric vectors via as.numeric(). FB means and SEs are taken from fb_draws$flat$means and fb_draws$flat$ses, with length truncated to min(n_cells, length(fb_draws$flat$means)).
  • Edge case — zero-length FB draws. For Path C, if a cell's FB draw vector is NULL or has length == 0, both fb_mean and fb_se are set to NA_real_.
  • Edge case — zero FB SE. diff_rel is NA_real_ whenever fb_se is NA, not finite, or $\leq 0$.
  • The level argument is accepted but not used in the computation within this helper; it is present for interface consistency with the other helpers.

.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)

Purpose (role in the package). Internal helper that extracts the $\theta_{\text{ref}}$ posterior draws from a Fully-Bayes gdpar_fit object in a path-aware manner. Used by .gdpar_eb_fb_theta_diff_table to obtain the FB posterior summaries ($\overline{\theta}^{\,\text{FB}}$ and $\text{sd}(\theta^{\,\text{FB draws}})$) for comparison against EB point estimates.

Arguments

Argument Type Meaning
fb_fit object of class gdpar_fit Fully-Bayes fit whose Stan draws are to be inspected for $\theta_{\text{ref}}$ variables.

Mathematics

No formula is implemented by this helper; it is a pure data-extraction utility that retrieves posterior draws from the Stan output using the canonical posterior::as_draws_matrix interface.

Returns

A list whose structure depends on the path regime of fb_fit:

Path regime Component Structure Meaning
Non-Path-C (K = 1 + p = 1 / Path A / Path B) flat Named list with elements means (numeric vector) and ses (numeric vector) Posterior means and standard deviations of $\theta_{\text{ref}}$ draws, keyed by cell index. Draw variable names expected under theta_ref[...] or theta_ref_k[...] conventions.
Path C (K > 1 + p > 1) kp Nested list: kp[[g]][[k]] is a matrix with $n_{\text{draws}}$ rows and $p$ columns Posterior draws of $\theta_{\text{ref}}$ per group $g$, slot $k$, and coordinate $c = 1,\ldots,p$. Draw variable names expected under theta_ref_kp[...] convention.

Returns NULL when extraction fails (draws are not present in the recognized variable-name convention, or fb_fit lacks draw data).

Notes

  • Scope of this entry. This entry reflects the roxygen documentation block only; the behavior of the function body is described in the entry that follows.
  • Path C debt. The documentation explicitly notes that the K × p FB template for Path C is itself a follow-on debt of the 8.4 unification effort per Charter and the project_gdpar_deuda_8_4_unificacion_stan debt item.
  • Fail-silent design. The function returns NULL on failure rather than raising an error. The calling orchestrator (gdpar_compare_eb_fb) detects this via downstream NULL checks and emits structured warnings through the emit() closure.
  • @keywords internal / @noRd. This function is internal and does not generate an .Rd help page.

.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)

Purpose Extracts the posterior draws of reference-anchor parameters ($\theta_{\mathrm{ref}}$) from a fitted Fully-Bayes (FB) model object. It inspects the stored Stan/MCMC draws and returns them in one of two structural forms, "flat" (a list of posterior means and standard deviations) or "kp" (a nested list of per-group, per-slot draw matrices with one column per coordinate), depending on which variable-naming convention (Path A/B/C) was used during sampling. Returns NULL if no $\theta_{\mathrm{ref}}$ variables are found or if the draws cannot be retrieved.

Arguments

Argument Type Meaning
fb_fit list A fitted FB model object. The function first looks for fb_fit$conditional_fit (an EB conditional fit embedded within the FB workflow) and falls back to fb_fit$fit (the raw FB fit). Each is expected to possess a $draws() method returning a posterior draws object.

Mathematics

No closed-form formula is implemented. The function is a dispatch-and-extraction routine that selects the appropriate draws-dimension based on variable naming:

  • Path C convention: variables matching ^theta_ref_kp\[ (three-index tensor theta_ref_kp[g,k,c]).
  • Path B convention: variables matching ^theta_ref_k\[ (one-dimensional, per-slot).
  • Path A / default convention: variables matching ^theta_ref(\[|$) (scalar or simple vector).

For the "flat" returns (Paths A and B), the function computes: $$\bar{\theta}_j = \frac{1}{S}\sum_{s=1}^{S} \theta_j^{(s)}, \qquad \mathrm{sd}_j = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\bigl(\theta_j^{(s)} - \bar{\theta}_j\bigr)^2}$$ where $S$ is the number of posterior draws and $j$ indexes the parameter.
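The priority-ordered detection and the flat summary can be sketched as follows. This is an illustrative reconstruction under stated assumptions: detect_convention and flat_summary are hypothetical names, and a plain numeric matrix with column names stands in for a posterior draws matrix.

```r
# Hypothetical helper: pick the naming convention in priority order.
detect_convention <- function(vars) {
  if (any(grepl("^theta_ref_kp\\[", vars)))  return("kp")       # Path C
  if (any(grepl("^theta_ref_k\\[", vars)))   return("k")        # Path B
  if (any(grepl("^theta_ref(\\[|$)", vars))) return("default")  # Path A / default
  NULL
}

# Hypothetical "flat" summary: column-wise mean and sample SD of the draws.
flat_summary <- function(mat, pattern) {
  sel <- mat[, grepl(pattern, colnames(mat)), drop = FALSE]
  list(means = unname(colMeans(sel)),
       ses   = unname(apply(sel, 2L, stats::sd)))
}

# Toy draws matrix: 4 draws, two theta_ref_k cells plus an unrelated column.
mat <- cbind(c(1, 2, 3, 4), c(0, 0, 2, 2), c(9, 9, 9, 9))
colnames(mat) <- c("theta_ref_k[1]", "theta_ref_k[2]", "a[1]")

convention <- detect_convention(colnames(mat))
flat <- flat_summary(mat, "^theta_ref_k\\[")
```

Because the patterns are tested from most to least specific, a fit that exposes theta_ref_kp[...] columns is classified as Path C even if shorter-prefixed names are also present.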

Returns

  • NULL if no draws object is available, if $draws() errors, if variable names are empty, or if no $\theta_{\mathrm{ref}}$ variables are found.
  • list(kp = <nested list>) — Path C: a nested list produced by .gdpar_eb_fb_unpack_kp, indexed as kp[[g]][[k]][, c] with $g=1,\dots,J$ (groups), $k=1,\dots,K$ (slots), $c=1,\dots,p$ (coordinates), each cell containing a numeric vector of posterior draws.
  • list(flat = list(means = <numeric>, ses = <numeric>)) — Paths A/B: unnamed numeric vectors of posterior means and standard deviations, one element per $\theta_{\mathrm{ref}}$ cell.

Notes

  • Uses the null-coalescing operator %||% (from rlang) to choose conditional_fit over fit.
  • Both the $draws() call and the dimnames() access are wrapped in tryCatch, silently returning NULL on error—this is a defensive pattern for partially-initialized or incomplete fit objects.
  • Variable detection proceeds in priority order: Path C (kp) is checked first, then Path B (k), then Path A (default). Only the first match is returned.
  • A source comment notes that the Path C "kp" branch is not consumed by the diff table when the EB fit has $K = 1$ and $p = 1$; only the kp branch itself touches that structure.
  • Depends on the posterior package for as_draws_matrix and subset_draws.

.gdpar_eb_fb_unpack_kp(mat, vars_c)

Purpose Unpacks a draws matrix containing Path C-style theta_ref_kp[g,k,c] variables into a nested list indexed by group $g$ and slot $k$. Each leaf kp[[g]][[k]] is a matrix of posterior draws whose column $c$ holds the draws for one tensor cell $(g, k, c)$.

Arguments

Argument Type Meaning
mat posterior::draws_matrix A draws matrix whose columns are the theta_ref_kp variables. Column names must follow the pattern theta_ref_kp[g,k,c].
vars_c character Character vector of variable names matching the theta_ref_kp[...] pattern (used for parsing index triples).

Mathematics

The function reconstructs the three-index tensor $$\texttt{theta\_ref\_kp} \in \mathbb{R}^{J \times K \times p}$$ from flat column names. For each column name theta_ref_kp[g,k,c], the integer indices $(g, k, c)$ are extracted via regular-expression parsing. The dimensions $J$, $K$, $p$ are inferred as: $$J = \max_g g, \quad K = \max_k k, \quad p = \max_c c$$
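A minimal sketch of this reconstruction, assuming a plain numeric matrix with column names in place of a posterior draws matrix. The function name unpack_kp_sketch is hypothetical, and the code illustrates the described steps rather than reproducing the package's implementation.

```r
# Hypothetical sketch: rebuild kp[[g]][[k]] (draws x p matrices) from flat names.
unpack_kp_sketch <- function(mat) {
  vars <- colnames(mat)
  parsed <- regmatches(vars, regexec("\\[(\\d+),(\\d+),(\\d+)\\]", vars))
  # One row of (g, k, c) per column name; NA triple when the name does not parse.
  idx <- t(vapply(parsed, function(z) {
    if (length(z) < 4L) rep(NA_integer_, 3L) else as.integer(z[2:4])
  }, integer(3L)))
  J <- max(idx[, 1], na.rm = TRUE)
  K <- max(idx[, 2], na.rm = TRUE)
  p <- max(idx[, 3], na.rm = TRUE)

  kp <- vector("list", J)
  for (g in seq_len(J)) {
    kp[[g]] <- vector("list", K)
    for (k in seq_len(K)) {
      # Start from NA so that absent columns stay NA instead of raising an error.
      cell <- matrix(NA_real_, nrow = nrow(mat), ncol = p)
      for (cc in seq_len(p)) {
        nm <- sprintf("theta_ref_kp[%d,%d,%d]", g, k, cc)
        if (nm %in% vars) cell[, cc] <- mat[, nm]
      }
      kp[[g]][[k]] <- cell
    }
  }
  kp
}

# Toy input: 4 draws; the column theta_ref_kp[2,1,2] is deliberately absent.
mat <- cbind(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12))
colnames(mat) <- c("theta_ref_kp[1,1,1]", "theta_ref_kp[1,1,2]", "theta_ref_kp[2,1,1]")
kp <- unpack_kp_sketch(mat)
```

In the toy input the inferred dimensions are $J = 2$, $K = 1$, $p = 2$, and the missing cell surfaces as an all-NA column in kp[[2]][[1]].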

Returns

A nested list kp with two list levels and matrix leaves:

  • kp: list of length $J$ (groups/observations).
  • kp[[g]]: list of length $K$ (slots).
  • kp[[g]][[k]]: matrix of dimension $S \times p$ ($S$ = number of posterior draws, $p$ = number of coordinates). If a particular theta_ref_kp[g,k,c] column is absent from mat, that column is filled with NA_real_.

Notes

  • The regex "\\[(\\d+),(\\d+),(\\d+)\\]" with regexec captures exactly three comma-separated integers inside square brackets. If a parsed match has fewer than 4 elements (the full match plus 3 groups), c(NA, NA, NA) is substituted.
  • The result matrix kp[[g]][[k]] is initialized to NA_real_ before filling, so any missing column in mat yields NA draws for that cell rather than an error.
  • This function is marked @keywords internal and @noRd; it is not exported.
  • Uses sprintf to reconstruct the expected column name for lookup in mat.

.gdpar_eb_fb_extract_xi_draws(fit_obj)

Purpose Extracts the posterior draws of the "xi" parameter vector (the non-anchor model parameters: fixed-effect coefficients $a$, covariance parameters $c_b$, covariance matrix $W$, and dispersion parameters) from a fitted model object. It explicitly excludes $\theta_{\mathrm{ref}}$ variables, generated quantities (eta, log_lik, y_pred, theta_i), raw/packed helper variables, and lp__. The same function is applied to both the EB conditional fit and the FB fit.

Arguments

Argument Type Meaning
fit_obj list A fitted model object (either EB conditional or FB) expected to have a $draws() method returning a posterior draws object. May be NULL.

Mathematics

No formula is implemented. The function performs a filtering operation on variable names. Variables are retained if they do not match any of the following exclusion patterns (joined by |):

Pattern Excluded variables
^lp__$ Log posterior density
^theta_ref Reference-anchor parameters (all variants)
^mu_theta_ref Mean of reference-anchor hyperparameters
^sigma_theta_ref SD of reference-anchor hyperparameters
^eta Linear predictor generated quantities
^eta_kp Path C linear predictor
^log_lik Pointwise log-likelihood (LOO)
^y_pred Posterior predictive draws
^theta_i Individual-level random effects
^a_raw Raw (non-centered) fixed-effect coefficients
^c_b_raw Raw covariance parameters
^c_b_kp_raw Path C raw covariance parameters
^W_raw Raw covariance matrix elements
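The filtering step can be sketched with base R alone. The exclusion patterns are the ones listed in the table above; the variable names in vars are hypothetical examples, not names read from a real fit.

```r
# Exclusion patterns from the table above, joined into a single alternation.
exclude <- paste(
  c("^lp__$", "^theta_ref", "^mu_theta_ref", "^sigma_theta_ref",
    "^eta", "^eta_kp", "^log_lik", "^y_pred", "^theta_i",
    "^a_raw", "^c_b_raw", "^c_b_kp_raw", "^W_raw"),
  collapse = "|"
)

# Hypothetical variable names as they might appear in a draws object.
vars <- c("lp__", "a[1]", "a_raw[1]", "c_b[1]", "W[1,1]",
          "theta_ref[1]", "sigma", "eta[3]", "log_lik[2]")

# Keep only the names that match none of the exclusion patterns.
keep <- vars[!grepl(exclude, vars)]
keep
```

Note that the anchored prefix ^sigma_theta_ref does not remove a plain sigma dispersion parameter, and ^a_raw removes the raw coefficients while leaving a[1] in place.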

Returns

  • NULL if fit_obj is NULL, if $draws() errors, if variable names are empty, or if no variables survive the exclusion filter.
  • A posterior::draws_matrix containing only the retained "xi" columns.

Notes

  • This function is the complement of .gdpar_eb_fb_extract_theta_ref_draws_fb: that function extracts only $\theta_{\mathrm{ref}}$ draws, while this one extracts everything except $\theta_{\mathrm{ref}}$ and generated quantities.
  • Both $draws() and dimnames() accesses are wrapped in tryCatch, silently returning NULL on error.
  • The exclusion list is hard-coded; adding new generated-quantity prefixes would require editing the paste(...) call.
  • Marked @keywords internal, @noRd, not exported.

.gdpar_eb_fb_tv_table(draws_eb, draws_fb, tv_bins)

Purpose Computes a marginal Total Variation (TV) distance between the EB and FB posterior distributions for each parameter common to both draws objects. This provides a diagnostic for how closely the EB approximation matches the full FB posterior on a per-parameter basis.

Arguments

Argument Type Meaning
draws_eb posterior::draws_matrix Posterior draws from the EB (Empirical Bayes) fit. Column names are parameter names.
draws_fb posterior::draws_matrix Posterior draws from the FB (Fully-Bayes) fit. Column names are parameter names.
tv_bins integer scalar Number of bins for the shared histogram grid used in the TV computation.

Mathematics

For each parameter $\psi$ common to both draw sets, a shared support grid $[r_{\min}, r_{\max}]$ is constructed where: $$r_{\min} = \min\bigl(\min(x_{\mathrm{eb}}),\, \min(x_{\mathrm{fb}})\bigr), \quad r_{\max} = \max\bigl(\max(x_{\mathrm{eb}}),\, \max(x_{\mathrm{fb}})\bigr)$$

The grid is divided into $B$ equal-width bins with breakpoints: $$b_j = r_{\min} + j \cdot \frac{r_{\max} - r_{\min}}{B}, \quad j = 0, 1, \dots, B$$

Histogram counts $h_{\mathrm{eb},j}$ and $h_{\mathrm{fb},j}$ are computed over these bins and normalized to empirical pmfs: $$\hat{p}_{\mathrm{eb},j} = \frac{h_{\mathrm{eb},j}}{\sum_{j'=1}^{B} h_{\mathrm{eb},j'}}, \qquad \hat{p}_{\mathrm{fb},j} = \frac{h_{\mathrm{fb},j}}{\sum_{j'=1}^{B} h_{\mathrm{fb},j'}}$$

The marginal TV distance is then the histogram plug-in estimator: $$\widehat{\mathrm{TV}}(\psi) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat{p}_{\mathrm{eb},j} - \hat{p}_{\mathrm{fb},j}\bigr|$$
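The estimator for a single parameter can be sketched as below. The function name tv_hist is hypothetical and the code is an illustration of the formulas above for two plain numeric vectors of draws, not the package's internal implementation.

```r
# Hypothetical helper: histogram plug-in TV between two vectors of scalar draws,
# using B = tv_bins equal-width bins on the pooled range.
tv_hist <- function(x_eb, x_fb, tv_bins = 30L) {
  rng <- range(c(x_eb, x_fb))
  # Degenerate or non-finite pooled range: no meaningful histogram.
  if (!all(is.finite(rng)) || diff(rng) <= 0) return(NA_real_)
  breaks <- seq(rng[1], rng[2], length.out = tv_bins + 1L)
  # hist() is used only for counting; no plot is drawn.
  h_eb <- graphics::hist(x_eb, breaks = breaks, plot = FALSE)$counts
  h_fb <- graphics::hist(x_fb, breaks = breaks, plot = FALSE)$counts
  p_eb <- h_eb / sum(h_eb)
  p_fb <- h_fb / sum(h_fb)
  0.5 * sum(abs(p_eb - p_fb))
}

set.seed(1)
x <- rnorm(2000)
tv_same  <- tv_hist(x, x)                    # identical draw sets
tv_apart <- tv_hist(x, x + 100)              # supports far apart
tv_const <- tv_hist(rep(1, 10), rep(1, 10))  # zero-width range
```

Identical draw sets give a distance of 0, draw sets with non-overlapping bins give 1 (up to floating-point rounding), and a zero-width pooled range yields NA_real_, matching the degenerate case described in the Notes.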

Returns

  • NULL if either draws_eb or draws_fb is NULL, or if no common column names exist.
  • A data.frame with one row per common parameter and columns:
Column Type Meaning
parameter character Parameter name (column name in the draws objects).
tv numeric Marginal TV distance ($\in [0,1]$), or NA_real_ if the range is degenerate or non-finite.
n_eb integer Number of EB draws for this parameter.
n_fb integer Number of FB draws for this parameter.
mean_eb numeric Posterior mean from EB draws.
mean_fb numeric Posterior mean from FB draws.

Notes

  • If the combined range rng contains non-finite values or has zero width (diff(rng) <= 0), tv is set to NA_real_ for that parameter.
  • intersect(colnames(draws_eb), colnames(draws_fb)) determines common parameters; order follows draws_eb column order.
  • Uses graphics::hist(..., plot = FALSE) solely for bin counting; no plot is produced.
  • The rows are assembled via do.call(rbind, rows) after a loop over common parameters.
  • Marked @keywords internal, @noRd, not exported.

.gdpar_eb_fb_coverage_table(eb_fit, fb_fit, level)

Purpose Builds a coverage diagnostic table comparing credible-interval widths between the EB and FB posteriors for each $\theta_{\mathrm{ref}}$ cell. The ratio $\mathrm{width}_{\mathrm{eb}} / \mathrm{width}_{\mathrm{fb}}$ operationally diagnoses the $O(n^{-1})$ under-coverage predicted by Proposition 7B (v07 Section 6), with extensions to Path A, Path B, and Path C (v07b Section 5).

Arguments

Argument Type Meaning
eb_fit list A fitted EB model object. Must contain theta_ref_hat, theta_ref_se, and for Path C: theta_ref_kp_hat, theta_ref_kp_se, K, p, slot_names. May contain correction_applied (logical), eb_correction_constant (scalar), correction_tensor_constant ($K \times p \times p$ array), and path (character).
fb_fit list A fitted FB model object, passed to .gdpar_eb_fb_extract_theta_ref_draws_fb.
level numeric scalar Nominal credible level $\ell \in (0,1)$ (e.g., 0.95).

Mathematics

The significance level and critical value are: $$\alpha = 1 - \ell, \qquad z = \Phi^{-1}\!\bigl(1 - \alpha/2\bigr)$$

Path C (eb_fit$path == "eb_KxP"): For each cell $(g, k, c)$ with $g=1,\dots,J$, $k=1,\dots,K$, $c=1,\dots,p$:

  • The EB standard error $\mathrm{se}_{g,k,c}^{\mathrm{eb}}$ is eb_fit$theta_ref_kp_se[g, k, c].
  • If the EB correction was applied and the correction tensor $\mathbf{T} \in \mathbb{R}^{K \times p \times p}$ is available with finite entries, the inflation factor is: $$\mathrm{inflate}_{k,c} = \sqrt{1 + \frac{\mathbf{T}[k,c,c]}{\max(1, J)}}$$ Otherwise $\mathrm{inflate}_{k,c} = 1$.
  • The EB credible-interval width is: $$w_{g,k,c}^{\mathrm{eb}} = 2z \cdot \mathrm{se}_{g,k,c}^{\mathrm{eb}} \cdot \mathrm{inflate}_{k,c}$$
  • The FB credible-interval width is: $$w_{g,k,c}^{\mathrm{fb}} = 2z \cdot \mathrm{sd}\bigl(\texttt{theta\_ref\_kp}^{(s)}[g,k,c]\bigr)$$ computed from the FB posterior draws (via .gdpar_eb_fb_extract_theta_ref_draws_fb), or NA_real_ if unavailable.
  • The width ratio is: $$R_{g,k,c} = \frac{w_{g,k,c}^{\mathrm{eb}}}{w_{g,k,c}^{\mathrm{fb}}}$$ set to NA_real_ if $w^{\mathrm{fb}}$ is non-finite or zero.

Path A / Path B (non-Path-C): For each cell $j=1,\dots,J_{\mathrm{flat}}$ (where $J_{\mathrm{flat}}$ is length(theta_ref_hat)):

  • The EB standard error $\mathrm{se}_j^{\mathrm{eb}}$ is eb_fit$theta_ref_se[j].
  • If eb_fit$correction_applied is TRUE, the scalar inflation factor is: $$\mathrm{inflate} = \sqrt{1 + \frac{c_{\mathrm{eb}}}{\max(1, J_{\mathrm{flat}})}}$$ where $c_{\mathrm{eb}}$ is eb_fit$eb_correction_constant (defaulting to 0 if NULL). Otherwise $\mathrm{inflate} = 1$.
  • EB width: $w_j^{\mathrm{eb}} = 2z \cdot \mathrm{se}_j^{\mathrm{eb}} \cdot \mathrm{inflate}$.
  • FB width: $w_j^{\mathrm{fb}} = 2z \cdot \mathrm{sd}_{j}^{\mathrm{fb}}$ from the flat FB draws, or NA_real_ if the FB draws are shorter or unavailable.
  • The width ratio is computed element-wise with ifelse, yielding NA_real_ when $w^{\mathrm{fb}}$ is non-finite or $\leq 0$.
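The non-Path-C computation can be sketched numerically. All inputs below (eb_se, fb_sd, c_eb) are hypothetical values chosen for illustration; the code follows the formulas above and is not copied from the package.

```r
level <- 0.95
alpha <- 1 - level
z <- stats::qnorm(1 - alpha / 2)     # two-sided normal critical value

# Hypothetical inputs for two anchor cells.
eb_se  <- c(0.10, 0.20)              # EB standard errors
fb_sd  <- c(0.12, NA)                # FB posterior SDs (second one unavailable)
c_eb   <- 1.5                        # assumed EB correction constant
J_flat <- length(eb_se)

# Scalar inflation factor applied when the EB correction is active.
inflate <- sqrt(1 + c_eb / max(1, J_flat))

eb_width <- 2 * z * eb_se * inflate
fb_width <- 2 * z * fb_sd

# Ratio is NA whenever the FB width is missing, non-finite, or not positive.
width_ratio <- ifelse(is.finite(fb_width) & fb_width > 0,
                      eb_width / fb_width, NA_real_)

data.frame(cell = seq_len(J_flat), eb_width, fb_width, width_ratio,
           inflation = inflate)
```

Because the factor $2z$ appears in both widths, the ratio for a cell reduces to $\mathrm{se}^{\mathrm{eb}} \cdot \mathrm{inflate} / \mathrm{sd}^{\mathrm{fb}}$; the level only changes the reported widths, not the ratio.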

Returns

  • The function always returns a data.frame; unlike the other helpers in this file, it never explicitly returns NULL.

  • Path C: A data.frame with $J \times K \times p$ rows and columns:

Column Type Meaning
group integer Group index $g$.
slot character Slot name from eb_fit$slot_names[k].
coord integer Coordinate index $c$.
eb_width numeric EB credible-interval width.
fb_width numeric FB credible-interval width (or NA).
width_ratio numeric eb_width / fb_width (or NA).
inflation numeric The correction inflation factor $\mathrm{inflate}_{k,c}$ (always $\geq 1$ when active).
  • Path A / Path B: A data.frame with $J_{\mathrm{flat}}$ rows and columns:
Column Type Meaning
cell integer Sequential cell index $j = 1, 2, \dots$.
eb_width numeric EB credible-interval width.
fb_width numeric FB credible-interval width (or NA).
width_ratio numeric eb_width / fb_width (or NA).
inflation numeric Scalar correction inflation factor.

Notes

  • Path C vs. non-Path-C dispatch is determined by identical(eb_fit$path, "eb_KxP").
  • The FB draws extraction (.gdpar_eb_fb_extract_theta_ref_draws_fb) is wrapped in tryCatch; on error fb_draws is set to NULL, and all FB widths will be NA_real_.
  • For Path C, the correction tensor entry tensor[k, c, c] is checked with all(is.finite(tensor[k, c, c])) before computing the inflation; if the tensor is NULL or the entry is non-finite, inflation defaults to 1.
  • For Path A/B, eb_fit$eb_correction_constant defaults to 0 via %||% when NULL.
  • The width_ratio is the key diagnostic. A ratio below 1 means the EB credible intervals are narrower than the FB intervals, which is the direction of the $O(n^{-1})$ under-coverage described in the referenced theoretical results; a ratio above 1 means the EB intervals are wider than the FB intervals, as can happen once the inflation correction has been applied.
  • Marked @keywords internal, @noRd, not exported.

R/compare_meta_learners_methods.R

print.gdpar_meta_learner_comparison(x, ...)

Purpose S3 print method for objects of class gdpar_meta_learner_comparison. Renders a concise human-readable summary of a meta-learner comparison: the bridge identifier, observation/method counts, the credible level, per-external-method metadata (native CI availability, elapsed time, note count, presence of a predict_fun), and a head view of the three concordance matrices (RMSE, Pearson, MAD).

Arguments

  • x: gdpar_meta_learner_comparison. The comparison object to display. Expected to contain components n_obs, n_methods, level, external (a named list of per-method adapter result lists, each with native_ci, time_sec, notes, has_predict_fun), and comparison (a list with rmse, pearson, mad matrices).
  • ...: any. Unused; present for S3 generic compatibility. Silently ignored.

Mathematics

None.

Returns

Invisibly returns x (via invisible(x)). The side effect is console output.

Notes

  • The method does not validate x; it assumes the structure is present. Missing components would propagate as errors from cat/print.
  • Per-method line format is fixed by sprintf: "- %-12s native_ci = %s time = %.3f s notes = %d predict = %s\n". native_ci and has_predict_fun are coerced to character by sprintf's %s (typically "TRUE"/"FALSE").
  • The three concordance matrices are printed with round(..., 4L); they are expected to be square numeric matrices with shared row/column names of length $m = 1 + n_{\text{methods}}$ (bridge plus externals).
  • S3 dispatch is triggered by print(x) when class(x) contains "gdpar_meta_learner_comparison".

summary.gdpar_meta_learner_comparison(object, ...)

Purpose S3 summary method for gdpar_meta_learner_comparison objects. Constructs a structured long-format summary containing: per-method ATE point estimates, per-method ATE CI bounds (averaged from per-observation native CIs when available, otherwise NA_real_), the three concordance matrices pivoted into long form, and per-method timing/CI-availability metadata.

Arguments

  • object: gdpar_meta_learner_comparison. The comparison object. Must contain bridge_cate (with cate_mean and optionally cate_ci), external (named list of adapter results, each with cate_mean, optionally cate_ci, time_sec, native_ci), comparison (with rmse, pearson, mad), level, n_obs, n_methods.
  • ...: any. Unused; present for S3 generic compatibility.

Mathematics

Per-method ATE is the sample mean of the per-observation CATE posterior means:

$$ \widehat{\mathrm{ATE}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} \widehat{\mathrm{CATE}}_{\text{method},i} $$

When native per-observation CIs are present, the ATE CI bounds are likewise the sample means of the per-observation lower and upper bounds:

$$ \widehat{\mathrm{ATE}}^{\text{lo}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} L_{\text{method},i}, \qquad \widehat{\mathrm{ATE}}^{\text{hi}}_{\text{method}} = \frac{1}{n}\sum_{i=1}^{n} U_{\text{method},i} $$

Otherwise both bounds are NA_real_.
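A minimal sketch of the per-method aggregation. The helper name ate_row is hypothetical, and the layout of cate_ci (one row per observation with columns named lower and upper) is an assumption made for illustration; the source does not specify it.

```r
# Hypothetical helper: ATE point estimate and averaged CI bounds for one method.
# Assumption: cate_ci, when present, is a matrix with columns "lower" and "upper".
ate_row <- function(cate_mean, cate_ci = NULL) {
  out <- c(ate = mean(cate_mean), ate_lower = NA_real_, ate_upper = NA_real_)
  if (!is.null(cate_ci)) {
    out["ate_lower"] <- mean(cate_ci[, "lower"])
    out["ate_upper"] <- mean(cate_ci[, "upper"])
  }
  out
}

# Hypothetical per-observation CATE summaries (n = 4).
cate_mean <- c(0.2, 0.4, 0.1, 0.3)
cate_ci   <- cbind(lower = cate_mean - 0.1, upper = cate_mean + 0.1)

with_ci    <- ate_row(cate_mean, cate_ci)
without_ci <- ate_row(cate_mean)
```

As in the documented behavior, the bounds are filled only when a cate_ci component is present and stay NA_real_ otherwise.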

Returns

A list of class c("summary.gdpar_meta_learner_comparison", "list") with components:

  • ate_table: data.frame with columns method (character: "bridge" followed by names(object$external)), ate (numeric), ate_lower (numeric, possibly NA), ate_upper (numeric, possibly NA).
  • metrics: long-format data.frame produced by .comparison_long(object$comparison) with columns method_i, method_j, rmse, pearson, mad (off-diagonal rows only).
  • timing: data.frame with columns method (external method names only — bridge excluded), time_sec (numeric), native_ci (logical).
  • level, n_obs, n_methods: copied verbatim from object.

Notes

  • Calls assert_inherits(object, "gdpar_meta_learner_comparison", "object"); raises an error (presumably of class gdpar_input_error per package convention) if the class is absent.
  • Bridge ATE CI bounds are populated only if object$bridge_cate$cate_ci is non-NULL; otherwise they remain NA_real_.
  • External ATE CI bounds are populated per-method only if e$cate_ci is non-NULL; the code does not consult e$native_ci here, only the presence of cate_ci.
  • ate_vec, ate_lower, ate_upper are named numeric vectors initialized with stats::setNames; the bridge slot is filled first, then external slots in iteration order.
  • The timing data frame excludes the bridge (no timing recorded for it).
  • S3 dispatch is triggered by summary(object) when class(object) contains "gdpar_meta_learner_comparison".

print.summary.gdpar_meta_learner_comparison(x, ...)

Purpose S3 print method for objects of class summary.gdpar_meta_learner_comparison. Prints the credible level, observation count, method count, the ATE table, the timing/CI-availability table, and the first 20 rows of the long-format pairwise concordance metrics.

Arguments

  • x: summary.gdpar_meta_learner_comparison. The summary object produced by summary.gdpar_meta_learner_comparison. Expected components: level, n_obs, n_methods, ate_table, timing, metrics.
  • ...: any. Unused; present for S3 generic compatibility.

Mathematics

None.

Returns

Invisibly returns x (via invisible(x)). The side effect is console output.

Notes

  • Does not validate x.
  • ate_table and timing are printed with row.names = FALSE.
  • metrics is truncated to its first 20 rows via utils::head(x$metrics, 20L); if nrow(x$metrics) > 20L, a sprintf line of the form " ... (%d more rows)\n" is emitted with the count of omitted rows.
  • S3 dispatch is triggered by print(x) when class(x) contains "summary.gdpar_meta_learner_comparison".

.comparison_long(comparison)

Purpose Internal helper that pivots the three square concordance matrices (rmse, pearson, mad) stored in a comparison object into a single long-format data.frame containing one row per ordered off-diagonal method pair $(i, j)$ with $i \neq j$.

Arguments

  • comparison: list. Must contain numeric matrix components rmse, pearson, mad with identical dimensions and shared rownames/colnames. Row names are read from rownames(comparison$rmse).

Mathematics

Given $m \times m$ concordance matrices $R$ (RMSE), $P$ (Pearson), $D$ (MAD) indexed by method names $\{n_1, \ldots, n_m\}$, the long form enumerates all ordered pairs $(i, j)$ with $i \neq j$:

$$ \text{row}_{(i,j)} = \bigl(\, n_i,\ n_j,\ R_{ij},\ P_{ij},\ D_{ij} \,\bigr) $$

Diagonal entries ($i = j$) are skipped via if (i == j) next. The total number of rows is $m(m-1)$.

Returns

A data.frame with columns method_i (character), method_j (character), rmse (numeric), pearson (numeric), mad (numeric), constructed by do.call(rbind, out_rows) over a list of single-row data frames. stringsAsFactors = FALSE is set on each constituent.

Notes

  • Marked @keywords internal and @noRd; not exported.
  • Iteration uses seq_along(nms) for both i and j, so the order is row-major over the upper and lower triangles combined (i.e., both $(i, j)$ and $(j, i)$ appear, but $(i, i)$ is excluded).
  • Assumes rownames(rmse) is non-NULL and that all three matrices share the same dimension and names; no consistency check is performed.
  • The list out_rows is grown by assignment at an incrementing index k. Diagonal pairs ($i = j$) are skipped before the assignment, and k is incremented only after an assignment, so no NULL slots are produced.
  • Returns NULL (from do.call(rbind, list())) if nms is empty.
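
The pivot described above can be reproduced with a short base-R sketch. The function name comparison_long_sketch and the toy matrices are hypothetical. Only the loop structure (skip i == j, assign at index k, then increment) and the do.call(rbind, ...) assembly follow the notes.

```r
# Toy 3 x 3 concordance matrices; the numbers are illustrative only.
nms <- c("bridge", "grf", "econml")
mk  <- function(v) matrix(v, 3, 3, dimnames = list(nms, nms))
comparison <- list(
  rmse    = mk(c(0, 0.2, 0.3, 0.2, 0, 0.1, 0.3, 0.1, 0)),
  pearson = mk(c(1, 0.9, 0.8, 0.9, 1, 0.95, 0.8, 0.95, 1)),
  mad     = mk(c(0, 0.15, 0.25, 0.15, 0, 0.05, 0.25, 0.05, 0))
)

comparison_long_sketch <- function(comparison) {
  nms <- rownames(comparison$rmse)
  out_rows <- list()
  k <- 1L
  for (i in seq_along(nms)) {
    for (j in seq_along(nms)) {
      if (i == j) next                      # diagonal entries are skipped
      out_rows[[k]] <- data.frame(
        method_i = nms[i],
        method_j = nms[j],
        rmse     = comparison$rmse[i, j],
        pearson  = comparison$pearson[i, j],
        mad      = comparison$mad[i, j],
        stringsAsFactors = FALSE
      )
      k <- k + 1L                           # incremented only after assignment
    }
  }
  do.call(rbind, out_rows)                  # NULL when out_rows is empty
}

long <- comparison_long_sketch(comparison)
```

With three methods the result has $3 \cdot 2 = 6$ rows, one per ordered off-diagonal pair.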

predict.gdpar_meta_learner_comparison(object, newdata, level = NULL, bridge = NULL, data = NULL, ...)

Purpose S3 predict method for gdpar_meta_learner_comparison objects. Re-evaluates the CATE on a new covariate grid newdata for the bridge component and for every external adapter. Adapters exposing a predict_fun reuse their cached fitted state without refitting; adapters without a usable predict_fun (or whose predict_fun errors) are flagged for refit, their cate_mean is filled with NA_real_, and a gdpar_diagnostic_warning is emitted. The bridge is re-evaluated via predict.gdpar_causal_bridge when real fits are present, otherwise falls back to cached cate_mean/cate_ci only when newdata matches the original observation count.

Arguments

  • object: gdpar_meta_learner_comparison. The comparison object. Must contain level, bridge (a gdpar_causal_bridge or NULL), and external (named list of adapter results, each possibly containing predict_fun, state, native_ci, notes).
  • newdata: data.frame. Required. The new evaluation grid. Must be a data frame (asserted by assert_data_frame).
  • level: numeric(1) or NULL. Optional credible level in $(10^{-3}, 1 - 10^{-3})$ overriding object$level. Defaults to NULL, which reuses object$level. Validated by assert_numeric_scalar(level, "level", lower = 1e-3, upper = 1 - 1e-3) when non-NULL.
  • bridge: gdpar_causal_bridge or NULL. Optional replacement bridge object used instead of object$bridge. Defaults to NULL (use cached bridge). Useful when the cached bridge was stripped (e.g., after a saveRDS round-trip that lost the two fits).
  • data: named list with components X, T, Y (and optionally X_newdata) or NULL. Reserved for the case of a forced re-fit. Defaults to NULL. Note: the current implementation does not consume data at all — it is accepted but never referenced in the body.
  • ...: any. Reserved for future arguments; currently unused.

Mathematics

Let $X_{\text{new}} \in \mathbb{R}^{n_{\text{new}} \times p}$ be the covariate matrix extracted from newdata. For each external method $m$ with a working predict_fun $f_m$:

$$ \widehat{\mathrm{CATE}}_{m}^{\text{new}} = f_m(\text{state}_m,\ X_{\text{new}},\ \text{level}) $$

The bridge prediction is

$$ \widehat{\mathrm{CATE}}_{\text{bridge}}^{\text{new}} = \mathrm{predict}(\text{bridge},\ \text{newdata},\ \text{level},\ \text{summary} = \text{"mean_ci"}) $$

when real fits are present. The concordance metrics are then recomputed over the vector of method-specific CATE means via .compute_comparison_metrics(cate_list).

Returns

A list of class c("predict.gdpar_meta_learner_comparison", "list") with components:

  • bridge: list with cate_mean and cate_ci (from bridge_pred).
  • external: named list mirroring object$external names; each entry is a list with cate_mean (numeric, possibly all NA_real_), cate_ci (matrix or NULL), method (character), native_ci (logical), time_sec (NA_real_), notes (character vector, augmented with a status message).
  • comparison: result of .compute_comparison_metrics(cate_list) — a list of concordance matrices.
  • newdata: the input newdata (stored verbatim).
  • level: the resolved numeric level.

Notes

  • Calls assert_inherits(object, "gdpar_meta_learner_comparison", "object") and assert_data_frame(newdata, "newdata") up front.
  • If bridge_obj (resolved from bridge or object$bridge) does not inherit from "gdpar_causal_bridge", the function aborts via gdpar_abort with class "gdpar_input_error" and a message instructing the user to pass a bridge via the bridge argument.
  • The outcome name is recovered via .bridge_outcome_name(bridge_obj$fits$treat, bridge_obj$fits$ctrl), and covariates are extracted from newdata via .extract_covariates(newdata, outcome_name) (presumably dropping the outcome column).
  • Bridge re-evaluation branches on has_real_fits:
    • If both bridge_obj$fits$treat$fit and bridge_obj$fits$ctrl$fit are non-NULL, calls stats::predict(bridge_obj, newdata = newdata, level = level, summary = "mean_ci").
    • Otherwise, falls back to cached bridge_obj$cate_mean / bridge_obj$cate_ci only if nrow(newdata) == bridge_obj$n_obs; otherwise cate_mean is rep(NA_real_, nrow(newdata)) and cate_ci is NULL.
  • For each external method:
    • If e$predict_fun is a function, it is invoked as pf(state = e$state, X_newdata = X_newdata, level = level) inside tryCatch. On success, cate_mean is coerced via as.numeric(out$cate_mean), cate_ci is taken as out$cate_ci, native_ci is e$native_ci && !is.null(out$cate_ci), and notes is augmented with "reused cached state via predict_fun". On error, the method is added to needs_refit, cate_mean is rep(NA_real_, nrow(newdata)), cate_ci is NULL, native_ci is FALSE, and notes is augmented with "predict_fun failed: <message>".
    • If no predict_fun, the method is added to needs_refit, cate_mean is rep(NA_real_, nrow(newdata)), cate_ci is NULL, native_ci is FALSE, and notes is augmented with "predict_fun unavailable; a full refit would be required".
  • If length(needs_refit) > 0L, a warning of class "gdpar_diagnostic_warning" is emitted via gdpar_warn with data = list(needs_refit = needs_refit) and a message listing the affected adapters, advising the user to rebuild the comparison with gdpar_compare_meta_learners().
  • time_sec is always set to NA_real_ for every external entry (no timing is recorded for prediction).
  • The data argument is declared and documented but not used in the body; no refit path is actually implemented despite the documentation mentioning fit_predict_fun. The function only reuses cached state or returns NA predictions.
  • cate_list is constructed as c(list(bridge = as.numeric(bridge_pred$cate_mean)), lapply(external, function(e) e$cate_mean)) and passed to .compute_comparison_metrics; the resulting matrices therefore have row/column names "bridge" followed by the external method names (subject to .compute_comparison_metrics's naming behavior).
  • S3 dispatch is triggered by predict(object, newdata = ...) when class(object) contains "gdpar_meta_learner_comparison".
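
The per-adapter branching can be sketched as below. This is an illustration under stated assumptions: the two adapter entries are made up, base warning() stands in for gdpar_warn, and the note strings are omitted. Only the predict_fun call signature and the fallback values follow the notes above.

```r
# Two hypothetical adapter results: one with a working predict_fun, one without.
external <- list(
  good = list(
    state = list(slope = 2),
    native_ci = TRUE,
    predict_fun = function(state, X_newdata, level) {
      list(cate_mean = state$slope * X_newdata$x, cate_ci = NULL)
    }
  ),
  bad = list(state = NULL, native_ci = FALSE, predict_fun = NULL)
)
X_newdata <- data.frame(x = c(0, 1, 2))
level <- 0.95

needs_refit <- character(0)
preds <- list()
for (nm in names(external)) {
  e <- external[[nm]]
  out <- NULL
  if (is.function(e$predict_fun)) {
    # Reuse cached state; any error is converted into a NULL result.
    out <- tryCatch(
      e$predict_fun(state = e$state, X_newdata = X_newdata, level = level),
      error = function(err) NULL
    )
  }
  if (is.null(out)) {
    # No usable prediction: flag for refit and fill with NA.
    needs_refit <- c(needs_refit, nm)
    preds[[nm]] <- list(cate_mean = rep(NA_real_, nrow(X_newdata)),
                        cate_ci = NULL, native_ci = FALSE)
  } else {
    preds[[nm]] <- list(cate_mean = as.numeric(out$cate_mean),
                        cate_ci = out$cate_ci,
                        native_ci = e$native_ci && !is.null(out$cate_ci))
  }
}
if (length(needs_refit) > 0L) {
  warning("adapters needing a full refit: ", paste(needs_refit, collapse = ", "))
}
```

Note that native_ci ends up FALSE for the "good" adapter here because its predict_fun returns no interval, which matches the documented conjunction e$native_ci && !is.null(out$cate_ci).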

R/compare_meta_learners.R

gdpar_compare_meta_learners(bridge, methods, newdata = NULL, data = NULL, seed = NULL, ...)

Purpose
Orchestrates a descriptive comparison of the T-learner (AMM-side) embedded in a gdpar_causal_bridge object against one or more external meta-learners (e.g., grf, EconML). It evaluates each method on a common evaluation grid, reports point/posterior CATE estimates and their native confidence intervals, and computes three concordance metrics (RMSE, Pearson correlation, mean absolute discrepancy) between every ordered pair of methods on their point/posterior CATE estimates. It does not perform hypothesis tests.

Arguments

  • bridge: Object of class gdpar_causal_bridge (from gdpar_causal_bridge()). Contains two fitted gdpar objects (treatment and control arms), precomputed CATE estimates, and metadata.
  • methods: Non-empty list of gdpar_meta_learner_adapter objects. Each adapter wraps a specific external meta-learner implementation (e.g., gdpar_adapter_grf()).
  • newdata: Optional data.frame on which to evaluate CATE. Defaults to the evaluation grid stored in bridge$newdata.
  • data: Optional list with components X (covariate data.frame), T (integer 0/1 treatment vector), Y (numeric outcome vector). Used to supply training data explicitly if it cannot be recovered from the bridge's stored calls (e.g., when the original data is not in the calling environment).
  • seed: Optional integer scalar. Propagated to each adapter's fit_predict_fun as seed_run for reproducibility.
  • ...: Reserved for future arguments; currently unused.

Mathematics
For every ordered pair of methods $(i, j)$ (including the bridge, indexed as "bridge"), computes the following concordance metrics on the point/posterior CATE estimates $\widehat{\tau}$:

$$ \mathrm{RMSE}_{ij} = \sqrt{\frac{1}{N}\sum_{n=1}^N \left( \widehat{\tau}_i(x_n) - \widehat{\tau}_j(x_n) \right)^2} $$

$$ \mathrm{Pearson}_{ij} = \mathrm{cor}\left( \widehat{\tau}_i, \widehat{\tau}_j \right) $$

$$ \mathrm{MAD}_{ij} = \frac{1}{N}\sum_{n=1}^N \left| \widehat{\tau}_i(x_n) - \widehat{\tau}_j(x_n) \right| $$

where $N$ is the number of evaluation points (n_obs). Confidence intervals are not pooled because their inferential origins (Bayesian posterior, asymptotic, bootstrap) are heterogeneous.

Returns
An object of class gdpar_meta_learner_comparison (a list) with components:

  • bridge_cate: List with cate_mean (numeric vector of bridge CATE point estimates) and cate_ci (matrix of bridge credible intervals).
  • bridge: The original gdpar_causal_bridge object.
  • external: Named list of results for each external adapter. Each element contains cate_mean, cate_ci, method, native_ci (logical), time_sec, notes, state (from the adapter), and the adapter's predict_fun/fit_predict_fun if provided.
  • comparison: List of three square concordance matrices (rmse, pearson, mad) between all method pairs.
  • newdata: The evaluation grid (data.frame) used.
  • level: The confidence level (numeric) used for intervals.
  • n_obs: Integer number of evaluation points.
  • n_methods: Integer total number of methods compared (bridge + external).
  • call: The matched call.
  • meta: List of metadata including package version, timestamp, seed, original bridge call, and adapter specifications.

Notes

  • Scalar-outcome restriction: Rejects bridges with dim_kind != "scalar" (i.e., distributional or multivariate regression) via .guard_scalar_outcome().
  • Method names: If the methods list is unnamed, names are taken from each adapter's $name field. Duplicate names cause an error.
  • Dataset recovery: If data is NULL, the function attempts to reconstruct the training data from the bridge's stored calls using .assemble_bridge_dataset(). If this fails, the user must supply data explicitly.
  • Adapter validation: Each adapter is checked for unmet software requirements (R packages, Python modules) via .check_adapter_requirements(). Missing dependencies cause a gdpar_missing_dependency_error.
  • Adapter output validation: Results from each adapter are checked with .validate_adapter_output() for correct length and structure.
  • Bridge CATE recomputation: If newdata differs from the bridge's original grid and lengths mismatch, the bridge CATE is re-predicted using stats::predict(bridge, ...).

.guard_scalar_outcome(bridge)

Purpose
Internal validation function ensuring the bridge was constructed from scalar-outcome fits (i.e., dim_kind == "scalar"). Rejects bridges from distributional regression (K > 1) or multivariate response (p > 1) with a specific error, as multi-output external adapters are not supported in the current scope (Sub-phase 8.5.B).

Arguments

  • bridge: Object of class gdpar_causal_bridge. Its $meta$dim_kind component is inspected.

Returns
invisible(NULL) if the bridge is scalar. Otherwise, raises a gdpar_unsupported_feature_error.

Notes

  • Uses the null-coalescing operator %||% to default "scalar" if dim_kind is missing.
  • The error message references "Sub-phase 8.5.B" and queues multi-output support for "Block 9" per the package roadmap.
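
A minimal sketch of the guard, assuming nothing beyond what is described above: the function name guard_scalar_sketch is hypothetical, the operator %||% is defined locally so the example does not depend on any package, and a base condition object stands in for the package's own error constructor.

```r
# Local null-coalescing operator: return y when x is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

guard_scalar_sketch <- function(bridge) {
  # A missing dim_kind is treated as "scalar".
  dim_kind <- bridge$meta$dim_kind %||% "scalar"
  if (!identical(dim_kind, "scalar")) {
    stop(structure(
      list(message = sprintf("dim_kind = '%s' is not supported", dim_kind),
           call = NULL),
      class = c("gdpar_unsupported_feature_error", "error", "condition")
    ))
  }
  invisible(NULL)
}

guard_scalar_sketch(list(meta = list()))   # passes: dim_kind defaults to "scalar"
```

Signalling a classed condition lets callers handle the unsupported-feature case with tryCatch without parsing the message text.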

.assemble_bridge_dataset(bridge, newdata, data, eval_env)

Purpose
Internal helper that constructs the unified training dataset (X, T, Y) required by external meta-learner adapters. It either uses an explicitly provided data argument or attempts to recover the training data from the bridge's stored fits by evaluating their captured call objects in the specified environment.

Arguments

  • bridge: gdpar_causal_bridge object.
  • newdata: data.frame of evaluation covariates (the CATE grid).
  • data: Optional list with components X, T, Y. If supplied, it is used directly.
  • eval_env: Environment in which to evaluate the bridge's stored calls (typically parent.frame() of the caller).

Mathematics
When data is NULL, the algorithm for each arm (treatment, control) is:

  1. Recover the arm's training dataset via eval(fit$call$data, eval_env).
  2. Identify the outcome variable name from the LHS of fit$call$formula.
  3. Extract the covariate matrix X_arm (all columns except the outcome) and outcome vector Y_arm.
  4. Create treatment indicator T_arm = 1L (treatment) or 0L (control).
  5. Stack the two arms row-wise to form (X, T, Y).

Returns
A list with components:

  • X: data.frame of stacked covariates (rows = training observations from both arms).
  • T: Integer vector of treatment indicators (0/1).
  • Y: Numeric vector of outcomes.
  • X_newdata: The supplied newdata (unchanged).
  • outcome_name: Character string of the outcome variable name.

Notes

  • If evaluation of a fit's call$data fails (e.g., the object is not in eval_env), the function aborts with a gdpar_input_error advising the user to pass data explicitly.
  • The helper ensures the covariate column order and types are consistent between training and evaluation data.
  • The function is responsible for ensuring that the stacked dataset aligns with the external adapter's expectations (i.e., X is a data frame, T is integer 0/1, Y is numeric).

.assemble_bridge_dataset(bridge, newdata, data, eval_env)

Purpose Assembles the unified training dataset and newdata covariate matrix required by the meta-learner comparison machinery. It either accepts an explicitly supplied data argument (a list with X, T, Y, and optionally X_newdata) or attempts to recover the original training data from the captured call objects inside the treatment and control fits of a bridge object. In both paths it validates consistency, combines arms, and returns a standardized list.

Arguments

Argument Type Meaning
bridge list A bridge object with component fits$treat and fits$ctrl, each a fitted model object that carries a $call element.
newdata data.frame (or coercible) New covariate data for which predictions will be compared. Must contain every covariate column present in the training data.
data list or NULL Optional explicit data. If non-NULL it must be a named list with components X (covariate matrix/data.frame), T (integer treatment indicator, values 0/1), Y (numeric outcome), and optionally X_newdata (covariate data.frame for newdata; if absent, covariates are extracted from newdata).
eval_env environment The environment in which fit call expressions (e.g. cl$data) are evaluated when recovering training data from the fitted objects.

Mathematics

No formula is implemented. The function performs data assembly only.

Training data from two arms are row-bound with treatment indicators prepended:

$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}^{(t)} \\ \mathbf{Y}^{(c)} \end{bmatrix}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{1}_{n_t} \\ \mathbf{0}_{n_c} \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} \mathbf{X}^{(t)} \\ \mathbf{X}^{(c)} \end{bmatrix}$$

where $n_t$ and $n_c$ are the row counts of the treatment and control recovered data frames respectively.

Returns

A named list with five components:

Component Type Description
X data.frame Training covariates (outcome column removed). Row count equals $n_t + n_c$.
T integer vector Treatment indicator, length $n_t + n_c$, values 1L (treatment arm first) then 0L (control arm).
Y numeric vector Outcome values, treatment arm first then control arm, length $n_t + n_c$.
X_newdata data.frame Covariates for newdata, column subset/reordered to match X. Row count equals nrow(newdata).
outcome_name character scalar The name of the outcome variable, inferred by .bridge_outcome_name().

Notes

  • Explicit data path (data is non-NULL):

    • data must be a list with named components X, T, Y. If any is missing, a gdpar_input_error is raised.
    • X is coerced to data.frame if it is not one already (with stringsAsFactors = FALSE).
    • T is coerced to integer; Y to numeric.
    • Lengths of T, Y, and nrow(X) must agree; otherwise a gdpar_input_error is raised with a diagnostic sprintf message.
    • T must contain only 0L and 1L; otherwise a gdpar_input_error is raised.
    • If data$X_newdata is present it is used (coerced to data.frame if needed); otherwise covariates are extracted from newdata by calling .extract_covariates(newdata, outcome_name).
    • The function returns immediately via return(...) without any data-recovery attempt.
  • Recovery path (data is NULL):

    • The internal function recover(fit) evaluates fit$call$data in eval_env. If the call is NULL, the data component is NULL, or evaluation throws an error, NULL is returned.
    • If either recovered data is NULL or not a data.frame, a gdpar_input_error is raised with extra data fields treat_recovered and ctrl_recovered.
    • The outcome variable name (from .bridge_outcome_name()) must appear as a column in both recovered data frames; otherwise a gdpar_input_error is raised.
    • The column sets of the two recovered data frames must be identical (after sorting); otherwise a gdpar_input_error is raised listing both column sets.
    • Both data frames are subset to their common columns, then row-bound. The outcome column is removed from X via .extract_covariates().
    • newdata covariates are extracted and checked for missing columns present in X; if any are missing a gdpar_input_error is raised listing them. X_newdata is then reordered/subset to match X's columns exactly.
  • All errors are raised via gdpar_abort() with appropriate condition classes (gdpar_input_error, gdpar_unsupported_feature_error).
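
The recovery path can be illustrated with plain data frames. This sketch skips the eval(fit$call$data, eval_env) step and starts from two already-recovered arms; all data values and the variable names treat_df, ctrl_df, and trt are hypothetical, and base stop() stands in for gdpar_abort().

```r
# Hypothetical recovered arms: same column set, different column order.
treat_df <- data.frame(y = c(2.1, 3.4), x1 = c(0.5, 1.0), x2 = c(1, 0))
ctrl_df  <- data.frame(x2 = c(0, 1, 1), y = c(1.0, 1.2, 0.8), x1 = c(0.2, 0.7, 0.9))
newdata  <- data.frame(x2 = c(1, 0), x1 = c(0.3, 0.6), extra = c(9, 9))
outcome_name <- "y"

# Column sets must be identical after sorting.
if (!identical(sort(names(treat_df)), sort(names(ctrl_df)))) {
  stop("treatment and control data have different columns")
}

# Align columns, then stack treatment rows first and control rows second.
cols    <- names(treat_df)
stacked <- rbind(treat_df[, cols, drop = FALSE], ctrl_df[, cols, drop = FALSE])

covars <- setdiff(cols, outcome_name)
X   <- stacked[, covars, drop = FALSE]
Y   <- as.numeric(stacked[[outcome_name]])
trt <- c(rep(1L, nrow(treat_df)), rep(0L, nrow(ctrl_df)))

# newdata must carry every training covariate; extra columns are dropped.
missing_cols <- setdiff(covars, names(newdata))
if (length(missing_cols) > 0L) {
  stop("newdata is missing: ", paste(missing_cols, collapse = ", "))
}
X_newdata <- newdata[, covars, drop = FALSE]

assembled <- list(X = X, T = trt, Y = Y,
                  X_newdata = X_newdata, outcome_name = outcome_name)
```

The treatment vector is held in trt rather than T inside the script because T is also the base-R shorthand for TRUE; it is only named T in the returned list.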


.bridge_outcome_name(fit_t, fit_c)

Purpose Infers the name of the outcome (response) variable from the captured formula calls of the treatment and control fits within a bridge object. This is used internally by .assemble_bridge_dataset() and other comparison functions.

Arguments

Argument Type Meaning
fit_t fitted model object Treatment-arm fit. Must carry a $call element, ideally with a $formula component.
fit_c fitted model object Control-arm fit. Same expectations as fit_t.

Mathematics

No formula. The function performs formula introspection.

Returns

A single character string giving the outcome variable name. If both fits resolve to the same name, that name is returned. If only one resolves, the resolved name is returned. If neither resolves or they disagree, an error is raised (see Notes).

Notes

  • The internal helper pick(fit) attempts three strategies in order to extract the LHS of a two-sided formula from fit$call$formula:

    1. Evaluate cl$formula in the environment of fit$call. If the result is a formula of length 3 (two-sided), extract as.character(fm[[2L]]).
    2. If cl$formula itself is a call or name, attempt to evaluate it with bare eval(). If the result is a two-sided formula, extract the LHS.
    3. If cl$formula is a call of length 3 whose first element is ~, directly extract as.character(cl$formula[[2L]]).
    4. Fallback: if all three strategies fail, NA_character_ is returned.
  • If both n_t and n_c are NA, a gdpar_input_error is raised advising the user to pass an explicit data argument.

  • If both are non-NA but differ (!identical(n_t, n_c)), a gdpar_unsupported_feature_error is raised listing both names. This means the two fits must model the same outcome variable.

  • If exactly one is NA, the non-NA value is returned (i.e., n_c when n_t is NA, otherwise n_t).
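
The formula introspection and the reconciliation rules can be sketched as follows. The helper names lhs_name and resolve_outcome are hypothetical, base stop() replaces the package's classed errors, and the sketch only handles a left-hand side that is a single variable name (for a transformed response such as log(y), as.character() would return more than one element).

```r
# Extract the LHS of a two-sided formula, given either a formula object
# or an unevaluated call such as quote(y ~ x).
lhs_name <- function(fm) {
  two_sided <- is.call(fm) && length(fm) == 3L &&
    identical(fm[[1L]], as.name("~"))
  if (two_sided) as.character(fm[[2L]]) else NA_character_
}

# Reconcile the names recovered from the treatment and control fits.
resolve_outcome <- function(n_t, n_c) {
  if (is.na(n_t) && is.na(n_c)) {
    stop("could not infer the outcome name; pass `data` explicitly")
  }
  if (!is.na(n_t) && !is.na(n_c) && !identical(n_t, n_c)) {
    stop(sprintf("fits disagree on the outcome: '%s' vs '%s'", n_t, n_c))
  }
  if (is.na(n_t)) n_c else n_t
}

resolve_outcome(lhs_name(y ~ x1 + x2), lhs_name(quote(y ~ x1)))
```

A one-sided formula such as ~ x has length 2, so lhs_name returns NA_character_ for it and the other arm's name is used instead.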


.extract_covariates(df, outcome_name)

Purpose Removes the outcome column from a data frame (or object coercible to one) and returns only the covariate columns. Used throughout the comparison pipeline to separate predictors from response.

Arguments

Argument Type Meaning
df data.frame or coercible A data frame whose columns include covariates and the outcome.
outcome_name character scalar The name of the outcome column to drop.

Mathematics

None.

Returns

A data.frame containing all columns of df except the one named outcome_name. If outcome_name is not present in colnames(df), all columns are returned (since setdiff returns the full set). The drop = FALSE argument ensures the result is always a data frame even if a single column remains.

Notes

  • If df is not already a data.frame, it is coerced via as.data.frame(df, stringsAsFactors = FALSE).
  • This is a utility function; no errors are raised by it directly.
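
A base-R sketch of the behaviour described above; the function name extract_covariates_sketch is hypothetical.

```r
extract_covariates_sketch <- function(df, outcome_name) {
  if (!is.data.frame(df)) {
    df <- as.data.frame(df, stringsAsFactors = FALSE)
  }
  # setdiff() returns every column name when outcome_name is absent;
  # drop = FALSE keeps a data.frame even if one column remains.
  df[, setdiff(colnames(df), outcome_name), drop = FALSE]
}

d <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))
extract_covariates_sketch(d, "y")
```

Without drop = FALSE the single remaining column x would be returned as a bare numeric vector, which would break downstream code that expects named columns.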

.validate_adapter_output(result, n_newdata, adapter_name)

Purpose Validates that the return value of a meta-learner adapter conforms to the expected shape and types. Called internally after each adapter invocation to enforce the adapter interface contract.

Arguments

Argument Type Meaning
result list The object returned by an adapter. Must contain at minimum a cate_mean component. May optionally contain cate_ci.
n_newdata integer scalar The expected number of rows (observations) in the newdata, i.e., the required length of cate_mean and row count of cate_ci.
adapter_name character scalar Human-readable name of the adapter, used in error messages.

Mathematics

None.

Returns

Invisibly returns NULL (invisible(NULL)). Side-effect–only function: raises errors if validation fails.

Notes

  • First check: result must be a list and must have an element named "cate_mean". If not, a gdpar_internal_error is raised.
  • Second check: result$cate_mean must be a numeric vector of length exactly n_newdata. If not, a gdpar_internal_error is raised with a diagnostic sprintf.
  • Third check (conditional): If result$cate_ci is non-NULL, it must be a matrix with nrow == n_newdata and ncol == 2L (lower and upper bounds). If not, a gdpar_internal_error is raised reporting the actual dimensions.
  • All errors use class "gdpar_internal_error", indicating a programming error in the adapter rather than user input.
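
The three checks can be sketched as below. The function name validate_adapter_output_sketch and the message wording are hypothetical, and a base condition object of class "gdpar_internal_error" stands in for the package's error helper.

```r
validate_adapter_output_sketch <- function(result, n_newdata, adapter_name) {
  fail <- function(msg) {
    stop(structure(
      list(message = msg, call = NULL),
      class = c("gdpar_internal_error", "error", "condition")
    ))
  }
  # Check 1: a list carrying a cate_mean element.
  if (!is.list(result) || !("cate_mean" %in% names(result))) {
    fail(sprintf("adapter '%s' did not return a list with 'cate_mean'", adapter_name))
  }
  # Check 2: numeric cate_mean of the expected length.
  if (!is.numeric(result$cate_mean) || length(result$cate_mean) != n_newdata) {
    fail(sprintf("adapter '%s': cate_mean must be numeric of length %d",
                 adapter_name, n_newdata))
  }
  # Check 3 (only when cate_ci is present): an n_newdata x 2 matrix.
  ci <- result$cate_ci
  if (!is.null(ci) && !(is.matrix(ci) && nrow(ci) == n_newdata && ncol(ci) == 2L)) {
    fail(sprintf("adapter '%s': cate_ci has dimensions [%s], expected [%d x 2]",
                 adapter_name, paste(dim(ci), collapse = " x "), n_newdata))
  }
  invisible(NULL)
}

ok <- list(cate_mean = c(0.1, 0.2), cate_ci = cbind(c(0, 0.1), c(0.2, 0.3)))
validate_adapter_output_sketch(ok, 2L, "demo")
```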

.compute_comparison_metrics(cate_list)

Purpose Computes three pairwise concordance/similarity matrices across a list of CATE (Conditional Average Treatment Effect) estimate vectors. These matrices quantify the agreement between different meta-learner methods on the same newdata.

Arguments

Argument Type Meaning
cate_list list of numeric vectors Each element is a numeric vector of CATE predictions for the same newdata observations. All vectors must have the same length. List element names (if present) are used as row/column labels; otherwise names m1, m2, … are generated.

Mathematics

Let $m = \texttt{length(cate\_list)}$ and let $\hat{\tau}_i \in \mathbb{R}^n$ denote the $i$-th CATE vector ($i = 1, \ldots, m$). Three $m \times m$ matrices are computed:

Root Mean Squared Error (RMSE):

$$\text{RMSE}_{ij} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left(\hat{\tau}_{i,k} - \hat{\tau}_{j,k}\right)^2}, \quad i \neq j$$

Diagonal: $\text{RMSE}_{ii} = 0$.

Mean Absolute Deviation (MAD):

$$\text{MAD}_{ij} = \frac{1}{n} \sum_{k=1}^{n} \left|\hat{\tau}_{i,k} - \hat{\tau}_{j,k}\right|, \quad i \neq j$$

Diagonal: $\text{MAD}_{ii} = 0$.

Pearson Correlation:

$$\text{Pearson}_{ij} = \text{Cor}\!\left(\hat{\tau}_i, \hat{\tau}_j\right), \quad i \neq j$$

Diagonal: $\text{Pearson}_{ii} = 1$. Note that correlation is computed only for $i < j$ and then copied symmetrically: $\text{Pearson}_{ji} = \text{Pearson}_{ij}$.

Returns

A named list with three components:

Component Type Description
rmse matrix ($m \times m$) Pairwise RMSE. Diagonal is 0. Symmetric. Dimnames are the method names.
pearson matrix ($m \times m$) Pairwise Pearson correlation. Diagonal is 1. Symmetric. Dimnames are the method names.
mad matrix ($m \times m$) Pairwise MAD. Diagonal is 0. Symmetric. Dimnames are the method names.

Notes

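  • All CATE vectors in cate_list are column-bound into a single matrix M via do.call(cbind, cate_list). This requires all vectors to have the same length, but no explicit check is performed. With unequal lengths, cbind recycles the shorter vectors rather than raising an error, and warns only when the longest length is not a multiple of a shorter one.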
  • If cate_list has no names, synthetic names "m1", "m2", … are assigned and propagated to dimnames.
  • The double loop iterates over all $(i, j)$ pairs with $i \neq j$. For RMSE and MAD, each off-diagonal entry is written once. For Pearson, the loop only computes the correlation when $i < j$ (using stats::cor()) and mirrors the value to $[j, i]$. This avoids redundant correlation calls.
  • stats::cor() is wrapped in suppressWarnings() to silence warnings about constant vectors (which yield NA correlations).
  • All three matrices are symmetric. For RMSE and MAD the loop skips only i == j, so it visits both orderings and fills rmse[i, j] and rmse[j, i] (likewise for mad) from separate computations; because squared and absolute differences are unchanged when $i$ and $j$ are swapped, the two cells hold the same value. The Pearson matrix is symmetric by construction because only $i < j$ is computed and mirrored.
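
The computation can be sketched in base R as follows. The function name compute_metrics_sketch and the toy CATE vectors are hypothetical. The loop structure, the suppressWarnings() wrapper around stats::cor(), and the synthetic "m1", "m2", … names follow the notes above.

```r
compute_metrics_sketch <- function(cate_list) {
  if (is.null(names(cate_list))) {
    names(cate_list) <- paste0("m", seq_along(cate_list))
  }
  M   <- do.call(cbind, cate_list)   # one column per method
  m   <- ncol(M)
  nms <- colnames(M)
  rmse    <- matrix(0, m, m, dimnames = list(nms, nms))
  mad     <- matrix(0, m, m, dimnames = list(nms, nms))
  pearson <- diag(1, m)
  dimnames(pearson) <- list(nms, nms)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      if (i == j) next
      d <- M[, i] - M[, j]
      rmse[i, j] <- sqrt(mean(d^2))
      mad[i, j]  <- mean(abs(d))
      if (i < j) {
        # Constant vectors give NA with a warning, which is silenced here.
        r <- suppressWarnings(stats::cor(M[, i], M[, j]))
        pearson[i, j] <- r
        pearson[j, i] <- r
      }
    }
  }
  list(rmse = rmse, pearson = pearson, mad = mad)
}

cate_list <- list(
  bridge = c(1, 2, 3, 4),
  grf    = c(1.5, 2.5, 3.5, 4.5),
  flat   = c(2, 2, 2, 2)          # constant vector: Pearson is NA
)
res <- compute_metrics_sketch(cate_list)
```

The "flat" entry shows why the correlation call is wrapped: a constant CATE vector has zero variance, so its correlation with any other method is NA while RMSE and MAD remain well defined.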

R/contraction_diagnostic.R

gdpar_contraction_diagnostic(fit, data, sizes = NULL, replicates = 1L, parameters = NULL, level = 0.95, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose

Empirical posterior contraction-rate diagnostic for a fitted Path 1 (Bayesian) gdpar model. It is an opt-in, computationally expensive methodological audit tool that refits the model at multiple subsample sizes, records the median posterior credible-interval width across user-facing parameters at each size, and fits an ordinary-least-squares regression of log-width on log-sample-size. The estimated slope is compared against the theoretical parametric contraction rate $n^{-1/2}$ predicted by Theorem 4B of Block 4. The function does not modify the original fit; it returns a standalone report.

Arguments

Argument Type Meaning
fit gdpar_fit A fitted model object produced by gdpar with path = "bayes". Must inherit from class "gdpar_fit". The original fit$call is extracted and modified to produce subsampled refits.
data data frame The data frame originally passed to gdpar, or another data frame compatible with the AMM specification of fit. Its row count $n$ governs subsample-size validation and sampling.
sizes NULL or numeric vector Subsample sizes at which to refit. If NULL (default), a length-five geometric sequence is generated between $\max(20, \lceil n/8 \rceil)$ and $n$. Entries must lie in $[5, n]$.
replicates integer scalar (count) Number of independent subsamples drawn per size. Defaults to 1L. Higher values reduce Monte Carlo variance of the log-width curve at additional computational cost. Must be a non-negative integer (validated by assert_count).
parameters NULL or character vector Optional explicit list of posterior variable names to include in the credible-width calculation. If NULL (default), the function auto-selects user-facing parameters by filtering out variables matching the internal ignore pattern.
level numeric scalar in $(0, 1)$ Nominal credible level for interval-width computation. Defaults to 0.95. The interval is formed from the $\alpha/2$ and $1 - \alpha/2$ quantiles where $\alpha = 1 - \text{level}$.
iter_warmup integer scalar (count) Warmup iterations for each refit. Defaults to 500L. Forwarded to gdpar via the modified call.
iter_sampling integer scalar (count) Sampling iterations for each refit. Defaults to 500L. Forwarded to gdpar via the modified call.
chains integer scalar (count) Number of MCMC chains per refit. Defaults to 2L. Forwarded to gdpar via the modified call.
verbose logical scalar (length 1) If TRUE, prints a cost message via gdpar_inform before starting the refits. Defaults to TRUE. Must be a single logical value.
... any Additional arguments forwarded to gdpar through the modified refit call.

Mathematics

The diagnostic fits the linear regression

$$ \log(\text{width}_i) = \alpha + \beta \,\log(n_i) + \varepsilon_i $$

where $\text{width}_i$ is the median credible-interval width across the selected parameters at subsample size $n_i$. For each parameter $\theta_j$ and each refit, the credible-interval width is

$$ \text{width}_j = q_{1-\alpha/2}(\theta_j) - q_{\alpha/2}(\theta_j), $$

with $\alpha = 1 - \text{level}$, and the cell-level summary is

$$ \text{median\_width} = \operatorname{median}_j(\text{width}_j). $$
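
The per-cell width summary can be illustrated with base R on a plain matrix of draws. This is a sketch only: the simulated draws and the parameter names alpha, beta, sigma are hypothetical, and apply() with stats::quantile is used here in place of the posterior-package calls that the package itself relies on.

```r
set.seed(1)
# Hypothetical posterior draws: 1000 draws for each of 3 parameters.
draws <- cbind(
  alpha = rnorm(1000, mean = 0, sd = 1),
  beta  = rnorm(1000, mean = 0, sd = 0.5),
  sigma = rnorm(1000, mean = 0, sd = 2)
)

level <- 0.95
a <- 1 - level

# Per-parameter quantiles at a/2 and 1 - a/2.
q_lower <- apply(draws, 2, stats::quantile, probs = a / 2, names = FALSE)
q_upper <- apply(draws, 2, stats::quantile, probs = 1 - a / 2, names = FALSE)

widths       <- q_upper - q_lower        # width_j for each parameter
median_width <- stats::median(widths)    # the cell-level summary
```

Taking the median across parameters keeps one very wide interval (here, sigma) from dominating the summary for the cell.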

The slope $\hat\beta$ and its standard error $\operatorname{SE}(\hat\beta)$ are extracted from stats::lm. An approximate 95% confidence interval for $\beta$ is

$$ [\hat\beta - 1.96\,\operatorname{SE}(\hat\beta),\;\; \hat\beta + 1.96\,\operatorname{SE}(\hat\beta)]. $$

The verdict logic compares this interval against the tolerance band $[-0.6,\,-0.4]$ around the theoretical slope $-1/2$:

$$ \text{verdict} = \begin{cases} \text{Consistent with parametric } n^{-1/2} \text{ rate.} & \text{if } \hat\beta_{\text{upper}} \ge -0.6 \;\text{and}\; \hat\beta_{\text{lower}} \le -0.4, \\[4pt] \text{Faster than } n^{-1/2}\text{; check for spurious artefacts.} & \text{if } \hat\beta_{\text{upper}} < -0.6, \\[4pt] \text{Slower than } n^{-1/2}\text{; check for prior misspecification or model misspecification.} & \text{otherwise.} \end{cases} $$

The first branch tests whether the 95% CI overlaps the band $[-0.6, -0.4]$. The second fires when the entire CI lies below $-0.6$, and the remaining case is a CI lying entirely above $-0.4$.
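
The slope fit and the three-way verdict can be sketched as follows. The function name contraction_verdict, the short verdict labels, and the synthetic width curves are hypothetical; the regression, the 1.96 multiplier, and the thresholds follow the formulas above. Failed refits (NA widths) are simply dropped before fitting, which is an assumption of this sketch.

```r
contraction_verdict <- function(n, median_width) {
  ok  <- is.finite(median_width) & median_width > 0
  dat <- data.frame(log_w = log(median_width[ok]), log_n = log(n[ok]))
  fit <- stats::lm(log_w ~ log_n, data = dat)
  co  <- summary(fit)$coefficients
  est <- unname(co["log_n", "Estimate"])
  se  <- unname(co["log_n", "Std. Error"])
  lower <- est - 1.96 * se
  upper <- est + 1.96 * se
  verdict <- if (upper >= -0.6 && lower <= -0.4) {
    "consistent"                       # CI overlaps [-0.6, -0.4]
  } else if (upper < -0.6) {
    "faster"                           # CI entirely below -0.6
  } else {
    "slower"                           # CI entirely above -0.4
  }
  list(slope = est, se = se, lower = lower, upper = upper, verdict = verdict)
}

# Synthetic widths shrinking like n^(-1/2), with a little multiplicative noise.
n     <- c(50, 100, 200, 400, 800)
noise <- exp(c(0.01, -0.01, 0.02, -0.02, 0.01))
res   <- contraction_verdict(n, 3 / sqrt(n) * noise)
```

Because the regression is on the log-log scale, the constant factor in front of the width curve only shifts the intercept and has no effect on the verdict.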

The default subsample sizes, when sizes = NULL, are generated as

$$ \text{sizes} = \operatorname{unique}\!\left(\operatorname{round}\!\left(\exp\!\left(\operatorname{seq}\!\left(\log\!\left(\max(20,\,\lceil n/8\rceil)\right),\;\log n,\;\text{length.out}=5\right)\right)\right)\right). $$
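
The default grid translates directly into base R. The wrapper name default_sizes is hypothetical; the body is a term-by-term transcription of the expression above.

```r
default_sizes <- function(n) {
  lo <- max(20, ceiling(n / 8))
  # Five points equally spaced on the log scale between lo and n,
  # rounded to whole sample sizes and de-duplicated.
  unique(round(exp(seq(log(lo), log(n), length.out = 5))))
}

default_sizes(800)
```

For small data sets the lower bound of 20 can coincide with $n$, in which case unique() collapses the grid to fewer than five sizes.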

Returns

A list of class c("gdpar_contraction_report", "list") with components:

Component Type Description
table data frame Columns n (subsample size), replicate (replicate index), median_width (median credible-interval width, NA_real_ if the refit failed). One row per (size, replicate) cell.
slope_estimate numeric scalar OLS slope $\hat\beta$ from lm(log_w ~ log_n), with names stripped via unname.
slope_se numeric scalar Standard error of $\hat\beta$, with names stripped.
slope_ci_lower numeric scalar Lower bound $\hat\beta - 1.96\,\operatorname{SE}(\hat\beta)$.
slope_ci_upper numeric scalar Upper bound $\hat\beta + 1.96\,\operatorname{SE}(\hat\beta)$.
verdict character One of three verdict strings (see Mathematics).
level numeric scalar The credible level used (echoed from the level argument).
warnings character vector Per-refit failure messages; empty if all refits succeeded.

Notes

  • Input validation. Calls assert_inherits(fit, "gdpar_fit", ...), assert_data_frame(data, ...), assert_count(replicates, ...), assert_numeric_scalar(level, ..., lower = 0, upper = 1), assert_count(iter_warmup, ...), assert_count(iter_sampling, ...), assert_count(chains, ...). The verbose argument is checked inline: if not a length-1 logical, gdpar_abort is called with class "gdpar_input_error". The sizes argument, when non-NULL, is validated inline: if not numeric, or if any entry is $< 5$ or $> n$, gdpar_abort is called with class "gdpar_input_error" and a message formatted via sprintf.
  • Suggested-package dependencies. Calls require_suggested("cmdstanr", ...) and require_suggested("posterior", ...). If either is unavailable, an error is raised by that helper.
  • Cost message. When verbose = TRUE, emits a gdpar_inform message of class "gdpar_optin_message" stating the total number of refits (length(sizes) * replicates).
  • Refit call construction. The original fit$call is copied and modified: data is set to quote(sub); iter_warmup, iter_sampling, chains are overwritten from the corresponding arguments; verbose is set to FALSE; refresh is set to 0L; skip_id_check is set to TRUE. The modified call is evaluated in a freshly created environment (new.env(parent = parent.frame())) in which the symbol sub is bound to the subsampled data frame. The local variable call_data_arg_name <- "data" is assigned but never used.
  • Subsampling. For each (size, replicate) cell, sample.int(n, size = sz) draws a simple random sample without replacement. Despite the documentation mentioning "stratified by row order," the code performs uniform random sampling with no stratification.
  • Refit failure handling. Each refit is wrapped in tryCatch. On error, refit_failure_msg is populated (via <<- inside the error handler) with a formatted message including the size, replicate, and conditionMessage(e). A gdpar_warn of class "gdpar_diagnostic_warning" is emitted, the message is appended to warnings_msg, and a row with median_width = NA_real_ is recorded. The loop then continues to the next cell.
  • Variable selection. Posterior variables are retrieved via posterior::variables(draws). The ignore pattern "^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw)" is applied via grepl to exclude internal/auxiliary variables. If parameters is NULL, the filtered set (candidate_vars) is used; otherwise parameters is used directly without validation against available variables.
  • Width computation. posterior::summarise_draws is called on posterior::subset_draws(draws, variable = use_vars) with two custom summary functions q_lower and q_upper that wrap stats::quantile at probabilities $\alpha/2$ and $1 - \alpha/2$ respectively (with names = FALSE). Widths are computed as the element-wise difference q_upper - q_lower, and the cell's median_width is stats::median(widths).
  • Minimum data requirement. After removing NA rows, if fewer than 3 successful refits remain, gdpar_abort is called with class "gdpar_diagnostic_error" and message "Not enough successful refits to estimate the contraction slope.".
  • Regression. stats::lm(log_w ~ log_n) is fit on the non-NA subset. Coefficients and standard errors are extracted from stats::coef(reg) and summary(reg)$coefficients[, "Std. Error"] respectively, indexing by the name "log_n".
  • Side effects. May print a cost message (gdpar_inform), emit per-refit warnings (gdpar_warn), and perform length(sizes) * replicates full Bayesian refits via cmdstanr (through gdpar).
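The width and regression steps described in the notes above can be sketched in base R. This is an illustration under stated assumptions, not the package code: the refits are replaced by simulated draws whose spread shrinks like $n^{-1/2}$, the helper `median_ci_width` is hypothetical, and plain matrices stand in for the posterior draws objects.

```r
set.seed(42)

# Hypothetical helper: per-variable equal-tailed interval widths, then the median.
median_ci_width <- function(draws, level = 0.95) {
  alpha <- 1 - level
  lower <- apply(draws, 2L, stats::quantile, probs = alpha / 2, names = FALSE)
  upper <- apply(draws, 2L, stats::quantile, probs = 1 - alpha / 2, names = FALSE)
  stats::median(upper - lower)
}

# Simulated stand-in for the refits: one row per (size, replicate) cell.
cells <- expand.grid(replicate = 1:3, n = c(50, 100, 200, 400))
cells$median_width <- vapply(cells$n, function(n) {
  draws <- matrix(stats::rnorm(4000 * 3, sd = 1 / sqrt(n)), ncol = 3)
  median_ci_width(draws)
}, numeric(1))

# Log-log regression on the cells whose refit succeeded (non-NA widths).
ok <- !is.na(cells$median_width)
log_n <- log(cells$n[ok])
log_w <- log(cells$median_width[ok])
reg <- stats::lm(log_w ~ log_n)

slope <- unname(stats::coef(reg)["log_n"])
slope_se <- unname(summary(reg)$coefficients["log_n", "Std. Error"])
slope_ci <- slope + c(-1, 1) * 1.96 * slope_se
```

With widths that shrink at the parametric rate, the estimated slope should sit close to $-0.5$, which is the region the first verdict branch checks against.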

print.gdpar_contraction_report(x, ...)

Purpose

S3 print method for objects of class gdpar_contraction_report. Produces a human-readable summary of the contraction-rate diagnostic report, including the per-cell table, the estimated slope with standard error and 95% confidence interval, the verdict string, and any recorded warnings.

Arguments

Argument Type Meaning
x gdpar_contraction_report The report object returned by gdpar_contraction_diagnostic.
... any Unused; present for S3 generic compatibility.

Returns

Invisibly returns x (via invisible(x)).

Notes

  • Output format. Prints, in order:
    1. A header line "<gdpar_contraction_report> level = <level>" (using cat with sep = "").
    2. The table component via print(x$table, row.names = FALSE).
    3. A blank line, then "Slope estimate (log_width ~ log_n): <slope> (SE = <se>)" with values formatted to 3 significant digits via format(..., digits = 3).
    4. "95% CI: [<lower>, <upper>]" with values formatted to 3 significant digits.
    5. "Verdict: <verdict>".
    6. If length(x$warnings) > 0L, a blank line, the header "Warnings:", and each warning prefixed with " - ".
  • S3 dispatch. Registered as the print method for class gdpar_contraction_report; dispatched automatically when such an object is printed at the console.
  • No side effects beyond console output.

R/dependence_robust.R

.gdpar_eb_scalar_y_obs(object)

Purpose. Extracts the observed scalar outcome vector from a scalar Empirical-Bayes fit (gdpar_eb_fit) by reading the Stan data bundle stored in object$stan_data. It serves as the canonical accessor for the response used downstream by dependence diagnostics (e.g., residual-based Moran's I or block-bootstrap refit engines). Aborts for non-scalar outcomes (multivariate p > 1 or multi-slot K > 1), which are explicitly deferred in this sub-block.

Arguments.

Argument Type Meaning
object gdpar_eb_fit A scalar Empirical-Bayes fit object whose stan_data list contains the outcome vector.

Returns. A numeric vector (as.numeric(y_raw)), the observed outcome values. If the Stan data stored a real-valued response (y_real) that is used; otherwise y_int (count / Bernoulli families) is used. The result is always coerced to numeric.

Notes.

  • Reads object$stan_data$y_real first; if NULL, falls back to object$stan_data$y_int. If both are NULL, raises a gdpar_internal_error via gdpar_abort().
  • If y_raw is a matrix with more than one column (ncol(y_raw) > 1L), raises a gdpar_unsupported_feature_error, because multivariate (p > 1) outcomes are deferred.
  • Multi-slot (K > 1) outcomes are not checked here directly (that is handled by .gdpar_assert_scalar_eb()), but the matrix-column check implicitly guards against multi-column outcome matrices.
  • No S3 dispatch; purely internal.

.gdpar_assert_scalar_eb(object, arg_name = "object")

Purpose. Validates that object is a scalar Empirical-Bayes fit (gdpar_eb_fit) suitable for dependence-robust inference. Checks three conditions: (i) the object inherits from gdpar_eb_fit, (ii) it has no heterogeneous-family list (K > 1), and (iii) its conditional HMC fit is present.

Arguments.

Argument Type Meaning
object gdpar_eb_fit The fit object to validate.
arg_name character (length 1) Name of the argument, used in error messages. Defaults to "object".

Returns. invisible(object) — the same object, if all checks pass.

Notes.

  • Calls assert_inherits(object, "gdpar_eb_fit", arg_name) first; this is an external assertion helper that aborts with an appropriate class if the check fails.
  • If object$family$families is non-NULL, this indicates heterogeneous families (K > 1), and a gdpar_unsupported_feature_error is raised.
  • If object$conditional_fit is NULL, a gdpar_internal_error is raised because the conditional HMC fit is required for downstream residual extraction.
  • Returns invisibly to support use as a guard clause.

.gdpar_assert_scalar_dep(object, arg_name = "object")

Purpose. The Axis 2 gate (decision D102): validates that a fit object is a scalar fit on either the Empirical-Bayes or the full-Bayes path, suitable for dependence-robust inference. For EB fits it delegates verbatim to .gdpar_assert_scalar_eb(). For full-Bayes fits (gdpar_fit) it checks the path class and presence of the HMC fit. Any other class is rejected.

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit The fit object to validate.
arg_name character (length 1) Name of the argument, used in error messages. Defaults to "object".

Returns. invisible(object) — the same object, if all checks pass.

Notes.

  • If object inherits from gdpar_eb_fit, delegates to .gdpar_assert_scalar_eb(object, arg_name) and returns its result. This preserves byte-identical EB-path behaviour.
  • If object inherits from gdpar_fit:
    • Calls .gdpar_fit_path_class(object) (an internal helper elsewhere in the package) and asserts the result is "scalar". If not, raises gdpar_unsupported_feature_error (multivariate p > 1 and K > 1 full-Bayes fits are deferred).
    • If object$fit is NULL, raises gdpar_internal_error (the HMC fit is missing).
  • If object is neither gdpar_eb_fit nor gdpar_fit, raises gdpar_input_error with a message naming the offending argument.
  • Returns invisibly.

.gdpar_eb_estimate_vector(fit)

Purpose. Extracts the EB point-estimate vector from a scalar Empirical-Bayes fit and flattens it into a single named numeric vector. This is the EB touchpoint of the block-bootstrap engine: the same extraction is performed on each bootstrap refit, and column alignment across refits depends on the name stability guaranteed here.

Arguments.

Argument Type Meaning
fit gdpar_eb_fit A scalar Empirical-Bayes fit object.

Mathematics. No formula per se, but the extraction order is deterministic and fixed:

$$ \hat{\boldsymbol{\beta}} = \bigl(\hat{\theta}_{\text{ref}},\; \hat{a},\; \hat{b},\; \hat{W}_{\text{raw}}\bigr)^{\!\top} $$

where each component is a sub-vector of the named coefficients returned by coef.gdpar_eb_fit(). The concatenation order is: theta_ref, then a, then b, then W.

Returns. A named numeric vector containing all EB point estimates. Names follow the convention "theta_ref" or "theta_ref[1]" etc. for theta_ref, and "a[1]", "b[1]", "W[1]" etc. for the remaining components (unless the coef() result already provides names).

Notes.

  • Calls stats::coef(fit) to obtain the structured coefficient list.
  • Iterates over components "theta_ref", "a", "b", "W" in that fixed order.
  • If a component's $estimate field is NULL, it is silently skipped.
  • If names are NULL, synthetic names of the form "<comp>[<index>]" are generated. For theta_ref of length 1, the name is simply "theta_ref".
  • If no estimates can be extracted (all components NULL), raises gdpar_internal_error.
  • The result of do.call(c, unname(parts)) concatenates the named sub-vectors while preserving names.
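A minimal sketch of the flattening logic described in the notes above, assuming the coefficient list has the shape `list(<component> = list(estimate = ..., se = ...))`. The function name `flatten_estimates` and the toy `coefs` list are hypothetical; only the fixed component order, the silent skip of NULL components, and the synthetic naming rule are taken from the text.

```r
# Hypothetical re-implementation of the flattening step (not the package code).
flatten_estimates <- function(coefs, comps = c("theta_ref", "a", "b", "W")) {
  parts <- list()
  for (comp in comps) {
    est <- coefs[[comp]]$estimate
    if (is.null(est)) next                      # silently skip absent components
    if (is.null(names(est))) {
      names(est) <- if (comp == "theta_ref" && length(est) == 1L) {
        comp                                    # scalar theta_ref keeps a bare name
      } else {
        sprintf("%s[%d]", comp, seq_along(est)) # synthetic "<comp>[<index>]" names
      }
    }
    parts[[comp]] <- est
  }
  if (length(parts) == 0L) stop("no estimates could be extracted")
  do.call(c, unname(parts))                     # inner names survive concatenation
}

# Toy coefficient list: b is inactive, W already carries a name.
coefs <- list(
  theta_ref = list(estimate = 0.3),
  a = list(estimate = c(1.2, -0.4)),
  b = list(estimate = NULL),
  W = list(estimate = c(w1 = 0.1))
)
est_vec <- flatten_estimates(coefs)
```

Dropping the outer list names with `unname(parts)` before `c()` matters: otherwise R would prefix each element name with the component name and break the name stability that the bootstrap column alignment relies on.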

.gdpar_eb_model_se_vector(fit)

Purpose. Mirrors .gdpar_eb_estimate_vector() but extracts the model-based (Laplace / conditional posterior) standard errors instead of point estimates. The resulting vector is name-aligned with the estimate vector, enabling ratio computations such as se_ratio = robust_se / model_se.

Arguments.

Argument Type Meaning
fit gdpar_eb_fit A scalar Empirical-Bayes fit object.

Mathematics. The model SE for each coefficient $k$ is:

$$ \text{model\_SE}_k = \text{posterior SD from the Laplace approximation} $$

as stored in coef(fit)$<component>$se.

Returns. A named numeric vector of the same length and name structure as .gdpar_eb_estimate_vector(fit). If a component's $se field is NULL but its $estimate field is non-NULL, the corresponding entries are filled with NA_real_.

Notes.

  • Reads $se fields from the coef(fit) list. If $se is NULL for a given component but $estimate is present, fills with NA_real_ (length-matched via rep(NA_real_, length(est))).
  • Uses $estimate (not $se) to determine names and presence, ensuring alignment with the estimate vector.
  • Iterates over theta_ref, a, b, W in the same fixed order as .gdpar_eb_estimate_vector().
  • Returns a do.call(c, unname(parts)) result, identical structure to the estimate vector.

.gdpar_fb_coef_draws_matrix(object)

Purpose. Extracts the posterior draws of the AMM coefficients from a scalar full-Bayes fit (gdpar_fit) as a single $S \times P$ matrix ($S$ = number of posterior draws, $P$ = number of AMM coefficient parameters). This is the full-Bayes counterpart of the EB coefficient extraction. The matrix is used to compute both point estimates (posterior means) and model-based standard errors (posterior SDs).

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object (already validated by .gdpar_assert_scalar_dep()).

Mathematics. Let $\boldsymbol{\theta}^{(s)}$ denote the $s$-th posterior draw of the AMM coefficient vector, $s = 1, \ldots, S$. The returned matrix is:

$$ \mathbf{M} = \begin{pmatrix} \boldsymbol{\theta}^{(1)\top} \\ \vdots \\ \boldsymbol{\theta}^{(S)\top} \end{pmatrix} \in \mathbb{R}^{S \times P} $$

where the columns correspond to the Stan variables theta_ref, a_coef, b_coef, and W_raw, in that order, each included only if the corresponding AMM component is active.

Returns. An $S \times P$ numeric matrix (unclassed draws_matrix) whose columns carry Stan variable names (e.g., "theta_ref[1]", "a_coef[1]"). Row count equals the number of posterior draws.

Notes.

  • Requires the suggested package posterior; calls require_suggested("posterior", "extract posterior draws") which will abort with an informative message if unavailable.
  • Reads draws via object$fit$draws() (the raw CmdStan / Stan fit object).
  • Variables included: always "theta_ref"; additionally "a_coef" if object$amm$a is non-NULL; "b_coef" if object$amm$b is non-NULL; "W_raw" if object$amm$W is non-NULL.
  • Uses raw W_raw draws (not sigma_W-scaled effective weights), matching the EB extractor's use of raw W_raw conditional estimates. This is a deliberate parity choice (decision D102).
  • Excludes hyperparameters (mu_theta_ref, sigma_theta_ref) for EB/FB parity.
  • If the resulting matrix is NULL or has zero columns, raises gdpar_internal_error.
  • Calls unclass() on the result to strip the draws_matrix class, returning a plain numeric matrix.

.gdpar_fb_estimate_vector(object)

Purpose. Computes the full-Bayes point-estimate vector as the posterior mean of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_estimate_vector().

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object.

Mathematics. For each coefficient $k = 1, \ldots, P$:

$$ \hat{\theta}_k = \frac{1}{S} \sum_{s=1}^{S} \theta_k^{(s)} $$

where $\theta_k^{(s)}$ is the $s$-th posterior draw of coefficient $k$.

Returns. A named numeric vector of length $P$ with names taken from the column names of the draws matrix (e.g., "theta_ref[1]", "a_coef[1]", etc.).

Notes.

  • Calls .gdpar_fb_coef_draws_matrix(object) to obtain the $S \times P$ matrix, then computes column means via colMeans(mat).
  • Names are set from colnames(mat), which are the Stan variable names.

.gdpar_fb_model_se_vector(object)

Purpose. Computes the full-Bayes model-based standard error vector as the posterior standard deviation of each AMM coefficient column from the draws matrix. This is the full-Bayes counterpart of .gdpar_eb_model_se_vector().

Arguments.

Argument Type Meaning
object gdpar_fit A scalar full-Bayes fit object.

Mathematics. For each coefficient $k = 1, \ldots, P$:

$$ \text{model\_SE}_k = \sqrt{\frac{1}{S - 1} \sum_{s=1}^{S} \bigl(\theta_k^{(s)} - \hat{\theta}_k\bigr)^2} $$

where $\hat{\theta}_k$ is the posterior mean. This is the sample standard deviation of the posterior draws.

Returns. A named numeric vector of length $P$, name-aligned with .gdpar_fb_estimate_vector(object). Names are taken from colnames(mat).

Notes.

  • Calls .gdpar_fb_coef_draws_matrix(object) then applies apply(mat, 2L, stats::sd) to compute column-wise standard deviations.
  • Uses stats::sd (which divides by $S - 1$, Bessel-corrected).
  • The "model SE" here is the posterior SD, which is like-for-like with the EB Laplace SD, so the se_ratio = robust_se / model_se comparison is a SD-vs-SD ratio on both paths.
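The two full-Bayes summaries (posterior mean and posterior SD per column) can be illustrated on a simulated draws matrix. The matrix below is made up for the example; in the package the matrix would come from the draws extractor described earlier.

```r
set.seed(1)

# Simulated S x P draws matrix with Stan-style column names (illustrative only).
mat <- cbind(stats::rnorm(1000, 0.5, 0.1), stats::rnorm(1000, -1, 0.3))
colnames(mat) <- c("theta_ref[1]", "a_coef[1]")

est <- colMeans(mat)                      # posterior means, named by column
model_se <- apply(mat, 2L, stats::sd)     # posterior SDs (divisor S - 1)

# Same SD written out from the displayed formula, as a cross-check.
manual_se <- sqrt(colSums(sweep(mat, 2L, est)^2) / (nrow(mat) - 1))
```

Both vectors inherit their names from `colnames(mat)`, so they are name-aligned by construction.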

.gdpar_dep_estimate_vector(object)

Purpose. Class-dispatched accessor for the point-estimate vector, the first touchpoint of the shared block-bootstrap engine. For a gdpar_eb_fit it delegates to .gdpar_eb_estimate_vector() (byte-identical EB path); for a gdpar_fit it delegates to .gdpar_fb_estimate_vector().

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A validated scalar fit object (EB or full-Bayes).

Mathematics. See .gdpar_eb_estimate_vector() and .gdpar_fb_estimate_vector().

Returns. A named numeric vector of AMM coefficient point estimates, regardless of path.

Notes.

  • Dispatch is via inherits(object, "gdpar_eb_fit") (manual S3-style, not formal UseMethod).
  • If the object is a gdpar_eb_fit, calls and returns .gdpar_eb_estimate_vector(object) verbatim, preserving regression-gate compatibility.
  • Otherwise (assumed gdpar_fit), calls .gdpar_fb_estimate_vector(object).
  • Column names are stable across refits of the same model specification, which is critical for the block-bootstrap column alignment.

.gdpar_dep_model_se_vector(object)

Purpose. Class-dispatched accessor for the model-based standard error vector, the second touchpoint of the shared block-bootstrap engine. EB path: Laplace / conditional posterior SD (verbatim). Full-Bayes path: posterior SD per coefficient. In both cases the "model SE" is a within-model (posterior / Laplace) standard deviation.

Arguments.

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A validated scalar fit object (EB or full-Bayes).

Mathematics. See .gdpar_eb_model_se_vector() and .gdpar_fb_model_se_vector().

Returns. A named numeric vector of model-based standard errors, name-aligned with .gdpar_dep_estimate_vector(object).

Notes.

  • Same dispatch pattern as .gdpar_dep_estimate_vector(): inherits(object, "gdpar_eb_fit") triggers the EB path; otherwise full-Bayes.
  • The resulting vector is used in computing se_ratio = robust_se / model_se, and because both EB and full-Bayes model SEs are standard deviations (SD-vs-SD), the ratio is a like-for-like comparison.
  • Name alignment with the estimate vector is guaranteed by the internal extractors.

.gdpar_default_block_length(n)

Purpose Computes the rate-optimal default block length for the moving block bootstrap using the cube-root rate $n^{1/3}$, the optimal growth rate for the moving block bootstrap variance estimator (Künsch 1989; Hall, Horowitz & Jing 1995). Used as a fallback when the data-driven Politis–White selector cannot run (degenerate inputs, insufficient sample size, etc.).

Arguments

Argument Type Meaning
n integer-coercible scalar Sample size (number of observations; the time-series length).

Mathematics

Implements the rate:

$$ b_n = \max\!\bigl(1,\; \lfloor n^{1/3} \rceil\bigr) $$

where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer, so the result is an integer $\ge 1$.

Returns A single integer: the default block length.

Notes

  • The as.integer(round(...)) call rounds to the nearest integer and converts the result to integer type; the outer max(1L, ...) guarantees the result is at least 1 even when n = 0 or n = 1.
  • The roxygen comment notes the data-driven constant of Politis & White (2004) as a deferred refinement; this default provides only the correct rate, not the optimal constant.
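A one-line sketch of the cube-root rule, written from the description above. The name `default_block_length` is hypothetical; the expression mirrors the `max(1L, as.integer(round(...)))` pattern mentioned in the notes.

```r
# Hypothetical stand-in for the rate default: nearest integer to n^(1/3), at least 1.
default_block_length <- function(n) {
  max(1L, as.integer(round(n^(1 / 3))))
}

block_lengths <- vapply(c(0, 1, 8, 27, 100, 1000), default_block_length, integer(1))
print(block_lengths)
```

The growth is slow: a thousandfold increase in sample size raises the block length only tenfold.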


.gdpar_is_auto(x)

Purpose Predicate that tests whether a block-size argument is the literal character string "auto", distinguishing the data-driven Politis–White path from a fixed integer or the NULL rate default. Shared by the temporal and spatial robust estimators.

Arguments

Argument Type Meaning
x any R object The block-length (or block-size) argument to inspect.

Returns A logical scalar: TRUE if x is exactly the character string "auto" (length-1 character, not NA), FALSE otherwise.

Notes The compound guard is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto") is deliberately strict: a factor, an NA_character_, or a character vector of length ≠ 1 all return FALSE. No side effects.
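The strictness of the guard is easiest to see on a few inputs. The predicate below is a sketch built from the compound condition quoted above; the name `is_auto` is hypothetical.

```r
# Hypothetical copy of the guard: TRUE only for the length-1, non-NA string "auto".
is_auto <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && identical(x, "auto")
}

checks <- c(
  literal   = is_auto("auto"),
  factor    = is_auto(factor("auto")),
  missing   = is_auto(NA_character_),
  vector    = is_auto(c("auto", "auto")),
  integer   = is_auto(5L),
  null      = is_auto(NULL)
)
print(checks)
```

The `&&` operators short-circuit, so `is.na(x)` is only evaluated once `x` is known to be a length-1 character vector.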


.gdpar_flat_top_window(s)

Purpose Evaluates the flat-top lag window (kernel) of Politis (2003) / Politis & White (2004), vectorised over its argument. Used inside the Politis–White block-length selector to weight the sample autocovariances in the two windowed sums $\widehat{\text{spec}}$ and $\hat{g}$.

Arguments

Argument Type Meaning
s numeric vector Scaled lag values $s = k/M$, where $k$ is the lag index and $M$ the bandwidth.

Mathematics

$$ \lambda(s) = \begin{cases} 1, & |s| \le \tfrac{1}{2},\\[4pt] 2\,(1 - |s|), & \tfrac{1}{2} < |s| \le 1,\\[4pt] 0, & |s| > 1. \end{cases} $$

Returns A numeric vector of the same length as s containing $\lambda(s)$.

Notes Vectorised via nested ifelse over abs(s). No input validation; non-finite or NA inputs propagate NA.
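A short sketch of the window as a nested `ifelse`, following the piecewise definition above. The name `flat_top_window` is hypothetical.

```r
# Hypothetical implementation of the trapezoidal flat-top window lambda(s).
flat_top_window <- function(s) {
  a <- abs(s)
  ifelse(a <= 0.5, 1, ifelse(a <= 1, 2 * (1 - a), 0))
}

window_values <- flat_top_window(c(0, 0.5, 0.75, 1, 1.5, -0.75))
print(window_values)
```

The window is flat (equal to 1) on the inner half of the support and decays linearly to 0 at $|s| = 1$, so low-order autocovariances enter the sums unweighted.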


.gdpar_pw_mhat(rho, Kn, crit)

Purpose Determines the adaptive bandwidth $\hat{m}$ for the Politis & White (2004) automatic block-length selector. Searches the sample autocorrelation sequence for the first run of Kn consecutive negligible lags (the "first insignificant run" rule). Factored out for direct unit testing.

Arguments

Argument Type Meaning
rho numeric vector Sample autocorrelations at lags $1, 2, \dots, L$ (computed from residuals).
Kn integer scalar Number of consecutive insignificant lags required to declare the bandwidth; typically $\max(5, \lceil\log_{10} n\rceil)$.
crit numeric scalar Critical value for the significance test; a lag $j$ is deemed insignificant when $|\hat\rho(j)| < \texttt{crit}$.

Mathematics

Returns the smallest integer $j$ such that

$$|\hat\rho(j + \ell)| < \texttt{crit} \quad \forall\; \ell = 0, \dots, K_N - 1.$$

If no such run exists in $1,\dots,L$, the fallback is

$$\hat m = \max\bigl\{j : |\hat\rho(j)| \ge \texttt{crit}\bigr\},$$

i.e., the largest significant lag. If every lag is insignificant, $\hat m = 1$.

Returns An integer scalar: the estimated bandwidth $\hat m$.

Notes Early-return inside the for loop at the first qualifying run. The function operates on a logical vector insig <- abs(rho) < crit of length $L$, requiring $L \ge K_N$ for the scan to execute. Returns 1L as a safe minimum when all autocorrelations are negligible (near-white noise).
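The scan and its fallbacks can be sketched as follows. This is written from the description above, not copied from the package: the name `pw_mhat_sketch` is hypothetical, and the inputs are assumed finite (an NA autocorrelation would make the `if` condition fail).

```r
# Hypothetical re-implementation of the "first insignificant run" rule.
pw_mhat_sketch <- function(rho, Kn, crit) {
  insig <- abs(rho) < crit
  L <- length(insig)
  if (L >= Kn) {
    for (j in seq_len(L - Kn + 1L)) {
      # Early return at the first run of Kn consecutive negligible lags.
      if (all(insig[j:(j + Kn - 1L)])) return(as.integer(j))
    }
  }
  sig <- which(!insig)
  if (length(sig) == 0L) 1L else as.integer(max(sig))  # largest significant lag, or 1
}

# Run of three negligible lags starts at lag 4.
m_run <- pw_mhat_sketch(c(0.8, 0.5, 0.3, 0.05, 0.02, 0.01, 0.03, 0.01), Kn = 3, crit = 0.1)
# No run of three: fall back to the largest significant lag (lag 5).
m_fallback <- pw_mhat_sketch(c(0.5, 0.05, 0.4, 0.05, 0.3), Kn = 3, crit = 0.1)
# Too few lags to scan and none significant: safe minimum of 1.
m_min <- pw_mhat_sketch(c(0.01, 0.02), Kn = 5, crit = 0.1)
```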


.gdpar_politis_white_block_length(resid, c_thresh = stats::qnorm(0.975))

Purpose Computes the optimal block length $b_{\text{opt}}$ for overlapping block bootstrap using the Politis & White (2004) data-driven selector with the Patton, Politis & White (2009) correction. Operates on residuals of a working-independence model (no model refit needed). Falls back to the $n^{1/3}$ rate with a human-readable reason when the data-driven path is infeasible.

Arguments

Argument Type Meaning
resid numeric vector Residuals of the fitted working-independence model, already in the (temporal or spatial) bootstrap ordering.
c_thresh numeric scalar Critical-value multiplier for the adaptive bandwidth test. Default qnorm(0.975) ≈ 1.96, matching np::b.star.

Mathematics

  1. Bandwidth selection. Compute sample autocorrelations $\hat\rho(1),\dots,\hat\rho(M_{\max})$ where $M_{\max} = \min(\lceil\sqrt{n}\rceil + K_N,\; n-1)$ and $K_N = \max(5, \lceil\log_{10} n\rceil)$. The adaptive bandwidth $\hat{m}$ is found by .gdpar_pw_mhat with critical value

$$\texttt{crit} = c_{\text{thresh}} \sqrt{\frac{\log_{10} n}{n}}.$$

Set $M = \min(2\hat{m},\; M_{\max})$.

  2. Spectral estimates. Recompute autocovariances $\hat{R}(k)$ for $k = 0,\dots,M$. Apply the flat-top window $\lambda(k/M)$:

$$\widehat{\text{spec}} = \hat{R}(0) + 2\sum_{k=1}^{M} \lambda(k/M)\,\hat{R}(k),$$

$$\hat{g} = 2\sum_{k=1}^{M} \lambda(k/M)\,k\,\hat{R}(k).$$

  3. Optimal block length. For overlapping (moving/circular) block bootstrap the variance constant is $D = \tfrac{4}{3}\,\widehat{\text{spec}}^2$ (Lahiri 2003), giving

$$b_{\text{opt}} = \left(\frac{2\,\hat{g}^2}{D}\right)^{1/3} n^{1/3}.$$

  4. Capping. The final integer block length is

$$b = \max\!\Bigl(1,\;\min\!\bigl(\lfloor b_{\text{opt}} + 0.5 \rfloor,\; \lceil\min(3\sqrt{n},\; n/3)\rceil\bigr)\Bigr).$$

Returns A list with components:

Component Type Meaning
block_length integer The selected block length.
method character "auto" if the data-driven rule succeeded; "rate" if the fallback was used.
reason character Human-readable description of the selection, including $\hat{m}$, $M$, uncapped $b$, and cap.

Notes Four fallback paths return the $n^{1/3}$ rate with method = "rate": (i) $n < 8$; (ii) non-positive or non-finite residual variance; (iii) $M_{\max} < K_N + 1$ (series too short for the lag scan); (iv) non-positive or non-finite $\widehat{\text{spec}}$ or non-finite $\hat{g}$. A fifth edge case is not a fallback: when $\hat{g} \approx 0$ drives $b_{\text{opt}} \to 0$, the result is floored at 1, which is the honest data-driven answer.
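The four steps and the fallbacks can be assembled into one base-R sketch. Everything here is an illustration under assumptions: the function and helper names are hypothetical, `stats::acf` (demeaned, divisor $n$) stands in for whatever autocovariance routine the package uses, and the returned list omits the `reason` string.

```r
flat_top <- function(s) {
  a <- abs(s)
  ifelse(a <= 0.5, 1, ifelse(a <= 1, 2 * (1 - a), 0))
}

first_insig_run <- function(rho, Kn, crit) {
  insig <- abs(rho) < crit
  L <- length(insig)
  if (L >= Kn) {
    for (j in seq_len(L - Kn + 1L)) {
      if (all(insig[j:(j + Kn - 1L)])) return(j)
    }
  }
  sig <- which(!insig)
  if (length(sig) == 0L) 1L else max(sig)
}

# Hypothetical sketch of the selector; not the package implementation.
pw_block_length_sketch <- function(resid, c_thresh = stats::qnorm(0.975)) {
  n <- length(resid)
  rate <- list(block_length = max(1L, as.integer(round(n^(1 / 3)))), method = "rate")
  if (n < 8L) return(rate)                                   # fallback (i)
  v <- stats::var(resid)
  if (!is.finite(v) || v <= 0) return(rate)                  # fallback (ii)
  Kn <- max(5, ceiling(log10(n)))
  m_max <- min(ceiling(sqrt(n)) + Kn, n - 1)
  if (m_max < Kn + 1) return(rate)                           # fallback (iii)

  # Step 1: adaptive bandwidth from the sample autocorrelations.
  rho <- stats::acf(resid, lag.max = m_max, plot = FALSE)$acf[-1L]
  crit <- c_thresh * sqrt(log10(n) / n)
  M <- min(2 * first_insig_run(rho, Kn, crit), m_max)

  # Step 2: flat-top weighted autocovariance sums.
  R <- stats::acf(resid, lag.max = M, type = "covariance", plot = FALSE)$acf[, 1L, 1L]
  k <- seq_len(M)
  lam <- flat_top(k / M)
  spec <- R[1L] + 2 * sum(lam * R[k + 1L])
  g <- 2 * sum(lam * k * R[k + 1L])
  if (!is.finite(spec) || spec <= 0 || !is.finite(g)) return(rate)  # fallback (iv)

  # Steps 3 and 4: optimal length for overlapping blocks, then the cap.
  D <- (4 / 3) * spec^2
  b_opt <- (2 * g^2 / D)^(1 / 3) * n^(1 / 3)
  cap <- ceiling(min(3 * sqrt(n), n / 3))
  list(block_length = as.integer(max(1, min(floor(b_opt + 0.5), cap))), method = "auto")
}

set.seed(123)
ar1 <- as.numeric(stats::arima.sim(list(ar = 0.7), n = 500))
sel_ar1 <- pw_block_length_sketch(ar1)
sel_short <- pw_block_length_sketch(c(1, 2, 3, 4, 5))
sel_const <- pw_block_length_sketch(rep(1, 50))
```

For strongly autocorrelated input such as the simulated AR(1) series, the selected length is typically well above the cube-root default, while the short and constant series exercise fallbacks (i) and (ii).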


.gdpar_block_bootstrap_data_indices(n, block_length, type = c("moving", "circular"))

Purpose Generates a resampled index vector of length n for a single temporal block bootstrap replicate. Draws ceiling(n / block_length) contiguous blocks with replacement, concatenates them, and truncates to length n. Supports both the moving (Künsch 1989) and circular (Politis & Romano 1992) block schemes.

Arguments

Argument Type Meaning
n integer-coercible scalar Sample size.
block_length integer-coercible scalar Length of each contiguous block; must be in $[1, n]$.
type character "moving" (default) or "circular". Matched via match.arg.

Mathematics

Let $B = \lceil n / \ell \rceil$ be the number of blocks and $\ell$ the block length.

  • Moving block bootstrap ("moving"): block start positions are drawn uniformly from $\{1, 2, \dots, n - \ell + 1\}$. Each block $b$ contributes indices $s_b, s_b+1, \dots, s_b + \ell - 1$.

  • Circular block bootstrap ("circular"): block start positions are drawn uniformly from $\{1, 2, \dots, n\}$. Indices wrap around modulo $n$: the raw index $i$ maps to $((i-1) \bmod n) + 1$.

The output is the first $n$ entries of the concatenated index vector of length $B\ell$.

Returns An integer vector of length n containing resampled observation indices in $\{1, \dots, n\}$.

Notes

  • Raises an abort (class "gdpar_input_error") via gdpar_abort() if block_length is outside $[1, n]$.
  • The circular scheme gives every observation equal expected resampling weight, whereas the moving scheme slightly down-weights observations near the boundaries.
  • This is the single-chain sibling of a multi-chain MCMC-draw block bootstrap resampler (block_bootstrap_indices()) documented elsewhere.
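A compact sketch of both schemes, following the Mathematics paragraph above. The function name is hypothetical and a plain `stop()` replaces the package's classed abort.

```r
# Hypothetical sketch of one block-bootstrap index draw (not the package code).
block_indices_sketch <- function(n, block_length, type = c("moving", "circular")) {
  type <- match.arg(type)
  n <- as.integer(n)
  l <- as.integer(block_length)
  if (l < 1L || l > n) stop("block_length must be in [1, n]")
  n_blocks <- ceiling(n / l)
  n_starts <- if (type == "moving") n - l + 1L else n
  starts <- sample.int(n_starts, n_blocks, replace = TRUE)
  # Column b of the l x n_blocks matrix holds starts[b] + 0:(l - 1).
  idx <- as.vector(outer(0:(l - 1L), starts, `+`))
  idx <- ((idx - 1L) %% n) + 1L   # wrap-around; a no-op for the moving scheme
  idx[seq_len(n)]                 # truncate the concatenation to length n
}

set.seed(7)
idx_moving <- block_indices_sketch(10, 3, "moving")
idx_circular <- block_indices_sketch(10, 3, "circular")
```

Within each block the indices are consecutive (modulo $n$ in the circular case), which is what preserves short-range dependence in the resampled series.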

.gdpar_dependence_residuals(object, residual_type, randomize_seed)

Purpose Computes residuals of a scalar fit (Empirical-Bayes or full-Bayes) for use in the dependence diagnostics. Shared by the temporal diagnostic (gdpar_dependence_diagnostic) and the spatial diagnostic (gdpar_spatial_dependence_diagnostic) to ensure a single, consistent residual definition (design decision D100).

Arguments

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A scalar fitted model object.
residual_type character One of "quantile", "response", "pearson", "deviance".
randomize_seed integer or NULL Seed for reproducibility of randomized quantile residuals for discrete families; ignored for continuous families.

Returns A numeric vector of residuals of length $n$.

Notes

  • Full-Bayes branch (gdpar_fit that is not a gdpar_eb_fit): delegates entirely to the S3 method stats::residuals(object, type = residual_type, randomize_seed = randomize_seed), which internally uses the posterior predictive draws and .gdpar_residuals_dispatch() (design decision D102).
  • Empirical-Bayes branch: extracts the scalar observed outcome via .gdpar_eb_scalar_y_obs(object), obtains response-type predictions via stats::predict(object, type = "response"), reads the family name from object$family$name, and dispatches to .gdpar_residuals_dispatch().

gdpar_dependence_diagnostic(object, index = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), max_lag = NULL, level = 0.95, randomize_seed = NULL, ...)

Purpose (Exported.) Quantifies serial (temporal) dependence in the residuals of a scalar Path 1 Empirical-Bayes or full-Bayes fit. The diagnostic is the gate for gdpar_dependence_robust(): it makes violations of the conditional-independence assumption visible and measurable before any block-bootstrap remedy is applied. Only scalar fits ($K = 1$, $p = 1$) are supported; multi-parameter paths are deferred.

Arguments

Argument Type Meaning
object gdpar_eb_fit or gdpar_fit A scalar fitted model.
index numeric vector or NULL Temporal (or one-dimensional) ordering of observations. If non-NULL, residuals are sorted by order(index) before statistics are computed. Must have length $n$.
residual_type character Residual type: "quantile" (default; Dunn-Smyth / randomized quantile residuals), "response", "pearson", or "deviance".
max_lag integer or NULL Maximum lag for the Ljung–Box test. Default: $\min(\lfloor 10\log_{10} n\rfloor,\; n - 1)$.
level numeric in $(0,1)$ Confidence level for the verdict. Dependence is flagged when a p-value $< 1 - \texttt{level}$. Default 0.95.
randomize_seed integer or NULL Seed for randomized quantile residuals (discrete families).
... Unused; present for signature stability.

Mathematics

  1. Lag-1 autocorrelation. Let $r_1, \dots, r_n$ be the (optionally re-ordered) residuals, $\bar{r}$ their mean, and $\tilde{r}_t = r_t - \bar{r}$. Then

$$\hat{\rho}_1 = \frac{\sum_{t=2}^{n} \tilde{r}_t\,\tilde{r}_{t-1}}{\sum_{t=1}^{n} \tilde{r}_t^2}.$$

The approximate two-sided p-value under the null $\rho_1 = 0$ uses the normal approximation $\sqrt{n}\,\hat\rho_1 \dot\sim \mathcal{N}(0,1)$:

$$p_1 = 2\,\Phi\!\bigl(-|\sqrt{n}\,\hat\rho_1|\bigr).$$

  2. Durbin–Watson statistic. Reported descriptively (not as a formal test):

$$DW = \frac{\sum_{t=2}^{n}(r_t - r_{t-1})^2}{\sum_{t=1}^{n} r_t^2} \;\approx\; 2(1 - \hat\rho_1).$$

Values near 2 indicate no first-order autocorrelation.

  3. Ljung–Box test. The omnibus test across lags $1, \dots, h$ (where $h = \texttt{max\_lag}$) is

$$Q = n(n+2)\sum_{j=1}^{h}\frac{\hat\rho_j^2}{n-j} \;\dot\sim\; \chi^2_h \quad\text{under } H_0,$$

computed via stats::Box.test(..., type = "Ljung-Box", fitdf = 0). The degrees of freedom are not reduced by the number of estimated model coefficients (fitdf = 0), making the test mildly optimistic for residuals of a fitted model.

  4. Verdict. Dependence is flagged when $p_{\text{Ljung-Box}} < 1 - \texttt{level}$.
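The statistics defined above can be reproduced on any residual vector with base R. The helper `dependence_stats_sketch` is hypothetical and returns a plain list; it does not attempt the residual extraction, index reordering, or classed output of the exported function.

```r
# Hypothetical sketch of the diagnostic battery on a plain numeric vector.
dependence_stats_sketch <- function(r, max_lag = NULL, level = 0.95) {
  n <- length(r)
  if (is.null(max_lag)) max_lag <- min(floor(10 * log10(n)), n - 1)
  rc <- r - mean(r)
  denom <- sum(rc^2)
  if (denom <= 0) stop("residuals have zero variance")
  rho1 <- sum(rc[-1L] * rc[-n]) / denom            # lag-1 autocorrelation
  p1 <- 2 * stats::pnorm(-abs(sqrt(n) * rho1))     # two-sided normal p-value
  dw <- sum(diff(r)^2) / sum(r^2)                  # Durbin-Watson (descriptive)
  lb <- stats::Box.test(r, lag = max_lag, type = "Ljung-Box", fitdf = 0)
  list(
    lag1_autocorr = rho1,
    lag1_p_value = p1,
    durbin_watson = dw,
    ljung_box_statistic = unname(lb$statistic),
    ljung_box_df = unname(lb$parameter),
    ljung_box_p_value = lb$p.value,
    flagged = lb$p.value < 1 - level
  )
}

# Strictly alternating residuals: strong negative first-order dependence.
alt <- dependence_stats_sketch(rep(c(1, -1), 5))
```

For the alternating series, $\hat\rho_1 = -0.9$ and $DW = 3.6$, which matches the approximation $DW \approx 2(1 - \hat\rho_1) = 3.8$ only roughly because $n = 10$ is small.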

Returns An object of class c("gdpar_dependence_diagnostic", "list") with components:

Component Type Meaning
residual_type character The residual type used.
n integer Number of residuals.
max_lag integer Maximum lag used for the Ljung–Box test.
lag1_autocorr numeric $\hat\rho_1$.
lag1_p_value numeric Two-sided p-value for $\hat\rho_1$.
durbin_watson numeric Durbin–Watson statistic $DW$.
ljung_box_statistic numeric Ljung–Box $Q$ statistic.
ljung_box_df integer Degrees of freedom of the $\chi^2$ reference distribution.
ljung_box_p_value numeric P-value of the Ljung–Box test.
level numeric Confidence level used.
index_supplied logical Whether index was non-NULL.
verdict character Human-readable verdict string.

A print method (S3 dispatch on "gdpar_dependence_diagnostic") is provided for formatted output.

Notes

  • Input validation. Calls .gdpar_assert_scalar_dep(object, "object") to ensure the fit is scalar. Validates level via assert_numeric_scalar(level, ..., lower = 0, upper = 1). Requires the posterior package (suggested dependency) for extracting posterior draws.
  • Abort conditions. Raises gdpar_abort with class "gdpar_input_error" if index has the wrong length or max_lag is outside $[1, n-1]$. Raises class "gdpar_diagnostic_error" if all residuals have zero variance (denom <= 0).
  • S3 method note. The returned object carries class "gdpar_dependence_diagnostic" as its primary class, enabling print.gdpar_dependence_diagnostic() dispatch.
  • Scope. Only scalar ($K=1$, $p=1$) fits are accepted. Spatial dependence is handled by the sibling gdpar_spatial_dependence_diagnostic().

print.gdpar_dependence_diagnostic(x, digits = 3L, ...)

Purpose (Exported S3 method, registered via @export for dispatch on the "gdpar_dependence_diagnostic" class.) Produces a human-readable, multi-line textual summary of the serial-dependence diagnostic battery (autocorrelation, Durbin–Watson, Ljung–Box) attached to a fitted model.

Arguments

Argument Type Meaning
x list (S3 class gdpar_dependence_diagnostic) The diagnostic object produced by gdpar_dependence_diagnostic(). Required fields: $residual_type, $index_supplied, $n, $lag1_autocorr, $lag1_p_value, $durbin_watson, $ljung_box_df, $ljung_box_statistic, $ljung_box_p_value, $verdict.
digits integer (default 3L) Number of significant digits used by format() when printing numeric quantities.
... Ignored; present for S3 method compatibility.

Returns invisible(x) — the input object, invisibly, following standard R print-method convention.

Notes

  • All output is emitted via cat() to the console (stdout). No value is returned visibly.
  • The print method checks x$index_supplied to annotate whether the residuals were ordered by a user-supplied index or by natural row order.
  • If x$index_supplied is TRUE, the residual-type line reads "(ordered by supplied index)"; otherwise "(natural row order)".
  • No validation of x fields is performed; missing or NULL fields will produce blank output segments.

.gdpar_dependence_robust_engine(object, data, resample_fun, B, level, seed, iter_warmup, iter_sampling, chains, verbose, verbose_msg, caller_env, ...)

Purpose Internal (non-exported) shared block-bootstrap-by-refit engine. Factors out the entire resampling loop, seed management, bootstrap-SE and percentile-interval assembly, and per-refit convergence accounting that is common to the temporal (gdpar_dependence_robust) and spatial (gdpar_spatial_dependence_robust) robust-inference wrappers. The two public entry points differ only in their resample_fun and in the descriptive metadata they attach; everything downstream of the resample is handled here identically. Serves both the Empirical-Bayes (gdpar_eb_fit) and full-Bayes (gdpar_fit) paths through class-dispatched extractors (decision D102).

Arguments

Argument Type Meaning
object list (S3 class gdpar_eb_fit or gdpar_fit) A scalar Path 1 fit object (K = 1, p = 1). Must carry $call (the original fitting call), and must be dispatchable by .gdpar_dep_estimate_vector() and .gdpar_dep_model_se_vector().
data data.frame The original data frame passed to the fitting function. Each bootstrap iteration indexes rows of this data frame.
resample_fun nullary function A closure with no arguments that returns an integer vector of length nrow(data) — the row indices for one bootstrap resample. For temporal bootstrapping this wraps .gdpar_block_bootstrap_data_indices() (moving or circular blocks ordered by index); for spatial bootstrapping it returns a spatial-block index vector.
B integer scalar Number of bootstrap refits to perform.
level numeric scalar in $(0, 1)$ Confidence level for the percentile interval (e.g. 0.95).
seed integer or NULL Optional RNG seed. When non-NULL, set.seed(as.integer(seed)) is called once before the loop, ensuring reproducibility of both the per-refit Stan seeds and the resample_fun() draws.
iter_warmup integer scalar Number of warmup (burn-in) iterations passed to each Stan refit.
iter_sampling integer scalar Number of post-warmup sampling iterations passed to each Stan refit.
chains integer scalar Number of HMC chains for each refit.
verbose logical scalar When TRUE, emits verbose_msg once at the start via gdpar_inform().
verbose_msg character or NULL Pre-formatted cost message printed when verbose is TRUE.
caller_env environment The environment (typically the public wrapper's parent.frame()) in which each refit call is evaluated, so that model symbols resolve exactly as for a direct gdpar_eb() or gdpar() call.
... Not consumed directly by the engine; present for extensibility.

Mathematics

RNG-consumption contract. The engine's random-number consumption order is frozen for reproducibility:

  1. set.seed(seed) (when seed is non-NULL);
  2. $B$ per-refit Stan seeds drawn via sample.int(.Machine$integer.max, B) — these are assigned deterministically to iteration $b = 1, \ldots, B$;
  3. One call to resample_fun() per iteration $b$.
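The consumption order above can be sketched as follows. This is an illustrative stand-in, not the engine itself: rng_contract_demo() is a hypothetical name, and a plain i.i.d. row resample takes the place of resample_fun().

```r
# Sketch of the frozen RNG-consumption order: seed, then all B Stan seeds,
# then one resample draw per iteration.
rng_contract_demo <- function(seed, B, n) {
  if (!is.null(seed)) set.seed(as.integer(seed))          # step 1
  refit_seeds <- sample.int(.Machine$integer.max, B)      # step 2: B Stan seeds up front
  resamples <- vector("list", B)
  for (b in seq_len(B)) {
    # step 3: one resample per iteration (hypothetical stand-in for resample_fun())
    resamples[[b]] <- sample.int(n, n, replace = TRUE)
  }
  list(refit_seeds = refit_seeds, resamples = resamples)
}

run_a <- rng_contract_demo(seed = 42L, B = 3L, n = 5L)
run_b <- rng_contract_demo(seed = 42L, B = 3L, n = 5L)
same  <- identical(run_a, run_b)  # same seed, same consumption order
```

Because the Stan seeds are drawn before any resample, changing what happens inside an iteration cannot shift the seed assigned to a later refit.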

Point-estimate extraction. For each successful refit $\hat{\theta}^{(b)}$, the estimate vector is obtained via the class-dispatched extractor .gdpar_dep_estimate_vector(fit_b), which returns a named numeric vector of all AMM coefficients (theta_ref, a_coef, b_coef, W_raw, etc.).

Robust standard error. Let $\hat{\theta}_j^{(b)}$ denote the estimate of parameter $j$ from bootstrap replicate $b$, and let $B_{\text{ok}}$ be the number of replicates with no errors and no NA coefficients. Then:

$$ \widehat{\text{SE}}_{\text{robust},\,j} = \text{SD}\!\bigl(\hat{\theta}_j^{(b)} : b \in \text{successful}\bigr) = \sqrt{\frac{1}{B_{\text{ok}}-1}\sum_{b=1}^{B_{\text{ok}}}\bigl(\hat{\theta}_j^{(b)} - \bar{\hat\theta}_j\bigr)^2} $$

where the successful replicates are re-indexed $b = 1, \ldots, B_{\text{ok}}$ and $\bar{\hat\theta}_j$ is their sample mean.

Percentile confidence interval. For level $\ell$ (e.g. 0.95), set $\alpha = 1 - \ell$. The two-sided percentile interval for parameter $j$ is:

$$ \bigl[\hat\theta_{j,\,\alpha/2}^{*},\;\hat\theta_{j,\,1-\alpha/2}^{*}\bigr] $$

where $\hat\theta_{j,\,q}^{*}$ is the $q$-quantile of the $B_{\text{ok}}$ successful bootstrap estimates, computed via stats::quantile(..., probs = c(alpha/2, 1 - alpha/2), names = FALSE).

SE ratio. The ratio comparing robust and model-based uncertainty:

$$ \text{se\_ratio}_j = \frac{\widehat{\text{SE}}_{\text{robust},\,j}}{\text{SE}_{\text{model},\,j}} $$

A ratio $> 1$ indicates the model-based SE understates the true sampling variability due to dependence.
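The three summaries above can be sketched on a toy bootstrap matrix. The matrix boot, the failed row and the model_se values are fabricated inputs for illustration only; the column names merely echo the coefficient names mentioned earlier.

```r
# Sketch: robust SE, percentile interval and SE ratio from a B x p matrix
# of bootstrap estimates (rows = refits, columns = coefficients).
set.seed(123)
boot <- cbind(theta_ref = rnorm(199, mean = 1.0, sd = 0.20),
              a_coef    = rnorm(199, mean = -0.5, sd = 0.10))
boot[5, ] <- NA_real_                          # pretend refit 5 failed

ok      <- stats::complete.cases(boot)         # successful refits only
B_ok    <- sum(ok)
boot_ok <- boot[ok, , drop = FALSE]

model_se <- c(theta_ref = 0.15, a_coef = 0.10) # hypothetical model-based SEs
level    <- 0.95
alpha    <- 1 - level

robust_se <- apply(boot_ok, 2, stats::sd)
ci <- apply(boot_ok, 2, stats::quantile,
            probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)  # 2 x p matrix

tab <- data.frame(
  parameter = colnames(boot_ok),
  model_se  = unname(model_se[colnames(boot_ok)]),
  robust_se = unname(robust_se),
  se_ratio  = unname(robust_se / model_se[colnames(boot_ok)]),
  ci_lower  = ci[1, ],
  ci_upper  = ci[2, ],
  row.names = NULL
)
```

Note that stats::sd() already uses the $B_{\text{ok}} - 1$ denominator shown in the formula.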

Convergence diagnostics. Per-refit convergence fields are aggregated over the $B$ replicates (successful and unsuccessful):

  • $\text{max\_rhat} = \max_b \hat{R}^{(b)}_{\max}$ (maximum across all refits of the per-refit maximum split-$\hat{R}$);
  • $\text{min\_ess\_bulk} = \min_b \text{ESS}_{\text{bulk},\,\min}^{(b)}$ (minimum across all refits of the per-refit minimum bulk ESS);
  • $\text{n\_divergent\_refits} = |\{b : D^{(b)} > 0\}|$ (number of refits with at least one divergent transition);
  • $\text{n\_high\_rhat\_refits} = |\{b : \hat{R}^{(b)}_{\max} > 1.05\}|$ (number of refits with max R-hat exceeding the 1.05 threshold).

The R-hat threshold $1.05$ is the classical Gelman–Rubin "clearly non-converged" heuristic. If $\text{max\_rhat} > 1.05$, a warning is appended advising the user to increase iter_warmup/iter_sampling.
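The aggregation can be sketched as below. The per-refit values are invented for the example, the field names follow the diagnostics slot described in this entry, and the warning text is a placeholder rather than the package's actual message.

```r
# Sketch: aggregate per-refit convergence fields into one summary list.
refits <- data.frame(
  rhat_max        = c(1.01, 1.07, 1.02),  # per-refit maximum split R-hat
  ess_bulk_min    = c(850, 120, 640),     # per-refit minimum bulk ESS
  divergent_count = c(0L, 3L, 0L)         # per-refit divergent transitions
)
rhat_threshold <- 1.05

refit_diagnostics <- list(
  max_rhat           = max(refits$rhat_max),
  min_ess_bulk       = min(refits$ess_bulk_min),
  n_divergent_refits = sum(refits$divergent_count > 0L),
  n_high_rhat_refits = sum(refits$rhat_max > rhat_threshold),
  rhat_threshold     = rhat_threshold
)

# Diagnostics are reported, never used to drop refits.
warnings_msg <- character(0)
if (refit_diagnostics$max_rhat > rhat_threshold) {
  warnings_msg <- c(warnings_msg,
                    "max R-hat above threshold; consider more iterations")  # placeholder text
}
```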

Returns A list with components:

Component Type Description
table data.frame One row per AMM coefficient, columns: parameter (character), estimate (original point estimate), model_se (Laplace SD or posterior SD), robust_se (bootstrap SD), se_ratio (robust_se / model_se), ci_lower, ci_upper (percentile interval endpoints).
B_ok integer Number of successful bootstrap refits (no errors, all coefficients non-NA).
seed integer The supplied seed, or NA_integer_ if seed was NULL.
warnings character vector Accumulated warning messages (refit failures, convergence issues). Zero-length if clean.
refit_diagnostics list Aggregate convergence summary: max_rhat (numeric), min_ess_bulk (numeric), n_divergent_refits (integer), n_high_rhat_refits (integer), rhat_threshold (numeric, always 1.05).

Notes

  • No refit exclusion. Under-converged or divergent refits are never excluded or down-weighted. The rationale (documented in source decision D102) is that excluding under-converged refits is non-random — it removes precisely the data configurations the bootstrap is meant to probe — and would bias the SE. Both R-hat breaches and divergence counts are reported as diagnostics only.
  • Error handling. If a refit raises an error, the error message is captured via tryCatch, stored in warnings_msg, and the iteration is skipped (next). If fewer than 2 refits succeed (B_ok < 2), the engine calls gdpar_abort() with class "gdpar_diagnostic_error", aborting the run.
  • Parameter alignment. Only parameters common to the original fit's param_names and each refit's estimate vector are recorded in boot[b, ]. This handles the (rare) case where a refit produces a partial coefficient vector.
  • Refit call construction. The refit call is object$call with fields overridden: data = sub (the resampled data), iter_warmup, iter_sampling, chains, verbose = FALSE, refresh = 0L, skip_id_check = TRUE, seed = refit_seeds[b]. A new environment env is created with parent = caller_env and env$sub <- sub, so the symbol sub resolves inside the call.
  • Diagnostics path-agnostic. Both gdpar_eb_fit and gdpar_fit objects carry a $diagnostics slot with fields rhat_max, ess_bulk_min, divergent_count. The engine reads whichever is present.
  • Byte-identical EB path. On the Empirical-Bayes path, the dispatch to .gdpar_dep_estimate_vector / .gdpar_dep_model_se_vector resolves to the original EB helpers, and the refit is a gdpar_eb() call, so the engine's output is bit-for-bit identical to the pre-D102 temporal-only implementation.
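The refit-call pattern described in these notes (override fields of a stored call, evaluate it in a child environment, capture errors) can be sketched with base R alone. Here fit_fun() is a hypothetical stand-in for the fitting function, built on stats::lm(); none of the names below are taken from the package.

```r
# Sketch: rebuild a stored call with the data argument pointing at a resample,
# then evaluate it in a child of the caller's environment.
fit_fun <- function(formula, data, verbose = TRUE, ...) {
  stats::lm(formula, data = data)           # hypothetical stand-in for the real fitter
}

set.seed(7)
dat   <- data.frame(x = rnorm(30))
dat$y <- 1 + 2 * dat$x + rnorm(30)
orig_call <- quote(fit_fun(y ~ x, data = dat))  # plays the role of object$call

resampled <- dat[sample.int(nrow(dat), replace = TRUE), , drop = FALSE]

refit_call         <- orig_call
refit_call$data    <- quote(sub)            # the symbol `sub` is resolved in `env`
refit_call$verbose <- FALSE

caller_env <- environment()                 # where fit_fun is visible
env        <- new.env(parent = caller_env)
env$sub    <- resampled

# Errors are captured rather than propagated, so one bad refit cannot stop the loop.
fit_b <- tryCatch(eval(refit_call, envir = env), error = function(e) e)
refit_failed <- inherits(fit_b, "error")
```

Binding the resample to a symbol in a child environment keeps every other symbol in the original call resolving exactly as it did for the first fit.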

gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = "quantile", randomize_seed = NULL, type = "moving", B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = FALSE, ...)

Note: This entry documents the function as described in its roxygen block (including the @export directive); the entry that follows documents the function body.

Purpose Public, exported entry point for dependence-robust standard errors via a temporal block bootstrap. Re-estimates the uncertainty of a scalar Path 1 Empirical-Bayes or full-Bayes fit so that it is robust to temporal (serial) dependence in the data, without modelling that dependence. It refits the model on $B$ moving (or circular) block bootstrap resamples of the data ordered by index, and reports the bootstrap standard deviation and percentile intervals of each AMM coefficient alongside the model-based (Laplace / posterior) standard errors. This implements the working-independence + robust-variance stance of Liang & Zeger (1986): the point estimates are unchanged (consistent when the mean structure is correct, not efficient); only the reported uncertainty is made dependence-robust. Delegates the core loop to .gdpar_dependence_robust_engine().

Arguments

Argument Type Meaning
object S3 object (gdpar_eb_fit or gdpar_fit) A scalar Path 1 fit (K = 1, p = 1): either from gdpar_eb() (Empirical Bayes) or gdpar() (full Bayes).
data data.frame The data frame originally passed to the fitting function. The fit object does not store the data (to stay lightweight), so it must be re-supplied. Resampled by contiguous blocks and the model is refit on each resample.
index numeric/integer vector of length $n$, or NULL Optional temporal ordering of the rows of data. Data are sorted by order(index) so that contiguous blocks correspond to contiguous time. When NULL (default), the natural row order is assumed to be the temporal order.
block_length NULL, positive integer, or "auto" Block size for the bootstrap. NULL (default): uses the rate-optimal $\max(1, \lfloor n^{1/3} \rceil)$ (Künsch 1989; Hall, Horowitz & Jing 1995). Positive integer: fixes the block length manually. "auto": selects the block length from the data via the Politis & White (2004) automatic rule (with the Patton, Politis & White 2009 correction), computed from the fitted residuals (no extra refit), falling back to the rate-optimal formula on a degenerate series. The chosen value and method are reported in the result.
residual_type character, one of "quantile" (default), "response", "pearson", "deviance" Type of residuals fed to the Politis & White automatic block-length selector. Used only when block_length = "auto"; ignored otherwise. "quantile" refers to Dunn–Smyth randomized quantile residuals.
randomize_seed integer or NULL Optional seed for the randomized quantile residuals of discrete families. Used only by the "auto" block-length selector for reproducibility of the block-length choice; ignored otherwise.
type character, one of "moving" (default) or "circular" Type of block bootstrap. "moving" uses overlapping blocks that slide along the series; "circular" wraps the series into a circle.
B integer scalar (default 199L) Number of bootstrap refits.
level numeric scalar in $(0, 1)$ (default 0.95) Confidence level for the percentile interval.
seed integer or NULL Optional RNG seed controlling both the block resampling and deterministically derived per-refit Stan seeds, for full reproducibility.
iter_warmup integer scalar (default 500L) Number of warmup iterations per refit. Defaults are deliberately short to keep cost manageable.
iter_sampling integer scalar (default 500L) Number of post-warmup sampling iterations per refit.
chains integer scalar (default 2L) Number of HMC chains per refit.
verbose logical scalar (default FALSE) When TRUE, prints an opt-in cost message once.
... Additional arguments forwarded to gdpar_eb() (or gdpar()) for every refit.

Mathematics

The function applies the Liang & Zeger (1986) working-independence / robust-variance paradigm to the gdpar model class, with the block bootstrap taking the place of an analytic sandwich estimator. The key quantities are:

  1. Block-length selection. Under the rate-optimal default: $$ L = \max\!\bigl(1,\, \lfloor n^{1/3} \rceil\bigr) $$ Under "auto", the Politis & White (2004) algorithm estimates the optimal block length from the spectral density at frequency zero of the fitted residuals, with the Patton–Politis–White (2009) bias correction.

  2. Moving block bootstrap. For series length $n$ and block length $L$, the moving block bootstrap draws $\lfloor n/L \rfloor$ contiguous blocks of length $L$ uniformly at random (with replacement) from the $n - L + 1$ possible overlapping blocks, concatenating them to form a resampled series of length $\approx n$.

  3. Circular block bootstrap. The series is wrapped into a circle; $n - L + 1$ is replaced by $n$ possible blocks, eliminating edge effects.

  4. Robust SE. Computed by the engine: $$ \widehat{\text{SE}}_{\text{robust}} = \text{SD}\bigl(\hat\theta^{(1)}, \ldots, \hat\theta^{(B_{\text{ok}})}\bigr) $$

  5. SE ratio. $$ \text{se\_ratio} = \frac{\widehat{\text{SE}}_{\text{robust}}}{\text{SE}_{\text{model}}} $$ Values $> 1$ signal that the model-based SE understates true sampling variability due to dependence.
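The block-length default and the two resampling schemes can be sketched as follows. This is not the package's .gdpar_block_bootstrap_data_indices(): the function names are hypothetical, and the sketch assumes that enough blocks are drawn to cover $n$ rows and that the result is truncated to exactly $n$ indices.

```r
# Sketch: rate-optimal default block length and moving/circular block indices.
default_block_length <- function(n) max(1L, as.integer(round(n^(1 / 3))))

block_indices_sketch <- function(n, L, type = c("moving", "circular")) {
  type     <- match.arg(type)
  n_blocks <- ceiling(n / L)                        # assumption: cover n, then truncate
  n_starts <- if (type == "moving") n - L + 1L else n
  starts   <- sample.int(n_starts, n_blocks, replace = TRUE)
  idx      <- unlist(lapply(starts, function(s) s + seq_len(L) - 1L))
  if (type == "circular") idx <- ((idx - 1L) %% n) + 1L  # wrap past the end
  idx[seq_len(n)]
}

set.seed(11)
n <- 200L
L <- default_block_length(n)                        # round(200^(1/3)) = 6
idx_moving   <- block_indices_sketch(n, L, "moving")
idx_circular <- block_indices_sketch(n, L, "circular")
```

Within each block the indices are consecutive, which is what preserves short-range dependence inside the resampled series.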

Returns A list of S3 class gdpar_dependence_robust with components:

Component Type Description
table data.frame One row per AMM coefficient; columns: parameter, estimate, model_se, robust_se, se_ratio, ci_lower, ci_upper.
block_length integer The chosen block length.
block_length_method character One of "rate" (rate-optimal formula, also flags fallback from "auto"), "fixed" (user-supplied), "auto" (Politis–White).
type character "moving" or "circular".
B integer Requested number of bootstrap replications.
B_ok integer Number of successful refits.
level numeric Confidence level used.
index_supplied logical Whether the user supplied an index vector.
seed integer The supplied seed, or NA_integer_.
warnings character vector Accumulated warning messages (refit failures, convergence issues).
refit_diagnostics list Aggregate per-refit convergence: max_rhat, min_ess_bulk, n_divergent_refits, n_high_rhat_refits, rhat_threshold.

A print method (defined elsewhere) provides a human-readable summary.

Notes

  • Empirical-Bayes vs. full-Bayes parity. Both paths are supported (decision D102). On the EB path, estimate is the Laplace/conditional-posterior mean and model_se is its SD; on the full-Bayes path, estimate is the posterior mean and model_se is the posterior SD. The posterior mean (not median) is used for parity and to keep the SE ratio a dimensionless SD-vs-SD ratio without undeclared normal-scaling constants.
  • Full-Bayes caveats. (1) Each full-Bayes refit runs full HMC (costly). (2) Finite-iteration refits carry Monte-Carlo error in their posterior mean, which slightly and conservatively inflates robust_se. (3) Under an informative prior the bootstrap SD can be smaller than the full-Bayes posterior SD even under correct independent specification ($\text{se\_ratio} < 1$), because the prior regularizes every refit and damps the resample-to-resample variation of the posterior mean; this is benign regularization, not overstatement.
  • Scope limitation. The bootstrap delivers robust variance, not better point estimates. It is valid for weak / short-range dependence relative to block_length; it does not rescue long-memory or unit-root processes.
  • Dependencies. Uses cmdstanr for refits and posterior to extract coefficient estimates.
  • Exported. This function is exported from the package namespace (present in NAMESPACE).

gdpar_dependence_robust(object, data, index = NULL, block_length = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, type = c("moving", "circular"), B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose Top-level exported function that performs a dependence-robust uncertainty audit for a fitted gdpar model via block bootstrap. It re-estimates standard errors (and confidence intervals) of model coefficients to account for temporal dependence in the residuals, without changing point estimates. The method repeatedly refits the model on block-bootstrap resamples of the original data.

Arguments

Argument Type Meaning
object gdpar_fit or gdpar_eb_fit or compatible The fitted model object whose uncertainty is to be audited.
data data.frame The original data frame used in fitting. Must be row-aligned with the model.
index numeric or NULL Optional temporal ordering variable. If non-NULL, data and residuals are sorted by this index before blocking. Must have length equal to nrow(data).
block_length NULL, positive integer, or "auto" Block length for the moving/circular block bootstrap. NULL uses a default rate $n^{1/3}$. "auto" selects the block length data-adaptively via the Politis–White (2004) plug-in method on the residuals.
residual_type character scalar, one of "quantile", "response", "pearson", "deviance" Type of residual used when block_length = "auto" for the Politis–White plug-in (and for spatial diagnostics). Matched via match.arg.
randomize_seed integer or NULL Seed for randomized quantile residuals (used only if residual_type = "quantile").
type character scalar, one of "moving", "circular" Block-bootstrap scheme. "moving" uses overlapping blocks of length block_length; "circular" wraps the data end-to-end. Matched via match.arg.
B positive integer Number of bootstrap replicates (default 199).
level numeric in $(0,1)$ Confidence level for percentile-based intervals (default 0.95).
seed integer or NULL Master seed passed to the engine for reproducibility.
iter_warmup positive integer Stan warmup iterations per refit.
iter_sampling positive integer Stan sampling iterations per refit.
chains positive integer Number of MCMC chains per refit.
verbose logical scalar If TRUE, prints an informational banner describing the audit before computation begins.
... Additional arguments passed through to .gdpar_dependence_robust_engine and ultimately to the Stan refit.

Mathematics

Default block length (rate method):

When block_length is NULL, the default block length is set to

$$\ell = \max\!\bigl(1,\;\lfloor n^{1/3} \rfloor\bigr)$$

where $n = \texttt{nrow(data)}$. This is the $d=1$ specialisation of the variance-MSE-optimal rate $\ell \sim n^{d/(d+2)}$.

Block bootstrap:

For each of $B$ replicates, a set of $n$ row indices is drawn by .gdpar_block_bootstrap_data_indices(n, block_length, type). If type = "moving", consecutive blocks of length $\ell$ are drawn starting at uniformly random positions; if type = "circular", the data are conceptually wrapped in a circle.

Auto block length (Politis–White):

When block_length = "auto", residuals $r_i$ are extracted from the model. The Politis–White (2004) plug-in estimator is applied to select $\ell$, returning a list with $block_length, $method, and $reason.

Robust standard error:

The block-bootstrap standard error of each coefficient is the sample standard deviation of the $B$ bootstrap replications. The ratio

$$\texttt{se\_ratio} = \frac{\widehat{\mathrm{SE}}_{\text{robust}}}{\widehat{\mathrm{SE}}_{\text{model}}}$$

measures how much the model-based uncertainty understates the dependence-robust uncertainty; values $> 1$ indicate that naive standard errors are anticonservative.

Returns An object of class c("gdpar_dependence_robust", "list") with the following components:

Component Type Meaning
table data.frame Coefficient table with robust SEs, model SEs, se_ratio, and confidence intervals at the requested level.
block_length integer The block length used (after resolution of NULL or "auto").
block_length_method character One of "fixed", "rate", or the method string returned by Politis–White.
type character The bootstrap scheme used ("moving" or "circular").
B integer Requested number of replicates.
B_ok integer Number of replicates that completed successfully.
level numeric Confidence level.
index_supplied logical Whether the caller supplied an ordering index.
seed integer or NULL Seed actually used by the engine.
warnings character vector Accumulated warning messages from failed or slow refits.
refit_diagnostics list or NULL Aggregate convergence diagnostics across all refits (max R-hat, min ESS, divergent transitions, high-R-hat count).

Notes

  • The function requires the cmdstanr and posterior packages; if absent, a suggestion-error is raised.
  • Validation errors (class = "gdpar_input_error") are raised for: non-scalar object, non-data-frame data, mismatched index length, invalid block_length (non-NULL, non-integer, non-"auto"), block_length outside $[1, n]$, non-scalar logical verbose.
  • If index is non-NULL, both data and (internally) residuals are reordered by order(index) before blocking, ensuring temporal coherence.
  • The function detects whether object inherits from "gdpar_fit" but not "gdpar_eb_fit" (i.e., is a full-Bayes fit) and adjusts the verbose message to warn that full HMC refits are markedly more expensive.
  • The resample-generating closure resample_fun is created in the local environment and passed to the engine.
  • caller_env <- parent.frame() is captured so the engine can re-evaluate expressions in the caller's scope if needed.

print.gdpar_dependence_robust(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_dependence_robust. Renders a human-readable summary of the block-bootstrap audit results to the console.

Arguments

Argument Type Meaning
x gdpar_dependence_robust The object to print.
digits integer scalar (default 3) Number of significant digits for formatting numeric columns in the table.
... Unused; present for S3 generic compatibility.

Returns Invisibly returns x (the input object).

Notes

  • Prints the bootstrap scheme ("moving" or "circular"), block length (with provenance label: "auto: Politis-White", "rate: n^(1/3)", or blank for fixed), $B$, $B_{\text{ok}}$, index-supplied status, and confidence level.
  • The label for block_length_method uses a switch with four branches: "auto", "rate", "fixed", and a default empty string; the %||% operator defaults to "fixed" if the component is NULL.
  • Numeric columns of the table are formatted with format(col, digits = digits).
  • Appends an explanatory note about the se_ratio interpretation.
  • Calls .gdpar_print_refit_diagnostics() to print convergence diagnostics.
  • If warnings are present, prints up to 5, with a count of remaining suppressed warnings.

.gdpar_print_refit_diagnostics(rd, digits = 3L)

Purpose Internal helper that prints aggregate per-refit convergence diagnostics (max R-hat, min ESS bulk, divergent transition count, high-R-hat refit count) to the console. Called by print.gdpar_dependence_robust.

Arguments

Argument Type Meaning
rd list or NULL The refit_diagnostics component of a gdpar_dependence_robust object.
digits integer scalar (default 3) Number of significant digits for formatting.

Returns invisible(NULL) in all cases.

Notes

  • Returns early (silently) if rd is NULL.
  • Also returns early if rd$max_rhat is NULL or non-finite (!is.finite(mr)).
  • Uses %||% to default missing components to NA_real_ or 0L or 1.05 as appropriate.
  • Prints a single formatted line showing: max R-hat, min ESS (bulk), number of divergent refits, number of refits with R-hat above a threshold (default 1.05).
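A helper with the behaviour listed above might look like the sketch below. The function name and the exact wording of the printed line are assumptions; only the early returns, the defaults and the invisible NULL return are taken from the notes.

```r
# Sketch: print aggregate refit diagnostics on one line, returning invisible(NULL).
`%||%` <- function(x, y) if (is.null(x)) y else x   # local null-default operator

print_refit_diagnostics_sketch <- function(rd, digits = 3L) {
  if (is.null(rd)) return(invisible(NULL))          # nothing to report
  mr <- rd$max_rhat
  if (is.null(mr) || !is.finite(mr)) return(invisible(NULL))
  cat(sprintf(
    "Refit diagnostics: max R-hat %s, min ESS (bulk) %s, divergent refits %d, refits with R-hat > %s: %d\n",
    format(mr, digits = digits),
    format(rd$min_ess_bulk %||% NA_real_, digits = digits),
    as.integer(rd$n_divergent_refits %||% 0L),
    format(rd$rhat_threshold %||% 1.05),
    as.integer(rd$n_high_rhat_refits %||% 0L)
  ))
  invisible(NULL)
}

out_line  <- capture.output(
  res <- print_refit_diagnostics_sketch(list(max_rhat = 1.02, min_ess_bulk = 412))
)
out_empty <- capture.output(print_refit_diagnostics_sketch(NULL))
```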

.gdpar_spatial_default_g(n)

Purpose Internal function returning the variance-optimal default number of grid cells per axis $g$ for the spatial block bootstrap in two dimensions.

Arguments

Argument Type Meaning
n integer Number of spatial observations.

Mathematics

Implements the $d = 2$ specialisation of the variance-MSE-optimal block rate. The optimal number of points per block is $M \sim n^{d/(d+2)}$, so the number of cells per axis is

$$g = \max\!\bigl(2,\;\lfloor n^{1/4} + 0.5 \rfloor\bigr)$$

yielding $g^2 \sim n^{1/2}$ total cells and $M \sim n^{1/2}$ points per cell. For $d=1$ this reduces to the $n^{1/3}$ temporal rate.

Returns Integer scalar $g \geq 2$.

Notes

  • Uses round(n^(1/4)) then coerces to integer, with a floor of 2.
  • The documentation references decision D100 as the registered dissent.
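The rule is a one-liner in R. The sketch below uses a hypothetical function name and follows the round-then-floor-at-2 description in the notes.

```r
# Sketch: default number of grid cells per axis, g = max(2, round(n^(1/4))).
spatial_default_g_sketch <- function(n) {
  max(2L, as.integer(round(n^(1 / 4))))
}

g_small  <- spatial_default_g_sketch(5)      # 5^(1/4) is about 1.50, floored at 2
g_medium <- spatial_default_g_sketch(100)    # 100^(1/4) is about 3.16
g_large  <- spatial_default_g_sketch(10000)  # 10000^(1/4) = 10
```

For n = 100 this gives g = 3, that is 9 cells with roughly 11 observations each on evenly spread data.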

.gdpar_validate_coords(coords, n, arg = "coords")

Purpose Internal function that validates and coerces a coordinate matrix into a numeric $n \times 2$ matrix suitable for spatial analysis.

Arguments

Argument Type Meaning
coords data.frame, matrix, or other Coordinate input to validate.
n integer Expected number of rows (observations).
arg character (default "coords") Name of the argument for error messages.

Returns A numeric matrix with exactly 2 columns and n rows, with no non-finite values.

Notes

  • If coords is a data.frame, it is coerced via as.matrix().
  • Raises gdpar_abort (class "gdpar_input_error") if:
    • coords is not a numeric matrix after coercion.
    • coords does not have exactly 2 columns.
    • nrow(coords) != n.
    • coords contains any non-finite values (NA, NaN, Inf, -Inf).
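The checks listed above can be sketched with plain stop() calls. The real function raises gdpar_abort() with class "gdpar_input_error"; the function name and the messages below are placeholders.

```r
# Sketch: validate and coerce coordinates to a finite numeric n x 2 matrix.
validate_coords_sketch <- function(coords, n, arg = "coords") {
  if (is.data.frame(coords)) coords <- as.matrix(coords)
  if (!is.matrix(coords) || !is.numeric(coords)) {
    stop(sprintf("`%s` must be a numeric matrix or data frame.", arg))
  }
  if (ncol(coords) != 2L) {
    stop(sprintf("`%s` must have exactly 2 columns.", arg))
  }
  if (nrow(coords) != n) {
    stop(sprintf("`%s` must have %d rows.", arg, as.integer(n)))
  }
  if (any(!is.finite(coords))) {
    stop(sprintf("`%s` must not contain NA, NaN or infinite values.", arg))
  }
  storage.mode(coords) <- "double"
  coords
}

good <- validate_coords_sketch(data.frame(x = c(0, 1, 2), y = c(0, 1, 0)), n = 3L)
bad_na   <- tryCatch(validate_coords_sketch(cbind(c(0, NA, 2), c(0, 1, 0)), n = 3L),
                     error = function(e) "error")
bad_cols <- tryCatch(validate_coords_sketch(matrix(0, 3, 3), n = 3L),
                     error = function(e) "error")
```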

.gdpar_knn_adjacency(coords, k)

Purpose Internal function constructing a binary $k$-nearest-neighbour spatial adjacency matrix using Euclidean distance. Ties are broken by index order, so duplicate locations are well-defined but spatially degenerate.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated).
k positive integer Number of nearest neighbours.

Mathematics

Computes the $n \times n$ Euclidean distance matrix

$$D_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2$$

via stats::dist. For each observation $i$, the diagonal entry $D_{ii}$ is set to $\infty$, the $k$ smallest distances are selected by order, and the corresponding entries of the adjacency matrix $W$ are set to 1:

$$W_{ij} = \begin{cases} 1 & \text{if } j \in \text{KNN}_k(i) \\ 0 & \text{otherwise} \end{cases}$$

The resulting $W$ is generally asymmetric (if $j$ is a nearest neighbour of $i$, $i$ is not necessarily a nearest neighbour of $j$).

Returns An $n \times n$ binary (0/1) integer matrix $W$.

Notes

  • All $n$ rows of $W$ are initialized to zero, then row-by-row the $k$ nearest neighbours are set to 1.
  • Because order breaks ties by position, duplicate coordinates are handled deterministically.
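A direct transcription of this construction into base R is sketched below, under a hypothetical function name. The four collinear points are chosen so that both the tie-breaking and the asymmetry are visible.

```r
# Sketch: binary k-nearest-neighbour adjacency from Euclidean distances.
knn_adjacency_sketch <- function(coords, k) {
  n <- nrow(coords)
  D <- as.matrix(stats::dist(coords))   # n x n Euclidean distances
  diag(D) <- Inf                        # a point is never its own neighbour
  W <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    nb <- order(D[i, ])[seq_len(k)]     # ties broken by index order
    W[i, nb] <- 1L
  }
  W
}

coords <- cbind(c(0, 1, 2, 10), 0)      # four points on a line
W_knn  <- knn_adjacency_sketch(coords, k = 1L)
```

Point 2 is equidistant from points 1 and 3 and picks point 1 (lower index). Point 4 has point 3 as its neighbour, but not the other way round, so W is asymmetric.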

.gdpar_distance_band_adjacency(coords)

Purpose Internal function constructing a binary distance-band adjacency matrix. The threshold is data-driven: the smallest distance that leaves no observation isolated.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated).

Mathematics

  1. Compute the $n \times n$ Euclidean distance matrix $D$.
  2. Set the diagonal $D_{ii} = \infty$.
  3. The bandwidth threshold is

$$d^* = \max_{i=1,\ldots,n} \min_{j \neq i} D_{ij}$$

i.e., the maximum over all points of their nearest-neighbour distance. This ensures every point has at least one neighbour.

  4. The adjacency matrix is

$$W_{ij} = \begin{cases} 1 & \text{if } D_{ij} \leq d^* \text{ and } i \neq j \\ 0 & \text{otherwise} \end{cases}$$

Returns An $n \times n$ binary (0/1) matrix $W$ with zero diagonal.

Notes

  • Described as a "declared data-driven heuristic."
  • The resulting $W$ is symmetric because Euclidean distance is symmetric.
  • The diagonal is explicitly zeroed after the threshold comparison.
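The four steps can be sketched in base R as follows. The function name is hypothetical and the toy coordinates are chosen so the threshold is easy to verify by hand.

```r
# Sketch: distance-band adjacency with the smallest threshold that isolates no point.
distance_band_adjacency_sketch <- function(coords) {
  D <- as.matrix(stats::dist(coords))
  diag(D) <- Inf
  d_star <- max(apply(D, 1, min))   # largest nearest-neighbour distance
  W <- (D <= d_star) * 1L           # logical matrix to 0/1 integers
  diag(W) <- 0L                     # explicit zero diagonal
  list(W = W, d_star = d_star)
}

coords <- cbind(c(0, 1, 2, 10), 0)  # nearest-neighbour distances: 1, 1, 1, 8
band   <- distance_band_adjacency_sketch(coords)
W_band <- band$W
```

Here the isolated point at x = 10 drives the threshold up to 8, which also links points 1 and 3 (distance 2). A single outlier can therefore make the band graph much denser than a k-nearest-neighbour graph.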

.gdpar_morans_i(resid, W, S0 = sum(W))

Purpose Internal function computing Moran's $I$ statistic for a residual vector under a (possibly asymmetric) spatial weights matrix. Used to test for spatial autocorrelation.

Arguments

Argument Type Meaning
resid numeric vector of length $n$ Residuals (row-aligned with the weights matrix).
W $n \times n$ numeric matrix Spatial weights (binary adjacency or otherwise; need not be symmetric).
S0 numeric scalar (default sum(W)) The sum of all weights $S_0 = \sum_{i,j} w_{ij}$. Pre-computed for efficiency.

Mathematics

Let $\bar{r} = \frac{1}{n}\sum_{i=1}^n r_i$ and define the centred residuals $z_i = r_i - \bar{r}$. Moran's $I$ is

$$I = \frac{n}{S_0} \cdot \frac{\displaystyle\sum_{i=1}^n \sum_{j=1}^n w_{ij}\, z_i\, z_j}{\displaystyle\sum_{i=1}^n z_i^2}$$

In vector notation, with $\mathbf{z} = (z_1, \ldots, z_n)^\top$:

$$I = \frac{n}{S_0} \cdot \frac{\mathbf{z}^\top W \mathbf{z}}{\mathbf{z}^\top \mathbf{z}}$$

The implementation computes $W\mathbf{z}$ via matrix multiplication (W %*% z), then takes the elementwise product $\mathbf{z} \odot (W\mathbf{z})$ and sums, which is equivalent to $\mathbf{z}^\top W \mathbf{z}$.

Returns A numeric scalar: the Moran's $I$ statistic.

Notes

  • Under the null hypothesis of no spatial autocorrelation, $E[I] = -1/(n-1)$. Values well above this expectation indicate positive spatial autocorrelation; values well below it indicate negative autocorrelation.
  • The formula as implemented handles asymmetric $W$ correctly because $\sum_{ij} w_{ij} z_i z_j = \mathbf{z}^\top W \mathbf{z}$ does not require symmetry.
  • No $p$-value or reference distribution is computed here; this is a pure computational helper.
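The vectorised computation can be sketched and checked by hand on a four-point chain. The function name is hypothetical; the arithmetic follows the formula above.

```r
# Sketch: Moran's I as (n / S0) * (z' W z) / (z' z).
morans_i_sketch <- function(resid, W, S0 = sum(W)) {
  n  <- length(resid)
  z  <- resid - mean(resid)            # centred residuals
  Wz <- as.numeric(W %*% z)            # matrix-vector product
  (n / S0) * sum(z * Wz) / sum(z^2)    # sum(z * Wz) equals z' W z
}

# Chain graph 1-2-3-4 with a monotone residual pattern.
W_chain <- matrix(0, 4, 4)
W_chain[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
I_chain <- morans_i_sketch(c(1, 2, 3, 4), W_chain)
```

For this input $z = (-1.5, -0.5, 0.5, 1.5)$, $\mathbf{z}^\top W \mathbf{z} = 2.5$, $\mathbf{z}^\top \mathbf{z} = 5$ and $S_0 = 6$, so $I = (4/6)(2.5/5) = 1/3$, above the null expectation $-1/3$.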

.gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges)

Purpose Internal function generating a length-$n$ vector of resampled row indices for a spatial block bootstrap. The spatial analogue of .gdpar_block_bootstrap_data_indices() for 2-D coordinates.

Arguments

Argument Type Meaning
coords numeric matrix ($n \times 2$) Spatial coordinates (assumed validated and row-aligned).
g positive integer Number of grid cells per axis.
scheme character, "tiled" or "moving" Spatial bootstrap scheme.
random_origin logical If TRUE, the grid origin is randomized per replicate (Politis–Romano–Lahiri randomized partition).
mins numeric vector of length 2 Coordinate minima per axis (bounding-box lower corner).
ranges numeric vector of length 2 Coordinate range per axis (bounding-box extent).

Mathematics

Cell side lengths:

$$\mathbf{L} = \frac{\mathbf{ranges}}{g} = \left(\frac{\text{range}_x}{g},\; \frac{\text{range}_y}{g}\right)$$

Tiled scheme:

  1. Set the origin $\mathbf{o}$. If random_origin = TRUE, draw $\mathbf{u} \sim U(0,1)^2$ and set $\mathbf{o} = \mathbf{mins} - \mathbf{u} \odot \mathbf{L}$; otherwise $\mathbf{o} = \mathbf{mins}$.
  2. Assign each observation $i$ to a cell:

$$c_{x,i} = \left\lfloor \frac{x_i - o_x}{L_x} \right\rfloor, \quad c_{y,i} = \left\lfloor \frac{y_i - o_y}{L_y} \right\rfloor$$

  3. Group observations by cell label $(c_{x,i}, c_{y,i})$.
  4. Sample cells with replacement (uniform) and concatenate their member indices until $\geq n$ indices accumulate. Truncate to exactly $n$.
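
The tiled steps above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's internal code: the name tiled_block_indices_sketch is hypothetical, and mins and ranges are recomputed from coords rather than passed in.

```r
# Tiled spatial block bootstrap indices (illustrative sketch).
tiled_block_indices_sketch <- function(coords, g, random_origin = TRUE) {
  n      <- nrow(coords)
  mins   <- apply(coords, 2, min)
  ranges <- apply(coords, 2, max) - mins
  L      <- ranges / g                               # cell side lengths
  origin <- if (random_origin) mins - stats::runif(2) * L else mins
  cx <- floor((coords[, 1] - origin[1]) / L[1])      # cell label, x axis
  cy <- floor((coords[, 2] - origin[2]) / L[2])      # cell label, y axis
  # split() creates groups only for observed labels, so empty cells never appear.
  cells <- split(seq_len(n), paste(cx, cy, sep = ":"))
  out <- integer(0)
  while (length(out) < n) {                          # sample cells with replacement
    pick <- sample.int(length(cells), 1L)
    out  <- c(out, cells[[pick]])
  }
  out[seq_len(n)]                                    # truncate to exactly n
}

set.seed(1)
coords <- cbind(runif(50), runif(50))
idx    <- tiled_block_indices_sketch(coords, g = 3)
```

Because whole cells are appended, neighbouring observations enter the resample together, which is what preserves short-range dependence within each block.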

Moving scheme:

  1. Repeatedly draw a random seed point $\mathbf{s}$ from the data.
  2. Draw $\mathbf{u} \sim U(0,1)^2$ and set the block origin $\mathbf{o} = \mathbf{s} - \mathbf{u} \odot \mathbf{L}$.
  3. Collect all observations within the axis-aligned square $[\mathbf{o},\, \mathbf{o} + \mathbf{L})$.
  4. Append to the output until $\geq n$ indices accumulate. Truncate to exactly $n$.
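
A minimal base-R sketch of the moving scheme, under the same assumptions as the description above. The name moving_block_indices_sketch is hypothetical, and the sketch assumes a strictly positive coordinate range on both axes.

```r
# Moving spatial block bootstrap indices (illustrative sketch).
moving_block_indices_sketch <- function(coords, g) {
  n    <- nrow(coords)
  mins <- apply(coords, 2, min)
  L    <- (apply(coords, 2, max) - mins) / g         # block side lengths
  out  <- integer(0)
  while (length(out) < n) {
    s <- coords[sample.int(n, 1L), ]                 # seed point drawn from the data
    o <- s - stats::runif(2) * L                     # block origin, so s lies in the block
    inside <- coords[, 1] >= o[1] & coords[, 1] < o[1] + L[1] &
              coords[, 2] >= o[2] & coords[, 2] < o[2] + L[2]
    out <- c(out, which(inside))                     # append the block's members
  }
  out[seq_len(n)]                                    # truncate to exactly n
}

set.seed(2)
coords <- cbind(runif(40), runif(40))
idx    <- moving_block_indices_sketch(coords, g = 3)
```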

Returns An integer vector of length $n$ containing resampled row indices (1-based, with possible repetitions).

Notes

  • In the tiled scheme, only non-empty cells can be sampled: split creates groups only for observed cell labels, so empty cells are implicitly excluded and every sampled cell contributes at least one observation.
  • In the moving scheme, every block is guaranteed non-empty because the block is anchored at a randomly sampled observation and is sized to cover at least that point (assuming the observation falls inside the bounding box, which it does by construction).
  • The random_origin mechanism implements the Politis–Romano–Lahiri randomized partition to break grid-alignment artifacts.

.gdpar_spatial_block_length_auto(coords, resid, scheme, random_origin, mins, ranges, B0 = 200L, var_const = 1, seed = NULL)

Purpose Internal workhorse that data-selects the spatial block size $g$ (the number of grid cells along each axis) for the spatial block bootstrap. It minimises a mean-squared-error (MSE) criterion over a grid of candidate $g$ values, balancing bias (distance from the large-block anchor) against the variance of the bootstrap variance estimator. When the automatic procedure cannot run or fails, it falls back to the $n^{1/4}$ default rate via .gdpar_spatial_default_g.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| coords | numeric $n \times 2$ matrix | Spatial coordinates (columns = $x$, $y$), row-aligned with resid. |
| resid | numeric vector of length $n$ | Model residuals (centred internally: $z_i = \text{resid}_i - \overline{\text{resid}}$). |
| scheme | character string | Spatial bootstrap scheme ("tiled" or "moving"), forwarded to .gdpar_spatial_block_indices. |
| random_origin | logical scalar | Whether to randomise the grid origin in each bootstrap replicate (forwarded to .gdpar_spatial_block_indices). |
| mins | numeric vector of length 2 | Minimum coordinate values $(x_{\min}, y_{\min})$ defining the bounding box of the spatial domain. |
| ranges | numeric vector of length 2 | Coordinate ranges $(r_x, r_y)$ of the spatial domain. |
| B0 | integer (default 200L) | Number of Monte Carlo block-bootstrap replicates per candidate $g$. |
| var_const | numeric scalar (default 1) | Multiplicative constant $c$ scaling the variance-of-variance term in the MSE criterion. |
| seed | integer or NULL | Optional seed set via set.seed() before the bootstrap loop for reproducibility. |

Mathematics

The procedure operates as follows.

  1. Default fallback. Compute $g_{\text{def}} = \lfloor n^{1/4} \rfloor$ via .gdpar_spatial_default_g(n). Early returns use $g_{\text{def}}$ when:

    • $n < 25$,
    • coordinate spread is degenerate ($\text{sd}(x) \le 0$ or $\text{sd}(y) \le 0$),
    • fewer than 3 valid grid points exist,
    • bootstrap variances are non-finite or all zero,
    • the MSE criterion is non-finite, or
    • the MSE minimum falls on the largest-$g$ (smallest-block) boundary.
  2. Design matrix for a spatial-mean surrogate. Construct an $n \times 3$ matrix $$\mathbf{D}_{\text{surr}} = \bigl[\,\mathbf{1},\; (x - \bar x)/s_x,\; (y - \bar y)/s_y\,\bigr]$$ where $s_x, s_y$ are the coordinate standard deviations.

  3. Candidate grid. Define bounds $$g_{\text{lo}} = \max\!\bigl(2,\;\lfloor 0.5\, g_{\text{def}} \rfloor\bigr), \qquad g_{\text{hi}} = \min\!\bigl(\lfloor 3\, g_{\text{def}} \rfloor,\; \lfloor \sqrt{n/3}\,\rfloor\bigr).$$ Generate 6 points on a log-spaced grid in $[g_{\text{lo}},\, g_{\text{hi}}]$, round to unique integers, and retain only $g \ge 2$ with average cell occupancy $n/g^2 \ge 3$.

  4. Bootstrap variance per $g$. For each candidate $g$ and each replicate $b = 1,\dots,B_0$:

    • Draw a spatial block index vector $\mathcal{I}_b$ from .gdpar_spatial_block_indices.
    • Compute a $3$-vector of block-level spatial-mean statistics: $$\mathbf{T}_b = \frac{1}{n}\,\mathbf{D}_{\text{surr}}[\mathcal{I}_b, ]^\top\, z[\mathcal{I}_b].$$
    • Aggregate across coordinates: $$V_g = \sum_{j=1}^{3} \operatorname{Var}_{b}(T_{b,j}).$$
    • Also count the number of unique occupied tiles $n_{\text{tiles}}(g)$.
  5. MSE criterion. Smooth $V_g$ with a running median ($k=3$): $\tilde V_g = \operatorname{runmed}(V_g,\, 3)$. Anchor at $g_{\min}$ (the smallest candidate, i.e. the largest blocks). Then: $$\text{bias}^2(g) = \bigl(\tilde V_g - \tilde V_{g_{\min}}\bigr)^2, \qquad \text{var}(g) = c\;\frac{\tilde V_g^{\,2}}{n_{\text{tiles}}(g)},$$ $$\text{MSE}(g) = \text{bias}^2(g) + \text{var}(g).$$ The variance term reflects the inverse-number-of-blocks scaling (Lahiri 2003), not the Monte Carlo noise from finite $B_0$.

  6. Selection. $g^* = \arg\min_g \text{MSE}(g)$. If $g^*$ equals the last (smallest-block) grid element, the procedure bails out to the $n^{1/4}$ default (anticonservative boundary).
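
Steps 3, 5 and 6 can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the bootstrap variances V are made-up numbers standing in for step 4, and n_tiles is taken to be $g^2$ for simplicity.

```r
# Step 3: candidate grid (illustrative sketch).
candidate_grid_sketch <- function(n, g_def = floor(n^(1 / 4))) {
  g_lo <- max(2, floor(0.5 * g_def))
  g_hi <- min(floor(3 * g_def), floor(sqrt(n / 3)))
  if (g_hi < g_lo) return(integer(0))
  g <- unique(as.integer(round(exp(seq(log(g_lo), log(g_hi), length.out = 6)))))
  g[g >= 2 & n / g^2 >= 3]                 # keep at least ~3 points per cell
}

# Steps 5-6: smoothed MSE criterion and selection (illustrative sketch).
select_g_sketch <- function(g, V, n_tiles, var_const = 1) {
  V_s   <- as.numeric(stats::runmed(V, k = 3L))
  bias2 <- (V_s - V_s[1])^2                # anchor at the smallest g (largest blocks)
  mse   <- bias2 + var_const * V_s^2 / n_tiles
  best  <- which.min(mse)
  list(g = g[best], on_boundary = best == length(g), mse = mse)
}

g   <- candidate_grid_sketch(n = 1000)
V   <- c(1.0, 1.0, 1.1, 1.3, 1.6, 2.0)     # made-up bootstrap variances, one per g
sel <- select_g_sketch(g, V, n_tiles = g^2)
```

With these inputs the grid is 2, 3, 4, 7, 10, 15 and the criterion selects $g = 4$, an interior point, so the boundary fallback would not trigger.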

Returns A named list with three elements:

| Element | Type | Meaning |
| --- | --- | --- |
| block_size | integer | The chosen $g^*$ (or the default $g_{\text{def}}$ on fallback). |
| method | character | "auto" if the data-driven selection succeeded; "rate" on any fallback. |
| reason | character | Human-readable explanation: on success a formatted string with the grid, $B_0$, $c$, and $g^*$; on fallback a diagnostic message indicating why the default was used. |

Notes

  • Fallback cascade. There are six distinct early-return paths, all returning method = "rate" via the inner fb() helper, each with a different reason string. The function never errors; it always returns a valid list.
  • Bootstrap machinery. All block-index generation delegates to .gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges) which implements the spatial tiling and optional random-origin jitter.
  • Side effects. Calls set.seed() when seed is non-NULL. No other side effects.
  • No S3 dispatch. This is an internal utility, not an S3 generic or method.
  • Cell-occupancy bound. The upper cap on $g$ enforces $n/g^2 \ge 3$ (at least ~3 observations per cell on average), a validity constraint for within-cell resampling. This is deliberately looser than the $n^{1/3}$ rate sometimes seen in the spatial bootstrap literature.
  • Running median smoothing. stats::runmed(..., k = 3L) uses endrule = "median" by default, so with $k = 3$ the first and last values are not left unchanged: they are smoothed by Tukey's end-point rule (see stats::smoothEnds).

gdpar_spatial_dependence_diagnostic(object, coords, W = NULL, weights = c("knn", "distance"), k = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), test = c("permutation", "analytic"), n_perm = 999L, level = 0.95, randomize_seed = NULL, seed = NULL, ...)

Purpose Exported diagnostic that quantifies spatial autocorrelation in the residuals of a scalar ($K = 1$, $p = 1$) Empirical-Bayes or full-Bayes fit via Moran's $I$. It tests the null hypothesis of spatial exchangeability (conditional independence). A significant result signals that the model-based (posterior / Laplace) uncertainty is too narrow because spatial dependence in the residuals violates the independence assumption. This is the spatial analogue of gdpar_dependence_diagnostic and the recommended gate before calling gdpar_spatial_dependence_robust.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| object | gdpar_eb_fit or gdpar_fit | A scalar Path 1 fit ($K = 1$, $p = 1$). Asserted by .gdpar_assert_scalar_dep. |
| coords | numeric $n \times 2$ matrix or data frame | Spatial coordinates, row-aligned with training data. Validated by .gdpar_validate_coords. |
| W | numeric $n \times n$ matrix or NULL | User-supplied spatial weight matrix. Overrides weights/k. Diagonal is zeroed internally. Row-standardized before use. |
| weights | character, one of "knn" (default) or "distance" | Neighbourhood construction method when W is NULL. "knn" = $k$-nearest-neighbour adjacency; "distance" = distance-band whose threshold is the smallest that isolates no location. Both produce row-standardized weights. Ignored when W is supplied. |
| k | integer or NULL | Number of neighbours for weights = "knn". Default heuristic: $\max(4,\;\min(\lfloor\log n\rceil,\; n-1))$. Must satisfy $1 \le k \le n-1$. |
| residual_type | character, one of "quantile" (default), "response", "pearson", "deviance" | Type of residual extracted from object via .gdpar_dependence_residuals. "quantile" = randomized quantile (Dunn–Smyth) residuals. |
| test | character, one of "permutation" (default) or "analytic" | Hypothesis test for Moran's $I$. "permutation" = location-relabelling permutation test (two-sided, based on the absolute deviation of $I$ from $E[I]$); "analytic" = Cliff–Ord normal approximation. |
| n_perm | integer (default 999L) | Number of permutations for test = "permutation". Capped below $n!$ for tiny $n$. |
| level | numeric in $(0, 1)$ (default 0.95) | Confidence level used to convert the $p$-value into a verdict. |
| randomize_seed | integer or NULL | Seed for randomized quantile residuals of discrete families; ignored otherwise. |
| seed | integer or NULL | Seed for the permutation test (reproducibility). |
| ... | | Unused; present for signature stability. |

Mathematics

The function implements the following:

Moran's $I$. With row-standardized weights $w_{ij}$ (so that $\sum_j w_{ij} = 1$) and $S_0 = \sum_{i,j} w_{ij} = n$: $$I = \frac{n}{S_0}\;\frac{\sum_{i}\sum_{j} w_{ij}\,(z_i - \bar z)(z_j - \bar z)}{\sum_i (z_i - \bar z)^2}$$ Under spatial exchangeability the expected value is: $$E[I] = -\frac{1}{n - 1}$$

Permutation test. For each of n_perm permutations $\pi$:

  1. Compute $I_\pi$ from the residuals $z_{\pi(i)}$.
  2. Two-sided $p$-value: $$p = \frac{1 + \#\{\pi : |I_\pi - E[I]| \ge |I_{\text{obs}} - E[I]|\}}{\text{n\_perm} + 1}$$
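
The permutation test can be sketched in base R as follows. This is a minimal illustration, assuming a helper name (moran_perm_test_sketch) and a rook-neighbour lattice example that are not part of the package; it omits the permutation cap and the small-sample warnings.

```r
# Location-relabelling permutation test for Moran's I (illustrative sketch).
moran_perm_test_sketch <- function(x, W, n_perm = 999L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  moran <- function(v) {
    z <- v - mean(v)
    (n / sum(W)) * sum(z * as.vector(W %*% z)) / sum(z^2)
  }
  I_obs  <- moran(x)
  E_I    <- -1 / (n - 1)
  I_perm <- replicate(n_perm, moran(sample(x)))   # relabel locations at random
  p <- (1 + sum(abs(I_perm - E_I) >= abs(I_obs - E_I))) / (n_perm + 1)
  list(morans_i = I_obs, expected_i = E_I, p_value = p)
}

# 5 x 5 lattice with rook neighbours and a smooth spatial trend.
xy  <- expand.grid(x = 1:5, y = 1:5)
A   <- (as.matrix(stats::dist(xy, method = "manhattan")) == 1) * 1
W   <- A / rowSums(A)                             # row-standardised
res <- moran_perm_test_sketch(xy$x + xy$y, W, n_perm = 999L, seed = 1)
```

The "+ 1" in numerator and denominator counts the observed arrangement as one of the permutations, so the smallest attainable $p$-value is $1/(\text{n\_perm} + 1)$ rather than zero.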

Analytic (Cliff–Ord) normal approximation. Define: $$S_1 = \frac{1}{2}\sum_{i,j}(w_{ij} + w_{ji})^2, \qquad S_2 = \sum_i\!\Bigl(\sum_j w_{ij} + \sum_j w_{ji}\Bigr)^{\!2}$$ $$\operatorname{Var}[I] = \frac{n^2 S_1 - n S_2 + 3 S_0^2}{S_0^2\,(n^2 - 1)} - E[I]^2$$ $$z = \frac{I - E[I]}{\sqrt{\operatorname{Var}[I]}}, \qquad p = 2\,\Phi(-|z|)$$
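
A minimal base-R sketch of the analytic formulas above. The helper name moran_analytic_sketch is an assumption for illustration; returning NA when the variance is non-positive mirrors the behaviour documented for var_i below.

```r
# Cliff-Ord normal approximation for Moran's I (illustrative sketch).
moran_analytic_sketch <- function(I, W) {
  n   <- nrow(W)
  S0  <- sum(W)
  S1  <- 0.5 * sum((W + t(W))^2)
  S2  <- sum((rowSums(W) + colSums(W))^2)
  E_I <- -1 / (n - 1)
  V_I <- (n^2 * S1 - n * S2 + 3 * S0^2) / (S0^2 * (n^2 - 1)) - E_I^2
  if (!is.finite(V_I) || V_I <= 0) {
    return(list(expected_i = E_I, var_i = NA_real_, z = NA_real_, p_value = NA_real_))
  }
  z <- (I - E_I) / sqrt(V_I)
  list(expected_i = E_I, var_i = V_I, z = z, p_value = 2 * stats::pnorm(-abs(z)))
}

# Four locations on a line, row-standardised neighbour weights.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
W <- A / rowSums(A)

res <- moran_analytic_sketch(I = 0.4, W = W)
```

For this small example $S_0 = 4$, $S_1 = 5.5$ and $S_2 = 17$, so $\operatorname{Var}[I] = 68/240 - 1/9$.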

Returns A list of class c("gdpar_spatial_dependence_diagnostic", "list") with components:

| Component | Type | Meaning |
| --- | --- | --- |
| residual_type | character | The residual type used. |
| n | integer | Number of observations. |
| weights | character | "user" if W was supplied, otherwise the matched weights argument. |
| k | integer | $k$ used for kNN, or NA_integer_ if not applicable. |
| style | character | Always "W" (row-standardized). |
| n_zero_weight | integer | Number of locations with zero row sum in the raw weight matrix. |
| morans_i | numeric | Observed Moran's $I$, or NA if any location has zero weight. |
| expected_i | numeric | $-1/(n-1)$, or NA if undefined. |
| var_i | numeric | Analytic variance of $I$ (Cliff–Ord), or NA when test = "permutation" or the analytic variance is non-positive. |
| z | numeric | $z$-score from the analytic test, or NA when not computed. |
| p_value | numeric | Two-sided $p$-value, or NA if the test is undefined. |
| test | character | "permutation" or "analytic". |
| n_perm | integer | Effective number of permutations (may be less than requested for tiny $n$), or NA for the analytic test. |
| level | numeric | Confidence level used. |
| verdict | character | Human-readable summary string. Three forms: (1) "...Undefined..." if zero-weight locations exist, (2) "Spatial dependence detected..." if $p < \alpha$ (with $\alpha = 1 - \text{level}$), (3) "No evidence against spatial independence..." otherwise. Also handles the case where $p$ is NA. |

Notes

  • S3 dispatch. This is an exported function (not a method). A print method for the returned object is defined immediately below.
  • Guards and assertions.
    • .gdpar_assert_scalar_dep(object, "object") enforces that the fit is scalar ($K = 1$, $p = 1$).
    • stats::var(resid) <= 0 triggers an abort via gdpar_abort with class "gdpar_diagnostic_error".
    • If W is supplied, it must be a numeric $n \times n$ matrix with all finite values; violations abort with class "gdpar_input_error".
    • k must satisfy $1 \le k \le n - 1$; otherwise abort with class "gdpar_input_error".
  • Zero-weight locations. If any location has zero row sum after weight construction, a warning is emitted, morans_i is set to NA, and the verdict reports "Undefined". With kNN and $k \ge 1$ this should never occur (every point gets at least one neighbour).
  • Small-sample warnings. For test = "permutation":
    • $n < 20$: hard warning ("very small… treat the p-value as indicative only").
    • $n < 50$: soft warning ("small… approximate").
  • Permutation cap. max_distinct is factorial(n) for $n \le 10$ and Inf otherwise. n_perm_eff = min(n_perm, max(1, max_distinct - 1)).
  • Analytic test asymmetry warning. If the row-standardized weight matrix Wn is not symmetric (which is typical for kNN and distance-band weights), a warning recommends test = "permutation".
  • Side effects. Calls set.seed() when seed is non-NULL (inside the permutation loop). Calls require_suggested("posterior", ...) to ensure the posterior package is available for extracting posterior draws.
  • Coordinate handling. Coordinates are validated by .gdpar_validate_coords. They are treated as Euclidean; lon/lat data must be projected before calling this function.
  • Residual extraction. Delegates to .gdpar_dependence_residuals(object, residual_type, randomize_seed).
  • Weight construction. kNN via .gdpar_knn_adjacency(coords, k); distance-band via .gdpar_distance_band_adjacency(coords). Both return raw (unstandardized) adjacency matrices.
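
The kNN weight construction and the default-$k$ heuristic can be sketched in base R as below. Both function names are hypothetical stand-ins for the internal helpers named above, whose actual implementation may differ (for example in tie handling).

```r
# Default k heuristic: max(4, min(round(log n), n - 1)) (illustrative sketch).
default_k_sketch <- function(n) as.integer(max(4, min(round(log(n)), n - 1)))

# Row-standardised k-nearest-neighbour weights (illustrative sketch).
knn_weights_sketch <- function(coords, k) {
  n <- nrow(coords)
  D <- as.matrix(stats::dist(coords))      # Euclidean distances
  diag(D) <- Inf                           # a point is never its own neighbour
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    A[i, order(D[i, ])[seq_len(k)]] <- 1   # raw (unstandardised) adjacency
  }
  A / rowSums(A)                           # row-standardise
}

set.seed(42)
coords <- cbind(runif(30), runif(30))
k <- default_k_sketch(30)
W <- knn_weights_sketch(coords, k)
```

Every row of W sums to one and has exactly $k$ non-zero entries, but W is generally not symmetric, because "j is among the nearest neighbours of i" does not imply the converse. That asymmetry is what triggers the analytic-test warning described above.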

print.gdpar_spatial_dependence_diagnostic(x, digits = 3L, ...)

Purpose S3 print method for objects of class gdpar_spatial_dependence_diagnostic. Produces a formatted console summary of the spatial dependence diagnostic (Moran's I test on model residuals, optionally on a k-nearest-neighbour or distance-band spatial weight matrix).

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| x | gdpar_spatial_dependence_diagnostic list | The diagnostic object to print. Expected components: residual_type (character), n (integer), weights (character, one of "knn", "distance", or another value for user-supplied weights), k (integer, number of neighbours when weights == "knn"), morans_i (numeric or NA), expected_i (numeric, $E[I]$ under the null), test (character, "analytic" or "permutation"), z (numeric, analytic z-score), p_value (numeric), n_perm (integer, number of permutation draws), verdict (character). |
| digits | integer scalar (default 3L) | Number of significant digits used by format() when printing numeric statistics. |
| ... | | Absorbed for S3 method compatibility; unused. |

Mathematics None.

Returns Invisibly returns x (the input object unchanged), following standard R print method conventions.

Notes

  • Export. Declared with @export in the roxygen header, so it is exported and registered as an S3 method for the print generic on class gdpar_spatial_dependence_diagnostic.
  • When x$morans_i is NA, the method prints "Moran's I : undefined" and skips printing the expected value, test statistic, and p-value entirely (regardless of x$test).
  • When x$weights is "knn", the printed label includes the value of x$k via sprintf. When "distance", it prints a fixed label. Any other value falls through to "user-supplied, row-standardized".
  • Output is directed to the console via cat() with sep = "".
  • No side effects beyond printing; no error handling.

gdpar_spatial_dependence_robust(object, data, coords, block_size = NULL, residual_type = c("quantile", "response", "pearson", "deviance"), randomize_seed = NULL, scheme = c("tiled", "moving"), random_origin = TRUE, B = 199L, level = 0.95, seed = NULL, iter_warmup = 500L, iter_sampling = 500L, chains = 2L, verbose = TRUE, ...)

Purpose Re-estimates the uncertainty (standard errors and percentile confidence intervals) of every AMM coefficient from a scalar Path 1 fit so that inference is robust to unmodelled spatial dependence in the data. The point estimates themselves are unchanged; only the reported uncertainty is adjusted. This implements the working-independence + robust-variance stance of Liang & Zeger (1986) via a spatial block bootstrap: the model is refitted on B spatial block-bootstrap resamples over the observed coords, and the bootstrap standard deviation and percentile intervals of each coefficient replace (or supplement) the model-based (Laplace / posterior) standard errors. The function is the spatial counterpart of gdpar_dependence_robust (temporal); both share one internal refit engine (.gdpar_dependence_robust_engine).

Arguments

| Argument | Type | Default | Meaning |
| --- | --- | --- | --- |
| object | gdpar_eb_fit or gdpar_fit | | A scalar Path 1 fit (K = 1, p = 1): either an Empirical-Bayes fit from gdpar_eb() or a full-Bayes fit from gdpar(). Validated by .gdpar_assert_scalar_dep. |
| data | data frame | | The original data frame passed to the fitting function. Must be row-aligned with coords. It is resampled by spatial blocks and the model is refit on each resample. Validated by assert_data_frame. |
| coords | numeric matrix or data frame ($n \times 2$) | | Spatial coordinates, row-aligned with data. Validated and coerced by .gdpar_validate_coords. |
| block_size | NULL, positive integer, or "auto" | NULL | Number of grid cells per axis $g$. NULL: variance-optimal rate $g = \max(2, \lfloor n^{1/4} \rceil)$ (decision D100). Positive integer: user-supplied. "auto": data-driven calibration over a grid of $g$ values using residuals (decision D101); falls back to the rate if calibration degenerates. |
| residual_type | character (one of "quantile", "response", "pearson", "deviance") | "quantile" | Type of residuals used only when block_size = "auto" to feed the data-driven block-size selector. "quantile" gives Dunn–Smyth randomized quantile residuals. Ignored when block_size is NULL or a fixed integer. |
| randomize_seed | NULL or integer | NULL | Optional seed for reproducibility of the randomized quantile residuals of discrete families. Used only by the "auto" block-size selector; ignored otherwise. |
| scheme | character (one of "tiled", "moving") | "tiled" | Resampling scheme. "tiled": non-overlapping rectangular cells. "moving": overlapping square blocks anchored on sampled observation points. |
| random_origin | logical scalar | TRUE | When TRUE and scheme = "tiled", the grid origin is randomly shifted per bootstrap replicate (Politis–Romano–Lahiri circular block idea adapted to 2-D), breaking deterministic boundary artifacts at the cost of one extra random draw per refit. |
| B | integer $\geq 1$ | 199L | Number of bootstrap refits. Validated by assert_count. |
| level | numeric in $(0,1)$ | 0.95 | Level for the percentile confidence intervals. Validated by assert_numeric_scalar. |
| seed | NULL or integer | NULL | Optional seed controlling the block resampling and per-refit Stan seeds for reproducibility. Passed through to .gdpar_dependence_robust_engine. |
| iter_warmup | integer $\geq 1$ | 500L | Warmup iterations per refit's conditional HMC. |
| iter_sampling | integer $\geq 1$ | 500L | Sampling iterations per refit's conditional HMC. |
| chains | integer $\geq 1$ | 2L | Number of chains per refit. |
| verbose | logical scalar | TRUE | When TRUE, prints an opt-in cost message describing the number of refits, grid dimensions, scheme, and whether the full-Bayes path is in use. |
| ... | | | Additional arguments absorbed for forward compatibility (passed to .gdpar_dependence_robust_engine). |

Mathematics

Default block-size rate (decision D100, $d = 2$). The number of cells per axis is $g = \max\bigl(2,\,\bigl\lfloor n^{1/4}\bigr\rceil\bigr)$. This is the $d = 2$ specialization of the variance-optimal rate that minimises the mean-squared error of the block-bootstrap variance estimator. Writing $M$ for the number of points per block (linear extent $M^{1/d}$ per axis), the first-order bias from dependence broken at block edges is $O(M^{-1/d})$ and the estimator variance is $O(M/n)$, so

$$\text{MSE}(M) \;\sim\; M^{-2/d} + \frac{M}{n}$$

is minimised at $M \sim n^{d/(d+2)}$. At $d = 1$ this gives $M \sim n^{1/3}$ (the temporal default); at $d = 2$ it gives $M \sim n^{1/2}$ points per block, i.e. $g^2 = n/M \sim n^{1/2}$ cells, hence $g \sim n^{1/4}$ cells per axis. The block side-length is $L_0 = \text{ranges} / g$ (per axis), and each point is assigned to the cell with label pair

$$\text{cell}(i) = \Bigl(\Bigl\lfloor \min\!\Bigl(\frac{x_i - x_{\min}}{L_{0x}},\; g-1\Bigr)\Bigr\rfloor,\; \Bigl\lfloor \min\!\Bigl(\frac{y_i - y_{\min}}{L_{0y}},\; g-1\Bigr)\Bigr\rfloor\Bigr)$$

The clamp at $g - 1$ keeps points on the upper edge of the bounding box in the last cell rather than opening a new one.
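
As a worked illustration of the rate and of the clamped cell assignment, the following base-R sketch tabulates $g$ for a few sample sizes and labels 200 random points. The function names are hypothetical; rounding to the nearest integer follows the $\lfloor \cdot \rceil$ notation used above.

```r
# Default cells per axis: max(2, round(n^(1/4))) (illustrative sketch).
default_g_sketch <- function(n) max(2L, as.integer(round(n^(1 / 4))))

n <- c(50, 200, 1000, 10000)
g <- vapply(n, default_g_sketch, integer(1))
rate_table <- data.frame(n = n, g = g, cells = g^2, points_per_cell = n / g^2)

# Clamped cell labels on a fixed grid anchored at the coordinate minima.
cell_labels_sketch <- function(coords, g) {
  mins <- apply(coords, 2, min)
  L0   <- (apply(coords, 2, max) - mins) / g
  cbind(
    cx = floor(pmin((coords[, 1] - mins[1]) / L0[1], g - 1)),
    cy = floor(pmin((coords[, 2] - mins[2]) / L0[2], g - 1))
  )
}

set.seed(3)
coords <- cbind(runif(200), runif(200))
labels <- cell_labels_sketch(coords, default_g_sketch(200))
```

The sample sizes 50, 200, 1000 and 10000 give $g = 3, 4, 6, 10$, so the average number of points per cell grows roughly like $\sqrt{n}$, as the derivation predicts.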

Resampling.

  • Tiled scheme: non-empty cells are sampled with replacement; the resample is truncated to $n$ observations (introducing a negative bias $O(1/n)$, negligible). When random_origin = TRUE, the grid origin is shifted by a random sub-cell offset per replicate.
  • Moving scheme: overlapping square blocks with the side length of one grid cell ($L_0 = \text{ranges}/g$ per axis) are anchored on sampled observation locations, guaranteeing non-empty blocks.

Data-driven block size ($g$, decision D101). For each candidate $g$ on a grid, $B_0$ cheap (no-refit) spatial block resamples produce the bootstrap variance $V(g)$ of the design-weighted residual functionals

$$\frac{1}{n}\,\bigl[1,\;\tilde{g}_x,\;\tilde{g}_y\bigr]^\top z$$

(the influence directions of the coefficient). $g$ is chosen to minimise an empirical mean-squared error: squared bias (anchored at the largest blocks, the least biased because the dependence-breaking bias grows like $g/\sqrt{n}$) plus a variance term that scales inversely with the number of occupied tiles, $c\,V(g)^2 / n_{\text{tiles}}(g)$, as described for .gdpar_spatial_block_length_auto above, with the $n^{1/4}$ rate as the fallback. A single isotropic $g$ is used.

Returns A list of class c("gdpar_spatial_dependence_robust", "list") with the following components:

| Component | Type | Description |
| --- | --- | --- |
| table | data frame | One row per AMM coefficient. Columns: estimate (original point estimate, unchanged), model_se (model-based SE), robust_se (bootstrap SD), se_ratio (robust_se / model_se), ci_lower and ci_upper (percentile interval at level). |
| block_size | integer | The chosen $g$ (cells per axis). |
| block_size_method | character | One of "rate" (variance-optimal default; also returned when "auto" falls back), "fixed" (user-supplied integer), or "auto" (data-driven calibration succeeded). |
| scheme | character | The resampling scheme used ("tiled" or "moving"). |
| random_origin | logical | Whether random grid-origin shifts were used (relevant only for "tiled"). |
| n_tiles | integer | Number of unique spatial cells at the chosen resolution. |
| B | integer | Requested number of bootstrap refits. |
| B_ok | integer | Number of refits that successfully converged (from .gdpar_dependence_robust_engine). |
| level | numeric | The percentile-interval level. |
| seed | integer or NULL | The seed actually used (may be supplied or internally generated by the engine). |
| warnings | character vector | Accumulated warning messages (single-cell warning if n_tiles <= 1, plus any from the refit engine). |
| refit_diagnostics | list | Aggregate per-refit convergence diagnostics, structured as in gdpar_dependence_robust. |

A print method is defined for this class: print.gdpar_spatial_dependence_robust(x, digits = 3L, ...), documented below.
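
How the columns of table relate to the bootstrap refits can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: robust_table_sketch is a hypothetical name, boot stands for a matrix with one row per successful refit and one column per coefficient, and the simulated draws are placeholders rather than real refit output.

```r
# Summarise bootstrap coefficient draws into the reported columns (illustrative sketch).
robust_table_sketch <- function(estimate, model_se, boot, level = 0.95) {
  alpha     <- 1 - level
  robust_se <- apply(boot, 2, stats::sd)               # bootstrap SD per coefficient
  ci <- apply(boot, 2, stats::quantile,
              probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(
    estimate  = estimate,                              # point estimates are unchanged
    model_se  = model_se,
    robust_se = robust_se,
    se_ratio  = robust_se / model_se,
    ci_lower  = ci[1, ],
    ci_upper  = ci[2, ]
  )
}

set.seed(10)
boot <- cbind(rnorm(199, mean = 1, sd = 0.2),          # placeholder draws, B_ok = 199
              rnorm(199, mean = -0.5, sd = 0.1))
tab  <- robust_table_sketch(estimate = c(1, -0.5), model_se = c(0.1, 0.1), boot = boot)
```

In this made-up example the first coefficient has an se_ratio near 2, the pattern one would read as model-based standard errors understating the dependence-robust uncertainty.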

Notes

  • Input validation. object is checked by .gdpar_assert_scalar_dep (must be a scalar Path 1 fit). coords is validated by .gdpar_validate_coords for dimension and type. Collinear coordinates (zero range on either axis) raise an error via gdpar_abort (class gdpar_input_error). block_size must be NULL, a positive integer, or the string "auto"; any other string triggers an error. random_origin and verbose must be logical scalars. B, iter_warmup, iter_sampling, chains are validated by assert_count. level is validated as a numeric scalar in $(0, 1)$.
  • Single-cell warning. If all locations fall into a single spatial cell at the chosen resolution (n_tiles <= 1), a warning is emitted and the bootstrap SE will collapse toward zero. The warning message is stored in warnings_pre and appended to the returned warnings vector.
  • Full-Bayes detection. The function detects whether object is a full-Bayes fit (inherits(object, "gdpar_fit") && !inherits(object, "gdpar_eb_fit")) to adjust the verbose cost message accordingly (full-Bayes refits use full HMC and are markedly more expensive).
  • Suggested dependencies. Requires cmdstanr (for Stan refits) and posterior (for extracting posterior draws); both are loaded via require_suggested.
  • Internal engine. The actual bootstrap loop is delegated to .gdpar_dependence_robust_engine, which receives a resample_fun closure that calls .gdpar_spatial_block_indices(coords, g, scheme, random_origin, mins, ranges) to generate block indices for each replicate.
  • Coordinate pre-processing. mins (per-axis minima) and ranges (per-axis ranges) are computed from coords and used throughout cell assignment and the resampling closure.
  • ... arguments. Forwarded to .gdpar_dependence_robust_engine; currently absorbed for compatibility with the temporal sibling gdpar_dependence_robust.
  • No dependence modelling. The function does not model spatial dependence; it only makes inference robust to it. Valid for weak / short-range spatial dependence relative to block size; does not rescue strong long-range dependence.
  • Isotropic block. A single isotropic $g$ is used for both coordinate axes; strongly anisotropic residual dependence is a documented limitation.

print.gdpar_spatial_dependence_robust(x, digits = 3L, ...)

Purpose S3 print method for objects of class "gdpar_spatial_dependence_robust". Renders a human-readable summary of spatial-dependence-robust inference results produced by spatial block-bootstrap variance estimation. It displays the block-bootstrap configuration (grid size, scheme, number of non-empty tiles), the number of refits performed and succeeded, the confidence level, a formatted table of coefficient estimates with model-based and robust standard errors and their ratio, a brief interpretation of the se_ratio, refit diagnostics, and any stored warnings.

Arguments

| Argument | Type | Meaning |
| --- | --- | --- |
| x | list (S3 class "gdpar_spatial_dependence_robust") | The spatial-dependence-robust result object. Expected to contain the named elements scheme, block_size_method, block_size, random_origin, n_tiles, B, B_ok, level, table, refit_diagnostics, and warnings. |
| digits | integer(1) (default 3L) | Number of significant digits used when formatting numeric columns of the coefficient table via format(). |
| ... | | Accepted for S3 method signature compatibility; not used in the body. |

Returns Invisibly returns x, the original input object (via invisible(x)). The primary effect is the side effect of printing to the console.

Notes

  1. S3 dispatch. This is an S3 method registered for the generic print on objects of class "gdpar_spatial_dependence_robust". Standard print() dispatch applies.
  2. Header line. Prints the scheme name ("tiled" or "moving") followed by " spatial block bootstrap".
  3. Grid description. Prints block_size × block_size cells. The block_size_method element is checked:
    • "auto" appends " (auto: data-driven calibration)".
    • "rate" appends " (rate: n^(1/4))".
    • "fixed" or any other value (including NULL via the %||% fallback) appends nothing.

    If random_origin is TRUE and scheme is exactly "tiled", the note " (randomized origin)" is also appended. The count of non-empty tiles (n_tiles) is always shown.
  4. Refit summary. Displays the total number of bootstrap refits (B) and how many completed successfully (B_ok).
  5. Confidence level. Prints level (a numeric probability, e.g. 0.95).
  6. Coefficient table formatting. Columns of x$table that are numeric are re-formatted with format(col, digits = digits). The table is then printed with row.names = FALSE.
  7. Interpretation hint. A short explanatory sentence is printed: the se_ratio is defined as robust_se / model_se; a ratio greater than 1 indicates that the model-based standard errors understate the spatial-dependence-robust uncertainty.
  8. Refit diagnostics. Delegates to .gdpar_print_refit_diagnostics(x$refit_diagnostics, digits) (an internal helper defined elsewhere) to print any additional convergence or numerical diagnostics from the bootstrap refits.
  9. Warnings. If x$warnings is a non-empty character vector, up to 5 warnings are printed, each preceded by " - ". If more than 5 exist, a count of remaining warnings is appended.
  10. Edge cases. If block_size_method is NULL, the %||% (null-coalescing) operator defaults to "fixed", so no calibration label is printed. If scheme is not "tiled" or random_origin is not TRUE, the randomized-origin note is omitted. If x$warnings has length zero or is NULL, the warnings section is skipped entirely (the if guards handle this).

R/dims_spec.R

dimwise(a = NULL, b = NULL)

Purpose Constructor for the dims_spec S3 class. It broadcasts a single uniform per-component specification (additive basis a, multiplicative basis b) across all $p$ dimensions of a multivariate $\theta_i$, deferring the actual value of $p$ to the point of consumption (amm_spec). It is the intended value of the dims argument of amm_spec when every dimension shares the same $a_k, b_k$ structure; per-dimension deviations are subsequently layered on with override.

Arguments

  • a : NULL or a one-sided formula. The additive basis applied uniformly to every dimension $k = 1, \ldots, p$ of $\theta_i$. NULL disables the additive component for all dimensions.
  • b : NULL or a one-sided formula. The multiplicative basis applied uniformly to every dimension. NULL disables the multiplicative component for all dimensions.

Mathematics The object encodes, for the canonical AMM form $$ \theta_i[k] = \theta_{\mathrm{ref}}[k] + a_k(x_i) + b_k(x_i)\,\theta_{\mathrm{ref}}[k] + \bigl(W_k(\theta_{\mathrm{ref}}) - W_k(\theta_{\mathrm{anchor}})\bigr)\,x_i, \quad k = 1, \ldots, p, $$ the per-dimension, covariate-only pieces $a_k$ and $b_k$. The cross-dimension modulator $W_k(\theta_{\mathrm{ref}})$ is deliberately excluded from dims_spec because it couples all dimensions of $\theta_{\mathrm{ref}}$ and cannot be factored per $k$ without silently restricting the model to the separable sub-class.

Returns A list of class c("dims_spec", "list") with two components:

  • base : a list list(a = a, b = b) holding the uniform template.
  • overrides : an empty named list list(), to be populated by override.

Notes

  • Both arguments may simultaneously be NULL; this is permitted and yields a dims_spec whose base disables both components.
  • Validation is delegated to assert_one_sided_formula(., allow_null = TRUE) for each of a and b; malformed formulas abort there.
  • The dimension $p$ is intentionally not stored; coherence with $p$, the multivariate $W$ basis, and any overrides is validated later by amm_spec.
  • Bare formulas passed directly to amm_spec's dims argument when $p > 1$ are rejected; wrapping in dimwise() is the explicit opt-in to broadcasting.
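
The returned structure can be illustrated with a small stand-in constructor. This is a sketch, not the package's dimwise(): the names dimwise_sketch and is_one_sided are hypothetical, and the formula check is a simplified substitute for assert_one_sided_formula.

```r
# Simplified one-sided formula check: ~ rhs has length 2, lhs ~ rhs has length 3.
is_one_sided <- function(f) inherits(f, "formula") && length(f) == 2L

# Stand-in for the dims_spec constructor (illustrative sketch).
dimwise_sketch <- function(a = NULL, b = NULL) {
  stopifnot(is.null(a) || is_one_sided(a))
  stopifnot(is.null(b) || is_one_sided(b))
  structure(
    list(base = list(a = a, b = b),   # uniform template; NULL disables a component
         overrides = list()),         # filled later, one entry per overridden k
    class = c("dims_spec", "list")
  )
}

d <- dimwise_sketch(a = ~ x1 + x2)    # additive basis only; p is not stored
```

Note that list(a = a, b = b) keeps b as an explicit NULL element, so the template always carries both slots.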

override(dims, k, a, b)

Purpose Attach a per-dimension override to an existing dims_spec, replacing the additive and/or multiplicative formula for a single dimension index k while leaving the base template and other dimensions untouched. Overrides compose across multiple calls and overwrite on repeated k.

Arguments

  • dims : a dims_spec object (produced by dimwise).
  • k : a positive integer scalar. The dimension index to override. Coherence with the global $p$ is checked later by amm_spec/resolve_dims_spec, not here.
  • a : optional. A one-sided formula replacing the additive basis for dimension k, or NULL to disable the additive component for that dimension only. Missing (omitted) means "inherit from base"; explicitly NULL means "disable for this dimension".
  • b : optional. Same semantics as a for the multiplicative basis.

Mathematics For the overridden dimension $k$, the resolved per-dimension pair $(a_k, b_k)$ becomes $$ a_k = \begin{cases} a^{\text{ov}} & \text{if a supplied} \\ a^{\text{base}} & \text{if a missing} \end{cases}, \qquad b_k = \begin{cases} b^{\text{ov}} & \text{if b supplied} \\ b^{\text{base}} & \text{if b missing} \end{cases}, $$ where an explicitly supplied NULL counts as "supplied" and disables the component for that dimension, as distinct from "missing" (inherit from the base).

Returns A new dims_spec (a modified copy of dims; the input is not mutated, since R lists follow copy-on-modify semantics) with the override registered under the character key as.character(as.integer(k)) in dims$overrides. Each override entry is a list with components a, b, a_set (logical), b_set (logical). Calling override twice with the same k replaces the prior entry for that index.

Notes

  • assert_inherits(dims, "dims_spec", "dims") is called first; non-dims_spec input aborts.
  • assert_count(k, "k") enforces that k is a positive integer scalar.
  • If both a and b are missing, the function aborts via gdpar_abort with class = "gdpar_input_error" and the message: "override(): at least one of 'a' or 'b' must be supplied. To leave a dimension unchanged, do not call override() for it."
  • The missing-vs-NULL distinction is implemented with base::missing(). When a is supplied, assert_one_sided_formula(a, "a", allow_null = TRUE) is run, then ov["a"] <- list(a) and ov$a_set <- TRUE. Single-bracket assignment with a list on the right-hand side is used so that assigning NULL retains the element rather than deleting it, which ov$a <- NULL would do. Symmetrically for b.
  • If no prior override exists for k, a fresh entry list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE) is seeded before applying the supplied arguments, so unsupplied components correctly remain flagged as unset and will inherit from the base at resolution time.
  • Range validation of k against $p$ is not performed here; it is deferred to resolve_dims_spec.
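The missing-vs-NULL behaviour described in these notes can be sketched in base R. This is a minimal illustration, not the package source: dimwise_sketch and override_sketch are hypothetical stand-ins, plain stop()/stopifnot() replace the package's assertion helpers, and the covariate names x1, x2, x3 are made up.

```r
# Hypothetical stand-ins for dimwise()/override(); not the package source.
dimwise_sketch <- function(a = NULL, b = NULL) {
  structure(list(base = list(a = a, b = b), overrides = list()),
            class = c("dims_spec", "list"))
}

override_sketch <- function(dims, k, a, b) {
  stopifnot(inherits(dims, "dims_spec"), length(k) == 1L, k >= 1)
  if (missing(a) && missing(b)) {
    stop("at least one of 'a' or 'b' must be supplied")
  }
  key <- as.character(as.integer(k))
  ov <- dims$overrides[[key]]
  if (is.null(ov)) {
    # Seed a fresh entry: nothing is flagged as set yet.
    ov <- list(a = NULL, b = NULL, a_set = FALSE, b_set = FALSE)
  }
  if (!missing(a)) {
    ov["a"] <- list(a)   # keeps the element even when a is NULL
    ov$a_set <- TRUE
  }
  if (!missing(b)) {
    ov["b"] <- list(b)
    ov$b_set <- TRUE
  }
  dims$overrides[[key]] <- ov
  dims                   # modified copy; the caller's object is untouched
}

d0 <- dimwise_sketch(a = ~ x1, b = ~ x2)
d1 <- override_sketch(d0, 2, a = NULL)   # disable a for k = 2, b still inherits
d2 <- override_sketch(d1, 2, b = ~ x3)   # composes with the earlier call
```

After the second call the entry for key "2" has both flags set, with a held as an explicit NULL, while d0 still has no overrides.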

resolve_dims_spec(dims, p)

Purpose Internal resolver that flattens a dims_spec into the canonical per-dimension representation consumed by amm_spec: a length-p list of list(a, b) pairs, with overrides applied on top of the base template.

Arguments

  • dims : a dims_spec object.
  • p : a positive integer scalar, the global dimension.

Mathematics For each $k \in \{1, \ldots, p\}$ the resolved pair is $$ (a_k, b_k) = \begin{cases} (a^{\text{ov}}_k, b^{\text{ov}}_k) & \text{if an override for } k \text{ exists, per-component when set} \\ (a^{\text{base}}, b^{\text{base}}) & \text{otherwise.} \end{cases} $$ Concretely, $a_k = a^{\text{ov}}_k$ iff ov$a_set is TRUE, else $a_k = a^{\text{base}}$; likewise for $b_k$.

Returns A list of length p. Each entry is list(a = a_k, b = b_k) where a_k/b_k are each either a one-sided formula or NULL.

Notes

  • assert_inherits(dims, "dims_spec", "dims") and assert_count(p, "p") are run first.
  • Before flattening, every override key is parsed with suppressWarnings(as.integer(key)). Any key that is NA, < 1, or > p is collected into bad; if bad is non-empty, the function aborts via gdpar_abort with class = "gdpar_input_error", a sprintf message listing the bad keys and the valid range 1:p, and a data field list(bad_keys = bad, p = p).
  • The flattening loop iterates seq_len(p); for each k it starts from dims$base$a / dims$base$b, then if dims$overrides[[as.character(k)]] is non-NULL it conditionally replaces a_k when isTRUE(ov$a_set) and b_k when isTRUE(ov$b_set). Unset components therefore inherit from the base, realising the missing-vs-NULL semantics established by override.
  • Marked @keywords internal / @noRd; not exported.
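The key validation and flattening loop described above can be approximated as follows. This is a sketch under stated assumptions: resolve_sketch is a hypothetical name, stop() stands in for gdpar_abort, and the dims object is built by hand so the block runs without the package.

```r
# Hypothetical re-implementation of the resolver logic; not the package source.
resolve_sketch <- function(dims, p) {
  keys <- names(dims$overrides)
  idx <- suppressWarnings(as.integer(keys))
  bad <- keys[is.na(idx) | idx < 1L | idx > p]
  if (length(bad) > 0L) {
    stop(sprintf("override keys outside 1:%d: %s", p,
                 paste(bad, collapse = ", ")))
  }
  lapply(seq_len(p), function(k) {
    a_k <- dims$base$a
    b_k <- dims$base$b
    ov <- dims$overrides[[as.character(k)]]
    if (!is.null(ov)) {
      if (isTRUE(ov$a_set)) a_k <- ov$a   # may be NULL: component disabled
      if (isTRUE(ov$b_set)) b_k <- ov$b
    }
    list(a = a_k, b = b_k)
  })
}

# Hand-built dims_spec: dimension 2 disables a and inherits b.
dims <- structure(
  list(base = list(a = ~ x1, b = ~ x2),
       overrides = list("2" = list(a = NULL, b = NULL,
                                   a_set = TRUE, b_set = FALSE))),
  class = c("dims_spec", "list")
)
res <- resolve_sketch(dims, p = 3L)
```

Here res[[2]]$a is NULL while res[[2]]$b is still the base formula; calling resolve_sketch(dims, p = 1L) fails because key "2" is out of range.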

print.dims_spec(x, ...)

Purpose S3 print method for objects of class dims_spec. Renders a compact human-readable summary of the base template and any registered overrides.

Arguments

  • x : a dims_spec object.
  • ... : ignored; present for S3 generic compatibility.

Returns Invisibly returns x.

Notes

  • Output layout:
    • Header line <dims_spec>.
    • base: section printing a : <deparse(formula) or "NULL"> and b : <deparse(formula) or "NULL">. Formula deparsing uses base::deparse.
    • If length(x$overrides) > 0L, an overrides: section enumerating each override. Keys are sorted by their integer value (sort(as.integer(names(x$overrides)))) and printed as k = <int> : <parts>, where <parts> is the semicolon-joined set of a = <deparse or "NULL"> (only if isTRUE(ov$a_set)) and b = <deparse or "NULL"> (only if isTRUE(ov$b_set)). Unset components are omitted from the line.
    • If there are no overrides, prints overrides: <none>.
  • The method is exported (the generic print is dispatched via UseMethod on class "dims_spec", which sits before "list" in the class vector).
  • No validation of x is performed; passing a malformed object may produce confusing output or errors from deparse/subsetting.

R/eb_methods.R

print.gdpar_eb_fit(x, digits = 3L, ...)

Purpose
Provides a concise, human-readable console summary of a fitted Empirical-Bayes model object. This S3 print method dispatches on objects of class gdpar_eb_fit, displaying key model characteristics, parameter estimates, numerical diagnostics of the Laplace approximation, and conditional HMC diagnostics.

Arguments

  • x: A gdpar_eb_fit object, the result of an Empirical-Bayes fitting procedure.
  • digits: Integer scalar. Controls the number of digits for numeric formatting via format(). Defaults to 3L.
  • ...: Additional arguments (unused; included for S3 method consistency).

Mathematics
No explicit mathematical formula is implemented. The method presents estimates and standard errors computed elsewhere.

Returns
Invisibly returns the input object x (type gdpar_eb_fit).

Notes

  • S3 Dispatch: Invoked by print() when the first argument is of class gdpar_eb_fit.
  • Conditional Output: The printed output adapts based on the path component of the object. For "eb_KxP" (Path C, the K×p regime), it prints a multi-dimensional array of estimates (theta_ref_kp_hat, theta_ref_kp_se), slot names, and per-slot condition numbers. For other paths, it prints theta_ref_hat and theta_ref_se as scalars or vectors.
  • Side Effects: Writes directly to the console via cat().
  • NULL-safe Access: Uses the %||% operator (likely from rlang) to provide default values for potentially NULL components (e.g., x$family$name), preventing errors during formatting.
  • Diagnostics: Displays numerical diagnostics (diagnostics_numerical) and, if available, one-line conditional HMC diagnostics (diagnostics).

summary.gdpar_eb_fit(object, level = 0.95, ...)

Purpose
Constructs a structured summary of an Empirical-Bayes fit suitable for programmatic access and further printing. This S3 summary method computes credible intervals, optionally applying the Proposition 7B scalar or tensor correction, and extracts a summary of the conditional posterior if available.

Arguments

  • object: A gdpar_eb_fit object.
  • level: Numeric scalar in the interval (0, 1). Specifies the probability level for credible intervals. Defaults to 0.95.
  • ...: Additional arguments (unused).

Mathematics

  1. Credible Interval Inflation (Correction):
    The standard error (se) is multiplied by an inflation factor inflate to widen the credible interval, accounting for the uncertainty of the reference anchor.

    • Scalar Correction (Path C off):
      $$ \text{inflate} = \sqrt{1 + \frac{C}{\max(1, J)}} $$
      where $C$ is the constant object$eb_correction_constant and $J$ is the number of groups.
    • Tensor Correction (Path C on):
      For each group $g$, slot $k$, and coordinate $c$:
      $$ \text{inflate}_{k,c} = \sqrt{1 + \frac{\mathbf{T}[k, c, c]}{\max(1, J)}} $$
      where $\mathbf{T}$ is the object$correction_tensor_constant.
  2. Credible Interval Calculation:
    The $(1-\alpha)$ credible interval is:
    $$ \text{estimate} \pm z_{1-\alpha/2} \cdot \text{se} \cdot \text{inflate} $$
    where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\alpha = 1 - \text{level}$.
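The two formulas above can be checked numerically with a short sketch. The helper names are hypothetical, and the assumption that the correction tensor is stored as a K × p × p array is made here for illustration only.

```r
# Illustrative helpers only; names and the K x p x p tensor layout are assumptions.
scalar_ci_sketch <- function(estimate, se, C, J, level = 0.95) {
  stopifnot(is.numeric(level), length(level) == 1L, level > 0, level < 1)
  inflate <- sqrt(1 + C / max(1, J))          # scalar correction
  z <- stats::qnorm(1 - (1 - level) / 2)      # z_{1 - alpha/2}
  half <- z * se * inflate
  data.frame(estimate = estimate, se = se,
             lower = estimate - half, upper = estimate + half,
             inflate = inflate)
}

tensor_inflate_sketch <- function(T_arr, J) {
  K <- dim(T_arr)[1]
  p <- dim(T_arr)[2]
  out <- matrix(NA_real_, nrow = K, ncol = p)
  for (k in seq_len(K)) {
    for (cc in seq_len(p)) {
      # Only the diagonal entry T[k, c, c] enters the factor.
      out[k, cc] <- sqrt(1 + T_arr[k, cc, cc] / max(1, J))
    }
  }
  out
}

tab <- scalar_ci_sketch(estimate = 1.2, se = 0.1, C = 3, J = 1)
T_arr <- array(0, dim = c(2, 2, 2))
T_arr[1, 1, 1] <- 3
infl <- tensor_inflate_sketch(T_arr, J = 1)
```

With C = 3 and J = 1 the factor is sqrt(1 + 3) = 2, so the interval half-width doubles relative to the uncorrected interval; a zero tensor entry leaves the factor at 1.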

Returns
An object of class summary.gdpar_eb_fit, which is a list containing:

  • theta_table: A data.frame (or array for Path C) of estimates, standard errors, lower/upper interval bounds, and inflation factors.
  • conditional_summary: A posterior summary (from the posterior package) of the conditional model fit, if available. Otherwise NULL.
  • correction_applied: Logical flag indicating if an EB correction was applied.
  • correction_constant (non-Path C) or correction_tensor (Path C): The correction value(s) used.
  • inflation_factor: The computed inflation factor(s).
  • level, family, link, J_groups, K_slots, p_dim, slot_names (Path C), diagnostics_numerical, diagnostics_hmc, path, call: Various model metadata.

Notes

  • Input Validation: Raises a gdpar_input_error (via gdpar_abort()) if level is not a single numeric value in (0, 1).
  • Conditional Posterior Extraction: Attempts to extract and summarize the conditional posterior draws using posterior::summarise_draws(). Filters out latent parameters (e.g., eta, log_lik) by pattern matching. Errors are silently caught, returning NULL.
  • Path Dependency: The structure of the returned summary, especially theta_table, differs significantly between the scalar (non-Path C) and tensor (Path C, eb_KxP) regimes.

print.summary.gdpar_eb_fit(x, digits = 3L, ...)

Purpose
Formats and prints the summary of an Empirical-Bayes fit produced by summary.gdpar_eb_fit(). This S3 print method provides a detailed, human-readable display of the summary object.

Arguments

  • x: A summary.gdpar_eb_fit object.
  • digits: Integer scalar for numeric formatting. Defaults to 3L.
  • ...: Additional arguments (unused).

Mathematics
No new calculations; presents the pre-computed values from the summary object.

Returns
Invisibly returns the input summary object x.

Notes

  • S3 Dispatch: Invoked by print() when the first argument is of class summary.gdpar_eb_fit.
  • Path-Dependent Output: Prints different sections depending on whether x$path is "eb_KxP" (Path C). For Path C, it prints the tensor-based correction details and the full theta_table. For other paths, it prints the scalar correction constant and inflation factor.
  • Conditional Summary Display: If available, prints the first 8 rows of the conditional posterior summary for a quick overview.
  • Side Effects: Writes directly to the console via cat() and print().

coef.gdpar_eb_fit(object, ...)

Purpose
Extracts coefficient estimates from a fitted empirical Bayes General Dynamic Parameter model (gdpar_eb_fit object). It returns the reference parameter estimates and, if a conditional HMC fit is available, the conditional model parameters (random effects, fixed effects, and raw W parameters).

Arguments

  • object: A gdpar_eb_fit object resulting from a call to a fitting function (e.g., gdpar_eb).
  • ...: Additional arguments (currently unused).

Mathematics
No new mathematical operations. It extracts precomputed quantities:

  • $\widehat{\theta}_{\text{ref}}^{\text{EB}}$: The empirical Bayes estimate of the reference parameter.
  • $\text{SE}(\widehat{\theta}_{\text{ref}})$: Its standard error.
  • $\text{Cov}(\widehat{\theta}_{\text{ref}})$: Its covariance matrix.
  • For conditional parameters, it extracts posterior means and standard deviations from HMC draws: $$ \widehat{\mu}_a = \frac{1}{S} \sum_{s=1}^S a^{(s)}, \quad \text{SD}(a) = \sqrt{\frac{1}{S-1} \sum_{s=1}^S (a^{(s)} - \widehat{\mu}_a)^2} $$ (analogous for b and W), where $S$ is the number of posterior draws.

Returns
A list of class c("gdpar_coef_eb", "gdpar_coef", "list") with components:

  • theta_ref: A list containing:
    • method: Character "EB".
    • estimate: Numeric scalar, $\widehat{\theta}_{\text{ref}}^{\text{EB}}$.
    • se: Numeric scalar, standard error.
    • cov: Numeric matrix, covariance matrix.
    • eb_correction_applied: Logical, whether an EB correction was applied.
    • eb_correction_constant: Numeric, the constant used for EB correction (if any).
  • If object$conditional_fit exists and posterior package is available:
    • a: List with estimate (vector of means) and se (vector of SDs) for a_coef parameters.
    • b: List with estimate and se for b_coef parameters.
    • W: List with estimate and se for W_raw parameters.

Notes

  • S3 method for class gdpar_eb_fit.
  • The conditional parameters (a, b, W) are only extracted if the posterior package is available and the conditional fit object contains draws.
  • The helper function pick(pat) uses a regex pattern pat to match variable names in the posterior draws and returns their means and SDs.
  • The output class inherits from gdpar_coef, allowing use of generic coefficient methods.
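The pattern-matching extraction performed by pick(pat) can be sketched on a plain draws matrix. The function below is a hypothetical stand-in that assumes draws are available as an S × (number of variables) numeric matrix with Stan-style column names; the real helper works on the conditional fit's draws and may differ in detail.

```r
# Hypothetical stand-in for the internal pick(pat) helper.
pick_sketch <- function(draws, pat) {
  keep <- grepl(pat, colnames(draws))
  if (!any(keep)) return(NULL)
  sub <- draws[, keep, drop = FALSE]
  list(estimate = colMeans(sub),            # posterior means
       se = apply(sub, 2, stats::sd))       # posterior SDs (divisor S - 1)
}

set.seed(1)
draws <- matrix(rnorm(600), nrow = 200)     # fake draws, S = 200
colnames(draws) <- c("a_coef[1]", "a_coef[2]", "W_raw[1]")

a <- pick_sketch(draws, "^a_coef\\[")
W <- pick_sketch(draws, "^W_raw\\[")
b <- pick_sketch(draws, "^b_coef\\[")       # no match in this fake fit
```

stats::sd uses the S − 1 divisor, matching the SD formula in the Mathematics block.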

predict.gdpar_eb_fit(object, newdata = NULL, type = c("response", "linear_predictor"), level = 0.95, ...)

Purpose
Computes posterior predictions from the conditional HMC model fit at the plug-in empirical Bayes estimate $\widehat{\theta}_{\text{ref}}^{\text{EB}}$. Supports in-sample prediction only; out-of-sample prediction is deferred to a later phase.

Arguments

  • object: A gdpar_eb_fit object.
  • newdata: Optional data frame with the same variables as training data. Currently must be NULL (in-sample prediction).
  • type: Character string specifying prediction scale:
    • "response" (default): Predictions on the response scale via the family's inverse-link function ($y$).
    • "linear_predictor": Predictions on the linear predictor scale ($\eta$).
  • level: Numeric scalar in $(0,1)$ for the credible interval width. Defaults to $0.95$.
  • ...: Additional arguments (currently unused).

Mathematics
Let $\eta^{(s)}$ be the $s$-th posterior draw of the linear predictor, and $y^{(s)}$ the corresponding response-scale draw. For each observation $i$:

  • Mean: $\bar{\eta}_i = \frac{1}{S} \sum_{s=1}^S \eta_i^{(s)}$ (or $\bar{y}_i$ for response).
  • Credible interval bounds: $Q_{\alpha/2}(\eta_i)$ and $Q_{1-\alpha/2}(\eta_i)$, where $\alpha = 1 - \text{level}$ and $Q$ denotes the sample quantile.

Returns
A list with components:

  • mean: Numeric vector of posterior predictive means (length $n$).
  • lower: Numeric vector of lower credible interval bounds (length $n$).
  • upper: Numeric vector of upper credible interval bounds (length $n$).
  • draws: Numeric matrix of posterior predictive draws (dimensions $S \times n$).
  • level: The credible interval level used.
  • type: The prediction type ("response" or "linear_predictor").

Notes

  • S3 method for class gdpar_eb_fit.
  • If newdata is not NULL, an error of class "gdpar_unsupported_feature_error" is raised, stating that out-of-sample prediction is not yet implemented (deferred to Sub-phase 8.6.C).
  • Requires the posterior package to extract and manipulate HMC draws.
  • The function searches for variables in the conditional fit's draws matching "^eta\\[" (for linear predictor) or "^y_pred\\[" (for response). If none are found, an internal error is raised.
  • Quantiles are computed using stats::quantile with names = FALSE.
  • The draws matrix is transposed from the posterior draws matrix format to $S \times n$.
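The summary step described in these notes (select variables by regex, then take column means and sample quantiles) can be sketched on a plain matrix. predict_sketch is a hypothetical name, and the input is assumed to already be an S × (number of variables) matrix with columns named eta[i] and y_pred[i]; extraction from the cmdstanr fit is not shown.

```r
# Hypothetical sketch of the in-sample summary; not the package source.
predict_sketch <- function(draws, type = c("response", "linear_predictor"),
                           level = 0.95) {
  type <- match.arg(type)
  pat <- if (type == "response") "^y_pred\\[" else "^eta\\["
  keep <- grepl(pat, colnames(draws))
  if (!any(keep)) stop("no variables matching ", pat, " in the draws")
  m <- draws[, keep, drop = FALSE]          # S x n
  alpha <- 1 - level
  q <- function(prob) {
    unname(apply(m, 2, stats::quantile, probs = prob, names = FALSE))
  }
  list(mean = unname(colMeans(m)),
       lower = q(alpha / 2),
       upper = q(1 - alpha / 2),
       draws = unname(m), level = level, type = type)
}

set.seed(7)
S <- 500; n <- 3
draws <- matrix(rnorm(S * n * 2), nrow = S)
colnames(draws) <- c(paste0("eta[", 1:n, "]"), paste0("y_pred[", 1:n, "]"))
out <- predict_sketch(draws, type = "linear_predictor", level = 0.9)
```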

R/eb.R

gdpar_eb(formula, family = gdpar_family("gaussian"), amm = amm_spec(), W = NULL, data, prior = NULL, anchor = "prior_mean", skip_id_check = FALSE, chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L, adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L, verbose = TRUE, seed = NULL, group = NULL, parametrization = c("auto", "ncp", "cp"), id_check_rigor = c("full", "fast"), eb_correction = TRUE, laplace_control = list(), ...)

Purpose

Exported main entry point for Path 1 Empirical-Bayes (EB) estimation of the AMM canonical model. It is the EB counterpart of gdpar(). The function implements a three-step pipeline:

  1. Step (i) — Estimate the population reference $\theta_{\text{ref}}$ by maximizing the marginal (Type II) likelihood via Laplace approximation (cmdstanr::laplace()), with multi-start optimization and adaptive Levenberg–Marquardt ridge perturbation for numerical anti-fragility.
  2. Step (iii) — Sample the lower-level parameters $\xi = (a, b, W, \sigma_*, \phi)$ from the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}})$ via HMC (cmdstanr::sample()).
  3. Optionally apply the scalar Proposition 7B coverage-discrepancy inflation factor to the conditional credible intervals.

The function dispatches across three path regimes based on the resolved $(K, p)$:

  • Path A (8.6.B): $K = 1$, $p = 1$ — the base regime executed inline in the function body.
  • Path B (8.6.C): $K > 1$, $p = 1$ — delegated to .gdpar_eb_run_K().
  • Path C (8.6.D): $K > 1$ and any slot with $p > 1$ — delegated to .gdpar_eb_run_KxP().

Arguments

| Argument | Type | Meaning |
|---|---|---|
| formula | Two-sided formula or gdpar_formula_set | Outcome and RHS specification. Same semantics as gdpar()'s formula. When it inherits from "gdpar_formula_set", the K-input dispatch fires. |
| family | gdpar_family object or named list | Distributional family. Sub-phase 8.6.B supports stan_id in c(1, 2, 3, 4) (Gaussian, Poisson, neg-binomial-2, Bernoulli) for the $K=1$ path. When a named list (not inheriting gdpar_family or gdpar_family_multi), it is treated as a multi-family input for K-input dispatch. |
| amm | amm_spec or named list of amm_spec | AMM specification. Must have amm$p == 1L for the base regime; multivariate ($p > 1$) flows through Path A internally. A named list (not inheriting amm_spec) triggers K-input dispatch. |
| W | W_basis object or NULL | Optional modulating basis (polynomial or B-spline). |
| data | data.frame | Data frame containing all variables referenced by formula and amm. |
| prior | gdpar_prior object or NULL | Prior specification. When NULL, defaults via gdpar_prior() are used. |
| anchor | Numeric scalar, "prior_mean", or "empirical_y" | Anchor value for $\theta_{\text{ref}}$. Default "prior_mean". |
| skip_id_check | Logical scalar | If TRUE, skips the basis-restricted identifiability check. |
| chains | Integer scalar | Number of HMC chains for Step (iii). Default 4L. |
| iter_warmup | Integer scalar | HMC warmup iterations per chain. Default 1000L. |
| iter_sampling | Integer scalar | HMC sampling iterations per chain. Default 1000L. |
| adapt_delta | Numeric scalar | HMC adapt_delta. Default 0.95. |
| max_treedepth | Integer scalar | HMC maximum tree depth. Default 12L. |
| refresh | Integer scalar | HMC refresh interval. Default 100L. |
| verbose | Logical scalar | Controls diagnostic messages and show_messages/show_exceptions in HMC. |
| seed | Integer scalar or NULL | Random seed for reproducibility (Laplace multi-start, parametrization pre-flight, and HMC). |
| group | One-sided formula or NULL | Grouping variable specification. |
| parametrization | Character scalar | One of "auto" (default), "ncp", "cp". Selects CP/NCP sampling parametrization for additive and modulating components in Step (iii). "auto" triggers a pre-flight diagnostic via resolve_parametrization(). |
| id_check_rigor | Character scalar | One of "full" or "fast". Matched but not otherwise consumed in this function body (forwarded to K-path orchestrators). |
| eb_correction | Logical scalar | If TRUE (default), applies the scalar Proposition 7B inflation factor to conditional credible intervals. If FALSE, issues a gdpar_diagnostic_warning about expected $O(n^{-1})$ under-coverage. |
| laplace_control | Named list | Controls for Step (i) Laplace approximation and anti-fragility. Recognized entries: multi_start_M (default 5), kappa_threshold (default 1e10), ridge_init (default 1e-6), epsilon_lm (default sqrt(.Machine$double.eps)), ridge_max_iter (default 10), ridge_grow_factor (default 10.0), laplace_draws (default 1000), optim_algorithm (default "lbfgs"). Resolved by .gdpar_eb_resolve_laplace_control(). |
| ... | Additional arguments | Forwarded to the underlying HMC sampler (conditional_model$sample()) in Step (iii). |

Mathematics

The EB estimator maximizes the marginal (Type II) log-likelihood:

$$ \widehat{\theta}_{\text{ref}}^{\text{EB}} = \arg\max_{\theta_{\text{ref}}} \; \ell_{\text{marg}}(\theta_{\text{ref}}), \qquad \ell_{\text{marg}}(\theta_{\text{ref}}) = \log \int p(y \mid \xi, \theta_{\text{ref}}) \, p(\xi \mid \theta_{\text{ref}}) \, d\xi. $$

The integral is approximated by the Laplace method: for each candidate $\theta_{\text{ref}}$, the inner posterior mode $\widehat{\xi}(\theta_{\text{ref}})$ and Hessian $H(\theta_{\text{ref}})$ are computed, yielding

$$ \ell_{\text{marg}}(\theta_{\text{ref}}) \approx \log p(y \mid \widehat{\xi}, \theta_{\text{ref}}) + \log p(\widehat{\xi} \mid \theta_{\text{ref}}) + \frac{d_\xi}{2}\log(2\pi) - \frac{1}{2}\log\det H(\theta_{\text{ref}}), $$

where $d_\xi$ is the dimension of $\xi$. The marginal standard error for $\theta_{\text{ref}}$ is derived from the inverse of the marginal Hessian (Laplace covariance).
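The Laplace formula can be checked on a toy model where the integral is available in closed form. The sketch below is not how the package computes Step (i), which goes through cmdstanr::laplace(); it assumes a hypothetical one-group Gaussian model $y_i \mid \xi \sim N(\theta_{\text{ref}} + \xi, \sigma^2)$ with $\xi \sim N(0, \tau^2)$, so $d_\xi = 1$ and the approximation is exact.

```r
# Toy check of the Laplace marginal likelihood; model and names are assumptions.
set.seed(42)
n <- 30; sigma <- 1; tau <- 2
y <- rnorm(n, mean = 1.5 + rnorm(1, 0, tau), sd = sigma)

laplace_marg <- function(theta) {
  H <- n / sigma^2 + 1 / tau^2                  # negative Hessian in xi
  xi_hat <- sum(y - theta) / sigma^2 / H        # inner posterior mode
  sum(dnorm(y, theta + xi_hat, sigma, log = TRUE)) +
    dnorm(xi_hat, 0, tau, log = TRUE) +
    0.5 * log(2 * pi) - 0.5 * log(H)            # d_xi = 1
}

exact_marg <- function(theta) {
  # Marginally y ~ N(theta * 1, sigma^2 I + tau^2 11')
  Sigma <- sigma^2 * diag(n) + tau^2 * matrix(1, n, n)
  r <- y - theta
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi) + logdet + sum(r * solve(Sigma, r)))
}

opt <- optimize(laplace_marg, interval = c(-20, 20), maximum = TRUE, tol = 1e-8)
theta_hat <- opt$maximum

# Standard error from the curvature of the marginal log-likelihood.
h <- 1e-3
curv <- -(laplace_marg(theta_hat + h) - 2 * laplace_marg(theta_hat) +
            laplace_marg(theta_hat - h)) / h^2
theta_se <- 1 / sqrt(curv)
```

In this toy the maximizer coincides with mean(y) and the standard error with sqrt(sigma^2 / n + tau^2). For non-Gaussian families the Laplace value is only an approximation.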

Given $\widehat{\theta}_{\text{ref}}^{\text{EB}}$, the conditional posterior is

$$ p(\xi \mid y, \widehat{\theta}_{\text{ref}}^{\text{EB}}) \propto p(y \mid \xi, \widehat{\theta}_{\text{ref}}^{\text{EB}}) \, p(\xi \mid \widehat{\theta}_{\text{ref}}^{\text{EB}}), $$

sampled via HMC in Step (iii).

When eb_correction = TRUE, the scalar Proposition 7B inflation constant $c_{\text{EB}}$ is applied to the conditional credible intervals, inflating their half-widths to account for the $O(n^{-1})$ coverage discrepancy between the EB conditional posterior and the exact marginal posterior.

Returns

An object of class c("gdpar_eb_fit", "list") with the following named components:

| Component | Type | Description |
|---|---|---|
| theta_ref_hat | Numeric vector (length J_groups) | EB point estimates of $\theta_{\text{ref}}$. |
| theta_ref_se | Numeric vector (length J_groups) | Marginal standard errors from the Laplace covariance. |
| conditional_fit | cmdstanr fit object | The HMC fit from Step (iii). |
| amm | amm_spec | The resolved AMM specification. |
| family | gdpar_family | The resolved family object. |
| prior | gdpar_prior | The resolved prior. |
| design | AMM design object | Built by build_amm_design(). |
| anchor | Numeric scalar | The resolved anchor value. |
| stan_data | Named list | The Stan data list (includes K_slots, p_dim). |
| identifiability_report | Report object or NULL | Result of gdpar_check_identifiability(); NULL when skip_id_check = TRUE. |
| diagnostics | gdpar_diagnostics | Diagnostics from the conditional HMC fit, computed by compute_diagnostics(). |
| diagnostics_numerical | Named list | Numerical diagnostics from the Laplace step: kappa, lm_perturbation, lm_n_iter, lm_status (one of "not_needed", "converged", "exhausted"), kappa_post_ridge, multi_start_dispersion, marginal_log_lik_history. For Path C, slot-vectorized counterparts (kappa_per_slot, lm_lambda_per_slot, lm_n_iter_per_slot, lm_status_per_slot) replace the scalars. |
| parametrization | Named list | Contains cp_a (logical), cp_W (logical), and meta (metadata from resolve_parametrization()). |
| group_info | Group info object or NULL | Resolved grouping information. |
| correction_applied | Logical scalar | Whether the Proposition 7B correction was applied. |
| eb_correction_constant | Numeric scalar | The inflation constant when eb_correction = TRUE; NA_real_ otherwise. |
| call | call | The matched call. |
| path | Character scalar | Always "eb". |

Notes

  • Argument matching: parametrization and id_check_rigor are resolved via match.arg() at function entry. call is captured via match.call().

  • Input validation: Delegates to .gdpar_eb_validate_inputs() (defined in a subsequent section) for type discipline of formula, family, amm, data, eb_correction, and laplace_control. If prior is NULL, it is replaced by gdpar_prior(); then assert_inherits() enforces class "gdpar_prior".

  • cmdstanr dependency: require_suggested("cmdstanr", ...) is called to ensure the suggested package is available. The Laplace method requires cmdstanr ≥ 0.7.0.

  • K-input dispatch: Four boolean flags detect multi-slot input patterns:

    • .formula_set_input: formula inherits "gdpar_formula_set".
    • .amm_list_input: amm is a list, does not inherit "amm_spec", and has non-NULL names.
    • .classic_with_amm_calls: formula is a standard two-sided formula (length 3) whose RHS contains a()/b()/W() calls, detected by .gdpar_rhs_has_amm_calls().
    • .family_is_named_list: family is a named list not inheriting "gdpar_family" or "gdpar_family_multi".

    When any of these fires, .gdpar_eb_resolve_K_inputs() builds amm_list_canonical, family_promoted, outcome_name, formula_env, and family_id_k_vector. If resolved $K > 1$, the function checks whether any slot has $p > 1$ (.any_slot_p_gt1); if so, it returns .gdpar_eb_run_KxP() (Path C), otherwise .gdpar_eb_run_K() (Path B). If $K = 1$, the single amm_spec is unwrapped from amm_list_canonical[[1]], family is replaced by family_promoted, and a new formula is reconstructed from the union of all.vars(amm$a) and all.vars(amm$b) (or "1" if both are empty), using K_inputs$formula_env as the environment.

  • Path A (K = 1) pipeline: After K-input resolution (or if no K-input pattern fired), the function proceeds inline:

    1. p_resolved is read from amm$p (defaulting to 1L if absent). K_resolved is always 1L.
    2. .gdpar_eb_check_stan_id_for_path() validates the family's stan_id against the resolved $(K, p)$.
    3. The outcome variable name is extracted from formula[[2]]. If not found in data, a gdpar_input_error is raised. Non-finite values (NA, NaN, Inf) in the outcome trigger a gdpar_input_error with a count.
    4. The RHS formula is extracted as formula[c(1L, 3L)] and updated with ~ . + 0 (no intercept).
    5. If amm$W is non-NULL, it is materialized via materialize_W_basis(amm$W, p = p_resolved).
    6. The AMM design is built via build_amm_design(amm, data, formula_rhs = rhs).
    7. The anchor is resolved via resolve_anchor(anchor, family, y, prior, verbose).
    8. Unless skip_id_check = TRUE, gdpar_check_identifiability() is called with theta_ref_init set to 1 when amm$b is non-NULL and abs(anchor_value) < 1e-8, otherwise anchor_value. If the check does not pass, a gdpar_identifiability_error is raised with the report attached in data = list(report = rep).
    9. Group resolution via .resolve_group_argument(). If a group is present, .check_group_aliasing_c7() is called.
    10. Stan data is assembled via assemble_stan_data(). stan_data$K_slots and stan_data$p_dim are set to the resolved integers.
    11. Parametrization is resolved via resolve_parametrization() (which may run a pre-flight diagnostic when parametrization = "auto").
    12. The marginal Stan model source is generated by .gdpar_eb_generate_stan_marginal(), written to a tempfile via write_stan_to_tempfile(), and compiled via cmdstanr::cmdstan_model().
    13. The marginal likelihood is maximized by .gdpar_eb_maximize_marginal(), returning theta_ref_hat, theta_ref_se, and diagnostics.
    14. The conditional Stan model source is generated by .gdpar_eb_generate_stan_conditional(), written and compiled analogously.
    15. stan_data_cond is a copy of stan_data with theta_ref_data set: when $p > 1$ and length(theta_hat_loc) == J_groups * p, it is reshaped to a J_groups × p matrix (column-major, byrow = FALSE); otherwise it is passed as a flat numeric vector.
    16. HMC sampling is invoked via do.call(conditional_model$sample, sample_args). Extra arguments from ... are merged into sample_args, potentially overriding defaults. seed is included only when non-NULL.
    17. Diagnostics are computed via compute_diagnostics(fit_cond, verbose = verbose).
    18. The EB correction is computed by .gdpar_eb_apply_correction().
  • Errors raised:

    • gdpar_input_error: outcome variable not in data; outcome contains non-finite values.
    • gdpar_identifiability_error: basis-restricted identifiability check failed (with data = list(report = rep)).
    • gdpar_unsupported_feature_error: raised by .gdpar_eb_check_stan_id_for_path() for unsupported stan_id / $(K, p)$ combinations (as documented; the actual raise is inside the helper).
    • gdpar_eb_numerical_error: raised by .gdpar_eb_maximize_marginal() when the condition number exceeds kappa_threshold after adaptive ridge (as documented; the actual raise is inside the helper).
  • Side effects: Writes Stan source files to temporary files on disk; compiles Stan models (may invoke the C++ toolchain); runs optimization and HMC sampling (may produce console output controlled by verbose/refresh).

  • S3 dispatch: The returned object has class c("gdpar_eb_fit", "list"). No S3 methods for this class are defined in R/eb.R; the print, summary, coef, and predict methods are documented under R/eb_methods.R.
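Two of the pipeline steps above are easy to misread, so here is a small base-R sketch of them: the column-major reshape of theta_ref_data (step 15) and the merging of extra arguments into the sampler call (step 16). Values are made up, and the use of utils::modifyList for the merge is an assumption; the source only states that arguments from ... may override defaults.

```r
# Step 15 sketch: flat estimate vector -> J_groups x p matrix, column-major.
J_groups <- 3L
p <- 2L
theta_hat_loc <- c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3)   # illustrative values only
theta_ref_data <- if (p > 1L && length(theta_hat_loc) == J_groups * p) {
  matrix(theta_hat_loc, nrow = J_groups, ncol = p, byrow = FALSE)
} else {
  theta_hat_loc
}

# Step 16 sketch: defaults first, then user extras; seed only when non-NULL.
defaults <- list(chains = 4L, iter_warmup = 1000L, iter_sampling = 1000L,
                 adapt_delta = 0.95, max_treedepth = 12L, refresh = 100L)
extra <- list(adapt_delta = 0.99, thin = 2L)        # stands in for `...`
seed <- NULL
sample_args <- utils::modifyList(defaults, extra)   # assumed merge mechanism
if (!is.null(seed)) sample_args$seed <- seed
# The package would then run do.call(conditional_model$sample, sample_args).
```

With byrow = FALSE the first J_groups entries fill column 1, so row g of the matrix holds the p coordinates for group g only if the flat vector is ordered coordinate by coordinate.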

.gdpar_eb_validate_inputs(formula, family, amm, data, eb_correction, laplace_control)

Purpose

Top-level input validator for the EB (Empirical Bayes) correction pipeline. Called before any dispatch to verify that every public argument conforms to the expected type and structure. Guards the entry point of the EB path and prevents downstream functions from receiving malformed inputs.

Arguments

  • formula (any): Must be either a two-sided R formula of length 3 (y ~ ...) or an object inheriting from class "gdpar_formula_set".
  • family (any): Must be one of: an object inheriting from "gdpar_family", an object inheriting from "gdpar_family_multi" (Path A, $p > 1$), or a named list whose every element inherits from "gdpar_family" with no duplicated or empty names (Path B heterogeneous $K$, sub-phase 8.3.7 pattern).
  • amm (any): Must be an object inheriting from "amm_spec" or a named list (whose elements are expected to be "amm_spec" objects) for Path B $K > 1$.
  • data (any): Must be a data.frame.
  • eb_correction (any): Must be a single, non-NA logical value (TRUE or FALSE).
  • laplace_control (any): Must be a list (possibly empty, possibly unnamed at this stage — naming is enforced downstream in .gdpar_eb_resolve_laplace_control).

Returns

invisible(NULL). The function is called for its side effect of raising errors on invalid input.

Notes

  • Raises an error of class "gdpar_input_error" (via gdpar_abort) for each validation failure, with a condition data field carrying received_class where applicable.
  • The formula check first tests inherits(formula, "gdpar_formula_set"); if that fails, it requires inherits(formula, "formula") and length(formula) == 3L.
  • The family named-list detection (Path B) requires all of: is.list(family), not inheriting from "gdpar_family" or "gdpar_family_multi", non-null names, all names non-empty (nzchar), no duplicated names (anyDuplicated == 0L), and every element inheriting from "gdpar_family" (checked via vapply).
  • The amm named-list detection requires is.list(amm), not inheriting from "amm_spec", and non-null names.
  • The $K > 1$ + $p > 1$ guard (Path C) is explicitly released per Sub-phase 8.6.D (Session 13b, 2026-05-25); Path C is routed to .gdpar_eb_run_KxP() in the dispatcher. Per-path stan_id checks are deferred to .gdpar_eb_check_stan_id_for_path().
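The formula and family predicates listed in these notes translate directly into base R. The sketch below uses hypothetical function names and a toy fam() constructor in place of gdpar_family(); the slot names mu and sigma are made up for the example.

```r
# Hypothetical predicates mirroring the documented checks; not the package source.
is_valid_formula_sketch <- function(formula) {
  inherits(formula, "gdpar_formula_set") ||
    (inherits(formula, "formula") && length(formula) == 3L)
}

is_family_named_list_sketch <- function(family) {
  is.list(family) &&
    !inherits(family, "gdpar_family") &&
    !inherits(family, "gdpar_family_multi") &&
    !is.null(names(family)) &&
    all(nzchar(names(family))) &&
    anyDuplicated(names(family)) == 0L &&
    all(vapply(family, inherits, logical(1), what = "gdpar_family"))
}

# Toy constructor standing in for gdpar_family().
fam <- function(name) {
  structure(list(name = name), class = c("gdpar_family", "list"))
}

ok_multi   <- is_family_named_list_sketch(list(mu = fam("gaussian"),
                                               sigma = fam("gamma")))
single     <- is_family_named_list_sketch(fam("gaussian"))
unnamed    <- is_family_named_list_sketch(list(fam("gaussian")))
duplicated <- is_family_named_list_sketch(list(mu = fam("gaussian"),
                                               mu = fam("gamma")))
```

A single gdpar_family object is itself a list, which is why the inherits() exclusions come before the name checks.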

.gdpar_eb_check_stan_id_for_path(family, K, p)

Purpose

Enforces the per-path supported stan_id set for the EB Stan templates, depending on the resolved $(K, p)$ regime. Called by the dispatcher (iteratively across $K$ slots under Path C) before assembling the family_id_k_vector data field.

Arguments

  • family (list): A single family specification object. Must contain $stan_id (coercible to integer) and $name (character) fields.
  • K (integer/numeric): The resolved number of slots $K$.
  • p (integer/numeric): The resolved number of parametric coordinates.

Mathematics

The supported stan_id sets by regime:

$$ \text{supported}(K, p) = \begin{cases} \{1, 2, 3, 4\} & \text{if } K = 1 \quad \text{(Gaussian, Poisson, NB, Bernoulli)} \\ \{1, 3\} & \text{if } K > 1 \text{ and } p > 1 \quad \text{(Path C: Gaussian } K{=}2 \text{ + NB } K{=}2\text{)} \\ \{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\} & \text{if } K > 1 \text{ and } p = 1 \quad \text{(Path B full set)} \end{cases} $$

Note that the $K = 1$ branch does not distinguish $p = 1$ from $p &gt; 1$; both receive ${1, 2, 3, 4}$.
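The case table above can be sketched as a small lookup. The helper names supported_stan_ids and stan_id_is_supported are hypothetical, and the sketch returns a logical where the documented function raises "gdpar_unsupported_feature_error".

```r
# Hypothetical lookup mirroring the documented (K, p) regimes.
supported_stan_ids <- function(K, p) {
  K <- as.integer(K)
  p <- as.integer(p)
  if (K == 1L) {
    c(1L, 2L, 3L, 4L)      # Gaussian, Poisson, NB, Bernoulli (any p)
  } else if (p > 1L) {
    c(1L, 3L)              # Path C: Gaussian and NB only
  } else {
    c(1L, 3L, 5:13)        # Path B full set
  }
}

# TRUE when the check passes, including the documented NULL short-circuit.
stan_id_is_supported <- function(family, K, p) {
  if (is.null(family$stan_id)) return(TRUE)
  as.integer(family$stan_id) %in% supported_stan_ids(K, p)
}

stan_id_is_supported(list(stan_id = 2, name = "poisson"), K = 1, p = 3)
stan_id_is_supported(list(stan_id = 2, name = "poisson"), K = 2, p = 1)
```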

Returns

invisible(NULL) on success.

Notes

  • If family$stan_id is NULL, the function returns immediately without checking (short-circuit).
  • stan_id is coerced via as.integer().
  • On failure, raises an error of class "gdpar_unsupported_feature_error" via gdpar_abort, with a condition data list containing family, stan_id, K, p, and supported.
  • Under Path C ($K > 1$, $p > 1$), the dispatcher is expected to iterate this check across the $K$ slots before assembling the family_id_k_vector data field.
  • The deferred Path B set $\{5, 6, 7, 8, 9, 10, 11, 12, 13\}$ (Beta, Gamma, Lognormal_loc_scale, Student-t, Tweedie, ZIP, ZINB, Hurdle-Poisson, Hurdle-NB) for the $K > 1, p > 1$ regime is queued for a later iteration of Sub-phase 8.6.D.

.gdpar_eb_resolve_laplace_control(user)

Purpose

Merges a user-supplied laplace_control list with documented defaults, coercing types and validating bounds. Produces the fully resolved control list consumed by downstream Laplace approximation and ridge-perturbation routines.

Arguments

  • user (list): User-supplied control parameters. May be empty. If non-empty, every entry must be named.

Mathematics

Default values:

$$ \begin{aligned} M_{\text{multi-start}} &= 5 \\ \kappa_{\text{threshold}} &= 10^{10} \\ \lambda_{\text{ridge,init}} &= 10^{-6} \\ D_{\text{Laplace}} &= 1000 \\ \text{algorithm} &= \texttt{"lbfgs"} \\ \epsilon_{\text{lm}} &= \sqrt{\epsilon_{\text{mach}}} \\ \texttt{ridge\_max\_iter} &= 10 \\ g_{\text{ridge}} &= 10.0 \end{aligned} $$

where $\epsilon_{\text{mach}}$ is .Machine$double.eps.

Returns

A named list with the following entries, all type-coerced:

| Field | Type | Default |
|---|---|---|
| multi_start_M | integer | 5L |
| kappa_threshold | double | 1e10 |
| ridge_init | double | 1e-6 |
| laplace_draws | integer | 1000L |
| optim_algorithm | character | "lbfgs" |
| epsilon_lm | double | sqrt(.Machine$double.eps) |
| ridge_max_iter | integer | 10L |
| ridge_grow_factor | double | 10.0 |

User-supplied values for recognized names override defaults; unrecognized names are dropped after a warning.

Notes

  • If user is empty (length(user) == 0L), returns the defaults list directly.
  • If user is non-empty but has NULL names or any empty (!nzchar) names, raises an error of class "gdpar_input_error".
  • Unknown entries (not in names(defaults)) trigger a soft warning of class "gdpar_diagnostic_warning" via gdpar_warn and are then dropped from the output.
  • Post-merge type coercion: multi_start_M, laplace_draws, ridge_max_iter are coerced via as.integer(); kappa_threshold, ridge_init, epsilon_lm, ridge_grow_factor via as.double(). optim_algorithm is left as-is.
  • Validation bounds (each raises "gdpar_input_error" on failure):
    • multi_start_M >= 1L
    • kappa_threshold > 0
    • epsilon_lm > 0
    • ridge_max_iter >= 1L
    • ridge_grow_factor > 1
  • laplace_draws is coerced to integer but not bounds-checked in this function.
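The merge, coercion, and bounds rules above can be sketched in base R. This is a hedged illustration: resolve_control_sketch is a hypothetical name, and plain stop(), warning(), and stopifnot() stand in for the package's gdpar_abort and gdpar_warn conditions.

```r
# Hypothetical re-implementation of the documented merge logic.
resolve_control_sketch <- function(user = list()) {
  defaults <- list(
    multi_start_M = 5L, kappa_threshold = 1e10, ridge_init = 1e-6,
    laplace_draws = 1000L, optim_algorithm = "lbfgs",
    epsilon_lm = sqrt(.Machine$double.eps),
    ridge_max_iter = 10L, ridge_grow_factor = 10.0
  )
  if (length(user) == 0L) return(defaults)
  nms <- names(user)
  if (is.null(nms) || any(!nzchar(nms))) stop("every entry must be named")
  unknown <- setdiff(nms, names(defaults))
  if (length(unknown) > 0L) {
    warning("dropping unknown entries: ", paste(unknown, collapse = ", "))
  }
  out <- defaults
  known <- intersect(nms, names(defaults))
  out[known] <- user[known]                      # user values override defaults
  for (nm in c("multi_start_M", "laplace_draws", "ridge_max_iter")) {
    out[[nm]] <- as.integer(out[[nm]])
  }
  for (nm in c("kappa_threshold", "ridge_init", "epsilon_lm",
               "ridge_grow_factor")) {
    out[[nm]] <- as.double(out[[nm]])
  }
  # Documented bounds; laplace_draws is deliberately not checked here.
  stopifnot(out$multi_start_M >= 1L, out$kappa_threshold > 0,
            out$epsilon_lm > 0, out$ridge_max_iter >= 1L,
            out$ridge_grow_factor > 1)
  out
}

ctl <- resolve_control_sketch(list(multi_start_M = 3, ridge_grow_factor = 2))
str(ctl[c("multi_start_M", "ridge_grow_factor", "laplace_draws")])
```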

.gdpar_eb_lm_perturb(cov, control)

Purpose

Adaptive Levenberg-Marquardt ridge perturbation for the empirical posterior covariance matrix returned by cmdstanr::laplace(). Implements component 2 of the four-component anti-fragility strategy, extending the single-step ridge of Sub-phase 8.6.B into an iterative geometric-growth loop.

Arguments

  • cov (numeric matrix): A square symmetric matrix — the empirical posterior covariance. For Path C this is a per-slot block; for Path A/B it is the full $\theta_{\text{ref}}$ covariance.
  • control (list): A resolved laplace_control list (as produced by .gdpar_eb_resolve_laplace_control). Must contain $ridge_init, $ridge_max_iter, $ridge_grow_factor, $kappa_threshold, and $epsilon_lm.

Mathematics

Let $\Sigma$ denote the input covariance (cov) of dimension $n \times n$, with eigenvalues $\{\lambda_i\}_{i=1}^{n}$ and trace mean $\bar{d} = \max(|\text{mean}(\text{diag}(\Sigma))|,\; 10^{-12})$.

Trigger condition. Ridge perturbation is needed if:

$$ \text{needs\_ridge} = \Big(\exists\, i:\; \lambda_i \notin \mathbb{R}_{\text{finite}} \;\text{or}\; \lambda_i \leq 0\Big) \;\;\text{or}\;\; \Big(|\det(\Sigma)| < \epsilon_{\text{lm}}\Big) $$

where $\det(\Sigma) = \prod_i \lambda_i$ when all eigenvalues are finite.

If not needed: returns the original matrix with status "not_needed" and $\kappa_{\text{post}} = \lambda_{\max} / \lambda_{\min}$.

Adaptive loop. Starting with $\lambda \leftarrow \lambda_{\text{ridge,init}}$, for iteration $t = 1, \dots, T_{\max}$:

$$ \lambda_{\text{eff}}^{(t)} = \max\!\Big(\lambda^{(t)},\; 10^{-3} \cdot \bar{d}\Big) $$

$$ \Sigma_{\text{try}}^{(t)} = \Sigma + \lambda_{\text{eff}}^{(t)} \cdot I_n $$

Compute eigenvalues $\{\mu_i^{(t)}\}$ of $\Sigma_{\text{try}}^{(t)}$. If all finite and positive, compute the condition number:

$$ \kappa^{(t)} = \frac{\mu_{\max}^{(t)}}{\mu_{\min}^{(t)}} $$

Convergence: If $\kappa^{(t)} \leq \kappa_{\text{threshold}}$, return $\Sigma_{\text{try}}^{(t)}$ with status "converged".

Growth: Otherwise, $\lambda^{(t+1)} \leftarrow \lambda^{(t)} \cdot g_{\text{ridge}}$ and iterate.

Exhaustion: If the loop completes $T_{\max}$ iterations without convergence, return the last $\Sigma_{\text{try}}$ with status "exhausted" and $\kappa_{\text{post}} = \kappa^{(T_{\max})}$ (or $\infty$ if eigenvalues are still non-finite/non-positive).

Returns

A list with fields:

| Field | Type | Description |
|---|---|---|
| cov_perturbed | numeric matrix | The (possibly ridged) covariance. Equals cov when status = "not_needed"; equals the last cov_try otherwise. |
| lambda_used | numeric | Final effective ridge $\lambda_{\text{eff}}$. Equals 0 when status = "not_needed". |
| n_iter | integer | Number of iterations performed. 0L when status = "not_needed". Equals control$ridge_max_iter when "exhausted". |
| kappa_post | numeric | Condition number after perturbation. Original $\kappa$ when "not_needed"; $\kappa^{(t)}$ at convergence; $\kappa^{(T_{\max})}$ or $\infty$ when "exhausted". |
| status | character | One of c("not_needed", "converged", "exhausted"). |

Notes

  • Eigenvalue computation uses eigen(cov, symmetric = TRUE, only.values = TRUE) wrapped in tryCatch; if it errors, eigenvalues are set to NA_real_, which triggers the ridge path.
  • The determinant is computed as prod(eigs0) only when all eigenvalues are finite; otherwise det_val is NA_real_ and the determinant-based trigger is skipped (but the eigenvalue-based trigger may still fire).
  • trace_mean is clamped to at least $10^{-12}$ to avoid a zero floor when the diagonal is near-zero.
  • The lambda_eff floor of $10^{-3} \cdot \bar{d}$ is applied inside every iteration, so even if control$ridge_init is very small, the effective ridge is bounded below by the trace-mean-scaled floor.
  • When status = "exhausted", the returned cov_perturbed is the last attempted matrix (which may or may not be positive-definite), and kappa_post is Inf if the final eigenvalues are non-finite or non-positive.
  • No error is raised on exhaustion; the caller is expected to inspect status.
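The trigger, adaptive loop, and exhaustion rules above can be sketched in base R. This is an illustrative reconstruction from the documented behaviour, not the package source: lm_perturb_sketch is a hypothetical name, and the sketch assumes cov has finite entries.

```r
# Hypothetical sketch of the documented adaptive ridge loop.
lm_perturb_sketch <- function(cov, control) {
  safe_eigs <- function(m) {
    tryCatch(eigen(m, symmetric = TRUE, only.values = TRUE)$values,
             error = function(e) NA_real_)
  }
  eigs0 <- safe_eigs(cov)
  all_finite <- all(is.finite(eigs0))
  det_val <- if (all_finite) prod(eigs0) else NA_real_
  needs_ridge <- !all_finite || any(eigs0 <= 0) ||
    (is.finite(det_val) && abs(det_val) < control$epsilon_lm)
  if (!needs_ridge) {
    return(list(cov_perturbed = cov, lambda_used = 0, n_iter = 0L,
                kappa_post = max(eigs0) / min(eigs0), status = "not_needed"))
  }
  trace_mean <- max(abs(mean(diag(cov))), 1e-12)
  lambda <- control$ridge_init
  for (t in seq_len(control$ridge_max_iter)) {
    lambda_eff <- max(lambda, 1e-3 * trace_mean)   # trace-scaled floor
    cov_try <- cov + lambda_eff * diag(nrow(cov))
    mu <- safe_eigs(cov_try)
    kappa <- if (all(is.finite(mu)) && all(mu > 0)) max(mu) / min(mu) else Inf
    if (kappa <= control$kappa_threshold) {
      return(list(cov_perturbed = cov_try, lambda_used = lambda_eff,
                  n_iter = t, kappa_post = kappa, status = "converged"))
    }
    lambda <- lambda * control$ridge_grow_factor   # geometric growth
  }
  list(cov_perturbed = cov_try, lambda_used = lambda_eff,
       n_iter = as.integer(control$ridge_max_iter),
       kappa_post = kappa, status = "exhausted")
}

ctl <- list(ridge_init = 1e-6, ridge_max_iter = 10L, ridge_grow_factor = 10,
            kappa_threshold = 1e10, epsilon_lm = sqrt(.Machine$double.eps))
singular <- matrix(1, 2, 2)          # rank-one, so the ridge path is taken
lm_perturb_sketch(singular, ctl)$status
```

One consequence of the determinant trigger worth noting: because $|\det(\Sigma)|$ depends on the scale of $\Sigma$, a well-conditioned covariance with small variances can also enter the ridge path.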

Functions in R/eb.R (section 3 of 8)


.gdpar_eb_generate_stan_marginal(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)

Purpose

Dispatches to the correct Stan template generator for the EB marginal model — the model in which theta_ref (or theta_ref_k) lives in the parameters{} block and is assigned an anchor prior in model{}. This corresponds to Step (i)/(ii) of the EB workflow where the marginal log-likelihood is maximised to obtain the empirical-Bayes anchor estimate. The function selects among four template paths based on the resolved dimensions $(K, p)$.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| prior | list | Prior specification list. Expected elements (consumed downstream by the renderer): theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_W, sigma_y, phi. |
| cp_a | logical (default FALSE) | Centered-parameterization flag for a. When TRUE, a receives the prior normal(0, sigma_a[1]) directly, with no multiplier; when FALSE, the non-centered form applies (prior normal(0, 1) with a * sigma_a[1] scaling). |
| cp_W | logical (default FALSE) | Centered-parameterization flag for W. Semantics mirror cp_a. |
| K | integer (default 1L) | Number of K-slots (groups/series). Coerced to integer at entry. |
| p | integer (default 1L) | Coordinate dimension of the response. Coerced to integer at entry. |
| family | NULL or family object | Passed only to the Path B (K > 1, p = 1) generator generate_stan_code_K. |
| cp_a_per_k | NULL or logical | Per-k centered-parameterization flag for a, forwarded to generate_stan_code_multi (Path A). |
| cp_a_per_K | NULL or logical | Per-K centered-parameterization flag for a, forwarded to generate_stan_code_K (Path B). |

Mathematics

The dispatch is a partition of the $(K, p)$ plane:

$$ \text{template}(K, p) = \begin{cases} \texttt{amm\_eb\_marginal.stan} & K = 1 \wedge p = 1 \\ \texttt{amm\_eb\_marginal\_multi.stan} & K = 1 \wedge p > 1 \\ \texttt{amm\_eb\_marginal\_K.stan} & K > 1 \wedge p = 1 \\ \texttt{amm\_eb\_marginal\_KxP.stan} & K > 1 \wedge p > 1 \end{cases} $$
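As a rough sketch, the partition of the $(K, p)$ plane can be expressed as a name lookup. The function eb_template_name is hypothetical and only returns the template file name; the documented dispatcher instead calls the generator or renderer for the selected path.

```r
# Hypothetical helper: maps (K, p) to the documented template file name.
# The kind argument also covers the conditional templates, which follow
# the same naming pattern.
eb_template_name <- function(K, p, kind = c("marginal", "conditional")) {
  kind <- match.arg(kind)
  K <- as.integer(K)
  p <- as.integer(p)
  suffix <- if (K == 1L && p == 1L) {
    ""
  } else if (K == 1L) {
    "_multi"      # Path A: K = 1, p > 1
  } else if (p == 1L) {
    "_K"          # Path B: K > 1, p = 1
  } else {
    "_KxP"        # Path C: K > 1, p > 1
  }
  paste0("amm_eb_", kind, suffix, ".stan")
}

eb_template_name(1, 1)
eb_template_name(2, 3)
```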

Returns

A character string containing the rendered Stan model code. For the $K=1, p=1$ and $K>1, p>1$ paths the string is produced by .gdpar_eb_render_template; for the other two paths it is produced by generate_stan_code_multi or generate_stan_code_K respectively.

Notes

  • $K$ and $p$ are coerced to integer immediately upon entry (as.integer).
  • The Path C template ($K > 1 \wedge p > 1$) has a restricted placeholder set: only theta_ref, sigma_theta_ref, sigma_a, sigma_b, sigma_y, phi are present. The placeholders {{A_SCALE}}, {{A_PRIOR}}, {{W_SCALE}}, {{W_PRIOR}} are absent because the NCP (non-centered parameterization) is hardcoded per slot per coordinate and W is disabled (decision D39).
  • The function does not itself raise errors; any errors propagate from the downstream generators/renderers.

.gdpar_eb_generate_stan_conditional(prior, cp_a = FALSE, cp_W = FALSE, K = 1L, p = 1L, family = NULL, cp_a_per_k = NULL, cp_a_per_K = NULL)

Purpose

Companion of .gdpar_eb_generate_stan_marginal for Step (iii) of the EB workflow. Generates the EB conditional Stan model, in which theta_ref (or theta_ref_k) has been moved from parameters{} to data{} and the anchor priors are dropped from model{}. The dispatch table is structurally identical to the marginal helper; only the template names differ.

Arguments

Identical to .gdpar_eb_generate_stan_marginal (same names, types, defaults, and meanings).

Mathematics

$$ \text{template}(K, p) = \begin{cases} \texttt{amm\_eb\_conditional.stan} & K = 1 \wedge p = 1 \\ \texttt{amm\_eb\_conditional\_multi.stan} & K = 1 \wedge p > 1 \\ \texttt{amm\_eb\_conditional\_K.stan} & K > 1 \wedge p = 1 \\ \texttt{amm\_eb\_conditional\_KxP.stan} & K > 1 \wedge p > 1 \end{cases} $$

Returns

A character string of rendered Stan model code, sourced from the same generators as the marginal path but with conditional template names.

Notes

  • The conditional templates share the same placeholder set as their marginal counterparts, except that anchor-prior placeholders are consumed only in the marginal path (the conditional path drops them from model{}).
  • $K$ and $p$ are coerced to integer at entry.
  • No errors are raised directly; all are delegated downstream.

.gdpar_eb_render_template(template_name, prior, cp_a, cp_W)

Purpose

Shared low-level renderer for the EB Stan template family. Reproduces the placeholder-substitution logic of generate_stan_code() but restricted to EB templates. It (1) translates legacy single-template names to their canonical-piece equivalents, (2) locates the template file in the installed package or falls back to inst/stan/, (3) injects the canonical helpers piece when the // {{CANONICAL_HELPERS}} marker is present, (4) performs all {{...}} substitutions, and (5) aborts with a structured error if any placeholder remains un-substituted.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| template_name | character | Base name of the .stan template file (e.g. "amm_eb_marginal.stan"). |
| prior | list | Prior specification list; the renderer reads prior$theta_ref, prior$sigma_theta_ref, prior$sigma_a, prior$sigma_b, prior$sigma_W, prior$sigma_y, prior$phi. |
| cp_a | logical | Centered-parameterization flag for a. Controls the values substituted for {{A_SCALE}} and {{A_PRIOR}}. |
| cp_W | logical | Centered-parameterization flag for W. Controls the values substituted for {{W_SCALE}} and {{W_PRIOR}}. |

Mathematics

The placeholder substitution map is:

| Placeholder | Value when cp_* = TRUE | Value when cp_* = FALSE |
|---|---|---|
| {{A_SCALE}} | "" | " * sigma_a[1]" |
| {{A_PRIOR}} | "normal(0, sigma_a[1])" | "normal(0, 1)" |
| {{W_SCALE}} | "" | " * sigma_W[1]" |
| {{W_PRIOR}} | "normal(0, sigma_W[1])" | "normal(0, 1)" |

The prior placeholders map directly: {{PRIOR_THETA_REF}} $\leftarrow$ prior$theta_ref, {{PRIOR_SIGMA_THETA_REF}} $\leftarrow$ prior$sigma_theta_ref, {{PRIOR_SIGMA_A}} $\leftarrow$ prior$sigma_a, {{PRIOR_SIGMA_B}} $\leftarrow$ prior$sigma_b, {{PRIOR_SIGMA_W}} $\leftarrow$ prior$sigma_W, {{PRIOR_SIGMA_Y}} $\leftarrow$ prior$sigma_y, {{PRIOR_PHI}} $\leftarrow$ prior$phi.

Returns

A character string: the fully substituted Stan source code.

Notes

  • Template name translation: "amm_eb_marginal.stan" is mapped to "amm_canonical_eb_marginal.stan" and "amm_eb_conditional.stan" is mapped to "amm_canonical_eb_conditional.stan". All other template names (including the KxP templates) pass through unchanged.
  • File location: If the effective template name starts with "amm_canonical_", the file is sought in system.file("stan", "_canonical_pieces", ...) with a fallback to file.path("inst", "stan", "_canonical_pieces", ...). Otherwise it is sought in system.file("stan", ...) with a fallback to file.path("inst", "stan", ...).
  • Helpers injection: If the template source contains the literal // {{CANONICAL_HELPERS}}, the file amm_canonical_helpers.stan is read from the same _canonical_pieces directory and substituted in place. Templates without this marker (e.g. the KxP EB templates) pass through unchanged.
  • Error — template not found: If the resolved template_path does not exist, calls gdpar_abort with class "gdpar_internal_error" and message "Stan template file '<name>' not found.".
  • Error — helpers not found: If the helpers piece file does not exist, calls gdpar_abort with class "gdpar_internal_error".
  • Error — unsubstituted placeholder: After all substitutions, if the string still contains "{{", the first match of \{\{[A-Za-z0-9_]+\}\} is extracted via regmatches/regexpr and passed to gdpar_abort with class "gdpar_internal_error" and message "Unsubstituted placeholder remains in EB Stan code: <leftover>".
  • All gsub calls use fixed = TRUE, so placeholders are treated as literal strings.
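The substitution and leftover-placeholder check described above can be sketched on an in-memory string. This is a simplified illustration: render_template_sketch is a hypothetical name, it skips file lookup and helpers injection, covers only a subset of the documented placeholders, and uses stop() where the package calls gdpar_abort. The template fragment and the prior string are made up for the example.

```r
# Hypothetical sketch: literal placeholder substitution plus leftover check.
render_template_sketch <- function(src, prior, cp_a = FALSE, cp_W = FALSE) {
  subs <- c(
    "{{A_SCALE}}" = if (cp_a) "" else " * sigma_a[1]",
    "{{A_PRIOR}}" = if (cp_a) "normal(0, sigma_a[1])" else "normal(0, 1)",
    "{{W_SCALE}}" = if (cp_W) "" else " * sigma_W[1]",
    "{{W_PRIOR}}" = if (cp_W) "normal(0, sigma_W[1])" else "normal(0, 1)",
    "{{PRIOR_SIGMA_A}}" = prior$sigma_a
  )
  for (ph in names(subs)) {
    # fixed = TRUE treats the braces literally rather than as a regex.
    src <- gsub(ph, subs[[ph]], src, fixed = TRUE)
  }
  if (grepl("{{", src, fixed = TRUE)) {
    leftover <- regmatches(src, regexpr("\\{\\{[A-Za-z0-9_]+\\}\\}", src))
    stop("Unsubstituted placeholder remains in EB Stan code: ", leftover)
  }
  src
}

# Made-up template fragment, not taken from the package's .stan files.
src <- "a_eff = a_raw{{A_SCALE}}; a_raw ~ {{A_PRIOR}}; sigma_a ~ {{PRIOR_SIGMA_A}};"
render_template_sketch(src, prior = list(sigma_a = "exponential(1)"))
```

With cp_a = FALSE the call yields the non-centered form: a_raw gets a standard-normal prior and is multiplied by sigma_a[1].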

.gdpar_eb_maximize_marginal(model, stan_data, control, seed, verbose)

Purpose

Implements Step (i) of the EB workflow with the anti-fragility strategy of Charter Section 2.8. Runs cmdstanr::optimize() followed by cmdstanr::laplace() on the marginal EB Stan model with multi_start_M independent random inits, retains the init with the highest log-marginal approximation, applies an adaptive Levenberg–Marquardt ridge if the Hessian-derived covariance is ill-conditioned, and assembles the diagnostics needed by the gdpar_eb_fit$diagnostics_numerical slot.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| model | CmdStanModel | A compiled cmdstanr model object exposing $optimize() and $laplace() methods. |
| stan_data | list | Data list for Stan. Must contain J_groups (integer, number of groups). For path dispatch, may contain p_dim (integer, coordinate dimension) and K_slots (integer, number of K-slots). |
| control | list | Control parameters. Must contain: multi_start_M (integer, number of multi-start inits), optim_algorithm (character, passed to optimize), laplace_draws (integer, number of Laplace draws), kappa_threshold (numeric, condition-number gate). |
| seed | NULL or integer | Base random seed. When non-NULL, per-init seeds are as.integer(seed) + m for optimize and as.integer(seed) + 1000L for Laplace. |
| verbose | logical | When TRUE, emits informational messages about failed inits and multimodality warnings. |

Mathematics

Multi-start optimization. For $m = 1, \dots, M$:

$$ \hat{\theta}^{(m)} = \arg\max_{\theta} \; \log p(\theta \mid \text{data}) $$

The best init is selected by the largest finite $\log p$ (the lp__ value from optimize()):

$$ m^\star = \arg\max_{m \in \{1,\dots,M\}} \; \text{lp}^{(m)} $$

Laplace approximation. At the best mode $\hat{\theta}^{(m^\star)}$:

$$ \hat{\Sigma} = \left[ -\nabla^2 \log p(\theta \mid \text{data}) \Big|_{\hat{\theta}} \right]^{-1} $$

Adaptive Levenberg–Marquardt ridge. If $\hat{\Sigma}$ is non-PD or $|\det(\hat{\Sigma})| < \epsilon_{\text{lm}}$, a ridge $\lambda I$ is grown geometrically until the post-ridge condition number satisfies $\kappa_{\text{post}} \leq \kappa_{\text{threshold}}$ or ridge_max_iter is reached:

$$ \hat{\Sigma}_{\text{ridged}} = \hat{\Sigma} + \lambda I $$

Condition-number gate. The final covariance is accepted only if:

$$ \kappa_{\text{post}} \leq \kappa_{\text{threshold}} \quad \wedge \quad \text{status} \neq \text{"exhausted"} $$

Multi-start dispersion. Computed over the finite $\log p$ values:

$$ \text{dispersion} = \frac{\text{sd}(\{\text{lp}^{(m)} : \text{finite}\})}{\max(|\overline{\text{lp}}|, 1)} $$

A dispersion exceeding $0.05$ triggers a multimodality warning when verbose = TRUE.
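The best-init selection and the dispersion statistic can be sketched from a vector of lp__ values. This is a hedged illustration: multi_start_summary is a hypothetical name, the lp values are made up, and the sketch assumes $\overline{\text{lp}}$ is the mean of the finite values.

```r
# Hypothetical summary of a multi-start run from its lp__ history.
multi_start_summary <- function(history_lp) {
  is_fin <- is.finite(history_lp)
  finite_lp <- history_lp[is_fin]
  dispersion <- if (length(finite_lp) >= 2L) {
    stats::sd(finite_lp) / max(abs(mean(finite_lp)), 1)
  } else {
    NA_real_                       # documented: NA with fewer than 2 finite values
  }
  best <- if (length(finite_lp) > 0L) {
    which.max(ifelse(is_fin, history_lp, -Inf))   # failed inits never win
  } else {
    NA_integer_
  }
  list(best_init_index = best, dispersion = dispersion,
       multimodal = isTRUE(dispersion > 0.05))
}

# Made-up history: init 2 failed, init 4 landed in a much worse mode.
multi_start_summary(c(-100, NA, -101, -150))
```

Dividing by $\max(|\overline{\text{lp}}|, 1)$ makes the statistic relative for large log-densities while avoiding division by a near-zero mean.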

Path-aware variable extraction. The theta_ref variable names extracted from the Laplace draws depend on the path:

| Path | Condition | Variable pattern | Expected count |
|---|---|---|---|
| Base | $K=1, p=1$ | theta_ref[1], …, theta_ref[J] (or theta_ref if $J=1$) | $J$ |
| Path A | $p > 1$ (and $K=1$) | theta_ref[j,k] for $j \in 1..J$, $k \in 1..p$ | $J \cdot p$ |
| Path B | $K > 1$ (and $p=1$) | theta_ref_k[j,k] for $j \in 1..J$, $k \in 1..K$ | $J \cdot K$ |

Returns

A list with components:

| Component | Type | Description |
|---|---|---|
| theta_ref_hat | numeric vector (length $J$ or $J \cdot p$ or $J \cdot K$) | Posterior mean of theta_ref from the Laplace draws (colMeans of the draws matrix). |
| theta_ref_se | numeric vector (same length) | Standard errors: $\sqrt{\max(\text{diag}(\hat{\Sigma}), 0)}$. |
| theta_ref_cov | matrix ($n \times n$) | Covariance matrix (possibly ridged). |
| diagnostics | named list | See below. |

The diagnostics list contains:

| Element | Type | Description |
|---|---|---|
| kappa | numeric | Post-ridge condition number $\kappa_{\text{post}}$. |
| lm_perturbation | numeric | The ridge $\lambda$ used (lambda_used). |
| lm_n_iter | integer | Number of LM ridge iterations. |
| lm_status | character | Status from .gdpar_eb_lm_perturb: one of "not_needed", "converged", or "exhausted". |
| kappa_post_ridge | numeric | Duplicate of kappa (from lm_out$kappa_post). |
| multi_start_dispersion | numeric | Dispersion of finite $\log p$ values across multi-start inits; NA if fewer than 2 finite values. |
| marginal_log_lik_history | numeric vector (length $M$) | lp__ from each init; NA for failed inits. |
| best_init_index | integer | The $m^\star$ index of the winning init. |

Notes

  • Init dispatch: The flag is_multi_or_K is TRUE when stan_data$p_dim > 1L or stan_data$K_slots > 1L. In that case, init_m is set to NULL (cmdstanr's default unconstrained-space random sampler is used). Otherwise, .gdpar_eb_make_random_init(stan_data, seed_offset = m, base_seed = seed) is called. Each multi-start iteration uses a distinct seed offset, preserving reproducibility.
  • Optimize call: jacobian = TRUE is always set (required for downstream laplace() to match the unconstrained-scale convention). When init_m is non-NULL, it is wrapped as list(init_m) (single chain). When seed is non-NULL, the per-init seed is as.integer(seed) + m.
  • Laplace call: Uses mode = best_opt, jacobian = TRUE, draws = control$laplace_draws. Seed (if non-NULL) is as.integer(seed) + 1000L.
  • Error — all inits fail: If best_opt is NULL (every optimize() call failed or returned NULL), calls gdpar_abort with class "gdpar_unsupported_feature_error", message recommending gdpar() (FB), and data = list(history_lp = history_lp).
  • Error — Laplace fails: If model$laplace() returns NULL (error caught), calls gdpar_abort with class "gdpar_eb_numerical_error", message about singular/non-PD Hessian at the candidate MAP, and data = list(history_lp, best_idx).
  • Error — missing theta_ref variables (Path B): If the number of theta_ref_k[j,k] variables found in the draws does not equal $J \cdot K$, calls gdpar_abort with class "gdpar_internal_error".
  • Error — missing theta_ref variables (Path A): If the number of theta_ref[j,k] variables found does not equal $J \cdot p$, calls gdpar_abort with class "gdpar_internal_error".
  • Error — missing theta_ref variables (Base): If no theta_ref[...] variables are found and $J = 1$ does not rescue via the bare theta_ref name, calls gdpar_abort with class "gdpar_internal_error" and message "theta_ref variable not found in Laplace draws output.".
  • Error — kappa exceeds threshold: If $\kappa_{\text{post}} > \kappa_{\text{threshold}}$ or lm_out$status == "exhausted", calls gdpar_abort with class "gdpar_eb_numerical_error", a detailed message including $\kappa$, threshold, LM status, iteration count, $\lambda$, and smallest eigenvalue, and data containing kappa, eigenvalues, history_lp, lm_status, lm_n_iter, lm_lambda.
  • Warning — multimodality: When dispersion > 0.05 and verbose = TRUE, calls gdpar_warn with class "gdpar_diagnostic_warning" and data = list(dispersion, history_lp).
  • Covariance computation: If the draws matrix has more than one column, stats::cov(theta_mat) is used; otherwise a $1 \times 1$ matrix from stats::var(theta_mat[, 1]).
  • Eigenvalue computation: eigen(theta_cov, symmetric = TRUE, only.values = TRUE) is attempted in a tryCatch; on error returns NA_real_. The minimum eigenvalue is reported in the kappa-exceeds-threshold error message.
  • Verbose messages: Failed optimize() calls emit a gdpar_inform with class "gdpar_eb_message" when verbose = TRUE.
  • The %||% operator is used for the all_vars fallback (dimnames(draws)$variable %||% character(0L)).

.gdpar_eb_make_random_init(stan_data, seed_offset = 1L, base_seed = NULL)

Purpose

Generates a list of random initial values for Stan in the Empirical Bayes (EB) workflow. .gdpar_eb_maximize_marginal passes the list as init to each multi-start optimize() call in the base regime ($K = 1$, $p = 1$). The structure of the returned list is conditioned on the flags and dimensions carried in stan_data, so that only parameters relevant to the configured model are initialised.

Arguments

  • stan_data (list): The data list prepared for Stan. The following fields are consulted:
    • J_groups (integer): number of reference-parameter groups $J$.
    • use_groups (integer flag, 0/1): whether group-level hyperparameters are active.
    • use_a (integer flag, 0/1): whether the $a$ AMM component is active.
    • J_a (integer): dimension of the $a$ component.
    • use_b (integer flag, 0/1): whether the $b$ AMM component is active.
    • J_b (integer): dimension of the $b$ component.
    • use_W (integer flag, 0/1): whether the $W$ AMM component is active.
    • dim_W (integer): row dimension of the $W$ matrix.
    • d (integer): column dimension of the $W$ matrix (latent dimension).
    • use_dispersion_y (integer flag, 0/1): whether an observation-level dispersion is active.
    • use_dispersion_phi (integer flag, 0/1): whether a $\phi$ dispersion parameter is active.
  • seed_offset (integer, default 1L): integer added to base_seed to derive the RNG seed.
  • base_seed (integer or NULL, default NULL): base seed. If NULL, the global RNG state is left untouched.

Mathematics

When base_seed is non-NULL, the effective seed is

$$ \texttt{rng\_seed} = \texttt{as.integer(base\_seed)} + \texttt{seed\_offset}. $$

The draws are:

$$ \theta_{ref,j} \sim \mathcal{N}(0,\, 0.5^2), \quad j = 1,\dots,J $$

and, when the corresponding flag is set:

$$ \mu_{\theta_{ref}} \sim \mathcal{N}(0,\, 0.5^2), \qquad \sigma_{\theta_{ref}} = |Z| + 0.1, \quad Z \sim \mathcal{N}(0,1) $$

$$ \sigma_a = |Z| + 0.1, \qquad a_{raw} \sim \mathcal{N}(0,\, 0.5^2) \in \mathbb{R}^{J_a} $$

$$ \sigma_b = |Z| + 0.1, \qquad c_{b,raw} \sim \mathcal{N}(0,\, 0.5^2) \in \mathbb{R}^{J_b} $$

$$ \sigma_W = |Z| + 0.1, \qquad W_{raw} \sim \mathcal{N}(0,\, 0.5^2) \in \mathbb{R}^{\texttt{dim\_W} \times d} $$

$$ \sigma_y = |Z| + 0.1, \qquad \phi = |Z| + 1 $$

Returns

A named list suitable for passing as init to a Stan sampler. Scalar parameters are wrapped in 1-element arrays via as.array(); W_raw is a matrix; theta_ref, a_raw, and c_b_raw are numeric vectors.

Notes

  • When base_seed is non-NULL, the function calls set.seed(rng_seed) and registers an on.exit handler that restores the prior .Random.seed state in .GlobalEnv (if it existed) upon return. If .Random.seed did not exist in .GlobalEnv, the handler does nothing (the seed set by set.seed persists).
  • The on.exit handler is registered with add = TRUE, so it composes with any pre-existing exit handlers.
  • Flags are tested with isTRUE(... == 1L). Because == compares by value, a length-one flag equal to 1 (1L, 1, or TRUE) is treated as active; a flag that is NULL, NA, 0, or not of length one is treated as inactive.
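The seeding and restore behaviour in the notes above can be sketched as follows. This is a reduced illustration: make_init_sketch is a hypothetical name, it takes J and a use_a flag directly instead of a full stan_data list, and it draws only theta_ref and the a component.

```r
# Hypothetical sketch: seeded draws that leave the caller's RNG state intact.
make_init_sketch <- function(J, use_a = 0L, J_a = 0L,
                             seed_offset = 1L, base_seed = NULL) {
  if (!is.null(base_seed)) {
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    old_seed <- if (had_seed) get(".Random.seed", envir = .GlobalEnv) else NULL
    # Restore the previous RNG state on exit, if there was one.
    on.exit(
      if (had_seed) assign(".Random.seed", old_seed, envir = .GlobalEnv),
      add = TRUE
    )
    set.seed(as.integer(base_seed) + seed_offset)
  }
  init <- list(theta_ref = stats::rnorm(J, 0, 0.5))
  if (isTRUE(use_a == 1L)) {
    init$sigma_a <- as.array(abs(stats::rnorm(1)) + 0.1)  # 1-element array
    init$a_raw <- stats::rnorm(J_a, 0, 0.5)
  }
  init
}

set.seed(1)
before <- .Random.seed
init <- make_init_sketch(J = 3, use_a = 1L, J_a = 2L, seed_offset = 2L, base_seed = 10)
identical(before, .Random.seed)   # the caller's RNG stream is unchanged
```

Two calls with the same base_seed and seed_offset return identical lists, which is what lets each multi-start init be reproduced independently.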

.gdpar_eb_apply_correction(eb_correction, laplace_result, stan_data, p = 1L, verbose)

Purpose

Entry point for the Proposition 7B coverage-discrepancy correction in the EB workflow. In the scalar regime ($p = 1$) it computes a scalar inflation constant directly; for $p > 1$ it delegates to .gdpar_eb_correction_matrix(). The correction is not applied to the raw draws here; only the scaling object is returned for downstream S3 methods.

Arguments

  • eb_correction (logical): whether the correction should be applied.
  • laplace_result (list): result of the Laplace approximation step. Must contain theta_ref_cov (a matrix, or at least an indexable object for the [1L, 1L] element in the scalar path).
  • stan_data (list): the Stan data list. Passed through but not directly used in the scalar computation.
  • p (integer, default 1L): dimensionality of the reference parameter for the correction.
  • verbose (logical): whether to emit a diagnostic warning when the correction is disabled.

Mathematics

Scalar form ($p = 1$), from v07 Section 6 / v07b Section 5.2:

$$ C_{g,\alpha} = \kappa(\alpha) \cdot \bigl(g'(\xi^*)\bigr)^2 \cdot \bigl(J^{\xi}\bigr)^2 \cdot \bigl(I_{\theta\theta}^{marg}\bigr)^{-1} $$

For the default identity functional $g(\xi) = \xi$ and $p = 1$, this reduces to:

$$ C_{g,\alpha} = \kappa(\alpha) \cdot \mathrm{Var}^{marg}(\theta_{ref}) $$

with $\kappa(\alpha_{95\%}) = 1.92$ hardcoded in the function.

Returns

A list with two elements:

  • applied (logical): TRUE if the correction was successfully computed, FALSE otherwise.
  • constant (numeric scalar): the scalar correction $C_{g,\alpha}$ when applied = TRUE; NA_real_ otherwise.

When $p > 1$, the return value is whatever .gdpar_eb_correction_matrix() produces (a $p \times p$ matrix in constant).

Notes

  • If eb_correction is FALSE and verbose is TRUE, a warning is issued via gdpar_warn() with class "gdpar_diagnostic_warning", stating that intervals will use nominal coverage and may under-cover by $O(n^{-1})$.
  • The marginal variance is extracted as laplace_result$theta_ref_cov[1L, 1L] inside a tryCatch; any error yields NA_real_.
  • If the marginal variance is not finite or is $\leq 0$, the function returns applied = FALSE, constant = NA_real_ silently (no warning).
  • For $p > 1$, p is coerced to integer before the delegation check.

.gdpar_eb_correction_matrix(eb_correction, laplace_result, stan_data, p = 1L, verbose)

Purpose

Computes the matrix-valued Proposition 7B* coverage-discrepancy correction for the multivariate regime ($p > 1$). This is the companion of the scalar path in .gdpar_eb_apply_correction() and implements v07b Section 5.1.

Arguments

  • eb_correction (logical): whether the correction should be applied.
  • laplace_result (list): Laplace approximation result containing theta_ref_cov.
  • stan_data (list): Stan data list (passed through, not used in computation).
  • p (integer, default 1L): dimension of the reference parameter.
  • verbose (logical): intended for diagnostics (not directly used in the body beyond being accepted).

Mathematics

Matrix form (Proposition 7B*, v07b Section 5.1):

$$ C^{*}_{g,\alpha} = \kappa(\alpha) \cdot (J^{\xi})^{\top} \cdot \Sigma^{marg}_{\theta_{ref}} \cdot J^{\xi} $$

For the default identity functional $g(\xi) = \xi$, the Jacobian $J^{\xi} = I_p$, so:

$$ C^{*}_{g,\alpha} = \kappa(\alpha) \cdot \Sigma^{marg}_{\theta_{ref}} $$

with $\kappa(\alpha_{95\%}) = 1.92$. At $p = 1$ this collapses to the scalar form $\kappa(\alpha) \cdot \mathrm{Var}^{marg}(\theta_{ref})$.

Returns

A list with two elements:

  • applied (logical): TRUE if the matrix correction was successfully computed.
  • constant (matrix): the $p \times p$ (or matching cov_mat dimension) correction matrix when applied = TRUE; an NA_real_ matrix of appropriate size otherwise.

Notes

  • The function returns applied = FALSE with an NA matrix, without raising an error or warning, in the following cases:
    1. eb_correction is not TRUE.
    2. laplace_result$theta_ref_cov is NULL, not a matrix, or non-square (extraction wrapped in tryCatch returning NULL on error).
    3. Any element of cov_mat is non-finite.
    4. Eigenvalues of cov_mat (computed via eigen(..., symmetric = TRUE, only.values = TRUE)) are non-finite, or any eigenvalue is $< -10^{-10}$ (i.e., the matrix is not positive semi-definite within tolerance).
  • When the PSD check fails, the returned NA matrix has dimensions matching nrow(cov_mat) / ncol(cov_mat), not necessarily p.
  • The eigenvalue extraction is wrapped in tryCatch returning NA_real_ on error, which then triggers the non-finite check.
  • Downstream S3 methods are expected to fall back to nominal credible intervals when applied = FALSE.
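For the identity functional, the matrix correction and its fallback cases can be sketched as below. This is an illustration under stated assumptions: correction_matrix_sketch is a hypothetical name, it takes the covariance directly rather than a laplace_result list, and the example matrices are made up.

```r
# Hypothetical sketch of the identity-functional correction with PSD check.
correction_matrix_sketch <- function(cov_mat, p, kappa_alpha = 1.92) {
  na_result <- function(nr, nc) {
    list(applied = FALSE, constant = matrix(NA_real_, nr, nc))
  }
  if (is.null(cov_mat) || !is.matrix(cov_mat) ||
      nrow(cov_mat) != ncol(cov_mat)) {
    return(na_result(p, p))
  }
  if (any(!is.finite(cov_mat))) {
    return(na_result(nrow(cov_mat), ncol(cov_mat)))
  }
  eigs <- tryCatch(
    eigen(cov_mat, symmetric = TRUE, only.values = TRUE)$values,
    error = function(e) NA_real_
  )
  # Reject matrices that are not PSD within the documented tolerance.
  if (any(!is.finite(eigs)) || any(eigs < -1e-10)) {
    return(na_result(nrow(cov_mat), ncol(cov_mat)))
  }
  list(applied = TRUE, constant = kappa_alpha * cov_mat)
}

Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)        # made-up PSD covariance
correction_matrix_sketch(Sigma, p = 2)$constant
indefinite <- matrix(c(1, 2, 2, 1), 2, 2)       # eigenvalues 3 and -1
correction_matrix_sketch(indefinite, p = 2)$applied
```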

.gdpar_eb_resolve_K_inputs(formula, amm, W, family, formula_set_input, amm_list_input, classic_with_amm_calls, family_is_named_list)

Purpose

Resolves the three possible K-input patterns (formula set, named list of amm_spec, or classic formula with AMM wrapper calls) into a single canonical amm_list_canonical, and promotes the family scope accordingly. This mirrors the K-input dispatch logic of gdpar() and is the EB-path companion of .gdpar_K. The logic is intentionally duplicated rather than refactored to preserve bit-exact behaviour of golden tests.

Arguments

  • formula (formula or gdpar_formula_set): the model formula or formula set.
  • amm (amm_spec or named list of amm_spec): the AMM specification(s).
  • W (matrix or NULL): the $W$ matrix passed to AMM construction.
  • family (gdpar_family or named list of gdpar_family): the response family specification.
  • formula_set_input (logical): whether formula is a gdpar_formula_set.
  • amm_list_input (logical): whether amm is a named list of amm_spec.
  • classic_with_amm_calls (logical): whether the formula RHS contains a()/b()/W() wrapper calls.
  • family_is_named_list (logical): whether family is a named list (heterogeneous K-slot pattern).

Returns

A list with elements:

  • amm_list_canonical (named list of amm_spec): the resolved canonical AMM specifications, one per K-slot.
  • K (integer): length of amm_list_canonical.
  • outcome_name (character): the name of the outcome variable extracted from the formula.
  • formula_env (environment): the environment associated with the formula.
  • family_promoted: the family object after scope promotion (either a promoted gdpar_family or a heterogeneous family structure).
  • family_id_k_vector (integer vector or NULL): per-observation family IDs when the heterogeneous path is taken; NULL otherwise.

Notes

Three dispatch branches, evaluated in order:

  1. formula_set_input branch: amm must be the default amm_spec() (checked via .gdpar_is_default_amm_spec()); otherwise an error of class "gdpar_input_error" is raised. The canonical list is built by .gdpar_formula_set_to_amm_spec_list(formula, W). outcome_name and formula_env are taken from formula$outcome and formula$env.

  2. amm_list_input branch: amm is used directly as amm_list_canonical. Each slot name must be non-empty (checked via nzchar()), each entry must inherit from class "amm_spec", and slot names must be unique (anyDuplicated(...) == 0L). formula must be a two-sided formula (length(formula) == 3L). Violations raise "gdpar_input_error". outcome_name is as.character(formula[[2L]]); formula_env is environment(formula).

  3. Classic (else) branch: amm must be the default amm_spec(). The first eligible parameter name is extracted from family—either family[[1L]]$param_specs[[1L]]$name (if family_is_named_list) or family$param_specs[[1L]]$name. A gdpar_formula_set is constructed via do.call(gdpar_formula_set, args_for_fs) with the formula named by that parameter, then .gdpar_formula_set_to_amm_spec_list(fs, W) builds the canonical list.

After resolution, $K = \texttt{length(amm\_list\_canonical)}$:

  • $K > 1$: If family_is_named_list, calls .gdpar_resolve_heterogeneous_family_K(family, names(amm_list_canonical)) and unpacks location_family and family_id_k_vector. Otherwise calls .gdpar_promote_scope_per_observation(family, names(amm_list_canonical)) with family_id_k_vector = NULL.
  • $K = 1$: If family_is_named_list, raises "gdpar_input_error" (heterogeneous path requires $K \geq 2$). Otherwise calls .gdpar_promote_scope_per_observation(family, k_name) with family_id_k_vector = NULL.

Errors raised (all via gdpar_abort with class = "gdpar_input_error"):

  • Formula set path with non-default amm.
  • Named-list amm with empty slot name, non-amm_spec entry, or duplicated names.
  • Named-list amm with formula that is not a two-sided formula.
  • Classic path with non-default amm.
  • Heterogeneous family (family_is_named_list = TRUE) resolved to $K = 1$.

The data field of the abort is populated for some errors (e.g., list(slot = ..., received = ...) and list(K = K)).
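
The named-list checks of branch 2 can be sketched in base R. This is an illustrative sketch only, not the package's code: check_amm_list() is a hypothetical name, plain stop() stands in for gdpar_abort, and the stand-in amm_spec object carries nothing but its class.

```r
# Hypothetical helper mirroring the branch-2 checks described above.
check_amm_list <- function(amm, formula) {
  nms <- names(amm)
  # Every slot needs a non-empty name.
  if (is.null(nms) || any(!nzchar(nms))) stop("every slot needs a non-empty name")
  # Every entry must inherit from class "amm_spec".
  is_spec <- vapply(amm, inherits, logical(1), what = "amm_spec")
  if (!all(is_spec)) stop("every entry must inherit from 'amm_spec'")
  # Slot names must be unique.
  if (anyDuplicated(nms) != 0L) stop("slot names must be unique")
  # A two-sided formula has length 3: `~`, LHS, RHS.
  if (length(formula) != 3L) stop("formula must be two-sided")
  list(
    K            = length(amm),
    outcome_name = as.character(formula[[2L]]),
    formula_env  = environment(formula)
  )
}

spec <- structure(list(), class = "amm_spec")  # stand-in object (assumption)
res  <- check_amm_list(list(mu = spec, sigma = spec), y ~ x)
dup  <- try(check_amm_list(list(mu = spec, mu = spec), y ~ x), silent = TRUE)
```

Here res$K is 2 and res$outcome_name is "y", while the duplicated slot name in the second call is rejected.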

.gdpar_eb_run_K(amm_list_canonical, family, data, prior, anchor, outcome_name, formula_env, family_id_k_vector, skip_id_check, chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, verbose, seed, group, parametrization, id_check_rigor, eb_correction, laplace_control, call, ...)

Purpose

Primary orchestrator for the Empirical-Bayes ("eb") estimation path under the regime $K > 1$ slots with $p = 1$ (univariate outcome shared across all slots). Internally labeled as "Sub-phase 8.6.C Path B." The function performs the complete pipeline: input validation, design-matrix construction for $K$ slots, anchor resolution, per-slot and $K$-level identifiability checks, group-aliasing checks, Stan data assembly, Laplace marginal maximization, conditional MCMC sampling, diagnostic computation, and EB correction application, returning a fitted object of class gdpar_eb_fit.

Arguments

| Argument | Type | Meaning |
|---|---|---|
| amm_list_canonical | named list of length $K$ | Canonical AMM (anchor model matrix) specifications. Each element is a list potentially containing $a (formula for the $a$-basis), $b (formula for the $b$-basis), and $W (pre-specified basis matrix or formula). Names become slot_names. |
| family | list / character | Response family specification passed to Stan code generators and data assemblers. |
| data | data.frame | Data containing the outcome column and all covariates referenced in slot formulas. |
| prior | list | Prior specification passed to Stan code generators. |
| anchor | various | Anchor specification (scalar, vector, or special keyword) resolved by resolve_anchor_K. |
| outcome_name | character | Name of the outcome column in data. |
| formula_env | environment | Environment attached to all internally constructed formulas via stats::as.formula(..., env = formula_env). |
| family_id_k_vector | integer vector | Per-slot family identifiers of length $K$, forwarded to .assemble_stan_data_K. |
| skip_id_check | logical | If TRUE, all identifiability checks (per-slot and $K$-level) are bypassed and id_report is set to NULL. |
| chains | numeric/integer | Number of MCMC chains for the conditional model; coerced to integer. |
| iter_warmup | numeric/integer | Warmup iterations; coerced to integer. |
| iter_sampling | numeric/integer | Sampling iterations; coerced to integer. |
| adapt_delta | numeric | Stan NUTS adapt_delta control parameter. |
| max_treedepth | numeric/integer | Stan NUTS maximum tree depth; coerced to integer. |
| refresh | numeric/integer | Stan output refresh interval; coerced to integer. |
| verbose | logical | Controls show_messages, show_exceptions in Stan sampling, and verbosity of helper calls. |
| seed | integer or NULL | Random seed for Stan sampling and Laplace maximization. If non-NULL, coerced to integer. |
| group | various | Grouping specification for hierarchical structure, resolved by .resolve_group_argument. |
| parametrization | character | Requested parametrization ("cp" for centered, otherwise non-centered). In Path B both cp_a and cp_W are set uniformly across all $K$ slots. |
| id_check_rigor | various | Rigor level forwarded to .check_identifiability_K for the $K$-level check. |
| eb_correction | logical | Whether to apply the Proposition 7B coverage-discrepancy correction. |
| laplace_control | list | Control parameters forwarded to .gdpar_eb_maximize_marginal. |
| call | call | The original top-level function call, stored in the returned object. |
| ... | any | Extra named arguments merged into the sample_args list passed to cmdstanr's $sample() method, potentially overriding defaults. |

Mathematics

The function implements a two-stage Empirical Bayes estimator:

Stage 1 — Laplace marginal maximization. The marginal Stan model is generated and compiled. The marginal log-posterior of the anchor parameters is maximized:

$$\hat{\boldsymbol{\theta}}_{\mathrm{ref}} = \arg\max_{\boldsymbol{\theta}_{\mathrm{ref}}} \; \log p(\boldsymbol{\theta}_{\mathrm{ref}} \mid \mathbf{y})$$

where $\boldsymbol{\theta}_{\mathrm{ref}} \in \mathbb{R}^{J_{\mathrm{groups}} \times K}$ is the vector of per-group, per-slot anchor parameters. The Laplace helper returns the mode $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ and its standard error.

Stage 2 — Conditional MCMC. The conditional Stan model is generated, compiled, and sampled with $\hat{\boldsymbol{\theta}}_{\mathrm{ref}}$ plugged in as data (theta_ref_k_data), drawing from:

$$p(\boldsymbol{\xi} \mid \mathbf{y}, \hat{\boldsymbol{\theta}}_{\mathrm{ref}})$$

EB correction (Proposition 7B, scalar form at $p=1$). At $p = 1$, the tensor-valued correction degenerates to a per-slot scalar:

$$C_{g,\alpha}[k] = \kappa(\alpha) \cdot \Sigma^{\mathrm{marg}}_{\theta_{\mathrm{ref},k}}$$

where $\Sigma^{\mathrm{marg}}_{\theta_{\mathrm{ref},k}}$ is the marginal variance of the $k$-th slot's anchor. The correction is applied by .gdpar_eb_apply_correction.

Identifiability diagnostic test point. For each slot $k$, the diagnostic anchor is:

$$\theta_{\mathrm{diag},k} = \begin{cases} 1 & \text{if } b_k \text{ exists and } |\theta_{\mathrm{ref},k}| < 10^{-8} \\ \theta_{\mathrm{ref},k} & \text{otherwise} \end{cases}$$

This avoids testing identifiability at a degenerate zero anchor when a $b$-basis is present.
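
The case distinction above translates directly into a one-line rule. The sketch below is illustrative; diag_anchor() and its argument names are hypothetical and do not appear in the package.

```r
# Hypothetical helper: choose the anchor value at which identifiability is tested.
# has_b indicates whether slot k carries a b-basis.
diag_anchor <- function(theta_ref_k, has_b, tol = 1e-8) {
  if (has_b && abs(theta_ref_k) < tol) 1 else theta_ref_k
}

d1 <- diag_anchor(0,   has_b = TRUE)   # zero anchor with a b-basis: replaced by 1
d2 <- diag_anchor(0,   has_b = FALSE)  # zero anchor without a b-basis: kept
d3 <- diag_anchor(0.3, has_b = TRUE)   # non-degenerate anchor: kept
```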

Returns

A list with S3 class c("gdpar_eb_fit", "list") containing:

| Element | Type / Structure | Description |
|---|---|---|
| theta_ref_hat | numeric | Laplace point estimate of the anchor (flat vector of length $J_{\mathrm{groups}} \times K$). |
| theta_ref_se | numeric | Standard error of the Laplace estimate. |
| conditional_fit | CmdStanMCMC | The cmdstanr fit object from the conditional model. |
| amm_list_canonical | named list | The input AMM list (with $W slots materialized). |
| family | | The input family. |
| prior | | The input prior. |
| design_K | list | Design structure from .build_amm_design_K, containing Z_a_k_list, Z_b_k_list, X, etc. |
| anchor | numeric vector | Resolved anchor values of length $K$ from resolve_anchor_K. |
| stan_data | list | Assembled Stan data list from .assemble_stan_data_K, augmented with K_slots and p_dim. |
| identifiability_report | named list or NULL | Per-slot identifiability reports (named by slot_names) with a K_level attribute; NULL when skip_id_check = TRUE. |
| diagnostics | | MCMC diagnostics from compute_diagnostics. |
| diagnostics_numerical | | Laplace optimizer diagnostics from laplace_result$diagnostics. |
| parametrization | list | Resolved parametrization with elements cp_a (logical), cp_W (logical), cp_a_per_K (NULL), and meta (list with mode = "eb_K_path_B", note, requested). |
| group_info | list or NULL | Resolved group information from .resolve_group_argument. |
| correction_applied | logical | Whether the EB correction was applied. |
| eb_correction_constant | | The correction constant from .gdpar_eb_apply_correction. |
| call | call | The original function call. |
| path | character | Always "eb". |
| K | integer | Number of slots. |
| slot_names | character | Names of the AMM list elements. |

Notes

Input validation errors (class gdpar_input_error):

  • If outcome_name is not a column in data.
  • If the outcome y is a matrix or an array with length(dim(y)) > 1 (Path B requires a length-$n$ univariate vector shared across all $K$ slots).
  • If y contains any non-finite values: for numeric y, any !is.finite(y) (NA, NaN, Inf); for non-numeric y, any is.na(y).

Identifiability errors (class gdpar_identifiability_error):

  • If any per-slot check gdpar_check_identifiability returns rep_k$passed != TRUE, the error data field contains slot (name) and report (the full report).
  • If the $K$-level check .check_identifiability_K returns passed != TRUE, the error data field contains report (the $K$-level report).
  • Both are bypassed when skip_id_check = TRUE.

Formula construction:

  • The union of all variables across all slots' $a and $b formulas is collected. If empty, the RHS is "1"; otherwise it is paste(union_vars, collapse = " + ").
  • The full formula is outcome_name ~ rhs_str with env = formula_env.
  • The RHS is extracted as formula_full[c(1L, 3L)] (a one-sided formula) and updated with ~ . + 0 to remove the intercept.
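
The formula-construction steps can be reproduced with base R and stats alone. This is a minimal sketch under assumptions: the slot list and data below are toy inputs, and collecting variables with all.vars() is an assumption about how the union is formed, not a confirmed detail of the package.

```r
# Toy slot specifications (assumption): each slot may carry a and b formulas.
slots <- list(
  mu    = list(a = ~ x1, b = ~ x2),
  sigma = list(a = ~ x1, b = NULL)
)

# Collect the union of variables across all slots' a and b formulas.
vars_of <- function(f) if (is.null(f)) character(0) else all.vars(f)
union_vars <- unique(unlist(
  lapply(slots, function(s) c(vars_of(s$a), vars_of(s$b))),
  use.names = FALSE
))

# Empty union gives an intercept-only RHS; otherwise join with " + ".
rhs_str <- if (length(union_vars) == 0L) "1" else paste(union_vars, collapse = " + ")

formula_env  <- environment()
formula_full <- stats::as.formula(paste("y", "~", rhs_str), env = formula_env)

# Drop the LHS (element 2) to get a one-sided formula, then remove the intercept.
formula_rhs <- formula_full[c(1L, 3L)]
formula_rhs <- stats::update(formula_rhs, ~ . + 0)

dat <- data.frame(y = c(1, 2, 3), x1 = c(0.1, 0.2, 0.3), x2 = c(1, 0, 1))
X   <- stats::model.matrix(formula_rhs, dat)
```

The resulting design matrix X has one column per covariate and no intercept column.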

W basis materialization:

  • For each slot with a non-NULL $W element, materialize_W_basis(amm_list_canonical[[k]]$W, p = 1L) is called in place, mutating amm_list_canonical.

Parametrization resolution:

  • Both cp_a and cp_W are set to identical(parametrization, "cp"), meaning the same parametrization is applied uniformly across all $K$ slots. The meta$note explicitly states that per-slot preflight (cp_a_per_K) is queued but not yet implemented.

theta_ref_k_data reshaping:

  • theta_hat is extracted as as.numeric(laplace_result$theta_ref_hat).
  • J_groups_loc is read from stan_data$J_groups and coerced to integer.
  • The if/else branches both produce matrix(theta_hat, nrow = J_groups_loc, ncol = K, byrow = FALSE) — the two branches are functionally identical, suggesting a placeholder for future differentiation.
  • The resulting matrix is assigned to stan_data_cond$theta_ref_k_data, intended for Stan's array[J_groups] vector[K] consumer.
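
A small numeric example makes the layout concrete. With byrow = FALSE the flat vector is read column-major, so the first J_groups entries fill slot 1. The values below are made up for illustration.

```r
# Toy flat Laplace estimate for J_groups = 3 groups and K = 2 slots (values are invented).
theta_hat <- c(0.1, 0.2, 0.3,   # slot 1, groups 1..3
               1.1, 1.2, 1.3)   # slot 2, groups 1..3
J_groups_loc <- 3L
K <- 2L

theta_ref_k_data <- matrix(theta_hat, nrow = J_groups_loc, ncol = K, byrow = FALSE)

group2 <- theta_ref_k_data[2, ]  # anchors of group 2 across both slots: 0.2 and 1.2
slot1  <- theta_ref_k_data[, 1]  # anchors of slot 1 across groups: 0.1, 0.2, 0.3
```

Each row is then one group's length-K vector, which matches the array[J_groups] vector[K] shape named above.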

Stan model lifecycle:

  • The marginal Stan source is generated by .gdpar_eb_generate_stan_marginal, written to a tempfile via write_stan_to_tempfile, and compiled with cmdstanr::cmdstan_model.
  • The conditional Stan source is generated by .gdpar_eb_generate_stan_conditional and undergoes the same write-and-compile cycle.
  • Both tempfile paths are transient (side effect on the filesystem).

Sample arguments:

  • The sample_args list is constructed with explicit integer coercions for chains, iter_warmup, iter_sampling, max_treedepth, and refresh.
  • adapt_delta is passed without coercion.
  • show_messages and show_exceptions are both set to verbose.
  • seed is added only if non-NULL.
  • Extra arguments from ... are merged into sample_args by name, potentially overriding any of the above defaults.
  • Sampling is invoked via do.call(conditional_model$sample, sample_args).
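
The merge-by-name behaviour can be sketched without cmdstanr. build_sample_args() is a hypothetical name; the sketch only builds the argument list and never calls $sample(). The argument names used (chains, iter_warmup, iter_sampling, adapt_delta, max_treedepth, refresh, show_messages, show_exceptions, seed, thin) are existing cmdstanr $sample() arguments.

```r
# Hypothetical re-creation of the sample_args assembly described above.
build_sample_args <- function(chains, iter_warmup, iter_sampling, adapt_delta,
                              max_treedepth, refresh, verbose, seed = NULL, ...) {
  sample_args <- list(
    chains          = as.integer(chains),
    iter_warmup     = as.integer(iter_warmup),
    iter_sampling   = as.integer(iter_sampling),
    adapt_delta     = adapt_delta,             # passed without coercion
    max_treedepth   = as.integer(max_treedepth),
    refresh         = as.integer(refresh),
    show_messages   = verbose,
    show_exceptions = verbose
  )
  if (!is.null(seed)) sample_args$seed <- as.integer(seed)
  # Extra arguments are merged by name and may override the entries above.
  extra <- list(...)
  if (length(extra) > 0L) sample_args[names(extra)] <- extra
  sample_args
}

args <- build_sample_args(
  chains = 4, iter_warmup = 1000, iter_sampling = 1000, adapt_delta = 0.95,
  max_treedepth = 12, refresh = 0, verbose = FALSE, seed = 1,
  show_messages = TRUE, thin = 2L   # extras: one override, one new entry
)
```

Because the merge happens last, show_messages ends up TRUE even though verbose is FALSE, while show_exceptions keeps the verbose value.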

Group aliasing:

  • When group_info is non-NULL, .check_group_aliasing_c7 is called for each slot $k$ with a design list containing Z_a = design_K$Z_a_k_list[[k]], Z_b = design_K$Z_b_k_list[[k]], and X = design_K$X.

Trailing roxygen block:

  • The section concludes with a roxygen @noRd documentation block for an internal function implementing the tensor-valued Proposition 7B* correction under $K > 1$ and $p > 1$. The function itself is not defined in this section; its documented signature includes parameters eb_correction, laplace_result_per_slot, K, p, and verbose, and it returns a list with applied, constant (3D array $[K, p, p]$), and slot_dispositions. This function appears in a subsequent section.

.gdpar_eb_correction_tensor(eb_correction, laplace_result_per_slot, K = 2L, p = 1L, verbose = TRUE)

Purpose

Builds a three-dimensional correction tensor for the Path C empirical-Bayes (EB) regime. The tensor scales each slot's reference-parameter covariance (extracted from per-slot Laplace results) by a fixed multiplier and is consumed downstream by S3 coverage methods. If any slot fails validation, the entire correction is disabled and downstream methods fall back to nominal coverage.

Arguments

  • eb_correction — logical scalar (or any value testable by isTRUE). When not TRUE, the function short-circuits and returns a disabled result with no slot processing.
  • laplace_result_per_slot — list of length K. Each element is a Laplace-fit result object expected to contain a theta_ref_cov_k field holding the $p \times p$ covariance of the reference parameters for that slot.
  • K — integer-ish scalar; number of slots. Coerced to integer. Default 2L.
  • p — integer-ish scalar; number of coordinates per slot. Coerced to integer. Default 1L.
  • verbose — logical scalar; when TRUE and at least one slot fails, a diagnostic warning is emitted via gdpar_warn.

Mathematics

For each slot $k \in \{1, \dots, K\}$ that passes validation, the correction constant is

$$C_k \;=\; \kappa_{\alpha,0.95}\;\Sigma_k, \qquad \kappa_{\alpha,0.95} = 1.92,$$

where $\Sigma_k$ is the theta_ref_cov_k matrix for slot $k$. The resulting tensor has shape $K \times p \times p$ with slice $C[k,\cdot,\cdot] = C_k$.

The positive-semidefinite check uses eigenvalues $\lambda_i(\Sigma_k)$ computed via eigen(..., symmetric = TRUE, only.values = TRUE); a slot is rejected if any $\lambda_i < -10^{-10}$.

Returns

A named list with three components:

  • applied — logical scalar. TRUE only if every slot passed validation and the tensor was filled.
  • constant — numeric array of dimensions c(K, p, p). When applied = TRUE, filled with the scaled covariances; otherwise filled with NA_real_ (the "empty tensor").
  • slot_dispositions — named character vector of length K (names are seq_len(K) coerced to character). Each entry is one of: "disabled" (correction globally off), "missing" (covariance absent or wrong shape), "non_finite" (covariance contains non-finite entries), "non_psd" (eigenvalue check failed), or "ok".

Notes

  • The multiplier kappa_alpha_95 is hardcoded to 1.92 (not the exact 1.959964… standard-normal 97.5th percentile).
  • When eb_correction is not TRUE, the returned slot_dispositions are all "disabled" and names are set via setNames(rep("disabled", K), seq_len(K)).
  • When any slot fails, any_failed is set, the function returns applied = FALSE with an empty (NA) tensor, and—if verbose—a warning of class "gdpar_diagnostic_warning" is emitted via gdpar_warn summarising the count and unique failure types.
  • The PSD eigen-decomposition is wrapped in tryCatch; an error from eigen yields NA_real_ values, which then trigger the "non_psd" disposition.
  • The covariance shape check requires is.matrix(cov_k), nrow == p, and ncol == p.
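
The validate-then-scale logic described in this entry can be sketched as follows. This is an illustrative re-implementation under assumptions, not the package source: correction_tensor_sketch() is a hypothetical name, it takes a plain list of covariance matrices instead of Laplace result objects carrying theta_ref_cov_k, and it omits the eb_correction short-circuit and the warning.

```r
correction_tensor_sketch <- function(cov_list, K, p, kappa = 1.92, tol = 1e-10) {
  empty_tensor <- function() array(NA_real_, dim = c(K, p, p))
  disp <- stats::setNames(rep("ok", K), seq_len(K))
  for (k in seq_len(K)) {
    cov_k <- cov_list[[k]]
    if (is.null(cov_k) || !is.matrix(cov_k) || nrow(cov_k) != p || ncol(cov_k) != p) {
      disp[k] <- "missing"                      # absent or wrong shape
    } else if (any(!is.finite(cov_k))) {
      disp[k] <- "non_finite"
    } else {
      ev <- tryCatch(
        eigen(cov_k, symmetric = TRUE, only.values = TRUE)$values,
        error = function(e) NA_real_            # eigen failure counts as non-PSD
      )
      if (anyNA(ev) || any(ev < -tol)) disp[k] <- "non_psd"
    }
  }
  # Any failing slot disables the whole correction.
  if (any(disp != "ok")) {
    return(list(applied = FALSE, constant = empty_tensor(), slot_dispositions = disp))
  }
  out <- empty_tensor()
  for (k in seq_len(K)) out[k, , ] <- kappa * cov_list[[k]]
  list(applied = TRUE, constant = out, slot_dispositions = disp)
}

ok  <- correction_tensor_sketch(list(diag(2), matrix(c(2, 0.5, 0.5, 1), 2)), K = 2L, p = 2L)
bad <- correction_tensor_sketch(list(diag(2), matrix(c(1, 0, 0, -1), 2)),   K = 2L, p = 2L)
```

In the second call the indefinite matrix in slot 2 is flagged "non_psd", so the returned tensor is entirely NA and applied is FALSE.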

empty_tensor() (nested closure inside .gdpar_eb_correction_tensor)

Purpose

Local helper that allocates a fresh $K \times p \times p$ array of NA_real_, used as the default/disabled constant tensor.

Arguments

None. Captures K and p from the enclosing .gdpar_eb_correction_tensor scope.

Returns

A numeric array of dimensions c(K, p, p) filled entirely with NA_real_.

Notes

Defined as a closure; not accessible outside its parent function.


.build_amm_design_KxP(amm_list_canonical, data, formula_rhs)

Purpose

Constructs the per-slot multivariate (ragged) design matrices for the Path C $K \times p$ specification. Iterates over the K slots of a canonicalised amm_spec list, enforces homogeneous $p \geq 2$ across slots, and delegates to .build_amm_design_multi() for each slot. The returned structure is the direct input consumed by .assemble_stan_data_KxP().

Arguments

  • amm_list_canonical — named list of length $K \geq 2$ of amm_spec objects. Each object must carry a $p field (defaulting to 1L via %||% if absent) that is $\geq 2$ and identical across all slots.
  • data — data frame containing the variables referenced by the per-slot AMM specifications. Validated by assert_data_frame().
  • formula_rhs — two-sided formula identifying the covariate columns of data used as the linear factor $x$. Passed through verbatim to .build_amm_design_multi() for each slot.

Returns

A named list with:

  • K — integer scalar; number of slots.
  • p — integer scalar; the homogeneous coordinate dimension (taken from the first slot).
  • slot_names — character vector of length K; the names() of amm_list_canonical.
  • design_per_slot — named list of length K. Each entry is the list returned by .build_amm_design_multi(a_k, data, formula_rhs) for that slot's amm_spec.

Notes

  • Aborts with class "gdpar_internal_error" via gdpar_abort if amm_list_canonical is not a list or has length < 2.
  • Aborts if any element lacks a non-empty name (is.null(slot_names) or any(!nzchar(slot_names))).
  • Aborts if the per-slot $p values are not all $\geq 2$ or not all identical. The error message includes the comma-separated p_per_slot vector.
  • Each slot is validated with assert_inherits(a_k, "amm_spec", ...) before delegation.
  • The $p extraction uses a$p %||% 1L, so a missing $p field is treated as 1L—which then triggers the homogeneous-$p \geq 2$ abort.
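
The interaction between the %||% default and the homogeneity check can be seen in a short sketch. check_p_homogeneous() is a hypothetical name, the slots are plain lists rather than amm_spec objects, and stop() stands in for gdpar_abort. The operator is defined locally because base R only ships %||% from version 4.4.0.

```r
# Local null-coalescing operator (base R provides it only from 4.4.0).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical check: every slot must declare the same p, and p must be >= 2.
check_p_homogeneous <- function(amm_list) {
  p_per_slot <- vapply(amm_list, function(a) as.integer(a$p %||% 1L), integer(1))
  if (any(p_per_slot < 2L) || length(unique(p_per_slot)) != 1L) {
    stop("per-slot p must be >= 2 and identical; got: ",
         paste(p_per_slot, collapse = ", "))
  }
  p_per_slot[[1L]]
}

p_ok      <- check_p_homogeneous(list(mu = list(p = 3L), sigma = list(p = 3L)))
p_missing <- tryCatch(check_p_homogeneous(list(mu = list(p = 3L), sigma = list())),
                      error = function(e) conditionMessage(e))
```

The slot without a p field is read as 1L, so the second call fails and the message lists "3, 1".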

.assemble_stan_data_KxP(design_KxP, family, amm_list_canonical, y_matrix, theta_anchor_kp, group_id = NULL, path = c("EB", "FB"), cp_W = FALSE)

Purpose

Assembles the complete named-list data block consumed by the Path C Stan templates (amm_eb_marginal_KxP.stan / amm_eb_conditional_KxP.stan for the EB path; amm_canonical_pmulti_KxP.stan for the FB path). Dispatches on path to enforce or lift the Sub-phase 8.6.D first-iteration restrictions: EB hardcodes use_W = 0 and restricts stan_id to $\{1, 3\}$; FB enables the modulating $W$ (globally shared) and accepts the extended Path B family set $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$.

Arguments

  • design_KxP — list returned by .build_amm_design_KxP(). Must contain $design_per_slot, $K, and $p.
  • family — promoted gdpar_family object (validated by assert_inherits). Must carry a $stan_id field and a $name field.
  • amm_list_canonical — named list of K amm_spec objects with $p \geq 2$ per slot. Used to extract per-slot use_a / use_b flags and (FB path) $W$ metadata.
  • y_matrix — numeric or integer matrix of outcomes, shape $n \times p$.
  • theta_anchor_kp — numeric matrix of shape $K \times p$; per-slot per-coordinate anchors on the linear-predictor scale.
  • group_id — optional integer vector of length $n$. Resolved via .resolve_group_id().
  • path — character scalar; one of "EB" or "FB" (resolved by match.arg). Default "EB".
  • cp_W — logical scalar. Present in the signature but not referenced anywhere in the function body.

Mathematics

Per-slot per-coordinate design matrices are packed into 4D arrays with zero-padding:

$$Z_{a,kp}[k, j, \cdot, \cdot] \in \mathbb{R}^{n \times J_{a,\max}}, \qquad Z_{b,kp}[k, j, \cdot, \cdot] \in \mathbb{R}^{n \times J_{b,\max}},$$

where

$$J_{a,\max} = \max_{k,j} J_a^{(k,j)}, \qquad J_{b,\max} = \max_{k,j} J_b^{(k,j)},$$

and $J_a^{(k,j)} = \operatorname{ncol}\big(\texttt{d\_k\$Z\_a\_list}_j\big)$ (similarly for $b$). Slots/coordinates with fewer columns than the maximum are right-padded with zeros.

For the FB path with $W$ enabled, the total $W$ parameter dimension is

$$\text{dim\_W}_{\text{total}} = K \cdot p \cdot W_{\text{per\_kj\_dim}},$$

with $W_{\text{per\_kj\_dim}} = \texttt{amm\$W\$dim}$ taken from the first slot that declares $W$.

The use_a_k / use_b_k flags are computed as

$$\text{use\_a}_k = \mathbb{1}\!\left[\texttt{a\_k\$a} \neq \text{NULL} \;\lor\; \exists\, d \in \texttt{a\_k\$dims}:\; \texttt{d\$a} \neq \text{NULL}\right],$$

and analogously for use_b_k.

Returns

A named list. The base list (returned for both paths) contains:

| Field | Type | Description |
|---|---|---|
| n | integer | Number of observations. |
| K | integer | Number of slots. |
| p | integer | Coordinate dimension. |
| family_id_k_vector | integer vector (length K) | Homogeneous stan_id replicated K times. |
| inv_link_id_per_slot | integer vector (length K) | Computed by .gdpar_compute_inv_link_id_per_slot(). |
| use_a_k | integer vector (length K) | Per-slot $a$-component flag. |
| use_b_k | integer vector (length K) | Per-slot $b$-component flag. |
| use_W | integer scalar | 0L (EB) or as.integer(any_W) (FB). |
| J_a_max | integer | Maximum $a$-design column count. |
| J_b_max | integer | Maximum $b$-design column count. |
| J_a_per_kp | integer matrix ($K \times p$) | Per-slot per-coord $a$-design sizes. |
| J_b_per_kp | integer matrix ($K \times p$) | Per-slot per-coord $b$-design sizes. |
| Z_a_kp | numeric array ($K \times p \times n \times J_{a,\max}$) | Padded $a$-design matrices. |
| Z_b_kp | numeric array ($K \times p \times n \times J_{b,\max}$) | Padded $b$-design matrices. |
| y_real | numeric matrix ($n \times p$) | Real-valued outcomes (or zeros if needs_real is FALSE). |
| y_int | integer matrix ($n \times p$) | Integer-valued outcomes (or zeros if needs_int is FALSE). |
| theta_anchor_kp | list of K double vectors (each length p) | Row-wise decomposition of the input matrix. |
| use_dispersion_y_k | integer vector (length K) | Always zero in both paths. |
| use_dispersion_phi_k | integer vector (length K) | Always zero in both paths. |
| use_groups | (from .resolve_group_id) | Group flag. |
| J_groups | (from .resolve_group_id) | Number of groups. |
| group_id | (from .resolve_group_id) | Group index vector. |
| K_slots | integer | Redundant copy of K. |
| p_dim | integer | Redundant copy of p. |

For the FB path only, the list is extended (c(base_list, ...)) with:

| Field | Type | Description |
|---|---|---|
| dim_W | integer | Total $W$ dimension ($K \cdot p \cdot W_{\text{per\_kj\_dim}}$), or 0L. |
| d | integer | Number of columns in the shared design matrix $X$. |
| W_per_kj_dim | integer | Per-(slot, coord) basis dimension. |
| X | numeric matrix ($n \times d$) | Shared linear-factor design matrix (or $n \times 0$). |
| W_type_id | (from .gdpar_resolve_W_stan_data) | $W$ basis type identifier. |
| W_n_knots_full | (from .gdpar_resolve_W_stan_data) | Knot count. |
| W_knots_full | (from .gdpar_resolve_W_stan_data) | Knot vector. |
| W_degree | (from .gdpar_resolve_W_stan_data) | Spline degree. |

Notes

  • EB path restrictions: use_W is hardcoded to 0L; stan_id must be in $\{1, 3\}$ (Gaussian or Negative Binomial), otherwise a "gdpar_unsupported_feature_error" is raised. If any slot declares W != NULL on the EB path, a "gdpar_unsupported_feature_error" is raised.
  • FB path extensions: stan_id must be in $\{1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$; otherwise a "gdpar_unsupported_feature_error" is raised. $W$ is enabled if any slot declares it; the first such slot's $W object defines the basis metadata (shared globally).
  • Outcome validation: For count families (stan_id $\in \{3, 10, 11, 12, 13\}$), every entry of y_matrix must be a finite, non-negative integer; otherwise a "gdpar_input_error" is raised. For continuous families (stan_id $\in \{1, 5, 6, 7, 8, 9\}$), every entry must be finite.
  • y_real / y_int population: needs_real is TRUE for stan_id $\in \{1, 5, 6, 7, 8, 9\}$; needs_int is TRUE for stan_id $\in \{3, 10, 11, 12, 13\}$. The unused matrix is zero-filled.
  • theta_anchor_kp is validated as a $K \times p$ matrix and then decomposed row-wise into a list of K length-p double vectors via lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ])).
  • family_id_k_vector is rep(as.integer(stan_id), K)—homogeneous across slots regardless of path.
  • use_dispersion_y_k / use_dispersion_phi_k are zero vectors in both paths (the FB comment notes future B9.7+ may lift this).
  • cp_W is accepted as a parameter but never read.
  • Internal errors (class "gdpar_internal_error") are raised for: invalid design_KxP structure; K < 2 or p < 2; y_matrix not a matrix; y_matrix column count mismatch; theta_anchor_kp shape mismatch; FB path with dim_W <= 0 when use_W == 1.
  • Calls .resolve_group_id(), .gdpar_compute_inv_link_id_per_slot(), and (FB only) .gdpar_resolve_W_stan_data().
  • The pad_to local helper (see below) handles zero-padding of design matrices.
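
Two of the notes above, the y_real / y_int split and the row-wise anchor decomposition, can be illustrated with toy inputs. This is a sketch under assumptions: the variable names are illustrative, the data are invented, and stop() stands in for gdpar_abort.

```r
count_ids      <- c(3L, 10L, 11L, 12L, 13L)
continuous_ids <- c(1L, 5L, 6L, 7L, 8L, 9L)

stan_id  <- 3L                                    # a count family
y_matrix <- matrix(c(0, 2, 1, 4), nrow = 2)       # toy n = 2, p = 2 outcomes
n <- nrow(y_matrix); p <- ncol(y_matrix)

needs_int  <- stan_id %in% count_ids
needs_real <- stan_id %in% continuous_ids

# Count outcomes must be finite, non-negative and integer-valued.
if (needs_int && !(all(is.finite(y_matrix)) && all(y_matrix >= 0) &&
                   all(y_matrix == round(y_matrix)))) {
  stop("count outcomes must be finite non-negative integers")
}

# The matrix that is not needed is zero-filled.
y_int  <- if (needs_int)  matrix(as.integer(y_matrix), n, p) else matrix(0L, n, p)
y_real <- if (needs_real) matrix(as.double(y_matrix),  n, p) else matrix(0,  n, p)

# Row-wise decomposition of a K x p anchor matrix into a list of K vectors.
K <- 2L
theta_anchor_kp <- matrix(c(0, 1, 0.5, -0.5), nrow = K, ncol = p)
theta_list <- lapply(seq_len(K), function(k) as.double(theta_anchor_kp[k, ]))
```

Here y_int carries the data and y_real is all zeros; theta_list[[1]] is the first row of the anchor matrix, c(0, 0.5).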

pad_to(z, target_cols, n_rows) (nested closure inside .assemble_stan_data_KxP)

Purpose

Zero-pads a design matrix z to target_cols columns. If target_cols is 0L, returns an $n_{\text{rows}} \times 0$ matrix. If z already has at least target_cols columns, returns z unchanged. Otherwise right-pads with a zero matrix.

Arguments

  • z — numeric matrix; the per-slot per-coordinate design matrix to pad.
  • target_cols — integer scalar; the target column count ($J_{a,\max}$ or $J_{b,\max}$).
  • n_rows — integer scalar; the number of rows to use when target_cols == 0L (i.e., $n$).

Returns

A numeric matrix with nrow(z) rows and max(ncol(z), target_cols) columns (or n_rows rows and 0 columns when target_cols == 0L).

Notes

Defined as a local closure inside .assemble_stan_data_KxP; captures nothing from the enclosing scope (all inputs are explicit arguments). When target_cols > 0L but ncol(z) >= target_cols, z is returned as-is (no truncation occurs even if z has more columns than the target).
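
A sketch of a helper with the behaviour described above, together with the 4D packing it supports. The function body is a reconstruction from this description rather than the package source, and the ragged toy designs are invented.

```r
# Reconstruction of pad_to from its documented behaviour (not the package source).
pad_to <- function(z, target_cols, n_rows) {
  if (target_cols == 0L) return(matrix(0, nrow = n_rows, ncol = 0L))
  if (ncol(z) >= target_cols) return(z)           # never truncates
  cbind(z, matrix(0, nrow = nrow(z), ncol = target_cols - ncol(z)))
}

# Toy ragged designs: K = 2 slots, p = 2 coordinates, n = 4 rows.
n <- 4L
Z_list <- list(
  list(matrix(1, n, 2), matrix(1, n, 3)),
  list(matrix(1, n, 1), matrix(1, n, 3))
)
J_max <- max(vapply(unlist(Z_list, recursive = FALSE), ncol, integer(1)))

# Pack into a K x p x n x J_max array, right-padding with zeros.
Z_kp <- array(0, dim = c(2L, 2L, n, J_max))
for (k in 1:2) for (j in 1:2) {
  Z_kp[k, j, , ] <- pad_to(Z_list[[k]][[j]], J_max, n)
}
```

Slot 2, coordinate 1 has a single real column, so Z_kp[2, 1, , 2] and Z_kp[2, 1, , 3] are zero padding.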

.gdpar_eb_make_random_init_KxP(stan_data, seed_offset = 1L, base_seed = NULL)

Purpose

Internal helper that fabricates a random initial-values list for the cmdstanr optimizer / Laplace approximation in the Path C K×p EB workflow. The returned list conforms to the cmdstanr automatic packing convention for the theta_ref_kp parameter (a 3D array [J, K, p]) and conditionally emits the auxiliary scale / raw-coefficient parameters that the K×p Stan template exposes when group structure or free a/b coefficients are active.

Arguments

  • stan_data — list. The Stan data environment. The following fields are consulted (via null-coalescing %||%): K_slots (fallback K), p_dim (fallback p), J_groups (fallback 1L), use_groups (fallback 0L), use_a_k, use_b_k, J_a_per_kp, J_b_per_kp.
  • seed_offset — integer scalar, default 1L. Integer added to base_seed to derive the per-start RNG seed, enabling distinct inits across multi-start iterations.
  • base_seed — integer scalar or NULL. When non-NULL, the function seeds the global RNG with as.integer(base_seed) + seed_offset and restores the prior .Random.seed state on exit. When NULL, no seeding is performed and the global RNG state is untouched.

Mathematics

The RNG seed is

$$ \text{rng\_seed} = \begin{cases} \text{base\_seed} + \text{seed\_offset} & \text{if } \text{base\_seed} \neq \text{NULL} \\ \text{NULL (no seeding)} & \text{otherwise} \end{cases} $$

Draws produced (all i.i.d. unless noted):

  • theta_ref_kp[g,k,c] $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[J, K, p]$.
  • When use_groups == 1:
    • mu_theta_ref_kp[1,k,c] $\sim \mathcal{N}(0,\, 0.1^2)$, shape $[1, K, p]$.
    • sigma_theta_ref_kp[1,k,c] $= |\mathcal{N}(0.5,\, 0.05^2)|$, shape $[1, K, p]$.
  • When any use_a_k == 1:
    • sigma_a_k[s] $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $s = 1, \dots, n_{\sigma_a}$, where $n_{\sigma_a}$ is the count of slots $k$ satisfying use_a_k[k] == 1 and $\sum_{c} \mathbf{1}\{J_{a,\text{per\_kp}}[k,c] > 0\} > 0$.
    • a_raw[j] $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{a,\text{per\_kp}}[k,c]$.
  • When any use_b_k == 1:
    • sigma_b_k[k] $= 0.1 + |\mathcal{N}(0,\, 0.02^2)|$ for $k = 1, \dots, K$.
    • c_b_kp_raw[j] $\sim \mathcal{N}(0,\, 0.1^2)$ for $j = 1, \dots, \sum_{k,c} J_{b,\text{per\_kp}}[k,c]$.

Returns

A named list. Always contains theta_ref_kp (a 3D numeric array of dim c(J, K, p)). Conditionally also contains:

  • mu_theta_ref_kp — 3D array [1, K, p] (only when use_groups == 1).
  • sigma_theta_ref_kp — 3D array [1, K, p] (only when use_groups == 1).
  • sigma_a_k — 1D numeric array of length n_sigma_a (only when any_use_a == 1 and n_sigma_a > 0).
  • a_raw — numeric vector of length total_J_a_free (only when any_use_a == 1 and total_J_a_free > 0).
  • sigma_b_k — 1D numeric array of length K (only when any_use_b == 1).
  • c_b_kp_raw — numeric vector of length total_J_b_free (only when any_use_b == 1 and total_J_b_free > 0).

Notes

  • Side effect: when base_seed is non-NULL, the global .Random.seed is overwritten via set.seed(rng_seed) and restored on function exit through an on.exit handler. If .Random.seed did not previously exist in .GlobalEnv, the handler performs no restoration (the seed state is left as set).
  • The slot-free-a mask is computed by coercing stan_data$J_a_per_kp to an integer matrix of shape K × p (row-major), then taking rowSums(.jap > 0L) > 0L intersected with use_a_k == 1L. This mirrors the n_sigma_a transformed-data quantity of the K×P Stan template; when every slot carries free a coefficients, $n_{\sigma_a} = K$ and the draw count is bit-identical to the unconditional case.
  • sigma_a_k and sigma_b_k are wrapped with as.array to ensure 1D-array typing expected by cmdstanr init packing.
  • No errors are raised by this function; malformed stan_data would propagate as errors from downstream coercions (e.g. as.integer, matrix).
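
The seed-and-restore side effect can be sketched in isolation. make_init_sketch() is a hypothetical, reduced version that draws only theta_ref_kp; the auxiliary parameters are omitted and the dimensions are passed directly instead of being read from stan_data.

```r
# Hypothetical reduced init generator: seeds locally, restores the global RNG on exit.
make_init_sketch <- function(J, K, p, seed_offset = 1L, base_seed = NULL) {
  if (!is.null(base_seed)) {
    # Restore only if a seed state already existed, as described above.
    if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
      on.exit(assign(".Random.seed", old_seed, envir = .GlobalEnv), add = TRUE)
    }
    set.seed(as.integer(base_seed) + seed_offset)
  }
  list(theta_ref_kp = array(stats::rnorm(J * K * p, mean = 0, sd = 0.1),
                            dim = c(J, K, p)))
}

set.seed(123)
state_before <- get(".Random.seed", envir = .GlobalEnv)
init_a <- make_init_sketch(2L, 2L, 3L, seed_offset = 1L, base_seed = 42L)
state_after <- get(".Random.seed", envir = .GlobalEnv)
init_b <- make_init_sketch(2L, 2L, 3L, seed_offset = 1L, base_seed = 42L)
init_c <- make_init_sketch(2L, 2L, 3L, seed_offset = 2L, base_seed = 42L)
```

The same (base_seed, seed_offset) pair reproduces the same init, a different offset gives a different init, and the caller's RNG stream is left where it was.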

.gdpar_eb_maximize_marginal_KxP(model, stan_data, control, seed, verbose)

Purpose

Step (i) of the EB workflow under Path C, specialized for the K×p regime. Runs a multi-start joint Laplace approximation over the full theta_ref_kp anchor tensor of shape [J_groups, K, p], selects the best init by marginal log-likelihood, draws from the Laplace approximation, extracts per-slot $p \times p$ covariance blocks (with canonical group-averaging when $J > 1$), applies per-slot adaptive Levenberg–Marquardt ridge perturbation, and gates the result on a condition-number threshold. The packaged output is consumed downstream by .gdpar_eb_correction_tensor().

Arguments

  • model — cmdstanr model object. Must expose $optimize and $laplace methods.
  • stan_data — list. Stan data list; must contain J_groups, K_slots, p_dim, plus whatever fields .gdpar_eb_make_random_init_KxP requires.
  • control — list. Must contain at least: multi_start_M (integer, number of starts), optim_algorithm (passed to model$optimize), laplace_draws (integer, number of Laplace draws), kappa_threshold (numeric, condition-number gate), and any fields consumed by .gdpar_eb_lm_perturb.
  • seed — integer scalar or NULL. Base seed for reproducibility; propagated to both the init generator and cmdstanr.
  • verbose — logical. Controls emission of informational messages via gdpar_inform and gdpar_warn.

Mathematics

Multi-start optimization. For $m = 1, \dots, M$:

$$ \theta^{(m)}_0 = \texttt{.gdpar\_eb\_make\_random\_init\_KxP}(\text{stan\_data},\, m,\, \text{seed}) $$

$$ \hat{\theta}^{(m)} = \arg\max_{\theta}\, \log p(\theta \mid \text{data}) $$

with optimizer seed $\text{seed} + m$ (when seed non-NULL). The marginal log-likelihood of each start is $\ell_m = \texttt{opt\_m\$mle()["lp\_\_"]}$. The best start is

$$ m^\star = \arg\max_{m \in \{1,\dots,M\}} \ell_m $$

where inits that errored are skipped (their $\ell_m$ is NA_real_).
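
The skip-on-error selection can be sketched with a toy objective in place of the Stan optimizer. run_start() is a stand-in invented for this example, and the handling of the case where every start fails is an assumption, since the source does not describe it here.

```r
# Toy stand-in for one optimizer run: start 2 errors, the others return a log-density.
run_start <- function(m) {
  if (m == 2L) stop("optimizer failed")
  -(m - 3)^2
}

M <- 4L
# Failed starts are recorded as NA_real_ instead of aborting the loop.
lp <- vapply(seq_len(M), function(m) {
  tryCatch(run_start(m), error = function(e) NA_real_)
}, numeric(1))

# Assumption: abort when no start succeeded.
if (all(is.na(lp))) stop("all starts failed")

# which.max() discards NA entries, so errored starts are skipped automatically.
m_star <- which.max(lp)
```

With these toy values lp is (-4, NA, 0, -1) and the selected start is the third.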

Laplace approximation. Given $\hat{\theta}^{(m^\star)}$,

$$ \theta^{(s)} \sim \mathcal{N}\!\left(\hat{\theta}^{(m^\star)},\, \left[-\nabla^2 \log p(\hat{\theta}^{(m^\star)} \mid \text{data})\right]^{-1}\right), \quad s = 1, \dots, S $$

with $S = \texttt{control\$laplace\_draws}$ and Laplace seed $\text{seed} + 1000$ (when seed non-NULL).

Posterior mean of the anchor tensor:

$$ \hat{\theta}_{g,k,c} = \frac{1}{S}\sum_{s=1}^{S} \theta^{(s)}_{g,k,c} $$

Per-slot covariance. For slot $k$, let $\mathcal{I}_k = {(g, c)
