Part I Conceptual Framework

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 5 revisions

gdpar Wiki Home · Part II — Mathematical Foundations →


Part I — Conceptual Framework

I.1 Motivation: from cognition to statistics

The framework originates in an observation about expert human prediction under uncertainty and consequence: a driver executing fast overtaking maneuvers at narrow distances without crashing. Decomposed, the expert's predictive process has three stages:

  1. A population reference. The driver carries an internal model of the average driver — typical reaction times, modal aggressiveness, characteristic acceleration/braking patterns. This functions as a prior.
  2. Rapid estimation of individual deviation. In a one-to-two-second window, the driver reads signals from the specific other driver (style, relative speed, vehicle type, micro-movements) and estimates how and how much that driver departs from the average.
  3. Conditional prediction and decision. With the individual profile expressed as a deviation from the average, the driver predicts the other's behavior and decides the maneuver.

A conventional model in its simplest form predicts with parameters that are fixed once estimated,

$$\hat{y}_i = f(x_i;\hat\theta),\qquad \hat\theta \text{ identical for all } i.$$

More elaborate models do let parameters vary across individuals: hierarchical models with covariate-dependent random effects, varying-coefficient models, hypernetworks, mixtures of experts, state-space models. The framework therefore does not claim to lack antecedents; its contribution is the explicit, canonical formulation of a structural pattern that those antecedents typically leave implicit:

Conventional predictive models do not typically formulate individual parameters as explicit deviations from a population reference where the deviation function depends both on the individual's observable characteristics and on the reference itself.

The substantive content unpacks into three commitments:

  • $\theta_{\text{ref}}$ is always present as an explicit anchor in the predictive equation.
  • $\Delta_i$ is computed as a function of observable signals $x_i$.
  • the individual prediction emerges from $\theta_i=\theta_{\text{ref}}+\Delta_i$, not from $\theta_i$ estimated independently per individual nor from $\theta_{\text{ref}}$ shifted by a structurally separate term.

The decisive property is the structural dependence of $\Delta$ on $\theta_{\text{ref}}$: $\Delta_i = \Delta(x_i,\theta_{\text{ref}})$. If the reference changes — e.g. the model is transferred to a new population — the deviation function behaves differently because $\theta_{\text{ref}}$ is one of its arguments. This is the precise mathematical content the AMM canonical form (Part II) makes operational.

I.2 The fundamental equation

For each observation $i$ with covariates $x_i$,

$$\boxed{\theta_i \;=\; \theta_{\text{ref}} \;+\; \Delta(x_i,\;\theta_{\text{ref}})},$$

with $\theta_{\text{ref}}=\mathbb{E}[\theta]$ the population reference (the "average driver", or in general the reference entity of the system) and $\Delta$ a deviation function. The individual prediction is

$$\hat y_i \;=\; f\big(x_i;\,\theta_i\big) \;=\; f\big(x_i;\,\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})\big).$$

A model in this framework is fully specified by three design choices:

  1. How $\theta_{\text{ref}}$ is estimated — from the full sample, from a reference subset, or as a hyperparameter with its own distribution (this becomes the anchor argument in code, and the EB-vs-FB distinction).
  2. How $\Delta$ is estimated — parametric/linear, non-parametric (splines), or as a neural network output (the three paths).
  3. What distribution is assumed for $Y_i\mid\theta_i$ — this determines the family of problems addressed (the distributional families).

The canonical functional form for $\Delta$ is the Additive–Multiplicative–Modulated (AMM) form,

$$\Delta(x,\theta_{\text{ref}}) \;=\; a(x) \;+\; b(x)\odot\theta_{\text{ref}} \;+\; W(\theta_{\text{ref}})\,x,$$

developed formally in Part II. Read componentwise: $a(x)$ is a pure additive shift; $b(x)\odot\theta_{\text{ref}}$ scales the reference elementwise by a covariate-dependent factor; $W(\theta_{\text{ref}})\,x$ mixes covariates through a matrix that itself depends on the reference. The third term is what encodes "the deviation depends on the reference."
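To make the three terms concrete, here is a minimal sketch in plain Python that evaluates the AMM deviation and the resulting individual parameter for a two-component $\theta$ and two covariates. The linear shapes chosen for `a`, `b` and `W`, and every coefficient, are assumptions for illustration only; they are not taken from gdpar.

```python
# Minimal AMM sketch. The functional forms of a, b and W and all numeric
# coefficients are illustrative assumptions, not gdpar internals.

def a(x):
    """Additive shift a(x): one value per parameter component."""
    return [0.5 * x[0], -0.2 * x[1]]

def b(x):
    """Multiplicative factor b(x), applied elementwise to theta_ref."""
    return [0.1 * x[0], 0.3 * x[1]]

def W(theta_ref):
    """Modulation matrix W(theta_ref): rows = parameters, columns = covariates."""
    return [[0.05 * theta_ref[0], 0.0],
            [0.0, 0.02 * theta_ref[1]]]

def amm_delta(x, theta_ref):
    """Delta(x, theta_ref) = a(x) + b(x) * theta_ref (elementwise) + W(theta_ref) x."""
    ax, bx, Wm = a(x), b(x), W(theta_ref)
    return [
        ax[k] + bx[k] * theta_ref[k] + sum(Wm[k][j] * x[j] for j in range(len(x)))
        for k in range(len(theta_ref))
    ]

def individual_theta(x, theta_ref):
    """theta_i = theta_ref + Delta(x_i, theta_ref)."""
    return [t + d for t, d in zip(theta_ref, amm_delta(x, theta_ref))]

theta_ref = [2.0, 10.0]
x_i = [1.0, -1.0]
delta_i = amm_delta(x_i, theta_ref)         # approximately [0.8, -3.0]
theta_i = individual_theta(x_i, theta_ref)  # approximately [2.8, 7.0]
```

In this toy setup the first component receives 0.5 from the additive term, 0.2 from the multiplicative term and 0.1 from the modulated term. Only the last two change when `theta_ref` changes.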

I.3 Desiderata satisfied by the form

  1. Anchoring. With no individual information ($x_i$ unobserved or non-informative), $\Delta\to 0$ and $\theta_i\to\theta_{\text{ref}}$: the model collapses to the population baseline.
  2. Individuation. With rich individual information, $\Delta$ can be large and parameters move away from the reference as far as the evidence justifies.
  3. Transferability. When $\theta_{\text{ref}}$ changes (new population), deviations are recomputed coherently because $\Delta$ is a function of $\theta_{\text{ref}}$.
  4. Generality. The form subsumes, as special cases, fixed effects ($\Delta\equiv 0$), classical random effects ($\Delta$ independent of $\theta_{\text{ref}}$), and random coefficients ($\Delta$ linear in $x_i$).
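The anchoring and transferability properties can be checked numerically on a scalar toy version of the AMM form. The sketch below assumes that a non-informative covariate is encoded as $x=0$ and that $a$, $b$ and $W$ all vanish there; the coefficient values are hypothetical.

```python
# Scalar AMM sketch used to check two desiderata numerically.
# Coefficients and the convention a(0) = b(0) = 0 are assumptions.

def delta(x, theta_ref, a1=0.4, b1=0.1, w1=0.05):
    # a(x) = a1*x, b(x) = b1*x, W(theta_ref) = w1*theta_ref.
    # In the scalar case the last two terms share the shape x * theta_ref.
    return a1 * x + (b1 * x) * theta_ref + (w1 * theta_ref) * x

def theta(x, theta_ref):
    return theta_ref + delta(x, theta_ref)

# Anchoring: a non-informative covariate (here x = 0) gives Delta = 0,
# so the individual parameter equals the reference.
anchored = theta(0.0, theta_ref=3.0)

# Transferability: the same covariate value yields a different deviation
# once the reference changes, because theta_ref is an argument of Delta.
d_old = delta(2.0, theta_ref=3.0)
d_new = delta(2.0, theta_ref=5.0)

# Special case: with b1 = w1 = 0 the deviation ignores theta_ref,
# which corresponds to a Delta independent of the reference.
d_re_old = delta(2.0, 3.0, b1=0.0, w1=0.0)
d_re_new = delta(2.0, 5.0, b1=0.0, w1=0.0)
```

Here `d_old` and `d_new` differ although the covariate is the same, while `d_re_old` and `d_re_new` coincide once the reference-dependent terms are switched off.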

I.4 The three estimation paths

The same canonical principle is realized through three complementary engines, differing in how $\Delta$ is estimated and in their interpretability/expressivity trade-offs.

  • Path 1 — Hierarchical Bayesian (Stan). A three-level hierarchy: a population level $\theta_{\text{ref}}\sim p(\theta\mid\text{hyper})$, an individual level $\theta_i\mid x_i \sim \mathcal N(\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),\Sigma_i)$ (with $\Sigma_i$ possibly covariate-dependent, i.e. individual heteroscedasticity), and an observation level $Y_i\mid\theta_i\sim\mathcal D(\theta_i)$. Most faithful to the cognitive analogy, native full-posterior uncertainty. This is the operational path in gdpar.
  • Path 2 — Varying-coefficient models (penalized splines). The frequentist version: $Y_i = x_i^\top\beta(z_i)+\varepsilon_i$, with the reference recovered as $\beta_{\text{ref}}=\beta(\bar z)$ and the deviation as $\Delta_i=\beta(z_i)-\beta(\bar z)$. Maximal interpretability; suffers the curse of dimensionality. Conceptual in gdpar 0.1.0 (see below).
  • Path 3 — Conditional parameter networks (hypernetworks / amortized inference). A neural network generates the individual parameters, $\theta_i=h_\phi(x_i,\theta_{\text{ref}})$, with anchoring enforced both by feeding $\theta_{\text{ref}}$ as an explicit input and by a regularizer $\lambda\|\theta_i-\theta_{\text{ref}}\|^2$. Arbitrary nonlinearity, lowest interpretability. Conceptual in gdpar 0.1.0.
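As an illustration of the Path 2 decomposition, the sketch below reads the reference and the deviations off a varying-coefficient function $\beta(z)$. The closed-form `beta` is a stand-in assumption; in a real analysis it would be a fitted penalized-spline estimate.

```python
# Path 2 sketch: reference and deviation read off a varying-coefficient
# function beta(z). The closed-form beta is a hypothetical stand-in for
# a fitted spline estimate.
from statistics import mean

def beta(z):
    """Hypothetical coefficient function: intercept and slope vary with z."""
    return [1.0 + 0.5 * z, 2.0 - 0.25 * z * z]

z = [0.0, 1.0, 2.0, 3.0, 4.0]   # effect modifier z_i for five individuals
z_bar = mean(z)                 # 2.0
beta_ref = beta(z_bar)          # reference: beta evaluated at the mean modifier

# Deviation of each individual: Delta_i = beta(z_i) - beta(z_bar)
deltas = [[bi - br for bi, br in zip(beta(zi), beta_ref)] for zi in z]

def predict(x, zi):
    """y_hat = x^T (beta_ref + Delta_i), which equals x^T beta(z_i)."""
    d = [bi - br for bi, br in zip(beta(zi), beta_ref)]
    return sum(xk * (br + dk) for xk, br, dk in zip(x, beta_ref, d))

y_hat = predict([1.0, 1.0], 0.0)
```

The individual whose modifier equals `z_bar` has a zero deviation by construction, which is the Path 2 counterpart of anchoring.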

Implementation status. In gdpar 0.1.0 only Path 1 (Hierarchical Bayesian) is operational and is the default. Calls of the form gdpar(..., path = "vcm") or gdpar(..., path = "hyper") abort with a gdpar_unsupported_feature_error. The asymptotic theory for all three paths is nonetheless developed to reference grade (Part II, from vignettes v04–v06), so the package's mathematical scope exceeds its current executable surface by design.

Comparative summary (from the framework overview):

| Criterion | Path 1 Bayesian | Path 2 VCM | Path 3 Hypernetwork |
| --- | --- | --- | --- |
| Fidelity to cognitive analogy | High | Moderate | Moderate |
| Theoretical rigor | Very high | High | Moderate |
| Interpretability | High | Very high | Low |
| Expressive capacity of $\Delta$ | Moderate (parametric) | Moderate–high | Arbitrarily high |
| Scalability to high dimension | Moderate | Low (curse of dim.) | High |
| Uncertainty quantification | Native (full posteriors) | Asymptotic (CIs) | Requires extensions |
| Primary tools | Stan, cmdstanr | mgcv, splines | torch |

I.5 Empirical Bayes vs full Bayes

Within Path 1, the reference $\theta_{\text{ref}}$ and the hyperparameters can be handled in two regimes:

  • Full Bayes (FB). Everything — reference, deviation coefficients, hyperparameters — is given priors and sampled jointly by HMC. Output: full joint posterior. This is gdpar().
  • Empirical Bayes (EB). The hyperparameters (and, in the marginal variant, the reference itself) are estimated by maximizing a marginal likelihood / MAP objective; the remaining parameters are then inferred conditionally. Far cheaper, with an explicit and analyzable contraction/asymptotic story. This is gdpar_eb().

The package treats EB and FB as parallel, comparable estimation routes and ships an explicit comparator (gdpar_compare_eb_fb) that quantifies how much the two agree on $\theta_{\text{ref}}$, on the reduced parameters $\xi$, and on coverage. The EB theory and its multivariate generalization (Theorems 7A–7D, Part II) are first-class, not afterthoughts.
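The two-step logic of EB (estimate hyperparameters from a marginal likelihood, then infer the rest conditionally) can be sketched on a normal-normal toy model with a closed form. This model is an assumption chosen for transparency. It is not the algorithm behind gdpar_eb(), and here the population mean $\mu$ plays the role of the reference.

```python
# Empirical-Bayes sketch on a normal-normal toy model (an assumption chosen
# for its closed form; not the gdpar_eb() algorithm):
#   y_i | theta_i ~ N(theta_i, s2),   theta_i ~ N(mu, tau2),   s2 known.
from statistics import fmean, pvariance

def eb_normal_normal(y, s2):
    # Step 1: marginally y_i ~ N(mu, tau2 + s2), so the marginal-likelihood
    # maximizers are the sample mean and (ML variance - s2), floored at 0.
    mu_hat = fmean(y)
    tau2_hat = max(0.0, pvariance(y, mu_hat) - s2)
    # Step 2: posterior mean of each theta_i given the plug-in values.
    weight = tau2_hat / (tau2_hat + s2)   # in [0, 1); 0 means full pooling
    theta_hat = [mu_hat + weight * (yi - mu_hat) for yi in y]
    return mu_hat, tau2_hat, theta_hat

y = [1.0, 3.0, 5.0, 7.0, 9.0]
mu_hat, tau2_hat, theta_hat = eb_normal_normal(y, s2=2.0)
# mu_hat = 5.0, tau2_hat = 6.0, theta_hat = [2.0, 3.5, 5.0, 6.5, 8.0]
```

Each estimate is the reference plus a shrunken deviation. A full-Bayes treatment of the same toy model would instead place priors on `mu` and `tau2` and sample them jointly with the `theta_i`.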

I.6 Distributional regression: parameter slots

The framework is not restricted to modelling a mean. A distribution can have several parameters — location, scale, shape, tail index, zero-inflation probability — and each of these is a slot that can carry its own AMM decomposition. The package indexes slots by $k=1,\dots,K$. Examples: Gaussian ($K=2:\ \mu,\sigma$), Student-$t$ ($K=3:\ \mu,\sigma,\nu$), Tweedie ($K=3:\ \mu,\phi,p$), zero-inflated and hurdle families (an extra $\pi$ slot), and heterogeneous per-slot families where different slots take different sub-families. Each slot has its own link $g_k$, its own support, and its own AMM design. This is the distributional-regression generalization of the anchoring equation:

$$\theta_i^{(k)} = \theta_{\text{ref}}^{(k)} + \Delta^{(k)}(x_i,\theta_{\text{ref}}^{(k)}),\qquad k=1,\dots,K,$$

each on its own link scale. Zero-inflation receives the framework's distinctive dual deviation: both $\pi_i$ and the count parameter $\theta_i$ are anchored to their respective references,

$$\mathrm{logit}(\pi_i)=\mathrm{logit}(\pi_{\text{ref}})+\Delta_\pi(x_i,\pi_{\text{ref}}),\qquad \theta_i=\theta_{\text{ref}}+\Delta_\theta(x_i,\theta_{\text{ref}}).$$
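A minimal sketch of link-scale anchoring for the two slots of a zero-inflated count model follows. The log link for the count rate, the linear shapes of the deviation functions, and all numeric values are assumptions made for illustration.

```python
# Link-scale anchoring sketch for two slots of a zero-inflated count model.
# The log link for the rate, the deviation shapes and all numbers are
# illustrative assumptions.
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def delta_pi(x, pi_ref):
    """Hypothetical deviation for the zero-inflation slot (logit scale)."""
    return 0.6 * x + 0.1 * x * logit(pi_ref)

def delta_rate(x, rate_ref):
    """Hypothetical deviation for the count slot (log scale)."""
    return -0.2 * x + 0.05 * x * math.log(rate_ref)

def anchored_slot(ref, delta, link, inv_link):
    """g(theta_i) = g(theta_ref) + Delta, mapped back to the natural scale."""
    return inv_link(link(ref) + delta)

pi_ref, rate_ref = 0.2, 4.0
x_i = 1.5

pi_i = anchored_slot(pi_ref, delta_pi(x_i, pi_ref), logit, inv_logit)
rate_i = anchored_slot(rate_ref, delta_rate(x_i, rate_ref), math.log, math.exp)

# With x = 0 both deviations vanish and each slot returns its reference.
pi_0 = anchored_slot(pi_ref, delta_pi(0.0, pi_ref), logit, inv_logit)
rate_0 = anchored_slot(rate_ref, delta_rate(0.0, rate_ref), math.log, math.exp)
```

Working on the link scale keeps each slot inside its support: `pi_i` stays in $(0,1)$ and `rate_i` stays positive whatever the size of the deviation.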

I.7 The causal bridge (CATE / ITE)

Because the AMM form produces individual parameters, it is naturally positioned for individual treatment-effect estimation. The package provides a T-learner causal bridge: fit the AMM model separately under treatment and control, then read individual conditional average treatment effects (CATE) / individual treatment effects (ITE) as the difference of the anchored individual predictions. A second layer compares this AMM-based learner against external meta-learners (via pluggable adapters to grf in R and EconML in Python through reticulate), so the framework's causal claims are benchmarked, not asserted.
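The T-learner logic can be sketched independently of the model fitted in each arm. In the example below a one-covariate least-squares line stands in for the AMM fit, purely as an assumption to keep the code short; `fit_line` and `t_learner` are hypothetical names, not gdpar functions.

```python
# T-learner sketch. A one-covariate least-squares line stands in for the
# AMM fit in each arm (an assumption made to keep the example tiny).
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def t_learner(x, y, treated):
    """Fit one model per arm and return x -> mu1(x) - mu0(x)."""
    arm1 = [(xi, yi) for xi, yi, t in zip(x, y, treated) if t]
    arm0 = [(xi, yi) for xi, yi, t in zip(x, y, treated) if not t]
    i1, s1 = fit_line(*zip(*arm1))
    i0, s0 = fit_line(*zip(*arm0))
    return lambda x_new: (i1 + s1 * x_new) - (i0 + s0 * x_new)

# Noise-free toy data: control follows 1 + 2x, treatment follows 2 + 3x.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 3.0, 5.0, 7.0, 2.0, 5.0, 8.0, 11.0]

cate = t_learner(x, y, treated)
effect_at_2 = cate(2.0)   # true effect in this toy setup is 1 + x = 3
```

Replacing `fit_line` with an anchored individual predictor in each arm gives the structure described above: the effect is the difference of two individual predictions evaluated at the same covariates.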

I.8 Geometric robustness of sampling (opt-in)

Hierarchical AMM posteriors can be geometrically hostile to standard HMC: funnels, near-determinism, heavy tails, multimodality. The package contains an opt-in geometry-adaptive sampling engine. Left at its default, it is bit-identical to ordinary sampling. When enabled, it climbs a ladder of increasingly powerful geometries, Euclidean → Riemannian (Fisher / SoftAbs) → sub-Riemannian → Finsler/relativistic, governed by a certifying orchestrator that diagnoses the pathology, selects a metric, tunes the integrator, and emits a certificate. A Laplace fallback provides a plug-in posterior (and ELPD on par with mgcv-REML / INLA-Laplace) when full sampling is certified infeasible.

I.9 Dependence-robust inference

gdpar does not model temporal or spatial dependence in its point structure (that is deferred, by design, to a future "Block 10"); instead it makes the inference robust to dependence that is present but unmodelled. It ships:

  • diagnostics that convert invisible iid risk into measured quantities — lag-1 autocorrelation, Durbin–Watson, Ljung–Box on residuals (temporal); Moran's $I$ with permutation and analytic Cliff–Ord variants (spatial);
  • robust standard errors / intervals via block bootstrap — moving/circular blocks in time, tiled randomized-origin blocks in space — with data-driven block lengths (Politis–White flat-top automatic length in time; a custom subsampling calibration in space).

Point estimates are unchanged; only the uncertainty is made robust. The honesty is explicit: the dependence is not modelled, the inference is merely made valid in its presence.
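The temporal side of this toolkit can be sketched in a few lines: two residual diagnostics and a moving-block bootstrap for the standard error of a mean. The fixed block length, the simulated AR(1)-like residuals, and the function names are assumptions; gdpar's data-driven block-length rules and its spatial variants are not reproduced here.

```python
# Sketch of two temporal diagnostics and a moving-block bootstrap for the
# standard error of a mean. Block length and toy data are assumptions.
import random
from statistics import fmean, stdev

def lag1_autocorr(e):
    m = fmean(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den

def durbin_watson(e):
    # Near 2 without first-order autocorrelation; below 2 when it is positive.
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def moving_block_bootstrap_se(series, block_len, n_boot=500, seed=0):
    rng = random.Random(seed)
    n = len(series)
    starts = range(n - block_len + 1)   # start index of every overlapping block
    n_blocks = -(-n // block_len)       # ceil(n / block_len)
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(series[s:s + block_len])
        means.append(fmean(sample[:n])) # trim to the original length
    return stdev(means)

# Toy residuals with positive serial dependence (AR(1) with coefficient 0.7).
rng = random.Random(1)
resid, prev = [], 0.0
for _ in range(200):
    prev = 0.7 * prev + rng.gauss(0.0, 1.0)
    resid.append(prev)

r1 = lag1_autocorr(resid)
dw = durbin_watson(resid)
se_block = moving_block_bootstrap_se(resid, block_len=10)
se_iid = moving_block_bootstrap_se(resid, block_len=1)   # ordinary iid bootstrap
```

With positively dependent residuals, `se_block` is typically larger than `se_iid`, while the point estimate `fmean(resid)` is the same in both cases. Only the uncertainty statement changes.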


