Part I Conceptual Framework
The framework originates in an observation about expert human prediction under uncertainty and consequence: a driver executing fast overtaking maneuvers at narrow distances without crashing. Decomposed, the expert's predictive process has three stages:
- A population reference. The driver carries an internal model of the average driver — typical reaction times, modal aggressiveness, characteristic acceleration/braking patterns. This functions as a prior.
- Rapid estimation of individual deviation. In a one-to-two-second window, the driver reads signals from the specific other driver (style, relative speed, vehicle type, micro-movements) and estimates how and how much that driver departs from the average.
- Conditional prediction and decision. With the individual profile expressed as a deviation from the average, the driver predicts the other's behavior and decides the maneuver.
A conventional model in its simplest form predicts with parameters that are fixed once estimated and shared by every individual.
More elaborate models do let parameters vary across individuals — hierarchical models with covariate-dependent random effects, varying-coefficient models, hypernetworks, mixtures of experts, state-space models. The framework's contribution is therefore not the absence of antecedents but the explicit, canonical formulation of a specific structural pattern those antecedents do not typically make explicit:
Conventional predictive models do not typically formulate individual parameters as explicit deviations from a population reference where the deviation function depends both on the individual's observable characteristics and on the reference itself.
The substantive content unpacks into three commitments:
- $\theta_{\text{ref}}$ is always present as an explicit anchor in the predictive equation.
- $\Delta_i$ is computed as a function of observable signals $x_i$.
- The individual prediction emerges from $\theta_i=\theta_{\text{ref}}+\Delta_i$, not from $\theta_i$ estimated independently per individual nor from $\theta_{\text{ref}}$ shifted by a structurally separate term.
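The decisive property is the structural dependence of $\Delta$ on $\theta_{\text{ref}}$. For each observation $i$, the model is $Y_i\mid\theta_i\sim\mathcal D(\theta_i)$ with $\theta_i=\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}})$.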
A model in this framework is fully specified by three design choices:
- How $\theta_{\text{ref}}$ is estimated: from the full sample, from a reference subset, or as a hyperparameter with its own distribution (this becomes the anchor argument in code, and the EB-vs-FB distinction).
- How $\Delta$ is estimated: parametric/linear, non-parametric (splines), or as a neural network output (the three paths).
- What distribution is assumed for $Y_i\mid\theta_i$: this determines the family of problems addressed (the distributional families).
The canonical functional form for $\Delta(x_i,\theta_{\text{ref}})$ is developed formally in Part II. Read componentwise, it has four properties:
- Anchoring. With no individual information ($x_i$ unobserved or non-informative), $\Delta\to 0$ and $\theta_i\to\theta_{\text{ref}}$: the model collapses to the population baseline.
- Individuation. With rich individual information, $\Delta$ can be large and parameters move away from the reference as far as the evidence justifies.
- Transferability. When $\theta_{\text{ref}}$ changes (new population), deviations are recomputed coherently because $\Delta$ is a function of $\theta_{\text{ref}}$.
- Generality. The form subsumes, as special cases, fixed effects ($\Delta\equiv 0$), classical random effects ($\Delta$ independent of $\theta_{\text{ref}}$), and random coefficients ($\Delta$ linear in $x_i$).
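The four properties can be checked on a toy deviation function. The sketch below is illustrative only: the form $\Delta(x,\theta_{\text{ref}})=(a+b\,\theta_{\text{ref}})\,x$ and the names `LinearDeviation` and `individual_parameter` are assumptions made for this example, not the canonical form of Part II and not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearDeviation:
    """Hypothetical deviation: Delta(x, theta_ref) = (a + b * theta_ref) * x."""
    a: float
    b: float

    def __call__(self, x: float, theta_ref: float) -> float:
        return (self.a + self.b * theta_ref) * x


def individual_parameter(x: float, theta_ref: float, deviation) -> float:
    """theta_i = theta_ref + Delta(x_i, theta_ref)."""
    return theta_ref + deviation(x, theta_ref)


delta = LinearDeviation(a=0.5, b=0.1)

# Anchoring: a non-informative signal (x = 0) returns the reference itself.
anchored = individual_parameter(0.0, 2.0, delta)

# Individuation: an informative signal moves the parameter off the reference.
individuated = individual_parameter(3.0, 2.0, delta)

# Transferability: the same signal under a new reference gives a new deviation.
delta_old = delta(3.0, 2.0)
delta_new = delta(3.0, 4.0)

# Special case: b = 0 makes Delta independent of theta_ref.
classical = LinearDeviation(a=0.5, b=0.0)
```

With `b = 0` the deviation no longer reacts to a change of reference, which is the classical random-effects special case listed above; with `b != 0` the deviation is recomputed whenever `theta_ref` moves.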
The same canonical principle is realized through three complementary engines, differing in how $\Delta$ is estimated:
- Path 1: Hierarchical Bayesian (Stan). A three-level hierarchy: a population level $\theta_{\text{ref}}\sim p(\theta\mid\text{hyper})$, an individual level $\theta_i\mid x_i \sim \mathcal N(\theta_{\text{ref}}+\Delta(x_i,\theta_{\text{ref}}),\Sigma_i)$ (with $\Sigma_i$ possibly covariate-dependent, i.e. individual heteroscedasticity), and an observation level $Y_i\mid\theta_i\sim\mathcal D(\theta_i)$. Most faithful to the cognitive analogy, with native full-posterior uncertainty. This is the operational path in `gdpar`.
- Path 2: Varying-coefficient models (penalized splines). The frequentist version: $Y_i = x_i^\top\beta(z_i)+\varepsilon_i$, with the reference recovered as $\beta_{\text{ref}}=\beta(\bar z)$ and the deviation as $\Delta_i=\beta(z_i)-\beta(\bar z)$. Maximal interpretability; suffers the curse of dimensionality. Conceptual in `gdpar 0.1.0` (see below).
- Path 3: Conditional parameter networks (hypernetworks / amortized inference). A neural network generates the individual parameters, $\theta_i=h_\phi(x_i,\theta_{\text{ref}})$, with anchoring enforced both by feeding $\theta_{\text{ref}}$ as an explicit input and by a regularizer $\lambda\|\theta_i-\theta_{\text{ref}}\|^2$. Arbitrary nonlinearity, lowest interpretability. Conceptual in `gdpar 0.1.0`.
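The three-level hierarchy of Path 1 can be read as a generative recipe. The following is a minimal forward simulation under stated assumptions: a scalar parameter, a standard normal prior on $\theta_{\text{ref}}$, a Gaussian observation family for $\mathcal D$, and a hypothetical deviation `gamma * x * theta_ref`. It only draws from the model; it does not perform the HMC inference that the Stan path carries out, and none of the names below come from the package.

```python
import random


def deviation(x, theta_ref, gamma=0.3):
    # Hypothetical form: proportional to both the signal and the reference.
    return gamma * x * theta_ref


def simulate(xs, sigma_theta=0.5, sigma_y=1.0, seed=0):
    """Draw once from the population, individual, and observation levels."""
    rng = random.Random(seed)
    # Population level: theta_ref from its prior (standard normal, assumed).
    theta_ref = rng.gauss(0.0, 1.0)
    draws = []
    for x in xs:
        # Individual level: centred on theta_ref + Delta(x_i, theta_ref).
        mean_i = theta_ref + deviation(x, theta_ref)
        theta_i = rng.gauss(mean_i, sigma_theta)
        # Observation level: Y_i | theta_i, Gaussian here (assumed).
        y_i = rng.gauss(theta_i, sigma_y)
        draws.append((theta_i, y_i))
    return theta_ref, draws


theta_ref, draws = simulate([0.0, 0.5, 2.0], seed=42)
```

Setting `sigma_theta` to zero removes the individual-level noise, so each `theta_i` equals the anchored mean exactly; an individual with `x = 0` then sits on the reference.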
Implementation status. In `gdpar 0.1.0` only Path 1 (Hierarchical Bayesian) is operational and is the default. Calls of the form `gdpar(..., path = "vcm")` or `gdpar(..., path = "hyper")` abort with a `gdpar_unsupported_feature_error`. The asymptotic theory for all three paths is nonetheless developed to reference grade (Part II, from vignettes `v04`–`v06`), so the package's mathematical scope exceeds its current executable surface by design.
Comparative summary (from the framework overview):
| Criterion | Path 1 Bayesian | Path 2 VCM | Path 3 Hypernetwork |
|---|---|---|---|
| Fidelity to cognitive analogy | High | Moderate | Moderate |
| Theoretical rigor | Very high | High | Moderate |
| Interpretability | High | Very high | Low |
| Expressive capacity of $\Delta$ | Moderate (parametric) | Moderate–high | Arbitrarily high |
| Scalability to high dimension | Moderate | Low (curse of dim.) | High |
| Uncertainty quantification | Native (full posteriors) | Asymptotic (CIs) | Requires extensions |
| Primary tools | Stan, cmdstanr | mgcv, splines | torch |
Within Path 1, the reference $\theta_{\text{ref}}$ can be estimated in two ways:
- Full Bayes (FB). Everything (reference, deviation coefficients, hyperparameters) is given priors and sampled jointly by HMC. Output: full joint posterior. This is `gdpar()`.
- Empirical Bayes (EB). The hyperparameters (and, in the marginal variant, the reference itself) are estimated by maximizing a marginal likelihood / MAP objective; the remaining parameters are then inferred conditionally. Far cheaper, with an explicit and analyzable contraction/asymptotic story. This is `gdpar_eb()`.
The package treats EB and FB as parallel, comparable estimation routes and ships an explicit comparator (`gdpar_compare_eb_fb`) that quantifies how much the two agree.
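The two-step logic of Empirical Bayes (maximize a marginal likelihood for the hyperparameters, then infer the rest conditionally) is easiest to see in the textbook normal-normal model. The sketch below assumes $y_i\sim\mathcal N(\theta_i,\sigma^2)$ with known $\sigma^2$ and $\theta_i\sim\mathcal N(\mu,\tau^2)$; it is a generic illustration, not the algorithm behind `gdpar_eb()`, and the function name is hypothetical.

```python
from statistics import fmean


def empirical_bayes_normal(y, sigma2):
    """EB for y_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2), sigma2 > 0 known."""
    n = len(y)
    # Step 1: marginally y_i ~ N(mu, sigma2 + tau2). The marginal MLE of mu is
    # the sample mean; the MLE of the total variance is the 1/n sample
    # variance, truncated so that tau2 stays non-negative.
    mu_hat = fmean(y)
    total_var = sum((v - mu_hat) ** 2 for v in y) / n
    tau2_hat = max(0.0, total_var - sigma2)
    # Step 2: conditional posterior means given the plug-in hyperparameters.
    # Each estimate is pulled from the datum toward the reference mu_hat.
    shrink = tau2_hat / (tau2_hat + sigma2)
    theta_hat = [mu_hat + shrink * (v - mu_hat) for v in y]
    return mu_hat, tau2_hat, theta_hat


mu_hat, tau2_hat, theta_hat = empirical_bayes_normal([1.0, 2.0, 3.0, 4.0, 5.0], sigma2=1.0)
```

When the observation noise dominates the spread of the data, `tau2_hat` is truncated to zero and every individual estimate collapses to `mu_hat`, a small-scale analogue of the anchoring behaviour. A full-Bayes treatment would instead place priors on `mu` and `tau2` and propagate their uncertainty.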
The framework is not restricted to modelling a mean. A distribution can have several parameters (location, scale, shape, tail index, zero-inflation probability), and each of these is a slot that can carry its own AMM decomposition. The package indexes the slots and treats each on its own link scale. Zero-inflation receives the framework's distinctive dual deviation.
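One way to picture slots living on their own link scales is to add reference and deviation on the link scale and map back through an inverse link. The sketch below is an assumption-laden illustration: the slot names, the link pairings (identity for location, log for scale, logit for the zero-inflation probability), and the purely additive composition are conventional choices made for this example, not confirmed details of the package, and the dual deviation for zero-inflation is not modelled here.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Assumed inverse links, one per slot (hypothetical slot names).
INVERSE_LINKS = {
    "location": lambda eta: eta,   # identity link
    "scale": math.exp,             # log link keeps the scale positive
    "zero_inflation": inv_logit,   # logit link keeps a probability in (0, 1)
}


def slot_parameters(eta_ref, delta):
    """Add reference and deviation per slot on the link scale, then map back."""
    return {
        slot: INVERSE_LINKS[slot](eta_ref[slot] + delta[slot])
        for slot in eta_ref
    }


eta_ref = {"location": 1.0, "scale": 0.0, "zero_inflation": 0.0}
no_deviation = {"location": 0.0, "scale": 0.0, "zero_inflation": 0.0}
baseline = slot_parameters(eta_ref, no_deviation)

some_deviation = {"location": -0.4, "scale": 0.7, "zero_inflation": -2.0}
individual = slot_parameters(eta_ref, some_deviation)
```

Working on the link scale means any real-valued deviation yields a valid parameter: the scale stays positive and the zero-inflation probability stays strictly between 0 and 1.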
Because the AMM form produces individual parameters, it is naturally positioned for individual treatment-effect estimation. The package provides a T-learner causal bridge: fit the AMM model separately under treatment and control, then read individual conditional average treatment effects (CATE) / individual treatment effects (ITE) as the difference of the anchored individual predictions. A second layer compares this AMM-based learner against external meta-learners (via pluggable adapters to grf in R and EconML in Python through reticulate), so the framework's causal claims are benchmarked, not asserted.
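The T-learner recipe (fit one model per arm, then difference the predictions) can be sketched with any base learner. The example below substitutes a one-covariate least-squares line for the AMM model, so it illustrates the meta-learner structure only; `fit_line` and `t_learner_cate` are hypothetical names, not package functions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def t_learner_cate(xs, ys, treated, x_new):
    """Fit one model per arm, then difference the two predictions at x_new."""
    arm1 = [(x, y) for x, y, t in zip(xs, ys, treated) if t]
    arm0 = [(x, y) for x, y, t in zip(xs, ys, treated) if not t]
    a1, b1 = fit_line(*zip(*arm1))
    a0, b0 = fit_line(*zip(*arm0))
    return [(a1 + b1 * x) - (a0 + b0 * x) for x in x_new]


# Noiseless toy data: control follows 1 + 2x, treatment follows 1.5 + 3x,
# so the true effect at covariate value x is 0.5 + x.
xs = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [1.0, 3.0, 5.0, 7.0, 1.5, 4.5, 7.5, 10.5]

cate = t_learner_cate(xs, ys, treated, x_new=[0.0, 2.0])
```

Because the two arms are fitted independently, the estimated effect is allowed to vary with the covariate; here it grows linearly in `x`, matching the data-generating rule in the comment.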
Hierarchical AMM posteriors can be geometrically hostile to standard HMC — funnels, near-determinism, heavy tails, multimodality. The package contains an opt-in geometry-adaptive sampling engine whose default is bit-identical to ordinary sampling and which, when enabled, climbs a ladder of increasingly powerful geometries: Euclidean → Riemannian (Fisher / SoftAbs) → sub-Riemannian → Finsler/relativistic, governed by a certifying orchestrator that diagnoses the pathology, selects a metric, tunes the integrator, and emits a certificate. A Laplace fallback provides a plug-in posterior (and ELPD on par with mgcv-REML / INLA-Laplace) when full sampling is certified infeasible.
gdpar does not model temporal or spatial dependence in its point structure (that is deferred, by design, to a future "Block 10"); instead it makes the inference robust to dependence that is present but unmodelled. It ships:
- diagnostics that convert invisible iid risk into measured quantities: lag-1 autocorrelation, Durbin–Watson, Ljung–Box on residuals (temporal); Moran's $I$ with permutation and analytic Cliff–Ord variants (spatial);
- robust standard errors / intervals via block bootstrap (moving/circular blocks in time, tiled randomized-origin blocks in space) with data-driven block lengths (Politis–White flat-top automatic length in time; a custom subsampling calibration in space).
Point estimates are unchanged; only the uncertainty is made robust. The honesty is explicit: the dependence is not modelled, the inference is merely made valid in its presence.
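Two of the temporal diagnostics and the moving block bootstrap are simple enough to write out directly. The sketch below uses textbook definitions and a user-supplied block length; it does not implement the Politis–White automatic length or any spatial variant, and the function names are hypothetical rather than the package's.

```python
import random


def lag1_autocorrelation(res):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(res)
    mean = sum(res) / n
    centered = [r - mean for r in res]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return num / sum(c * c for c in centered)


def durbin_watson(res):
    """Durbin-Watson statistic: near 2 when residuals are uncorrelated."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    return num / sum(r * r for r in res)


def moving_block_bootstrap_se(series, block_len, n_boot=500, seed=0):
    """Standard error of the mean from overlapping blocks drawn with replacement."""
    rng = random.Random(seed)
    n = len(series)
    starts = range(n - block_len + 1)   # every admissible block start
    n_blocks = -(-n // block_len)       # ceil(n / block_len)
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(series[s:s + block_len])
        means.append(sum(sample[:n]) / n)   # trim to the original length
    grand = sum(means) / n_boot
    return (sum((m - grand) ** 2 for m in means) / (n_boot - 1)) ** 0.5


residuals = [1.0, -1.0, 1.0, -1.0]
r1 = lag1_autocorrelation(residuals)
dw = durbin_watson(residuals)
```

Resampling whole blocks keeps the short-range dependence inside each block intact, which is why the bootstrap interval stays valid without the dependence being modelled. The point estimate (here the mean) is the one computed on the original series.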