-
Notifications
You must be signed in to change notification settings - Fork 0
Part V Stan Templates 2
← Part V — Stan Templates (1/3) · gdpar Wiki Home · Part V — Stan Templates (3/3) →
This template implements the Empirical Bayes (EB) marginal model for the AMM (Anchored Modulation Model) reference-anchoring decomposition with R/eb.R via cmdstanr::laplace() in Step (i) of the EB workflow. The template body is bit-identical to the full-Bayesian template amm_distrib_K.stan; only the banner comment documents the EB context.
The banner establishes:
- Provenance: Sub-phase 8.6.C, decision D34, canonized in Session 10 (2026-05-25).
-
EB context: Used by
R/eb.Rviacmdstanr::laplace()for Laplace-approximate marginal inference on$\boldsymbol{\theta}_{\text{ref}}$ . -
Regime:
$K \geq 2,\ p = 1$ . Each of the$K$ slots carries a univariate$\theta_{\text{ref},k}$ . -
Codegen identity: Body is bit-identical to
amm_distrib_K.stan, avoiding drift between FB (full Bayes) and EB Path B paths. The R-side functiongenerate_stan_code_K()is reused verbatim via atemplate_pathoverride, including the full dispatch overstan_id$\in \{1,3,5,6,7,8,9,10,11,12,13\}$ covering D33 deferral. -
Companion template:
amm_eb_conditional_K.stanmoves$\theta_{\text{ref},k}$ fromparameters{}todata{}(astheta_ref_k_data), removes the per-slot hyperparametersmu_theta_ref_k/sigma_theta_ref_kfromparameters{}, and removes theTHETA_REF_PRIOR_BLOCKsubstitution frommodel{}, but is otherwise bit-identical.
For each slot
where:
| Component | Meaning | Basis Expansion |
|---|---|---|
| Global reference (intercept) for slot |
Scalar | |
| Non-linear deviation from reference |
|
|
| Non-linear scaling interaction |
|
|
| Modulation term |
|
The design matrices build_amm_design_K()), ensuring the identification constraints (C2) and (C3) hold identically per slot. Constraint (C4) holds per slot via
To eliminate the bimodality of
- Define
$\mathbf{c}_{b,k} = \theta_{\text{ref},k} \cdot \boldsymbol{\beta}_k$ (effective coefficients for the$b_k$ basis). - The model samples
$\mathbf{c}_{b,k}^{\text{raw}}$ directly (non-centered parametrization, NCP). - Reports
$\boldsymbol{\beta}_k = \mathbf{c}_{b,k} / \theta_{\text{ref},k}$ as a derived quantity.
This eliminates the sign-symmetry bimodality. See Recovery 2 (handoff 4 addendum, 2026-05-09).
-
$W$ : Globally shared across all$K$ slots. A singleW_raw, a singlesigma_W, and a single basis matrix feed the modulating term in every slot. -
$\theta_{\text{anchor}}$ : Per-slot. Each slot$k$ carries its own$\theta_{\text{anchor},k}$ . Default: 0.
real tweedie_log_W_series(real y, real phi, real p)Purpose: Computes
Mathematics: For
In log space:
Line-by-line:
-
real alpha_pos = (2.0 - p) / (p - 1.0);: Computes$\alpha = \frac{2-p}{p-1} > 0$ for$1 < p < 2$ . -
real log_z = alpha_pos * log(y) - alpha_pos * log(p - 1.0) - (1.0 + alpha_pos) * log(phi) - log(2.0 - p);: Computes the "base" log-factor$\log z$ common to each term, where: $$ \log z = \alpha \log y - \alpha \log(p-1) - (1+\alpha)\log\phi - \log(2-p) $$ so that$\log W_j = j \cdot \log z - \log(j!) - \log\Gamma(j\alpha)$ . -
real log_W = negative_infinity();: Initializes the running log-sum to$-\infty$ (log of zero). -
real max_log_W = negative_infinity();: Tracks the running maximum of$\log W_j$ for early termination. -
int max_iter = 1000;: Hard ceiling on the number of series terms. -
int passed_peak = 0;: Flag indicating whether the series terms have started declining from the peak. -
Loop
for (j_loop in 1:max_iter):-
real j = j_loop;: Cast loop index toreal. -
real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);: Computes$\log W_j = j \cdot \log z - \log\Gamma(j+1) - \log\Gamma(j\alpha)$ , noting that$\log(j!) = \log\Gamma(j+1)$ . -
if (term > max_log_W) { max_log_W = term; }: Updates the running maximum. -
else { passed_peak = 1; }: Once a term is below the running max, the peak has been passed and the series is in its tail. -
log_W = log_sum_exp(log_W, term);: Accumulates$\log W = \text{log\_sum\_exp}(\log W, \log W_j)$ , i.e., the log of the partial sum. -
if (passed_peak == 1 && term < max_log_W - 37.0) { break; }: Early termination when the current term is more than 37 nats below the peak ($e^{-37} \approx 10^{-16}$ , i.e., relative contribution below machine-epsilon scale).
-
-
return log_W;: Returns$\log \sum_{j=1}^{J} W_j$ with adaptive truncation.
real tweedie_log_f_series(real y, real mu, real phi, real p)Purpose: Exact Tweedie log-density for
Mathematics: The Tweedie density in exponential dispersion form:
where
Line-by-line:
-
real theta = pow(mu, 1.0 - p) / (1.0 - p);: Natural parameter$\theta = \frac{\mu^{1-p}}{1-p}$ . For$1 < p < 2,\ 1 - p < 0$ , so$\theta < 0$ . -
real kappa = pow(mu, 2.0 - p) / (2.0 - p);: Cumulant function$\kappa(\theta) = \frac{\mu^{2-p}}{2-p}$ . For$1 < p < 2,\ 2 - p > 0$ , so$\kappa > 0$ . -
return (y * theta - kappa) / phi - log(y) + tweedie_log_W_series(y, phi, p);: Returns: $$ \log f(y) = \frac{y,\theta - \kappa}{\phi} - \log y + \log W(y, \phi, p) $$
real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p)Purpose: Saddlepoint approximation to the Tweedie log-density for
Mathematics: The saddlepoint approximation uses half the unit deviance:
where the unit deviance
Line-by-line:
-
real deviance = 2.0 * ( y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p) - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p) );: Computes$d(y, \mu)$ . -
return -0.5 * log(2.0 * pi() * phi * pow(y, p)) - deviance / (2.0 * phi);: Returns the saddlepoint approximation$\log \hat{f}(y)$ .
real tweedie_lpdf(real y, real mu, real phi, real p)Purpose: Dispatches to the appropriate log-density computation based on _lpdf suffix enables Stan's y ~ tweedie(mu, phi, p) sampling syntax.
Mathematics:
-
$y = 0$ : Closed-form compound Poisson mass: $$ \Pr(Y = 0) = \exp(-\lambda), \quad \lambda = \frac{\mu^{2-p}}{\phi,(2-p)} $$ so$\log \Pr(Y=0) = -\lambda$ . -
$y > 0,\ |p - 1.5| < \tau$ ($\tau = 0.4$ , central region): Dunn–Smyth series (tweedie_log_f_series). -
$y > 0,\ |p - 1.5| \geq \tau$ (tail region): Saddlepoint approximation (tweedie_log_f_saddlepoint).
Line-by-line:
-
real tau = 0.4;: Canonical threshold for the series/saddlepoint boundary. -
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));: Poisson rate$\lambda = \frac{\mu^{2-p}}{\phi(2-p)}$ of the compound Poisson representation. -
if (y == 0.0) { return -lambda; }:$\log\Pr(Y=0) = -\lambda$ . -
if (abs(p - 1.5) < tau) { return tweedie_log_f_series(y, mu, phi, p); }: Central region → exact series. -
else { return tweedie_log_f_saddlepoint(y, mu, phi, p); }: Tail region → saddlepoint.
real tweedie_rng(real mu, real phi, real p)Purpose: Generates a draw from the Tweedie(
Mathematics: For
-
$N \sim \text{Poisson}(\lambda)$ with$\lambda = \frac{\mu^{2-p}}{\phi(2-p)}$ -
$G_i \sim \text{Gamma}(\text{shape}, \text{rate})$ with$\text{shape} = \frac{2-p}{p-1},\ \text{rate} = \frac{1}{\phi(p-1)\mu^{p-1}}$ -
$Y = \sum_{i=1}^{N} G_i$ (and$Y = 0$ if$N = 0$ )
Line-by-line:
-
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));: Poisson rate$\lambda$ . -
int N = poisson_rng(lambda);: Draws$N \sim \text{Poisson}(\lambda)$ . -
if (N == 0) { return 0.0; }: If no gamma variates to sum, return 0. -
real shape = (2.0 - p) / (p - 1.0);: Gamma shape$= \alpha$ . -
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));: Gamma rate$= \frac{1}{\phi(p-1)\mu^{p-1}}$ . -
return gamma_rng(N * shape, rate);: Returns$Y \sim \text{Gamma}(N \cdot \alpha, \text{rate})$ , which equals the sum of$N$ i.i.d.$\text{Gamma}(\alpha, \text{rate})$ variates by the additivity property of the gamma distribution.
real apply_inv_link_by_id(int link_id, real eta)Purpose: Maps a link-function identifier to the canonical inverse-link transformation. Used by the
Mapping (populated R-side by .gdpar_compute_inv_link_id_per_slot under decision D3.5):
link_id |
Transformation | Example slots | |
|---|---|---|---|
| 0 | Identity | Gaussian |
|
| 1 | Inverse-logit | Beta |
|
| 2 | Exponential | Gaussian |
Line-by-line:
-
if (link_id == 0) return eta;: Identity:$g^{-1}(\eta) = \eta$ . -
if (link_id == 1) return inv_logit(eta);: Inverse-logit:$g^{-1}(\eta) = \text{logit}^{-1}(\eta) = \frac{e^{\eta}}{1+e^{\eta}}$ . -
return exp(eta);: Fallback (link_id == 2): exponential:$g^{-1}(\eta) = e^{\eta}$ .
The heterogeneity rule: for a homogeneous slot family_id_k[k] == family_id_k[1]), the ID reflects the canonical link of slot family_id_k[k] != family_id_k[1]), the ID reflects the canonical link of slot 1 of family_id_k[k].
This is a placeholder substituted by R/stan_codegen.R during template expansion. It injects family-specific canonical helper functions (e.g., log-likelihood wrappers, Jacobian corrections) resolved by the stan_id dispatch. The exact contents depend on the family configuration.
int<lower=1> n;Number of observations. Must be
int<lower=2> K;Number of
array[K] int<lower=1, upper=13> family_id_k;Per-slot family identifier. Each slot
| ID | Family | Links (per slot) | |
|---|---|---|---|
| 1 | gaussian |
|
2 |
| 3 | neg_binomial_2 |
|
2 |
| 5 | beta |
|
2 |
| 6 | gamma |
|
2 |
| 7 | lognormal_loc_scale |
|
2 |
| 8 | student_t |
|
3 |
| 9 | tweedie |
|
3 |
| 10 | zip |
|
2 |
| 11 | zinb |
|
3 |
| 12 | hurdle_poisson |
|
2 |
| 13 | hurdle_neg_binomial_2 |
|
3 |
Sub-phase 8.3.7 admits heterogeneous families per slot for the family_id_k[1] (the location slot); auxiliary slots consume inv_link_id_per_slot[k] via apply_inv_link_by_id().
array[K] int<lower=0, upper=2> inv_link_id_per_slot;Per-slot inverse-link identifier. Populated R-side by .gdpar_compute_inv_link_id_per_slot under the L1 refined rule (decision D3.5). Values: 0 = identity, 1 = inv_logit, 2 = exp. Consumed by the
array[K] int<lower=0, upper=1> use_a_k;Per-slot flag: 1 if the deviation component
array[K] int<lower=0, upper=1> use_b_k;Per-slot flag: 1 if the scaling interaction component
int<lower=0, upper=1> use_W;Global flag: 1 if the modulation component
array[K] int<lower=0> J_a_per_k;Number of basis columns in
array[K] int<lower=0> J_b_per_k;Number of basis columns in
int<lower=0> J_a_max;int<lower=0> J_b_max;array[K] matrix[n, J_a_max] Z_a_k;Padded per-slot design matrices for the Z_a_k[k] is segment() function extracts the active columns. These matrices are column-wise centered on the R side, enforcing identification constraints (C2) and (C3) per slot.
array[K] matrix[n, J_b_max] Z_b_k;Padded per-slot design matrices for the Z_a_k. Column-wise centered on the R side.
int<lower=0> dim_W;Total dimension of the W_raw in the parameters block.
int<lower=0> d;Number of columns in the covariate matrix
matrix[n, d] X;The
vector[n] y_real;Real-valued response vector. Used by continuous families: Gaussian (1), beta (5), gamma (6), lognormal (7), Student-t (8), Tweedie (9). Also used by continuous hurdle families when implemented.
array[n] int y_int;Integer-valued response array. Used by discrete families: neg_binomial_2 (3), zero-inflated Poisson (10), zero-inflated NB (11), hurdle Poisson (12), hurdle NB (13). The consumed branch follows family_id_k[1].
vector[K] theta_anchor_K;Per-slot anchor values gdpar_formula_set (e.g., theta_anchor = c(mu = 0, sigma = 0)). Default is typically 0. The modulation term evaluates
array[K] int<lower=0, upper=1> use_dispersion_y_k;Per-slot flag indicating whether the slot
array[K] int<lower=0, upper=1> use_dispersion_phi_k;Per-slot flag indicating whether slot param_spec retains population scope, kept for future extension to mixed (
int<lower=0, upper=2> W_type_id;Type identifier for the
int<lower=0> W_n_knots_full;Total number of knots for the
vector[W_n_knots_full] W_knots_full;Full knot sequence for the
int<lower=0> W_degree;Degree of the
int<lower=0, upper=1> use_groups;Flag: 1 if hierarchical grouping is active. The grouping structure is shared across all
int<lower=1> J_groups;Number of groups. Only meaningful when use_groups == 1.
array[n] int<lower=1, upper=J_groups> group_id;Group membership indicator for each observation
{{DATA_CP_A_PER_K_DECL}}Placeholder substituted by generate_a_blocks_K() based on the cp_a_per_K logical vector of length amm_distrib_multi.stan.
int<lower=1> K_slots;
int<lower=1> p_dim;Explicit dimensionality fields for the EB Path B convention. Under Path B:
-
K_slots == K(number of distributional parameter slots) -
p_dim == 1(univariate population reference per slot)
These are declared so that R-side .gdpar_eb_* helpers can populate them uniformly across all EB templates. The body of this template does not branch on either field.
This section (1/3) establishes the complete functions block (Tweedie density with series/saddlepoint dispatch, inverse-link dispatch, and code-generated helpers) and the complete data block (observation count, family/link identifiers, AMM component flags and design matrices, outcomes, anchors, dispersion flags, spline configuration for
with the transformed data, parameters, transformed parameters, model, and generated quantities.
This section contains four of Stan's program blocks in sequence: transformed data, parameters, transformed parameters, and the model block. Each is dissected below.
This block computes deterministic quantities from the data-level inputs before any parameters are sampled. Its primary role is bookkeeping: it sizes arrays, computes offsets for flat-packing raw parameter vectors, and builds index maps that ensure no unused parameter slot enters the model.
array[K] int J_a_free;
array[K] int J_b_free;
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;-
J_a_free[k]— Number of free (non-trivially sampled) additive-design columns for slot$k$ . A column is free only if (a) the slot participates in the additive component (use_a_k[k] == 1) and (b) the slot actually has at least one design column (J_a_per_k[k] > 0). Otherwise it collapses to 0. Mathematically:
-
J_b_free[k]— Identical logic for the multiplicative-design columns:
-
a_raw_offset[k]— Cumulative byte-offset into the flat-packed raw additive vectora_rawwhere slot$k$ 's raw coefficients begin. Computed astotal_J_a_freebefore incrementing, so slot$k$ 's raw entries occupy indicesa_raw_offset[k] + 1througha_raw_offset[k] + J_a_free[k]. -
c_b_raw_offset[k]— Same for the multiplicative raw vectorc_b_k_raw. -
total_J_a_free,total_J_b_free— Running sums, ultimately equal to$\sum_{k=1}^{K} J_{a,\text{free}}^{(k)}$ and$\sum_{k=1}^{K} J_{b,\text{free}}^{(k)}$ respectively. These become the dimension of the flat parameter vectorsa_rawandc_b_k_raw.
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;When the global modulating component use_W == 0), its dimensions are collapsed to 0 so that every matrix/array sized by them vanishes, eliminating dead code paths.
Similarly for
int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;-
any_use_a— Becomes 1 if any slot$k$ hasuse_a_k[k] == 1. Guards all code that references$\sigma_a^{(k)}$ ora_raw. -
any_use_b— Same for the multiplicative component. -
sigma_W_size— Equals 1 when$W$ is active with positive dimension, 0 otherwise. Controls whether a single global scale$\sigma_W$ is declared. -
sigma_y_size— Accumulatesuse_dispersion_y_k[k]across slots. Counts how many slots request a population-scoped Gaussian dispersion$\sigma_y$ . Each such slot contributes one entry to the compacted vectorsigma_y_pop. -
phi_size— Same for negative-binomial / Beta dispersion$\phi$ .
The accumulation loop:
for (k in 1:K) {
J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
J_b_free[k] = (use_b_k[k] == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
a_raw_offset[k] = total_J_a_free;
c_b_raw_offset[k] = total_J_b_free;
total_J_a_free += J_a_free[k];
total_J_b_free += J_b_free[k];
if (use_a_k[k] == 1) any_use_a = 1;
if (use_b_k[k] == 1) any_use_b = 1;
sigma_y_size += use_dispersion_y_k[k];
phi_size += use_dispersion_phi_k[k];
}After the loop:
sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
if (J_a_free[k] > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}Rationale (referenced as RG.6, D96): A slot whose additive component either does not exist (use_a_k[k] == 0) or is intercept-only (the intercept is fully absorbed into
Compaction strategy: Only slots with sigma_a_k. The map sigma_a_idx[k] records:
When every slot has free additive coefficients, sigma_a_idx is the identity
All quantities declared here are sampled by the MCMC algorithm. The block realizes the AMM (Additive–Multiplicative–Modulating) decomposition at the parameter level.
array[J_groups] vector[K] theta_ref_k;For each group
Each
array[use_groups] vector[K] mu_theta_ref_k;
array[use_groups] vector<lower=0>[K] sigma_theta_ref_k;When use_groups == 1, these are the mean and standard deviation of the hierarchical prior on the anchors. The array[use_groups] idiom (dimension 0 or 1) is Stan's way of conditionally declaring parameters: if use_groups == 0, the arrays have size 0 and no memory/sampling is allocated.
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;-
sigma_a_k— Per-slot standard deviation of the additive component, compacted ton_sigma_aentries (see §1.4). Each entry is$\sigma_a^{(k)} > 0$ . -
a_raw— Flat-packed vector of raw additive coefficients across all active slots. For slot$k$ with$J_{a,\text{free}}^{(k)} > 0$ , its entries occupy positionsa_raw_offset[k]+1througha_raw_offset[k]+J_a_free[k]. The actual coefficients are$a^{(k)}_j = \sigma_a^{(k)} \cdot a_{\text{raw},j}^{(k)}$ , realizing the non-centered parameterization:
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;-
sigma_b_k— Per-slot scale for the multiplicative component. Dimension is$K \cdot \texttt{any\_use\_b}$ : if no slot uses the multiplicative component, the array is empty. When present,$\sigma_b^{(k)} > 0$ . -
c_b_k_raw— Flat-packed raw multiplicative coefficients. The actual coefficients are:
The relationship to the multiplicative design
In the single-group regime, the multiplicative coefficient can be decomposed as
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;-
sigma_W— A single global scale (array of size 0 or 1).$\sigma_W > 0$ . -
W_raw— A$\texttt{dim\_W} \times d$ matrix of raw modulation weights. The actual weights are$W = \sigma_W \cdot W_{\text{raw}}$ , with$W_{\text{raw},\ell,j} \sim \mathcal{N}(0,1)$ (or whateverW_PRIORspecifies).
These weights define the global modulating component that maps basis-function evaluations of
array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;Population-level dispersion parameters for slots whose param_spec.scope remains "population" after promotion. sigma_y_pop carries entries for Gaussian-scale slots; phi_pop carries entries for NB/Beta dispersion slots. These are compacted similarly to sigma_a_k.
This block reconstructs derived quantities from the raw parameters and builds the per-observation, per-slot linear predictors
array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;-
a_coef_k[k]— Additive coefficients for slot$k$ , lengthJ_a_max(padded with zeros if the slot has fewer design columns). -
c_b_k[k]— Multiplicative product-level coefficients for slot$k$ . -
b_coef_k[k]— Multiplicative coefficients divided by the anchor (only non-zero in the single-group regime). -
eta_k— The$n \times K$ matrix of per-observation, per-slot linear predictors.
All are initialized to zero:
for (k in 1:K) {
a_coef_k[k] = rep_vector(0, J_a_max);
c_b_k[k] = rep_vector(0, J_b_max);
b_coef_k[k] = rep_vector(0, J_b_max);
}This placeholder is replaced at code-generation time. Conceptually, for each slot
if (any_use_b == 1) {
for (k in 1:K) {
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
}This implements the non-centered parameterization:
Next, the derived quantity use_groups == 0) and when the anchor is non-zero:
if (use_groups == 0 && theta_ref_k[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k[1][k];
}
}With multiple groups, b_coef_k stays at zero; downstream R code derives
for (i in 1:n) {
int g_i = group_id[i];
for (k in 1:K) {
real theta_ref_ik = theta_ref_k[g_i][k];
real eta_ik = theta_ref_ik;For each observation
Additive contribution (conditional on use_a_k[k] and J_a_per_k[k]):
if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]),
a_coef_k[k][1:J_a_per_k[k]]
);
}Multiplicative contribution (conditional on use_b_k[k] and J_b_per_k[k]):
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]),
c_b_k[k][1:J_b_per_k[k]]
);
}Modulating contribution (conditional on use_W, dim_W, d):
if (use_W == 1 && dim_W > 0 && d > 0) {
vector[d] W_diff_x_k = rep_vector(0, d);
vector[dim_W] basis_diff_k = apply_W_basis_diff(
W_type_id, theta_ref_ik, theta_anchor_K[k], dim_W,
W_degree, W_n_knots_full, W_knots_full
);
for (jj_local in 1:dim_W) {
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
}This computes:
-
Evaluate the basis function
$\phi_\ell\bigl(\theta_{\text{ref},g_i}^{(k)} - \theta_{\text{anchor}}^{(k)}\bigr)$ for$\ell = 1, \ldots, \texttt{dim\_W}$ . The functionapply_W_basis_diffdispatches onW_type_id(1 = polynomial, 2 = B-spline) to produce the vector$\texttt{basis\_diff\_k} \in \mathbb{R}^{\texttt{dim\_W}}$ . -
The modulation vector for covariate direction
$j$ is:
where {{W_SCALE}} (typically 1).
- The modulating contribution to the linear predictor is:
This realizes a varying-coefficient model where the effect of covariate
Finally, the assembled linear predictor is stored:
eta_k[i, k] = eta_ik;Full AMM decomposition summary for observation
where
This block specifies the log-posterior: priors on all parameters and the likelihood of the observed data given the linear predictors
{{THETA_REF_PRIOR_BLOCK}}This placeholder is replaced at code-generation time. In the hierarchical case (use_groups == 1), the prior is:
For structurally constrained slots (e.g., Tweedie's power
if (any_use_a == 1) {
sigma_a_k ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}-
sigma_a_k ~ {{PRIOR_SIGMA_A}}— Prior on the per-slot additive scales. Typically half-normal or half-Cauchy. -
{{MODEL_A_BLOCK}}— Prior on the raw additive coefficients. In the canonical form:
which, combined with
if (any_use_b == 1) {
sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_k_raw ~ normal(0, 1);
}-
sigma_b_k ~ {{PRIOR_SIGMA_B}}— Prior on the per-slot multiplicative scales. -
c_b_k_raw ~ normal(0, 1)— Non-centered prior:$c_{b,\text{raw}} \sim \mathcal{N}(0, I)$ , so$c_b \sim \mathcal{N}(0, \sigma_b^2 I)$ marginally.
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}-
sigma_W[1]— Global modulation scale, receiving its configured prior. -
W_raw— Flattened and given its configured prior (typically$\mathcal{N}(0,1)$ , yielding$W \sim \mathcal{N}(0, \sigma_W^2 I)$ marginally).
if (sigma_y_size > 0) {
sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
phi_pop ~ {{PRIOR_PHI}};
}Population-scoped dispersion parameters receive their family-appropriate priors.
The likelihood dispatches on family_id_k[1], which is the family identifier for slot 1 (the location slot). The template handles
The general structure for all families is:
- Transform
$\eta_k[i, k]$ through the slot-specific inverse link to obtain the natural parameter. - Feed natural parameters into the appropriate Stan likelihood.
The inverse-link function is applied via apply_inv_link_by_id(inv_link_id_per_slot[k], eta_k[i, k]), where link IDs map as:
| ID | Link | Inverse |
|---|---|---|
| 1 | identity | |
| 2 | log | |
| 3 | logit |
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}Slot 1 carries
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}Slot 1: neg_binomial_2 parametrization uses
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}Canonical mean-precision parametrization: beta_proportion(mu, phi) has mean
for (i in 1:n) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}
$$
$$
y_i \sim \text{Gamma}\!\bigl(\alpha_i,\; \alpha_i / \mu_i\bigr).
$$
Mean-shape parametrization: $\mu_i = e^{\eta_i^{(1)}}$ (location), $\alpha_i = g_2^{-1}(\eta_i^{(2)})$ (shape). Stan's Gamma uses shape–rate $(\alpha, \beta)$ with $\mathbb{E}[y] = \alpha/\beta$; the code converts via $\beta_i = \alpha_i / \mu_i$.
##### 4.6.5 Family 7 — Lognormal ($K=2$)
```stan
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}
$$
$$
\log(y_i) \sim \mathcal{N}\!\bigl(\eta_i^{(1)},\; g_2^{-1}(\eta_i^{(2)})\bigr).
$$
Slot 1 carries $\mu$ of the log-scale (identity link); slot 2 carries $\sigma$ of the log-scale (typically log link).
##### 4.6.6 Family 8 — Student-t (K=3)
```stan
for (i in 1:n) {
y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}
$$
$$
y_i \sim t_{\nu_i}\!\bigl(\eta_i^{(1)},\; e^{\eta_i^{(2)}}\bigr), \qquad \nu_i = e^{\eta_i^{(3)}}.
$$
Three slots: location (identity), log-scale, and log-degrees-of-freedom. Stan's `student_t(nu, mu, sigma)` takes the raw degrees of freedom.
##### 4.6.7 Family 9 — Tweedie ($K=3$)
```stan
for (i in 1:n) {
y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}
$$
$$
y_i \sim \text{Tweedie}\!\bigl(e^{\eta_i^{(1)}},\; e^{\eta_i^{(2)}},\; \eta_i^{(3)}\bigr).
$$
Three slots: $\log(\mu)$ (log link), $\log(\phi)$ (log link), and $p$ (identity link, structurally bounded to $(1.01, 1.99)$). The custom `tweedie_lpdf` (defined in section 1) dispatches between Dunn–Smyth series and saddlepoint approximation depending on $\lvert p - 1.5\rvert$.
##### 4.6.8 Family 10 — Zero-Inflated Poisson (K=2)
```stan
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(0 | eta_mu_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i);
}
}
$$
The ZIP likelihood (Lambert 1992):
$$
P(y_i = 0) = \pi_i + (1-\pi_i)\,e^{-\mu_i}, \qquad P(y_i = m) = (1-\pi_i)\,\frac{e^{-\mu_i}\mu_i^m}{m!}, \quad m > 0,
$$
where $\mu_i = e^{\eta_i^{(1)}}$ and $\pi_i = \text{logit}^{-1}(\eta_i^{(2)})$. The implementation uses log-sum-exp for numerical stability on the zero branch:
$$
\log P(y_i=0) = \text{log\_sum\_exp}\!\bigl(\text{logit}^{-1}(\eta_\pi),\; \log(1-\pi_i) + \log P_{\text{Pois}}(0 \mid \mu_i)\bigr).
$$
##### 4.6.9 Family 11 — Zero-Inflated Negative Binomial ($K=3$)
```stan
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
}
}
$$
$$
P(y_i=0) = \pi_i + (1-\pi_i)\,\text{NB2}(0 \mid \mu_i, \phi_i), \qquad P(y_i=m) = (1-\pi_i)\,\text{NB2}(m \mid \mu_i, \phi_i), \;\; m>0.
$$
Three slots: $\log(\mu)$, $\log(\phi)$, $\text{logit}(\pi)$.
##### 4.6.10 Family 12 — Hurdle-Poisson (K=2)
```stan
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
}
$$
Hurdle likelihood (Mullahy 1986):
$$
P(y_i = 0) = \pi_i, \qquad P(y_i = m) = (1-\pi_i)\,\frac{\text{Pois}(m \mid \mu_i)}{1 - e^{-\mu_i}}, \quad m > 0.
$$
The truncation denominator $1 - e^{-\mu_i}$ is computed stably as `log1m_exp(-exp(eta_mu_i))`, i.e., $\text{log1m\_exp}(-\mu_i) = \log(1 - e^{-\mu_i})$.
##### 4.6.11 Family 13 — Hurdle-Negative Binomial ($K=3$)
```stan
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
}
}
$$
$$
P(y_i=0) = \pi_i, \qquad P(y_i=m) = (1-\pi_i)\,\frac{\text{NB2}(m \mid \mu_i, \phi_i)}{1 - \text{NB2}(0 \mid \mu_i, \phi_i)}, \quad m>0.
$$
Three slots: $\log(\mu)$, $\log(\phi)$, $\text{logit}(\pi)$. The truncation denominator is computed in log-space via `log1m_exp`.
##### 4.6.12 Inactive families
```stan
// family_id_k[1] in {2 (poisson), 4 (bernoulli)}: these are
// K = 1 families (single eligible param_spec); the dispatcher
// never routes them here.Poisson (family 2) and Bernoulli (family 4) are single-parameter distributions (families.R) never promotes them to this marginal-
The entire template realizes the following structural identity for the distributional regression:
where each slot's linear predictor decomposes as the reference-anchored AMM form:
The non-centered parameterizations (sigma_a_k eliminates non-identified scale parameters for slots without free additive coefficients. The global
The generated quantities block computes posterior quantities that are derived from the parameter draws. In this AMM (Anchored Mixed Model) context, it serves three purposes: (1) computing pointwise log-likelihoods eta_k incorporate the reference-anchoring decomposition; thus, all derived quantities automatically respect the anchoring constraint.
vector[n] log_lik;
matrix[n, K] theta_i_k = eta_k;
vector[n] y_pred;-
log_lik: A vector of length$n$ (number of observations) storing the per-observation log-likelihoods. These are used for model comparison criteria. -
theta_i_k: An$n \times K$ matrix initialized toeta_k, which contains the linear predictors for each observation$i$ and each distributional slot$k$ (after reference anchoring). This is essentially a copy of the fitted linear predictors for convenience. -
y_pred: A vector of length$n$ storing posterior-predictive draws from the fitted model.
The block loops over all family_id_k[1] to compute the log-likelihood and posterior-predictive draw according to the specified distributional regression family. The distributional parameters are obtained by applying the appropriate inverse link functions to the linear predictors stored in eta_k[i, k].
if (family_id_k[1] == 1) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
}- The normal model has two distributional slots: mean
$\mu$ and standard deviation$\sigma$ . -
$\mu_i = \eta_{i1}$ (identity link for mean). -
$\sigma_i = g_2^{-1}(\eta_{i2})$ , where$g_2^{-1}$ is the inverse link function for the scale slot (typically softplus or exp to ensure$\sigma > 0$ ). - Log-likelihood:
$\ell_i = \log \mathcal{N}(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2}\log(2\pi) - \log \sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2}$ . - Posterior-predictive draw:
$\tilde{y}_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ .
} else if (family_id_k[1] == 3) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
y_pred[i] = neg_binomial_2_rng(mu_i, phi_i);
}- Negative binomial parameterized with mean
$\mu$ and dispersion$\phi$ (sometimes called "size"). -
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\phi_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures$\phi > 0$ ). - Log-likelihood:
$\ell_i = \log \text{NB2}(y_i \mid \mu_i, \phi_i) = \log\binom{y_i + \phi_i - 1}{y_i} + \phi_i \log\left(\frac{\phi_i}{\phi_i + \mu_i}\right) + y_i \log\left(\frac{\mu_i}{\phi_i + \mu_i}\right)$ . - Posterior-predictive draw:
$\tilde{y}_i \sim \text{NB2}(\mu_i, \phi_i)$ .
} else if (family_id_k[1] == 5) {
real mu_i = inv_logit(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
y_pred[i] = beta_proportion_rng(mu_i, phi_i);
}- Beta-proportion parameterized with mean
$\mu \in (0,1)$ and precision$\phi > 0$ . -
$\mu_i = \text{logit}^{-1}(\eta_{i1})$ (logistic link for mean). -
$\phi_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures$\phi > 0$ ). - Log-likelihood:
$\ell_i = \log \text{Beta-Proportion}(y_i \mid \mu_i, \phi_i) = \log\left(\frac{\Gamma(\phi)}{\Gamma(\mu_i\phi)\Gamma((1-\mu_i)\phi)}\right) + (\mu_i\phi-1)\log y_i + ((1-\mu_i)\phi-1)\log(1-y_i)$ . - Posterior-predictive draw:
$\tilde{y}_i \sim \text{Beta-Proportion}(\mu_i, \phi_i)$ .
} else if (family_id_k[1] == 6) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
y_pred[i] = gamma_rng(shape_i, shape_i / mu_i);
}- Gamma parameterized with shape
$\alpha$ and rate$\beta$ . -
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\alpha_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures$\alpha > 0$ ). - Rate is
$\beta_i = \alpha_i / \mu_i$ (mean$\mu = \alpha/\beta$ ). - Log-likelihood:
$\ell_i = \log \mathcal{G}(y_i \mid \alpha_i, \beta_i) = (\alpha_i-1)\log y_i - \beta_i y_i + \alpha_i\log\beta_i - \log\Gamma(\alpha_i)$ . - Posterior-predictive draw:
$\tilde{y}_i \sim \mathcal{G}(\alpha_i, \beta_i)$ .
} else if (family_id_k[1] == 7) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = lognormal_rng(eta_k[i, 1], sigma_i);
}- Lognormal parameterized with location
$\mu$ (of the underlying normal) and scale$\sigma$ . -
$\mu_i = \eta_{i1}$ (identity link for location). -
$\sigma_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures$\sigma > 0$ ). - Log-likelihood:
$\ell_i = -\frac{1}{2}\log(2\pi) - \log\sigma_i - \log y_i - \frac{(\log y_i - \mu_i)^2}{2\sigma_i^2}$ . - Posterior-predictive draw:
$\tilde{y}_i \sim \text{Lognormal}(\mu_i, \sigma_i^2)$ .
} else if (family_id_k[1] == 8) {
real nu_i = exp(eta_k[i, 3]);
real mu_i = eta_k[i, 1];
real sigma_i = exp(eta_k[i, 2]);
log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
}- Student-t with three distributional slots: degrees of freedom
$\nu$ , location$\mu$ , scale$\sigma$ . -
$\nu_i = \exp(\eta_{i3})$ (log link for degrees of freedom). -
$\mu_i = \eta_{i1}$ (identity link for location). -
$\sigma_i = \exp(\eta_{i2})$ (log link for scale). - Log-likelihood:
$\ell_i = \log t_\nu(y_i \mid \mu_i, \sigma_i) = \log\Gamma\left(\frac{\nu_i+1}{2}\right) - \log\Gamma\left(\frac{\nu_i}{2}\right) - \frac{1}{2}\log(\nu_i\pi) - \log\sigma_i - \frac{\nu_i+1}{2}\log\left(1 + \frac{(y_i-\mu_i)^2}{\nu_i\sigma_i^2}\right)$ . - Posterior-predictive draw:
$\tilde{y}_i \sim t_{\nu_i}(\mu_i, \sigma_i^2)$ .
} else if (family_id_k[1] == 9) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = exp(eta_k[i, 2]);
real p_i = eta_k[i, 3];
log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
}- Tweedie with three distributional slots: mean
$\mu$ , dispersion$\phi$ , power$p$ (typically$1 < p < 2$ for compound Poisson-gamma). -
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\phi_i = \exp(\eta_{i2})$ (log link for dispersion). -
$p_i = \eta_{i3}$ (identity link for power). - Log-likelihood:
$\ell_i = \log f(y_i \mid \mu_i, \phi_i, p_i)$ where$f$ is the Tweedie density (no closed form; Stan uses numerical integration or series expansion). - Posterior-predictive draw:
$\tilde{y}_i \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$ .
} else if (family_id_k[1] == 10) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(0 | eta_mu_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = poisson_log_rng(eta_mu_i);
}
}- ZIP has two distributional slots: Poisson rate (log scale) and zero-inflation probability (logit scale).
-
$\eta_{\mu,i} = \eta_{i1}$ (linear predictor for Poisson log-rate). -
$\eta_{\pi,i} = \eta_{i2}$ (linear predictor for zero-inflation probability). - Log-likelihood for
$y_i = 0$ : $$ \ell_i = \log\left(\pi_i + (1-\pi_i)e^{-\lambda_i}\right) = \log\left(e^{\log\pi_i} + e^{\log(1-\pi_i) + \log\lambda_i \cdot 0 - \lambda_i}\right) $$ where$\pi_i = \text{logit}^{-1}(\eta_{\pi,i}),\ \lambda_i = \exp(\eta_{\mu,i})$ . This is implemented vialog_sum_expto avoid underflow. - Log-likelihood for
$y_i > 0$ : $$ \ell_i = \log(1-\pi_i) + \log\left(\frac{\lambda_i^{y_i}e^{-\lambda_i}}{y_i!}\right) $$ - Posterior-predictive draw: First draw zero-inflation indicator
$z_i \sim \text{Bernoulli}(\pi_i)$ . If$z_i = 1$ , set$\tilde{y}_i = 0$ ; else draw$\tilde{y}_i \sim \text{Poisson}(\lambda_i)$ .
} else if (family_id_k[1] == 11) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
}
}- ZINB has three distributional slots: NB log-rate, NB dispersion, zero-inflation probability (logit).
-
$\eta_{\mu,i} = \eta_{i1}$ (NB log-rate). -
$\phi_i = \exp(\eta_{i2})$ (dispersion, exponentiated). -
$\eta_{\pi,i} = \eta_{i3}$ (zero-inflation logit). - Log-likelihood structure analogous to ZIP but using NB2 instead of Poisson.
- Posterior-predictive draw: Draw
$z_i \sim \text{Bernoulli}(\pi_i)$ . If$z_i=1,\ \tilde{y}_i=0$ ; else$\tilde{y}_i \sim \text{NB2}(\mu_i, \phi_i)$ .
} else if (family_id_k[1] == 12) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = poisson_log_rng(eta_mu_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}- Hurdle-Poisson has two distributional slots: Poisson rate (log scale) and hurdle probability (logit).
-
$\eta_{\mu,i} = \eta_{i1}$ (Poisson log-rate). -
$\eta_{\pi,i} = \eta_{i2}$ (hurdle logit). - Log-likelihood for
$y_i = 0:\ \ell_i = \log\pi_i$ . - Log-likelihood for
$y_i > 0$ : $$ \ell_i = \log(1-\pi_i) + \log\left(\frac{\lambda_i^{y_i}e^{-\lambda_i}}{y_i!}\right) - \log\left(1-e^{-\lambda_i}\right) $$ The term$\log\left(1-e^{-\lambda_i}\right)$ is the truncation normalizer (probability of being positive). It is computed aslog1m_exp(-exp(eta_mu_i))which is$\log(1 - \exp(-\lambda_i))$ . - Posterior-predictive draw: Draw hurdle indicator
$z_i \sim \text{Bernoulli}(\pi_i)$ . If$z_i=1,\ \tilde{y}_i=0$ ; else draw from truncated Poisson via rejection sampling: repeatedly draw from$\text{Poisson}(\lambda_i)$ until a draw is positive (up to 10000 iterations for numerical safety).
} else if (family_id_k[1] == 13) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
phi_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}- Hurdle-NB has three distributional slots: NB log-rate, NB dispersion, hurdle probability (logit).
-
$\eta_{\mu,i} = \eta_{i1}$ (NB log-rate). -
$\phi_i = \exp(\eta_{i2})$ (dispersion). -
$\eta_{\pi,i} = \eta_{i3}$ (hurdle logit). - Log-likelihood for
$y_i=0:\ \ell_i = \log\pi_i$ . - Log-likelihood for
$y_i>0$ : $$ \ell_i = \log(1-\pi_i) + \log\text{NB2}(y_i \mid \mu_i, \phi_i) - \log\left(1 - \text{NB2}(0 \mid \mu_i, \phi_i)\right) $$ The truncation normalizer$\log\left(1 - \text{NB2}(0 \mid \mu_i, \phi_i)\right)$ is computed aslog1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)). - Posterior-predictive draw: Similar to hurdle-Poisson but draws from truncated NB via rejection sampling.
} else {
log_lik[i] = negative_infinity();
y_pred[i] = 0;
}- If
family_id_k[1]does not match any of the implemented families, set log-likelihood to$-\infty$ and predictive draw to 0. This is a safety fallback.
In the AMM framework, the linear predictors eta_k are constructed as:
$$
\eta_k = \mathbf{X}_k\beta_k + \mathbf{Z}_k\gamma_k + \text{(anchor adjustments)}
$$
where generated quantities block operates on the already-constrained eta_k, so all likelihoods and predictions are consistent with the anchored decomposition. The matrix theta_i_k provides direct access to these anchored linear predictors for diagnostics or plotting.
This is section 1 of 2 of the EB-marginal Path-A template. It is bit-identical in body to amm_distrib_multi.stan; only the banner differs (it documents the Empirical-Bayes context, where the template is fed to cmdstanr::laplace() in step (i) of the EB workflow for regime amm_eb_conditional_multi.stan moves theta_ref from parameters{} to data{} (renamed theta_ref_data) and drops the prior on theta_ref/mu_theta_ref/sigma_theta_ref.
The reference-anchoring decomposition realized by the template is, per individual
The linear predictor transformed parameters is exactly
with family_id. Cross-dimensional coupling is carried exclusively by
The multiplicative term is reparametrized linearly: defining c_b_raw), reporting b_coef as a derived quantity. This eliminates the
functions {
// {{CANONICAL_HELPERS}}
}A single placeholder, {{CANONICAL_HELPERS}}, is substituted by R/stan_codegen.R. Per the banner, the helpers include apply_W_basis_diff(), the W-basis dispatcher introduced in sub-phase 8.3.8 (2026-05-22). Its signature, as invoked in transformed parameters, is
vector[W_per_k_dim] apply_W_basis_diff(
int W_type_id,
real theta_ref_ik,
real theta_anchor_k,
int W_per_k_dim,
int W_degree,
int W_n_knots_full,
vector[W_n_knots_full] W_knots_full
);It returns, for a single coordinate, the basis-difference vector
where W_type_id == 1) or the B-spline basis (W_type_id == 2) of degree W_degree on knots W_knots_full. For W_type_id == 0 the helper returns zeros (W off). The Cox–de Boor recursion for B-splines is re-evaluated at
int<lower=1> n;
int<lower=1> p;
int<lower=1, upper=4> family_id;-
n— number of observations. -
p— dimension of the population reference$\theta_{\mathrm{ref}}\in\mathbb{R}^p$ and of each$y_i$ . -
family_id— homogeneous family selector:-
1= Gaussian (identity link), -
2= Poisson (log link), -
3=neg_binomial_2(log link, Stan's mean-dispersion parametrization), -
4= Bernoulli (logit link).
-
int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;Binary toggles for the additive (
int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[p] int<lower=0> J_a_per_k;
array[p] int<lower=0> J_b_per_k;-
J_a_max,J_b_max— maximum column counts across coordinates (the padded storage width). -
J_a_per_k[k],J_b_per_k[k]— actual column counts for coordinate$k$ ; zero when the component is inactive for that coordinate. The ragged structure is handled by padded storage plus per-coordinate iteration bounds.
array[p] matrix[n, J_a_max] Z_a;
array[p] matrix[n, J_b_max] Z_b;For coordinate amm_spec_multi_designs()), so
int<lower=0> dim_W;
int<lower=0> d;
int<lower=0> W_per_k_dim;
matrix[n, d] X;-
dim_W— total row count ofW_raw. For separable polynomial/B-spline bases,dim_W = p * W_per_k_dim. -
d— column count ofW_rawand of the covariate matrixX. -
W_per_k_dim— block size per coordinate;dim_W / pfor separable bases. -
X— covariate matrix$X\in\mathbb{R}^{n\times d}$ entering the W-modulation term.
matrix[n, p] y_real;
array[n, p] int y_int;Both blocks are always allocated; only the one matching family_id is consumed (y_real for family_id == 1; y_int otherwise).
vector[p] theta_anchor;
int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;-
theta_anchor— fixed anchor$\theta_{\mathrm{anchor}}\in\mathbb{R}^p$ used in the W reparametrization$W(\theta_{\mathrm{ref}})-W(\theta_{\mathrm{anchor}})$ , satisfying (C4) of Block 1 exactly. -
use_dispersion_y— toggles per-coordinate Gaussian SDsigma_y(only meaningful forfamily_id == 1). -
use_dispersion_phi— toggles per-coordinate NB2 dispersionphi(only meaningful forfamily_id == 3).
int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;-
W_type_id—0off,1polynomial,2B-spline. -
W_n_knots_full,W_knots_full,W_degree— full knot vector and degree for the B-spline basis (ignored whenW_type_id != 2).
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;{{DATA_CP_A_PER_K_DECL}}-
use_groups == 0(default, backward-compatible):J_groups == 1,group_id[i] == 1for all$i$ , reducing to a single global$\theta_{\mathrm{ref}}$ of length$p$ . -
use_groups == 1: hierarchical per-group anchors$\theta_{\mathrm{ref},g},\ g=1,\dots,J_{\text{groups}}$ , with coordinate-wise hyperpriors. Identifiability condition (C5):$Z_a[k]$ must not contain afactor(group)indicator; R-side preflight enforces this. -
{{DATA_CP_A_PER_K_DECL}}— placeholder substituted from thecp_a_per_klogical vector (Phase H.2); declares any per-coordinate CP/NCP scale hyperparameters needed by the additive block.
int<lower=1> K_slots;
int<lower=1> p_dim;Declared explicitly so R-side .gdpar_eb_* helpers can populate them uniformly. Under Path A, K_slots == 1 and p_dim == p. The template body does not branch on either field.
array[p] int J_a_free;
array[p] int J_b_free;
array[p] int a_raw_offset;
array[p] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;Per-coordinate free-coefficient counts and flat-pack offsets into the packed vectors a_raw and c_b_raw. a_raw_offset[k] is the start index (0-based) of coordinate a_raw; same convention for c_b_raw_offset.
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
int sigma_a_size = (use_a == 1) ? p : 0;
int sigma_b_size = (use_b == 1) ? p : 0;
int sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;
int sigma_y_size = (use_dispersion_y == 1) ? p : 0;
int phi_size = (use_dispersion_phi == 1) ? p : 0;These drive conditional array sizes in parameters{}. Note:
-
sigma_W_sizeis either0or1— W has a single global scale, not per-coordinate. -
dim_W_effandd_effcollapse to0when W is off, soW_rawbecomes amatrix[0,0](a zero-size object Stan accepts).
for (k in 1:p) {
J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
a_raw_offset[k] = total_J_a_free;
c_b_raw_offset[k] = total_J_b_free;
total_J_a_free += J_a_free[k];
total_J_b_free += J_b_free[k];
}For each coordinate
-
$J_{a,\text{free}}[k] = J_{a,\text{per\_k}}[k]$ ifuse_a == 1and the per-coordinate count is positive, else$0$ . -
$J_{b,\text{free}}[k]$ analogous. -
a_raw_offset[k]records the running total before adding coordinate$k$ 's block, so coordinate$k$ occupies indicesa_raw_offset[k]+1 .. a_raw_offset[k]+J_a_free[k](1-based) withina_raw.
array[J_groups] vector[p] theta_ref;
array[use_groups] vector[p] mu_theta_ref;
array[use_groups] vector<lower=0>[p] sigma_theta_ref;-
theta_ref[g]— the per-group population reference$\theta_{\mathrm{ref},g}\in\mathbb{R}^p$ . Whenuse_groups == 0,J_groups == 1and onlytheta_ref[1]is sampled, directly from{{PRIOR_THETA_REF}}(coordinate-wise i.i.d.). -
mu_theta_ref[1][k],sigma_theta_ref[1][k]— coordinate-wise hyperparameters of the group-level Normal prior ontheta_ref[g][k]. Allocated only whenuse_groups == 1(array size collapses to0otherwise, so the parameters vanish from the model).
array[sigma_a_size] real<lower=0> sigma_a;
vector[total_J_a_free] a_raw;-
sigma_a[k]— per-coordinate SD of the additive coefficients. Sizepwhenuse_a == 1, else0. -
a_raw— flat-packed standardized additive coefficients, length$\sum_k J_{a,\text{free}}[k]$ . CP vs NCP per coordinate is resolved by{{TP_A_BLOCK}}and{{MODEL_A_BLOCK}}(driven bycp_a_per_k).
array[sigma_b_size] real<lower=0> sigma_b;
vector[total_J_b_free] c_b_raw;-
sigma_b[k]— per-coordinate SD on$c_b$ (the linearly-identified product$\theta_{\mathrm{ref},k}\cdot b_{\mathrm{coef},k}$ ). -
c_b_raw— flat-packed standardized$c_b$ coefficients. Always NCP:c_b_raw ~ normal(0,1)(seemodel{}).
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;-
sigma_W[1]— single global SD on W (allocated only whenuse_W == 1 && dim_W > 0). -
W_raw—$\dim_W\times d$ matrix of raw W coefficients. Row block$k$ (rows$(k-1)W_{\text{per\_k\_dim}}+1$ through$k\,W_{\text{per\_k\_dim}}$ ) corresponds to coordinate$k$ . CP/NCP is toggled globally via{{W_SCALE}}and{{W_PRIOR}}.
array[sigma_y_size] real<lower=0> sigma_y;
array[phi_size] real<lower=0> phi;-
sigma_y[k]— per-coordinate Gaussian SD (only whenuse_dispersion_y == 1). -
phi[k]— per-coordinate NB2 dispersion (only whenuse_dispersion_phi == 1).
array[p] vector[J_a_max] a_coef;
array[p] vector[J_b_max] c_b;
array[p] vector[J_b_max] b_coef;
matrix[n, p] eta;-
a_coef[k]— padded additive coefficient vector for coordinate$k$ (lengthJ_a_max, only firstJ_a_per_k[k]entries meaningful). -
c_b[k]— padded$c_b$ vector (the linearly-identified product). -
b_coef[k]— padded derived$b_{\mathrm{coef}}$ vector (reported only in single-anchor regime; see below). -
eta— the$n\times p$ linear-predictor matrix;$\eta_{i,k}=\theta_{i,k}$ under the AMM decomposition.
for (k in 1:p) {
a_coef[k] = rep_vector(0, J_a_max);
c_b[k] = rep_vector(0, J_b_max);
b_coef[k] = rep_vector(0, J_b_max);
}All padded coefficient vectors start at zero; only the active prefix of each is overwritten downstream. This guarantees that unused padded columns contribute exactly zero to eta.
{{TP_A_BLOCK}}Substituted from cp_a_per_k. For each coordinate
-
NCP:
$a_{\mathrm{coef},k}[j] = \sigma_a[k]\cdot a_{\text{raw}}[\text{offset}+j],\ j=1,\dots,J_{a,\text{free}}[k]$ . -
CP:
$a_{\mathrm{coef},k}[j] = a_{\text{raw}}[\text{offset}+j]$ (the scale is folded into the prior inmodel{}).
In both cases the active prefix of a_coef[k] is populated; the padded tail remains zero.
if (use_b == 1) {
for (k in 1:p) {
if (J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
}
if (use_groups == 0 && theta_ref[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef[k][j] = c_b[k][j] / theta_ref[1][k];
}
}
}
}
}For each active coordinate
-
NCP reconstruction of
$c_b$ : $$ c_{b,k}[j] ;=; \sigma_b[k]\cdot c_{b,\text{raw}}[\text{offset}k + j], \qquad j=1,\dots,J{b,\text{free}}[k]. $$ -
Derived
$b_{\mathrm{coef}}$ (single-anchor regime only, and only when$\theta_{\mathrm{ref},1,k}\neq 0$ ): $$ b_{\mathrm{coef},k}[j] ;=; \frac{c_{b,k}[j]}{\theta_{\mathrm{ref},1,k}}. $$ Underuse_groups == 1there is no canonical scalar to divide by (each group has its own$\theta_{\mathrm{ref},g,k}$ ), sob_coefstays at zero and users must derive it post-hoc per group from$c_b$ and$\theta_{\mathrm{ref},g,k}$ .
for (i in 1:n) {
int g_i = group_id[i];
for (k in 1:p) {
real theta_ref_ik = theta_ref[g_i][k];
real eta_ik = theta_ref_ik;
...
eta[i, k] = eta_ik;
}
}For each observation
real theta_ref_ik = theta_ref[g_i][k];
real eta_ik = theta_ref_ik;if (use_a == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
a_coef[k][1:J_a_per_k[k]]
);
}Only the active prefix of the padded row/column is sliced; the padded tail is excluded by the 1:J_a_per_k[k] range.
if (use_b == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
c_b[k][1:J_b_per_k[k]]
);
}The data identify
if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
vector[d] W_diff_x_k = rep_vector(0, d);
vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
W_degree, W_n_knots_full, W_knots_full
);
int row_start = (k - 1) * W_per_k_dim + 1;
for (jj_local in 1:W_per_k_dim) {
int row = row_start + jj_local - 1;
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
}This block computes, for coordinate
-
Basis difference (via
apply_W_basis_diff): $$ \Delta^{(k)}\ell ;=; B^{(k)}\ell(\theta_{\mathrm{ref},g_i,k}) - B^{(k)}\ell(\theta{\mathrm{anchor},k}), \qquad \ell=1,\dots,W_{\text{per_k_dim}}. $$ -
Row block for coordinate
$k$ withinW_raw: $$ \text{row_start} ;=; (k-1),W_{\text{per_k_dim}} + 1, $$ so block$k$ occupies rows$\text{row\_start},\dots,\text{row\_start}+W_{\text{per\_k\_dim}}-1$ . -
Contraction of the basis difference with the W row block, producing a length-
$d$ vector: $$ \bigl(W_{\text{diff_x},k}\bigr)j ;=; \sum{\ell=1}^{W_{\text{per_k_dim}}} \Delta^{(k)}\ell \cdot \widetilde{W}{\text{row_start}+\ell-1,;j}, \qquad j=1,\dots,d, $$ where$\widetilde{W}$ denotesW_rawafter applying{{W_SCALE}}. Under NCP,{{W_SCALE}}resolves to/ sigma_W[1], giving$\widetilde{W}_{r,j}=W_{\text{raw},r,j}/\sigma_W$ ; under CP it is empty, so$\widetilde{W}=W_{\text{raw}}$ . -
Dot with covariate row: $$ \eta_{i,k}^{(4)} ;=; \sum_{j=1}^{d} \bigl(W_{\text{diff_x},k}\bigr)j , X{i,j} ;=; \bigl(W_k(\theta_{\mathrm{ref},g_i}) - W_k(\theta_{\mathrm{anchor}})\bigr),x_i. $$
The guard use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0 ensures the block is skipped whenever W is structurally absent.
eta[i, k] = eta_ik;This is exactly the AMM reference-anchoring decomposition stated in the banner.
if (use_groups == 1) {
mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
for (g in 1:J_groups) {
theta_ref[g] ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
}
} else {
theta_ref[1] ~ {{PRIOR_THETA_REF}};
}-
Hierarchical regime (
use_groups == 1): coordinate-wise hyperpriors on the mean and SD vectors, then a coordinate-wise independent Normal group-level prior: $$ \mu_{\theta_{\mathrm{ref}},k} \sim \pi_{\mu}, \quad \sigma_{\theta_{\mathrm{ref}},k} \sim \pi_{\sigma}, \quad \theta_{\mathrm{ref},g,k}\mid\mu_{\theta_{\mathrm{ref}},k},\sigma_{\theta_{\mathrm{ref}},k} ;\sim; \mathcal{N}!\bigl(\mu_{\theta_{\mathrm{ref}},k},,\sigma_{\theta_{\mathrm{ref}},k}\bigr). $$ Thenormal(mu_theta_ref[1], sigma_theta_ref[1])call is vectorized over the length-$p$ vectors, so each coordinate$k$ uses its own$(\mu_k,\sigma_k)$ . -
Single-anchor regime (
use_groups == 0):theta_ref[1]is sampled directly from{{PRIOR_THETA_REF}}(coordinate-wise i.i.d.).
if (use_a == 1) {
sigma_a ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}sigma_a[k] receives prior {{PRIOR_SIGMA_A}}. {{MODEL_A_BLOCK}} places the prior on a_raw:
-
NCP per coordinate:
a_raw ~ normal(0, 1)(the scale is applied in{{TP_A_BLOCK}}). -
CP per coordinate:
a_raw ~ normal(0, sigma_a[k])(or whatever prior is configured), with the scale folded into the prior directly.
The per-coordinate CP/NCP choice is driven by cp_a_per_k (Phase H.2).
if (use_b == 1) {
sigma_b ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}$$
\sigma_{b,k} \sim \pi_{\sigma_b}, \qquad c_{b,\text{raw},m} ;\sim; \mathcal{N}(0,1) \quad\text{(i.i.d.\ over all packed entries)}.
$$
The transformed parameters reconstruction
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}$$
\sigma_W \sim \pi_{\sigma_W}, \qquad \mathrm{vec}(W_{\text{raw}}) \sim \pi_W.
$$
Under NCP, {{W_PRIOR}} resolves to normal(0, 1) and {{W_SCALE}} (in transformed parameters) divides by sigma_W[1]. Under CP, {{W_PRIOR}} resolves to normal(0, sigma_W[1]) and {{W_SCALE}} is empty. The toggle is global (single sigma_W), not per-coordinate.
if (use_dispersion_y == 1) {
sigma_y ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi ~ {{PRIOR_PHI}};
}Per-coordinate priors on Gaussian SDs and NB2 dispersions, allocated only when the corresponding toggle is on.
for (k in 1:p) {
if (family_id == 1) {
y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
} else if (family_id == 2) {
y_int[:, k] ~ poisson_log(eta[:, k]);
} else if (family_id == 3) {
y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
} else if (family_id == 4) {
y_int[:, k] ~ bernoulli_logit(eta[:, k]);
}
}Coordinate-wise independent likelihood, homogeneous family across
-
Gaussian (
family_id == 1, identity link): $$ y_{i,k} ;\sim; \mathcal{N}!\bigl(\eta_{i,k},,\sigma_{y,k}\bigr). $$ -
Poisson (
family_id == 2, log link): $$ y_{i,k} ;\sim; \mathrm{Poisson}!\bigl(e^{\eta_{i,k}}\bigr). $$ -
Negative binomial 2 (
family_id == 3, log link, mean-dispersion): $$ y_{i,k} ;\sim; \mathrm{NegBinomial2}!\bigl(e^{\eta_{i,k}},,\phi_k\bigr), $$ where$\phi_k$ is the dispersion (variance$= \mu + \mu^2/\phi_k$ ). -
Bernoulli (
family_id == 4, logit link): $$ y_{i,k} ;\sim; \mathrm{Bernoulli}!\bigl(\mathrm{logit}^{-1}(\eta_{i,k})\bigr). $$
The full joint likelihood factorizes as
Not present in section 1 of 2. The section ends after the model{} block's likelihood loop (the closing braces for the for (k in 1:p) loop, the model block, and the program are not shown in this excerpt and belong to section 2). No generated quantities block is visible in the provided source.
| AMM term | Code location | Math |
|---|---|---|
theta_ref[g_i][k] in transformed parameters
|
sampled (hierarchical or direct) | |
dot_product(Z_a[k][i,·], a_coef[k][·]) |
||
dot_product(Z_b[k][i,·], c_b[k][·]) |
|
|
apply_W_basis_diff + W_raw contraction + dot_product(W_diff_x_k, X[i])
|
Identifiability conditions realized: (C2) and (C3) by R-side column centering of factor(group) in use_groups == 1). The cross-dimensional condition C4-bis is not checked by this template; it assumes the spec has passed preflight.
Section 2 consists of a single closing brace } (terminating the block opened in section 1 — almost certainly the model block, since generated quantities is the next top-level block declared) followed by the generated quantities block in its entirety. No functions, data, transformed data, parameters, transformed parameters, or model declarations appear in this section; they all live in section 1 and are assumed visible here through Stan's lexical scoping (all symbols n, p, family_id, eta, sigma_y, phi, y_real, y_int are resolved from earlier blocks).
}Closes the immediately preceding top-level Stan block from section 1. Per Stan's grammar, top-level blocks must appear in the fixed order functions, data, transformed data, parameters, transformed parameters, model, generated quantities. Since generated quantities follows, the closed block is the model block (or, less likely, transformed parameters if the model block was empty/omitted — but Stan requires a model block, so the former is the only legal reading).
generated quantities {Opens the block executed once per HMC/NUTS posterior draw (after the leapfrog integration step). All quantities declared here are stored in the posterior and exposed to R as draws. Crucially, RNG calls inside this block use the per-draw RNG state seeded by Stan, so y_pred constitutes genuine posterior predictive draws.
matrix[n, p] log_lik;Declares an
i.e. the log-density (or log-PMF) of observation loo::loo() and loo::loo_matrix() for leave-one-out cross-validation with multivariate outcomes — each cell is an independent LOO unit. No initializer is supplied, so entries are NaN until assigned in the loop below; this is a latent contract that family_id ∈ {1,2,3,4}.
matrix[n, p] theta_i = eta;Declares theta_i and copies eta into it elementwise. This is the AMM reference-anchoring exposure point:
-
etais the linear predictor produced upstream (intransformed parametersof section 1) by the AMM decomposition — i.e. the canonical parameter on the link scale already anchored to the chosen reference level/category. -
theta_iis the per-observation canonical parameter$\theta_{ik}$ exposed to the user. The assignment$\theta_{ik} \leftarrow \eta_{ik}$ is the identity map on the link scale, so no Jacobian is required: the canonical parameter is reported in the same parameterization in which the likelihood is evaluated.
In the AMM reference-anchoring decomposition, the canonical parameter is written schematically as
where family_id). The section-1 code constructs theta_i for downstream R-side diagnostics, residual computation, or anchoring checks. Because the copy is on the link scale and the likelihood functions used below (*_log_lpmf, *_logit_lpmf) accept the linear predictor directly, the chain
matrix[n, p] y_pred;Declares the posterior predictive matrix. Element
using the same family and the same current-draw parameters as log_lik. Like log_lik, it is uninitialized and relies on the loop to fill every cell.
for (i in 1:n) {
for (k in 1:p) {Iterates over all i) and all k). The ordering — observation outermost, response innermost — matches the row-major layout of log_lik, theta_i, y_pred, eta, y_real, y_int. Each
The body of the inner loop is a four-way if / else if cascade keyed on the integer family_id (declared in data in section 1). Each branch performs two assignments: (a) the pointwise log-likelihood into log_lik[i,k], and (b) a posterior predictive draw into y_pred[i,k]. There is no else default, so any family_id outside NaN (a silent contract violation).
if (family_id == 1) {
log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
}Likelihood (continuous response, stored in y_real):
with pointwise log-density
The link is the identity normal_lpdf is called with eta[i,k] as the mean. sigma_y[k] is a response-specific SD (declared in parameters/transformed parameters of section 1).
Posterior predictive draw:
Note the predictive RNG reuses the same
} else if (family_id == 2) {
log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = poisson_log_rng(eta[i, k]);
}Likelihood (count response, stored in y_int):
with pointwise log-PMF (the _log suffix indicates Stan applies
Link: theta_i = eta exposes the natural parameter directly — consistent with the AMM canonical-parameter convention.
Posterior predictive draw:
where poisson_log_rng internally exponentiates
} else if (family_id == 3) {
log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
}Likelihood (overdispersed count response):
where
Pointwise log-PMF (Stan's neg_binomial_2_log_lpmf applies
with phi[k] is response-specific (declared in section 1).
Posterior predictive draw:
} else if (family_id == 4) {
log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
}Likelihood (binary/0-1 response):
with pointwise log-PMF (Stan's bernoulli_logit_lpmf uses the numerically stable form, avoiding explicit evaluation of
Link: theta_i = eta exposes the natural parameter — fully consistent with the AMM canonical-parameter convention.
Posterior predictive draw:
}
}
}Three closing braces, innermost to outermost:
- Closes the
if/else ifcascade (per$(i,k)$ cell). - Closes the inner
for (k in 1:p)loop. - Closes the outer
for (i in 1:n)loop — and, on the same line, thegenerated quantitiesblock (the final brace also serves as the close of the entire Stan program, sincegenerated quantitiesis the last top-level block).
For each posterior draw
where
| Matrix | Content | Shape | Role |
|---|---|---|---|
log_lik |
pointwise log-likelihood for LOO/WAIC | ||
theta_i |
|
canonical parameter on link scale (AMM anchor exposure) | |
y_pred |
posterior predictive draws |
The AMM (Anchored Measurement / reference-anchored dynamic parameter) decomposition, as realized across the two sections, separates the model into three layers:
-
Anchor layer (section 1,
parameters/transformed parameters): defines a reference level/category and the anchored deviation structure that builds$\eta_{ik}$ from reference effects plus AMM deviations. -
Canonical-parameter layer (this section, line
theta_i = eta): exposes the linear predictor$\eta_{ik}$ as the canonical parameter$\theta_{ik}$ on the natural/link scale. Because every family used here is a member of the exponential family with canonical link (identity for Gaussian, log for Poisson/NB2, logit for Bernoulli),$\eta_{ik}$ is the natural parameter, and the copy is the identity reparameterization — no Jacobian term is needed in the model density. -
Observation layer (this section, the
if/else ifcascade): maps$\theta_{ik}=\eta_{ik}$ through the inverse link inside Stan's*_log_lpmf/*_logit_lpmfprimitives to evaluate the likelihood and to draw posterior predictive samples.
The "marginal" qualifier in the filename is realized by the absence of any cross-
functions {
// {{CANONICAL_HELPERS}}
}Explanation: This block declares user-defined functions. The placeholder {{CANONICAL_HELPERS}} is dynamically replaced by the R code-generation engine with a set of helper functions specific to the canonical AMM reference-anchoring model. These include functions for computing basis differences, scaling transformations, and other mathematical utilities. The exact functions depend on the model configuration (e.g., spline basis for W_type_id). Since they are generated at runtime, their implementation is not shown here, but they are essential for computing the non-linear W component in the transformed parameters block.
data {
int<lower=1> n;
int<lower=1, upper=4> family_id;
int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;
int<lower=0> J_a;
int<lower=0> J_b;
int<lower=0> dim_W;
int<lower=0> d;
matrix[n, J_a] Z_a;
matrix[n, J_b] Z_b;
matrix[n, d] X;
vector[n] y_real;
array[n] int y_int;
real theta_anchor;
int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;
int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;
// Sub-phase 8.6.B canonical-but-degenerate dimension fields. K_slots
// and p_dim default to 1 for the 8.6.B regime and are checked R-side
// by .gdpar_eb_validate_inputs(); 8.6.C will relax the K = 1 and
// p = 1 guards and exercise these fields nontrivially.
int<lower=1> K_slots;
int<lower=1> p_dim;
}Explanation: The data block defines all observed and fixed quantities. Each declaration is described below:
-
n: Number of observations. -
family_id: Integer index (1–4) selecting the likelihood family: 1 = Gaussian, 2 = Poisson, 3 = negative binomial, 4 = Bernoulli logistic. -
use_a,use_b,use_W: Binary flags (0/1) indicating whether to include the additive random effect (a), multiplicative random effect (b), and non-linear deviation (W) components. -
J_a,J_b: Number of columns in the design matrices foraandb. Ifuse_a=0orJ_a=0, thenais effectively absent. -
dim_W: Dimension of theWbasis expansion (e.g., number of basis functions). -
d: Dimension of the covariate vectorXused in theWinteraction. -
Z_a,Z_b: Design matrices for theaandbrandom effects, each of dimension$n \times J_a$ and$n \times J_b$ . -
X: Covariate matrix of dimension$n \times d$ for theWcomponent. -
y_real: Real-valued response vector (used whenfamily_id=1). -
y_int: Integer-valued response array (used whenfamily_id=2,3,4). -
theta_anchor: Fixed reference anchor value for the non-linearWcomponent. -
use_dispersion_y,use_dispersion_phi: Binary flags for including dispersion parameters$\sigma_y$ (Gaussian) and$\phi$ (negative binomial). -
W_type_id,W_n_knots_full,W_knots_full,W_degree: Parameters defining the basis for theWcomponent (e.g., spline type, knots, degree). -
use_groups: Flag for hierarchical structure oftheta_ref. -
J_groups: Number of groups (ifuse_groups=1). -
group_id: Integer array mapping each observation to a group (1 to$J_{\text{groups}}$ ). -
K_slots,p_dim: Placeholder dimensions for future multivariate extensions (here fixed to 1).
transformed data {
int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
}Explanation: This block computes effective dimensions after applying the use_* flags:
-
J_a_free: Effective number ofacoefficients. Equals$J_a$ ifuse_a=1and$J_a>0$ , else 0. -
J_b_free: Effective number ofbcoefficients. Equals$J_b$ ifuse_b=1and$J_b>0$ , else 0. -
dim_W_eff: Effective dimension ofWbasis. Equalsdim_Wifuse_W=1, else 0. -
d_eff: Effective dimension ofX. Equalsdifuse_W=1, else 0.
These are used to size parameter arrays later.
parameters {
vector[J_groups] theta_ref;
array[use_groups] real mu_theta_ref;
array[use_groups] real<lower=0> sigma_theta_ref;
array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a;
vector[J_a_free] a_raw;
array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b;
vector[J_b_free] c_b_raw;
array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;
array[use_dispersion_y] real<lower=0> sigma_y;
array[use_dispersion_phi] real<lower=0> phi;
}Explanation: This block declares the latent parameters:
-
theta_ref: Vector of reference parameters for each group. Dimension$J_{\text{groups}}$ . -
mu_theta_ref,sigma_theta_ref: Mean and standard deviation of the hierarchical prior ontheta_ref(arrays of length 1 ifuse_groups=1, else empty). -
sigma_a: Standard deviation for theacoefficients (array of length 1 ifuse_a=1and$J_a>0$ , else empty). -
a_raw: Raw (non-centered) additive random effects, vector of lengthJ_a_free. -
sigma_b: Standard deviation for thebcoefficients (array of length 1 ifuse_b=1and$J_b>0$ , else empty). -
c_b_raw: Raw (non-centered) multiplicative random effects, vector of lengthJ_b_free. -
sigma_W: Standard deviation for theWbasis coefficients (array of length 1 ifuse_W=1anddim_W>0, else empty). -
W_raw: Raw (non-centered)Wbasis coefficients, matrix of dimensiondim_W_eff$\times$ d_eff. -
sigma_y: Gaussian dispersion (array of length 1 ifuse_dispersion_y=1, else empty). -
phi: Negative binomial dispersion (array of length 1 ifuse_dispersion_phi=1, else empty).
All raw parameters are intended to have unit-scale priors (non-centered parameterization), while the actual coefficients are derived in the transformed parameters block.
transformed parameters {
vector[J_a] a_coef = rep_vector(0, J_a);
vector[J_b] c_b = rep_vector(0, J_b);
vector[J_b] b_coef = rep_vector(0, J_b);
vector[n] eta;
if (use_a == 1 && J_a > 0) {
for (j in 1:J_a_free) {
a_coef[j] = a_raw[j]{{A_SCALE}};
}
}
if (use_b == 1 && J_b > 0) {
for (j in 1:J_b_free) {
c_b[j] = c_b_raw[j] * sigma_b[1];
}
if (use_groups == 0 && theta_ref[1] != 0) {
for (j in 1:J_b) {
b_coef[j] = c_b[j] / theta_ref[1];
}
}
}
for (i in 1:n) {
real theta_ref_i = theta_ref[group_id[i]];
real eta_i = theta_ref_i;
if (use_a == 1 && J_a > 0) {
eta_i += Z_a[i] * a_coef;
}
if (use_b == 1 && J_b > 0) {
eta_i += Z_b[i] * c_b;
}
if (use_W == 1 && dim_W > 0 && d > 0) {
vector[d] W_diff_x = rep_vector(0, d);
vector[dim_W] basis_diff = apply_W_basis_diff(
W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
W_n_knots_full, W_knots_full
);
for (k in 1:dim_W) {
for (jj in 1:d) {
W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
}
}
eta_i += dot_product(W_diff_x, to_vector(X[i]));
}
eta[i] = eta_i;
}
}Explanation: This block transforms raw parameters into meaningful quantities and constructs the linear predictor
-
Initialization:
a_coef,c_b,b_coefare initialized to zero vectors of lengths$J_a,\ J_b,\ J_b$ .etais a vector of length$n$ . -
Additive random effects (
$a$ ): Ifuse_a=1and$J_a>0$ , eacha_coef[j]is computed asa_raw[j]multiplied by a scaling factor{{A_SCALE}}. The placeholder{{A_SCALE}}is replaced by the R code generator, typicallysigma_a[1], giving the non-centered parameterization: $$ a_{\text{coef},j} = \sigma_a \cdot a_{\text{raw},j}, \quad a_{\text{raw},j} \sim \text{prior}(0,1). $$ -
Multiplicative random effects (
$b$ ): Ifuse_b=1and$J_b>0$ , eachc_b[j]is computed asc_b_raw[j] * sigma_b[1]. Then, if the model is non-hierarchical (use_groups=0) andtheta_ref[1] ≠ 0, we computeb_coef[j] = c_b[j] / theta_ref[1]. This derives the multiplicative coefficients in terms of the reference parameter. In the hierarchical case,b_coefremains zero (it is not used directly in the likelihood). -
Linear predictor (
$\eta$ ): For each observation$i$ :-
$\theta_{\text{ref},i} = \theta_{\text{ref}}[\text{group\_id}[i]]$ . - Initialize
$\eta_i = \theta_{\text{ref},i}$ . - Add the additive term:
$\eta_i \mathrel{+}= \mathbf{Z}_{a,i} \cdot \mathbf{a}_{\text{coef}}$ , where$\mathbf{Z}_{a,i}$ is the$i$ -th row of$Z_a$ . - Add the multiplicative term:
$\eta_i \mathrel{+}= \mathbf{Z}_{b,i} \cdot \mathbf{c}_b$ . Note thatc_bis used, notb_coef. - If
use_W=1,dim_W>0, andd>0, compute the non-linear deviation:- Compute
basis_diffvia the helper functionapply_W_basis_diff. This returns a vector of lengthdim_Wcontaining the basis functions evaluated at the difference$\theta_{\text{ref},i} - \theta_{\text{anchor}}$ . - Initialize
W_diff_x(a vector of length$d$ ) to zero. - For each basis dimension
$k$ and covariate dimension$jj$ , accumulate:$W_{\text{diff\_x},jj} \mathrel{+}= \text{basis\_diff}[k] \cdot W_{\text{raw}}[k,jj] \cdot {{W_SCALE}}$ . The placeholder{{W_SCALE}}is replaced, typically bysigma_W[1]. - Add the dot product:
$\eta_i \mathrel{+}= \mathbf{W}_{\text{diff\_x}} \cdot \mathbf{X}_i$ .
- Compute
- Set
eta[i] = eta_i.
-
The resulting
model {
// BEGIN PRIORS
if (use_groups == 1) {
mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
theta_ref ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
} else {
theta_ref[1] ~ {{PRIOR_THETA_REF}};
}
if (use_a == 1 && J_a > 0) {
sigma_a[1] ~ {{PRIOR_SIGMA_A}};
a_raw ~ {{A_PRIOR}};
}
if (use_b == 1 && J_b > 0) {
sigma_b[1] ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}
if (use_dispersion_y == 1) {
sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi[1] ~ {{PRIOR_PHI}};
}
// END PRIORS
if (family_id == 1) {
y_real ~ normal(eta, sigma_y[1]);
} else if (family_id == 2) {
y_int ~ poisson_log(eta);
} else if (family_id == 3) {
y_int ~ neg_binomial_2_log(eta, phi[1]);
} else if (family_id == 4) {
y_int ~ bernoulli_logit(eta);
}
}Explanation: This block specifies the joint log-density (priors and likelihood).
-
Reference parameter (
$\theta_{\text{ref}}$ ):- If
use_groups=1: hierarchical prior: $$ \mu_{\theta_{\text{ref}}} \sim p(\mu), \quad \sigma_{\theta_{\text{ref}}} \sim p(\sigma), \quad \theta_{\text{ref},j} \sim \mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}}). $$ - If
use_groups=0: a single$\theta_{\text{ref},1}$ with prior$p(\theta_{\text{ref}})$ .
- If
-
Additive random effects (
$a$ ): If used,$\sigma_a \sim p(\sigma_a)$ and$\mathbf{a}_{\text{raw}} \sim p(\mathbf{a}_{\text{raw}})$ (typically standard normal). -
Multiplicative random effects (
$b$ ): If used,$\sigma_b \sim p(\sigma_b)$ and$\mathbf{c}_{\text{raw}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ . -
Non-linear deviation (
$W$ ): If used,$\sigma_W \sim p(\sigma_W)$ and$\mathrm{vec}(W_{\text{raw}}) \sim p(W_{\text{raw}})$ (typically standard normal). -
Dispersion parameters: If used,
$\sigma_y \sim p(\sigma_y)$ and$\phi \sim p(\phi)$ .
The likelihood depends on family_id:
- Gaussian (
family_id=1):$y_i \sim \mathcal{N}(\eta_i, \sigma_y^2)$ . - Poisson (
family_id=2):$y_i \sim \mathrm{Poisson}(\exp(\eta_i))$ . - Negative binomial (
family_id=3):$y_i \sim \mathrm{NB2}(\exp(\eta_i), \phi)$ . - Bernoulli logistic (
family_id=4):$y_i \sim \mathrm{Bernoulli}(\mathrm{logit}^{-1}(\eta_i))$ .
generated quantities {
vector[n] log_lik;
vector[n] theta_i = eta;
vector[n] y_pred;
for (i in 1:n) {
if (family_id == 1) {
log_lik[i] = normal_lpdf(y_real[i] | eta[i], sigma_y[1]);
y_pred[i] = normal_rng(eta[i], sigma_y[1]);
} else if (family_id == 2) {
log_lik[i] = poisson_log_lpmf(y_int[i] | eta[i]);
y_pred[i] = poisson_log_rng(eta[i]);
} else if (family_id == 3) {
log_lik[i] = neg_binomial_2_log_lpmf(y_int[i] | eta[i], phi[1]);
y_pred[i] = neg_binomial_2_log_rng(eta[i], phi[1]);
} else if (family_id == 4) {
log_lik[i] = bernoulli_logit_lpmf(y_int[i] | eta[i]);
y_pred[i] = bernoulli_logit_rng(eta[i]);
}
}
}Explanation: This block computes quantities after sampling.
-
log_lik: Vector of pointwise log-likelihoods (for LOO-CV). -
theta_i: Alias foreta, representing the linear predictor for each observation. -
y_pred: Posterior predictive draws.
For each observation family_id. The log-likelihoods are computed with the same distributional assumptions as the likelihood in the model block.
This Stan template realizes the additive-multiplicative non-linear model (AMM) with reference anchoring. The linear predictor
where:
-
$\theta_{\text{ref},g(i)}$ is the reference parameter for group$g(i)$ . -
$\mathbf{a}$ are additive random effects (centered via$\sigma_a$ ). -
$\mathbf{c}_b$ are multiplicative random effects (centered via$\sigma_b$ ). -
$\psi_k$ are basis functions (fromapply_W_basis_diff) evaluated at the difference$\theta_{\text{ref}} - \theta_{\text{anchor}}$ , and$\mathbf{W}_{\text{raw},k}$ are raw basis coefficients scaled by$\sigma_W$ .
The decomposition ensures that the non-linear term vanishes when
This Stan source file contains only a functions block. There are no data, transformed data, parameters, transformed parameters, model, or generated quantities blocks present in this file. The two user-defined functions defined here are intended to be included (via #include or Stan's modular compilation) into a larger AMM (Anchored Marginal Model) Stan program. Their sole purpose is to support the W modulating component of the AMM reference-anchoring decomposition by providing differentiable basis-function evaluation inside the HMC sampler.
The file header comment identifies the development phase:
Sub-phase 8.3.8 (2026-05-22): Cox-de Boor B-spline basis evaluation for the W modulating component.
Key design rationale stated in the comment:
- The augmented knot vector
W_knots_full(length$n_{\text{int}} + 2(W_{\text{degree}}+1)$ ) is constructed on the R side by.gdpar_bspline_knots_full()and passed as data. - The Cox-de Boor recursion is re-evaluated at
theta_ref_iandtheta_anchorinside every HMC leapfrog step because these are parameters — a pre-computed basis matrix from R would be constant data and would not propagate gradients with respect to the evaluation points. - The returned dimension
W_per_k_dimequals the total number of Cox-de Boor B-spline basis functions minus one, mirroringsplines::bs(intercept = FALSE)(the first basis function is dropped). - The exact ordering need not bit-match
splines::bsbecauseW_rawabsorbs any orthogonal rotation under sampling; only the dimension count must agree.
vector bspline_basis_eval(real x, int W_per_k_dim, int W_degree,
int W_n_knots_full, vector W_knots_full)Arguments:
| Argument | Type | Role |
|---|---|---|
x |
real |
Evaluation point. At call sites this is theta_ref_i or theta_anchor, both of which are Stan parameters. |
W_per_k_dim |
int |
Number of basis functions to return (total basis count minus one; the first is dropped). |
W_degree |
int |
B-spline polynomial degree |
W_n_knots_full |
int |
Length of the augmented knot vector |
W_knots_full |
vector |
Augmented knot vector |
Return type: vector[W_per_k_dim] — the B-spline basis evaluated at
int n_basis_total = W_n_knots_full - W_degree - 1;
int n_iter_max = n_basis_total + W_degree;-
n_basis_totalis the total number of degree-$p$ B-spline basis functions for a knot vector of length$m$ and degree$p$ :
For an augmented knot vector with
-
n_iter_maxis the number of degree-0 basis functions that must be initialized to support the full recursion up to degree$p$ :
This equals
vector[n_iter_max] B;A working vector of length
for (i in 1:n_iter_max) {
if (W_knots_full[i] == W_knots_full[i + 1]) {
B[i] = 0.0;
} else if (i == n_iter_max && x >= W_knots_full[i + 1]) {
B[i] = 1.0;
} else {
B[i] = (x >= W_knots_full[i] && x < W_knots_full[i + 1]) ? 1.0 : 0.0;
}
}This implements the degree-0 Cox-de Boor basis with three cases:
Case 1 — Repeated knots (W_knots_full[i] == W_knots_full[i + 1]):
When
This is the standard convention for repeated knots in the Cox-de Boor recursion.
Case 2 — Right-endpoint inclusion (i == n_iter_max && x >= W_knots_full[i + 1]):
For the last degree-0 basis function (index
This is the standard right-continuity correction. The half-open interval convention
Case 3 — Standard indicator (otherwise):
This is the canonical degree-0 B-spline: a piecewise-constant indicator on the half-open interval
for (d_ord in 1:W_degree) {
int n_iter = n_basis_total + W_degree - d_ord;
vector[n_iter] B_new;
for (i in 1:n_iter) {
real denom1 = W_knots_full[i + d_ord] - W_knots_full[i];
real denom2 = W_knots_full[i + d_ord + 1] - W_knots_full[i + 1];
real term1 = (denom1 > 0.0)
? (x - W_knots_full[i]) / denom1 * B[i] : 0.0;
real term2 = (denom2 > 0.0)
? (W_knots_full[i + d_ord + 1] - x) / denom2 * B[i + 1] : 0.0;
B_new[i] = term1 + term2;
}
for (i in 1:n_iter) B[i] = B_new[i];
}Outer loop (d_ord from
Dimension at degree
At
Each recursion level reduces the count by 1, consistent with the fact that
Inner loop — the Cox-de Boor recurrence formula:
The code maps directly:
-
denom1$= t_{i+d} - t_i$ (left denominator) -
denom2$= t_{i+d+1} - t_{i+1}$ (right denominator) -
term1$= \frac{x - t_i}{t_{i+d} - t_i} \cdot B_{i,d-1}(x)$ when$\text{denom1} > 0$ , else$0$ -
term2$= \frac{t_{i+d+1} - x}{t_{i+d+1} - t_{i+1}} \cdot B_{i+1,d-1}(x)$ when$\text{denom2} > 0$ , else$0$ -
B_new[i]$= \text{term1} + \text{term2}$
The guards denom1 > 0.0 and denom2 > 0.0 implement the convention
Copy-back loop:
for (i in 1:n_iter) B[i] = B_new[i];Overwrites the first B with the new degree-n_iter is strictly smaller).
After the outer loop completes (B[1] through B[n_basis_total] hold the degree-
vector[W_per_k_dim] out;
for (j in 1:W_per_k_dim) out[j] = B[j + 1];
return out;Returns entries B[2] through B[W_per_k_dim + 1], i.e.:
The first basis function splines::bs(intercept = FALSE), which returns
As noted in the header comment, the exact column ordering need not match splines::bs because the coefficient vector W_raw absorbs any orthogonal rotation of the basis under posterior sampling — only the dimension (and hence the column space dimension) must agree.
vector apply_W_basis_diff(int basis_type_id, real theta, real anchor,
int W_per_k_dim, int W_degree,
int W_n_knots_full, vector W_knots_full)Arguments:
| Argument | Type | Role |
|---|---|---|
basis_type_id |
int |
Dispatcher: 0 = W off, 1 = polynomial (monomial) basis, 2 = B-spline basis. |
theta |
real |
Evaluation point (typically theta_ref_i, a parameter). |
anchor |
real |
Anchor/reference point (typically theta_anchor, a parameter). |
W_per_k_dim |
int |
Output dimension (number of basis functions per coordinate). |
W_degree |
int |
B-spline degree (used only when basis_type_id == 2). |
W_n_knots_full |
int |
Length of augmented knot vector (used only when basis_type_id == 2). |
W_knots_full |
vector |
Augmented knot vector (used only when basis_type_id == 2). |
Return type: vector[W_per_k_dim] — the anchored basis difference
vector[W_per_k_dim] diff = rep_vector(0.0, W_per_k_dim);Initializes the output to the zero vector. This is the default return value when basis_type_id == 0 (W component disabled), serving as a defensive fallback.
if (basis_type_id == 1) {
for (jj in 1:W_per_k_dim) {
diff[jj] = pow(theta, jj) - pow(anchor, jj);
}
}Uses the monomial basis
Note that this basis is not orthogonal and the monomials are correlated, but the reference-anchoring difference improves conditioning by centering at the anchor. The absence of a constant term (power 0) is consistent with the intercept-dropping convention: a constant basis function would produce
} else if (basis_type_id == 2) {
diff = bspline_basis_eval(theta, W_per_k_dim, W_degree,
W_n_knots_full, W_knots_full)
- bspline_basis_eval(anchor, W_per_k_dim, W_degree,
W_n_knots_full, W_knots_full);
}Evaluates the B-spline basis (via bspline_basis_eval) at both theta and anchor, then computes the element-wise difference:
where each component is:
(The bspline_basis_eval.)
Both evaluations call bspline_basis_eval with identical knot/basis configuration, ensuring the difference is computed on the same basis space. Because theta and anchor are Stan parameters, the two calls to bspline_basis_eval execute inside the HMC sampler's autodiff graph, making the difference fully differentiable with respect to both arguments.
return diff;Returns the anchored basis difference. When basis_type_id == 0, this is the zero vector (W component inactive). When basis_type_id is 1 or 2, it is the non-trivial basis difference used by the AMM decomposition.
The central quantity computed by apply_W_basis_diff is:
where basis_type_id == 1) or the B-spline basis with the first function dropped (basis_type_id == 2).
In the AMM model, the W modulating component enters the likelihood through a linear combination:
where
so the modulation is pinned to zero at the anchor point. This provides identifiability: the absolute level of
The comment explicitly states why the basis is evaluated inside Stan rather than pre-computed in R:
the recursion re-evaluates the basis at theta_ref_i and theta_anchor in every Hamiltonian Monte Carlo step (these are parameters, so a R-side pre-computed matrix would not be differentiable in the sampler).
If the basis matrix anchor. By re-evaluating the Cox-de Boor recursion inside Stan, the full gradient
The comment notes:
The exact ordering does not need to bit-match splines::bs because W_raw absorbs any orthogonal rotation under sampling, but the dimension count must agree.
Formally, if
where materialize_W_basis() as length(knots) + degree per coordinate, which for the dim_W and matches W_per_k_dim.
| Quantity | Formula | Value |
|---|---|---|
| Augmented knot vector length | W_n_knots_full |
|
| Total B-spline basis functions | n_basis_total |
|
| Degree-0 working array size | n_iter_max |
|
| Returned basis dimension (first dropped) | W_per_k_dim |
|
R-side materialize_W_basis() dim |
length(knots) + degree |
dim_W (for |
The following Stan blocks are not present in this file:
-
data: No data declarations. The knot vector and dimension arguments are passed as function parameters, not as block-level data variables. -
transformed data: No precomputed quantities. -
parameters: No parameter declarations.thetaandanchorare function arguments that receive parameter values at call sites in the parent model. -
transformed parameters: No transformed quantities. -
model: No likelihood or prior contributions. These functions are pure utilities invoked elsewhere. -
generated quantities: No posterior predictions.
This file is a pure helper module: it provides two functions block entries that are #included into the canonical AMM Stan program, where the actual parameter declarations, likelihood, priors, and generated quantities reside.
This Stan template implements the Hierarchical Bayesian AMM (General Dynamic Parameter models via reference anchoring) canonical form, Path 1. The core model specification is:
Where:
-
$\theta_i$ is the effective parameter for individual$i$ (on the link-function scale). -
$\theta_{\text{ref}[g_i]}$ is a group-level reference parameter. -
$a(x_i)$ is an additive component linear in a basis$Z_a$ . -
$b(x_i)$ is a multiplicative component linear in a basis$Z_b$ , scaling$\theta_{\text{ref}}$ . -
$W(\cdot)$ is a basis expansion (polynomial or B-spline) applied to$\theta_{\text{ref}}$ , anchored at a fixed value$\theta_{\text{anchor}}$ .
The model ensures identifiability through centering of bases
functions {
// {{CANONICAL_HELPERS}}
}The functions block is populated at code generation time by R/stan_codegen.R. It contains:
-
apply_W_basis_diff(): A function to compute$W(\theta) - W(\theta_{\text{anchor}})$ for a given basis type (polynomial or B-spline). - Any additional helper functions required for the canonical form.
The function dispatches based on W_type_id:
-
Polynomial basis (
$\text{type}=1$ ): Computes$\theta^j - \theta_{\text{anchor}}^j$ for degrees$j = 1, \ldots, \text{W\_degree}$ . -
B-spline basis (
$\text{type}=2$ ): Uses Cox-de Boor recursion to evaluate basis functions at$\theta$ and$\theta_{\text{anchor}}$ , then returns the difference.
-
int<lower=1> n: Number of observations. -
int<lower=1, upper=4> family_id: Likelihood family identifier:- 1: Gaussian (identity link)
- 2: Poisson (log link)
- 3: Negative binomial 2 (log link, Stan parametrization)
- 4: Bernoulli (logit link)
-
int<lower=0, upper=1> use_a: Flag to include additive component$a$ . -
int<lower=0, upper=1> use_b: Flag to include multiplicative component$b$ . -
int<lower=0, upper=1> use_W: Flag to include$W$ basis component.
-
int<lower=0> J_a: Number of columns in additive design matrix$Z_a$ . -
int<lower=0> J_b: Number of columns in multiplicative design matrix$Z_b$ . -
int<lower=0> dim_W: Number of basis functions in$W$ . -
int<lower=0> d: Dimension of covariate vector$x_i$ for the$W$ interaction term.
-
matrix[n, J_a] Z_a: Additive design matrix. Column-centered in R-side pre-processing to enforce$E[a(X)] = 0$ (condition (C2)). -
matrix[n, J_b] Z_b: Multiplicative design matrix. Column-centered in R-side pre-processing to enforce$E[b(X)] = 0$ (condition (C3)). -
matrix[n, d] X: Covariate matrix for the$W$ interaction term.
-
vector[n] y_real: Real-valued response (used for Gaussian family). -
array[n] int y_int: Integer-valued response (used for Poisson, negative binomial, Bernoulli families).
-
real theta_anchor: Fixed anchor value for the$W$ basis reparametrization. Used to compute$W(\theta) - W(\theta_{\text{anchor}})$ , satisfying condition (C4).
-
int<lower=0, upper=1> use_dispersion_y: Flag for Gaussian observation-level dispersion$\sigma_y$ . -
int<lower=0, upper=1> use_dispersion_phi: Flag for negative binomial overdispersion$\phi$ .
-
int<lower=0, upper=2> W_type_id: Basis type for$W$ :- 0:
$W$ off - 1: Polynomial (degrees 1 to
W_degree, anchored attheta_anchor) - 2: B-spline (Cox-de Boor recursion)
- 0:
-
int<lower=0> W_n_knots_full: Number of full knot sequence for B-spline (used only ifW_type_id == 2). -
vector[W_n_knots_full] W_knots_full: Full knot vector for B-spline. -
int<lower=0> W_degree: Degree for polynomial basis or B-spline degree.
-
int<lower=0, upper=1> use_groups: Flag to enable hierarchical per-group reference parameters. When 0,J_groupsis forced to 1 and allgroup_id[i]are 1, collapsing to a single scalar$\theta_{\text{ref}}$ . -
int<lower=1> J_groups: Number of groups. -
array[n] int<lower=1, upper=J_groups> group_id: Group membership for each observation.
transformed data {
int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
}-
J_a_free: Effective number of free coefficients for$a$ . Set to 0 ifuse_ais 0 orJ_ais 0. -
J_b_free: Effective number of free coefficients for$b$ (actually$c_b$ ). Set to 0 ifuse_bis 0 orJ_bis 0. -
dim_W_eff: Effective dimension of$W$ basis. Set to 0 ifuse_Wis 0. -
d_eff: Effective dimension of covariates for$W$ interaction. Set to 0 ifuse_Wis 0.
These are used for conditional declaration of parameters and loop bounds.
-
vector[J_groups] theta_ref: Group-level reference parameters. Whenuse_groups == 0, this is a vector of length 1 containing the single global$\theta_{\text{ref}}$ . -
array[use_groups] real mu_theta_ref: Hyperparameter for the mean of the hierarchical prior on$\theta_{\text{ref}}$ . Allocated only whenuse_groups == 1. -
array[use_groups] real<lower=0> sigma_theta_ref: Hyperparameter for the standard deviation of the hierarchical prior on$\theta_{\text{ref}}$ . Allocated only whenuse_groups == 1.
-
array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a: Standard deviation for additive coefficients (used in NCP or as prior scale in CP). Allocated only if component is active. -
vector[J_a_free] a_raw: Raw additive coefficients. In NCP (default), these are standard normal; in CP, these are drawn from$\mathcal{N}(0, \sigma_a)$ .
-
array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b: Standard deviation for the reparametrized multiplicative coefficients$c_b$ . Allocated only if component is active. -
vector[J_b_free] c_b_raw: Raw coefficients for$c_b = \theta_{\text{ref}} \cdot b$ . Always drawn from standard normal, regardless of NCP/CP choice for other components.
Note: The multiplicative component uses a linear reparametrization to avoid bimodality. Instead of sampling
-
array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W: Standard deviation for$W$ basis coefficients. Allocated only if component is active. -
matrix[dim_W_eff, d_eff] W_raw: Raw coefficients for$W$ basis functions. Dimensions: number of basis functions × covariate dimension. In NCP, these are standard normal; in CP, scaled by$\sigma_W$ .
-
array[use_dispersion_y] real<lower=0> sigma_y: Observation-level standard deviation for Gaussian family. -
array[use_dispersion_phi] real<lower=0> phi: Overdispersion parameter for negative binomial family.
-
vector[J_a] a_coef: Additive coefficients, initialized to zero. Populated only ifuse_a == 1andJ_a > 0. -
vector[J_b] c_b: Reparametrized multiplicative coefficients, initialized to zero. -
vector[J_b] b_coef: Original multiplicative coefficients, initialized to zero. Only computed in single-group case.
-
vector[n] eta: Linear predictor$\eta_i$ for each observation.
if (use_a == 1 && J_a > 0) {
for (j in 1:J_a_free) {
a_coef[j] = a_raw[j]{{A_SCALE}};
}
}-
{{A_SCALE}}is replaced at code generation time:- In NCP (default):
* sigma_a[1] - In CP:
* 1(i.e., no scaling)
- In NCP (default):
Thus, in NCP:
if (use_b == 1 && J_b > 0) {
for (j in 1:J_b_free) {
c_b[j] = c_b_raw[j] * sigma_b[1];
}
// Only compute b_coef in single-group case
if (use_groups == 0 && theta_ref[1] != 0) {
for (j in 1:J_b) {
b_coef[j] = c_b[j] / theta_ref[1];
}
}
}-
$c_b = c_{b,\text{raw}} \cdot \sigma_b$ , where$c_{b,\text{raw}} \sim \mathcal{N}(0,1)$ . -
$b_{\text{coef}} = c_b / \theta_{\text{ref}}$ only when there is a single group and$\theta_{\text{ref}} \neq 0$ . Otherwise,$b_{\text{coef}}$ remains zero (must be derived post-hoc per group if needed).
For each observation
for (i in 1:n) {
real theta_ref_i = theta_ref[group_id[i]];
real eta_i = theta_ref_i;- Assign group-specific reference parameter
$\theta_{\text{ref},i}$ .
if (use_a == 1 && J_a > 0) {
eta_i += Z_a[i] * a_coef;
}- Add additive component:
$a(x_i) = Z_a[i] \cdot a_{\text{coef}}$ .
if (use_b == 1 && J_b > 0) {
eta_i += Z_b[i] * c_b;
}- Add multiplicative component in reparametrized form:
$b(x_i) \cdot \theta_{\text{ref}} = Z_b[i] \cdot c_b$ .
if (use_W == 1 && dim_W > 0 && d > 0) {
vector[d] W_diff_x = rep_vector(0, d);
vector[dim_W] basis_diff = apply_W_basis_diff(
W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
W_n_knots_full, W_knots_full
);
for (k in 1:dim_W) {
for (jj in 1:d) {
W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
}
}
eta_i += dot_product(W_diff_x, to_vector(X[i]));
}- Add
$W$ component:- Compute
basis_diff=$W(\theta_{\text{ref},i}) - W(\theta_{\text{anchor}})$ , a vector of lengthdim_W. - For each covariate dimension
$jj$ , compute the linear combination of basis differences weighted by$W_{\text{raw}}$ (with{{W_SCALE}}scaling).{{W_SCALE}}is replaced by code generation:- In NCP:
* sigma_W[1] - In CP:
* 1
- In NCP:
- Take dot product of
W_diff_xwith$x_i$ to get scalar contribution:$\sum_{jj=1}^d \left( \sum_{k=1}^{\text{dim\_W}} \text{basis\_diff}_k \cdot W_{\text{raw},k,jj} \right) x_{i,jj}$ .
- Compute
eta[i] = eta_i;
}- Store final
$\eta_i$ .
if (use_groups == 1) {
mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
theta_ref ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
} else {
theta_ref[1] ~ {{PRIOR_THETA_REF}};
}- When
use_groups == 1:- Hyperpriors on
$\mu_{\theta_{\text{ref}}}$ and$\sigma_{\theta_{\text{ref}}}$ (placeholders replaced by code generation). - Hierarchical prior:
$\theta_{\text{ref},g} \sim \mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}})$ for each group$g$ .
- Hyperpriors on
- When
use_groups == 0:- Single prior directly on
$\theta_{\text{ref},1}$ .
- Single prior directly on
if (use_a == 1 && J_a > 0) {
sigma_a[1] ~ {{PRIOR_SIGMA_A}};
a_raw ~ {{A_PRIOR}};
}- Prior on
$\sigma_a$ (positive, typically half-normal or exponential). - Prior on
$a_{\text{raw}}$ :- In NCP:
{{A_PRIOR}}replaced bynormal(0, 1). - In CP:
{{A_PRIOR}}replaced bynormal(0, sigma_a[1]).
- In NCP:
if (use_b == 1 && J_b > 0) {
sigma_b[1] ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}- Prior on
$\sigma_b$ . -
$c_{b,\text{raw}} \sim \mathcal{N}(0,1)$ always, regardless of NCP/CP choice for other components.
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}- Prior on
$\sigma_W$ . - Prior on
$W_{\text{raw}}$ :- In NCP:
{{W_PRIOR}}replaced bynormal(0, 1). - In CP:
{{W_PRIOR}}replaced bynormal(0, sigma_W[1]).
- In NCP:
if (use_dispersion_y == 1) {
sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi[1] ~ {{PRIOR_PHI}};
}- Priors on dispersion parameters (if used).
if (family_id == 1) {
y_real ~ normal(eta, sigma_y[1]);
} else if (family_id == 2) {
y_int ~ poisson_log(eta);
} else if (family_id == 3) {
y_int ~ neg_binomial_2_log(eta, phi[1]);
} else if (family_id == 4) {
y_int ~ bernoulli_logit(eta);
}-
Gaussian:
$y_i \sim \mathcal{N}(\eta_i, \sigma_y)$ . -
Poisson:
$y_i \sim \text{Poisson}(\exp(\eta_i))$ . -
Negative binomial:
$y_i \sim \text{NB2}(\exp(\eta_i), \phi)$ with mean$\mu = \exp(\eta_i)$ and variance$\mu + \mu^2/\phi$ . -
Bernoulli:
$y_i \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_i))$ .
vector[n] log_lik;
vector[n] theta_i = eta;
vector[n] y_pred;-
log_lik: Pointwise log-likelihood for each observation (used for LOO/WAIC). -
theta_i: Set toeta(the linear predictor). This represents$\theta_i$ on the link-function scale. -
y_pred: Posterior predictive samples.
For each observation
-
Log-likelihood (for model comparison):
- Gaussian:
$\log p(y_i | \eta_i, \sigma_y) = \text{normal\_lpdf}(y_i | \eta_i, \sigma_y)$ - Poisson:
$\log p(y_i | \eta_i) = \text{poisson\_log\_lpmf}(y_i | \eta_i)$ - Negative binomial:
$\log p(y_i | \eta_i, \phi) = \text{neg\_binomial\_2\_log\_lpmf}(y_i | \eta_i, \phi)$ - Bernoulli:
$\log p(y_i | \eta_i) = \text{bernoulli\_logit\_lpmf}(y_i | \eta_i)$
- Gaussian:
-
Posterior predictive sample (for PPC):
- Gaussian:
$y_i^{\text{pred}} \sim \mathcal{N}(\eta_i, \sigma_y)$ - Poisson:
$y_i^{\text{pred}} \sim \text{Poisson}(\exp(\eta_i))$ - Negative binomial:
$y_i^{\text{pred}} \sim \text{NB2}(\exp(\eta_i), \phi)$ - Bernoulli:
$y_i^{\text{pred}} \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_i))$
- Gaussian:
Where:
-
$a_{\text{coef},j} = a_{\text{raw},j} \cdot \sigma_a$ (NCP) or$a_{\text{coef},j} = a_{\text{raw},j}$ (CP). -
$c_b = c_{b,\text{raw}} \cdot \sigma_b$ , with$c_{b,\text{raw}} \sim \mathcal{N}(0,1)$ . -
$W_k(\theta) - W_k(\theta_{\text{anchor}})$ is the anchored basis difference. -
$W_{\text{raw},k,jj}$ are coefficients with scaling by$\sigma_W$ in NCP.
-
(C2) Additive centering:
$E[a(X)] = 0$ enforced by column-centering$Z_a$ in R. -
(C3) Multiplicative centering:
$E[b(X)] = 0$ enforced by column-centering$Z_b$ in R. -
(C4) Basis anchoring:
$W(\theta) - W(\theta_{\text{anchor}})$ ensures$W$ vanishes at anchor. -
(C5) Group aliasing: When
use_groups == 1,$Z_a$ must not contain columns that alias with$\theta_{\text{ref}[g]}$ (enforced in R-side preflight check).
The multiplicative component
This section contains only the header comment and the functions block. The remaining blocks (data, transformed data, parameters, transformed parameters, model, generated quantities) appear in subsequent sections. The walkthrough below is line-faithful to the supplied source.
The header is a documentation block (Stan // comments) and does not compile to anything. It establishes the following contract, which the rest of the template must honor:
-
Path label. "gdpar Path 1 — Full-Bayes (FB) multi-parametric multivariate distribution template, Path C". Path C is the cross product
$K \geq 2$ individual slots with$p \geq 2$ AMM coordinates. -
Sub-bloque provenance. "Sub-bloque 9.3.d (Bloque 9, Sesión B9.5, 2026-05-27)" under "decision I.iv lateral canonized in B9.4", with design reference
DESIGN_9_3_D_PATH_C.mdsub-decision 3.2 = "J.iv.A: dedicated piece with canonical helpers reuse". -
Architectural lineage. Mirrors
amm_eb_marginal_KxP.stanfrom "Sub-fase 8.6.D, 2026-05-25". It composes:-
Path A multivariate ("Theorem 7A* of v07b"): coord-wise factorization across
$p$ . - Path B multi-parametric ("Theorem 7C* of v07b"): distributional regression canonical link per slot.
-
Path A multivariate ("Theorem 7A* of v07b"): coord-wise factorization across
-
Outcome structure. A single outcome matrix
$y[n, p]$ is multivariate (matrix-column outcome shared across the$K$ slots, coord-wise factorized across$p$ ). The$K$ slots are the canonical distributional parameters of the family; examples given:- Gaussian
$K=2:\ (\mu, \log\sigma)$ . - Negative binomial
$K=2:\ (\log\mu, \log\phi)$ . - Student-
$t\ K=3:\ (\mu\ \text{identity}, \log\sigma, \log\nu)$ .
- Gaussian
- Each slot has its own anchor in
$\mathbb{R}^p$ per group, mirroring Path A.
For individual
The outcome is then distributed as
This is the AMM reference-anchoring decomposition: a per-group, per-slot, per-coordinate reference
Path B family set (Sesión B9.5, J.iv.A):
family_id |
Family |
|---|---|
| 1 | Gaussian |
| 3 | neg_binomial_2 |
| 5 | Beta |
| 6 | Gamma |
| 7 | lognormal_loc_scale |
| 8 | Student- |
| 9 | Tweedie |
| 10 | ZIP |
| 11 | ZINB |
| 12 | hurdle_poisson |
| 13 | hurdle_neg_binomial_2 |
The "K.c roster minimal bootstrap" covers families 1, 5, 6, 8 (seeds 91001–91004 reserved in B9.4 DESIGN_9_3_D_PATH_C.md §4). Remaining families use the same dispatch with caveats inherited from Sub-fase 8.6.D:
- Path B logit-strict under Path C near the support boundary is fragile.
- Tweedie's slot
$p$ is structurally bounded in$(1.01, 1.99)$ viaTHETA_REF_PRIOR_BLOCKby slice.
-
$a$ (offset). NCP only per slot, with per-slot scalesigma_a_k. Per-slot per-coord CP / mixed regimes are deferred to B9.6+ canonical KxP a-blocks; B9.5 is atomic. -
$b$ (slope). LINEAR reparametrization per slot per coord on$c_{b,kp} = \theta^{\mathrm{ref}}_{kp}\,b_{\mathrm{coef},kp}$ . NCP sample ofc_b_kp_rawand deriveb_coef_kppost-hoc in the single-anchor regime. -
$W$ (basis). Configurable CP/NCP globally via placeholdersW_SCALEandW_PRIOR. Inactive (use_W = 0) in the B9.5 initial roster; enabled viause_W = 1withapply_W_basis_diff()dispatch onW_type_idmirroringdistrib_Kandpmulti. -
$\theta^{\mathrm{ref}}_{kp}$ (anchor). Per-group, per-slot, per-coord anchor. Hierarchical underuse_groups == 1; single-anchor coord-wise i.i.d. otherwise.THETA_REF_PRIOR_BLOCKplaceholder emits the family-specific block (Tweedie$K=3$ with bounded$p$ slot via slice).
- Generated by
R/stan_codegen.Rvia placeholder substitution; the substituted output must not be hand-edited. - The dedicated helpers piece
inst/stan/_canonical_pieces/amm_canonical_helpers.stanis injected at theCANONICAL_HELPERSplaceholder inside thefunctions { }block (decision G.iv lateral of B9.4). -
Caveat heredado (
DESIGN_9_3_D_PATH_C.md§5): Path B logit-strict under Path C may cause numerical instability for families withinv_logitlink (beta, bernoulli) when the covariate domain produces$\eta$ near$\pm\infty$ . Mitigation: the B9.5 K.c roster avoids pure logit-strict (gaussian + beta + gamma + student_t); expansion to families {10 ZIP, 12 hurdle-Poisson} (logit link on$\pi$ ) is deferred to B9.6 with granularskip_ifif instability emerges.
functions {
// Sub-bloque 9.3.d colateral via G.iv lateral (Sesion B9.4): the
// bspline_basis_eval + apply_W_basis_diff helpers are injected here
// from the dedicated piece amm_canonical_helpers.stan.
// {{CANONICAL_HELPERS}}The functions { opens the block. The comment cites "Sub-bloque 9.3.d colateral via G.iv lateral (Sesión B9.4)" and identifies the injected helpers as bspline_basis_eval and apply_W_basis_diff, sourced from amm_canonical_helpers.stan. The literal token {{CANONICAL_HELPERS}} is a placeholder consumed by R/stan_codegen.R; at substitution time the contents of amm_canonical_helpers.stan are textually inserted here. In the raw template as supplied, no helper bodies are visible — only the placeholder. Consequently, the functions bspline_basis_eval and apply_W_basis_diff are referenced by name in the contract (§0.2, §0.4) but their definitions are external to this section.
// Sub-phase 8.3.5b: Tweedie compound Poisson-gamma log-density for
// the regime 1 < p < 2 (hybrid Dunn-Smyth series + saddlepoint).
// Mirror of the K = 1 + p = 1 template inst/stan/amm_distrib_K.stan
// functions block (sub-phase 8.3.5b, 2026-05-21). The functions are
// identical because the Tweedie likelihood is the same regardless
// of the cross-product K x p (the function takes scalar arguments
// y, mu, phi, p and returns a scalar log-density).
real tweedie_log_W_series(real y, real phi, real p) {
real alpha_pos = (2.0 - p) / (p - 1.0);
real log_z = alpha_pos * log(y)
- alpha_pos * log(p - 1.0)
- (1.0 + alpha_pos) * log(phi)
- log(2.0 - p);
real log_W = negative_infinity();
real max_log_W = negative_infinity();
int max_iter = 1000;
int passed_peak = 0;
for (j_loop in 1:max_iter) {
real j = j_loop;
real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);
if (term > max_log_W) {
max_log_W = term;
} else {
passed_peak = 1;
}
log_W = log_sum_exp(log_W, term);
if (passed_peak == 1 && term < max_log_W - 37.0) {
break;
}
}
return log_W;
}Comment. Cites "Sub-phase 8.3.5b" and identifies this as the Tweedie compound Poisson–gamma log-density for the regime inst/stan/amm_distrib_K.stan functions block (sub-phase 8.3.5b, 2026-05-21). The functions are identical because the Tweedie likelihood is invariant under the
Signature. real tweedie_log_W_series(real y, real phi, real p) — computes the log of the series-normalizing constant
Declarations and mathematics.
-
real alpha_pos = (2.0 - p) / (p - 1.0);— defines the Tweedie compound-Poisson shape parameter
For
-
real log_z = alpha_pos * log(y) - alpha_pos * log(p - 1.0) - (1.0 + alpha_pos) * log(phi) - log(2.0 - p);— defines the log of the series argument
Equivalently,
-
real log_W = negative_infinity();— initializes the running log-sum-exp accumulator at$-\infty$ (the identity element forlog_sum_exp). -
real max_log_W = negative_infinity();— tracks the maximum term encountered, for the truncation rule. -
int max_iter = 1000;— hard cap on the number of series terms. -
int passed_peak = 0;— flag indicating whether the series has passed its modal term.
Loop. for (j_loop in 1:max_iter) iterates
-
real j = j_loop;— promotes the integer loop counter to a real for arithmetic. -
real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);— the$j$ -th log-series term
This is the log of the
-
if (term > max_log_W) { max_log_W = term; } else { passed_peak = 1; }— updates the running maximum; once a term fails to exceed the running max, the peak has been passed andpassed_peakis latched to 1. -
log_W = log_sum_exp(log_W, term);— accumulates$\log W \leftarrow \mathrm{logsumexp}(\log W, \log w_j)$ , i.e., numerically stable$\log(\sum_j w_j)$ . -
if (passed_peak == 1 && term < max_log_W - 37.0) { break; }— early termination: once past the peak and the current term is more than 37 nats below the peak (a threshold corresponding roughly to single-precision underflow in$\exp$ ), the loop breaks.
Return. return log_W; — returns
real tweedie_log_f_series(real y, real mu, real phi, real p) {
real theta = pow(mu, 1.0 - p) / (1.0 - p);
real kappa = pow(mu, 2.0 - p) / (2.0 - p);
return (y * theta - kappa) / phi - log(y)
+ tweedie_log_W_series(y, phi, p);
}Signature. real tweedie_log_f_series(real y, real mu, real phi, real p) — the Dunn–Smyth series log-density of the Tweedie compound Poisson–gamma distribution for
Declarations and mathematics.
-
real theta = pow(mu, 1.0 - p) / (1.0 - p);— the Tweedie canonical parameter
-
real kappa = pow(mu, 2.0 - p) / (2.0 - p);— the Tweedie cumulant function (per unit$\phi$ )
Return.
i.e.,
This is the standard Dunn–Smyth series form of the Tweedie density for
real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) {
real deviance = 2.0 * (
y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p)
- (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p)
);
return -0.5 * log(2.0 * pi() * phi * pow(y, p))
- deviance / (2.0 * phi);
}Signature. real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) — the saddlepoint (large-deviation) approximation to the Tweedie log-density, used away from
Declarations and mathematics.
-
real deviance = 2.0 * ( y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p) - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p) );— the Tweedie deviance
Return.
This is the standard saddlepoint / Daniels approximation for the Tweedie density.
real tweedie_lpdf(real y, real mu, real phi, real p) {
real tau = 0.4;
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
if (y == 0.0) {
return -lambda;
}
if (abs(p - 1.5) < tau) {
return tweedie_log_f_series(y, mu, phi, p);
} else {
return tweedie_log_f_saddlepoint(y, mu, phi, p);
}
}Signature. real tweedie_lpdf(real y, real mu, real phi, real p) — the public Tweedie log-density entry point, dispatching between the series and saddlepoint forms.
Declarations and mathematics.
-
real tau = 0.4;— the half-width of the band around$p = 1.5$ within which the series representation is preferred. -
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));— the compound-Poisson rate
This is the mean of the Poisson mixing distribution in the compound Poisson–gamma representation of the Tweedie for
Branch 1 (point mass at zero). if (y == 0.0) { return -lambda; } — for
Branch 2 (series regime). if (abs(p - 1.5) < tau) { return tweedie_log_f_series(y, mu, phi, p); } — when
Branch 3 (saddlepoint regime). else { return tweedie_log_f_saddlepoint(y, mu, phi, p); } — otherwise (i.e.,
The hybrid dispatch is the "hybrid Dunn-Smyth series + saddlepoint" referenced in the §1.2 comment.
real tweedie_rng(real mu, real phi, real p) {
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
int N = poisson_rng(lambda);
if (N == 0) {
return 0.0;
}
real shape = (2.0 - p) / (p - 1.0);
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));
return gamma_rng(N * shape, rate);
}Signature. real tweedie_rng(real mu, real phi, real p) — a posterior-predictive RNG for the Tweedie compound Poisson–gamma, exploiting the compound representation
Declarations and mathematics.
-
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));— as in §1.5, the Poisson rate$\lambda = \mu^{2-p}/[\phi(2-p)]$ . -
int N = poisson_rng(lambda);— draw$N \sim \mathrm{Poisson}(\lambda)$ . -
if (N == 0) { return 0.0; }— if$N = 0$ , the compound sum is empty, so$Y = 0$ . -
real shape = (2.0 - p) / (p - 1.0);— the Gamma shape$\alpha_{\mathrm{pos}} = (2-p)/(p-1)$ (same as in §1.2). -
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));— the Gamma rate
-
return gamma_rng(N * shape, rate);— draw$Y \sim \mathrm{Gamma}(N\alpha_{\mathrm{pos}},\,\beta)$ . This is the exact distribution of$\sum_{n=1}^{N} G_n$ when each$G_n \sim \mathrm{Gamma}(\alpha_{\mathrm{pos}}, \beta)$ independently (Gamma additivity in the shape parameter under a common rate).
// Sub-phase 8.3.7 heterogeneous slot inverse-link dispatch. Mirror
// of amm_distrib_K.stan apply_inv_link_by_id. Consumed by the K=2
// likelihood branches (family_id_k_vector[1] in {1, 3, 5, 6, 7})
// applied coord-wise.
real apply_inv_link_by_id(int link_id, real eta) {
if (link_id == 0) return eta;
if (link_id == 1) return inv_logit(eta);
return exp(eta);
}Comment. Cites "Sub-phase 8.3.7 heterogeneous slot inverse-link dispatch" and identifies this as a mirror of amm_distrib_K.stan's apply_inv_link_by_id. It is consumed by the family_id_k_vector[1] is in
Signature. real apply_inv_link_by_id(int link_id, real eta) — maps an integer link identifier and a real linear predictor
Dispatch.
-
if (link_id == 0) return eta;—link_id == 0is the identity link:$g^{-1}(\eta) = \eta$ . -
if (link_id == 1) return inv_logit(eta);—link_id == 1is the logit (inverse-logit) link:
-
return exp(eta);— the default (anylink_idnot 0 or 1) is the log (inverse-log = exp) link:
This dispatch is the canonical-link machinery referenced in the header (§0.1, "distributional regression canonical link per slot"). It is the mechanism by which the second (and subsequent) slot's linear predictor link_id == 0) to produce
Section 1 establishes:
- The contract (header): the AMM reference-anchoring decomposition with per-group, per-slot, per-coordinate anchors
$\theta^{\mathrm{ref}}_{kp}$ , offset$a_{kj}$ , slope$b_{kj}$ , and basis difference$W_{kj}$ ; the family roster; the sampling parametrization choices (NCP for$a$ , linear reparametrization for$b$ , configurable CP/NCP for$W$ , hierarchical or i.i.d. for$\theta^{\mathrm{ref}}_{kp}$ ); and the codegen/caveat framework. - The
functionsblock: a placeholder for canonical helpers (bspline_basis_eval,apply_W_basis_diff); the complete Tweedie compound Poisson–gamma log-density machinery (tweedie_log_W_series,tweedie_log_f_series,tweedie_log_f_saddlepoint,tweedie_lpdf,tweedie_rng) implementing the hybrid Dunn–Smyth series + saddlepoint approximation for$1 < p < 2$ ; and the heterogeneous slot inverse-link dispatcherapply_inv_link_by_idrealizing the canonical-link-per-slot machinery of Path B.
No data, transformed data, parameters, transformed parameters, model, or generated quantities blocks appear in this section; they are carried in subsequent sections.
Section 2 opens with the closing brace } of the functions block declared in section 1, then contains the complete data and transformed data blocks, and the parameters block up to (but not including) its closing brace — the block is truncated mid-declaration at phi_pop_k. The model and generated quantities blocks are not present in this section.
}This single line terminates the functions { ... } block begun in section 1. No function declarations or bodies appear in section 2. All user-defined Stan functions (family dispatchers, inverse-link selectors, W-basis constructors, log-likelihood wrappers) were defined in section 1 and are in scope for the remaining blocks.
int<lower=1> n;
int<lower=2> K;
int<lower=2> p;| Declaration | Role |
|---|---|
n |
Number of observations. |
K |
Number of AMM slots (distinct sub-models sharing the same design infrastructure). |
p |
Number of response coordinates (multivariate outcome dimension). |
The regime is _KxP.
array[K] int<lower=1, upper=13> family_id_k_vector;A length-<lower=1, upper=13> admits 13 candidate families. The comment states that in the B9.5 initial iteration the family is homogeneous across slots: every entry is intended to carry the same value, and the dispatcher (in the functions block) reads only family_id_k_vector[1] as the canonical family identifier. Heterogeneous per-slot families are deferred. Mathematically, the observation model for every slot
array[K] int<lower=0, upper=2> inv_link_id_per_slot;Per-slot canonical inverse-link code, inherited from the Path B
| Code | Inverse link |
|---|---|
| 0 | identity: |
| 1 | inverse-logit: |
| 2 | exponential: |
The linear predictor
array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=1> use_W;Binary activation flags:
-
use_a_k[k]— whether slot$k$ carries an additive deviation component$a_k(\cdot)$ . -
use_b_k[k]— whether slot$k$ carries a multiplicative deviation component$b_k(\cdot)$ . -
use_W— whether the global modulating component$W(\cdot)$ is active.
When use_a_k[k] = 0 (or use_b_k[k] = 0), the corresponding per-coordinate free-column counts J_a_per_kp / J_b_per_kp are treated as zero in transformed data, and the dispatch loop in later sections reduces to a zero-cost branch (no coefficients are sampled for that slot/component).
int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;-
J_a_max,J_b_max— upper bounds on the number of free columns in the additive and multiplicative design matrices, used for padding. -
J_a_per_kp[k, j]— true (ragged) number of free additive columns for slot$k$ , coordinate$j$ . -
J_b_per_kp[k, j]— true (ragged) number of free multiplicative columns for slot$k$ , coordinate$j$ .
The comment references "Sub-fase 8.6.D D41" for the full ragged canonization in the a_raw and c_b_kp_raw (declared in parameters) are laid out via per-transformed data, respecting this ragged structure.
array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;For each slot
-
Z_a_kp[k, j]is an$n \times J_{a,\max}$ matrix. Only columns$1{:}J_{a,k,j}^{\text{per}}$ are referenced; the remaining$J_{a,\max}-J_{a,k,j}^{\text{per}}$ padded columns are never touched. -
Z_b_kp[k, j]is an$n \times J_{b,\max}$ matrix, analogously.
The additive contribution to the linear predictor for slot
and the multiplicative contribution (after the linear reparametrization discussed below) is:
int<lower=0> dim_W;
int<lower=0> d;
int<lower=0> W_per_kj_dim;
matrix[n, d] X;-
dim_W— total basis dimension of the modulating component$W$ . -
d— number of columns of the covariate matrix$X$ (the "effect covariates" on which$W$ acts). -
W_per_kj_dim— per-$(k,j)$ basis dimension. For separable polynomial and B-spline bases, the comment notes the decomposition$\dim_W = K \cdot p \cdot W_{\text{per-kj-dim}}$ . -
X—$n \times d$ covariate matrix. The modulating term is$W \cdot X$ , i.e.$W_{k,j}(x_i) = \sum_{m=1}^{d} W_{k,j,m}\,X_{i,m}$ (the exact assembly appears intransformed parameters).
When use_W = 0, all transformed data (dim_W_eff = 0, d_eff = 0, sigma_W_size = 0), and W_raw collapses to a
matrix[n, p] y_real;
array[n, p] int y_int;-
y_real—$n \times p$ matrix of continuous (real-valued) responses, consumed by continuous families (e.g. Gaussian). -
y_int—$n \times p$ integer array of count responses, consumed by count families (e.g. Poisson, negative binomial).
Only the outcome matching family_id_k_vector[1] is consumed by the likelihood; the other is ignored.
array[K] vector[p] theta_anchor_kp;A parameters) deviates. The precise role (prior mean for model block (later section).
array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;Per-slot binary flags controlling whether slot
- a
$\sigma_{y,k}$ (observation-scale / standard-deviation) dispersion parameter, and - a
$\phi_k$ (auxiliary precision or over-dispersion) parameter.
These are "population-scoped" (one value per slot, shared across all transformed data.
int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;-
W_type_id— basis type selector (0, 1, or 2; the exact mapping to polynomial / B-spline / other is resolved in thefunctionsblock). -
W_n_knots_full— number of knots for spline bases. -
W_knots_full— the full knot vector of lengthW_n_knots_full. -
W_degree— spline degree.
These are consumed by the W-basis constructor function (section 1) to build the design matrix through which W_raw is expanded into
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups— master switch for the hierarchical (per-group) anchor structure. -
J_groups— number of groups ($\geq 1$ ). -
group_id— length-$n$ assignment vector,group_id[i]$\in \{1,\ldots,J_{\text{groups}}\}$ .
When use_groups = 1, the per-group reference use_groups = 0, the hierarchical hyperparameter arrays mu_theta_ref_kp and sigma_theta_ref_kp collapse to size 0 (see parameters), and theta_ref_kp has first dimension
array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;
array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;
int total_J_a_free = 0;
int total_J_b_free = 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;| Variable | Meaning |
|---|---|
J_a_free_kp[k, j] |
Effective free additive columns after gating by use_a_k[k]. Equals J_a_per_kp[k, j] if use_a_k[k] == 1 and J_a_per_kp[k, j] > 0, else 0. |
J_b_free_kp[k, j] |
Effective free multiplicative columns after gating by use_b_k[k]. |
a_raw_offset_kp[k, j] |
Starting index (0-based) of slot a_raw. |
c_b_raw_offset_kp[k, j] |
Starting index (0-based) of slot c_b_kp_raw. |
total_J_a_free |
a_raw. |
total_J_b_free |
c_b_kp_raw. |
dim_W_eff |
dim_W if use_W == 1, else 0. |
d_eff |
d if use_W == 1, else 0. |
any_use_a |
1 if any |
any_use_b |
1 if any |
sigma_W_size |
1 if use_W == 1 and dim_W > 0, else 0. |
sigma_y_size |
|
phi_size |
|
sigma_y_offset_k[k] |
Starting index (0-based) of slot sigma_y_pop_k. |
phi_offset_k[k] |
Starting index (0-based) of slot phi_pop_k. |
for (k in 1:K) {
for (j in 1:p) {
J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
? J_a_per_kp[k, j] : 0;
J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
? J_b_per_kp[k, j] : 0;
a_raw_offset_kp[k, j] = total_J_a_free;
c_b_raw_offset_kp[k, j] = total_J_b_free;
total_J_a_free += J_a_free_kp[k, j];
total_J_b_free += J_b_free_kp[k, j];
if (J_a_free_kp[k, j] > 0) any_use_a = 1;
if (J_b_free_kp[k, j] > 0) any_use_b = 1;
}
sigma_y_offset_k[k] = sigma_y_size;
phi_offset_k[k] = phi_size;
sigma_y_size += use_dispersion_y_k[k];
phi_size += use_dispersion_phi_k[k];
}Line-by-line:
-
J_a_free_kp[k, j]assignment. The ternary gates the user-suppliedJ_a_per_kp[k, j]by the slot-level flaguse_a_k[k]. If the slot is inactive (use_a_k[k] == 0) or the coordinate genuinely has zero free columns, the effective count is forced to 0. This realizes the "zero-cost branch" mentioned in the comment: inactive slots contribute no coefficients.$J_{a,k,j}^{\text{free}} = \begin{cases} J_{a,k,j}^{\text{per}} & \text{if } \texttt{use\_a\_k}[k]=1 \text{ and } J_{a,k,j}^{\text{per}}>0, \\ 0 & \text{otherwise.}\end{cases}$ -
J_b_free_kp[k, j]assignment. Analogous gating for the multiplicative component. -
a_raw_offset_kp[k, j] = total_J_a_free. Records the current running total as the 0-based start index for slot$(k,j)$ 's block withina_raw. This is the flat-pack offset: the$\ell$ -th free additive coefficient of slot$(k,j)$ will be accessed asa_raw[a_raw_offset_kp[k,j] + ell]. -
c_b_raw_offset_kp[k, j] = total_J_b_free. Analogous offset forc_b_kp_raw. -
total_J_a_free += J_a_free_kp[k, j]. Accumulates the total length of the flat additive raw vector. -
total_J_b_free += J_b_free_kp[k, j]. Accumulates the total length of the flat multiplicative raw vector. -
if (J_a_free_kp[k, j] > 0) any_use_a = 1. Sets the global additive flag if any$(k,j)$ has free additive columns. -
if (J_b_free_kp[k, j] > 0) any_use_b = 1. Sets the global multiplicative flag. -
sigma_y_offset_k[k] = sigma_y_size. Records the 0-based start index for slot$k$ 's$\sigma_y$ entry (either 0 or 1 slots, since each slot contributes at most one). -
phi_offset_k[k] = phi_size. Analogous for$\phi$ . -
sigma_y_size += use_dispersion_y_k[k]. Increments the dispersion array size by 0 or 1. -
phi_size += use_dispersion_phi_k[k]. Analogous.
sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;A single global use_W = 1 with dim_W = 0 (a degenerate or placeholder configuration) does not allocate a spurious scale parameter.
int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
int slot_free_a = 0;
for (j in 1:p) {
slot_free_a += J_a_free_kp[k, j];
}
if (slot_free_a > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}Purpose. The comment (RG.6, D96) explains the compaction. The per-slot additive scale use_a_k[k] == 0 (no additive component at all) or because the additive formula is intercept-only (e.g. a = ~ 1) and the intercept has been absorbed into
Mechanism.
-
n_sigma_a— counts slots with at least one free additive coefficient:$n_{\sigma_a} = \big|\{k : \sum_{j=1}^{p} J_{a,k,j}^{\text{free}} > 0\}\big|$ . -
sigma_a_idx[k]— maps slot$k$ to its 1-based position in the compactedsigma_a_karray, or 0 if the slot has no free additive coefficients.
When every slot carries free additive coefficients, sigma_a_idx is the identity map
The block is not closed in this section; it continues into section 3. All declarations up to phi_pop_k are present.
array[J_groups, K] vector[p] theta_ref_kp;The sampled per-group, per-slot, per-coordinate reference anchor
The fixed data anchor theta_anchor_kp provides the centering reference (its exact role — prior mean or offset — is finalized in the model block). The deviation of
array[use_groups, K] vector[p] mu_theta_ref_kp;
array[use_groups, K] vector<lower=0>[p] sigma_theta_ref_kp;-
mu_theta_ref_kp— hyper-mean$\mu_{\theta_{\text{ref}},k,j}$ of the per-group anchor prior. -
sigma_theta_ref_kp— hyper-scale$\sigma_{\theta_{\text{ref}},k,j} > 0$ .
The first dimension is use_groups (0 or 1). When use_groups = 0, both arrays have size 0 and are not sampled; the per-group anchor theta_ref_kp then receives a non-hierarchical prior directly (specified in the model block). When use_groups = 1, the hierarchical prior is:
(The exact prior family — Gaussian or otherwise — is declared in the model block; the <lower=0> constraint on sigma_theta_ref_kp confirms a positive scale.)
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;-
sigma_a_k— per-slot additive scale$\sigma_{a,k} > 0$ , compacted to the$n_{\sigma_a}$ slots identified bysigma_a_idx. Slots with no free additive coefficients have no entry here, eliminating the non-identified flat directions described in §3.4. -
a_raw— flat-packed vector of lengthtotal_J_a_freeof standardized (non-centered) additive coefficients. The NCP reconstruction (intransformed parameters) is:
The implied prior on a_raw (declared in the model block) is standard normal:
so that marginally
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;-
sigma_b_k— per-slot multiplicative scale$\sigma_{b,k} > 0$ . The array size is$K \times \texttt{any\_use\_b}$ : when no slot uses the multiplicative component (any_use_b = 0), the array is empty; when at least one slot does (any_use_b = 1), all$K$ slots receive a scale parameter. (Unlikesigma_a_k, no compaction to active-only slots is applied here.) -
c_b_kp_raw— flat-packed vector of lengthtotal_J_b_freeof standardized multiplicative coefficients.
The comment states the linear reparametrization:
so the multiplicative contribution to the linear predictor is:
This is linear in
The implied prior is
Observation (faithful to code). When
any_use_b = 1but some slot$k$ hasuse_b_k[k] = 0, that slot'ssigma_b_k[k]is declared but has no corresponding entries inc_b_kp_raw(sinceJ_b_free_kp[k, j] = 0for all$j$ ). Whether this scale is used in themodelblock or left as a sampled-but-prior-only parameter is determined in a later section. The code as written does not compactsigma_b_kthe waysigma_a_kis compacted.
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;-
sigma_W— global scale$\sigma_W > 0$ for the modulating component. Size 1 when active, 0 otherwise. -
W_raw—$\dim_{W,\text{eff}} \times d_{\text{eff}}$ matrix of standardized entries. Whenuse_W = 0, both dimensions are 0, yielding a$0 \times 0$ matrix (a zero-cost placeholder).
The NCP reconstruction (in transformed parameters) is:
and the modulating contribution to the linear predictor is:
The implied prior is
array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;-
sigma_y_pop_k— compacted array of per-slot observation-scale dispersions$\sigma_{y,k} > 0$ . Only slots withuse_dispersion_y_k[k] = 1have an entry; the mapping is viasigma_y_offset_k[k]. -
phi_pop_k— compacted array of per-slot auxiliary dispersions$\phi_k > 0$ . Only slots withuse_dispersion_phi_k[k] = 1have an entry; the mapping is viaphi_offset_k[k].
Access pattern: for slot use_dispersion_y_k[k] = 1, the dispersion is sigma_y_pop_k[sigma_y_offset_k[k] + 1]; otherwise the family dispatcher uses a default (family-specific) value. Analogously for
The declarations in sections 1–2 establish the following structural decomposition. For observation
The reference-anchoring is realized by:
-
Fixed anchor (
theta_anchor_kp, data) — the immutable reference point. -
Per-group sampled reference (
theta_ref_kp, parameters) — deviates from the anchor under a hierarchical prior (mu_theta_ref_kp,sigma_theta_ref_kp) whenuse_groups = 1. -
Additive deviation (
a_raw,sigma_a_k) — NCP, compacted to active slots, contributing a linear shift$\delta^{\text{add}}$ . -
Multiplicative deviation (
c_b_kp_raw,sigma_b_k) — NCP with the linear reparametrization$c_b = \theta_{\text{ref}} \cdot b$ , contributing a$\theta_{\text{ref}}$ -scaled shift$\delta^{\text{mul}}$ . -
Global modulator (
W_raw,sigma_W) — a shared covariate-driven correction$W \cdot X$ .
The transformed parameters block (section 3) will assemble model block (section 4 or 5) will declare the priors on a_raw, c_b_kp_raw, W_raw, sigma_a_k, sigma_b_k, sigma_W, theta_ref_kp, and the hyperparameters, plus the family-specific likelihood; and generated quantities (section 5) will produce posterior summaries.
This block reconstructs the AMM coefficients from their non-centered raw draws and assembles the per-slot, per-coordinate linear predictor eta_kp.
Declarations. array[K] matrix[n, p] eta_kp holds, for each distributional slot a_coef_kp, c_b_kp, b_coef_kp are array[K, p] vector[J_*_max] — per slot and per coordinate, padded to the maximum basis dimension. All three are zero-initialized over the full
Non-centered reconstruction of the additive coefficients (lines 340–353). When any additive component is active, for each active slot (
Non-centered reconstruction of the multiplicative coefficients (lines 355–380). Symmetrically, the linear product coordinate b_coef_kp stays at zero; downstream callers derive it per group from amm_eb_marginal_KxP.stan and amm_distrib_K.stan).
Linear-predictor assembly (lines 382–422). For every slot dot_products over the active basis columns. The modulated term (lines 400–418) uses the globally-shared W_raw matrix (row count {{W_SCALE}} is the codegen-injected scale. The accumulated eta_kp[k][i,j].
Priors. {{THETA_REF_PRIOR_BLOCK}} is the codegen-injected, family-specific per-slot per-coord anchor prior (hierarchical .gdpar_build_theta_ref_prior_block_KxP()). The non-centered scale priors follow: {{PRIOR_SIGMA_A}}, {{PRIOR_SIGMA_B}}, {{PRIOR_SIGMA_W}}, {{W_PRIOR}}; plus the population auxiliary priors {{PRIOR_SIGMA_Y}} and {{PRIOR_PHI}} when those slots exist.
Likelihood — family dispatch with coord-wise factorization (lines 461–597). The canonical Path-B family is selected by family_id_k_vector[1], and each observation–coordinate pair apply_inv_link_by_id(inv_link_id_per_slot[k], eta_kp[k][i,j]). The branches:
-
1 Gaussian (
$K=2$ ):$y_{ij}\sim\mathcal N\!\big(\eta_{1,ij},\,\sigma_{ij}\big)$ with$\sigma_{ij}=g^{-1}(\eta_{2,ij})$ . -
3 Negative-binomial (
$K=2$ ):$y_{ij}\sim\text{NB2}\!\big(e^{\eta_{1,ij}},\,\phi_{ij}\big)$ . -
5 Beta (
$K=2$ ):$y_{ij}\sim\text{beta\_proportion}\!\big(\text{logit}^{-1}(\eta_{1,ij}),\,\phi_{ij}\big)$ . -
6 Gamma (
$K=2$ ): mean$\mu_{ij}=e^{\eta_{1,ij}}$ , shape$s_{ij};\ y_{ij}\sim\text{Gamma}(s_{ij},\,s_{ij}/\mu_{ij})$ (rate = shape/mean). -
7 Lognormal (loc–scale,
$K=2$ ):$y_{ij}\sim\text{Lognormal}(\eta_{1,ij},\sigma_{ij})$ . -
8 Student-
$t$ ($K=3$ ):$y_{ij}\sim t\big(\nu=e^{\eta_{3,ij}},\,\mu=\eta_{1,ij},\,\sigma=e^{\eta_{2,ij}}\big)$ . -
9 Tweedie (
$K=3$ ):$y_{ij}\sim\text{Tweedie}\big(e^{\eta_{1,ij}},e^{\eta_{2,ij}},p=\eta_{3,ij}\big)$ via the package'stweedie_lpdf. -
10 ZIP (
$K=2$ ): the dual deviation at work — for$y_{ij}=0,\ \;\log\!\big[e^{\text{bern\_logit}(1|\eta_\pi)}+e^{\text{bern\_logit}(0|\eta_\pi)}\,p_{\text{Pois}}(0|\eta_\mu)\big]$ vialog_sum_exp; for$y_{ij}>0,\ \;\text{bern\_logit}(0|\eta_\pi)+\text{poisson\_log}(y|\eta_\mu)$ . -
11 ZINB (
$K=3$ ): identical mixture with$\text{NB2}$ replacing Poisson,$\phi=e^{\eta_{2,ij}},\ \eta_\pi=\eta_{3,ij}$ . -
12 Hurdle-Poisson (
$K=2$ ):$y=0\Rightarrow\text{bern\_logit}(1|\eta_\pi);\ y>0\Rightarrow\text{bern\_logit}(0|\eta_\pi)+\text{poisson\_log}(y|\eta_\mu)-\log\!\big(1-e^{-e^{\eta_\mu}}\big)$ — thelog1m_expterm is the one-truncation normalizer of the zero-truncated Poisson. -
13 Hurdle-NB (
$K=3$ ): same hurdle structure with NB2 and its zero-truncation normalizer$-\,\text{log1m\_exp}\big(\text{NB2\_log}(0|\eta_\mu,\phi)\big)$ .
Each branch realizes the AMM anchoring on every slot (location and the auxiliaries that were promoted to per-observation), which is the multivariate (
This block produces, per observation log_lik[n,p] (the pointwise log-likelihood used by PSIS-LOO / gdpar_loo), theta_i[n,p] (the fitted location-slot linear predictor), and y_pred[n,p] (a posterior-predictive draw). Its shape mirrors amm_eb_marginal_KxP.stan extended to the full Path-B family set of amm_distrib_K.stan.
For each family_id_k_vector[1] dispatch as the model block is reproduced, now emitting _lpdf/_lpmf for log_lik and _rng for y_pred:
-
1 Gaussian:
theta_i = eta_kp[1];log_lik = normal_lpdf(y | eta_1, sigma);y_pred = normal_rng(eta_1, sigma),$\sigma=g^{-1}(\eta_2)$ . -
3 NB2:
$\mu=e^{\eta_1},\ \phi=g^{-1}(\eta_2)$ ;neg_binomial_2_lpmf/_rng. -
5 Beta:
$\mu=\text{logit}^{-1}(\eta_1),\ \phi=g^{-1}(\eta_2)$ ;beta_proportion_lpdf/_rng. -
6 Gamma:
$\mu=e^{\eta_1}$ , shape$s=g^{-1}(\eta_2)$ ;gamma_lpdf(y|s, s/μ)/_rng. -
7 Lognormal:
lognormal_lpdf(y|eta_1, sigma)/_rng. -
8 Student-
$t$ :$\nu=e^{\eta_3},\ \mu=\eta_1,\ \sigma=e^{\eta_2}$ ;student_t_lpdf/_rng; heretheta_i = muis set from$\eta_1$ explicitly. -
9 Tweedie:
$\mu=e^{\eta_1},\ \phi=e^{\eta_2},\ p=\eta_3$ ;tweedie_lpdf/_rng. -
10 ZIP / 11 ZINB:
log_likis the same zero-inflationlog_sum_expmixture as the model block; the predictive draw first samples the structural-zero indicatorbernoulli_logit_rng(eta_pi)— if 1,$y_{\text{pred}}=0$ ; else apoisson_log_rng(ZIP) /neg_binomial_2_log_rng(ZINB) count. -
12 Hurdle-Poisson / 13 Hurdle-NB:
log_likreproduces the hurdle decomposition (with thelog1m_expzero-truncation normalizer); the predictive draw samples the hurdle indicator, and on a "positive" outcome draws from the count distribution rejection-sampling until non-zero (while (y_draw == 0 && iter_h < 10000)), realizing the zero-truncated count. -
else (unknown family id): a safe fallback —
theta_i = eta_1,log_lik = negative_infinity(),y_pred = 0— so an unrecognizedfamily_idyields a$-\infty$ contribution rather than a crash.
The log_lik matrix is what gdpar_loo() aggregates (summing over coordinates per observation, §III.5) into the ELPD; y_pred feeds pp_check/gdpar_posterior_predict and the DHARMa residual machinery; theta_i is the anchored individual location estimate
Line 762 is the closing brace } of the generated quantities block — the end of the amm_canonical_pmulti_KxP.stan canonical piece. With it the template is complete: data / transformed data (sections c01–c02), parameters / transformed parameters and model (c03), and generated quantities (c04). This piece is the full-Bayes Stan program for the multivariate (R/stan_codegen.R from this and the shared canonical helpers.
← Part V — Stan Templates (1/3) · gdpar Wiki Home · Part V — Stan Templates (3/3) →
- Part I — Conceptual Framework
- Part II — Mathematical Foundations
- Part III — Computational Architecture
- Part IV — Exhaustive Function Reference (1/7)
- Part IV — Exhaustive Function Reference (2/7)
- Part IV — Exhaustive Function Reference (3/7)
- Part IV — Exhaustive Function Reference (4/7)
- Part IV — Exhaustive Function Reference (5/7)
- Part IV — Exhaustive Function Reference (6/7)
- Part IV — Exhaustive Function Reference (7/7)
- Part V — Stan Templates (1/3)
- Part V — Stan Templates (2/3)
- Part V — Stan Templates (3/3)
- Part VI — Data, Benchmarks, Tests & References