-
Notifications
You must be signed in to change notification settings - Fork 0
Part V Stan Templates 1
← Part IV — Exhaustive Function Reference (7/7) · gdpar Wiki Home · Part V — Stan Templates (2/3) →
The functions block defines user-defined functions used in the Stan program. It contains functions for the Tweedie distribution and a helper for applying inverse links.
Computes the log of the Dunn–Smyth series W(y, phi, p) for the Tweedie distribution when
The series is defined as:
$$
W(y, \phi, p) = \sum_{j=1}^\infty W_j
$$
where
$$
W_j = \frac{y^{j \alpha} (p-1)^{-j\alpha}}{\phi^{j(1+\alpha)} (2-p)^j \cdot j! \cdot \Gamma(j\alpha)}
$$
with
The function iteratively sums terms until a term is more than 37 nats below the running maximum (relative contribution <
Returns:
Exact log-density of the Tweedie distribution via the Dunn–Smyth series. Valid for
The Tweedie density is: $$ f(y; \mu, \phi, p) = \exp\left{ \frac{y\theta - \kappa}{\phi} - \log y + \log W(y, \phi, p) \right} $$ where $$ \theta = \frac{\mu^{1-p}}{1-p}, \quad \kappa = \frac{\mu^{2-p}}{2-p}. $$
Returns:
Saddlepoint approximation of the Tweedie log-density (Wood 2017, Eq. 3.46). Used outside the central region (
The deviance is: $$ D = 2\left( \frac{y(y^{1-p} - \mu^{1-p})}{1-p} - \frac{y^{2-p} - \mu^{2-p}}{2-p} \right). $$ The saddlepoint log-density is: $$ \log f_{\text{sp}}(y; \mu, \phi, p) = -\frac{1}{2}\log(2\pi\phi y^p) - \frac{D}{2\phi}. $$
Returns:
Public entry point for the Tweedie log-pdf. Dispatches to the appropriate method based on
- If
$y = 0$ : uses the compound Poisson point mass:$\Pr(Y=0) = -\lambda$ with$\lambda = \mu^{2-p}/(\phi(2-p))$ . - If
$y > 0$ and$|p-1.5| < \tau$ (default$\tau = 0.4$ ): uses the series expansion. - Otherwise: uses the saddlepoint approximation.
Returns:
Generates a random draw from the Tweedie distribution. Uses the compound Poisson-gamma representation:
-
$N \sim \text{Poisson}(\lambda)$ with$\lambda = \mu^{2-p}/(\phi(2-p))$ . - If
$N = 0$ , return 0. - Else,
$Y = \sum_{i=1}^N G_i$ with$G_i \sim \text{Gamma}(\alpha, \beta),\ \alpha = (2-p)/(p-1),\ \beta = 1/(\phi(p-1)\mu^{p-1})$ .
Applies an inverse link function based on an integer ID. Used for heterogeneous slot links in K=2 families.
- ID 0: identity link (e.g., Gaussian μ).
- ID 1: logit link (e.g., Beta μ, Bernoulli μ).
- ID 2: exp link (e.g., Gaussian σ, NB φ).
Returns:
This is a placeholder for helper functions (bspline_basis_eval and apply_W_basis_diff) that are injected from amm_canonical_helpers.stan.
The data block declares all observed data and fixed constants.
Number of observations (sample size). Integer, must be at least 1.
Number of K-individual distributional slots. Integer, must be at least 2.
Array of length K of family IDs (1–13) specifying the distribution family for each slot.
- 1: Gaussian (K=2)
- 3: Negative binomial (K=2)
- 5: Beta (K=2)
- 6: Gamma (K=2)
- 7: Lognormal (K=2)
- 8: Student-t (K=3)
- 9: Tweedie (K=3)
- 10: Zero-inflated Poisson (K=2)
- 11: Zero-inflated negative binomial (K=3)
- 12: Hurdle Poisson (K=2)
- 13: Hurdle negative binomial (K=3)
Array of length K of inverse link IDs (0, 1, 2) per slot. Used for K=2 families to apply the canonical link to auxiliary slots.
Arrays of length K of 0/1 flags indicating whether the slot uses an additive term
0/1 flag indicating whether the modulating function
Arrays of length K of integers (≥0) specifying the number of basis functions for
Maximum of J_a_per_k and J_b_max over k, respectively.
Arrays of length K of design matrices for J_a_per_k[k] (or J_b_per_k[k]).
Dimension of the basis for the modulating function use_W=0, set to 0.
Dimension of the basis for the global modulating function use_W=0, set to 0.
Design matrix for the global modulating function
Vectors of length n containing the response variable. y_real is for continuous families, y_int for discrete.
Vector of length K containing the reference values for each slot. Default is 0.
Arrays of length K of 0/1 flags indicating whether the slot uses a dispersion parameter for y (e.g., σ for Gaussian) or for φ (e.g., for NB, Beta, etc.). These are for future extension to mixed family patterns.
Integer (0, 1, 2) indicating the type of basis for
- 0: Polynomial
- 1: B-spline
- 2: Radial basis
Parameters for the basis of W_n_knots_full is the number of knots, W_knots_full the knot vector, W_degree the degree.
0/1 flag and integer for group-level random effects. If use_groups=1, there are J_groups groups.
Array of length n of group indices (1 to J_groups) for each observation.
Placeholder for additional data declarations related to the CP/NCP parametrization of the additive terms
Note: The remaining blocks (transformed data, parameters, transformed parameters, model, generated quantities) will be covered in subsequent sections.
Computes indexing arrays, counters, and flags required to efficiently map between flat-packed raw parameter vectors and per-slot model quantities. This ensures identifiability and avoids non-identified parameters in the AMM (Additive-Multiplicative Model) reference-anchoring decomposition.
array[K] int J_a_free;
array[K] int J_b_free;-
J_a_free[k]: Number of free (estimated) additive coefficients for slot$k$ . Set toJ_a_per_k[k]if slot$k$ is active (use_a_k[k] == 1) and has at least one design column (J_a_per_k[k] > 0); otherwise 0. -
J_b_free[k]: Analogous for multiplicative coefficients.
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;-
a_raw_offset[k]: Starting index (0-based) in the flat-packed vectora_rawfor slot$k$ 's additive coefficients. Equals$\sum_{k'=1}^{k-1} J_a_free[k']$ . -
c_b_raw_offset[k]: Starting index inc_b_k_rawfor slot$k$ 's multiplicative coefficients.
int total_J_a_free = 0;
int total_J_b_free = 0;- Accumulate total free additive/multiplicative coefficients across all slots. These become the dimensions of
a_rawandc_b_k_raw.
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;- Effective dimensions for the global modulating component
$W$ . Ifuse_W == 0, both are zero (no$W$ component).
int any_use_a = 0;
int any_use_b = 0;- Flags indicating whether any slot uses the additive or multiplicative component. Set to 1 if any slot has
use_a_k[k] == 1oruse_b_k[k] == 1.
int sigma_W_size;- Size of the scale array for
$W$ . Equals 1 ifuse_W == 1anddim_W > 0; otherwise 0.
int sigma_y_size = 0;
int phi_size = 0;- Counts of population-scoped dispersion parameters for Gaussian (
sigma_y) and negative binomial (phi) families. Incremented for each slot whereuse_dispersion_y_k[k]oruse_dispersion_phi_k[k]is 1.
For each slot
-
Compute
J_a_free[k]andJ_b_free[k]:J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
- Only slots with active additive component and at least one design column contribute free coefficients.
-
Set offsets and update totals:
a_raw_offset[k] = total_J_a_free; c_b_raw_offset[k] = total_J_b_free; total_J_a_free += J_a_free[k]; total_J_b_free += J_b_free[k];
- Offsets are cumulative sums of free coefficients from previous slots.
-
Update
any_use_aandany_use_bflags:if (use_a_k[k] == 1) any_use_a = 1; if (use_b_k[k] == 1) any_use_b = 1;
-
Accumulate dispersion parameter counts:
sigma_y_size += use_dispersion_y_k[k]; phi_size += use_dispersion_phi_k[k];
sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;-
$W$ has a single scale parametersigma_W[1]if active.
Compaction of sigma_a_k (per RG.6, D96):
int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
if (J_a_free[k] > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}-
n_sigma_a: Number of slots with free additive coefficients. -
sigma_a_idx[k]: Maps slot$k$ to its index in the compactedsigma_a_karray (1-based); 0 if no free coefficients. -
Purpose: Avoids sampling a scale parameter for slots without free additive coefficients (e.g., intercept-only slots fully absorbed into
$\theta_{\text{ref}}$ ), which would be non-identified (flat direction) and stall the sampler.
This block establishes the bookkeeping for the AMM decomposition:
- The additive component
$\mathbf{a}_k$ has scale$\sigma_{a,k}$ and coefficients$a_{\text{raw}}$ . - The multiplicative component
$\mathbf{c}_b{k} = \theta_{\text{ref},k} \circ \mathbf{b}_k$ has scale$\sigma_{b,k}$ . - The global component
$W$ has scale$\sigma_W$ . - The per-slot anchors
$\theta_{\text{ref},k}$ are the reference points around which additive/multiplicative effects are defined.
array[J_groups] vector[K] theta_ref_k;-
$\theta_{\text{ref},g,k}$ : Anchor for group$g$ , slot$k$ . In the$K = 1 + p$ template,$p=1$ per slot, so this is a scalar per group per slot.
array[use_groups] vector[K] mu_theta_ref_k;
array[use_groups] vector<lower=0>[K] sigma_theta_ref_k;-
$\mu_{\theta,k}$ and$\sigma_{\theta,k}$ : Group-level mean and standard deviation for the per-slot anchors. Only present when hierarchical groups are used.
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;-
$\sigma_{a,k}$ : Scale for additive component in slot$k$ (only for slots withJ_a_free[k] > 0). -
$a_{\text{raw}}$ : Raw additive coefficients (standardized). Transformed to$a_{\text{coef},k}$ in transformed parameters.
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;-
$\sigma_{b,k}$ : Scale for multiplicative component in slot$k$ (ifany_use_b == 1; otherwise array is empty). -
$c_{b,k}^{\text{raw}}$ : Raw multiplicative coefficients (standardized). Transformed to$c_{b,k}$ in transformed parameters.
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;-
$\sigma_W$ : Scale for global component$W$ . -
$W_{\text{raw}}$ : Raw coefficients for basis functions of$W$ . Dimensions:$\text{dim}_W \times d$ .
array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;-
$\sigma_{y,\text{pop}}$ : Scale parameter for Gaussian family (location only). -
$\phi_{\text{pop}}$ : Dispersion parameter for negative binomial family (location only).
The parameters are structured for the AMM non-centered parameterization:
- Raw parameters
$a_{\text{raw}},\ c_{b,k}^{\text{raw}},\ W_{\text{raw}}$ are standard normal a priori (in the model block). - They are scaled by
$\sigma_{a,k},\ \sigma_{b,k},\ \sigma_W$ to produce actual coefficients. - The anchors
$\theta_{\text{ref},k}$ are the reference points; additive/multiplicative effects are defined relative to these.
array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;-
a_coef_k[k]: Additive coefficients for slot$k$ (length$J_a^{\max}$ ). -
c_b_k[k]: Multiplicative coefficients in centered form:$c_{b,k} = \theta_{\text{ref},k} \cdot b_k$ . -
b_coef_k[k]: Multiplicative coefficients in non-centered form (only computed for reporting whenuse_groups == 0). -
eta_k[i,k]: Linear predictor for observation$i$ , slot$k$ .
for (k in 1:K) {
a_coef_k[k] = rep_vector(0, J_a_max);
c_b_k[k] = rep_vector(0, J_b_max);
b_coef_k[k] = rep_vector(0, J_b_max);
}- Initialize all coefficient vectors to zero.
This is where the R package inserts code to transform
- For each slot
$k$ withJ_a_free[k] > 0, copy the segment of$a_{\text{raw}}$ from indexa_raw_offset[k]+1toa_raw_offset[k]+J_a_free[k]intoa_coef_k[k][1:J_a_free[k]]. - Scale by
$\sigma_{a,k}$ (using the compacted indexing viasigma_a_idx). - Possibly apply a non-centered transformation (e.g.,
$a_k = \sigma_{a,k} \cdot a_{\text{raw}}$ ).
if (any_use_b == 1) {
for (k in 1:K) {
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
}
// Compute b_coef_k only for single-group case
if (use_groups == 0 && theta_ref_k[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k[1][k];
}
}
}
}
}- For each slot with multiplicative component:
- Transform raw coefficients:
$c_{b,k} = \sigma_{b,k} \cdot c_{b,k}^{\text{raw}}$ . - If single group (
use_groups == 0) and anchor nonzero, compute$b_k = c_{b,k} / \theta_{\text{ref},k}$ for reporting.
- Transform raw coefficients:
For each observation
real theta_ref_ik = theta_ref_k[g_i][k];
real eta_ik = theta_ref_ik;- Start with the group-specific anchor for slot
$k$ .
Additive component:
if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]), a_coef_k[k][1:J_a_per_k[k]]);
}-
$\eta_{ik} \mathrel{+}= \mathbf{z}_{a,ik}^\top \mathbf{a}_k$ , where$\mathbf{z}_{a,ik}$ is row$i$ of the additive design matrix for slot$k$ .
Multiplicative component:
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]), c_b_k[k][1:J_b_per_k[k]]);
}-
$\eta_{ik} \mathrel{+}= \mathbf{z}_{b,ik}^\top \mathbf{c}_{b,k}$ . Since$\mathbf{c}_{b,k} = \theta_{\text{ref},k} \cdot \mathbf{b}_k$ , this is the multiplicative effect anchored at$\theta_{\text{ref},k}$ .
Global modulating component (if use_W == 1):
vector[d] W_diff_x_k = rep_vector(0, d);
vector[dim_W] basis_diff_k = apply_W_basis_diff(...);
for (jj_local in 1:dim_W) {
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));- Compute basis functions of the difference between current anchor and slot anchor:
$\phi(\theta_{\text{ref},ik} - \theta_{\text{anchor},k})$ . - Construct
$W$ -component:$\mathbf{w}_k = \Phi \cdot W$ , where$\Phi$ is the basis matrix. -
$\eta_{ik} \mathrel{+}= \mathbf{x}_i^\top \mathbf{w}_k$ . - The placeholder
{{W_SCALE}}scales$W_{\text{raw}}$ by$\sigma_W$ .
Finally:
eta_k[i, k] = eta_ik;The linear predictor for observation
-
$\mathbf{a}_k = \sigma_{a,k} \cdot \mathbf{a}_{\text{raw},k}$ (additive coefficients). -
$\mathbf{b}_k$ are multiplicative coefficients (non-centered),$\mathbf{c}_{b,k} = \theta_{\text{ref},k} \mathbf{b}_k$ . -
$\mathbf{w}_k$ is the global$W$ component.
This realizes the AMM reference-anchoring decomposition: effects are additive/multiplicative around the anchor
{{THETA_REF_PRIOR_BLOCK}}- Placeholder for priors on
$\theta_{\text{ref},k}$ . Can be hierarchical (normal) or flat, and may include slot-specific constraints (e.g., uniform on Tweedie power).
Additive component (if any_use_a == 1):
sigma_a_k ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}- Prior on
$\sigma_{a,k}$ (e.g., half-normal, half-Cauchy). - Prior on
$a_{\text{raw}}$ (expected standard normal).
Multiplicative component (if any_use_b == 1):
sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_k_raw ~ normal(0, 1);- Prior on
$\sigma_{b,k}$ . -
$c_{b,k}^{\text{raw}} \sim \mathcal{N}(0,1)$ .
Global component (if use_W == 1 and dim_W > 0):
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};- Prior on
$\sigma_W$ . - Prior on
$W_{\text{raw}}$ (e.g., standard normal).
Population-scoped dispersion:
sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
phi_pop ~ {{PRIOR_PHI}};- Priors on
$\sigma_{y,\text{pop}}$ and$\phi_{\text{pop}}$ .
The likelihood is written for family_id_k[1]).
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}-
$\mu_i = \eta_{i1}$ (identity link for location). -
$\sigma_i = g^{-1}(\eta_{i2})$ (link function for dispersion, typically$\exp$ ). -
$y_i \sim \mathcal{N}(\mu_i, \sigma_i)$ .
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}-
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\phi_i = g^{-1}(\eta_{i2})$ (typically$\exp$ for dispersion). -
$y_i \sim \text{NB2}(\mu_i, \phi_i)$ .
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}-
$\mu_i = \text{logit}^{-1}(\eta_{i1})$ (logit link for mean in (0,1)). -
$\phi_i = g^{-1}(\eta_{i2})$ (typically$\exp$ for precision). -
$y_i \sim \text{BetaProportion}(\mu_i, \phi_i)$ .
for (i in 1:n) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}-
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\alpha_i = g^{-1}(\eta_{i2})$ (typically$\exp$ for shape). -
$y_i \sim \text{Gamma}(\alpha_i, \alpha_i/\mu_i)$ .
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}-
$\mu_i = \eta_{i1}$ (identity link for log-location). -
$\sigma_i = g^{-1}(\eta_{i2})$ (typically$\exp$ for scale). -
$y_i \sim \text{Lognormal}(\mu_i, \sigma_i)$ .
for (i in 1:n) {
y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}-
$\mu_i = \eta_{i1}$ (identity link for location). -
$\sigma_i = \exp(\eta_{i2})$ (log link for scale). -
$\nu_i = \exp(\eta_{i3})$ (log link for degrees of freedom). -
$y_i \sim t_{\nu_i}(\mu_i, \sigma_i)$ .
for (i in 1:n) {
y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}-
$\mu_i = \exp(\eta_{i1})$ (log link for mean). -
$\phi_i = \exp(\eta_{i2})$ (log link for dispersion). -
$p_i = \eta_{i3}$ (identity link for power, bounded to$(1.01,1.99)$ by prior). -
$y_i \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(0 | eta_mu_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(y_int[i] | eta_mu_i);
}
}-
$\mu_i = \exp(\eta_{i1})$ (log link for Poisson mean). -
$\pi_i = \text{logit}^{-1}(\eta_{i2})$ (logit link for zero-inflation probability). - Likelihood:
$P(y_i=0) = \pi_i + (1-\pi_i)e^{-\mu_i},\ P(y_i>0) = (1-\pi_i)\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
}
}-
$\mu_i = \exp(\eta_{i1}),\ \phi_i = \exp(\eta_{i2}),\ \pi_i = \text{logit}^{-1}(\eta_{i3})$ . - Likelihood:
$P(y_i=0) = \pi_i + (1-\pi_i)\left(\frac{\phi_i}{\mu_i+\phi_i}\right)^{\phi_i},\ P(y_i>0) = (1-\pi_i)\text{NB2}(y_i|\mu_i,\phi_i)$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
}-
$\mu_i = \exp(\eta_{i1}),\ \pi_i = \text{logit}^{-1}(\eta_{i2})$ . - Likelihood:
$P(y_i=0)=\pi_i,\ P(y_i>0)=(1-\pi_i)\frac{\text{Poisson}(y_i|\mu_i)}{1-e^{-\mu_i}}$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
}
}-
$\mu_i = \exp(\eta_{i1}),\ \phi_i = \exp(\eta_{i2}),\ \pi_i = \text{logit}^{-1}(\eta_{i3})$ . - Likelihood:
$P(y_i=0)=\pi_i,\ P(y_i>0)=(1-\pi_i)\frac{\text{NB2}(y_i|\mu_i,\phi_i)}{1-\text{NB2}(0|\mu_i,\phi_i)}$ .
The model block specifies:
- Priors: Weakly informative or proper on scales and coefficients; anchors may have hierarchical or flat priors.
-
Likelihood: Distributional regression where each slot's
$\eta_{ik}$ is passed through an inverse link to produce the parameter of the distribution. The likelihood is family-specific and uses the AMM decomposition for all parameters.
The entire model realizes the AMM reference-anchoring decomposition: effects are additive/multiplicative around group-specific anchors, with optional global modulation
This is the final block of the AMM (General Dynamic Parameter) model template for distributional regression with generated quantities block computes two key posterior-estimable quantities:
-
Pointwise log-likelihoods (
log_lik): Used for model comparison via WAIC, LOO-CV, etc. -
Posterior-predictive draws (
y_pred): Used for posterior predictive checks.
Additionally, it declares theta_i_k as a matrix to store the linear predictors, though it is not subsequently used in this block (likely for diagnostic purposes).
The block implements a dispatcher based on family_id_k[1], which identifies the distribution family. For each observation, it:
- Computes the log-likelihood using the appropriate Stan log-probability function.
- Generates a draw from the posterior predictive distribution using the corresponding RNG.
- Uses inverse link functions (via
apply_inv_link_by_id) to map linear predictors to distribution parameters.
This realizes the AMM reference-anchoring decomposition by applying the distributional-regression likelihood that was selected in the model block, using the linear predictors
generated quantities {Opens the generated quantities block. This block executes after the model is fitted, on each posterior draw.
vector[n] log_lik;
matrix[n, K] theta_i_k = eta_k;
vector[n] y_pred;-
log_lik: A vector of length$n$ (number of observations) to store the pointwise log-likelihood$\log p(y_i \mid \theta_i)$ for each observation$i$ . -
theta_i_k: An$n \times K$ matrix initialized to the linear predictor matrixeta_kfrom the transformed parameters block. This stores the linear predictors$\eta_{i,k}$ for each observation$i$ and each distributional parameter slot$k$ . It is declared for potential post-processing (e.g., extracting fitted values), but not used further in this block. -
y_pred: A vector of length$n$ to store posterior-predictive draws$y_i^{\text{rep}}$ .
for (i in 1:n) {Iterates over all family_id_k[1].
if (family_id_k[1] == 1) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
}-
Interpretation: Normal distribution with identity link for
$\mu$ and possibly non-identity link for$\sigma$ . -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \mu_i,\ \eta_{i,2} = g^{-1}(\sigma_i)$ . - Inverse link:
$\sigma_i = g_{\sigma}^{-1}(\eta_{i,2})$ , where$g_{\sigma}$ is the link for$\sigma$ (often log). - Log-likelihood:
$\log p(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2} \log(2\pi) - \log \sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2}$ - Posterior predictive:
$y_i^{\text{rep}} \sim \mathcal{N}(\mu_i, \sigma_i^2)$ .
- Linear predictors:
-
AMM Role: The location parameter
$\mu_i$ is directly from the reference-anchored linear predictor$\eta_{i,1}$ ; the scale parameter$\sigma_i$ is transformed from$\eta_{i,2}$ via its link.
} else if (family_id_k[1] == 3) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
y_pred[i] = neg_binomial_2_rng(mu_i, phi_i);
}-
Interpretation: Negative binomial distribution (alternative parameterization) with log link for
$\mu$ and possibly non-identity link for$\phi$ (precision/size). -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = g^{-1}(\phi_i)$ . - Inverse links:
$\mu_i = \exp(\eta_{i,1}),\ \phi_i = g_{\phi}^{-1}(\eta_{i,2})$ . - Log-likelihood (using Stan's
neg_binomial_2_lpmf):$\log p(y_i \mid \mu_i, \phi_i) = \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i)$ - Posterior predictive:
$y_i^{\text{rep}} \sim \text{NB2}(\mu_i, \phi_i)$ .
- Linear predictors:
-
AMM Role: Both parameters are distributional;
$\mu_i$ is exponentiated from the reference-anchored$\eta_{i,1}$ , and$\phi_i$ is transformed via its link.
} else if (family_id_k[1] == 5) {
real mu_i = inv_logit(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
y_pred[i] = beta_proportion_rng(mu_i, phi_i);
}-
Interpretation: Beta distribution in the mean-precision parameterization (proportion), with logit link for
$\mu$ and possibly non-identity link for$\phi$ . -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \text{logit}(\mu_i),\ \eta_{i,2} = g^{-1}(\phi_i)$ . - Inverse links:
$\mu_i = \text{logit}^{-1}(\eta_{i,1}) = \frac{1}{1 + e^{-\eta_{i,1}}},\ \phi_i = g_{\phi}^{-1}(\eta_{i,2})$ . - Log-likelihood (using Stan's
beta_proportion_lpmf):$\log p(y_i \mid \mu_i, \phi_i) = \text{log Beta}(\mu_i \phi_i, (1 - \mu_i) \phi_i) - \log y_i - \log(1 - y_i)$ where$\text{log Beta}(a, b) = \log \Gamma(a) + \log \Gamma(b) - \log \Gamma(a + b)$ . - Posterior predictive:
$y_i^{\text{rep}} \sim \text{BetaProportion}(\mu_i, \phi_i)$ .
- Linear predictors:
-
AMM Role: The mean
$\mu_i$ is on the probability scale via logit link; the precision$\phi_i$ is transformed.
} else if (family_id_k[1] == 6) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
y_pred[i] = gamma_rng(shape_i, shape_i / mu_i);
}-
Interpretation: Gamma distribution with shape-rate parameterization, where rate
$\beta = \text{shape} / \mu$ . Log link for$\mu$ and possibly non-identity link for shape$\alpha$ . -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = g^{-1}(\alpha_i)$ . - Inverse links:
$\mu_i = \exp(\eta_{i,1}),\ \alpha_i = g_{\alpha}^{-1}(\eta_{i,2})$ . - Derived rate:
$\beta_i = \alpha_i / \mu_i$ . - Log-likelihood (using Stan's
gamma_lpdf):$\log p(y_i \mid \alpha_i, \beta_i) = (\alpha_i - 1) \log y_i - \beta_i y_i + \alpha_i \log \beta_i - \log \Gamma(\alpha_i)$ - Posterior predictive:
$y_i^{\text{rep}} \sim \text{Gamma}(\alpha_i, \beta_i)$ .
- Linear predictors:
-
AMM Role: Both shape and scale are distributional parameters; the mean
$\mu_i$ is exponentiated from$\eta_{i,1}$ , and shape is transformed.
} else if (family_id_k[1] == 7) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = lognormal_rng(eta_k[i, 1], sigma_i);
}-
Interpretation: Lognormal distribution with identity link for
$\mu$ (on the log scale) and possibly non-identity link for$\sigma$ . -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \mu_i,\ \eta_{i,2} = g^{-1}(\sigma_i)$ . - Inverse link:
$\sigma_i = g_{\sigma}^{-1}(\eta_{i,2})$ . - Log-likelihood:
$\log p(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2} \log(2\pi) - \log \sigma_i - \log y_i - \frac{(\log y_i - \mu_i)^2}{2\sigma_i^2}$ - Posterior predictive:
$y_i^{\text{rep}} \sim \text{Lognormal}(\mu_i, \sigma_i^2)$ .
- Linear predictors:
-
AMM Role: The location parameter
$\mu_i$ is directly from$\eta_{i,1}$ ; the scale$\sigma_i$ is transformed.
} else if (family_id_k[1] == 8) {
real nu_i = exp(eta_k[i, 3]);
real mu_i = eta_k[i, 1];
real sigma_i = exp(eta_k[i, 2]);
log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
}-
Interpretation: Student-t distribution with identity link for
$\mu$ , log link for$\sigma$ , and log link for$\nu$ (degrees of freedom). This is a three-parameter distribution. -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \mu_i,\ \eta_{i,2} = \log(\sigma_i),\ \eta_{i,3} = \log(\nu_i)$ . - Inverse links:
$\sigma_i = \exp(\eta_{i,2}),\ \nu_i = \exp(\eta_{i,3})$ . - Log-likelihood:
$\log p(y_i \mid \mu_i, \sigma_i, \nu_i) = \log \Gamma\left(\frac{\nu_i + 1}{2}\right) - \log \Gamma\left(\frac{\nu_i}{2}\right) - \frac{1}{2} \log(\nu_i \pi) - \log \sigma_i - \frac{\nu_i + 1}{2} \log\left(1 + \frac{(y_i - \mu_i)^2}{\nu_i \sigma_i^2}\right)$ - Posterior predictive:
$y_i^{\text{rep}} \sim \text{Student-t}(\nu_i, \mu_i, \sigma_i)$ .
- Linear predictors:
- AMM Role: All three parameters are distributional and transformed via exponentiation from their linear predictors.
} else if (family_id_k[1] == 9) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = exp(eta_k[i, 2]);
real p_i = eta_k[i, 3];
log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
}-
Interpretation: Tweedie distribution with log links for
$\mu$ and$\phi$ and identity link for$p$ (power parameter,$1 < p < 2$ ). -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = \log(\phi_i),\ \eta_{i,3} = p_i$ . - Inverse links:
$\mu_i = \exp(\eta_{i,1}),\ \phi_i = \exp(\eta_{i,2})$ . - Log-likelihood: The Tweedie log-likelihood is defined via the Tweedie density, which is not expressed in closed form but can be computed numerically.
- Posterior predictive:
$y_i^{\text{rep}} \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$ .
- Linear predictors:
-
AMM Role: All three parameters are distributional;
$\mu_i$ and$\phi_i$ are exponentiated,$p_i$ is identity-transformed.
} else if (family_id_k[1] == 10) {
// ZIP K = 2: log_lik mirrors the model-block dispatch; y_pred
// draws the structural zero indicator first and then a Poisson
// sample for the non-zero branch (Lambert 1992).
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(0 | eta_mu_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = poisson_log_rng(eta_mu_i);
}
}-
Interpretation: Zero-inflated Poisson model with
$K=2$ distributional parameters:$\mu$ (Poisson rate) on the log scale and$\pi$ (zero-inflation probability) on the logit scale. -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = \text{logit}(\pi_i)$ . -
Likelihood:
- If
$y_i = 0$ :$\log p(y_i=0) = \log\left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right)$ Computed vialog_sum_expfor numerical stability. - If
$y_i > 0$ :$\log p(y_i) = \log(1 - \pi_i) + y_i \log \mu_i - \mu_i - \log(y_i!)$
- If
-
Posterior predictive:
- Draw
$z_i \sim \text{Bernoulli}(\pi_i)$ (structural zero indicator). - If
$z_i = 1$ , then$y_i^{\text{rep}} = 0$ . - If
$z_i = 0$ , then$y_i^{\text{rep}} \sim \text{Poisson}(\mu_i)$ .
- Draw
- Linear predictors:
-
AMM Role: Both
$\mu_i$ and$\pi_i$ are distributional parameters transformed via log and logit links, respectively.
} else if (family_id_k[1] == 11) {
// ZINB K = 3: symmetric to ZIP with neg_binomial_2_log.
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
}
}-
Interpretation: Zero-inflated negative binomial model with
$K=3$ distributional parameters:$\mu$ (NB mean) on log scale,$\phi$ (NB dispersion) on log scale, and$\pi$ (zero-inflation probability) on logit scale. -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = \log(\phi_i),\ \eta_{i,3} = \text{logit}(\pi_i)$ . - Inverse links:
$\mu_i = \exp(\eta_{i,1}),\ \phi_i = \exp(\eta_{i,2})$ . -
Likelihood:
- If
$y_i = 0$ :$\log p(y_i=0) = \log\left( \pi_i + (1 - \pi_i) \left(\frac{\phi_i}{\phi_i + \mu_i}\right)^{\phi_i} \right)$ Computed vialog_sum_expfor numerical stability. - If
$y_i > 0$ :$\log p(y_i) = \log(1 - \pi_i) + \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i)$
- If
-
Posterior predictive:
- Draw
$z_i \sim \text{Bernoulli}(\pi_i)$ . - If
$z_i = 1$ , then$y_i^{\text{rep}} = 0$ . - If
$z_i = 0$ , then$y_i^{\text{rep}} \sim \text{NB2}(\mu_i, \phi_i)$ .
- Draw
- Linear predictors:
- AMM Role: All three parameters are distributional and transformed via exponential and logit links.
} else if (family_id_k[1] == 12) {
// Hurdle-Poisson K = 2: log_lik mirrors the model-block
// dispatch. y_pred uses rejection sampling on Poisson(mu) to
// realize the truncated-at-one positive branch (Mullahy 1986);
// bounded at 10000 iterations for numerical safety when mu is
// pathologically small (in which case the family is borderline
// identifiable in the first place).
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = poisson_log_rng(eta_mu_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}-
Interpretation: Hurdle-Poisson model with
$K=2$ distributional parameters:$\mu$ (Poisson rate) on log scale and$\pi$ (hurdle probability) on logit scale. -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = \text{logit}(\pi_i)$ . -
Likelihood:
- If
$y_i = 0$ :$\log p(y_i=0) = \log \pi_i$ - If
$y_i > 0$ :$\log p(y_i) = \log(1 - \pi_i) + y_i \log \mu_i - \mu_i - \log(y_i!) - \log(1 - e^{-\mu_i})$ The term$\log(1 - e^{-\mu_i})$ normalizes the truncated Poisson distribution.
- If
-
Posterior predictive:
- Draw
$z_i \sim \text{Bernoulli}(\pi_i)$ (hurdle indicator). - If
$z_i = 1$ , then$y_i^{\text{rep}} = 0$ . - If
$z_i = 0$ , then$y_i^{\text{rep}}$ is drawn from a Poisson($\mu_i$ ) truncated to positive integers. This is done via rejection sampling: repeatedly draw from Poisson($\mu_i$ ) until a non-zero value is obtained, with a maximum of 10000 iterations to avoid infinite loops when$\mu_i$ is extremely small.
- Draw
- Linear predictors:
- AMM Role: Both parameters are distributional and transformed via log and logit links.
} else if (family_id_k[1] == 13) {
// Hurdle-NB K = 3: symmetric to Hurdle-Poisson with
// neg_binomial_2_log on the truncated positive branch.
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
phi_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}-
Interpretation: Hurdle-negative binomial model with
$K=3$ distributional parameters:$\mu$ (NB mean) on log scale,$\phi$ (NB dispersion) on log scale, and$\pi$ (hurdle probability) on logit scale. -
Mathematics:
- Linear predictors:
$\eta_{i,1} = \log(\mu_i),\ \eta_{i,2} = \log(\phi_i),\ \eta_{i,3} = \text{logit}(\pi_i)$ . - Inverse links:
$\mu_i = \exp(\eta_{i,1}),\ \phi_i = \exp(\eta_{i,2})$ . -
Likelihood:
- If
$y_i = 0$ :$\log p(y_i=0) = \log \pi_i$ - If
$y_i > 0$ :$\log p(y_i) = \log(1 - \pi_i) + \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i) - \log\left(1 - \left(\frac{\phi_i}{\phi_i + \mu_i}\right)^{\phi_i}\right)$ The last term normalizes the truncated negative binomial distribution.
- If
-
Posterior predictive:
- Draw
$z_i \sim \text{Bernoulli}(\pi_i)$ . - If
$z_i = 1$ , then$y_i^{\text{rep}} = 0$ . - If
$z_i = 0$ , then$y_i^{\text{rep}}$ is drawn from a NB2($\mu_i, \phi_i$ ) truncated to positive integers via rejection sampling (up to 10000 iterations).
- Draw
- Linear predictors:
- AMM Role: All three parameters are distributional and transformed via exponential and logit links.
} else {
log_lik[i] = negative_infinity();
y_pred[i] = 0;
}-
Interpretation: If
family_id_k[1]does not match any of the implemented families (1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13), the log-likelihood is set to$-\infty$ and the posterior predictive draw is set to 0. This is a fallback to avoid undefined behavior. -
Mathematics:
$\log p(y_i \mid \theta_i) = -\infty,\ y_i^{\text{rep}} = 0$ . -
AMM Role: Ensures the code does not crash for unsupported families, but the model should never reach this point if
family_id_kis correctly specified.
}
}Closes the for loop and the generated quantities block.
The generated quantities block realizes the AMM reference-anchoring decomposition by:
-
Using the linear predictors
eta_kthat are computed in the transformed parameters block from the reference-anchored parameterization. -
Applying appropriate inverse link functions to transform linear predictors to distribution parameters (e.g.,
$\exp$ for log links,$\text{logit}^{-1}$ for logit links). - Implementing the likelihood for each supported distributional family, mirroring the model block's likelihood dispatcher.
- Generating posterior predictive draws that respect the distributional form, including zero-inflation and hurdle mechanisms.
- Providing pointwise log-likelihoods for model comparison, which are essential for evaluating the reference-anchored model's fit.
This block thus completes the computational pipeline for the AMM model with distributional regression for
The header is a dense specification document. It fixes the following invariants that govern the rest of the file:
- Path 1 — Empirical Bayes (EB) Conditional Template, Path B, canonized as decision D34 in Session 10 (2026-05-25), sub-phase 8.6.C.
- Invoked from
R/eb.Rthroughcmdstanr::cmdstan_model(...)$sample(...)at Step (iii) of the EB workflow, for the regime$K \geq 2,\ p = 1$ (single design column,$K$ distributional slots).
Three changes define the conditional (as opposed to marginal) EB template:
-
theta_ref_kis relocated from theparameters{}block to thedata{}block and renamedtheta_ref_k_data, making the EB plug-in$\widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}}$ explicit. - The per-slot hyperparameters
$\mu_{\theta_{\mathrm{ref},k}}$ and$\sigma_{\theta_{\mathrm{ref},k}}$ are removed entirely — they existed only to endow$\theta_{\mathrm{ref},k}$ with a prior in the full-Bayes (FB) and EB-marginal regimes. - The
THETA_REF_PRIOR_BLOCKsubstitution is deleted frommodel{}because$\theta_{\mathrm{ref},k}$ is now a fixed data input.
Everything else is asserted to be bit-identical to amm_eb_marginal_K.stan, so that the conditional posterior
Distributional regression with canonical links per slot. The canonical AMM linear predictor for slot
where
| Family ID | Family | Slot 1 (location) link | Slot 2+ link | |
|---|---|---|---|---|
| 1 | Gaussian |
|
|
2 |
| 3 | neg_binomial_2 |
|
|
2 |
| 5 | Beta |
|
|
2 |
| 6 | Gamma |
|
shape log | 2 |
| 7 | lognormal_loc_scale |
|
|
2 |
| 8 | Student- |
|
|
3 |
| 9 | Tweedie |
|
|
3 |
| 10 | ZIP |
|
|
2 |
| 11 | ZINB |
|
|
3 |
| 12 | hurdle_poisson |
|
|
2 |
| 13 | hurdle_neg_binomial_2 |
|
|
3 |
Concrete likelihood forms cited:
-
Gaussian
$K=2$ :$y_i \sim \mathrm{Normal}(\eta_\mu[i],\, \exp(\eta_\sigma[i]))$ , with$\eta_\mu$ on the identity link and$\eta_\sigma$ on the log link. -
neg_binomial_2
$K=2$ :neg_binomial_2_log(y | eta_mu, exp(eta_phi)). -
Beta
$K=2$ :beta_proportion_lpdf(y_real | inv_logit(eta_mu), exp(eta_phi)). -
Gamma
$K=2$ :gamma_lpdf(y_real | exp(eta_shape), exp(eta_shape) / exp(eta_mu))— shape–mean parametrization. -
Student-
$t\ K=3$ :student_t_lpdf(y_real | exp(eta_nu), eta_mu, exp(eta_sigma)). -
Tweedie
$K=3$ :tweedie_lpdf(y_real | exp(eta_mu), exp(eta_phi), eta_p)with the hybrid Dunn–Smyth lpdf defined in thefunctions{}block.
Continuous hurdle families (hurdle_gamma, hurdle_lognormal) and heterogeneous family/link per slot are deferred to sub-phases 8.3.7+.
W_raw, a single sigma_W, and a single basis matrix feed the modulating term in every slot. Per-slot or location-only
Per-slot: each slot gdpar_formula_set (e.g. theta_anchor = c(mu = 0, sigma = 0)).
-
$a$ : configurable CP/NCP per slot viasegment()priors. Three placeholders (DATA_CP_A_PER_K_DECL,TP_A_BLOCK,MODEL_A_BLOCK) are resolved bygenerate_a_blocks_K()from a length-$K$ logical vectorcp_a_per_K. -
$W$ : configurable CP/NCP globally via placeholdersW_SCALEandW_PRIOR. -
$b$ : linear reparametrization per slot. Define$c_{b,k} = \theta_{\mathrm{ref},k} \cdot b_{\mathrm{coef},k}$ . The model samples$c_{b,k,\mathrm{raw}}$ directly (NCP on$c_{b,k,\mathrm{raw}}$ ) and reports$b_{\mathrm{coef},k}$ as a derived quantity in the single-anchor regime. This eliminates the bimodality of$(\theta_{\mathrm{ref},k}, b_k)$ — Recovery 2, handoff 4 addendum, 2026-05-09.
Column-wise centering of build_amm_design_K(). Conditions (C2) and (C3) of Block 1 hold identically per slot; (C4) holds per slot via
real tweedie_log_W_series(real y, real phi, real p) {
real alpha_pos = (2.0 - p) / (p - 1.0);
real log_z = alpha_pos * log(y)
- alpha_pos * log(p - 1.0)
- (1.0 + alpha_pos) * log(phi)
- log(2.0 - p);
real log_W = negative_infinity();
real max_log_W = negative_infinity();
int max_iter = 1000;
int passed_peak = 0;
for (j_loop in 1:max_iter) {
real j = j_loop;
real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);
if (term > max_log_W) {
max_log_W = term;
} else {
passed_peak = 1;
}
log_W = log_sum_exp(log_W, term);
if (passed_peak == 1 && term < max_log_W - 37.0) {
break;
}
}
return log_W;
}Mathematics. For the Tweedie compound Poisson–gamma model with
with summand
Taking logs,
where
The code computes alpha_pos log_z term is log_W accumulates max_log_W (the peak log-summand) and sets passed_peak = 1 once the terms start decreasing. Termination occurs when a post-peak term falls more than 37 nats below the peak — a relative contribution of
Return value:
real tweedie_log_f_series(real y, real mu, real phi, real p) {
real theta = pow(mu, 1.0 - p) / (1.0 - p);
real kappa = pow(mu, 2.0 - p) / (2.0 - p);
return (y * theta - kappa) / phi - log(y)
+ tweedie_log_W_series(y, phi, p);
}Mathematics. The exact Dunn–Smyth log-density for
where the Tweedie canonical and cumulant functions are
The code computes theta kappa
real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) {
real deviance = 2.0 * (
y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p)
- (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p)
);
return -0.5 * log(2.0 * pi() * phi * pow(y, p))
- deviance / (2.0 * phi);
}Mathematics. Wood (2017, Eq. 3.46) gives the saddlepoint (Lugannani–Rice-style) approximation specialized to the Tweedie exponential dispersion family. The Tweedie deviance is
and the saddlepoint log-density is
The code computes deviance
real tweedie_lpdf(real y, real mu, real phi, real p) {
real tau = 0.4;
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
if (y == 0.0) {
return -lambda;
}
if (abs(p - 1.5) < tau) {
return tweedie_log_f_series(y, mu, phi, p);
} else {
return tweedie_log_f_saddlepoint(y, mu, phi, p);
}
}Mathematics. This is the public entry point. The point-mass at zero for the compound Poisson–gamma distribution has closed form
because
-
$|p - 1.5| < 0.4$ : exact Dunn–Smyth series (tweedie_log_f_series). -
$|p - 1.5| \geq 0.4$ : saddlepoint approximation (tweedie_log_f_saddlepoint).
The threshold _lpdf convention, so both y ~ tweedie(mu, phi, p) and tweedie_lpdf(y | mu, phi, p) dispatch through this function.
real tweedie_rng(real mu, real phi, real p) {
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
int N = poisson_rng(lambda);
if (N == 0) {
return 0.0;
}
real shape = (2.0 - p) / (p - 1.0);
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));
return gamma_rng(N * shape, rate);
}Mathematics. The compound Poisson–gamma representation:
If gamma_rng(N * shape, rate). This is used in generated quantities for posterior-predictive checks.
real apply_inv_link_by_id(int link_id, real eta) {
if (link_id == 0) return eta;
if (link_id == 1) return inv_logit(eta);
return exp(eta);
}Mathematics. A dispatch table for canonical inverse links:
link_id |
Inverse link | |
|---|---|---|
| 0 | identity | |
| 1 | inverse-logit | |
| 2 | exp |
This is consumed by the inv_link_id_per_slot[k] data field is populated R-side by .gdpar_compute_inv_link_id_per_slot. For *_log_lpmf, bernoulli_logit_lpmf) are used instead, and a separate R-side guard rejects heterogeneous location families outside
A placeholder token, to be substituted by R/stan_codegen.R during template instantiation. It is not Stan syntax in the raw template; the codegen step replaces it with additional helper functions as required by the specific family/slot configuration.
int<lower=1> n;
int<lower=2> K;-
n: number of observations,$n \geq 1$ . -
K: number of K-individual distributional slots,$K \geq 2$ . Unit 3 fires only when$K > 1$ .
array[K] int<lower=1, upper=13> family_id_k;Each slot family_id_k[1] (the location slot); auxiliary slots consume inv_link_id_per_slot[k] via apply_inv_link_by_id().
array[K] int<lower=0, upper=2> inv_link_id_per_slot;Populated R-side by .gdpar_compute_inv_link_id_per_slot under the L1 refined rule (decision D3.5):
-
Homogeneous slot
$k$ (family_id_k[k] == family_id_k[1]): the ID reflects the canonical link of slot$k$ of the location family. -
Heterogeneous slot
$k$ (family_id_k[k] != family_id_k[1]): the ID reflects the canonical link of slot 1 offamily_id_k[k].
Values:
array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=1> use_W;-
use_a_k[k]: whether the$a_k(\cdot)$ basis expansion is active in slot$k$ . -
use_b_k[k]: whether the$b_k(\cdot)$ basis expansion is active in slot$k$ . -
use_W: whether the globally shared modulating component$W$ is active (single flag, since$W$ is shared across all slots).
array[K] int<lower=0> J_a_per_k;
array[K] int<lower=0> J_b_per_k;
int<lower=0> J_a_max;
int<lower=0> J_b_max;-
J_a_per_k[k]$= J_a^{(k)}$ : number of columns in the$a$ -basis for slot$k$ . -
J_b_per_k[k]$= J_b^{(k)}$ : number of columns in the$b$ -basis for slot$k$ . -
J_a_max$= \max_k J_a^{(k)}$ : padding width for the flat-packed$a$ -design. -
J_b_max$= \max_k J_b^{(k)}$ : padding width for the flat-packed$b$ -design.
array[K] matrix[n, J_a_max] Z_a_k;
array[K] matrix[n, J_b_max] Z_b_k;int<lower=0> dim_W;
int<lower=0> d;
matrix[n, d] X;-
dim_W: dimension of the$W$ basis (number of basis functions or spline coefficients). -
d: number of columns in the global design matrix$X$ . -
X$\in \mathbb{R}^{n \times d}$ : the global covariate matrix. In the AMM canonical form,$x_i$ (the scalar appearing in the$W(\theta_{\mathrm{ref},k}) - W(\theta_{\mathrm{anchor},k})$ term) is extracted from$X$ ; with$p = 1,\ d = 1$ and$X$ is a single column.
vector[n] y_real;
array[n] int y_int;Both real and integer storage are allocated because the consumed branch depends on family_id_k[1]. With y_real; discrete families (neg_binomial_2, ZIP, ZINB, hurdle_poisson, hurdle_neg_binomial_2) consume y_int.
vector[K] theta_anchor_K;gdpar_formula_set. Default is 0 for each slot. These enter the AMM canonical form through the term
array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;Per-slot dispersion flags carried for inter-slot heterogeneity. For example, Gaussian sigma_y entry because the scale is K-individual; neg_binomial_2 phi entry for the same reason. The flags reproduce the param_spec retains population scope, and are kept here for future extension to mixed (K-individual + population) family patterns.
int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;-
W_type_id$\in \{0, 1, 2\}$ : selects the type of the modulating function$W$ (e.g. polynomial, spline, other — the exact mapping is defined inamm_main.stanheader per sub-phase 8.3.8). -
W_n_knots_full: number of knots for the$W$ basis (0 if not applicable). -
W_knots_full: the knot locations, lengthW_n_knots_full. -
W_degree: polynomial/spline degree for$W$ .
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups: whether a hierarchical anchor on$\theta_{\mathrm{ref},k}[g]$ is active. -
J_groups: number of groups$G$ . -
group_id[i]$\in \{1, \dots, G\}$ : group membership of observation$i$ .
The grouping structure is shared across slots: a single
A placeholder token, substituted by generate_a_blocks_K() from the cp_a_per_K logical vector of length vector[J_a_per_k[k]] a_k_raw for NCP slots, or vector[J_a_per_k[k]] a_k for CP slots, plus any associated scaling declarations).
int<lower=1> K_slots;
int<lower=1> p_dim;-
K_slots: under Path B,K_slots == K. Declared explicitly so that the template body can in principle branch on it, though under Path B it does not. -
p_dim: under Path B,p_dim == 1. Same rationale.
array[J_groups] vector[K] theta_ref_k_data;This is the central data input of the conditional template. It is a theta_ref_k_data[g][k]
Provenance. Populated R-side by .gdpar_eb_maximize_marginal(), which extracts cmdstanr::laplace() applied to amm_eb_marginal_K.stan, projected onto the per-slot anchors. When use_groups == 0, J_groups == 1 and the array has a single element containing the
Role in the AMM decomposition. Because
where
with no prior on
Scope of this section. The section opens with a lone
}that closes thefunctionsblock begun in section 1. It then contains four complete top-level blocks —transformed data,parameters,transformed parameters, andmodel— and terminates mid-modelblock (the closing brace and anygenerated quantitiesbelong to section 3). The template is the conditional AMM regime:theta_ref_kis supplied as data, so no hyperparametersmu_theta_ref_k/sigma_theta_ref_kare sampled.
array[K] int J_a_free;
array[K] int J_b_free;
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;Four integer arrays of length
| Symbol | Meaning |
|---|---|
J_a_free[k] |
Number of free additive coefficients actually sampled for slot J_a_per_k[k] when the slot is globally off (use_a_k[k] == 0) or has no a() term at all. |
J_b_free[k] |
Analogous for the multiplicative component. |
a_raw_offset[k] |
Start index of slot a_raw. |
c_b_raw_offset[k] |
Start index of slot c_b_k_raw. |
total_J_a_free |
a_raw. |
total_J_b_free |
c_b_k_raw. |
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;When the global modulating component is disabled (use_W == 0), both collapse to W_raw matrix in parameters degenerates to a dim_W_eff = dim_W (number of basis functions) and d_eff = d (covariate stride).
int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;-
any_use_a/any_use_b: OR-reductions ofuse_a_k/use_b_kacross slots. They gate entire prior blocks and thesigma_b_karray length (K * any_use_bevaluates to$0$ when no slot uses the multiplicative component, eliminating the parameter entirely). -
sigma_W_size: set below to$1$ when W is active, else$0$ . -
sigma_y_size/phi_size: population-scoped dispersion parameter counts, accumulated in the loop.
for (k in 1:K) {
J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
J_b_free[k] = (use_b_k[k] == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
a_raw_offset[k] = total_J_a_free;
c_b_raw_offset[k] = total_J_b_free;
total_J_a_free += J_a_free[k];
total_J_b_free += J_b_free[k];
if (use_a_k[k] == 1) any_use_a = 1;
if (use_b_k[k] == 1) any_use_b = 1;
sigma_y_size += use_dispersion_y_k[k];
phi_size += use_dispersion_phi_k[k];
}For each slot
-
Free-count gating.
$J_{a,\text{free}}[k] = J_{a,\text{per}\,k}[k]$ iffuse_a_k[k] == 1and$J_{a,\text{per}\,k}[k] > 0$ ; otherwise$0$ . The conjunctive guard prevents declaring sampled coefficients for a slot whosea()spec is either globally off or empty. Identical logic for$J_{b,\text{free}}[k]$ . -
Offset registration. Before accumulating, the current running total is stored as slot
$k$ 's offset. This is the standard flat-packing convention: slot$k$ 's raw coefficients occupy indicesoffset[k] + 1…offset[k] + J_free[k]in the packed vector. -
Accumulation.
total_J_a_freeandtotal_J_b_freegrow by the slot's free count. -
Flag setting.
any_use_aandany_use_bare latched to$1$ on the first active slot (irreversible OR). -
Dispersion sizing.
sigma_y_sizeaccumulatesuse_dispersion_y_k[k](Gaussian$\sigma_y$ slots whose scope is population), andphi_sizeaccumulatesuse_dispersion_phi_k[k](NB$\phi$ slots whose scope is population). Slots whose dispersion is K-individual (modeled via their own AMM in another slot) do not contribute here.
sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;A single global
int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
if (J_a_free[k] > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}This is the compaction optimization described in the inline comment. The problem: if sigma_a_k were declared as array[K], a slot with use_a_k[k] == 0, e.g. a phi ~ 1 slot, or an intercept-only a() fully absorbed into theta_ref) would still sample a half-strict-positive scale parameter with no likelihood contribution — a flat, non-identified direction that stalls NUTS.
The compaction builds:
-
n_sigma_a: the count of slots with$J_{a,\text{free}}[k] > 0$ . -
sigma_a_idx[k]: a remapping from slot index$k$ to the compacted index insigma_a_k($1$ -based if active,$0$ if inactive — a sentinel).
When every slot carries free additive coefficients, {{TP_A_BLOCK}} and {{MODEL_A_BLOCK}} injection points (see below) are responsible for using sigma_a_idx[k] to index into the compacted vector.
// theta_ref_k is now data ... no hyperparameters ... are sampled
// in the conditional regime.
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;
array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;-
sigma_a_k: compacted per-slot additive scales. Length$n_{\sigma_a} \le K$ . Each entry$\sigma_{a,k} > 0$ . -
a_raw: flat-packed standard-normal-scale additive coefficients. Length$\sum_k J_{a,\text{free}}[k]$ . The non-centered parametrization is$a_{k,j} = \sigma_{a,k} \cdot a_{\text{raw},k,j}$ (realized in{{TP_A_BLOCK}}).
-
sigma_b_k: per-slot multiplicative scales. Length$K \cdot \mathbb{1}[\text{any\_use\_b}]$ , i.e.$K$ when any slot usesb(),$0$ otherwise. TheK * any_use_bidiom makes the array vanish entirely when the multiplicative component is globally off. -
c_b_k_raw: flat-packed standard-normal-scale coefficients on the$c_b$ scale (not the$b$ scale). Length$\sum_k J_{b,\text{free}}[k]$ .
The reference-anchoring reparametrization is:
The linear predictor uses
-
sigma_W: single scalar$\sigma_W > 0$ (length 1 when active, 0 otherwise). -
W_raw:$\text{dim\_W} \times d$ matrix of raw W coefficients. When W is off, both dimensions are$0$ .
-
sigma_y_pop: Gaussian$\sigma_y$ entries for slots whose dispersion scope is population. Lengthsigma_y_size. -
phi_pop: NB$\phi$ entries for slots whose dispersion scope is population. Lengthphi_size.
Slots whose dispersion is K-individual (modeled via their own AMM) do not appear here; their dispersion is carried in eta_k[i, k] for the relevant slot.
The comment block makes explicit that in the conditional regime:
-
theta_ref_kis data (passed in astheta_ref_k_data). -
mu_theta_ref_kandsigma_theta_ref_kare not sampled — they were devices for the marginal/FB regimes only.
This is the defining property of the conditional template: the reference is fixed, and only the deviation components (
array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;-
a_coef_k[k]: length-$J_{a,\max}$ vector of additive coefficients for slot$k$ (padded with zeros beyond$J_{a,\text{per}\,k}[k]$ ). -
c_b_k[k]: length-$J_{b,\max}$ vector of multiplicative coefficients on the$c_b$ scale. -
b_coef_k[k]: length-$J_{b,\max}$ vector of multiplicative coefficients on the$b$ scale (derived; zero in per-group-anchor regime). -
eta_k:$n \times K$ linear predictor matrix. Entry$\eta_k[i,k]$ is the linear predictor for observation$i$ , slot$k$ .
for (k in 1:K) {
a_coef_k[k] = rep_vector(0, J_a_max);
c_b_k[k] = rep_vector(0, J_b_max);
b_coef_k[k] = rep_vector(0, J_b_max);
}All coefficient vectors are zero-initialized. This ensures that slots/indices that are never written to (inactive slots, or padding indices eta_k.
{{TP_A_BLOCK}}This is a template injection point. Based on the surrounding context (the compaction logic in transformed data and the prior block {{MODEL_A_BLOCK}}), the injected code reconstructs a_coef_k[k] from a_raw and sigma_a_k using the offsets and sigma_a_idx. The canonical form is, for each active slot
The sigma_a_idx indirection routes each slot to its compacted scale entry. Inactive slots (a_coef_k[k] at zero.
if (any_use_b == 1) {
for (k in 1:K) {
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
}
if (use_groups == 0 && theta_ref_k_data[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k_data[1][k];
}
}
}
}
}Outer guard. The entire block is skipped if any_use_b == 0 (no slot uses the multiplicative component).
Per-slot guard. For slot use_b_k[k] == 1 and
Note: sigma_b_k[k] is indexed directly by sigma_b_k has length sigma_b because the K * any_use_b gate already eliminates the entire array when no slot uses b(), and within-active-slot zero counts are handled by the J_b_free[k] loop bound).
b_coef_k[k] = c_b_k[k] / theta_ref_k[1][k]is reported as a derived quantity only in the single-anchor regime. With per-group anchorsb_coef_kstays at zero; downstream callers derive it per group fromc_b_kandtheta_ref_k[g][k].
The guard use_groups == 0 && theta_ref_k_data[1][k] != 0 ensures:
- Single-anchor regime (no per-group anchors).
- The reference is nonzero (avoiding division by zero).
When the guard passes, for J_b_per_k, not J_b_free, but since the guard requires J_b_per_k[k] > 0 and use_b_k[k] == 1, we have
where theta_ref_k_data[1][k] — the reference for group 1 (the single anchor). This is the inverse of the reference-anchoring reparametrization
for (i in 1:n) {
int g_i = group_id[i];
for (k in 1:K) {
real theta_ref_ik = theta_ref_k_data[g_i][k];
real eta_ik = theta_ref_ik;
if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]),
a_coef_k[k][1:J_a_per_k[k]]
);
}
if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]),
c_b_k[k][1:J_b_per_k[k]]
);
}
if (use_W == 1 && dim_W > 0 && d > 0) {
// ... W modulation ...
}
eta_k[i, k] = eta_ik;
}
}For each observation
Step 1: Group lookup. g_i = group_id[i] determines which group's reference applies to observation
Step 2: Reference anchor.
This is the anchor point of the AMM decomposition: all deviations are additive perturbations on top of
Step 3: Additive deviation. If slot a() coefficient:
The design matrix slice Z_a_k[k][i, 1:J_a_per_k[k]] is cast to a vector and dotted with the coefficient slice. This is the additive smooth term: a linear deviation from the reference, unconstrained in sign.
Step 4: Multiplicative deviation (on
This is the reference-anchored multiplicative term. The key identity: because
Step 5: W modulation. Guarded by use_W == 1 && dim_W > 0 && d > 0:
vector[d] W_diff_x_k = rep_vector(0, d);
vector[dim_W] basis_diff_k = apply_W_basis_diff(
W_type_id, theta_ref_ik, theta_anchor_K[k], dim_W,
W_degree, W_n_knots_full, W_knots_full
);
for (jj_local in 1:dim_W) {
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));The W component is a basis-function difference modulated by covariates:
-
basis_diff_k = apply_W_basis_diff(W_type_id, theta_ref_ik, theta_anchor_K[k], ...): computes the difference of the W basis evaluated at the current reference$\theta_{\text{ref},i,k}$ minus the basis at the anchor$\theta_{\text{anchor},K}[k]$ . ForW_type_id == 1(polynomial) orW_type_id == 2(B-spline), this is a vector of lengthdim_W:
where
- The double loop computes
$W_{\text{diff},x,k} = \Delta_k^\top W_{\text{raw}}$ (a length-$d$ vector), where$W_{\text{raw}} \in \mathbb{R}^{\text{dim\_W} \times d}$ . The{{W_SCALE}}injection point multiplies each entry by$\sigma_W$ (or a function thereof), implementing the non-centered parametrization:
- The final contribution is the dot product with the covariate vector
$X_i$ :
The inline comment notes that for W_raw has dim_W rows (not dim_W * p as in the multivariate amm_distrib_multi.stan template). The basis dispatch via apply_W_basis_diff() supports W_type_id
Full AMM decomposition. Combining all terms:
where
// No prior on theta_ref_k / mu_theta_ref_k / sigma_theta_ref_k:
// theta_ref_k is data in the conditional template. xi components
// (a, b, W, sigma_*, phi) retain their canonical priors.Explicit statement: only the deviation components receive priors. The reference is fixed data.
if (any_use_a == 1) {
sigma_a_k ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}-
sigma_a_k ~ {{PRIOR_SIGMA_A}}: the per-slot additive scales receive a template-injected prior (typically half-normal, half-Cauchy, or exponential). The prior is on the compacted vector of length$n_{\sigma_a}$ . -
{{MODEL_A_BLOCK}}: injected prior ona_raw. The canonical non-centered form isa_raw ~ normal(0, 1), giving$a_{k,j} \sim \text{Normal}(0, \sigma_{a,k}^2)$ marginally. This block may also contain slot-specific prior modifications.
The entire block is gated by any_use_a == 1, so no prior is evaluated when the additive component is globally off.
if (any_use_b == 1) {
sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_k_raw ~ normal(0, 1);
}-
sigma_b_k ~ {{PRIOR_SIGMA_B}}: template-injected prior on the$K$ multiplicative scales. -
c_b_k_raw ~ normal(0, 1): hardcoded standard normal on the raw$c_b$ coefficients. This is the non-centered parametrization:$c_{b,k,j}^{\text{raw}} \sim \text{Normal}(0,1)$ implies$c_{b,k,j} \sim \text{Normal}(0, \sigma_{b,k}^2)$ .
Note the asymmetry with the additive block: a_raw's prior is injected via {{MODEL_A_BLOCK}} (allowing customization), while c_b_k_raw's prior is hardcoded normal(0, 1).
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}-
sigma_W[1] ~ {{PRIOR_SIGMA_W}}: prior on the single global W scale. -
to_vector(W_raw) ~ {{W_PRIOR}}: template-injected prior on the vectorized W matrix. The canonical form isnormal(0, 1)(non-centered), with the scaling by$\sigma_W$ applied intransformed parametersvia{{W_SCALE}}.
if (sigma_y_size > 0) {
sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
phi_pop ~ {{PRIOR_PHI}};
}Population-scoped dispersion parameters receive template-injected priors, but only if the respective count is positive. These are the dispersion parameters that are not modeled via their own AMM slot (those are captured in eta_k).
The likelihood is dispatched on family_id_k[1] — the family identifier of slot 1. The inline comment explains:
The homogeneous-family assumption (
family_id_kall equal) holds in Unit 3;family_id_k[1]drives the dispatch.
The comment also notes that family_id_k[1] in families.R ensures promotion to per_observation requires the slot name to be eligible).
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}- Slot 1 (location): identity link,
$\mu_i = \eta_{i,1}$ . - Slot 2 (scale): inverse link applied via
apply_inv_link_by_id(inv_link_id_per_slot[2], ...). Under homogeneity (family_id_k[2] == 1),inv_link_id_per_slot[2] == 2→exp, giving$\sigma_i = e^{\eta_{i,2}} > 0$ (canonical log link for Gaussian$\sigma$ ). Under heterogeneity (e.g. Beta in slot 2), the link follows the heterogeneous family's slot-1 inverse link (e.g.inv_logitfor Beta, giving$\sigma \in (0,1)$ ).
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}- Slot 1 (mean): log link,
$\mu_i = e^{\eta_{i,1}}$ . - Slot 2 (dispersion): inverse link applied. Under homogeneity,
$\phi_i = e^{\eta_{i,2}}$ (log link), reproducing the legacyneg_binomial_2_log(eta_mu, exp(eta_phi))idiom.
The NB2 parametrization has
for (i in 1:n) {
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}Canonical mean-precision parametrization: beta_proportion(mu, phi) has density proportional to
for (i in 1:n) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}Mean-shape parametrization: Stan's gamma(alpha, beta) has mean
for (i in 1:n) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}Canonical
for (i in 1:n) {
y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}Canonical
- Slot 1: location, identity link.
- Slot 2:
$\log\sigma$ →$\sigma = e^{\eta_{i,2}}$ . - Slot 3:
$\log\nu$ →$\nu = e^{\eta_{i,3}}$ (positive real degrees of freedom).
Note: no apply_inv_link_by_id dispatch here — the links are hardcoded as exp for slots 2 and 3, identity for slot 1.
for (i in 1:n) {
y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}Canonical
- Slot 1:
$\log\mu$ (log link). - Slot 2:
$\log\phi$ (log link). - Slot 3:
$p$ on identity link, structurally bounded to$(1.01, 1.99)$ by a slot-specific prior emitted R-side (decision E7 of D5). The bounds ensure the Tweedie is in the compound-Poisson range ($1 < p < 2$ ).
The custom tweedie_lpdf (defined in section 1) dispatches Dunn–Smyth series or saddlepoint depending on
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(0 | eta_mu_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i);
}
}Canonical
- Slot 1:
$\eta_\mu = \log\mu$ (log link, consumed bypoisson_log_lpmf). - Slot 2:
$\eta_\pi = \text{logit}\,\pi$ (logit link, consumed bybernoulli_logit_lpmf).
log_sum_exp of the two components:
- Structural zero:
bernoulli_logit_lpmf(1 | eta_pi)$= \log\pi$ . - Sampling zero:
bernoulli_logit_lpmf(0 | eta_pi) + poisson_log_lpmf(0 | eta_mu)$= \log(1-\pi) + (-\mu)$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
}
}Canonical
- Slot 1:
$\eta_\mu = \log\mu$ . - Slot 2:
$\eta_\phi = \log\phi$ →$\phi = e^{\eta_{i,2}}$ (hardcodedexp, noapply_inv_link_by_id). - Slot 3:
$\eta_\pi = \text{logit}\,\pi$ .
Structurally identical to ZIP but with neg_binomial_2_log_lpmf replacing poisson_log_lpmf.
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
}Canonical
Distinct from ZIP: the zero is a structural decision (Bernoulli hurdle), and the positive branch is a Poisson truncated at one.
bernoulli_logit_lpmf(1 | eta_pi).
log1m_exp(-exp(eta_mu_i)):
-
exp(eta_mu_i)$= \mu$ . -
-exp(eta_mu_i)$= -\mu$ . -
exp(-mu)$= e^{-\mu} = \text{Poisson}(0 \mid \mu)$ . -
log1m_exp(-mu)$= \log(1 - e^{-\mu})$ .
for (i in 1:n) {
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
target += bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
target += bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
}
}Canonical
Symmetric to Hurdle-Poisson with NB2 in the positive branch. The truncation denominator is
-
neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)$= \log\text{NegBinomial2}(0 \mid \mu, \phi)$ . -
log1m_exp(...)$= \log(1 - \text{NegBinomial2}(0 \mid \mu, \phi))$ .
The trailing comment:
// family_id_k[1] in {2 (poisson), 4 (bernoulli)}: these are
// K = 1 families (single eligible param_spec); the dispatcher
// never routes them here. The guard in R/families.R ensures
// promotion to per_observation requires the slot name to be
// eligible.Poisson (family_id_k[1] == 2) and Bernoulli (family_id_k[1] == 4) are families.R prevents them from being promoted to per_observation scope with
The conditional template realizes the decomposition:
Key properties:
-
Reference is data.
$\theta_{\text{ref},i,k}$ is fixed (passed astheta_ref_k_data), not sampled. This is the defining feature of the conditional regime. -
Additive deviation is unconstrained in sign and scale (modulated by
$\sigma_{a,k}$ ). -
Multiplicative deviation is parametrized on the
$c_b$ scale, where$c_b = \theta_{\text{ref}} \cdot b$ . The linear predictor uses$c_b$ directly, so the model is identified in$c_b$ -space. The$b$ -scale recovery is a derived quantity (single-anchor only). -
W modulation is anchored at
$\theta_{\text{anchor},k}$ : the basis difference$\Delta_k = B(\theta_{\text{ref}}) - B(\theta_{\text{anchor}})$ ensures the modulation vanishes when$\theta_{\text{ref}} = \theta_{\text{anchor}}$ . -
Non-centered parametrization throughout:
$a = \sigma_a \odot a^{\text{raw}},\ c_b = \sigma_b \odot c_b^{\text{raw}},\ W = \sigma_W \cdot W^{\text{raw}}$ , with$a^{\text{raw}}, c_b^{\text{raw}}, W^{\text{raw}} \sim \text{Normal}(0, 1)$ . -
Compaction eliminates non-identified scale parameters for slots with no free coefficients (Option A, D96), with bit-identical behavior when all slots are active.
This section closes the preceding model block (the lone } on the first line) and then supplies the entire generated quantities block. There are no functions, data, transformed data, parameters, or transformed parameters blocks in this section; the analysis below is therefore restricted to the generated quantities block, with cross-references to identifiers that must have been declared in earlier sections (sections 1–2).
}This } terminates the immediately preceding block (the model block from section 2). It is not part of generated quantities; it appears here only because section 2 was truncated mid-block.
generated quantities {Opens the generated quantities program block. In Stan, code in this block is executed once per HMC/NUTS leapfrog trajectory after the Metropolis accept/reject step, using the accepted parameter draw. Quantities declared here are stored in the posterior and are available for posterior-predictive checks, WAIC/LOO via log_lik, and so on. They do not affect the joint density used for sampling (no target += accumulation here).
// Per-observation log-likelihood, fitted linear predictors per
// slot, and posterior-predictive draws. The shape mirrors the
// K = 1 + p >= 1 template up to the swap of p with K and the
// distributional-regression likelihood selection.
vector[n] log_lik;
matrix[n, K] theta_i_k = eta_k;
vector[n] y_pred;Three quantities are declared:
-
vector[n] log_lik;— A length-$n$ vector that will hold the pointwise log-likelihood$\log p(y_i \mid \eta_{i,\cdot})$ for each observation$i = 1, \ldots, n$ . This is the canonical interface forloo/psis-loo/waicin the R ecosystem: each entry is$\log p(y_i \mid \theta_i)$ evaluated at the current posterior draw. -
matrix[n, K] theta_i_k = eta_k;— An$n \times K$ matrix that is aliased (copied by value at construction) frometa_k. The comment makes the AMM reference-anchoring role explicit: in the$K = 1 + p$ template (one mean slot plus$p$ distributional-parameter slots), the per-observation predictor matrix is conventionally denoted$\theta_{i,k}$ . Here, with$K \geq 1$ slots, the same object is simplyeta_k, andtheta_i_kis exposed under the canonical name so downstream R code that expectstheta_i_kworks uniformly across the$K=1+p$ and the general-$K$ parametrizations. Mathematically,$\theta_{i,k} \;=\; \eta_{i,k}, \qquad i = 1,\ldots,n,\; k = 1,\ldots,K.$ No transformation is applied; this is a pure renaming for API compatibility with the reference-anchoring decomposition in which$\eta_{i,k} = x_i^\top \beta_k$ (or, in the AMM anchored form,$\eta_{i,k} = x_i^\top \beta_k + \sum_{r} \lambda_{k,r}\, z_{i,r}$ with anchoring constraints absorbed into the prior on$\beta_k$ — see sections 1–2). -
vector[n] y_pred;— A length-$n$ vector that will hold one posterior-predictive draw$y_i^{\text{rep}} \sim p(y_i \mid \eta_{i,\cdot})$ per observation, using the_rngcounterparts of the same family dispatched in themodelblock.
for (i in 1:n) {Iterates over observations family_id_k[1], i.e. the first element of the integer family-id vector. The AMM convention is that all family_id_k[1]); the remaining elements of family_id_k (if any) are not consulted in this block. The dispatch is a long if / else if chain.
if (family_id_k[1] == 1) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
}-
Slot 1 (
eta_k[i, 1]) is the mean$\mu_i$ on the identity link:$\mu_i = \eta_{i,1}$ . -
Slot 2 (
eta_k[i, 2]) is the standard deviation$\sigma_i$ , mapped through the user-selected inverse link$g_2^{-1}$ identified byinv_link_id_per_slot[2]:$\sigma_i \;=\; g_2^{-1}(\eta_{i,2}).$ Typical choices:exp(log-link on$\sigma$ ) orsoftplus/squareetc., depending on the integer id resolved byapply_inv_link_by_id. -
Likelihood:
$\log p(y_i \mid \mu_i, \sigma_i) \;=\; -\tfrac{1}{2}\log(2\pi) - \log\sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2},$ implemented bynormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i). The response is the real-valued outcome vectory_real. -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathcal{N}(\mu_i, \sigma_i)$ vianormal_rng.
In AMM reference-anchoring terms, eta_k in transformed parameters (sections 1–2).
} else if (family_id_k[1] == 3) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
y_pred[i] = neg_binomial_2_rng(mu_i, phi_i);
}-
Slot 1: mean on the log link,
$\mu_i = \exp(\eta_{i,1})$ . -
Slot 2: precision/reciprocal-dispersion
$\phi_i = g_2^{-1}(\eta_{i,2})$ via the user-selected inverse link. The NB2 variance function is$\mathrm{Var}(Y_i) = \mu_i + \mu_i^2/\phi_i$ . -
Likelihood (integer response
y_int):$\log p(y_i \mid \mu_i, \phi_i) \;=\; \log\Gamma(y_i + \phi_i) - \log\Gamma(\phi_i) - \log\Gamma(y_i + 1) + \phi_i \log\!\left(\frac{\phi_i}{\mu_i + \phi_i}\right) + y_i \log\!\left(\frac{\mu_i}{\mu_i + \phi_i}\right).$ -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathrm{NB}_2(\mu_i, \phi_i)$ vianeg_binomial_2_rng.
Note the family-id gap: family_id_k[1] == 2 is absent from this dispatch, implying that id 2 is reserved for a family not handled in this template (likely Bernoulli/Binomial handled elsewhere or omitted).
} else if (family_id_k[1] == 5) {
real mu_i = inv_logit(eta_k[i, 1]);
real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
y_pred[i] = beta_proportion_rng(mu_i, phi_i);
}-
Slot 1: mean
$\mu_i = \mathrm{logit}^{-1}(\eta_{i,1}) \in (0,1)$ . -
Slot 2: precision
$\phi_i = g_2^{-1}(\eta_{i,2}) > 0$ . -
Likelihood (real response in
$(0,1)$ ): the Beta parametrized by mean and precision,$p(y_i \mid \mu_i, \phi_i) \;=\; \frac{\Gamma(\phi_i)}{\Gamma(\mu_i \phi_i)\,\Gamma((1-\mu_i)\phi_i)}\, y_i^{\mu_i\phi_i - 1}(1-y_i)^{(1-\mu_i)\phi_i - 1},$ with$\mathrm{Var}(Y_i) = \mu_i(1-\mu_i)/(\phi_i + 1)$ . -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathrm{BetaProportion}(\mu_i, \phi_i)$ .
Family id 4 is skipped (reserved).
} else if (family_id_k[1] == 6) {
real mu_i = exp(eta_k[i, 1]);
real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
y_pred[i] = gamma_rng(shape_i, shape_i / mu_i);
}-
Slot 1: mean
$\mu_i = \exp(\eta_{i,1})$ (log link). -
Slot 2: shape
$\alpha_i = g_2^{-1}(\eta_{i,2})$ . - The Stan
gamma_lpdf(y | alpha, beta)uses shape–rate parametrization, so the rate is recovered from the mean via$\beta_i = \alpha_i / \mu_i$ (since$\mathbb{E}[Y] = \alpha/\beta$ ). Hence:$\log p(y_i \mid \alpha_i, \beta_i) \;=\; \alpha_i \log\beta_i - \log\Gamma(\alpha_i) + (\alpha_i - 1)\log y_i - \beta_i y_i, \qquad \beta_i = \alpha_i/\mu_i.$ -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathrm{Gamma}(\alpha_i, \alpha_i/\mu_i)$ viagamma_rng(shape_i, shape_i / mu_i).
} else if (family_id_k[1] == 7) {
real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
eta_k[i, 2]);
log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
y_pred[i] = lognormal_rng(eta_k[i, 1], sigma_i);
}-
Slot 1:
$\log$ -scale mean$\mu_i^{\log} = \eta_{i,1}$ (identity link on the log scale). -
Slot 2:
$\log$ -scale SD$\sigma_i = g_2^{-1}(\eta_{i,2})$ . -
Likelihood:
$\log p(y_i \mid \eta_{i,1}, \sigma_i) \;=\; -\log y_i - \tfrac{1}{2}\log(2\pi) - \log\sigma_i - \frac{(\log y_i - \eta_{i,1})^2}{2\sigma_i^2}.$ -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathrm{Lognormal}(\eta_{i,1}, \sigma_i)$ .
} else if (family_id_k[1] == 8) {
real nu_i = exp(eta_k[i, 3]);
real mu_i = eta_k[i, 1];
real sigma_i = exp(eta_k[i, 2]);
log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
}-
Slot 1: location
$\mu_i = \eta_{i,1}$ (identity). -
Slot 2: scale
$\sigma_i = \exp(\eta_{i,2})$ (log link, hard-coded — not routed throughapply_inv_link_by_id). -
Slot 3: degrees of freedom
$\nu_i = \exp(\eta_{i,3})$ (log link, hard-coded). -
Likelihood:
$\log p(y_i \mid \nu_i, \mu_i, \sigma_i) \;=\; \log\Gamma\!\left(\frac{\nu_i + 1}{2}\right) - \log\Gamma\!\left(\frac{\nu_i}{2}\right) - \tfrac{1}{2}\log(\nu_i\pi) - \log\sigma_i - \frac{\nu_i + 1}{2}\log\!\left(1 + \frac{1}{\nu_i}\left(\frac{y_i - \mu_i}{\sigma_i}\right)^2\right).$ -
Posterior predictive:
$y_i^{\text{rep}} \sim t_{\nu_i}(\mu_i, \sigma_i)$ .
Note the asymmetry with families 1, 3, 5, 6, 7: here both non-location slots use exp directly rather than apply_inv_link_by_id. This is a hard-coded convention of the template for the Student-
} else if (family_id_k[1] == 9) {
real mu_i = exp(eta_k[i, 1]);
real phi_i = exp(eta_k[i, 2]);
real p_i = eta_k[i, 3];
log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
}-
Slot 1: mean
$\mu_i = \exp(\eta_{i,1})$ (log link). -
Slot 2: dispersion
$\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded). -
Slot 3: power parameter
$p_i = \eta_{i,3}$ (identity link, unconstrained in the linear predictor — the validity domain$p \in (1,2)$ is not enforced here; it must be enforced upstream, e.g. by a prior or by a constrained inverse link intransformed parameters). -
Likelihood: compound Poisson–gamma Tweedie density,
$p(y_i \mid \mu_i, \phi_i, p_i) \;=\; \text{Tweedie}_{p_i}(\mu_i, \phi_i),$ evaluated viatweedie_lpdf(a user-defined orbrms-style implementation exposed to Stan; the evaluation uses the series expansion of Dunn & Smyth 2005). -
Posterior predictive:
$y_i^{\text{rep}} \sim \mathrm{Tweedie}(\mu_i, \phi_i, p_i)$ viatweedie_rng.
} else if (family_id_k[1] == 10) {
// ZIP K = 2: log_lik mirrors the model-block dispatch; y_pred
// draws the structural zero indicator first and then a Poisson
// sample for the non-zero branch (Lambert 1992).
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(0 | eta_mu_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = poisson_log_rng(eta_mu_i);
}
}-
Slot 1: Poisson mean on the log link,
$\log\mu_i = \eta_{i,1}$ , i.e.$\mu_i = \exp(\eta_{i,1})$ . The code usespoisson_log_lpmf, which takes the log-mean directly, so no explicitexpis needed. -
Slot 2: zero-inflation probability on the logit link,
$\pi_i = \mathrm{logit}^{-1}(\eta_{i,2})$ , viabernoulli_logit_lpmf/bernoulli_logit_rngwhich take the logit directly.
Likelihood (Lambert 1992). Let
For log_sum_exp:
For
Posterior predictive (Lambert 1992 forward simulation): first draw poisson_log_rng.
} else if (family_id_k[1] == 11) {
// ZINB K = 3: symmetric to ZIP with neg_binomial_2_log.
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = log_sum_exp(
bernoulli_logit_lpmf(1 | eta_pi_i),
bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i);
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
}
}-
Slot 1: NB2 mean on the log link,
$\log\mu_i = \eta_{i,1}$ (consumed byneg_binomial_2_log_lpmf, which takes the log-mean directly). -
Slot 2: NB2 precision
$\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded). -
Slot 3: zero-inflation logit
$\eta_{i,3}$ , with$\pi_i = \mathrm{logit}^{-1}(\eta_{i,3})$ .
Likelihood. With
For
For
Posterior predictive: draw neg_binomial_2_log_rng.
} else if (family_id_k[1] == 12) {
// Hurdle-Poisson K = 2: log_lik mirrors the model-block
// dispatch. y_pred uses rejection sampling on Poisson(mu) to
// realize the truncated-at-one positive branch (Mullahy 1986);
// bounded at 10000 iterations for numerical safety when mu is
// pathologically small (in which case the family is borderline
// identifiable in the first place).
real eta_mu_i = eta_k[i, 1];
real eta_pi_i = eta_k[i, 2];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ poisson_log_lpmf(y_int[i] | eta_mu_i)
- log1m_exp(-exp(eta_mu_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = poisson_log_rng(eta_mu_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}-
Slot 1: Poisson mean on the log link,
$\log\mu_i = \eta_{i,1}$ . -
Slot 2: hurdle logit
$\eta_{i,2}$ , with$\pi_i = \mathrm{logit}^{-1}(\eta_{i,2}) = \Pr(Y_i = 0)$ (the hurdle-at-zero probability, not the zero-inflation probability).
Likelihood (Mullahy 1986). The hurdle model factorizes as
For
For log1m_exp(-exp(eta_mu_i)) for numerical stability (computing
Posterior predictive. Draw the hurdle indicator iter reaches 10000. The 10000-iteration cap is a numerical safeguard: when y_draw == 0, y_pred[i] is set to 0 — a degenerate fallback.
} else if (family_id_k[1] == 13) {
// Hurdle-NB K = 3: symmetric to Hurdle-Poisson with
// neg_binomial_2_log on the truncated positive branch.
real eta_mu_i = eta_k[i, 1];
real phi_i = exp(eta_k[i, 2]);
real eta_pi_i = eta_k[i, 3];
if (y_int[i] == 0) {
log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
} else {
log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
+ neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
phi_i)
- log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
phi_i));
}
if (bernoulli_logit_rng(eta_pi_i) == 1) {
y_pred[i] = 0;
} else {
int y_draw = 0;
int iter = 0;
while (y_draw == 0 && iter < 10000) {
y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
iter = iter + 1;
}
y_pred[i] = y_draw;
}
}-
Slot 1: NB2 mean on the log link,
$\log\mu_i = \eta_{i,1}$ . -
Slot 2: NB2 precision
$\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded). -
Slot 3: hurdle logit
$\eta_{i,3}$ , with$\pi_i = \mathrm{logit}^{-1}(\eta_{i,3}) = \Pr(Y_i = 0)$ .
Likelihood. The NB2 probability of zero is
neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i). The hurdle factorization is
For
For
Posterior predictive: draw neg_binomial_2_log_rng(eta_mu_i, phi_i) until a positive draw is obtained or 10000 iterations elapse. Same numerical-safety cap and degenerate fallback as family 12.
} else {
log_lik[i] = negative_infinity();
y_pred[i] = 0;
}
}
}If family_id_k[1] matches none of
-
log_lik[i] = negative_infinity()— i.e.$\log p(y_i \mid \cdot) = -\infty$ , rendering the draw effectively impossible under any model that consumeslog_likfor model comparison. This is a defensive sentinel rather than a graceful degradation. -
y_pred[i] = 0— a placeholder posterior-predictive draw.
The loop over i and the generated quantities block are then closed.
The generated quantities block is consumption-only with respect to the AMM reference-anchoring decomposition: it does not re-express the anchoring constraints, the reference-contrast prior structure, or the slot assembly. All of that was performed in transformed parameters (sections 1–2), which produced the eta_k whose
subject to whatever anchoring identification constraints (e.g.
-
Aliases
eta_kastheta_i_kto maintain API parity with the$K = 1 + p$ template — the only place where the "swap of$p$ with$K$ " mentioned in the comment is materially realized. -
Dispatches on
family_id_k[1]to evaluate the pointwise log-likelihood$\log p(y_i \mid \eta_{i,\cdot})$ using the appropriate inverse-link machinery (apply_inv_link_by_idfor slot 2 in most two-slot families; hard-codedexp/inv_logitfor the location-scale and Tweedie families). -
Generates one posterior-predictive draw
$y_i^{\text{rep}}$ per observation from the same family, using the_rngcounterparts and (for the hurdle families) rejection-sampling truncation with a 10000-iteration safety cap.
No Jacobian terms appear in this block: all inverse-link transformations here are forward maps from linear predictors to natural parameters, evaluated for the purpose of likelihood evaluation and posterior-predictive simulation. Any Jacobian adjustments required by the anchoring decomposition (e.g. for constrained reparametrizations of model block (section 2) via target += statements that are not visible in this section.
This file implements the conditional Empirical Bayes (EB) path for the Anchored Modulating Model (AMM). It differs from the marginal EB template in one structural decision: the group-level reference parameters parameters into data (as theta_ref_data), having been pre-estimated via Laplace approximation in a prior step. All remaining components—additive basis
The header comment documents:
- Path: Empirical Bayes, Conditional (Path A), sub-phase 8.6.C, decision D34.
-
Structural difference:
theta_refis data, not a parameter; hyperparametersmu_theta_ref/sigma_theta_refand their priors are absent. -
Reference framework: For individual
$i$ , group$g[i]$ , coordinate$k$ :
-
Likelihood factorization:
$p(\mathbf{y}_i \mid \theta_i) = \prod_{k=1}^p D(y_{ik} \mid \theta_i[k])$ , where$D$ is selected byfamily_id(homogeneous across all$p$ coordinates). - Ragged design: Padded design matrices with per-coordinate column counts; flat-packed coefficient vectors with offset arrays.
-
Reparametrization of
$\mathbf{b}$ : The model samples$\mathbf{c}_b = \theta_{\text{ref}} \odot \mathbf{b}$ directly (NCP), eliminating bimodality in$(\theta_{\text{ref}}, \mathbf{b})$ . -
Centering: Column-wise centering of
$Z_a[k]$ and$Z_b[k]$ is enforced R-side, so conditions (C2) and (C3) hold identically without further Stan-side constraints. -
Anchoring of
$\mathbf{W}$ : The reparametrization$W(\theta_{\text{ref}}) - W(\theta_{\text{anchor}})$ satisfies (C4).
functions {
// {{CANONICAL_HELPERS}}
}The placeholder {{CANONICAL_HELPERS}} is resolved at code-generation time by R/stan_codegen.R. It injects the shared helper functions used by all AMM templates, most critically:
-
apply_W_basis_diff(W_type_id, theta_ref_ik, theta_anchor_k, W_per_k_dim, W_degree, W_n_knots_full, W_knots_full): Returns avector[W_per_k_dim]representing the difference of basis evaluations:
where W_type_id == 1, B-spline with Cox–de Boor recursion when W_type_id == 2). When W_type_id == 0 (W off), this function is not called. The Cox–de Boor recursion re-evaluates the basis at both
Additional canonical helpers (from the shared library) include utilities for offset computation, padded dot-product helpers, and family-specific link/inverse-link functions as needed.
This block declares all fixed inputs. Each declaration is explained below.
int<lower=1> n;Number of observations. Must be
int<lower=1> p;Number of outcome coordinates (multivariate dimension). Must be
int<lower=1, upper=4> family_id;Selects the response family, homogeneous across all
- 1 = Gaussian (identity link)
- 2 = Poisson (log link)
- 3 = Negative binomial 2 (log link, Stan's mean–dispersion parameterization)
- 4 = Bernoulli (logit link)
int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;Binary toggles for the additive (
int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[p] int<lower=0> J_a_per_k;
array[p] int<lower=0> J_b_per_k;-
J_a_max:$\max_k J_{a,k}$ , the maximum number of additive basis columns across coordinates. Used to dimension padded arrays. -
J_b_max:$\max_k J_{b,k}$ , the maximum number of multiplicative basis columns. -
J_a_per_k[k]: Actual number of additive basis columns for coordinate$k$ . Zero whenuse_a == 0or coordinate$k$ has no additive term. -
J_b_per_k[k]: Analogous for the multiplicative basis.
array[p] matrix[n, J_a_max] Z_a;
array[p] matrix[n, J_b_max] Z_b;For each coordinate
-
$Z_a[k]$ is an$n \times J_{a,\max}$ matrix. Only columns$1 \ldots J_{a,k}$ are referenced; remaining columns are padding. -
$Z_b[k]$ is an$n \times J_{b,\max}$ matrix. Only columns$1 \ldots J_{b,k}$ are referenced.
These matrices are column-wise centered R-side (enforced in amm_spec_multi_designs()), ensuring conditions (C2):
int<lower=0> dim_W;Total dimension of the use_W == 0, this may be 0.
int<lower=0> d;Dimension of the covariate vector use_W == 0, this may be 0.
int<lower=0> W_per_k_dim;Number of basis functions per coordinate for separable
matrix[n, d] X;The
matrix[n, p] y_real;
array[n, p] int y_int;Both outcome storage blocks are allocated; only the one matching family_id is consumed:
-
y_real[:, k]is used whenfamily_id == 1(Gaussian). -
y_int[:, k]is used whenfamily_id ∈ {2, 3, 4}(Poisson, negative binomial, Bernoulli).
vector[p] theta_anchor;The fixed anchor point
int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;-
use_dispersion_y: When 1, sample observation-level dispersion$\sigma_y$ (used by Gaussian family). -
use_dispersion_phi: When 1, sample the dispersion parameter$\phi$ (used by negative binomial family).
int<lower=0, upper=2> W_type_id;Selects the
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;B-spline knot vector and polynomial degree, consumed by apply_W_basis_diff() when W_type_id == 2. For polynomial bases (W_type_id == 1), W_degree determines the polynomial order.
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups: 0 = single global anchor (backward-compatible), 1 = per-group anchors. -
J_groups: Number of groups. Whenuse_groups == 0,J_groups == 1. -
group_id[i]: Group assignment for observation$i$ . Whenuse_groups == 0,group_id[i] == 1for all$i$ .
{{DATA_CP_A_PER_K_DECL}}Resolved at code-generation time. Declares any additional data needed for the configurable centered/non-centered parametrization of
int<lower=1> K_slots;
int<lower=1> p_dim;Under Path A (EB Conditional, K_slots == 1 and p_dim == p. These are declared for template uniformity; the body does not branch on them.
array[J_groups] vector[p] theta_ref_data;The key structural difference from the marginal template. This is the pre-estimated reference parameter, gdpar_eb_maximize_marginal() via cmdstanr::laplace() applied to the marginal EB template. It is an array of J_groups vectors of length
When use_groups == 0, only theta_ref_data[1] is populated and used.
This block pre-computes offsets and sizes used for flat-packing.
array[p] int J_a_free;
array[p] int J_b_free;For each coordinate
-
J_a_free[k]=J_a_per_k[k]whenuse_a == 1andJ_a_per_k[k] > 0; otherwise 0. This is the number of free additive coefficients for coordinate$k$ . -
J_b_free[k]is analogous for the multiplicative component.
array[p] int a_raw_offset;
array[p] int c_b_raw_offset;Cumulative offsets into the flat-packed coefficient vectors:
-
a_raw_offset[k]=$\sum_{k' < k} J_{a,\text{free}}[k']$ : the starting index (0-based offset) ina_rawfor coordinate$k$ 's additive coefficients. -
c_b_raw_offset[k]=$\sum_{k' < k} J_{b,\text{free}}[k']$ : analogous forc_b_raw.
int total_J_a_free = 0;
int total_J_b_free = 0;Running totals, accumulated in the loop below. Final values:
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;Effective dimensions for use_W == 0, both collapse to 0, producing zero-sized parameter arrays.
int sigma_a_size = (use_a == 1) ? p : 0;
int sigma_b_size = (use_b == 1) ? p : 0;
int sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;
int sigma_y_size = (use_dispersion_y == 1) ? p : 0;
int phi_size = (use_dispersion_phi == 1) ? p : 0;Sizes for the scale/dispersion parameter arrays. When a component is inactive, its scale array is zero-sized (not sampled).
for (k in 1:p) {
J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
a_raw_offset[k] = total_J_a_free;
c_b_raw_offset[k] = total_J_b_free;
total_J_a_free += J_a_free[k];
total_J_b_free += J_b_free[k];
}The loop that populates the per-coordinate free counts and offsets. For each coordinate
- Compute
J_a_free[k]andJ_b_free[k]. - Record the current cumulative offset.
- Increment the cumulative totals.
This produces a prefix-sum scheme enabling segment(a_raw, a_raw_offset[k] + 1, J_a_free[k]) to extract coordinate
Notably absent: theta_ref, mu_theta_ref, sigma_theta_ref. These were parameters in the marginal template; here
array[sigma_a_size] real<lower=0> sigma_a;Per-coordinate additive scale parameters use_a == 1). Zero-sized array when inactive.
vector[total_J_a_free] a_raw;Flat-packed raw additive coefficients. In the non-centered parametrization (NCP), {{TP_A_BLOCK}}.
array[sigma_b_size] real<lower=0> sigma_b;Per-coordinate multiplicative scale parameters use_b == 1).
vector[total_J_b_free] c_b_raw;Flat-packed raw coefficients for
array[sigma_W_size] real<lower=0> sigma_W;Global scale for use_W == 1 && dim_W > 0).
matrix[dim_W_eff, d_eff] W_raw;The raw use_W == 1, this is a {{W_SCALE}}). When use_W == 0, this is
array[sigma_y_size] real<lower=0> sigma_y;Per-coordinate observation-level standard deviation family_id == 1). Zero-sized when use_dispersion_y == 0 or family is non-Gaussian.
array[phi_size] real<lower=0> phi;Per-coordinate dispersion parameter family_id == 3). Stan's mean–dispersion parameterization:
This block constructs the coefficient vectors and the linear predictor
array[p] vector[J_a_max] a_coef;
array[p] vector[J_b_max] c_b;
array[p] vector[J_b_max] b_coef;
matrix[n, p] eta;-
a_coef[k]: Additive coefficient vector for coordinate$k$ , length$J_{a,\max}$ . Only entries$1 \ldots J_{a,k}$ are meaningful. -
c_b[k]: Reparametrized multiplicative coefficient vector,$c_{b,k} = \theta_{\text{ref},k} \cdot b_k$ . Length$J_{b,\max}$ . -
b_coef[k]: Derived multiplicative coefficient vector,$b_k = c_{b,k} / \theta_{\text{ref},k}$ . Only computed as a derived quantity in the single-anchor regime. -
eta[i, k]: Linear predictor$\eta_{ik}$ .
for (k in 1:p) {
a_coef[k] = rep_vector(0, J_a_max);
c_b[k] = rep_vector(0, J_b_max);
b_coef[k] = rep_vector(0, J_b_max);
}All coefficient vectors initialized to zero. Unused entries remain zero.
{{TP_A_BLOCK}}Placeholder resolved at code-generation time. This implements the CP/NCP transformation for the additive coefficients. In the canonical NCP form:
and stores the result in a_coef[k][1:J_a_per_k[k]]. The CP form would instead directly assign a_raw segments to a_coef with the prior providing regularization.
if (use_b == 1) {
for (k in 1:p) {
if (J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
}When the multiplicative component is active, for each coordinate
This is the NCP for
if (use_groups == 0 && theta_ref_data[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef[k][j] = c_b[k][j] / theta_ref_data[1][k];
}
}Derived quantity only in the single-anchor regime. Computes:
when use_groups == 0 and use_groups == 1), there is no single canonical scalar divisor; b_coef stays zero and users derive
This reparametrization is motivated by the bimodality documented in Recovery 2 (handoff 4 addendum): the multiplicative term in the original parametrization enters as
for (i in 1:n) {
int g_i = group_id[i];
for (k in 1:p) {
real theta_ref_ik = theta_ref_data[g_i][k];
real eta_ik = theta_ref_ik;For each observation
- Look up the group
$g_i = \texttt{group\_id}[i]$ . - Set the local reference:
$\theta_{\text{ref},ik} = \texttt{theta\_ref\_data}[g_i][k]$ . - Initialize the linear predictor:
$\eta_{ik} \leftarrow \theta_{\text{ref},ik}$ .
This implements the first term of the AMM decomposition: the baseline is the group-level reference parameter.
Additive component:
if (use_a == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
a_coef[k][1:J_a_per_k[k]]
);
}Adds the additive contribution:
Only the first
Multiplicative component:
if (use_b == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
c_b[k][1:J_b_per_k[k]]
);
}Adds the multiplicative contribution, but in the
Since
Modulating component (
if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
vector[d] W_diff_x_k = rep_vector(0, d);
vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
W_degree, W_n_knots_full, W_knots_full
);
int row_start = (k - 1) * W_per_k_dim + 1;When the modulating component is active:
- Initialize
$\mathbf{W}_{\text{diff\_x\_k}} = \mathbf{0}_d$ . - Evaluate the basis difference vector:
via apply_W_basis_diff(). This function dispatches on W_type_id:
-
W_type_id == 1(polynomial): evaluates monomials$1, \theta, \theta^2, \ldots$ at both points and takes differences. -
W_type_id == 2(B-spline): runs Cox–de Boor recursion at both points and takes differences.
- Compute the row offset: coordinate
$k$ 's block in$W_{\text{raw}}$ starts at row$\texttt{row\_start} = (k-1) \times W_{\text{per\_k\_dim}} + 1$ .
for (jj_local in 1:W_per_k_dim) {
int row = row_start + jj_local - 1;
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));The double loop accumulates:
for {{W_SCALE}} placeholder resolves to a scaling factor (typically * sigma_W[1] for the CP form). Then:
In matrix notation, the full modulating contribution for coordinate
where
eta[i, k] = eta_ik;
}
}Store the completed linear predictor. The full linear predictor is:
This block specifies priors and the likelihood.
// No prior on theta_ref / mu_theta_ref / sigma_theta_ref: theta_ref
// is data in the conditional template. xi components retain their
// canonical priors.This is the defining difference from the marginal template. In the marginal EB template,
if (use_a == 1) {
sigma_a ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}When the additive component is active:
-
Scale prior:
$\sigma_{a,k} \sim \texttt{PRIOR\_SIGMA\_A}$ for each$k$ (typically half-normal or half-Cauchy; resolved at code generation). -
Coefficient prior:
{{MODEL_A_BLOCK}}is resolved to place priors ona_raw. In the canonical NCP form, this is:
so that
if (use_b == 1) {
sigma_b ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}When the multiplicative component is active:
-
Scale prior:
$\sigma_{b,k} \sim \texttt{PRIOR\_SIGMA\_B}$ for each$k$ . -
Coefficient prior:
$c_{b,\text{raw}}[j] \sim \mathcal{N}(0, 1)$ for all$j$ , so$c_{b,k}[j] = \sigma_{b,k} \cdot c_{b,\text{raw}}[\ldots]$ has marginal prior$\mathcal{N}(0, \sigma_{b,k}^2)$ .
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}When the modulating component is active:
-
Global scale prior:
$\sigma_W \sim \texttt{PRIOR\_SIGMA\_W}$ (a single scalar). -
Coefficient prior:
$\text{vec}(W_{\text{raw}}) \sim \texttt{W\_PRIOR}$ (typically$\mathcal{N}(0, 1)$ for NCP). With the{{W_SCALE}}scaling, the implied prior on$W$ entries is$\mathcal{N}(0, \sigma_W^2)$ .
if (use_dispersion_y == 1) {
sigma_y ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi ~ {{PRIOR_PHI}};
}-
$\sigma_{y,k}$ : Observation-level scale for Gaussian family. -
$\phi_k$ : Dispersion for negative binomial family.
for (k in 1:p) {
if (family_id == 1) {
y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
} else if (family_id == 2) {
y_int[:, k] ~ poisson_log(eta[:, k]);
} else if (family_id == 3) {
y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
} else if (family_id == 4) {
y_int[:, k] ~ bernoulli_logit(eta[:, k]);
}
}The likelihood factorizes across coordinates. For each coordinate family_id:
-
Gaussian (
family_id == 1):
using Stan's normal(mu, sigma) parameterization. The link function is identity:
-
Poisson (
family_id == 2):
using Stan's poisson_log (log-link). So
-
Negative binomial 2 (
family_id == 3):
using Stan's neg_binomial_2_log(eta, phi) (log-link, mean–dispersion parameterization). Mean
-
Bernoulli (
family_id == 4):
using Stan's bernoulli_logit (logit-link). So
The joint likelihood across all observations and coordinates is:
where
The full model implements the following generative structure. Let
Conditional posterior (what this Stan program computes):
where:
This is the conditional step (Step (iii)) of the EB workflow: given the point estimate
The template is absent a generated quantities block in the provided source (section 1 of 2); it presumably appears in section 2.
generated quantities {
matrix[n, p] log_lik;
matrix[n, p] theta_i = eta;
matrix[n, p] y_pred;
for (i in 1:n) {
for (k in 1:p) {
if (family_id == 1) {
log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
} else if (family_id == 2) {
log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = poisson_log_rng(eta[i, k]);
} else if (family_id == 3) {
log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
} else if (family_id == 4) {
log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
}
}
}
}This block computes quantities derived from the posterior samples after model fitting. It is executed once per iteration of the sampling algorithm.
matrix[n, p] log_lik;
- Declares an
$n \times p$ matrix storing the pointwise log-likelihood for each observation$i \in \{1,\dots,n\}$ and each item$k \in \{1,\dots,p\}$ . - Used for posterior model comparison and diagnostics (e.g., computing WAIC, LOO).
matrix[n, p] theta_i = eta;
- Declares and initializes an
$n \times p$ matrixtheta_ito the linear predictor matrixetafrom thetransformed parametersblock. - In the AMM reference-anchoring decomposition,
etarepresents the latent score or linear predictor on the link scale. By assigning it totheta_i, we preserve the latent trait values for each individual and item for post-processing.
matrix[n, p] y_pred;
- Declares an
$n \times p$ matrix storing posterior predictive samples (replications of the observed data). - Used for posterior predictive checks (PPCs) to assess model fit.
Outer loop: for (i in 1:n) iterates over all
Inner loop: for (k in 1:p) iterates over all
The family_id data variable selects the likelihood family. For each
-
Pointwise Log-Likelihood
$\log p(y_{i,k} \mid \eta_{i,k}, \boldsymbol{\psi}_k)$ -
Posterior Predictive Sample
$\tilde{y}_{i,k} \sim p(y \mid \eta_{i,k}, \boldsymbol{\psi}_k)$
where
Likelihood:
$$
\log p(y_{i,k}^{\text{real}} \mid \eta_{i,k}, \sigma_{y,k}) = -\frac{1}{2}\left(\log(2\pi) + 2\log\sigma_{y,k} + \frac{(y_{i,k}^{\text{real}} - \eta_{i,k})^2}{\sigma_{y,k}^2}\right)
$$
Implementation:
normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]) computes this directly.
Posterior Predictive Distribution:
$$
\tilde{y}{i,k}^{\text{real}} \sim \mathcal{N}(\eta{i,k}, \sigma_{y,k}^2)
$$
Implementation:
normal_rng(eta[i, k], sigma_y[k]) draws a sample from this normal distribution.
Likelihood (log link):
$$
\log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}) = y_{i,k}^{\text{int}} \cdot \eta_{i,k} - e^{\eta_{i,k}} - \log(y_{i,k}^{\text{int}}!)
$$
Implementation:
poisson_log_lpmf(y_int[i, k] | eta[i, k]) computes the Poisson log-PMF with log-rate
Posterior Predictive Distribution:
$$
\tilde{y}{i,k}^{\text{int}} \sim \text{Poisson}(e^{\eta{i,k}})
$$
Implementation:
poisson_log_rng(eta[i, k]) generates a Poisson sample with rate
Likelihood (log link):
$$
\log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}, \phi_k) = \log\Gamma(y_{i,k}^{\text{int}} + \phi_k) - \log\Gamma(\phi_k) - \log(y_{i,k}^{\text{int}}!) + \phi_k \log\phi_k + y_{i,k}^{\text{int}} \eta_{i,k} - (\phi_k + y_{i,k}^{\text{int}}) \log(\phi_k + e^{\eta_{i,k}})
$$
where
Implementation:
neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]) computes this exactly.
Posterior Predictive Distribution:
$$
\tilde{y}{i,k}^{\text{int}} \sim \text{NB2}(\mu = e^{\eta{i,k}}, \phi_k)
$$
Implementation:
neg_binomial_2_log_rng(eta[i, k], phi[k]) draws a sample from this distribution.
Likelihood (logit link):
$$
\log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}) = y_{i,k}^{\text{int}} \log\left(\frac{e^{\eta_{i,k}}}{1 + e^{\eta_{i,k}}}\right) + (1 - y_{i,k}^{\text{int}}) \log\left(\frac{1}{1 + e^{\eta_{i,k}}}\right)
$$
Equivalently,
Implementation:
bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]) computes this using the logit-scale parameterization.
Posterior Predictive Distribution:
$$
\tilde{y}{i,k}^{\text{int}} \sim \text{Bernoulli}\left(\text{logit}^{-1}(\eta{i,k})\right)
$$
Implementation:
bernoulli_logit_rng(eta[i, k]) draws a Bernoulli sample with probability
In the AMM framework:
-
$\eta_{i,k}$ represents the latent score (or linear predictor) for individual$i$ on item$k$ . - Under the reference-anchoring decomposition,
$\eta$ typically has the form: $$ \eta_{i,k} = \alpha_k + \boldsymbol{x}_i^\top \boldsymbol{\beta}_k + \boldsymbol{z}i^\top \boldsymbol{u}{i,k} $$ where$\alpha_k$ are item intercepts,$\boldsymbol{\beta}_k$ are item-specific fixed effects, and$\boldsymbol{u}_{i,k}$ are random effects.
The generated quantities block does not alter the model's parameterization but uses
- Compute the pointwise log-likelihood for each data point, enabling model comparison via LOO or WAIC.
- Generate posterior predictive samples
$\tilde{y}_{i,k}$ to assess the model's ability to replicate observed data patterns. - Export
theta_i = etafor potential post-hoc analysis (e.g., computing person parameters or reliability).
By conditioning on the posterior samples of
File path: inst/stan/_canonical_pieces/amm_canonical_eb_conditional.stan
Regime: Empirical Bayes (EB) Conditional — Sub-phase 8.6.B of the gdpar development charter. This template is the Step (iii) sampler in the EB workflow: it receives the EB point estimate
The header establishes three structural facts:
-
Provenance. The template is invoked by
R/eb.Rviacmdstanr::cmdstan_model(...)$sample(...)at Step (iii) of the EB workflow described inv07 §11.1(operational) andv07b §§4, 6(theoretical). Step (i) produces$\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ ; Step (iii) plugs it in as data and samples$\xi$ . -
Structural difference from
amm_eb_marginal.stan/amm_main.stan. Three changes:-
theta_refmoves fromparameters{}todata{}, renamedtheta_ref_data. - The hyperparameters
mu_theta_refandsigma_theta_refare removed entirely — they existed solely to place a prior ontheta_refin the full-Bayes (FB) and EB-marginal regimes. - The prior on
theta_refis removed frommodel{}.
-
-
Bit-identicality guarantee. Every other line matches
amm_eb_marginal.stanso that the conditional posterior of$\xi$ given the plug-in$\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ is exactly the conditional posterior that the EB asymptotic theory ofv07 §5/v07b §4analyses. No extra modelling assumption is introduced between Steps (i) and (iii). -
Multivariate posture (Charter §2.7). Sub-phase 8.6.B operates under
$K = 1$ (single slot per group),$p = 1$ (scalar parameter),$J \geq 1$ (one or more groups). The data block declaresK_slotsandp_dimwith default 1; the body assumesK_slots == 1andp_dim == 1throughout. Sub-phase 8.6.C will extend the body to$K > 1,\ p > 1$ without rewriting the data declaration.
functions {
// {{CANONICAL_HELPERS}}
}This block contains a single Mustache-style placeholder {{CANONICAL_HELPERS}}. At template-instantiation time (R-side, before cmdstanr::cmdstan_model), the R engine substitutes this token with the canonical helper-function definitions shared across all AMM templates. The only helper referenced in the body below is:
vector apply_W_basis_diff(int W_type_id, real theta_ref_i, real theta_anchor,
int dim_W, int W_degree,
int W_n_knots_full, vector W_knots_full)which computes the basis-difference vector transformed parameters walkthrough for the mathematical definition). Because the placeholder is opaque at the Stan-source level, no further specification is given here; the substitution is faithful to whatever gdpar's R-side template engine injects.
int<lower=1> n;
int<lower=0, upper=4> family_id;-
n— number of observations. Constraintlower=1enforces$n \geq 1$ . -
family_id— response-family selector, constrained to$\{0,1,2,3,4\}$ . In themodelandgenerated quantitiesblocks only values$1$ –$4$ are exercised:-
1→ Gaussian (normal) -
2→ Poisson (log link) -
3→ Negative binomial type 2 (log link) -
4→ Bernoulli (logit link)
-
int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;Three binary switches that activate the three additive slots of the AMM decomposition:
-
use_a— additive random-effect slot$Z_a\, a$ (unanchored). -
use_b— reference-anchored slot$Z_b\, c_b$ (where$c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$ ). -
use_W— smooth$W$ -deviation slot$x_i^\top W^\top \Delta_B(\theta_{\mathrm{ref},i}, \theta_0)$ .
int<lower=0> J_a;
int<lower=0> J_b;
int<lower=0> dim_W;
int<lower=0> d;-
J_a— column count of$Z_a$ (number of additive coefficients). -
J_b— column count of$Z_b$ (number of reference-anchored coefficients). -
dim_W— number of basis rows of$W$ (rows of the$W_{\mathrm{raw}}$ matrix). -
d— covariate dimension for the$W$ slot (columns of$W_{\mathrm{raw}}$ ; also the column count of$X$ ).
All four are constrained lower=0, allowing any slot to be absent (dimension zero).
matrix[n, J_a] Z_a;
matrix[n, J_b] Z_b;
matrix[n, d] X;
vector[n] y_real;
array[n] int y_int;-
Z_a—$n \times J_a$ design matrix for the additive slot. -
Z_b—$n \times J_b$ design matrix for the reference-anchored slot. -
X—$n \times d$ covariate matrix for the$W$ -deviation slot. -
y_real— real-valued response, used whenfamily_id == 1. -
y_int— integer-valued response (array of length$n$ ), used whenfamily_id ∈ {2,3,4}.
Both response vectors are always declared; only one is consumed depending on family_id.
real theta_anchor;
int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;-
theta_anchor— the fixed anchor value$\theta_0$ used in the$W$ -basis difference$\Delta_B(\theta, \theta_0)$ . This is the "reference point" at which the$W$ -deviation vanishes by construction. -
use_dispersion_y— binary flag activating the Gaussian dispersion$\sigma_y$ (relevant only forfamily_id == 1). -
use_dispersion_phi— binary flag activating the NB dispersion$\phi$ (relevant only forfamily_id == 3).
int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;-
W_type_id— basis-type selector, constrained to$\{0,1,2\}$ . The actual mapping (e.g., B-spline, polynomial, etc.) is resolved insideapply_W_basis_diff(injected via{{CANONICAL_HELPERS}}). -
W_n_knots_full— number of knots (can be 0 for knot-free bases). -
W_knots_full— knot vector of lengthW_n_knots_full. -
W_degree— spline/polynomial degree.
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups— binary flag. When0, the model treats all observations as sharing a single reference (degenerate grouping); when1, group-specific references are used. -
J_groups— number of groups, constrained$\geq 1$ . -
group_id— integer array of length$n$ , mapping each observation to a group in$\{1, \dots, J_{\mathrm{groups}}\}$ .
int<lower=1> K_slots;
int<lower=1> p_dim;-
K_slots— number of slots per group. In 8.6.B this is fixed at$K = 1$ (a single scalar slot per group). The field is declared withlower=1and defaults to 1; the R-side validator.gdpar_eb_validate_inputs()checksK_slots == 1. -
p_dim— parameter dimension per slot. In 8.6.B this is fixed at$p = 1$ . Same default-and-check regime.
These fields are declared but not exercised in the 8.6.B body; they exist so that 8.6.C can add data block.
vector[J_groups] theta_ref_data;-
theta_ref_data— the EB point estimate$\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ , supplied as a length-$J_{\mathrm{groups}}$ vector. Each entrytheta_ref_data[g]is the plug-in reference for group$g$ . The R-side helper.gdpar_eb_maximize_marginal()populates this fromcmdstanr::laplace()applied toamm_eb_marginal.stan, following the projection and anti-fragility logic of Charter §2.8.
This is the single most important declaration distinguishing the conditional template from the marginal/FB templates: theta_ref is data, not a parameter.
int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;Four effective-dimension integers computed once at data-loading time:
-
J_a_free— effective free dimension of the additive slot:$J_a$ ifuse_a == 1and$J_a > 0$ , else$0$ . -
J_b_free— effective free dimension of the reference-anchored slot:$J_b$ ifuse_b == 1and$J_b > 0$ , else$0$ . -
dim_W_eff— effective row dimension of$W_{\mathrm{raw}}:\ \dim_W$ ifuse_W == 1, else$0$ . -
d_eff— effective column dimension of$W_{\mathrm{raw}}:\ d$ ifuse_W == 1, else$0$ .
These are used to size parameter vectors/matrices with zero dimensions when a slot is disabled, which is the Stan idiom for conditionally removing parameters from the model.
// theta_ref is now data (theta_ref_data above); no hyperparameters
// mu_theta_ref or sigma_theta_ref are sampled in the conditional
// regime --- they were a device to give theta_ref a prior in the
// marginal/FB regimes only.Explicitly documents the absence of theta_ref, mu_theta_ref, and sigma_theta_ref from the parameter vector. In the conditional regime the parameter vector is
array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a;
vector[J_a_free] a_raw;-
sigma_a— a conditional-size array (length 1 if the additive slot is active, length 0 otherwise) of positive scale parameters. Thearray[cond ? 1 : 0]idiom makes the parameter vanish from HMC when the slot is disabled. -
a_raw— raw (unscaled) additive coefficients, lengthJ_a_free. The scaled coefficient isa_coef[j] = a_raw[j] * {{A_SCALE}}(seetransformed parameters), where{{A_SCALE}}is a Mustache placeholder (typically* sigma_a[1]).
array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b;
vector[J_b_free] c_b_raw;-
sigma_b— conditional-size positive scale for the reference-anchored slot. -
c_b_raw— raw coefficients of lengthJ_b_free. Note the namec_b_raw(notb_raw): the sampled quantity is the centered coefficient$c_b = \sigma_b \cdot c_{b,\mathrm{raw}}$ , and the anchored coefficient$b_{\mathrm{coef}} = c_b / \theta_{\mathrm{ref}}$ is derived intransformed parameters. This is the reference-anchoring decomposition: the model is parameterized in terms of$c_b$ (which enters the linear predictor directly), while$b_{\mathrm{coef}}$ is the interpretable quantity satisfying$c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$ .
array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;-
sigma_W— conditional-size positive scale for the$W$ slot. -
W_raw—$\dim_W_{\mathrm{eff}} \times d_{\mathrm{eff}}$ raw coefficient matrix. When the$W$ slot is disabled, both dimensions are 0, producing a$0 \times 0$ matrix (effectively absent). The scaled matrix used in the predictor isW_raw[k,jj] * {{W_SCALE}}(typically* sigma_W[1]).
array[use_dispersion_y] real<lower=0> sigma_y;
array[use_dispersion_phi] real<lower=0> phi;-
sigma_y— Gaussian dispersion, array of lengthuse_dispersion_y(0 or 1). Positive-constrained. -
phi— Negative-binomial dispersion (reciprocal overdispersion), array of lengthuse_dispersion_phi(0 or 1). Positive-constrained.
vector[J_a] a_coef = rep_vector(0, J_a);
vector[J_b] c_b = rep_vector(0, J_b);
vector[J_b] b_coef = rep_vector(0, J_b);
vector[n] eta;-
a_coef— scaled additive coefficients, length$J_a$ , initialized to zero. -
c_b— centered (scaled) reference-anchored coefficients, length$J_b$ , initialized to zero. -
b_coef— anchored coefficients$b_{\mathrm{coef}} = c_b / \theta_{\mathrm{ref}}$ , length$J_b$ , initialized to zero. -
eta— linear predictor, length$n$ .
if (use_a == 1 && J_a > 0) {
for (j in 1:J_a_free) {
a_coef[j] = a_raw[j]{{A_SCALE}};
}
}When the additive slot is active, each raw coefficient is scaled:
where {{A_SCALE}} is a Mustache placeholder (typically * sigma_a[1]). The remaining entries of a_coef (if any) stay at zero.
if (use_b == 1 && J_b > 0) {
for (j in 1:J_b_free) {
c_b[j] = c_b_raw[j] * sigma_b[1];
}
if (use_groups == 0 && theta_ref_data[1] != 0) {
for (j in 1:J_b) {
b_coef[j] = c_b[j] / theta_ref_data[1];
}
}
}Two steps:
-
Scaling:
$c_{b,j} = c_{b,\mathrm{raw},j} \cdot \sigma_b$ , for$j = 1, \dots, J_{b,\mathrm{free}}$ . -
Anchoring (conditional): Only when
use_groups == 0(degenerate single-group regime) andtheta_ref_data[1] != 0(the plug-in reference is nonzero), the anchored coefficient is computed:
This realizes the reference-anchoring decomposition use_groups == 1 (multi-group) or b_coef remains at zero — the anchoring is not performed because a single scalar division by one group's reference would be ill-defined or meaningless across groups.
for (i in 1:n) {
real theta_ref_i = theta_ref_data[group_id[i]];
real eta_i = theta_ref_i;
if (use_a == 1 && J_a > 0) {
eta_i += Z_a[i] * a_coef;
}
if (use_b == 1 && J_b > 0) {
eta_i += Z_b[i] * c_b;
}
if (use_W == 1 && dim_W > 0 && d > 0) {
vector[d] W_diff_x = rep_vector(0, d);
vector[dim_W] basis_diff = apply_W_basis_diff(
W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
W_n_knots_full, W_knots_full
);
for (k in 1:dim_W) {
for (jj in 1:d) {
W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
}
}
eta_i += dot_product(W_diff_x, to_vector(X[i]));
}
eta[i] = eta_i;
}For each observation
Step 1 — Group-specific reference:
where
Step 2 — Initialize predictor:
Step 3 — Additive slot (if active):
where
Step 4 — Reference-anchored slot (if active):
Note: the predictor uses
Step 5 —
(a) Compute the basis-difference vector:
This is the difference between the basis evaluated at the observation's reference
(b) Accumulate the
where {{W_SCALE}} is a Mustache placeholder (typically * sigma_W[1]). In matrix form:
where
(c) Add to the predictor:
where
Full linear predictor (AMM reference-anchoring decomposition):
with the anchoring identity
Step 6 — Store:
// BEGIN PRIORS
// No prior on theta_ref: theta_ref is data in the conditional
// template. The hyperprior block of the marginal template
// (mu_theta_ref, sigma_theta_ref) is removed here for the same
// reason. xi components retain their canonical priors.The comment block explicitly states that:
- No prior is placed on
$\theta_{\mathrm{ref}}$ (it is data). - The hyperprior block (
mu_theta_ref,sigma_theta_ref) from the marginal template is removed. - The
$\xi$ components retain their canonical priors (identical toamm_eb_marginal.stan).
if (use_a == 1 && J_a > 0) {
sigma_a[1] ~ {{PRIOR_SIGMA_A}};
a_raw ~ {{A_PRIOR}};
}-
$\sigma_a \sim \texttt{PRIOR\_SIGMA\_A}$ (Mustache placeholder for a positive-support prior, e.g., half-normal, half-Cauchy, exponential). -
$a_{\mathrm{raw}} \sim \texttt{A\_PRIOR}$ (Mustache placeholder, typicallynormal(0, 1)for a non-centered parameterization, but the exact distribution is resolved at template-instantiation time).
if (use_b == 1 && J_b > 0) {
sigma_b[1] ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}-
$\sigma_b \sim \texttt{PRIOR\_SIGMA\_B}$ (Mustache placeholder). -
$c_{b,\mathrm{raw},j} \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, 1)$ — standard normal on the raw coefficients. This is a non-centered parameterization:$c_{b,j} = \sigma_b \cdot c_{b,\mathrm{raw},j}$ , so marginally$c_{b,j} \sim \mathcal{N}(0, \sigma_b^2)$ .
Note: unlike a_raw (whose prior is a placeholder), c_b_raw has a hard-coded normal(0, 1) prior. This asymmetry is faithful to the source.
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}-
$\sigma_W \sim \texttt{PRIOR\_SIGMA\_W}$ (Mustache placeholder). -
$\mathrm{vec}(W_{\mathrm{raw}}) \sim \texttt{W\_PRIOR}$ (Mustache placeholder). Theto_vector()call column-major vectorizes the matrix so a single multivariate or iid prior can be applied.
if (use_dispersion_y == 1) {
sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi[1] ~ {{PRIOR_PHI}};
}-
$\sigma_y \sim \texttt{PRIOR\_SIGMA\_Y}$ (Mustache placeholder), when Gaussian dispersion is active. -
$\phi \sim \texttt{PRIOR\_PHI}$ (Mustache placeholder), when NB dispersion is active.
// END PRIORSif (family_id == 1) {
y_real ~ normal(eta, sigma_y[1]);
} else if (family_id == 2) {
y_int ~ poisson_log(eta);
} else if (family_id == 3) {
y_int ~ neg_binomial_2_log(eta, phi[1]);
} else if (family_id == 4) {
y_int ~ bernoulli_logit(eta);
}The likelihood is selected by family_id:
Family 1 (Gaussian):
The normal(eta, sigma_y[1]) vectorized form adds
Family 2 (Poisson, log link):
poisson_log(eta) uses exp() is needed.
Family 3 (Negative binomial type 2, log link):
where
Family 4 (Bernoulli, logit link):
bernoulli_logit(eta) uses
The model block encodes the conditional posterior:
where:
-
$L_n$ is the likelihood from the selected family, -
$\pi(\xi)$ is the product of the canonical priors on$(a_{\mathrm{raw}}, c_{b,\mathrm{raw}}, W_{\mathrm{raw}}, \sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi)$ , -
$\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ enters as a fixed constant (data), not a random variable.
There is no Jacobian adjustment beyond what Stan automatically applies for the lower=0 constraints on the positive parameters (
vector[n] log_lik;
vector[n] theta_i = eta;
vector[n] y_pred;-
log_lik— pointwise log-likelihood, length$n$ . Used for LOO/WAIC vialoo::loo()in R. -
theta_i— the per-observation "theta" quantity, defined astheta_i = eta. In this parameterization the linear predictor$\eta_i$ is$\theta_i$ ; there is no separate link-function inversion. This assignment makes explicit that the AMM's latent "theta" for observation$i$ equals the constructed predictor. -
y_pred— posterior-predictive draws, length$n$ .
for (i in 1:n) {
if (family_id == 1) {
log_lik[i] = normal_lpdf(y_real[i] | eta[i], sigma_y[1]);
y_pred[i] = normal_rng(eta[i], sigma_y[1]);
} else if (family_id == 2) {
log_lik[i] = poisson_log_lpmf(y_int[i] | eta[i]);
y_pred[i] = poisson_log_rng(eta[i]);
} else if (family_id == 3) {
log_lik[i] = neg_binomial_2_log_lpmf(y_int[i] | eta[i], phi[1]);
y_pred[i] = neg_binomial_2_log_rng(eta[i], phi[1]);
} else if (family_id == 4) {
log_lik[i] = bernoulli_logit_lpmf(y_int[i] | eta[i]);
y_pred[i] = bernoulli_logit_rng(eta[i]);
}
}For each observation
Family 1:
Family 2:
Family 3:
Family 4:
The _lpdf / _lpmf forms compute the exact pointwise log-density/mass; the _rng forms draw from the predictive distribution conditional on the current HMC draw of
The template encodes the AMM (Anchored Mixed Model) decomposition with
with the anchoring identity use_groups == 0 and amm_eb_marginal.stan except for the three structural changes documented in the header: theta_ref → data, hyperparameters removed, prior on theta_ref removed.
← Part IV — Exhaustive Function Reference (7/7) · gdpar Wiki Home · Part V — Stan Templates (2/3) →
- Part I — Conceptual Framework
- Part II — Mathematical Foundations
- Part III — Computational Architecture
- Part IV — Exhaustive Function Reference (1/7)
- Part IV — Exhaustive Function Reference (2/7)
- Part IV — Exhaustive Function Reference (3/7)
- Part IV — Exhaustive Function Reference (4/7)
- Part IV — Exhaustive Function Reference (5/7)
- Part IV — Exhaustive Function Reference (6/7)
- Part IV — Exhaustive Function Reference (7/7)
- Part V — Stan Templates (1/3)
- Part V — Stan Templates (2/3)
- Part V — Stan Templates (3/3)
- Part VI — Data, Benchmarks, Tests & References