Skip to content

Part V Stan Templates 1

José Mauricio Gómez Julián edited this page Jun 28, 2026 · 6 revisions

← Part IV — Complete Function Reference (7/7) · gdpar Wiki Home · Part V — Stan Templates (2/3) →


Part V — Stan Templates (block by block)

inst/stan/_canonical_pieces/amm_canonical_distrib_K.stan

functions block

The functions block defines user-defined functions used in the Stan program. It contains functions for the Tweedie distribution and a helper for applying inverse links.

tweedie_log_W_series

Computes the log of the Dunn–Smyth series W(y, phi, p) for the Tweedie distribution when $1 < p < 2$.

The series is defined as: $$ W(y, \phi, p) = \sum_{j=1}^\infty W_j $$ where $$ W_j = \frac{y^{j \alpha} (p-1)^{-j\alpha}}{\phi^{j(1+\alpha)} (2-p)^j \cdot j! \cdot \Gamma(j\alpha)} $$ with $\alpha = \frac{2-p}{p-1} > 0$.

The function iteratively sums terms until a term is more than 37 nats below the running maximum (relative contribution < $10^{-16}$).

Returns: $\log W(y, \phi, p)$.

tweedie_log_f_series

Exact log-density of the Tweedie distribution via the Dunn–Smyth series. Valid for $y &gt; 0$.

The Tweedie density is: $$ f(y; \mu, \phi, p) = \exp\left{ \frac{y\theta - \kappa}{\phi} - \log y + \log W(y, \phi, p) \right} $$ where $$ \theta = \frac{\mu^{1-p}}{1-p}, \quad \kappa = \frac{\mu^{2-p}}{2-p}. $$

Returns: $\log f(y; \mu, \phi, p)$.

tweedie_log_f_saddlepoint

Saddlepoint approximation of the Tweedie log-density (Wood 2017, Eq. 3.46). Used outside the central region ($|p-1.5| \ge \tau$) for gradient stability. Valid for $y &gt; 0$.

The deviance is: $$ D = 2\left( \frac{y(y^{1-p} - \mu^{1-p})}{1-p} - \frac{y^{2-p} - \mu^{2-p}}{2-p} \right). $$ The saddlepoint log-density is: $$ \log f_{\text{sp}}(y; \mu, \phi, p) = -\frac{1}{2}\log(2\pi\phi y^p) - \frac{D}{2\phi}. $$

Returns: $\log f_{\text{sp}}(y; \mu, \phi, p)$.

tweedie_lpdf

Public entry point for the Tweedie log-pdf. Dispatches to the appropriate method based on $y$ and $p$.

  • If $y = 0$: uses the compound Poisson point mass: $\Pr(Y=0) = -\lambda$ with $\lambda = \mu^{2-p}/(\phi(2-p))$.
  • If $y &gt; 0$ and $|p-1.5| &lt; \tau$ (default $\tau = 0.4$): uses the series expansion.
  • Otherwise: uses the saddlepoint approximation.

Returns: $\log f(y; \mu, \phi, p)$.

tweedie_rng

Generates a random draw from the Tweedie distribution. Uses the compound Poisson-gamma representation:

  • $N \sim \text{Poisson}(\lambda)$ with $\lambda = \mu^{2-p}/(\phi(2-p))$.
  • If $N = 0$, return 0.
  • Else, $Y = \sum_{i=1}^N G_i$ with $G_i \sim \text{Gamma}(\alpha, \beta)$, $\alpha = (2-p)/(p-1)$, $\beta = 1/(\phi(p-1)\mu^{p-1})$.

apply_inv_link_by_id

Applies an inverse link function based on an integer ID. Used for heterogeneous slot links in K=2 families.

  • ID 0: identity link (e.g., Gaussian μ).
  • ID 1: logit link (e.g., Beta μ, Bernoulli μ).
  • ID 2: exp link (e.g., Gaussian σ, NB φ).

Returns: $g^{-1}(\eta)$ where $g$ is the link function.

Placeholder: {{CANONICAL_HELPERS}}

This is a placeholder for helper functions (bspline_basis_eval and apply_W_basis_diff) that are injected from amm_canonical_helpers.stan.

data block

The data block declares all observed data and fixed constants.

n

Number of observations (sample size). Integer, must be at least 1.

K

Number of K-individual distributional slots. Integer, must be at least 2.

family_id_k

Array of length K of family IDs (1–13) specifying the distribution family for each slot.

  • 1: Gaussian (K=2)
  • 3: Negative binomial (K=2)
  • 5: Beta (K=2)
  • 6: Gamma (K=2)
  • 7: Lognormal (K=2)
  • 8: Student-t (K=3)
  • 9: Tweedie (K=3)
  • 10: Zero-inflated Poisson (K=2)
  • 11: Zero-inflated negative binomial (K=3)
  • 12: Hurdle Poisson (K=2)
  • 13: Hurdle negative binomial (K=3)

inv_link_id_per_slot

Array of length K of inverse link IDs (0, 1, 2) per slot. Used for K=2 families to apply the canonical link to auxiliary slots.

use_a_k and use_b_k

Arrays of length K of 0/1 flags indicating whether the slot uses an additive term $a_k$ or a multiplicative term $b_k$.

use_W

0/1 flag indicating whether the modulating function $W$ is used globally.

J_a_per_k and J_b_per_k

Arrays of length K of integers (≥0) specifying the number of basis functions for $a_k$ and $b_k$ per slot.

J_a_max and J_b_max

Maximum of J_a_per_k and J_b_max over k, respectively.

Z_a_k and Z_b_k

Arrays of length K of design matrices for $a_k$ and $b_k$. Each matrix has n rows and J_a_max (or J_b_max) columns. The actual columns used per slot are the first J_a_per_k[k] (or J_b_per_k[k]).

dim_W

Dimension of the basis for the modulating function $W$. If use_W=0, set to 0.

d

Dimension of the basis for the global modulating function $W$. If use_W=0, set to 0.

X

Design matrix for the global modulating function $W$. Has n rows and d columns.

y_real and y_int

Vectors of length n containing the response variable. y_real is for continuous families, y_int for discrete.

theta_anchor_K

Vector of length K containing the reference values for each slot. Default is 0.

use_dispersion_y_k and use_dispersion_phi_k

Arrays of length K of 0/1 flags indicating whether the slot uses a dispersion parameter for y (e.g., σ for Gaussian) or for φ (e.g., for NB, Beta, etc.). These are for future extension to mixed family patterns.

W_type_id

Integer (0, 1, 2) indicating the type of basis for $W$:

  • 0: Polynomial
  • 1: B-spline
  • 2: Radial basis

W_n_knots_full, W_knots_full, W_degree

Parameters for the basis of $W$ (if B-spline). W_n_knots_full is the number of knots, W_knots_full the knot vector, W_degree the degree.

use_groups and J_groups

0/1 flag and integer for group-level random effects. If use_groups=1, there are J_groups groups.

group_id

Array of length n of group indices (1 to J_groups) for each observation.

{{DATA_CP_A_PER_K_DECL}}

Placeholder for additional data declarations related to the CP/NCP parametrization of the additive terms $a_k$. This is filled by the R code generator.


Note: The remaining blocks (transformed data, parameters, transformed parameters, model, generated quantities) will be covered in subsequent sections.

Transformed Data Block

Purpose

Computes indexing arrays, counters, and flags required to efficiently map between flat-packed raw parameter vectors and per-slot model quantities. This ensures identifiability and avoids non-identified parameters in the AMM (Additive-Multiplicative Model) reference-anchoring decomposition.

Declarations

array[K] int J_a_free;
array[K] int J_b_free;
  • J_a_free[k]: Number of free (estimated) additive coefficients for slot $k$. Set to J_a_per_k[k] if slot $k$ is active (use_a_k[k] == 1) and has at least one design column (J_a_per_k[k] > 0); otherwise 0.
  • J_b_free[k]: Analogous for multiplicative coefficients.
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;
  • a_raw_offset[k]: Starting index (0-based) in the flat-packed vector a_raw for slot $k$'s additive coefficients. Equals $\sum_{k'=1}^{k-1} J_a_free[k']$.
  • c_b_raw_offset[k]: Starting index in c_b_k_raw for slot $k$'s multiplicative coefficients.
int total_J_a_free = 0;
int total_J_b_free = 0;
  • Accumulate total free additive/multiplicative coefficients across all slots. These become the dimensions of a_raw and c_b_k_raw.
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
  • Effective dimensions for the global modulating component $W$. If use_W == 0, both are zero (no $W$ component).
int any_use_a = 0;
int any_use_b = 0;
  • Flags indicating whether any slot uses the additive or multiplicative component. Set to 1 if any slot has use_a_k[k] == 1 or use_b_k[k] == 1.
int sigma_W_size;
  • Size of the scale array for $W$. Equals 1 if use_W == 1 and dim_W > 0; otherwise 0.
int sigma_y_size = 0;
int phi_size = 0;
  • Counts of population-scoped dispersion parameters for Gaussian (sigma_y) and negative binomial (phi) families. Incremented for each slot where use_dispersion_y_k[k] or use_dispersion_phi_k[k] is 1.

Loop Over Slots $k = 1 \dots K$

For each slot $k$:

  1. Compute J_a_free[k] and J_b_free[k]:

    J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
    • Only slots with active additive component and at least one design column contribute free coefficients.
  2. Set offsets and update totals:

    a_raw_offset[k] = total_J_a_free;
    c_b_raw_offset[k] = total_J_b_free;
    total_J_a_free += J_a_free[k];
    total_J_b_free += J_b_free[k];
    • Offsets are cumulative sums of free coefficients from previous slots.
  3. Update any_use_a and any_use_b flags:

    if (use_a_k[k] == 1) any_use_a = 1;
    if (use_b_k[k] == 1) any_use_b = 1;
  4. Accumulate dispersion parameter counts:

    sigma_y_size += use_dispersion_y_k[k];
    phi_size += use_dispersion_phi_k[k];

After Loop: Global Component and sigma_a_k Compaction

sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;
  • $W$ has a single scale parameter sigma_W[1] if active.

Compaction of sigma_a_k (per RG.6, D96):

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
  if (J_a_free[k] > 0) {
    n_sigma_a += 1;
    sigma_a_idx[k] = n_sigma_a;
  } else {
    sigma_a_idx[k] = 0;
  }
}
  • n_sigma_a: Number of slots with free additive coefficients.
  • sigma_a_idx[k]: Maps slot $k$ to its index in the compacted sigma_a_k array (1-based); 0 if no free coefficients.
  • Purpose: Avoids sampling a scale parameter for slots without free additive coefficients (e.g., intercept-only slots fully absorbed into $\theta_{\text{ref}}$), which would be non-identified (flat direction) and stall the sampler.

Mathematical Role

This block establishes the bookkeeping for the AMM decomposition:

  • The additive component $\mathbf{a}k$ has scale $\sigma{a,k}$ and coefficients $a_{\text{raw}}$.
  • The multiplicative component $\mathbf{c}b{k} = \theta{\text{ref},k} \circ \mathbf{b}k$ has scale $\sigma{b,k}$.
  • The global component $W$ has scale $\sigma_W$.
  • The per-slot anchors $\theta_{\text{ref},k}$ are the reference points around which additive/multiplicative effects are defined.

Parameters Block

Declarations

Per-slot anchors

array[J_groups] vector[K] theta_ref_k;
  • $\theta_{\text{ref},g,k}$: Anchor for group $g$, slot $k$. In the $K = 1 + p$ template, $p=1$ per slot, so this is a scalar per group per slot.

Hierarchical hyperparameters (if use_groups == 1)

array[use_groups] vector[K] mu_theta_ref_k;
array[use_groups] vector<lower=0>[K] sigma_theta_ref_k;
  • $\mu_{\theta,k}$ and $\sigma_{\theta,k}$: Group-level mean and standard deviation for the per-slot anchors. Only present when hierarchical groups are used.

Additive component scale and coefficients

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;
  • $\sigma_{a,k}$: Scale for additive component in slot $k$ (only for slots with J_a_free[k] > 0).
  • $a_{\text{raw}}$: Raw additive coefficients (standardized). Transformed to $a_{\text{coef},k}$ in transformed parameters.

Multiplicative component scale and coefficients

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;
  • $\sigma_{b,k}$: Scale for multiplicative component in slot $k$ (if any_use_b == 1; otherwise array is empty).
  • $c_{b,k}^{\text{raw}}$: Raw multiplicative coefficients (standardized). Transformed to $c_{b,k}$ in transformed parameters.

Global modulating component

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;
  • $\sigma_W$: Scale for global component $W$.
  • $W_{\text{raw}}$: Raw coefficients for basis functions of $W$. Dimensions: $\text{dim}_W \times d$.

Population-scoped dispersion parameters

array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;
  • $\sigma_{y,\text{pop}}$: Scale parameter for Gaussian family (location only).
  • $\phi_{\text{pop}}$: Dispersion parameter for negative binomial family (location only).

Mathematical Role

The parameters are structured for the AMM non-centered parameterization:

  • Raw parameters $a_{\text{raw}}$, $c_{b,k}^{\text{raw}}$, $W_{\text{raw}}$ are standard normal a priori (in the model block).
  • They are scaled by $\sigma_{a,k}$, $\sigma_{b,k}$, $\sigma_W$ to produce actual coefficients.
  • The anchors $\theta_{\text{ref},k}$ are the reference points; additive/multiplicative effects are defined relative to these.

Transformed Parameters Block

Declarations

array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;
  • a_coef_k[k]: Additive coefficients for slot $k$ (length $J_a^{\max}$).
  • c_b_k[k]: Multiplicative coefficients in centered form: $c_{b,k} = \theta_{\text{ref},k} \cdot b_k$.
  • b_coef_k[k]: Multiplicative coefficients in non-centered form (only computed for reporting when use_groups == 0).
  • eta_k[i,k]: Linear predictor for observation $i$, slot $k$.

Initialization

for (k in 1:K) {
  a_coef_k[k] = rep_vector(0, J_a_max);
  c_b_k[k] = rep_vector(0, J_b_max);
  b_coef_k[k] = rep_vector(0, J_b_max);
}
  • Initialize all coefficient vectors to zero.

Placeholder: {{TP_A_BLOCK}}

This is where the R package inserts code to transform $a_{\text{raw}}$ to $a_{\text{coef},k}$. The expected transformation is:

  1. For each slot $k$ with J_a_free[k] > 0, copy the segment of $a_{\text{raw}}$ from index a_raw_offset[k]+1 to a_raw_offset[k]+J_a_free[k] into a_coef_k[k][1:J_a_free[k]].
  2. Scale by $\sigma_{a,k}$ (using the compacted indexing via sigma_a_idx).
  3. Possibly apply a non-centered transformation (e.g., $a_k = \sigma_{a,k} \cdot a_{\text{raw}}$).

Multiplicative Coefficients

if (any_use_b == 1) {
  for (k in 1:K) {
    if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
      for (j in 1:J_b_free[k]) {
        c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
      }
      // Compute b_coef_k only for single-group case
      if (use_groups == 0 && theta_ref_k[1][k] != 0) {
        for (j in 1:J_b_per_k[k]) {
          b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k[1][k];
        }
      }
    }
  }
}
  • For each slot with multiplicative component:
    • Transform raw coefficients: $c_{b,k} = \sigma_{b,k} \cdot c_{b,k}^{\text{raw}}$.
    • If single group (use_groups == 0) and anchor nonzero, compute $b_k = c_{b,k} / \theta_{\text{ref},k}$ for reporting.

Linear Predictor Construction

For each observation $i$ and slot $k$:

real theta_ref_ik = theta_ref_k[g_i][k];
real eta_ik = theta_ref_ik;
  • Start with the group-specific anchor for slot $k$.

Additive component:

if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
  eta_ik += dot_product(to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]), a_coef_k[k][1:J_a_per_k[k]]);
}
  • $\eta_{ik} \mathrel{+}= \mathbf{z}_{a,ik}^\top \mathbf{a}k$, where $\mathbf{z}{a,ik}$ is row $i$ of the additive design matrix for slot $k$.

Multiplicative component:

if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
  eta_ik += dot_product(to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]), c_b_k[k][1:J_b_per_k[k]]);
}
  • $\eta_{ik} \mathrel{+}= \mathbf{z}{b,ik}^\top \mathbf{c}{b,k}$. Since $\mathbf{c}{b,k} = \theta{\text{ref},k} \cdot \mathbf{b}k$, this is the multiplicative effect anchored at $\theta{\text{ref},k}$.

Global modulating component (if use_W == 1):

vector[d] W_diff_x_k = rep_vector(0, d);
vector[dim_W] basis_diff_k = apply_W_basis_diff(...);
for (jj_local in 1:dim_W) {
  for (jjj in 1:d) {
    W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
  }
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
  • Compute basis functions of the difference between current anchor and slot anchor: $\phi(\theta_{\text{ref},ik} - \theta_{\text{anchor},k})$.
  • Construct $W$-component: $\mathbf{w}_k = \Phi \cdot W$, where $\Phi$ is the basis matrix.
  • $\eta_{ik} \mathrel{+}= \mathbf{x}_i^\top \mathbf{w}_k$.
  • The placeholder {{W_SCALE}} scales $W_{\text{raw}}$ by $\sigma_W$.

Finally:

eta_k[i, k] = eta_ik;

Mathematical Formulation

The linear predictor for observation $i$, slot $k$ is: $$ \eta_{ik} = \theta_{\text{ref},g_i,k} + \mathbf{z}{a,ik}^\top \mathbf{a}k + \mathbf{z}{b,ik}^\top (\theta{\text{ref},g_i,k} \mathbf{b}_k) + \mathbf{x}_i^\top \mathbf{w}_k $$ where:

  • $\mathbf{a}k = \sigma{a,k} \cdot \mathbf{a}_{\text{raw},k}$ (additive coefficients).
  • $\mathbf{b}k$ are multiplicative coefficients (non-centered), $\mathbf{c}{b,k} = \theta_{\text{ref},k} \mathbf{b}_k$.
  • $\mathbf{w}_k$ is the global $W$ component.

This realizes the AMM reference-anchoring decomposition: effects are additive/multiplicative around the anchor $\theta_{\text{ref},k}$.


Model Block

Prior for Anchors

{{THETA_REF_PRIOR_BLOCK}}
  • Placeholder for priors on $\theta_{\text{ref},k}$. Can be hierarchical (normal) or flat, and may include slot-specific constraints (e.g., uniform on Tweedie power).

Scale and Coefficient Priors

Additive component (if any_use_a == 1):

sigma_a_k ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
  • Prior on $\sigma_{a,k}$ (e.g., half-normal, half-Cauchy).
  • Prior on $a_{\text{raw}}$ (expected standard normal).

Multiplicative component (if any_use_b == 1):

sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_k_raw ~ normal(0, 1);
  • Prior on $\sigma_{b,k}$.
  • $c_{b,k}^{\text{raw}} \sim \mathcal{N}(0,1)$.

Global component (if use_W == 1 and dim_W > 0):

sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
  • Prior on $\sigma_W$.
  • Prior on $W_{\text{raw}}$ (e.g., standard normal).

Population-scoped dispersion:

sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
phi_pop ~ {{PRIOR_PHI}};
  • Priors on $\sigma_{y,\text{pop}}$ and $\phi_{\text{pop}}$.

Likelihood (Distributional Regression)

The likelihood is written for $K = 2$ (location + dispersion) or $K = 3$ (location + dispersion + additional parameter) families, with homogeneous family assumption (family_id_k[1]).

Gaussian ($K = 2$, family_id = 1)

for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}
  • $\mu_i = \eta_{i1}$ (identity link for location).
  • $\sigma_i = g^{-1}(\eta_{i2})$ (link function for dispersion, typically $\exp$).
  • $y_i \sim \mathcal{N}(\mu_i, \sigma_i)$.

Negative Binomial ($K = 2$, family_id = 3)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}
  • $\mu_i = \exp(\eta_{i1})$ (log link for mean).
  • $\phi_i = g^{-1}(\eta_{i2})$ (typically $\exp$ for dispersion).
  • $y_i \sim \text{NB2}(\mu_i, \phi_i)$.

Beta ($K = 2$, family_id = 5)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}
  • $\mu_i = \text{logit}^{-1}(\eta_{i1})$ (logit link for mean in (0,1)).
  • $\phi_i = g^{-1}(\eta_{i2})$ (typically $\exp$ for precision).
  • $y_i \sim \text{BetaProportion}(\mu_i, \phi_i)$.

Gamma ($K = 2$, family_id = 6)

for (i in 1:n) {
  real mu_i    = exp(eta_k[i, 1]);
  real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}
  • $\mu_i = \exp(\eta_{i1})$ (log link for mean).
  • $\alpha_i = g^{-1}(\eta_{i2})$ (typically $\exp$ for shape).
  • $y_i \sim \text{Gamma}(\alpha_i, \alpha_i/\mu_i)$.

Lognormal ($K = 2$, family_id = 7)

for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}
  • $\mu_i = \eta_{i1}$ (identity link for log-location).
  • $\sigma_i = g^{-1}(\eta_{i2})$ (typically $\exp$ for scale).
  • $y_i \sim \text{Lognormal}(\mu_i, \sigma_i)$.

Student-t ($K = 3$, family_id = 8)

for (i in 1:n) {
  y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}
  • $\mu_i = \eta_{i1}$ (identity link for location).
  • $\sigma_i = \exp(\eta_{i2})$ (log link for scale).
  • $\nu_i = \exp(\eta_{i3})$ (log link for degrees of freedom).
  • $y_i \sim t_{\nu_i}(\mu_i, \sigma_i)$.

Tweedie ($K = 3$, family_id = 9)

for (i in 1:n) {
  y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}
  • $\mu_i = \exp(\eta_{i1})$ (log link for mean).
  • $\phi_i = \exp(\eta_{i2})$ (log link for dispersion).
  • $p_i = \eta_{i3}$ (identity link for power, bounded to $(1.01,1.99)$ by prior).
  • $y_i \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$.

Zero-Inflated Poisson ($K = 2$, family_id = 10)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(0 | eta_mu_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(y_int[i] | eta_mu_i);
  }
}
  • $\mu_i = \exp(\eta_{i1})$ (log link for Poisson mean).
  • $\pi_i = \text{logit}^{-1}(\eta_{i2})$ (logit link for zero-inflation probability).
  • Likelihood: $P(y_i=0) = \pi_i + (1-\pi_i)e^{-\mu_i}$, $P(y_i&gt;0) = (1-\pi_i)\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$.

Zero-Inflated Negative Binomial ($K = 3$, family_id = 11)

for (i in 1:n) {
  real eta_mu_i  = eta_k[i, 1];
  real phi_i     = exp(eta_k[i, 2]);
  real eta_pi_i  = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
  }
}
  • $\mu_i = \exp(\eta_{i1})$, $\phi_i = \exp(\eta_{i2})$, $\pi_i = \text{logit}^{-1}(\eta_{i3})$.
  • Likelihood: $P(y_i=0) = \pi_i + (1-\pi_i)\left(\frac{\phi_i}{\mu_i+\phi_i}\right)^{\phi_i}$, $P(y_i&gt;0) = (1-\pi_i)\text{NB2}(y_i|\mu_i,\phi_i)$.

Hurdle-Poisson ($K = 2$, family_id = 12)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(y_int[i] | eta_mu_i)
            - log1m_exp(-exp(eta_mu_i));
  }
}
  • $\mu_i = \exp(\eta_{i1})$, $\pi_i = \text{logit}^{-1}(\eta_{i2})$.
  • Likelihood: $P(y_i=0)=\pi_i$, $P(y_i&gt;0)=(1-\pi_i)\frac{\text{Poisson}(y_i|\mu_i)}{1-e^{-\mu_i}}$.

Hurdle-Negative Binomial ($K = 3$, family_id = 13)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real phi_i    = exp(eta_k[i, 2]);
  real eta_pi_i = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
            - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
  }
}
  • $\mu_i = \exp(\eta_{i1})$, $\phi_i = \exp(\eta_{i2})$, $\pi_i = \text{logit}^{-1}(\eta_{i3})$.
  • Likelihood: $P(y_i=0)=\pi_i$, $P(y_i&gt;0)=(1-\pi_i)\frac{\text{NB2}(y_i|\mu_i,\phi_i)}{1-\text{NB2}(0|\mu_i,\phi_i)}$.

Mathematical Summary

The model block specifies:

  • Priors: Weakly informative or proper on scales and coefficients; anchors may have hierarchical or flat priors.
  • Likelihood: Distributional regression where each slot's $\eta_{ik}$ is passed through an inverse link to produce the parameter of the distribution. The likelihood is family-specific and uses the AMM decomposition for all parameters.

The entire model realizes the AMM reference-anchoring decomposition: effects are additive/multiplicative around group-specific anchors, with optional global modulation $W$, and proper identifiability via the compacted $\sigma_{a,k}$ and the prior structure.

Stan Template: inst/stan/_canonical_pieces/amm_canonical_distrib_K.stan — Block 3 of 3

Overview

This is the final block of the AMM (General Dynamic Parameter) model template for distributional regression with $K \geq 1$ distributional parameters. The generated quantities block computes two key posterior-estimable quantities:

  1. Pointwise log-likelihoods (log_lik): Used for model comparison via WAIC, LOO-CV, etc.
  2. Posterior-predictive draws (y_pred): Used for posterior predictive checks.

Additionally, it declares theta_i_k as a matrix to store the linear predictors, though it is not subsequently used in this block (likely for diagnostic purposes).

The block implements a dispatcher based on family_id_k[1], which identifies the distribution family. For each observation, it:

  • Computes the log-likelihood using the appropriate Stan log-probability function.
  • Generates a draw from the posterior predictive distribution using the corresponding RNG.
  • Uses inverse link functions (via apply_inv_link_by_id) to map linear predictors to distribution parameters.

This realizes the AMM reference-anchoring decomposition by applying the distributional-regression likelihood that was selected in the model block, using the linear predictors $\eta_k$ that are constructed from the reference-anchored parameterization.


Block Structure and Line-by-Line Explanation

generated quantities {

Opens the generated quantities block. This block executes after the model is fitted, on each posterior draw.


1. Declarations

  vector[n] log_lik;
  matrix[n, K] theta_i_k = eta_k;
  vector[n] y_pred;
  • log_lik: A vector of length $n$ (number of observations) to store the pointwise log-likelihood $\log p(y_i \mid \theta_i)$ for each observation $i$.
  • theta_i_k: An $n \times K$ matrix initialized to the linear predictor matrix eta_k from the transformed parameters block. This stores the linear predictors $\eta_{i,k}$ for each observation $i$ and each distributional parameter slot $k$. It is declared for potential post-processing (e.g., extracting fitted values), but not used further in this block.
  • y_pred: A vector of length $n$ to store posterior-predictive draws $y_i^{\text{rep}}$.

2. Observation Loop

  for (i in 1:n) {

Iterates over all $n$ observations. For each observation $i$, the code dispatches based on the family identifier family_id_k[1].


Family 1: Normal (Gaussian)
    if (family_id_k[1] == 1) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
    }
  • Interpretation: Normal distribution with identity link for $\mu$ and possibly non-identity link for $\sigma$.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \mu_i$, $\eta_{i,2} = g^{-1}(\sigma_i)$.
    • Inverse link: $\sigma_i = g_{\sigma}^{-1}(\eta_{i,2})$, where $g_{\sigma}$ is the link for $\sigma$ (often log).
    • Log-likelihood: $$\log p(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2} \log(2\pi) - \log \sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2}$$
    • Posterior predictive: $y_i^{\text{rep}} \sim \mathcal{N}(\mu_i, \sigma_i^2)$.
  • AMM Role: The location parameter $\mu_i$ is directly from the reference-anchored linear predictor $\eta_{i,1}$; the scale parameter $\sigma_i$ is transformed from $\eta_{i,2}$ via its link.

Family 3: Negative Binomial
    } else if (family_id_k[1] == 3) {
      real mu_i  = exp(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
      y_pred[i]  = neg_binomial_2_rng(mu_i, phi_i);
    }
  • Interpretation: Negative binomial distribution (alternative parameterization) with log link for $\mu$ and possibly non-identity link for $\phi$ (precision/size).
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = g^{-1}(\phi_i)$.
    • Inverse links: $\mu_i = \exp(\eta_{i,1})$, $\phi_i = g_{\phi}^{-1}(\eta_{i,2})$.
    • Log-likelihood (using Stan's neg_binomial_2_lpmf): $$\log p(y_i \mid \mu_i, \phi_i) = \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i)$$
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{NB2}(\mu_i, \phi_i)$.
  • AMM Role: Both parameters are distributional; $\mu_i$ is exponentiated from the reference-anchored $\eta_{i,1}$, and $\phi_i$ is transformed via its link.

Family 5: Beta Proportion
    } else if (family_id_k[1] == 5) {
      real mu_i  = inv_logit(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
      y_pred[i]  = beta_proportion_rng(mu_i, phi_i);
    }
  • Interpretation: Beta distribution in the mean-precision parameterization (proportion), with logit link for $\mu$ and possibly non-identity link for $\phi$.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \text{logit}(\mu_i)$, $\eta_{i,2} = g^{-1}(\phi_i)$.
    • Inverse links: $\mu_i = \text{logit}^{-1}(\eta_{i,1}) = \frac{1}{1 + e^{-\eta_{i,1}}}$, $\phi_i = g_{\phi}^{-1}(\eta_{i,2})$.
    • Log-likelihood (using Stan's beta_proportion_lpmf): $$\log p(y_i \mid \mu_i, \phi_i) = \text{log Beta}(\mu_i \phi_i, (1 - \mu_i) \phi_i) - \log y_i - \log(1 - y_i)$$ where $\text{log Beta}(a, b) = \log \Gamma(a) + \log \Gamma(b) - \log \Gamma(a + b)$.
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{BetaProportion}(\mu_i, \phi_i)$.
  • AMM Role: The mean $\mu_i$ is on the probability scale via logit link; the precision $\phi_i$ is transformed.

Family 6: Gamma
    } else if (family_id_k[1] == 6) {
      real mu_i    = exp(eta_k[i, 1]);
      real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
      y_pred[i]  = gamma_rng(shape_i, shape_i / mu_i);
    }
  • Interpretation: Gamma distribution with shape-rate parameterization, where rate $\beta = \text{shape} / \mu$. Log link for $\mu$ and possibly non-identity link for shape $\alpha$.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = g^{-1}(\alpha_i)$.
    • Inverse links: $\mu_i = \exp(\eta_{i,1})$, $\alpha_i = g_{\alpha}^{-1}(\eta_{i,2})$.
    • Derived rate: $\beta_i = \alpha_i / \mu_i$.
    • Log-likelihood (using Stan's gamma_lpdf): $$\log p(y_i \mid \alpha_i, \beta_i) = (\alpha_i - 1) \log y_i - \beta_i y_i + \alpha_i \log \beta_i - \log \Gamma(\alpha_i)$$
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{Gamma}(\alpha_i, \beta_i)$.
  • AMM Role: Both shape and scale are distributional parameters; the mean $\mu_i$ is exponentiated from $\eta_{i,1}$, and shape is transformed.

Family 7: Lognormal
    } else if (family_id_k[1] == 7) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i]  = lognormal_rng(eta_k[i, 1], sigma_i);
    }
  • Interpretation: Lognormal distribution with identity link for $\mu$ (on the log scale) and possibly non-identity link for $\sigma$.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \mu_i$, $\eta_{i,2} = g^{-1}(\sigma_i)$.
    • Inverse link: $\sigma_i = g_{\sigma}^{-1}(\eta_{i,2})$.
    • Log-likelihood: $$\log p(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2} \log(2\pi) - \log \sigma_i - \log y_i - \frac{(\log y_i - \mu_i)^2}{2\sigma_i^2}$$
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{Lognormal}(\mu_i, \sigma_i^2)$.
  • AMM Role: The location parameter $\mu_i$ is directly from $\eta_{i,1}$; the scale $\sigma_i$ is transformed.

Family 8: Student-t
    } else if (family_id_k[1] == 8) {
      real nu_i    = exp(eta_k[i, 3]);
      real mu_i    = eta_k[i, 1];
      real sigma_i = exp(eta_k[i, 2]);
      log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
      y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
    }
  • Interpretation: Student-t distribution with identity link for $\mu$, log link for $\sigma$, and log link for $\nu$ (degrees of freedom). This is a three-parameter distribution.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \mu_i$, $\eta_{i,2} = \log(\sigma_i)$, $\eta_{i,3} = \log(\nu_i)$.
    • Inverse links: $\sigma_i = \exp(\eta_{i,2})$, $\nu_i = \exp(\eta_{i,3})$.
    • Log-likelihood: $$\log p(y_i \mid \mu_i, \sigma_i, \nu_i) = \log \Gamma\left(\frac{\nu_i + 1}{2}\right) - \log \Gamma\left(\frac{\nu_i}{2}\right) - \frac{1}{2} \log(\nu_i \pi) - \log \sigma_i - \frac{\nu_i + 1}{2} \log\left(1 + \frac{(y_i - \mu_i)^2}{\nu_i \sigma_i^2}\right)$$
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{Student-t}(\nu_i, \mu_i, \sigma_i)$.
  • AMM Role: All three parameters are distributional and transformed via exponentiation from their linear predictors.

Family 9: Tweedie
    } else if (family_id_k[1] == 9) {
      real mu_i    = exp(eta_k[i, 1]);
      real phi_i = exp(eta_k[i, 2]);
      real p_i   = eta_k[i, 3];
      log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
      y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
    }
  • Interpretation: Tweedie distribution with log links for $\mu$ and $\phi$ and identity link for $p$ (power parameter, $1 &lt; p &lt; 2$).
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = \log(\phi_i)$, $\eta_{i,3} = p_i$.
    • Inverse links: $\mu_i = \exp(\eta_{i,1})$, $\phi_i = \exp(\eta_{i,2})$.
    • Log-likelihood: The Tweedie log-likelihood is defined via the Tweedie density, which is not expressed in closed form but can be computed numerically.
    • Posterior predictive: $y_i^{\text{rep}} \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$.
  • AMM Role: All three parameters are distributional; $\mu_i$ and $\phi_i$ are exponentiated, $p_i$ is identity-transformed.

Family 10: Zero-Inflated Poisson (ZIP)
    } else if (family_id_k[1] == 10) {
      // ZIP K = 2: log_lik mirrors the model-block dispatch; y_pred
      // draws the structural zero indicator first and then a Poisson
      // sample for the non-zero branch (Lambert 1992).
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(0 | eta_mu_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = poisson_log_rng(eta_mu_i);
      }
    }
  • Interpretation: Zero-inflated Poisson model with $K=2$ distributional parameters: $\mu$ (Poisson rate) on the log scale and $\pi$ (zero-inflation probability) on the logit scale.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = \text{logit}(\pi_i)$.
    • Likelihood:
      • If $y_i = 0$: $$\log p(y_i=0) = \log\left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right)$$ Computed via log_sum_exp for numerical stability.
      • If $y_i &gt; 0$: $$\log p(y_i) = \log(1 - \pi_i) + y_i \log \mu_i - \mu_i - \log(y_i!)$$
    • Posterior predictive:
      1. Draw $z_i \sim \text{Bernoulli}(\pi_i)$ (structural zero indicator).
      2. If $z_i = 1$, then $y_i^{\text{rep}} = 0$.
      3. If $z_i = 0$, then $y_i^{\text{rep}} \sim \text{Poisson}(\mu_i)$.
  • AMM Role: Both $\mu_i$ and $\pi_i$ are distributional parameters transformed via log and logit links, respectively.

Family 11: Zero-Inflated Negative Binomial (ZINB)
    } else if (family_id_k[1] == 11) {
      // ZINB K = 3: symmetric to ZIP with neg_binomial_2_log.
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
      }
    }
  • Interpretation: Zero-inflated negative binomial model with $K=3$ distributional parameters: $\mu$ (NB mean) on log scale, $\phi$ (NB dispersion) on log scale, and $\pi$ (zero-inflation probability) on logit scale.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = \log(\phi_i)$, $\eta_{i,3} = \text{logit}(\pi_i)$.
    • Inverse links: $\mu_i = \exp(\eta_{i,1})$, $\phi_i = \exp(\eta_{i,2})$.
    • Likelihood:
      • If $y_i = 0$: $$\log p(y_i=0) = \log\left( \pi_i + (1 - \pi_i) \left(\frac{\phi_i}{\phi_i + \mu_i}\right)^{\phi_i} \right)$$ Computed via log_sum_exp for numerical stability.
      • If $y_i &gt; 0$: $$\log p(y_i) = \log(1 - \pi_i) + \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i)$$
    • Posterior predictive:
      1. Draw $z_i \sim \text{Bernoulli}(\pi_i)$.
      2. If $z_i = 1$, then $y_i^{\text{rep}} = 0$.
      3. If $z_i = 0$, then $y_i^{\text{rep}} \sim \text{NB2}(\mu_i, \phi_i)$.
  • AMM Role: All three parameters are distributional and transformed via exponential and logit links.

Family 12: Hurdle-Poisson
    } else if (family_id_k[1] == 12) {
      // Hurdle-Poisson K = 2: log_lik mirrors the model-block
      // dispatch. y_pred uses rejection sampling on Poisson(mu) to
      // realize the truncated-at-one positive branch (Mullahy 1986);
      // bounded at 10000 iterations for numerical safety when mu is
      // pathologically small (in which case the family is borderline
      // identifiable in the first place).
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i)
                   - log1m_exp(-exp(eta_mu_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = poisson_log_rng(eta_mu_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }
  • Interpretation: Hurdle-Poisson model with $K=2$ distributional parameters: $\mu$ (Poisson rate) on log scale and $\pi$ (hurdle probability) on logit scale.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = \text{logit}(\pi_i)$.
    • Likelihood:
      • If $y_i = 0$: $$\log p(y_i=0) = \log \pi_i$$
      • If $y_i &gt; 0$: $$\log p(y_i) = \log(1 - \pi_i) + y_i \log \mu_i - \mu_i - \log(y_i!) - \log(1 - e^{-\mu_i})$$ The term $\log(1 - e^{-\mu_i})$ normalizes the truncated Poisson distribution.
    • Posterior predictive:
      1. Draw $z_i \sim \text{Bernoulli}(\pi_i)$ (hurdle indicator).
      2. If $z_i = 1$, then $y_i^{\text{rep}} = 0$.
      3. If $z_i = 0$, then $y_i^{\text{rep}}$ is drawn from a Poisson($\mu_i$) truncated to positive integers. This is done via rejection sampling: repeatedly draw from Poisson($\mu_i$) until a non-zero value is obtained, with a maximum of 10000 iterations to avoid infinite loops when $\mu_i$ is extremely small.
  • AMM Role: Both parameters are distributional and transformed via log and logit links.

Family 13: Hurdle-Negative Binomial
    } else if (family_id_k[1] == 13) {
      // Hurdle-NB K = 3: symmetric to Hurdle-Poisson with
      // neg_binomial_2_log on the truncated positive branch.
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i)
                   - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
                                                       phi_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }
  • Interpretation: Hurdle-negative binomial model with $K=3$ distributional parameters: $\mu$ (NB mean) on log scale, $\phi$ (NB dispersion) on log scale, and $\pi$ (hurdle probability) on logit scale.
  • Mathematics:
    • Linear predictors: $\eta_{i,1} = \log(\mu_i)$, $\eta_{i,2} = \log(\phi_i)$, $\eta_{i,3} = \text{logit}(\pi_i)$.
    • Inverse links: $\mu_i = \exp(\eta_{i,1})$, $\phi_i = \exp(\eta_{i,2})$.
    • Likelihood:
      • If $y_i = 0$: $$\log p(y_i=0) = \log \pi_i$$
      • If $y_i &gt; 0$: $$\log p(y_i) = \log(1 - \pi_i) + \log \Gamma(y_i + \phi_i) - \log \Gamma(\phi_i) - \log \Gamma(y_i + 1) + \phi_i \log \phi_i - \phi_i \log(\phi_i + \mu_i) + y_i \log \mu_i - y_i \log(\phi_i + \mu_i) - \log\left(1 - \left(\frac{\phi_i}{\phi_i + \mu_i}\right)^{\phi_i}\right)$$ The last term normalizes the truncated negative binomial distribution.
    • Posterior predictive:
      1. Draw $z_i \sim \text{Bernoulli}(\pi_i)$.
      2. If $z_i = 1$, then $y_i^{\text{rep}} = 0$.
      3. If $z_i = 0$, then $y_i^{\text{rep}}$ is drawn from a NB2($\mu_i, \phi_i$) truncated to positive integers via rejection sampling (up to 10000 iterations).
  • AMM Role: All three parameters are distributional and transformed via exponential and logit links.

Default Case (Unsupported Family)
    } else {
      log_lik[i] = negative_infinity();
      y_pred[i] = 0;
    }
  • Interpretation: If family_id_k[1] does not match any of the implemented families (1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13), the log-likelihood is set to $-\infty$ and the posterior predictive draw is set to 0. This is a fallback to avoid undefined behavior.
  • Mathematics: $\log p(y_i \mid \theta_i) = -\infty$, $y_i^{\text{rep}} = 0$.
  • AMM Role: Ensures the code does not crash for unsupported families, but the model should never reach this point if family_id_k is correctly specified.

3. End of Block

  }
}

Closes the for loop and the generated quantities block.


Summary of AMM Reference-Anchoring Realization

The generated quantities block realizes the AMM reference-anchoring decomposition by:

  1. Using the linear predictors eta_k that are computed in the transformed parameters block from the reference-anchored parameterization.
  2. Applying appropriate inverse link functions to transform linear predictors to distribution parameters (e.g., $\exp$ for log links, $\text{logit}^{-1}$ for logit links).
  3. Implementing the likelihood for each supported distributional family, mirroring the model block's likelihood dispatcher.
  4. Generating posterior predictive draws that respect the distributional form, including zero-inflation and hurdle mechanisms.
  5. Providing pointwise log-likelihoods for model comparison, which are essential for evaluating the reference-anchored model's fit.

This block thus completes the computational pipeline for the AMM model with distributional regression for $K \geq 1$ parameters.

inst/stan/_canonical_pieces/amm_canonical_eb_conditional_K.stan

inst/stan/_canonical_pieces/amm_canonical_eb_conditional_K.stan — Section 1 of 3

0. Header comment block (lines 1–87)

The header is a dense specification document. It fixes the following invariants that govern the rest of the file:

0.1 Provenance and role

  • Path 1 — Empirical Bayes (EB) Conditional Template, Path B, canonized as decision D34 in Session 10 (2026-05-25), sub-phase 8.6.C.
  • Invoked from R/eb.R through cmdstanr::cmdstan_model(...)$sample(...) at Step (iii) of the EB workflow, for the regime $K \geq 2$, $p = 1$ (single design column, $K$ distributional slots).

0.2 Structural delta versus amm_eb_marginal_K.stan

Three changes define the conditional (as opposed to marginal) EB template:

  1. theta_ref_k is relocated from the parameters{} block to the data{} block and renamed theta_ref_k_data, making the EB plug-in $\widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}}$ explicit.
  2. The per-slot hyperparameters $\mu_{\theta_{\mathrm{ref},k}}$ and $\sigma_{\theta_{\mathrm{ref},k}}$ are removed entirely — they existed only to endow $\theta_{\mathrm{ref},k}$ with a prior in the full-Bayes (FB) and EB-marginal regimes.
  3. The THETA_REF_PRIOR_BLOCK substitution is deleted from model{} because $\theta_{\mathrm{ref},k}$ is now a fixed data input.

Everything else is asserted to be bit-identical to amm_eb_marginal_K.stan, so that the conditional posterior $p(\xi \mid y, \widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}})$ sampled here matches the conditional posterior that the EB asymptotic theory of documents v07 / v07b analyzes. No extra modelling assumption is interposed between Steps (i) and (iii).

0.3 Likelihood specification

Distributional regression with canonical links per slot. The canonical AMM linear predictor for slot $k$ is

$$ \eta_k[i] ;=; \theta_{\mathrm{ref},k} ;+; a_k(x_i) ;+; b_k(x_i),\theta_{\mathrm{ref},k} ;+; \bigl(W(\theta_{\mathrm{ref},k}) - W(\theta_{\mathrm{anchor},k})\bigr),x_i, $$

where $a_k(\cdot)$ is a basis expansion of dimension $J_a^{(k)}$, $b_k(\cdot)$ is a basis expansion of dimension $J_b^{(k)}$, and $W(\cdot)$ is a globally shared modulating function. The family-specific wiring is:

Family ID Family Slot 1 (location) link Slot 2+ link $K$
1 Gaussian $\mu$ identity $\sigma$ log 2
3 neg_binomial_2 $\mu$ log $\phi$ log 2
5 Beta $\mu$ logit $\phi$ log 2
6 Gamma $\mu$ log shape log 2
7 lognormal_loc_scale $\mu$ identity $\sigma$ log 2
8 Student-$t$ $\mu$ identity $\sigma$ log, $\nu$ log 3
9 Tweedie $\mu$ log $\phi$ log, $p$ identity 3
10 ZIP $\mu$ log $\pi$ logit 2
11 ZINB $\mu$ log $\phi$ log, $\pi$ logit 3
12 hurdle_poisson $\mu$ log $\pi$ logit 2
13 hurdle_neg_binomial_2 $\mu$ log $\phi$ log, $\pi$ logit 3

Concrete likelihood forms cited:

  • Gaussian $K=2$: $y_i \sim \mathrm{Normal}(\eta_\mu[i],, \exp(\eta_\sigma[i]))$, with $\eta_\mu$ on the identity link and $\eta_\sigma$ on the log link.
  • neg_binomial_2 $K=2$: neg_binomial_2_log(y | eta_mu, exp(eta_phi)).
  • Beta $K=2$: beta_proportion_lpdf(y_real | inv_logit(eta_mu), exp(eta_phi)).
  • Gamma $K=2$: gamma_lpdf(y_real | exp(eta_shape), exp(eta_shape) / exp(eta_mu)) — shape–mean parametrization.
  • Student-$t$ $K=3$: student_t_lpdf(y_real | exp(eta_nu), eta_mu, exp(eta_sigma)).
  • Tweedie $K=3$: tweedie_lpdf(y_real | exp(eta_mu), exp(eta_phi), eta_p) with the hybrid Dunn–Smyth lpdf defined in the functions{} block.

Continuous hurdle families (hurdle_gamma, hurdle_lognormal) and heterogeneous family/link per slot are deferred to sub-phases 8.3.7+.

0.4 Scope of $W$

$W$ is globally shared across all $K$ slots: a single W_raw, a single sigma_W, and a single basis matrix feed the modulating term in every slot. Per-slot or location-only $W$ is registered as a candidate decision for Session 8.4.

0.5 Scope of $\theta_{\mathrm{anchor}}$

Per-slot: each slot $k$ carries its own scalar $\theta_{\mathrm{anchor},k}$, defaulting to 0. The R-side input is a named numeric vector whose names match the slot names of the gdpar_formula_set (e.g. theta_anchor = c(mu = 0, sigma = 0)).

0.6 Internal sampling parametrization

  • $a$: configurable CP/NCP per slot via segment() priors. Three placeholders (DATA_CP_A_PER_K_DECL, TP_A_BLOCK, MODEL_A_BLOCK) are resolved by generate_a_blocks_K() from a length-$K$ logical vector cp_a_per_K.
  • $W$: configurable CP/NCP globally via placeholders W_SCALE and W_PRIOR.
  • $b$: linear reparametrization per slot. Define $c_{b,k} = \theta_{\mathrm{ref},k} \cdot b_{\mathrm{coef},k}$. The model samples $c_{b,k,\mathrm{raw}}$ directly (NCP on $c_{b,k,\mathrm{raw}}$) and reports $b_{\mathrm{coef},k}$ as a derived quantity in the single-anchor regime. This eliminates the bimodality of $(\theta_{\mathrm{ref},k}, b_k)$ — Recovery 2, handoff 4 addendum, 2026-05-09.

0.7 Centering conventions

Column-wise centering of $Z_{a,k}$ and $Z_{b,k}$ is enforced R-side by build_amm_design_K(). Conditions (C2) and (C3) of Block 1 hold identically per slot; (C4) holds per slot via $W(\theta_{\mathrm{ref},k}) - W(\theta_{\mathrm{anchor},K,k})$.


1. functions { ... } block

1.1 tweedie_log_W_series

real tweedie_log_W_series(real y, real phi, real p) {
  real alpha_pos = (2.0 - p) / (p - 1.0);
  real log_z = alpha_pos * log(y)
               - alpha_pos * log(p - 1.0)
               - (1.0 + alpha_pos) * log(phi)
               - log(2.0 - p);
  real log_W = negative_infinity();
  real max_log_W = negative_infinity();
  int max_iter = 1000;
  int passed_peak = 0;
  for (j_loop in 1:max_iter) {
    real j = j_loop;
    real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);
    if (term > max_log_W) {
      max_log_W = term;
    } else {
      passed_peak = 1;
    }
    log_W = log_sum_exp(log_W, term);
    if (passed_peak == 1 && term < max_log_W - 37.0) {
      break;
    }
  }
  return log_W;
}

Mathematics. For the Tweedie compound Poisson–gamma model with $1 &lt; p &lt; 2$, the Dunn–Smyth (2005) infinite-series expansion of the density at $y &gt; 0$ involves

$$ W(y, \phi, p) ;=; \sum_{j=1}^{\infty} W_j, $$

with summand

$$ W_j ;=; \frac{y^{j\alpha},(p-1)^{-j\alpha},\phi^{-j(1+\alpha)},(2-p)^{-j}}{j!,\Gamma(j\alpha)}, \qquad \alpha ;:=; \frac{2-p}{p-1} ;>; 0. $$

Taking logs,

$$ \log W_j ;=; j,\log z ;-; \log\Gamma(j+1) ;-; \log\Gamma(j\alpha), $$

where

$$ \log z ;=; \alpha\log y ;-; \alpha\log(p-1) ;-; (1+\alpha)\log\phi ;-; \log(2-p). $$

The code computes alpha_pos $= \alpha$, log_z $= \log z$, and then iterates $j = 1, 2, \dots, 1000$. Each term is $\log W_j$. The running log-sum-exp log_W accumulates $\log\sum_j W_j$. Because $W_j$ is unimodal in $j$, the code tracks max_log_W (the peak log-summand) and sets passed_peak = 1 once the terms start decreasing. Termination occurs when a post-peak term falls more than 37 nats below the peak — a relative contribution of $\approx e^{-37} \approx 1\times 10^{-16}$, i.e. machine-epsilon-level truncation.

Return value: $\log W(y, \phi, p)$.

1.2 tweedie_log_f_series

real tweedie_log_f_series(real y, real mu, real phi, real p) {
  real theta = pow(mu, 1.0 - p) / (1.0 - p);
  real kappa = pow(mu, 2.0 - p) / (2.0 - p);
  return (y * theta - kappa) / phi - log(y)
         + tweedie_log_W_series(y, phi, p);
}

Mathematics. The exact Dunn–Smyth log-density for $y &gt; 0$ is

$$ \log f(y \mid \mu, \phi, p) ;=; \frac{y,\theta(\mu) - \kappa(\mu)}{\phi} ;-; \log y ;+; \log W(y, \phi, p), $$

where the Tweedie canonical and cumulant functions are

$$ \theta(\mu) ;=; \frac{\mu^{1-p}}{1-p}, \qquad \kappa(\mu) ;=; \frac{\mu^{2-p}}{2-p}. $$

The code computes theta $= \theta(\mu)$, kappa $= \kappa(\mu)$, and returns the three-term sum. This is the central-region evaluator, used when $|p - 1.5| &lt; \tau$.

1.3 tweedie_log_f_saddlepoint

real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) {
  real deviance = 2.0 * (
    y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p)
    - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p)
  );
  return -0.5 * log(2.0 * pi() * phi * pow(y, p))
         - deviance / (2.0 * phi);
}

Mathematics. Wood (2017, Eq. 3.46) gives the saddlepoint (Lugannani–Rice-style) approximation specialized to the Tweedie exponential dispersion family. The Tweedie deviance is

$$ D(y, \mu) ;=; 2\left[\frac{y,(y^{1-p} - \mu^{1-p})}{1-p} ;-; \frac{y^{2-p} - \mu^{2-p}}{2-p}\right], $$

and the saddlepoint log-density is

$$ \log f_{\mathrm{sp}}(y \mid \mu, \phi, p) ;=; -\tfrac{1}{2}\log!\bigl(2\pi\phi, y^{p}\bigr) ;-; \frac{D(y,\mu)}{2\phi}. $$

The code computes deviance $= D(y, \mu)$ and returns the saddlepoint formula. This evaluator is used outside the central region ($|p - 1.5| \geq \tau$) for gradient stability.

1.4 tweedie_lpdf

real tweedie_lpdf(real y, real mu, real phi, real p) {
  real tau = 0.4;
  real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
  if (y == 0.0) {
    return -lambda;
  }
  if (abs(p - 1.5) < tau) {
    return tweedie_log_f_series(y, mu, phi, p);
  } else {
    return tweedie_log_f_saddlepoint(y, mu, phi, p);
  }
}

Mathematics. This is the public entry point. The point-mass at zero for the compound Poisson–gamma distribution has closed form

$$ \log\Pr(Y = 0) ;=; -\lambda, \qquad \lambda ;=; \frac{\mu^{2-p}}{\phi,(2-p)}, $$

because $N \sim \mathrm{Poisson}(\lambda)$ and $Y = 0 \iff N = 0$. For $y &gt; 0$, the function dispatches:

  • $|p - 1.5| &lt; 0.4$: exact Dunn–Smyth series (tweedie_log_f_series).
  • $|p - 1.5| \geq 0.4$: saddlepoint approximation (tweedie_log_f_saddlepoint).

The threshold $\tau = 0.4$ is the canonical value. The comma-separated argument signature follows the Stan _lpdf convention, so both y ~ tweedie(mu, phi, p) and tweedie_lpdf(y | mu, phi, p) dispatch through this function.

1.5 tweedie_rng

real tweedie_rng(real mu, real phi, real p) {
  real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
  int N = poisson_rng(lambda);
  if (N == 0) {
    return 0.0;
  }
  real shape = (2.0 - p) / (p - 1.0);
  real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));
  return gamma_rng(N * shape, rate);
}

Mathematics. The compound Poisson–gamma representation: $N \sim \mathrm{Poisson}(\lambda)$, $Y = \sum_{i=1}^{N} G_i$, $G_i \sim \mathrm{Gamma}(\text{shape}, \text{rate})$ with

$$ \text{shape} ;=; \frac{2-p}{p-1} ;=; \alpha, \qquad \text{rate} ;=; \frac{1}{\phi,(p-1),\mu^{p-1}}. $$

If $N = 0$, $Y = 0$. Otherwise, the sum of $N$ i.i.d. $\mathrm{Gamma}(\alpha, \text{rate})$ variates is $\mathrm{Gamma}(N\alpha, \text{rate})$, so the code draws a single gamma_rng(N * shape, rate). This is used in generated quantities for posterior-predictive checks.

1.6 apply_inv_link_by_id

real apply_inv_link_by_id(int link_id, real eta) {
  if (link_id == 0) return eta;
  if (link_id == 1) return inv_logit(eta);
  return exp(eta);
}

Mathematics. A dispatch table for canonical inverse links:

link_id Inverse link $g^{-1}(\eta)$
0 identity $\eta$
1 inverse-logit $\frac{1}{1+e^{-\eta}}$
2 exp $e^{\eta}$

This is consumed by the $K = 2$ likelihood branches (location family in ${1, 3, 5, 6, 7}$) to compose the auxiliary slot's transformation under the L1 refined rule (sub-phase 8.3.7). The inv_link_id_per_slot[k] data field is populated R-side by .gdpar_compute_inv_link_id_per_slot. For $K \geq 3$ families (Student-$t$, Tweedie, mixtures), canonical Stan idioms (*_log_lpmf, bernoulli_logit_lpmf) are used instead, and a separate R-side guard rejects heterogeneous location families outside ${1, 3, 5, 6, 7}$.

1.7 {{CANONICAL_HELPERS}}

A placeholder token, to be substituted by R/stan_codegen.R during template instantiation. It is not Stan syntax in the raw template; the codegen step replaces it with additional helper functions as required by the specific family/slot configuration.


2. data { ... } block

2.1 Core dimensions

int<lower=1> n;
int<lower=2> K;
  • n: number of observations, $n \geq 1$.
  • K: number of K-individual distributional slots, $K \geq 2$. Unit 3 fires only when $K &gt; 1$.

2.2 Family identifiers per slot

array[K] int<lower=1, upper=13> family_id_k;

Each slot $k$ carries its own family identifier in ${1, \dots, 13}$. Sub-phase 8.3.7 admits heterogeneous families per slot for $K = 2$ location families ${1, 3, 5, 6, 7}$; for $K \geq 3$ families heterogeneity is deferred. The likelihood block branches on family_id_k[1] (the location slot); auxiliary slots consume inv_link_id_per_slot[k] via apply_inv_link_by_id().

2.3 Inverse-link identifiers per slot

array[K] int<lower=0, upper=2> inv_link_id_per_slot;

Populated R-side by .gdpar_compute_inv_link_id_per_slot under the L1 refined rule (decision D3.5):

  • Homogeneous slot $k$ (family_id_k[k] == family_id_k[1]): the ID reflects the canonical link of slot $k$ of the location family.
  • Heterogeneous slot $k$ (family_id_k[k] != family_id_k[1]): the ID reflects the canonical link of slot 1 of family_id_k[k].

Values: $0$ = identity, $1$ = inv_logit, $2$ = exp. Consumed by the $K = 2$ likelihood branches.

2.4 Component usage flags

array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=1> use_W;
  • use_a_k[k]: whether the $a_k(\cdot)$ basis expansion is active in slot $k$.
  • use_b_k[k]: whether the $b_k(\cdot)$ basis expansion is active in slot $k$.
  • use_W: whether the globally shared modulating component $W$ is active (single flag, since $W$ is shared across all slots).

2.5 Basis dimensions

array[K] int<lower=0> J_a_per_k;
array[K] int<lower=0> J_b_per_k;
int<lower=0> J_a_max;
int<lower=0> J_b_max;
  • J_a_per_k[k] $= J_a^{(k)}$: number of columns in the $a$-basis for slot $k$.
  • J_b_per_k[k] $= J_b^{(k)}$: number of columns in the $b$-basis for slot $k$.
  • J_a_max $= \max_k J_a^{(k)}$: padding width for the flat-packed $a$-design.
  • J_b_max $= \max_k J_b^{(k)}$: padding width for the flat-packed $b$-design.

2.6 Padded per-slot design matrices

array[K] matrix[n, J_a_max] Z_a_k;
array[K] matrix[n, J_b_max] Z_b_k;

$Z_{a,k} \in \mathbb{R}^{n \times J_a^{\max}}$ and $Z_{b,k} \in \mathbb{R}^{n \times J_b^{\max}}$ are the padded design matrices for slot $k$. Coordinate $p = 1$ implies a single design matrix per slot; the ragged column counts $J_a^{(k)} \leq J_a^{\max}$ are handled by the flat-pack / pad-to-max device. Columns beyond $J_a^{(k)}$ are zero-padded and never indexed (the corresponding coefficients are either absent or fixed at zero).

2.7 Global modulating component

int<lower=0> dim_W;
int<lower=0> d;
matrix[n, d] X;
  • dim_W: dimension of the $W$ basis (number of basis functions or spline coefficients).
  • d: number of columns in the global design matrix $X$.
  • X $\in \mathbb{R}^{n \times d}$: the global covariate matrix. In the AMM canonical form, $x_i$ (the scalar appearing in the $W(\theta_{\mathrm{ref},k}) - W(\theta_{\mathrm{anchor},k})$ term) is extracted from $X$; with $p = 1$, $d = 1$ and $X$ is a single column.

2.8 Outcomes

vector[n] y_real;
array[n] int y_int;

Both real and integer storage are allocated because the consumed branch depends on family_id_k[1]. With $p = 1$, the outcome $y$ is the same across all $K$ slots — the $K$ slots parameterize one univariate response. Continuous families (Gaussian, Beta, Gamma, Student-$t$, Tweedie, lognormal) consume y_real; discrete families (neg_binomial_2, ZIP, ZINB, hurdle_poisson, hurdle_neg_binomial_2) consume y_int.

2.9 Theta anchors

vector[K] theta_anchor_K;

$\theta_{\mathrm{anchor},K} \in \mathbb{R}^{K}$: per-slot anchor values. With $p = 1$, one scalar per slot. The R-side packager exposes this as a length-$K$ named vector whose names match the slot order of the gdpar_formula_set. Default is 0 for each slot. These enter the AMM canonical form through the term $W(\theta_{\mathrm{ref},k}) - W(\theta_{\mathrm{anchor},k})$.

2.10 Dispersion flags

array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;

Per-slot dispersion flags carried for inter-slot heterogeneity. For example, Gaussian $K = 2$ has no sigma_y entry because the scale is K-individual; neg_binomial_2 $K = 2$ has no phi entry for the same reason. The flags reproduce the $K = 1$, $p \geq 1$ behaviour on slots whose param_spec retains population scope, and are kept here for future extension to mixed (K-individual + population) family patterns.

2.11 W-type specification (sub-phase 8.3.8)

int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;
  • W_type_id $\in {0, 1, 2}$: selects the type of the modulating function $W$ (e.g. polynomial, spline, other — the exact mapping is defined in amm_main.stan header per sub-phase 8.3.8).
  • W_n_knots_full: number of knots for the $W$ basis (0 if not applicable).
  • W_knots_full: the knot locations, length W_n_knots_full.
  • W_degree: polynomial/spline degree for $W$.

2.12 Grouping structure (Block 6.5)

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;
  • use_groups: whether a hierarchical anchor on $\theta_{\mathrm{ref},k}[g]$ is active.
  • J_groups: number of groups $G$.
  • group_id[i] $\in {1, \dots, G}$: group membership of observation $i$.

The grouping structure is shared across slots: a single $(J_{\mathrm{groups}}, \mathrm{group_id})$ tuple drives the hierarchical anchor on $\theta_{\mathrm{ref},k}[g]$ for every $k$.

2.13 {{DATA_CP_A_PER_K_DECL}}

A placeholder token, substituted by generate_a_blocks_K() from the cp_a_per_K logical vector of length $K$. It injects per-slot CP/NCP declarations for the $a$-block coefficients (e.g. vector[J_a_per_k[k]] a_k_raw for NCP slots, or vector[J_a_per_k[k]] a_k for CP slots, plus any associated scaling declarations).

2.14 Explicit dimension declarations (sub-phase 8.6.B / 8.6.C)

int<lower=1> K_slots;
int<lower=1> p_dim;
  • K_slots: under Path B, K_slots == K. Declared explicitly so that the template body can in principle branch on it, though under Path B it does not.
  • p_dim: under Path B, p_dim == 1. Same rationale.

2.15 EB conditional plug-in

array[J_groups] vector[K] theta_ref_k_data;

This is the central data input of the conditional template. It is a $G \times K$ array of vectors: theta_ref_k_data[g][k] $= \widehat{\theta}_{\mathrm{ref},k}^{\mathrm{EB}}[g]$, the EB plug-in estimate for slot $k$ in group $g$.

Provenance. Populated R-side by .gdpar_eb_maximize_marginal(), which extracts $\widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}}$ from cmdstanr::laplace() applied to amm_eb_marginal_K.stan, projected onto the per-slot anchors. When use_groups == 0, J_groups == 1 and the array has a single element containing the $K$ slot-level plug-ins.

Role in the AMM decomposition. Because $\theta_{\mathrm{ref},k}$ is now fixed data rather than a sampled parameter, the conditional posterior sampled by this template is

$$ p!\bigl(\xi ;\big|; y,; \widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}}\bigr) ;=; p!\bigl(\xi ;\big|; y,; \theta_{\mathrm{ref}} = \widehat{\theta}_{\mathrm{ref}}^{\mathrm{EB}}\bigr), $$

where $\xi$ denotes all remaining free parameters ($a$, $b$, $W$, and any dispersion/shape parameters). This is precisely the conditional posterior that the EB asymptotic theory of v07 / v07b analyzes. The AMM canonical linear predictor for slot $k$, group $g$, observation $i$ becomes

$$ \eta_k[i, g] ;=; \widehat{\theta}_{\mathrm{ref},k}^{\mathrm{EB}}[g] ;+; a_k(x_i) ;+; b_k(x_i),\widehat{\theta}_{\mathrm{ref},k}^{\mathrm{EB}}[g] ;+; \bigl(W(\widehat{\theta}_{\mathrm{ref},k}^{\mathrm{EB}}[g]) - W(\theta_{\mathrm{anchor},k})\bigr),x_i, $$

with no prior on $\widehat{\theta}{\mathrm{ref},k}^{\mathrm{EB}}[g]$ itself — it is a plug-in constant. This is the reference-anchoring decomposition: the reference parameter $\theta{\mathrm{ref},k}$ anchors the linear predictor, and the remaining components $a_k$, $b_k$, $W$ capture deviations, interactions, and modulations around that anchor.

inst/stan/_canonical_pieces/amm_canonical_eb_conditional_K.stan — Section 2 of 3

Scope of this section. The section opens with a lone } that closes the functions block begun in section 1. It then contains four complete top-level blocks — transformed data, parameters, transformed parameters, and model — and terminates mid-model block (the closing brace and any generated quantities belong to section 3). The template is the conditional AMM regime: theta_ref_k is supplied as data, so no hyperparameters mu_theta_ref_k / sigma_theta_ref_k are sampled.


1. transformed data block

1.1 Per-slot free-coefficient bookkeeping

array[K] int J_a_free;
array[K] int J_b_free;
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;

Four integer arrays of length $K$ and two scalar accumulators:

Symbol Meaning
J_a_free[k] Number of free additive coefficients actually sampled for slot $k$. Differs from J_a_per_k[k] when the slot is globally off (use_a_k[k] == 0) or has no a() term at all.
J_b_free[k] Analogous for the multiplicative component.
a_raw_offset[k] Start index of slot $k$'s segment inside the flat-packed vector a_raw.
c_b_raw_offset[k] Start index of slot $k$'s segment inside c_b_k_raw.
total_J_a_free $\sum_{k=1}^{K} J_{a,\text{free}}[k]$; the declared length of a_raw.
total_J_b_free $\sum_{k=1}^{K} J_{b,\text{free}}[k]$; the declared length of c_b_k_raw.

1.2 Effective W dimensions

int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;

When the global modulating component is disabled (use_W == 0), both collapse to $0$, so the W_raw matrix in parameters degenerates to a $0 \times 0$ matrix (zero sampled entries). When enabled, dim_W_eff = dim_W (number of basis functions) and d_eff = d (covariate stride).

1.3 Conjunctive activity flags

int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;
  • any_use_a / any_use_b: OR-reductions of use_a_k / use_b_k across slots. They gate entire prior blocks and the sigma_b_k array length (K * any_use_b evaluates to $0$ when no slot uses the multiplicative component, eliminating the parameter entirely).
  • sigma_W_size: set below to $1$ when W is active, else $0$.
  • sigma_y_size / phi_size: population-scoped dispersion parameter counts, accumulated in the loop.

1.4 The per-slot accumulation loop

for (k in 1:K) {
  J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
  J_b_free[k] = (use_b_k[k] == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
  a_raw_offset[k] = total_J_a_free;
  c_b_raw_offset[k] = total_J_b_free;
  total_J_a_free += J_a_free[k];
  total_J_b_free += J_b_free[k];
  if (use_a_k[k] == 1) any_use_a = 1;
  if (use_b_k[k] == 1) any_use_b = 1;
  sigma_y_size += use_dispersion_y_k[k];
  phi_size += use_dispersion_phi_k[k];
}

For each slot $k \in {1,\dots,K}$:

  1. Free-count gating. $J_{a,\text{free}}[k] = J_{a,\text{per},k}[k]$ iff use_a_k[k] == 1 and $J_{a,\text{per},k}[k] &gt; 0$; otherwise $0$. The conjunctive guard prevents declaring sampled coefficients for a slot whose a() spec is either globally off or empty. Identical logic for $J_{b,\text{free}}[k]$.

  2. Offset registration. Before accumulating, the current running total is stored as slot $k$'s offset. This is the standard flat-packing convention: slot $k$'s raw coefficients occupy indices offset[k] + 1offset[k] + J_free[k] in the packed vector.

  3. Accumulation. total_J_a_free and total_J_b_free grow by the slot's free count.

  4. Flag setting. any_use_a and any_use_b are latched to $1$ on the first active slot (irreversible OR).

  5. Dispersion sizing. sigma_y_size accumulates use_dispersion_y_k[k] (Gaussian $\sigma_y$ slots whose scope is population), and phi_size accumulates use_dispersion_phi_k[k] (NB $\phi$ slots whose scope is population). Slots whose dispersion is K-individual (modeled via their own AMM in another slot) do not contribute here.

1.5 sigma_W sizing

sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;

A single global $\sigma_W$ scalar is declared iff W is active and has at least one basis function.

1.6 sigma_a compaction (Option A, RG.6 / D96)

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
  if (J_a_free[k] > 0) {
    n_sigma_a += 1;
    sigma_a_idx[k] = n_sigma_a;
  } else {
    sigma_a_idx[k] = 0;
  }
}

This is the compaction optimization described in the inline comment. The problem: if sigma_a_k were declared as array[K], a slot with $J_{a,\text{free}}[k] = 0$ (either use_a_k[k] == 0, e.g. a phi ~ 1 slot, or an intercept-only a() fully absorbed into theta_ref) would still sample a half-strict-positive scale parameter with no likelihood contribution — a flat, non-identified direction that stalls NUTS.

The compaction builds:

  • n_sigma_a: the count of slots with $J_{a,\text{free}}[k] &gt; 0$.
  • sigma_a_idx[k]: a remapping from slot index $k$ to the compacted index in sigma_a_k ($1$-based if active, $0$ if inactive — a sentinel).

When every slot carries free additive coefficients, $n_{\sigma_a} = K$ and $\text{sigma_a_idx}[k] = k$ (identity), so the generated model is bit-identical to the uncompacted version. The {{TP_A_BLOCK}} and {{MODEL_A_BLOCK}} injection points (see below) are responsible for using sigma_a_idx[k] to index into the compacted vector.


2. parameters block

// theta_ref_k is now data ... no hyperparameters ... are sampled
// in the conditional regime.

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;

array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;

2.1 Additive component

  • sigma_a_k: compacted per-slot additive scales. Length $n_{\sigma_a} \le K$. Each entry $\sigma_{a,k} &gt; 0$.
  • a_raw: flat-packed standard-normal-scale additive coefficients. Length $\sum_k J_{a,\text{free}}[k]$. The non-centered parametrization is $a_{k,j} = \sigma_{a,k} \cdot a_{\text{raw},k,j}$ (realized in {{TP_A_BLOCK}}).

2.2 Multiplicative component

  • sigma_b_k: per-slot multiplicative scales. Length $K \cdot \mathbb{1}[\text{any_use_b}]$, i.e. $K$ when any slot uses b(), $0$ otherwise. The K * any_use_b idiom makes the array vanish entirely when the multiplicative component is globally off.
  • c_b_k_raw: flat-packed standard-normal-scale coefficients on the $c_b$ scale (not the $b$ scale). Length $\sum_k J_{b,\text{free}}[k]$.

The reference-anchoring reparametrization is:

$$ c_{b,k,j} = \theta_{\text{ref},k} \cdot b_{\text{coef},k,j}, \qquad c_{b,k,j} = \sigma_{b,k} \cdot c_{b,k,j}^{\text{raw}}. $$

The linear predictor uses $c_{b,k,j}$ directly (see §3.4), so the model is identified in $c_b$-space. The $b$-scale recovery $b_{\text{coef},k,j} = c_{b,k,j} / \theta_{\text{ref},k}$ is a derived quantity, computed only in the single-anchor regime (§3.3).

2.3 Global modulating component (W)

  • sigma_W: single scalar $\sigma_W &gt; 0$ (length 1 when active, 0 otherwise).
  • W_raw: $\text{dim_W} \times d$ matrix of raw W coefficients. When W is off, both dimensions are $0$.

2.4 Population-scoped dispersion

  • sigma_y_pop: Gaussian $\sigma_y$ entries for slots whose dispersion scope is population. Length sigma_y_size.
  • phi_pop: NB $\phi$ entries for slots whose dispersion scope is population. Length phi_size.

Slots whose dispersion is K-individual (modeled via their own AMM) do not appear here; their dispersion is carried in eta_k[i, k] for the relevant slot.

2.5 What is not sampled

The comment block makes explicit that in the conditional regime:

  • theta_ref_k is data (passed in as theta_ref_k_data).
  • mu_theta_ref_k and sigma_theta_ref_k are not sampled — they were devices for the marginal/FB regimes only.

This is the defining property of the conditional template: the reference is fixed, and only the deviation components ($a$, $b$/ $c_b$, $W$, $\sigma_*$, $\phi$) are random.


3. transformed parameters block

3.1 Declarations

array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;
  • a_coef_k[k]: length-$J_{a,\max}$ vector of additive coefficients for slot $k$ (padded with zeros beyond $J_{a,\text{per},k}[k]$).
  • c_b_k[k]: length-$J_{b,\max}$ vector of multiplicative coefficients on the $c_b$ scale.
  • b_coef_k[k]: length-$J_{b,\max}$ vector of multiplicative coefficients on the $b$ scale (derived; zero in per-group-anchor regime).
  • eta_k: $n \times K$ linear predictor matrix. Entry $\eta_k[i,k]$ is the linear predictor for observation $i$, slot $k$.

3.2 Zero-initialization

for (k in 1:K) {
  a_coef_k[k] = rep_vector(0, J_a_max);
  c_b_k[k] = rep_vector(0, J_b_max);
  b_coef_k[k] = rep_vector(0, J_b_max);
}

All coefficient vectors are zero-initialized. This ensures that slots/indices that are never written to (inactive slots, or padding indices $&gt; J_{\text{per},k}$) remain exactly zero, contributing nothing to eta_k.

3.3 Additive reconstruction via {{TP_A_BLOCK}}

{{TP_A_BLOCK}}

This is a template injection point. Based on the surrounding context (the compaction logic in transformed data and the prior block {{MODEL_A_BLOCK}}), the injected code reconstructs a_coef_k[k] from a_raw and sigma_a_k using the offsets and sigma_a_idx. The canonical form is, for each active slot $k$:

$$ a_{\text{coef},k,j} = \sigma_{a,,\text{sigma_a_idx}[k]} \cdot a_{\text{raw}}[\text{a_raw_offset}[k] + j], \qquad j = 1, \dots, J_{a,\text{free}}[k]. $$

The sigma_a_idx indirection routes each slot to its compacted scale entry. Inactive slots ($\text{sigma_a_idx}[k] = 0$) are skipped, leaving a_coef_k[k] at zero.

3.4 Multiplicative ($c_b$) reconstruction

if (any_use_b == 1) {
  for (k in 1:K) {
    if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
      for (j in 1:J_b_free[k]) {
        c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
      }
      if (use_groups == 0 && theta_ref_k_data[1][k] != 0) {
        for (j in 1:J_b_per_k[k]) {
          b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k_data[1][k];
        }
      }
    }
  }
}

Outer guard. The entire block is skipped if any_use_b == 0 (no slot uses the multiplicative component).

Per-slot guard. For slot $k$: only if use_b_k[k] == 1 and $J_{b,\text{per},k}[k] &gt; 0$.

$c_b$ reconstruction. For $j = 1, \dots, J_{b,\text{free}}[k]$:

$$ c_{b,k,j} = \sigma_{b,k} \cdot c_{b,k,j}^{\text{raw}}. $$

Note: sigma_b_k[k] is indexed directly by $k$ (not via a compaction map), because sigma_b_k has length $K$ when active — every slot gets a scale, even if some are unused (the unused ones have a prior but no likelihood contribution; the comment in §1.6 explains this is acceptable for sigma_b because the K * any_use_b gate already eliminates the entire array when no slot uses b(), and within-active-slot zero counts are handled by the J_b_free[k] loop bound).

$b$-scale recovery (single-anchor only). The comment is explicit:

b_coef_k[k] = c_b_k[k] / theta_ref_k[1][k] is reported as a derived quantity only in the single-anchor regime. With per-group anchors b_coef_k stays at zero; downstream callers derive it per group from c_b_k and theta_ref_k[g][k].

The guard use_groups == 0 && theta_ref_k_data[1][k] != 0 ensures:

  1. Single-anchor regime (no per-group anchors).
  2. The reference is nonzero (avoiding division by zero).

When the guard passes, for $j = 1, \dots, J_{b,\text{per},k}[k]$ (note: this uses J_b_per_k, not J_b_free, but since the guard requires J_b_per_k[k] > 0 and use_b_k[k] == 1, we have $J_{b,\text{free}}[k] = J_{b,\text{per},k}[k]$ here):

$$ b_{\text{coef},k,j} = \frac{c_{b,k,j}}{\theta_{\text{ref},k}^{(1)}}, $$

where $\theta_{\text{ref},k}^{(1)}$ is theta_ref_k_data[1][k] — the reference for group 1 (the single anchor). This is the inverse of the reference-anchoring reparametrization $c_b = \theta_{\text{ref}} \cdot b$.

3.5 Linear predictor construction

for (i in 1:n) {
  int g_i = group_id[i];
  for (k in 1:K) {
    real theta_ref_ik = theta_ref_k_data[g_i][k];
    real eta_ik = theta_ref_ik;
    if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
      eta_ik += dot_product(
        to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]),
        a_coef_k[k][1:J_a_per_k[k]]
      );
    }
    if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
      eta_ik += dot_product(
        to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]),
        c_b_k[k][1:J_b_per_k[k]]
      );
    }
    if (use_W == 1 && dim_W > 0 && d > 0) {
      // ... W modulation ...
    }
    eta_k[i, k] = eta_ik;
  }
}

For each observation $i$ and slot $k$:

Step 1: Group lookup. g_i = group_id[i] determines which group's reference applies to observation $i$.

Step 2: Reference anchor. $\theta_{\text{ref},i,k} = \text{theta_ref_k_data}[g_i][k]$. The linear predictor is initialized to the reference:

$$ \eta_{i,k} = \theta_{\text{ref},i,k}. $$

This is the anchor point of the AMM decomposition: all deviations are additive perturbations on top of $\theta_{\text{ref},i,k}$.

Step 3: Additive deviation. If slot $k$ uses the additive component and has at least one a() coefficient:

$$ \eta_{i,k} \mathrel{+}= \sum_{j=1}^{J_{a,\text{per},k}[k]} Z_{a,k}[i,j] \cdot a_{\text{coef},k,j}. $$

The design matrix slice Z_a_k[k][i, 1:J_a_per_k[k]] is cast to a vector and dotted with the coefficient slice. This is the additive smooth term: a linear deviation from the reference, unconstrained in sign.

Step 4: Multiplicative deviation (on $c_b$ scale). If slot $k$ uses the multiplicative component:

$$ \eta_{i,k} \mathrel{+}= \sum_{j=1}^{J_{b,\text{per},k}[k]} Z_{b,k}[i,j] \cdot c_{b,k,j} = \sum_{j=1}^{J_{b,\text{per},k}[k]} Z_{b,k}[i,j] \cdot \theta_{\text{ref},k} \cdot b_{\text{coef},k,j}. $$

This is the reference-anchored multiplicative term. The key identity: because $c_{b,k,j} = \theta_{\text{ref},k} \cdot b_{\text{coef},k,j}$, the contribution $Z_b \cdot c_b$ equals $Z_b \cdot \theta_{\text{ref}} \cdot b$, which is the multiplicative interaction $b \cdot \theta_{\text{ref}}$ modulated by the design $Z_b$. The anchoring ensures the term vanishes when $b = 0$ (no multiplicative effect) and scales proportionally with $\theta_{\text{ref}}$.

Step 5: W modulation. Guarded by use_W == 1 && dim_W > 0 && d > 0:

vector[d] W_diff_x_k = rep_vector(0, d);
vector[dim_W] basis_diff_k = apply_W_basis_diff(
  W_type_id, theta_ref_ik, theta_anchor_K[k], dim_W,
  W_degree, W_n_knots_full, W_knots_full
);
for (jj_local in 1:dim_W) {
  for (jjj in 1:d) {
    W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
  }
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));

The W component is a basis-function difference modulated by covariates:

  1. basis_diff_k = apply_W_basis_diff(W_type_id, theta_ref_ik, theta_anchor_K[k], ...): computes the difference of the W basis evaluated at the current reference $\theta_{\text{ref},i,k}$ minus the basis at the anchor $\theta_{\text{anchor},K}[k]$. For W_type_id == 1 (polynomial) or W_type_id == 2 (B-spline), this is a vector of length dim_W:

$$ \Delta_k = B(\theta_{\text{ref},i,k}) - B(\theta_{\text{anchor},k}), $$

where $B(\cdot) \in \mathbb{R}^{\text{dim_W}}$ is the basis vector. The anchoring at $\theta_{\text{anchor},k}$ ensures the W modulation is zero when $\theta_{\text{ref},i,k} = \theta_{\text{anchor},k}$ — the reference-anchoring property.

  1. The double loop computes $W_{\text{diff},x,k} = \Delta_k^\top W_{\text{raw}}$ (a length-$d$ vector), where $W_{\text{raw}} \in \mathbb{R}^{\text{dim_W} \times d}$. The {{W_SCALE}} injection point multiplies each entry by $\sigma_W$ (or a function thereof), implementing the non-centered parametrization:

$$ W_{\text{diff},x,k,j} = \sum_{\ell=1}^{\text{dim_W}} \Delta_{k,\ell} \cdot W_{\text{raw}}[\ell, j] \cdot (\text{W_SCALE}), \qquad j = 1, \dots, d. $$

  1. The final contribution is the dot product with the covariate vector $X_i$:

$$ \eta_{i,k} \mathrel{+}= \sum_{j=1}^{d} W_{\text{diff},x,k,j} \cdot X_{i,j} = \sum_{j=1}^{d} \left( \sum_{\ell=1}^{\text{dim_W}} \Delta_{k,\ell} , W_{\text{raw}}[\ell,j] \cdot \text{W_SCALE} \right) X_{i,j}. $$

The inline comment notes that for $K &gt; 1 + p = 1$ the basis is univariate (per-slot scalar input $\theta_{\text{ref},i,k}$), so W_raw has dim_W rows (not dim_W * p as in the multivariate amm_distrib_multi.stan template). The basis dispatch via apply_W_basis_diff() supports W_type_id $\in {1, 2}$ uniformly across all $K$ slots (sub-phase 8.3.8).

Full AMM decomposition. Combining all terms:

$$ \boxed{ \eta_{i,k} = \theta_{\text{ref},i,k} + \underbrace{Z_{a,k}[i]^\top a_{\text{coef},k}}_{\text{additive}} + \underbrace{Z_{b,k}[i]^\top c_{b,k}}_{\text{multiplicative (anchored)}} + \underbrace{X_i^\top \Delta_k^\top W_{\text{raw}} \cdot \text{W_SCALE}}_{\text{W modulation (anchored)}} } $$

where $c_{b,k} = \theta_{\text{ref},k} \cdot b_{\text{coef},k}$ and $\Delta_k = B(\theta_{\text{ref},i,k}) - B(\theta_{\text{anchor},k})$.


4. model block

4.1 Preamble comment

// No prior on theta_ref_k / mu_theta_ref_k / sigma_theta_ref_k:
// theta_ref_k is data in the conditional template. xi components
// (a, b, W, sigma_*, phi) retain their canonical priors.

Explicit statement: only the deviation components receive priors. The reference is fixed data.

4.2 Additive priors

if (any_use_a == 1) {
  sigma_a_k ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}
  • sigma_a_k ~ {{PRIOR_SIGMA_A}}: the per-slot additive scales receive a template-injected prior (typically half-normal, half-Cauchy, or exponential). The prior is on the compacted vector of length $n_{\sigma_a}$.
  • {{MODEL_A_BLOCK}}: injected prior on a_raw. The canonical non-centered form is a_raw ~ normal(0, 1), giving $a_{k,j} \sim \text{Normal}(0, \sigma_{a,k}^2)$ marginally. This block may also contain slot-specific prior modifications.

The entire block is gated by any_use_a == 1, so no prior is evaluated when the additive component is globally off.

4.3 Multiplicative priors

if (any_use_b == 1) {
  sigma_b_k ~ {{PRIOR_SIGMA_B}};
  c_b_k_raw ~ normal(0, 1);
}
  • sigma_b_k ~ {{PRIOR_SIGMA_B}}: template-injected prior on the $K$ multiplicative scales.
  • c_b_k_raw ~ normal(0, 1): hardcoded standard normal on the raw $c_b$ coefficients. This is the non-centered parametrization: $c_{b,k,j}^{\text{raw}} \sim \text{Normal}(0,1)$ implies $c_{b,k,j} \sim \text{Normal}(0, \sigma_{b,k}^2)$.

Note the asymmetry with the additive block: a_raw's prior is injected via {{MODEL_A_BLOCK}} (allowing customization), while c_b_k_raw's prior is hardcoded normal(0, 1).

4.4 W priors

if (use_W == 1 && dim_W > 0) {
  sigma_W[1] ~ {{PRIOR_SIGMA_W}};
  to_vector(W_raw) ~ {{W_PRIOR}};
}
  • sigma_W[1] ~ {{PRIOR_SIGMA_W}}: prior on the single global W scale.
  • to_vector(W_raw) ~ {{W_PRIOR}}: template-injected prior on the vectorized W matrix. The canonical form is normal(0, 1) (non-centered), with the scaling by $\sigma_W$ applied in transformed parameters via {{W_SCALE}}.

4.5 Dispersion priors

if (sigma_y_size > 0) {
  sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
  phi_pop ~ {{PRIOR_PHI}};
}

Population-scoped dispersion parameters receive template-injected priors, but only if the respective count is positive. These are the dispersion parameters that are not modeled via their own AMM slot (those are captured in eta_k).

4.6 Likelihood dispatch

The likelihood is dispatched on family_id_k[1] — the family identifier of slot 1. The inline comment explains:

The homogeneous-family assumption (family_id_k all equal) holds in Unit 3; family_id_k[1] drives the dispatch.

The comment also notes that family_id_k[1] in ${2, 4}$ (Poisson, Bernoulli) are $K=1$ families and are never routed here (the R-side guard in families.R ensures promotion to per_observation requires the slot name to be eligible).

4.6.1 Gaussian ($K=2$, family_id_k[1] == 1)

for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}

$$ \sigma_i = g^{-1}_{\text{slot 2}}(\eta_{i,2}), \qquad y_i \sim \text{Normal}(\eta_{i,1},; \sigma_i). $$

  • Slot 1 (location): identity link, $\mu_i = \eta_{i,1}$.
  • Slot 2 (scale): inverse link applied via apply_inv_link_by_id(inv_link_id_per_slot[2], ...). Under homogeneity (family_id_k[2] == 1), inv_link_id_per_slot[2] == 2exp, giving $\sigma_i = e^{\eta_{i,2}} &gt; 0$ (canonical log link for Gaussian $\sigma$). Under heterogeneity (e.g. Beta in slot 2), the link follows the heterogeneous family's slot-1 inverse link (e.g. inv_logit for Beta, giving $\sigma \in (0,1)$).

4.6.2 Negative Binomial ($K=2$, family_id_k[1] == 3)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}

$$ \mu_i = e^{\eta_{i,1}}, \qquad \phi_i = g^{-1}_{\text{slot 2}}(\eta_{i,2}), \qquad y_i \sim \text{NegBinomial2}(\mu_i,; \phi_i). $$

  • Slot 1 (mean): log link, $\mu_i = e^{\eta_{i,1}}$.
  • Slot 2 (dispersion): inverse link applied. Under homogeneity, $\phi_i = e^{\eta_{i,2}}$ (log link), reproducing the legacy neg_binomial_2_log(eta_mu, exp(eta_phi)) idiom.

The NB2 parametrization has $\text{Var}(y) = \mu + \mu^2/\phi$.

4.6.3 Beta ($K=2$, family_id_k[1] == 5)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}

$$ \mu_i = \text{logit}^{-1}(\eta_{i,1}), \qquad \phi_i = g^{-1}_{\text{slot 2}}(\eta_{i,2}), \qquad y_i \sim \text{BetaProportion}(\mu_i,; \phi_i). $$

Canonical mean-precision parametrization: $\mu \in (0,1)$ via logit, $\phi &gt; 0$ via log (under homogeneity). Stan's beta_proportion(mu, phi) has density proportional to $y^{\mu\phi - 1}(1-y)^{(1-\mu)\phi - 1}$.

4.6.4 Gamma ($K=2$, family_id_k[1] == 6)

for (i in 1:n) {
  real mu_i    = exp(eta_k[i, 1]);
  real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}

$$ \mu_i = e^{\eta_{i,1}}, \qquad \alpha_i = g^{-1}_{\text{slot 2}}(\eta_{i,2}), \qquad \beta_i = \frac{\alpha_i}{\mu_i}, \qquad y_i \sim \text{Gamma}(\alpha_i,; \beta_i). $$

Mean-shape parametrization: Stan's gamma(alpha, beta) has mean $\alpha/\beta$. Setting $\beta = \alpha/\mu$ gives $\mathbb{E}[y] = \mu$. Slot 1 carries $\log\mu$ (log link), slot 2 carries $\log\alpha$ (log link under homogeneity).

4.6.5 Lognormal ($K=2$, family_id_k[1] == 7)

for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}

$$ \sigma_i = g^{-1}_{\text{slot 2}}(\eta_{i,2}), \qquad \log y_i \sim \text{Normal}(\eta_{i,1},; \sigma_i). $$

Canonical $(\mu \text{ identity}, \log\sigma)$ on the log-scale of $y$. Slot 1 is the log-mean (identity link), slot 2 is $\log\sigma$ (log link under homogeneity → $\sigma = e^{\eta_{i,2}}$).

4.6.6 Student-$t$ ($K=3$, family_id_k[1] == 8)

for (i in 1:n) {
  y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}

$$ \nu_i = e^{\eta_{i,3}}, \qquad \mu_i = \eta_{i,1}, \qquad \sigma_i = e^{\eta_{i,2}}, \qquad y_i \sim \text{Student-}t(\nu_i,; \mu_i,; \sigma_i). $$

Canonical $(\mu \text{ identity}, \log\sigma, \log\nu)$:

  • Slot 1: location, identity link.
  • Slot 2: $\log\sigma$$\sigma = e^{\eta_{i,2}}$.
  • Slot 3: $\log\nu$$\nu = e^{\eta_{i,3}}$ (positive real degrees of freedom).

Note: no apply_inv_link_by_id dispatch here — the links are hardcoded as exp for slots 2 and 3, identity for slot 1.

4.6.7 Tweedie ($K=3$, family_id_k[1] == 9)

for (i in 1:n) {
  y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}

$$ \mu_i = e^{\eta_{i,1}}, \qquad \phi_i = e^{\eta_{i,2}}, \qquad p_i = \eta_{i,3}, \qquad y_i \sim \text{Tweedie}(\mu_i,; \phi_i,; p_i). $$

Canonical $(\log\mu, \log\phi, p \text{ identity})$:

  • Slot 1: $\log\mu$ (log link).
  • Slot 2: $\log\phi$ (log link).
  • Slot 3: $p$ on identity link, structurally bounded to $(1.01, 1.99)$ by a slot-specific prior emitted R-side (decision E7 of D5). The bounds ensure the Tweedie is in the compound-Poisson range ($1 &lt; p &lt; 2$).

The custom tweedie_lpdf (defined in section 1) dispatches Dunn–Smyth series or saddlepoint depending on $|p - 1.5| &lt; 0.4$ (hybrid boundary).

4.6.8 Zero-Inflated Poisson ($K=2$, family_id_k[1] == 10)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i) + poisson_log_lpmf(0 | eta_mu_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(y_int[i] | eta_mu_i);
  }
}

Canonical $(\log\mu, \text{logit},\pi)$ parametrization (Lambert 1992):

$$ P(y=0) = \pi + (1-\pi) \cdot \text{Poisson}(0 \mid \mu), \qquad P(y>0) = (1-\pi) \cdot \text{Poisson}(y \mid \mu). $$

  • Slot 1: $\eta_\mu = \log\mu$ (log link, consumed by poisson_log_lpmf).
  • Slot 2: $\eta_\pi = \text{logit},\pi$ (logit link, consumed by bernoulli_logit_lpmf).

$y=0$ branch. The log-likelihood is $\log!\big[\pi + (1-\pi) e^{-\mu}\big]$, computed via log_sum_exp of the two components:

  • Structural zero: bernoulli_logit_lpmf(1 | eta_pi) $= \log\pi$.
  • Sampling zero: bernoulli_logit_lpmf(0 | eta_pi) + poisson_log_lpmf(0 | eta_mu) $= \log(1-\pi) + (-\mu)$.

$y&gt;0$ branch. $\log(1-\pi) + \log\text{Poisson}(y \mid \mu)$, computed directly.

4.6.9 Zero-Inflated NB ($K=3$, family_id_k[1] == 11)

for (i in 1:n) {
  real eta_mu_i  = eta_k[i, 1];
  real phi_i     = exp(eta_k[i, 2]);
  real eta_pi_i  = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i) + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
  }
}

Canonical $(\log\mu, \log\phi, \text{logit},\pi)$ (Greene 1994):

$$ P(y=0) = \pi + (1-\pi) \cdot \text{NegBinomial2}(0 \mid \mu, \phi), \qquad P(y>0) = (1-\pi) \cdot \text{NegBinomial2}(y \mid \mu, \phi). $$

  • Slot 1: $\eta_\mu = \log\mu$.
  • Slot 2: $\eta_\phi = \log\phi$$\phi = e^{\eta_{i,2}}$ (hardcoded exp, no apply_inv_link_by_id).
  • Slot 3: $\eta_\pi = \text{logit},\pi$.

Structurally identical to ZIP but with neg_binomial_2_log_lpmf replacing poisson_log_lpmf.

4.6.10 Hurdle-Poisson ($K=2$, family_id_k[1] == 12)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(y_int[i] | eta_mu_i)
            - log1m_exp(-exp(eta_mu_i));
  }
}

Canonical $(\log\mu, \text{logit},\pi)$ (Mullahy 1986):

$$ P(y=0) = \pi, \qquad P(y>0) = (1-\pi) \cdot \frac{\text{Poisson}(y \mid \mu)}{1 - \text{Poisson}(0 \mid \mu)}. $$

Distinct from ZIP: the zero is a structural decision (Bernoulli hurdle), and the positive branch is a Poisson truncated at one.

$y=0$ branch. $\log\pi$ via bernoulli_logit_lpmf(1 | eta_pi).

$y&gt;0$ branch. $\log(1-\pi) + \log\text{Poisson}(y \mid \mu) - \log!\big[1 - e^{-\mu}\big]$. The truncation denominator $\log(1 - \text{Poisson}(0 \mid \mu)) = \log(1 - e^{-\mu})$ is computed stably via log1m_exp(-exp(eta_mu_i)):

  • exp(eta_mu_i) $= \mu$.
  • -exp(eta_mu_i) $= -\mu$.
  • exp(-mu) $= e^{-\mu} = \text{Poisson}(0 \mid \mu)$.
  • log1m_exp(-mu) $= \log(1 - e^{-\mu})$.

4.6.11 Hurdle-NB ($K=3$, family_id_k[1] == 13)

for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real phi_i    = exp(eta_k[i, 2]);
  real eta_pi_i = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
            - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
  }
}

Canonical $(\log\mu, \log\phi, \text{logit},\pi)$:

$$ P(y=0) = \pi, \qquad P(y>0) = (1-\pi) \cdot \frac{\text{NegBinomial2}(y \mid \mu, \phi)}{1 - \text{NegBinomial2}(0 \mid \mu, \phi)}. $$

Symmetric to Hurdle-Poisson with NB2 in the positive branch. The truncation denominator is $\log!\big[1 - \text{NegBinomial2}(0 \mid \mu, \phi)\big]$, computed via:

  • neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i) $= \log\text{NegBinomial2}(0 \mid \mu, \phi)$.
  • log1m_exp(...) $= \log(1 - \text{NegBinomial2}(0 \mid \mu, \phi))$.

4.7 Families not dispatched

The trailing comment:

// family_id_k[1] in {2 (poisson), 4 (bernoulli)}: these are
// K = 1 families (single eligible param_spec); the dispatcher
// never routes them here. The guard in R/families.R ensures
// promotion to per_observation requires the slot name to be
// eligible.

Poisson (family_id_k[1] == 2) and Bernoulli (family_id_k[1] == 4) are $K=1$ families — they have a single eligible parameter slot and are handled by a different template (the $K=1$ canonical). The R-side guard in families.R prevents them from being promoted to per_observation scope with $K &gt; 1$, so this dispatcher never encounters them.


5. Summary of the AMM reference-anchoring decomposition

The conditional template realizes the decomposition:

$$ \eta_{i,k} = \underbrace{\theta_{\text{ref},i,k}}_{\text{data (anchor)}} + \underbrace{Z_{a,k}[i]^\top \sigma_{a,k} \odot a_{k}^{\text{raw}}}_{\text{additive deviation}} + \underbrace{Z_{b,k}[i]^\top \sigma_{b,k} \odot c_{b,k}^{\text{raw}}}_{\text{multiplicative deviation (}c_b = \theta_{\text{ref}} \cdot b\text{)}} + \underbrace{X_i^\top \big[B(\theta_{\text{ref},i,k}) - B(\theta_{\text{anchor},k})\big]^\top W_{\text{raw}} \cdot \sigma_W}_{\text{W modulation (anchored at } \theta_{\text{anchor},k}\text{)}} $$

Key properties:

  1. Reference is data. $\theta_{\text{ref},i,k}$ is fixed (passed as theta_ref_k_data), not sampled. This is the defining feature of the conditional regime.

  2. Additive deviation is unconstrained in sign and scale (modulated by $\sigma_{a,k}$).

  3. Multiplicative deviation is parametrized on the $c_b$ scale, where $c_b = \theta_{\text{ref}} \cdot b$. The linear predictor uses $c_b$ directly, so the model is identified in $c_b$-space. The $b$-scale recovery is a derived quantity (single-anchor only).

  4. W modulation is anchored at $\theta_{\text{anchor},k}$: the basis difference $\Delta_k = B(\theta_{\text{ref}}) - B(\theta_{\text{anchor}})$ ensures the modulation vanishes when $\theta_{\text{ref}} = \theta_{\text{anchor}}$.

  5. Non-centered parametrization throughout: $a = \sigma_a \odot a^{\text{raw}}$, $c_b = \sigma_b \odot c_b^{\text{raw}}$, $W = \sigma_W \cdot W^{\text{raw}}$, with $a^{\text{raw}}, c_b^{\text{raw}}, W^{\text{raw}} \sim \text{Normal}(0, 1)$.

  6. Compaction eliminates non-identified scale parameters for slots with no free coefficients (Option A, D96), with bit-identical behavior when all slots are active.

Stan Source Walkthrough — amm_canonical_eb_conditional_K.stan (section 3 of 3)

This section closes the preceding model block (the lone } on the first line) and then supplies the entire generated quantities block. There are no functions, data, transformed data, parameters, or transformed parameters blocks in this section; the analysis below is therefore restricted to the generated quantities block, with cross-references to identifiers that must have been declared in earlier sections (sections 1–2).


Closing brace of the previous block

}

This } terminates the immediately preceding block (the model block from section 2). It is not part of generated quantities; it appears here only because section 2 was truncated mid-block.


generated quantities block — header

generated quantities {

Opens the generated quantities program block. In Stan, code in this block is executed once per HMC/NUTS leapfrog trajectory after the Metropolis accept/reject step, using the accepted parameter draw. Quantities declared here are stored in the posterior and are available for posterior-predictive checks, WAIC/LOO via log_lik, and so on. They do not affect the joint density used for sampling (no target += accumulation here).


Top-level declarations inside generated quantities

  // Per-observation log-likelihood, fitted linear predictors per
  // slot, and posterior-predictive draws. The shape mirrors the
  // K = 1 + p >= 1 template up to the swap of p with K and the
  // distributional-regression likelihood selection.
  vector[n] log_lik;
  matrix[n, K] theta_i_k = eta_k;
  vector[n] y_pred;

Three quantities are declared:

  1. vector[n] log_lik; — A length-$n$ vector that will hold the pointwise log-likelihood $\log p(y_i \mid \eta_{i,\cdot})$ for each observation $i = 1, \ldots, n$. This is the canonical interface for loo/psis-loo/waic in the R ecosystem: each entry is $\log p(y_i \mid \theta_i)$ evaluated at the current posterior draw.

  2. matrix[n, K] theta_i_k = eta_k; — An $n \times K$ matrix that is aliased (copied by value at construction) from eta_k. The comment makes the AMM reference-anchoring role explicit: in the $K = 1 + p$ template (one mean slot plus $p$ distributional-parameter slots), the per-observation predictor matrix is conventionally denoted $\theta_{i,k}$. Here, with $K \geq 1$ slots, the same object is simply eta_k, and theta_i_k is exposed under the canonical name so downstream R code that expects theta_i_k works uniformly across the $K=1+p$ and the general-$K$ parametrizations. Mathematically, $$\theta_{i,k} ;=; \eta_{i,k}, \qquad i = 1,\ldots,n,; k = 1,\ldots,K.$$ No transformation is applied; this is a pure renaming for API compatibility with the reference-anchoring decomposition in which $\eta_{i,k} = x_i^\top \beta_k$ (or, in the AMM anchored form, $\eta_{i,k} = x_i^\top \beta_k + \sum_{r} \lambda_{k,r}, z_{i,r}$ with anchoring constraints absorbed into the prior on $\beta_k$ — see sections 1–2).

  3. vector[n] y_pred; — A length-$n$ vector that will hold one posterior-predictive draw $y_i^{\text{rep}} \sim p(y_i \mid \eta_{i,\cdot})$ per observation, using the _rng counterparts of the same family dispatched in the model block.


Observation loop

  for (i in 1:n) {

Iterates over observations $i = 1, \ldots, n$. Inside the loop, the dispatch is driven by family_id_k[1], i.e. the first element of the integer family-id vector. The AMM convention is that all $K$ slots share a single response family (selected by family_id_k[1]); the remaining elements of family_id_k (if any) are not consulted in this block. The dispatch is a long if / else if chain.


Family 1 — Gaussian (Normal), $K = 2$

    if (family_id_k[1] == 1) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
    }
  • Slot 1 (eta_k[i, 1]) is the mean $\mu_i$ on the identity link: $\mu_i = \eta_{i,1}$.
  • Slot 2 (eta_k[i, 2]) is the standard deviation $\sigma_i$, mapped through the user-selected inverse link $g_2^{-1}$ identified by inv_link_id_per_slot[2]: $$\sigma_i ;=; g_2^{-1}(\eta_{i,2}).$$ Typical choices: exp (log-link on $\sigma$) or softplus/square etc., depending on the integer id resolved by apply_inv_link_by_id.
  • Likelihood: $$\log p(y_i \mid \mu_i, \sigma_i) ;=; -\tfrac{1}{2}\log(2\pi) - \log\sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2},$$ implemented by normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i). The response is the real-valued outcome vector y_real.
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathcal{N}(\mu_i, \sigma_i)$ via normal_rng.

In AMM reference-anchoring terms, $\eta_{i,1}$ and $\eta_{i,2}$ are each linear predictors that may carry anchored reference contrasts; the anchoring is invisible at this layer because it was already absorbed into the construction of eta_k in transformed parameters (sections 1–2).


Family 3 — Negative Binomial II, $K = 2$

    } else if (family_id_k[1] == 3) {
      real mu_i  = exp(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
      y_pred[i]  = neg_binomial_2_rng(mu_i, phi_i);
    }
  • Slot 1: mean on the log link, $\mu_i = \exp(\eta_{i,1})$.
  • Slot 2: precision/reciprocal-dispersion $\phi_i = g_2^{-1}(\eta_{i,2})$ via the user-selected inverse link. The NB2 variance function is $\mathrm{Var}(Y_i) = \mu_i + \mu_i^2/\phi_i$.
  • Likelihood (integer response y_int): $$\log p(y_i \mid \mu_i, \phi_i) ;=; \log\Gamma(y_i + \phi_i) - \log\Gamma(\phi_i) - \log\Gamma(y_i + 1) + \phi_i \log!\left(\frac{\phi_i}{\mu_i + \phi_i}\right) + y_i \log!\left(\frac{\mu_i}{\mu_i + \phi_i}\right).$$
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathrm{NB}_2(\mu_i, \phi_i)$ via neg_binomial_2_rng.

Note the family-id gap: family_id_k[1] == 2 is absent from this dispatch, implying that id 2 is reserved for a family not handled in this template (likely Bernoulli/Binomial handled elsewhere or omitted).


Family 5 — Beta-proportion, $K = 2$

    } else if (family_id_k[1] == 5) {
      real mu_i  = inv_logit(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
      y_pred[i]  = beta_proportion_rng(mu_i, phi_i);
    }
  • Slot 1: mean $\mu_i = \mathrm{logit}^{-1}(\eta_{i,1}) \in (0,1)$.
  • Slot 2: precision $\phi_i = g_2^{-1}(\eta_{i,2}) &gt; 0$.
  • Likelihood (real response in $(0,1)$): the Beta parametrized by mean and precision, $$p(y_i \mid \mu_i, \phi_i) ;=; \frac{\Gamma(\phi_i)}{\Gamma(\mu_i \phi_i),\Gamma((1-\mu_i)\phi_i)}, y_i^{\mu_i\phi_i - 1}(1-y_i)^{(1-\mu_i)\phi_i - 1},$$ with $\mathrm{Var}(Y_i) = \mu_i(1-\mu_i)/(\phi_i + 1)$.
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathrm{BetaProportion}(\mu_i, \phi_i)$.

Family id 4 is skipped (reserved).


Family 6 — Gamma, $K = 2$

    } else if (family_id_k[1] == 6) {
      real mu_i    = exp(eta_k[i, 1]);
      real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
      y_pred[i]  = gamma_rng(shape_i, shape_i / mu_i);
    }
  • Slot 1: mean $\mu_i = \exp(\eta_{i,1})$ (log link).
  • Slot 2: shape $\alpha_i = g_2^{-1}(\eta_{i,2})$.
  • The Stan gamma_lpdf(y | alpha, beta) uses shape–rate parametrization, so the rate is recovered from the mean via $\beta_i = \alpha_i / \mu_i$ (since $\mathbb{E}[Y] = \alpha/\beta$). Hence: $$\log p(y_i \mid \alpha_i, \beta_i) ;=; \alpha_i \log\beta_i - \log\Gamma(\alpha_i) + (\alpha_i - 1)\log y_i - \beta_i y_i, \qquad \beta_i = \alpha_i/\mu_i.$$
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathrm{Gamma}(\alpha_i, \alpha_i/\mu_i)$ via gamma_rng(shape_i, shape_i / mu_i).

Family 7 — Lognormal, $K = 2$

    } else if (family_id_k[1] == 7) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i]  = lognormal_rng(eta_k[i, 1], sigma_i);
    }
  • Slot 1: $\log$-scale mean $\mu_i^{\log} = \eta_{i,1}$ (identity link on the log scale).
  • Slot 2: $\log$-scale SD $\sigma_i = g_2^{-1}(\eta_{i,2})$.
  • Likelihood: $$\log p(y_i \mid \eta_{i,1}, \sigma_i) ;=; -\log y_i - \tfrac{1}{2}\log(2\pi) - \log\sigma_i - \frac{(\log y_i - \eta_{i,1})^2}{2\sigma_i^2}.$$
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathrm{Lognormal}(\eta_{i,1}, \sigma_i)$.

Family 8 — Student-$t$, $K = 3$

    } else if (family_id_k[1] == 8) {
      real nu_i    = exp(eta_k[i, 3]);
      real mu_i    = eta_k[i, 1];
      real sigma_i = exp(eta_k[i, 2]);
      log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
      y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
    }
  • Slot 1: location $\mu_i = \eta_{i,1}$ (identity).
  • Slot 2: scale $\sigma_i = \exp(\eta_{i,2})$ (log link, hard-coded — not routed through apply_inv_link_by_id).
  • Slot 3: degrees of freedom $\nu_i = \exp(\eta_{i,3})$ (log link, hard-coded).
  • Likelihood: $$\log p(y_i \mid \nu_i, \mu_i, \sigma_i) ;=; \log\Gamma!\left(\frac{\nu_i + 1}{2}\right) - \log\Gamma!\left(\frac{\nu_i}{2}\right) - \tfrac{1}{2}\log(\nu_i\pi) - \log\sigma_i - \frac{\nu_i + 1}{2}\log!\left(1 + \frac{1}{\nu_i}\left(\frac{y_i - \mu_i}{\sigma_i}\right)^2\right).$$
  • Posterior predictive: $y_i^{\text{rep}} \sim t_{\nu_i}(\mu_i, \sigma_i)$.

Note the asymmetry with families 1, 3, 5, 6, 7: here both non-location slots use exp directly rather than apply_inv_link_by_id. This is a hard-coded convention of the template for the Student-$t$ family.


Family 9 — Tweedie, $K = 3$

    } else if (family_id_k[1] == 9) {
      real mu_i  = exp(eta_k[i, 1]);
      real phi_i = exp(eta_k[i, 2]);
      real p_i   = eta_k[i, 3];
      log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
      y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
    }
  • Slot 1: mean $\mu_i = \exp(\eta_{i,1})$ (log link).
  • Slot 2: dispersion $\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded).
  • Slot 3: power parameter $p_i = \eta_{i,3}$ (identity link, unconstrained in the linear predictor — the validity domain $p \in (1,2)$ is not enforced here; it must be enforced upstream, e.g. by a prior or by a constrained inverse link in transformed parameters).
  • Likelihood: compound Poisson–gamma Tweedie density, $$p(y_i \mid \mu_i, \phi_i, p_i) ;=; \text{Tweedie}_{p_i}(\mu_i, \phi_i),$$ evaluated via tweedie_lpdf (a user-defined or brms-style implementation exposed to Stan; the evaluation uses the series expansion of Dunn & Smyth 2005).
  • Posterior predictive: $y_i^{\text{rep}} \sim \mathrm{Tweedie}(\mu_i, \phi_i, p_i)$ via tweedie_rng.

Family 10 — Zero-Inflated Poisson (ZIP), $K = 2$

    } else if (family_id_k[1] == 10) {
      // ZIP K = 2: log_lik mirrors the model-block dispatch; y_pred
      // draws the structural zero indicator first and then a Poisson
      // sample for the non-zero branch (Lambert 1992).
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(0 | eta_mu_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = poisson_log_rng(eta_mu_i);
      }
    }
  • Slot 1: Poisson mean on the log link, $\log\mu_i = \eta_{i,1}$, i.e. $\mu_i = \exp(\eta_{i,1})$. The code uses poisson_log_lpmf, which takes the log-mean directly, so no explicit exp is needed.
  • Slot 2: zero-inflation probability on the logit link, $\pi_i = \mathrm{logit}^{-1}(\eta_{i,2})$, via bernoulli_logit_lpmf / bernoulli_logit_rng which take the logit directly.

Likelihood (Lambert 1992). Let $Z_i \in {0,1}$ indicate the structural-zero state ($Z_i = 1 \iff$ structural zero). Then $$\Pr(Y_i = 0) = \pi_i + (1 - \pi_i),e^{-\mu_i}, \qquad \Pr(Y_i = y) = (1 - \pi_i),\frac{\mu_i^y e^{-\mu_i}}{y!},; y \geq 1.$$

For $y_i = 0$ the code computes the log of the mixture via log_sum_exp: $$\log p(0 \mid \cdot) ;=; \mathrm{logsumexp}!\Big(\mathrm{logit}^{-1}\text{-Bernoulli log-pmf}(1 \mid \eta_{i,2}),;; \mathrm{logit}^{-1}\text{-Bernoulli log-pmf}(0 \mid \eta_{i,2}) + \mathrm{PoissonLog}(0 \mid \eta_{i,1})\Big).$$ Equivalently, $$\log p(0 \mid \cdot) ;=; \log!\left(\pi_i + (1-\pi_i),e^{-\mu_i}\right).$$

For $y_i \geq 1$: $$\log p(y_i \mid \cdot) ;=; \log(1 - \pi_i) + \log\mathrm{Poisson}(y_i \mid \mu_i) ;=; \mathrm{BernoulliLogit}(0 \mid \eta_{i,2}) + \mathrm{PoissonLog}(y_i \mid \eta_{i,1}).$$

Posterior predictive (Lambert 1992 forward simulation): first draw $Z_i \sim \mathrm{BernoulliLogit}(\eta_{i,2})$; if $Z_i = 1$, set $y_i^{\text{rep}} = 0$; otherwise draw $y_i^{\text{rep}} \sim \mathrm{PoissonLog}(\eta_{i,1})$ via poisson_log_rng.


Family 11 — Zero-Inflated Negative Binomial (ZINB), $K = 3$

    } else if (family_id_k[1] == 11) {
      // ZINB K = 3: symmetric to ZIP with neg_binomial_2_log.
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
      }
    }
  • Slot 1: NB2 mean on the log link, $\log\mu_i = \eta_{i,1}$ (consumed by neg_binomial_2_log_lpmf, which takes the log-mean directly).
  • Slot 2: NB2 precision $\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded).
  • Slot 3: zero-inflation logit $\eta_{i,3}$, with $\pi_i = \mathrm{logit}^{-1}(\eta_{i,3})$.

Likelihood. With $Z_i$ the structural-zero indicator and NB2 variance $\mu_i + \mu_i^2/\phi_i$: $$\Pr(Y_i = 0) = \pi_i + (1 - \pi_i)\left(\frac{\phi_i}{\mu_i + \phi_i}\right)^{\phi_i},$$ $$\Pr(Y_i = y) = (1 - \pi_i),\mathrm{NB}_2(y \mid \mu_i, \phi_i), \quad y \geq 1.$$

For $y_i = 0$: $$\log p(0 \mid \cdot) ;=; \mathrm{logsumexp}!\Big(\mathrm{BernoulliLogit}(1 \mid \eta_{i,3}),;; \mathrm{BernoulliLogit}(0 \mid \eta_{i,3}) + \mathrm{NB2Log}(0 \mid \eta_{i,1}, \phi_i)\Big).$$

For $y_i \geq 1$: $$\log p(y_i \mid \cdot) ;=; \mathrm{BernoulliLogit}(0 \mid \eta_{i,3}) + \mathrm{NB2Log}(y_i \mid \eta_{i,1}, \phi_i).$$

Posterior predictive: draw $Z_i \sim \mathrm{BernoulliLogit}(\eta_{i,3})$; if $Z_i = 1$, $y_i^{\text{rep}} = 0$; else $y_i^{\text{rep}} \sim \mathrm{NB2Log}(\eta_{i,1}, \phi_i)$ via neg_binomial_2_log_rng.


Family 12 — Hurdle-Poisson, $K = 2$

    } else if (family_id_k[1] == 12) {
      // Hurdle-Poisson K = 2: log_lik mirrors the model-block
      // dispatch. y_pred uses rejection sampling on Poisson(mu) to
      // realize the truncated-at-one positive branch (Mullahy 1986);
      // bounded at 10000 iterations for numerical safety when mu is
      // pathologically small (in which case the family is borderline
      // identifiable in the first place).
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i)
                   - log1m_exp(-exp(eta_mu_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = poisson_log_rng(eta_mu_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }
  • Slot 1: Poisson mean on the log link, $\log\mu_i = \eta_{i,1}$.
  • Slot 2: hurdle logit $\eta_{i,2}$, with $\pi_i = \mathrm{logit}^{-1}(\eta_{i,2}) = \Pr(Y_i = 0)$ (the hurdle-at-zero probability, not the zero-inflation probability).

Likelihood (Mullahy 1986). The hurdle model factorizes as $$\Pr(Y_i = 0) = \pi_i, \qquad \Pr(Y_i = y) = (1 - \pi_i),\frac{\mathrm{Poisson}(y \mid \mu_i)}{1 - e^{-\mu_i}}, \quad y \geq 1,$$ where the denominator $1 - e^{-\mu_i} = 1 - \mathrm{Poisson}(0 \mid \mu_i)$ is the zero-truncation normalizer.

For $y_i = 0$: $$\log p(0 \mid \cdot) ;=; \mathrm{BernoulliLogit}(1 \mid \eta_{i,2}) ;=; \log\pi_i.$$

For $y_i \geq 1$: $$\log p(y_i \mid \cdot) ;=; \mathrm{BernoulliLogit}(0 \mid \eta_{i,2}) + \mathrm{PoissonLog}(y_i \mid \eta_{i,1}) - \log!\left(1 - e^{-\mu_i}\right),$$ where the truncation correction is $$-\log!\left(1 - e^{-\mu_i}\right) ;=; -\mathrm{log1m_exp}(-\exp(\eta_{i,1})),$$ using log1m_exp(-exp(eta_mu_i)) for numerical stability (computing $\log(1 - e^x)$ accurately when $x$ is very negative).

Posterior predictive. Draw the hurdle indicator $H_i \sim \mathrm{BernoulliLogit}(\eta_{i,2})$; if $H_i = 1$, $y_i^{\text{rep}} = 0$. Otherwise, sample from the zero-truncated Poisson by rejection sampling: repeatedly draw $y^{\text{draw}} \sim \mathrm{PoissonLog}(\eta_{i,1})$ until $y^{\text{draw}} &gt; 0$ or until iter reaches 10000. The 10000-iteration cap is a numerical safeguard: when $\mu_i$ is pathologically small, the acceptance probability $1 - e^{-\mu_i} \approx \mu_i$ is tiny, and the rejection sampler would otherwise loop essentially forever. The comment notes that in this regime the family is borderline unidentifiable anyway. If the cap is hit with y_draw == 0, y_pred[i] is set to 0 — a degenerate fallback.


Family 13 — Hurdle Negative Binomial, $K = 3$

    } else if (family_id_k[1] == 13) {
      // Hurdle-NB K = 3: symmetric to Hurdle-Poisson with
      // neg_binomial_2_log on the truncated positive branch.
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i)
                   - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
                                                       phi_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }
  • Slot 1: NB2 mean on the log link, $\log\mu_i = \eta_{i,1}$.
  • Slot 2: NB2 precision $\phi_i = \exp(\eta_{i,2})$ (log link, hard-coded).
  • Slot 3: hurdle logit $\eta_{i,3}$, with $\pi_i = \mathrm{logit}^{-1}(\eta_{i,3}) = \Pr(Y_i = 0)$.

Likelihood. The NB2 probability of zero is $$\mathrm{NB}_2(0 \mid \mu_i, \phi_i) ;=; \left(\frac{\phi_i}{\mu_i + \phi_i}\right)^{\phi_i},$$ which the code obtains in log form via neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i). The hurdle factorization is $$\Pr(Y_i = 0) = \pi_i, \qquad \Pr(Y_i = y) = (1 - \pi_i),\frac{\mathrm{NB}_2(y \mid \mu_i, \phi_i)}{1 - \mathrm{NB}_2(0 \mid \mu_i, \phi_i)}, \quad y \geq 1.$$

For $y_i = 0$: $$\log p(0 \mid \cdot) ;=; \mathrm{BernoulliLogit}(1 \mid \eta_{i,3}).$$

For $y_i \geq 1$: $$\log p(y_i \mid \cdot) ;=; \mathrm{BernoulliLogit}(0 \mid \eta_{i,3}) + \mathrm{NB2Log}(y_i \mid \eta_{i,1}, \phi_i) - \log!\left(1 - \mathrm{NB}2(0 \mid \mu_i, \phi_i)\right),$$ where the truncation correction is implemented as $$-\log!\left(1 - e^{\mathrm{NB2Log}(0 \mid \eta{i,1}, \phi_i)}\right) ;=; -\mathrm{log1m_exp}!\left(\mathrm{neg_binomial_2_log_lpmf}(0 \mid \eta_{i,1}, \phi_i)\right).$$ This is the direct analogue of the Hurdle-Poisson correction, with the NB2 zero probability replacing $e^{-\mu_i}$.

Posterior predictive: draw $H_i \sim \mathrm{BernoulliLogit}(\eta_{i,3})$; if $H_i = 1$, $y_i^{\text{rep}} = 0$; otherwise rejection-sample from the zero-truncated NB2 by repeatedly calling neg_binomial_2_log_rng(eta_mu_i, phi_i) until a positive draw is obtained or 10000 iterations elapse. Same numerical-safety cap and degenerate fallback as family 12.


Default / fallback branch

    } else {
      log_lik[i] = negative_infinity();
      y_pred[i] = 0;
    }
  }
}

If family_id_k[1] matches none of ${1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13}$, the code assigns:

  • log_lik[i] = negative_infinity() — i.e. $\log p(y_i \mid \cdot) = -\infty$, rendering the draw effectively impossible under any model that consumes log_lik for model comparison. This is a defensive sentinel rather than a graceful degradation.
  • y_pred[i] = 0 — a placeholder posterior-predictive draw.

The loop over i and the generated quantities block are then closed.


Summary of the AMM reference-anchoring realization in this block

The generated quantities block is consumption-only with respect to the AMM reference-anchoring decomposition: it does not re-express the anchoring constraints, the reference-contrast prior structure, or the slot assembly. All of that was performed in transformed parameters (sections 1–2), which produced the $n \times K$ matrix eta_k whose $(i,k)$ entry is

$$\eta_{i,k} ;=; x_i^\top \beta_k ;+; \sum_{r} \lambda_{k,r}, z_{i,r},$$

subject to whatever anchoring identification constraints (e.g. $\sum_i w_i \eta_{i,k} = 0$ for non-reference slots, or $\beta_{k,\text{ref}} = 0$) were imposed upstream. This block simply:

  1. Aliases eta_k as theta_i_k to maintain API parity with the $K = 1 + p$ template — the only place where the "swap of $p$ with $K$" mentioned in the comment is materially realized.
  2. Dispatches on family_id_k[1] to evaluate the pointwise log-likelihood $\log p(y_i \mid \eta_{i,\cdot})$ using the appropriate inverse-link machinery (apply_inv_link_by_id for slot 2 in most two-slot families; hard-coded exp/inv_logit for the location-scale and Tweedie families).
  3. Generates one posterior-predictive draw $y_i^{\text{rep}}$ per observation from the same family, using the _rng counterparts and (for the hurdle families) rejection-sampling truncation with a 10000-iteration safety cap.

No Jacobian terms appear in this block: all inverse-link transformations here are forward maps from linear predictors to natural parameters, evaluated for the purpose of likelihood evaluation and posterior-predictive simulation. Any Jacobian adjustments required by the anchoring decomposition (e.g. for constrained reparametrizations of $\beta_k$ or $\lambda_{k,r}$) were already accounted for in the model block (section 2) via target += statements that are not visible in this section.

inst/stan/_canonical_pieces/amm_canonical_eb_conditional_multi.stan

Stan Template: amm_canonical_eb_conditional_multi.stan

Overview

This file implements the conditional Empirical Bayes (EB) path for the Anchored Modulating Model (AMM). It differs from the marginal EB template in one structural decision: the group-level reference parameters $\theta_{\text{ref}}$ are moved from parameters into data (as theta_ref_data), having been pre-estimated via Laplace approximation in a prior step. All remaining components—additive basis $\mathbf{a}$, multiplicative basis $\mathbf{b}$ (reparametrized through $\mathbf{c}b$), modulating component $\mathbf{W}$, likelihood—are identical to the marginal template, ensuring that the conditional posterior of the remaining parameters $\xi$ given the plug-in $\widehat{\theta}{\text{ref}}^{\text{EB}}$ is exactly the conditional posterior implied by the joint EB model.

The header comment documents:

  • Path: Empirical Bayes, Conditional (Path A), sub-phase 8.6.C, decision D34.
  • Structural difference: theta_ref is data, not a parameter; hyperparameters mu_theta_ref / sigma_theta_ref and their priors are absent.
  • Reference framework: For individual $i$, group $g[i]$, coordinate $k$:

$$\theta_{i}[k] = \theta_{\text{ref},g[i]}[k] + a_k(\mathbf{x}_i) + b_k(\mathbf{x}_i),\theta_{\text{ref},g[i]}[k] + \bigl(W_k(\theta_{\text{ref},g[i]}) - W_k(\theta_{\text{anchor}})\bigr),\mathbf{x}_i$$

  • Likelihood factorization: $p(\mathbf{y}i \mid \theta_i) = \prod{k=1}^p D(y_{ik} \mid \theta_i[k])$, where $D$ is selected by family_id (homogeneous across all $p$ coordinates).
  • Ragged design: Padded design matrices with per-coordinate column counts; flat-packed coefficient vectors with offset arrays.
  • Reparametrization of $\mathbf{b}$: The model samples $\mathbf{c}b = \theta{\text{ref}} \odot \mathbf{b}$ directly (NCP), eliminating bimodality in $(\theta_{\text{ref}}, \mathbf{b})$.
  • Centering: Column-wise centering of $Z_a[k]$ and $Z_b[k]$ is enforced R-side, so conditions (C2) and (C3) hold identically without further Stan-side constraints.
  • Anchoring of $\mathbf{W}$: The reparametrization $W(\theta_{\text{ref}}) - W(\theta_{\text{anchor}})$ satisfies (C4).

functions Block

functions {
  // {{CANONICAL_HELPERS}}
}

The placeholder {{CANONICAL_HELPERS}} is resolved at code-generation time by R/stan_codegen.R. It injects the shared helper functions used by all AMM templates, most critically:

  • apply_W_basis_diff(W_type_id, theta_ref_ik, theta_anchor_k, W_per_k_dim, W_degree, W_n_knots_full, W_knots_full): Returns a vector[W_per_k_dim] representing the difference of basis evaluations:

$$\phi_j(\theta_{\text{ref},ik}) - \phi_j(\theta_{\text{anchor},k}), \quad j = 1, \ldots, W_{\text{per_k_dim}}$$

where ${\phi_j}$ is the chosen basis (polynomial when W_type_id == 1, B-spline with Cox–de Boor recursion when W_type_id == 2). When W_type_id == 0 (W off), this function is not called. The Cox–de Boor recursion re-evaluates the basis at both $\theta_{\text{ref},\text{data}}[g_i][k]$ and $\theta_{\text{anchor}}[k]$ on every HMC step.

Additional canonical helpers (from the shared library) include utilities for offset computation, padded dot-product helpers, and family-specific link/inverse-link functions as needed.


data Block

This block declares all fixed inputs. Each declaration is explained below.

Core dimensions

  int<lower=1> n;

Number of observations. Must be $\geq 1$.

  int<lower=1> p;

Number of outcome coordinates (multivariate dimension). Must be $\geq 1$.

  int<lower=1, upper=4> family_id;

Selects the response family, homogeneous across all $p$ coordinates:

  • 1 = Gaussian (identity link)
  • 2 = Poisson (log link)
  • 3 = Negative binomial 2 (log link, Stan's mean–dispersion parameterization)
  • 4 = Bernoulli (logit link)

Component activation flags

  int<lower=0, upper=1> use_a;
  int<lower=0, upper=1> use_b;
  int<lower=0, upper=1> use_W;

Binary toggles for the additive ($\mathbf{a}$), multiplicative ($\mathbf{b}$), and modulating ($\mathbf{W}$) components. When 0, the corresponding component is deactivated: design matrices may be zero-column, coefficient vectors are zero-length, and the prior/sampling code is skipped.

Per-coordinate column counts

  int<lower=0> J_a_max;
  int<lower=0> J_b_max;
  array[p] int<lower=0> J_a_per_k;
  array[p] int<lower=0> J_b_per_k;
  • J_a_max: $\max_k J_{a,k}$, the maximum number of additive basis columns across coordinates. Used to dimension padded arrays.
  • J_b_max: $\max_k J_{b,k}$, the maximum number of multiplicative basis columns.
  • J_a_per_k[k]: Actual number of additive basis columns for coordinate $k$. Zero when use_a == 0 or coordinate $k$ has no additive term.
  • J_b_per_k[k]: Analogous for the multiplicative basis.

Padded design matrices

  array[p] matrix[n, J_a_max] Z_a;
  array[p] matrix[n, J_b_max] Z_b;

For each coordinate $k$:

  • $Z_a[k]$ is an $n \times J_{a,\max}$ matrix. Only columns $1 \ldots J_{a,k}$ are referenced; remaining columns are padding.
  • $Z_b[k]$ is an $n \times J_{b,\max}$ matrix. Only columns $1 \ldots J_{b,k}$ are referenced.

These matrices are column-wise centered R-side (enforced in amm_spec_multi_designs()), ensuring conditions (C2): $\mathbb{E}[a(X)] = 0$ and (C3): $\mathbb{E}[b(X)] = 0$ hold identically for any coefficient vector.

Modulating component data

  int<lower=0> dim_W;

Total dimension of the $\mathbf{W}$ parameter matrix. For separable polynomial/B-spline bases, $\dim_W = p \times W_{\text{per_k_dim}}$. When use_W == 0, this may be 0.

  int<lower=0> d;

Dimension of the covariate vector $\mathbf{x}_i$ that $\mathbf{W}$ modulates. When use_W == 0, this may be 0.

  int<lower=0> W_per_k_dim;

Number of basis functions per coordinate for separable $\mathbf{W}$ bases: $W_{\text{per_k_dim}} = \dim_W / p$. Each coordinate $k$ uses a contiguous block of $W_{\text{per_k_dim}}$ rows of $W_{\text{raw}}$.

  matrix[n, d] X;

The $n \times d$ covariate matrix. Row $i$ is $\mathbf{x}i \in \mathbb{R}^d$. The modulating component contributes $\bigl(W_k(\theta{\text{ref},g[i]}) - W_k(\theta_{\text{anchor}})\bigr)^\top \mathbf{x}i$ to $\eta{ik}$.

Outcome data

  matrix[n, p] y_real;
  array[n, p] int y_int;

Both outcome storage blocks are allocated; only the one matching family_id is consumed:

  • y_real[:, k] is used when family_id == 1 (Gaussian).
  • y_int[:, k] is used when family_id ∈ {2, 3, 4} (Poisson, negative binomial, Bernoulli).

Anchor vector

  vector[p] theta_anchor;

The fixed anchor point $\boldsymbol{\theta}{\text{anchor}} \in \mathbb{R}^p$, satisfying condition (C4). The $\mathbf{W}$ component is evaluated as $W(\theta{\text{ref}}) - W(\theta_{\text{anchor}})$, anchoring the modulating term at this point.

Dispersion toggles

  int<lower=0, upper=1> use_dispersion_y;
  int<lower=0, upper=1> use_dispersion_phi;
  • use_dispersion_y: When 1, sample observation-level dispersion $\sigma_y$ (used by Gaussian family).
  • use_dispersion_phi: When 1, sample the dispersion parameter $\phi$ (used by negative binomial family).

W basis specification

  int<lower=0, upper=2> W_type_id;

Selects the $\mathbf{W}$ basis type: 0 = off, 1 = polynomial, 2 = B-spline.

  int<lower=0> W_n_knots_full;
  vector[W_n_knots_full] W_knots_full;
  int<lower=0> W_degree;

B-spline knot vector and polynomial degree, consumed by apply_W_basis_diff() when W_type_id == 2. For polynomial bases (W_type_id == 1), W_degree determines the polynomial order.

Grouping

  int<lower=0, upper=1> use_groups;
  int<lower=1> J_groups;
  array[n] int<lower=1, upper=J_groups> group_id;
  • use_groups: 0 = single global anchor (backward-compatible), 1 = per-group anchors.
  • J_groups: Number of groups. When use_groups == 0, J_groups == 1.
  • group_id[i]: Group assignment for observation $i$. When use_groups == 0, group_id[i] == 1 for all $i$.

Placeholder for CP/NCP declarations

  {{DATA_CP_A_PER_K_DECL}}

Resolved at code-generation time. Declares any additional data needed for the configurable centered/non-centered parametrization of $\mathbf{a}$ per coordinate (e.g., per-coordinate flags, index arrays).

EB-specific dimensions

  int<lower=1> K_slots;
  int<lower=1> p_dim;

Under Path A (EB Conditional, $K = 1$): K_slots == 1 and p_dim == p. These are declared for template uniformity; the body does not branch on them.

EB conditional plug-in

  array[J_groups] vector[p] theta_ref_data;

The key structural difference from the marginal template. This is the pre-estimated reference parameter, $\widehat{\theta}_{\text{ref}}^{\text{EB}}$, populated R-side by gdpar_eb_maximize_marginal() via cmdstanr::laplace() applied to the marginal EB template. It is an array of J_groups vectors of length $p$:

$$\texttt{theta_ref_data}[g] = \widehat{\theta}_{\text{ref},g}^{\text{EB}} \in \mathbb{R}^p, \quad g = 1, \ldots, J_{\text{groups}}$$

When use_groups == 0, only theta_ref_data[1] is populated and used.


transformed data Block

This block pre-computes offsets and sizes used for flat-packing.

  array[p] int J_a_free;
  array[p] int J_b_free;

For each coordinate $k$:

  • J_a_free[k] = J_a_per_k[k] when use_a == 1 and J_a_per_k[k] > 0; otherwise 0. This is the number of free additive coefficients for coordinate $k$.
  • J_b_free[k] is analogous for the multiplicative component.
  array[p] int a_raw_offset;
  array[p] int c_b_raw_offset;

Cumulative offsets into the flat-packed coefficient vectors:

  • a_raw_offset[k] = $\sum_{k' &lt; k} J_{a,\text{free}}[k']$: the starting index (0-based offset) in a_raw for coordinate $k$'s additive coefficients.
  • c_b_raw_offset[k] = $\sum_{k' &lt; k} J_{b,\text{free}}[k']$: analogous for c_b_raw.
  int total_J_a_free = 0;
  int total_J_b_free = 0;

Running totals, accumulated in the loop below. Final values: $\text{total_J_a_free} = \sum_{k=1}^p J_{a,\text{free}}[k]$, $\text{total_J_b_free} = \sum_{k=1}^p J_{b,\text{free}}[k]$.

  int dim_W_eff = (use_W == 1) ? dim_W : 0;
  int d_eff = (use_W == 1) ? d : 0;

Effective dimensions for $W_{\text{raw}}$: when use_W == 0, both collapse to 0, producing zero-sized parameter arrays.

  int sigma_a_size = (use_a == 1) ? p : 0;
  int sigma_b_size = (use_b == 1) ? p : 0;
  int sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;
  int sigma_y_size = (use_dispersion_y == 1) ? p : 0;
  int phi_size = (use_dispersion_phi == 1) ? p : 0;

Sizes for the scale/dispersion parameter arrays. When a component is inactive, its scale array is zero-sized (not sampled).

  for (k in 1:p) {
    J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
    J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
    a_raw_offset[k] = total_J_a_free;
    c_b_raw_offset[k] = total_J_b_free;
    total_J_a_free += J_a_free[k];
    total_J_b_free += J_b_free[k];
  }

The loop that populates the per-coordinate free counts and offsets. For each coordinate $k$ in sequence:

  1. Compute J_a_free[k] and J_b_free[k].
  2. Record the current cumulative offset.
  3. Increment the cumulative totals.

This produces a prefix-sum scheme enabling segment(a_raw, a_raw_offset[k] + 1, J_a_free[k]) to extract coordinate $k$'s block.


parameters Block

Notably absent: theta_ref, mu_theta_ref, sigma_theta_ref. These were parameters in the marginal template; here $\theta_{\text{ref}}$ is data and the hyperparameters are unnecessary.

  array[sigma_a_size] real<lower=0> sigma_a;

Per-coordinate additive scale parameters $\sigma_{a,k}$, $k = 1, \ldots, p$ (when use_a == 1). Zero-sized array when inactive.

  vector[total_J_a_free] a_raw;

Flat-packed raw additive coefficients. In the non-centered parametrization (NCP), $\mathbf{a}k = \sigma{a,k} \cdot \texttt{a_raw}[\texttt{a_raw_offset}[k]+1 : \texttt{a_raw_offset}[k]+\texttt{J_a_free}[k]]$. The actual transformation depends on the CP/NCP configuration resolved by {{TP_A_BLOCK}}.

  array[sigma_b_size] real<lower=0> sigma_b;

Per-coordinate multiplicative scale parameters $\sigma_{b,k}$, $k = 1, \ldots, p$ (when use_b == 1).

  vector[total_J_b_free] c_b_raw;

Flat-packed raw coefficients for $\mathbf{c}b$, the reparametrized multiplicative component. Under NCP: $c{b,k}[j] = \sigma_{b,k} \cdot \texttt{c_b_raw}[\texttt{c_b_raw_offset}[k]+j]$.

  array[sigma_W_size] real<lower=0> sigma_W;

Global scale for $\mathbf{W}$ (single scalar when use_W == 1 && dim_W > 0).

  matrix[dim_W_eff, d_eff] W_raw;

The raw $\mathbf{W}$ coefficient matrix. When use_W == 1, this is a $\dim_W \times d$ matrix; the actual $\mathbf{W}$ is $\sigma_W \cdot W_{\text{raw}}$ (or with a different scaling resolved by {{W_SCALE}}). When use_W == 0, this is $0 \times 0$.

  array[sigma_y_size] real<lower=0> sigma_y;

Per-coordinate observation-level standard deviation $\sigma_{y,k}$ for the Gaussian family (family_id == 1). Zero-sized when use_dispersion_y == 0 or family is non-Gaussian.

  array[phi_size] real<lower=0> phi;

Per-coordinate dispersion parameter $\phi_k$ for the negative binomial family (family_id == 3). Stan's mean–dispersion parameterization: $\text{NegBinomial2}(\mu, \phi)$ with $\text{Var}[Y] = \mu + \mu^2/\phi$.


transformed parameters Block

This block constructs the coefficient vectors and the linear predictor $\eta$.

Declarations

  array[p] vector[J_a_max] a_coef;
  array[p] vector[J_b_max] c_b;
  array[p] vector[J_b_max] b_coef;
  matrix[n, p] eta;
  • a_coef[k]: Additive coefficient vector for coordinate $k$, length $J_{a,\max}$. Only entries $1 \ldots J_{a,k}$ are meaningful.
  • c_b[k]: Reparametrized multiplicative coefficient vector, $c_{b,k} = \theta_{\text{ref},k} \cdot b_k$. Length $J_{b,\max}$.
  • b_coef[k]: Derived multiplicative coefficient vector, $b_k = c_{b,k} / \theta_{\text{ref},k}$. Only computed as a derived quantity in the single-anchor regime.
  • eta[i, k]: Linear predictor $\eta_{ik}$.

Initialization

  for (k in 1:p) {
    a_coef[k] = rep_vector(0, J_a_max);
    c_b[k] = rep_vector(0, J_b_max);
    b_coef[k] = rep_vector(0, J_b_max);
  }

All coefficient vectors initialized to zero. Unused entries remain zero.

Additive coefficient block

{{TP_A_BLOCK}}

Placeholder resolved at code-generation time. This implements the CP/NCP transformation for the additive coefficients. In the canonical NCP form:

$$a_k[j] = \sigma_{a,k} \cdot a_{\text{raw}}[j + \texttt{a_raw_offset}[k]], \quad j = 1, \ldots, J_{a,k}$$

and stores the result in a_coef[k][1:J_a_per_k[k]]. The CP form would instead directly assign a_raw segments to a_coef with the prior providing regularization.

Multiplicative coefficient block

  if (use_b == 1) {
    for (k in 1:p) {
      if (J_b_per_k[k] > 0) {
        for (j in 1:J_b_free[k]) {
          c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
        }

When the multiplicative component is active, for each coordinate $k$ with $J_{b,k} &gt; 0$, the raw coefficients are scaled:

$$c_{b,k}[j] = \sigma_{b,k} \cdot c_{b,\text{raw}}[\texttt{c_b_raw_offset}[k] + j]$$

This is the NCP for $\mathbf{c}b$: the raw variables have a standard normal prior, and $\sigma{b,k}$ controls the scale.

        if (use_groups == 0 && theta_ref_data[1][k] != 0) {
          for (j in 1:J_b_per_k[k]) {
            b_coef[k][j] = c_b[k][j] / theta_ref_data[1][k];
          }
        }

Derived quantity only in the single-anchor regime. Computes:

$$b_k[j] = \frac{c_{b,k}[j]}{\theta_{\text{ref},1}[k]}, \quad j = 1, \ldots, J_{b,k}$$

when use_groups == 0 and $\theta_{\text{ref},1}[k] \neq 0$. With per-group anchors (use_groups == 1), there is no single canonical scalar divisor; b_coef stays zero and users derive $b_k[g][j] = c_{b,k}[j] / \theta_{\text{ref},g}[k]$ post-hoc.

This reparametrization is motivated by the bimodality documented in Recovery 2 (handoff 4 addendum): the multiplicative term in the original parametrization enters as $b_k(\mathbf{x}i) \cdot \theta{\text{ref},k}$, and sampling $(\theta_{\text{ref},k}, b_k)$ jointly creates a ridge of near-constant products. Sampling $c_{b,k} = \theta_{\text{ref},k} \cdot b_k$ directly eliminates this.

Linear predictor construction

  for (i in 1:n) {
    int g_i = group_id[i];
    for (k in 1:p) {
      real theta_ref_ik = theta_ref_data[g_i][k];
      real eta_ik = theta_ref_ik;

For each observation $i$ and coordinate $k$:

  1. Look up the group $g_i = \texttt{group_id}[i]$.
  2. Set the local reference: $\theta_{\text{ref},ik} = \texttt{theta_ref_data}[g_i][k]$.
  3. Initialize the linear predictor: $\eta_{ik} \leftarrow \theta_{\text{ref},ik}$.

This implements the first term of the AMM decomposition: the baseline is the group-level reference parameter.

Additive component:

      if (use_a == 1 && J_a_per_k[k] > 0) {
        eta_ik += dot_product(
          to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
          a_coef[k][1:J_a_per_k[k]]
        );
      }

Adds the additive contribution:

$$\eta_{ik} \mathrel{+}= \mathbf{Z}_{a,k}[i, 1:J_{a,k}] \cdot \mathbf{a}_k[1:J_{a,k}] = \sum_{j=1}^{J_{a,k}} Z_{a,k}[i,j], a_k[j]$$

Only the first $J_{a,k}$ (active) columns of the padded matrix are used.

Multiplicative component:

      if (use_b == 1 && J_b_per_k[k] > 0) {
        eta_ik += dot_product(
          to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
          c_b[k][1:J_b_per_k[k]]
        );
      }

Adds the multiplicative contribution, but in the $\mathbf{c}_b$ parametrization:

$$\eta_{ik} \mathrel{+}= \mathbf{Z}_{b,k}[i, 1:J_{b,k}] \cdot \mathbf{c}_{b,k}[1:J_{b,k}]$$

Since $c_{b,k}[j] = \theta_{\text{ref},k} \cdot b_k[j]$, this is equivalent to $\theta_{\text{ref},k} \cdot \mathbf{Z}_{b,k}[i] \cdot \mathbf{b}k$, which is the $b_k(\mathbf{x}i) \cdot \theta{\text{ref},g[i],k}$ term in the AMM decomposition. The dot product uses only columns $1 \ldots J{b,k}$ of the padded matrix.

Modulating component ($\mathbf{W}$):

      if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
        vector[d] W_diff_x_k = rep_vector(0, d);
        vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
          W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
          W_degree, W_n_knots_full, W_knots_full
        );
        int row_start = (k - 1) * W_per_k_dim + 1;

When the modulating component is active:

  1. Initialize $\mathbf{W}_{\text{diff_x_k}} = \mathbf{0}_d$.
  2. Evaluate the basis difference vector:

$$\texttt{basis_diff_k}[j] = \phi_j(\theta_{\text{ref},ik}) - \phi_j(\theta_{\text{anchor},k}), \quad j = 1, \ldots, W_{\text{per_k_dim}}$$

via apply_W_basis_diff(). This function dispatches on W_type_id:

  • W_type_id == 1 (polynomial): evaluates monomials $1, \theta, \theta^2, \ldots$ at both points and takes differences.
  • W_type_id == 2 (B-spline): runs Cox–de Boor recursion at both points and takes differences.
  1. Compute the row offset: coordinate $k$'s block in $W_{\text{raw}}$ starts at row $\texttt{row_start} = (k-1) \times W_{\text{per_k_dim}} + 1$.
        for (jj_local in 1:W_per_k_dim) {
          int row = row_start + jj_local - 1;
          for (jjj in 1:d) {
            W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
          }
        }
        eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));

The double loop accumulates:

$$[\mathbf{W}_{\text{diff_x_k}}]_\ell = \sum_{j=1}^{W_{\text{per_k_dim}}} \bigl(\phi_j(\theta_{\text{ref},ik}) - \phi_j(\theta_{\text{anchor},k})\bigr) \cdot \sigma_W \cdot W_{\text{raw}}[\texttt{row_start}+j-1, \ell]$$

for $\ell = 1, \ldots, d$. The {{W_SCALE}} placeholder resolves to a scaling factor (typically * sigma_W[1] for the CP form). Then:

$$\eta_{ik} \mathrel{+}= \mathbf{W}_{\text{diff_x_k}}^\top \mathbf{x}_i$$

In matrix notation, the full modulating contribution for coordinate $k$ is:

$$\bigl(W_k(\theta_{\text{ref},g[i]}) - W_k(\theta_{\text{anchor}})\bigr)^\top \mathbf{x}_i$$

where $W_k(\theta) = \boldsymbol{\phi}(\theta)^\top W_k$ and $W_k$ is the $k$-th block (of $W_{\text{per_k_dim}}$ rows) of the $\dim_W \times d$ matrix $W$.

      eta[i, k] = eta_ik;
    }
  }

Store the completed linear predictor. The full linear predictor is:

$$\eta_{ik} = \theta_{\text{ref},g[i],k} + \underbrace{\mathbf{Z}_{a,k}[i] \cdot \mathbf{a}_k}_{\text{additive}} + \underbrace{\mathbf{Z}_{b,k}[i] \cdot \mathbf{c}_{b,k}}_{\text{multiplicative}} + \underbrace{\bigl(W_k(\theta_{\text{ref},g[i]}) - W_k(\theta_{\text{anchor}})\bigr)^\top \mathbf{x}_i}_{\text{modulating}}$$


model Block

This block specifies priors and the likelihood.

Prior on $\theta_{\text{ref}}$

  // No prior on theta_ref / mu_theta_ref / sigma_theta_ref: theta_ref
  // is data in the conditional template. xi components retain their
  // canonical priors.

This is the defining difference from the marginal template. In the marginal EB template, $\theta_{\text{ref}}$ has a hierarchical prior $\theta_{\text{ref},g}[k] \sim \mathcal{N}(\mu_{\theta_{\text{ref}},k}, \sigma_{\theta_{\text{ref}},k})$. Here, $\theta_{\text{ref}}$ is fixed data, so there is no prior term. The remaining priors are identical.

Additive component priors

  if (use_a == 1) {
    sigma_a ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
  }

When the additive component is active:

  1. Scale prior: $\sigma_{a,k} \sim \texttt{PRIOR_SIGMA_A}$ for each $k$ (typically half-normal or half-Cauchy; resolved at code generation).
  2. Coefficient prior: {{MODEL_A_BLOCK}} is resolved to place priors on a_raw. In the canonical NCP form, this is:

$$a_{\text{raw}}[j] \sim \mathcal{N}(0, 1), \quad j = 1, \ldots, \text{total_J_a_free}$$

so that $a_k[j] = \sigma_{a,k} \cdot a_{\text{raw}}[\ldots]$ has marginal prior $\mathcal{N}(0, \sigma_{a,k}^2)$.

Multiplicative component priors

  if (use_b == 1) {
    sigma_b ~ {{PRIOR_SIGMA_B}};
    c_b_raw ~ normal(0, 1);
  }

When the multiplicative component is active:

  1. Scale prior: $\sigma_{b,k} \sim \texttt{PRIOR_SIGMA_B}$ for each $k$.
  2. Coefficient prior: $c_{b,\text{raw}}[j] \sim \mathcal{N}(0, 1)$ for all $j$, so $c_{b,k}[j] = \sigma_{b,k} \cdot c_{b,\text{raw}}[\ldots]$ has marginal prior $\mathcal{N}(0, \sigma_{b,k}^2)$.

Modulating component priors

  if (use_W == 1 && dim_W > 0) {
    sigma_W[1] ~ {{PRIOR_SIGMA_W}};
    to_vector(W_raw) ~ {{W_PRIOR}};
  }

When the modulating component is active:

  1. Global scale prior: $\sigma_W \sim \texttt{PRIOR_SIGMA_W}$ (a single scalar).
  2. Coefficient prior: $\text{vec}(W_{\text{raw}}) \sim \texttt{W_PRIOR}$ (typically $\mathcal{N}(0, 1)$ for NCP). With the {{W_SCALE}} scaling, the implied prior on $W$ entries is $\mathcal{N}(0, \sigma_W^2)$.

Dispersion priors

  if (use_dispersion_y == 1) {
    sigma_y ~ {{PRIOR_SIGMA_Y}};
  }
  if (use_dispersion_phi == 1) {
    phi ~ {{PRIOR_PHI}};
  }
  • $\sigma_{y,k}$: Observation-level scale for Gaussian family.
  • $\phi_k$: Dispersion for negative binomial family.

Likelihood

  for (k in 1:p) {
    if (family_id == 1) {
      y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
    } else if (family_id == 2) {
      y_int[:, k] ~ poisson_log(eta[:, k]);
    } else if (family_id == 3) {
      y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
    } else if (family_id == 4) {
      y_int[:, k] ~ bernoulli_logit(eta[:, k]);
    }
  }

The likelihood factorizes across coordinates. For each coordinate $k = 1, \ldots, p$, the conditional likelihood of the $k$-th outcome given $\eta_{\cdot, k}$ is determined by family_id:

  1. Gaussian (family_id == 1):

$$y_{ik} \mid \eta_{ik}, \sigma_{y,k} \sim \mathcal{N}(\eta_{ik}, \sigma_{y,k}^2)$$

using Stan's normal(mu, sigma) parameterization. The link function is identity: $\eta_{ik} = \mu_{ik}$.

  1. Poisson (family_id == 2):

$$y_{ik} \mid \eta_{ik} \sim \text{Poisson}\bigl(\exp(\eta_{ik})\bigr)$$

using Stan's poisson_log (log-link). So $\lambda_{ik} = \exp(\eta_{ik})$.

  1. Negative binomial 2 (family_id == 3):

$$y_{ik} \mid \eta_{ik}, \phi_k \sim \text{NegBinomial2}\bigl(\exp(\eta_{ik}), \phi_k\bigr)$$

using Stan's neg_binomial_2_log(eta, phi) (log-link, mean–dispersion parameterization). Mean $\mu_{ik} = \exp(\eta_{ik})$, variance $\mu_{ik} + \mu_{ik}^2/\phi_k$.

  1. Bernoulli (family_id == 4):

$$y_{ik} \mid \eta_{ik} \sim \text{Bernoulli}\bigl(\text{logit}^{-1}(\eta_{ik})\bigr)$$

using Stan's bernoulli_logit (logit-link). So $p_{ik} = \text{logit}^{-1}(\eta_{ik}) = \frac{1}{1 + e^{-\eta_{ik}}}$.

The joint likelihood across all observations and coordinates is:

$$p(\mathbf{Y} \mid \boldsymbol{\xi}, \boldsymbol{\theta}_{\text{ref}}) = \prod_{i=1}^n \prod_{k=1}^p D\bigl(y_{ik} \mid \eta_{ik}\bigr)$$

where $\boldsymbol{\xi} = (\mathbf{a}, \mathbf{c}b, \mathbf{W}, \sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi)$ and $\boldsymbol{\theta}{\text{ref}} = \texttt{theta_ref_data}$ is fixed. This factorization—cross-dimensional coupling carried exclusively by $W(\theta_{\text{ref}})$, with conditionally independent coordinates given $\theta_i$—is the canonization of "Opción B" from the Phase F architectural decision.


Summary of the AMM Reference-Anchoring Decomposition

The full model implements the following generative structure. Let $\boldsymbol{\xi} = {\mathbf{a}, \mathbf{c}b, \mathbf{W}, \sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi}$ denote all sampled parameters, and $\boldsymbol{\theta}{\text{ref}}$ denote the fixed plug-in.

Conditional posterior (what this Stan program computes):

$$p(\boldsymbol{\xi} \mid \mathbf{Y}, \boldsymbol{\theta}_{\text{ref}}^{\text{EB}}) \propto \underbrace{p(\boldsymbol{\xi})}_{\substack{\text{priors on } \xi \ \text{(no prior on } \theta_{\text{ref}}\text{)}}} \cdot \underbrace{\prod_{i=1}^n \prod_{k=1}^p D\bigl(y_{ik} \mid \eta_{ik}(\boldsymbol{\xi}, \boldsymbol{\theta}_{\text{ref}}^{\text{EB}})\bigr)}_{\text{likelihood}}$$

where:

$$\eta_{ik} = \theta_{\text{ref},g[i],k}^{\text{EB}} + \underbrace{\sum_{j=1}^{J_{a,k}} Z_{a,k}[i,j], a_k[j]}_{a_k(\mathbf{x}_i)} + \underbrace{\sum_{j=1}^{J_{b,k}} Z_{b,k}[i,j], c_{b,k}[j]}_{\theta_{\text{ref},g[i],k}^{\text{EB}} \cdot b_k(\mathbf{x}_i)} + \underbrace{\sum_{\ell=1}^{d} \Bigl[\sum_{j=1}^{W_{\text{per_k_dim}}} \bigl(\phi_j(\theta_{\text{ref},g[i],k}^{\text{EB}}) - \phi_j(\theta_{\text{anchor},k})\bigr) W_{k,j,\ell}\Bigr] x_{i\ell}}_{\bigl(W_k(\theta_{\text{ref},g[i]}^{\text{EB}}) - W_k(\theta_{\text{anchor}})\bigr)^\top \mathbf{x}_i}$$

This is the conditional step (Step (iii)) of the EB workflow: given the point estimate $\widehat{\boldsymbol{\theta}}{\text{ref}}^{\text{EB}}$ from the Laplace approximation on the marginal template, sample the remaining parameters $\xi$ from their conditional posterior. By construction, there is no extra modelling assumption between the marginal and conditional steps—the conditional posterior here is mathematically identical to the conditional $p(\xi \mid Y, \theta{\text{ref}})$ implied by the joint model of the marginal template, evaluated at $\theta_{\text{ref}} = \widehat{\theta}_{\text{ref}}^{\text{EB}}$.

The template is absent a generated quantities block in the provided source (section 1 of 2); it presumably appears in section 2.

generated quantities {
  matrix[n, p] log_lik;
  matrix[n, p] theta_i = eta;
  matrix[n, p] y_pred;

  for (i in 1:n) {
    for (k in 1:p) {
      if (family_id == 1) {
        log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
        y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
      } else if (family_id == 2) {
        log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = poisson_log_rng(eta[i, k]);
      } else if (family_id == 3) {
        log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
        y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
      } else if (family_id == 4) {
        log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
      }
    }
  }
}

Generated Quantities Block

This block computes quantities derived from the posterior samples after model fitting. It is executed once per iteration of the sampling algorithm.

Variable Declarations and Initialization

matrix[n, p] log_lik;

  • Declares an $n \times p$ matrix storing the pointwise log-likelihood for each observation $i \in {1,\dots,n}$ and each item $k \in {1,\dots,p}$.
  • Used for posterior model comparison and diagnostics (e.g., computing WAIC, LOO).

matrix[n, p] theta_i = eta;

  • Declares and initializes an $n \times p$ matrix theta_i to the linear predictor matrix eta from the transformed parameters block.
  • In the AMM reference-anchoring decomposition, eta represents the latent score or linear predictor on the link scale. By assigning it to theta_i, we preserve the latent trait values for each individual and item for post-processing.

matrix[n, p] y_pred;

  • Declares an $n \times p$ matrix storing posterior predictive samples (replications of the observed data).
  • Used for posterior predictive checks (PPCs) to assess model fit.

Computation Loops

Outer loop: for (i in 1:n) iterates over all $n$ observations (e.g., individuals or response vectors).

Inner loop: for (k in 1:p) iterates over all $p$ items (or dimensions of the multivariate response).

Family-Specific Computations

The family_id data variable selects the likelihood family. For each $(i,k)$ pair, the code computes:

  1. Pointwise Log-Likelihood $\log p(y_{i,k} \mid \eta_{i,k}, \boldsymbol{\psi}_k)$
  2. Posterior Predictive Sample $\tilde{y}{i,k} \sim p(y \mid \eta{i,k}, \boldsymbol{\psi}_k)$

where $\boldsymbol{\psi}k$ denotes family-specific dispersion parameters (e.g., $\sigma{y,k}$, $\phi_k$).


Family 1: Normal (Continuous) – family_id == 1

Likelihood: $$ \log p(y_{i,k}^{\text{real}} \mid \eta_{i,k}, \sigma_{y,k}) = -\frac{1}{2}\left(\log(2\pi) + 2\log\sigma_{y,k} + \frac{(y_{i,k}^{\text{real}} - \eta_{i,k})^2}{\sigma_{y,k}^2}\right) $$ Implementation:
normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]) computes this directly.

Posterior Predictive Distribution: $$ \tilde{y}{i,k}^{\text{real}} \sim \mathcal{N}(\eta{i,k}, \sigma_{y,k}^2) $$ Implementation:
normal_rng(eta[i, k], sigma_y[k]) draws a sample from this normal distribution.


Family 2: Poisson (Count) – family_id == 2

Likelihood (log link): $$ \log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}) = y_{i,k}^{\text{int}} \cdot \eta_{i,k} - e^{\eta_{i,k}} - \log(y_{i,k}^{\text{int}}!) $$ Implementation:
poisson_log_lpmf(y_int[i, k] | eta[i, k]) computes the Poisson log-PMF with log-rate $\eta_{i,k}$.

Posterior Predictive Distribution: $$ \tilde{y}{i,k}^{\text{int}} \sim \text{Poisson}(e^{\eta{i,k}}) $$ Implementation:
poisson_log_rng(eta[i, k]) generates a Poisson sample with rate $e^{\eta_{i,k}}$.


Family 3: Negative Binomial (Overdispersed Count) – family_id == 3

Likelihood (log link): $$ \log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}, \phi_k) = \log\Gamma(y_{i,k}^{\text{int}} + \phi_k) - \log\Gamma(\phi_k) - \log(y_{i,k}^{\text{int}}!) + \phi_k \log\phi_k + y_{i,k}^{\text{int}} \eta_{i,k} - (\phi_k + y_{i,k}^{\text{int}}) \log(\phi_k + e^{\eta_{i,k}}) $$ where $\mu_k = e^{\eta_{i,k}}$ and variance is $\mu_k + \mu_k^2/\phi_k$.

Implementation:
neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]) computes this exactly.

Posterior Predictive Distribution: $$ \tilde{y}{i,k}^{\text{int}} \sim \text{NB2}(\mu = e^{\eta{i,k}}, \phi_k) $$ Implementation:
neg_binomial_2_log_rng(eta[i, k], phi[k]) draws a sample from this distribution.


Family 4: Bernoulli (Binary) – family_id == 4

Likelihood (logit link): $$ \log p(y_{i,k}^{\text{int}} \mid \eta_{i,k}) = y_{i,k}^{\text{int}} \log\left(\frac{e^{\eta_{i,k}}}{1 + e^{\eta_{i,k}}}\right) + (1 - y_{i,k}^{\text{int}}) \log\left(\frac{1}{1 + e^{\eta_{i,k}}}\right) $$ Equivalently, $p = \text{logit}^{-1}(\eta_{i,k}) = \frac{1}{1 + e^{-\eta_{i,k}}}$.

Implementation:
bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]) computes this using the logit-scale parameterization.

Posterior Predictive Distribution: $$ \tilde{y}{i,k}^{\text{int}} \sim \text{Bernoulli}\left(\text{logit}^{-1}(\eta{i,k})\right) $$ Implementation:
bernoulli_logit_rng(eta[i, k]) draws a Bernoulli sample with probability $\text{logit}^{-1}(\eta_{i,k})$.


Role in AMM Reference-Anchoring Decomposition

In the AMM framework:

  • $\eta_{i,k}$ represents the latent score (or linear predictor) for individual $i$ on item $k$.
  • Under the reference-anchoring decomposition, $\eta$ typically has the form: $$ \eta_{i,k} = \alpha_k + \boldsymbol{x}_i^\top \boldsymbol{\beta}_k + \boldsymbol{z}i^\top \boldsymbol{u}{i,k} $$ where $\alpha_k$ are item intercepts, $\boldsymbol{\beta}k$ are item-specific fixed effects, and $\boldsymbol{u}{i,k}$ are random effects.

The generated quantities block does not alter the model's parameterization but uses $\eta$ to:

  1. Compute the pointwise log-likelihood for each data point, enabling model comparison via LOO or WAIC.
  2. Generate posterior predictive samples $\tilde{y}_{i,k}$ to assess the model's ability to replicate observed data patterns.
  3. Export theta_i = eta for potential post-hoc analysis (e.g., computing person parameters or reliability).

By conditioning on the posterior samples of $\eta$ (which encode the reference-anchored latent traits), this block ensures that all derived quantities respect the AMM decomposition and associated likelihood assumptions.

inst/stan/_canonical_pieces/amm_canonical_eb_conditional.stan

amm_canonical_eb_conditional.stan — EB Conditional Template

File path: inst/stan/_canonical_pieces/amm_canonical_eb_conditional.stan

Regime: Empirical Bayes (EB) Conditional — Sub-phase 8.6.B of the gdpar development charter. This template is the Step (iii) sampler in the EB workflow: it receives the EB point estimate $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ as data and draws from the conditional posterior $\Pi_n^{\mathrm{EB}}(\xi \mid \widehat\theta_{\mathrm{ref}}^{\mathrm{EB}})$, where $\xi = (a_{\mathrm{coef}},, b_{\mathrm{coef}},, W_{\mathrm{raw}},, \sigma_*,, \phi)$.


Header Comments (lines 1–37)

The header establishes three structural facts:

  1. Provenance. The template is invoked by R/eb.R via cmdstanr::cmdstan_model(...)$sample(...) at Step (iii) of the EB workflow described in v07 §11.1 (operational) and v07b §§4, 6 (theoretical). Step (i) produces $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$; Step (iii) plugs it in as data and samples $\xi$.

  2. Structural difference from amm_eb_marginal.stan / amm_main.stan. Three changes:

    • theta_ref moves from parameters{} to data{}, renamed theta_ref_data.
    • The hyperparameters mu_theta_ref and sigma_theta_ref are removed entirely — they existed solely to place a prior on theta_ref in the full-Bayes (FB) and EB-marginal regimes.
    • The prior on theta_ref is removed from model{}.
  3. Bit-identicality guarantee. Every other line matches amm_eb_marginal.stan so that the conditional posterior of $\xi$ given the plug-in $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ is exactly the conditional posterior that the EB asymptotic theory of v07 §5 / v07b §4 analyses. No extra modelling assumption is introduced between Steps (i) and (iii).

  4. Multivariate posture (Charter §2.7). Sub-phase 8.6.B operates under $K = 1$ (single slot per group), $p = 1$ (scalar parameter), $J \geq 1$ (one or more groups). The data block declares K_slots and p_dim with default 1; the body assumes K_slots == 1 and p_dim == 1 throughout. Sub-phase 8.6.C will extend the body to $K &gt; 1$, $p &gt; 1$ without rewriting the data declaration.


functions Block

functions {
  // {{CANONICAL_HELPERS}}
}

This block contains a single Mustache-style placeholder {{CANONICAL_HELPERS}}. At template-instantiation time (R-side, before cmdstanr::cmdstan_model), the R engine substitutes this token with the canonical helper-function definitions shared across all AMM templates. The only helper referenced in the body below is:

vector apply_W_basis_diff(int W_type_id, real theta_ref_i, real theta_anchor,
                          int dim_W, int W_degree,
                          int W_n_knots_full, vector W_knots_full)

which computes the basis-difference vector $\Delta_B(\theta_{\mathrm{ref},i},, \theta_0) \in \mathbb{R}^{\dim_W}$ (see the transformed parameters walkthrough for the mathematical definition). Because the placeholder is opaque at the Stan-source level, no further specification is given here; the substitution is faithful to whatever gdpar's R-side template engine injects.


data Block

Dimension and family controls

int<lower=1> n;
int<lower=0, upper=4> family_id;
  • n — number of observations. Constraint lower=1 enforces $n \geq 1$.
  • family_id — response-family selector, constrained to ${0,1,2,3,4}$. In the model and generated quantities blocks only values $1$–$4$ are exercised:
    • 1 → Gaussian (normal)
    • 2 → Poisson (log link)
    • 3 → Negative binomial type 2 (log link)
    • 4 → Bernoulli (logit link)

Slot-usage flags

int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;

Three binary switches that activate the three additive slots of the AMM decomposition:

  • use_a — additive random-effect slot $Z_a, a$ (unanchored).
  • use_b — reference-anchored slot $Z_b, c_b$ (where $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$).
  • use_W — smooth $W$-deviation slot $x_i^\top W^\top \Delta_B(\theta_{\mathrm{ref},i}, \theta_0)$.

Slot dimensions

int<lower=0> J_a;
int<lower=0> J_b;
int<lower=0> dim_W;
int<lower=0> d;
  • J_a — column count of $Z_a$ (number of additive coefficients).
  • J_b — column count of $Z_b$ (number of reference-anchored coefficients).
  • dim_W — number of basis rows of $W$ (rows of the $W_{\mathrm{raw}}$ matrix).
  • d — covariate dimension for the $W$ slot (columns of $W_{\mathrm{raw}}$; also the column count of $X$).

All four are constrained lower=0, allowing any slot to be absent (dimension zero).

Design matrices and responses

matrix[n, J_a] Z_a;
matrix[n, J_b] Z_b;
matrix[n, d] X;

vector[n] y_real;
array[n] int y_int;
  • Z_a$n \times J_a$ design matrix for the additive slot.
  • Z_b$n \times J_b$ design matrix for the reference-anchored slot.
  • X$n \times d$ covariate matrix for the $W$-deviation slot.
  • y_real — real-valued response, used when family_id == 1.
  • y_int — integer-valued response (array of length $n$), used when family_id ∈ {2,3,4}.

Both response vectors are always declared; only one is consumed depending on family_id.

Anchor and dispersion flags

real theta_anchor;

int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;
  • theta_anchor — the fixed anchor value $\theta_0$ used in the $W$-basis difference $\Delta_B(\theta, \theta_0)$. This is the "reference point" at which the $W$-deviation vanishes by construction.
  • use_dispersion_y — binary flag activating the Gaussian dispersion $\sigma_y$ (relevant only for family_id == 1).
  • use_dispersion_phi — binary flag activating the NB dispersion $\phi$ (relevant only for family_id == 3).

$W$-basis specification

int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;
  • W_type_id — basis-type selector, constrained to ${0,1,2}$. The actual mapping (e.g., B-spline, polynomial, etc.) is resolved inside apply_W_basis_diff (injected via {{CANONICAL_HELPERS}}).
  • W_n_knots_full — number of knots (can be 0 for knot-free bases).
  • W_knots_full — knot vector of length W_n_knots_full.
  • W_degree — spline/polynomial degree.

Grouping structure

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;
  • use_groups — binary flag. When 0, the model treats all observations as sharing a single reference (degenerate grouping); when 1, group-specific references are used.
  • J_groups — number of groups, constrained $\geq 1$.
  • group_id — integer array of length $n$, mapping each observation to a group in ${1, \dots, J_{\mathrm{groups}}}$.

Canonical-but-degenerate dimension fields (8.6.B)

int<lower=1> K_slots;
int<lower=1> p_dim;
  • K_slots — number of slots per group. In 8.6.B this is fixed at $K = 1$ (a single scalar slot per group). The field is declared with lower=1 and defaults to 1; the R-side validator .gdpar_eb_validate_inputs() checks K_slots == 1.
  • p_dim — parameter dimension per slot. In 8.6.B this is fixed at $p = 1$. Same default-and-check regime.

These fields are declared but not exercised in the 8.6.B body; they exist so that 8.6.C can add $K &gt; 1$ and $p &gt; 1$ branches without modifying the data block.

EB conditional plug-in

vector[J_groups] theta_ref_data;
  • theta_ref_data — the EB point estimate $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$, supplied as a length-$J_{\mathrm{groups}}$ vector. Each entry theta_ref_data[g] is the plug-in reference for group $g$. The R-side helper .gdpar_eb_maximize_marginal() populates this from cmdstanr::laplace() applied to amm_eb_marginal.stan, following the projection and anti-fragility logic of Charter §2.8.

This is the single most important declaration distinguishing the conditional template from the marginal/FB templates: theta_ref is data, not a parameter.


transformed data Block

int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;

Four effective-dimension integers computed once at data-loading time:

  • J_a_free — effective free dimension of the additive slot: $J_a$ if use_a == 1 and $J_a &gt; 0$, else $0$.
  • J_b_free — effective free dimension of the reference-anchored slot: $J_b$ if use_b == 1 and $J_b &gt; 0$, else $0$.
  • dim_W_eff — effective row dimension of $W_{\mathrm{raw}}$: $\dim_W$ if use_W == 1, else $0$.
  • d_eff — effective column dimension of $W_{\mathrm{raw}}$: $d$ if use_W == 1, else $0$.

These are used to size parameter vectors/matrices with zero dimensions when a slot is disabled, which is the Stan idiom for conditionally removing parameters from the model.


parameters Block

Comment

// theta_ref is now data (theta_ref_data above); no hyperparameters
// mu_theta_ref or sigma_theta_ref are sampled in the conditional
// regime --- they were a device to give theta_ref a prior in the
// marginal/FB regimes only.

Explicitly documents the absence of theta_ref, mu_theta_ref, and sigma_theta_ref from the parameter vector. In the conditional regime the parameter vector is $\xi = (a_{\mathrm{raw}},, c_{b,\mathrm{raw}},, W_{\mathrm{raw}},, \sigma_a,, \sigma_b,, \sigma_W,, \sigma_y,, \phi)$, with $\theta_{\mathrm{ref}}$ fixed.

Additive slot parameters

array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a;
vector[J_a_free] a_raw;
  • sigma_a — a conditional-size array (length 1 if the additive slot is active, length 0 otherwise) of positive scale parameters. The array[cond ? 1 : 0] idiom makes the parameter vanish from HMC when the slot is disabled.
  • a_raw — raw (unscaled) additive coefficients, length J_a_free. The scaled coefficient is a_coef[j] = a_raw[j] * {{A_SCALE}} (see transformed parameters), where {{A_SCALE}} is a Mustache placeholder (typically * sigma_a[1]).

Reference-anchored slot parameters

array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b;
vector[J_b_free] c_b_raw;
  • sigma_b — conditional-size positive scale for the reference-anchored slot.
  • c_b_raw — raw coefficients of length J_b_free. Note the name c_b_raw (not b_raw): the sampled quantity is the centered coefficient $c_b = \sigma_b \cdot c_{b,\mathrm{raw}}$, and the anchored coefficient $b_{\mathrm{coef}} = c_b / \theta_{\mathrm{ref}}$ is derived in transformed parameters. This is the reference-anchoring decomposition: the model is parameterized in terms of $c_b$ (which enters the linear predictor directly), while $b_{\mathrm{coef}}$ is the interpretable quantity satisfying $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$.

$W$-deviation slot parameters

array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;
  • sigma_W — conditional-size positive scale for the $W$ slot.
  • W_raw$\dim_W_{\mathrm{eff}} \times d_{\mathrm{eff}}$ raw coefficient matrix. When the $W$ slot is disabled, both dimensions are 0, producing a $0 \times 0$ matrix (effectively absent). The scaled matrix used in the predictor is W_raw[k,jj] * {{W_SCALE}} (typically * sigma_W[1]).

Dispersion parameters

array[use_dispersion_y] real<lower=0> sigma_y;
array[use_dispersion_phi] real<lower=0> phi;
  • sigma_y — Gaussian dispersion, array of length use_dispersion_y (0 or 1). Positive-constrained.
  • phi — Negative-binomial dispersion (reciprocal overdispersion), array of length use_dispersion_phi (0 or 1). Positive-constrained.

transformed parameters Block

Initializations

vector[J_a] a_coef = rep_vector(0, J_a);
vector[J_b] c_b = rep_vector(0, J_b);
vector[J_b] b_coef = rep_vector(0, J_b);
vector[n] eta;
  • a_coef — scaled additive coefficients, length $J_a$, initialized to zero.
  • c_b — centered (scaled) reference-anchored coefficients, length $J_b$, initialized to zero.
  • b_coef — anchored coefficients $b_{\mathrm{coef}} = c_b / \theta_{\mathrm{ref}}$, length $J_b$, initialized to zero.
  • eta — linear predictor, length $n$.

Additive slot scaling

if (use_a == 1 && J_a > 0) {
  for (j in 1:J_a_free) {
    a_coef[j] = a_raw[j]{{A_SCALE}};
  }
}

When the additive slot is active, each raw coefficient is scaled:

$$a_j = a_{\mathrm{raw},j} \cdot \texttt{A_SCALE}, \qquad j = 1, \dots, J_{a,\mathrm{free}}$$

where {{A_SCALE}} is a Mustache placeholder (typically * sigma_a[1]). The remaining entries of a_coef (if any) stay at zero.

Reference-anchored slot scaling and anchoring

if (use_b == 1 && J_b > 0) {
  for (j in 1:J_b_free) {
    c_b[j] = c_b_raw[j] * sigma_b[1];
  }
  if (use_groups == 0 && theta_ref_data[1] != 0) {
    for (j in 1:J_b) {
      b_coef[j] = c_b[j] / theta_ref_data[1];
    }
  }
}

Two steps:

  1. Scaling: $c_{b,j} = c_{b,\mathrm{raw},j} \cdot \sigma_b$, for $j = 1, \dots, J_{b,\mathrm{free}}$.

  2. Anchoring (conditional): Only when use_groups == 0 (degenerate single-group regime) and theta_ref_data[1] != 0 (the plug-in reference is nonzero), the anchored coefficient is computed:

$$b_{\mathrm{coef},j} = \frac{c_{b,j}}{\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}}, \qquad j = 1, \dots, J_b$$

This realizes the reference-anchoring decomposition $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$. The linear predictor uses $c_b$ directly (not $b_{\mathrm{coef}}$); $b_{\mathrm{coef}}$ is produced for interpretation and downstream output. When use_groups == 1 (multi-group) or $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}} = 0$, b_coef remains at zero — the anchoring is not performed because a single scalar division by one group's reference would be ill-defined or meaningless across groups.

Linear predictor construction

for (i in 1:n) {
  real theta_ref_i = theta_ref_data[group_id[i]];
  real eta_i = theta_ref_i;
  if (use_a == 1 && J_a > 0) {
    eta_i += Z_a[i] * a_coef;
  }
  if (use_b == 1 && J_b > 0) {
    eta_i += Z_b[i] * c_b;
  }
  if (use_W == 1 && dim_W > 0 && d > 0) {
    vector[d] W_diff_x = rep_vector(0, d);
    vector[dim_W] basis_diff = apply_W_basis_diff(
      W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
      W_n_knots_full, W_knots_full
    );
    for (k in 1:dim_W) {
      for (jj in 1:d) {
        W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
      }
    }
    eta_i += dot_product(W_diff_x, to_vector(X[i]));
  }
  eta[i] = eta_i;
}

For each observation $i = 1, \dots, n$:

Step 1 — Group-specific reference:

$$\theta_{\mathrm{ref},i} = \widehat\theta_{\mathrm{ref},,g(i)}^{\mathrm{EB}}$$

where $g(i) = \texttt{group_id}[i]$.

Step 2 — Initialize predictor:

$$\eta_i = \theta_{\mathrm{ref},i}$$

Step 3 — Additive slot (if active):

$$\eta_i \mathrel{+}= Z_{a,i}^\top , a_{\mathrm{coef}}$$

where $Z_{a,i}$ is the $i$-th row of $Z_a$ (a $1 \times J_a$ row vector) and $a_{\mathrm{coef}} \in \mathbb{R}^{J_a}$.

Step 4 — Reference-anchored slot (if active):

$$\eta_i \mathrel{+}= Z_{b,i}^\top , c_b$$

Note: the predictor uses $c_b$ (the centered/scaled coefficient), not $b_{\mathrm{coef}}$. This is consistent with the anchoring $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$: the contribution $Z_b^\top c_b = \theta_{\mathrm{ref}} \cdot Z_b^\top b_{\mathrm{coef}}$ is proportional to the reference, which is the defining property of the AMM reference-anchored slot.

Step 5 — $W$-deviation slot (if active and $\dim_W &gt; 0$, $d &gt; 0$):

(a) Compute the basis-difference vector:

$$\Delta_B(\theta_{\mathrm{ref},i},, \theta_0) = \texttt{apply_W_basis_diff}(\ldots) \in \mathbb{R}^{\dim_W}$$

This is the difference between the basis evaluated at the observation's reference $\theta_{\mathrm{ref},i}$ and at the anchor $\theta_0 = \texttt{theta_anchor}$. By construction $\Delta_B(\theta_0, \theta_0) = 0$, so the $W$-deviation vanishes at the anchor.

(b) Accumulate the $W$-deviation covariate vector:

$$w_{\mathrm{diff},x}[\mathrm{jj}] = \sum_{k=1}^{\dim_W} \Delta_B[k] \cdot W_{\mathrm{raw}}[k, \mathrm{jj}] \cdot \texttt{W_SCALE}, \qquad \mathrm{jj} = 1, \dots, d$$

where {{W_SCALE}} is a Mustache placeholder (typically * sigma_W[1]). In matrix form:

$$w_{\mathrm{diff},x} = \Delta_B^\top , \Sigma_W \circ W_{\mathrm{raw}} \in \mathbb{R}^{d}$$

where $\Sigma_W$ denotes the scaling applied elementwise.

(c) Add to the predictor:

$$\eta_i \mathrel{+}= w_{\mathrm{diff},x}^\top , x_i = x_i^\top , W_{\mathrm{scaled}}^\top , \Delta_B(\theta_{\mathrm{ref},i}, \theta_0)$$

where $x_i = \texttt{to_vector}(X[i]) \in \mathbb{R}^d$ and $W_{\mathrm{scaled}}$ is the scaled $W_{\mathrm{raw}}$.

Full linear predictor (AMM reference-anchoring decomposition):

$$\boxed{\eta_i = \theta_{\mathrm{ref},g(i)} + Z_{a,i}^\top a + Z_{b,i}^\top c_b + x_i^\top W_{\mathrm{scaled}}^\top , \Delta_B!\bigl(\theta_{\mathrm{ref},g(i)},, \theta_0\bigr)}$$

with the anchoring identity $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$ (when computed). This is the AMM decomposition: a reference level $\theta_{\mathrm{ref}}$, an unanchored additive correction $Z_a a$, a reference-proportional correction $Z_b c_b$, and a smooth reference-deviation $W$-term that vanishes at $\theta_0$.

Step 6 — Store:

$$\texttt{eta}[i] = \eta_i$$


model Block

Prior section

// BEGIN PRIORS
// No prior on theta_ref: theta_ref is data in the conditional
// template. The hyperprior block of the marginal template
// (mu_theta_ref, sigma_theta_ref) is removed here for the same
// reason. xi components retain their canonical priors.

The comment block explicitly states that:

  • No prior is placed on $\theta_{\mathrm{ref}}$ (it is data).
  • The hyperprior block (mu_theta_ref, sigma_theta_ref) from the marginal template is removed.
  • The $\xi$ components retain their canonical priors (identical to amm_eb_marginal.stan).

Additive slot priors

if (use_a == 1 && J_a > 0) {
  sigma_a[1] ~ {{PRIOR_SIGMA_A}};
  a_raw ~ {{A_PRIOR}};
}
  • $\sigma_a \sim \texttt{PRIOR_SIGMA_A}$ (Mustache placeholder for a positive-support prior, e.g., half-normal, half-Cauchy, exponential).
  • $a_{\mathrm{raw}} \sim \texttt{A_PRIOR}$ (Mustache placeholder, typically normal(0, 1) for a non-centered parameterization, but the exact distribution is resolved at template-instantiation time).

Reference-anchored slot priors

if (use_b == 1 && J_b > 0) {
  sigma_b[1] ~ {{PRIOR_SIGMA_B}};
  c_b_raw ~ normal(0, 1);
}
  • $\sigma_b \sim \texttt{PRIOR_SIGMA_B}$ (Mustache placeholder).
  • $c_{b,\mathrm{raw},j} \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, 1)$ — standard normal on the raw coefficients. This is a non-centered parameterization: $c_{b,j} = \sigma_b \cdot c_{b,\mathrm{raw},j}$, so marginally $c_{b,j} \sim \mathcal{N}(0, \sigma_b^2)$.

Note: unlike a_raw (whose prior is a placeholder), c_b_raw has a hard-coded normal(0, 1) prior. This asymmetry is faithful to the source.

$W$-deviation slot priors

if (use_W == 1 && dim_W > 0) {
  sigma_W[1] ~ {{PRIOR_SIGMA_W}};
  to_vector(W_raw) ~ {{W_PRIOR}};
}
  • $\sigma_W \sim \texttt{PRIOR_SIGMA_W}$ (Mustache placeholder).
  • $\mathrm{vec}(W_{\mathrm{raw}}) \sim \texttt{W_PRIOR}$ (Mustache placeholder). The to_vector() call column-major vectorizes the matrix so a single multivariate or iid prior can be applied.

Dispersion priors

if (use_dispersion_y == 1) {
  sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
  phi[1] ~ {{PRIOR_PHI}};
}
  • $\sigma_y \sim \texttt{PRIOR_SIGMA_Y}$ (Mustache placeholder), when Gaussian dispersion is active.
  • $\phi \sim \texttt{PRIOR_PHI}$ (Mustache placeholder), when NB dispersion is active.
// END PRIORS

Likelihood section

if (family_id == 1) {
  y_real ~ normal(eta, sigma_y[1]);
} else if (family_id == 2) {
  y_int ~ poisson_log(eta);
} else if (family_id == 3) {
  y_int ~ neg_binomial_2_log(eta, phi[1]);
} else if (family_id == 4) {
  y_int ~ bernoulli_logit(eta);
}

The likelihood is selected by family_id:

Family 1 (Gaussian):

$$y_i \mid \eta_i, \sigma_y \stackrel{\mathrm{ind}}{\sim} \mathcal{N}(\eta_i, \sigma_y^2), \qquad i = 1, \dots, n$$

The normal(eta, sigma_y[1]) vectorized form adds $\sum_{i=1}^n \log \mathcal{N}(y_i \mid \eta_i, \sigma_y^2)$ to the log-posterior.

Family 2 (Poisson, log link):

$$y_i \mid \eta_i \stackrel{\mathrm{ind}}{\sim} \mathrm{Poisson}(\exp(\eta_i))$$

poisson_log(eta) uses $\eta$ directly as the log-mean; no explicit exp() is needed.

Family 3 (Negative binomial type 2, log link):

$$y_i \mid \eta_i, \phi \stackrel{\mathrm{ind}}{\sim} \mathrm{NegBinomial_2}(\mu_i = \exp(\eta_i),, \phi)$$

where $\phi$ is the precision (reciprocal overdispersion) parameter: $\mathrm{Var}(y_i) = \mu_i + \mu_i^2 / \phi$.

Family 4 (Bernoulli, logit link):

$$y_i \mid \eta_i \stackrel{\mathrm{ind}}{\sim} \mathrm{Bernoulli}(\mathrm{logit}^{-1}(\eta_i))$$

bernoulli_logit(eta) uses $\eta$ directly as the log-odds.

Conditional posterior realized

The model block encodes the conditional posterior:

$$\Pi_n^{\mathrm{EB}}(\xi \mid \widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}) \propto L_n!\bigl(y \mid \xi,, \widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}\bigr) \cdot \pi(\xi)$$

where:

  • $L_n$ is the likelihood from the selected family,
  • $\pi(\xi)$ is the product of the canonical priors on $(a_{\mathrm{raw}}, c_{b,\mathrm{raw}}, W_{\mathrm{raw}}, \sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi)$,
  • $\widehat\theta_{\mathrm{ref}}^{\mathrm{EB}}$ enters as a fixed constant (data), not a random variable.

There is no Jacobian adjustment beyond what Stan automatically applies for the lower=0 constraints on the positive parameters ($\sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi$). The non-centered parameterizations ($a_{\mathrm{raw}} \to a_{\mathrm{coef}}$, $c_{b,\mathrm{raw}} \to c_b$) are linear, so no additional Jacobian is needed.


generated quantities Block

vector[n] log_lik;
vector[n] theta_i = eta;
vector[n] y_pred;
  • log_lik — pointwise log-likelihood, length $n$. Used for LOO/WAIC via loo::loo() in R.
  • theta_i — the per-observation "theta" quantity, defined as theta_i = eta. In this parameterization the linear predictor $\eta_i$ is $\theta_i$; there is no separate link-function inversion. This assignment makes explicit that the AMM's latent "theta" for observation $i$ equals the constructed predictor.
  • y_pred — posterior-predictive draws, length $n$.

Per-observation loop

for (i in 1:n) {
  if (family_id == 1) {
    log_lik[i] = normal_lpdf(y_real[i] | eta[i], sigma_y[1]);
    y_pred[i] = normal_rng(eta[i], sigma_y[1]);
  } else if (family_id == 2) {
    log_lik[i] = poisson_log_lpmf(y_int[i] | eta[i]);
    y_pred[i] = poisson_log_rng(eta[i]);
  } else if (family_id == 3) {
    log_lik[i] = neg_binomial_2_log_lpmf(y_int[i] | eta[i], phi[1]);
    y_pred[i] = neg_binomial_2_log_rng(eta[i], phi[1]);
  } else if (family_id == 4) {
    log_lik[i] = bernoulli_logit_lpmf(y_int[i] | eta[i]);
    y_pred[i] = bernoulli_logit_rng(eta[i]);
  }
}

For each observation $i$:

Family 1: $$\texttt{log_lik}[i] = \log \mathcal{N}(y_i \mid \eta_i, \sigma_y^2), \qquad \texttt{y_pred}[i] \sim \mathcal{N}(\eta_i, \sigma_y^2)$$

Family 2: $$\texttt{log_lik}[i] = \log \mathrm{Poisson}(y_i \mid \exp(\eta_i)), \qquad \texttt{y_pred}[i] \sim \mathrm{Poisson}(\exp(\eta_i))$$

Family 3: $$\texttt{log_lik}[i] = \log \mathrm{NegBinomial_2}(y_i \mid \exp(\eta_i), \phi), \qquad \texttt{y_pred}[i] \sim \mathrm{NegBinomial_2}(\exp(\eta_i), \phi)$$

Family 4: $$\texttt{log_lik}[i] = \log \mathrm{Bernoulli}(y_i \mid \mathrm{logit}^{-1}(\eta_i)), \qquad \texttt{y_pred}[i] \sim \mathrm{Bernoulli}(\mathrm{logit}^{-1}(\eta_i))$$

The _lpdf / _lpmf forms compute the exact pointwise log-density/mass; the _rng forms draw from the predictive distribution conditional on the current HMC draw of $(\eta_i, \sigma_y, \phi)$. These are computed for every posterior draw, producing $n$-vectors of log-likelihood and predictive samples.


Summary: How the Template Realizes the AMM Reference-Anchoring Decomposition

The template encodes the AMM (Anchored Mixed Model) decomposition with $\theta_{\mathrm{ref}}$ fixed at the EB plug-in:

$$\eta_i = \underbrace{\theta_{\mathrm{ref},g(i)}}_{\text{reference level}} + \underbrace{Z_{a,i}^\top a}_{\text{additive (unanchored)}} + \underbrace{Z_{b,i}^\top c_b}_{\text{reference-proportional}} + \underbrace{x_i^\top W^\top \Delta_B(\theta_{\mathrm{ref},g(i)}, \theta_0)}_{\text{smooth deviation (vanishes at }\theta_0\text{)}}$$

with the anchoring identity $c_b = \theta_{\mathrm{ref}} \cdot b_{\mathrm{coef}}$ (computed when use_groups == 0 and $\theta_{\mathrm{ref}} \neq 0$). The conditional posterior $\Pi_n^{\mathrm{EB}}(\xi \mid \widehat\theta_{\mathrm{ref}}^{\mathrm{EB}})$ is sampled via HMC over $\xi = (a_{\mathrm{raw}}, c_{b,\mathrm{raw}}, W_{\mathrm{raw}}, \sigma_a, \sigma_b, \sigma_W, \sigma_y, \phi)$, with $\theta_{\mathrm{ref}}$ entering as data. The template is bit-identical to amm_eb_marginal.stan except for the three structural changes documented in the header: theta_ref → data, hyperparameters removed, prior on theta_ref removed.


← Part IV — Complete Function Reference (7/7) · gdpar Wiki Home · Part V — Stan Templates (2/3) →

Clone this wiki locally