Part V Stan Templates 2

← Part V — Stan Templates (1/3) · gdpar Wiki Home · Part V — Stan Templates (3/3) →

inst/stan/_canonical_pieces/amm_canonical_eb_marginal_K.stan

Stan Template Walkthrough: `amm_canonical_eb_marginal_K.stan` — Section 1 of 3

This template implements the Empirical Bayes (EB) marginal model for the AMM (Anchored Modulation Model) reference-anchoring decomposition with $K \geq 2$ distributional slots and $p = 1$ (univariate population reference per slot). It is invoked by R/eb.R via cmdstanr::laplace() in Step (i) of the EB workflow. The template body is bit-identical to the full-Bayesian template amm_distrib_K.stan; only the banner comment documents the EB context.

1. Header Comments

The banner establishes:

Provenance: Sub-phase 8.6.C, decision D34, canonized in Session 10 (2026-05-25).
EB context: Used by R/eb.R via cmdstanr::laplace() for Laplace-approximate marginal inference on $\boldsymbol{\theta}_{\text{ref}}$.
Regime: $K \geq 2,\ p = 1$. Each of the $K$ slots carries a univariate $\theta_{\text{ref},k}$.
Codegen identity: Body is bit-identical to amm_distrib_K.stan, avoiding drift between FB (full Bayes) and EB Path B paths. The R-side function generate_stan_code_K() is reused verbatim via a template_path override, including the full dispatch over stan_id $\in \{1,3,5,6,7,8,9,10,11,12,13\}$ covering D33 deferral.
Companion template: amm_eb_conditional_K.stan moves $\theta_{\text{ref},k}$ from parameters{} to data{} (as theta_ref_k_data), removes the per-slot hyperparameters mu_theta_ref_k / sigma_theta_ref_k from parameters{}, and removes the THETA_REF_PRIOR_BLOCK substitution from model{}, but is otherwise bit-identical.

AMM Canonical Decomposition

For each slot $k \in \{1, \ldots, K\}$, the linear predictor on the link-function scale is:

$$ \eta_k[i] ;=; \theta_{\text{ref},k} ;+; a_k(\mathbf{x}_i) ;+; b_k(\mathbf{x}_i),\theta_{\text{ref},k} ;+; \bigl(W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k})\bigr),\mathbf{x}_i $$

where:

Component	Meaning	Basis Expansion
$\theta_{\text{ref},k}$	Global reference (intercept) for slot $k$	Scalar
$a_k(\mathbf{x}_i)$	Non-linear deviation from reference	$\mathbf{z}_{a,k}[i,]\,\boldsymbol{\alpha}_k$ with $J_a^{(k)}$ basis columns
$b_k(\mathbf{x}_i)\,\theta_{\text{ref},k}$	Non-linear scaling interaction	$(\mathbf{z}_{b,k}[i,]\,\boldsymbol{\beta}_k)\,\theta_{\text{ref},k}$ with $J_b^{(k)}$ basis columns
$(W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k}))\,\mathbf{x}_i$	Modulation term	$W$ globally shared; $\mathbf{X}$ is $n \times d$

The design matrices $\mathbf{Z}_{a,k}$ and $\mathbf{Z}_{b,k}$ are column-wise centered on the R side (via build_amm_design_K()), ensuring the identification constraints (C2) and (C3) hold identically per slot. Constraint (C4) holds per slot via $W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k})$.

b_k Reparametrization (Recovery 2)

To eliminate the bimodality of $(\theta_{\text{ref},k},\, \boldsymbol{\beta}_k)$:

Define $\mathbf{c}_{b,k} = \theta_{\text{ref},k} \cdot \boldsymbol{\beta}_k$ (effective coefficients for the $b_k$ basis).
The model samples $\mathbf{c}_{b,k}^{\text{raw}}$ directly (non-centered parametrization, NCP).
Reports $\boldsymbol{\beta}_k = \mathbf{c}_{b,k} / \theta_{\text{ref},k}$ as a derived quantity.

This eliminates the sign-symmetry bimodality. See Recovery 2 (handoff 4 addendum, 2026-05-09).

Scope Decisions

$W$: Globally shared across all $K$ slots. A single W_raw, a single sigma_W, and a single basis matrix feed the modulating term in every slot.
$\theta_{\text{anchor}}$: Per-slot. Each slot $k$ carries its own $\theta_{\text{anchor},k}$. Default: 0.

2. `functions { }` Block

2.1 `tweedie_log_W_series` — Dunn–Smyth Series Log-Sum

real tweedie_log_W_series(real y, real phi, real p)

Purpose: Computes $\log W(y, \phi, p) = \log \sum_{j=1}^{\infty} W_j$, the log of the infinite series correction factor in the Dunn–Smyth (2005) expansion for the Tweedie exponential dispersion model density in the regime $1 < p < 2$.

Mathematics: For $1 < p < 2$, define $\alpha = \frac{2-p}{p-1} > 0$. Each summand is:

$$ W_j = \frac{y^{j\alpha}}{(p-1)^{j\alpha},\phi^{j(1+\alpha)},(2-p)^{j},j!,\Gamma(j\alpha)} $$

In log space:

$$ \log W_j = j,\alpha \log y ;-; j,\alpha\log(p{-}1) ;-; j(1{+}\alpha)\log\phi ;-; j\log(2{-}p) ;-; \log(j!) ;-; \log\Gamma(j\alpha) $$

Line-by-line:

real alpha_pos = (2.0 - p) / (p - 1.0);: Computes $\alpha = \frac{2-p}{p-1} > 0$ for $1 < p < 2$.
real log_z = alpha_pos * log(y) - alpha_pos * log(p - 1.0) - (1.0 + alpha_pos) * log(phi) - log(2.0 - p);: Computes the "base" log-factor $\log z$ common to each term, where: $$ \log z = \alpha \log y - \alpha \log(p-1) - (1+\alpha)\log\phi - \log(2-p) $$ so that $\log W_j = j \cdot \log z - \log(j!) - \log\Gamma(j\alpha)$.
real log_W = negative_infinity();: Initializes the running log-sum to $-\infty$ (log of zero).
real max_log_W = negative_infinity();: Tracks the running maximum of $\log W_j$ for early termination.
int max_iter = 1000;: Hard ceiling on the number of series terms.
int passed_peak = 0;: Flag indicating whether the series terms have started declining from the peak.
Loop for (j_loop in 1:max_iter):
- real j = j_loop;: Cast loop index to real.
- real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);: Computes $\log W_j = j \cdot \log z - \log\Gamma(j+1) - \log\Gamma(j\alpha)$, noting that $\log(j!) = \log\Gamma(j+1)$.
- if (term > max_log_W) { max_log_W = term; }: Updates the running maximum.
- else { passed_peak = 1; }: Once a term is below the running max, the peak has been passed and the series is in its tail.
- log_W = log_sum_exp(log_W, term);: Accumulates $\log W = \text{log\_sum\_exp}(\log W, \log W_j)$, i.e., the log of the partial sum.
- if (passed_peak == 1 && term < max_log_W - 37.0) { break; }: Early termination when the current term is more than 37 nats below the peak ($e^{-37} \approx 10^{-16}$, i.e., relative contribution below machine-epsilon scale).
return log_W;: Returns $\log \sum_{j=1}^{J} W_j$ with adaptive truncation.

2.2 `tweedie_log_f_series` — Exact Log-Density via Series

real tweedie_log_f_series(real y, real mu, real phi, real p)

Purpose: Exact Tweedie log-density for $y > 0,\ 1 < p < 2$, using the Dunn–Smyth series. This is the canonical exponential dispersion form.

Mathematics: The Tweedie density in exponential dispersion form:

$$ f(y \mid \mu, \phi, p) = \exp!\left{\frac{y,\theta - \kappa(\theta)}{\phi} - \log y + c(y, \phi)\right} $$

where $\theta = \frac{\mu^{1-p}}{1-p}$ and $\kappa = \frac{\mu^{2-p}}{2-p}$ are the natural parameter and cumulant function respectively, and $c(y, \phi)$ is the carrier density term. The series expansion handles $c(y, \phi)$.

Line-by-line:

real theta = pow(mu, 1.0 - p) / (1.0 - p);: Natural parameter $\theta = \frac{\mu^{1-p}}{1-p}$. For $1 < p < 2,\ 1 - p < 0$, so $\theta < 0$.
real kappa = pow(mu, 2.0 - p) / (2.0 - p);: Cumulant function $\kappa(\theta) = \frac{\mu^{2-p}}{2-p}$. For $1 < p < 2,\ 2 - p > 0$, so $\kappa > 0$.
return (y * theta - kappa) / phi - log(y) + tweedie_log_W_series(y, phi, p);: Returns: $$ \log f(y) = \frac{y,\theta - \kappa}{\phi} - \log y + \log W(y, \phi, p) $$

2.3 `tweedie_log_f_saddlepoint` — Saddlepoint Log-Density

real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p)

Purpose: Saddlepoint approximation to the Tweedie log-density for $y > 0$. Used outside the series central region ($|p - 1.5| \geq \tau$) for gradient stability. Based on Wood (2017), Eq. 3.46.

Mathematics: The saddlepoint approximation uses half the unit deviance:

$$ \log f(y) \approx -\frac{1}{2}\log(2\pi\phi,y^p) - \frac{d(y, \mu)}{2\phi} $$

where the unit deviance $d(y, \mu)$ for the Tweedie family is:

$$ d(y, \mu) = 2!\left[\frac{y,(y^{1-p} - \mu^{1-p})}{1-p} - \frac{y^{2-p} - \mu^{2-p}}{2-p}\right] $$

Line-by-line:

real deviance = 2.0 * ( y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p) - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p) );: Computes $d(y, \mu)$.
return -0.5 * log(2.0 * pi() * phi * pow(y, p)) - deviance / (2.0 * phi);: Returns the saddlepoint approximation $\log \hat{f}(y)$.

2.4 `tweedie_lpdf` — Public Log-PDF Entry Point

real tweedie_lpdf(real y, real mu, real phi, real p)

Purpose: Dispatches to the appropriate log-density computation based on $y$ and $|p - 1.5|$. The _lpdf suffix enables Stan's y ~ tweedie(mu, phi, p) sampling syntax.

Mathematics:

$y = 0$: Closed-form compound Poisson mass: $$ \Pr(Y = 0) = \exp(-\lambda), \quad \lambda = \frac{\mu^{2-p}}{\phi,(2-p)} $$ so $\log \Pr(Y=0) = -\lambda$.
$y > 0,\ |p - 1.5| < \tau$ ($\tau = 0.4$, central region): Dunn–Smyth series (tweedie_log_f_series).
$y > 0,\ |p - 1.5| \geq \tau$ (tail region): Saddlepoint approximation (tweedie_log_f_saddlepoint).

Line-by-line:

real tau = 0.4;: Canonical threshold for the series/saddlepoint boundary.
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));: Poisson rate $\lambda = \frac{\mu^{2-p}}{\phi(2-p)}$ of the compound Poisson representation.
if (y == 0.0) { return -lambda; }: $\log\Pr(Y=0) = -\lambda$.
if (abs(p - 1.5) < tau) { return tweedie_log_f_series(y, mu, phi, p); }: Central region → exact series.
else { return tweedie_log_f_saddlepoint(y, mu, phi, p); }: Tail region → saddlepoint.

2.5 `tweedie_rng` — Compound Poisson–Gamma Random Number Generator

real tweedie_rng(real mu, real phi, real p)

Purpose: Generates a draw from the Tweedie($\mu, \phi, p$) distribution using the compound Poisson–gamma representation.

Mathematics: For $1 < p < 2$:

$N \sim \text{Poisson}(\lambda)$ with $\lambda = \frac{\mu^{2-p}}{\phi(2-p)}$
$G_i \sim \text{Gamma}(\text{shape}, \text{rate})$ with $\text{shape} = \frac{2-p}{p-1},\ \text{rate} = \frac{1}{\phi(p-1)\mu^{p-1}}$
$Y = \sum_{i=1}^{N} G_i$ (and $Y = 0$ if $N = 0$)

Line-by-line:

real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));: Poisson rate $\lambda$.
int N = poisson_rng(lambda);: Draws $N \sim \text{Poisson}(\lambda)$.
if (N == 0) { return 0.0; }: If no gamma variates to sum, return 0.
real shape = (2.0 - p) / (p - 1.0);: Gamma shape $= \alpha$.
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));: Gamma rate $= \frac{1}{\phi(p-1)\mu^{p-1}}$.
return gamma_rng(N * shape, rate);: Returns $Y \sim \text{Gamma}(N \cdot \alpha, \text{rate})$, which equals the sum of $N$ i.i.d. $\text{Gamma}(\alpha, \text{rate})$ variates by the additivity property of the gamma distribution.

2.6 `apply_inv_link_by_id` — Inverse-Link Dispatch

real apply_inv_link_by_id(int link_id, real eta)

Purpose: Maps a link-function identifier to the canonical inverse-link transformation. Used by the $K = 2$ likelihood branches (family IDs $\in \{1, 3, 5, 6, 7\}$) to compose the auxiliary slot's transformation under the L1 refined rule.

Mapping (populated R-side by .gdpar_compute_inv_link_id_per_slot under decision D3.5):

`link_id`	Transformation	$\eta \mapsto$	Example slots
0	Identity	$\eta$	Gaussian $\mu$, Student-t $\mu$, lognormal $\mu$
1	Inverse-logit	$\frac{1}{1+e^{-\eta}}$	Beta $\mu$, Bernoulli $\mu$
2	Exponential	$e^{\eta}$	Gaussian $\sigma$, NB $\phi$, Beta $\phi$, Gamma $\mu/\alpha$, Tweedie $\mu/\phi$, lognormal $\sigma$, mixture $\mu/\phi$

Line-by-line:

if (link_id == 0) return eta;: Identity: $g^{-1}(\eta) = \eta$.
if (link_id == 1) return inv_logit(eta);: Inverse-logit: $g^{-1}(\eta) = \text{logit}^{-1}(\eta) = \frac{e^{\eta}}{1+e^{\eta}}$.
return exp(eta);: Fallback (link_id == 2): exponential: $g^{-1}(\eta) = e^{\eta}$.

The heterogeneity rule: for a homogeneous slot $k$ (where family_id_k[k] == family_id_k[1]), the ID reflects the canonical link of slot $k$ of the location family. For a heterogeneous slot $k$ (family_id_k[k] != family_id_k[1]), the ID reflects the canonical link of slot 1 of family_id_k[k].

2.7 `{{CANONICAL_HELPERS}}` — Code-Generated Helper Functions

This is a placeholder substituted by R/stan_codegen.R during template expansion. It injects family-specific canonical helper functions (e.g., log-likelihood wrappers, Jacobian corrections) resolved by the stan_id dispatch. The exact contents depend on the family configuration.

3. `data { }` Block

3.1 Core Dimensions

int<lower=1> n;

Number of observations. Must be $\geq 1$.

int<lower=2> K;

Number of $K$-individual (distributional) slots. The template fires only when $K > 1$ (Unit 3). For example, $K = 2$ for Gaussian ($\mu, \sigma$), $K = 3$ for Student-t ($\mu, \sigma, \nu$).

3.2 Family and Link Identifiers

array[K] int<lower=1, upper=13> family_id_k;

Per-slot family identifier. Each slot $k$ carries its own family ID. The 13 supported families are:

ID	Family	Links (per slot)	$K$
1	`gaussian`	$\mu$: identity, $\sigma$: log	2
3	`neg_binomial_2`	$\mu$: log, $\phi$: log	2
5	`beta`	$\mu$: logit, $\phi$: log	2
6	`gamma`	$\mu$: log, $\alpha$: log	2
7	`lognormal_loc_scale`	$\mu$: identity, $\sigma$: log	2
8	`student_t`	$\mu$: identity, $\sigma$: log, $\nu$: log	3
9	`tweedie`	$\mu$: log, $\phi$: log, $p$: identity	3
10	`zip`	$\mu$: log, $\pi$: logit	2
11	`zinb`	$\mu$: log, $\phi$: log, $\pi$: logit	3
12	`hurdle_poisson`	$\mu$: log, $\pi$: logit	2
13	`hurdle_neg_binomial_2`	$\mu$: log, $\phi$: log, $\pi$: logit	3

Sub-phase 8.3.7 admits heterogeneous families per slot for the $K = 2$ location families $\{1, 3, 5, 6, 7\}$; for $K = 3+$ families ($8, 9, 10\text{–}13$), heterogeneity is deferred. The likelihood block branches on family_id_k[1] (the location slot); auxiliary slots consume inv_link_id_per_slot[k] via apply_inv_link_by_id().

array[K] int<lower=0, upper=2> inv_link_id_per_slot;

Per-slot inverse-link identifier. Populated R-side by .gdpar_compute_inv_link_id_per_slot under the L1 refined rule (decision D3.5). Values: 0 = identity, 1 = inv_logit, 2 = exp. Consumed by the $K = 2$ likelihood branches for families $\{1, 3, 5, 6, 7\}$.

3.3 AMM Component Flags

array[K] int<lower=0, upper=1> use_a_k;

Per-slot flag: 1 if the deviation component $a_k(\mathbf{x}_i)$ is active for slot $k$, 0 otherwise.

array[K] int<lower=0, upper=1> use_b_k;

Per-slot flag: 1 if the scaling interaction component $b_k(\mathbf{x}_i)$ is active for slot $k$, 0 otherwise.

int<lower=0, upper=1> use_W;

Global flag: 1 if the modulation component $W$ is active. $W$ is shared across all $K$ slots (Sub-decision 2, Unit 3 paso 3).

3.4 Basis Dimensions and Design Matrices

array[K] int<lower=0> J_a_per_k;

Number of basis columns in $\mathbf{Z}_{a,k}$ for each slot $k:\ J_a^{(k)}$.

array[K] int<lower=0> J_b_per_k;

Number of basis columns in $\mathbf{Z}_{b,k}$ for each slot $k:\ J_b^{(k)}$.

int<lower=0> J_a_max;

$\max_k J_a^{(k)}$. Used for padding the per-slot design matrices to uniform Stan array dimensions.

int<lower=0> J_b_max;

$\max_k J_b^{(k)}$. Same padding purpose.

array[K] matrix[n, J_a_max] Z_a_k;

Padded per-slot design matrices for the $a_k$ component. Each Z_a_k[k] is $n \times J_a^{\max}$; the first $J_a^{(k)}$ columns are the actual basis, and columns $J_a^{(k)}+1 : J_a^{\max}$ are padding (zeros). The segment() function extracts the active columns. These matrices are column-wise centered on the R side, enforcing identification constraints (C2) and (C3) per slot.

array[K] matrix[n, J_b_max] Z_b_k;

Padded per-slot design matrices for the $b_k$ component, analogous to Z_a_k. Column-wise centered on the R side.

3.5 Global Modulating Component (W)

int<lower=0> dim_W;

Total dimension of the $W$ basis parameterization. Used to size W_raw in the parameters block.

int<lower=0> d;

Number of columns in the covariate matrix $\mathbf{X}$ used by the modulation term $(W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k}))\,\mathbf{x}_i$. Typically $d \geq 1$.

matrix[n, d] X;

The $n \times d$ covariate matrix $\mathbf{X}$ for the modulation term. Each row $\mathbf{x}_i$ is multiplied by the scalar modulation $(W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k}))$ per slot $k$.

3.6 Outcomes

vector[n] y_real;

Real-valued response vector. Used by continuous families: Gaussian (1), beta (5), gamma (6), lognormal (7), Student-t (8), Tweedie (9). Also used by continuous hurdle families when implemented.

array[n] int y_int;

Integer-valued response array. Used by discrete families: neg_binomial_2 (3), zero-inflated Poisson (10), zero-inflated NB (11), hurdle Poisson (12), hurdle NB (13). The consumed branch follows family_id_k[1].

3.7 Anchoring Parameters

vector[K] theta_anchor_K;

Per-slot anchor values $\theta_{\text{anchor},k}$ for $k = 1, \ldots, K$. The R-side packager exposes this as a named numeric vector with names matching the slot order of gdpar_formula_set (e.g., theta_anchor = c(mu = 0, sigma = 0)). Default is typically 0. The modulation term evaluates $W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k})$.

3.8 Dispersion Flags

array[K] int<lower=0, upper=1> use_dispersion_y_k;

Per-slot flag indicating whether the slot $k$ contributes a population-level dispersion parameter on the $y$-scale. For example, Gaussian $K = 2$ has no separate $\sigma_y$ entry because the scale is $K$-individual (it is one of the $K$ slots).

array[K] int<lower=0, upper=1> use_dispersion_phi_k;

Per-slot flag indicating whether slot $k$ carries a population-level $\phi$-scale dispersion parameter. These flags reproduce the $K = 1 + p \geq 1$ behaviour on slots whose param_spec retains population scope, kept for future extension to mixed ($K$-individual + population) family patterns.

3.9 W Basis Spline Configuration (Sub-phase 8.3.8)

int<lower=0, upper=2> W_type_id;

Type identifier for the $W$ basis: 0 = none/disabled, 1 = polynomial, 2 = B-spline. Determines how the modulation function $W(\cdot)$ is constructed.

int<lower=0> W_n_knots_full;

Total number of knots for the $W$ spline basis (including boundary knots). Populated R-side.

vector[W_n_knots_full] W_knots_full;

Full knot sequence for the $W$ spline. These knots define the B-spline basis for $W(\cdot)$.

int<lower=0> W_degree;

Degree of the $W$ spline basis (e.g., degree 3 for cubic B-splines).

3.10 Grouping Structure (Block 6.5)

int<lower=0, upper=1> use_groups;

Flag: 1 if hierarchical grouping is active. The grouping structure is shared across all $K$ slots: a single $(J_{\text{groups}}, \texttt{group\_id})$ tuple drives the hierarchical anchor on $\theta_{\text{ref},k}[g]$ for every $k$.

int<lower=1> J_groups;

Number of groups. Only meaningful when use_groups == 1.

array[n] int<lower=1, upper=J_groups> group_id;

Group membership indicator for each observation $i$. Maps observation $i$ to group $g \in \{1, \ldots, J_{\text{groups}}\}$. Used to implement group-level $\theta_{\text{ref},k}[g]$ in the hierarchical extension.

3.11 CP/NCP Parametrization per Slot (Placeholder)

{{DATA_CP_A_PER_K_DECL}}

Placeholder substituted by generate_a_blocks_K() based on the cp_a_per_K logical vector of length $K$. This declares per-slot data fields controlling whether the $a_k$ coefficients use centered parametrization (CP) or non-centered parametrization (NCP). For CP slots, the prior is applied directly to the coefficients; for NCP slots, a raw parameter is sampled and transformed. The substitution is resolved identically to the multi-family template amm_distrib_multi.stan.

3.12 EB Convention Fields

int<lower=1> K_slots;
int<lower=1> p_dim;

Explicit dimensionality fields for the EB Path B convention. Under Path B:

K_slots == K (number of distributional parameter slots)
p_dim == 1 (univariate population reference per slot)

These are declared so that R-side .gdpar_eb_* helpers can populate them uniformly across all EB templates. The body of this template does not branch on either field.

Summary of Template Architecture

This section (1/3) establishes the complete functions block (Tweedie density with series/saddlepoint dispatch, inverse-link dispatch, and code-generated helpers) and the complete data block (observation count, family/link identifiers, AMM component flags and design matrices, outcomes, anchors, dispersion flags, spline configuration for $W$, grouping structure, and EB convention fields). The AMM reference-anchoring decomposition is realized as:

$$ \eta_k[i] = \underbrace{\theta_{\text{ref},k}}_{\text{reference}} + \underbrace{\mathbf{z}_{a,k}[i,],\boldsymbol{\alpha}_k}_{a_k(\mathbf{x}_i)} + \underbrace{(\mathbf{z}_{b,k}[i,],\boldsymbol{\beta}_k),\theta_{\text{ref},k}}_{b_k(\mathbf{x}_i),\theta_{\text{ref},k}} + \underbrace{(W(\theta_{\text{ref},k}) - W(\theta_{\text{anchor},k})),\mathbf{x}_i}_{\text{modulation}} $$

with the $b_k$ reparametrization sampling $\mathbf{c}_{b,k} = \theta_{\text{ref},k}\,\boldsymbol{\beta}_k$ directly (NCP) to eliminate bimodality. Sections 2 and 3 will cover the remaining blocks: transformed data, parameters, transformed parameters, model, and generated quantities.

Block-by-Block Walkthrough of `amm_canonical_eb_marginal_K.stan` (Section 2 of 3)

This section contains four of Stan's program blocks in sequence: transformed data, parameters, transformed parameters, and the model block. Each is dissected below.

1. `transformed data` Block

This block computes deterministic quantities from the data-level inputs before any parameters are sampled. Its primary role is bookkeeping: it sizes arrays, computes offsets for flat-packing raw parameter vectors, and builds index maps that ensure no unused parameter slot enters the model.

1.1 Per-slot free-coefficient counts and offsets

array[K] int J_a_free;
array[K] int J_b_free;
array[K] int a_raw_offset;
array[K] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;

J_a_free[k] — Number of free (non-trivially sampled) additive-design columns for slot $k$. A column is free only if (a) the slot participates in the additive component (use_a_k[k] == 1) and (b) the slot actually has at least one design column (J_a_per_k[k] > 0). Otherwise it collapses to 0. Mathematically:

$$ J_{a,\text{free}}^{(k)} = \begin{cases} J_a^{(k)} & \text{if } \texttt{use_a_k}[k]=1 ;\wedge; J_a^{(k)}>0,\\ 0 & \text{otherwise}. \end{cases} $$

J_b_free[k] — Identical logic for the multiplicative-design columns:

$$ J_{b,\text{free}}^{(k)} = \begin{cases} J_b^{(k)} & \text{if } \texttt{use_b_k}[k]=1 ;\wedge; J_b^{(k)}>0,\\ 0 & \text{otherwise}. \end{cases} $$

a_raw_offset[k] — Cumulative byte-offset into the flat-packed raw additive vector a_raw where slot $k$'s raw coefficients begin. Computed as total_J_a_free before incrementing, so slot $k$'s raw entries occupy indices a_raw_offset[k] + 1 through a_raw_offset[k] + J_a_free[k].
c_b_raw_offset[k] — Same for the multiplicative raw vector c_b_k_raw.
total_J_a_free, total_J_b_free — Running sums, ultimately equal to $\sum_{k=1}^{K} J_{a,\text{free}}^{(k)}$ and $\sum_{k=1}^{K} J_{b,\text{free}}^{(k)}$ respectively. These become the dimension of the flat parameter vectors a_raw and c_b_k_raw.

1.2 Effective W dimensions

int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;

When the global modulating component $W$ is inactive (use_W == 0), its dimensions are collapsed to 0 so that every matrix/array sized by them vanishes, eliminating dead code paths.

$$ \texttt{dim_W_eff} = \begin{cases} \texttt{dim_W} & \text{if } \texttt{use_W}=1,\ 0 & \text{otherwise}. \end{cases} $$

Similarly for $d_\text{eff}$ (the number of covariates that $W$ modulates).

1.3 Activity flags and dispersion sizes

int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;

any_use_a — Becomes 1 if any slot $k$ has use_a_k[k] == 1. Guards all code that references $\sigma_a^{(k)}$ or a_raw.
any_use_b — Same for the multiplicative component.
sigma_W_size — Equals 1 when $W$ is active with positive dimension, 0 otherwise. Controls whether a single global scale $\sigma_W$ is declared.
sigma_y_size — Accumulates use_dispersion_y_k[k] across slots. Counts how many slots request a population-scoped Gaussian dispersion $\sigma_y$. Each such slot contributes one entry to the compacted vector sigma_y_pop.
phi_size — Same for negative-binomial / Beta dispersion $\phi$.

The accumulation loop:

for (k in 1:K) {
  J_a_free[k] = (use_a_k[k] == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
  J_b_free[k] = (use_b_k[k] == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
  a_raw_offset[k] = total_J_a_free;
  c_b_raw_offset[k] = total_J_b_free;
  total_J_a_free += J_a_free[k];
  total_J_b_free += J_b_free[k];
  if (use_a_k[k] == 1) any_use_a = 1;
  if (use_b_k[k] == 1) any_use_b = 1;
  sigma_y_size += use_dispersion_y_k[k];
  phi_size += use_dispersion_phi_k[k];
}

After the loop:

sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;

1.4 Compaction of `sigma_a_k` (Option A / D96)

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
  if (J_a_free[k] > 0) {
    n_sigma_a += 1;
    sigma_a_idx[k] = n_sigma_a;
  } else {
    sigma_a_idx[k] = 0;
  }
}

Rationale (referenced as RG.6, D96): A slot whose additive component either does not exist (use_a_k[k] == 0) or is intercept-only (the intercept is fully absorbed into $\theta_\text{ref}^{(k)}$) has $J_{a,\text{free}}^{(k)} = 0$. Declaring a scale parameter $\sigma_a^{(k)}$ for such a slot would create a flat, non-identified direction that stalls the sampler.

Compaction strategy: Only slots with $J_{a,\text{free}}^{(k)} > 0$ receive an entry in the compacted vector sigma_a_k. The map sigma_a_idx[k] records:

$$ \texttt{sigma_a_idx}[k] = \begin{cases} \ell & \text{if slot } k \text{ is the } \ell\text{-th slot with } J_{a,\text{free}}^{(k)}>0,\\ 0 & \text{otherwise.} \end{cases} $$

When every slot has free additive coefficients, $n_\sigma_a = K$ and sigma_a_idx is the identity $[1, 2, \ldots, K]$, so the generated model is bit-identical to the uncompacted version.

2. `parameters` Block

All quantities declared here are sampled by the MCMC algorithm. The block realizes the AMM (Additive–Multiplicative–Modulating) decomposition at the parameter level.

2.1 Per-group anchors θ_ref^(k)

array[J_groups] vector[K] theta_ref_k;

For each group $g \in \{1, \ldots, J_\text{groups}\}$, a $K$-vector of slot-specific reference anchors:

$$ \boldsymbol{\theta}_{\text{ref},g} = \bigl(\theta_{\text{ref},g}^{(1)},; \theta_{\text{ref},g}^{(2)},; \ldots,; \theta_{\text{ref},g}^{(K)}\bigr)^\top \in \mathbb{R}^K. $$

Each $\theta_{\text{ref},g}^{(k)}$ is the group-level intercept for slot $k$—the anchor around which all deviations (additive, multiplicative, modulating) are defined. This is the core of the reference-anchoring decomposition: the linear predictor for observation $i$ in group $g$ on slot $k$ starts at $\theta_{\text{ref},g}^{(k)}$ and then adds structured deviations.

2.2 Hierarchical hyperparameters

array[use_groups] vector[K] mu_theta_ref_k;
array[use_groups] vector<lower=0>[K] sigma_theta_ref_k;

When use_groups == 1, these are the mean and standard deviation of the hierarchical prior on the anchors. The array[use_groups] idiom (dimension 0 or 1) is Stan's way of conditionally declaring parameters: if use_groups == 0, the arrays have size 0 and no memory/sampling is allocated.

$$ \theta_{\text{ref},g}^{(k)} \sim \mathcal{N}!\bigl(\mu_{\theta}^{(k)},; \sigma_{\theta}^{(k)}\bigr), \qquad g = 1,\ldots,J_\text{groups}. $$

2.3 Additive component scales and raw coefficients

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;

sigma_a_k — Per-slot standard deviation of the additive component, compacted to n_sigma_a entries (see §1.4). Each entry is $\sigma_a^{(k)} > 0$.
a_raw — Flat-packed vector of raw additive coefficients across all active slots. For slot $k$ with $J_{a,\text{free}}^{(k)} > 0$, its entries occupy positions a_raw_offset[k]+1 through a_raw_offset[k]+J_a_free[k]. The actual coefficients are $a^{(k)}_j = \sigma_a^{(k)} \cdot a_{\text{raw},j}^{(k)}$, realizing the non-centered parameterization:

$$ a_j^{(k)} = \sigma_a^{(k)} \cdot a_{\text{raw},j}^{(k)}, \qquad a_{\text{raw},j}^{(k)} \sim \mathcal{N}(0, 1). $$

2.4 Multiplicative component scales and raw coefficients

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_k_raw;

sigma_b_k — Per-slot scale for the multiplicative component. Dimension is $K \cdot \texttt{any\_use\_b}$: if no slot uses the multiplicative component, the array is empty. When present, $\sigma_b^{(k)} > 0$.
c_b_k_raw — Flat-packed raw multiplicative coefficients. The actual coefficients are:

$$ c_{b,j}^{(k)} = \sigma_b^{(k)} \cdot c_{b,\text{raw},j}^{(k)}, \qquad c_{b,\text{raw},j}^{(k)} \sim \mathcal{N}(0, 1). $$

The relationship to the multiplicative design $Z_b$ is:

$$ \text{multiplicative contribution} = \sum_j Z_{b,i,j}^{(k)} \cdot c_{b,j}^{(k)}. $$

In the single-group regime, the multiplicative coefficient can be decomposed as $c_{b,j}^{(k)} = \theta_{\text{ref}}^{(k)} \cdot b_j^{(k)}$, yielding the linear reparametrization referenced in the comments. With multiple groups, $c_b^{(k)}$ is the product-level coefficient and $b_\text{coef}$ is not uniquely identifiable per group.

2.5 Global modulating component W

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;

sigma_W — A single global scale (array of size 0 or 1). $\sigma_W > 0$.
W_raw — A $\texttt{dim\_W} \times d$ matrix of raw modulation weights. The actual weights are $W = \sigma_W \cdot W_{\text{raw}}$, with $W_{\text{raw},\ell,j} \sim \mathcal{N}(0,1)$ (or whatever W_PRIOR specifies).

These weights define the global modulating component that maps basis-function evaluations of $\theta_\text{ref}^{(k)} - \theta_\text{anchor}^{(k)}$ into contributions to the linear predictor. The matrix $W$ is shared across all slots $k$; each slot applies it to its own basis evaluation (see §3.4).

2.6 Population-scoped dispersion

array[sigma_y_size] real<lower=0> sigma_y_pop;
array[phi_size] real<lower=0> phi_pop;

Population-level dispersion parameters for slots whose param_spec.scope remains "population" after promotion. sigma_y_pop carries entries for Gaussian-scale slots; phi_pop carries entries for NB/Beta dispersion slots. These are compacted similarly to sigma_a_k.

3. `transformed parameters` Block

This block reconstructs derived quantities from the raw parameters and builds the per-observation, per-slot linear predictors $\eta_k[i, k]$.

3.1 Declarations

array[K] vector[J_a_max] a_coef_k;
array[K] vector[J_b_max] c_b_k;
array[K] vector[J_b_max] b_coef_k;
matrix[n, K] eta_k;

a_coef_k[k] — Additive coefficients for slot $k$, length J_a_max (padded with zeros if the slot has fewer design columns).
c_b_k[k] — Multiplicative product-level coefficients for slot $k$.
b_coef_k[k] — Multiplicative coefficients divided by the anchor (only non-zero in the single-group regime).
eta_k — The $n \times K$ matrix of per-observation, per-slot linear predictors.

All are initialized to zero:

for (k in 1:K) {
  a_coef_k[k] = rep_vector(0, J_a_max);
  c_b_k[k] = rep_vector(0, J_b_max);
  b_coef_k[k] = rep_vector(0, J_b_max);
}

3.2 Reconstruction of additive coefficients (`{{TP_A_BLOCK}}`)

This placeholder is replaced at code-generation time. Conceptually, for each slot $k$ with $J_{a,\text{free}}^{(k)} > 0$:

$$ a_j^{(k)} = \sigma_a^{(\texttt{sigma_a_idx}[k])} \cdot a_{\text{raw},,\texttt{a_raw_offset}[k]+j}, \qquad j = 1, \ldots, J_{a,\text{free}}^{(k)}. $$

3.3 Reconstruction of multiplicative coefficients

if (any_use_b == 1) {
  for (k in 1:K) {
    if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
      for (j in 1:J_b_free[k]) {
        c_b_k[k][j] = c_b_k_raw[c_b_raw_offset[k] + j] * sigma_b_k[k];
      }

This implements the non-centered parameterization:

$$ c_{b,j}^{(k)} = \sigma_b^{(k)} \cdot c_{b,\text{raw},j}^{(k)}. $$

Next, the derived quantity $b_\text{coef}^{(k)} = c_b^{(k)} / \theta_{\text{ref}}^{(k)}$ is computed only in the single-group regime (use_groups == 0) and when the anchor is non-zero:

      if (use_groups == 0 && theta_ref_k[1][k] != 0) {
        for (j in 1:J_b_per_k[k]) {
          b_coef_k[k][j] = c_b_k[k][j] / theta_ref_k[1][k];
        }
      }

$$ b_j^{(k)} = \frac{c_{b,j}^{(k)}}{\theta_{\text{ref},1}^{(k)}}. $$

With multiple groups, b_coef_k stays at zero; downstream R code derives $b$ per group.

3.4 Per-observation linear predictor construction

for (i in 1:n) {
  int g_i = group_id[i];
  for (k in 1:K) {
    real theta_ref_ik = theta_ref_k[g_i][k];
    real eta_ik = theta_ref_ik;

For each observation $i$ in group $g_i$ and slot $k$, the linear predictor starts at the group's anchor:

$$ \eta_{i}^{(k)} = \theta_{\text{ref},g_i}^{(k)}. $$

Additive contribution (conditional on use_a_k[k] and J_a_per_k[k]):

    if (use_a_k[k] == 1 && J_a_per_k[k] > 0) {
      eta_ik += dot_product(
        to_vector(Z_a_k[k][i, 1:J_a_per_k[k]]),
        a_coef_k[k][1:J_a_per_k[k]]
      );
    }

$$ \eta_i^{(k)} ;+!=; \sum_{j=1}^{J_a^{(k)}} Z_{a,i,j}^{(k)}; a_j^{(k)}. $$

Multiplicative contribution (conditional on use_b_k[k] and J_b_per_k[k]):

    if (use_b_k[k] == 1 && J_b_per_k[k] > 0) {
      eta_ik += dot_product(
        to_vector(Z_b_k[k][i, 1:J_b_per_k[k]]),
        c_b_k[k][1:J_b_per_k[k]]
      );
    }

$$ \eta_i^{(k)} ;+!=; \sum_{j=1}^{J_b^{(k)}} Z_{b,i,j}^{(k)}; c_{b,j}^{(k)}. $$

Modulating contribution (conditional on use_W, dim_W, d):

    if (use_W == 1 && dim_W > 0 && d > 0) {
      vector[d] W_diff_x_k = rep_vector(0, d);
      vector[dim_W] basis_diff_k = apply_W_basis_diff(
        W_type_id, theta_ref_ik, theta_anchor_K[k], dim_W,
        W_degree, W_n_knots_full, W_knots_full
      );
      for (jj_local in 1:dim_W) {
        for (jjj in 1:d) {
          W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[jj_local, jjj]{{W_SCALE}};
        }
      }
      eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
    }

This computes:

Evaluate the basis function $\phi_\ell\bigl(\theta_{\text{ref},g_i}^{(k)} - \theta_{\text{anchor}}^{(k)}\bigr)$ for $\ell = 1, \ldots, \texttt{dim\_W}$. The function apply_W_basis_diff dispatches on W_type_id (1 = polynomial, 2 = B-spline) to produce the vector $\texttt{basis\_diff\_k} \in \mathbb{R}^{\texttt{dim\_W}}$.
The modulation vector for covariate direction $j$ is:

$$ \texttt{W_diff_x_k}[j] = \sum_{\ell=1}^{\texttt{dim_W}} \phi_\ell!\bigl(\theta_{\text{ref},g_i}^{(k)} - \theta_{\text{anchor}}^{(k)}\bigr) ; W_{\ell,j} \cdot s, $$

where $s$ is the optional scale factor inserted by {{W_SCALE}} (typically 1).

The modulating contribution to the linear predictor is:

$$ \eta_i^{(k)} ;+!=; \sum_{j=1}^{d} \texttt{W_diff_x_k}[j] \cdot X_{i,j}. $$

This realizes a varying-coefficient model where the effect of covariate $X_j$ on observation $i$ is modulated by the distance $\theta_{\text{ref},g_i}^{(k)} - \theta_{\text{anchor}}^{(k)}$ through the shared basis-weight matrix $W$.

Finally, the assembled linear predictor is stored:

    eta_k[i, k] = eta_ik;

Full AMM decomposition summary for observation $i$, slot $k$:

$$ \boxed{\eta_i^{(k)} = \underbrace{\theta_{\text{ref},g_i}^{(k)}}_{\text{anchor}} + \underbrace{\sum_j Z_{a,i,j}^{(k)}, a_j^{(k)}}_{\text{additive } A} + \underbrace{\sum_j Z_{b,i,j}^{(k)}, c_{b,j}^{(k)}}_{\text{multiplicative } M} + \underbrace{\Bigl(\sum_\ell \phi_\ell(\Delta_\text{ref}^{(k)}), W_{\ell,:}\Bigr) \cdot X_i^\top}_{\text{modulating } W}} $$

where $\Delta_\text{ref}^{(k)} = \theta_{\text{ref},g_i}^{(k)} - \theta_{\text{anchor}}^{(k)}$.

4. `model` Block

This block specifies the log-posterior: priors on all parameters and the likelihood of the observed data given the linear predictors $\eta_k$.

4.1 Anchor priors (`{{THETA_REF_PRIOR_BLOCK}}`)

{{THETA_REF_PRIOR_BLOCK}}

This placeholder is replaced at code-generation time. In the hierarchical case (use_groups == 1), the prior is:

$$ \theta_{\text{ref},g}^{(k)} \sim \mathcal{N}!\bigl(\mu_\theta^{(k)},; \sigma_\theta^{(k)}\bigr), \qquad \mu_\theta^{(k)} \sim \ldots, \quad \sigma_\theta^{(k)} \sim \ldots $$

For structurally constrained slots (e.g., Tweedie's power $p \in (1.01, 1.99)$), the R code generator emits a slot-specific prior (per decision E7 of D5).

4.2 Additive component priors

if (any_use_a == 1) {
  sigma_a_k ~ {{PRIOR_SIGMA_A}};
  {{MODEL_A_BLOCK}}
}

sigma_a_k ~ {{PRIOR_SIGMA_A}} — Prior on the per-slot additive scales. Typically half-normal or half-Cauchy.
{{MODEL_A_BLOCK}} — Prior on the raw additive coefficients. In the canonical form:

$$ a_{\text{raw}} \sim \mathcal{N}(0, I), $$

which, combined with $a = \sigma_a \cdot a_{\text{raw}}$, yields $a \sim \mathcal{N}(0, \sigma_a^2 I)$ marginal—i.e., the standard hierarchical shrinkage prior on the additive coefficients.

4.3 Multiplicative component priors

if (any_use_b == 1) {
  sigma_b_k ~ {{PRIOR_SIGMA_B}};
  c_b_k_raw ~ normal(0, 1);
}

sigma_b_k ~ {{PRIOR_SIGMA_B}} — Prior on the per-slot multiplicative scales.
c_b_k_raw ~ normal(0, 1) — Non-centered prior: $c_{b,\text{raw}} \sim \mathcal{N}(0, I)$, so $c_b \sim \mathcal{N}(0, \sigma_b^2 I)$ marginally.

4.4 Modulating component priors

if (use_W == 1 && dim_W > 0) {
  sigma_W[1] ~ {{PRIOR_SIGMA_W}};
  to_vector(W_raw) ~ {{W_PRIOR}};
}

sigma_W[1] — Global modulation scale, receiving its configured prior.
W_raw — Flattened and given its configured prior (typically $\mathcal{N}(0,1)$, yielding $W \sim \mathcal{N}(0, \sigma_W^2 I)$ marginally).

4.5 Dispersion priors

if (sigma_y_size > 0) {
  sigma_y_pop ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
  phi_pop ~ {{PRIOR_PHI}};
}

Population-scoped dispersion parameters receive their family-appropriate priors.

4.6 Likelihood (distributional regression)

The likelihood dispatches on family_id_k[1], which is the family identifier for slot 1 (the location slot). The template handles $K \geq 2$ families; single-parameter families ($K=1$, e.g., Poisson, Bernoulli) are never routed here.

The general structure for all families is:

Transform $\eta_k[i, k]$ through the slot-specific inverse link to obtain the natural parameter.
Feed natural parameters into the appropriate Stan likelihood.

The inverse-link function is applied via apply_inv_link_by_id(inv_link_id_per_slot[k], eta_k[i, k]), where link IDs map as:

ID	Link	Inverse
1	identity	$x$
2	log	$e^x$
3	logit	$\frac{1}{1+e^{-x}}$

4.6.1 Family 1 — Gaussian (K=2)

for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ normal(eta_k[i, 1], sigma_i);
}

$$ y_i \sim \mathcal{N}!\bigl(\eta_i^{(1)},; g_2^{-1}(\eta_i^{(2)})\bigr). $$

Slot 1 carries $\mu$ on the identity link; slot 2 carries $\sigma$ typically on the log link ($g_2^{-1} = \exp$), so $\sigma_i = \exp(\eta_i^{(2)})$. Under heterogeneity (e.g., Beta in slot 2), the slot 2 link may be logit, placing $\sigma_i \in (0,1)$.

4.6.2 Family 3 — Negative Binomial (K=2)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_int[i] ~ neg_binomial_2(exp(eta_k[i, 1]), phi_i);
}

$$ y_i \sim \text{NegBinomial2}!\bigl(e^{\eta_i^{(1)}},; g_2^{-1}(\eta_i^{(2)})\bigr). $$

Slot 1: $\log(\mu)$ on log link $\Rightarrow \mu_i = e^{\eta_i^{(1)}}$. Slot 2: $\phi$ (dispersion) typically on log link $\Rightarrow \phi_i = e^{\eta_i^{(2)}}$. Stan's neg_binomial_2 parametrization uses $(\mu, \phi)$ with $\text{Var}(y) = \mu + \mu^2/\phi$.

4.6.3 Family 5 — Beta (K=2)

for (i in 1:n) {
  real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ beta_proportion(inv_logit(eta_k[i, 1]), phi_i);
}

$$ y_i \sim \text{BetaProportion}!\bigl(\text{logit}^{-1}(\eta_i^{(1)}),; g_2^{-1}(\eta_i^{(2)})\bigr). $$

Canonical mean-precision parametrization: $\mu_i = \text{logit}^{-1}(\eta_i^{(1)}) \in (0,1),\ \phi_i = g_2^{-1}(\eta_i^{(2)}) > 0$. Stan's beta_proportion(mu, phi) has mean $\mu$ and concentration $\phi$.

4.6.4 Family 6 — Gamma (K=2)

for (i in 1:n) {
  real mu_i    = exp(eta_k[i, 1]);
  real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ gamma(shape_i, shape_i / mu_i);
}
$$

$$
y_i \sim \text{Gamma}\!\bigl(\alpha_i,\; \alpha_i / \mu_i\bigr).
$$

Mean-shape parametrization: $\mu_i = e^{\eta_i^{(1)}}$ (location), $\alpha_i = g_2^{-1}(\eta_i^{(2)})$ (shape). Stan's Gamma uses shape–rate $(\alpha, \beta)$ with $\mathbb{E}[y] = \alpha/\beta$; the code converts via $\beta_i = \alpha_i / \mu_i$.

##### 4.6.5 Family 7 — Lognormal ($K=2$)

```stan
for (i in 1:n) {
  real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2], eta_k[i, 2]);
  y_real[i] ~ lognormal(eta_k[i, 1], sigma_i);
}
$$

$$
\log(y_i) \sim \mathcal{N}\!\bigl(\eta_i^{(1)},\; g_2^{-1}(\eta_i^{(2)})\bigr).
$$

Slot 1 carries $\mu$ of the log-scale (identity link); slot 2 carries $\sigma$ of the log-scale (typically log link).

##### 4.6.6 Family 8 — Student-t (K=3)

```stan
for (i in 1:n) {
  y_real[i] ~ student_t(exp(eta_k[i, 3]), eta_k[i, 1], exp(eta_k[i, 2]));
}
$$

$$
y_i \sim t_{\nu_i}\!\bigl(\eta_i^{(1)},\; e^{\eta_i^{(2)}}\bigr), \qquad \nu_i = e^{\eta_i^{(3)}}.
$$

Three slots: location (identity), log-scale, and log-degrees-of-freedom. Stan's `student_t(nu, mu, sigma)` takes the raw degrees of freedom.

##### 4.6.7 Family 9 — Tweedie ($K=3$)

```stan
for (i in 1:n) {
  y_real[i] ~ tweedie(exp(eta_k[i, 1]), exp(eta_k[i, 2]), eta_k[i, 3]);
}
$$

$$
y_i \sim \text{Tweedie}\!\bigl(e^{\eta_i^{(1)}},\; e^{\eta_i^{(2)}},\; \eta_i^{(3)}\bigr).
$$

Three slots: $\log(\mu)$ (log link), $\log(\phi)$ (log link), and $p$ (identity link, structurally bounded to $(1.01, 1.99)$). The custom `tweedie_lpdf` (defined in section 1) dispatches between Dunn–Smyth series and saddlepoint approximation depending on $\lvert p - 1.5\rvert$.

##### 4.6.8 Family 10 — Zero-Inflated Poisson (K=2)

```stan
for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i)
        + poisson_log_lpmf(0 | eta_mu_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(y_int[i] | eta_mu_i);
  }
}
$$

The ZIP likelihood (Lambert 1992):

$$
P(y_i = 0) = \pi_i + (1-\pi_i)\,e^{-\mu_i}, \qquad P(y_i = m) = (1-\pi_i)\,\frac{e^{-\mu_i}\mu_i^m}{m!}, \quad m > 0,
$$

where $\mu_i = e^{\eta_i^{(1)}}$ and $\pi_i = \text{logit}^{-1}(\eta_i^{(2)})$. The implementation uses log-sum-exp for numerical stability on the zero branch:

$$
\log P(y_i=0) = \text{log\_sum\_exp}\!\bigl(\text{logit}^{-1}(\eta_\pi),\; \log(1-\pi_i) + \log P_{\text{Pois}}(0 \mid \mu_i)\bigr).
$$

##### 4.6.9 Family 11 — Zero-Inflated Negative Binomial ($K=3$)

```stan
for (i in 1:n) {
  real eta_mu_i  = eta_k[i, 1];
  real phi_i     = exp(eta_k[i, 2]);
  real eta_pi_i  = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += log_sum_exp(
      bernoulli_logit_lpmf(1 | eta_pi_i),
      bernoulli_logit_lpmf(0 | eta_pi_i)
        + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
    );
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i);
  }
}
$$

$$
P(y_i=0) = \pi_i + (1-\pi_i)\,\text{NB2}(0 \mid \mu_i, \phi_i), \qquad P(y_i=m) = (1-\pi_i)\,\text{NB2}(m \mid \mu_i, \phi_i), \;\; m>0.
$$

Three slots: $\log(\mu)$, $\log(\phi)$, $\text{logit}(\pi)$.

##### 4.6.10 Family 12 — Hurdle-Poisson (K=2)

```stan
for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real eta_pi_i = eta_k[i, 2];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(y_int[i] | eta_mu_i)
            - log1m_exp(-exp(eta_mu_i));
  }
}
$$

Hurdle likelihood (Mullahy 1986):

$$
P(y_i = 0) = \pi_i, \qquad P(y_i = m) = (1-\pi_i)\,\frac{\text{Pois}(m \mid \mu_i)}{1 - e^{-\mu_i}}, \quad m > 0.
$$

The truncation denominator $1 - e^{-\mu_i}$ is computed stably as `log1m_exp(-exp(eta_mu_i))`, i.e., $\text{log1m\_exp}(-\mu_i) = \log(1 - e^{-\mu_i})$.

##### 4.6.11 Family 13 — Hurdle-Negative Binomial ($K=3$)

```stan
for (i in 1:n) {
  real eta_mu_i = eta_k[i, 1];
  real phi_i    = exp(eta_k[i, 2]);
  real eta_pi_i = eta_k[i, 3];
  if (y_int[i] == 0) {
    target += bernoulli_logit_lpmf(1 | eta_pi_i);
  } else {
    target += bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i, phi_i)
            - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i));
  }
}
$$

$$
P(y_i=0) = \pi_i, \qquad P(y_i=m) = (1-\pi_i)\,\frac{\text{NB2}(m \mid \mu_i, \phi_i)}{1 - \text{NB2}(0 \mid \mu_i, \phi_i)}, \quad m>0.
$$

Three slots: $\log(\mu)$, $\log(\phi)$, $\text{logit}(\pi)$. The truncation denominator is computed in log-space via `log1m_exp`.

##### 4.6.12 Inactive families

```stan
  // family_id_k[1] in {2 (poisson), 4 (bernoulli)}: these are
  // K = 1 families (single eligible param_spec); the dispatcher
  // never routes them here.

Poisson (family 2) and Bernoulli (family 4) are single-parameter distributions ($K=1$). The R-side dispatcher (families.R) never promotes them to this marginal-$K$ template; they use the simpler $K=1$ template. This comment documents an invariant, not executable code.

Summary of the Reference-Anchoring Decomposition

The entire template realizes the following structural identity for the distributional regression:

$$ \boldsymbol{y}_i \sim p!\bigl(\cdot \mid g_1^{-1}(\eta_i^{(1)}),; g_2^{-1}(\eta_i^{(2)}),; \ldots,; g_K^{-1}(\eta_i^{(K)})\bigr), $$

where each slot's linear predictor decomposes as the reference-anchored AMM form:

$$ \eta_i^{(k)} = \theta_{\text{ref},g_i}^{(k)} + \underbrace{Z_{a,i}^{(k)\top} \sigma_a^{(k)} a_{\text{raw}}^{(k)}}_{\text{A: additive}} + \underbrace{Z_{b,i}^{(k)\top} \sigma_b^{(k)} c_{b,\text{raw}}^{(k)}}_{\text{M: multiplicative}} + \underbrace{\bigl(W_\text{raw} \cdot \sigma_W\bigr)^\top \phi^{(k)}(\Delta_\text{ref}^{(k)}) \cdot X_i}_{\text{W: modulating}}. $$

The non-centered parameterizations ($a_{\text{raw}} \sim \mathcal{N}(0,I)$ scaled by $\sigma_a;\ c_{b,\text{raw}} \sim \mathcal{N}(0,I)$ scaled by $\sigma_b;\ W_{\text{raw}} \sim \mathcal{N}(0,I)$ scaled by $\sigma_W$) improve sampling geometry by decoupling the scale and direction of each component. The compaction of sigma_a_k eliminates non-identified scale parameters for slots without free additive coefficients. The global $W$ matrix is shared across all $K$ slots but applied per-slot through slot-specific basis evaluations, enabling the modulating component to act heterogeneously across distributional parameters while sharing regularization through the common weight structure.

Generated Quantities Block

The generated quantities block computes posterior quantities that are derived from the parameter draws. In this AMM (Anchored Mixed Model) context, it serves three purposes: (1) computing pointwise log-likelihoods $\ell_i(\theta)$ for model comparison (e.g., via LOO or WAIC), (2) storing the fitted linear predictors $\eta_{ik}$ per observation $i$ and distributional slot $k$, and (3) generating posterior-predictive draws $\tilde{y}_i$ from the model. The linear predictors eta_k incorporate the reference-anchoring decomposition; thus, all derived quantities automatically respect the anchoring constraint.

Variable Declarations

  vector[n] log_lik;
  matrix[n, K] theta_i_k = eta_k;
  vector[n] y_pred;

log_lik: A vector of length $n$ (number of observations) storing the per-observation log-likelihoods. These are used for model comparison criteria.
theta_i_k: An $n \times K$ matrix initialized to eta_k, which contains the linear predictors for each observation $i$ and each distributional slot $k$ (after reference anchoring). This is essentially a copy of the fitted linear predictors for convenience.
y_pred: A vector of length $n$ storing posterior-predictive draws from the fitted model.

Observation Loop

The block loops over all $n$ observations. For each observation $i$, it dispatches on family_id_k[1] to compute the log-likelihood and posterior-predictive draw according to the specified distributional regression family. The distributional parameters are obtained by applying the appropriate inverse link functions to the linear predictors stored in eta_k[i, k].

Family 1: Normal

    if (family_id_k[1] == 1) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = normal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i] = normal_rng(eta_k[i, 1], sigma_i);
    }

The normal model has two distributional slots: mean $\mu$ and standard deviation $\sigma$.
$\mu_i = \eta_{i1}$ (identity link for mean).
$\sigma_i = g_2^{-1}(\eta_{i2})$, where $g_2^{-1}$ is the inverse link function for the scale slot (typically softplus or exp to ensure $\sigma > 0$).
Log-likelihood: $\ell_i = \log \mathcal{N}(y_i \mid \mu_i, \sigma_i) = -\frac{1}{2}\log(2\pi) - \log \sigma_i - \frac{(y_i - \mu_i)^2}{2\sigma_i^2}$.
Posterior-predictive draw: $\tilde{y}_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$.

Family 3: Negative Binomial

    } else if (family_id_k[1] == 3) {
      real mu_i  = exp(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = neg_binomial_2_lpmf(y_int[i] | mu_i, phi_i);
      y_pred[i]  = neg_binomial_2_rng(mu_i, phi_i);
    }

Negative binomial parameterized with mean $\mu$ and dispersion $\phi$ (sometimes called "size").
$\mu_i = \exp(\eta_{i1})$ (log link for mean).
$\phi_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures $\phi > 0$).
Log-likelihood: $\ell_i = \log \text{NB2}(y_i \mid \mu_i, \phi_i) = \log\binom{y_i + \phi_i - 1}{y_i} + \phi_i \log\left(\frac{\phi_i}{\phi_i + \mu_i}\right) + y_i \log\left(\frac{\mu_i}{\phi_i + \mu_i}\right)$.
Posterior-predictive draw: $\tilde{y}_i \sim \text{NB2}(\mu_i, \phi_i)$.

Family 5: Beta-Proportion

    } else if (family_id_k[1] == 5) {
      real mu_i  = inv_logit(eta_k[i, 1]);
      real phi_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                        eta_k[i, 2]);
      log_lik[i] = beta_proportion_lpdf(y_real[i] | mu_i, phi_i);
      y_pred[i]  = beta_proportion_rng(mu_i, phi_i);
    }

Beta-proportion parameterized with mean $\mu \in (0,1)$ and precision $\phi > 0$.
$\mu_i = \text{logit}^{-1}(\eta_{i1})$ (logistic link for mean).
$\phi_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures $\phi > 0$).
Log-likelihood: $\ell_i = \log \text{Beta-Proportion}(y_i \mid \mu_i, \phi_i) = \log\left(\frac{\Gamma(\phi)}{\Gamma(\mu_i\phi)\Gamma((1-\mu_i)\phi)}\right) + (\mu_i\phi-1)\log y_i + ((1-\mu_i)\phi-1)\log(1-y_i)$.
Posterior-predictive draw: $\tilde{y}_i \sim \text{Beta-Proportion}(\mu_i, \phi_i)$.

Family 6: Gamma

    } else if (family_id_k[1] == 6) {
      real mu_i    = exp(eta_k[i, 1]);
      real shape_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = gamma_lpdf(y_real[i] | shape_i, shape_i / mu_i);
      y_pred[i]  = gamma_rng(shape_i, shape_i / mu_i);
    }

Gamma parameterized with shape $\alpha$ and rate $\beta$.
$\mu_i = \exp(\eta_{i1})$ (log link for mean).
$\alpha_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures $\alpha > 0$).
Rate is $\beta_i = \alpha_i / \mu_i$ (mean $\mu = \alpha/\beta$).
Log-likelihood: $\ell_i = \log \mathcal{G}(y_i \mid \alpha_i, \beta_i) = (\alpha_i-1)\log y_i - \beta_i y_i + \alpha_i\log\beta_i - \log\Gamma(\alpha_i)$.
Posterior-predictive draw: $\tilde{y}_i \sim \mathcal{G}(\alpha_i, \beta_i)$.

Family 7: Lognormal

    } else if (family_id_k[1] == 7) {
      real sigma_i = apply_inv_link_by_id(inv_link_id_per_slot[2],
                                          eta_k[i, 2]);
      log_lik[i] = lognormal_lpdf(y_real[i] | eta_k[i, 1], sigma_i);
      y_pred[i]  = lognormal_rng(eta_k[i, 1], sigma_i);
    }

Lognormal parameterized with location $\mu$ (of the underlying normal) and scale $\sigma$.
$\mu_i = \eta_{i1}$ (identity link for location).
$\sigma_i = g_2^{-1}(\eta_{i2})$ (inverse link ensures $\sigma > 0$).
Log-likelihood: $\ell_i = -\frac{1}{2}\log(2\pi) - \log\sigma_i - \log y_i - \frac{(\log y_i - \mu_i)^2}{2\sigma_i^2}$.
Posterior-predictive draw: $\tilde{y}_i \sim \text{Lognormal}(\mu_i, \sigma_i^2)$.

Family 8: Student-t

    } else if (family_id_k[1] == 8) {
      real nu_i    = exp(eta_k[i, 3]);
      real mu_i    = eta_k[i, 1];
      real sigma_i = exp(eta_k[i, 2]);
      log_lik[i] = student_t_lpdf(y_real[i] | nu_i, mu_i, sigma_i);
      y_pred[i] = student_t_rng(nu_i, mu_i, sigma_i);
    }

Student-t with three distributional slots: degrees of freedom $\nu$, location $\mu$, scale $\sigma$.
$\nu_i = \exp(\eta_{i3})$ (log link for degrees of freedom).
$\mu_i = \eta_{i1}$ (identity link for location).
$\sigma_i = \exp(\eta_{i2})$ (log link for scale).
Log-likelihood: $\ell_i = \log t_\nu(y_i \mid \mu_i, \sigma_i) = \log\Gamma\left(\frac{\nu_i+1}{2}\right) - \log\Gamma\left(\frac{\nu_i}{2}\right) - \frac{1}{2}\log(\nu_i\pi) - \log\sigma_i - \frac{\nu_i+1}{2}\log\left(1 + \frac{(y_i-\mu_i)^2}{\nu_i\sigma_i^2}\right)$.
Posterior-predictive draw: $\tilde{y}_i \sim t_{\nu_i}(\mu_i, \sigma_i^2)$.

Family 9: Tweedie

    } else if (family_id_k[1] == 9) {
      real mu_i  = exp(eta_k[i, 1]);
      real phi_i = exp(eta_k[i, 2]);
      real p_i   = eta_k[i, 3];
      log_lik[i] = tweedie_lpdf(y_real[i] | mu_i, phi_i, p_i);
      y_pred[i] = tweedie_rng(mu_i, phi_i, p_i);
    }

Tweedie with three distributional slots: mean $\mu$, dispersion $\phi$, power $p$ (typically $1 < p < 2$ for compound Poisson-gamma).
$\mu_i = \exp(\eta_{i1})$ (log link for mean).
$\phi_i = \exp(\eta_{i2})$ (log link for dispersion).
$p_i = \eta_{i3}$ (identity link for power).
Log-likelihood: $\ell_i = \log f(y_i \mid \mu_i, \phi_i, p_i)$ where $f$ is the Tweedie density (no closed form; Stan uses numerical integration or series expansion).
Posterior-predictive draw: $\tilde{y}_i \sim \text{Tweedie}(\mu_i, \phi_i, p_i)$.

Family 10: Zero-Inflated Poisson (ZIP)

    } else if (family_id_k[1] == 10) {
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + poisson_log_lpmf(0 | eta_mu_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = poisson_log_rng(eta_mu_i);
      }
    }

ZIP has two distributional slots: Poisson rate (log scale) and zero-inflation probability (logit scale).
$\eta_{\mu,i} = \eta_{i1}$ (linear predictor for Poisson log-rate).
$\eta_{\pi,i} = \eta_{i2}$ (linear predictor for zero-inflation probability).
Log-likelihood for $y_i = 0$: $$ \ell_i = \log\left(\pi_i + (1-\pi_i)e^{-\lambda_i}\right) = \log\left(e^{\log\pi_i} + e^{\log(1-\pi_i) + \log\lambda_i \cdot 0 - \lambda_i}\right) $$ where $\pi_i = \text{logit}^{-1}(\eta_{\pi,i}),\ \lambda_i = \exp(\eta_{\mu,i})$. This is implemented via log_sum_exp to avoid underflow.
Log-likelihood for $y_i > 0$: $$ \ell_i = \log(1-\pi_i) + \log\left(\frac{\lambda_i^{y_i}e^{-\lambda_i}}{y_i!}\right) $$
Posterior-predictive draw: First draw zero-inflation indicator $z_i \sim \text{Bernoulli}(\pi_i)$. If $z_i = 1$, set $\tilde{y}_i = 0$; else draw $\tilde{y}_i \sim \text{Poisson}(\lambda_i)$.

Family 11: Zero-Inflated Negative Binomial (ZINB)

    } else if (family_id_k[1] == 11) {
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = log_sum_exp(
          bernoulli_logit_lpmf(1 | eta_pi_i),
          bernoulli_logit_lpmf(0 | eta_pi_i)
            + neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)
        );
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i);
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        y_pred[i] = neg_binomial_2_log_rng(eta_mu_i, phi_i);
      }
    }

ZINB has three distributional slots: NB log-rate, NB dispersion, zero-inflation probability (logit).
$\eta_{\mu,i} = \eta_{i1}$ (NB log-rate).
$\phi_i = \exp(\eta_{i2})$ (dispersion, exponentiated).
$\eta_{\pi,i} = \eta_{i3}$ (zero-inflation logit).
Log-likelihood structure analogous to ZIP but using NB2 instead of Poisson.
Posterior-predictive draw: Draw $z_i \sim \text{Bernoulli}(\pi_i)$. If $z_i=1,\ \tilde{y}_i=0$; else $\tilde{y}_i \sim \text{NB2}(\mu_i, \phi_i)$.

Family 12: Hurdle-Poisson

    } else if (family_id_k[1] == 12) {
      real eta_mu_i = eta_k[i, 1];
      real eta_pi_i = eta_k[i, 2];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + poisson_log_lpmf(y_int[i] | eta_mu_i)
                   - log1m_exp(-exp(eta_mu_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = poisson_log_rng(eta_mu_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }

Hurdle-Poisson has two distributional slots: Poisson rate (log scale) and hurdle probability (logit).
$\eta_{\mu,i} = \eta_{i1}$ (Poisson log-rate).
$\eta_{\pi,i} = \eta_{i2}$ (hurdle logit).
Log-likelihood for $y_i = 0:\ \ell_i = \log\pi_i$.
Log-likelihood for $y_i > 0$: $$ \ell_i = \log(1-\pi_i) + \log\left(\frac{\lambda_i^{y_i}e^{-\lambda_i}}{y_i!}\right) - \log\left(1-e^{-\lambda_i}\right) $$ The term $\log\left(1-e^{-\lambda_i}\right)$ is the truncation normalizer (probability of being positive). It is computed as log1m_exp(-exp(eta_mu_i)) which is $\log(1 - \exp(-\lambda_i))$.
Posterior-predictive draw: Draw hurdle indicator $z_i \sim \text{Bernoulli}(\pi_i)$. If $z_i=1,\ \tilde{y}_i=0$; else draw from truncated Poisson via rejection sampling: repeatedly draw from $\text{Poisson}(\lambda_i)$ until a draw is positive (up to 10000 iterations for numerical safety).

Family 13: Hurdle-Negative Binomial

    } else if (family_id_k[1] == 13) {
      real eta_mu_i = eta_k[i, 1];
      real phi_i    = exp(eta_k[i, 2]);
      real eta_pi_i = eta_k[i, 3];
      if (y_int[i] == 0) {
        log_lik[i] = bernoulli_logit_lpmf(1 | eta_pi_i);
      } else {
        log_lik[i] = bernoulli_logit_lpmf(0 | eta_pi_i)
                   + neg_binomial_2_log_lpmf(y_int[i] | eta_mu_i,
                                              phi_i)
                   - log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i,
                                                       phi_i));
      }
      if (bernoulli_logit_rng(eta_pi_i) == 1) {
        y_pred[i] = 0;
      } else {
        int y_draw = 0;
        int iter = 0;
        while (y_draw == 0 && iter < 10000) {
          y_draw = neg_binomial_2_log_rng(eta_mu_i, phi_i);
          iter = iter + 1;
        }
        y_pred[i] = y_draw;
      }
    }

Hurdle-NB has three distributional slots: NB log-rate, NB dispersion, hurdle probability (logit).
$\eta_{\mu,i} = \eta_{i1}$ (NB log-rate).
$\phi_i = \exp(\eta_{i2})$ (dispersion).
$\eta_{\pi,i} = \eta_{i3}$ (hurdle logit).
Log-likelihood for $y_i=0:\ \ell_i = \log\pi_i$.
Log-likelihood for $y_i>0$: $$ \ell_i = \log(1-\pi_i) + \log\text{NB2}(y_i \mid \mu_i, \phi_i) - \log\left(1 - \text{NB2}(0 \mid \mu_i, \phi_i)\right) $$ The truncation normalizer $\log\left(1 - \text{NB2}(0 \mid \mu_i, \phi_i)\right)$ is computed as log1m_exp(neg_binomial_2_log_lpmf(0 | eta_mu_i, phi_i)).
Posterior-predictive draw: Similar to hurdle-Poisson but draws from truncated NB via rejection sampling.

Default Case

    } else {
      log_lik[i] = negative_infinity();
      y_pred[i] = 0;
    }

If family_id_k[1] does not match any of the implemented families, set log-likelihood to $-\infty$ and predictive draw to 0. This is a safety fallback.

Relation to AMM Reference-Anchoring

In the AMM framework, the linear predictors eta_k are constructed as: $$ \eta_k = \mathbf{X}_k\beta_k + \mathbf{Z}_k\gamma_k + \text{(anchor adjustments)} $$ where $\beta_k$ are fixed effects and $\gamma_k$ are random effects. The reference-anchoring constraints ensure identifiability by fixing certain parameters relative to a reference level or sum-to-zero constraints. The generated quantities block operates on the already-constrained eta_k, so all likelihoods and predictions are consistent with the anchored decomposition. The matrix theta_i_k provides direct access to these anchored linear predictors for diagnostics or plotting.

inst/stan/_canonical_pieces/amm_canonical_eb_marginal_multi.stan

`amm_canonical_eb_marginal_multi.stan` — Block-by-Block Technical Reference

This is section 1 of 2 of the EB-marginal Path-A template. It is bit-identical in body to amm_distrib_multi.stan; only the banner differs (it documents the Empirical-Bayes context, where the template is fed to cmdstanr::laplace() in step (i) of the EB workflow for regime $K=1,\;p\geq 1$). The companion amm_eb_conditional_multi.stan moves theta_ref from parameters{} to data{} (renamed theta_ref_data) and drops the prior on theta_ref/mu_theta_ref/sigma_theta_ref.

The reference-anchoring decomposition realized by the template is, per individual $i$, group $g[i]$, coordinate $k\in\{1,\dots,p\}$:

$$ \theta_{i,k} ;=; \theta_{\mathrm{ref},,g[i],k} ;+; a_k(x_i) ;+; b_k(x_i),\theta_{\mathrm{ref},,g[i],k} ;+; \bigl(W_k(\theta_{\mathrm{ref},,g[i]}) - W_k(\theta_{\mathrm{anchor}})\bigr),x_i . $$

The linear predictor $\eta_{i,k}$ constructed in transformed parameters is exactly $\theta_{i,k}$ under the homogeneous-family factorization

$$ p(y_i\mid\theta_i) ;=; \prod_{k=1}^{p} D!\left(y_{i,k},\big|,\theta_{i,k}\right), $$

with $D$ selected by family_id. Cross-dimensional coupling is carried exclusively by $W(\theta_{\mathrm{ref}})$.

The multiplicative term is reparametrized linearly: defining $c_{b,k}=b_{\mathrm{coef},k}\cdot\theta_{\mathrm{ref},k}$, the data identify $c_{b,k}$ linearly through $Z_{b,k}\,c_{b,k}$, and the model samples $c_b$ directly (NCP on c_b_raw), reporting b_coef as a derived quantity. This eliminates the $(\theta_{\mathrm{ref}},b)$ bimodality.

`functions { }`

functions {
  // {{CANONICAL_HELPERS}}
}

A single placeholder, {{CANONICAL_HELPERS}}, is substituted by R/stan_codegen.R. Per the banner, the helpers include apply_W_basis_diff(), the W-basis dispatcher introduced in sub-phase 8.3.8 (2026-05-22). Its signature, as invoked in transformed parameters, is

vector[W_per_k_dim] apply_W_basis_diff(
    int W_type_id,
    real theta_ref_ik,
    real theta_anchor_k,
    int W_per_k_dim,
    int W_degree,
    int W_n_knots_full,
    vector[W_n_knots_full] W_knots_full
);

It returns, for a single coordinate, the basis-difference vector

$$ \Delta^{(k)}(\theta_{\mathrm{ref},ik},\theta_{\mathrm{anchor},k}) ;=; B^{(k)}(\theta_{\mathrm{ref},ik}) - B^{(k)}(\theta_{\mathrm{anchor},k}) ;\in;\mathbb{R}^{W_{\text{per_k_dim}}}, $$

where $B^{(k)}$ is the polynomial basis (W_type_id == 1) or the B-spline basis (W_type_id == 2) of degree W_degree on knots W_knots_full. For W_type_id == 0 the helper returns zeros (W off). The Cox–de Boor recursion for B-splines is re-evaluated at $\theta_{\mathrm{ref},g_i,k}$ and $\theta_{\mathrm{anchor},k}$ at every HMC step (no caching across iterations).

`data { }`

Core dimensions

int<lower=1> n;
int<lower=1> p;
int<lower=1, upper=4> family_id;

n — number of observations.
p — dimension of the population reference $\theta_{\mathrm{ref}}\in\mathbb{R}^p$ and of each $y_i$.
family_id — homogeneous family selector:
- 1 = Gaussian (identity link),
- 2 = Poisson (log link),
- 3 = neg_binomial_2 (log link, Stan's mean-dispersion parametrization),
- 4 = Bernoulli (logit link).

Component toggles

int<lower=0, upper=1> use_a;
int<lower=0, upper=1> use_b;
int<lower=0, upper=1> use_W;

Binary toggles for the additive ($a$), multiplicative ($b$), and modulating ($W$) components of the AMM decomposition.

Per-coordinate column counts

int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[p] int<lower=0> J_a_per_k;
array[p] int<lower=0> J_b_per_k;

J_a_max, J_b_max — maximum column counts across coordinates (the padded storage width).
J_a_per_k[k], J_b_per_k[k] — actual column counts for coordinate $k$; zero when the component is inactive for that coordinate. The ragged structure is handled by padded storage plus per-coordinate iteration bounds.

Padded design matrices

array[p] matrix[n, J_a_max] Z_a;
array[p] matrix[n, J_b_max] Z_b;

For coordinate $k$, only columns $1..J_{a,\text{per\_k}}[k]$ (resp. $J_{b,\text{per\_k}}[k]$) are referenced; the remaining columns are present in storage but never enter the likelihood. Column-wise centering is enforced R-side (amm_spec_multi_designs()), so $\sum_i Z_{a,k}[i,j]=0$ for every $k,j$, giving $E_\mu[a(X)]=0$ identically (assumption C2 of Block 1); analogously for $Z_b$ (C3).

Modulating-component data

int<lower=0> dim_W;
int<lower=0> d;
int<lower=0> W_per_k_dim;
matrix[n, d] X;

dim_W — total row count of W_raw. For separable polynomial/B-spline bases, dim_W = p * W_per_k_dim.
d — column count of W_raw and of the covariate matrix X.
W_per_k_dim — block size per coordinate; dim_W / p for separable bases.
X — covariate matrix $X\in\mathbb{R}^{n\times d}$ entering the W-modulation term.

Outcomes

matrix[n, p] y_real;
array[n, p] int y_int;

Both blocks are always allocated; only the one matching family_id is consumed (y_real for family_id == 1; y_int otherwise).

Anchor and dispersion toggles

vector[p] theta_anchor;
int<lower=0, upper=1> use_dispersion_y;
int<lower=0, upper=1> use_dispersion_phi;

theta_anchor — fixed anchor $\theta_{\mathrm{anchor}}\in\mathbb{R}^p$ used in the W reparametrization $W(\theta_{\mathrm{ref}})-W(\theta_{\mathrm{anchor}})$, satisfying (C4) of Block 1 exactly.
use_dispersion_y — toggles per-coordinate Gaussian SD sigma_y (only meaningful for family_id == 1).
use_dispersion_phi — toggles per-coordinate NB2 dispersion phi (only meaningful for family_id == 3).

W-basis dispatch

int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;

W_type_id — 0 off, 1 polynomial, 2 B-spline.
W_n_knots_full, W_knots_full, W_degree — full knot vector and degree for the B-spline basis (ignored when W_type_id != 2).

Grouping (Block 6.5)

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;{{DATA_CP_A_PER_K_DECL}}

use_groups == 0 (default, backward-compatible): J_groups == 1, group_id[i] == 1 for all $i$, reducing to a single global $\theta_{\mathrm{ref}}$ of length $p$.
use_groups == 1: hierarchical per-group anchors $\theta_{\mathrm{ref},g},\ g=1,\dots,J_{\text{groups}}$, with coordinate-wise hyperpriors. Identifiability condition (C5): $Z_a[k]$ must not contain a factor(group) indicator; R-side preflight enforces this.
{{DATA_CP_A_PER_K_DECL}} — placeholder substituted from the cp_a_per_k logical vector (Phase H.2); declares any per-coordinate CP/NCP scale hyperparameters needed by the additive block.

EB metadata

int<lower=1> K_slots;
int<lower=1> p_dim;

Declared explicitly so R-side .gdpar_eb_* helpers can populate them uniformly. Under Path A, K_slots == 1 and p_dim == p. The template body does not branch on either field.

`transformed data { }`

Size accumulators

array[p] int J_a_free;
array[p] int J_b_free;
array[p] int a_raw_offset;
array[p] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;

Per-coordinate free-coefficient counts and flat-pack offsets into the packed vectors a_raw and c_b_raw. a_raw_offset[k] is the start index (0-based) of coordinate $k$'s block within a_raw; same convention for c_b_raw_offset.

Effective sizes

int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff     = (use_W == 1) ? d : 0;
int sigma_a_size = (use_a == 1) ? p : 0;
int sigma_b_size = (use_b == 1) ? p : 0;
int sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;
int sigma_y_size = (use_dispersion_y == 1) ? p : 0;
int phi_size     = (use_dispersion_phi == 1) ? p : 0;

These drive conditional array sizes in parameters{}. Note:

sigma_W_size is either 0 or 1 — W has a single global scale, not per-coordinate.
dim_W_eff and d_eff collapse to 0 when W is off, so W_raw becomes a matrix[0,0] (a zero-size object Stan accepts).

Offset computation loop

for (k in 1:p) {
  J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
  J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
  a_raw_offset[k] = total_J_a_free;
  c_b_raw_offset[k] = total_J_b_free;
  total_J_a_free += J_a_free[k];
  total_J_b_free += J_b_free[k];
}

For each coordinate $k$:

$J_{a,\text{free}}[k] = J_{a,\text{per\_k}}[k]$ if use_a == 1 and the per-coordinate count is positive, else $0$.
$J_{b,\text{free}}[k]$ analogous.
a_raw_offset[k] records the running total before adding coordinate $k$'s block, so coordinate $k$ occupies indices a_raw_offset[k]+1 .. a_raw_offset[k]+J_a_free[k] (1-based) within a_raw.

`parameters { }`

Reference and group hyperparameters

array[J_groups] vector[p] theta_ref;
array[use_groups] vector[p] mu_theta_ref;
array[use_groups] vector<lower=0>[p] sigma_theta_ref;

theta_ref[g] — the per-group population reference $\theta_{\mathrm{ref},g}\in\mathbb{R}^p$. When use_groups == 0, J_groups == 1 and only theta_ref[1] is sampled, directly from {{PRIOR_THETA_REF}} (coordinate-wise i.i.d.).
mu_theta_ref[1][k], sigma_theta_ref[1][k] — coordinate-wise hyperparameters of the group-level Normal prior on theta_ref[g][k]. Allocated only when use_groups == 1 (array size collapses to 0 otherwise, so the parameters vanish from the model).

Additive component

array[sigma_a_size] real<lower=0> sigma_a;
vector[total_J_a_free] a_raw;

sigma_a[k] — per-coordinate SD of the additive coefficients. Size p when use_a == 1, else 0.
a_raw — flat-packed standardized additive coefficients, length $\sum_k J_{a,\text{free}}[k]$. CP vs NCP per coordinate is resolved by {{TP_A_BLOCK}} and {{MODEL_A_BLOCK}} (driven by cp_a_per_k).

Multiplicative component

array[sigma_b_size] real<lower=0> sigma_b;
vector[total_J_b_free] c_b_raw;

sigma_b[k] — per-coordinate SD on $c_b$ (the linearly-identified product $\theta_{\mathrm{ref},k}\cdot b_{\mathrm{coef},k}$).
c_b_raw — flat-packed standardized $c_b$ coefficients. Always NCP: c_b_raw ~ normal(0,1) (see model{}).

Modulating component

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;

sigma_W[1] — single global SD on W (allocated only when use_W == 1 && dim_W > 0).
W_raw — $\dim_W\times d$ matrix of raw W coefficients. Row block $k$ (rows $(k-1)W_{\text{per\_k\_dim}}+1$ through $k\,W_{\text{per\_k\_dim}}$) corresponds to coordinate $k$. CP/NCP is toggled globally via {{W_SCALE}} and {{W_PRIOR}}.

Dispersion parameters

array[sigma_y_size] real<lower=0> sigma_y;
array[phi_size] real<lower=0> phi;

sigma_y[k] — per-coordinate Gaussian SD (only when use_dispersion_y == 1).
phi[k] — per-coordinate NB2 dispersion (only when use_dispersion_phi == 1).

`transformed parameters { }`

Coefficient containers

array[p] vector[J_a_max] a_coef;
array[p] vector[J_b_max] c_b;
array[p] vector[J_b_max] b_coef;
matrix[n, p] eta;

a_coef[k] — padded additive coefficient vector for coordinate $k$ (length J_a_max, only first J_a_per_k[k] entries meaningful).
c_b[k] — padded $c_b$ vector (the linearly-identified product).
b_coef[k] — padded derived $b_{\mathrm{coef}}$ vector (reported only in single-anchor regime; see below).
eta — the $n\times p$ linear-predictor matrix; $\eta_{i,k}=\theta_{i,k}$ under the AMM decomposition.

Zero-initialization

for (k in 1:p) {
  a_coef[k] = rep_vector(0, J_a_max);
  c_b[k] = rep_vector(0, J_b_max);
  b_coef[k] = rep_vector(0, J_b_max);
}

All padded coefficient vectors start at zero; only the active prefix of each is overwritten downstream. This guarantees that unused padded columns contribute exactly zero to eta.

Additive CP/NCP block (placeholder)

{{TP_A_BLOCK}}

Substituted from cp_a_per_k. For each coordinate $k$ with $J_{a,\text{free}}[k]>0$:

NCP: $a_{\mathrm{coef},k}[j] = \sigma_a[k]\cdot a_{\text{raw}}[\text{offset}+j],\ j=1,\dots,J_{a,\text{free}}[k]$.
CP: $a_{\mathrm{coef},k}[j] = a_{\text{raw}}[\text{offset}+j]$ (the scale is folded into the prior in model{}).

In both cases the active prefix of a_coef[k] is populated; the padded tail remains zero.

Multiplicative reconstruction

if (use_b == 1) {
  for (k in 1:p) {
    if (J_b_per_k[k] > 0) {
      for (j in 1:J_b_free[k]) {
        c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
      }
      if (use_groups == 0 && theta_ref[1][k] != 0) {
        for (j in 1:J_b_per_k[k]) {
          b_coef[k][j] = c_b[k][j] / theta_ref[1][k];
        }
      }
    }
  }
}

For each active coordinate $k$:

NCP reconstruction of $c_b$: $$ c_{b,k}[j] ;=; \sigma_b[k]\cdot c_{b,\text{raw}}[\text{offset}k + j], \qquad j=1,\dots,J{b,\text{free}}[k]. $$
Derived $b_{\mathrm{coef}}$ (single-anchor regime only, and only when $\theta_{\mathrm{ref},1,k}\neq 0$): $$ b_{\mathrm{coef},k}[j] ;=; \frac{c_{b,k}[j]}{\theta_{\mathrm{ref},1,k}}. $$ Under use_groups == 1 there is no canonical scalar to divide by (each group has its own $\theta_{\mathrm{ref},g,k}$), so b_coef stays at zero and users must derive it post-hoc per group from $c_b$ and $\theta_{\mathrm{ref},g,k}$.

Linear-predictor assembly

for (i in 1:n) {
  int g_i = group_id[i];
  for (k in 1:p) {
    real theta_ref_ik = theta_ref[g_i][k];
    real eta_ik = theta_ref_ik;
    ...
    eta[i, k] = eta_ik;
  }
}

For each observation $i$ and coordinate $k,\ \eta_{i,k}$ is built up from four terms matching the AMM decomposition.

Term 1 — reference

real theta_ref_ik = theta_ref[g_i][k];
real eta_ik = theta_ref_ik;

$$\eta_{i,k}^{(1)} ;=; \theta_{\mathrm{ref},,g_i,,k}.$$

Term 2 — additive

if (use_a == 1 && J_a_per_k[k] > 0) {
  eta_ik += dot_product(
    to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
    a_coef[k][1:J_a_per_k[k]]
  );
}

$$ \eta_{i,k}^{(2)} ;=; \sum_{j=1}^{J_{a,\text{per_k}}[k]} Z_{a,k}[i,j],a_{\mathrm{coef},k}[j] ;=; a_k(x_i). $$

Only the active prefix of the padded row/column is sliced; the padded tail is excluded by the 1:J_a_per_k[k] range.

Term 3 — multiplicative (via c_b)

if (use_b == 1 && J_b_per_k[k] > 0) {
  eta_ik += dot_product(
    to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
    c_b[k][1:J_b_per_k[k]]
  );
}

$$ \eta_{i,k}^{(3)} ;=; \sum_{j=1}^{J_{b,\text{per_k}}[k]} Z_{b,k}[i,j],c_{b,k}[j] ;=; \sum_j Z_{b,k}[i,j],b_{\mathrm{coef},k}[j],\theta_{\mathrm{ref},g_i,k} ;=; b_k(x_i),\theta_{\mathrm{ref},g_i,k}. $$

The data identify $c_{b,k}$ linearly; the model never multiplies $b_{\mathrm{coef}}$ by $\theta_{\mathrm{ref}}$ inside the likelihood, which is the whole point of the linear reparametrization.

Term 4 — W modulation

if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
  vector[d] W_diff_x_k = rep_vector(0, d);
  vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
    W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
    W_degree, W_n_knots_full, W_knots_full
  );
  int row_start = (k - 1) * W_per_k_dim + 1;
  for (jj_local in 1:W_per_k_dim) {
    int row = row_start + jj_local - 1;
    for (jjj in 1:d) {
      W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
    }
  }
  eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
}

This block computes, for coordinate $k$:

Basis difference (via apply_W_basis_diff): $$ \Delta^{(k)}\ell ;=; B^{(k)}\ell(\theta_{\mathrm{ref},g_i,k}) - B^{(k)}\ell(\theta{\mathrm{anchor},k}), \qquad \ell=1,\dots,W_{\text{per_k_dim}}. $$
Row block for coordinate $k$ within W_raw: $$ \text{row_start} ;=; (k-1),W_{\text{per_k_dim}} + 1, $$ so block $k$ occupies rows $\text{row\_start},\dots,\text{row\_start}+W_{\text{per\_k\_dim}}-1$.
Contraction of the basis difference with the W row block, producing a length-$d$ vector: $$ \bigl(W_{\text{diff_x},k}\bigr)j ;=; \sum{\ell=1}^{W_{\text{per_k_dim}}} \Delta^{(k)}\ell \cdot \widetilde{W}{\text{row_start}+\ell-1,;j}, \qquad j=1,\dots,d, $$ where $\widetilde{W}$ denotes W_raw after applying {{W_SCALE}}. Under NCP, {{W_SCALE}} resolves to / sigma_W[1], giving $\widetilde{W}_{r,j}=W_{\text{raw},r,j}/\sigma_W$; under CP it is empty, so $\widetilde{W}=W_{\text{raw}}$.
Dot with covariate row: $$ \eta_{i,k}^{(4)} ;=; \sum_{j=1}^{d} \bigl(W_{\text{diff_x},k}\bigr)j , X{i,j} ;=; \bigl(W_k(\theta_{\mathrm{ref},g_i}) - W_k(\theta_{\mathrm{anchor}})\bigr),x_i. $$

The guard use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0 ensures the block is skipped whenever W is structurally absent.

Final assignment

eta[i, k] = eta_ik;

$$ \eta_{i,k} ;=; \theta_{\mathrm{ref},g_i,k} ;+; a_k(x_i) ;+; b_k(x_i),\theta_{\mathrm{ref},g_i,k} ;+; \bigl(W_k(\theta_{\mathrm{ref},g_i}) - W_k(\theta_{\mathrm{anchor}})\bigr),x_i. $$

This is exactly the AMM reference-anchoring decomposition stated in the banner.

`model { }`

Reference prior

if (use_groups == 1) {
  mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
  sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
  for (g in 1:J_groups) {
    theta_ref[g] ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
  }
} else {
  theta_ref[1] ~ {{PRIOR_THETA_REF}};
}

Hierarchical regime (use_groups == 1): coordinate-wise hyperpriors on the mean and SD vectors, then a coordinate-wise independent Normal group-level prior: $$ \mu_{\theta_{\mathrm{ref}},k} \sim \pi_{\mu}, \quad \sigma_{\theta_{\mathrm{ref}},k} \sim \pi_{\sigma}, \quad \theta_{\mathrm{ref},g,k}\mid\mu_{\theta_{\mathrm{ref}},k},\sigma_{\theta_{\mathrm{ref}},k} ;\sim; \mathcal{N}!\bigl(\mu_{\theta_{\mathrm{ref}},k},,\sigma_{\theta_{\mathrm{ref}},k}\bigr). $$ The normal(mu_theta_ref[1], sigma_theta_ref[1]) call is vectorized over the length-$p$ vectors, so each coordinate $k$ uses its own $(\mu_k,\sigma_k)$.
Single-anchor regime (use_groups == 0): theta_ref[1] is sampled directly from {{PRIOR_THETA_REF}} (coordinate-wise i.i.d.).

Additive priors

if (use_a == 1) {
  sigma_a ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}

sigma_a[k] receives prior {{PRIOR_SIGMA_A}}. {{MODEL_A_BLOCK}} places the prior on a_raw:

NCP per coordinate: a_raw ~ normal(0, 1) (the scale is applied in {{TP_A_BLOCK}}).
CP per coordinate: a_raw ~ normal(0, sigma_a[k]) (or whatever prior is configured), with the scale folded into the prior directly.

The per-coordinate CP/NCP choice is driven by cp_a_per_k (Phase H.2).

Multiplicative priors

if (use_b == 1) {
  sigma_b ~ {{PRIOR_SIGMA_B}};
  c_b_raw ~ normal(0, 1);
}

$$ \sigma_{b,k} \sim \pi_{\sigma_b}, \qquad c_{b,\text{raw},m} ;\sim; \mathcal{N}(0,1) \quad\text{(i.i.d.\ over all packed entries)}. $$ The $c_b$ block is always NCP; there is no CP variant. The scale $\sigma_b[k]$ enters through the transformed parameters reconstruction $c_{b,k}[j]=\sigma_b[k]\,c_{b,\text{raw},m}$.

W priors

if (use_W == 1 && dim_W > 0) {
  sigma_W[1] ~ {{PRIOR_SIGMA_W}};
  to_vector(W_raw) ~ {{W_PRIOR}};
}

$$ \sigma_W \sim \pi_{\sigma_W}, \qquad \mathrm{vec}(W_{\text{raw}}) \sim \pi_W. $$ Under NCP, {{W_PRIOR}} resolves to normal(0, 1) and {{W_SCALE}} (in transformed parameters) divides by sigma_W[1]. Under CP, {{W_PRIOR}} resolves to normal(0, sigma_W[1]) and {{W_SCALE}} is empty. The toggle is global (single sigma_W), not per-coordinate.

Dispersion priors

if (use_dispersion_y == 1) {
  sigma_y ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
  phi ~ {{PRIOR_PHI}};
}

Per-coordinate priors on Gaussian SDs and NB2 dispersions, allocated only when the corresponding toggle is on.

Likelihood

for (k in 1:p) {
  if (family_id == 1) {
    y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
  } else if (family_id == 2) {
    y_int[:, k] ~ poisson_log(eta[:, k]);
  } else if (family_id == 3) {
    y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
  } else if (family_id == 4) {
    y_int[:, k] ~ bernoulli_logit(eta[:, k]);
  }
}

Coordinate-wise independent likelihood, homogeneous family across $k$:

Gaussian (family_id == 1, identity link): $$ y_{i,k} ;\sim; \mathcal{N}!\bigl(\eta_{i,k},,\sigma_{y,k}\bigr). $$
Poisson (family_id == 2, log link): $$ y_{i,k} ;\sim; \mathrm{Poisson}!\bigl(e^{\eta_{i,k}}\bigr). $$
Negative binomial 2 (family_id == 3, log link, mean-dispersion): $$ y_{i,k} ;\sim; \mathrm{NegBinomial2}!\bigl(e^{\eta_{i,k}},,\phi_k\bigr), $$ where $\phi_k$ is the dispersion (variance $= \mu + \mu^2/\phi_k$).
Bernoulli (family_id == 4, logit link): $$ y_{i,k} ;\sim; \mathrm{Bernoulli}!\bigl(\mathrm{logit}^{-1}(\eta_{i,k})\bigr). $$

The full joint likelihood factorizes as $\prod_{i,k} D(y_{i,k}\mid\eta_{i,k})$, canonizing Opcion B of the Phase F decision: $y_i$ is multivariate in $\mathbb{R}^p$, coordinates independent given $\theta_i$, with cross-dimensional coupling carried exclusively by $W(\theta_{\mathrm{ref}})$.

`generated quantities { }`

Not present in section 1 of 2. The section ends after the model{} block's likelihood loop (the closing braces for the for (k in 1:p) loop, the model block, and the program are not shown in this excerpt and belong to section 2). No generated quantities block is visible in the provided source.

Summary of the AMM realization

AMM term	Code location	Math
$\theta_{\mathrm{ref},g_i,k}$	`theta_ref[g_i][k]` in `transformed parameters`	sampled (hierarchical or direct)
$a_k(x_i)$	`dot_product(Z_a[k][i,·], a_coef[k][·])`	$\sum_j Z_{a,k}[i,j]\,a_{\mathrm{coef},k}[j]$
$b_k(x_i)\,\theta_{\mathrm{ref},g_i,k}$	`dot_product(Z_b[k][i,·], c_b[k][·])`	$\sum_j Z_{b,k}[i,j]\,c_{b,k}[j]$, with $c_{b,k}=b_{\mathrm{coef},k}\,\theta_{\mathrm{ref},k}$
$(W_k(\theta_{\mathrm{ref},g_i})-W_k(\theta_{\mathrm{anchor}}))\,x_i$	`apply_W_basis_diff` + `W_raw` contraction + `dot_product(W_diff_x_k, X[i])`	$\sum_{\ell,j}\Delta^{(k)}_\ell\,\widetilde{W}_{r(\ell,k),j}\,X_{i,j}$

Identifiability conditions realized: (C2) and (C3) by R-side column centering of $Z_a,Z_b$; (C4) by the explicit $W(\theta_{\mathrm{ref}})-W(\theta_{\mathrm{anchor}})$ reparametrization; (C5) by R-side preflight (no factor(group) in $Z_a$ under use_groups == 1). The cross-dimensional condition C4-bis is not checked by this template; it assumes the spec has passed preflight.

Stan Template `amm_canonical_eb_marginal_multi.stan` — Section 2 of 2

0. Scope of this section

Section 2 consists of a single closing brace } (terminating the block opened in section 1 — almost certainly the model block, since generated quantities is the next top-level block declared) followed by the generated quantities block in its entirety. No functions, data, transformed data, parameters, transformed parameters, or model declarations appear in this section; they all live in section 1 and are assumed visible here through Stan's lexical scoping (all symbols n, p, family_id, eta, sigma_y, phi, y_real, y_int are resolved from earlier blocks).

1. The leading `}`

Closes the immediately preceding top-level Stan block from section 1. Per Stan's grammar, top-level blocks must appear in the fixed order functions, data, transformed data, parameters, transformed parameters, model, generated quantities. Since generated quantities follows, the closed block is the model block (or, less likely, transformed parameters if the model block was empty/omitted — but Stan requires a model block, so the former is the only legal reading).

2. `generated quantities` block — header

generated quantities {

Opens the block executed once per HMC/NUTS posterior draw (after the leapfrog integration step). All quantities declared here are stored in the posterior and exposed to R as draws. Crucially, RNG calls inside this block use the per-draw RNG state seeded by Stan, so y_pred constitutes genuine posterior predictive draws.

3. Declarations inside `generated quantities`

3.1 `matrix[n, p] log_lik;`

  matrix[n, p] log_lik;

Declares an $n \times p$ matrix of pointwise log-likelihood contributions. Element $(i,k)$ will hold

$$\log p!\left(y_{ik} ,\middle|, \eta_{ik}, \text{family-specific parameters}\right),$$

i.e. the log-density (or log-PMF) of observation $i$ for response $k$ evaluated at the current posterior draw. The matrix layout (one row per observation, one column per response) is the convention expected by loo::loo() and loo::loo_matrix() for leave-one-out cross-validation with multivariate outcomes — each cell is an independent LOO unit. No initializer is supplied, so entries are NaN until assigned in the loop below; this is a latent contract that family_id ∈ {1,2,3,4}.

3.2 `matrix[n, p] theta_i = eta;`

  matrix[n, p] theta_i = eta;

Declares theta_i and copies eta into it elementwise. This is the AMM reference-anchoring exposure point:

eta is the linear predictor produced upstream (in transformed parameters of section 1) by the AMM decomposition — i.e. the canonical parameter on the link scale already anchored to the chosen reference level/category.
theta_i is the per-observation canonical parameter $\theta_{ik}$ exposed to the user. The assignment $\theta_{ik} \leftarrow \eta_{ik}$ is the identity map on the link scale, so no Jacobian is required: the canonical parameter is reported in the same parameterization in which the likelihood is evaluated.

In the AMM reference-anchoring decomposition, the canonical parameter is written schematically as

$$\eta_{ik} ;=; g(\mu_{ik}) ;=; \underbrace{\alpha_k^{\text{(ref)}}}_{\text{reference anchor}} ;+; \underbrace{\text{AMM deviation terms}}_{\text{anchored to } \alpha_k^{\text{(ref)}}},$$

where $g(\cdot)$ is the link function (identity / log / logit depending on family_id). The section-1 code constructs $\eta_{ik}$; this section simply re-exposes it under the canonical-parameter name theta_i for downstream R-side diagnostics, residual computation, or anchoring checks. Because the copy is on the link scale and the likelihood functions used below (*_log_lpmf, *_logit_lpmf) accept the linear predictor directly, the chain $\eta \to \theta_i \to \text{likelihood}$ is parameterization-consistent and Jacobian-free.

3.3 `matrix[n, p] y_pred;`

  matrix[n, p] y_pred;

Declares the posterior predictive matrix. Element $(i,k)$ will hold a single draw from the conditional predictive distribution

$$\tilde y_{ik} \sim p!\left(y ,\middle|, \eta_{ik}, \text{family-specific parameters}\right),$$

using the same family and the same current-draw parameters as log_lik. Like log_lik, it is uninitialized and relies on the loop to fill every cell.

4. The observation/response double loop

  for (i in 1:n) {
    for (k in 1:p) {

Iterates over all $n$ observations (outer index i) and all $p$ responses (inner index k). The ordering — observation outermost, response innermost — matches the row-major layout of log_lik, theta_i, y_pred, eta, y_real, y_int. Each $(i,k)$ cell is treated as an independent conditional draw given $\eta_{ik}$; there is no joint/multivariate likelihood coupling across $k$ in this template (the "multi" in the filename refers to $p>1$ responses, not to a joint covariance). This is the marginal-multi specification: each response has its own univariate exponential-family likelihood, sharing only the AMM-anchored linear predictor structure.

5. Family branches

The body of the inner loop is a four-way if / else if cascade keyed on the integer family_id (declared in data in section 1). Each branch performs two assignments: (a) the pointwise log-likelihood into log_lik[i,k], and (b) a posterior predictive draw into y_pred[i,k]. There is no else default, so any family_id outside $\{1,2,3,4\}$ leaves both cells as NaN (a silent contract violation).

5.1 `family_id == 1` — Gaussian (identity link)

      if (family_id == 1) {
        log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
        y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
      }

Likelihood (continuous response, stored in y_real):

$$y_{ik} \sim \mathrm{Normal}!\left(\mu_{ik} = \eta_{ik},; \sigma_{y,k}\right),$$

with pointwise log-density

$$\log p(y_{ik}\mid\eta_{ik},\sigma_{y,k}) ;=; -\tfrac{1}{2}\log(2\pi\sigma_{y,k}^{2}) ;-; \frac{(y_{ik}-\eta_{ik})^{2}}{2\sigma_{y,k}^{2}}.$$

The link is the identity $g(\mu)=\mu$, so $\eta_{ik}=\mu_{ik}$ directly and normal_lpdf is called with eta[i,k] as the mean. sigma_y[k] is a response-specific SD (declared in parameters/transformed parameters of section 1).

Posterior predictive draw:

$$\tilde y_{ik} \sim \mathrm{Normal}!\left(\eta_{ik},, \sigma_{y,k}\right), \qquad \tilde y_{ik} = \texttt{normal_rng}(\eta_{ik}, \sigma_{y,k}).$$

Note the predictive RNG reuses the same $\sigma_{y,k}$ draw as the likelihood — i.e. this is conditional predictive at the current posterior parameter value, not a "new parameter" draw.

5.2 `family_id == 2` — Poisson (log link)

      } else if (family_id == 2) {
        log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = poisson_log_rng(eta[i, k]);
      }

Likelihood (count response, stored in y_int):

$$y_{ik} \sim \mathrm{Poisson}!\left(\lambda_{ik} = \exp(\eta_{ik})\right),$$

with pointwise log-PMF (the _log suffix indicates Stan applies $\exp(\cdot)$ internally and uses the numerically stable form):

$$\log p(y_{ik}\mid\eta_{ik}) ;=; y_{ik},\eta_{ik} ;-; \exp(\eta_{ik}) ;-; \log(y_{ik}!).$$

Link: $g(\mu)=\log\mu$, so $\eta_{ik}=\log\lambda_{ik}$. The canonical parameter of the Poisson exponential family is exactly $\log\lambda$, hence theta_i = eta exposes the natural parameter directly — consistent with the AMM canonical-parameter convention.

Posterior predictive draw:

$$\tilde y_{ik} \sim \mathrm{Poisson}!\left(\exp(\eta_{ik})\right), \qquad \tilde y_{ik} = \texttt{poisson_log_rng}(\eta_{ik}),$$

where poisson_log_rng internally exponentiates $\eta_{ik}$ before sampling.

5.3 `family_id == 3` — Negative Binomial type 2 (log link)

      } else if (family_id == 3) {
        log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
        y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
      }

Likelihood (overdispersed count response):

$$y_{ik} \sim \mathrm{NegBinomial_2}!\left(\mu_{ik} = \exp(\eta_{ik}),; \phi_k\right),$$

where $\phi_k > 0$ is the precision (reciprocal dispersion) parameter for response $k$, so that

$$\mathrm{Var}(y_{ik}) ;=; \mu_{ik} ;+; \frac{\mu_{ik}^{2}}{\phi_k}.$$

Pointwise log-PMF (Stan's neg_binomial_2_log_lpmf applies $\exp$ to $\eta_{ik}$ internally and uses the gamma–Poisson mixture form):

$$\log p(y_{ik}\mid\eta_{ik},\phi_k) ;=; \log\Gamma(y_{ik}+\phi_k) - \log\Gamma(\phi_k) - \log(y_{ik}!) ;+; \phi_k\log!\frac{\phi_k}{\phi_k+\mu_{ik}} ;+; y_{ik}\log!\frac{\mu_{ik}}{\phi_k+\mu_{ik}},$$

with $\mu_{ik}=\exp(\eta_{ik})$. Link is again $\log$, so $\eta_{ik}$ is the natural-parameter-side linear predictor; phi[k] is response-specific (declared in section 1).

Posterior predictive draw:

$$\tilde y_{ik} \sim \mathrm{NegBinomial_2}!\left(\exp(\eta_{ik}),, \phi_k\right), \qquad \tilde y_{ik} = \texttt{neg_binomial_2_log_rng}(\eta_{ik}, \phi_k).$$

5.4 `family_id == 4` — Bernoulli (logit link)

      } else if (family_id == 4) {
        log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
      }

Likelihood (binary/0-1 response):

$$y_{ik} \sim \mathrm{Bernoulli}!\left(p_{ik} = \mathrm{logit}^{-1}(\eta_{ik}) = \frac{1}{1+e^{-\eta_{ik}}}\right),$$

with pointwise log-PMF (Stan's bernoulli_logit_lpmf uses the numerically stable form, avoiding explicit evaluation of $\mathrm{logit}^{-1}$):

$$\log p(y_{ik}\mid\eta_{ik}) ;=; y_{ik},\eta_{ik} ;-; \log!\left(1 + e^{\eta_{ik}}\right).$$

Link: $g(p)=\mathrm{logit}(p)=\log\frac{p}{1-p}$, so $\eta_{ik}=\mathrm{logit}(p_{ik})$. The canonical parameter of the Bernoulli exponential family is exactly $\mathrm{logit}(p)$, so once again theta_i = eta exposes the natural parameter — fully consistent with the AMM canonical-parameter convention.

Posterior predictive draw:

$$\tilde y_{ik} \sim \mathrm{Bernoulli}!\left(\mathrm{logit}^{-1}(\eta_{ik})\right), \qquad \tilde y_{ik} = \texttt{bernoulli_logit_rng}(\eta_{ik}) \in {0,1}.$$

6. Closing braces

    }
  }
}

Three closing braces, innermost to outermost:

Closes the if/else if cascade (per $(i,k)$ cell).
Closes the inner for (k in 1:p) loop.
Closes the outer for (i in 1:n) loop — and, on the same line, the generated quantities block (the final brace also serves as the close of the entire Stan program, since generated quantities is the last top-level block).

7. Mathematical summary of the block

For each posterior draw $\theta^{(s)}$ (comprising all parameters declared in section 1), the block computes, for every $(i,k)$:

$$\ell_{ik}^{(s)} ;=; \log p!\left(y_{ik} ,\middle|, \eta_{ik}^{(s)},, \xi_k^{(s)}\right), \qquad \tilde y_{ik}^{(s)} \sim p!\left(y ,\middle|, \eta_{ik}^{(s)},, \xi_k^{(s)}\right),$$

where $\xi_k$ is the family-specific auxiliary parameter ($\sigma_{y,k}$ for Gaussian, $\phi_k$ for NB2, absent otherwise), and $\eta_{ik}^{(s)}$ is the AMM-anchored linear predictor produced upstream. The three exported matrices are:

Matrix	Content	Shape	Role
`log_lik`	$\ell_{ik}^{(s)}$	$n\times p$	pointwise log-likelihood for LOO/WAIC
`theta_i`	$\eta_{ik}^{(s)}$ (copy)	$n\times p$	canonical parameter on link scale (AMM anchor exposure)
`y_pred`	$\tilde y_{ik}^{(s)}$	$n\times p$	posterior predictive draws

8. How this realizes the AMM reference-anchoring decomposition

The AMM (Anchored Measurement / reference-anchored dynamic parameter) decomposition, as realized across the two sections, separates the model into three layers:

Anchor layer (section 1, parameters/transformed parameters): defines a reference level/category and the anchored deviation structure that builds $\eta_{ik}$ from reference effects plus AMM deviations.
Canonical-parameter layer (this section, line theta_i = eta): exposes the linear predictor $\eta_{ik}$ as the canonical parameter $\theta_{ik}$ on the natural/link scale. Because every family used here is a member of the exponential family with canonical link (identity for Gaussian, log for Poisson/NB2, logit for Bernoulli), $\eta_{ik}$ is the natural parameter, and the copy is the identity reparameterization — no Jacobian term is needed in the model density.
Observation layer (this section, the if/else if cascade): maps $\theta_{ik}=\eta_{ik}$ through the inverse link inside Stan's *_log_lpmf / *_logit_lpmf primitives to evaluate the likelihood and to draw posterior predictive samples.

The "marginal" qualifier in the filename is realized by the absence of any cross-$k$ coupling in the loop: each response $k$ contributes an independent univariate exponential-family term, conditional on its own $\eta_{ik}$ and (where applicable) its own $\sigma_{y,k}$ or $\phi_k$. The "multi" qualifier is realized by the $p$-column matrix layout and the per-$k$ auxiliary parameters. The "eb" (empirical-Bayes) and "canonical" qualifiers are realized upstream in section 1 (priors on the AMM deviation terms and the canonical-link construction of $\eta$); this section simply consumes the resulting $\eta$ and emits the three diagnostic/predictive matrices described above.

inst/stan/_canonical_pieces/amm_canonical_eb_marginal.stan

Functions Block

functions {
  // {{CANONICAL_HELPERS}}
}

Explanation: This block declares user-defined functions. The placeholder {{CANONICAL_HELPERS}} is dynamically replaced by the R code-generation engine with a set of helper functions specific to the canonical AMM reference-anchoring model. These include functions for computing basis differences, scaling transformations, and other mathematical utilities. The exact functions depend on the model configuration (e.g., spline basis for W_type_id). Since they are generated at runtime, their implementation is not shown here, but they are essential for computing the non-linear W component in the transformed parameters block.

Data Block

data {
  int<lower=1> n;
  int<lower=1, upper=4> family_id;

  int<lower=0, upper=1> use_a;
  int<lower=0, upper=1> use_b;
  int<lower=0, upper=1> use_W;

  int<lower=0> J_a;
  int<lower=0> J_b;
  int<lower=0> dim_W;
  int<lower=0> d;

  matrix[n, J_a] Z_a;
  matrix[n, J_b] Z_b;
  matrix[n, d] X;

  vector[n] y_real;
  array[n] int y_int;

  real theta_anchor;

  int<lower=0, upper=1> use_dispersion_y;
  int<lower=0, upper=1> use_dispersion_phi;

  int<lower=0, upper=2> W_type_id;
  int<lower=0> W_n_knots_full;
  vector[W_n_knots_full] W_knots_full;
  int<lower=0> W_degree;

  int<lower=0, upper=1> use_groups;
  int<lower=1> J_groups;
  array[n] int<lower=1, upper=J_groups> group_id;

  // Sub-phase 8.6.B canonical-but-degenerate dimension fields. K_slots
  // and p_dim default to 1 for the 8.6.B regime and are checked R-side
  // by .gdpar_eb_validate_inputs(); 8.6.C will relax the K = 1 and
  // p = 1 guards and exercise these fields nontrivially.
  int<lower=1> K_slots;
  int<lower=1> p_dim;
}

Explanation: The data block defines all observed and fixed quantities. Each declaration is described below:

n: Number of observations.
family_id: Integer index (1–4) selecting the likelihood family: 1 = Gaussian, 2 = Poisson, 3 = negative binomial, 4 = Bernoulli logistic.
use_a, use_b, use_W: Binary flags (0/1) indicating whether to include the additive random effect (a), multiplicative random effect (b), and non-linear deviation (W) components.
J_a, J_b: Number of columns in the design matrices for a and b. If use_a=0 or J_a=0, then a is effectively absent.
dim_W: Dimension of the W basis expansion (e.g., number of basis functions).
d: Dimension of the covariate vector X used in the W interaction.
Z_a, Z_b: Design matrices for the a and b random effects, each of dimension $n \times J_a$ and $n \times J_b$.
X: Covariate matrix of dimension $n \times d$ for the W component.
y_real: Real-valued response vector (used when family_id=1).
y_int: Integer-valued response array (used when family_id=2,3,4).
theta_anchor: Fixed reference anchor value for the non-linear W component.
use_dispersion_y, use_dispersion_phi: Binary flags for including dispersion parameters $\sigma_y$ (Gaussian) and $\phi$ (negative binomial).
W_type_id, W_n_knots_full, W_knots_full, W_degree: Parameters defining the basis for the W component (e.g., spline type, knots, degree).
use_groups: Flag for hierarchical structure of theta_ref.
J_groups: Number of groups (if use_groups=1).
group_id: Integer array mapping each observation to a group (1 to $J_{\text{groups}}$).
K_slots, p_dim: Placeholder dimensions for future multivariate extensions (here fixed to 1).

Transformed Data Block

transformed data {
  int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
  int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
  int dim_W_eff = (use_W == 1) ? dim_W : 0;
  int d_eff = (use_W == 1) ? d : 0;
}

Explanation: This block computes effective dimensions after applying the use_* flags:

J_a_free: Effective number of a coefficients. Equals $J_a$ if use_a=1 and $J_a>0$, else 0.
J_b_free: Effective number of b coefficients. Equals $J_b$ if use_b=1 and $J_b>0$, else 0.
dim_W_eff: Effective dimension of W basis. Equals dim_W if use_W=1, else 0.
d_eff: Effective dimension of X. Equals d if use_W=1, else 0.

These are used to size parameter arrays later.

Parameters Block

parameters {
  vector[J_groups] theta_ref;
  array[use_groups] real mu_theta_ref;
  array[use_groups] real<lower=0> sigma_theta_ref;

  array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a;
  vector[J_a_free] a_raw;

  array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b;
  vector[J_b_free] c_b_raw;

  array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W;
  matrix[dim_W_eff, d_eff] W_raw;

  array[use_dispersion_y] real<lower=0> sigma_y;
  array[use_dispersion_phi] real<lower=0> phi;
}

Explanation: This block declares the latent parameters:

theta_ref: Vector of reference parameters for each group. Dimension $J_{\text{groups}}$.
mu_theta_ref, sigma_theta_ref: Mean and standard deviation of the hierarchical prior on theta_ref (arrays of length 1 if use_groups=1, else empty).
sigma_a: Standard deviation for the a coefficients (array of length 1 if use_a=1 and $J_a>0$, else empty).
a_raw: Raw (non-centered) additive random effects, vector of length J_a_free.
sigma_b: Standard deviation for the b coefficients (array of length 1 if use_b=1 and $J_b>0$, else empty).
c_b_raw: Raw (non-centered) multiplicative random effects, vector of length J_b_free.
sigma_W: Standard deviation for the W basis coefficients (array of length 1 if use_W=1 and dim_W>0, else empty).
W_raw: Raw (non-centered) W basis coefficients, matrix of dimension dim_W_eff $\times$ d_eff.
sigma_y: Gaussian dispersion (array of length 1 if use_dispersion_y=1, else empty).
phi: Negative binomial dispersion (array of length 1 if use_dispersion_phi=1, else empty).

All raw parameters are intended to have unit-scale priors (non-centered parameterization), while the actual coefficients are derived in the transformed parameters block.

Transformed Parameters Block

transformed parameters {
  vector[J_a] a_coef = rep_vector(0, J_a);
  vector[J_b] c_b = rep_vector(0, J_b);
  vector[J_b] b_coef = rep_vector(0, J_b);
  vector[n] eta;

  if (use_a == 1 && J_a > 0) {
    for (j in 1:J_a_free) {
      a_coef[j] = a_raw[j]{{A_SCALE}};
    }
  }

  if (use_b == 1 && J_b > 0) {
    for (j in 1:J_b_free) {
      c_b[j] = c_b_raw[j] * sigma_b[1];
    }
    if (use_groups == 0 && theta_ref[1] != 0) {
      for (j in 1:J_b) {
        b_coef[j] = c_b[j] / theta_ref[1];
      }
    }
  }

  for (i in 1:n) {
    real theta_ref_i = theta_ref[group_id[i]];
    real eta_i = theta_ref_i;
    if (use_a == 1 && J_a > 0) {
      eta_i += Z_a[i] * a_coef;
    }
    if (use_b == 1 && J_b > 0) {
      eta_i += Z_b[i] * c_b;
    }
    if (use_W == 1 && dim_W > 0 && d > 0) {
      vector[d] W_diff_x = rep_vector(0, d);
      vector[dim_W] basis_diff = apply_W_basis_diff(
        W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
        W_n_knots_full, W_knots_full
      );
      for (k in 1:dim_W) {
        for (jj in 1:d) {
          W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
        }
      }
      eta_i += dot_product(W_diff_x, to_vector(X[i]));
    }
    eta[i] = eta_i;
  }
}

Explanation: This block transforms raw parameters into meaningful quantities and constructs the linear predictor $\eta$.

Initialization: a_coef, c_b, b_coef are initialized to zero vectors of lengths $J_a,\ J_b,\ J_b$. eta is a vector of length $n$.
Additive random effects ($a$): If use_a=1 and $J_a>0$, each a_coef[j] is computed as a_raw[j] multiplied by a scaling factor {{A_SCALE}}. The placeholder {{A_SCALE}} is replaced by the R code generator, typically sigma_a[1], giving the non-centered parameterization: $$ a_{\text{coef},j} = \sigma_a \cdot a_{\text{raw},j}, \quad a_{\text{raw},j} \sim \text{prior}(0,1). $$
Multiplicative random effects ($b$): If use_b=1 and $J_b>0$, each c_b[j] is computed as c_b_raw[j] * sigma_b[1]. Then, if the model is non-hierarchical (use_groups=0) and theta_ref[1] ≠ 0, we compute b_coef[j] = c_b[j] / theta_ref[1]. This derives the multiplicative coefficients in terms of the reference parameter. In the hierarchical case, b_coef remains zero (it is not used directly in the likelihood).
Linear predictor ($\eta$): For each observation $i$:
- $\theta_{\text{ref},i} = \theta_{\text{ref}}[\text{group\_id}[i]]$.
- Initialize $\eta_i = \theta_{\text{ref},i}$.
- Add the additive term: $\eta_i \mathrel{+}= \mathbf{Z}_{a,i} \cdot \mathbf{a}_{\text{coef}}$, where $\mathbf{Z}_{a,i}$ is the $i$-th row of $Z_a$.
- Add the multiplicative term: $\eta_i \mathrel{+}= \mathbf{Z}_{b,i} \cdot \mathbf{c}_b$. Note that c_b is used, not b_coef.
- If use_W=1, dim_W>0, and d>0, compute the non-linear deviation:
  - Compute basis_diff via the helper function apply_W_basis_diff. This returns a vector of length dim_W containing the basis functions evaluated at the difference $\theta_{\text{ref},i} - \theta_{\text{anchor}}$.
  - Initialize W_diff_x (a vector of length $d$) to zero.
  - For each basis dimension $k$ and covariate dimension $jj$, accumulate: $W_{\text{diff\_x},jj} \mathrel{+}= \text{basis\_diff}[k] \cdot W_{\text{raw}}[k,jj] \cdot {{W_SCALE}}$. The placeholder {{W_SCALE}} is replaced, typically by sigma_W[1].
  - Add the dot product: $\eta_i \mathrel{+}= \mathbf{W}_{\text{diff\_x}} \cdot \mathbf{X}_i$.
- Set eta[i] = eta_i.

The resulting $\eta$ is the linear predictor for the chosen likelihood family.

Model Block

model {
  // BEGIN PRIORS
  if (use_groups == 1) {
    mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
    sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
    theta_ref ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
  } else {
    theta_ref[1] ~ {{PRIOR_THETA_REF}};
  }

  if (use_a == 1 && J_a > 0) {
    sigma_a[1] ~ {{PRIOR_SIGMA_A}};
    a_raw ~ {{A_PRIOR}};
  }
  if (use_b == 1 && J_b > 0) {
    sigma_b[1] ~ {{PRIOR_SIGMA_B}};
    c_b_raw ~ normal(0, 1);
  }
  if (use_W == 1 && dim_W > 0) {
    sigma_W[1] ~ {{PRIOR_SIGMA_W}};
    to_vector(W_raw) ~ {{W_PRIOR}};
  }
  if (use_dispersion_y == 1) {
    sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
  }
  if (use_dispersion_phi == 1) {
    phi[1] ~ {{PRIOR_PHI}};
  }
  // END PRIORS

  if (family_id == 1) {
    y_real ~ normal(eta, sigma_y[1]);
  } else if (family_id == 2) {
    y_int ~ poisson_log(eta);
  } else if (family_id == 3) {
    y_int ~ neg_binomial_2_log(eta, phi[1]);
  } else if (family_id == 4) {
    y_int ~ bernoulli_logit(eta);
  }
}

Explanation: This block specifies the joint log-density (priors and likelihood).

Priors

Reference parameter ($\theta_{\text{ref}}$):
- If use_groups=1: hierarchical prior: $$ \mu_{\theta_{\text{ref}}} \sim p(\mu), \quad \sigma_{\theta_{\text{ref}}} \sim p(\sigma), \quad \theta_{\text{ref},j} \sim \mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}}). $$
- If use_groups=0: a single $\theta_{\text{ref},1}$ with prior $p(\theta_{\text{ref}})$.
Additive random effects ($a$): If used, $\sigma_a \sim p(\sigma_a)$ and $\mathbf{a}_{\text{raw}} \sim p(\mathbf{a}_{\text{raw}})$ (typically standard normal).
Multiplicative random effects ($b$): If used, $\sigma_b \sim p(\sigma_b)$ and $\mathbf{c}_{\text{raw}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
Non-linear deviation ($W$): If used, $\sigma_W \sim p(\sigma_W)$ and $\mathrm{vec}(W_{\text{raw}}) \sim p(W_{\text{raw}})$ (typically standard normal).
Dispersion parameters: If used, $\sigma_y \sim p(\sigma_y)$ and $\phi \sim p(\phi)$.

Likelihood

The likelihood depends on family_id:

Gaussian (family_id=1): $y_i \sim \mathcal{N}(\eta_i, \sigma_y^2)$.
Poisson (family_id=2): $y_i \sim \mathrm{Poisson}(\exp(\eta_i))$.
Negative binomial (family_id=3): $y_i \sim \mathrm{NB2}(\exp(\eta_i), \phi)$.
Bernoulli logistic (family_id=4): $y_i \sim \mathrm{Bernoulli}(\mathrm{logit}^{-1}(\eta_i))$.

Generated Quantities Block

generated quantities {
  vector[n] log_lik;
  vector[n] theta_i = eta;
  vector[n] y_pred;

  for (i in 1:n) {
    if (family_id == 1) {
      log_lik[i] = normal_lpdf(y_real[i] | eta[i], sigma_y[1]);
      y_pred[i] = normal_rng(eta[i], sigma_y[1]);
    } else if (family_id == 2) {
      log_lik[i] = poisson_log_lpmf(y_int[i] | eta[i]);
      y_pred[i] = poisson_log_rng(eta[i]);
    } else if (family_id == 3) {
      log_lik[i] = neg_binomial_2_log_lpmf(y_int[i] | eta[i], phi[1]);
      y_pred[i] = neg_binomial_2_log_rng(eta[i], phi[1]);
    } else if (family_id == 4) {
      log_lik[i] = bernoulli_logit_lpmf(y_int[i] | eta[i]);
      y_pred[i] = bernoulli_logit_rng(eta[i]);
    }
  }
}

Explanation: This block computes quantities after sampling.

log_lik: Vector of pointwise log-likelihoods (for LOO-CV).
theta_i: Alias for eta, representing the linear predictor for each observation.
y_pred: Posterior predictive draws.

For each observation $i$, the appropriate log-likelihood and predictive random number generator are called based on family_id. The log-likelihoods are computed with the same distributional assumptions as the likelihood in the model block.

Summary of the AMM Reference-Anchoring Decomposition

This Stan template realizes the additive-multiplicative non-linear model (AMM) with reference anchoring. The linear predictor $\eta_i$ is decomposed as:

$$ \eta_i = \theta_{\text{ref},g(i)} + \underbrace{\mathbf{Z}_{a,i} \mathbf{a}}_{\text{additive}} + \underbrace{\mathbf{Z}_{b,i} \mathbf{c}_b}_{\text{multiplicative}} + \underbrace{\mathbf{X}_i \cdot \left( \sum_{k=1}^{\text{dim_W}} \psi_k(\theta_{\text{ref},g(i)} - \theta_{\text{anchor}}) , \mathbf{W}_{\text{raw},k} \right)}_{\text{non-linear deviation}}, $$

where:

$\theta_{\text{ref},g(i)}$ is the reference parameter for group $g(i)$.
$\mathbf{a}$ are additive random effects (centered via $\sigma_a$).
$\mathbf{c}_b$ are multiplicative random effects (centered via $\sigma_b$).
$\psi_k$ are basis functions (from apply_W_basis_diff) evaluated at the difference $\theta_{\text{ref}} - \theta_{\text{anchor}}$, and $\mathbf{W}_{\text{raw},k}$ are raw basis coefficients scaled by $\sigma_W$.

The decomposition ensures that the non-linear term vanishes when $\theta_{\text{ref}} = \theta_{\text{anchor}}$, anchoring the model at that point. In the EB workflow, this template is used with Laplace approximation to integrate out all parameters except $\theta_{\text{ref}}$, yielding its marginal MAP estimate and covariance for downstream diagnostics.

inst/stan/_canonical_pieces/amm_canonical_helpers.stan

`inst/stan/_canonical_pieces/amm_canonical_helpers.stan` — Technical Wiki

File-Level Overview

This Stan source file contains only a functions block. There are no data, transformed data, parameters, transformed parameters, model, or generated quantities blocks present in this file. The two user-defined functions defined here are intended to be included (via #include or Stan's modular compilation) into a larger AMM (Anchored Marginal Model) Stan program. Their sole purpose is to support the W modulating component of the AMM reference-anchoring decomposition by providing differentiable basis-function evaluation inside the HMC sampler.

The file header comment identifies the development phase:

Sub-phase 8.3.8 (2026-05-22): Cox-de Boor B-spline basis evaluation for the W modulating component.

Key design rationale stated in the comment:

The augmented knot vector W_knots_full (length $n_{\text{int}} + 2(W_{\text{degree}}+1)$) is constructed on the R side by .gdpar_bspline_knots_full() and passed as data.
The Cox-de Boor recursion is re-evaluated at theta_ref_i and theta_anchor inside every HMC leapfrog step because these are parameters — a pre-computed basis matrix from R would be constant data and would not propagate gradients with respect to the evaluation points.
The returned dimension W_per_k_dim equals the total number of Cox-de Boor B-spline basis functions minus one, mirroring splines::bs(intercept = FALSE) (the first basis function is dropped).
The exact ordering need not bit-match splines::bs because W_raw absorbs any orthogonal rotation under sampling; only the dimension count must agree.

`functions` Block

Function 1: `bspline_basis_eval`

Signature

vector bspline_basis_eval(real x, int W_per_k_dim, int W_degree,
                          int W_n_knots_full, vector W_knots_full)

Arguments:

Argument	Type	Role
`x`	`real`	Evaluation point. At call sites this is `theta_ref_i` or `theta_anchor`, both of which are Stan parameters.
`W_per_k_dim`	`int`	Number of basis functions to return (total basis count minus one; the first is dropped).
`W_degree`	`int`	B-spline polynomial degree $p$.
`W_n_knots_full`	`int`	Length of the augmented knot vector $m$.
`W_knots_full`	`vector`	Augmented knot vector $\mathbf{t} = (t_1, \ldots, t_m)$, passed as data from R.

Return type: vector[W_per_k_dim] — the B-spline basis evaluated at $x$, with the first basis function omitted.

Internal Dimension Computation

int n_basis_total = W_n_knots_full - W_degree - 1;
int n_iter_max = n_basis_total + W_degree;

n_basis_total is the total number of degree-$p$ B-spline basis functions for a knot vector of length $m$ and degree $p$:

$$n_{\text{basis_total}} = m - p - 1$$

For an augmented knot vector with $n_{\text{int}}$ interior knots and $p+1$ repeated boundary knots at each end, $m = n_{\text{int}} + 2(p+1)$, giving:

$$n_{\text{basis_total}} = n_{\text{int}} + 2(p+1) - p - 1 = n_{\text{int}} + p + 1$$

n_iter_max is the number of degree-0 basis functions that must be initialized to support the full recursion up to degree $p$:

$$n_{\text{iter_max}} = n_{\text{basis_total}} + p = m - 1$$

This equals $m - 1$ because there are $m - 1$ consecutive knot intervals $(t_i, t_{i+1})$ for $i = 1, \ldots, m-1$. Each degree-$p$ basis function $B_{j,p}$ depends (transitively) on degree-0 basis functions $B_{j,0}, \ldots, B_{j+p,0}$, so the full set of degree-$p$ functions requires degree-0 indices $1$ through $n_{\text{basis\_total}} + p$.

Working Array Declaration

vector[n_iter_max] B;

A working vector of length $n_{\text{iter\_max}} = m - 1$ that holds the current recursion level's basis values. It is overwritten in place at each degree level.

Degree-0 Initialization

for (i in 1:n_iter_max) {
  if (W_knots_full[i] == W_knots_full[i + 1]) {
    B[i] = 0.0;
  } else if (i == n_iter_max && x >= W_knots_full[i + 1]) {
    B[i] = 1.0;
  } else {
    B[i] = (x >= W_knots_full[i] && x < W_knots_full[i + 1]) ? 1.0 : 0.0;
  }
}

This implements the degree-0 Cox-de Boor basis with three cases:

Case 1 — Repeated knots (W_knots_full[i] == W_knots_full[i + 1]):

When $t_i = t_{i+1}$, the interval is degenerate (zero width), so:

$$B_{i,0}(x) = 0$$

This is the standard convention for repeated knots in the Cox-de Boor recursion.

Case 2 — Right-endpoint inclusion (i == n_iter_max && x >= W_knots_full[i + 1]):

For the last degree-0 basis function (index $i = n_{\text{iter\_max}} = m - 1$), if $x \ge t_m$ (the final knot), set:

$$B_{m-1,0}(x) = 1$$

This is the standard right-continuity correction. The half-open interval convention $[t_i, t_{i+1})$ would otherwise assign $B_{i,0} = 0$ at $x = t_m$, causing the entire basis to vanish at the right boundary. By closing the last interval, the partition-of-unity property is preserved at $x = t_m$.

Case 3 — Standard indicator (otherwise):

$$B_{i,0}(x) = \mathbf{1}{t_i \le x < t_{i+1}}$$

This is the canonical degree-0 B-spline: a piecewise-constant indicator on the half-open interval $[t_i, t_{i+1})$.

Cox-de Boor Recursion

for (d_ord in 1:W_degree) {
  int n_iter = n_basis_total + W_degree - d_ord;
  vector[n_iter] B_new;
  for (i in 1:n_iter) {
    real denom1 = W_knots_full[i + d_ord] - W_knots_full[i];
    real denom2 = W_knots_full[i + d_ord + 1] - W_knots_full[i + 1];
    real term1 = (denom1 > 0.0)
      ? (x - W_knots_full[i]) / denom1 * B[i] : 0.0;
    real term2 = (denom2 > 0.0)
      ? (W_knots_full[i + d_ord + 1] - x) / denom2 * B[i + 1] : 0.0;
    B_new[i] = term1 + term2;
  }
  for (i in 1:n_iter) B[i] = B_new[i];
}

Outer loop (d_ord from $1$ to $W_{\text{degree}}$): iterates over polynomial degrees $d = 1, 2, \ldots, p$.

Dimension at degree $d$:

$$n_{\text{iter}}(d) = n_{\text{basis_total}} + p - d$$

At $d = 0:\ n_{\text{iter\_max}} = n_{\text{basis\_total}} + p$. At $d = p:\ n_{\text{iter}}(p) = n_{\text{basis\_total}}$.

Each recursion level reduces the count by 1, consistent with the fact that $B_{i,d}$ depends on $B_{i,d-1}$ and $B_{i+1,d-1}$, so the support shrinks by one index at each level.

Inner loop — the Cox-de Boor recurrence formula:

$$B_{i,d}(x) = \frac{x - t_i}{t_{i+d} - t_i}, B_{i,d-1}(x) ;+; \frac{t_{i+d+1} - x}{t_{i+d+1} - t_{i+1}}, B_{i+1,d-1}(x)$$

The code maps directly:

denom1 $= t_{i+d} - t_i$ (left denominator)
denom2 $= t_{i+d+1} - t_{i+1}$ (right denominator)
term1 $= \frac{x - t_i}{t_{i+d} - t_i} \cdot B_{i,d-1}(x)$ when $\text{denom1} > 0$, else $0$
term2 $= \frac{t_{i+d+1} - x}{t_{i+d+1} - t_{i+1}} \cdot B_{i+1,d-1}(x)$ when $\text{denom2} > 0$, else $0$
B_new[i] $= \text{term1} + \text{term2}$

The guards denom1 > 0.0 and denom2 > 0.0 implement the convention $\frac{0}{0} := 0$ for repeated knots, which is essential for numerical stability and correctness at boundary knots where denominators vanish.

Copy-back loop:

for (i in 1:n_iter) B[i] = B_new[i];

Overwrites the first $n_{\text{iter}}$ entries of B with the new degree-$d$ values. Entries beyond $n_{\text{iter}}$ are stale but never read again (the next iteration's n_iter is strictly smaller).

After the outer loop completes ($d = p$), B[1] through B[n_basis_total] hold the degree-$p$ B-spline basis functions evaluated at $x$.

Output Extraction (First Basis Dropped)

vector[W_per_k_dim] out;
for (j in 1:W_per_k_dim) out[j] = B[j + 1];
return out;

Returns entries B[2] through B[W_per_k_dim + 1], i.e.:

$$\text{out} = \bigl(B_{2,p}(x),; B_{3,p}(x),; \ldots,; B_{W_{\text{per_k_dim}}+1,,p}(x)\bigr)^{!\top}$$

The first basis function $B_{1,p}(x)$ is dropped. This mirrors the R convention splines::bs(intercept = FALSE), which returns $n_{\text{basis\_total}} - 1$ columns. The relationship is:

$$W_{\text{per_k_dim}} = n_{\text{basis_total}} - 1 = n_{\text{int}} + p$$

As noted in the header comment, the exact column ordering need not match splines::bs because the coefficient vector W_raw absorbs any orthogonal rotation of the basis under posterior sampling — only the dimension (and hence the column space dimension) must agree.

Function 2: `apply_W_basis_diff`

Signature

vector apply_W_basis_diff(int basis_type_id, real theta, real anchor,
                          int W_per_k_dim, int W_degree,
                          int W_n_knots_full, vector W_knots_full)

Arguments:

Argument	Type	Role
`basis_type_id`	`int`	Dispatcher: `0` = W off, `1` = polynomial (monomial) basis, `2` = B-spline basis.
`theta`	`real`	Evaluation point (typically `theta_ref_i`, a parameter).
`anchor`	`real`	Anchor/reference point (typically `theta_anchor`, a parameter).
`W_per_k_dim`	`int`	Output dimension (number of basis functions per coordinate).
`W_degree`	`int`	B-spline degree (used only when `basis_type_id == 2`).
`W_n_knots_full`	`int`	Length of augmented knot vector (used only when `basis_type_id == 2`).
`W_knots_full`	`vector`	Augmented knot vector (used only when `basis_type_id == 2`).

Return type: vector[W_per_k_dim] — the anchored basis difference $\mathbf{b}(\theta) - \mathbf{b}(\text{anchor})$.

Initialization

vector[W_per_k_dim] diff = rep_vector(0.0, W_per_k_dim);

Initializes the output to the zero vector. This is the default return value when basis_type_id == 0 (W component disabled), serving as a defensive fallback.

Dispatch: `basis_type_id == 1` (Polynomial / Monomial Basis)

if (basis_type_id == 1) {
  for (jj in 1:W_per_k_dim) {
    diff[jj] = pow(theta, jj) - pow(anchor, jj);
  }
}

Uses the monomial basis $\mathbf{b}(\theta) = (\theta^1, \theta^2, \ldots, \theta^{W_{\text{per\_k\_dim}}})^{\!\top}$ (no intercept term; powers start at 1). The anchored difference is:

$$\text{diff}_j = \theta^{,j} - \text{anchor}^{,j}, \qquad j = 1, \ldots, W_{\text{per_k_dim}}$$

$$\mathbf{b}(\theta) - \mathbf{b}(\text{anchor}) = \begin{pmatrix} \theta - \text{anchor} \ \theta^2 - \text{anchor}^2 \ \vdots \ \theta^{W_{\text{per_k_dim}}} - \text{anchor}^{W_{\text{per_k_dim}}} \end{pmatrix}$$

Note that this basis is not orthogonal and the monomials are correlated, but the reference-anchoring difference improves conditioning by centering at the anchor. The absence of a constant term (power 0) is consistent with the intercept-dropping convention: a constant basis function would produce $\theta^0 - \text{anchor}^0 = 1 - 1 = 0$, contributing nothing to the difference, so it is correctly omitted.

Dispatch: `basis_type_id == 2` (B-Spline Basis)

} else if (basis_type_id == 2) {
  diff = bspline_basis_eval(theta, W_per_k_dim, W_degree,
                            W_n_knots_full, W_knots_full)
       - bspline_basis_eval(anchor, W_per_k_dim, W_degree,
                            W_n_knots_full, W_knots_full);
}

Evaluates the B-spline basis (via bspline_basis_eval) at both theta and anchor, then computes the element-wise difference:

$$\text{diff} = \mathbf{b}_{\text{bspline}}(\theta) - \mathbf{b}_{\text{bspline}}(\text{anchor})$$

where each component is:

$$\text{diff}_j = B_{j+1,,p}(\theta) - B_{j+1,,p}(\text{anchor}), \qquad j = 1, \ldots, W_{\text{per_k_dim}}$$

(The $+1$ index shift reflects the first-basis-dropping convention in bspline_basis_eval.)

Both evaluations call bspline_basis_eval with identical knot/basis configuration, ensuring the difference is computed on the same basis space. Because theta and anchor are Stan parameters, the two calls to bspline_basis_eval execute inside the HMC sampler's autodiff graph, making the difference fully differentiable with respect to both arguments.

Return

return diff;

Returns the anchored basis difference. When basis_type_id == 0, this is the zero vector (W component inactive). When basis_type_id is 1 or 2, it is the non-trivial basis difference used by the AMM decomposition.

Mathematical Summary: AMM Reference-Anchoring Decomposition

The Anchored Basis Difference

The central quantity computed by apply_W_basis_diff is:

$$\Delta\mathbf{b}(\theta,,\text{anchor}) ;=; \mathbf{b}(\theta) - \mathbf{b}(\text{anchor})$$

where $\mathbf{b}(\cdot) \in \mathbb{R}^{W_{\text{per\_k\_dim}}}$ is either the monomial basis (basis_type_id == 1) or the B-spline basis with the first function dropped (basis_type_id == 2).

Role in the W Modulating Component

In the AMM model, the W modulating component enters the likelihood through a linear combination:

$$W(\theta) = \Delta\mathbf{b}(\theta,,\text{anchor})^{!\top}, \mathbf{w}_{\text{raw}}$$

where $\mathbf{w}_{\text{raw}} \in \mathbb{R}^{W_{\text{per\_k\_dim}}}$ is the coefficient vector (declared as a Stan parameter elsewhere in the model). The reference-anchoring decomposition ensures:

$$W(\text{anchor}) = \bigl[\mathbf{b}(\text{anchor}) - \mathbf{b}(\text{anchor})\bigr]^{!\top}, \mathbf{w}_{\text{raw}} = 0$$

so the modulation is pinned to zero at the anchor point. This provides identifiability: the absolute level of $W$ is absorbed elsewhere in the model (e.g., by an intercept or baseline hazard), and $\mathbf{w}_{\text{raw}}$ captures only the shape of the modulation relative to the anchor.

Differentiability Requirement

The comment explicitly states why the basis is evaluated inside Stan rather than pre-computed in R:

the recursion re-evaluates the basis at theta_ref_i and theta_anchor in every Hamiltonian Monte Carlo step (these are parameters, so a R-side pre-computed matrix would not be differentiable in the sampler).

If the basis matrix $\mathbf{B}$ were pre-computed in R as a fixed data matrix, then $W(\theta) = \mathbf{B}\, \mathbf{w}_{\text{raw}}$ would be linear in $\mathbf{w}_{\text{raw}}$ but would have zero gradient with respect to $\theta$ and anchor. By re-evaluating the Cox-de Boor recursion inside Stan, the full gradient $\partial W / \partial \theta$ and $\partial W / \partial \text{anchor}$ are available to the HMC sampler via Stan's automatic differentiation.

Basis Rotation Invariance

The comment notes:

The exact ordering does not need to bit-match splines::bs because W_raw absorbs any orthogonal rotation under sampling, but the dimension count must agree.

Formally, if $\tilde{\mathbf{b}}(\cdot) = \mathbf{Q}\, \mathbf{b}(\cdot)$ for some orthogonal matrix $\mathbf{Q}$, then:

$$\tilde{\mathbf{b}}(\theta)^{!\top}, \tilde{\mathbf{w}} = \mathbf{b}(\theta)^{!\top}, \mathbf{Q}^{!\top}, \tilde{\mathbf{w}} = \mathbf{b}(\theta)^{!\top}, \mathbf{w}$$

where $\mathbf{w} = \mathbf{Q}^{\!\top}\, \tilde{\mathbf{w}}$. The posterior over $\mathbf{w}_{\text{raw}}$ adapts to whatever basis ordering is used, so only the dimension (and hence the column space) must match between R and Stan. The R-side dimension is computed by materialize_W_basis() as length(knots) + degree per coordinate, which for the $K = 1 + p = 1$ template equals dim_W and matches W_per_k_dim.

Dimensional Consistency Summary

Quantity	Formula	Value
Augmented knot vector length	$m = n_{\text{int}} + 2(p+1)$	`W_n_knots_full`
Total B-spline basis functions	$n_{\text{basis\_total}} = m - p - 1 = n_{\text{int}} + p + 1$	`n_basis_total`
Degree-0 working array size	$m - 1 = n_{\text{int}} + 2p + 1$	`n_iter_max`
Returned basis dimension (first dropped)	$n_{\text{basis\_total}} - 1 = n_{\text{int}} + p$	`W_per_k_dim`
R-side `materialize_W_basis()` dim	`length(knots) + degree` $= n_{\text{int}} + p$	`dim_W` (for $K=1+p=1$)

Absent Blocks

The following Stan blocks are not present in this file:

data: No data declarations. The knot vector and dimension arguments are passed as function parameters, not as block-level data variables.
transformed data: No precomputed quantities.
parameters: No parameter declarations. theta and anchor are function arguments that receive parameter values at call sites in the parent model.
transformed parameters: No transformed quantities.
model: No likelihood or prior contributions. These functions are pure utilities invoked elsewhere.
generated quantities: No posterior predictions.

This file is a pure helper module: it provides two functions block entries that are #included into the canonical AMM Stan program, where the actual parameter declarations, likelihood, priors, and generated quantities reside.

inst/stan/_canonical_pieces/amm_canonical_p1.stan

Stan Template: `inst/stan/_canonical_pieces/amm_canonical_p1.stan`

Overview

This Stan template implements the Hierarchical Bayesian AMM (General Dynamic Parameter models via reference anchoring) canonical form, Path 1. The core model specification is:

$$ \theta_i = \theta_{\text{ref}[g_i]} + a(x_i) + b(x_i) \cdot \theta_{\text{ref}[g_i]} + \left(W(\theta_{\text{ref}[g_i]}) - W(\theta_{\text{anchor}})\right) x_i $$

Where:

$\theta_i$ is the effective parameter for individual $i$ (on the link-function scale).
$\theta_{\text{ref}[g_i]}$ is a group-level reference parameter.
$a(x_i)$ is an additive component linear in a basis $Z_a$.
$b(x_i)$ is a multiplicative component linear in a basis $Z_b$, scaling $\theta_{\text{ref}}$.
$W(\cdot)$ is a basis expansion (polynomial or B-spline) applied to $\theta_{\text{ref}}$, anchored at a fixed value $\theta_{\text{anchor}}$.

The model ensures identifiability through centering of bases $Z_a$ and $Z_b$, anchoring of $W$, and uses a linear reparametrization for the multiplicative component to avoid posterior bimodality.

`functions` Block

functions {
  // {{CANONICAL_HELPERS}}
}

The functions block is populated at code generation time by R/stan_codegen.R. It contains:

apply_W_basis_diff(): A function to compute $W(\theta) - W(\theta_{\text{anchor}})$ for a given basis type (polynomial or B-spline).
Any additional helper functions required for the canonical form.

The function dispatches based on W_type_id:

Polynomial basis ($\text{type}=1$): Computes $\theta^j - \theta_{\text{anchor}}^j$ for degrees $j = 1, \ldots, \text{W\_degree}$.
B-spline basis ($\text{type}=2$): Uses Cox-de Boor recursion to evaluate basis functions at $\theta$ and $\theta_{\text{anchor}}$, then returns the difference.

`data` Block

Core Data

int<lower=1> n: Number of observations.
int<lower=1, upper=4> family_id: Likelihood family identifier:
- 1: Gaussian (identity link)
- 2: Poisson (log link)
- 3: Negative binomial 2 (log link, Stan parametrization)
- 4: Bernoulli (logit link)

Model Component Flags

int<lower=0, upper=1> use_a: Flag to include additive component $a$.
int<lower=0, upper=1> use_b: Flag to include multiplicative component $b$.
int<lower=0, upper=1> use_W: Flag to include $W$ basis component.

Basis Dimensions

int<lower=0> J_a: Number of columns in additive design matrix $Z_a$.
int<lower=0> J_b: Number of columns in multiplicative design matrix $Z_b$.
int<lower=0> dim_W: Number of basis functions in $W$.
int<lower=0> d: Dimension of covariate vector $x_i$ for the $W$ interaction term.

Design Matrices

matrix[n, J_a] Z_a: Additive design matrix. Column-centered in R-side pre-processing to enforce $E[a(X)] = 0$ (condition (C2)).
matrix[n, J_b] Z_b: Multiplicative design matrix. Column-centered in R-side pre-processing to enforce $E[b(X)] = 0$ (condition (C3)).
matrix[n, d] X: Covariate matrix for the $W$ interaction term.

Response

vector[n] y_real: Real-valued response (used for Gaussian family).
array[n] int y_int: Integer-valued response (used for Poisson, negative binomial, Bernoulli families).

Anchoring

real theta_anchor: Fixed anchor value for the $W$ basis reparametrization. Used to compute $W(\theta) - W(\theta_{\text{anchor}})$, satisfying condition (C4).

Dispersion Flags

int<lower=0, upper=1> use_dispersion_y: Flag for Gaussian observation-level dispersion $\sigma_y$.
int<lower=0, upper=1> use_dispersion_phi: Flag for negative binomial overdispersion $\phi$.

W Basis Configuration

int<lower=0, upper=2> W_type_id: Basis type for $W$:
- 0: $W$ off
- 1: Polynomial (degrees 1 to W_degree, anchored at theta_anchor)
- 2: B-spline (Cox-de Boor recursion)
int<lower=0> W_n_knots_full: Number of full knot sequence for B-spline (used only if W_type_id == 2).
vector[W_n_knots_full] W_knots_full: Full knot vector for B-spline.
int<lower=0> W_degree: Degree for polynomial basis or B-spline degree.

Grouping (Block 6.5)

int<lower=0, upper=1> use_groups: Flag to enable hierarchical per-group reference parameters. When 0, J_groups is forced to 1 and all group_id[i] are 1, collapsing to a single scalar $\theta_{\text{ref}}$.
int<lower=1> J_groups: Number of groups.
array[n] int<lower=1, upper=J_groups> group_id: Group membership for each observation.

`transformed data` Block

transformed data {
  int J_a_free = (use_a == 1 && J_a > 0) ? J_a : 0;
  int J_b_free = (use_b == 1 && J_b > 0) ? J_b : 0;
  int dim_W_eff = (use_W == 1) ? dim_W : 0;
  int d_eff = (use_W == 1) ? d : 0;
}

J_a_free: Effective number of free coefficients for $a$. Set to 0 if use_a is 0 or J_a is 0.
J_b_free: Effective number of free coefficients for $b$ (actually $c_b$). Set to 0 if use_b is 0 or J_b is 0.
dim_W_eff: Effective dimension of $W$ basis. Set to 0 if use_W is 0.
d_eff: Effective dimension of covariates for $W$ interaction. Set to 0 if use_W is 0.

These are used for conditional declaration of parameters and loop bounds.

`parameters` Block

Reference Parameters and Hyperparameters

vector[J_groups] theta_ref: Group-level reference parameters. When use_groups == 0, this is a vector of length 1 containing the single global $\theta_{\text{ref}}$.
array[use_groups] real mu_theta_ref: Hyperparameter for the mean of the hierarchical prior on $\theta_{\text{ref}}$. Allocated only when use_groups == 1.
array[use_groups] real<lower=0> sigma_theta_ref: Hyperparameter for the standard deviation of the hierarchical prior on $\theta_{\text{ref}}$. Allocated only when use_groups == 1.

Additive Component Parameters

array[use_a == 1 && J_a > 0 ? 1 : 0] real<lower=0> sigma_a: Standard deviation for additive coefficients (used in NCP or as prior scale in CP). Allocated only if component is active.
vector[J_a_free] a_raw: Raw additive coefficients. In NCP (default), these are standard normal; in CP, these are drawn from $\mathcal{N}(0, \sigma_a)$.

Multiplicative Component Parameters (Linear Reparametrization)

array[use_b == 1 && J_b > 0 ? 1 : 0] real<lower=0> sigma_b: Standard deviation for the reparametrized multiplicative coefficients $c_b$. Allocated only if component is active.
vector[J_b_free] c_b_raw: Raw coefficients for $c_b = \theta_{\text{ref}} \cdot b$. Always drawn from standard normal, regardless of NCP/CP choice for other components.

Note: The multiplicative component uses a linear reparametrization to avoid bimodality. Instead of sampling $b$ directly, we sample $c_b = \theta_{\text{ref}} \cdot b$ and then recover $b = c_b / \theta_{\text{ref}}$ as a derived quantity (only in single-group case). This eliminates a spurious mode where $\theta_{\text{ref}}$ and $b$ co-vary with opposite signs.

W Component Parameters

array[use_W == 1 && dim_W > 0 ? 1 : 0] real<lower=0> sigma_W: Standard deviation for $W$ basis coefficients. Allocated only if component is active.
matrix[dim_W_eff, d_eff] W_raw: Raw coefficients for $W$ basis functions. Dimensions: number of basis functions × covariate dimension. In NCP, these are standard normal; in CP, scaled by $\sigma_W$.

Dispersion Parameters

array[use_dispersion_y] real<lower=0> sigma_y: Observation-level standard deviation for Gaussian family.
array[use_dispersion_phi] real<lower=0> phi: Overdispersion parameter for negative binomial family.

`transformed parameters` Block

Coefficient Vectors

vector[J_a] a_coef: Additive coefficients, initialized to zero. Populated only if use_a == 1 and J_a > 0.
vector[J_b] c_b: Reparametrized multiplicative coefficients, initialized to zero.
vector[J_b] b_coef: Original multiplicative coefficients, initialized to zero. Only computed in single-group case.

Linear Predictor

vector[n] eta: Linear predictor $\eta_i$ for each observation.

Additive Component Calculation

if (use_a == 1 && J_a > 0) {
  for (j in 1:J_a_free) {
    a_coef[j] = a_raw[j]{{A_SCALE}};
  }
}

{{A_SCALE}} is replaced at code generation time:
- In NCP (default): * sigma_a[1]
- In CP: * 1 (i.e., no scaling)

Thus, in NCP: $a_j = a_{\text{raw},j} \cdot \sigma_a$, where $a_{\text{raw},j} \sim \mathcal{N}(0,1)$. In CP: $a_j = a_{\text{raw},j}$, where $a_{\text{raw},j} \sim \mathcal{N}(0, \sigma_a)$.

Multiplicative Component Calculation

if (use_b == 1 && J_b > 0) {
  for (j in 1:J_b_free) {
    c_b[j] = c_b_raw[j] * sigma_b[1];
  }
  // Only compute b_coef in single-group case
  if (use_groups == 0 && theta_ref[1] != 0) {
    for (j in 1:J_b) {
      b_coef[j] = c_b[j] / theta_ref[1];
    }
  }
}

$c_b = c_{b,\text{raw}} \cdot \sigma_b$, where $c_{b,\text{raw}} \sim \mathcal{N}(0,1)$.
$b_{\text{coef}} = c_b / \theta_{\text{ref}}$ only when there is a single group and $\theta_{\text{ref}} \neq 0$. Otherwise, $b_{\text{coef}}$ remains zero (must be derived post-hoc per group if needed).

Observation-Level Linear Predictor Loop

For each observation $i$:

for (i in 1:n) {
  real theta_ref_i = theta_ref[group_id[i]];
  real eta_i = theta_ref_i;

Assign group-specific reference parameter $\theta_{\text{ref},i}$.

  if (use_a == 1 && J_a > 0) {
    eta_i += Z_a[i] * a_coef;
  }

Add additive component: $a(x_i) = Z_a[i] \cdot a_{\text{coef}}$.

  if (use_b == 1 && J_b > 0) {
    eta_i += Z_b[i] * c_b;
  }

Add multiplicative component in reparametrized form: $b(x_i) \cdot \theta_{\text{ref}} = Z_b[i] \cdot c_b$.

  if (use_W == 1 && dim_W > 0 && d > 0) {
    vector[d] W_diff_x = rep_vector(0, d);
    vector[dim_W] basis_diff = apply_W_basis_diff(
      W_type_id, theta_ref_i, theta_anchor, dim_W, W_degree,
      W_n_knots_full, W_knots_full
    );
    for (k in 1:dim_W) {
      for (jj in 1:d) {
        W_diff_x[jj] += basis_diff[k] * W_raw[k, jj]{{W_SCALE}};
      }
    }
    eta_i += dot_product(W_diff_x, to_vector(X[i]));
  }

Add $W$ component:
- Compute basis_diff = $W(\theta_{\text{ref},i}) - W(\theta_{\text{anchor}})$, a vector of length dim_W.
- For each covariate dimension $jj$, compute the linear combination of basis differences weighted by $W_{\text{raw}}$ (with {{W_SCALE}} scaling). {{W_SCALE}} is replaced by code generation:
  - In NCP: * sigma_W[1]
  - In CP: * 1
- Take dot product of W_diff_x with $x_i$ to get scalar contribution: $\sum_{jj=1}^d \left( \sum_{k=1}^{\text{dim\_W}} \text{basis\_diff}_k \cdot W_{\text{raw},k,jj} \right) x_{i,jj}$.

  eta[i] = eta_i;
}

Store final $\eta_i$.

`model` Block

Priors

if (use_groups == 1) {
  mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
  sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
  theta_ref ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
} else {
  theta_ref[1] ~ {{PRIOR_THETA_REF}};
}

When use_groups == 1:
- Hyperpriors on $\mu_{\theta_{\text{ref}}}$ and $\sigma_{\theta_{\text{ref}}}$ (placeholders replaced by code generation).
- Hierarchical prior: $\theta_{\text{ref},g} \sim \mathcal{N}(\mu_{\theta_{\text{ref}}}, \sigma_{\theta_{\text{ref}}})$ for each group $g$.
When use_groups == 0:
- Single prior directly on $\theta_{\text{ref},1}$.

if (use_a == 1 && J_a > 0) {
  sigma_a[1] ~ {{PRIOR_SIGMA_A}};
  a_raw ~ {{A_PRIOR}};
}

Prior on $\sigma_a$ (positive, typically half-normal or exponential).
Prior on $a_{\text{raw}}$:
- In NCP: {{A_PRIOR}} replaced by normal(0, 1).
- In CP: {{A_PRIOR}} replaced by normal(0, sigma_a[1]).

if (use_b == 1 && J_b > 0) {
  sigma_b[1] ~ {{PRIOR_SIGMA_B}};
  c_b_raw ~ normal(0, 1);
}

Prior on $\sigma_b$.
$c_{b,\text{raw}} \sim \mathcal{N}(0,1)$ always, regardless of NCP/CP choice for other components.

if (use_W == 1 && dim_W > 0) {
  sigma_W[1] ~ {{PRIOR_SIGMA_W}};
  to_vector(W_raw) ~ {{W_PRIOR}};
}

Prior on $\sigma_W$.
Prior on $W_{\text{raw}}$:
- In NCP: {{W_PRIOR}} replaced by normal(0, 1).
- In CP: {{W_PRIOR}} replaced by normal(0, sigma_W[1]).

if (use_dispersion_y == 1) {
  sigma_y[1] ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
  phi[1] ~ {{PRIOR_PHI}};
}

Priors on dispersion parameters (if used).

Likelihood

if (family_id == 1) {
  y_real ~ normal(eta, sigma_y[1]);
} else if (family_id == 2) {
  y_int ~ poisson_log(eta);
} else if (family_id == 3) {
  y_int ~ neg_binomial_2_log(eta, phi[1]);
} else if (family_id == 4) {
  y_int ~ bernoulli_logit(eta);
}

Gaussian: $y_i \sim \mathcal{N}(\eta_i, \sigma_y)$.
Poisson: $y_i \sim \text{Poisson}(\exp(\eta_i))$.
Negative binomial: $y_i \sim \text{NB2}(\exp(\eta_i), \phi)$ with mean $\mu = \exp(\eta_i)$ and variance $\mu + \mu^2/\phi$.
Bernoulli: $y_i \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_i))$.

`generated quantities` Block

Declarations

vector[n] log_lik;
vector[n] theta_i = eta;
vector[n] y_pred;

log_lik: Pointwise log-likelihood for each observation (used for LOO/WAIC).
theta_i: Set to eta (the linear predictor). This represents $\theta_i$ on the link-function scale.
y_pred: Posterior predictive samples.

Loop Over Observations

For each observation $i$, compute:

Log-likelihood (for model comparison):
- Gaussian: $\log p(y_i | \eta_i, \sigma_y) = \text{normal\_lpdf}(y_i | \eta_i, \sigma_y)$
- Poisson: $\log p(y_i | \eta_i) = \text{poisson\_log\_lpmf}(y_i | \eta_i)$
- Negative binomial: $\log p(y_i | \eta_i, \phi) = \text{neg\_binomial\_2\_log\_lpmf}(y_i | \eta_i, \phi)$
- Bernoulli: $\log p(y_i | \eta_i) = \text{bernoulli\_logit\_lpmf}(y_i | \eta_i)$
Posterior predictive sample (for PPC):
- Gaussian: $y_i^{\text{pred}} \sim \mathcal{N}(\eta_i, \sigma_y)$
- Poisson: $y_i^{\text{pred}} \sim \text{Poisson}(\exp(\eta_i))$
- Negative binomial: $y_i^{\text{pred}} \sim \text{NB2}(\exp(\eta_i), \phi)$
- Bernoulli: $y_i^{\text{pred}} \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_i))$

Mathematical Summary

Model Equation

$$ \eta_i = \underbrace{\theta_{\text{ref}[g_i]}}_{\text{reference}} + \underbrace{Z_a[i] \cdot a_{\text{coef}}}_{\text{additive}} + \underbrace{Z_b[i] \cdot c_b}_{\text{multiplicative (reparam.)}} + \underbrace{\sum_{k=1}^{\text{dim_W}} \sum_{jj=1}^d \left(W_k(\theta_{\text{ref}[g_i]}) - W_k(\theta_{\text{anchor}})\right) W_{\text{raw},k,jj} \cdot X[i,jj]}_{\text{basis interaction}} $$

Where:

$a_{\text{coef},j} = a_{\text{raw},j} \cdot \sigma_a$ (NCP) or $a_{\text{coef},j} = a_{\text{raw},j}$ (CP).
$c_b = c_{b,\text{raw}} \cdot \sigma_b$, with $c_{b,\text{raw}} \sim \mathcal{N}(0,1)$.
$W_k(\theta) - W_k(\theta_{\text{anchor}})$ is the anchored basis difference.
$W_{\text{raw},k,jj}$ are coefficients with scaling by $\sigma_W$ in NCP.

Identifiability Conditions

(C2) Additive centering: $E[a(X)] = 0$ enforced by column-centering $Z_a$ in R.
(C3) Multiplicative centering: $E[b(X)] = 0$ enforced by column-centering $Z_b$ in R.
(C4) Basis anchoring: $W(\theta) - W(\theta_{\text{anchor}})$ ensures $W$ vanishes at anchor.
(C5) Group aliasing: When use_groups == 1, $Z_a$ must not contain columns that alias with $\theta_{\text{ref}[g]}$ (enforced in R-side preflight check).

Reparametrization Rationale

The multiplicative component $b$ is reparametrized as $c_b = \theta_{\text{ref}} \cdot b$ to avoid a bimodal posterior. The original parametrization leads to a non-convex log-posterior in $(\theta_{\text{ref}}, b)$ with a spurious mode where $\theta_{\text{ref}}$ and $b$ shift in opposite directions. Sampling $c_b$ directly yields a strictly log-concave (Gaussian-conditional) posterior.

inst/stan/_canonical_pieces/amm_canonical_pmulti_KxP.stan

`inst/stan/_canonical_pieces/amm_canonical_pmulti_KxP.stan` — Section 1 of 5

This section contains only the header comment and the functions block. The remaining blocks (data, transformed data, parameters, transformed parameters, model, generated quantities) appear in subsequent sections. The walkthrough below is line-faithful to the supplied source.

0. Header comment (preamble)

The header is a documentation block (Stan // comments) and does not compile to anything. It establishes the following contract, which the rest of the template must honor:

0.1 Provenance and canonization

Path label. "gdpar Path 1 — Full-Bayes (FB) multi-parametric multivariate distribution template, Path C". Path C is the cross product $K \geq 2$ individual slots with $p \geq 2$ AMM coordinates.
Sub-bloque provenance. "Sub-bloque 9.3.d (Bloque 9, Sesión B9.5, 2026-05-27)" under "decision I.iv lateral canonized in B9.4", with design reference DESIGN_9_3_D_PATH_C.md sub-decision 3.2 = "J.iv.A: dedicated piece with canonical helpers reuse".
Architectural lineage. Mirrors amm_eb_marginal_KxP.stan from "Sub-fase 8.6.D, 2026-05-25". It composes:
- Path A multivariate ("Theorem 7A* of v07b"): coord-wise factorization across $p$.
- Path B multi-parametric ("Theorem 7C* of v07b"): distributional regression canonical link per slot.
Outcome structure. A single outcome matrix $y[n, p]$ is multivariate (matrix-column outcome shared across the $K$ slots, coord-wise factorized across $p$). The $K$ slots are the canonical distributional parameters of the family; examples given:
- Gaussian $K=2:\ (\mu, \log\sigma)$.
- Negative binomial $K=2:\ (\log\mu, \log\phi)$.
- Student-$t\ K=3:\ (\mu\ \text{identity}, \log\sigma, \log\nu)$.
Each slot has its own anchor in $\mathbb{R}^p$ per group, mirroring Path A.

0.2 Reference-anchoring decomposition (mathematical statement)

For individual $i$, (optional) group $g[i]$, slot $k = 1,\dots,K$, and coordinate $j = 1,\dots,p$, the linear predictor is

$$ \eta_{kp}[k, i, j] ;=; \theta^{\mathrm{ref}}_{kp}[g[i], k][j] ;+; a_{kj}(x_i) ;+; b_{kj}(x_i),\theta^{\mathrm{ref}}_{kp}[g[i], k][j] ;+; \bigl(W_{kj}(\theta^{\mathrm{ref}}_{kp}[g[i], k][j]) - W_{kj}(\theta^{\mathrm{anchor}}_{kp}[k][j])\bigr),x_i. $$

The outcome is then distributed as

$$ y[i, j] ;\sim; \mathcal{D}_{,\texttt{family_id_k_vector}[1]}!\left(\eta_{kp}[1, i, j],; \texttt{apply_inv_link_by_id}(\texttt{inv_link_id_per_slot}[2], \eta_{kp}[2, i, j]),;\dots\right). $$

This is the AMM reference-anchoring decomposition: a per-group, per-slot, per-coordinate reference $\theta^{\mathrm{ref}}_{kp}$ is shifted by a smooth offset $a_{kj}(x_i)$, scaled by a covariate-modulated slope $b_{kj}(x_i)$, and further adjusted by a basis-function difference $W_{kj}(\theta^{\mathrm{ref}}) - W_{kj}(\theta^{\mathrm{anchor}})$ times $x_i$.

0.3 Family coverage

Path B family set (Sesión B9.5, J.iv.A):

`family_id`	Family
1	Gaussian
3	`neg_binomial_2`
5	Beta
6	Gamma
7	`lognormal_loc_scale`
8	Student-$t$
9	Tweedie
10	ZIP
11	ZINB
12	`hurdle_poisson`
13	`hurdle_neg_binomial_2`

The "K.c roster minimal bootstrap" covers families 1, 5, 6, 8 (seeds 91001–91004 reserved in B9.4 DESIGN_9_3_D_PATH_C.md §4). Remaining families use the same dispatch with caveats inherited from Sub-fase 8.6.D:

Path B logit-strict under Path C near the support boundary is fragile.
Tweedie's slot $p$ is structurally bounded in $(1.01, 1.99)$ via THETA_REF_PRIOR_BLOCK by slice.

0.4 Internal sampling parametrization

$a$ (offset). NCP only per slot, with per-slot scale sigma_a_k. Per-slot per-coord CP / mixed regimes are deferred to B9.6+ canonical KxP a-blocks; B9.5 is atomic.
$b$ (slope). LINEAR reparametrization per slot per coord on $c_{b,kp} = \theta^{\mathrm{ref}}_{kp}\,b_{\mathrm{coef},kp}$. NCP sample of c_b_kp_raw and derive b_coef_kp post-hoc in the single-anchor regime.
$W$ (basis). Configurable CP/NCP globally via placeholders W_SCALE and W_PRIOR. Inactive (use_W = 0) in the B9.5 initial roster; enabled via use_W = 1 with apply_W_basis_diff() dispatch on W_type_id mirroring distrib_K and pmulti.
$\theta^{\mathrm{ref}}_{kp}$ (anchor). Per-group, per-slot, per-coord anchor. Hierarchical under use_groups == 1; single-anchor coord-wise i.i.d. otherwise. THETA_REF_PRIOR_BLOCK placeholder emits the family-specific block (Tweedie $K=3$ with bounded $p$ slot via slice).

0.5 Codegen and caveats

Generated by R/stan_codegen.R via placeholder substitution; the substituted output must not be hand-edited.
The dedicated helpers piece inst/stan/_canonical_pieces/amm_canonical_helpers.stan is injected at the CANONICAL_HELPERS placeholder inside the functions { } block (decision G.iv lateral of B9.4).
Caveat heredado (DESIGN_9_3_D_PATH_C.md §5): Path B logit-strict under Path C may cause numerical instability for families with inv_logit link (beta, bernoulli) when the covariate domain produces $\eta$ near $\pm\infty$. Mitigation: the B9.5 K.c roster avoids pure logit-strict (gaussian + beta + gamma + student_t); expansion to families {10 ZIP, 12 hurdle-Poisson} (logit link on $\pi$) is deferred to B9.6 with granular skip_if if instability emerges.

1. `functions` block

1.1 Canonical helpers injection

functions {
  // Sub-bloque 9.3.d colateral via G.iv lateral (Sesion B9.4): the
  // bspline_basis_eval + apply_W_basis_diff helpers are injected here
  // from the dedicated piece amm_canonical_helpers.stan.
  // {{CANONICAL_HELPERS}}

The functions { opens the block. The comment cites "Sub-bloque 9.3.d colateral via G.iv lateral (Sesión B9.4)" and identifies the injected helpers as bspline_basis_eval and apply_W_basis_diff, sourced from amm_canonical_helpers.stan. The literal token {{CANONICAL_HELPERS}} is a placeholder consumed by R/stan_codegen.R; at substitution time the contents of amm_canonical_helpers.stan are textually inserted here. In the raw template as supplied, no helper bodies are visible — only the placeholder. Consequently, the functions bspline_basis_eval and apply_W_basis_diff are referenced by name in the contract (§0.2, §0.4) but their definitions are external to this section.

1.2 `tweedie_log_W_series`

  // Sub-phase 8.3.5b: Tweedie compound Poisson-gamma log-density for
  // the regime 1 < p < 2 (hybrid Dunn-Smyth series + saddlepoint).
  // Mirror of the K = 1 + p = 1 template inst/stan/amm_distrib_K.stan
  // functions block (sub-phase 8.3.5b, 2026-05-21). The functions are
  // identical because the Tweedie likelihood is the same regardless
  // of the cross-product K x p (the function takes scalar arguments
  // y, mu, phi, p and returns a scalar log-density).
  real tweedie_log_W_series(real y, real phi, real p) {
    real alpha_pos = (2.0 - p) / (p - 1.0);
    real log_z = alpha_pos * log(y)
                 - alpha_pos * log(p - 1.0)
                 - (1.0 + alpha_pos) * log(phi)
                 - log(2.0 - p);
    real log_W = negative_infinity();
    real max_log_W = negative_infinity();
    int max_iter = 1000;
    int passed_peak = 0;
    for (j_loop in 1:max_iter) {
      real j = j_loop;
      real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos);
      if (term > max_log_W) {
        max_log_W = term;
      } else {
        passed_peak = 1;
      }
      log_W = log_sum_exp(log_W, term);
      if (passed_peak == 1 && term < max_log_W - 37.0) {
        break;
      }
    }
    return log_W;
  }

Comment. Cites "Sub-phase 8.3.5b" and identifies this as the Tweedie compound Poisson–gamma log-density for the regime $1 < p < 2$, implemented as a hybrid Dunn–Smyth series + saddlepoint. It is a mirror of the $K=1, p=1$ template inst/stan/amm_distrib_K.stan functions block (sub-phase 8.3.5b, 2026-05-21). The functions are identical because the Tweedie likelihood is invariant under the $K \times p$ cross-product — the function takes scalar arguments $(y, \mu, \phi, p)$ and returns a scalar log-density.

Signature. real tweedie_log_W_series(real y, real phi, real p) — computes the log of the series-normalizing constant $W$ appearing in the Dunn–Smyth series representation of the Tweedie density for $1 < p < 2$.

Declarations and mathematics.

real alpha_pos = (2.0 - p) / (p - 1.0); — defines the Tweedie compound-Poisson shape parameter

$$ \alpha_{\mathrm{pos}} ;=; \frac{2 - p}{p - 1}. $$

For $1 < p < 2$ this is positive. (This is the standard Tweedie index relation $p = (\alpha + 2)/(\alpha + 1)$ solved for $\alpha$.)

real log_z = alpha_pos * log(y) - alpha_pos * log(p - 1.0) - (1.0 + alpha_pos) * log(phi) - log(2.0 - p); — defines the log of the series argument

$$ \log z ;=; \alpha_{\mathrm{pos}}\log y ;-; \alpha_{\mathrm{pos}}\log(p-1) ;-; (1 + \alpha_{\mathrm{pos}})\log\phi ;-; \log(2 - p). $$

Equivalently,

$$ z ;=; \frac{y^{\alpha_{\mathrm{pos}}}}{(p-1)^{\alpha_{\mathrm{pos}}},\phi^{1+\alpha_{\mathrm{pos}}},(2-p)}. $$

real log_W = negative_infinity(); — initializes the running log-sum-exp accumulator at $-\infty$ (the identity element for log_sum_exp).
real max_log_W = negative_infinity(); — tracks the maximum term encountered, for the truncation rule.
int max_iter = 1000; — hard cap on the number of series terms.
int passed_peak = 0; — flag indicating whether the series has passed its modal term.

Loop. for (j_loop in 1:max_iter) iterates $j = 1, 2, \dots, 1000$.

real j = j_loop; — promotes the integer loop counter to a real for arithmetic.
real term = j * log_z - lgamma(j + 1.0) - lgamma(j * alpha_pos); — the $j$-th log-series term

$$ \log w_j ;=; j\log z ;-; \log\Gamma(j+1) ;-; \log\Gamma(j,\alpha_{\mathrm{pos}}). $$

This is the log of the $j$-th summand in the Dunn–Smyth series $W = \sum_{j=1}^{\infty} w_j$, where $w_j = z^j / [\Gamma(j+1)\,\Gamma(j\alpha_{\mathrm{pos}})]$.

if (term > max_log_W) { max_log_W = term; } else { passed_peak = 1; } — updates the running maximum; once a term fails to exceed the running max, the peak has been passed and passed_peak is latched to 1.
log_W = log_sum_exp(log_W, term); — accumulates $\log W \leftarrow \mathrm{logsumexp}(\log W, \log w_j)$, i.e., numerically stable $\log(\sum_j w_j)$.
if (passed_peak == 1 && term < max_log_W - 37.0) { break; } — early termination: once past the peak and the current term is more than 37 nats below the peak (a threshold corresponding roughly to single-precision underflow in $\exp$), the loop breaks.

Return. return log_W; — returns $\log W = \log\sum_{j=1}^{\infty} w_j$ (truncated per the rule above).

1.3 `tweedie_log_f_series`

  real tweedie_log_f_series(real y, real mu, real phi, real p) {
    real theta = pow(mu, 1.0 - p) / (1.0 - p);
    real kappa = pow(mu, 2.0 - p) / (2.0 - p);
    return (y * theta - kappa) / phi - log(y)
           + tweedie_log_W_series(y, phi, p);
  }

Signature. real tweedie_log_f_series(real y, real mu, real phi, real p) — the Dunn–Smyth series log-density of the Tweedie compound Poisson–gamma distribution for $1 < p < 2$.

Declarations and mathematics.

real theta = pow(mu, 1.0 - p) / (1.0 - p); — the Tweedie canonical parameter

$$ \theta ;=; \frac{\mu^{1-p}}{1 - p}. $$

real kappa = pow(mu, 2.0 - p) / (2.0 - p); — the Tweedie cumulant function (per unit $\phi$)

$$ \kappa ;=; \frac{\mu^{2-p}}{2 - p}. $$

Return.

$$ \log f(y;\mu,\phi,p) ;=; \frac{y\theta - \kappa}{\phi} ;-; \log y ;+; \log W(y, \phi, p), $$

i.e.,

$$ \log f ;=; \frac{1}{\phi}!\left(\frac{y,\mu^{1-p}}{1-p} - \frac{\mu^{2-p}}{2-p}\right) - \log y + \log W. $$

This is the standard Dunn–Smyth series form of the Tweedie density for $1 < p < 2$.

1.4 `tweedie_log_f_saddlepoint`

  real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) {
    real deviance = 2.0 * (
      y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p)
      - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p)
    );
    return -0.5 * log(2.0 * pi() * phi * pow(y, p))
           - deviance / (2.0 * phi);
  }

Signature. real tweedie_log_f_saddlepoint(real y, real mu, real phi, real p) — the saddlepoint (large-deviation) approximation to the Tweedie log-density, used away from $p = 1.5$.

Declarations and mathematics.

real deviance = 2.0 * ( y * (pow(y, 1.0 - p) - pow(mu, 1.0 - p)) / (1.0 - p) - (pow(y, 2.0 - p) - pow(mu, 2.0 - p)) / (2.0 - p) ); — the Tweedie deviance

$$ D(y, \mu; p) ;=; 2!\left[\frac{y,(y^{1-p} - \mu^{1-p})}{1 - p} ;-; \frac{y^{2-p} - \mu^{2-p}}{2 - p}\right]. $$

Return.

$$ \log f(y;\mu,\phi,p) ;\approx; -\tfrac{1}{2}\log!\bigl(2\pi,\phi,y^{p}\bigr) ;-; \frac{D(y, \mu; p)}{2\phi}. $$

This is the standard saddlepoint / Daniels approximation for the Tweedie density.

1.5 `tweedie_lpdf`

  real tweedie_lpdf(real y, real mu, real phi, real p) {
    real tau = 0.4;
    real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
    if (y == 0.0) {
      return -lambda;
    }
    if (abs(p - 1.5) < tau) {
      return tweedie_log_f_series(y, mu, phi, p);
    } else {
      return tweedie_log_f_saddlepoint(y, mu, phi, p);
    }
  }

Signature. real tweedie_lpdf(real y, real mu, real phi, real p) — the public Tweedie log-density entry point, dispatching between the series and saddlepoint forms.

Declarations and mathematics.

real tau = 0.4; — the half-width of the band around $p = 1.5$ within which the series representation is preferred.
real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p)); — the compound-Poisson rate

$$ \lambda ;=; \frac{\mu^{2-p}}{\phi,(2 - p)}. $$

This is the mean of the Poisson mixing distribution in the compound Poisson–gamma representation of the Tweedie for $1 < p < 2$.

Branch 1 (point mass at zero). if (y == 0.0) { return -lambda; } — for $y = 0$ the Tweedie density is $\Pr(Y = 0) = e^{-\lambda}$, so

$$ \log f(0;\mu,\phi,p) ;=; -\lambda. $$

Branch 2 (series regime). if (abs(p - 1.5) < tau) { return tweedie_log_f_series(y, mu, phi, p); } — when $|p - 1.5| < 0.4$ (i.e., $p \in (1.1, 1.9)$), return the Dunn–Smyth series log-density.

Branch 3 (saddlepoint regime). else { return tweedie_log_f_saddlepoint(y, mu, phi, p); } — otherwise (i.e., $p \in (1, 1.1] \cup [1.9, 2)$ within the supported $(1, 2)$ range), return the saddlepoint approximation.

The hybrid dispatch is the "hybrid Dunn-Smyth series + saddlepoint" referenced in the §1.2 comment.

1.6 `tweedie_rng`

  real tweedie_rng(real mu, real phi, real p) {
    real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p));
    int N = poisson_rng(lambda);
    if (N == 0) {
      return 0.0;
    }
    real shape = (2.0 - p) / (p - 1.0);
    real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0));
    return gamma_rng(N * shape, rate);
  }

Signature. real tweedie_rng(real mu, real phi, real p) — a posterior-predictive RNG for the Tweedie compound Poisson–gamma, exploiting the compound representation $Y = \sum_{n=1}^{N} G_n$ with $N \sim \mathrm{Pois}(\lambda)$ and $G_n \sim \mathrm{Gamma}(\alpha_{\mathrm{pos}}, \beta)$.

Declarations and mathematics.

real lambda = pow(mu, 2.0 - p) / (phi * (2.0 - p)); — as in §1.5, the Poisson rate $\lambda = \mu^{2-p}/[\phi(2-p)]$.
int N = poisson_rng(lambda); — draw $N \sim \mathrm{Poisson}(\lambda)$.
if (N == 0) { return 0.0; } — if $N = 0$, the compound sum is empty, so $Y = 0$.
real shape = (2.0 - p) / (p - 1.0); — the Gamma shape $\alpha_{\mathrm{pos}} = (2-p)/(p-1)$ (same as in §1.2).
real rate = 1.0 / (phi * (p - 1.0) * pow(mu, p - 1.0)); — the Gamma rate

$$ \beta ;=; \frac{1}{\phi,(p-1),\mu^{p-1}}. $$

return gamma_rng(N * shape, rate); — draw $Y \sim \mathrm{Gamma}(N\alpha_{\mathrm{pos}},\,\beta)$. This is the exact distribution of $\sum_{n=1}^{N} G_n$ when each $G_n \sim \mathrm{Gamma}(\alpha_{\mathrm{pos}}, \beta)$ independently (Gamma additivity in the shape parameter under a common rate).

1.7 `apply_inv_link_by_id`

  // Sub-phase 8.3.7 heterogeneous slot inverse-link dispatch. Mirror
  // of amm_distrib_K.stan apply_inv_link_by_id. Consumed by the K=2
  // likelihood branches (family_id_k_vector[1] in {1, 3, 5, 6, 7})
  // applied coord-wise.
  real apply_inv_link_by_id(int link_id, real eta) {
    if (link_id == 0) return eta;
    if (link_id == 1) return inv_logit(eta);
    return exp(eta);
  }

Comment. Cites "Sub-phase 8.3.7 heterogeneous slot inverse-link dispatch" and identifies this as a mirror of amm_distrib_K.stan's apply_inv_link_by_id. It is consumed by the $K=2$ likelihood branches where family_id_k_vector[1] is in $\{1, 3, 5, 6, 7\}$, applied coord-wise.

Signature. real apply_inv_link_by_id(int link_id, real eta) — maps an integer link identifier and a real linear predictor $\eta$ to the inverse-link-transformed value.

Dispatch.

if (link_id == 0) return eta; — link_id == 0 is the identity link: $g^{-1}(\eta) = \eta$.
if (link_id == 1) return inv_logit(eta); — link_id == 1 is the logit (inverse-logit) link:

$$ g^{-1}(\eta) ;=; \mathrm{logit}^{-1}(\eta) ;=; \frac{1}{1 + e^{-\eta}}. $$

return exp(eta); — the default (any link_id not 0 or 1) is the log (inverse-log = exp) link:

$$ g^{-1}(\eta) ;=; e^{\eta}. $$

This dispatch is the canonical-link machinery referenced in the header (§0.1, "distributional regression canonical link per slot"). It is the mechanism by which the second (and subsequent) slot's linear predictor $\eta_{kp}[k, i, j]$ is mapped to the natural parameter of the family before being passed to the likelihood; e.g., for a Gaussian with $K=2$, slot 1 uses identity (link_id == 0) to produce $\mu$, and slot 2 uses exp (default) to produce $\sigma = e^{\eta}$ from $\eta = \log\sigma$.

Summary of section 1

Section 1 establishes:

The contract (header): the AMM reference-anchoring decomposition with per-group, per-slot, per-coordinate anchors $\theta^{\mathrm{ref}}_{kp}$, offset $a_{kj}$, slope $b_{kj}$, and basis difference $W_{kj}$; the family roster; the sampling parametrization choices (NCP for $a$, linear reparametrization for $b$, configurable CP/NCP for $W$, hierarchical or i.i.d. for $\theta^{\mathrm{ref}}_{kp}$); and the codegen/caveat framework.
The functions block: a placeholder for canonical helpers (bspline_basis_eval, apply_W_basis_diff); the complete Tweedie compound Poisson–gamma log-density machinery (tweedie_log_W_series, tweedie_log_f_series, tweedie_log_f_saddlepoint, tweedie_lpdf, tweedie_rng) implementing the hybrid Dunn–Smyth series + saddlepoint approximation for $1 < p < 2$; and the heterogeneous slot inverse-link dispatcher apply_inv_link_by_id realizing the canonical-link-per-slot machinery of Path B.

No data, transformed data, parameters, transformed parameters, model, or generated quantities blocks appear in this section; they are carried in subsequent sections.

Stan Template `amm_canonical_pmulti_KxP.stan` — Section 2 of 5

Section boundary

Section 2 opens with the closing brace } of the functions block declared in section 1, then contains the complete data and transformed data blocks, and the parameters block up to (but not including) its closing brace — the block is truncated mid-declaration at phi_pop_k. The model and generated quantities blocks are not present in this section.

1. `functions` block (residual closing brace)

This single line terminates the functions { ... } block begun in section 1. No function declarations or bodies appear in section 2. All user-defined Stan functions (family dispatchers, inverse-link selectors, W-basis constructors, log-likelihood wrappers) were defined in section 1 and are in scope for the remaining blocks.

2. `data` block

2.1 Core dimensions

int<lower=1> n;
int<lower=2> K;
int<lower=2> p;

Declaration	Role
`n`	Number of observations. $n \geq 1$.
`K`	Number of AMM slots (distinct sub-models sharing the same design infrastructure). $K \geq 2$.
`p`	Number of response coordinates (multivariate outcome dimension). $p \geq 2$.

The regime is $K \times P$ (multi-slot, multi-coordinate), as indicated by the file name suffix _KxP.

2.2 Family identifiers

array[K] int<lower=1, upper=13> family_id_k_vector;

A length-$K$ integer array assigning an exponential-family identifier to each slot. The constraint <lower=1, upper=13> admits 13 candidate families. The comment states that in the B9.5 initial iteration the family is homogeneous across slots: every entry is intended to carry the same value, and the dispatcher (in the functions block) reads only family_id_k_vector[1] as the canonical family identifier. Heterogeneous per-slot families are deferred. Mathematically, the observation model for every slot $k$ uses the same family $\mathcal{F}$:

$$y_{i,k,\cdot} \sim \mathcal{F}!\big(\mu_{i,k,\cdot},;\sigma_{y,k},;\phi_k\big), \qquad \mathcal{F} = \texttt{family_id_k_vector[1]}.$$

2.3 Inverse-link selectors

array[K] int<lower=0, upper=2> inv_link_id_per_slot;

Per-slot canonical inverse-link code, inherited from the Path B $K$-dispatch under the "L1 refined rule." The three admissible values encode:

Code	Inverse link $g_k^{-1}(\eta)$
0	identity: $g_k^{-1}(\eta)=\eta$
1	inverse-logit: $g_k^{-1}(\eta)=\mathrm{logit}^{-1}(\eta)=\frac{1}{1+e^{-\eta}}$
2	exponential: $g_k^{-1}(\eta)=e^{\eta}$

The linear predictor $\eta_{i,k,j}$ is mapped to the mean $\mu_{i,k,j}=g_k^{-1}(\eta_{i,k,j})$ inside the family dispatcher.

2.4 Component use-flags

array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=1> use_W;

Binary activation flags:

use_a_k[k] — whether slot $k$ carries an additive deviation component $a_k(\cdot)$.
use_b_k[k] — whether slot $k$ carries a multiplicative deviation component $b_k(\cdot)$.
use_W — whether the global modulating component $W(\cdot)$ is active.

When use_a_k[k] = 0 (or use_b_k[k] = 0), the corresponding per-coordinate free-column counts J_a_per_kp / J_b_per_kp are treated as zero in transformed data, and the dispatch loop in later sections reduces to a zero-cost branch (no coefficients are sampled for that slot/component).

2.5 Ragged free-column structure

int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;

J_a_max, J_b_max — upper bounds on the number of free columns in the additive and multiplicative design matrices, used for padding.
J_a_per_kp[k, j] — true (ragged) number of free additive columns for slot $k$, coordinate $j$.
J_b_per_kp[k, j] — true (ragged) number of free multiplicative columns for slot $k$, coordinate $j$.

The comment references "Sub-fase 8.6.D D41" for the full ragged canonization in the $K \times P$ regime. The flat-pack vectors a_raw and c_b_kp_raw (declared in parameters) are laid out via per-$(k,j)$ offsets computed in transformed data, respecting this ragged structure.

2.6 Padded design matrices

array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;

For each slot $k$ and coordinate $j$:

Z_a_kp[k, j] is an $n \times J_{a,\max}$ matrix. Only columns $1{:}J_{a,k,j}^{\text{per}}$ are referenced; the remaining $J_{a,\max}-J_{a,k,j}^{\text{per}}$ padded columns are never touched.
Z_b_kp[k, j] is an $n \times J_{b,\max}$ matrix, analogously.

The additive contribution to the linear predictor for slot $k$, coordinate $j$ is:

$$\delta^{\text{add}}_{i,k,j} = \sum_{\ell=1}^{J_{a,k,j}} Z_{a,k,j}[i,\ell];a_{k,j,\ell},$$

and the multiplicative contribution (after the linear reparametrization discussed below) is:

$$\delta^{\text{mul}}_{i,k,j} = \sum_{\ell=1}^{J_{b,k,j}} Z_{b,k,j}[i,\ell];c_{b,k,j,\ell}.$$

2.7 Modulating component W infrastructure

int<lower=0> dim_W;
int<lower=0> d;
int<lower=0> W_per_kj_dim;
matrix[n, d] X;

dim_W — total basis dimension of the modulating component $W$.
d — number of columns of the covariate matrix $X$ (the "effect covariates" on which $W$ acts).
W_per_kj_dim — per-$(k,j)$ basis dimension. For separable polynomial and B-spline bases, the comment notes the decomposition $\dim_W = K \cdot p \cdot W_{\text{per-kj-dim}}$.
X — $n \times d$ covariate matrix. The modulating term is $W \cdot X$, i.e. $W_{k,j}(x_i) = \sum_{m=1}^{d} W_{k,j,m}\,X_{i,m}$ (the exact assembly appears in transformed parameters).

When use_W = 0, all $W$-related quantities are zeroed out in transformed data (dim_W_eff = 0, d_eff = 0, sigma_W_size = 0), and W_raw collapses to a $0 \times 0$ matrix.

2.8 Outcomes

matrix[n, p] y_real;
array[n, p] int y_int;

y_real — $n \times p$ matrix of continuous (real-valued) responses, consumed by continuous families (e.g. Gaussian).
y_int — $n \times p$ integer array of count responses, consumed by count families (e.g. Poisson, negative binomial).

Only the outcome matching family_id_k_vector[1] is consumed by the likelihood; the other is ignored.

2.9 Anchor

array[K] vector[p] theta_anchor_kp;

A $K$-vector of $p$-vectors: the per-slot per-coordinate anchor $\theta_{\text{anchor},k,j}$. This is fixed data (not sampled). In the reference-anchoring decomposition, the anchor serves as the fixed reference point from which the per-group sampled reference $\theta_{\text{ref},g,k,j}$ (declared in parameters) deviates. The precise role (prior mean for $\theta_{\text{ref}}$, or direct centering constant) is finalized in the model block (later section).

2.10 Dispersion flags

array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;

Per-slot binary flags controlling whether slot $k$ carries:

a $\sigma_{y,k}$ (observation-scale / standard-deviation) dispersion parameter, and
a $\phi_k$ (auxiliary precision or over-dispersion) parameter.

These are "population-scoped" (one value per slot, shared across all $n$ observations and all $p$ coordinates within the slot). The sizes of the sampled dispersion arrays are accumulated in transformed data.

2.11 W-basis construction data

int<lower=0, upper=2> W_type_id;
int<lower=0> W_n_knots_full;
vector[W_n_knots_full] W_knots_full;
int<lower=0> W_degree;

W_type_id — basis type selector (0, 1, or 2; the exact mapping to polynomial / B-spline / other is resolved in the functions block).
W_n_knots_full — number of knots for spline bases.
W_knots_full — the full knot vector of length W_n_knots_full.
W_degree — spline degree.

These are consumed by the W-basis constructor function (section 1) to build the design matrix through which W_raw is expanded into $W$.

2.12 Grouping (Block 6.5)

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;

use_groups — master switch for the hierarchical (per-group) anchor structure.
J_groups — number of groups ($\geq 1$).
group_id — length-$n$ assignment vector, group_id[i] $\in \{1,\ldots,J_{\text{groups}}\}$.

When use_groups = 1, the per-group reference $\theta_{\text{ref},g,k,j}$ is sampled for each group $g$, and observation $i$ uses the reference of its assigned group: $\theta_{\text{ref},\text{group\_id}[i],k,j}$. When use_groups = 0, the hierarchical hyperparameter arrays mu_theta_ref_kp and sigma_theta_ref_kp collapse to size 0 (see parameters), and theta_ref_kp has first dimension $J_{\text{groups}}$ (which is $\geq 1$).

3. `transformed data` block

3.1 Declarations

array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;
array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;
int total_J_a_free = 0;
int total_J_b_free = 0;
int dim_W_eff = (use_W == 1) ? dim_W : 0;
int d_eff = (use_W == 1) ? d : 0;
int any_use_a = 0;
int any_use_b = 0;
int sigma_W_size;
int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;

Variable	Meaning
`J_a_free_kp[k, j]`	Effective free additive columns after gating by `use_a_k[k]`. Equals `J_a_per_kp[k, j]` if `use_a_k[k] == 1` and `J_a_per_kp[k, j] > 0`, else 0.
`J_b_free_kp[k, j]`	Effective free multiplicative columns after gating by `use_b_k[k]`.
`a_raw_offset_kp[k, j]`	Starting index (0-based) of slot $(k,j)$'s additive coefficients within the flat vector `a_raw`.
`c_b_raw_offset_kp[k, j]`	Starting index (0-based) of slot $(k,j)$'s multiplicative coefficients within `c_b_kp_raw`.
`total_J_a_free`	$\sum_{k=1}^{K}\sum_{j=1}^{p} J_{a,k,j}^{\text{free}}$ — total length of `a_raw`.
`total_J_b_free`	$\sum_{k=1}^{K}\sum_{j=1}^{p} J_{b,k,j}^{\text{free}}$ — total length of `c_b_kp_raw`.
`dim_W_eff`	`dim_W` if `use_W == 1`, else 0.
`d_eff`	`d` if `use_W == 1`, else 0.
`any_use_a`	1 if any $(k,j)$ has $J_{a,k,j}^{\text{free}} > 0$, else 0.
`any_use_b`	1 if any $(k,j)$ has $J_{b,k,j}^{\text{free}} > 0$, else 0.
`sigma_W_size`	1 if `use_W == 1` and `dim_W > 0`, else 0.
`sigma_y_size`	$\sum_{k=1}^{K} \texttt{use\_dispersion\_y\_k}[k]$.
`phi_size`	$\sum_{k=1}^{K} \texttt{use\_dispersion\_phi\_k}[k]$.
`sigma_y_offset_k[k]`	Starting index (0-based) of slot $k$'s entry in `sigma_y_pop_k`.
`phi_offset_k[k]`	Starting index (0-based) of slot $k$'s entry in `phi_pop_k`.

3.2 The double loop over slots and coordinates

for (k in 1:K) {
  for (j in 1:p) {
    J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
                          ? J_a_per_kp[k, j] : 0;
    J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
                          ? J_b_per_kp[k, j] : 0;
    a_raw_offset_kp[k, j] = total_J_a_free;
    c_b_raw_offset_kp[k, j] = total_J_b_free;
    total_J_a_free += J_a_free_kp[k, j];
    total_J_b_free += J_b_free_kp[k, j];
    if (J_a_free_kp[k, j] > 0) any_use_a = 1;
    if (J_b_free_kp[k, j] > 0) any_use_b = 1;
  }
  sigma_y_offset_k[k] = sigma_y_size;
  phi_offset_k[k] = phi_size;
  sigma_y_size += use_dispersion_y_k[k];
  phi_size += use_dispersion_phi_k[k];
}

Line-by-line:

J_a_free_kp[k, j] assignment. The ternary gates the user-supplied J_a_per_kp[k, j] by the slot-level flag use_a_k[k]. If the slot is inactive (use_a_k[k] == 0) or the coordinate genuinely has zero free columns, the effective count is forced to 0. This realizes the "zero-cost branch" mentioned in the comment: inactive slots contribute no coefficients.

$J_{a,k,j}^{\text{free}} = \begin{cases} J_{a,k,j}^{\text{per}} & \text{if } \texttt{use\_a\_k}[k]=1 \text{ and } J_{a,k,j}^{\text{per}}>0, \\ 0 & \text{otherwise.}\end{cases}$
J_b_free_kp[k, j] assignment. Analogous gating for the multiplicative component.
a_raw_offset_kp[k, j] = total_J_a_free. Records the current running total as the 0-based start index for slot $(k,j)$'s block within a_raw. This is the flat-pack offset: the $\ell$-th free additive coefficient of slot $(k,j)$ will be accessed as a_raw[a_raw_offset_kp[k,j] + ell].
c_b_raw_offset_kp[k, j] = total_J_b_free. Analogous offset for c_b_kp_raw.
total_J_a_free += J_a_free_kp[k, j]. Accumulates the total length of the flat additive raw vector.
total_J_b_free += J_b_free_kp[k, j]. Accumulates the total length of the flat multiplicative raw vector.
if (J_a_free_kp[k, j] > 0) any_use_a = 1. Sets the global additive flag if any $(k,j)$ has free additive columns.
if (J_b_free_kp[k, j] > 0) any_use_b = 1. Sets the global multiplicative flag.
sigma_y_offset_k[k] = sigma_y_size. Records the 0-based start index for slot $k$'s $\sigma_y$ entry (either 0 or 1 slots, since each slot contributes at most one).
phi_offset_k[k] = phi_size. Analogous for $\phi$.
sigma_y_size += use_dispersion_y_k[k]. Increments the dispersion array size by 0 or 1.
phi_size += use_dispersion_phi_k[k]. Analogous.

3.3 W-scale size

sigma_W_size = (use_W == 1 && dim_W > 0) ? 1 : 0;

A single global $\sigma_W$ scale is allocated when $W$ is active and has positive dimension. The ternary ensures that use_W = 1 with dim_W = 0 (a degenerate or placeholder configuration) does not allocate a spurious scale parameter.

3.4 Sigma-a compaction (Option A, RG.6, D96)

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
  int slot_free_a = 0;
  for (j in 1:p) {
    slot_free_a += J_a_free_kp[k, j];
  }
  if (slot_free_a > 0) {
    n_sigma_a += 1;
    sigma_a_idx[k] = n_sigma_a;
  } else {
    sigma_a_idx[k] = 0;
  }
}

Purpose. The comment (RG.6, D96) explains the compaction. The per-slot additive scale $\sigma_{a,k}$ is only meaningful for slots that carry at least one free additive coefficient. A slot with no free $a$-coefficients — either because use_a_k[k] == 0 (no additive component at all) or because the additive formula is intercept-only (e.g. a = ~ 1) and the intercept has been absorbed into $\theta_{\text{ref}}$ — would otherwise declare a sampled-but-unused half-Cauchy / half-normal scale: a flat, non-identified direction in the posterior.

Mechanism.

n_sigma_a — counts slots with at least one free additive coefficient: $n_{\sigma_a} = \big|\{k : \sum_{j=1}^{p} J_{a,k,j}^{\text{free}} > 0\}\big|$.
sigma_a_idx[k] — maps slot $k$ to its 1-based position in the compacted sigma_a_k array, or 0 if the slot has no free additive coefficients.

$$\texttt{sigma_a_idx}[k] = \begin{cases} \text{rank of } k \text{ among slots with } \sum_j J_{a,k,j}^{\text{free}}>0 & \text{if } \sum_j J_{a,k,j}^{\text{free}}>0, \ 0 & \text{otherwise.}\end{cases}$$

When every slot carries free additive coefficients, $n_{\sigma_a} = K$ and sigma_a_idx is the identity map $k \mapsto k$, so the model is bit-identical to the uncompacted version. The compaction is a pure non-identification guard.

4. `parameters` block (truncated)

The block is not closed in this section; it continues into section 3. All declarations up to phi_pop_k are present.

4.1 Per-group reference anchor

array[J_groups, K] vector[p] theta_ref_kp;

The sampled per-group, per-slot, per-coordinate reference anchor $\theta_{\text{ref},g,k,j}$. This is the central parameter of the reference-anchoring decomposition. For observation $i$ in group $g = \texttt{group\_id}[i]$, the linear predictor for slot $k$, coordinate $j$ is anchored at $\theta_{\text{ref},g,k,j}$:

$$\eta_{i,k,j} = \theta_{\text{ref},g,k,j} ;+; \underbrace{\delta^{\text{add}}_{i,k,j}}_{\text{additive}} ;+; \underbrace{\delta^{\text{mul}}_{i,k,j}}_{\text{multiplicative}} ;+; \underbrace{W_{k,j}(x_i)}_{\text{modulating}}.$$

The fixed data anchor theta_anchor_kp provides the centering reference (its exact role — prior mean or offset — is finalized in the model block). The deviation of $\theta_{\text{ref},g,k,j}$ from $\theta_{\text{anchor},k,j}$ is governed by the hierarchical prior declared next.

4.2 Hierarchical hyperparameters of the anchor

array[use_groups, K] vector[p] mu_theta_ref_kp;
array[use_groups, K] vector<lower=0>[p] sigma_theta_ref_kp;

mu_theta_ref_kp — hyper-mean $\mu_{\theta_{\text{ref}},k,j}$ of the per-group anchor prior.
sigma_theta_ref_kp — hyper-scale $\sigma_{\theta_{\text{ref}},k,j} > 0$.

The first dimension is use_groups (0 or 1). When use_groups = 0, both arrays have size 0 and are not sampled; the per-group anchor theta_ref_kp then receives a non-hierarchical prior directly (specified in the model block). When use_groups = 1, the hierarchical prior is:

$$\theta_{\text{ref},g,k,j} \mid \mu_{\theta_{\text{ref}},k,j},;\sigma_{\theta_{\text{ref}},k,j} ;\sim; \mathcal{N}!\big(\mu_{\theta_{\text{ref}},k,j},;\sigma_{\theta_{\text{ref}},k,j}^2\big), \qquad g=1,\ldots,J_{\text{groups}}.$$

(The exact prior family — Gaussian or otherwise — is declared in the model block; the <lower=0> constraint on sigma_theta_ref_kp confirms a positive scale.)

4.3 Additive component (NCP, compacted)

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;

sigma_a_k — per-slot additive scale $\sigma_{a,k} > 0$, compacted to the $n_{\sigma_a}$ slots identified by sigma_a_idx. Slots with no free additive coefficients have no entry here, eliminating the non-identified flat directions described in §3.4.
a_raw — flat-packed vector of length total_J_a_free of standardized (non-centered) additive coefficients. The NCP reconstruction (in transformed parameters) is:

$$a_{k,j,\ell} = \sigma_{a,,\texttt{sigma_a_idx}[k]};\texttt{a_raw}\big[\texttt{a_raw_offset_kp}[k,j] + \ell\big], \qquad \ell = 1,\ldots,J_{a,k,j}^{\text{free}}.$$

The implied prior on a_raw (declared in the model block) is standard normal:

$$\texttt{a_raw}[m] \sim \mathcal{N}(0, 1), \qquad m = 1,\ldots,\texttt{total_J_a_free},$$

so that marginally $a_{k,j,\ell} \sim \mathcal{N}(0, \sigma_{a,k}^2)$. The NCP decouples the scale $\sigma_{a,k}$ from the direction $a_{\text{raw}}$, improving posterior geometry when $\sigma_{a,k}$ is near zero.

4.4 Multiplicative component (NCP, linear reparametrization)

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;

sigma_b_k — per-slot multiplicative scale $\sigma_{b,k} > 0$. The array size is $K \times \texttt{any\_use\_b}$: when no slot uses the multiplicative component (any_use_b = 0), the array is empty; when at least one slot does (any_use_b = 1), all $K$ slots receive a scale parameter. (Unlike sigma_a_k, no compaction to active-only slots is applied here.)
c_b_kp_raw — flat-packed vector of length total_J_b_free of standardized multiplicative coefficients.

The comment states the linear reparametrization:

$$c_{b,k,j,\ell} = \theta_{\text{ref},g,k,j}\cdot b_{k,j,\ell},$$

so the multiplicative contribution to the linear predictor is:

$$\delta^{\text{mul}}_{i,k,j} = \sum_{\ell=1}^{J_{b,k,j}} Z_{b,k,j}[i,\ell];c_{b,k,j,\ell} = \theta_{\text{ref},g,k,j}\cdot\sum_{\ell=1}^{J_{b,k,j}} Z_{b,k,j}[i,\ell];b_{k,j,\ell}.$$

This is linear in $c_{b,k,j,\ell}$ (no Jacobian needed for the reparametrization itself), and the NCP reconstruction is:

$$c_{b,k,j,\ell} = \sigma_{b,k};\texttt{c_b_kp_raw}\big[\texttt{c_b_raw_offset_kp}[k,j] + \ell\big].$$

The implied prior is $\texttt{c\_b\_kp\_raw}[m] \sim \mathcal{N}(0,1)$, so marginally $c_{b,k,j,\ell} \sim \mathcal{N}(0, \sigma_{b,k}^2)$, and the multiplicative coefficient $b_{k,j,\ell} = c_{b,k,j,\ell}/\theta_{\text{ref},g,k,j}$ inherits a scale-mixture structure induced by the randomness in $\theta_{\text{ref}}$.

Observation (faithful to code). When any_use_b = 1 but some slot $k$ has use_b_k[k] = 0, that slot's sigma_b_k[k] is declared but has no corresponding entries in c_b_kp_raw (since J_b_free_kp[k, j] = 0 for all $j$). Whether this scale is used in the model block or left as a sampled-but-prior-only parameter is determined in a later section. The code as written does not compact sigma_b_k the way sigma_a_k is compacted.

4.5 Global modulating component W

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;

sigma_W — global scale $\sigma_W > 0$ for the modulating component. Size 1 when active, 0 otherwise.
W_raw — $\dim_{W,\text{eff}} \times d_{\text{eff}}$ matrix of standardized entries. When use_W = 0, both dimensions are 0, yielding a $0 \times 0$ matrix (a zero-cost placeholder).

The NCP reconstruction (in transformed parameters) is:

$$W = \sigma_W \cdot W_{\text{raw}},$$

and the modulating contribution to the linear predictor is:

$$W_{k,j}(x_i) = \sum_{m=1}^{d} W_{k,j,m};X_{i,m} = \sigma_W \sum_{m=1}^{d} W_{\text{raw},k,j,m};X_{i,m}.$$

The implied prior is $W_{\text{raw},r,s} \sim \mathcal{N}(0,1)$, so marginally $W_{r,s} \sim \mathcal{N}(0, \sigma_W^2)$.

4.6 Per-slot dispersion parameters

array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;

sigma_y_pop_k — compacted array of per-slot observation-scale dispersions $\sigma_{y,k} > 0$. Only slots with use_dispersion_y_k[k] = 1 have an entry; the mapping is via sigma_y_offset_k[k].
phi_pop_k — compacted array of per-slot auxiliary dispersions $\phi_k > 0$. Only slots with use_dispersion_phi_k[k] = 1 have an entry; the mapping is via phi_offset_k[k].

Access pattern: for slot $k$, if use_dispersion_y_k[k] = 1, the dispersion is sigma_y_pop_k[sigma_y_offset_k[k] + 1]; otherwise the family dispatcher uses a default (family-specific) value. Analogously for $\phi_k$.

5. Summary: AMM reference-anchoring decomposition as encoded

The declarations in sections 1–2 establish the following structural decomposition. For observation $i$ (in group $g = \texttt{group\_id}[i]$), slot $k$, coordinate $j$:

$$\boxed{\eta_{i,k,j} = \theta_{\text{ref},g,k,j} ;+; \sigma_{a,k}\sum_{\ell=1}^{J_{a,k,j}^{\text{free}}} Z_{a,k,j}[i,\ell];\texttt{a_raw}[\cdot] ;+; \sigma_{b,k}\sum_{\ell=1}^{J_{b,k,j}^{\text{free}}} Z_{b,k,j}[i,\ell];\texttt{c_b_kp_raw}[\cdot] ;+; \sigma_W\sum_{m=1}^{d} W_{\text{raw},k,j,m};X_{i,m}}$$

$$\mu_{i,k,j} = g_k^{-1}(\eta_{i,k,j}), \qquad y_{i,k,j} \sim \mathcal{F}!\big(\mu_{i,k,j},;\sigma_{y,k},;\phi_k\big).$$

The reference-anchoring is realized by:

Fixed anchor (theta_anchor_kp, data) — the immutable reference point.
Per-group sampled reference (theta_ref_kp, parameters) — deviates from the anchor under a hierarchical prior (mu_theta_ref_kp, sigma_theta_ref_kp) when use_groups = 1.
Additive deviation (a_raw, sigma_a_k) — NCP, compacted to active slots, contributing a linear shift $\delta^{\text{add}}$.
Multiplicative deviation (c_b_kp_raw, sigma_b_k) — NCP with the linear reparametrization $c_b = \theta_{\text{ref}} \cdot b$, contributing a $\theta_{\text{ref}}$-scaled shift $\delta^{\text{mul}}$.
Global modulator (W_raw, sigma_W) — a shared covariate-driven correction $W \cdot X$.

The transformed parameters block (section 3) will assemble $\eta_{i,k,j}$ from these pieces; the model block (section 4 or 5) will declare the priors on a_raw, c_b_kp_raw, W_raw, sigma_a_k, sigma_b_k, sigma_W, theta_ref_kp, and the hyperparameters, plus the family-specific likelihood; and generated quantities (section 5) will produce posterior summaries.

`transformed parameters` block (lines 326–423)

This block reconstructs the AMM coefficients from their non-centered raw draws and assembles the per-slot, per-coordinate linear predictor eta_kp.

Declarations. array[K] matrix[n, p] eta_kp holds, for each distributional slot $k$, the $n\times p$ matrix of linear predictors $\eta_{ikj}$ (observation $i$, coordinate $j$). The coefficient arrays a_coef_kp, c_b_kp, b_coef_kp are array[K, p] vector[J_*_max] — per slot and per coordinate, padded to the maximum basis dimension. All three are zero-initialized over the full $K\times p$ grid so that inactive slots/coordinates contribute nothing.

Non-centered reconstruction of the additive coefficients (lines 340–353). When any additive component is active, for each active slot ($\texttt{use\_a\_k}[k]=1$) and coordinate with free columns ($J_{a,\text{free}}[k,j]>0$), the coefficient is the standardized raw draw scaled by the slot's hyper-scale: $$a_{\text{coef}}[k,j]{jj} = a{\text{raw}}\big[\text{offset}{a}[k,j]+jj\big]\cdot\sigma{a}[,\text{sigma_a_idx}[k],].$$ This is the non-centered parametrization (NCP): $a_{\text{raw}}\sim\mathcal N(0,1)$ in the model block, so $a_{\text{coef}}\sim\mathcal N(0,\sigma_a^2)$ by construction, with the funnel between scale and coefficient broken (§II.6).

Non-centered reconstruction of the multiplicative coefficients (lines 355–380). Symmetrically, the linear product coordinate $c_b$ is reconstructed as $c_{b}[k,j]_{jj}=c_{b,\text{raw}}[\text{offset}_b[k,j]+jj]\cdot\sigma_b[k]$. This is the linear reparametrization of §II.6: the model samples $c_b=\theta_{\text{ref}}\cdot b$ directly rather than $b$, removing the bilinear $(\theta_{\text{ref}},b)$ geometry. The original multiplicative coefficient $b_{\text{coef}}$ is recovered only as a reported derived quantity in the single-anchor regime ($\texttt{use\_groups}=0$ and $\theta_{\text{ref}}\neq0$): $$b_{\text{coef}}[k,j]{jj}=c_b[k,j]{jj}\big/\theta_{\text{ref}}[1,k]_j.$$ Under per-group anchors ($\texttt{use\_groups}=1$) b_coef_kp stays at zero; downstream callers derive it per group from $c_b$ and $\theta_{\text{ref}}[g,k]_j$ (the same convention as amm_eb_marginal_KxP.stan and amm_distrib_K.stan).

Linear-predictor assembly (lines 382–422). For every slot $k$, observation $i$ (with group $g_i=\texttt{group\_id}[i]$) and coordinate $j$, the AMM decomposition is evaluated on the slot's link scale: $$\eta_{ikj}=\underbrace{\theta_{\text{ref}}[g_i,k]j}{\text{anchor}}+\underbrace{Z_a[k,j][i,\cdot],a_{\text{coef}}[k,j]}{\text{additive } a(x)}+\underbrace{Z_b[k,j][i,\cdot],c_b[k,j]}{\text{multiplicative } b(x)\odot\theta_{\text{ref}}}+\underbrace{\big(W^{\Delta}(\theta_{\text{ref}})\big),x_i}_{\text{modulated}}.$$ The additive and multiplicative terms are dot_products over the active basis columns. The modulated term (lines 400–418) uses the globally-shared W_raw matrix (row count $K\cdot p\cdot W_{\text{per\_kj\_dim}}$; the $(k,j)$ block begins at row $((k-1)p+(j-1))W_{\text{per\_kj\_dim}}+1$) and the differenced basis $\texttt{apply\_W\_basis\_diff}(\cdot)$ which evaluates $\,b(\theta_{\text{ref}})-b(\theta_{\text{anchor}})\,$ so that the anchoring constraint (C4) $W(\theta_0)=0$ holds identically; {{W_SCALE}} is the codegen-injected scale. The accumulated $W^\Delta$-weighted covariate row is contracted with $x_i$ and added to $\eta_{ikj}$, which is stored in eta_kp[k][i,j].

`model` block (lines 425–597)

Priors. {{THETA_REF_PRIOR_BLOCK}} is the codegen-injected, family-specific per-slot per-coord anchor prior (hierarchical $\theta_{\text{ref}}[g,k]_j\sim\mathrm{Normal}(\mu_{\theta_{\text{ref}}},\sigma_{\theta_{\text{ref}}})$ when $\texttt{use\_groups}=1$; flat coord-wise i.i.d. otherwise; the Tweedie $K=3$ bounded power slot is emitted with slice notation by .gdpar_build_theta_ref_prior_block_KxP()). The non-centered scale priors follow: $\sigma_a\sim${{PRIOR_SIGMA_A}}, $a_{\text{raw}}\sim\mathcal N(0,1);\ \sigma_b\sim${{PRIOR_SIGMA_B}}, $c_{b,\text{raw}}\sim\mathcal N(0,1);\ \sigma_W\sim${{PRIOR_SIGMA_W}}, $\mathrm{vec}(W_{\text{raw}})\sim${{W_PRIOR}}; plus the population auxiliary priors {{PRIOR_SIGMA_Y}} and {{PRIOR_PHI}} when those slots exist.

Likelihood — family dispatch with coord-wise factorization (lines 461–597). The canonical Path-B family is selected by family_id_k_vector[1], and each observation–coordinate pair $(i,j)$ contributes independently (the coord-wise factorization). Auxiliary slots are mapped through apply_inv_link_by_id(inv_link_id_per_slot[k], eta_kp[k][i,j]). The branches:

1 Gaussian ($K=2$): $y_{ij}\sim\mathcal N\!\big(\eta_{1,ij},\,\sigma_{ij}\big)$ with $\sigma_{ij}=g^{-1}(\eta_{2,ij})$.
3 Negative-binomial ($K=2$): $y_{ij}\sim\text{NB2}\!\big(e^{\eta_{1,ij}},\,\phi_{ij}\big)$.
5 Beta ($K=2$): $y_{ij}\sim\text{beta\_proportion}\!\big(\text{logit}^{-1}(\eta_{1,ij}),\,\phi_{ij}\big)$.
6 Gamma ($K=2$): mean $\mu_{ij}=e^{\eta_{1,ij}}$, shape $s_{ij};\ y_{ij}\sim\text{Gamma}(s_{ij},\,s_{ij}/\mu_{ij})$ (rate = shape/mean).
7 Lognormal (loc–scale, $K=2$): $y_{ij}\sim\text{Lognormal}(\eta_{1,ij},\sigma_{ij})$.
8 Student-$t$ ($K=3$): $y_{ij}\sim t\big(\nu=e^{\eta_{3,ij}},\,\mu=\eta_{1,ij},\,\sigma=e^{\eta_{2,ij}}\big)$.
9 Tweedie ($K=3$): $y_{ij}\sim\text{Tweedie}\big(e^{\eta_{1,ij}},e^{\eta_{2,ij}},p=\eta_{3,ij}\big)$ via the package's tweedie_lpdf.
10 ZIP ($K=2$): the dual deviation at work — for $y_{ij}=0,\ \;\log\!\big[e^{\text{bern\_logit}(1|\eta_\pi)}+e^{\text{bern\_logit}(0|\eta_\pi)}\,p_{\text{Pois}}(0|\eta_\mu)\big]$ via log_sum_exp; for $y_{ij}>0,\ \;\text{bern\_logit}(0|\eta_\pi)+\text{poisson\_log}(y|\eta_\mu)$.
11 ZINB ($K=3$): identical mixture with $\text{NB2}$ replacing Poisson, $\phi=e^{\eta_{2,ij}},\ \eta_\pi=\eta_{3,ij}$.
12 Hurdle-Poisson ($K=2$): $y=0\Rightarrow\text{bern\_logit}(1|\eta_\pi);\ y>0\Rightarrow\text{bern\_logit}(0|\eta_\pi)+\text{poisson\_log}(y|\eta_\mu)-\log\!\big(1-e^{-e^{\eta_\mu}}\big)$ — the log1m_exp term is the one-truncation normalizer of the zero-truncated Poisson.
13 Hurdle-NB ($K=3$): same hurdle structure with NB2 and its zero-truncation normalizer $-\,\text{log1m\_exp}\big(\text{NB2\_log}(0|\eta_\mu,\phi)\big)$.

Each branch realizes the AMM anchoring on every slot (location and the auxiliaries that were promoted to per-observation), which is the multivariate ($p>1$) × multi-slot ($K>1$) distributional regression of §II.4/§II.6.

`generated quantities` block (lines 600–761)

This block produces, per observation $i$ and coordinate $j$, three matrices: log_lik[n,p] (the pointwise log-likelihood used by PSIS-LOO / gdpar_loo), theta_i[n,p] (the fitted location-slot linear predictor), and y_pred[n,p] (a posterior-predictive draw). Its shape mirrors amm_eb_marginal_KxP.stan extended to the full Path-B family set of amm_distrib_K.stan.

For each $(i,j)$ the same family_id_k_vector[1] dispatch as the model block is reproduced, now emitting _lpdf/_lpmf for log_lik and _rng for y_pred:

1 Gaussian: theta_i = eta_kp[1]; log_lik = normal_lpdf(y | eta_1, sigma); y_pred = normal_rng(eta_1, sigma), $\sigma=g^{-1}(\eta_2)$.
3 NB2: $\mu=e^{\eta_1},\ \phi=g^{-1}(\eta_2)$; neg_binomial_2_lpmf / _rng.
5 Beta: $\mu=\text{logit}^{-1}(\eta_1),\ \phi=g^{-1}(\eta_2)$; beta_proportion_lpdf / _rng.
6 Gamma: $\mu=e^{\eta_1}$, shape $s=g^{-1}(\eta_2)$; gamma_lpdf(y|s, s/μ) / _rng.
7 Lognormal: lognormal_lpdf(y|eta_1, sigma) / _rng.
8 Student-$t$: $\nu=e^{\eta_3},\ \mu=\eta_1,\ \sigma=e^{\eta_2}$; student_t_lpdf/_rng; here theta_i = mu is set from $\eta_1$ explicitly.
9 Tweedie: $\mu=e^{\eta_1},\ \phi=e^{\eta_2},\ p=\eta_3$; tweedie_lpdf/_rng.
10 ZIP / 11 ZINB: log_lik is the same zero-inflation log_sum_exp mixture as the model block; the predictive draw first samples the structural-zero indicator bernoulli_logit_rng(eta_pi) — if 1, $y_{\text{pred}}=0$; else a poisson_log_rng (ZIP) / neg_binomial_2_log_rng (ZINB) count.
12 Hurdle-Poisson / 13 Hurdle-NB: log_lik reproduces the hurdle decomposition (with the log1m_exp zero-truncation normalizer); the predictive draw samples the hurdle indicator, and on a "positive" outcome draws from the count distribution rejection-sampling until non-zero (while (y_draw == 0 && iter_h < 10000)), realizing the zero-truncated count.
else (unknown family id): a safe fallback — theta_i = eta_1, log_lik = negative_infinity(), y_pred = 0 — so an unrecognized family_id yields a $-\infty$ contribution rather than a crash.

The log_lik matrix is what gdpar_loo() aggregates (summing over coordinates per observation, §III.5) into the ELPD; y_pred feeds pp_check/gdpar_posterior_predict and the DHARMa residual machinery; theta_i is the anchored individual location estimate $\widehat\theta_i$ on the link scale.

close of `generated quantities` (line 762)

Line 762 is the closing brace } of the generated quantities block — the end of the amm_canonical_pmulti_KxP.stan canonical piece. With it the template is complete: data / transformed data (sections c01–c02), parameters / transformed parameters and model (c03), and generated quantities (c04). This piece is the full-Bayes Stan program for the multivariate ($p>1$) × multi-slot ($K>1$) AMM (Path C), assembled by R/stan_codegen.R from this and the shared canonical helpers.

← Part V — Stan Templates (1/3) · gdpar Wiki Home · Part V — Stan Templates (3/3) →

gdpar Wiki

Home

Part V Stan Templates 2

inst/stan/_canonical_pieces/amm_canonical_eb_marginal_K.stan

Stan Template Walkthrough: amm_canonical_eb_marginal_K.stan — Section 1 of 3

1. Header Comments

AMM Canonical Decomposition

b_k Reparametrization (Recovery 2)

Scope Decisions

2. functions { } Block

2.1 tweedie_log_W_series — Dunn–Smyth Series Log-Sum

2.2 tweedie_log_f_series — Exact Log-Density via Series

2.3 tweedie_log_f_saddlepoint — Saddlepoint Log-Density

2.4 tweedie_lpdf — Public Log-PDF Entry Point

2.5 tweedie_rng — Compound Poisson–Gamma Random Number Generator

2.6 apply_inv_link_by_id — Inverse-Link Dispatch

2.7 {{CANONICAL_HELPERS}} — Code-Generated Helper Functions

3. data { } Block

3.1 Core Dimensions

3.2 Family and Link Identifiers

3.3 AMM Component Flags

3.4 Basis Dimensions and Design Matrices

3.5 Global Modulating Component (W)

3.6 Outcomes

3.7 Anchoring Parameters

3.8 Dispersion Flags

3.9 W Basis Spline Configuration (Sub-phase 8.3.8)

3.10 Grouping Structure (Block 6.5)

3.11 CP/NCP Parametrization per Slot (Placeholder)

3.12 EB Convention Fields

Summary of Template Architecture

Block-by-Block Walkthrough of amm_canonical_eb_marginal_K.stan (Section 2 of 3)

1. transformed data Block

1.1 Per-slot free-coefficient counts and offsets

1.2 Effective W dimensions

1.3 Activity flags and dispersion sizes

1.4 Compaction of sigma_a_k (Option A / D96)

2. parameters Block

2.1 Per-group anchors θ_ref^(k)

2.2 Hierarchical hyperparameters

2.3 Additive component scales and raw coefficients

2.4 Multiplicative component scales and raw coefficients

2.5 Global modulating component W

2.6 Population-scoped dispersion

3. transformed parameters Block

3.1 Declarations

3.2 Reconstruction of additive coefficients ({{TP_A_BLOCK}})

3.3 Reconstruction of multiplicative coefficients

3.4 Per-observation linear predictor construction

4. model Block

4.1 Anchor priors ({{THETA_REF_PRIOR_BLOCK}})

4.2 Additive component priors

4.3 Multiplicative component priors

4.4 Modulating component priors

4.5 Dispersion priors

4.6 Likelihood (distributional regression)

4.6.1 Family 1 — Gaussian (K=2)

4.6.2 Family 3 — Negative Binomial (K=2)

4.6.3 Family 5 — Beta (K=2)

4.6.4 Family 6 — Gamma (K=2)

Summary of the Reference-Anchoring Decomposition

Generated Quantities Block

Variable Declarations

Observation Loop

Family 1: Normal

Family 3: Negative Binomial

Family 5: Beta-Proportion

Family 6: Gamma

Family 7: Lognormal

Family 8: Student-t

Family 9: Tweedie

Family 10: Zero-Inflated Poisson (ZIP)

Family 11: Zero-Inflated Negative Binomial (ZINB)

Family 12: Hurdle-Poisson

Family 13: Hurdle-Negative Binomial

Default Case

Relation to AMM Reference-Anchoring

inst/stan/_canonical_pieces/amm_canonical_eb_marginal_multi.stan

amm_canonical_eb_marginal_multi.stan — Block-by-Block Technical Reference

functions { }

data { }

Core dimensions

Stan Template Walkthrough: `amm_canonical_eb_marginal_K.stan` — Section 1 of 3

2. `functions { }` Block

2.1 `tweedie_log_W_series` — Dunn–Smyth Series Log-Sum

2.2 `tweedie_log_f_series` — Exact Log-Density via Series

2.3 `tweedie_log_f_saddlepoint` — Saddlepoint Log-Density

2.4 `tweedie_lpdf` — Public Log-PDF Entry Point

2.5 `tweedie_rng` — Compound Poisson–Gamma Random Number Generator

2.6 `apply_inv_link_by_id` — Inverse-Link Dispatch

2.7 `{{CANONICAL_HELPERS}}` — Code-Generated Helper Functions

3. `data { }` Block

Block-by-Block Walkthrough of `amm_canonical_eb_marginal_K.stan` (Section 2 of 3)

1. `transformed data` Block

1.4 Compaction of `sigma_a_k` (Option A / D96)

2. `parameters` Block

3. `transformed parameters` Block

3.2 Reconstruction of additive coefficients (`{{TP_A_BLOCK}}`)

4. `model` Block

4.1 Anchor priors (`{{THETA_REF_PRIOR_BLOCK}}`)

`amm_canonical_eb_marginal_multi.stan` — Block-by-Block Technical Reference

`functions { }`

`data { }`

`transformed data { }`

`parameters { }`

`transformed parameters { }`

`model { }`

`generated quantities { }`

Stan Template `amm_canonical_eb_marginal_multi.stan` — Section 2 of 2

1. The leading `}`

2. `generated quantities` block — header

3. Declarations inside `generated quantities`

3.1 `matrix[n, p] log_lik;`

3.2 `matrix[n, p] theta_i = eta;`

3.3 `matrix[n, p] y_pred;`

5.1 `family_id == 1` — Gaussian (identity link)

5.2 `family_id == 2` — Poisson (log link)

5.3 `family_id == 3` — Negative Binomial type 2 (log link)

5.4 `family_id == 4` — Bernoulli (logit link)