Part V Stan Templates 3

← Part V — Stan Templates (2/3) · gdpar Wiki Home · Part VI — Data, Benchmarks, Tests & References →

inst/stan/_canonical_pieces/amm_canonical_pmulti.stan

`functions` Block

functions {
  // {{CANONICAL_HELPERS}}
}

A placeholder block. At code-generation time, the R helper generate_stan_code_multi() replaces {{CANONICAL_HELPERS}} with the actual Stan function definitions. Based on the header and the call site in transformed parameters, at minimum this block must define:

vector apply_W_basis_diff(int W_type_id, real theta_ref, real theta_anchor,
                          int W_per_k_dim, int W_degree,
                          int W_n_knots_full, vector W_knots_full)

This function returns a vector of length W_per_k_dim whose $j$-th element is $\phi_j(\theta_{\text{ref}}) - \phi_j(\theta_{\text{anchor}})$, where $\{\phi_j\}$ is a set of basis functions dispatched by W_type_id:

`W_type_id`	Basis	Evaluation
0	(off)	Not called; guarded by `use_W == 0`.
1	Polynomial (degree `W_degree`)	Monomials $1, x, x^2, \ldots, x^{d_{\text{poly}}}$ evaluated at both arguments; difference taken.
2	B-spline	Cox–de Boor recursion re-evaluated at both `theta_ref` and `theta_anchor` on the knot vector `W_knots_full`; difference of the two B-spline coefficient vectors. Because evaluation depends on the current `theta_ref`, which changes every HMC leapfrog step, this is non-separable in the Hamiltonian sense.

The function is called once per observation $i$, coordinate $k$, and leapfrog step — i.e., $O(n \cdot p)$ evaluations per gradient evaluation.

`data` Block

All inputs supplied from R via rstan::sampling(..., data = stan_data).

Scalar dimensions and model flags

Declaration	Meaning
`int<lower=1> n`	Number of observations.
`int<lower=1> p`	Dimension of the multivariate outcome $\mathbf{y}_i \in \mathbb{R}^p$ (or $\mathbb{Z}^p$).
`int<lower=1, upper=4> family_id`	Homogeneous likelihood family across all $p$ coordinates: 1 = Gaussian, 2 = Poisson, 3 = NB2, 4 = Bernoulli.
`int<lower=0, upper=1> use_a`	Toggle additive component $a_k(\mathbf{x})$.
`int<lower=0, upper=1> use_b`	Toggle multiplicative component $b_k(\mathbf{x})$.
`int<lower=0, upper=1> use_W`	Toggle modulating component $W(\theta_{\text{ref}})$.

Ragged design matrix dimensions

Declaration	Meaning
`int<lower=0> J_a_max`	$\max_k J_{a,k}$: the widest additive basis across coordinates. Padded column count.
`int<lower=0> J_b_max`	$\max_k J_{b,k}$: the widest multiplicative basis.
`array[p] int<lower=0> J_a_per_k`	Per-coordinate active column count for $Z_a$. Element $k$ equals 0 when the additive component is inactive for coordinate $k$.
`array[p] int<lower=0> J_b_per_k`	Analogous for $Z_b$.

Padded design matrices

array[p] matrix[n, J_a_max] Z_a;
array[p] matrix[n, J_b_max] Z_b;

For coordinate $k$, only the sub-matrix Z_a[k][, 1:J_a_per_k[k]] is referenced. The remaining padded columns occupy memory but never enter any computation. The R-side constructor amm_spec_multi_designs() column-centers every active column so that $\text{colMean}(\mathbf{Z}_{a,k}[,j]) = 0$, ensuring identifiability condition (C2): $\mathbb{E}[a_k(\mathbf{X})] = 0$ for any coefficient vector.

Modulating component (W) data

Declaration	Meaning
`int<lower=0> dim_W`	Total number of basis rows across all coordinates. For separable bases, $\text{dim\_W} = p \times W_{\text{per\_k\_dim}}$.
`int<lower=0> d`	Dimension of the modulator covariate vector $\mathbf{x}_i \in \mathbb{R}^d$.
`int<lower=0> W_per_k_dim`	Number of basis rows allocated to each coordinate: $W_{\text{per\_k\_dim}} = \text{dim\_W}/p$.
`matrix[n, d] X`	Modulator covariate matrix. Row $i$ is $\mathbf{x}_i^\top$.

Outcome matrices

matrix[n, p] y_real;
array[n, p] int y_int;

Both are allocated. Only the one matching family_id is consumed:

y_real when family_id == 1 (Gaussian).
y_int when family_id ∈ {2, 3, 4} (Poisson, NB2, Bernoulli).

Anchoring

vector[p] theta_anchor;

The fixed reference anchor $\boldsymbol{\theta}_{\text{anchor}} \in \mathbb{R}^p$. Entered once and held constant throughout sampling. Satisfies condition (C4): the modulating component is anchored by evaluating $W(\theta_{\text{ref}}) - W(\theta_{\text{anchor}})$.

Dispersion toggles and hyperpriors

Declaration	Meaning
`int<lower=0, upper=1> use_dispersion_y`	Whether the Gaussian family uses a per-coordinate observation noise $\sigma_{y,k}$.
`int<lower=0, upper=1> use_dispersion_phi`	Whether the NB2 family uses a per-coordinate dispersion $\phi_k$.

W basis dispatch parameters

Declaration	Meaning
`int<lower=0, upper=2> W_type_id`	0 = off, 1 = polynomial, 2 = B-spline.
`int<lower=0> W_n_knots_full`	Length of the (possibly padded) knot vector for B-splines.
`vector[W_n_knots_full] W_knots_full`	Knot positions. Ignored for polynomial.
`int<lower=0> W_degree`	Polynomial degree or B-spline degree.

Grouping (hierarchical extension, Block 6.5)

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;

When use_groups == 0 (default): J_groups == 1 and group_id[i] == 1 for all $i$, collapsing to a single global $\boldsymbol{\theta}_{\text{ref}}$. When use_groups == 1: each group $g \in \{1,\ldots,J_{\text{groups}}\}$ gets its own $\boldsymbol{\theta}_{\text{ref}}[g]$, drawn from a common Normal hyperprior. Identifiability condition (C5) (enforced on the R side) requires that Z_a[k] must not contain a factor(group) indicator when use_groups == 1.

Placeholder

{{DATA_CP_A_PER_K_DECL}}

Substituted at code-generation time. Declares any additional data needed for the configurable CP/NCP (centered / non-centered) parametrization of a_raw, e.g., a vector of per-coordinate logical flags cp_a_per_k.

`transformed data` Block

Computes integer bookkeeping for flat-packed coefficient vectors.

array[p] int J_a_free;
array[p] int J_b_free;
array[p] int a_raw_offset;
array[p] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;

Size calculations

Declaration	Formula	Purpose
`dim_W_eff`	`(use_W == 1) ? dim_W : 0`	Actual rows of `W_raw`; 0 when W is off.
`d_eff`	`(use_W == 1) ? d : 0`	Actual columns of `W_raw`; 0 when W is off.
`sigma_a_size`	`(use_a == 1) ? p : 0`	Number of per-coordinate $\sigma_a$ scalars.
`sigma_b_size`	`(use_b == 1) ? p : 0`	Number of per-coordinate $\sigma_b$ scalars.
`sigma_W_size`	`(use_W == 1 && dim_W > 0) ? 1 : 0`	Single global $\sigma_W$ (0 or 1).
`sigma_y_size`	`(use_dispersion_y == 1) ? p : 0`	Per-coordinate Gaussian $\sigma_{y,k}$.
`phi_size`	`(use_dispersion_phi == 1) ? p : 0`	Per-coordinate NB2 dispersion $\phi_k$.

Per-coordinate free counts and flat offsets

for (k in 1:p) {
    J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
    J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
    a_raw_offset[k] = total_J_a_free;
    c_b_raw_offset[k] = total_J_b_free;
    total_J_a_free += J_a_free[k];
    total_J_b_free += J_b_free[k];
}

For each coordinate $k$:

$J_{a,\text{free}}[k]$: number of actually-sampled additive coefficients (0 if component is inactive or has zero columns).
$J_{b,\text{free}}[k]$: same for multiplicative.
a_raw_offset[k]: the 0-based index into the flat vector a_raw where coordinate $k$'s block begins. The segment for coordinate $k$ is a_raw[(a_raw_offset[k]+1) : (a_raw_offset[k]+J_a_free[k])] (1-indexed in Stan).
c_b_raw_offset[k]: same for c_b_raw.
After the loop, total_J_a_free = \sum_{k=1}^p J_{a,\text{free}}[k] and similarly for total_J_b_free.

This implements ragged-array emulation — Stan has no native ragged arrays, so coordinates with different basis widths are flat-packed into a single vector with per-coordinate offset arithmetic.

`parameters` Block

All parameters on the unconstrained sampling space.

Group-level reference parameters

array[J_groups] vector[p] theta_ref;
array[use_groups] vector[p] mu_theta_ref;
array[use_groups] vector<lower=0>[p] sigma_theta_ref;

Parameter	Dimension	Constr.	Role
`theta_ref[g]`	$\mathbb{R}^p$ for $g=1,\ldots,J_{\text{groups}}$	unconstrained	Per-group reference anchor $\boldsymbol{\theta}_{\text{ref}}[g]$. When `use_groups == 0`, only `theta_ref[1]` is sampled.
`mu_theta_ref[1]`	$\mathbb{R}^p$	unconstrained	Hyperprior mean $\boldsymbol{\mu}_{\theta}$. Allocated only when `use_groups == 1`.
`sigma_theta_ref[1]`	$\mathbb{R}_{>0}^p$	`lower=0`	Hyperprior scale $\boldsymbol{\sigma}_{\theta}$. Allocated only when `use_groups == 1`.

Additive component

array[sigma_a_size] real<lower=0> sigma_a;
vector[total_J_a_free] a_raw;

Parameter	Role
`sigma_a[k]`	Per-coordinate prior scale for the additive coefficients. Exists only when `use_a == 1`.
`a_raw`	Flat-packed raw additive coefficients. Depending on `cp_a_per_k[k]`, these are either the actual coefficients (centered parametrization, CP) or standard-normal auxiliary variables mapped via $\mathbf{a}_k = \sigma_{a,k} \cdot \mathbf{a}_{\text{raw},k}$ (non-centered parametrization, NCP). The substitution `{{TP_A_BLOCK}}` and `{{MODEL_A_BLOCK}}` handle the dispatch.

Multiplicative component

array[sigma_b_size] real<lower=0> sigma_b;
vector[total_J_b_free] c_b_raw;

Parameter	Role
`sigma_b[k]`	Per-coordinate prior scale for the multiplicative (reparametrized) coefficients.
`c_b_raw`	Flat-packed standard-normal raw variables for $\mathbf{c}_b$. Always sampled NCP: `c_b_raw ~ normal(0, 1)`, then transformed to `c_b[k] = sigma_b[k] * c_b_raw[k]` in `transformed parameters`.

The key reparametrization: we sample $\mathbf{c}_b$ directly (not $\mathbf{b}$), where $\mathbf{c}_{b,k} = \theta_{\text{ref}}[k] \cdot \mathbf{b}_k$. This makes the additive contribution $\mathbf{Z}_{b,k} \mathbf{c}_{b,k}$ linear in the parameters, eliminating the bimodality documented in Recovery 2 (the $(\theta_{\text{ref}}, \mathbf{b})$ space has a ridge where sign flips of $\theta_{\text{ref}}$ and $\mathbf{b}$ produce the same likelihood).

Modulating component

array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;

Parameter	Role
`sigma_W[1]`	Single global prior scale for W coefficients.
`W_raw`	$\text{dim\_W} \times d$ coefficient matrix. Rows are partitioned into $p$ consecutive blocks of $W_{\text{per\_k\_dim}}$ rows each; block $k$ corresponds to coordinate $k$ of $\theta_{\text{ref}}$. The prior on `W_raw` is either direct (CP: `W_raw ~ ...`) or involves a scaling by `sigma_W` (NCP: the `{{W_SCALE}}` placeholder in `transformed parameters`).

Dispersion parameters

array[sigma_y_size] real<lower=0> sigma_y;
array[phi_size] real<lower=0> phi;

Parameter	Family	Role
`sigma_y[k]`	Gaussian (`family_id == 1`)	Per-coordinate observation standard deviation $\sigma_{y,k}$.
`phi[k]`	NB2 (`family_id == 3`)	Per-coordinate reciprocal dispersion (Stan's mean–dispersion parametrization: $\text{NB2}(\mu, \phi)$ with $\text{Var} = \mu + \mu^2/\phi$).

`transformed parameters` Block

Constructs the linear predictor $\eta_{i,k}$ from the AMM decomposition, and exposes derived coefficient vectors.

Declarations

array[p] vector[J_a_max] a_coef;
array[p] vector[J_b_max] c_b;
array[p] vector[J_b_max] b_coef;
matrix[n, p] eta;

Variable	Shape	Meaning
`a_coef[k]`	$J_{a,\text{max}}$	Additive coefficients for coordinate $k$. Padded to length $J_{a,\text{max}}$; only indices $1:J_{a,\text{free}}[k]$ are non-zero.
`c_b[k]`	$J_{b,\text{max}}$	Reparametrized multiplicative coefficients: $\mathbf{c}_{b,k} = \sigma_{b,k} \cdot \mathbf{c}_{b,\text{raw},k}$.
`b_coef[k]`	$J_{b,\text{max}}$	Recovered multiplicative coefficients: $\mathbf{b}_k = \mathbf{c}_{b,k} / \theta_{\text{ref}}[1][k]$ (only computed when `use_groups == 0` and $\theta_{\text{ref}}[1][k] \neq 0$; otherwise stays at zero — a derived quantity, not sampled).
`eta[i,k]`	$n \times p$	Linear predictor. Its interpretation depends on `family_id`.

Initialization to zero

for (k in 1:p) {
    a_coef[k] = rep_vector(0, J_a_max);
    c_b[k]     = rep_vector(0, J_b_max);
    b_coef[k]  = rep_vector(0, J_b_max);
}

Ensures all padded positions are identically zero.

Additive coefficient construction

{{TP_A_BLOCK}}

Placeholder replaced at code-generation time. Implements per-coordinate CP/NCP logic:

CP (cp_a_per_k[k] == 1): a_coef[k][1:J_a_free[k]] = a_raw[(a_raw_offset[k]+1):(a_raw_offset[k]+J_a_free[k])]. The coefficients are sampled directly; the prior {{MODEL_A_BLOCK}} then targets a_raw with scale $\sigma_{a,k}$.
NCP (cp_a_per_k[k] == 0): a_coef[k][1:J_a_free[k]] = sigma_a[k] * a_raw[...]. The raw variables are standard normal; the scaling by $\sigma_{a,k}$ happens here.

Mathematically: $$\mathbf{a}k = \begin{cases} \mathbf{a}{\text{raw},k} & \text{(CP: prior has scale } \sigma_{a,k}\text{)} \ \sigma_{a,k} \cdot \mathbf{a}_{\text{raw},k} & \text{(NCP: raw} \sim \mathcal{N}(0, I)\text{)} \end{cases}$$

Multiplicative coefficient construction

if (use_b == 1) {
    for (k in 1:p) {
      if (J_b_per_k[k] > 0) {
        for (j in 1:J_b_free[k]) {
          c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
        }
        if (use_groups == 0 && theta_ref[1][k] != 0) {
          for (j in 1:J_b_per_k[k]) {
            b_coef[k][j] = c_b[k][j] / theta_ref[1][k];
          }
        }
      }
    }
}

Line-by-line:

Guard: only executed when the multiplicative component is active.
Per-coordinate loop over $k = 1,\ldots,p$.
Inner guard: skip coordinates with zero multiplicative basis columns.
NCP scaling: $\mathbf{c}_{b,k} = \sigma_{b,k} \cdot \mathbf{c}_{b,\text{raw},k}$. This is always NCP — the raw variables are standard normal (priors below).
Derived quantity b_coef: computed only in the single-anchor regime (use_groups == 0) and when $\theta_{\text{ref}}[1][k] \neq 0$. The formula is: $$b_{k,j} = \frac{c_{b,k,j}}{\theta_{\text{ref}}[1][k]}$$ With multiple groups there is no single canonical scalar to divide by, so b_coef remains at zero — users must derive $b_{k,j}^{(g)} = c_{b,k,j} / \theta_{\text{ref}}[g][k]$ post-hoc.

Linear predictor construction

for (i in 1:n) {
    int g_i = group_id[i];
    for (k in 1:p) {
      real theta_ref_ik = theta_ref[g_i][k];
      real eta_ik = theta_ref_ik;

For each observation $i$ and coordinate $k$:

Look up group index $g_i = \text{group\_id}[i]$.
Initialize $\eta_{i,k} = \theta_{\text{ref}}[g_i][k]$.

Additive term:

      if (use_a == 1 && J_a_per_k[k] > 0) {
        eta_ik += dot_product(
          to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
          a_coef[k][1:J_a_per_k[k]]
        );
      }

Adds $\mathbf{Z}_{a,k}[i,\, 1\!:\!J_{a,k}] \cdot \mathbf{a}_k[1\!:\!J_{a,k}]$.

Multiplicative term (reparametrized):

      if (use_b == 1 && J_b_per_k[k] > 0) {
        eta_ik += dot_product(
          to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
          c_b[k][1:J_b_per_k[k]]
        );
      }

Adds $\mathbf{Z}_{b,k}[i,\, 1\!:\!J_{b,k}] \cdot \mathbf{c}_{b,k}[1\!:\!J_{b,k}]$.

This encodes $b_k(\mathbf{x}_i) \cdot \theta_{\text{ref}}[g_i][k]$ via the linear reparametrization $\mathbf{c}_{b,k} = \theta_{\text{ref}}[k] \cdot \mathbf{b}_k$, so that: $$\mathbf{Z}{b,k}[i,:] \cdot \mathbf{c}{b,k} = \theta_{\text{ref}}[g_i][k] \cdot \underbrace{\mathbf{Z}_{b,k}[i,:] \cdot \mathbf{b}k}{b_k(\mathbf{x}_i)}$$

Modulating (W) term:

      if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
        vector[d] W_diff_x_k = rep_vector(0, d);
        vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
          W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
          W_degree, W_n_knots_full, W_knots_full
        );
        int row_start = (k - 1) * W_per_k_dim + 1;
        for (jj_local in 1:W_per_k_dim) {
          int row = row_start + jj_local - 1;
          for (jjj in 1:d) {
            W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
          }
        }
        eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
      }

Step by step:

Guard: all five conditions must hold (component active, non-zero dimensions).
W_diff_x_k: accumulator $\in \mathbb{R}^d$, initialized to zero.
basis_diff_k: the vector $\boldsymbol{\delta}_k$ with elements: $$\delta_{k,j} = \phi_j!\bigl(\theta_{\text{ref}}[g_i][k]\bigr) - \phi_j!\bigl(\theta_{\text{anchor}}[k]\bigr), \quad j = 1,\ldots,W_{\text{per_k_dim}}$$
row_start = (k-1)*W_per_k_dim + 1: the first row in W_raw belonging to coordinate $k$.
Inner double loop: computes $$\texttt{W_diff_x_k}[c] = \sum_{j=1}^{W_{\text{per_k_dim}}} \delta_{k,j} \cdot W_{\text{raw}}[\text{row}j, c] \cdot \underbrace{\texttt{W_SCALE}}{\text{scaling factor}}$$ for each column $c = 1,\ldots,d$. The {{W_SCALE}} placeholder is either empty (CP) or * sigma_W[1] (NCP). The matrix–vector product is: $$\mathbf{W}{\text{diff}}^{(k)} = \mathbf{W}{\text{raw}}^{(k)\top} , \boldsymbol{\delta}_k \quad \in \mathbb{R}^d$$ where $\mathbf{W}_{\text{raw}}^{(k)}$ is the $W_{\text{per\_k\_dim}} \times d$ sub-block of W_raw for coordinate $k$.
Dot product: $\eta_{i,k} \mathrel{+}= \bigl(\mathbf{W}_{\text{diff}}^{(k)}\bigr)^\top \mathbf{x}_i$.

Assembling all three additive components, the full linear predictor is:

$$\boxed{\eta_{i,k} = \theta_{\text{ref}}[g_i][k] + \underbrace{\mathbf{Z}_{a,k}[i,:] , \mathbf{a}_k}_{\text{additive } a_k(\mathbf{x}_i)} + \underbrace{\mathbf{Z}_{b,k}[i,:] , \mathbf{c}_{b,k}}_{\text{mult. (reparam.) } b_k(\mathbf{x}_i),\theta_{\text{ref}}[k]} + \underbrace{\left(\mathbf{W}_{\text{raw}}^{(k)\top},\boldsymbol{\delta}_k\right)^\top \mathbf{x}_i}_{\text{modulating } (W_k(\theta_{\text{ref}})-W_k(\theta_{\text{anchor}}))^\top \mathbf{x}_i}}$$

The coordinate-wise factorization means cross-dimensional coupling enters only through the shared $\theta_{\text{ref}}[g_i]$ in the W term (condition (C4-bis)); the likelihood itself factorizes as $\prod_{k=1}^p D(y_{ik} \mid \eta_{i,k})$.

`model` Block

Priors

Group-level reference:

if (use_groups == 1) {
    mu_theta_ref[1]       ~ {{PRIOR_THETA_REF}};
    sigma_theta_ref[1]    ~ {{PRIOR_SIGMA_THETA_REF}};
    for (g in 1:J_groups) {
      theta_ref[g] ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
    }
} else {
    theta_ref[1] ~ {{PRIOR_THETA_REF}};
}

use_groups == 1 (hierarchical):
- Hyperprior on $\boldsymbol{\mu}_\theta$: coordinate-wise prior {{PRIOR_THETA_REF}} (placeholder, e.g., normal(0, 5)).
- Hyperprior on $\boldsymbol{\sigma}_\theta$: coordinate-wise prior {{PRIOR_SIGMA_THETA_REF}} (e.g., exponential(1)).
- Group-level: for each $g,\ \boldsymbol{\theta}_{\text{ref}}[g] \sim \mathcal{N}(\boldsymbol{\mu}_\theta, \boldsymbol{\sigma}_\theta)$ (coordinate-wise independent).
The joint density is: $p(\boldsymbol{\theta}_{\text{ref}}, \boldsymbol{\mu}_\theta, \boldsymbol{\sigma}_\theta) = p(\boldsymbol{\mu}_\theta)\, p(\boldsymbol{\sigma}_\theta) \prod_{g=1}^{J_{\text{groups}}} \prod_{k=1}^p \mathcal{N}\!\bigl(\theta_{\text{ref}}[g][k] \mid \mu_{\theta,k},\, \sigma_{\theta,k}\bigr)$
use_groups == 0 (single anchor): $\boldsymbol{\theta}_{\text{ref}}[1] \sim$ {{PRIOR_THETA_REF}} (coordinate-wise). This collapses to the Block 6 model.

Additive component:

if (use_a == 1) {
    sigma_a ~ {{PRIOR_SIGMA_A}};
    {{MODEL_A_BLOCK}}
}

sigma_a[k]: per-coordinate prior {{PRIOR_SIGMA_A}} (e.g., exponential(1)).
{{MODEL_A_BLOCK}}: dispatches CP vs NCP per coordinate. For CP coordinates, targets a_raw segments with $\mathcal{N}(0, \sigma_{a,k}^2)$. For NCP coordinates, targets a_raw segments with $\mathcal{N}(0, 1)$ (the scaling by $\sigma_{a,k}$ happened in transformed parameters).

Multiplicative component:

if (use_b == 1) {
    sigma_b ~ {{PRIOR_SIGMA_B}};
    c_b_raw ~ normal(0, 1);
}

sigma_b[k]: per-coordinate prior (e.g., exponential(1)).
c_b_raw: always $\mathcal{N}(0, I)$. This is the NCP for $\mathbf{c}_b$: the transformation $\mathbf{c}_b = \sigma_b \cdot \mathbf{c}_{b,\text{raw}}$ was applied in transformed parameters. The Jacobian of this linear transformation is constant ($\prod_k \sigma_{b,k}^{J_{b,\text{free}}[k]}$) and is automatically handled by Stan's change-of-variables for unconstrained parameters (though technically c_b_raw is unconstrained, so the Jacobian contributes to the log-density through the transformed parameters assignment).

Modulating component:

if (use_W == 1 && dim_W > 0) {
    sigma_W[1] ~ {{PRIOR_SIGMA_W}};
    to_vector(W_raw) ~ {{W_PRIOR}};
}

sigma_W[1]: global prior scale for W (e.g., exponential(1)).
W_raw: flattened to a vector and assigned prior {{W_PRIOR}} (e.g., normal(0, 1) for NCP, or normal(0, sigma_W[1]) for CP — dispatched by code generation).

Dispersion parameters:

if (use_dispersion_y == 1) {
    sigma_y ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
    phi ~ {{PRIOR_PHI}};
}

Per-coordinate priors on $\sigma_{y,k}$ (Gaussian SD) and $\phi_k$ (NB2 dispersion).

Likelihood

for (k in 1:p) {
    if (family_id == 1) {
      y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
    } else if (family_id == 2) {
      y_int[:, k] ~ poisson_log(eta[:, k]);
    } else if (family_id == 3) {
      y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
    } else if (family_id == 4) {
      y_int[:, k] ~ bernoulli_logit(eta[:, k]);
    }
}

For each coordinate $k$, the likelihood contribution is:

`family_id`	Distribution	Link	Log-likelihood contribution $\ell_{i,k}$
1 (Gaussian)	$y_{ik} \sim \mathcal{N}(\eta_{i,k},\, \sigma_{y,k}^2)$	identity	$-\tfrac{1}{2}\log(2\pi\sigma_{y,k}^2) - \frac{(y_{ik}-\eta_{i,k})^2}{2\sigma_{y,k}^2}$
2 (Poisson)	$y_{ik} \sim \text{Poisson}(\exp(\eta_{i,k}))$	log	$y_{ik}\,\eta_{i,k} - \exp(\eta_{i,k}) - \log(y_{ik}!)$
3 (NB2)	$y_{ik} \sim \text{NB2}(\exp(\eta_{i,k}),\, \phi_k)$	log	$y_{ik}\log\frac{\phi_k}{\phi_k+e^{\eta_{i,k}}} + \phi_k\log\frac{\phi_k}{\phi_k+e^{\eta_{i,k}}} + \log\Gamma(y_{ik}+\phi_k) - \log\Gamma(\phi_k) - \log(y_{ik}!)$
4 (Bernoulli)	$y_{ik} \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_{i,k}))$	logit	$y_{ik}\,\eta_{i,k} - \log(1+e^{\eta_{i,k}})$

Note: y_real[:, k] is Stan's column-slice syntax, selecting all rows of column $k$ — equivalent to col(y_real, k).

The total log-posterior up to a constant is:

$$\log p(\boldsymbol{\Theta} \mid \mathbf{y}) \propto \sum_{i=1}^n \sum_{k=1}^p \ell_{i,k} + \log p(\boldsymbol{\theta}_{\text{ref}}) + \log p(\mathbf{a}) + \log p(\mathbf{c}_b) + \log p(\mathbf{W}) + \text{dispersion priors}$$

where $\boldsymbol{\Theta}$ collects all parameters.

Reference-anchoring decomposition summary

The entire template realizes the AMM canonical form:

$$\theta_{i}[k] = \underbrace{\theta_{\text{ref}}[g_i][k]}_{\text{reference anchor}} + \underbrace{a_k(\mathbf{x}_i)}_{\text{additive (C2)}} + \underbrace{b_k(\mathbf{x}_i),\theta_{\text{ref}}[g_i][k]}_{\text{multiplicative, via } c_b = \theta_{\text{ref}} \cdot b} + \underbrace{\bigl(W_k(\theta_{\text{ref}}) - W_k(\theta_{\text{anchor}})\bigr)^\top \mathbf{x}_i}_{\text{modulating (C4)}}$$

with the following identifiability conditions enforced:

(C1): The reference $\theta_{\text{ref}}$ is identifiable through its role as intercept and through the W coupling.
(C2): Column-centering of $\mathbf{Z}_a$ ensures $\mathbb{E}[a_k(\mathbf{X})] = 0$, preventing additive confounding with $\theta_{\text{ref}}$.
(C3): Column-centering of $\mathbf{Z}_b$ ensures $\mathbb{E}[b_k(\mathbf{X})] = 0$.
(C4): The reparametrization $W_k(\theta_{\text{ref}}) - W_k(\theta_{\text{anchor}})$ eliminates the $W$-level intercept, anchoring the modulating component.
(C5): No group indicator in $\mathbf{Z}_a$ when use_groups == 1 (enforced on the R side).

The generated quantities block (section 2 of 2) would follow, containing posterior predictive quantities, log-likelihoods, and any additional derived parameters.

`inst/stan/_canonical_pieces/amm_canonical_pmulti.stan` — Section 2 of 2

This section contains the tail of the model block (a single closing brace) and the entire generated quantities block. Because section 1 ended mid-model block, the leading } here terminates it. Everything that follows is post-sampling bookkeeping: pointwise log-likelihood assembly, posterior-predictive replication, and the AMM "individual predictor" handle theta_i.

Closing of the `model` block

This brace closes the model block opened in section 1. Section 1 is responsible for accumulating the joint log-density (priors on all AMM-anchored parameters plus the sampling distribution contributions via target += or sampling statements). No further target updates occur in section 2; everything below is computed conditionally on the current posterior draw and does not affect the sampler.

`generated quantities` block

generated quantities {

Opens the generated-quantities section. Executed once per HMC/NUTS leapfrog-accepted draw (and once per warmup draw on most interfaces, though only post-warmup draws are stored). All quantities declared here are tracked by the sampler's output chain. Three matrices are produced: log_lik, theta_i, and y_pred.

Declaration 1 — pointwise log-likelihood store

  matrix[n, p] log_lik;

Declares an $n \times p$ real matrix to hold the per-observation, per-outcome log-likelihood contributions $\log p(y_{ik} \mid \eta_{ik}, \dots)$. This layout is the canonical input for loo::loo() / loo::loo_matrix() and for $k$-fold / Pareto-smoothed importance sampling leave-one-out cross-validation. Storing it elementwise (rather than summed) is what permits PSIS-LOO to reweight at the $(i,k)$ resolution.

Declaration 2 — AMM individual predictor handle

  matrix[n, p] theta_i = eta;

Declares theta_i as an $n \times p$ matrix and initializes it to the current value of eta. In the AMM (Anchored Measurement / reference-anchoring) decomposition, $\eta_{ik}$ is the anchored linear predictor for individual $i$ on outcome $k$, constructed in section 1 by decomposing

$$ \eta_{ik} ;=; \alpha_k ;+; \underbrace{x_i^\top \beta_k}_{\text{covariate contribution}} ;+; \underbrace{\sum_{r} \Lambda_{kr}, f_{ir}}_{\text{anchored latent factor contribution}} ;+; \dots $$

where the anchoring identifies the latent factors $f_{ir}$ against a reference outcome (or reference level) so that the scale, sign, and location of the latent space are pinned to observables rather than to arbitrary identifying constraints. The assignment theta_i = eta is a pass-through alias: mathematically $\theta_i \equiv \eta$, i.e.

$$ \theta_{ik} ;:=; \eta_{ik}. $$

It introduces no transformation and no Jacobian. Its purpose is purely semantic and downstream-facing: it exposes the AMM "individual-level predictor" under the conventional name $\theta_i$ so that R-side code (gdpar's extractors, plotting, summary methods) can retrieve the anchored predictor without re-deriving it from eta. Because the canonical-link parameterization is used (see below), $\theta_{ik}$ is on the natural-parameter scale of the chosen exponential family, which is exactly the scale on which AMM anchoring is most cleanly defined.

Declaration 3 — posterior-predictive store

  matrix[n, p] y_pred;

Declares an $n \times p$ matrix to hold one posterior-predictive replicate $y^{\text{rep}}_{ik}$ per stored draw. Filled inside the loop below using the _rng variants of each family's sampler.

The double loop

  for (i in 1:n) {
    for (k in 1:p) {

Iterates over all $n$ observations and all $p$ outcomes. The body dispatches on the integer flag family_id (declared in data in section 1) to select the exponential-family observation model. Because the dispatch is uniform across $k$, all $p$ outcomes share the same family in this canonical template; multi-family extensions would require a per-$k$ family vector. The canonical-link convention is enforced by the choice of _log / _logit suffixed Stan functions, which take eta directly on the natural-parameter scale.

Branch `family_id == 1` — Gaussian / normal

      if (family_id == 1) {
        log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
        y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
      }

Observation model:

$$ y_{ik} ;\sim; \mathcal{N}!\left(\eta_{ik},; \sigma_{y,k}\right), $$

with identity link $g(\mu_{ik}) = \mu_{ik} = \eta_{ik}$ (the canonical link for the Gaussian family). The pointwise log-likelihood stored is

$$ \log p(y_{ik}\mid \eta_{ik}, \sigma_{y,k}) ;=; -\tfrac{1}{2}\log(2\pi\sigma_{y,k}^{2}) ;-; \frac{(y_{ik}-\eta_{ik})^{2}}{2\sigma_{y,k}^{2}}. $$

y_real (declared in data, section 1) is the continuous-response matrix used only in this branch. sigma_y[k] is the outcome-specific standard deviation (a parameters entry from section 1, presumably constrained <lower=0>). The posterior-predictive draw is

$$ y^{\text{rep}}_{ik} ;\sim; \mathcal{N}!\left(\eta_{ik},; \sigma_{y,k}\right). $$

Branch `family_id == 2` — Poisson with log link

      } else if (family_id == 2) {
        log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = poisson_log_rng(eta[i, k]);
      }

Observation model:

$$ y_{ik} ;\sim; \text{Poisson}!\left(\mu_{ik}\right), \qquad \log \mu_{ik} ;=; \eta_{ik}, \qquad \mu_{ik} ;=; e^{\eta_{ik}}. $$

The log link is the canonical link for the Poisson family, so eta enters poisson_log_lpmf directly without manual exponentiation. The stored log-likelihood is

$$ \log p(y_{ik}\mid \eta_{ik}) ;=; y_{ik},\eta_{ik} ;-; e^{\eta_{ik}} ;-; \log(y_{ik}!). $$

y_int (declared in data, section 1) is the integer-response matrix. The posterior-predictive draw is

$$ y^{\text{rep}}_{ik} ;\sim; \text{Poisson}!\left(e^{\eta_{ik}}\right). $$

Branch `family_id == 3` — Negative binomial (NB2) with log link

      } else if (family_id == 3) {
        log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
        y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
      }

Observation model (NB2 parameterization, mean $\mu$ and reciprocal-dispersion $\phi$):

$$ y_{ik} ;\sim; \text{NegBinomial}_2!\left(\mu_{ik},; \phi_k\right), \qquad \log \mu_{ik} ;=; \eta_{ik}, \qquad \mu_{ik} ;=; e^{\eta_{ik}}, $$

with variance function

$$ \text{Var}(y_{ik}\mid \eta_{ik}, \phi_k) ;=; \mu_{ik} ;+; \frac{\mu_{ik}^{2}}{\phi_k}. $$

The log link is the canonical link for the NB2 family as parameterized by Stan. The stored log-likelihood is the NB2 pmf in log form,

$$ \log p(y_{ik}\mid \eta_{ik}, \phi_k) ;=; \log\Gamma(y_{ik}+\phi_k) ;-; \log\Gamma(\phi_k) ;-; \log(y_{ik}!) ;+; \phi_k \log!\left(\frac{\phi_k}{\phi_k + e^{\eta_{ik}}}\right) ;+; y_{ik} \log!\left(\frac{e^{\eta_{ik}}}{\phi_k + e^{\eta_{ik}}}\right). $$

phi[k] is the outcome-specific reciprocal-dispersion parameter (declared in parameters, section 1, presumably <lower=0>). The posterior-predictive draw is

$$ y^{\text{rep}}_{ik} ;\sim; \text{NegBinomial}_2!\left(e^{\eta_{ik}},; \phi_k\right). $$

Branch `family_id == 4` — Bernoulli with logit link

      } else if (family_id == 4) {
        log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
        y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
      }

Observation model:

$$ y_{ik} ;\sim; \text{Bernoulli}!\left(\pi_{ik}\right), \qquad \text{logit}(\pi_{ik}) ;=; \eta_{ik}, \qquad \pi_{ik} ;=; \frac{1}{1+e^{-\eta_{ik}}}. $$

The logit link is the canonical link for the Bernoulli/binomial family, so eta enters bernoulli_logit_lpmf directly. The stored log-likelihood is

$$ \log p(y_{ik}\mid \eta_{ik}) ;=; y_{ik},\eta_{ik} ;-; \log!\left(1 + e^{\eta_{ik}}\right), $$

where the second term is the softplus function evaluated in a numerically stable manner inside Stan. The posterior-predictive draw is

$$ y^{\text{rep}}_{ik} ;\sim; \text{Bernoulli}!\left(\text{logit}^{-1}(\eta_{ik})\right), $$

i.e. a 0/1 integer realization.

Closing of the dispatch and loops

    }
  }
}

Closes the k loop, the i loop, and the generated quantities block. No else clause is provided for family_id values outside $\{1,2,3,4\}$; if an unsupported value is supplied, the corresponding log_lik[i,k] and y_pred[i,k] entries remain at their default-initialized value (NaN for real matrices in Stan), which will propagate visibly into downstream R-side diagnostics.

How this realizes the AMM reference-anchoring decomposition

The generated-quantities block is the consumption point of the AMM decomposition, not its construction site (which lives in transformed parameters of section 1). Its role is threefold:

Expose the anchored predictor. The line matrix[n, p] theta_i = eta; re-binds the anchored linear predictor $\eta$ — already assembled in section 1 from the reference-anchored intercepts, covariate slopes, and identified latent factors — under the canonical AMM symbol $\theta_i$. Because the anchoring was performed on the natural-parameter scale (the canonical link scale), no further reparameterization is needed: $\theta_{ik}$ is exactly the natural parameter of the chosen exponential family.
Evaluate the likelihood on the canonical scale. Each branch feeds eta[i,k] (= $\theta_{ik}$) directly into the _log / _logit suffixed Stan functions, which interpret their first argument as the natural parameter of the family. This is the operational meaning of "canonical pieces" in the file path: the template hard-codes the canonical-link parameterization of each supported family, so the AMM predictor $\theta_i$ plugs in without a link-function transformation. Mathematically, for family $F$ with natural parameter $\theta$ and cumulant function $b(\theta)$,

$$ \log p(y\mid \theta) ;=; y,\theta ;-; b(\theta) ;+; c(y), $$

and the four branches instantiate this for the Gaussian ($b(\theta)=\theta^2/2$ on the mean scale with known $\sigma$), Poisson ($b(\theta)=e^\theta$), NB2 (with $\phi$-dependent $b$), and Bernoulli ($b(\theta)=\log(1+e^\theta)$) families respectively.
Produce posterior-predictive replicates on the same anchored scale. Each _rng call draws $y^{\text{rep}}_{ik}$ from the same sampling distribution whose natural parameter is the anchored $\eta_{ik}$, so posterior-predictive checks operate on the same identified parameter space that the AMM anchoring was designed to make interpretable. The reference anchoring guarantees that $\eta_{ik}$ (and hence the predictive draws) is invariant to the otherwise-arbitrary rotation/sign/scale of the latent factor space, because those degrees of freedom were absorbed into the anchoring constraints imposed in section 1.

No Jacobian terms appear in this section: the only transformation (theta_i = eta) is the identity, and the canonical-link parameterization means the map from $\eta$ to the mean/probability is the inverse of the canonical link, whose Jacobian is already absorbed into the _lpdf/_lpmf functions themselves (Stan's poisson_log_lpmf, neg_binomial_2_log_lpmf, and bernoulli_logit_lpmf internally apply the link and include the corresponding change-of-variables terms where relevant — none for discrete families, where the link is merely a reparameterization of the parameter, not of the data).

inst/stan/amm_eb_conditional_KxP.stan

`amm_eb_conditional_KxP.stan` — Empirical Bayes Conditional AMM Template (Path C)

1. Architectural Overview

This template implements the Empirical Bayes (EB) Conditional variant of the AMM (Anchor-Mediated Model) reference-anchoring decomposition within the gdpar package. It is designated Path 1 / Path C under sub-phase 8.6.D, canonizing decisions D36 (α), D37 (i), and D38″ (h).

The template is bit-identical to amm_eb_marginal_KxP.stan except for one structural change: the reference anchor $\theta_{\text{ref},k,j}$ is relocated from the parameters block (where it was sampled with a prior) to the data block (where it enters as a plug-in EB estimate, renamed theta_ref_kp_data). Consequently, the prior block on $\theta_{\text{ref}}$ is removed from model{}. This transforms the model from a marginal (fully Bayesian over the anchor) to a conditional (anchor conditioned upon) formulation.

The AMM Reference-Anchoring Decomposition

For observation $i$, group $g_i$, slot $k \in \{1,\dots,K\}$, and coordinate $j \in \{1,\dots,p\}$, the linear predictor is:

$$\eta_{k,i,j} = \theta_{\text{ref},g_i,k,j} ;+; \mathbf{z}_{a,k,j,i}^{\top},\mathbf{a}_{k,j} ;+; \mathbf{z}_{b,k,j,i}^{\top},\mathbf{c}_{b,k,j}$$

where:

$\theta_{\text{ref},g_i,k,j}$ is the group-specific reference anchor (plug-in EB data in this conditional template).
$\mathbf{a}_{k,j}$ is the additive deviation vector, constructed via non-centered parameterization: $\mathbf{a}_{k,j} = \sigma_{a,k}\,\mathbf{a}_{\text{raw},k,j}$.
$\mathbf{c}_{b,k,j}$ is the scaled b-component, also non-centered: $\mathbf{c}_{b,k,j} = \sigma_{b,k}\,\mathbf{c}_{b,\text{raw},k,j}$.
The anchor-mediated scaling $\mathbf{b}_{k,j}$ is defined (only when use_groups == 0) as $\mathbf{b}_{k,j} = \mathbf{c}_{b,k,j}\,/\,\theta_{\text{ref},k,j}$, expressing the multiplicative deviation relative to the anchor.

K-Slot Multi-Parametric Likelihood (Path B)

With $K=2$ (the initial coverage), the two slots jointly parameterize a single likelihood:

Slot 1 ($k=1$): location parameter (mean).
Slot 2 ($k=2$): dispersion parameter (log-scale of $\sigma$ or $\phi$).

Coordinate-Wise Factorization (Path A)

The $p$-dimensional response is factored coordinate-wise: conditional on the slots, each coordinate $j$ is independent. The full likelihood is a product over $i$ and $j$.

Supported Families (initial coverage)

`family_id_k_vector[1]`	Family	Slot 1 role	Slot 2 role
1	Gaussian	$\mu_{ij} = \eta_{1,i,j}$	$\log\sigma_{ij} = \eta_{2,i,j}$
3	Negative Binomial (type 2)	$\log\mu_{ij} = \eta_{1,i,j}$	$\log\phi_{ij} = \eta_{2,i,j}$

2. `data` Block

2.1 Core Dimensions

int<lower=1> n;
int<lower=2> K;
int<lower=2> p;

n: Number of observations. Constrained $\geq 1$.
K: Number of slots. Constrained $\geq 2$ because the multi-parametric likelihood requires at least a location slot and a dispersion slot.
p: Number of response coordinates (the dimensionality of the coordinate-wise factorization). Constrained $\geq 2$.

2.2 Family and Link Configuration

array[K] int<lower=1, upper=13> family_id_k_vector;
array[K] int<lower=0, upper=2> inv_link_id_per_slot;

family_id_k_vector: Per-slot integer family identifier. The range $[1,13]$ accommodates up to 13 distributional families. In the current template body, only family_id_k_vector[1] (the slot-1 family) is inspected to dispatch the likelihood; slot 2 is implicitly the dispersion slot. Values used in the body: 1 (Gaussian) and 3 (Negative Binomial type 2).
inv_link_id_per_slot: Per-slot inverse-link identifier $\in \{0,1,2\}$. Declared but not referenced in the template body — the link is hard-coded per family (identity for Gaussian mean, exp for dispersion/NB mean). This is a configuration placeholder for future generalization.

2.3 Component Usage Flags

array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=0> use_W;

use_a_k: Binary flag per slot. If use_a_k[k] == 1, slot $k$ includes an additive $a()$ component.
use_b_k: Binary flag per slot. If use_b_k[k] == 1, slot $k$ includes a $b()$ (anchor-mediated scaling) component.
use_W: Constrained <lower=0, upper=0>, i.e., always exactly 0. This is a structural placeholder indicating that a $W$-matrix (global cross-slot mixing) pathway is disabled by construction. It is declared for interface compatibility but can never take a nonzero value.

2.4 Design Matrix Dimensions

int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;

J_a_max: Maximum column count of any $a()$ design matrix across all $(k,j)$ pairs. Used as the static column dimension of the padded design arrays.
J_b_max: Analogous maximum for $b()$ design matrices.
J_a_per_kp[k,j]: Actual (unpadded) number of $a()$ design columns for slot $k$, coordinate $j$. May be 0 (no $a$ terms for that pair).
J_b_per_kp[k,j]: Analogous for $b()$.

2.5 Design Matrices

array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;

Z_a_kp[k,j]: $n \times J_{a,\max}$ design matrix for the additive component of slot $k$, coordinate $j$. Columns beyond J_a_per_kp[k,j] are padding (ignored at evaluation time via slicing 1:J_a_per_kp[k,j]).
Z_b_kp[k,j]: $n \times J_{b,\max}$ design matrix for the $b()$ component. Same padding convention.

2.6 Response Data

matrix[n, p] y_real;
array[n, p] int y_int;

y_real: $n \times p$ matrix of real-valued responses. Used when family_id_k_vector[1] == 1 (Gaussian).
y_int: $n \times p$ integer array of count responses. Used when family_id_k_vector[1] == 3 (Negative Binomial).

Both are always passed; the unused one is simply ignored by the dispatched likelihood.

2.7 Anchor Data

array[K] vector[p] theta_anchor_kp;

theta_anchor_kp: Per-slot anchor vector of length $p$. Declared in data but never referenced in the template body. This is a legacy/interface declaration — the R-side wrapper may use it for construction or validation, but the Stan model itself reads the anchor from theta_ref_kp_data (below). Its presence ensures interface symmetry with the marginal template.

2.8 Dispersion Flags

array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;

use_dispersion_y_k: Per-slot flag for a population-level $\sigma_y$ dispersion parameter.
use_dispersion_phi_k: Per-slot flag for a population-level $\phi$ dispersion parameter.

These control the sizing of sigma_y_pop_k and phi_pop_k parameter arrays (see transformed data). Note: in the current template body, these population dispersion parameters receive priors but do not enter the likelihood — they are sampled but unused in $\eta$ or the data-generating distribution. This appears to be scaffolding for a population-level dispersion layer not yet wired into the predictor.

2.9 Group Structure

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;

use_groups: Binary flag. If 1, observations are assigned to groups and the reference anchor varies by group.
J_groups: Number of groups. Must be $\geq 1$.
group_id: Per-observation group index $g_i \in \{1,\dots,J_{\text{groups}}\}$.

When use_groups == 0, all observations share the same anchor theta_ref_kp_data[1, k][j] (group index 1), and the $b$-coefficient decomposition $\mathbf{b} = \mathbf{c}_b / \theta_{\text{ref}}$ is computed. When use_groups == 1, the anchor varies by group but the $b$-decomposition is skipped (because a single $\mathbf{b}$ cannot represent group-varying $\theta_{\text{ref}}$).

2.10 Redundant Dimension Mirrors

int<lower=2> K_slots;
int<lower=2> p_dim;

K_slots: Redundant copy of K. Declared for interface/naming consistency. Not referenced in the body.
p_dim: Redundant copy of p. Same purpose. Not referenced in the body.

2.11 Plug-in EB Anchor (the Conditional Template's Defining Data)

array[J_groups, K] vector[p] theta_ref_kp_data;

theta_ref_kp_data: The plug-in Empirical Bayes reference anchor, shaped as a 3D array $[J_{\text{groups}}, K, p]$. Element theta_ref_kp_data[g, k][j] gives $\theta_{\text{ref},g,k,j}$ — the anchor for group $g$, slot $k$, coordinate $j$.

This is the single structural difference from the marginal template: here $\theta_{\text{ref}}$ is data (conditioned upon), not a parameter (sampled). The EB procedure on the R side produces a point estimate of $\theta_{\text{ref}}$ (e.g., from a previous fitting stage or a method-of-moments calculation), packs it into this 3D array via cmdstanr's automatic array packing, and passes it in. The Stan model then treats it as known.

3. `transformed data` Block

This block performs all index precomputation: it determines the layout of the flattened parameter vectors (a_raw, c_b_kp_raw), computes offsets, sizes the dispersion parameter arrays, and implements the Option A compactification of $\sigma_{a,k}$.

3.1 Free-Coefficient Counts and Offsets

array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;
array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;
int total_J_a_free = 0;
int total_J_b_free = 0;
int any_use_a = 0;
int any_use_b = 0;
int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;

Declarations:

J_a_free_kp[k,j]: Number of free $a()$ coefficients for slot $k$, coordinate $j$. A coefficient is free only if use_a_k[k] == 1 and J_a_per_kp[k,j] > 0. Otherwise it is 0.
J_b_free_kp[k,j]: Analogous for $b()$.
a_raw_offset_kp[k,j]: Starting index (0-based) of the $(k,j)$ block within the flattened a_raw vector.
c_b_raw_offset_kp[k,j]: Starting index of the $(k,j)$ block within c_b_kp_raw.
total_J_a_free: Total count of free $a()$ coefficients across all $(k,j)$. Determines the length of a_raw.
total_J_b_free: Total count of free $b()$ coefficients. Determines the length of c_b_kp_raw.
any_use_a: Global flag — 1 if any $(k,j)$ has free $a()$ coefficients, else 0.
any_use_b: Global flag — 1 if any $(k,j)$ has free $b()$ coefficients, else 0.
sigma_y_size: Total count of population $\sigma_y$ parameters (sum of use_dispersion_y_k).
phi_size: Total count of population $\phi$ parameters (sum of use_dispersion_phi_k).
sigma_y_offset_k[k], phi_offset_k[k]: Per-slot offsets into the compacted dispersion arrays.

3.2 The Main Indexing Loop

for (k in 1:K) {
  for (j in 1:p) {
    J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
                          ? J_a_per_kp[k, j] : 0;
    J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
                          ? J_b_per_kp[k, j] : 0;
    a_raw_offset_kp[k, j] = total_J_a_free;
    c_b_raw_offset_kp[k, j] = total_J_b_free;
    total_J_a_free += J_a_free_kp[k, j];
    total_J_b_free += J_b_free_kp[k, j];
    if (J_a_free_kp[k, j] > 0) any_use_a = 1;
    if (J_b_free_kp[k, j] > 0) any_use_b = 1;
  }
  sigma_y_offset_k[k] = sigma_y_size;
  phi_offset_k[k] = phi_size;
  sigma_y_size += use_dispersion_y_k[k];
  phi_size += use_dispersion_phi_k[k];
}

For each $(k,j)$ pair:

Free count determination: The ternary expression sets J_a_free_kp[k,j] to J_a_per_kp[k,j] only if the slot's $a()$ component is active (use_a_k[k] == 1) and there is at least one design column. This handles two cases where no free $a()$ coefficient exists:
- use_a_k[k] == 0: the $a()$ component is entirely absent for this slot.
- J_a_per_kp[k,j] == 0: the $a()$ design has zero columns (e.g., intercept-only $a()$ where the intercept has been absorbed into $\theta_{\text{ref}}$).
The same logic applies to J_b_free_kp.
Offset recording: The current value of total_J_a_free (resp. total_J_b_free) is stored as the 0-based starting offset for this $(k,j)$ block. This is a standard prefix-sum / ragged-array flattening pattern.
Running total update: total_J_a_free is incremented by the free count, accumulating the total vector length.
Global flag update: If any $(k,j)$ has free $a()$ coefficients, any_use_a is set to 1 (and stays 1). Same for any_use_b.

After the inner $j$-loop, the per-slot dispersion offsets and sizes are updated similarly.

3.3 Option A Compactification of σ_a,k (RG.6, D96)

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
  int slot_free_a = 0;
  for (j in 1:p) {
    slot_free_a += J_a_free_kp[k, j];
  }
  if (slot_free_a > 0) {
    n_sigma_a += 1;
    sigma_a_idx[k] = n_sigma_a;
  } else {
    sigma_a_idx[k] = 0;
  }
}

This implements Option A (referenced as RG.6, decision D96): the $\sigma_{a,k}$ parameter vector is compacted to include only slots that carry at least one free $a()$ coefficient.

Rationale (from the inline comment): A slot whose $a()$ component contributes no free coefficient — either because use_a_k[k] == 0 (no $a()$ at all) or because $a()$ is intercept-only (absorbed into $\theta_{\text{ref}}$) — would otherwise declare a sampled-but-unused half-prior scale $\sigma_{a,k}$. Such a parameter would have a prior but no likelihood contribution, creating a flat (non-identified) direction in the posterior.

Mechanism:

slot_free_a accumulates $\sum_{j=1}^{p} J_{a,\text{free},k,j}$ for slot $k$.
If slot_free_a > 0, the slot is assigned a compacted index sigma_a_idx[k] = n_sigma_a (1-based), and n_sigma_a is incremented.
If slot_free_a == 0, sigma_a_idx[k] = 0 (sentinel: no parameter).

Degenerate case: When every slot carries free $a()$ coefficients, $n_{\sigma_a} = K$ and $\sigma_{a,\text{idx}}[k] = k$ (the identity mapping), so the model is bit-identical to the non-compacted version.

Note: $\sigma_{b,k}$ is not compacted in the same way — it uses array[K * any_use_b], which is all-or-nothing (either all $K$ slots get a $\sigma_{b,k}$ or none do).

4. `parameters` Block

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;

array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;

4.1 Additive Component Parameters

sigma_a_k: Array of length $n_{\sigma_a}$ (compacted). Each element is a half-prior scale $\sigma_{a,k} > 0$ for the non-centered parameterization of the $a()$ coefficients. Indexed via sigma_a_idx[k] at evaluation time.
a_raw: Flattened vector of length total_J_a_free containing all free raw $a()$ coefficients across all $(k,j)$. These are the non-centered variates: each $a_{\text{raw},m} \sim \mathcal{N}(0,1)$, and the actual coefficient is $a_{k,j,jj} = \sigma_{a,k} \cdot a_{\text{raw},\text{offset}+jj}$.

4.2 B-Component (Anchor-Mediated Scaling) Parameters

sigma_b_k: Array of length $K \cdot \text{any\_use\_b}$. If any_use_b == 1, this is length $K$ (one $\sigma_{b,k}$ per slot, not compacted). If any_use_b == 0, length 0 (no parameters declared). Each element $\sigma_{b,k} > 0$.
c_b_kp_raw: Flattened vector of length total_J_b_free containing all free raw $b()$ coefficients. Non-centered: $c_{b,\text{raw},m} \sim \mathcal{N}(0,1)$, actual coefficient $c_{b,k,j,jj} = \sigma_{b,k} \cdot c_{b,\text{raw},\text{offset}+jj}$.

4.3 Population Dispersion Parameters

sigma_y_pop_k: Array of length sigma_y_size (sum of use_dispersion_y_k). Population-level $\sigma_y$ scales, constrained $> 0$.
phi_pop_k: Array of length phi_size (sum of use_dispersion_phi_k). Population-level $\phi$ scales, constrained $> 0$.

Important: These population dispersion parameters are declared, receive priors in the model block, but do not enter the likelihood or the linear predictor in the current template body. They are sampled from their priors only. This is either scaffolding for a future population-dispersion layer or a legacy from the marginal template.

4.4 Absence of `theta_ref_kp`

The defining feature of this conditional template: theta_ref_kp is not a parameter. It has been moved to data as theta_ref_kp_data. The model conditions on the plug-in EB estimate rather than sampling the anchor.

5. `transformed parameters` Block

5.1 Declarations

array[K] matrix[n, p] eta_kp;
array[K, p] vector[J_a_max] a_coef_kp;
array[K, p] vector[J_b_max] c_b_kp;
array[K, p] vector[J_b_max] b_coef_kp;

eta_kp[k]: $n \times p$ matrix of linear predictors for slot $k$. This is the central quantity of the AMM decomposition.
a_coef_kp[k,j]: Vector of length $J_{a,\max}$ holding the (scaled) $a()$ coefficients for slot $k$, coordinate $j$. Padded with zeros beyond J_a_free_kp[k,j].
c_b_kp[k,j]: Vector of length $J_{b,\max}$ holding the scaled $b()$ coefficients (the "c" in the $b = c/\theta_{\text{ref}}$ decomposition).
b_coef_kp[k,j]: Vector of length $J_{b,\max}$ holding the anchor-mediated scaling coefficients $\mathbf{b}_{k,j} = \mathbf{c}_{b,k,j} / \theta_{\text{ref},k,j}$. Computed but not used in the likelihood — the linear predictor uses c_b_kp directly. This quantity is available for posterior reporting/diagnostics.

5.2 Zero Initialization

for (k in 1:K) {
  for (j in 1:p) {
    a_coef_kp[k, j] = rep_vector(0, J_a_max);
    c_b_kp[k, j] = rep_vector(0, J_b_max);
    b_coef_kp[k, j] = rep_vector(0, J_b_max);
  }
}

All coefficient vectors are initialized to zero. This ensures that padding positions (beyond the free coefficient count) and slots/coordinates with no active component remain exactly zero, contributing nothing to the linear predictor.

5.3 Additive Coefficient Construction (NCP Scaling)

if (any_use_a == 1) {
  for (k in 1:K) {
    if (use_a_k[k] == 1) {
      for (j in 1:p) {
        if (J_a_free_kp[k, j] > 0) {
          for (jj in 1:J_a_free_kp[k, j]) {
            a_coef_kp[k, j][jj] = a_raw[a_raw_offset_kp[k, j] + jj]
                                    * sigma_a_k[sigma_a_idx[k]];
          }
        }
      }
    }
  }
}

This block is guarded by any_use_a == 1 (skipped entirely if no slot uses $a()$). For each active slot $k$ and coordinate $j$ with free $a()$ coefficients, each coefficient is constructed as:

$$a_{k,j,jj} = a_{\text{raw},,\text{offset}_{k,j}+jj} ;\cdot; \sigma_{a,,\sigma_{a,\text{idx}}[k]}$$

This is the non-centered parameterization (NCP): the raw variates $a_{\text{raw}} \sim \mathcal{N}(0,1)$ are scaled by the slot-level hyperparameter $\sigma_{a,k}$. The compacted index sigma_a_idx[k] maps slot $k$ to its position in the (possibly shortened) sigma_a_k array.

The mathematical effect: marginally,

$$a_{k,j,jj} \sim \mathcal{N}(0, \sigma_{a,k}^2)$$

but the joint $(a_{\text{raw}}, \sigma_a)$ parameterization decorrelates the posterior, improving sampler geometry.

5.4 B-Component Coefficient Construction

if (any_use_b == 1) {
  for (k in 1:K) {
    if (use_b_k[k] == 1) {
      for (j in 1:p) {
        if (J_b_free_kp[k, j] > 0) {
          for (jj in 1:J_b_free_kp[k, j]) {
            c_b_kp[k, j][jj] = c_b_kp_raw[c_b_raw_offset_kp[k, j] + jj]
                                * sigma_b_k[k];
          }
          if (use_groups == 0 && theta_ref_kp_data[1, k][j] != 0) {
            for (jj in 1:J_b_per_kp[k, j]) {
              b_coef_kp[k, j][jj] = c_b_kp[k, j][jj]
                                      / theta_ref_kp_data[1, k][j];
            }
          }
        }
      }
    }
  }
}

5.4.1 Scaled c_b Construction (NCP)

For each free $b()$ coefficient:

$$c_{b,k,j,jj} = c_{b,\text{raw},,\text{offset}_{k,j}+jj} ;\cdot; \sigma_{b,k}$$

Marginally, $c_{b,k,j,jj} \sim \mathcal{N}(0, \sigma_{b,k}^2)$. Note that sigma_b_k[k] is indexed directly by $k$ (no compactification), which is valid because the array has length $K$ when any_use_b == 1.

5.4.2 Anchor-Mediated Scaling b

The inner conditional computes:

$$b_{k,j,jj} = \frac{c_{b,k,j,jj}}{\theta_{\text{ref},1,k,j}}$$

This is the reference-anchoring decomposition for the multiplicative component: the $b()$ coefficient is expressed as the scaled raw coefficient divided by the anchor. The conditions are:

use_groups == 0: Only when there is no group structure. With groups, $\theta_{\text{ref}}$ varies by group, so a single $\mathbf{b}_{k,j}$ cannot represent the group-varying ratio. In that case, b_coef_kp remains zero.
theta_ref_kp_data[1, k][j] != 0: Division by zero is avoided. If the anchor is exactly zero, b_coef_kp remains zero for that $(k,j)$.

The loop runs over 1:J_b_per_kp[k,j] (the full design column count, not just the free count), but c_b_kp[k,j][jj] is zero for $jj > J_{b,\text{free},k,j}$ (from initialization), so only the free coefficients produce nonzero $b$ values.

Critical observation: b_coef_kp is computed here but never used in the linear predictor (Section 5.5 uses c_b_kp directly). The $b$ coefficients are available as a derived quantity for posterior analysis but do not enter the model's likelihood. The linear predictor's $b()$ contribution is $\mathbf{z}_{b}^\top \mathbf{c}_b$, not $\mathbf{z}_{b}^\top \mathbf{b}$.

5.5 Linear Predictor Construction

for (k in 1:K) {
  for (i in 1:n) {
    int g_i = group_id[i];
    for (j in 1:p) {
      real theta_ref_ikj = theta_ref_kp_data[g_i, k][j];
      real eta_ikj = theta_ref_ikj;
      if (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0) {
        eta_ikj += dot_product(
          to_vector(Z_a_kp[k, j][i, 1:J_a_per_kp[k, j]]),
          a_coef_kp[k, j][1:J_a_per_kp[k, j]]
        );
      }
      if (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0) {
        eta_ikj += dot_product(
          to_vector(Z_b_kp[k, j][i, 1:J_b_per_kp[k, j]]),
          c_b_kp[k, j][1:J_b_per_kp[k, j]]
        );
      }
      eta_kp[k][i, j] = eta_ikj;
    }
  }
}

This is the core AMM reference-anchoring equation, realized as:

$$\eta_{k,i,j} = \theta_{\text{ref},g_i,k,j} ;+; \sum_{\ell=1}^{J_{a,k,j}} z_{a,k,j,i,\ell},a_{k,j,\ell} ;+; \sum_{\ell=1}^{J_{b,k,j}} z_{b,k,j,i,\ell},c_{b,k,j,\ell}$$

Step by step:

Group lookup: g_i = group_id[i] determines which group's anchor applies to observation $i$.
Anchor retrieval: theta_ref_ikj = theta_ref_kp_data[g_i, k][j] fetches $\theta_{\text{ref},g_i,k,j}$ — the plug-in EB anchor for this observation's group, slot, and coordinate.
Anchor initialization: eta_ikj = theta_ref_ikj — the linear predictor starts at the anchor. All deviations are additive on top of the anchor.
Additive $a()$ contribution: If slot $k$ uses $a()$ and has design columns, the dot product $\mathbf{z}_{a,k,j,i}^\top \mathbf{a}_{k,j}$ is added. The design row Z_a_kp[k,j][i, 1:J_a_per_kp[k,j]] is extracted and converted to a vector; the coefficient slice a_coef_kp[k,j][1:J_a_per_kp[k,j]] is matched. Note: the slicing uses J_a_per_kp (the full design column count), but coefficients beyond J_a_free_kp are zero (from initialization), so only free coefficients contribute.
$b()$ contribution via $c_b$: If slot $k$ uses $b()$ and has design columns, the dot product $\mathbf{z}_{b,k,j,i}^\top \mathbf{c}_{b,k,j}$ is added. Note: this uses c_b_kp (the scaled raw coefficients), not b_coef_kp (the anchor-divided coefficients). The $b()$ component enters the predictor as a direct additive term through $\mathbf{c}_b$, not through the anchor-mediated $\mathbf{b}$.
Storage: eta_kp[k][i,j] = eta_ikj.

The reference-anchoring decomposition is thus: the anchor $\theta_{\text{ref}}$ provides the baseline, and the $a()$ and $b()$ (via $c_b$) components provide structured deviations from that baseline. The anchor "anchors" the predictor, and the design matrices structure how observations deviate from it.

6. `model` Block

6.1 Prior Distributions

// theta_ref_kp prior block intentionally absent: the anchor is data
// here. The remaining priors mirror the marginal template.
if (any_use_a == 1) {
  sigma_a_k ~ {{PRIOR_SIGMA_A}};
  a_raw ~ normal(0, 1);
}
if (any_use_b == 1) {
  sigma_b_k ~ {{PRIOR_SIGMA_B}};
  c_b_kp_raw ~ normal(0, 1);
}
if (sigma_y_size > 0) {
  sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
  phi_pop_k ~ {{PRIOR_PHI}};
}

The comment makes explicit the defining difference from the marginal template: the prior on $\theta_{\text{ref}}$ is intentionally absent because $\theta_{\text{ref}}$ is data (plug-in EB), not a parameter. There is no distributional assumption on the anchor; it is conditioned upon.

6.1.1 Additive Component Priors

When any_use_a == 1:

sigma_a_k ~ {{PRIOR_SIGMA_A}}: The half-prior scales $\sigma_{a,k}$ receive a templated prior (the {{PRIOR_SIGMA_A}} placeholder is substituted by the R-side wrapper at model compilation time). This is a hyperprior on the additive deviation magnitude.
a_raw ~ normal(0, 1): The raw variates are standard normal. This is the NCP anchor: $a_{\text{raw},m} \sim \mathcal{N}(0, 1)$, implying $a_{k,j,jj} \sim \mathcal{N}(0, \sigma_{a,k}^2)$ marginally.

No Jacobian adjustment is needed because a_raw is declared as an unconstrained vector (no transformation), and the scaling to a_coef_kp happens in transformed parameters (deterministic, no prior placed on a_coef_kp directly).

6.1.2 B-Component Priors

When any_use_b == 1:

sigma_b_k ~ {{PRIOR_SIGMA_B}}: Templated hyperprior on $\sigma_{b,k}$.
c_b_kp_raw ~ normal(0, 1): Standard normal NCP variates, implying $c_{b,k,j,jj} \sim \mathcal{N}(0, \sigma_{b,k}^2)$ marginally.

6.1.3 Population Dispersion Priors

When sigma_y_size > 0 and/or phi_size > 0:

sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}}: Templated prior on population $\sigma_y$.
phi_pop_k ~ {{PRIOR_PHI}}: Templated prior on population $\phi$.

As noted, these parameters do not enter the likelihood — they are sampled from their priors only.

6.2 Likelihood

The likelihood is dispatched on family_id_k_vector[1] — the family of slot 1 (the location slot). Only two branches are implemented:

6.2.1 Gaussian Family (`family_id_k_vector[1] == 1`)

if (family_id_k_vector[1] == 1) {
  for (i in 1:n) {
    for (j in 1:p) {
      real sigma_ij = exp(eta_kp[2][i, j]);
      y_real[i, j] ~ normal(eta_kp[1][i, j], sigma_ij);
    }
  }
}

For each observation $i$ and coordinate $j$:

$$\sigma_{ij} = \exp(\eta_{2,i,j})$$

$$y_{ij}^{\text{real}} \sim \mathcal{N}(\eta_{1,i,j},;\sigma_{ij})$$

Slot 1 ($\eta_{1,i,j}$) is the mean $\mu_{ij}$ (identity link, since the Gaussian mean is directly the linear predictor).
Slot 2 ($\eta_{2,i,j}$) is the log-standard-deviation. The exp() link ensures $\sigma_{ij} > 0$.
The likelihood is the product $\prod_{i=1}^{n}\prod_{j=1}^{p} \mathcal{N}(y_{ij}^{\text{real}} \mid \eta_{1,i,j}, \exp(\eta_{2,i,j}))$, reflecting the coordinate-wise factorization (Path A).

6.2.2 Negative Binomial Family (`family_id_k_vector[1] == 3`)

} else if (family_id_k_vector[1] == 3) {
  for (i in 1:n) {
    for (j in 1:p) {
      real phi_ij = exp(eta_kp[2][i, j]);
      y_int[i, j] ~ neg_binomial_2(exp(eta_kp[1][i, j]), phi_ij);
    }
  }
}

For each observation $i$ and coordinate $j$:

$$\phi_{ij} = \exp(\eta_{2,i,j})$$

$$\mu_{ij} = \exp(\eta_{1,i,j})$$

$$y_{ij}^{\text{int}} \sim \text{NegBinomial}_2(\mu_{ij},;\phi_{ij})$$

Slot 1 ($\eta_{1,i,j}$) is the log-mean. The exp() link ensures $\mu_{ij} > 0$.
Slot 2 ($\eta_{2,i,j}$) is the log-dispersion. The exp() link ensures $\phi_{ij} > 0$.
neg_binomial_2 is Stan's parameterization with mean $\mu$ and precision (inverse overdispersion) $\phi:\ \text{Var}(Y) = \mu + \mu^2/\phi$.
The likelihood is $\prod_{i=1}^{n}\prod_{j=1}^{p} \text{NegBinomial}_2(y_{ij}^{\text{int}} \mid \exp(\eta_{1,i,j}), \exp(\eta_{2,i,j}))$.

6.2.3 No Other Branches

If family_id_k_vector[1] is neither 1 nor 3, no likelihood is evaluated. The model reduces to just the priors. This is consistent with the stated "initial coverage" of Gaussian and NB only.

6.3 No Jacobian Terms

No explicit Jacobian adjustments appear in the model block. This is correct because:

The constrained parameters (sigma_a_k, sigma_b_k, sigma_y_pop_k, phi_pop_k) have <lower=0> constraints, and Stan automatically applies the log-Jacobian for the half-normal/half-Student-t etc. transforms.
The NCP scaling ($a = \sigma_a \cdot a_{\text{raw}}$) is a deterministic linear transformation in transformed parameters; no prior is placed on a_coef_kp directly, so no Jacobian is needed.
The link functions (exp) are applied to eta_kp inside the likelihood statements, not as parameter transforms.

7. `generated quantities` Block

matrix[n, p] log_lik;
matrix[n, p] theta_i;
matrix[n, p] y_pred;

log_lik: $n \times p$ matrix of pointwise log-likelihood contributions. Used for PSIS-LOO/WAIC via the loo package.
theta_i: $n \times p$ matrix of the location parameter $\theta_i$ (the quantity the AMM decomposes). For both families, this is $\eta_{1,i,j}$ (the slot-1 linear predictor).
y_pred: $n \times p$ matrix of posterior predictive draws.

7.1 Gaussian Generated Quantities

if (family_id_k_vector[1] == 1) {
  for (i in 1:n) {
    for (j in 1:p) {
      real sigma_ij = exp(eta_kp[2][i, j]);
      theta_i[i, j] = eta_kp[1][i, j];
      log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j],
                                   sigma_ij);
      y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij);
    }
  }
}

For each $(i,j)$:

$$\sigma_{ij} = \exp(\eta_{2,i,j})$$

$$\theta_{i}[i,j] = \eta_{1,i,j} \quad \text{(the mean)}$$

$$\text{log_lik}[i,j] = \log \mathcal{N}(y_{ij}^{\text{real}} \mid \eta_{1,i,j},;\sigma_{ij})$$

$$y_{\text{pred}}[i,j] \sim \mathcal{N}(\eta_{1,i,j},;\sigma_{ij}) \quad \text{(posterior predictive draw)}$$

The normal_lpdf is called explicitly (rather than using the ~ sampling syntax) to capture the log-density value for LOO. The normal_rng generates a random draw from the predictive distribution at each posterior sample.

7.2 Negative Binomial Generated Quantities

} else if (family_id_k_vector[1] == 3) {
  for (i in 1:n) {
    for (j in 1:p) {
      real phi_ij = exp(eta_kp[2][i, j]);
      real mu_ij = exp(eta_kp[1][i, j]);
      theta_i[i, j] = eta_kp[1][i, j];
      log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij);
      y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij);
    }
  }
}

For each $(i,j)$:

$$\phi_{ij} = \exp(\eta_{2,i,j}), \qquad \mu_{ij} = \exp(\eta_{1,i,j})$$

$$\theta_{i}[i,j] = \eta_{1,i,j} \quad \text{(the log-mean)}$$

$$\text{log_lik}[i,j] = \log \text{NegBinomial}_2(y_{ij}^{\text{int}} \mid \mu_{ij},;\phi_{ij})$$

$$y_{\text{pred}}[i,j] \sim \text{NegBinomial}_2(\mu_{ij},;\phi_{ij}) \quad \text{(posterior predictive draw)}$$

Note that theta_i stores $\eta_{1,i,j}$ (the log-mean), not $\mu_{ij}$ itself. This is consistent with the AMM decomposition: $\theta_i$ is the linear predictor (the "theta" being decomposed), and the inverse link is applied separately to obtain the distributional parameter.

7.3 No Other Branches

As in the model block, if the family is neither 1 nor 3, the generated quantities matrices remain uninitialized (Stan will fill them with NaN). This is consistent with the stated initial coverage.

8. Summary: How the Template Realizes the AMM Reference-Anchoring Decomposition

8.1 The Decomposition

The AMM reference-anchoring decomposition expresses the linear predictor for each slot $k$, observation $i$, and coordinate $j$ as:

$$\boxed{\eta_{k,i,j} = \underbrace{\theta_{\text{ref},g_i,k,j}}_{\text{anchor}} ;+; \underbrace{\mathbf{z}_{a,k,j,i}^{\top},\sigma_{a,k},\mathbf{a}_{\text{raw},k,j}}_{\text{additive deviation } a()} ;+; \underbrace{\mathbf{z}_{b,k,j,i}^{\top},\sigma_{b,k},\mathbf{c}_{b,\text{raw},k,j}}_{\text{scaled deviation } c_b()}}$$

The anchor $\theta_{\text{ref}}$ is the reference point (plug-in EB data in this conditional template). The $a()$ and $c_b()$ components are structured deviations from that reference, parameterized via non-centered priors with slot-level scales $\sigma_{a,k}$ and $\sigma_{b,k}$.

The anchor-mediated scaling $\mathbf{b}_{k,j} = \mathbf{c}_{b,k,j}/\theta_{\text{ref},k,j}$ is derived (in transformed parameters) but not used in the likelihood — it is a reporting quantity that expresses the multiplicative deviation relative to the anchor.

8.2 The Conditional (EB) Aspect

In this conditional template, $\theta_{\text{ref}}$ is data (theta_ref_kp_data), not a parameter. The model conditions on the plug-in EB estimate:

$$p(\mathbf{a}, \mathbf{c}_b, \boldsymbol{\sigma} \mid \mathbf{y}, \theta_{\text{ref}}) \propto p(\mathbf{y} \mid \eta(\theta_{\text{ref}}, \mathbf{a}, \mathbf{c}_b)) ; p(\mathbf{a}_{\text{raw}}) ; p(\mathbf{c}_{b,\text{raw}}) ; p(\boldsymbol{\sigma})$$

There is no prior on $\theta_{\text{ref}}$ and no marginalization over it. This contrasts with the marginal template, where $\theta_{\text{ref}}$ is a parameter with its own prior and the posterior marginalizes over it.

8.3 The K-Slot Multi-Parametric Likelihood

The $K=2$ slots jointly parameterize a single distribution:

Family	$\eta_{1}$ (slot 1)	$\eta_{2}$ (slot 2)	Likelihood
Gaussian	$\mu_{ij}$ (identity)	$\log\sigma_{ij}$	$\mathcal{N}(\mu_{ij}, \sigma_{ij})$
NB	$\log\mu_{ij}$	$\log\phi_{ij}$	$\text{NegBinomial}_2(\mu_{ij}, \phi_{ij})$

8.4 The Coordinate-Wise Factorization

The $p$ coordinates are independent given the slots:

$$p(\mathbf{y}_i \mid \eta) = \prod_{j=1}^{p} p(y_{ij} \mid \eta_{1,i,j}, \eta_{2,i,j})$$

This is the "Path A" coordinate-wise factorization composed with the "Path B" K-slot multi-parametric likelihood, canonized as "Path C" (D38″ = (h)).

8.5 Group Structure

When use_groups == 1, the anchor varies by group: observation $i$ in group $g_i$ uses $\theta_{\text{ref},g_i,k,j}$. This allows the reference point to differ across groups while the deviation structure ($a(),\ c_b()$) is shared. The $b$-decomposition ($\mathbf{b} = \mathbf{c}_b / \theta_{\text{ref}}$) is disabled in this case because a single $\mathbf{b}$ cannot represent group-varying anchors.

8.6 Declared-but-Unused Quantities (Faithful Inventory)

The following data/parameters are declared but not referenced in the template body (likelihood or predictor):

Quantity	Block	Status
`inv_link_id_per_slot`	data	Declared, never read (links are hard-coded per family)
`use_W`	data	Constrained to 0; placeholder for disabled $W$-matrix pathway
`theta_anchor_kp`	data	Declared, never read (R-side interface only)
`K_slots`	data	Redundant mirror of `K`; never read
`p_dim`	data	Redundant mirror of `p`; never read
`sigma_y_pop_k`	parameters	Sampled from prior only; does not enter $\eta$ or likelihood
`phi_pop_k`	parameters	Sampled from prior only; does not enter $\eta$ or likelihood
`b_coef_kp`	transformed parameters	Computed but not used in $\eta$ or likelihood (reporting only)
`sigma_y_offset_k`	transformed data	Computed but never read
`phi_offset_k`	transformed data	Computed but never read

These are either interface placeholders, legacy from the marginal template, or scaffolding for future generalization.

inst/stan/amm_eb_marginal_KxP.stan

`amm_eb_marginal_KxP.stan` — Block-by-Block Technical Reference (Section 1 of 2)

Header Comment Summary

The header establishes this as the Empirical Bayes (EB) Marginal Template, Path C of the gdpar package (Sub-phase 8.6.D). It composes:

Path A (multivariate, Theorem 7A*): a single outcome $y[n,p]$ factorized coordinate-wise across $p$ columns.
Path B (multi-parametric, Theorem 7C*): $K$ distributional-regression slots, each with its own canonical link.

The $K$ slots are the canonical distributional parameters of the resolved family (e.g., $(\mu, \log\sigma)$ for Gaussian $K{=}2;\ (\log\mu, \log\phi)$ for NB $K{=}2$). Each slot carries its own per-group, per-coordinate anchor $\theta_{\text{ref},k}^{(g)}[j] \in \mathbb{R}^p$.

Coverage: Initial iteration supports family_id_k_vector[1] $\in \{1, 3\}$ (Gaussian $K{=}2$, Negative Binomial $K{=}2$). Families $\{5,\ldots,13\}$ are deferred.

Restrictions:

use_W = 0 enforced (decision D39): the modulating component $W$ is disabled.
$a/b$ coefficients are per-slot per-coordinate ragged (D41): each $(k,j)$ pair has its own $J_{a}^{(k,j)}\ /\ J_{b}^{(k,j)}$ free columns.

Internal parametrization:

$a$: non-centered (NCP) per slot, with per-slot scale $\sigma_{a,k}$.
$b$: linear reparametrization via $c_b = \theta_{\text{ref}} \cdot b$; NCP sampling of $c_{b,\text{raw}}$, with $b$ derived post-hoc in the single-anchor regime (use_groups == 0).
$\theta_{\text{ref}}$: per-group, per-slot, per-coord anchor (sampled in the marginal template; fixed as data in the companion conditional template).

`data` Block

Dimension declarations

int<lower=1> n;

Number of observations. Constraint $n \geq 1$.

int<lower=2> K;

Number of distributional parameter slots. Constraint $K \geq 2$ (multi-parametric Path B requirement).

int<lower=2> p;

Number of outcome coordinates (matrix-column dimension). Constraint $p \geq 2$ (Path A multivariate requirement).

Family and link configuration

array[K] int<lower=1, upper=13> family_id_k_vector;

Per-slot family identifiers. In the initial iteration, these are homogeneous across slots: the dispatcher consumes only family_id_k_vector[1] as the canonical family, and remaining slots inherit the canonical structure. Valid values in this iteration: $\{1, 3\}$.

array[K] int<lower=0, upper=2> inv_link_id_per_slot;

Canonical Stan inverse-link identifier per slot, inherited from Path B dispatch. The comment states that only inv_link_id_per_slot[2] (log $\mapsto$ exp) is consumed for the dispersion/phi slot. Note: in the model block, the exp() call is hardcoded rather than dispatched through this array; the array is declared but not referenced in the executable code of this section.

Component use flags

array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;

Per-slot binary flags activating the additive ($a$) and multiplicative ($b$) components. When use_a_k[k] == 0, slot $k$ has no additive free coefficients; likewise for $b$.

int<lower=0, upper=0> use_W;

The modulating component $W$ flag. The double constraint <lower=0, upper=0> forces use_W to be exactly zero — this is a compile-time assertion (decision D39) ensuring $W$ is disabled. Any non-zero value supplied from R would trigger a Stan data-constraint violation.

Free-column counts

int<lower=0> J_a_max;
int<lower=0> J_b_max;

Maximum padding widths for the additive and multiplicative design matrices across all $(k,j)$ pairs. These define the column dimension of the padded design matrices.

array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;

Per-slot, per-coordinate free-column counts. $J_a^{(k,j)}$ is the number of active additive columns for slot $k$, coordinate $j$; likewise $J_b^{(k,j)}$ for multiplicative. Zero entries indicate no free coefficients for that $(k,j)$ pair (e.g., intercept-only absorbed into $\theta_{\text{ref}}$).

Design matrices

array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;

Per-slot, per-coordinate padded design matrices. For each $(k,j)$, only columns $1{:}J_a^{(k,j)}$ (resp. $1{:}J_b^{(k,j)}$) are referenced; remaining padded columns are never touched. This flat-pack structure allows heterogeneous column counts within a single Stan array.

Outcomes

matrix[n, p] y_real;
array[n, p] int y_int;

Dual outcome declarations: y_real for continuous families (Gaussian), y_int for count families (NB). Only the declaration matching family_id_k_vector[1] is consumed in the likelihood. Both are always supplied as data; the unused one is ignored.

Data-side anchor

array[K] vector[p] theta_anchor_kp;

Per-slot, per-coordinate anchor supplied as data. This is distinct from the sampled parameter theta_ref_kp. It is declared but not referenced in the executable code of this section (likely consumed in generated quantities or for initialization in section 2).

Dispersion flags

array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;

Per-slot population-scope dispersion flags, derived on the R side from family_id_k_vector. These control whether a sigma_y_pop_k (resp. phi_pop_k) entry is allocated for slot $k$. For Gaussian $K{=}2$, slot 2 would typically have use_dispersion_y_k[2] = 1; for NB $K{=}2$, slot 2 would have use_dispersion_phi_k[2] = 1.

Grouping

int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;

use_groups: binary flag. When 0, a single anchor per slot is used (J_groups is still $\geq 1$ but only group 1 is referenced).
J_groups: number of groups.
group_id: per-observation group assignment $g_i \in \{1, \ldots, J_{\text{groups}}\}$.

EB convention declarations

int<lower=2> K_slots;
int<lower=2> p_dim;

EB-template convention: every EB template declares K_slots and p_dim. In Path C, K_slots == K and p_dim == p, both strictly greater than 1. These serve as redundant confirmations of the multi-slot, multi-coordinate structure.

`transformed data` Block

Declarations

array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;

Effective free-column counts after applying the per-slot use flags. These refine J_a_per_kp / J_b_per_kp by zeroing out entries for inactive slots.

array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;

Flat-pack offsets into the one-dimensional parameter vectors a_raw and c_b_kp_raw. For each $(k,j)$, the free coefficients occupy indices [offset + 1 : offset + J_free] in the flat vector.

int total_J_a_free = 0;
int total_J_b_free = 0;
int any_use_a = 0;
int any_use_b = 0;

total_J_a_free / total_J_b_free: total number of free additive / multiplicative coefficients across all $(k,j)$ pairs. These determine the lengths of a_raw and c_b_kp_raw.
any_use_a / any_use_b: global flags (0 or 1) indicating whether any free coefficient exists. Used to guard parameter declarations and model-block statements.

int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;

sigma_y_size / phi_size: total count of active dispersion entries across slots.
sigma_y_offset_k / phi_offset_k: per-slot offsets into the compacted dispersion arrays (computed but not referenced in the executable code of this section).

Main loop: free-column computation and flat-pack offsets

for (k in 1:K) {
    for (j in 1:p) {
        J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
                              ? J_a_per_kp[k, j] : 0;
        J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
                              ? J_b_per_kp[k, j] : 0;

For each $(k,j)$, the effective free count is the nominal count only if the slot's use flag is active; otherwise it is zero. This realizes the ragged per-slot per-coordinate structure (D41):

$$J_{a,\text{free}}^{(k,j)} = \begin{cases} J_a^{(k,j)} & \text{if } \texttt{use_a_k}[k]=1 \text{ and } J_a^{(k,j)}>0 \ 0 & \text{otherwise}\end{cases}$$

and analogously for $J_{b,\text{free}}^{(k,j)}$.

        a_raw_offset_kp[k, j] = total_J_a_free;
        c_b_raw_offset_kp[k, j] = total_J_b_free;
        total_J_a_free += J_a_free_kp[k, j];
        total_J_b_free += J_b_free_kp[k, j];

The current running totals are recorded as the starting offset for this $(k,j)$ pair, then incremented. This builds a contiguous flat-pack indexing scheme:

$$\texttt{a_raw_offset}^{(k,j)} = \sum_{(k',j') \prec (k,j)} J_{a,\text{free}}^{(k',j')}$$

where $\prec$ denotes row-major (k-outer, j-inner) ordering.

        if (J_a_free_kp[k, j] > 0) any_use_a = 1;
        if (J_b_free_kp[k, j] > 0) any_use_b = 1;
    }

Global flags are set to 1 if any $(k,j)$ pair has free coefficients.

    sigma_y_offset_k[k] = sigma_y_size;
    phi_offset_k[k] = phi_size;
    sigma_y_size += use_dispersion_y_k[k];
    phi_size += use_dispersion_phi_k[k];
}

Per-slot dispersion offsets and total sizes are computed. For each slot $k$, if use_dispersion_y_k[k] == 1, one sigma_y_pop_k entry is allocated; likewise for phi_pop_k.

Sigma_a compaction (Decision D96)

int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
    int slot_free_a = 0;
    for (j in 1:p) {
        slot_free_a += J_a_free_kp[k, j];
    }
    if (slot_free_a > 0) {
        n_sigma_a += 1;
        sigma_a_idx[k] = n_sigma_a;
    } else {
        sigma_a_idx[k] = 0;
    }
}

This implements Option A (RG.6, D96): the per-slot additive scale $\sigma_{a,k}$ is declared only for slots that carry at least one free additive coefficient. A slot with no free $a$ coefficients (either use_a_k[k] == 0 or all $J_a^{(k,j)} = 0$, e.g., intercept-only $a = \sim 1$ absorbed into $\theta_{\text{ref}}$) would otherwise declare a sampled-but-unused half-prior scale — a flat, non-identified direction.

The mapping $\texttt{sigma\_a\_idx}[k]$ assigns each "active" slot a position in the compacted vector sigma_a_k (1-indexed), or 0 if the slot is inactive. When every slot carries free $a$ coefficients, $n_{\sigma_a} = K$ and $\texttt{sigma\_a\_idx}$ is the identity, preserving bit-identical behavior.

Mathematically:

$$\texttt{sigma_a_idx}[k] = \begin{cases} \text{rank}_k & \text{if } \sum_{j=1}^p J_{a,\text{free}}^{(k,j)} > 0 \ 0 & \text{otherwise}\end{cases}$$

where $\text{rank}_k$ is the sequential position among active slots.

`parameters` Block

Anchor

array[J_groups, K] vector[p] theta_ref_kp;

The per-group, per-slot, per-coordinate anchor $\theta_{\text{ref},k}^{(g)}[j]$. This is the central object of the AMM reference-anchoring decomposition. Total parameter count: $J_{\text{groups}} \times K \times p$.

In the marginal template, these are sampled parameters. In the companion conditional template, they are moved to data (as theta_ref_kp_data) and the anchor prior is removed.

Anchor hyperparameters (hierarchical prior, active only when `use_groups == 1`)

array[use_groups, K] vector[p] mu_theta_ref_kp;
array[use_groups, K] vector<lower=0>[p] sigma_theta_ref_kp;

When use_groups == 1, these declare per-slot, per-coordinate hyperparameters:

$\mu_{\theta,k}[j]$: prior mean of the anchor.
$\sigma_{\theta,k}[j] > 0$: prior scale of the anchor.

When use_groups == 0, the first array dimension is 0, so no parameters are declared — Stan permits zero-size arrays. The anchor is then given a direct prior without hierarchy.

Additive component (NCP)

array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;

sigma_a_k: per-slot additive scale, compacted to active slots only (size $n_{\sigma_a}$). Constraint $\sigma_{a,k} > 0$.
a_raw: flat-packed standard-normal latent variables, size $\texttt{total\_J\_a\_free}$.

The NCP construction is:

$$a^{(k,j)}_{\ell} = \sigma_{a,k} \cdot a_{\text{raw},,\text{offset}^{(k,j)}+\ell}, \quad \ell = 1, \ldots, J_{a,\text{free}}^{(k,j)}$$

with $a_{\text{raw},m} \sim \mathcal{N}(0,1)$. No explicit Jacobian is needed because Stan samples $(a_{\text{raw}}, \sigma_a)$ directly and the prior is placed on each separately.

Multiplicative component (linear reparametrization, NCP)

array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;

sigma_b_k: per-slot multiplicative scale. When any_use_b == 1, the size is $K$; when 0, the size is 0 (no parameters declared). Constraint $\sigma_{b,k} > 0$.
c_b_kp_raw: flat-packed standard-normal latent variables for the linear-reparametrized multiplicative coefficients, size $\texttt{total\_J\_b\_free}$.

The NCP construction is:

$$c_b^{(k,j)}_{\ell} = \sigma_{b,k} \cdot c_{b,\text{raw},,\text{offset}^{(k,j)}+\ell}, \quad \ell = 1, \ldots, J_{b,\text{free}}^{(k,j)}$$

with $c_{b,\text{raw},m} \sim \mathcal{N}(0,1)$.

The reference-anchoring relation is $c_b = \theta_{\text{ref}} \odot b$, i.e., the multiplicative coefficient $b$ is defined implicitly by:

$$b^{(k,j)}_{\ell} = \frac{c_b^{(k,j)}_{\ell}}{\theta_{\text{ref},k}^{(1)}[j]}$$

This relation is inverted only in the single-anchor regime (use_groups == 0) in transformed parameters. The linear predictor uses $c_b$ directly (not $b$), avoiding the bilinear term $\theta_{\text{ref}} \cdot b$ in the HMC sampling geometry. This is the linear reparametrization of the multiplicative component.

Population-scope dispersion parameters

array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;

Per-slot population-scope dispersion entries. These are declared with sizes determined by use_dispersion_y_k / use_dispersion_phi_k. They receive priors in the model block but do not enter the likelihood in this section: the Gaussian likelihood uses $\sigma_{ij} = \exp(\eta_{2ij})$ directly, and the NB likelihood uses $\phi_{ij} = \exp(\eta_{2ij})$ directly. These parameters are therefore sampled from their priors only (their posterior equals their prior) in the current iteration. They may be wired into the likelihood or used in generated quantities in section 2.

`transformed parameters` Block

Declarations

array[K] matrix[n, p] eta_kp;

Per-slot linear predictor matrices. $\eta_{k}[i,j]$ is the linear predictor for slot $k$ at observation $i$, coordinate $j$.

array[K, p] vector[J_a_max] a_coef_kp;
array[K, p] vector[J_b_max] c_b_kp;
array[K, p] vector[J_b_max] b_coef_kp;

a_coef_kp: padded additive coefficient vectors (only first $J_a^{(k,j)}$ entries are active).
c_b_kp: padded linear-reparametrized multiplicative coefficient vectors.
b_coef_kp: padded derived multiplicative coefficient vectors (populated only in single-anchor regime).

Zero initialization

for (k in 1:K) {
    for (j in 1:p) {
        a_coef_kp[k, j] = rep_vector(0, J_a_max);
        c_b_kp[k, j] = rep_vector(0, J_b_max);
        b_coef_kp[k, j] = rep_vector(0, J_b_max);
    }
}

All coefficient vectors are initialized to zero. Padded (unused) entries remain zero throughout, ensuring they contribute nothing to dot products.

Additive coefficient construction (NCP scaling)

if (any_use_a == 1) {
    for (k in 1:K) {
        if (use_a_k[k] == 1) {
            for (j in 1:p) {
                if (J_a_free_kp[k, j] > 0) {
                    for (jj in 1:J_a_free_kp[k, j]) {
                        a_coef_kp[k, j][jj] = a_raw[a_raw_offset_kp[k, j] + jj]
                                                * sigma_a_k[sigma_a_idx[k]];
                    }
                }
            }
        }
    }
}

For each active $(k,j)$ pair, the additive coefficients are constructed via NCP scaling:

$$a^{(k,j)}_{\ell} = \sigma_{a,,\texttt{sigma_a_idx}[k]} \cdot a_{\text{raw},,\texttt{offset}^{(k,j)}+\ell}$$

The guard if (any_use_a == 1) prevents any execution when no additive coefficients exist. The inner guards use_a_k[k] == 1 and J_a_free_kp[k, j] > 0 ensure only active slots and coordinates are processed. The compacted index sigma_a_idx[k] maps slot $k$ to its position in the reduced sigma_a_k vector.

Multiplicative coefficient construction (NCP scaling + anchor inversion)

if (any_use_b == 1) {
    for (k in 1:K) {
        if (use_b_k[k] == 1) {
            for (j in 1:p) {
                if (J_b_free_kp[k, j] > 0) {
                    for (jj in 1:J_b_free_kp[k, j]) {
                        c_b_kp[k, j][jj] = c_b_kp_raw[c_b_raw_offset_kp[k, j] + jj]
                                            * sigma_b_k[k];
                    }

The linear-reparametrized multiplicative coefficients are constructed via NCP:

$$c_b^{(k,j)}_{\ell} = \sigma_{b,k} \cdot c_{b,\text{raw},,\texttt{offset}^{(k,j)}+\ell}$$

Note: sigma_b_k[k] uses direct slot indexing (not compacted), because sigma_b_k has size $K$ when any_use_b == 1.

                    if (use_groups == 0 && theta_ref_kp[1, k][j] != 0) {
                        for (jj in 1:J_b_per_kp[k, j]) {
                            b_coef_kp[k, j][jj] = c_b_kp[k, j][jj]
                                                    / theta_ref_kp[1, k][j];
                        }
                    }

In the single-anchor regime (use_groups == 0), the multiplicative coefficient $b$ is recovered by dividing $c_b$ by the anchor:

$$b^{(k,j)}_{\ell} = \frac{c_b^{(k,j)}_{\ell}}{\theta_{\text{ref},k}^{(1)}[j]}$$

The guard theta_ref_kp[1, k][j] != 0 prevents division by zero. When use_groups == 1, $b$ is not recovered (it would be ambiguous which group's anchor to use), and b_coef_kp remains zero. The comment notes this is "reported as a derived quantity only in the single-anchor regime."

Note the loop bound uses J_b_per_kp[k, j] (the original count) rather than J_b_free_kp[k, j] — these are equal when use_b_k[k] == 1, so the distinction is immaterial here, but it is a line-faithful observation.

Linear predictor assembly

for (k in 1:K) {
    for (i in 1:n) {
        int g_i = group_id[i];
        for (j in 1:p) {
            real theta_ref_ikj = theta_ref_kp[g_i, k][j];
            real eta_ikj = theta_ref_ikj;

For each observation $i$, the group $g_i$ is looked up, and the per-group anchor $\theta_{\text{ref},k}^{(g_i)}[j]$ is retrieved. The linear predictor is initialized to the anchor:

$$\eta_{kij} \leftarrow \theta_{\text{ref},k}^{(g_i)}[j]$$

This is the reference-anchoring baseline: the predictor starts at the group-specific anchor.

            if (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0) {
                eta_ikj += dot_product(
                    to_vector(Z_a_kp[k, j][i, 1:J_a_per_kp[k, j]]),
                    a_coef_kp[k, j][1:J_a_per_kp[k, j]]
                );
            }

The additive component is added:

$$\eta_{kij} \mathrel{+}= \sum_{\ell=1}^{J_a^{(k,j)}} Z_a^{(k,j)}[i, \ell] \cdot a^{(k,j)}_{\ell}$$

The to_vector() call extracts the active row slice of the design matrix; dot_product computes the inner product with the active coefficient slice.

            if (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0) {
                eta_ikj += dot_product(
                    to_vector(Z_b_kp[k, j][i, 1:J_b_per_kp[k, j]]),
                    c_b_kp[k, j][1:J_b_per_kp[k, j]]
                );
            }

The multiplicative component (in linear-reparametrized form $c_b$) is added:

$$\eta_{kij} \mathrel{+}= \sum_{\ell=1}^{J_b^{(k,j)}} Z_b^{(k,j)}[i, \ell] \cdot c_b^{(k,j)}_{\ell}$$

Note that $c_b^{(k,j)}_{\ell} = \theta_{\text{ref},k}^{(g_i)}[j] \cdot b^{(k,j)}_{\ell}$ conceptually, so this term equals $\theta_{\text{ref},k}^{(g_i)}[j] \cdot \sum_\ell Z_b^{(k,j)}[i,\ell] \cdot b^{(k,j)}_{\ell}$, making the $b$-effect proportional to the anchor. However, the computation uses $c_b$ directly to avoid the bilinear product in the sampling geometry.

            eta_kp[k][i, j] = eta_ikj;
        }
    }
}

The assembled linear predictor is stored. The complete AMM reference-anchoring decomposition for the linear predictor is:

$$\boxed{\eta_{kij} = \theta_{\text{ref},k}^{(g_i)}[j] + \mathbf{z}_{a}^{(k,j)}[i]^\top \mathbf{a}^{(k,j)} + \mathbf{z}_{b}^{(k,j)}[i]^\top \mathbf{c}_b^{(k,j)}}$$

where $\mathbf{c}_b^{(k,j)} = \theta_{\text{ref},k}^{(g_i)}[j] \odot \mathbf{b}^{(k,j)}$ (the anchoring relation, inverted only in single-anchor mode for reporting).

`model` Block

Anchor priors

if (use_groups == 1) {
    for (k in 1:K) {
        mu_theta_ref_kp[1, k] ~ {{PRIOR_THETA_REF}};
        sigma_theta_ref_kp[1, k] ~ {{PRIOR_SIGMA_THETA_REF}};
        for (g in 1:J_groups) {
            theta_ref_kp[g, k] ~ normal(mu_theta_ref_kp[1, k],
                                        sigma_theta_ref_kp[1, k]);
        }
    }
}

Hierarchical prior (when use_groups == 1): for each slot $k$,

$$\boldsymbol{\mu}_{\theta,k} \sim \texttt{PRIOR_THETA_REF}$$ $$\boldsymbol{\sigma}_{\theta,k} \sim \texttt{PRIOR_SIGMA_THETA_REF}$$ $$\boldsymbol{\theta}_{\text{ref},k}^{(g)} \mid \boldsymbol{\mu}_{\theta,k}, \boldsymbol{\sigma}_{\theta,k} \sim \mathcal{N}_p!\left(\boldsymbol{\mu}_{\theta,k},; \text{diag}(\boldsymbol{\sigma}_{\theta,k})\right), \quad g = 1, \ldots, J_{\text{groups}}$$

The {{PRIOR_THETA_REF}} and {{PRIOR_SIGMA_THETA_REF}} are R-side template placeholders substituted by the dispatcher. The hyperparameters are indexed [1, k] because the first array dimension is use_groups (= 1 in this branch), so only index 1 is valid.

} else {
    for (k in 1:K) {
        theta_ref_kp[1, k] ~ {{PRIOR_THETA_REF}};
    }
}

Direct prior (when use_groups == 0): only group 1 exists, and the anchor receives a direct (non-hierarchical) prior:

$$\boldsymbol{\theta}_{\text{ref},k}^{(1)} \sim \texttt{PRIOR_THETA_REF}$$

Additive component priors

if (any_use_a == 1) {
    sigma_a_k ~ {{PRIOR_SIGMA_A}};
    a_raw ~ normal(0, 1);
}

When any additive free coefficient exists:

$$\sigma_{a,k} \sim \texttt{PRIOR_SIGMA_A}, \quad k \text{ active}$$ $$a_{\text{raw},m} \sim \mathcal{N}(0, 1), \quad m = 1, \ldots, \texttt{total_J_a_free}$$

This is the standard NCP prior: the unit-normal on a_raw combined with the scale prior on sigma_a_k induces the implied prior $a^{(k,j)}_\ell = \sigma_{a,k} \cdot a_{\text{raw},\cdot} \sim \sigma_{a,k} \cdot \mathcal{N}(0,1)$. No Jacobian adjustment is required because Stan samples $(a_{\text{raw}}, \sigma_a)$ directly.

Multiplicative component priors

if (any_use_b == 1) {
    sigma_b_k ~ {{PRIOR_SIGMA_B}};
    c_b_kp_raw ~ normal(0, 1);
}

When any multiplicative free coefficient exists:

$$\sigma_{b,k} \sim \texttt{PRIOR_SIGMA_B}, \quad k = 1, \ldots, K$$ $$c_{b,\text{raw},m} \sim \mathcal{N}(0, 1), \quad m = 1, \ldots, \texttt{total_J_b_free}$$

Same NCP structure as the additive component. The implied prior on $c_b^{(k,j)}_\ell$ is $\sigma_{b,k} \cdot \mathcal{N}(0,1)$.

Dispersion priors

if (sigma_y_size > 0) {
    sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
    phi_pop_k ~ {{PRIOR_PHI}};
}

Population-scope dispersion parameters receive their priors. As noted above, these parameters do not enter the likelihood in this section — they are sampled from their priors only. Their posteriors are therefore equal to their priors, making them non-identified in the likelihood sense. This is a faithful description of the code's behavior; the parameters may serve a purpose in section 2 (generated quantities) or in future family extensions.

Likelihood: family dispatch with coord-wise factorization

The likelihood implements the coord-wise factorization stated in the header:

$$L = \prod_{i=1}^n \prod_{j=1}^p D_{\text{family}}!\left(y_{ij} ;\middle|; \eta_{1ij},; \text{inv_link}_2(\eta_{2ij}),; \ldots\right)$$

Gaussian K=2 (`family_id_k_vector[1] == 1`)

if (family_id_k_vector[1] == 1) {
    for (i in 1:n) {
        for (j in 1:p) {
            real sigma_ij = exp(eta_kp[2][i, j]);
            y_real[i, j] ~ normal(eta_kp[1][i, j], sigma_ij);
        }
    }
}

Slot 1 carries the location $\mu_{ij} = \eta_{1ij}$ (identity link). Slot 2 carries the log-standard-deviation $\log\sigma_{ij} = \eta_{2ij}$, inverted via $\sigma_{ij} = \exp(\eta_{2ij})$ (log link, inv_link_id_per_slot[2] = 2).

The likelihood is:

$$L = \prod_{i=1}^n \prod_{j=1}^p \mathcal{N}!\left(y_{\text{real},ij} ;\middle|; \eta_{1ij},; \exp(\eta_{2ij})\right)$$

$$\log L = \sum_{i=1}^n \sum_{j=1}^p \left[ -\frac{1}{2}\log(2\pi) - \eta_{2ij} - \frac{(y_{\text{real},ij} - \eta_{1ij})^2}{2\exp(2\eta_{2ij})} \right]$$

The exp() inverse link is hardcoded rather than dispatched through inv_link_id_per_slot.

Negative Binomial K=2 (`family_id_k_vector[1] == 3`)

} else if (family_id_k_vector[1] == 3) {
    for (i in 1:n) {
        for (j in 1:p) {
            real phi_ij = exp(eta_kp[2][i, j]);
            y_int[i, j] ~ neg_binomial_2(exp(eta_kp[1][i, j]), phi_ij);
        }
    }
}

Slot 1 carries the log-mean $\log\mu_{ij} = \eta_{1ij}$, inverted via $\mu_{ij} = \exp(\eta_{1ij})$ (log link). Slot 2 carries the log-precision $\log\phi_{ij} = \eta_{2ij}$, inverted via $\phi_{ij} = \exp(\eta_{2ij})$ (log link).

Stan's neg_binomial_2(mu, phi) parameterization has PMF:

$$P(Y = y) = \binom{y + \phi - 1}{y} \left(\frac{\phi}{\phi + \mu}\right)^\phi \left(\frac{\mu}{\phi + \mu}\right)^y$$

with mean $\mu$ and variance $\mu + \mu^2/\phi$.

The likelihood is:

$$L = \prod_{i=1}^n \prod_{j=1}^p \text{NegBinomial}_2!\left(y_{\text{int},ij} ;\middle|; \exp(\eta_{1ij}),; \exp(\eta_{2ij})\right)$$

Deferred families

// family_id_k_vector[1] in {5, 6, 7, 8, 9, 10, 11, 12, 13}:
// deferred to a later iteration of 8.6.D with the explicit
// numerical caveat of opening Section 6.1. The dispatcher upstream
// raises gdpar_unsupported_feature_error before reaching this
// branch.

Families $\{5, \ldots, 13\}$ (Beta, Gamma, Lognormal, Student-t, Tweedie, ZIP, ZINB, Hurdle-Poisson, Hurdle-NB) are not implemented. The R-side dispatcher raises gdpar_unsupported_feature_error before this Stan model is ever called for those families. If the model were reached with an unsupported family_id_k_vector[1], no likelihood branch would execute, resulting in an improper posterior (parameters sampled from priors only).

Absent Blocks

`functions` block

Not present. No custom Stan functions are defined; the model uses only built-in distributions (normal, neg_binomial_2) and built-in functions (exp, dot_product, to_vector, rep_vector).

`generated quantities` block

Not present in section 1 of 2. This block is expected in section 2, where derived quantities such as posterior predictive draws, log-likelihood contributions, and potentially the b_coef_kp recovery in the multi-group regime would be computed. The population-scope dispersion parameters (sigma_y_pop_k, phi_pop_k) and the data-side anchor (theta_anchor_kp) may be consumed there.

`inst/stan/amm_eb_marginal_KxP.stan` — Section 2 of 2

Scope of this section

Section 2 contains only the generated quantities block (preceded by a single closing brace } that terminates the previous block — almost certainly the model block from section 1). There is no functions, data, transformed data, parameters, or transformed parameters block in this section; those were defined in section 1 and are referenced here as already-in-scope symbols. The relevant in-scope identifiers used below are:

n, p — sample size and number of response coordinates.
family_id_k_vector — integer vector indexed by the AMM component index $k$; element [1] selects the observation-family (the marginal response distribution).
eta_kp — a $K \times n \times p$ array of linear predictors (the AMM working-scale coordinates). Indexed as eta_kp[k][i, j] with $k \in \{1,\dots,K\},\ i \in \{1,\dots,n\},\ j \in \{1,\dots,p\}$. In this marginal template $K = 2$: eta_kp[1] is the mean-structure predictor and eta_kp[2] is the auxiliary-structure (dispersion/scale) predictor.
y_real — $n \times p$ real-valued response matrix (used for continuous families).
y_int — $n \times p$ integer-valued response matrix (used for count families).

Line-by-line walkthrough

Closing brace of the previous block

This single } closes the block opened in section 1 (the model block, where the joint log-posterior was incremented). It is reproduced here only because the section boundary splits the file mid-program. No semantics of its own.

Block header

generated quantities {

Opens the generated quantities block. Quantities declared here are not used for HMC sampling; they are computed once per posterior draw and stored for downstream inference (LOO-CV, posterior-predictive checks, recovery diagnostics). Because they execute after each leapfrog trajectory is accepted, all eta_kp values reflect a valid posterior sample.

Declarations

  // Per-coord log-likelihood and posterior-predictive quantities.
  matrix[n, p] log_lik;
  matrix[n, p] theta_i;
  matrix[n, p] y_pred;

Three $n \times p$ matrices are declared, one per coordinate $(i,j)$:

Symbol	Type	Role
`log_lik`	`matrix[n, p]`	Per-observation log-likelihood $\log p(y_{ij} \mid \eta_{1,ij}, \eta_{2,ij})$. Shaped for direct feeding into `loo::loo()` (after reshaping to a vector) for PSIS-LOO at the coordinate level.
`theta_i`	`matrix[n, p]`	The anchored mean predictor on its natural working scale, $\theta_{ij} := \eta_{1,ij}$. Stored separately so the user can inspect the identified mean structure without re-applying link functions.
`y_pred`	`matrix[n, p]`	Posterior-predictive replicate $y^{\text{rep}}_{ij} \sim p(\cdot \mid \eta_{1,ij}, \eta_{2,ij})$ at each draw.

Note that theta_i is declared as a matrix even for the negative-binomial branch, where the natural parameter is $\log \mu_{ij}$; the stored value is the linear predictor $\eta_{1,ij}$, not the mean $\mu_{ij}$. This is consistent with the AMM convention that theta_i always holds the reference-anchored coordinate on the working (link) scale.

Family dispatch

  if (family_id_k_vector[1] == 1) {

The dispatch keys on family_id_k_vector[1] — the family identifier for the first AMM component, which in this marginal template is the observation model. The code recognizes exactly two values:

1 → Gaussian (normal) family.
3 → Negative-binomial (NB2) family.

Other values of family_id_k_vector[1] (e.g. 2, presumably Bernoulli/logistic in the package's family registry) are not handled in this template; if encountered, log_lik, theta_i, and y_pred remain uninitialized, which Stan treats as a runtime error (NaN propagation). This is faithful to the source: no else clause is provided.

Branch 1 — Gaussian family (`family_id_k_vector[1] == 1`)

    for (i in 1:n) {
      for (j in 1:p) {
        real sigma_ij = exp(eta_kp[2][i, j]);
        theta_i[i, j] = eta_kp[1][i, j];
        log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j],
                                     sigma_ij);
        y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij);
      }
    }

For each coordinate $(i,j)$:

real sigma_ij = exp(eta_kp[2][i, j]); The auxiliary predictor $\eta_{2,ij}$ is mapped through the log link to the positive standard deviation: $\sigma_{ij} = \exp(\eta_{2,ij}) > 0.$ This is the only place the link for the dispersion is applied; $\eta_{2,ij}$ itself is unconstrained on the real line, so no Jacobian term is required (the transformation is applied to a deterministic function of parameters inside the likelihood, not to a transformed parameter being sampled).
theta_i[i, j] = eta_kp[1][i, j]; The mean-structure predictor is copied verbatim: $\theta_{ij} = \eta_{1,ij}.$ For the Gaussian family the mean link is the identity, so $\theta_{ij}$ is also the conditional mean $\mathbb{E}[y_{ij}] = \eta_{1,ij}$.
log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j], sigma_ij); The Gaussian log-density is evaluated at the observed $y_{ij}^{\mathbb{R}}$: $\log p(y_{ij} \mid \eta_{1,ij}, \eta_{2,ij}) = -\tfrac{1}{2}\log(2\pi) - \log \sigma_{ij} - \frac{(y_{ij} - \eta_{1,ij})^2}{2\sigma_{ij}^2}.$ This is the marginal observation likelihood for coordinate $(i,j)$ — the AMM "EB marginal" likelihood, conditional on the anchored predictors.
y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij); A posterior-predictive draw: $y^{\text{rep}}_{ij} \sim \mathcal{N}(\eta_{1,ij},\, \sigma_{ij}^2).$ normal_rng consumes the per-draw RNG stream and is therefore a valid posterior-predictive sample.

Branch 2 — Negative-binomial family (`family_id_k_vector[1] == 3`)

  } else if (family_id_k_vector[1] == 3) {
    for (i in 1:n) {
      for (j in 1:p) {
        real phi_ij = exp(eta_kp[2][i, j]);
        real mu_ij = exp(eta_kp[1][i, j]);
        theta_i[i, j] = eta_kp[1][i, j];
        log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij);
        y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij);
      }
    }
  }
}

For each coordinate $(i,j)$:

real phi_ij = exp(eta_kp[2][i, j]); The auxiliary predictor is mapped through the log link to the NB2 precision (inverse-dispersion) parameter: $\phi_{ij} = \exp(\eta_{2,ij}) > 0.$ Larger $\eta_{2,ij}$ ⇒ larger $\phi_{ij}$ ⇒ lower over-dispersion. The variance is $\mathrm{Var}(y_{ij}) = \mu_{ij} + \frac{\mu_{ij}^2}{\phi_{ij}}.$
real mu_ij = exp(eta_kp[1][i, j]); The mean-structure predictor is mapped through the log link to the positive mean: $\mu_{ij} = \exp(\eta_{1,ij}) > 0.$ This is the canonical link for the NB2 mean. As in the Gaussian branch, the link is applied here inside generated quantities rather than to a sampled transformed parameter, so no Jacobian is added.
theta_i[i, j] = eta_kp[1][i, j]; The anchored coordinate is stored on the working (log) scale: $\theta_{ij} = \eta_{1,ij} = \log \mu_{ij}.$ This is the AMM convention: theta_i always holds the linear predictor, regardless of family. Downstream consumers must exponentiate if they want the mean.
log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij); The NB2 log-PMF evaluated at the observed integer count $y_{ij}^{\mathbb{Z}}$: $\log p(y_{ij} \mid \mu_{ij}, \phi_{ij}) = \log\Gamma(y_{ij}+\phi_{ij}) - \log\Gamma(\phi_{ij}) - \log\Gamma(y_{ij}+1) + \phi_{ij}\log\!\frac{\phi_{ij}}{\phi_{ij}+\mu_{ij}} + y_{ij}\log\!\frac{\mu_{ij}}{\phi_{ij}+\mu_{ij}}.$ This is the marginal coordinate likelihood under the AMM decomposition.
y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij); A posterior-predictive count: $y^{\text{rep}}_{ij} \sim \mathrm{NB2}(\mu_{ij}, \phi_{ij}).$ Note that y_pred is declared as a matrix[n,p] (real-valued), so the integer draw is implicitly stored as a real; downstream code that expects integer counts must round or cast.

The final } closes the generated quantities block (and hence the program).

Mathematical summary

Let $\eta_{1,ij}$ and $\eta_{2,ij}$ denote the two AMM working-scale coordinates at observation $i$, response $j$. The marginal observation model encoded by this block is

$$ y_{ij} \mid \eta_{1,ij}, \eta_{2,ij} \sim \begin{cases} \mathcal{N}!\big(\eta_{1,ij},, \exp(2\eta_{2,ij})\big), & \text{if } \texttt{family_id_k_vector[1]} = 1, \[4pt] \mathrm{NB2}!\big(\exp(\eta_{1,ij}),, \exp(\eta_{2,ij})\big), & \text{if } \texttt{family_id_k_vector[1]} = 3. \end{cases} $$

The block produces, for every posterior draw $\eta^{(s)}$:

$\log p(y_{ij} \mid \eta^{(s)}_{1,ij}, \eta^{(s)}_{2,ij})$ into log_lik[i,j];
$\eta^{(s)}_{1,ij}$ into theta_i[i,j] (the reference-anchored mean coordinate, on the working scale);
$y^{\text{rep},(s)}_{ij}$ into y_pred[i,j], drawn from the same marginal law.

Realization of the AMM reference-anchoring decomposition

The "AMM" (Anchored Measurement Model) reference-anchoring decomposition splits the parameter vector for each coordinate into:

A reference coordinate $\eta_{1,ij}$ — the mean-structure linear predictor, identified up to the chosen link. This is the quantity stored verbatim in theta_i and is the canonical "anchored" target of inference.
An auxiliary coordinate $\eta_{2,ij}$ — the dispersion/scale linear predictor, also on an unconstrained working scale, mapped to the positive half-line via $\exp(\cdot)$ inside the likelihood.

The "EB marginal" qualifier in the file name indicates that the likelihood evaluated here is the marginal coordinate-wise likelihood (each $(i,j)$ treated as an independent draw from its own exponential-family distribution), not a joint multivariate likelihood. Independence across coordinates is implicit: the double for loop accumulates per-coordinate log-densities with no covariance term, so

$$ \log p(\mathbf{Y} \mid \eta_1, \eta_2) = \sum_{i=1}^{n}\sum_{j=1}^{p} \log p(y_{ij} \mid \eta_{1,ij}, \eta_{2,ij}). $$

The KxP suffix denotes that the AMM array is indexed by $K$ parameter components (here $K=2$) and $P$ response coordinates; the family_id_k_vector mechanism allows each component $k$ to carry its own family in principle, but this template only consumes family_id_k_vector[1] — the family of the observation model — and treats eta_kp[2] purely as a dispersion predictor whose own "family" is implicitly log-Gaussian on the working scale (i.e., it is always exponentiated). This is the canonical AMM pattern: the first component carries the response family, and subsequent components are auxiliary working-scale regressors whose link is fixed by the response family's requirements (log link for $\sigma$ in the Gaussian case; log link for $\phi$ in the NB2 case).

No Jacobian adjustments appear in this block because:

eta_kp is declared (in section 1) as an unconstrained array of reals, sampled directly by HMC.
The link functions $\exp(\cdot)$ are applied inside the likelihood/RNG calls as deterministic transformations of parameters, not as transformations of sampled transformed parameters. The change-of-variables term would only be required if a constrained quantity (e.g., sigma declared with <lower=0>) were being sampled; here the unconstrained $\eta_{2,ij}$ is the sampled quantity and $\sigma_{ij}$ is a deterministic function of it.

Consequently, the posterior sampled over $\eta_{1,ij}, \eta_{2,ij}$ is exactly the posterior implied by the marginal likelihoods above together with whatever priors were declared in the model block (section 1), and the generated quantities block merely exposes the implied per-coordinate likelihood, anchored mean, and posterior-predictive replicate for downstream use.

← Part V — Stan Templates (2/3) · gdpar Wiki Home · Part VI — Data, Benchmarks, Tests & References →

gdpar Wiki

Home

Part V Stan Templates 3

inst/stan/_canonical_pieces/amm_canonical_pmulti.stan

functions Block

data Block

Scalar dimensions and model flags

Ragged design matrix dimensions

Padded design matrices

Modulating component (W) data

Outcome matrices

Anchoring

Dispersion toggles and hyperpriors

W basis dispatch parameters

Grouping (hierarchical extension, Block 6.5)

Placeholder

transformed data Block

Size calculations

Per-coordinate free counts and flat offsets

parameters Block

Group-level reference parameters

Additive component

Multiplicative component

Modulating component

Dispersion parameters

transformed parameters Block

Declarations

Initialization to zero

Additive coefficient construction

Multiplicative coefficient construction

Linear predictor construction

model Block

Priors

Likelihood

Reference-anchoring decomposition summary

inst/stan/_canonical_pieces/amm_canonical_pmulti.stan — Section 2 of 2

Closing of the model block

generated quantities block

Declaration 1 — pointwise log-likelihood store

Declaration 2 — AMM individual predictor handle

Declaration 3 — posterior-predictive store

The double loop

Branch family_id == 1 — Gaussian / normal

Branch family_id == 2 — Poisson with log link

Branch family_id == 3 — Negative binomial (NB2) with log link

Branch family_id == 4 — Bernoulli with logit link

Closing of the dispatch and loops

How this realizes the AMM reference-anchoring decomposition

inst/stan/amm_eb_conditional_KxP.stan

amm_eb_conditional_KxP.stan — Empirical Bayes Conditional AMM Template (Path C)

1. Architectural Overview

The AMM Reference-Anchoring Decomposition

K-Slot Multi-Parametric Likelihood (Path B)

Coordinate-Wise Factorization (Path A)

Supported Families (initial coverage)

2. data Block

2.1 Core Dimensions

2.2 Family and Link Configuration

2.3 Component Usage Flags

2.4 Design Matrix Dimensions

2.5 Design Matrices

2.6 Response Data

2.7 Anchor Data

2.8 Dispersion Flags

2.9 Group Structure

2.10 Redundant Dimension Mirrors

2.11 Plug-in EB Anchor (the Conditional Template's Defining Data)

3. transformed data Block

3.1 Free-Coefficient Counts and Offsets

3.2 The Main Indexing Loop

3.3 Option A Compactification of σ_a,k (RG.6, D96)

4. parameters Block

4.1 Additive Component Parameters

4.2 B-Component (Anchor-Mediated Scaling) Parameters

4.3 Population Dispersion Parameters

4.4 Absence of theta_ref_kp

5. transformed parameters Block

5.1 Declarations

5.2 Zero Initialization

5.3 Additive Coefficient Construction (NCP Scaling)

5.4 B-Component Coefficient Construction

5.4.1 Scaled c_b Construction (NCP)

`functions` Block

`data` Block

`transformed data` Block

`parameters` Block

`transformed parameters` Block

`model` Block

`inst/stan/_canonical_pieces/amm_canonical_pmulti.stan` — Section 2 of 2

Closing of the `model` block

`generated quantities` block

Branch `family_id == 1` — Gaussian / normal

Branch `family_id == 2` — Poisson with log link

Branch `family_id == 3` — Negative binomial (NB2) with log link

Branch `family_id == 4` — Bernoulli with logit link

`amm_eb_conditional_KxP.stan` — Empirical Bayes Conditional AMM Template (Path C)

2. `data` Block

3. `transformed data` Block

4. `parameters` Block

4.4 Absence of `theta_ref_kp`

5. `transformed parameters` Block

6. `model` Block

6.2.1 Gaussian Family (`family_id_k_vector[1] == 1`)

6.2.2 Negative Binomial Family (`family_id_k_vector[1] == 3`)

7. `generated quantities` Block

`amm_eb_marginal_KxP.stan` — Block-by-Block Technical Reference (Section 1 of 2)

`data` Block

`transformed data` Block

`parameters` Block

Anchor hyperparameters (hierarchical prior, active only when `use_groups == 1`)

`transformed parameters` Block

`model` Block

Gaussian K=2 (`family_id_k_vector[1] == 1`)

Negative Binomial K=2 (`family_id_k_vector[1] == 3`)

`functions` block

`generated quantities` block