-
Notifications
You must be signed in to change notification settings - Fork 0
Part V Stan Templates 3
← Part V — Stan Templates (2/3) · gdpar Wiki Home · Part VI — Data, Benchmarks, Tests & References →
functions {
// {{CANONICAL_HELPERS}}
}A placeholder block. At code-generation time, the R helper generate_stan_code_multi() replaces {{CANONICAL_HELPERS}} with the actual Stan function definitions. Based on the header and the call site in transformed parameters, at minimum this block must define:
vector apply_W_basis_diff(int W_type_id, real theta_ref, real theta_anchor,
int W_per_k_dim, int W_degree,
int W_n_knots_full, vector W_knots_full)This function returns a vector of length W_per_k_dim whose W_type_id:
W_type_id |
Basis | Evaluation |
|---|---|---|
| 0 | (off) | Not called; guarded by use_W == 0. |
| 1 | Polynomial (degree W_degree) |
Monomials |
| 2 | B-spline | Cox–de Boor recursion re-evaluated at both theta_ref and theta_anchor on the knot vector W_knots_full; difference of the two B-spline coefficient vectors. Because evaluation depends on the current theta_ref, which changes every HMC leapfrog step, this is non-separable in the Hamiltonian sense. |
The function is called once per observation
All inputs supplied from R via rstan::sampling(..., data = stan_data).
| Declaration | Meaning |
|---|---|
int<lower=1> n |
Number of observations. |
int<lower=1> p |
Dimension of the multivariate outcome |
int<lower=1, upper=4> family_id |
Homogeneous likelihood family across all |
int<lower=0, upper=1> use_a |
Toggle additive component |
int<lower=0, upper=1> use_b |
Toggle multiplicative component |
int<lower=0, upper=1> use_W |
Toggle modulating component |
| Declaration | Meaning |
|---|---|
int<lower=0> J_a_max |
|
int<lower=0> J_b_max |
|
array[p] int<lower=0> J_a_per_k |
Per-coordinate active column count for |
array[p] int<lower=0> J_b_per_k |
Analogous for |
array[p] matrix[n, J_a_max] Z_a;
array[p] matrix[n, J_b_max] Z_b;For coordinate Z_a[k][, 1:J_a_per_k[k]] is referenced. The remaining padded columns occupy memory but never enter any computation. The R-side constructor amm_spec_multi_designs() column-centers every active column so that
| Declaration | Meaning |
|---|---|
int<lower=0> dim_W |
Total number of basis rows across all coordinates. For separable bases, |
int<lower=0> d |
Dimension of the modulator covariate vector |
int<lower=0> W_per_k_dim |
Number of basis rows allocated to each coordinate: |
matrix[n, d] X |
Modulator covariate matrix. Row |
matrix[n, p] y_real;
array[n, p] int y_int;Both are allocated. Only the one matching family_id is consumed:
-
y_realwhenfamily_id == 1(Gaussian). -
y_intwhenfamily_id ∈ {2, 3, 4}(Poisson, NB2, Bernoulli).
vector[p] theta_anchor;The fixed reference anchor $\boldsymbol{\theta}{\text{anchor}} \in \mathbb{R}^p$. Entered once and held constant throughout sampling. Satisfies condition (C4): the modulating component is anchored by evaluating $W(\theta{\text{ref}}) - W(\theta_{\text{anchor}})$.
| Declaration | Meaning |
|---|---|
int<lower=0, upper=1> use_dispersion_y |
Whether the Gaussian family uses a per-coordinate observation noise |
int<lower=0, upper=1> use_dispersion_phi |
Whether the NB2 family uses a per-coordinate dispersion |
| Declaration | Meaning |
|---|---|
int<lower=0, upper=2> W_type_id |
0 = off, 1 = polynomial, 2 = B-spline. |
int<lower=0> W_n_knots_full |
Length of the (possibly padded) knot vector for B-splines. |
vector[W_n_knots_full] W_knots_full |
Knot positions. Ignored for polynomial. |
int<lower=0> W_degree |
Polynomial degree or B-spline degree. |
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;When use_groups == 0 (default): J_groups == 1 and group_id[i] == 1 for all use_groups == 1: each group $g \in {1,\ldots,J{\text{groups}}}$ gets its own Z_a[k] must not contain a factor(group) indicator when use_groups == 1.
{{DATA_CP_A_PER_K_DECL}}Substituted at code-generation time. Declares any additional data needed for the configurable CP/NCP (centered / non-centered) parametrization of a_raw, e.g., a vector of per-coordinate logical flags cp_a_per_k.
Computes integer bookkeeping for flat-packed coefficient vectors.
array[p] int J_a_free;
array[p] int J_b_free;
array[p] int a_raw_offset;
array[p] int c_b_raw_offset;
int total_J_a_free = 0;
int total_J_b_free = 0;| Declaration | Formula | Purpose |
|---|---|---|
dim_W_eff |
(use_W == 1) ? dim_W : 0 |
Actual rows of W_raw; 0 when W is off. |
d_eff |
(use_W == 1) ? d : 0 |
Actual columns of W_raw; 0 when W is off. |
sigma_a_size |
(use_a == 1) ? p : 0 |
Number of per-coordinate |
sigma_b_size |
(use_b == 1) ? p : 0 |
Number of per-coordinate |
sigma_W_size |
(use_W == 1 && dim_W > 0) ? 1 : 0 |
Single global |
sigma_y_size |
(use_dispersion_y == 1) ? p : 0 |
Per-coordinate Gaussian |
phi_size |
(use_dispersion_phi == 1) ? p : 0 |
Per-coordinate NB2 dispersion |
for (k in 1:p) {
J_a_free[k] = (use_a == 1 && J_a_per_k[k] > 0) ? J_a_per_k[k] : 0;
J_b_free[k] = (use_b == 1 && J_b_per_k[k] > 0) ? J_b_per_k[k] : 0;
a_raw_offset[k] = total_J_a_free;
c_b_raw_offset[k] = total_J_b_free;
total_J_a_free += J_a_free[k];
total_J_b_free += J_b_free[k];
}For each coordinate
-
$J_{a,\text{free}}[k]$ : number of actually-sampled additive coefficients (0 if component is inactive or has zero columns). -
$J_{b,\text{free}}[k]$ : same for multiplicative. -
a_raw_offset[k]: the 0-based index into the flat vectora_rawwhere coordinate$k$ 's block begins. The segment for coordinate$k$ isa_raw[(a_raw_offset[k]+1) : (a_raw_offset[k]+J_a_free[k])](1-indexed in Stan). -
c_b_raw_offset[k]: same forc_b_raw. - After the loop,
total_J_a_free = \sum_{k=1}^p J_{a,\text{free}}[k]and similarly fortotal_J_b_free.
This implements ragged-array emulation — Stan has no native ragged arrays, so coordinates with different basis widths are flat-packed into a single vector with per-coordinate offset arithmetic.
All parameters on the unconstrained sampling space.
array[J_groups] vector[p] theta_ref;
array[use_groups] vector[p] mu_theta_ref;
array[use_groups] vector<lower=0>[p] sigma_theta_ref;| Parameter | Dimension | Constr. | Role |
|---|---|---|---|
theta_ref[g] |
|
unconstrained | Per-group reference anchor use_groups == 0, only theta_ref[1] is sampled. |
mu_theta_ref[1] |
unconstrained | Hyperprior mean use_groups == 1. |
|
sigma_theta_ref[1] |
lower=0 |
Hyperprior scale use_groups == 1. |
array[sigma_a_size] real<lower=0> sigma_a;
vector[total_J_a_free] a_raw;| Parameter | Role |
|---|---|
sigma_a[k] |
Per-coordinate prior scale for the additive coefficients. Exists only when use_a == 1. |
a_raw |
Flat-packed raw additive coefficients. Depending on cp_a_per_k[k], these are either the actual coefficients (centered parametrization, CP) or standard-normal auxiliary variables mapped via $\mathbf{a}k = \sigma{a,k} \cdot \mathbf{a}_{\text{raw},k}$ (non-centered parametrization, NCP). The substitution {{TP_A_BLOCK}} and {{MODEL_A_BLOCK}} handle the dispatch. |
array[sigma_b_size] real<lower=0> sigma_b;
vector[total_J_b_free] c_b_raw;| Parameter | Role |
|---|---|
sigma_b[k] |
Per-coordinate prior scale for the multiplicative (reparametrized) coefficients. |
c_b_raw |
Flat-packed standard-normal raw variables for c_b_raw ~ normal(0, 1), then transformed to c_b[k] = sigma_b[k] * c_b_raw[k] in transformed parameters. |
The key reparametrization: we sample $\mathbf{c}b$ directly (not $\mathbf{b}$), where $\mathbf{c}{b,k} = \theta_{\text{ref}}[k] \cdot \mathbf{b}k$. This makes the additive contribution $\mathbf{Z}{b,k} \mathbf{c}{b,k}$ linear in the parameters, eliminating the bimodality documented in Recovery 2 (the $(\theta{\text{ref}}, \mathbf{b})$ space has a ridge where sign flips of
array[sigma_W_size] real<lower=0> sigma_W;
matrix[dim_W_eff, d_eff] W_raw;| Parameter | Role |
|---|---|
sigma_W[1] |
Single global prior scale for W coefficients. |
W_raw |
W_raw is either direct (CP: W_raw ~ ...) or involves a scaling by sigma_W (NCP: the {{W_SCALE}} placeholder in transformed parameters). |
array[sigma_y_size] real<lower=0> sigma_y;
array[phi_size] real<lower=0> phi;| Parameter | Family | Role |
|---|---|---|
sigma_y[k] |
Gaussian (family_id == 1) |
Per-coordinate observation standard deviation |
phi[k] |
NB2 (family_id == 3) |
Per-coordinate reciprocal dispersion (Stan's mean–dispersion parametrization: |
Constructs the linear predictor
array[p] vector[J_a_max] a_coef;
array[p] vector[J_b_max] c_b;
array[p] vector[J_b_max] b_coef;
matrix[n, p] eta;| Variable | Shape | Meaning |
|---|---|---|
a_coef[k] |
Additive coefficients for coordinate |
|
c_b[k] |
Reparametrized multiplicative coefficients: $\mathbf{c}{b,k} = \sigma{b,k} \cdot \mathbf{c}_{b,\text{raw},k}$. | |
b_coef[k] |
Recovered multiplicative coefficients: $\mathbf{b}k = \mathbf{c}{b,k} / \theta_{\text{ref}}[1][k]$ (only computed when use_groups == 0 and |
|
eta[i,k] |
Linear predictor. Its interpretation depends on family_id. |
for (k in 1:p) {
a_coef[k] = rep_vector(0, J_a_max);
c_b[k] = rep_vector(0, J_b_max);
b_coef[k] = rep_vector(0, J_b_max);
}Ensures all padded positions are identically zero.
{{TP_A_BLOCK}}Placeholder replaced at code-generation time. Implements per-coordinate CP/NCP logic:
-
CP (
cp_a_per_k[k] == 1):a_coef[k][1:J_a_free[k]] = a_raw[(a_raw_offset[k]+1):(a_raw_offset[k]+J_a_free[k])]. The coefficients are sampled directly; the prior{{MODEL_A_BLOCK}}then targetsa_rawwith scale$\sigma_{a,k}$ . -
NCP (
cp_a_per_k[k] == 0):a_coef[k][1:J_a_free[k]] = sigma_a[k] * a_raw[...]. The raw variables are standard normal; the scaling by$\sigma_{a,k}$ happens here.
Mathematically: $$\mathbf{a}k = \begin{cases} \mathbf{a}{\text{raw},k} & \text{(CP: prior has scale } \sigma_{a,k}\text{)} \ \sigma_{a,k} \cdot \mathbf{a}_{\text{raw},k} & \text{(NCP: raw} \sim \mathcal{N}(0, I)\text{)} \end{cases}$$
if (use_b == 1) {
for (k in 1:p) {
if (J_b_per_k[k] > 0) {
for (j in 1:J_b_free[k]) {
c_b[k][j] = c_b_raw[c_b_raw_offset[k] + j] * sigma_b[k];
}
if (use_groups == 0 && theta_ref[1][k] != 0) {
for (j in 1:J_b_per_k[k]) {
b_coef[k][j] = c_b[k][j] / theta_ref[1][k];
}
}
}
}
}Line-by-line:
- Guard: only executed when the multiplicative component is active.
-
Per-coordinate loop over
$k = 1,\ldots,p$ . - Inner guard: skip coordinates with zero multiplicative basis columns.
- NCP scaling: $\mathbf{c}{b,k} = \sigma{b,k} \cdot \mathbf{c}_{b,\text{raw},k}$. This is always NCP — the raw variables are standard normal (priors below).
-
Derived quantity
b_coef: computed only in the single-anchor regime (use_groups == 0) and when$\theta_{\text{ref}}[1][k] \neq 0$ . The formula is:$$b_{k,j} = \frac{c_{b,k,j}}{\theta_{\text{ref}}[1][k]}$$ With multiple groups there is no single canonical scalar to divide by, sob_coefremains at zero — users must derive$b_{k,j}^{(g)} = c_{b,k,j} / \theta_{\text{ref}}[g][k]$ post-hoc.
for (i in 1:n) {
int g_i = group_id[i];
for (k in 1:p) {
real theta_ref_ik = theta_ref[g_i][k];
real eta_ik = theta_ref_ik;For each observation
- Look up group index
$g_i = \text{group_id}[i]$ . - Initialize
$\eta_{i,k} = \theta_{\text{ref}}[g_i][k]$ .
Additive term:
if (use_a == 1 && J_a_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_a[k][i, 1:J_a_per_k[k]]),
a_coef[k][1:J_a_per_k[k]]
);
}Adds $\mathbf{Z}{a,k}[i,, 1!:!J{a,k}] \cdot \mathbf{a}k[1!:!J{a,k}]$.
Multiplicative term (reparametrized):
if (use_b == 1 && J_b_per_k[k] > 0) {
eta_ik += dot_product(
to_vector(Z_b[k][i, 1:J_b_per_k[k]]),
c_b[k][1:J_b_per_k[k]]
);
}Adds $\mathbf{Z}{b,k}[i,, 1!:!J{b,k}] \cdot \mathbf{c}{b,k}[1!:!J{b,k}]$.
This encodes $b_k(\mathbf{x}i) \cdot \theta{\text{ref}}[g_i][k]$ via the linear reparametrization $\mathbf{c}{b,k} = \theta{\text{ref}}[k] \cdot \mathbf{b}k$, so that: $$\mathbf{Z}{b,k}[i,:] \cdot \mathbf{c}{b,k} = \theta{\text{ref}}[g_i][k] \cdot \underbrace{\mathbf{Z}_{b,k}[i,:] \cdot \mathbf{b}k}{b_k(\mathbf{x}_i)}$$
Modulating (W) term:
if (use_W == 1 && dim_W > 0 && d > 0 && W_per_k_dim > 0) {
vector[d] W_diff_x_k = rep_vector(0, d);
vector[W_per_k_dim] basis_diff_k = apply_W_basis_diff(
W_type_id, theta_ref_ik, theta_anchor[k], W_per_k_dim,
W_degree, W_n_knots_full, W_knots_full
);
int row_start = (k - 1) * W_per_k_dim + 1;
for (jj_local in 1:W_per_k_dim) {
int row = row_start + jj_local - 1;
for (jjj in 1:d) {
W_diff_x_k[jjj] += basis_diff_k[jj_local] * W_raw[row, jjj]{{W_SCALE}};
}
}
eta_ik += dot_product(W_diff_x_k, to_vector(X[i]));
}Step by step:
- Guard: all five conditions must hold (component active, non-zero dimensions).
-
W_diff_x_k: accumulator$\in \mathbb{R}^d$ , initialized to zero. -
basis_diff_k: the vector $\boldsymbol{\delta}k$ with elements: $$\delta{k,j} = \phi_j!\bigl(\theta_{\text{ref}}[g_i][k]\bigr) - \phi_j!\bigl(\theta_{\text{anchor}}[k]\bigr), \quad j = 1,\ldots,W_{\text{per_k_dim}}$$ -
row_start = (k-1)*W_per_k_dim + 1: the first row inW_rawbelonging to coordinate$k$ . -
Inner double loop: computes
$$\texttt{W_diff_x_k}[c] = \sum_{j=1}^{W_{\text{per_k_dim}}} \delta_{k,j} \cdot W_{\text{raw}}[\text{row}j, c] \cdot \underbrace{\texttt{W_SCALE}}{\text{scaling factor}}$$
for each column
$c = 1,\ldots,d$ . The{{W_SCALE}}placeholder is either empty (CP) or* sigma_W[1](NCP). The matrix–vector product is: $$\mathbf{W}{\text{diff}}^{(k)} = \mathbf{W}{\text{raw}}^{(k)\top} , \boldsymbol{\delta}k \quad \in \mathbb{R}^d$$ where $\mathbf{W}{\text{raw}}^{(k)}$ is the$W_{\text{per_k_dim}} \times d$ sub-block ofW_rawfor coordinate$k$ . -
Dot product:
$\eta_{i,k} \mathrel{+}= \bigl(\mathbf{W}_{\text{diff}}^{(k)}\bigr)^\top \mathbf{x}_i$ .
Assembling all three additive components, the full linear predictor is:
The coordinate-wise factorization means cross-dimensional coupling enters only through the shared
Group-level reference:
if (use_groups == 1) {
mu_theta_ref[1] ~ {{PRIOR_THETA_REF}};
sigma_theta_ref[1] ~ {{PRIOR_SIGMA_THETA_REF}};
for (g in 1:J_groups) {
theta_ref[g] ~ normal(mu_theta_ref[1], sigma_theta_ref[1]);
}
} else {
theta_ref[1] ~ {{PRIOR_THETA_REF}};
}-
use_groups == 1(hierarchical):- Hyperprior on
$\boldsymbol{\mu}_\theta$ : coordinate-wise prior{{PRIOR_THETA_REF}}(placeholder, e.g.,normal(0, 5)). - Hyperprior on
$\boldsymbol{\sigma}_\theta$ : coordinate-wise prior{{PRIOR_SIGMA_THETA_REF}}(e.g.,exponential(1)). - Group-level: for each
$g$ , $\boldsymbol{\theta}{\text{ref}}[g] \sim \mathcal{N}(\boldsymbol{\mu}\theta, \boldsymbol{\sigma}_\theta)$ (coordinate-wise independent).
The joint density is: $$p(\boldsymbol{\theta}{\text{ref}}, \boldsymbol{\mu}\theta, \boldsymbol{\sigma}\theta) = p(\boldsymbol{\mu}\theta), p(\boldsymbol{\sigma}\theta) \prod{g=1}^{J_{\text{groups}}} \prod_{k=1}^p \mathcal{N}!\bigl(\theta_{\text{ref}}[g][k] \mid \mu_{\theta,k},, \sigma_{\theta,k}\bigr)$$
- Hyperprior on
-
use_groups == 0(single anchor):$\boldsymbol{\theta}_{\text{ref}}[1] \sim$ {{PRIOR_THETA_REF}}(coordinate-wise). This collapses to the Block 6 model.
Additive component:
if (use_a == 1) {
sigma_a ~ {{PRIOR_SIGMA_A}};
{{MODEL_A_BLOCK}}
}-
sigma_a[k]: per-coordinate prior{{PRIOR_SIGMA_A}}(e.g.,exponential(1)). -
{{MODEL_A_BLOCK}}: dispatches CP vs NCP per coordinate. For CP coordinates, targetsa_rawsegments with$\mathcal{N}(0, \sigma_{a,k}^2)$ . For NCP coordinates, targetsa_rawsegments with$\mathcal{N}(0, 1)$ (the scaling by$\sigma_{a,k}$ happened intransformed parameters).
Multiplicative component:
if (use_b == 1) {
sigma_b ~ {{PRIOR_SIGMA_B}};
c_b_raw ~ normal(0, 1);
}-
sigma_b[k]: per-coordinate prior (e.g.,exponential(1)). -
c_b_raw: always$\mathcal{N}(0, I)$ . This is the NCP for $\mathbf{c}b$: the transformation $\mathbf{c}b = \sigma_b \cdot \mathbf{c}{b,\text{raw}}$ was applied intransformed parameters. The Jacobian of this linear transformation is constant ($\prod_k \sigma{b,k}^{J_{b,\text{free}}[k]}$) and is automatically handled by Stan's change-of-variables for unconstrained parameters (though technicallyc_b_rawis unconstrained, so the Jacobian contributes to the log-density through thetransformed parametersassignment).
Modulating component:
if (use_W == 1 && dim_W > 0) {
sigma_W[1] ~ {{PRIOR_SIGMA_W}};
to_vector(W_raw) ~ {{W_PRIOR}};
}-
sigma_W[1]: global prior scale for W (e.g.,exponential(1)). -
W_raw: flattened to a vector and assigned prior{{W_PRIOR}}(e.g.,normal(0, 1)for NCP, ornormal(0, sigma_W[1])for CP — dispatched by code generation).
Dispersion parameters:
if (use_dispersion_y == 1) {
sigma_y ~ {{PRIOR_SIGMA_Y}};
}
if (use_dispersion_phi == 1) {
phi ~ {{PRIOR_PHI}};
}Per-coordinate priors on
for (k in 1:p) {
if (family_id == 1) {
y_real[:, k] ~ normal(eta[:, k], sigma_y[k]);
} else if (family_id == 2) {
y_int[:, k] ~ poisson_log(eta[:, k]);
} else if (family_id == 3) {
y_int[:, k] ~ neg_binomial_2_log(eta[:, k], phi[k]);
} else if (family_id == 4) {
y_int[:, k] ~ bernoulli_logit(eta[:, k]);
}
}For each coordinate
family_id |
Distribution | Link | Log-likelihood contribution |
|---|---|---|---|
| 1 (Gaussian) | identity | ||
| 2 (Poisson) | log | ||
| 3 (NB2) | log | ||
| 4 (Bernoulli) | logit |
Note: y_real[:, k] is Stan's column-slice syntax, selecting all rows of column col(y_real, k).
The total log-posterior up to a constant is:
where
The entire template realizes the AMM canonical form:
with the following identifiability conditions enforced:
-
(C1): The reference
$\theta_{\text{ref}}$ is identifiable through its role as intercept and through the W coupling. - (C2): Column-centering of $\mathbf{Z}a$ ensures $\mathbb{E}[a_k(\mathbf{X})] = 0$, preventing additive confounding with $\theta{\text{ref}}$.
-
(C3): Column-centering of
$\mathbf{Z}_b$ ensures$\mathbb{E}[b_k(\mathbf{X})] = 0$ . -
(C4): The reparametrization
$W_k(\theta_{\text{ref}}) - W_k(\theta_{\text{anchor}})$ eliminates the$W$ -level intercept, anchoring the modulating component. -
(C5): No group indicator in
$\mathbf{Z}_a$ whenuse_groups == 1(enforced on the R side).
The generated quantities block (section 2 of 2) would follow, containing posterior predictive quantities, log-likelihoods, and any additional derived parameters.
This section contains the tail of the model block (a single closing brace) and the entire generated quantities block. Because section 1 ended mid-model block, the leading } here terminates it. Everything that follows is post-sampling bookkeeping: pointwise log-likelihood assembly, posterior-predictive replication, and the AMM "individual predictor" handle theta_i.
}This brace closes the model block opened in section 1. Section 1 is responsible for accumulating the joint log-density (priors on all AMM-anchored parameters plus the sampling distribution contributions via target += or sampling statements). No further target updates occur in section 2; everything below is computed conditionally on the current posterior draw and does not affect the sampler.
generated quantities {Opens the generated-quantities section. Executed once per HMC/NUTS leapfrog-accepted draw (and once per warmup draw on most interfaces, though only post-warmup draws are stored). All quantities declared here are tracked by the sampler's output chain. Three matrices are produced: log_lik, theta_i, and y_pred.
matrix[n, p] log_lik;Declares an loo::loo() / loo::loo_matrix() and for
matrix[n, p] theta_i = eta;Declares theta_i as an eta. In the AMM (Anchored Measurement / reference-anchoring) decomposition,
where the anchoring identifies the latent factors theta_i = eta is a pass-through alias: mathematically
It introduces no transformation and no Jacobian. Its purpose is purely semantic and downstream-facing: it exposes the AMM "individual-level predictor" under the conventional name gdpar's extractors, plotting, summary methods) can retrieve the anchored predictor without re-deriving it from eta. Because the canonical-link parameterization is used (see below),
matrix[n, p] y_pred;Declares an _rng variants of each family's sampler.
for (i in 1:n) {
for (k in 1:p) {Iterates over all family_id (declared in data in section 1) to select the exponential-family observation model. Because the dispatch is uniform across _log / _logit suffixed Stan functions, which take eta directly on the natural-parameter scale.
if (family_id == 1) {
log_lik[i, k] = normal_lpdf(y_real[i, k] | eta[i, k], sigma_y[k]);
y_pred[i, k] = normal_rng(eta[i, k], sigma_y[k]);
}Observation model:
with identity link
y_real (declared in data, section 1) is the continuous-response matrix used only in this branch. sigma_y[k] is the outcome-specific standard deviation (a parameters entry from section 1, presumably constrained <lower=0>). The posterior-predictive draw is
} else if (family_id == 2) {
log_lik[i, k] = poisson_log_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = poisson_log_rng(eta[i, k]);
}Observation model:
The log link is the canonical link for the Poisson family, so eta enters poisson_log_lpmf directly without manual exponentiation. The stored log-likelihood is
y_int (declared in data, section 1) is the integer-response matrix. The posterior-predictive draw is
} else if (family_id == 3) {
log_lik[i, k] = neg_binomial_2_log_lpmf(y_int[i, k] | eta[i, k], phi[k]);
y_pred[i, k] = neg_binomial_2_log_rng(eta[i, k], phi[k]);
}Observation model (NB2 parameterization, mean
with variance function
The log link is the canonical link for the NB2 family as parameterized by Stan. The stored log-likelihood is the NB2 pmf in log form,
phi[k] is the outcome-specific reciprocal-dispersion parameter (declared in parameters, section 1, presumably <lower=0>). The posterior-predictive draw is
} else if (family_id == 4) {
log_lik[i, k] = bernoulli_logit_lpmf(y_int[i, k] | eta[i, k]);
y_pred[i, k] = bernoulli_logit_rng(eta[i, k]);
}Observation model:
The logit link is the canonical link for the Bernoulli/binomial family, so eta enters bernoulli_logit_lpmf directly. The stored log-likelihood is
where the second term is the softplus function evaluated in a numerically stable manner inside Stan. The posterior-predictive draw is
i.e. a 0/1 integer realization.
}
}
}Closes the k loop, the i loop, and the generated quantities block. No else clause is provided for family_id values outside log_lik[i,k] and y_pred[i,k] entries remain at their default-initialized value (NaN for real matrices in Stan), which will propagate visibly into downstream R-side diagnostics.
The generated-quantities block is the consumption point of the AMM decomposition, not its construction site (which lives in transformed parameters of section 1). Its role is threefold:
-
Expose the anchored predictor. The line
matrix[n, p] theta_i = eta;re-binds the anchored linear predictor$\eta$ — already assembled in section 1 from the reference-anchored intercepts, covariate slopes, and identified latent factors — under the canonical AMM symbol$\theta_i$ . Because the anchoring was performed on the natural-parameter scale (the canonical link scale), no further reparameterization is needed:$\theta_{ik}$ is exactly the natural parameter of the chosen exponential family. -
Evaluate the likelihood on the canonical scale. Each branch feeds
eta[i,k](=$\theta_{ik}$ ) directly into the_log/_logitsuffixed Stan functions, which interpret their first argument as the natural parameter of the family. This is the operational meaning of "canonical pieces" in the file path: the template hard-codes the canonical-link parameterization of each supported family, so the AMM predictor$\theta_i$ plugs in without a link-function transformation. Mathematically, for family$F$ with natural parameter$\theta$ and cumulant function$b(\theta)$ ,$$ \log p(y\mid \theta) ;=; y,\theta ;-; b(\theta) ;+; c(y), $$
and the four branches instantiate this for the Gaussian (
$b(\theta)=\theta^2/2$ on the mean scale with known$\sigma$ ), Poisson ($b(\theta)=e^\theta$ ), NB2 (with$\phi$ -dependent$b$ ), and Bernoulli ($b(\theta)=\log(1+e^\theta)$) families respectively. -
Produce posterior-predictive replicates on the same anchored scale. Each
_rngcall draws $y^{\text{rep}}{ik}$ from the same sampling distribution whose natural parameter is the anchored $\eta{ik}$, so posterior-predictive checks operate on the same identified parameter space that the AMM anchoring was designed to make interpretable. The reference anchoring guarantees that$\eta_{ik}$ (and hence the predictive draws) is invariant to the otherwise-arbitrary rotation/sign/scale of the latent factor space, because those degrees of freedom were absorbed into the anchoring constraints imposed in section 1.
No Jacobian terms appear in this section: the only transformation (theta_i = eta) is the identity, and the canonical-link parameterization means the map from _lpdf/_lpmf functions themselves (Stan's poisson_log_lpmf, neg_binomial_2_log_lpmf, and bernoulli_logit_lpmf internally apply the link and include the corresponding change-of-variables terms where relevant — none for discrete families, where the link is merely a reparameterization of the parameter, not of the data).
This template implements the Empirical Bayes (EB) Conditional variant of the AMM (Anchor-Mediated Model) reference-anchoring decomposition within the gdpar package. It is designated Path 1 / Path C under sub-phase 8.6.D, canonizing decisions D36 (α), D37 (i), and D38″ (h).
The template is bit-identical to amm_eb_marginal_KxP.stan except for one structural change: the reference anchor parameters block (where it was sampled with a prior) to the data block (where it enters as a plug-in EB estimate, renamed theta_ref_kp_data). Consequently, the prior block on model{}. This transforms the model from a marginal (fully Bayesian over the anchor) to a conditional (anchor conditioned upon) formulation.
For observation
where:
-
$\theta_{\text{ref},g_i,k,j}$ is the group-specific reference anchor (plug-in EB data in this conditional template). - $\mathbf{a}{k,j}$ is the additive deviation vector, constructed via non-centered parameterization: $\mathbf{a}{k,j} = \sigma_{a,k},\mathbf{a}_{\text{raw},k,j}$.
- $\mathbf{c}{b,k,j}$ is the scaled b-component, also non-centered: $\mathbf{c}{b,k,j} = \sigma_{b,k},\mathbf{c}_{b,\text{raw},k,j}$.
- The anchor-mediated scaling $\mathbf{b}{k,j}$ is defined (only when
use_groups == 0) as $\mathbf{b}{k,j} = \mathbf{c}{b,k,j},/,\theta{\text{ref},k,j}$, expressing the multiplicative deviation relative to the anchor.
With
-
Slot 1 (
$k=1$ ): location parameter (mean). -
Slot 2 (
$k=2$ ): dispersion parameter (log-scale of$\sigma$ or$\phi$ ).
The
family_id_k_vector[1] |
Family | Slot 1 role | Slot 2 role |
|---|---|---|---|
| 1 | Gaussian | ||
| 3 | Negative Binomial (type 2) |
int<lower=1> n;
int<lower=2> K;
int<lower=2> p;-
n: Number of observations. Constrained$\geq 1$ . -
K: Number of slots. Constrained$\geq 2$ because the multi-parametric likelihood requires at least a location slot and a dispersion slot. -
p: Number of response coordinates (the dimensionality of the coordinate-wise factorization). Constrained$\geq 2$ .
array[K] int<lower=1, upper=13> family_id_k_vector;
array[K] int<lower=0, upper=2> inv_link_id_per_slot;-
family_id_k_vector: Per-slot integer family identifier. The range$[1,13]$ accommodates up to 13 distributional families. In the current template body, onlyfamily_id_k_vector[1](the slot-1 family) is inspected to dispatch the likelihood; slot 2 is implicitly the dispersion slot. Values used in the body:1(Gaussian) and3(Negative Binomial type 2). -
inv_link_id_per_slot: Per-slot inverse-link identifier$\in {0,1,2}$ . Declared but not referenced in the template body — the link is hard-coded per family (identity for Gaussian mean, exp for dispersion/NB mean). This is a configuration placeholder for future generalization.
array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;
int<lower=0, upper=0> use_W;-
use_a_k: Binary flag per slot. Ifuse_a_k[k] == 1, slot$k$ includes an additive$a()$ component. -
use_b_k: Binary flag per slot. Ifuse_b_k[k] == 1, slot$k$ includes a$b()$ (anchor-mediated scaling) component. -
use_W: Constrained<lower=0, upper=0>, i.e., always exactly 0. This is a structural placeholder indicating that a$W$ -matrix (global cross-slot mixing) pathway is disabled by construction. It is declared for interface compatibility but can never take a nonzero value.
int<lower=0> J_a_max;
int<lower=0> J_b_max;
array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;-
J_a_max: Maximum column count of any$a()$ design matrix across all$(k,j)$ pairs. Used as the static column dimension of the padded design arrays. -
J_b_max: Analogous maximum for$b()$ design matrices. -
J_a_per_kp[k,j]: Actual (unpadded) number of$a()$ design columns for slot$k$ , coordinate$j$ . May be 0 (no$a$ terms for that pair). -
J_b_per_kp[k,j]: Analogous for$b()$ .
array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;-
Z_a_kp[k,j]:$n \times J_{a,\max}$ design matrix for the additive component of slot$k$ , coordinate$j$ . Columns beyondJ_a_per_kp[k,j]are padding (ignored at evaluation time via slicing1:J_a_per_kp[k,j]). -
Z_b_kp[k,j]:$n \times J_{b,\max}$ design matrix for the$b()$ component. Same padding convention.
matrix[n, p] y_real;
array[n, p] int y_int;-
y_real:$n \times p$ matrix of real-valued responses. Used whenfamily_id_k_vector[1] == 1(Gaussian). -
y_int:$n \times p$ integer array of count responses. Used whenfamily_id_k_vector[1] == 3(Negative Binomial).
Both are always passed; the unused one is simply ignored by the dispatched likelihood.
array[K] vector[p] theta_anchor_kp;-
theta_anchor_kp: Per-slot anchor vector of length$p$ . Declared indatabut never referenced in the template body. This is a legacy/interface declaration — the R-side wrapper may use it for construction or validation, but the Stan model itself reads the anchor fromtheta_ref_kp_data(below). Its presence ensures interface symmetry with the marginal template.
array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;-
use_dispersion_y_k: Per-slot flag for a population-level$\sigma_y$ dispersion parameter. -
use_dispersion_phi_k: Per-slot flag for a population-level$\phi$ dispersion parameter.
These control the sizing of sigma_y_pop_k and phi_pop_k parameter arrays (see transformed data). Note: in the current template body, these population dispersion parameters receive priors but do not enter the likelihood — they are sampled but unused in
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups: Binary flag. If 1, observations are assigned to groups and the reference anchor varies by group. -
J_groups: Number of groups. Must be$\geq 1$ . -
group_id: Per-observation group index$g_i \in {1,\dots,J_{\text{groups}}}$ .
When use_groups == 0, all observations share the same anchor theta_ref_kp_data[1, k][j] (group index 1), and the use_groups == 1, the anchor varies by group but the
int<lower=2> K_slots;
int<lower=2> p_dim;-
K_slots: Redundant copy ofK. Declared for interface/naming consistency. Not referenced in the body. -
p_dim: Redundant copy ofp. Same purpose. Not referenced in the body.
array[J_groups, K] vector[p] theta_ref_kp_data;-
theta_ref_kp_data: The plug-in Empirical Bayes reference anchor, shaped as a 3D array$[J_{\text{groups}}, K, p]$ . Elementtheta_ref_kp_data[g, k][j]gives$\theta_{\text{ref},g,k,j}$ — the anchor for group$g$ , slot$k$ , coordinate$j$ .
This is the single structural difference from the marginal template: here cmdstanr's automatic array packing, and passes it in. The Stan model then treats it as known.
This block performs all index precomputation: it determines the layout of the flattened parameter vectors (a_raw, c_b_kp_raw), computes offsets, sizes the dispersion parameter arrays, and implements the Option A compactification of
array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;
array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;
int total_J_a_free = 0;
int total_J_b_free = 0;
int any_use_a = 0;
int any_use_b = 0;
int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;Declarations:
-
J_a_free_kp[k,j]: Number of free$a()$ coefficients for slot$k$ , coordinate$j$ . A coefficient is free only ifuse_a_k[k] == 1andJ_a_per_kp[k,j] > 0. Otherwise it is 0. -
J_b_free_kp[k,j]: Analogous for$b()$ . -
a_raw_offset_kp[k,j]: Starting index (0-based) of the$(k,j)$ block within the flatteneda_rawvector. -
c_b_raw_offset_kp[k,j]: Starting index of the$(k,j)$ block withinc_b_kp_raw. -
total_J_a_free: Total count of free$a()$ coefficients across all$(k,j)$ . Determines the length ofa_raw. -
total_J_b_free: Total count of free$b()$ coefficients. Determines the length ofc_b_kp_raw. -
any_use_a: Global flag — 1 if any$(k,j)$ has free$a()$ coefficients, else 0. -
any_use_b: Global flag — 1 if any$(k,j)$ has free$b()$ coefficients, else 0. -
sigma_y_size: Total count of population$\sigma_y$ parameters (sum ofuse_dispersion_y_k). -
phi_size: Total count of population$\phi$ parameters (sum ofuse_dispersion_phi_k). -
sigma_y_offset_k[k],phi_offset_k[k]: Per-slot offsets into the compacted dispersion arrays.
for (k in 1:K) {
for (j in 1:p) {
J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
? J_a_per_kp[k, j] : 0;
J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
? J_b_per_kp[k, j] : 0;
a_raw_offset_kp[k, j] = total_J_a_free;
c_b_raw_offset_kp[k, j] = total_J_b_free;
total_J_a_free += J_a_free_kp[k, j];
total_J_b_free += J_b_free_kp[k, j];
if (J_a_free_kp[k, j] > 0) any_use_a = 1;
if (J_b_free_kp[k, j] > 0) any_use_b = 1;
}
sigma_y_offset_k[k] = sigma_y_size;
phi_offset_k[k] = phi_size;
sigma_y_size += use_dispersion_y_k[k];
phi_size += use_dispersion_phi_k[k];
}For each
-
Free count determination: The ternary expression sets
J_a_free_kp[k,j]toJ_a_per_kp[k,j]only if the slot's$a()$ component is active (use_a_k[k] == 1) and there is at least one design column. This handles two cases where no free$a()$ coefficient exists:-
use_a_k[k] == 0: the$a()$ component is entirely absent for this slot. -
J_a_per_kp[k,j] == 0: the$a()$ design has zero columns (e.g., intercept-only$a()$ where the intercept has been absorbed into$\theta_{\text{ref}}$ ).
The same logic applies to
J_b_free_kp. -
-
Offset recording: The current value of
total_J_a_free(resp.total_J_b_free) is stored as the 0-based starting offset for this$(k,j)$ block. This is a standard prefix-sum / ragged-array flattening pattern. -
Running total update:
total_J_a_freeis incremented by the free count, accumulating the total vector length. -
Global flag update: If any
$(k,j)$ has free$a()$ coefficients,any_use_ais set to 1 (and stays 1). Same forany_use_b.
After the inner
int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
int slot_free_a = 0;
for (j in 1:p) {
slot_free_a += J_a_free_kp[k, j];
}
if (slot_free_a > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}This implements Option A (referenced as RG.6, decision D96): the
Rationale (from the inline comment): A slot whose use_a_k[k] == 0 (no
Mechanism:
-
slot_free_aaccumulates$\sum_{j=1}^{p} J_{a,\text{free},k,j}$ for slot$k$ . - If
slot_free_a > 0, the slot is assigned a compacted indexsigma_a_idx[k] = n_sigma_a(1-based), andn_sigma_ais incremented. - If
slot_free_a == 0,sigma_a_idx[k] = 0(sentinel: no parameter).
Degenerate case: When every slot carries free
Note: array[K * any_use_b], which is all-or-nothing (either all
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;
array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;-
sigma_a_k: Array of length$n_{\sigma_a}$ (compacted). Each element is a half-prior scale$\sigma_{a,k} > 0$ for the non-centered parameterization of the$a()$ coefficients. Indexed viasigma_a_idx[k]at evaluation time. -
a_raw: Flattened vector of lengthtotal_J_a_freecontaining all free raw$a()$ coefficients across all$(k,j)$ . These are the non-centered variates: each$a_{\text{raw},m} \sim \mathcal{N}(0,1)$ , and the actual coefficient is$a_{k,j,jj} = \sigma_{a,k} \cdot a_{\text{raw},\text{offset}+jj}$ .
-
sigma_b_k: Array of length$K \cdot \text{any_use_b}$ . Ifany_use_b == 1, this is length$K$ (one$\sigma_{b,k}$ per slot, not compacted). Ifany_use_b == 0, length 0 (no parameters declared). Each element$\sigma_{b,k} > 0$ . -
c_b_kp_raw: Flattened vector of lengthtotal_J_b_freecontaining all free raw$b()$ coefficients. Non-centered:$c_{b,\text{raw},m} \sim \mathcal{N}(0,1)$ , actual coefficient$c_{b,k,j,jj} = \sigma_{b,k} \cdot c_{b,\text{raw},\text{offset}+jj}$ .
-
sigma_y_pop_k: Array of lengthsigma_y_size(sum ofuse_dispersion_y_k). Population-level$\sigma_y$ scales, constrained$> 0$ . -
phi_pop_k: Array of lengthphi_size(sum ofuse_dispersion_phi_k). Population-level$\phi$ scales, constrained$> 0$ .
Important: These population dispersion parameters are declared, receive priors in the model block, but do not enter the likelihood or the linear predictor in the current template body. They are sampled from their priors only. This is either scaffolding for a future population-dispersion layer or a legacy from the marginal template.
The defining feature of this conditional template: theta_ref_kp is not a parameter. It has been moved to data as theta_ref_kp_data. The model conditions on the plug-in EB estimate rather than sampling the anchor.
array[K] matrix[n, p] eta_kp;
array[K, p] vector[J_a_max] a_coef_kp;
array[K, p] vector[J_b_max] c_b_kp;
array[K, p] vector[J_b_max] b_coef_kp;-
eta_kp[k]:$n \times p$ matrix of linear predictors for slot$k$ . This is the central quantity of the AMM decomposition. -
a_coef_kp[k,j]: Vector of length$J_{a,\max}$ holding the (scaled)$a()$ coefficients for slot$k$ , coordinate$j$ . Padded with zeros beyondJ_a_free_kp[k,j]. -
c_b_kp[k,j]: Vector of length$J_{b,\max}$ holding the scaled$b()$ coefficients (the "c" in the$b = c/\theta_{\text{ref}}$ decomposition). -
b_coef_kp[k,j]: Vector of length$J_{b,\max}$ holding the anchor-mediated scaling coefficients $\mathbf{b}{k,j} = \mathbf{c}{b,k,j} / \theta_{\text{ref},k,j}$. Computed but not used in the likelihood — the linear predictor usesc_b_kpdirectly. This quantity is available for posterior reporting/diagnostics.
for (k in 1:K) {
for (j in 1:p) {
a_coef_kp[k, j] = rep_vector(0, J_a_max);
c_b_kp[k, j] = rep_vector(0, J_b_max);
b_coef_kp[k, j] = rep_vector(0, J_b_max);
}
}All coefficient vectors are initialized to zero. This ensures that padding positions (beyond the free coefficient count) and slots/coordinates with no active component remain exactly zero, contributing nothing to the linear predictor.
if (any_use_a == 1) {
for (k in 1:K) {
if (use_a_k[k] == 1) {
for (j in 1:p) {
if (J_a_free_kp[k, j] > 0) {
for (jj in 1:J_a_free_kp[k, j]) {
a_coef_kp[k, j][jj] = a_raw[a_raw_offset_kp[k, j] + jj]
* sigma_a_k[sigma_a_idx[k]];
}
}
}
}
}
}This block is guarded by any_use_a == 1 (skipped entirely if no slot uses $a()$). For each active slot
This is the non-centered parameterization (NCP): the raw variates sigma_a_idx[k] maps slot sigma_a_k array.
The mathematical effect: marginally,
but the joint
if (any_use_b == 1) {
for (k in 1:K) {
if (use_b_k[k] == 1) {
for (j in 1:p) {
if (J_b_free_kp[k, j] > 0) {
for (jj in 1:J_b_free_kp[k, j]) {
c_b_kp[k, j][jj] = c_b_kp_raw[c_b_raw_offset_kp[k, j] + jj]
* sigma_b_k[k];
}
if (use_groups == 0 && theta_ref_kp_data[1, k][j] != 0) {
for (jj in 1:J_b_per_kp[k, j]) {
b_coef_kp[k, j][jj] = c_b_kp[k, j][jj]
/ theta_ref_kp_data[1, k][j];
}
}
}
}
}
}
}For each free
Marginally, sigma_b_k[k] is indexed directly by any_use_b == 1.
The inner conditional computes:
This is the reference-anchoring decomposition for the multiplicative component: the
-
use_groups == 0: Only when there is no group structure. With groups,$\theta_{\text{ref}}$ varies by group, so a single$\mathbf{b}_{k,j}$ cannot represent the group-varying ratio. In that case,b_coef_kpremains zero. -
theta_ref_kp_data[1, k][j] != 0: Division by zero is avoided. If the anchor is exactly zero,b_coef_kpremains zero for that$(k,j)$ .
The loop runs over 1:J_b_per_kp[k,j] (the full design column count, not just the free count), but c_b_kp[k,j][jj] is zero for
Critical observation: b_coef_kp is computed here but never used in the linear predictor (Section 5.5 uses c_b_kp directly). The
for (k in 1:K) {
for (i in 1:n) {
int g_i = group_id[i];
for (j in 1:p) {
real theta_ref_ikj = theta_ref_kp_data[g_i, k][j];
real eta_ikj = theta_ref_ikj;
if (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0) {
eta_ikj += dot_product(
to_vector(Z_a_kp[k, j][i, 1:J_a_per_kp[k, j]]),
a_coef_kp[k, j][1:J_a_per_kp[k, j]]
);
}
if (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0) {
eta_ikj += dot_product(
to_vector(Z_b_kp[k, j][i, 1:J_b_per_kp[k, j]]),
c_b_kp[k, j][1:J_b_per_kp[k, j]]
);
}
eta_kp[k][i, j] = eta_ikj;
}
}
}This is the core AMM reference-anchoring equation, realized as:
Step by step:
-
Group lookup:
g_i = group_id[i]determines which group's anchor applies to observation$i$ . -
Anchor retrieval:
theta_ref_ikj = theta_ref_kp_data[g_i, k][j]fetches$\theta_{\text{ref},g_i,k,j}$ — the plug-in EB anchor for this observation's group, slot, and coordinate. -
Anchor initialization:
eta_ikj = theta_ref_ikj— the linear predictor starts at the anchor. All deviations are additive on top of the anchor. -
Additive
$a()$ contribution: If slot$k$ uses$a()$ and has design columns, the dot product $\mathbf{z}{a,k,j,i}^\top \mathbf{a}{k,j}$ is added. The design rowZ_a_kp[k,j][i, 1:J_a_per_kp[k,j]]is extracted and converted to a vector; the coefficient slicea_coef_kp[k,j][1:J_a_per_kp[k,j]]is matched. Note: the slicing usesJ_a_per_kp(the full design column count), but coefficients beyondJ_a_free_kpare zero (from initialization), so only free coefficients contribute. -
$b()$ contribution via$c_b$ : If slot$k$ uses$b()$ and has design columns, the dot product $\mathbf{z}{b,k,j,i}^\top \mathbf{c}{b,k,j}$ is added. Note: this usesc_b_kp(the scaled raw coefficients), notb_coef_kp(the anchor-divided coefficients). The$b()$ component enters the predictor as a direct additive term through$\mathbf{c}_b$ , not through the anchor-mediated$\mathbf{b}$ . -
Storage:
eta_kp[k][i,j] = eta_ikj.
The reference-anchoring decomposition is thus: the anchor
// theta_ref_kp prior block intentionally absent: the anchor is data
// here. The remaining priors mirror the marginal template.
if (any_use_a == 1) {
sigma_a_k ~ {{PRIOR_SIGMA_A}};
a_raw ~ normal(0, 1);
}
if (any_use_b == 1) {
sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_kp_raw ~ normal(0, 1);
}
if (sigma_y_size > 0) {
sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
phi_pop_k ~ {{PRIOR_PHI}};
}The comment makes explicit the defining difference from the marginal template: the prior on
When any_use_a == 1:
-
sigma_a_k ~ {{PRIOR_SIGMA_A}}: The half-prior scales$\sigma_{a,k}$ receive a templated prior (the{{PRIOR_SIGMA_A}}placeholder is substituted by the R-side wrapper at model compilation time). This is a hyperprior on the additive deviation magnitude. -
a_raw ~ normal(0, 1): The raw variates are standard normal. This is the NCP anchor:$a_{\text{raw},m} \sim \mathcal{N}(0, 1)$ , implying$a_{k,j,jj} \sim \mathcal{N}(0, \sigma_{a,k}^2)$ marginally.
No Jacobian adjustment is needed because a_raw is declared as an unconstrained vector (no transformation), and the scaling to a_coef_kp happens in transformed parameters (deterministic, no prior placed on a_coef_kp directly).
When any_use_b == 1:
-
sigma_b_k ~ {{PRIOR_SIGMA_B}}: Templated hyperprior on$\sigma_{b,k}$ . -
c_b_kp_raw ~ normal(0, 1): Standard normal NCP variates, implying$c_{b,k,j,jj} \sim \mathcal{N}(0, \sigma_{b,k}^2)$ marginally.
When sigma_y_size > 0 and/or phi_size > 0:
-
sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}}: Templated prior on population$\sigma_y$ . -
phi_pop_k ~ {{PRIOR_PHI}}: Templated prior on population$\phi$ .
As noted, these parameters do not enter the likelihood — they are sampled from their priors only.
The likelihood is dispatched on family_id_k_vector[1] — the family of slot 1 (the location slot). Only two branches are implemented:
if (family_id_k_vector[1] == 1) {
for (i in 1:n) {
for (j in 1:p) {
real sigma_ij = exp(eta_kp[2][i, j]);
y_real[i, j] ~ normal(eta_kp[1][i, j], sigma_ij);
}
}
}For each observation
-
Slot 1 (
$\eta_{1,i,j}$ ) is the mean$\mu_{ij}$ (identity link, since the Gaussian mean is directly the linear predictor). -
Slot 2 (
$\eta_{2,i,j}$ ) is the log-standard-deviation. Theexp()link ensures$\sigma_{ij} > 0$ . - The likelihood is the product
$\prod_{i=1}^{n}\prod_{j=1}^{p} \mathcal{N}(y_{ij}^{\text{real}} \mid \eta_{1,i,j}, \exp(\eta_{2,i,j}))$ , reflecting the coordinate-wise factorization (Path A).
} else if (family_id_k_vector[1] == 3) {
for (i in 1:n) {
for (j in 1:p) {
real phi_ij = exp(eta_kp[2][i, j]);
y_int[i, j] ~ neg_binomial_2(exp(eta_kp[1][i, j]), phi_ij);
}
}
}For each observation
-
Slot 1 (
$\eta_{1,i,j}$ ) is the log-mean. Theexp()link ensures$\mu_{ij} > 0$ . -
Slot 2 (
$\eta_{2,i,j}$ ) is the log-dispersion. Theexp()link ensures$\phi_{ij} > 0$ . -
neg_binomial_2is Stan's parameterization with mean$\mu$ and precision (inverse overdispersion)$\phi$ :$\text{Var}(Y) = \mu + \mu^2/\phi$ . - The likelihood is $\prod_{i=1}^{n}\prod_{j=1}^{p} \text{NegBinomial}2(y{ij}^{\text{int}} \mid \exp(\eta_{1,i,j}), \exp(\eta_{2,i,j}))$.
If family_id_k_vector[1] is neither 1 nor 3, no likelihood is evaluated. The model reduces to just the priors. This is consistent with the stated "initial coverage" of Gaussian and NB only.
No explicit Jacobian adjustments appear in the model block. This is correct because:
- The constrained parameters (
sigma_a_k,sigma_b_k,sigma_y_pop_k,phi_pop_k) have<lower=0>constraints, and Stan automatically applies the log-Jacobian for the half-normal/half-Student-t etc. transforms. - The NCP scaling (
$a = \sigma_a \cdot a_{\text{raw}}$ ) is a deterministic linear transformation intransformed parameters; no prior is placed ona_coef_kpdirectly, so no Jacobian is needed. - The link functions (
exp) are applied toeta_kpinside the likelihood statements, not as parameter transforms.
matrix[n, p] log_lik;
matrix[n, p] theta_i;
matrix[n, p] y_pred;-
log_lik:$n \times p$ matrix of pointwise log-likelihood contributions. Used for PSIS-LOO/WAIC via theloopackage. -
theta_i:$n \times p$ matrix of the location parameter$\theta_i$ (the quantity the AMM decomposes). For both families, this is$\eta_{1,i,j}$ (the slot-1 linear predictor). -
y_pred:$n \times p$ matrix of posterior predictive draws.
if (family_id_k_vector[1] == 1) {
for (i in 1:n) {
for (j in 1:p) {
real sigma_ij = exp(eta_kp[2][i, j]);
theta_i[i, j] = eta_kp[1][i, j];
log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j],
sigma_ij);
y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij);
}
}
}For each
The normal_lpdf is called explicitly (rather than using the ~ sampling syntax) to capture the log-density value for LOO. The normal_rng generates a random draw from the predictive distribution at each posterior sample.
} else if (family_id_k_vector[1] == 3) {
for (i in 1:n) {
for (j in 1:p) {
real phi_ij = exp(eta_kp[2][i, j]);
real mu_ij = exp(eta_kp[1][i, j]);
theta_i[i, j] = eta_kp[1][i, j];
log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij);
y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij);
}
}
}For each
Note that theta_i stores
As in the model block, if the family is neither 1 nor 3, the generated quantities matrices remain uninitialized (Stan will fill them with NaN). This is consistent with the stated initial coverage.
The AMM reference-anchoring decomposition expresses the linear predictor for each slot
The anchor
The anchor-mediated scaling $\mathbf{b}{k,j} = \mathbf{c}{b,k,j}/\theta_{\text{ref},k,j}$ is derived (in transformed parameters) but not used in the likelihood — it is a reporting quantity that expresses the multiplicative deviation relative to the anchor.
In this conditional template, theta_ref_kp_data), not a parameter. The model conditions on the plug-in EB estimate:
There is no prior on
The
| Family |
|
|
Likelihood |
|---|---|---|---|
| Gaussian |
|
||
| NB | $\text{NegBinomial}2(\mu{ij}, \phi_{ij})$ |
The
This is the "Path A" coordinate-wise factorization composed with the "Path B" K-slot multi-parametric likelihood, canonized as "Path C" (D38″ = (h)).
When use_groups == 1, the anchor varies by group: observation
The following data/parameters are declared but not referenced in the template body (likelihood or predictor):
| Quantity | Block | Status |
|---|---|---|
inv_link_id_per_slot |
data | Declared, never read (links are hard-coded per family) |
use_W |
data | Constrained to 0; placeholder for disabled |
theta_anchor_kp |
data | Declared, never read (R-side interface only) |
K_slots |
data | Redundant mirror of K; never read |
p_dim |
data | Redundant mirror of p; never read |
sigma_y_pop_k |
parameters | Sampled from prior only; does not enter |
phi_pop_k |
parameters | Sampled from prior only; does not enter |
b_coef_kp |
transformed parameters | Computed but not used in |
sigma_y_offset_k |
transformed data | Computed but never read |
phi_offset_k |
transformed data | Computed but never read |
These are either interface placeholders, legacy from the marginal template, or scaffolding for future generalization.
The header establishes this as the Empirical Bayes (EB) Marginal Template, Path C of the gdpar package (Sub-phase 8.6.D). It composes:
-
Path A (multivariate, Theorem 7A*): a single outcome
$y[n,p]$ factorized coordinate-wise across$p$ columns. -
Path B (multi-parametric, Theorem 7C*):
$K$ distributional-regression slots, each with its own canonical link.
The
Coverage: Initial iteration supports family_id_k_vector[1]
Restrictions:
-
use_W = 0enforced (decision D39): the modulating component$W$ is disabled. -
$a$ /$b$ coefficients are per-slot per-coordinate ragged (D41): each$(k,j)$ pair has its own$J_{a}^{(k,j)}$ /$J_{b}^{(k,j)}$ free columns.
Internal parametrization:
-
$a$ : non-centered (NCP) per slot, with per-slot scale$\sigma_{a,k}$ . -
$b$ : linear reparametrization via$c_b = \theta_{\text{ref}} \cdot b$ ; NCP sampling of$c_{b,\text{raw}}$ , with$b$ derived post-hoc in the single-anchor regime (use_groups == 0). -
$\theta_{\text{ref}}$ : per-group, per-slot, per-coord anchor (sampled in the marginal template; fixed as data in the companion conditional template).
int<lower=1> n;Number of observations. Constraint
int<lower=2> K;Number of distributional parameter slots. Constraint
int<lower=2> p;Number of outcome coordinates (matrix-column dimension). Constraint
array[K] int<lower=1, upper=13> family_id_k_vector;Per-slot family identifiers. In the initial iteration, these are homogeneous across slots: the dispatcher consumes only family_id_k_vector[1] as the canonical family, and remaining slots inherit the canonical structure. Valid values in this iteration:
array[K] int<lower=0, upper=2> inv_link_id_per_slot;Canonical Stan inverse-link identifier per slot, inherited from Path B dispatch. The comment states that only inv_link_id_per_slot[2] (log model block, the exp() call is hardcoded rather than dispatched through this array; the array is declared but not referenced in the executable code of this section.
array[K] int<lower=0, upper=1> use_a_k;
array[K] int<lower=0, upper=1> use_b_k;Per-slot binary flags activating the additive (use_a_k[k] == 0, slot
int<lower=0, upper=0> use_W;The modulating component <lower=0, upper=0> forces use_W to be exactly zero — this is a compile-time assertion (decision D39) ensuring
int<lower=0> J_a_max;
int<lower=0> J_b_max;Maximum padding widths for the additive and multiplicative design matrices across all
array[K, p] int<lower=0> J_a_per_kp;
array[K, p] int<lower=0> J_b_per_kp;Per-slot, per-coordinate free-column counts.
array[K, p] matrix[n, J_a_max] Z_a_kp;
array[K, p] matrix[n, J_b_max] Z_b_kp;Per-slot, per-coordinate padded design matrices. For each
matrix[n, p] y_real;
array[n, p] int y_int;Dual outcome declarations: y_real for continuous families (Gaussian), y_int for count families (NB). Only the declaration matching family_id_k_vector[1] is consumed in the likelihood. Both are always supplied as data; the unused one is ignored.
array[K] vector[p] theta_anchor_kp;Per-slot, per-coordinate anchor supplied as data. This is distinct from the sampled parameter theta_ref_kp. It is declared but not referenced in the executable code of this section (likely consumed in generated quantities or for initialization in section 2).
array[K] int<lower=0, upper=1> use_dispersion_y_k;
array[K] int<lower=0, upper=1> use_dispersion_phi_k;Per-slot population-scope dispersion flags, derived on the R side from family_id_k_vector. These control whether a sigma_y_pop_k (resp. phi_pop_k) entry is allocated for slot use_dispersion_y_k[2] = 1; for NB use_dispersion_phi_k[2] = 1.
int<lower=0, upper=1> use_groups;
int<lower=1> J_groups;
array[n] int<lower=1, upper=J_groups> group_id;-
use_groups: binary flag. When 0, a single anchor per slot is used (J_groupsis still$\geq 1$ but only group 1 is referenced). -
J_groups: number of groups. -
group_id: per-observation group assignment$g_i \in {1, \ldots, J_{\text{groups}}}$ .
int<lower=2> K_slots;
int<lower=2> p_dim;EB-template convention: every EB template declares K_slots and p_dim. In Path C, K_slots == K and p_dim == p, both strictly greater than 1. These serve as redundant confirmations of the multi-slot, multi-coordinate structure.
array[K, p] int J_a_free_kp;
array[K, p] int J_b_free_kp;Effective free-column counts after applying the per-slot use flags. These refine J_a_per_kp / J_b_per_kp by zeroing out entries for inactive slots.
array[K, p] int a_raw_offset_kp;
array[K, p] int c_b_raw_offset_kp;Flat-pack offsets into the one-dimensional parameter vectors a_raw and c_b_kp_raw. For each [offset + 1 : offset + J_free] in the flat vector.
int total_J_a_free = 0;
int total_J_b_free = 0;
int any_use_a = 0;
int any_use_b = 0;-
total_J_a_free/total_J_b_free: total number of free additive / multiplicative coefficients across all$(k,j)$ pairs. These determine the lengths ofa_rawandc_b_kp_raw. -
any_use_a/any_use_b: global flags (0 or 1) indicating whether any free coefficient exists. Used to guard parameter declarations and model-block statements.
int sigma_y_size = 0;
int phi_size = 0;
array[K] int sigma_y_offset_k;
array[K] int phi_offset_k;-
sigma_y_size/phi_size: total count of active dispersion entries across slots. -
sigma_y_offset_k/phi_offset_k: per-slot offsets into the compacted dispersion arrays (computed but not referenced in the executable code of this section).
for (k in 1:K) {
for (j in 1:p) {
J_a_free_kp[k, j] = (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0)
? J_a_per_kp[k, j] : 0;
J_b_free_kp[k, j] = (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0)
? J_b_per_kp[k, j] : 0;For each
and analogously for
a_raw_offset_kp[k, j] = total_J_a_free;
c_b_raw_offset_kp[k, j] = total_J_b_free;
total_J_a_free += J_a_free_kp[k, j];
total_J_b_free += J_b_free_kp[k, j];The current running totals are recorded as the starting offset for this
where
if (J_a_free_kp[k, j] > 0) any_use_a = 1;
if (J_b_free_kp[k, j] > 0) any_use_b = 1;
}Global flags are set to 1 if any
sigma_y_offset_k[k] = sigma_y_size;
phi_offset_k[k] = phi_size;
sigma_y_size += use_dispersion_y_k[k];
phi_size += use_dispersion_phi_k[k];
}Per-slot dispersion offsets and total sizes are computed. For each slot use_dispersion_y_k[k] == 1, one sigma_y_pop_k entry is allocated; likewise for phi_pop_k.
int n_sigma_a = 0;
array[K] int sigma_a_idx;
for (k in 1:K) {
int slot_free_a = 0;
for (j in 1:p) {
slot_free_a += J_a_free_kp[k, j];
}
if (slot_free_a > 0) {
n_sigma_a += 1;
sigma_a_idx[k] = n_sigma_a;
} else {
sigma_a_idx[k] = 0;
}
}This implements Option A (RG.6, D96): the per-slot additive scale use_a_k[k] == 0 or all
The mapping sigma_a_k (1-indexed), or 0 if the slot is inactive. When every slot carries free
Mathematically:
where
array[J_groups, K] vector[p] theta_ref_kp;The per-group, per-slot, per-coordinate anchor
In the marginal template, these are sampled parameters. In the companion conditional template, they are moved to data (as theta_ref_kp_data) and the anchor prior is removed.
array[use_groups, K] vector[p] mu_theta_ref_kp;
array[use_groups, K] vector<lower=0>[p] sigma_theta_ref_kp;When use_groups == 1, these declare per-slot, per-coordinate hyperparameters:
-
$\mu_{\theta,k}[j]$ : prior mean of the anchor. -
$\sigma_{\theta,k}[j] > 0$ : prior scale of the anchor.
When use_groups == 0, the first array dimension is 0, so no parameters are declared — Stan permits zero-size arrays. The anchor is then given a direct prior without hierarchy.
array[n_sigma_a] real<lower=0> sigma_a_k;
vector[total_J_a_free] a_raw;-
sigma_a_k: per-slot additive scale, compacted to active slots only (size$n_{\sigma_a}$ ). Constraint$\sigma_{a,k} > 0$ . -
a_raw: flat-packed standard-normal latent variables, size$\texttt{total_J_a_free}$ .
The NCP construction is:
with
array[K * any_use_b] real<lower=0> sigma_b_k;
vector[total_J_b_free] c_b_kp_raw;-
sigma_b_k: per-slot multiplicative scale. Whenany_use_b == 1, the size is$K$ ; when 0, the size is 0 (no parameters declared). Constraint$\sigma_{b,k} > 0$ . -
c_b_kp_raw: flat-packed standard-normal latent variables for the linear-reparametrized multiplicative coefficients, size$\texttt{total_J_b_free}$ .
The NCP construction is:
with
The reference-anchoring relation is
This relation is inverted only in the single-anchor regime (use_groups == 0) in transformed parameters. The linear predictor uses
array[sigma_y_size] real<lower=0> sigma_y_pop_k;
array[phi_size] real<lower=0> phi_pop_k;Per-slot population-scope dispersion entries. These are declared with sizes determined by use_dispersion_y_k / use_dispersion_phi_k. They receive priors in the model block but do not enter the likelihood in this section: the Gaussian likelihood uses
array[K] matrix[n, p] eta_kp;Per-slot linear predictor matrices.
array[K, p] vector[J_a_max] a_coef_kp;
array[K, p] vector[J_b_max] c_b_kp;
array[K, p] vector[J_b_max] b_coef_kp;-
a_coef_kp: padded additive coefficient vectors (only first$J_a^{(k,j)}$ entries are active). -
c_b_kp: padded linear-reparametrized multiplicative coefficient vectors. -
b_coef_kp: padded derived multiplicative coefficient vectors (populated only in single-anchor regime).
for (k in 1:K) {
for (j in 1:p) {
a_coef_kp[k, j] = rep_vector(0, J_a_max);
c_b_kp[k, j] = rep_vector(0, J_b_max);
b_coef_kp[k, j] = rep_vector(0, J_b_max);
}
}All coefficient vectors are initialized to zero. Padded (unused) entries remain zero throughout, ensuring they contribute nothing to dot products.
if (any_use_a == 1) {
for (k in 1:K) {
if (use_a_k[k] == 1) {
for (j in 1:p) {
if (J_a_free_kp[k, j] > 0) {
for (jj in 1:J_a_free_kp[k, j]) {
a_coef_kp[k, j][jj] = a_raw[a_raw_offset_kp[k, j] + jj]
* sigma_a_k[sigma_a_idx[k]];
}
}
}
}
}
}For each active
The guard if (any_use_a == 1) prevents any execution when no additive coefficients exist. The inner guards use_a_k[k] == 1 and J_a_free_kp[k, j] > 0 ensure only active slots and coordinates are processed. The compacted index sigma_a_idx[k] maps slot sigma_a_k vector.
if (any_use_b == 1) {
for (k in 1:K) {
if (use_b_k[k] == 1) {
for (j in 1:p) {
if (J_b_free_kp[k, j] > 0) {
for (jj in 1:J_b_free_kp[k, j]) {
c_b_kp[k, j][jj] = c_b_kp_raw[c_b_raw_offset_kp[k, j] + jj]
* sigma_b_k[k];
}The linear-reparametrized multiplicative coefficients are constructed via NCP:
Note: sigma_b_k[k] uses direct slot indexing (not compacted), because sigma_b_k has size any_use_b == 1.
if (use_groups == 0 && theta_ref_kp[1, k][j] != 0) {
for (jj in 1:J_b_per_kp[k, j]) {
b_coef_kp[k, j][jj] = c_b_kp[k, j][jj]
/ theta_ref_kp[1, k][j];
}
}In the single-anchor regime (use_groups == 0), the multiplicative coefficient
The guard theta_ref_kp[1, k][j] != 0 prevents division by zero. When use_groups == 1, b_coef_kp remains zero. The comment notes this is "reported as a derived quantity only in the single-anchor regime."
Note the loop bound uses J_b_per_kp[k, j] (the original count) rather than J_b_free_kp[k, j] — these are equal when use_b_k[k] == 1, so the distinction is immaterial here, but it is a line-faithful observation.
for (k in 1:K) {
for (i in 1:n) {
int g_i = group_id[i];
for (j in 1:p) {
real theta_ref_ikj = theta_ref_kp[g_i, k][j];
real eta_ikj = theta_ref_ikj;For each observation
This is the reference-anchoring baseline: the predictor starts at the group-specific anchor.
if (use_a_k[k] == 1 && J_a_per_kp[k, j] > 0) {
eta_ikj += dot_product(
to_vector(Z_a_kp[k, j][i, 1:J_a_per_kp[k, j]]),
a_coef_kp[k, j][1:J_a_per_kp[k, j]]
);
}The additive component is added:
The to_vector() call extracts the active row slice of the design matrix; dot_product computes the inner product with the active coefficient slice.
if (use_b_k[k] == 1 && J_b_per_kp[k, j] > 0) {
eta_ikj += dot_product(
to_vector(Z_b_kp[k, j][i, 1:J_b_per_kp[k, j]]),
c_b_kp[k, j][1:J_b_per_kp[k, j]]
);
}The multiplicative component (in linear-reparametrized form
Note that $c_b^{(k,j)}{\ell} = \theta{\text{ref},k}^{(g_i)}[j] \cdot b^{(k,j)}{\ell}$ conceptually, so this term equals $\theta{\text{ref},k}^{(g_i)}[j] \cdot \sum_\ell Z_b^{(k,j)}[i,\ell] \cdot b^{(k,j)}_{\ell}$, making the
eta_kp[k][i, j] = eta_ikj;
}
}
}The assembled linear predictor is stored. The complete AMM reference-anchoring decomposition for the linear predictor is:
where $\mathbf{c}b^{(k,j)} = \theta{\text{ref},k}^{(g_i)}[j] \odot \mathbf{b}^{(k,j)}$ (the anchoring relation, inverted only in single-anchor mode for reporting).
if (use_groups == 1) {
for (k in 1:K) {
mu_theta_ref_kp[1, k] ~ {{PRIOR_THETA_REF}};
sigma_theta_ref_kp[1, k] ~ {{PRIOR_SIGMA_THETA_REF}};
for (g in 1:J_groups) {
theta_ref_kp[g, k] ~ normal(mu_theta_ref_kp[1, k],
sigma_theta_ref_kp[1, k]);
}
}
}Hierarchical prior (when use_groups == 1): for each slot
The {{PRIOR_THETA_REF}} and {{PRIOR_SIGMA_THETA_REF}} are R-side template placeholders substituted by the dispatcher. The hyperparameters are indexed [1, k] because the first array dimension is use_groups (= 1 in this branch), so only index 1 is valid.
} else {
for (k in 1:K) {
theta_ref_kp[1, k] ~ {{PRIOR_THETA_REF}};
}
}Direct prior (when use_groups == 0): only group 1 exists, and the anchor receives a direct (non-hierarchical) prior:
if (any_use_a == 1) {
sigma_a_k ~ {{PRIOR_SIGMA_A}};
a_raw ~ normal(0, 1);
}When any additive free coefficient exists:
This is the standard NCP prior: the unit-normal on a_raw combined with the scale prior on sigma_a_k induces the implied prior $a^{(k,j)}\ell = \sigma{a,k} \cdot a_{\text{raw},\cdot} \sim \sigma_{a,k} \cdot \mathcal{N}(0,1)$. No Jacobian adjustment is required because Stan samples
if (any_use_b == 1) {
sigma_b_k ~ {{PRIOR_SIGMA_B}};
c_b_kp_raw ~ normal(0, 1);
}When any multiplicative free coefficient exists:
Same NCP structure as the additive component. The implied prior on $c_b^{(k,j)}\ell$ is $\sigma{b,k} \cdot \mathcal{N}(0,1)$.
if (sigma_y_size > 0) {
sigma_y_pop_k ~ {{PRIOR_SIGMA_Y}};
}
if (phi_size > 0) {
phi_pop_k ~ {{PRIOR_PHI}};
}Population-scope dispersion parameters receive their priors. As noted above, these parameters do not enter the likelihood in this section — they are sampled from their priors only. Their posteriors are therefore equal to their priors, making them non-identified in the likelihood sense. This is a faithful description of the code's behavior; the parameters may serve a purpose in section 2 (generated quantities) or in future family extensions.
The likelihood implements the coord-wise factorization stated in the header:
if (family_id_k_vector[1] == 1) {
for (i in 1:n) {
for (j in 1:p) {
real sigma_ij = exp(eta_kp[2][i, j]);
y_real[i, j] ~ normal(eta_kp[1][i, j], sigma_ij);
}
}
}Slot 1 carries the location inv_link_id_per_slot[2] = 2).
The likelihood is:
The exp() inverse link is hardcoded rather than dispatched through inv_link_id_per_slot.
} else if (family_id_k_vector[1] == 3) {
for (i in 1:n) {
for (j in 1:p) {
real phi_ij = exp(eta_kp[2][i, j]);
y_int[i, j] ~ neg_binomial_2(exp(eta_kp[1][i, j]), phi_ij);
}
}
}Slot 1 carries the log-mean
Stan's neg_binomial_2(mu, phi) parameterization has PMF:
with mean
The likelihood is:
// family_id_k_vector[1] in {5, 6, 7, 8, 9, 10, 11, 12, 13}:
// deferred to a later iteration of 8.6.D with the explicit
// numerical caveat of opening Section 6.1. The dispatcher upstream
// raises gdpar_unsupported_feature_error before reaching this
// branch.Families gdpar_unsupported_feature_error before this Stan model is ever called for those families. If the model were reached with an unsupported family_id_k_vector[1], no likelihood branch would execute, resulting in an improper posterior (parameters sampled from priors only).
Not present. No custom Stan functions are defined; the model uses only built-in distributions (normal, neg_binomial_2) and built-in functions (exp, dot_product, to_vector, rep_vector).
Not present in section 1 of 2. This block is expected in section 2, where derived quantities such as posterior predictive draws, log-likelihood contributions, and potentially the b_coef_kp recovery in the multi-group regime would be computed. The population-scope dispersion parameters (sigma_y_pop_k, phi_pop_k) and the data-side anchor (theta_anchor_kp) may be consumed there.
Section 2 contains only the generated quantities block (preceded by a single closing brace } that terminates the previous block — almost certainly the model block from section 1). There is no functions, data, transformed data, parameters, or transformed parameters block in this section; those were defined in section 1 and are referenced here as already-in-scope symbols. The relevant in-scope identifiers used below are:
-
n,p— sample size and number of response coordinates. -
family_id_k_vector— integer vector indexed by the AMM component index$k$ ; element[1]selects the observation-family (the marginal response distribution). -
eta_kp— a$K \times n \times p$ array of linear predictors (the AMM working-scale coordinates). Indexed aseta_kp[k][i, j]with$k \in {1,\dots,K}$ ,$i \in {1,\dots,n}$ ,$j \in {1,\dots,p}$ . In this marginal template$K = 2$ :eta_kp[1]is the mean-structure predictor andeta_kp[2]is the auxiliary-structure (dispersion/scale) predictor. -
y_real—$n \times p$ real-valued response matrix (used for continuous families). -
y_int—$n \times p$ integer-valued response matrix (used for count families).
}This single } closes the block opened in section 1 (the model block, where the joint log-posterior was incremented). It is reproduced here only because the section boundary splits the file mid-program. No semantics of its own.
generated quantities {Opens the generated quantities block. Quantities declared here are not used for HMC sampling; they are computed once per posterior draw and stored for downstream inference (LOO-CV, posterior-predictive checks, recovery diagnostics). Because they execute after each leapfrog trajectory is accepted, all eta_kp values reflect a valid posterior sample.
// Per-coord log-likelihood and posterior-predictive quantities.
matrix[n, p] log_lik;
matrix[n, p] theta_i;
matrix[n, p] y_pred;Three
| Symbol | Type | Role |
|---|---|---|
log_lik |
matrix[n, p] |
Per-observation log-likelihood loo::loo() (after reshaping to a vector) for PSIS-LOO at the coordinate level. |
theta_i |
matrix[n, p] |
The anchored mean predictor on its natural working scale, |
y_pred |
matrix[n, p] |
Posterior-predictive replicate $y^{\text{rep}}{ij} \sim p(\cdot \mid \eta{1,ij}, \eta_{2,ij})$ at each draw. |
Note that theta_i is declared as a matrix even for the negative-binomial branch, where the natural parameter is theta_i always holds the reference-anchored coordinate on the working (link) scale.
if (family_id_k_vector[1] == 1) {The dispatch keys on family_id_k_vector[1] — the family identifier for the first AMM component, which in this marginal template is the observation model. The code recognizes exactly two values:
-
1→ Gaussian (normal) family. -
3→ Negative-binomial (NB2) family.
Other values of family_id_k_vector[1] (e.g. 2, presumably Bernoulli/logistic in the package's family registry) are not handled in this template; if encountered, log_lik, theta_i, and y_pred remain uninitialized, which Stan treats as a runtime error (NaN propagation). This is faithful to the source: no else clause is provided.
for (i in 1:n) {
for (j in 1:p) {
real sigma_ij = exp(eta_kp[2][i, j]);
theta_i[i, j] = eta_kp[1][i, j];
log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j],
sigma_ij);
y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij);
}
}For each coordinate
-
real sigma_ij = exp(eta_kp[2][i, j]);The auxiliary predictor$\eta_{2,ij}$ is mapped through the log link to the positive standard deviation:$$\sigma_{ij} = \exp(\eta_{2,ij}) > 0.$$ This is the only place the link for the dispersion is applied;$\eta_{2,ij}$ itself is unconstrained on the real line, so no Jacobian term is required (the transformation is applied to a deterministic function of parameters inside the likelihood, not to a transformed parameter being sampled). -
theta_i[i, j] = eta_kp[1][i, j];The mean-structure predictor is copied verbatim:$$\theta_{ij} = \eta_{1,ij}.$$ For the Gaussian family the mean link is the identity, so$\theta_{ij}$ is also the conditional mean$\mathbb{E}[y_{ij}] = \eta_{1,ij}$ . -
log_lik[i, j] = normal_lpdf(y_real[i, j] | eta_kp[1][i, j], sigma_ij);The Gaussian log-density is evaluated at the observed$y_{ij}^{\mathbb{R}}$ :$$\log p(y_{ij} \mid \eta_{1,ij}, \eta_{2,ij}) = -\tfrac{1}{2}\log(2\pi) - \log \sigma_{ij} - \frac{(y_{ij} - \eta_{1,ij})^2}{2\sigma_{ij}^2}.$$ This is the marginal observation likelihood for coordinate$(i,j)$ — the AMM "EB marginal" likelihood, conditional on the anchored predictors. -
y_pred[i, j] = normal_rng(eta_kp[1][i, j], sigma_ij);A posterior-predictive draw: $$y^{\text{rep}}{ij} \sim \mathcal{N}(\eta{1,ij},, \sigma_{ij}^2).$$normal_rngconsumes the per-draw RNG stream and is therefore a valid posterior-predictive sample.
} else if (family_id_k_vector[1] == 3) {
for (i in 1:n) {
for (j in 1:p) {
real phi_ij = exp(eta_kp[2][i, j]);
real mu_ij = exp(eta_kp[1][i, j]);
theta_i[i, j] = eta_kp[1][i, j];
log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij);
y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij);
}
}
}
}For each coordinate
-
real phi_ij = exp(eta_kp[2][i, j]);The auxiliary predictor is mapped through the log link to the NB2 precision (inverse-dispersion) parameter:$$\phi_{ij} = \exp(\eta_{2,ij}) > 0.$$ Larger$\eta_{2,ij}$ ⇒ larger$\phi_{ij}$ ⇒ lower over-dispersion. The variance is$$\mathrm{Var}(y_{ij}) = \mu_{ij} + \frac{\mu_{ij}^2}{\phi_{ij}}.$$ -
real mu_ij = exp(eta_kp[1][i, j]);The mean-structure predictor is mapped through the log link to the positive mean:$$\mu_{ij} = \exp(\eta_{1,ij}) > 0.$$ This is the canonical link for the NB2 mean. As in the Gaussian branch, the link is applied here insidegenerated quantitiesrather than to a sampledtransformed parameter, so no Jacobian is added. -
theta_i[i, j] = eta_kp[1][i, j];The anchored coordinate is stored on the working (log) scale:$$\theta_{ij} = \eta_{1,ij} = \log \mu_{ij}.$$ This is the AMM convention:theta_ialways holds the linear predictor, regardless of family. Downstream consumers must exponentiate if they want the mean. -
log_lik[i, j] = neg_binomial_2_lpmf(y_int[i, j] | mu_ij, phi_ij);The NB2 log-PMF evaluated at the observed integer count$y_{ij}^{\mathbb{Z}}$ :$$\log p(y_{ij} \mid \mu_{ij}, \phi_{ij}) = \log\Gamma(y_{ij}+\phi_{ij}) - \log\Gamma(\phi_{ij}) - \log\Gamma(y_{ij}+1) + \phi_{ij}\log!\frac{\phi_{ij}}{\phi_{ij}+\mu_{ij}} + y_{ij}\log!\frac{\mu_{ij}}{\phi_{ij}+\mu_{ij}}.$$ This is the marginal coordinate likelihood under the AMM decomposition. -
y_pred[i, j] = neg_binomial_2_rng(mu_ij, phi_ij);A posterior-predictive count: $$y^{\text{rep}}{ij} \sim \mathrm{NB2}(\mu{ij}, \phi_{ij}).$$ Note thaty_predis declared as amatrix[n,p](real-valued), so the integer draw is implicitly stored as a real; downstream code that expects integer counts must round or cast.
The final } closes the generated quantities block (and hence the program).
Let
The block produces, for every posterior draw
- $\log p(y_{ij} \mid \eta^{(s)}{1,ij}, \eta^{(s)}{2,ij})$ into
log_lik[i,j]; -
$\eta^{(s)}_{1,ij}$ intotheta_i[i,j](the reference-anchored mean coordinate, on the working scale); -
$y^{\text{rep},(s)}_{ij}$ intoy_pred[i,j], drawn from the same marginal law.
The "AMM" (Anchored Measurement Model) reference-anchoring decomposition splits the parameter vector for each coordinate into:
- A reference coordinate
$\eta_{1,ij}$ — the mean-structure linear predictor, identified up to the chosen link. This is the quantity stored verbatim intheta_iand is the canonical "anchored" target of inference. - An auxiliary coordinate
$\eta_{2,ij}$ — the dispersion/scale linear predictor, also on an unconstrained working scale, mapped to the positive half-line via$\exp(\cdot)$ inside the likelihood.
The "EB marginal" qualifier in the file name indicates that the likelihood evaluated here is the marginal coordinate-wise likelihood (each for loop accumulates per-coordinate log-densities with no covariance term, so
The KxP suffix denotes that the AMM array is indexed by family_id_k_vector mechanism allows each component family_id_k_vector[1] — the family of the observation model — and treats eta_kp[2] purely as a dispersion predictor whose own "family" is implicitly log-Gaussian on the working scale (i.e., it is always exponentiated). This is the canonical AMM pattern: the first component carries the response family, and subsequent components are auxiliary working-scale regressors whose link is fixed by the response family's requirements (log link for
No Jacobian adjustments appear in this block because:
-
eta_kpis declared (in section 1) as an unconstrained array of reals, sampled directly by HMC. - The link functions
$\exp(\cdot)$ are applied inside the likelihood/RNG calls as deterministic transformations of parameters, not as transformations of sampledtransformed parameters. The change-of-variables term would only be required if a constrained quantity (e.g.,sigmadeclared with<lower=0>) were being sampled; here the unconstrained$\eta_{2,ij}$ is the sampled quantity and$\sigma_{ij}$ is a deterministic function of it.
Consequently, the posterior sampled over model block (section 1), and the generated quantities block merely exposes the implied per-coordinate likelihood, anchored mean, and posterior-predictive replicate for downstream use.
← Part V — Stan Templates (2/3) · gdpar Wiki Home · Part VI — Data, Benchmarks, Tests & References →
- Part I — Conceptual Framework
- Part II — Mathematical Foundations
- Part III — Computational Architecture
- Part IV — Exhaustive Function Reference (1/7)
- Part IV — Exhaustive Function Reference (2/7)
- Part IV — Exhaustive Function Reference (3/7)
- Part IV — Exhaustive Function Reference (4/7)
- Part IV — Exhaustive Function Reference (5/7)
- Part IV — Exhaustive Function Reference (6/7)
- Part IV — Exhaustive Function Reference (7/7)
- Part V — Stan Templates (1/3)
- Part V — Stan Templates (2/3)
- Part V — Stan Templates (3/3)
- Part VI — Data, Benchmarks, Tests & References