-
Notifications
You must be signed in to change notification settings - Fork 0
Part IV Function Reference 1
← Part III — Computational Architecture · gdpar Wiki Home · Part IV — Exhaustive Function Reference (2/7) →
Documentation of every function in all 44 R source files, generated by the
GLM-5.2 and MiMo-V2.5-Pro lineages under a faithful-to-source spec and audited
(guaranteed floor on the mathematically dense modules: families, W_basis,
geometry_*, dependence_robust, stan_codegen).
gdpar_adapter_econml(estimator = "CausalForestDML", n_estimators = 1000L, model_y = NULL, model_t = NULL, seed = NULL)
Purpose Creates and returns a gdpar_meta_learner_adapter object that wraps an EconML estimator via the reticulate package, for use with gdpar_compare_meta_learners. It enables causal inference using the Orthogonal Double/Debiased Machine Learning (DML) framework, specifically the CausalForestDML estimator from the EconML Python package.
Arguments
-
estimatorCharacter scalar. Identifies the EconML estimator to use. Currently only"CausalForestDML"is supported in this package version. -
n_estimatorsInteger scalar. The number of trees in theCausalForestDMLforest. -
model_yOptional Python model object. The outcome model used in the first stage of the DML procedure. IfNULL, EconML's default (a gradient-boosted tree) is used. -
model_tOptional Python model object. The treatment model used in the first stage of the DML procedure. IfNULL, EconML's default is used. -
seedOptional integer scalar. A random seed for reproducibility, passed to the EconML estimator'srandom_stateparameter. Must be between 1 and.Machine$integer.max.
Mathematics
The CausalForestDML estimator implements the Orthogonal Double Machine Learning (DML) framework for estimating the Conditional Average Treatment Effect (CATE),
- Estimating nuisance parameters
$\eta_0$ , e.g., the outcome model$\mathbb{E}[Y|X, T]$ and the propensity score$\mathbb{P}(T=1|X)$ , using flexible machine learning. - Constructing the pseudo-outcome (the "DML residual"):
$\psi(W; \eta) = \left[ \frac{T - \hat{\mathbb{P}}(T=1|X)}{\hat{\mathbb{P}}(T=1|X)(1 - \hat{\mathbb{P}}(T=1|X))} \right] \left( Y - \hat{\mathbb{E}}[Y|X, T] \right)$ - Estimating the CATE by regressing
$\psi(W; \hat{\eta})$ on$X$ using a causal forest.
Confidence intervals are constructed using the effect_interval method of the fitted EconML estimator, which provides asymptotically valid intervals based on the forest's variance estimation.
Returns
A gdpar_meta_learner_adapter list object with the following components:
-
name:"econml" -
fit_predict_fun: A function that fits the model and returns CATE estimates and confidence intervals. -
predict_fun: A function that predicts from a fitted model without re-fitting. -
requires_r:"reticulate" -
requires_py:"econml" -
native_ci:TRUE -
description: A character string summarizing the estimator and its settings.
Notes
- The Python module
econmlmust be installed in the activereticulateenvironment. If not found, the function aborts with agdpar_missing_dependency_error. - The
stateobject returned byfit_predict_funcontains a reference to a Python object. This reference is invalidated if the R session is restarted (e.g., aftersaveRDSandloadRDS). Usingpredict_funon such a state will result in agdpar_unsupported_feature_error. - The function validates input arguments using
assert_countandassert_numeric_scalar(internal validation functions not shown). - The internal
.econml_to_matrixfunction is called to convert covariate data frames to numeric matrices compatible with EconML.
Purpose Converts a data frame of covariates into a numeric matrix suitable for input to the EconML Python package. It also manages a template to ensure consistent encoding of factor levels between training and prediction data.
Arguments
-
dfA data frame or object coercible to a data frame. Contains the covariates. -
templateA list orNULL. IfNULL, the function creates a new template fromdf. If provided, it ensuresdfis processed to match the template's column structure.
Mathematics The conversion process applies the following transformation to each covariate:
- Character columns are converted to factors.
- The data frame is converted to a model matrix using the formula
~ . - 1, which expands factor variables into dummy (one-hot) variables without an intercept. For a factor with$k$ levels, this yields$k$ binary indicator columns.
The resulting matrix fit and effect methods.
Returns A numeric matrix with the following attributes:
-
colnames: The column names of the matrix. -
factor_levels: A list mapping original factor column names to their levels. If the column is not a factor, the entry isNULL. -
template: The template list (either created or passed in) is attached as an attribute to the matrix. This template is stored in the adapter'sstateto ensure consistent data processing during prediction.
Notes
- If
templateisNULL(i.e., during model fitting), the function requires at least one column indf. Otherwise, it aborts with agdpar_input_error. - If
templateis provided (i.e., during prediction), the function enforces that all factor variables in the template exist indfand have the same levels. It also ensures the resulting model matrix has exactly the same columns as the template. Incompatibilities (missing or extra columns) result in agdpar_input_error. - The function uses
stats::model.matrixfor the conversion, which handles factor expansion and removal of intercepts.
gdpar_adapter_grf(num_trees = 2000L, sample_fraction = 0.5, mtry = NULL, honesty = TRUE, seed = NULL)
Purpose
Factory that constructs a gdpar_meta_learner_adapter object wrapping the R-side causal forest estimator grf::causal_forest (Athey, Tibshirani, and Wager, 2019) for use with gdpar_compare_meta_learners. The adapter populates both the mandatory fit_predict_fun (fit + predict in one call) and the optional predict_fun (reuse a cached forest on a fresh evaluation grid without refitting). It advertises requires_r = "grf" and native_ci = TRUE because grf's built-in variance estimator supplies confidence intervals via the normal approximation.
Arguments
-
num_trees: Integer scalar; number of trees in the forest. Default2000L, matchinggrf's default. -
sample_fraction: Numeric scalar in$(0, 0.5]$ ; fraction of the training sample drawn for each tree. Default0.5. -
mtry: Optional integer scalar; number of candidate variables per split. DefaultNULLdelegates togrf's own default ($\min(\lceil\sqrt{p} + 20\rceil, p)$ ). -
honesty: Logical scalar; whether to use honest splitting. DefaultTRUE(recommended;grfCIs are valid only under honesty). -
seed: Optional integer scalar; seed propagated togrf's internal RNG when the comparator'sseed_runisNULL. DefaultNULL.
Mathematics
Native confidence intervals are obtained by the normal approximation
where grf's built-in variance estimate obtained via predict(..., estimate.variance = TRUE). The standard error is clamped at zero:
Returns
A gdpar_meta_learner_adapter object (constructed via gdpar_meta_learner_adapter) with fields:
-
name = "grf", -
fit_predict_fun: closure capturing the hyperparameter listhp, -
predict_fun: closure independent ofhp, -
requires_r = "grf", -
native_ci = TRUE, -
description: a string of the form"grf::causal_forest (num_trees = <n>, honesty = <h>) with normal-approximation CIs from estimate.variance.".
Notes
- Validation sequence:
-
assert_count(num_trees, "num_trees"). -
assert_numeric_scalar(sample_fraction, "sample_fraction", lower = 1e-3, upper = 0.5). - If
mtryis non-NULL,assert_count(mtry, "mtry"). -
honestymust be a length-1 non-NAlogical; otherwisegdpar_abortraises agdpar_input_errorwith message"Argument 'honesty' must be a non-NA logical scalar.". - If
seedis non-NULL,assert_numeric_scalar(seed, "seed", lower = 1, upper = .Machine$integer.max).
-
- Hyperparameters are coerced and stored in a list
hp:num_trees→integer,sample_fraction→numeric,mtry→integer(orNULL),honesty→logical,seed→integer(orNULL). - The
fit_predict_funandpredict_funclosures are defined locally and passed togdpar_meta_learner_adapter; see their dedicated subsections below. - Side effects: none beyond construction of the adapter object. The closures themselves perform fitting/prediction only when invoked.
Purpose
Mandatory adapter entry point: fit a grf::causal_forest on the training triple (X, Y, T) and immediately predict the conditional average treatment effect (CATE) on X_newdata together with native normal-approximation confidence intervals at confidence level level. Returns both the predictions and a state object that allows predict_fun to reuse the fitted forest.
Arguments
-
X: Covariate data (data frame or coercible) for the training sample; passed to.grf_to_matrix. -
Y: Numeric outcome vector; coerced viaas.numeric(Y)and passed asYtogrf::causal_forest. -
T: Numeric treatment indicator vector; coerced viaas.numeric(T)and passed asWtogrf::causal_forest. -
X_newdata: Covariate data for prediction; converted via.grf_to_matrix(X_newdata, template = attr(X_mat, "template"))so its design aligns with the training design. -
level: Numeric scalar in$(0, 1)$ ; confidence level for the CIs. -
seed_run: Optional integer scalar; seed supplied by the comparator. If non-NULLit overrideshp$seed.
Mathematics
The causal forest is fit by do.call(grf::causal_forest, args) with args = list(X = X_mat, Y = as.numeric(Y), W = as.numeric(T), num.trees = hp$num_trees, sample.fraction = hp$sample_fraction, honesty = hp$honesty), conditionally augmented with mtry (if hp$mtry non-NULL) and seed (if eff_seed non-NULL). Predictions use estimate.variance = TRUE, yielding
Returns
A list with elements:
-
cate_mean: numeric vector of point predictionsas.numeric(pred$predictions). -
cate_ci: numeric matrix with columnslowerandupper, row-aligned withcate_mean. -
state: list withforest(the fittedcausal_forestobjectcf) andtemplate(the column template extracted fromattr(X_mat, "template")). -
notes:character(0L)(empty).
Notes
- Calls
require_suggested("grf", "fit gdpar_adapter_grf")to ensure the suggested package is available; this is expected to error ifgrfis not installed. -
eff_seedis resolved asseed_run(coerced to integer) when non-NULL, otherwise falls back tohp$seed. - The variance estimates are clamped at zero via
pmax(as.numeric(pred$variance.estimates), 0)before taking the square root, guarding against tiny negative numerical artifacts. - The
state$forestobject retainsgrf's internal RNG state and trained trees; it is intended to be passed back topredict_fun. - Side effects: triggers
grf::causal_forestfitting (RNG consumption ifeff_seedis set) and a prediction call.
Purpose
Optional adapter entry point: reuse a previously fitted causal forest stored in state to predict CATEs (with native CIs) on a fresh evaluation grid X_newdata, without refitting.
Arguments
-
state: List previously produced byfit_predict_fun, expected to containforest(a fittedgrf::causal_forestobject) andtemplate(column template for design alignment). -
X_newdata: Covariate data for prediction; converted via.grf_to_matrix(X_newdata, template = state$template). -
level: Numeric scalar in$(0, 1)$ ; confidence level for the CIs.
Mathematics
Identical normal-approximation CI construction as in fit_predict_fun:
Returns
A list with elements:
-
cate_mean: numeric vector of point predictions. -
cate_ci: numeric matrix with columnslowerandupper.
(No state or notes elements are returned, unlike fit_predict_fun.)
Notes
- Dependency check is performed inline via
requireNamespace("grf", quietly = TRUE); on failure,gdpar_abortraises agdpar_missing_dependency_errorwithdata = list(package = "grf")and message"Package 'grf' is required to reuse a cached causal_forest state.". - If
stateisNULLorstate$forestisNULL,gdpar_abortraises agdpar_internal_errorwith message"Cached state for the grf adapter is empty; refit before predicting.". - Uses
stats::predict(notgrf:::predict.causal_forestdirectly) so S3 dispatch resolves the method. - Variance estimates are clamped at zero via
pmax(..., 0)before the square root. - Side effects: triggers a
grfprediction call (no refitting, no RNG consumption).
Purpose
Internal helper that converts a covariate data frame into a fully numeric design matrix suitable for grf::causal_forest. Character columns are coerced to factors, factors are expanded via model.matrix(~ . - 1, ...), and numeric columns pass through unchanged. A template attribute records the column structure of the first call so subsequent calls on X_newdata align identically; the function aborts when a new column appears or a previously observed factor level is missing.
Arguments
-
df: A data frame, or an object coercible viaas.data.frame(df, stringsAsFactors = FALSE). Contains the covariates. -
template: Optional list with elementscolnames(character vector of expected design-matrix column names) andfactor_levels(named list mapping original data-frame column names to either their factor levels orNULLfor non-factor columns). IfNULL, a new template is built fromdf.
Mathematics
The design matrix is constructed as
which expands each template is supplied, factor columns are first re-leveled to the template's levels via factor(df[[j]], levels = template$factor_levels[[j]]), the resulting matrix is checked for set-equality of column names against template$colnames, and finally reordered as mm[, template$colnames, drop = FALSE].
Returns
A numeric matrix with attribute "template" (a list with colnames and factor_levels). When template = NULL on input, the returned template is freshly built from df; otherwise the input template is attached unchanged to the returned matrix.
Notes
- If
dfis not a data frame, it is coerced viaas.data.frame(df, stringsAsFactors = FALSE). - A
forloop overseq_along(df)coerces anycharactercolumn tofactorviaas.factor. - Template-
NULLbranch:- If
ncol(df) == 0L,gdpar_abortraises agdpar_input_errorwith message"gdpar_adapter_grf requires at least one covariate; received a 0-column data frame.". -
template$factor_levelsis built bylapply(df, function(col) if (is.factor(col)) levels(col) else NULL), so non-factor columns map toNULL.
- If
- Template-non-
NULLbranch:- For each name
jinnames(template$factor_levels)whose entry is non-NULL(i.e., a factor column in the original training data):- If
jis not incolnames(df),gdpar_abortraises agdpar_input_errorwith message"Covariate '<j>' missing from newdata for the grf adapter."anddata = list(missing = j). - Otherwise
df[[j]]is re-factored withfactor(df[[j]], levels = template$factor_levels[[j]]); this silently drops unseen levels and introducesNAfor values not in the template levels.
- If
- After building
mm, if!setequal(colnames(mm), template$colnames):-
missing_cols <- setdiff(template$colnames, colnames(mm)). -
extra_cols <- setdiff(colnames(mm), template$colnames). -
gdpar_abortraises agdpar_input_errorwith a formatted message listing missing and extra columns (using"<none>"when a set is empty) anddata = list(missing = missing_cols, extra = extra_cols).
-
- On success, columns are reordered to exactly match
template$colnamesviamm[, template$colnames, drop = FALSE].
- For each name
- The
templateattribute is set on the returned matrix viaattr(mm, "template") <- template, enabling chained calls (e.g.,fit_predict_funreadsattr(X_mat, "template")and passes it to theX_newdataconversion). - No S3 dispatch; this is a plain internal function.
Purpose
Initialises a chainable builder object of class amm_builder that serves as an incremental specification container for an Adaptive Moderated Model (AMM). The builder is a programmatic alternative to the single-call amm_spec() constructor; it accumulates per-dimension additive (a) and multiplicative (b) basis formulas, a global modulating basis W, and optional covariate names x_vars, and is ultimately converted into an amm_spec via as_amm_spec().
Arguments
| Argument | Type | Meaning |
|---|---|---|
p |
Scalar positive integer (coerced to integer) |
Dimension of the per-individual parameter vector 1L (scalar/ univariate path). Must satisfy p ≥ 1. |
Mathematics None. The function performs only object construction.
Returns
An S3 object of class c("amm_builder", "list") with four components:
-
p—integer, the value of thepargument. -
dims— adims_specobject (fromdimwise(a = NULL, b = NULL)) representing per-dimension additive/multiplicative basis specifications with aNULLbase and no overrides. -
W—NULL(no modulating basis set yet). -
x_vars—NULL(no covariate names set yet).
Notes
-
pis validated byassert_count(p, "p"), which enforces a single positive integer value. Non-integer numerics are silently coerced tointegerviaas.integer(p). - The resulting builder is intended to be mutated in-place through successive
amm_set_*()calls and finalised byas_amm_spec(). Despite the pipe-friendly API, the builder is a plain list and mutations rely on R's copy-on-modify semantics. - When
p = 1L, finalisation throughas_amm_spec()resolves the embeddeddims_specto a scalar entry and invokes the scalar AMM path; whenp > 1L, thedims_specis forwarded directly to the multivariate path. This bifurcation happens at finalisation time, not at build time.
Purpose
Replaces the base (uniform) additive basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension amm_set_a()). Passing NULL disables the additive component on the base layer.
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
a |
One-sided formula (~ ...) or NULL
|
The new base additive basis. NULL means no additive component on the base layer. |
Returns
The mutated amm_builder object (returned invisibly for pipe compatibility). The mutation is to builder$dims$base$a.
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). -
ais validated byassert_one_sided_formula(a, "a", allow_null = TRUE), which enforces a one-sided formula orNULL. - The mutation
builder$dims$base$a <- adirectly overwrites the base additive slot. Any per-dimension overrides previously registered viaamm_set_a()are preserved because they are stored separately in thedims_specoverride layer (seeoverride()).
Purpose
Replaces the base (uniform) multiplicative basis formula of the embedded dims_spec inside an amm_builder. This formula is applied to every dimension amm_set_b()). Passing NULL disables the multiplicative component on the base layer.
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
b |
One-sided formula (~ ...) or NULL
|
The new base multiplicative basis. NULL means no multiplicative component on the base layer. |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$dims$base$b.
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). -
bis validated byassert_one_sided_formula(b, "b", allow_null = TRUE). - Per-dimension overrides previously registered via
amm_set_b()are preserved; only the base layer is changed.
Purpose
Registers a per-dimension override of the additive component for a specific dimension index amm_set_a_uniform()) at index k replaces the previous override for that dimension.
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
k |
Positive integer in 1:p
|
The dimension index to override. Validated against builder$p as the upper bound. |
a |
One-sided formula (~ ...) or NULL
|
The override additive basis for dimension NULL disables the additive component for that dimension. |
Returns
The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, a = a).
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). -
kis validated byassert_count(k, "k", max = builder$p), which enforces a single positive integer$\leq p$ . -
ais validated byassert_one_sided_formula(a, "a", allow_null = TRUE). - The override is applied via
override(), which layers a per-dimension specification on top of the existingdims_spec. Overrides survive subsequent uniform changes (e.g., a lateramm_set_a_uniform()call will not erase overrides registered here).
Purpose
Registers a per-dimension override of the multiplicative component for a specific dimension index amm_set_b_uniform()) at index k replaces the previous override for that dimension.
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
k |
Positive integer in 1:p
|
The dimension index to override. Validated against builder$p as the upper bound. |
b |
One-sided formula (~ ...) or NULL
|
The override multiplicative basis for dimension NULL disables the multiplicative component for that dimension. |
Returns
The mutated amm_builder object (returned invisibly). The mutation replaces builder$dims with the result of override(builder$dims, k = k, b = b).
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). -
kis validated byassert_count(k, "k", max = builder$p). -
bis validated byassert_one_sided_formula(b, "b", allow_null = TRUE). - The override is applied via
override(), which layers a per-dimension specification on top of the existingdims_spec. Overrides survive subsequent uniform changes.
Purpose
Stores a W_basis object as the global modulating basis of the specification under construction. The modulating component
Passing NULL disables the modulating component.
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
W |
W_basis object or NULL
|
The modulating basis to store. NULL clears any previously set basis. |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$W.
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). - If
Wis non-NULL, it is validated byassert_inherits(W, "W_basis", "W"). - The modulating component is global to all dimensions of
$\theta_i$ ; there is no per-dimension setter for$W$ . Declaring$W$ per-dimension would restrict the model to the separable sub-class, which is rejected by construction in the package design. -
Wis stored as a single top-level slot of the builder, not insidedims.
Purpose
Records a character vector of covariate names that enter the modulating component as the linear factor NULL. The recorded value is forwarded to amm_spec() at finalisation time. When NULL, the package derives covariate names from the right-hand side of the model formula passed to gdpar().
Arguments
| Argument | Type | Meaning |
|---|---|---|
builder |
amm_builder object |
The builder to modify. |
x_vars |
Character vector (length NULL
|
Names of the covariates used in the modulating component. NULL defers covariate identification to gdpar(). |
Returns
The mutated amm_builder object (returned invisibly). The mutation is to builder$x_vars.
Notes
-
builderis validated byassert_inherits(builder, "amm_builder", "builder"). - If
x_varsis non-NULL, the function performs a manual validation check: it must beis.character(x_vars)andlength(x_vars) >= 1L. On failure,gdpar_abort()is called with class"gdpar_input_error"and a data list containing the argument name and the received value. - This is the only setter in the builder suite that performs its own argument validation via
gdpar_abort()rather than delegating to anassert_*helper, because the validation logic (non-empty character vector orNULL) does not match any existing assertion primitive.
Note: The roxygen block for as_amm_spec(builder) begins at the end of this section but its function body is in section 2 of 2. It is therefore not documented here.
Purpose
Converts an amm_builder object into a finalized amm_spec specification object. This is the constructor that finalizes the builder's accumulated configuration, resolving any pending dimension specifications and handling the special case of a univariate AMM (p=1).
Arguments
-
builder: An object of classamm_buildercontaining the accumulated model configuration (e.g., dimensionsdims, basis matrixW, predictor variablesx_vars, and AR orderp).
Returns
An object of class amm_spec representing the fully specified AMM model.
Notes
- The function first asserts that
builderis indeed anamm_builderobject. - For the univariate case (
p == 1), it explicitly resolves thedimsspecification to extract the scalar anchor parametersaandbusingresolve_dims_spec. The resultingamm_specobject will contain these scalar values. - For the multivariate case (
p > 1), thedimslist is passed directly to theamm_specconstructor without immediate resolution. - The
amm_specobject is constructed by calling theamm_spec()constructor (presumably defined elsewhere) with the relevant components from the builder.
Purpose
An S3 print method for objects of class amm_builder. It provides a human-readable summary of the builder's current configuration, including base dimensions, any override specifications for higher-order terms, the basis structure, and predictor variables.
Arguments
-
x: An object of classamm_builderto be printed. -
...: Additional arguments (unused, but required for S3 method compatibility).
Returns
Invisibly returns the input object x.
Notes
- This is an exported S3 method.
- The output is printed to the console via
cat(). - The summary includes:
- The AR order
p. - Base anchor parameters
aandb. These are printed asNULLif they have not been set. - Any override specifications for dimensions of higher-order terms (k > 0). Overrides are printed if they exist, otherwise
<none>is shown. - The basis matrix specification
W. If present, it is printed asW_basis(type = '<type>'). - The predictor variable names
x_vars. IfNULL, it notes they are inherited from thegdpar()formula.
- The AR order
- The function handles
NULLvalues gracefully by printing"NULL"instead of attempting to printNULLdirectly. - The overrides are printed in ascending order of the integer key
k.
Purpose
Serializes an amm_spec object (the constructor-input representation of an AMM specification) into a canonical, human-readable plain-text file. The format is designed for version control, archival, and bit-exact reproducibility. The file is intended to be round-tripped by amm_load_spec, which parses it with a dedicated lexical parser (no source/eval), making the serialized form safe to load from untrusted locations.
Arguments
-
spec: Object of classamm_spec. The specification to serialize. Only the constructor inputs are recorded; any materialized state (e.g., aW_basismaterialized at a specific$\theta_{\mathrm{ref}}$ via the internalmaterialize_W_basis) is deliberately not written, so that the reconstructed object is the unmaterialized form normally produced byamm_spec. -
path: Character scalar. The destination file path. Must be a single non-empty string.
Returns
Invisibly returns path (the character scalar that was passed in), after writing the canonical text representation to that file via writeLines.
Notes
Validation and errors:
-
specis checked withassert_inherits(spec, "amm_spec", "spec"); a failure raises an error (dispatched byassert_inherits). -
pathis validated explicitly: it must satisfyis.character(path),length(path) == 1L, andnzchar(path). If any of these fail, the function aborts viagdpar_abortwith class"gdpar_input_error"and adatalist containingargument = "path"andreceived = path. - If
spec[["W"]]is non-NULLandidentical(W[["type"]], "user"), the function aborts with agdpar_input_error(nodatafield). User-definedW_basisobjects cannot be canonized because the evaluator is an arbitrary R function.
Serialization logic (line-by-line construction):
- The package version is obtained from
utils::packageVersion("gdpar")and emitted as the mandatory header line:# gdpar_spec_version: <version> - The dimension count is emitted as
p: <spec[["p"]]>. - Scalar path (
spec[["p"]] == 1L, checked withisTRUE): theaandbone-sided formulas are serialized via the internal helper.serialize_formulaand emitted asa: <literal>andb: <literal>. - Multivariate path (
spec[["p"]] > 1L):aandbare both written as the literal stringNULL(the per-dimension formulas are emitted separately below). -
x_varsis serialized via.serialize_char_vecand emitted asx_vars: <literal>(handlesNULLand character vectors). -
Wblock:- If
WisNULL: emitsW.type: NULL. - If
W[["type"]]is"polynomial": emitsW.type: polynomialfollowed byW.degree: <as.integer(W[["degree"]])>. - If
W[["type"]]is"bspline": emitsW.type: bspline, thenW.degree: <as.integer(W[["degree"]])>. Additionally, ifW[["knots"]]is non-NULL, emitsW.knots: <.serialize_num_vec(W[["knots"]])>; ifW[["df"]]is non-NULL, emitsW.df: <as.integer(W[["df"]])>. Either, both, or neither ofW.knots/W.dfmay appear depending on which fields are populated.
- If
- Multivariate per-dimension entries (
spec[["p"]] > 1L, checked withisTRUE): for eachkinseq_len(spec[["p"]]), the function retrievesentry <- spec[["dims"]][[k]]and emits two lines:The indexdims.<k>.a: <.serialize_formula(entry[["a"]])> dims.<k>.b: <.serialize_formula(entry[["b"]])>kis interpolated directly into the key name, producing keys such asdims.1.a,dims.1.b,dims.2.a, etc.
Side effects: Writes a text file to path via writeLines(lines, con = path). The file is overwritten if it exists.
Version policy: The version header records the running package version exactly. The loader (amm_load_spec) checks this strictly; until the first stable release, any mismatch is treated as an error.
Format grammar summary (as emitted by this function):
| Key | Value grammar |
|---|---|
# gdpar_spec_version: |
Package version string (header line) |
p |
Positive integer |
a, b
|
NULL or one-sided formula literal (e.g. ~ x1 + x2) |
x_vars |
NULL or c("x1", "x2", ...)
|
W.type |
NULL, polynomial, or bspline
|
W.degree |
Positive integer (present for polynomial and bspline) |
W.knots |
c(...) of numerics (present only for bspline with interior knots) |
W.df |
Positive integer (present only for bspline with df) |
dims.K.a, dims.K.b
|
Same grammar as a, b, for K in 1:p (emitted only when p > 1) |
Purpose
Reads a canonical gdpar specification file from disk and parses it into an amm_spec object. The function is the deserialization counterpart to the package's spec writer: it enforces a strict, line-oriented key: value grammar, validates a version header against the currently loaded package version, and dispatches to amm_spec() either in the univariate (p = 1) form (with scalar a/b) or the multivariate (p > 1) form (with a dims list of per-dimension a/b pairs).
Arguments
-
path:characterscalar (length 1, non-empty). Filesystem path to a canonical gdpar spec file. Must exist on disk.
Mathematics
The parser implements a deterministic finite scan over the file's lines. Let
and the key/value pair is
The recognised key set is
For dims.*.* key; for dims indices are exactly a/b keys must be absent (i.e. parse to NULL).
Returns
An amm_spec object constructed by amm_spec():
- When
p == 1L:amm_spec(a = a_scalar, b = b_scalar, W = W, x_vars = x_vars, p = 1L), wherea_scalarandb_scalarare parsed formula objects (orNULL) andWis the parsed weight-basis specification returned by.parse_W_records(). - When
p > 1L:amm_spec(W = W, x_vars = x_vars, p = p_val, dims = dims_list), wheredims_listis a list of lengthp_valwhose$k$ -th element islist(a = a_k, b = b_k)witha_k,b_kparsed formula objects.
Notes
- All validation failures are raised via
gdpar_abort()with class"gdpar_input_error"; conditiondatapayloads are attached where contextual (e.g.argument,received,path,line,raw,key,file_version,package_version). - Input validation of
path: rejects non-character, non-scalar, or empty-string inputs before any I/O. - Version header: the file must contain a line matching the regex
^\s*#\s*gdpar_spec_version\s*:. The extracted (whitespace-trimmed) value must beidentical()toas.character(utils::packageVersion("gdpar")); any mismatch aborts, citing bit-exact reproducibility concerns across development releases. - Comment and blank lines are skipped during record parsing; only the first
:(fixed-match) on each remaining line is used as the key/value separator. - Duplicate-key detection is performed against an accumulating
recordslist; the error message reports both the current line and the line of the first occurrence. - Unknown-key rejection: any key not in
recognised_prefixesand not matching^dims\.[0-9]+\.[ab]$aborts. - Required-key enforcement:
p,a,bare mandatory for the univariate branch; for the multivariate branch,a/bmust beNULL(i.e..parse_formula()returnedNULL) anddims.K.a/dims.K.bare required for every$k \in \{1, \dots, p\}$ . -
pis parsed via.parse_int()and must satisfy$p \geq 1$ . -
x_varsis optional; when absent it is passed asNULL. - The
Wblock is delegated entirely to.parse_W_records(records). - Multivariate consistency: when
p == 1L, anydims.*key triggers an abort listing the offending keys (viasQuote). Whenp > 1L, after buildingdims_list, the function recomputes the set ofdims.keys and subtracts those matching^dims\.[1-(p-1)]\.[ab]$|^dims\.p\.[ab]$(i.e. the regex^dims\.[1-%d]\.[ab]$|^dims\.%d\.[ab]$withp_val - 1Landp_valsubstituted); any remainder is parsed for its integer index and aborted if the index isNA, less than 1, or greater thanp_val, or if the key does not match thedims.K.{a,b}shape at all. - Side effects: reads from the filesystem via
readLines(path, warn = FALSE); performs no writes. - No S3 dispatch is performed inside this function; the returned object's class is determined by
amm_spec().
Purpose
Serializes an R formula object into a single-line character string suitable for writing into the canonical text file format used by gdpar. This is the inverse of .parse_formula.
Arguments
-
f:formulaorNULL. The formula to serialize. May beNULL, in which case the literal string"NULL"is produced.
Mathematics
No numerical formula. The transformation is:
Returns
A length-one character string. When f is NULL, returns "NULL". Otherwise returns the deparsed formula collapsed onto a single line with elements separated by single spaces.
Notes
- Uses
deparse(..., width.cutoff = 500L)to reduce line-wrapping in the deparsed output, then collapses any resulting multi-element character vector with spaces. - No validation is performed on the structure of
f(e.g., one-sided vs. two-sided); that is the responsibility of the parser on read-back.
Purpose
Parses a character string back into an R formula object, enforcing that it is either NULL or a one-sided formula beginning with ~. Used when reading canonical configuration files.
Arguments
-
value:character(1). The raw string read from the file for the given key. -
key:character(1). The configuration key name, used in error messages. -
line_no:integer(1)ornumeric. The line number in the source file wherevalueappeared, used in error messages.
Mathematics
No numerical formula. The validation logic is:
Returns
NULL or a one-sided formula object of length 2 (i.e., ~ rhs).
Notes
- Trims whitespace from
valuebefore any comparison. - If the trimmed value is exactly
"NULL", returnsNULLimmediately. - If the value does not start with
"~", raises an error of class"gdpar_input_error"withdata = list(key = key, value = value, line = line_no). - Wraps
stats::as.formula(value)intryCatch; on parse failure, raises a"gdpar_input_error"(without thedatafield). - After successful parsing, checks
length(out) != 2L; a two-sided formula (length 3) triggers a"gdpar_input_error". - All errors are raised via
gdpar_abort.
Purpose
Serializes a character vector (or NULL) into a c(...) literal string with each element double-quoted, suitable for the canonical file format. This is the inverse of .parse_char_vec.
Arguments
-
x:charactervector orNULL. The vector to serialize.
Mathematics
subject to the constraint that no element of ", \, \n, or \r.
Returns
A length-one character string. "NULL" for NULL input; otherwise a c(...) literal with quoted entries.
Notes
- Before serialization, checks every element of
xfor the regex pattern["\\\n\r](double quote, backslash, newline, carriage return). If any match is found, raises a"gdpar_input_error"viagdpar_abortwith the message indicating that double quotes, backslashes, or newlines are not permitted. - Each element is formatted via
sprintf("\"%s\"", x_i). - Elements are joined with
", ". - An empty (length-zero) non-NULL character vector produces the string
c().
Purpose
Parses a c(...) literal string back into a character vector, or recognizes "NULL". Inverse of .serialize_char_vec.
Arguments
-
value:character(1). The raw string from the file. -
key:character(1). Configuration key name for error messages. -
line_no:integer(1)ornumeric. Source line number for error messages.
Mathematics
The extraction proceeds as:
- Trim
value; if equal to"NULL", returnNULL. - Require the regex
^c\(.*\)$; otherwise error. - Extract the inner content:
inner = sub("^c\((.*)\)$", "\1", v). - Find all quoted tokens via the regex
"([^"]*)"oninner. - Strip the surrounding double quotes from each match.
Returns
NULL, character(0), or a character vector of the parsed quoted tokens.
Notes
- If the trimmed value is
"NULL", returnsNULL. - If the value does not match
^c\(.*\)$, raises a"gdpar_input_error". - After extracting
inner, if there are zero regex matches:- If
inneris empty or whitespace-only (!nzchar(trimws(inner))), returnscharacter(0). - Otherwise raises a
"gdpar_input_error"indicating quoted tokens were expected.
- If
- Quoted tokens are matched with the regex
"([^"]*)"and then the double quotes are removed viagsub("\"", "", matches, fixed = TRUE). - The parser does not handle escaped quotes inside tokens; this is consistent with the serializer's prohibition on double quotes within elements.
Purpose
Serializes a numeric vector (or NULL) into a c(...) literal string using high-precision formatting. Inverse of .parse_num_vec.
Arguments
-
x:numericvector orNULL. The vector to serialize.
Mathematics
Each element is formatted with 17 significant digits:
then joined as
Returns
A length-one character string:
-
"NULL"ifxisNULL. -
"c()"ifxhas length 0. - Otherwise a
c(...)literal with%.17g-formatted numbers.
Notes
- The
%.17gformat provides enough precision to round-trip IEEE-754 double-precision values in most cases. - No validation is performed on finiteness or
NA/NaNvalues;sprintf("%.17g", NA)yields"NA", andsprintf("%.17g", Inf)yields"Inf", which would fail on re-parse via.parse_num_vec(sinceas.numeric("NA")isNAand triggers the non-numeric error, andInfwould parse but is not checked here).
Purpose
Parses a c(...) literal string back into a numeric vector, or recognizes "NULL". Inverse of .serialize_num_vec.
Arguments
-
value:character(1). The raw string from the file. -
key:character(1). Configuration key name for error messages. -
line_no:integer(1)ornumeric. Source line number for error messages.
Mathematics
- Trim
value; if"NULL", returnNULL. - Require
^c\(.*\)$; otherwise error. - Extract inner:
inner = trimws(sub("^c\((.*)\)$", "\1", v)). - If
inneris empty, returnnumeric(0). - Split on
,(fixed), trim each part, coerce withas.numeric. - If any result is
NA, error.
Returns
NULL, numeric(0), or a numeric vector.
Notes
- Uses
suppressWarnings(as.numeric(parts))to silence coercion warnings; the presence ofNAin the result is then checked explicitly. - If any token fails to coerce (producing
NA), raises a"gdpar_input_error"viagdpar_abort. - Splitting is done with
strsplit(inner, ",", fixed = TRUE), so commas inside quoted strings would cause issues — but numeric vectors do not contain quoted strings, so this is safe. - Note that
Inf,-Inf, andNaNwould parse successfully viaas.numericand pass theis.nacheck forInf(sinceis.na(Inf)isFALSE), butNaNwould fail sinceis.na(NaN)isTRUE.
Purpose
Parses a character string into a single integer value, with strict validation.
Arguments
-
value:character(1). The raw string from the file. -
key:character(1). Configuration key name for error messages. -
line_no:integer(1)ornumeric. Source line number for error messages.
Mathematics
then validate:
If valid, return
Returns
A length-one integer.
Notes
- Uses
suppressWarnings(as.numeric(value))to silence coercion warnings. - The check
v != as.integer(v)rejects non-integer-valued numerics (e.g.,3.5). Note thatas.integer(v)truncates toward zero, so this comparison effectively requires$v$ to be a whole number. -
is.finite(v)rejectsInf,-Inf,NaN, andNA. - On any validation failure, raises a
"gdpar_input_error"viagdpar_abort. - The returned value is explicitly
as.integer(v), so it is of R typeinteger(notnumeric).
Purpose
Parses the collection of W.* configuration records (for the basis-function specification of the dynamic parameter model) and constructs a W_basis object. This is the central dispatcher for reconstructing the basis
Arguments
-
records: a namedlistof record objects. Each record is expected to be a list with elementsvalue(character string) andline(line number). The relevant keys are"W.type","W.degree","W.knots", and"W.df".
Mathematics
The basis
Returns
NULL, or a W_basis object constructed by W_basis(...).
Notes
- If
records[["W.type"]]isNULLor itsvalueis"NULL", returnsNULLimmediately (no basis). -
W.type = "user"is explicitly rejected with a"gdpar_input_error"because user-defined bases reference arbitrary R functions that cannot be serialized in the canonical format. - Only
"polynomial"and"bspline"are accepted asW.typevalues; anything else raises a"gdpar_input_error"referencing the line number fromWt_rec[["line"]]. -
W.degreeis required for both supported types. Ifrecords[["W.degree"]]isNULL, an error is raised. The degree is parsed via.parse_int. - For
W.type = "polynomial": returnsW_basis(type = "polynomial", degree = W_degree). No further keys are consulted. - For
W.type = "bspline":-
W.knotsandW.dfare mutually exclusive. If both are present, a"gdpar_input_error"is raised. - If
W.knotsis present, it is parsed via.parse_num_vecand passed asknotstoW_basis. - If
W.dfis present, it is parsed via.parse_intand passed asdftoW_basis. - If neither
W.knotsnorW.dfis present, a"gdpar_input_error"is raised stating that one must be supplied.
-
- All errors are raised via
gdpar_abortwith class"gdpar_input_error". - This function delegates the actual basis construction to the (presumably exported or internal)
W_basisconstructor, which is not defined in this section.
Purpose Constructs a specification object that declares which components of the Additive-Multiplicative-Modulated (AMM) canonical form
are active and how each is parametrized. The returned amm_spec object is the primary input consumed by gdpar() to assemble design matrices and the Stan model. Two mutually exclusive construction paths exist, selected by p: the scalar path (p = 1L, default) accepts a and b directly as one-sided formulas; the multivariate path (p > 1L) requires the dims argument instead and forbids a/b.
Arguments
| Argument | Type | Meaning |
|---|---|---|
a |
One-sided formula or NULL
|
Scalar path only. Basis for the additive component stats::model.matrix. NULL disables the additive component. Must be NULL when p > 1L. |
b |
One-sided formula or NULL
|
Scalar path only. Basis for the multiplicative component a. Must be NULL when p > 1L. |
W |
Object of class W_basis or NULL
|
Basis for the modulating component p because it couples all dimensions of NULL disables the modulating component. Validated with assert_inherits. |
x_vars |
Character vector or NULL
|
Names of covariates entering the modulating component as the linear factor NULL, gdpar() later uses the covariates from the right-hand side of the model formula. Must be non-empty character when supplied. |
p |
Integer |
Dimension of the per-individual parameter vector 1L. Coerced to integer via as.integer after validation with assert_count. |
dims |
dims_spec object, plain list of length p, or NULL
|
Multivariate path only. Each entry is a list with components a (one-sided formula or NULL) and b (one-sided formula or NULL). Must be NULL when p == 1L. Bare formulas are rejected to prevent silent recycling. If a dims_spec object (from dimwise()), it is resolved via resolve_dims_spec(dims, p). |
Mathematics
The AMM level of the specification is inferred from the (non-)nullity of the components across all dimensions:
On the scalar path the check is direct on a, b, W; on the multivariate path it is computed over the resolved per-dimension list:
$$\texttt{any_a} = \bigvee_{k=1}^{p} \lnot \texttt{is.null}(\texttt{resolved}ka), \qquad \texttt{any_b} = \bigvee_{k=1}^{p} \lnot \texttt{is.null}(\texttt{resolved}kb).$$
The design-matrix centering conditions (C2) and (C3) are not enforced inside this constructor; they are enforced empirically when the design matrices
Returns
An S3 object of class c("amm_spec", "list") with the following components:
| Component | Type | Description |
|---|---|---|
a |
One-sided formula or NULL
|
Populated on the scalar path; NULL on the multivariate path. |
b |
One-sided formula or NULL
|
Populated on the scalar path; NULL on the multivariate path. |
W |
W_basis object or NULL
|
The modulating basis, passed through verbatim. |
x_vars |
Character vector or NULL
|
Passed through verbatim. |
level |
Integer (0L, 1L, or 2L) |
AMM hierarchy level as defined above. |
p |
Integer | Dimension of |
dims |
NULL or list of length p
|
NULL on the scalar path. On the multivariate path, a resolved list of length p where each element is list(a = ..., b = ...). |
Notes
-
Scalar path (
p == 1L):aandbare validated withassert_one_sided_formula(..., allow_null = TRUE). Ifdimsis non-NULL, an error of classgdpar_input_erroris raised. -
Multivariate path (
p > 1L): If eitheraorbis non-NULL, an error is raised directing the user todims. IfdimsisNULL, an error is raised. -
Bare formula guard: If
dimsinherits from"formula", a specific error is raised to prevent silent recycling across dimensions. -
Plain list validation for
dims: The list must have length exactlyp; each entry must itself be a list. Missinga/bnames default toNULL. Each formula entry is validated withassert_one_sided_formula. -
dims_specresolution: Whendimsis adims_specobject (produced bydimwise()and possibly composed withoverride()), it is resolved via the internalresolve_dims_spec(dims, p)function. -
Unknown class for
dims: Ifdimsis notNULL, not a formula, not adims_spec, and not a list, an error of classgdpar_input_erroris raised. -
No cross-component identifiability check: The constructor does not detect non-identifiability between
$a,\ b$ , and$W$ components; this is deferred togdpar_check_identifiability()which is called automatically before fitting. -
Linearity assumption (LIN): Holds automatically for formula-based bases (linear subspaces of
$L^2_0(\mu)$ ) and for polynomial/B-splineW_basistypes. -
Dependencies: Uses
assert_count,assert_one_sided_formula,assert_inherits(all internal assertion helpers),gdpar_abort(structured error signalling), andresolve_dims_spec.
Purpose Constructs the centered design matrices (.build_amm_design_multi(). This function prepares the covariate matrices for the static and modulating components of the AMM, centering them and standardising the modulating covariates.
Arguments
-
amm(amm_spec): An object of class"amm_spec"representing the AMM specification. -
data(data.frame): The data frame containing the variables referenced by the AMM specification. -
formula_rhs(formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factorxvariables whenamm$x_varsisNULL.
Mathematics The function implements the following operations:
-
Centering the static components:
For the static intercept formulaamm$a(and similarlyamm$b), the design matrix$Z_a^\text{full}$ is constructed. It is then column-centred by subtracting its column means:$Z_a = Z_a^\text{full} - \mathbf{1} \cdot \bar{Z}_a^\text{T}$ where$\bar{Z}_a$ is the vector of column means of$Z_a^\text{full}$ . The same is done for$Z_b$ . -
Standardising the modulating component:
When the modulating componentamm$Wis active, the covariate matrix$X^\text{full}$ is built fromx_vars. It is then centred and scaled to have zero mean and unit standard deviation (using the sample standard deviation):$X = \text{diag}\left(\frac{1}{s_X}\right) \left( X^\text{full} - \mathbf{1} \cdot \bar{X}^\text{T} \right)$ where$\bar{X}$ is the vector of column means and$s_X$ is the vector of column sample standard deviations.
Returns A list with components:
-
Z_a(matrix): Centred design matrix for the static componenta(rows = observations, columns = terms froma). IfaisNULL, a 0-column matrix of the correct row count. -
Z_b(matrix): Centred design matrix for the static componentb. -
X(matrix): Standardised (centred and scaled) design matrix for the modulating component. IfWis inactive, a 0-column matrix. -
Z_a_means(numeric): Column means of the raw (uncentred) design matrix fora. -
Z_b_means(numeric): Column means forb. -
X_means(numeric): Column means of the raw covariate matrix forx. -
X_sds(numeric): Column standard deviations of the centred covariate matrix forx. -
Z_a_names(character): Column names ofZ_a. -
Z_b_names(character): Column names ofZ_b. -
X_names(character): Column names ofX(thex_vars).
Notes
- If
amm$p > 1, the function immediately returns the result of.build_amm_design_multi(amm, data, formula_rhs). - The function performs strict input validation via
assert_inheritsandassert_data_frame. - Raises a
gdpar_input_errorif any covariate needed bya,b, orx_varscontains missing values (NA). The error message specifies that "Path 1 does not impute". - Raises a
gdpar_input_errorif the modulating componentWis active but nox_varscould be identified (fromamm$x_vars,formula_rhs, or the formula terms). - Raises a
gdpar_input_errorif any requiredx_varsare missing fromdata. - Raises a
gdpar_input_errorif anyx_varsare constant (zero standard deviation after centring). - The design matrices are constructed using
stats::model.matrixwith the formula updated to~ . + 0to suppress the intercept column. - The returned
Z_a,Z_b, andXare always plain matrices (not data frames or tibbles).
Purpose Internal workhorse for building per-coordinate centred design matrices (
Arguments
-
amm(amm_spec): An object of class"amm_spec"withamm$p > 1. The per-coordinate specifications are stored inamm$dims. -
data(data.frame): The data frame containing the variables referenced by the AMM specification. -
formula_rhs(formula or character): The right-hand side of the model formula or a character vector of covariate names. Used to identify the linear factorxvariables whenamm$x_varsisNULL.
Mathematics
The algorithm is an extension of the univariate case (
-
Collect needed variables:
Iterate over all coordinates$k$ and collect all variables referenced inamm$dims[[k]]$aandamm$dims[[k]]$b. The union of these, plusamm$x_vars, forms the set of variables that must be present and complete (noNAs) indata. -
Per-coordinate centred design matrices:
For each coordinate$k$ :- If the static intercept formula
a_kis notNULL, construct its design matrix$Z_{a,k}^\text{full}$ and centre it:$Z_{a,k} = Z_{a,k}^\text{full} - \mathbf{1} \cdot \bar{Z}_{a,k}^\text{T}$ - Similarly for
$Z_{b,k}$ fromb_k. - If
a_korb_kisNULL, the corresponding matrix is set to a 0-column matrix.
- If the static intercept formula
-
Shared modulating matrix
$X$ :
The construction is identical to the univariate case: identifyx_vars, build$X^\text{full}$ , centre, scale, and standardise to unit variance. The resulting$X$ is shared across all$p$ coordinates.
Returns A list with components:
-
p(integer): The number of coordinates. -
Z_a_list(list of matrices): Lengthplist; each element is the centred design matrix for coordinate$k$ 's static componenta. -
Z_b_list(list of matrices): Lengthplist; each element is the centred design matrix for coordinate$k$ 's static componentb. -
X(matrix): The shared standardised modulating matrix (0 columns ifWis inactive). -
Z_a_means_list(list of numeric vectors): Column means for each raw$Z_{a,k}^\text{full}$ . -
Z_b_means_list(list of numeric vectors): Column means for each raw$Z_{b,k}^\text{full}$ . -
X_means(numeric): Column means of the raw$X^\text{full}$ . -
X_sds(numeric): Column standard deviations of the centred$X$ . -
Z_a_names_list(list of character vectors): Column names for each$Z_{a,k}$ . -
Z_b_names_list(list of character vectors): Column names for each$Z_{b,k}$ . -
X_names(character): Column names ofX(thex_vars).
Notes
- This function is internal and not exported (hence the leading dot).
- It performs the same input validation and error checks as
build_amm_design: missing values in needed variables, identification ofx_varswhenWis active, missingx_varsindata, and constant covariates inx_vars. - The per-coordinate matrices in
Z_a_listandZ_b_listmay have different column counts (ragged arrays). Downstream assembly (e.g., for Stan) must handle this raggedness. - The function does not compute any group-level random effects or parameter transformations; it only constructs the design matrices from the data and specifications.
Purpose Constructs centered design matrices and metadata for a multi‑individual (K > 1) AMM specification. Processes a list of canonical amm_spec objects (one per individual) and a dataset to produce centered design matrices for additive (a) and multiplicative (b) components, plus a scaled matrix for the modulating component (W) if present.
Arguments
-
amm_list_canonical(list): A named list of length ≥ 2. Each element must be an object of classamm_specwithp = 1(the K > 1, p > 1 regime is unsupported). Slots must have non‑empty names. -
data(data.frame): Data frame containing the variables used in the formulas. Must have no missing values in the required covariates. -
formula_rhs(formulaorcharacter): Right‑hand side of the model formula. Used as a fallback to determine the set of covariates for the modulating component ifx_varsis not specified and the union of variables froma/bformulas is empty.
Mathematics
For each individual
-
Additive component:
If the formula$a_k$ is notNULL, build the design matrix$Z^{(a)}_{\text{full}, k}$ viamodel.matrix. Compute column means$\bar{z}^{(a)}_k$ and center:$Z^{(a)}_k = Z^{(a)}_{\text{full}, k} - \mathbf{1}_n \bar{z}^{(a)\top}_k.$ If$a_k$ isNULL, set$Z^{(a)}_k$ to an$n \times 0$ matrix. -
Multiplicative component:
Analogously, from$b_k$ , produce$Z^{(b)}_k$ after centering. -
Modulating component (if any
$W$ is non‑NULL):
Determine the variable set$\mathbf{x}$ (explicitx_varsor union of variables from all$a_k,\ b_k$ , or fromformula_rhs).
Form$X_{\text{full}}$ from the data, then center and scale:$\bar{x} = \frac{1}{n} \mathbf{1}_n^\top X_{\text{full}},\ X_c = X_{\text{full}} - \mathbf{1}_n \bar{x}^\top,\ s_x = \text{diag}\left(\sqrt{\frac{1}{n-1} X_c^\top X_c}\right),\ X = X_c \cdot \text{diag}(1/s_x).$ Each column of$X$ has zero mean and unit sample standard deviation.
Returns
A list with components:
-
K(integer): Number of individuals. -
slot_names(character vector): Names of the input list slots. -
Z_a_k_list(list of matrices): Centered design matrices for additive components (each$n \times p_a^{(k)}$ , where$p_a^{(k)}$ is the number of columns from$a_k;\ n \times 0$ if absent). -
Z_b_k_list(list of matrices): Centered design matrices for multiplicative components. -
X(matrix): Centered and scaled design matrix for the modulating component ($n \times q;\ n \times 0$ if no$W$ is active). -
Z_a_k_means_list(list of numeric vectors): Column means of the additive design matrices before centering. -
Z_b_k_means_list(list of numeric vectors): Column means of the multiplicative design matrices. -
X_means(numeric vector): Column means of the modulating component matrix (empty if$q = 0$ ). -
X_sds(numeric vector): Column standard deviations of$X_c$ (empty if$q = 0$ ). -
Z_a_k_names_list(list of character vectors): Column names of the additive design matrices. -
Z_b_k_names_list(list of character vectors): Column names of the multiplicative design matrices. -
X_names(character vector): Names of the variables in the modulating component.
Notes
- Raises errors via
gdpar_abortif:-
amm_list_canonicalis not a list of length ≥ 2 or has empty/missing names. - Any
amm_spechasp > 1(unsupported feature error). -
datais not a data frame. - Missing values exist in any required covariate (no imputation).
- The modulating component is active but no covariates are identified.
- Required variables for the modulating component are missing in
data. - Any variable in the modulating component is constant (zero standard deviation).
-
- Uses
assert_inheritsto verify each list element is anamm_spec. - Uses
stats::model.matrixandsweepfor matrix construction and centering.
Purpose S3 method that prints a concise summary of an amm_spec object to the console, displaying its key specifications: level, dimension p, formulas for additive/multiplicative components, modulating component, and x_vars if present.
Arguments
-
x(amm_spec): An object of classamm_specto print. -
...(any): Additional arguments (unused; present for S3 generic compatibility).
Returns Invisibly returns the input object x.
Notes
- Output format:
If
<amm_spec> AMM Level <level> p (dim theta_i) : <p> a (additive) : <formula or NULL> b (multiplicative) : <formula or NULL> W (modulating) : <W_basis(...) or NULL> x_vars : <comma-separated list, if any>p > 1, instead of separateaandblines, it prints a table of per‑dimension formulas fromx$dims. - Formulas are deparsed to character strings for display.
- The method is exported and can be called directly on
amm_specobjects or viaprint().
Purpose
Exported methodological-audit function. Empirically verifies the conclusion of the Bernstein–von Mises theorem (Theorem 4C of Block 4) for a fitted Path 1 Bayesian gdpar model by comparing Bayesian posterior credible intervals with Hessian-based asymptotic confidence intervals obtained from a Laplace approximation around the maximum likelihood estimator on a prior-stripped Stan model. The function is opt-in, computationally expensive, and does not modify the input fit.
Arguments
-
fit: object of S3 classgdpar_fit(produced bygdpar()withpath = "bayes"). Must contain$fit(acmdstanrfit object with adraws()method),$stan_data(a list with possible fielduse_groups), and$prior(passed togenerate_stan_code). -
parameters: optional character vector of parameter names to include in the comparison. WhenNULL(default), defaults to the user-facing parameters that the prior-stripped likelihood identifies, obtained by filteringposterior::variables(draws)against an exclusion regex. -
level: numeric scalar in$[0,1]$ giving the nominal credible/confidence level. Defaults to0.95. -
verbose: logical scalar; whenTRUE, prints an estimated-cost / opt-in message before starting. Defaults toTRUE.
Mathematics
Let
For a nominal level
For each parameter, the Bayesian credible interval is
A width ratio is flagged as suspicious when
The MLE is obtained on the constrained (natural) scale because Stan's optimizer is invoked with jacobian = FALSE; the same convention is used for cmdstanr::laplace, so the Hessian-based covariance corresponds to the inverse observed information on the natural scale rather than the unconstrained scale.
Returns
A list of S3 class c("gdpar_bvm_report", "list") with components:
-
table: a data frame with one row per selected parameter and columns-
variable: parameter name, -
bayes_mean,bayes_lower,bayes_upper,bayes_width: posterior mean, lower/upper quantiles, and width$b_U - b_L$ , -
asymp_mean,asymp_lower,asymp_upper,asymp_width: Laplace-approximation mean, lower/upper quantiles, and width$a_U - a_L$ (allNA_real_if the Laplace step failed), -
width_ratio:$r = \texttt{bayes\_width} / \texttt{asymp\_width}$ (NA_real_if the Laplace step failed).
-
-
discrepancy: numeric vector of lengthlength(parameters)with$d = |\log r|$ (NA_real_if the Laplace step failed). -
level: the inputlevel. -
warnings: character vector of warning messages. Set to"Laplace approximation failed; asymptotic comparison unavailable."when the Laplace step fails; otherwise appended with one entry of the form"Width ratio outside [0.5, 2] for: <vars>."whenever any ratio falls outside$[0.5, 2]$ .
Notes
- Input validation:
-
assert_inherits(fit, "gdpar_fit", "fit")is called first. -
assert_numeric_scalar(level, "level", lower = 0, upper = 1)enforces$0 \le \texttt{level} \le 1$ . -
verboseis checked manually: if!is.logical(verbose) || length(verbose) != 1L, the function aborts withgdpar_abort(..., class = "gdpar_input_error").
-
- Hierarchical fits are rejected: if
fit$stan_data$use_groupsis non-null and equals1L(afteras.integer), the function aborts with classgdpar_unsupported_feature_errorand a message explaining that Theorem 4C does not cover per-group random anchors. - Suggested-package dependencies
cmdstanr(for optimization and Laplace) andposterior(for draw extraction/summarisation) are required viarequire_suggested; missing packages raise an error. - The opt-in message is emitted through
gdpar_informwith class"gdpar_optin_message"whenverbose = TRUE. - Candidate parameters are obtained from
posterior::variables(draws)after excluding names matching the regexThis excludes latent noise, log-likelihood, predictions, per-observation anchors, random-effect coefficients and raws, centering constants, prior hyperparameters, and hierarchical scales (^(eta|log_lik|y_pred|theta_i|a_coef|b_coef|a_raw|b_raw|W_raw|c_b|c_b_raw|mu_theta_ref|sigma_theta_ref|sigma_a|sigma_b|sigma_W)sigma_a,sigma_b,sigma_W), leaving user-facing identified parameters such astheta_ref,sigma_y, andphi. - When
parametersis supplied by the user, any requested name not incandidate_varstriggersgdpar_abortwith classgdpar_input_errorand a message listing the missing names wrapped insQuote. - Stan code is regenerated from
fit$priorviagenerate_stan_code(fit$prior, mle = TRUE)(prior block stripped using the// BEGIN PRIORS/// END PRIORSmarkers) and written to a temporary file viawrite_stan_to_tempfile. - The MLE is found by
cs_model$optimize(data = stan_data, algorithm = "lbfgs", refresh = 0, history_size = 5, init_alpha = 0.001, iter = 2000, jacobian = FALSE). - The Laplace step is wrapped in
tryCatch. On error,gdpar_warnis emitted with class"gdpar_diagnostic_warning"carrying the message"Laplace approximation failed: <conditionMessage>. Falling back to MLE-only summary.", and the asymptotic columns oftableplusdiscrepancyare filled withNA_real_. - Bayesian and Laplace summaries are both produced with
posterior::summarise_drawsusing custommean,sd,q_lower,q_upperfunctions;q_lowerandq_upperusestats::quantilewithnames = FALSEat probabilitiesalpha/2and1 - alpha/2respectively. - The function does not modify
fit; the returned report is purely informational. - Side effects: writes a Stan file to a temporary location, invokes
cmdstanr::cmdstan_model,optimize, and (optionally)laplace; may emit one opt-in message, one diagnostic warning, and one or more width-ratio warnings.
Purpose
Exported S3 print method for objects of class gdpar_bvm_report produced by gdpar_bvm_check. Renders a human-readable summary of the calibration comparison.
Arguments
-
x: an object of S3 classgdpar_bvm_report. -
...: unused; present for S3 generic compatibility.
Returns
Invisibly returns x.
Notes
- Prints a header line of the form
<gdpar_bvm_report> level = <level>followed by a newline. - Prints
x$tableviaprintwithrow.names = FALSE. - If
length(x$warnings) > 0L, prints a blank line, the headingWarnings:, and each warning prefixed by-on its own line. - Dispatched through the S3 generic
printbased on the first class element"gdpar_bvm_report". - No validation of
xis performed; the function assumes the object was constructed bygdpar_bvm_check.
Purpose
S3 print method for objects of class gdpar_causal_bridge. Produces a concise, human-readable summary of the bridge's structural metadata and first few CATE estimates.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_causal_bridge |
The bridge object to print. |
... |
(unused) | Present for S3 generic compatibility; ignored. |
Returns
Invisibly returns x.
Notes
- The method extracts the treatment-model family (
x$fits$treat$family) and prints both itsnameandlink. - It uses
%||%(null-coalescing) to defaultKandpto1Lwhen the corresponding slot is absent inx$fits$treat. -
Kis printed only whenK > 1; otherwisepis printed only whenp > 1. If both are 1, neither label is emitted. - The anchor vector is formatted with
digits = 4and displayed as a comma-separated bracketed list. -
x$meta$newdata_sourcedefaults to"<unknown>"whenNULL. - The head of
cate_meanis displayed via the internal helper.bridge_format_head. Ifx$warningsis a non-empty character vector, each warning is printed on its own indented line under a "Warnings:" heading. - Side effect: writes to the console via
cat.
Purpose
Internal helper that formats the first few elements (or rows) of the cate_mean vector/matrix for display in print.gdpar_causal_bridge.
Arguments
| Argument | Type | Meaning |
|---|---|---|
cate_mean |
numeric vector or numeric matrix | The posterior mean CATE values. A vector for scalar bridges; a matrix for multivariate/K-individual bridges. |
n_show |
integer, default 6L
|
Maximum number of elements or rows to display. |
Returns
A single character string suitable for cat.
Notes
-
Matrix path (
is.matrix(cate_mean)isTRUE):nrow(cate_mean)observations are assumed;n_showis clamped tomin(n_show, n). Each rowiis formatted as[v1, v2, ...]withdigits = 3viaformat. Rows are concatenated with"; "separators. -
Vector path:
length(cate_mean)observations are assumed;n_showis clamped similarly. The firstn_showelements are formatted withdigits = 3and joined with", ". - Marked
@keywords internaland@noRd; not exported.
Purpose
S3 summary method for gdpar_causal_bridge objects. Constructs a structured summary containing a per-observation table of posterior CATE (mean and credible interval), the marginal Average Treatment Effect (ATE) and its credible interval, and ancillary metadata.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_causal_bridge |
The bridge to summarise. |
... |
(unused) | Present for S3 generic compatibility; ignored. |
Returns
An object of class c("summary.gdpar_causal_bridge", "list") with components:
| Component | Structure | Description |
|---|---|---|
table |
data.frame |
Per-observation (and per-slot) CATE table (see below). |
ate |
named numeric vector | Marginal ATE. Scalar for K=1,p=1; length-K vector otherwise. |
ate_ci |
numeric matrix (K × 2) |
Credible bounds for the marginal ATE. Columns lower, upper. Row names are slot names (or "ate" for scalar case). |
level |
numeric scalar | Credible level used. |
type |
character scalar | Type of bridge (e.g. "response", "link"). |
n_draws |
integer scalar | Number of posterior draws. |
n_obs |
integer scalar | Number of evaluation observations. |
Mathematics
Let
Scalar bridge (cate_draws is an
The ATE credible interval is
where stats::quantile).
Multivariate / K-individual bridge (cate_draws is an
For each slot
Notes
- Calls
assert_inherits(object, "gdpar_causal_bridge", "object")to validate input. - The scalar branch is taken when
is.matrix(cate_draws)isTRUE; the table then has columnsobservation,cate_mean,cate_lower,cate_upper. Thecate_meanandcate_cicomponents of the bridge object (object$cate_mean,object$cate_ci) are assumed pre-computed. - The array branch is taken otherwise. Slot names are taken from
object$meta$dim_names; ifNULL, they default to"dim_1","dim_2", …,"dim_K". The table gains aslotcolumn and hasK × nrows total. - The per-observation ATE for the scalar case is
mean(cate_draws)(i.e. the grand mean over both draws and observations), and the ATE CI usesrowMeans(cate_draws)(a length-$S$ vector of per-draw cross-observation means). - For the array case,
ate_vec[k]is computed asmean(cate_draws[, , k])(grand mean over all draws and observations for slot$k$ ). The CI for slot$k$ usesapply(cate_draws[, , k, drop = FALSE], 1L, mean)to produce a length-$S$ vector of per-draw means, then takes quantiles of that. - The
ate_matmatrix in the scalar path has a single row named"ate". - Class is set to
c("summary.gdpar_causal_bridge", "list").
Purpose
S3 print method for summary.gdpar_causal_bridge objects. Displays bridge metadata, the marginal ATE table, and the first 10 rows of the per-observation CATE table.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
summary.gdpar_causal_bridge |
The summary object to print. |
... |
(unused) | Present for S3 generic compatibility; ignored. |
Returns
Invisibly returns x.
Notes
- Builds an
ate_dfdata frame with columnsslot,mean,lower,upperfromx$ate(coerced to unnamed vector) andx$ate_ci. This data frame is printed withrow.names = FALSE. - The per-observation CATE table is truncated to the first 10 rows via
utils::head(x$table, 10L). - Side effect: writes to the console via
catandprint.
predict.gdpar_causal_bridge(object, newdata, level = NULL, summary = c("all", "draws", "mean_ci"), ...)
Purpose
S3 predict method for gdpar_causal_bridge objects. Recomputes the per-observation Conditional Average Treatment Effect (CATE) on a new evaluation grid by leveraging the treatment and control model fits already stored in the bridge. Structural compatibility of the two fits is not re-checked.
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_causal_bridge |
The bridge whose fits are used for prediction. |
newdata |
data.frame |
New covariate grid on which to evaluate the CATE. Required. |
level |
numeric scalar in NULL
|
Credible level for the new credible intervals. If NULL (default), uses object$level. |
summary |
character scalar, one of "all", "draws", "mean_ci"
|
Controls the structure of the returned value (see Returns). |
... |
(unused) | Present for S3 generic compatibility; ignored. |
Mathematics
Let
The summary statistics are
where
Returns
summary value |
Return structure |
|---|---|
"all" |
A list with components cate_draws (array/matrix of posterior CATE draws), cate_mean (numeric vector/matrix of posterior means), cate_ci (array/matrix of credible bounds), n_draws (integer scalar), n_obs (integer scalar). |
"draws" |
The raw cate_draws object (matrix or array), returned early. |
"mean_ci" |
A list with components cate_mean and cate_ci. |
Notes
- Validates
objectviaassert_inheritsandnewdataviaassert_data_frame. -
levelis validated byassert_numeric_scalarwhen non-NULL, with bounds(1e-3, 1 − 1e-3)exclusive. -
summaryis matched viamatch.arg; partial matching is allowed by R convention. - The
typeslot is extracted fromobject$typeand forwarded to bothstats::predictcalls withsummary = "draws". This means both the treatment and control predictions return raw posterior draws. - Draw alignment is performed by the internal helper
.align_bridge_draws(pred_t, pred_c), which returns a list with aligned$treatand$ctrlarrays and the common draw count$S. - CATE is computed as the element-wise difference
aligned$treat - aligned$ctrl. - Summarisation (mean and CI) is delegated to the internal helper
.summarize_cate(cate_draws, ql, qu). - The return value for
summary = "all"mirrors thecate_*slot structure of a freshly constructedgdpar_causal_bridge, enabling downstream methods (e.g.summary,print) to operate on the result if it is wrapped appropriately.
gdpar_causal_bridge(fit_treat, fit_ctrl, newdata = NULL, type = c("response", "theta_i", "linear_predictor"), level = 0.95, ...)
Purpose
Exported constructor for the T-learner causal bridge between two independent gdpar_fit objects. It estimates the conditional average treatment effect (CATE) as the per-observation, per-draw difference of the posterior predictive distributions from the treatment-arm fit and the control-arm fit, evaluated on a common evaluation set. The function does not modify either input fit and performs no causal adjustment beyond what is encoded in the two AMM specifications; the no-unmeasured-confounding assumption within each arm is the user's responsibility.
Arguments
-
fit_treat— (gdpar_fit) A fit object produced bygdpar()for the treatment arm. Must be a Path 1 (path = "bayes") Bayesian fit. -
fit_ctrl— (gdpar_fit) A fit object produced bygdpar()for the control arm. Must share the family, anchor, AMM level, and covariate structure offit_treat. -
newdata— (data.frameorNULL) Optional evaluation data frame. WhenNULL(default), the function attempts to recover each arm's training data by evaluating the captureddataargument of each fit's call in the caller's environment; if both recoveries succeed and the two data frames share column structure, theirrbindis used. Otherwise the function aborts and requests an explicitnewdata. -
type— (character) Scalar selecting the prediction scale. Matched viamatch.argagainstc("response", "theta_i", "linear_predictor")."response"applies the inverse link per draw;"theta_i"and"linear_predictor"are synonyms selecting the linear predictor of the individual parameter. -
level— (numeric) Scalar in$(0, 1)$ giving the nominal credible level for per-observation CATE intervals. Defaults to0.95. Validated to lie in$[10^{-3},\, 1 - 10^{-3}]$ . -
...— Reserved for future arguments; currently unused.
Mathematics
For each posterior draw
where type at
Because the two fits are independent (sampled from disjoint data subsets), the joint posterior factorizes:
so any pairing of marginal draws is a valid sample from the joint. When the two fits differ in number of draws, the function trims to
and emits a gdpar_diagnostic_warning.
For multivariate (predict.gdpar_fit returns a 3-array of shape cate_draws. For type = "response", the canonical inverse link of each coordinate or slot is applied by predict.gdpar_fit before differencing.
Returns
An object of class c("gdpar_causal_bridge", "list") with components:
-
cate_draws—matrix$[S, n]$ (scalar fits) orarray$[S, n, \text{dim}]$ (multivariate / K-individual fits): per-draw CATE values. -
cate_mean— posterior mean of the CATE per observation (and per dimension when applicable). -
cate_ci— credible interval bounds at the requestedlevel. -
newdata— the resolved evaluation data frame. -
id_check—list(treat = fit_treat$identifiability_report, ctrl = fit_ctrl$identifiability_report). -
fits—list(treat = fit_treat, ctrl = fit_ctrl). -
type— the matchedtypestring. -
level— the numeric credible level. -
n_draws—S, the (possibly trimmed) number of aligned draws. -
n_obs—nrow(newdata_resolved). -
call— the matched call (match.call()). -
warnings—charactervector of fallback notifications (e.g., posterior-draw trimming); empty on the happy path. -
meta—list(dim_kind, dim_size, dim_names, newdata_source)wherenewdata_sourceis the"bridge_source"attribute of the resolvednewdata.
Notes
- Calls six internal validators in sequence:
.check_bridge_path,.check_bridge_hierarchical,.check_bridge_family,.check_bridge_dim,.check_bridge_amm,.check_bridge_anchor. Only the first three are defined in this section; the remaining three and.resolve_bridge_newdata,.align_bridge_draws,.summarize_cateare defined in subsequent sections. - Aborts with condition class
gdpar_unsupported_feature_error(viagdpar_abort) when structural compatibility checks fail. - The
assert_inheritscalls validate that bothfit_treatandfit_ctrlinherit from"gdpar_fit"before any other logic executes. -
assert_numeric_scalarenforceslevelwithin$[10^{-3},\; 1 - 10^{-3}]$ . - The
newdataresolution usesparent.frame()as the evaluation environment for recovering captureddataarguments. - Predictions are obtained via
stats::predict(fit, newdata = ..., type = type, summary = "draws"), dispatching topredict.gdpar_fit. - The
warningsfield is set tocharacter(0L)whenaligned$warningisNA, otherwise to the warning string. - (C7) anti-aliasing of Block 6.5 is not invoked because the hierarchical guard rules out the regime in which it applies.
- Companion S3 methods
print.gdpar_causal_bridgeandsummary.gdpar_causal_bridgeare documented elsewhere.
Purpose
Internal validator asserting that both fits were produced under Path 1 (path = "bayes"). T-learner support for Paths 2/3 is not implemented.
Arguments
-
fit_treat— (gdpar_fit) Treatment-arm fit. -
fit_ctrl— (gdpar_fit) Control-arm fit.
Mathematics
Each fit's path slot is read with a fallback default of "bayes":
The check passes if and only if identical).
Returns
invisible(NULL) on success.
Notes
- On failure, calls
gdpar_abortwithclass = "gdpar_unsupported_feature_error"and adatalist containingpath_treatandpath_ctrl. - The
%||%operator provides the default"bayes"whenfit$pathisNULL, meaning a fit lacking an explicitpathslot is treated as Path 1.
Purpose
Internal validator asserting that neither fit was sampled in the hierarchical (grouped) regime. The T-learner difference of per-group anchors is not defined in the canonical formulation and is queued for a future sub-phase.
Arguments
-
fit_treat— (gdpar_fit) Treatment-arm fit. -
fit_ctrl— (gdpar_fit) Control-arm fit.
Mathematics
A fit is classified as grouped when
The check passes if and only if
Returns
invisible(NULL) on success.
Notes
- Defines a local closure
is_grouped(fit)that testsfit$stan_data$use_groupsfor non-NULL and integer equality to1L(viaas.integer(...)). - On failure, calls
gdpar_abortwithclass = "gdpar_unsupported_feature_error"and adatalist containingtreat_groupedandctrl_grouped(logical scalars). - Uses the same condition class as
gdpar_bvm_checkso user code handling unsupported-feature errors covers both helpers. - The error message advises refitting each arm without the
groupargument.
Purpose
Internal validator asserting that the two fits share compatible family identifiers—both the top-level family name and link, and, when param_specs are present (multivariate / K-individual families), the per-slot family identifiers.
Arguments
-
fit_treat— (gdpar_fit) Treatment-arm fit. -
fit_ctrl— (gdpar_fit) Control-arm fit.
Mathematics
Let
When param_specs is non-NULL on either fit, let %||%). The slot-count check requires
$$ \text{names}{\text{arm}}[k] = \text{as.character}\bigl(\text{ps}{\text{arm}}k\text{$family_id} ;%|%; \text{ps}{\text{arm}}k\text{$name} ;%|%; \text{NA_character_}\bigr), \quad k = 1, \dots, n{\text{arm}}, $$
and the slot-identifier check requires identical).
Returns
invisible(NULL) on success.
Notes
- Performs three sequential aborts, all with
class = "gdpar_unsupported_feature_error":- Top-level family name or link mismatch —
datacontainsfamily_treat,family_ctrl,link_treat,link_ctrl. - Slot count mismatch —
datacontainsn_slots_treat,n_slots_ctrl. - Per-slot family identifier mismatch —
datacontainsslot_families_treat,slot_families_ctrl(character vectors).
- Top-level family name or link mismatch —
- The
param_specsbranch is entered when!is.null(ps_t) || !is.null(ps_c), meaning if either fit hasparam_specs, both are expected to have them and be compared. - Per-slot identifier extraction uses the chain
s$family_id %||% s$name %||% NA_character_, so a slot lacking bothfamily_idandnameyieldsNA_character_. -
vapplywithcharacter(1L)is used for type-safe extraction of per-slot identifiers.
Purpose
Guard function for gdpar_causal_bridge that verifies the two fitted model objects share identical structural dimensions
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit_treat |
list (fitted model object) | The treatment-arm fit; must contain (or default) elements K and p. |
fit_ctrl |
list (fitted model object) | The control-arm fit; must contain (or default) elements K and p. |
Mathematics
The check enforces the equality constraints
Each dimension is recovered with a null-coalescing fallback to 1L:
and analogously for
Returns
invisible(NULL) when the dimensions match. Otherwise it never returns: it calls gdpar_abort with class "gdpar_unsupported_feature_error" and a data payload listing K_treat, K_ctrl, p_treat, p_ctrl.
Notes
- Uses the
%||%operator, so a missingKorpslot silently defaults to1Lrather than erroring. - The error condition is tagged
"gdpar_unsupported_feature_error", signalling that mismatched dimensions are a feature limitation rather than a user input typo.
Purpose
Asserts that the two fits are compatible at the AMM (Additive Modulating Model) specification level. Three facets are compared: (1) the AMM level (structural composition of the spec slots a/b/W), (2) the modulating basis type (polynomial vs. B-spline), and (3) the covariate column structure of the AMM design. This guarantees that the predict path on newdata reuses the same algorithm on both arms.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit_treat |
list (fitted model object) | Treatment-arm fit; expected to carry an amm element or a fallback amm_list_canonical. |
fit_ctrl |
list (fitted model object) | Control-arm fit; same expectation. |
Mathematics
Let
The AMM spec is resolved via null-coalescing:
Returns
invisible(NULL) on success. On any mismatch, calls gdpar_abort:
- Missing AMM on either fit → class
"gdpar_internal_error". - Level mismatch → class
"gdpar_unsupported_feature_error",data = list(level_treat, level_ctrl). - W-type mismatch → class
"gdpar_unsupported_feature_error",data = list(W_type_treat, W_type_ctrl). The formatted message substitutes"<none>"for aNULLtype. - Covariate mismatch → class
"gdpar_unsupported_feature_error". The message lists the symmetric set difference of covariate component names:
Notes
- Dispatches to three helper functions:
.bridge_amm_level,.bridge_amm_W_type,.bridge_amm_covariates. - The covariate comparison uses
identical(), so ordering of list elements matters for the check to pass. - A missing AMM on either fit is treated as an internal error (not a user error), reflecting the expectation that fits should always carry an AMM spec.
Purpose
Asserts that the reference anchors stored in the two fits are numerically identical (within tolerance). The anchor enters the modulating term as
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit_treat |
list (fitted model object) | Treatment-arm fit; must contain an anchor element (numeric vector). |
fit_ctrl |
list (fitted model object) | Control-arm fit; must contain an anchor element (numeric vector). |
Mathematics
Given anchor vectors
Then it computes element-wise absolute differences and a per-element scale:
and requires
This combines relative tolerance (when
Returns
invisible(NULL) when anchors match. Otherwise calls gdpar_abort:
- Length mismatch → class
"gdpar_unsupported_feature_error",data = list(anchor_treat_length, anchor_ctrl_length). - Value mismatch → class
"gdpar_unsupported_feature_error",data = list(anchor_treat, anchor_ctrl). The message formats anchor values withdigits = 6and instructs the user to refit one arm anchored to the other's value.
Notes
- The tolerance is fixed at
1e-8and is not configurable. -
pmaxis used (element-wise parallel max), so the scale vector has the same length as the anchors. - If either
anchorslot isNULL, the subtractiona_t - a_cwill error in base R before the tolerance check; this is not explicitly guarded.
Purpose
Infers the AMM level from an amm_spec object or a list of amm_spec objects. The level encodes the structural composition of the spec (presence/absence of a, b, W components).
Arguments
| Argument | Type | Meaning |
|---|---|---|
amm |
amm_spec or list of amm_spec
|
The AMM specification (single or per-slot when |
Returns
- If
amminherits from"amm_spec":as.integer(amm$level)(scalar integer). - If
ammis a list: an integer vector of the same length, where each element isas.integer(a$level)if the element inherits from"amm_spec", otherwiseNA_integer_. - Otherwise:
NA_integer_.
Notes
- No error is raised for non-
amm_specinputs; the function returnsNA_integer_silently. - The return type is always integer (via
as.integerandNA_integer_), making the output suitable foridentical()comparison in.check_bridge_amm.
Purpose
Extracts the modulating basis type (e.g., "polynomial" or "bspline") from the W sub-component of an AMM spec. This ensures the predict path on new data reuses the same basis expansion algorithm on both arms.
Arguments
| Argument | Type | Meaning |
|---|---|---|
amm |
amm_spec or list of amm_spec
|
The AMM specification. |
Returns
- If
amminherits from"amm_spec":-
NULLwhenamm$WisNULL(no modulating term). -
as.character(amm$W$type)otherwise.
-
- If
ammis a list:- A character vector of length
length(amm), where each element isas.character(a$W$type)if the element is anamm_specwith a non-nullW, otherwiseNA_character_. - If all elements are
NA, returnsNULL. - Otherwise returns the full character vector (including
NAentries).
- A character vector of length
- Otherwise:
NULL.
Notes
- The "all NA → NULL" collapse means a list of specs with no
Won any slot is indistinguishable from a single spec with noW. - When some slots have
Wand others do not, the returned vector containsNA_character_entries, which will cause.check_bridge_ammto fail theidentical()test if the two arms differ in which slots lackW.
Purpose
Extracts the covariate names for each AMM component (a, b, and x_vars) from an AMM spec. This is used to verify that both arms share the same covariate column structure in their AMM designs.
Arguments
| Argument | Type | Meaning |
|---|---|---|
amm |
amm_spec or list of amm_spec
|
The AMM specification. |
Mathematics
For a single amm_spec
where all.vars() parses an R formula/expression and returns the variable names referenced in it. Each component defaults to character(0L) when the corresponding slot is NULL.
Returns
- If
amminherits from"amm_spec": a named list with elementsa_vars,b_vars,x_vars(each a character vector). - If
ammis a list: a list of such named lists (one per element ofamm). - Otherwise: an empty
list().
Notes
-
all.vars()is applied toa$aanda$b, which are expected to be formula-like objects (formulas, calls, or expressions). If they are character strings,all.vars()will returncharacter(0). - The
x_varsslot is taken as-is (not parsed), so it must already be a character vector. - The returned structure is compared with
identical()in.check_bridge_amm, so the order ofx_varsand the order of list elements matter.
Purpose
Resolves the evaluation grid (newdata) on which the CATE will be computed. When the user supplies newdata, it is used directly. When newdata is NULL, the function attempts to recover both arms' training data by evaluating the captured data argument from each fit's call, then rbinds them into a single data frame.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit_treat |
list (fitted model object) | Treatment-arm fit; its $call$data is evaluated if newdata is NULL. |
fit_ctrl |
list (fitted model object) | Control-arm fit; same recovery logic. |
newdata |
data.frame or NULL
|
User-supplied evaluation grid. If non-NULL, it is validated and returned. |
eval_env |
environment | The environment in which to evaluate fit$call$data when recovering training data. |
Returns
A data.frame with an attribute "bridge_source":
-
"user"— whennewdatawas supplied by the user (passed through afterassert_data_framevalidation). -
"training_rbind"— whennewdatawasNULLand both arms' training data were successfully recovered andrbind-ed.
The rbind uses the column order of the treatment arm (data_t) for both arms, ensuring consistent column alignment.
Notes
-
Recovery logic: The inner function
recover(fit, arm_label)extractsfit$call$dataand evaluates it ineval_envinside atryCatch. Any evaluation error silently yieldsNULL. Thearm_labelargument is accepted but never used in the body. -
Column matching: Column names are compared after sorting (
sort(colnames(...))), so column order may differ between arms but the set of columns must be identical. However, therbinditself usesdata_t's column order for both, so ifdata_chas columns in a different order, they are subsetted to matchdata_t's order before binding. -
Error conditions (all via
gdpar_abort):- Recovery failure (either arm's data is
NULLor not a data frame) → class"gdpar_input_error",data = list(treat_recovered, ctrl_recovered). - Column structure mismatch → class
"gdpar_input_error",data = list(colnames_treat, colnames_ctrl).
- Recovery failure (either arm's data is
-
assert_data_frame(newdata, "newdata")is called on user-suppliednewdata; this is presumably a checkmate-style assertion (not defined in this section). - The
rbindis performed withdrop = FALSEsubsetting, preserving data frame structure even for single-column data.
Purpose
Aligns the posterior draw arrays of the two arms by trimming the longer set to match the shorter. This is necessary because the two fits may have been run with different numbers of posterior draws (
Arguments
| Argument | Type | Meaning |
|---|---|---|
pred_t |
matrix [S_t, n] or array [S_t, n, dim]
|
Posterior draws from the treatment arm. |
pred_c |
matrix [S_c, n] or array [S_c, n, dim]
|
Posterior draws from the control arm. |
Mathematics
The aligned draw count is:
When
and for an array of dimension
Returns
A named list with four components:
| Component | Type | Description |
|---|---|---|
treat |
matrix or array | Trimmed treatment draws, first axis of length |
ctrl |
matrix or array | Trimmed control draws, first axis of length |
S |
integer | The aligned draw count |
warning |
character scalar | The trimming notification message when NA_character_ otherwise. |
Notes
-
Draw count inference:
$S$ is inferred fromnrow()(for matrices) ordim()[1L](for arrays). Ifpred_tis a matrix andpred_cis an array (or vice versa), the function does not explicitly check for type consistency; it will still compute$S_t$ and$S_c$ but the trimming logic branches onis.matrix()per input. -
Warning emission: When trimming occurs,
gdpar_warnis called with class"gdpar_diagnostic_warning"anddata = list(S_treat, S_ctrl, S). The same message string is stored in thewarningfield for persistence. -
Trimming implementation: The inner function
trim_first_axis(arr, S)handles both matrices (simple row indexing withdrop = FALSE) and arrays (constructs an index list withseq_len(S)for the first axis andquote(expr =)(i.e., empty subscript) for all remaining axes, then callsdo.call([, ...)withdrop = FALSE). The use ofquote(expr =)produces a missing/empty argument in the subscript list, which selects all elements along that dimension. -
Persistence: The
warningfield is designed to be persisted by the constructor into the$warningsslot of the resultinggdpar_causal_bridgeobject, so the print method can surface the fallback notification (referenced as "D48 canonical norm; D50 of Sesion 18 Etapa 2 of Sesion 8.4" in the source comments). - No error is raised if the two inputs have different shapes beyond the first axis; the function only aligns the first (draw) axis.
Purpose
Internal helper that converts raw posterior draws of a Conditional Average Treatment Effect (CATE) quantity into a compact summary consisting of the posterior mean and a lower/upper credible-interval bound per unit (and, in the multi-component case, per component slot). It is the canonical summarizer used downstream by the causal-bridge machinery to produce user-facing CATE summaries; it also tags the output with metadata (dim_kind, dim_size, dim_names) describing the structure of the summarized quantity so that downstream printers/extractors can dispatch correctly.
Arguments
-
cate_draws:numericposterior draws of the CATE. Two shapes are supported:- a
matrixwith rows indexing posterior draws ($S$ ) and columns indexing units ($n$ ), i.e.dim = c(S, n); - a 3-dimensional
arraywithdim = c(S, n, K), where the first dimension indexes draws, the second indexes units, and the third indexes component "slots" (e.g. multiple treatment arms, multiple outcome dimensions, or per-dimension effects).
- a
-
ql:numericscalar in$[0,1]$ . Lower-tail quantile probability used for the credible interval. -
qu:numericscalar in$[0,1]$ . Upper-tail quantile probability used for the credible interval.
Mathematics
For the matrix case (one scalar CATE per unit), let
where stats::quantile with names = FALSE).
For the 3-D array case, let
The dim_kind tag is assigned by the heuristic
Returns
A list with the following components, whose shapes depend on the input:
-
Matrix input:
-
mean:numericvector of length$n$ (column means ofcate_draws). -
ci:numeric$n \times 2$ matrixwith columns named"lower"and"upper"; column 1 holds theql-quantiles, column 2 thequ-quantiles. -
dim_kind:characterscalar, hard-coded to"scalar". -
dim_size:integerscalar, hard-coded to1L. -
dim_names:NULL.
-
-
3-D array input:
-
mean:numeric$n \times K$ matrixwithdimnamesset tolist(NULL, slot_names)(i.e. unit dimension unnamed, slot dimension named after the third-dimension names ofcate_draws). -
ci:numeric$n \times K \times 2$ arraywithdimnames = list(NULL, slot_names, c("lower", "upper")); slice[, , 1]holds theql-quantiles, slice[, , 2]thequ-quantiles. -
dim_kind:characterscalar, either"multi"or"K_individual"per the heuristic above. -
dim_size:integerscalar equal to$K$ (dim(cate_draws)[3L]). -
dim_names: the third-dimension names ofcate_draws(dimnames(cate_draws)[[3L]]), which may beNULL.
-
Notes
- The function is internal (leading dot) and is not exported.
- Shape dispatch is performed by testing
is.matrix(cate_draws)first, thenlength(dim(cate_draws)) == 3L. Any other shape (e.g. a vector, a 2-D non-matrix array, or a 4+-D array) triggers an error viagdpar_abortwithclass = "gdpar_internal_error"and adatafield carryingdim(cate_draws). The error message is the literal string"Internal error: unsupported shape for cate_draws.". - Quantiles are computed with
stats::quantileandnames = FALSE; the default quantile type (type 7) ofstats::quantileis therefore in effect. - In the matrix branch, the two quantile rows returned by
applyare re-assembled into an$n \times 2$ matrix explicitly (rather than transposed), so the orientation ofciis guaranteed regardless of howapplylays out its result. - In the 3-D branch,
applyis called separately for the lower and upper quantiles (with scalarprobs), producing two$n \times K$ matrices that are then written into the two slices of the output array. This avoids any ambiguity about the shape ofapply's output whenprobshas length 1. -
dim_kindis purely a metadata tag derived from the slot names; it does not affect the numerical content ofmeanorci. The"multi"tag is intended to flag either anonymous component dimensions (slot_namesisNULL) or component dimensions whose names follow the package-internaldim_naming convention, while"K_individual"flags named per-individual components. - The function has no side effects and performs no S3 dispatch of its own; it is a plain closure.
gdpar_check_identifiability(amm, data, theta_ref_init = NULL, formula_rhs = NULL, family = NULL, tol = 1e-8, rigor = c("full", "fast"))
Purpose
Diagnose whether the chosen finite parametric representation of the AMM canonical form satisfies the basis‑restricted Functional Independence Condition at a candidate population reference point. This pre‑fit check verifies parameter identifiability by examining the empirical Gram matrix of the extended design matrix.
Arguments
-
amm: Object of classamm_spec(created byamm_spec) defining bases for additive, multiplicative, and modulating components. -
data: Data frame containing variables referenced inamm; covariates are centered internally before computing the Gram matrix. -
theta_ref_init: Numeric vector of length p (dimension of the population reference) at which the diagnostic is evaluated. Defaults to a zero vector (if multiplicative componentbis absent) or a vector of ones (ifbis present). -
formula_rhs: Optional formula or character vector specifying covariates for the modulating linear factorx; defaults toamm$x_vars. -
tol: Numeric scalar ∈ (0,1) setting the relative condition‑number tolerance. Failure is flagged when λ_min <tol× λ_max. Default1e-8. -
family: Optionalgdpar_familyorgdpar_family_multiobject; if supplied, triggers parameter‑level identifiability (D‑ID) pre‑fit checks. -
rigor: Character scalar,"full"(default) or"fast", controlling the C4‑bis cross‑component check for p > 1 specs. Ignored when p = 1.
Mathematics
The function builds an extended design matrix
Its eigenvalues
equivalent to the condition number
For p > 1, a per‑coordinate C4‑bis check is performed (if applicable) by calling check_C4_bis_per_k, and a D‑ID pre‑fit check is performed via .check_did_pre_fit.
Returns
An object of class gdpar_identifiability_report (a list) containing:
| Component | Description |
|---|---|
passed |
Logical; TRUE iff all applicable checks (Gram, C4‑bis, D‑ID) pass. |
lambda_min, lambda_max, condition_number
|
Eigenvalues and condition number of the Gram matrix (if computed). |
collinear_directions |
List of near‑zero eigenvectors projected onto basis columns (if Gram check fails); otherwise NULL. |
theta_ref_used |
The theta_ref_init used. |
tol_used, rigor_used
|
The tol and rigor used. |
column_labels |
Character vector labeling columns of the extended design matrix. |
c4_bis |
Result of the C4‑bis per‑coordinate check (if performed); else NULL. |
did_pre_fit |
Result of the D‑ID pre‑fit check (if performed); else NULL. |
message |
Human‑readable summary. |
A print method formats the report.
Notes
-
Trivial cases: If
amm$level == 0Lor there are no active design blocks, the function returnspassed = TRUEwith a message andNAeigenvalues. -
Input validation: Uses
assert_inherits,assert_data_frame,assert_numeric_scalar, andmatch.arg; raisesgdpar_input_erroron invalid inputs. -
Warning: If
theta_ref_initis zero in every coordinate andamm$bis non‑null, agdpar_diagnostic_warningis issued because the multiplicative block vanishes trivially. -
Zero‑norm columns: If any column of
$Z$ is zero after centering, the function returnspassed = FALSEwithlambda_min = 0andcondition_number = Inf. -
Multivariate path: When
design$Z_a_listexists anddesign$Z_aisNULL, the function delegates tocheck_identifiability_multi(defined elsewhere) and returns its result. -
Extended matrix construction: For p > 1, the multiplicative block is expanded as
$Z_b \cdot \theta_{\text{ref}[k]}$ for each coordinate k, producing column labelsb*theta[k]:.... -
C4‑bis and D‑ID checks: Only performed when p > 1 and the necessary design components exist. The
rigorargument controls failure vs. warning on column‑name overlap between additive and modulating blocks. -
Condition number computation: Uses
eigen(G, symmetric = TRUE)and guards against division by zero withmax(lambda_min, .Machine$double.eps).
Purpose
Top-level multivariate identifiability report. Combines two pre-fit layers—(i) the per-coordinate C4-bis functional independence check (check_C4_bis_per_k) and (ii) the pre-fit parameter identifiability / D-ID layer (.check_did_pre_fit)—into a single consolidated "gdpar_identifiability_report" object.
Arguments
| Argument | Type | Meaning |
|---|---|---|
amm |
amm_spec |
The model specification; forwarded to sub-checks (used to read W$dim and p). |
design |
list | Design structure returned by build_amm_design; forwarded to sub-checks. |
theta_ref |
numeric vector (length p) |
Reference (initial) parameter vector at which identifiability is evaluated. |
tol |
numeric scalar | Tolerance for the eigenvalue-ratio (condition-number) criterion in C4-bis. |
rigor |
character scalar ("fast" or "full") |
Controls depth of the C4-bis check; forwarded to both sub-checks. |
family |
gdpar_family, gdpar_family_multi, or NULL
|
Family object for the D-ID layer; NULL skips that layer entirely. |
Mathematics
The function aggregates per-coordinate eigenvalue diagnostics. Denote the smallest and largest eigenvalues returned by coordinate
with NA_real_ propagated when every per-NA.
Returns
A list of class c("gdpar_identifiability_report", "list") with elements:
| Element | Type / Content |
|---|---|
passed |
Logical scalar; TRUE iff both passed_c4_bis and passed_did are TRUE. |
lambda_min |
Numeric; global minimum eigenvalue across coordinates (or NA_real_). |
lambda_max |
Numeric; global maximum eigenvalue across coordinates (or NA_real_). |
condition_number |
Numeric; worst (maximum) condition number across coordinates (or NA_real_). |
collinear_directions |
Always NULL at this level (populated in per-k entries). |
theta_ref_used |
Numeric vector; echo of theta_ref. |
tol_used |
Numeric; echo of tol. |
rigor_used |
Character; echo of rigor. |
column_labels |
character(0). |
c4_bis |
Full return value of check_C4_bis_per_k. |
did_pre_fit |
Return value of .check_did_pre_fit (or NULL). |
message |
Character scalar summarising pass/fail with human-readable explanation. |
Notes
- When
familyisNULL,.check_did_pre_fitis not called (did_pre_fitis set toNULL) andpassed_diddefaults toTRUE, so the overall result depends solely on C4-bis. - The message distinguishes three cases: all-pass, C4-bis failure, and D-ID failure.
-
min/maxoverlmins/lmaxs/condsusesna.rm = TRUEbut falls back toNA_real_when every element isNA(guarded byall(is.na(...))).
Purpose
Internal per-coordinate cross-component identifiability check (C4-bis). For each coordinate rigor = "fast") or a full Gram-matrix rank test (rigor = "full") on the additive design matrix
Arguments
| Argument | Type | Meaning |
|---|---|---|
design |
list | Must contain Z_a_list, Z_a_names_list, X, X_names. |
amm |
amm_spec |
Model specification; amm$W$dim is read to derive the per-k block dimension. |
theta_ref |
numeric vector (length p) |
Reference parameter vector; used to determine p. |
rigor |
character scalar ("fast" or "full") |
Selects the check strategy (see below). |
tol |
numeric scalar | Eigenvalue-ratio tolerance; a coordinate passes iff |
Mathematics
Let
For each coordinate
-
Extract the additive design sub-matrix
$\mathbf{Z}_a[k]$ (column-normalised to unit$\ell_2$ norm):$\widetilde{\mathbf{Z}}_a[k] = \mathbf{Z}_a[k]\,\mathrm{diag}\!\bigl(\lVert \mathbf{Z}_a[k]_{\cdot j}\rVert_2^{-1}\bigr)$ -
Form the (sample) correlation/gram matrix:
$\mathbf{G}_k = \frac{1}{n}\,\widetilde{\mathbf{Z}}_a[k]^{\!\top}\,\widetilde{\mathbf{Z}}_a[k]$ -
Compute eigen-decomposition
$\mathbf{G}_k = \mathbf{V}\,\mathrm{diag}(\lambda)\,\mathbf{V}^{\!\top}$ . The coordinate passes the rank sub-check iff:$\lambda_{\min}^{(k)} \;\ge\; \text{tol}\;\cdot\;\lambda_{\max}^{(k)}$ -
Structural overlap sub-check: Independently compute
$\text{shared} = \mathrm{colnames}\!\bigl(\mathbf{Z}_a[k]\bigr) \;\cap\; \mathrm{colnames}(\mathbf{X})$ . Underrigor = "full", any non-empty intersection causes a fail.Under
rigor = "fast", overlap triggers only agdpar_c4bis_overlap_warning(not a failure), and the rank check is skipped entirely—every coordinate is markedpassed = TRUEwithNAeigenvalue diagnostics.
Algorithm (rigor = "full")
for k in 1..p:
shared ← intersect(colnames(Z_a[k]), colnames(X))
if ncol(Z_a[k]) == 0 → pass trivially
col_norms ← √(colSums(Z_a[k]²))
if any(col_norms == 0) → fail, record zero-norm columns
Z_k_n ← Z_a[k] / col_norms # column-normalise
G_k ← crossprod(Z_k_n) / nrow(Z_k_n)
eig_k ← eigen(G_k, symmetric = TRUE)
λ_min ← min(eig_k$values)
λ_max ← max(eig_k$values)
κ ← λ_max / max(λ_min, ε_machine)
passed_rank ← (λ_min ≥ tol · λ_max)
passed_overlap ← (shared is empty)
passed_k ← passed_rank AND passed_overlap
collinear_directions:
if !passed_rank → eigenvectors with eigenvalue < tol·λ_max
if !passed_overlap → shared columns labelled
Returns
A list with elements:
| Element | Type / Content |
|---|---|
rigor |
Character; echo of the input rigor. |
per_k |
List of length |
Each per_k[[k]] entry contains:
| Sub-element | Type / Content |
|---|---|
passed |
Logical; TRUE if this coordinate passes both sub-checks. |
rigor |
Character ("fast" or "full"). |
lambda_min |
Numeric; smallest eigenvalue of NA_real_). |
lambda_max |
Numeric; largest eigenvalue of NA_real_). |
condition_number |
Numeric; NA_real_). |
shared_cols |
Character vector; column names shared between |
collinear_directions |
List of lists (or NULL); each sub-list has eigenvalue, columns, coefficients (eigenvector loadings with $` |
coord |
Integer; the coordinate index |
message |
(Only when ncol(Z_a_k) == 0 under "full"): "empty Z_a[k]: trivially passes."
|
Notes
-
Zero-norm column detection (full rigor): If any column of
$\mathbf{Z}_a[k]$ has zero$\ell_2$ norm the coordinate immediately fails with$\lambda_{\min}=0,\ \kappa=\infty$ , and the offending columns are recorded incollinear_directionsunder the label"zero-norm columns in Z_a[k]"with coefficient1. -
Fast rigor warning aggregation: Overlap detected across coordinates is accumulated in
overlap_accand emitted as a singlegdpar_c4bis_overlap_warning(viagdpar_warn) after the loop. The warning includes adata$shared_cols_by_coordslot listing shared columns per coordinate. -
Modulating-component design
$\mathbf{X}$ not used in rank check: The source code documents that the extended Gram matrix$[\mathbf{Z}_a[k] \mid \theta_{\text{ref}}[k]^m \mathbf{X}]$ was considered but rejected because at a fixed$\theta_{\text{ref}}$ the$W$ -block columns are scalar multiples of$\mathbf{X}$ per coordinate$l$ , yielding rank 1 by construction. Cross-component non-identifiability is therefore deferred to post-fit diagnostics (divergences, low ESS, high$\hat{R}$ ). -
Column naming convention: Collinear-direction column labels are prefixed with
"a{k}:"(e.g.,"a3:smooth_term"). - Default anchor: The comment states the anchor is taken as zero, consistent with the package default on the linear-predictor scale.
-
%||%usage:X_names,colnames(Z_a_k),lambda_min,lambda_max, andcondition_numberall use the null-coalescing operator%||%to supply fallback values.
Purpose Pre-fit validation of Data-Identifying Design (DID) conditions for individual-scope parameters within a gdpar_family or gdpar_family_multi object. Called before the Stan fit to verify that all per-observation or per-group parameter specifications have an explicit DID declaration and, when rigor == "full" and
Arguments
| Argument | Type | Meaning |
|---|---|---|
family |
gdpar_family, gdpar_family_multi, or NULL
|
The family object whose param_specs are inspected. If NULL, the function returns NULL immediately. |
design |
any | Present in the signature but never referenced in the function body; reserved for future use or passed through for call-site compatibility. |
theta_ref |
any | Present in the signature but never referenced in the function body; same rationale as design. |
rigor |
character scalar |
"full" activates the symbolic-separability sub-check when TRUE). |
Algorithm
-
Family extraction. If
familyinherits from"gdpar_family_multi", the first elementfamily$families[[1]]is taken asbase_family; if it inherits from"gdpar_family",base_family <- family; otherwise returnNULL. -
Individual-scope filtering. From
base_family$param_specs, retain only specs whosescopeis"per_observation"or"per_group". Let$K$ be the count of such specs. -
Per-parameter metadata. For each retained spec
$s$ , extract the fieldsname,scope,did_status,did_condition,did_reference,prior_canonical_kind. -
DID declarative check. $$ \text{passed_did} = \bigwedge_{k=1}^{K} \bigl(s_k.\text{did_status} \in {\texttt{"holds"},;\texttt{"holds_under_condition"}}\bigr) $$
-
Symbolic separability (conditional on
$K \geq 2$ andrigor == "full"). Collect the vector ofprior_canonical_kindvalues$\mathbf{p} = (p_1, \dots, p_K)$ . An element$p_i$ is overlapping if$p_i = p_j$ for some$j \neq i$ : $$ \text{overlap}_i = \bigl(\exists, j \neq i : p_j = p_i\bigr) $$ This is detected viaduplicated(p) | duplicated(p, fromLast = TRUE). If any overlap exists,passed_separability <- FALSEand the offending kinds/names are recorded. Otherwise the check passes. -
Final verdict. $$ \text{passed} = \text{passed_did} ;\wedge; \text{passed_separability} $$
Returns
A list (or NULL if family is NULL or not a recognized gdpar family):
| Field | Type | Meaning |
|---|---|---|
passed |
logical scalar |
TRUE iff DID declarations are valid and separability holds |
K |
integer scalar | Number of individual-scope parameter specs |
per_param |
list of length |
Per-parameter metadata (name, scope, did_status, did_condition, did_reference, prior_canonical_kind) |
symbolic_separability |
NULL or list |
NULL when not evaluated; otherwise a list with passed, overlapping_kinds, overlapping_names, message
|
rigor |
character scalar | Echo of the input rigor
|
Notes
-
designandtheta_refare accepted but completely unused; this allows a uniform calling convention across identifiability-check helpers. - For a
gdpar_family_multiobject only the first family (family$families[[1]]) is inspected; sub-families at index$\geq 2$ are ignored. - If
base_family$param_specsisNULL, the function returnsNULL(not a list withpassed = TRUE), signaling that DID checking is not applicable. - No errors are raised; all conditions are evaluated non-destructively.
Purpose Implements layer D-B3 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check of the per-slot additive design matrix .build_amm_design_K() before reaching this helper. The check detects zero-norm columns (structural rank deficiency) and, under rigor == "full", computes the normalized Gram condition number and flags slots whose minimum eigenvalue falls below
Arguments
| Argument | Type | Meaning |
|---|---|---|
design_K |
list | Output of .build_amm_design_K(); must contain Z_a_k_list (list of Z_a_k_names_list (list of column-name vectors), slot_names (character vector of length |
rigor |
character scalar |
"fast" skips the Gram eigendecomposition (rank is checked only via zero-norm detection); "full" performs the full eigendecomposition. |
tol |
numeric scalar | Threshold for the eigenvalue-ratio criterion. A slot passes when |
Mathematics
For each slot
-
Column norms.
$c_j = \lVert z_j \rVert_2 = \sqrt{\sum_{i=1}^n z_{ij}^2}$ for$j = 1, \dots, p_k$ . Any$c_j = 0$ signals an identically-zero column and immediately fails the slot. -
Column normalization (full rigor). $$ \widetilde{Z} = Z_a^{(k)} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_{p_k}^{-1}) $$
-
Normalized Gram matrix. $$ G = \frac{1}{n},\widetilde{Z}^\top \widetilde{Z} ;\in; \mathbb{R}^{p_k \times p_k} $$
-
Eigendecomposition.
$G = V \Lambda V^\top$ with eigenvalues$\lambda_1 \geq \cdots \geq \lambda_{p_k} \geq 0$ . -
Rank criterion. $$ \text{ok} = \bigl(\lambda_{\min} \geq \text{tol} \cdot \lambda_{\max}\bigr) $$ where
$\lambda_{\min} = \lambda_{p_k}$ and$\lambda_{\max} = \lambda_1$ . -
Collinear-direction extraction. For each eigenvalue
$\lambda_j < \text{tol} \cdot \lambda_{\max}$ , the corresponding eigenvector$v^{(j)}$ is inspected: components with$|v^{(j)}_i| > 10^{-3}$ are retained and sorted by descending absolute value, yielding the collinear direction report.
Returns
A list:
| Field | Type | Meaning |
|---|---|---|
passed |
logical scalar |
TRUE iff every slot passes |
rigor |
character scalar | Echo of input |
per_slot |
list of length slot_names) |
Per-slot reports (see below) |
Each element of per_slot is a list:
| Field | Type | Meaning |
|---|---|---|
slot |
character | Slot name |
passed |
logical | Slot-level pass/fail |
rigor |
character | Effective rigor for this slot |
lambda_min |
numeric | Minimum eigenvalue (NA_real_ if not computed) |
lambda_max |
numeric | Maximum eigenvalue (NA_real_ if not computed) |
condition_number |
numeric |
Inf if zero-norm columns detected |
collinear_columns |
NULL or list |
NULL when passed; under rigor == "fast" with zero columns, a character vector of offending column names; under rigor == "full" a list of sub-lists each with eigenvalue, columns, coefficients
|
message |
character | Human-readable diagnostic |
Notes
-
Empty matrix shortcut. If
$p_k = 0$ , the slot passes trivially withlambda_min = lambda_max = condition_number = NA_real_. -
Zero-norm columns. Detected before any eigendecomposition. The offending column names are taken from
Z_a_k_names_list[[k]], falling back tocolnames(Z_a_k), falling back tocharacter(ncol(Z_a_k)). -
rigor == "fast"with no zero-norm columns. Passes unconditionally without Gram computation; diagnostics areNA_real_. -
Numerical safeguard. The condition number divides by
$\max(\lambda_{\min}, \epsilon_{\text{mach}})$ where$\epsilon_{\text{mach}}$ is.Machine$double.eps, preventing division by zero. - No S3 dispatch; purely internal utility.
Purpose Implements layer D-B2 of sub-phase 8.3.4 (Block 8): a pre-fit structural rank check on the column-wise concatenation of per-slot additive design matrices .check_Z_a_K_per_slot), the joint matrix can be rank-deficient when the same covariate appears in multiple slots with linearly equivalent designs. This helper detects such cross-slot collinearity.
Arguments
| Argument | Type | Meaning |
|---|---|---|
design_K |
list | Output of .build_amm_design_K(); same structure as for .check_Z_a_K_per_slot. |
rigor |
character scalar |
"fast" returns a structural pass without eigendecomposition; "full" computes the Gram eigendecomposition of the joint matrix. |
tol |
numeric scalar | Eigenvalue-ratio threshold: |
Mathematics
-
Joint construction. For each slot
$k$ with$p_k > 0$ columns, prefix every column name with"{slot_name}:"and horizontally concatenate: $$ Z_{\text{joint}} = \bigl[, Z_a^{(1)} ;\big|; Z_a^{(2)} ;\big|; \cdots ;\big|; Z_a^{(K)},\bigr] ;\in; \mathbb{R}^{n \times P} $$ where$P = \sum_{k: p_k > 0} p_k$ . -
Zero-norm detection. Column norms
$c_j = \lVert z_j \rVert_2$ . Any$c_j = 0$ immediately fails the check. -
Column normalization and Gram matrix (full rigor). $$ \widetilde{Z}{\text{joint}} = Z{\text{joint}} \cdot \mathrm{diag}(c_1^{-1}, \dots, c_P^{-1}), \qquad G = \frac{1}{n},\widetilde{Z}{\text{joint}}^\top \widetilde{Z}{\text{joint}} $$
-
Eigendecomposition and rank criterion. Identical to the per-slot check: $$ \text{ok} = \bigl(\lambda_{\min}(G) \geq \text{tol} \cdot \lambda_{\max}(G)\bigr) $$
-
Collinear-direction extraction. Same thresholding procedure as
.check_Z_a_K_per_slot: for each eigenvalue below the tolerance band, retain eigenvector components with$|v_i| > 10^{-3}$ , sorted by descending absolute value. Column names are prefixed with slot identifiers so the report identifies which slot contributes each column.
Returns
A list:
| Field | Type | Meaning |
|---|---|---|
passed |
logical scalar |
TRUE iff the joint matrix passes the rank criterion |
rigor |
character scalar | Echo of input |
lambda_min |
numeric | Minimum eigenvalue of NA_real_ if not computed) |
lambda_max |
numeric | Maximum eigenvalue of NA_real_ if not computed) |
condition_number |
numeric |
Inf if zero-norm columns |
collinear_directions |
NULL or list |
NULL when passed; otherwise a list of sub-lists each with eigenvalue, columns (prefixed names), coefficients
|
total_columns |
integer |
|
message |
character | Human-readable diagnostic |
Notes
-
Empty joint design. If every slot has
$p_k = 0$ , the function returnspassed = TRUEwithtotal_columns = 0LandNAdiagnostics. -
rigor == "fast". After concatenation (to computetotal_columns), the function returns a structural pass without eigendecomposition. Note that zero-norm detection is also skipped under"fast"— unlike.check_Z_a_K_per_slotwhich at least checks column norms under"fast". -
Zero-norm columns under
"full". Detected before eigendecomposition; the returnedcollinear_directionscontains a single synthetic entry witheigenvalue = 0andcoefficients = rep(1, length(bad)), since the true null direction is degenerate. -
Column-name resolution. For each slot
$k$ , names are sourced fromZ_a_k_names_list[[k]], thencolnames(Z_a_k), then auto-generated"col1","col2", … viapaste0("col", seq_len(ncol(Z_a_k))). -
No S3 dispatch; purely internal. The returned list is a component of the combined report produced by the companion function
.check_identifiability_K.
Purpose Top-level orchestrator for K-individual identifiability checks in the gdpar pipeline. It aggregates two independent sub-checks—per-slot rank of the anchor design blocks (C1–C3 conditions) and cross-slot Gram-matrix conditioning (C4-bis condition)—into a single logical pass/fail verdict. This function is called during the pre-fit decision layer to determine whether the anchor parameterisation is structurally identifiable.
Arguments
| Argument | Type | Meaning |
|---|---|---|
design_K |
list |
Design structure returned by the AMM design builders. Must contain a Z_a_k_list component (list of per-slot anchor design matrices). Passed directly to the two sub-check helpers. |
rigor |
character scalar |
Checking rigor; matched against c("full", "fast") via match.arg. "full" performs exhaustive eigenvalue/SVD-based rank analysis; "fast" may use cheaper heuristics. Defaults to "full". |
tol |
numeric scalar |
Numerical tolerance (eigenvalue ratio threshold) for rank decisions. Defaults to 1e-8. Passed through to both sub-check functions. |
Mathematics This function does not implement a formula directly; it delegates to:
-
.check_Z_a_K_per_slot(design_K, rigor, tol)— verifies that each per-slot anchor design matrix$Z_{a,k}$ has full column rank (the C1–C3 conditions). -
.check_C4_bis_K_cross_slot(design_K, rigor, tol)— verifies that the cross-slot block Gram matrix is non-singular (the C4-bis condition), i.e., that no linear combination of columns across different$k$ -slots creates a collinearity.
The overall verdict is the logical conjunction: the identifiability check passes if and only if both sub-checks pass.
Returns A named list with components:
| Component | Type | Description |
|---|---|---|
passed |
logical |
TRUE iff both per_slot_rank$passed and cross_slot_gram$passed are TRUE. |
rigor |
character |
The resolved rigor level. |
tol |
numeric |
The tolerance used. |
K |
integer |
Number of K-individual slots, derived as length(design_K$Z_a_k_list). |
per_slot_rank |
list |
Full return value of .check_Z_a_K_per_slot. |
cross_slot_gram |
list |
Full return value of .check_C4_bis_K_cross_slot. |
Notes
-
rigoris validated bymatch.arg; an unrecognised value raises an error before either sub-check executes. - Both sub-checks receive the same
rigorandtolvalues, ensuring consistent stringency. -
Kis extracted from the length ofZ_a_k_list; if that component is missing or empty,Kwill be0and the sub-checks must handle that case internally. - No side effects; purely computational with no global state modification.
Purpose Implements Block D-B1 of sub-phase 8.3.4: post-fit information contraction analysis per K-individual slot. For each slot's anchor parameter, it computes the prior-to-posterior variance contraction ratio
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit |
cmdstanr fit object |
The output of cs_model$sample(). Used to extract posterior draws via fit$draws(format = "draws_matrix"). |
family |
list (gdpar_family) |
A gdpar_family object whose K-individual slots have been promoted to per-observation scope by .gdpar_promote_scope_per_observation(). Its param_specs component is consulted for prior_canonical_kind per slot. |
slot_names |
character vector |
Canonical slot names, length |
use_groups |
integer scalar (0 or 1) |
Whether the fit used per-group hierarchical anchors. Determines the Stan parameter root: "mu_theta_ref_k" when 1, "theta_ref_k" when 0. |
prior |
list (gdpar_prior) |
A gdpar_prior object; its priors_by_kind overrides are documented as potentially consulted but are currently inert (the helper falls back to canonical kinds). |
Mathematics
The contraction for slot
where .gdpar_canonical_prior_variance(kind) for the slot's canonical prior kind.
Decision thresholds (from sub-phase 8.3.4 scoping):
Returns A named list with components:
| Component | Type | Description |
|---|---|---|
passed |
logical |
TRUE iff no slot triggered "warn" or "information_error". |
any_warn |
logical |
TRUE if any slot has status "warn". |
any_info_error |
logical |
TRUE if any slot has status "information_error". |
thresholds |
named numeric vector |
c(warn = 0.5, information_error = 0.1). |
per_slot |
named list of length |
Per-slot diagnostic results (see below). |
Each element of per_slot is a named list:
| Component | Type | Description |
|---|---|---|
slot |
character |
The slot name from slot_names[k]. |
var_post |
numeric |
Posterior sample variance (or NA_real_ if skipped). |
var_prior |
numeric |
Prior variance from canonical kind (or NA_real_). |
contraction |
numeric |
NA_real_ if skipped/non-finite). |
status |
character |
One of "pass", "warn", "information_error", "skipped". |
message |
character |
Diagnostic message string. |
Notes
-
Early return on missing draws: If
fit$draws(format = "draws_matrix")throws an error (caught viatryCatch), the function returns immediately withpassed = TRUEand all slots marked"skipped"with message"fit$draws() unavailable". This is a defensive fallback—no diagnostic is raised when draws are inaccessible. -
Parameter name construction: The Stan parameter name is
paste0(param_root, "[1,", k, "]")for bothuse_groups == 0anduse_groups == 1(the index structure is[1, k]in both cases, using the first row). The root is"theta_ref_k"or"mu_theta_ref_k"depending onuse_groups. -
Spec resolution: The function attempts to filter
family$param_specsfor specs withscopeinc("per_observation", "per_group"). If the filtered length does not equal$K$ , it falls back to taking the first$K$ specs positionally (family$param_specs[seq_len(K)]). -
Non-finite contraction: If
var_priorisInf,0, or the ratio producesNaN/Inf, the contraction isNA_real_and status is"skipped"with message"non-finite contraction". -
Insufficient draws: If the draws column exists but has fewer than 4 elements (
length(draws_k) < 4L), the slot is skipped with message"draws for '<param>' unavailable"(even though draws technically exist, the sample is too small for a reliable variance estimate). -
No canonical variance: If
.gdpar_canonical_prior_variance(kind)returns a non-finite value, the slot is skipped with message reporting the kind. - The information-error raises a
gdpar_information_errorwarning class elsewhere in the pipeline; this function itself only classifies—it does not emit warnings or errors. - The
priorargument is accepted for future extensibility but is currently unused within the function body.
Purpose S3 print method for objects of class "gdpar_identifiability_report". Renders a human-readable summary of the identifiability diagnostic to the console, covering all possible sub-report sections: eigenvalue/condition-number diagnostics, collinear directions (C1–C4), C4-bis per-coordinate cross-component checks, and D-ID pre-fit parameter analysis. Exported for user convenience.
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
list with S3 class "gdpar_identifiability_report"
|
The identifiability report object. Expected components are detailed in the Returns section below. |
... |
(unused) | Present for S3 generic compatibility; ignored. |
Returns Invisibly returns x (the input object), following standard R print-method conventions. The primary effect is the side effect of printing formatted text to the console via cat().
Notes
-
Top-level fields: Always prints
x$passed. Printslambda_min,lambda_max,condition_number, andtol_usedonly ifx$lambda_minis notNA. Printsx$rigor_usedif non-NULL. Printsx$messageunconditionally. -
Collinear directions (C1–C4): If
x$passedisFALSEandx$collinear_directionsis non-NULL, iterates over each direction entry, printing the eigenvalue and each column/coefficient pair in the direction vector. Each entrydis expected to haved$eigenvalue(numeric),d$columns(character vector), andd$coefficients(numeric vector of same length asd$columns). -
C4-bis section: If
x$c4_bisis non-NULL, iterates overx$c4_bis$per_k. Each entrypkis expected to havepk$coord,pk$rigor,pk$passed,pk$condition_number(may beInf),pk$shared_cols(character vector, may be empty), and optionallypk$collinear_directionswith the same structure as above. -
D-ID pre-fit section: If
x$did_pre_fitis non-NULL, prints the totalK, overallpassedflag, and per-parameter details (name,scope,prior_canonical_kind,did_status,did_condition). Also printssymbolic_separabilityif present, includingpassedandoverlapping_kinds. -
Formatting: Uses
format(..., digits = 4)for eigenvalues and condition numbers,format(..., digits = 3, width = 7)for direction coefficients. Indentation uses fixed strings (" "," "," "," "). - No validation is performed on the structure of
x; missing components are guarded byis.null/is.nachecks. Ifxlacks expected fields, the corresponding section is silently omitted. - The function does not raise errors under normal conditions; it is purely presentational.
Note: The source section also contains the roxygen documentation block for a .check_C7_group_anchor_aliasing function (implementing condition C7 from Block 6.5—detecting aliasing between group indicators and design columns). However, the function definition (signature and body) is not present in this section; it begins in the subsequent section (section 5 of 5). Therefore it is documented there, not here.
Purpose
Checks the identifiability condition (C7) of Block 6.5 for group aliasing. Ensures that columns of the design matrices for the
Arguments
-
design: A list containing design matrices. In a univariate setting it should have elementsZ_a,Z_b,Z_a_names,Z_b_names. In a multi‑coordinate setting it should haveZ_a_list,Z_b_list,Z_a_names_list,Z_b_names_list(each a list of length$p$ , the number of coordinates). -
group_id: A vector of group identifiers. Converted internally to an integer factor. IfNULLor with fewer than two levels, the check is skipped. -
group_var_name: A character string naming the grouping variable (used in error messages). -
tol: Numeric tolerance for comparing variances and QR‑rank deficiency (default1e-8).
Mathematics
The function implements two checks per design block (see .check_c7_one_block):
-
Within‑group variance test: For each column
$j$ of a design matrix$Z$ , compute the within‑group variance$s_{jg}^2$ for each group$g$ . If$s_{jg}^2 \le \text{tol}$ for all groups (i.e., the column is constant within every group), identifiability is violated. -
Rank test: Let
$G$ be the indicator matrix for the groups. The combined matrix$M = [G \mid Z]$ must have full column rank. If$\mathrm{rank}(M) < \text{ncol}(M)$ (after column normalization), there exists an indirect alias between the group anchor and a linear combination of the design columns.
Returns
invisible(NULL) invisibly. The function is called for its side effects (raising errors).
Notes
- Internal function (not exported, leading dot).
- If
group_idisNULLor has fewer than two levels, the function returns immediately without checks. - For multi‑coordinate designs (
has_multi_designisTRUE), the check is applied to each coordinate block separately, iterating overseq_len(p)wherep = length(design$Z_a_list). - For each block, it calls
.check_c7_one_blockwith the appropriate design sub‑matrix, column names, and coordinate index. - Errors are raised via
gdpar_abort()with class"gdpar_input_error"and a structured data list containing the component, coordinate (if any), group variable name, and (for variance violations) the names of the aliased columns.
Purpose
Internal helper that applies the two‑layer aliasing check (C7) to a single design block coord = NA for univariate). It performs the within‑group variance test and the joint rank test.
Arguments
-
Z: Numeric design matrix (may have zero columns or rows). IfNULL, zero columns, or zero rows, the function returns immediately. -
Z_names: Character vector of column names forZ, used in error messages to identify aliased columns. -
component: Character string, either"a"or"b", indicating which model component the block belongs to. -
coord: Integer coordinate index$k$ (for multi‑coordinate designs) orNA_integer_for univariate. Used only for error message formatting. -
group_int: Integer vector of group memberships (values in$1, \dots, J_{\text{groups}}$ ) for each observation. -
J_groups: Integer number of groups. (Received but not used directly in the computations; group structure is encoded ingroup_intand the indicator matrix$G$ .) -
group_var_name: Character string naming the grouping variable (for error messages). -
tol: Numeric tolerance for variance comparisons and QR‑rank deficiency.
Mathematics
-
Within‑group variance test:
For each column$j = 1, \dots, p$ (where$p = \text{ncol}(Z)$ ): $$ s_{jg}^2 = \begin{cases} 0 & \text{if group } g \text{ has fewer than 2 observations}, \ \frac{1}{|g|-1} \sum_{i \in g} (Z_{ij} - \bar{Z}_{\cdot j g})^2 & \text{otherwise}, \end{cases} $$ where$\bar{Z}_{\cdot j g}$ is the mean of column$j$ within group$g$ .
Let$m_j = \max_{g=1}^{J_{\text{groups}}} s_{jg}^2$ .
If$m_j \le \text{tol}$ , column$j$ is flagged as constant within every group. -
Rank test:
Construct the group indicator matrix$G \in \mathbb{R}^{n \times J_{\text{groups}}}$ viamodel.matrix(~ as.factor(group_int) + 0).
Form$M = [G \mid Z]$ . Normalize each column of$M$ by its Euclidean norm (with zero norms replaced by 1 to avoid division by zero).
Compute the QR decomposition of the normalized matrix:$\text{qr}(M_{\text{norm}})$ .
If$\mathrm{rank}(M_{\text{norm}}) < \text{ncol}(M_{\text{norm}})$ , the joint matrix is rank‑deficient, indicating an indirect alias.
Returns
invisible(NULL) if all checks pass.
Notes
- Internal function (not exported, leading dot), called only by
.check_group_aliasing_c7. - Early returns with
invisible(NULL)if:-
ZisNULL, -
ncol(Z) == 0L, -
nrow(Z) == 0L.
-
- Two possible error conditions:
-
Direct aliasing (constant columns): If any column has within‑group variance
$\le \text{tol}$ ,gdpar_abort()is called with a message listing the offending column names. The error data includes the component, coordinate, group variable name, andaliased_columns. -
Indirect aliasing (rank deficiency): If the rank of the normalized combined matrix is less than the number of columns,
gdpar_abort()is called with a message including the component, coordinate, group variable name, the observed rank, and the number of columns. The error data includesrankandncol.
-
Direct aliasing (constant columns): If any column has within‑group variance
- Errors are of class
"gdpar_input_error"and contain adatalist for programmatic handling. - The argument
J_groupsis passed but not used in any computation; the group structure is fully represented bygroup_intand the generated indicator matrix$G$ .
This file defines three S3 methods supporting the comparison of Empirical-Bayes (EB) and Fully-Bayes (FB) estimation paths produced elsewhere in the gdpar package. The methods provide console printing, a structured summary, and printing of that summary.
Purpose
S3 print method for objects of class gdpar_eb_fb_comparison. Produces a concise human-readable console summary of an EB-vs-FB comparison, including the estimation families and paths involved, the number of common
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
gdpar_eb_fb_comparison (list) |
The comparison object to display. Expected components: family_eb, family_fb (character), path_eb, path_fb (character), level (numeric), tv_bins (integer scalar), n_common_params (integer scalar), tv_table (data frame with a tv column, possibly NULL), coverage_table (data frame with a width_ratio column, possibly NULL), theta_diff_table (data frame, possibly NULL), warnings (character vector). |
digits |
integer scalar | Passed to format() for numeric formatting; defaults to 3L. |
... |
(any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
No mathematical formula is implemented. The method computes order statistics over finite subsets of two distributions:
- TV values:
$\{v \in \texttt{tv\_table}\$tv : v \in \mathbb{R}\}$ , reporting$\min,\ \mathrm{median},\ \max$ . - Width ratios:
$\{r \in \texttt{coverage\_table}\$width\_ratio : r \in \mathbb{R}\}$ , reporting$\min,\ \mathrm{median},\ \max$ .
Non-finite values (NA, NaN, Inf, -Inf) are excluded via is.finite() before computing summary statistics.
Returns
The object x, returned invisibly (via invisible(x)). The primary effect is console output.
Notes
- The
levelcomponent is formatted withformat(x$level, digits = digits); its type is not validated. - If
tv_tableisNULLor has zero rows, or if alltvvalues are non-finite, the marginal TV line is silently omitted. - If
coverage_tableisNULLor has zero rows, or if allwidth_ratiovalues are non-finite, the width-ratio line is silently omitted. - The
$\theta$ -diff preview usesutils::head(x$theta_diff_table, 6L)and is passed throughformat(..., digits = digits)beforeprint(). - Warnings are printed one per line, prefixed with
" - ". - No validation or error-checking is performed on the structure of
x; missing components would propagate as R errors (e.g.,$onNULL).
Purpose
S3 summary method for objects of class gdpar_eb_fb_comparison. Constructs a structured list suitable for programmatic access and for the canonical print.summary.gdpar_eb_fb_comparison method. Aggregates the TV table and the coverage (width-ratio) table into seven-point summary statistics (count, min, 25th percentile, median, 75th percentile, max, mean).
Arguments
| Argument | Type | Meaning |
|---|---|---|
object |
gdpar_eb_fb_comparison (list) |
The comparison object to summarize. Same expected components as for print.gdpar_eb_fb_comparison, plus optionally call. |
... |
(any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
For the TV distribution, let
where stats::quantile) and
For the width-ratio distribution, let
If either finite subset is empty, the corresponding summary element is set to NULL.
Returns
A list of class c("summary.gdpar_eb_fb_comparison", "list") with the following components:
| Component | Type | Description |
|---|---|---|
family_eb |
character | Copied from object$family_eb. |
family_fb |
character | Copied from object$family_fb. |
path_eb |
character | Copied from object$path_eb. |
path_fb |
character | Copied from object$path_fb. |
level |
(inherits from object$level) |
Copied from object$level. |
tv_bins |
integer | Copied from object$tv_bins. |
n_common_params |
integer | Copied from object$n_common_params. |
n_anchor_cells |
integer |
0L if object$theta_diff_table is NULL, otherwise nrow(object$theta_diff_table). |
tv_summary |
list or NULL
|
Seven-element list (n, min, q25, median, q75, max, mean) or NULL. |
coverage_summary |
list or NULL
|
Seven-element list (same structure) or NULL. |
theta_diff_table |
data frame or NULL
|
Copied by reference from object$theta_diff_table. |
tv_table |
data frame or NULL
|
Copied by reference from object$tv_table. |
coverage_table |
data frame or NULL
|
Copied by reference from object$coverage_table. |
warnings |
character | `object$warnings % |
call |
(any) | Copied from object$call. |
Notes
- The
%||%infix operator is used for thewarningsdefault; this operator is assumed to be defined elsewhere in the package (not in this file). Under standardrlang::%||%`` semantics, it returns the left-hand side if it is notNULL, otherwise the right-hand side. - Quantiles are computed with
stats::quantileat probabilities$0.25$ and$0.75$ ; the defaulttype = 7interpolation is used. Thenamesattribute of the scalar result is stripped viaunname(). - The
q25andq75fields are unnamed scalars; all other numeric fields inherit names frommin(),stats::median(),max(), andmean()(typicallyNULLfor these functions on atomic vectors). - No copy is made of the table data frames; they are assigned by reference into the output list.
- The output class vector is
c("summary.gdpar_eb_fb_comparison", "list"), enabling dispatch on both the specific class and the implicitlistclass.
Purpose
S3 print method for objects of class summary.gdpar_eb_fb_comparison. Renders the structured summary to the console, displaying the EB/FB families and paths, the level, the count of common
Arguments
| Argument | Type | Meaning |
|---|---|---|
x |
summary.gdpar_eb_fb_comparison (list) |
The summary object to display. Expected components: family_eb, family_fb, path_eb, path_fb (character), level (numeric), n_common_params (integer scalar), n_anchor_cells (integer scalar), tv_summary (list or NULL), coverage_summary (list or NULL), theta_diff_table (data frame or NULL), warnings (character vector). |
digits |
integer scalar | Passed to format() for numeric formatting; defaults to 3L. |
... |
(any) | Unused; absorbed for S3 generic compatibility. |
Mathematics
No mathematical formula is implemented. The method formats and displays pre-computed summary statistics. For each distribution (TV and width-ratio), it prints:
on one line, followed by
Returns
The object x, returned invisibly (via invisible(x)). The primary effect is console output.
Notes
- Unlike
print.gdpar_eb_fb_comparison(which shows only the first six rows oftheta_diff_table), this method prints the fulltheta_diff_tableviaprint(format(x$theta_diff_table, digits = digits)). - The
tv_summaryandcoverage_summaryblocks are each printed only if the corresponding component is non-NULL. Each block includes the sample sizenin its header. - The
levelis formatted withformat(x$level, digits = digits). - Warnings are printed one per line, prefixed with
" - ", only iflength(x$warnings) > 0L. - The
tv_binscomponent is not printed by this method (unlikeprint.gdpar_eb_fb_comparison), even though it is present in the summary object. - No structural validation of
xis performed; missing or mistyped components would propagate as R errors.
Purpose (role in the package).
Exported orchestrator for Sub-phase 8.6.E (Charter Section 3.5, decision 2.5 Trio of vignettes). It produces a descriptive operational comparison between an Empirical-Bayes fit (gdpar_eb_fit, from gdpar_eb()) and a Fully-Bayes fit (gdpar_fit, from gdpar()) fitted on the same dataset. It does not assert algorithmic equivalence and does not test hypotheses across the two inferential frames. It computes three tables:
- Per-anchor-cell differences in the population anchor
$\theta_{\text{ref}}$ . - Marginal empirical total-variation (TV) distance between the lower-level posteriors of
$\xi = (a,\, b,\, W,\, \text{dispersion})$ , parameter by parameter. - Operational verification of the higher-order coverage discrepancy (v07 Section 6, Proposition 7B scalar / 7B* matricial / 7B* tensorial) on the nominal EB and FB credible intervals.
Arguments
| Argument | Type | Meaning |
|---|---|---|
eb_fit |
object of class gdpar_eb_fit
|
Empirical-Bayes fit produced by gdpar_eb(). Covers all four path regimes: K = 1 + p = 1; Path A (K = 1 + p > 1); Path B (K > 1 + p = 1); Path C (K > 1 + p > 1, via the K × p tensor extension of Sub-phase 8.6.D). |
fb_fit |
object of class gdpar_fit
|
Fully-Bayes fit produced by gdpar(). Must have been fitted on the same dataset (same outcome, same covariates, same K / p regime). The comparator does not refit either model. |
level |
numeric scalar | Credible-interval level for coverage-discrepancy reporting. Must lie in 0.95. |
tv_bins |
integer scalar | Number of histogram bins used to approximate the marginal TV distance per parameter. Must be 30L. Larger values give a finer empirical TV but require more draws per parameter for stability. |
... |
(any) | Reserved for future arguments; currently unused. |
Mathematics
The marginal total-variation distance between two distributions
where
The coverage-discrepancy table compares EB-nominal vs FB-nominal credible-interval widths at level level per anchor cell, operationally verifying the
Returns
An object of class c("gdpar_eb_fb_comparison", "list") with components:
| Component | Type / Structure | Meaning |
|---|---|---|
theta_diff_table |
data.frame or NULL
|
Per-anchor-cell comparison of EB vs FB .gdpar_eb_fb_theta_diff_table. |
tv_table |
data.frame or NULL
|
Marginal TV distance per common NULL when EB or FB draws are unavailable or there are zero common parameter names. |
coverage_table |
data.frame or NULL
|
Coverage-discrepancy table (EB-nominal vs FB-nominal IC widths per anchor cell). |
level |
numeric scalar | Echo of the level input. |
tv_bins |
integer scalar | Echo of the tv_bins input. |
n_common_params |
integer | Number of rows in tv_table (or 0L if tv_table is NULL). |
path_eb |
character | Path identifier from eb_fit$path, defaulting to "eb". |
path_fb |
character | Path identifier from fb_fit$path, defaulting to "fb". |
family_eb |
character |
eb_fit$family$name, or NA_character_ if absent. |
family_fb |
character |
fb_fit$family$name, or NA_character_ if absent. |
call |
call |
The matched call. |
warnings |
character vector | Accumulated fallback notification messages from helper extractors. Empty (character(0L)) in the happy path. |
meta |
list | Contains mode = "compare_eb_fb" and a human-readable note summarizing the TV and coverage methodology. |
Companion S3 methods print.gdpar_eb_fb_comparison and summary.gdpar_eb_fb_comparison are documented separately.
Notes
-
Input validation.
eb_fitmust inherit"gdpar_eb_fit";fb_fitmust inherit"gdpar_fit".levelmust be a single numeric value in$(0,1)$ .tv_binsmust be a single numeric value$\geq 5$ (coerced to integer). Violations raise agdpar_input_errorviagdpar_abort(). -
Required namespace. The
posteriorpackage is required (suggested dependency); an informative error is raised if absent. -
Warning accumulation. A local
emit()closure accumulates messages intowarnings_msg(and also callsgdpar_warn()for each). Six distinct fallback conditions are checked:- All-FB
theta_refdraws areNA(unknown template convention or empty draws). - EB
$\xi$ draws areNULL. - FB
$\xi$ draws areNULL. - TV table is
NULLdespite both draw sets being non-NULL(zero common parameter names). - All-FB widths in the coverage table are
NA.
- All-FB
-
FB conditional_fit fallback. For the FB
$\xi$ draws extraction, the function first triesfb_fit$conditional_fitand falls back tofb_fit$fitvia the%||%null-coalescing operator. -
Path uniformity. All four EB regimes are handled uniformly by the helpers. For Path C the
theta_ref_kp_hattensor is flattened to a length-$Kp$ vector keyed by (slot, coord); the joint$K \times p$ inflation tensor is reported in the coverage table per cell via diagonal-block entries. -
S3 dispatch. The returned object carries class
c("gdpar_eb_fb_comparison", "list")for dispatch by the companion print/summary methods.
Purpose (role in the package).
Internal helper that builds the theta_diff_table component of the comparison object. It extracts the EB anchor point estimates and their standard errors, attempts to extract FB posterior draws of .gdpar_eb_fb_extract_theta_ref_draws_fb(), and computes per-anchor-cell differences. The row key structure varies by path regime.
Arguments
| Argument | Type | Meaning |
|---|---|---|
eb_fit |
object of class gdpar_eb_fit
|
Empirical-Bayes fit. Path regime is inferred from eb_fit$path (checked for "eb_KxP" for Path C). Contains theta_ref_kp_hat / theta_ref_kp_se (Path C), or theta_ref_hat / theta_ref_se (other paths). |
fb_fit |
object of class gdpar_fit
|
Fully-Bayes fit. Passed to the FB draws extractor. |
level |
numeric scalar in |
Credible-interval level. Accepted by the function signature but not used in the current body (reserved for future use or passed downstream elsewhere). |
Mathematics
For each anchor cell
where NA_real_ when
Returns
A data.frame whose structure depends on path regime:
-
Path C (
eb_fit$path == "eb_KxP", i.e.\$K > 1$ and$p > 1$ ): One row per$(g,\,k,\,c)$ triple ($g = 1,\ldots,J$ groups;$k = 1,\ldots,K$ slots;$c = 1,\ldots,p$ coordinates). Columns:Column Type Meaning groupinteger Group index $g$ .slotcharacter Slot name from eb_fit$slot_names[k].coordinteger Coordinate index $c$ .eb_estimatenumeric eb_fit$theta_ref_kp_hat[g, k, c].eb_senumeric eb_fit$theta_ref_kp_se[g, k, c].fb_meannumeric Posterior mean of FB draws for cell $(g,k,c)$ ;NA_real_if draws unavailable.fb_senumeric Posterior SD of FB draws; NA_real_if draws unavailable.diffnumeric $\texttt{eb\_estimate} - \texttt{fb\_mean}$ .diff_relnumeric $(\texttt{eb\_estimate} - \texttt{fb\_mean}) / \texttt{fb\_se}$ , orNA_real_whenfb_seis not finite or$\leq 0$ . -
Non-Path-C (K = 1 + p = 1, Path A, Path B): One row per anchor cell, indexed sequentially. Columns:
Column Type Meaning cellinteger Sequential cell index $1,\ldots,n_{\text{cells}}$ .eb_estimatenumeric as.numeric(eb_fit$theta_ref_hat)[i].eb_senumeric as.numeric(eb_fit$theta_ref_se)[i].fb_meannumeric From fb_draws$flat$means[i];NA_real_if absent.fb_senumeric From fb_draws$flat$ses[i];NA_real_if absent.diffnumeric $\texttt{eb\_estimate} - \texttt{fb\_mean}$ .diff_relnumeric Conditional ratio as above; uses vectorized ifelse.
Notes
-
FB draws extraction. Calls
.gdpar_eb_fb_extract_theta_ref_draws_fb(fb_fit)inside atryCatchthat silently returnsNULLon any error. WhenNULL, all FB columns are filled withNA_real_. -
Path C iteration. Uses a triple nested
forloop over$(g, k, c)$ , pre-allocating a list of length$J \cdot K \cdot p$ . Each list element is a single-rowdata.frame, assembled at the end viado.call(rbind, rows). FB draws for each cell are accessed asfb_draws$kp[[g]][[k]][, c]. -
Non-Path-C path. EB estimates and SEs are coerced to numeric vectors via
as.numeric(). FB means and SEs are taken fromfb_draws$flat$meansandfb_draws$flat$ses, with length truncated tomin(n_cells, length(fb_draws$flat$means)). -
Edge case — zero-length FB draws. For Path C, if a cell's FB draw vector is
NULLor haslength == 0, bothfb_meanandfb_seare set toNA_real_. -
Edge case — zero FB SE.
diff_relisNA_real_wheneverfb_seisNA, not finite, or$\leq 0$ . -
The
levelargument is accepted but not used in the computation within this helper; it is present for interface consistency with the other helpers.
Purpose (role in the package).
Internal helper that extracts the gdpar_fit object in a path-aware manner. Used by .gdpar_eb_fb_theta_diff_table to obtain the FB posterior summaries (
Arguments
| Argument | Type | Meaning |
|---|---|---|
fb_fit |
object of class gdpar_fit
|
Fully-Bayes fit whose Stan draws are to be inspected for |
Mathematics
No formula is implemented by this helper; it is a pure data-extraction utility that retrieves posterior draws from the Stan output using the canonical posterior::as_draws_matrix interface.
Returns
A list whose structure depends on the path regime of fb_fit:
| Path regime | Component | Structure | Meaning |
|---|---|---|---|
| Non-Path-C (K = 1 + p = 1 / Path A / Path B) | flat |
Named list with elements means (numeric vector) and ses (numeric vector) |
Posterior means and standard deviations of theta_ref[...] or theta_ref_k[...] conventions. |
| Path C (K > 1 + p > 1) | kp |
Nested list: kp[[g]][[k]] is a matrix with |
Posterior draws of theta_ref_kp[...] convention. |
Returns NULL when extraction fails (draws are not present in the recognized variable-name convention, or fb_fit lacks draw data).
Notes
- Body not in this section. The function body is defined in section 2 of 2; only the roxygen documentation block appears in this section. The documented behavior is as described above.
-
Path C debt. The documentation explicitly notes that the K × p FB template for Path C is itself a follow-on debt of the 8.4 unification effort per Charter and the
project_gdpar_deuda_8_4_unificacion_standebt item. -
Fail-silent design. The function returns
NULLon failure rather than raising an error. The calling orchestrator (gdpar_compare_eb_fb) detects this via downstreamNULLchecks and emits structured warnings through theemit()closure. -
@keywords internal/@noRd. This function is internal and does not generate an.Rdhelp page.
Purpose Extracts the posterior draws of reference-anchor parameters (NULL if no
Arguments
| Argument | Type | Meaning |
|---|---|---|
fb_fit |
list | A fitted FB model object. The function first looks for fb_fit$conditional_fit (an EB conditional fit embedded within the FB workflow) and falls back to fb_fit$fit (the raw FB fit). Each is expected to possess a $draws() method returning a posterior draws object. |
Mathematics
No closed-form formula is implemented. The function is a dispatch-and-extraction routine that selects the appropriate draws-dimension based on variable naming:
-
Path C convention: variables matching
^theta_ref_kp\[(three-index tensor$\theta_{\mathrm{ref}\_kp}[g,k,c]$ ). -
Path B convention: variables matching
^theta_ref_k\[(one-dimensional, per-slot). -
Path A / default convention: variables matching
^theta_ref(\[|$)(scalar or simple vector).
For the "flat" returns (Paths A and B), the function computes:
$$\bar{\theta}j = \frac{1}{S}\sum{s=1}^{S} \theta_j^{(s)}, \qquad \mathrm{sd}j = \sqrt{\frac{1}{S-1}\sum{s=1}^{S}\bigl(\theta_j^{(s)} - \bar{\theta}_j\bigr)^2}$$
where
Returns
-
NULLif no draws object is available, if$draws()errors, if variable names are empty, or if no$\theta_{\mathrm{ref}}$ variables are found. -
list(kp = <nested list>)— Path C: a nested list produced by.gdpar_eb_fb_unpack_kp, indexed askp[[g]][[k]][, c]with$g=1,\dots,J$ (groups),$k=1,\dots,K$ (slots),$c=1,\dots,p$ (coordinates), each cell containing a numeric vector of posterior draws. -
list(flat = list(means = <numeric>, ses = <numeric>))— Paths A/B: unnamed numeric vectors of posterior means and standard deviations, one element per$\theta_{\mathrm{ref}}$ cell.
Notes
- Uses the null-coalescing operator
%||%(fromrlang) to chooseconditional_fitoverfit. - Both the
$draws()call and thedimnames()access are wrapped intryCatch, silently returningNULLon error—this is a defensive pattern for partially-initialized or incomplete fit objects. - Variable detection proceeds in priority order: Path C (kp) is checked first, then Path B (k), then Path A (default). Only the first match is returned.
- The comment notes that the Path C "kp" branch is not consumed by the diff table when the EB fit is
$K=1, p=1$ (Path A); only the kp branch itself touches that structure. - Depends on the
posteriorpackage foras_draws_matrixandsubset_draws.
Purpose Unpacks a draws matrix containing Path C-style
Arguments
| Argument | Type | Meaning |
|---|---|---|
mat |
posterior::draws_matrix |
A draws matrix whose columns are the theta_ref_kp[g,k,c]. |
vars_c |
character | Character vector of variable names matching the theta_ref_kp[...] pattern (used for parsing index triples). |
Mathematics
The function reconstructs the three-index tensor
theta_ref_kp[g,k,c], the integer indices
Returns
A nested list kp of depth 3:
-
kp[[g]]— list of length$J$ (groups/observations). -
kp[[g]][[k]]— list of length$K$ (slots). -
kp[[g]][[k]]— matrix of dimension$S \times p$ ($S$ = number of posterior draws,$p$ = number of coordinates). If a particulartheta_ref_kp[g,k,c]column is absent frommat, that column is filled withNA_real_.
Notes
- The regex
"\\[(\\d+),(\\d+),(\\d+)\\]"withregexeccaptures exactly three comma-separated integers inside square brackets. If a parsed match has fewer than 4 elements (the full match plus 3 groups),c(NA, NA, NA)is substituted. - The result matrix
kp[[g]][[k]]is initialized toNA_real_before filling, so any missing column inmatyieldsNAdraws for that cell rather than an error. - This function is marked
@keywords internaland@noRd; it is not exported. - Uses
sprintfto reconstruct the expected column name for lookup inmat.
Purpose Extracts the posterior draws of the "xi" parameter vector (the non-anchor model parameters: fixed-effect coefficients lp__. This function is used by both the EB conditional fit and the FB fit.
Arguments
| Argument | Type | Meaning |
|---|---|---|
fit_obj |
list | A fitted model object (either EB conditional or FB) expected to have a $draws() method returning a posterior draws object. May be NULL. |
Mathematics
No formula is implemented. The function performs a filtering operation on variable names. Variables are retained if they do not match any of the following exclusion patterns (joined by |):
| Pattern | Excluded variables |
|---|---|
^lp__$ |
Log posterior density |
^theta_ref |
Reference-anchor parameters (all variants) |
^mu_theta_ref |
Mean of reference-anchor hyperparameters |
^sigma_theta_ref |
SD of reference-anchor hyperparameters |
^eta |
Linear predictor generated quantities |
^eta_kp |
Path C linear predictor |
^log_lik |
Pointwise log-likelihood (LOO) |
^y_pred |
Posterior predictive draws |
^theta_i |
Individual-level random effects |
^a_raw |
Raw (non-centered) fixed-effect coefficients |
^c_b_raw |
Raw covariance parameters |
^c_b_kp_raw |
Path C raw covariance parameters |
^W_raw |
Raw covariance matrix elements |
Returns
-
NULLiffit_objisNULL, if$draws()errors, if variable names are empty, or if no variables survive the exclusion filter. - A
posterior::draws_matrixcontaining only the retained "xi" columns.
Notes
- This function is the complement of
.gdpar_eb_fb_extract_theta_ref_draws_fb: that function extracts only$\theta_{\mathrm{ref}}$ draws, while this one extracts everything except$\theta_{\mathrm{ref}}$ and generated quantities. - Both
$draws()anddimnames()accesses are wrapped intryCatch, silently returningNULLon error. - The exclusion list is hard-coded; adding new generated-quantity prefixes would require editing the
paste(...)call. - Marked
@keywords internal,@noRd, not exported.
Purpose Computes a marginal Total Variation (TV) distance between the EB and FB posterior distributions for each parameter common to both draws objects. This provides a diagnostic for how closely the EB approximation matches the full FB posterior on a per-parameter basis.
Arguments
| Argument | Type | Meaning |
|---|---|---|
draws_eb |
posterior::draws_matrix |
Posterior draws from the EB (Empirical Bayes) fit. Column names are parameter names. |
draws_fb |
posterior::draws_matrix |
Posterior draws from the FB (Forward-Backward) fit. Column names are parameter names. |
tv_bins |
integer scalar | Number of bins for the shared histogram grid used in the TV computation. |
Mathematics
For each parameter
The grid is divided into
Histogram counts
The marginal TV distance is then the histogram plug-in estimator: $$\widehat{\mathrm{TV}}(\psi) = \frac{1}{2}\sum_{j=1}^{B} \bigl|\hat{p}{\mathrm{eb},j} - \hat{p}{\mathrm{fb},j}\bigr|$$
Returns
-
NULLif eitherdraws_ebordraws_fbisNULL, or if no common column names exist. - A
data.framewith one row per common parameter and columns:
| Column | Type | Meaning |
|---|---|---|
parameter |
character | Parameter name (column name in the draws objects). |
tv |
numeric | Marginal TV distance (NA_real_ if the range is degenerate or non-finite. |
n_eb |
integer | Number of EB draws for this parameter. |
n_fb |
integer | Number of FB draws for this parameter. |
mean_eb |
numeric | Posterior mean from EB draws. |
mean_fb |
numeric | Posterior mean from FB draws. |
Notes
- If the combined range
rngcontains non-finite values or has zero width (diff(rng) <= 0),tvis set toNA_real_for that parameter. -
intersect(colnames(draws_eb), colnames(draws_fb))determines common parameters; order followsdraws_ebcolumn order. - Uses
graphics::hist(..., plot = FALSE)solely for bin counting; no plot is produced. - The rows are assembled via
do.call(rbind, rows)after a loop over common parameters. - Marked
@keywords internal,@noRd, not exported.
Purpose Builds a coverage diagnostic table comparing credible-interval widths between the EB and FB posteriors for each
Arguments
| Argument | Type | Meaning |
|---|---|---|
eb_fit |
list | A fitted EB model object. Must contain theta_ref_hat, theta_ref_se, and for Path C: theta_ref_kp_hat, theta_ref_kp_se, K, p, slot_names. May contain correction_applied (logical), eb_correction_constant (scalar), correction_tensor_constant (matrix), and path (character). |
fb_fit |
list | A fitted FB model object, passed to .gdpar_eb_fb_extract_theta_ref_draws_fb. |
level |
numeric scalar | Nominal credible level 0.95). |
Mathematics
The significance level and critical value are:
Path C (
- The EB standard error is
$\mathrm{se}_{g,k,c}^{\mathrm{eb}} = \mathtt{eb\_fit\$theta\_ref\_kp\_se}[g,k,c]$ . - If the EB correction was applied and the correction tensor
$\mathbf{T} \in \mathbb{R}^{K \times p \times p}$ is available with finite entries, the inflation factor is:$$\mathrm{inflate}_{k,c} = \sqrt{1 + \frac{T[k,c,c]}{\max(1, J)}}$$ Otherwise$\mathrm{inflate}_{k,c} = 1$ . - The EB credible-interval width is: $$w_{g,k,c}^{\mathrm{eb}} = 2z \cdot \mathrm{se}{g,k,c}^{\mathrm{eb}} \cdot \mathrm{inflate}{k,c}$$
- The FB credible-interval width is:
$$w_{g,k,c}^{\mathrm{fb}} = 2z \cdot \mathrm{sd}\bigl(\theta_{\mathrm{ref}_kp}^{(s)}[g,k,c]\bigr)$$ computed from the FB posterior draws (via.gdpar_eb_fb_extract_theta_ref_draws_fb), orNA_real_if unavailable. - The width ratio is:
$$R_{g,k,c} = \frac{w_{g,k,c}^{\mathrm{eb}}}{w_{g,k,c}^{\mathrm{fb}}}$$ set toNA_real_if$w^{\mathrm{fb}}$ is non-finite or zero.
Path A / Path B (non-Path-C): For each cell
-
$\mathrm{se}_j^{\mathrm{eb}} = \mathtt{eb\_fit\$theta\_ref\_se}[j]$ . - If
eb_fit$correction_appliedisTRUE, the scalar inflation factor is:$$\mathrm{inflate} = \sqrt{1 + \frac{c_{\mathrm{eb}}}{\max(1, J_{\mathrm{flat}})}}$$ where$c_{\mathrm{eb}} = \mathtt{eb\_fit\$eb\_correction\_constant}$ (defaulting to 0 ifNULL). Otherwise$\mathrm{inflate} = 1$ . - EB width:
$w_j^{\mathrm{eb}} = 2z \cdot \mathrm{se}_j^{\mathrm{eb}} \cdot \mathrm{inflate}$ . - FB width:
$w_j^{\mathrm{fb}} = 2z \cdot \mathrm{sd}_{j}^{\mathrm{fb}}$ from the flat FB draws, orNA_real_if the FB draws are shorter or unavailable. - The width ratio is computed element-wise with
ifelse, yieldingNA_real_when$w^{\mathrm{fb}}$ is non-finite or$\leq 0$ .
Returns
-
NULLis never explicitly returned (unlike the other functions in this section); instead the function always returns adata.frame. -
Path C: A
data.framewith$J \times K \times p$ rows and columns:
| Column | Type | Meaning |
|---|---|---|
group |
integer | Group index |
slot |
character | Slot name from eb_fit$slot_names[k]. |
coord |
integer | Coordinate index |
eb_width |
numeric | EB credible-interval width. |
fb_width |
numeric | FB credible-interval width (or NA). |
width_ratio |
numeric |
NA). |
inflation |
numeric | The correction inflation factor |
-
Path A / Path B: A
data.framewith$J_{\mathrm{flat}}$ rows and columns:
| Column | Type | Meaning |
|---|---|---|
cell |
integer | Sequential cell index |
eb_width |
numeric | EB credible-interval width. |
fb_width |
numeric | FB credible-interval width (or NA). |
width_ratio |
numeric |
NA). |
inflation |
numeric | Scalar correction inflation factor. |
Notes
- Path C vs. non-Path-C dispatch is determined by
identical(eb_fit$path, "eb_KxP"). - The FB draws extraction (
.gdpar_eb_fb_extract_theta_ref_draws_fb) is wrapped intryCatch; on errorfb_drawsis set toNULL, and all FB widths will beNA_real_. - For Path C, the correction tensor entry
tensor[k, c, c]is checked withall(is.finite(tensor[k, c, c]))before computing the inflation; if the tensor isNULLor the entry is non-finite, inflation defaults to 1. - For Path A/B,
eb_fit$eb_correction_constantdefaults to 0 via%||%whenNULL. - The
width_ratiois the key diagnostic: a ratio significantly above 1 indicates the EB credible intervals are wider than the FB intervals, consistent with the$O(n^{-1})$ under-coverage correction described in the referenced theoretical results. - Marked
@keywords internal,@noRd, not exported.
← Part III — Computational Architecture · gdpar Wiki Home · Part IV — Exhaustive Function Reference (2/7) →
- Part I — Conceptual Framework
- Part II — Mathematical Foundations
- Part III — Computational Architecture
- Part IV — Exhaustive Function Reference (1/7)
- Part IV — Exhaustive Function Reference (2/7)
- Part IV — Exhaustive Function Reference (3/7)
- Part IV — Exhaustive Function Reference (4/7)
- Part IV — Exhaustive Function Reference (5/7)
- Part IV — Exhaustive Function Reference (6/7)
- Part IV — Exhaustive Function Reference (7/7)
- Part V — Stan Templates (1/3)
- Part V — Stan Templates (2/3)
- Part V — Stan Templates (3/3)
- Part VI — Data, Benchmarks, Tests & References