
Part VI Data Benchmarks Tests References

José Mauricio Gómez Julián edited this page Jun 29, 2026 · 4 revisions



Part VI — Data, benchmarks, tests, and references

VI.1 Bundled data and reproducibility

gdpar 0.1.0 ships no data/ datasets. Empirical material is obtained in two ways:

  • Synthetic generators under inst/benchmarks/ produce known-ground-truth data for recovery studies (factorial scenario tables with analytic truth grids per family).
  • eBird Status & Trends avian-abundance data is pulled at run time through the ebirdst package (declared in Suggests); the package does not redistribute eBird data.

Golden-test fixtures (.rds) are not shipped: they are gitignored and listed in .Rbuildignore. They are regenerated on a developer machine by the data-raw/ bootstrap scripts bootstrap_eb_goldens.R, bootstrap_eb_goldens_C.R, bootstrap_eb_goldens_D.R, and bootstrap_fb_goldens_KxP.R. The golden tests skip gracefully (skip_on_cran() / skip_if_not_installed("cmdstanr")) when the fixtures or the Stan toolchain are absent, so the distributed package is self-consistent without them (see VI.3).
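The skip behaviour described above can be sketched with testthat as follows. This is a minimal illustration, not the package's actual test code: the helper skip_if_no_golden() and the fixture path are hypothetical names introduced here, and the real tests may check for fixtures differently.

```r
library(testthat)

# Hypothetical helper (not part of gdpar): skip the calling test unless the
# golden fixture file exists on this machine.
skip_if_no_golden <- function(path) {
  skip_if_not(file.exists(path),
              paste("golden fixture missing:", basename(path)))
}

# Hypothetical fixture path; stands in for a developer-machine .rds file.
golden_path <- file.path(tempdir(), "example_golden.rds")

test_that("fit summary matches the golden fixture", {
  skip_on_cran()                     # never run on CRAN
  skip_if_not_installed("cmdstanr")  # needs the Stan toolchain
  skip_if_no_golden(golden_path)     # needs the regenerated fixture
  golden <- readRDS(golden_path)
  # A real test would compare a fresh fit against `golden` here.
  expect_type(golden, "list")
})
```

Because each guard signals a skip rather than a failure, a machine without the fixture or without cmdstanr reports the test as skipped and the check stays clean.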

VI.2 Benchmark harness and external validation

The harness lives in inst/benchmarks/ (driver scripts, family adapters, scenario tables, metric functions) and writes six shipped Markdown reports to inst/benchmarks/results/:

| Report | Content |
| --- | --- |
| block9_synthetic_recovery.md | synthetic recovery study (9.2.S) |
| block9_revalidation.md | organic eBird re-validation (9.2.O) |
| block9_internal.md | internal adversarial re-validation (9.1) |
| block_rg_calibration.md | geometry-engine calibration thresholds (Block RG) |
| synthetic_hard_multi_summary.md | hard-synthetic multi-parametric summary |
| ebird_path_a_multi_summary.md | eBird Path-A multi-parametric summary |

Two external studies are pivotal. Both belong to the Block 9 re-validation gate and compare gdpar against the four canonical competitors mgcv, brms, INLA, and rstanarm:

  • 9.2.S: synthetic recovery, 3200 cells. A $2\times2^5=64$-scenario factorial with known ground truth: two outcome types, each crossed with five on/off patterns (continuous: multicollinearity, nonlinearity, autocorrelation, heteroscedasticity, heavy tails; count: multicollinearity, nonlinearity, autocorrelation, zero-inflation, overdispersion), with $n=2000$, $R=10$ replicates, and 5 methods. Metrics: rmse_mean, bias_mean, rmse_sd (graded for heteroscedasticity), a graded tail metric, coverage90, and an elpd_loo cross-check. Findings: gdpar leads in its distributional lane (tail-quantile ≈ 2.7×, zero-inflation ≈ 2.2×, co-best on heteroscedasticity; it wins 33/64 scenarios and dominates on count outcomes, 21/32). It is robust where it makes no claim (top-2 on autocorrelation; zero explosions in 640 fits versus one each for brms and rstanarm). It loses narrowly to mgcv's spline on pure nonlinearity (0.088 vs 0.077) and is the most expensive method (median ≈ 171 s vs ≈ 0.1 s, ≈ 1000×).
  • 9.2.O: organic eBird, 80 cells. 5 species × 4 NE-USA sub-regions, extended over $K=1,2,3$. Finding: gdpar is statistically indistinguishable from mgcv (ELPD difference < 0.4 across the four sub-regions). Scope of the comparison: model versus model, on state-based sub-regions, with no per-model tuning, and with efficiency not treated as a veto.
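The cell counts quoted for 9.2.S are consistent with a full crossing of scenarios, replicates, and methods. The sketch below rebuilds that arithmetic; the column names (outcome, pattern_1 to pattern_5) are placeholders introduced here, and reading 3200 as scenarios × replicates × methods and 640 as scenarios × replicates is an interpretation of the figures, not a description of the harness's scenario tables.

```r
# Five on/off stress patterns per outcome type. Placeholder names are used
# because the continuous and count families label their patterns differently.
flags <- setNames(rep(list(c(FALSE, TRUE)), 5), paste0("pattern_", 1:5))

# Cross the two outcome types with every on/off combination of the patterns.
grid <- expand.grid(c(list(outcome = c("continuous", "count")), flags),
                    stringsAsFactors = FALSE)

n_scenarios <- nrow(grid)  # 2 * 2^5 scenarios
n_reps      <- 10          # R = 10 replicates
n_methods   <- 5           # gdpar plus the four competitors

n_cells         <- n_scenarios * n_reps * n_methods  # total benchmark cells
fits_per_method <- n_scenarios * n_reps              # fits run by each method

cat(n_scenarios, "scenarios,", n_cells, "cells,",
    fits_per_method, "fits per method\n")
```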

The recurring conclusion, reported as found: gdpar's contribution is the canonical anchoring structure and its distributional and robustness behaviour, not a blanket accuracy win. On pure smoothness, mgcv is more accurate and roughly 1000× faster.

VI.3 Test architecture

The suite has 75 test files under tests/testthat/ with roughly 1240 test_that blocks (the full natural R CMD check runs in three layers). Two helpers support it: helper-data.R (synthetic data builders, skip_if_no_cmdstan) and helper-block9_spatial.R (a frozen pre-refactor copy used as a bit-exactness gate). Three tiers of tests:

  1. Stan-free (always run). Pure-R logic: AMM design construction, identifiability checks, codegen string assembly, family/link resolution, residual algebra, dependence diagnostics on fixed residuals, the deterministic resampler golden (D76). These run on CRAN.
  2. Golden regression (skip_on_cran). Bit-exact / portable-tolerance comparisons against frozen fixtures (EB 8.6.B/C/D, FB KxP, distributional 8.3.9). The exact (expect_identical) guard is developer-machine-only; portable numeric tolerances (expect_equal, 1e-6/1e-5) protect CRAN portability.
  3. Stan-bound (gated). Full HMC fits behind skip_if_not_installed("cmdstanr") and environment gates (e.g. GDPAR_RUN_BLOCK9_DEP_FITS=1): invariance fuzz, 12-family outcome stress, bit-exact compare-path, geometry stress.
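The difference between the exact and portable guards, and the environment gate on the Stan-bound tier, can be sketched as follows. This is an illustration under stated assumptions: skip_unless_env() is a hypothetical helper, the numeric vectors are stand-ins rather than real fixtures, and the package's own gating code may be organised differently.

```r
library(testthat)

# Hypothetical helper (not part of gdpar): run the calling test only when an
# opt-in environment variable is set to "1".
skip_unless_env <- function(var) {
  skip_if_not(identical(Sys.getenv(var), "1"),
              paste0(var, " is not set to 1"))
}

# Stand-in values: a frozen golden and a recomputation that differs only by
# platform-level rounding noise.
golden     <- c(0.1, 0.2, 0.3)
recomputed <- golden + 1e-9

test_that("tier 2: portable tolerance comparison", {
  # Passes on any platform: the difference is far below the tolerance.
  expect_equal(recomputed, golden, tolerance = 1e-6)
  # A bit-exact guard would reject the same pair.
  expect_false(identical(recomputed, golden))
})

test_that("tier 3: Stan-bound fit (gated)", {
  skip_if_not_installed("cmdstanr")
  skip_unless_env("GDPAR_RUN_BLOCK9_DEP_FITS")
  # A full HMC fit and its assertions would go here.
  succeed()
})
```

Keeping the exact comparison on the developer machine and the tolerance comparison everywhere else lets the same fixture serve both as a strict regression guard and as a portable check.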

This tiering is why the published package (which excludes the heavy fixtures) still passes R CMD check cleanly: the fixture-dependent and Stan-dependent assertions skip when their prerequisites are absent.

VI.4 Release status

R CMD check --as-cran on the slim published tarball (vignettes and manual built, tests and --run-donttest executed) returns Status: 1 NOTE — the unavoidable "new submission" note plus "Suggests or Enhances not in mainstream repositories: cmdstanr, INLA" (both declared in Additional_repositories). No ERROR, no WARNING, no other NOTE. The natural check without vignette-building (--no-build-vignettes) returns Status OK; with vignettes built it is also clean.

VI.5 Consolidated references

The theory draws on, among others: Ghosal & van der Vaart (2017) Fundamentals of Nonparametric Bayesian Inference; Schwartz (1965); Ghosal, Ghosh & van der Vaart (2000); van der Vaart (1998) Asymptotic Statistics; Castillo & Rousseau (2015); Petrone, Rousseau & Scricciolo (2014); Rousseau & Szabo (2017); Robbins (1956); Efron (2019); Hastie & Tibshirani (1993); Fan & Zhang (2008); Wood (2017) Generalized Additive Models; Hornik (1991), Pinkus (1999) (universal approximation); Jacot et al. (2018), Bach (2017), Dziugaite & Roy (2017) (neural-tangent / PAC-Bayes); Lambert (1992), Greene (1994) (zero-inflation); Teicher (1963), Patton (2006) (mixture/copula identifiability); Künsch (1989), Politis & White (2004), and Patton, Politis & White (2009) (block bootstrap, automatic block length); Cliff & Ord (spatial autocorrelation); Vehtari, Gelman & Gabry (2017) (PSIS-LOO); Wager & Athey (2018), Künzel et al. (2019) (heterogeneous treatment effects). Per-topic citation lists appear in the corresponding vignettes (v01–v09, vop01–vop09).


