Part VI: Data, Benchmarks, Tests & References
gdpar 0.1.0 ships no data/ datasets. Empirical material is obtained two ways:
- Synthetic generators under inst/benchmarks/ produce known-ground-truth data for recovery studies (factorial scenario tables with analytic truth grids per family).
- eBird Status & Trends avian-abundance data is pulled at run time through the ebirdst package (declared in Suggests); the package does not redistribute eBird data.
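To make "known ground truth" concrete, here is a minimal base-R sketch of a synthetic generator that returns both the simulated data and an analytic truth grid. The function name `simulate_known_truth`, the column names, and the particular mean and sd functions are illustrative assumptions; they are not taken from inst/benchmarks/.

```r
# Hypothetical generator: heteroscedastic Gaussian data with analytic truth.
# Names and functional forms are illustrative, not gdpar's actual generators.
simulate_known_truth <- function(n = 2000, seed = 1L, grid_size = 50L) {
  set.seed(seed)
  mean_fun <- function(x) sin(2 * pi * x)  # true conditional mean
  sd_fun   <- function(x) 0.2 + 0.6 * x    # true conditional sd, positive on [0, 1]

  x <- runif(n)
  y <- rnorm(n, mean = mean_fun(x), sd = sd_fun(x))

  # Truth grid: the analytic quantities a fitted model would be scored against.
  x_grid <- seq(0, 1, length.out = grid_size)
  truth  <- data.frame(x = x_grid, mean = mean_fun(x_grid), sd = sd_fun(x_grid))

  list(data = data.frame(x = x, y = y), truth = truth)
}

sim <- simulate_known_truth()
str(sim$data)
head(sim$truth)
```

Because the seed is fixed inside the function, repeated calls with the same arguments return identical data, which is the property a recovery study needs to compare methods on the same replicate.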
Golden-test fixtures (.rds) are not shipped (gitignored and .Rbuildignored); they are regenerated on a developer machine by the data-raw/ bootstrap scripts bootstrap_eb_goldens.R, bootstrap_eb_goldens_C.R, bootstrap_eb_goldens_D.R, bootstrap_fb_goldens_KxP.R. The golden tests skip gracefully (skip_on_cran / skip_if_not_installed("cmdstanr")) when the fixtures or Stan toolchain are absent, so the distributed package is self-consistent without them (see VI.3).
The harness lives in inst/benchmarks/ (driver scripts, family adapters, scenario tables, metric functions) and writes six shipped Markdown reports to inst/benchmarks/results/:
| Report | Content |
|---|---|
| block9_synthetic_recovery.md | synthetic recovery study (9.2.S) |
| block9_revalidation.md | organic eBird re-validation (9.2.O) |
| block9_internal.md | internal adversarial re-validation (9.1) |
| block_rg_calibration.md | geometry-engine calibration thresholds (Block RG) |
| synthetic_hard_multi_summary.md | hard-synthetic multi-parametric summary |
| ebird_path_a_multi_summary.md | eBird Path-A multi-parametric summary |
The two pivotal external studies (re-validation gate of Block 9, against the four canonical competitors mgcv, brms, INLA, rstanarm):
- 9.2.S: synthetic recovery, 3200 cells. A $2\times2^5=64$-scenario factorial with known ground truth (continuous patterns {multicollinearity, nonlinearity, autocorrelation, heteroscedasticity, heavy tails} and count patterns {multicollinearity, nonlinearity, autocorrelation, zero-inflation, overdispersion}), $n=2000$, $R=10$ replicates, 5 methods. Metrics: rmse_mean, bias_mean, rmse_sd (graded for heteroscedasticity), a graded tail metric, coverage90, and an elpd_loo cross-check. Findings: gdpar leads in its distributional lane (tail-quantile ≈ 2.7×, zero-inflation ≈ 2.2×, co-best heteroscedasticity, wins 33/64 scenarios, dominant on count 21/32), is robust where it makes no claim (autocorrelation top-2; zero explosions in 640 fits vs one each for brms/rstanarm), loses narrowly on pure nonlinearity to mgcv's spline (0.088 vs 0.077), and is the most expensive (median ≈ 171 s vs ≈ 0.1 s, ≈ 1000×).
- 9.2.O: organic eBird, 80 cells. 5 species × 4 NE-USA sub-regions, extended over $K=1,2,3$. Finding: gdpar is statistically indistinguishable from mgcv (ELPD difference < 0.4 across the four sub-regions). The honest envelope: model-vs-model, state-based sub-regions, no per-model tuning, efficiency is not a veto.
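The counts quoted for 9.2.S can be reproduced with a small base-R sketch. It assumes that each of the five patterns is an on/off factor within an outcome type (continuous or count) and that a "cell" is one scenario × replicate × method combination; both readings are inferences from the numbers above, not statements from the harness. The object and function names are hypothetical.

```r
# Assumption: each pattern is a binary switch within its outcome type.
patterns <- list(
  continuous = c("multicollinearity", "nonlinearity", "autocorrelation",
                 "heteroscedasticity", "heavy_tails"),
  count      = c("multicollinearity", "nonlinearity", "autocorrelation",
                 "zero_inflation", "overdispersion")
)

# Build the 2^5 on/off combinations for one outcome type.
build_scenarios <- function(type) {
  switches <- rep(list(c(FALSE, TRUE)), length(patterns[[type]]))
  names(switches) <- patterns[[type]]
  grid <- expand.grid(switches, KEEP.OUT.ATTRS = FALSE)
  cbind(type = type, grid, stringsAsFactors = FALSE)
}

# One table per outcome type, because the pattern columns differ.
scenario_tables <- lapply(names(patterns), build_scenarios)
names(scenario_tables) <- names(patterns)

n_scenarios  <- sum(vapply(scenario_tables, nrow, integer(1)))  # 2 * 2^5
n_replicates <- 10L
n_methods    <- 5L  # gdpar plus mgcv, brms, INLA, rstanarm

n_fits_per_method <- n_scenarios * n_replicates
n_cells           <- n_fits_per_method * n_methods

c(scenarios = n_scenarios, fits_per_method = n_fits_per_method, cells = n_cells)
```

Under these assumptions the sketch yields 64 scenarios, 640 fits per method, and 3200 cells, which agrees with the figures reported for the study.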
The recurring, honestly reported conclusion: gdpar's contribution is the canonical anchoring structure and its distributional/robustness behaviour, not a blanket accuracy win; on pure smoothness mgcv is better and ~1000× faster.
The suite has 75 test files under tests/testthat/ with roughly 1240 test_that blocks (the full natural R CMD check runs in three layers). Two helpers support it: helper-data.R (synthetic data builders, skip_if_no_cmdstan) and helper-block9_spatial.R (a frozen pre-refactor copy used as a bit-exactness gate). Three tiers of tests:
- Stan-free (always run). Pure-R logic: AMM design construction, identifiability checks, codegen string assembly, family/link resolution, residual algebra, dependence diagnostics on fixed residuals, the deterministic resampler golden (D76). These run on CRAN.
- Golden regression (skip_on_cran). Bit-exact / portable-tolerance comparisons against frozen fixtures (EB 8.6.B/C/D, FB KxP, distributional 8.3.9). The exact (expect_identical) guard is developer-machine-only; portable numeric tolerances (expect_equal, 1e-6/1e-5) protect CRAN portability.
- Stan-bound (gated). Full HMC fits behind skip_if_not_installed("cmdstanr") and environment gates (e.g. GDPAR_RUN_BLOCK9_DEP_FITS=1): invariance fuzz, 12-family outcome stress, bit-exact compare-path, geometry stress.
This tiering is why the published package (which excludes the heavy fixtures) still passes R CMD check cleanly: the fixture-dependent and Stan-dependent assertions skip when their prerequisites are absent.
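The three tiers can be sketched with standard testthat skip helpers. This is an illustration under stated assumptions, not a copy of the package's tests: the fixture path, the helper `env_gate_on`, and the test bodies are hypothetical, and only the environment variable name comes from the text above.

```r
library(testthat)

# Hypothetical fixture location; gdpar's real fixtures live elsewhere.
fixture_path <- file.path(tempdir(), "golden_example.rds")

# Hypothetical helper: TRUE only when the named environment gate equals "1".
env_gate_on <- function(var) identical(Sys.getenv(var), "1")

# Tier 1: Stan-free, always runs.
test_that("pure-R logic runs everywhere", {
  expect_equal(sum(1:10), 55)
})

# Tier 2: golden regression, skipped on CRAN or when the fixture is absent.
test_that("result matches the frozen fixture within tolerance", {
  skip_on_cran()
  skip_if_not(file.exists(fixture_path), "golden fixture not available")
  golden  <- readRDS(fixture_path)
  current <- golden + 1e-9  # stand-in for a freshly recomputed result
  expect_equal(current, golden, tolerance = 1e-6)
})

# Tier 3: Stan-bound, behind the toolchain check and an environment gate.
test_that("full HMC fit (placeholder)", {
  skip_if_not_installed("cmdstanr")
  skip_if_not(env_gate_on("GDPAR_RUN_BLOCK9_DEP_FITS"),
              "set GDPAR_RUN_BLOCK9_DEP_FITS=1 to run")
  succeed("a real test would fit the model here")
})
```

With no fixture on disk and the gate unset, only the first test executes and the other two report as skipped rather than failed. To see the second tier run, one could write a numeric vector with `saveRDS()` to `fixture_path` and set `NOT_CRAN=true` before re-running.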
R CMD check --as-cran on the slim published tarball (vignettes and manual built, tests and --run-donttest executed) returns Status: 1 NOTE — the unavoidable "new submission" note plus "Suggests or Enhances not in mainstream repositories: cmdstanr, INLA" (both declared in Additional_repositories). No ERROR, no WARNING, no other NOTE. The natural check without vignette-building (--no-build-vignettes) returns Status OK; with vignettes built it is also clean.
The theory draws on, among others: Ghosal & van der Vaart (2017) Fundamentals of Nonparametric Bayesian Inference; Schwartz (1965); Ghosal, Ghosh & van der Vaart (2000); van der Vaart (1998) Asymptotic Statistics; Castillo & Rousseau (2015); Petrone, Rousseau & Scricciolo (2014); Rousseau & Szabo (2017); Robbins (1956); Efron (2019); Hastie & Tibshirani (1993); Fan & Zhang (2008); Wood (2017) GAMs; Hornik (1991), Pinkus (1999) (universal approximation); Jacot et al. (2018), Bach (2017), Dziugaite & Roy (2017) (neural-tangent / PAC-Bayes); Lambert (1992), Greene (1994) (zero-inflation); Teicher (1963), Patton (2006) (mixture/copula identifiability); Künsch (1989), Politis & White (2004) + Patton–Politis–White (2009) (block bootstrap, automatic block length); Cliff & Ord (spatial autocorrelation); Vehtari, Gelman & Gabry (2017) (PSIS-LOO); Wager & Athey (2018), Künzel et al. (2019) (heterogeneous treatment effects). Per-topic citation lists appear in the corresponding vignettes (v01–v09, vop01–vop09).