Part VI — Data, benchmarks, tests, and references

VI.1 Bundled data and reproducibility

gdpar 0.1.0 ships no datasets in data/. Data for benchmarks and tests are obtained in two ways:

Synthetic generators under inst/benchmarks/ produce known-ground-truth data for recovery studies (factorial scenario tables with analytic truth grids per family).
eBird Status & Trends avian-abundance data is pulled at run time through the ebirdst package (declared in Suggests); the package does not redistribute eBird data.

Golden-test fixtures (.rds files) are not shipped: they are listed in .gitignore and .Rbuildignore and are regenerated on a developer machine by the data-raw/ bootstrap scripts bootstrap_eb_goldens.R, bootstrap_eb_goldens_C.R, bootstrap_eb_goldens_D.R and bootstrap_fb_goldens_KxP.R. The golden tests skip gracefully (skip_on_cran(), skip_if_not_installed("cmdstanr")) when the fixtures or the Stan toolchain are absent, so the distributed package is self-consistent without them (see VI.3).
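
A minimal sketch of this skip pattern, assuming testthat. The helper load_golden_or_skip(), its `dir` default and the fixture file name are hypothetical and are not gdpar's test code; only skip_on_cran(), skip_if_not_installed() and skip() are real testthat functions.

```r
library(testthat)

# Hypothetical helper (not part of gdpar): return a golden fixture, or skip
# the calling test when it cannot be used. `dir` stands in for wherever the
# data-raw/ bootstrap scripts write their .rds files.
load_golden_or_skip <- function(name, dir = "fixtures") {
  skip_on_cran()                      # golden regressions never run on CRAN
  skip_if_not_installed("cmdstanr")   # goldens come from Stan-based fits
  path <- file.path(dir, name)
  if (!file.exists(path)) {
    skip(paste("golden fixture not bootstrapped:", name))
  }
  readRDS(path)
}

test_that("golden fit is reproduced", {
  golden <- load_golden_or_skip("eb_golden_example.rds")  # hypothetical name
  # A real test would refit here and compare the result against `golden`.
  expect_false(is.null(golden))
})
```

Because every skip fires before any fitting, a CRAN or fixture-less check reports such tests as skipped rather than failed.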

VI.2 Benchmark harness and external validation

The harness lives in inst/benchmarks/ (driver scripts, family adapters, scenario tables, metric functions) and writes its reports to inst/benchmarks/results/, where six Markdown reports are shipped:

| Report | Content |
|---|---|
| `block9_synthetic_recovery.md` | Synthetic recovery study (9.2.S) |
| `block9_revalidation.md` | Organic eBird re-validation (9.2.O) |
| `block9_internal.md` | Internal adversarial re-validation (9.1) |
| `block_rg_calibration.md` | Geometry-engine calibration thresholds (Block RG) |
| `synthetic_hard_multi_summary.md` | Hard-synthetic multi-parametric summary |
| `ebird_path_a_multi_summary.md` | eBird Path-A multi-parametric summary |
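
The harness's metric functions compare each method's estimate with the known truth. The sketch below shows the simplest of them in the spirit of the 9.2.S labels (rmse_mean, bias_mean, coverage90). These are hypothetical re-implementations, not the functions in inst/benchmarks/. Reading rmse_mean as "RMSE of the estimated mean function" is an assumption.

```r
# Illustrative recovery metrics (hypothetical, not the harness code).
# `truth`: known ground-truth mean on an evaluation grid
# `est`:   a method's estimated (posterior-mean) function on the same grid
# `lo`, `hi`: the method's 90% interval bounds on the same grid
rmse_mean  <- function(est, truth) sqrt(mean((est - truth)^2))
bias_mean  <- function(est, truth) mean(est - truth)
coverage90 <- function(lo, hi, truth) mean(truth >= lo & truth <= hi)

# Toy usage on a 5-point grid (numbers made up for illustration).
truth <- c(0.0, 0.5, 1.0, 1.5, 2.0)
est   <- c(0.1, 0.4, 1.1, 1.6, 1.9)
c(rmse  = rmse_mean(est, truth),
  bias  = bias_mean(est, truth),
  cov90 = coverage90(est - 0.15, est + 0.15, truth))
```

In a factorial study each metric is computed per (scenario, replicate, method) cell and then summarised across replicates. The grading used for rmse_sd and the tail metric is not reproduced here.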

The two pivotal external studies form the re-validation gate of Block 9. Both compare gdpar against four canonical competitors: mgcv, brms, INLA and rstanarm.

9.2.S: synthetic recovery, 3200 cells. A $2\times2^5=64$-scenario factorial with known ground truth. Two outcome types are each crossed with five on/off data patterns (continuous: multicollinearity, nonlinearity, autocorrelation, heteroscedasticity, heavy tails; count: multicollinearity, nonlinearity, autocorrelation, zero-inflation, overdispersion), with $n=2000$, $R=10$ replicates and 5 methods. Metrics: rmse_mean, bias_mean, rmse_sd (graded for heteroscedasticity), a graded tail metric, coverage90, and an elpd_loo cross-check.

Findings:
- gdpar leads in its distributional lane. It shows ≈ 2.7× on the tail-quantile metric and ≈ 2.2× on zero-inflation, and is co-best under heteroscedasticity. It wins 33/64 scenarios overall and 21/32 count scenarios.
- It is robust where it makes no claim. It ranks top-2 under autocorrelation and has zero explosions in 640 fits, against one each for brms and rstanarm.
- It loses narrowly to mgcv's spline on pure nonlinearity (0.088 vs 0.077).
- It is the most expensive method (median ≈ 171 s vs ≈ 0.1 s, on the order of 1000×).

9.2.O: organic eBird, 80 cells. 5 species × 4 NE-USA sub-regions, extended over $K=1,2,3$. Finding: gdpar is statistically indistinguishable from mgcv (ELPD difference < 0.4 across the four sub-regions). Scope of the claim: a model-vs-model comparison on state-based sub-regions, with no per-model tuning, in which efficiency is not treated as a veto.
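
The 9.2.S design arithmetic can be checked with a short R sketch. From the counts given above, the sketch makes three assumptions:
- the leading factor of 2 is the outcome type (continuous or count);
- each type crosses its five patterns as on/off toggles;
- a "cell" is one (scenario, replicate, method) fit.

The column names p1 to p5 and the method vector are illustrative and are not the harness's scenario tables.

```r
# Sketch of the 2 x 2^5 factorial in 9.2.S (illustrative, not harness code).
patterns <- list(
  continuous = c("multicollinearity", "nonlinearity", "autocorrelation",
                 "heteroscedasticity", "heavy_tails"),
  count      = c("multicollinearity", "nonlinearity", "autocorrelation",
                 "zero_inflation", "overdispersion")
)

scenario_grid <- do.call(rbind, lapply(names(patterns), function(type) {
  # all 2^5 on/off combinations; p1..p5 map to patterns[[type]] in order
  toggles <- expand.grid(rep(list(c(FALSE, TRUE)), 5))
  names(toggles) <- paste0("p", 1:5)
  data.frame(outcome = type, toggles)
}))

methods <- c("gdpar", "mgcv", "brms", "INLA", "rstanarm")
n_rep   <- 10                                        # R in the text

n_scenarios     <- nrow(scenario_grid)               # 2 * 2^5
fits_per_method <- n_scenarios * n_rep
n_cells         <- fits_per_method * length(methods)
c(scenarios = n_scenarios, fits_per_method = fits_per_method, cells = n_cells)
```

Under these assumptions the sketch yields 64 scenarios, 640 fits per method (the denominator in the explosion count) and 3200 cells in total.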

The conclusion that recurs across both studies is that gdpar's contribution is its canonical anchoring structure and its distributional and robustness behaviour, not a blanket accuracy win. On pure smoothness, mgcv is better and roughly 1000× faster.
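
The 9.2.O verdict is stated as an ELPD difference. The sketch below forms such a difference and its standard error from pointwise elpd_loo values, following the convention of the loo package (Vehtari, Gelman & Gabry 2017). elpd_diff_se() is a hypothetical helper, and the toy inputs are not gdpar results.

```r
# ELPD difference between two models scored on the same observations, with
# the standard error of the summed difference (sqrt(N) * sd of pointwise
# differences), as in loo::loo_compare(). Hypothetical helper.
elpd_diff_se <- function(elpd_a, elpd_b) {
  stopifnot(length(elpd_a) == length(elpd_b), length(elpd_a) > 1)
  d <- elpd_a - elpd_b                     # pointwise differences
  c(elpd_diff = sum(d),                    # > 0 favours model a
    se_diff   = sqrt(length(d)) * sd(d))
}

# Toy example: two models scored on the same 4 observations.
elpd_diff_se(c(-1.0, -1.2, -0.9, -1.1), c(-1.1, -1.1, -1.0, -1.0))
```

loo::loo_compare() reports the same pair from fitted loo objects. A difference that is small relative to se_diff means the data do not clearly separate the two models, which is one common reading of "statistically indistinguishable".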

VI.3 Test architecture

The suite has 75 test files under tests/testthat/ containing roughly 1240 test_that() blocks; the full natural R CMD check runs them in three layers. Two helper files support it: helper-data.R (synthetic data builders and skip_if_no_cmdstan()) and helper-block9_spatial.R (a frozen pre-refactor copy used as a bit-exactness gate). Tests fall into three tiers:

Stan-free (always run). Pure-R logic: AMM design construction, identifiability checks, codegen string assembly, family/link resolution, residual algebra, dependence diagnostics on fixed residuals, the deterministic resampler golden (D76). These run on CRAN.
Golden regression (skip_on_cran). Bit-exact or portable-tolerance comparisons against frozen fixtures (EB 8.6.B/C/D, FB KxP, distributional 8.3.9). The exact guard (expect_identical) runs only on the developer machine. The portable numeric tolerances (expect_equal at 1e-6 or 1e-5) keep these comparisons valid on other machines, CRAN included.
Stan-bound (gated). Full HMC fits behind skip_if_not_installed("cmdstanr") and environment gates (e.g. GDPAR_RUN_BLOCK9_DEP_FITS=1): invariance fuzz, 12-family outcome stress, bit-exact compare-path, geometry stress.
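
The exact-versus-portable split and a deterministic Stan-free golden can be sketched together, assuming testthat and withr (a testthat dependency).
- expect_golden() is a hypothetical helper, and the GDPAR_GOLDEN_EXACT switch is an assumption made for this sketch, not a documented gdpar variable.
- block_bootstrap_index() is a hypothetical seeded moving-block bootstrap (Künsch 1989) standing in for a resampler pinned by a golden. It is not the D76 code.

```r
library(testthat)

# Hypothetical helper: exact comparison where the golden was produced,
# portable numeric tolerance everywhere else.
expect_golden <- function(actual, golden, tolerance = 1e-6) {
  if (identical(Sys.getenv("GDPAR_GOLDEN_EXACT"), "1")) {
    expect_identical(actual, golden)
  } else {
    expect_equal(actual, golden, tolerance = tolerance)
  }
}

# Hypothetical Stan-free target: moving-block bootstrap indices drawn under
# a fixed seed (without touching the global RNG state), so the output is
# deterministic and can be pinned by a golden.
block_bootstrap_index <- function(n, block_len, seed) {
  stopifnot(block_len >= 1, block_len <= n)
  n_blocks <- ceiling(n / block_len)
  starts <- withr::with_seed(seed,
    sample.int(n - block_len + 1, n_blocks, replace = TRUE))
  idx <- unlist(lapply(starts, function(s) s:(s + block_len - 1)))
  idx[seq_len(n)]                     # trim the last block to length n
}

test_that("seeded resampler matches its golden", {
  # In a real suite `golden` would be read from a frozen .rds fixture.
  golden <- block_bootstrap_index(10, 3, seed = 1)
  expect_golden(block_bootstrap_index(10, 3, seed = 1), golden)
})
```

Integer index vectors like these can be compared exactly; the tolerance path matters for floating-point goldens such as posterior summaries. The output of sample.int() also depends on R's sampling algorithm, whose default changed in R 3.6.0. That is one reason bit-exact goldens are tied to a specific machine and R version.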

This tiering is why the published package (which excludes the heavy fixtures) still passes R CMD check cleanly: the fixture-dependent and Stan-dependent assertions skip when their prerequisites are absent.
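
The two prerequisites of a Stan-bound test can be folded into one helper. The sketch assumes testthat. skip_unless_stan_gate() is a hypothetical name, while GDPAR_RUN_BLOCK9_DEP_FITS is the environment gate named in the Stan-bound tier.

```r
library(testthat)

# Hypothetical helper: skip unless cmdstanr is installed AND the opt-in
# environment variable is set to "1".
skip_unless_stan_gate <- function(env_var) {
  skip_if_not_installed("cmdstanr")
  skip_if_not(identical(Sys.getenv(env_var), "1"),
              message = paste0("set ", env_var, "=1 to run these fits"))
}

test_that("dependence fits run only when opted in", {
  skip_unless_stan_gate("GDPAR_RUN_BLOCK9_DEP_FITS")
  # A full HMC fit and its assertions would go here.
  succeed()
})
```

On a check machine that has neither cmdstanr nor the variable, the test body is never reached, which is how the published package stays clean without the heavy prerequisites.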

VI.4 Release status

R CMD check --as-cran on the slim published tarball (vignettes and manual built, tests and --run-donttest examples executed) returns Status: 1 NOTE. This is the CRAN incoming-feasibility note, which combines the unavoidable "New submission" message with "Suggests or Enhances not in mainstream repositories: cmdstanr, INLA" (both packages are declared in Additional_repositories). There is no ERROR, no WARNING and no other NOTE. The natural check without vignette building (--no-build-vignettes) returns Status OK, and with vignettes built it is also clean.
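
Both check configurations can be driven from R with the rcmdcheck package. That package is assumed to be installed and is not a gdpar dependency. check_args() and run_check() are hypothetical helpers, and reading "natural check" as a check without --as-cran is an assumption. On recent R versions --as-cran already runs \donttest examples, so the explicit --run-donttest flag is redundant but harmless.

```r
# Hypothetical helpers: assemble R CMD check flags for the two configurations.
check_args <- function(as_cran = TRUE, build_vignettes = TRUE) {
  args <- character()
  if (as_cran) args <- c(args, "--as-cran", "--run-donttest")
  if (!build_vignettes) args <- c(args, "--no-build-vignettes")
  args
}

# rcmdcheck builds the tarball first, then runs R CMD check with `args`.
run_check <- function(path = ".", as_cran = TRUE, build_vignettes = TRUE) {
  rcmdcheck::rcmdcheck(
    path,
    args     = check_args(as_cran, build_vignettes),
    error_on = "warning"   # the expected single NOTE does not fail the run
  )
}

# run_check(".")                                           # release-style
# run_check(".", as_cran = FALSE, build_vignettes = FALSE) # natural check
```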

VI.5 Consolidated references

The theory draws on, among others: Ghosal & van der Vaart (2017) Fundamentals of Nonparametric Bayesian Inference; Schwartz (1965); Ghosal, Ghosh & van der Vaart (2000); van der Vaart (1998) Asymptotic Statistics; Castillo & Rousseau (2015); Petrone, Rousseau & Scricciolo (2014); Rousseau & Szabo (2017); Robbins (1956); Efron (2019); Hastie & Tibshirani (1993); Fan & Zhang (2008); Wood (2017) Generalized Additive Models; Hornik (1991) and Pinkus (1999) (universal approximation); Jacot et al. (2018), Bach (2017) and Dziugaite & Roy (2017) (neural tangent kernel and PAC-Bayes); Lambert (1992) and Greene (1994) (zero-inflation); Teicher (1963) and Patton (2006) (mixture and copula identifiability); Künsch (1989), Politis & White (2004) and Patton, Politis & White (2009) (block bootstrap and automatic block length); Cliff & Ord (spatial autocorrelation); Vehtari, Gelman & Gabry (2017) (PSIS-LOO); Wager & Athey (2018) and Künzel et al. (2019) (heterogeneous treatment effects). Per-topic citation lists appear in the corresponding vignettes (v01–v09, vop01–vop09).
