Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
.DS_Store
vignettes/*.html
vignettes/*.R
inst/validation/snapshots/*.rds
72 changes: 72 additions & 0 deletions inst/validation/README.md
@@ -0,0 +1,72 @@
# Validation Infrastructure

This directory supports the joint g-computation refactor (issues
[#61](https://github.com/EpiModel/ARTnet/issues/61)–[#65](https://github.com/EpiModel/ARTnet/issues/65))
by giving us two things a standard `testthat` suite cannot:

1. A **byte-for-byte reference snapshot** of the output that
`build_netparams()` and `build_netstats()` produce on the pre-refactor
`main` branch, captured once and used as the comparison baseline for every
subsequent commit.
2. A pinned copy of the **downstream consumer** code
(`EpiModelHIV-Template/R/A-networks/`) so we always know exactly which
fields of `netstats` must remain stable — no guessing.

## Files

- `epimodelhiv_template_ref/` — verbatim copies of
`~/git/EpiModelHIV-Template/R/A-networks/{initialize,model_main,model_casl,model_ooff}.R`
taken on 2026-04-19. These are the ERGM specifications that consume
`netstats`; they define the backward-compatibility contract. Do not edit
unless the upstream template changes.
- `netstats_contract.md` — distilled list of exactly which `netstats` fields
the template scripts read, by layer.
- `validate_backward_compat.R` — `capture_snapshot()` and
`compare_to_snapshot()` functions. Run `capture_snapshot()` on pre-refactor
`main`; run `compare_to_snapshot()` on the refactor branch with
`method = "existing"` (or equivalent default) and expect zero diffs.
- `snapshots/` — created on first capture run. `.gitignore`d by default;
the captured `.rds` files are large and should not be checked in.

## Workflow

### Step A — Before starting the refactor (pre-capture)

On the pre-refactor `main` branch, with `ARTnetData` installed:

```r
devtools::load_all() # from the ARTnet repo root
source(system.file("validation/validate_backward_compat.R", package = "ARTnet"))
capture_snapshot() # writes inst/validation/snapshots/*.rds
```

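This saves one snapshot per parameter set (see the `PARAM_SETS` list in
`validate_backward_compat.R`). The snapshot files are ignored by Git by
default (`inst/validation/snapshots/*.rds` in `.gitignore`), so committing
them requires `git add -f`. Do that only if the files are small enough;
otherwise keep them locally and commit a hash digest of each file instead.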
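
A minimal sketch of how such a digest manifest could be produced, assuming MD5 via `tools::md5sum()` is acceptable. The directory, the demo objects, and the `MANIFEST.csv` name are hypothetical and not part of `validate_backward_compat.R`. Note that a file digest reflects the serialized bytes, which may differ across R versions or compression settings even when the underlying object is unchanged.

```r
# Sketch only: `snapshot_dir`, `manifest`, and the demo objects are
# hypothetical; they are not part of validate_backward_compat.R.
snapshot_dir <- file.path(tempdir(), "snapshots_demo")
dir.create(snapshot_dir, showWarnings = FALSE)

# Stand-ins for captured snapshots (real ones hold netparams/netstats).
saveRDS(list(edges = 100), file.path(snapshot_dir, "set_a.rds"))
saveRDS(list(edges = 250), file.path(snapshot_dir, "set_b.rds"))

# One MD5 digest per snapshot file, keyed by file name.
rds_files <- list.files(snapshot_dir, pattern = "\\.rds$", full.names = TRUE)
manifest <- data.frame(
  file = basename(rds_files),
  md5 = unname(tools::md5sum(rds_files)),
  stringsAsFactors = FALSE
)

# A small text file like this is cheap to commit alongside the code.
write.csv(manifest, file.path(snapshot_dir, "MANIFEST.csv"), row.names = FALSE)
print(manifest)
```

A digest mismatch then signals that the local snapshot is not the one the manifest was built from; it does not by itself say which field changed.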

### Step B — During/after the refactor (compare)

On the refactor branch, with the new joint-GLM code in place:

```r
devtools::load_all()
source(system.file("validation/validate_backward_compat.R", package = "ARTnet"))
compare_to_snapshot(method = "existing")
```

The call must report `ALL MATCH` before the PR is considered mergeable.
Any field-level diff is a backward-compatibility regression.

## Why not just `testthat::expect_equal()`?

Two reasons:
1. These runs require `ARTnetData` (private) and take minutes to execute —
they do not belong in CI.
2. `testthat` snapshots are text-based and don't roundtrip well for deeply
nested lists containing S3 objects (`glm`, `lm`, `dissolution_coefs`).
`saveRDS()` + `all.equal()` is the simplest reliable approach here.
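
The `saveRDS()` + `all.equal()` approach can be sketched as follows. This is an illustration under stated assumptions, not the code in `validate_backward_compat.R`: `reference`, `candidate`, `diff_fields()`, and the `demo_diss` class are hypothetical stand-ins. Keep in mind that `all.equal()` compares numeric values within a small tolerance by default, whereas `identical()` is exact.

```r
# Sketch only: `reference`, `candidate`, and `diff_fields()` are hypothetical
# stand-ins, not the objects or helpers used by validate_backward_compat.R.
reference <- list(
  main = list(
    edges = 1234.5,
    nodefactor_race = c(10, 20, 30),
    diss = structure(list(duration = 50, coef.crude = 3.9), class = "demo_diss")
  ),
  inst = list(edges = 42)
)

# Round-trip through an .rds file, as a snapshot would.
path <- tempfile(fileext = ".rds")
saveRDS(reference, path)
snapshot <- readRDS(path)

# Pretend the refactor changed one field.
candidate <- reference
candidate$main$nodefactor_race[2] <- 21

# Compare layer by layer, field by field, and collect the mismatches.
diff_fields <- function(old, new) {
  out <- character(0)
  for (layer in names(old)) {
    for (field in names(old[[layer]])) {
      same <- isTRUE(all.equal(old[[layer]][[field]], new[[layer]][[field]]))
      if (!same) out <- c(out, paste0(layer, "$", field))
    }
  }
  out
}

diffs <- diff_fields(snapshot, candidate)
if (length(diffs) == 0) cat("ALL MATCH\n") else cat("DIFF:", diffs, "\n")
```

Reporting the path of each differing field, rather than a single pass/fail flag, makes a regression easier to trace back to the estimator that produced it.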

Unit tests for individual joint-GLM behaviors (convergence, marginal
recovery, coefficient sanity — see CLAUDE.md §4.5) should still live in
`tests/testthat/` as normal.
53 changes: 53 additions & 0 deletions inst/validation/epimodelhiv_template_ref/initialize.R
@@ -0,0 +1,53 @@
## REFERENCE COPY (2026-04-19) of EpiModelHIV-Template/R/A-networks/initialize.R
## DO NOT EDIT — this exists to pin the downstream consumer contract.
## If the upstream file changes, refresh this copy and update
## `inst/validation/netstats_contract.md`.

## Initialize the ARTnet data objects and the networks to be fitted
##
## This script should not be run directly. But `sourced` by `1-estimation.R`

if (system.file(package = "ARTnetData") == "") {
  message(
    "=================================================================\n",
    "You are currently using the example population provided by ARTnet\n",
    "Install ARTnetData to get all the features.\n",
    "Follow the instructions at the link below to get access to it.\n",
    "https://github.com/EpiModel/ARTnet/tree/main?tab=readme-ov-file#artnetdata-dependency\n",
    "=================================================================\n"
  )

  epistats <- readRDS(system.file("epistats-example.rds", package = "ARTnet"))
  netstats <- readRDS(system.file("netstats-example.rds", package = "ARTnet"))
} else {
  epistats <- build_epistats(
    geog.lvl = "city",
    geog.cat = "Atlanta",
    init.hiv.prev = c(0.33, 0.137, 0.084),
    race = TRUE,
    time.unit = time_unit
  )

  netparams <- build_netparams(
    epistats = epistats,
    smooth.main.dur = TRUE
  )

  netstats <- build_netstats(
    epistats,
    netparams,
    expect.mort = 0.000478213,
    network.size = networks_size
  )
}


nw <- EpiModel::network_initialize(netstats$demog$num)
nw_main <- EpiModel::set_vertex_attribute(
  nw,
  names(netstats$attr),
  netstats$attr
)

nw_casl <- nw_main
nw_inst <- nw_main
43 changes: 43 additions & 0 deletions inst/validation/epimodelhiv_template_ref/model_casl.R
@@ -0,0 +1,43 @@
## REFERENCE COPY (2026-04-19) of EpiModelHIV-Template/R/A-networks/model_casl.R
## DO NOT EDIT — pins the ERGM specification consuming netstats$casl.

## Define and fit the *casual* network model
##
## This script should not be run directly. But `sourced` by `1-estimation.R`

# Formula
model_casl <- ~ edges +
  nodematch("age.grp", diff = TRUE) +
  nodefactor("age.grp", levels = -5) +
  nodematch("race", diff = FALSE) +
  nodefactor("race", levels = -1) +
  nodefactor("deg.main", levels = -3) +
  concurrent +
  degrange(from = 4) +
  nodematch("role.class", diff = TRUE, levels = c(1, 2))

# Target Stats
netstats_casl <- c(
  edges = netstats$casl$edges,
  nodematch_age.grp = netstats$casl$nodematch_age.grp,
  nodefactor_age.grp = netstats$casl$nodefactor_age.grp[-5],
  nodematch_race = netstats$casl$nodematch_race_diffF,
  nodefactor_race = netstats$casl$nodefactor_race[-1],
  nodefactor_deg.main = netstats$casl$nodefactor_deg.main[-3],
  concurrent = netstats$casl$concurrent,
  degrange = 0,
  nodematch_role.class = c(0, 0)
) |> unname()

# Fit model
fit_casl <- EpiModel::netest(
  nw_casl,
  formation = model_casl,
  target.stats = netstats_casl,
  coef.diss = netstats$casl$diss.byage,
  set.control.ergm = control_ergm,
  verbose = FALSE
) |> trim_netest()

# Keep only the necessary objects
rm(model_casl, netstats_casl)
43 changes: 43 additions & 0 deletions inst/validation/epimodelhiv_template_ref/model_main.R
@@ -0,0 +1,43 @@
## REFERENCE COPY (2026-04-19) of EpiModelHIV-Template/R/A-networks/model_main.R
## DO NOT EDIT — pins the ERGM specification consuming netstats$main.

## Define and fit the *main* network model
##
## This script should not be run directly. But `sourced` by `1-estimation.R`

# Formula
model_main <- ~ edges +
  nodematch("age.grp", diff = TRUE) +
  nodefactor("age.grp", levels = -1) +
  nodematch("race", diff = FALSE) +
  nodefactor("race", levels = -1) +
  nodefactor("deg.casl", levels = -1) +
  concurrent +
  degrange(from = 3) +
  nodematch("role.class", diff = TRUE, levels = c(1, 2))

# Target Stats
netstats_main <- c(
  edges = netstats$main$edges,
  nodematch_age.grp = netstats$main$nodematch_age.grp,
  nodefactor_age.grp = netstats$main$nodefactor_age.grp[-1],
  nodematch_race = netstats$main$nodematch_race_diffF,
  nodefactor_race = netstats$main$nodefactor_race[-1],
  nodefactor_deg.casl = netstats$main$nodefactor_deg.casl[-1],
  concurrent = netstats$main$concurrent,
  degrange = 0,
  nodematch_role.class = c(0, 0)
) |> unname()

# Fit model
fit_main <- EpiModel::netest(
  nw_main,
  formation = model_main,
  target.stats = netstats_main,
  coef.diss = netstats$main$diss.byage,
  set.control.ergm = control_ergm,
  verbose = FALSE
) |> EpiModel::trim_netest()

# Keep only the necessary objects
rm(model_main, netstats_main)
41 changes: 41 additions & 0 deletions inst/validation/epimodelhiv_template_ref/model_ooff.R
@@ -0,0 +1,41 @@
## REFERENCE COPY (2026-04-19) of EpiModelHIV-Template/R/A-networks/model_ooff.R
## DO NOT EDIT — pins the ERGM specification consuming netstats$inst.

## Define and fit the *one-off* network model
##
## This script should not be run directly. But `sourced` by `1-estimation.R`

# Formula
model_ooff <- ~ edges +
  nodematch("age.grp", diff = FALSE) +
  nodefactor("age.grp", levels = -1) +
  nodematch("race", diff = FALSE) +
  nodefactor("race", levels = -1) +
  nodefactor("risk.grp", levels = -5) +
  nodefactor("deg.tot", levels = -1) +
  nodematch("role.class", diff = TRUE, levels = c(1, 2))

# Target Stats
netstats_ooff <- c(
  edges = netstats$inst$edges,
  nodematch_age.grp = sum(netstats$inst$nodematch_age.grp),
  nodefactor_age.grp = netstats$inst$nodefactor_age.grp[-1],
  nodematch_race = netstats$inst$nodematch_race_diffF,
  nodefactor_race = netstats$inst$nodefactor_race[-1],
  nodefactor_risk.grp = netstats$inst$nodefactor_risk.grp[-5],
  nodefactor_deg.tot = netstats$inst$nodefactor_deg.tot[-1],
  nodematch_role.class = c(0, 0)
) |> unname()

# Fit model
fit_ooff <- EpiModel::netest(
  nw_inst,
  formation = model_ooff,
  target.stats = netstats_ooff,
  coef.diss = dissolution_coefs(~ offset(edges), 1),
  set.control.ergm = control_ergm,
  verbose = FALSE
) |> trim_netest()

# Keep only the necessary objects
rm(model_ooff, netstats_ooff)
60 changes: 60 additions & 0 deletions inst/validation/netstats_contract.md
@@ -0,0 +1,60 @@
# netstats Backward-Compatibility Contract

The `netstats` object returned by `build_netstats()` is consumed by the
ERGM estimation scripts in `EpiModelHIV-Template/R/A-networks/` (verbatim
copy pinned in `epimodelhiv_template_ref/`). These are the fields the
template reads — they must remain byte-identical under
`method = "existing"` (or whatever we name the legacy flag).

Snapshot taken 2026-04-19 against EpiModelHIV-Template@main.

## `initialize.R`
- `netstats$demog$num` — network size (scalar integer)
- `netstats$attr` — named list of vertex attributes (age, sqrt.age,
age.grp, active.sex, race, deg.casl, deg.main, deg.tot, risk.grp,
role.class, diag.status). The attribute vectors are constructed via
`sample()` / `apportion_lr()` / `rbinom()` and depend on RNG state.
Validation must `set.seed()` before comparison.
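
The RNG dependence can be illustrated with a small sketch. Here `draw_attr()` is a hypothetical stand-in for the attribute construction inside `build_netstats()`, and the seed value is arbitrary. The same seed reproduces the same draws only when the RNG settings (see `RNGkind()`) also match between the capture and compare runs.

```r
# Sketch only: `draw_attr()` is a hypothetical stand-in for the RNG-dependent
# attribute construction inside build_netstats().
draw_attr <- function(n) {
  list(
    race = sample(1:3, n, replace = TRUE),
    diag.status = rbinom(n, size = 1, prob = 0.2)
  )
}

set.seed(20260419)
first <- draw_attr(1000)

set.seed(20260419)        # same seed, same RNG stream
second <- draw_attr(1000)

third <- draw_attr(1000)  # no reseed: the RNG state has advanced

identical(first, second)  # TRUE: reproducible under a fixed seed
identical(first, third)   # almost surely differs from the seeded draw
```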

## `model_main.R`
- `netstats$main$edges`
- `netstats$main$nodematch_age.grp` (vector, one per age group)
- `netstats$main$nodefactor_age.grp` (vector, one per age group)
- `netstats$main$nodematch_race_diffF` (scalar)
- `netstats$main$nodefactor_race` (vector, one per race group)
- `netstats$main$nodefactor_deg.casl` (vector, one per deg.casl level)
- `netstats$main$concurrent` (scalar)
- `netstats$main$diss.byage` — `dissolution_coefs` S3 object

## `model_casl.R`
- `netstats$casl$edges`
- `netstats$casl$nodematch_age.grp`
- `netstats$casl$nodefactor_age.grp`
- `netstats$casl$nodematch_race_diffF`
- `netstats$casl$nodefactor_race`
- `netstats$casl$nodefactor_deg.main`
- `netstats$casl$concurrent`
- `netstats$casl$diss.byage`

## `model_ooff.R`
- `netstats$inst$edges`
- `netstats$inst$nodematch_age.grp`
- `netstats$inst$nodefactor_age.grp`
- `netstats$inst$nodematch_race_diffF`
- `netstats$inst$nodefactor_race`
- `netstats$inst$nodefactor_risk.grp`
- `netstats$inst$nodefactor_deg.tot`

## Not directly consumed but still part of the contract
Anything else currently in `netstats$*` — `nodematch_race`,
`absdiff_age`, `absdiff_sqrt.age`, etc. — is also part of the contract
by default because the package ships it publicly. The validation script
does a full-object diff rather than checking only the fields above.
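
As a complement to the full-object diff, the consumed fields listed above could also be checked for presence. The sketch below is an assumption about how such a check might look: `required`, `missing_fields()`, and the toy object are hypothetical, and the field names are transcribed from the per-script lists in this file.

```r
# Sketch only: `required` and `missing_fields()` are hypothetical helpers.
# Field names are copied from the per-script lists in this contract.
required <- list(
  main = c("edges", "nodematch_age.grp", "nodefactor_age.grp",
           "nodematch_race_diffF", "nodefactor_race",
           "nodefactor_deg.casl", "concurrent", "diss.byage"),
  casl = c("edges", "nodematch_age.grp", "nodefactor_age.grp",
           "nodematch_race_diffF", "nodefactor_race",
           "nodefactor_deg.main", "concurrent", "diss.byage"),
  inst = c("edges", "nodematch_age.grp", "nodefactor_age.grp",
           "nodematch_race_diffF", "nodefactor_race",
           "nodefactor_risk.grp", "nodefactor_deg.tot")
)

# Return "layer$field" for every consumed field absent from `netstats`.
missing_fields <- function(netstats, required) {
  out <- character(0)
  for (layer in names(required)) {
    absent <- setdiff(required[[layer]], names(netstats[[layer]]))
    if (length(absent) > 0) out <- c(out, paste0(layer, "$", absent))
  }
  out
}

# Toy object with placeholder values, missing one consumed field.
toy <- lapply(required, function(f) setNames(as.list(seq_along(f)), f))
toy$casl$concurrent <- NULL
missing_fields(toy, required)
```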

## `netparams` contract
The validation also captures `netparams` whole (the input to
`build_netstats`). Joint models are *additive* outputs (new
`$joint_model` fields), so:
- existing `netparams$main$*`, `$casl$*`, `$inst$*`, `$all$*` fields must be
byte-identical under `method = "existing"`;
- new fields (`$joint_model`) are ignored during comparison.
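
One way to ignore the additive fields is to strip them before comparing. This is a sketch under assumptions, not the validation script's implementation: `drop_fields()` and the toy objects (including the `md.main` and `md.casl` names) are hypothetical; only the `joint_model` field name comes from the text above.

```r
# Sketch only: `drop_fields()` and the toy objects are hypothetical; the
# field name "joint_model" is taken from the contract text above.
drop_fields <- function(x, drop = "joint_model") {
  # Only recurse into plain lists; leave fitted models and other classed
  # objects untouched.
  if (!is.list(x) || is.object(x)) return(x)
  x[names(x) %in% drop] <- NULL
  lapply(x, drop_fields, drop = drop)
}

old <- list(main = list(md.main = 0.4), casl = list(md.casl = 0.5))
new <- old
new$main$joint_model <- list(coef = c(a = 1, b = 2))
new$joint_model <- "top-level addition"

isTRUE(all.equal(old, new))               # FALSE: extra fields present
isTRUE(all.equal(old, drop_fields(new)))  # TRUE once they are stripped
```

Stripping by name keeps the legacy comparison strict for everything else, at the cost of silently ignoring any legacy field that happens to share the dropped name.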