
Method comparison validation suite (#65, Phase 1.5) #76

Merged
smjenness merged 1 commit into main from feature/method-comparison-validation
Apr 25, 2026
Conversation

@smjenness
Contributor

Closes #65. Now that the joint g-comp refactor is fully landed (PRs #67 / #68 / #69 / #71 / #74), this PR adds the systematic marginal-vs-joint comparison the validation phase calls for.

What's new

inst/validation/method_comparison.R — three entry points, designed to be sourced via system.file():

  source(system.file("validation/method_comparison.R", package = "ARTnet"))
  res <- compare_methods()              # full suite, ~30s
  summarize_comparison(res)             # console summary
  render_comparison_report(res)         # writes inst/validation/method_comparison.md

The harness walks every numeric target stat in netstats per layer (edges, nodefactor_*, nodematch_*, absdiff_*, concurrent, dissolution_duration) and produces a long-format data.frame with one row per (scenario, layer, stat, level) cell: existing and joint values side by side, plus abs_diff and pct_diff.
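As a minimal sketch of the per-cell arithmetic (this is an illustration, not the harness code; the values and the exact column construction here are hypothetical):

  # One comparison cell: abs_diff and pct_diff as differences of the
  # joint value relative to the existing value.
  cell <- data.frame(
    scenario = "atlanta_default", layer = "main",
    stat = "edges", level = NA_character_,
    existing = 1520.4, joint = 1498.7   # hypothetical target-stat values
  )
  cell$abs_diff <- cell$joint - cell$existing
  cell$pct_diff <- 100 * cell$abs_diff / cell$existing

Stacking one such row per (scenario, layer, stat, level) cell yields the long-format data.frame described above.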

Four scenarios:

  Scenario                Description
  atlanta_default         Baseline EpiModelHIV-Template config (Atlanta, race = TRUE)
  national_no_geog        No geographic stratification (sanity check)
  atlanta_nhbs_shifted    Atlanta config with race.prop = c(0.35, 0.25, 0.40) (NHBS-MSM-like)
  atlanta_no_race         race = FALSE path (sanity check)

inst/validation/method_comparison.md is the canonical rendered report of this run, committed to the repo so anyone can read the comparison without running the harness. It will feed methods-paper drafting work directly.

Headline findings (full breakdown in method_comparison.md)

  • 363 target-stat cells across 4 scenarios
  • 229 cells (~63%) shift > 5% under joint vs existing
  • Per scenario:
    • atlanta_default: 63 / 96 cells materially shifted
    • national_no_geog: 51 / 96
    • atlanta_nhbs_shifted: 66 / 96 (slightly more than default — NHBS race mix stresses the correction further)
    • atlanta_no_race: 49 / 75 (race off, but dyad-level corrections still drive the divergence)

Largest shifts cluster around dissolution_duration (matched.5 main: -47%) and inst nodematch_age.grp[5] (-51% in atlanta_default).

What it confirms

  • The joint refactor is doing material work, not a no-op: a substantial fraction of every layer's target stats is corrected.
  • The correction direction is consistent across scenarios (e.g., main dissolution durations always pulled down for older matched strata, whether we're looking at default Atlanta, NHBS-shifted, or national).
  • atlanta_nhbs_shifted sees more material shifts than atlanta_default, consistent with the marginal-vs-joint critique: the bigger the gap between target-population and ARTnet sample composition, the more the correction matters.

Tests

tests/testthat/test-method-comparison.R (4 blocks): structure check, arithmetic check, dissolution_duration scoping, baseline divergence sanity. Uses a 2000-node mini scenario for speed; full suite stays at 5000.
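A hedged sketch of what the arithmetic-check block could look like (the actual test file may differ; the compare_methods() argument for the 2000-node mini scenario is assumed, not confirmed by this PR):

  library(testthat)

  test_that("abs_diff and pct_diff are internally consistent", {
    # Argument name for the mini scenario is an assumption.
    res <- compare_methods(network.size = 2000)
    expect_equal(res$abs_diff, res$joint - res$existing)
    expect_equal(res$pct_diff, 100 * res$abs_diff / res$existing)
  })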

Test plan

  • Backward-compat snapshot harness 3/3
  • Comparison runs without error across all 4 scenarios
  • Output structure matches spec (long format, all expected stats)
  • Unit tests pass (4/4); full suite 546/546 passes
  • R CMD check 0/0/0
  • inst/validation/method_comparison.md rendered and committed

Closes #65. Logical follow-up: #64 (post-stratification API) — with these comparison results in hand, we know the magnitudes of population-shift effects and can size the API surface accordingly.

Systematic side-by-side comparison of method = "existing" vs
method = "joint" + duration.method = "joint_lm" across four
scenarios:

- atlanta_default        Baseline EpiModelHIV-Template config
- national_no_geog       No geographic stratification (sanity)
- atlanta_nhbs_shifted   Atlanta with race.prop = c(0.35, 0.25, 0.40)
- atlanta_no_race        race = FALSE path (sanity)

Adds inst/validation/method_comparison.R with three entry points:

  source(system.file("validation/method_comparison.R", package = "ARTnet"))
  res <- compare_methods()              # runs full suite (~30s)
  summarize_comparison(res)             # console summary, top shifts
  render_comparison_report(res)         # writes inst/validation/method_comparison.md

The harness walks every numeric target stat in netstats per layer
(edges, nodefactor_*, nodematch_*, absdiff_*, concurrent,
dissolution_duration) and produces a long-format data.frame with
columns scenario, layer, stat, level, existing, joint, abs_diff,
pct_diff. Output is suitable both for interactive inspection and
for serializing to publications/vignettes.

Headline findings (full table in method_comparison.md):

  - 363 target-stat cells across 4 scenarios
  - 229 (~63%) shift > 5% under joint vs existing
  - Largest shifts on dissolution_duration (matched.5 main: -47%)
    and inst nodematch_age.grp[5] (-51% in atlanta_default)
  - Population shift (atlanta_nhbs_shifted) produces 66/96 cells
    >5% shifted, slightly more than atlanta_default's 63/96 -- the
    NHBS-like race composition stresses the marginal-vs-joint
    correction further than Atlanta's default
  - atlanta_no_race shifts 49/75 cells; without race in the joint
    formulas, the dyad-level corrections still dominate

The file inst/validation/method_comparison.md is the canonical report that
will feed any methods-paper drafting work.

New tests in tests/testthat/test-method-comparison.R (4 blocks):
verify long-format structure; abs_diff and pct_diff arithmetic;
dissolution_duration restricted to main/casl; at least one cell
materially shifted on Atlanta default. Use a 2000-node mini
scenario for speed; full suite in inst/validation/ stays at 5000.

Closes #65 (Phase 1.5 validation suite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@smjenness smjenness merged commit 4c539b7 into main Apr 25, 2026
1 check passed
@smjenness smjenness deleted the feature/method-comparison-validation branch April 25, 2026 18:28


Development

Successfully merging this pull request may close these issues.

[Phase 1.5] Validation suite: marginal vs joint_gcomp on ARTnet 2017-18
