Method comparison validation suite (#65, Phase 1.5) #76
Merged
Systematic side-by-side comparison of method = "existing" vs
method = "joint" + duration.method = "joint_lm" across four
scenarios:
- atlanta_default: baseline EpiModelHIV-Template config
- national_no_geog: no geographic stratification (sanity check)
- atlanta_nhbs_shifted: Atlanta with race.prop = c(0.35, 0.25, 0.40)
- atlanta_no_race: race = FALSE path (sanity check)
Adds inst/validation/method_comparison.R with three entry points:
source(system.file("validation/method_comparison.R", package = "ARTnet"))
res <- compare_methods() # runs full suite (~30s)
summarize_comparison(res) # console summary, top shifts
render_comparison_report(res) # writes inst/validation/method_comparison.md
The harness walks every numeric target stat in netstats per layer
(edges, nodefactor_*, nodematch_*, absdiff_*, concurrent,
dissolution_duration) and produces a long-format data.frame with
columns scenario, layer, stat, level, existing, joint, abs_diff,
pct_diff. Output is suitable both for interactive inspection and
for serializing to publications/vignettes.
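A minimal sketch of the long-format layout described above, using toy numbers rather than ARTnet output. The helper compare_layer() and its inputs are hypothetical, and two conventions are assumptions here: abs_diff is taken as the signed difference joint - existing, and pct_diff as that difference relative to the existing value, in percent.

```r
# Toy stand-ins for one layer's target stats under each method.
# Values are illustrative only, not ARTnet output.
existing <- list(edges = 2000, nodematch_age.grp = c(100, 80, 60))
joint    <- list(edges = 1900, nodematch_age.grp = c(90, 84, 45))

# Hypothetical helper: flatten one (scenario, layer) pair into long format,
# one row per (stat, level) cell.
compare_layer <- function(existing, joint, scenario, layer) {
  rows <- lapply(names(existing), function(stat) {
    e <- existing[[stat]]
    j <- joint[[stat]]
    data.frame(
      scenario = scenario,
      layer    = layer,
      stat     = stat,
      level    = seq_along(e),
      existing = e,
      joint    = j,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # Assumed convention: signed difference, joint minus existing.
  out$abs_diff <- out$joint - out$existing
  # Assumed convention: percent change relative to the existing value.
  out$pct_diff <- 100 * out$abs_diff / out$existing
  out
}

cmp <- compare_layer(existing, joint, "toy_scenario", "main")
print(cmp)
```

Keeping one row per cell means the same frame can be filtered interactively or written out as a table without reshaping.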
Headline findings (full table in method_comparison.md):
- 363 target-stat cells across 4 scenarios
- 229 (~63%) shift > 5% under joint vs existing
- Largest shifts on dissolution_duration (matched.5 main: -47%)
and inst nodematch_age.grp[5] (-51% in atlanta_default)
- Population shift (atlanta_nhbs_shifted) produces 66/96 cells
>5% shifted, slightly more than atlanta_default's 63/96 -- the
NHBS-like race composition stresses the marginal-vs-joint
correction further than Atlanta's default
- atlanta_no_race shifts 49/75 cells; without race in the joint
formulas, the dyad-level corrections still dominate
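The headline totals can be cross-checked from the per-scenario counts reported in this PR. The sketch below also includes a hypothetical helper, count_shifted(), showing one way such counts could be tallied from a long-format frame with scenario and pct_diff columns; the 5% threshold is taken from the findings above, and treating it as an absolute-value cutoff is an assumption.

```r
# Per-scenario counts reported in this PR: cells shifted > 5% and total cells.
counts <- data.frame(
  scenario = c("atlanta_default", "national_no_geog",
               "atlanta_nhbs_shifted", "atlanta_no_race"),
  shifted  = c(63, 51, 66, 49),
  total    = c(96, 96, 96, 75),
  stringsAsFactors = FALSE
)
counts$pct_shifted <- round(100 * counts$shifted / counts$total, 1)

total_shifted <- sum(counts$shifted)                          # 229
total_cells   <- sum(counts$total)                            # 363
overall_pct   <- round(100 * total_shifted / total_cells, 1)  # 63.1

# Hypothetical helper: count cells per scenario whose |pct_diff| exceeds
# the threshold (assumed to be an absolute-value cutoff).
count_shifted <- function(res, threshold = 5) {
  tapply(abs(res$pct_diff) > threshold, res$scenario, sum, na.rm = TRUE)
}

# Toy frame to exercise the helper; values are illustrative only.
toy <- data.frame(
  scenario = c("a", "a", "b"),
  pct_diff = c(-10, 2, 7)
)
print(count_shifted(toy))
print(counts)
```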
inst/validation/method_comparison.md is the canonical report and
will feed any methods-paper drafting work.
New tests in tests/testthat/test-method-comparison.R (4 blocks):
verify long-format structure; abs_diff and pct_diff arithmetic;
dissolution_duration restricted to main/casl; at least one cell
materially shifted on Atlanta default. The tests use a 2000-node mini
scenario for speed; the full suite in inst/validation/ stays at 5000 nodes.
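The four checks listed above could look roughly like the following testthat sketch. It is not the contents of test-method-comparison.R: the res frame is a hand-built stand-in for the harness output, the column conventions for abs_diff and pct_diff are assumed, and the testthat package is assumed to be installed.

```r
library(testthat)

# Hypothetical stand-in for the harness output; values are illustrative.
res <- data.frame(
  scenario = "toy_scenario",
  layer    = c("main", "casl", "inst", "main"),
  stat     = c("edges", "edges", "edges", "dissolution_duration"),
  level    = c(1L, 1L, 1L, 1L),
  existing = c(2000, 1500, 300, 400),
  joint    = c(1900, 1560, 270, 212),
  stringsAsFactors = FALSE
)
# Assumed conventions: signed difference and percent change vs existing.
res$abs_diff <- res$joint - res$existing
res$pct_diff <- 100 * res$abs_diff / res$existing

test_that("output has the long-format columns", {
  expect_named(res, c("scenario", "layer", "stat", "level",
                      "existing", "joint", "abs_diff", "pct_diff"))
})

test_that("abs_diff and pct_diff arithmetic is consistent", {
  expect_equal(res$abs_diff, res$joint - res$existing)
  expect_equal(res$pct_diff, 100 * res$abs_diff / res$existing)
})

test_that("dissolution_duration appears only on main/casl layers", {
  dur_layers <- unique(res$layer[res$stat == "dissolution_duration"])
  expect_true(all(dur_layers %in% c("main", "casl")))
})

test_that("at least one cell shifts by more than 5%", {
  expect_true(any(abs(res$pct_diff) > 5))
})
```

In a real test file the stand-in frame would be replaced by a call to the harness on the mini scenario, so the arithmetic check compares stored columns against values recomputed in the test.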
Closes #65 (Phase 1.5 validation suite).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #65. Now that the joint g-comp refactor is fully landed (PRs #67 / #68 / #69 / #71 / #74), this PR adds the systematic marginal-vs-joint comparison the validation phase calls for.
What's new
- inst/validation/method_comparison.R: three entry points (compare_methods(), summarize_comparison(), render_comparison_report()), designed to be sourced via system.file(). The harness walks every numeric target stat in netstats per layer (edges, nodefactor_*, nodematch_*, absdiff_*, concurrent, dissolution_duration) and produces a long-format data.frame with one row per (scenario, layer, stat, level) cell: existing and joint values side by side, plus abs_diff and pct_diff.
- Four scenarios:
  - atlanta_default: baseline EpiModelHIV-Template config
  - national_no_geog: no geographic stratification (sanity check)
  - atlanta_nhbs_shifted: race.prop = c(0.35, 0.25, 0.40) (NHBS-MSM-like)
  - atlanta_no_race: race = FALSE path (sanity check)
- inst/validation/method_comparison.md is the canonical rendered report of this run, committed to the repo so anyone can read the comparison without running the harness. It will feed methods-paper drafting work directly.
Headline findings (full breakdown in method_comparison.md)
- atlanta_default: 63 / 96 cells materially shifted
- national_no_geog: 51 / 96
- atlanta_nhbs_shifted: 66 / 96 (slightly more than default; the NHBS race mix stresses the correction further)
- atlanta_no_race: 49 / 75 (race off, but dyad-level corrections still drive the divergence)
Largest shifts cluster around:
- dissolution_duration matched-and-old strata (matched.5 main: -47% across all scenarios). This is the tradeoff between a multivariate fit and a stratum-only empirical estimate with smoothing, documented in PR #71 (Duration methods: empirical + joint_lm, #63 phase 3).
- inst$nodematch_age.grp[5]: small targets dominated by AIC-selected interactions in the joint dyad fits from PR #69 (Joint dyad-level modeling: nodematch + absdiff, #63 phases 1 & 2).
- casl$nodefactor_deg.main[3] (+35–43%): high-deg.main outliers are sparse in ARTnet, so the Poisson joint fit produces noticeably different per-stratum sums than the marginal × deg.main.dist multiplication.
What it confirms
- atlanta_nhbs_shifted sees more material shifts than atlanta_default, consistent with the marginal-vs-joint critique: the bigger the gap between target-population and ARTnet sample composition, the more the correction matters.
Tests
- tests/testthat/test-method-comparison.R (4 blocks): structure check, arithmetic check, dissolution_duration scoping, baseline divergence sanity. Uses a 2000-node mini scenario for speed; the full suite stays at 5000.
Test plan
- inst/validation/method_comparison.md rendered and committed
Closes #65. Logical follow-up: #64 (post-stratification API). With these comparison results in hand, we know the magnitudes of population-shift effects and can size the API surface accordingly.