target_pop API: list / data.frame / character forms (#64, Phase 1.4) #77
Merged
Adds a single `target_pop` argument to `build_netstats()` that accepts four forms and unifies post-stratification of the synthetic target population:

1. **NULL (default)** -- legacy patchwork-of-references behavior, byte-identical to pre-#64. Verified via the inst/validation/ snapshot harness (3/3 match on default and explicit method = "existing").
2. **Named list** -- per-marginal overrides for any subset of {age.pyramid, race.prop, deg.casl, deg.main, deg.tot, role.class, risk.grp}. Names not in the list fall through to existing defaults. The list form supersedes the older one-arg-at-a-time approach (age.pyramid, race.prop) and extends the override surface to the per-attribute distributions previously sourced from netparams.
3. **data.frame** -- one row per node, columns supplying user-specified joint attribute values. Required: age, deg.casl, deg.main, role.class, risk.grp (plus race when race = TRUE). Optional with derivation: sqrt.age, age.grp, active.sex, deg.tot, diag.status. When supplied, attribute sampling is bypassed entirely and `network.size` is set to `nrow(target_pop)`. Designed for users with a fully-specified joint target population (NHBS / AMIS post-stratification, custom synthetic cohorts).
4. **Character** (e.g., target_pop = "nhbs_msm_2022") -- raises an informative not-yet-implemented error. Built-in reference data ships via ARTnetData and requires PI coordination; tracked as a future extension on this issue.

Implementation:
- New private helper `.parse_target_pop()` in R/NetStats.R does form detection, column / element validation, and normalization (e.g., race.props -> race.prop alias).
- The Nodal Attribute Initialization block is restructured into an if/else: the data.frame form pulls attributes directly into the same attr_* locals the sampling path produces; the sampling path applies list-form distribution overrides via `.dist_*` locals.
- The diag.status block honors a user-supplied diag.status column when the data.frame form provides one; otherwise it falls through to the epistats-based draw (init.hiv.prev or hiv.mod) on the user's attribute vectors.
- Common attr assignments are factored out so both paths share one place where out$attr is populated.

New tests: tests/testthat/test-target-pop.R (12 blocks, 25 assertions) covering: NULL byte-identical to no-arg; list form with race.prop, race.props alias, deg.casl override; unknown list element error; data.frame form attribute pass-through; deg.tot cap derivation; missing-required-column error; diag.status fallback; composition with method = "joint"; character-form error; non-list/df/char input error.

Validation:
- Backward-compat snapshot harness: 3/3 match on default and explicit method = "existing".
- Full testthat suite: 571 / 571 pass.
- R CMD check: 0 errors / 0 warnings / 0 notes.
- Manual exercise of all four forms (NULL, list, data.frame, character) produces expected behavior, including correct error messages.

Closes #64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
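The form detection described in this commit can be pictured with a minimal sketch. This is not the package's `.parse_target_pop()`: the function name `parse_target_pop_sketch`, the return structure, and the error wording are all assumptions made for illustration. Only the accepted names, the required columns, and the `race.props` alias come from the commit message.

```r
# Hypothetical sketch of form detection; NOT the package's .parse_target_pop().
parse_target_pop_sketch <- function(target_pop, race = TRUE) {
  list_names <- c("age.pyramid", "race.prop", "deg.casl", "deg.main",
                  "deg.tot", "role.class", "risk.grp")
  df_required <- c("age", "deg.casl", "deg.main", "role.class", "risk.grp")
  if (race) df_required <- c(df_required, "race")

  # NULL form: caller keeps the legacy defaults.
  if (is.null(target_pop)) {
    return(list(form = "null"))
  }

  # Test data.frame before list: is.list() is also TRUE for a data.frame.
  if (is.data.frame(target_pop)) {
    missing_cols <- setdiff(df_required, names(target_pop))
    if (length(missing_cols) > 0) {
      stop("target_pop is missing required column(s): ",
           paste(missing_cols, collapse = ", "), call. = FALSE)
    }
    return(list(form = "data.frame", data = target_pop,
                network.size = nrow(target_pop)))
  }

  if (is.list(target_pop)) {
    nms <- names(target_pop)
    if (length(target_pop) > 0 && (is.null(nms) || any(nms == ""))) {
      stop("list-form target_pop must be fully named", call. = FALSE)
    }
    # Normalize the race.props alias to race.prop.
    nms[nms == "race.props"] <- "race.prop"
    names(target_pop) <- nms
    unknown <- setdiff(nms, list_names)
    if (length(unknown) > 0) {
      stop("unknown target_pop element(s): ",
           paste(unknown, collapse = ", "), call. = FALSE)
    }
    return(list(form = "list", overrides = target_pop))
  }

  if (is.character(target_pop) && length(target_pop) == 1) {
    stop("character-form target_pop is not yet implemented", call. = FALSE)
  }

  stop("target_pop must be NULL, a list, a data.frame, or a character string",
       call. = FALSE)
}
```

The ordering of the checks matters in R because a data.frame is itself a list, so testing `is.list()` first would route the data.frame form into the list branch.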
PR review feedback: the original character-form error message and docstring used 'nhbs_msm_2022' as a placeholder name from issue #64's example. NHBS microdata is restricted and not appropriate for an ARTnetData-shipped reference. The realistic plan is geography-specific general male population demographics: NCHS age pyramid (already in build_netstats) + race composition from ARTnetData::race.dist (already in the package) per city / state / region. No restricted data needed; bundles like "atlanta" or "us_msm_male" would just package what's already there into named entry points.

Updates:
- Error message no longer references NHBS; it describes the actual planned set (NCHS + ARTnetData::race.dist by geography).
- Roxygen @param doc rewritten to match.
- Test uses target_pop = "atlanta" (a realistic future bundle name) instead of the speculative NHBS example.

No code path change; only the user-facing strings and one test trigger value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
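One way to picture the "named entry points" idea is a small registry that maps a bundle name to a list-form `target_pop`. The sketch below is hypothetical: `bundle_registry_sketch` and `resolve_bundle_sketch` are invented names, the vector lengths are arbitrary, and every numeric value is a uniform placeholder rather than a real NCHS or `ARTnetData::race.dist` figure.

```r
# Hypothetical bundle registry. All numeric values are placeholders,
# NOT real NCHS or ARTnetData::race.dist figures.
bundle_registry_sketch <- list(
  atlanta = list(
    age.pyramid = rep(1 / 4, 4),       # placeholder: uniform over 4 bins
    race.prop   = c(1 / 3, 1 / 3, 1 / 3) # placeholder: uniform over 3 groups
  )
)

# Resolve a bundle name to its list-form overrides, or fail informatively.
resolve_bundle_sketch <- function(name, registry = bundle_registry_sketch) {
  stopifnot(is.character(name), length(name) == 1)
  if (!name %in% names(registry)) {
    stop("Unknown target_pop bundle '", name, "'. Available: ",
         paste(names(registry), collapse = ", "), call. = FALSE)
  }
  registry[[name]]
}

atlanta_overrides <- resolve_bundle_sketch("atlanta")
```

Because the resolved value is an ordinary named list, a design like this would let the character form reuse whatever validation the list form already goes through.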
4 tasks
smjenness added a commit that referenced this pull request on Apr 25, 2026
A 2,800-word standalone writeup at inst/validation/method_refactor_report.md documenting the methodological refactor delivered by PRs #66-#77. Structured as introduction / methods / results / discussion + references + reproducibility section.

Sections cover:
- Intro: ARTnet's role in EpiModelHIV-p; the marginal-vs-joint problem the legacy univariate approach exposed; the ARTnetPredict motivation for fixing the within-ARTnet baseline before forward projection.
- Methods: the three new arguments (`method`, `duration.method`, `target_pop`); per-layer joint Poisson + binomial + Gaussian + log-linear fits; g-computation aggregation in build_netstats; the cross-sectional age-of-extant-ties target for dissolution; the validation infrastructure (snapshot harness, method comparison, GHA CI).
- Results: 229/363 cells (63%) shift > 5% across four scenarios; worst shifts on dissolution durations in matched-and-old strata (-47%), one-time nodematch in older age groups (-51%), and high-deg.main casual nodefactor (+40%); decomposition of the -15% Atlanta main-edges shift attributed to ARTnet's 80.7% White vs Atlanta's 51.5% Black composition; coefficient strengthening on deg.casl (-0.24 -> -0.55), hiv2 (+0.09 -> +0.25), age slope, and the AIC-selected age:deg.casl interaction; end-to-end ERGM convergence with netdx |Z| <= 2.05 across 1000 sims.
- Discussion: implications for EpiModelHIV-p simulations (Atlanta-specific models over-target main edges by 15%); three explicit limitations (geometric tergm dissolution can't honor Weibull k != 1, length-bias and 5-truncation in formation stats not yet addressed in #72, joint_lm uses ongoing partnerships only); ARTnetPredict's three unblocked next steps (corrected 2017-18 baseline, 2022-24 AMIS projection via target_pop data.frame, NHBS post-stratification as a one-line argument); methods paper outline.

Numbers cited are spot-checked against the committed inst/validation/method_comparison.md to ensure the report and the machine-generated comparison agree.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
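The "shift > 5%" summary in the Results bullet is a relative-change calculation per target-statistic cell. The sketch below shows the arithmetic on toy inputs; `pct_shift` is a hypothetical helper, and the `baseline` / `updated` vectors are invented for illustration and are not values from method_comparison.md. Only the 229/363 count comes from the commit message.

```r
# Relative shift (%) of each target statistic under a new method vs. a baseline.
pct_shift <- function(baseline, updated) {
  stopifnot(length(baseline) == length(updated), all(baseline != 0))
  100 * (updated - baseline) / baseline
}

# Toy inputs for illustration only; not values from method_comparison.md.
baseline <- c(edges = 100, nodematch = 40, nodefactor = 25)
updated  <- c(edges = 90,  nodematch = 41, nodefactor = 30)

shift   <- pct_shift(baseline, updated)  # -10, 2.5, 20
flagged <- abs(shift) > 5                # cells that move by more than 5%

# Share of flagged cells in the toy example (2 of 3).
share_flagged <- mean(flagged)

# The report's headline share: 229 of 363 cells, rounded to a whole percent.
report_share <- round(100 * 229 / 363)   # 63
```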
Closes #64. Adds a single `target_pop` argument to `build_netstats()` that unifies post-stratification of the synthetic target population.

## API

`target_pop` accepts four forms:

- `NULL` (default): legacy behavior, byte-identical to pre-#64.
- Named list: per-marginal overrides for any subset of `{age.pyramid, race.prop, deg.casl, deg.main, deg.tot, role.class, risk.grp}`. Missing names fall through to defaults.
- data.frame: one row per node. Required columns: `age`, `deg.casl`, `deg.main`, `role.class`, `risk.grp` (+ `race` when `race = TRUE`). Optional with derivation: `sqrt.age`, `age.grp`, `active.sex`, `deg.tot`, `diag.status`. `network.size` is overridden to `nrow(target_pop)`.
- Character: named bundles (e.g., `"atlanta"`, `"us_msm_male"`) built from NCHS age pyramid + `ARTnetData::race.dist` by geography. Raises a clear not-yet-implemented error for now.

## Why each form

- Named list: supersedes the older one-arg-at-a-time approach (`age.pyramid`, `race.prop`) and extends the override surface to per-attribute distributions previously sourced silently from `netparams`. One place to set every marginal you might want to override.

## Implementation

- `.parse_target_pop()` does form detection, column / element validation, and normalization (e.g., handles the `race.props` alias for `race.prop`).
- The Nodal Attribute Initialization block is restructured into an if/else: the data.frame form pulls attributes directly into the same `attr_*` locals the sampling path produces; the sampling path applies list-form distribution overrides via `.dist_*` locals.
- The `diag.status` block honors a user-supplied `diag.status` column when the data.frame form provides one; otherwise it falls through to the existing epistats-based draw using the user's attribute vectors.
- Common `out$attr` assignments are factored out so both code paths share one place to populate.

## Validation

- Backward-compat snapshot harness: 3/3 match on default and explicit `method = "existing"` (this is the real test that the restructuring didn't break anything).

## Tests

`tests/testthat/test-target-pop.R`: 12 blocks, 25 assertions. Covers:

- `race.prop` override produces matching race composition; `deg.casl` override produces matching distribution; `race.props` alias normalized to `race.prop`; unknown list elements raise an informative error.
- `deg.tot` cap derivation; missing-required-column error; `diag.status` falls back to epistats when absent; composes with `method = "joint"` (internal consistency `sum(nf_*) == 2 * edges` still holds).
- Non-list/df/char input raises the error "must be NULL, a list, a data.frame, or a character string".

## Test plan

- Composes with `method = "joint"`.
- Character-form bundles (`"atlanta"`, `"us_msm_male"`): tracked as future work. Implementation is a lookup table from name to `list(age.pyramid = ..., race.prop = ...)` using NCHS age pyramid (already in `build_netstats`) + `ARTnetData::race.dist` (already shipped); no new external data needed.

Closes #64. With #64 + #65 + the joint g-comp refactor (#61–#74) all landed, the ARTnet → ERGM target stat pipeline is now fully joint-corrected and post-stratifiable.
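To make the list and data.frame forms from the API section concrete, here is a minimal sketch of constructing each input in base R. The column and element names follow the PR description; the value codings, the three-level `race.prop`, and the uncapped `deg.tot` sum are assumptions for illustration (the PR mentions a `deg.tot` cap but does not state its value). The final call is left commented out because it needs ARTnet and its `epistats` / `netparams` inputs, and the argument order shown there is assumed rather than confirmed by this PR.

```r
set.seed(1)
n <- 200

# Hypothetical synthetic cohort, one row per node. Column names follow the
# PR description; the value codings are assumptions made for illustration.
target_df <- data.frame(
  age        = runif(n, min = 15, max = 65),
  race       = sample(1:3, n, replace = TRUE),
  deg.casl   = sample(0:3, n, replace = TRUE),
  deg.main   = sample(0:2, n, replace = TRUE),
  role.class = sample(0:2, n, replace = TRUE),
  risk.grp   = sample(1:5, n, replace = TRUE)
)

# Optional columns computed by hand here. The package is described as deriving
# them when absent; its deg.tot derivation applies a cap that is omitted here.
target_df$sqrt.age <- sqrt(target_df$age)
target_df$deg.tot  <- target_df$deg.main + target_df$deg.casl

# List form: override only the race marginal (placeholder proportions).
target_list <- list(race.prop = c(0.5, 0.3, 0.2))

# Intended usage (assumed argument order; requires ARTnet, not run here):
# netstats_df   <- build_netstats(epistats, netparams, target_pop = target_df)
# netstats_list <- build_netstats(epistats, netparams, target_pop = target_list)
```

With the data.frame form, the PR states that `network.size` is taken from `nrow(target_pop)`, so the cohort above would imply a 200-node network.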