Synth-aggregated duration g-computation in build_netstats (#73) #74

Merged
smjenness merged 1 commit into main from feature/duration-gcomp-on-synth
Apr 25, 2026

@smjenness

Closes the partial task on #63 raised by PR #71's review: under method = "joint" + duration.method = "joint_lm" in build_netparams, the within-ARTnet stratum aggregation that build_netparams emits is now overridden in build_netstats with synth-population aggregation.

What changed

  • New private helper .aggregate_synth_byage_durations() in R/NetStats.R.
    • Predicts joint_lm log-duration at same.race = 0 and 1 per synth ego.
    • Marginalizes over partner-race uncertainty using P(same.race | ego) from joint_nm_race_model.
    • Exponentiates, takes median per stratum, applies the existing geometric transformation mean.dur.adj = 1 / (1 - 2^(-1/median)).
  • build_netstats joint synth-prediction block now aliases synth$index.age.grp <- synth$age.grp (joint_lm uses index.age.grp RHS naming; joint_nm_*_model uses age.grp) and calls the helper for main and casl. Skips when joint_duration_model is NULL.
  • diss.byage dissolution_coefs calls read from a local override vector when present, else fall back to netparams[[layer]]$durs.<layer>.byage$mean.dur.adj.
  • diss.homog still uses the within-ARTnet aggregation (not consumed by EpiModelHIV-Template's tergm offset; can land in a follow-up).
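
The helper's steps can be sketched as follows. This is a minimal, self-contained illustration on simulated data, not the code in R/NetStats.R. The toy `artnet` and `synth` data frames, the `lm`/`glm` stand-ins for joint_lm and joint_nm_race_model, and the use of `index.age.grp` alone as the stratum are all assumptions made for illustration.

```r
set.seed(1)

# Toy stand-in for ARTnet dyads (simulated; not real data).
n <- 500
artnet <- data.frame(
  index.age.grp = factor(sample(1:5, n, replace = TRUE), levels = 1:5),
  same.race     = rbinom(n, 1, 0.6)
)
artnet$log.dur <- 3 + 0.2 * as.integer(artnet$index.age.grp) +
  0.3 * artnet$same.race + rnorm(n, sd = 0.5)

# Stand-ins for joint_lm and joint_nm_race_model (assumed model forms).
dur.fit  <- lm(log.dur ~ index.age.grp + same.race, data = artnet)
race.fit <- glm(same.race ~ index.age.grp, family = binomial, data = artnet)

# Synthetic egos; alias age.grp to the RHS name the duration model expects.
synth <- data.frame(
  age.grp = factor(sample(1:5, 1000, replace = TRUE), levels = 1:5)
)
synth$index.age.grp <- synth$age.grp

# Predict log-duration at same.race = 0 and 1, then weight by P(same.race | ego).
p.same   <- predict(race.fit, newdata = synth, type = "response")
log.dur1 <- predict(dur.fit, newdata = transform(synth, same.race = 1))
log.dur0 <- predict(dur.fit, newdata = transform(synth, same.race = 0))
log.dur  <- p.same * log.dur1 + (1 - p.same) * log.dur0

# Exponentiate, take the median per stratum, apply the geometric transformation.
median.dur   <- tapply(exp(log.dur), synth$index.age.grp, median)
mean.dur.adj <- 1 / (1 - 2^(-1 / median.dur))
```

Because the weighting happens on the log scale before exponentiating, the per-ego value is a weighted geometric mean of the two predicted durations, which matches the order of operations listed above.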

sex.cess.mod handling: helper preserves the deterministic post-cessation "dead row" (mean.dur.adj = 1) at the end of the override vector so dissolution_coefs sees the same shape it did before.
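
A rough sketch of the override-or-fallback selection and the dead-row handling, under assumptions: `pick_byage_durations()` and `add_dead_row()` are hypothetical names that do not appear in the package, and the numeric vectors are shortened illustrative values rather than real package output.

```r
# Hypothetical helper: use the synth-based override when present,
# otherwise fall back to the within-ARTnet vector from netparams.
pick_byage_durations <- function(override, fallback) {
  if (is.null(override)) fallback else override
}

# Hypothetical helper: append the post-cessation "dead row"
# (mean.dur.adj = 1) when sex.cess.mod is switched on.
add_dead_row <- function(mean.dur.adj, sex.cess.mod) {
  if (isTRUE(sex.cess.mod)) c(mean.dur.adj, 1) else mean.dur.adj
}

# Illustrative values only; the fallback is assumed to carry its dead row already.
within.artnet <- c(92.3, 48.9, 129.7, 1)
synth.based   <- add_dead_row(c(86.9, 68.9, 186.0), sex.cess.mod = TRUE)

durs.override <- pick_byage_durations(synth.based, within.artnet)
durs.fallback <- pick_byage_durations(NULL, within.artnet)
```

Either branch returns a vector of the same length with a trailing 1, so the downstream dissolution_coefs call does not need to know which path supplied the durations.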

Empirical effect

Stratum-level mean.dur.adj in weeks (Atlanta + race = TRUE, N = 10k):

| Layer / Stratum | ARTnet-cond. (PR #71) | synth-Atlanta (this PR) | synth-NHBS shift |
|---|---|---|---|
| main / nonmatch | 235.5 | 243.0 | 234.2 |
| main / matched.5 | 445.6 | 491.5 | 491.5 |
| casl / nonmatch | 92.3 | 86.9 | 86.9 |
| casl / matched.1 | 48.9 | 68.9 | 47.9 |
| casl / matched.5 | 129.7 | 186.0 | 126.9 |

Main durations move modestly because joint_lm has weak race effects on main duration. Casl durations move substantially when the synthetic population shifts (matched.1: 68.9 under Atlanta vs 47.9 under NHBS), which is exactly the marginal-vs-joint correction the refactor was designed to apply. This is the dyad-level analog of the formation-stat divergence we documented in the PR #68 / #69 reviews.
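
The size of the shift can be checked directly from the reported values. The snippet below is a small illustration using only the synth-Atlanta and synth-NHBS numbers quoted in this PR. The data frame layout and the `pct.shift` column name are assumptions, and the 1% threshold mirrors the one named in the test plan.

```r
# Stratum-level mean.dur.adj values as reported in this PR.
dur <- data.frame(
  stratum       = c("main/nonmatch", "main/matched.5", "casl/nonmatch",
                    "casl/matched.1", "casl/matched.5"),
  synth.atlanta = c(243.0, 491.5, 86.9, 68.9, 186.0),
  synth.nhbs    = c(234.2, 491.5, 86.9, 47.9, 126.9)
)

# Percent change from the Atlanta synth population to the NHBS-shifted one.
dur$pct.shift <- 100 * (dur$synth.nhbs - dur$synth.atlanta) / dur$synth.atlanta

# Flag strata that diverge by more than 1% in either direction.
dur$diverges <- abs(dur$pct.shift) > 1
print(dur)
```

On these numbers the casl matched.1 stratum drops by roughly 30%, while main matched.5 and casl nonmatch do not move.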

Validation

  • Backward-compat snapshot harness: 3/3 match on default and explicit method = "existing".
  • New unit tests: tests/testthat/test-duration-gcomp-synth.R, 7 blocks, 17 assertions. Covers: override fires under joint + joint_lm; falls back under empirical durations; falls back under method = "existing"; diverges under shifted race.prop; preserves sex.cess.mod dead row; produces well-formed disscoef objects; handles race = FALSE without joint_nm_race_model.
  • Full testthat suite: 525 / 525 pass.
  • R CMD check: 0 errors / 0 warnings / 0 notes.
  • End-to-end EpiModelHIV-Template: all 6 ERGMs (3 layers × default + joint methods) converge cleanly under Stochastic-Approximation. Main coef.form shifts -17.570 → -17.602 between PR #71's path and this PR's, consistent with matched.5 mean.dur.adj moving 445.6 → 491.5 under synth aggregation.

Approach note

This implements Option A from #63: ego attributes only on the prediction RHS, with partner-race marginalized via the existing joint_nm_race_model. No explicit synthetic partnership-pair construction (which would be Option B). The simpler choice is consistent with how nodematch/absdiff are handled in PR #69 and gives correct results for the cross-sectional-age-of-extant-ties target Steve Goodreau articulated in the PR #71 review.
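
Written out, Option A amounts to the following per-ego marginalization and per-stratum aggregation. The notation is introduced here for illustration and is not taken from the code: $x_i$ denotes the attributes of synthetic ego $i$, $\hat{f}(x, r)$ the joint_lm predicted log-duration at same.race $= r$, $\hat{\pi}(x)$ the predicted P(same.race | ego) from joint_nm_race_model, and $s$ a stratum.

```latex
\begin{aligned}
\hat{\ell}_i &= \hat{\pi}(x_i)\,\hat{f}(x_i, 1)
              + \bigl(1 - \hat{\pi}(x_i)\bigr)\,\hat{f}(x_i, 0), \\
m_s &= \operatorname{median}\bigl\{ \exp(\hat{\ell}_i) : i \in s \bigr\}, \\
\texttt{mean.dur.adj}_s &= \frac{1}{1 - 2^{-1/m_s}}.
\end{aligned}
```

The last line is what one gets by treating durations as geometric with per-step dissolution probability $p$: setting the survival at the median to one half, $(1 - p)^{m_s} = 1/2$, gives $p = 1 - 2^{-1/m_s}$, and the mean of that distribution is $1/p$.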

Test plan

  • Backward-compat snapshot 3/3 (default + explicit method = "existing")
  • Override fires under joint + joint_lm; falls back under empirical / method = "existing"
  • Casl durations diverge >1% under shifted race.prop on at least one stratum
  • sex.cess.mod dead row preserved
  • race = FALSE path works
  • Unit tests 525/525
  • R CMD check 0/0/0
  • End-to-end EpiModelHIV-Template estimation under both default and joint paths converges

Depends on #71 (merged). Part of #63 — closes the partial task. Unblocks #65 (Phase 1.5 validation suite).

Closes the partial task on #63: under method = "joint" +
duration.method = "joint_lm" in build_netparams, the within-ARTnet
stratum aggregation that build_netparams emits is now overridden in
build_netstats with synth-population aggregation. Stratum-level
mean.dur.adj values feeding the diss.byage dissolution offset are
computed from per-synthetic-ego predictions of joint_lm log-duration,
marginalized over partner-race uncertainty using joint_nm_race_model,
then median-aggregated within (same.age.grp x index.age.grp) cells.

Downstream effect: when the synthetic target population's joint
attribute distribution differs from ARTnet's, dissolution offsets
diverge from the within-ARTnet estimates the previous code path
produced. Verified on a NHBS-like shifted race.prop run: casl
durations diverge meaningfully across strata (matched.1: 68.9 vs
47.9 weeks Atlanta vs NHBS), main is largely unchanged because the
joint_lm fit on main has weak race effects.

Implementation:
- New private helper .aggregate_synth_byage_durations() in NetStats.R.
  Predicts joint_lm log-duration at same.race = 0 and 1 per ego,
  weights by P(same.race | ego) from joint_nm_race_model, exponentiates,
  median-aggregates per stratum, applies the existing geometric
  transformation (1 / (1 - 2^(-1/median))) for mean.dur.adj.
- In build_netstats joint synth-prediction block: alias
  synth$index.age.grp <- synth$age.grp (joint_lm uses index.age.grp
  RHS naming; joint_nm_*_model uses age.grp), then call helper for
  main and casl. Skip when joint_duration_model is NULL (which is the
  case under duration.method = "empirical" or method = "existing").
- diss.byage dissolution_coefs() calls now read from a local override
  vector when present, else fall back to netparams[[layer]]$durs.<layer>.byage$mean.dur.adj.
- diss.homog still uses the within-ARTnet aggregation; not consumed
  by EpiModelHIV-Template's tergm offset, so synth analog can land in
  a follow-up.

sex.cess.mod handling: helper preserves the deterministic post-cessation
"dead row" (mean.dur.adj = 1) at the end of the override vector so
dissolution_coefs sees the same shape it did before this PR.

Validation:
- Backward-compat snapshot harness: 3/3 match on default and explicit
  method = "existing".
- New tests in test-duration-gcomp-synth.R (7 blocks, 17 assertions):
  override fires under joint + joint_lm; falls back under empirical;
  falls back under method = "existing"; diverges under shifted
  race.prop (>1% on at least one casl stratum); preserves the
  sex.cess.mod dead row; produces well-formed disscoef objects;
  handles race = FALSE without joint_nm_race_model.
- Full testthat suite: 525 / 525 pass.
- R CMD check: 0 errors / 0 warnings / 0 notes.
- End-to-end EpiModelHIV-Template run: all 6 ERGMs converge under
  Stochastic-Approximation; main coef.form drifts -17.570 -> -17.602
  under the new synth-aggregated durations vs PR #71, consistent with
  matched.5 mean.dur.adj moving 445.6 -> 491.5.

Closes the dyad-level synthetic-pair gap raised by PR #71. Logical
follow-ups now possible: #65 (Phase 1.5 validation suite, now
unblocked); #72 (formation-stat sampling-bias work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>