
[Phase 1.4] Support post-stratification via user-supplied target population distribution #64

@smjenness


Context

build_netstats() currently assembles a synthetic target population from a mix of reference sources:

  • age: NCHS 2020 general-population pyramid (not MSM-specific)
  • race: ARTnetData::race.dist national or city-specific (not MSM-specific)
  • deg.casl: ARTnet sample's own deg.casl.dist
  • deg.main: ARTnet sample's own deg.main.dist
  • role.class: ARTnet sample distribution
  • risk.grp: uniform (5 equal quintiles)

This is a patchwork with no single, coherent post-stratification target. A user who wants to parametrize the model for, say, CDC NHBS 2023 MSM demographics currently has no clean API for doing so.

Proposed approach

Add a target_pop argument to build_netstats() that accepts one of three input forms:

  1. A named list of marginal distributions (a generalization of the current behavior, which remains the default):

    target_pop = list(
      age.pyramid  = full.age.pyr,         # length = nAges
      race.props   = c(Black=0.15, Hispanic=0.20, White.Other=0.65),
      deg.casl     = c(0.45, 0.30, 0.15, 0.10),
      deg.main     = c(0.60, 0.35, 0.05),
      role.class   = c(0.18, 0.27, 0.55),
      risk.grp     = rep(0.2, 5)
    )
  2. A pre-built data frame of synthetic respondents (user has their own joint distribution):

    target_pop = my_synthetic_pop  # data.frame with age, race, deg.casl, etc.
  3. A built-in reference (character flag):

    target_pop = 'nhbs_msm_2022'   # package-provided MSM demographics

For option 3, we would add built-in reference population data to ARTnetData (from CDC NHBS or a similar source).
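A minimal sketch of how build_netstats() might dispatch on the three input forms. resolve_target_pop() and the registry argument are hypothetical (nothing like them exists in ARTnet or ARTnetData yet), and the "toy" registry entry in the usage line is a placeholder, not a real reference population. The one R-specific subtlety is ordering: a data.frame is also a list, so it must be tested first.

```r
# Hypothetical helper: NOT part of ARTnet. Illustrates dispatch only.
# `registry` is a placeholder for built-in reference populations that
# would live in ARTnetData; no such data exists yet.
resolve_target_pop <- function(target_pop, registry = list()) {
  if (is.null(target_pop)) {
    # NULL: caller keeps the current patchwork code path unchanged
    return(list(type = "default", value = NULL))
  }
  if (is.data.frame(target_pop)) {
    # Check data.frame before list: is.list() is TRUE for data.frames
    return(list(type = "joint", value = target_pop))
  }
  if (is.list(target_pop)) {
    # Named list of marginal distributions
    return(list(type = "marginal", value = target_pop))
  }
  if (is.character(target_pop) && length(target_pop) == 1) {
    if (!target_pop %in% names(registry)) {
      stop("Unknown built-in target population: '", target_pop, "'")
    }
    # A built-in reference may itself be stored as marginals or a data.frame
    return(resolve_target_pop(registry[[target_pop]], registry))
  }
  stop("target_pop must be NULL, a list, a data.frame, or a single string")
}
```

For example, `resolve_target_pop("toy", list(toy = list(risk.grp = rep(0.2, 5))))$type` resolves the string to its stored marginals and returns `"marginal"`.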

Tasks

  • Design the target_pop argument API (three input forms: list / data.frame / character).
  • Keep the default behavior unchanged (the current patchwork references).
  • When the user provides a joint data.frame, use it directly (skip sampling).
  • When the user provides marginal distributions, sample each attribute independently (a generalization of the current behavior).
  • Add at least one built-in reference population to ARTnetData (CDC NHBS MSM or similar) — coordinate with Sam on data source.
  • Document trade-offs: marginal resampling vs joint user-provided.
  • Unit tests covering each input form.
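To make the marginal-resampling trade-off concrete, here is a hedged sketch of the marginal path. sample_from_marginals() is a hypothetical name, and it assumes each marginal is a probability vector summing to 1; unnamed vectors are mapped to 0-based category codes (0, 1, 2, ...), which suits degree-style attributes but would need a real age mapping for age.pyramid.

```r
# Hypothetical sketch of the "marginal resampling" path (not ARTnet code).
# Each attribute is drawn independently, so any correlation between
# attributes in the real population (e.g., age and deg.main) is lost.
sample_from_marginals <- function(marginals, n) {
  stopifnot(is.list(marginals), !is.null(names(marginals)), n >= 1)
  draws <- lapply(names(marginals), function(nm) {
    p <- marginals[[nm]]
    if (any(p < 0) || abs(sum(p) - 1) > 1e-8) {
      stop("Marginal '", nm, "' must be non-negative and sum to 1")
    }
    # Category labels: names(p) if present, otherwise 0-based codes
    cats <- if (is.null(names(p))) seq_along(p) - 1L else names(p)
    # sample.int() avoids sample()'s special case for length-1 numeric x
    cats[sample.int(length(p), size = n, replace = TRUE, prob = p)]
  })
  names(draws) <- names(marginals)
  as.data.frame(draws, stringsAsFactors = FALSE)
}
```

Because every column is drawn on its own, the result matches each marginal in expectation but encodes no joint structure. That is the main trade-off to document against the user-provided data.frame path, which keeps whatever correlations the user's population contains.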

Acceptance criteria

  • build_netstats(..., target_pop = NULL) produces current output byte-identically.
  • build_netstats(..., target_pop = user_df) uses the user's joint distribution.
  • At least one built-in reference MSM population is available.
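For the second criterion, the joint data.frame path needs little more than input validation before it is used as-is. The sketch below is hypothetical: validate_joint_pop() is not an existing function, and the required column names are an assumption taken from the attributes listed under Context (age, race, deg.casl, deg.main, role.class, risk.grp).

```r
# Hypothetical sketch: validate a user-supplied joint population before
# build_netstats() uses it directly. No resampling is done, so the joint
# structure of the user's data is preserved row for row.
# The required column names are assumptions, not an agreed schema.
validate_joint_pop <- function(df,
                               required = c("age", "race", "deg.casl",
                                            "deg.main", "role.class",
                                            "risk.grp")) {
  if (!is.data.frame(df)) stop("target_pop must be a data.frame here")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("target_pop is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  # Reject NA values in any required column
  if (any(vapply(df[required], anyNA, logical(1)))) {
    stop("target_pop contains NA values in required columns")
  }
  # Keep only the required columns; row order is untouched
  df[, required, drop = FALSE]
}
```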
