Summary
social_security is shipped as a stored input in populace_us_2024.h5, but it disagrees with the sum of its four components (social_security_retirement, social_security_disability, social_security_survivors, social_security_dependents) for 2,019 of 160,858 person records (1.255%). In policyengine-us, social_security is an adds variable (defined as the sum of those four components), so shipping a stored aggregate that contradicts its components is internally inconsistent.
Pattern
Almost all mismatches are records where the stored total is positive but all four components are $0 — the total was imputed/assigned but never decomposed. Worst case: stored $121,655, all four components $0.
Materiality (low)
Weighted, this is negligible:
- Affected records carry 4,852 of 337,861,201 household weight ≈ 0.0014% of population (80% have weight < 0.01).
- Weighted Social Security gap: ≈ $0.3B = 0.02% of total benefits.
- Concentrated in near-zero-weight, high-income tail records (median age 59 — not all retirement-age).
- A handful have moderate weight (up to ~865), e.g. an age-40 record with $64,235 of undecomposed SS.
So it does not move weighted aggregates. Filing as low-priority correctness/hygiene, not a numbers bug.
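The shares quoted above follow directly from the reported counts, and the weighted figures can be computed along the lines of the sketch below. This is a minimal illustration, not the script used for the numbers in this issue: the `weight` series and the toy values are assumptions, and how household weight is mapped onto person records is left out.

```python
import pandas as pd

# Shares implied by the counts reported in this issue.
record_share = 2_019 / 160_858          # mismatching person records
weight_share = 4_852 / 337_861_201      # household weight on those records
print(f"{100 * record_share:.3f}% of records, {100 * weight_share:.4f}% of weight")


def weighted_materiality(gap, weight, tol=1.0):
    """Return (share of weight affected, weighted dollar gap).

    gap    -- stored total minus component sum, per record
    weight -- record weight (hypothetical column; not named in the issue)
    tol    -- dollar tolerance below which a gap is ignored
    """
    mismatch = gap.abs() > tol
    share = weight[mismatch].sum() / weight.sum()
    dollars = (weight * gap)[mismatch].sum()
    return share, dollars


# Toy data: one clean record, one undecomposed record with low weight.
toy_gap = pd.Series([0.0, 500.0])
toy_weight = pd.Series([90.0, 10.0])
toy_share, toy_dollars = weighted_materiality(toy_gap, toy_weight)
```

With the toy data, 10 of 100 weight units are affected and the weighted gap is 10 × $500.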
Why it's still worth fixing
Because social_security is an adds variable, any consumer that strips the stored aggregate and recomputes it from the components silently zeroes out SS for these records: it trusts the (incomplete) components over the (complete) stored total. This is standard policyengine-us behavior, and it is what calibration pipelines that drop "pseudo-input" aggregates do. The drop is invisible downstream.
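The failure mode can be illustrated with a toy frame. This is a sketch of the mechanism only: `recompute_from_components` is a hypothetical stand-in for whatever a consumer does when it drops the stored aggregate, not a policyengine-us function, and the values are illustrative.

```python
import pandas as pd

COMPONENTS = [
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
]

# Toy person records: the first is decomposed, the second has a stored
# total but all-zero components, the third has no Social Security at all.
people = pd.DataFrame(
    {
        "social_security": [18_000.0, 64_235.0, 0.0],
        "social_security_retirement": [18_000.0, 0.0, 0.0],
        "social_security_disability": [0.0, 0.0, 0.0],
        "social_security_survivors": [0.0, 0.0, 0.0],
        "social_security_dependents": [0.0, 0.0, 0.0],
    }
)


def recompute_from_components(df):
    """Drop the stored aggregate and rebuild it as the component sum."""
    out = df.drop(columns=["social_security"])
    out["social_security"] = out[COMPONENTS].sum(axis=1)
    return out


recomputed = recompute_from_components(people)

# Dollars that disappear per record once the stored total is discarded.
lost = people["social_security"] - recomputed["social_security"]
print(lost.tolist())
```

Only the undecomposed record loses its benefit, and nothing in the recomputed frame signals that a value was dropped.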
Repro
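import pandas as pd

f = "populace_us_2024.h5" # f0af251 build
comps = [f"social_security_{x}" for x in ["dependents", "disability", "retirement", "survivors"]]
# Load the stored total and its four components from the person table
df = pd.read_hdf(f, "person", columns=["social_security"] + comps)
# Gap between the stored total and the component sum, per record
gap = df["social_security"] - df[comps].sum(axis=1)
# Count records whose gap exceeds $1 and report the largest absolute gap
print((gap.abs() > 1).sum(), "records mismatch; max", gap.abs().max())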
Suggested fix
Either populate the four components so they sum to the stored total during the SS imputation/decomposition step, or stop shipping the stored social_security aggregate so the adds formula is the single source of truth (consistent by construction).
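Whichever option is chosen, a build-time check along these lines could keep the mismatch from returning. It is a sketch under assumptions: `find_adds_mismatches` is a hypothetical helper, not part of any existing pipeline, and the $1 tolerance simply mirrors the repro above.

```python
import pandas as pd


def find_adds_mismatches(df, total, components, tol=1.0):
    """Return rows where a stored total disagrees with its component sum.

    If the total column is not shipped at all (the second option), the
    adds formula is the only source of truth and nothing can mismatch.
    """
    if total not in df.columns:
        return df.iloc[0:0]
    gap = df[total] - df[components].sum(axis=1)
    return df.loc[gap.abs() > tol].assign(gap=gap)


comps = [
    f"social_security_{x}"
    for x in ["dependents", "disability", "retirement", "survivors"]
]

# Toy frame: the second record has a stored total with no decomposition.
toy = pd.DataFrame(
    {
        "social_security": [12_000.0, 121_655.0],
        "social_security_dependents": [0.0, 0.0],
        "social_security_disability": [12_000.0, 0.0],
        "social_security_retirement": [0.0, 0.0],
        "social_security_survivors": [0.0, 0.0],
    }
)

bad = find_adds_mismatches(toy, "social_security", comps)
dropped = find_adds_mismatches(
    toy.drop(columns=["social_security"]), "social_security", comps
)
print(len(bad), "mismatching records;", len(dropped), "after dropping the total")
```

A build could fail, or at least warn, when the returned frame is non-empty.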
Build observed: populace-us-2024-f0af251-703bd81a565c-20260620 (latest at filing; the pattern is likely longstanding across builds).