`social_security` stored input contradicts its four components for ~2k records (decomposition gap) #183

## Summary

`social_security` is shipped as a **stored input** in `populace_us_2024.h5`, but it disagrees with the sum of its four components (`social_security_retirement`, `social_security_disability`, `social_security_survivors`, `social_security_dependents`) for **2,019 / 160,858 person records (1.255%)**. In policyengine-us, `social_security` is an `adds` variable (defined as the sum of those four components), so shipping the aggregate as a stored input that contradicts the components is internally inconsistent.

## Pattern

Almost all mismatches are records where the stored total is **positive but all four components are $0** — the total was imputed/assigned but never decomposed. Worst case: stored **$121,655**, all four components **$0**.
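To put a number on "almost all", the mismatching records can be split by whether every component is zero. The following is a minimal sketch on a toy frame: the column names follow the repro in this issue, while the helper name `classify_mismatches`, the $1 tolerance default, and all toy values are assumptions for illustration only.

```python
import pandas as pd

COMPONENTS = [
    "social_security_dependents",
    "social_security_disability",
    "social_security_retirement",
    "social_security_survivors",
]

def classify_mismatches(df: pd.DataFrame, tol: float = 1.0) -> pd.Series:
    """Label each record as consistent, undecomposed, or other_mismatch."""
    comp_sum = df[COMPONENTS].sum(axis=1)
    mismatch = (df["social_security"] - comp_sum).abs() > tol
    all_zero = (df[COMPONENTS] == 0).all(axis=1)
    # "Undecomposed": positive stored total, but every component is exactly 0.
    undecomposed = mismatch & all_zero & (df["social_security"] > 0)
    labels = pd.Series("consistent", index=df.index)
    labels[mismatch] = "other_mismatch"
    labels[undecomposed] = "undecomposed"
    return labels

# Toy data: illustrative values only, not taken from the dataset.
toy = pd.DataFrame({
    "social_security":            [20_000.0, 50_000.0, 15_000.0, 0.0],
    "social_security_dependents": [0.0, 0.0, 0.0, 0.0],
    "social_security_disability": [0.0, 0.0, 5_000.0, 0.0],
    "social_security_retirement": [20_000.0, 0.0, 0.0, 0.0],
    "social_security_survivors":  [0.0, 0.0, 0.0, 0.0],
})
labels = classify_mismatches(toy)
print(labels.value_counts())
```

Running the same function on the real `person` table would give the exact share of mismatches that fall in the undecomposed bucket.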

## Materiality (low)

Weighted, this is **negligible**:
- Affected records carry **4,852 of 337,861,201 household weight ≈ 0.0014%** of the population (80% of affected records have weight < 0.01).
- Weighted Social Security gap: **≈ $0.3B = 0.02%** of total benefits.
- Concentrated in near-zero-weight, high-income tail records (median age 59 — not all retirement-age).
- A handful have moderate weight (up to ~865), e.g. an age-40 record with $64,235 of undecomposed SS.
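The weighted figures above could be reproduced along these lines. This is a sketch under stated assumptions: the helper `weighted_materiality` and the `weight` and `component_sum` columns are hypothetical names (the actual weight variable in the file is not named in this issue), and the toy values are made up.

```python
import pandas as pd

def weighted_materiality(total: pd.Series, comp_sum: pd.Series,
                         weight: pd.Series, tol: float = 1.0) -> dict:
    """Weighted size of the decomposition gap relative to the whole sample."""
    gap = total - comp_sum
    affected = gap.abs() > tol
    weighted_gap = (gap * weight)[affected].sum()
    return {
        # Share of total weight carried by affected records.
        "weight_share": weight[affected].sum() / weight.sum(),
        # Weighted dollars present in the stored total but missing from components.
        "weighted_gap": weighted_gap,
        # Weighted gap as a share of weighted stored benefits.
        "gap_share": weighted_gap / (total * weight).sum(),
    }

# Toy data: illustrative values only, not taken from the dataset.
toy = pd.DataFrame({
    "social_security": [20_000.0, 50_000.0, 0.0, 30_000.0],
    "component_sum":   [20_000.0, 0.0, 0.0, 0.0],
    "weight":          [1_000.0, 0.005, 500.0, 2.0],
})
result = weighted_materiality(toy["social_security"], toy["component_sum"], toy["weight"])
print(result)

# Arithmetic check of the shares quoted in this issue.
weight_pct = f"{4_852 / 337_861_201:.4%}"   # 0.0014%
record_pct = f"{2_019 / 160_858:.3%}"       # 1.255%
print(weight_pct, record_pct)
```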

So it does not move weighted aggregates. Filing as **low-priority correctness/hygiene**, not a numbers bug.

## Why it's still worth fixing

Because `social_security` is an `adds` variable, any consumer that strips the stored aggregate and recomputes from components — standard policyengine-us behavior, and what calibration pipelines that drop "pseudo-input" aggregates do — silently **zeroes out SS for these records**, trusting the (incomplete) components over the (complete) stored total. The drop is invisible downstream.
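The failure mode can be illustrated with plain pandas. This is not the policyengine-us API: `recompute_from_components` is a hypothetical stand-in for what an `adds`-style recomputation does once the stored aggregate is dropped, and the toy values are made up.

```python
import pandas as pd

COMPONENTS = [
    "social_security_dependents",
    "social_security_disability",
    "social_security_retirement",
    "social_security_survivors",
]

def recompute_from_components(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the stored total and rebuild it as the sum of its components."""
    out = df.drop(columns=["social_security"])
    out["social_security"] = out[COMPONENTS].sum(axis=1)
    return out

# Toy data: record 0 is properly decomposed, record 1 is undecomposed.
stored = pd.DataFrame({
    "social_security":            [20_000.0, 50_000.0],
    "social_security_dependents": [0.0, 0.0],
    "social_security_disability": [0.0, 0.0],
    "social_security_retirement": [20_000.0, 0.0],
    "social_security_survivors":  [0.0, 0.0],
})
recomputed = recompute_from_components(stored)
lost = stored["social_security"] - recomputed["social_security"]
print(lost)  # benefits that vanish per record; no error or warning is raised
```

Record 1 goes from a positive stored benefit to zero without any signal, which is why the inconsistency is easy to miss downstream.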

## Repro

```python
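import pandas as pd

f = "populace_us_2024.h5"  # f0af251 build
comps = [f"social_security_{x}" for x in ["dependents", "disability", "retirement", "survivors"]]
# Note: `columns=` only works if "person" is stored in PyTables "table" format.
df = pd.read_hdf(f, "person", columns=["social_security"] + comps)
# Gap between the stored aggregate and the sum of its four components.
gap = df["social_security"] - df[comps].sum(axis=1)
# Count records that differ by more than $1 and report the largest absolute gap.
print((gap.abs() > 1).sum(), "records mismatch; max", gap.abs().max())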
```

## Suggested fix

Either populate the four components so they sum to the stored total during the SS imputation/decomposition step, or stop shipping the stored `social_security` aggregate so the `adds` formula is the single source of truth (consistent by construction).
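Whichever option is chosen, a build-time guard could keep the two from drifting apart again. The sketch below is hypothetical: `check_decomposition` and `drop_stored_aggregate` are illustrative names rather than existing functions in the build pipeline, and the $1 tolerance mirrors the repro.

```python
import pandas as pd

COMPONENTS = [
    "social_security_dependents",
    "social_security_disability",
    "social_security_retirement",
    "social_security_survivors",
]

def check_decomposition(df: pd.DataFrame, tol: float = 1.0) -> None:
    """Raise if the stored total disagrees with the sum of its components."""
    gap = (df["social_security"] - df[COMPONENTS].sum(axis=1)).abs()
    n_bad = int((gap > tol).sum())
    if n_bad:
        raise ValueError(
            f"{n_bad} records have social_security != sum of components "
            f"(max gap {gap.max():,.0f})"
        )

def drop_stored_aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Second option: ship only the components and let `adds` define the total."""
    return df.drop(columns=["social_security"])

# Toy data: illustrative values only; record 1 is undecomposed.
bad = pd.DataFrame({
    "social_security":            [20_000.0, 50_000.0],
    "social_security_dependents": [0.0, 0.0],
    "social_security_disability": [0.0, 0.0],
    "social_security_retirement": [20_000.0, 0.0],
    "social_security_survivors":  [0.0, 0.0],
})
try:
    check_decomposition(bad)
except ValueError as err:
    print("build would fail:", err)

shipped = drop_stored_aggregate(bad)
print(list(shipped.columns))
```

Note that dropping the aggregate makes the data consistent by construction but still loses the undecomposed benefits, so the first option is the one that preserves the imputed totals.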

Build observed: `populace-us-2024-f0af251-703bd81a565c-20260620` (latest at filing; the pattern is likely longstanding across builds).

