Follow-up to #202 and part of the broader target-surface cleanup in #200.
#202 fixes the immediate inversion by applying a documented constant fallback: when a record has only a dividend total and no observed qualified/non-qualified components, split it 78% qualified / 22% non-qualified based on the 2015 PUF aggregate E00650/E00600 share. That is a good first-order patch, but it gives every unsplit CPS dividend row the same qualified share.
We should replace that constant fallback with a stochastic or modeled `qualified_dividend_share` imputation learned from PUF rows with observed dividend composition.
Suggested shape:
- Train/impute `qualified_dividend_share = qualified_dividend_income / ordinary_dividend_income` from PUF donor rows where ordinary dividends are positive and the qualified/non-qualified split is observed.
- Apply the imputed share only to rows with an unsplit positive dividend total and no observed components, e.g. CPS `DIV_VAL`-only rows.
- Preserve each row's total dividend: `qualified + non_qualified == ordinary_dividend_income == dividend_income` up to numerical tolerance.
- Keep observed PUF component rows unchanged.
- Make the stochastic draw reproducible via the pipeline seed/checkpoint metadata.
- Prefer conditioning on relevant predictors if available, such as dividend amount, income/AGI proxies, age, filing/tax-unit features, and asset/investment indicators.
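A minimal sketch of the stochastic variant, assuming the simplest possible donor model: an unconditional hot-deck draw from the empirical PUF share distribution. The function names, the dict-per-row layout, and the `seed` argument are hypothetical and not taken from the pipeline; a conditional model on the predictors listed above would replace the `rng.choice` step.

```python
import random


def donor_shares(puf_rows):
    """Collect observed qualified shares from donor rows (hypothetical layout)."""
    shares = []
    for row in puf_rows:
        ordinary = row.get("ordinary_dividend_income", 0.0)
        qualified = row.get("qualified_dividend_income")
        if ordinary > 0 and qualified is not None:
            # Assumption: clip to [0, 1] to guard against inconsistent donor rows.
            shares.append(min(max(qualified / ordinary, 0.0), 1.0))
    return shares


def impute_qualified_share(rows, shares, seed):
    """Split unsplit positive dividend totals with shares drawn from donors.

    Rows that already carry either component are returned unchanged.
    The same seed reproduces the same draws.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        total = row.get("dividend_income", 0.0)
        has_components = (
            row.get("qualified_dividend_income") is not None
            or row.get("non_qualified_dividend_income") is not None
        )
        if total > 0 and not has_components:
            qualified = total * rng.choice(shares)
            row["qualified_dividend_income"] = qualified
            # Computing the remainder by subtraction preserves the row total.
            row["non_qualified_dividend_income"] = total - qualified
        out.append(row)
    return out
```

Deriving the non-qualified amount as `total - qualified`, rather than multiplying by `1 - share`, keeps the row-level identity within floating-point tolerance regardless of which share is drawn.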
Validation target:
- Rebuild or run a focused diagnostic showing the qualified/non-qualified split moves toward the SOI/eCPS evidence without breaking export support parity.
- Report national weighted totals and filer counts for `qualified_dividend_income`, `non_qualified_dividend_income`, and total dividends before/after.
- Confirm this does not reintroduce the old all-non-qualified CPS-spine failure.
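The before/after report could look roughly like the sketch below. It assumes each record is a dict with a `weight` field and the three dividend variables, and it treats "filer count" as the weighted number of records with a positive amount; both are illustrative assumptions rather than the pipeline's actual definitions, and the helper names are hypothetical.

```python
FIELDS = (
    "qualified_dividend_income",
    "non_qualified_dividend_income",
    "dividend_income",
)


def weighted_summary(rows, weight_key="weight"):
    """Weighted total and weighted positive-record count for each field."""
    summary = {}
    for field in FIELDS:
        total = sum(r[weight_key] * (r.get(field) or 0.0) for r in rows)
        count = sum(r[weight_key] for r in rows if (r.get(field) or 0.0) > 0)
        summary[field] = {"weighted_total": total, "weighted_count": count}
    return summary


def aggregate_qualified_share(summary):
    """Qualified share of total dividends; 0.0 when there are no dividends."""
    total = summary["dividend_income"]["weighted_total"]
    if total <= 0:
        return 0.0
    return summary["qualified_dividend_income"]["weighted_total"] / total


def print_comparison(before_rows, after_rows):
    """Print weighted totals and counts side by side for a quick diagnostic."""
    before, after = weighted_summary(before_rows), weighted_summary(after_rows)
    for field in FIELDS:
        b, a = before[field], after[field]
        print(
            f"{field}: total {b['weighted_total']:.2f} -> {a['weighted_total']:.2f}, "
            f"count {b['weighted_count']:.2f} -> {a['weighted_count']:.2f}"
        )
    print(
        "aggregate qualified share: "
        f"{aggregate_qualified_share(before):.3f} -> {aggregate_qualified_share(after):.3f}"
    )
```

An aggregate qualified share at or near zero in the "after" column would indicate that the all-non-qualified failure has come back, and a change in the `dividend_income` total would indicate that row totals were not preserved.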
This should be treated as a quality improvement after #202, not a reason to block the constant-share bug fix.