Problem
For itemized deduction inputs, Populace should avoid using formula-owned deduction totals as exported columns, but a naive allocation from an aggregate deduction total to leaf inputs can lose important distributional information.
A concrete example is medical expenses:
- CPS/ASEC has health-related leaves such as premiums and out-of-pocket medical expenses, including people with medical expenses who do not itemize or do not clear the federal medical deduction floor.
- PUF has tax deduction observables such as medical expense deduction detail, but not necessarily the same underlying health leaves.
- PolicyEngine already defines the rules-engine relationship: health expense leaves -> medical_out_of_pocket_expenses -> medical_expense_deduction.
If we simply seed or split the deduction total, we miss non-itemizers and people whose expenses fall below the deduction floor. If we only trust CPS leaves, we may miss the PUF-side distribution among itemizers. We need a principled bridge.
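To make the information loss concrete, here is a minimal sketch of the formula chain. It assumes a simplified stand-in for the PolicyEngine formula (expenses above 7.5% of AGI are deductible) and ignores itemization status. The function, constant, and tax units are hypothetical illustrations, not PolicyEngine or Populace code.

```python
# Assumption: simplified stand-in for the rules-engine formula chain
# medical_out_of_pocket_expenses -> medical_expense_deduction.
FLOOR_RATE = 0.075  # assumed share of AGI that expenses must exceed


def medical_expense_deduction(out_of_pocket, agi, floor_rate=FLOOR_RATE):
    """Deductible medical expenses above the AGI floor (simplified)."""
    return max(out_of_pocket - floor_rate * agi, 0.0)


# Hypothetical tax units: (label, medical out-of-pocket expenses, AGI).
units = [
    ("above_floor", 12_000.0, 80_000.0),
    ("below_floor", 3_000.0, 80_000.0),
    ("no_expenses", 0.0, 80_000.0),
]

deductions = {
    label: medical_expense_deduction(oop, agi) for label, oop, agi in units
}

# Units with real expenses versus units visible through the deduction alone.
has_expenses = [label for label, oop, _ in units if oop > 0]
has_deduction = [label for label, value in deductions.items() if value > 0]

# Units that would get no leaf support if leaves were seeded only from
# the deduction total.
lost_if_seeded_from_deduction = [
    label for label in has_expenses if label not in has_deduction
]
print(lost_if_seeded_from_deduction)
```

The below-floor unit has positive expenses but a zero deduction, so any allocation that starts from the deduction total cannot recover it.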
Proposal
Design a formula-constrained leaf-imputation stage, starting with medical expenses:
- Use CPS health leaves as the receiver/support distribution.
- Use the PolicyEngine rules engine to calculate formula-owned outputs from leaf candidates.
- Use PUF deduction observables and other predictors as donor constraints/signals.
- Impute or adjust the leaf inputs so the rules-engine-calculated deduction matches the donor/target information as closely as possible.
- Export only formula leaves; never export formula-owned deduction totals.
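The steps above could be sketched for a single record as follows. This is one possible adjustment rule (proportional rescaling of candidate leaves), not a settled design. The floor rate, the `constrain_leaves` helper, and the fallback leaf are assumptions for illustration, and the deduction function is a simplified stand-in for a PolicyEngine calculation.

```python
FLOOR_RATE = 0.075  # assumed floor as a share of AGI

LEAVES = (
    "health_insurance_premiums_without_medicare_part_b",
    "medicare_part_b_premiums",
    "other_medical_expenses",
)


def computed_deduction(leaves, agi, floor_rate=FLOOR_RATE):
    """Simplified stand-in for the rules-engine deduction from leaf inputs."""
    total = sum(leaves[name] for name in LEAVES)
    return max(total - floor_rate * agi, 0.0)


def constrain_leaves(candidate, agi, target_deduction, floor_rate=FLOOR_RATE):
    """Rescale candidate leaves so the computed deduction matches the target.

    Records without a positive donor target keep their candidate leaves,
    which preserves support for non-itemizers and below-floor units.
    """
    if target_deduction is None or target_deduction <= 0:
        return dict(candidate)
    # Invert the floor formula: total expenses needed to yield the target.
    required_total = target_deduction + floor_rate * agi
    current_total = sum(candidate[name] for name in LEAVES)
    if current_total > 0:
        scale = required_total / current_total
        return {name: candidate[name] * scale for name in LEAVES}
    # No candidate support: place the whole amount in one leaf (assumption).
    adjusted = {name: 0.0 for name in LEAVES}
    adjusted["other_medical_expenses"] = required_total
    return adjusted


candidate = {
    "health_insurance_premiums_without_medicare_part_b": 2_000.0,
    "medicare_part_b_premiums": 1_000.0,
    "other_medical_expenses": 1_000.0,
}
adjusted = constrain_leaves(candidate, agi=80_000.0, target_deduction=4_000.0)
print(adjusted, computed_deduction(adjusted, 80_000.0))
```

Proportional rescaling keeps the candidate's leaf mix intact. A fuller design would likely draw on the other predictors as well, and would need a rule for records whose candidate leaves imply a deduction that the donor data does not show.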
For medical, this could mean imputing these leaves on the PUF support side:
- health_insurance_premiums_without_medicare_part_b
- medicare_part_b_premiums
- other_medical_expenses
- possibly over_the_counter_health_expenses, where relevant outside the IRS deduction
Then run PE to calculate medical_out_of_pocket_expenses and medical_expense_deduction for calibration diagnostics and target fit.
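One way to keep calculated outputs available for diagnostics while exporting only leaves is to split each record by column ownership. In this sketch the `FORMULA_OWNED` set and the `split_for_export` helper are hypothetical. In practice the set would presumably come from the country package rather than being hard-coded.

```python
# Assumption: formula-owned variable names are known up front.
FORMULA_OWNED = frozenset(
    {"medical_out_of_pocket_expenses", "medical_expense_deduction"}
)


def split_for_export(record):
    """Separate leaf inputs (exported) from formula outputs (diagnostics)."""
    exported = {k: v for k, v in record.items() if k not in FORMULA_OWNED}
    diagnostics = {k: v for k, v in record.items() if k in FORMULA_OWNED}
    return exported, diagnostics


record = {
    "other_medical_expenses": 5_000.0,
    "medicare_part_b_premiums": 2_000.0,
    "medical_out_of_pocket_expenses": 7_000.0,  # calculated, not stored
    "medical_expense_deduction": 1_000.0,  # calculated, not stored
}
exported, diagnostics = split_for_export(record)

# Guard against any formula-owned column leaking into the export.
leaked = FORMULA_OWNED & exported.keys()
assert not leaked, f"formula-owned columns in export: {sorted(leaked)}"
```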
Acceptance criteria
- No formula-owned deduction totals are emitted as stored inputs.
- A focused test proves the stage can create positive medical leaf support for both itemizers and non-itemizers/below-floor units.
- A diagnostic compares initial/final fit for the medical deduction target and the underlying health-leaf totals.
- The implementation is spec-driven from the country package: country content selects sources/variables and operation type; shared runtime code performs the formula-constrained imputation.
- The release gate distinguishes leaf-support failure from formula-output calibration failure.
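The support test and the release gate could be combined roughly as below. The record layout, the 5% tolerance, and the status strings are all assumptions for illustration. The point is only that the two failure modes are checked separately and reported under different names.

```python
def relative_error(estimate, target):
    """Absolute relative error; assumes a nonzero target."""
    return abs(estimate - target) / abs(target)


def release_gate(records, deduction_target, tolerance=0.05):
    """Classify the stage outcome.

    Each record is a dict with keys: weight, itemizes, leaf_total, deduction.
    """
    # Leaf support: positive leaves among itemizers ...
    itemizer_support = any(
        r["leaf_total"] > 0 for r in records if r["itemizes"]
    )
    # ... and among non-itemizers or units with a zero (below-floor) deduction.
    other_support = any(
        r["leaf_total"] > 0
        for r in records
        if not r["itemizes"] or r["deduction"] == 0
    )
    if not (itemizer_support and other_support):
        return "leaf_support_failure"

    # Formula-output fit: weighted calculated deduction versus the target.
    weighted_total = sum(r["weight"] * r["deduction"] for r in records)
    if relative_error(weighted_total, deduction_target) > tolerance:
        return "formula_output_calibration_failure"
    return "pass"


records = [
    {"weight": 100.0, "itemizes": True, "leaf_total": 12_000.0, "deduction": 6_000.0},
    {"weight": 200.0, "itemizes": False, "leaf_total": 3_000.0, "deduction": 0.0},
]
print(release_gate(records, deduction_target=610_000.0))
```

Checking support before fit means a stage that hits the deduction target by concentrating all expenses on itemizers is still reported as a support failure.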
Other candidate concepts
The same pattern may apply to:
- Mortgage/itemized interest: infer structural mortgage leaves from interest deduction and housing/debt observables.
- Charitable giving: split cash/non-cash leaves while respecting the PE charitable deduction formula and itemization limits.
- SALT: split property tax, state income tax, and sales tax leaves while respecting the SALT deduction cap/formula.
- QBI: infer business leaves and qualification flags from deduction and business-income observables.
- Taxable Social Security: impute Social Security components/leaves and let PE calculate taxable Social Security rather than storing taxable totals.
- ACA/PTC: impute take-up and plan-choice leaves, then let PE calculate PTC rather than targeting stored PTC directly.
- EITC/CTC return-count diagnostics: materialize eligibility/claiming leaves and use PE-calculated credits/positive-credit indicators for targets.
This should be treated as a model-design issue, not a quick release blocker for the current formula-owned export assertion fix.