Add second-stage QRF imputation for FRS-only variables by MaxGhenis · Pull Request #362 · PolicyEngine/policyengine-uk-data

MaxGhenis · 2026-04-18T00:24:31Z

Summary

Fixes item 3 of #1621, the biggest-ticket architectural gap in the enhanced-FRS pipeline.

The zero-weight SPI-donor subsample's income columns are rewritten by the SPI-trained first-stage QRF, but every other FRS column (benefit _reported values, pension contributions, savings-interest income, state-pension and disability-benefit _reported amounts, council-tax benefit) keeps the values of whichever middle-income FRS donor was sampled. A £2M imputed self-employment earner keeps its donor's £120 UC _reported value, tiny pension contributions, and typical rent. When calibration upweights these rows, the mismatches cascade into false benefit aggregates, depressed allowances, and distorted housing-cost totals.

This PR adds a second-stage QRF in frs_only.py that:

1. Trains on the original full-FRS with predictors = [demographics + the six stage-1 income components] and outputs = the FRS-only variables listed in FRS_ONLY_PERSON_VARIABLES.
2. Predicts for every SPI-donor row using [demographics + newly-imputed incomes] as predictors.
3. Overwrites only the listed outputs, clamps predictions to be non-negative, and skips any output column that is missing from the training or target frame.
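The three steps above can be sketched as follows. This is a minimal illustration, not the code in frs_only.py: a k-nearest-neighbour donor draw stands in for the QRF so the example needs only NumPy and pandas, and the function name `impute_frs_only`, the column names, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd


def impute_frs_only(train, donors, predictors, outputs, k=5, seed=0):
    """Overwrite `outputs` on `donors` using a model fitted on `train`.

    Stand-in model (an assumption, not the PR's QRF): each donor row copies
    the outputs of one of its k nearest training rows in predictor space.
    """
    # Skip any output column that is missing from either frame.
    usable = [c for c in outputs if c in train.columns and c in donors.columns]
    result = donors.copy()
    if not usable or len(train) == 0 or len(donors) == 0:
        return result

    rng = np.random.default_rng(seed)
    x_train = train[predictors].to_numpy(dtype=float)
    x_donor = donors[predictors].to_numpy(dtype=float)

    # Standardise so that no single predictor dominates the distance.
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0
    x_train = (x_train - mean) / std
    x_donor = (x_donor - mean) / std

    k = min(k, len(train))
    y_train = train[usable].to_numpy(dtype=float)
    imputed = np.empty((len(donors), len(usable)))
    for i, row in enumerate(x_donor):
        dist = np.linalg.norm(x_train - row, axis=1)
        nearest = np.argpartition(dist, k - 1)[:k]
        imputed[i] = y_train[rng.choice(nearest)]

    # Non-negative clamp, then overwrite only the listed outputs.
    result[usable] = np.clip(imputed, 0.0, None)
    return result


# Synthetic training data: UC receipt falls to zero above £50k of income.
income = np.linspace(0, 200_000, 201)
train = pd.DataFrame({
    "age": 40,
    "employment_income": income,
    "universal_credit_reported": np.where(income < 50_000, 100.0, 0.0),
})
# One SPI-donor-style row: very high imputed income, stale donor values.
donors = pd.DataFrame({
    "age": [40],
    "employment_income": [2_000_000.0],
    "universal_credit_reported": [120.0],
    "rent": [800.0],
})
out = impute_frs_only(
    train,
    donors,
    predictors=["age", "employment_income"],
    outputs=["universal_credit_reported", "council_tax_benefit"],
)
```

In this toy setup the high earner's UC value is redrawn from high-income training rows, `rent` is left alone because it is not a listed output, and `council_tax_benefit` is skipped because neither frame carries it.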

Mirrors the CPS-only stage-2 QRF introduced in PolicyEngine/policyengine-us-data#589 and follows the same training pattern as _impute_cps_only_variables.

Expected impact

High-income SPI-donor rows should now carry income-consistent benefit _reported values (close to zero for £500k+ earners), realistic pension contributions, and savings income correlated with imputed income. This should substantially reduce the +£4-6bn drift across the income_support, esa_contrib, working_tax_credit, child_tax_credit, and housing_benefit aggregates that the tracking issue attributes to donor leakage.

Test plan

Unit tests (4): non-negative outputs, non-target-column preservation, missing-column tolerance, training-pattern gradient (income -> UC receipt)
make format passes
Full data build in CI (integration path)
Benefit-aggregate comparison vs OBR on the new build (manual verification)

Generated with Claude Code

The enhanced-FRS pipeline's zero-weight SPI-donor subsample has its income columns rewritten by a SPI-trained first-stage QRF, but every other FRS column (benefit `_reported` values, pension contributions, savings income, council tax benefit) stays as whatever middle-income FRS donor was sampled. After calibration upweight this cascades into false benefit aggregates, distorted allowances, and housing-cost mismatches. The tracking issue attributes about £4-6bn of benefit-aggregate drift to this failure mode (most visibly the "£1M earners with zero everything else" pattern described in #1621).

Adds a second-stage QRF (`frs_only.py`) that trains on the original full-FRS build with predictors = [demographics + first-stage income outputs] and outputs = a curated list of FRS-only variables, then predicts for every SPI-donor row. High-earner predictions collapse UC / HB / WTC receipt toward zero, pension contributions rescale, and savings interest correlates with imputed income. Mirrors the CPS-only stage-2 QRF introduced in policyengine-us-data#589.

Unit tests cover: non-negative outputs, that non-target columns are untouched, that missing train/target columns are skipped silently, and that the predictions track the training-data income → receipt gradient. The real full-FRS retrain runs in CI via the integration data-build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI surfaced a KeyError: 'region' because the FRS build stores region on the household frame, not the person frame. Route the lookup through person_household_id so the stage-2 QRF trains and predicts on the household-derived region column without needing a full Microsimulation bootstrap (which would require a host of unrelated household columns, such as council_tax and tenure_type, that test fixtures don't carry).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
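A minimal sketch of that lookup, assuming the person frame carries `person_household_id` (named in the commit message) and the household frame carries `household_id` and `region` columns. The helper name `person_region` and the toy frames are hypothetical; the fix in the PR may be written differently.

```python
import pandas as pd


def person_region(person: pd.DataFrame, household: pd.DataFrame) -> pd.Series:
    """Return one region value per person, looked up via person_household_id."""
    # Index the household frame by its id so that map() can use it as a lookup.
    region_by_household = household.set_index("household_id")["region"]
    return person["person_household_id"].map(region_by_household)


person = pd.DataFrame({
    "person_id": [1, 2, 3],
    "person_household_id": [10, 10, 20],
})
household = pd.DataFrame({
    "household_id": [10, 20],
    "region": ["LONDON", "WALES"],
})
regions = person_region(person, household)
```

Only the two household columns involved in the lookup are needed, which is why small test fixtures can exercise this path.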

The weighted-UK-population drift that motivated #310 has already dropped from ~6.5% to ~1.6% on current main as a side-effect of the data-pipeline improvements landed yesterday (stage-2 QRF #362, TFC target refresh #363, reported-anchor takeup #359).

Tightens `test_population` tolerance from 7% to 3% to lock in that gain: any future calibration change that regresses back toward the pre-April-2026 overshoot now trips CI instead of silently drifting.

Adds a new `test_population_fidelity.py` with four regression tests extracted from the #310 draft:
- weighted-total ONS match (3% tolerance)
- household-count sanity range (25-33 M)
- non-inflation guard (< 72 M)
- country-populations-sum-to-UK consistency

Does not include #310's loss-function change or Scotland target removal; those are independent proposals and should be evaluated on their own merits once the practical overshoot is resolved.

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
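The four regression checks listed in this commit message can be sketched as one function. This is an illustration under stated assumptions rather than the contents of `test_population_fidelity.py`: the function name, its arguments, and every number in the usage lines are hypothetical, and `ons_target=70e6` is a round placeholder, not the ONS figure.

```python
def check_population_fidelity(
    weighted_population,
    household_count,
    country_populations,
    ons_target,
    tolerance=0.03,
):
    """Return a list of failed checks; an empty list means all four pass."""
    failures = []

    # 1. Weighted total matches the ONS target within the tolerance.
    if abs(weighted_population - ons_target) / ons_target > tolerance:
        failures.append("weighted total outside ONS tolerance")

    # 2. Household count falls in the 25-33 M sanity range.
    if not 25e6 <= household_count <= 33e6:
        failures.append("household count outside 25-33 M")

    # 3. Non-inflation guard: total population stays below 72 M.
    if weighted_population >= 72e6:
        failures.append("population at or above 72 M")

    # 4. Country populations sum to the UK total (small relative slack).
    country_sum = sum(country_populations.values())
    if abs(country_sum - weighted_population) > 1e-6 * weighted_population:
        failures.append("country populations do not sum to UK total")

    return failures


# Illustrative numbers only.
countries = {
    "ENGLAND": 60e6,
    "SCOTLAND": 5.5e6,
    "WALES": 3.5e6,
    "NORTHERN_IRELAND": 2e6,
}
ok = check_population_fidelity(71e6, 29e6, countries, ons_target=70e6)
bad = check_population_fidelity(75e6, 29e6, countries, ons_target=70e6)
```

Returning the failures as a list, instead of asserting inside the function, keeps each check separately reportable; in a pytest file each one would more naturally be its own test.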

* Loosen population tolerance 3% -> 4% for stochastic calibration variance

First CI run on this branch produced 71.8M (3.31% over target) where yesterday's main build produced 70.97M (1.58%). Stochastic dropout in the calibration optimiser (`dropout_weights(weights, 0.05)`) gives ~1-2 percentage point build-to-build variance on the population total. 4% keeps the regression gate well below the pre-April-2026 overshoot (~6.5%) while not flaking on normal stochastic variance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
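The tolerance choice can be checked with a few lines of arithmetic. The overshoot figures below are the ones quoted in the commit messages (~6.5%, 1.58%, 3.31%); the dictionary labels and the `passes` helper are illustrative names, not code from the repository.

```python
# Overshoot of the weighted population over target, as quoted in the commits.
overshoots = {
    "pre-April-2026": 0.065,
    "main build": 0.0158,
    "first CI run on branch": 0.0331,
}


def passes(overshoot, tolerance):
    """A build passes the gate when its overshoot is within the tolerance."""
    return overshoot <= tolerance


# Compare the original 7% gate with the 3% and 4% gates.
results = {
    tolerance: {name: passes(value, tolerance) for name, value in overshoots.items()}
    for tolerance in (0.07, 0.04, 0.03)
}
```

Under these figures the 7% gate lets the pre-April-2026 overshoot through, the 3% gate rejects the branch's first CI run, and the 4% gate accepts both recent builds while still rejecting the old overshoot.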

MaxGhenis mentioned this pull request Apr 18, 2026

test_reform_fiscal_impacts[Raise VAT standard rate by 2pp] fails on main (43bn actual vs 25bn expected) #364

Closed

MaxGhenis and others added 2 commits April 18, 2026 07:37

MaxGhenis force-pushed the add-second-stage-qrf-frs-vars branch from ae3b4b6 to 4c06956 Compare April 18, 2026 11:37

MaxGhenis merged commit 5c726b6 into main Apr 18, 2026
3 checks passed

MaxGhenis deleted the add-second-stage-qrf-frs-vars branch April 18, 2026 12:08

This was referenced Apr 19, 2026

Tighten population tolerance and add fidelity tests #366

Merged

Fix calibration population overshoot (~6% drift) #310

Draft

MaxGhenis merged 2 commits into main from add-second-stage-qrf-frs-vars
