Calibrate LA council tax (band counts + net £) and fix national gross/net #374

Open
vahid-ahmadi wants to merge 10 commits into main from feat/la-council-tax-targets
Conversation

@vahid-ahmadi
Collaborator

@vahid-ahmadi vahid-ahmadi commented Apr 21, 2026

What this PR does

Calibrates both FRS council tax data points at LA level, addressing the 28 Apr standup ALIGNED decision ("the model will calibrate the two FRS data points as the council tax information is provided after deductions").

Three new column families added to datasets/local_areas/local_authorities/loss.py:

| FRS data point | Matrix column | Target column | y formula |
| --- | --- | --- | --- |
| Council tax band | 1[council_tax_band == B] for B in A–H | voa/council_tax/{A..H} (8 cols) | VOA chargeable dwellings per band per LA |
| Council tax £ paid (net of CTR) | council_tax_less_benefit (= gross − CTR benefit) | housing/council_tax_net | MHCLG taxbase-after-CTR × Band D (E); WG Council Tax Income (W) |
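
A minimal sketch of how the band row of the table could enter the loss matrix, written in dependency-free Python. The toy households, the weights and the `band_indicator` / `weighted_count` helpers are illustrative assumptions, not the code in loss.py. Each band contributes one 0/1 column, and the weighted column sum is the model-side count compared with the VOA dwelling count for the LA.

```python
BANDS = "ABCDEFGH"

def band_indicator(household_bands, band):
    """Return the 0/1 column 1[council_tax_band == band], one entry per household."""
    return [1.0 if b == band else 0.0 for b in household_bands]

def weighted_count(column, weights):
    """Model-side estimate: sum of weight x indicator over households."""
    return sum(w * x for w, x in zip(weights, column))

# Hypothetical toy data: five households and their weights in one LA.
household_bands = ["A", "B", "A", "D", None]  # None: no band recorded
weights = [120.0, 80.0, 100.0, 50.0, 60.0]

matrix = {band: band_indicator(household_bands, band) for band in BANDS}
estimates = {band: weighted_count(col, weights) for band, col in matrix.items()}
```

A household with no band lands in none of the eight columns, so each household's row sums to at most 1.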

Plus a national-level fix: targets/compute/council_tax.py now uses council_tax_less_benefit instead of council_tax for the OBR obr/council_tax target (which is "Total net council tax receipts"). Both sides of that constraint are now net of CTR.

LAs missing either input have their y cell left as NaN; the calibrator (utils/calibrate.py) masks NaN cells out of the loss so missing-source LAs don't contribute to training on those targets. No fabricated national-share fallbacks; no hard-zero NI targets — only directly observed cells train the calibrator (commit 96f5707).
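
The masking idea can be sketched as follows. This is a toy illustration, not the implementation in utils/calibrate.py: the squared relative error and the `masked_relative_loss` name are assumptions chosen for the example. The point is only that a NaN target drops out of the loss instead of propagating.

```python
import math

def masked_relative_loss(estimates, targets):
    """Mean squared relative error over cells whose target is observed.

    NaN targets are skipped entirely, so they add nothing to the loss.
    Assumes observed targets are non-zero.
    """
    terms = [
        ((est - tgt) / tgt) ** 2
        for est, tgt in zip(estimates, targets)
        if not math.isnan(tgt)
    ]
    # With no observed cells there is nothing to train on.
    return sum(terms) / len(terms) if terms else 0.0

# Hypothetical LA cells: two observed, one masked (no source wired).
targets = [1000.0, float("nan"), 500.0]
estimates = [1100.0, 123456.0, 450.0]

loss = masked_relative_loss(estimates, targets)
```

Whatever the model predicts for the masked cell, the loss is unchanged, which is the property the toy calibrator regression is described as locking in.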

Lineage caveats (flagged in review by @MaxGhenis)

Both council-tax LA targets are derived/proxy targets — products / rescalings of observed inputs, not directly observed LA totals. The PR earlier described them as direct; that was overstated.

Band counts (voa/council_tax/{A..H}):

  • Target: VOA dwelling stock per band per LA (E&W only).
  • Matrix: policyengine-uk household council_tax_band.
  • Drift: dwellings ≠ households; VOA stock includes exempt / empty / second-home dwellings without a household; banding ratios differ in Scotland (post-2017) and Wales (Band I, England has none).

Net £ paid (housing/council_tax_net):

  • Target: MHCLG taxbase × Band D (E), WG-published net council tax income (W). Taxbase is Band D equivalent dwellings adjusted for ~7 discount/premium/exemption classes.
  • Matrix: FRS-reported council_tax_less_benefit (household-reported gross less reported CTB).
  • Drift: same intent (what households pay net of CTR) but different construction paths and underlying microdata.

A separate policy question — whether derived/proxy targets like these should sit at full training weight alongside directly observed targets (HMRC SPI, ONS pop, DWP UC, VOA dwellings) — is being tracked in #381 and is not blocking this PR.

Closes #370.

Sources

| Country | Source | Coverage |
| --- | --- | --- |
| England (296) | MHCLG Council Taxbase 2025, Table 1.35 "Tax base after allowance for council tax support" × LA Band D amount | full |
| Wales (22) | Welsh Government Council Tax Levels April 2026 to March 2027, Table 3 "Council tax income (£m)"; already net of CTR | full |
| Scotland (32) | None wired; cells masked (NaN, excluded from loss). Follow-up: parse Scottish Government Council Tax Datasets. | masked |
| NI (10) | No council tax (domestic rates instead); cells masked | masked |

Sanity check. Across the 318 LAs with a directly observed net £ figure, the per-LA targets sum to ~£49.9bn (England £47.4bn + Wales £2.45bn). The England component is within 3.4% of MHCLG's published England-only Council Tax Requirement of £45.86bn; the small gap comes from the year mismatch (2025 taxbase × 2026-27 Band D). The reconciliation is for sanity only; the calibration constraint operates per-LA, not on the national aggregate.
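
The arithmetic behind the sanity check can be reproduced in a few lines. The three figures are taken from the paragraph above; the variable names are only illustrative.

```python
england_bn = 47.4             # sum of English per-LA targets
wales_bn = 2.45               # sum of Welsh per-LA targets
mhclg_england_ctr_bn = 45.86  # MHCLG England Council Tax Requirement

# Covered total across the 318 English and Welsh LAs.
covered_total_bn = england_bn + wales_bn

# Relative gap of the England component against the MHCLG figure.
england_gap = england_bn / mhclg_england_ctr_bn - 1
```

`covered_total_bn` comes out at about 49.85 and `england_gap` at about 0.034, matching the ~£49.9bn and 3.4% quoted.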

Files

New / modified

  • policyengine_uk_data/storage/la_council_tax.csv: adds total_council_tax_net column for E/W LAs.
  • policyengine_uk_data/targets/sources/la_council_tax.py: load_la_net_council_tax() helper + Target objects named housing/council_tax_net/{code}. Module docstring documents derived/proxy nature + lineage caveats.
  • policyengine_uk_data/targets/compute/council_tax.py: switches OBR national matrix col from council_tax (gross) to council_tax_less_benefit (net) so both sides of the constraint are net of CTR.
  • policyengine_uk_data/datasets/local_areas/local_authorities/loss.py: housing/council_tax_net block immediately after the band-count block. Both blocks leave missing-source cells as NaN; the calibrator masks them.
  • policyengine_uk_data/utils/calibrate.py: calibrator updated to mask NaN cells out of the per-LA loss so sparse local targets are first-class.
  • policyengine_uk_data/tests/test_la_loss_council_tax.py: layer-1 CSV/coverage tests + layer-2 FRS-fixture-gated wiring + calibratability + NaN-masking assertions.
  • policyengine_uk_data/tests/test_calibrate_save.py: toy calibrator regression locking in the NaN-masking contract.
  • policyengine_uk_data/tests/test_obr_council_tax.py: 3 tests pinning the net-variable contract for the national OBR target.

Tests

  • CSV-level: total_council_tax_net column present, England + Wales fully covered, Scotland + NI absent (masked territory), value range £2m–£1.5bn (lower bound for Isles of Scilly), covered total in £43–50bn ballpark.
  • Loss-matrix wiring (FRS-fixture-gated): housing/council_tax_net column present in matrix and y, matrix col equals sim.calculate("council_tax_less_benefit"), direct cells finite, English LA y matches CSV exactly, Scotland + NI cells are NaN (masked, not fabricated), covered-LA target sum within 0.3–3× of weighted initial covered-LA net CT (calibratability sanity).
  • Calibrator masking (toy): NaN local target cells stay finite in the produced weights — they don't propagate NaN through the loss.
  • National OBR compute (light): compute_obr_council_tax returns council_tax_less_benefit not council_tax; country masks apply correctly; gross variable not queried.
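
The calibratability check in the loss-matrix bullet amounts to a ratio test. A minimal sketch, assuming a helper named `within_calibratable_range` (hypothetical, not the test's actual code) and made-up totals in £bn:

```python
def within_calibratable_range(target_sum, initial_sum, low=0.3, high=3.0):
    """True when target_sum / initial_sum lies in [low, high]."""
    if initial_sum <= 0:
        return False  # nothing to rescale from
    ratio = target_sum / initial_sum
    return low <= ratio <= high

# Hypothetical totals: covered-LA targets vs initial weighted net council tax.
ok = within_calibratable_range(49.9, 47.0)
too_far = within_calibratable_range(49.9, 10.0)
```

The wide 0.3–3× band is a guard against unit or wiring mistakes (for example £m against £), not a statement about expected fit.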

Full run incl. adjacent suites: no regressions.

Out of scope (follow-ups)

Related

Two families of LA-level targets, covering all 360 LAs in
local_authorities_2021.csv, built from four public sources:

- `ons/council_tax_band_d/{code}` (350 targets): average Band D
  council tax inclusive of all precepts per billing authority.
  Sources: MHCLG *Council Tax levels set by local authorities in
  England 2026-27*, Welsh Government *Council Tax levels April 2026
  to March 2027*, Scottish Government *Council Tax Assumptions 2025*.
  All 296 English + 22 Welsh + 32 Scottish LAs covered.
- `ons/council_tax_band_count/{code}/{band}` (2,541 targets): number
  of dwellings per band A-H per LA. Source: VOA *Council Tax: Stock
  of Properties, 2025*. Covers England + Wales (318 LAs × ~8 bands,
  minus City of London Band A which is VOA-suppressed).

NI is excluded: domestic rates, not council tax. Scotland band
counts are not in VOA; Scottish Assessors publishes them separately
and is a follow-up.

Files
-----

- `storage/la_council_tax.csv` (31 KB, 360 rows): canonical CSV
  joining DLUHC Table 10 column 17, Welsh Table 1 "Overall average
  band D", Scottish Gov "CT by Band 2025-26" Band D column, and VOA
  CTSOP1.0 bands A-H onto the reference LA list.
  - Post-2023 South Yorkshire E-codes (E08000038/39) re-mapped to
    pre-2023 codes (E08000016/19) to match the reference list.
  - Scottish ampersand/double-space naming normalised
    ("Argyll & Bute" → "Argyll and Bute", etc.).
- `targets/sources/la_council_tax.py`: reads the CSV, emits Target
  objects at geographic_level=LOCAL_AUTHORITY with per-country year
  tagging and per-country reference URL.
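
The two normalisation steps described for the CSV can be sketched as below. The helper names are hypothetical, and the pairing of codes (38 → 16, 39 → 19) is an assumption based on the order in which they are listed above.

```python
import re

# Assumption: pairs follow the listed order (E08000038 -> E08000016, E08000039 -> E08000019).
CODE_REMAP = {"E08000038": "E08000016", "E08000039": "E08000019"}

def normalise_code(code):
    """Map post-2023 codes back to the reference list's pre-2023 codes."""
    return CODE_REMAP.get(code, code)

def normalise_name(name):
    """Replace ampersands with 'and' and collapse repeated whitespace."""
    name = name.replace("&", " and ")
    return re.sub(r"\s+", " ", name).strip()

example_name = normalise_name("Argyll & Bute")
example_code = normalise_code("E08000038")
```

Codes not in the remap pass through unchanged, so the helper is safe to apply to every row before joining onto the reference LA list.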

Testing
-------

22 hermetic tests (no network access, no baseline fixture needed):

Structure
- Row count matches local_authorities_2021.csv.
- Every expected column present.
- Four UK country codes represented.
- Every LA code matches the reference list.

Value plausibility (the #371 lesson)
- Band D amount in [£900, £3,500] for every row with a value.
- Total dwellings in [200, 800,000] for every row with a value.
- Explicit Isles of Scilly regression test: total dwellings in
  [500, 5,000], not the 2.49M outlier that slipped into #371.
- Band A-H counts sum to total dwellings within 20-property slack
  (VOA 10-property suppression allowance).
- Every band-count target value ≤ 500k (largest LA stock).
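
A sketch of the band-sum plausibility check, assuming a helper called `band_sum_matches_total` and an invented LA row (neither is taken from the test suite). Missing or suppressed cells are represented as `None` and skipped.

```python
def band_sum_matches_total(band_counts, total_dwellings, slack=20):
    """Check the A-H counts sum to the total within `slack` properties.

    None entries stand for suppressed or missing cells and are skipped.
    """
    observed = [c for c in band_counts if c is not None]
    return abs(sum(observed) - total_dwellings) <= slack

# Hypothetical LA row: counts for bands A..H, then two candidate totals.
bands = [12000, 9500, 8000, 4000, 2000, 900, 300, 40]
consistent = band_sum_matches_total(bands, 36750)
outlier = band_sum_matches_total(bands, 2_490_000)
```

A total that is off by orders of magnitude, as in the Isles of Scilly case mentioned above, fails this check immediately.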

Coverage expectations
- Every English, Welsh and Scottish LA has a Band D value.
- Northern Ireland is flagged as having no council tax (has_council_tax=False).

Spot-checks of published facts
- Wandsworth (E09000032) and Westminster (E09000033) are the two
  lowest-Band-D English LAs (catches row-swap bugs).
- Scottish average Band D is £500+ below English average.

Target-API invariants
- get_targets() returns a non-empty list without network access.
- Band D target count matches the CSV's non-null Band D count.
- Band count target count matches Σ non-null band columns.
- Every target carries geographic_level=LOCAL_AUTHORITY and a
  geo_code.
- Band D targets use Unit.GBP; band count targets use Unit.COUNT
  with is_count=True.
- Every target has at least one year of values.

Sources
-------

- MHCLG (England 2026-27):
  https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
- Welsh Government (Wales 2026-27):
  https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
- Scottish Government (Scotland 2025-26):
  https://www.gov.scot/publications/council-tax-datasets/
- VOA (England + Wales 2025):
  https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025

Out of scope for this PR (follow-ups)
-------------------------------------

- Wiring these targets into
  datasets/local_areas/local_authorities/loss.py so the LA
  reweighting actually calibrates on them. Planned follow-up PR.
- Scottish Assessors per-LA chargeable-dwellings to fill the Scotland
  band-count gap.
- Council Tax Support caseload per LA (DWP StatXplore).
- Single Person Discount rate per LA (CIPFA).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi self-assigned this Apr 21, 2026
@vahid-ahmadi vahid-ahmadi requested a review from MaxGhenis April 21, 2026 11:42
vahid-ahmadi and others added 3 commits April 23, 2026 13:50
Review points addressed:

- Add count_band_I column to la_council_tax.csv, populated for all 22
  Welsh LAs (Wales revalued in 2005 and introduced a 9th band). Cardiff
  1480, Monmouthshire 670, Vale of Glamorgan 1060, etc. English rows
  keep Band I null; VOA marks it [z] (not applicable).
- Re-source total_dwellings from VOA "All properties" column instead
  of deriving it as the sum of A-H. Previously Σ(A..H) was used for
  both sides of test_band_counts_sum_to_total, making the test
  self-referential; now it validates against the published total with
  a 20-property slack for VOA rounding.
- Rename count columns symmetrically: band_A..band_H + band_D_count →
  count_band_A..count_band_I. Removes the lopsided band_D_count name
  that existed only to avoid clashing with band_d_amount.
- Align band-count target names with voa_council_tax.py:
  voa/council_tax/{code}/{band} (was ons/council_tax_band_count/...);
  variable="council_tax_band" (was council_tax_band_count, which is
  not a real PolicyEngine-UK variable); drop breakdown_variable to
  match the regional VOA module.
- Cache the CSV read with @lru_cache(maxsize=1), matching voa_council_tax.
- Update module docstring: "A-H in England/Scotland, A-I in Wales".

Tests:
- New: test_welsh_las_have_band_i (all 22 Welsh LAs populated).
- New: test_english_las_have_no_band_i (guard against spurious fills).
- New: test_cardiff_band_i_matches_published_figure (~1,480 per VOA 2025).

Final target counts:
- 350 Band D amount targets (unchanged).
- 2,563 band-count targets, up from 2,541: +22 Welsh Band I plus two
  band-H rows that were null due to the earlier truncation.

The targets registered in la_council_tax.py were inert: the LA target
matrix had no columns for them, so the reweighter could not see them.
This wires the eight VOA Council Tax Stock-of-Properties band-count
targets (A-H) into the LA loss matrix:

- matrix entry: per-household indicator 1[council_tax_band == B] from
  policyengine-uk.
- y entry: 360-vector of per-LA dwelling counts from
  storage/la_council_tax.csv. For LAs without VOA data — Scottish LAs
  (the VOA summary tables don't cover Scotland) and Northern Irish LAs
  (no council tax) — the value falls back to
  national_count × la_household_share, matching the existing tenure
  block's fallback pattern.

Two targets are deliberately not wired in this pass:

- Band I — Wales-only and mostly null in the CSV.
- The Band D £ amount (ons/council_tax_band_d/{code}) — a per-rate
  quantity that does not fit the linear matrix-times-weights
  aggregation. Wiring it as total council-tax revenue would need
  Scotland-specific band ratios (different from England/Wales after
  2017) and is worth a separate PR.

New tests in test_la_loss_council_tax.py cover both layers:

- Light: CSV joins to every LA code, the eight count_band_{X} columns
  exist, E/W rows are populated, Scotland is null as documented, and
  NI has has_council_tax=False.
- Full build (gated on enhanced FRS fixture): all eight columns present
  in matrix and y; y vectors length 360, finite and positive; matrix
  entries are 0/1 indicators with rows summing to ≤1; y matches the
  CSV verbatim for an English LA (Hartlepool); Scotland and NI LAs
  receive a positive fallback rather than NaN or zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level council tax calibration targets (Band D + band distribution) Add LA-level council tax band-count targets and calibrate on them Apr 27, 2026
Wires the second FRS data point into the LA reweighter, addressing
the 28 Apr standup ALIGNED decision: "calibrate the two FRS data
points as the council tax information is provided after deductions."

Both sides of the new constraint are net of CTR:
- matrix col = council_tax_less_benefit (gross − CTR benefit)
- y = directly observed net council tax requirement per LA

Sources (no national-total apportionment, all directly published):
- England (296 LAs): MHCLG Council Taxbase 2025, Table 1.35 "Tax base
  after allowance for council tax support" × Band D amount.
  Sums to £47.4bn, within 3.4% of the MHCLG Table 1 published England
  Council Tax Requirement of £45.86bn (small gap from year mismatch:
  2025 taxbase × 2026-27 Band D).
- Wales (22 LAs): Welsh Government "Council Tax Levels April 2026
  to March 2027" Table 3 "Council tax income (£m)". Sums to £2.45bn.
- Scotland (32) and NI (10): no source wired; loss.py routes through
  the existing national × la_household_share fallback, same pattern
  as the band-count target and the rent target.

Mirrors the rent block in loss.py: load CSV → merge into ct_merged →
matrix col / y assignment / has_data mask / national-share fallback.

Files:
- storage/la_council_tax.csv: new column total_council_tax_net.
- targets/sources/la_council_tax.py: load_la_net_council_tax() +
  Target objects named housing/council_tax_net/{code}.
- datasets/local_areas/local_authorities/loss.py: housing/council_tax_net
  block immediately after the band-count block.
- tests/test_la_loss_council_tax.py: 11 new tests (4 layer-1 +
  7 layer-2) covering CSV column presence, country coverage, value
  range, England-total ballpark vs MHCLG, matrix-col correctness,
  na-fallback behaviour, calibratability sanity check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level council tax band-count targets and calibrate on them Add LA-level council tax band counts and net £ amount to calibration Apr 29, 2026
@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 29, 2026

Edits:

1. Both FRS data points now calibrated at LA level

| FRS data point | Where |
| --- | --- |
| Council tax band | voa/council_tax/{A..H}: 8 cols (already in this PR) |
| Council tax £ paid | housing/council_tax_net: new in this commit |

2. Net of CTR alignment on both sides

  • Matrix col: council_tax_less_benefit (= gross − CTR benefit), not council_tax.
  • Target value: directly published net council tax requirement per LA.

3. Net receipts data investigated and used directly (no derivation)

  • England (296 LAs): MHCLG Council Taxbase 2025, Table 1.35 Tax base after allowance for council tax support × Band D. Sums to £47.4bn vs MHCLG's published Council Tax Requirement of £45.86bn (within 3.4%; year-mismatch).
  • Wales (22 LAs): Welsh Government Table 3 Council tax income (£m) — already net.
  • Scotland (32) + NI (10): fallback to national × la_household_share, same pattern as the band-count and tenure targets. Scottish source is a separate follow-up.

4. Same shape as the existing rent / tenure / housing/main_residence_value (#371) blocks — directly observed LA quantity, no national apportionment, na-fallback for missing LAs.

One thing I flagged but did not fix here: at the national level, the OBR obr/council_tax target value is "Total net council tax receipts" but the matrix col in targets/compute/council_tax.py:24 is council_tax (variable labelled "Gross amount, before discounts"). Either the variable label is wrong (FRS encodes net, label is misleading) or there's a real gross/net mismatch in the national reweighter. Worth verifying separately — it maps to the "[Vahid Ahmadi] Review Data" action item.

OBR EFO Table 4.1 reports "Total net council tax receipts" — net of
council tax reduction (CTR). The matching household-level signal is
council_tax_less_benefit (= gross council tax − CTR award), not
council_tax (which is the gross liability before CTR per its
docstring "Gross amount spent on Council Tax, before discounts").

Calibrating gross household values against a net national target
systematically pulls weights down to fit (Σ w × gross > Σ w × net),
leaking bias into adjacent national targets that share the weight
vector.

Order-of-magnitude sanity (UK 2024-25):
  Σ w × council_tax (gross)              ≈ £55bn
  Σ w × council_tax_less_benefit (net)   ≈ £47bn
  OBR Table 4.1 "Total net council tax"  ≈ £44bn
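
To make the size of the pull concrete, the figures above can be turned into the uniform weight scaling that would be needed to hit the OBR target on this constraint alone. That framing is a simplifying assumption for illustration; the real reweighter balances many targets at once.

```python
gross_bn = 55.0    # sum of w * council_tax
net_bn = 47.0      # sum of w * council_tax_less_benefit
obr_net_bn = 44.0  # OBR Table 4.1 net target

# Uniform scaling of weights needed to match the target on this constraint alone.
scale_if_gross = obr_net_bn / gross_bn
scale_if_net = obr_net_bn / net_bn
```

With the gross variable the weights would need to shrink by about 20%; with the net variable the required adjustment is closer to 6%.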

After the fix, the council tax constraint is internally consistent
(both sides net) and aligns with Max's 28 Apr standup decision on
FRS-net-of-CTR alignment. Pairs naturally with the LA-level
housing/council_tax_net target this PR adds — both use the same net
variable.

Adds three regression tests pinning the net-variable contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level council tax band counts and net £ amount to calibration Calibrate LA council tax (band counts + net £) and fix national gross/net Apr 29, 2026
@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 29, 2026

Folded the national fix into this PR (commit 3277a21):

targets/compute/council_tax.py:24 was passing the gross variable (council_tax) as the matrix col for the net OBR target (Total net council tax receipts). Now uses council_tax_less_benefit so both sides are net.

Order-of-magnitude check:

  • Σ w × council_tax (gross): ~£55bn
  • Σ w × council_tax_less_benefit (net): ~£47bn
  • OBR Table 4.1 net target: ~£44bn

Three regression tests added (test_obr_council_tax.py) pinning the net-variable contract — a future refactor that reintroduces the gross variable will fail at test time, not silently in production.
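
One way such a contract test can be written is with a fake simulation that records which variables are requested. The sketch below is an assumption about the shape of the test, not the contents of test_obr_council_tax.py: `RecordingSim` and `compute_net_council_tax` are hypothetical stand-ins, and the real compute function's signature may differ.

```python
class RecordingSim:
    """Fake simulation that records which variables are requested."""

    def __init__(self, values):
        self.values = values
        self.queried = []

    def calculate(self, variable):
        self.queried.append(variable)
        return self.values[variable]

def compute_net_council_tax(sim):
    """Hypothetical stand-in for the national compute function."""
    return sim.calculate("council_tax_less_benefit")

def test_uses_net_variable_only():
    sim = RecordingSim({
        "council_tax": [1500.0, 2000.0],
        "council_tax_less_benefit": [1200.0, 2000.0],
    })
    result = compute_net_council_tax(sim)
    assert result == [1200.0, 2000.0]
    # The gross variable must never be requested.
    assert "council_tax" not in sim.queried

test_uses_net_variable_only()
```

If a refactor swapped the variable back to `council_tax`, both assertions would fail.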

Effect: closes the "[Vahid Ahmadi] Review Data" loop. National council tax calibration is now consistent with the FRS-net framing, and pairs naturally with the LA-level housing/council_tax_net target this PR adds (both use council_tax_less_benefit).

Northern Ireland uses domestic rates, not council tax. The CSV's
has_council_tax flag has been False for NI from the original commit,
but loss.py was ignoring it and assigning national × la_household_share
to NI LAs for both band counts and the new net £ column.

Effect: the optimiser was being told "NI households should pay this
much council tax" with a positive target, while every NI household
has council_tax_band == None and council_tax_less_benefit == 0 — an
unsatisfiable constraint that wastes loss the optimiser cannot drive
to zero. Reported by @MaxGhenis in PR review.

Fix: read has_council_tax from the CSV, gate the np.where so NI LAs
get y == 0 for all 9 council-tax columns. Direct-value and fallback
paths unchanged for E/W/S.

Updates two tests that previously asserted positive fallback for NI;
adds explicit zero-NI assertion for housing/council_tax_net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vahid-ahmadi and others added 2 commits April 29, 2026 16:18
Per @MaxGhenis PR review: both council-tax LA targets are derived
proxies, not direct matches for the matrix-side variables. The PR
description and code comments earlier overstated this.

voa/council_tax/{A..H}: target counts VOA dwellings (E&W only,
includes exempt/empty/second homes); matrix counts policyengine-uk
households. Banding ratios differ in Scotland post-2017 and Wales
has Band I.

housing/council_tax_net: target value is MHCLG taxbase × Band D
(taxbase = Band D equivalent dwellings adjusted for ~7 discount/
premium/exemption classes); matrix col is FRS-reported
council_tax_less_benefit (household-reported gross less reported
CTB). Same intent, different construction paths.

Documentation only — no code, data, or test behaviour change.
The la_council_tax.py docstring now has an explicit "Lineage
caveats" section, and loss.py block comments label both targets
as derived/proxy with cross-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis
Contributor

Follow-up on the direct-target discussion: I pushed 96f5707 to stop fabricating local council-tax targets where no direct source cell exists. Missing CT band/net cells now stay NaN and calibrate_local_areas masks NaN local target cells out of the loss, rather than filling Scotland/NI with national-share values or hard zeroes. This preserves direct source cells and avoids treating NI zeroes as training constraints when the matrix side may not be zero.

This also adds a toy calibrator regression for sparse local targets. Remaining source-cleanup direction: prefer direct council-tax requirement/income where available, and add Scotland band/net sources before training on those LA cells.

Verification: uv run pytest policyengine_uk_data/tests/test_calibrate_save.py policyengine_uk_data/tests/test_la_council_tax_targets.py policyengine_uk_data/tests/test_la_loss_council_tax.py policyengine_uk_data/tests/test_obr_council_tax.py -q; ruff check/format on touched files.

@vahid-ahmadi
Collaborator Author

@MaxGhenis — nit on 96f5707.

is_ct_la may now be dead defensive code. With the switch to NaN-masking, NI LAs naturally have no total_council_tax_net and no count_band_* values in the CSV, so has_count and has_ct_net are already False for them — the cells would be NaN without the explicit is_ct_la gate. Worth checking whether the gate is still doing useful work or can be removed.

Also is_ct_la = ct_merged["has_council_tax"].fillna(True).astype(bool).values defaults missing flags to True, which seems backwards directionally — a missing flag should probably mean "unknown / err on the side of masking", not "yes train on this".
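
The difference between the two defaults can be shown without pandas. In this sketch `None` stands in for a missing flag and `trainable` is a hypothetical helper; `default=True` mirrors the `fillna(True)` behaviour quoted above.

```python
def trainable(flag, default=False):
    """Resolve a possibly missing has_council_tax flag.

    None stands for a missing value; default=False masks unknown rows.
    """
    return default if flag is None else bool(flag)

flags = [True, False, None]

# Missing flag treated as "unknown, mask it".
mask_conservative = [trainable(f) for f in flags]

# Missing flag treated as "train on it", as fillna(True) would do.
mask_permissive = [trainable(f, default=True) for f in flags]
```

Only the row with the missing flag differs between the two masks, which is exactly the case the comment is concerned about.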

Not blocking.

@MaxGhenis
Contributor

Addressed in 7f0c3e2: removed the redundant is_ct_la gate from the LA council-tax loss matrix.

After the NaN-masking change, direct source-cell availability is enough: band targets train only where count_band_{A..H} is non-null, and net council-tax targets train only where total_council_tax_net is non-null. NI already has NaNs for those cells, so it remains masked without a separate has_council_tax flag or a missing-flag default.

Verification:

  • uv run pytest policyengine_uk_data/tests/test_la_loss_council_tax.py policyengine_uk_data/tests/test_calibrate_save.py -q
  • uv run ruff check policyengine_uk_data/datasets/local_areas/local_authorities/loss.py
  • uv run ruff format --check policyengine_uk_data/datasets/local_areas/local_authorities/loss.py


Development

Successfully merging this pull request may close these issues.

Add LA-level household land value calibration targets

2 participants