Skip to content

Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307

Merged
MaxGhenis merged 2 commits intomainfrom
fix-extra-variables-silently-ignored
Apr 20, 2026
Merged

Honor Simulation.extra_variables on US/UK microsim paths (closes #303)#307
MaxGhenis merged 2 commits intomainfrom
fix-extra-variables-silently-ignored

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Closes #303.

Summary

Simulation.extra_variables was declared on the base class and honored by the household calculator, but the population run() methods only iterated self.entity_variables — extras passed via Simulation(extra_variables={...}) silently never reached output_dataset.

Fix

New MicrosimulationModelVersion.resolve_entity_variables(simulation) in tax_benefit_models/common/model_version.py:

  • Merges bundled defaults with simulation.extra_variables
  • Order preserved (defaults first, extras appended)
  • Duplicates deduped
  • Unknown entity keys raise ValueError with close-match suggestions (e.g. "tax_units""did you mean 'tax_unit'?"). Same idiom compile_reform uses for parameter paths.
  • Unknown variable names raise with close-match suggestions.

Both PolicyEngineUSLatest.run() (us/model.py:235) and PolicyEngineUKLatest.run() (uk/model.py:212) now call the shared helper in place of iterating self.entity_variables directly.

Tests

New tests/test_extra_variables.py (6 tests):

  • US: extras appear on output_dataset for household + tax_unit
  • US: dedupes when extra overlaps a default (no duplicate columns)
  • US: unknown entity key raises with suggestion
  • US: unknown variable name raises with suggestion
  • UK: resolve_entity_variables direct call merges extras (full UK microsim fixture is heavier than needed; the method is the integration point and both countries call it identically)
  • UK: unknown variable name raises with suggestion

403 existing tests still pass.

Why a shared helper vs. inline fix

Audit (spawned in parallel with this fix) confirmed no other Simulation fields have the "declared but silently dropped" pattern. But two potential follow-ups surfaced:

  1. Simulation.save() raises AttributeError when output_dataset is None instead of a clear error — not silent, but unhelpful.
  2. created_at/updated_at aren't truly round-tripped; load() re-derives them from filesystem stat. The docstring at common/model_version.py:236 oversells this as a "round-trip."

Neither is urgent. Flagging for v4.0.2 / v4.1 consideration; not expanding this PR.

🤖 Generated with Claude Code

Closes #303.

The field was declared on the base Simulation class and honored by
pe.us.calculate_household / pe.uk.calculate_household, but the
population run() methods only iterated self.entity_variables — any
variable a caller added via Simulation(extra_variables={...}) was
silently dropped from output_dataset.

Fix:
- New MicrosimulationModelVersion.resolve_entity_variables(simulation)
  in tax_benefit_models.common.model_version merges bundled defaults
  with simulation.extra_variables. Order preserved (defaults first,
  extras appended). Duplicates deduped.
- Unknown entity keys raise ValueError with close-match suggestions
  (e.g. "tax_units" -> "did you mean 'tax_unit'?"). Same pattern used
  by compile_reform for parameter paths.
- Unknown variable names raise with close-match suggestions.
- Both US run() (tax_benefit_models/us/model.py) and UK run()
  (tax_benefit_models/uk/model.py) now call resolve_entity_variables
  in place of iterating self.entity_variables directly.

Regression tests in tests/test_extra_variables.py:
- US: extras appear on output_dataset for household + tax_unit;
  dedupes when extra overlaps a default; unknown entity/variable
  names raise with suggestions.
- UK: direct test of resolve_entity_variables on pe.uk.model (full
  UK microsim fixture is heavier than needed; the method is the
  integration point and both countries call it identically).

403 existing tests still pass; 6 new tests pass.
Reproducibility-reviewer (blocking):
- Replace direct-helper UK test with end-to-end UK integration test
  (test__uk_extra_variables_appear_on_output_dataset). Catches the
  one-character regression where uk/model.py:212 reverts to
  self.entity_variables.items() — verified by manually reverting.
- Strengthen dedup assertion: also check the column has no NaN
  entries so a silent merge-blanks-column regression fails, not just
  duplicate-column regressions.

Reproducibility-reviewer (follow-ups):
- Add test__default_empty_extras_produces_exact_bundled_defaults
  covering the default-construction ({}) path.
- Add test__partial_coverage_leaves_untouched_entities_unchanged
  covering the case where extras target one entity only.
- Comment explaining net_worth substitution (test_extra_variables.py:
  net_worth not in pe-us 1.653.3 codebase; tests use
  adjusted_gross_income as stand-in).

Code-simplifier (nice-to-have):
- Move get_close_matches import to module level (consistent with
  common/extra_variables.py's style).

Verified fail-without-fix:
- Reverting uk/model.py:212 back to self.entity_variables.items()
  makes test__uk_extra_variables_appear_on_output_dataset fail
  with the expected missing-column assertion.

8 tests, all pass. Ruff clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

extra_variables silently ignored in Simulation.run() (US microsim path)

1 participant