Use full-population data for output weights #51

Merged: MaxGhenis merged 2 commits into main from fix-population-variable-weights on May 19, 2026

Conversation

@MaxGhenis
Contributor

Summary

  • compute household-impact and aggregate output weights from full source microsimulation populations instead of the 100-household benchmark sample
  • add a committed population-weight artifact and a package CLI to regenerate it from full US Enhanced CPS and UK enhanced FRS
  • refresh app data, snapshot reports, manifest hashes, and rendered paper assets so PIP and other sparse outputs keep population-derived weight
  • document the local-income-tax zero-weight limitation in the current full ECPS source
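
A minimal sketch of why the sample size matters for sparse outputs such as PIP. It assumes a hypothetical weighting rule in which each output's weight is its share of population-weighted absolute amounts; the record layout, the function name `output_weights`, and all numbers are illustrative assumptions, not the actual PolicyBench implementation or data.

```python
from collections import defaultdict


def output_weights(households):
    """Return each output's share of weighted absolute amounts.

    `households` is a list of (household_weight, {output_name: amount}) pairs.
    This weighting rule is an illustrative assumption.
    """
    totals = defaultdict(float)
    for weight, outputs in households:
        for name, amount in outputs.items():
            totals[name] += weight * abs(amount)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: total / grand_total for name, total in totals.items()}


# Toy population: only 1 household in 1,000 receives the sparse benefit.
population = [(1.0, {"income_tax": 5000.0, "pip": 0.0}) for _ in range(999)]
population.append((1.0, {"income_tax": 0.0, "pip": 6000.0}))

# A 100-household sample that happens to miss the single recipient.
sample = population[:100]

sample_weights = output_weights(sample)
population_weights = output_weights(population)

print(sample_weights["pip"])      # 0.0: the sparse output vanishes
print(population_weights["pip"])  # small but nonzero
```

In this toy setup the sample assigns the sparse output zero weight, while the full population keeps it small but positive, which is the behavior the summary describes as keeping "population-derived weight".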

Fixes #49.

Tests

  • uv run ruff check policybench tests
  • uv run pytest -q
  • npm run lint && npx tsc --noEmit && npm run build
  • source .venv/bin/activate && quarto render paper/index.qmd --to html
  • source .venv/bin/activate && quarto render paper/index.qmd --to pdf

@vercel

vercel Bot commented May 19, 2026

Project: policybench-site
Deployment: Ready
Updated (UTC): May 19, 2026 1:14pm

@MaxGhenis merged commit b06d24d into main on May 19, 2026
4 checks passed
@MaxGhenis deleted the fix-population-variable-weights branch on May 19, 2026 13:15
MaxGhenis added a commit that referenced this pull request on May 19, 2026
The pr-46-review-era stack landed before PR #51 ("Use full-population
data for output weights") and Daphne's PR #47 ACA-from-scoring
removal. The result on main right now:

- ruff format has drifted on policybench/analysis.py,
  policybench/full_run_export.py, and tests/test_spec.py
- policybench/population_weights.json still contains
  premium_tax_credit, but the headline output set no longer does
  (post-#47)
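
The drift described in the second bullet can be sketched as a set difference between the artifact's keys and the headline output set. The helper name `stale_weight_entries`, the flat `{output: weight}` layout, and the example values are assumptions for illustration; the real artifact may be structured differently.

```python
def stale_weight_entries(weight_artifact, headline_outputs):
    """Return artifact keys that are no longer headline outputs, sorted."""
    return sorted(set(weight_artifact) - set(headline_outputs))


# Hypothetical contents; the real artifact layout and values may differ.
weight_artifact = {"income_tax": 0.7, "snap": 0.2, "premium_tax_credit": 0.1}
headline_outputs = ["income_tax", "snap"]

print(stale_weight_entries(weight_artifact, headline_outputs))
```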

That makes both lint and test red on every PR against current main.
This commit:

- Reformats the three drifted files with ruff format
- Regenerates policybench/population_weights.json via
  policybench population-weights, which drops the now-out-of-scope
  premium_tax_credit entry
- Updates the matching SHA-256 in
  paper/snapshot/20260501/manifest.json so
  test_snapshot_manifest_hashes_match_population_weight_artifact
  passes against the new artifact

Verified: uv run pytest -m "not slow" -q → 258 passed; uv run ruff
format --check . → clean.
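
A hedged sketch of the kind of check a manifest-hash test performs: recompute the artifact's SHA-256 and compare it with the recorded digest. The flat `{name: digest}` manifest layout, the key `population_weights`, and the helper names are assumptions; the actual manifest.json and test may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def manifest_matches(manifest_path, artifact_path, key):
    """Check that manifest[key] equals the artifact's current digest.

    The flat {name: digest} manifest layout is an assumption.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    return manifest.get(key) == sha256_of(artifact_path)


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "population_weights.json"
    manifest = Path(tmp) / "manifest.json"

    artifact.write_text(json.dumps({"income_tax": 1.0}))
    manifest.write_text(json.dumps({"population_weights": sha256_of(artifact)}))
    fresh = manifest_matches(manifest, artifact, "population_weights")

    # Regenerating the artifact without updating the manifest breaks the match.
    artifact.write_text(json.dumps({"income_tax": 0.9, "snap": 0.1}))
    stale = manifest_matches(manifest, artifact, "population_weights")

print(fresh, stale)
```

This is why regenerating the artifact and updating the recorded digest need to land in the same commit.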

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis added a commit that referenced this pull request on May 19, 2026
* Hero: replace inventory subtitle with a mission line

The hero used to read "14 models on 100 households across 18 tax
and benefit outputs." That information is already in the stats
chip row below (Models / Households / Outputs / snapshot date), so
the subtitle was redundant inventory rather than orienting copy.

Replace with a one-line mission statement that says what the
benchmark is actually for:

  "Testing how accurately language models calculate household
   taxes and benefits."

The 100% = within-1% explainer continues immediately after as the
muted continuation, so readers still learn what the score means.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Clean PolicyBench deployment config

* Fix CI: ruff format, regenerate population-weight artifact

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linked issue: Compute global variable weights from population, not the 100-scenario sample