Skip to content

Fix LSR and CG behavioral responses compounding when combined#7788

Open
MaxGhenis wants to merge 3 commits intoPolicyEngine:mainfrom
MaxGhenis:fix-lsr-cg-interaction
Open

Fix LSR and CG behavioral responses compounding when combined#7788
MaxGhenis wants to merge 3 commits intoPolicyEngine:mainfrom
MaxGhenis:fix-lsr-cg-interaction

Conversation

@MaxGhenis
Copy link
Contributor

Summary

Fixes #7785 — when both labor supply responses (LSR) and capital gains (CG) behavioral responses were enabled simultaneously, they created nested simulation branches that compounded each other, producing nonsensical results.

Root cause: LSR measurement branches didn't neutralize capital_gains_behavioral_response, and CG measurement branches didn't neutralize employment_income_behavioral_response / self_employment_income_behavioral_response. This caused each response to fire inside the other's measurement branches with wrong inputs, creating a positive feedback loop.

Fix: Each behavioral response's measurement branches now neutralize all OTHER behavioral responses, making the responses additive (measured against static income) rather than compounding.

Before fix (WATCA reform, 2026):

Variant Impact
LSR only $6.2B
CG only $21.9B
LSR + CG $8,651B

After fix (top bracket +10pp, 2026):

Variant Impact
LSR only -$88.1B
CG only -$64.8B
LSR + CG -$41.3B

Combined impact is now 0.3x the sum of individual impacts (bounded and reasonable) vs ~200x before.

Test plan

  • python /tmp/test_lsr_cg_fix.py passes with fix (combined/sum ratio = 0.3x)
  • Same test produces $8.6T without fix (confirmed on separate WATCA run)
  • CI passes

MaxGhenis and others added 2 commits March 17, 2026 00:44
When both labor supply responses (LSR) and capital gains (CG) behavioral
responses were enabled simultaneously, they created nested simulation
branches that compounded each other's effects, producing nonsensical
results (e.g. $8.6T tax impact instead of ~$40B).

The fix neutralizes CG responses in LSR measurement branches and LSR
responses in CG measurement branches, making each response measured
against static (no-behavioral-response) income. The responses are now
additive rather than compounding.

Fixes PolicyEngine#7785.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test requires ~60 minutes for 3 full microsimulations.
Skip unless RUN_HEAVY_TESTS=1 is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.46%. Comparing base (19ec305) to head (cb520e9).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##              main    #7788       +/-   ##
============================================
- Coverage   100.00%   77.46%   -22.54%     
============================================
  Files            3        2        -1     
  Lines           33       71       +38     
  Branches         0        3        +3     
============================================
+ Hits            33       55       +22     
- Misses           0       14       +14     
- Partials         0        2        +2     
Flag Coverage Δ
unittests 77.46% <ø> (-22.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The parametric reform test passes with the neutralization fix, but
the WATCA structural reform (which overrides taxable_income and
income_tax_before_credits) still produces $8.6T. Split into two
tests to cover both cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LSR and CG behavioral responses produce nonsensical results when combined

1 participant