# Policy Impacts for Year 2054 (Wharton Benchmark)

This notebook calculates the static budgetary impact of Option 1 (Full Repeal of Social Security Benefits Taxation) using the 2054 dataset.
The result can then be compared with the Wharton Budget Model estimate for the same year.

**Dataset**: `hf://policyengine/test/2054.h5`  
**Year**: 2054  
**Scoring**: Static (no behavioral responses)  
**Reform**: Option 1 only (for Wharton benchmark comparison)

In [1]:
# Import necessary libraries
import sys
import os

# Determine repo root and add src to path
if os.path.basename(os.getcwd()) == 'analysis':
    repo_root = os.path.abspath('..')
    os.chdir(repo_root)
else:
    repo_root = os.getcwd()

# Add src directory to Python path
src_path = os.path.join(repo_root, 'src')
if src_path not in sys.path:
    sys.path.insert(0, src_path)

print(f"Working directory: {os.getcwd()}")
print(f"Source path: {src_path}")

import pandas as pd
import numpy as np
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform
from reforms import REFORMS
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported")
print(f"✓ Found {len(REFORMS)} reforms")

Working directory: /Users/ziminghua/vscode/crfb-tob-impacts
Source path: /Users/ziminghua/vscode/crfb-tob-impacts/src

✓ Libraries imported
✓ Found 8 reforms


## Load 2054 Dataset

Load the 2054 projection dataset for comparison with the Wharton benchmark.

In [2]:
# Load the 2054 dataset
print("Loading 2054 dataset...")
sim = Microsimulation(dataset="hf://policyengine/test/2054.h5")
print("✓ Dataset loaded successfully")

# Check dataset size
household_weight = sim.calculate("household_weight", period=2054)
household_count = sim.calculate("household_count", period=2054, map_to="household")

print(f"\nDataset statistics:")
print(f"  Number of households in sample: {len(household_weight):,}")
print(f"  Weighted household count: {household_count.sum():,.0f}")

Loading 2054 dataset...


✓ Dataset loaded successfully

Dataset statistics:
  Number of households in sample: 21,108
  Weighted household count: 170,807,832
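
The two figures above differ because each sampled household carries a survey weight: 21,108 records stand in for about 170.8 million households, or roughly 8,100 households per record on average. A minimal sketch of that relationship, using made-up weights (the values below are hypothetical and not drawn from the 2054 dataset):

```python
# Hypothetical toy sample: each entry is the survey weight of one sampled
# household, i.e. how many real households it represents. Illustrative only.
toy_weights = [8_000.0, 12_500.0, 4_500.0]

sample_size = len(toy_weights)      # number of records in the sample
weighted_count = sum(toy_weights)   # estimated number of households represented

print(f"Households in sample: {sample_size:,}")
print(f"Weighted household count: {weighted_count:,.0f}")
```

The same logic applies to any aggregate in the notebook: totals are sums over records scaled by their weights, not plain sums over the sample.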


## Compute Baseline

Calculate baseline income tax for year 2054 using the 2054 dataset.

In [3]:
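print("Computing baseline for 2054...")
# Baseline simulation: same 2054 dataset, no reform applied
baseline_2054 = Microsimulation(dataset="hf://policyengine/test/2054.h5")
# Household-level income tax; summing the result gives the weighted national total
baseline_income_tax = baseline_2054.calculate("income_tax", map_to="household", period=2054)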

print(f"✓ Baseline computed")
print(f"  Total baseline income tax: ${baseline_income_tax.sum() / 1e9:,.1f}B")

Computing baseline for 2054...


✓ Baseline computed
  Total baseline income tax: $3,289.7B


## Helper Function

Define a function to calculate the revenue impact of a given reform.

In [4]:
def calculate_revenue_impact_2054(reform):
    """
    Calculate the static revenue impact of a given reform in year 2054.

    Relies on the notebook-level `baseline_income_tax` computed in the baseline cell.
    
    Args:
        reform: Reform object
    
    Returns:
        Revenue impact in dollars (positive = revenue gain, negative = revenue loss)
    """
    # Create reformed simulation with 2054 dataset
    reform_sim = Microsimulation(dataset="hf://policyengine/test/2054.h5", reform=reform)
    
    # Calculate reformed income tax
    reform_income_tax = reform_sim.calculate("income_tax", map_to="household", period=2054)
    
    # JCT convention: reformed - baseline (positive = more revenue)
    revenue_impact = reform_income_tax.sum() - baseline_income_tax.sum()
    
    return revenue_impact

print("✓ Helper function defined")

✓ Helper function defined
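
To make the sign convention concrete, here is a minimal sketch of the same reformed-minus-baseline calculation on toy data. The function name `toy_revenue_impact`, the weights, and the tax amounts are all hypothetical; the real helper gets weighted totals from the `Microsimulation` results rather than from explicit lists.

```python
def toy_revenue_impact(baseline_tax, reform_tax, weights):
    """Weighted reformed-minus-baseline total (positive = revenue gain)."""
    baseline_total = sum(t * w for t, w in zip(baseline_tax, weights))
    reform_total = sum(t * w for t, w in zip(reform_tax, weights))
    return reform_total - baseline_total


# Two hypothetical household records with their survey weights
weights = [1_000.0, 2_000.0]
baseline_tax = [5_000.0, 12_000.0]   # tax per household under current law
reform_tax = [4_000.0, 11_500.0]     # tax per household under the reform

impact = toy_revenue_impact(baseline_tax, reform_tax, weights)
print(f"${impact / 1e6:,.1f}M")      # negative, because the reform lowers tax
```

A tax cut therefore shows up as a negative number, which is the sign to expect for a repeal of benefit taxation.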


## Calculate Reform Impacts for 2054

Run Option 1 against the 2054 dataset for comparison with the Wharton benchmark.

In [5]:
# Storage for results
results_2054 = []

print("\n" + "="*80)
print("CALCULATING REFORM IMPACTS FOR YEAR 2054")
print("="*80)
print(f"Testing Option 1 only (for Wharton benchmark comparison)\n")

# Only process Option 1 (Full Repeal of Social Security Benefits Taxation)
reforms_to_process = {k: v for k, v in REFORMS.items() if k == 'option1'}

for reform_id, reform_config in tqdm(reforms_to_process.items(), desc="Processing reforms"):
    reform_name = reform_config['name']
    reform_func = reform_config['func']
    
    print(f"\nProcessing {reform_id}: {reform_name}")
    
    try:
        # Get the reform
        reform = reform_func()
        
        # Calculate impact
        print(f"  Calculating 2054 impact...", end=' ')
        impact = calculate_revenue_impact_2054(reform)
        
        results_2054.append({
            'reform_id': reform_id,
            'reform_name': reform_name,
            'year': 2054,
            'revenue_impact': impact,
            'revenue_impact_billions': impact / 1e9,
            'scoring_type': 'static',
            'dataset': '2054.h5'
        })
        
        print(f"${impact/1e9:,.1f}B")
        print(f"  ✓ Complete")
        
    except Exception as e:
        print(f"  ✗ ERROR: {type(e).__name__}: {e}")
        print(f"  Continuing with next reform...")
        import traceback
        traceback.print_exc()

print("\n" + "="*80)
print("CALCULATION COMPLETE")
print("="*80)


CALCULATING REFORM IMPACTS FOR YEAR 2054
Testing Option 1 only (for Wharton benchmark comparison)



Processing reforms:   0%|          | 0/1 [00:00<?, ?it/s]


Processing option1: Full Repeal of Social Security Benefits Taxation
  Calculating 2054 impact... 

Processing reforms: 100%|██████████| 1/1 [04:41<00:00, 281.97s/it]

$-239.6B
  ✓ Complete

CALCULATION COMPLETE





## Summary of Results

In [6]:
# Convert to DataFrame
results_df = pd.DataFrame(results_2054)

if len(results_df) > 0:
    print("\n2054 Reform Impacts (Billions):")
    print("="*80)
    
    for _, row in results_df.iterrows():
        print(f"{row['reform_id']:8s}: {row['reform_name']:55s} ${row['revenue_impact_billions']:>8,.1f}B")
    
    print("="*80)
    print(f"\nTotal reforms calculated: {len(results_df)}")
    
    # Display as table
    display(results_df[['reform_id', 'reform_name', 'revenue_impact_billions']].sort_values('revenue_impact_billions', ascending=False))
else:
    print("⚠ No results to display")


2054 Reform Impacts (Billions):
option1 : Full Repeal of Social Security Benefits Taxation        $  -239.6B

Total reforms calculated: 1


|   | reform_id | reform_name | revenue_impact_billions |
|---|---|---|---|
| 0 | option1 | Full Repeal of Social Security Benefits Taxation | -239.612969 |
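
For benchmarking it can help to express the impact relative to baseline income tax revenue. The sketch below does that arithmetic with the two figures printed earlier in this notebook (the $3,289.7B baseline and the -$239.6B impact); the variable names are illustrative, and the baseline is only known here to one decimal place.

```python
# Figures copied from the notebook outputs above
baseline_income_tax_billions = 3289.7    # total baseline income tax, 2054
revenue_impact_billions = -239.612969    # Option 1 static impact, 2054

# Impact as a fraction of baseline income tax revenue
share_of_baseline = revenue_impact_billions / baseline_income_tax_billions
print(f"Impact as share of baseline income tax: {share_of_baseline:.1%}")
```

On these figures, full repeal reduces 2054 income tax revenue by roughly 7.3 percent under static scoring.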


## Export Results to CSV

Save the 2054 impact estimates to a CSV file for Wharton benchmark comparison.

In [7]:
# Create data directory if it doesn't exist
os.makedirs('data', exist_ok=True)

if len(results_df) > 0:
    # Export full results
    output_file = 'data/policy_impacts_2054_wharton.csv'
    results_df.to_csv(output_file, index=False)
    print(f"✓ Exported results to: {output_file}")
    print(f"  Records: {len(results_df)}")
    print(f"  Columns: {', '.join(results_df.columns)}")
    
    # Also create a summary version
    summary_df = results_df[['reform_id', 'reform_name', 'revenue_impact_billions']].copy()
    summary_df = summary_df.sort_values('revenue_impact_billions', ascending=False)
    summary_file = 'data/policy_impacts_2054_wharton_summary.csv'
    summary_df.to_csv(summary_file, index=False)
    print(f"✓ Exported summary to: {summary_file}")
else:
    print("⚠ No results to export")

print("\n✓ Analysis complete!")

✓ Exported results to: data/policy_impacts_2054_wharton.csv
  Records: 1
  Columns: reform_id, reform_name, year, revenue_impact, revenue_impact_billions, scoring_type, dataset
✓ Exported summary to: data/policy_impacts_2054_wharton_summary.csv

✓ Analysis complete!
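
A downstream comparison script could read the summary file back with the standard library alone. The sketch below parses an in-memory stand-in with the same three columns as `data/policy_impacts_2054_wharton_summary.csv`; the impact value is rounded as displayed in the results table, so the real file may contain more digits.

```python
import csv
import io

# Stand-in for the exported summary CSV (assumed layout: the three summary
# columns, no index column, since the export uses index=False).
summary_csv = io.StringIO(
    "reform_id,reform_name,revenue_impact_billions\n"
    "option1,Full Repeal of Social Security Benefits Taxation,-239.612969\n"
)

rows = list(csv.DictReader(summary_csv))

# Map each reform ID to its impact in billions of dollars
impacts = {row["reform_id"]: float(row["revenue_impact_billions"]) for row in rows}
print(impacts)
```

To read the real file, replace the `io.StringIO` object with `open('data/policy_impacts_2054_wharton_summary.csv', newline='')`.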
