# California AEI Rebate Base Analysis

This notebook calculates the rebate base for a proposed California consumption rebate program.

## Program Design
- **Rebate Base**: Equal to the Federal Poverty Guideline (FPG) for the tax unit or household
- **Phase-out**: Linear phase-out between 150% and 175% of FPG
- **Actual Rebate**: `rebate_base * VAT_rate` (VAT rate to be determined)

## Key Formula
The rebate base (X) is used in the VAT rate formula:
```
t = Rs/(Cp - X - T + Ro)
```
Where:
- t = VAT rate
- Rs = Revenue target
- Cp = Private consumption
- X = Total rebate base (calculated here)
- T = Existing taxes
- Ro = Other revenue

## 1. Setup and Imports

In [None]:
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform
from policyengine_us.model_api import *
import numpy as np
import pandas as pd

# Constants
SIMULATION_YEAR = 2026  # Year for simulation calculations

print("Libraries imported successfully")

## 2. Define the AEI Rebate Base Variables

In [2]:
def create_aei_reform():
    """
    Creates a PolicyEngine reform that adds the California AEI rebate base variables.
    
    Returns:
        Reform: PolicyEngine reform with ca_aei_rebate_base variables
    """
    
    class household_fpg(Variable):
        value_type = float
        entity = Household
        label = "Household's federal poverty guideline"
        definition_period = YEAR
        unit = USD

        def formula(household, period, parameters):
            n = household("household_size", period)
            state_group = household("state_group_str", period)
            p_fpg = parameters(period).gov.hhs.fpg
            p1 = p_fpg.first_person[state_group]
            pn = p_fpg.additional_person[state_group]
            return p1 + pn * (n - 1)
    
    class ca_aei_rebate_base_tax_unit(Variable):
        value_type = float
        entity = TaxUnit
        label = "California AEI rebate base (tax unit version)"
        unit = USD
        definition_period = YEAR
        defined_for = StateCode.CA

        def formula(tax_unit, period, parameters):
            # Use tax unit's own AGI
            income = tax_unit("adjusted_gross_income", period)
            fpg = tax_unit("tax_unit_fpg", period)
            income_to_fpg_ratio = where(fpg > 0, income / fpg, np.inf)

            # Phase-out parameters
            PHASEOUT_START = 1.5   # 150% FPG
            PHASEOUT_END = 1.75    # 175% FPG
            PHASEOUT_WIDTH = PHASEOUT_END - PHASEOUT_START

            # Phase-out calculation
            excess = max_(income_to_fpg_ratio - PHASEOUT_START, 0)
            phaseout_percentage = min_(1, excess / PHASEOUT_WIDTH)

            return where(
                income_to_fpg_ratio <= PHASEOUT_END,
                fpg * (1 - phaseout_percentage),
                0
            )

    class ca_aei_rebate_base(Variable):
        value_type = float
        entity = Household
        label = "California AEI rebate base"
        unit = USD
        definition_period = YEAR

        def formula(household, period, parameters):
            # Sum AGI from all tax units in the household
            income = household.sum(household.members.tax_unit("adjusted_gross_income", period))
            fpg = household("household_fpg", period)
            income_to_fpg_ratio = where(fpg > 0, income / fpg, np.inf)

            # Phase-out parameters
            PHASEOUT_START = 1.5   # 150% FPG
            PHASEOUT_END = 1.75    # 175% FPG
            PHASEOUT_WIDTH = PHASEOUT_END - PHASEOUT_START

            # Phase-out calculation
            excess = max_(income_to_fpg_ratio - PHASEOUT_START, 0)
            phaseout_percentage = min_(1, excess / PHASEOUT_WIDTH)

            return where(
                income_to_fpg_ratio <= PHASEOUT_END,
                fpg * (1 - phaseout_percentage),
                0
            )

    class AEIReform(Reform):
        def apply(self):
            self.update_variable(household_fpg)
            self.update_variable(ca_aei_rebate_base_tax_unit)
            self.update_variable(ca_aei_rebate_base)
    
    return AEIReform

print("Reform defined successfully")

Reform defined successfully
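
The phase-out logic inside the two rebate base variables can be checked outside PolicyEngine. The block below is a minimal NumPy sketch of the same arithmetic, not the notebook's own code path: the function name `rebate_base` and the sample incomes and FPG values are illustrative assumptions. Because the lost share is clamped to 1, the result is already zero at and above 175% of FPG without a separate final `where`.

```python
import numpy as np

PHASEOUT_START = 1.5   # 150% of FPG
PHASEOUT_END = 1.75    # 175% of FPG

def rebate_base(income, fpg):
    """Full FPG up to 150% of FPG, falling linearly to zero at 175%."""
    income = np.asarray(income, dtype=float)
    fpg = np.asarray(fpg, dtype=float)
    # Avoid dividing by zero: a non-positive FPG is treated as an infinite ratio
    safe_fpg = np.where(fpg > 0, fpg, 1.0)
    ratio = np.where(fpg > 0, income / safe_fpg, np.inf)
    excess = np.maximum(ratio - PHASEOUT_START, 0)
    share_lost = np.minimum(1, excess / (PHASEOUT_END - PHASEOUT_START))
    return fpg * (1 - share_lost)

# Hypothetical units, all with an FPG of $20,000
fpg = np.full(4, 20_000.0)
income = np.array([10_000.0, 30_000.0, 32_500.0, 35_000.0])  # 50%, 150%, 162.5%, 175% of FPG
print(rebate_base(income, fpg))
```

With these inputs the first two units keep the full $20,000 base, the third keeps half, and the fourth gets nothing.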


## 3. Calculate Rebate Base Statistics

In [None]:
def calculate_rebate_base_statistics(sim, unit_type="household"):
    """
    Calculate AEI rebate base program statistics for California households or tax units.
    
    Args:
        sim: Microsimulation object with reform applied
        unit_type: Either "household" or "tax_unit"
    
    Returns:
        Dictionary with rebate base statistics
    """
    print(f"Calculating {unit_type} statistics...")
    
    if unit_type == "household":
        # Calculate rebate base for all households
        rebate_base = sim.calculate("ca_aei_rebate_base", SIMULATION_YEAR)
        
        # Filter for California households only
        household_state = sim.calculate("state_code", SIMULATION_YEAR, map_to="household")
        ca_mask = household_state == "CA"
        
        # Apply CA filter
        ca_rebate_base = rebate_base[ca_mask]
        total_ca_units = ca_mask.sum()
        
    else:  # tax_unit
        # Calculate rebate base for all tax units (defined_for gives 0 for non-CA)
        rebate_base = sim.calculate("ca_aei_rebate_base_tax_unit", SIMULATION_YEAR)
        
        # Use calculate_dataframe to get household-level data
        household_df = sim.calculate_dataframe(
            ["household_id", "state_code"],
            SIMULATION_YEAR,
            map_to="household"
        )
        
        # Get tax unit data
        tax_unit_df = sim.calculate_dataframe(
            ["tax_unit_id", "tax_unit_household_id"],
            SIMULATION_YEAR
        )
        
        # Merge to get state for each tax unit
        tax_unit_with_state = tax_unit_df.merge(
            household_df[["household_id", "state_code"]],
            left_on="tax_unit_household_id",
            right_on="household_id",
            how="left"
        )
        
        # Create a boolean MicroSeries for CA tax units
        ca_tax_unit_mask = tax_unit_with_state["state_code"] == "CA"
        total_ca_units = ca_tax_unit_mask.sum()
        
        # For tax units, we use all rebates (defined_for already filters to CA)
        ca_rebate_base = rebate_base
    
    # Calculate statistics (MicroSeries already contain weights)
    units_with_rebate = (ca_rebate_base > 0).sum()
    total_rebate_base = ca_rebate_base.sum()
    average_rebate_base = ca_rebate_base[ca_rebate_base > 0].mean() if units_with_rebate > 0 else 0
    
    return {
        f"total_ca_{unit_type}s": total_ca_units,
        f"{unit_type}s_with_rebate": units_with_rebate,
        "rebate_percentage": units_with_rebate / total_ca_units,
        "average_rebate_base": average_rebate_base,
        "total_rebate_base": total_rebate_base,
    }

# Create simulation once
print("Loading data and creating simulation...")
reform = create_aei_reform()
sim = Microsimulation(
    dataset="hf://policyengine/policyengine-us-data/pooled_3_year_cps_2023.h5",
    reform=reform
)

# Calculate both household and tax unit results using the same simulation
household_results = calculate_rebate_base_statistics(sim, "household")
tax_unit_results = calculate_rebate_base_statistics(sim, "tax_unit")
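
The statistics above rely on the results carrying survey weights, as the comment in the function states. As a rough illustration of what those weighted aggregates amount to, here is a plain NumPy sketch. The records and weights are hypothetical, and the code assumes the weighted sum and mean behave the way the function's comment describes.

```python
import numpy as np

# Hypothetical survey records: rebate base per unit and the number of
# real-world units each record represents (its survey weight)
rebate_base = np.array([0.0, 16_000.0, 8_000.0, 0.0])
weights = np.array([1_000.0, 500.0, 250.0, 2_000.0])

eligible = rebate_base > 0

total_units = weights.sum()                        # weighted count of all units
units_with_rebate = weights[eligible].sum()        # weighted count of eligible units
total_rebate_base = (rebate_base * weights).sum()  # weighted total, the X in the VAT formula
average_rebate_base = total_rebate_base / units_with_rebate  # weighted mean among eligible units

print(f"Share with rebate: {units_with_rebate / total_units:.0%}")
print(f"Total rebate base: ${total_rebate_base:,.0f}")
print(f"Average rebate base (eligible only): ${average_rebate_base:,.0f}")
```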

## 4. Display Results

In [None]:
# Display household results
household_table = pd.DataFrame([
    {'Metric': 'Total CA Households', 'Value': f"{household_results['total_ca_households']/1e6:.1f}M"},
    {'Metric': 'Households with Rebate', 'Value': f"{household_results['households_with_rebate']/1e6:.1f}M ({household_results['rebate_percentage']:.0%})"},
    {'Metric': 'Average Rebate Base', 'Value': f"${household_results['average_rebate_base']:,.0f}"},
    {'Metric': 'Total Rebate Base', 'Value': f"${household_results['total_rebate_base']/1e9:.1f}B"},
])

print(f"\n=== HOUSEHOLD-LEVEL RESULTS ({SIMULATION_YEAR}) ===")
household_table

In [None]:
# Display tax unit results  
tax_unit_table = pd.DataFrame([
    {'Metric': 'Total CA Tax Units', 'Value': f"{tax_unit_results['total_ca_tax_units']/1e6:.1f}M"},
    {'Metric': 'Tax Units with Rebate', 'Value': f"{tax_unit_results['tax_units_with_rebate']/1e6:.1f}M ({tax_unit_results['rebate_percentage']:.0%})"},
    {'Metric': 'Average Rebate Base', 'Value': f"${tax_unit_results['average_rebate_base']:,.0f}"},
    {'Metric': 'Total Rebate Base', 'Value': f"${tax_unit_results['total_rebate_base']/1e9:.1f}B"},
])

print(f"\n=== TAX UNIT-LEVEL RESULTS ({SIMULATION_YEAR}) ===")
tax_unit_table

## 5. Summary of Results

In [None]:
# Create summary table
summary_data = {
    'Approach': ['Household-Level', 'Tax Unit-Level'],
    'Total Rebate Base (X)': [
        f"${household_results['total_rebate_base']/1e9:.1f}B",
        f"${tax_unit_results['total_rebate_base']/1e9:.1f}B"
    ],
    'Coverage': [
        f"{household_results['households_with_rebate']/1e6:.1f}M ({household_results['rebate_percentage']:.0%})",
        f"{tax_unit_results['tax_units_with_rebate']/1e6:.1f}M ({tax_unit_results['rebate_percentage']:.0%})"
    ],
    'VAT Formula Value': [
        f"${household_results['total_rebate_base']:,.0f}",
        f"${tax_unit_results['total_rebate_base']:,.0f}"
    ]
}

summary_df = pd.DataFrame(summary_data)
print(f"=== SUMMARY OF REBATE BASE CALCULATIONS ({SIMULATION_YEAR}) ===")
summary_df

#### VAT Formula
The rebate base (X) is used in the VAT rate calculation:
```
t = Rs/(Cp - X - T + Ro)
```
Where:
- **t** = VAT rate to be determined
- **Rs** = Revenue target
- **Cp** = Private consumption
- **X** = Total rebate base (calculated above)
- **T** = Existing taxes
- **Ro** = Other revenue

## 6. Validation Against Official Statistics

### Official Data Sources Used

#### U.S. Census Bureau - American Community Survey (2023)
- **API Endpoint**: [https://api.census.gov/data/2023/acs/acs1?get=NAME,B11001_001E&for=state:06](https://api.census.gov/data/2023/acs/acs1?get=NAME,B11001_001E&for=state:06)
- **Data**: 13,699,816 households in California (2023 ACS 1-Year Estimates)
- **Table**: B11001 - Household Type (Including Living Alone)

#### IRS Statistics of Income (SOI) - Tax Year 2022
- **Source**: [IRS SOI Historic Table 2](https://www.irs.gov/statistics/soi-tax-stats-historic-table-2)
- **California Data File**: [22in05ca.xlsx](https://www.irs.gov/pub/irs-soi/22in05ca.xlsx)
- **Data**: 18,487,690 total returns filed in California for tax year 2022

In [6]:
import pandas as pd

# Official statistics constants
CENSUS_HOUSEHOLDS_2023 = 13_699_816  # From ACS 2023 1-Year Estimates
IRS_RETURNS_2022 = 18_487_690  # From IRS SOI Historic Table 2

# Create validation table
validation_data = {
    'Metric': ['Households', 'Tax Returns/Units'],
    'Official Source': ['U.S. Census ACS 2023', 'IRS SOI 2022'],
    'Official Count': [f"{CENSUS_HOUSEHOLDS_2023:,}", f"{IRS_RETURNS_2022:,}"],
    'Our Simulation': [
        f"{household_results['total_ca_households']/1e6:.1f}M",
        f"{tax_unit_results['total_ca_tax_units']/1e6:.1f}M"
    ],
    'Difference': [
        f"{(household_results['total_ca_households']/CENSUS_HOUSEHOLDS_2023 - 1)*100:+.1f}%",
        f"{(tax_unit_results['total_ca_tax_units']/IRS_RETURNS_2022 - 1)*100:+.1f}%"
    ]
}

validation_df = pd.DataFrame(validation_data)
print("=== VALIDATION AGAINST OFFICIAL STATISTICS ===")
validation_df

=== VALIDATION AGAINST OFFICIAL STATISTICS ===


| Metric | Official Source | Official Count | Our Simulation | Difference |
|---|---|---|---|---|
| Households | U.S. Census ACS 2023 | 13,699,816 | 14.2M | +3.7% |
| Tax Returns/Units | IRS SOI 2022 | 18,487,690 | 21.3M | +15.3% |


#### Notes on Validation
- **Census API**: [https://api.census.gov/data/2023/acs/acs1?get=NAME,B11001_001E&for=state:06](https://api.census.gov/data/2023/acs/acs1?get=NAME,B11001_001E&for=state:06)
- **IRS Data**: [https://www.irs.gov/pub/irs-soi/22in05ca.xlsx](https://www.irs.gov/pub/irs-soi/22in05ca.xlsx)
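- The simulated tax unit count exceeds the IRS return count in part because the simulation includes non-filers
- Remaining differences likely reflect differing reference years (official counts are for 2022 and 2023, while the simulation is run for 2026) and differences in methodology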

In [None]:
# Create comparison table
comparison_data = {
    'Metric': [
        'Total Count',
        'Eligible Units (with rebate base)',
        'Total Rebate Base',
        'Average Rebate Base (eligible only)'
    ],
    'Households': [
        f"{household_results['total_ca_households']/1e6:.1f}M",
        f"{household_results['households_with_rebate']/1e6:.1f}M ({household_results['rebate_percentage']:.0%})",
        f"${household_results['total_rebate_base']/1e9:.1f}B",
        f"${household_results['average_rebate_base']:,.0f}"
    ],
    'Tax Units': [
        f"{tax_unit_results['total_ca_tax_units']/1e6:.1f}M",
        f"{tax_unit_results['tax_units_with_rebate']/1e6:.1f}M ({tax_unit_results['rebate_percentage']:.0%})",
        f"${tax_unit_results['total_rebate_base']/1e9:.1f}B",
        f"${tax_unit_results['average_rebate_base']:,.0f}"
    ]
}

comparison_df = pd.DataFrame(comparison_data)

# Calculate ratios for all metrics
ratios = [
    f"{tax_unit_results['total_ca_tax_units']/household_results['total_ca_households']:.2f}x",
    f"{tax_unit_results['tax_units_with_rebate']/household_results['households_with_rebate']:.2f}x",
    f"{tax_unit_results['total_rebate_base']/household_results['total_rebate_base']:.2f}x",
    f"{tax_unit_results['average_rebate_base']/household_results['average_rebate_base']:.2f}x"
]

comparison_df['Ratio (TU/HH)'] = ratios

print(f"=== HOUSEHOLD VS TAX UNIT ANALYSIS ({SIMULATION_YEAR}) ===")
comparison_df

#### Key Insights
- Tax units exceed households by 50% (consistent with multi-generational living)
- Eligible tax units are 2.61x eligible households (low-income households split into multiple tax units)
- Total rebate base is 2.42x higher at tax unit level
- Each approach has different administrative and equity implications
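
To see why splitting a household into tax units can raise the rebate base, consider a small worked case. The sketch below is illustrative only: the household composition and incomes are hypothetical, and the FPG amounts are the ones used in this notebook's example cell.

```python
FPG_FIRST = 16_030       # first-person amount used in this notebook's examples
FPG_ADDITIONAL = 5_910   # per-additional-person amount used in this notebook's examples

def fpg(size):
    """Poverty guideline for a unit of the given size."""
    return FPG_FIRST + FPG_ADDITIONAL * (size - 1)

def rebate_base(income, size):
    """Full FPG up to 150% of FPG, linear phase-out to zero at 175%."""
    threshold = fpg(size)
    ratio = income / threshold
    share_lost = min(1.0, max(0.0, (ratio - 1.5) / 0.25))
    return threshold * (1 - share_lost)

# Hypothetical three-person household: a couple with $60,000 of AGI
# and an adult relative with $10,000 of AGI filing separately
household_level = rebate_base(60_000 + 10_000, 3)
tax_unit_level = rebate_base(60_000, 2) + rebate_base(10_000, 1)

print(f"Household-level rebate base: ${household_level:,.0f}")
print(f"Tax unit-level rebate base:  ${tax_unit_level:,.0f}")
```

At the household level the combined income is far above 175% of the three-person FPG, so the rebate base is $0. At the tax unit level the relative's own income is below 150% of the one-person FPG, so that tax unit alone contributes $16,030.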

In [8]:
# Federal Poverty Guideline for 2024 (48 contiguous states)
FPG_SINGLE = 16030
FPG_ADDITIONAL = 5910

def calculate_rebate_base(income, household_size):
    """Calculate rebate base for a given income and household size"""
    fpg = FPG_SINGLE + FPG_ADDITIONAL * (household_size - 1)
    ratio = income / fpg
    
    if ratio <= 1.5:
        return fpg  # Full rebate base
    elif ratio >= 1.75:
        return 0  # No rebate base
    else:
        # Linear phase-out between 150% and 175%
        phase_out_pct = (ratio - 1.5) / 0.25
        return fpg * (1 - phase_out_pct)

# Example calculations
examples = [
    ("Single person at 100% FPG", 16030, 1),
    ("Single person at 150% FPG", 24045, 1),
    ("Single person at 162.5% FPG", 26049, 1),
    ("Single person at 175% FPG", 28053, 1),
    # Family-of-4 incomes are 150% and 175% of the $33,760 FPG implied by the constants above
    ("Family of 4 at 150% FPG", 50640, 4),
    ("Family of 4 at 175% FPG", 59080, 4),
]

# Create examples table
example_data = []
for description, income, size in examples:
    fpg = FPG_SINGLE + FPG_ADDITIONAL * (size - 1)
    rebate_base = calculate_rebate_base(income, size)
    ratio = income / fpg
    example_data.append({
        'Description': description,
        'Income': f"${income:,}",
        'FPG': f"${fpg:,}",
        'Income/FPG': f"{ratio:.1%}",
        'Rebate Base': f"${rebate_base:,.0f}"
    })

examples_df = pd.DataFrame(example_data)
print("=== EXAMPLE REBATE BASE CALCULATIONS ===")
examples_df

=== EXAMPLE REBATE BASE CALCULATIONS ===


| Description | Income | FPG | Income/FPG | Rebate Base |
|---|---|---|---|---|
| Single person at 100% FPG | $16,030 | $16,030 | 100.0% | $16,030 |
| Single person at 150% FPG | $24,045 | $16,030 | 150.0% | $16,030 |
| Single person at 162.5% FPG | $26,049 | $16,030 | 162.5% | $8,014 |
| Single person at 175% FPG | $28,053 | $16,030 | 175.0% | $0 |
| Family of 4 at 150% FPG | $50,640 | $33,760 | 150.0% | $33,760 |
| Family of 4 at 175% FPG | $59,080 | $33,760 | 175.0% | $0 |


## 9. Conclusions

### Key Findings:
1. **Household-level rebate base**: $60.5B
2. **Tax unit-level rebate base**: $146.5B
3. **Coverage**: 23% of households, 42% of tax units

### Policy Implications:
- Using tax units instead of households more than doubles the total rebate base
- This reflects how many low-income California households contain more than one tax unit, each of which is assessed against its own FPG threshold
- The phase-out between 150-175% FPG ensures targeting to lower-income populations

### Next Steps:
- Use these X values in the VAT rate formula: `t = Rs/(Cp - X - T + Ro)`
- Consider administrative complexity of tax unit vs household approach
- Evaluate distributional impacts of each approach
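
As a starting point for the first step, the sketch below plugs the two rebate base totals reported above into the VAT rate formula. Only the X values come from this notebook. The revenue target, private consumption, existing taxes, and other revenue figures are placeholder assumptions chosen purely to make the arithmetic run, and the helper name `vat_rate` is likewise hypothetical.

```python
def vat_rate(rs, cp, x, t, ro):
    """VAT rate t = Rs / (Cp - X - T + Ro)."""
    denominator = cp - x - t + ro
    if denominator <= 0:
        raise ValueError("Taxable base (Cp - X - T + Ro) must be positive")
    return rs / denominator

# Rebate base totals reported in this notebook
X_HOUSEHOLD = 60.5e9
X_TAX_UNIT = 146.5e9

# Placeholder inputs (assumptions, NOT estimates from this analysis)
RS = 100e9      # revenue target
CP = 2_000e9    # private consumption
T = 300e9       # existing taxes
RO = 50e9       # other revenue

rate_household = vat_rate(RS, CP, X_HOUSEHOLD, T, RO)
rate_tax_unit = vat_rate(RS, CP, X_TAX_UNIT, T, RO)

print(f"VAT rate with household-level X: {rate_household:.2%}")
print(f"VAT rate with tax unit-level X:  {rate_tax_unit:.2%}")
```

Because X is subtracted in the denominator, the larger tax unit-level rebate base implies a higher VAT rate for the same revenue target, whatever the other inputs turn out to be.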