# EITC Analysis: Childless Filers by Phase Status

## Overview
This notebook analyzes **childless tax units** (those with no EITC-qualifying children) across all 50 US states + DC, categorizing them by where they fall on the EITC schedule.

## What This Notebook Does
1. **Loads state-specific microdata** from PolicyEngine's HuggingFace repository
2. **Filters to childless filers** (eitc_child_count == 0)
3. **Checks EITC eligibility** (age requirements, SSN, investment income limits)
4. **Categorizes each household** into one of 6 phase statuses
5. **Calculates weighted counts and percentages** by state
6. **Exports summary data** to CSV files

## EITC Phase Status Categories
| Status | Description |
|--------|-------------|
| **Ineligible** | Does not meet EITC eligibility requirements (age, SSN, investment income, or filing status) |
| **No earned income** | No earned income, therefore no EITC |
| **Pre-phase-in** | Has earned income but hasn't reached maximum credit yet |
| **Full amount** | At the plateau - receiving maximum credit |
| **Partially phased out** | In phase-out range, receiving reduced credit |
| **Fully phased out** | Income too high, EITC reduced to $0 |

## Data Source
- **State datasets**: `hf://policyengine/policyengine-us-data/states/{STATE}.h5`
- Each state has its own dataset with representative household microdata
- Data is weighted to represent the actual population

## Output Files
- `eitc_childless_phase_status_summary_{year}.csv` - Aggregated by state and phase status

## Years Analyzed
- 2024 and 2025

## State EITC Programs

As of 2024, **31 states plus DC** have state-level Earned Income Tax Credit programs. Most states calculate their EITC as a simple percentage match of the federal EITC, but several have unique structures.

### States with Standard Federal Match Structure
These states calculate their state EITC as a percentage of the federal EITC amount:

| State | Match % | Refundable | Notes |
|-------|---------|------------|-------|
| CO | 50% (2024) | Yes | Phasing down to 10% by 2034 |
| CT | ~30% | Yes | |
| DC | 70% | Yes | Higher match for childless workers |
| DE | 4.5% ref / 20% non-ref | Choice | Taxpayers choose refundable OR non-refundable |
| HI | 40% | Yes | |
| IL | 20% | Yes | |
| IN | 10% | Yes | |
| IA | 15% | Yes | |
| KS | 17% | Yes | |
| LA | 5% | Yes | |
| ME | 50% | Yes | |
| MA | 40% | Yes | |
| MI | 30% | Yes | |
| MO | 20% | Yes | Called "Working Families Tax Credit" |
| MT | 10% | Yes | |
| NE | 10% | Yes | |
| NJ | Variable | Yes | Varies by income |
| NM | ~25% | Yes | |
| NY | 30% | Yes | Plus supplemental credit |
| OH | 30% | Yes | |
| OK | 5% | Yes | Lowest in nation |
| OR | 9-12% | Yes | Varies by children |
| PA | ~10% | Yes | |
| RI | 16% | Yes | |
| SC | 125% | Yes | Highest in nation |
| VT | ~38% | Yes | Increased to 100% for childless in 2025 |
| WI | Variable | Yes | Varies by children |

### States with UNIQUE/NON-STANDARD Structures

#### California (CA) - CalEITC
California does NOT simply match the federal EITC. Instead:
- Uses an **85% adjustment factor** applied to a state-specific calculation
- Has **different phase-in rates by number of children**:
  - 0 children: 7.65%
  - 1 child: 34%
  - 2 children: 40%
  - 3+ children: 45%
- Has a **two-stage phase-out** structure
- Maximum credit is lower than federal EITC
- **Fully refundable**

#### Minnesota (MN) - Working Family Credit / Child & Working Families Credit
Minnesota **replaced** its traditional Working Family Credit in 2023 with the **Child and Working Families Credit (CWFC)**:
- **Two-part credit structure**:
  1. Child Tax Credit component: Fixed amount per qualifying child
  2. Working Family Credit component: Phase-in based on earnings
- Combined amounts phase out together based on AGI or earnings
- **Completely independent calculation** from federal EITC
- **Fully refundable**

#### Washington (WA) - Working Families Tax Credit (WFTC)
Washington has **no income tax** and therefore no traditional EITC. Instead:
- Provides a **flat dollar amount** based on number of children:
  - 0 children: $300-$325
  - 1 child: $600-$640
  - 2 children: $900-$965
  - 3+ children: $1,200-$1,290
- Phases out starting **$2,500-$5,000 below** federal EITC AGI limits
- Requires claiming federal EITC to qualify
- **Fully refundable**

#### Virginia (VA) - Split Refundable/Non-Refundable + Low-Income Tax Credit
Virginia has the most complex structure:
- **Non-refundable match**: 20% of federal EITC (since 2006)
- **Refundable match**: Variable (0% → 15% → 20% → 15% over different years)
- **Alternative Low-Income Tax Credit (LITC)**: $300 per personal exemption
- Taxpayers receive the **better of** EITC match or LITC
- Separate filers receive prorated credits

#### Delaware (DE) - Choice Between Refundable and Non-Refundable
Delaware requires taxpayers to **choose one**:
- **Refundable option**: 4.5% of federal EITC
- **Non-refundable option**: 20% of federal EITC
- Cannot claim both

#### Maryland (MD) - Differentiated by Family Status
Maryland varies match percentages by family composition:
- **Married OR has children**: 
  - Non-refundable: 50%
  - Refundable: 25-45%
- **Childless unmarried filers**: Different (lower) percentages
- Has separate parameters for different filing situations

### States WITHOUT State EITC Programs
The following states have **no state EITC**: AL, AK, AZ, AR, FL, GA, ID, KY, MS, NV, NH, NC, ND, SD, TN, TX, UT, WV, WY

In [1]:
# =============================================================================
# IMPORTS AND CONFIGURATION
# =============================================================================
# 
# policyengine_us: PolicyEngine's US tax-benefit microsimulation model
#   - Microsimulation: Class for running simulations on survey microdata
#   - Loads datasets, calculates tax/benefit variables for each household
#
# pandas/numpy: Standard data manipulation libraries
# =============================================================================

from policyengine_us import Microsimulation
import pandas as pd
import numpy as np

# Configure pandas display options for better output formatting
pd.set_option('display.max_columns', None)      # Show all columns
pd.set_option('display.width', None)            # Don't wrap output
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')  # Format numbers with commas

## EITC Phase Status Classification

The Earned Income Tax Credit (EITC) follows a trapezoidal schedule:

```
Credit
Amount
   ^
   |      ___________
   |     /           \
   |    /             \
   |   /               \
   |  /                 \
   | /                   \
   |/_____________________\____> Earned Income
     Phase-in  Plateau  Phase-out
```

### EITC Eligibility Requirements (Childless Filers)
Before a childless filer can receive EITC, they must meet:
1. **Age requirement**: Between 25 and 64 years old (or 19+ if former foster youth/homeless)
2. **SSN requirement**: Valid Social Security Number for work
3. **Investment income limit**: Investment income must be below threshold (~$11,000 in 2024)
4. **Filing status**: Cannot file as "Married Filing Separately" (in most cases)

### How We Classify Households

We use PolicyEngine's calculated variables:

| Variable | Description |
|----------|-------------|
| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |
| `eitc` | Final EITC amount received (after all calculations) |
| `eitc_maximum` | Maximum possible EITC for this filing status |
| `eitc_phased_in` | Amount "earned" based on phase-in rate × earned income |
| `eitc_reduction` | Amount reduced due to being in phase-out range |
| `tax_unit_earned_income` | Total earned income for the tax unit |

### Classification Logic (in priority order)
1. **Ineligible**: `eitc_eligible == False` (fails age, SSN, investment income, or filing status)
2. **No earned income**: `tax_unit_earned_income == 0` (eligible but no earnings)
3. **Pre-phase-in**: Receiving EITC but `eitc_phased_in < eitc_maximum`
4. **Full amount**: `eitc_phased_in >= eitc_maximum` AND `eitc_reduction == 0`
5. **Partially phased out**: Receiving EITC AND `eitc_reduction > 0`
6. **Fully phased out**: `eitc == 0` AND has income (phased out completely)

In [2]:
# =============================================================================
# EITC PHASE STATUS CLASSIFICATION FUNCTION
# =============================================================================
# This function takes a DataFrame of households and classifies each one into
# one of 6 EITC phase statuses based on eligibility, income, and EITC calculations.
#
# Uses numpy's np.select() for efficient vectorized conditional logic.
# =============================================================================

def determine_eitc_phase_status_vectorized(df):
    """
    Classify each household into an EITC phase status category.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        Must contain columns: eitc_eligible, tax_unit_earned_income, eitc, 
        eitc_reduction, eitc_phased_in, eitc_maximum
    
    Returns:
    --------
    numpy.ndarray
        Array of status strings, one per row in df
    
    Categories (in priority order):
    -------------------------------
    1. Ineligible: Does not meet EITC eligibility (age, SSN, investment income)
    2. No earned income: Eligible but has zero earned income
    3. Pre-phase-in: Receiving EITC, still building up to maximum
    4. Full amount: At maximum credit (plateau region)
    5. Partially phased out: In phase-out region, still receiving some credit
    6. Fully phased out: Income too high, EITC reduced to $0
    """
    
    # Define conditions in PRIORITY ORDER (first match wins)
    conditions = [
        # CONDITION 1: Ineligible for EITC
        # Fails age requirement (25-64), SSN, investment income limit, or filing status
        df['eitc_eligible'] == False,
        
        # CONDITION 2: No earned income
        # Eligible for EITC but has zero earned income (cannot receive credit)
        (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] == 0),
        
        # CONDITION 3: Pre-phase-in
        # Receiving EITC, but haven't earned enough to hit maximum yet
        (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),
        
        # CONDITION 4: Full amount (plateau)
        # Receiving EITC at maximum, no reduction applied
        (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0),
        
        # CONDITION 5: Partially phased out
        # Receiving EITC, but some reduction has been applied
        (df['eitc'] > 0) & (df['eitc_reduction'] > 0),
        
        # CONDITION 6: Fully phased out
        # Eligible, has income, but EITC reduced to zero
        (df['eitc_eligible'] == True) & (df['tax_unit_earned_income'] > 0) & (df['eitc'] <= 0),
    ]
    
    # Labels corresponding to each condition above
    choices = [
        'Ineligible',
        'No earned income',
        'Pre-phase-in',
        'Full amount',
        'Partially phased out',
        'Fully phased out'
    ]
    
    # np.select applies conditions in order, returns first matching choice
    # Default catches any edge cases
    return np.select(conditions, choices, default='Ineligible')

## Data Loading Functions

### `run_state_eitc_analysis(state_abbr, year)`
Loads and processes data for a single state:
1. Loads the state's microdata from HuggingFace
2. Calculates all relevant EITC and household variables
3. Filters to childless filers only (`eitc_child_count == 0`)
4. Classifies each household by EITC phase status
5. Returns a DataFrame with one row per household

### `run_all_states_analysis(year)`
Orchestrates the full analysis:
1. Loops through all 51 states/DC
2. Calls `run_state_eitc_analysis()` for each
3. Combines all results into a single DataFrame

### Variables Calculated
| Variable | Description |
|----------|-------------|
| `tax_unit_weight` | Survey weight (how many real households this record represents) |
| `eitc_eligible` | Whether tax unit meets all EITC eligibility requirements |
| `eitc` | Federal EITC amount received |
| `state_eitc` | State EITC amount (if state has a program) |
| `eitc_child_count` | Number of EITC-qualifying children (we filter to 0) |
| `tax_unit_earned_income` | Total earned income for the tax unit |
| `age_head` | Age of primary filer |

In [3]:
# =============================================================================
# STATE LIST AND DATA LOADING FUNCTIONS
# =============================================================================

# All US states + DC (51 total)
ALL_STATES = [
    'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 
    'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 
    'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 
    'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 
    'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'
]

# Order for sorting phase statuses (follows logical EITC flow)
PHASE_ORDER = [
    'Ineligible',           # Cannot receive EITC (age/SSN/investment income)
    'No earned income',     # Eligible but no earnings
    'Pre-phase-in',         # Building up to maximum
    'Full amount',          # At maximum (plateau)
    'Partially phased out', # Being reduced
    'Fully phased out'      # Reduced to $0
]


def run_state_eitc_analysis(state_abbr, year):
    """
    Load and analyze EITC data for a single state.
    
    Parameters:
    -----------
    state_abbr : str
        Two-letter state abbreviation (e.g., 'CA', 'NY', 'TX')
    year : int
        Tax year to analyze (e.g., 2024, 2025)
    
    Returns:
    --------
    pandas.DataFrame or None
        DataFrame with one row per childless tax unit, or None if error
    """
    try:
        # Load the state's microdata from HuggingFace
        dataset_path = f"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5"
        sim = Microsimulation(dataset=dataset_path)
        
        # Variables to calculate
        tax_unit_vars = [
            'tax_unit_id',              # Unique identifier
            'tax_unit_weight',          # Survey weight
            'eitc_eligible',            # NEW: Whether eligible for EITC
            'eitc',                     # Federal EITC amount
            'eitc_maximum',             # Max possible EITC
            'eitc_phased_in',           # Phase-in amount
            'eitc_reduction',           # Phase-out reduction
            'eitc_child_count',         # Number of EITC-qualifying children
            'state_eitc',               # State EITC amount
            'tax_unit_earned_income',   # Total earned income
            'age_head',                 # Age of primary filer
        ]
        
        # Calculate each variable
        data = {}
        for var in tax_unit_vars:
            result = sim.calculate(var, period=year)
            data[var] = result.values if hasattr(result, 'values') else np.array(result)
        
        df = pd.DataFrame(data)
        df['state'] = state_abbr
        
        # Filter to childless filers only
        childless_mask = df['eitc_child_count'] == 0
        df_childless = df[childless_mask].copy()
        
        if len(df_childless) == 0:
            return None
        
        # Classify each household by EITC phase status
        df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)
        df_childless['year'] = year
        
        return df_childless
        
    except Exception as e:
        print(f"  Error processing {state_abbr}: {e}")
        return None


def run_all_states_analysis(year, states=None):
    """
    Run EITC analysis for all states and combine results.
    """
    if states is None:
        states = ALL_STATES
    
    print(f"\n{'='*60}")
    print(f"Running analysis for {year}")
    print(f"{'='*60}")
    
    all_results = []
    
    for i, state in enumerate(states):
        print(f"Processing {state} ({i+1}/{len(states)})...", end=" ")
        result = run_state_eitc_analysis(state, year)
        
        if result is not None and len(result) > 0:
            weighted_count = result['tax_unit_weight'].sum()
            print(f"{len(result):,} records, {weighted_count:,.0f} weighted")
            all_results.append(result)
        else:
            print("No data found")
    
    if all_results:
        combined = pd.concat(all_results, ignore_index=True)
        print(f"\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units")
        return combined
    else:
        return pd.DataFrame()

## Run Analysis for 2024 and 2025

In [4]:
# =============================================================================
# RUN ANALYSIS FOR 2024
# =============================================================================
# This cell processes all 51 states/DC for tax year 2024.
# 
# Output:
#   df_2024 - DataFrame containing all childless tax units from all states
#            with EITC calculations and phase status classification
#
# Processing time: Approximately 5-10 minutes depending on internet speed
#                  (downloads ~50MB of data from HuggingFace)
# =============================================================================

df_2024 = run_all_states_analysis(2024)


Running analysis for 2024
Processing AL (1/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


AL.h5:   0%|          | 0.00/35.6M [00:00<?, ?B/s]

25,751 records, 1,422,123 weighted
Processing AK (2/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


AK.h5:   0%|          | 0.00/1.59M [00:00<?, ?B/s]

1,182 records, 205,778 weighted
Processing AZ (3/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


AZ.h5:   0%|          | 0.00/41.0M [00:00<?, ?B/s]

30,120 records, 1,905,622 weighted
Processing AR (4/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


AR.h5:   0%|          | 0.00/21.1M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


15,144 records, 683,842 weighted
Processing CA (5/51)... 

CA.h5:   0%|          | 0.00/334M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


238,247 records, 11,676,756 weighted
Processing CO (6/51)... 

CO.h5:   0%|          | 0.00/46.3M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


34,120 records, 1,602,958 weighted
Processing CT (7/51)... 

CT.h5:   0%|          | 0.00/27.9M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


19,827 records, 1,119,846 weighted
Processing DE (8/51)... 

DE.h5:   0%|          | 0.00/5.47M [00:00<?, ?B/s]

3,801 records, 265,233 weighted
Processing DC (9/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


DC.h5:   0%|          | 0.00/7.56M [00:00<?, ?B/s]

4,995 records, 247,082 weighted
Processing FL (10/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


FL.h5:   0%|          | 0.00/56.4M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


45,655 records, 6,828,672 weighted
Processing GA (11/51)... 

GA.h5:   0%|          | 0.00/77.1M [00:00<?, ?B/s]

56,638 records, 2,867,909 weighted
Processing HI (12/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


HI.h5:   0%|          | 0.00/11.6M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


8,416 records, 401,230 weighted
Processing ID (13/51)... 

ID.h5:   0%|          | 0.00/10.4M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


7,678 records, 420,636 weighted
Processing IL (14/51)... 

IL.h5:   0%|          | 0.00/76.9M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


56,631 records, 4,061,833 weighted
Processing IN (15/51)... 

IN.h5:   0%|          | 0.00/46.1M [00:00<?, ?B/s]

33,456 records, 1,707,284 weighted
Processing IA (16/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


IA.h5:   0%|          | 0.00/19.6M [00:00<?, ?B/s]

14,070 records, 834,990 weighted
Processing KS (17/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


KS.h5:   0%|          | 0.00/21.6M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


15,776 records, 746,492 weighted
Processing KY (18/51)... 

KY.h5:   0%|          | 0.00/30.5M [00:00<?, ?B/s]

22,109 records, 1,122,918 weighted
Processing LA (19/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


LA.h5:   0%|          | 0.00/30.3M [00:00<?, ?B/s]

21,674 records, 1,255,035 weighted
Processing ME (20/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


ME.h5:   0%|          | 0.00/11.0M [00:00<?, ?B/s]

7,782 records, 436,655 weighted
Processing MD (21/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


MD.h5:   0%|          | 0.00/54.7M [00:00<?, ?B/s]

39,963 records, 1,737,465 weighted
Processing MA (22/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


MA.h5:   0%|          | 0.00/56.3M [00:00<?, ?B/s]

40,034 records, 2,445,482 weighted
Processing MI (23/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


MI.h5:   0%|          | 0.00/57.3M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


41,722 records, 2,947,462 weighted
Processing MN (24/51)... 

MN.h5:   0%|          | 0.00/45.4M [00:00<?, ?B/s]

32,839 records, 1,579,933 weighted
Processing MS (25/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


MS.h5:   0%|          | 0.00/18.9M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


13,414 records, 751,858 weighted
Processing MO (26/51)... 

MO.h5:   0%|          | 0.00/41.1M [00:00<?, ?B/s]

29,883 records, 1,572,474 weighted
Processing MT (27/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


MT.h5:   0%|          | 0.00/10.8M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


7,850 records, 322,606 weighted
Processing NE (28/51)... 

NE.h5:   0%|          | 0.00/10.3M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


7,585 records, 555,046 weighted
Processing NV (29/51)... 

NV.h5:   0%|          | 0.00/14.2M [00:00<?, ?B/s]

10,477 records, 962,804 weighted
Processing NH (30/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


NH.h5:   0%|          | 0.00/3.58M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


2,774 records, 466,176 weighted
Processing NJ (31/51)... 

NJ.h5:   0%|          | 0.00/75.4M [00:00<?, ?B/s]

53,826 records, 2,670,506 weighted
Processing NM (32/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


NM.h5:   0%|          | 0.00/14.7M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


10,333 records, 674,804 weighted
Processing NY (33/51)... 

NY.h5:   0%|          | 0.00/155M [00:00<?, ?B/s]

111,004 records, 6,089,496 weighted
Processing NC (34/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


NC.h5:   0%|          | 0.00/75.2M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


55,174 records, 3,018,448 weighted
Processing ND (35/51)... 

ND.h5:   0%|          | 0.00/4.80M [00:00<?, ?B/s]

3,391 records, 208,559 weighted
Processing OH (36/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


OH.h5:   0%|          | 0.00/72.0M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


52,414 records, 3,171,406 weighted
Processing OK (37/51)... 

OK.h5:   0%|          | 0.00/24.8M [00:00<?, ?B/s]

17,840 records, 1,141,744 weighted
Processing OR (38/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


OR.h5:   0%|          | 0.00/37.5M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


27,048 records, 1,384,394 weighted
Processing PA (39/51)... 

PA.h5:   0%|          | 0.00/81.9M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


59,791 records, 4,057,412 weighted
Processing RI (40/51)... 

RI.h5:   0%|          | 0.00/10.3M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


7,429 records, 397,583 weighted
Processing SC (41/51)... 

SC.h5:   0%|          | 0.00/36.9M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


26,703 records, 1,387,951 weighted
Processing SD (42/51)... 

SD.h5:   0%|          | 0.00/1.53M [00:00<?, ?B/s]

1,071 records, 257,659 weighted
Processing TN (43/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


TN.h5:   0%|          | 0.00/13.2M [00:00<?, ?B/s]

11,099 records, 2,125,824 weighted
Processing TX (44/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


TX.h5:   0%|          | 0.00/56.5M [00:00<?, ?B/s]

46,778 records, 8,270,492 weighted
Processing UT (45/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


UT.h5:   0%|          | 0.00/24.9M [00:00<?, ?B/s]

18,448 records, 728,702 weighted
Processing VT (46/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


VT.h5:   0%|          | 0.00/5.38M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


3,780 records, 213,103 weighted
Processing VA (47/51)... 

VA.h5:   0%|          | 0.00/65.4M [00:00<?, ?B/s]

47,926 records, 2,348,494 weighted
Processing WA (48/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


WA.h5:   0%|          | 0.00/15.2M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


12,571 records, 2,709,062 weighted
Processing WV (49/51)... 

WV.h5:   0%|          | 0.00/9.70M [00:00<?, ?B/s]

6,981 records, 519,592 weighted
Processing WI (50/51)... 

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


WI.h5:   0%|          | 0.00/38.0M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


27,609 records, 1,731,936 weighted
Processing WY (51/51)... 

WY.h5:   0%|          | 0.00/3.95M [00:00<?, ?B/s]

2,712 records, 170,182 weighted

Total: 1,493,541 records, 96,431,536 weighted tax units


In [5]:
# =============================================================================
# RUN ANALYSIS FOR 2025
# =============================================================================
# Same analysis as above but for tax year 2025.
# PolicyEngine uses inflation-adjusted parameters for future years.
#
# Output:
#   df_2025 - DataFrame containing all childless tax units for 2025
# =============================================================================

df_2025 = run_all_states_analysis(2025)


Running analysis for 2025
Processing AL (1/51)... 25,751 records, 1,435,327 weighted
Processing AK (2/51)... 1,182 records, 207,689 weighted
Processing AZ (3/51)... 30,120 records, 1,923,315 weighted
Processing AR (4/51)... 15,144 records, 690,191 weighted
Processing CA (5/51)... 238,247 records, 11,785,171 weighted
Processing CO (6/51)... 34,120 records, 1,617,841 weighted
Processing CT (7/51)... 19,827 records, 1,130,243 weighted
Processing DE (8/51)... 3,801 records, 267,696 weighted
Processing DC (9/51)... 4,995 records, 249,376 weighted
Processing FL (10/51)... 45,655 records, 6,892,074 weighted
Processing GA (11/51)... 56,638 records, 2,894,537 weighted
Processing HI (12/51)... 8,416 records, 404,956 weighted
Processing ID (13/51)... 7,678 records, 424,542 weighted
Processing IL (14/51)... 56,631 records, 4,099,546 weighted
Processing IN (15/51)... 33,456 records, 1,723,135 weighted
Processing IA (16/51)... 14,070 records, 842,742 weighted
Processing KS (17/51)... 15,776 records

In [6]:
# =============================================================================
# COMBINE BOTH YEARS INTO SINGLE DATASET
# =============================================================================
# Creates a unified dataset with both years for cross-year comparisons.
# The 'year' column distinguishes records from each tax year.
#
# Note: This combined dataset is primarily for exploratory analysis.
#       The exports are done separately by year for cleaner output files.
# =============================================================================

df_combined = pd.concat([df_2024, df_2025], ignore_index=True)
print(f"\nCombined dataset: {len(df_combined):,} records")


Combined dataset: 2,987,082 records


## Create and Export Summary

In [7]:
# =============================================================================
# PHASE STATUS SUMMARY BY STATE
# =============================================================================
# This function creates the main summary output: for each state, what
# percentage of childless households fall into each EITC phase status?
#
# Key outputs per state × phase status:
#   - weighted_households: Actual population count (using survey weights)
#   - pct_of_state: What % of that state's childless households are in this phase
#   - avg_federal_eitc: Average federal EITC for households receiving EITC
#   - avg_state_eitc: Average state EITC (for states with programs)
#
# The percentages should sum to 100% for each state since we include ALL
# childless households (not just EITC recipients).
# =============================================================================

def create_phase_status_summary(df, year_label):
    """
    Create summary of EITC phase status by state with weighted counts and percentages.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        Household-level data from run_all_states_analysis()
    year_label : str
        Label for display (e.g., "2024")
    
    Returns:
    --------
    pandas.DataFrame
        Summary with columns: state, eitc_phase_status, weighted_households,
        pct_of_state, avg_federal_eitc, avg_state_eitc
    """
    print(f"\n{'='*70}")
    print(f"EITC Phase Status by State - {year_label}")
    print(f"{'='*70}")
    
    # Step 1: Calculate weighted counts by state and phase status
    # tax_unit_weight is summed to get population-representative counts
    summary = df.groupby(['state', 'eitc_phase_status']).agg({
        'tax_unit_weight': 'sum',
    }).reset_index()
    
    summary.columns = ['state', 'eitc_phase_status', 'weighted_households']
    
    # Step 2: Calculate state totals for percentage calculation
    state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()
    state_totals.columns = ['state', 'state_total']
    
    # Step 3: Merge to compute percentages
    summary = summary.merge(state_totals, on='state')
    summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)
    
    # Step 4: Add average EITC amounts (only computed for households receiving EITC)
    # This uses weighted averages: sum(value × weight) / sum(weight)
    avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(
        lambda x: pd.Series({
            'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
        })
    ).reset_index()
    
    summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')
    summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)
    summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)
    
    # Step 5: Clean up columns and sort
    summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', 
                       'avg_federal_eitc', 'avg_state_eitc']]
    
    # Sort by state alphabetically, then by phase status in logical order
    summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    return summary

# Generate summaries for both years
summary_2024 = create_phase_status_summary(df_2024, "2024")
summary_2025 = create_phase_status_summary(df_2025, "2025")

# Preview the results
print("\n2024 Summary (first 20 rows):")
print(summary_2024.head(20).to_string(index=False))
print("\n2025 Summary (first 20 rows):")
print(summary_2025.head(20).to_string(index=False))


EITC Phase Status by State - 2024

EITC Phase Status by State - 2025

2024 Summary (first 20 rows):
state    eitc_phase_status  weighted_households  pct_of_state  avg_federal_eitc  avg_state_eitc
   AK           Ineligible           103,108.59         50.10              0.00            0.00
   AK     No earned income            13,868.58          6.70              0.00            0.00
   AK         Pre-phase-in             3,593.07          1.70            515.63            0.00
   AK          Full amount                 0.26          0.00            632.00            0.00
   AK Partially phased out             1,670.44          0.80            626.76            0.00
   AK     Fully phased out            83,537.39         40.60              0.00            0.00
   AL           Ineligible           807,295.19         56.80              0.00            0.00
   AL     No earned income           108,302.58          7.60              0.00            0.00
   AL         Pre-phase-in         

In [8]:
# =============================================================================
# SUMMARY BY STATE - TOP STATES BY POPULATION
# =============================================================================
# Shows the states with the largest childless tax unit populations,
# along with total and average EITC amounts.
#
# Useful for understanding which states contribute most to the national totals.
# =============================================================================

def summary_by_state(df, year_label, top_n=15):
    """
    Create summary by state showing top N by number of childless tax units.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        Household-level data
    year_label : str
        Label for display
    top_n : int
        Number of top states to show (default 15)
    
    Returns:
    --------
    pandas.DataFrame
        State-level summary sorted by weighted tax unit count
    """
    print(f"\n{'='*60}")
    print(f"Top {top_n} States by EITC Recipients - {year_label}")
    print(f"{'='*60}")
    
    # Calculate state-level aggregates using weighted sums/averages
    summary = df.groupby('state').apply(
        lambda x: pd.Series({
            # Total weighted tax units in state
            'Tax Units (Weighted)': x['tax_unit_weight'].sum(),
            # Total federal EITC distributed (weight × eitc amount)
            'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),
            # Total state EITC distributed
            'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),
            # Weighted average federal EITC per tax unit
            'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            # Weighted average state EITC per tax unit
            'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            # Boolean: does this state have a state EITC program?
            'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,
        })
    ).reset_index()
    
    # Sort by number of tax units (largest states first)
    summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)
    
    return summary

# Generate and display for both years
state_2024 = summary_by_state(df_2024, "2024")
print(state_2024.to_string(index=False))

state_2025 = summary_by_state(df_2025, "2025")
print(state_2025.to_string(index=False))


Top 15 States by EITC Recipients - 2024
state  Tax Units (Weighted)  Total Federal EITC  Total State EITC  Avg Federal EITC  Avg State EITC  Has State EITC
   CA         11,676,756.00      126,396,768.00    392,533,280.00             10.82           33.62            True
   TX          8,270,492.50      102,216,176.00              0.00             12.36            0.00           False
   FL          6,828,671.50       50,078,040.00              0.00              7.33            0.00           False
   NY          6,089,496.00       64,632,924.00     17,955,152.00             10.61            2.95            True
   IL          4,061,833.00       43,125,848.00      8,625,170.00             10.62            2.12            True
   PA          4,057,412.25       41,305,212.00              0.00             10.18            0.00           False
   OH          3,171,405.75       30,410,496.00      9,123,148.00              9.59            2.88            True
   NC          3,018,447.50    

In [9]:
# =============================================================================
# AGE DISTRIBUTION ANALYSIS
# =============================================================================
# Shows how childless tax units are distributed by age of the head of household.
#
# Key insight: The childless EITC has age restrictions (25-64 for 2024 under
# current law), so we expect most EITC recipients to fall within that range.
# =============================================================================

def age_distribution(df, year_label):
    """
    Create age group distribution for heads of household.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        Household-level data
    year_label : str
        Label for display
    
    Returns:
    --------
    pandas.DataFrame
        Summary by age group with weighted counts and averages
    """
    print(f"\n{'='*60}")
    print(f"Age Distribution of Head of Household - {year_label}")
    print(f"{'='*60}")
    
    # Create age groups using pd.cut
    df_copy = df.copy()
    df_copy['age_group'] = pd.cut(
        df_copy['age_head'],
        bins=[0, 25, 35, 45, 55, 65, 100],
        labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']
    )
    
    # Calculate weighted statistics by age group
    summary = df_copy.groupby('age_group').apply(
        lambda x: pd.Series({
            'Tax Units (Weighted)': x['tax_unit_weight'].sum(),
            'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,
            'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,
        })
    ).reset_index()
    
    # Add percentage of total
    total_units = summary['Tax Units (Weighted)'].sum()
    summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)
    
    return summary

# Generate for both years
age_2024 = age_distribution(df_2024, "2024")
print(age_2024.to_string(index=False))

age_2025 = age_distribution(df_2025, "2025")
print(age_2025.to_string(index=False))


Age Distribution of Head of Household - 2024
age_group  Tax Units (Weighted)  Avg Federal EITC  Avg Earned Income  % of Total
 Under 25         12,069,262.00              0.57          24,647.42       12.50
    25-34         14,198,971.00             35.78          76,383.24       14.70
    35-44         11,448,204.00              2.19          94,731.74       11.90
    45-54         16,595,334.00             22.36          87,682.62       17.20
    55-64          9,673,886.00              1.18          59,089.31       10.00
      65+         32,441,214.00              0.01          25,601.59       33.60

Age Distribution of Head of Household - 2025
age_group  Tax Units (Weighted)  Avg Federal EITC  Avg Earned Income  % of Total
 Under 25         12,181,323.00              0.59          25,851.27       12.50
    25-34         14,330,805.00             36.65          80,112.14       14.70
    35-44         11,554,499.00              2.15          99,357.79       11.90
    45-54        

## Export Data to CSV

In [10]:
# =============================================================================
# EXPORT DETAILED HOUSEHOLD DATA
# =============================================================================
# Exports the full household-level dataset with all calculated variables.
#
# WARNING: These files are large (~125MB each) and are excluded from git
# via .gitignore. They are generated locally when the notebook runs.
#
# Use cases:
#   - Detailed analysis in external tools (Excel, Stata, R)
#   - Validation of the summary statistics
#   - Custom filtering/aggregation not provided in this notebook
# =============================================================================

def export_household_data(df, year):
    """
    Export household-level data to CSV, sorted by state and phase status.
    """
    
    # Select columns for export (only columns we're loading)
    export_columns = [
        'state',                    # State abbreviation
        'eitc_phase_status',        # Classification result
        'tax_unit_id',              # Unique identifier
        'tax_unit_weight',          # Survey weight
        'eitc_eligible',            # Eligibility status
        'eitc',                     # Federal EITC amount
        'state_eitc',               # State EITC amount
        'eitc_phased_in',           # Phase-in calculation
        'eitc_reduction',           # Phase-out reduction
        'tax_unit_earned_income',   # Total earned income
        'age_head',                 # Age of primary filer
    ]
    
    # Only include columns that exist in the DataFrame
    available_columns = [col for col in export_columns if col in df.columns]
    df_export = df[available_columns].copy()
    
    # Rename columns for clarity in external tools
    df_export = df_export.rename(columns={
        'eitc': 'federal_eitc',
    })
    
    # Sort by state (alphabetically) then by phase status (in logical EITC order)
    df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    # Write to CSV
    filename = f'eitc_childless_families_{year}.csv'
    df_export.to_csv(filename, index=False)
    print(f"Exported {len(df_export):,} rows to: {filename}")
    
    return df_export

# Export both years to separate files
df_export_2024 = export_household_data(df_2024, 2024)
df_export_2025 = export_household_data(df_2025, 2025)

Exported 1,493,541 rows to: eitc_childless_families_2024.csv
Exported 1,493,541 rows to: eitc_childless_families_2025.csv


In [11]:
# Preview the data
print("\nSample of 2024 export data:")
df_export_2024.head(10)


Sample of 2024 export data:


Unnamed: 0,state,eitc_phase_status,tax_unit_id,tax_unit_weight,eitc_eligible,federal_eitc,state_eitc,eitc_phased_in,eitc_reduction,tax_unit_earned_income,age_head
25751,AK,Ineligible,0,0.8,False,0.0,0.0,0.0,0.0,0.0,79
25753,AK,Ineligible,3,0.28,False,0.0,0.0,0.0,10068.1,0.0,76
25757,AK,Ineligible,11,4387.35,False,0.0,0.0,0.0,3368.61,0.0,85
25760,AK,Ineligible,14,2849.94,False,0.0,0.0,632.0,1747.39,31767.87,21
25761,AK,Ineligible,15,639.52,False,0.0,0.0,0.0,992.74,0.0,85
25763,AK,Ineligible,18,1114.78,False,0.0,0.0,0.0,0.0,0.0,83
25764,AK,Ineligible,19,1114.78,False,0.0,0.0,632.0,10566.14,132357.31,61
25766,AK,Ineligible,21,2.31,False,0.0,0.0,632.0,0.0,16941.74,78
25767,AK,Ineligible,22,0.82,False,0.0,0.0,0.0,0.0,0.0,85
25769,AK,Ineligible,24,792.77,False,0.0,0.0,0.0,20.54,0.0,81


In [12]:
# CSVs already exported in previous cell
# Files created:
# - eitc_childless_families_2024.csv
# - eitc_childless_families_2025.csv
print("Household data exported to separate files above.")

Household data exported to separate files above.


## Summary Statistics Export

In [13]:
# =============================================================================
# EXPORT SUMMARY DATA
# =============================================================================
# Exports the aggregated summary by state and phase status.
#
# These files are small (~10KB) and ARE included in git commits.
# This is the primary output for sharing with stakeholders.
#
# Output Files:
#   - eitc_childless_phase_status_summary_2024.csv
#   - eitc_childless_phase_status_summary_2025.csv
# =============================================================================

def export_summary(summary_df, year):
    """
    Export phase status summary to CSV, sorted by state and phase status.
    
    Parameters:
    -----------
    summary_df : pandas.DataFrame
        Summary from create_phase_status_summary()
    year : int
        Tax year (used in filename)
    
    Returns:
    --------
    pandas.DataFrame
        The exported data
    """
    df_export = summary_df.copy()
    
    # Sort by state (alphabetically) then phase status (logical EITC order)
    df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    # Write to CSV
    filename = f'eitc_childless_phase_status_summary_{year}.csv'
    df_export.to_csv(filename, index=False)
    print(f"Exported summary to: {filename}")
    return df_export

# Export both years
summary_2024_export = export_summary(summary_2024, 2024)
summary_2025_export = export_summary(summary_2025, 2025)

Exported summary to: eitc_childless_phase_status_summary_2024.csv
Exported summary to: eitc_childless_phase_status_summary_2025.csv


## Grand Totals

In [14]:
# =============================================================================
# NATIONAL TOTALS BY PHASE STATUS
# =============================================================================
# Aggregates across all states to show the national distribution of
# childless tax units by EITC phase status.
#
# Key insights:
#   - Most childless tax units (~62%) are "Fully phased out" (too much income)
#   - About 35% have "No income" (no earned income = no EITC)
#   - Only ~2% actually receive EITC (Pre-phase-in + Full amount + Partially)
# =============================================================================

def national_totals(df, year):
    """
    Calculate national totals by phase status.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        Household-level data
    year : int
        Tax year (for output column)
    
    Returns:
    --------
    pandas.DataFrame
        National summary with weighted counts and percentages
    """
    totals = df.groupby('eitc_phase_status').agg({
        'tax_unit_weight': 'sum',
    }).reset_index()
    totals.columns = ['eitc_phase_status', 'weighted_households']
    
    # Calculate percentage of total
    total_all = totals['weighted_households'].sum()
    totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)
    totals['year'] = year
    return totals

# Display national totals
print("National Totals by Phase Status:")
print("\n2024:")
nat_2024 = national_totals(df_2024, 2024)
print(nat_2024.to_string(index=False))
print(f"\nTotal childless tax units: {nat_2024['weighted_households'].sum():,.0f}")

print("\n2025:")
nat_2025 = national_totals(df_2025, 2025)
print(nat_2025.to_string(index=False))
print(f"\nTotal childless tax units: {nat_2025['weighted_households'].sum():,.0f}")

National Totals by Phase Status:

2024:
   eitc_phase_status  weighted_households  pct_of_total  year
         Full amount            33,314.48          0.00  2024
    Fully phased out        36,907,524.00         38.30  2024
          Ineligible        51,325,700.00         53.20  2024
    No earned income         6,137,777.50          6.40  2024
Partially phased out           824,046.81          0.90  2024
        Pre-phase-in         1,203,184.00          1.20  2024

Total childless tax units: 96,431,552

2025:
   eitc_phase_status  weighted_households  pct_of_total  year
         Full amount            33,638.47          0.00  2025
    Fully phased out        37,248,152.00         38.30  2025
          Ineligible        51,804,544.00         53.20  2025
    No earned income         6,194,765.50          6.40  2025
Partially phased out           831,458.88          0.90  2025
        Pre-phase-in         1,214,332.12          1.20  2025

Total childless tax units: 97,326,896


## Notes

### Data Interpretation
- **Tax unit weights** represent the number of actual tax units each record represents in the population
- All monetary values are weighted averages/totals reflecting the full population
- State datasets contain representative microdata for each state

### EITC Phase Status Definitions
1. **Ineligible**: Does not meet EITC eligibility requirements (age 25-64, valid SSN, investment income limits, or filing status)
2. **No earned income**: Eligible for EITC but has zero earned income (cannot receive credit without earnings)
3. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. Credit = (earned income × 7.65%)
4. **Full amount**: At the plateau - receiving maximum credit (~$632 for childless in 2024)
5. **Partially phased out**: Income is above the phase-out threshold, receiving reduced credit
6. **Fully phased out**: Income is too high; credit is reduced to $0

### Childless Worker EITC Parameters (2024)
- Maximum credit: ~$632
- Phase-in rate: 7.65%
- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)
- Phase-out rate: 7.65%
- Age requirements: 25-64 years old (or 19+ if former foster youth/homeless)

### State EITC Programs
See the State EITC Programs section at the beginning of this notebook for detailed information on each state's program, including states with unique structures (CA, MN, WA, VA, DE, MD).