# EITC Analysis: Childless Families by Phase-in/Phase-out Status

This notebook analyzes childless families (tax units with no EITC qualifying children) who receive the Earned Income Tax Credit (EITC), including:
- Federal EITC amounts
- State EITC amounts (where applicable)
- EITC schedule position (pre-phase-in, full amount, partially phased out, fully phased out)
- Household characteristics (marital status, state, demographics)

**Data Source:** State-specific datasets from PolicyEngine

**Years Analyzed:** 2024 and 2025

## Setup and Imports

In [30]:
from policyengine_us import Microsimulation
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')

## Helper Function: Determine EITC Phase Status

The EITC has four distinct regions:
1. **Pre-phase-in**: Earning below the level needed to reach maximum credit
2. **Full amount (plateau)**: Earning enough for maximum credit, not yet in phase-out
3. **Partially phased out**: In phase-out range, but still receiving some credit
4. **Fully phased out**: Income too high, EITC = $0

In [31]:
def determine_eitc_phase_status_vectorized(df):
    """
    Vectorized version to determine EITC phase status for a DataFrame.
    
    Categories:
    - No income: No/minimal earned income, not receiving EITC
    - Pre-phase-in: Earning but haven't reached maximum credit yet
    - Full amount: At maximum credit (plateau region)
    - Partially phased out: In phase-out region, still receiving some credit
    - Fully phased out: Income too high, EITC reduced to $0
    """
    conditions = [
        # No income: earned income is 0 or very low AND not receiving EITC
        (df['tax_unit_earned_income'] <= 100) & (df['eitc'] <= 0),
        
        # Fully phased out: EITC is 0 AND had some earned income AND there was reduction
        (df['eitc'] <= 0) & (df['tax_unit_earned_income'] > 100) & (df['eitc_reduction'] > 0),
        
        # Fully phased out: EITC is 0 AND phased_in >= maximum (meaning they would have gotten max but it's all reduced)
        (df['eitc'] <= 0) & (df['eitc_phased_in'] >= df['eitc_maximum']),
        
        # Pre-phase-in: Receiving EITC but haven't hit maximum yet
        (df['eitc'] > 0) & (df['eitc_phased_in'] < df['eitc_maximum']),
        
        # Partially phased out: Receiving EITC with some reduction
        (df['eitc'] > 0) & (df['eitc_reduction'] > 0),
        
        # Full amount: At maximum, no reduction
        (df['eitc'] > 0) & (df['eitc_phased_in'] >= df['eitc_maximum']) & (df['eitc_reduction'] <= 0)
    ]
    
    choices = [
        'No income',
        'Fully phased out',
        'Fully phased out',
        'Pre-phase-in',
        'Partially phased out',
        'Full amount'
    ]
    
    return np.select(conditions, choices, default='No income')

## Load Data and Calculate Variables

We'll run the analysis for both 2024 and 2025.

In [32]:
# List of all US states
ALL_STATES = [
    'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 
    'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 
    'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 
    'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 
    'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'
]

# Phase status order for sorting
PHASE_ORDER = ['No income', 'Pre-phase-in', 'Full amount', 'Partially phased out', 'Fully phased out']

def run_state_eitc_analysis(state_abbr, year):
    """
    Run EITC analysis for ALL childless households (not just recipients) for a given state and year.
    """
    try:
        # Load the state-specific dataset
        dataset_path = f"hf://policyengine/policyengine-us-data/states/{state_abbr}.h5"
        sim = Microsimulation(dataset=dataset_path)
        
        # Calculate tax unit level variables
        data = {}
        
        tax_unit_vars = [
            'tax_unit_id',
            'tax_unit_weight',
            'eitc',
            'eitc_maximum',
            'eitc_phased_in',
            'eitc_reduction',
            'eitc_child_count',
            'state_eitc',
            'adjusted_gross_income',
            'tax_unit_earned_income',
            'filing_status',
            'age_head',
            'age_spouse',
        ]
        
        for var in tax_unit_vars:
            result = sim.calculate(var, period=year)
            data[var] = result.values if hasattr(result, 'values') else np.array(result)
        
        df = pd.DataFrame(data)
        df['state'] = state_abbr
        
        # Filter to childless families only (include ALL, not just EITC recipients)
        childless_mask = df['eitc_child_count'] == 0
        df_childless = df[childless_mask].copy()
        
        if len(df_childless) == 0:
            return None
        
        # Determine EITC phase status for ALL childless households
        df_childless['eitc_phase_status'] = determine_eitc_phase_status_vectorized(df_childless)
        
        # Add year column
        df_childless['year'] = year
        
        # Map filing status codes to readable labels
        filing_status_map = {
            1: 'Single',
            2: 'Joint',
            3: 'Separate',
            4: 'Head of Household',
            5: 'Widow(er)'
        }
        df_childless['filing_status_label'] = df_childless['filing_status'].map(filing_status_map).fillna('Unknown')
        
        return df_childless
        
    except Exception as e:
        print(f"  Error processing {state_abbr}: {e}")
        return None


def run_all_states_analysis(year, states=None):
    """
    Run EITC analysis for all states for a given year.
    """
    if states is None:
        states = ALL_STATES
    
    print(f"\n{'='*60}")
    print(f"Running analysis for {year}")
    print(f"{'='*60}")
    
    all_results = []
    
    for i, state in enumerate(states):
        print(f"Processing {state} ({i+1}/{len(states)})...", end=" ")
        result = run_state_eitc_analysis(state, year)
        if result is not None and len(result) > 0:
            weighted_count = result['tax_unit_weight'].sum()
            print(f"{len(result):,} records, {weighted_count:,.0f} weighted")
            all_results.append(result)
        else:
            print("No data found")
    
    if all_results:
        combined = pd.concat(all_results, ignore_index=True)
        print(f"\nTotal: {len(combined):,} records, {combined['tax_unit_weight'].sum():,.0f} weighted tax units")
        return combined
    else:
        return pd.DataFrame()

## Run Analysis for 2024 and 2025

In [33]:
# Run for 2024 - all states
df_2024 = run_all_states_analysis(2024)


Running analysis for 2024
Processing AL (1/51)... 25,751 records, 1,422,123 weighted
Processing AK (2/51)... 1,182 records, 205,778 weighted
Processing AZ (3/51)... 30,120 records, 1,905,622 weighted
Processing AR (4/51)... 15,144 records, 683,842 weighted
Processing CA (5/51)... 238,247 records, 11,676,756 weighted
Processing CO (6/51)... 34,120 records, 1,602,958 weighted
Processing CT (7/51)... 19,827 records, 1,119,846 weighted
Processing DE (8/51)... 3,801 records, 265,233 weighted
Processing DC (9/51)... 4,995 records, 247,082 weighted
Processing FL (10/51)... 45,655 records, 6,828,672 weighted
Processing GA (11/51)... 56,638 records, 2,867,909 weighted
Processing HI (12/51)... 8,416 records, 401,230 weighted
Processing ID (13/51)... 7,678 records, 420,636 weighted
Processing IL (14/51)... 56,631 records, 4,061,833 weighted
Processing IN (15/51)... 33,456 records, 1,707,284 weighted
Processing IA (16/51)... 14,070 records, 834,990 weighted
Processing KS (17/51)... 15,776 records

In [34]:
# Run for 2025 - all states
df_2025 = run_all_states_analysis(2025)


Running analysis for 2025
Processing AL (1/51)... 25,751 records, 1,435,327 weighted
Processing AK (2/51)... 1,182 records, 207,689 weighted
Processing AZ (3/51)... 30,120 records, 1,923,315 weighted
Processing AR (4/51)... 15,144 records, 690,191 weighted
Processing CA (5/51)... 238,247 records, 11,785,171 weighted
Processing CO (6/51)... 34,120 records, 1,617,841 weighted
Processing CT (7/51)... 19,827 records, 1,130,243 weighted
Processing DE (8/51)... 3,801 records, 267,696 weighted
Processing DC (9/51)... 4,995 records, 249,376 weighted
Processing FL (10/51)... 45,655 records, 6,892,074 weighted
Processing GA (11/51)... 56,638 records, 2,894,537 weighted
Processing HI (12/51)... 8,416 records, 404,956 weighted
Processing ID (13/51)... 7,678 records, 424,542 weighted
Processing IL (14/51)... 56,631 records, 4,099,546 weighted
Processing IN (15/51)... 33,456 records, 1,723,135 weighted
Processing IA (16/51)... 14,070 records, 842,742 weighted
Processing KS (17/51)... 15,776 records

In [35]:
# Combine both years
df_combined = pd.concat([df_2024, df_2025], ignore_index=True)
print(f"\nCombined dataset: {len(df_combined):,} records")


Combined dataset: 2,987,082 records


## Summary Statistics

### EITC Phase Status Distribution

In [36]:
def create_phase_status_summary(df, year_label):
    """
    Create summary of EITC phase status by state with weighted counts and percentages.
    """
    print(f"\n{'='*70}")
    print(f"EITC Phase Status by State - {year_label}")
    print(f"{'='*70}")
    
    # Calculate weighted counts by state and phase status
    summary = df.groupby(['state', 'eitc_phase_status']).agg({
        'tax_unit_weight': 'sum',
    }).reset_index()
    
    summary.columns = ['state', 'eitc_phase_status', 'weighted_households']
    
    # Calculate state totals for percentage
    state_totals = summary.groupby('state')['weighted_households'].sum().reset_index()
    state_totals.columns = ['state', 'state_total']
    
    # Merge to get percentages
    summary = summary.merge(state_totals, on='state')
    summary['pct_of_state'] = (summary['weighted_households'] / summary['state_total'] * 100).round(1)
    
    # Add average EITC amounts (only for those receiving)
    avg_eitc = df[df['eitc'] > 0].groupby(['state', 'eitc_phase_status']).apply(
        lambda x: pd.Series({
            'avg_federal_eitc': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            'avg_state_eitc': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
        })
    ).reset_index()
    
    summary = summary.merge(avg_eitc, on=['state', 'eitc_phase_status'], how='left')
    summary['avg_federal_eitc'] = summary['avg_federal_eitc'].fillna(0)
    summary['avg_state_eitc'] = summary['avg_state_eitc'].fillna(0)
    
    # Reorder columns
    summary = summary[['state', 'eitc_phase_status', 'weighted_households', 'pct_of_state', 
                       'avg_federal_eitc', 'avg_state_eitc']]
    
    # Sort by state and phase status order
    summary['phase_sort'] = summary['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    summary = summary.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    return summary

# Run for 2024 and 2025
summary_2024 = create_phase_status_summary(df_2024, "2024")
summary_2025 = create_phase_status_summary(df_2025, "2025")

print("\n2024 Summary (first 20 rows):")
print(summary_2024.head(20).to_string(index=False))
print("\n2025 Summary (first 20 rows):")
print(summary_2025.head(20).to_string(index=False))


EITC Phase Status by State - 2024

EITC Phase Status by State - 2025

2024 Summary (first 20 rows):
state    eitc_phase_status  weighted_households  pct_of_state  avg_federal_eitc  avg_state_eitc
   AK            No income            64,211.33         31.20              0.00            0.00
   AK         Pre-phase-in             3,593.07          1.70            515.63            0.00
   AK          Full amount                 0.26          0.00            632.00            0.00
   AK Partially phased out             1,670.44          0.80            626.76            0.00
   AK     Fully phased out           136,303.23         66.20              0.00            0.00
   AL            No income           598,891.06         42.10              0.00            0.00
   AL         Pre-phase-in             3,394.00          0.20            354.86            0.00
   AL          Full amount               579.79          0.00            632.00            0.00
   AL Partially phased out         

### Distribution by Filing Status (Marital Status)

In [37]:
def show_example_households(df, year_label, n_examples=3):
    """
    Show example households from each phase status with key characteristics.
    """
    print(f"\n{'='*70}")
    print(f"Example Households by Phase Status - {year_label}")
    print(f"{'='*70}")
    
    examples = []
    
    for phase in ['Pre-phase-in', 'Full amount', 'Partially phased out']:
        phase_df = df[df['eitc_phase_status'] == phase]
        if len(phase_df) > 0:
            # Get random sample of examples
            sample = phase_df.sample(min(n_examples, len(phase_df)), random_state=42)
            for _, row in sample.iterrows():
                examples.append({
                    'phase_status': phase,
                    'state': row['state'],
                    'marital_status': row['filing_status_label'],
                    'age_head': int(row['age_head']),
                    'agi': row['adjusted_gross_income'],
                    'earned_income': row['tax_unit_earned_income'],
                    'federal_eitc': row['eitc'],
                    'state_eitc': row['state_eitc'],
                })
    
    examples_df = pd.DataFrame(examples)
    return examples_df

# Show examples for 2024
examples_2024 = show_example_households(df_2024, "2024")
print(examples_2024.to_string(index=False))


Example Households by Phase Status - 2024
        phase_status state marital_status  age_head       agi  earned_income  federal_eitc  state_eitc
        Pre-phase-in    TN        Unknown        30    938.96       2,316.38        177.20        0.00
        Pre-phase-in    NY        Unknown        38      2.51           2.51          0.19        0.06
        Pre-phase-in    NY        Unknown        44  3,109.13       3,268.07        250.01       75.00
         Full amount    GA        Unknown        72  9,529.73       9,529.73        632.00        0.00
         Full amount    NY        Unknown        31 12,709.29      13,765.16        632.00      189.60
         Full amount    CA        Unknown        48  7,210.69      15,053.27        632.00      159.20
Partially phased out    AL        Unknown        25 12,807.38      13,072.27        422.22        0.00
Partially phased out    NY        Unknown        64 13,765.90      13,765.16        369.15       65.75
Partially phased out    AZ    

### Distribution by State

In [38]:
def summary_by_state(df, year_label, top_n=15):
    """
    Create summary by state (top N by number of recipients).
    """
    print(f"\n{'='*60}")
    print(f"Top {top_n} States by EITC Recipients - {year_label}")
    print(f"{'='*60}")
    
    summary = df.groupby('state').apply(
        lambda x: pd.Series({
            'Tax Units (Weighted)': x['tax_unit_weight'].sum(),
            'Total Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum(),
            'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),
            'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            'Has State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() > 0,
        })
    ).reset_index()
    
    # Sort by number of recipients
    summary = summary.sort_values('Tax Units (Weighted)', ascending=False).head(top_n)
    
    return summary

# 2024
state_2024 = summary_by_state(df_2024, "2024")
print(state_2024.to_string(index=False))

# 2025
state_2025 = summary_by_state(df_2025, "2025")
print(state_2025.to_string(index=False))


Top 15 States by EITC Recipients - 2024
state  Tax Units (Weighted)  Total Federal EITC  Total State EITC  Avg Federal EITC  Avg State EITC  Has State EITC
   CA         11,676,756.00      126,396,768.00    392,533,280.00             10.82           33.62            True
   TX          8,270,492.50      102,216,176.00              0.00             12.36            0.00           False
   FL          6,828,671.50       50,078,040.00              0.00              7.33            0.00           False
   NY          6,089,496.00       64,632,924.00     17,955,152.00             10.61            2.95            True
   IL          4,061,833.00       43,125,848.00      8,625,170.00             10.62            2.12            True
   PA          4,057,412.25       41,305,212.00              0.00             10.18            0.00           False
   OH          3,171,405.75       30,410,496.00      9,123,148.00              9.59            2.88            True
   NC          3,018,447.50    

### Cross-tabulation: Phase Status by Filing Status

In [39]:
def crosstab_phase_by_filing(df, year_label):
    """
    Create cross-tabulation of phase status by filing status.
    """
    print(f"\n{'='*60}")
    print(f"Phase Status by Filing Status (Weighted Tax Units) - {year_label}")
    print(f"{'='*60}")
    
    # Create pivot table with weighted counts
    pivot = df.pivot_table(
        values='tax_unit_weight',
        index='eitc_phase_status',
        columns='filing_status_label',
        aggfunc='sum',
        fill_value=0
    )
    
    # Add totals
    pivot['Total'] = pivot.sum(axis=1)
    pivot.loc['Total'] = pivot.sum()
    
    return pivot

# 2024
crosstab_2024 = crosstab_phase_by_filing(df_2024, "2024")
print(crosstab_2024.to_string())

# 2025
crosstab_2025 = crosstab_phase_by_filing(df_2025, "2025")
print(crosstab_2025.to_string())


Phase Status by Filing Status (Weighted Tax Units) - 2024
filing_status_label        Unknown         Total
eitc_phase_status                               
Full amount              33,314.48     33,314.48
Fully phased out     60,244,548.00 60,244,548.00
No income            34,126,456.00 34,126,456.00
Partially phased out    824,046.81    824,046.81
Pre-phase-in          1,203,184.00  1,203,184.00
Total                96,431,552.00 96,431,552.00

Phase Status by Filing Status (Weighted Tax Units) - 2025
filing_status_label        Unknown         Total
eitc_phase_status                               
Full amount              33,638.47     33,638.47
Fully phased out     60,940,444.00 60,940,444.00
No income            34,307,016.00 34,307,016.00
Partially phased out    831,458.88    831,458.88
Pre-phase-in          1,214,332.12  1,214,332.12
Total                97,326,896.00 97,326,896.00


### Age Distribution

In [40]:
def age_distribution(df, year_label):
    """
    Create age group distribution.
    """
    print(f"\n{'='*60}")
    print(f"Age Distribution of Head of Household - {year_label}")
    print(f"{'='*60}")
    
    # Create age groups
    df_copy = df.copy()
    df_copy['age_group'] = pd.cut(
        df_copy['age_head'],
        bins=[0, 25, 35, 45, 55, 65, 100],
        labels=['Under 25', '25-34', '35-44', '45-54', '55-64', '65+']
    )
    
    summary = df_copy.groupby('age_group').apply(
        lambda x: pd.Series({
            'Tax Units (Weighted)': x['tax_unit_weight'].sum(),
            'Avg Federal EITC': (x['eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,
            'Avg Earned Income': (x['tax_unit_earned_income'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum() if x['tax_unit_weight'].sum() > 0 else 0,
        })
    ).reset_index()
    
    total_units = summary['Tax Units (Weighted)'].sum()
    summary['% of Total'] = (summary['Tax Units (Weighted)'] / total_units * 100).round(1)
    
    return summary

# 2024
age_2024 = age_distribution(df_2024, "2024")
print(age_2024.to_string(index=False))

# 2025
age_2025 = age_distribution(df_2025, "2025")
print(age_2025.to_string(index=False))


Age Distribution of Head of Household - 2024
age_group  Tax Units (Weighted)  Avg Federal EITC  Avg Earned Income  % of Total
 Under 25         12,069,262.00              0.57          24,647.42       12.50
    25-34         14,198,971.00             35.78          76,383.24       14.70
    35-44         11,448,204.00              2.19          94,731.74       11.90
    45-54         16,595,334.00             22.36          87,682.62       17.20
    55-64          9,673,886.00              1.18          59,089.31       10.00
      65+         32,441,214.00              0.01          25,601.59       33.60

Age Distribution of Head of Household - 2025
age_group  Tax Units (Weighted)  Avg Federal EITC  Avg Earned Income  % of Total
 Under 25         12,181,323.00              0.59          25,851.27       12.50
    25-34         14,330,805.00             36.65          80,112.14       14.70
    35-44         11,554,499.00              2.15          99,357.79       11.90
    45-54        

### States with State EITC Programs

In [41]:
def state_eitc_summary(df, year_label):
    """
    Summary of states with state EITC programs.
    """
    print(f"\n{'='*60}")
    print(f"States with State EITC Benefits - {year_label}")
    print(f"{'='*60}")
    
    # Filter to states with state EITC > 0
    df_with_state_eitc = df[df['state_eitc'] > 0]
    
    if len(df_with_state_eitc) == 0:
        print("No state EITC benefits found in the data.")
        return None
    
    summary = df_with_state_eitc.groupby('state').apply(
        lambda x: pd.Series({
            'Tax Units (Weighted)': x['tax_unit_weight'].sum(),
            'Total State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum(),
            'Avg State EITC': (x['state_eitc'] * x['tax_unit_weight']).sum() / x['tax_unit_weight'].sum(),
            'State EITC as % of Fed': ((x['state_eitc'] * x['tax_unit_weight']).sum() / 
                                       (x['eitc'] * x['tax_unit_weight']).sum() * 100) if (x['eitc'] * x['tax_unit_weight']).sum() > 0 else 0,
        })
    ).reset_index()
    
    summary = summary.sort_values('Total State EITC', ascending=False)
    
    return summary

# 2024
state_eitc_2024 = state_eitc_summary(df_2024, "2024")
if state_eitc_2024 is not None:
    print(state_eitc_2024.to_string(index=False))

# 2025
state_eitc_2025 = state_eitc_summary(df_2025, "2025")
if state_eitc_2025 is not None:
    print(state_eitc_2025.to_string(index=False))


States with State EITC Benefits - 2024
state  Tax Units (Weighted)  Total State EITC  Avg State EITC  State EITC as % of Fed
   CA          2,873,632.00    392,533,248.00          136.60                  310.56
   MN            313,996.50    118,287,520.00          376.72                  656.23
   VA            348,289.69    102,224,688.00          293.50                  702.41
   NJ            242,280.23     55,209,756.00          227.88                  173.98
   MD             64,599.95     49,036,304.00          759.08                  294.33
   WA             84,485.99     27,457,220.00          324.99                   64.69
   NY            146,240.25     17,955,152.00          122.78                   27.83
   MA             61,848.42     11,170,704.00          180.61                   40.00
   DC             21,836.86     10,643,084.00          487.39                  442.39
   NM             90,626.20      9,532,400.00          105.18                  140.54
   SC         

## Export Data to CSV

In [42]:
# Export detailed household data - SEPARATE files for 2024 and 2025
# Sorted by state and phase_status, without eitc_maximum

def export_household_data(df, year):
    """Export household-level data sorted by state and phase status."""
    
    export_columns = [
        'state',
        'eitc_phase_status',
        'tax_unit_id',
        'tax_unit_weight',
        'eitc',
        'state_eitc',
        'eitc_phased_in',
        'eitc_reduction',
        'tax_unit_earned_income',
        'adjusted_gross_income',
        'filing_status_label',
        'age_head',
        'age_spouse',
    ]
    
    # Select columns that exist
    available_columns = [col for col in export_columns if col in df.columns]
    df_export = df[available_columns].copy()
    
    # Rename for clarity
    df_export = df_export.rename(columns={
        'eitc': 'federal_eitc',
        'filing_status_label': 'marital_status',
    })
    
    # Sort by state and phase status
    df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    # Export
    filename = f'eitc_childless_families_{year}.csv'
    df_export.to_csv(filename, index=False)
    print(f"Exported {len(df_export):,} rows to: {filename}")
    
    return df_export

# Export 2024
df_export_2024 = export_household_data(df_2024, 2024)

# Export 2025
df_export_2025 = export_household_data(df_2025, 2025)

Exported 1,493,541 rows to: eitc_childless_families_2024.csv
Exported 1,493,541 rows to: eitc_childless_families_2025.csv


In [43]:
# Preview the data
print("\nSample of 2024 export data:")
df_export_2024.head(10)


Sample of 2024 export data:


Unnamed: 0,state,eitc_phase_status,tax_unit_id,tax_unit_weight,federal_eitc,state_eitc,eitc_phased_in,eitc_reduction,tax_unit_earned_income,adjusted_gross_income,marital_status,age_head,age_spouse
25751,AK,No income,0,0.8,0.0,0.0,0.0,0.0,0.0,3923.64,Unknown,79,0
25753,AK,No income,3,0.28,0.0,0.0,0.0,10068.1,0.0,148859.19,Unknown,76,74
25754,AK,No income,5,12.27,0.0,0.0,194.41,0.0,2541.26,3945.09,Unknown,64,0
25757,AK,No income,11,4387.35,0.0,0.0,0.0,3368.61,0.0,61284.13,Unknown,85,82
25761,AK,No income,15,639.52,0.0,0.0,0.0,992.74,0.0,23307.04,Unknown,85,0
25763,AK,No income,18,1114.78,0.0,0.0,0.0,0.0,0.0,1403.83,Unknown,83,0
25767,AK,No income,22,0.82,0.0,0.0,0.0,0.0,0.0,2153.92,Unknown,85,0
25769,AK,No income,24,792.77,0.0,0.0,0.0,20.54,0.0,10598.54,Unknown,81,0
25770,AK,No income,25,1.06,0.0,0.0,0.0,0.0,0.0,1403.83,Unknown,85,0
25771,AK,No income,27,1.04,0.0,0.0,0.0,0.0,0.0,1403.83,Unknown,64,0


In [44]:
# CSVs already exported in previous cell
# Files created:
# - eitc_childless_families_2024.csv
# - eitc_childless_families_2025.csv
print("Household data exported to separate files above.")

Household data exported to separate files above.


## Summary Statistics Export

In [45]:
# Export phase status summaries - SEPARATE files for 2024 and 2025

def export_summary(summary_df, year):
    """Export summary sorted by state and phase status."""
    df_export = summary_df.copy()
    
    # Sort by state and phase status
    df_export['phase_sort'] = df_export['eitc_phase_status'].map({p: i for i, p in enumerate(PHASE_ORDER)})
    df_export = df_export.sort_values(['state', 'phase_sort']).drop('phase_sort', axis=1)
    
    filename = f'eitc_childless_phase_status_summary_{year}.csv'
    df_export.to_csv(filename, index=False)
    print(f"Exported summary to: {filename}")
    return df_export

# Export 2024 summary
summary_2024_export = export_summary(summary_2024, 2024)

# Export 2025 summary  
summary_2025_export = export_summary(summary_2025, 2025)

Exported summary to: eitc_childless_phase_status_summary_2024.csv
Exported summary to: eitc_childless_phase_status_summary_2025.csv


## Grand Totals

In [46]:
# National totals by phase status
def national_totals(df, year):
    totals = df.groupby('eitc_phase_status').agg({
        'tax_unit_weight': 'sum',
    }).reset_index()
    totals.columns = ['eitc_phase_status', 'weighted_households']
    total_all = totals['weighted_households'].sum()
    totals['pct_of_total'] = (totals['weighted_households'] / total_all * 100).round(1)
    totals['year'] = year
    return totals

print("National Totals by Phase Status:")
print("\n2024:")
nat_2024 = national_totals(df_2024, 2024)
print(nat_2024.to_string(index=False))
print(f"\nTotal childless EITC recipients: {nat_2024['weighted_households'].sum():,.0f}")

print("\n2025:")
nat_2025 = national_totals(df_2025, 2025)
print(nat_2025.to_string(index=False))
print(f"\nTotal childless EITC recipients: {nat_2025['weighted_households'].sum():,.0f}")

National Totals by Phase Status:

2024:
   eitc_phase_status  weighted_households  pct_of_total  year
         Full amount            33,314.48          0.00  2024
    Fully phased out        60,244,548.00         62.50  2024
           No income        34,126,456.00         35.40  2024
Partially phased out           824,046.81          0.90  2024
        Pre-phase-in         1,203,184.00          1.20  2024

Total childless EITC recipients: 96,431,552

2025:
   eitc_phase_status  weighted_households  pct_of_total  year
         Full amount            33,638.47          0.00  2025
    Fully phased out        60,940,444.00         62.60  2025
           No income        34,307,016.00         35.20  2025
Partially phased out           831,458.88          0.90  2025
        Pre-phase-in         1,214,332.12          1.20  2025

Total childless EITC recipients: 97,326,896


## Notes

### Data Interpretation
- **Tax unit weights** represent the number of actual tax units each record represents in the population
- All monetary values are weighted averages/totals reflecting the full population
- The enhanced CPS dataset has ~42,000 household records that are weighted to represent the US population

### EITC Phase Status Definitions
1. **Pre-phase-in**: Earned income is below the level needed to receive the maximum credit. The credit amount equals (earned income Ã— phase-in rate).
2. **Full amount**: Earned income is sufficient to receive the maximum credit, and income is below the phase-out threshold.
3. **Partially phased out**: Income is above the phase-out threshold, resulting in a reduced credit.
4. **Fully phased out**: Income is too high; credit is reduced to $0.

### State EITC Programs
Not all states have state EITC programs. States with programs typically calculate their EITC as a percentage of the federal EITC amount.

### Childless Worker EITC
The federal EITC for childless workers is significantly smaller than for workers with children. Key parameters (2024):
- Maximum credit: ~$632
- Phase-in rate: 7.65%
- Phase-out starts at: ~$9,800 (single), ~$16,400 (married)
- Phase-out rate: 7.65%