# Enhanced CPS Dataset Analysis - Updated Documentation

This document describes the Enhanced CPS dataset implementation, including the ASEC Undocumented Algorithm for SSN card type assignment and validation testing for policy reform impacts such as Child Tax Credit (CTC) analysis.

**Latest Update**: Added comprehensive population tracking for all 14 individual ASEC conditions with detailed logging and documentation.

## Overview

The Enhanced CPS dataset combines several algorithms to build a microsimulation dataset intended to represent the US population, including undocumented immigrants, and to model how policies affect them.

# Part I: ASEC Undocumented Algorithm

The ASEC Undocumented Algorithm uses a process-of-elimination approach: it identifies individuals who are likely undocumented immigrants by systematically removing people who have clear indicators of legal immigration status.

**Research Paper**: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4662801

## Algorithm Overview

The algorithm assigns SSN card type codes:
- **Code 0**: `"NONE"` - Likely undocumented immigrants
- **Code 1**: `"CITIZEN"` - US citizens (born or naturalized)  
- **Code 2**: `"NON_CITIZEN_VALID_EAD"` - Non-citizens with work/study authorization
- **Code 3**: `"OTHER_NON_CITIZEN"` - Non-citizens with indicators of legal status
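The document refers to these codes both by number and by label; for example, the calibration code compares `ssn_card_type` against the string `"NONE"`. A minimal sketch of that correspondence, using a hypothetical `SSNCardType` enum (not PolicyEngine's own class):

```python
from enum import IntEnum

import numpy as np


class SSNCardType(IntEnum):
    """Hypothetical enum mirroring the four SSN card type codes listed above."""

    NONE = 0
    CITIZEN = 1
    NON_CITIZEN_VALID_EAD = 2
    OTHER_NON_CITIZEN = 3


def codes_to_labels(codes: np.ndarray) -> np.ndarray:
    """Map integer codes (0-3) to their string labels, e.g. 0 -> "NONE"."""
    # Members iterate in definition order, so position i holds the label for code i.
    labels = np.array([member.name for member in SSNCardType])
    return labels[codes]


print(codes_to_labels(np.array([0, 1, 3, 2, 0])))
```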

## Implementation Details

The algorithm is implemented in the `add_ssn_card_type()` function in `policyengine_us_data/datasets/cps/cps.py:692-1350` (updated with comprehensive condition logging).

### Function Parameters
- `undocumented_target`: 13 million (default target for total undocumented population)
- `undocumented_workers_target`: 8.3 million (target for undocumented workers)
- `undocumented_students_target`: ~399k (21% of 1.9M total undocumented students)

### Recent Code Changes
- **Step 6 Removal**: The algorithm now has five steps after initialization (Steps 1-5) instead of six
- **Enhanced Logging**: All 14 individual ASEC conditions now log population counts to CSV
- **Comprehensive Tracking**: Each condition prints console output and saves data for documentation

## Algorithm Steps

### Step 0: Initialization
- All persons start with SSN card type code 0 (`ssn_card_type = np.full(len(person), 0)`)
- Prints initial Code 0 population count

### Step 1: Citizen Classification
Citizens are identified and moved to code 1 based on citizenship status:
- **Condition**: `PRCITSHP` values 1, 2, 3, or 4 (all citizen types)
- **Result**: Citizens assigned to code 1, non-citizens (`PRCITSHP == 5`) remain for further processing
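Steps 0 and 1 reduce to a couple of array operations. The sketch below uses a toy `person` table with made-up values (not ASEC data), mirrors the `np.full` initialization quoted above, and builds the `potentially_undocumented` mask as described for Step 2:

```python
import numpy as np
import pandas as pd

# Toy person records; PRCITSHP codes 1-4 are citizen types, 5 is non-citizen.
person = pd.DataFrame({"PRCITSHP": [1, 4, 5, 5, 2, 5]})

# Step 0: everyone starts as code 0 ("NONE").
ssn_card_type = np.full(len(person), 0)

# Step 1: any citizen type (codes 1-4) moves to code 1 ("CITIZEN").
citizens_mask = np.isin(person["PRCITSHP"], [1, 2, 3, 4])
ssn_card_type[citizens_mask] = 1

# Records not in code 1 or 2 are the ones evaluated by the Step 2 conditions.
potentially_undocumented = ~np.isin(ssn_card_type, [1, 2])
print(ssn_card_type)                   # [1 1 0 0 1 0]
print(potentially_undocumented.sum())  # 3
```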

### Step 2: ASEC Undocumented Algorithm Conditions
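The algorithm applies 14 specific conditions to identify non-citizens with indicators of legal status. Only individuals not already classified as citizens (code 1) or authorized workers/students (code 2) are evaluated (the `potentially_undocumented` mask). Because code 2 is assigned later, in Steps 3 and 4, this mask in practice selects everyone still at code 0, i.e. the non-citizens left after Step 1.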

**All 14 conditions now have comprehensive population tracking and logging.**

#### Condition 1: Pre-1982 Arrivals (IRCA Amnesty Eligible)
- **Variable**: `PEINUSYR` codes 1-7
- **Logic**: Immigrants who arrived before 1982 were eligible for IRCA amnesty
- **Codes**: 01=Before 1950, 02=1950-1959, 03=1960-1964, 04=1965-1969, 05=1970-1974, 06=1975-1979, 07=1980-1981
- **Logging**: ✅ Console print + CSV tracking

#### Condition 2: Eligible Naturalized Citizens
- **Variables**: `PRCITSHP == 4` (naturalized citizen), `A_AGE >= 18`, `PEINUSYR` for residency
- **Logic**: Naturalized citizens meeting residency requirements:
  - 5+ years in US (codes 8-26: 1982-2019), OR
  - 3+ years in US + married to citizen (codes 8-27: 1982-2021)
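- **Marriage check**: `A_MARITL` in [1,2] AND `A_SPOUSE > 0`
- **Caveat**: Step 1 already moves every `PRCITSHP == 4` record to code 1, so intersecting this condition with the `potentially_undocumented` mask would be expected to select no one
- **Logging**: ✅ Console print + CSV tracking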
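The residency rule for Condition 2 can be written as a single boolean expression. This is a sketch on invented toy records; the column names come from the text, and the `PEINUSYR` ranges (8-26 for 5+ years, 8-27 for 3+ years) follow the mapping stated above rather than an independent check of the CPS codebook:

```python
import pandas as pd

# Toy records (values invented for illustration).
person = pd.DataFrame({
    "PRCITSHP": [4, 4, 4, 4],
    "A_AGE": [30, 40, 17, 25],
    "PEINUSYR": [20, 27, 10, 27],
    "A_MARITL": [3, 1, 1, 7],
    "A_SPOUSE": [0, 2, 0, 0],
})

is_naturalized_adult = (person["PRCITSHP"] == 4) & (person["A_AGE"] >= 18)
five_plus_years = person["PEINUSYR"].between(8, 26)   # arrived 1982-2019
three_plus_years = person["PEINUSYR"].between(8, 27)  # arrived 1982-2021
# Marriage check from the text: A_MARITL in [1, 2] and a spouse line number present.
married = person["A_MARITL"].isin([1, 2]) & (person["A_SPOUSE"] > 0)

eligible_naturalized = is_naturalized_adult & (
    five_plus_years | (three_plus_years & married)
)
print(eligible_naturalized.tolist())  # [True, True, False, False]
```

Row 1 qualifies only through the married-to-citizen branch; row 3 has the same arrival code but fails the marriage check.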

#### Condition 3: Medicare Recipients
- **Variable**: `MCARE == 1`
- **Logic**: Medicare eligibility indicates legal status
- **Logging**: ✅ Console print + CSV tracking

#### Condition 4: Federal Retirement Benefits
- **Variables**: `PEN_SC1 == 3` OR `PEN_SC2 == 3`
- **Logic**: Federal government pension recipients have legal work history
- **Logging**: ✅ Console print + CSV tracking

#### Condition 5: Social Security Disability
- **Variables**: `RESNSS1 == 2` OR `RESNSS2 == 2`
- **Logic**: Social Security disability benefits indicate legal status (code 2 = disabled adult or child)
- **Logging**: ✅ Console print + CSV tracking

#### Condition 6: Indian Health Service Coverage
- **Variable**: `IHSFLG == 1`
- **Logic**: IHS coverage indicates legal status or tribal membership
- **Logging**: ✅ Console print + CSV tracking

#### Condition 7: Medicaid Recipients
- **Variable**: `CAID == 1`
- **Logic**: Medicaid eligibility generally requires legal status (simplified implementation without state-specific adjustments)
- **Logging**: ✅ Console print + CSV tracking

#### Condition 8: CHAMPVA Recipients
- **Variable**: `CHAMPVA == 1`
- **Logic**: CHAMPVA (Veterans Affairs health coverage) indicates military family connection
- **Logging**: ✅ Console print + CSV tracking

#### Condition 9: Military Health Insurance
- **Variable**: `MIL == 1`
- **Logic**: TRICARE/military health insurance indicates military service or family connection
- **Logging**: ✅ Console print + CSV tracking

#### Condition 10: Government Employees
- **Variables**: `PEIO1COW` codes 1-3 OR `A_MJOCC == 11`
- **Logic**: Government employment requires legal work authorization
- **Codes**: `PEIO1COW` 1=Federal government, 2=State government, 3=Local government; `A_MJOCC` 11=Military occupation
- **Logging**: ✅ Console print + CSV tracking

#### Condition 11: Social Security Recipients
- **Variable**: `SS_YN == 1`
- **Logic**: Social Security benefits indicate legal status and work history
- **Logging**: ✅ Console print + CSV tracking

#### Condition 12: Housing Assistance
- **Variable**: `SPM_CAPHOUSESUB > 0` (mapped from SPM unit data)
- **Logic**: Housing assistance programs generally require legal status
- **Implementation**: Uses SPM unit mapping: `spm_housing_map = dict(zip(spm_unit.SPM_ID, spm_unit.SPM_CAPHOUSESUB))`
- **Logging**: ✅ Console print + CSV tracking
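The SPM unit mapping for Condition 12 broadcasts a unit-level value down to persons. A minimal sketch with toy tables, assuming each person record carries its SPM unit ID in a column named `SPM_ID` (an assumption; the text only shows the SPM unit side of the mapping):

```python
import pandas as pd

# Toy SPM unit table with the capped housing subsidy per unit.
spm_unit = pd.DataFrame({
    "SPM_ID": [101, 102, 103],
    "SPM_CAPHOUSESUB": [0.0, 450.0, 1200.0],
})

# Toy person table; assumes the person-level SPM unit ID column is also named SPM_ID.
person = pd.DataFrame({"SPM_ID": [101, 101, 102, 103, 103]})

# Build the lookup described above and broadcast it to persons (missing units -> 0).
spm_housing_map = dict(zip(spm_unit.SPM_ID, spm_unit.SPM_CAPHOUSESUB))
person_housing_subsidy = person["SPM_ID"].map(spm_housing_map).fillna(0)

# Condition 12: anyone in an SPM unit that receives a housing subsidy.
receives_housing_assistance = person_housing_subsidy > 0
print(receives_housing_assistance.tolist())  # [False, False, True, True, True]
```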

#### Condition 13: Veterans/Military Personnel
- **Variables**: `PEAFEVER == 1` OR `A_MJOCC == 11`
- **Logic**: Military service or veteran status indicates legal status
- **Logging**: ✅ Console print + CSV tracking

#### Condition 14: SSI Recipients
- **Variable**: `SSI_YN == 1`
- **Logic**: SSI eligibility generally requires legal status (simplified implementation)
- **Logging**: ✅ Console print + CSV tracking
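The conditions above can be combined into a single code 3 assignment. The sketch below shows five of the 14 conditions on invented toy records and assumes a person moves to code 3 if they meet any one condition; the remaining conditions would follow the same pattern. Because one person can satisfy several conditions, the per-condition weighted counts overlap and should not be summed:

```python
import numpy as np
import pandas as pd

# Toy non-citizen records with a subset of the indicator columns named above.
person = pd.DataFrame({
    "PEINUSYR": [5, 20, 22, 15, 25],
    "MCARE": [0, 1, 0, 0, 0],
    "CAID": [0, 0, 1, 0, 0],
    "PEIO1COW": [0, 0, 0, 2, 0],
    "SS_YN": [0, 1, 0, 0, 0],
})
weights = np.array([1000.0, 1500.0, 800.0, 1200.0, 900.0])
ssn_card_type = np.zeros(len(person), dtype=int)
potentially_undocumented = ~np.isin(ssn_card_type, [1, 2])

conditions = {
    "Condition 1 - Pre-1982 arrivals": person["PEINUSYR"].between(1, 7).to_numpy(),
    "Condition 3 - Medicare recipients": (person["MCARE"] == 1).to_numpy(),
    "Condition 7 - Medicaid recipients": (person["CAID"] == 1).to_numpy(),
    "Condition 10 - Government employees": person["PEIO1COW"].between(1, 3).to_numpy(),
    "Condition 11 - Social Security recipients": (person["SS_YN"] == 1).to_numpy(),
}

has_legal_status_indicator = np.zeros(len(person), dtype=bool)
for name, mask in conditions.items():
    condition_mask = potentially_undocumented & mask
    # Weighted count for this condition (overlaps with other conditions are possible).
    print(f"{name}: {weights[condition_mask].sum():,.0f}")
    has_legal_status_indicator |= condition_mask

# Anyone meeting at least one condition moves to code 3.
ssn_card_type[has_legal_status_indicator] = 3
print(ssn_card_type)  # [3 3 3 3 0]
```

The second record meets both the Medicare and Social Security conditions, so it appears in two per-condition counts but moves to code 3 only once.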

### Step 3: Target-Driven EAD Assignment for Workers
- **Target**: 8.3 million undocumented workers (from Pew Research)
- **Eligibility**: Non-citizens not in code 3 with earnings (`WSAL_VAL > 0` OR `SEMP_VAL > 0`)
- **Process**: Uses `select_random_subset_to_target()` function with random seed 0
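- **Logic**: Calculates how many currently undocumented (code 0) workers must receive EAD status so that the weighted count of workers remaining in code 0 matches the target, then randomly selects that many from the eligible pool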
- **Result**: Selected workers moved from code 0 to code 2

### Step 4: Target-Driven EAD Assignment for Students  
- **Target**: ~399k undocumented students (21% of 1.9M total, from Higher Ed Immigration Portal)
- **Eligibility**: Non-citizens not in code 3 currently in college (`A_HSCOL == 2`)
- **Process**: Uses `select_random_subset_to_target()` function with random seed 1
- **Result**: Selected students moved from code 0 to code 2
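The student target and eligibility mask for Step 4 can be sketched the same way. The records below are invented; `A_HSCOL == 2` (currently in college) and the 21% share of 1.9 million come from the text:

```python
import numpy as np
import pandas as pd

# Target from the text: 21% of an estimated 1.9M undocumented students.
undocumented_students_target = 0.21 * 1.9e6
print(f"{undocumented_students_target:,.0f}")  # 399,000

# Toy records: PRCITSHP 5 = non-citizen, A_HSCOL 2 = currently in college.
person = pd.DataFrame({
    "PRCITSHP": [5, 5, 5, 1],
    "A_HSCOL": [2, 0, 2, 2],
})
ssn_card_type = np.array([0, 0, 3, 1])

# Eligible: non-citizens, not in code 3, currently in college.
eligible_students = (
    (person["PRCITSHP"] == 5).to_numpy()
    & (ssn_card_type != 3)
    & (person["A_HSCOL"] == 2).to_numpy()
)
print(eligible_students.tolist())  # [True, False, False, False]
```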

### Step 5: Probabilistic Family Correlation Adjustment (Final Step)
- **Purpose**: Bring the total undocumented (code 0) population up to its final target
- **Target**: 13 million total undocumented immigrants
- **Logic**: If the undocumented population is still below the target, probabilistically move some code 3 members of mixed-status households to code 0
- **Eligibility**: Only code 3 members in households that already have code 0 (undocumented) members
- **Process**: 
  1. Identify households with both code 0 and code 3 members (mixed-status families)
  2. Calculate how many more undocumented people are needed to hit target
  3. Use `select_random_subset_to_target()` function with random seed 100 to probabilistically select code 3 members
  4. Move selected individuals from code 3 to code 0
- **Result**: Final undocumented population reaches target of 13 million

**Note**: Step 6 (Final Population Targeting) has been removed. Step 5 now handles the final population targeting.
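The four Step 5 sub-steps can be sketched with pandas. Everything here is a toy stand-in: `household_id`, the weights, and the 2.5M target (in place of 13 million) are invented. The text says only that seed 100 is used, so drawing from `np.random.default_rng(100)` is an assumption:

```python
import numpy as np
import pandas as pd

# Toy people table (household_id, codes and weights invented for illustration).
people = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2, 3],
    "ssn_card_type": [0, 3, 3, 3, 1, 0],
    "weight": [1.0e6, 2.0e6, 1.5e6, 1.0e6, 2.0e6, 0.5e6],
})
undocumented_target = 2.5e6  # toy stand-in for the 13 million target

# 1. Mixed-status households: code 3 members living with at least one code 0 member.
households_with_code_0 = people.loc[people["ssn_card_type"] == 0, "household_id"].unique()
candidates = (people["ssn_card_type"] == 3) & people["household_id"].isin(households_with_code_0)

# 2. Weighted shortfall relative to the target.
current_undocumented = people.loc[people["ssn_card_type"] == 0, "weight"].sum()
shortfall = max(undocumented_target - current_undocumented, 0.0)

# 3. Select candidates with a common probability so the expected moved weight equals the shortfall.
candidate_weight = people.loc[candidates, "weight"].sum()
share = min(shortfall / candidate_weight, 1.0) if candidate_weight > 0 else 0.0
rng = np.random.default_rng(100)
selected = candidates.to_numpy() & (rng.random(len(people)) < share)

# 4. Move the selected code 3 members to code 0.
people.loc[selected, "ssn_card_type"] = 0
```

Household 2 has no code 0 member, so its code 3 member is never a candidate.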

## Population Data and Tracking

The `add_ssn_card_type()` function automatically generates a CSV file (`asec_population_log.csv`) that contains all population numbers from each step of the algorithm. This file is saved to the `docs/` directory and contains the following columns:

- **step**: The algorithm step or condition name
- **description**: Description of what the population number represents
- **population**: The weighted population count

### Enhanced Tracking Features

**New in this version:**
- All 14 individual ASEC conditions are now tracked in the CSV
- Each condition has both console output and CSV logging
- Complete visibility into which conditions identify the most people
- Detailed breakdown available in documentation display

In [None]:
import pandas as pd
import os

# Read the population log CSV file (path assumes this notebook runs from docs/, where cps.py saves it)
csv_path = "asec_population_log.csv"
if os.path.exists(csv_path):
    df = pd.read_csv(csv_path)
    print("✅ Population data loaded from CSV file\n")
    print("📊 CSV Data Summary:")
    print(f"Total entries: {len(df)}")
    print(f"Columns: {', '.join(df.columns)}")
    print("\n📋 Full Dataset:")
    print(df.to_string(index=False))
else:
    print("❌ CSV file not found. Run cps.py to generate population data.")
    df = pd.DataFrame()  # Empty dataframe for subsequent cells

### Actual Population Numbers (from latest run)

The following section displays the population results with **enhanced tracking for all 14 individual ASEC conditions**.

In [None]:
if not df.empty:
    def get_population(step, description):
        """Helper function to get population for a specific step and description"""
        result = df[(df['step'] == step) & (df['description'] == description)]
        if not result.empty:
            return f"{result.iloc[0]['population']:,.0f}"
        return "Not found"
    
    print("## Algorithm Step Results with Enhanced Condition Tracking\n")
    
    print("### Initialization")
    print(f"- **Step 0 - Initial**: Code 0 people = {get_population('Step 0 - Initial', 'Code 0 people')}")
    print()
    
    print("### Citizen Classification")
    print(f"- **Step 1 - Citizens**: Moved to Code 1 = {get_population('Step 1 - Citizens', 'Moved to Code 1')}")
    print()
    
    print("### ASEC Conditions Analysis")
    print(f"- **ASEC Conditions**: Current Code 0 people = {get_population('ASEC Conditions', 'Current Code 0 people')}")
    print()
    
    print("### 🔍 Individual ASEC Conditions (Detailed Breakdown)")
    print("*Each condition identifies people with indicators of legal status who qualify for Code 3*\n")
    
    # Define condition mappings for lookup
    condition_names = {
        1: "Pre-1982 arrivals",
        2: "Eligible naturalized citizens", 
        3: "Medicare recipients",
        4: "Federal retirement benefits",
        5: "Social Security disability",
        6: "Indian Health Service coverage",
        7: "Medicaid recipients",
        8: "CHAMPVA recipients",
        9: "Military health insurance",
        10: "Government employees",
        11: "Social Security recipients",
        12: "Housing assistance",
        13: "Veterans/Military personnel",
        14: "SSI recipients"
    }
    
    for i in range(1, 15):
        condition_name = condition_names[i]
        condition_pop = get_population(f'Condition {i}', f'{condition_name} qualify for Code 3')
        print(f"- **Condition {i:2d} - {condition_name}**: {condition_pop} people qualify for Code 3")
    
    print()
    print(f"- **After conditions**: Code 0 people = {get_population('After conditions', 'Code 0 people')}")
    print()
    
    print("### Target Information")
    print(f"- **Before adjustment**: Undocumented workers = {get_population('Before adjustment', 'Undocumented workers')}")
    print(f"- **Target**: Undocumented workers target = {get_population('Target', 'Undocumented workers target')}")
    print(f"- **Before adjustment**: Undocumented students = {get_population('Before adjustment', 'Undocumented students')}")
    print(f"- **Target**: Undocumented students target = {get_population('Target', 'Undocumented students target')}")
    print()
    
    print("### EAD Assignment")
    print(f"- **Step 3 - EAD workers**: Moved from Code 0 to Code 2 = {get_population('Step 3 - EAD workers', 'Moved from Code 0 to Code 2')}")
    print(f"- **Step 4 - EAD students**: Moved from Code 0 to Code 2 = {get_population('Step 4 - EAD students', 'Moved from Code 0 to Code 2')}")
    print(f"- **After EAD assignment**: Code 0 people = {get_population('After EAD assignment', 'Code 0 people')}")
    print()
    
    print("### Family Correlation (Final Step)")
    print(f"- **Step 5 - Family correlation**: Changed from Code 3 to Code 0 = {get_population('Step 5 - Family correlation', 'Changed from Code 3 to Code 0')}")
    print(f"- **After family correlation**: Code 0 people = {get_population('After family correlation', 'Code 0 people')}")
    print()
    
    print("### Final Results")
    print(f"- **Final**: Code 0 (NONE) = {get_population('Final', 'Code 0 (NONE)')}")
    print(f"- **Final**: Code 1 (CITIZEN) = {get_population('Final', 'Code 1 (CITIZEN)')}")
    print(f"- **Final**: Code 2 (NON_CITIZEN_VALID_EAD) = {get_population('Final', 'Code 2 (NON_CITIZEN_VALID_EAD)')}")
    print(f"- **Final**: Code 3 (OTHER_NON_CITIZEN) = {get_population('Final', 'Code 3 (OTHER_NON_CITIZEN)')}")
    print(f"- **Final**: Total undocumented (Code 0) = {get_population('Final', 'Total undocumented (Code 0)')}")
    print(f"- **Final**: Undocumented target = {get_population('Final', 'Undocumented target')}")
    
    print("\n### 📊 Condition Analysis Summary")
    print("*Analysis of which conditions identify the most people with legal status indicators*\n")
    
    # Create condition analysis
    condition_data = []
    for i in range(1, 15):
        condition_name = condition_names[i]
        result = df[(df['step'] == f'Condition {i}') & (df['description'] == f'{condition_name} qualify for Code 3')]
        if not result.empty:
            condition_data.append({
                'Condition': f'Condition {i}',
                'Name': condition_name,
                'Population': result.iloc[0]['population']
            })
    
    if condition_data:
        condition_df = pd.DataFrame(condition_data)
        condition_df = condition_df.sort_values('Population', ascending=False)
        print("**Top 5 Most Effective Conditions:**")
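        # Number the top conditions 1-5 instead of printing "1." for every row
        for rank, (_, row) in enumerate(condition_df.head().iterrows(), start=1):
            print(f"{rank}. {row['Condition']} ({row['Name']}): {row['Population']:,.0f} people")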
        
        print("\n**All Conditions Summary:**")
        print(condition_df.to_string(index=False, formatters={'Population': '{:,.0f}'.format}))
    
else:
    print("*Run cps.py to generate population data*")
    print("\n**Expected Output After Running CPS:**")
    print("- All 14 individual ASEC conditions will be tracked")
    print("- Each condition will show population counts")
    print("- Detailed breakdown of which conditions are most effective")
    print("- Complete CSV data with enhanced logging")

## Helper Functions

### `select_random_subset_to_target()`
A targeting function that handles both population reduction and population increase scenarios:
- **Population reduction**: Uses NumPy's `Generator` API (`np.random.default_rng`) for refinement steps
- **Population increase**: Uses the legacy global `np.random` functions to stay compatible with the EAD assignment
- **Weighting**: Accounts for household weights in selection probability
- **Bounds checking**: Caps selection percentage at 100%
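The core idea of the helper can be sketched as weighted, expected-value targeting. This is a hypothetical `select_subset_to_target()`, not the actual `select_random_subset_to_target()` signature; it uses only `np.random.default_rng` and does not reproduce the legacy `np.random` path described above:

```python
import numpy as np


def select_subset_to_target(
    eligible: np.ndarray,
    weights: np.ndarray,
    current_total: float,
    target_total: float,
    seed: int,
) -> np.ndarray:
    """Hypothetical sketch: pick eligible records so a weighted total moves toward a target.

    Each eligible record is selected independently with the same probability, chosen so
    that the expected selected weight equals the gap between current_total and target_total.
    """
    gap = abs(current_total - target_total)
    eligible_weight = weights[eligible].sum()
    if gap == 0 or eligible_weight == 0:
        return np.zeros_like(eligible, dtype=bool)
    # Bounds check: cap the selection share at 100% when the pool cannot close the gap.
    share = min(gap / eligible_weight, 1.0)
    rng = np.random.default_rng(seed)
    return eligible & (rng.random(len(eligible)) < share)


eligible = np.array([True, True, False, True])
weights = np.array([500.0, 700.0, 900.0, 300.0])
# Move a weighted total of 2,400 toward 2,000: each eligible record is drawn with p = 400 / 1,500.
selected = select_subset_to_target(eligible, weights, current_total=2400.0, target_total=2000.0, seed=0)
```

Because records are drawn independently, the realized moved weight matches the gap only in expectation; with large microdata samples the sampling noise is small relative to the targets.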

### Enhanced Logging System
**New tracking features:**
- Each condition calculates a `condition_X_count` variable
- Console output shows immediate results
- CSV logging captures all data for analysis
- This notebook reads the CSV, so re-running it displays the latest results

## Code Changes Summary

### Recent Updates to `cps.py`

**Enhanced Condition Logging (All 14 Conditions):**

```python
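# Example: Condition 2 (now includes comprehensive logging)
# Excerpt from add_ssn_card_type(): potentially_undocumented, eligible_naturalized,
# person_weights and population_log are defined earlier in the function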
condition_2_mask = potentially_undocumented & eligible_naturalized
condition_2_count = np.sum(person_weights[condition_2_mask])  # NEW: Store count
print(
    f"Condition 2 - Eligible naturalized citizens: {condition_2_count:,.0f} people qualify for Code 3"
)
population_log.append({  # NEW: Add to CSV log
    "step": "Condition 2",
    "description": "Eligible naturalized citizens qualify for Code 3",
    "population": condition_2_count,
})
```

**This pattern is now applied to all 14 conditions (previously only Condition 1 had CSV logging).**

### Algorithm Structure Changes
1. **Step Count**: Reduced from six steps to five, not counting the Step 0 initialization
2. **Step 6 Removal**: Final population targeting now handled in Step 5
3. **Enhanced Tracking**: All individual conditions logged
4. **Improved Documentation**: Comprehensive population breakdown

### Benefits
- **Complete Visibility**: See exactly how many people each condition identifies
- **Performance Analysis**: Understand which conditions are most/least effective
- **Data Integrity**: Full audit trail of all algorithm decisions
- **Research Value**: Detailed data for policy analysis and validation

## SSN Card Type Calibration in Loss Function

The ASEC undocumented algorithm is integrated into PolicyEngine's calibration system through the loss function in `policyengine_us_data/utils/loss.py`. This ensures that the simulated undocumented population matches external estimates.

### Calibration Implementation

The calibration system targets the `"NONE"` SSN card type (undocumented immigrants) using the following approach:

```python
# SSN Card Type calibration
for card_type_str in ["NONE"]:  # SSN card types as strings
    ssn_type_mask = sim.calculate("ssn_card_type").values == card_type_str
    
    # Overall count by SSN card type
    label = f"ssa/ssn_card_type_{card_type_str.lower()}_count"
    loss_matrix[label] = sim.map_result(
        ssn_type_mask, "person", "household"
    )
```
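Each loss-matrix column holds, per household, the number of members with card type `"NONE"`, so the weighted sum of that column is the dataset's implied undocumented population. A minimal sketch of how such a column could be scored against a target, assuming a squared relative error per target (the actual loss and reweighting code may weight or combine targets differently); the household values are invented:

```python
import numpy as np

# Toy loss-matrix column: number of code 0 ("NONE") people in each household,
# analogous to sim.map_result(ssn_type_mask, "person", "household").
none_count_per_household = np.array([0, 2, 1, 3])
household_weights = np.array([1200.0, 800.0, 1500.0, 500.0])
target_count = 5000.0  # toy stand-in for a target such as 13.0e6

# Weighted estimate of the undocumented population under the current weights.
estimate = household_weights @ none_count_per_household
# Assumed loss term: squared relative error against the target.
relative_error = estimate / target_count - 1.0
loss_term = relative_error ** 2
print(f"estimate={estimate:,.0f}, relative error={relative_error:.1%}, loss={loss_term:.4f}")
```

Reweighting would then adjust `household_weights` to shrink this term alongside all other calibration targets.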

### Target Population by Year

The calibration uses year-specific targets based on authoritative sources:

```python
undocumented_targets = {
    2022: 11.0e6,  # Official DHS Office of Homeland Security Statistics estimate for 1 Jan 2022
    # https://ohss.dhs.gov/sites/default/files/2024-06/2024_0418_ohss_estimates-of-the-unauthorized-immigrant-population-residing-in-the-united-states-january-2018%25E2%2580%2593january-2022.pdf
    2023: 12.2e6,  # Center for Migration Studies ACS-based residual estimate (published May 2025)
    # https://cmsny.org/publications/the-undocumented-population-in-the-united-states-increased-to-12-million-in-2023/
    2024: 13.0e6,  # Reuters synthesis of experts ahead of 2025 change ("~13-14 million") - central value
    # https://www.reuters.com/data/who-are-immigrants-who-could-be-targeted-trumps-mass-deportation-plans-2024-12-18/
    2025: 13.0e6,  # Same midpoint carried forward - CBP data show 95% drop in border apprehensions
}
```

### Logic for Target Selection

```python
if time_period <= 2022:
    target_count = 11.0e6  # Use 2022 value for earlier years
elif time_period >= 2025:
    target_count = 13.0e6  # Use 2025 value for later years
else:
    target_count = undocumented_targets[time_period]
```
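The `if`/`elif` logic above is equivalent to clamping the year into the 2022-2025 range covered by the dictionary. A self-contained sketch, with a hypothetical function name:

```python
undocumented_targets = {
    2022: 11.0e6,
    2023: 12.2e6,
    2024: 13.0e6,
    2025: 13.0e6,
}


def get_undocumented_target(time_period: int) -> float:
    """Hypothetical helper: clamp the year to the covered range, then look it up."""
    first_year, last_year = min(undocumented_targets), max(undocumented_targets)
    year = min(max(time_period, first_year), last_year)
    return undocumented_targets[year]


print(get_undocumented_target(2019), get_undocumented_target(2023), get_undocumented_target(2030))
# 11000000.0 12200000.0 13000000.0
```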

### Data Sources

1. **2022 (11.0M)**: Official DHS Office of Homeland Security Statistics estimate for January 1, 2022
   - [Source: DHS Report](https://ohss.dhs.gov/sites/default/files/2024-06/2024_0418_ohss_estimates-of-the-unauthorized-immigrant-population-residing-in-the-united-states-january-2018%25E2%2580%2593january-2022.pdf)

2. **2023 (12.2M)**: Center for Migration Studies ACS-based residual estimate
   - [Source: CMS Report](https://cmsny.org/publications/the-undocumented-population-in-the-united-states-increased-to-12-million-in-2023/)

3. **2024-2025 (13.0M)**: Reuters synthesis of expert estimates
   - [Source: Reuters Analysis](https://www.reuters.com/data/who-are-immigrants-who-could-be-targeted-trumps-mass-deportation-plans-2024-12-18/)

### Integration with Algorithm

The calibration target (13.0M for 2024) directly corresponds to the `undocumented_target` parameter in the `add_ssn_card_type()` function, ensuring consistency between the algorithm implementation and the broader PolicyEngine calibration framework.

This calibration ensures that PolicyEngine microsimulations produce realistic estimates of undocumented population sizes that align with the best available research and government statistics.

# Part II: CTC Reform Child Recipient Difference Analysis

The Enhanced CPS dataset includes validation testing for policy reform impacts, including the Child Tax Credit (CTC) reform analysis. This test validates the dataset's ability to model policy changes affecting families, particularly those with different immigration statuses.

## Overview

The `test_ctc_reform_child_recipient_difference()` function compares baseline and reform scenarios to measure the impact on child CTC recipients. This analysis helps validate the Enhanced CPS dataset's policy modeling capabilities.

**Note**: This test is used for validation only. It is no longer targeted in the loss-matrix calibration because the assumptions behind it, drawn from hearing comments, are uncertain.

## Test Implementation

The test implements the following logic:

1. **Define CTC Reform**: Creates a reform that enables the reconciliation CTC provision
2. **Create Simulations**: Runs both baseline and reform microsimulations
3. **Calculate Recipients**: Identifies child CTC recipients in both scenarios
4. **Measure Difference**: Computes the change in child recipients between scenarios
5. **Validate Against Target**: Compares results to a target of 2 million with ±400% tolerance
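The recipient count in steps 3 and 4 is a weighted sum over children who pass both filters. In the test itself the weighting comes from the weighted series returned by `Microsimulation.calculate`; the sketch below makes the weights explicit on invented toy arrays and, for simplicity, keeps `ctc_individual_maximum` the same in both scenarios (the test recomputes it under each simulation):

```python
import numpy as np


def weighted_child_recipients(weights, is_child, ctc_individual_maximum, ctc_value):
    """Weighted count of children with ctc_individual_maximum > 0 and ctc_value > 0."""
    is_recipient = is_child & (ctc_individual_maximum > 0) & (ctc_value > 0)
    return weights[is_recipient].sum()


# Toy person-level arrays; ctc_value is a tax-unit amount already mapped to each person.
weights = np.array([1000.0, 1000.0, 2000.0, 1500.0, 500.0])
is_child = np.array([True, True, True, False, True])
ctc_individual_maximum = np.array([2000.0, 2000.0, 2000.0, 0.0, 2000.0])
baseline_ctc_value = np.array([2000.0, 1500.0, 3000.0, 3000.0, 0.0])
reform_ctc_value = np.array([2000.0, 0.0, 3000.0, 3000.0, 0.0])

baseline = weighted_child_recipients(weights, is_child, ctc_individual_maximum, baseline_ctc_value)
reform = weighted_child_recipients(weights, is_child, ctc_individual_maximum, reform_ctc_value)
difference = baseline - reform  # baseline minus reform, as in the test
print(baseline, reform, difference)  # 4000.0 3000.0 1000.0
```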

### Key Parameters

- **Target Count**: 2,000,000 child recipient difference
- **Tolerance**: ±400% error allowed
- **Reform Period**: 2025-01-01 to 2100-12-31
- **Recipient Criteria**: Children with `ctc_individual_maximum > 0` AND `ctc_value > 0`
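With a 2 million target and a ±400% tolerance, the pass condition `pct_error <= TOLERANCE` translates into a concrete range of accepted differences. A short check of that arithmetic, using the constants from the test and a hypothetical `passes()` helper:

```python
TARGET_COUNT = 2e6
TOLERANCE = 4  # ±400%


def passes(difference: float) -> bool:
    """Hypothetical helper applying the same error formula and <= check as the run cell."""
    pct_error = abs((difference - TARGET_COUNT) / TARGET_COUNT)
    return pct_error <= TOLERANCE


lower = TARGET_COUNT * (1 - TOLERANCE)
upper = TARGET_COUNT * (1 + TOLERANCE)
print(lower, upper)  # -6000000.0 10000000.0
```

Any difference from -6 million to 10 million passes, including a difference of zero, which is consistent with treating this as a loose validation check rather than a calibration target.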

In [None]:
def test_ctc_reform_child_recipient_difference():
    """
    Test CTC reform impact for validation purposes only.
    Note: This is no longer actively targeted in loss matrix calibration
    due to uncertainty around assumptions from hearing comments.
    """
    from policyengine_us_data.datasets.cps import EnhancedCPS_2024
    from policyengine_us import Microsimulation
    from policyengine_core.reforms import Reform

    TARGET_COUNT = 2e6
    TOLERANCE = 4  # Allow ±400% error

    # Define the CTC reform
    ctc_reform = Reform.from_dict(
        {
            "gov.contrib.reconciliation.ctc.in_effect": {
                "2025-01-01.2100-12-31": True
            }
        },
        country_id="us",
    )

    # Create baseline and reform simulations
    baseline_sim = Microsimulation(dataset=EnhancedCPS_2024)
    reform_sim = Microsimulation(dataset=EnhancedCPS_2024, reform=ctc_reform)

    # Calculate baseline CTC recipients (children with ctc_individual_maximum > 0 and ctc_value > 0)
    baseline_is_child = baseline_sim.calculate("is_child")
    baseline_ctc_individual_maximum = baseline_sim.calculate(
        "ctc_individual_maximum"
    )
    baseline_ctc_value = baseline_sim.calculate("ctc_value", map_to="person")
    baseline_child_ctc_recipients = (
        baseline_is_child
        * (baseline_ctc_individual_maximum > 0)
        * (baseline_ctc_value > 0)
    ).sum()

    # Calculate reform CTC recipients (children with ctc_individual_maximum > 0 and ctc_value > 0)
    reform_is_child = reform_sim.calculate("is_child")
    reform_ctc_individual_maximum = reform_sim.calculate(
        "ctc_individual_maximum"
    )
    reform_ctc_value = reform_sim.calculate("ctc_value", map_to="person")
    reform_child_ctc_recipients = (
        reform_is_child
        * (reform_ctc_individual_maximum > 0)
        * (reform_ctc_value > 0)
    ).sum()

    # Calculate the difference (baseline - reform child CTC recipients)
    ctc_recipient_difference = (
        baseline_child_ctc_recipients - reform_child_ctc_recipients
    )

    pct_error = abs((ctc_recipient_difference - TARGET_COUNT) / TARGET_COUNT)

    print(
        f"CTC reform child recipient difference: {ctc_recipient_difference:.0f}, target: {TARGET_COUNT:.0f}, error: {pct_error:.2%}"
    )
    print(
        "Note: CTC targeting removed from calibration - this is validation only"
    )
    
    return {
        'baseline_child_ctc_recipients': baseline_child_ctc_recipients,
        'reform_child_ctc_recipients': reform_child_ctc_recipients,
        'ctc_recipient_difference': ctc_recipient_difference,
        'target_count': TARGET_COUNT,
        'pct_error': pct_error,
        'tolerance': TOLERANCE
    }

## Run CTC Analysis

Execute the CTC reform analysis and display detailed results:

In [None]:
# Run the test and display results
try:
    print("Running CTC Reform Analysis...")
    print("This may take a few minutes to load the dataset and run simulations.\n")
    
    results = test_ctc_reform_child_recipient_difference()
    
    print("\n" + "="*60)
    print("CTC REFORM ANALYSIS RESULTS")
    print("="*60)
    print(f"Baseline child CTC recipients: {results['baseline_child_ctc_recipients']:,.0f}")
    print(f"Reform child CTC recipients: {results['reform_child_ctc_recipients']:,.0f}")
    print(f"Difference (baseline - reform): {results['ctc_recipient_difference']:,.0f}")
    print(f"Target difference: {results['target_count']:,.0f}")
    print(f"Percent error: {results['pct_error']:.2%}")
    print(f"Tolerance: ±{results['tolerance']*100:.0f}%")
    
    # Determine if test passes
    if results['pct_error'] <= results['tolerance']:
        print("\n✅ TEST PASSED: Error within tolerance")
    else:
        print("\n❌ TEST FAILED: Error exceeds tolerance")
    
    print("\nThis test validates the dataset's ability to model CTC policy impacts.")
    print("Note: This is validation only - not used for active calibration.")
    
except Exception as e:
    print(f"❌ Test failed to run: {e}")
    print("\nPossible causes:")
    print("- Enhanced CPS dataset not generated yet")
    print("- Missing PolicyEngine dependencies")
    print("- Insufficient memory for microsimulation")
    print("\nTo resolve, ensure the Enhanced CPS dataset is available and all dependencies are installed.")

## Analysis Interpretation

### What the Numbers Mean

- **Baseline Recipients**: Number of children receiving CTC under current law
- **Reform Recipients**: Number of children receiving CTC under the reform scenario
- **Difference**: Baseline minus reform recipients, i.e. how many fewer children receive CTC under the reform than under current law
- **Target**: Expected difference based on policy analysis (2 million)

### Policy Context

The CTC reform being tested (`gov.contrib.reconciliation.ctc.in_effect`) represents changes to Child Tax Credit eligibility and amounts. The difference in recipients between baseline and reform scenarios helps validate that the Enhanced CPS dataset accurately models:

1. **Immigration Status Effects**: How CTC changes affect families with different SSN card types
2. **Income Distribution**: Whether the dataset captures realistic income patterns
3. **Family Composition**: Accurate representation of household structures

### Validation Purpose

This test serves as a validation check rather than an active calibration target because:
- Policy assumptions involve uncertainty from legislative hearing comments
- The 400% tolerance reflects this uncertainty
- Results help confirm overall dataset quality without over-constraining the model

## Usage and Summary

### Running the Enhanced Algorithm

When you run the CPS data generation process, the function will:
1. Execute all algorithm steps (Steps 0-5)
2. Print population numbers to console **for all 14 individual conditions**
3. Save the detailed population log to `docs/asec_population_log.csv` (replacing any previous file)
4. Display the CSV file path upon completion
5. Leave the latest numbers in the CSV, so re-running this notebook displays them

### Enhanced Features

**New in this version:**
- ✅ All 14 ASEC conditions tracked individually
- ✅ Complete population breakdown and analysis
- ✅ CSV data includes every condition
- ✅ Documentation shows detailed condition effectiveness
- ✅ Console output for immediate feedback

### Algorithm Structure (Updated)

**Current algorithm (Step 0 initialization plus five steps):**
- **Step 0**: Initialization (all start as Code 0)
- **Step 1**: Citizen classification (move citizens to Code 1)
- **Step 2**: ASEC conditions (move non-citizens with legal-status indicators to Code 3) - **Now with full tracking**
- **Step 3**: EAD workers (move some workers to Code 2)
- **Step 4**: EAD students (move some students to Code 2)
- **Step 5**: Probabilistic family correlation (final target adjustment)

**Removed**: Step 6 (Final Population Targeting) - functionality moved to Step 5

### To Run Complete Analysis

1. Ensure the Enhanced CPS 2024 dataset is generated
2. Install required dependencies (`policyengine-us`, `policyengine-core`, `policyengine-us-data`, `pandas`)
3. Execute the notebook cells in order
4. Review results from both algorithm execution and validation testing
5. **New**: Analyze individual condition effectiveness using detailed breakdown

### Key Benefits

- **Research Value**: Complete visibility into which legal status indicators are most common
- **Algorithm Validation**: Verify each condition is working as expected
- **Policy Analysis**: Understand how different legal status indicators affect population estimates
- **Data Quality**: Full audit trail of all algorithm decisions and population movements

The population numbers in this notebook are dynamically loaded from the CSV file and will always reflect the most recent algorithm run, **now including comprehensive tracking of all 14 individual ASEC conditions**.