# Imputing SSN statuses

This documentation outlines the implementation of SSN status imputation within the Enhanced CPS dataset, using the ASEC Undocumented Algorithm. The ASEC Undocumented Algorithm applies a process-of-elimination method to identify likely undocumented individuals in the CPS. It systematically removes people with clear evidence of legal immigration status, such as U.S. citizenship, lawful permanent residence, or work-authorized visas. Those remaining are flagged as likely undocumented and assigned an SSN card type accordingly.

The Enhanced CPS dataset incorporates this imputation to improve accuracy in microsimulation analysis. This includes modelling eligibility and take-up for policies that depend on SSN status, such as the Child Tax Credit (CTC), and validating distributional impacts under reform scenarios. Most of this implementation follows the methodology described in [Ryan (2022)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4662801).

## Algorithm steps

The algorithm is implemented in the `add_ssn_card_type()` function, which assigns one of four SSN card type codes based on immigration status:

- **Code 0** (`"NONE"`): likely undocumented immigrants
- **Code 1** (`"CITIZEN"`): U.S. citizens (by birth or naturalization)
- **Code 2** (`"NON_CITIZEN_VALID_EAD"`): non-citizens with work or study authorization
- **Code 3** (`"OTHER_NON_CITIZEN"`): non-citizens with other indicators of legal status
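The four codes can be pictured as a small enumeration. The sketch below is only an illustration: the `SSNCardType` class and the `card_type_label()` helper are hypothetical names, not part of `add_ssn_card_type()`, and only the code numbers and labels come from the description above.

```python
from enum import IntEnum


class SSNCardType(IntEnum):
    """Hypothetical enum mirroring the four codes described in the text."""

    NONE = 0                   # likely undocumented
    CITIZEN = 1                # U.S. citizen by birth or naturalization
    NON_CITIZEN_VALID_EAD = 2  # non-citizen with work or study authorization
    OTHER_NON_CITIZEN = 3      # non-citizen with other legal status indicators


def card_type_label(code: int) -> str:
    """Translate a numeric code into its string label."""
    return SSNCardType(code).name


labels = [card_type_label(code) for code in range(4)]
print(labels)
```

Keeping the numeric codes and string labels in one place avoids mismatches when the same classification is reported both ways.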

The following steps explain how individuals are classified and assigned SSN card types based on citizenship, legal status indicators, and probabilistic adjustments to match population targets.

### Step 1: citizen classification  
Individuals are reassigned to Code 1 if they are identified as U.S. citizens based on their `PRCITSHP` value. Codes 1 through 4 capture all forms of U.S. citizenship, including native-born and naturalized citizens. Non-citizens (`PRCITSHP == 5`) are retained for further evaluation.
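
As a rough illustration of this step, the sketch below applies the `PRCITSHP` rule to a toy array. The values are made up, and starting everyone at Code 0 is an assumption based on the "Step 0 - Initial" entry in the population log; the actual function may organize this differently.

```python
import numpy as np

# Toy PRCITSHP values: 1 through 4 are citizenship categories, 5 is non-citizen.
prcitshp = np.array([1, 4, 5, 2, 5, 3])

# Assumption: every record starts as Code 0 before classification.
ssn_card_type = np.zeros(len(prcitshp), dtype=int)

# Step 1: citizens (PRCITSHP 1-4) move to Code 1; non-citizens stay at Code 0.
is_citizen = np.isin(prcitshp, [1, 2, 3, 4])
ssn_card_type[is_citizen] = 1

print(ssn_card_type.tolist())  # [1, 1, 0, 1, 0, 1]
```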

### Step 2: ASEC undocumented algorithm conditions  
The algorithm applies 14 sequential conditions, derived from the ASEC Undocumented Algorithm, to identify non-citizens with legal status indicators. These conditions rely on CPS ASEC survey variables such as arrival year, program participation, and employment history. Individuals meeting any condition are reassigned to Code 2 (valid work/study authorization) or Code 3 (other legal status).

- *Condition 1*: Flags individuals who arrived before 1982 (`PEINUSYR` codes 1–7), a group eligible for IRCA amnesty.  
- *Condition 2*: Identifies naturalized citizens (`PRCITSHP == 4`) who meet residency requirements—either 5+ years in the U.S. or 3+ years and married to a U.S. citizen.  
- *Condition 3*: Reassigns individuals receiving Medicare (`MCARE == 1`), as eligibility implies legal status.  
- *Condition 4*: Includes recipients of federal pensions (`PEN_SC1 == 3` or `PEN_SC2 == 3`), indicating lawful employment history.  
- *Condition 5*: Captures those receiving Social Security Disability benefits (`RESNSS1 == 2` or `RESNSS2 == 2`).  
- *Condition 6*: Identifies individuals with Indian Health Service coverage (`IHSFLG == 1`), indicating tribal affiliation or eligibility.  
- *Condition 7*: Flags Medicaid recipients (`CAID == 1`); state-level eligibility restrictions are not modeled.  
- *Condition 8*: Includes individuals with CHAMPVA health insurance (`CHAMPVA == 1`), which covers veterans’ families.  
- *Condition 9*: Reassigns those with military health insurance (`MIL == 1`), such as TRICARE.  
- *Condition 10*: Identifies government employees (`PEIO1COW` codes 1–3 or `A_MJOCC == 11`), assuming legal work authorization is required.  
- *Condition 11*: Flags Social Security beneficiaries (`SS_YN == 1`).  
- *Condition 12*: Uses housing subsidy participation (`SPM_CAPHOUSESUB > 0`, mapped from SPM units) as a legal status proxy.  
- *Condition 13*: Identifies veterans or military personnel (`PEAFEVER == 1` or `A_MJOCC == 11`).  
- *Condition 14*: Captures SSI recipients (`SSI_YN == 1`), assuming eligibility implies legal presence.
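
The sequential application of conditions can be sketched with boolean masks. The example below covers only three of the 14 conditions on made-up records and moves matching people to Code 3, following the population log, which reports each condition as qualifying people for Code 3. Counting only people who are still in Code 0 at each step is an assumption; this document does not state how the real function treats overlaps.

```python
import numpy as np

# Toy records for four non-citizens; variable names follow the list above,
# the values are made up for illustration.
person = {
    "PEINUSYR": np.array([5, 20, 20, 20]),  # arrival-year code
    "MCARE": np.array([0, 1, 0, 0]),        # Medicare coverage flag
    "CAID": np.array([0, 0, 0, 1]),         # Medicaid coverage flag
}
ssn_card_type = np.zeros(4, dtype=int)  # everyone starts as Code 0

# Three of the 14 conditions, expressed as boolean masks.
conditions = {
    "Pre-1982 arrivals": (person["PEINUSYR"] >= 1) & (person["PEINUSYR"] <= 7),
    "Medicare recipients": person["MCARE"] == 1,
    "Medicaid recipients": person["CAID"] == 1,
}

moved = {}
for name, mask in conditions.items():
    # Assumption: only people still in Code 0 are moved and counted.
    newly_qualified = mask & (ssn_card_type == 0)
    ssn_card_type[newly_qualified] = 3
    moved[name] = int(newly_qualified.sum())

print(ssn_card_type.tolist())  # [3, 3, 0, 3]
```

Only the third record, which meets none of the conditions, remains in Code 0 and carries forward to the later steps.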

### Step 3: target-driven EAD assignment for workers  
To align with Pew Research estimates, the algorithm targets 8.3 million undocumented workers with valid employment authorization (EAD). Among non-citizens not already in Code 3, those with earnings (`WSAL_VAL > 0` or `SEMP_VAL > 0`) are eligible. The `select_random_subset_to_target()` function, seeded with 0, randomly assigns enough of these individuals to Code 2 to meet the employment-based target.
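
The seeded random selection can be sketched as follows. This is not the source of `select_random_subset_to_target()`, whose signature and internals are not shown here. The helper `pick_random_subset()` is a hypothetical stand-in that shuffles eligible records with a fixed seed and takes them until their cumulative survey weight reaches a requested amount. How the real function derives that amount from the 8.3 million target is not covered by this sketch.

```python
import numpy as np


def pick_random_subset(weights, eligible, amount, seed):
    """Hypothetical helper: boolean mask of randomly chosen eligible records
    whose cumulative weight first reaches `amount`."""
    chosen = np.zeros(len(weights), dtype=bool)
    idx = np.flatnonzero(eligible)
    if amount <= 0 or len(idx) == 0:
        return chosen
    rng = np.random.default_rng(seed)
    rng.shuffle(idx)  # seeded, so the draw is reproducible
    cumulative = np.cumsum(weights[idx])
    # Index of the first record at which the running total reaches `amount`.
    n_take = min(int(np.searchsorted(cumulative, amount, side="left")) + 1, len(idx))
    chosen[idx[:n_take]] = True
    return chosen


# Toy data: ten records with equal weights and made-up earnings.
weights = np.full(10, 100.0)
wsal_val = np.array([500, 0, 300, 0, 0, 900, 100, 0, 0, 0])
semp_val = np.array([0, 200, 0, 0, 50, 0, 0, 400, 0, 0])
ssn_card_type = np.zeros(10, dtype=int)

# Eligibility as described above: still Code 0 and has earnings.
eligible = (ssn_card_type == 0) & ((wsal_val > 0) | (semp_val > 0))
chosen = pick_random_subset(weights, eligible, amount=350, seed=0)
ssn_card_type[chosen] = 2

print(int(chosen.sum()))  # 4
```

Fixing the seed means repeated dataset builds reassign the same records, which keeps downstream results stable.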

### Step 4: target-driven EAD assignment for students  
A separate target is applied for undocumented students in higher education, estimated at roughly 399,000 (21% of 1.9 million, based on Higher Ed Immigration Portal data). Eligible individuals are non-citizens currently in college (`A_HSCOL == 2`) and not already in Code 3. A second call to `select_random_subset_to_target()` with seed 1 randomly reassigns the required number to Code 2.
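
The student target follows directly from the two figures quoted above. The variable names in this short check are illustrative only.

```python
# Figures quoted in the text above.
undocumented_share = 0.21
base_population = 1_900_000

student_target = round(undocumented_share * base_population)
print(f"{student_target:,}")  # 399,000
```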

### Step 5: probabilistic family correlation adjustment  
As a final step, the algorithm ensures the total undocumented population reaches the 13 million target. If needed, it uses a probabilistic adjustment to move some Code 3 household members to Code 0 within mixed-status families—households already containing Code 0 individuals. The function identifies these households, calculates the additional undocumented count needed, and randomly selects Code 3 members to reassign using a random seed of 100. This adjustment accounts for under-identification in prior steps and reflects real-world family compositions.
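
A minimal sketch of this adjustment, under stated assumptions, is shown below. The function name `family_adjustment()`, its arguments, and the rule of stopping before the weighted shortfall is exceeded are all hypothetical; only the overall logic (find households that already contain Code 0 members, then randomly move Code 3 members of those households to Code 0 with seed 100) comes from the description above.

```python
import numpy as np


def family_adjustment(ssn_card_type, household_id, weights, target, seed=100):
    """Hypothetical sketch: move Code 3 members of mixed-status households
    to Code 0 until the weighted Code 0 total approaches `target`."""
    codes = ssn_card_type.copy()
    shortfall = target - weights[codes == 0].sum()
    if shortfall <= 0:
        return codes  # target already met, nothing to adjust

    # Households that already contain at least one Code 0 person.
    mixed_households = np.unique(household_id[codes == 0])
    candidates = np.flatnonzero(
        (codes == 3) & np.isin(household_id, mixed_households)
    )

    rng = np.random.default_rng(seed)
    rng.shuffle(candidates)
    cumulative = np.cumsum(weights[candidates])
    # Assumption: take as many candidates as fit without exceeding the shortfall.
    n_take = int(np.searchsorted(cumulative, shortfall, side="right"))
    codes[candidates[:n_take]] = 0
    return codes


# Toy data: households 1 and 3 contain Code 0 members, household 2 does not.
household_id = np.array([1, 1, 1, 2, 2, 3])
ssn_card_type = np.array([0, 3, 3, 3, 3, 0])
weights = np.ones(6)

adjusted = family_adjustment(ssn_card_type, household_id, weights, target=3)
print(int((adjusted == 0).sum()))  # 3
```

In the toy data, only the Code 3 members of household 1 are candidates, so household 2 is left untouched.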

The cell below prints the population counts logged at each step, read from `asec_population_log.csv`.

In [1]:
import pandas as pd

csv_path = "asec_population_log.csv"
df = pd.read_csv(csv_path)

if not df.empty:
    def get_population(step, description):
        """Helper function to get population for a specific step and description"""
        result = df[(df['step'] == step) & (df['description'] == description)]
        if not result.empty:
            return f"{result.iloc[0]['population']:,.0f}"
        return "Not found"
        
    print("### Initialization")
    print(f"- **Step 0 - Initial**: Code 0 people = {get_population('Step 0 - Initial', 'Code 0 people')}")
    print()
    
    print("### Citizen Classification")
    print(f"- **Step 1 - Citizens**: Moved to Code 1 = {get_population('Step 1 - Citizens', 'Moved to Code 1')}")
    print()
    
    print("### ASEC Conditions Analysis")
    print(f"- **ASEC Conditions**: Current Code 0 people = {get_population('ASEC Conditions', 'Current Code 0 people')}")
    print()
    
    print("### Individual ASEC Conditions (Detailed Breakdown)")
    print("*Each condition identifies people with indicators of legal status who qualify for Code 3*\n")
    
    # Define condition mappings for lookup
    condition_names = {
        1: "Pre-1982 arrivals",
        2: "Eligible naturalized citizens", 
        3: "Medicare recipients",
        4: "Federal retirement benefits",
        5: "Social Security disability",
        6: "Indian Health Service coverage",
        7: "Medicaid recipients",
        8: "CHAMPVA recipients",
        9: "Military health insurance",
        10: "Government employees",
        11: "Social Security recipients",
        12: "Housing assistance",
        13: "Veterans/Military personnel",
        14: "SSI recipients"
    }
    
    for i in range(1, 15):
        condition_name = condition_names[i]
        condition_pop = get_population(f'Condition {i}', f'{condition_name} qualify for Code 3')
        print(f"- **Condition {i:2d} - {condition_name}**: {condition_pop} people qualify for Code 3")
    
    print()
    print(f"- **After conditions**: Code 0 people = {get_population('After conditions', 'Code 0 people')}")
    print()
    
    print("### Target Information")
    print(f"- **Before adjustment**: Undocumented workers = {get_population('Before adjustment', 'Undocumented workers')}")
    print(f"- **Target**: Undocumented workers target = {get_population('Target', 'Undocumented workers target')}")
    print(f"- **Before adjustment**: Undocumented students = {get_population('Before adjustment', 'Undocumented students')}")
    print(f"- **Target**: Undocumented students target = {get_population('Target', 'Undocumented students target')}")
    print()
    
    print("### EAD Assignment")
    print(f"- **Step 3 - EAD workers**: Moved from Code 0 to Code 2 = {get_population('Step 3 - EAD workers', 'Moved from Code 0 to Code 2')}")
    print(f"- **Step 4 - EAD students**: Moved from Code 0 to Code 2 = {get_population('Step 4 - EAD students', 'Moved from Code 0 to Code 2')}")
    print(f"- **After EAD assignment**: Code 0 people = {get_population('After EAD assignment', 'Code 0 people')}")
    print()
    
    print("### Family Correlation (Final Step)")
    print(f"- **Step 5 - Family correlation**: Changed from Code 3 to Code 0 = {get_population('Step 5 - Family correlation', 'Changed from Code 3 to Code 0')}")
    print(f"- **After family correlation**: Code 0 people = {get_population('After family correlation', 'Code 0 people')}")
    print()
    
    print("### Final Results")
    print(f"- **Final**: Code 0 (NONE) = {get_population('Final', 'Code 0 (NONE)')}")
    print(f"- **Final**: Code 1 (CITIZEN) = {get_population('Final', 'Code 1 (CITIZEN)')}")
    print(f"- **Final**: Code 2 (NON_CITIZEN_VALID_EAD) = {get_population('Final', 'Code 2 (NON_CITIZEN_VALID_EAD)')}")
    print(f"- **Final**: Code 3 (OTHER_NON_CITIZEN) = {get_population('Final', 'Code 3 (OTHER_NON_CITIZEN)')}")
    print(f"- **Final**: Total undocumented (Code 0) = {get_population('Final', 'Total undocumented (Code 0)')}")
    print(f"- **Final**: Undocumented target = {get_population('Final', 'Undocumented target')}")

### Initialization
- **Step 0 - Initial**: Code 0 people = 320,890,854

### Citizen Classification
- **Step 1 - Citizens**: Moved to Code 1 = 295,419,820

### ASEC Conditions Analysis
- **ASEC Conditions**: Current Code 0 people = 25,471,035

### Individual ASEC Conditions (Detailed Breakdown)
*Each condition identifies people with indicators of legal status who qualify for Code 3*

- **Condition  1 - Pre-1982 arrivals**: 981,447 people qualify for Code 3
- **Condition  2 - Eligible naturalized citizens**: 0 people qualify for Code 3
- **Condition  3 - Medicare recipients**: 1,918,043 people qualify for Code 3
- **Condition  4 - Federal retirement benefits**: 6,783 people qualify for Code 3
- **Condition  5 - Social Security disability**: 197,206 people qualify for Code 3
- **Condition  6 - Indian Health Service coverage**: 1,776 people qualify for Code 3
- **Condition  7 - Medicaid recipients**: 5,406,195 people qualify for Code 3
- **Condition  8 - CHAMPVA recipients**: 13,149 people qualify for Code 3

## SSN card type calibration

The ASEC Undocumented Algorithm is integrated into PolicyEngine's calibration system to ensure that the simulated undocumented population aligns with authoritative external estimates. The calibration specifically targets individuals assigned an SSN card type of "NONE" (likely undocumented) and adjusts their share in the population to match year-specific benchmarks. These targets are drawn from high-quality sources, including the DHS Office of Homeland Security Statistics ([11.0 million](https://ohss.dhs.gov/sites/default/files/2024-06/2024_0418_ohss_estimates-of-the-unauthorized-immigrant-population-residing-in-the-united-states-january-2018%25E2%2580%2593january-2022.pdf) for 2022), the Center for Migration Studies ([12.2 million](https://cmsny.org/publications/the-undocumented-population-in-the-united-states-increased-to-12-million-in-2023/) for 2023), and a Reuters synthesis of expert projections ([13.0 million](https://www.reuters.com/data/who-are-immigrants-who-could-be-targeted-trumps-mass-deportation-plans-2024-12-18/) for 2024 and 2025). This integration into the loss function ensures that PolicyEngine’s microsimulations remain grounded in current demographic realities.
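
To make the idea of a year-specific target concrete, the sketch below compares a weighted count of `"NONE"` records against the benchmarks listed above. The squared relative error is an assumed form chosen for illustration; this document does not specify the loss PolicyEngine's calibration actually uses, and `undocumented_loss_term()` is a hypothetical name.

```python
# Year-specific benchmarks quoted in the text above (persons).
UNDOCUMENTED_TARGETS = {
    2022: 11.0e6,  # DHS Office of Homeland Security Statistics
    2023: 12.2e6,  # Center for Migration Studies
    2024: 13.0e6,  # Reuters synthesis
    2025: 13.0e6,  # Reuters synthesis
}


def undocumented_loss_term(weights, ssn_card_type, year):
    """Hypothetical loss term: squared relative error between the weighted
    count of "NONE" records and the target for `year`."""
    estimate = sum(w for w, c in zip(weights, ssn_card_type) if c == "NONE")
    target = UNDOCUMENTED_TARGETS[year]
    return ((estimate - target) / target) ** 2


# Toy weights: two "NONE" records that together represent 13.0 million people.
weights = [6.5e6, 6.5e6, 300e6]
card_types = ["NONE", "NONE", "CITIZEN"]

print(undocumented_loss_term(weights, card_types, 2024))  # 0.0
```

A term of this kind is zero when the weighted count matches the benchmark and grows as calibrated weights drift away from it.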

## Child Tax Credit reform impact by immigration status

In the following analysis, we use the SSN card type imputation to evaluate how immigration status shapes eligibility for the Child Tax Credit (CTC). Specifically, we assess the effect of CTC reform on the number of child recipients, comparing baseline and reform scenarios to validate the dataset’s ability to capture policy-driven changes across mixed-status and undocumented households.

In [4]:
# Child Tax Credit Reform Recipient Difference Analysis

from policyengine_us_data.datasets.cps import EnhancedCPS_2024
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform

# Define the CTC reform (makes the reconciliation CTC permanently active)
ctc_reform = Reform.from_dict(
    {
        "gov.contrib.reconciliation.ctc.in_effect": {
            "2025-01-01.2100-12-31": True
        }
    },
    country_id="us",
)

# Create microsimulations for baseline and reform scenarios
baseline_sim = Microsimulation(dataset=EnhancedCPS_2024)
reform_sim = Microsimulation(dataset=EnhancedCPS_2024, reform=ctc_reform)

# Compute CTC recipients in baseline
baseline_is_child = baseline_sim.calculate("is_child")
baseline_ctc_value = baseline_sim.calculate("ctc_value", map_to="person")
baseline_ctc_max = baseline_sim.calculate("ctc_individual_maximum")
baseline_recipients = (
    baseline_is_child * (baseline_ctc_value > 0) * (baseline_ctc_max > 0)
).sum()

# Compute CTC recipients in reform
reform_is_child = reform_sim.calculate("is_child")
reform_ctc_value = reform_sim.calculate("ctc_value", map_to="person")
reform_ctc_max = reform_sim.calculate("ctc_individual_maximum")
reform_recipients = (
    reform_is_child * (reform_ctc_value > 0) * (reform_ctc_max > 0)
).sum()

# Difference in number of child CTC recipients (baseline minus reform,
# so a positive value means fewer recipients under the reform)
recipient_difference = baseline_recipients - reform_recipients

# Report results
print(f"Baseline child CTC recipients: {baseline_recipients:,.0f}")
print(f"Reform child CTC recipients:   {reform_recipients:,.0f}")
print(f"Difference: {recipient_difference:,.0f}")


Baseline child CTC recipients: 61,893,530
Reform child CTC recipients:   58,258,260
Difference: 3,635,270
