# Notebook 1: Household Simulation with PolicyEngine


## Learning Objectives
By the end of this notebook, you will be able to:
1. **Understand PolicyEngine's core architecture** - entities, variables, parameters, and periods
2. **Create household situations** using the proper entity structure and relationships
3. **Run baseline simulations** and interpret variable calculations and dependencies
4. **Design and implement policy reforms** using parametric changes
5. **Compare baseline vs. reform scenarios** to quantify policy impacts
6. **Use advanced simulation features** like variable tracing and earnings variation analysis
7. **Work with PolicyEngine's data structures** including arrays, DataFrames, and aggregation methods


## Introduction
This notebook demonstrates the core concepts of PolicyEngine's Python package by simulating household situations. We'll start with a baseline scenario, add a reform, and compare the results.

In [1]:
# Import necessary libraries
from policyengine_us import Simulation
from policyengine_core.reforms import Reform
import pandas as pd
import numpy as np


## Part 1: Creating a Household Situation

### Understanding Core Concepts
Before we begin, let's understand the key components:
- **Entities**: Person, tax unit, SPM unit, marital unit, family, household
- **Variables**: Either inputs (`employment_income`) or calculated values (`ctc_value`)
- **Parameters**: Policy features like tax rates, benefit amounts
- **Periods**: Time periods for which we calculate values

### PolicyEngine Repositories
To explore the source code and understand how variables and parameters are defined:
- **[policyengine-us](https://github.com/PolicyEngine/policyengine-us)**: Contains all US-specific variables, parameters, and policy logic
  - Variables: `/policyengine_us/variables/`
  - Parameters: `/policyengine_us/parameters/`
- **[policyengine-core](https://github.com/PolicyEngine/policyengine-core)**: Contains the core simulation framework
  - `Simulation` class: `/policyengine_core/simulations/simulation.py`
  - `Microsimulation` class: `/policyengine_core/simulations/microsimulation.py`

### Getting Started: Using PolicyEngine's Web Interface
The easiest way to create a household situation is to use PolicyEngine's web interface:
1. Go to https://policyengine.org/us/household
2. Enter your household details using the interactive form
3. Navigate to the "Reproduce in Python" section
4. Copy the generated code

For example, after creating a household at:
https://policyengine.org/us/household?focus=householdOutput.pythonReproducibility&household=54688

You'll find ready-to-use Python code that creates the situation dictionary for you!

In [2]:
# Create a household situation dictionary
# This represents a family of 4: 2 adults and 2 children in California

situation = {
    "people": {
        "parent1": {
            "age": {"2025": 35},
            "employment_income": {"2025": 75000}
        },
        "parent2": {
            "age": {"2025": 33},
            "employment_income": {"2025": 45000}
        },
        "child1": {
            "age": {"2025": 10}
        },
        "child2": {
            "age": {"2025": 7}
        }
    },
    "families": {
        "family": {
            "members": ["parent1", "parent2", "child1", "child2"]
        }
    },
    "marital_units": {
        "parents": {
            "members": ["parent1", "parent2"]
        },
        "child1_marital_unit": {
            "members": ["child1"],
            "marital_unit_id": {"2025": 1}
        },
        "child2_marital_unit": {
            "members": ["child2"],
            "marital_unit_id": {"2025": 2}
        }
    },
    "tax_units": {
        "tax_unit": {
            "members": ["parent1", "parent2", "child1", "child2"]
        }
    },
    "spm_units": {
        "spm_unit": {
            "members": ["parent1", "parent2", "child1", "child2"]
        }
    },
    "households": {
        "household": {
            "members": ["parent1", "parent2", "child1", "child2"],
            "state_code": {"2025": "CA"}
        }
    }
}
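In the situation above, every person appears exactly once under each group entity (family, marital unit, tax unit, SPM unit, household). The sketch below checks a situation dictionary for that pattern. `membership_problems` and the tiny `example` situation are hypothetical helpers written for this notebook, not part of PolicyEngine.

```python
# Hypothetical helper (not a PolicyEngine function): report people who are
# missing from, duplicated in, or unknown to each group entity.
GROUP_ENTITIES = ["families", "marital_units", "tax_units", "spm_units", "households"]


def membership_problems(situation):
    people = set(situation["people"])
    problems = []
    for entity in GROUP_ENTITIES:
        # Collect every member listed under this entity type
        listed = []
        for group in situation.get(entity, {}).values():
            listed.extend(group.get("members", []))
        missing = sorted(people - set(listed))
        duplicated = sorted({name for name in listed if listed.count(name) > 1})
        unknown = sorted(set(listed) - people)
        if missing:
            problems.append((entity, "missing", missing))
        if duplicated:
            problems.append((entity, "duplicated", duplicated))
        if unknown:
            problems.append((entity, "unknown", unknown))
    return problems


# A deliberately small, made-up situation with one adult and one child
example = {
    "people": {"adult": {}, "child": {}},
    "families": {"family": {"members": ["adult", "child"]}},
    "marital_units": {
        "adult_unit": {"members": ["adult"]},
        "child_unit": {"members": ["child"]},
    },
    "tax_units": {"tax_unit": {"members": ["adult", "child"]}},
    "spm_units": {"spm_unit": {"members": ["adult", "child"]}},
    "households": {"household": {"members": ["adult", "child"]}},
}

print(membership_problems(example))
```

An empty list means every person is assigned exactly once at each level. Running the same check on a hand-written situation before passing it to `Simulation` can catch typos in member names early.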

## Part 2: Running a Baseline Simulation


In [3]:
# Create a simulation object with our household situation
baseline_sim = Simulation(situation=situation)

In [4]:
# Calculate key variables for 2025
PERIOD = 2025

In [5]:
# Income-related variables
employment_income = baseline_sim.calculate("employment_income", PERIOD)
adjusted_gross_income = baseline_sim.calculate("adjusted_gross_income", PERIOD)

In [6]:
# Tax-related variables
income_tax = baseline_sim.calculate("income_tax", PERIOD)
ctc = baseline_sim.calculate("ctc_value", PERIOD)
eitc = baseline_sim.calculate("eitc", PERIOD)

In [7]:
# Benefits
snap = baseline_sim.calculate("snap", PERIOD)

In [8]:
# Net income
household_net_income = baseline_sim.calculate("household_net_income", PERIOD)

### Understanding the calculate() Return Value
**IMPORTANT**: The `calculate()` method returns a NumPy array with one value per entity instance, so the array length depends on the variable's entity level:
- For person-level variables: array length = number of people (4 in our case)
- For household-level variables: array length = number of households (1 in our case)

In [9]:
# Person-level variable: employment_income
print("PERSON-LEVEL VARIABLE (employment_income):")
print(f"  Array: {employment_income}")
print(f"  Shape: {employment_income.shape}")
print(f"  Length: {len(employment_income)} (one value per person)")
print(f"  Individual values: Parent1=${employment_income[0]:,.0f}, Parent2=${employment_income[1]:,.0f}, Child1=${employment_income[2]:,.0f}, Child2=${employment_income[3]:,.0f}")

PERSON-LEVEL VARIABLE (employment_income):
  Array: [75000. 45000.     0.     0.]
  Shape: (4,)
  Length: 4 (one value per person)
  Individual values: Parent1=$75,000, Parent2=$45,000, Child1=$0, Child2=$0


#### Built-in Functions vs. Methods vs. Properties
Python provides different ways to interact with objects:

| Type | Syntax | Description | Example |
|------|--------|-------------|---------|
| **Built-in function** | `function(object)` | Python's built-in functions | `len(array)`, `sum(array)`, `max(array)` |
| **Method** | `object.method()` | Functions that belong to the object (note parentheses) | `array.sum()`, `array.mean()`, `array.max()` |
| **Property/Attribute** | `object.property` | Data that belongs to the object (no parentheses) | `array.shape`, `array.dtype`, `array.size` |


#### Understanding Shape
The `shape` property tells you the dimensions of an array:
- **1D array**: `(n,)` where n is the number of elements
- **2D array**: `(rows, columns)`
- **Higher dimensions**: `(dim1, dim2, dim3, ...)`
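To make the distinction concrete, here is a minimal sketch using a hand-built NumPy array that mirrors the `employment_income` values above. The array is typed in directly, so no simulation is needed to run it.

```python
import numpy as np

# Same values as the person-level employment_income array shown earlier
values = np.array([75000.0, 45000.0, 0.0, 0.0])

n_values = len(values)   # built-in function: function(object)
total = values.sum()     # method: object.method()
shape = values.shape     # property: object.property, no parentheses

# A 2D array has a (rows, columns) shape
grid = values.reshape(2, 2)

print(n_values, total, shape, grid.shape)
```

Note that `values.shape` is a tuple even for a 1D array, which is why it prints as `(4,)` with a trailing comma.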

In [10]:
# Household-level variable: household_net_income
print("\nHOUSEHOLD-LEVEL VARIABLE (household_net_income):")
print(f"  Array: {household_net_income}")
print(f"  Shape: {household_net_income.shape}")
print(f"  Length: {len(household_net_income)} (one value per household)")
print(f"  Value: ${household_net_income[0]:,.0f}")


HOUSEHOLD-LEVEL VARIABLE (household_net_income):
  Array: [102116.85]
  Shape: (1,)
  Length: 1 (one value per household)
  Value: $102,117


In [11]:
# This demonstrates why we use .sum() - it works correctly regardless of entity level
print(f"\nUsing .sum():")
print(f"  employment_income.sum() = ${employment_income.sum():,.0f} (adds all 4 person values)")
print(f"  household_net_income.sum() = ${household_net_income.sum():,.0f} (just the 1 household value)")


Using .sum():
  employment_income.sum() = $120,000 (adds all 4 person values)
  household_net_income.sum() = $102,117 (just the 1 household value)


### Common Array Operations
- `.sum()` - adds all values in the array
- `.mean()` - calculates the average
- `.max()` - finds the maximum value
- `.min()` - finds the minimum value
- `[index]` - accesses individual elements


In [12]:
print("\n=== BASELINE RESULTS ===")
print(f"Total Employment Income: ${employment_income.sum():,.0f}")
print(f"Average Employment Income per Person: ${employment_income.mean():,.0f}")
print(f"Adjusted Gross Income: ${adjusted_gross_income.sum():,.0f}")
print(f"Federal Income Tax: ${income_tax.sum():,.0f}")
print(f"SNAP Benefits: ${snap.sum():,.0f}")
print(f"Child Tax Credit: ${ctc.sum():,.0f}")
print(f"Earned Income Tax Credit: ${eitc.sum():,.0f}")
print(f"Household Net Income: ${household_net_income.sum():,.0f}")


=== BASELINE RESULTS ===
Total Employment Income: $120,000
Average Employment Income per Person: $30,000
Adjusted Gross Income: $120,000
Federal Income Tax: $6,323
SNAP Benefits: $0
Child Tax Credit: $4,000
Earned Income Tax Credit: $0
Household Net Income: $102,117


## Part 3: Understanding Variable Calculation Flow


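Tracing records every variable that PolicyEngine computes on the way to a requested result. It shows which intermediate variables feed into a calculation and what value each one took, which helps when checking or debugging a result.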

In [13]:
# Initialize a new simulation for tracing
sim_for_trace = Simulation(situation=situation)

In [14]:
# Let's trace how the Child Tax Credit is calculated
# This demonstrates the dependency tree of variables

sim_for_trace.trace = True
sim_for_trace.calculate("ctc_value", period=PERIOD)
# Now let's see the trace
print("\n=== CTC CALCULATION TRACE ===")
sim_for_trace.tracer.print_computation_log(max_depth=3)



=== CTC CALCULATION TRACE ===
  ctc_value<2025, (default)> = [4000.]
    ctc<2025, (default)> = [4000.]
      ctc_maximum_with_arpa_addition<2025, (default)> = [4000.]
      ctc_phase_out<2025, (default)> = [0.]
    ctc_phase_in<2025, (default)> = [17625.]
      tax_unit_earned_income<2025, (default)> = [120000.]
      ctc_social_security_tax<2025, (default)> = [9180.]
      eitc<2025, (default)> = [0.]
      ctc_qualifying_children<2025, (default)> = [2]


### Anatomy of a Trace Line
Let's break down a typical trace line:
```
ctc_value<2025, (default)> = [4000.]
```

| Component | Explanation |
|-----------|-------------|
| **ctc_value** | Variable name - the Child Tax Credit value |
| **<2025** | Period - the year this calculation applies to |
| **(default)** | Branch - calculation branch (ignore for now - used for advanced scenarios) |
| **= [4000.]** | Array of values - the calculated results |
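If you want to work with trace lines programmatically, the components in the table can be pulled apart with a regular expression. This is a minimal sketch that assumes the printed format shown above; `parse_trace_line` is a hypothetical helper, not a PolicyEngine function.

```python
import re

# Pattern for lines such as: ctc_value<2025, (default)> = [4000.]
TRACE_LINE = re.compile(
    r"^\s*(?P<name>\w+)"          # variable name
    r"<(?P<period>[^,]+),\s*"     # period
    r"\((?P<branch>[^)]*)\)>"     # branch
    r"\s*=\s*\[(?P<values>[^\]]*)\]\s*$"  # array of values
)


def parse_trace_line(line):
    """Split one complete trace line into its four components."""
    match = TRACE_LINE.match(line)
    if match is None:
        raise ValueError(f"Not a complete trace line: {line!r}")
    return {
        "name": match["name"],
        "period": match["period"],
        "branch": match["branch"],
        # Keep values as strings: they may be floats, ints, or booleans
        "values": match["values"].split(),
    }


parsed = parse_trace_line("ctc_value<2025, (default)> = [4000.]")
print(parsed)
```

The values are left as strings because a trace can contain floats (`4000.`), integers (`2`), and booleans (`True`).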

The formula for `ctc_value` is defined in the policyengine-us repository:

https://github.com/PolicyEngine/policyengine-us/blob/master/policyengine_us/variables/gov/irs/credits/ctc/ctc_value.py

In [15]:
# Let's increase the depth of the trace

print("\n=== CTC CALCULATION TRACE DEPTH = 4 ===")
sim_for_trace.tracer.print_computation_log(max_depth=4)



=== CTC CALCULATION TRACE DEPTH = 4 ===
  ctc_value<2025, (default)> = [4000.]
    ctc<2025, (default)> = [4000.]
      ctc_maximum_with_arpa_addition<2025, (default)> = [4000.]
        ctc_maximum<2025, (default)> = [4000.]
        ctc_arpa_addition<2025, (default)> = [0.]
      ctc_phase_out<2025, (default)> = [0.]
        adjusted_gross_income<2025, (default)> = [120000.]
        ctc_phase_out_threshold<2025, (default)> = [400000.]
    ctc_phase_in<2025, (default)> = [17625.]
      tax_unit_earned_income<2025, (default)> = [120000.]
        earned_income<2025, (default)> = [75000. 45000.     0.     0.]
        is_tax_unit_dependent<2025, (default)> = [False False  True  True]
      ctc_social_security_tax<2025, (default)> = [9180.]
        employee_social_security_tax<2025, (default)> = [4650. 2790.    0.    0.]
        employee_medicare_tax<2025, (default)> = [1087.5  652.5    0.     0. ]
        unreported_payroll_tax<2025, (default)> = [0.]
        self_employment_tax_ald<2025, 

### Understanding the Trace Output
The trace shows:
- **Indentation** = dependency depth (more indented = deeper in the calculation tree)
- **Variable names** with period and branch (e.g., `ctc_value<2025, (default)>`); the period is the time the calculation applies to
- **Calculated values** as arrays - the array length indicates the entity level:
  - Arrays with 4 values `[x, x, x, x]` = person-level (4 people in household)
  - Arrays with 1 value `[x]` = tax unit or household level
- **With `max_depth=None`**, the most deeply indented variables are inputs (from user, defaults, or data) rather than calculations
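Because indentation encodes depth, you can recover the direct dependencies of a variable from the printed trace. The sketch below assumes two spaces per level, as in the output above; `trace_depths` and `direct_dependencies` are hypothetical helpers, and the `trace` list is copied from the first part of the depth-3 output.

```python
def trace_depths(lines, indent_width=2):
    """Return (depth, variable_name) pairs from printed trace lines."""
    result = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped:
            continue
        indent = len(line) - len(stripped)
        name = stripped.split("<", 1)[0]
        result.append((indent // indent_width, name))
    return result


def direct_dependencies(lines, target):
    """List variables printed exactly one level below the target."""
    dependencies = []
    target_depth = None
    for depth, name in trace_depths(lines):
        if target_depth is None:
            if name == target:
                target_depth = depth
            continue
        if depth <= target_depth:
            break  # left the target's subtree
        if depth == target_depth + 1:
            dependencies.append(name)
    return dependencies


trace = [
    "  ctc_value<2025, (default)> = [4000.]",
    "    ctc<2025, (default)> = [4000.]",
    "      ctc_maximum_with_arpa_addition<2025, (default)> = [4000.]",
    "      ctc_phase_out<2025, (default)> = [0.]",
    "    ctc_phase_in<2025, (default)> = [17625.]",
    "      tax_unit_earned_income<2025, (default)> = [120000.]",
]

print(direct_dependencies(trace, "ctc_value"))
print(direct_dependencies(trace, "ctc"))
```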


In [31]:
# Now let's look at individual variables outside of the computation tree
ctc_child_individual_maximum = baseline_sim.calculate("ctc_child_individual_maximum", PERIOD)
ctc_phase_out = baseline_sim.calculate("ctc_phase_out", PERIOD)

print(f"\nCTC Maximum per child: ${ctc_child_individual_maximum}")
print(f"CTC Phase-out amount: ${ctc_phase_out}")


CTC Maximum per child: $[   0.    0. 2000. 2000.]
CTC Phase-out amount: $[0.]
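These two numbers explain the baseline credit: two qualifying children at $2,000 each give a $4,000 maximum, and the phase-out is zero because the household's AGI ($120,000) is below the $400,000 threshold shown in the trace. The sketch below is a simplified linear phase-out for intuition only. It ignores rounding rules and refundability, the 5% rate is an assumed illustrative value, and `ctc_after_phase_out` is a hypothetical function rather than PolicyEngine's formula.

```python
def ctc_after_phase_out(maximum, agi, threshold, rate):
    """Simplified linear phase-out: reduce the credit by rate * excess AGI."""
    excess = max(0.0, agi - threshold)
    reduction = rate * excess
    return max(0.0, maximum - reduction)


maximum = 2 * 2000.0   # two qualifying children, from the output above
threshold = 400000.0   # ctc_phase_out_threshold from the trace
rate = 0.05            # assumed for illustration

# Our household: AGI below the threshold, so nothing is phased out
print(ctc_after_phase_out(maximum, 120000.0, threshold, rate))

# A hypothetical household with AGI $20,000 above the threshold
print(ctc_after_phase_out(maximum, 420000.0, threshold, rate))
```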


## Part 4: Creating and Running Reforms

### Parametric Reforms
The most common type of reform modifies existing parameters (tax rates, benefit amounts, thresholds, etc.).

### Understanding Parameter Types

PolicyEngine uses different parameter types for various policy features. Here's how to work with each type:

| Parameter Type | Example | Notes |
|---------------|---------|-------|
| **Single value** | `"gov.irs.credits.ctc.amount.base[0].amount": 3000`<br>`"gov.irs.credits.ctc.phase_out.rate": 0.05`<br>`"gov.irs.credits.ctc.child.max_age": 17` | Can be monetary amounts (USD), rates/percentages (as decimals: 5% = 0.05), ages (years), or other single values |
| **List parameter** | `"gov.irs.credits.refundable": ["eitc", "refundable_ctc"]` | Lists of values, often containing variable class names |
| **Scale parameter** | `"gov.irs.credits.ctc.amount.base[0].amount": 3000` | Graduated scales made of brackets with thresholds and rates or amounts; the index (`[0]`) selects a bracket |
| **Breakdown parameters** | `"gov.irs.credits.ctc.phase_out.threshold.JOINT": 500000`| Parameters broken down by enums (e.g., filing status) or numeric ranges |

### Accessing Scale Components
- `scale.thresholds`: list[float] - income thresholds
- `scale.rates`: list[float] - tax rates for each bracket
- `scale.amounts`: list[float] - fixed amounts (if applicable)

Examples of accessing specific elements:
- `scale.thresholds[0]` - first threshold (usually 0)
- `scale.thresholds[-1]` - last threshold
- `scale.rates[2]` - rate for third bracket
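To see how thresholds and rates fit together, here is a minimal stand-in for a marginal-rate scale. `MarginalRateScale`, its `tax_due` method, and the bracket numbers are all hypothetical and chosen for illustration; this is not PolicyEngine's scale class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarginalRateScale:
    """Illustrative scale: each rate applies to income above its threshold."""
    thresholds: List[float]
    rates: List[float]

    def tax_due(self, income):
        total = 0.0
        for i, (lower, rate) in enumerate(zip(self.thresholds, self.rates)):
            # The bracket ends where the next one begins (or never, for the last)
            if i + 1 < len(self.thresholds):
                upper = self.thresholds[i + 1]
            else:
                upper = float("inf")
            if income > lower:
                total += rate * (min(income, upper) - lower)
        return total


# Made-up brackets: 10% from 0, 20% from 10,000, 30% from 40,000
scale = MarginalRateScale(thresholds=[0, 10000, 40000], rates=[0.1, 0.2, 0.3])

print(scale.thresholds[0])    # first threshold
print(scale.thresholds[-1])   # last threshold
print(scale.rates[2])         # rate for the third bracket
print(scale.tax_due(50000))   # 10% of 10,000 + 20% of 30,000 + 30% of 10,000
```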


In [32]:
# Create a parametric reform that increases the CTC amount
# This is a simple parameter change - increasing CTC from $2,000 to $3,000 from 2025 forward

ctc_reform = Reform.from_dict({
    "gov.irs.credits.ctc.amount.base[0].amount": {
        "2025-01-01.2100-12-31": 3000
    }
}, country_id="us")

# You can also create more complex reforms with multiple parameters
comprehensive_reform = Reform.from_dict({
    # Increase CTC amount
    "gov.irs.credits.ctc.amount.base[0].amount": {
        "2025-01-01.2100-12-31": 3000
    },
    # Make CTC fully refundable
    "gov.irs.credits.ctc.refundable.fully_refundable": {
        "2025-01-01.2100-12-31": True
    },
    # Increase the phase-out threshold for joint filers
    "gov.irs.credits.ctc.phase_out.threshold.JOINT": {
        "2025-01-01.2100-12-31": 500000
    }
}, country_id="us")
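The reform dictionaries above all share one shape: a parameter path mapped to `{"start.end": value}`. If you build many of them, a small helper can keep the period keys consistent. `period_key` and `reform_entry` below are hypothetical helpers for this notebook, not PolicyEngine functions; they only build plain dictionaries.

```python
from datetime import date


def period_key(start, end):
    """Format a date range as 'YYYY-MM-DD.YYYY-MM-DD'."""
    return f"{start.isoformat()}.{end.isoformat()}"


def reform_entry(parameter, value, start, end):
    """Build one parameter entry in the same shape as the dictionaries above."""
    return {parameter: {period_key(start, end): value}}


start, end = date(2025, 1, 1), date(2100, 12, 31)

# Merge several entries into one reform dictionary
reform_dict = {}
reform_dict.update(
    reform_entry("gov.irs.credits.ctc.amount.base[0].amount", 3000, start, end)
)
reform_dict.update(
    reform_entry("gov.irs.credits.ctc.phase_out.threshold.JOINT", 500000, start, end)
)

print(reform_dict)
```

The resulting dictionary has the same structure as the ones passed to `Reform.from_dict` above.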

### Running the Reformed Simulation


In [33]:
# Create a new simulation with the comprehensive reform
reformed_sim = Simulation(
    situation=situation,
    reform=comprehensive_reform
)

In [34]:
# Calculate the same variables under the reform
reformed_income_tax = reformed_sim.calculate("income_tax", PERIOD)
reformed_ctc = reformed_sim.calculate("ctc_value", PERIOD)
reformed_net_income = reformed_sim.calculate("household_net_income", PERIOD)

In [35]:
print("=== REFORM RESULTS ===")
print(f"Federal Income Tax: ${reformed_income_tax.sum():,.0f}")
print(f"Child Tax Credit: ${reformed_ctc.sum():,.0f}")
print(f"Household Net Income: ${reformed_net_income.sum():,.0f}")

=== REFORM RESULTS ===
Federal Income Tax: $4,323
Child Tax Credit: $6,000
Household Net Income: $104,117


### Structural Reforms (Advanced)
PolicyEngine also supports structural reforms that change how variables are calculated, not just parameter values. Here's an example that adds a phase-out to the SALT deduction cap:

In [36]:
from policyengine_us.model_api import *

In [37]:
# Example of a structural reform that modifies the SALT cap calculation to add phase-out to the cap
class salt_cap(Variable):
    value_type = float
    entity = TaxUnit
    label = "SALT cap"
    unit = USD
    definition_period = YEAR
    reference = "https://www.law.cornell.edu/uscode/text/26/164"

    def formula(tax_unit, period, parameters):
        filing_status = tax_unit("filing_status", period)
        # Current-law cap, which varies by filing status
        p = parameters(
            period
        ).gov.irs.deductions.itemized.salt_and_real_estate
        max_cap = p.cap[filing_status]
        # Phase-out parameters for the reform
        p_ref = parameters(period).gov.contrib.salt_phase_out
        agi = tax_unit("adjusted_gross_income", period)
        # Reduce the cap at a fixed rate on AGI above the threshold
        agi_excess = max_(0, agi - p_ref.threshold[filing_status])
        phase_out = p_ref.rate * agi_excess
        phased_out_cap = max_(0, max_cap - phase_out)
        # Optionally keep the cap from falling below a floor
        if p_ref.floor.applies:
            floor = p_ref.floor.amount[filing_status]
            return max_(phased_out_cap, floor)
        return phased_out_cap

class reform(Reform):
    def apply(self):
        self.update_variable(salt_cap)

# This would be applied as: Simulation(situation=situation, reform=reform)
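To build intuition for the formula in this structural reform, here is the same arithmetic in plain Python with scalar inputs. The function name `phased_out_salt_cap` and every number below are made up for illustration; they are not actual PolicyEngine parameter values.

```python
def phased_out_salt_cap(agi, max_cap, threshold, rate, floor=None):
    """Scalar version of the cap phase-out used in the formula above."""
    agi_excess = max(0.0, agi - threshold)
    phased_out_cap = max(0.0, max_cap - rate * agi_excess)
    if floor is not None:
        # The floor stops the cap from phasing out completely
        return max(phased_out_cap, floor)
    return phased_out_cap


# Illustrative values only
settings = dict(max_cap=40000.0, threshold=500000.0, rate=0.3, floor=10000.0)

for agi in (400000.0, 550000.0, 700000.0):
    print(agi, phased_out_salt_cap(agi, **settings))
```

With these made-up values, the cap is untouched below the threshold, shrinks as AGI rises above it, and then stops at the floor.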

**Best Practice**: While structural reforms are powerful, it's often better to:
1. File an issue in the [policyengine-us repository](https://github.com/PolicyEngine/policyengine-us/issues)
2. Discuss with PolicyEngine developers
3. Submit a pull request to add new parameters
4. Convert your structural reform into a parametric one

This approach ensures:
- Your reform integrates well with the existing codebase
- Other users can benefit from the new functionality
- The reform is properly tested and documented
- Future updates won't break your analysis

## Part 5: Comparing Baseline and Reform


In [38]:
# Create a comparison dataframe
comparison_data = {
    "Metric": [
        "Federal Income Tax",
        "Child Tax Credit", 
        "Household Net Income",
    ],
    "Baseline": [
        income_tax.sum(),
        ctc.sum(),
        household_net_income.sum(),
    ],
    "Reform": [
        reformed_income_tax.sum(),
        reformed_ctc.sum(),
        reformed_net_income.sum(),
    ]
}

In [39]:
comparison_df = pd.DataFrame(comparison_data)
comparison_df["Change"] = comparison_df["Reform"] - comparison_df["Baseline"]
comparison_df["% Change"] = (comparison_df["Change"] / comparison_df["Baseline"].abs() * 100).round(2)

In [40]:
print("\n=== COMPARISON ===")
print(comparison_df.to_string(index=False))


=== COMPARISON ===
              Metric      Baseline        Reform  Change   % Change
  Federal Income Tax   6323.000000   4323.000000 -2000.0 -31.629999
    Child Tax Credit   4000.000000   6000.000000  2000.0  50.000000
Household Net Income 102116.851562 104116.851562  2000.0   1.960000
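The `% Change` column divides by the absolute value of the baseline so that the sign of the result always follows the sign of the change, even when a baseline value is negative. Here is that calculation as a standalone sketch; `percent_change` is a hypothetical helper, and returning NaN for a zero baseline is a choice made here, not something the DataFrame code above does.

```python
def percent_change(baseline, reform):
    """Percentage change relative to the size of the baseline value."""
    if baseline == 0:
        return float("nan")  # undefined when the baseline is zero
    return round((reform - baseline) / abs(baseline) * 100, 2)


# Values from the comparison table
print(percent_change(6323, 4323))   # Federal Income Tax
print(percent_change(4000, 6000))   # Child Tax Credit

# A negative baseline (for example, a net refund) that moves toward zero
print(percent_change(-100, -50))
```

In the last case the value rose from -100 to -50, and dividing by `abs(baseline)` reports that as a positive change.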


## Part 6: Using the calculate_dataframe Method

The `calculate_dataframe` method provides a convenient way to calculate multiple variables at once and return them in a pandas DataFrame format. This is particularly useful when you need to analyze several variables together or export results for further analysis.

### Understanding Entity Levels and Data Structure

PolicyEngine organizes data hierarchically across different entity levels:

| Entity Level | Description | Example Variables |
|-------------|-------------|-------------------|
| **Person** | Individual-level data | employment_income, age, is_disabled |
| **Tax Unit** | Tax filing unit data | income_tax, ctc_value, eitc |
| **SPM Unit** | Supplemental Poverty Measure unit | snap, housing_assistance |
| **Household** | Household-level data | household_net_income, rent |

When using `calculate_dataframe`, it's crucial to understand how these entity levels interact:

In [41]:
# For more complex analysis, we can use calculate_dataframe
# This returns a pandas DataFrame with multiple variables

VARIABLES_TO_CALCULATE = [
    "employment_income",
    "adjusted_gross_income",
    "taxable_income",
    "income_tax",
    "ctc_value",
    "eitc",
    "snap",
    "household_net_income"
]

In [42]:
PERIOD = 2025

### Default Behavior: Automatic Entity Resolution

When you don't specify a `map_to` parameter, `calculate_dataframe` automatically determines the appropriate entity level based on the "longest" entity needed to represent all variables:

In [43]:
# Calculate for baseline
baseline_df = baseline_sim.calculate_dataframe(
    VARIABLES_TO_CALCULATE,
    period=PERIOD
)

# Calculate for reform
reform_df = reformed_sim.calculate_dataframe(
    VARIABLES_TO_CALCULATE,
    period=PERIOD
)

In [44]:
print("\n=== BASELINE DATAFRAME (No map_to specified) ===")
print(baseline_df)
print(f"\nShape: {baseline_df.shape}")
print("Note: Since employment_income is person-level, all other variables are disaggregated to person-level")


=== BASELINE DATAFRAME (No map_to specified) ===
   employment_income  adjusted_gross_income  taxable_income  income_tax  \
0            75000.0               120000.0         90000.0      6323.0   
1            45000.0               120000.0         90000.0      6323.0   
2                0.0               120000.0         90000.0      6323.0   
3                0.0               120000.0         90000.0      6323.0   

   ctc_value  eitc  snap  household_net_income  
0     4000.0   0.0   0.0         102116.851562  
1     4000.0   0.0   0.0         102116.851562  
2     4000.0   0.0   0.0         102116.851562  
3     4000.0   0.0   0.0         102116.851562  

Shape: (4, 8)
Note: Since employment_income is person-level, all other variables are disaggregated to person-level


In this case, because `employment_income` is a person-level variable and we have 4 people in our household, the DataFrame will have 4 rows. All higher-level variables (tax unit, household) will be repeated for each person they apply to.
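Conceptually, repeating a group-level value for each member is a lookup: each person points to the group they belong to. This is a plain-Python sketch of that idea, not PolicyEngine's internal code; `project_to_persons` and the index list are hypothetical.

```python
def project_to_persons(group_values, person_group_index):
    """Repeat each group's value for every person in that group."""
    return [group_values[group] for group in person_group_index]


# One household, and all four people belong to household 0
household_net_income = [102116.85]
person_household = [0, 0, 0, 0]

per_person = project_to_persons(household_net_income, person_household)
print(per_person)
```

This is why summing a projected column over all rows overstates the total: the household value appears once per member.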

In [45]:
print("\n=== REFORM DATAFRAME ===")
print(reform_df)


=== REFORM DATAFRAME ===
   employment_income  adjusted_gross_income  taxable_income  income_tax  \
0            75000.0               120000.0         90000.0      4323.0   
1            45000.0               120000.0         90000.0      4323.0   
2                0.0               120000.0         90000.0      4323.0   
3                0.0               120000.0         90000.0      4323.0   

   ctc_value  eitc  snap  household_net_income  
0     6000.0   0.0   0.0         104116.851562  
1     6000.0   0.0   0.0         104116.851562  
2     6000.0   0.0   0.0         104116.851562  
3     6000.0   0.0   0.0         104116.851562  


### Using map_to Parameter for Aggregation

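The `map_to` parameter maps all variables to a specific entity level: group-level values are repeated when mapping down to persons, and person-level values are summed when mapping up to a group. This is extremely useful when you want to analyze data at the household or tax unit level: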

In [46]:
# Calculate for baseline
baseline_person_df = baseline_sim.calculate_dataframe(
    VARIABLES_TO_CALCULATE,
    map_to="person",
    period=PERIOD
)

In [47]:
print("\n=== BASELINE DATAFRAME (Mapped to person) ===")
print(baseline_person_df)
print(f"\nShape: {baseline_person_df.shape}")
print("Note: All variables are now aggregated to person level (4 rows)")


=== BASELINE DATAFRAME (Mapped to person) ===
   employment_income  adjusted_gross_income  taxable_income  income_tax  \
0            75000.0               120000.0         90000.0      6323.0   
1            45000.0               120000.0         90000.0      6323.0   
2                0.0               120000.0         90000.0      6323.0   
3                0.0               120000.0         90000.0      6323.0   

   ctc_value  eitc  snap  household_net_income  
0     4000.0   0.0   0.0         102116.851562  
1     4000.0   0.0   0.0         102116.851562  
2     4000.0   0.0   0.0         102116.851562  
3     4000.0   0.0   0.0         102116.851562  

Shape: (4, 8)
Note: All variables are now aggregated to person level (4 rows)


In [48]:
# Calculate for baseline
baseline_household_df = baseline_sim.calculate_dataframe(
    VARIABLES_TO_CALCULATE,
    map_to="household",
    period=PERIOD
)

In [49]:
print("\n=== BASELINE DATAFRAME (Mapped to household) ===")
print(baseline_household_df)
print(f"\nShape: {baseline_household_df.shape}")
print("Note: All variables are now aggregated to household level (1 row)")


=== BASELINE DATAFRAME (Mapped to household) ===
   employment_income  adjusted_gross_income  taxable_income  income_tax  \
0           120000.0               120000.0         90000.0      6323.0   

   ctc_value  eitc  snap  household_net_income  
0     4000.0   0.0   0.0         102116.851562  

Shape: (1, 8)
Note: All variables are now aggregated to household level (1 row)


### How map_to Aggregation Works

When you use `map_to`, PolicyEngine performs the following operations:

1. **Person-level variables** → Summed within the target entity
   - Example: `employment_income` for all 4 people is summed to get total household employment income

2. **Same-level variables** → Kept as-is
   - Example: `household_net_income` is already at household level, so no change

3. **Intermediate-level variables** → Summed to the target level
   - Example: If there were multiple tax units, their `income_tax` would be summed
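
The rules above can be sketched with plain NumPy. This is only an illustration of the idea, not PolicyEngine's internal code; the membership array `person_household` and the other variable names are assumptions made for the example, and the figures are taken from the outputs shown earlier.

```python
import numpy as np

# Hypothetical membership: person i belongs to household person_household[i].
person_household = np.array([0, 0, 0, 0])
employment_income = np.array([75_000.0, 45_000.0, 0.0, 0.0])  # person level
household_net_income = np.array([102_116.85])                 # household level

# Person -> household: sum each person's value within their household.
household_employment_income = np.bincount(
    person_household,
    weights=employment_income,
    minlength=household_net_income.size,
)

# Household -> person: repeat the household value for every member.
net_income_per_person = household_net_income[person_household]

print(household_employment_income.tolist())  # [120000.0]
print(net_income_per_person.shape)           # (4,)
```

Note that `net_income_per_person.sum()` would be four times the household figure, which is why totals are best taken at the variable's own entity level.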

In [50]:
# Example: Export-ready DataFrame
export_df = baseline_sim.calculate_dataframe(
    VARIABLES_TO_CALCULATE,
    map_to="household",
    period=PERIOD
)

# Add identifiers and save
export_df["household_id"] = "household"
export_df["scenario"] = "baseline"

# Save the result to CSV
export_df.to_csv("household_analysis.csv", index=False)

## Part 7: Creating Earnings Variation Charts

### Note: Using PolicyEngine's Web Interface for Earnings Variation
You can also generate earnings variation code directly from PolicyEngine's web interface:
1. Create your household at https://policyengine.org/us/household
2. Toggle "Include earnings variation" in the "Reproduce in Python" section
3. Copy the generated code with axes already configured

Here's an example of what the generated code looks like:

In [51]:
# Example from PolicyEngine's "Reproduce in Python" with earnings variation
situation_with_axes = {
  "people": {
    "you": {
      "age": {
        "2025": 35
      }
    },
    "your partner": {
      "age": {
        "2025": 33
      }
    },
    "your first dependent": {
      "age": {
        "2025": 10
      }
    },
    "your second dependent": {
      "age": {
        "2025": 7
      }
    }
  },
  "families": {
    "your family": {
      "members": [
        "you",
        "your partner",
        "your first dependent",
        "your second dependent"
      ]
    }
  },
  "marital_units": {
    "your marital unit": {
      "members": [
        "you",
        "your partner"
      ]
    },
    "your first dependent's marital unit": {
      "members": [
        "your first dependent"
      ],
      "marital_unit_id": {
        "2025": 1
      }
    },
    "your second dependent's marital unit": {
      "members": [
        "your second dependent"
      ],
      "marital_unit_id": {
        "2025": 2
      }
    }
  },
  "tax_units": {
    "your tax unit": {
      "members": [
        "you",
        "your partner",
        "your first dependent",
        "your second dependent"
      ]
    }
  },
  "spm_units": {
    "your household": {
      "members": [
        "you",
        "your partner",
        "your first dependent",
        "your second dependent"
      ]
    }
  },
  "households": {
    "your household": {
      "members": [
        "you",
        "your partner",
        "your first dependent",
        "your second dependent"
      ],
      "state_name": {
        "2025": "CA"
      }
    }
  },
  "axes": [
    [
      {
        "name": "employment_income",
        "count": 1200,
        "min": 0,
        "max": 600000
      }
    ]
  ]
}

### Varying inputs with **axes**

PolicyEngine inherits **axes** from OpenFisca, letting us replicate the household situation over a grid of values.

| Key | Meaning |
|-----|---------|
| `name`  | Variable to iterate over |
| `count` | Number of equally spaced points |
| `min`   | Interval start |
| `max`   | Interval end |

See the OpenFisca documentation for details: <https://openfisca.org/doc/simulate/replicate-simulation-inputs.html>
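
To make the four keys concrete, here is a small sketch of the grid that one axis describes. It assumes the points are evenly spaced and include both endpoints, as `numpy.linspace` does; the `axis` dictionary is a shortened, hypothetical version of the one in the situation above.

```python
import numpy as np

# Hypothetical axis spec using the same keys as the table above.
axis = {"name": "employment_income", "count": 5, "min": 0, "max": 600_000}

# Assumption: evenly spaced points, both endpoints included.
grid = np.linspace(axis["min"], axis["max"], num=axis["count"])
step = (axis["max"] - axis["min"]) / (axis["count"] - 1)

print(grid.tolist())  # [0.0, 150000.0, 300000.0, 450000.0, 600000.0]
print(step)           # 150000.0
```

Under this assumption the spacing is `(max - min) / (count - 1)`, so a `count` of 1200 over 0 to 600,000 gives steps of roughly $500.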

### Note: Flexibility of Axes Variables
You can vary any numeric variable in axes, not just `employment_income`. Examples include:
- `rent`
- `mortgage_interest`
- `medical_expenses`
- `childcare_expenses`

Just change the `name` key in the axes configuration.

In [52]:
# Create simulation with axes
axes_sim = Simulation(situation=situation_with_axes)

In [53]:
# Calculate variables across the income range
income_range = axes_sim.calculate("employment_income", map_to="household", period=PERIOD)
net_income_range = axes_sim.calculate("household_net_income", map_to="household", period=PERIOD)
ctc_range = axes_sim.calculate("ctc_value", map_to="household", period=PERIOD)
eitc_range = axes_sim.calculate("eitc", map_to="household", period=PERIOD)
snap_range = axes_sim.calculate("snap", map_to="household", period=PERIOD)

In [54]:
# Create a dataframe for analysis
results_df = pd.DataFrame({
    "Employment Income": income_range,
    "Net Income": net_income_range,
    "CTC": ctc_range,
    "EITC": eitc_range,
    "SNAP": snap_range
})

In [55]:
# Show sample of results
print("\n=== INCOME VARIATION RESULTS (Sample) ===")
print(results_df.head())  # Show the first 5 rows


=== INCOME VARIATION RESULTS (Sample) ===
   Employment Income    Net Income  CTC        EITC         SNAP
0           0.000000  26000.064453  0.0    0.000000  7710.931641
1         524.860229  26749.869141  0.0  209.944092  7587.632324
2        1049.720459  27496.976562  0.0  419.888184  7461.632324
3        1574.580688  28244.080078  0.0  629.832275  7335.632324
4        2099.440918  28991.185547  0.0  839.776367  7209.632324


### Parallel and perpendicular axes

* **Parallel axes** – objects inside the **same** inner list. Variables change **together** (lock‑step).  
* **Perpendicular axes** – objects in **separate** inner lists. Variables change **independently**, creating a Cartesian product.

In [56]:
# A single axis (employment_income)
example_axes = [[
    {"name": "employment_income", "count": 1200, "min": 0, "max": 600_000}
]]

# Two parallel axes (income and childcare cost) changing together
example_axes = [[
    {"name": "employment_income", "count": 1200, "min": 0, "max": 600_000},
    {"name": "childcare_expenses", "count": 1200, "min": 0, "max": 20_000}
]]

# Two perpendicular axes (income × age): 1,200 × 10 = 12,000 variants
# "index": 0 applies the age axis to the first person in the situation
example_axes = [
    [{"name": "employment_income", "count": 1200, "min": 0, "max": 600_000}],
    [{"name": "age", "index": 0, "count": 10, "min": 18, "max": 65}]
]
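
The difference between the two layouts can be sketched outside PolicyEngine with NumPy. This is an illustration of lock-step pairing versus a Cartesian product, using small hypothetical grids; the order in which PolicyEngine lays out the combinations may differ.

```python
import numpy as np

# Small hypothetical grids so the combinations are easy to inspect.
income = np.linspace(0, 600_000, num=3)
childcare = np.linspace(0, 20_000, num=3)
age = np.linspace(18, 65, num=2)

# Parallel: the i-th income is paired with the i-th childcare cost.
parallel = list(zip(income.tolist(), childcare.tolist()))

# Perpendicular: every income is combined with every age.
income_grid, age_grid = np.meshgrid(income, age, indexing="ij")
perpendicular = list(
    zip(income_grid.ravel().tolist(), age_grid.ravel().tolist())
)

print(len(parallel))       # 3
print(len(perpendicular))  # 6
```

Parallel axes keep the number of variants equal to `count`, while perpendicular axes multiply the counts, so the simulation size grows quickly.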

### Visualizing the data 

In [57]:
import plotly.graph_objects as go
from policyengine_core.charts import format_fig

In [58]:
# Create a Plotly plot of CTC vs Employment Income
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=income_range,
    y=ctc_range,
    mode='lines',
    name='Child Tax Credit',
    line=dict(color='#2C6496', width=3),
    hovertemplate='Employment Income: $%{x:,.0f}<br>CTC: $%{y:,.0f}<extra></extra>'
))

fig.update_layout(
    title='Child Tax Credit by Employment Income<br>Family of 4 in California',
    xaxis_title='Employment Income',
    yaxis_title='Child Tax Credit',
    xaxis=dict(tickformat='$,.0f'),
    yaxis=dict(tickformat='$,.0f'),
    hovermode='x unified',
    template='plotly_white'
)

# Apply PolicyEngine formatting
fig = format_fig(fig)
fig.show()

## Summary

This notebook demonstrated:
1. **Creating household situations** with proper entity relationships
2. **Running simulations** and calculating variables
3. **Understanding variable dependencies** through the calculation tree
4. **Creating and applying reforms** (both simple and complex)
5. **Comparing baseline and reform** scenarios
6. **Using calculate_dataframe** for efficient multi-variable analysis
7. **Calculating marginal tax rates** through incremental changes
8. **Using axes** to vary inputs and analyze responses

Key takeaways:
- PolicyEngine uses a hierarchical entity structure (person → tax unit → household)
- Variables can depend on other variables, creating a calculation tree
- Reforms can modify parameters or add new calculation logic
- The Simulation class provides flexible methods for analysis
- DataFrame integration makes it easy to analyze results with pandas

### Understanding the Trace Output
The trace shows:
- **Indentation** = dependency depth (more indented = deeper in calculation tree)
- **Variable names** with the period and branch name (e.g., `ctc_value<2025, (default)>`)
- **Calculated values** as arrays - the array length indicates the entity level:
  - Arrays with 4 values `[x, x, x, x]` = person-level (4 people in household)
  - Arrays with 1 value `[x]` = tax unit or household level
- **Time periods** when the calculation applies
- **With max_depth=None**, the most deeply indented variables are inputs (from user, defaults, or data) rather than calculations


In [59]:
# Now let's look at individual variables outside of the computation tree
ctc_child_individual_maximum = baseline_sim.calculate("ctc_child_individual_maximum", PERIOD)
ctc_phase_out = baseline_sim.calculate("ctc_phase_out", PERIOD)

print(f"\nCTC Maximum per child: ${ctc_child_individual_maximum}")
print(f"CTC Phase-out amount: ${ctc_phase_out}")


CTC Maximum per child: $[   0.    0. 2000. 2000.]
CTC Phase-out amount: $[0.]


## Part 4: Creating and Running Reforms

### Parametric Reforms
The most common type of reform modifies existing parameters (tax rates, benefit amounts, thresholds, etc.).

### Understanding Parameter Types

PolicyEngine uses different parameter types for various policy features. Here's how to work with each type:

| Parameter Type | Example | Notes |
|---------------|---------|-------|
| **Single value** | `"gov.irs.credits.ctc.amount.base[0].amount": 3000`<br>`"gov.irs.credits.ctc.phase_out.rate": 0.05`<br>`"gov.irs.credits.ctc.child.max_age": 17` | Can be monetary amounts (USD), rates/percentages (as decimals: 5% = 0.05), ages (years), or other single values |
| **List parameter** | `"gov.irs.credits.refundable": ["eitc", "refundable_ctc"]` | Lists of values, often containing variable names |
| **Scale parameter** | `"gov.irs.credits.ctc.amount.base[0].amount": 3000` | Graduated scales with thresholds and rates or amounts; the bracket index (`[0]`) selects one bracket |
| **Breakdown parameters** | `"gov.irs.credits.ctc.phase_out.threshold.JOINT": 500000`| Parameters broken down by enums (e.g., filing status) or numeric ranges |

### Accessing Scale Components
- `scale.thresholds`: list[float] - income thresholds
- `scale.rates`: list[float] - tax rates for each bracket
- `scale.amounts`: list[float] - fixed amounts (if applicable)

Examples of accessing specific elements:
- `scale.thresholds[0]` - first threshold (usually 0)
- `scale.thresholds[-1]` - last threshold
- `scale.rates[2]` - rate for third bracket
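
To show how thresholds and rates fit together, here is a toy marginal-rate scale. `ToyScale` is a hypothetical stand-in written for this example, not PolicyEngine's scale class, and the thresholds and rates are made-up numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyScale:
    """Hypothetical marginal-rate scale; not PolicyEngine's own class."""

    thresholds: List[float]
    rates: List[float]

    def calc(self, income: float) -> float:
        """Apply each rate to the slice of income inside its bracket."""
        total = 0.0
        for i, (lower, rate) in enumerate(zip(self.thresholds, self.rates)):
            is_last = i + 1 == len(self.thresholds)
            upper = float("inf") if is_last else self.thresholds[i + 1]
            if income > lower:
                total += (min(income, upper) - lower) * rate
        return total


# Made-up brackets: 10% from 0, 20% from 10,000, 30% from 40,000.
scale = ToyScale(thresholds=[0, 10_000, 40_000], rates=[0.10, 0.20, 0.30])

print(scale.thresholds[0])   # first threshold
print(scale.thresholds[-1])  # last threshold
print(scale.rates[2])        # rate for the third bracket
print(scale.calc(50_000))    # 1,000 + 6,000 + 3,000
```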


In [60]:
# Create a parametric reform that increases the CTC amount
# This is a simple parameter change - increasing CTC from $2,000 to $3,000 from 2025 forward

ctc_reform = Reform.from_dict({
    "gov.irs.credits.ctc.amount.base[0].amount": {
        "2025-01-01.2100-12-31": 3000
    }
}, country_id="us")

# You can also create more complex reforms with multiple parameters
comprehensive_reform = Reform.from_dict({
    # Increase CTC amount
    "gov.irs.credits.ctc.amount.base[0].amount": {
        "2025-01-01.2100-12-31": 3000
    },
    # Make CTC fully refundable
    "gov.irs.credits.ctc.refundable.fully_refundable": {
        "2025-01-01.2100-12-31": True
    },
    # Increase the phase-out threshold for joint filers
    "gov.irs.credits.ctc.phase_out.threshold.JOINT": {
        "2025-01-01.2100-12-31": 500000
    }
}, country_id="us")
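
When several parameters share the same effective dates, the nested dictionary can be built programmatically. The helper below, `build_reform_dict`, is a hypothetical convenience function written for this example, not part of PolicyEngine; it only assembles the `"start.stop"` period keys used in the cell above.

```python
def build_reform_dict(changes, start="2025-01-01", stop="2100-12-31"):
    """Wrap each new value in a {"start.stop": value} period mapping."""
    period_key = f"{start}.{stop}"
    return {path: {period_key: value} for path, value in changes.items()}


reform_dict = build_reform_dict({
    "gov.irs.credits.ctc.amount.base[0].amount": 3000,
    "gov.irs.credits.ctc.phase_out.threshold.JOINT": 500_000,
})

for path, values in reform_dict.items():
    print(path, values)
```

The result has the same shape as the dictionaries passed to `Reform.from_dict(..., country_id="us")` above, so it could be used in their place.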

### Running the Reformed Simulation


In [61]:
# Create a new simulation with the comprehensive reform
reformed_sim = Simulation(
    situation=situation,
    reform=comprehensive_reform
)

In [62]:
# Calculate the same variables under the reform
reformed_income_tax = reformed_sim.calculate("income_tax", PERIOD)
reformed_ctc = reformed_sim.calculate("ctc_value", PERIOD)
reformed_net_income = reformed_sim.calculate("household_net_income", PERIOD)

In [63]:
print("=== REFORM RESULTS ===")
print(f"Federal Income Tax: ${reformed_income_tax.sum():,.0f}")
print(f"Child Tax Credit: ${reformed_ctc.sum():,.0f}")
print(f"Household Net Income: ${reformed_net_income.sum():,.0f}")

=== REFORM RESULTS ===
Federal Income Tax: $4,323
Child Tax Credit: $6,000
Household Net Income: $104,117


### Structural Reforms (Advanced)
PolicyEngine also supports structural reforms that change how variables are calculated, not just parameter values. Here's an example that adds a phase-out to the SALT deduction cap:

In [64]:
from policyengine_us.model_api import *

In [65]:
# Example of a structural reform that modifies the SALT cap calculation to add a phase-out to the cap
class salt_cap(Variable):
    value_type = float
    entity = TaxUnit
    label = "SALT cap"
    unit = USD
    definition_period = YEAR
    reference = "https://www.law.cornell.edu/uscode/text/26/164"

    def formula(tax_unit, period, parameters):
        filing_status = tax_unit("filing_status", period)
        p = parameters(
            period
        ).gov.irs.deductions.itemized.salt_and_real_estate
        max_cap = p.cap[filing_status]
        p_ref = parameters(period).gov.contrib.salt_phase_out
        agi = tax_unit("adjusted_gross_income", period)
        # Reduce the cap by a share of AGI above the threshold
        agi_excess = max_(0, agi - p_ref.threshold[filing_status])
        phase_out = p_ref.rate * agi_excess
        phased_out_cap = max_(0, max_cap - phase_out)
        # Optionally keep the cap from falling below a floor
        if p_ref.floor.applies:
            floor = p_ref.floor.amount[filing_status]
            return max_(phased_out_cap, floor)
        return phased_out_cap

class reform(Reform):
    def apply(self):
        self.update_variable(salt_cap)

# This would be applied as: Simulation(situation=situation, reform=reform())

**Best Practice**: While structural reforms are powerful, it's often better to:
1. File an issue in the [policyengine-us repository](https://github.com/PolicyEngine/policyengine-us/issues)
2. Discuss with PolicyEngine developers
3. Submit a pull request to add new parameters
4. Convert your structural reform into a parametric one

This approach ensures:
- Your reform integrates well with the existing codebase
- Other users can benefit from the new functionality
- The reform is properly tested and documented
- Future updates won't break your analysis

## Part 5: Comparing Baseline and Reform


In [66]:
# Create a comparison dataframe
comparison_data = {
    "Metric": [
        "Federal Income Tax",
        "Child Tax Credit", 
        "Household Net Income",
    ],
    "Baseline": [
        income_tax.sum(),
        ctc.sum(),
        household_net_income.sum(),
    ],
    "Reform": [
        reformed_income_tax.sum(),
        reformed_ctc.sum(),
        reformed_net_income.sum(),
    ]
}

In [67]:
comparison_df = pd.DataFrame(comparison_data)
comparison_df["Change"] = comparison_df["Reform"] - comparison_df["Baseline"]
comparison_df["% Change"] = (comparison_df["Change"] / comparison_df["Baseline"].abs() * 100).round(2)

In [68]:
print("\n=== COMPARISON ===")
print(comparison_df.to_string(index=False))


=== COMPARISON ===
              Metric      Baseline        Reform  Change   % Change
  Federal Income Tax   6323.000000   4323.000000 -2000.0 -31.629999
    Child Tax Credit   4000.000000   6000.000000  2000.0  50.000000
Household Net Income 102116.851562 104116.851562  2000.0   1.960000
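
One caveat with the `% Change` column is that a metric with a zero baseline (such as `eitc` for this household) would cause a division by zero. A minimal sketch of one way to guard against that, using values from the outputs above and treating the undefined percentage as `NaN` (an assumption about the desired behavior):

```python
import numpy as np
import pandas as pd

# Values copied from the outputs above; EITC is 0 in both scenarios.
comparison = pd.DataFrame({
    "Metric": ["Federal Income Tax", "Child Tax Credit", "EITC"],
    "Baseline": [6323.0, 4000.0, 0.0],
    "Reform": [4323.0, 6000.0, 0.0],
})
comparison["Change"] = comparison["Reform"] - comparison["Baseline"]

# Replace zero baselines with NaN so the percentage is left undefined.
baseline_abs = comparison["Baseline"].abs().replace(0, np.nan)
comparison["% Change"] = (comparison["Change"] / baseline_abs * 100).round(2)

print(comparison.to_string(index=False))
```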

