# Architecture overview

PolicyEngine.py provides a unified interface for running tax-benefit policy simulations across countries. The architecture centres on a small set of core models (`Dataset`, `Policy`, `Dynamics`, `Simulation`, `ReportElement` and `Report`) that work together to define, run, and analyse policy scenarios.

## Core model relationships

The system follows this flow:

1. **Dataset** wraps entity tables (people, households, etc.) for one year
2. **Policy** and **Dynamics** define parameter changes and optional simulation modifiers
3. **Simulation** combines these and runs country-specific calculations
4. **ReportElement** subclasses analyse results and return structured tables
5. **Report** collections group multiple analyses together

Each model uses Pydantic for validation and type safety.

In [1]:
# Import the models to show their structure
from policyengine.models import (
    Dataset,
    Policy,
    Dynamics,
    Simulation,
    Parameter,
    ParameterValue,
    ReportElement,
    Report,
)
from policyengine.models.enums import DatasetType, OperationStatus
from datetime import datetime

print("Core models imported successfully")

Core models imported successfully


## Dataset model

The `Dataset` model wraps a `SingleYearDataset` containing entity tables for one fiscal year. Each table represents a different unit of analysis (persons, households, benefit units, tax units, etc.), and the tables are linked to one another by foreign keys.

In [2]:
# Show Dataset structure
print("Dataset fields:")
for field_name, field_info in Dataset.model_fields.items():
    print(f"  {field_name}: {field_info.annotation}")

Dataset fields:
  name: str | None
  source_dataset: typing.Optional[policyengine.models.dataset.Dataset]
  version: str | None
  data: typing.Any | None
  dataset_type: <enum 'DatasetType'>


Datasets come in two main types:

- **National datasets**: Full population microdata (EFRS for UK, ECPS for US)
- **Synthetic datasets**: Small, representative samples for testing and examples

The `SingleYearDataset` handles serialisation for efficient storage and provides table-level operations like copying.
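
To make the entity-table idea concrete, here is a minimal sketch using only the standard library. It does not use `SingleYearDataset`; the table layout, the column names (`person_id`, `household_id`, `employment_income`) and the helper `household_totals` are assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical entity tables: one list of row dicts per entity.
tables = {
    "household": [
        {"household_id": 1, "region": "LONDON"},
        {"household_id": 2, "region": "WALES"},
    ],
    "person": [
        {"person_id": 1, "household_id": 1, "employment_income": 30_000.0},
        {"person_id": 2, "household_id": 1, "employment_income": 12_000.0},
        {"person_id": 3, "household_id": 2, "employment_income": 45_000.0},
    ],
}


def household_totals(tables, column):
    """Sum a person-level column up to households via the foreign key."""
    totals = {row["household_id"]: 0.0 for row in tables["household"]}
    for person in tables["person"]:
        totals[person["household_id"]] += person[column]
    return totals


income_by_household = household_totals(tables, "employment_income")
print(income_by_household)

# A deep copy lets a scenario edit values without touching the original tables.
scenario_tables = deepcopy(tables)
scenario_tables["person"][0]["employment_income"] = 35_000.0
```

The `household_id` column on the person table plays the role of the foreign key, and the deep copy mirrors the kind of table-level copying mentioned above.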

## Policy and parameter system

Policy changes work through a three-level system:

1. **Parameter** defines a specific policy lever (e.g., the personal allowance amount)
2. **ParameterValue** sets that parameter's value for a time period
3. **Policy** collects parameter values and optional simulation modifiers

This design supports both simple parameter changes and complex policy reforms.

In [3]:
# Example parameter definition
pa_param = Parameter(
    name="gov.hmrc.income_tax.allowances.personal_allowance.amount",
    data_type=float,
    label="Personal allowance",
    description="Annual income tax personal allowance",
    unit="GBP",
    country="uk",
)

# Parameter value for specific time period
pa_increase = ParameterValue(
    parameter=pa_param,
    model_version="1.0.0",
    start_date=datetime(2025, 4, 6),
    end_date=datetime(2026, 4, 5),
    value=15000.0,
    country="uk",
)

policy = Policy(
    name="Increase personal allowance",
    description="Increase personal allowance to £15,000",
    parameter_values=[pa_increase],
    country="uk",
)

print(f"Created policy: {policy.name}")
print(f"Parameter values: {len(policy.parameter_values)}")

Created policy: Increase personal allowance
Parameter values: 1
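
Because each `ParameterValue` carries a `start_date` and `end_date`, a simulation has to decide which value applies at a given instant. The sketch below shows one plausible way to do that lookup. `DatedValue` and `value_at` are hypothetical names, not part of `policyengine.models`, and the baseline figure of 12,570 is used only as an illustrative default.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DatedValue:
    """Hypothetical stand-in for a parameter value with a validity window."""

    start_date: datetime
    end_date: Optional[datetime]  # None means the value is open-ended
    value: float


def value_at(values, instant, default):
    """Return the first value whose window contains `instant`, else `default`."""
    for candidate in values:
        not_expired = candidate.end_date is None or instant <= candidate.end_date
        if candidate.start_date <= instant and not_expired:
            return candidate.value
    return default


reform_values = [DatedValue(datetime(2025, 4, 6), datetime(2026, 4, 5), 15_000.0)]

during = value_at(reform_values, datetime(2025, 7, 1), default=12_570.0)
after = value_at(reform_values, datetime(2026, 7, 1), default=12_570.0)
print(during, after)
```

Note that an `end_date` of `datetime(2026, 4, 5)` means midnight at the start of 5 April, so under this inclusive comparison an instant later that day falls outside the window.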


## Dynamics and time-varying changes

The `Dynamics` model handles changes that evolve over time, such as benefit uprating, tax threshold adjustments, or economic assumptions. Like policies, dynamics can include both parameter changes and simulation modifiers.

In [4]:
# Example dynamics setup
dynamics = Dynamics(
    name="Static baseline",
    description="No benefit uprating or economic growth",
    country="uk",
)

print(f"Created dynamics: {dynamics.name}")

Created dynamics: Static baseline
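
As a rough illustration of what a time-varying change means in practice, the sketch below compounds an amount forward using per-year growth rates. It is independent of the `Dynamics` model: the `uprate` helper and the growth figures are assumptions chosen for the example. An empty growth table corresponds to the static baseline above.

```python
def uprate(base_amount, growth_by_year, base_year, target_year):
    """Compound `base_amount` forward from `base_year` to `target_year`."""
    amount = base_amount
    for year in range(base_year + 1, target_year + 1):
        # A year with no entry is treated as zero growth.
        amount *= 1 + growth_by_year.get(year, 0.0)
    return amount


illustrative_growth = {2026: 0.02, 2027: 0.03}

static_amount = uprate(100.0, {}, base_year=2025, target_year=2027)
uprated_amount = uprate(100.0, illustrative_growth, base_year=2025, target_year=2027)
print(static_amount, round(uprated_amount, 2))
```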


## Simulation execution

The `Simulation` model ties everything together and provides the `run()` method that delegates to country-specific implementations. Each simulation tracks its status and timing.

In [5]:
# Show Simulation structure
print("Simulation fields:")
for field_name, field_info in Simulation.model_fields.items():
    print(f"  {field_name}: {field_info.annotation}")

# Show operation status options
print("\nOperation status options:")
for status in OperationStatus:
    print(f"  {status.value}")

Simulation fields:
  dataset: <class 'policyengine.models.dataset.Dataset'>
  policy: <class 'policyengine.models.policy.Policy'>
  dynamics: <class 'policyengine.models.dynamics.Dynamics'>
  result: typing.Any | None
  model_version: str | None
  country: <class 'str'>
  status: <enum 'OperationStatus'>
  created_at: <class 'datetime.datetime'>
  completed_at: datetime.datetime | None

Operation status options:
  pending
  running
  completed
  failed
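
The status values above suggest a simple lifecycle: a simulation starts as pending, moves to running, and ends as completed or failed. The sketch below models that lifecycle with a plain dict. `Status` and `run_with_status` are hypothetical, and recording `completed_at` only on success is an assumption, since this overview does not say what the library does on failure.

```python
from datetime import datetime, timezone
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def new_job():
    """Create a job record in the pending state."""
    return {
        "status": Status.PENDING,
        "created_at": datetime.now(timezone.utc),
        "completed_at": None,
        "result": None,
    }


def run_with_status(job, calculate):
    """Run `calculate()` and record the outcome and timing on `job`."""
    job["status"] = Status.RUNNING
    try:
        job["result"] = calculate()
    except Exception as error:
        job["status"] = Status.FAILED
        job["error"] = str(error)
    else:
        job["status"] = Status.COMPLETED
        job["completed_at"] = datetime.now(timezone.utc)
    return job


succeeded = run_with_status(new_job(), lambda: 42)
failed = run_with_status(new_job(), lambda: 1 / 0)
print(succeeded["status"].value, failed["status"].value)
```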


## Country-specific implementation

Each country implements its own simulation runner that:

1. Converts the generic dataset to country-specific format
2. Applies policy and dynamics parameter changes
3. Runs calculations using the country's PolicyEngine implementation
4. Stores results back in the generic format

For the UK, this uses `policyengine_uk` with entity tables (person, benunit, household).
For the US, this uses `policyengine_us` with different entities (person, family, tax_unit, marital_unit, spm_unit, household).
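
The four steps above amount to dispatching on the country and wrapping a country-specific calculation. The sketch below shows that shape with stub runners. It does not call `policyengine_uk` or `policyengine_us`; `ENTITIES`, `make_stub_runner`, `RUNNERS` and `run_simulation` are hypothetical names, and the "calculation" is just a row count.

```python
# Entity names per country, as listed above.
ENTITIES = {
    "uk": ("person", "benunit", "household"),
    "us": ("person", "family", "tax_unit", "marital_unit", "spm_unit", "household"),
}


def make_stub_runner(country):
    """Build a placeholder runner; a real one would call the country package."""

    def run(tables, parameter_changes):
        # Step 1: keep only the entity tables this country model understands.
        country_tables = {
            name: rows for name, rows in tables.items() if name in ENTITIES[country]
        }
        # Steps 2 and 3 are stubbed: record the changes and count rows
        # instead of running a real tax-benefit calculation.
        row_counts = {name: len(rows) for name, rows in country_tables.items()}
        # Step 4: return results in a country-neutral shape.
        return {
            "country": country,
            "parameter_changes": dict(parameter_changes),
            "row_counts": row_counts,
        }

    return run


RUNNERS = {country: make_stub_runner(country) for country in ENTITIES}


def run_simulation(country, tables, parameter_changes):
    """Dispatch to the runner registered for `country`."""
    if country not in RUNNERS:
        raise ValueError(f"No runner registered for country {country!r}")
    return RUNNERS[country](tables, parameter_changes)


generic_tables = {
    "person": [{}, {}, {}],
    "benunit": [{}, {}],
    "household": [{}],
    "tax_unit": [{}],
}
uk_result = run_simulation("uk", generic_tables, {"personal_allowance": 15_000.0})
print(uk_result["row_counts"])
```

Keeping the dispatch in one place means the calling code never needs to know which entities a given country uses.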

## Report and analysis system

The reporting system provides a structured way to analyse simulation results:

- **ReportElement**: Base class for individual analyses that return DataFrames
- **Report**: Collections of related report elements
- **Built-in elements**: Ready-made analyses such as `AggregateChangeReportElement` for comparing scenarios

All analyses focus on returning structured tables that are easy to store, compare, and visualise.

In [6]:
# Show ReportElement structure
print("ReportElement fields:")
for field_name, field_info in ReportElement.model_fields.items():
    print(f"  {field_name}: {field_info.annotation}")

print("\nReport fields:")
for field_name, field_info in Report.model_fields.items():
    print(f"  {field_name}: {field_info.annotation}")

ReportElement fields:
  name: str | None
  description: str | None
  report: typing.Optional[ForwardRef('Report')]
  status: <enum 'OperationStatus'>
  country: str | None

Report fields:
  name: <class 'str'>
  description: str | None
  elements: list[policyengine.models.reports.ReportElement]
  country: str | None
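
To show how an element and a report could fit together, here is a minimal sketch that uses plain lists of row dicts in place of DataFrames. `Element`, `AggregateChange` and `run_report` are hypothetical stand-ins and do not reproduce the interface of `ReportElement`, `Report` or `AggregateChangeReportElement`; the input figures are made up.

```python
from abc import ABC, abstractmethod


class Element(ABC):
    """Hypothetical stand-in for a report element: one analysis, one table."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def run(self, baseline, reform):
        """Return a table as a list of row dicts."""


class AggregateChange(Element):
    """Compare the total of one variable between two scenarios."""

    def __init__(self, variable):
        super().__init__(f"Change in total {variable}")
        self.variable = variable

    def run(self, baseline, reform):
        baseline_total = sum(baseline[self.variable])
        reform_total = sum(reform[self.variable])
        return [
            {
                "variable": self.variable,
                "baseline": baseline_total,
                "reform": reform_total,
                "change": reform_total - baseline_total,
            }
        ]


def run_report(elements, baseline, reform):
    """Run every element and key the resulting tables by element name."""
    return {element.name: element.run(baseline, reform) for element in elements}


baseline_results = {"income_tax": [4000.0, 0.0, 6500.0]}
reform_results = {"income_tax": [3500.0, 0.0, 6000.0]}

report_tables = run_report(
    [AggregateChange("income_tax")], baseline_results, reform_results
)
print(report_tables["Change in total income_tax"])
```

Because every element returns rows with a fixed set of keys, the outputs can be stored or concatenated without any element-specific handling.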


## Key design principles

The architecture follows several important principles:

1. **Country agnostic**: Core models work across different tax-benefit systems
2. **Table-first analysis**: All outputs are structured DataFrames for easy storage
3. **Composable**: Policies, dynamics, and reports can be mixed and matched
4. **Type-safe**: Pydantic models provide validation and IDE support
5. **Extensible**: Custom report elements and simulation modifiers support complex analyses

This design makes it straightforward to run comparable policy analyses across countries while maintaining the flexibility to handle country-specific requirements.