# Getting started

This notebook walks through using the PolicyEngine.py package to run simulations and produce analyses. We'll start with a basic UK analysis that doesn't use a database, then move on to saving objects to, and loading them from, a database.

## Basic analysis

To start, let's run a simulation of the UK and chart the distribution of household income.

In [1]:
from policyengine.models import Simulation, Aggregate, policyengine_uk_model, policyengine_uk_latest_version
from policyengine.utils.datasets import create_uk_dataset
import plotly.graph_objects as go
from policyengine.utils.charts import add_fonts, format_figure

# Load the dataset
uk_dataset = create_uk_dataset()

# Create and run the simulation
sim = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
)

sim.run()

# Extract aggregates for household income ranges

income_ranges = [0, 20000, 40000, 60000, 80000, 100000, 150000, 200000, 300000, 500000, 1_000_000]
aggregates = []
for i in range(len(income_ranges) - 1):
    aggregates.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim,
        )
    )

aggregates = Aggregate.run(aggregates)

# Create the bar chart

fig = go.Figure(data=[
    go.Bar(
        x=[f"£{inc:,}" for inc in income_ranges[:-1]],
        y=[agg.value for agg in aggregates],
    )
])

# Apply formatting

format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)
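
The bucketed count in the cell above can be pictured in plain Python. This is a minimal sketch, not the package's implementation: `count_in_range` and `count_by_bins` are hypothetical helpers, the incomes are made up, and the sketch ignores the survey weights that a real microsimulation would normally apply. It assumes that `filter_variable_geq` and `filter_variable_leq` are both inclusive bounds, as their names suggest.

```python
from typing import List, Sequence


def count_in_range(values: Sequence[float], lower: float, upper: float) -> int:
    """Count values v with lower <= v <= upper (both bounds inclusive)."""
    return sum(1 for v in values if lower <= v <= upper)


def count_by_bins(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Apply count_in_range to each pair of consecutive edges."""
    return [
        count_in_range(values, edges[i], edges[i + 1])
        for i in range(len(edges) - 1)
    ]


# Made-up incomes purely for illustration; not PolicyEngine output.
incomes = [5_000, 20_000, 35_000, 61_000]
edges = [0, 20_000, 40_000, 60_000, 80_000]

counts = count_by_bins(incomes, edges)
print(counts)       # [2, 2, 0, 1]
print(sum(counts))  # 5, although there are only 4 incomes
```

Under that assumption, a household whose income sits exactly on an edge (here 20,000) is counted in two adjacent bins. With real survey data this rarely matters, but it is worth knowing when the bin totals need to sum to the population.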

The first cell introduced a few concepts:

* The `Simulation` object, which represents a full run of a microsimulation model and contains all the information (simulated and input) about a set of people or groups. Here it takes three arguments: a `Dataset`, a `Model` and a `ModelVersion`.
* The `Dataset` object, which represents a set of people or groups. Here we used a utility function to create the UK dataset; later we'll create datasets from scratch or pull them from a database.
* The `Model` object, which represents a particular microsimulation model (essentially a function that transforms one dataset into another). This package defines two models, one for the UK and one for the US. Think of these objects as adapters for the full microsimulation models. Here we used the pre-defined UK model.
* The `ModelVersion` object, which represents a particular version of a model. This is useful for tracking changes to the model over time. Here, we used the latest version of the UK model.
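
To make the relationship between these objects concrete, here is a toy sketch of the "model as a function from dataset to dataset" idea. Everything in it is hypothetical: `ToyDataset`, `ToyModel`, `ToySimulation` and the `flat_tax` rule are illustrative stand-ins, not the `policyengine` classes, and they do not mirror the real classes' fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for illustration only; these are NOT the
# policyengine classes and do not mirror their real fields.
Household = Dict[str, int]


@dataclass
class ToyDataset:
    households: List[Household]


@dataclass
class ToyModel:
    name: str
    version: str  # plays the role of a ModelVersion
    calculate: Callable[[List[Household]], List[Household]]


@dataclass
class ToySimulation:
    dataset: ToyDataset
    model: ToyModel
    output: Optional[ToyDataset] = None

    def run(self) -> ToyDataset:
        # Apply the model's function to the input and keep the result.
        self.output = ToyDataset(self.model.calculate(self.dataset.households))
        return self.output


def flat_tax(households: List[Household]) -> List[Household]:
    # Made-up rule: a 20% flat tax, in integer arithmetic.
    return [
        {**h, "net_income": h["gross_income"] - h["gross_income"] // 5}
        for h in households
    ]


toy_dataset = ToyDataset([{"gross_income": 10_000}, {"gross_income": 50_000}])
toy_model = ToyModel(name="toy", version="0.1", calculate=flat_tax)
toy_sim = ToySimulation(dataset=toy_dataset, model=toy_model)
toy_output = toy_sim.run()
print([h["net_income"] for h in toy_output.households])  # [8000, 40000]
```

The simulation holds both the input dataset and the computed output, which matches the description of `Simulation` as containing simulated and input information.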


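Next, let's define a reform. The cell below creates a `Policy` that sets the income tax personal allowance to £20,000 from 1 January 2029. The cells after it run a second simulation with that policy applied.

In [2]: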
from policyengine.models import ParameterValue, Parameter, Policy
from datetime import datetime

personal_allowance = Parameter(
    id="gov.hmrc.income_tax.allowances.personal_allowance.amount",
)
personal_allowance_value = ParameterValue(
    parameter=personal_allowance,
    start_date=datetime(2029, 1, 1),
    value=20_000,
)
policy = Policy(
    name="Increase personal allowance to £20,000",
    description="A policy to increase the personal allowance for income tax to £20,000.",
    parameter_values=[personal_allowance_value],
)
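
To see why the `start_date` matters, here is a minimal sketch of how dated parameter values are commonly resolved. It assumes that a value applies from its `start_date` onward and overrides earlier values; this notebook does not confirm that the package works this way. `DatedValue` and `value_at` are hypothetical names, and the baseline figure of 10,000 is a placeholder rather than the actual allowance.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DatedValue:
    start_date: datetime
    value: float


def value_at(values: List[DatedValue], when: datetime) -> float:
    """Return the value with the latest start_date that is not after `when`."""
    applicable = [v for v in values if v.start_date <= when]
    if not applicable:
        raise ValueError("no value is defined at the requested date")
    return max(applicable, key=lambda v: v.start_date).value


# Placeholder baseline (hypothetical figure), plus the reform from the cell above.
timeline = [
    DatedValue(datetime(2020, 1, 1), 10_000),
    DatedValue(datetime(2029, 1, 1), 20_000),
]

before_reform = value_at(timeline, datetime(2028, 6, 1))
after_reform = value_at(timeline, datetime(2029, 6, 1))
print(before_reform, after_reform)  # 10000 20000
```

If the package resolves dates in this way, the reform would only change results for simulated periods on or after 1 January 2029.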

In [3]:
sim_2 = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
    policy=policy,
)

In [4]:
sim_2.run()

{'person':        pip_dl_category  miscellaneous_income  pension_income  sublet_income  \
 0                 NONE              0.000000             0.0            0.0   
 1                 NONE              0.000000             0.0            0.0   
 2                 NONE              0.000000             0.0            0.0   
 3                 NONE              0.000000             0.0            0.0   
 4                 NONE              0.000000             0.0            0.0   
 ...                ...                   ...             ...            ...   
 115607            NONE              0.000000             0.0            0.0   
 115608            NONE              0.000000             0.0            0.0   
 115609            NONE            157.425507             0.0            0.0   
 115610            NONE              0.000000             0.0            0.0   
 115611            NONE              0.000000             0.0            0.0   
 
        pip_m_category  ...
 [output truncated]

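Finally, we recompute the same aggregates for the reform simulation and plot them next to the baseline.

In [6]: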
# Extract new aggregates for household income ranges

income_ranges = [0, 20000, 40000, 60000, 80000, 100000, 150000, 200000, 300000, 500000, 1_000_000]
aggregates_2 = []
for i in range(len(income_ranges) - 1):
    aggregates_2.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim_2,
        )
    )

aggregates_2 = Aggregate.run(aggregates_2)

# Create the comparative bar chart
fig = go.Figure(data=[
    go.Bar(
        name="Baseline",
        x=[f"£{inc:,}" for inc in income_ranges[:-1]],
        y=[agg.value for agg in aggregates],
    ),
    go.Bar(
        name="Reform",
        x=[f"£{inc:,}" for inc in income_ranges[:-1]],
        y=[agg.value for agg in aggregates_2],
    ),
])

# Apply formatting
fig = format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)

add_fonts()

fig
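
Alongside the chart, it can help to look at the change in each bin directly. The sketch below is an illustration only: `bin_changes` is a hypothetical helper, and the counts are made-up placeholders rather than results from the simulations above. With the real objects, the two lists would come from `[agg.value for agg in aggregates]` and `[agg.value for agg in aggregates_2]`.

```python
from typing import List, Sequence


def bin_changes(baseline: Sequence[float], reform: Sequence[float]) -> List[float]:
    """Return reform minus baseline for each bin."""
    if len(baseline) != len(reform):
        raise ValueError("baseline and reform must have the same number of bins")
    return [r - b for b, r in zip(baseline, reform)]


# Placeholder counts for three bins; NOT simulation output.
baseline_counts = [100, 250, 150]
reform_counts = [90, 240, 170]

changes = bin_changes(baseline_counts, reform_counts)
print(changes)       # [-10, -10, 20]
print(sum(changes))  # 0
```

In this made-up example the changes sum to zero because households only move between bins. With the real aggregates the sum can differ from zero, for instance when a household's income moves outside the overall range covered by `income_ranges`.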