# UK testing notebook

This notebook tests the `reweight` function with PolicyEngine UK.

## Import statements and data installation

Because the UK survey data (the raw Family Resources Survey 2021-22) are not public, they are downloaded through PolicyEngine UK's data utilities.

In [14]:
import pandas as pd
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

In [15]:
from policyengine_uk import Microsimulation
from policyengine_uk.data import RawFRS_2021_22
RawFRS_2021_22().download()

## Generating variables for input into the reweight function

These variables are produced by PolicyEngine UK's existing `generate_model_variables` function and are passed to `reweight` later in the notebook.

In [16]:
from policyengine_uk.data.datasets.frs.calibration.calibrate import generate_model_variables

(
    household_weights,
    weight_adjustment,
    values_df,
    targets,
    targets_array,
    equivalisation_factors_array
) = generate_model_variables("frs_2021", 2025)
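The sketch below shows how these objects might fit together. It uses made-up numbers and assumes that `values_df` has one row per household and one column per calibration target. That matches how its columns are paired with `household_weights` later in this notebook. Under that assumption, multiplying the weights by the matrix gives the survey's estimate of each target. The column names, weights and targets are hypothetical toy values, not FRS data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 households, 2 calibration targets.
# Assumed layout: one row per household, one column per target statistic,
# each cell holding that household's contribution to the target.
values = pd.DataFrame(
    {
        "employment income": [20_000.0, 35_000.0, 0.0, 50_000.0],
        "households": [1.0, 1.0, 1.0, 1.0],
    }
)
weights = np.array([100.0, 150.0, 200.0, 50.0])  # hypothetical survey weights
targets = np.array([9_000_000.0, 550.0])  # hypothetical official totals

# Weighted totals implied by the current weights: one estimate per target.
estimates = weights @ values.to_numpy()

# Relative error of each estimate against its target.
relative_error = estimates / targets - 1
print(dict(zip(values.columns, relative_error.round(3))))
```

With these toy numbers, employment income is overestimated by about 8% and the household count is underestimated by about 9%. Closing gaps like these is what a calibration step is for.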

## Reweighting

`reweight` is imported and used to generate a new set of weights. The new weights are then checked for the proportion that remain nonzero and for their overall distribution.

In [17]:
from reweight import reweight

In [18]:
sim_matrix = torch.tensor(values_df.to_numpy(), dtype=torch.float32)

final_weights = reweight(household_weights, sim_matrix, targets, targets_array)

Initial loss: 58.35776138305664
Epoch 100, Loss: 48.30685043334961
Epoch 200, Loss: 40.58155059814453
Epoch 300, Loss: 34.585235595703125
Epoch 400, Loss: 29.832853317260742
Epoch 500, Loss: 25.99891471862793
Epoch 600, Loss: 22.858182907104492
Epoch 700, Loss: 20.250896453857422
Epoch 800, Loss: 18.061073303222656
Epoch 900, Loss: 16.202829360961914
Epoch 1000, Loss: 14.611446380615234
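The log shows the loss falling steadily over 1,000 epochs. This notebook does not show the internals of `reweight`, so the following is only a sketch of one common way such a calibration can work. It runs gradient descent on the logarithms of the weights and minimises the mean squared relative error between weighted totals and targets. The function name `reweight_sketch`, the loss, the learning rate and the toy data are all assumptions. The real function uses PyTorch and may differ in every detail.

```python
import numpy as np


def reweight_sketch(initial_weights, values, targets, lr=0.1, epochs=1000):
    """Hypothetical stand-in for `reweight`: gradient descent on log-weights.

    Minimises the mean squared relative error between the weighted totals
    (weights @ values) and the targets. Working with log-weights keeps every
    weight strictly positive.
    """
    log_w = np.log(np.asarray(initial_weights, dtype=float))
    n_targets = len(targets)
    losses = []
    for _ in range(epochs):
        w = np.exp(log_w)
        rel_err = (w @ values) / targets - 1  # one entry per target
        losses.append(float(np.mean(rel_err**2)))
        # dLoss/dw_i = sum_j 2 * rel_err_j * values_ij / (targets_j * n_targets)
        grad_w = values @ (2 * rel_err / (targets * n_targets))
        # Chain rule through w = exp(log_w): dLoss/dlog_w_i = dLoss/dw_i * w_i
        log_w -= lr * grad_w * w
    return np.exp(log_w), losses


# Hypothetical toy data: 4 households x 2 targets.
values = np.array([[20_000.0, 1.0], [35_000.0, 1.0], [0.0, 1.0], [50_000.0, 1.0]])
weights = np.array([100.0, 150.0, 200.0, 50.0])
targets = np.array([9_000_000.0, 550.0])

new_weights, losses = reweight_sketch(weights, values, targets)
print(f"Initial loss: {losses[0]:.6f}")
print(f"Final loss: {losses[-1]:.6f}")
```

Optimising log-weights rather than raw weights is one simple way to guarantee that no weight becomes zero or negative during calibration.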


In [19]:
def nonzero_proportion(tensor):
    return torch.count_nonzero(tensor).item() / tensor.numel()

print(nonzero_proportion(household_weights))

print(nonzero_proportion(final_weights))

1.0
1.0
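Both proportions are 1.0, so no weight was driven exactly to zero. An exact-zero count can still miss weights that have shrunk to negligible values. The NumPy sketch below shows a tolerance-based variant that reports the share of weights above a threshold. It runs on made-up weights. The helper name `share_above` and the threshold value are illustrative choices, not part of `reweight`.

```python
import numpy as np


def share_above(weights, threshold):
    """Fraction of weights strictly greater than `threshold` (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    return np.count_nonzero(weights > threshold) / weights.size


weights = np.array([120.0, 0.0004, 85.0, 3e-9, 40.0])  # made-up example weights
print(np.count_nonzero(weights) / weights.size)  # exact-zero check: 1.0
print(share_above(weights, 1e-3))  # tolerance check: 0.6
```

In this notebook it could be applied as `share_above(final_weights.detach().numpy(), 1e-3)`, with the threshold chosen to suit the scale of the weights.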


In [20]:
import plotly.express as px

# Overlay the household weight distributions before and after reweighting.
df = pd.DataFrame({
    "initial_weights": household_weights.numpy(),
    "final_weights": final_weights.numpy()
})

px.histogram(
    df,
    x=["initial_weights", "final_weights"],
    barmode="overlay",
)
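The overlaid histogram compares the two weight distributions, but it does not show how far individual households moved. One way to measure that is to summarise the per-household ratio of final to initial weight. This sketch assumes strictly positive initial weights, which is consistent with the nonzero proportion of 1.0 reported above. It uses made-up arrays, and the helper name `weight_change_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def weight_change_summary(initial, final):
    """Describe the distribution of final / initial weight ratios.

    Assumes every initial weight is strictly positive.
    """
    initial = np.asarray(initial, dtype=float)
    final = np.asarray(final, dtype=float)
    ratios = pd.Series(final / initial)
    return ratios.describe(percentiles=[0.05, 0.5, 0.95])


initial = np.array([100.0, 150.0, 200.0, 50.0])  # made-up initial weights
final = np.array([110.0, 140.0, 260.0, 45.0])  # made-up final weights
summary = weight_change_summary(initial, final)
print(summary)
```

Ratios far from 1 in the tails would flag households whose influence changed sharply, which the overlaid histogram alone can hide.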

## Income distribution testing

To test for overfitting or other significant distributional changes, the income distribution of the survey data weighted by the new weights is compared with the distribution weighted by the original weights.

In [21]:
import plotly.graph_objects as go

income_values = np.asarray(values_df["employment income budgetary impact (UK)"])
initial_weights = household_weights.numpy()
finishing_weights = final_weights.numpy()
fig = go.Figure()

# Histogram of income, each bar summing the initial household weights
fig.add_trace(go.Histogram(
    x=income_values,
    y=initial_weights,
    histfunc='sum',
    name='Initial distribution',
    opacity=0.75,
    #nbinsx=200  # You can adjust the number of bins as needed
))

# Same histogram, using the reweighted household weights
fig.add_trace(go.Histogram(
    x=income_values,
    y=finishing_weights,
    histfunc='sum',
    name='Final distribution',
    opacity=0.75,
    #nbinsx=200  # You can adjust the number of bins as needed
))

# Customize the layout
fig.update_layout(
    title='Income Distribution Comparison',
    xaxis_title='Employment income',
    yaxis_title='Weighted households',
    barmode='overlay',  # This creates the overlay effect
    bargap=0.1,  # Gap between bars in adjacent bins
)

# Show the plot
fig.show()
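The `histfunc='sum'` histograms above are weighted histograms: each bar sums the household weights in one income bin. The same binning can be done numerically with `np.histogram(..., weights=...)` and summarised as a total variation distance. That gives a single number for how far the two distributions differ. This is a NumPy sketch on synthetic incomes. The function name, bin count and choice of distance are illustrative assumptions, not part of the original testing.

```python
import numpy as np


def weighted_distribution_shift(values, weights_a, weights_b, bins=20):
    """Total variation distance between two weighted histograms of `values`.

    Both histograms share the same bin edges and are normalised to sum to 1,
    so the result lies in [0, 1]; 0 means identical binned distributions.
    """
    edges = np.histogram_bin_edges(values, bins=bins)
    hist_a, _ = np.histogram(values, bins=edges, weights=weights_a)
    hist_b, _ = np.histogram(values, bins=edges, weights=weights_b)
    p = hist_a / hist_a.sum()
    q = hist_b / hist_b.sum()
    return 0.5 * np.abs(p - q).sum()


rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=0.8, size=1_000)  # synthetic incomes
initial = np.full(income.size, 30.0)  # hypothetical uniform weights
final = initial * rng.uniform(0.8, 1.2, size=income.size)  # mildly perturbed
print(weighted_distribution_shift(income, initial, final))
```

In this notebook it could be called as `weighted_distribution_shift(income_values, initial_weights, finishing_weights)`. A value near 0 suggests the reweighting left the binned income distribution largely intact.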