# Weighting

OpenFisca-UK calculates variables for each entity in a survey: people, benefit units (nuclear families) and households. Not everyone in the population appears in the survey, so each entity carries a weight, and the weights sum to the target population. OpenFisca-UK uses ```microdf```, a Python package that extends ```pandas``` to handle survey weights behind the scenes, so that users rarely need to handle weights themselves. In practice, results from the model already take weights into account, and you can essentially treat the Series- and DataFrame-like objects returned by ```sim.calc``` and ```sim.df``` as if they contain everyone in the population. For example:

In [5]:
from openfisca_uk import Microsimulation
import numpy as np

sim = Microsimulation()

(
    sim.calc("income_tax") + np.random.rand(len(sim.calc("person_id"))) * 1000
)  # adding random noise to the values shown in this preview

              value  weight
0       2032.093533  1996.0
1        715.189366   928.0
2      10364.393259  1451.0
3       5778.412968  1451.0
4      11248.744643  1483.0
...             ...     ...
43309   6128.709950  1386.0
43310    881.627228  1386.0
43311    213.505034  1386.0
43312    687.870161  1386.0
43313    315.731888  1386.0

[43314 rows x 2 columns]

As can be seen above, the result is not a Series but a MicroSeries, which stores a weight alongside each value. The same applies to DataFrames, which are returned as MicroDataFrames. Statistics computed with the weights differ from those computed without them:

In [6]:
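import pandas as pd

# .values returns the raw array without weights, so this sums survey records only
unweighted = sim.calc("income_tax").values.sum()
# summing the MicroSeries directly applies the weights
weighted = sim.calc("income_tax").sum()

pd.DataFrame(
    {
        "Total Income Tax (£bn)": pd.Series(
            [unweighted, weighted], index=["Unweighted", "Weighted"]
        ).apply(lambda x: round(x / 1e9, 1))  # convert £ to £bn, one decimal place
    }
)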

            Total Income Tax (£bn)
Unweighted                     0.1
Weighted                     174.7
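
The gap between the two totals comes from the arithmetic of survey weighting. The following is a minimal sketch in plain Python, not ```microdf```'s implementation: it assumes a hypothetical three-person sample with made-up tax values and illustrative weights, and shows how a weighted total and a weighted mean are formed from values and weights.

```python
# Hypothetical three-person sample; values and weights are for illustration only.
income_tax = [2000.0, 700.0, 10000.0]  # tax owed by each sampled person (£)
weights = [1996.0, 928.0, 1451.0]      # number of people each sampled person represents

# Unweighted total: treats the sample as if it were the whole population.
unweighted_total = sum(income_tax)

# Weighted total: each value counts once for every person it represents.
weighted_total = sum(v * w for v, w in zip(income_tax, weights))

# The weights sum to the population that the sample stands in for.
population = sum(weights)

# Weighted mean: weighted total divided by the represented population.
weighted_mean = weighted_total / population

print(f"Unweighted total: £{unweighted_total:,.0f}")
print(f"Weighted total:   £{weighted_total:,.0f}")
print(f"Weighted mean:    £{weighted_mean:,.2f} across {population:,.0f} people")
```

Because every weight here is well above 1, the weighted total is far larger than the unweighted one, which is the same pattern as in the table above.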
