# Populations

When calculating over weighted survey microdata, we use the Family Resources Survey (FRS). The FRS isn't redistributable, but it is available to academics. We use the ```frs``` package to convert the source TAB files into OpenFisca-compatible CSV files.

## Prerequisite: install and initialise ```frs```

The ```frs``` package converts the Family Resources Survey into OpenFisca-UK input files: it calculates the required variables for each person, benefit unit and household from the survey's relational database format. It only needs to be set up once with the original files.

First, install the ```frs``` package (via ```pip install frs```), then check its status:

In [2]:
!frs status

FRS status:
	FRS TAB files stored?				Yes
	FRS OpenFisca-UK input files generated?		Yes
	OpenFisca-UK input files outdated?		No (files generated with current version, 0.2.0)


The ```frs``` package needs to be pointed to a folder containing the source TAB files for one year of the FRS. The command-line help lists the available actions and options:

In [11]:
!frs --help

usage: frs [-h] [--path PATH] {status,gen,regen}

Utility for managing Family Resources Survey microdata

positional arguments:
  {status,gen,regen}  The action to take on stored data

optional arguments:
  -h, --help          show this help message and exit
  --path PATH         The path to the FRS data
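The usage line above can be turned into a concrete call. The following is a minimal sketch rather than part of the ```frs``` package: ```build_frs_command``` and ```run_frs``` are hypothetical helpers, the folder ```/path/to/frs_tab_files``` is a placeholder, and combining ```--path``` with ```gen``` is an assumption based only on the help text.

```python
import shutil
import subprocess

# The three actions listed in the help text above.
FRS_ACTIONS = ("status", "gen", "regen")


def build_frs_command(action, path=None):
    """Return the argument list for an frs call, following the usage line."""
    if action not in FRS_ACTIONS:
        raise ValueError(f"action must be one of {FRS_ACTIONS}, got {action!r}")
    command = ["frs"]
    if path is not None:
        command += ["--path", str(path)]
    command.append(action)
    return command


def run_frs(action, path=None):
    """Run frs if it is installed; return None when the executable is missing."""
    if shutil.which("frs") is None:
        return None
    return subprocess.run(
        build_frs_command(action, path), capture_output=True, text=True
    )


# Placeholder folder, not a path taken from the documentation.
print(" ".join(build_frs_command("gen", "/path/to/frs_tab_files")))
```

Building the argument list separately from running it makes it possible to inspect the command before any data is touched.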


## Using ```PopulationSim```

The helper class ```PopulationSim``` uses the FRS package to load the input data:

In [2]:
from openfisca_uk import PopulationSim

sim = PopulationSim()

The underlying OpenFisca Simulation object is available as ```sim.model```. ```PopulationSim``` itself provides helper functions for mapping between entity types, calculating effective marginal tax rates and generating DataFrames.
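To make two of these ideas concrete, here is a minimal sketch in plain pandas and NumPy. It does not use or reproduce the ```PopulationSim``` API: the data, the column names, the tax rule in ```toy_net_income``` and the helper ```effective_marginal_tax_rate``` are all invented for illustration. It estimates an effective marginal tax rate by finite difference and maps values between the person and household levels.

```python
import numpy as np
import pandas as pd

# Toy person-level data; column names and values are invented for illustration.
people = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "household_id": [10, 10, 20, 20],
    "earnings": [30_000.0, 8_000.0, 60_000.0, 0.0],
})


def toy_net_income(earnings):
    """Hypothetical rule: 20% tax on earnings above a 12,500 allowance."""
    tax = 0.20 * np.maximum(earnings - 12_500.0, 0.0)
    return earnings - tax


def effective_marginal_tax_rate(earnings, net_income_fn, delta=100.0):
    """Share of an extra `delta` of earnings that is not kept as net income."""
    gain = net_income_fn(earnings + delta) - net_income_fn(earnings)
    return 1.0 - gain / delta


people["net_income"] = toy_net_income(people["earnings"])
people["emtr"] = effective_marginal_tax_rate(people["earnings"], toy_net_income)

# Person -> household: sum the net income of each household's members.
household_net = people.groupby("household_id")["net_income"].sum()

# Household -> person: broadcast the household total back to every member.
people["household_net_income"] = people["household_id"].map(household_net)

print(people[["person_id", "emtr", "household_net_income"]])
```

In a real tax-benefit model the net income function would be the full simulation, so the finite-difference step also captures benefit withdrawal, not only tax.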

For example, we can generate a DataFrame containing all the variables for the ```person``` entity:

In [6]:
import pandas as pd

# entity_df(entity) generates a DataFrame with all entity-level variables
person_df = sim.entity_df(entity="person")

# keep only a few columns of interest
columns = ["age", "hours", "taxable_income", "total_tax", "unused_personal_allowance", "NI"]
person_df = person_df.filter(items=columns)

# display summary statistics for the selected columns
person_df.describe()

| | age | hours | taxable_income | total_tax | unused_personal_allowance | NI |
|---|---|---|---|---|---|---|
| count | 43087.0 | 43087.0 | 43087.0 | 43087.0 | 43087.0 | 43087.0 |
| mean | 41.514309 | 15.404361 | 16810.93 | 3506.071 | 5442.554688 | 822.033203 |
| std | 23.953661 | 19.480444 | 110743.0 | 52583.24 | 5662.850098 | 2645.75708 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 21.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 43.0 | 0.0 | 9308.0 | 0.0 | 3192.0 | 0.0 |
| 75% | 62.0 | 37.0 | 23117.64 | 3147.07 | 12500.0 | 1075.199951 |
| max | 80.0 | 132.0 | 21899850.0 | 10499530.0 | 12500.0 | 440800.53125 |
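Note that ```describe()``` treats every row equally, so the figures above summarise survey respondents rather than the population. With weighted microdata, population-level statistics need the survey weights. The sketch below shows the idea on toy data; the ```weight``` column and the helpers ```weighted_mean``` and ```weighted_quantile``` are hypothetical and are not taken from ```frs``` or ```PopulationSim```.

```python
import numpy as np
import pandas as pd

# Toy data: `weight` is a hypothetical column giving the number of people in
# the population that each survey respondent represents.
sample = pd.DataFrame({
    "age": [25.0, 40.0, 70.0],
    "weight": [3000.0, 1000.0, 1000.0],
})


def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return float(np.average(np.asarray(values), weights=np.asarray(weights)))


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    values = np.asarray(values)
    weights = np.asarray(weights)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cumulative_share = np.cumsum(weights) / weights.sum()
    # Clip the index to guard against rounding in the cumulative sum.
    index = min(int(np.searchsorted(cumulative_share, q)), len(values) - 1)
    return float(values[index])


print("unweighted mean age:", sample["age"].mean())
print("weighted mean age:", weighted_mean(sample["age"], sample["weight"]))
print("weighted median age:", weighted_quantile(sample["age"], sample["weight"], 0.5))
```

In this toy sample the youngest respondent carries most of the weight, so the weighted mean and median are lower than their unweighted counterparts.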
