# Adding activity chains to synthetic populations 

The purpose of this script is to match each individual in the synthetic population to a respondent from the [National Travel Survey (NTS)](https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=5340).

## Methods

We use statistical matching, as described in [An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data](https://doi.org/10.1016/j.compenvurbsys.2016.11.003). 
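As a rough illustration of the idea (a minimal sketch, not the implementation from the paper), the code below groups survey respondents by their values on a set of shared categorical variables. For each synthetic individual it then draws one respondent at random from the group with the same values. The variable names (`age_group`, `sex`), the ids, and the helper `match_individuals` are all hypothetical. The sketch assumes that "unconstrained" means a respondent may be reused as a donor for several individuals.

```python
import random
from collections import defaultdict


def match_individuals(population, survey, keys, seed=0):
    """Map each synthetic individual to a random survey respondent with equal key values.

    Returns a dict of population id -> survey id, or None when no donor exists.
    """
    rng = random.Random(seed)

    # Group survey respondents (donors) by their values on the matching variables
    donors = defaultdict(list)
    for person in survey:
        donors[tuple(person[k] for k in keys)].append(person["id"])

    matches = {}
    for person in population:
        candidates = donors.get(tuple(person[k] for k in keys), [])
        # ASSUMPTION: unconstrained matching, so a donor can be picked more than once
        matches[person["id"]] = rng.choice(candidates) if candidates else None
    return matches


# Hypothetical toy data standing in for the NTS and the synthetic population
survey = [
    {"id": "nts_1", "age_group": "16-24", "sex": "F"},
    {"id": "nts_2", "age_group": "16-24", "sex": "F"},
    {"id": "nts_3", "age_group": "65+", "sex": "M"},
]
population = [
    {"id": "spc_1", "age_group": "16-24", "sex": "F"},
    {"id": "spc_2", "age_group": "65+", "sex": "M"},
    {"id": "spc_3", "age_group": "25-44", "sex": "M"},
]

matches = match_individuals(population, survey, keys=["age_group", "sex"])
print(matches)
```

Individuals with no donor in their group (here `spc_3`) come back as `None`. In practice one might relax the matching keys for those cases, but that choice is outside the scope of this sketch.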

In [9]:
import pandas as pd


## Step 1: Decide on matching variables  

We need to identify the sociodemographic characteristics that we will match on. Let's see which variables exist in (a) the NTS and (b) our synthetic population. The schema for the synthetic population can be found [here](https://github.com/alan-turing-institute/uatk-spc/blob/main/synthpop.proto).
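One possible way to keep track of candidate matching variables is a hand-written mapping between column names in the two datasets, filtered to the pairs that are present in both tables. The sketch below uses toy DataFrames. The column names and the `candidate_pairs` mapping are illustrative assumptions and may not correspond to the real NTS or SPC schemas.

```python
import pandas as pd

# Toy stand-ins for the real tables; all column names here are illustrative
nts_individuals = pd.DataFrame(
    {"IndividualID": [1, 2], "Age_B01ID": [5, 9], "Sex_B01ID": [1, 2]}
)
spc_people = pd.DataFrame({"id": [10, 11], "age_years": [23, 67], "sex": [2, 1]})

# Hypothetical mapping: SPC column -> NTS column describing the same characteristic
candidate_pairs = {
    "age_years": "Age_B01ID",
    "sex": "Sex_B01ID",
    "salary_yearly": "Income_B01ID",  # neither toy table has this pair
}

# Keep only the pairs where both columns exist in the loaded tables
usable_pairs = {
    spc_col: nts_col
    for spc_col, nts_col in candidate_pairs.items()
    if spc_col in spc_people.columns and nts_col in nts_individuals.columns
}
print(usable_pairs)
```

Pairs that drop out of `usable_pairs` point to variables that are missing from one dataset or need renaming before matching.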


## Step 2: Prepare the SPC data for matching 

## Step 3: Prepare the NTS data for matching

In [64]:
# path where datasets are stored
path_psu = "../data/nts/UKDA-5340-tab/tab/psu_eul_2002-2022.tab"
psu = pd.read_csv(path_psu, sep="\t")

path_individuals = "../data/nts/UKDA-5340-tab/tab/individual_eul_2002-2022.tab"
individuals = pd.read_csv(path_individuals, sep="\t")

path_households = "../data/nts/UKDA-5340-tab/tab/household_eul_2002-2022.tab"
households = pd.read_csv(path_households, sep="\t")

path_trips = "../data/nts/UKDA-5340-tab/tab/trip_eul_2002-2022.tab"
trips = pd.read_csv(path_trips, sep="\t")


In [66]:
# what year do we want to look at?
year = 2022

# The survey year is stored in the PSU table, in the 'SurveyYear' column.
# Keep only the PSU rows for the chosen year.
psu_filtered = psu[psu['SurveyYear'] == year]

# Get the 'PSUID' values for the chosen year
psu_id_year = psu_filtered['PSUID'].unique()

In [67]:
# Filter the dataframes based on the chosen year
individuals_year = individuals[individuals['PSUID'].isin(psu_id_year)]
households_year = households[households['PSUID'].isin(psu_id_year)]
trips_year = trips[trips['PSUID'].isin(psu_id_year)]

In [87]:
individuals['EdAttn3_B01ID'].unique()

array([-10,   2,  -9,   1,  -8])
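The output above mixes negative values with the codes 1 and 2. Assuming that negative codes flag missing or not-applicable responses (an assumption that should be checked against the NTS documentation), one option is to mask them before matching. The sketch below uses a toy series that reuses the values from the output above, not the real data.

```python
import pandas as pd

# Toy series reusing the code values seen in the output above
education = pd.Series([-10, 2, -9, 1, -8, 2], name="EdAttn3_B01ID")

# ASSUMPTION: negative codes mark missing / not-applicable answers
education_clean = education.mask(education < 0)

# Count masked entries and list the codes that remain
n_missing = int(education_clean.isna().sum())
valid_codes = sorted(education_clean.dropna().unique().tolist())
print(n_missing, valid_codes)
```

Masking converts the column to floats so that it can hold `NaN`. If integer codes are needed downstream, pandas' nullable `Int64` dtype is one alternative.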

## Step 4: Statistical Matching