# Converting the National Travel Survey into a Simple MATSim Format Population

This notebook demonstrates an example workflow for converting tabular diary data (household attributes, person attributes and trip data) into MATSim-formatted XML population data for Cambridge households.

This includes:
- pre-processing of tabular inputs
- loading data into pam
- household sampling
- facility sampling
- preliminary investigation
- writing to xml

This example is highly simplified. Of particular note: the diary data used is spatially very aggregate (trip locations are aggregated to Government Office Regions). This creates significant variance in the sampled trip lengths. Generally we would expect more precise spatial data to be used. Alternatively, the facility sampling step could be made more sophisticated to better account for known trip features such as mode and duration.

The diary data used is available from the UK Data Service (https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=5340) and is described here: http://doc.ukdataservice.ac.uk/doc/5340/mrdoc/pdf/5340_nts_user_guidance_1995-2016.pdf

In [None]:
use_dummy_data = False

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import geopandas as gp
import os
from matplotlib import pyplot as plt
from copy import deepcopy
from tqdm import tqdm

In [None]:
out_dir = '../outputs'  # outputs are written here

# required inputs from the National Travel Survey
if use_dummy_data:
    households_csv = './data/dummyNTS/householdeul2017.tab'
    individuals_csv = './data/dummyNTS/individualeul2017.tab'
    trips_csv = './data/dummyNTS/tripeul2017.tab'

else:
    households_csv = '../data/inputs/UKDA-5340-tab/tab/household_eul_2002-2020.tab'
    individuals_csv = '../data/inputs/UKDA-5340-tab/tab/individual_eul_2002-2020.tab'
    trips_csv = '../data/inputs/UKDA-5340-tab/tab/trip_eul_2002-2020.tab'


## Load households data

1. Load household data into pandas DataFrame.
2. Create some mappings of participation and weighting by household for use later. These are described in http://doc.ukdataservice.ac.uk/doc/5340/mrdoc/pdf/5340_nts_user_guidance_1995-2016.pdf
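
As a minimal sketch of how these lookups are used later, the example below builds the two mappings from made-up household records and attaches them to toy trips. The IDs and values are invented for illustration; the real codes are defined in the NTS guidance. Plain Python dictionaries are used so the logic is visible without the survey files.

```python
# Toy household records: (HouseholdID, W1, W2). All values are invented.
households = [(101, 1, 0.8), (102, 0, 1.1), (103, 2, 1.4)]

participation = {hid: w1 for hid, w1, _ in households}
weight = {hid: w2 for hid, _, w2 in households}

# Toy trips, each tagged with the household it belongs to.
trips = [
    {"TripID": 1, "HouseholdID": 101},
    {"TripID": 2, "HouseholdID": 102},
    {"TripID": 3, "HouseholdID": 103},
]

for trip in trips:
    trip["participation"] = participation.get(trip["HouseholdID"])
    trip["hh_weight"] = weight.get(trip["HouseholdID"])

# Keep trips whose household has a participation flag of 1 or 2.
kept = [t for t in trips if t["participation"] in (1, 2)]
print([t["TripID"] for t in kept])
```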

In [None]:
# Column adjustments
# OutCom_B02ID is missing and is replaced by W1
# HRPSEGWorkStat_B01ID is missing

hh_in = pd.read_csv(
    households_csv,
    sep='\t',
    usecols=['HouseholdID', 'SurveyYear', 'PSUID', 'W2', 'W1',
       'HHIncome2002_B02ID', 'AddressType_B01ID', 'Ten1_B02ID',
       'Landlord_B01ID', 'ResLength_B01ID', 'HHoldCountry_B01ID',
       'HHoldGOR_B02ID', 'HHoldNumAdults', 'HHoldNumChildren',
       'HHoldNumPeople', 'HHoldStruct_B02ID', 'NumLicHolders',
       'HHoldEmploy_B01ID', 'NumVehicles', 'NumBike', 'NumCar', 'NumMCycle',
       'NumVanLorry', 'NumCarVan', 'WalkBus_B01ID', 'Getbus_B01ID',
       'WalkRail_B01ID', 'WalkRailAlt_B01ID',
       'HRPWorkStat_B02ID', 'HRPSEGWorkStat_B01ID', 'HHoldOAClass2011_B03ID',
       'Settlement2011EW_B03ID'],
)

# Coerce numeric columns; entries that cannot be parsed become NaN
numeric_cols = [
    'HHIncome2002_B02ID', 'NumLicHolders', 'NumVehicles', 'NumCar',
    'NumMCycle', 'NumVanLorry', 'NumCarVan', 'Settlement2011EW_B03ID',
    # 'Settlement2011EW_B04ID',
]
for col in numeric_cols:
    hh_in[col] = pd.to_numeric(hh_in[col], errors='coerce')


hh_in.head()

In [None]:
participation_mapping = dict(zip(hh_in.HouseholdID, hh_in.W1))  # W1 stands in for the missing OutCom_B02ID
weight_mapping = dict(zip(hh_in.HouseholdID, hh_in.W2))

## Load person data

Load person attributes data into pandas DataFrame.

In [None]:
persons_in = pd.read_csv(
    individuals_csv,
    sep='\t',
    usecols=['SurveyYear', 'IndividualID', 'HouseholdID', 'PSUID', 'VehicleID',
       'PersNo', 'Age_B01ID', 'OfPenAge_B01ID', 'Sex_B01ID', 'EdAttn1_B01ID',
       'EdAttn2_B01ID', 'EdAttn3_B01ID', 'DrivLic_B02ID', 'CarAccess_B01ID',
       'DrivDisable_B01ID', 'WkPlace_B01ID', 'ES2000_B01ID', 'NSSec_B03ID',
       'SC_B01ID', 'Stat_B01ID', 'SVise_B01ID', 'EcoStat_B02ID',
       'PossHom_B01ID']
)
persons_in.head()

## Load trip data

1. Load trip data into a pandas DataFrame.
2. Apply some preliminary formatting.
3. Rename columns so that we can use the PAM read method:


- pid - person ID
- hid - household ID
- seq - trip sequence number
- hzone - household zone
- ozone - trip origin zone
- dzone - trip destination zone
- purp - trip purpose
- mode - trip mode
- tst - trip start time (minutes)
- tet - trip end time (minutes)
- freq - weighting for representative population
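
The sketch below shows how `tst` and `tet` can be read, assuming they are minutes since midnight (our reading of the list above, not something the notebook checks). The trip record and the helper name `minutes_to_hhmm` are hypothetical.

```python
def minutes_to_hhmm(minutes):
    """Format minutes since midnight as HH:MM (hours can exceed 23 for trips running past midnight)."""
    hours, mins = divmod(int(minutes), 60)
    return f"{hours:02d}:{mins:02d}"

# Hypothetical trip record using the renamed columns.
trip = {"pid": "p1", "seq": 1, "tst": 510, "tet": 545}

duration = trip["tet"] - trip["tst"]  # trip duration in minutes
start = minutes_to_hhmm(trip["tst"])
end = minutes_to_hhmm(trip["tet"])
print(start, end, duration)
```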

In [None]:
# Column adjustments
# TripDestUA2009_B01ID replaced by TripDestGOR_B02ID
# TripOrigUA2009_B01ID replaced by TripOrigGOR_B02ID

travel_diaries_in = pd.read_csv(
    trips_csv,
    sep='\t',
    usecols=['TripID', 'SurveyYear', 'DayID', 'IndividualID', 'HouseholdID', 'PSUID',
       'PersNo', 'TravDay', 'JourSeq', 'ShortWalkTrip_B01ID', 'NumStages',
       'MainMode_B04ID', 'TripPurpFrom_B01ID',
        'TripPurpTo_B01ID', 'TripPurpose_B04ID',
       'TripStart', 'TripEnd', 'TripOrigGOR_B02ID', 'TripDestGOR_B02ID'],
#     dtype={"W5": np.float64,}
)

travel_diaries_in.TripStart = pd.to_numeric(travel_diaries_in.TripStart, errors='coerce')
travel_diaries_in.TripEnd = pd.to_numeric(travel_diaries_in.TripEnd, errors='coerce')

travel_diaries_in.head()

In [None]:
travel_diaries_in['participation'] = travel_diaries_in.HouseholdID.map(participation_mapping)
travel_diaries_in['hh_weight'] = travel_diaries_in.HouseholdID.map(weight_mapping)

In [None]:
travel_diaries = travel_diaries_in.loc[travel_diaries_in.participation.isin([1,2])].copy()  # copy so the in-place rename below does not act on a view

In [None]:
travel_diaries.head()

In [None]:
# TripOrigUA2009_B01ID was replaced by TripOrigGOR_B02ID
# TripDestUA2009_B01ID was replaced by TripDestGOR_B02ID
# TripPurpose_B04ID was replaced by 

travel_diaries.rename(
    columns={  # rename data
        'JourSeq': 'seq',
        'TripOrigGOR_B02ID': 'ozone',
        'TripDestGOR_B02ID': 'dzone',
        'TripPurpFrom_B01ID': 'oact',
        'TripPurpTo_B01ID': 'dact',
        'MainMode_B04ID': 'mode',
        'TripStart': 'tst',
        'TripEnd': 'tet',
    },
    inplace=True)

travel_diaries.head()

In [None]:
travel_diaries.dtypes

In [None]:
def check_uniques(df):
    """Print each column name and, for columns with fewer than 1000 distinct values, those values."""
    for c in df.columns:
        print(c)
        n = df[c].nunique()
        if n < 1000:
            print(df[c].unique())

In [None]:
check_uniques(travel_diaries)

## Area Mapping

The NTS documentation refers to 'modified' 2009 Unitary Authorities. The unmodified 2017 UAs are included below for reference. The 2017 UA names are similar to, but not the same as, the NTS mappings.

Found here: https://data.gov.uk/dataset/4e1d5b2c-bb91-42ad-b420-f7fcab638389/counties-and-unitary-authorities-december-2017-full-extent-boundaries-in-uk-wgs84.

High resolution spatial boundaries can be found here: https://data.cambridgeshireinsight.org.uk/dataset/output-areas and https://martinjc.github.io/UK-GeoJSON/

We have built our own geometry:

In [None]:
area_path = "../data/inputs/cambridge/cambridge_wards.geojson"
region_path = "../data/inputs/cambridge/cambridge_boundary.geojson"
# "./data/Counties_Unitary_Authorities_dec2017_full_extent/NTS_boundaries.geojson"
# "./data/cambridge/cambridge_bufferred_convex_hull_output_areas.geojson" 

In [None]:
# Import areas of interest
areas = gp.read_file(area_path)
region = gp.read_file(region_path)

In [None]:
areas.head()

In [None]:
areas.plot(figsize=(6,6))
region.plot(figsize=(6,6))

## Clean out incomplete plans
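
The cells below drop every person-day whose trips contain a missing value or an unknown zone (coded `-8`). A minimal sketch of that rule with plain Python and invented records, where the helper `is_complete` is a hypothetical name:

```python
from collections import defaultdict

UNKNOWN_ZONE = -8  # zone code treated as unknown

# Invented trips for three person-days.
trips = [
    {"IndividualID": 1, "TravDay": 1, "ozone": 6, "dzone": 6},
    {"IndividualID": 1, "TravDay": 1, "ozone": 6, "dzone": 7},
    {"IndividualID": 2, "TravDay": 1, "ozone": 6, "dzone": UNKNOWN_ZONE},
    {"IndividualID": 2, "TravDay": 1, "ozone": 6, "dzone": 6},
    {"IndividualID": 3, "TravDay": 2, "ozone": None, "dzone": 6},
]

# Group trips into plans, one plan per person and travel day.
plans = defaultdict(list)
for t in trips:
    plans[(t["IndividualID"], t["TravDay"])].append(t)

def is_complete(plan):
    """A plan is kept only if no trip has a missing value or an unknown zone."""
    for trip in plan:
        if any(value is None for value in trip.values()):
            return False
        if UNKNOWN_ZONE in (trip["ozone"], trip["dzone"]):
            return False
    return True

clean = [t for plan in plans.values() if is_complete(plan) for t in plan]
print(len(trips), len(clean))
```

Note that one bad trip removes the whole person-day, not just the offending row.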

In [None]:
import_clean_travel_diaries = True

In [None]:
def remove_broken_plans(plan):
    """Return the plan unchanged, or None if it has missing values or an unknown (-8) zone."""
    if plan.isnull().values.any():
        return None
    for col in ['ozone', 'dzone']:
        if (plan[col] == -8).any():
            return None
    return plan

In [None]:
if import_clean_travel_diaries:
    # Import clean travel diaries
    clean_travel_diaries = pd.read_csv('../data/outputs/UKDA-5340-processed/clean_travel_diaries.csv')

In [None]:
if not import_clean_travel_diaries:
    clean_travel_diaries = travel_diaries.groupby(
        ['IndividualID', 'TravDay']
    ).apply(
        remove_broken_plans
    ).reset_index(drop=True)
    
    print('Exporting clean travel diaries')
    # Export clean travel diaries
#     clean_travel_diaries.to_csv('./data/UKDA-5340-processed/clean_travel_diaries.csv')#crs="EPSG:27700", to_crs="EPSG:4326")

In [None]:
clean_travel_diaries.head()

In [None]:
print(len(travel_diaries))
print(len(clean_travel_diaries))

## Build Mappings and apply to common fields

We simplify key trip variables such as mode and activity.
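
Codes that are absent from a mapping become missing values after `Series.map`, so it can be worth listing them before applying it. A small sketch in plain Python, using a shortened illustrative mapping and a made-up unmapped code:

```python
mode_mapping = {1: "walk", 2: "bike", 3: "car"}  # shortened, illustrative mapping
observed_codes = [1, 3, 3, 2, 14, 1]             # 14 is a made-up code with no mapping

# Codes present in the data but not in the mapping.
unmapped = sorted(set(observed_codes) - set(mode_mapping))

# dict.get returns None for unmapped codes, much as Series.map yields NaN.
mapped = [mode_mapping.get(code) for code in observed_codes]

print(unmapped)
print(mapped)
```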

In [None]:
def string_to_dict(string):
    """used to build dicts from NTS rtf format dictionaries (cut and paste from the NTS documentation)"""
    mapping = {}
    for line in string.split("\n"):
        _, v, l = line.split("\t")
        v = v.split(" = ")[1]
        l = l.split(" = ")[1]
        mapping[float(v)] = str(l)
    return mapping

In [None]:
mode_mapping = {
    1: 'walk',
    2: 'bike',
    3: 'car',   # 'Car/van driver'
    4: 'car',   # 'Car/van passenger'
    5: 'car',   # 'Motorcycle'
    6: 'car',   # 'Other private transport'
    7: 'bus',   # 'Bus in London'
    8: 'bus',   # 'Other local bus'
    9: 'bus',   # 'Non-local bus'
    10: 'rail', # 'London Underground'
    11: 'rail', # 'Surface Rail'
    12: 'car',  # 'Taxi/minicab'
    13: 'car',  # 'Other public transport'
    -10: 'DEAD',
    -8: 'NA'
}

purp_mapping = {
    1: 'work',
    2: 'work',  # 'In course of work'
    3: 'education',
    4: 'shop',  # 'Food shopping'
    5: 'shop',  # 'Non food shopping'
    6: 'medical',  # 'Personal business medical'
    7: 'other',  # 'Personal business eat/drink'
    8: 'other',  # 'Personal business other'
    9: 'other',  # 'Eat/drink with friends'
    10: 'visit',  # 'Visit friends'
    11: 'other',  # 'Other social'
    12: 'other',  # 'Entertain/ public activity'
    13: 'other',  # 'Sport: participate'
    14: 'home',  # 'Holiday: base'
    15: 'other',  # 'Day trip/just walk'
    16: 'other',  # 'Other non-escort'
    17: 'escort_home',  # 'Escort home'
    18: 'escort_work',  # 'Escort work'
    19: 'escort_work',  # 'Escort in course of work'
    20: 'escort_education',  # 'Escort education'
    21: 'escort_shop',  # 'Escort shopping/personal business'
    22: 'escort_other',  # 'Other escort'
    23: 'home',  # 'Home'
    -10: 'DEAD',
    -8: 'NA'
}

clean_travel_diaries['mode'] = clean_travel_diaries['mode'].map(mode_mapping)
clean_travel_diaries['oact'] = clean_travel_diaries['oact'].map(purp_mapping)
clean_travel_diaries['dact'] = clean_travel_diaries['dact'].map(purp_mapping)

## Reweight and Split Days

To get the most from our small sample, we treat each individual diary day as a new person. To maintain the original household weighting, we divide each weight by the person's number of diary days.
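
A worked example of the reweighting with invented numbers: a person with household weight 1.2 and two diary days becomes two agents with weight 0.6 each, so the total weight is preserved. Plain Python is used here to keep the arithmetic visible.

```python
from collections import defaultdict

# Invented trips: one person (IndividualID 7), household weight 1.2, two diary days.
trips = [
    {"IndividualID": 7, "HouseholdID": 3, "TravDay": 1, "DayID": 71, "hh_weight": 1.2},
    {"IndividualID": 7, "HouseholdID": 3, "TravDay": 1, "DayID": 71, "hh_weight": 1.2},
    {"IndividualID": 7, "HouseholdID": 3, "TravDay": 2, "DayID": 72, "hh_weight": 1.2},
]

# Count the distinct diary days recorded for each person.
days = defaultdict(set)
for t in trips:
    days[t["IndividualID"]].add(t["DayID"])

for t in trips:
    t["freq"] = t["hh_weight"] / len(days[t["IndividualID"]])
    t["pid"] = f"{t['IndividualID']}_{t['TravDay']}"  # one new person per diary day
    t["hid"] = f"{t['HouseholdID']}_{t['TravDay']}"   # and one new household

agent_weights = {t["pid"]: t["freq"] for t in trips}
print(agent_weights)
```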

In [None]:
import_trips = True

In [None]:
if import_trips:
    # Import pre-processed trips
    trips = pd.read_csv('../data/outputs/UKDA-5340-processed/trips.csv')

In [None]:
# reweight and split ids for unique days

def reweight(group):
    """
    Reweight based on multiple diary days, i.e. if an agent has two diary days, we treat these as
    two unique agents, so we halve the original weighting
    """
    group['freq'] = group.hh_weight / group.DayID.nunique()
    return group
    
if not import_trips:

    # Make sure household weights are floats
    clean_travel_diaries.hh_weight = clean_travel_diaries.hh_weight.astype(float)

    # Add the per-day weight 'freq' to every trip of each person
    trips = clean_travel_diaries.groupby('IndividualID', group_keys=False).apply(reweight)
    trips['pid'] = [f"{p}_{d}" for p, d in zip(trips.IndividualID, trips.TravDay)]
    trips['hid'] = [f"{h}_{d}" for h, d in zip(trips.HouseholdID, trips.TravDay)]

    # Drop stray index columns left over from earlier csv round trips, if present
    trips = trips.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'], errors='ignore')
    # Look up the household zone; .values avoids misalignment with the merge's fresh index
    trips['hzone'] = pd.merge(
        trips, hh_in[['HouseholdID', 'PSUID', 'HHoldGOR_B02ID']],
        on=['HouseholdID', 'PSUID'], how='left'
    )['HHoldGOR_B02ID'].values
    # Export trips
#     trips.to_csv('./data/UKDA-5340-processed/trips.csv')

In [None]:
def expand_days(
    trips,
    target,
    trips_on='Diary_number',
    target_on='Diary_number',
    new_id='pid',
    trim=True
):
    """
    Expand target df so that it has one copy of each matching row per new_id found in trips.
    Rows are matched between trips[trips_on] and target[target_on].
    The result is indexed by new_id.
    """
    print("Building mapping.")
    mapping = {}
    for i, person in trips.groupby(trips_on):
        mapping[i] = list(set(person[new_id]))
    n = len(mapping)

    if trim:
        print("Trimming target.")
        selection = set(trips[trips_on])
        target = target.loc[target[target_on].isin(selection)]

    # Collect the pieces and concatenate once; DataFrame.append was removed in pandas 2.0
    splits = []
    for p, (i, ids) in enumerate(mapping.items()):
        if not p % 10:
            print(f"Building expanded data {p}/{n}", end='\r', flush=True)
        rows = target.loc[target[target_on] == i]
        for idx in ids:
            splits.append(rows.assign(**{new_id: idx}))
    expanded = pd.concat(splits)
    expanded.set_index(new_id, inplace=True)
    print("Done")
    return expanded

In [None]:
import_hhs = True

In [None]:
if not import_hhs:
    hhs = expand_days(
        trips,
        hh_in,
        trips_on='HouseholdID',
        target_on='HouseholdID',
        new_id='hid'
    )
    hhs = hhs.rename(columns={"HHoldGOR_B02ID":"hzone"})
#     hhs.to_csv('./data/UKDA-5340-processed/hhs.csv')
else:
    hhs = pd.read_csv('../data/outputs/UKDA-5340-processed/hhs.csv')

In [None]:
hhs.head()

In [None]:
import_people = True

In [None]:
if not import_people:
    people = expand_days(
        trips,
        persons_in,
        trips_on='IndividualID',
        target_on='IndividualID',
        new_id='pid'
    )
#     people.to_csv('./data/UKDA-5340-processed/people.csv')
else:
    people = pd.read_csv('../data/outputs/UKDA-5340-processed/people.csv')

In [None]:
people.head()

## Subset trips, people and households
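
The selection in the following cells keeps a household if it is located in the East of England or has at least one trip that starts and ends there, and if it was surveyed before 2020. The set algebra can be sketched with made-up household IDs:

```python
# Made-up household ids; the prefix mimics a survey year.
home_in_region = {"2018_1", "2019_2", "2020_3"}      # households located in the region
trips_in_region = {"2019_2", "2017_4", "2020_5"}     # households with an intra-region trip
pre_2020 = {"2018_1", "2019_2", "2017_4", "2016_6"}  # households surveyed before 2020

# (home in region OR trip within region) AND surveyed before 2020
selected = (home_in_region | trips_in_region) & pre_2020
print(sorted(selected))
```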

In [None]:
export_cambridge_data = True

In [None]:
# Get all household, trip and people ids
all_hh_ids_len = hhs.HouseholdID.nunique()
all_trip_ids_len = trips.TripID.nunique()
all_people_ids_len = people.shape[0]#IndividualID.nunique()


# Build three sets of household ids:
# (a) households located in the East of England (HHoldGOR_B02ID, renamed hzone, equals 6)
cambridge_domestic_hh_ids = set(hhs[hhs.hzone.isin([6])].HouseholdID.unique().tolist())
# (b) households with at least one trip whose origin and destination are both in the East of England (6)
cambridge_domestic_trip_hh_ids = set(trips[(trips.ozone == trips.dzone) & (trips.ozone.isin([6]))].HouseholdID.unique().tolist())
# (c) households from before January 2020 (pandemic), i.e. ids that do not start with '2020'
cambridge_2002_2019_hh_ids = set(hhs[hhs.HouseholdID.apply(lambda x: not str(x).startswith('2020'))].HouseholdID.unique().tolist())

# Keep households in (a) or (b) that are also in (c)
cambridge_hh_ids = list((cambridge_domestic_hh_ids.union(cambridge_domestic_trip_hh_ids)).intersection(cambridge_2002_2019_hh_ids))

# Subset household, trips and people based on household ids
cambridge_hhs = hhs[hhs.HouseholdID.isin(cambridge_hh_ids)].reset_index()#.set_index('hid')
cambridge_trips = trips[trips.HouseholdID.isin(cambridge_hh_ids)].reset_index(drop=True)
cambridge_people = people[people.HouseholdID.isin(cambridge_hh_ids)].reset_index()#.set_index('pid')

In [None]:
cambridge_trip_ids_len = len(cambridge_trips.TripID.unique())
cambridge_people_ids_len = cambridge_people.shape[0]


print(f'{len(cambridge_hh_ids)} households maintained ({int(100*len(cambridge_hh_ids)/all_hh_ids_len)} % of total households available) ')
print(f'{cambridge_trip_ids_len} trips maintained ({int(100*cambridge_trip_ids_len/all_trip_ids_len)} % of total trips available) ')
print(f'{cambridge_people_ids_len} people maintained ({int(100*cambridge_people_ids_len/all_people_ids_len)} % of total people available) ')

In [None]:
# Drop unnecessary columns
cambridge_hhs.drop(columns=["index", "Unnamed: 0"], inplace=True)
cambridge_trips.drop(columns=["Unnamed: 0", "Unnamed: 0.1", "Unnamed: 0.1.1"], inplace=True)
cambridge_people.drop(columns=["index"], inplace=True)
# Reset index
# cambridge_hhs = cambridge_hhs.reset_index()
# cambridge_people = cambridge_people.reset_index()

In [None]:
# Export cambridge NTS tables
if export_cambridge_data:
    cambridge_hhs.to_csv('../data/outputs/UKDA-5340-processed/cambridge_hhs.csv')
    cambridge_people.to_csv('../data/outputs/UKDA-5340-processed/cambridge_people.csv')
    cambridge_trips.to_csv('../data/outputs/UKDA-5340-processed/cambridge_trips.csv')

## Load into PAM

We load the pandas-formatted data into PAM using the `pam.read.load_travel_diary` read method, then run some very preliminary validation and assurance of the resulting plans.

In [None]:
from pam import write
from pam import read
from pam.plot.stats import plot_activity_times, plot_leg_times

In [None]:
cambridge_trips.tst = cambridge_trips.tst.astype(int)
cambridge_trips.tet = cambridge_trips.tet.astype(int)

In [None]:
population = read.load_travel_diary(
    trips=cambridge_trips,
    persons_attributes=cambridge_people,
    hhs_attributes=cambridge_hhs,
    trip_freq_as_person_freq=True
)

In [None]:
population.fix_plans()

In [None]:
# this should be replaced with a more direct method
for hh in tqdm(population.households.values()):
    for p in hh.people.values():
        p.validate()

In [None]:
population.size  # this also accounts for the weighting

In [None]:
population.stats

In [None]:
population.activity_classes

In [None]:
population.mode_classes

In [None]:
plot_activity_times(population)

In [None]:
plot_leg_times(population)

In [None]:
# night shift @ 2016008863_6

In [None]:
hh = population.random_household()
hh.print()
hh.plot()

In [None]:
population.activity_classes

## Sample the Population

We sample a very small population based on the given NTS household weightings.
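
The sketch below illustrates one way weight-based sampling can work: stochastic rounding of `freq * sample / 100`. This is an assumption about how `freq_sample` behaves, with the second argument read as a percentage; it is not taken from the PAM source, and `freq_sample_sketch` is a hypothetical stand-in.

```python
import random

def freq_sample_sketch(freq, sample_percent, rng=random):
    """Hypothetical stand-in for freq_sample; behaviour assumed, not verified against PAM."""
    expected = freq * sample_percent / 100
    whole = int(expected)
    # Round up with probability equal to the fractional part of the expected count.
    return whole + (1 if rng.random() < expected - whole else 0)

rng = random.Random(0)
draws = [freq_sample_sketch(5, 10, rng) for _ in range(10_000)]
mean_copies = sum(draws) / len(draws)
print(mean_copies)  # expected to be near 0.5, the expected number of copies
```

Under this assumption a household with weight 5 sampled at 10% is copied either zero times or once, and on average the sampled population size matches 10% of the weighted total.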

In [None]:
from pam.core import Population
from pam.samplers.basic import freq_sample
# from copy import deepcopy

population_sample = Population()
    
for hid, household in tqdm(population.households.items()):
    av_hh_weight = household.freq  # this is currently the av of person freq in the hh
    freq = freq_sample(av_hh_weight, 10)

    for idx in range(freq):
        hh = deepcopy(household)
        hh.hid = f"{hh.hid}_{idx}"
        hh.people = {}
        for pid, person in household.people.items():
            p = deepcopy(person)
            p.pid = f"{pid}_{idx}"
            hh.add(p)
        population_sample.add(hh)


In [None]:
population.size

In [None]:
population_sample.size

## Facility Sampling

The facilities input is prepared using a separate project called OSM-Facility Sampler (OSMFS). This project would be better named the OSM Facility *Extractor*. We use it to extract viable activity locations for each activity type for each zone. This project is not currently open source, but is described below:

OSMFS joins osm data with the geographies of an area to create a mapping between zones, acts and facility locations (points). This is output as a geojson:

{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"activity": "other"}, "geometry": {"type": "Point", "coordinates": [-4.5235751, 54.1698685]}},

todo: the current methodology does not support shared facilities, ie facilities with more than one activity (schools are places of education and work for example).

todo: the above json has to be rejoined with the geography to create a spatial sampler. This is a duplicated operation which could be included in the Bench output, eg:

zone_id: activity: (id, point)
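
A minimal sketch of the `zone_id: activity: (id, point)` structure, built from two toy features. It assumes each feature already carries a hypothetical `zone` property; the OSMFS output shown above does not, which is why the spatial join is currently needed.

```python
import json
from collections import defaultdict

# Two toy features; the "zone" property and coordinates are invented.
geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"id": "0", "type": "Feature", "properties": {"activity": "other", "zone": "A"},
   "geometry": {"type": "Point", "coordinates": [0.12, 52.20]}},
  {"id": "1", "type": "Feature", "properties": {"activity": "work", "zone": "A"},
   "geometry": {"type": "Point", "coordinates": [0.13, 52.21]}}
]}
""")

# zone_id -> activity -> list of (facility id, point)
sampler_index = defaultdict(lambda: defaultdict(list))
for feature in geojson["features"]:
    props = feature["properties"]
    point = tuple(feature["geometry"]["coordinates"])
    sampler_index[props["zone"]][props["activity"]].append((feature["id"], point))

print(dict(sampler_index["A"]))
```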

In [None]:
from pam.samplers import facility

In [None]:
def load_facilities(path, from_crs="EPSG:4326", to_crs="EPSG:27700"):
    """Read facilities, treating their coordinates as from_crs and returning them in to_crs."""
    facilities = gp.read_file(path)
    facilities = facilities.rename(columns={"activities":"activity"})
    # Label the data with its source CRS, overriding whatever the file declares
    facilities = facilities.set_crs(from_crs, allow_override=True)
    # Reproject if necessary
    if from_crs != to_crs:
        print('Reprojecting facilities')
        facilities = facilities.to_crs(to_crs)
    return facilities

def load_zones(zones_path, from_crs="EPSG:27700", to_crs="EPSG:27700"):
    """Read zones indexed by ward_code, reprojecting from from_crs to to_crs if they differ."""
    zones = gp.read_file(zones_path)
    zones.set_index('ward_code', inplace=True)
    if from_crs != to_crs:
        zones = zones.set_crs(from_crs, allow_override=True).to_crs(to_crs)
    return zones

In [None]:
# Import wards and facilities in Cambridge
cambridge_facilities_path = './data/cambridge/cambridge_facilities_spread_filtered.geojson'
# './data/cambridge/cambridge_facilities_filtered.geojson'
cambridge_facilities = load_facilities(cambridge_facilities_path, from_crs="EPSG:27700")
wards = load_zones(area_path, from_crs="EPSG:27700")
# wards = wards.reset_index()

### Visualise wards and facilities

In [None]:
fig, ax = plt.subplots(1,1, figsize = (10,7))
wards.boundary.plot(ax = ax)
cambridge_facilities.plot(ax=ax, markersize=2, color='purple')
# for zone, centroid in zip(zones.index, zones.centroid):
#     ax.annotate(zone, xy = (centroid.x, centroid.y), size = 15)
ax.axis('off')
plt.show()

In [None]:
cambridge_facilities.head(3)

In [None]:
cambridge_facilities.activity.unique()

### Launch facility location sampler

In [None]:
def test_sampler_distance(population, sampler, n_iterations = 200, complex_sampler = False,
                         title=None):
    distance_commute = []
    
    for i in range(n_iterations):
        if complex_sampler:
            population.sample_locs_complex(sampler)
        else:
            population.sample_locs(sampler)
        distance_commute.append(write.write_benchmarks(population)['euclidean_distance'][0])

    pd.Series(distance_commute).hist(bins = 20)
    if title is not None:
        plt.title(title)
    plt.xlabel('distance')
    plt.ylabel('frequency')
    plt.show()

In [None]:
temp = gp.sjoin(cambridge_facilities, wards, how='inner', predicate='intersects')  # 'op' was renamed 'predicate' in geopandas 0.10

In [None]:
weight_on = None
max_walk = None  # only read by the weighted branch below, which is skipped while weight_on is None
activity_areas_dict = {x:{} for x in temp['index_right'].unique()}
for (zone, act), facility_data in temp.groupby(['index_right', 'activity']):
    activity_areas_dict[zone][act] = facility_data
sampler_dict = {}
activities = list(set(cambridge_facilities.activity))
for zone in set(temp.index_right):
    sampler_dict[zone] = {}
    zone_facs = activity_areas_dict.get(zone, {})

    for act in activities:
        facs = zone_facs.get(act, None)
        if facs is not None:
            points = [(i, g) for i, g in facs.geometry.items()]
            if weight_on is not None:
                # weighted sampler
                weights = facs[weight_on]
                transit_distance = facs['transit'] if max_walk is not None else None
                sampler_dict[zone][act] = 1
            else:
                # simple sampler
                sampler_dict[zone][act] = 1
        else:
            sampler_dict[zone][act] = None

In [None]:
facility_sampler = facility.FacilitySampler(
    facilities=cambridge_facilities,
    zones=wards,
    build_xml=True,
    fail=False,
    random_default=False
)

facility_sampler.clear()

In [None]:
person = population_sample.random_person()

In [None]:
person.home_area

In [None]:
person.home.area

In [None]:
population_sample.sample_locs(facility_sampler)

In [None]:
person = population_sample.random_person()
person.plot()
person.print()

In [None]:
test_sampler_distance(population_sample, facility_sampler, complex_sampler=False,
                     title = 'Commute distance, simple sampling')

## Ward not containing any facilities

In [None]:
# # Select random facility index
# random_index = np.random.randint(0,facilities.shape[0],1)
# # Get facility for that index
# fac = gp.GeoDataFrame(cambridge_facilities.iloc[random_index,:].reset_index(drop=True))

In [None]:
# fig, ax = plt.subplots(1,1, figsize = (15,15))
# wards[wards.index == 'E05002816'].boundary.plot(ax = ax)
# facilities.plot(ax=ax, markersize=100, color='red')
# for zone, centroid in zip(wards.index, wards.centroid):
#     ax.annotate(zone, xy = (centroid.x, centroid.y), size = 10)
# ax.axis('off')
# plt.show()
# print('Ward code identified',fac.ward_code)

## Random Sampler

If a facility sampler is not available, we can fall back on random point sampling within each zone instead.
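
To illustrate the idea of drawing a random location inside a zone, here is a sketch using rejection sampling with a ray-casting point-in-polygon test. PAM's `RandomPointSampler` may use a different method; the functions and the triangular zone below are hypothetical.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point(polygon, rng=random):
    """Rejection sampling: draw from the bounding box until a point falls inside."""
    xs, ys = zip(*polygon)
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return x, y

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical zone geometry
rng = random.Random(42)
points = [random_point(triangle, rng) for _ in range(100)]
print(points[0])
```

Rejection sampling gives points that are uniform over the zone's area, at the cost of wasted draws when the zone fills little of its bounding box.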

In [None]:
from pam.samplers.spatial import RandomPointSampler

In [None]:
# zones = load_zones(area_path, from_crs="EPSG:4326")
zones = deepcopy(areas)
sampler = RandomPointSampler(geoms=zones)

In [None]:
sampler.sample(113, None)

In [None]:
population_sample.sample_locs(sampler)

In [None]:
person = population_sample.random_person()
person.plot()
person.print()

## Write to Disk

1. Write MATSim formats to disk (plans and attributes)
2. Write csv and geojson summaries to disk
3. Write MATSim-formatted facilities to disk
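
For orientation, the sketch below builds the rough shape of a MATSim plans file with the standard library. The element and attribute names reflect our understanding of the MATSim population format and are assumptions here; the file written by `write.write_matsim` is the reference, and it will contain more detail. The person id and coordinates are invented.

```python
import xml.etree.ElementTree as ET

# One person with a single selected plan: home -> (car) -> work.
population = ET.Element("population")
person = ET.SubElement(population, "person", id="person_0")  # invented id
plan = ET.SubElement(person, "plan", selected="yes")
ET.SubElement(plan, "activity", type="home", x="545000.0", y="258000.0", end_time="08:30:00")
ET.SubElement(plan, "leg", mode="car")
ET.SubElement(plan, "activity", type="work", x="546200.0", y="257100.0")

xml_text = ET.tostring(population, encoding="unicode")
print(xml_text)
```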

In [None]:
import pam.write as write

In [None]:
comment = 'NTS cambridge prelim 03122020 epsg27700' 
#'NTS london prelim 24nov2020 epsg27700'

write.write_matsim(
        population_sample,
        plans_path=os.path.join(out_dir, 'plans.xml'),
        attributes_path=os.path.join(out_dir, 'attributes.xml'),
        comment=comment
    )
population_sample.to_csv(out_dir, crs="EPSG:27700", to_crs="EPSG:4326")
# facility_sampler.write_facilities_xml(os.path.join(out_dir, 'facilities.xml'), comment=comment)