# Estimating Mandatory Tour Frequency

This notebook illustrates how to re-estimate a single model component for ActivitySim.  This process 
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles used in this notebook.  To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [None]:
import larch as lx
import pandas as pd

lx.versions()

JAX not found. Some functionality will be unavailable.


{'larch': '6.0.32',
 'sharrow': '2.13.0',
 'numpy': '1.26.4',
 'pandas': '1.5.3',
 'xarray': '2024.3.0',
 'numba': '0.60.0'}

For this demo, we will assume that you have already run ActivitySim in estimation
mode and saved the required estimation data bundles (EDBs) to disk.  See
the [first notebook](./01_estimation_mode.ipynb) for details.  The following module
runs a script to set everything up if the example data is not already available.

In [2]:
from est_mode_setup import prepare

prepare()

EDB directory already populated.


PosixPath('test-estimation-data/activitysim-prototype-mtc-extended')

# Load data and prep model for estimation

In [3]:
modelname = "mandatory_tour_frequency"

from activitysim.estimation.larch import component_model

model, data = component_model(
    modelname,
    edb_directory=f"output-est-mode/estimation_data_bundle/{modelname}/",
    return_data=True,
)

loading from output-est-mode/estimation_data_bundle/mandatory_tour_frequency/mandatory_tour_frequency_coefficients.csv
loading spec from output-est-mode/estimation_data_bundle/mandatory_tour_frequency/mandatory_tour_frequency_SPEC.csv
loading from output-est-mode/estimation_data_bundle/mandatory_tour_frequency/mandatory_tour_frequency_values_combined.parquet


# Review data loaded from the EDB

The next step is to review the data read from the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

## Coefficients

In [4]:
data.coefficients

coefficient_name,value,constrain
coef_unavailable,-999.0,T
coef_ft_worker_work2_asc,-3.3781,F
coef_pt_worker_work2_asc,-3.0476,F
coef_univ_work1_asc,2.166,F
coef_univ_work2_asc,-1.3965,F
coef_univ_school2_asc,-3.7429,F
coef_univ_work_and_school_asc,0.1073,F
coef_driving_age_child_school2_asc,-3.136,F
coef_driving_age_child_work_and_school_asc,-4.4362,F
coef_pre_driving_age_child_school2_asc,-3.9703,F


## Utility specification

In [5]:
data.spec

Unnamed: 0,Label,Description,Expression,work1,work2,school1,school2,work_and_school
0,util_ft_worker,Full-time worker alternative-specific constants,ptype == 1,0,coef_ft_worker_work2_asc,,,
1,util_pt_worker,Part-time worker alternative-specific constants,ptype == 2,0,coef_pt_worker_work2_asc,,,
2,util_univ,University student alternative-specific constants,ptype == 3,coef_univ_work1_asc,coef_univ_work2_asc,0,coef_univ_school2_asc,coef_univ_work_and_school_asc
3,util_non_working_adult,Non-working adult alternative-specific constants,ptype == 4,,,,,
4,util_retired,Retired alternative-specific constants,ptype == 5,,,,,
...,...,...,...,...,...,...,...,...
95,util_availability_driving_age_child,Unavailable: Driving-age child,ptype == 6,coef_unavailable,coef_unavailable,,,
96,util_availability_pre_driving_age_student,Unavailable: Pre-driving age child who is in s...,ptype == 7,,coef_unavailable,,,coef_unavailable
97,util_availability_pre_driving_age_not_in_school,Unavailable: Pre-driving age child who is not ...,ptype == 8,coef_unavailable,coef_unavailable,,coef_unavailable,coef_unavailable
98,util_availability_work_tours_no_usual_work_loc...,Unavailable: Work tours for those with no usua...,~(workplace_zone_id > -1),coef_unavailable,coef_unavailable,,,coef_unavailable
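
To make the layout of the specification concrete, the following is a minimal sketch of how rows like these could be turned into utilities and choice probabilities for one person, assuming a multinomial logit form. It is not ActivitySim's or Larch's implementation: the names `SPEC`, `utilities`, and `mnl_probabilities` are hypothetical, only two spec rows are included, an empty cell is assumed to contribute nothing, and each expression is assumed to evaluate to 1 or 0. Coefficient values are the starting values from the coefficients table above.

```python
import math

ALTS = ["work1", "work2", "school1", "school2", "work_and_school"]

# Starting values copied from the coefficients table above.
COEFS = {
    "coef_univ_work1_asc": 2.166,
    "coef_univ_work2_asc": -1.3965,
    "coef_univ_school2_asc": -3.7429,
    "coef_univ_work_and_school_asc": 0.1073,
    "coef_unavailable": -999.0,
}

# Illustrative subset of the spec: (expression, {alternative: coefficient name or literal}).
SPEC = [
    # util_univ: ptype == 3
    (lambda p: p["ptype"] == 3, {
        "work1": "coef_univ_work1_asc",
        "work2": "coef_univ_work2_asc",
        "school1": 0,
        "school2": "coef_univ_school2_asc",
        "work_and_school": "coef_univ_work_and_school_asc",
    }),
    # availability row: ~(workplace_zone_id > -1)
    (lambda p: not (p["workplace_zone_id"] > -1), {
        "work1": "coef_unavailable",
        "work2": "coef_unavailable",
        "work_and_school": "coef_unavailable",
    }),
]

def utilities(person):
    """Sum coefficient * expression value over spec rows, per alternative."""
    u = dict.fromkeys(ALTS, 0.0)
    for expr, cells in SPEC:
        x = float(expr(person))  # True/False becomes 1.0/0.0
        for alt, cell in cells.items():
            coef = COEFS[cell] if isinstance(cell, str) else float(cell)
            u[alt] += coef * x
    return u

def mnl_probabilities(u):
    """Multinomial logit probabilities, shifted by the max utility for stability."""
    top = max(u.values())
    expu = {alt: math.exp(v - top) for alt, v in u.items()}
    total = sum(expu.values())
    return {alt: e / total for alt, e in expu.items()}

# A university student with no usual workplace: every work alternative is ruled out.
student = {"ptype": 3, "workplace_zone_id": -1}
probs = mnl_probabilities(utilities(student))
print(probs)
```

The large negative `coef_unavailable` drives the exponentiated utility of the work alternatives to zero, which is how the availability rows at the bottom of the spec remove alternatives from a person's choice set.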


## Chooser data

In [6]:
data.chooser_data

household_id,person_id,model_choice,override_choice,util_ft_worker,util_pt_worker,util_univ,util_non_working_adult,util_retired,util_driving_age_child,util_pre_driving_age_child,...,auPkTotal,auOpRetail,auOpTotal,trPkRetail,trPkTotal,trOpRetail,trOpTotal,nmRetail,nmTotal,override_choice_code
1918,1918,school1,school1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,12.295952,10.006297,12.411522,5.033643,7.242282,4.864150,7.063680,5.626389,7.133756,3
3215,3215,school1,school1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,12.036005,9.698240,12.156041,0.000000,0.000000,0.000000,0.000000,1.360774,5.139792,3
4362,4362,school1,school1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,11.954228,9.566058,12.139587,0.000000,0.000000,0.000000,0.000000,3.086628,5.647842,3
5859,5859,school1,school1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,12.484185,9.994051,12.608786,2.273060,4.760981,2.104966,4.545843,4.254230,6.941860,3
6100,6100,school1,school1,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,12.718182,10.233345,12.809847,3.534863,6.375527,3.401762,6.258096,5.240538,7.175776,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2857869,7549204,school1,school1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,12.307902,10.005144,12.585021,3.812385,6.519807,3.632087,6.178083,3.718425,5.556368,3
2857903,7549238,school1,school1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,12.309276,9.986443,12.582868,3.319335,6.046532,3.207581,5.829832,4.139571,6.510036,3
2859659,7550994,school1,school1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,11.638333,9.598840,11.963516,0.916326,3.279901,0.315332,2.429541,3.824743,6.768218,3
2861083,7552418,school1,school1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,10.447823,8.697696,10.794293,0.000000,0.000000,0.000000,0.000000,3.679297,5.376192,3


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.  Larch has built-in estimation methods including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters.  BHHH is the default and typically runs faster, but does not follow constraints on parameters.
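
To illustrate why a small sample can leave a model over-specified, here is a toy sketch (not Larch's diagnostic, and unrelated to the actual EDB data) using a two-parameter binary logit. The function names and the sample rows are hypothetical. When two explanatory variables take identical values for every observation in the sample, the Hessian of the log likelihood is singular, so the two coefficients cannot be separated; adding observations where the variables differ restores a unique solution.

```python
import math

def logit_hessian(rows, beta):
    """2x2 Hessian of the negative log likelihood of a binary logit.

    Each row is (x1, x2); the Hessian is sum_i p_i * (1 - p_i) * x_i x_i^T
    and does not depend on the observed choices.
    """
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x in rows:
        v = beta[0] * x[0] + beta[1] * x[1]
        p = 1.0 / (1.0 + math.exp(-v))
        w = p * (1.0 - p)
        for i in range(2):
            for j in range(2):
                h[i][j] += w * x[i] * x[j]
    return h

def det2(h):
    """Determinant of a 2x2 matrix."""
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]

# Hypothetical small sample: x1 and x2 are identical for every observation.
small_sample = [(1.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
# Hypothetical larger sample: two extra observations where x1 and x2 differ.
larger_sample = small_sample + [(1.0, 0.0), (0.0, 1.0)]

det_small = det2(logit_hessian(small_sample, (0.0, 0.0)))
det_large = det2(logit_hessian(larger_sample, (0.0, 0.0)))
print("determinant, small sample: ", det_small)
print("determinant, larger sample:", det_large)
```

A zero (or nearly zero) determinant means the covariance matrix, which is the inverse of the Hessian, cannot be computed reliably. This is the kind of condition that a possible over-specification warning points to.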

In [7]:
model.estimate()

param_name,value,best,initvalue,minimum,maximum,nullvalue,holdfast
0,0.0,0.0,0.0,0.0,0.0,0.0,1
coef_can_walk_to_work_and_school,0.284832,0.284832,0.1391,-50.0,50.0,0.0,0
coef_can_walk_to_work_school2,0.811379,0.811379,0.7114,-50.0,50.0,0.0,0
coef_can_walk_to_work_work2,0.600928,0.600928,0.5268,-50.0,50.0,0.0,0
coef_driving_age_child_school2_asc,-3.209418,-3.209418,-3.136,-50.0,50.0,0.0,0
coef_driving_age_child_work_and_school_asc,-4.565091,-4.565091,-4.4362,-50.0,50.0,0.0,0
coef_female_school1,0.000463,0.000463,0.1592,-50.0,50.0,0.0,0
coef_female_school2,-0.016856,-0.016856,0.114,-50.0,50.0,0.0,0
coef_female_work1,0.333342,0.333342,0.1737,-50.0,50.0,0.0,0
coef_female_work2,-0.219862,-0.219862,-0.2255,-50.0,50.0,0.0,0


/Users/jpn/Git/est-mode/larch/src/larch/model/jaxmodel.py:1156: PossibleOverspecification: Model is possibly over-specified (hessian is nearly singular).
  self.calculate_parameter_covariance()


Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
0,0.000000
coef_can_walk_to_work_and_school,0.284832
coef_can_walk_to_work_school2,0.811379
coef_can_walk_to_work_work2,0.600928
coef_driving_age_child_school2_asc,-3.209418
coef_driving_age_child_work_and_school_asc,-4.565091
coef_female_school1,0.000463
coef_female_school2,-0.016856
coef_female_work1,0.333342
coef_female_work2,-0.219862

Unnamed: 0,0
0,0.0
coef_can_walk_to_work_and_school,0.284832
coef_can_walk_to_work_school2,0.811379
coef_can_walk_to_work_work2,0.600928
coef_driving_age_child_school2_asc,-3.209418
coef_driving_age_child_work_and_school_asc,-4.565091
coef_female_school1,0.000463
coef_female_school2,-0.016856
coef_female_work1,0.333342
coef_female_work2,-0.219862

Unnamed: 0,0
0,0.0
coef_can_walk_to_work_and_school,-4.8e-05
coef_can_walk_to_work_school2,0.000193
coef_can_walk_to_work_work2,-0.000216
coef_driving_age_child_school2_asc,-8.1e-05
coef_driving_age_child_work_and_school_asc,4.4e-05
coef_female_school1,-9.1e-05
coef_female_school2,0.000133
coef_female_work1,8.4e-05
coef_female_work2,-7.3e-05


### Estimated coefficients

In [8]:
model.parameter_summary()

Parameter,Value,Std Err,t Stat,Signif,Null Value,Constrained
0,0.0,0.0,,,0.0,fixed value
coef_can_walk_to_work_and_school,0.285,0.165,1.73,,0.0,
coef_can_walk_to_work_school2,0.811,0.163,4.97,***,0.0,
coef_can_walk_to_work_work2,0.601,0.103,5.81,***,0.0,
coef_driving_age_child_school2_asc,-3.21,0.23,-13.96,***,0.0,
coef_driving_age_child_work_and_school_asc,-4.57,889.0,-0.01,,0.0,
coef_female_school1,0.000463,0.265,0.00,,0.0,
coef_female_school2,-0.0169,0.166,-0.10,,0.0,
coef_female_work1,0.333,0.257,1.30,,0.0,
coef_female_work2,-0.22,0.0801,-2.75,**,0.0,
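
As a reading aid for the summary above, the sketch below recomputes t statistics from the displayed values, assuming the usual definition (estimate minus null value, divided by the standard error). The dictionary `SUMMARY` and the function `t_stat` are hypothetical names, and the 1.96 cutoff is the conventional two-sided 5% threshold rather than anything taken from Larch. Because the inputs are the rounded values shown in the table, the results may differ from the `t Stat` column in the last digit.

```python
# Rounded values copied from the parameter summary above: (value, std err, null value).
SUMMARY = {
    "coef_can_walk_to_work_school2": (0.811, 0.163, 0.0),
    "coef_driving_age_child_work_and_school_asc": (-4.57, 889.0, 0.0),
    "coef_female_work2": (-0.22, 0.0801, 0.0),
}

def t_stat(value, std_err, null_value=0.0):
    """Distance of the estimate from its null value, in standard errors."""
    return (value - null_value) / std_err

t_stats = {name: t_stat(*row) for name, row in SUMMARY.items()}

for name, t in t_stats.items():
    flag = "significant at 5%" if abs(t) > 1.96 else "not distinguishable from null"
    print(f"{name:45s} t = {t:7.2f}  ({flag})")
```

The very large standard error on `coef_driving_age_child_work_and_school_asc` gives a t statistic near zero, which suggests the sample contains too little information to identify that coefficient.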


# Output Estimation Results

In [9]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log likelihood

In [10]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)

# Next Steps

The final step is to either manually or automatically copy the `*_coefficients_revised.csv` file to the configs folder, rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.

In [11]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_unavailable,-999.0,T
1,coef_ft_worker_work2_asc,-3.494595,F
2,coef_pt_worker_work2_asc,-3.079547,F
3,coef_univ_work1_asc,2.209708,F
4,coef_univ_work2_asc,-1.47243,F
5,coef_univ_school2_asc,-3.640737,F
6,coef_univ_work_and_school_asc,0.173386,F
7,coef_driving_age_child_school2_asc,-3.209418,F
8,coef_driving_age_child_work_and_school_asc,-4.565091,F
9,coef_pre_driving_age_child_school2_asc,-4.210709,F
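
The copy-and-rename step described under Next Steps could be automated along the following lines. This is a minimal sketch using only the Python standard library; the function name `install_revised_coefficients` and the `configs_dir` argument are hypothetical, and the location of your configs folder depends on your ActivitySim setup.

```python
import shutil
from pathlib import Path

def install_revised_coefficients(result_dir, configs_dir, modelname):
    """Copy `<modelname>_coefficients_revised.csv` from result_dir into
    configs_dir as `<modelname>_coefficients.csv`, keeping a backup of
    any file that is already there."""
    src = Path(result_dir) / f"{modelname}_coefficients_revised.csv"
    dst = Path(configs_dir) / f"{modelname}_coefficients.csv"
    if not src.exists():
        raise FileNotFoundError(src)
    if dst.exists():
        # Preserve the previous coefficients next to the new file.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst

# Example call (the configs path is a placeholder, not a path from this notebook):
# install_revised_coefficients(result_dir, "path/to/configs", modelname)
```

Keeping a `.bak` copy makes it easy to revert if the simulation results with the re-estimated coefficients look unreasonable.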
