# Estimating At-work Subtour Scheduling

This notebook illustrates how to re-estimate the at-work subtour scheduling component of ActivitySim.  The process
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles used in this notebook.  To review how to do so, see the other
notebooks in this directory.

# Load libraries

In [1]:
import larch as lx
import pandas as pd

lx.versions()

JAX not found. Some functionality will be unavailable.


{'larch': '6.0.32',
 'sharrow': '2.13.0',
 'numpy': '1.26.4',
 'pandas': '1.5.3',
 'xarray': '2024.3.0',
 'numba': '0.60.0'}

For this demo, we assume that you have already run ActivitySim in estimation
mode and saved the required estimation data bundles (EDBs) to disk.  See
the [first notebook](./01_estimation_mode.ipynb) for details.  The following cell
runs a script that sets everything up if the example data is not already available.

In [2]:
from est_mode_setup import prepare

prepare()

EDB directory already populated.


PosixPath('test-estimation-data/activitysim-prototype-mtc-extended')

# Load data and prep model for estimation

In [3]:
modelname = "atwork_subtour_scheduling"

from activitysim.estimation.larch import component_model

model, data = component_model(
    modelname,
    edb_directory=f"output-est-mode/estimation_data_bundle/{modelname}/",
    return_data=True,
)

loading from output-est-mode/estimation_data_bundle/atwork_subtour_scheduling/tour_scheduling_atwork_coefficients.csv
loading from output-est-mode/estimation_data_bundle/atwork_subtour_scheduling/atwork_subtour_scheduling_SPEC.csv
loading from output-est-mode/estimation_data_bundle/atwork_subtour_scheduling/atwork_subtour_scheduling_alternatives_combined.parquet
loading from output-est-mode/estimation_data_bundle/atwork_subtour_scheduling/atwork_subtour_scheduling_choosers_combined.parquet


# Review data loaded from the EDB

The next (optional) step is to review the EDB, including the coefficients, the utility specification, and the chooser and alternatives data.

## Coefficients

In [4]:
data.coefficients

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
coef_early_start_at_5,-7.765548,F
coef_am_peak_start_at_6,-6.156718,F
coef_am_peak_start_at_7,-4.061708,F
coef_am_peak_start_at_8,-2.330535,F
coef_am_peak_start_at_9,-1.881593,F
coef_midday_start_at_10_11_12,0.0,T
coef_midday_start_at_13_14_15,-0.775022,F
coef_pm_peak_start_at_16_17_18,-0.227528,F
coef_evening_start_at_19_20_21,-1.01509,F
coef_late_start_at_22_23,-0.73757,F


## Utility specification

In [5]:
data.spec

Unnamed: 0,Label,Description,Expression,Coefficient
0,util_early_start_at_5,Early start at 5,start < 6,coef_early_start_at_5
1,util_am_peak_start_at_6,AM peak start at 6,start == 6,coef_am_peak_start_at_6
2,util_am_peak_start_at_7,AM peak start at 7,start == 7,coef_am_peak_start_at_7
3,util_am_peak_start_at_8,AM peak start at 8,start == 8,coef_am_peak_start_at_8
4,util_am_peak_start_at_9,AM peak start at 9,start == 9,coef_am_peak_start_at_9
5,util_midday_start_at_10_11_12,Midday start at 10/11/12,(start > 9) & (start < 13),coef_midday_start_at_10_11_12
6,util_midday_start_at_13_14_15,Midday start at 13/14/15,(start > 12) & (start < 16),coef_midday_start_at_13_14_15
7,util_pm_peak_start_at_16_17_18,PM peak start at 16/17/18,(start > 15) & (start < 19),coef_pm_peak_start_at_16_17_18
8,util_evening_start_at_19_20_21,Evening start at 19/20/21,(start > 18) & (start < 22),coef_evening_start_at_19_20_21
9,util_late_start_at_22_23,Late start at 22/23,start > 21,coef_late_start_at_22_23
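
To make the start-time expressions in the spec above concrete, the following sketch evaluates them with `pandas.DataFrame.eval` on a small table of start periods. The table `alts` and its range of start periods (5 through 23) are assumptions made for illustration, and this is not necessarily how ActivitySim evaluates the expressions internally.

```python
import pandas as pd

# Start-time expressions copied from the Expression column of the spec above.
expressions = {
    "util_early_start_at_5": "start < 6",
    "util_am_peak_start_at_6": "start == 6",
    "util_am_peak_start_at_7": "start == 7",
    "util_am_peak_start_at_8": "start == 8",
    "util_am_peak_start_at_9": "start == 9",
    "util_midday_start_at_10_11_12": "(start > 9) & (start < 13)",
    "util_midday_start_at_13_14_15": "(start > 12) & (start < 16)",
    "util_pm_peak_start_at_16_17_18": "(start > 15) & (start < 19)",
    "util_evening_start_at_19_20_21": "(start > 18) & (start < 22)",
    "util_late_start_at_22_23": "start > 21",
}

# Hypothetical alternatives table: one row per start period from 5 to 23.
alts = pd.DataFrame({"start": range(5, 24)})

# Evaluate each expression; True marks the rows where that utility term applies.
flags = pd.DataFrame(
    {label: alts.eval(expr) for label, expr in expressions.items()}
)
flags.index = pd.Index(alts["start"], name="start")

# Number of start-time terms that apply to each start period.
terms_per_start = flags.sum(axis=1)
print(terms_per_start)
```

Because the expressions partition the start periods, each row should switch on exactly one of these terms, which is why one coefficient (`coef_midday_start_at_10_11_12`) is held at zero as the reference.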


## Chooser data

In [6]:
data.chooser_data

Unnamed: 0,tour_id,model_choice,override_choice,person_id,tour_type,tour_type_count,tour_type_num,tour_num,tour_count,tour_category,...,auOpRetail,auOpTotal,trPkRetail,trPkTotal,trOpRetail,trOpTotal,nmRetail,nmTotal,start_previous,end_previous
0,2966559,86,113,72355,eat,1,1,1,1,atwork,...,9.915345,12.430580,6.550726,9.119016,6.446184,9.035333,5.256966,6.831275,5,5
1,3046632,145,90,74308,eat,1,1,1,1,atwork,...,10.259606,12.727344,5.452325,7.771767,5.298242,7.602426,6.443514,7.965548,5,5
2,3048108,85,124,74344,eat,1,1,1,1,atwork,...,10.259606,12.727344,5.452325,7.771767,5.298242,7.602426,6.443514,7.965548,5,5
3,3177463,99,55,77499,eat,1,1,1,1,atwork,...,7.714071,10.313852,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,5,5
4,3191832,99,145,77849,maint,1,1,1,1,atwork,...,9.763398,12.307147,1.228620,3.278544,1.239060,3.280429,4.284159,6.249753,5,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6031,308156004,99,124,7516000,eat,1,1,1,1,atwork,...,9.566058,12.139587,0.000000,0.000000,0.000000,0.000000,3.086628,5.647842,5,5
6032,308227650,155,88,7517747,maint,1,1,1,1,atwork,...,10.260330,12.748307,4.719610,7.890189,4.493279,7.643024,5.529241,8.876000,5,5
6033,308260103,102,113,7518539,eat,1,1,1,1,atwork,...,10.294354,12.880095,4.408064,7.614481,3.838266,7.011926,5.169891,8.595128,5,5
6034,309080718,55,113,7538554,eat,1,1,1,1,atwork,...,10.345456,12.953264,7.103743,9.892019,6.979741,9.773601,6.557982,8.518901,5,5


## Alternatives data

In [7]:
data.alt_values

Unnamed: 0,tour_id,start,end,duration,tdd,mode_choice_logsum,util_early_start_at_5,util_am_peak_start_at_6,util_am_peak_start_at_7,util_am_peak_start_at_8,...,util_duration_shift_for_number_of_mandatory_tours,util_start_shift_for_number_of_joint_tours,util_duration_shift_for_number_of_joint_tours,util_start_shift_for_number_of_individual_nonmandatory_tours,util_duration_shift_for_number_of_individual_nonmandatory_tours,util_dummy_for_business_related_purpose_and_duration_from_0_to_1,util_dummy_for_eating_out_purpose_and_duration_of_1_hour,util_dummy_for_eating_out_purpose_and_departure_at_11,util_dummy_for_eating_out_purpose_and_departure_at_12,util_dummy_for_eating_out_purpose_and_departure_at_13
0,2966559,10,10,0,85,0,False,False,False,False,...,0,0,0,0,0,False,False,False,False,False
1,2966559,10,11,1,86,0,False,False,False,False,...,1,0,0,0,0,False,False,False,False,False
2,2966559,10,12,2,87,0,False,False,False,False,...,2,0,0,0,0,False,False,False,False,False
3,2966559,10,13,3,88,0,False,False,False,False,...,3,0,0,0,0,False,False,False,False,False
4,2966559,10,14,4,89,0,False,False,False,False,...,4,0,0,0,0,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
457514,309112001,21,22,1,185,0,False,False,False,False,...,1,0,0,0,0,False,False,False,False,False
457515,309112001,21,23,2,186,0,False,False,False,False,...,2,0,0,0,0,False,False,False,False,False
457516,309112001,22,22,0,187,0,False,False,False,False,...,0,0,0,0,0,False,False,False,False,False
457517,309112001,22,23,1,188,0,False,False,False,False,...,1,0,0,0,0,False,False,False,False,False


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.  Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters.  BHHH is the default and typically runs faster, but does not follow constraints on parameters.
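
To illustrate what bounded estimation with SLSQP means, here is a minimal sketch that maximizes a multinomial logit log likelihood with `scipy.optimize.minimize` directly. The synthetic data, the `neg_loglike` helper, and the bounds of -25 to 25 are all assumptions made for illustration; this is not Larch's implementation and does not use the model loaded above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): 500 choosers, 3 alternatives,
# 2 attributes per alternative.
n_obs, n_alts, n_coef = 500, 3, 2
x = rng.normal(size=(n_obs, n_alts, n_coef))
true_beta = np.array([1.0, -0.5])

# Simulate choices from the MNL probabilities implied by true_beta.
utility = x @ true_beta
prob = np.exp(utility - utility.max(axis=1, keepdims=True))
prob /= prob.sum(axis=1, keepdims=True)
choice = np.array([rng.choice(n_alts, p=p) for p in prob])


def neg_loglike(beta):
    """Negative MNL log likelihood of the simulated choices."""
    v = x @ beta
    v = v - v.max(axis=1, keepdims=True)  # stabilize the exponentials
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_p[np.arange(n_obs), choice].sum()


# SLSQP accepts box bounds on every parameter, unlike an unconstrained method.
result = minimize(
    neg_loglike,
    x0=np.zeros(n_coef),
    method="SLSQP",
    bounds=[(-25.0, 25.0)] * n_coef,
)
print(result.x)
```

The estimates should land near `true_beta` while staying inside the bounds, which is the behavior to expect from a bounded optimizer on a well-identified model.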

In [8]:
model.doctor(repair_nan_utility=True)
model.doctor(repair_ch_av="-")

problem: nan_utility has (190 issues)
problem: chosen_but_not_available has (12 issues)


(<larch.Model (MNL) "None">,
 ┣ chosen_but_not_available:    altid  n      example rows
 ┃                           0     88  1              5445
 ┃                           1    100  2        1344, 5240
 ┃                           2    102  1              4696
 ┃                           3    103  1              3099
 ┃                           4    107  1              2364
 ┃                           5    113  2        3454, 4647
 ┃                           6    114  4  1537, 2506, 2800
 ┃                           7    115  1              1791
 ┃                           8    116  1              4089
 ┃                           9    120  1              2595
 ┃                           10   125  3   797, 1766, 2716
 ┃                           11   138  1              2003)

In [9]:
model.estimate(maxiter=900)

Unnamed: 0_level_0,value,best,initvalue,minimum,maximum,nullvalue,holdfast
param_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
coef_am_peak_end,-2.543876,-2.543876,-2.928312,-25.0,25.0,0.0,0
coef_am_peak_start_at_6,-5.790935,-5.790935,-6.156718,-25.0,25.0,0.0,0
coef_am_peak_start_at_7,-4.238642,-4.238642,-4.061708,-25.0,25.0,0.0,0
coef_am_peak_start_at_8,-2.371798,-2.371798,-2.330535,-25.0,25.0,0.0,0
coef_am_peak_start_at_9,-2.106247,-2.106247,-1.881593,-25.0,25.0,0.0,0
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-1.656086,-1.656086,-1.543,-25.0,25.0,0.0,0
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.371453,1.371453,1.511,-25.0,25.0,0.0,0
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.454861,2.454861,2.721,-25.0,25.0,0.0,0
coef_dummy_for_eating_out_purpose_and_departure_at_13,1.83215,1.83215,2.122,-25.0,25.0,0.0,0
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.534786,0.534786,0.3999,-25.0,25.0,0.0,0


Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
coef_am_peak_end,-2.543876
coef_am_peak_start_at_6,-5.790935
coef_am_peak_start_at_7,-4.238642
coef_am_peak_start_at_8,-2.371798
coef_am_peak_start_at_9,-2.106247
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-1.656086
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.371453
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.454861
coef_dummy_for_eating_out_purpose_and_departure_at_13,1.832150
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.534786

Unnamed: 0,0
coef_am_peak_end,-2.543876
coef_am_peak_start_at_6,-5.790935
coef_am_peak_start_at_7,-4.238642
coef_am_peak_start_at_8,-2.371798
coef_am_peak_start_at_9,-2.106247
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-1.656086
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.371453
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.454861
coef_dummy_for_eating_out_purpose_and_departure_at_13,1.83215
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.534786

Unnamed: 0,0
coef_am_peak_end,-4.854218e-05
coef_am_peak_start_at_6,1.910552e-05
coef_am_peak_start_at_7,9.916948e-07
coef_am_peak_start_at_8,4.037261e-05
coef_am_peak_start_at_9,0.0001440668
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-0.0001280892
coef_dummy_for_eating_out_purpose_and_departure_at_11,0.0001803296
coef_dummy_for_eating_out_purpose_and_departure_at_12,-0.0001483346
coef_dummy_for_eating_out_purpose_and_departure_at_13,0.0001713493
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,-0.0002135625


### Estimated coefficients

In [10]:
model.parameter_summary()

Unnamed: 0_level_0,Value,Std Err,t Stat,Signif,Null Value,Constrained
Parameter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
coef_am_peak_end,-2.54,0.162,-15.68,***,0.0,
coef_am_peak_start_at_6,-5.79,0.649,-8.93,***,0.0,
coef_am_peak_start_at_7,-4.24,0.2,-21.18,***,0.0,
coef_am_peak_start_at_8,-2.37,0.127,-18.63,***,0.0,
coef_am_peak_start_at_9,-2.11,0.111,-19.05,***,0.0,
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-1.66,0.153,-10.81,***,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.37,0.13,10.56,***,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.45,0.153,16.0,***,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_13,1.83,0.204,8.98,***,0.0,
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.535,0.128,4.17,***,0.0,


# Output Estimation Results

In [11]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log likelihood

In [12]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)

# Next Steps

The final step is to copy the `*_coefficients_revised.csv` file to the configs folder (manually or with a script), rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.

In [13]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_early_start_at_5,-7.645209,F
1,coef_am_peak_start_at_6,-5.790935,F
2,coef_am_peak_start_at_7,-4.238642,F
3,coef_am_peak_start_at_8,-2.371798,F
4,coef_am_peak_start_at_9,-2.106247,F
5,coef_midday_start_at_10_11_12,0.0,T
6,coef_midday_start_at_13_14_15,-0.665045,F
7,coef_pm_peak_start_at_16_17_18,-0.000431,F
8,coef_evening_start_at_19_20_21,-0.786571,F
9,coef_late_start_at_22_23,-0.890587,F
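
The copy-and-rename step described under Next Steps could be scripted along these lines. The helper `install_coefficients` and the directory layout are hypothetical; the target file name is taken from the loading log earlier in this notebook and may differ in your configs folder. The demonstration runs in a temporary directory so that no real configs are touched.

```python
import shutil
import tempfile
from pathlib import Path


def install_coefficients(revised_file, configs_dir, target_name):
    """Copy a revised coefficients file into configs_dir under target_name.

    Hypothetical helper for illustration; it is not part of ActivitySim.
    """
    configs_dir = Path(configs_dir)
    configs_dir.mkdir(parents=True, exist_ok=True)
    target = configs_dir / target_name
    shutil.copyfile(revised_file, target)
    return target


# Demonstration in a throwaway directory with a stand-in revised file.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    revised = tmp / "estimated" / "atwork_subtour_scheduling_coefficients_revised.csv"
    revised.parent.mkdir()
    revised.write_text(
        "coefficient_name,value,constrain\n"
        "coef_midday_start_at_10_11_12,0.0,T\n"
    )

    installed = install_coefficients(
        revised,
        tmp / "configs",
        "tour_scheduling_atwork_coefficients.csv",
    )
    installed_text = installed.read_text()
    print(installed.name)
```

Keeping a backup of the original coefficients file before overwriting it makes it easy to compare simulation results before and after re-estimation.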
