# Estimating At-work Subtour Scheduling

This notebook illustrates how to re-estimate the at-work subtour scheduling component of ActivitySim. The process requires first running ActivitySim in estimation mode, which reads the household travel survey files and writes the estimation data bundles (EDBs) used in this notebook. For instructions on that step, see the other notebooks in this directory.

# Load libraries

In [1]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd

We'll work in our `test` directory, where ActivitySim has saved the estimation data bundles.

In [2]:
os.chdir('test')

# Load data and prep model for estimation

In [3]:
modelname = "atwork_subtour_scheduling"

from activitysim.estimation.larch import component_model
model, data = component_model(modelname, return_data=True)


# Review data loaded from the EDB

The next (optional) step is to review the EDB, including the coefficients, the utility specification, and the chooser and alternatives data.

## Coefficients

In [4]:
data.coefficients

coefficient_name,value,constrain
coef_early_start_at_5,-7.765548,F
coef_am_peak_start_at_6,-6.156718,F
coef_am_peak_start_at_7,-4.061708,F
coef_am_peak_start_at_8,-2.330535,F
coef_am_peak_start_at_9,-1.881593,F
coef_midday_start_at_10_11_12,0.0,T
coef_midday_start_at_13_14_15,-0.775022,F
coef_pm_peak_start_at_16_17_18,-0.227528,F
coef_evening_start_at_19_20_21,-1.01509,F
coef_late_start_at_22_23,-0.73757,F


## Utility specification

In [5]:
data.spec

Unnamed: 0,Label,Description,Expression,Coefficient
0,util_early_start_at_5,Early start at 5,start < 6,coef_early_start_at_5
1,util_am_peak_start_at_6,AM peak start at 6,start == 6,coef_am_peak_start_at_6
2,util_am_peak_start_at_7,AM peak start at 7,start == 7,coef_am_peak_start_at_7
3,util_am_peak_start_at_8,AM peak start at 8,start == 8,coef_am_peak_start_at_8
4,util_am_peak_start_at_9,AM peak start at 9,start == 9,coef_am_peak_start_at_9
5,util_midday_start_at_10_11_12,Midday start at 10/11/12,(start > 9) & (start < 13),coef_midday_start_at_10_11_12
6,util_midday_start_at_13_14_15,Midday start at 13/14/15,(start > 12) & (start < 16),coef_midday_start_at_13_14_15
7,util_pm_peak_start_at_16_17_18,PM peak start at 16/17/18,(start > 15) & (start < 19),coef_pm_peak_start_at_16_17_18
8,util_evening_start_at_19_20_21,Evening start at 19/20/21,(start > 18) & (start < 22),coef_evening_start_at_19_20_21
9,util_late_start_at_22_23,Late start at 22/23,start > 21,coef_late_start_at_22_23
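
To make the start-time rows above concrete, the following minimal sketch re-implements the ten expressions in plain Python and looks up which term applies to a given start hour. `START_TERMS` and `start_term` are illustrative names, not part of ActivitySim, and the values are the initial coefficients from the coefficients table above. The estimation output later in this notebook lists further coefficients (for example `coef_am_peak_end`), so the full spec evidently has more rows than those displayed; this sketch covers only the displayed rows.

```python
# Illustrative re-implementation of the start-time rows of the spec shown above.
# Each entry: (coefficient name, predicate on the start hour, initial value).
START_TERMS = [
    ("coef_early_start_at_5", lambda s: s < 6, -7.765548),
    ("coef_am_peak_start_at_6", lambda s: s == 6, -6.156718),
    ("coef_am_peak_start_at_7", lambda s: s == 7, -4.061708),
    ("coef_am_peak_start_at_8", lambda s: s == 8, -2.330535),
    ("coef_am_peak_start_at_9", lambda s: s == 9, -1.881593),
    ("coef_midday_start_at_10_11_12", lambda s: (s > 9) and (s < 13), 0.0),
    ("coef_midday_start_at_13_14_15", lambda s: (s > 12) and (s < 16), -0.775022),
    ("coef_pm_peak_start_at_16_17_18", lambda s: (s > 15) and (s < 19), -0.227528),
    ("coef_evening_start_at_19_20_21", lambda s: (s > 18) and (s < 22), -1.01509),
    ("coef_late_start_at_22_23", lambda s: s > 21, -0.73757),
]


def start_term(start):
    """Return the (coefficient name, value) pairs whose expression is true for this start hour."""
    return [(name, value) for name, test, value in START_TERMS if test(start)]


# Every start hour falls into exactly one bin, so exactly one term applies.
for hour in (5, 8, 11, 17, 23):
    print(hour, start_term(hour))
```

The 10/11/12 bin is held at 0.0 (its `constrain` flag is `T`), so the other start-time coefficients are measured relative to a midday start.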


## Chooser data

In [6]:
data.chooser_data

Unnamed: 0,tour_id,model_choice,override_choice,person_id,tour_type,tour_type_count,tour_type_num,tour_num,tour_count,tour_category,...,COLLFTE,COLLPTE,TOPOLOGY,TERMINAL,household_density,employment_density,density_index,is_cbd,start_previous,end_previous
0,2998927,114,114,73144,maint,1,1,1,1,atwork,...,0.00000,0.00000,1,2.48345,26.073171,8.048780,6.150212,False,5,5
1,3060326,85,85,74642,eat,1,1,1,1,atwork,...,0.00000,0.00000,1,2.09035,20.666667,4.107527,3.426505,False,5,5
2,4422879,124,124,107875,eat,1,1,1,1,atwork,...,0.00000,0.00000,1,5.35435,139.333333,418.518519,104.532377,False,5,5
3,4440282,154,154,108299,maint,1,1,1,1,atwork,...,2035.58118,20.60887,2,5.22542,97.634722,550.205552,82.920387,False,5,5
4,4496780,89,89,109677,maint,1,1,1,1,atwork,...,690.54974,0.00000,3,4.73802,117.769796,246.205869,79.663609,False,5,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
459,302923726,145,145,7388383,maint,1,1,1,1,atwork,...,0.00000,0.00000,1,2.37546,19.153846,5.907692,4.515087,False,5,5
460,302942567,85,85,7388843,eat,1,1,1,1,atwork,...,0.00000,0.00000,1,2.81406,16.068376,21.136752,9.128669,False,5,5
461,302942627,135,135,7388844,maint,1,1,1,1,atwork,...,0.00000,0.00000,1,2.81406,16.068376,21.136752,9.128669,False,5,5
462,305120465,112,112,7441962,maint,1,1,1,1,atwork,...,0.00000,0.00000,1,8.54946,55.606634,142.984438,40.036459,False,5,5


## Alternatives data

In [7]:
data.alt_values

Unnamed: 0,tour_id,variable,0,1,2,3,4,5,6,7,...,180,181,182,183,184,185,186,187,188,189
0,2998927,duration,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2998927,end,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2998927,mode_choice_logsum,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2998927,start,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2998927,util_am_peak_end,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24587,308000674,util_start_shift_for_number_of_individual_nonm...,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
24588,308000674,util_start_shift_for_number_of_joint_tours,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
24589,308000674,util_start_shift_for_number_of_mandatory_tours,5,5,5,5,5,5,5,5,...,0,0,0,0,0,0,0,0,0,0
24590,308000674,util_start_shift_for_outbound_auto_travel_time...,60.10000228881836,60.10000228881836,60.10000228881836,60.10000228881836,60.10000228881836,60.10000228881836,60.10000228881836,60.10000228881836,...,0,0,0,0,0,0,0,0,0,0
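
The 190 alternative columns (0 through 189) are consistent with every combination of a start hour and an end hour at or after it, over the 19 hourly periods from 5 through 23 that appear in the spec labels: 19 × 20 / 2 = 190. The sketch below enumerates those pairs. The ordering (all end hours for start 5, then all end hours for start 6, and so on) is an assumption made for illustration; check it against the alternatives file in your own configs before relying on it. `ALTERNATIVES` and `decode` are hypothetical names.

```python
from itertools import combinations_with_replacement

HOURS = range(5, 24)  # 19 hourly periods, 5 through 23, as in the spec labels

# Assumed ordering: (5, 5), (5, 6), ..., (5, 23), (6, 6), ..., (23, 23).
ALTERNATIVES = list(combinations_with_replacement(HOURS, 2))


def decode(alt_id):
    """Map an alternative id to (start, end, duration) under the assumed ordering."""
    start, end = ALTERNATIVES[alt_id]
    return start, end, end - start


print(len(ALTERNATIVES))  # 190
print(decode(0))          # (5, 5, 0)
print(decode(189))        # (23, 23, 0)
```

If this ordering holds, the `model_choice` and `override_choice` ids in the chooser data can be decoded the same way, for example `decode(85)` gives a start and end of 10.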


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters. BHHH is the default and typically runs faster, but does not follow constraints on parameters.
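
To see why too small a sample can leave the likelihood without a stable maximum, consider a toy binary logit with a single constant. Its maximum likelihood estimate is log(n_chosen / n_not_chosen), which diverges to minus infinity when the alternative is never chosen in the sample. The sketch below is not Larch code: `fit_constant` is a hypothetical helper that runs plain gradient ascent and clips the parameter to ±25 (the minimum and maximum reported in the estimation output later in this notebook), and the counts are made up for illustration.

```python
import math


def fit_constant(n_chosen, n_total, lower=-25.0, upper=25.0, step=0.5, iters=20000):
    """Gradient ascent on the log-likelihood of a one-parameter binary logit.

    P(chosen) = 1 / (1 + exp(-beta)); the average gradient is (share - P).
    The parameter is clipped to [lower, upper] after every step.
    """
    beta = 0.0
    share = n_chosen / n_total
    for _ in range(iters):
        prob = 1.0 / (1.0 + math.exp(-beta))
        beta += step * (share - prob)
        beta = max(lower, min(upper, beta))
    return beta


# Alternative chosen 50 times out of 463: converges to log(50 / 413).
print(fit_constant(50, 463), math.log(50 / 413))

# Alternative never chosen: there is no interior maximum, so the estimate
# keeps drifting toward the lower bound the longer the optimizer runs.
print(fit_constant(0, 463, iters=20000))
print(fit_constant(0, 463, iters=40000))
```

An estimate that sits far out toward a bound with a large standard error is worth checking against the number of observations that chose the corresponding alternatives.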

In [8]:
model.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
coef_am_peak_end,-1.373169,-2.928312,0.0,-25.0,25.0,0,,-1.373169
coef_am_peak_start_at_6,-20.592814,-6.156718,0.0,-25.0,25.0,0,,-20.592814
coef_am_peak_start_at_7,-2.826984,-4.061708,0.0,-25.0,25.0,0,,-2.826984
coef_am_peak_start_at_8,-1.196528,-2.330535,0.0,-25.0,25.0,0,,-1.196528
coef_am_peak_start_at_9,-1.619897,-1.881593,0.0,-25.0,25.0,0,,-1.619897
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-0.663129,-1.543,0.0,-25.0,25.0,0,,-0.663129
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.151827,1.511,0.0,-25.0,25.0,0,,1.151827
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.625436,2.721,0.0,-25.0,25.0,0,,2.625436
coef_dummy_for_eating_out_purpose_and_departure_at_13,2.360981,2.122,0.0,-25.0,25.0,0,,2.360981
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.310405,0.3999,0.0,-25.0,25.0,0,,0.310405



Unnamed: 0,0
coef_am_peak_end,-1.373169
coef_am_peak_start_at_6,-20.592814
coef_am_peak_start_at_7,-2.826984
coef_am_peak_start_at_8,-1.196528
coef_am_peak_start_at_9,-1.619897
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-0.663129
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.151827
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.625436
coef_dummy_for_eating_out_purpose_and_departure_at_13,2.360981
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.310405

Unnamed: 0,0
coef_am_peak_end,0.0001755671
coef_am_peak_start_at_6,-3.041241e-08
coef_am_peak_start_at_7,-8.670115e-05
coef_am_peak_start_at_8,5.028242e-05
coef_am_peak_start_at_9,0.0005798423
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,0.0001601202
coef_dummy_for_eating_out_purpose_and_departure_at_11,-3.47579e-05
coef_dummy_for_eating_out_purpose_and_departure_at_12,-5.671365e-05
coef_dummy_for_eating_out_purpose_and_departure_at_13,-2.578419e-05
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.0001872976


### Estimated coefficients

In [9]:
model.parameter_summary()

Unnamed: 0,Value,Std Err,t Stat,Signif,Like Ratio,Null Value,Constrained
coef_am_peak_end,-1.37,0.551,-2.49,*,,0.0,
coef_am_peak_start_at_6,-20.6,7.33,-2.81,**,,0.0,
coef_am_peak_start_at_7,-2.83,0.615,-4.59,***,,0.0,
coef_am_peak_start_at_8,-1.2,0.405,-2.96,**,,0.0,
coef_am_peak_start_at_9,-1.62,0.372,-4.35,***,,0.0,
coef_dummy_for_business_related_purpose_and_duration_from_0_to_1,-0.663,0.597,-1.11,,,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_11,1.15,0.559,2.06,*,,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_12,2.63,0.598,4.39,***,,0.0,
coef_dummy_for_eating_out_purpose_and_departure_at_13,2.36,0.785,3.01,**,,0.0,
coef_dummy_for_eating_out_purpose_and_duration_of_1_hour,0.31,0.516,0.6,,,0.0,
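
The `t Stat` column can be reproduced from the other columns as (value − null value) / standard error. The sketch below does this for a few rows, using the unrounded values from the estimation output and the rounded standard errors from the summary, so small differences in the last digit are expected. The star thresholds (1.96, 2.58, 3.29) are an assumption that happens to match every row shown; they are not taken from the Larch documentation.

```python
# (value, std err, reported t stat) copied from the tables above.
# The null value is 0.0 for every row shown.
ROWS = {
    "coef_am_peak_end": (-1.373169, 0.551, -2.49),
    "coef_am_peak_start_at_6": (-20.592814, 7.33, -2.81),
    "coef_am_peak_start_at_7": (-2.826984, 0.615, -4.59),
    "coef_dummy_for_eating_out_purpose_and_departure_at_11": (1.151827, 0.559, 2.06),
    "coef_dummy_for_eating_out_purpose_and_duration_of_1_hour": (0.310405, 0.516, 0.6),
}


def t_stat(value, std_err, null_value=0.0):
    """t statistic of an estimate against its null value."""
    return (value - null_value) / std_err


def stars(t):
    """Significance stars under assumed thresholds: 1.96 (*), 2.58 (**), 3.29 (***)."""
    return "*" * sum(abs(t) > cut for cut in (1.96, 2.58, 3.29))


for name, (value, std_err, reported) in ROWS.items():
    t = t_stat(value, std_err)
    print(f"{name}: t = {t:.2f} (reported {reported}) {stars(t)}")
```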


# Output Estimation Results

In [10]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and the log-likelihood

In [11]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)

<larch.util.excel.ExcelWriter at 0x7fcdc806f7f0>

# Next Steps

The final step is to copy the `*_coefficients_revised.csv` file to the configs folder (manually or with a script), rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.
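
The copy-and-rename step can be scripted. The following is a minimal sketch using only the standard library; `promote_coefficients` is a hypothetical helper, and the location of the configs folder is an assumption that depends on your own model setup. The target file name follows the `*_coefficients.csv` pattern described above; if your component's settings reference a different file name, adjust it accordingly.

```python
import shutil
from pathlib import Path


def promote_coefficients(result_dir, configs_dir, modelname):
    """Copy <modelname>_coefficients_revised.csv into configs_dir
    as <modelname>_coefficients.csv and return the new path.

    An existing file with the target name is overwritten.
    """
    source = Path(result_dir) / f"{modelname}_coefficients_revised.csv"
    target = Path(configs_dir) / f"{modelname}_coefficients.csv"
    if not source.is_file():
        raise FileNotFoundError(source)
    shutil.copyfile(source, target)
    return target


# Example call (the "configs" path is a placeholder for your own configs folder):
# promote_coefficients(result_dir, "configs", modelname)
```

Because the copy overwrites the existing coefficients file, keeping the configs folder under version control makes it easy to revert.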

In [12]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_early_start_at_5,-2.204068,F
1,coef_am_peak_start_at_6,-20.592814,F
2,coef_am_peak_start_at_7,-2.826984,F
3,coef_am_peak_start_at_8,-1.196528,F
4,coef_am_peak_start_at_9,-1.619897,F
5,coef_midday_start_at_10_11_12,0.0,T
6,coef_midday_start_at_13_14_15,-1.051562,F
7,coef_pm_peak_start_at_16_17_18,-0.798588,F
8,coef_evening_start_at_19_20_21,-0.325741,F
9,coef_late_start_at_22_23,-5.142925,F
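
Before copying the revised file into the configs, it can help to compare it with the initial coefficients. The sketch below hard-codes the ten start-time values from the two tables in this notebook and flags any revised value within a margin of the ±25 bounds reported in the estimation output. `compare` is a hypothetical helper, and the margin of 5 is an arbitrary choice for illustration.

```python
INITIAL = {
    "coef_early_start_at_5": -7.765548,
    "coef_am_peak_start_at_6": -6.156718,
    "coef_am_peak_start_at_7": -4.061708,
    "coef_am_peak_start_at_8": -2.330535,
    "coef_am_peak_start_at_9": -1.881593,
    "coef_midday_start_at_10_11_12": 0.0,
    "coef_midday_start_at_13_14_15": -0.775022,
    "coef_pm_peak_start_at_16_17_18": -0.227528,
    "coef_evening_start_at_19_20_21": -1.01509,
    "coef_late_start_at_22_23": -0.73757,
}
REVISED = {
    "coef_early_start_at_5": -2.204068,
    "coef_am_peak_start_at_6": -20.592814,
    "coef_am_peak_start_at_7": -2.826984,
    "coef_am_peak_start_at_8": -1.196528,
    "coef_am_peak_start_at_9": -1.619897,
    "coef_midday_start_at_10_11_12": 0.0,
    "coef_midday_start_at_13_14_15": -1.051562,
    "coef_pm_peak_start_at_16_17_18": -0.798588,
    "coef_evening_start_at_19_20_21": -0.325741,
    "coef_late_start_at_22_23": -5.142925,
}


def compare(initial, revised, bound=25.0, margin=5.0):
    """Return rows of (name, initial, revised, change, near_bound)."""
    rows = []
    for name, old in initial.items():
        new = revised[name]
        near_bound = abs(new) >= bound - margin
        rows.append((name, old, new, new - old, near_bound))
    return rows


for name, old, new, change, near_bound in compare(INITIAL, REVISED):
    flag = "  <-- near bound" if near_bound else ""
    print(f"{name}: {old:.3f} -> {new:.3f} ({change:+.3f}){flag}")
```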
