# Estimating At-Work Subtour Destination Choice

This notebook illustrates how to re-estimate a single model component for ActivitySim.  This process 
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles used in this notebook.  To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [1]:
import larch as lx
import pandas as pd

lx.versions()

JAX not found. Some functionality will be unavailable.


{'larch': '6.0.32',
 'sharrow': '2.13.0',
 'numpy': '1.26.4',
 'pandas': '1.5.3',
 'xarray': '2024.3.0',
 'numba': '0.60.0'}

For this demo, we will assume that you have already run ActivitySim in estimation
mode and saved the required estimation data bundles (EDBs) to disk.  See
the [first notebook](./01_estimation_mode.ipynb) for details.  The following cell
runs a script that sets everything up if the example data is not already available.

In [2]:
from est_mode_setup import prepare

prepare()

EDB directory already populated.


PosixPath('test-estimation-data/activitysim-prototype-mtc-extended')

# Load data and prep model for estimation

In [3]:
modelname = "atwork_subtour_destination"

In [4]:
from activitysim.estimation.larch import component_model
model, data = component_model(
    modelname,
    edb_directory=f"output-est-mode/estimation_data_bundle/{modelname}/",
    return_data=True,
)

loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_coefficients.csv
loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_SPEC.csv
loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_alternatives_combined.parquet
loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_choosers_combined.parquet
loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_landuse.csv
loading from output-est-mode/estimation_data_bundle/atwork_subtour_destination/atwork_subtour_destination_size_terms.csv


# Review data loaded from EDB

Next we can review what was read from the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

## coefficients

In [5]:
data.coefficients

coefficient_name,value,constrain
coef_distance_piecewise_linear_from_0_to_1_miles,-0.7926,F
coef_distance_piecewise_linear_from_1_to_2_miles,-0.7926,F
coef_distance_piecewise_linear_from_2_to_5_miles,-0.5197,F
coef_distance_piecewise_linear_from_5_to_15_miles,-0.2045,F
coef_distance_piecewise_linear_for_15_plus_miles,-0.2045,F
coef_size_variable_atwork,1.0,T
coef_no_attractions_atwork_size_variable_is_0,-999.0,T
coef_mode_choice_logsum,0.5136,F
coef_sample_of_alternatives_correction_factor,1.0,T


## alt_values

In [6]:
data.alt_values

Unnamed: 0,tour_id,alt_dest,util_distance_piecewise_linear_from_0_to_1_miles,util_distance_piecewise_linear_from_1_to_2_miles,util_distance_piecewise_linear_from_2_to_5_miles,util_distance_piecewise_linear_from_5_to_15_miles,util_distance_piecewise_linear_for_15_plus_miles,util_size_variable_atwork,util_no_attractions_atwork_size_variable_is_0,util_mode_choice_logsum,util_sample_of_alternatives_correction_factor
0,2966559,4,0.99,0.00,0.00,0.0,0.0,7.026095,False,14.641064,3.669233
1,2966559,9,1.00,0.98,0.00,0.0,0.0,7.841204,False,13.896585,4.331944
2,2966559,11,1.00,0.35,0.00,0.0,0.0,7.778631,False,14.171065,3.202032
3,2966559,12,0.89,0.00,0.00,0.0,0.0,7.385584,False,14.520744,4.616777
4,2966559,14,0.87,0.00,0.00,0.0,0.0,6.755056,False,14.588424,4.538306
...,...,...,...,...,...,...,...,...,...,...,...
125546,309112001,1375,1.00,1.00,1.38,0.0,0.0,5.398090,False,12.614690,4.169279
125547,309112001,1376,1.00,1.00,0.95,0.0,0.0,5.786087,False,12.938050,3.557811
125548,309112001,1378,1.00,1.00,0.76,0.0,0.0,5.680807,False,13.080930,3.564348
125549,309112001,1380,1.00,1.00,0.67,0.0,0.0,6.972544,False,13.148609,3.835276


## chooser_data

In [7]:
data.chooser_data

Unnamed: 0,tour_id,model_choice,override_choice,person_id,workplace_zone_id,income_segment
0,2966559,17,109,72355,17,1
1,3046632,1,13,74308,14,1
2,3048108,11,5,74344,106,1
3,3177463,355,355,77499,355,1
4,3191832,309,308,77849,309,1
...,...,...,...,...,...,...
6031,308156004,323,323,7516000,323,1
6032,308227650,583,582,7517747,573,1
6033,308260103,539,606,7518539,407,1
6034,309080718,21,70,7538554,7,1


## landuse

In [8]:
data.landuse

zone_id,DISTRICT,SD,county_id,TOTHH,TOTPOP,TOTACRE,RESACRE,CIACRE,TOTEMP,AGE0519,...,area_type,HSENROLL,COLLFTE,COLLPTE,TOPOLOGY,TERMINAL,household_density,employment_density,density_index,is_cbd
1,1,1,1,46,82,20.3,1.0,15.00000,27318,7,...,0,0.0,0.00000,0.0,3,5.89564,2.875000,1707.375000,2.870167,False
2,1,1,1,134,240,31.1,1.0,24.79297,42078,19,...,0,0.0,0.00000,0.0,1,5.84871,5.195214,1631.374751,5.178722,False
3,1,1,1,267,476,14.7,1.0,2.31799,2445,38,...,0,0.0,0.00000,0.0,1,5.53231,80.470405,736.891913,72.547987,False
4,1,1,1,151,253,19.3,1.0,18.00000,22434,20,...,0,0.0,0.00000,0.0,2,5.64330,7.947368,1180.736842,7.894233,False
5,1,1,1,611,1069,52.7,1.0,15.00000,15662,86,...,0,0.0,72.14684,0.0,1,5.52555,38.187500,978.875000,36.753679,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1450,34,34,9,2724,6493,1320.0,630.0,69.00000,1046,1013,...,4,0.0,0.00000,0.0,1,1.12116,3.896996,1.496423,1.081235,False
1451,34,34,9,2016,4835,664.0,379.0,43.00000,757,757,...,4,0.0,0.00000,0.0,1,1.17116,4.777251,1.793839,1.304140,False
1452,34,34,9,2178,5055,1068.0,602.0,35.00000,2110,789,...,4,0.0,0.00000,0.0,1,1.17587,3.419152,3.312402,1.682465,False
1453,34,34,9,298,779,14195.0,429.0,4.00000,922,88,...,5,0.0,0.00000,0.0,1,1.01972,0.688222,2.129330,0.520115,False


## spec

In [9]:
data.spec

Unnamed: 0,Label,Description,Expression,atwork
0,util_distance_piecewise_linear_from_0_to_1_miles,"Distance, piecewise linear from 0 to 1 miles","@skims['DIST'].clip(0,1)",coef_distance_piecewise_linear_from_0_to_1_miles
1,util_distance_piecewise_linear_from_1_to_2_miles,"Distance, piecewise linear from 1 to 2 miles","@(skims['DIST']-1).clip(0,1)",coef_distance_piecewise_linear_from_1_to_2_miles
2,util_distance_piecewise_linear_from_2_to_5_miles,"Distance, piecewise linear from 2 to 5 miles","@(skims['DIST']-2).clip(0,3)",coef_distance_piecewise_linear_from_2_to_5_miles
3,util_distance_piecewise_linear_from_5_to_15_miles,"Distance, piecewise linear from 5 to 15 miles","@(skims['DIST']-5).clip(0,10)",coef_distance_piecewise_linear_from_5_to_15_miles
4,util_distance_piecewise_linear_for_15_plus_miles,"Distance, piecewise linear for 15+ miles",@(skims['DIST']-15.0).clip(0),coef_distance_piecewise_linear_for_15_plus_miles
5,util_no_attractions_atwork_size_variable_is_0,"No attractions, atwork size_term variable is 0",size_term==0,coef_no_attractions_atwork_size_variable_is_0
6,util_mode_choice_logsum,Mode choice logsum,mode_choice_logsum,coef_mode_choice_logsum
7,util_sample_of_alternatives_correction_factor,Sample of alternatives correction factor,"@np.minimum(np.log(df.pick_count/df.prob), 60)",coef_sample_of_alternatives_correction_factor
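The five distance rows in the spec split a single skim distance into the number of miles that fall within each band, so each band can carry its own coefficient. The sketch below reproduces those `clip` expressions with NumPy. The distances are made-up values standing in for `skims['DIST']`, and the coefficients are the initial values shown in `data.coefficients`; this is only an illustration of the arithmetic, not how ActivitySim evaluates the spec internally.

```python
import numpy as np

# Made-up one-way distances in miles, standing in for skims['DIST'].
dist = np.array([0.5, 1.35, 3.38, 20.0])

# One column per distance band, using the same expressions as the spec.
segments = np.column_stack(
    [
        dist.clip(0, 1),          # 0 to 1 miles
        (dist - 1).clip(0, 1),    # 1 to 2 miles
        (dist - 2).clip(0, 3),    # 2 to 5 miles
        (dist - 5).clip(0, 10),   # 5 to 15 miles
        (dist - 15.0).clip(0),    # 15+ miles
    ]
)

# Initial coefficient values copied from data.coefficients, in band order.
coefs = np.array([-0.7926, -0.7926, -0.5197, -0.2045, -0.2045])

# Distance contribution to utility: sum over bands of (miles in band * coefficient).
distance_utility = segments @ coefs

print(segments)
print(distance_utility)
```

The band lengths in each row always sum back to the original distance, which is why the alternative rows in `data.alt_values` show values such as `1.00, 1.00, 1.38, 0.0, 0.0` for a destination 3.38 miles away.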


## size_spec

In [10]:
data.size_spec

segment,RETEMPN,HEREMPN
atwork,0.742,0.258


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.  Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters.  BHHH is the default and typically runs faster, but does not follow constraints on parameters.
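To give a feel for what BHHH does, here is a minimal NumPy sketch of the idea on a simulated multinomial logit problem: the Hessian is approximated by the sum of outer products of per-observation score vectors, and each iteration takes a step along the resulting direction with a simple step-halving search. Everything in it (the function names `loglike_and_scores` and `bhhh`, the simulated data, the tolerances) is an assumption made for illustration; it is not Larch's implementation and it ignores bounds and constraints entirely.

```python
import numpy as np


def loglike_and_scores(beta, X, y):
    """Return the MNL log-likelihood and per-observation score vectors."""
    V = X @ beta                                  # utilities, shape (N, J)
    V = V - V.max(axis=1, keepdims=True)          # stabilize the softmax
    P = np.exp(V)
    P /= P.sum(axis=1, keepdims=True)             # choice probabilities
    rows = np.arange(len(y))
    loglike = np.log(P[rows, y]).sum()
    # Score of observation n: x[chosen] - sum_j P[j] * x[j]
    scores = X[rows, y] - np.einsum("nj,njk->nk", P, X)
    return loglike, scores


def bhhh(X, y, beta0, maxiter=100, tol=1e-8):
    """Maximize the MNL log-likelihood with a basic BHHH iteration."""
    beta = np.asarray(beta0, dtype=float)
    ll, scores = loglike_and_scores(beta, X, y)
    for _ in range(maxiter):
        g = scores.sum(axis=0)                    # gradient
        B = scores.T @ scores                     # outer-product Hessian proxy
        d = np.linalg.solve(B, g)                 # ascent direction
        if g @ d < tol:                           # converged
            break
        step = 1.0
        while step > 1e-10:                       # step-halving line search
            new_ll, new_scores = loglike_and_scores(beta + step * d, X, y)
            if new_ll > ll:
                break
            step /= 2
        else:
            break                                 # no improving step found
        beta, ll, scores = beta + step * d, new_ll, new_scores
    return beta, ll


# Simulated data: N choosers, J alternatives, K attributes (all hypothetical).
rng = np.random.default_rng(0)
N, J, K = 2000, 4, 2
true_beta = np.array([-0.8, 0.5])
X = rng.normal(size=(N, J, K))
y = (X @ true_beta + rng.gumbel(size=(N, J))).argmax(axis=1)

beta0 = np.zeros(K)
ll0, _ = loglike_and_scores(beta0, X, y)
beta_hat, ll_hat = bhhh(X, y, beta0)
print(beta_hat, ll0, ll_hat)
```

Because the update direction depends only on first derivatives, each iteration is cheap, but nothing in the loop keeps a parameter inside a bound, which matches the trade-off described above.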

In [11]:
model.doctor(repair_nan_utility=True)

problem: nan_utility has (31 issues)


(<larch.Model (MNL) "None">,
 ┣ nan_utility:                   n
 ┃              dummy_zone_id      
 ┃              1                 0
 ┃              2                 0
 ┃              3                 1
 ┃              4                 5
 ┃              5                 8
 ┃              6                 9
 ┃              7                16
 ┃              8                43
 ┃              9                85
 ┃              10              138
 ┃              11              213
 ┃              12              306
 ┃              13              442
 ┃              14              615
 ┃              15              797
 ┃              16             1002
 ┃              17             1252
 ┃              18             1514
 ┃              19             1770
 ┃              20             2079
 ┃              21             2469
 ┃              22             2891
 ┃              23             3383
 ┃              24             3930
 ┃              25             4447

In [12]:
data.spec

Unnamed: 0,Label,Description,Expression,atwork
0,util_distance_piecewise_linear_from_0_to_1_miles,"Distance, piecewise linear from 0 to 1 miles","@skims['DIST'].clip(0,1)",coef_distance_piecewise_linear_from_0_to_1_miles
1,util_distance_piecewise_linear_from_1_to_2_miles,"Distance, piecewise linear from 1 to 2 miles","@(skims['DIST']-1).clip(0,1)",coef_distance_piecewise_linear_from_1_to_2_miles
2,util_distance_piecewise_linear_from_2_to_5_miles,"Distance, piecewise linear from 2 to 5 miles","@(skims['DIST']-2).clip(0,3)",coef_distance_piecewise_linear_from_2_to_5_miles
3,util_distance_piecewise_linear_from_5_to_15_miles,"Distance, piecewise linear from 5 to 15 miles","@(skims['DIST']-5).clip(0,10)",coef_distance_piecewise_linear_from_5_to_15_miles
4,util_distance_piecewise_linear_for_15_plus_miles,"Distance, piecewise linear for 15+ miles",@(skims['DIST']-15.0).clip(0),coef_distance_piecewise_linear_for_15_plus_miles
5,util_no_attractions_atwork_size_variable_is_0,"No attractions, atwork size_term variable is 0",size_term==0,coef_no_attractions_atwork_size_variable_is_0
6,util_mode_choice_logsum,Mode choice logsum,mode_choice_logsum,coef_mode_choice_logsum
7,util_sample_of_alternatives_correction_factor,Sample of alternatives correction factor,"@np.minimum(np.log(df.pick_count/df.prob), 60)",coef_sample_of_alternatives_correction_factor


In [13]:
model.estimate(method='bhhh', options={'maxiter':1000})

param_name,value,best,initvalue,minimum,maximum,nullvalue,holdfast
atwork_HEREMPN,-1.371795,-1.371795,-1.354796,-6.0,6.0,0.0,0
atwork_RETEMPN,-0.298406,-0.298406,-0.298406,-0.298406,-0.298406,0.0,1
coef_distance_piecewise_linear_for_15_plus_miles,-0.326384,-0.326384,-0.2045,-25.0,25.0,0.0,0
coef_distance_piecewise_linear_from_0_to_1_miles,-0.840401,-0.840401,-0.7926,-25.0,25.0,0.0,0
coef_distance_piecewise_linear_from_1_to_2_miles,-0.905173,-0.905173,-0.7926,-25.0,25.0,0.0,0
coef_distance_piecewise_linear_from_2_to_5_miles,-0.577554,-0.577554,-0.5197,-25.0,25.0,0.0,0
coef_distance_piecewise_linear_from_5_to_15_miles,-0.193532,-0.193532,-0.2045,-25.0,25.0,0.0,0
coef_mode_choice_logsum,0.403737,0.403737,0.5136,-25.0,25.0,0.0,0
coef_no_attractions_atwork_size_variable_is_0,-999.0,-999.0,-999.0,-999.0,-999.0,0.0,1
coef_sample_of_alternatives_correction_factor,1.0,1.0,1.0,1.0,1.0,0.0,1


Unnamed: 0,0
atwork_HEREMPN,-1.371795
atwork_RETEMPN,-0.298406
coef_distance_piecewise_linear_for_15_plus_miles,-0.326384
coef_distance_piecewise_linear_from_0_to_1_miles,-0.840401
coef_distance_piecewise_linear_from_1_to_2_miles,-0.905173
coef_distance_piecewise_linear_from_2_to_5_miles,-0.577554
coef_distance_piecewise_linear_from_5_to_15_miles,-0.193532
coef_mode_choice_logsum,0.403737
coef_no_attractions_atwork_size_variable_is_0,-999.000000
coef_sample_of_alternatives_correction_factor,1.000000


### Estimated coefficients

In [14]:
model.parameter_summary()

Parameter,Value,Std Err,t Stat,Signif,Null Value,Constrained
atwork_HEREMPN,-1.37,0.071,-19.32,***,0.0,
atwork_RETEMPN,-0.298,0.0,,,0.0,fixed value
coef_distance_piecewise_linear_for_15_plus_miles,-0.326,0.0607,-5.38,***,0.0,
coef_distance_piecewise_linear_from_0_to_1_miles,-0.84,0.0954,-8.81,***,0.0,
coef_distance_piecewise_linear_from_1_to_2_miles,-0.905,0.0554,-16.34,***,0.0,
coef_distance_piecewise_linear_from_2_to_5_miles,-0.578,0.0278,-20.77,***,0.0,
coef_distance_piecewise_linear_from_5_to_15_miles,-0.194,0.0173,-11.2,***,0.0,
coef_mode_choice_logsum,0.404,0.0244,16.54,***,0.0,
coef_no_attractions_atwork_size_variable_is_0,-999.0,0.0,,,0.0,fixed value
coef_sample_of_alternatives_correction_factor,1.0,0.0,,,0.0,fixed value
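The two size parameters, `atwork_RETEMPN` and `atwork_HEREMPN`, appear to be on a natural-log scale: their initial values match the logs of the weights in `data.size_spec` (0.742 and 0.258). The short check below confirms that from the numbers shown in this notebook and converts the estimated values back to weights. Rescaling the weights to sum to one is an assumption made here for readability; this notebook does not show whether `update_size_spec` applies the same normalization.

```python
import math

# Size-term weights as shown in data.size_spec.
size_spec = {"RETEMPN": 0.742, "HEREMPN": 0.258}

# Estimated parameter values copied from the estimation output above.
estimated = {"atwork_RETEMPN": -0.298406, "atwork_HEREMPN": -1.371795}

# Logs of the original weights; compare with the `initvalue` column.
log_weights = {name: math.log(w) for name, w in size_spec.items()}

# Back-transform the estimates to the weight scale.
weights = {name: math.exp(v) for name, v in estimated.items()}

# Assumed normalization: rescale so the weights sum to one.
total = sum(weights.values())
shares = {name: w / total for name, w in weights.items()}

print(log_weights)
print(shares)
```

Because `atwork_RETEMPN` is held fixed, only the relative weight of `HEREMPN` is estimated; fixing one size parameter is what pins down the overall scale of the size term.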


# Output Estimation Results

In [15]:
from activitysim.estimation.larch import update_coefficients, update_size_spec
result_dir = data.edb_directory/"estimated"

## Write updated utility coefficients

In [16]:
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

## Write updated size coefficients

In [17]:
update_size_spec(
    model, data, result_dir, 
    output_file=f"{modelname}_size_terms.csv",
)

Unnamed: 0,segment,model_selector,TOTHH,RETEMPN,FPSEMPN,HEREMPN,OTHEMPN,AGREMPN,MWTEMPN,AGE0519,HSENROLL,COLLFTE,COLLPTE
0,work_low,workplace,0.0,0.129129,0.193193,0.383383,0.12012,0.01001,0.164164,0.0,0.0,0.0,0.0
1,work_med,workplace,0.0,0.12012,0.197197,0.325325,0.139139,0.008008,0.21021,0.0,0.0,0.0,0.0
2,work_high,workplace,0.0,0.11,0.207,0.284,0.154,0.006,0.239,0.0,0.0,0.0,0.0
3,work_veryhigh,workplace,0.0,0.093,0.27,0.241,0.146,0.004,0.246,0.0,0.0,0.0,0.0
4,university,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.592,0.408
5,gradeschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,highschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
7,escort,non_mandatory,0.0,0.225,0.0,0.144,0.0,0.0,0.0,0.465,0.166,0.0,0.0
8,shopping,non_mandatory,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,eatout,non_mandatory,0.0,0.742,0.0,0.258,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Write the model estimation report, including coefficient t-statistics and log-likelihood

In [18]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
);

# Next Steps

The final step is to copy the `*_coefficients_revised.csv` and `*_size_terms.csv` files to the configs folder, either manually or automatically, rename them to `*_coefficients.csv` and `destination_choice_size_terms.csv`, and run ActivitySim in simulation mode.  Note that all the location
and destination choice models share the same `destination_choice_size_terms.csv` input file, so if you
are updating all these models, you'll need to ensure that the updated sections of this file for each model
are joined together correctly.
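One way to join updated sections of the shared size-terms file is to overwrite only the rows for the segments a given model re-estimated, leaving every other segment untouched. The pandas sketch below shows that idea on small made-up tables. The helper name `merge_size_terms`, the segment rows, and the revised values are all hypothetical; in practice the two tables would be read from the configs file and from the estimated output file.

```python
import pandas as pd

# Hypothetical stand-in for the shared configs file (illustrative values only).
shared = pd.DataFrame(
    {
        "segment": ["shopping", "eatout", "atwork"],
        "model_selector": ["non_mandatory", "non_mandatory", "atwork"],
        "RETEMPN": [1.0, 0.742, 0.742],
        "HEREMPN": [0.0, 0.258, 0.258],
    }
).set_index("segment")

# Hypothetical re-estimated rows from one model (here only "atwork").
revised = pd.DataFrame(
    {"segment": ["atwork"], "RETEMPN": [0.75], "HEREMPN": [0.25]}
).set_index("segment")


def merge_size_terms(shared, revised):
    """Return a copy of `shared` with rows in `revised` overwritten."""
    missing = revised.index.difference(shared.index)
    if len(missing) > 0:
        raise KeyError(f"segments not found in shared file: {list(missing)}")
    out = shared.copy()
    # Only touch columns present in both tables.
    cols = [c for c in revised.columns if c in out.columns]
    out.loc[revised.index, cols] = revised[cols].to_numpy()
    return out


merged = merge_size_terms(shared, revised)
print(merged)
```

Repeating the same call once per re-estimated model, each time with that model's segments, builds up a single combined table without disturbing segments that belong to other models.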

In [19]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_distance_piecewise_linear_from_0_to_1_miles,-0.840401,F
1,coef_distance_piecewise_linear_from_1_to_2_miles,-0.905173,F
2,coef_distance_piecewise_linear_from_2_to_5_miles,-0.577554,F
3,coef_distance_piecewise_linear_from_5_to_15_miles,-0.193532,F
4,coef_distance_piecewise_linear_for_15_plus_miles,-0.326384,F
5,coef_size_variable_atwork,1.0,T
6,coef_no_attractions_atwork_size_variable_is_0,-999.0,T
7,coef_mode_choice_logsum,0.403737,F
8,coef_sample_of_alternatives_correction_factor,1.0,T


In [20]:
pd.read_csv(result_dir/f"{modelname}_size_terms.csv")

Unnamed: 0,index,segment,model_selector,TOTHH,RETEMPN,FPSEMPN,HEREMPN,OTHEMPN,AGREMPN,MWTEMPN,AGE0519,HSENROLL,COLLFTE,COLLPTE
0,0,work_low,workplace,0.0,0.129129,0.193193,0.383383,0.12012,0.01001,0.164164,0.0,0.0,0.0,0.0
1,1,work_med,workplace,0.0,0.12012,0.197197,0.325325,0.139139,0.008008,0.21021,0.0,0.0,0.0,0.0
2,2,work_high,workplace,0.0,0.11,0.207,0.284,0.154,0.006,0.239,0.0,0.0,0.0,0.0
3,3,work_veryhigh,workplace,0.0,0.093,0.27,0.241,0.146,0.004,0.246,0.0,0.0,0.0,0.0
4,4,university,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.592,0.408
5,5,gradeschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,6,highschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
7,7,escort,non_mandatory,0.0,0.225,0.0,0.144,0.0,0.0,0.0,0.465,0.166,0.0,0.0
8,8,shopping,non_mandatory,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,9,eatout,non_mandatory,0.0,0.742,0.0,0.258,0.0,0.0,0.0,0.0,0.0,0.0,0.0
