# Estimating Tour Mode Choice

This notebook illustrates how to re-estimate tour and subtour mode choice for ActivitySim. This process
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles (EDBs) used in this notebook. To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [1]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd

We'll work in our `test` directory, where ActivitySim has saved the estimation data bundles.

In [2]:
os.chdir('test')

# Load data and prep model for estimation

In [3]:
modelname = "tour_mode_choice"

from activitysim.estimation.larch import component_model
model, data = component_model(modelname, return_data=True)

The tour mode choice model is already a `ModelGroup` segmented by tour purpose,
so we can add the subtour mode choice model as just another member of the group.

In [4]:
model2, data2 = component_model("atwork_subtour_mode_choice", return_data=True)

In [5]:
model.extend(model2)

# Review data loaded from the EDB

The next step is to review what was read from the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

### Coefficients

In [6]:
data.coefficients

coefficient_name,value,constrain
coef_one,1.000,T
coef_nest_root,1.000,T
coef_nest_AUTO,0.720,T
coef_nest_AUTO_DRIVEALONE,0.350,T
coef_nest_AUTO_SHAREDRIDE2,0.350,T
...,...,...
walk_transit_CBD_ASC_atwork,0.564,F
drive_transit_CBD_ASC_eatout_escort_othdiscr_othmaint_shopping_social,0.525,F
drive_transit_CBD_ASC_school_univ,0.672,F
drive_transit_CBD_ASC_work,1.100,F
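
As a minimal sketch of how this table can be split, the snippet below rebuilds a few of the rows shown above in a small stand-in frame and separates them by the `constrain` flag. It assumes that `T` marks coefficients held fixed during estimation and `F` marks coefficients that are free to be re-estimated; the frame `coefficients` is a hypothetical stand-in for `data.coefficients`, not the real object.

```python
import pandas as pd

# Hypothetical stand-in for data.coefficients, using rows from the output above.
coefficients = pd.DataFrame(
    {
        "coefficient_name": [
            "coef_one",
            "coef_nest_AUTO",
            "walk_transit_CBD_ASC_atwork",
            "drive_transit_CBD_ASC_work",
        ],
        "value": [1.0, 0.72, 0.564, 1.1],
        "constrain": ["T", "T", "F", "F"],
    }
).set_index("coefficient_name")

# Assumption: "T" means the value is held fixed, "F" means it is estimated.
fixed = coefficients[coefficients["constrain"] == "T"]
free = coefficients[coefficients["constrain"] == "F"]

print(f"{len(fixed)} fixed, {len(free)} free")
print(free)
```

Listing the free coefficients before estimating makes it easier to spot parameters that were left unconstrained by accident.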


### Utility specification

In [7]:
data.spec

Unnamed: 0,Label,Description,Expression,DRIVEALONEFREE,DRIVEALONEPAY,SHARED2FREE,SHARED2PAY,SHARED3FREE,SHARED3PAY,WALK,...,WALK_HVY,WALK_COM,DRIVE_LOC,DRIVE_LRF,DRIVE_EXP,DRIVE_HVY,DRIVE_COM,TAXI,TNC_SINGLE,TNC_SHARED
0,util_DRIVEALONEFREE_Unavailable,DRIVEALONEFREE - Unavailable,sov_available == False,-999,,,,,,,...,,,,,,,,,,
1,util_DRIVEALONEFREE_Unavailable_for_zero_auto_...,DRIVEALONEFREE - Unavailable for zero auto hou...,auto_ownership == 0,-999,,,,,,,...,,,,,,,,,,
2,util_DRIVEALONEFREE_Unavailable_for_persons_le...,DRIVEALONEFREE - Unavailable for persons less ...,age < 16,-999,,,,,,,...,,,,,,,,,,
3,util_DRIVEALONEFREE_Unavailable_for_joint_tours,DRIVEALONEFREE - Unavailable for joint tours,is_joint == True,-999,,,,,,,...,,,,,,,,,,
4,util_DRIVEALONEFREE_Unavailable_if_didnt_drive...,DRIVEALONEFREE - Unavailable if didn't drive t...,is_atwork_subtour & ~work_tour_is_SOV,-999,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
310,util_Drive_to_Transit_dest_CBD,Drive to Transit dest CBD,@df.destination_in_cbd,,,,,,,,...,,,drive_transit_CBD_ASC,drive_transit_CBD_ASC,drive_transit_CBD_ASC,drive_transit_CBD_ASC,drive_transit_CBD_ASC,,,
311,util_Drive_to_Transit_distance_penalty,Drive to Transit - distance penalty,@drvtrn_distpen_0_multiplier * (1-od_skims['DI...,,,,,,,,...,,,coef_ivt,coef_ivt,coef_ivt,coef_ivt,coef_ivt,,,
312,util_Walk_not_available_for_long_distances,Walk not available for long distances,@od_skims.max('DISTWALK') > 3,,,,,,,-999,...,,,,,,,,,,
313,util_Bike_not_available_for_long_distances,Bike not available for long distances,@od_skims.max('DISTBIKE') > 8,,,,,,,,...,,,,,,,,,,
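
To make the layout of the specification concrete, here is a rough sketch of how two of the availability rows above could contribute to the DRIVEALONEFREE utility: each expression is evaluated against the chooser table and the result is multiplied by the value in the alternative's column. This is an illustration only, assuming plain `pandas` evaluation; the `choosers` frame and its values are made up, and this is not ActivitySim's actual evaluation code.

```python
import pandas as pd

# Made-up chooser attributes for three tours (illustrative values only).
choosers = pd.DataFrame(
    {"auto_ownership": [0, 1, 2], "age": [30, 15, 40]},
    index=pd.Index([1, 2, 3], name="tour_id"),
)

# (Expression, DRIVEALONEFREE column) pairs copied from spec rows 1 and 2 above.
spec_rows = [
    ("auto_ownership == 0", -999.0),
    ("age < 16", -999.0),
]

# Sum expression value times coefficient over the spec rows.
utility = pd.Series(0.0, index=choosers.index)
for expression, coefficient in spec_rows:
    utility += choosers.eval(expression).astype(float) * coefficient

print(utility)
```

Tours that trigger either condition pick up a utility of -999, which effectively removes the alternative from their choice set.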


### Chooser data

In [8]:
data.chooser_data

tour_id,model_choice,override_choice,util_DRIVEALONEFREE_Unavailable,util_DRIVEALONEFREE_Unavailable_for_zero_auto_households,util_DRIVEALONEFREE_Unavailable_for_persons_less_than_16,util_DRIVEALONEFREE_Unavailable_for_joint_tours,util_DRIVEALONEFREE_Unavailable_if_didnt_drive_to_work,util_DRIVEALONEFREE_In_vehicle_time,util_DRIVEALONEFREE_Terminal_time,util_DRIVEALONEFREE_Operating_cost,...,walk_ferry_available,drive_local_available,drive_commuter_available,drive_express_available,drive_heavyrail_available,drive_lrf_available,drive_ferry_available,destination_in_cbd,tour_id.1,override_choice_code
6812,WALK,WALK,0.0,1.0,0.0,0.0,0.0,5.480000,15.45016,2.156258,...,False,False,False,False,False,False,False,1,6812,7
8110,WALK,WALK,0.0,1.0,0.0,0.0,0.0,8.860000,34.19784,17.722496,...,False,False,False,False,False,False,False,0,8110,7
11013,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,20.689999,18.17320,15.097005,...,False,True,True,False,True,False,False,1,11013,1
11016,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,14.710000,12.85052,10.971117,...,False,True,True,False,True,False,False,0,11016,1
15403,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,15.160000,10.64808,26.092745,...,False,True,True,False,True,False,False,0,15403,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309760814,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,24.510000,11.41364,27.217999,...,False,True,True,False,True,False,False,0,309760814,1
309760815,SHARED2FREE,SHARED2FREE,0.0,0.0,0.0,0.0,0.0,19.349998,16.96372,14.591494,...,False,True,True,False,True,False,False,1,309760815,3
309790009,BIKE,BIKE,0.0,0.0,0.0,0.0,0.0,21.410000,19.44384,84.280323,...,False,True,True,False,True,False,False,1,309790009,8
309796968,SHARED3FREE,SHARED3FREE,0.0,0.0,0.0,0.0,0.0,17.340000,13.66872,30.549632,...,False,True,True,False,True,False,False,0,309796968,5
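
A quick sanity check on the chooser data is to tabulate the observed mode shares and see how often the simulated choice agrees with the survey. The sketch below assumes that `override_choice` holds the observed (survey) choice and `model_choice` the choice ActivitySim simulated; `chooser_data` here is a small hypothetical frame built from a few of the rows shown above, standing in for `data.chooser_data`.

```python
import pandas as pd

# Hypothetical stand-in for data.chooser_data, using five rows from the output above.
chooser_data = pd.DataFrame(
    {
        "model_choice": ["WALK", "WALK", "DRIVEALONEFREE", "SHARED2FREE", "BIKE"],
        "override_choice": ["WALK", "WALK", "DRIVEALONEFREE", "SHARED2FREE", "BIKE"],
    },
    index=pd.Index([6812, 8110, 11013, 309760815, 309790009], name="tour_id"),
)

# Share of each observed mode among the choosers.
observed_shares = chooser_data["override_choice"].value_counts(normalize=True)

# Fraction of tours where the simulated choice matches the observed one.
agreement = (chooser_data["model_choice"] == chooser_data["override_choice"]).mean()

print(observed_shares)
print(f"simulated choice matches observed choice on {agreement:.0%} of tours")
```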


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters. BHHH is the default and typically runs faster, but does not follow constraints on parameters.
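
To illustrate what a bound on a parameter does under SLSQP, the toy sketch below fits a two-parameter binary logit with `scipy.optimize.minimize`, once without bounds and once with the slope restricted to the interval [0, 1]. It calls `scipy` directly rather than Larch, and the data, the function `neg_loglike`, and the bound values are all invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Invented data: one explanatory variable and a binary choice per observation.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
chose_alt1 = np.array([0, 0, 1, 0, 1, 1])


def neg_loglike(params):
    """Negative log-likelihood of a binary logit with utility asc + beta * x."""
    asc, beta = params
    v = asc + beta * x
    # log P(alt1) = v - log(1 + exp(v)); log P(alt0) = -log(1 + exp(v))
    loglike = chose_alt1 * v - np.logaddexp(0.0, v)
    return -loglike.sum()


# Unbounded fit: the optimizer is free to move both parameters.
unbounded = minimize(neg_loglike, x0=[0.0, 0.0], method="SLSQP")

# Bounded fit: beta is restricted to [0, 1]; asc remains free.
bounded = minimize(
    neg_loglike,
    x0=[0.0, 0.0],
    method="SLSQP",
    bounds=[(None, None), (0.0, 1.0)],
    options={"maxiter": 1000},
)

print("unbounded beta:", unbounded.x[1], "loglike:", -unbounded.fun)
print("bounded beta:  ", bounded.x[1], "loglike:", -bounded.fun)
```

The bounded fit cannot reach a higher log-likelihood than the unbounded one; that is the price of keeping the parameter inside its allowed range.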

In [9]:
model.load_data()
model.doctor(repair_ch_av="-")

req_data does not request avail_ca or avail_co but it is set and being provided
problem: chosen-but-not-available (1 issues)
problem: low-variance-data-co (1 issues)
problem: low-variance-data-co (1 issues)
problem: chosen-but-not-available (2 issues)
problem: low-variance-data-c

[(<larch.Model (GEV) "eatout">,
  ┣ chosen_but_not_available:     n example rows
  ┃                           12  2      45, 312
  ┣     low_variance_data_co:                   n                                       example cols
  ┃                           low_variance_co  58  util_BIKE_Unavailable_if_didnt_bike_to_work, u...),
 (<larch.Model (GEV) "escort">,
  ┣ low_variance_data_co:                   n                                       example cols
  ┃                       low_variance_co  87  util_BIKE_Unavailable_if_didnt_bike_to_work, u...),
 (<larch.Model (GEV) "othdiscr">,
  ┣ chosen_but_not_available:     n   example rows
  ┃                           10  4   66, 184, 222
  ┃                           12  3  205, 394, 560
  ┣     low_variance_data_co:                   n                                       example cols
  ┃                           low_variance_co  56  util_BIKE_Unavailable_if_didnt_bike_to_work, u...),
 (<larch.Model (GEV) "othmaint">,
  ┣ chosen_bu

In [10]:
model.maximize_loglike(method="SLSQP", options={"maxiter": 1000})

Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
-999,-999.000000,-999.000000,-999.0,-999.0,-999.0,1,,-999.000000
1,1.000000,1.000000,1.0,1.0,1.0,1,,1.000000
bike_ASC_auto_deficient_eatout,-95.051842,-1.569111,0.0,,,0,,-95.051842
bike_ASC_auto_sufficient_eatout,-1.464118,-1.200347,0.0,,,0,,-1.464118
bike_ASC_no_auto_eatout,7.201849,0.868071,0.0,,,0,,7.201849
...,...,...,...,...,...,...,...,...
walk_ASC_no_auto_atwork,12.824109,6.669213,0.0,,,0,,12.824109
walk_transit_ASC_auto_deficient_atwork,8.985177,-2.998829,0.0,,,0,,8.985177
walk_transit_ASC_auto_sufficient_atwork,9.127996,-3.401027,0.0,,,0,,9.127996
walk_transit_ASC_no_auto_atwork,21.234896,2.704188,0.0,,,0,,21.234896


Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
-999,-999.000000
1,1.000000
bike_ASC_auto_deficient_eatout,-95.051842
bike_ASC_auto_sufficient_eatout,-1.464118
bike_ASC_no_auto_eatout,7.201849
coef_age010_trn_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,0.719709
coef_age1619_da_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,-0.019080
coef_age16p_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social,-1.139964
coef_hhsize1_sr_multiplier_eatout_escort_othdiscr_othmaint_school_shopping_social_univ_atwork,0.020999
coef_hhsize2_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work_atwork,-0.003927

Unnamed: 0,0
-999,-999.0
1,1.0
bike_ASC_auto_deficient_eatout,-95.051842
bike_ASC_auto_sufficient_eatout,-1.464118
bike_ASC_no_auto_eatout,7.201849
coef_age010_trn_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,0.719709
coef_age1619_da_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,-0.01908
coef_age16p_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social,-1.139964
coef_hhsize1_sr_multiplier_eatout_escort_othdiscr_othmaint_school_shopping_social_univ_atwork,0.020999
coef_hhsize2_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work_atwork,-0.003927

Unnamed: 0,0
-999,0.0
1,0.0
bike_ASC_auto_deficient_eatout,-1.0570780000000001e-39
bike_ASC_auto_sufficient_eatout,0.0002295714
bike_ASC_no_auto_eatout,4.588544e-05
coef_age010_trn_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,0.0001503084
coef_age1619_da_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,-0.0002331535
coef_age16p_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social,-0.0001136698
coef_hhsize1_sr_multiplier_eatout_escort_othdiscr_othmaint_school_shopping_social_univ_atwork,-0.0001619716
coef_hhsize2_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work_atwork,0.000775774


In [11]:
model.calculate_parameter_covariance()

  model.calculate_parameter_covariance()
- commuter_rail_ASC_eatout_escort_othdiscr_othmaint_shopping_social_atwork
- drive_ferry_ASC_eatout_escort_othdiscr_othmaint_shopping_social_atwork
- drive_light_rail_ASC_eatout_escort_othdiscr_othmaint_shopping_social_atwork
- drive_transit_ASC_no_auto_all
- and 54 more
  model.calculate_parameter_covariance()


### Estimated coefficients

In [12]:
model.parameter_summary()

Unnamed: 0,Value,Std Err,t Stat,Signif,Like Ratio,Null Value
-999,-999.0,0.0,,,,-999.0
1,1.0,0.0,,,,1.0
bike_ASC_auto_deficient_eatout,-95.1,0.0,,[**],3.97,0.0
bike_ASC_auto_sufficient_eatout,-1.46,0.459,-3.19,**,,0.0
bike_ASC_no_auto_eatout,7.2,0.74,9.73,***,,0.0
coef_age010_trn_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,0.72,0.387,1.86,,,0.0
coef_age1619_da_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work,-0.0191,0.214,-0.09,,,0.0
coef_age16p_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social,-1.14,0.276,-4.13,***,,0.0
coef_hhsize1_sr_multiplier_eatout_escort_othdiscr_othmaint_school_shopping_social_univ_atwork,0.021,0.0998,0.21,,,0.0
coef_hhsize2_sr_multiplier_eatout_escort_othdiscr_othmaint_shopping_social_work_atwork,-0.00393,0.0708,-0.06,,,0.0
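
The `t Stat` column can be read as the distance of each estimate from its null value, measured in standard errors. The sketch below recomputes it from the rounded `Value` and `Std Err` entries shown above, so the last digit may differ slightly from the table; the helper `t_statistic` is a hypothetical name, not part of Larch.

```python
def t_statistic(value, std_err, null_value=0.0):
    """Return (value - null_value) / std_err, or None when std_err is zero."""
    if std_err == 0:
        return None  # undefined; the table above leaves these cells blank
    return (value - null_value) / std_err


# (Value, Std Err) pairs copied from the rounded summary table above.
rows = {
    "bike_ASC_auto_deficient_eatout": (-95.1, 0.0),
    "bike_ASC_auto_sufficient_eatout": (-1.46, 0.459),
    "bike_ASC_no_auto_eatout": (7.2, 0.74),
}

t_stats = {name: t_statistic(value, std_err) for name, (value, std_err) in rows.items()}

for name, t in t_stats.items():
    print(name, "undefined" if t is None else round(t, 2))
```

A zero standard error paired with a very large estimate, as for `bike_ASC_auto_deficient_eatout`, is worth a closer look before the coefficient is used in simulation.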


# Output Estimation Results

In [13]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [14]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)

  xl = ExcelWriter(filename, engine='xlsxwriter_larch', model=model, **kwargs)


# Next Steps

The final step is to copy the `*_coefficients_revised.csv` file to the configs folder (manually or with a script), rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.

In [15]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_one,1.000000,T
1,coef_nest_root,1.000000,T
2,coef_nest_AUTO,0.720000,T
3,coef_nest_AUTO_DRIVEALONE,0.350000,T
4,coef_nest_AUTO_SHAREDRIDE2,0.350000,T
...,...,...,...
302,walk_transit_CBD_ASC_atwork,0.351950,F
303,drive_transit_CBD_ASC_eatout_escort_othdiscr_o...,0.553404,F
304,drive_transit_CBD_ASC_school_univ,62.459413,F
305,drive_transit_CBD_ASC_work,1.432631,F
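
The copy-and-rename step can be scripted. The following is a minimal sketch using only the standard library; the helper `promote_coefficients` and the directory names are hypothetical, and the demonstration runs in a throwaway temporary directory rather than a real ActivitySim setup. It keeps a `.bak` copy of any existing coefficients file before overwriting it.

```python
import shutil
import tempfile
from pathlib import Path


def promote_coefficients(result_dir, configs_dir, modelname):
    """Copy <modelname>_coefficients_revised.csv into configs as <modelname>_coefficients.csv."""
    source = Path(result_dir) / f"{modelname}_coefficients_revised.csv"
    target = Path(configs_dir) / f"{modelname}_coefficients.csv"
    if target.exists():
        # Keep the previous coefficients next to the new file.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(source, target)
    return target


# Demonstration with placeholder files in a temporary directory.
tmp = Path(tempfile.mkdtemp())
demo_results = tmp / "estimated"
demo_configs = tmp / "configs"
demo_results.mkdir()
demo_configs.mkdir()
(demo_results / "tour_mode_choice_coefficients_revised.csv").write_text("new\n")
(demo_configs / "tour_mode_choice_coefficients.csv").write_text("old\n")

promoted = promote_coefficients(demo_results, demo_configs, "tour_mode_choice")
promoted_text = promoted.read_text()
backup_text = promoted.with_name(promoted.name + ".bak").read_text()

shutil.rmtree(tmp)  # clean up the throwaway directory
```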
