# Estimating Stop Frequency

This notebook illustrates how to re-estimate a single model component for ActivitySim.  This process 
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles used in this notebook.  To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [1]:
import larch as lx
import pandas as pd

lx.versions()

JAX not found. Some functionality will be unavailable.


{'larch': '6.0.32',
 'sharrow': '2.13.0',
 'numpy': '1.26.4',
 'pandas': '1.5.3',
 'xarray': '2024.3.0',
 'numba': '0.60.0'}

For this demo, we will assume that you have already run ActivitySim in estimation
mode, and saved the required estimation data bundles (EDBs) to disk.  See
the [first notebook](./01_estimation_mode.ipynb) for details.  The following module
will run a script to set everything up if the example data is not already available.

In [2]:
from est_mode_setup import prepare

prepare()

EDB directory already populated.


PosixPath('test-estimation-data/activitysim-prototype-mtc-extended')

# Load data and prep model for estimation

In [3]:
modelname = "stop_frequency"

from activitysim.estimation.larch import component_model

model, data = component_model(
    modelname,
    edb_directory=f"output-est-mode/estimation_data_bundle/{modelname}/",
    return_data=True,
)



# Review data loaded from the EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

In [4]:
spec_segments = [i.primary_purpose for i in data.settings.SPEC_SEGMENTS]
spec_segments

['work',
 'school',
 'univ',
 'social',
 'shopping',
 'eatout',
 'escort',
 'othmaint',
 'othdiscr',
 'atwork']

## Coefficients

There is one meta-coefficients dataframe for this component, which contains
parameters for all the matching coefficients in the various segmented 
files. When different segments have the same named coefficient with the same
value, it is assumed they should be estimated jointly.  If they have the same name
but different values in the coefficient files, then they are re-estimated
independently.
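
The joint-versus-independent rule above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ActivitySim pre-processor itself: `build_meta_coefficients`, the shortened coefficient name `coef_asc_1out_0in`, and the convention of appending the segment name as a suffix are all hypothetical, and the values are toy inputs.

```python
from collections import defaultdict


def build_meta_coefficients(segment_coefs):
    """Merge per-segment {name: value} dicts into one meta table.

    Hypothetical helper for illustration only. A name whose value is
    identical in every segment that defines it is kept once (estimated
    jointly); otherwise each segment gets its own suffixed copy
    (estimated independently).
    """
    values_by_name = defaultdict(dict)
    for segment, coefs in segment_coefs.items():
        for name, value in coefs.items():
            values_by_name[name][segment] = value

    meta = {}
    for name, by_segment in values_by_name.items():
        if len(set(by_segment.values())) == 1:
            # Same name, same value everywhere: one shared parameter.
            meta[name] = next(iter(by_segment.values()))
        else:
            # Same name, different values: one parameter per segment.
            for segment, value in by_segment.items():
                meta[f"{name}_{segment}"] = value
    return meta


# Toy inputs; names and values are illustrative assumptions.
segment_coefs = {
    "work": {"coef_high_income_hh": 0.24, "coef_asc_1out_0in": -0.833},
    "atwork": {"coef_high_income_hh": 0.24, "coef_asc_1out_0in": -3.896},
}
meta = build_meta_coefficients(segment_coefs)
```

Here `coef_high_income_hh` stays a single shared entry, while the constant is split into one entry per segment. The real pre-processor may name the split parameters differently.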

In [5]:
data.coefficients

coefficient_name,value,constrain
coef_middle_to_low_income_hh,0.170,F
coef_mid_to_high_income_hh,0.230,F
coef_high_income_hh,0.240,F
coef_number_of_hh_persons,-0.310,F
coef_number_of_students_in_hh,0.210,F
...,...,...
coef_alternative_specific_constant_for_return_stops_0out_3in_atwork,-6.210,F
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.896,F
coef_alternative_specific_constant_for_the_total_number_of_stops_1out_3in_atwork,2.127,F
coef_alternative_specific_constant_for_outbound_stops_2out_0in_atwork,-5.709,F


## Utility specification

The utility spec files are unique to each segment model.  The estimation mode larch pre-processor
for the stop frequency model modifies the spec files to account for jointly re-estimated
parameters.

In [6]:
data.spec[0]

Unnamed: 0,Label,Description,Expression,0out_0in,0out_1in,0out_2in,0out_3in,1out_0in,1out_1in,1out_2in,1out_3in,2out_0in,2out_1in,2out_2in,2out_3in,3out_0in,3out_1in,3out_2in,3out_3in
0,util_middle_to_low_income_hh,Middle to Low Income HH,(income_in_thousands>19999) & (income_in_thous...,,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh,coef_middle_to_low_income_hh
1,util_mid_to_high_income_hh,Mid to High Income HH,(income_in_thousands>=50000) & (income_in_thou...,,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh,coef_mid_to_high_income_hh
2,util_high_income_hh,High Income HH,(income_in_thousands>=100000),,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh,coef_high_income_hh
3,util_number_of_hh_persons,Number of HH Persons,hhsize,,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons,coef_number_of_hh_persons
4,util_number_of_full_time_workers_in_hh,Number of full time workers in HH,num_full,,,,,,,,,,,,,,,,
5,util_number_of_students_in_hh,Number of Students in HH,num_student,,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh,coef_number_of_students_in_hh
6,util_num_kids_between_0_and_4_including_years_old,Num Kids between 0 and 4 (including) years old,num_age_0_4,,,,,,,,,,,,,,,,
7,util_presence_of_kids_between_0_and_4_includin...,Presence of Kids between 0 and 4 (including) y...,(num_age_0_4 > 0),,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...,coef_presence_of_kids_between_0_and_4_includin...
8,util_num_kids_between_5_and_15_including_years...,Num kids between 5 and 15 (including) years old,num_age_5_15,,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...,coef_num_kids_between_5_and_15_including_years...
9,util_presence_of_kids_between_5_and_15_includi...,Presence of kids between 5 and 15 (including) ...,(num_age_5_15 > 0),,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...,coef_presence_of_kids_between_5_and_15_includi...
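
One way to read the spec table above: each row evaluates an expression for the chooser, and each alternative column names the coefficient that multiplies it. A blank cell means the row adds nothing to that alternative. The sketch below is a simplified, hypothetical illustration of that reading. It uses a two-alternative subset, treats each expression as a precomputed chooser variable, and uses the starting coefficient values shown earlier; the real spec expressions are evaluated against the full chooser table.

```python
# Starting values taken from the coefficients table above.
coefficients = {
    "coef_number_of_hh_persons": -0.31,
    "coef_number_of_students_in_hh": 0.21,
}

# One entry per spec row: the chooser variable it evaluates, and the
# coefficient name attached to each alternative (None for a blank cell).
spec_rows = [
    ("hhsize", {"0out_0in": None, "0out_1in": "coef_number_of_hh_persons"}),
    ("num_student", {"0out_0in": None, "0out_1in": "coef_number_of_students_in_hh"}),
]

# A made-up chooser: a three-person household with one student.
chooser = {"hhsize": 3, "num_student": 1}


def utility(alternative):
    """Sum coefficient * variable over the rows that apply to this alternative."""
    total = 0.0
    for variable, coef_by_alt in spec_rows:
        coef_name = coef_by_alt[alternative]
        if coef_name is not None:
            total += coefficients[coef_name] * chooser[variable]
    return total


utilities = {alt: utility(alt) for alt in ("0out_0in", "0out_1in")}
```

In this toy case `0out_0in` has no coefficients, so its utility is zero and it acts as the reference alternative. The `0out_1in` utility comes out to roughly -0.72.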


## Chooser data

The chooser data is unique to each segment model. 

In [7]:
data.chooser_data[0]

tour_id,model_choice,override_choice,util_middle_to_low_income_hh,util_mid_to_high_income_hh,util_high_income_hh,util_number_of_hh_persons,util_number_of_full_time_workers_in_hh,util_number_of_students_in_hh,util_num_kids_between_0_and_4_including_years_old,util_presence_of_kids_between_0_and_4_including_years_old,...,tour_mode_is_drive_transit,tour_mode_is_non_motorized,num_school_tours,num_univ_tours,num_atwork_subtours,num_hh_shop_tours,num_hh_maint_tours,hhacc,pracc,destination_area_type
2966594,0out_1in,0out_1in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,1,0,0,6.446184,6.869385,1
2967783,0out_0in,0out_0in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,6.384615,7.341247,0
2968726,0out_0in,0out_0in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,True,0,0,0,0,0,5.447277,6.565208,1
2970858,1out_1in,0out_0in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,6.787018,7.692237,1
2973728,0out_0in,0out_0in,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,6.611336,5.693881,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309081532,2out_0in,0out_0in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,4.619027,6.475182,3
309090634,0out_1in,0out_0in,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,4.800090,6.198316,1
309101950,0out_0in,1out_1in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,0.000000,0.000000,1
309107362,0out_0in,0out_0in,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,...,False,False,0,0,0,0,0,0.000000,0.000000,4


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.  Larch has built-in estimation methods including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters.  BHHH is the default and typically runs faster, but does not follow constraints on parameters.
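
To see why a too-small sample can leave the likelihood without a stable maximum, consider a minimal two-alternative logit, written here in plain Python rather than with Larch. The function and the observation counts are made up for illustration. If an alternative is never chosen in the sample, lowering its constant always improves the log likelihood, so an unbounded optimizer drifts toward minus infinity. With a few observations of that alternative, the maximum is finite.

```python
import math


def log_likelihood(asc_b, n_a=10, n_b=0):
    """Log likelihood of a two-alternative logit with V_a = 0 and V_b = asc_b.

    n_a and n_b are the (made-up) counts of observed choices of a and b.
    """
    log_denominator = math.log(1.0 + math.exp(asc_b))
    log_p_a = -log_denominator
    log_p_b = asc_b - log_denominator
    return n_a * log_p_a + n_b * log_p_b


# Alternative b is never observed: pushing asc_b lower always helps,
# so there is no finite maximizer.
unidentified = [log_likelihood(asc) for asc in (-1.0, -5.0, -10.0, -20.0)]

# With 8 choices of a and 2 of b, the likelihood peaks at asc_b = log(2/8).
best = math.log(2 / 8)
identified = [log_likelihood(asc, n_a=8, n_b=2) for asc in (-3.0, best, 0.0)]
```

The values in `unidentified` keep rising toward zero as the constant decreases, while `identified` is largest at its middle entry. Bounds on parameters, which SLSQP supports, are one way to keep the first kind of parameter at a finite value.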

In [8]:
model.estimate(method='SLSQP', options={"maxiter": 1000})

param_name,value,best,initvalue,minimum,maximum,nullvalue,holdfast
coef_alternative_specific_constant_for_outbound_stops_1out_0in,-0.817659,-0.817659,-0.833,-inf,inf,0.0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.857785,-3.857785,-3.896,-inf,inf,0.0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_eatout,-2.243908,-2.243908,-2.190,-inf,inf,0.0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_escort,-2.254734,-2.254734,-2.173,-inf,inf,0.0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othdiscr,-1.509245,-1.509245,-1.581,-inf,inf,0.0,0
...,...,...,...,...,...,...,...
coef_primary_destination_accessibility_log_of_it_,0.191205,0.191205,0.180,-inf,inf,0.0,0
coef_subtour_departure_less_than_or_equal_to_11am,0.352247,0.352247,0.310,-inf,inf,0.0,0
coef_subtour_distance_in_miles_from_tour_destination_to_subtour_primary_destination_one_way_,0.023141,0.023141,0.020,-inf,inf,0.0,0
coef_subtour_duration_in_hours_integer_,0.555413,0.555413,0.560,-inf,inf,0.0,0


if you get poor results, consider setting global bounds with model.set_cap()


Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in,-0.817659
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.857785
coef_alternative_specific_constant_for_outbound_stops_1out_0in_eatout,-2.243908
coef_alternative_specific_constant_for_outbound_stops_1out_0in_escort,-2.254734
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othdiscr,-1.509245
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othmaint,-1.850181
coef_alternative_specific_constant_for_outbound_stops_1out_0in_school,-2.103501
coef_alternative_specific_constant_for_outbound_stops_1out_0in_shopping,-1.374719
coef_alternative_specific_constant_for_outbound_stops_1out_0in_social,-1.100007
coef_alternative_specific_constant_for_outbound_stops_1out_0in_univ,-2.583745

Unnamed: 0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in,-0.817659
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.857785
coef_alternative_specific_constant_for_outbound_stops_1out_0in_eatout,-2.243908
coef_alternative_specific_constant_for_outbound_stops_1out_0in_escort,-2.254734
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othdiscr,-1.509245
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othmaint,-1.850181
coef_alternative_specific_constant_for_outbound_stops_1out_0in_school,-2.103501
coef_alternative_specific_constant_for_outbound_stops_1out_0in_shopping,-1.374719
coef_alternative_specific_constant_for_outbound_stops_1out_0in_social,-1.100007
coef_alternative_specific_constant_for_outbound_stops_1out_0in_univ,-2.583745

Unnamed: 0,0
coef_alternative_specific_constant_for_outbound_stops_1out_0in,1.318164e-07
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.450424e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_eatout,-2.357602e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_escort,7.327103e-06
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othdiscr,5.83016e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othmaint,-1.687364e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_school,-1.669586e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_shopping,-1.179775e-08
coef_alternative_specific_constant_for_outbound_stops_1out_0in_social,1.831328e-05
coef_alternative_specific_constant_for_outbound_stops_1out_0in_univ,1.659316e-05


### Estimated coefficients

In [9]:
model.parameter_summary()

Parameter,Value,Std Err,t Stat,Signif,Null Value
coef_alternative_specific_constant_for_outbound_stops_1out_0in,-0.818,0.0344,-23.78,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_atwork,-3.86,0.249,-15.49,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_eatout,-2.24,0.102,-22.05,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_escort,-2.25,0.0741,-30.45,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othdiscr,-1.51,0.0658,-22.93,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_othmaint,-1.85,0.0875,-21.14,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_school,-2.1,0.0714,-29.46,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_shopping,-1.37,0.0511,-26.89,***,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_social,-1.1,0.565,-1.95,,0.0
coef_alternative_specific_constant_for_outbound_stops_1out_0in_univ,-2.58,0.148,-17.51,***,0.0
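
As a reading aid for the summary table above, the t statistic is the estimate minus its null value, divided by the standard error. The short sketch below recomputes two rows from the rounded values shown. `t_stat` is a hypothetical helper, not a Larch function, and small differences from the table can arise because the table is computed from unrounded values.

```python
def t_stat(value, std_err, null_value=0.0):
    """t statistic for the hypothesis that the parameter equals null_value."""
    return (value - null_value) / std_err


# Rounded values copied from the first row and the "_social" row of the table.
t_base = t_stat(-0.818, 0.0344)
t_social = t_stat(-1.1, 0.565)
```

Under the usual normal approximation, an absolute t statistic below about 1.96 is not significant at the 5% level. That matches the `_social` row, which carries no significance marker.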


# Output Estimation Results

The stop frequency model includes a separate coefficient file for each segment,
and has a special writer method to separate the coefficients by segment
after estimation.

In [10]:
from activitysim.estimation.larch.stop_frequency import update_segment_coefficients
result_dir = data.edb_directory/"estimated"
update_segment_coefficients(
    model, data, result_dir,
    output_file="stop_frequency_coefficients_{segment_name}_revised.csv",
);
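
Conceptually, writing per-segment files means mapping each estimated parameter back to the segment it belongs to. The sketch below shows one plausible way to do that lookup. It assumes segment-specific parameters carry the segment name as a suffix; `coefficients_for_segment`, the shortened names, and the values are all hypothetical and do not describe the internals of `update_segment_coefficients`.

```python
def coefficients_for_segment(estimated, base_names, segment):
    """Pick the segment-specific estimate when one exists, else the shared one.

    Hypothetical helper; assumes a "<name>_<segment>" naming convention.
    """
    result = {}
    for name in base_names:
        specific = f"{name}_{segment}"
        result[name] = estimated[specific] if specific in estimated else estimated[name]
    return result


# Toy estimates: one shared parameter, one with an "atwork" override.
estimated = {
    "coef_asc_1out_0in": -0.8,
    "coef_asc_1out_0in_atwork": -3.9,
    "coef_number_of_hh_persons": -0.3,
}
base_names = ["coef_asc_1out_0in", "coef_number_of_hh_persons"]

work = coefficients_for_segment(estimated, base_names, "work")
atwork = coefficients_for_segment(estimated, base_names, "atwork")
```

In this toy case both segments share `coef_number_of_hh_persons`, while only `atwork` picks up its own constant.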

### Write the model estimation report, including coefficient t-statistics and log likelihood

In [11]:
for m, segment in zip(model, data.segments):
    m.to_xlsx(
        result_dir/f"{modelname}_{segment}_model_estimation.xlsx", 
        data_statistics=False,
    )

# Next Steps

The final step is to either manually or automatically copy the `stop_frequency_coefficients_*_revised.csv` files to the configs folder, rename them to `stop_frequency_coefficients_*.csv`, and run ActivitySim in simulation mode.

In [12]:
pd.read_csv(result_dir/"stop_frequency_coefficients_work_revised.csv")

Unnamed: 0,coefficient_name,Description,value
0,coef_middle_to_low_income_hh,Middle to Low Income HH,0.17
1,coef_mid_to_high_income_hh,Mid to High Income HH,0.23
2,coef_high_income_hh,High Income HH,0.24
3,coef_number_of_hh_persons,Number of HH Persons,-0.291501
4,coef_number_of_students_in_hh,Number of Students in HH,0.206028
5,coef_presence_of_kids_between_0_and_4_includin...,Presence of Kids between 0 and 4 (including) y...,0.738257
6,coef_num_kids_between_5_and_15_including_years...,Num kids between 5 and 15 (including) years old,0.083356
7,coef_presence_of_kids_between_5_and_15_includi...,Presence of kids between 5 and 15 (including) ...,0.174006
8,coef_number_of_adults_16_years_old_,Number of Adults (>= 16 years old),0.006177
9,coef_number_of_cars_number_of_workers,Number of Cars > Number of Workers,0.195317
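
The copy-and-rename step described under Next Steps can be automated with the standard library. The following is a minimal sketch. `install_revised_coefficients` is a hypothetical helper, the demonstration runs on throwaway directories with placeholder files, and the location of your configs folder is an assumption you would need to supply.

```python
import shutil
import tempfile
from pathlib import Path


def install_revised_coefficients(result_dir, configs_dir):
    """Copy *_revised.csv coefficient files into configs_dir without the suffix."""
    result_dir, configs_dir = Path(result_dir), Path(configs_dir)
    configs_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in sorted(result_dir.glob("stop_frequency_coefficients_*_revised.csv")):
        # Drop the "_revised" marker so the file name matches the configs name.
        dst = configs_dir / src.name.replace("_revised.csv", ".csv")
        shutil.copyfile(src, dst)
        installed.append(dst.name)
    return installed


# Demonstration on temporary directories with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    estimated_dir = Path(tmp) / "estimated"
    estimated_dir.mkdir()
    for segment in ("work", "school"):
        path = estimated_dir / f"stop_frequency_coefficients_{segment}_revised.csv"
        path.write_text("coefficient_name,value\n")
    installed = install_revised_coefficients(estimated_dir, Path(tmp) / "configs")
```

Note that `shutil.copyfile` overwrites any existing file of the same name, so it may be worth backing up the original coefficient files before running something like this against a real configs folder.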
