# Estimating Joint Tour Composition

This notebook illustrates how to re-estimate a single model component for ActivitySim. The process
first requires running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles (EDBs) used in this notebook. To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [1]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd

We'll work in our `test` directory, where ActivitySim has saved the estimation data bundles.

In [2]:
os.chdir('test')

# Load data and prep model for estimation

In [3]:
modelname = "joint_tour_composition"

from activitysim.estimation.larch import component_model
model, data = component_model(modelname, return_data=True)

# Review data loaded from the EDB

The next step is to review the data read from the EDB, including the coefficients, model settings, utility specification, and chooser and alternative data.
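Before estimating, it can help to confirm that every coefficient named in the utility specification has a row in the coefficients table. In the notebook these inputs would come from `data.spec` and `data.coefficients`; in this minimal sketch they are plain Python structures holding a hand-copied subset of the tables in this section, so it runs without ActivitySim. The helper name `missing_coefficients` is hypothetical.

```python
# Hand-copied subset of the utility specification: for each row, the
# coefficient name used by each alternative (blank spec cells are omitted).
spec_rows = {
    "util_asc": {
        "children": "coef_asc_children",
        "mixed": "coef_asc_mixed",
    },
    "util_tour_purpose_is_eating_out": {
        "children": "coef_tour_purpose_is_eating_out_children",
        "mixed": "coef_tour_purpose_is_eating_out_mixed",
    },
    "util_number_of_full_time_workers": {
        "adults": "coef_number_of_full_time_workers_adults",
        "mixed": "coef_number_of_full_time_workers_mixed",
    },
}

# Hand-copied subset of the names in the coefficients table.
coefficient_names = {
    "coef_unavailable",
    "coef_asc_children",
    "coef_asc_mixed",
    "coef_tour_purpose_is_eating_out_children",
    "coef_tour_purpose_is_eating_out_mixed",
    "coef_number_of_full_time_workers_adults",
    "coef_number_of_full_time_workers_mixed",
}


def missing_coefficients(spec_rows, coefficient_names):
    """Return coefficient names referenced by the spec but absent from the table."""
    referenced = {name for row in spec_rows.values() for name in row.values()}
    return referenced - set(coefficient_names)


print(missing_coefficients(spec_rows, coefficient_names))
```

An empty set means every referenced coefficient has a value to start estimation from.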

## Coefficients

In [4]:
data.coefficients

coefficient_name,value,constrain
coef_unavailable,-999.0,T
coef_asc_children,5.3517,F
coef_asc_mixed,5.629,fF
coef_tour_purpose_is_eating_out_children,-0.9678,F
coef_tour_purpose_is_eating_out_mixed,-0.8027,F
coef_tour_purpose_is_discretionary_adults,0.7648,F
coef_tour_purpose_is_discretionary_children,0.5101,F
coef_number_of_full_time_workers_adults,1.024,F
coef_number_of_full_time_workers_mixed,0.3624,F
coef_number_of_part_time_workers_adults,0.5412,F
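The `constrain` column presumably flags whether a coefficient is held fixed (`T`) or estimated (`F`). The `coef_asc_mixed` row reads `fF`, which looks like a typo in the source coefficients file; the same value is carried into the revised coefficients file written at the end of this notebook. A quick check for unexpected flags, sketched on rows hand-copied from the table above (the helper name `bad_constrain_flags` is hypothetical):

```python
# (coefficient_name, value, constrain), hand-copied from the coefficients table.
coefficients = [
    ("coef_unavailable", -999.0, "T"),
    ("coef_asc_children", 5.3517, "F"),
    ("coef_asc_mixed", 5.629, "fF"),
    ("coef_tour_purpose_is_eating_out_children", -0.9678, "F"),
    ("coef_tour_purpose_is_eating_out_mixed", -0.8027, "F"),
]

VALID_FLAGS = {"T", "F"}


def bad_constrain_flags(rows):
    """Return (name, flag) pairs whose constrain value is not exactly T or F."""
    return [(name, flag) for name, _value, flag in rows if flag not in VALID_FLAGS]


print(bad_constrain_flags(coefficients))
```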


## Utility specification

In [5]:
data.spec

,Label,Description,Expression,adults,children,mixed
0,util_asc,Alternative-specific constant,1,,coef_asc_children,coef_asc_mixed
1,util_tour_purpose_is_eating_out,Joint tour purpose is eating out (dummy),tour_type=='eat',,coef_tour_purpose_is_eating_out_children,coef_tour_purpose_is_eating_out_mixed
2,util_tour_purpose_is_discretionary,Joint tour purpose is discretionary (dummy),tour_type=='disc',coef_tour_purpose_is_discretionary_adults,coef_tour_purpose_is_discretionary_children,
3,util_number_of_full_time_workers,Number of Full-Time Workers in the household,num_full_max3,coef_number_of_full_time_workers_adults,,coef_number_of_full_time_workers_mixed
4,util_number_of_part_time_workers,Number of Part-Time Workers in the household,num_part_max3,coef_number_of_part_time_workers_adults,,coef_number_of_part_time_workers_mixed
5,util_number_of_university_students,Number of University students in the household,num_univ_max3,coef_number_of_university_students,,
6,util_number_of_non_workers,Number of Non-Workers in the household,num_nonwork_max3,coef_number_of_non_workers_adults,,coef_number_of_non_workers_mixed
7,util_number_of_children_too_young_for_school,Number of Children too Young for School in the...,num_preschool_max3,,coef_number_of_children_too_young_for_school_c...,coef_number_of_children_too_young_for_school_m...
8,util_number_of_pre_driving_age_children,Number of Pre-driving Age Children in the hous...,num_school_max3,,coef_number_of_pre_driving_age_children_children,coef_number_of_pre_driving_age_children_mixed
9,util_number_of_driving_age_children,Number of Driving-age Children in the household,num_driving_max3,,coef_number_of_driving_age_children_children,coef_number_of_driving_age_children_mixed
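Each spec row contributes its expression value multiplied by the coefficient named in an alternative's column, and blank cells contribute nothing. Assuming the usual multinomial logit form, choice probabilities then follow from the utilities. The sketch below hand-evaluates only the first four spec rows, using the initial coefficient values from the coefficients table, for a hypothetical chooser (an eating-out joint tour in a household with one full-time worker); it ignores the remaining rows and any availability conditions, so it is an illustration rather than the model's actual utility.

```python
import math

# Initial coefficient values, hand-copied from the coefficients table.
coef = {
    "coef_asc_children": 5.3517,
    "coef_asc_mixed": 5.629,
    "coef_tour_purpose_is_eating_out_children": -0.9678,
    "coef_tour_purpose_is_eating_out_mixed": -0.8027,
    "coef_tour_purpose_is_discretionary_adults": 0.7648,
    "coef_tour_purpose_is_discretionary_children": 0.5101,
    "coef_number_of_full_time_workers_adults": 1.024,
    "coef_number_of_full_time_workers_mixed": 0.3624,
}

ALTERNATIVES = ("adults", "children", "mixed")

# First four spec rows: (expression evaluator, {alternative: coefficient name}).
# Alternatives missing from a row correspond to blank cells in the spec.
SPEC = [
    (lambda t: 1.0,
     {"children": "coef_asc_children", "mixed": "coef_asc_mixed"}),
    (lambda t: float(t["tour_type"] == "eat"),
     {"children": "coef_tour_purpose_is_eating_out_children",
      "mixed": "coef_tour_purpose_is_eating_out_mixed"}),
    (lambda t: float(t["tour_type"] == "disc"),
     {"adults": "coef_tour_purpose_is_discretionary_adults",
      "children": "coef_tour_purpose_is_discretionary_children"}),
    (lambda t: float(t["num_full_max3"]),
     {"adults": "coef_number_of_full_time_workers_adults",
      "mixed": "coef_number_of_full_time_workers_mixed"}),
]


def utilities(tour):
    """Linear-in-parameters utility: sum of expression value times coefficient."""
    v = {alt: 0.0 for alt in ALTERNATIVES}
    for expr, coef_by_alt in SPEC:
        x = expr(tour)
        for alt, name in coef_by_alt.items():
            v[alt] += x * coef[name]
    return v


def mnl_probabilities(v):
    """Multinomial logit probabilities, shifted by the max utility for stability."""
    m = max(v.values())
    expv = {alt: math.exp(u - m) for alt, u in v.items()}
    total = sum(expv.values())
    return {alt: e / total for alt, e in expv.items()}


# Hypothetical chooser values, invented for illustration.
tour = {"tour_type": "eat", "num_full_max3": 1}
v = utilities(tour)
p = mnl_probabilities(v)
print(v)
print(p)
```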


## Chooser data

In [6]:
data.chooser_data

household_id,tour_id,model_choice,override_choice,util_asc,util_tour_purpose_is_eating_out,util_tour_purpose_is_discretionary,util_number_of_full_time_workers,util_number_of_part_time_workers,util_number_of_university_students,util_number_of_non_workers,...,log_time_window_overlap_adult_child,num_full_max3,num_part_max3,num_univ_max3,num_nonwork_max3,num_preschool_max3,num_school_max3,num_driving_max3,more_cars_than_workers,override_choice_code
189758,7785298,adults,adults,1.0,0.0,0.0,0.0,0.0,2.0,0.0,...,0.000,0.0,0.0,2.0,0.0,0.0,0.0,0.0,False,1
201016,8708454,adults,adults,1.0,0.0,0.0,0.0,0.0,2.0,0.0,...,0.000,0.0,0.0,2.0,0.0,0.0,0.0,0.0,False,1
213291,9715006,adults,adults,1.0,0.0,0.0,0.0,0.0,2.0,0.0,...,0.000,0.0,0.0,2.0,0.0,0.0,0.0,0.0,True,1
226902,10831112,adults,adults,1.0,0.0,0.0,0.0,0.0,2.0,0.0,...,0.000,0.0,0.0,2.0,0.0,0.0,0.0,0.0,False,1
337259,20334787,mixed,mixed,1.0,0.0,0.0,1.0,0.0,0.0,1.0,...,2.197,1.0,0.0,0.0,1.0,1.0,0.0,0.0,False,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2628704,283676518,mixed,mixed,1.0,0.0,0.0,3.0,0.0,0.0,0.0,...,2.398,3.0,0.0,0.0,0.0,2.0,0.0,0.0,False,3
2678969,295260168,adults,adults,1.0,0.0,0.0,3.0,1.0,0.0,3.0,...,2.080,3.0,1.0,0.0,3.0,1.0,0.0,0.0,False,1
2704338,297646485,adults,adults,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,True,1
2718585,298814741,adults,adults,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.000,0.0,1.0,0.0,0.0,0.0,0.0,0.0,False,1


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters. BHHH is the default and typically runs faster, but it does not respect constraints on parameters.
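Larch's own BHHH implementation is not shown in this notebook. As a rough illustration of the idea only, the sketch below applies textbook BHHH iterations to a hypothetical two-parameter binary logit on invented toy data: each step solves against the sum of outer products of per-observation score vectors, which stands in for the negative Hessian. Nothing in the update knows about parameter bounds or constraints, which is the gap a constrained optimizer such as SLSQP fills.

```python
import math

# Hypothetical toy data: x = (constant, z), y = 1 if alternative 1 was chosen.
xs = [(1.0, z) for z in (0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3)]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]


def loglike_and_scores(beta, xs, ys):
    """Binary logit log-likelihood and per-observation score vectors."""
    ll = 0.0
    scores = []
    for x, y in zip(xs, ys):
        u = beta[0] * x[0] + beta[1] * x[1]
        p = 1.0 / (1.0 + math.exp(-u))
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
        scores.append(((y - p) * x[0], (y - p) * x[1]))
    return ll, scores


def bhhh(xs, ys, beta=(0.0, 0.0), tol=1e-6, max_iter=1000):
    """Maximize the log-likelihood with BHHH steps and simple step halving."""
    beta = list(beta)
    for _ in range(max_iter):
        ll, scores = loglike_and_scores(beta, xs, ys)
        g = [sum(s[0] for s in scores), sum(s[1] for s in scores)]
        if max(abs(gi) for gi in g) < tol:
            break
        # Outer product of scores, [[a, b], [b, d]], approximates -Hessian.
        a = sum(s[0] * s[0] for s in scores)
        b = sum(s[0] * s[1] for s in scores)
        d = sum(s[1] * s[1] for s in scores)
        det = a * d - b * b
        step = [(d * g[0] - b * g[1]) / det, (a * g[1] - b * g[0]) / det]
        # Halve the step until the log-likelihood does not decrease.
        t = 1.0
        while True:
            trial = [beta[0] + t * step[0], beta[1] + t * step[1]]
            if loglike_and_scores(trial, xs, ys)[0] >= ll or t < 1e-10:
                break
            t /= 2
        beta = trial
    return beta


beta = bhhh(xs, ys)
print(beta)
```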

In [7]:
model.estimate(method='SLSQP')

req_data does not request avail_ca or avail_co but it is set and being provided


,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
coef_asc_children,121.627746,5.3517,0.0,,,0,,121.627746
coef_asc_mixed,-88.249613,5.629,0.0,,,0,,-88.249613
coef_household_has_more_cars_than_workers_adults,-40.894264,1.386,0.0,,,0,,-40.894264
coef_household_has_more_cars_than_workers_mixed,153.815203,0.751,0.0,,,0,,153.815203
coef_household_in_suburban_area_adults,0.5105,0.5105,0.0,,,0,,0.5105
coef_household_in_suburban_area_mixed,0.1283,0.1283,0.0,,,0,,0.1283
coef_household_in_urban_area,-21.823334,0.5741,0.0,,,0,,-21.823334
coef_log_max_overlap_of_adults_time_windows,-36.285976,1.192,0.0,,,0,,-36.285976
coef_log_max_overlap_of_childrens_time_windows,283.696667,1.841,0.0,,,0,,283.696667
coef_log_max_overlap_of_time_windows,196.728573,1.958,0.0,,,0,,196.728573




,0
coef_asc_children,121.627746
coef_asc_mixed,-88.249613
coef_household_has_more_cars_than_workers_adults,-40.894264
coef_household_has_more_cars_than_workers_mixed,153.815203
coef_household_in_suburban_area_adults,0.510500
coef_household_in_suburban_area_mixed,0.128300
coef_household_in_urban_area,-21.823334
coef_log_max_overlap_of_adults_time_windows,-36.285976
coef_log_max_overlap_of_childrens_time_windows,283.696667
coef_log_max_overlap_of_time_windows,196.728573

,0
coef_asc_children,121.627746
coef_asc_mixed,-88.249613
coef_household_has_more_cars_than_workers_adults,-40.894264
coef_household_has_more_cars_than_workers_mixed,153.815203
coef_household_in_suburban_area_adults,0.5105
coef_household_in_suburban_area_mixed,0.1283
coef_household_in_urban_area,-21.823334
coef_log_max_overlap_of_adults_time_windows,-36.285976
coef_log_max_overlap_of_childrens_time_windows,283.696667
coef_log_max_overlap_of_time_windows,196.728573

,0
coef_asc_children,-1.19924e-06
coef_asc_mixed,1.19924e-06
coef_household_has_more_cars_than_workers_adults,-7.697895e-122
coef_household_has_more_cars_than_workers_mixed,2.677106e-09
coef_household_in_suburban_area_adults,0.0
coef_household_in_suburban_area_mixed,0.0
coef_household_in_urban_area,-1.3110609999999998e-38
coef_log_max_overlap_of_adults_time_windows,-2.551704e-38
coef_log_max_overlap_of_childrens_time_windows,-2.523891e-06
coef_log_max_overlap_of_time_windows,2.72654e-06


*Note that in the example data for this model, there are only 91 joint tours, which is an insufficient
number of observations to successfully estimate all 31 parameters in this model.*
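Estimates such as 121.6 or 283.7 are typical of a likelihood with no finite maximum. One common cause in small samples is separation: if some variable (or combination of variables) perfectly predicts the observed choices, the log-likelihood keeps rising toward zero as the coefficient grows, so the optimizer drifts toward ever larger values. The output alone cannot confirm that this is what happened here; the sketch below only demonstrates the effect on an invented, perfectly separated binary logit.

```python
import math

# Hypothetical toy data: every chooser with x = 1 chose alternative 1 and every
# chooser with x = 0 chose alternative 0, so x perfectly separates the choices.
xs = [0, 0, 0, 1, 1, 1]
ys = [0, 0, 0, 1, 1, 1]


def loglike(beta, xs, ys):
    """Binary logit log-likelihood with utility u = beta * (2x - 1) for alternative 1."""
    total = 0.0
    for x, y in zip(xs, ys):
        u = beta * (2 * x - 1)
        if y == 1:
            total -= math.log1p(math.exp(-u))  # log P(y=1) = -log(1 + e^-u)
        else:
            total -= math.log1p(math.exp(u))   # log P(y=0) = -log(1 + e^u)
    return total


betas = (1, 5, 10, 50, 100)
lls = [loglike(b, xs, ys) for b in betas]
for b, ll in zip(betas, lls):
    print(f"beta = {b:>3}: log-likelihood = {ll:.3e}")
```

The log-likelihood improves at every step but never reaches a maximum, so there is no stable estimate to converge to.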

### Estimated coefficients

In [8]:
model.parameter_summary()

,Value,Std Err,t Stat,Signif,Like Ratio,Null Value,Constrained
coef_asc_children,122.0,,,[***],306.42,0.0,
coef_asc_mixed,-88.2,22000.0,-0.0,,,0.0,
coef_household_has_more_cars_than_workers_adults,-40.9,,,[],0.00,0.0,
coef_household_has_more_cars_than_workers_mixed,154.0,46700.0,0.0,,,0.0,
coef_household_in_suburban_area_adults,0.511,,,[],0.00,0.0,
coef_household_in_suburban_area_mixed,0.128,,,[],0.00,0.0,
coef_household_in_urban_area,-21.8,,,[],0.00,0.0,
coef_log_max_overlap_of_adults_time_windows,-36.3,,,[],0.00,0.0,
coef_log_max_overlap_of_childrens_time_windows,284.0,,,[***],BIG,0.0,
coef_log_max_overlap_of_time_windows,197.0,10200.0,0.02,,,0.0,
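Assuming the conventional definition, the `t Stat` column is the estimate minus its null value, divided by its standard error. Recomputing it from the rounded values displayed above (so last-digit differences are expected) shows why these t statistics are essentially zero: standard errors in the tens of thousands mean the likelihood is nearly flat along those directions. The helper name `t_stat` is hypothetical.

```python
# (value, std err), hand-copied from the parameter summary above.
rows = {
    "coef_asc_mixed": (-88.2, 22000.0),
    "coef_household_has_more_cars_than_workers_mixed": (154.0, 46700.0),
    "coef_log_max_overlap_of_time_windows": (197.0, 10200.0),
}


def t_stat(value, std_err, null_value=0.0):
    """t statistic of an estimate relative to its null value."""
    return (value - null_value) / std_err


for name, (value, se) in rows.items():
    print(f"{name}: t = {t_stat(value, se):.2f}")
```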


# Output Estimation Results

In [9]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [10]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)


# Next Steps

The final step is to either manually or automatically copy the `*_coefficients_revised.csv` file to the configs folder, rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.
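A minimal sketch of the manual copy-and-rename step, using only the standard library. The function name `install_revised_coefficients` is hypothetical, and it keeps a `.bak` copy of any existing coefficients file so the change can be reverted; adjust the configs path to wherever your model's configs actually live.

```python
import shutil
from pathlib import Path


def install_revised_coefficients(result_dir, configs_dir, modelname):
    """Copy {modelname}_coefficients_revised.csv into configs_dir as
    {modelname}_coefficients.csv, keeping a .bak copy of any existing file."""
    result_dir, configs_dir = Path(result_dir), Path(configs_dir)
    source = result_dir / f"{modelname}_coefficients_revised.csv"
    target = configs_dir / f"{modelname}_coefficients.csv"
    if target.exists():
        # Preserve the original coefficients so the change can be reverted.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(source, target)
    return target
```

In this notebook it might be called as `install_revised_coefficients(result_dir, "configs", modelname)`, where `"configs"` is an assumed location.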

In [11]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

,coefficient_name,value,constrain
0,coef_unavailable,-999.0,T
1,coef_asc_children,121.627746,F
2,coef_asc_mixed,-88.249613,fF
3,coef_tour_purpose_is_eating_out_children,-0.9678,F
4,coef_tour_purpose_is_eating_out_mixed,-0.8027,F
5,coef_tour_purpose_is_discretionary_adults,0.7648,F
6,coef_tour_purpose_is_discretionary_children,0.5101,F
7,coef_number_of_full_time_workers_adults,185.490311,F
8,coef_number_of_full_time_workers_mixed,50.943193,F
9,coef_number_of_part_time_workers_adults,47.088158,F
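Before copying the revised file into the configs folder, it may be worth comparing each revised coefficient with its starting value. The sketch below is a simple heuristic, not a statistical test: the 10x threshold and the function name `suspicious_changes` are arbitrary choices for illustration, and the rows are hand-copied from the initial and revised coefficient tables in this notebook.

```python
# (name, initial value, revised value), hand-copied from the two coefficient tables.
changes = [
    ("coef_asc_children", 5.3517, 121.627746),
    ("coef_asc_mixed", 5.629, -88.249613),
    ("coef_tour_purpose_is_eating_out_children", -0.9678, -0.9678),
    ("coef_number_of_full_time_workers_adults", 1.024, 185.490311),
    ("coef_number_of_full_time_workers_mixed", 0.3624, 50.943193),
    ("coef_number_of_part_time_workers_adults", 0.5412, 47.088158),
]


def suspicious_changes(rows, max_ratio=10.0):
    """Flag coefficients whose sign flipped or whose magnitude grew by more than max_ratio."""
    flagged = []
    for name, before, after in rows:
        sign_flip = before * after < 0
        blew_up = before != 0 and abs(after) > max_ratio * abs(before)
        if sign_flip or blew_up:
            flagged.append(name)
    return flagged


print(suspicious_changes(changes))
```

With only 91 joint tours in the example data, most of the estimated coefficients are flagged, which is consistent with the note above that the sample cannot support all 31 parameters.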
