# Estimating Auto Ownership

This notebook illustrates how to re-estimate a single model component for ActivitySim. This process
includes running ActivitySim in estimation mode to read household travel survey files and write out
the estimation data bundles (EDBs) used in this notebook. To review how to do so, please visit the other
notebooks in this directory.

# Load libraries

In [1]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd

We'll work in our `test` directory, where ActivitySim has saved the estimation data bundles.

In [2]:
os.chdir('test')

# Load data and prep model for estimation

In [3]:
modelname = "auto_ownership"

from activitysim.estimation.larch import component_model
model, data = component_model(modelname, return_data=True)

# Review data loaded from the EDB

The next step is to review the contents of the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

### Coefficients

In [4]:
data.coefficients

coefficient_name,value,constrain
coef_cars1_drivers_2,0.0000,T
coef_cars1_drivers_3,0.0000,T
coef_cars1_persons_16_17,0.0000,T
coef_cars234_asc_marin,0.0000,T
coef_cars1_persons_25_34,0.0000,T
...,...,...
coef_cars4_drivers_3,5.2080,F
coef_cars3_drivers_3,5.5131,F
coef_cars2_drivers_4_up,6.3662,F
coef_cars3_drivers_4_up,8.5148,F


### Utility specification

In [5]:
data.spec

Unnamed: 0,Label,Description,Expression,cars0,cars1,cars2,cars3,cars4
0,util_drivers_2,2 Adults (age 16+),num_drivers==2,,coef_cars1_drivers_2,coef_cars2_drivers_2,coef_cars3_drivers_2,coef_cars4_drivers_2
1,util_drivers_3,3 Adults (age 16+),num_drivers==3,,coef_cars1_drivers_3,coef_cars2_drivers_3,coef_cars3_drivers_3,coef_cars4_drivers_3
2,util_drivers_4_up,4+ Adults (age 16+),num_drivers>3,,coef_cars1_drivers_4_up,coef_cars2_drivers_4_up,coef_cars3_drivers_4_up,coef_cars4_drivers_4_up
3,util_persons_16_17,Persons age 16-17,num_children_16_to_17,,coef_cars1_persons_16_17,coef_cars2_persons_16_17,coef_cars34_persons_16_17,coef_cars34_persons_16_17
4,util_persons_18_24,Persons age 18-24,num_college_age,,coef_cars1_persons_18_24,coef_cars2_persons_18_24,coef_cars34_persons_18_24,coef_cars34_persons_18_24
5,util_persons_25_34,Persons age 35-34,num_young_adults,,coef_cars1_persons_25_34,coef_cars2_persons_25_34,coef_cars34_persons_25_34,coef_cars34_persons_25_34
6,util_presence_children_0_4,Presence of children age 0-4,num_young_children>0,,coef_cars1_presence_children_0_4,coef_cars234_presence_children_0_4,coef_cars234_presence_children_0_4,coef_cars234_presence_children_0_4
7,util_presence_children_5_17,Presence of children age 5-17,(num_children_5_to_15+num_children_16_to_17)>0,,coef_cars1_presence_children_5_17,coef_cars2_presence_children_5_17,coef_cars34_presence_children_5_17,coef_cars34_presence_children_5_17
8,util_num_workers_clip_3,"Number of workers, capped at 3",@df.num_workers.clip(upper=3),,coef_cars1_num_workers_clip_3,coef_cars2_num_workers_clip_3,coef_cars3_num_workers_clip_3,coef_cars4_num_workers_clip_3
9,util_hh_income_0_30k,"Piecewise Linear household income, $0-30k","@df.income_in_thousands.clip(0, 30)",,coef_cars1_hh_income_0_30k,coef_cars2_hh_income_0_30k,coef_cars3_hh_income_0_30k,coef_cars4_hh_income_0_30k
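To make the specification table above more concrete, here is a minimal sketch of how one spec row could turn into utilities and choice probabilities. It assumes a multinomial logit form and treats `cars0` (which has no coefficient in the row) as the reference alternative. The coefficient values are made up for illustration and are not estimation results. The helper names `utilities` and `logit_probabilities` are hypothetical, and the expression is written as a plain Python function rather than the string expression ActivitySim evaluates.

```python
import math

ALTERNATIVES = ["cars0", "cars1", "cars2", "cars3", "cars4"]

# Hypothetical coefficient values, for illustration only (not estimated results).
coefficients = {
    "coef_cars1_drivers_2": 0.0,
    "coef_cars2_drivers_2": 3.0,
    "coef_cars3_drivers_2": 3.5,
    "coef_cars4_drivers_2": 2.0,
}

# One spec row: an expression on the household plus one coefficient name per
# alternative. cars0 has no coefficient, so its utility stays at zero.
spec_row = {
    "label": "util_drivers_2",
    "expression": lambda hh: float(hh["num_drivers"] == 2),
    "cars0": None,
    "cars1": "coef_cars1_drivers_2",
    "cars2": "coef_cars2_drivers_2",
    "cars3": "coef_cars3_drivers_2",
    "cars4": "coef_cars4_drivers_2",
}


def utilities(household, spec_rows, coefficients):
    """Sum expression value times coefficient over all spec rows, per alternative."""
    utils = {alt: 0.0 for alt in ALTERNATIVES}
    for row in spec_rows:
        x = row["expression"](household)
        for alt in ALTERNATIVES:
            name = row[alt]
            if name is not None:
                utils[alt] += coefficients[name] * x
    return utils


def logit_probabilities(utils):
    """Multinomial logit probabilities (softmax), shifted by the max for stability."""
    shift = max(utils.values())
    exp_u = {alt: math.exp(u - shift) for alt, u in utils.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


household = {"num_drivers": 2}
probs = logit_probabilities(utilities(household, [spec_row], coefficients))
for alt in ALTERNATIVES:
    print(alt, round(probs[alt], 4))
```

A full model sums many such rows, so a single row only shifts the relative utilities of the alternatives it touches.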


### Chooser data

In [6]:
data.chooser_data

household_id,model_choice,override_choice,util_drivers_2,util_drivers_3,util_drivers_4_up,util_persons_16_17,util_persons_18_24,util_persons_25_34,util_presence_children_0_4,util_presence_children_5_17,...,HSENROLL,COLLFTE,COLLPTE,TOPOLOGY,TERMINAL,household_density,employment_density,density_index,is_cbd,override_choice_code
166,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,1,3.21263,24.783133,31.566265,13.883217,False,1
197,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,1,3.68156,56.783784,10.459459,8.832526,False,1
268,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,3598.08521,0.00000,1,3.29100,11.947644,45.167539,9.448375,True,2
375,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,1,4.11499,73.040169,28.028350,20.255520,True,2
387,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,227.78223,41.22827,1,3.83527,26.631579,45.868421,16.848945,False,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2863464,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,72.14684,0.00000,1,5.52555,38.187500,978.875000,36.753679,False,2
2863483,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,3,3.99027,39.838272,71.693001,25.608291,True,2
2863806,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,1,4.27539,51.675676,47.216216,24.672699,False,2
2864518,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00000,0.00000,1,25.52083,15.938148,551.353820,15.490363,True,2


# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters. BHHH is the default and typically runs faster, but does not follow constraints on parameters.

In [7]:
model.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
coef_cars1_asc,4.744711,1.1865,0.0,,,0,,4.744711
coef_cars1_asc_county,-0.566000,-0.5660,0.0,,,0,,-0.566000
coef_cars1_asc_marin,-0.243396,-0.2434,0.0,,,0,,-0.243396
coef_cars1_asc_san_francisco,3.984111,0.4259,0.0,,,0,,3.984111
coef_cars1_auto_time_saving_per_worker,-0.039384,0.4707,0.0,,,0,,-0.039384
...,...,...,...,...,...,...,...,...
coef_retail_auto_no_workers,-0.637704,0.0626,0.0,,,0,,-0.637704
coef_retail_auto_workers,-0.531112,0.1646,0.0,,,0,,-0.531112
coef_retail_non_motor,-0.030000,-0.0300,0.0,,,1,,-0.030000
coef_retail_transit_no_workers,-0.333447,-0.3053,0.0,,,0,,-0.333447



Unnamed: 0,0
coef_cars1_asc,4.744711
coef_cars1_asc_county,-0.566000
coef_cars1_asc_marin,-0.243396
coef_cars1_asc_san_francisco,3.984111
coef_cars1_auto_time_saving_per_worker,-0.039384
coef_cars1_density_0_10_no_workers,0.000000
coef_cars1_density_10_up_no_workers,-0.006930
coef_cars1_density_10_up_workers,-0.016448
coef_cars1_drivers_2,0.000000
coef_cars1_drivers_3,0.000000


### Estimated coefficients

In [8]:
model.parameter_summary()

Unnamed: 0,Value,Std Err,t Stat,Signif,Like Ratio,Null Value,Constrained
coef_cars1_asc,4.74,2.66,1.78,,,0.0,
coef_cars1_asc_county,-0.566,,,[],0.00,0.0,
coef_cars1_asc_marin,-0.243,0.0141,-17.28,***,,0.0,
coef_cars1_asc_san_francisco,3.98,2.66,1.50,,,0.0,
coef_cars1_auto_time_saving_per_worker,-0.0394,0.561,-0.07,,,0.0,
coef_cars1_density_0_10_no_workers,0.0,,,,,0.0,fixed value
coef_cars1_density_10_up_no_workers,-0.00693,0.00514,-1.35,,,0.0,
coef_cars1_density_10_up_workers,-0.0164,0.0039,-4.21,***,,0.0,
coef_cars1_drivers_2,0.0,,,,,0.0,fixed value
coef_cars1_drivers_3,0.0,,,,,0.0,fixed value
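As a reading aid for the summary table above, the `t Stat` column appears to be the estimate minus its null value, divided by the standard error. The sketch below recomputes it for a few rows using the rounded values shown in the table. The helper name `t_statistic` and the 1.96 cutoff (the conventional two-sided 5% threshold) are assumptions for illustration, not Larch's own significance rule.

```python
def t_statistic(value, std_err, null_value=0.0):
    """t statistic of an estimate against its null value."""
    return (value - null_value) / std_err


# (name, Value, Std Err) copied from the rounded summary table above.
rows = [
    ("coef_cars1_asc", 4.74, 2.66),
    ("coef_cars1_asc_san_francisco", 3.98, 2.66),
    ("coef_cars1_density_10_up_workers", -0.0164, 0.0039),
]

t_stats = {name: t_statistic(value, std_err) for name, value, std_err in rows}

# Assumed cutoff: |t| > 1.96, the usual two-sided 5% level.
significant = {name: abs(t) > 1.96 for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant at 5%: {significant[name]}")
```

Because the table displays rounded values, a recomputed statistic can differ slightly from the printed one. Rows marked `fixed value` have no standard error and so no t statistic.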


# Output Estimation Results

In [9]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [10]:
model.to_xlsx(
    result_dir/f"{modelname}_model_estimation.xlsx", 
    data_statistics=False,
)

<larch.util.excel.ExcelWriter at 0x7f9d8862f160>

# Next Steps

The final step is to either manually or automatically copy the `*_coefficients_revised.csv` file to the configs folder, rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.
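The copy-and-rename step could be automated along the following lines. This is a minimal sketch using only the standard library. The function name `install_revised_coefficients` is hypothetical, the directory arguments are placeholders for your own estimated-output and configs folders, and keeping a `.bak` copy of the previous coefficients file is an added precaution rather than part of the ActivitySim workflow.

```python
import shutil
from pathlib import Path


def install_revised_coefficients(estimated_dir, configs_dir, modelname):
    """Copy <modelname>_coefficients_revised.csv into the configs folder
    as <modelname>_coefficients.csv, backing up any existing file first."""
    src = Path(estimated_dir) / f"{modelname}_coefficients_revised.csv"
    dst = Path(configs_dir) / f"{modelname}_coefficients.csv"
    if not src.exists():
        raise FileNotFoundError(f"revised coefficients not found: {src}")
    if dst.exists():
        # Keep the previous coefficients next to the new ones (assumed precaution).
        backup = dst.with_name(dst.name + ".bak")
        shutil.copy2(dst, backup)
    shutil.copy2(src, dst)
    return dst
```

For example, `install_revised_coefficients(result_dir, "path/to/configs", modelname)` would use the `result_dir` and `modelname` defined earlier in this notebook, with the configs path replaced by your own.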

In [11]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")

Unnamed: 0,coefficient_name,value,constrain
0,coef_cars1_drivers_2,0.000000,T
1,coef_cars1_drivers_3,0.000000,T
2,coef_cars1_persons_16_17,0.000000,T
3,coef_cars234_asc_marin,0.000000,T
4,coef_cars1_persons_25_34,0.000000,T
...,...,...,...
62,coef_cars4_drivers_3,564.490158,F
63,coef_cars3_drivers_3,5.048488,F
64,coef_cars2_drivers_4_up,6.856405,F
65,coef_cars3_drivers_4_up,8.317950,F
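Before copying the revised file into the configs folder, it can be worth comparing it against the initial coefficients. The sketch below uses the four unconstrained rows visible in the initial and revised tables in this notebook and flags any coefficient that moved by more than a threshold. The helper names and the threshold of 10 are hypothetical choices for illustration. A large jump does not prove a problem, but it may point to a parameter that is poorly identified in a small sample, as cautioned in the Estimate section.

```python
import csv
import io

# Rows copied from the initial coefficients table shown earlier in this notebook.
initial_csv = """coefficient_name,value,constrain
coef_cars4_drivers_3,5.2080,F
coef_cars3_drivers_3,5.5131,F
coef_cars2_drivers_4_up,6.3662,F
coef_cars3_drivers_4_up,8.5148,F
"""

# Rows copied from the revised coefficients table above.
revised_csv = """coefficient_name,value,constrain
coef_cars4_drivers_3,564.490158,F
coef_cars3_drivers_3,5.048488,F
coef_cars2_drivers_4_up,6.856405,F
coef_cars3_drivers_4_up,8.317950,F
"""


def read_values(text):
    """Map coefficient_name -> value from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["coefficient_name"]: float(row["value"]) for row in reader}


def flag_large_changes(initial, revised, max_abs_change=10.0):
    """Return {name: (old, new)} for coefficients that moved more than the threshold."""
    flagged = {}
    for name, new in revised.items():
        old = initial.get(name)
        if old is not None and abs(new - old) > max_abs_change:
            flagged[name] = (old, new)
    return flagged


flagged = flag_large_changes(read_values(initial_csv), read_values(revised_csv))
for name, (old, new) in flagged.items():
    print(f"{name}: {old} -> {new}")
```

In practice the same check could read the two CSV files from disk instead of the inline strings used here.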
