# Estimating Tour Mode Choice

This notebook illustrates how to re-estimate ActivitySim's tour mode choice model.  The steps in the process are:
  - Run ActivitySim in estimation mode to read household travel survey files, run the households through the tour mode choice model step, and write an estimation data bundle (EDB) that contains the model utility specifications, coefficients, chooser data, and alternatives data.
  - Read and transform the EDB into the format required by the model estimation package [larch](https://larch.newman.me) and then re-estimate the model coefficients.  No changes to the model specification will be made.
  - Update the ActivitySim model coefficients and re-run the model in simulation mode.
  
The basic estimation workflow is shown below and explained in the next steps.

![estimation workflow](https://github.com/RSGInc/activitysim/raw/develop/docs/images/estimation_example.jpg)

# Load libraries

In [2]:
import larch  # !conda install larch #for estimation
import pandas as pd
import numpy as np
import yaml 
import larch.util.excel
import larch_asim  # utility functions in a local module
import os

from larch import P,X

# Required Inputs

In addition to a working ActivitySim model setup, estimation mode requires a household travel survey in ActivitySim format.  The survey tables closely mirror ActivitySim's simulation model tables:

 - households
 - persons
 - tours
 - joint_tour_participants
 - trips (not yet implemented)

Examples of the ActivitySim format household travel survey are included in the [example_estimation data folders](https://github.com/RSGInc/activitysim/tree/develop/activitysim/examples/example_estimation).  The user is responsible for formatting their household travel survey into the appropriate format.  

After creating an ActivitySim-format household travel survey, run the `scripts/infer.py` script to append additional calculated fields.  One example is the households field `joint_tour_frequency`, which is calculated from the `tours` and `joint_tour_participants` tables.  
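
To make the idea of a calculated field concrete, the sketch below counts joint tours per household from a small, made-up tours table. It is only an illustration in plain Python: the rows, the `count_joint_tours` helper, and the use of `tour_category == "joint"` as the joint-tour test are assumptions, not the logic of `scripts/infer.py`.

```python
from collections import Counter

# Hypothetical survey tours; column names follow the tours table used in this notebook.
tours = [
    {"tour_id": 1, "household_id": 10, "tour_category": "joint"},
    {"tour_id": 2, "household_id": 10, "tour_category": "mandatory"},
    {"tour_id": 3, "household_id": 10, "tour_category": "joint"},
    {"tour_id": 4, "household_id": 11, "tour_category": "mandatory"},
]


def count_joint_tours(tours, household_ids):
    """Return {household_id: number of joint tours}, with 0 for households that have none."""
    counts = Counter(
        t["household_id"] for t in tours if t["tour_category"] == "joint"
    )
    return {hh: counts.get(hh, 0) for hh in household_ids}


# Household 12 has no tours at all and still receives a value.
joint_counts = count_joint_tours(tours, household_ids=[10, 11, 12])
print(joint_counts)
```

The point of the sketch is that the derived household field depends on other survey tables, which is why the inference step runs after the survey has been formatted.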

The input survey files are below.

### Survey households

In [3]:
pd.read_csv("../data_sf/survey_data/override_households.csv")


### Survey persons

In [4]:
pd.read_csv("../data_sf/survey_data/override_persons.csv")

Unnamed: 0,person_id,household_id,age,PNUM,sex,pemploy,pstudent,ptype,school_taz,workplace_taz,free_parking_at_work,cdap_activity,mandatory_tour_frequency,_escort,_shopping,_othmaint,_othdiscr,_eatout,_social,non_mandatory_tour_frequency
0,166,166,54,1,2,3,3,4,-1,-1,False,N,,0,0,0,0,1,0,4
1,197,197,46,1,2,3,3,4,-1,-1,False,N,,0,1,0,0,0,0,16
2,268,268,46,1,1,3,3,4,-1,-1,False,N,,0,0,1,1,0,0,9
3,375,375,54,1,2,3,3,4,-1,-1,False,N,,0,0,1,0,0,0,8
4,387,387,44,1,2,3,3,4,-1,-1,False,N,,1,0,0,1,0,0,33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4401,7554799,2863464,93,1,2,3,3,5,-1,-1,False,N,,0,0,0,1,0,0,1
4402,7554818,2863483,68,1,1,3,3,5,-1,-1,False,N,,0,0,1,1,0,0,9
4403,7555141,2863806,93,1,2,3,3,5,-1,-1,False,N,,0,2,0,1,0,0,17
4404,7555853,2864518,71,1,1,3,3,5,-1,-1,False,N,,0,0,0,0,0,1,2


### Survey tours

In [5]:
pd.read_csv("../data_sf/survey_data/override_tours.csv")

Unnamed: 0,tour_id,survey_tour_id,person_id,household_id,tour_type,tour_category,destination,origin,start,end,tour_mode,survey_parent_tour_id,parent_tour_id,composition,tdd,atwork_subtour_frequency
0,25820,258200,629,629,school,mandatory,133.0,131.0,12.0,15.0,WALK,,,,115,
1,52265,522650,1274,1274,school,mandatory,188.0,166.0,9.0,15.0,WALK_LOC,,,,76,
2,1117937,11179370,27266,27266,school,mandatory,133.0,9.0,17.0,18.0,WALK_HVY,,,,163,
3,1148523,11485230,28012,28012,school,mandatory,12.0,10.0,17.0,22.0,WALK_LRF,,,,167,
4,1208547,12085470,29476,29476,school,mandatory,13.0,16.0,8.0,15.0,WALK_LOC,,,,61,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5768,302942627,3029426270,7388844,2750003,maint,atwork,5.0,7.0,14.0,14.0,WALK,3.029426e+09,302942643.0,,135,
5769,305120465,3051204650,7441962,2758909,maint,atwork,110.0,2.0,12.0,13.0,SHARED2FREE,3.051205e+09,305120481.0,,113,
5770,308000655,3080006550,7512211,2820876,eat,atwork,14.0,1.0,12.0,13.0,WALK,3.080007e+09,308000690.0,,113,
5771,308073840,3080738400,7513996,2822661,eat,atwork,69.0,107.0,8.0,16.0,SHARED3FREE,3.080739e+09,308073875.0,,62,


### Survey joint tour participants

In [6]:
pd.read_csv("../data_sf/survey_data/survey_joint_tour_participants.csv")

Unnamed: 0,participant_id,tour_id,household_id,person_id,participant_num
0,22095828301,220958283,2223759,5389226,1
1,22095828302,220958283,2223759,5389227,2
2,14429508701,144295087,1606646,3519392,1
3,14429508702,144295087,1606646,3519393,2
4,28367651801,283676518,2628704,6918939,1
...,...,...,...,...,...
226,16297928102,162979281,1769918,3975105,2
227,16297928103,162979281,1769918,3975106,3
228,16297928104,162979281,1769918,3975107,4
229,26353054902,263530549,2519358,6427575,1


# Run the Estimation Example

The next step is to run the model with an `estimation.yaml` settings file.  The following settings make it write the EDB for tour mode choice:

```
enable: True

bundles:
  - tour_mode_choice

survey_tables:
  households:
    file_name: survey_data/override_households.csv
    index_col: household_id
  persons:
    file_name:  survey_data/override_persons.csv
    index_col: person_id
  tours:
    file_name:  survey_data/override_tours.csv
  joint_tour_participants:
    file_name:  survey_data/override_joint_tour_participants.csv
```

These settings enable estimation mode, identify the models for which to write estimation data bundles (EDBs), and point to the input survey tables, which supply the override values for each model choice.  

With this setup, the model will output an EDB with the following files:
  - model settings - tour_mode_choice_model_settings.yaml
  - coefficients - tour_mode_choice_coefficients.csv
  - coefficients template by tour purpose - tour_mode_choice_coefficients_template.csv
  - utilities specification - tour_mode_choice_SPEC.csv
  - chooser data - tour_mode_choice_values_combined.csv
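
Once the run below has finished, a quick check that all five files were written can save a confusing error later. The following is a minimal sketch using only the standard library; the `missing_edb_files` helper is hypothetical, and the demonstration uses a temporary directory rather than a real EDB.

```python
import os
import tempfile

# File names as listed above for the tour_mode_choice EDB.
EXPECTED_EDB_FILES = [
    "tour_mode_choice_model_settings.yaml",
    "tour_mode_choice_coefficients.csv",
    "tour_mode_choice_coefficients_template.csv",
    "tour_mode_choice_SPEC.csv",
    "tour_mode_choice_values_combined.csv",
]


def missing_edb_files(edb_directory, expected=EXPECTED_EDB_FILES):
    """Return the expected file names that are not present in edb_directory."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(edb_directory, name))
    ]


# Demonstration on a temporary directory that holds only the first two files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in EXPECTED_EDB_FILES[:2]:
        open(os.path.join(demo_dir, name), "w").close()
    missing = missing_edb_files(demo_dir)

print(missing)
```

In this notebook the helper would be called with the EDB folder, for example `missing_edb_files("../output/estimation_data_bundle/tour_mode_choice/")`, and an empty list would mean every file is in place.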
  
The following code runs the software in estimation mode, inheriting the settings from the simulation setup and using the San Francisco county data.  It produces the tour_mode_choice model EDB but runs all the model steps identified in the inherited settings file.  

In [18]:
!activitysim run -c ../configs -c ../../test/configs -d ../data_sf -d ../../test/data -o ../output

Configured logging using basicConfig
INFO:activitysim:Configured logging using basicConfig
INFO:activitysim.cli.run:using configs_dir: ['../configs', '../../test/configs']
INFO:activitysim.cli.run:using data_dir: ['../data_sf', '../../test/data']
INFO:activitysim.cli.run:using output_dir: ['../output']
INFO - Read logging configuration from: ../configs\logging.yaml
INFO - setting households_sample_size: 0
INFO - setting chunk_size: 0
INFO - setting multiprocess: None
INFO - setting num_processes: None
INFO - setting resume_after: None
INFO - run single process simulation
INFO - open_pipeline
INFO - Set random seed base to 0
INFO - Time to execute open_pipeline : 0.021 seconds (0.0 minutes)
INFO - preload_injectables
INFO - Time to execute preload_injectables : 0.001 seconds (0.0 minutes)
INFO - Reading CSV file ../data_sf\land_use.csv
INFO - renaming columns: {'ZONE': 'TAZ', 'COUNTY': 'county_id'}
INFO - keeping columns: ['DISTRICT', 'SD', 'county_id', 'TOTHH', 'TOTPOP', 'TOTACRE', 'RE

INFO - Running workplace_location.i1.simulate.work_med with 544 persons
DEBUG - workplace_location: write_table cache: choosers
DEBUG - workplace_location: write_table cache: interaction_sample_alternatives
INFO - Running chunk 1 of 1 size 544
INFO - Running eval_interaction_utilities on 103360 rows
INFO - workplace_location: eval_interaction_utilities write_interaction_expression_values workplace_location.i1.simulate.work_med.interaction_sample_simulate.eval_interaction_utilities
DEBUG - workplace_location: write_table cache: interaction_expression_values
DEBUG - workplace_location: write_table cache: choices
DEBUG - get_survey_values: reindexing using persons.index
DEBUG - workplace_location: write_table cache: override_choices
INFO - Running workplace_location.i1.sample.work_high with 614 persons
INFO - Estimation mode for workplace_location.i1.sample.work_high using unsampled alternatives short_circuit_choices
INFO - Running chunk 1 of 1 size 614
INFO - Running eval_interaction_uti

INFO - Running eval_interaction_utilities on 4800 rows
INFO - non_mandatory_tour_frequency_PTYPE_DRIVING: eval_interaction_utilities write_interaction_expression_values non_mandatory_tour_frequency.PTYPE_DRIVING.interaction_simulate.interaction_simulate.eval_interaction_utilities
DEBUG - non_mandatory_tour_frequency_PTYPE_DRIVING: write_table write: interaction_expression_values
DEBUG - non_mandatory_tour_frequency_PTYPE_DRIVING: write_table cache: choices
DEBUG - get_survey_values: reindexing using persons.index
DEBUG - non_mandatory_tour_frequency_PTYPE_DRIVING: write_table cache: override_choices
DEBUG - non_mandatory_tour_frequency_PTYPE_DRIVING: write_omnibus_table: choosers_combined table_names: ['choices', 'override_choices', 'choosers']
DEBUG - non_mandatory_tour_frequency_PTYPE_DRIVING: write_omnibus_choosers: ../output\estimation_data_bundle\non_mandatory_tour_frequency_PTYPE_DRIVING\non_mandatory_tour_frequency_PTYPE_DRIVING_choosers_combined.csv
INFO - non_mandatory_tour_fr

# Read EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

In [19]:
edb_directory = "../output/estimation_data_bundle/tour_mode_choice/"

def read_csv(filename, **kwargs):
    return pd.read_csv(os.path.join(edb_directory, filename), **kwargs)

In [20]:
coefficients = read_csv(
    "tour_mode_choice_coefficients.csv",
    index_col='coefficient_name',
)
coef_template = read_csv(
    "tour_mode_choice_coefficients_template.csv", 
    index_col='coefficient_name',
)
spec = read_csv("tour_mode_choice_SPEC.csv")
values = read_csv("tour_mode_choice_values_combined.csv")

### Model settings

In [21]:
# Open the settings file in a context manager so that it is closed after reading
with open(os.path.join(edb_directory, "tour_mode_choice_model_settings.yaml"), "r") as f:
    settings = yaml.load(f, Loader=yaml.SafeLoader)

settings

{'LOGIT_TYPE': 'NL',
 'NESTS': {'name': 'root',
  'coefficient': 'coef_nest_root',
  'alternatives': [{'name': 'AUTO',
    'coefficient': 'coef_nest_AUTO',
    'alternatives': [{'name': 'DRIVEALONE',
      'coefficient': 'coef_nest_AUTO_DRIVEALONE',
      'alternatives': ['DRIVEALONEFREE', 'DRIVEALONEPAY']},
     {'name': 'SHAREDRIDE2',
      'coefficient': 'coef_nest_AUTO_SHAREDRIDE2',
      'alternatives': ['SHARED2FREE', 'SHARED2PAY']},
     {'name': 'SHAREDRIDE3',
      'coefficient': 'coef_nest_AUTO_SHAREDRIDE3',
      'alternatives': ['SHARED3FREE', 'SHARED3PAY']}]},
   {'name': 'NONMOTORIZED',
    'coefficient': 'coef_nest_NONMOTORIZED',
    'alternatives': ['WALK', 'BIKE']},
   {'name': 'TRANSIT',
    'coefficient': 'coef_nest_TRANSIT',
    'alternatives': [{'name': 'WALKACCESS',
      'coefficient': 'coef_nest_TRANSIT_WALKACCESS',
      'alternatives': ['WALK_LOC',
       'WALK_LRF',
       'WALK_EXP',
       'WALK_HVY',
       'WALK_COM']},
     {'name': 'DRIVEACCESS',
      

### Coefficients

In [22]:
coefficients

coefficient_name,value,constrain
coef_nest_root,1.000,T
coef_nest_AUTO,0.720,T
coef_nest_AUTO_DRIVEALONE,0.350,T
coef_nest_AUTO_SHAREDRIDE2,0.350,T
coef_nest_AUTO_SHAREDRIDE3,0.350,T
...,...,...
walk_transit_CBD_ASC_atwork,0.564,F
drive_transit_CBD_ASC_eatout_escort_othdiscr_othmaint_shopping_social,0.525,F
drive_transit_CBD_ASC_school_univ,0.672,F
drive_transit_CBD_ASC_work,1.100,F


### Coef_template - coefficients by tour purpose

In [23]:
coef_template

coefficient_name,eatout,escort,othdiscr,othmaint,school,shopping,social,univ,work,atwork
coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root,coef_nest_root
coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO,coef_nest_AUTO
coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE,coef_nest_AUTO_DRIVEALONE
coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2,coef_nest_AUTO_SHAREDRIDE2
coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3,coef_nest_AUTO_SHAREDRIDE3
...,...,...,...,...,...,...,...,...,...,...
express_bus_ASC,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_school_univ,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_eatout_escort_othdiscr_othmain...,express_bus_ASC_school_univ,express_bus_ASC_work,express_bus_ASC_eatout_escort_othdiscr_othmain...
heavy_rail_ASC,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_school_univ,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...,heavy_rail_ASC_school_univ,heavy_rail_ASC_work,heavy_rail_ASC_eatout_escort_othdiscr_othmaint...
commuter_rail_ASC,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_school_univ,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_eatout_escort_othdiscr_othma...,commuter_rail_ASC_school_univ,commuter_rail_ASC_work,commuter_rail_ASC_eatout_escort_othdiscr_othma...
walk_transit_CBD_ASC,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_school_univ,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_eatout_escort_othdiscr_ot...,walk_transit_CBD_ASC_school_univ,walk_transit_CBD_ASC_work,walk_transit_CBD_ASC_atwork


### Utility specifications

In [24]:
# Remove apostrophes from Label names
spec['Label'] = spec['Label'].str.replace("'","")

In [25]:
spec

Unnamed: 0,Label,Description,Expression,DRIVEALONEFREE,DRIVEALONEPAY,SHARED2FREE,SHARED2PAY,SHARED3FREE,SHARED3PAY,WALK,...,WALK_HVY,WALK_COM,DRIVE_LOC,DRIVE_LRF,DRIVE_EXP,DRIVE_HVY,DRIVE_COM,TAXI,TNC_SINGLE,TNC_SHARED
0,#,Drive alone no toll,,,,,,,,,...,,,,,,,,,,
1,util_DRIVEALONEFREE_Unavailable,DRIVEALONEFREE - Unavailable,sov_available == False,-999,,,,,,,...,,,,,,,,,,
2,util_DRIVEALONEFREE_Unavailable_for_zero_auto_...,DRIVEALONEFREE - Unavailable for zero auto hou...,auto_ownership == 0,-999,,,,,,,...,,,,,,,,,,
3,util_DRIVEALONEFREE_Unavailable_for_persons_le...,DRIVEALONEFREE - Unavailable for persons less ...,age < 16,-999,,,,,,,...,,,,,,,,,,
4,util_DRIVEALONEFREE_Unavailable_for_joint_tours,DRIVEALONEFREE - Unavailable for joint tours,is_joint == True,-999,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
340,#,FIXME - skims aren't symmetrical,so we have to make sure they can get back,,,,,,,,...,,,,,,,,,,
341,util_Walk_not_available_for_long_distances,Walk not available for long distances,@od_skims.max('DISTWALK') > 3,,,,,,,-999,...,,,,,,,,,,
342,util_Bike_not_available_for_long_distances,Bike not available for long distances,@od_skims.max('DISTBIKE') > 8,,,,,,,,...,,,,,,,,,,
343,util_Drive_alone_not_available_for_escort_tours,Drive alone not available for escort tours,is_escort,-999,-999,,,,,,...,,,,,,,,,,


In [26]:
# Check for double-parameters
ss = spec.query("Label!='#'").iloc[:,3:].stack().str.split("*")
st = ss.apply(lambda x: len(x))>1
assert len(ss[st]) == 0
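
The check above splits every utility cell on `*` and fails if any cell yields more than one piece, that is, if a cell multiplies two terms. A plain Python sketch of the same test is shown here on made-up cells; the labels `util_example_time` and `util_example_product` and the names `coef_ivt`, `coef_a`, and `coef_b` are hypothetical.

```python
# Hypothetical spec cells, keyed by (row label, alternative column).
cells = {
    ("util_DRIVEALONEFREE_Unavailable", "DRIVEALONEFREE"): "-999",
    ("util_example_time", "DRIVEALONEFREE"): "coef_ivt",
    ("util_example_product", "WALK"): "coef_a*coef_b",
}


def double_parameter_cells(cells):
    """Return the keys of cells whose text splits into more than one '*'-separated term."""
    return [key for key, text in cells.items() if len(text.split("*")) > 1]


flagged = double_parameter_cells(cells)
print(flagged)
```

Listing the offending cells, rather than only asserting that there are none, makes it easier to find the row to fix when the check fails.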

### Alternative values

In [29]:
# Remove apostrophes from column names
values.columns = values.columns.str.replace("'","")
values.fillna(0, inplace=True)
values

Unnamed: 0,tour_id,model_choice,override_choice,util_DRIVEALONEFREE_Unavailable,util_DRIVEALONEFREE_Unavailable_for_zero_auto_households,util_DRIVEALONEFREE_Unavailable_for_persons_less_than_16,util_DRIVEALONEFREE_Unavailable_for_joint_tours,util_DRIVEALONEFREE_Unavailable_if_didnt_drive_to_work,util_DRIVEALONEFREE_In_vehicle_time,util_DRIVEALONEFREE_Terminal_time,...,walk_heavyrail_available,walk_lrf_available,walk_ferry_available,drive_local_available,drive_commuter_available,drive_express_available,drive_heavyrail_available,drive_lrf_available,drive_ferry_available,destination_in_cbd
0,6812,WALK,WALK,0.0,1.0,0.0,0.0,0.0,1.620000,12.85052,...,False,False,False,False,False,False,False,False,False,0
1,8110,WALK_LRF,WALK_LRF,0.0,1.0,0.0,0.0,0.0,13.010000,22.05276,...,False,True,False,False,False,False,False,False,False,1
2,11013,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,25.420000,19.56452,...,False,True,False,True,True,False,True,False,False,1
3,11016,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,18.779999,14.01624,...,False,True,False,True,True,False,True,False,False,0
4,15403,DRIVEALONEFREE,SHARED2FREE,0.0,0.0,0.0,0.0,0.0,7.180000,19.88140,...,True,False,False,True,True,False,True,False,False,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5318,309760814,SHARED2FREE,SHARED2FREE,0.0,0.0,0.0,0.0,0.0,13.700001,16.73920,...,False,False,False,True,True,False,True,False,False,1
5319,309760815,DRIVEALONEFREE,DRIVEALONEFREE,0.0,0.0,0.0,0.0,0.0,4.770000,18.44424,...,False,False,False,True,True,False,True,False,False,0
5320,309790009,BIKE,BIKE,0.0,0.0,0.0,0.0,0.0,14.809999,21.41740,...,False,False,False,True,True,False,True,False,False,1
5321,309796968,SHARED2FREE,SHARED2FREE,0.0,0.0,0.0,0.0,0.0,12.880000,10.61908,...,False,True,False,True,True,False,True,False,False,0


# Data Processing and Estimation Setup

The next step is to transform the EDB into the form larch needs for model re-estimation.  

### Alternatives

In [30]:
alt_names = list(spec.columns[3:])
alt_codes = np.arange(1,len(alt_names)+1)
alt_names_to_codes = dict(zip(alt_names, alt_codes))
alt_codes_to_names = dict(zip(alt_codes, alt_names))
alt_names_to_codes

{'DRIVEALONEFREE': 1,
 'DRIVEALONEPAY': 2,
 'SHARED2FREE': 3,
 'SHARED2PAY': 4,
 'SHARED3FREE': 5,
 'SHARED3PAY': 6,
 'WALK': 7,
 'BIKE': 8,
 'WALK_LOC': 9,
 'WALK_LRF': 10,
 'WALK_EXP': 11,
 'WALK_HVY': 12,
 'WALK_COM': 13,
 'DRIVE_LOC': 14,
 'DRIVE_LRF': 15,
 'DRIVE_EXP': 16,
 'DRIVE_HVY': 17,
 'DRIVE_COM': 18,
 'TAXI': 19,
 'TNC_SINGLE': 20,
 'TNC_SHARED': 21}
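
Because choices are later converted from names to codes with this dictionary, any choice name that is not a spec column ends up without a code (pandas `Series.map` returns `NaN` for unmapped values). The sketch below shows the same behavior with plain dictionaries on a shortened alternative list; the choice `FERRY` is a made-up example of a name missing from the spec.

```python
# Shortened alternative list for illustration; the notebook uses all 21 spec columns.
alt_names = ["DRIVEALONEFREE", "DRIVEALONEPAY", "SHARED2FREE"]
alt_names_to_codes = {name: code for code, name in enumerate(alt_names, start=1)}
alt_codes_to_names = {code: name for name, code in alt_names_to_codes.items()}

# Hypothetical observed choices, including one name that is not an alternative.
choices = ["SHARED2FREE", "DRIVEALONEFREE", "FERRY"]

# dict.get returns None for an unmapped name, mirroring the missing value from Series.map.
codes = [alt_names_to_codes.get(choice) for choice in choices]
unmapped = sorted({choice for choice in choices if choice not in alt_names_to_codes})

print(codes)
print(unmapped)
```

Checking for unmapped names before estimation is a cheap way to catch survey mode labels that do not match the model's alternatives.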

### Nesting structure

In [31]:
tree = larch_asim.construct_nesting_tree(alt_names, settings['NESTS'])

tree
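
The `NESTS` setting read earlier is a recursive dictionary: each nest has a name, a coefficient, and a list of alternatives that are either elemental alternative names or further nests. The sketch below walks such a dictionary and lists every nest with its parent. It is not the implementation of `larch_asim.construct_nesting_tree`; the `iter_nests` helper is hypothetical, and the dictionary is only a subset of the settings output shown above.

```python
# A subset of settings['NESTS'], copied from the model settings output above.
nests = {
    "name": "root",
    "coefficient": "coef_nest_root",
    "alternatives": [
        {
            "name": "AUTO",
            "coefficient": "coef_nest_AUTO",
            "alternatives": [
                {
                    "name": "DRIVEALONE",
                    "coefficient": "coef_nest_AUTO_DRIVEALONE",
                    "alternatives": ["DRIVEALONEFREE", "DRIVEALONEPAY"],
                },
                {
                    "name": "SHAREDRIDE2",
                    "coefficient": "coef_nest_AUTO_SHAREDRIDE2",
                    "alternatives": ["SHARED2FREE", "SHARED2PAY"],
                },
                {
                    "name": "SHAREDRIDE3",
                    "coefficient": "coef_nest_AUTO_SHAREDRIDE3",
                    "alternatives": ["SHARED3FREE", "SHARED3PAY"],
                },
            ],
        },
        {
            "name": "NONMOTORIZED",
            "coefficient": "coef_nest_NONMOTORIZED",
            "alternatives": ["WALK", "BIKE"],
        },
    ],
}


def iter_nests(node, parent=None):
    """Yield (nest name, coefficient name, parent name, child names) depth-first."""
    children = node["alternatives"]
    child_names = [c if isinstance(c, str) else c["name"] for c in children]
    yield node["name"], node["coefficient"], parent, child_names
    for child in children:
        if isinstance(child, dict):  # strings are elemental alternatives, not nests
            yield from iter_nests(child, parent=node["name"])


nest_rows = list(iter_nests(nests))
for row in nest_rows:
    print(row)
```

Each yielded row pairs a nest with the coefficient that scales it, which is the information a nesting tree needs in addition to the elemental alternatives.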

In [32]:
tree.elemental_names()

{1: 'DRIVEALONEFREE',
 2: 'DRIVEALONEPAY',
 3: 'SHARED2FREE',
 4: 'SHARED2PAY',
 5: 'SHARED3FREE',
 6: 'SHARED3PAY',
 7: 'WALK',
 8: 'BIKE',
 9: 'WALK_LOC',
 10: 'WALK_LRF',
 11: 'WALK_EXP',
 12: 'WALK_HVY',
 13: 'WALK_COM',
 14: 'DRIVE_LOC',
 15: 'DRIVE_LRF',
 16: 'DRIVE_EXP',
 17: 'DRIVE_HVY',
 18: 'DRIVE_COM',
 19: 'TAXI',
 20: 'TNC_SINGLE',
 21: 'TNC_SHARED'}

### List tour purposes

In [33]:
purposes = list(coef_template.columns)
purposes

['eatout',
 'escort',
 'othdiscr',
 'othmaint',
 'school',
 'shopping',
 'social',
 'univ',
 'work',
 'atwork']

### Setup purpose specific models

In [34]:
m = {purpose:larch.Model(graph=tree) for purpose in purposes}

In [16]:
for alt_code, alt_name in tree.elemental_names().items():
    # Read in base utility function for this alt_name
    u = larch_asim.linear_utility_from_spec(
        spec, x_col='Label', p_col=alt_name, 
        ignore_x=('#',), 
    )
    for purpose in purposes:
        # Modify utility function based on template for purpose
        u_purp = sum(
            (
                P(coef_template[purpose].get(i.param,i.param)) 
                * i.data * i.scale
            )
            for i in u
        )
        m[purpose].utility_co[alt_code] = u_purp
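
The key step in the loop above is `coef_template[purpose].get(i.param, i.param)`: a generic coefficient name from the spec is replaced by its purpose-specific name when the template has an entry, and is kept unchanged otherwise. The sketch below reproduces that lookup with plain dictionaries. The two template entries are taken from the `coef_template` output above; `coef_not_in_template` is a hypothetical name used to show the fallback.

```python
# Two template columns, restricted to one row visible in the coef_template output.
coef_template = {
    "work": {"walk_transit_CBD_ASC": "walk_transit_CBD_ASC_work"},
    "atwork": {"walk_transit_CBD_ASC": "walk_transit_CBD_ASC_atwork"},
}


def resolve(purpose, param):
    """Return the purpose-specific coefficient name, or param itself if there is no entry."""
    return coef_template[purpose].get(param, param)


resolved = {
    (purpose, param): resolve(purpose, param)
    for purpose in ("work", "atwork")
    for param in ("walk_transit_CBD_ASC", "coef_not_in_template")
}
for key, name in resolved.items():
    print(key, "->", name)
```

This is why the same utility specification can yield ten models with different parameter names: only the names change by purpose, not the data columns or the structure.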


### Set parameter values

In [35]:
for model in m.values():
    larch_asim.explicit_value_parameters(model)

In [36]:
larch_asim.apply_coefficients(coefficients, m)

### Survey choice

In [37]:
# Estimate against the survey (override) choice rather than the simulated model choice
values['override_choice_code'] = values.override_choice.map(alt_names_to_codes)

In [38]:
d = larch.DataFrames(
    co=values.set_index('tour_id'),
    av=True,
    alt_codes=alt_codes,
    alt_names=alt_names,
)

In [39]:
for purpose, model in m.items():
    model.dataservice = d.selector_co(f"tour_type=='{purpose}'")
    model.choice_co_code = 'override_choice_code'

In [40]:
from larch.model.model_group import ModelGroup
mg = ModelGroup(m.values())

# Estimate

With the models set up for estimation, the next step is to estimate the model coefficients.  Use a sufficiently large household sample and set of zones to avoid an over-specified model, which has no numerically stable likelihood-maximizing solution.
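
One quick diagnostic for a sample that is too small is to count how often each alternative is chosen within each tour purpose. An alternative that is never chosen for a purpose gives the likelihood little information about that alternative's parameters. The sketch below does this count in plain Python on made-up rows; the `unchosen_alternatives` helper is hypothetical, and the `tour_type` and `override_choice` names follow the chooser data columns used in this notebook.

```python
from collections import Counter

# Hypothetical chooser rows with the tour purpose and the observed survey choice.
rows = [
    {"tour_type": "work", "override_choice": "DRIVEALONEFREE"},
    {"tour_type": "work", "override_choice": "WALK_LOC"},
    {"tour_type": "work", "override_choice": "DRIVEALONEFREE"},
    {"tour_type": "school", "override_choice": "WALK"},
]
alternatives = ["DRIVEALONEFREE", "WALK", "WALK_LOC"]  # shortened list for illustration


def unchosen_alternatives(rows, alternatives):
    """Return {purpose: [alternatives never chosen for that purpose]}."""
    counts = Counter((r["tour_type"], r["override_choice"]) for r in rows)
    purposes = sorted({r["tour_type"] for r in rows})
    return {
        purpose: [alt for alt in alternatives if counts[(purpose, alt)] == 0]
        for purpose in purposes
    }


never_chosen = unchosen_alternatives(rows, alternatives)
print(never_chosen)
```

Long lists of unchosen alternatives for a purpose suggest that the sample may be too thin for that purpose's model.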

In [41]:
mg.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided
req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
coef_nest_AUTO,0.72,1.0,1.0,,,1,,0.72
coef_nest_AUTO_DRIVEALONE,0.35,1.0,1.0,,,1,,0.35
coef_nest_AUTO_SHAREDRIDE2,0.35,1.0,1.0,,,1,,0.35
coef_nest_AUTO_SHAREDRIDE3,0.35,1.0,1.0,,,1,,0.35
coef_nest_NONMOTORIZED,0.72,1.0,1.0,,,1,,0.72
coef_nest_RIDEHAIL,0.36,1.0,1.0,,,1,,0.36
coef_nest_TRANSIT,0.72,1.0,1.0,,,1,,0.72
coef_nest_TRANSIT_DRIVEACCESS,0.5,1.0,1.0,,,1,,0.5
coef_nest_TRANSIT_WALKACCESS,0.5,1.0,1.0,,,1,,0.5


Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
coef_nest_AUTO,0.72
coef_nest_AUTO_DRIVEALONE,0.35
coef_nest_AUTO_SHAREDRIDE2,0.35
coef_nest_AUTO_SHAREDRIDE3,0.35
coef_nest_NONMOTORIZED,0.72
coef_nest_RIDEHAIL,0.36
coef_nest_TRANSIT,0.72
coef_nest_TRANSIT_DRIVEACCESS,0.50
coef_nest_TRANSIT_WALKACCESS,0.50
coef_nest_AUTO,0.0

Unnamed: 0,0
coef_nest_AUTO,0.72
coef_nest_AUTO_DRIVEALONE,0.35
coef_nest_AUTO_SHAREDRIDE2,0.35
coef_nest_AUTO_SHAREDRIDE3,0.35
coef_nest_NONMOTORIZED,0.72
coef_nest_RIDEHAIL,0.36
coef_nest_TRANSIT,0.72
coef_nest_TRANSIT_DRIVEACCESS,0.5
coef_nest_TRANSIT_WALKACCESS,0.5

Unnamed: 0,0
coef_nest_AUTO,0.0
coef_nest_AUTO_DRIVEALONE,0.0
coef_nest_AUTO_SHAREDRIDE2,0.0
coef_nest_AUTO_SHAREDRIDE3,0.0
coef_nest_NONMOTORIZED,0.0
coef_nest_RIDEHAIL,0.0
coef_nest_TRANSIT,0.0
coef_nest_TRANSIT_DRIVEACCESS,0.0
coef_nest_TRANSIT_WALKACCESS,0.0


# Output Estimation Results

In [42]:
est_names = [j for j in coefficients.index if j in mg.pf.index]

In [43]:
# Write re-estimated value back into the coefficients file.
coefficients.loc[est_names, 'value'] = mg.pf.loc[est_names, 'value']
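
The two cells above update only the coefficients that appear both in the coefficients file and in the estimated parameter frame. Coefficients absent from the estimation keep their original values, and estimated parameters absent from the file are not added. A plain dictionary sketch of that rule follows; the current values are taken from the coefficients table above, while the re-estimated values and the name `extra_param` are hypothetical.

```python
# Current coefficient values (two names taken from the coefficients table above).
coefficients = {"coef_nest_AUTO": 0.72, "drive_transit_CBD_ASC_work": 1.1}

# Hypothetical re-estimated values; "extra_param" stands for a name absent from the file.
estimated = {"drive_transit_CBD_ASC_work": 0.95, "extra_param": 0.1}

# Keep the file's order and update only the names present in both.
est_names = [name for name in coefficients if name in estimated]
for name in est_names:
    coefficients[name] = estimated[name]

print(est_names)
print(coefficients)
```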

### Write the re-estimated coefficients file

In [44]:
# Write out replacement coefficients file and model summaries
os.makedirs(os.path.join(edb_directory,'estimated'), exist_ok=True)

coefficients.reset_index().to_csv(
    os.path.join(
        edb_directory, 
        'estimated',
        "tour_mode_choice_coefficients_revised.csv",
    ),
    index=False,
)



### Write the model estimation report, including coefficient t-statistics and log likelihood

In [45]:
for purpose, model in m.items():
    model.to_xlsx(
        os.path.join(
            edb_directory, 
            'estimated',
            f"tour_mode_choice_{purpose}_model_estimation.xlsx",
        )
    )

# Next Steps

The final step is to copy the `tour_mode_choice_coefficients_revised.csv` file to the configs folder, either manually or automatically, rename it to `tour_mode_choice_coeffs.csv`, and run ActivitySim in simulation mode.
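
The copy and rename can be scripted with the standard library. The following is a minimal sketch: the `install_coefficients` helper is hypothetical, and the demonstration uses temporary directories and a one-row file in place of the real EDB and configs folders.

```python
import os
import shutil
import tempfile


def install_coefficients(edb_directory, configs_dir):
    """Copy the revised coefficients file into configs_dir under its new name."""
    src = os.path.join(
        edb_directory, "estimated", "tour_mode_choice_coefficients_revised.csv"
    )
    dst = os.path.join(configs_dir, "tour_mode_choice_coeffs.csv")
    shutil.copyfile(src, dst)  # overwrites an existing file at dst
    return dst


# Demonstration with temporary directories standing in for the real folders.
with tempfile.TemporaryDirectory() as tmp:
    edb = os.path.join(tmp, "edb")
    configs = os.path.join(tmp, "configs")
    os.makedirs(os.path.join(edb, "estimated"))
    os.makedirs(configs)
    revised = os.path.join(edb, "estimated", "tour_mode_choice_coefficients_revised.csv")
    with open(revised, "w") as f:
        f.write("coefficient_name,value,constrain\ncoef_nest_root,1.0,T\n")

    installed = install_coefficients(edb, configs)
    with open(installed) as f:
        installed_text = f.read()

print(os.path.basename(installed))
```

Because the copy overwrites the destination, keeping a backup of the original coefficients file in the configs folder is a sensible precaution before running this against a real setup.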