# Estimation Mode

This set of notebooks illustrates how to re-estimate ActivitySim's choice models.  These models include

  - school_location
  - workplace_location
  - auto_ownership
  - free_parking
  - cdap
  - mandatory_tour_frequency
  - mandatory_tour_scheduling (work and school estimated separately)
  - joint_tour_frequency
  - joint_tour_composition
  - joint_tour_participation
  - joint_tour_destination (with non_mandatory_tour_destination)
  - joint_tour_scheduling
  - non_mandatory_tour_frequency
  - non_mandatory_tour_destination (with joint_tour_destination)
  - non_mandatory_tour_scheduling
  - tour_mode_choice (with atwork_subtour_mode_choice)
  - atwork_subtour_frequency
  - atwork_subtour_destination
  - atwork_subtour_scheduling
  - atwork_subtour_mode_choice (with tour_mode_choice)
  - stop_frequency
  - trip_destination
  - trip_mode_choice

As the parenthetical notes in the list indicate, not every model is estimated independently: some components share parameters with other components and so must be re-estimated jointly.
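The sharing relationships annotated in the list can be written down as a small lookup, which makes it easy to see which components have to be re-estimated together. This is only an illustrative sketch: `shared_with` and `estimation_group` are hypothetical names, not part of ActivitySim, and the pairs are copied from the "(with ...)" notes in the list.

```python
# Components that share parameters, copied from the "(with ...)" notes in the
# model list. The names below are illustrative, not part of ActivitySim.
shared_with = {
    "joint_tour_destination": "non_mandatory_tour_destination",
    "non_mandatory_tour_destination": "joint_tour_destination",
    "tour_mode_choice": "atwork_subtour_mode_choice",
    "atwork_subtour_mode_choice": "tour_mode_choice",
}


def estimation_group(component):
    """Return the components that must be re-estimated with `component`."""
    partner = shared_with.get(component)
    # A component with no partner forms a group of one.
    return tuple(sorted({component, partner} - {None}))


print(estimation_group("tour_mode_choice"))
print(estimation_group("auto_ownership"))
```

A component that does not appear in `shared_with` is treated as independently estimated.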

The steps in the process are:
  - Run ActivitySim in estimation mode to read the household travel survey files and run the ActivitySim submodels, writing estimation data bundles (EDBs) that contain the model utility specifications, coefficients, chooser data, and alternatives data for each submodel.
  - Using the `activitysim.estimation.larch` library of tools, read and transform the relevant EDB into the format required by the model estimation package [larch](https://larch.newman.me) and then re-estimate the model coefficients.  No changes to the model specification will be made.
  - Update the ActivitySim model coefficients and re-run the model in simulation mode.
  
The basic estimation workflow is shown below and explained in the next steps.

![estimation workflow](https://github.com/RSGInc/activitysim/raw/develop/docs/images/estimation_example.jpg)

# Load libraries

In [1]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd

# Review Inputs

In addition to a working ActivitySim model setup, estimation mode requires an ActivitySim format household travel survey.  An ActivitySim format household travel survey is very similar to ActivitySim's simulation model tables:

 - households
 - persons
 - tours
 - joint_tour_participants
 - trips 

Examples of the ActivitySim format household travel survey are included in the [example_estimation data folders](https://github.com/RSGInc/activitysim/tree/develop/activitysim/examples/example_estimation).  The user is responsible for formatting their household travel survey into the appropriate format.  
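Before running estimation mode, it can help to confirm that each formatted survey table carries the identifier columns that link the tables together. The sketch below is a minimal check using only the standard library. `KEY_COLUMNS` and `missing_key_columns` are hypothetical names, and treating exactly these columns as required is an assumption based on the example files in this notebook, not a documented ActivitySim rule.

```python
import csv
import io

# Identifier columns per table, taken from the example survey files shown in
# this notebook. Treating exactly these as "required" is an assumption.
KEY_COLUMNS = {
    "households": {"household_id", "home_zone_id"},
    "persons": {"person_id", "household_id"},
    "tours": {"tour_id", "person_id", "household_id"},
    "joint_tour_participants": {"tour_id", "person_id", "household_id"},
    "trips": {"trip_id", "tour_id", "person_id", "household_id"},
}


def missing_key_columns(table_name, csv_text):
    """Return the sorted key columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return sorted(KEY_COLUMNS[table_name] - set(header))


# Header and first row of the example households table (the unnamed index
# column is left blank here).
households_csv = (
    ",household_id,home_zone_id,income,hhsize,HHT,auto_ownership,"
    "num_workers,joint_tour_frequency\n"
    "0,841891,126,48000,1,4,1,1,0_tours\n"
)
print(missing_key_columns("households", households_csv))
```

An empty list means every key column was found. In practice the text would come from reading the survey file rather than from an inline string.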

After creating an ActivitySim format household travel survey, run the `scripts/infer.py` script to append additional calculated fields.  One example of a calculated field is `household:joint_tour_frequency`, which is derived from the `tours` and `joint_tour_participants` tables.  
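To give a feel for this kind of derived field, the sketch below counts distinct joint tours per household from participant records. It is not the logic of `scripts/infer.py`: the function name is hypothetical, and the real script maps tours to the model's alternative labels (such as `0_tours`), which this sketch does not attempt. The rows are copied from the first lines of the example joint tour participants table in this notebook.

```python
from collections import defaultdict

# (household_id, person_id, tour_id) rows copied from the first lines of the
# example joint tour participants table shown in this notebook.
participants = [
    (1606646, 3519392, 144295087),
    (1606646, 3519393, 144295087),
    (2628704, 6918939, 283676518),
    (2628704, 6918942, 283676518),
    (1173905, 2458502, 100798519),
]


def joint_tours_per_household(rows):
    """Count distinct joint tours for each household (illustrative only)."""
    tours_by_household = defaultdict(set)
    for household_id, _person_id, tour_id in rows:
        tours_by_household[household_id].add(tour_id)
    return {hh: len(tours) for hh, tours in tours_by_household.items()}


joint_tour_counts = joint_tours_per_household(participants)
# Households with no participant records have no joint tours.
print(joint_tour_counts.get(841891, 0))
print(joint_tour_counts[1606646])
```

Two participants on the same tour count as a single joint tour, which is why the tour IDs are collected in a set.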

The input survey files are below.

## Survey households

In [2]:
pd.read_csv("../data_sf/survey_data/override_households.csv")

Unnamed: 0,household_id,home_zone_id,income,hhsize,HHT,auto_ownership,num_workers,joint_tour_frequency
0,841891,126,48000,1,4,1,1,0_tours
1,990869,134,48000,2,1,2,2,0_tours
2,125886,113,25900,1,4,1,1,0_tours
3,727893,8,26100,2,1,0,1,0_tours
4,2741769,150,121600,4,1,2,1,0_tours
...,...,...,...,...,...,...,...,...
1995,663493,110,19180,1,6,1,1,0_tours
1996,569375,20,7400,1,6,1,0,0_tours
1997,1445193,17,75000,1,4,0,1,0_tours
1998,2833455,69,0,1,0,0,0,0_tours


## Survey persons

In [3]:
pd.read_csv("../data_sf/survey_data/override_persons.csv")

Unnamed: 0,person_id,household_id,age,PNUM,sex,pemploy,pstudent,ptype,school_zone_id,workplace_zone_id,free_parking_at_work,cdap_activity,mandatory_tour_frequency,_escort,_shopping,_othmaint,_othdiscr,_eatout,_social,non_mandatory_tour_frequency
0,166,166,54,1,2,3,3,4,-1,-1,False,N,,0,0,0,0,1,0,4
1,197,197,46,1,2,3,3,4,-1,-1,False,N,,0,1,0,0,0,0,16
2,268,268,46,1,1,3,3,4,-1,-1,False,N,,0,0,1,1,0,0,9
3,375,375,54,1,2,3,3,4,-1,-1,False,N,,0,0,1,0,0,0,8
4,387,387,44,1,2,3,3,4,-1,-1,False,N,,1,0,0,1,0,0,33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4400,7554799,2863464,93,1,2,3,3,5,-1,-1,False,N,,0,0,0,1,0,0,1
4401,7554818,2863483,68,1,1,3,3,5,-1,-1,False,N,,0,0,1,1,0,0,9
4402,7555141,2863806,93,1,2,3,3,5,-1,-1,False,N,,0,2,0,1,0,0,17
4403,7555853,2864518,71,1,1,3,3,5,-1,-1,False,N,,0,0,0,0,0,1,2


## Survey joint tour participants

In [4]:
pd.read_csv("../data_sf/survey_data/override_joint_tour_participants.csv")

Unnamed: 0,survey_participant_id,survey_tour_id,household_id,person_id,participant_num,tour_id,participant_id
0,144295087010,1442950870,1606646,3519392,1,144295087,14429508701
1,144295087020,1442950870,1606646,3519393,2,144295087,14429508702
2,283676518010,2836765180,2628704,6918939,1,283676518,28367651801
3,283676518040,2836765180,2628704,6918942,2,283676518,28367651804
4,100798519030,1007985190,1173905,2458502,1,100798519,10079851903
...,...,...,...,...,...,...,...
223,162979281020,1629792810,1769918,3975105,2,162979281,16297928102
224,162979281030,1629792810,1769918,3975106,3,162979281,16297928103
225,162979281040,1629792810,1769918,3975107,4,162979281,16297928104
226,263530549020,2635305490,2519358,6427575,1,263530549,26353054902


## Survey tours

In [5]:
pd.read_csv("../data_sf/survey_data/override_tours.csv")

Unnamed: 0,tour_id,survey_tour_id,person_id,household_id,tour_type,tour_category,destination,origin,start,end,tour_mode,survey_parent_tour_id,parent_tour_id,composition,tdd,atwork_subtour_frequency,stop_frequency
0,25820,258200,629,629,school,mandatory,12.0,131.0,12.0,21.0,WALK_HVY,,,,121,,1out_0in
1,52265,522650,1274,1274,school,mandatory,10.0,166.0,10.0,11.0,WALK_LRF,,,,86,,0out_0in
2,1117937,11179370,27266,27266,school,mandatory,12.0,9.0,17.0,18.0,WALK_LRF,,,,163,,0out_0in
3,1148523,11485230,28012,28012,school,mandatory,5.0,10.0,8.0,10.0,WALK_LRF,,,,56,,1out_0in
4,1208547,12085470,29476,29476,school,mandatory,13.0,16.0,8.0,15.0,WALK_LOC,,,,61,,1out_0in
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5773,302923726,3029237260,7388383,2749929,maint,atwork,71.0,36.0,15.0,15.0,DRIVEALONEFREE,3.029237e+09,302923742.0,,145,,0out_0in
5774,302942567,3029425670,7388843,2750003,eat,atwork,87.0,42.0,10.0,10.0,DRIVEALONEFREE,3.029426e+09,302942602.0,,85,,0out_0in
5775,302942627,3029426270,7388844,2750003,maint,atwork,67.0,17.0,14.0,14.0,WALK_LOC,3.029426e+09,302942643.0,,135,,0out_0in
5776,305120465,3051204650,7441962,2758909,maint,atwork,130.0,127.0,12.0,12.0,WALK,3.051205e+09,305120481.0,,112,,0out_1in


## Survey trips

In [6]:
pd.read_csv("../data_sf/survey_data/override_trips.csv")

Unnamed: 0,trip_id,survey_trip_id,person_id,household_id,survey_tour_id,outbound,purpose,destination,origin,depart,trip_mode,tour_id,trip_num
0,54497,544970,166,166,68120,True,eatout,72,71,12.0,WALK,6812,1
1,54501,545010,166,166,68120,False,Home,71,72,18.0,WALK,6812,1
2,64881,648810,197,197,81100,True,shopping,47,80,12.0,WALK,8110,1
3,64885,648850,197,197,81100,False,Home,80,47,12.0,WALK,8110,1
4,88105,881050,268,268,110130,True,othdiscr,32,91,18.0,DRIVEALONEFREE,11013,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
14347,2478375745,24783757450,7556023,2864688,3097969680,True,othdiscr,94,136,15.0,SHARED3FREE,309796968,1
14348,2478375749,24783757490,7556023,2864688,3097969680,False,Home,136,94,16.0,SHARED3FREE,309796968,1
14349,2478375753,24783757530,7556023,2864688,3097969690,True,othdiscr,128,136,20.0,DRIVEALONEFREE,309796969,1
14350,2478375757,24783757570,7556023,2864688,3097969690,False,escort,127,128,20.0,TNC_SHARED,309796969,1


# Example Setup if Needed

To avoid duplicating inputs, especially model settings and expressions, `example_estimation` depends on `example`.  The following command creates an example setup for use.  The location of this example setup (i.e. the folder) is important because its paths are referenced in this notebook.  The command below downloads the skims.omx for the SF county example from the [activitysim resources repository](https://github.com/RSGInc/activitysim_resources).

In [7]:
!activitysim create -e example_estimation_sf -d test

# Run the Estimation Example

The next step is to run the model with an `estimation.yaml` settings file with the following settings in order to output the EDB for all submodels:

```
enable: True

bundles:
  - school_location
  - workplace_location
  - auto_ownership
  - free_parking
  - cdap
  - mandatory_tour_frequency
  - mandatory_tour_scheduling
  - joint_tour_frequency
  - joint_tour_composition
  - joint_tour_participation
  - joint_tour_destination
  - joint_tour_scheduling
  - non_mandatory_tour_frequency
  - non_mandatory_tour_destination
  - non_mandatory_tour_scheduling
  - tour_mode_choice
  - atwork_subtour_frequency
  - atwork_subtour_destination
  - atwork_subtour_scheduling
  - atwork_subtour_mode_choice
  
survey_tables:
  households:
    file_name: survey_data/override_households.csv
    index_col: household_id
  persons:
    file_name:  survey_data/override_persons.csv
    index_col: person_id
  tours:
    file_name:  survey_data/override_tours.csv
  joint_tour_participants:
    file_name:  survey_data/override_joint_tour_participants.csv
```

These settings enable estimation mode, identify which models to run and write estimation data bundles (EDBs) for, and point to the input survey tables, which include the override settings for each model choice.  

With this setup, the model outputs an EDB for each listed submodel.  For `auto_ownership`, for example, the EDB contains the following tables:
  - model settings - auto_ownership_model_settings.yaml
  - coefficients - auto_ownership_coefficients.csv
  - utilities specification - auto_ownership_SPEC.csv
  - chooser and alternatives data - auto_ownership_values_combined.csv
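Once a run has finished, the file list above can be turned into a quick existence check. The sketch below covers only `auto_ownership`, whose file names are given above; other submodels may use different file sets. The function names are hypothetical, and the EDB root folder (`output/estimation_data_bundle` here) and the per-bundle subfolder layout are assumptions that should be adjusted to match the actual output directory.

```python
from pathlib import Path

# File names for the auto_ownership EDB, as listed above.
AUTO_OWNERSHIP_EDB_FILES = [
    "auto_ownership_model_settings.yaml",
    "auto_ownership_coefficients.csv",
    "auto_ownership_SPEC.csv",
    "auto_ownership_values_combined.csv",
]


def auto_ownership_edb_paths(edb_root):
    """Build the expected paths, assuming one subfolder per bundle."""
    bundle_dir = Path(edb_root) / "auto_ownership"
    return [bundle_dir / name for name in AUTO_OWNERSHIP_EDB_FILES]


def missing_files(paths):
    """Return the paths that do not exist on disk."""
    return [path for path in paths if not path.exists()]


# Assumed EDB location; change this to wherever the run wrote its bundles.
expected = auto_ownership_edb_paths("output/estimation_data_bundle")
for path in missing_files(expected):
    print("missing:", path)
```

If nothing is printed, all four files were found under the assumed folder.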
  
The following code runs the software in estimation mode, inheriting the settings from the simulation setup and using the San Francisco county data setup.  It produces the EDB for all submodels but runs all the model steps identified in the inherited settings file.  

In [8]:
%cd test

In [9]:
!activitysim run -c configs_estimation/configs -c configs -o output -d data_sf

After completing a run of ActivitySim in estimation mode, we are ready to begin 
re-estimating models.  This process is shown in the other notebooks in this directory.