# Estimating School Location

This notebook illustrates how to re-estimate ActivitySim's school location model.  The steps in the process are:
  - Run ActivitySim in estimation mode to read household travel survey files, run the households through the school location model step, and write an estimation data bundle (EDB) that contains the model utility specifications, coefficients, chooser data, and alternatives data.
  - Read and transform the EDB into the format required by the model estimation package [larch](https://larch.newman.me) and then re-estimate the model coefficients.  No changes to the model specification will be made.
  - Update the ActivitySim model coefficients and re-run the model in simulation mode.
  
The basic estimation workflow is shown below and explained in the next steps.

![estimation workflow](https://github.com/RSGInc/activitysim/raw/develop/docs/images/estimation_example.jpg)

# Load libraries

In [55]:
import larch  # !conda install larch #for estimation
import pandas as pd
import numpy as np
import yaml 
import larch.util.excel
import larch_asim  # utility functions in a local module
import os

# Required Inputs

In addition to a working ActivitySim model setup, estimation mode requires a household travel survey in ActivitySim format.  Such a survey consists of tables that closely mirror ActivitySim's simulation model tables:

 - households
 - persons
 - tours
 - joint_tour_participants
 - trips (not yet implemented)

Examples of the ActivitySim format household travel survey are included in the [example_estimation data folders](https://github.com/RSGInc/activitysim/tree/develop/activitysim/examples/example_estimation).  The user is responsible for formatting their household travel survey into the appropriate format.  

After creating an ActivitySim format household travel survey, the `scripts/infer.py` script is run to append additional calculated fields.  An example of an additional calculated field is `joint_tour_frequency` in the households table, which is calculated from the `tours` and `joint_tour_participants` tables.  

The input survey files are below.

### Survey households

In [56]:
pd.read_csv("../data_sf/survey_data/override_households.csv")

Unnamed: 0,household_id,TAZ,income,hhsize,HHT,auto_ownership,num_workers,joint_tour_frequency
0,2223759,16,144100,2,1,0,2,1_Main
1,990869,134,48000,2,1,2,2,0_tours
2,125886,113,25900,1,4,1,1,0_tours
3,727893,8,26100,2,1,0,1,0_tours
4,2741769,150,121600,4,1,2,1,0_tours
...,...,...,...,...,...,...,...,...
1995,663493,110,19180,1,6,1,1,0_tours
1996,569375,20,7400,1,6,1,0,0_tours
1997,1445193,17,75000,1,4,0,1,0_tours
1998,2833455,69,0,1,0,0,0,0_tours


### Survey persons

In [57]:
pd.read_csv("../data_sf/survey_data/override_persons.csv")

Unnamed: 0,person_id,household_id,age,PNUM,sex,pemploy,pstudent,ptype,school_taz,workplace_taz,free_parking_at_work,cdap_activity,mandatory_tour_frequency,_escort,_shopping,_othmaint,_othdiscr,_eatout,_social,non_mandatory_tour_frequency
0,166,166,54,1,2,3,3,4,-1,-1,False,N,,0,0,0,0,1,0,4
1,197,197,46,1,2,3,3,4,-1,-1,False,N,,0,1,0,0,0,0,16
2,268,268,46,1,1,3,3,4,-1,-1,False,N,,0,0,1,1,0,0,9
3,375,375,54,1,2,3,3,4,-1,-1,False,N,,0,0,1,0,0,0,8
4,387,387,44,1,2,3,3,4,-1,-1,False,N,,1,0,0,1,0,0,33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4401,7554799,2863464,93,1,2,3,3,5,-1,-1,False,N,,0,0,0,1,0,0,1
4402,7554818,2863483,68,1,1,3,3,5,-1,-1,False,N,,0,0,1,1,0,0,9
4403,7555141,2863806,93,1,2,3,3,5,-1,-1,False,N,,0,2,0,1,0,0,17
4404,7555853,2864518,71,1,1,3,3,5,-1,-1,False,N,,0,0,0,0,0,1,2


### Survey tours

In [58]:
pd.read_csv("../data_sf/survey_data/override_tours.csv")

Unnamed: 0,tour_id,survey_tour_id,person_id,household_id,tour_type,tour_category,destination,origin,start,end,tour_mode,survey_parent_tour_id,parent_tour_id,composition,tdd,atwork_subtour_frequency
0,25820,258200,629,629,school,mandatory,133.0,131.0,12.0,15.0,WALK,,,,115,
1,52265,522650,1274,1274,school,mandatory,188.0,166.0,9.0,15.0,WALK_LOC,,,,76,
2,1117937,11179370,27266,27266,school,mandatory,133.0,9.0,17.0,18.0,WALK_HVY,,,,163,
3,1148523,11485230,28012,28012,school,mandatory,12.0,10.0,17.0,22.0,WALK_LRF,,,,167,
4,1208547,12085470,29476,29476,school,mandatory,13.0,16.0,8.0,15.0,WALK_LOC,,,,61,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5768,302942627,3029426270,7388844,2750003,maint,atwork,5.0,7.0,14.0,14.0,WALK,3.029426e+09,302942643.0,,135,
5769,305120465,3051204650,7441962,2758909,maint,atwork,110.0,2.0,12.0,13.0,SHARED2FREE,3.051205e+09,305120481.0,,113,
5770,308000655,3080006550,7512211,2820876,eat,atwork,14.0,1.0,12.0,13.0,WALK,3.080007e+09,308000690.0,,113,
5771,308073840,3080738400,7513996,2822661,eat,atwork,69.0,107.0,8.0,16.0,SHARED3FREE,3.080739e+09,308073875.0,,62,


### Survey joint tour participants

In [59]:
pd.read_csv("../data_sf/survey_data/survey_joint_tour_participants.csv")

Unnamed: 0,participant_id,tour_id,household_id,person_id,participant_num
0,22095828301,220958283,2223759,5389226,1
1,22095828302,220958283,2223759,5389227,2
2,14429508701,144295087,1606646,3519392,1
3,14429508702,144295087,1606646,3519393,2
4,28367651801,283676518,2628704,6918939,1
...,...,...,...,...,...
226,16297928102,162979281,1769918,3975105,2
227,16297928103,162979281,1769918,3975106,3
228,16297928104,162979281,1769918,3975107,4
229,26353054902,263530549,2519358,6427575,1


# Example Setup if Needed

To avoid duplication of inputs, especially model settings and expressions, the `example_estimation` depends on the `example`.  The following commands create an example setup and then an example estimation setup for use.  The location of these example setups (i.e. the folders) is important because the paths are referenced in this notebook.  

Make sure to add `skims.omx` from the [MTC Box account](https://mtcdrive.app.box.com/v/activitysim/folder/7484860689) for the SF county example to the `data_sf` folder before running the estimation example.  This large file is not included in the repository.

In [None]:
# create examples
!activitysim create -e example_mtc -d test
!activitysim create -e example_estimation -d test_est

# Run the Estimation Example

The next step is to run the model with an `estimation.yaml` settings file with the following settings in order to output the EDB for school location:

```
enable: True

bundles:
  - school_location

survey_tables:
  households:
    file_name: survey_data/override_households.csv
    index_col: household_id
  persons:
    file_name:  survey_data/override_persons.csv
    index_col: person_id
  tours:
    file_name:  survey_data/override_tours.csv
  joint_tour_participants:
    file_name:  survey_data/override_joint_tour_participants.csv
```

These settings enable estimation mode, identify which models to run and write estimation data bundles (EDBs) for, and specify the input survey tables, which supply the override values for each model choice.  

With this setup, the model will output an EDB with the following tables:
  - model settings - school_location_model_settings.yaml
  - coefficients - school_location_coefficients.csv
  - utilities specification - school_location_SPEC.csv
  - land use data - school_location_landuse.csv
  - size terms - school_location_size_terms.csv
  - alternatives values - school_location_alternatives_combined.csv
  - chooser data - school_location_choosers_combined.csv
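
Before moving on to the estimation steps, it can be useful to confirm that the run actually wrote every file in the list above. The following is a minimal sketch, not part of ActivitySim: the helper name `missing_edb_files` is hypothetical, and the file names are simply copied from the list.

```python
import os

# File names copied from the EDB table list above.
EDB_FILES = [
    "school_location_model_settings.yaml",
    "school_location_coefficients.csv",
    "school_location_SPEC.csv",
    "school_location_landuse.csv",
    "school_location_size_terms.csv",
    "school_location_alternatives_combined.csv",
    "school_location_choosers_combined.csv",
]


def missing_edb_files(edb_directory, expected=EDB_FILES):
    """Return the expected EDB file names that are not present in edb_directory.

    Hypothetical helper for illustration; it only checks that files exist,
    not that their contents are valid.
    """
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(edb_directory, name))
    ]
```

Calling `missing_edb_files(...)` on the school location EDB folder should return an empty list after a successful estimation-mode run; any names returned point to outputs that were not written.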
  
The following code runs the software in estimation mode, inheriting the settings from the simulation setup and using the San Francisco county data setup.  It produces the school_location model EDB but runs all the model steps identified in the inherited settings file.  

In [53]:
# run from the notebook folder
!activitysim run -c ../configs -c ../../test/configs -d ../data_sf -d ../../test/data -o ../output

Configured logging using basicConfig
INFO:activitysim:Configured logging using basicConfig
INFO:activitysim.cli.run:using configs_dir: ['../configs', '../../test/configs']
INFO:activitysim.cli.run:using data_dir: ['../data_sf', '../../test/data']
INFO:activitysim.cli.run:using output_dir: ['../output']
INFO - Read logging configuration from: ../configs\logging.yaml
INFO - setting households_sample_size: 0
INFO - setting chunk_size: 0
INFO - setting multiprocess: None
INFO - setting num_processes: None
INFO - setting resume_after: None
INFO - run single process simulation
INFO - open_pipeline
INFO - Set random seed base to 0
INFO - Time to execute open_pipeline : 0.028 seconds (0.0 minutes)
INFO - preload_injectables
INFO - Time to execute preload_injectables : 0.0 seconds (0.0 minutes)
INFO - Reading CSV file ../data_sf\land_use.csv
INFO - renaming columns: {'ZONE': 'TAZ', 'COUNTY': 'county_id'}
INFO - keeping columns: ['DISTRICT', 'SD', 'county_id', 'TOTHH', 'TOTPOP', 'TOTACRE', 'RESA

# Read EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

In [13]:
from larch import P,X

In [14]:
edb_directory = "../output/estimation_data_bundle/school_location/"

def read_csv(filename, **kwargs):
    return pd.read_csv(os.path.join(edb_directory, filename), **kwargs)

### Model Settings

In [15]:
try:
    from yaml import CLoader as yamlLoader
except ImportError:
    from yaml import Loader as yamlLoader

with open(os.path.join(edb_directory, "school_location_model_settings.yaml"), 'r') as stream:
    settings = yaml.load(stream, Loader=yamlLoader)


In [16]:
CHOOSER_SEGMENT_COLUMN_NAME = settings['CHOOSER_SEGMENT_COLUMN_NAME']
SEGMENT_IDS = settings['SEGMENT_IDS']
settings

{'SAMPLE_SIZE': 30,
 'SIMULATE_CHOOSER_COLUMNS': ['TAZ', 'school_segment', 'household_id'],
 'CHOOSER_ORIG_COL_NAME': 'TAZ',
 'ALT_DEST_COL_NAME': 'alt_dest',
 'IN_PERIOD': 14,
 'OUT_PERIOD': 8,
 'DEST_CHOICE_COLUMN_NAME': 'school_taz',
 'DEST_CHOICE_LOGSUM_COLUMN_NAME': 'school_taz_logsum',
 'DEST_CHOICE_SAMPLE_TABLE_NAME': 'school_location_sample',
 'SAMPLE_SPEC': 'school_location_sample.csv',
 'SPEC': 'school_location.csv',
 'COEFFICIENTS': 'school_location_coeffs.csv',
 'LOGSUM_SETTINGS': 'tour_mode_choice.yaml',
 'LOGSUM_PREPROCESSOR': 'nontour_preprocessor',
 'LOGSUM_TOUR_PURPOSE': {'university': 'univ',
  'highschool': 'school',
  'gradeschool': 'school'},
 'annotate_persons': {'SPEC': 'annotate_persons_school', 'DF': 'persons'},
 'CHOOSER_TABLE_NAME': 'persons',
 'MODEL_SELECTOR': 'school',
 'CHOOSER_SEGMENT_COLUMN_NAME': 'school_segment',
 'CHOOSER_FILTER_COLUMN_NAME': 'is_student',
 'SEGMENT_IDS': {'university': 3, 'highschool': 2, 'gradeschool': 1},
 'SHADOW_PRICE_TABLE': 's

### Utility specification

In [17]:
spec = read_csv("school_location_SPEC.csv")

In [18]:
spec

Unnamed: 0,Label,Description,Expression,university,highschool,gradeschool
0,local_dist,,_DIST@skims['DIST'],1,1,1
1,util_dist_0_1,"Distance, piecewise linear from 0 to 1 miles","@_DIST.clip(0,1)",coef_univ_dist_0_1,coef_high_dist_0_1,coef_grade_dist_0_1
2,util_dist_1_2,"Distance, piecewise linear from 1 to 2 miles","@(_DIST-1).clip(0,1)",coef_univ_dist_1_2,coef_high_grade_dist_1_2,coef_high_grade_dist_1_2
3,util_dist_2_5,"Distance, piecewise linear from 2 to 5 miles","@(_DIST-2).clip(0,3)",coef_univ_dist_2_5,coef_high_grade_dist_2_5,coef_high_grade_dist_2_5
4,util_dist_5_15,"Distance, piecewise linear from 5 to 15 miles","@(_DIST-5).clip(0,10)",coef_univ_dist_5_15,coef_high_dist_5_15,coef_grade_dist_5_15
5,util_dist_15_up,"Distance, piecewise linear for 15+ miles",@(_DIST-15.0).clip(0),coef_univ_dist_15_up,coef_high_dist_15_up,coef_grade_dist_15_up
6,util_size_variable,Size variable,@(df['size_term'] * df['shadow_price_size_term...,1,1,1
7,util_utility_adjustment,utility adjustment,@df['shadow_price_utility_adjustment'],1,1,1
8,util_no_attractions,No attractions,@df['size_term']==0,-999,-999,-999
9,util_mode_choice_logsum,Mode choice logsum,mode_choice_logsum,coef_mode_logsum,coef_mode_logsum,coef_mode_logsum
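
The five `util_dist_*` rows split distance into piecewise-linear segments using `clip`. As a worked illustration, the sketch below reproduces those expressions for a single scalar distance and applies the university distance coefficients from `school_location_coefficients.csv`. The helper names (`clip`, `distance_terms`, `distance_utility`) are hypothetical and only mirror the spec expressions; ActivitySim evaluates them on whole columns.

```python
def clip(x, lo, hi=float("inf")):
    """Scalar equivalent of the .clip(lo, hi) calls used in the spec."""
    return max(lo, min(x, hi))


def distance_terms(dist):
    """Split a distance in miles into the five piecewise-linear segments."""
    return {
        "util_dist_0_1": clip(dist, 0, 1),
        "util_dist_1_2": clip(dist - 1, 0, 1),
        "util_dist_2_5": clip(dist - 2, 0, 3),
        "util_dist_5_15": clip(dist - 5, 0, 10),
        "util_dist_15_up": clip(dist - 15, 0),
    }


# University distance coefficients (initial values from the coefficients file).
UNIV_COEFS = {
    "util_dist_0_1": -3.2451,
    "util_dist_1_2": -2.7011,
    "util_dist_2_5": -0.5707,
    "util_dist_5_15": -0.5002,
    "util_dist_15_up": -0.073,
}


def distance_utility(dist, coefs=UNIV_COEFS):
    """Distance part of the utility: sum of coefficient * segment length."""
    terms = distance_terms(dist)
    return sum(coefs[name] * terms[name] for name in terms)


# A 7.5-mile trip uses 1 + 1 + 3 + 2.5 + 0 miles across the five segments.
print(distance_terms(7.5))
print(round(distance_utility(7.5), 4))  # -8.9088
```

The segment lengths always sum to the original distance, so each coefficient is the marginal disutility per mile within its own distance band.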


In [19]:
# Remove shadow pricing and pre-existing size expression

spec = spec\
.set_index('Label')\
.drop(index=['util_size_variable', 'util_utility_adjustment'])\
.reset_index()

### Zone size term specification

In [20]:
size_spec = read_csv("school_location_size_terms.csv")

In [21]:
school_size_spec = size_spec \
.query("model_selector == 'school'") \
.drop(columns='model_selector') \
.set_index('segment')
school_size_spec = school_size_spec.loc[:,school_size_spec.max()>0]
school_size_spec

Unnamed: 0_level_0,AGE0519,HSENROLL,COLLFTE,COLLPTE
segment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
university,0.0,0.0,0.592,0.408
gradeschool,1.0,0.0,0.0,0.0
highschool,0.0,1.0,0.0,0.0


### Zone size term coefficients

In [22]:
size_coef = school_size_spec.stack().reset_index()
size_coef.index = size_coef.iloc[:,0] +"_"+ size_coef.iloc[:,1]
size_coef = size_coef.loc[size_coef.iloc[:,2]>0]
size_coef['constrain'] = 'F'
one_each = size_coef.groupby('segment').first().reset_index()
size_coef.loc[one_each.iloc[:,0] +"_"+ one_each.iloc[:,1], 'constrain'] = 'T'
size_coef = size_coef.iloc[:,2:]
size_coef.columns = ['value','constrain']
size_coef.index.name = 'coefficient_name'
size_coef['value'] = np.log(size_coef['value'])
size_coef

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
university_COLLFTE,-0.524249,T
university_COLLPTE,-0.896488,F
gradeschool_AGE0519,0.0,T
highschool_HSENROLL,0.0,T
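
The `constrain` column holds one size coefficient per segment fixed (`T`). A sketch of why this is needed follows, assuming the usual size-term formulation in which `log(sum_k exp(coef_k) * attribute_k)` is added to each zone's utility (this matches the `np.log` applied above, but the formulation itself is an assumption here, and the enrollment values are invented). Shifting every coefficient in a segment by the same constant adds that constant to every zone's utility, which leaves logit probabilities unchanged, so only the differences between size coefficients can be estimated.

```python
import math


def log_size(coefs, attributes):
    """log(sum_k exp(coef_k) * attribute_k) for one zone."""
    return math.log(sum(math.exp(coefs[k]) * attributes[k] for k in coefs))


def softmax(utilities):
    """Logit probabilities, shifted by the maximum for numerical stability."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


# Log of the university size weights, matching the table above.
coefs = {"COLLFTE": math.log(0.592), "COLLPTE": math.log(0.408)}

# Two hypothetical zones; enrollment values are invented for illustration.
zones = [
    {"COLLFTE": 100.0, "COLLPTE": 50.0},
    {"COLLFTE": 20.0, "COLLPTE": 200.0},
]

base = softmax([log_size(coefs, z) for z in zones])

# Add the same constant to every coefficient: probabilities do not move.
shifted = {k: v + 1.7 for k, v in coefs.items()}
same = softmax([log_size(shifted, z) for z in zones])

print(base)
print(same)
```

Because the two probability vectors agree, one coefficient per segment has to be held fixed for the remaining ones to be identified.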


### Coefficients

In [23]:
coefficients = read_csv("school_location_coefficients.csv",index_col=0)
coefficients

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
coef_univ_dist_0_1,-3.2451,F
coef_univ_dist_1_2,-2.7011,F
coef_univ_dist_2_5,-0.5707,F
coef_univ_dist_5_15,-0.5002,F
coef_univ_dist_15_up,-0.073,F
coef_high_dist_0_1,-0.9523,F
coef_high_grade_dist_1_2,-0.57,F
coef_high_grade_dist_2_5,-0.57,F
coef_high_dist_5_15,-0.193,F
coef_high_dist_15_up,-0.1882,F


### Chooser data

In [24]:
x_co = read_csv("school_location_choosers_combined.csv",index_col='person_id')
x_co.head()

Unnamed: 0_level_0,model_choice,override_choice,TAZ,school_segment,household_id
person_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
629,13,133,131,3,629
1274,10,188,166,3,1274
27266,10,133,9,3,27266
28012,5,12,10,3,28012
29368,185,13,16,3,29368


In [25]:
x_co.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 984 entries, 629 to 7541072
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   model_choice     984 non-null    int64
 1   override_choice  984 non-null    int64
 2   TAZ              984 non-null    int64
 3   school_segment   984 non-null    int64
 4   household_id     984 non-null    int64
dtypes: int64(5)
memory usage: 46.1 KB


### Alternatives Data

In [26]:
x_cv = read_csv("school_location_alternatives_combined.csv", index_col=(0,1))
x_cv.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,1,2,3,4,5,6,7,8,9,10,...,181,182,183,184,185,186,187,188,189,190
person_id,variable,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
629,TAZ,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,...,181.0,182.0,183.0,184.0,185.0,186.0,187.0,188.0,189.0,190.0
629,mode_choice_logsum,-1.5183555655982788,-1.353962644303794,-1.052603975440089,-1.3907332428502703,-1.2112310544656295,-0.9024649772476414,-0.8544294523037963,-0.9528358416884832,-0.9322353907168196,-0.9960110414966424,...,-1.5077968784535976,-1.193792288653567,-1.07496629805523,-1.0762186838325454,-0.8796290051944193,-0.9219559724996268,-0.9920608655458416,-0.9485572132565794,-1.254031405285034,-1.440460138400616
629,pick_count,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
629,prob,0.0,0.0,0.0,0.0,3.964160745753979e-05,0.0,0.0,0.0,0.0016781636530682,0.0006643895800549,...,0.0,0.0,0.0,0.0,0.0004973126878062,0.0,0.0,0.0502720348896301,0.0,0.0
629,shadow_price_size_term_adjustment,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


### Zone land use data

In [27]:
landuse = read_csv("school_location_landuse.csv", index_col='TAZ')
landuse.head()

Unnamed: 0_level_0,DISTRICT,SD,county_id,TOTHH,TOTPOP,TOTACRE,RESACRE,CIACRE,TOTEMP,AGE0519,...,OPRKCST,area_type,HSENROLL,COLLFTE,COLLPTE,TOPOLOGY,TERMINAL,household_density,employment_density,density_index
TAZ,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,1,1,1,46,82,20.3,1.0,15.0,27318,7,...,932.83514,0,0.0,0.0,0.0,3,5.89564,2.875,1707.375,2.870167
2,1,1,1,134,240,31.1,1.0,24.79297,42078,19,...,885.61682,0,0.0,0.0,0.0,1,5.84871,5.195214,1631.374751,5.178722
3,1,1,1,267,476,14.7,1.0,2.31799,2445,38,...,716.27252,0,0.0,0.0,0.0,1,5.53231,80.470405,736.891913,72.547987
4,1,1,1,151,253,19.3,1.0,18.0,22434,20,...,314.0,0,0.0,0.0,0.0,2,5.6433,7.947368,1180.736842,7.894233
5,1,1,1,611,1069,52.7,1.0,15.0,15662,86,...,314.01431,0,0.0,72.14684,0.0,1,5.52555,38.1875,978.875,36.753679


# Data Processing and Estimation Setup

The next step is to transform the EDB for larch for model re-estimation.  

In [28]:
x_ca = larch_asim.cv_to_ca(x_cv)
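
The alternatives table is stored wide: one row per (person, variable) and one column per alternative. Estimation needs it long: one row per (person, alternative) and one column per variable. The sketch below shows that reshaping on a toy table with plain pandas. It is an illustration of the idea only, not the implementation of `larch_asim.cv_to_ca`, and the values are invented.

```python
import pandas as pd

# Toy wide table shaped like school_location_alternatives_combined.csv:
# one row per (person_id, variable), one column per alternative.
wide = pd.DataFrame(
    [[1.0, 2.0], [-1.5, -1.3], [1.0, 2.0], [-0.9, -1.1]],
    index=pd.MultiIndex.from_tuples(
        [
            (629, "TAZ"),
            (629, "mode_choice_logsum"),
            (1274, "TAZ"),
            (1274, "mode_choice_logsum"),
        ],
        names=["person_id", "variable"],
    ),
    columns=pd.Index(["1", "2"], name="alt"),
)

# Move alternatives into the row index, then spread variables into columns.
long = wide.stack().unstack("variable")
print(long)
```

The result is indexed by (`person_id`, `alt`) with `TAZ` and `mode_choice_logsum` as columns, which is the layout that can then be joined to zone-level land use data on `TAZ`.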

In [29]:
x_ca_1 = pd.merge(x_ca, landuse, on='TAZ', how='left')
x_ca_1.index = x_ca.index

In [30]:
x_ca_1, x_co = larch_asim.prevent_overlapping_column_names(x_ca_1, x_co)

In [31]:
d = larch.DataFrames(
    co=x_co,
    ca=x_ca_1,
    av=True,
)

### Utility specifications

In [32]:
m = larch.Model()

In [33]:
m.utility_ca = larch_asim.linear_utility_from_spec(
    spec, x_col='Label', 
    p_col=SEGMENT_IDS, 
    ignore_x=('local_dist',), 
    segment_id=CHOOSER_SEGMENT_COLUMN_NAME,
)
print(m.utility_ca)

  P.coef_univ_dist_0_1 * X('util_dist_0_1*(school_segment==3)')
+ P.coef_univ_dist_1_2 * X('util_dist_1_2*(school_segment==3)')
+ P.coef_univ_dist_2_5 * X('util_dist_2_5*(school_segment==3)')
+ P.coef_univ_dist_5_15 * X('util_dist_5_15*(school_segment==3)')
+ P.coef_univ_dist_15_up * X('util_dist_15_up*(school_segment==3)')
+ P('-999') * X('util_no_attractions*(school_segment==3)')
+ P.coef_mode_logsum * X('util_mode_choice_logsum*(school_segment==3)')
+ P('1') * X('util_sample_of_corrections_factor*(school_segment==3)')
+ P.coef_high_dist_0_1 * X('util_dist_0_1*(school_segment==2)')
+ P.coef_high_grade_dist_1_2 * X('util_dist_1_2*(school_segment==2)')
+ P.coef_high_grade_dist_2_5 * X('util_dist_2_5*(school_segment==2)')
+ P.coef_high_dist_5_15 * X('util_dist_5_15*(school_segment==2)')
+ P.coef_high_dist_15_up * X('util_dist_15_up*(school_segment==2)')
+ P('-999') * X('util_no_attractions*(school_segment==2)')
+ P.coef_mode_logsum * X('util_mode_choice_logsum*(school_segment==2)')
+ P(

In [34]:
m.quantity_ca = sum(
    P(f"{i}_{q}") * X(q) * X(f"{settings['CHOOSER_SEGMENT_COLUMN_NAME']}=={settings['SEGMENT_IDS'][i]}")
    for i in school_size_spec.index
    for q in school_size_spec.columns
    if school_size_spec.loc[i,q]!=0
)

In [35]:
larch_asim.explicit_value_parameters_from_spec(spec, p_col=SEGMENT_IDS, model=m)

### Coefficients

In [36]:
m.pf

Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note
-999,-999.0,0.0,0.0,-inf,inf,1,
1,1.0,0.0,0.0,-inf,inf,1,
coef_grade_dist_0_1,0.0,0.0,0.0,-inf,inf,0,
coef_grade_dist_15_up,0.0,0.0,0.0,-inf,inf,0,
coef_grade_dist_5_15,0.0,0.0,0.0,-inf,inf,0,
coef_high_dist_0_1,0.0,0.0,0.0,-inf,inf,0,
coef_high_dist_15_up,0.0,0.0,0.0,-inf,inf,0,
coef_high_dist_5_15,0.0,0.0,0.0,-inf,inf,0,
coef_high_grade_dist_1_2,0.0,0.0,0.0,-inf,inf,0,
coef_high_grade_dist_2_5,0.0,0.0,0.0,-inf,inf,0,


In [37]:
larch_asim.apply_coefficients(coefficients, m)
larch_asim.apply_coefficients(size_coef, m, minimum=-6, maximum=6)

In [38]:
m.pf  # Spot check, confirm coefficients set correctly. 

Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note
-999,-999.0,-999.0,-999.0,-999.0,-999.0,1,
1,1.0,1.0,1.0,1.0,1.0,1,
coef_grade_dist_0_1,-1.6419,0.0,0.0,,,0,
coef_grade_dist_15_up,-0.046,0.0,0.0,,,0,
coef_grade_dist_5_15,-0.2031,0.0,0.0,,,0,
coef_high_dist_0_1,-0.9523,0.0,0.0,,,0,
coef_high_dist_15_up,-0.1882,0.0,0.0,,,0,
coef_high_dist_5_15,-0.193,0.0,0.0,,,0,
coef_high_grade_dist_1_2,-0.57,0.0,0.0,,,0,
coef_high_grade_dist_2_5,-0.57,0.0,0.0,,,0,


In [39]:
m.dataservice = d

### Survey choice

In [40]:
m.choice_co_code = 'override_choice'

# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.
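
To make the stability warning concrete, here is a textbook multinomial logit log-likelihood in plain Python. It is a sketch only: larch performs the actual maximization internally, and the single distance coefficient `beta`, the distances, and the choices below are all invented. With just two observations, both choosing the nearest zone, the log-likelihood keeps rising as `beta` becomes more negative, so no finite maximum exists.

```python
import math


def mnl_probabilities(utilities):
    """Logit probabilities for one chooser's alternatives."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def log_likelihood(beta, distances, chosen):
    """Sum over choosers of log P(chosen alternative), with utility = beta * distance."""
    total = 0.0
    for row, choice in zip(distances, chosen):
        probs = mnl_probabilities([beta * d for d in row])
        total += math.log(probs[choice])
    return total


# Invented data: distances to three zones for two choosers,
# and the index of the zone each chooser picked (the nearest one).
distances = [[1.0, 2.0, 4.0], [3.0, 0.5, 2.0]]
chosen = [0, 1]

for beta in (0.0, -1.0, -10.0):
    print(beta, log_likelihood(beta, distances, chosen))
```

The printed log-likelihood climbs toward zero as `beta` decreases, which is the kind of unbounded solution that a larger and more varied sample avoids.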

In [41]:
m.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
-999,-999.0,-999.0,-999.0,-999.0,-999.0,1,,-999.0
1,1.0,1.0,1.0,1.0,1.0,1,,1.0
coef_grade_dist_0_1,-3.727703,0.0,0.0,,,0,,-3.727703
coef_grade_dist_15_up,-0.046,0.0,0.0,,,0,,-0.046
coef_grade_dist_5_15,-0.421664,0.0,0.0,,,0,,-0.421664
coef_high_dist_0_1,-1.980909,0.0,0.0,,,0,,-1.980909
coef_high_dist_15_up,-0.1882,0.0,0.0,,,0,,-0.1882
coef_high_dist_5_15,-0.256995,0.0,0.0,,,0,,-0.256995
coef_high_grade_dist_1_2,-1.188356,0.0,0.0,,,0,,-1.188356
coef_high_grade_dist_2_5,-1.098063,0.0,0.0,,,0,,-1.098063




Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
-999,-999.000000
1,1.000000
coef_grade_dist_0_1,-3.727703
coef_grade_dist_15_up,-0.046000
coef_grade_dist_5_15,-0.421664
coef_high_dist_0_1,-1.980909
coef_high_dist_15_up,-0.188200
coef_high_dist_5_15,-0.256995
coef_high_grade_dist_1_2,-1.188356
coef_high_grade_dist_2_5,-1.098063

Unnamed: 0,0
-999,-999.0
1,1.0
coef_grade_dist_0_1,-3.727703
coef_grade_dist_15_up,-0.046
coef_grade_dist_5_15,-0.421664
coef_high_dist_0_1,-1.980909
coef_high_dist_15_up,-0.1882
coef_high_dist_5_15,-0.256995
coef_high_grade_dist_1_2,-1.188356
coef_high_grade_dist_2_5,-1.098063

Unnamed: 0,0
-999,0.0
1,0.0
coef_grade_dist_0_1,0.000179
coef_grade_dist_15_up,0.0
coef_grade_dist_5_15,-0.000727
coef_high_dist_0_1,-0.000154
coef_high_dist_15_up,0.0
coef_high_dist_5_15,0.000226
coef_high_grade_dist_1_2,0.000831
coef_high_grade_dist_2_5,0.001188


# Output Estimation Results

In [42]:
# Write re-estimated value back into the coefficients file.
est_names = [j for j in coefficients.index if j in m.pf.index]
coefficients.loc[est_names, 'value'] = m.pf.loc[est_names, 'value']

### Write the re-estimated coefficients file

In [43]:
# Write out replacement coefficients file and model summaries
os.makedirs(os.path.join(edb_directory,'estimated'), exist_ok=True)

In [44]:
coefficients.reset_index().to_csv(
    os.path.join(
        edb_directory, 
        'estimated',
        "school_location_coefficients_revised.csv",
    ),
    index=False,
)

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [47]:
m.to_xlsx(
    os.path.join(
        edb_directory, 
        'estimated',
        "school_location_model_estimation.xlsx",
    )
)

<larch.util.excel.ExcelWriter at 0x29fa609f888>

### Write updated size coefficients

In [48]:
# Write size coefficients into size_spec
for c in school_size_spec.columns:
    for i in school_size_spec.index:
        param_name = f"{i}_{c}"
        j = (size_spec['segment'] == i) & (size_spec['model_selector'] == 'school')
        try:
            size_spec.loc[j,c] = np.exp(m.get_value(param_name))
        except KeyError:
            pass


In [49]:
# Rescale each row to sum to 1. This is not mathematically required,
# but it keeps the file consistent with the existing ActivitySim size terms.

size_spec.iloc[:,2:] = (size_spec.iloc[:,2:].div(size_spec.iloc[:,2:].sum(1), axis=0))
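
The two cells above take the estimated log-scale size coefficients back to the weight scale and then normalize each row. A minimal plain-Python sketch of that round trip for one segment is shown below; the function name `size_weights_from_coefs` is hypothetical, and the input reuses the initial university values rather than estimated ones.

```python
import math


def size_weights_from_coefs(log_coefs):
    """Exponentiate log-scale size coefficients and rescale them to sum to 1."""
    raw = {name: math.exp(value) for name, value in log_coefs.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Starting from the log of the initial university weights recovers those weights.
w = size_weights_from_coefs(
    {"COLLFTE": math.log(0.592), "COLLPTE": math.log(0.408)}
)
print(w)
```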

In [50]:
size_spec.to_csv(
    os.path.join(edb_directory,'estimated',"school_location_size_terms.csv"), 
    index=False,
)

# Next Steps

The final step is to copy, either manually or with a script, the `school_location_coefficients_revised.csv` and `school_location_size_terms.csv` files to the configs folder, rename them to `school_location_coeffs.csv` and `destination_choice_size_terms.csv`, respectively, and run ActivitySim in simulation mode.
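
For the scripted route, a minimal sketch using only the standard library is shown below. The helper name `install_estimated_files` and its arguments are hypothetical; the file names come from the paragraph above, and the caller supplies the paths to the `estimated` folder and the configs folder.

```python
import os
import shutil

# Estimated file name -> name expected in the configs folder (from the text above).
RENAMES = {
    "school_location_coefficients_revised.csv": "school_location_coeffs.csv",
    "school_location_size_terms.csv": "destination_choice_size_terms.csv",
}


def install_estimated_files(estimated_dir, configs_dir, renames=RENAMES):
    """Copy each estimated file into configs_dir under its configs name.

    Hypothetical helper for illustration. Existing files in configs_dir
    with the same names are overwritten.
    """
    copied = []
    for src_name, dst_name in renames.items():
        dst = os.path.join(configs_dir, dst_name)
        shutil.copyfile(os.path.join(estimated_dir, src_name), dst)
        copied.append(dst)
    return copied
```

Since the copy overwrites the existing coefficient and size-term files, it is worth keeping a backup of the original configs before running it.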

In [51]:
pd.read_csv(os.path.join(edb_directory,'estimated',"school_location_coefficients_revised.csv"))

Unnamed: 0,coefficient_name,value,constrain
0,coef_univ_dist_0_1,-7.476604,F
1,coef_univ_dist_1_2,-4.420887,F
2,coef_univ_dist_2_5,-1.169426,F
3,coef_univ_dist_5_15,-1.537107,F
4,coef_univ_dist_15_up,-0.073,F
5,coef_high_dist_0_1,-1.980909,F
6,coef_high_grade_dist_1_2,-1.188356,F
7,coef_high_grade_dist_2_5,-1.098063,F
8,coef_high_dist_5_15,-0.256995,F
9,coef_high_dist_15_up,-0.1882,F


In [52]:
pd.read_csv(os.path.join(edb_directory,'estimated',"school_location_size_terms.csv"))

Unnamed: 0,segment,model_selector,TOTHH,RETEMPN,FPSEMPN,HEREMPN,OTHEMPN,AGREMPN,MWTEMPN,AGE0519,HSENROLL,COLLFTE,COLLPTE
0,work_low,workplace,0.0,0.129129,0.193193,0.383383,0.12012,0.01001,0.164164,0.0,0.0,0.0,0.0
1,work_med,workplace,0.0,0.12012,0.197197,0.325325,0.139139,0.008008,0.21021,0.0,0.0,0.0,0.0
2,work_high,workplace,0.0,0.11,0.207,0.284,0.154,0.006,0.239,0.0,0.0,0.0,0.0
3,work_veryhigh,workplace,0.0,0.093,0.27,0.241,0.146,0.004,0.246,0.0,0.0,0.0,0.0
4,university,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088566,0.911434
5,gradeschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,highschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
7,escort,non_mandatory,0.0,0.225,0.0,0.144,0.0,0.0,0.0,0.465,0.166,0.0,0.0
8,shopping,non_mandatory,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,eatout,non_mandatory,0.0,0.742,0.0,0.258,0.0,0.0,0.0,0.0,0.0,0.0,0.0
