# Estimating Workplace Location

This notebook illustrates how to re-estimate ActivitySim's workplace location model.  The steps in the process are:
  - Run ActivitySim in estimation mode to read household travel survey files, run the households through the workplace location model step, and write an estimation data bundle (EDB) that contains the model utility specifications, coefficients, chooser data, and alternatives data.
  - Read and transform the EDB into the format required by the model estimation package [larch](https://larch.newman.me) and then re-estimate the model coefficients.  No changes to the model specification will be made.
  - Update the ActivitySim model coefficients and re-run the model in simulation mode.
  
The basic estimation workflow is shown below and explained in the next steps.

![estimation workflow](https://github.com/RSGInc/activitysim/raw/develop/docs/images/estimation_example.jpg)

# Load libraries

In [9]:
import larch  # !conda install larch #for estimation
import pandas as pd
import numpy as np
import yaml 
import larch.util.excel
import larch_asim  # utility functions in a local module
import os

# Required Inputs

In addition to a working ActivitySim model setup, estimation mode requires a household travel survey in ActivitySim format.  This format closely mirrors ActivitySim's simulation model tables:

 - households
 - persons
 - tours
 - joint_tour_participants
 - trips (not yet implemented)

Examples of the ActivitySim format household travel survey are included in the [example_estimation data folders](https://github.com/RSGInc/activitysim/tree/develop/activitysim/examples/example_estimation).  The user is responsible for formatting their household travel survey into the appropriate format.  

After creating an ActivitySim format household travel survey, the `scripts/infer.py` script is run to append additional calculated fields.  An example of an additional calculated field is the `household:joint_tour_frequency`, which is calculated based on the `tours` and `joint_tour_participants` tables.  

The input survey files are below.

### Survey households

In [10]:
pd.read_csv("../data_sf/survey_data/override_households.csv")

Unnamed: 0,household_id,TAZ,income,hhsize,HHT,auto_ownership,num_workers,joint_tour_frequency
0,2223759,16,144100,2,1,0,2,1_Main
1,990869,134,48000,2,1,2,2,0_tours
2,125886,113,25900,1,4,1,1,0_tours
3,727893,8,26100,2,1,0,1,0_tours
4,2741769,150,121600,4,1,2,1,0_tours
...,...,...,...,...,...,...,...,...
1995,663493,110,19180,1,6,1,1,0_tours
1996,569375,20,7400,1,6,1,0,0_tours
1997,1445193,17,75000,1,4,0,1,0_tours
1998,2833455,69,0,1,0,0,0,0_tours


### Survey persons

In [11]:
pd.read_csv("../data_sf/survey_data/override_persons.csv")

Unnamed: 0,person_id,household_id,age,PNUM,sex,pemploy,pstudent,ptype,school_taz,workplace_taz,free_parking_at_work,cdap_activity,mandatory_tour_frequency,_escort,_shopping,_othmaint,_othdiscr,_eatout,_social,non_mandatory_tour_frequency
0,166,166,54,1,2,3,3,4,-1,-1,False,N,,0,0,0,0,1,0,4
1,197,197,46,1,2,3,3,4,-1,-1,False,N,,0,1,0,0,0,0,16
2,268,268,46,1,1,3,3,4,-1,-1,False,N,,0,0,1,1,0,0,9
3,375,375,54,1,2,3,3,4,-1,-1,False,N,,0,0,1,0,0,0,8
4,387,387,44,1,2,3,3,4,-1,-1,False,N,,1,0,0,1,0,0,33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4401,7554799,2863464,93,1,2,3,3,5,-1,-1,False,N,,0,0,0,1,0,0,1
4402,7554818,2863483,68,1,1,3,3,5,-1,-1,False,N,,0,0,1,1,0,0,9
4403,7555141,2863806,93,1,2,3,3,5,-1,-1,False,N,,0,2,0,1,0,0,17
4404,7555853,2864518,71,1,1,3,3,5,-1,-1,False,N,,0,0,0,0,0,1,2


### Survey tours

In [12]:
pd.read_csv("../data_sf/survey_data/override_tours.csv")

Unnamed: 0,tour_id,survey_tour_id,person_id,household_id,tour_type,tour_category,destination,origin,start,end,tour_mode,survey_parent_tour_id,parent_tour_id,composition,tdd,atwork_subtour_frequency
0,25820,258200,629,629,school,mandatory,133.0,131.0,12.0,15.0,WALK,,,,115,
1,52265,522650,1274,1274,school,mandatory,188.0,166.0,9.0,15.0,WALK_LOC,,,,76,
2,1117937,11179370,27266,27266,school,mandatory,133.0,9.0,17.0,18.0,WALK_HVY,,,,163,
3,1148523,11485230,28012,28012,school,mandatory,12.0,10.0,17.0,22.0,WALK_LRF,,,,167,
4,1208547,12085470,29476,29476,school,mandatory,13.0,16.0,8.0,15.0,WALK_LOC,,,,61,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5768,302942627,3029426270,7388844,2750003,maint,atwork,5.0,7.0,14.0,14.0,WALK,3.029426e+09,302942643.0,,135,
5769,305120465,3051204650,7441962,2758909,maint,atwork,110.0,2.0,12.0,13.0,SHARED2FREE,3.051205e+09,305120481.0,,113,
5770,308000655,3080006550,7512211,2820876,eat,atwork,14.0,1.0,12.0,13.0,WALK,3.080007e+09,308000690.0,,113,
5771,308073840,3080738400,7513996,2822661,eat,atwork,69.0,107.0,8.0,16.0,SHARED3FREE,3.080739e+09,308073875.0,,62,


### Survey joint tour participants

In [13]:
pd.read_csv("../data_sf/survey_data/survey_joint_tour_participants.csv")

Unnamed: 0,participant_id,tour_id,household_id,person_id,participant_num
0,22095828301,220958283,2223759,5389226,1
1,22095828302,220958283,2223759,5389227,2
2,14429508701,144295087,1606646,3519392,1
3,14429508702,144295087,1606646,3519393,2
4,28367651801,283676518,2628704,6918939,1
...,...,...,...,...,...
226,16297928102,162979281,1769918,3975105,2
227,16297928103,162979281,1769918,3975106,3
228,16297928104,162979281,1769918,3975107,4
229,26353054902,263530549,2519358,6427575,1


# Run the Estimation Example

The next step is to run the model with an `estimation.yaml` settings file with the following settings in order to output the EDB for workplace location:

```
enable: True

bundles:
  - workplace_location

survey_tables:
  households:
    file_name: survey_data/override_households.csv
    index_col: household_id
  persons:
    file_name:  survey_data/override_persons.csv
    index_col: person_id
  tours:
    file_name:  survey_data/override_tours.csv
  joint_tour_participants:
    file_name:  survey_data/override_joint_tour_participants.csv
```

This enables estimation mode, identifies the models for which estimation data bundles (EDBs) are written, and lists the input survey tables, which supply the override values for each model choice.  

With this setup, the model will output an EDB with the following tables:
  - model settings - workplace_location_model_settings.yaml
  - coefficients - workplace_location_coefficients.csv
  - utilities specification - workplace_location_SPEC.csv
  - land use data - workplace_location_landuse.csv
  - size terms - workplace_location_size_terms.csv
  - alternatives values - workplace_location_alternatives_combined.csv
  - chooser data - workplace_location_choosers_combined.csv
  
The following code runs the software in estimation mode, inheriting the settings from the simulation setup and using the San Francisco county data.  It produces the workplace_location model EDB but runs all the model steps identified in the inherited settings file.  

In [14]:
!activitysim run -c ../configs -c ../../test/configs -d ../data_sf -d ../../test/data -o ../output

Configured logging using basicConfig
INFO:activitysim:Configured logging using basicConfig
INFO:activitysim.cli.run:using configs_dir: ['../configs', '../../test/configs']
INFO:activitysim.cli.run:using data_dir: ['../data_sf', '../../test/data']
INFO:activitysim.cli.run:using output_dir: ['../output']
INFO - Read logging configuration from: ../configs\logging.yaml
INFO - setting households_sample_size: 0
INFO - setting chunk_size: 0
INFO - setting multiprocess: None
INFO - setting num_processes: None
INFO - setting resume_after: None
INFO - run single process simulation
INFO - open_pipeline
INFO - Set random seed base to 0
INFO - Time to execute open_pipeline : 0.024 seconds (0.0 minutes)
INFO - preload_injectables
INFO - Time to execute preload_injectables : 0.001 seconds (0.0 minutes)
INFO - Reading CSV file ../data_sf\land_use.csv
INFO - renaming columns: {'ZONE': 'TAZ', 'COUNTY': 'county_id'}
INFO - keeping columns: ['DISTRICT', 'SD', 'county_id', 'TOTHH', 'TOTPOP', 'TOTACRE', 'RE

# Read EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

In [28]:
edb_directory = "../output/estimation_data_bundle/workplace_location/"

def read_csv(filename, **kwargs):
    return pd.read_csv(os.path.join(edb_directory, filename), **kwargs)

In [29]:
coefficients = read_csv("workplace_location_coefficients.csv", index_col='coefficient_name')
spec = read_csv("workplace_location_SPEC.csv")
alt_values = read_csv("workplace_location_alternatives_combined.csv")
chooser_data = read_csv("workplace_location_choosers_combined.csv")
landuse = read_csv("workplace_location_landuse.csv", index_col='TAZ')
size_spec = read_csv("workplace_location_size_terms.csv")

### Zone size term specification

In [30]:
work_size_spec = size_spec \
.query("model_selector == 'workplace'") \
.drop(columns='model_selector') \
.set_index('segment')
work_size_spec = work_size_spec.loc[:,work_size_spec.max()>0]
work_size_spec

Unnamed: 0_level_0,RETEMPN,FPSEMPN,HEREMPN,OTHEMPN,AGREMPN,MWTEMPN
segment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
work_low,0.129,0.193,0.383,0.12,0.01,0.164
work_med,0.12,0.197,0.325,0.139,0.008,0.21
work_high,0.11,0.207,0.284,0.154,0.006,0.239
work_veryhigh,0.093,0.27,0.241,0.146,0.004,0.246


### Zone size term coefficients

In [31]:
size_coef = work_size_spec.stack().reset_index()
size_coef.index = size_coef.iloc[:,0] +"_"+ size_coef.iloc[:,1]
size_coef = size_coef.loc[size_coef.iloc[:,2]>0]
size_coef['constrain'] = 'F'
one_each = size_coef.groupby('segment').first().reset_index()
size_coef.loc[one_each.iloc[:,0] +"_"+ one_each.iloc[:,1], 'constrain'] = 'T'
size_coef = size_coef.iloc[:,2:]
size_coef.columns = ['value','constrain']
size_coef.index.name = 'coefficient_name'
size_coef['value'] = np.log(size_coef['value'])
size_coef

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
work_low_RETEMPN,-2.047943,T
work_low_FPSEMPN,-1.645065,F
work_low_HEREMPN,-0.95972,F
work_low_OTHEMPN,-2.120264,F
work_low_AGREMPN,-4.60517,F
work_low_MWTEMPN,-1.807889,F
work_med_RETEMPN,-2.120264,T
work_med_FPSEMPN,-1.624552,F
work_med_HEREMPN,-1.12393,F
work_med_OTHEMPN,-1.973281,F

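The transformation above can be checked by hand on one segment. The sketch below uses only the `work_low` row of the size term table: each positive size term is replaced by its natural log, and the first term in the segment is flagged as constrained (`'T'`) while the rest stay free (`'F'`). Holding one term per segment fixed is assumed here to serve as the normalization, since only the ratios between size terms matter. The dictionary layout is an illustration, not the structure used by the notebook.

```python
import math

# Size terms for the work_low segment, copied from the table above
work_low = {
    "RETEMPN": 0.129,
    "FPSEMPN": 0.193,
    "HEREMPN": 0.383,
    "OTHEMPN": 0.12,
    "AGREMPN": 0.01,
    "MWTEMPN": 0.164,
}

size_coef = {}
for name, value in work_low.items():
    if value <= 0:
        continue  # zero-valued size terms are dropped, as in the notebook cell
    size_coef[f"work_low_{name}"] = {
        "value": math.log(value),                   # coefficient is on the log scale
        "constrain": "T" if not size_coef else "F"  # first kept term is held fixed
    }

for name, row in size_coef.items():
    print(f"{name:20s} {row['value']: .6f} {row['constrain']}")
```

The printed values should agree with the `work_low_*` rows shown above, and applying `math.exp` to each value recovers the original size term.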

### Model settings

In [32]:
with open(os.path.join(edb_directory, "workplace_location_model_settings.yaml"), "r") as f:
    settings = yaml.safe_load(f)
settings

{'SAMPLE_SIZE': 30,
 'SIMULATE_CHOOSER_COLUMNS': ['income_segment', 'TAZ'],
 'SAMPLE_SPEC': 'workplace_location_sample.csv',
 'SPEC': 'workplace_location.csv',
 'COEFFICIENTS': 'workplace_location_coeffs.csv',
 'LOGSUM_SETTINGS': 'tour_mode_choice.yaml',
 'LOGSUM_PREPROCESSOR': 'nontour_preprocessor',
 'LOGSUM_TOUR_PURPOSE': 'work',
 'CHOOSER_ORIG_COL_NAME': 'TAZ',
 'ALT_DEST_COL_NAME': 'alt_dest',
 'IN_PERIOD': 17,
 'OUT_PERIOD': 8,
 'DEST_CHOICE_COLUMN_NAME': 'workplace_taz',
 'DEST_CHOICE_LOGSUM_COLUMN_NAME': 'workplace_location_logsum',
 'DEST_CHOICE_SAMPLE_TABLE_NAME': 'workplace_location_sample',
 'annotate_persons': {'SPEC': 'annotate_persons_workplace',
  'DF': 'persons',
  'TABLES': ['land_use']},
 'annotate_households': {'SPEC': 'annotate_households_workplace',
  'DF': 'households',
  'TABLES': ['persons']},
 'CHOOSER_TABLE_NAME': 'persons_merged',
 'MODEL_SELECTOR': 'workplace',
 'CHOOSER_SEGMENT_COLUMN_NAME': 'income_segment',
 'CHOOSER_FILTER_COLUMN_NAME': 'is_worker',
 'S

### Coefficients

In [33]:
coefficients

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
coef_dist_0_1,-0.8428,F
coef_dist_1_2,-0.3104,F
coef_dist_2_5,-0.3783,F
coef_dist_5_15,-0.1285,F
coef_dist_15_up,-0.0917,F
coef_dist_0_5_high,0.15,F
coef_dist_5_up_high,0.02,F
coef_mode_logsum,0.3,F


### Utility specification

In [34]:
spec

Unnamed: 0,Label,Description,Expression,coefficient
0,local_dist,,_DIST@skims['DIST'],1
1,util_dist_0_1,"Distance, piecewise linear from 0 to 1 miles","@_DIST.clip(0,1)",coef_dist_0_1
2,util_dist_1_2,"Distance, piecewise linear from 1 to 2 miles","@(_DIST-1).clip(0,1)",coef_dist_1_2
3,util_dist_2_5,"Distance, piecewise linear from 2 to 5 miles","@(_DIST-2).clip(0,3)",coef_dist_2_5
4,util_dist_5_15,"Distance, piecewise linear from 5 to 15 miles","@(_DIST-5).clip(0,10)",coef_dist_5_15
5,util_dist_15_up,"Distance, piecewise linear for 15+ miles",@(_DIST-15.0).clip(0),coef_dist_15_up
6,util_dist_0_5_high,"Distance 0 to 5 mi, high and very high income",@(df['income_segment']>=WORK_HIGH_SEGMENT_ID) ...,coef_dist_0_5_high
7,util_dist_15_up_high,"Distance 5+ mi, high and very high income",@(df['income_segment']>=WORK_HIGH_SEGMENT_ID) ...,coef_dist_5_up_high
8,util_size_variable,Size variable,@(df['size_term'] * df['shadow_price_size_term...,1
9,util_utility_adjustment,utility adjustment,@df['shadow_price_utility_adjustment'],1

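To see how the piecewise-linear distance terms in the specification combine, the sketch below evaluates the five `clip` expressions for a single, hypothetical 7-mile distance and applies the starting coefficients from the coefficients table. The helper names `clip` and `piecewise_distance_terms` are made up for illustration, and the income-specific distance terms are left out.

```python
def clip(x, lower, upper=None):
    """Scalar stand-in for the pandas/numpy clip used in the spec expressions."""
    x = max(lower, x)
    return x if upper is None else min(x, upper)


def piecewise_distance_terms(dist):
    """Split a distance in miles into the segments defined by the spec."""
    return {
        "util_dist_0_1": clip(dist, 0, 1),
        "util_dist_1_2": clip(dist - 1, 0, 1),
        "util_dist_2_5": clip(dist - 2, 0, 3),
        "util_dist_5_15": clip(dist - 5, 0, 10),
        "util_dist_15_up": clip(dist - 15.0, 0),
    }


# Starting values from the coefficients table (before re-estimation)
coefs = {
    "coef_dist_0_1": -0.8428,
    "coef_dist_1_2": -0.3104,
    "coef_dist_2_5": -0.3783,
    "coef_dist_5_15": -0.1285,
    "coef_dist_15_up": -0.0917,
}

terms = piecewise_distance_terms(7.0)
distance_utility = sum(
    coefs[label.replace("util_", "coef_")] * value for label, value in terms.items()
)
print(terms)
print(round(distance_utility, 4))
```

For 7 miles the segments are 1, 1, 3, 2, and 0 miles, which add back up to the full distance, and the distance part of the utility comes to about -2.5451.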

### Remove shadow pricing and pre-existing size expression for re-estimation

In [35]:
spec = spec\
.set_index('Label')\
.drop(index=['util_size_variable', 'util_utility_adjustment'])\
.reset_index()

### Alternatives data

In [36]:
alt_values

Unnamed: 0,person_id,variable,1,2,3,4,5,6,7,8,...,181,182,183,184,185,186,187,188,189,190
0,72241,TAZ,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,...,181.0,182.0,183.0,184.0,185.0,186.0,187.0,188.0,189.0,190.0
1,72241,mode_choice_logsum,-0.6000073826104492,-0.4656491845430101,-0.41794791684660004,-0.4545880924383482,-0.3870339464573669,-0.713189384898449,-0.6002723120707204,-0.7303789868909905,...,-1.2551998198820722,-1.444611856370077,-1.309689434695158,-1.2118895631639268,-1.116133290115392,-1.1401586021041086,-1.268242046936546,-1.1992258811031282,-1.332212687016352,-1.4322381548918224
2,72241,pick_count,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,72241,prob,0.044801462203924856,0.06764474840448732,0.004636496646133522,0.038560793011622845,0.029432761447063308,0.007082161657578104,0.021002192072313054,0.007080174169833742,...,9.835213648420107e-05,9.37955508970241e-05,0.0003916917023644097,0.0004549197781990045,0.0006986771541699331,0.0005700269910735453,0.00022288388655805036,0.0015168772236387644,6.456615866696308e-05,0.00038493516540732525
4,72241,shadow_price_size_term_adjustment,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49072,7515185,util_mode_logsum,5.209319664605113,5.263859224685696,5.250578793547065,5.240693588870104,5.266707024418452,5.385882663451519,5.429836866161502,5.413580929225948,...,3.5638723444203606,3.8737395301473363,3.939960434175001,3.896122982249229,4.037021365960745,4.008376112238483,4.022965354450543,4.2561835681622435,4.137475938698699,3.803872808728852
49073,7515185,util_no_attractions,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
49074,7515185,util_sample_of_corrections_factor,3.773690429573432,3.2901835133378077,5.885037102917519,3.829120168967722,3.995759873960041,5.287496147764242,4.186480380588901,5.144701771913909,...,8.673308209076758,8.543415039654008,7.12433751865688,7.0286614320788905,6.59573872948536,6.749128801712912,7.618771571501497,5.665033450001899,8.80633209328273,7.046647564016155
49075,7515185,util_size_variable,8.607993215742676,9.068802152955321,6.44746749765505,8.457988473558888,8.245952810520611,6.957999623413992,7.998487266140552,6.960822948235117,...,4.315499470829833,4.3335976552693785,5.721835205763653,5.789241258107118,6.1502039682537495,6.018658905769173,5.06677615336607,6.994814238173049,3.948605626489833,5.759690106594497

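The alternatives table is stored with one row per person and variable and one column per alternative, while the `ca` data handed to larch later has one row per person and alternative with one column per variable. The notebook relies on `larch_asim.cv_to_ca` for that conversion. As a rough illustration of the reshaping only, and not the helper's actual implementation, the same pivot can be sketched in plain pandas on a tiny made-up table:

```python
import pandas as pd

# Made-up miniature of the alternatives table: two persons, two variables,
# two alternatives (column labels are strings, as they are when read from CSV)
wide = pd.DataFrame(
    {
        "person_id": [1, 1, 2, 2],
        "variable": ["TAZ", "prob", "TAZ", "prob"],
        "1": [1.0, 0.7, 1.0, 0.4],
        "2": [2.0, 0.3, 2.0, 0.6],
    }
).set_index(["person_id", "variable"])
wide.columns.name = "alt_id"  # name chosen for this sketch

# Move alternatives from columns into the index, then variables into columns
long = wide.stack().unstack("variable")
print(long)
```

The result is indexed by `(person_id, alt_id)` with `TAZ` and `prob` as columns, which is the shape of the `data_ca` listing reported by `d.info(1)` further down.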

### Chooser data

In [37]:
chooser_data

Unnamed: 0,person_id,model_choice,override_choice,income_segment,TAZ
0,72241,13,1,1,17
1,72441,100,34,1,60
2,72528,139,92,1,69
3,73144,77,115,1,125
4,73493,117,115,1,133
...,...,...,...,...,...
2578,7514214,80,69,1,56
2579,7514284,187,15,1,72
2580,7514404,105,24,1,81
2581,7514777,87,16,1,106


# Data Processing and Estimation Setup

The next step is to transform the EDB for larch for model re-estimation.  

In [38]:
from larch import P, X

### Utility specifications

In [39]:
m = larch.Model()

In [40]:
m.utility_ca = larch_asim.linear_utility_from_spec(
    spec, x_col='Label', p_col='coefficient', 
    ignore_x=('local_dist',), 
)
print(m.utility_ca)

  P.coef_dist_0_1 * X.util_dist_0_1
+ P.coef_dist_1_2 * X.util_dist_1_2
+ P.coef_dist_2_5 * X.util_dist_2_5
+ P.coef_dist_5_15 * X.util_dist_5_15
+ P.coef_dist_15_up * X.util_dist_15_up
+ P.coef_dist_0_5_high * X.util_dist_0_5_high
+ P.coef_dist_5_up_high * X.util_dist_15_up_high
+ P('-999') * X.util_no_attractions
+ P.coef_mode_logsum * X.util_mode_logsum
+ P('1') * X.util_sample_of_corrections_factor


In [41]:
m.quantity_ca = sum(
    P(f"{i}_{q}") * X(q) * X(f"income_segment=={settings['SEGMENT_IDS'][i]}")
    for i in work_size_spec.index
    for q in work_size_spec.columns
)

In [42]:
larch_asim.apply_coefficients(coefficients, m)
larch_asim.apply_coefficients(size_coef, m, minimum=-6, maximum=6)

### Coefficients

In [43]:
m.pf

Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note
-999,-999.0,-999.0,-999.0,-999.0,-999.0,1,
1,1.0,1.0,1.0,1.0,1.0,1,
coef_dist_0_1,-0.8428,0.0,0.0,,,0,
coef_dist_0_5_high,0.15,0.0,0.0,,,0,
coef_dist_15_up,-0.0917,0.0,0.0,,,0,
coef_dist_1_2,-0.3104,0.0,0.0,,,0,
coef_dist_2_5,-0.3783,0.0,0.0,,,0,
coef_dist_5_15,-0.1285,0.0,0.0,,,0,
coef_dist_5_up_high,0.02,0.0,0.0,,,0,
coef_mode_logsum,0.3,0.0,0.0,,,0,


In [44]:
x_co = chooser_data.set_index('person_id').rename(columns={'TAZ':'HOMETAZ'})

In [45]:
x_ca = larch_asim.cv_to_ca(
    alt_values.set_index(['person_id', 'variable'])
)

In [46]:
x_ca_1 = pd.merge(x_ca, landuse, on='TAZ', how='left')
x_ca_1.index = x_ca.index

In [47]:
d = larch.DataFrames(
    co=x_co,
    ca=x_ca_1,
    av=True,
)

In [48]:
d.info(1)

larch.DataFrames:  (not computation-ready)
  n_cases: 2583
  n_alts: 190
  data_ca:
    - TAZ                               (490770 non-null float64)
    - mode_choice_logsum                (490770 non-null float64)
    - pick_count                        (490770 non-null float64)
    - prob                              (490770 non-null float64)
    - shadow_price_size_term_adjustment (490770 non-null float64)
    - shadow_price_utility_adjustment   (490770 non-null float64)
    - size_term                         (490770 non-null float64)
    - util_dist_0_1                     (490770 non-null float64)
    - util_dist_0_5_high                (490770 non-null float64)
    - util_dist_15_up                   (490770 non-null float64)
    - util_dist_15_up_high              (490770 non-null float64)
    - util_dist_1_2                     (490770 non-null float64)
    - util_dist_2_5                     (490770 non-null float64)
    - util_dist_5_15                    (490770 non-null f

In [49]:
m.dataservice = d

### Survey choice

In [50]:
m.choice_co_code = 'override_choice'

# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.

In [51]:
m.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
-999,-999.0,-999.0,-999.0,-999.0,-999.0,1,,-999.0
1,1.0,1.0,1.0,1.0,1.0,1,,1.0
coef_dist_0_1,-1.239747,0.0,0.0,,,0,,-1.239747
coef_dist_0_5_high,0.281467,0.0,0.0,,,0,,0.281467
coef_dist_15_up,-0.0917,0.0,0.0,,,0,,-0.0917
coef_dist_1_2,-0.992783,0.0,0.0,,,0,,-0.992783
coef_dist_2_5,-0.731466,0.0,0.0,,,0,,-0.731466
coef_dist_5_15,-0.25377,0.0,0.0,,,0,,-0.25377
coef_dist_5_up_high,0.11331,0.0,0.0,,,0,,0.11331
coef_mode_logsum,0.017697,0.0,0.0,,,0,,0.017697



Unnamed: 0_level_0,0
Unnamed: 0_level_1,0
-999,-999.000000
1,1.000000
coef_dist_0_1,-1.239747
coef_dist_0_5_high,0.281467
coef_dist_15_up,-0.091700
coef_dist_1_2,-0.992783
coef_dist_2_5,-0.731466
coef_dist_5_15,-0.253770
coef_dist_5_up_high,0.113310
coef_mode_logsum,0.017697

Unnamed: 0,0
-999,-999.0
1,1.0
coef_dist_0_1,-1.239747
coef_dist_0_5_high,0.281467
coef_dist_15_up,-0.0917
coef_dist_1_2,-0.992783
coef_dist_2_5,-0.731466
coef_dist_5_15,-0.25377
coef_dist_5_up_high,0.11331
coef_mode_logsum,0.017697

Unnamed: 0,0
-999,0.0
1,0.0
coef_dist_0_1,2.8e-05
coef_dist_0_5_high,0.003857
coef_dist_15_up,0.0
coef_dist_1_2,0.000438
coef_dist_2_5,0.003526
coef_dist_5_15,0.001785
coef_dist_5_up_high,0.001724
coef_mode_logsum,-0.001374


# Output Estimation Results

In [52]:
est_names = [j for j in coefficients.index if j in m.pf.index]
coefficients.loc[est_names,'value'] = m.pf.loc[est_names, 'value']

In [53]:
# Write out replacement coefficients file and model summaries
os.makedirs(os.path.join(edb_directory,'estimated'), exist_ok=True)

### Write the re-estimated coefficients file

In [54]:
coefficients.reset_index().to_csv(
    os.path.join(edb_directory,'estimated',"workplace_location_coefficients_revised.csv"), 
    index=False,
)

### Write the model estimation report, including coefficient t-statistic and log likelihood

In [55]:
m.to_xlsx(
    os.path.join(edb_directory,'estimated',"workplace_location_model_estimation.xlsx"), 
)

<larch.util.excel.ExcelWriter at 0x24a733e8bc8>

In [56]:
# Write size coefficients into size_spec
for c in work_size_spec.columns:
    for i in work_size_spec.index:
        param_name = f"{i}_{c}"
        j = (size_spec['segment'] == i) & (size_spec['model_selector'] == 'workplace')
        size_spec.loc[j,c] = np.exp(m.get_value(param_name))
        

In [57]:
# Rescale each row to sum to 1. This is not mathematically required,
# but it keeps the file consistent with the existing ActivitySim convention.

size_spec.iloc[:,2:] = size_spec.iloc[:,2:].div(size_spec.iloc[:,2:].sum(axis=1), axis=0)

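The comment above notes that the rescaling is not mathematically needed. One way to check this, assuming the size term enters utility as the log of a weighted sum of zonal employment, is to confirm that dividing all weights for a segment by a constant leaves the logit choice probabilities unchanged. The zones, employment counts, weights, and helper names below are invented for this check.

```python
import math


def log_size(weights, zone):
    """Log of the weighted employment sum for one zone."""
    return math.log(sum(w * zone[k] for k, w in weights.items()))


def logit_probs(utilities):
    """Multinomial logit probabilities, shifted by the max for stability."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical destination zones with two employment categories
zones = [
    {"RETEMPN": 100, "FPSEMPN": 50},
    {"RETEMPN": 20, "FPSEMPN": 400},
    {"RETEMPN": 300, "FPSEMPN": 10},
]

weights = {"RETEMPN": 0.4, "FPSEMPN": 1.6}  # arbitrary, do not sum to 1
weight_total = sum(weights.values())
rescaled = {k: w / weight_total for k, w in weights.items()}  # now sums to 1

p_raw = logit_probs([log_size(weights, z) for z in zones])
p_rescaled = logit_probs([log_size(rescaled, z) for z in zones])

print(all(math.isclose(a, b) for a, b in zip(p_raw, p_rescaled)))
```

Rescaling shifts every alternative's log size by the same constant, which cancels in the logit formula, so the comparison prints `True`.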
### Write updated size coefficients

In [58]:
size_spec.to_csv(
    os.path.join(edb_directory,'estimated',"workplace_location_size_terms.csv"), 
    index=False,
)

# Next Steps

The final step is to copy, either manually or automatically, the `workplace_location_coefficients_revised.csv` and `workplace_location_size_terms.csv` files to the configs folder, rename them to `workplace_location_coeffs.csv` and `destination_choice_size_terms.csv` respectively, and run ActivitySim in simulation mode.

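The copy-and-rename step could be automated along these lines. This is only a sketch: the function name `install_estimated_files` is hypothetical, and the configs directory must be supplied by the caller because its location depends on the local setup. The source and target file names are the ones given above.

```python
import os
import shutil

# Estimated file name -> name expected in the configs folder
RENAMES = {
    "workplace_location_coefficients_revised.csv": "workplace_location_coeffs.csv",
    "workplace_location_size_terms.csv": "destination_choice_size_terms.csv",
}


def install_estimated_files(estimated_dir, configs_dir):
    """Copy the re-estimated files into configs_dir under their new names."""
    copied = []
    for source_name, target_name in RENAMES.items():
        target = os.path.join(configs_dir, target_name)
        shutil.copyfile(os.path.join(estimated_dir, source_name), target)
        copied.append(target)
    return copied


# Example call (paths are assumptions; back up the existing configs first):
# install_estimated_files(os.path.join(edb_directory, "estimated"), "../configs")
```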
In [59]:
pd.read_csv(os.path.join(edb_directory,'estimated',"workplace_location_coefficients_revised.csv"))

Unnamed: 0,coefficient_name,value,constrain
0,coef_dist_0_1,-1.239747,F
1,coef_dist_1_2,-0.992783,F
2,coef_dist_2_5,-0.731466,F
3,coef_dist_5_15,-0.25377,F
4,coef_dist_15_up,-0.0917,F
5,coef_dist_0_5_high,0.281467,F
6,coef_dist_5_up_high,0.11331,F
7,coef_mode_logsum,0.017697,F


In [60]:
pd.read_csv(os.path.join(edb_directory,'estimated',"workplace_location_size_terms.csv"))

Unnamed: 0,segment,model_selector,TOTHH,RETEMPN,FPSEMPN,HEREMPN,OTHEMPN,AGREMPN,MWTEMPN,AGE0519,HSENROLL,COLLFTE,COLLPTE
0,work_low,workplace,0.0,0.000265,0.829184,0.170522,5e-06,1.9e-05,5e-06,0.0,0.0,0.0,0.0
1,work_med,workplace,0.0,0.000263,0.883763,0.115947,5e-06,1.6e-05,5e-06,0.0,0.0,0.0,0.0
2,work_high,workplace,0.0,0.000246,0.903442,0.026134,6e-06,1.3e-05,0.070158,0.0,0.0,0.0,0.0
3,work_veryhigh,workplace,0.0,0.00023,0.999741,7e-06,6e-06,9e-06,6e-06,0.0,0.0,0.0,0.0
4,university,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.592,0.408
5,gradeschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6,highschool,school,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
7,escort,non_mandatory,0.0,0.225,0.0,0.144,0.0,0.0,0.0,0.465,0.166,0.0,0.0
8,shopping,non_mandatory,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,eatout,non_mandatory,0.0,0.742,0.0,0.258,0.0,0.0,0.0,0.0,0.0,0.0,0.0
