# Estimating Non-Mandatory Tour Frequency

This notebook illustrates how to re-estimate a single ActivitySim model component, non-mandatory
tour frequency. The process includes running ActivitySim in estimation mode to read household travel
survey files and write out the estimation data bundles (EDBs) used in this notebook. To review how to
do so, please visit the other notebooks in this directory.

# Load libraries

In [None]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd
import numpy as np
import activitysim
import datetime
activitysim.__version__

We'll work in the `C:\ABM3_dev\outputs` directory; ActivitySim saved the estimation data bundles in its `output\estimation_data_bundle` subfolder.

In [None]:
os.chdir(r'C:\ABM3_dev\outputs')  # raw string so the backslashes are not read as escape sequences

In [None]:
def write_coeffs(segment):
    """Write a starter coefficients file for one person-type segment of the spec.

    Every coefficient name used by `segment` starts at 0.0 and is estimated,
    except `coef_unavailable`, which is held fixed at -999.
    """
    path = r'output\estimation_data_bundle\non_mandatory_tour_frequency'
    spec = pd.read_csv(os.path.join(path, 'non_mandatory_tour_frequency_SPEC.csv'))
    # a name can appear on several spec rows; the coefficients file needs it only once
    coefs = spec[segment].dropna().drop_duplicates()
    coefs_df = pd.DataFrame({'coefficient_name': coefs.to_numpy()})
    coefs_df['value'] = 0.0
    coefs_df['constrain'] = 'F'
    unavailable = coefs_df['coefficient_name'] == 'coef_unavailable'
    coefs_df.loc[unavailable, 'value'] = -999
    coefs_df.loc[unavailable, 'constrain'] = 'T'
    # coefs_df.to_csv(os.path.join(path, segment, f'non_mandatory_tour_frequency_coefficients_{segment}.csv'), index=False)
    coefs_df.to_csv(os.path.join(r'C:\ABM3_dev\ABM\src\asim\configs\estimation', f'non_mandatory_tour_frequency_coefficients_{segment}.csv'), index=False)

# write_coeffs('PTYPE_FULL')
# write_coeffs('PTYPE_PART')
# write_coeffs('PTYPE_UNIVERSITY')
# write_coeffs('PTYPE_NONWORK')
# write_coeffs('PTYPE_RETIRED')
# write_coeffs('PTYPE_DRIVING')
# write_coeffs('PTYPE_SCHOOL')
# write_coeffs('PTYPE_PRESCHOOL')

# Load data and prep model for estimation

In [None]:
modelname = "nonmand_tour_freq"

from activitysim.estimation.larch import component_model
# model, data = component_model(modelname, return_data=True, condense_parameters=False, num_chunks=10)
model, data = component_model(modelname, return_data=True, condense_parameters=False, segment_subset=['PTYPE_PART'], num_chunks=10)

The model spec being re-estimated has many rows for each person type (99 for `PTYPE_PART` in this
estimation data bundle), and a survey sample may not be large enough to successfully estimate anywhere
near that many parameters. `component_model` offers a `condense_parameters` option as a short cut to
making a model that can be estimated with stable parameter results. When activated, it merges
parameters not only by name (i.e. when the same name appears twice it is the same parameter)
but also by value, so that if the initial value of any two parameters is identical
then they are treated as the same parameter. Using `condense_parameters` in actual model
estimation efforts is ill advised and may generate confusing or unexpected results, so it is
switched off here (`condense_parameters=False`).
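The merging rule can be illustrated with a few lines of pandas. This is a hypothetical sketch, not ActivitySim's implementation of `condense_parameters`: the function `condense_by_value` and its choice of naming each merged parameter after the first coefficient carrying that value are assumptions for illustration. It expects a table with `coefficient_name` and `value` columns, like the coefficients loaded from the EDB.

```python
import pandas as pd


def condense_by_value(coefs: pd.DataFrame) -> dict:
    """Map each coefficient name to a condensed parameter name.

    Hypothetical illustration: coefficients that share an identical initial
    value collapse onto one parameter, named after the first coefficient
    (in table order) carrying that value. Repeated names already denote the
    same parameter, so they map to the same target automatically.
    """
    mapping = {}
    first_name_for_value = {}
    for name, value in zip(coefs["coefficient_name"], coefs["value"]):
        # the first coefficient seen with this value becomes the shared parameter
        target = first_name_for_value.setdefault(value, name)
        mapping[name] = target
    return mapping


coefs = pd.DataFrame({
    "coefficient_name": ["coef_a", "coef_b", "coef_c", "coef_a"],
    "value": [0.5, -1.2, 0.5, 0.5],
})
print(condense_by_value(coefs))
# {'coef_a': 'coef_a', 'coef_b': 'coef_b', 'coef_c': 'coef_a'}
```

Under this rule, the starter files written by `write_coeffs` (every free coefficient initialized to 0.0) would collapse all free coefficients onto a single parameter, which illustrates why condensing by value can give confusing results.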

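This component has a distinct choice model for each person type, so instead of a single model
there's a `dict` of models. Only the part-time worker segment is loaded here
(`segment_subset=['PTYPE_PART']`), so the dict has a single entry.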

In [95]:
type(model)

dict

In [96]:
model.keys()

dict_keys(['PTYPE_PART'])

# Review data loaded from the EDB

The data loaded from the EDB can be reviewed as well; as with the models, there is separate data
for each person type.

## Coefficients

In [97]:
data.coefficients['PTYPE_PART']

| coefficient_name | value | constrain |
|---|---|---|
| coef_escorting_tour | 0.0 | F |
| coef_discretionary_tour | 0.0 | F |
| coef_shopping_tour | 0.0 | F |
| coef_maintenance_tour | 0.0 | F |
| coef_visiting_or_social_tour | 0.0 | F |
| ... | ... | ... |
| coef_telecommute_2_3_days_week_and_tour_freq_2 | 0.0 | F |
| coef_telecommute_2_3_days_week_and_tour_freq_3p | 0.0 | F |
| coef_telecommute_4_days_week_and_tour_freq_1 | 0.0 | F |
| coef_telecommute_4_days_week_and_tour_freq_2 | 0.0 | F |


## Utility specification

In [98]:
data.spec['PTYPE_PART']

0                                 coef_escorting_tour
1                             coef_discretionary_tour
2                                  coef_shopping_tour
3                               coef_maintenance_tour
4                        coef_visiting_or_social_tour
                           ...                       
94     coef_telecommute_2_3_days_week_and_tour_freq_2
95    coef_telecommute_2_3_days_week_and_tour_freq_3p
96       coef_telecommute_4_days_week_and_tour_freq_1
97       coef_telecommute_4_days_week_and_tour_freq_2
98      coef_telecommute_4_days_week_and_tour_freq_3p
Name: PTYPE_PART, Length: 99, dtype: object

In [99]:
type(data.spec['PTYPE_PART'])

pandas.core.series.Series

## Chooser data

In [100]:
data.chooser_data['PTYPE_PART']

,person_id,model_choice,override_choice,household_id,PNUM,age,sex,pemploy,pstudent,is_student,...,num_full_time_workers_not_self,num_part_time_workers_not_self,num_university_students_not_self,num_non_workers_not_self,num_retirees_not_self,num_driving_age_students_not_self,num_pre_driving_age_school_kids_not_self,num_pre_school_kids_not_self,retiredHh,num_travel_active_pre_drive_students
0,1,128,96,1,1,49,2,2,3,False,...,0,0,0,1,0,0,2,0,0,2
1,45,24,6,26,2,47,2,2,3,False,...,0,0,0,1,0,0,0,0,0,0
2,54,40,12,31,1,63,1,2,3,False,...,0,0,0,1,0,0,0,0,0,0
3,88,80,1,45,2,39,2,2,3,False,...,1,0,0,0,0,0,0,1,0,0
4,95,122,0,48,2,65,2,2,3,False,...,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2367,96834,148,0,49734,1,68,2,2,3,False,...,0,0,0,0,0,0,0,0,0,0
2368,96837,120,12,49737,1,49,2,2,3,False,...,1,1,0,1,1,0,0,0,0,0
2369,96852,3,0,49745,2,21,2,2,1,True,...,1,0,0,0,0,0,0,0,0,0
2370,96865,135,12,49751,4,51,2,2,3,False,...,1,0,0,0,2,0,0,0,0,0


In [101]:
alt_df = data.alt_values['PTYPE_PART']
alt_df.head()

,person_id,variable,0,1,2,3,4,5,6,7,...,187,188,189,190,191,192,193,194,195,196
0,1,util_auto_deficient_tour_freq_1,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,1,util_auto_deficient_tour_freq_2,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,1,util_auto_deficient_tour_freq_3p,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,1,util_discretionary_tour,0,1,2,0,1,2,0,1,...,0,1,0,0,0,0,1,0,0,0
4,1,util_eating_out_tour,0,0,0,0,0,0,1,1,...,0,0,0,1,0,0,0,0,1,0


In [102]:
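df = data.chooser_data['PTYPE_PART'].copy()
# alternatives table: one row per alternative id, with the tour counts by purpose for that alternative
alts = pd.read_csv(r"C:\ABM3_dev\outputs\output\estimation_data_bundle\non_mandatory_tour_frequency\non_mandatory_tour_frequency_alternatives.csv", index_col=0)
# attach the tour counts of each person's observed (survey) choice, stored in override_choice
df = df.merge(alts, how='left', left_on='override_choice', right_index=True)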

In [103]:
tour_counts = []
for col in ['escort','shopping','othmaint','eatout','social','othdiscr','tot_tours', 'num_mandatory_tours']:
    tmp = df[col].value_counts()
    tour_counts.append(tmp)

tour_counts = pd.concat(tour_counts, axis=1).fillna(0).astype(int)
tour_counts.loc['Total'] = tour_counts.sum(axis=0)
tour_counts

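| tours per person | escort | shopping | othmaint | eatout | social | othdiscr | tot_tours | num_mandatory_tours |
|---|---|---|---|---|---|---|---|---|
| 0 | 2054 | 2007 | 1814 | 2173 | 2297 | 1923 | 849 | 1259 |
| 1 | 245 | 343 | 507 | 199 | 75 | 409 | 1040 | 1032 |
| 2 | 73 | 22 | 51 | 0 | 0 | 40 | 366 | 81 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 21 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| Total | 2372 | 2372 | 2372 | 2372 | 2372 | 2372 | 2372 | 2372 |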


In [104]:
df.num_mandatory_tours.value_counts()

0    1259
1    1032
2      81
Name: num_mandatory_tours, dtype: int64

# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Because `condense_parameters=False` in the loading step, parameters are not condensed by value: each unique coefficient name in the spec becomes one parameter.
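One quick symptom of a sample that is too thin is alternatives that nobody in the estimation data chose: any term that only affects such alternatives has no finite maximum-likelihood value. The sketch below counts observed choices per alternative. The helper name `unchosen_alternatives` is hypothetical, and it assumes alternatives are numbered `0..n_alts-1`.

```python
import numpy as np
import pandas as pd


def unchosen_alternatives(choices: pd.Series, n_alts: int) -> list:
    """Return the alternative ids (0..n_alts-1) that no chooser selected."""
    counts = np.bincount(choices.to_numpy(dtype=int), minlength=n_alts)
    return [int(a) for a in np.flatnonzero(counts == 0)]


# toy data: 6 choosers over 5 alternatives; alternatives 2 and 4 are never chosen
choices = pd.Series([0, 1, 1, 3, 0, 3], name="override_choice")
print(unchosen_alternatives(choices, n_alts=5))  # [2, 4]
```

In this notebook it could be called as `unchosen_alternatives(df['override_choice'], n_alts=len(alts))`, assuming the alternatives table `alts` read earlier has one row per alternative id starting at 0.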

In [105]:
for k, m in model.items():
    print(f"Person type {k} has {len(m.utility_ca)} utility terms and {len(m.pf)} unique parameters.")

Person type PTYPE_PART has 90 utility terms and 90 unique parameters.


For future estimation work, parameters can be named deliberately to match the model developer's desired structure, by using the same parameter name on multiple rows of the spec file. This is why the "short cut" is disabled (`condense_parameters=False`) in the loading step above: only rows that share a parameter name are merged.
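As a minimal sketch of that renaming, the pandas snippet below maps two spec rows onto one shared coefficient name. The merge map and the shared name `coef_telecommute_2_3_days_week_and_tour_freq_2p` are hypothetical choices for illustration, and the three-row spec is a toy stand-in for the `PTYPE_PART` column shown earlier.

```python
import pandas as pd

# toy stand-in for the PTYPE_PART column of non_mandatory_tour_frequency_SPEC.csv
spec = pd.DataFrame({
    "PTYPE_PART": [
        "coef_escorting_tour",
        "coef_telecommute_2_3_days_week_and_tour_freq_2",
        "coef_telecommute_2_3_days_week_and_tour_freq_3p",
    ]
})

# hypothetical design choice: one effect for "2 or more" tours instead of two separate ones
shared = "coef_telecommute_2_3_days_week_and_tour_freq_2p"
merge_map = {
    "coef_telecommute_2_3_days_week_and_tour_freq_2": shared,
    "coef_telecommute_2_3_days_week_and_tour_freq_3p": shared,
}
spec["PTYPE_PART"] = spec["PTYPE_PART"].replace(merge_map)

# the coefficients file then needs one row per remaining unique name
unique_names = spec["PTYPE_PART"].dropna().unique().tolist()
print(unique_names)
```

Both spec rows keep their own expressions but now draw on a single estimated parameter.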

Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose non-linear optimizers in the `scipy` package, including SLSQP, which allows for bounds and constraints on parameters. BHHH is the default and typically runs faster, but does not respect bounds or constraints on parameters.
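The difference is easiest to see in a toy setting. The sketch below is a hypothetical NumPy implementation of BHHH for a binary logit on synthetic data, not Larch's code: each iteration replaces the Hessian with the sum of outer products of the per-observation gradients (so no second derivatives are needed) and takes a backtracking step. Nothing in the update keeps parameters within bounds.

```python
import numpy as np


def loglike(beta, X, y):
    """Binary logit log-likelihood, computed stably with logaddexp."""
    v = X @ beta
    # log p = -log(1 + e^-v) and log(1 - p) = -log(1 + e^v)
    return float(np.sum(-y * np.logaddexp(0.0, -v) - (1 - y) * np.logaddexp(0.0, v)))


def gradient_contributions(beta, X, y):
    """Per-observation score vectors g_i = (y_i - p_i) * x_i."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return (y - p)[:, None] * X


def bhhh_binary_logit(X, y, max_iter=200, tol=1e-8):
    """Maximize the binary logit log-likelihood with BHHH steps (no bounds)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        scores = gradient_contributions(beta, X, y)
        grad = scores.sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        # BHHH matrix: sum of outer products of the scores, used in place of -Hessian
        bhhh = scores.T @ scores
        step = np.linalg.solve(bhhh, grad)
        # backtracking: halve the step while the log-likelihood drops
        # (a tiny tolerance absorbs floating-point noise near the optimum)
        ll_old = loglike(beta, X, y)
        t = 1.0
        while loglike(beta + t * step, X, y) < ll_old - 1e-9 and t > 1e-8:
            t /= 2
        beta = beta + t * step
    return beta


# synthetic data, purely for illustration
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
beta_hat = bhhh_binary_logit(X, y)
print(beta_hat)
```

By contrast, `scipy.optimize.minimize(..., method='SLSQP')` accepts `bounds` and `constraints` arguments, so bounded or fixed parameters can be enforced during the search, usually at some cost in speed.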

In [106]:
for k, m in model.items():
    # m.estimate(method='SLSQP')
    m.estimate(method='BHHH')

req_data does not request avail_ca or avail_co but it is set and being provided


| parameter | value | initvalue | nullvalue | minimum | maximum | holdfast | note | best |
|---|---|---|---|---|---|---|---|---|
| coef_auto_deficient_tour_freq_1 | -0.391007 | 0.0 | 0.0 |  |  | 0 |  | -0.391007 |
| coef_auto_deficient_tour_freq_2 | -0.878192 | 0.0 | 0.0 |  |  | 0 |  | -0.878192 |
| coef_auto_deficient_tour_freq_3p | -0.121238 | 0.0 | 0.0 |  |  | 0 |  | -0.121238 |
| coef_discretionary_tour | -1.739999 | 0.0 | 0.0 |  |  | 0 |  | -1.739999 |
| coef_eating_out_tour | -3.159874 | 0.0 | 0.0 |  |  | 0 |  | -3.159874 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| coef_work_mc_logsum_and_tour_freq_2 | 0.036613 | 0.0 | 0.0 |  |  | 0 |  | 0.036613 |
| coef_work_mc_logsum_and_tour_freq_3p | 0.093017 | 0.0 | 0.0 |  |  | 0 |  | 0.093017 |
| coef_zero_auto_tour_freq_1 | -0.853418 | 0.0 | 0.0 |  |  | 0 |  | -0.853418 |
| coef_zero_auto_tour_freq_2 | -0.858171 | 0.0 | 0.0 |  |  | 0 |  | -0.858171 |




### Estimated coefficients

In [None]:
model['PTYPE_PART'].parameter_summary()  # only PTYPE_PART was loaded via segment_subset

# Output Estimation Results

In [107]:
datetime.datetime.now().strftime('%d_%m_%Y %H_%M_%S')

'25_07_2023 12_30_57'

In [108]:
from activitysim.estimation.larch import update_coefficients
# one timestamp per run so every segment's revised file shares the same suffix
stamp = datetime.datetime.now().strftime('%d_%m_%Y %H_%M_%S')
for k, m in model.items():
    result_dir = data.edb_directory/k/"estimated"
    update_coefficients(
        m, data.coefficients[k], result_dir,
        output_file=f"{modelname}_{k}_coefficients_revised_{stamp}.csv",
        relabel_coef=data.relabel_coef.get(k),
    )

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [109]:
for k, m in model.items():
    result_dir = data.edb_directory/k/"estimated"
    m.to_xlsx(
        result_dir/f"{modelname}_{k}_model_estimation_{datetime.datetime.now().strftime('%d_%m_%Y %H_%M_%S')}.xlsx", 
        data_statistics=True,
    )



# Next Steps

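The final step is to copy each segment's revised coefficients file (here `*_coefficients_revised_<timestamp>.csv`, written to the segment's `estimated` folder) to the configs folder, either manually or automatically, rename it to the matching `*_coefficients_<segment>.csv` name (for example `non_mandatory_tour_frequency_coefficients_PTYPE_PART.csv`, the naming used by `write_coeffs` above), and run ActivitySim in simulation mode.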
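The copy-and-rename step can be automated with the standard library. This is a minimal sketch, assuming the timestamped `*_coefficients_revised_*.csv` naming produced above and the per-segment target name used by `write_coeffs`; the function name `promote_revised_coefficients` is hypothetical, and the configs directory you pass in should be checked against the folder your simulation run actually reads.

```python
import shutil
from pathlib import Path


def promote_revised_coefficients(result_dir: Path, configs_dir: Path, segment: str) -> Path:
    """Copy the newest revised coefficients file for `segment` into configs_dir.

    Hypothetical helper: picks the most recently modified file matching
    *_coefficients_revised_*.csv in result_dir and writes it as
    non_mandatory_tour_frequency_coefficients_{segment}.csv.
    """
    candidates = sorted(result_dir.glob("*_coefficients_revised_*.csv"),
                        key=lambda f: f.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no revised coefficients found in {result_dir}")
    target = configs_dir / f"non_mandatory_tour_frequency_coefficients_{segment}.csv"
    shutil.copyfile(candidates[-1], target)
    return target
```

In this notebook the source folder would be `data.edb_directory / 'PTYPE_PART' / 'estimated'`; picking by modification time means a rerun of the estimation cells supersedes older revised files without deleting them.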

In [None]:
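result_dir = data.edb_directory/'PTYPE_PART'/"estimated"
# revised files carry a timestamp suffix; read the most recently written one
pd.read_csv(max(result_dir.glob(f"{modelname}_PTYPE_PART_coefficients_revised_*.csv"), key=lambda f: f.stat().st_mtime))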

In [110]:
p = pd.read_csv(r"C:\ABM3_dev\persons.csv")
p.head()

,Unnamed: 0.1,Unnamed: 0,hhid,perid,household_serial_no,pnum,age,sex,miltary,pemploy,...,grade,occen5,occsoc5,indcen,weeks,hours,rac1p,hisp,version,naics2_original_code
0,0,0,1,1,0,1,34,2,0,1,...,0,0,51-1011,0,1,40,1,2,0,33
1,1,1,1,2,0,2,16,2,0,3,...,5,0,00-0000,0,0,0,1,2,0,0
2,2,2,1,3,0,3,15,2,0,4,...,5,0,00-0000,0,0,0,1,2,0,0
3,3,3,1,4,0,4,14,2,0,4,...,5,0,00-0000,0,0,0,1,2,0,0
4,4,4,1,5,0,5,12,1,0,4,...,2,0,00-0000,0,0,0,1,2,0,0


In [111]:
p.columns

Index(['Unnamed: 0.1', 'Unnamed: 0', 'hhid', 'perid', 'household_serial_no',
       'pnum', 'age', 'sex', 'miltary', 'pemploy', 'pstudent', 'ptype', 'educ',
       'grade', 'occen5', 'occsoc5', 'indcen', 'weeks', 'hours', 'rac1p',
       'hisp', 'version', 'naics2_original_code'],
      dtype='object')

In [113]:
len(p)

3283880