# Estimating Work from Home Model

This notebook illustrates how to re-estimate a single ActivitySim model component, here the work-from-home model.
This process requires first running ActivitySim in estimation mode to read the household travel survey files and
write out the estimation data bundle (EDB) used in this notebook. To review how to do so, please see the other
notebooks in this directory.

# Load libraries

In [None]:
import os
import larch  # !conda install larch -c conda-forge # for estimation
import pandas as pd
import numpy as np
from larch import P, X
import matplotlib.pyplot as plt

In [None]:
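# project-specific working directory; change to your own ActivitySim outputs folder
os.chdir('C:/ABM3_dev/outputs')
path_to_EDB = os.path.join('output', 'estimation_data_bundle', 'work_from_home')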

In [None]:
# constructing pseudomsa map due to a data error (currently being fixed by Ali; won't be necessary in subsequent iterations)
old_lu_file = pd.read_csv(os.path.join(path_to_EDB, 'mgra15_based_input2019_v3.csv'))
zone_to_pseudomsa_map = old_lu_file.set_index('mgra').pseudomsa.to_dict()

In [None]:
# adding or updating additional required variables
chooser_data_path = os.path.join(path_to_EDB, 'work_from_home_values_combined.csv')
raw_path = os.path.join(path_to_EDB, 'work_from_home_values_combined_raw.csv')

# keep an untouched copy of the choosers table written in estimation mode;
# on reruns, start from that copy so the edits below are not applied twice
if os.path.exists(raw_path):
    chooser_data = pd.read_csv(raw_path)
else:
    chooser_data = pd.read_csv(chooser_data_path)
    chooser_data.to_csv(raw_path, index=False)

# estimation data doesn't have naics coded, only industry, so mapping required
industry_to_naics_xwalk = {
    'business_srv': 54,
    'other': 0,
    'education': 61,
    'healthcare': 62,
    'mgmt_srv': 55,
    'construction': 23,
    'retail': 0,
    'entertainment': 71,
    'manufacturing': 31,
    'food_srv': 722,
    'military': 9000,
    '0': 0,
    'accomodation': 721,
    'government': 92,
    'agriculture': 0,
}

# keep one record per person
chooser_data.drop_duplicates(subset=['HH_ID', 'PNUM'], keep='first', inplace=True)

# apply the industry-to-NAICS crosswalk
chooser_data['naics_code'] = chooser_data.industry.map(industry_to_naics_xwalk)

# spec has a PRE_COVID flag; for estimation, reset it based on survey year
chooser_data['util_2016'] = np.where(chooser_data.survey_year == 2016, 1, 0)

# fix bad pseudomsa values using the corrected map
chooser_data['pseudomsa'] = chooser_data.home_zone_id.map(zone_to_pseudomsa_map)
chooser_data['util_cbd'] = np.where(chooser_data.pseudomsa == 1, 1, 0)

# recompute industry indicator utilities, since naics_code was re-derived above
chooser_data['util_ind_accom'] = np.where(chooser_data.naics_code==721, 1, 0)
chooser_data['util_ind_bus_srv'] = np.where(chooser_data.naics_code==54, 1, 0)
chooser_data['util_ind_construct'] = np.where(chooser_data.naics_code==23, 1, 0)
chooser_data['util_ind_edu'] = np.where(chooser_data.naics_code==61, 1, 0)
chooser_data['util_ind_enter'] = np.where(chooser_data.naics_code==71, 1, 0)
chooser_data['util_ind_food_srv'] = np.where(chooser_data.naics_code==722, 1, 0)
chooser_data['util_ind_gov'] = np.where(chooser_data.naics_code==92, 1, 0)
chooser_data['util_ind_health'] = np.where(chooser_data.naics_code==62, 1, 0)
chooser_data['util_ind_manu'] = np.where(chooser_data.naics_code.isin([31,32,33]), 1, 0)
chooser_data['util_ind_mgmt_srv'] = np.where(chooser_data.naics_code==55, 1, 0)
chooser_data['util_ind_mil'] = np.where(chooser_data.naics_code==9000, 1, 0)


# sum non-wage/salary WFH employment (emp_non_ws_wfh) within each pseudomsa and attach the total to every chooser row
chooser_data['util_emp_non_ws_wfh_pseudomsa'] = chooser_data.groupby('pseudomsa').emp_non_ws_wfh.transform('sum')

chooser_data.to_csv(chooser_data_path, index=False)
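A check that may be worth running after this recoding: `Series.map` returns `NaN` for any `industry` label missing from `industry_to_naics_xwalk`, and because `NaN` never equals a code, those rows quietly get 0 in every `util_ind_*` indicator. The toy sketch below uses synthetic rows and a hypothetical `unknown_label` to show that behaviour, plus a dictionary-driven way to build the same kind of indicators.

```python
import numpy as np
import pandas as pd

# Synthetic choosers: labels mirror a few crosswalk keys, plus one hypothetical unknown label
choosers = pd.DataFrame({"industry": ["education", "food_srv", "manufacturing", "unknown_label"]})
xwalk = {"education": 61, "food_srv": 722, "manufacturing": 31}

# Series.map yields NaN for labels not present in the dict
choosers["naics_code"] = choosers.industry.map(xwalk)
unmapped = choosers.loc[choosers.naics_code.isna(), "industry"].unique()
print("unmapped industries:", list(unmapped))

# One indicator per entry; isin() is False for NaN, so unmapped rows get 0 everywhere
indicator_codes = {
    "util_ind_edu": [61],
    "util_ind_food_srv": [722],
    "util_ind_manu": [31, 32, 33],
}
for col, codes in indicator_codes.items():
    choosers[col] = np.where(choosers.naics_code.isin(codes), 1, 0)
print(choosers)
```

Running the same `isna()` check on `chooser_data` would list any survey industry labels that the crosswalk does not cover.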


In [None]:
chooser_data.groupby('pseudomsa').emp_non_ws_wfh.sum()
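The cell above shows one total per pseudomsa, while the data-prep cell uses `transform('sum')` to write that same total back onto every chooser row. This small sketch on synthetic values shows the difference between the two calls.

```python
import pandas as pd

# Synthetic rows: three choosers in pseudomsa 1, one in pseudomsa 2
df = pd.DataFrame({"pseudomsa": [1, 1, 1, 2], "emp_non_ws_wfh": [5, 5, 2, 7]})

# .sum(): one row per group
totals = df.groupby("pseudomsa").emp_non_ws_wfh.sum()

# .transform("sum"): group totals aligned back to the original rows
df["group_total"] = df.groupby("pseudomsa").emp_non_ws_wfh.transform("sum")
print(totals)
print(df)
```

Both calls sum over chooser rows, so if `emp_non_ws_wfh` is a zone-level value repeated on several choosers (as with the two 5s here), it is counted once per chooser; it may be worth confirming that this weighting is intended.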

In [None]:
pd.crosstab(chooser_data.util_2016, chooser_data.override_choice, margins=True)
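With `margins=True`, `pd.crosstab` appends an `All` row and column of totals, which makes it easy to compare choice shares for 2016 and non-2016 respondents. A sketch on made-up 0/1 values:

```python
import pandas as pd

# Synthetic survey-year flag and observed choice (1 = works from home, assumed coding)
util_2016 = pd.Series([1, 1, 0, 0, 0], name="util_2016")
choice = pd.Series([1, 0, 1, 1, 0], name="override_choice")

table = pd.crosstab(util_2016, choice, margins=True)
print(table)

# Row shares, ignoring the margin column
shares = table.drop(columns="All").div(table["All"], axis=0)
print(shares)
```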

# Load data and prep model for estimation

In [None]:
modelname = "work_from_home"

from activitysim.estimation.larch import component_model
model, data = component_model(modelname, return_data=True)

# Review data loaded from the EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

### Coefficients

In [None]:
data.coefficients

### Utility specification

In [None]:
data.spec

### Chooser data

In [None]:
pd.crosstab(data.chooser_data.util_2016, data.chooser_data.override_choice, margins=True)

In [None]:
data.chooser_data.util_2016.value_counts()

In [None]:
data.chooser_data.survey_year.value_counts()

# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients. Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution. Larch has built-in estimation methods, including BHHH, and also offers access to more advanced general-purpose nonlinear optimizers in the `scipy` package, including SLSQP, which allows bounds and constraints on parameters. BHHH is the default and typically runs faster, but it does not enforce bounds or constraints on parameters.
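To make the difference between unconstrained and bounded estimation concrete, the sketch below calls `scipy.optimize.minimize` directly with `method="SLSQP"` on a synthetic binary logit (a yes/no choice, like working from home). It is not how Larch wires SLSQP internally; the simulated data, the true coefficients, and the 0.5 cap are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated binary choices: constant + one attribute, true coefficients assumed
rng = np.random.default_rng(0)
n = 2000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x @ true_beta)))).astype(float)

def neg_loglike(beta):
    # -log P(1) = log(1 + e^-v), -log P(0) = log(1 + e^v); logaddexp avoids overflow
    v = x @ beta
    return np.sum(y * np.logaddexp(0.0, -v) + (1.0 - y) * np.logaddexp(0.0, v))

def neg_gradient(beta):
    prob = 1.0 / (1.0 + np.exp(-(x @ beta)))
    return -(x.T @ (y - prob))

# Unbounded fit vs. a fit that caps the attribute coefficient at 0.5
free = minimize(neg_loglike, np.zeros(2), jac=neg_gradient, method="SLSQP")
capped = minimize(neg_loglike, np.zeros(2), jac=neg_gradient, method="SLSQP",
                  bounds=[(None, None), (None, 0.5)])
print("free:", free.x, "capped:", capped.x)
```

The capped fit stops at the bound and has a worse (higher) negative log-likelihood; a bound like this is the kind of restriction BHHH cannot impose.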

In [None]:
model.load_data()

In [None]:
model.maximize_loglike(method="BHHH")

### Estimated coefficients

In [None]:
model.calculate_parameter_covariance()
model.parameter_summary()
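The summary reports standard errors and t-statistics. The usual recipe, sketched here for a synthetic binary logit rather than taken from Larch's code, is: covariance = inverse of the Hessian of the negative log-likelihood at the estimate, standard error = square root of the covariance diagonal, and t-statistic = estimate / standard error.

```python
import numpy as np

# Simulated data; true coefficients are assumptions for illustration
rng = np.random.default_rng(2)
n = 2000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x @ np.array([-0.5, 1.0]))))).astype(float)

# Newton-Raphson for a binary logit; Hessian of -loglike is X' W X with W = diag(p(1-p))
beta_hat = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(x @ beta_hat)))
    hessian = x.T @ (x * (p * (1.0 - p))[:, None])
    beta_hat = beta_hat + np.linalg.solve(hessian, x.T @ (y - p))

# Covariance, standard errors, and t-statistics at the estimate
p = 1.0 / (1.0 + np.exp(-(x @ beta_hat)))
hessian = x.T @ (x * (p * (1.0 - p))[:, None])
covariance = np.linalg.inv(hessian)
std_err = np.sqrt(np.diag(covariance))
t_stat = beta_hat / std_err
for name, b, se, t in zip(["const", "attr"], beta_hat, std_err, t_stat):
    print(f"{name:>5}: estimate={b:.3f}  std_err={se:.3f}  t={t:.1f}")
```

A coefficient with |t| well below about 2 is usually a candidate for constraining or dropping, though that judgment also depends on behavioral plausibility.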

# Output Estimation Results

In [None]:
from activitysim.estimation.larch import update_coefficients
result_dir = data.edb_directory/"estimated"
update_coefficients(
    model, data, result_dir,
    output_file=f"{modelname}_coefficients_revised.csv",
);

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [None]:
# result_dir='/projects/SANDAG/2017 On-Call Modeling Services/Area B/TO 05 - ABM3/estimation/'
model.to_xlsx(
    os.path.join(result_dir, "work_from_home_14.xlsx"), 
    data_statistics=True,
)

# Next Steps

The final step is to either manually or automatically copy the `*_coefficients_revised.csv` file to the configs folder, rename it to `*_coefficients.csv`, and run ActivitySim in simulation mode.

In [None]:
pd.read_csv(result_dir/f"{modelname}_coefficients_revised.csv")