# Exploring destination choice models

Sam Maurer, June 2017

Python 3.6

## Plan

- Set up a simple MNL destination choice model using the `urbansim.urbanchoice` interface

- Refactor the code, using this notebook for ad-hoc testing

- Set up more complex models as needed

- Add support for PyLogit MNL through an alternate constructor (class method)
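The last plan item refers to the alternate-constructor pattern, where a class method returns an instance configured differently from the default `__init__`. Below is a minimal sketch of that pattern. The class `DestinationChoiceSketch`, its `from_pylogit` class method and its `estimator` attribute are hypothetical names used only for illustration. They are not the ChoiceModels API.

```python
class DestinationChoiceSketch:
    """Hypothetical wrapper illustrating the alternate-constructor pattern.

    The class name, `from_pylogit` and the `estimator` attribute are
    illustrative assumptions, not the ChoiceModels API.
    """

    def __init__(self, data, chosen, numalts, model_expression,
                 estimator='urbanchoice'):
        self.data = data
        self.chosen = chosen
        self.numalts = numalts
        self.model_expression = model_expression
        # name of the estimation engine that a fit() method would dispatch to
        self.estimator = estimator

    @classmethod
    def from_pylogit(cls, data, chosen, numalts, model_expression):
        """Alternate constructor: same inputs, PyLogit selected as the engine."""
        return cls(data, chosen, numalts, model_expression, estimator='pylogit')
```

Because `from_pylogit` builds the object through `cls(...)` rather than naming the class, a subclass that calls it gets an instance of the subclass. The ordinary constructor keeps its default engine.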

In [1]:
import numpy as np
import pandas as pd

from patsy import dmatrix
from urbansim.urbanchoice import interaction, mnl

from choicemodels import MultinomialLogit

## Load estimation data from disk

In [2]:
# Suppress scientific notation in the Pandas display output

pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [3]:
tracts = pd.read_csv('../data/tracts.csv').set_index('full_tract_id')

print(tracts.shape[0])
print(tracts.head())

1583
                    city  home_density  work_density  school_density
full_tract_id                                                       
6001008309.000   TIJUANA         0.000         0.000           0.000
6001400100.000  BERKELEY        13.438        13.131          13.512
6001400200.000   OAKLAND        11.090         4.249           0.895
6001400300.000   OAKLAND        28.878         7.672           0.000
6001400400.000   OAKLAND        16.885         4.064           8.150


In [4]:
trips = pd.read_csv('../data/trips.csv').set_index('place_id')

print(trips.shape[0])
print(trips.head())

36765
                 full_tract_id  mode  trip_distance_miles
place_id                                                 
10319850102.000 6095252108.000 6.000               13.428
10319850202.000 6095251902.000 5.000                5.126
10335860102.000 6085511915.000 6.000              156.371
10335860103.000 6085512027.000 6.000                1.616
10335860104.000 6085512027.000 6.000                0.376


## MNL destination choice using urbansim.urbanchoice

In [5]:
# - each trip is a realized choice of a particular census tract
# - we can randomly sample alternative census tracts and build a model
#   of destination choice
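Assume the standard linear-in-parameters MNL specification. This is the textbook form, stated here rather than taken from the urbansim documentation. Under that assumption, trip $n$ has a sampled set of candidate tracts $C_n$, which includes the tract actually chosen, $i_n$. The probability that the trip ends in tract $i$ and the log-likelihood maximized during estimation are:

$$
P_n(i) = \frac{\exp(\beta^\top x_i)}{\sum_{j \in C_n} \exp(\beta^\top x_j)}, \qquad i \in C_n
$$

$$
\ell(\beta) = \sum_{n} \log P_n(i_n)
$$

Here $x_i$ holds the tract attributes (for example `home_density`, `work_density` and `school_density`), and $\beta$ holds the coefficients to be estimated. Any term that takes the same value for every tract in $C_n$ cancels from the numerator and denominator.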

In [6]:
# `interaction.mnl_interaction_dataset()` is not documented very well, but 
# this is how it seems to work

# Takes the following input:
# - choosers: pandas.DataFrame with a unique index
# - alternatives: pandas.DataFrame with a unique index
# - SAMPLE_SIZE: number of alternatives in each choice scenario, including
#   the one that was actually chosen
# - chosenalts: pandas.Series with the id of the alternative chosen by each
#   chooser (here, `choosers.full_tract_id`)

# Returns the following output:
# - full list of alternatives that were sampled
# - long-format DataFrame merging the two tables, with SAMPLE_SIZE rows per chooser
# - numchoosers x SAMPLE_SIZE 0/1 matrix marking the chosen alternatives
#   (the chosen alternative appears to always be placed in the first column)
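The following simplified sketch in plain NumPy and pandas makes the input/output behaviour described above concrete. It makes two assumptions that have not been confirmed from the urbansim source. First, alternatives are drawn at random with replacement. Second, each chooser's observed choice overwrites the first draw in its block. `sample_alternatives` is a hypothetical helper, not part of urbansim.

```python
import numpy as np
import pandas as pd


def sample_alternatives(choosers, alternatives, sample_size, chosenalts, seed=None):
    """Hypothetical, simplified sketch of a sampled long-format choice dataset.

    Returns (sample, long_table, chosen):
    - sample: positional indices into `alternatives`, sample_size per chooser
    - long_table: one row per (chooser, sampled alternative), chooser columns joined on
    - chosen: (numchoosers, sample_size) 0/1 matrix; column 0 is the observed choice
    """
    rng = np.random.RandomState(seed)
    n = len(choosers)
    # random draws with replacement, one block of sample_size per chooser
    sample = rng.choice(len(alternatives), size=n * sample_size)
    # overwrite the first slot of each block with the observed choice
    observed = alternatives.index.get_indexer(chosenalts)
    assert (observed >= 0).all(), "every chosen alternative must exist in `alternatives`"
    sample[::sample_size] = observed
    # explicit copy, so adding a column below does not trigger SettingWithCopyWarning
    long_table = alternatives.take(sample).copy()
    long_table['join_index'] = np.repeat(choosers.index.values, sample_size)
    # attach chooser attributes; a left join keeps the row order of long_table
    long_table = long_table.join(choosers, on='join_index', rsuffix='_r')
    chosen = np.zeros((n, sample_size))
    chosen[:, 0] = 1
    return sample, long_table, chosen


# Toy data (made up for illustration): 4 tracts, 2 trips
alts = pd.DataFrame({'density': [1.0, 2.0, 3.0, 4.0]},
                    index=pd.Index([10, 20, 30, 40], name='tract_id'))
ch = pd.DataFrame({'chosen_tract': [30, 10]},
                  index=pd.Index(['a', 'b'], name='trip_id'))
sample, long_table, chosen = sample_alternatives(ch, alts, 3, ch['chosen_tract'], seed=0)
```

In this sketch, the observed choice can also turn up among the random draws. A choice set may therefore contain duplicate tracts.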

In [7]:
# Start with a sample of 500 trips for easier computation

choosers = trips.loc[np.random.choice(trips.index, 500, replace=False)]
choosers = choosers.loc[choosers.trip_distance_miles.notnull()]

print(choosers.shape[0])
print(choosers.head())

481
                 full_tract_id  mode  trip_distance_miles
place_id                                                 
19290320102.000 6095250601.000 5.000              362.468
14703250204.000 6013339001.000 5.000                0.232
15349130404.000 6001440333.000 6.000               14.164
14953360103.000 6095252605.000 5.000                2.027
18199710308.000 6097153401.000 1.000                0.055


In [8]:
# Sample alternatives and set up long-format data table

numalts = 10

_, merged, chosen = interaction.mnl_interaction_dataset(
    choosers=choosers, alternatives=tracts, SAMPLE_SIZE=numalts, 
    chosenalts=choosers.full_tract_id)

print(merged.shape[0])
print(chosen.shape)

4810
(481, 10)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  alts_sample['join_index'] = np.repeat(choosers.index.values, SAMPLE_SIZE)


In [9]:
# Use patsy to generate the design matrix

model_expression = "home_density + work_density + school_density"

model_design = dmatrix(model_expression, data=merged, return_type='dataframe')

model_design.head()

                Intercept  home_density  work_density  school_density
full_tract_id
6095250601.000      1.000        10.833         0.904           7.123
6085511707.000      1.000        21.619         3.419           3.363
6075042601.000      1.000        11.582         4.309           4.809
6085500800.000      1.000        29.530        41.694           0.917
6013337100.000      1.000         8.080         0.000           0.000


In [10]:
# mnl_estimate expects a NumPy array, so pass the underlying values of the design matrix
log_likelihoods, fit_parameters = mnl.mnl_estimate(
    model_design.values, chosen, numalts=numalts)

print(log_likelihoods)
print(fit_parameters)

{'null': -1107.5434297301363, 'convergence': -1045.6140951619575, 'ratio': 0.055915942351144166}
   Coefficient  Std. Error  T-Score
0        0.000       0.092    0.000
1        0.012       0.005    2.504
2        0.022       0.002   13.496
3        0.014       0.005    2.782
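The `null` and `ratio` entries can be reproduced by hand. The sketch below uses a hypothetical NumPy helper, `mnl_loglik`, which is not urbansim's implementation. It assumes that each chooser's rows are consecutive in the long-format table and that the chosen alternative sits in column 0 of `chosen`, which is how `merged` and `chosen` are laid out above.

With all coefficients at zero, every sampled tract is equally likely, giving `481 * log(1/10)`. `ratio` then equals `1 - LL(convergence) / LL(null)`. The same function shows why the Intercept (row 0) is estimated at 0.000. A constant shifts every utility in a choice set by the same amount and cancels out of the probabilities, so the intercept is not identified and its estimate should not be interpreted.

```python
import numpy as np


def mnl_loglik(X, chosen, beta):
    """Hypothetical sketch of the MNL log-likelihood over sampled choice sets.

    X      : (numchoosers * numalts, k) long-format design matrix, with each
             chooser's alternatives in consecutive rows
    chosen : (numchoosers, numalts) 0/1 matrix with one 1 per row
    beta   : (k,) coefficient vector
    """
    numchoosers, numalts = chosen.shape
    # utility of every sampled alternative, one row per chooser
    v = (X @ beta).reshape(numchoosers, numalts)
    # numerically stable log-sum-exp over each choice set
    vmax = v.max(axis=1, keepdims=True)
    logsum = vmax + np.log(np.exp(v - vmax).sum(axis=1, keepdims=True))
    # sum of the log-probabilities of the chosen alternatives
    return float(((v - logsum) * chosen).sum())


numchoosers, numalts, k = 481, 10, 4
chosen_demo = np.zeros((numchoosers, numalts))
chosen_demo[:, 0] = 1  # chosen alternative in the first column

# All coefficients zero: every alternative equally likely
null_ll = mnl_loglik(np.zeros((numchoosers * numalts, k)), chosen_demo, np.zeros(k))
print(round(null_ll, 3))  # -1107.543

# 'ratio' is 1 - LL(convergence) / LL(null)
rho2 = 1 - (-1045.6140951619575) / null_ll
print(round(rho2, 4))  # 0.0559

# An intercept shifts every utility in a choice set equally, so it cancels out
rng = np.random.RandomState(0)
X = rng.normal(size=(numchoosers * numalts, 3))
beta = rng.normal(size=3)
X1 = np.column_stack([np.ones(len(X)), X])
ll_without = mnl_loglik(X, chosen_demo, beta)
ll_with = mnl_loglik(X1, chosen_demo, np.r_[5.0, beta])
print(np.isclose(ll_without, ll_with))  # True
```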



## Same thing using ChoiceModels

This is a work in progress!

In [11]:
# Using the merged dataset and model expression from above, 
# we can run the estimation in ChoiceModels now

model = MultinomialLogit(merged, chosen, numalts, model_expression)
results = model.fit()

print(type(results))
print(results)

<class 'choicemodels.mnl.MultinomialLogitResults'>
{'null': -1107.5434297301363, 'convergence': -1045.6140951619575, 'ratio': 0.055915942351144166}   Coefficient  Std. Error  T-Score
0        0.000       0.092    0.000
1        0.012       0.005    2.504
2        0.022       0.002   13.496
3        0.014       0.005    2.782

