# Exploring destination choice models

Sam Maurer, June 2017

Python 3.6

## Plan

- Set up a simple MNL destination choice model using the `urbansim.urbanchoice` interface

- Refactor the code, using this notebook for ad-hoc testing

- Set up more complex models as needed

- Add support for PyLogit MNL through an alternate constructor (class method)

In [1]:
import numpy as np
import pandas as pd
from patsy import dmatrix

In [2]:
%load_ext autoreload
%autoreload 1

In [3]:
%aimport choicemodels
from choicemodels.urbanchoice import interaction, mnl

## Load estimation data from disk

In [15]:
# Suppress scientific notation in the Pandas display output

pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [5]:
tracts = pd.read_csv('../data/tracts.csv').set_index('full_tract_id')

print(tracts.shape[0])
print(tracts.head())

1583
                   city  home_density  work_density  school_density
full_tract_id                                                      
6001008309      TIJUANA             0             0               0
6001400100     BERKELEY            13            13              14
6001400200      OAKLAND            11             4               1
6001400300      OAKLAND            29             8               0
6001400400      OAKLAND            17             4               8


In [6]:
trips = pd.read_csv('../data/trips.csv').set_index('place_id')

print(trips.shape[0])
print(trips.head())

36765
             full_tract_id  mode  trip_distance_miles
place_id                                             
10319850102     6095252108     6                   13
10319850202     6095251902     5                    5
10335860102     6085511915     6                  156
10335860103     6085512027     6                    2
10335860104     6085512027     6                    0


## Set up MNL destination choice model

In [None]:
# - each trip is a realized choice of a particular census tract
# - we can randomly sample alternative census tracts and build a model
#   of destination choice

In [None]:
# `interaction.mnl_interaction_dataset()` is sparsely documented; the notes
# below are inferred from how it behaves

# Inputs:
# - choosers: pandas.DataFrame with a unique index
# - alternatives: pandas.DataFrame with a unique index
# - SAMPLE_SIZE: number of alternatives in each choice scenario
# - chosenalts: the alternative id chosen by each chooser (list-like; not confirmed)

# Returns:
# - the full list of alternatives that were sampled
# - a long-format DataFrame merging the two tables
# - a numchoosers x SAMPLE_SIZE matrix marking the chosen alternatives
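
To make the inputs and outputs described above concrete, here is a minimal sketch of the same idea in plain pandas. It is not the library's implementation: `sketch_interaction_dataset`, the toy tables, and the column names `dest_tract` and `tract_id` are hypothetical, and placing the chosen alternative first in each scenario is an assumption made for simplicity.

```python
import numpy as np
import pandas as pd

def sketch_interaction_dataset(choosers, alternatives, sample_size, chosenalts, seed=0):
    """Hypothetical stand-in for illustration only, not the library code."""
    rng = np.random.RandomState(seed)
    alt_ids = alternatives.index.values
    sampled = np.empty((len(choosers), sample_size), dtype=alt_ids.dtype)
    for i, chosen_id in enumerate(np.asarray(chosenalts)):
        # Assumption: the chosen alternative goes first, followed by
        # sample_size - 1 other alternatives drawn without replacement
        pool = alt_ids[alt_ids != chosen_id]
        sampled[i, 0] = chosen_id
        sampled[i, 1:] = rng.choice(pool, sample_size - 1, replace=False)

    # Long format: one row per (chooser, sampled alternative) pair
    merged = alternatives.loc[sampled.ravel()].copy()
    for col in choosers.columns:
        merged[col] = np.repeat(choosers[col].values, sample_size)

    # Indicator matrix: 1 marks the chosen alternative in each scenario
    chosen = np.zeros((len(choosers), sample_size))
    chosen[:, 0] = 1
    return sampled.ravel(), merged, chosen

# Toy data (made up for this sketch)
tracts_toy = pd.DataFrame(
    {'home_density': [0, 13, 11, 29, 17]},
    index=pd.Index([101, 102, 103, 104, 105], name='tract_id'))
trips_toy = pd.DataFrame(
    {'dest_tract': [102, 104], 'trip_distance_miles': [13, 5]},
    index=pd.Index([1, 2], name='trip_id'))

alts, merged_toy, chosen_toy = sketch_interaction_dataset(
    trips_toy, tracts_toy, sample_size=3, chosenalts=trips_toy.dest_tract)

print(merged_toy)
print(chosen_toy)
```

With 2 choosers and a sample size of 3, the merged table has 6 rows and the indicator matrix has shape (2, 3), mirroring the `numchoosers x SAMPLE_SIZE` layout noted above.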

In [11]:
# Start with a sample of 500 trips for easier computation

choosers = trips.loc[np.random.choice(trips.index, 500, replace=False)]
choosers = choosers.loc[choosers.trip_distance_miles.notnull()]

print(choosers.shape[0])
print(choosers.head())

490
             full_tract_id  mode  trip_distance_miles
place_id                                             
30064970109     6001423902     5                    1
72066410112     6075012402     5                    1
18598770202     6001402400     5                   31
14952680102     6085502601     5                    0
25292110206     6085508404     1                    0


In [12]:
# Sample alternatives and set up long-format data table

_, merged, chosen = interaction.mnl_interaction_dataset(
    choosers=choosers, alternatives=tracts, SAMPLE_SIZE=10, 
    chosenalts=choosers.full_tract_id)

print(merged.shape[0])
print(chosen.shape)

4900
(490, 10)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  alts_sample['join_index'] = np.repeat(choosers.index.values, SAMPLE_SIZE)


In [13]:
# Use patsy to generate the design matrix

model_expression = "home_density + work_density + school_density + trip_distance_miles"

model_design = dmatrix(model_expression, data=merged, return_type='dataframe')

model_design.head()

               Intercept  home_density  work_density  school_density  trip_distance_miles
full_tract_id
6001423902             1             6             6               2                    1
6085503803             1            18             1               0                    1
6081608800             1            33             3               2                    1
6001422300             1            33             5               1                    1
6085504315             1            22             0               1                    1
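
In the rows shown above, `Intercept` and `trip_distance_miles` take the same value for every alternative of a given chooser. A term that is constant across the alternatives in a choice scenario cancels out of the MNL probabilities, so its coefficient cannot be identified. The sketch below illustrates this with made-up numbers. `mnl_probabilities` is a hypothetical helper, not part of `urbanchoice`, and it assumes the design rows are grouped into consecutive blocks of `numalts` rows per chooser.

```python
import numpy as np

def mnl_probabilities(design, coefs, numalts):
    """Softmax of utilities within each block of `numalts` consecutive rows."""
    utilities = design.dot(coefs).reshape(-1, numalts)
    # Subtract the row maximum before exponentiating for numerical stability
    utilities = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(utilities)
    return exp_u / exp_u.sum(axis=1, keepdims=True)

# Toy design: 2 choosers x 3 alternatives; the columns are
# [Intercept, home_density, trip_distance_miles]
design = np.array([
    [1.0,  6.0,  1.0],
    [1.0, 18.0,  1.0],
    [1.0, 33.0,  1.0],
    [1.0, 13.0, 31.0],
    [1.0, 11.0, 31.0],
    [1.0, 29.0, 31.0],
])

# Same home_density coefficient, very different coefficients on the two
# columns that are constant within each chooser's block
p_a = mnl_probabilities(design, np.array([0.0, 0.025, 0.0]), numalts=3)
p_b = mnl_probabilities(design, np.array([2.0, 0.025, -0.5]), numalts=3)

same = np.allclose(p_a, p_b)
print(same)
```

For a chooser attribute such as trip distance to influence the fitted model, it would need to enter in a form that varies across alternatives, for example through an interaction with an alternative attribute.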


In [16]:
log_likelihoods, fit_parameters = mnl.mnl_estimate(
    model_design.values, chosen, numalts=10)

print(log_likelihoods)
print(fit_parameters)

{'null': -1128.2666955670836, 'convergence': -1069.7666590200256, 'ratio': 0.05184947563984865}
   Coefficient  Std. Error  T-Score
0        0.000       0.094    0.000
1        0.025       0.004    5.552
2        0.014       0.001   10.541
3        0.020       0.005    3.772
4       -0.000       0.003   -0.000


