## MNL prediction demo

Sam Maurer, July 2017

Python 3.6

### Summary

This notebook demonstrates how to fit a model using the ChoiceModels interface and then use the UrbanSim MNL functions to generate predictions. 

Eventually, a prediction interface will be incorporated into the `MultinomialLogitResults` object, but it's not there yet. 

This demo uses the estimation data that's set up in the `Data-prep-01` notebook.

In [24]:
import numpy as np
import pandas as pd

from patsy import dmatrix

from choicemodels import mnl  # could also import from urbansim.urbanchoice
from choicemodels import MultinomialLogit
from choicemodels.tools import MergedChoiceTable

In [2]:
# Suppress all warnings, including the deprecation warnings
import warnings
warnings.simplefilter('ignore')

### Load data from disk

In [8]:
tracts = pd.read_csv('../data/tracts.csv').set_index('full_tract_id')
tracts = tracts.loc[(tracts.home_density > 0) | (tracts.work_density > 0) | (tracts.school_density > 0)]

print(tracts.shape[0])
print(tracts.head(3))

1566
                   city  home_density  work_density  school_density
full_tract_id                                                      
6.001400e+09   BERKELEY     13.437961     13.130867       13.511570
6.001400e+09    OAKLAND     11.089638      4.248928        0.894794
6.001400e+09    OAKLAND     28.878399      7.671554        0.000000


In [10]:
trips = pd.read_csv('../data/trips.csv').set_index('place_id')
trips = trips.loc[trips.trip_distance_miles.notnull()]

print(trips.shape[0])
print(trips.head(3))

35787
              full_tract_id  mode  trip_distance_miles
place_id                                              
1.031985e+10   6.095252e+09   6.0            13.428271
1.031985e+10   6.095252e+09   5.0             5.125960
1.033586e+10   6.085512e+09   6.0           156.370628


### Set up estimation data

Each observed trip is a realized choice of a particular destination census tract. We can randomly sample alternative census tracts to build a model of destination choice.

We'll divide the trips into a training set and a testing set, fit an MNL model using the training data, use it to generate predicted choices for the testing data, and compare the predicted to the actual choices.
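
To make the idea of sampled alternatives concrete, here is a minimal sketch of how a long-format choice table could be built by hand. The helper `build_choice_table` and its column names are hypothetical, and this is not necessarily how `MergedChoiceTable` works internally; it only illustrates the "one row per observation and sampled alternative" layout.

```python
import numpy as np
import pandas as pd

def build_choice_table(obs_chosen, alt_ids, sample_size, seed=0):
    """Hypothetical helper: one row per (observation, sampled alternative).

    obs_chosen maps observation id -> chosen alternative id.
    """
    rng = np.random.default_rng(seed)
    alt_ids = np.asarray(alt_ids)
    rows = []
    for obs_id, chosen in obs_chosen.items():
        # Sample the non-chosen alternatives without replacement
        pool = alt_ids[alt_ids != chosen]
        sampled = rng.choice(pool, size=sample_size - 1, replace=False)
        # Illustrative convention: list the chosen alternative first
        for alt in np.concatenate([[chosen], sampled]):
            rows.append((obs_id, alt, int(alt == chosen)))
    return pd.DataFrame(rows, columns=['obs_id', 'alt_id', 'chosen'])

# Toy example: 2 observations, 4 alternatives, 3 alternatives per observation
demo = build_choice_table(pd.Series({1: 'a', 2: 'c'}),
                          ['a', 'b', 'c', 'd'], sample_size=3)
print(demo)
```

With 1,000 training observations and a sample size of 100, a table laid out this way has 100,000 rows, which matches the shape printed by the cell below.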

In [20]:
training_observations = trips.iloc[:1000]
training = MergedChoiceTable(observations = training_observations,
                             alternatives = tracts,
                             chosen_alternatives = training_observations.full_tract_id,
                             sample_size = 100)

testing_observations = trips.iloc[1000:]
testing = MergedChoiceTable(observations = testing_observations,
                            alternatives = tracts,
                            chosen_alternatives = testing_observations.full_tract_id,
                            sample_size = 100)

print(training.to_frame().shape)
print(testing.to_frame().shape)

(100000, 9)
(3473400, 9)


### Fit a model using the training observations

In [17]:
%%time
model_expression = "home_density + work_density + school_density - 1"

model = MultinomialLogit(data = training.to_frame(), 
                         observation_id_col = training.observation_id_col, 
                         choice_col = training.choice_col,
                         model_expression = model_expression)

results = model.fit()
print(results)

                  CHOICEMODELS ESTIMATION RESULTS                  
Dep. Var.:                chosen   No. Observations:               
Model:         Multinomial Logit   Df Residuals:                   
Method:       Maximum Likelihood   Df Model:                       
Date:                              Pseudo R-squ.:                  
Time:                              Pseudo R-bar-squ.:              
AIC:                               Log-Likelihood:       -4,511.057
BIC:                               LL-Null:              -4,605.170
                    coef   std err         z     P>|z|   Conf. Int.
-------------------------------------------------------------------
home_density      0.0113     0.002     6.034                       
work_density      0.0115     0.001    14.265                       
school_density    0.0067     0.004     1.851                       
CPU times: user 322 ms, sys: 58.5 ms, total: 380 ms
Wall time: 194 ms


### Predict destination choices for the testing observations

We'll call the UrbanSim MNL functions directly, because prediction hasn't been integrated into the ChoiceModels results classes yet: https://github.com/UDST/choicemodels/blob/master/choicemodels/mnl.py#L536

In [18]:
# Pull the coefs out of the results object (the PyLogit syntax would differ)

coefs = results.get_raw_results()['fit_parameters']['Coefficient']
print(coefs)

0    0.011281
1    0.011497
2    0.006688
Name: Coefficient, dtype: float64


In [32]:
# The data columns for prediction need to align with the coefficients; 
# you can do this manually or with patsy, as here

data = testing.to_frame().set_index('full_tract_id')  # identify alternatives by their id

testing_df = dmatrix(model_expression, data=data, return_type='dataframe')
print(testing_df.shape)
print(testing_df.head(3))

(3473400, 3)
               home_density  work_density  school_density
full_tract_id                                            
6.097151e+09      10.659461      6.868701        7.160030
6.013302e+09       6.119812      3.841224        1.087662
6.055202e+09      37.176294      4.003031       15.517876
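
For comparison, the manual route can be sketched with pandas alone: select the data columns in the same order as the coefficients. The toy values and the labeled `coefs` series here are assumptions for illustration (the coefficient values are copied from the output above, matched to names by the order of the results table).

```python
import pandas as pd

# Coefficient names in the order of the fitted model expression
columns = ['home_density', 'work_density', 'school_density']
coefs = pd.Series([0.011281, 0.011497, 0.006688], index=columns)

# Toy data whose columns are deliberately out of order, plus an extra column
data = pd.DataFrame({'school_density': [7.16, 1.09],
                     'work_density': [6.87, 3.84],
                     'home_density': [10.66, 6.12],
                     'chosen': [0, 1]})

# Select and reorder the columns to match the coefficients
X = data[coefs.index]

# Linear utility of each row: sum of column value times its coefficient
utilities = X.dot(coefs)
print(utilities)
```

This only works for expressions made of plain column terms; anything involving transformations or interactions is easier to handle with patsy.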


In [30]:
# Simulate a destination choice for each testing observation

choices = mnl.mnl_simulate(testing_df, coefs, numalts=100, returnprobs=False)

print(choices.shape)
print(choices[:3])

(34734,)
[31  1 84]


In [None]:
# But that function seems to identify choices by position rather than by id,
# which is inconvenient... 
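
One possible workaround is sketched below. It assumes that each returned value is a position within that observation's block of `numalts` consecutive rows, which the output above suggests, and that the rows of the prediction table are grouped by observation. The helper `positions_to_ids` is hypothetical.

```python
import numpy as np

def positions_to_ids(alt_ids, positions, numalts):
    """Map within-observation positions to alternative ids.

    Assumes alt_ids is ordered by observation, with numalts rows each.
    """
    alt_ids = np.asarray(alt_ids).reshape(-1, numalts)
    positions = np.asarray(positions)
    # Pick one id per row (observation) at the chosen column (position)
    return alt_ids[np.arange(len(positions)), positions]

# Toy example: 2 observations with 3 alternatives each
alt_ids = np.array([10, 11, 12, 20, 21, 22])
predicted = positions_to_ids(alt_ids, [2, 0], numalts=3)

# Compare with (made-up) actual choices
actual = np.array([12, 21])
share_correct = np.mean(predicted == actual)
print(predicted, share_correct)
```

In this notebook the analogous call would be something like `positions_to_ids(testing_df.index.values, choices, numalts=100)`, provided those ordering assumptions hold.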

### How does UrbanSim generate household location choices?

These three class methods collectively set up the choosers and alternatives according to various parameters like the sample size, prediction filters, "probability mode," and "choice mode" (aggregate or individual):

- `urbansim.models.MNLDiscreteChoiceModel.probabilities()`
- `urbansim.models.MNLDiscreteChoiceModel.summed_probabilities()`
- `urbansim.models.MNLDiscreteChoiceModel.predict()`

https://github.com/UDST/urbansim/blob/master/urbansim/models/dcm.py#L474

Then this lower-level function generates a table of probabilities for each alternative, which is passed back to the `MNLDiscreteChoiceModel` class for further processing:

- `urbansim.urbanchoice.mnl.mnl_simulate()`

https://github.com/UDST/urbansim/blob/master/urbansim/urbanchoice/mnl.py#L121
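
As a rough illustration of what a probability-and-simulation step like this computes, here is a conceptual NumPy sketch of standard MNL probabilities followed by a random draw. The function names are hypothetical and this is not UrbanSim's implementation; it assumes the rows of `X` are grouped by observation with `numalts` rows each.

```python
import numpy as np

def mnl_probabilities(X, coefs, numalts):
    """Return an (observations x alternatives) table of MNL probabilities."""
    utilities = (np.asarray(X) @ np.asarray(coefs)).reshape(-1, numalts)
    # Subtract each row's maximum for numerical stability
    utilities = utilities - utilities.max(axis=1, keepdims=True)
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

def simulate_choices(probs, seed=0):
    """Draw one alternative position per observation from its probabilities."""
    rng = np.random.default_rng(seed)
    draws = rng.random((probs.shape[0], 1))
    # Position of the first alternative whose cumulative probability reaches the draw
    positions = (probs.cumsum(axis=1) < draws).sum(axis=1)
    # Guard against floating-point round-off in the cumulative sum
    return np.minimum(positions, probs.shape[1] - 1)

# Toy example: 2 observations x 3 alternatives, 2 explanatory variables
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
              [0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
coefs = np.array([0.5, 0.1])

probs = mnl_probabilities(X, coefs, numalts=3)
positions = simulate_choices(probs)
print(probs)
print(positions)
```

The first toy observation has identical alternatives, so its probabilities are equal; in the second, the probabilities increase with the first variable.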