## Introduction to choice-learn's modelling

In [None]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""

import sys
from pathlib import Path

sys.path.append("../")

import numpy as np
import pandas as pd

## Summary

- [Example 1: ConditionalMNL with ModeCanada](#getting-started-with-the-conditionalmnl)
    - [A few words on c-MNL formulation](#conditional-mnl-formulation)
    - [Instantiation and estimation with Choice-Learn](#instantiation--estimation-with-choice-learn)
- [Example 2: ConditionalMNL with SwissMetro](#example-2-swissmetro)

For model customization and more details on ChoiceModel and its endpoints, see [this notebook](./custom_model.ipynb).

### Getting Started with the ConditionalMNL

The choice-learn package offers a high-level API to design and estimate discrete choice models. Several models are ready to use; the list is available [here](../README.md). If you want to create your own model, or use one that is not in the list, the lower-level API can help: see the notebook [here](./custom_model.ipynb).

We begin this tutorial with the estimation of a Conditional Logit Model on the ModeCanada dataset[1]. We try to reproduce the example from [Torch-Choice](https://gsbdbi.github.io/torch-choice/conditional_logit_model_mode_canada/).
Another example from [PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb) is [here](#example-2-swissmetro).

First, we load our data as a ChoiceDataset. See the [data management tutorial](./choice_learn_introduction_data.ipynb) first if needed.

In [None]:
# If you want to check what's in the dataset:
from choice_learn.datasets import load_modecanada

transport_df = load_modecanada(as_frame=True)
transport_df.head()

In [None]:
# Initialization of the ChoiceDataset
from choice_learn.data import ChoiceDataset
dataset = load_modecanada(as_frame=False, preprocessing="tutorial")

print(dataset.summary())

The model itself is imported from choice_learn.models. Before instantiating it, let us look at its formulation.

### Conditional MNL formulation

The conditional MNL[2] specifies, for each item i in a choice s, a utility that is linear in the features:
$$
U(i, s) = \sum_{features} a(i, s) \cdot feat(i, s)
$$
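
As a minimal sketch of this formula (not the library's implementation), the snippet below computes the linear utility of each item for one choice and, under the standard logit assumption, turns the utilities into choice probabilities with a softmax. The array values and the names `features`, `coefficients` and `softmax` are illustrative assumptions.

```python
import numpy as np


def softmax(utilities):
    """Map utilities to probabilities, shifted by the max for numerical stability."""
    shifted = utilities - np.max(utilities)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum()


# Hypothetical data: 3 items described by 2 features for a single choice s
features = np.array([[1.0, 0.5],
                     [2.0, 0.1],
                     [0.5, 1.5]])       # feat(i, s), shape (n_items, n_features)
coefficients = np.array([[-0.2, 0.8],
                         [-0.2, 0.8],
                         [-0.2, 0.3]])  # a(i, s), same shape as features

# U(i, s) = sum over features of a(i, s) * feat(i, s)
utilities = (coefficients * features).sum(axis=1)
probabilities = softmax(utilities)
print(utilities, probabilities)
```

The item with the highest utility receives the highest probability, and the probabilities sum to 1 over the items.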

We will use a ModelSpecification object to define our model with respect to our ChoiceDataset.
For each feature in the ChoiceDataset, we can specify how it enters the utility.

Let's re-use a common example on the ModeCanada[1] dataset:
$$
U(i, s) = \beta^{inter}_i + \beta^{price} \cdot price(i, s) + \beta^{freq} \cdot freq(i, s) + \beta^{ovt} \cdot ovt(i, s) + \beta^{income}_i \cdot income(s) + \beta^{ivt}_i \cdot ivt(i, s) + \epsilon(i, s)
$$

Note that we want to estimate:

- one $\beta^{price}$, $\beta^{freq}$ and $\beta^{ovt}$ coefficient. They are shared by all items.
- one $\beta^{ivt}$ coefficient for **each** item.
- one $\beta^{inter}$ and one $\beta^{income}$ coefficient for **each** item, with the **additional constraint** that they are 0 for the first item (air).
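
To make this structure concrete, here is a small NumPy sketch of the deterministic part of the utility (the $\epsilon$ term is left out). All coefficient and feature values are hypothetical and chosen only for illustration; they are not estimates from the dataset.

```python
import numpy as np

# Hypothetical coefficient values, chosen only to illustrate the structure
beta_inter = np.array([0.0, 0.7, 1.8, 3.3])          # one per item, first item fixed to 0
beta_income = np.array([0.0, -0.09, -0.03, -0.04])   # one per item, first item fixed to 0
beta_ivt = np.array([0.06, -0.007, -0.006, -0.001])  # one per item, no constraint
beta_price, beta_freq, beta_ovt = -0.03, 0.09, -0.04  # shared by all items


def utility(price, freq, ovt, ivt, income):
    """Deterministic part of U(i, s) for the 4 items of one choice s."""
    return (beta_inter + beta_price * price + beta_freq * freq
            + beta_ovt * ovt + beta_income * income + beta_ivt * ivt)


# Hypothetical item features for a single choice
price = np.array([60.0, 30.0, 25.0, 50.0])
freq = np.array([4.0, 3.0, 0.0, 2.0])
ovt = np.array([60.0, 20.0, 0.0, 40.0])
ivt = np.array([50.0, 300.0, 250.0, 200.0])

u_low = utility(price, freq, ovt, ivt, income=30.0)
u_high = utility(price, freq, ovt, ivt, income=70.0)

# Free parameters: 3 intercepts + 3 income + 4 ivt + 3 shared coefficients
n_free = 3 + 3 + 4 + 3
print(u_low, u_high, n_free)
```

Because the income coefficient of the first item is fixed to 0, changing the income leaves the utility of that item unchanged while shifting the others.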

To build this utility function, we need to specify for each weight:
- a unique name
- the name of the feature it goes with:
    - it must match the feature name in the ChoiceDataset
    - "intercept" is the reserved name for the intercept; be careful not to override it
- items_indexes: the items concerned, as indexed in the ChoiceDataset

### Instantiation & estimation with Choice-Learn

In [None]:
from choice_learn.models import ConditionalMNL

# Initialization of the model
model = ConditionalMNL()

# Creation of the different weights:


# add_coefficients adds one coefficient for each specified item index
# intercept and income are added for each item except the first one, whose coefficients must stay at zero
model.add_coefficients(coefficient_name="beta_inter", feature_name="intercept", items_indexes=[1, 2, 3])
model.add_coefficients(coefficient_name="beta_income", feature_name="income", items_indexes=[1, 2, 3])

# ivt is added for each item:
model.add_coefficients(coefficient_name="beta_ivt", feature_name="ivt", items_indexes=[0, 1, 2, 3])

# add_shared_coefficient adds one coefficient that is used for all items specified in items_indexes:
# Here, cost, freq and ovt coefficients are shared between all items
model.add_shared_coefficient(coefficient_name="beta_cost", feature_name="cost", items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(coefficient_name="beta_freq", feature_name="freq", items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(coefficient_name="beta_ovt", feature_name="ovt", items_indexes=[0, 1, 2, 3])

The utility of our ConditionalMNL is now fully specified. We use LBFGS as the estimation method.

To estimate the coefficient values, use the .fit method with the ChoiceDataset:

In [None]:
history = model.fit(dataset, get_report=True)

The estimated coefficients are available through the .trainable_weights attribute:

In [None]:
model.trainable_weights

The average negative log-likelihood can be computed using .evaluate():

In [None]:
print("The average neg-loglikelihood is:", model.evaluate(dataset).numpy())
print("The total neg-loglikelihood is:", model.evaluate(dataset).numpy()*len(dataset))

The average neg-loglikelihood is: 0.6756634
The total neg-loglikelihood is: 1877.6686208844185
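
The relation between the two printed numbers can be sketched with plain NumPy: the average negative log-likelihood is the mean of minus the log-probability of the chosen item, and the total is that average multiplied by the number of choices. The probabilities and choices below are hypothetical, and the snippet does not claim to reproduce the internals of .evaluate().

```python
import numpy as np

# Hypothetical predicted probabilities for 4 choices among 3 items
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.6, 0.3],
                          [0.3, 0.3, 0.4],
                          [0.5, 0.4, 0.1]])
choices = np.array([0, 1, 2, 1])  # index of the chosen item in each row

# Probability assigned to the item that was actually chosen
chosen_probas = probabilities[np.arange(len(choices)), choices]

average_nll = -np.mean(np.log(chosen_probas))
total_nll = average_nll * len(choices)
print(average_nll, total_nll)
```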


The model automatically creates a report for each coefficient, with its estimate, its standard error and more:

In [None]:
model.report

| | Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|---|
| 0 | beta_inter:0_0 | -0.88542 | 1.283656 | -0.689764 | 0.490342 |
| 1 | beta_inter:0_1 | 0.292562 | 0.700675 | 0.417543 | 0.676281 |
| 2 | beta_inter:0_2 | 1.978386 | 0.61352 | 3.224647 | 0.001261 |
| 3 | beta_income:0_0 | -0.084481 | 0.018399 | -4.591502 | 4e-06 |
| 4 | beta_income:0_1 | -0.025297 | 0.003798 | -6.661415 | 0.0 |
| 5 | beta_income:0_2 | -0.035393 | 0.004078 | -8.679575 | 0.0 |
| 6 | beta_ivt:0_0 | 0.060174 | 0.010153 | 5.926788 | 0.0 |
| 7 | beta_ivt:0_1 | -0.007061 | 0.004483 | -1.575129 | 0.115227 |
| 8 | beta_ivt:0_2 | -0.004096 | 0.001894 | -2.162846 | 0.030553 |
| 9 | beta_ivt:0_3 | -0.001254 | 0.001196 | -1.049276 | 0.294051 |
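
The last two columns of the report appear consistent with a z statistic (estimate divided by standard error) and a two-sided p-value under a standard normal distribution. This is an assumption about how the report is computed, not a description of the library's code; the sketch below only checks it against the first row.

```python
import math


def z_and_p_value(estimate, std_err):
    """Return the z statistic and the two-sided p-value under a standard normal."""
    z = estimate / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# First row of the report above: beta_inter:0_0
z, p = z_and_p_value(-0.88542, 1.283656)
print(round(z, 4), round(p, 4))
```

A small p-value suggests that the coefficient differs significantly from zero; here the first intercept does not pass the usual 5% threshold.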


A faster specification can be done using a dictionary. It follows the torch-choice approach to creating conditional logit models.
The parameters dict needs to be as follows:
- The key is the feature name
- The value is the mode. Currently three modes are available:
    - constant: the learned coefficient is shared by all items
    - item: one coefficient by item is estimated, the value for the item at index 0 is set to 0
    - item-full: one coefficient by item is estimated

In order to create the same model for the ModeCanada dataset, it looks as follows:

In [None]:
# Instantiation of the parameters dictionary
params = {
    "income": "item",
    "cost": "constant",
    "freq": "constant",
    "ovt": "constant",
    "ivt": "item-full",
    "intercept": "item",
}

# Instantiation of the model
cmnl = ConditionalMNL(parameters=params)

In [None]:
history = cmnl.fit(dataset)
print(cmnl.trainable_weights)

We can compare the estimated coefficients and the negative log-likelihood with those obtained in the torch-choice example; they are similar.

In [None]:
import tensorflow as tf

# Here are the values obtained in the references:
gt_weights = [
    tf.constant([[-0.0890796, -0.0279925, -0.038146]]),
    tf.constant([[-0.0333421]]),
    tf.constant([[0.0925304]]),
    tf.constant([[-0.0430032]]),
    tf.constant([[0.0595089, -0.00678188, -0.00645982, -0.00145029]]),
    tf.constant([[0.697311, 1.8437, 3.27381]]),
]
gt_model = ConditionalMNL(parameters=params, lr=0.01, epochs=1, batch_size=-1)
gt_model.fit(dataset)

# Here we compute the negative log-likelihood with these coefficients (we obtain the same value as in the reference):
gt_model.trainable_weights = gt_weights
print("'Ground Truth' Negative LogLikelihood:", gt_model.evaluate(dataset) * len(dataset))

100%|██████████| 1/1 [00:01<00:00,  1.54s/it]

'Ground Truth' Negative LogLikelihood: tf.Tensor(1874.3427, shape=(), dtype=float32)





To estimate the utilities, use the .predict_utility() method. To estimate the probabilities, use the .predict_probas() method.


In [None]:
# print("Utilities of each item for the first 5 sessions:", cmnl.predict_utility(dataset)[:5])
print("Purchase probability of each item for the first 5 sessions:", cmnl.predict_probas(dataset)[:5])

Purchase probability of each item for the first 5 sessions: tf.Tensor(
[[0.17682633 0.00293305 0.47432622 0.34590963]
 [0.30933198 0.00062031 0.44251484 0.24752836]
 [0.13726656 0.00530116 0.47348452 0.38394296]
 [0.30933198 0.00062031 0.44251484 0.24752836]
 [0.30933198 0.00062031 0.44251484 0.24752836]], shape=(5, 4), dtype=float32)
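
As a quick sanity check, the values printed above can be copied into a NumPy array to verify that each row is a probability distribution over the four items and to read off the most likely item for each session. Only the numbers shown in the output are used.

```python
import numpy as np

# Values copied from the output above (first 5 sessions, 4 items)
probas = np.array([
    [0.17682633, 0.00293305, 0.47432622, 0.34590963],
    [0.30933198, 0.00062031, 0.44251484, 0.24752836],
    [0.13726656, 0.00530116, 0.47348452, 0.38394296],
    [0.30933198, 0.00062031, 0.44251484, 0.24752836],
    [0.30933198, 0.00062031, 0.44251484, 0.24752836],
])

row_sums = probas.sum(axis=1)        # each row should sum to (approximately) 1
most_likely = probas.argmax(axis=1)  # index of the most probable item per session
print(row_sums, most_likely)
```

For these five sessions, the item at index 2 has the highest predicted probability.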


For very large datasets that do not fit entirely in memory, the LBFGS method might not be the best choice. Here we can use the power of the TensorFlow library to use stochastic gradient descent optimizers.

In this case, it is possible to obtain the same coefficient estimates, although it is a little tricky to get there quickly. We need to adjust the learning rate over time so that the optimization is not too slow.
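
The staged adjustment used in the next cell amounts to a piecewise-constant learning rate. The helper below is a hypothetical sketch of that schedule, with stage lengths mirroring the cell that follows (2000 epochs, then 4000, then 20000); the starting value `base_lr` is an assumption, not the library's default.

```python
def staged_learning_rate(epoch, base_lr):
    """Piecewise-constant learning rate: base, then /5, then a further /10."""
    if epoch < 2000:
        return base_lr
    if epoch < 2000 + 4000:
        return base_lr / 5
    return base_lr / 5 / 10


base_lr = 0.01  # hypothetical starting value, not taken from the library
for epoch in (0, 2000, 6000):
    print(epoch, staged_learning_rate(epoch, base_lr))
```

Larger steps early on move quickly toward the optimum, and smaller steps later refine the estimates.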

In [None]:
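# First stage: Adam optimizer for 2000 epochs
cmnl = ConditionalMNL(parameters=params, optimizer="Adam", epochs=2000, batch_size=-1)
history = cmnl.fit(dataset)
# Second stage: divide the learning rate by 5 and train for 4000 more epochs
cmnl.optimizer.lr = cmnl.optimizer.lr / 5
cmnl.epochs = 4000
history2 = cmnl.fit(dataset)
# Third stage: divide the learning rate by 10 again and train for 20000 more epochs
cmnl.optimizer.lr = cmnl.optimizer.lr / 10
cmnl.epochs = 20000
history3 = cmnl.fit(dataset)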

In [None]:
cmnl.trainable_weights

[<tf.Variable 'income:0' shape=(1, 3) dtype=float32, numpy=array([[-0.0840305 , -0.02359775, -0.03233716]], dtype=float32)>,
 <tf.Variable 'cost:0' shape=(1, 1) dtype=float32, numpy=array([[-0.05140997]], dtype=float32)>,
 <tf.Variable 'freq:0' shape=(1, 1) dtype=float32, numpy=array([[0.09645204]], dtype=float32)>,
 <tf.Variable 'ovt:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04099233]], dtype=float32)>,
 <tf.Variable 'ivt:0' shape=(1, 4) dtype=float32, numpy=
 array([[ 0.05871223, -0.00726241, -0.00368546, -0.00105757]],
       dtype=float32)>,
 <tf.Variable 'intercept:0' shape=(1, 3) dtype=float32, numpy=array([[-1.6874295 , -0.39639074,  1.1344565 ]], dtype=float32)>]

In [None]:
cmnl.evaluate(dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=0.67664874>

For comparison, we re-instantiate the model from the same `params` dictionary defined earlier and fit it with the default settings:

In [None]:
# Instantiation of the model
cmnl = ConditionalMNL(parameters=params)

In [None]:
history = cmnl.fit(dataset)
print(cmnl.trainable_weights)

100%|██████████| 1000/1000 [01:19<00:00, 12.59it/s]

[<tf.Variable 'income:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08466098, -0.02563945, -0.0348225 ]], dtype=float32)>, <tf.Variable 'cost:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04650575]], dtype=float32)>, <tf.Variable 'freq:0' shape=(1, 1) dtype=float32, numpy=array([[0.09799877]], dtype=float32)>, <tf.Variable 'ovt:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04276635]], dtype=float32)>, <tf.Variable 'ivt:0' shape=(1, 4) dtype=float32, numpy=
array([[ 0.06055771, -0.00874017, -0.0049375 , -0.00117917]],
      dtype=float32)>, <tf.Variable 'intercept:0' shape=(1, 3) dtype=float32, numpy=array([[-0.8924577 ,  0.28773406,  1.9828479 ]], dtype=float32)>]





### Example 2: SwissMetro

We reproduce the [PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb) example of ConditionalMNL, which is itself a reproduction of a Biogeme example. It uses the SwissMetro dataset[3].

In [None]:
from choice_learn.datasets import load_swissmetro
swiss_dataset = load_swissmetro(as_frame=False, preprocessing="tutorial")
print(swiss_dataset.summary())

In [None]:
# Initialization of the model
swiss_model = ConditionalMNL(optimizer="lbfgs", epochs=10000)

swiss_model.add_coefficients(coefficient_name="beta_inter", feature_name="intercept", items_indexes=[0, 1])
swiss_model.add_shared_coefficient(coefficient_name="beta_tt_transit", feature_name="travel_time", items_indexes=[0, 1])
swiss_model.add_coefficients(coefficient_name="beta_tt_car", feature_name="travel_time", items_indexes=[2])
swiss_model.add_coefficients(coefficient_name="beta_tc", feature_name="cost", items_indexes=[0, 1, 2])
swiss_model.add_coefficients(coefficient_name="beta_hw", feature_name="headway", items_indexes=[0, 1])
swiss_model.add_coefficients(coefficient_name="beta_seat", feature_name="seats", items_indexes=[1])
swiss_model.add_shared_coefficient(coefficient_name="beta_survey", feature_name="train_survey", items_indexes=[0, 1])
swiss_model.add_coefficients(coefficient_name="beta_first_class", feature_name="regular_class", items_indexes=[0])
swiss_model.add_coefficients(coefficient_name="beta_luggage=1", feature_name="single_luggage_piece", items_indexes=[2])
swiss_model.add_coefficients(coefficient_name="beta_luggage>1", feature_name="multiple_luggage_piece", items_indexes=[2])

In [None]:
history = swiss_model.fit(swiss_dataset)

In [None]:
swiss_model.trainable_weights

[<tf.Variable 'beta_inter:0' shape=(1, 2) dtype=float32, numpy=array([[-1.2929306 , -0.50257486]], dtype=float32)>,
 <tf.Variable 'beta_tt_transit:0' shape=(1, 1) dtype=float32, numpy=array([[-0.69901353]], dtype=float32)>,
 <tf.Variable 'beta_tt_car:0' shape=(1, 1) dtype=float32, numpy=array([[-0.72298324]], dtype=float32)>,
 <tf.Variable 'beta_tc:0' shape=(1, 3) dtype=float32, numpy=array([[-0.5617619 , -0.28167555, -0.51384664]], dtype=float32)>,
 <tf.Variable 'beta_hw:0' shape=(1, 2) dtype=float32, numpy=array([[-0.31433576, -0.3773172 ]], dtype=float32)>,
 <tf.Variable 'beta_seat:0' shape=(1, 1) dtype=float32, numpy=array([[-0.7824475]], dtype=float32)>,
 <tf.Variable 'beta_survey:0' shape=(1, 1) dtype=float32, numpy=array([[2.5424762]], dtype=float32)>,
 <tf.Variable 'beta_first_class:0' shape=(1, 1) dtype=float32, numpy=array([[0.5650172]], dtype=float32)>,
 <tf.Variable 'beta_luggage=1:0' shape=(1, 1) dtype=float32, numpy=array([[0.4227602]], dtype=float32)>,
 <tf.Variable 'bet

In [None]:
len(swiss_dataset) * swiss_model.evaluate(swiss_dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=5156.3345>

We find the same results (estimation of parameters and negative log-likelihood) as the PyLogit package.

### References

[1] ModeCanada dataset in *Application and interpretation of nested logit models of intercity mode choice*, Forinash, C. V.; Koppelman, F. S. (1993)\
[2] Conditional MultiNomialLogit, Train, K.; McFadden, D.; Ben-Akiva, M. (1987)\
[3] Swissmetro dataset in *The acceptance of modal innovation: The case of Swissmetro*, Bierlaire, M.; Axhausen, K.; Abay, G. (2001)