## Introduction to choice-learn's modelling

In [None]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""

import sys

sys.path.append("../")

import numpy as np
import pandas as pd

## Summary

- [Example 1: ConditionalMNL with ModeCanada](#getting-started-with-the-conditional-logit)
    - [A few words on c-MNL formulation](#conditional-logit-formulation)
    - [Instantiation and estimation with Choice-Learn](#instantiation--estimation-with-choice-learn)
- [Example 2: ConditionalMNL with SwissMetro](#example-2-swissmetro)

For model customization and more explanations of ChoiceModel and its endpoints, see the [custom model notebook](./custom_model.ipynb).

### Getting Started with the Conditional Logit

The choice-learn package offers a high-level API to design and estimate discrete choice models. Several models are ready to use; the list is available [here](../README.md). If you want to create your own model, or one that is not in the list, the lower-level API can help you. Check the notebook [here](./custom_model.ipynb).

We begin this tutorial with the estimation of a Conditional Logit Model on the ModeCanada dataset[1]. We try to reproduce the example from [Torch-Choice](https://gsbdbi.github.io/torch-choice/conditional_logit_model_mode_canada/).
Another example from [PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb) is [here](#example-2-swissmetro).

First, we download our data as a ChoiceDataset. See the [data management tutorial](./choice_learn_introduction_data.ipynb) first if needed.

In [None]:
# If you want to check what's in the dataset:
from choice_learn.datasets import load_modecanada

transport_df = load_modecanada(as_frame=True)
transport_df.head()

In [None]:
# Initialization of the ChoiceDataset
from choice_learn.data import ChoiceDataset
dataset = load_modecanada(as_frame=False, preprocessing="tutorial")

print(dataset.summary())

The model itself is imported from choice_learn.models. Before instantiating it, here are a few words on its formulation.

### Conditional Logit formulation

The Conditional Logit [2] specifies, for each item i in the choice c, a utility that is linear in the features:
$$
U(i, c) = \sum_{features} a(i, c) * feature(i, c)
$$

We will define a ConditionalMNL model that matches our ChoiceDataset.
For each feature in the dataset, we can specify how it enters the utility.

Let's re-use a common example: the ModeCanada [1] dataset:
$$
U(i, c) = \beta^{inter}_i + \beta^{price} \cdot price(i, c) + \beta^{freq} \cdot freq(i, c) + \beta^{ovt} \cdot ovt(i, c) + \beta^{income}_i \cdot income(c) + \beta^{ivt}_i \cdot ivt(i, c) + \epsilon(i, c)
$$

Note that we want to estimate:

- one $\beta^{price}$, one $\beta^{freq}$ and one $\beta^{ovt}$ coefficient. They are **shared** by all items. (The price feature is named cost in the dataset.)
- one $\beta^{ivt}$ coefficient for **each** item.
- one $\beta^{inter}$ and $\beta^{income}$ coefficient for **each** item, with **additional constraint** to be 0 for the first item (air).

Note that it makes sense to include an intercept $\beta^{inter}$ for each item since $ivt(i, c)$ and $income(c)$ depend on each choice $c$.
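To make the three kinds of coefficients concrete, here is a minimal NumPy sketch of the utility above for a single choice. All numerical values are made up for illustration, and the softmax step assumes the standard logit link between utilities and probabilities; it is not meant to mirror how choice-learn computes things internally.

```python
import numpy as np

# Hypothetical coefficient values, for illustration only.
beta_price, beta_freq, beta_ovt = -0.03, 0.09, -0.04   # shared by all items
beta_ivt = np.array([0.06, -0.007, -0.006, -0.001])    # one per item
# One per item, with the first item (air) constrained to 0.
beta_inter = np.array([0.0, 0.7, 1.8, 3.3])
beta_income = np.array([0.0, -0.09, -0.03, -0.04])

# Hypothetical features for one choice c (4 items).
price = np.array([60.0, 30.0, 20.0, 50.0])
freq = np.array([4.0, 6.0, 0.0, 3.0])
ovt = np.array([40.0, 20.0, 0.0, 30.0])
ivt = np.array([60.0, 300.0, 250.0, 200.0])
income = 45.0  # same value for every item within a choice

utilities = (
    beta_inter
    + beta_price * price
    + beta_freq * freq
    + beta_ovt * ovt
    + beta_income * income
    + beta_ivt * ivt
)

# Softmax over items (shifted by the max for numerical stability).
exp_u = np.exp(utilities - utilities.max())
probas = exp_u / exp_u.sum()
print(probas)
```

Because only differences in utility matter in the softmax, fixing the intercept and income coefficients of one item to 0 loses no generality.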

To build a model with the right utility function, we need to specify for each weight:
- a unique name
- the name of the feature it goes with:
    - it must match the feature name in the ChoiceDataset
    - "intercept" is the standardized name used for intercept, pay attention not to override it
- items_indexes: the items concerned, as indexed in the ChoiceDataset

### Instantiation & estimation with Choice-Learn

In [None]:
from choice_learn.models import ConditionalMNL

# Initialization of the model
model = ConditionalMNL()

# Creation of the different weights:

# add_shared_coefficient adds one coefficient that is used for all the items listed in items_indexes:
# Here, cost, freq and ovt coefficients are shared between all items
model.add_shared_coefficient(feature_name="cost", items_indexes=[0, 1, 2, 3])
# You can specify your own coefficient name
model.add_shared_coefficient(feature_name="freq",
                             coefficient_name="beta_frequence",
                             items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(feature_name="ovt", items_indexes=[0, 1, 2, 3])

# ivt is added for each item:
model.add_coefficients(feature_name="ivt", items_indexes=[0, 1, 2, 3])

# add_coefficients adds one coefficient for each specified item_index
# intercept, and income are added for each item except the first one that needs to be zeroed
model.add_coefficients(feature_name="intercept", items_indexes=[1, 2, 3])
model.add_coefficients(feature_name="income", items_indexes=[1, 2, 3])

The model is now fully specified. We use L-BFGS as the estimation method.

To estimate the coefficient values, call the .fit method with the ChoiceDataset:

In [None]:
history = model.fit(dataset, get_report=True, verbose=2)

The estimated coefficients can be inspected with the .trainable_weights attribute:

In [None]:
model.trainable_weights

The negative log-likelihood can be computed using .evaluate():

In [None]:
# Evaluate once and reuse the result for both values
avg_nll = model.evaluate(dataset).numpy()
print("The average neg-loglikelihood is:", avg_nll)
print("The total neg-loglikelihood is:", avg_nll * len(dataset))

The average neg-loglikelihood is: 0.6744666
The total neg-loglikelihood is: 1874.3427090644836
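The relation between the two printed values can be sketched in plain Python. This assumes, as the cell above suggests by multiplying with len(dataset), that .evaluate() returns the mean negative log-likelihood per choice; the probabilities and choices below are hypothetical.

```python
import math

# Hypothetical predicted probabilities for 3 choices among 4 items.
probas = [
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
]
choices = [3, 0, 0]  # index of the chosen item for each choice

# Negative log-likelihood of each observed choice.
nlls = [-math.log(p[c]) for p, c in zip(probas, choices)]
average_nll = sum(nlls) / len(nlls)
total_nll = average_nll * len(nlls)  # same as sum(nlls)

# Dividing the total by the average printed above recovers the dataset size.
n_choices = round(1874.3427090644836 / 0.6744666)
print(average_nll, total_nll, n_choices)
```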


The model automatically creates a report for each coefficient, with its estimate, its standard error and more:

In [None]:
model.report

| | Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|---|
| 0 | beta_cost:0_0 | -0.033339 | 0.007095 | -4.698975 | 2.622604e-06 |
| 1 | beta_frequence:0_0 | 0.092529 | 0.005098 | 18.151848 | 0.0 |
| 2 | beta_ovt:0_0 | -0.043004 | 0.003225 | -13.335655 | 0.0 |
| 3 | beta_ivt:0_0 | 0.059509 | 0.010073 | 5.907977 | 0.0 |
| 4 | beta_ivt:0_1 | -0.006784 | 0.004433 | -1.530147 | 0.1259804 |
| 5 | beta_ivt:0_2 | -0.00646 | 0.001898 | -3.403007 | 0.0006664991 |
| 6 | beta_ivt:0_3 | -0.00145 | 0.001187 | -1.221385 | 0.2219402 |
| 7 | beta_intercept:0_0 | 0.698379 | 1.280196 | 0.545525 | 0.5853922 |
| 8 | beta_intercept:0_1 | 1.844094 | 0.708432 | 2.603063 | 0.009239554 |
| 9 | beta_intercept:0_2 | 3.27418 | 0.624344 | 5.244195 | 1.192093e-07 |
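The last two columns of the report appear consistent with a two-sided z-test under a standard normal distribution. The sketch below recomputes them for the first row under that assumption; the helper name z_and_p_value is hypothetical and not part of choice-learn.

```python
import math


def z_and_p_value(estimate, std_err):
    """Return the z statistic and the two-sided p-value under a standard normal."""
    z = estimate / std_err
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# First row of the report: beta_cost
z, p = z_and_p_value(-0.033339, 0.007095)
print(z, p)
```

Small differences with the report are expected since the inputs here are the rounded values displayed in the table.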


A faster specification can be done using a dictionary. It follows the torch-choice way of creating conditional logit models.
The parameters dict needs to be as follows:
- The key is the feature name
- The value is the mode. Currently three modes are available:
    - constant: the learned coefficient is shared by all items
    - item: one coefficient per item is estimated, the value for the item at index 0 is set to 0
    - item-full: one coefficient per item is estimated

In order to create the same model for the ModeCanada dataset, it looks as follows:

In [None]:
# Instantiation with the coefficients dictionary
coefficients = {"income": "item",
 "cost": "constant",
 "freq": "constant",
 "ovt": "constant",
 "ivt": "item-full",
 "intercept": "item"}

# Instantiation of the model
cmnl = ConditionalMNL(coefficients=coefficients, epochs=1000)

Using L-BFGS optimizer, setting up .fit() function
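To see what each mode implies, the sketch below counts how many values are estimated per feature for the four items of ModeCanada. The helper count_coefficients is hypothetical and written only for this illustration; it simply encodes the description of the three modes given above.

```python
def count_coefficients(spec, n_items):
    """Number of estimated values per feature, following the three modes."""
    sizes = {
        "constant": 1,         # one value shared by all items
        "item": n_items - 1,   # one per item, item 0 fixed to 0
        "item-full": n_items,  # one per item
    }
    return {feature: sizes[mode] for feature, mode in spec.items()}


spec = {
    "income": "item",
    "cost": "constant",
    "freq": "constant",
    "ovt": "constant",
    "ivt": "item-full",
    "intercept": "item",
}
counts = count_coefficients(spec, n_items=4)
print(counts, sum(counts.values()))
```

These counts match the weight shapes printed later in this notebook: (1, 3) for income and intercept, (1, 1) for cost, freq and ovt, and (1, 4) for ivt.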


In [None]:
history = cmnl.fit(dataset)
print(cmnl.trainable_weights)
print(cmnl.evaluate(dataset).numpy())

We can compare the estimated coefficients and the negative log-likelihood with those obtained in the torch-choice example: they are similar.

In [None]:
import tensorflow as tf

# Here are the values obtained in the references:
gt_weights = [
    tf.constant([[-0.0890796, -0.0279925, -0.038146]]),
    tf.constant([[-0.0333421]]),
    tf.constant([[0.0925304]]),
    tf.constant([[-0.0430032]]),
    tf.constant([[0.0595089, -0.00678188, -0.00645982, -0.00145029]]),
    tf.constant([[0.697311, 1.8437, 3.27381]]),
]
gt_model = ConditionalMNL(coefficients=coefficients)
gt_model.instantiate(dataset)

# Here we compute the negative log-likelihood with these coefficients (we also obtain the same value as in the references):
gt_model.trainable_weights = gt_weights
print("'Ground Truth' Negative LogLikelihood:", gt_model.evaluate(dataset) * len(dataset))

Using L-BFGS optimizer, setting up .fit() function
'Ground Truth' Negative LogLikelihood: tf.Tensor(1874.3427, shape=(), dtype=float32)
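One way to quantify how close the estimates are is to look at the largest absolute difference with the reference values. The sketch below uses the rounded estimates displayed in the report above and the matching entries of gt_weights; the pairing of rows with reference entries (cost, freq, ovt, ivt, intercept) is an assumption based on the coefficient names.

```python
# Estimates copied from the report above (cost, freq, ovt, ivt x4, intercept x3).
estimated = [
    -0.033339, 0.092529, -0.043004,
    0.059509, -0.006784, -0.00646, -0.00145,
    0.698379, 1.844094, 3.27418,
]
# Matching reference values taken from gt_weights.
reference = [
    -0.0333421, 0.0925304, -0.0430032,
    0.0595089, -0.00678188, -0.00645982, -0.00145029,
    0.697311, 1.8437, 3.27381,
]

abs_diffs = [abs(e - r) for e, r in zip(estimated, reference)]
max_diff = max(abs_diffs)
print("Largest absolute difference:", max_diff)
```

The largest gap is on the first intercept, which is also the coefficient with the largest standard error in the report.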


To compute the utilities, use the .predict_utility() method. To compute the choice probabilities, use the .predict_probas() method, as in the cell below.


In [None]:
# print("Utilities of each item for the first 5 sessions:", cmnl.predict_utility(dataset)[:5])
print("Purchase probability of each item for the first 5 sessions:", cmnl.predict_probas(dataset)[:5])

Purchase probability of each item for the first 5 sessions: tf.Tensor(
[[0.19061336 0.00353294 0.40536717 0.4004825 ]
 [0.34869507 0.00069691 0.36830768 0.28229663]
 [0.14418297 0.00651323 0.40567806 0.44362125]
 [0.34869507 0.00069691 0.36830768 0.28229663]
 [0.34869507 0.00069691 0.36830768 0.28229663]], shape=(5, 4), dtype=float32)


For very large datasets that do not fit entirely in memory, the L-BFGS method might not be the best choice. Here we can rely on the TensorFlow library and its stochastic gradient descent optimizers instead.

In this case, it is possible to obtain the same coefficient estimates, although it is a little tricky to get them quickly: the learning rate needs to be adjusted over time so that the optimization is not too slow.

In [None]:
cmnl = ConditionalMNL(coefficients=coefficients, optimizer="Adam", epochs=2000, batch_size=-1)
history = cmnl.fit(dataset)
cmnl.optimizer.lr = cmnl.optimizer.lr / 5
cmnl.epochs = 4000
history2 = cmnl.fit(dataset)
cmnl.optimizer.lr = cmnl.optimizer.lr  / 10
cmnl.epochs = 20000
history3 = cmnl.fit(dataset)
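The idea of training in stages with a decreasing learning rate can be illustrated outside of choice-learn. The sketch below fits a tiny conditional logit on synthetic data with NumPy; plain gradient descent stands in for Adam, and the data, true coefficients and learning rates are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_choices, n_items, n_features = 500, 3, 2
X = rng.normal(size=(n_choices, n_items, n_features))  # synthetic features
true_beta = np.array([1.0, -0.5])                      # synthetic coefficients


def softmax(u):
    u = u - u.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)


# Sample one chosen item per choice from the true model.
y = np.array([rng.choice(n_items, p=p) for p in softmax(X @ true_beta)])
onehot = np.eye(n_items)[y]


def nll_and_grad(beta):
    """Average negative log-likelihood and its gradient w.r.t. beta."""
    p = softmax(X @ beta)
    nll = -np.mean(np.log(p[np.arange(n_choices), y]))
    grad = np.einsum("ci,cif->f", p - onehot, X) / n_choices
    return nll, grad


beta = np.zeros(n_features)
losses = []
# Three stages: the learning rate is divided by 5, then by 10.
for lr, n_steps in [(0.5, 300), (0.1, 300), (0.01, 300)]:
    for _ in range(n_steps):
        nll, grad = nll_and_grad(beta)
        losses.append(nll)
        beta -= lr * grad

print(beta, losses[0], losses[-1])
```

Large steps make quick progress early on, while the smaller ones refine the estimate near the optimum.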

It can be useful to look at the loss (negative log-likelihood) over time to see how the estimation goes:

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history["train_loss"])
plt.title("First part of the gradient descent.")
plt.subplot(1, 2, 2)
plt.plot(history2["train_loss"] + history3["train_loss"])
plt.title("Second and third part of the gradient descent.")

In [None]:
cmnl.trainable_weights

[<tf.Variable 'income_w_0:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08402774, -0.02360038, -0.03233443]], dtype=float32)>,
 <tf.Variable 'cost_w_1:0' shape=(1, 1) dtype=float32, numpy=array([[-0.05140809]], dtype=float32)>,
 <tf.Variable 'freq_w_2:0' shape=(1, 1) dtype=float32, numpy=array([[0.09645417]], dtype=float32)>,
 <tf.Variable 'ovt_w_3:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04098967]], dtype=float32)>,
 <tf.Variable 'ivt_w_4:0' shape=(1, 4) dtype=float32, numpy=
 array([[ 0.05871417, -0.00725972, -0.00368804, -0.00105489]],
       dtype=float32)>,
 <tf.Variable 'intercept_w_5:0' shape=(1, 3) dtype=float32, numpy=array([[-1.6874318 , -0.39636904,  1.1344588 ]], dtype=float32)>]

In [None]:
cmnl.evaluate(dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=0.67664874>

For comparison, the same dictionary specification can be re-estimated with the L-BFGS optimizer, set explicitly this time:

In [None]:
# Instantiation of the parameters dictionary
coefficients = {"income": "item",
 "cost": "constant",
 "freq": "constant",
 "ovt": "constant",
 "ivt": "item-full",
 "intercept": "item"}

# Instantiation of the model
cmnl = ConditionalMNL(coefficients=coefficients, optimizer="lbfgs", epochs=1000)

In [None]:
history = cmnl.fit(dataset)
for weight in cmnl.trainable_weights:
    print(weight)

<tf.Variable 'income_w_0:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08908693, -0.02799303, -0.0381465 ]], dtype=float32)>
<tf.Variable 'cost_w_1:0' shape=(1, 1) dtype=float32, numpy=array([[-0.03333883]], dtype=float32)>
<tf.Variable 'freq_w_2:0' shape=(1, 1) dtype=float32, numpy=array([[0.09252924]], dtype=float32)>
<tf.Variable 'ovt_w_3:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04300352]], dtype=float32)>
<tf.Variable 'ivt_w_4:0' shape=(1, 4) dtype=float32, numpy=
array([[ 0.05950952, -0.00678373, -0.00646029, -0.00145036]],
      dtype=float32)>
<tf.Variable 'intercept_w_5:0' shape=(1, 3) dtype=float32, numpy=array([[0.698383 , 1.8441006, 3.2741847]], dtype=float32)>


### Example 2: SwissMetro

We reproduce the [PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb) example of a ConditionalMNL, which is itself a reproduction of a Biogeme example. It uses the SwissMetro dataset [3].

In [None]:
from choice_learn.datasets import load_swissmetro
swiss_dataset = load_swissmetro(as_frame=False, preprocessing="tutorial")
print(swiss_dataset.summary())

In [None]:
# Initialization of the model
swiss_model = ConditionalMNL(optimizer="lbfgs", epochs=10000)

swiss_model.add_coefficients(feature_name="intercept", items_indexes=[0, 1])
swiss_model.add_shared_coefficient(feature_name="travel_time",
                                   items_indexes=[0, 1],
                                   coefficient_name="beta_tt_transit")
swiss_model.add_coefficients(feature_name="travel_time",
                             items_indexes=[2],
                             coefficient_name="beta_tt_car")
swiss_model.add_coefficients(feature_name="cost",
                             items_indexes=[0, 1, 2],
                             coefficient_name="beta_tc")
swiss_model.add_coefficients(feature_name="headway",
                             items_indexes=[0, 1],
                             coefficient_name="beta_he")
swiss_model.add_coefficients(feature_name="seats", items_indexes=[1])
swiss_model.add_shared_coefficient(feature_name="train_survey",
                                   items_indexes=[0, 1],
                                   coefficient_name="beta_survey")
swiss_model.add_coefficients(feature_name="regular_class",
                             items_indexes=[0],
                             coefficient_name="beta_first_class")
swiss_model.add_coefficients(feature_name="single_luggage_piece",
                             items_indexes=[2],
                             coefficient_name="beta_luggage=1")
swiss_model.add_coefficients(feature_name="multiple_luggage_piece",
                             items_indexes=[2],
                             coefficient_name="beta_luggage>1")

In [None]:
history = swiss_model.fit(swiss_dataset)

In [None]:
swiss_model.trainable_weights

[<tf.Variable 'beta_intercept:0' shape=(1, 2) dtype=float32, numpy=array([[-1.2929306 , -0.50257486]], dtype=float32)>,
 <tf.Variable 'beta_tt_transit:0' shape=(1, 1) dtype=float32, numpy=array([[-0.69901353]], dtype=float32)>,
 <tf.Variable 'beta_tt_car:0' shape=(1, 1) dtype=float32, numpy=array([[-0.72298324]], dtype=float32)>,
 <tf.Variable 'beta_tc:0' shape=(1, 3) dtype=float32, numpy=array([[-0.5617619 , -0.28167555, -0.51384664]], dtype=float32)>,
 <tf.Variable 'beta_he:0' shape=(1, 2) dtype=float32, numpy=array([[-0.31433576, -0.3773172 ]], dtype=float32)>,
 <tf.Variable 'beta_seats:0' shape=(1, 1) dtype=float32, numpy=array([[-0.7824475]], dtype=float32)>,
 <tf.Variable 'beta_survey:0' shape=(1, 1) dtype=float32, numpy=array([[2.5424762]], dtype=float32)>,
 <tf.Variable 'beta_first_class:0' shape=(1, 1) dtype=float32, numpy=array([[0.5650172]], dtype=float32)>,
 <tf.Variable 'beta_luggage=1:0' shape=(1, 1) dtype=float32, numpy=array([[0.4227602]], dtype=float32)>,
 <tf.Variable

In [None]:
len(swiss_dataset) * swiss_model.evaluate(swiss_dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=5156.3345>

We find the same results (estimation of parameters and negative log-likelihood) as the PyLogit package.

### References

[1] ModeCanada dataset in *Application and interpretation of nested logit models of intercity mode choice*, Forinash, C. V.; Koppelman, F. S. (1993)\
[2] Conditional MultiNomialLogit, Train, K.; McFadden, D.; Ben-Akiva, M. (1987)\
[3] Swissmetro dataset in *The acceptance of modal innovation: The case of Swissmetro*, Bierlaire, M.; Axhausen, K.; Abay, G. (2001)