# The Nested Logit Model


The Nested Logit model groups alternatives that are close substitutes into subsets called 'nests'. The general idea is that a customer may first choose a transportation mode, public transport or a private car, and then, having decided on public transport, choose between the train and the bus.\
The classical Conditional Logit does not account for such a decision process, hence the introduction of the Nested Logit. More detailed information is available [here](https://cran.r-project.org/web/packages/mlogit/vignettes/c4.relaxiid.html#:~:text=The%20nested%20logit%20model&text=It%20is%20a%20generalization%20of,different%20nests%20are%20still%20uncorrelated.).


In this notebook we reproduce results from other packages, showing how to specify a Nested Logit model with Choice-Learn and checking that we obtain the same results.

## Summary

In [None]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""

import sys

sys.path.append("../../")

import numpy as np
import pandas as pd

### Import the Nested Logit from Choice-Learn !

In [None]:
from choice_learn.models import NestedLogit

## 1- Nested Logit on the SwissMetro dataset

We reproduce the results from [Biogeme](https://biogeme.epfl.ch/sphinx/auto_examples/swissmetro/plot_b09nested.html), which are also reproduced in [PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Nested%20Logit%20Example--Python%20Biogeme%20benchmark--09NestedLogit.ipynb).\
This example uses the SwissMetro dataset further described in the [data introduction](../introduction/2_data_handling.ipynb).



In [None]:
from choice_learn.datasets import load_swissmetro
swiss_dataset = load_swissmetro(preprocessing="biogeme_nested")
print(swiss_dataset.summary())

%%% Summary of the dataset:
Number of items: 3
Number of choices: 6768
 No Shared Features by Choice registered


 Items Features by Choice:
 2 items features 
 with names: (['cost', 'travel_time'],)



The model specified in Biogeme defines two nests:
- The existing modes nest with the train and the car *(item indexes 0 and 2)*
- The future modes nest with the swissmetro *(item index 1)*

And the utility form is the following:\
&nbsp; &nbsp; &nbsp; $U(i) = \beta^{inter}_i + \beta^{tt} \cdot TT(i) + \beta^{co} \cdot CO(i)$\
with:
- $TT(i)$ the travel time of alternative $i$
- $CO(i)$ the cost of alternative $i$
- $\beta^{inter}_{sm} = 0$

Therefore we have 4 weights in the utility function, plus the $\gamma_{nest}$ values, to estimate. Since the 'new' nest (future modes) contains only one alternative, its value $\gamma^{new}$ has no impact on the probabilities, so we only need to estimate $\gamma^{old}$ for the existing modes nest.
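
To make the role of the $\gamma$ values concrete, here is a minimal sketch of the standard nested logit probabilities, $P(i) = \frac{e^{U(i)/\gamma_k} \left(\sum_{j \in k} e^{U(j)/\gamma_k}\right)^{\gamma_k - 1}}{\sum_l \left(\sum_{j \in l} e^{U(j)/\gamma_l}\right)^{\gamma_l}}$ for an alternative $i$ in nest $k$. It is plain Python and not Choice-Learn's implementation: the function name and the utility values are hypothetical, and we assume that Choice-Learn's $\gamma$ follows this parametrization.

```python
import math


def nested_logit_probabilities(utilities, items_nests, gammas):
    """Nested logit probabilities for one choice situation (illustrative sketch)."""
    # Sum over each nest of exp(U_j / gamma_k)
    nest_sums = [
        sum(math.exp(utilities[j] / gamma) for j in nest)
        for nest, gamma in zip(items_nests, gammas)
    ]
    # Denominator: sum over nests of (nest sum) ** gamma_k
    denominator = sum(s ** gamma for s, gamma in zip(nest_sums, gammas))

    probabilities = [0.0] * len(utilities)
    for nest, gamma, nest_sum in zip(items_nests, gammas, nest_sums):
        for i in nest:
            numerator = math.exp(utilities[i] / gamma) * nest_sum ** (gamma - 1.0)
            probabilities[i] = numerator / denominator
    return probabilities


# Hypothetical utilities for train (0), swissmetro (1) and car (2)
utilities = [-1.2, -0.3, -0.9]
items_nests = [[0, 2], [1]]

probas = nested_logit_probabilities(utilities, items_nests, gammas=[0.5, 1.0])
# Same gamma for the existing modes nest, another value for the single-item nest
probas_other = nested_logit_probabilities(utilities, items_nests, gammas=[0.5, 0.3])
print(probas)
print(probas_other)
```

Both calls print the same probabilities up to floating-point error: changing the $\gamma$ of the single-item nest leaves them unchanged, which is why only $\gamma^{old}$ needs to be estimated.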

With Choice-Learn, the Nested Logit model specification is similar to the [Conditional Logit specification](./../introduction/3_model_clogit.ipynb). The few differences are:
- When the model is instantiated, the nests need to be specified as a list of nests, each given by the indexes of its items. In the example, we specify `items_nests=[[0, 2], [1]]`, meaning that the first nest contains the items of indexes 0 (train) and 2 (car) and the second nest contains the item of index 1 (swissmetro).
- The "fast" dict-based specification has an additional option, `coefficients={feature_name: "nest"}`, which creates for the feature `feature_name` one coefficient to estimate per nest, shared by all alternatives of that nest.
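
As an illustration of what a per-nest coefficient means, the sketch below expands one value per nest into one value per item. It only mirrors the idea behind the `"nest"` keyword: the helper name and the coefficient values are hypothetical, and this is not how Choice-Learn stores its weights internally.

```python
def expand_nest_coefficients(items_nests, nest_coefficients):
    """Give every item the coefficient of the nest it belongs to."""
    n_items = sum(len(nest) for nest in items_nests)
    per_item = [None] * n_items
    for nest, coefficient in zip(items_nests, nest_coefficients):
        for item_index in nest:
            per_item[item_index] = coefficient
    return per_item


# Hypothetical coefficients: one for the existing modes nest, one for the future modes nest
per_item = expand_nest_coefficients([[0, 2], [1]], [-0.8, -1.1])
print(per_item)
```

Train (index 0) and car (index 2) end up with the same coefficient, while the swissmetro (index 1) gets the coefficient of its own nest.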

In [None]:
# Initialization of the model
swiss_model = NestedLogit(optimizer="lbfgs", items_nests=[[0, 2], [1]], batch_size=-1, lr=0.002, epochs=100)

# Intercepts for train & car (the swissmetro intercept is fixed to 0)
swiss_model.add_coefficients(feature_name="intercept", items_indexes=[0, 2])

# betas TT and CO shared by train, sm and car
swiss_model.add_shared_coefficient(feature_name="travel_time",
                                   items_indexes=[0, 1, 2])
swiss_model.add_shared_coefficient(feature_name="cost",
                                   items_indexes=[0, 1, 2])


In [None]:
# Estimation of the model
history = swiss_model.fit(swiss_dataset, get_report=True, verbose=2)



Using L-BFGS optimizer, setting up .fit() function
Got nest 1 on 2 with 2 items.
Got nest 2 on 2 with 1 items.


In [None]:
swiss_model.trainable_weights

[<tf.Variable 'beta_intercept:0' shape=(1, 2) dtype=float32, numpy=array([[-0.51194817, -0.1671557 ]], dtype=float32)>,
 <tf.Variable 'beta_travel_time:0' shape=(1, 1) dtype=float32, numpy=array([[-0.8986639]], dtype=float32)>,
 <tf.Variable 'beta_cost:0' shape=(1, 1) dtype=float32, numpy=array([[-0.85666525]], dtype=float32)>,
 <tf.Variable 'gammas_nests:0' shape=(1, 1) dtype=float32, numpy=array([[0.48683944]], dtype=float32)>]

In [None]:
swiss_model.report

| | Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|---|
| 0 | beta_intercept_0 | -0.511948 | 0.046159 | -11.090879 | 0.0 |
| 1 | beta_intercept_1 | -0.167156 | 0.036682 | -4.556831 | 5e-06 |
| 2 | beta_travel_time | -0.898664 | 0.054548 | -16.474665 | 0.0 |
| 3 | beta_cost | -0.856665 | 0.046482 | -18.43008 | 0.0 |
| 4 | gammas_nests | 0.486839 | 0.029635 | 16.427753 | 0.0 |
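
The last two columns of the report can be recomputed from the first two. The sketch below assumes that the z-value is the estimate divided by its standard error and that the p-value is the two-sided tail probability of a standard normal distribution; this is an assumption about how the report is built, although it is consistent with the values displayed above. The function name is hypothetical.

```python
import math


def z_and_p_value(estimate, std_err):
    """Return the z-value and the two-sided standard normal p-value."""
    z_value = estimate / std_err
    # erfc(|z| / sqrt(2)) equals P(|Z| > |z|) for a standard normal Z
    p_value = math.erfc(abs(z_value) / math.sqrt(2.0))
    return z_value, p_value


# Values of beta_intercept_1 taken from the report
z_value, p_value = z_and_p_value(estimate=-0.167156, std_err=0.036682)
print(z_value, p_value)
```

Small differences with the report are expected because the inputs used here are already rounded to six decimals.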


In [None]:
# Estimating the total summed Negative Log-Likelihood
swiss_model.evaluate(swiss_dataset) * len(swiss_dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=5236.9>

In [None]:
# Probabilities can be easily computed:
probas = swiss_model.predict_probas(swiss_dataset)
print(probas[:4])

tf.Tensor(
[[0.15937707 0.6218435  0.21877941]
 [0.19402009 0.64451504 0.16146487]
 [0.11813082 0.5976908  0.2841783 ]
 [0.12110616 0.5260698  0.35282403]], shape=(4, 3), dtype=float32)
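
As a quick sanity check, the sketch below reuses the four rows printed above to verify that each row sums to one and to read off the most likely alternative. It also shows how an average negative log-likelihood relates to the summed one, assuming that `evaluate` returns an average, as the multiplication by `len(swiss_dataset)` suggests. The `choices` list is hypothetical and does not come from the dataset.

```python
import math

# First four rows of the predicted probabilities printed above
probas = [
    [0.15937707, 0.6218435, 0.21877941],
    [0.19402009, 0.64451504, 0.16146487],
    [0.11813082, 0.5976908, 0.2841783],
    [0.12110616, 0.5260698, 0.35282403],
]

# Each row is a probability distribution over the three alternatives
row_sums = [sum(row) for row in probas]
# Index of the most likely alternative for each choice situation
most_likely = [row.index(max(row)) for row in probas]

# Hypothetical observed choices, only here to illustrate the computation
choices = [1, 1, 0, 2]
nll_per_choice = [-math.log(row[c]) for row, c in zip(probas, choices)]
mean_nll = sum(nll_per_choice) / len(nll_per_choice)
total_nll = mean_nll * len(choices)

print(row_sums)
print(most_likely)
print(mean_nll, total_nll)
```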


### Interpretation and comparison with Biogeme results

## 2- Nested Logit with the HC Dataset

We reproduce results from [mlogit](https://cran.r-project.org/web/packages/mlogit/vignettes/e2nlogit.html) that are also presented in [Torch-Choice](https://gsbdbi.github.io/torch-choice/nested_logit_model_house_cooling/).

In [None]:
import pandas as pd
from choice_learn.data import ChoiceDataset

# Loading
df = pd.read_csv("../../../../HC.csv")

In [None]:
df.head()

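| | rownames | depvar | ich.gcc | ich.ecc | ich.erc | ich.hpc | ich.gc | ich.ec | ich.er | icca | och.gcc | och.ecc | och.erc | och.hpc | och.gc | och.ec | och.er | occa | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | erc | 9.7 | 7.86 | 8.79 | 11.36 | 24.08 | 24.5 | 7.37 | 27.28 | 2.26 | 4.09 | 3.85 | 1.73 | 2.26 | 4.09 | 3.85 | 2.95 | 20.0 |
| 1 | 2 | hpc | 8.77 | 8.69 | 7.09 | 9.37 | 28.0 | 32.71 | 9.33 | 26.49 | 2.3 | 2.69 | 3.45 | 1.65 | 2.3 | 2.69 | 3.45 | 1.63 | 50.0 |
| 2 | 3 | gcc | 7.43 | 8.86 | 6.94 | 11.7 | 25.71 | 31.68 | 8.14 | 22.63 | 2.28 | 5.25 | 4.35 | 1.44 | 2.28 | 5.25 | 4.35 | 2.18 | 50.0 |
| 3 | 4 | gcc | 9.18 | 8.93 | 7.22 | 12.13 | 29.72 | 26.73 | 8.04 | 25.33 | 2.62 | 4.89 | 4.85 | 1.93 | 2.62 | 4.89 | 4.85 | 2.7 | 50.0 |
| 4 | 5 | gcc | 8.05 | 7.02 | 8.44 | 10.51 | 23.9 | 28.35 | 7.15 | 25.45 | 2.52 | 3.71 | 3.64 | 1.63 | 2.52 | 3.71 | 3.64 | 2.77 | 60.0 |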


As in the reference examples, the dataset can be pre-processed so that the Nested Logit model is 'easy' to specify:

In [None]:
items_id = ["gcc", "ecc", "erc", "hpc", "gc", "ec", "er"]
cooling_modes = ["gcc", "ecc", "erc", "hpc"]
room_modes = ["erc", "er"]
non_cooling_modes = ["gc", "ec", "er"]

for mode in items_id:
    if mode in cooling_modes:
        df[f"icca.{mode}"] = df["icca"]
        df[f"occa.{mode}"] = df["occa"]
    else:
        df[f"icca.{mode}"] = 0.
        df[f"occa.{mode}"] = 0.
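
The loop above is needed because, in the raw file, `icca` and `occa` are single columns without an item suffix, whereas `ich` and `och` already have one column per item. The sketch below checks this on the column names displayed by `df.head()`. It assumes that item features are looked up under names of the form `<prefix>.<item>`, which is what the `items_features_prefixes` and `delimiter` arguments used later suggest; the helper itself is hypothetical and is not part of Choice-Learn.

```python
def missing_item_columns(columns, prefix, items_id, delimiter="."):
    """Return the expected '<prefix><delimiter><item>' columns absent from `columns`."""
    expected = [f"{prefix}{delimiter}{item}" for item in items_id]
    return [name for name in expected if name not in columns]


items_id = ["gcc", "ecc", "erc", "hpc", "gc", "ec", "er"]

# Column names of the raw HC file, as displayed by df.head()
raw_columns = (
    ["rownames", "depvar"]
    + [f"ich.{item}" for item in items_id]
    + ["icca"]
    + [f"och.{item}" for item in items_id]
    + ["occa", "income"]
)

# 'ich' has one column per item, 'icca' has none before the pre-processing
print(missing_item_columns(raw_columns, "ich", items_id))
print(missing_item_columns(raw_columns, "icca", items_id))
```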

In [None]:
for item in items_id:
    if item in cooling_modes:
        df[f"int_cooling.{item}"] = 1.
        df[f"inc_cooling.{item}"] = df.income
    else:
        df[f"int_cooling.{item}"] = 0.
        df[f"inc_cooling.{item}"] = 0.
    if item in room_modes:
        df[f"inc_room.{item}"] = df.income
    else:
        df[f"inc_room.{item}"] = 0

In [None]:
# Creating the dataset from this preprocessed dataframe
dataset = ChoiceDataset.from_single_wide_df(df=df,
                                            items_features_prefixes=["ich", "och", "occa", "icca",
                                                                     "int_cooling", "inc_cooling", "inc_room"],
                                            delimiter=".",
                                            items_id=items_id,
                                            choices_column="depvar",
                                            choice_format="items_id")

In [None]:
dataset.items_features_by_choice_names

We can use the fast specification, a dictionary with the 'constant' keyword.

In [None]:
spec = {
    "ich": "constant",
    "och": "constant",
    "occa": "constant",
    "icca": "constant",
    "int_cooling":"constant",
    "inc_cooling": "constant",
    "inc_room": "constant"
}
model = NestedLogit(
    coefficients=spec,
    items_nests=[[0, 1, 2, 3], [4, 5, 6]],
    optimizer="lbfgs",
    shared_gammas_over_nests=True # Note the argument specifying that all nests have the same gamma value
)


Using L-BFGS optimizer, setting up .fit() function
Got nest 1 on 2 with 4 items.
Got nest 2 on 2 with 3 items.


In [None]:
hist = model.fit(dataset, get_report=True, verbose=1)




Using L-BFGS optimizer, setting up .fit() function
Got nest 1 on 2 with 4 items.
Got nest 2 on 2 with 3 items.


In [None]:
model.trainable_weights

[<tf.Variable 'ich_w_0:0' shape=(1, 1) dtype=float32, numpy=array([[-0.5546904]], dtype=float32)>,
 <tf.Variable 'och_w_1:0' shape=(1, 1) dtype=float32, numpy=array([[-0.857596]], dtype=float32)>,
 <tf.Variable 'occa_w_2:0' shape=(1, 1) dtype=float32, numpy=array([[-1.0874541]], dtype=float32)>,
 <tf.Variable 'icca_w_3:0' shape=(1, 1) dtype=float32, numpy=array([[-0.22486664]], dtype=float32)>,
 <tf.Variable 'int_cooling_w_4:0' shape=(1, 1) dtype=float32, numpy=array([[-6.0049725]], dtype=float32)>,
 <tf.Variable 'inc_cooling_w_5:0' shape=(1, 1) dtype=float32, numpy=array([[0.24946265]], dtype=float32)>,
 <tf.Variable 'inc_room_w_6:0' shape=(1, 1) dtype=float32, numpy=array([[-0.37883615]], dtype=float32)>,
 <tf.Variable 'gamma_nests:0' shape=(1, 1) dtype=float32, numpy=array([[0.58575785]], dtype=float32)>]

In [None]:
model.report

| | Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|---|
| 0 | ich_w_0 | -0.55469 | 0.174447 | -3.179699 | 0.001474 |
| 1 | och_w_1 | -0.857596 | 0.300293 | -2.85586 | 0.004292 |
| 2 | occa_w_2 | -1.087454 | 1.056027 | -1.02976 | 0.303123 |
| 3 | icca_w_3 | -0.224867 | 0.112238 | -2.003486 | 0.045125 |
| 4 | int_cooling_w_4 | -6.004972 | 4.986424 | -1.204264 | 0.228487 |
| 5 | inc_cooling_w_5 | 0.249463 | 0.053563 | 4.657358 | 3e-06 |
| 6 | inc_room_w_6 | -0.378836 | 0.116015 | -3.265419 | 0.001093 |
| 7 | gamma_nests | 0.585758 | 0.242306 | 2.417429 | 0.015631 |


Another possibility is to keep the dataset as is and specify the model manually:

In [None]:
# Creating the dataset
dataset = ChoiceDataset.from_single_wide_df(df=df,
                                            shared_features_columns=["income"],
                                            items_features_prefixes=["ich", "och", "occa", "icca"],
                                            delimiter=".",
                                            items_id=items_id,
                                            choices_column="depvar",
                                            choice_format="items_id")

Using the manual specification we define each weight and the indexes of the concerned items.

In [None]:
model = NestedLogit(items_nests=[[0, 1, 2, 3], [4, 5, 6]],
                    optimizer="lbfgs",
                    shared_gammas_over_nests=True)
# Coefficients that are for all the alternatives
model.add_shared_coefficient(feature_name="ich", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="och", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="icca", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="occa", items_indexes=[0, 1, 2, 3, 4, 5, 6])

# The coefficients concerning the income are split into two groups of alternatives:
model.add_shared_coefficient(feature_name="income", items_indexes=[0, 1, 2, 3], coefficient_name="income_cooling")
model.add_shared_coefficient(feature_name="income", items_indexes=[2, 6], coefficient_name="income_room")

# Finally only one nest has an intercept
model.add_shared_coefficient(feature_name="intercept", items_indexes=[0, 1, 2, 3])

Using L-BFGS optimizer, setting up .fit() function
Got nest 1 on 2 with 4 items.
Got nest 2 on 2 with 3 items.


In [None]:
hist = model.fit(dataset, get_report=True, verbose=1)




Using L-BFGS optimizer, setting up .fit() function
Got nest 1 on 2 with 4 items.
Got nest 2 on 2 with 3 items.








In [None]:
model.report

| | Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|---|
| 0 | beta_ich | -0.554688 | 0.174451 | -3.179613 | 0.001475 |
| 1 | beta_och | -0.857592 | 0.3003 | -2.855788 | 0.004293 |
| 2 | beta_icca | -0.224866 | 0.11224 | -2.003443 | 0.04513 |
| 3 | beta_occa | -1.08746 | 1.056047 | -1.029746 | 0.303129 |
| 4 | income_cooling | 0.249462 | 0.053563 | 4.657318 | 3e-06 |
| 5 | income_room | -0.378834 | 0.116017 | -3.265331 | 0.001093 |
| 6 | beta_intercept | -6.0049 | 4.986637 | -1.204198 | 0.228513 |
| 7 | gamma_nests | 0.585755 | 0.242311 | 2.417369 | 0.015633 |
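
Both specifications lead to nearly identical estimates because they describe the same utility. The sketch below illustrates this for the room income term: a pre-processed `inc_room.<item>` column with one constant coefficient gives the same contribution as the raw `income` feature with a coefficient restricted to item indexes 2 and 6. The coefficient and income values are hypothetical, and the code does not use Choice-Learn.

```python
items_id = ["gcc", "ecc", "erc", "hpc", "gc", "ec", "er"]
room_modes = ["erc", "er"]

income = 20.0      # hypothetical income of one household
beta_room = -0.38  # hypothetical coefficient value

# Approach 1: per-item column equal to the income for room modes and 0 elsewhere,
# multiplied by a single constant coefficient
inc_room = [income if item in room_modes else 0.0 for item in items_id]
contribution_preprocessed = [beta_room * value for value in inc_room]

# Approach 2: raw shared feature, coefficient applied only to the room item indexes
room_indexes = [items_id.index(item) for item in room_modes]
contribution_manual = [
    beta_room * income if index in room_indexes else 0.0
    for index in range(len(items_id))
]

print(room_indexes)
print(contribution_preprocessed == contribution_manual)
```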
