## Introduction to custom Choice Models

The Choice-Learn package aims to provide structure and helper functions to design any choice model. The main idea is that you write the utility function and let the package handle the rest, such as estimation.
We recommend reading the data tutorial first to understand the ChoiceDataset class.

Let's once again create a conditional MNL on the ModeCanada dataset, as in the previous example \ref{}. This time, however, we will write the model ourselves.

In [None]:
import os

# Remove GPU use
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import sys
from pathlib import Path

sys.path.append("../")

import numpy as np
import pandas as pd

We load the ModeCanada dataset as a ChoiceDataset; see \ref{} for more details.

In [None]:
from choice_learn.data import ChoiceDataset

# Load the raw ModeCanada data
transport_df = pd.read_csv("/data/raw_data/ModeCanada.csv", index_col=0)

# Following the torch-choice guide: keep only the cases where all 4 modes are available
transport_df = transport_df.loc[transport_df.noalt == 4]

items = ["air", "bus", "car", "train"]

# One-hot encoding of each mode, used as items features
for item in items:
    transport_df[f"oh_{item}"] = (transport_df.alt == item).astype(float)

transport_df.income = transport_df.income.astype("float32")

dataset = ChoiceDataset.from_single_df(df=transport_df,
                                       items_features_columns=["oh_air",
                                                               "oh_bus",
                                                               "oh_car",
                                                               "oh_train"],
                                       sessions_features_columns=["income"],
                                       sessions_items_features_columns=["cost",
                                                                        "freq",
                                                                        "ovt",
                                                                        "ivt"],
                                       items_id_column="alt",
                                       sessions_id_column="case",
                                       choices_column="choice",
                                       choice_mode="one_zero")
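To make the expected layout concrete, here is a toy DataFrame in the same long format as ModeCanada: one row per (case, alt) pair. It assumes, as the ModeCanada columns suggest, that `choice_mode="one_zero"` means the choices column holds 1 for the chosen alternative and 0 for the others. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data in the same long format as ModeCanada:
# one row per (case, alt) pair, with choice = 1 for the chosen mode and 0 otherwise
toy_df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2, 2, 2, 2],
    "alt": ["air", "bus", "car", "train"] * 2,
    "income": [45.0] * 4 + [70.0] * 4,
    "choice": [0, 0, 1, 0, 1, 0, 0, 0],
})

# Each case must have exactly one row with choice == 1
assert (toy_df.groupby("case")["choice"].sum() == 1).all()

# Session features such as income are constant within a case
assert (toy_df.groupby("case")["income"].nunique() == 1).all()

# Recover the chosen alternative of each case
chosen = toy_df.loc[toy_df.choice == 1].set_index("case")["alt"]
print(chosen.to_dict())  # maps each case to its chosen mode
```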

We will subclass the parent class ChoiceModel, which we first need to import. It mainly relies on TensorFlow as a backend, so we recommend using TensorFlow operations as much as possible. Most NumPy operations have a TensorFlow equivalent; see the documentation here \ref{}.

For our custom model to work, we need to specify:
- the weights initialization in `__init__()`
- the utility function in `compute_utility()`

In [None]:
import tensorflow as tf
from choice_learn.models.base_model import ChoiceModel

### *Coefficients Initialization*

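Following the utility formula of the conditional MNL, we need four coefficient vectors:
- $\beta^{inter}$ has 3 values (bus, car, train): air is the reference item and its intercept is fixed to 0
- $\beta^{price}$, $\beta^{freq}$, $\beta^{ovt}$ are grouped together; each has one value, shared by all items
- $\beta^{income}$ has 3 values (bus, car, train): the air coefficient is also fixed to 0
- $\beta^{ivt}$ has 4 values, one per item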
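As a quick sanity check on these shapes, the NumPy sketch below mirrors the coefficients created in `__init__` (normal draws with mean 0 and standard deviation 0.02) and counts the free parameters. NumPy stands in for TensorFlow here only so that the snippet runs on its own; the order of the shared coefficients follows the `cost`, `freq`, `ovt` columns of the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

# Same shapes as the tf.Variable objects created in __init__
beta_inter = rng.normal(0.0, 0.02, size=(1, 3))          # bus, car, train (air is the reference)
beta_cost_freq_ovt = rng.normal(0.0, 0.02, size=(1, 3))  # shared by all items
beta_income = rng.normal(0.0, 0.02, size=(1, 3))         # bus, car, train (air is the reference)
beta_ivt = rng.normal(0.0, 0.02, size=(1, 4))            # air, bus, car, train

weights = [beta_inter, beta_cost_freq_ovt, beta_income, beta_ivt]
n_params = sum(w.size for w in weights)
print(n_params)  # 13

# Before use, a fixed 0 is prepended for air so that the vector covers all 4 items
full_beta_inter = np.concatenate([np.zeros((1, 1)), beta_inter], axis=-1)
print(full_beta_inter.shape)  # (1, 4)
```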

### *Utility Computation*

In the compute_utility method, we define how to compute the utility of each item for each session from the features and the initialized weights.
Its arguments are the feature types of the ChoiceDataset class:

| Order | Argument | Shape | Features for ModeCanada |
|---|---|---|---|
| 1 | items_features | (n_items, n_items_features) | One-hot encoding of each item |
| 2 | sessions_features | (n_sessions, n_sessions_features) | Customer income |
| 3 | sessions_items_features | (n_sessions, n_items, n_sessions_items_features) | Cost, freq, ovt and ivt values of each mode |
| 4 | sessions_items_availabilities | (n_sessions, n_items) | Not used |
| 5 | choices | (n_sessions, ) | Not used |

The method must return the utilities as a matrix of shape (n_sessions, n_items), representing the utility of each item for each session.
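In a standard MNL, these utilities are turned into choice probabilities with a softmax over the available items. The sketch below assumes ChoiceModel does something equivalent with the matrix returned by compute_utility; `mnl_probabilities` is a hypothetical helper written in NumPy for illustration, not a Choice-Learn function.

```python
import numpy as np


def mnl_probabilities(utilities, availabilities):
    """Turn (n_sessions, n_items) utilities into MNL choice probabilities.

    Hypothetical helper: unavailable items (availability 0) get probability 0.
    """
    # Subtracting the row maximum keeps exp() stable and does not change the softmax
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted) * availabilities
    return exp_u / exp_u.sum(axis=1, keepdims=True)


utilities = np.array([[0.0, 1.0, 2.0, 0.5],
                      [1.0, 1.0, 1.0, 1.0]])
availabilities = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 1]])
probas = mnl_probabilities(utilities, availabilities)
print(probas.shape)  # (2, 4)
```

Each row sums to 1, and in the second session the unavailable car gets probability 0 while the three remaining modes share it equally.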

In [None]:
class CustomCanadaConditionalMNL(ChoiceModel):
    """Conditional MNL for the ModeCanada dataset.

    Arguments:
    ----------
    **kwargs
        Keyword arguments passed on to ChoiceModel, such as:
        optimizer : str
            tf.keras.optimizer to use for training, default is Adam
        lr : float
            learning rate for optimizer, default is 1e-3
    """

    def __init__(
        self,
        **kwargs,
    ):
        """Model coefficients instantiation."""
        super().__init__(**kwargs)

        # Create the model weights: one coefficient per feature, and per item when item-specific.
        # Intercept and income have 3 values: air is the reference, fixed to 0 in compute_utility.
        beta_inter = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                 name="beta_inter")
        beta_cost_freq_ovt = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                         name="beta_cost_freq_ovt")
        beta_income = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                  name="beta_income")
        beta_ivt = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 4)),
                               name="beta_ivt")

        # Do not forget to add them to the list of weights, it is mandatory!
        self.weights = [beta_inter, beta_cost_freq_ovt, beta_income, beta_ivt]


    def compute_utility(self, items_batch, sessions_batch, sessions_items_batch, availabilities_batch, choices_batch):
        """Define how the model computes the utility of each item.

        Here U = U_inter + U_cost_freq_ovt + U_ivt + U_income

        Parameters
        ----------
        items_batch : tuple of np.ndarray (items_features)
            Items features from ChoiceDataset, here the one-hot encoding of each item.
            Shape must be (n_items, n_items_features)
        sessions_batch : tuple of np.ndarray (sessions_features)
            Session features, here the customer income. They are the same for all items.
            Shape must be (n_sessions, n_sessions_features)
        sessions_items_batch : tuple of np.ndarray (sessions_items_features)
            Session-item features, here cost, freq, ovt and ivt of each mode.
            Shape must be (n_sessions, n_items, n_sessions_items_features)
        availabilities_batch : np.ndarray
            Availabilities (sessions_items_availabilities), not used here.
            Shape must be (n_sessions, n_items)
        choices_batch : np.ndarray
            Choices, not used here.
            Shape must be (n_sessions, )

        Returns:
        --------
        np.ndarray
            Utility of each product for each session.
            Shape must be (n_sessions, n_items)
        """
        # items_features is the one-hot encoding of each item: a dot product with it
        # selects the right intercept for each item.

        # Prepend the fixed 0 of air (reference item) to reach shape (1, n_items)
        full_beta_inter = tf.concat([tf.constant([[.0]]), self.weights[0]], axis=-1)
        u_intercept = tf.tensordot(tf.concat([*items_batch], axis=-1),
                                   tf.transpose(full_beta_inter), axes=1)  # shape (n_items, 1)

        # Feature order follows sessions_items_features_columns: cost, freq, ovt, ivt
        sessions_items_ivt = sessions_items_batch[0][:, :, 3]  # shape (n_sessions, n_items)
        sessions_items_cost_freq_ovt = sessions_items_batch[0][:, :, :3]  # shape (n_sessions, n_items, 3)
        # Shared coefficients: (n_sessions, n_items, 3) . (3, 1) -> (n_sessions, n_items, 1) -> squeeze
        # axis=-1 keeps the sessions axis even when the batch holds a single session
        u_cost_freq_ovt = tf.squeeze(tf.tensordot(sessions_items_cost_freq_ovt,
                                                  tf.transpose(self.weights[1]), axes=1), axis=-1)
        # Item-specific coefficients: element-wise product broadcast over sessions
        u_ivt = tf.multiply(sessions_items_ivt, self.weights[3])

        # Prepend the fixed 0 of air (reference item) to reach shape (1, n_items)
        full_beta_income = tf.concat([tf.constant([[.0]]), self.weights[2]], axis=-1)
        u_income = tf.tensordot(sessions_batch[0], full_beta_income, axes=1)  # shape (n_sessions, n_items)

        # The intercept is constant over all sessions: (n_items, 1) -> (n_sessions, n_items)
        u_intercept = tf.concat([tf.transpose(u_intercept)] * (u_income.shape[0]), axis=0)

        return u_intercept + u_cost_freq_ovt + u_income + u_ivt

In [None]:
model = CustomCanadaConditionalMNL(optimizer="lbfgs")
history = model.fit(dataset, n_epochs=400)

### Decomposition of the utility operations

#### > *Intercept*

- $U_{inter}(air, s) = \beta^{inter}_{air} = 0$
- $U_{inter}(bus, s) = \beta^{inter}_{bus}$
- $U_{inter}(car, s) = \beta^{inter}_{car}$
- $U_{inter}(train, s) = \beta^{inter}_{train}$

$\beta^{inter} = \left(\begin{array}{c} 
0 \\
\beta^{inter}_{bus} \\
\beta^{inter}_{car} \\
\beta^{inter}_{train} \\
\end{array}\right)$

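Since items_features is the one-hot (identity) matrix of the 4 items, $U_{inter} = \left(items\_features \cdot \beta^{inter}\right)^T = (\beta^{inter})^T$, of shape (1, 4).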
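The intercept step can be checked numerically. In this NumPy sketch the intercept values are hypothetical; it shows that multiplying the one-hot items features by the padded coefficient vector simply returns that vector, with 0 for air.

```python
import numpy as np

# One-hot items_features: row i encodes item i (air, bus, car, train)
items_features = np.eye(4)

# Hypothetical estimated intercepts for bus, car and train
beta_inter = np.array([[0.5, -0.2, 1.0]])
# Prepend the fixed 0 of air, as compute_utility does with tf.concat
full_beta_inter = np.concatenate([[[0.0]], beta_inter], axis=-1)  # shape (1, 4)

# (4, 4) . (4, 1) -> (4, 1), then transpose to a (1, 4) row
u_inter = (items_features @ full_beta_inter.T).T
print(u_inter.shape)  # (1, 4)
```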

#### > *Price, Freq, Ovt*
- $U_{price, freq, ovt}(air, s) = \beta^{price} \cdot price(air, s) + \beta^{freq} \cdot freq(air, s) + \beta^{ovt} \cdot ovt(air, s)$
- $U_{price, freq, ovt}(bus, s) = \beta^{price} \cdot price(bus, s) + \beta^{freq} \cdot freq(bus, s) + \beta^{ovt} \cdot ovt(bus, s)$
- $U_{price, freq, ovt}(car, s) = \beta^{price} \cdot price(car, s) + \beta^{freq} \cdot freq(car, s) + \beta^{ovt} \cdot ovt(car, s)$
- $U_{price, freq, ovt}(train, s) = \beta^{price} \cdot price(train, s) + \beta^{freq} \cdot freq(train, s) + \beta^{ovt} \cdot ovt(train, s)$

$\beta^{price, freq, ovt} = \left(\begin{array}{c} 
\beta^{price} \\
\beta^{freq} \\
\beta^{ovt} \\
\end{array}\right)$ and $sessions\_items\_features[0, :, :3] = \left(\begin{array}{ccc} 
price(air, 0) & freq(air, 0) & ovt(air, 0) \\
price(bus, 0) & freq(bus, 0) & ovt(bus, 0) \\
price(car, 0) & freq(car, 0) & ovt(car, 0) \\
price(train, 0) & freq(train, 0) & ovt(train, 0) \\
\end{array}\right)$

$U_{price, freq, ovt} = sessions\_items\_features[:, :, :3] \cdot \beta^{price, freq, ovt}$

Note that the matrix above only shows session 0, since the sessions dimension cannot be drawn: this explains the switch from [0, :, :3] to [:, :, :3].
sessions_items_features[:, :, :3] has a shape of (batch_size, 4, 3), and $\beta^{price, freq, ovt}$ is stored as a (1, 3) row that is transposed to (3, 1) for the dot product.
After dropping the trailing axis of size 1, the resulting $U_{price, freq, ovt}$ has a shape of (batch_size, 4).
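The shape bookkeeping above can be reproduced with `np.tensordot`, which follows the same `axes=1` convention as `tf.tensordot`. The feature values and coefficients below are hypothetical; the `einsum` line writes the same contraction as an explicit sum over the three features.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_items = 5, 4

# Hypothetical cost, freq and ovt values: shape (batch_size, n_items, 3)
cost_freq_ovt = rng.normal(size=(batch_size, n_items, 3))
# Shared coefficients stored as a (1, 3) row, as in the model
beta_cost_freq_ovt = np.array([[-0.03, 0.1, -0.02]])

# Contract the last axis of the features with the (3, 1) column, then drop the trailing axis
u = np.squeeze(np.tensordot(cost_freq_ovt, beta_cost_freq_ovt.T, axes=1), axis=-1)
print(u.shape)  # (5, 4)

# Same result written as an explicit sum over the three features
u_einsum = np.einsum("sif,f->si", cost_freq_ovt, beta_cost_freq_ovt[0])
assert np.allclose(u, u_einsum)
```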

#### > *Ivt*
- $U_{ivt}(air, s) = \beta^{ivt}_{air} \cdot ivt(air, s)$
- $U_{ivt}(bus, s) = \beta^{ivt}_{bus} \cdot ivt(bus, s)$
- $U_{ivt}(car, s) = \beta^{ivt}_{car} \cdot ivt(car, s)$
- $U_{ivt}(train, s) = \beta^{ivt}_{train} \cdot ivt(train, s)$

$\beta^{ivt} = \left(\begin{array}{c} 
\beta^{ivt}_{air} \\
\beta^{ivt}_{bus} \\
\beta^{ivt}_{car}\\
\beta^{ivt}_{train} \\
\end{array}\right)$ and $sessions\_items\_features[:, :, 3] = \left(\begin{array}{cccc} 
ivt(air, 0) & ivt(bus, 0) & ivt(car, 0) & ivt(train, 0) \\
ivt(air, 1) & ivt(bus, 1) & ivt(car, 1) & ivt(train, 1) \\
... & ... & ... & ... \\
ivt(air, batch\_size - 1) & ivt(bus, batch\_size - 1) & ivt(car, batch\_size - 1) & ivt(train, batch\_size - 1) \\
\end{array}\right)$

$U_{ivt} = sessions\_items\_features[:, :, 3] \odot (\beta^{ivt})^T$, an element-wise product in which the (1, 4) row $(\beta^{ivt})^T$ is broadcast over the sessions, giving a shape of (batch_size, 4).
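A small NumPy example of this element-wise product, with hypothetical ivt values and coefficients: the (1, 4) coefficient row is broadcast over every session, so each column is scaled by its own item coefficient.

```python
import numpy as np

# Hypothetical ivt values for 2 sessions, columns ordered air, bus, car, train
ivt = np.array([[60.0, 300.0, 250.0, 200.0],
                [90.0, 400.0, 350.0, 280.0]])
# One coefficient per item, stored as a (1, 4) row
beta_ivt = np.array([[-0.01, -0.002, -0.004, -0.003]])

# Element-wise product: the (1, 4) row is broadcast over the sessions axis
u_ivt = ivt * beta_ivt
print(u_ivt.shape)  # (2, 4)
```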

#### > *Income*
- $U_{income}(air, s) = \beta^{income}_{air} \cdot income(s) = 0$
- $U_{income}(bus, s) = \beta^{income}_{bus} \cdot income(s)$
- $U_{income}(car, s) = \beta^{income}_{car} \cdot income(s)$
- $U_{income}(train, s) = \beta^{income}_{train} \cdot income(s)$

$\beta^{income} = \left(\begin{array}{c} 
0 \\
\beta^{income}_{bus} \\
\beta^{income}_{car}\\
\beta^{income}_{train} \\
\end{array}\right)$ and $sessions\_features = \left(\begin{array}{c} 
income(0) \\
income(1) \\
... \\
income(batch\_size - 1) \\
\end{array}\right)$

$U_{income} = sessions\_features \cdot (\beta^{income})^T$, of shape (batch_size, 4)
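This is an outer product between the income column and the padded coefficient row. The NumPy sketch below uses hypothetical incomes and coefficients; note that the air column stays at 0 whatever the income.

```python
import numpy as np

# Hypothetical incomes for 3 sessions: shape (batch_size, 1)
income = np.array([[35.0], [50.0], [70.0]])
# Hypothetical coefficients for bus, car, train; air is fixed to 0
beta_income = np.array([[-0.02, 0.01, -0.005]])
full_beta_income = np.concatenate([np.zeros((1, 1)), beta_income], axis=-1)  # (1, 4)

# (batch_size, 1) . (1, 4) -> (batch_size, 4): one column per item
u_income = income @ full_beta_income
print(u_income.shape)  # (3, 4)
```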

By repeating $U_{inter}$ batch_size times along the sessions axis, we obtain four matrices, each of shape (batch_size, 4).

The final utility is then:
$U = U_{inter} + U_{price, freq, ovt} + U_{ivt} + U_{income}$
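The repetition of $U_{inter}$ is not strictly required: NumPy and TensorFlow both broadcast a (1, 4) row against a (batch_size, 4) matrix. This NumPy sketch, with made-up values standing in for the other three terms, checks that both ways give the same sum.

```python
import numpy as np

batch_size = 3
u_inter = np.array([[0.0, 0.5, -0.2, 1.0]])  # (1, 4), constant over sessions
# Made-up values standing in for U_price_freq_ovt + U_ivt + U_income
u_other = np.arange(12, dtype=float).reshape(batch_size, 4)

# Repeat U_inter batch_size times along the sessions axis, as compute_utility does
u_inter_repeated = np.concatenate([u_inter] * batch_size, axis=0)  # (3, 4)
u = u_inter_repeated + u_other

# Broadcasting gives the same sum without the explicit repetition
assert np.array_equal(u, u_inter + u_other)
print(u.shape)  # (3, 4)
```

Relying on broadcasting avoids reading the batch size from a static shape, which can be unknown when the function is traced into a graph.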

## Results
We can now check that we obtain the same results as in the previous example:

In [None]:
print(model.weights[0])
print(model.weights[1])
print(model.weights[2])
print(model.weights[3])

The coefficients are organized differently but reach the same values. The same holds for the negative log-likelihood:

In [None]:
print("Total negative log-likelihood:", model.evaluate(dataset) * len(dataset))
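Multiplying `model.evaluate(dataset)` by `len(dataset)` suggests that `evaluate` returns the mean negative log-likelihood per choice; this is an assumption. The hypothetical NumPy helper below computes the total for an MNL with all items available, using a stable log-softmax, and shows how the total and the mean relate.

```python
import numpy as np


def total_negative_log_likelihood(utilities, choices):
    """Hypothetical helper: sum over sessions of -log P(chosen item), all items available."""
    # Stable log-softmax with the log-sum-exp trick
    max_u = utilities.max(axis=1, keepdims=True)
    log_probas = utilities - max_u - np.log(np.exp(utilities - max_u).sum(axis=1, keepdims=True))
    # Pick the log-probability of the chosen item in each session
    return -log_probas[np.arange(len(choices)), choices].sum()


utilities = np.zeros((3, 4))   # equal utilities: each item has probability 1/4
choices = np.array([0, 2, 3])  # index of the chosen item in each session
total = total_negative_log_likelihood(utilities, choices)
mean = total / len(choices)
print(np.isclose(total, 3 * np.log(4)))  # True
```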

In this example, we used a simple linear function to compute the utility, but any function can be used. In particular, neural networks and activation functions can add non-linearities.

A simple example would be:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense

from choice_learn.models.base_model import ChoiceModel


class NeuralNetUtility(ChoiceModel):
    def __init__(self, n_sessions_items_features=4, **kwargs):
        super().__init__(**kwargs)
        # First non-linear layer
        self.dense_1 = Dense(units=10, activation="elu")
        # Second linear layer, outputs one utility value
        self.dense_2 = Dense(units=1, activation="linear")
        # Keras layers create their variables lazily: build them now so that
        # trainable_variables is not empty (4 input features for ModeCanada)
        self.dense_1.build(input_shape=(None, n_sessions_items_features))
        self.dense_2.build(input_shape=(None, 10))
        # Do not forget to specify self.weights with all coefficients that need to be estimated
        self.weights = self.dense_1.trainable_variables + self.dense_2.trainable_variables

    def compute_utility(self, items_batch, sessions_batch, sessions_items_batch, availabilities_batch, choices_batch):
        # Dense layers act on the last axis, so the same network is applied to the
        # sessions_items_features of every item: (n_sessions, n_items, n_features) -> (n_sessions, n_items, 1)
        u = self.dense_2(self.dense_1(sessions_items_batch[0]))
        # Drop the last axis to get utilities of shape (n_sessions, n_items)
        return tf.squeeze(u, axis=-1)
```
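The same small network is shared by all items. The NumPy sketch below reproduces the forward pass of the two Dense layers (ELU with alpha = 1, then linear) with random, untrained weights, to show the shapes and to check that applying the network on the 3D tensor at once matches applying it item by item and concatenating.

```python
import numpy as np


def elu(x, alpha=1.0):
    # ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))


rng = np.random.default_rng(1)
n_sessions, n_items, n_features, n_hidden = 6, 4, 4, 10

# Random, untrained weights for the two layers (kernel + bias)
w1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
w2, b2 = rng.normal(size=(n_hidden, 1)), np.zeros(1)

x = rng.normal(size=(n_sessions, n_items, n_features))  # stands in for sessions_items_features

# Matrix products act on the last axis, so one call covers every item
u = (elu(x @ w1 + b1) @ w2 + b2)[..., 0]
print(u.shape)  # (6, 4)

# Same result as applying the network item by item and concatenating
u_loop = np.concatenate([elu(x[:, i] @ w1 + b1) @ w2 + b2 for i in range(n_items)], axis=1)
assert np.allclose(u, u_loop)
```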

If you want more complex examples, you can look at the following implementations:
- [RUMnet](../choice_learn/models/rumnet.py)