In [1]:
import sys

sys.path.append("../")

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from rumboost.rumboost import rum_train
from rumboost.datasets import load_preprocess_LPMC
from rumboost.metrics import cross_entropy

import lightgbm
import hyperopt
import numpy as np


# Example: Nested logit model (correlation amongst alternatives)

This notebook shows features implemented in RUMBoost through an example on the LPMC dataset, a mode choice dataset collected in London and developed by Hillel et al. (2018). You can find the original source of the data [here](https://www.icevirtuallibrary.com/doi/suppl/10.1680/jsmic.17.00018) and the original paper [here](https://www.icevirtuallibrary.com/doi/full/10.1680/jsmic.17.00018).

We first load the preprocessed dataset and its folds for cross-validation. You can find the data under the `Data` folder.

In [4]:
#load dataset
LPMC_train, LPMC_test, folds = load_preprocess_LPMC(path="../Data/")

## Nested Logit model

We relax the assumption that the error terms are independent and identically distributed (i.i.d.). If we assume instead that the error terms of alternatives within the same nest are correlated, we obtain a nested logit-like model. Nested logit probabilities are implemented in RUMBoost. The additional parameter, the scale $\mu$ of a nest, can be estimated in two ways:
1. by a hyperparameter search
2. by optimising it within the training loop

Training a nested logit-like RUMBoost model requires an additional dictionary, stored under the `nested_logit` key of the model specification dictionary. The nested logit dictionary has the following form:

- ```mu```: a list or NumPy array containing the values (as floats) of mu for each nest, e.g. ```[mu_nest_0, mu_nest_1]```
- ```nests```: a dictionary representing the nesting structure. Keys are the nest ids, and values are the lists of alternatives in the corresponding nests. For example, `{0: [0, 1], 1: [2, 3]}` means that alternatives 0 and 1 are in nest 0, and alternatives 2 and 3 are in nest 1.
- `optimise_mu`: a boolean or a list of booleans. If it is a single boolean and True, all mu values are optimised with `scipy.optimize.minimize`. If it is a list of booleans, it must have the same length as `mu` and indicates, for each nest, whether its value is optimised.
  
In this example, we assume that public transport (PT) and car are in a 'motorised' nest, while the walking and cycling alternatives are each in their own nest.
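
To make the role of $\mu$ concrete, here is a minimal sketch of nested logit probabilities for a single observation. It uses the textbook formulation with the top-level scale normalised to 1, and it is not RUMBoost's internal code. The function name `nested_logit_probs` and the utility values are made up for illustration; `mu` and `nests` follow the dictionary format described in this section.

```python
import math


def nested_logit_probs(utilities, mu, nests):
    """Return choice probabilities for one observation.

    utilities: list of utility values, one per alternative
    mu: list of nest scales, indexed by nest id (assumed >= 1)
    nests: dict mapping nest id -> list of alternative indices
    """
    # scaled utilities mu_m * V_j inside each nest, and their maximum
    # (subtracted before exponentiating for numerical stability)
    scaled = {m: [mu[m] * utilities[j] for j in alts] for m, alts in nests.items()}
    top = {m: max(vals) for m, vals in scaled.items()}
    expsum = {m: sum(math.exp(v - top[m]) for v in vals) for m, vals in scaled.items()}

    # inclusive value of each nest: (1 / mu_m) * log(sum_j exp(mu_m * V_j))
    incl = {m: (top[m] + math.log(expsum[m])) / mu[m] for m in nests}

    # probability of each nest: softmax over the inclusive values
    incl_top = max(incl.values())
    denom = sum(math.exp(v - incl_top) for v in incl.values())

    probs = [0.0] * len(utilities)
    for m, alts in nests.items():
        p_nest = math.exp(incl[m] - incl_top) / denom
        for j, v in zip(alts, scaled[m]):
            # P(j) = P(j | nest m) * P(nest m)
            probs[j] = math.exp(v - top[m]) / expsum[m] * p_nest
    return probs


# walking and cycling alone, PT and car in a shared nest (as in this notebook)
nests = {0: [0], 1: [1], 2: [2, 3]}
utilities = [0.5, -0.2, 1.0, 0.3]  # made-up values

mnl = nested_logit_probs(utilities, [1.0, 1.0, 1.0], nests)
nl = nested_logit_probs(utilities, [1.0, 1.0, 1.17], nests)
print([round(p, 3) for p in mnl])
print([round(p, 3) for p in nl])
```

With all scales equal to 1, the function returns the multinomial logit (softmax) probabilities. In this formulation, a scale above 1 for the motorised nest lowers that nest's total share and shifts probability within the nest towards the alternative with the higher utility.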

### General parameters

You can find an example of general parameters below. Unless stated otherwise, the parameters are the same as in LightGBM, since they are passed directly to the LightGBM Booster objects. You can find more information in the LightGBM [docs](https://lightgbm.readthedocs.io/en/stable/Parameters.html#). For a simple RUMBoost model, we recommend leaving most of the parameters at their default values, as RUMBoost is less sensitive to overfitting. **For a multiclass classification problem, you need to specify the `num_classes` parameter with the appropriate number of classes**.

In [10]:
# parameters
general_params = {
    "n_jobs": -1,
    "num_classes": 4,  # important
    "verbosity": 1,  # specific RUMBoost parameter
    "num_iterations": 3000,
    "early_stopping_round": 100,
}

### Random Utility Model structure


In [11]:
rum_structure = [
    {
        "utility": [0],
        "variables": [
            "age",
            "female",
            "day_of_week",
            "start_time_linear",
            "car_ownership",
            "driving_license",
            "purpose_B",
            "purpose_HBE",
            "purpose_HBO",
            "purpose_HBW",
            "purpose_NHBO",
            "fueltype_Average",
            "fueltype_Diesel",
            "fueltype_Hybrid",
            "fueltype_Petrol",
            "distance",
            "dur_walking",
        ],
        "boosting_params": {
            "monotone_constraints_method": "advanced",
            "max_depth": 1,
            "n_jobs": -1,
            "learning_rate": 0.1,
            "monotone_constraints": [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                -1,
                -1,
            ],
            "interaction_constraints": [
                [0],
                [1],
                [2],
                [3],
                [4],
                [5],
                [6],
                [7],
                [8],
                [9],
                [10],
                [11],
                [12],
                [13],
                [14],
                [15],
                [16],
            ],
        },
        "shared": False,
    },
    {
        "utility": [1],
        "variables": [
            "age",
            "female",
            "day_of_week",
            "start_time_linear",
            "car_ownership",
            "driving_license",
            "purpose_B",
            "purpose_HBE",
            "purpose_HBO",
            "purpose_HBW",
            "purpose_NHBO",
            "fueltype_Average",
            "fueltype_Diesel",
            "fueltype_Hybrid",
            "fueltype_Petrol",
            "distance",
            "dur_cycling",
        ],
        "boosting_params": {
            "monotone_constraints_method": "advanced",
            "max_depth": 1,
            "n_jobs": -1,
            "learning_rate": 0.1,
            "monotone_constraints": [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                -1,
                -1,
            ],
            "interaction_constraints": [
                [0],
                [1],
                [2],
                [3],
                [4],
                [5],
                [6],
                [7],
                [8],
                [9],
                [10],
                [11],
                [12],
                [13],
                [14],
                [15],
                [16],
            ],
        },
        "shared": False,
    },
    {
        "utility": [2],
        "variables": [
            "age",
            "female",
            "day_of_week",
            "start_time_linear",
            "car_ownership",
            "driving_license",
            "purpose_B",
            "purpose_HBE",
            "purpose_HBO",
            "purpose_HBW",
            "purpose_NHBO",
            "fueltype_Average",
            "fueltype_Diesel",
            "fueltype_Hybrid",
            "fueltype_Petrol",
            "distance",
            "dur_pt_access",
            "dur_pt_bus",
            "dur_pt_rail",
            "dur_pt_int_waiting",
            "dur_pt_int_walking",
            "pt_n_interchanges",
            "cost_transit",
        ],
        "boosting_params": {
            "monotone_constraints_method": "advanced",
            "max_depth": 1,
            "n_jobs": -1,
            "learning_rate": 0.1,
            "monotone_constraints": [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                -1,
                -1,
                -1,
                -1,
                -1,
                -1,
                -1,
                -1,
            ],
            "interaction_constraints": [
                [0],
                [1],
                [2],
                [3],
                [4],
                [5],
                [6],
                [7],
                [8],
                [9],
                [10],
                [11],
                [12],
                [13],
                [14],
                [15],
                [16],
                [17],
                [18],
                [19],
                [20],
                [21],
                [22],
            ],
        },
        "shared": False,
    },
    {
        "utility": [3],
        "variables": [
            "age",
            "female",
            "day_of_week",
            "start_time_linear",
            "car_ownership",
            "driving_license",
            "purpose_B",
            "purpose_HBE",
            "purpose_HBO",
            "purpose_HBW",
            "purpose_NHBO",
            "fueltype_Average",
            "fueltype_Diesel",
            "fueltype_Hybrid",
            "fueltype_Petrol",
            "distance",
            "dur_driving",
            "cost_driving_fuel",
            "congestion_charge",
            "driving_traffic_percent",
        ],
        "boosting_params": {
            "monotone_constraints_method": "advanced",
            "max_depth": 1,
            "n_jobs": -1,
            "learning_rate": 0.1,
            "monotone_constraints": [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                -1,
                -1,
                -1,
                -1,
                -1,
            ],
            "interaction_constraints": [
                [0],
                [1],
                [2],
                [3],
                [4],
                [5],
                [6],
                [7],
                [8],
                [9],
                [10],
                [11],
                [12],
                [13],
                [14],
                [15],
                [16],
                [17],
                [18],
                [19],
            ],
        },
        "shared": False,
    },
]
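
The four entries above differ only in their alternative-specific variables, so the same structure can also be generated programmatically. The sketch below is one way to do it. `make_utility_spec` and `check_spec` are hypothetical helpers, not part of RUMBoost, and the variable lists are shortened for illustration. It assumes, as in the cell above, that socio-economic variables are unconstrained (0), that trip attributes are constrained to be monotonically decreasing (-1), and that each variable gets its own interaction group.

```python
def make_utility_spec(utility_idx, unconstrained, decreasing, learning_rate=0.1):
    """Build one rum_structure entry (one utility function)."""
    variables = list(unconstrained) + list(decreasing)
    return {
        "utility": [utility_idx],
        "variables": variables,
        "boosting_params": {
            "monotone_constraints_method": "advanced",
            "max_depth": 1,
            "n_jobs": -1,
            "learning_rate": learning_rate,
            # 0 = no constraint, -1 = utility must not increase with the variable
            "monotone_constraints": [0] * len(unconstrained) + [-1] * len(decreasing),
            # one group per variable index: no interactions between variables
            "interaction_constraints": [[i] for i in range(len(variables))],
        },
        "shared": False,
    }


def check_spec(spec):
    """Return True if the constraint lists line up with the variable list."""
    n_vars = len(spec["variables"])
    params = spec["boosting_params"]
    flat = sorted(i for group in params["interaction_constraints"] for i in group)
    return len(params["monotone_constraints"]) == n_vars and flat == list(range(n_vars))


socio = ["age", "female", "car_ownership"]  # shortened list, for illustration only
walk_spec = make_utility_spec(0, socio, ["distance", "dur_walking"])
car_spec = make_utility_spec(3, socio, ["distance", "dur_driving", "cost_driving_fuel"])

print(walk_spec["boosting_params"]["monotone_constraints"])
print(check_spec(walk_spec), check_spec(car_spec))
```

A check like `check_spec` is worth running on hand-written structures too, since a constraint list that is one element too short or too long is easy to miss in a long specification.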

### $\mu$ hyperparameter search

We treat $\mu$ as a hyperparameter and use hyperopt to find its optimal value. More details on how to use hyperopt can be found [here](https://hyperopt.github.io/hyperopt/).

Note that, to limit computation time, we run only one iteration of the hyperparameter search here. The results in the paper were obtained with 25 iterations.

In [12]:
# specify nest
nest = {0: [0], 1: [1], 2: [2, 3]}

nested_structure = {
    "mu": np.array([1, 1, 1.17]),
    "nests": nest,
    "optimise_mu": False,
}

In [13]:
#model specification
model_specification = {
    "general_params": general_params,
    "rum_structure": rum_structure,
    "nested_logit": nested_structure,
}

#features and label column names
features = [f for f in LPMC_train.columns if f != "choice"]
label = "choice"

#create lightgbm dataset
lgb_train_set = lightgbm.Dataset(LPMC_train[features], label=LPMC_train[label], free_raw_data=False)
lgb_test_set = lightgbm.Dataset(LPMC_test[features], label=LPMC_test[label], free_raw_data=False)

In [16]:
#specify the search space of mu
param_space = {'mu': hyperopt.hp.uniform('mu', 1, 2)}

#objective for hyperopt
def objective(space):

    #create mu structure
    nested_structure["mu"] = np.array([1, 1, space["mu"]])

    ce_loss = 0
    num_trees = 0

    for train_idx, test_idx in folds:
        train_set = lgb_train_set.subset(sorted(train_idx))
        test_set = lgb_train_set.subset(sorted(test_idx))

        LPMC_model_trained = rum_train(train_set, model_specification, valid_sets=[test_set])

        ce_loss += LPMC_model_trained.best_score
        num_trees += LPMC_model_trained.best_iteration


    ce_loss = ce_loss / 5
    num_trees = num_trees / 5

    return {'loss': ce_loss, 'status': hyperopt.STATUS_OK, 'best_iteration': num_trees}


#%%
#n_iter=25
n_iter = 1

trials = hyperopt.Trials()
best_classifier = hyperopt.fmin(fn=objective,
                                space=param_space,
                                algo=hyperopt.tpe.suggest,
                                max_evals=n_iter,
                                trials=trials)

print(f'Best mu: {best_classifier["mu"]} \n Best negative CE: {trials.best_trial["result"]["loss"]}')


[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000605 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 855                     
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001814 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 855                     
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001028 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1828                    
[LightGBM] [Info] Number of data point




[1]------NCE value on train set : 1.4114             
---------NCE value on test set 1: 1.4123             
[11]-----NCE value on train set : 0.9855             
---------NCE value on test set 1: 0.9890             
[21]-----NCE value on train set : 0.8399             
---------NCE value on test set 1: 0.8432             
[31]-----NCE value on train set : 0.7791             
---------NCE value on test set 1: 0.7827             
[41]-----NCE value on train set : 0.7492             
---------NCE value on test set 1: 0.7532             
[51]-----NCE value on train set : 0.7314             
---------NCE value on test set 1: 0.7355             
[61]-----NCE value on train set : 0.7194             
---------NCE value on test set 1: 0.7236             
[71]-----NCE value on train set : 0.7107             
---------NCE value on test set 1: 0.7148             
[81]-----NCE value on train set : 0.7038             
---------NCE value on test set 1: 0.7079             
[91]-----NCE value on train 




[11]-----NCE value on train set : 0.9870             
---------NCE value on test set 1: 0.9841             
[21]-----NCE value on train set : 0.8409             
---------NCE value on test set 1: 0.8387             
[31]-----NCE value on train set : 0.7804             
---------NCE value on test set 1: 0.7781             
[41]-----NCE value on train set : 0.7508             
---------NCE value on test set 1: 0.7482             
[51]-----NCE value on train set : 0.7332             
---------NCE value on test set 1: 0.7298             
[61]-----NCE value on train set : 0.7212             
---------NCE value on test set 1: 0.7175             
[71]-----NCE value on train set : 0.7124             
---------NCE value on test set 1: 0.7086             
[81]-----NCE value on train set : 0.7056             
---------NCE value on test set 1: 0.7017             
[91]-----NCE value on train set : 0.7000             
---------NCE value on test set 1: 0.6959             
[101]----NCE value on train 




[11]-----NCE value on train set : 0.9872             
---------NCE value on test set 1: 0.9836             
[21]-----NCE value on train set : 0.8413             
---------NCE value on test set 1: 0.8373             
[31]-----NCE value on train set : 0.7806             
---------NCE value on test set 1: 0.7767             
[41]-----NCE value on train set : 0.7505             
---------NCE value on test set 1: 0.7471             
[51]-----NCE value on train set : 0.7326             
---------NCE value on test set 1: 0.7297             
[61]-----NCE value on train set : 0.7204             
---------NCE value on test set 1: 0.7179             
[71]-----NCE value on train set : 0.7115             
---------NCE value on test set 1: 0.7098             
[81]-----NCE value on train set : 0.7044             
---------NCE value on test set 1: 0.7034             
[91]-----NCE value on train set : 0.6985             
---------NCE value on test set 1: 0.6987             
[101]----NCE value on train 




[11]-----NCE value on train set : 0.9837             
---------NCE value on test set 1: 0.9920             
[21]-----NCE value on train set : 0.8373             
---------NCE value on test set 1: 0.8472             
[31]-----NCE value on train set : 0.7766             
---------NCE value on test set 1: 0.7867             
[41]-----NCE value on train set : 0.7468             
---------NCE value on test set 1: 0.7571             
[51]-----NCE value on train set : 0.7291             
---------NCE value on test set 1: 0.7400             
[61]-----NCE value on train set : 0.7170             
---------NCE value on test set 1: 0.7288             
[71]-----NCE value on train set : 0.7081             
---------NCE value on test set 1: 0.7207             
[81]-----NCE value on train set : 0.7012             
---------NCE value on test set 1: 0.7146             
[91]-----NCE value on train set : 0.6955             
---------NCE value on test set 1: 0.7095             
[101]----NCE value on train 




[11]-----NCE value on train set : 0.9851             
---------NCE value on test set 1: 0.9858             
[21]-----NCE value on train set : 0.8385             
---------NCE value on test set 1: 0.8422             
[31]-----NCE value on train set : 0.7775             
---------NCE value on test set 1: 0.7833             
[41]-----NCE value on train set : 0.7475             
---------NCE value on test set 1: 0.7549             
[51]-----NCE value on train set : 0.7297             
---------NCE value on test set 1: 0.7379             
[61]-----NCE value on train set : 0.7176             
---------NCE value on test set 1: 0.7263             
[71]-----NCE value on train set : 0.7087             
---------NCE value on test set 1: 0.7183             
[81]-----NCE value on train set : 0.7017             
---------NCE value on test set 1: 0.7121             
[91]-----NCE value on train set : 0.6960             
---------NCE value on test set 1: 0.7067             
[101]----NCE value on train 

### Cross-Validation

Once we know the optimal value of $\mu$, we can perform cross-validation to obtain the best number of trees.

Note that we use the optimal value of $\mu$ found with a larger hyperparameter search, i.e. approximately 1.17.
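
The aggregation over folds can also be written without hard-coding the number of folds. The sketch below uses made-up per-fold values in place of `best_score` and `best_iteration`, and rounds the averaged tree count up as one possible choice. These numbers are placeholders, not results from this notebook.

```python
import math
from statistics import mean, stdev

# hypothetical per-fold results (placeholders, not real outputs)
fold_scores = [0.672, 0.668, 0.670, 0.681, 0.679]
fold_trees = [1200, 1250, 1230, 1300, 1220]

cv_loss = mean(fold_scores)
cv_spread = stdev(fold_scores)  # sample standard deviation across folds
n_trees = math.ceil(mean(fold_trees))  # the number of trees must be an integer

print(f"CV cross-entropy: {cv_loss:.4f} +/- {cv_spread:.4f}")
print(f"Average number of trees: {n_trees}")
```

Reporting the spread alongside the mean gives a sense of how much the loss varies from one fold to another.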

In [15]:
_, _, folds = load_preprocess_LPMC(path="../Data/")

#optimal mu
nested_structure['mu'] = np.array([1., 1., 1.166746773143513])

ce_loss = 0
num_trees = 0

#CV with 5 folds
for i, (train_idx, test_idx) in enumerate(folds):

    #create the lightgbm CV training and validation set
    train_set = lgb_train_set.subset(sorted(train_idx))
    test_set = lgb_train_set.subset(sorted(test_idx))
    
    print('-'*50 + '\n')
    print(f'Iteration {i+1}')

    #train the model with rum_train and nest parameters
    LPMC_model_trained = rum_train(train_set, model_specification, valid_sets = [test_set])

    #aggregate results
    ce_loss += LPMC_model_trained.best_score
    num_trees += LPMC_model_trained.best_iteration
    print('-'*50 + '\n')
    print(f'Best cross entropy loss: {LPMC_model_trained.best_score}')
    print(f'Best number of trees: {LPMC_model_trained.best_iteration}')

ce_loss = ce_loss/5
num_trees = num_trees/5
print('-'*50 + '\n')
print(f'Cross validation negative cross entropy loss: {ce_loss}')
print(f'With a number of trees on average of {num_trees}')

--------------------------------------------------

Iteration 1
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000269 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 855
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000159 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 855
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000734 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[Light



[11]-----NCE value on train set : 0.9384
---------NCE value on test set 1: 0.9423
[21]-----NCE value on train set : 0.8225
---------NCE value on test set 1: 0.8260
[31]-----NCE value on train set : 0.7735
---------NCE value on test set 1: 0.7775
[41]-----NCE value on train set : 0.7473
---------NCE value on test set 1: 0.7521
[51]-----NCE value on train set : 0.7308
---------NCE value on test set 1: 0.7355
[61]-----NCE value on train set : 0.7193
---------NCE value on test set 1: 0.7237
[71]-----NCE value on train set : 0.7106
---------NCE value on test set 1: 0.7151
[81]-----NCE value on train set : 0.7036
---------NCE value on test set 1: 0.7079
[91]-----NCE value on train set : 0.6979
---------NCE value on test set 1: 0.7023
[101]----NCE value on train set : 0.6931
---------NCE value on test set 1: 0.6978
[111]----NCE value on train set : 0.6891
---------NCE value on test set 1: 0.6940
[121]----NCE value on train set : 0.6855
---------NCE value on test set 1: 0.6907
[131]----NCE val



[1]------NCE value on train set : 1.3261
---------NCE value on test set 1: 1.3250
[11]-----NCE value on train set : 0.9402
---------NCE value on test set 1: 0.9373
[21]-----NCE value on train set : 0.8237
---------NCE value on test set 1: 0.8212
[31]-----NCE value on train set : 0.7749
---------NCE value on test set 1: 0.7725
[41]-----NCE value on train set : 0.7491
---------NCE value on test set 1: 0.7460
[51]-----NCE value on train set : 0.7326
---------NCE value on test set 1: 0.7292
[61]-----NCE value on train set : 0.7210
---------NCE value on test set 1: 0.7172
[71]-----NCE value on train set : 0.7123
---------NCE value on test set 1: 0.7084
[81]-----NCE value on train set : 0.7054
---------NCE value on test set 1: 0.7015
[91]-----NCE value on train set : 0.6997
---------NCE value on test set 1: 0.6957
[101]----NCE value on train set : 0.6950
---------NCE value on test set 1: 0.6909
[111]----NCE value on train set : 0.6910
---------NCE value on test set 1: 0.6867
[121]----NCE val



[11]-----NCE value on train set : 0.9406
---------NCE value on test set 1: 0.9360
[21]-----NCE value on train set : 0.8240
---------NCE value on test set 1: 0.8189
[31]-----NCE value on train set : 0.7751
---------NCE value on test set 1: 0.7701
[41]-----NCE value on train set : 0.7488
---------NCE value on test set 1: 0.7444
[51]-----NCE value on train set : 0.7319
---------NCE value on test set 1: 0.7285
[61]-----NCE value on train set : 0.7201
---------NCE value on test set 1: 0.7176
[71]-----NCE value on train set : 0.7111
---------NCE value on test set 1: 0.7097
[81]-----NCE value on train set : 0.7039
---------NCE value on test set 1: 0.7035
[91]-----NCE value on train set : 0.6980
---------NCE value on test set 1: 0.6987
[101]----NCE value on train set : 0.6931
---------NCE value on test set 1: 0.6947
[111]----NCE value on train set : 0.6889
---------NCE value on test set 1: 0.6913
[121]----NCE value on train set : 0.6852
---------NCE value on test set 1: 0.6886
[131]----NCE val



[11]-----NCE value on train set : 0.9371
---------NCE value on test set 1: 0.9451
[21]-----NCE value on train set : 0.8203
---------NCE value on test set 1: 0.8299
[31]-----NCE value on train set : 0.7714
---------NCE value on test set 1: 0.7812
[41]-----NCE value on train set : 0.7453
---------NCE value on test set 1: 0.7559
[51]-----NCE value on train set : 0.7287
---------NCE value on test set 1: 0.7395
[61]-----NCE value on train set : 0.7169
---------NCE value on test set 1: 0.7285
[71]-----NCE value on train set : 0.7082
---------NCE value on test set 1: 0.7208
[81]-----NCE value on train set : 0.7012
---------NCE value on test set 1: 0.7142
[91]-----NCE value on train set : 0.6955
---------NCE value on test set 1: 0.7092
[101]----NCE value on train set : 0.6908
---------NCE value on test set 1: 0.7048
[111]----NCE value on train set : 0.6867
---------NCE value on test set 1: 0.7010
[121]----NCE value on train set : 0.6831
---------NCE value on test set 1: 0.6978
[131]----NCE val



[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[1]------NCE value on train set : 1.3256
---------NCE value on test set 1: 1.3253
[11]-----NCE value on train set : 0.9386
---------NCE value on test set 1: 0.9404
[21]-----NCE value on train set : 0.8216
---------NCE value on test set 1: 0.8256
[31]-----NCE value on train set : 0.7724
---------NCE value on test set 1: 0.7783
[41]-----NCE value on train set : 0.7461
---------NCE value on test set 1: 0.7532
[51]-----NCE value on train set : 0.7294
---------NCE value on test set 1: 0.7373
[61]-----NCE value on train set : 0.7176
---------NCE value on test set 1: 0.7260
[71]-----NCE value on train set : 0.7087
---------NCE value on test set 1: 0.7179
[81]-----NCE value on train set : 0.7017
---------NCE value on test set 1: 0.7116
[91]-----NCE value on train set : 0.69

### Testing the model on out-of-sample data

Now that we have the optimal number of trees (1243), we can train the final version of the model on the full training set and test it on out-of-sample data with the ```predict()``` function. Note that the data passed to ```predict()``` must be a lightgbm Dataset object.
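
For reference, the metric reported on the test set can be sketched as the mean negative log probability assigned to the chosen alternative. This is an assumption about what `cross_entropy` computes, written in plain Python with made-up predictions; the helper name `mean_cross_entropy` is hypothetical.

```python
import math


def mean_cross_entropy(preds, labels, eps=1e-15):
    """Average of -log(p_chosen) over observations.

    preds: list of per-observation probability lists (rows sum to 1)
    labels: list of chosen alternative indices
    """
    total = 0.0
    for row, chosen in zip(preds, labels):
        # clip to avoid log(0) when a chosen alternative gets probability 0
        total -= math.log(max(row[chosen], eps))
    return total / len(labels)


# made-up predictions for three trips over four modes
preds = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.30, 0.60],
]
labels = [0, 2, 3]

print(round(mean_cross_entropy(preds, labels), 4))
```

Under this definition, a model that predicts uniform probabilities over four alternatives scores log(4) ≈ 1.386, which is a useful baseline when reading the training logs.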

In [None]:
general_params['num_iterations'] = int(num_trees)
general_params['early_stopping_round'] = None 

LPMCnested_model_fully_trained = rum_train(lgb_train_set, model_specification)

preds = LPMCnested_model_fully_trained.predict(lgb_test_set)

ce_test = cross_entropy(preds, lgb_test_set.get_label().astype(int))

print('-'*50)
print(f'Final negative cross-entropy on the test set: {ce_test}')

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000884 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 854
[LightGBM] [Info] Number of data points in the train set: 54766, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000443 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 854
[LightGBM] [Info] Number of data points in the train set: 54766, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003491 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1834
[LightGBM] [Info] Number of data points in the train set: 54766, number of used features: 23
[LightG



[11]-----NCE value on train set : 0.9392
[21]-----NCE value on train set : 0.8227
[31]-----NCE value on train set : 0.7738
[41]-----NCE value on train set : 0.7478
[51]-----NCE value on train set : 0.7311
[61]-----NCE value on train set : 0.7195
[71]-----NCE value on train set : 0.7108
[81]-----NCE value on train set : 0.7038
[91]-----NCE value on train set : 0.6981
[101]----NCE value on train set : 0.6934
Early stopping at iteration 0, with a best score on test set of 1000000.0, and on train set of 0.6933764991064469
--------------------------------------------------
Final negative cross-entropy on the test set: 0.7170935957482365


### Optimising $\mu$ with `scipy.optimize.minimize`

In [17]:
# specify nest
nest = {0: [0], 1: [1], 2: [2, 3]}

nested_structure = {
    "mu": np.array([1., 1., 1.25]),
    "nests": nest,
    "optimise_mu": [False, False, True],
    "optim_interval": 50,
}

In [20]:
_, _, folds = load_preprocess_LPMC(path="../Data/")

#model specification
model_specification = {
    "general_params": general_params,
    "rum_structure": rum_structure,
    "nested_logit": nested_structure,
}

#features and label column names
features = [f for f in LPMC_train.columns if f != "choice"]
label = "choice"

#create lightgbm dataset
lgb_train_set = lightgbm.Dataset(LPMC_train[features], label=LPMC_train[label], free_raw_data=False)
lgb_test_set = lightgbm.Dataset(LPMC_test[features], label=LPMC_test[label], free_raw_data=False)

ce_loss = []
num_trees = 0
optimised_mu = []

for train_idx, test_idx in folds:
    train_set = lgb_train_set.subset(sorted(train_idx))
    test_set = lgb_train_set.subset(sorted(test_idx))

    LPMC_model_trained = rum_train(train_set, model_specification, valid_sets=[test_set])

    ce_loss.append(LPMC_model_trained.best_score)
    num_trees += LPMC_model_trained.best_iteration
    optimised_mu.append(LPMC_model_trained.mu)

num_trees = num_trees / 5

for i, mu in enumerate(optimised_mu):
    print(f"iteration {i} --- optimised mu: {mu}, CE: {ce_loss[i]}")




[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000649 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 855
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000161 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 855
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000896 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1828
[LightGBM] [Info] Number of data points in the train set: 43812, number of used features: 23
[LightG



[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000272 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 852
[LightGBM] [Info] Number of data points in the train set: 43813, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000285 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 852
[LightGBM] [Info] Number of data points in the train set: 43813, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000789 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1826
[LightGBM] [Info] Number of data poi



[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000271 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1366
[LightGBM] [Info] Number of data points in the train set: 43814, number of used features: 20
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[1]------NCE value on train set : 1.4373
---------NCE value on test set 1: 1.4365
[11]-----NCE value on train set : 0.9870
---------NCE value on test set 1: 0.9832
[21]-----NCE value on train set : 0.8330
---------NCE value on test set 1: 0.8291
[31]-----NCE value on train set : 0.7707
---------NCE value on test set 1: 0.7679
[41]-----NCE value on train set : 0.7417
---------NCE value on test set 1: 0.7395
[51]-----NCE 



[1]------NCE value on train set : 1.4366
---------NCE value on test set 1: 1.4383
[11]-----NCE value on train set : 0.9838
---------NCE value on test set 1: 0.9917
[21]-----NCE value on train set : 0.8293
---------NCE value on test set 1: 0.8402
[31]-----NCE value on train set : 0.7673
---------NCE value on test set 1: 0.7782
[41]-----NCE value on train set : 0.7385
---------NCE value on test set 1: 0.7500
[51]-----NCE value on train set : 0.7219
---------NCE value on test set 1: 0.7339
[61]-----NCE value on train set : 0.7110
---------NCE value on test set 1: 0.7239
[71]-----NCE value on train set : 0.7030
---------NCE value on test set 1: 0.7166
[81]-----NCE value on train set : 0.6967
---------NCE value on test set 1: 0.7109
[91]-----NCE value on train set : 0.6916
---------NCE value on test set 1: 0.7065
[101]----NCE value on train set : 0.6873
---------NCE value on test set 1: 0.7028
[111]----NCE value on train set : 0.6837
---------NCE value on test set 1: 0.6998
[121]----NCE val



[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000226 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1367
[LightGBM] [Info] Number of data points in the train set: 43810, number of used features: 20
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[LightGBM] [Info] Using self-defined objective function
[1]------NCE value on train set : 1.4370
---------NCE value on test set 1: 1.4366
[11]-----NCE value on train set : 0.9851
---------NCE value on test set 1: 0.9858
[21]-----NCE value on train set : 0.8307
---------NCE value on test set 1: 0.8337
[31]-----NCE value on train set : 0.7684
---------NCE value on test set 1: 0.7747
[41]-----NCE value on train set : 0.7393
---------NCE value on test set 1: 0.7475
[51]-----NCE 

### Testing the model on out-of-sample data

Now that we have the optimal number of trees, we can train the final version of the model on the full training set and test it on out-of-sample data with the ```predict()``` function. Note that ```predict()``` expects the dataset as a lightgbm ```Dataset``` object.

In [None]:
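#train for the number of trees found by cross-validation, without early stopping
general_params['num_iterations'] = int(num_trees)
general_params['early_stopping_round'] = None

#retrain on the full training set
LPMCnested_model_fully_trained = rum_train(lgb_train_set, model_specification)

#predict choice probabilities on the held-out test set
preds = LPMCnested_model_fully_trained.predict(lgb_test_set)

#compare predicted probabilities with the observed choices
ce_test = cross_entropy(preds, lgb_test_set.get_label().astype(int))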

print('-'*50)
print(f'Final negative cross-entropy on the test set: {ce_test}')
print(f'Optimal mu: {LPMCnested_model_fully_trained.mu}')



[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000294 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 854
[LightGBM] [Info] Number of data points in the train set: 54766, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000225 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 854
[LightGBM] [Info] Number of data points in the train set: 54766, number of used features: 17
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001662 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1834
[LightGBM] [Info] Number of data poi
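
To make the reported test metric easier to interpret, here is a minimal NumPy sketch of a cross-entropy computation. It assumes the metric is the usual mean negative log-probability assigned to the chosen alternative. The function `cross_entropy_sketch` and the toy arrays are hypothetical and only for illustration. This is not the implementation of `rumboost.metrics.cross_entropy`.

```python
import numpy as np


def cross_entropy_sketch(probs, labels, eps=1e-15):
    """Mean negative log-probability of the chosen alternative.

    Hypothetical helper for illustration, not the RUMBoost implementation.
    probs: array of shape (n_observations, n_alternatives), rows sum to 1.
    labels: integer array of shape (n_observations,) with the chosen alternative.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # pick, for each observation, the probability predicted for the observed choice
    chosen = probs[np.arange(labels.shape[0]), labels]
    # clip to avoid log(0) when a chosen alternative gets zero probability
    return float(-np.mean(np.log(np.clip(chosen, eps, 1.0))))


# toy example with 2 observations and 4 alternatives (made-up numbers)
toy_probs = np.array([[0.70, 0.10, 0.10, 0.10],
                      [0.25, 0.25, 0.25, 0.25]])
toy_labels = np.array([0, 3])
toy_ce = cross_entropy_sketch(toy_probs, toy_labels)
print(f'Toy cross-entropy: {toy_ce:.4f}')
```

Under this definition, lower values are better, and a model that predicts uniform probabilities over four alternatives scores $\ln(4) \approx 1.386$.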

# References

Salvadé, N., & Hillel, T. (2024). RUMBoost: Gradient Boosted Random Utility Models. *arXiv preprint* [arXiv:2401.11954](https://arxiv.org/abs/2401.11954).

Hillel, T., Elshafie, M. Z. E. B., & Jin, Y. (2018). Recreating passenger mode choice-sets for transport simulation: A case study of London, UK. *Proceedings of the Institution of Civil Engineers - Smart Infrastructure and Construction*, 171, 29–42. [https://doi.org/10.1680/jsmic.17.00018](https://doi.org/10.1680/jsmic.17.00018)