# Introduction to choice-learn's data management

In [None]:
import os
import sys

sys.path.append("../")

## ChoiceDataset - Getting Started !

In order to estimate a model using the choice-learn API, you will first need to wrap your dataset within a ChoiceDataset.

The choice-learn ChoiceDataset is designed to handle large datasets, mainly by avoiding storing the same feature several times in memory.
We define two sources of features, the items and the contexts.

<ins>**Items**</ins> represent a product, an alternative that can be chosen by the customer at some point.

<ins>**Contexts**</ins> represent the contexts surrounding each choice. One context corresponds to one choice and regroups every factor that might be different from one choice to another.


From these two concepts, we define five types of data:

- **choices:** The main information, indicating which item/alternative has been chosen among all the available ones
- **fixed_items_features:** The items features that never change (e.g. size, color, etc...) over the choices/contexts.
  
  Size=number of items.
- **contexts_features:** It represents all the features that might change from one choice to another and that are **common** to all items (e.g. day of week, customer features, etc...).
  
  Size=number of choices.
- **contexts_items_features:** The features that are function of the item and of the context (e.g. prices change over contexts and are specific to each sold item, etc...).
  
  Size=number of choices x number of items
- **contexts_items_availabilities:** For each context it represents whether each item/alternative is proposed to the customer (1.) or not (0.).
  
  Size=number of choices x number of items.
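
To make the sizes listed above concrete, here is a minimal sketch that builds placeholder arrays for a toy problem. The sizes (4 choices, 3 items) and the number of features per array are assumptions chosen only for illustration; only the leading dimensions follow the description above.

```python
import numpy as np

n_choices, n_items = 4, 3  # toy sizes, assumed for illustration

# Index of the chosen item for each choice
choices = np.array([0, 2, 1, 1])
# Two features per item that never change (e.g. size, color)
fixed_items_features = np.zeros((n_items, 2))
# Three features per choice, common to all items (e.g. day of week)
contexts_features = np.zeros((n_choices, 3))
# One feature per (choice, item) pair (e.g. price)
contexts_items_features = np.zeros((n_choices, n_items, 1))
# 1. if the item is proposed for this choice, 0. otherwise
contexts_items_availabilities = np.ones((n_choices, n_items))

for name, array in [
    ("choices", choices),
    ("fixed_items_features", fixed_items_features),
    ("contexts_features", contexts_features),
    ("contexts_items_features", contexts_items_features),
    ("contexts_items_availabilities", contexts_items_availabilities),
]:
    print(name, array.shape)
```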


The easiest way to build a ChoiceDataset is from a pandas DataFrame, let's see how to do it !

## Hands-on: example from a DataFrame with ModeCanada

We will use the ModeCanada [1] dataset for this example. It is provided with the choice-learn package and can be loaded as follows:

In [None]:
import numpy as np
import pandas as pd

from choice_learn.data import ChoiceDataset
from choice_learn.datasets import load_modecanada

canada_transport_df = load_modecanada(as_frame=True)
canada_transport_df.head()

Unnamed: 0,case,alt,choice,dist,cost,ivt,ovt,freq,income,urban,noalt
1,1,train,0,83,28.25,50,66,4,45.0,0,2
2,1,car,1,83,15.77,61,0,0,45.0,0,2
3,2,train,0,83,28.25,50,66,4,25.0,0,2
4,2,car,1,83,15.77,61,0,0,25.0,0,2
5,3,train,0,83,28.25,50,66,4,70.0,0,2


An extensive description of the dataset can be found [here](https://www.ssc.wisc.edu/~bhansen/econometrics/Koppelman_description.pdf).
An extract indicates:

"The dataset was assembled in 1989 by VIA Rail (the Canadian national rail carrier) to estimate the demand for high-speed rail in the Toronto-Montreal corridor. The main information source was a Passenger Review administered to business travelers augmented by information about each trip. The observations consist of a choice between four modes of transportation (train, air, bus, car) with information about the travel mode and about the passenger. The posted dataset has been balanced to only include cases where all four travel modes are recorded. The file contains 11,116 observations on 2779 individuals.  "

Alright !
If we go back to our dataframe, we can see the following columns:
- case: an ID of the traveler
- alt: the alternative concerned by the row
- choice: 1 if the alternative was chosen, 0 otherwise
- dist: trip distance
- cost: trip cost
- ivt: in-vehicle travel time (minutes)
- ovt: out-of-vehicle travel time (minutes)
- income: household income of the traveler ($)
- urban: 1 if origin or destination is a large city
- noalt: the number of alternatives among which the traveler had to choose
- freq: the frequency of the alternative (0 for car)

Following our specification, we can see that one case corresponds to one customer and thus to one choice. In choice-learn terms, it corresponds to "one context": a set of available alternatives and their features/specificities, resulting in one choice.
Let's regroup our features:

**choices**
Easy ! It is the alternative for which the value of the choice column is 1.

**contexts_features**
The income, urban and distance features (and noalt, which is not really a feature) are the same for all the alternatives within a context: they are contexts_features. They are all constant with respect to a case=traveler ID.

**contexts_items_features**
Ivt, ovt, cost and freq depend on the alternative and change over the contexts. They are contexts_items_features.

**contexts_items_availabilities**
It is not directly indicated, but it can easily be deduced. Whenever an alternative is not available, it has no row for its case. For example, for case=1, our first context, only train and car are given as alternatives, meaning that air and bus could not be chosen/were not available.
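
The deduction described above can be sketched in plain Python. This is only an illustration of the idea, not what the library does internally; the `rows` list is a hand-copied extract of the (case, alt) pairs for the first two cases.

```python
items = ["air", "bus", "car", "train"]
# (case, alt) pairs of the long-format rows for the first two cases
rows = [(1, "train"), (1, "car"), (2, "train"), (2, "car")]

# Collect the alternatives that appear for each case
cases = sorted({case for case, _ in rows})
present = {case: set() for case in cases}
for case, alt in rows:
    present[case].add(alt)

# One row per case, one column per item: 1. if the item has a row, 0. otherwise
availabilities = [
    [1.0 if item in present[case] else 0.0 for item in items] for case in cases
]
print(availabilities)  # [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
```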

Okay, but we are missing fixed_items_features... Indeed there isn't really any in this dataset. Let's create one for the example.
We will create is_public, indicating whether an alternative is public transportation (1) or private transportation (0).

In [None]:
transport_df = canada_transport_df.copy()
items = ["air", "bus", "car", "train"]

# Add "is_public" feature for transport modes
transport_df["is_public"] = transport_df.apply(lambda row: 0. if row.alt == "car" else 1., axis=1)

# Just some typing
transport_df.income = transport_df.income.astype("float32")

# Let's take a look at our new df:
transport_df.head()

Our feature, is_public is 0 for the car and 1 for all other alternatives, seems fine! We can now create our ChoiceDataset !\
*Note that you do NOT need each type of feature, here the purpose was to give a complete example.*

### Creating a ChoiceDataset from this *single* dataframe

In order to create the ChoiceDataset from the DataFrame, we need to specify:
- the column in which the choice is given
- the column where the item is identified 
- the column where the context is identified
- the columns representing the fixed_items_features
- the columns representing the contexts_features
- the columns representing the contexts_items_features


For our Canada Transport example, here is how it should be done:

In [None]:
dataset = ChoiceDataset.from_single_long_df(
    df=transport_df,
    choices_column="choice",
    items_id_column="alt",
    contexts_id_column="case",
    fixed_items_features_columns=["is_public"],
    contexts_features_columns=["income", "urban", "dist"],
    contexts_items_features_columns=["cost", "freq", "ovt", "ivt"],
    choice_format="one_zero")

The last argument, "choice_format", specifies how the choice is encoded in the dataframe. Currently two modes are available:

 - *one_zero*:
The choice column contains a 0 when the alternative/item is not chosen in the session and a 1 if it is chosen.
This is the case here with Canada Transport.
 - *item_id*:
The choice column contains the id of the choice during the session. The id corresponds to the values used in the column 'items_id_column'.
In the case of Canada Transport, the dataframe would need to be:

| | case | alt | choice | dist | cost | ivt | ovt | freq | income | urban | noalt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | train | car | 83 | 28.25 | 50 | 66 | 4 | 45 | 0 | 2 |
| 2 | 1 | car | car | 83 | 15.77 | 61 | 0 | 0 | 45 | 0 | 2 |
| 3 | 2 | train | car | 83 | 28.25 | 50 | 66 | 4 | 25 | 0 | 2 |
| 4 | 2 | car | car | 83 | 15.77 | 61 | 0 | 0 | 25 | 0 | 2 |
| 5 | 3 | train | car | 83 | 28.25 | 50 | 66 | 4 | 70 | 0 | 2 |

In the first 5 examples, the chosen transportation is always the car.
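
As an illustration of the difference between the two formats, the following sketch converts a few *one_zero* rows into the *item_id* encoding. It uses plain dictionaries and hypothetical variable names, and is not part of the choice-learn API.

```python
# A few long-format rows in the one_zero encoding (other columns omitted)
rows = [
    {"case": 1, "alt": "train", "choice": 0},
    {"case": 1, "alt": "car", "choice": 1},
    {"case": 2, "alt": "train", "choice": 0},
    {"case": 2, "alt": "car", "choice": 1},
]

# For each case, find the alternative whose choice flag is 1
chosen_by_case = {row["case"]: row["alt"] for row in rows if row["choice"] == 1}

# Rewrite the choice column so that it holds the id of the chosen item
item_id_rows = [{**row, "choice": chosen_by_case[row["case"]]} for row in rows]
for row in item_id_rows:
    print(row)
```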

The ChoiceDataset is ready !

If your DataFrame is in the wide format, you can use the equivalent method *from_single_wide_df*. An example on the SwissMetro dataset can be found [here](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/dataset_creation.ipynb).

You now have three possibilities to continue discovering the choice-learn package:
- You can directly go [here]() to the modelling tutorial if you want to understand how a first simple ConditionalMNL would be implemented.
- You can go [here]() if your dataset is organized differently to see all the different ways to instantiate a ChoiceDataset. In particular, it helps if your data is split into several DataFrames or if you have another data format.
- Or you can continue this current tutorial to better understand the ChoiceDataset machinery and everything there is to know about it.

Whatever your choice, you can also check [here](#ready-to-use-datasets) the list of open source datasets available directly with the package.

## Hands-on: example from NumPy arrays

Let's see an example of ChoiceDataset instantiation from numpy arrays.

Let's consider three *items* whose *features* are: size, weight, price and promotion (simply a boolean indicating whether the item is under promotion).

Size and weight are stored as *fixed items features* since they don't change. Price and promotion are stored as *contexts items features*, since they may change for each context.

For the *contexts*, we will consider the customer attributes: budget and age.

In [None]:
# Choices:
# Customer 1 bought item 1
# Customer 2 bought item 3
# Customer 1 bought item 2

choices = [0, 2, 1]

fixed_items_features = [
    [1, 2], # item 1 [size, weight]
    [2, 4], # item 2 [size, weight]
    [1.5, 1.5], # item 3 [size, weight]
]

contexts_features = [
    [100, 20], # choice 1, customer 1 [budget, age]
    [200, 40], # choice 2, customer 2 [budget, age]
    [80, 20], # choice 3, customer 1 [budget, age]
]

contexts_items_features = [
    [
        [100, 0], # choice 1, Item 1 [price, promotion]
        [140, 0], # choice 1, Item 2 [price, promotion]
        [200, 0], # choice 1, Item 3 [price, promotion]
    ],
    [
        [100, 0], # choice 2, Item 1 [price, promotion]
        [120, 1], # choice 2, Item 2 [price, promotion]
        [200, 0], # choice 2, Item 3 [price, promotion]
    ],
    [
        [100, 0], # choice 3, Item 1 [price, promotion]
        [120, 1], # choice 3, Item 2 [price, promotion]
        [180, 1], # choice 3, Item 3 [price, promotion]
    ],
]

contexts_items_availabilities = [
    [1, 1, 1], # All items available at choice 1
    [1, 1, 1], # All items available at choice 2
    [0, 1, 1], # Item 1 not available at choice 3
]

Note that in fixed_items_features and contexts_items_features, the features need to be consistently ordered:
- The features are ordered the same way for all items
- The items are ordered according to the index used in choices. This applies to both fixed_items_features and contexts_items_features


**fixed_items_features** = [[item1_featureA, item1_featureB, ...], [item2_featureA, item2_featureB, ...], ...]

**contexts_items_features** = [[[context1_item1_featureA, ...], [context1_item2_featureA, ...]], [[context2_item1_featureA, ...], [context2_item2_featureA, ...]], ...]

**choices** then represents the index of the chosen item: 0 when item1 is chosen, 1 when item2 is chosen, etc..., e.g. [0, 0, 2, 1, ...]
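
If your raw data stores the chosen item by name, the ordering rule above can be applied with a simple lookup table. A minimal sketch, assuming hypothetical item names "item1", "item2" and "item3":

```python
# Order in which the items appear in the features arrays
items = ["item1", "item2", "item3"]
item_index = {name: position for position, name in enumerate(items)}

# Names of the chosen items, one per choice
chosen = ["item1", "item3", "item2"]

# Index of the chosen item, following the order of the features arrays
choices = [item_index[name] for name in chosen]
print(choices)  # [0, 2, 1]
```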

In [None]:
dataset = ChoiceDataset(
    choices=choices,
    fixed_items_features=fixed_items_features,
    fixed_items_features_names=["size", "weight"], # You can specify the names of the features if you want
    contexts_features=contexts_features,
    contexts_features_names=["budget", "age"], # same, not mandatory
    contexts_items_features=contexts_items_features,
    contexts_items_features_names=["price", "promotion"], # same, not mandatory
    contexts_items_availabilities=contexts_items_availabilities,
)

In [None]:
dataset.contexts_items_features[0].shape

(3, 3, 2)

ChoiceDataset is indexed by choice. You can use [] to subset it.
It is particularly useful for train/test split:

In [None]:
print(dataset[0])

In [None]:
train_index = [0, 1]
test_index = [2]
train_dataset = dataset[train_index]
test_dataset = dataset[test_index]
print("Train Dataset length:", len(train_dataset), "Test Dataset length:", len(test_dataset))

Some choices never happen in the dataset: {1}
Some choices never happen in the dataset: {0, 2}
Train Dataset length: 2 Test Dataset length: 1
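
For larger datasets you would usually draw the split at random rather than write the indexes by hand. The sketch below only builds the two index lists with the standard library; the dataset length of 10 and the 80/20 ratio are assumptions. The resulting lists can then be used to index a ChoiceDataset with `[]` as shown above.

```python
import random

n_choices = 10  # hypothetical dataset length, i.e. len(dataset)
indexes = list(range(n_choices))
random.Random(42).shuffle(indexes)  # seeded so that the split is reproducible

n_train = int(0.8 * n_choices)
train_index = sorted(indexes[:n_train])
test_index = sorted(indexes[n_train:])
print(len(train_index), len(test_index))

# train_dataset = dataset[train_index]
# test_dataset = dataset[test_index]
```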


If you want to access the features, you can use the .batch attribute with choice indexes.
It returns the features in this order:

- items_features (n_items, n_items_features)
- contexts_features (n_choices, n_contexts_features)
- contexts_items_features (n_choices, n_items, n_contexts_items_features)
- contexts_items_availabilities (n_choices, n_items)
- choices (n_choices,)

As a reminder, we have as many contexts as we have choices in the dataset !

| index | feature  | shape  |   
|---|---|---|
| 0 | items_features | (n_items, n_items_features) |
| 1 | contexts_features | (n_choices, n_contexts_features) |
| 2 | contexts_items_features | (n_choices, n_items, n_contexts_items_features) |
| 3 | contexts_items_availabilities | (n_choices, n_items) |
| 4 | choices | (n_choices,) |

In [None]:
contexts_indexes = [0, 1]
print("Items features:", train_dataset.batch[contexts_indexes][0])
print("Contexts features:", train_dataset.batch[contexts_indexes][1])
print("Contexts Items features:", train_dataset.batch[contexts_indexes][2])
print("Contexts Items Availabilities features:", train_dataset.batch[contexts_indexes][3])
print("Contexts Choices:", train_dataset.batch[contexts_indexes][4])

Items features: (array([[1. , 2. ],
       [2. , 4. ],
       [1.5, 1.5]], dtype=float32),)
Contexts features: (array([[100,  20],
       [200,  40]], dtype=int32),)
Contexts Items features: (array([[[100,   0],
        [140,   0],
        [200,   0]],

       [[100,   0],
        [120,   1],
        [200,   0]]], dtype=int32),)
Contexts Items Availabilities features: [[1. 1. 1.]
 [1. 1. 1.]]
Contexts Choices: [0 2]


To simplify the iteration over the dataset you can call the .iter_batch method, with the batch_size argument.

Note that batch_size=-1 returns the whole dataset
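
To give an intuition of what batching over choices means, here is a small generator that only produces the choice indexes of each batch. It is a sketch with a hypothetical name (`iter_index_batches`) and does not reproduce the library's implementation, which also gathers the corresponding features.

```python
def iter_index_batches(n_choices, batch_size):
    """Yield lists of consecutive choice indexes.

    batch_size=-1 yields a single batch containing every index.
    """
    if batch_size == -1:
        batch_size = max(n_choices, 1)
    for start in range(0, n_choices, batch_size):
        yield list(range(start, min(start + batch_size, n_choices)))


print(list(iter_index_batches(n_choices=3, batch_size=2)))   # [[0, 1], [2]]
print(list(iter_index_batches(n_choices=3, batch_size=-1)))  # [[0, 1, 2]]
```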

In [None]:
# All the features are returned for each batch, in order to compute utility and NegativeLogLikelihood
for i, batch in enumerate(dataset.iter_batch(batch_size=1)):
    print(i, batch)

0 (array([[1. , 2. ],
       [2. , 4. ],
       [1.5, 1.5]], dtype=float32), array([[100,  20]], dtype=int32), array([[[100,   0],
        [140,   0],
        [200,   0]]], dtype=int32), array([[1., 1., 1.]], dtype=float32), array([0], dtype=int32))
1 (array([[1. , 2. ],
       [2. , 4. ],
       [1.5, 1.5]], dtype=float32), array([[200,  40]], dtype=int32), array([[[100,   0],
        [120,   1],
        [200,   0]]], dtype=int32), array([[1., 1., 1.]], dtype=float32), array([2], dtype=int32))
2 (array([[1. , 2. ],
       [2. , 4. ],
       [1.5, 1.5]], dtype=float32), array([[80, 20]], dtype=int32), array([[[100,   0],
        [120,   1],
        [180,   1]]], dtype=int32), array([[0., 1., 1.]], dtype=float32), array([1], dtype=int32))


**Stacking features when building the ChoiceDataset**

If you need to keep a clear distinction between different features, you can use stacking in the ChoiceDataset. In this case, you need to provide the additional features arrays indexed the same. It is possible to stack: *items_features*, *contexts_features*, *contexts_items_features*.

For example, if we have two kinds of items_features and we do not want them to be within the same np.ndarray, we can proceed as follows:

In [None]:
items_features_2 = [
    [11, 12], # item 1 
    [12, 14], # item 2 
    [11.5, 11.5], # item 3 
]
dataset = ChoiceDataset(
    # Here items_features specified as a tuple of the two features lists
    fixed_items_features=(fixed_items_features, items_features_2),
    contexts_features=contexts_features,
    contexts_items_features=contexts_items_features,
    contexts_items_availabilities=contexts_items_availabilities,
    choices=choices,
)

When indexing or batching your ChoiceDataset, you will now get items_features as a tuple, with elements corresponding to (items_features, items_features_2)

In [None]:
dataset.batch[0]

((array([[1. , 2. ],
         [2. , 4. ],
         [1.5, 1.5]], dtype=float32),
  array([[11. , 12. ],
         [12. , 14. ],
         [11.5, 11.5]], dtype=float32)),
 array([100,  20], dtype=int32),
 array([[100,   0],
        [140,   0],
        [200,   0]], dtype=int32),
 array([1, 1, 1], dtype=object),
 0)

## More Advanced use: the FeaturesStorage & RAM optimization

## FeaturesStorage, why should I use it ?
You will often have features that repeat themselves over several choices. It can happen if the same customer appears several times, if you have store features or if you use OneHot representations... And those are only examples.

The FeaturesStorage object is designed to help you better handle these cases. It is mainly built to work well with ChoiceDataset, but here is a small introduction on how it works:

Let's consider a case where we consider three supermarkets: 
- supermarket_1 with surface of 100 and 250 average nb of customers
- supermarket_2 with surface of 150 and 500 average nb of customers
- supermarket_3 with surface of 80 and 100 average nb of customers 

In each store, we have 3 available products for which we have little information. For the sake of the example, let's consider the following utility:
$$U(i) = u_i + \beta_1 \cdot S_s + \beta_2 \cdot C_s$$
With $S_s$ the surface of the store and $C_s$ its average number of customers.

We want to estimate the base utilities $u_i$ and the two coefficients: $\beta_1$ and $\beta_2$.
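
As a worked illustration of the utility formula, the sketch below evaluates it for supermarket_1 (surface 100, 250 customers on average). The values of $u_i$, $\beta_1$ and $\beta_2$ are made up, and turning utilities into probabilities with a softmax is an assumption corresponding to a multinomial logit.

```python
import math

base_utilities = [0.5, 0.0, -0.5]  # u_i, hypothetical values for three items
beta_1, beta_2 = 0.01, 0.002       # hypothetical coefficients
surface, customers = 100, 250      # S_s and C_s for supermarket_1

# U(i) = u_i + beta_1 * S_s + beta_2 * C_s
utilities = [u + beta_1 * surface + beta_2 * customers for u in base_utilities]

# Softmax over the utilities (assumed multinomial logit), shifted for stability
max_utility = max(utilities)
exp_utilities = [math.exp(u - max_utility) for u in utilities]
probabilities = [e / sum(exp_utilities) for e in exp_utilities]

print("Utilities:", utilities)
print("Probabilities:", probabilities)
```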

Let's start with creating a ChoiceDataset without the FeaturesStorage:

In [None]:
# Here are our choices:
choices = [0, 1, 2, 0, 2, 1, 1, 0, 2, 1, 2, 0, 2, 0, 1, 2, 1, 0]
supermarket_features = [[100, 250], [150, 500], [80, 100]]
# The sequence of supermarkets in which the choices were made:
supermarkets_sequence = [1, 1, 2, 3, 2, 1, 2, 1, 1, 2, 3, 2, 1, 2, 2, 3, 1, 2]

# The usual way to store the features would be to create the contexts_features array that contains
# the right features:
usual_supermarket_features = np.array([supermarket_features[supermarket_id - 1] for supermarket_id in supermarkets_sequence])
print("Usual Supermarket Features Shape:", usual_supermarket_features.shape)

# And now we can create our ChoiceDataset:

usual_dataset = ChoiceDataset(choices=choices,
                              fixed_items_features=np.eye(3),
                              contexts_features=usual_supermarket_features)

Usual Supermarket Features Shape: (18, 2)


Now that we have our dataset, we only need to create our ChoiceModel and we are good to go. However, it would be natural to feel unsatisfied because the dataset is not well optimized: the same information is repeated many times.

In our small use case it does not really matter, but with hundreds of stores and several millions - or billions - of choices, it would become... unreasonable!
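
A quick back-of-the-envelope count shows the order of magnitude. The figures below (5 million choices, 300 stores, 20 features per store) are hypothetical, and the count assumes that storing features by id costs one id per choice plus one feature row per store.

```python
n_choices = 5_000_000  # hypothetical number of choices
n_stores = 300         # hypothetical number of stores
n_features = 20        # hypothetical number of features per store

# Repeating the store features for every choice
dense_values = n_choices * n_features

# Storing each store once, plus one store id per choice
by_id_values = n_stores * n_features + n_choices

print("Values stored when repeating features:", dense_values)
print("Values stored when indexing by id:", by_id_values)
print("Ratio:", round(dense_values / by_id_values, 2))
```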

Let's now welcome the FeaturesStorage to help us:

In [None]:
from choice_learn.data import FeaturesStorage

features_dict = {f"supermarket_{i+1}": supermarket_features[i] for i in range(3)}
storage = FeaturesStorage(values=features_dict, name="supermarket_features")

# Let's see how we can use this bad boy:

The FeaturesStorage is basically a Python dictionary with a wrapper to easily get batches of data.\
You can ask for a sequence of features with .batch. It works with the keys of our dictionary, which can be int, float, str, etc...
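
To make the dictionary analogy concrete, here is a toy stand-in written in plain Python. `TinyStorage` is a hypothetical class used only to illustrate the lookup idea; it is not the library's FeaturesStorage and exposes `batch` as a method rather than through `[]`.

```python
class TinyStorage:
    """Toy illustration of a dictionary-backed features lookup."""

    def __init__(self, values):
        self.values = dict(values)

    def batch(self, ids):
        """Return one feature row for a single id, or a list of rows for a list of ids."""
        if isinstance(ids, list):
            return [self.values[key] for key in ids]
        return self.values[ids]


toy = TinyStorage({"supermarket_1": [100, 250], "supermarket_2": [150, 500]})
single = toy.batch("supermarket_1")
rows = toy.batch(["supermarket_1", "supermarket_2", "supermarket_1"])
print(single)  # [100, 250]
print(rows)    # [[100, 250], [150, 500], [100, 250]]
```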

In [None]:
print("Retrieving features of first supermarket:")
print(storage.batch["supermarket_1"])
print("Retrieving a batch of features:")
print(storage.batch[["supermarket_1", "supermarket_2", "supermarket_1"]])

Retrieving features of first supermarket:
[100 250]
Retrieving a batch of features:
[[100 250]
 [150 500]
 [100 250]]


The FeaturesStorage is handy for its transparent use with ChoiceDataset. For it to work well you need:
- to specify a FeaturesStorage name
- to match FeaturesStorage ids with the sequence

In our case, we call our FeaturesStorage "supermarket_features". The ids are now strings, so let's make the sequence match:

In [None]:
str_supermarkets_sequence = [[f"supermarket_{i}"] for i in supermarkets_sequence]

And now we can create our ChoiceDataset:

In [None]:
storage_dataset = ChoiceDataset(choices=choices,
                                contexts_features=str_supermarkets_sequence,
                                contexts_features_names=["supermarket_features"],
                                fixed_items_features=np.eye(3),
                                features_by_ids=[storage],
)

If you have paid attention, we have specified the FeaturesStorage in the features_by_ids argument and we HAVE TO match the contexts_features_names column with the name of the FeaturesStorage.\
When a batch of data is requested, the ChoiceDataset will look into the FeaturesStorage called "supermarket_features" to match the values in contexts_features with the ones stored in it.

In [None]:
batch = storage_dataset.batch[0]
print("Batch Fixed Items Features:", batch[0])
print("Batch Contexts Features:", batch[1])
print("Batch Choice:", batch[4])
print("%-------------------------%")
batch = storage_dataset.batch[[1, 2, 3]]
print("Batch Fixed Items Features:", batch[0])
print("Batch Contexts Features:", batch[1])
print("Batch Choice:", batch[4])
print("%-------------------------%")
batch = storage_dataset.batch[[0, 1, 5]]
print("Batch Fixed Items Features:", batch[0])
print("Batch Contexts Features:", batch[1])
print("Batch Choice:", batch[4])

Batch Fixed Items Features: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Batch Contexts Features: [100 250]
Batch Choice: 0
%-------------------------%
Batch Fixed Items Features: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Batch Contexts Features: [[100 250]
 [150 500]
 [ 80 100]]
Batch Choice: [1 2 0]
%-------------------------%
Batch Fixed Items Features: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Batch Contexts Features: [[100 250]
 [100 250]
 [100 250]]
Batch Choice: [0 1 1]


Everything is mapped as needed. And the great thing is that you can easily mix ''classical'' features with FeaturesStorages.\
Let's add a 'is_week_end' feature to our problem that will also be stored as a contexts_features.
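
Conceptually, the id column is replaced by the stored features and the classical columns are kept alongside. The plain-Python sketch below illustrates this for three choices of our example; the variable names are hypothetical and the column ordering shown is an assumption, since the library decides the actual layout.

```python
store_features = {
    "supermarket_1": [100, 250],
    "supermarket_2": [150, 500],
    "supermarket_3": [80, 100],
}

# (store id, is_week_end) pairs for three choices
rows = [("supermarket_1", 0), ("supermarket_2", 0), ("supermarket_3", 1)]

# Replace each id by its stored features and append the classical feature
resolved = [store_features[store_id] + [is_week_end] for store_id, is_week_end in rows]
print(resolved)  # [[100, 250, 0], [150, 500, 0], [80, 100, 1]]
```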

In [None]:
contexts_features = pd.DataFrame({"supermarket_features": np.array(str_supermarkets_sequence).squeeze(),
"is_week_end": [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]})
contexts_features.head()

Unnamed: 0,supermarket_features,is_week_end
0,supermarket_1,0
1,supermarket_1,0
2,supermarket_2,0
3,supermarket_3,1
4,supermarket_2,1


In [None]:
# Creation of the ChoiceDataset
storage_dataset = ChoiceDataset(choices=choices,
                                contexts_features=contexts_features,
                                fixed_items_features=np.eye(3),
                                features_by_ids=[storage],
)

In [None]:
# And now it's ready
batch = storage_dataset.batch[[1, 2, 3]]
print("Batch Fixed Items Features:", batch[0])
print("Batch Contexts Features:", batch[1])
print("Batch Choice:", batch[4])

Batch Fixed Items Features: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Batch Contexts Features: [[100 250   0]
 [150 500   0]
 [ 80 100   1]]
Batch Choice: [1 2 0]


### Specific sub-example: the OneHot Storage
A recurring use case is the **OneHot** representation of features. The OneHotStorage is built specifically for one-hot encoded features and further reduces memory consumption. It is used the same way as FeaturesStorage, but behind the scenes it only keeps the index of the 1 for each element and builds the one-hot vector only when needed.
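
The idea of keeping only an index and building the vector on demand can be sketched in a few lines of plain Python. This is an illustration with hypothetical names, not the OneHotStorage implementation.

```python
ids = ["a", "b", "c"]
# Only one integer is kept per id
index_of = {key: position for position, key in enumerate(ids)}


def one_hot(key):
    """Build the one-hot vector of an id only when it is requested."""
    vector = [0] * len(index_of)
    vector[index_of[key]] = 1
    return vector


batch = [one_hot(key) for key in ["a", "c"]]
print(index_of)  # {'a': 0, 'b': 1, 'c': 2}
print(batch)     # [[1, 0, 0], [0, 0, 1]]
```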

In [None]:
from choice_learn.data import OneHotStorage

In [None]:
storage = OneHotStorage(ids=["a", "b", "c"])

print("RAM storage of the OneHotStore:", storage.storage)
# When indexing with .batch, we can access the one-hot encoding of the element using its id
print("One-hot vector batch: storage.batch['a']", storage.batch["a"])
print("One-hot vector batch: storage.batch[['a', 'b', 'c', 'c', 'b', 'a']]")
print(storage.batch[["a", "b", "c", "c", "b", "a"]])

RAM storage of the OneHotStore: {'a': 0, 'b': 1, 'c': 2}
One-hot vector batch: storage.batch['a'] [1 0 0]
One-hot vector batch: storage.batch[['a', 'b', 'c', 'c', 'b', 'a']]
[[1 0 0]
 [0 1 0]
 [0 0 1]
 [0 0 1]
 [0 1 0]
 [1 0 0]]


**Note that:**
- we use strings as ids for the example, however we recommend to use integers.
- FeaturesStorage can be instantiated from dict, np.ndarray, list, pandas.DataFrame, etc...
- More in-depth examples and explanations can be found [here](./features_byID_example.ipynb)

## Ready-to-use datasets
A few well-known open source datasets are directly integrated in the package and can be downloaded in one line:
- SwissMetro from Bierlaire et al (2001) [2]
- ModeCanada from Koppelman et al. (1993) [1]
- The Train dataset from Ben Akiva et al. (1993) [4]
- The Heating & Electricity datasets from Kenneth Train [3]
- The TaFeng dataset from Kaggle [5]

If you feel like another open-source dataset should be included, reach out !

In [None]:
from choice_learn.datasets import load_swissmetro, load_modecanada, load_train, load_heating, load_electricity, load_tafeng

canada_choice_dataset = load_modecanada()
swissmetro_choice_dataset = load_swissmetro()

The datasets can also be downloaded as dataframes:

In [None]:
swissmetro_df = load_swissmetro(as_frame=True)
swissmetro_df.head()

Unnamed: 0,GROUP,SURVEY,SP,ID,PURPOSE,FIRST,TICKET,WHO,LUGGAGE,AGE,...,TRAIN_CO,TRAIN_HE,SM_TT,SM_CO,SM_HE,SM_SEATS,CAR_TT,CAR_CO,CHOICE,CAR_HE
0,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,3.0,...,48.0,120.0,63.0,52.0,20.0,0.0,117.0,65.0,2.0,0.0
1,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,3.0,...,48.0,30.0,60.0,49.0,10.0,0.0,117.0,84.0,2.0,0.0
2,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,3.0,...,48.0,60.0,67.0,58.0,30.0,0.0,117.0,52.0,2.0,0.0
3,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,3.0,...,40.0,30.0,63.0,52.0,20.0,0.0,72.0,52.0,2.0,0.0
4,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,3.0,...,36.0,60.0,63.0,42.0,20.0,0.0,90.0,84.0,2.0,0.0


### References
[1] Koppelman et al. (1993), *Application and Interpretation of Nested Logit Models of Intercity Mode Choice*\
[2] Bierlaire, M., Axhausen, K. and Abay, G. (2001), *The Acceptance of Modal Innovation: The Case of SwissMetro*\
[3] Train, K.E. (2003) *Discrete Choice Methods with Simulation.* Cambridge University Press.\
[4] Ben-Akiva M.; Bolduc D.; Bradley M. (1993) *Estimation of Travel Choice Models with Randomly Distributed Values of Time*\
[5] https://www.kaggle.com/datasets/chiranjivdas09/ta-feng-grocery-dataset