In [None]:
import os
import sys

sys.path.append("../")

<img src="../docs/choice_learn_official_logo.png" width="256"> 

Choice-Learn is a Python package designed to help you build discrete choice models. In particular, you will find:

- Optimized **Data** handling with the ChoiceDataset object and ready-to-use datasets
- **Modelling** tools with:
    - Efficient well-known choice models
    - Customizable class ChoiceModel to build your own model
    - Estimation options such as the choice of optimization method (LBFGS, Gradient Descent, etc.)
- Various **Tools** built around choice models, such as an Assortment Optimizer


### Discrete Choice Modelling
Discrete choice models aim to explain or predict a choice made from a set of alternatives. Well-known use cases include analyzing people's choice of means of transport or product purchases in stores.

If you are new to choice modelling, you can check this [resource](https://www.publichealth.columbia.edu/research/population-health-methods/discrete-choice-model-and-analysis). 

### Tutorial
In this notebook we will walk through the estimation of a choice model step by step.

- [Data Handling](#Data:-items,-features-and-choices)
- [Modelling](#Modelling:-Estimation-and-choice-probabilities)



## Data: items, features and choices

The data structure for choice modelling is somewhat different from that of usual prediction use cases.
We consider a set of alternatives whose size can vary from one choice to the next. Each alternative is described by features, and exactly one alternative is chosen from the set. Context features (describing the customer or the time of the choice, for example) can also affect the choice.
Let's take an example where we want to predict a customer's next purchase.

Three different items, i<sub>1</sub>, i<sub>2</sub> and i<sub>3</sub> are sold and we have gathered a small dataset:

<table>
<tr><th>1st Purchase: </th><th>2nd Purchase:</th><th>3rd Purchase:</th></tr>

<tr><td>

**Shelf**:

| Item           | Price   | Promotion |
| -------------- | ------- | --------- |
| i<sub>1</sub>  | $100    | no        |
| i<sub>2</sub>  | $140    | no        |
| i<sub>3</sub>  | $200    | no        |

**Customer Purchase:** i<sub>1</sub>

</td><td>

**Shelf**:

| Item           | Price   | Promotion |
| -------------- | ------- | --------- |
| i<sub>1</sub>  | $100    | no        |
| i<sub>2</sub>  | $120    | yes       |
| i<sub>3</sub>  | $200    | no        |

**Customer Purchase:** i<sub>2</sub>

</td><td>

**Shelf**:

| Item           | Price        | Promotion    |
| -------------- | ------------ | ------------ |
| i<sub>1</sub>  | $100         | no           |
| i<sub>2</sub>  | Out-Of-Stock | Out-Of-Stock |
| i<sub>3</sub>  | $180         | yes          |

**Customer Purchase:** i<sub>3</sub>

</td></tr> </table>

Indexing the items in the same order, we create the ChoiceDataset as follows:

In [None]:
choices = [0, 1, 2] # Indexes of the items chosen

items_features_by_choice = [
    [
        [100, 0], # choice 1, Item 1 [price, promotion]
        [140, 0], # choice 1, Item 2 [price, promotion]
        [200, 0], # choice 1, Item 3 [price, promotion]
    ],
    [
        [100, 0], # choice 2, Item 1 [price, promotion]
        [120, 1], # choice 2, Item 2 [price, promotion]
        [200, 0], # choice 2, Item 3 [price, promotion]
    ],
    [
        [100, 0], # choice 3, Item 1 [price, promotion]
        [120, 1], # choice 3, Item 2 (out of stock: placeholder values)
        [180, 1], # choice 3, Item 3 [price, promotion]
    ],
]
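
The nested list above has one entry per choice, each holding one row per item, each row holding one value per feature. As a minimal sketch (using NumPy only, not the Choice-Learn API), the layout can be checked by converting it to an array and reading off its shape; the names `features` and `prices` are introduced here for illustration:

```python
import numpy as np

# Same values as in the cell above: [price, promotion] for each item of each choice.
items_features_by_choice = [
    [[100, 0], [140, 0], [200, 0]],
    [[100, 0], [120, 1], [200, 0]],
    [[100, 0], [120, 1], [180, 1]],
]

# Axis 0 indexes the choices, axis 1 the items, axis 2 the features.
features = np.asarray(items_features_by_choice, dtype=np.float32)
n_choices, n_items, n_features = features.shape
print(features.shape)  # (3, 3, 2)

# Slicing the last axis gives one feature for every (choice, item) pair.
prices = features[:, :, 0]
print(prices)
```

Keeping this (choices, items, features) ordering in mind makes it easier to spot a row that was entered in the wrong place.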

Item i<sub>2</sub> was out of stock during the last choice, so it could not have been chosen. To keep this information, we create a matrix indicating which items were available during each of the choices:

In [None]:
available_items_by_choice = [
    [1, 1, 1], # All items available for choice 1
    [1, 1, 1], # All items available for choice 2
    [1, 0, 1], # Item 2 not available for choice 3
]
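
To see why the availability matrix matters, here is a minimal sketch of how such a mask can enter a logit-style probability computation. It is an illustration under stated assumptions, not necessarily how Choice-Learn computes probabilities: the function `masked_choice_probabilities` and the value of `price_coefficient` are hypothetical.

```python
import numpy as np

def masked_choice_probabilities(utilities, availability):
    """Softmax over items, restricted to the items marked as available."""
    utilities = np.asarray(utilities, dtype=float)
    availability = np.asarray(availability, dtype=float)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    # Multiplying by the mask zeroes out the weight of unavailable items.
    weights = np.exp(shifted) * availability
    return weights / weights.sum(axis=1, keepdims=True)

# Prices from the example above; the out-of-stock item keeps a placeholder price.
prices = np.array([[100, 140, 200], [100, 120, 200], [100, 120, 180]], dtype=float)
price_coefficient = -0.01  # hypothetical value, chosen only for illustration
utilities = price_coefficient * prices

available_items_by_choice = [[1, 1, 1], [1, 1, 1], [1, 0, 1]]
probabilities = masked_choice_probabilities(utilities, available_items_by_choice)
print(probabilities)
```

In the third row the probability of the second item is exactly zero, and the remaining probability mass is shared between the two available items, whatever placeholder features were stored for the missing one.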

And now let's create the ChoiceDataset! We can also specify the feature names if we want to.

In [None]:
from choice_learn.data import ChoiceDataset

dataset = ChoiceDataset(
    choices=choices,
    items_features_by_choice=items_features_by_choice,
    items_features_by_choice_names=["price", "promotion"],
    available_items_by_choice=available_items_by_choice,
)
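
When assembling these inputs by hand, a quick sanity check is to confirm that every recorded choice points to an item that was actually available. The sketch below uses plain Python on the raw lists; the helper `find_unavailable_choices` is hypothetical and not part of Choice-Learn.

```python
choices = [0, 1, 2]
available_items_by_choice = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
]

def find_unavailable_choices(choices, availability):
    """Return the row indexes whose chosen item is flagged as unavailable."""
    return [
        row
        for row, (chosen, available) in enumerate(zip(choices, availability))
        if available[chosen] != 1
    ]

inconsistent_rows = find_unavailable_choices(choices, available_items_by_choice)
print(inconsistent_rows)  # [] means every chosen item was available
```

Had the third purchase been recorded as item index 1, the function would return `[2]`, flagging the row where the out-of-stock item was marked as chosen.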

## Modelling: Estimation and choice probabilities