# Notebook #01: Recommender Systems in Practice

# Designing and evaluating a recommendation algorithm

In this notebook, we will focus on becoming familiar with the recommendation pipeline through a custom Python toolbox, in the simplest possible way. First, we will setup the working environment in GDrive. Then, we will follow the experimental pipeline syep by step, by:
- loading the Movielens 1M dataset; 
- performing a train-test splitting;
- creating a pointwise / pairwise / random / mostpop recommendation object;
- training the model, when applicable;
- computing the user-item matrix of predicted relevance scores;
- calculating a set of evaluation metrics, such as Normalized Discounter Cumulative Gain (NDCG), Coverage, and Novelty. 

The trained models, together with the partial computation we will save (e.g., user-item relevance matrix or metrics), will be the starting point of the investigation and the treatment covered by the other Jupyter notebooks.

<div class="alert alert-warning">
    <h1>Warm-up: Setup the working environment for this notebook</h1> <br>
</div>

- Python 3.6
- Package Requirements: pandas, numpy, scipy, matplotlib, scikit-learn, tensorflow. 
- GDrive storage requirements: ~1GB

This step serves to mount GDrive storage within this Jupyter notebook. The command will request us to give access permissions to this notebook, so that we will be able to clone the project repository when we desire. Please follow the prompted instructions.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

We will clone the project repository in our My Drive folder. If you wish to change the target folder, please modify the command below.

In [None]:
%cd /content/gdrive/My Drive/

## Clone the Github repository into GDrive

If you want to work with the codebase locally in your laptop, you should start to run the following commands.

In [None]:
! git clone https://github.com/biasinrecsys/icdm2020.git

We will move to the project folder in order to install the required packages. 

In [None]:
%cd icdm2020

In [None]:
! ls

In [None]:
! pip install -r requirements.txt

We will configure the notebooks directory as our working directory in order to simulate a local notebook execution. 

In [None]:
%cd ./notebooks

## Import Python packages and create the folders for pre-computed results

In [1]:
import sys 
import os

sys.path.append(os.path.join('..'))

In [2]:
import pandas as pd
import numpy as np

In [79]:
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
from helpers.train_test_splitter import *
from models.pointwise import PointWise
from models.pairwise import PairWise
from models.mostpop import MostPop
from models.random import Random
from helpers.utils import *

We will define the subfolders in **./data** where we will store our pre-computed results. For each dataset:

- *data/outputs/splits* will include two csv files including the train and test interactions, according with the selected train-test split rule. 
- *data/outputs/instances* will include a csv file with instances to be fed to the model, either pairs for point-wise or triplets for pair-wise recommenders.
- *data/outputs/models* will include a h5 file associated with a pre-trained recommender model.  
- *data/outputs/predictions* will include a numpy file representing a user-item matrix; a cell stores the relevance score of an item for a given user.
- *data/outputs/metrics* will include a pickle dictionary with the computed evaluation metrics for a given recommender model. 

**N.B.** This strategy will allow us to play with the intermediate outputs of the pipeline, without starting from scratch any time (e.g., for performing a bias treatment as a post-processing, we just need to load the predictions of a model to start). 

In [4]:
data_path = '../data/'

In [None]:
!mkdir '../data/outputs'
!mkdir '../data/outputs/splits'
!mkdir '../data/outputs/instances'
!mkdir '../data/outputs/models'
!mkdir '../data/outputs/predictions'
!mkdir '../data/outputs/metrics'

<div class="alert alert-warning">
    <h1>Step 1: Load data</h1> <br>
</div>

First, we will load the **Movielens 1M** dataset, which has been pre-arranged in order to comply with the following structure: user_id, item_id, rating, timestamp, type (label for the item category), and type_id (unique id of the item category). For the sake of tutorial easiness, we assume here that each item is randomly assigned to one of its categories in the original dataset. 

**N.B.** This toolbox is flexible enough to integrate any other dataset in csv format that has the same structure of the pre-arranged csv shown below. No further changes are then needed to the pipeline in order to experiment with other datasets. The csv file of the new dataset sshould be placed into the *data/datasets/* folder and the name of the file should be assigned to the *dataset* parameter below. 

### Input of this step: CSV file including user preferences
---

In [5]:
dataset = 'ml1m'  
user_field = 'user_id'
item_field = 'item_id'
rating_field = 'rating'
time_field = 'timestamp'
type_field = 'type_id'

---

In [6]:
data = pd.read_csv(os.path.join(data_path, 'datasets/' + dataset + '.csv'), encoding='utf8')

In [7]:
data.head()

Unnamed: 0,user_id,item_id,rating,timestamp,type,type_id
0,1,1193,5.0,2000-12-31 23:12:40,Drama,7
1,2,1193,5.0,2000-12-31 22:33:33,Drama,7
2,12,1193,4.0,2000-12-31 00:49:39,Drama,7
3,15,1193,4.0,2000-12-30 19:01:19,Drama,7
4,17,1193,5.0,2000-12-30 07:41:11,Drama,7


<div class="alert alert-info" role="alert">
  Exercise in brief: find the id of the most popular item (i.e, the item that has received the highest number of ratings).     
    
**Expected result**: 2858

In [101]:
# Please, add your solution here

During this tutorial, we will simulate a scenario with **implicit feedback**. We assume that a user is interested in an item, if that item was rated by the user, no matter of the rating value. Other strategies can be easily integrated. 

**N.B.** Other papers in the literature assumed that an item is relevant for a user, only if the user has given a rating higher than a value X. To implement this strategy here, you just need to change the body of the lambda function below. 

In [8]:
data[rating_field] = data[rating_field].apply(lambda x: 1.0)

### Output of this step: Dataframe / CSV file including pre-processed user preferences 
---

In [9]:
data.head()

Unnamed: 0,user_id,item_id,rating,timestamp,type,type_id
0,1,1193,1.0,2000-12-31 23:12:40,Drama,7
1,2,1193,1.0,2000-12-31 22:33:33,Drama,7
2,12,1193,1.0,2000-12-31 00:49:39,Drama,7
3,15,1193,1.0,2000-12-30 19:01:19,Drama,7
4,17,1193,1.0,2000-12-30 07:41:11,Drama,7


---

<div class="alert alert-warning">
    <h1>Step 2: Split data in training and test sets</h1> <br>
</div>

Once the original dataset has been loaded and the user preferences have been pre-processed, we need to split the whole dataset in two sets: a training set used for optimizing the recommender model and a test set used for evaluating the recommender model. In the literature, a wide range of train-test split strategy exists. This notebook will use a strategy that, for each user, puts the oldest interactions in the training set and the most recent interactions in the test set. The Python toolbox includes also other strategies, such as a random split or a split based on a fixed timestamp (i.e., the most realistic one).    

### Input of this step: Dataframe / CSV file including pre-processed user preferences. 
---

- **smode**: 'uftime' for fixed timestamp split, 'utime' for time-based split per user, 'urandom' for random split per user 
- **train_ratio**: percentage of data to be included in the train set
- **min_train**: minimum number of train samples for a user to be included  
- **min_test**: minimum number of test samples for a user to be included
- **min_time**: start timestamp for computing the splitting timestamp (only for uftime)
- **max_time**: end timestamp for computing the splitting timestamp (only for uftime)
- **step_time**: timestamp step for computing the splitting timestamp (only for uftime)

In [10]:
smode = 'utime'
train_ratio = 0.80        
min_train_samples = 8
min_test_samples = 2
min_time = None
max_time = None
step_time = 1000

During this tutorial, we will work with a common **time-based split per user**. For the sake of clarity, we will provide the implementation of this strategy below. The toolbox conserves all the train-test split strategies into the file *helpers/train_test_splitter.py*.  

In [11]:
def user_timestamp(interactions,split=0.80,min_samples=10,user_field='user_id',item_field='item_id',time_field='timestamp'):
    train_set = []
    test_set = []
    
    groups = interactions.groupby([user_field])
    for i, (index, group) in enumerate(groups):
        
        if i % 1000 == 0:
            print('\r> Parsing user', i+1, 'of', len(groups), end='')
        
        if len(group.index) < min_samples:
            continue
        
        sorted_group = group.sort_values(time_field)
        n_rating_test = int(len(sorted_group.index) * (1.0 - split))
        train_set.append(sorted_group.head(len(sorted_group.index) - n_rating_test))
        test_set.append(sorted_group.tail(n_rating_test))
    
    print('\r> Parsing user', i+1, 'of', len(groups))

    train, test = pd.concat(train_set), pd.concat(test_set)
    train['set'], test['set'] = 'train', 'test' # Ensure that each row has a column that identifies the associated set

    traintest = pd.concat([train, test])
    traintest[user_field + '_original'] = traintest[user_field] # Ensure that we save the original user ids
    traintest[item_field + '_original'] = traintest[item_field] # Ensure that we save the original item ids
    traintest[user_field] = traintest[user_field].astype('category').cat.codes # Ensure that user ids are in [0, |U|] 
    traintest[item_field] = traintest[item_field].astype('category').cat.codes # Ensure that item ids are in [0, |I|] 

    return traintest

This notebook can be easily run with any of the different train-test split strategies, through the following code. 

In [12]:
if smode == 'uftime':
    traintest = fixed_timestamp(data, min_train_samples, min_test_samples, min_time, max_time, step_time, user_field, item_field, time_field, rating_field)
elif smode == 'utime':
    traintest = user_timestamp(data, train_ratio, min_train_samples+min_test_samples, user_field, item_field, time_field)
elif smode == 'urandom':
    traintest = user_random(data, train_ratio, min_train_samples+min_test_samples, user_field, item_field)

> Parsing user 6040 of 6040


**N.B.** For the sake of convenience, *user_ids* and *item_ids* have been scaled so that user_ids are in *[0, |U|]* and item_ids are in *[0, |I|]*. To refer back to the original user and item ids, the *user_id_original* and *item_id_original* columns should be used. 

For the sake of replicability and efficiency of this tutorial, we will save the pre-computed train and test sets in *data/outputs/splits*.

In [13]:
traintest.to_csv(os.path.join(data_path, 'outputs/splits/' + dataset + '_' + smode + '.csv'))

### Output of this step: Dataframe / CSV file with interactions assigned to training and test sets
---

In [14]:
traintest.head()

Unnamed: 0,user_id,item_id,rating,timestamp,type,type_id,set,user_id_original,item_id_original
34073,0,2969,1.0,2000-12-31 23:00:19,Drama,7,train,1,3186
31152,0,1574,1.0,2000-12-31 23:00:55,Romance,13,train,1,1721
37339,0,957,1.0,2000-12-31 23:00:55,Children's,3,train,1,1022
23270,0,1178,1.0,2000-12-31 23:00:55,Sci-Fi,14,train,1,1270
28157,0,2147,1.0,2000-12-31 23:01:43,Romance,13,train,1,2340


---

<div class="alert alert-warning">
    <h1>Step 3: Train the recommender model</h1> <br>
</div>

### Input of this step: Dataframe / CSV file with interactions assigned to training and test sets
---

In [15]:
train = traintest[traintest['set']=='train'].copy()
test = traintest[traintest['set']=='test'].copy()

---

<div class="alert alert-info" role="alert">
  Exercise in brief: plot the distribution of interactions per item in the training set and in the test set, separately.    
    
**Expected result**: ![caption](train_test_pop_distr.png)

In [96]:
# Please, add your solution here

First, we show some statistics about the training and test sets, e.g., number of users and items. 

In [16]:
users = list(np.unique(traintest[user_field].values))
items = list(np.unique(traintest[item_field].values))

In [17]:
len(users), len(items)

(6040, 3706)

Given that some recommender models may require the category of an item, we create a vector of size *|I|* including the integer-encoded category of the item with id *X* at position *X* of the vector. 

In [18]:
category_per_item = traintest.drop_duplicates(subset=['item_id'], keep='first')[type_field].values

In [19]:
len(np.unique(category_per_item))

18

For the sake of easiness and time, this tutorial focuses on four main recommendation strategies: 
- *Random*: randomly recommending a list of items to a user. 
- *MostPop*: recommending the same most popular items (i.e, those which received the highest number of ratings) to all users.
- *PointWise*: given a user-item pair, it is optimized for predicting a higher score (1) when the current item has been rated by the user, and a lower score (0) otherwise. The training instances include a good reprsentation of both types of pairs.   
- *PairWise*: given a triplet with a user, an observed item, and an unobserved item, it is optimized for predicting a higher relevance for the pair of user and unobserved item rather than for the pair of user and unobserved item. 

Each model inherits from the Model class defined in *models/model.py* and extends it by overwriting the *train* and *predict* functions of the original model class. This allows us to minimize the reuse of the code. More details on the implementation of the pairwise recommender can be found into *models/pairwise.py*.  

In [20]:
model_types = {'random': Random, 'mostpop': MostPop, 'pointwise': PointWise, 'pairwise': PairWise}

First, we need to initialize the model. We will see how the process works for a PairWise algorithm. Then, we will consider the other ones. 

In [57]:
model_type = 'pairwise'
%time model = PairWise(users, items, train, test, category_per_item, item_field, user_field, rating_field)

Initializing user, item, and categories lists
Initializing observed, unobserved, and predicted relevance scores
Initializing item popularity lists
Initializing category per item
Initializing category preference per user
Initializing metrics
Wall time: 7.6 s


We will train the model by feeding the train data we previously prepared, using the following default parameters. 

- *no_epochs* (default 100): maximum number of epochs until which the training process will be run. 
- *batches* (default 1024): size of the batches fed into the model during training. 
- *lr* (default 0.001): learning rate defining the pace at which the model will be trained. 
- *no_factors* (default 10): size of the latent vectors associated to users and items. 
- *no_negatives* (default 10): number of triplets for each user-item pair included in the training set. 
- *val_split* (default 0.0001): proportion of the training set used for validation. 

**N.B.** For the sake of tutorial efficiency, we force to stop the training process after 5 epochs (i.e., reasonable trade-off). No grid search on the recommender model is performed at this stage. 

In [58]:
%time model.train(no_epochs=5) 

Generating training instances of type pair
Computing instances for interaction 800000 / 803798 of type pair
Performing training - Epochs 5 Batch Size 1024 Learning Rate 0.001 Factors 10 Negatives 10 Mode pair
Train on 7957600 samples
Validation accuracy: 0.8716098531973128 (Sample 80379 of 80380)
Train on 7957600 samples
Epoch 2/2
Train on 7957600 samples
Epoch 3/3
Train on 7957600 samples
Epoch 4/4
Train on 7957600 samples
Epoch 5/5
Validation accuracy: 0.9180019905449117 (Sample 80379 of 80380)
Wall time: 2min 21s


The architecture of the trained model looks as follows. Essentially, the model includes:
- *UserEmb* encoding a latent vector for each user.
- *ItemEmb* encoding a latent vector for each item.
- *FlatUserEmb* represents the vector associated with the current user *UserInput*.
- *FlatPosItemEmb* represents the vectors associated with the current observed item *PosItemInput*.
- *FlatNegItemEmb* represents the vectors associated with the current unobserved item *NegItemInput*.
- *Accuracy* computes the margin between (i) the *FlatUserEmb-FlatPosItemEmb* and (ii) the *FlatUserEmb-FlatNegItemEmb* similarity scores.  

In [59]:
model.print()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
UserInput (InputLayer)          [(None, 1)]          0                                            
__________________________________________________________________________________________________
PosItemInput (InputLayer)       [(None, 1)]          0                                            
__________________________________________________________________________________________________
NegItemInput (InputLayer)       [(None, 1)]          0                                            
__________________________________________________________________________________________________
UserEmb (Embedding)             (None, 1, 10)        60410       UserInput[0][0]                  
____________________________________________________________________________________________

### Output of this step: H5 TensorFlow model pre-trained with the interactions in the training set
---

The model file is saved in *data/outputs/models*. 

In [60]:
model

<models.pairwise.PairWise at 0x13ba51be940>

---

<div class="alert alert-warning">
    <h1>Step 4: Compute user-item relevance scores</h1> <br>
</div>

Once the recommender model has been trained, we leverage the pre-trained user and item Embedding matrices in order to compute the relevance score predicted for each unseen user-item pair. For all the user-item pairs, the prediction step requires to extract the user and item vector associated to the current user-item pair and, then, compute the similarity between the two - cosine or dot similarity are usually used at this stage.  

### Input of this stage: H5 Tensorflow model pre-trained with the interactions in the training set
---


In [61]:
model

<models.pairwise.PairWise at 0x13ba51be940>

---

Now, we will use the pre-trained model to predict the user-item relevance scores.

In [62]:
model.predict()

Computing predictions


For the sake of easiness, you could directly manipulate the user-item relevance matrix as a numpy array. 

In [63]:
scores = model.get_predictions()

Hence, we can access to the relevance score of the user *120* for the item *320* as follows. 

In [64]:
user_id, item_id = 120, 320
scores[user_id, item_id]

5.455412864685059

<div class="alert alert-info" role="alert">
  Exercise in brief: compute the range of the scores on the whole population of users.     
       
**Expected result**: (-31.945167541503906, 39.834774017333984)

In [66]:
# Please, add your solution here

For the sake of convenience, we will save the predicted scores. They are often used as an input for re-ranking treatments against bias. 

In [29]:
save_obj(scores, os.path.join(data_path, 'outputs/predictions/' + dataset + '_' + smode + '_' + model_type + '_scores.pkl'))

<div class="alert alert-info" role="alert">
  Exercise in brief: retrieve the ids of the 10 items having the highest relevance score for the user with id 47.   
       
**Expected result**: [1106,  253, 1848, 1120, 1108, 1449,  466,  106, 2374,  575]

In [104]:
# Please, add your solution here

### Output of this step: Numpy matrix of size |U|*|I| containing the user-item relevance scores
---

In [30]:
scores.shape

(6040, 3706)

---

<div class="alert alert-warning">
    <h1>Step 5: Generate recommendations and compute evaluation metrics</h1> <br>
</div>

Finally, with the user-item relevance scores predicted in the previous step, we can generate the recommendations for each user and, then, compute a set of well-known evaluation metrics for recommender systems. 

### Input of this step: Numpy matrix of size |U|*|I| containing the user-item relevance scores and a list of cutoffs
---

In [31]:
scores.shape

(6040, 3706)

In [32]:
cutoffs = np.array([5, 10, 20])

---

For the sake of convenience, for the considered recommender model, we also compute some fairness metrics required for the case studies. The following line of code loads the demographic membership of providers, which will be discussed in detail in Notebook #03.

**N.B.** While the gender is by no means a binary construct, to the best of our knowledge no dataset for speaker recognition with non-binary genders exists. What we are considering is a binary feature, as the current publicly available datasets offer.

In [33]:
gender_item_association = pd.read_csv(os.path.join(data_path, 'datasets', 'ml1m-dir-gender.csv')) 

This dataframes includes, for each item, the percentage of providers with gender_1 and gender_2 for that item, respectively. 

In [52]:
gender_item_association.head()

Unnamed: 0,item_id,gender_1,gender_2
0,661,0.0,1.0
1,914,0.0,1.0
2,3408,0.0,1.0
3,2355,0.0,1.0
4,1197,0.0,1.0


<div class="alert alert-info" role="alert">
  Exercise in brief: compute the percentage of items where at least one provider having gender_1 is represented.  
    
**Expected result**: 0.05

In [55]:
# Please, add your solution here

In [34]:
gender_maps = {i:g for i, g in zip(gender_item_association['item_id'], gender_item_association['gender_1'])}
item_maps = {i1:i2 for i1, i2 in zip(traintest['item_id'].unique(), traintest['item_id_original'].unique())}

In [35]:
item_group = [(1 if item_maps[i] in gender_maps and gender_maps[item_maps[i]] == 0 else 0) for i in range(len(items))]

Then, we run the function which computes all the metrics relevant for the subsequent case studies. 

In [36]:
model.test(item_group=item_group, cutoffs=cutoffs)

Computing metrics for user 6040 / 6040


The method has pre-computed a set of metrics and saved the corresponding values in a Python dictionary, as detailed below. 

In [37]:
metrics = model.get_metrics()

In [38]:
metrics.keys()

dict_keys(['precision', 'recall', 'ndcg', 'hit', 'mean_popularity', 'diversity', 'novelty', 'item_coverage', 'visibility', 'exposure'])

The values for each metrics have been computed and store for each cutoff.

In [39]:
for name, values in metrics.items():
    print(values.shape, name)

(3, 6040) precision
(3, 6040) recall
(3, 6040) ndcg
(3, 6040) hit
(3, 6040) mean_popularity
(3, 6040) diversity
(3, 6040) novelty
(3, 3706) item_coverage
(3, 6040) visibility
(3, 6040) exposure


For instance, we can access to the NDCG score for the user *120* at cutoff *10*, with the following commands.

In [40]:
user_id, cutoff_index = 1324, int(np.where(cutoffs == 10)[0])
metrics['ndcg'][cutoff_index, user_id]

0.2583297509898471

<div class="alert alert-info" role="alert">
  Exercise in brief: compute and print the catalog coverage (i.e., percentage of items recommended at least once) at a cutoff of 20.  
    
**Expected result**: 0.30

In [51]:
# Please, add your solution here

For the sake of convenience, we will save the compted metrics.

In [41]:
save_obj(metrics, os.path.join(data_path, 'outputs/metrics/' + dataset + '_' + smode + '_' + model_type + '_metrics.pkl'))

We can also see the aggregated values. 

In [42]:
model.show_metrics(index_k=int(np.where(cutoffs == 10)[0]))

Precision: 0.1164 
Recall: 0.0484 
NDCG: 0.1269 
Hit Rate: 0.5187 
Avg Popularity: 1903.2718 
Category Diversity: 0.3209 
Novelty: 1.7948 
Item Coverage: 0.22 
User Coverage: 0.5187
Minority Exposure: 0.0501
Minority Visibility: 0.0434


### Output of this step: Dictionary of evaluation metrics 
---

In [43]:
' - '.join(list(metrics.keys()))

'precision - recall - ndcg - hit - mean_popularity - diversity - novelty - item_coverage - visibility - exposure'

---

<div class="alert alert-warning">
    <h1>Step 7: Run the pipeline for Random, MostPop, and PointWise</h1> <br>
</div>

We will define a utility function to run all the above operations jointly for each of the other recommender models.

In [44]:
def run_model(model_type, no_epochs=None):
    print('Running model', model_type)
    model = model_types[model_type](users, items, train, test, category_per_item, item_field, user_field, rating_field)
    model.train(no_epochs=no_epochs) if no_epochs else model.train() 
    model.predict()
    scores = model.get_predictions()
    save_obj(scores, os.path.join(data_path, 'outputs/predictions/' + dataset + '_' + smode + '_' + model_type + '_scores.pkl'))
    model.test(item_group=item_group, cutoffs=cutoffs)
    metrics = model.get_metrics()
    save_obj(metrics, os.path.join(data_path, 'outputs/metrics/' + dataset + '_' + smode + '_' + model_type + '_metrics.pkl'))
    print('\n\nFinal evaluation metrics:')
    model.show_metrics(index_k=int(np.where(cutoffs == 10)[0]))

In [45]:
run_model('random')

Running model random
Initializing user, item, and categories lists
Initializing observed, unobserved, and predicted relevance scores
Initializing item popularity lists
Initializing category per item
Initializing category preference per user
Initializing metrics
Computing predictions
Computing metrics for user 6040 / 6040


Final evaluation metrics:
Precision: 0.0092 
Recall: 0.0027 
NDCG: 0.0092 
Hit Rate: 0.0851 
Avg Popularity: 198.1405 
Category Diversity: 0.3285 
Novelty: 6.9874 
Item Coverage: 1.0 
User Coverage: 0.0851
Minority Exposure: 0.1683
Minority Visibility: 0.1673


In [46]:
run_model('mostpop')

Running model mostpop
Initializing user, item, and categories lists
Initializing observed, unobserved, and predicted relevance scores
Initializing item popularity lists
Initializing category per item
Initializing category preference per user
Initializing metrics
Computing predictions
Computing metrics for user 6040 / 6040


Final evaluation metrics:
Precision: 0.1007 
Recall: 0.0384 
NDCG: 0.1096 
Hit Rate: 0.4422 
Avg Popularity: 2328.0848 
Category Diversity: 0.3293 
Novelty: 1.3922 
Item Coverage: 0.03 
User Coverage: 0.4422
Minority Exposure: 0.0509
Minority Visibility: 0.0616


In [None]:
run_model('pointwise', no_epochs=5)

<div class="alert alert-warning">
    <h1>Follow-up: how to extend the toolbox</h1> <br>
</div>

- New splitter: take a look at the helpers/train_test_splitter.py file and how the existing generators have been defined. 
- New train instances creator: similarly, take a look at the helpers/instances_creator.py file and how the existing generators have been defined. 
- New model: a new subclass of the Model class defined in models/model.py should be defined, implementing a 'train' and a 'predict' method. 
- New metrics: both the 'test' and 'show_metrics' methods of models/model.py should be extended with the computation needed by the new metric.  