# Bootcamp Project - Product Recommendation

Our customer is a multinational company that operates in the health sector. They want to predict which products their
customers will need the most, based on their past purchases but also on other variables that could be relevant
(identifying these is part of your research).

**Goal:** Build a recommendation engine to recommend relevant items to a user, based on historical data.

<a id='toc'></a>

### Table of Contents
4. [Models evaluation](#models_evaluation) <br>
    1. [Import required modules](#module_import) <br>
    2. [Import datasets](#dataset_import) <br>
    3. [Top sales from the previous month model](#top_sales) <br>
    4. [GMF (neural network) model](#gmf_model) <br>
    5. [MLP (neural network) model](#mlp_model) <br>
    6. [Combination of GMF and MLP model](#gmf_mlp_model) <br>

<a name='models_evaluation'></a>

## 4. Models evaluation
In this notebook we explore the model solutions we considered for the project:
1. Top sales from the previous month
2. GMF (neural network)
3. MLP (neural network)
4. Combination of GMF and MLP

Our approach to developing and evaluating the models is to divide the given data into training and testing datasets (3 folds).
Each training dataset consists of all the known months up to a specific date, and the corresponding testing dataset is
composed of the next month's sales.
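The split described above can be sketched as follows. This is a minimal illustration, not the code that produced `folds_dict.pickle`: the function name `expanding_month_folds`, the `(month, client_id, item_id)` record layout and the fold names are all assumptions made for the example.

```python
def expanding_month_folds(sales, n_folds=3):
    """Split (month, client_id, item_id) records into expanding-window folds.

    Each fold trains on every month before a cut-off month and tests on
    the cut-off month itself. The record layout is hypothetical.
    """
    months = sorted({month for month, _, _ in sales})
    if len(months) <= n_folds:
        raise ValueError("need more distinct months than folds")

    folds = {}
    # The last n_folds months each serve once as the test month.
    for i, test_month in enumerate(months[-n_folds:], start=1):
        train = [row for row in sales if row[0] < test_month]
        test = [row for row in sales if row[0] == test_month]
        folds[f"fold_{i}"] = (train, test)
    return folds


# Toy data; months are ISO strings, so string order is chronological order.
sales = [
    ("2021-01", "c1", "i1"),
    ("2021-02", "c1", "i2"),
    ("2021-03", "c2", "i1"),
    ("2021-04", "c2", "i3"),
    ("2021-05", "c1", "i3"),
]
folds = expanding_month_folds(sales)
for name, (train, test) in folds.items():
    print(name, len(train), "training rows,", len(test), "test rows")
```

Each fold is stored as a `(train, test)` pair, which mirrors how the cells below read `folds[fold][0]` and `folds[fold][1]`.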

<a name='module_import'></a>

### 4.1. Import required modules

In [1]:
import pandas as pd
import numpy as np
import pickle

from bootcamp.metrics import Metrics, Models



<a name='dataset_import'></a>

### 4.2. Import datasets

In [2]:
with open('../../data/model_datasets/folds_dict.pickle', 'rb') as f:
    folds = pickle.load(f)
with open('../../data/model_datasets/unique_items_encoded.pickle', 'rb') as f:
    unique_items = pickle.load(f)
with open('../../data/model_datasets/unique_clients_encoded.pickle', 'rb') as f:
    unique_clients = pickle.load(f)

<a name='top_sales'></a>

### 4.3. Top sales from the previous month model
As a first model, we use the sales from the previous month to make the client recommendations for the new month.
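A minimal sketch of such a baseline is shown here. It assumes that the same top-selling items of the most recent training month are recommended to every client; the actual logic inside `last_month_model_metrics` may differ. The function name `top_sales_last_month` and the `(month, client_id, item_id)` record layout are hypothetical.

```python
from collections import Counter


def top_sales_last_month(train, k=5):
    """Return the k best-selling items of the most recent month in `train`.

    `train` is a list of (month, client_id, item_id) purchase records
    (a hypothetical layout chosen for this sketch).
    """
    last_month = max(month for month, _, _ in train)
    counts = Counter(item for month, _, item in train if month == last_month)
    # Sort by descending count, then by item id so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


train = [
    ("2021-03", "c1", "i1"),
    ("2021-04", "c1", "i2"),
    ("2021-04", "c2", "i2"),
    ("2021-04", "c3", "i3"),
    ("2021-04", "c2", "i1"),
    ("2021-04", "c3", "i1"),
]
recommendations = top_sales_last_month(train, k=2)
print(recommendations)  # ['i1', 'i2']
```

A baseline like this needs no training, which makes it a useful reference point for the neural models that follow.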

`Get model metrics`

In [3]:
# global result lists
global_results = []

for fold in folds.keys():

    # fold results list
    results = []

    # get train and test data
    train = folds[fold][0]
    test = folds[fold][1]

    # get the model metrics
    for k in [5]:
        result = Metrics(train, test, unique_items).last_month_model_metrics(k)
        results.append(result)

    # appends the fold results to the global results as a dataframe
    global_results.append(pd.DataFrame(results))

# prints all the average results
pd.concat(global_results).groupby('k').mean()

| k | coverage | mean_precision_at_k | mean_recall_at_k |
|---|----------|---------------------|------------------|
| 5 | 29.228333 | 4.335333 | 3.012 |
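For reference, the three reported columns can be computed along the following lines. This is a sketch under assumptions: it uses the common definitions of precision@k, recall@k and catalogue coverage, and expresses them as percentages. The exact definitions used by the `Metrics` class are not shown in this notebook, and the names `precision_recall_at_k` and `evaluate` are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one client, as fractions in [0, 1]."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(recommendations, purchases, catalogue, k):
    """Average per-client metrics over the clients in `purchases`, in percent.

    Coverage is the share of catalogue items that appear in at least one
    of those clients' top-k recommendation lists.
    """
    precisions, recalls, recommended_items = [], [], set()
    for client, relevant in purchases.items():
        recs = recommendations.get(client, [])
        p, r = precision_recall_at_k(recs, relevant, k)
        precisions.append(p)
        recalls.append(r)
        recommended_items.update(recs[:k])
    return {
        "k": k,
        "coverage": 100 * len(recommended_items) / len(catalogue),
        "mean_precision_at_k": 100 * sum(precisions) / len(precisions),
        "mean_recall_at_k": 100 * sum(recalls) / len(recalls),
    }


catalogue = ["i1", "i2", "i3", "i4"]
recommendations = {"c1": ["i1", "i2"], "c2": ["i1", "i3"]}
purchases = {"c1": {"i1"}, "c2": {"i2", "i4"}}  # test-month purchases per client
result = evaluate(recommendations, purchases, catalogue, k=2)
print(result)
```

In this toy case, client `c1` gets one hit out of two recommendations and client `c2` gets none, so the mean precision is 25% and the mean recall is 50%, while three of the four catalogue items are recommended at least once (75% coverage).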


<a name='gmf_model'></a>

### 4.4. GMF (neural network) model
According to the literature, Generalized Matrix Factorization (GMF) is a generalization of matrix factorization (MF) in
which the item and client features are used as input layers to produce embedding vectors that can be seen as latent
vectors of the clients and items. We then take the dot product of these two latent vectors and define a mapping function
for the interaction between client and item.
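The forward pass of a GMF-style scorer can be sketched in plain Python as follows. This is an illustration only, not the network built by `train_gmf_model`: the class `TinyGMF`, the embedding size and the random initialisation are assumptions. In the usual formulation the latent vectors are multiplied element-wise and combined through output weights; with all weights set to one, as here, this reduces to the dot product described above, followed by a sigmoid mapping function.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGMF:
    """Forward pass of a GMF-style scorer with untrained, random embeddings."""

    def __init__(self, num_clients, num_items, dim, seed=0):
        self._rng = random.Random(seed)
        self.client_emb = [self._random_vector(dim) for _ in range(num_clients)]
        self.item_emb = [self._random_vector(dim) for _ in range(num_items)]
        # All-ones output weights make the score a plain dot product.
        self.out_weights = [1.0] * dim

    def _random_vector(self, dim):
        return [self._rng.gauss(0.0, 0.1) for _ in range(dim)]

    def predict(self, client, item):
        p, q = self.client_emb[client], self.item_emb[item]
        interaction = [pc * qc for pc, qc in zip(p, q)]  # element-wise product
        logit = sum(w * x for w, x in zip(self.out_weights, interaction))
        return sigmoid(logit)  # mapping function: score in (0, 1)


model = TinyGMF(num_clients=3, num_items=4, dim=8)
score = model.predict(client=0, item=2)
print(round(score, 4))
```

During training, the embeddings and output weights would be learned so that observed client and item pairs receive higher scores than unobserved ones.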

`Get model metrics`

In [4]:
# global result lists
global_results = []

for fold in folds.keys():

    # fold results list
    results = []

    # get train and test data
    train = folds[fold][0]
    test = folds[fold][1]

    # gets the number of clients and products
    num_clients = len(unique_clients)
    num_items = len(unique_items)

    # trains the model
    model = Models().train_gmf_model(train, 50, 100, 2, num_clients, num_items)

    # saves the model
    model.save(f'../../data/models/gmf_model/{fold}')

    # get the model metrics
    for k in [5]:
        result = Metrics(train, test, unique_items).gmf_model_metrics(k, model)
        results.append(result)

    # appends the fold results to the global results as a dataframe
    global_results.append(pd.DataFrame(results))

# prints all the average results
pd.concat(global_results).groupby('k').mean()



Epoch 1/2
Epoch 2/2




INFO:tensorflow:Assets written to: ../../data/models/gmf_model/first_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/gmf_model/second_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/gmf_model/third_fold/assets


| k | coverage | mean_precision_at_k | mean_recall_at_k |
|---|----------|---------------------|------------------|
| 5 | 39.244667 | 4.159667 | 2.908333 |


<a name='mlp_model'></a>

### 4.5. MLP (neural network) model
A typical Multi-Layer Perceptron (MLP), in which different characteristics of the clients, items and sales are fed to
hidden layers with the objective of predicting an interaction (purchase).
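A minimal forward pass for such a network is sketched here in plain Python. The feature values, layer sizes, ReLU activations and sigmoid output are assumptions made for illustration; the architecture built by `train_mlp_model` is not shown in this notebook.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a layer with n_in inputs and n_out units."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_score(client_features, item_features, layers):
    """Concatenate the features, apply ReLU hidden layers, then a sigmoid output."""
    x = client_features + item_features
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    (logit,) = dense(x, *output)  # the output layer has a single unit
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
client_features = [0.2, 1.0, 0.0]  # hypothetical client characteristics
item_features = [0.5, 0.1]         # hypothetical item characteristics
layers = [init_layer(rng, 5, 8), init_layer(rng, 8, 4), init_layer(rng, 4, 1)]
score = mlp_score(client_features, item_features, layers)
print(round(score, 4))
```

Unlike GMF, which combines the two latent vectors with a fixed product, the hidden layers here can learn an arbitrary interaction between the concatenated inputs.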

`Get model metrics`

In [5]:
# global result lists
global_results = []

for fold in folds.keys():

    # fold results list
    results = []

    # get train and test data
    train = folds[fold][0]
    test = folds[fold][1]

    # gets the number of clients and products
    num_clients = len(unique_clients)
    num_items = len(unique_items)

    # trains the model
    model = Models().train_mlp_model(train, 100, 2)

    # saves the model
    model.save(f'../../data/models/mlp_model/{fold}')

    # get the model metrics
    for k in [5]:
        result = Metrics(train, test, unique_items).mlp_model_metrics(k, model)
        results.append(result)

    # appends the fold results to the global results as a dataframe
    global_results.append(pd.DataFrame(results))

# prints all the average results
pd.concat(global_results).groupby('k').mean()

Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/mlp_model/first_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/mlp_model/second_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/mlp_model/third_fold/assets


| k | coverage | mean_precision_at_k | mean_recall_at_k |
|---|----------|---------------------|------------------|
| 5 | 29.064 | 4.335333 | 3.009 |


<a name='gmf_mlp_model'></a>

### 4.6. Combination of GMF and MLP model
Combination of the GMF and MLP models.
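One common way to combine the two branches in the literature is to concatenate the GMF interaction vector with the last hidden layer of the MLP and pass the result through a single output unit. The sketch below assumes that fusion scheme; whether `train_gmf_mlp_model` works this way is not stated in this notebook, and the function name `fuse_and_score` and the example vectors are hypothetical.

```python
import math


def fuse_and_score(gmf_vector, mlp_vector, out_weights, out_bias=0.0):
    """Concatenate both branch outputs and map them to a score in (0, 1)."""
    fused = gmf_vector + mlp_vector
    if len(fused) != len(out_weights):
        raise ValueError("out_weights must match the fused vector length")
    logit = sum(w * x for w, x in zip(out_weights, fused)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


gmf_vector = [0.1, -0.2]      # e.g. element-wise product of client and item embeddings
mlp_vector = [0.3, 0.0, 0.4]  # e.g. activations of the last MLP hidden layer
out_weights = [0.5, 0.5, 1.0, 1.0, -1.0]  # illustrative values; learned in practice
score = fuse_and_score(gmf_vector, mlp_vector, out_weights)
print(round(score, 4))
```

The output weights decide how much each branch contributes to the final score, so the combined model can in principle fall back on either branch alone.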

`Get model metrics`

In [6]:
# global result lists
global_results = []

for fold in folds.keys():

    # fold results list
    results = []

    # get train and test data
    train = folds[fold][0]
    test = folds[fold][1]

    # gets the number of clients and products
    num_clients = len(unique_clients)
    num_items = len(unique_items)

    # trains the model
    model = Models().train_gmf_mlp_model(train, 100, 2, 50, num_clients, num_items)

    # saves the model
    model.save(f'../../data/models/gmf_mlp_model/{fold}')

    # get the model metrics
    for k in [5]:
        result = Metrics(train, test, unique_items).gmf_mlp_model_metrics(k, model)
        results.append(result)

    # appends the fold results to the global results as a dataframe
    global_results.append(pd.DataFrame(results))

# prints all the average results
pd.concat(global_results).groupby('k').mean()


Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/gmf_mlp_model/first_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/gmf_mlp_model/second_fold/assets
Epoch 1/2
Epoch 2/2
INFO:tensorflow:Assets written to: ../../data/models/gmf_mlp_model/third_fold/assets


| k | coverage | mean_precision_at_k | mean_recall_at_k |
|---|----------|---------------------|------------------|
| 5 | 65.188667 | 3.547 | 2.662 |
