# Bootcamp Project - Product Recommendation

Our customer is a multinational company that operates in the health sector. They want to predict which products their
customers will need most, based on past purchases and on any other variables that prove relevant
(identifying these is part of your research).

**Goal:** Build a recommendation engine to recommend relevant items to a user, based on historical data.

<a id='toc'></a>

### Table of Contents
2. [Model dataset preparation](#dataset) <br>
    1. [Import required modules](#module_import) <br>
    2. [Import datasets](#dataset_import) <br>
    3. [Create dataset folds](#prepare_dataset_folds) <br>

<a name='dataset'></a>

## 2. Model dataset preparation
After exploring and cleaning the datasets, we need to prepare them for use in the recommendation engine. <br>
This preparation phase focuses on creating different folds so that the different models can be evaluated with
cross-validation.

<a name='module_import'></a>

#### 2.1. Import required modules

In [1]:
import pandas as pd

from bootcamp.data import ModelData



<a name='dataset_import'></a>

#### 2.2. Import datasets

In [2]:
final_full_dataset = pd.read_parquet("../../data/clean_datasets/clean_final_dataset.parquet")

<a name='prepare_dataset_folds'></a>

#### 2.3. Create dataset folds
Creating the dataset folds involves the following steps:
1. Create a list of unique clients and items.
2. Create an encoding dictionary for clients and items. The objective is to have a uniform list of both clients and
items.
3. Apply the encoding to the final full dataset.
4. Apply the encoding to the unique client and item lists.
5. Create the data folds based on the division dictionary (each fold is assigned a different test month).
6. Save the unique lists and the fold datasets.
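
The steps above can be illustrated with a minimal sketch on toy data. This is not the implementation of `ModelData`, whose internals are not shown in this notebook. The column names (`client_id`, `item_id`, `month`), the helper `build_encoding`, the contents of the `division` dictionary, and the choice to train on all months before the test month are assumptions made for illustration only.

```python
import pandas as pd

# Toy purchase history; the column names are assumed, not taken from the real dataset.
purchases = pd.DataFrame(
    {
        "client_id": ["C9", "C3", "C9", "C7", "C3", "C7"],
        "item_id": ["P20", "P10", "P10", "P30", "P20", "P10"],
        "month": ["2021-01", "2021-01", "2021-02", "2021-02", "2021-03", "2021-03"],
    }
)


def build_encoding(values):
    """Hypothetical helper: map each distinct raw identifier to a consecutive integer."""
    return {raw: idx for idx, raw in enumerate(sorted(set(values)))}


# Steps 1-2: collect the unique clients and items and build one encoding dictionary for each.
client_encoding = build_encoding(purchases["client_id"])
item_encoding = build_encoding(purchases["item_id"])

# Step 3: apply the encoding to the full dataset.
encoded = purchases.assign(
    client_id=purchases["client_id"].map(client_encoding),
    item_id=purchases["item_id"].map(item_encoding),
)

# Step 4: apply the encoding to the unique client and item lists.
unique_clients = sorted(client_encoding.values())
unique_items = sorted(item_encoding.values())

# Step 5: a hypothetical division dictionary assigning one test month to each fold.
division = {"fold_0": "2021-02", "fold_1": "2021-03"}

folds = {}
for name, test_month in division.items():
    folds[name] = {
        # Assumption: train on every month strictly before the test month.
        # "YYYY-MM" strings compare correctly in lexicographic order.
        "train": encoded[encoded["month"] < test_month],
        "test": encoded[encoded["month"] == test_month],
    }

# Step 6 (saving the lists and folds to disk) is omitted to keep the sketch free of file I/O.
for name, parts in folds.items():
    print(name, len(parts["train"]), len(parts["test"]))
```

Encoding clients and items as consecutive integers gives every fold the same index space, so the models can be compared on identical identifiers across folds.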

`Fold preparation`

In [3]:
path = '../../data/model_datasets'
folds = ModelData(final_full_dataset).run(path)

12/08/2021 15:14:39 - INFO: Getting unique values...
12/08/2021 15:14:39 - INFO: Getting unique values...
12/08/2021 15:14:39 - INFO: Performing encoding...
12/08/2021 15:14:39 - INFO: Performing encoding...
12/08/2021 15:14:39 - INFO: Encoding dataframe...
12/08/2021 15:14:39 - INFO: Applying encoding...
12/08/2021 15:14:40 - INFO: Applying encoding...
12/08/2021 15:14:40 - INFO: Encoding list...
12/08/2021 15:14:40 - INFO: Encoding list...
12/08/2021 15:14:40 - INFO: Creating folds...
12/08/2021 15:14:40 - INFO: Dividing datasets by date...
12/08/2021 15:14:41 - INFO: Dividing datasets by date...
12/08/2021 15:14:42 - INFO: Dividing datasets by date...
12/08/2021 15:14:43 - INFO: Saving data...
12/08/2021 15:14:43 - INFO: Saving data...
12/08/2021 15:14:43 - INFO: Saving data...
