# MLP and NeuMF Test

In this notebook, we test the MLP and NeuMF methods for predicting the top recommendations for each user. First, we build the test set with a leave-one-out split. Then, for each user, we evaluate the predictions on a candidate list made of 100 sampled items plus the last item bought (the held-out test item).
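
The evaluation described above can be sketched as follows. This is a minimal illustration, not the notebook's actual evaluation code: the text does not say which metrics are used, so hit ratio and NDCG at K are assumed here because they are commonly paired with this protocol. The function name `hit_ratio_and_ndcg`, its parameters, and the toy scores are all hypothetical.

```python
import numpy as np

def hit_ratio_and_ndcg(scores, positive_index, k=10):
    """Evaluate one user from the scores of their candidate items.

    scores: one model score per candidate item (hypothetical input).
    positive_index: position of the held-out item inside `scores`.
    """
    scores = np.asarray(scores, dtype=float)
    # Rank of the held-out item = number of candidates scored strictly higher
    rank = int(np.sum(scores > scores[positive_index]))
    hit = 1.0 if rank < k else 0.0
    # With a single relevant item, NDCG reduces to 1 / log2(rank + 2)
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return hit, ndcg

# Toy candidate list: 5 sampled items followed by the held-out item (index 5)
toy_scores = [0.10, 0.80, 0.30, 0.20, 0.05, 0.60]
hit, ndcg = hit_ratio_and_ndcg(toy_scores, positive_index=5, k=3)
```

Averaging `hit` and `ndcg` over all test users would give one HR@K and one NDCG@K value per model, under the assumptions stated above.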

In [2]:
# Requirements

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import LabelEncoder

In [3]:
df_recommendation = pd.read_csv('base de donnée_20_20.csv')
df_recommendation.head()

Unnamed: 0,user,id,rating,timestamp
0,AE23JYHGEN3D35CHE5OQQYJOW5RA,B000EEHKVY,5.0,1427926325000
1,AE23JYHGEN3D35CHE5OQQYJOW5RA,B000TGSM6E,5.0,1480348230000
2,AE23JYHGEN3D35CHE5OQQYJOW5RA,B008FDSWJ0,5.0,1528832546194
3,AE23JYHGEN3D35CHE5OQQYJOW5RA,B012VQ5A7S,5.0,1528829957330
4,AE23JYHGEN3D35CHE5OQQYJOW5RA,B076ZSHQ47,3.0,1583185764147


Let's first encode the 'user' and 'id' (item) columns as integer indices.

In [4]:
# Initialize encoder
encoder = LabelEncoder()

df_recommendation['user'] = encoder.fit_transform(df_recommendation['user'])

# Reinitialize it
encoder = LabelEncoder()

df_recommendation['id'] = encoder.fit_transform(df_recommendation['id'])

In [5]:
df_recommendation.head()

Unnamed: 0,user,id,rating,timestamp
0,0,30,5.0,1427926325000
1,0,43,5.0,1480348230000
2,0,110,5.0,1528832546194
3,0,233,5.0,1528829957330
4,0,418,3.0,1583185764147
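
Reusing a single `encoder` variable discards the user mapping as soon as the encoder is reinitialized. If the original user or product identifiers are needed later (for example, to display recommendations), one option is to keep two encoders. The sketch below runs on a hypothetical toy frame; the names `user_encoder`, `item_encoder`, and `toy` are not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with the same column names as the notebook
toy = pd.DataFrame({
    "user": ["u_b", "u_a", "u_b"],
    "id": ["B002", "B001", "B001"],
})

# One encoder per column, so both mappings stay available
user_encoder = LabelEncoder()
item_encoder = LabelEncoder()
toy["user"] = user_encoder.fit_transform(toy["user"])
toy["id"] = item_encoder.fit_transform(toy["id"])

# Recover the original item identifiers from the integer codes
original_items = item_encoder.inverse_transform(toy["id"])
```

`LabelEncoder` assigns codes following the sorted order of the distinct values, so the codes themselves carry no meaning beyond identifying each user or item.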


## Train / Test Split

We begin with a train/test split so that the recommendations can be evaluated with a leave-one-out protocol. We also create a file containing negative samples for each test user.

The test set must contain the last relevant item for each user. We define a relevant item for a user as an item that this user rated 4 or higher.

In [6]:
# Sort dataframe by user and timestamp
df_recommendation = df_recommendation.sort_values(by=['user', 'timestamp'])

# Keep only relevant items (rating of 4 or more)
df_filtered = df_recommendation[df_recommendation['rating'] >= 4]

# Test set: the last relevant item (rating of 4 or more) for each user
df_test = df_filtered.groupby('user').tail(1)

# Train set: every interaction that is not in the test set
df_train = df_recommendation.drop(df_test.index)

df_train.shape, df_test.shape

((37580, 4), (5046, 4))
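
To make the behavior of this split concrete, here is the same logic applied to a small hypothetical frame (the values in `toy` are invented for illustration). It shows two consequences worth keeping in mind: a lower-rated interaction that occurs after the held-out item stays in the train set, and a user with no rating of 4 or more would get no test row at all.

```python
import pandas as pd

# Hypothetical interactions: user 0 rates item 12 poorly AFTER their last relevant item
toy = pd.DataFrame({
    "user":      [0, 0, 0, 1, 1],
    "id":        [10, 11, 12, 10, 13],
    "rating":    [5.0, 4.0, 2.0, 3.0, 5.0],
    "timestamp": [1, 2, 3, 1, 2],
}).sort_values(by=["user", "timestamp"])

# Same steps as the cell above: last relevant item per user goes to the test set
toy_test = toy[toy["rating"] >= 4].groupby("user").tail(1)
toy_train = toy.drop(toy_test.index)

# Sanity checks: at most one test row per user, and no row in both sets
assert toy_test["user"].is_unique
assert toy_train.index.intersection(toy_test.index).empty
```

Here user 0's test item is 11, while item 12 (rated 2.0 at a later timestamp) remains in the train set.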

Let's create a dataset with 5 negative samples for each user, that is, items the user has never interacted with, and merge it with the test set.

In [12]:
# List of all users and items
all_users = df_recommendation['user'].unique()
all_items = df_recommendation['id'].unique()

# All existing interactions set
interactions = set(zip(df_recommendation['user'], df_recommendation['id']))

# Negative items list
negative_samples = []
num_negatives = 5

for user in df_test['user'].unique():
    # All items this user has never interacted with
    negative_items = [item for item in all_items if (user, item) not in interactions]

    # Draw num_negatives distinct items from these candidates
    sampled_negatives = np.random.choice(negative_items, size=num_negatives, replace=False)

    # Build the row dictionary in a loop so that it automatically has the
    # right size for any value of num_negatives
    negative_sample = {'user': user}
    for i in range(num_negatives):
        negative_sample[f'negative_{i + 1}'] = sampled_negatives[i]

    # Add this user's negative samples to the list
    negative_samples.append(negative_sample)

negative_samples_df = pd.DataFrame(negative_samples)

df_test_negative = pd.merge(df_test, negative_samples_df, on='user', how='left')

df_test_negative.shape

(5046, 9)

In [17]:
df_test_negative.head()

Unnamed: 0,user,id,rating,timestamp,negative_1,negative_2,negative_3,negative_4,negative_5
0,0,770,5.0,1642464772969,161,411,82,714,594
1,1,658,5.0,1664305424891,654,603,967,73,287
2,2,506,5.0,1631833014190,891,752,78,997,741
3,3,700,5.0,1478657366000,199,832,746,975,163
4,4,147,5.0,1647063756658,37,424,167,265,56
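
The sampling cell above draws from NumPy's global random state, so the negatives change on every run. The sketch below shows one possible reproducible variant on hypothetical toy data, using a seeded `np.random.default_rng` generator and `np.setdiff1d` to build the candidate pool. The seed value, the 8-item catalogue, and the names `seen`, `rng`, and `negatives` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical interactions and an assumed catalogue of 8 items
toy = pd.DataFrame({
    "user": [0, 0, 1, 1, 1],
    "id":   [0, 1, 1, 2, 3],
})
all_items = np.arange(8)

# Items each user has already interacted with
seen = toy.groupby("user")["id"].apply(set)

rng = np.random.default_rng(42)  # fixed seed (assumption) for reproducibility
num_negatives = 3

negatives = {}
for user, seen_items in seen.items():
    # Candidate pool = catalogue minus the items this user has seen
    candidates = np.setdiff1d(all_items, list(seen_items))
    negatives[user] = rng.choice(candidates, size=num_negatives, replace=False)

# Sanity checks: negatives are distinct and never overlap with seen items
for user, sampled in negatives.items():
    assert len(set(sampled.tolist())) == num_negatives
    assert seen[user].isdisjoint(sampled.tolist())
```

The same two checks could be run on `df_test_negative` to confirm that no sampled negative is an item the user actually interacted with.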


In [18]:
# Save the pandas DataFrames to CSV (the data/ directory must already exist)
df_train.to_csv('data/train.csv', index=False)
df_test.to_csv('data/test.csv', index=False)
df_test_negative.to_csv('data/test_negative.csv', index=False)
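
A downstream evaluation step would typically turn each row of the saved negatives file into one candidate list per user. The following is a minimal sketch of that step, assuming the `negative_<i>` column naming produced above. It uses a small in-memory frame as a stand-in for `data/test_negative.csv`, and the name `candidates` is hypothetical.

```python
import pandas as pd

# In-memory stand-in for the saved file, with two negatives per user
toy_test_negative = pd.DataFrame({
    "user": [0, 1],
    "id": [770, 658],
    "negative_1": [161, 654],
    "negative_2": [411, 603],
})

# Detect the negative columns by name so any num_negatives works
negative_cols = [c for c in toy_test_negative.columns if c.startswith("negative_")]

# One candidate list per user: held-out item first, then the sampled negatives
candidates = {}
for row in toy_test_negative.to_dict("records"):
    candidates[row["user"]] = [row["id"]] + [row[c] for c in negative_cols]
```

Placing the held-out item at index 0 is only a convention chosen for this sketch; any fixed position works as long as the evaluation code knows where the positive sits.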