# Recommendations Systems
## Homework 3: Neural Collaborative Filtering

Submit your solution in the form of an Jupyter notebook file (with extension ipynb).   
Images of graphs or tables should be submitted as PNG or JPG files.   
The code used to answer the questions should be included, runnable and documented in the notebook.   
Python 3.6 should be used.

The goal of this homework is to let you understand the concept of  recommendations based on implicit data which is very common in real life, and learn how ‘Deep neural networks’ components can be used to implement a collaborative filtering and hybrid approach recommenders.  
Implementation example is presented in the <a href='https://colab.research.google.com/drive/1v72_zpCObTFMbNnQXUknoQVXR1vBRX6_?usp=sharing'>NeuralCollaborativeFiltering_Implicit</a> notebook in Moodle.

We will use a dataset based on the <a href='https://grouplens.org/datasets/movielens/1m/'>MovieLens 1M rating dataset</a> after some pre-processing to adapt it to an implicit feedback use case scenario.  
You can download the dataset used by <a href='https://github.com/hexiangnan/neural_collaborative_filtering'>this implementation</a> of the paper Neural Collaborative Filtering or from the NeuralCollaborativeFiltering_implicit notebook in Moodle.
<br>

## Imports:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

from keras.layers import Embedding, Input, Dense, Reshape,  Flatten, Dropout
from keras.regularizers import l2
from keras import backend as K
from keras import initializers
from keras.initializers import RandomNormal
from keras.models import Sequential, Model, load_model, save_model
from keras.layers.core import Dense, Lambda, Activation
from keras.optimizers import Adagrad, Adam, SGD, RMSprop
from keras.layers import Multiply, Concatenate

# from time import time
# import multiprocessing as mp
# import sys
# import math
# import argparse

#### Preprocessing:

In [2]:
# comment-out on initial notebook run (we need some files fromt this repository)
# !git clone https://github.com/hexiangnan/neural_collaborative_filtering.git

In [3]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']

# Read the training file
training = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.train.rating', sep='\t', names=column_names)

# Read the test file
test_rating = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.test.rating', sep='\t', names=column_names)


negative_ids = ['(user_id, item_id)']

for i in range(1,100):
    negative_ids.append(f'id-{i}')

test_negative = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.test.negative', sep='\t', names=negative_ids)

In [4]:
training.loc[(training['user_id'] == 0) & (training['item_id'] == 8)]

Unnamed: 0,user_id,item_id,rating,timestamp
13,0,8,4,978302268


## Question 1: Dataset preparation

a. This implementation contains one file for training and two files for testing:
- ml-1m.train.rating
- ml-1m.test.rating
- ml-1m.test.negative

**Explain** the role and structure of each file and how it was created from the original MovieLens 1M rating dataset.

##### Answer:

ml-1m.train.rating:
- Training file
- Each line is a training instance: userID\t itemID\t rating\t timestamp (if exists)
- 1 million ratings, where each user has at least 20 ratings
- Similar to the training data from previous HWs

ml-1m.test.rating:
- Test file (positive instances)
- Each line is a testing instance: userID\t itemID\t rating\t timestamp (if exists)

ml-1m.test.negative:
- Test file (negative instances)
- Each line corresponds to the line of test.rating, containing 99 negative samples
- Each line is in the format: (userID,itemID)\t negativeItemID1\t negativeItemID2...
- This is the set of instances the user didn't interact with (rated)
- Only the first cell (userID,itemID) is the user and an item s/he DID interact with
- itemID is the most recent item the user interacted with!

b. **Explain** how the training dataset is created.

##### Answer:

We have the training data (only the positive ratings, none of the negative ones).
We go over the train data and read every tuple of user & item.

Firse, we add the userID to the trainining dataset with a label '1'.

We randomly choose 4 negative items (the "num_negatives" parameter. Can be a differernt number) and add it to the training dataset.
Since the matrix is vary sparse, there is a high probability to randomly choose an index of a negative item.
Then we add it to the trainining dataset with a label '0'.

c. **Explain** how the test dataset is created.

##### Answer:

This is similar to the way the training dataset is constructed, but with data the training dataset this time.

<br>

***
## Question 2: Neural Collaborative filtering

#### Constants:

In [None]:
num_negatives = 4
TOP_K = 10

In [36]:
from icecream import ic
def get_train_instances(train, num_negatives):
    user_input, item_input, labels = [0]*((num_negatives + 1)*len(train)),[0]*((num_negatives + 1)*len(train)),[1]
    num_users = train.shape[0]
    
    negatives = [0]*num_negatives
    labels.extend(negatives)
    total_labels = []
    list(map(lambda x: total_labels.extend(labels), range(len(train))))
    ic(len(total_labels), total_labels[:20])
#     return
    percent_1 = int(len(train)/1000)
    ic(percent_1)
    for idx_i in range(len(train)):
        curr_index = idx_i * (num_negatives + 1)
        if idx_i != 0 and idx_i % percent_1 == 0:
            print(f'{int(idx_i/percent_1)}%')
        u = train.iloc[idx_i].user_id
        i = train.iloc[idx_i].item_id
#         ic(u,i)
        user_input[curr_index:curr_index + (num_negatives + 1)] = [u]*(num_negatives + 1)
#         user_input.extend([u]*(num_negatives + 1))
        item_input[curr_index] = i

        items = training[training['user_id'] == u].item_id.to_numpy()
        sample_items = items[np.random.choice(len(items), size=num_negatives, replace=False)]
        item_input[curr_index+1:curr_index + (num_negatives + 1)] = sample_items

    return user_input, item_input, total_labels

In [28]:
training_data = get_train_instances(training, num_negatives)

ic| len(total_labels): 4970845
    total_labels[:20]: [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
ic| percent_1: 994


1%
2%
3%
4%
5%
6%
7%
8%
9%
10%
11%
12%
13%
14%
15%
16%
17%
18%
19%
20%
21%
22%
23%
24%
25%
26%
27%
28%
29%
30%
31%
32%
33%
34%
35%
36%
37%
38%
39%
40%
41%
42%
43%
44%
45%
46%
47%
48%
49%
50%
51%
52%
53%
54%
55%
56%
57%
58%
59%
60%
61%
62%
63%
64%
65%
66%
67%
68%
69%
70%
71%
72%
73%
74%
75%
76%
77%
78%
79%
80%
81%
82%
83%
84%
85%
86%
87%
88%
89%
90%
91%
92%
93%
94%
95%
96%
97%
98%
99%
100%
101%
102%
103%
104%
105%
106%
107%
108%
109%
110%
111%
112%
113%
114%
115%
116%
117%
118%
119%
120%
121%
122%
123%
124%
125%
126%
127%
128%
129%
130%
131%
132%
133%
134%
135%
136%
137%
138%
139%
140%
141%
142%
143%
144%
145%
146%
147%
148%
149%
150%
151%
152%
153%
154%
155%
156%
157%
158%
159%
160%
161%
162%
163%
164%
165%
166%
167%
168%
169%
170%
171%
172%
173%
174%
175%
176%
177%
178%
179%
180%
181%
182%
183%
184%
185%
186%
187%
188%
189%
190%
191%
192%
193%
194%
195%
196%
197%
198%
199%
200%
201%
202%
203%
204%
205%
206%
207%
208%
209%
210%
211%
212%
213%
214%
215%
216%
217%
218%
219%
220%
221%
222

In [55]:
import pickle


with open('training_data.pkl', 'wb') as f:
    pickle.dump(training_data, f)

In [59]:
training_data = train_instances
user_input, item_input, total_labels = training_data

In [60]:
df = pd.DataFrame(list(zip(user_input, item_input, total_labels)), 
               columns =['user_input', 'item_input', 'total_labels'])
df.head()

Unnamed: 0,user_input,item_input,total_labels
0,0,32,1
1,0,11,0
2,0,27,0
3,0,36,0
4,0,33,0


In [63]:
df.to_csv('training_data_latest.csv', index=False)

a. Build the following four models using the neural collaborative filtering approach: 
- Matrix Factorization (MF)
- Multi layer perceptron (MLP)
- Generalized Matrix Factorization (GMF) 
- NeuroMatrixFactorization (NMF)

##### Answer:

In [61]:
def get_MF_model(num_users, num_items, latent_dim):
    '''Vanilla Matrix Factorization'''
    
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)   
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    #prediction = merge([user_latent, item_latent], mode = 'dot')
    prediction = keras.layers.dot([user_latent,item_latent], axes=1,normalize=False)
    model = Model(inputs=[user_input, item_input], outputs=prediction)

    return model


def get_GMF_model(num_users, num_items, latent_dim, regs=None, activation='sigmoid'):
    '''Generalized Matrix Factorization'''
        
    if not regs:
        regs = [[0,0]]
    
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding',
                                   embeddings_regularizer = l2(regs[0][0]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding',
                                   embeddings_regularizer = l2(regs[0][1]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01))  #init = init_normal, 
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings 
    predict_vector = Multiply()([user_latent, item_latent]) #merge([user_latent, item_latent], mode = 'mul')
    
    # Final prediction layer
    prediction = Dense(1, activation=activation, kernel_initializer='lecun_uniform', name = 'prediction')(predict_vector)
    model = Model(inputs=[user_input, item_input], outputs=prediction)
    
    return model


def get_MLP_model(num_users, num_items, latent_dim, regs=None, layers = None, activation='sigmoid'):
    '''Multi-Layer Perceptron'''
    
    if not regs:
        regs = [[0,0],0,0]
    
    if not layers:
        layers = [20,10]
    
    assert len(layers) + 1 == len(regs), 'the number of regs is equal to number of layers + the embedding layer'
    num_layer = len(layers) #Number of layers in the MLP
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MLP_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding',
                                   embeddings_regularizer = l2(regs[0][0]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    MLP_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding',
                                   embeddings_regularizer = l2(regs[0][1]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MLP_Embedding_User(user_input))
    item_latent = Flatten()(MLP_Embedding_Item(item_input))
    
    # Concatenation of embedding layers
    vector = Concatenate(axis=-1)([user_latent, item_latent])#merge([user_latent, item_latent], mode = 'concat')
    
    # MLP layers
    for idx in range(num_layer):
        layer = Dense(layers[idx], kernel_regularizer = l2(regs[idx+1]), activation='relu', name = 'layer%d' %idx)
        vector = layer(vector)
        
    # Final prediction layer
    prediction = Dense(1, activation=activation, kernel_initializer='lecun_uniform', name = 'prediction')(vector)
    model = Model(inputs=[user_input, item_input], outputs=prediction)
    
    return model


def get_NMF_model(num_users, num_items, latent_dim_GMF, latent_dim_MLP, reg_GMF=None, regs_MLP=None, layers=None, activation='sigmoid'):
    '''Neural matrix factorization'''
    
    if not reg_GMF:
        reg_GMF=[[0,0]]
        
    if not regs_MLP:
        regs_MLP=[[0,0],0,0]
        
    if not laters:
        layers=[20,10]
    
    assert len(layers) + 1 == len(regs_MLP), 'the number of regs is equal to number of layers + the embedding layer'
    num_layer = len(layers) #Number of layers in the MLP

    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')
    
    # Embedding layer
    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim_GMF, name = 'MF_user_embedding',
                                   embeddings_regularizer = l2(reg_GMF[0][0]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim_GMF, name = 'MF_item_embedding',
                                   embeddings_regularizer = l2(reg_GMF[0][1]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01))  #init = init_normal, 
    
    MLP_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim_MLP, name = 'MLP_user_embedding',
                                   embeddings_regularizer = l2(regs_MLP[0][0]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    MLP_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim_MLP, name = 'MLP_item_embedding',
                                   embeddings_regularizer = l2(regs_MLP[0][1]), input_length=1,embeddings_initializer=RandomNormal(mean=0.0, stddev=0.01)) #init = init_normal,
    
    # MF part
    mf_user_latent = Flatten()(MF_Embedding_User(user_input))
    mf_item_latent = Flatten()(MF_Embedding_Item(item_input))
    mf_vector = Multiply()([mf_user_latent, mf_item_latent]) #merge([mf_user_latent, mf_item_latent], mode = 'mul') # element-wise multiply

    # MLP part
    mlp_user_latent = Flatten()(MLP_Embedding_User(user_input))
    mlp_item_latent = Flatten()(MLP_Embedding_Item(item_input))
    mlp_vector = Concatenate(axis=-1)([mlp_user_latent, mlp_item_latent])#merge([mlp_user_latent, mlp_item_latent], mode = 'concat')
    
    for idx in range(num_layer):
        layer =  Dense(layers[idx], kernel_regularizer = l2(regs_MLP[idx+1]), activation='tanh', name = 'layer%d' %idx)
        mlp_vector = layer(mlp_vector)

    # Concatenate MF and MLP parts
    predict_vector = Concatenate(axis=-1)([mf_vector, mlp_vector])
    
    # Final prediction layer
    prediction = Dense(1, activation=activation, kernel_initializer='lecun_uniform', name = "prediction")(predict_vector)    
    model = Model(inputs=[user_input, item_input], outputs=prediction)
    
    return model

In [None]:
num_factors = 8 # size of embedding size. Can be split to 4 different params potentially.
num_negatives = 4 # how many negative samples per positive sample?
learning_rate = 0.001
epochs = 10
batch_size = 256
verbose = 1
write_model=False
topK = 10 # used to evaluate the model. Top K recommendations are used.
evaluation_threads = 1 
model_out_file = 'Pretrain/%s_GMF_%d_%d.h5' %(dataset, num_factors, time())

In [None]:
# Build models
mlp_model = get_MLP_model(num_users, num_items, num_factors, regs = [[0,0],0,0,0], layers = [32,16,8])
gmf_model = get_GMF_model(num_users, num_items, num_factors, regs = [[0,0]])
nmf_model = get_NMF_model(num_users, num_items, latent_dim_GMF=num_factors, latent_dim_MLP=num_factors, reg_GMF=[[0,0]], regs_MLP=[[0,0],0,0,0], layers=[32,16,8])

mlp_model.compile(optimizer=Adam(lr=learning_rate), loss='binary_crossentropy')
gmf_model.compile(optimizer=Adam(lr=learning_rate), loss='binary_crossentropy')
nmf_model.compile(optimizer=Adam(lr=learning_rate), loss='binary_crossentropy')

b. Train and evaluate the recommendations accuracy of three models: 
- MF or GMF
- MLP
- NMF

Compare the learning curve and recommendations accuracy using NDCG and MRR metrics with cutoff values of 5 and 10.   
Discuss the comparison.

##### Answer:

Model Evaluation

In [1]:
import math
import heapq # for retrieval topK
import multiprocessing
from time import time
#from numba import jit, autojit


def evaluate_model(model, test_ratings, test_negatives, K):
    """
    Evaluate the performance (MRR, NDCG) of top-K recommendation
    Return: score of each test rating.
    """
    mrrs, ndcgs = zip(*[eval_one_rating(model, test_ratings, test_negatives, idx, K) for idx in range(len(test_ratings))])
    return np.array(mrrs).mean(), np.array(ndcgs).mean()


def eval_one_rating(model, test_ratings, test_negatives, idx, K):
    rating = test_ratings[idx]
    items = test_negatives[idx]
    u = rating[0]
    gtItem = rating[1]
    items.append(gtItem)
    # Get prediction scores
    map_item_score = {}
    users = np.full(len(items), u, dtype = 'int32')
    predictions = model.predict([users, np.array(items)], 
                                 batch_size=100, verbose=0)
    
    for i in range(len(items)):
        item = items[i]
        map_item_score[item] = predictions[i]
    
    items.pop()
    
    # Evaluate top rank list
    ranklist = heapq.nlargest(_K, map_item_score, key=map_item_score.get)
    mrr = getMRR(ranklist, gtItem)
    ndcg = getNDCG(ranklist, gtItem)
    
    return mrr, ndcg


def getMRR(ranklist, gtItem):
    for i in range(len(ranklist)):
        item = ranklist[i]
        
        if item == gtItem:
            return 1/(i+1)
    return 0


def getNDCG(ranklist, gtItem):
    for i in range(len(ranklist)):
        item = ranklist[i]
        
        if item == gtItem:
            return math.log(2) / math.log(i+2)
    return 0

In [2]:
models = [('GMF', gmf_model), ('MLP', mlp_model), ('NMF', nmf_model)]

def initPerformance():
  # Init performance
  for name, model in models:
    t1 = time()
    mrr, ndcg = evaluate_model(model, test_rating, test_negative, TOP_K)
    print(f'{name} Init: MRR = {mrr:.4f}, NDCG = {ndcg:.4f}\t time = [{(time()-t1)/60}s]')


NameError: name 'gmf_model' is not defined

c. How the values of MRR and NDCG are differ from the results you got in the previous exercises which implemented the explicit recommendation approach. 
What are the difference in preparing the dataset for evaluation.

##### Answer:

d. How will you measure item similarity using the NeuMF model?

##### Answer:

<br>

***
## Question 3: Loss function

a. One of the enhancements presented in the Neural Collaborative Filtering paper is the usage of probabilistic activation function (the sigmoid) and binary cross entropy loss function.   

Select one of the models you implemented in question 2 and change the loss function to a Mean Squared Error and the activation function of the last layer to RELU.   

Train the model and evaluate it in a similar way to what you did in question 2. 
Compare the results and discuss.

In [60]:
model_3a = get_MLP_model(num_users, num_items, num_factors, regs = [[0,0],0,0,0], layers = [32,16,8], activation='relu')
model_3a.compile(optimizer=Adam(lr=learning_rate), loss='mse')
print(model_3a.summary())

NameError: name 'num_users' is not defined