<h1> ~Neural Collaborative Filtering -  Implementation with Keras </h>

In [1]:
import numpy as np
import pandas as pd
import os
import tensorflow as T
import keras
from keras import backend as K
from keras import initializers
from keras.initializers import RandomNormal
from keras.models import Sequential, Model, load_model, save_model
from keras.layers.core import Dense, Lambda, Activation
from keras.layers import Embedding, Input, Dense, merge, Reshape, Merge, Flatten, Dropout
from keras.optimizers import Adagrad, Adam, SGD, RMSprop, Adamax
from keras.regularizers import l2
from keras.layers import Multiply, Concatenate
from keras.callbacks import Callback, EarlyStopping, ModelCheckpoint
from time import time
import multiprocessing as mp
import sys
import math
import argparse
import matplotlib.pyplot as plt


Using TensorFlow backend.


In [0]:
!pip install -U -q PyDrive ## you will have install for every colab session

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
from google.colab import files

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

**Define dataset folder and files**

In [0]:
RATING_DATA_FILE_TRAIN = 'u1.base'
RATING_DATA_FILE_TEST = 'u1.test'
MOVIES_DATA_FILE_PATH = 'u.item'
USERS_DATA_FILE_PATH = 'u.user'

In [0]:
rating_file_import_train = drive.CreateFile({'id':'1_UmoSVT7fEwYBvDBGWSzAeekdQEOJBy6'})
rating_file_import_train.GetContentFile(RATING_DATA_FILE_TRAIN)

rating_file_import_test = drive.CreateFile({'id':'17DIYedqGvQO7SYgnfdPYYIW2ycRMyTWW'})
rating_file_import_test.GetContentFile(RATING_DATA_FILE_TEST)

movies_file_import = drive.CreateFile({'id':'1b3gelAzp7gPoFZSHEBHd7jSdiE1op_4j'})
movies_file_import.GetContentFile(MOVIES_DATA_FILE_PATH)

users_file_import = drive.CreateFile({'id':'1k0QZ9mw2OA0bLCLj0mt180Vs6Yyid8of'})
users_file_import.GetContentFile(USERS_DATA_FILE_PATH)



**The user and item id for embedding should start from 0. Update ids and save to file **

In [0]:
r_cols = ['userid', 'movieid', 'rating', 'timestamp']
ratings_train_df = pd.read_csv(RATING_DATA_FILE_TRAIN, sep='\t', engine='python', encoding='latin-1',names=r_cols)
ratings_train_df['user_emb_id'] = ratings_train_df['userid'] - 1
ratings_train_df['movie_emb_id'] = ratings_train_df['movieid'] - 1

ratings_test_df = pd.read_csv(RATING_DATA_FILE_TEST, sep='\t', engine='python', encoding='latin-1',names=r_cols)
ratings_test_df['user_emb_id'] = ratings_test_df['userid'] - 1
ratings_test_df['movie_emb_id'] = ratings_test_df['movieid'] - 1

u_cols = ['userid','age','gender','profession', '']
users = pd.read_csv(USERS_DATA_FILE_PATH, sep='|', engine='python', encoding='latin-1',names=u_cols)
users['gender'] = users['gender'].map({'M': 0, 'F': 1})

i_cols = ['movieid', 'title']
items_df = pd.read_csv(MOVIES_DATA_FILE_PATH, sep='|', engine='python', encoding='latin-1', names=i_cols, usecols=[0, 1])

train_df = pd.merge(items_df, ratings_train_df, on='movieid')
test_df = pd.merge(items_df, ratings_test_df, on='movieid')
joined_df = pd.concat([train_df, test_df])

Users = joined_df['user_emb_id'].values
Movies = joined_df['movie_emb_id'].values
Ratings = joined_df['rating'].values

In [7]:
max_userid = joined_df['userid'].drop_duplicates().max()
max_movieid = joined_df['movieid'].drop_duplicates().max()
print("Max user id:", max_userid, "   Max movie id:", max_movieid)

Max user id: 943    Max movie id: 1682


## Ex 1

**Define matrix factorization model**

In [0]:
def get_model(num_users, num_items, latent_dim):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)   
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    prediction = merge([user_latent, item_latent], mode = 'dot')
    
    
    model = Model(input=[user_input, item_input], output=prediction)

    return model

**Define embedding size and compile model**

In [9]:
K_LATENT = 20
MF_model = get_model(max_userid,max_movieid,K_LATENT)
MF_model.compile(loss='mse', optimizer='adamax',metrics=['mae'])

  
  name=name)


**Train model **

In [10]:
callbacks = [EarlyStopping('val_loss', patience=3)]
history = MF_model.fit([Users, Movies], Ratings, epochs=80, validation_split=.2, verbose=1, callbacks=callbacks, batch_size = 32)

Train on 80000 samples, validate on 20000 samples
Epoch 1/80
Epoch 2/80
Epoch 3/80
Epoch 4/80
Epoch 5/80
Epoch 6/80
Epoch 7/80
Epoch 8/80
Epoch 9/80
Epoch 10/80
Epoch 11/80
Epoch 12/80
Epoch 13/80
Epoch 14/80
Epoch 15/80
Epoch 16/80
Epoch 17/80
Epoch 18/80
Epoch 19/80
Epoch 20/80
Epoch 21/80
Epoch 22/80
Epoch 23/80
Epoch 24/80
Epoch 25/80
Epoch 26/80
Epoch 27/80
Epoch 28/80
Epoch 29/80
Epoch 30/80
Epoch 31/80
Epoch 32/80
Epoch 33/80
Epoch 34/80
Epoch 35/80
Epoch 36/80
Epoch 37/80
Epoch 38/80
Epoch 39/80
Epoch 40/80
Epoch 41/80
Epoch 42/80
Epoch 43/80
Epoch 44/80
Epoch 45/80
Epoch 46/80
Epoch 47/80
Epoch 48/80
Epoch 49/80
Epoch 50/80
Epoch 51/80
Epoch 52/80
Epoch 53/80
Epoch 54/80
Epoch 55/80
Epoch 56/80
Epoch 57/80
Epoch 58/80
Epoch 59/80
Epoch 60/80
Epoch 61/80
Epoch 62/80
Epoch 63/80
Epoch 64/80
Epoch 65/80
Epoch 66/80
Epoch 67/80
Epoch 68/80
Epoch 69/80
Epoch 70/80
Epoch 71/80
Epoch 72/80
Epoch 73/80
Epoch 74/80
Epoch 75/80
Epoch 76/80


**Ex 1 Section A**

 **Average MAE for standard matrix factorization:**

In [11]:
print("Average MAE:", np.mean(history.history['mean_absolute_error']),\
      "and Min MAE", min(history.history['mean_absolute_error']),\
      "in", len(history.history['mean_absolute_error']), "epochs")

Average MAE: 0.8414171710930373 and Min MAE 0.6153571675896644 in 76 epochs


**Define generalized matrix factorization model **

In [0]:
def get_gmf_model(num_users, num_items, latent_dim,do):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)   
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    hidden1 = Multiply()([user_latent, item_latent])
    drop = Dropout(do)(hidden1)
    prediction = Dense(1, activation='relu', kernel_initializer='lecun_uniform', name = 'prediction')(drop)
    
    
    model = Model(input=[user_input, item_input], output=prediction)

    return model

** Ex 1 section B **

In [13]:
k_latents = [20, 30]
dos = [0.5, 0.2]
GMF_results = []
for do in dos:
  for k in k_latents:
    GMF_model = get_gmf_model(max_userid,max_movieid,k,do)
    GMF_model.compile(loss='mse',optimizer=Adamax(),metrics=['mae'])
    callbacks = [EarlyStopping('val_loss', patience=3)]
    GMF_history = GMF_model.fit([Users, Movies], Ratings, epochs=150, validation_split=.2, verbose=1, callbacks=callbacks, batch_size = 32)
    GMF_results.append((k, do, GMF_history))



Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15

**Average MAE for generalized matrix factorization:**

In [20]:
for k, do, history in GMF_results:
  avg = np.mean(history.history['mean_absolute_error'])
  epochs = len(history.history['mean_absolute_error'])
  minimum = min(history.history['mean_absolute_error'])
  print("Using", k, "latent dimension in", epochs, "epochs and", do, "drop out Average MAE:", avg, "and Min MAE", minimum)

Using 20 latent dimension in 51 epochs and 0.5 drop out Average MAE: 0.7466326880896792 and Min MAE 0.6677943883299827
Using 30 latent dimension in 42 epochs and 0.5 drop out Average MAE: 0.7344445350505057 and Min MAE 0.6331295590162277
Using 20 latent dimension in 20 epochs and 0.2 drop out Average MAE: 0.7768717128753662 and Min MAE 0.6205486805081367
Using 30 latent dimension in 19 epochs and 0.2 drop out Average MAE: 0.7597886738143469 and Min MAE 0.57933232640028


**Ex 1 Section C**

Using Matrix Factorization, the average MAE was 0.84 with minimal MAE of 0.6166

Using GMF's best run, the average MAE was 0.7597 with minimal MAE of 0.5793

Ass 2 - the average MAE was 0.8272

We saw that the time to train the new models - GMF and MF - are lonager than the model's train of ass 2 - minutes vs seconds.

By the results of the min MAE, we can say that the new models are better than ass 2 model.


**Define neural collaborative filtering model**

In [0]:
def get_ncf_model(num_users, num_items, latent_dim, hidden_dim, do):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)   
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    conc = Concatenate()([user_latent, item_latent])
    drop = Dropout(0.5)(conc)
    hid1 = Dense(hidden_dim, activation='relu')(conc)
    drop2  = Dropout(do)(hid1)
    prediction = Dense(1, activation='relu', kernel_initializer='lecun_uniform', name = 'prediction')(drop2)
    
    
    model = Model(input=[user_input, item_input], output=prediction)

    return model

##Ex 2

In [0]:
def get_ncf_model(num_users, num_items, latent_dim, hidden_dim, do, num_of_hidden):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)   
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    conc = Concatenate()([user_latent, item_latent])
    drop = Dropout(0.5)(conc)
    hid1 = Dense(hidden_dim, activation='relu')(drop)
    last_layer = hid1
    for i in range(num_of_hidden - 1):
      hid_n = Dense(hidden_dim, activation='relu')(last_layer)
      last_layer = hid_n
    drop2  = Dropout(do)(last_layer)
    prediction = Dense(1, activation='relu', kernel_initializer='lecun_uniform', name = 'prediction')(drop2)
    
    model = Model(inputs=[user_input, item_input], outputs=prediction)

    return model

In [16]:
K_LATENT = 20
do = 0.5
num_of_nuerons = [15, 20]
num_of_layers = [1, 3]
optimizers = [(Adamax(), 'Adamax'), (SGD(), 'SGD')]
results = []
for layers in num_of_layers:
  for nuerons in num_of_nuerons:
    for opt, opt_name in optimizers:
      NCF_model = get_ncf_model(max_userid, max_movieid, K_LATENT, nuerons, do, layers)
      NCF_model.compile(loss='mse', optimizer=opt, metrics=['mae'])
      callbacks_ncf = [EarlyStopping('val_loss', patience=5)]
      NCF_history = NCF_model.fit([Users, Movies], Ratings, epochs=150, validation_split=.2, verbose=1, callbacks=callbacks_ncf, batch_size = 32)
      results.append((layers, nuerons, opt_name, NCF_history))

Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 

In [17]:
print("Const parameters: Loss Function - MSE")
for num_of_layers, num_of_nuerons, optimizer, history in results:
  avg = np.mean(history.history['mean_absolute_error'])
  epochs = len(history.history['mean_absolute_error'])
  minimum = min(history.history['mean_absolute_error'])
  print("Using", num_of_layers, "hidden layers sized", num_of_nuerons, "," , optimizer, "as optimizer in", epochs, "epochs: Average MAE is", avg,"Min MAE is", minimum)

Const parameters: Loss Function - MSE
Using 1 hidden layers sized 15 , Adamax as optimizer in 28 epochs: Average MAE is 0.7850146589381355 Min MAE is 0.749778125333786
Using 1 hidden layers sized 15 , SGD as optimizer in 28 epochs: Average MAE is 0.7884379261136056 Min MAE is 0.7639994487762451
Using 1 hidden layers sized 20 , Adamax as optimizer in 40 epochs: Average MAE is 0.7708266069939732 Min MAE is 0.7462981087684631
Using 1 hidden layers sized 20 , SGD as optimizer in 29 epochs: Average MAE is 0.7728053828054462 Min MAE is 0.7460360916137695
Using 3 hidden layers sized 15 , Adamax as optimizer in 21 epochs: Average MAE is 0.7919087006943565 Min MAE is 0.7515310255050659
Using 3 hidden layers sized 15 , SGD as optimizer in 27 epochs: Average MAE is 0.7906686696158516 Min MAE is 0.7512444159984588
Using 3 hidden layers sized 20 , Adamax as optimizer in 16 epochs: Average MAE is 0.8012737548299134 Min MAE is 0.7460245637416839
Using 3 hidden layers sized 20 , SGD as optimizer in 42

The minimal MAE (0.7404) for Neural Collaborative Filtering was achieved using 3 hidden layers sized 20, "Adamax" as optimizer and took 31 epochs to converge. Comparing to GMF, the minimal MAE of NCF is higher but its average MAE is lower. the best method the smallest MAE.



## Ex 3

**Section A**

Categorical feature (movie category) requires more effort in order to trasform than binary (gender) or continuos feature (age).
Hence, we decided to add to the model with the best results so far (GMF), the features of age and gender.

In [0]:
u_cols = ['userid', 'age', 'gender']
users_df = pd.read_csv(USERS_DATA_FILE_PATH, delimiter='|', encoding='latin-1', names=u_cols, usecols=[0, 1, 2])
users_df['gender'].replace('M', 0, inplace=True)
users_df['gender'].replace('F', 1, inplace=True)
joined_with_users_df = pd.merge(joined_df, users_df,on='userid')
Gender = joined_with_users_df['gender'].values
Age = joined_with_users_df['age'].values

In [0]:
def get_gmf_model_gen(num_users, num_items, latent_dim, do, max_age):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name = 'user_input')
    item_input = Input(shape=(1,), dtype='int32', name = 'item_input')
    gender_input = Input(shape=(1,), dtype='float32', name = 'gender_input')
    age_input = Input(shape=(1,), dtype='float32', name = 'age_input')

    MF_Embedding_User = Embedding(input_dim = num_users, output_dim = latent_dim, name = 'user_embedding', input_length=1)
    MF_Embedding_Item = Embedding(input_dim = num_items, output_dim = latent_dim, name = 'item_embedding', input_length=1)
    
    # Crucial to flatten an embedding vector!
    user_latent = Flatten()(MF_Embedding_User(user_input))
    item_latent = Flatten()(MF_Embedding_Item(item_input))
    
    # Element-wise product of user and item embeddings
    hidden1 = Multiply()([user_latent, item_latent])
    conc = Concatenate()([hidden1, gender_input, age_input])
    drop = Dropout(do)(conc)
    prediction = Dense(1, activation='relu', kernel_initializer='lecun_uniform', name = 'prediction')(conc)
    
    model = Model(input=[user_input, item_input,gender_input, age_input], output=prediction)
  
    return model

**Section B**

In [20]:
k_latents = [20, 30]
drop_out = [0.3, 0.5]
GMF_results_features_add = []
max_age = max(Age)
for do in drop_out:
  for k in k_latents:
    GMF_model = get_gmf_model_gen(max_userid,max_movieid,k,do,max_age)
    GMF_model.compile(loss='mse',optimizer=Adamax(),metrics=['mae'])
    callbacks = [EarlyStopping('val_loss', patience=3)]
    GMF_results_features_add_history = GMF_model.fit([Users, Movies, Gender, Age], Ratings, epochs=150, validation_split=.2, verbose=1, callbacks=callbacks, batch_size = 32)
    GMF_results_features_add.append((k, do, GMF_results_features_add_history))



Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Train on 80000 samples, validate on 20000 samples
Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150


**Ex 3 Section C**

In [24]:
for k, drop_out, history in GMF_results_features_add:
  avg = np.mean(history.history['mean_absolute_error'])
  epochs = len(history.history['mean_absolute_error'])
  minimum = min(history.history['mean_absolute_error'])
  print("drop out of ", drop_out,"and ",  k, " latent dimension in", epochs, "epochs =>",  "Average MAE:", avg, ", Min MAE",minimum)


drop out of  0.3 and  20  latent dimension in 4 epochs => Average MAE: 3.52835 , Min MAE 3.52835
drop out of  0.3 and  30  latent dimension in 4 epochs => Average MAE: 3.52835 , Min MAE 3.52835
drop out of  0.5 and  20  latent dimension in 14 epochs => Average MAE: 0.767402311807871 , Min MAE 0.6069800953626633
drop out of  0.5 and  30  latent dimension in 4 epochs => Average MAE: 3.52835 , Min MAE 3.52835


**Ex 3 Section D**

We add to the GMF model the features: age and gender, and then we got min MAE - 0.551 - which improved the model's results.
The reason for the relatively small running time of the model is the high MAE in the beginning.

**Ex 3 Section E**

The Results Are:

1. Simple Matrix Factorization - min. MAE: 0.6166
2. Generalized Matrix Factorization - min. MAE: 0.5793
3. Neural Collaborative Filtering - min. MAE: 0.739

So far, the best results are for the Generalized Matrix Factorization model.

We decided to add the features of "age" and "gender" to the GMF model in order the improved the MAE.

We can see, according to Section C that the addition of the features improved the results - from 0.5793 to 0.5551.

We would recommend GMF with the features adding as the model with the best results in order to predict movies ratings.