# Neural Collaborative Filtering on MovieLens dataset.

Neural Collaborative Filtering (NCF) is a well-known recommendation algorithm that generalizes matrix factorization with a multi-layer perceptron.

This notebook provides an example of how to use and evaluate the NCF implementation in `reco_utils`.

The fundamental assumption behind collaborative filtering is that users with similar preferences can inform each other's recommendations: items liked by similar users are recommended to a user who has not yet seen or used them.
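To make "generalizes matrix factorization with a multi-layer perceptron" more concrete, here is a minimal NumPy sketch of a NeuMF-style forward pass. It is not the `reco_utils` implementation: the weights are random and untrained, and every name (`neumf_score`, `gmf_user`, `mlp_user`, and so on) is illustrative. The sizes mirror the `n_factors=4` and `layer_sizes=[16,8,4]` used later in this notebook; treating the first layer size as the width of the concatenated user and item embeddings is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 5, 7      # toy sizes, not the real dataset
n_factors = 4                # width of the matrix-factorization (GMF) branch
layer_sizes = [16, 8, 4]     # widths of the MLP branch (assumed layout)

# Each branch gets its own user and item embedding tables.
gmf_user = rng.normal(scale=0.1, size=(n_users, n_factors))
gmf_item = rng.normal(scale=0.1, size=(n_items, n_factors))
mlp_dim = layer_sizes[0] // 2
mlp_user = rng.normal(scale=0.1, size=(n_users, mlp_dim))
mlp_item = rng.normal(scale=0.1, size=(n_items, mlp_dim))

# Dense layers of the MLP branch: 16 -> 8 -> 4.
dense = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
# The output layer weighs the concatenation of both branches.
out_w = rng.normal(scale=0.1, size=n_factors + layer_sizes[-1])


def neumf_score(user, item):
    """Return a score in (0, 1) for one user-item pair."""
    # GMF branch: element-wise product, as in classic matrix factorization.
    gmf = gmf_user[user] * gmf_item[item]
    # MLP branch: concatenate embeddings, then apply ReLU layers.
    hidden = np.concatenate([mlp_user[user], mlp_item[item]])
    for w, b in dense:
        hidden = np.maximum(hidden @ w + b, 0.0)
    # Fuse both branches and squash with a sigmoid.
    logit = np.concatenate([gmf, hidden]) @ out_w
    return 1.0 / (1.0 + np.exp(-logit))


score = neumf_score(user=0, item=3)
print(score)
```

If the MLP branch is dropped and the output weights are fixed to ones, the logit reduces to the dot product of the two GMF embeddings, which is plain matrix factorization.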

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("../../")
import time
import pandas as pd
import tensorflow as tf

from reco_utils.recommender.ncf.ncf_singlenode import NCF
from reco_utils.recommender.ncf.dataset import Dataset as NCFDataset
from reco_utils.dataset import movielens
from reco_utils.common.notebook_utils import is_jupyter
from reco_utils.dataset.python_splitters import python_chrono_split
from reco_utils.evaluation.python_evaluation import (rmse, mae, rsquared, exp_var, map_at_k, ndcg_at_k, precision_at_k, 
                                                     recall_at_k, get_top_k_items)

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow version: {}".format(tf.__version__))

System version: 3.6.10 |Anaconda, Inc.| (default, May  7 2020, 19:46:08) [MSC v.1916 64 bit (AMD64)]
Pandas version: 0.25.3
Tensorflow version: 1.14.0


Set the number of epochs, the batch size, and the number of top-K recommendations.

In [4]:
# top k items to recommend
TOP_K = 10

# Model parameters
EPOCHS = 10
BATCH_SIZE = 256

SEED = 42

Importing the snacks data

In [5]:
df1 = pd.read_csv("InputFoodData.csv")
df1

Unnamed: 0,userID,itemID,rating,category
0,196,242,3,Fast Foods
1,63,242,3,Fast Foods
2,226,242,5,Fast Foods
3,154,242,3,Fast Foods
4,306,242,5,Fast Foods
...,...,...,...,...
58869,736,296,4,Fish
58870,655,296,4,Fish
58871,782,296,3,Fish
58872,733,296,2,Fish


In [12]:
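# python_chrono_split sorts on a "timestamp" column, which this data lacks.
# Renaming "category" lets the splitter run, but rows are then ordered by
# category name rather than by time.
df1.rename(columns={'category': 'timestamp'}, inplace=True)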
df1

Unnamed: 0,userID,itemID,rating,timestamp
0,196,242,3,Fast Foods
1,63,242,3,Fast Foods
2,226,242,5,Fast Foods
3,154,242,3,Fast Foods
4,306,242,5,Fast Foods
...,...,...,...,...
58869,736,296,4,Fish
58870,655,296,4,Fish
58871,782,296,3,Fish
58872,733,296,2,Fish


### 2. Split the data using the Python chronological splitter provided in utilities

In [13]:
train, test = python_chrono_split(df1, 0.75)
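The idea behind a per-user chronological split can be sketched in plain pandas. This is an illustration under simplifying assumptions, not the code of `python_chrono_split`: the helper name `chrono_split_sketch` and the toy data are hypothetical, and the library's handling of rounding and filtering may differ. Note also that in this notebook the `timestamp` column holds category names, so the ordering here is alphabetical rather than temporal.

```python
import pandas as pd


def chrono_split_sketch(df, ratio=0.75, col_user="userID", col_time="timestamp"):
    """Send the earliest `ratio` of each user's rows to train, the rest to test."""
    df = df.sort_values([col_user, col_time])
    # Position of each row within its user's history, and that history's length.
    rank = df.groupby(col_user).cumcount()
    size = df.groupby(col_user)[col_time].transform("size")
    is_train = rank < (size * ratio).round()
    return df[is_train], df[~is_train]


toy = pd.DataFrame({
    "userID":    [1, 1, 1, 1, 2, 2, 2, 2],
    "itemID":    [10, 11, 12, 13, 10, 11, 12, 13],
    "rating":    [3, 4, 5, 2, 1, 5, 4, 3],
    "timestamp": [4, 1, 3, 2, 8, 5, 7, 6],
})
toy_train, toy_test = chrono_split_sketch(toy)
print(toy_test)
```

Each toy user has four interactions, so three go to the training set and the most recent one is held out for testing.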


Generate an NCF dataset object from the data subsets.

In [14]:
data = NCFDataset(train=train, test=test, seed=SEED)

### 3. Train the NCF model on the training data, and get the top-k recommendations for our testing data

NCF accepts implicit feedback and generates a propensity score between 0 and 1 for each item to be recommended to a user. A recommended item list can then be generated from these scores. Note that this quickstart notebook uses a small number of epochs to reduce training time. As a consequence, model performance will be slightly degraded.

In [15]:
model = NCF(
    n_users=data.n_users, 
    n_items=data.n_items,
    model_type="NeuMF",
    n_factors=4,
    layer_sizes=[16,8,4],
    n_epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=1e-3,
    verbose=10,
    seed=SEED
)
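Because NCF works with implicit feedback, explicit ratings are usually turned into positive labels and paired with sampled negatives (items the user has not interacted with). The sketch below shows one way this could look. The helper `to_implicit_with_negatives`, its `n_neg` parameter, and the toy data are hypothetical; how `NCFDataset` performs this step internally is not shown in this notebook.

```python
import numpy as np
import pandas as pd


def to_implicit_with_negatives(df, n_neg=4, seed=42):
    """Label observed pairs 1.0 and sample up to n_neg unseen items per positive as 0.0."""
    rng = np.random.default_rng(seed)
    all_items = np.sort(df["itemID"].unique())
    rows = []
    for user, group in df.groupby("userID"):
        seen = set(group["itemID"])
        rows.extend((user, item, 1.0) for item in sorted(seen))
        unseen = np.array([item for item in all_items if item not in seen])
        n_samples = min(len(unseen), n_neg * len(seen))
        if n_samples > 0:
            # Sample without replacement so a negative is never repeated per user.
            negatives = rng.choice(unseen, size=n_samples, replace=False)
            rows.extend((user, item, 0.0) for item in negatives)
    return pd.DataFrame(rows, columns=["userID", "itemID", "label"])


toy = pd.DataFrame({
    "userID": [1, 1, 2, 3, 3],
    "itemID": [1, 2, 3, 4, 5],
    "rating": [5, 3, 4, 2, 1],
})
implicit = to_implicit_with_negatives(toy, n_neg=1)
print(implicit)
```

The rating value itself is discarded: any observed interaction counts as a positive, which is why the model output is a propensity between 0 and 1 rather than a predicted rating.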

# Fitting the model to the data

In [16]:
start_time = time.time()

model.fit(data)

train_time = time.time() - start_time

print("Took {} seconds for training.".format(train_time))

Took 23.66390085220337 seconds for training.


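As in the movie recommendation scenario, where movies a user has already seen are not recommended again, items a user has already rated are excluded here. The cell below scores every user-item pair and then drops the pairs that appear in the training set.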

In [29]:
start_time = time.time()

users, items, preds = [], [], []
item = list(train.itemID.unique())
for user in train.userID.unique():
    user = [user] * len(item) 
    users.extend(user)
    items.extend(item)
    preds.extend(list(model.predict(user, item, is_list=True)))

all_predictions = pd.DataFrame(data={"userID": users, "itemID":items, "prediction":preds})

merged = pd.merge(train, all_predictions, on=["userID", "itemID"], how="outer")
all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)
all_predictions = all_predictions.drop('timestamp', axis=1)
test_time = time.time() - start_time
print("Took {} seconds for prediction.".format(test_time))

Took 1.2595479488372803 seconds for prediction.
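The seen-item filter above relies on `rating` being null after an outer merge. An alternative, shown here as a sketch on hypothetical toy frames (`toy_train`, `toy_scores`), is a left anti-join using the `indicator` option of `DataFrame.merge`, which does not depend on any column of the training data being non-null.

```python
import pandas as pd

toy_train = pd.DataFrame({
    "userID": [1, 1, 2],
    "itemID": [10, 11, 10],
    "rating": [5, 3, 4],
})
toy_scores = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 2],
    "itemID":     [10, 11, 12, 10, 11, 12],
    "prediction": [0.9, 0.8, 0.3, 0.7, 0.6, 0.2],
})

# Left anti-join: keep scored pairs that have no match in the training set.
seen_pairs = toy_train[["userID", "itemID"]].drop_duplicates()
flagged = toy_scores.merge(
    seen_pairs, on=["userID", "itemID"], how="left", indicator=True
)
unseen = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(unseen)
```

Only the pairs (1, 12), (2, 11), and (2, 12) remain, since the other pairs were already present in the toy training set.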


In [30]:
all_predictions

Unnamed: 0,userID,itemID,prediction
44161,1,251,0.074738
44162,1,257,0.105764
44163,1,237,0.124421
44164,1,242,0.200317
44165,1,258,0.208095
...,...,...,...
352677,943,420,0.000519
352678,943,311,0.000093
352679,943,385,0.000565
352680,943,345,0.000066


# Merging the user recommendations with the food data

In [31]:
df = pd.read_csv("FinalFoodData.csv")

In [32]:
all_predictions = all_predictions.merge(df, on= 'itemID')
all_predictions

Unnamed: 0,userID,itemID,prediction,name,category,calories,fat,protein,sugars
0,1,251,0.074738,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
1,3,251,0.045755,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
2,4,251,0.041991,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
3,5,251,0.017546,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
4,6,251,0.115210,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
...,...,...,...,...,...,...,...,...,...
308516,934,127,0.524285,Coffee Cafe Con Leche,Coffee,39,1.08,1.64,4.93
308517,937,127,0.800922,Coffee Cafe Con Leche,Coffee,39,1.08,1.64,4.93
308518,940,127,0.691475,Coffee Cafe Con Leche,Coffee,39,1.08,1.64,4.93
308519,941,127,0.769170,Coffee Cafe Con Leche,Coffee,39,1.08,1.64,4.93


In [33]:
len(set(all_predictions.userID))

943

In [26]:
all_predictions[all_predictions.userID == 1]

Unnamed: 0,userID,itemID,prediction,name,category,calories,fat,protein,sugars
0,1,251,0.074738,Pizza Hut 14 Inch Cheese Pizza Stuffed Crust,Fast Foods,274,11.63,12.23,2.90
903,1,257,0.105764,Popeyes Mild Chicken Strips Analyzed 2006,Fast Foods,271,13.01,19.20,0.00
1597,1,237,0.124421,Burger King Cheeseburger,Fast Foods,286,14.81,14.57,4.49
2258,1,242,0.200317,Dominos 14 Inch Cheese Pizza Ultimate Deep Dis...,Fast Foods,265,9.83,10.76,4.22
3099,1,258,0.208095,Popeyes Spicy Chicken Strips Analyzed 2006,Fast Foods,253,11.20,19.61,0.00
...,...,...,...,...,...,...,...,...,...
142270,1,420,0.002639,Syrups Corn Dark,Sweets,286,0.00,0.00,77.59
143212,1,311,0.000226,Florida Avocados,Fruits,120,10.06,2.23,2.42
144152,1,385,0.001921,Pretzels Hard Plain Lightly Salted,Snacks,382,3.22,9.57,2.21
145094,1,345,0.000146,Pizza Cheese From School Lunch Medium Crust,Pizza,250,8.59,13.67,6.07


In [35]:
a = all_predictions.sort_values(['userID', 'prediction'], ascending=False).groupby('userID').head(TOP_K)
a

Unnamed: 0,userID,itemID,prediction,name,category,calories,fat,protein,sugars
293988,943,176,0.900956,Blue Cheese,Dairy and Egg Products,353,28.74,21.40,0.50
208693,943,82,0.859687,Martini Flavored,Beverages,189,0.03,0.09,5.15
205379,943,89,0.823664,Rum And Cola,Beverages,89,0.19,0.00,7.48
172690,943,29,0.792040,Beans Chili Barbecue Ranch Style Cooked,Beans and Lentils,97,1.00,5.00,5.25
65768,943,183,0.786719,Cottage Cheese (Blended),Dairy and Egg Products,98,4.30,11.12,2.67
...,...,...,...,...,...,...,...,...,...
107700,1,191,0.726353,Goat Milk,Dairy and Egg Products,69,4.14,3.56,4.45
64332,1,208,0.648021,Mozzarella (Hard And Lowfat),Dairy and Egg Products,295,19.78,23.75,1.90
107030,1,202,0.604387,Ice Cream Sandwich Vanilla Light No Sugar Added,Dairy and Egg Products,200,2.86,5.71,6.58
106207,1,178,0.593377,Brie Cheese,Dairy and Egg Products,334,27.68,20.75,0.45


In [36]:
a.to_csv("ncf_recomm.csv", index=False)

### 4. Evaluate how well NCF performs

Ranking metrics (MAP, NDCG, precision@K, and recall@K) are used for evaluation.

In [21]:
eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.005699
NDCG:	0.022968
Precision@K:	0.022587
Recall@K:	0.023757
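To give a feel for what precision@K and recall@K measure, the sketch below computes them by hand on toy data using a common definition (per-user hits in the top K, averaged over users with ground truth). The helper `precision_recall_at_k` and the toy frames are hypothetical, and the `reco_utils` functions may differ in details such as which users are included in the average.

```python
import pandas as pd


def precision_recall_at_k(truth, scores, k):
    """Average per-user precision@k and recall@k over users present in `truth`."""
    top_k = (
        scores.sort_values(["userID", "prediction"], ascending=[True, False])
        .groupby("userID")
        .head(k)
    )
    # A hit is a top-k item that also appears in the user's ground truth.
    hits = top_k.merge(truth[["userID", "itemID"]], on=["userID", "itemID"])
    relevant = truth.groupby("userID").size()
    hit_counts = hits.groupby("userID").size().reindex(relevant.index, fill_value=0)
    precision = (hit_counts / k).mean()
    recall = (hit_counts / relevant).mean()
    return precision, recall


toy_truth = pd.DataFrame({"userID": [1, 1, 2], "itemID": [10, 12, 11]})
toy_scores = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 2],
    "itemID":     [10, 11, 12, 10, 11, 12],
    "prediction": [0.9, 0.8, 0.1, 0.9, 0.2, 0.5],
})
precision, recall = precision_recall_at_k(toy_truth, toy_scores, k=2)
print(precision, recall)
```

User 1 has one hit among two recommendations and two relevant items, while user 2 has none, so both averages come out to 0.25 on this toy data.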


In [37]:
if is_jupyter():
    # Record results with papermill for tests
    import papermill as pm
    pm.record("map", eval_map)
    pm.record("ndcg", eval_ndcg)
    pm.record("precision", eval_precision)
    pm.record("recall", eval_recall)
    pm.record("train_time", train_time)
    pm.record("test_time", test_time)
