# Neural Collaborative Filtering (NCF)


**Matrix factorization algorithm**

NCF is a neural matrix factorization model that combines Generalized Matrix Factorization (GMF) with a Multi-Layer Perceptron (MLP). It unifies the linearity of MF and the non-linearity of the MLP to model the user–item latent structure.
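
To make the combination of the two branches concrete, here is a minimal sketch of a NeuMF-style forward pass. It is an illustration only: the toy sizes, the random untrained weights, the single hidden layer, and all names (`P_gmf`, `Q_mlp`, `neumf_score`, and so on) are assumptions, not the internals of the `recommenders` implementation. Plain Python lists are used to keep it dependency-free.

```python
import math
import random

random.seed(0)


def rand_vec(n):
    """Small random vector standing in for learned parameters."""
    return [random.gauss(0.0, 0.1) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


n_users, n_items = 5, 7   # toy sizes (assumed)
n_factors = 4             # GMF embedding size
mlp_dim = 8               # per-side MLP embedding size (assumed)
hidden_dim = 4            # size of the single hidden layer (assumed)

# Separate embedding tables for the GMF and MLP branches.
P_gmf = [rand_vec(n_factors) for _ in range(n_users)]
Q_gmf = [rand_vec(n_factors) for _ in range(n_items)]
P_mlp = [rand_vec(mlp_dim) for _ in range(n_users)]
Q_mlp = [rand_vec(mlp_dim) for _ in range(n_items)]

# One weight row per hidden unit, plus the final fusion weights.
W1 = [rand_vec(2 * mlp_dim) for _ in range(hidden_dim)]
h = rand_vec(n_factors + hidden_dim)


def neumf_score(user, item):
    """Return a score in (0, 1) for one (user, item) index pair."""
    # GMF branch: element-wise product of embeddings (linear interaction).
    gmf = [p * q for p, q in zip(P_gmf[user], Q_gmf[item])]
    # MLP branch: concatenated embeddings through a ReLU layer (non-linear).
    mlp_in = P_mlp[user] + Q_mlp[item]
    hidden = [max(0.0, dot(w, mlp_in)) for w in W1]
    # Fusion: concatenate both branches, project to one logit, apply sigmoid.
    logit = dot(gmf + hidden, h)
    return 1.0 / (1.0 + math.exp(-logit))


print(neumf_score(user=2, item=3))
```

Setting the MLP branch aside leaves a pure GMF model, and setting the GMF branch aside leaves a pure MLP model, which is the distinction the `model_type` argument of the library exposes.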

## Imports

In [None]:
import sys
import os
import shutil
import papermill as pm
import scrapbook as sb
import pandas as pd
import numpy as np
import tensorflow as tf

from recommenders.utils.timer import Timer
from recommenders.models.ncf.ncf_singlenode import NCF
from recommenders.models.ncf.dataset import Dataset as NCFDataset
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split
from recommenders.evaluation.python_evaluation import (rmse, mae, rsquared, exp_var, map_at_k, ndcg_at_k, precision_at_k,
                                                     recall_at_k, get_top_k_items)



System version: 3.7.5 (default, Dec  9 2021, 17:04:37) 
[GCC 8.4.0]
Pandas version: 1.3.5
Tensorflow version: 2.7.0


In [None]:
TOP_K = 10

MOVIELENS_DATA_SIZE = '100k'

EPOCHS = 100
BATCH_SIZE = 256

## NCF movie recommender

### Load and split data

We split the data chronologically using `python_chrono_split` to obtain a 75%/25% training/test split.

In [None]:
df = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=["userID", "itemID", "rating", "timestamp"]
)

df.head()

100%|██████████| 4.81k/4.81k [00:00<00:00, 16.9kKB/s]


Unnamed: 0,userID,itemID,rating,timestamp
0,196,242,3.0,881250949
1,186,302,3.0,891717742
2,22,377,1.0,878887116
3,244,51,2.0,880606923
4,166,346,1.0,886397596


In [None]:
train, test = python_chrono_split(df, 0.75)
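
As an illustration of what a per-user chronological split does, the sketch below sorts each user's interactions by timestamp and sends the earliest 75% to the training set. The helper `chrono_split_per_user` and the toy rows are hypothetical, and the library's exact rounding and tie-breaking may differ.

```python
from collections import defaultdict


def chrono_split_per_user(rows, ratio=0.75):
    """Split (user, item, rating, timestamp) tuples per user, oldest rows first."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, test = [], []
    for user_rows in by_user.values():
        ordered = sorted(user_rows, key=lambda r: r[3])  # oldest interaction first
        cut = round(len(ordered) * ratio)                # number of training rows
        train.extend(ordered[:cut])
        test.extend(ordered[cut:])
    return train, test


toy = [
    (1, 10, 4.0, 40), (1, 11, 3.0, 10), (1, 12, 5.0, 30), (1, 13, 2.0, 20),
    (2, 10, 1.0, 5), (2, 12, 4.0, 8), (2, 14, 5.0, 6), (2, 15, 3.0, 7),
]
toy_train, toy_test = chrono_split_per_user(toy)
print(toy_test)  # the most recent interaction of each user
```

Because the split is made per user, every user with enough interactions appears in both sets, and the test set always contains that user's most recent activity.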

Filter out any users or items in the test set that do not appear in the training set.

In [None]:
test = test[test["userID"].isin(train["userID"].unique())]
test = test[test["itemID"].isin(train["itemID"].unique())]

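Write the datasets to CSV files. The leave-one-out test file, which `NCFDataset` reads below, keeps only the last test interaction of each user.

In [None]:
train_file = "./train.csv"
test_file = "./test.csv"
leave_one_out_test_file = "./leave_one_out_test.csv"

# Assumption: the leave-one-out set is the most recent test interaction per user.
leave_one_out_test = test.sort_values("timestamp").groupby("userID").last().reset_index()

train.to_csv(train_file, index=False)
test.to_csv(test_file, index=False)
leave_one_out_test.to_csv(leave_one_out_test_file, index=False)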

Here we use the NCF `Dataset` data structure, which makes it easier to build the matrices needed for matrix factorization.



In [None]:
data = NCFDataset(train_file=train_file, test_file=leave_one_out_test_file, overwrite_test_file_full=True)

Indexing ./train.csv ...
Indexing ./leave_one_out_test.csv ...
Indexing ./leave_one_out_test_full.csv ...


### Train NCF model
The main NCF parameters are:

`n_factors`, which controls the dimension of the latent space. Usually, the quality of the training set predictions improves as `n_factors` grows.

`layer_sizes`, a list giving the sizes of the input layer and the hidden layers of the MLP.

`n_epochs`, which defines the number of training epochs, that is, full passes over the training data.
Note that these parameters also affect the training time.

`model_type`, which selects the model to train: a single `"MLP"` or `"GMF"` model, or the combined `"NeuMF"` model.


In [None]:
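model = NCF(
    n_users=data.n_users,
    n_items=data.n_items,
    model_type="NeuMF",
    n_factors=4,
    layer_sizes=[16, 8, 4],
    n_epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=1e-3,
    verbose=10,
)

# Train the model; `predict` relies on the parameters and ID mappings learned here.
with Timer() as train_time:
    model.fit(data)

print("Took {} seconds for training.".format(train_time.interval))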

## Prediction and Evaluation

### Prediction

`predict` returns the predicted score for a single user-item pair (or a list of scores when `is_list=True`). We collect the scores for the test set into a DataFrame:

In [None]:
predictions = [[row.userID, row.itemID, model.predict(row.userID, row.itemID)]
               for (_, row) in test.iterrows()]


predictions = pd.DataFrame(predictions, columns=['userID', 'itemID', 'prediction'])
predictions.head()

Unnamed: 0,userID,itemID,prediction
0,1.0,149.0,0.029068
1,1.0,88.0,0.624769
2,1.0,101.0,0.234142
3,1.0,110.0,0.101384
4,1.0,103.0,0.067458


### Generic Evaluation
To compute ranking metrics, we need predictions for all user-item pairs. We then remove the items each user has already rated in the training set, since we choose not to recommend them again in the top-k list.

In [None]:
users, items, preds = [], [], []
item = list(train.itemID.unique())
for user in train.userID.unique():
    user = [user] * len(item)
    users.extend(user)
    items.extend(item)
    preds.extend(list(model.predict(user, item, is_list=True)))

all_predictions = pd.DataFrame(data={"userID": users, "itemID":items, "prediction":preds})

merged = pd.merge(train, all_predictions, on=["userID", "itemID"], how="outer")
all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)
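
The merge above keeps only the user-item pairs that have no training rating. The same idea, followed by the top-k selection that the ranking metrics perform, can be sketched on a toy example. The helper `recommend_unseen` and the toy scores are hypothetical and only illustrate the logic.

```python
def recommend_unseen(scores, seen, k):
    """Return the k highest-scored items per user, skipping already-seen items.

    scores: {user: {item: score}}
    seen:   {user: set of items the user interacted with in training}
    """
    recs = {}
    for user, item_scores in scores.items():
        already_seen = seen.get(user, set())
        candidates = [
            (item, score)
            for item, score in item_scores.items()
            if item not in already_seen
        ]
        # Highest score first, then keep the first k items.
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        recs[user] = [item for item, _ in candidates[:k]]
    return recs


toy_scores = {
    "u1": {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.6},
    "u2": {"a": 0.2, "b": 0.7, "c": 0.5, "d": 0.1},
}
toy_seen = {"u1": {"a"}, "u2": {"b", "d"}}

toy_recs = recommend_unseen(toy_scores, toy_seen, k=2)
print(toy_recs)
```

Item `a` has the highest score for `u1` but is excluded because it was already seen, so it cannot inflate or crowd out the recommendations that are evaluated against the test set.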

In [None]:

eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.048144
NDCG:	0.198384
Precision@K:	0.176246
Recall@K:	0.098700


And the same metrics at a cutoff of 100:

In [None]:
eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=100)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=100)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=100)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=100)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.104101
NDCG:	0.175529
Precision@K:	0.118462
Recall@K:	0.327749
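
Going from k = 10 to k = 100, precision drops (0.176 to 0.118) while recall rises (0.099 to 0.328). A small sketch of the metrics for a single user with binary relevance shows why. The function below is a simplified illustration under that assumption, not the `recommenders` implementation, whose definitions may differ in details.

```python
import math


def precision_recall_ndcg_at_k(ranked, relevant, k):
    """Compute precision@k, recall@k and NDCG@k for one user (binary relevance)."""
    hits = [1 if item in relevant else 0 for item in ranked[:k]]
    precision = sum(hits) / k                 # share of the k slots that are relevant
    recall = sum(hits) / len(relevant)        # share of relevant items that were found
    # DCG discounts each hit by the log of its (1-based) rank plus one.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    # Ideal DCG: all relevant items placed at the top of the list.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return precision, recall, dcg / idcg


ranked_items = ["a", "b", "c", "d", "e", "f"]   # model ranking, best first
relevant_items = {"a", "d"}                      # items in the user's test set

p2, r2, n2 = precision_recall_ndcg_at_k(ranked_items, relevant_items, k=2)
p6, r6, n6 = precision_recall_ndcg_at_k(ranked_items, relevant_items, k=6)
print(p2, r2, n2)
print(p6, r6, n6)
```

With a longer list more of the relevant items are found, so recall can only stay the same or grow, while the extra slots are mostly filled with non-relevant items, which pulls precision down.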


# Conclusion

As we see, NCF gives somewhat worse results than the Standard-VAE models on most metrics. Specifically, the test set results for the three approaches are:

| Model                                           | Metric          | Value     |
|--------------------------------------------------|------------------|-----------|
| **Standard-VAE (without annealing, β=1) - 100**  | MAP@100          | 0.171624  |
|                                                  | NDCG@100         | 0.393328  |
|                                                  | Precision@100    | 0.231867  |
|                                                  | Recall@100       | 0.319650  |
| **Standard-VAE (without annealing, β=1) - 10**   | MAP@10           | 0.066183  |
|                                                  | NDCG@10          | 0.496464  |
|                                                  | Precision@10     | 0.473000  |
|                                                  | Recall@10        | 0.101120  |
| **Standard-VAE (with annealing, optimal β) - 100**| MAP@100          | 0.128121  |
|                                                  | NDCG@100         | 0.312319  |
|                                                  | Precision@100    | 0.191383  |
|                                                  | Recall@100       | 0.224346  |
| **Standard-VAE (with annealing, optimal β) - 10** | MAP@10           | 0.041101  |
|                                                  | NDCG@10          | 0.406207  |
|                                                  | Precision@10     | 0.321167  |
|                                                  | Recall@10        | 0.091352  |
| **Neural Collaborative Filtering (NCF) - 10**     | MAP@10           | 0.048144  |
|                                                  | NDCG@10          | 0.198384  |
|                                                  | Precision@10     | 0.176246  |
|                                                  | Recall@10        | 0.098700  |
| **Neural Collaborative Filtering (NCF) - 100**    | MAP@100          | 0.104101  |
|                                                  | NDCG@100         | 0.175529  |
|                                                  | Precision@100    | 0.118462  |
|                                                  | Recall@100       | 0.327749  |


## Model Comparison:

### 1. Standard-VAE vs. NCF at Cutoff 10:
- **NCF:**
  - MAP@10: 0.048144, NDCG@10: 0.198384, Precision@10: 0.176246, Recall@10: 0.098700.
- **Standard-VAE (without annealing, β=1):**
  - MAP@10: 0.066183, NDCG@10: 0.496464, Precision@10: 0.473000, Recall@10: 0.101120.
- **Observation:**
  - Standard-VAE performs better in terms of MAP, NDCG, and precision at cutoff 10.
  - NCF has a competitive recall value but lags in other metrics.

### 2. Standard-VAE (with annealing, optimal β) vs. NCF at Cutoff 100:
- **NCF:**
  - MAP@100: 0.104101, NDCG@100: 0.175529, Precision@100: 0.118462, Recall@100: 0.327749.
- **Standard-VAE (with annealing, optimal β):**
  - MAP@100: 0.128121, NDCG@100: 0.312319, Precision@100: 0.191383, Recall@100: 0.224346.
- **Observation:**
  - Standard-VAE performs better in terms of MAP, NDCG, and precision at cutoff 100.
  - NCF excels in recall but lags behind in other metrics.

## Summary:

- **Standard-VAE Strengths:**
  - Performs well at both cutoff 10 and cutoff 100.
  - Higher precision and NDCG values compared to NCF.

- **NCF Strengths:**
  - Competitive recall values, especially at cutoff 100.
  - Precision and MAP leave room for improvement through further tuning.

- **Considerations:**
  - The choice between models depends on specific use case requirements.
  - Further experimentation and tuning are recommended for both models to enhance overall performance.

