# Recommendations Systems
## Homework 3: Neural Collaborative Filtering

Submit your solution in the form of a Jupyter notebook file (with the `.ipynb` extension).   
Images of graphs or tables should be submitted as PNG or JPG files.   
The code used to answer the questions should be included, runnable and documented in the notebook.   
Python 3.6 should be used.

The goal of this homework is to help you understand recommendations based on implicit data, which is very common in real life, and to learn how deep neural network components can be used to implement collaborative filtering and hybrid recommenders.  
An implementation example is presented in the <a href='https://colab.research.google.com/drive/1v72_zpCObTFMbNnQXUknoQVXR1vBRX6_?usp=sharing'>NeuralCollaborativeFiltering_Implicit</a> notebook in Moodle.

We will use a dataset based on the <a href='https://grouplens.org/datasets/movielens/1m/'>MovieLens 1M rating dataset</a> after some pre-processing to adapt it to an implicit feedback use case scenario.  
You can download the dataset either from <a href='https://github.com/hexiangnan/neural_collaborative_filtering'>this implementation</a> of the Neural Collaborative Filtering paper or from the NeuralCollaborativeFiltering_implicit notebook in Moodle.
<br>
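
As a minimal sketch of the kind of pre-processing mentioned above, one common way to turn explicit ratings into implicit feedback is to treat every observed rating as a positive interaction, whatever its value. The toy data frame and the `interaction` column name below are assumptions made for illustration and are not part of the provided dataset files.

```python
import pandas as pd

# Hypothetical toy ratings in the MovieLens column layout (values are made up)
ratings = pd.DataFrame({
    "user_id": [0, 0, 1, 1],
    "item_id": [10, 11, 10, 12],
    "rating": [5, 2, 4, 1],
    "timestamp": [100, 200, 150, 250],
})

# Implicit view: every observed (user, item) pair is a positive interaction,
# so the rating value itself is dropped and replaced by a constant 1
implicit = ratings[["user_id", "item_id", "timestamp"]].copy()
implicit["interaction"] = 1  # assumed column name, for illustration only

print(implicit)
```

Pairs that never appear in the data are the unobserved entries; in an implicit setting they are usually treated as candidate negatives rather than as known dislikes.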

## Imports:

In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os


np.random.seed(0)

#### Constants:

#### Preprocessing:

In [3]:
# !git clone https://github.com/hexiangnan/neural_collaborative_filtering.git

Cloning into 'neural_collaborative_filtering'...


In [21]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
training = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.train.rating', sep='\t', names=column_names) # Read the training file
test_rating = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.test.rating', sep='\t', names=column_names) # Read the test file

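# The first column holds the "(user_id, item_id)" test pair;
# the remaining 99 columns (id-1 ... id-99) hold the sampled negative item IDs
negative_ids = ['(user_id, item_id)'] + [f'id-{i}' for i in range(1, 100)]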

test_negative = pd.read_csv('./neural_collaborative_filtering/Data/ml-1m.test.negative', sep='\t', names=negative_ids)

In [20]:
test_negative

Unnamed: 0,"(user_id, item_id)",id-1,id-2,id-3,id-4,id-5,id-6,id-7,id-8,id-9,...,id-90,id-91,id-92,id-93,id-94,id-95,id-96,id-97,id-98,id-99
0,"(0,25)",1064,174,2791,3373,269,2678,1902,3641,1216,...,2854,3067,58,2551,2333,2688,3703,1300,1924,3118
1,"(1,133)",1072,3154,3368,3644,549,1810,937,1514,1713,...,1535,341,3525,1429,2225,1628,2061,469,3056,2553
2,"(2,207)",2216,209,2347,3,1652,3397,383,2905,2284,...,953,865,813,1353,2945,2580,2989,2790,2879,2481
3,"(3,208)",3023,1489,1916,1706,1221,1191,2671,81,2483,...,3347,1707,2901,2767,2167,1921,247,1618,2016,2323
4,"(4,222)",1794,3535,108,593,466,2048,854,1378,1301,...,2490,1332,2526,2804,2027,833,176,463,2851,2453
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6035,"(6035,1048)",2495,3406,819,729,1920,2003,3329,2351,549,...,2583,2905,2713,2361,2542,2598,2030,2984,3382,2771
6036,"(6036,294)",2248,1318,3661,72,351,2131,3281,2482,639,...,110,508,2168,354,1156,1646,3238,2091,1494,2489
6037,"(6037,1528)",2194,867,1424,2517,3080,2789,1210,3150,466,...,1428,433,74,3457,833,2823,2425,3434,2331,2530
6038,"(6038,1449)",2606,2054,2754,1299,2854,2413,1055,742,2876,...,2140,3401,813,1374,307,1477,2327,114,98,3021


## Question 1: Dataset preparation

a. This implementation contains one file for training and two files for testing:
- ml-1m.train.rating
- ml-1m.test.rating
- ml-1m.test.negative

**Explain** the role and structure of each file and how it was created from the original MovieLens 1M rating dataset.

##### Answer:

ml-1m.train.rating:
- Training file
- Each line is a training instance: userID\t itemID\t rating\t timestamp (if exists)

ml-1m.test.rating:
- Test file (positive instances)
- Each line is a testing instance: userID\t itemID\t rating\t timestamp (if exists)

ml-1m.test.negative:
- Test file (negative instances)
- Each line corresponds to the matching line of ml-1m.test.rating and contains 99 negative samples
- Each line is in the format: (userID,itemID)\t negativeItemID1\t negativeItemID2...
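
The format described above can be parsed as in the following sketch. It uses the first two rows of the table shown earlier, truncated to three negatives each so the example stays short; the column names `pair` and `neg-1` to `neg-3` are assumptions chosen for illustration.

```python
import io
import pandas as pd

# First two rows of ml-1m.test.negative as displayed above, truncated to 3 negatives
sample = "(0,25)\t1064\t174\t2791\n(1,133)\t1072\t3154\t3368\n"

cols = ["pair", "neg-1", "neg-2", "neg-3"]  # assumed names, for illustration
neg = pd.read_csv(io.StringIO(sample), sep="\t", names=cols)

# Split the "(user,item)" string into two integer columns
pairs = neg["pair"].str.strip("()").str.split(",", expand=True).astype(int)
neg["user_id"] = pairs[0]
neg["item_id"] = pairs[1]

# Candidate list per user: the held-out positive item first, then the negatives
candidates = neg[["item_id", "neg-1", "neg-2", "neg-3"]].to_numpy()
print(candidates)
```

Keeping the positive item at a fixed position (index 0 here) makes it easy to locate its rank after the model has scored all candidates.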

b. **Explain** how the training dataset is created.

##### Answer:

c. **Explain** how the test dataset is created.

##### Answer:

<br>

***
## Question 2: Neural Collaborative filtering

a. Build the following four models using the neural collaborative filtering approach: 
- Matrix Factorization (MF)
- Multi-Layer Perceptron (MLP)
- Generalized Matrix Factorization (GMF) 
- Neural Matrix Factorization (NeuMF)

b. Train and evaluate the recommendation accuracy of three models: 
- MF or GMF
- MLP
- NeuMF
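
To make the difference between the architectures concrete, here is a minimal NumPy sketch of their forward passes only (no training). It assumes randomly initialized weights, a single hidden layer in the MLP branch, and arbitrary toy sizes; all variable and function names are hypothetical, and a real solution would typically use a deep learning framework instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 5, 7, 4  # toy sizes, chosen arbitrarily

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Separate embedding tables for the GMF branch and the MLP branch
P_gmf, Q_gmf = rng.normal(size=(n_users, dim)), rng.normal(size=(n_items, dim))
P_mlp, Q_mlp = rng.normal(size=(n_users, dim)), rng.normal(size=(n_items, dim))
W1, b1 = rng.normal(size=(2 * dim, dim)), np.zeros(dim)  # one hidden layer (assumption)
h_gmf = rng.normal(size=dim)        # output weights of GMF
h_mlp = rng.normal(size=dim)        # output weights of MLP
h_neumf = rng.normal(size=2 * dim)  # output weights of the fused model

def mf_score(u, i):
    # Plain MF: inner product of user and item embeddings
    return np.sum(P_gmf[u] * Q_gmf[i], axis=-1)

def gmf_vector(u, i):
    # GMF: element-wise product, later weighted by a learned output layer
    return P_gmf[u] * Q_gmf[i]

def mlp_vector(u, i):
    # MLP: concatenate embeddings and pass them through a ReLU hidden layer
    x = np.concatenate([P_mlp[u], Q_mlp[i]], axis=-1)
    return np.maximum(x @ W1 + b1, 0.0)

def gmf_score(u, i):
    return sigmoid(gmf_vector(u, i) @ h_gmf)

def mlp_score(u, i):
    return sigmoid(mlp_vector(u, i) @ h_mlp)

def neumf_score(u, i):
    # NeuMF: concatenate the last hidden vectors of both branches
    z = np.concatenate([gmf_vector(u, i), mlp_vector(u, i)], axis=-1)
    return sigmoid(z @ h_neumf)

users = np.array([0, 1, 2])
items = np.array([3, 4, 5])
print(gmf_score(users, items), mlp_score(users, items), neumf_score(users, items))
```

With the GMF output weights fixed to all ones and no sigmoid, the GMF logit reduces to the plain MF inner product, which is why the assignment allows either MF or GMF in part (b).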

Compare the learning curves and recommendation accuracy using the NDCG and MRR metrics with cutoff values of 5 and 10.   
Discuss the comparison. 
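
For the evaluation, a minimal sketch of NDCG@k and MRR@k for a single test user is given below. It assumes the leave-one-out setup of the provided test files (one held-out positive ranked against sampled negatives) and that the positive item's score sits at index 0 of the score array; the function names are hypothetical.

```python
import numpy as np

def rank_of_positive(scores):
    """Return the 0-based rank of the positive item (assumed at index 0)."""
    # Count candidates scored strictly higher than the positive;
    # ties are resolved in favor of the positive item (an assumption)
    return int(np.sum(scores[1:] > scores[0]))

def ndcg_mrr_at_k(scores, k):
    """Return (NDCG@k, MRR@k) for one user with a single relevant item."""
    rank = rank_of_positive(scores)
    if rank >= k:
        return 0.0, 0.0  # positive item is outside the top-k list
    ndcg = 1.0 / np.log2(rank + 2)  # ideal DCG is 1 with one relevant item
    mrr = 1.0 / (rank + 1)
    return float(ndcg), float(mrr)

# Toy scores: the positive (0.8) is beaten by two negatives (0.9 and 0.85)
scores = np.array([0.8, 0.9, 0.1, 0.85, 0.3])
print(ndcg_mrr_at_k(scores, k=5))
print(ndcg_mrr_at_k(scores, k=2))
```

Averaging these per-user values over all test users gives the reported NDCG@5, NDCG@10, MRR@5, and MRR@10.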

c. How do the values of MRR and NDCG differ from the results you obtained in the previous exercises, which implemented the explicit recommendation approach? 
What are the differences in preparing the dataset for evaluation?

d. How will you measure item similarity using the NeuMF model?
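
One possible approach, sketched here under assumptions, is to compare the learned item embeddings with cosine similarity. The matrix below is random and only stands in for a trained item embedding table; for NeuMF, which has one item embedding per branch, concatenating the GMF and MLP item embeddings is one option among several. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained item embedding matrix (n_items x dim); random here
item_emb = rng.normal(size=(6, 4))

def most_similar_items(item_emb, item_id, top_n=3):
    """Return the indices of the top_n items closest to item_id by cosine similarity."""
    normed = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sims = normed @ normed[item_id]
    sims[item_id] = -np.inf  # exclude the query item itself
    return np.argsort(-sims)[:top_n]

print(most_similar_items(item_emb, item_id=0))
```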

<br>

***
## Question 3: Loss function

a. One of the enhancements presented in the Neural Collaborative Filtering paper is the use of a probabilistic activation function (the sigmoid) and the binary cross-entropy loss function.   

Select one of the models you implemented in Question 2, change the loss function to Mean Squared Error, and change the activation function of the last layer to ReLU.   

Train the model and evaluate it in a similar way to what you did in question 2. 
Compare the results and discuss.
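
The two output-layer and loss combinations can be compared on the same raw model outputs, as in the following NumPy sketch. The logits and labels are made-up toy values, and the function names are hypothetical; in a framework-based solution the same change amounts to swapping the final activation and the loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def bce_loss(logits, labels, eps=1e-7):
    """Binary cross-entropy on sigmoid outputs (the original NCF setup)."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))))

def mse_loss(logits, labels):
    """Mean squared error on ReLU outputs (the variant asked for in this question)."""
    return float(np.mean((relu(logits) - labels) ** 2))

# Toy pre-activation outputs and implicit labels (1 = observed, 0 = sampled negative)
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1.0, 0.0, 1.0, 0.0])

print("BCE + sigmoid:", bce_loss(logits, labels))
print("MSE + ReLU:   ", mse_loss(logits, labels))
```

Note that the ReLU output is unbounded above, so predictions can overshoot the label 1, and its gradient is zero for negative pre-activations; both points are worth considering when discussing the results.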

In [5]:
# from HW1
def get_mse(pred, actual):
    I = actual != 0  # Indicator function which is zero for missing data
    ME = I * (actual - pred)  # Errors between real and predicted ratings
    MSE = ME**2
    return np.sum(MSE)/np.sum(I)