# Mount Google Drive to retrieve the data

In [None]:
# Mount the personal Google Drive account that holds the dataset (Video_Games.csv)
from google.colab import drive
drive.mount('/content/drive')

# Set the path to the Drive folder that stores the target file
# and make it the working directory
path = '/content/drive/MyDrive/CSE6240/Project' 
%cd $path

Mounted at /content/drive
/content/drive/MyDrive/CSE6240/Project


PyTorch Lightning is not pre-installed on Colab. Run

$\textbf{"!pip install pytorch_lightning"}$ 

to install it.

In [None]:
!pip install pytorch_lightning

# Important Technique

In this paper, we use implicit feedback from users instead of the more ubiquitous explicit feedback.

# Explicit Feedback

In the context of recommender systems, explicit feedback is direct, quantitative data collected from users. For example, in our dataset, Amazon allows users to rate items on a scale of 1-5. These ratings come directly from users, and the scale allows Amazon to quantify user preference.

However, the problem with explicit feedback is that it is rare, so the dataset becomes extremely sparse. For example, few people remember the last time they clicked the like button on a YouTube video or rated an online purchase. Chances are, the number of videos you watch on YouTube is far greater than the number of videos you have explicitly rated.

# Implicit Feedback

On the other hand, implicit feedback is collected indirectly from user interactions and acts as a proxy for user preference. For example, the videos that you watch on YouTube are used as implicit feedback to tailor recommendations for you, even if you don't rate them explicitly.

The advantage of implicit feedback is that it is easy to collect. Online recommender systems are often built on implicit feedback, which allows the system to tune its recommendations in real time, with every click and interaction.

However, implicit feedback has its shortcomings. Unlike explicit feedback, every interaction is assumed to be positive, so we cannot capture negative preferences directly. We address this with negative sampling, which is implemented later in the code.

# Data Preprocessing

We store the mapping from the original userId and itemId values to new contiguous integer IDs in lookup tables. The new userId and itemId values are then used to build the train and test datasets.

In [None]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from pytorch_lightning import LightningModule, Trainer

np.random.seed(123)

In [None]:
# read text file into pandas DataFrame
data = pd.read_csv("Video_Games.csv",delimiter=",",header=None,names=['itemId','userId','rating','timestamp'])

In [None]:
# Map each original ID to a contiguous integer index (0 .. n-1)
user_table = dict(zip(data['userId'].unique(), range(data['userId'].nunique())))
item_table = dict(zip(data['itemId'].unique(), range(data['itemId'].nunique())))

In [None]:
ratings = data.copy(deep=True)
ratings['userId'] = data['userId'].map(user_table)
ratings['itemId'] = data['itemId'].map(item_table)
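
The lookup tables above assign each raw ID a contiguous integer, which is what lets the IDs index rows of an embedding table later on. A minimal pure-Python sketch of the same idea follows; the raw IDs and the helper name `build_id_table` are hypothetical and only serve as an illustration.

```python
def build_id_table(raw_ids):
    """Map each distinct raw ID to a contiguous integer, in order of first appearance."""
    table = {}
    for raw in raw_ids:
        if raw not in table:
            table[raw] = len(table)
    return table


# Hypothetical raw user IDs; the real ones come from the CSV file.
raw_users = ["userA", "userB", "userA", "userC"]
demo_user_table = build_id_table(raw_users)
encoded_users = [demo_user_table[u] for u in raw_users]

print(demo_user_table)
print(encoded_users)
```

Because the new IDs run from 0 to n-1, the largest ID plus one gives the number of embedding rows that the model needs.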

To shorten training and prediction time, we only keep users with at least five interactions (>= 5). Users with fewer interactions are considered less active and are ignored.

In [None]:
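# Count interactions per user and keep users with at least five of them.
# Indexing the counts directly avoids relying on the column names produced by
# reset_index(), which differ between pandas versions.
user_counts = ratings['userId'].value_counts()
active_users = user_counts[user_counts >= 5].index
ratings = ratings[ratings['userId'].isin(active_users)]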
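
The activity filter can be checked on a toy example. The sketch below uses only the standard library; `keep_active_users` and the toy interactions are hypothetical names and data, not part of the notebook's pipeline.

```python
from collections import Counter


def keep_active_users(interactions, min_interactions=5):
    """Keep only (user, item) pairs whose user has at least `min_interactions` rows."""
    counts = Counter(user for user, _item in interactions)
    return [(user, item) for user, item in interactions
            if counts[user] >= min_interactions]


# Toy data: user 0 has five interactions, user 1 has only two.
toy_interactions = [(0, 10), (0, 11), (0, 12), (0, 13), (0, 14), (1, 10), (1, 11)]
filtered = keep_active_users(toy_interactions)
print(filtered)
```

Only the rows of user 0 survive, since user 1 falls below the threshold of five.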

Along with the rating, there is also a timestamp column that shows the date and time the review was submitted. Using the timestamp column, we will implement our train-test split strategy using the leave-one-out methodology. For each user, the most recent review is used as the test set (i.e. leave one out), while the rest will be used as training data.

This train-test split strategy is often used when training and evaluating recommender systems. Doing a random split would not be fair, as we could potentially be using a user's recent reviews for training and earlier reviews for testing. This introduces data leakage with a look-ahead bias, and the performance of the trained model would not be generalizable to real-world performance.

In [None]:
ratings['rank_latest'] = ratings.groupby(['userId'])['timestamp'] \
                                .rank(method='first', ascending=False)

train_ratings = ratings[ratings['rank_latest'] != 1]
test_ratings = ratings[ratings['rank_latest'] == 1]

# drop columns that we no longer need
train_ratings = train_ratings[['userId', 'itemId', 'rating']]
test_ratings = test_ratings[['userId', 'itemId', 'rating']]
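
To make the leave-one-out idea concrete, here is a small standard-library sketch on hypothetical `(user, item, timestamp)` rows. The function name `leave_one_out` is an assumption, and so is the tie rule (the first row seen wins when timestamps are equal).

```python
def leave_one_out(rows):
    """Split (user, item, timestamp) rows: latest row per user is test, the rest train."""
    latest = {}  # user -> index of that user's most recent row
    for idx, (user, _item, timestamp) in enumerate(rows):
        # Strict '>' keeps the first row encountered when timestamps tie.
        if user not in latest or timestamp > rows[latest[user]][2]:
            latest[user] = idx
    test_idx = set(latest.values())
    train = [row for k, row in enumerate(rows) if k not in test_idx]
    test = [row for k, row in enumerate(rows) if k in test_idx]
    return train, test


toy_rows = [(0, 10, 100), (0, 11, 300), (0, 12, 200), (1, 10, 50), (1, 13, 40)]
toy_train, toy_test = leave_one_out(toy_rows)
print(toy_train)
print(toy_test)
```

Each user contributes exactly one test row, and every training row of a user is no newer than that user's test row, which is the property that avoids look-ahead bias.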

# Note

We will train a recommender system (NCF) using implicit feedback. However, the Amazon Review Dataset that we're using is based on explicit feedback (ratings of 1.0, 2.0, ..., 5.0). To convert it into an implicit feedback dataset, we simply binarize the ratings so that they are all '1' (i.e. the positive class). A value of '1' means that the user has interacted with the item.


In [None]:
train_ratings.loc[:, 'rating'] = 1

train_ratings.sample(5)

         userId  itemId  rating
2221785    2455    3727       1
2175948    2346     682       1
2188850   27539    1351       1
733948   166357   11868       1
343197   151023    7285       1


Here we have a problem: every user-item pair in the dataset is now a positive example (rating = 1), so the model would never see a negative class. We deal with this bias using $\textbf{negative sampling}$.

For our implementation, we choose num_negatives = 4, which means that the ratio of negative to positive samples is 4:1. This ratio is commonly used in NCF implementations, including that of the original NCF authors [Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua (2017)].

In [None]:
# Get a list of all item IDs
all_itemIds = ratings['itemId'].unique()

# Placeholders that will hold the training data
users, items, labels = [], [], []

# This is the set of items that each user has interaction with
user_item_set = set(zip(train_ratings['userId'], train_ratings['itemId']))

# 4:1 ratio of negative to positive samples
num_negatives = 4

for (u, i) in tqdm(user_item_set):
    users.append(u)
    items.append(i)
    labels.append(1) # items that the user has interacted with are positive
    for _ in range(num_negatives):
        # randomly select an item
        negative_item = np.random.choice(all_itemIds) 
        # check that the user has not interacted with this item
        while (u, negative_item) in user_item_set:
            negative_item = np.random.choice(all_itemIds)
        users.append(u)
        items.append(negative_item)
        labels.append(0) # items not interacted with are negative

  0%|          | 0/499746 [00:00<?, ?it/s]
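
The sampling loop can be sanity-checked on toy data. The sketch below restates it with the standard-library `random` module instead of NumPy; the function name `add_negative_samples`, the seed, and the toy pairs are assumptions made for illustration.

```python
import random


def add_negative_samples(positive_pairs, all_items, num_negatives=4, seed=123):
    """Return (users, items, labels) with `num_negatives` sampled negatives per positive."""
    rng = random.Random(seed)
    observed = set(positive_pairs)
    users, items, labels = [], [], []
    for user, item in positive_pairs:
        users.append(user)
        items.append(item)
        labels.append(1)
        for _ in range(num_negatives):
            candidate = rng.choice(all_items)
            # Resample until the pair is not an observed interaction.
            while (user, candidate) in observed:
                candidate = rng.choice(all_items)
            users.append(user)
            items.append(candidate)
            labels.append(0)
    return users, items, labels


toy_positives = [(0, 1), (0, 2), (1, 3)]
toy_items = list(range(10))
demo_users, demo_items, demo_labels = add_negative_samples(toy_positives, toy_items)
print(demo_labels.count(0), demo_labels.count(1))
```

Note that the resampling loop only terminates if every user has at least one item they have not interacted with, which holds comfortably for a catalogue of this size.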

Next, we wrap the negative-sampling code above in a PyTorch Dataset class.

In [None]:
class AmazonTrainDataset(Dataset):
    """Amazon Review PyTorch Dataset for Training
    
    Args:
        ratings (pd.DataFrame): Dataframe containing the item ratings
        all_itemIds (list): List containing all itemIds
    
    """

    def __init__(self, ratings, all_itemIds):
        self.users, self.items, self.labels = self.get_dataset(ratings, all_itemIds)

    def __len__(self):
        return len(self.users)
  
    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.labels[idx]

    def get_dataset(self, ratings, all_itemIds):
        users, items, labels = [], [], []
        user_item_set = set(zip(ratings['userId'], ratings['itemId']))

        num_negatives = 4
        for u, i in user_item_set:
            users.append(u)
            items.append(i)
            labels.append(1)
            for _ in range(num_negatives):
                negative_item = np.random.choice(all_itemIds)
                while (u, negative_item) in user_item_set:
                    negative_item = np.random.choice(all_itemIds)
                users.append(u)
                items.append(negative_item)
                labels.append(0)

        return torch.tensor(users), torch.tensor(items), torch.tensor(labels)

Details of Neural Collaborative Filtering (NCF) can be found in the NCF authors' paper: http://dl.acm.org/citation.cfm?id=3052569

In [None]:
class NCF(LightningModule):
    """ Neural Collaborative Filtering (NCF)
    
        Args:
            num_users (int): Number of unique users
            num_items (int): Number of unique items
            ratings (pd.DataFrame): Dataframe containing the item ratings for training
            all_itemIds (list): List containing all itemIds (train + test)
    """
    
    def __init__(self, num_users, num_items, ratings, all_itemIds):
        super().__init__()
        self.user_embedding = nn.Embedding(num_embeddings=num_users, embedding_dim=8)
        self.item_embedding = nn.Embedding(num_embeddings=num_items, embedding_dim=8)
        self.fc1 = nn.Linear(in_features=16, out_features=64)
        self.fc2 = nn.Linear(in_features=64, out_features=32)
        self.output = nn.Linear(in_features=32, out_features=1)
        self.ratings = ratings
        self.all_itemIds = all_itemIds
        
    def forward(self, user_input, item_input):
        
        # Pass through embedding layers
        user_embedded = self.user_embedding(user_input)
        item_embedded = self.item_embedding(item_input)

        # Concat the two embedding layers
        vector = torch.cat([user_embedded, item_embedded], dim=-1)

        # Pass through dense layer
        vector = nn.ReLU()(self.fc1(vector))
        vector = nn.ReLU()(self.fc2(vector))

        # Output layer
        pred = nn.Sigmoid()(self.output(vector))

        return pred
    
    def training_step(self, batch, batch_idx):
        user_input, item_input, labels = batch
        predicted_labels = self(user_input, item_input)
        loss = nn.BCELoss()(predicted_labels, labels.view(-1, 1).float())
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

    def train_dataloader(self):
        return DataLoader(AmazonTrainDataset(self.ratings, self.all_itemIds),
                          batch_size=512, num_workers=4)

We instantiate the NCF model using the class that was defined above.

In [None]:
num_users = ratings['userId'].max()+1
num_items = ratings['itemId'].max()+1

all_itemIds = ratings['itemId'].unique()

model = NCF(num_users, num_items, train_ratings, all_itemIds)

We use the Trainer class from PyTorch Lightning to train the model.

Model Setup:

1. Maximum epochs: 30 (note that the cell below is set to max_epochs=1)

2. reload_dataloaders_every_n_epochs = 1 (a fresh set of negative samples is drawn for each epoch)

In [None]:
trainer = Trainer(max_epochs=1, gpus=1, reload_dataloaders_every_n_epochs=1,
                     progress_bar_refresh_rate=50, logger=False, checkpoint_callback=False)

trainer.fit(model)

  f"Setting `Trainer(checkpoint_callback={checkpoint_callback})` is deprecated in v1.5 and will "
  f"Setting `Trainer(progress_bar_refresh_rate={progress_bar_refresh_rate})` is deprecated in v1.5 and"
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name           | Type      | Params
---------------------------------------------
0 | user_embedding | Embedding | 12.2 M
1 | item_embedding | Embedding | 575 K 
2 | fc1            | Linear    | 1.1 K 
3 | fc2            | Linear    | 2.1 K 
4 | output         | Linear    | 33    
---------------------------------------------
12.7 M    Trainable params
0         Non-trainable params
12.7 M    Total params
50.941    Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]
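
The dense-layer rows of the summary table above can be checked by hand: a fully connected layer with n inputs and m outputs has n*m weights plus m biases, and an embedding table holds one vector of length embedding_dim per ID. The helper names in this sketch are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def embedding_params(num_embeddings, embedding_dim):
    """One vector of length `embedding_dim` per ID."""
    return num_embeddings * embedding_dim


fc1_params = linear_params(16, 64)
fc2_params = linear_params(64, 32)
output_params = linear_params(32, 1)
print(fc1_params, fc2_params, output_params)  # 1088 2080 33
```

These values agree with the 1.1 K, 2.1 K and 33 reported for fc1, fc2 and output. The embedding tables dominate the total because their size grows with the number of users and items.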

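# Evaluate - NDCG@k and HITS@k

For each test user, the held-out item is ranked against 99 randomly sampled items that the user has not interacted with. Both metrics are computed at three cutoffs:

Case 1. k=1

Case 2. k=5

Case 3. k=10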

In [None]:
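def ndcg(gt_item, pred_items):
    """NDCG for a single relevant item: 1 / log2(index + 2), or 0 if it is not in the list."""
    if gt_item in pred_items:
        index = pred_items.index(gt_item)  # 0-based position in the ranked list
        return np.reciprocal(np.log2(index + 2))
    return 0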
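
With exactly one relevant item per user, the ideal DCG is 1, so NDCG reduces to 1 / log2(position + 1) for a 1-based position, which is what the function above computes from a 0-based index. The following standard-library sketch shows how the score decays with rank; the name `single_item_ndcg` and the toy ranking are hypothetical.

```python
import math


def single_item_ndcg(gt_item, ranked_items):
    """NDCG when exactly one item is relevant; 0.0 if it is missing from the ranking."""
    if gt_item not in ranked_items:
        return 0.0
    index = ranked_items.index(gt_item)  # 0-based position
    return 1.0 / math.log2(index + 2)


toy_ranking = ["a", "b", "c", "d"]
for item in ["a", "b", "c", "z"]:
    print(item, round(single_item_ndcg(item, toy_ranking), 4))
```

An item ranked first scores 1.0, second about 0.6309, third 0.5, and an item outside the list scores 0.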

In [None]:
# User-item pairs for testing
test_user_item_set = set(zip(test_ratings['userId'], test_ratings['itemId']))

# Dict of all items that are interacted with by each user
user_interacted_items = ratings.groupby('userId')['itemId'].apply(list).to_dict()

hits1 = []
hits5 = []
hits10 = []
ndcg1 = []
ndcg5 = []
ndcg10 = []

for (u,i) in tqdm(test_user_item_set):
    interacted_items = user_interacted_items[u]
    not_interacted_items = set(all_itemIds) - set(interacted_items)
    selected_not_interacted = list(np.random.choice(list(not_interacted_items), 99))
    test_items = selected_not_interacted + [i]
    
    predicted_labels = np.squeeze(model(torch.tensor([u]*100), 
                                        torch.tensor(test_items)).detach().numpy())
    
    # Rank the 100 candidates once, highest predicted score first
    ranked = np.argsort(predicted_labels)[::-1]
    top1_items = [test_items[j] for j in ranked[:1]]
    top5_items = [test_items[j] for j in ranked[:5]]
    top10_items = [test_items[j] for j in ranked[:10]]

    # HITS RATE: 1 if the held-out item is in the top-k list, else 0
    hits1.append(int(i in top1_items))
    hits5.append(int(i in top5_items))
    hits10.append(int(i in top10_items))

    # NDCG
    ndcg1.append(ndcg(i,top1_items))
    ndcg5.append(ndcg(i,top5_items))
    ndcg10.append(ndcg(i,top10_items))

print("The NDCG Ratio @ 1 is {:.4f}".format(np.average(ndcg1)))
print("The NDCG Ratio @ 5 is {:.4f}".format(np.average(ndcg5)))
print("The NDCG Ratio @ 10 is {:.4f}".format(np.average(ndcg10)))

print("The Hit Ratio @ 1 is {:.4f}".format(np.average(hits1)))
print("The Hit Ratio @ 5 is {:.4f}".format(np.average(hits5)))
print("The Hit Ratio @ 10 is {:.4f}".format(np.average(hits10)))

  0%|          | 0/64087 [00:00<?, ?it/s]

The NDCG Ratio @ 1 is 0.2023
The NDCG Ratio @ 5 is 0.3549
The NDCG Ratio @ 10 is 0.4014
The Hit Ratio @ 1 is 0.2023
The Hit Ratio @ 5 is 0.4960
The Hit Ratio @ 10 is 0.6398
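
The per-user bookkeeping above can be condensed into one helper that returns both metrics for any list of cutoffs. This is a sketch with hypothetical names (`metrics_at_k`, toy scores), not the code that produced the numbers above. It also shows why NDCG@1 and the Hit Ratio @ 1 coincide in the output: at k=1 a hit can only occur at position 0, where the NDCG value is exactly 1.

```python
import math


def metrics_at_k(scores, candidates, gt_item, ks=(1, 5, 10)):
    """Return {k: (hit, ndcg)} for one user, ranking candidates by descending score."""
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    ranked = [candidates[j] for j in order]
    results = {}
    for k in ks:
        top_k = ranked[:k]
        if gt_item in top_k:
            results[k] = (1, 1.0 / math.log2(top_k.index(gt_item) + 2))
        else:
            results[k] = (0, 0.0)
    return results


# Toy example: the held-out item 42 has the third-highest score.
toy_candidates = [7, 42, 13, 99, 5]
toy_scores = [0.9, 0.6, 0.8, 0.1, 0.3]
toy_result = metrics_at_k(toy_scores, toy_candidates, gt_item=42, ks=(1, 2, 3))
print(toy_result)
```

Averaging the hit and NDCG values over all test users gives the ratios reported per cutoff.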
