### Week 8: Collaborative Filtering
```
- Advanced Machine Learning, Innopolis University 
- Professor: Muhammad Fahim 
- Teaching Assistant: Gcinizwe Dlamini
```
<hr>


```
Lab Plan
    1. Content-based recommendation systems
    2. Matrix factorisation
    3. Surprise
    4. Deep learning based recommendation systems
    5. Lab Task
```

<hr>

## 1. Background

**Recommender Systems** are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy or anything else depending on the product).

![alt text](https://miro.medium.com/max/1920/1*Y_QG3Kvfk0fSnCirLBHZ7w.jpeg)


### Recommendation paradigms

The distinction between approaches is more academic than practical, but it’s important to understand their differences.
Broadly speaking, recommender systems are of 4 types:

1. **Collaborative filtering** is perhaps the most well-known approach to recommendation, to the point that it’s sometimes seen as synonymous with the field. The main idea is that you’re given a matrix of preferences by users for items, and these are used to predict missing preferences and recommend items with high predictions. All you need to get started is user and item IDs and a notion of preference by users for items (ratings, views, etc.).

2. **Content-based filtering** algorithms are given user preferences for items and recommend similar items based on a domain-specific notion of item content. This approach also extends naturally to cases where item metadata is available (e.g., movie stars, book authors, and music genres).
3. **Social and demographic** recommenders suggest items that are liked by friends, friends of friends, and demographically similar people. Such recommenders don’t need any preferences from the user to whom recommendations are made, so they can serve users who have not rated anything yet.
4. **Contextual recommendation** algorithms recommend items that match the user’s current context. This allows them to be more flexible and adaptive to current user needs than methods that ignore context (essentially giving the same weight to all of the user’s history). Hence, contextual algorithms are more likely to elicit a response than approaches that are based only on historical data.

## Collaborative Filtering

Collaborative filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain and exploiting similarities in rating behavior among several users in determining how to recommend an item.
CF accumulates customer product ratings, identifies customers with common ratings, and offers recommendations based on inter-customer comparisons. It’s based on the idea that people who agree in their evaluations of certain items in the past are likely to agree again in the future. For example, most people ask their trusted friends for restaurant or movie suggestions.

![alt text](https://miro.medium.com/max/687/1*-Jr1l2rlj9SBcCzlDHtN5g.jpeg)

Collaborative filtering models are based on an assumption that people like things similar to other things they like, and things that are liked by other people with similar taste.

![alt text](https://miro.medium.com/max/1348/1*K5BOY3B93MLn173VVzOW0Q.png)
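A minimal sketch of user-based collaborative filtering on a hypothetical toy rating table: the similarity between two users is the cosine over the items both have rated, and a missing rating is predicted as the similarity-weighted average of the other users' ratings for that item. The names and data are illustrative only.

```python
from math import sqrt

# Hypothetical explicit ratings: user -> {item: rating}; a missing key means "not rated"
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 4},
    "bob":   {"matrix": 4, "titanic": 2, "inception": 5, "up": 2},
    "carol": {"matrix": 1, "titanic": 5, "up": 5},
}

def cosine_sim(u, v):
    """Cosine similarity between two users, computed over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            s = cosine_sim(user, other)
            num += s * ratings[other][item]
            den += abs(s)
    return num / den if den else None

print(round(predict("alice", "up"), 2))
```

Because all ratings are positive, raw cosine similarity stays positive even for users with opposite tastes (alice and carol here); mean-centering each user's ratings first (Pearson correlation) is a common fix.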

## 2. Content-based recommendation systems

* What are content-based recommendation systems?
* How are content-based recommendation systems different from other systems you know?


### 2.1 Dataset

What does the dataset look like? <br>
The components of the dataset:

1. Item ID (`id`)
2. Item description (`description`)

```python
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load the dataset and take a look
ds = pd.read_csv("sample-data.csv")
ds.head()
```

## 2.2 Recommendation task

Recommend k items to a user who is interested in item 30 ("Cotton Shorts").

Solution:

1. Create the TF-IDF vector of every item description.
2. Measure the cosine similarity between every pair of items.
3. Recommend the k most similar items.
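The three steps can be sketched in plain Python on a hypothetical toy catalogue (the descriptions are invented, not taken from `sample-data.csv`). It uses unigrams and no stop-word list, but the same smoothed idf and L2 normalisation that `TfidfVectorizer` applies by default, so the cosine similarity reduces to a dot product.

```python
import math
from collections import Counter

# Hypothetical toy catalogue (invented descriptions, not from sample-data.csv)
docs = {
    1: "organic cotton shorts",
    2: "cotton shorts for running",
    3: "waterproof rain jacket",
    4: "lightweight running jacket",
}

# Step 1: TF-IDF. Smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2-normalise each vector
tokens = {i: d.split() for i, d in docs.items()}
n_docs = len(docs)
doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}

def tfidf(toks):
    tf = Counter(toks)                                  # raw term counts
    vec = {t: tf[t] * idf[t] for t in tf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()}

vectors = {i: tfidf(toks) for i, toks in tokens.items()}

# Step 2: cosine similarity; for unit-length vectors it is just the dot product
def cosine(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Step 3: the k most similar items, excluding the query item itself
def top_k(item_id, k):
    scores = [(cosine(vectors[item_id], vectors[j]), j) for j in vectors if j != item_id]
    return sorted(scores, reverse=True)[:k]

print(top_k(1, 2))
```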

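```python
# Step 1: Create the TF-IDF vector of every item description
# (min_df=1 keeps every term; newer scikit-learn releases reject the integer min_df=0)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(ds['description'])

# Step 2: Measure cosine similarity. TfidfVectorizer L2-normalises every row by default,
# so the plain dot product computed by linear_kernel equals the cosine similarity.
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)

# Step 3: For every item, store its (up to) 99 most similar items, best first
results = {}
for idx, row in ds.iterrows():
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    similar_items = [(cosine_similarities[idx][i], ds['id'][i]) for i in similar_indices]
    # The first entry is normally the item itself (similarity 1), so drop it
    results[row['id']] = similar_items[1:]

print('done!')

def item(item_id):
    """Return the product name: the part of the description before ' - '."""
    return ds.loc[ds['id'] == item_id]['description'].tolist()[0].split(' - ')[0]

def recommend(item_id, num):
    """Print the `num` items most similar to `item_id`, read from `results`."""
    print("Recommending " + str(num) + " products similar to " + item(item_id) + "...")
    print("-------")
    for score, rec_id in results[item_id][:num]:
        print("Recommended: " + item(rec_id) + " (score: " + str(score) + ")")

recommend(item_id=30, num=10)
```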

Discussion:

1. Make changes to the approach and observe how the recommendations change.
2. How would you evaluate the model?

## 3. Model Based Recommendation Systems


![alt text](https://datascienceplus.com/wp-content/uploads/2017/09/2017-09-20-2.png)

## 3.1 Dataset
We will use the
[**MovieLens 20M Dataset**](https://grouplens.org/datasets/movielens/20m/) <br>
A publicly available dataset from grouplens.org. It contains 20,000,263 ratings and 465,564 tag applications across 27,278 movies, created by 138,493 users between 1995 and 2015.

Download and unzip it:

```python
!wget https://files.grouplens.org/datasets/movielens/ml-20m.zip --no-check-certificate
!unzip 'ml-20m.zip'
```

## 3.2 Load Dataset

```python
data_path = './ml-20m/'
movies_filename = 'movies.csv'
ratings_filename = 'ratings.csv'

df_movies = pd.read_csv(
    os.path.join(data_path, movies_filename),
    usecols=['movieId', 'title'],
    dtype={'movieId': 'int32', 'title': 'str'})

df_ratings = pd.read_csv(
    os.path.join(data_path, ratings_filename),
    usecols=['userId', 'movieId', 'rating'],
    dtype={'userId': 'int32', 'movieId': 'int32', 'rating': 'float32'})
```


```python
df_movies.head()
```

```python
df_ratings.head()
```

## 3.3 Create the user-movie matrix

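```python
# Keep only the first 2,000,000 ratings so the dense matrix stays manageable
df_ratings = df_ratings[:2000000]

# Rows = users, columns = movies, values = ratings; unrated movies become 0
df_movie_features = df_ratings.pivot(
    index='userId',
    columns='movieId',
    values='rating'
).fillna(0)

df_movie_features.head()
```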
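The pivot turns the long (userId, movieId, rating) table into a dense user x movie matrix in which every unrated pair is stored as 0. The share of cells that hold a real rating, density = #ratings / (#users x #movies), is small for MovieLens, which is why this matrix grows so quickly. A plain-Python sketch of the same reshaping on hypothetical toy triples:

```python
# Hypothetical (userId, movieId, rating) triples standing in for df_ratings
triples = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0), (3, 30, 2.0)]

users = sorted({u for u, _, _ in triples})
movies = sorted({m for _, m, _ in triples})
col = {m: j for j, m in enumerate(movies)}      # movieId -> column position

# One row per user, one column per movie, 0.0 where the user gave no rating
# (the same idea as df_ratings.pivot(...).fillna(0))
matrix = {u: [0.0] * len(movies) for u in users}
for u, m, r in triples:
    matrix[u][col[m]] = r

density = len(triples) / (len(users) * len(movies))
print(matrix, f"density = {density:.2f}")
```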

## 3.4 Singular value decomposition (SVD)

* Remember dimensionality reduction.
* What are other algorithms for dimensionality reduction?

SVD from scratch??? **NO**<br>

We will use SciPy's truncated SVD, `scipy.sparse.linalg.svds`, for now.
<br>

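```python
from scipy.sparse.linalg import svds

# Dense user x movie matrix (unrated entries are 0)
R = df_movie_features.values
# Per-user mean rating (note: the 0s of unrated movies are included in this mean)
user_ratings_mean = np.mean(R, axis=1)
R_demeaned = R - user_ratings_mean.reshape(-1, 1)

# Truncated SVD: keep only the k = 50 largest singular values
U, sigma, Vt = svds(R_demeaned, k=50)
sigma = np.diag(sigma)

# Low-rank reconstruction plus the user means = predicted rating for every (user, movie)
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)

# Row i of preds_df corresponds to row i of df_movie_features
preds_df = pd.DataFrame(all_user_predicted_ratings, columns=df_movie_features.columns)
preds_df.head()
```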
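Why does keeping only k = 50 singular values give sensible predictions? By the Eckart-Young theorem, the truncated SVD is the best rank-k approximation of a matrix in the Frobenius norm, and its error equals the square root of the sum of the discarded squared singular values. The sketch below checks this numerically on a small random matrix standing in for `R_demeaned`; it uses the full `np.linalg.svd` instead of `svds`, and the helper `rank_k` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small random matrix standing in for R_demeaned
R_toy = rng.integers(0, 6, size=(6, 5)).astype(float)

# Full SVD; numpy returns the singular values in descending order
U_toy, s_toy, Vt_toy = np.linalg.svd(R_toy, full_matrices=False)

def rank_k(k):
    """Rank-k reconstruction from the k largest singular triplets."""
    return U_toy[:, :k] @ np.diag(s_toy[:k]) @ Vt_toy[:k, :]

for k in range(1, len(s_toy) + 1):
    err = np.linalg.norm(R_toy - rank_k(k))        # Frobenius norm of the residual
    expected = np.sqrt(np.sum(s_toy[k:] ** 2))      # sqrt of the discarded sigma^2
    print(k, round(float(err), 4), round(float(expected), 4))
```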

```python
def recommend_movies(preds_df, userID, movies_df, original_ratings_df, num_recommendations=5):
    # Assumes userIds are 1, 2, 3, ... so that user u sits in row u - 1 of preds_df
    user_row_number = userID - 1
    # Predicted ratings of this user for every movie, highest first
    sorted_user_predictions = preds_df.iloc[user_row_number].sort_values(ascending=False)

    # Movies the user has already rated, with titles, best-rated first
    user_data = original_ratings_df[original_ratings_df.userId == userID]
    user_full = (user_data.merge(movies_df, how='left', on='movieId')
                 .sort_values(['rating'], ascending=False))

    # Rank the movies the user has not rated yet by their predicted rating
    user_preds = (pd.DataFrame(sorted_user_predictions).reset_index()
                  .rename(columns={user_row_number: 'Predictions'}))
    recommendations = (movies_df[~movies_df['movieId'].isin(user_full['movieId'])]
                       .merge(user_preds, how='left', on='movieId')
                       .sort_values('Predictions', ascending=False)
                       .iloc[:num_recommendations, :-1])   # drop the Predictions column

    return user_full, recommendations


already_rated, predictions = recommend_movies(preds_df, 330, df_movies, df_ratings, 10)
already_rated.head(10)
```

```python
predictions
```

## 4. [Surprise](https://github.com/NicolasHug/Surprise)

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.

```python
!pip3 install scikit-surprise
```

### 4.1 Load data and fit SVD model

```python
from surprise import Reader, SVD, Dataset
from collections import defaultdict

# MovieLens ratings go from 0.5 to 5 stars in half-star steps
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df_ratings[["userId", "movieId", "rating"]], reader)
```

```python
# Build a training set from all ratings and fit the model
# (Surprise's SVD is trained with stochastic gradient descent)
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)
```
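The Surprise documentation describes its `SVD` as predicting $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$ (global mean, user bias, item bias and a dot product of latent factors) and learning these parameters with SGD. A minimal pure-Python sketch of that update rule on a hypothetical toy dataset; `train_biased_mf`, `rmse` and the hyperparameters are illustrative, not Surprise's code or defaults.

```python
import random
from math import sqrt

def train_biased_mf(ratings, n_factors=2, n_epochs=200, lr=0.02, reg=0.02, seed=0):
    """Toy SGD for r_hat(u, i) = mu + b_u + b_i + q_i . p_u (not Surprise's actual code)."""
    rnd = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)                     # global mean
    bu = {u: 0.0 for u, _, _ in ratings}                                  # user biases
    bi = {i: 0.0 for _, i, _ in ratings}                                  # item biases
    pu = {u: [rnd.gauss(0, 0.1) for _ in range(n_factors)] for u in bu}   # user factors
    qi = {i: [rnd.gauss(0, 0.1) for _ in range(n_factors)] for i in bi}   # item factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(pu[u], qi[i]))

    data = list(ratings)
    for _ in range(n_epochs):
        rnd.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]          # use the old values in both updates
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
    return mu, predict

def rmse(ratings, predict):
    return sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical toy ratings: (user, item, rating)
toy = [(1, "a", 5), (1, "b", 4), (2, "a", 4), (2, "c", 1),
       (3, "b", 2), (3, "c", 5), (4, "a", 1), (4, "c", 4)]
mu, predict = train_biased_mf(toy)
print("global-mean RMSE:", round(rmse(toy, lambda u, i: mu), 3))
print("trained RMSE:    ", round(rmse(toy, predict), 3))
```

Both numbers are training errors; a fair comparison would score held-out ratings, for example after splitting the data with `surprise.model_selection.train_test_split`.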

### 4.2 Recommend Movies

```python
def get_top_n(predictions, n=10):
  """Return the top-N recommendations for each user from a set of predictions.
  Args:
      predictions(list of Prediction objects): The list of predictions, as
          returned by the test method of an algorithm.
      n(int): The number of recommendations to output for each user. Default
          is 10.
  Returns:
  A dict where keys are user (raw) ids and values are lists of tuples:
      [(raw item id, rating estimation), ...] of size n.
  """

  # First map the predictions to each user.
  top_n = defaultdict(list)
  for uid, iid, true_r, est, _ in predictions:
      top_n[uid].append((iid, est))

  # Then sort the predictions for each user and retrieve the k highest ones.
  for uid, user_ratings in top_n.items():
      user_ratings.sort(key=lambda x: x[1], reverse=True)
      top_n[uid] = user_ratings[:n]

  return top_n
```

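```python
# All (user, movie) pairs that are NOT in the training set, i.e. the unseen movies.
# Warning: this has (#users x #movies - #ratings) entries and can need a lot of memory.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

top_n = get_top_n(predictions, n=10)
```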
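`build_anti_testset()` returns every (user, item) pair that is absent from the training set, with the unknown "true" rating filled by default with the global mean. It therefore has #users x #items - #ratings entries, usually far more than the ratings themselves. A plain-Python sketch of the same construction on hypothetical toy data (not Surprise's implementation):

```python
# Hypothetical toy training data: (user, item, rating)
train = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i2", 5.0)]

users = sorted({u for u, _, _ in train})
items = sorted({i for _, i, _ in train})
rated = {(u, i) for u, i, _ in train}
global_mean = sum(r for _, _, r in train) / len(train)

# Every unrated (user, item) pair; the third field is a placeholder "true rating"
anti_testset = [(u, i, global_mean) for u in users for i in items if (u, i) not in rated]
print(anti_testset)
print(len(anti_testset) == len(users) * len(items) - len(train))
```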

```python
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])
```

## 5. Deep Learning Based Approach

**We will use PyTorch !!**

The deep learning approach is not so different from the SVD we have just seen: both represent every user and every movie by a latent vector (the rows of U and V in SVD, learned embeddings here).

**Back to Embeddings!!!**

The neural network is made up of two embedding layers and some hidden layers.

1. Example architecture:
    - 1.1 Two `Embedding`s, one for users and one for movies.
    - 1.2 One `Dropout` applied to the output of the embeddings.
    - 1.3 The hidden layers.
    - 1.4 The output layer.

2. Example forward pass:
    - 2.1 Get the two embedding tensors, then concatenate them.
    - 2.2 Run the result through the hidden layers, then the last fully connected (`fc`) layer.
    - 2.3 Apply a sigmoid activation.
    - 2.4 Rescale the output from (0, 1) to the rating range (roughly [1, 5]).

3. Loss function is MSE
4. Optimizer ??? 
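Step 2.4 rescales the sigmoid output; in `RecommenderNet.forward` below this is done with `out*(max_rating - min_rating + 1) + min_rating - 0.5`. A quick plain-Python check of that arithmetic for `minmax = [1, 5]` (`scale` is a hypothetical helper): the outputs lie in (0.5, 5.5), half a star past each end, so predictions of exactly 1 and 5 are reachable with finite scores even though a sigmoid never reaches 0 or 1.

```python
import math

def scale(logit, min_rating=1, max_rating=5):
    """Map a raw score to a rating the same way RecommenderNet.forward does."""
    out = 1 / (1 + math.exp(-logit))                  # sigmoid, in (0, 1)
    return out * (max_rating - min_rating + 1) + min_rating - 0.5

for z in (-10, 0, 10):
    print(z, round(scale(z), 3))
```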



```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pandas as pd

# Use the GPU if one is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

epochs = 100
batch_sz = 128
```

## Read Data and Create Batches

For this part we switch to the smaller MovieLens "latest-small" dataset: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. It is intended for education and development.

```python
# Get the smaller dataset
!wget https://files.grouplens.org/datasets/movielens/ml-latest-small.zip --no-check-certificate
!unzip ml-latest-small.zip
```

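```python
df_ratings = pd.read_csv("./ml-latest-small/ratings.csv", usecols=['userId', 'movieId', 'rating'],
                         dtype={'userId': 'int32', 'movieId': 'int32', 'rating': 'float32'})

# ratings.csv is sorted by userId; shuffle it so that each batch mixes many users
df_ratings = df_ratings.sample(frac=1, random_state=42).reset_index(drop=True)

# Map raw ids to contiguous indices 0..n-1 (movieIds are sparse and go far beyond
# the number of movies, so "movieId - 1" would waste embedding rows)
users, user_ids = pd.factorize(df_ratings['userId'])
movies, movie_ids = pd.factorize(df_ratings['movieId'])
rates = df_ratings['rating'].values
n_samples = len(rates)

n_users, n_movies = len(user_ids), len(movie_ids)
batches = []

# Create batches of (user indices, movie indices, ratings) tensors
for i in range(0, n_samples, batch_sz):
  limit = min(i + batch_sz, n_samples)
  users_batch, movies_batch, rates_batch = users[i:limit], movies[i:limit], rates[i:limit]
  batches.append((torch.tensor(users_batch, dtype=torch.long),
                  torch.tensor(movies_batch, dtype=torch.long),
                  torch.tensor(rates_batch, dtype=torch.float)))

# Free the intermediate arrays
users = movies = rates = None
```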
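The batches in the cell above are built once, so every epoch visits them in the same order. A common alternative is to reshuffle at the start of every epoch, which is what `torch.utils.data.DataLoader(..., shuffle=True)` does. A framework-free sketch of the idea; the helper `epoch_batches` is hypothetical, and each inner list of indices would be used to slice the user, movie and rating arrays.

```python
import random

def epoch_batches(n_samples, batch_size, epoch, seed=0):
    """Return this epoch's batches of sample indices, reshuffled with an epoch-specific seed."""
    idx = list(range(n_samples))
    random.Random(seed + epoch).shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_samples, batch_size)]

for epoch in range(2):
    print(epoch, epoch_batches(n_samples=10, batch_size=4, epoch=epoch))
```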

## Define Model

**TODO:** implement the hidden layers with the following architecture:

* 3 layers (128, 256 and 128 neurons)
* Dropout after every layer with 20% probability
* ReLU as the activation function for all 3 hidden layers

```python
class RecommenderNet(nn.Module):
  def __init__(self, n_users, n_movies, n_factors=50, embedding_dropout=0.02, dropout_rate=0.2):
    super().__init__()

    self.u = nn.Embedding(n_users, n_factors)     # user embeddings
    self.m = nn.Embedding(n_movies, n_factors)    # movie embeddings
    self.drop = nn.Dropout(embedding_dropout)
    self.hidden = nn.Sequential(....) #TODO: Implement the hidden layers
    self.fc = nn.Linear(128, 1)                   # the last hidden layer has 128 units
    self._init()

  def forward(self, users, movies, minmax=(1, 5)):
    # Concatenate user and movie embeddings: shape (batch, 2 * n_factors)
    features = torch.cat([self.u(users), self.m(movies)], dim=1)
    x = self.drop(features)
    x = self.hidden(x)
    out = torch.sigmoid(self.fc(x))

    if minmax is not None:
      # Stretch the sigmoid output from (0, 1) to (min - 0.5, max + 0.5), i.e. (0.5, 5.5)
      # for [1, 5], so that the extreme ratings can actually be predicted
      min_rating, max_rating = minmax
      out = out*(max_rating - min_rating + 1) + min_rating - 0.5
    return out

  def _init(self):
    """
    Initialize the embeddings uniformly and the linear layers with Xavier.
    """
    def init(m):
        if isinstance(m, nn.Linear):
            torch.nn.init.xavier_uniform_(m.weight)
            m.bias.data.fill_(0.01)

    self.u.weight.data.uniform_(-0.05, 0.05)
    self.m.weight.data.uniform_(-0.05, 0.05)
    self.hidden.apply(init)
    init(self.fc)
```

```python
net = RecommenderNet(n_users=n_users, n_movies=n_movies).to(device)
net
```

## Define Training parameters

1. Loss function 
2. Optimizer 
3. Learning rate scheduler: `lr_scheduler.ReduceLROnPlateau` dynamically reduces the learning rate when a metric has stopped improving.
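A simplified, framework-free sketch of the plateau rule with the settings used below (mode `'min'`, `factor=0.3`, `patience=2`). It ignores the `threshold`, `cooldown` and `min_lr` options of the real PyTorch scheduler, and `plateau_schedule` is a hypothetical helper.

```python
def plateau_schedule(losses, lr=1e-3, factor=0.3, patience=2):
    """Simplified ReduceLROnPlateau ('min' mode, no threshold/cooldown/min_lr)."""
    best, bad_epochs, history = float("inf"), 0, []
    for loss in losses:
        if loss < best:                  # improvement: remember it, reset the counter
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:        # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history

print(plateau_schedule([1.0, 0.8, 0.81, 0.82, 0.83, 0.7]))
```

With `patience=2` the learning rate drops on the third consecutive epoch without improvement.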

```python
criterion = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(net.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.3, patience=2)
```

## Training Loop

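```python
epochs = 10

for epoch in range(epochs):
  train_loss = 0.0
  for users_batch, movies_batch, rates_batch in batches:
    optimizer.zero_grad()
    out = net(users_batch.to(device), movies_batch.to(device), [1, 5]).squeeze(1)
    loss = criterion(out, rates_batch.to(device))   # (prediction, target)

    loss.backward()
    optimizer.step()
    # .item() turns the loss into a Python float, so the autograd graph is not kept alive
    train_loss += loss.item()
  epoch_loss = train_loss / len(batches)
  # Let the scheduler react to the mean loss of the epoch, not only the last batch
  scheduler.step(epoch_loss)
  print("Mean loss at epoch {} = {:.4f}".format(epoch, epoch_loss))
print("Last batch loss = {}".format(loss.item()))
```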

## Lab Task
```
1. Implement ...

```


<center>Don't forget to make a Git commit</center>

## References
1. [Introduction to recommender systems](https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada)

2. [Recommender system](https://en.wikipedia.org/wiki/Recommender_system)

3. [Recommender Systems with Python — Part I: Content-Based Filtering](https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831)

4. [Build a Recommendation Engine With Collaborative Filtering](https://realpython.com/build-recommendation-engine-collaborative-filtering/)
