## Recommender Systems

**Recommender Systems** are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy or anything else depending on the product).

![alt text](https://miro.medium.com/max/1920/1*Y_QG3Kvfk0fSnCirLBHZ7w.jpeg)


### Recommendation paradigms

The distinction between approaches is more academic than practical, but it’s important to understand their differences.
Broadly speaking, recommender systems are of 4 types:

1. **Collaborative filtering** is perhaps the most well-known approach to recommendation, to the point that it’s sometimes seen as synonymous with the field. The main idea is that you’re given a matrix of preferences by users for items, and these are used to predict missing preferences and recommend items with high predictions. All you need to get started is user and item IDs and a notion of preference by users for items (ratings, views, etc.).

2. **Content-based filtering** algorithms are given user preferences for items and recommend similar items based on a domain-specific notion of item content. This approach also extends naturally to cases where item metadata is available (e.g., movie stars, book authors, and music genres).
3. **Social and demographic** recommenders suggest items that are liked by friends, friends of friends, and demographically-similar people. Such recommenders don’t need any preferences by the user to whom recommendations are made, making them very powerful.
4. **Contextual recommendation** algorithms recommend items that match the user’s current context. This allows them to be more flexible and adaptive to current user needs than methods that ignore context (essentially giving the same weight to all of the user’s history). Hence, contextual algorithms are more likely to elicit a response than approaches that are based only on historical data.

## Collaborative Filtering

Collaborative filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain and exploiting similarities in rating behavior among several users in determining how to recommend an item.
CF accumulates customer product ratings, identifies customers with common ratings, and offers recommendations based on inter-customer comparisons. It’s based on the idea that people who agree in their evaluations of certain items in the past are likely to agree again in the future. For example, most people ask their trusted friends for restaurant or movie suggestions.

![alt text](https://miro.medium.com/max/687/1*-Jr1l2rlj9SBcCzlDHtN5g.jpeg)

Collaborative filtering models are based on an assumption that people like things similar to other things they like, and things that are liked by other people with similar taste.

![alt text](https://miro.medium.com/max/1348/1*K5BOY3B93MLn173VVzOW0Q.png)

### Matrix Factorization
![alt text](https://datascienceplus.com/wp-content/uploads/2017/09/2017-09-20-2.png)

In [1]:
import torch
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [12]:
ratings = pd.read_csv('ratings_mat.csv')
del ratings['Unnamed: 0']
ratings.fillna(-1)

ratings = ratings.values
exist = (ratings != -1).astype(int)

In [13]:
n_factors = 50
n_users, n_movies = ratings.shape
epochs = 250
logging_rate = 10

In [14]:
ratings = torch.tensor(ratings)
exist = torch.tensor(exist)

In [15]:
print(ratings[:5])

tensor([[ 5,  3,  4,  ..., -1, -1, -1],
        [ 4, -1, -1,  ..., -1, -1, -1],
        [-1, -1, -1,  ..., -1, -1, -1],
        [-1, -1, -1,  ..., -1, -1, -1],
        [ 4,  3, -1,  ..., -1, -1, -1]])


Task:

1. Define the two trainable latent factor tensors. One for the user and one for the movies.
2. Define the optimizer for the training.
3. Define the function that calculate the loss.
4. Calculate the estimated rating matrix.
5. Adjust the range of the estimated rating matrix to be [1, 5].

In [16]:
user_factors = torch.rand(n_factors, n_users, requires_grad=True)
movies_factors = torch.rand(n_factors, n_movies, requires_grad=True)

In [17]:
optimizer = optim.SGD(params=(user_factors, movies_factors), lr=0.01)

def calc_loss(true_r, est_r, exist):
    true_exist = true_r*exist
    est_exist = est_r*exist
    diff = true_exist - est_exist
    res = torch.mean(torch.square(diff))
    return res

for epoch in range(epochs):
    user_factors_tran = torch.transpose(user_factors, 0, 1)
    estimated = torch.matmul(user_factors_tran, movies_factors)
    loss = calc_loss(ratings, estimated, exist)
    loss.backward()
    optimizer.step()
    if epoch % logging_rate == 0:
        print("Loss at epoch {} = {}".format(epoch, loss.item()))
print("Last Loss = {}".format(loss.item()))

Loss at epoch 0 = 5.326584339141846
Loss at epoch 10 = 5.2930169105529785
Loss at epoch 20 = 5.199587345123291
Loss at epoch 30 = 5.049565315246582
Loss at epoch 40 = 4.848113059997559
Loss at epoch 50 = 4.601992130279541
Loss at epoch 60 = 4.31919002532959
Loss at epoch 70 = 4.008499622344971
Loss at epoch 80 = 3.6790778636932373
Loss at epoch 90 = 3.3400299549102783
Loss at epoch 100 = 3.0000295639038086
Loss at epoch 110 = 2.6670033931732178
Loss at epoch 120 = 2.347891092300415
Loss at epoch 130 = 2.0484859943389893
Loss at epoch 140 = 1.7733550071716309
Loss at epoch 150 = 1.5258251428604126
Loss at epoch 160 = 1.308035969734192
Loss at epoch 170 = 1.1210335493087769
Loss at epoch 180 = 0.9648990631103516
Loss at epoch 190 = 0.8388968706130981
Loss at epoch 200 = 0.7416288256645203
Loss at epoch 210 = 0.6711893677711487
Loss at epoch 220 = 0.625308632850647
Loss at epoch 230 = 0.6014841198921204
Loss at epoch 240 = 0.5970937013626099
Last Loss = 0.6075699925422668


#### Save the ratings:

In [18]:
estimated = torch.transpose(user_factors, 0, 1).mm(movies_factors)
estimated = estimated.cpu().detach().numpy()
max_r, min_r = estimated.max(), estimated.min()
estimated = 1 + 4 * (estimated - min_r) / (max_r - min_r)
n_users, n_movies = estimated.shape
estimated = pd.DataFrame(estimated, columns=list(range(1, n_movies + 1)),
                       index=list(range(1, n_users + 1)))
estimated.to_csv("answer_ratings.csv")

## How to make it nonlinear and Deeper?

Answer: Using deep NN, but will be the input in this case?
The user and item.

In [19]:
epochs = 100
batch_sz = 128

In [20]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

ratings = pd.read_csv('ratings.csv')
users = ratings['UserID'].values - 1
movies = ratings['MovieID'].values - 1
rates = ratings['Rating'].values
n_samples = len(rates)
batches = []
for i in range(0, n_samples, batch_sz):
    limit =  min(i + batch_sz, n_samples)
    users_batch, movies_batch, rates_batch = users[i: limit], \
                                             movies[i: limit],\
                                             rates[i: limit]
    batches.append((torch.tensor(users_batch, dtype=torch.long).to(device),
                    torch.tensor(movies_batch, dtype=torch.long).to(device),
                    torch.tensor(rates_batch, dtype=torch.float).to(device)))



### Model definition
Tasks:
1. Define the needed modules for the model:
        1.1 Two `Embedding`s for users and movies.
        1.2 One `Dropout` for the output of the embeddings.
        1.3 The hidden layers using `get_layer` function and `hidden` argument.



2. Define the flow of the training of the model in `forward()` function.
        2.1 Get the 2 embeddings tensors, then concatenate both.
        2.2 Run it through the hidden layers then the last fc layer.
        2.3 Apply sigmoid activation.
        2.4 Adjust the range of the estimated rating matrix to be [1, 5].

In [21]:
class RecommenderNet(nn.Module):
    """
    Creates a dense network with embedding layers.
    Args:
        n_users:
            Number of unique users in the dataset.
        n_movies:
            Number of unique movies in the dataset.
        n_factors:
            Number of columns in the embeddings matrix.
        embedding_dropout:
            Dropout rate to apply right after embeddings layer.
        hidden:
            A single integer or a list of integers defining the number of
            units in hidden layer(s).
        dropout_rate:
            dropout rate after each hidden layer.
    """

    def __init__(self, n_users, n_movies,
                 n_factors=50, embedding_dropout=0.02,
                 hidden=10, dropout_rate=0.2):

        super().__init__()

        n_last = hidden[-1]

        def gen_layers(n_in):
            """
            A generator that yields a sequence of hidden layers and
            their activations/dropouts.

            Note that the function captures `hidden` and `dropouts`
            values from the outer scope.
            """
            hidden_layers = []
            for n_out in hidden:
                hidden_layers.append(nn.Linear(n_in, n_out))
                hidden_layers.append(nn.ReLU())
                if dropout_rate is not None and dropout_rate > 0.:
                     hidden_layers.append(nn.Dropout(dropout_rate))
                n_in = n_out
            return hidden_layers

        self.users = nn.Embedding(n_users, n_factors)
        self.movies = nn.Embedding(n_movies, n_factors)
        self.drop = nn.Dropout(embedding_dropout)
        self.hidden = gen_layers(n_factors * 2)
        
        self.fc = nn.Linear(n_last, 1)
        self._init()


    def forward(self, users, movies, minmax=None):
        users = self.users(users)
        movies = self.movies(movies)
        x = torch.cat([users, movies], dim=1)
        x = self.drop(x)
        for hidden in self.hidden:
            x = hidden(x)
        x = torch.sigmoid(self.fc(x))
        return x * 4 + 1


    def _init(self):
        """
        Setup embeddings and hidden layers with reasonable initial values.
        """

        def init(m):
            if type(m) == nn.Linear:
                torch.nn.init.xavier_uniform_(m.weight)
                m.bias.data.fill_(0.01)

        self.users.weight.data.uniform_(-0.05, 0.05)
        self.movies.weight.data.uniform_(-0.05, 0.05)
        for layer in self.hidden:
            init(layer)
        init(self.fc)

In [22]:
net = RecommenderNet(n_users, n_movies, hidden=[128, 256, 128])
criterion = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(net.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.3, patience=2)

for epoch in range(epochs):
    train_loss = 0
    for users_batch, movies_batch, rates_batch in batches:
        net.zero_grad()
        out = net(users_batch, movies_batch, [1, 5]).squeeze()
        loss = criterion(rates_batch, out)

        loss.backward()
        optimizer.step()
        train_loss += loss
    scheduler.step(loss)
    print("Loss at epoch {} = {}".format(epoch, loss.item()))
print("Last Loss = {}".format(loss.item()))

Loss at epoch 0 = 1.2321830987930298
Loss at epoch 1 = 1.037410020828247
Loss at epoch 2 = 0.9033569097518921
Loss at epoch 3 = 0.9484357237815857
Loss at epoch 4 = 0.8854527473449707
Loss at epoch 5 = 0.817359209060669
Loss at epoch 6 = 0.8303821086883545
Loss at epoch 7 = 0.8551475405693054
Loss at epoch 8 = 0.9294219613075256
Loss at epoch 9 = 0.7396958470344543
Loss at epoch 10 = 0.7679277062416077
Loss at epoch 11 = 0.7623749375343323
Loss at epoch 12 = 0.7428615093231201
Loss at epoch 13 = 0.7214045524597168
Loss at epoch 14 = 0.7749396562576294
Loss at epoch 15 = 0.7848702073097229
Loss at epoch 16 = 0.5242792367935181
Loss at epoch 17 = 0.6891729235649109
Loss at epoch 18 = 0.5654334425926208
Loss at epoch 19 = 0.7867417335510254
Loss at epoch 20 = 0.6534708142280579
Loss at epoch 21 = 0.607595682144165
Loss at epoch 22 = 0.737130343914032
Loss at epoch 23 = 0.6620026230812073
Loss at epoch 24 = 0.6156119108200073
Loss at epoch 25 = 0.7351477742195129
Loss at epoch 26 = 0.59945

In [23]:
# evaluate the model
net.eval()
with torch.no_grad():
    total_correct = 0
    n_correct = 0
    total_loss = 0
    n_loss = 0
    for users_batch, movies_batch, rates_batch in batches:
        out = net(users_batch, movies_batch, [1, 5]).squeeze()
        loss = criterion(rates_batch, out)
        total_loss += loss
        n_loss += 1
        total_correct += torch.sum(torch.abs(rates_batch - out) < 0.5)
        n_correct += len(rates_batch)
    print("Average loss = {}".format(total_loss / n_loss))
    print("Accuracy = {}".format(total_correct / n_correct))

Average loss = 0.7456024885177612
Accuracy = 0.44933000206947327
