# Building a Recommender System with Restricted Boltzmann Machines

A Boltzmann Machine (BM) is a stochastic, generative neural network that can learn the probability distribution of its training data. Its connections are undirected, and every node, visible or hidden, is connected to every other node. Unlike a feedforward network, it has no separate output layer and no fixed direction of information flow.

A BM can generate new samples by drawing from the joint probability distribution it has learned from the training data, which is why it is called a generative model. In contrast, discriminative models such as ordinary feedforward classifiers learn the conditional probability of the labels given the inputs.

During training, a BM learns the interactions between nodes in an unsupervised manner, adjusting its weights to capture the statistical structure of the training data. When it is trained on data describing a system in its normal (acceptable) state, it assigns high probability to normal configurations and low probability to unusual ones, which makes it useful for detecting abnormal states.

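Training a full BM is impractical for all but tiny networks: every node is connected to every other node, the number of joint states grows exponentially with the number of nodes, and sampling from the model is slow. Restricted Boltzmann Machines (RBMs) are therefore typically used instead. An RBM only allows connections between visible and hidden nodes, never between nodes of the same layer, so the hidden units are conditionally independent given the visible units (and vice versa) and each layer can be sampled in one parallel step.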
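
To see why exact computation does not scale, here is a minimal sketch: the partition function $Z$ sums $e^{-E(\mathbf{v},\mathbf{h})}$ over every joint binary state, so brute force visits $2^{n_v+n_h}$ configurations. It assumes the standard RBM energy $E(\mathbf{v},\mathbf{h}) = -\mathbf{b}\cdot\mathbf{v} - \mathbf{a}\cdot\mathbf{h} - \mathbf{h}^\top W \mathbf{v}$, with `a` as hidden biases and `b` as visible biases to match the naming used later in this notebook; the sizes, seed and weight scale are arbitrary illustrative choices.

```python
import itertools
import numpy as np

def energy(v, h, W, a, b):
    # Standard RBM energy: E(v, h) = -b.v - a.h - h^T W v
    # W has shape (n_hidden, n_visible), like the RBM class later on.
    return -(b @ v) - (a @ h) - (h @ W @ v)

def partition_function(W, a, b):
    """Exact Z by enumerating all 2**(n_v + n_h) binary joint states."""
    n_h, n_v = W.shape
    z = 0.0
    n_states = 0
    for v in itertools.product([0, 1], repeat=n_v):
        for h in itertools.product([0, 1], repeat=n_h):
            z += np.exp(-energy(np.array(v), np.array(h), W, a, b))
            n_states += 1
    return z, n_states

rng = np.random.default_rng(0)
n_v, n_h = 4, 3
W = rng.normal(scale=0.1, size=(n_h, n_v))
a = np.zeros(n_h)
b = np.zeros(n_v)
Z, n_states = partition_function(W, a, b)
print(n_states)  # 2**(4 + 3) = 128 joint states
```

With the 1,682 movies used later in this notebook, the visible layer alone has $2^{1682}$ states, so $Z$ can never be enumerated; training therefore relies on sampling-based approximations.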

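RBMs are usually trained with contrastive divergence (CD), an approximation to maximum-likelihood learning chosen for its computational simplicity. Instead of sampling from the model's equilibrium distribution, CD-k starts a Gibbs sampling chain at a training example, runs it for k steps (often k = 1), and uses the difference between the data-driven and chain-driven statistics as the gradient estimate.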
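
A minimal NumPy sketch of one CD-1 update for binary units. It assumes a learning rate `lr` and averages over the mini-batch, whereas the notebook's own `train` method further down sums over the batch with an implicit learning rate of 1. `cd1_update` is a hypothetical helper; parameter names mirror the RBM class used later (`W` is n_hidden × n_visible, `a` the hidden bias, `b` the visible bias).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=None):
    """One CD-1 step on a mini-batch v0 of shape (batch, n_visible).

    Returns updated copies of (W, a, b).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(v0 @ W.T + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step v0 -> h0 -> v1, then p(h | v1).
    pv1 = sigmoid(h0 @ W + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W.T + a)
    n = v0.shape[0]
    # Gradient estimate: <h v>_data - <h v>_chain, averaged over the batch.
    W_new = W + lr * (ph0.T @ v0 - ph1.T @ v1) / n
    a_new = a + lr * (ph0 - ph1).mean(axis=0)
    b_new = b + lr * (v0 - v1).mean(axis=0)
    return W_new, a_new, b_new

rng = np.random.default_rng(42)
v0 = (rng.random((8, 6)) < 0.5).astype(float)   # 8 toy users, 6 binary items
W = rng.normal(scale=0.01, size=(3, 6))          # 3 hidden units
a, b = np.zeros(3), np.zeros(6)
W, a, b = cd1_update(v0, W, a, b, rng=rng)
print(W.shape, a.shape, b.shape)
```

Repeating the Gibbs step k times before computing the negative statistics gives CD-k; the training loop later in this notebook uses k = 10.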

After training, RBMs can be used for collaborative filtering, dimensionality reduction, feature learning and similar tasks. The latent representations captured by the hidden nodes can serve as inputs to downstream models.
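
As a hedged illustration of the dimensionality-reduction use, the vector of hidden activation probabilities $p(\mathbf{h} = 1 \mid \mathbf{v})$ can act as a compact feature vector for each input. The weights below are random placeholders standing in for trained parameters, and `hidden_features` is a hypothetical helper.

```python
import numpy as np

def hidden_features(V, W, a):
    """Map binary inputs V (n_samples, n_visible) to p(h = 1 | v), shape (n_samples, n_hidden)."""
    return 1.0 / (1.0 + np.exp(-(V @ W.T + a)))

rng = np.random.default_rng(0)
V = (rng.random((5, 20)) < 0.3).astype(float)  # 5 inputs with 20 visible units
W = rng.normal(scale=0.1, size=(4, 20))        # placeholder for trained weights
a = np.zeros(4)                                # placeholder hidden biases
features = hidden_features(V, W, a)
print(features.shape)  # (5, 4): 20 input dimensions reduced to 4
```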

In summary, BMs and RBMs are stochastic, generative models that learn the joint probability distribution of their inputs without supervision. Their learned representations of the data distribution can help in domains such as computer vision and natural language processing, as well as in recommender systems like the one built here.

## Project Overview:

This project explores the use of deep learning to build a recommender system. It implements a Restricted Boltzmann Machine (RBM), a type of artificial neural network that has been applied successfully to collaborative filtering.

#### Key Components and Steps:

1. **Data Preparation:**

* The project begins by importing and processing data from the **MovieLens dataset**, consisting of movies, users, and ratings. This dataset serves as the foundation for the recommender system.

2. **Data Conversion and Preprocessing:**

* The data is converted into Torch tensors, making it compatible with PyTorch, a popular deep learning framework.
* Ratings are converted into binary values (1 = liked for ratings of 3 to 5, 0 = not liked for ratings of 1 or 2), and unrated movies are marked with -1 so they can be ignored during training.

3. **RBM Architecture:**

* A Restricted Boltzmann Machine is built with a visible layer (one unit per movie) and a hidden layer (100 units in this notebook).
* The RBM includes learnable parameters (weights and biases) that capture patterns in user-movie interactions.

4. **Training the RBM:**

* The RBM is trained with k-step contrastive divergence (CD-k, with k = 10), updating its parameters after each mini-batch of 100 users.
* During training, the RBM learns to represent the underlying structure of user preferences in an unsupervised manner.

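5. **Testing and Evaluation:**

* The trained RBM reconstructs each user's ratings from their training ratings, predicting whether the user will like the movies held out in the test set.
* Performance is measured as the mean absolute difference between the predicted and actual binary ratings on those held-out movies.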

#### Key Insights:

- The project illustrates how RBMs can be applied to recommendation: the hidden layer learns patterns shared across users' preferences, which the model uses to predict whether a user will like movies they have not rated.
- It explores the concept of stochastic learning and unsupervised training, demonstrating how RBMs can learn the structure of user-movie interactions without labeled data.

#### Dataset Source:

- The MovieLens dataset, obtained from https://grouplens.org/datasets/movielens, serves as the data source for this project. This dataset contains user ratings of movies and is commonly used for building recommendation systems.

#### Significant features of the MovieLens datasets

- Standard benchmark datasets: Datasets such as MovieLens 100K, 1M, 10M and 20M have become standard benchmarks for evaluating and comparing recommender system algorithms. Using these standardized datasets makes results easy to reproduce and compare across studies.

- Large sizes: The 10M and 20M datasets contain 10 million and 20 million ratings respectively, providing a large amount of data to train and evaluate algorithms. The smaller datasets are also useful for faster experimentation.

- Rich metadata: Along with ratings, the datasets also contain metadata about users, movies and tags which can be useful for developing and evaluating content-based and contextual recommendation techniques. 

- Easy access: The datasets are freely and publicly available for download from the GroupLens website, making them easily accessible for researchers and students.

- Extended features: Some datasets, such as the Tag Genome and YouTube datasets, provide additional tag relevance scores and metadata that can enable new recommendation modeling techniques.

- Long history of use: Being used widely for over a decade, they provide a good baseline and benchmark to compare performance over time as algorithms evolve. 

In summary, the standardized sizes, rich metadata, easy availability and long history of these datasets make them valuable resources for recommender system research and education.

This project showcases the practical application of deep learning in the field of recommendation systems, providing valuable experience in the development and evaluation of AI-driven solutions for personalized content recommendations.

## Importing the libraries

```python
import numpy as np
import pandas as pd
```

```python
import torch
import torch.nn as nn                # neural-network building blocks
import torch.nn.parallel             # for parallel computation
import torch.optim as optim          # optimizers
import torch.utils.data
# The RBM below works on plain tensors, so the deprecated
# torch.autograd.Variable wrapper is not needed.
```

## Importing the dataset

```python
# The separator is '::'; pandas only parses multi-character separators with engine='python'.
# encoding='latin-1' is needed because some movie titles contain special characters.
# The files have no header row, so header=None.
movies = pd.read_csv('ml-1m/movies.dat', sep='::', header=None, engine='python', encoding='latin-1')
users = pd.read_csv('ml-1m/users.dat', sep='::', header=None, engine='python', encoding='latin-1')
# In movies.dat, column 0 is the movie ID, column 1 the title and column 2 the genres.
```

```python
print(movies.head())
print(movies.shape)
```

```
   0                                   1                             2
0  1                    Toy Story (1995)   Animation|Children's|Comedy
1  2                      Jumanji (1995)  Adventure|Children's|Fantasy
2  3             Grumpier Old Men (1995)                Comedy|Romance
3  4            Waiting to Exhale (1995)                  Comedy|Drama
4  5  Father of the Bride Part II (1995)                        Comedy
(3883, 3)
```
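
Why `engine='python'`: a separator longer than one character is treated by pandas as a regular expression, which only the Python parsing engine supports; passing it explicitly avoids pandas falling back with a warning. A small self-contained check on an in-memory sample built from the two rows printed above (no file access needed):

```python
import io
import pandas as pd

sample = (
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)
df = pd.read_csv(io.StringIO(sample), sep='::', header=None, engine='python')
print(df.shape)       # (2, 3)
print(df.iloc[0, 1])  # Toy Story (1995)
```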


```python
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', header=None, engine='python', encoding='latin-1')
# Column 0: user ID
# Column 1: movie ID
# Column 2: the user's rating of that movie (1 to 5)
# Column 3: timestamp
```

```python
print(ratings.head())
print(ratings.shape)
```

```
   0     1  2          3
0  1  1193  5  978300760
1  1   661  3  978302109
2  1   914  3  978301968
3  1  3408  4  978300275
4  1  2355  5  978824291
(1000209, 4)
```


## Preparing the training set and the test set

```python
# The training and test splits come from MovieLens 100K (ml-100k), not the ml-1m files above.
# u1.base and u1.test are tab-separated with no header row:
# column 0 = user ID, 1 = movie ID, 2 = rating, 3 = timestamp.
# header=None keeps pandas from turning the first rating into column names (and dropping it).
training_set = pd.read_csv('ml-100k/u1.base', delimiter='\t', header=None)
print(training_set.head())
training_set = np.array(training_set, dtype='int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter='\t', header=None)
test_set = np.array(test_set, dtype='int')
```

```
   0  1  2          3
0  1  1  5  874965758
1  1  2  3  876893171
2  1  3  4  878542960
3  1  4  3  876893119
4  1  5  3  889751712
```

```python
print(training_set.shape)
print(test_set.shape)
# u1.base / u1.test form an 80/20 split of the 100,000 ratings
```

```
(80000, 4)
(20000, 4)
```
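
A small in-memory check of why `header=None` matters for headerless files such as `u1.base` and `u1.test`: without it, pandas promotes the first rating row to column names (the duplicated `1` is renamed `1.1`) and that rating is silently lost. The sample rows mirror the first lines of `u1.base`.

```python
import io
import pandas as pd

# First three lines of a tab-separated user / movie / rating / timestamp file.
sample = "1\t1\t5\t874965758\n1\t2\t3\t876893171\n1\t3\t4\t878542960\n"

without_header_arg = pd.read_csv(io.StringIO(sample), delimiter='\t')
with_header_none = pd.read_csv(io.StringIO(sample), delimiter='\t', header=None)

print(list(without_header_arg.columns))  # ['1', '1.1', '5', '874965758']
print(len(without_header_arg), len(with_header_none))  # 2 3
```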


## Getting the number of users and movies

```python
nb_users = int(max(max(training_set[:, 0]), max(test_set[:, 0])))
nb_movies = int(max(max(training_set[:, 1]), max(test_set[:, 1])))
print('number of users:', nb_users)
print('number of movies:', nb_movies)
```

```
number of users: 943
number of movies: 1682
```


## Converting the data into an array with users in lines and movies in columns

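```python
def convert(data):
    """Return one list per user holding that user's rating of every movie (0 = not rated)."""
    new_data = []
    for id_users in range(1, nb_users + 1):
        # Movies rated by this user and the corresponding ratings
        id_movies = data[:, 1][data[:, 0] == id_users]
        id_ratings = data[:, 2][data[:, 0] == id_users]
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings  # movie IDs are 1-based
        new_data.append(list(ratings))
    return new_data

training_set = convert(training_set)
test_set = convert(test_set)
```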
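
The loop in `convert` scans the whole array once per user. An equivalent vectorized sketch uses NumPy fancy indexing, assuming the same integer layout (user ID, movie ID, rating, timestamp), 1-based IDs, and at most one rating per (user, movie) pair. `convert_vectorized` is a hypothetical name, and the toy array is illustrative.

```python
import numpy as np

def convert_vectorized(data, n_users, n_movies):
    """Dense (n_users, n_movies) rating matrix; 0 means 'not rated'."""
    matrix = np.zeros((n_users, n_movies), dtype=np.float32)
    # IDs are 1-based, array indices are 0-based.
    matrix[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return matrix

toy = np.array([[1, 2, 5, 0],
                [1, 3, 3, 0],
                [2, 1, 4, 0]])
m = convert_vectorized(toy, n_users=2, n_movies=3)
print(m)
```

Applied to the integer arrays loaded from `u1.base`/`u1.test`, the float32 result can be wrapped with `torch.from_numpy(...)`, skipping the intermediate Python lists.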

## Converting the data into Torch tensors

```python
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)
```

```python
training_set.shape
```

```
torch.Size([943, 1682])
```

```python
type(training_set)
```

```
torch.Tensor
```

## Converting the ratings into binary ratings 1 (Liked) or 0 (Not Liked)

```python
# Zeros (unrated) become -1 first, so the "not liked" zeros set below are not overwritten.
training_set[training_set == 0] = -1
# Ratings 1 and 2 -> 0 (not liked)
training_set[training_set == 1] = 0
training_set[training_set == 2] = 0
# Ratings 3, 4 and 5 -> 1 (liked)
training_set[training_set >= 3] = 1

test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1
```
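
The same mapping can be written without in-place masking. A sketch using `torch.where`, assuming ratings are the integers 0 to 5 stored as floats with 0 meaning unrated; `binarize` is a hypothetical helper. Because it builds a new tensor from the original values, the order of the conditions no longer matters.

```python
import torch

def binarize(ratings):
    """Map 0 -> -1 (unrated), 1-2 -> 0 (not liked), 3-5 -> 1 (liked)."""
    liked = torch.where(ratings >= 3, torch.ones_like(ratings), torch.zeros_like(ratings))
    return torch.where(ratings == 0, torch.full_like(ratings, -1.0), liked)

r = torch.tensor([[0., 1., 2., 3., 4., 5.]])
print(binarize(r))
```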

## Creating the architecture of the Neural Network

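A Restricted Boltzmann Machine is a probabilistic graphical model. Write the visible layer as a row vector $\mathbf{v}$ (one entry per movie) and the hidden layer as a row vector $\mathbf{h}$. With weight matrix $W$ of shape $n_h \times n_v$, hidden biases $\mathbf{a}$ and visible biases $\mathbf{b}$ (the names used in the class below), the conditional probabilities used for sampling are

$$p(\mathbf{h} = 1 \mid \mathbf{v}) = \sigma(\mathbf{v} W^\top + \mathbf{a})$$

$$p(\mathbf{v} = 1 \mid \mathbf{h}) = \sigma(\mathbf{h} W + \mathbf{b})$$

where $\sigma$ is the logistic sigmoid, applied element-wise.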
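
Because there are no hidden-to-hidden connections, $p(\mathbf{h} \mid \mathbf{v})$ factorizes into independent sigmoids. A brute-force numerical check on a tiny RBM with random parameters (arbitrary sizes and seed), assuming the energy $E(\mathbf{v},\mathbf{h}) = -\mathbf{b}\cdot\mathbf{v} - \mathbf{a}\cdot\mathbf{h} - \mathbf{h}^\top W \mathbf{v}$ with $W$ of shape $n_h \times n_v$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_v, n_h = 3, 2
W = rng.normal(size=(n_h, n_v))
a = rng.normal(size=n_h)   # hidden biases
b = rng.normal(size=n_v)   # visible biases
v = np.array([1.0, 0.0, 1.0])

def energy(v, h):
    return -(b @ v) - (a @ h) - (h @ W @ v)

# Exact p(h | v) by enumerating every hidden state: p(h | v) is proportional to exp(-E(v, h)).
states = [np.array(h, dtype=float) for h in itertools.product([0, 1], repeat=n_h)]
weights = np.array([np.exp(-energy(v, h)) for h in states])
probs = weights / weights.sum()
# Marginal p(h_j = 1 | v) from the enumeration.
p_enum = sum(p * h for p, h in zip(probs, states))

# Closed form used by sample_h: sigmoid(v W^T + a).
p_closed = 1.0 / (1.0 + np.exp(-(v @ W.T + a)))
print(np.allclose(p_enum, p_closed))  # True
```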

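```python
class RBM():
    def __init__(self, nv, nh):
        # Parameters initialised from a standard normal distribution
        self.W = torch.randn(nh, nv)   # weights, shape (nh, nv)
        self.a = torch.randn(1, nh)    # hidden biases
        self.b = torch.randn(1, nv)    # visible biases

    def sample_h(self, x):
        # p(h = 1 | v) = sigmoid(v W^T + a); returns the probabilities and a Bernoulli sample
        wx = torch.mm(x, self.W.t())
        activation = wx + self.a.expand_as(wx)
        p_h_given_v = torch.sigmoid(activation)
        return p_h_given_v, torch.bernoulli(p_h_given_v)

    def sample_v(self, y):
        # p(v = 1 | h) = sigmoid(h W + b); returns the probabilities and a Bernoulli sample
        wy = torch.mm(y, self.W)
        activation = wy + self.b.expand_as(wy)
        p_v_given_h = torch.sigmoid(activation)
        return p_v_given_h, torch.bernoulli(p_v_given_h)

    def train(self, v0, vk, ph0, phk):
        # Contrastive-divergence update, summed over the batch (implicit learning rate of 1)
        self.W += (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)).t()
        self.b += torch.sum((v0 - vk), 0)
        self.a += torch.sum((ph0 - phk), 0)

nv = len(training_set[0])  # number of movies
nh = 100                   # number of hidden units (a tunable choice)
batch_size = 100
rbm = RBM(nv, nh)
```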

## Training the RBM

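```python
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    # Step through all users in mini-batches (the last batch may be smaller)
    for id_user in range(0, nb_users, batch_size):
        vk = training_set[id_user:id_user + batch_size]   # Gibbs chain state, starts at the data
        v0 = training_set[id_user:id_user + batch_size]   # original ratings (target)
        ph0, _ = rbm.sample_h(v0)
        # k = 10 steps of Gibbs sampling (CD-10)
        for k in range(10):
            _, hk = rbm.sample_h(vk)
            _, vk = rbm.sample_v(hk)
            vk[v0 < 0] = v0[v0 < 0]   # keep unrated movies fixed at -1
        phk, _ = rbm.sample_h(vk)
        rbm.train(v0, vk, ph0, phk)
        # Mean absolute error on the movies the users actually rated
        train_loss += torch.mean(torch.abs(v0[v0 >= 0] - vk[v0 >= 0]))
        s += 1.
    print('epoch: ' + str(epoch) + ' loss: ' + str(train_loss / s))
```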

```
epoch: 1 loss: tensor(0.3490)
epoch: 2 loss: tensor(0.2496)
epoch: 3 loss: tensor(0.2507)
epoch: 4 loss: tensor(0.2473)
epoch: 5 loss: tensor(0.2500)
epoch: 6 loss: tensor(0.2473)
epoch: 7 loss: tensor(0.2480)
epoch: 8 loss: tensor(0.2507)
epoch: 9 loss: tensor(0.2448)
epoch: 10 loss: tensor(0.2514)
```


## Testing the RBM

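```python
test_loss = 0
s = 0.
for id_user in range(nb_users):
    v = training_set[id_user:id_user + 1]   # training ratings activate the hidden layer
    vt = test_set[id_user:id_user + 1]      # held-out ratings to predict
    if len(vt[vt >= 0]) > 0:                # skip users with no test ratings
        _, h = rbm.sample_h(v)
        _, v = rbm.sample_v(h)              # one Gibbs step gives the predictions
        test_loss += torch.mean(torch.abs(vt[vt >= 0] - v[vt >= 0]))
        s += 1.
print('test loss: ' + str(test_loss / s))
```

```
test loss: tensor(0.2403)
```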
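
The evaluation measures reconstruction error but stops short of producing a ranked list of suggestions. A hedged sketch of one way to do that: compute $p(\mathbf{v} \mid \mathbf{h})$ for a user and rank the movies that user has not rated (entries equal to -1). It uses probabilities rather than Bernoulli samples so the ranking is deterministic, and it reads the parameters directly in the layout of the `RBM` class above. `recommend` is a hypothetical helper, and the random weights are placeholders for a trained model.

```python
import torch

def recommend(W, a, b, user_row, top_n=5):
    """Rank unrated items for one user by reconstructed probability.

    W: (n_hidden, n_visible), a: (1, n_hidden), b: (1, n_visible),
    user_row: (1, n_visible) with 1 = liked, 0 = not liked, -1 = unrated.
    """
    p_h = torch.sigmoid(user_row @ W.t() + a)
    p_v = torch.sigmoid(p_h @ W + b).squeeze(0)
    unrated = user_row.squeeze(0) < 0
    # Rated items get a score of -1 so they never outrank unrated ones.
    scores = torch.where(unrated, p_v, torch.full_like(p_v, -1.0))
    k = min(top_n, int(unrated.sum()))
    top = torch.topk(scores, k)
    # Return 1-based movie IDs to match the MovieLens files.
    return (top.indices + 1).tolist(), top.values.tolist()

torch.manual_seed(0)
n_v, n_h = 8, 3
W = torch.randn(n_h, n_v)   # placeholder for trained weights
a = torch.randn(1, n_h)
b = torch.randn(1, n_v)
user = torch.tensor([[1., -1., 0., -1., 1., -1., -1., 1.]])
ids, scores = recommend(W, a, b, user, top_n=3)
print(ids, scores)
```

With the notebook's objects, a call would look like `recommend(rbm.W, rbm.a, rbm.b, training_set[0:1])`. The returned IDs refer to the ml-100k movie catalogue, which is numbered differently from the ml-1m `movies.dat` loaded at the start.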
