# AutoEncoders

# About the dataset 

http://files.grouplens.org/datasets/movielens/ml-100k-README.txt

http://files.grouplens.org/datasets/movielens/ml-1m.zip

http://files.grouplens.org/datasets/movielens/ml-100k.zip

The purpose of this AutoEncoder is to learn latent representations of users and movies such that the model can predict missing ratings and provide personalized movie recommendations based on user preferences. The training and testing processes aim to optimize the model's ability to reconstruct the input ratings while generalizing to unseen data during testing.

# Importing the libraries

In [3]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

## Importing the dataset


In [4]:
# The ml-1m files are loaded here for reference only; the rest of the notebook uses ml-100k.
movies = pd.read_csv('ml-1m/movies.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
users = pd.read_csv('ml-1m/users.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
ratings = pd.read_csv('ml-1m/ratings.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')

## Preparing the training set and the test set


In [5]:
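# u1.base and u1.test are tab-separated with no header row, so pass header = None;
# otherwise pandas consumes the first rating as column names.
training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t', header = None)
training_set = np.array(training_set, dtype = 'int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t', header = None)
test_set = np.array(test_set, dtype = 'int')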
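
The MovieLens 100K split files are tab-separated with four columns (user id, movie id, rating, timestamp) and no header row. A minimal sketch of why `header=None` matters when reading such files, using a small in-memory sample instead of the real file. The sample rows and the column names are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row sample in the same tab-separated layout as u1.base:
# user id, movie id, rating, timestamp (no header row).
sample = "1\t1\t5\t874965758\n1\t2\t3\t876893171\n2\t1\t4\t888550871\n"

# Default behaviour: the first data row is consumed as column names.
with_default = pd.read_csv(io.StringIO(sample), delimiter="\t")

# header=None keeps every row as data; names= is optional and only adds labels.
with_header_none = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",
    header=None,
    names=["user_id", "movie_id", "rating", "timestamp"],
)

print(len(with_default))      # one rating fewer than the file contains
print(len(with_header_none))  # every rating kept
sample_array = np.array(with_header_none, dtype="int")
print(sample_array.shape)
```

Losing one row out of tens of thousands barely changes the results, but it is an easy mistake to avoid.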

## Getting the number of users and movies


In [6]:
nb_users = int(max(training_set[:, 0].max(), test_set[:, 0].max()))
nb_movies = int(max(training_set[:, 1].max(), test_set[:, 1].max()))

## Converting the data into an array with users in lines and movies in columns


In [7]:
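def convert(data):
  # Build one row per user: a list of nb_movies ratings, with 0 meaning "not rated".
  new_data = []
  for id_users in range(1, nb_users + 1):
    id_movies = data[:, 1][data[:, 0] == id_users]   # movies rated by this user
    id_ratings = data[:, 2][data[:, 0] == id_users]  # the corresponding ratings
    ratings = np.zeros(nb_movies)
    ratings[id_movies - 1] = id_ratings  # movie IDs start at 1, array indices at 0
    new_data.append(list(ratings))
  return new_data
training_set = convert(training_set)
test_set = convert(test_set)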
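
To make the user-by-movie layout concrete, here is a small sketch that builds the same kind of matrix for toy data. It uses a vectorized NumPy assignment rather than a per-user loop, and it assumes each (user, movie) pair appears at most once. The function name `to_user_item_matrix` and the toy ratings are hypothetical.

```python
import numpy as np

def to_user_item_matrix(data, n_users, n_movies):
    """Build a dense (n_users, n_movies) rating matrix; 0 means 'not rated'."""
    matrix = np.zeros((n_users, n_movies))
    # IDs in the data are 1-based, so shift them to 0-based indices.
    matrix[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return matrix

# Hypothetical toy data: columns are user id, movie id, rating.
toy = np.array([[1, 1, 5], [1, 3, 2], [2, 2, 4]])
matrix = to_user_item_matrix(toy, n_users=2, n_movies=3)
print(matrix)
# Row 0 is user 1 (rated movies 1 and 3), row 1 is user 2 (rated movie 2).
```

Each row is the input vector the AutoEncoder sees for one user.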

## Converting the data into Torch tensors


In [8]:
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

The data is converted to PyTorch tensors rather than kept as NumPy arrays for several reasons.

Automatic Differentiation: PyTorch tensors support automatic differentiation by recording operations in a computation graph, so gradients are computed automatically during backpropagation. This is essential for training neural networks with gradient-based optimizers such as stochastic gradient descent (SGD).

GPU Acceleration: PyTorch tensors can easily be moved to and processed on GPUs, which accelerates the computations and significantly speeds up training for large models and datasets. NumPy arrays do not have built-in support for GPU acceleration.

Seamless Integration with Neural Network Modules: PyTorch tensors seamlessly integrate with the PyTorch neural network modules (nn.Module). This integration allows users to define and train neural networks in a modular and organized manner.

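Efficient Operations: PyTorch dispatches tensor operations to optimized low-level libraries such as Intel MKL on CPUs and NVIDIA cuBLAS on GPUs. On a CPU, performance is often comparable to NumPy, which also relies on optimized BLAS libraries; the larger speedups typically come from running on a GPU.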

Interoperability: PyTorch tensors can be converted to and from NumPy arrays easily using .numpy() and torch.from_numpy() functions. This interoperability allows users to leverage the strengths of both libraries when needed.
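
The points above can be illustrated with a short sketch. The values are arbitrary and chosen only so the gradient is easy to check by hand; the GPU branch runs only if CUDA is available.

```python
import numpy as np
import torch

# Interoperability: torch.from_numpy shares memory with the NumPy array.
ratings_np = np.array([5.0, 0.0, 3.0], dtype=np.float32)
ratings_t = torch.from_numpy(ratings_np)
ratings_np[1] = 4.0          # the change is visible through the tensor too
print(ratings_t)

# Automatic differentiation: d/dw of sum((w * x)^2) is 2 * w * sum(x^2).
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor(0.5, requires_grad=True)
loss = ((w * x) ** 2).sum()
loss.backward()
print(w.grad)                # 2 * 0.5 * (1 + 4 + 9) = 14

# GPU acceleration: moving a tensor is a one-line change when a GPU exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_dev = x.to(device)
back_to_numpy = x_dev.cpu().numpy()  # .numpy() requires a CPU tensor
```

Because `torch.from_numpy` shares memory, copy the array first if the tensor and the array must evolve independently.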

## Creating the architecture of the Neural Network


The AutoEncoder aims to reconstruct the input data as accurately as possible, which helps in learning meaningful latent representations of users and movies for collaborative filtering and recommendation tasks.

In [9]:
class SAE(nn.Module):
    # Subclassing nn.Module gives the model parameter tracking and autograd support.
    def __init__(self):
        super(SAE, self).__init__()  # call the constructor of nn.Module
        # Encoder: nb_movies -> 20 -> 10
        self.fc1 = nn.Linear(nb_movies, 20)  # first fully connected layer: nb_movies inputs, 20 outputs
        self.fc2 = nn.Linear(20, 10)  # takes the 20 outputs of fc1 and compresses them to 10
        # Decoder: 10 -> 20 -> nb_movies
        self.fc3 = nn.Linear(10, 20)
        self.fc4 = nn.Linear(20, nb_movies)  # one output per movie
        self.activation = nn.Sigmoid()  # sigmoid activation for the hidden layers
    def forward(self, x):
        # Forward pass: how the input ratings x flow through the layers
        # to produce the reconstructed ratings.
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.fc4(x)  # no activation here, so outputs are not squashed into (0, 1)
        return x
sae = SAE()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr = 0.01, weight_decay = 0.5)
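
A quick way to sanity-check an architecture like this is to push a dummy batch through it and inspect the shapes. The sketch below uses a hypothetical stand-in class, `TinySAE`, with the same layer sizes, and assumes 1682 movies (the MovieLens 100K catalogue size) so that it runs without the dataset.

```python
import torch
import torch.nn as nn

n_movies = 1682  # number of movies in MovieLens 100K; adjust for other data

class TinySAE(nn.Module):
    """Hypothetical stand-in: n_movies -> 20 -> 10 -> 20 -> n_movies."""
    def __init__(self, n_movies):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_movies, 20), nn.Sigmoid(),
            nn.Linear(20, 10), nn.Sigmoid(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(10, 20), nn.Sigmoid(),
            nn.Linear(20, n_movies),  # no activation on the output layer
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySAE(n_movies)
batch = torch.zeros(4, n_movies)       # four dummy users, used only for shapes
code = model.encoder(batch)            # the 10-dimensional latent representation
reconstruction = model(batch)
print(code.shape, reconstruction.shape)

# Count trainable parameters (weights plus biases of the four linear layers).
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

The 10-unit bottleneck is what forces the network to compress each user's ratings into a compact representation.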

## Training the SAE


In [10]:
nb_epoch = 200
for epoch in range(1, nb_epoch + 1):
  train_loss = 0
  s = 0.  # number of users with at least one rating
  for id_user in range(nb_users):
    input = Variable(training_set[id_user]).unsqueeze(0)  # add a batch dimension
    target = input.clone()
    if torch.sum(target.data > 0) > 0:  # skip users with no ratings
      output = sae(input)
      target.requires_grad = False  # the target takes no part in the gradient computation
      output[target == 0] = 0  # ignore movies the user did not rate
      loss = criterion(output, target)
      # Rescale the mean over all movies to a mean over rated movies only
      mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10)
      loss.backward()
      train_loss += np.sqrt(loss.data*mean_corrector)
      s += 1.
      optimizer.step()
  print('epoch: '+str(epoch)+'loss: '+ str(train_loss/s))

epoch: 1loss: tensor(1.7713)
epoch: 2loss: tensor(1.0967)
epoch: 3loss: tensor(1.0533)
epoch: 4loss: tensor(1.0383)
epoch: 5loss: tensor(1.0307)
epoch: 6loss: tensor(1.0269)
epoch: 7loss: tensor(1.0239)
epoch: 8loss: tensor(1.0220)
epoch: 9loss: tensor(1.0207)
epoch: 10loss: tensor(1.0199)
epoch: 11loss: tensor(1.0191)
epoch: 12loss: tensor(1.0185)
epoch: 13loss: tensor(1.0181)
epoch: 14loss: tensor(1.0176)
epoch: 15loss: tensor(1.0174)
epoch: 16loss: tensor(1.0167)
epoch: 17loss: tensor(1.0168)
epoch: 18loss: tensor(1.0165)
epoch: 19loss: tensor(1.0164)
epoch: 20loss: tensor(1.0162)
epoch: 21loss: tensor(1.0161)
epoch: 22loss: tensor(1.0163)
epoch: 23loss: tensor(1.0158)
epoch: 24loss: tensor(1.0158)
epoch: 25loss: tensor(1.0158)
epoch: 26loss: tensor(1.0159)
epoch: 27loss: tensor(1.0154)
epoch: 28loss: tensor(1.0148)
epoch: 29loss: tensor(1.0127)
epoch: 30loss: tensor(1.0112)
epoch: 31loss: tensor(1.0100)
epoch: 32loss: tensor(1.0067)
epoch: 33loss: tensor(1.0068)
epoch: 34loss: tens
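
PyTorch accumulates gradients across `backward()` calls unless they are cleared, so a training step usually starts with `optimizer.zero_grad()`. The sketch below shows one such step on a toy model. It is not the loop used in this notebook: it swaps in boolean-mask indexing for the loss instead of zeroing outputs and applying a mean corrector, and the model size, ratings, and 50-step count are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
n_movies = 6  # hypothetical toy size
model = nn.Sequential(nn.Linear(n_movies, 3), nn.Sigmoid(), nn.Linear(3, n_movies))
criterion = nn.MSELoss()
optimizer = optim.RMSprop(model.parameters(), lr=0.01)

ratings = torch.tensor([[5.0, 0.0, 3.0, 0.0, 0.0, 1.0]])  # 0 = not rated
mask = ratings > 0

losses = []
for step in range(50):
    optimizer.zero_grad()                 # clear gradients from the previous step
    output = model(ratings)
    # Compare only rated entries; unrated ones contribute no gradient.
    loss = criterion(output[mask], ratings[mask])
    loss.backward()
    optimizer.step()
    losses.append(loss.item())            # .item() returns a plain Python float

print(losses[0], losses[-1])
```

Calling `.item()` on the loss also yields a plain float, so printed values do not appear wrapped in `tensor(...)`.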

## Testing the SAE


In [11]:
test_loss = 0
s = 0.  # number of users with at least one rating in the test set
for id_user in range(nb_users):
  # The input is the user's training ratings; the target is the held-out test ratings.
  input = Variable(training_set[id_user]).unsqueeze(0)
  target = Variable(test_set[id_user]).unsqueeze(0)
  if torch.sum(target.data > 0) > 0:  # skip users with no test ratings
    output = sae(input)
    target.requires_grad = False
    output[target == 0] = 0  # score only the movies rated in the test set
    loss = criterion(output, target)
    mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10)
    test_loss += np.sqrt(loss.data*mean_corrector)
    s += 1.
print('test loss: '+str(test_loss/s))

test loss: tensor(0.9484)
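
The reported loss is a root-mean-square error in rating units, so a value around 0.95 means predictions are off by a little under one star on average. The per-user quantity `sqrt(loss * mean_corrector)` equals the RMSE over the rated movies only, because the MSE is averaged over all `nb_movies` entries while only the rated ones contribute nonzero error. A small NumPy check with made-up numbers:

```python
import numpy as np

target = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 1.0])   # 0 = not rated
output = np.array([4.0, 2.5, 3.5, 1.0, 4.0, 2.0])   # hypothetical predictions
n_movies = target.size
rated = target > 0

# Approach used in the loops: zero the unrated predictions, take the MSE over
# all movies, then rescale by n_movies / n_rated.
masked_output = np.where(rated, output, 0.0)
mse_all = np.mean((masked_output - target) ** 2)
mean_corrector = n_movies / rated.sum()
corrected_rmse = np.sqrt(mse_all * mean_corrector)

# Direct RMSE over the rated movies only.
direct_rmse = np.sqrt(np.mean((output[rated] - target[rated]) ** 2))

print(corrected_rmse, direct_rmse)  # the two values agree
```

Here the squared errors on the three rated movies are 1, 0.25, and 1, so both computations give the square root of 0.75.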
