
In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
plt.rcParams['axes.facecolor'] = 'white'
#plt.grid(c='grey')
plt.style.use('default')

# Welcome to Part 6 - AutoEncoders



In this part you will learn:

1. the Intuition of AutoEncoders
2. how to build an AutoEncoder from scratch with PyTorch
3. how to manipulate classes and objects to improve and tune your AutoEncoder

In the previous part we created a Recommender System that predicted binary ratings, "Like" or "Not Like". In this part we will take it to the next level and create a Recommender System that predicts ratings from 1 to 5.

We will implement a Stacked AutoEncoder model with PyTorch, a Deep Learning framework that gives more low-level control than Keras. Every single line of code will be explained in detail, but I would recommend having a first look at the PyTorch documentation to start getting familiar with PyTorch.

# Steps for training an AutoEncoder

STEP 1: We start with an array where the rows (the observations) correspond to the users and the columns (the features) correspond to the movies. Each cell (u, i) contains the rating (from 1 to 5, or 0 if there is no rating) of movie i by user u.

STEP 2: The first user goes into the network. The input vector x = (r1, r2, ..., rm) contains all of that user's ratings for all the movies.

STEP 3: The input vector x is encoded into a vector z of lower dimension by a mapping function f (e.g. the sigmoid function):

z = f(Wx + b) where W is the matrix of input weights and b is the bias vector

STEP 4: z is then decoded into the output vector y, which has the same dimension as x and aims to replicate the input vector x.

STEP 5: The reconstruction error d(x, y) = ||x-y|| is computed. The goal is to minimize it.

STEP 6: Back-Propagation: from right to left, the error is back-propagated. The weights are updated according to how much they are responsible for the error. The learning rate decides by how much we update the weights.

STEP 7: Repeat Steps 1 to 6 and update the weights after each observation (this is online, or stochastic, learning). Or: repeat Steps 1 to 6 but update the weights only after a batch of observations (this is Batch Learning).

STEP 8: When the whole training set has passed through the ANN, that makes an epoch. Repeat for more epochs.
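Steps 2 to 5 can be sketched for a single user with plain NumPy. This is only an illustration under stated assumptions: the toy sizes, the rating vector, and the names `W_enc`, `b_enc`, `W_dec` and `b_dec` are hypothetical and do not come from the course code, and the weights are random (untrained), so the reconstruction is poor by design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    """Mapping function f used by the encoder."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical toy sizes: 6 movies compressed into a 3-dimensional code.
nb_movies_toy, nb_hidden = 6, 3

# STEP 2: one user's ratings for all movies (0 means "not rated").
x = np.array([5.0, 0.0, 3.0, 0.0, 1.0, 4.0])

# Randomly initialised (untrained) encoder and decoder parameters.
W_enc = rng.normal(scale=0.1, size=(nb_hidden, nb_movies_toy))
b_enc = np.zeros(nb_hidden)
W_dec = rng.normal(scale=0.1, size=(nb_movies_toy, nb_hidden))
b_dec = np.zeros(nb_movies_toy)

# STEP 3: encode x into the lower-dimensional vector z = f(Wx + b).
z = sigmoid(W_enc @ x + b_enc)

# STEP 4: decode z into y, which has the same dimension as x.
y = W_dec @ z + b_dec

# STEP 5: reconstruction error d(x, y) = ||x - y||.
error = np.linalg.norm(x - y)

print(z.shape, y.shape, error)
```

Training (Steps 6 to 8) would then adjust the four parameter arrays to shrink `error`; in the notebook, PyTorch's autograd and optimizer take care of that part.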


In [None]:
!wget http://files.grouplens.org/datasets/movielens/ml-100k.zip

--2019-03-30 06:28:43--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.34.235
Connecting to files.grouplens.org (files.grouplens.org)|128.101.34.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4924029 (4.7M) [application/zip]
Saving to: ‘ml-100k.zip.1’


2019-03-30 06:28:45 (3.44 MB/s) - ‘ml-100k.zip.1’ saved [4924029/4924029]



In [None]:
!wget http://files.grouplens.org/datasets/movielens/ml-1m.zip

--2019-03-30 06:29:07--  http://files.grouplens.org/datasets/movielens/ml-1m.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.34.235
Connecting to files.grouplens.org (files.grouplens.org)|128.101.34.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5917549 (5.6M) [application/zip]
Saving to: ‘ml-1m.zip.1’


2019-03-30 06:29:08 (4.12 MB/s) - ‘ml-1m.zip.1’ saved [5917549/5917549]



In [None]:
!unzip ml-100k.zip

Archive:  ml-100k.zip
replace ml-100k/allbut.pl? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: ml-100k/allbut.pl       
  inflating: ml-100k/mku.sh          
  inflating: ml-100k/README          
  inflating: ml-100k/u.data          
  inflating: ml-100k/u.genre         
  inflating: ml-100k/u.info          
  inflating: ml-100k/u.item          
  inflating: ml-100k/u.occupation    
  inflating: ml-100k/u.user          
  inflating: ml-100k/u1.base         
  inflating: ml-100k/u1.test         
  inflating: ml-100k/u2.base         
  inflating: ml-100k/u2.test         
  inflating: ml-100k/u3.base         
  inflating: ml-100k/u3.test         
  inflating: ml-100k/u4.base         
  inflating: ml-100k/u4.test         
  inflating: ml-100k/u5.base         
  inflating: ml-100k/u5.test         
  inflating: ml-100k/ua.base         
  inflating: ml-100k/ua.test         
  inflating: ml-100k/ub.base         
  inflating: ml-100k/ub.test         


In [None]:
!unzip ml-1m.zip

Archive:  ml-1m.zip
replace ml-1m/movies.dat? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: ml-1m/movies.dat        
  inflating: ml-1m/ratings.dat       
  inflating: ml-1m/README            
  inflating: ml-1m/users.dat         


In [None]:
# Importing the libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
from torch.autograd import Variable

# Importing the dataset
movies = pd.read_csv('/content/ml-1m/movies.dat', sep = '::', 
                     header = None, engine = 'python', encoding = 'latin-1')
users = pd.read_csv('/content/ml-1m/users.dat', sep = '::', 
                     header = None, engine = 'python', encoding = 'latin-1')
ratings = pd.read_csv('/content/ml-1m/ratings.dat', sep = '::', 
                     header = None, engine = 'python', encoding = 'latin-1')

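# Preparing the training set and the test set
# (u1.base and u1.test have no header row, so header = None keeps the first rating)
training_set = pd.read_csv('/content/ml-100k/u1.base', delimiter = '\t', header = None)
test_set = pd.read_csv('/content/ml-100k/u1.test', delimiter = '\t', header = None)

# Convert the training_set and test_set to arrays
training_set = np.array(training_set, dtype = 'int')
test_set = np.array(test_set, dtype = 'int')

# Getting the number of users and movies (needed by convert and by the network)
nb_users = int(max(training_set[:, 0].max(), test_set[:, 0].max()))
nb_movies = int(max(training_set[:, 1].max(), test_set[:, 1].max()))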

# Converting the data into an array with users in lines and movies in columns
def convert(data):
    new_data = []
    for id_users in range(1, nb_users + 1):
        id_movies = data[:,1][data[:,0] == id_users]
        id_ratings = data[:,2][data[:,0] == id_users]
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data
training_set = convert(training_set)
test_set = convert(test_set)

# Converting the data into Torch tensors
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

# Creating the architecture of the Neural Network
class SAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(nb_movies, 20)
        self.fc2 = nn.Linear(20, 10)
        self.fc3 = nn.Linear(10, 20)
        self.fc4 = nn.Linear(20, nb_movies)
        self.activation = nn.Sigmoid()
    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.fc4(x)
        return x
sae = SAE()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr = 0.01, weight_decay = 0.5)

# Training the SAE
nb_epoch = 200
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    for id_user in range(nb_users):
        input = training_set[id_user].unsqueeze(0)
        target = input.clone()
        # Skip users who did not rate any movie
        if torch.sum(target > 0) > 0:
            optimizer.zero_grad()  # clear gradients left over from the previous user
            output = sae(input)
            target.requires_grad = False
            # Ignore unrated movies when computing the loss
            output[target == 0] = 0
            loss = criterion(output, target)
            # Rescale so the mean is taken over the rated movies only
            mean_corrector = nb_movies/float(torch.sum(target > 0) + 1e-10)
            loss.backward()
            # loss.item() replaces loss.data[0], which raises IndexError on 0-dim tensors
            train_loss += np.sqrt(loss.item()*mean_corrector)
            s += 1.
            optimizer.step()
    print('epoch: '+str(epoch)+' loss: '+str(train_loss/s))

# Testing the SAE
test_loss = 0
s = 0.
with torch.no_grad():  # no gradients are needed for evaluation
    for id_user in range(nb_users):
        # The training ratings are the input; the held-out test ratings are the target
        input = training_set[id_user].unsqueeze(0)
        target = test_set[id_user].unsqueeze(0)
        if torch.sum(target > 0) > 0:
            output = sae(input)
            output[target == 0] = 0
            loss = criterion(output, target)
            mean_corrector = nb_movies/float(torch.sum(target > 0) + 1e-10)
            # loss.item() replaces loss.data[0], which raises IndexError on 0-dim tensors
            test_loss += np.sqrt(loss.item()*mean_corrector)
            s += 1.
print('test loss: '+str(test_loss/s))
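The role of `mean_corrector` in the loops above can be checked on a small example. The sketch below uses NumPy instead of PyTorch, and the rating and prediction vectors are hypothetical values chosen for illustration. A mean squared error averages over every movie, including the unrated ones that were zeroed out, so multiplying by (number of movies) / (number of rated movies) turns it into the mean over rated movies only.

```python
import numpy as np

# Hypothetical ratings for 6 movies (0 = unrated) and hypothetical predictions.
target = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 4.0])
output = np.array([4.0, 2.5, 3.0, 1.0, 3.5, 2.0])

# Zero the predictions for unrated movies so they add nothing to the error.
output = output.copy()
output[target == 0] = 0.0

# Mean squared error over ALL movies, the same averaging an MSE loss applies.
mse_all = np.mean((output - target) ** 2)

# Correct the mean so that it only counts the rated movies.
n_rated = np.sum(target > 0)
mean_corrector = target.size / n_rated
rmse_rated = np.sqrt(mse_all * mean_corrector)

# Direct computation over the rated movies only, for comparison.
rated = target > 0
rmse_direct = np.sqrt(np.mean((output[rated] - target[rated]) ** 2))

print(rmse_rated, rmse_direct)
```

Both values agree, which is why the loops can use the ordinary mean squared error and still report a root mean squared error over the movies each user actually rated.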


# Data Preprocessing Template

In [None]:
# Downloading the Dataset

!wget https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Data-Preprocessing-Template.zip



--2019-03-30 21:17:40--  https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P16-Data-Preprocessing-Template.zip
Resolving sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)... 52.219.84.48
Connecting to sds-platform-private.s3-us-east-2.amazonaws.com (sds-platform-private.s3-us-east-2.amazonaws.com)|52.219.84.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3483 (3.4K) [application/zip]
Saving to: ‘P16-Data-Preprocessing-Template.zip’


2019-03-30 21:17:41 (109 MB/s) - ‘P16-Data-Preprocessing-Template.zip’ saved [3483/3483]



In [None]:
# Unzip the dataset

!unzip P16-Data-Preprocessing-Template.zip

Archive:  P16-Data-Preprocessing-Template.zip
   creating: Data Preprocessing Template/
  inflating: Data Preprocessing Template/categorical_data.py  
   creating: __MACOSX/
   creating: __MACOSX/Data Preprocessing Template/
  inflating: __MACOSX/Data Preprocessing Template/._categorical_data.py  
  inflating: Data Preprocessing Template/Data.csv  
  inflating: __MACOSX/Data Preprocessing Template/._Data.csv  
  inflating: Data Preprocessing Template/data_preprocessing_template.py  
  inflating: __MACOSX/Data Preprocessing Template/._data_preprocessing_template.py  
  inflating: Data Preprocessing Template/missing_data.py  


In [None]:
!ls -la

total 32
drwxr-xr-x 1 root root 4096 Mar 30 21:18  .
drwxr-xr-x 1 root root 4096 Mar 30 21:14  ..
drwxr-xr-x 1 root root 4096 Mar 27 20:25  .config
drwxr-xr-x 2 root root 4096 Mar 31  2017 'Data Preprocessing Template'
drwxrwxr-x 3 root root 4096 Mar 31  2017  __MACOSX
-rw-r--r-- 1 root root 3483 Mar  6 17:48  P16-Data-Preprocessing-Template.zip
drwxr-xr-x 1 root root 4096 Mar 27 20:26  sample_data


In [None]:
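# Note: `!cd` runs in a temporary subshell, so the notebook's working directory does not change
# (use `%cd` instead to change it for later cells).
!cd  'Data Preprocessing Template'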

In [None]:
pwd

'/content'

In [None]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('/content/Data Preprocessing Template/Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
# (Imputer was removed from sklearn.preprocessing; SimpleImputer replaces it and works column-wise)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

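# Encoding categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# One-hot encode column 0 and pass the remaining columns through unchanged
# (OneHotEncoder no longer accepts categorical_features; sparse_threshold = 0 forces a dense result)
ct = ColumnTransformer([('onehot', OneHotEncoder(), [0])], remainder = 'passthrough', sparse_threshold = 0)
X = np.array(ct.fit_transform(X), dtype = float)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)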

# Splitting the dataset into the Training set and Test set
# (sklearn.cross_validation was removed; train_test_split now lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)


# Classification Template