## Introduction:

This notebook implements stacked denoising autoencoders (SdA) with Theano on the MNIST dataset.

The SdA is an MLP in which the weights of each intermediate layer are shared with a different denoising autoencoder.
We first construct the SdA as a deep multilayer perceptron, and when constructing each sigmoidal layer we also construct a denoising autoencoder that shares weights with that layer. During pretraining we train these autoencoders (which also changes the weights of the MLP). During fine-tuning we finish training the SdA by running SGD on the MLP.
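
The weight sharing described above can be illustrated without Theano. The sketch below is only an analogy: `Layer` is a hypothetical stand-in class, and plain NumPy arrays play the role of Theano shared variables. It shows that when two objects hold the same parameter array, an in-place update made through one is visible through the other.

```python
import numpy as np


class Layer(object):
    """Hypothetical stand-in for a sigmoid layer or a dA; it only holds parameters."""

    def __init__(self, W, b):
        self.W = W
        self.b = b


rng = np.random.RandomState(0)
W = rng.uniform(-0.1, 0.1, size=(4, 3))
b = np.zeros(3)

mlp_layer = Layer(W, b)
# The "dA" receives the very same arrays, not copies.
da_layer = Layer(W=mlp_layer.W, b=mlp_layer.b)

before = mlp_layer.W.copy()
# A pretend pretraining step that updates the dA weights in place.
da_layer.W -= 0.01 * np.ones_like(da_layer.W)

shares_memory = np.shares_memory(mlp_layer.W, da_layer.W)
mlp_changed = not np.allclose(before, mlp_layer.W)
print(shares_memory, mlp_changed)
```

This is why pretraining the dAs is enough to initialize the MLP: no weights need to be copied across afterwards.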

----------------------------------------------------------------------------------------------------- 

__Autoencoders__:

An autoencoder takes an input x and first maps it to a hidden representation y = s(Wx+b), parameterized by {W,b}. 

The resulting latent representation y is then mapped back to a "reconstructed" vector z in input space z = s(W'y + b').  

The weight matrix W' can optionally be constrained such that W' = W^T, in which case the autoencoder is said to have tied weights. 

The network is trained to minimize the reconstruction error (the error between x and z).
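
As a rough illustration of the mapping above, here is a minimal NumPy sketch of one forward pass through an autoencoder with tied weights. The layer sizes, the random initialization, and the use of squared error as the reconstruction error are assumptions made for the example. Inputs are stored as rows, so the code computes `x.dot(W)` rather than `Wx`.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.RandomState(0)
n_visible, n_hidden = 6, 3  # illustrative sizes, not taken from the notebook

W = rng.uniform(-0.5, 0.5, size=(n_visible, n_hidden))
b = np.zeros(n_hidden)         # hidden bias b
b_prime = np.zeros(n_visible)  # visible bias b'

x = rng.uniform(0, 1, size=(2, n_visible))  # two example inputs, one per row

y = sigmoid(x.dot(W) + b)            # hidden representation y = s(Wx + b)
z = sigmoid(y.dot(W.T) + b_prime)    # reconstruction with tied weights W' = W^T

# Squared error is used here only as one possible reconstruction error.
squared_error = np.mean(np.sum((x - z) ** 2, axis=1))
print(y.shape, z.shape, squared_error)
```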

----------------------------------------------------------------------------------------------------- 

__Denoising_Autoencoders__:

For the denoising autoencoder, during training, x is first corrupted into \tilde{x}, a partially destroyed version of x obtained by means of a stochastic mapping.

Afterwards y is computed as before (using \tilde{x}), y = s(W\tilde{x} + b), and z as s(W'y + b').

The reconstruction error is now measured between z and the uncorrupted input x, and is computed as the cross-entropy:

$$-\sum_{k=1}^{d}\left[x_k \log z_k + (1 - x_k) \log(1 - z_k)\right]$$
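
The two ingredients above, corruption and cross-entropy, can be sketched in NumPy as follows. The text only says the corruption is a stochastic mapping; masking noise (zeroing each input component with probability `corruption_level`) is an assumption chosen for this example, and the helper names `corrupt` and `cross_entropy` are hypothetical.

```python
import numpy as np


def corrupt(x, corruption_level, rng):
    """Zero out each entry of x with probability corruption_level (masking noise)."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return x * mask


def cross_entropy(x, z, eps=1e-12):
    """Per-example cross-entropy between input x and reconstruction z."""
    z = np.clip(z, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z), axis=1)


rng = np.random.RandomState(0)
x = rng.uniform(0, 1, size=(4, 8))
x_tilde = corrupt(x, corruption_level=0.3, rng=rng)

# The loss is always measured against the clean input, never against x_tilde.
target = np.array([[1.0, 0.0]])
perfect = cross_entropy(target, np.array([[1.0, 0.0]]))
poor = cross_entropy(target, np.array([[0.5, 0.5]]))
print(perfect, poor)
```

A perfect reconstruction gives a loss close to zero, while an uninformative reconstruction of 0.5 everywhere costs log 2 per component.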

In [100]:
import os
import sys
import timeit

import numpy

import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer
from dA import dA

Stacked denoising autoencoder class (SdA):

A stacked denoising autoencoder model is obtained by stacking several dAs. The hidden layer of the dA at layer `i` becomes the input of the dA at layer `i+1`. The first layer dA gets as input the input of the SdA, and the hidden layer of the last dA represents the output.


Note that after pretraining, the SdA is treated as a normal MLP; the dAs are only used to initialize the weights.
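
To make the stacking concrete, the following NumPy sketch builds a stack of sigmoid layers and runs one forward pass, feeding the hidden output of layer `i` into layer `i+1`. The layer sizes, the batch of random inputs, and the initialization range are illustrative assumptions, and no training is performed.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.RandomState(0)
n_ins = 784
hidden_layers_sizes = [500, 300]  # illustrative sizes

# Build one (W, b) pair per layer; the input size of layer i is the
# output size of layer i-1, or n_ins for the first layer.
layers = []
for i, n_out in enumerate(hidden_layers_sizes):
    input_size = n_ins if i == 0 else hidden_layers_sizes[i - 1]
    W = rng.uniform(-0.01, 0.01, size=(input_size, n_out))
    b = np.zeros(n_out)
    layers.append((W, b))

x = rng.uniform(0, 1, size=(5, n_ins))  # a batch of 5 fake inputs

activation = x
shapes = []
for W, b in layers:
    # The hidden output of this layer is the input of the next one.
    activation = sigmoid(activation.dot(W) + b)
    shapes.append(activation.shape)

print(shapes)
```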

 
There are two main steps for the SdA: pretraining (unsupervised) and fine-tuning (supervised). Both are implemented in the class below.

In [101]:
class SdA(object):
    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        n_ins=784,
        hidden_layers_sizes=[500, 500],
        n_outs=10,
        corruption_levels=[0.1, 0.1]
    ):
        self.sigmoid_layers = []
        self.dA_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0  # condition
    
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        # allocate symbolic variables for the data
        self.x = T.matrix('x')  # the data is presented as rasterized images
        self.y = T.ivector('y')  # the labels are presented as 1D vector of [int] labels
        
        
        for i in range(self.n_layers):
            # construct the sigmoidal layer
            # the size of the input is either the number of hidden units of
            # the layer below or the input size if we are on the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the hidden
            # layer below or the input of the SdA if we are on the first layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)

            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)

            # only the parameters of the sigmoid layers are declared as
            # parameters of the SdA; the visible biases of each dA are
            # parameters of that dA, but not of the SdA
            self.params.extend(sigmoid_layer.params)

            # construct a denoising autoencoder that shares weights with this layer
            dA_layer = dA(numpy_rng=numpy_rng,
                          theano_rng=theano_rng,
                          input=layer_input,
                          n_visible=input_size,
                          n_hidden=hidden_layers_sizes[i],
                          W=sigmoid_layer.W,
                          bhid=sigmoid_layer.b)
            # keep the dA so that pretraining_functions() can train it
            self.dA_layers.append(dA_layer)
        
         # We now need to add a logistic layer on top of the MLP
        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1], n_out=n_outs)
        
        self.params.extend(self.logLayer.params)
        # construct the cost and error expressions used during finetuning

        
        # compute the cost for second phase of training,
        # defined as the negative log likelihood
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
        # compute the gradients with respect to the model parameters
        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)
    
    ## pre-training ..
    #     Generates a list of functions, each of them implementing one
    #     step in training the dA corresponding to the layer with the same index.
    #     Each function requires the minibatch index as input; to train
    #     a dA you just need to iterate, calling the corresponding function on
    #     all minibatch indexes.

    def pretraining_functions(self, train_set_x, batch_size):
        index = T.lscalar('index')  # index to a minibatch
        corruption_level = T.scalar('corruption')  # % of corruption to use
        learning_rate = T.scalar('lr')  # learning rate to use
        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # end of a batch, given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for dA_layer in self.dA_layers:  # named dA_layer to avoid shadowing the dA class
            # get the cost and the updates list
            cost, updates = dA_layer.get_cost_updates(corruption_level, learning_rate)
            # compile the theano function
            fn = theano.function(
                inputs=[
                    index,
                    theano.In(corruption_level, value=0.2),
                    theano.In(learning_rate, value=0.1)
                ],
                outputs=cost,
                updates=updates,
                givens={self.x: train_set_x[batch_begin: batch_end]})
            
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns

    ## fine tuning ..
    #     Generates a function `train` that implements one step of
    #     finetuning, a function `validate` that computes the error on
    #     every batch of the validation set, and a function `test` that
    #     computes the error on every batch of the testing set
    def build_finetune_functions(self, datasets, batch_size, learning_rate):
          
        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]
        
        # compute number of minibatches for training, validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_valid_batches //= batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_test_batches //= batch_size
        
        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = [(param, param - gparam * learning_rate) for param, gparam in zip(self.params, gparams)]
        
        ## train
        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={
                self.x: train_set_x[index * batch_size: (index + 1) * batch_size],
                self.y: train_set_y[index * batch_size: (index + 1) * batch_size]},name='train')

        ## test
        test_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: test_set_x[index * batch_size: (index + 1) * batch_size],
                self.y: test_set_y[index * batch_size: (index + 1) * batch_size]}, name='test')
    
        ## validate
        valid_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: valid_set_x[index * batch_size: (index + 1) * batch_size],
                self.y: valid_set_y[index * batch_size: (index + 1) * batch_size]}, name='valid')

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set .. this is for classification ..
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score  ## return of build_finetune_functions()
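
The compiled functions above all select a minibatch with the slice `index * batch_size: (index + 1) * batch_size`, and the number of batches comes from floor division. The small NumPy sketch below, with made-up sizes, shows what that implies: any examples left over after the last full batch are never visited.

```python
import numpy as np

n_examples, batch_size = 10, 4  # illustrative sizes
data = np.arange(n_examples)

# Floor division, as in `n_valid_batches //= batch_size`.
n_batches = n_examples // batch_size

# Same slicing pattern as the `givens` dictionaries.
batches = [data[index * batch_size: (index + 1) * batch_size]
           for index in range(n_batches)]

visited = np.concatenate(batches)
leftover = np.setdiff1d(data, visited)
print(n_batches, [b.tolist() for b in batches], leftover.tolist())
```

With `batch_size=1`, the default used in `test_SdA`, nothing is dropped.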

## Test the model

Train and test a stacked denoising autoencoder.

In [102]:
def test_SdA(finetune_lr=0.1, pretraining_epochs=15,
             pretrain_lr=0.001, training_epochs=1000,
             dataset='mnist.pkl.gz', batch_size=1):
  
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_train_batches //= batch_size

    # numpy random generator
    numpy_rng = numpy.random.RandomState(89677)
    
    # construct the stacked denoising autoencoder class
    sda = SdA(
        numpy_rng=numpy_rng,
        n_ins=28 * 28,
        hidden_layers_sizes=[1000, 1000, 1000],
        n_outs=10)
    
    #########################
    # PRETRAINING THE MODEL #
    #########################
    pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x, batch_size=batch_size)

    start_time = timeit.default_timer()
    ## Pre-train layer-wise
    corruption_levels = [.1, .2, .3]
    for i in range(sda.n_layers):
        # go through pretraining epochs
        for epoch in range(pretraining_epochs):
            # go through the training set
            c = []
            for batch_index in range(n_train_batches):
                c.append(pretraining_fns[i](index=batch_index,
                         corruption=corruption_levels[i],
                         lr=pretrain_lr))
            print('Pre-training layer %i, epoch %d, cost %f' % (i, epoch, numpy.mean(c, dtype='float64')))

    end_time = timeit.default_timer()

#     print(('The pretraining code for file ' +
#            os.path.split(__file__)[1] +
#            ' ran for %.2fm' % ((end_time - start_time) / 60.)), sys.stderr)

    ########################
    # FINETUNING THE MODEL #
    ########################

    # get the training, validation and testing function for the model
    train_fn, validate_model, test_model = sda.build_finetune_functions(
        datasets=datasets,
        batch_size=batch_size,
        learning_rate=finetune_lr
    )

    # early-stopping parameters
    patience = 10 * n_train_batches  # look at this many minibatches regardless
    patience_increase = 2.  # wait this much longer when a new best is
                            # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience // 2)
                                  # go through this many
                                  # minibatches before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    test_score = 0.
    start_time = timeit.default_timer()

    done_looping = False
    epoch = 0

    while (epoch < training_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost = train_fn(minibatch_index)
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                validation_losses = validate_model()
                this_validation_loss = numpy.mean(validation_losses, dtype='float64')
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    #improve patience if loss improvement is good enough
                    if (
                        this_validation_loss < best_validation_loss *
                        improvement_threshold
                    ):
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = test_model()
                    test_score = numpy.mean(test_losses, dtype='float64')
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    
    print(
        (
            'Optimization complete with best validation score of %f %%, '
            'on iteration %i, '
            'with test performance %f %%'
        )
        % (best_validation_loss * 100., best_iter + 1, test_score * 100.)
    )
    # `__file__` is not defined inside a notebook, so report only the duration
    sys.stderr.write('The training code ran for %.2fm\n' %
                     ((end_time - start_time) / 60.))
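
The patience-based early stopping used in the fine-tuning loop can be studied in isolation. The sketch below is a simplified stand-in, not the notebook's loop: `run_early_stopping` is a hypothetical helper, it validates on every iteration instead of every `validation_frequency` iterations, and the loss values are made up.

```python
def run_early_stopping(validation_losses, patience=4, patience_increase=2,
                       improvement_threshold=0.995):
    """Replay a list of validation losses and return
    (best_loss, best_iter, last_iter_run)."""
    best_loss = float('inf')
    best_iter = 0
    last_iter = -1
    for it, loss in enumerate(validation_losses):
        last_iter = it
        if loss < best_loss:
            # Extend patience only when the improvement is significant.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        if patience <= it:
            break  # out of patience: stop training
    return best_loss, best_iter, last_iter


losses = [0.50, 0.40, 0.30, 0.29, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36]
result = run_early_stopping(losses)
print(result)
```

In this run the best loss is found at iteration 3, which raises the patience to 6, so the loop stops at iteration 6 without looking at the remaining losses.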

In [103]:
if __name__ == '__main__':
    test_SdA()

... loading data


IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/theano/tensor/c_code/dimshuffle.c'

## Comment:

This implementation depends on the MLP, logistic regression and dA implementations, which is why it fails with the same error as the dA notebook: Theano reports that a file is missing from its installation directory.