## Introduction:

This notebook is devoted to denoising autoencoders (dA) implemented with Theano.

A denoising autoencoder tries to reconstruct the input from a corrupted version of it by first projecting it into a latent space and then projecting it back into the input space.

It assumes that the simple logistic regression and MLP implementations on the MNIST dataset are already available; `load_data` below is imported from `logistic_sgd`.

If x is the input, then equation (1) computes a partially destroyed version of x by means of a stochastic mapping q_D. Equation (2) computes the projection of the corrupted input into the latent space. Equation (3) computes the reconstruction z of the input, while equation (4) computes the reconstruction error. Here s is the sigmoid function, and W' and b' are the weights and bias of the reconstruction layer.

        \tilde{x} \sim q_D(\tilde{x}|x)                                   (1)

        y = s(W \tilde{x} + b)                                            (2)

        z = s(W' y + b')                                                  (3)

        L(x,z) = -\sum_{k=1}^d [x_k \log z_k + (1-x_k) \log(1-z_k)]       (4)
        
The material below follows the Deep Learning tutorial:
http://deeplearning.net/tutorial/deeplearning.pdf
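
As a rough illustration of equations (1) to (4), here is a minimal NumPy sketch of one forward pass. It is not the Theano implementation used in this notebook: the helper names `sigmoid` and `denoising_forward`, the toy shapes, and the random inputs are assumptions made for the example. It uses masking noise for q_D, tied weights (W' is the transpose of W), and the row-per-example convention, so the products are written `x @ W` rather than `W x`.

```python
import numpy as np


def sigmoid(a):
    # Logistic function s(a) used in equations (2) and (3)
    return 1.0 / (1.0 + np.exp(-a))


def denoising_forward(x, W, b, b_prime, corruption_level, rng):
    # (1) masking noise: each entry is kept with probability 1 - corruption_level
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    x_tilde = mask * x
    # (2) projection of the corrupted input into the latent space
    y = sigmoid(x_tilde @ W + b)
    # (3) reconstruction with tied weights, W' = W.T
    z = sigmoid(y @ W.T + b_prime)
    # (4) cross-entropy between the clean input x and the reconstruction z,
    #     summed over the d input dimensions (one value per example)
    loss = -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1)
    return x_tilde, y, z, loss


rng = np.random.RandomState(0)
x = rng.uniform(size=(3, 6))             # 3 toy examples with d = 6 inputs in [0, 1)
W = rng.uniform(-0.1, 0.1, size=(6, 4))  # 4 hidden units
b = np.zeros(4)
b_prime = np.zeros(6)

x_tilde, y, z, loss = denoising_forward(x, W, b, b_prime, 0.3, rng)
print(loss.shape)  # (3,)
```

Note that the loss compares the reconstruction with the clean input x, not with the corrupted input.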


In [131]:
import os
import sys
import timeit

import numpy as np

import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

from logistic_sgd import load_data
from utils import tile_raster_images

import PIL.Image as Image

## Initialization:
In stacked denoising autoencoders (SdAs), the dA on layer 2 always takes as input the output of the dA on layer 1, and the weights of each dA are reused in the second stage of training to construct an MLP. This is why the constructor below accepts an existing `input`, `W`, `bhid`, and `bvis`.

The following class collects all the functions that together implement the dA.

It starts with the `__init__` function, followed by the functions that corrupt the input, compute the hidden values and the reconstruction, and compute the cost and the updates.
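
The weight initialization done in the constructor can be previewed in plain NumPy. This is only a sketch: `np.float32` stands in for `theano.config.floatX` (an assumption, since the actual value depends on the Theano configuration), and the variable name `bound` is introduced here for illustration.

```python
import numpy as np

n_visible, n_hidden = 784, 500

# Half-width of the uniform interval used for the initial weights
bound = 4 * np.sqrt(6.0 / (n_visible + n_hidden))

rng = np.random.RandomState(123)
initial_W = np.asarray(
    rng.uniform(low=-bound, high=bound, size=(n_visible, n_hidden)),
    dtype=np.float32,  # stand-in for theano.config.floatX (assumption)
)

print(round(float(bound), 4))  # 0.2734
print(initial_W.shape, initial_W.dtype)  # (784, 500) float32
```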

In [135]:
class diagAutoEncod(object):
    def __init__(self, numpy_rng, theano_rng=None, input=None, n_visible=784,
                 n_hidden=500, W=None, bhid=None, bvis=None):
        # bhid: biases of the hidden units, bvis: biases of the visible units
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note that W' is written as `W_prime` and b' as `b_prime`
        if W is None:
            # W is initialized with `initial_W`, which is uniformly sampled from
            # [-4*sqrt(6./(n_visible+n_hidden)), 4*sqrt(6./(n_hidden+n_visible))].
            # The output of uniform is converted with asarray to dtype
            # theano.config.floatX so that the code can run on a GPU.
            initial_W = np.asarray(
                numpy_rng.uniform(
                    low=-4 * np.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * np.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ), dtype=theano.config.floatX)
            W = theano.shared(value=initial_W, name='W', borrow=True)

        # initialize the biases to zero unless they are supplied
        if bvis is None:
            bvis = theano.shared(value=np.zeros(n_visible, dtype=theano.config.floatX), borrow=True)

        if bhid is None:
            bhid = theano.shared(value=np.zeros(n_hidden, dtype=theano.config.floatX), name='b', borrow=True)

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng

        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]

    # The following function depends on the Theano binomial function.
    # The binomial function returns the int64 data type by default.
    # int64 multiplied by the input type (floatX) always returns float64.
    # To keep all data in floatX when floatX is float32, we set the dtype of
    # the binomial to floatX. As in our case the value of the binomial is
    # always 0 or 1, this doesn't change the result. This is needed to allow
    # the GPU to work correctly, as it only supports float32 for now.
    def get_corrupted_input(self, input, corruption_level):
        # The mask is an array of 0s and 1s, where 1 has probability
        # 1 - ``corruption_level`` and 0 has probability ``corruption_level``
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level, dtype=theano.config.floatX) * input

    def get_hidden_values(self, input):
        # Computes the values of the hidden layer
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        # Computes the reconstructed input given the values of the hidden layer
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)

    # Cost function: computes the cost and the updates for one training step of the dA
    def get_cost_updates(self, corruption_level, learning_rate):
        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)  # y is a function of tilde_x
        z = self.get_reconstructed_input(y)  # z is a function of y
        # note : we sum over the size of a datapoint; if we are using minibatches
        # L will be a vector, with one entry per example in minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)  # cross-entropy cost
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [(param, param - learning_rate * gparam) for param, gparam in zip(self.params, gparams)]
        return (cost, updates)
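
The dtype remark attached to `get_corrupted_input` can be checked with a small NumPy sketch. NumPy's type promotion is used here only as a stand-in for Theano's behaviour (an assumption), and the array sizes and variable names are made up for the example. It also shows that a mask drawn with `p = 1 - corruption_level` zeroes out roughly a fraction `corruption_level` of the entries.

```python
import numpy as np

rng = np.random.RandomState(123)
corruption_level = 0.3
x = np.ones((1000, 20), dtype=np.float32)  # toy "input" of ones

# Integer 0/1 mask, cast to int64 to mirror the default described above
mask_int = rng.binomial(n=1, p=1 - corruption_level, size=x.shape).astype(np.int64)
# The same mask stored as float32
mask_f32 = mask_int.astype(np.float32)

print((mask_int * x).dtype)  # float64: int64 times float32 is promoted
print((mask_f32 * x).dtype)  # float32: the product stays in single precision

# Fraction of entries set to zero; close to corruption_level
zero_fraction = 1.0 - float((mask_f32 * x).mean())
print(zero_fraction)
```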

## Test_dA:
This function builds, trains, and tests the whole model.

The MNIST dataset is used to test the model.

In [136]:
def test_dA(learning_rate=0.1, training_epochs=15, 
            dataset='mnist.pkl.gz', batch_size=20, 
            output_folder='/home/eman/PhD/Deep Learning Practice/Myown practice/dA_plots'):
    
#     if not os.path.isdir(output_folder):
#         os.makedirs(output_folder)
#         os.chdir(output_folder)  # Changes the current working directory to the given path.It returns None in all the cases.

    
    ## load data 
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    
    # compute number of minibatches for training
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')  # the data is presented as rasterized images

    ####################################
    # BUILDING THE MODEL NO CORRUPTION #
    ####################################
    
    rng = np.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    da = diagAutoEncod(numpy_rng=rng, theano_rng=theano_rng, input=x, n_visible=28*28, n_hidden=500)
    
    ## cost calc and update
    cost, updates = da.get_cost_updates(corruption_level=0, learning_rate=learning_rate)
    
    ## training the model 
    train_da = theano.function([index], cost, updates=updates,
        givens={x: train_set_x[index * batch_size: (index + 1) * batch_size]})

    # start timer
    start_time = timeit.default_timer()

    ##################
    # MODEL TRAINING #
    ##################

    # go through training epochs
    for epoch in range(training_epochs):
        # go through the training set
        c = []
        for batch_index in range(n_train_batches):
            c.append(train_da(batch_index))

        print('Training epoch %d, cost ' % epoch, np.mean(c, dtype='float64'))

    # end timer
    end_time = timeit.default_timer()

    training_time = (end_time - start_time)  # duration of the training step

    # `__file__` is not defined inside a notebook, so the message omits the file name
    print('The no corruption code ran for %.2fm' % (training_time / 60.), file=sys.stderr)
    
    image = Image.fromarray(tile_raster_images(X=da.W.get_value(borrow=True).T,
                           img_shape=(28, 28), tile_shape=(10, 10), tile_spacing=(1, 1)))
    image.save('filters_corruption_0.png')

    #####################################
    # BUILDING THE MODEL CORRUPTION 30% #
    #####################################

    rng = np.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    da = diagAutoEncod(numpy_rng=rng, theano_rng=theano_rng, input=x, n_visible=28 * 28, n_hidden=500)
    
    cost, updates = da.get_cost_updates(corruption_level=0.3, learning_rate=learning_rate)
    
    train_da = theano.function([index], cost, updates=updates, givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]})
    

    # start timer
    start_time = timeit.default_timer()

    ##################
    # MODEL TRAINING #
    ##################

    # go through training epochs
    for epoch in range(training_epochs):
        # go through the training set
        c = []
        for batch_index in range(n_train_batches):
            c.append(train_da(batch_index))

        print('Training epoch %d, cost ' % epoch, np.mean(c, dtype='float64'))

    # end timer (outside the epoch loop, so it runs once after training)
    end_time = timeit.default_timer()

    training_time = (end_time - start_time)  # duration of the training step

    # `__file__` is not defined inside a notebook, so the message omits the file name
    print('The 30%% corruption code ran for %.2fm' % (training_time / 60.), file=sys.stderr)
    
    image = Image.fromarray(tile_raster_images(X=da.W.get_value(borrow=True).T,
        img_shape=(28, 28), tile_shape=(10, 10), tile_spacing=(1, 1)))
    image.save('filters_corruption_30.png')
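
The minibatch indexing used in the `givens` argument, together with the integer division that defines `n_train_batches`, can be sketched without Theano. The generator name `iterate_minibatches` and the toy array are hypothetical and only serve the example.

```python
import numpy as np


def iterate_minibatches(data, batch_size):
    # Same arithmetic as in test_dA: integer division drops a trailing partial batch
    n_batches = data.shape[0] // batch_size
    for index in range(n_batches):
        # Same slice as in `givens`: rows index*batch_size up to (index+1)*batch_size
        yield data[index * batch_size:(index + 1) * batch_size]


data = np.arange(50 * 3).reshape(50, 3)  # 50 toy examples with 3 features each
batches = list(iterate_minibatches(data, batch_size=20))
print(len(batches), batches[0].shape)  # 2 (20, 3)
```

With 50 examples and a batch size of 20, the last 10 examples are never visited, which is the same behaviour the training loop above has whenever the training set size is not a multiple of `batch_size`.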

## Calling the function:

In [None]:
if __name__ == '__main__':
    test_dA()

## Side Note:
The following commands check the current working directory and change it to whichever directory you want to work in.

In [None]:
import os
cwd = os.getcwd()
cwd

In [110]:
os.chdir('/home/eman/PhD/Deep Learning Practice/Myown practice')
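
If the change of directory should only last for one step, a small helper can restore the previous directory afterwards. This is a sketch using only the standard library; the helper name `run_in_directory` is hypothetical, and a temporary directory is used so that the example does not depend on any particular path.

```python
import os
import tempfile


def run_in_directory(path, func):
    # Switch to `path`, call `func`, and always switch back afterwards
    previous = os.getcwd()
    os.chdir(path)
    try:
        return func()
    finally:
        os.chdir(previous)


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # os.getcwd is called while the working directory is `tmp`
    inside = run_in_directory(tmp, os.getcwd)
after = os.getcwd()

print(before == after)
```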