# A7 Autoencoder for Classification


We have talked in lecture about how an autoencoder nonlinearly reduces the dimensionality of data.  In this assignment you will
1. load an autoencoder network already trained on the MNIST data,
2. apply it to the MNIST training set to obtain the outputs of the units in the bottleneck layer as a new representation of each training set image with a greatly reduced dimensionality,
3. train a fully-connected classification network on this new representation, and
4. report the percent of training and testing images correctly classified, and compare with the accuracy you get with the original images.

Download [nn_torch.zip](https://www.cs.colostate.edu/~anderson/cs445/notebooks/nn_torch.zip) and extract the files.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas
import pickle
import gzip
import torch
import neuralnetworks_torch as nntorch

First, let's load the MNIST data. You may download it here: [mnist.pkl.gz](http://deeplearning.net/data/mnist/mnist.pkl.gz).

In [2]:
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Xtrain = train_set[0]
Ttrain = train_set[1]

Xtest = test_set[0]
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape

((50000, 784), (50000,), (10000, 784), (10000,))

To load the network saved in Lecture Notes 21, run the following code.  This loads the saved torch neural network that was trained on a GPU.  It loads the state of that net (its weights) into a new net of the same structure but allocated on the CPU.

First download [mnist_autoencoder.pt](https://www.cs.colostate.edu/~anderson/cs445/notebooks/mnist_autoencoder.pt).

In [4]:
n_in = Xtrain.shape[1]
n_hiddens_per_layer = [500, 100, 50, 50, 20, 50, 50, 100, 500]
nnet_autoencoder = nntorch.NeuralNetwork(n_in, n_hiddens_per_layer, n_in, device='cpu')

nnet_autoencoder.load_state_dict(torch.load('mnist_autoencoder.pt', map_location=torch.device('cpu')))

<All keys matched successfully>

To get the output of the units in the middle hidden layer, run the `use_to_middle` function implemented for you in `neuralnetworks_torch`.

In [5]:
Xtrain_reduced = nnet_autoencoder.use_to_middle(Xtrain)
Xtrain_reduced.shape

(50000, 20)
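For intuition, the idea behind `use_to_middle` can be sketched in plain NumPy: run the forward pass one layer at a time and stop at the narrowest hidden layer. This is not the code in `neuralnetworks_torch`. The `tanh` activation, the random stand-in weights, and the helper names `make_weights` and `forward_to_bottleneck` are all assumptions made for illustration.

```python
import numpy as np

def make_weights(layer_sizes, seed=0):
    """Random (W, b) pairs for consecutive layer sizes; stand-ins for trained weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward_to_bottleneck(X, weights, n_hiddens_per_layer):
    """Return the activations of the narrowest hidden layer (tanh units assumed)."""
    middle = int(np.argmin(n_hiddens_per_layer))  # index of the bottleneck layer
    Y = X
    for W, b in weights[:middle + 1]:  # apply layers up to and including the bottleneck
        Y = np.tanh(Y @ W + b)
    return Y

n_in = 784
n_hiddens_per_layer = [500, 100, 50, 50, 20, 50, 50, 100, 500]
weights = make_weights([n_in] + n_hiddens_per_layer + [n_in])
X = np.random.default_rng(1).uniform(size=(8, n_in))  # 8 fake "images"
Z = forward_to_bottleneck(X, weights, n_hiddens_per_layer)
print(Z.shape)  # (8, 20)
```

Only the first five layers are applied, so each 784-value input comes out as 20 values, matching the second dimension of `Xtrain_reduced` above.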

And while we are here, let's get the reduced representation of `Xtest` also.

In [21]:
Xtest_reduced = nnet_autoencoder.use_to_middle(Xtest)
Xtest_reduced.shape

(10000, 20)
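To get a feel for what the smaller input means for a classifier, the sketch below counts the weights and biases of a fully connected network for both input sizes. It assumes one bias per unit in every layer, and `count_parameters` is a hypothetical helper, not part of `neuralnetworks_torch`. The hidden structure `[200, 200]` is only an example.

```python
def count_parameters(n_in, n_hiddens_per_layer, n_out):
    """Weights plus biases in a fully connected net with the given layer sizes."""
    sizes = [n_in] + list(n_hiddens_per_layer) + [n_out]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))

original = count_parameters(784, [200, 200], 10)  # classifier on raw pixels
reduced = count_parameters(20, [200, 200], 10)    # classifier on bottleneck outputs
print(original, reduced)  # 199210 46410
print(f'{784 / 20:.1f}x fewer input features')  # 39.2x fewer input features
```

Almost all of the difference comes from the first layer, which shrinks from 785 x 200 to 21 x 200 parameters.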

## Requirement

Your jobs are now to
1. train one fully-connected classifier using `Xtrain_reduced` and `Ttrain` and test it with `Xtest_reduced` and `Ttest`, and
2. train a second fully-connected classifier using `Xtrain` and `Ttrain` and test it with `Xtest` and `Ttest`.

Try to find parameters (hidden network structure, number of epochs, and learning rate) for which the classifier given the reduced representation does almost as well as the other classifier with the original data. Discuss your results.
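One way to organize the search is a small loop over every combination of hidden structure, epoch count, and learning rate. The sketch below is only a harness: `run_grid` and `fake_train_and_score` are hypothetical names, and the placeholder scores are made up so the loop can run without any training.

```python
import itertools

def run_grid(train_and_score, hidden_structures, epoch_counts, learning_rates):
    """Call train_and_score for every combination; return results sorted by test accuracy."""
    results = []
    for hiddens, n_epochs, lr in itertools.product(hidden_structures, epoch_counts, learning_rates):
        train_acc, test_acc = train_and_score(hiddens, n_epochs, lr)
        results.append({'hiddens': hiddens, 'n_epochs': n_epochs, 'lr': lr,
                        'train': train_acc, 'test': test_acc})
    return sorted(results, key=lambda r: r['test'], reverse=True)

def fake_train_and_score(hiddens, n_epochs, lr):
    """Placeholder returning made-up scores; replace with real training and testing."""
    score = 90.0 + len(hiddens) + min(n_epochs, 1000) / 1000
    return score + 1.0, score

results = run_grid(fake_train_and_score,
                   [[200, 200], [200, 200, 200]], [500, 2000], [0.001, 0.0001])
for r in results[:3]:
    print(r)
```

In the notebook, `train_and_score` would build an `nntorch.NeuralNetwork_Classifier`, call its `train` method, and return the percent correct on the training and test sets.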

Here is an example for part of Step 1.  It shows a brief training session (small number of epochs and simple hidden layer structure) for using the reduced data.

In [10]:
Xtrain_reduced.shape

(50000, 20)

# Experiment 1 with Original Data 

In [30]:
n_in = Xtrain.shape[1]
# Classifier trained on the original 784-pixel images
original_classifier = nntorch.NeuralNetwork_Classifier(n_in, [200, 200], 10, device='cpu')

n_epochs = 2000
original_classifier.train(Xtrain, Ttrain, n_epochs, 0.001, method='adam', standardize='')

def percent_correct(Predicted, Target):
    return 100 * np.mean(Predicted == Target)

Classes1, _ = original_classifier.use(Xtrain)
print(f'% Correct  Train {percent_correct(Classes1, Ttrain):.2f}')

Classes2, _ = original_classifier.use(Xtest)
print(f'% Correct  Test {percent_correct(Classes2, Ttest):.2f}')

Epoch 200: RMSE 0.108
Epoch 400: RMSE 0.021
Epoch 600: RMSE 0.005
Epoch 800: RMSE 0.002
Epoch 1000: RMSE 0.001
Epoch 1200: RMSE 0.001
Epoch 1400: RMSE 0.000
Epoch 1600: RMSE 0.000
Epoch 1800: RMSE 0.000
Epoch 2000: RMSE 0.000
% Correct  Train 100.00
% Correct  Test 97.88


# Experiment 4 with Reduced Data

In [29]:
n_in = Xtrain_reduced.shape[1]
reduced_classifier = nntorch.NeuralNetwork_Classifier(n_in, [200,200,200], 10, device='cpu')

n_epochs = 6000
reduced_classifier.train(Xtrain_reduced, Ttrain, n_epochs, 0.0001, method='adam', standardize='')



def percent_correct(Predicted, Target):
    return 100 * np.mean(Predicted == Target)

Classesred1, _ = reduced_classifier.use(Xtrain_reduced)
print(f'% Correct  Train {percent_correct(Classesred1, Ttrain):.2f}')

Classesred2, _ = reduced_classifier.use(Xtest_reduced)
print(f'% Correct  Test {percent_correct(Classesred2, Ttest):.2f}')

Epoch 600: RMSE 0.344
Epoch 1200: RMSE 0.283
Epoch 1800: RMSE 0.225
Epoch 2400: RMSE 0.150
Epoch 3000: RMSE 0.100
Epoch 3600: RMSE 0.070
Epoch 4200: RMSE 0.047
Epoch 4800: RMSE 0.030
Epoch 5400: RMSE 0.017
Epoch 6000: RMSE 0.009
% Correct  Train 99.92
% Correct  Test 98.18
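A single overall percentage hides which digits are hard. A per-class breakdown, sketched below with a hypothetical helper `percent_correct_per_class`, can help when discussing the results. The tiny arrays are toy values, not MNIST predictions. The `reshape(-1)` calls are a precaution in case the predictions arrive as a column vector, which would otherwise broadcast against a flat target array.

```python
import numpy as np

def percent_correct_per_class(Predicted, Target, n_classes=10):
    """Percent of samples of each true class that were predicted correctly."""
    Predicted = np.asarray(Predicted).reshape(-1)
    Target = np.asarray(Target).reshape(-1)
    return {c: 100 * np.mean(Predicted[Target == c] == c)
            for c in range(n_classes) if np.any(Target == c)}

# Toy labels for three classes
Target = np.array([0, 0, 1, 1, 1, 2])
Predicted = np.array([0, 1, 1, 1, 0, 2])
per_class = percent_correct_per_class(Predicted, Target, n_classes=3)
for c, pct in per_class.items():
    print(f'class {c}: {pct:.2f}% correct')
```

In the notebook this could be called as `percent_correct_per_class(Classesred2, Ttest)` to see which digits the reduced representation handles worst.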


# Results:

I ran four experiments: Experiments 1 and 2 on the original data, and Experiments 3 and 4 on the reduced data. Experiments 1 and 3 share one set of network parameters and structure, and Experiments 2 and 4 share another.

With hidden layers [200, 200], 2000 epochs, and a learning rate of 0.001 (Experiments 1 and 3), I achieved 100% training accuracy on the original data and 99.82% on the reduced data. On the test data I achieved 97.88% with the original data and 97.73% with the reduced data, so the original data worked slightly better than the reduced-dimensional data in terms of accuracy.

To increase the test accuracy for both types of network, in Experiments 2 and 4 I increased the number of epochs to 6000, reduced the learning rate to 0.0001, and extended the hidden layers to [200, 200, 200]. The training time increased greatly. Training accuracy remained 100% on the original data and increased to 99.92% on the reduced data. One surprising result is that the network trained on the reduced data (98.18% test accuracy) performed better than the network trained on the original data (97.97%).

It may seem surprising that the network trained on the original data did slightly worse on the test set. The key observation is that its RMSE reaches 0 only well after 60% of the epochs have completed, which leaves little error to reduce in the remaining epochs. This is not the case for the reduced data, whose RMSE is still decreasing at the end of training (0.009 at epoch 6000).

I also ran several other experiments, such as increasing the number of epochs to 10000 so that the reduced-data network reached 100% correct on the reduced training set. This did not raise its test accuracy above 98%, which is lower than the accuracy obtained with 6000 epochs.

From this I infer that, with long training, a small learning rate, and a more complex structure like the one in Experiment 4, the reduced data can match the original data in accuracy, but for small networks the reduced data is far less accurate than the original data. Among these experiments, the shared parameters of Experiments 1 and 3 are the best choice for getting closely matching accuracy from the two classifiers.
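The accuracies reported above can also be compared by the gap between training and test accuracy, which gives a rough indication of overfitting. The short calculation below uses only the numbers stated in these results. The experiment labels are shorthand, and the pairing of Experiment 2 with the original data and Experiment 3 with the reduced data follows the description at the start of this section.

```python
# (train %, test %) as reported in the results above
reported = {
    'Exp 1 original [200,200]': (100.00, 97.88),
    'Exp 2 original [200,200,200]': (100.00, 97.97),
    'Exp 3 reduced [200,200]': (99.82, 97.73),
    'Exp 4 reduced [200,200,200]': (99.92, 98.18),
}

# Difference between training and test accuracy, in percentage points
gaps = {name: round(train - test, 2) for name, (train, test) in reported.items()}
for name, gap in gaps.items():
    print(f'{name}: train-test gap {gap:.2f} points')
```

Experiment 4 has the smallest gap (1.74 points) and Experiment 1 the largest (2.12 points), which fits the observation that the reduced representation generalized slightly better in the longer run.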