# A7.1 Autoencoder for Classification


We have talked in lecture about how an autoencoder nonlinearly reduces the dimensionality of data. In this assignment you will
1. load an autoencoder network already trained on the MNIST data,
2. apply it to the MNIST training set to obtain the outputs of the units in the bottleneck layer as a new, much lower-dimensional representation of each training set image,
3. train a fully-connected classification network on this new representation, and
4. report the percentage of training and testing images correctly classified, and compare with the accuracy you get with the original images.

Download [nn_torch.zip](https://www.cs.colostate.edu/~anderson/cs445/notebooks/nn_torch.zip) and extract the files.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas
import pickle
import gzip
import torch
import neuralnetworks_torch as nntorch  # extracted from nn_torch.zip
```

First, let's load the MNIST data. You may download it here: [mnist.pkl.gz](http://deeplearning.net/data/mnist/mnist.pkl.gz).

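```python
# The pickle holds (train, validation, test) tuples of (images, labels).
# Each image is a flattened 28x28 array of 784 pixels; valid_set is not used here.
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Xtrain = train_set[0]
Ttrain = train_set[1]

Xtest = test_set[0]
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
```

```
((50000, 784), (50000,), (10000, 784), (10000,))
```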

To load the network saved in Lecture Notes 21, run the following code. The saved torch neural network was trained on a GPU; this code loads the state of that net (its weights) into a new net of the same structure allocated on the CPU.

First download [mnist_autoencoder.pt](https://www.cs.colostate.edu/~anderson/cs445/notebooks/mnist_autoencoder.pt).

```python
n_in = Xtrain.shape[1]
# Symmetric encoder/decoder; the 20-unit layer in the middle is the bottleneck.
n_hiddens_per_layer = [500, 100, 50, 50, 20, 50, 50, 100, 500]
nnet_autoencoder = nntorch.NeuralNetwork(n_in, n_hiddens_per_layer, n_in, device='cpu')
nnet_autoencoder.standardize = ''

# map_location remaps tensors that were saved from the GPU onto the CPU.
nnet_autoencoder.load_state_dict(torch.load('mnist_autoencoder.pt', map_location=torch.device('cpu')))
```

```
<All keys matched successfully>
```
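The cell above works on a CPU-only machine because `torch.load(..., map_location=...)` remaps tensors that were saved from GPU memory. The sketch below reproduces that save-then-restore round trip with a tiny stand-in network and an in-memory buffer instead of `mnist_autoencoder.pt`; `make_net` is a hypothetical helper, not part of `neuralnetworks_torch`. Since the original cell passes the loaded object straight to `load_state_dict`, the file evidently holds a state dict, which is what the sketch saves.

```python
import io

import torch
import torch.nn as nn


def make_net() -> nn.Module:
    # Hypothetical tiny stand-in; the real autoencoder comes from neuralnetworks_torch.
    return nn.Sequential(nn.Linear(784, 20), nn.Tanh(), nn.Linear(20, 784))


torch.manual_seed(0)
trained = make_net()  # imagine this one had been trained on a GPU

# Save only the state dict (the parameter tensors), not the whole module.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# map_location moves every saved tensor onto the CPU while loading,
# so this also works on a machine without CUDA.
state = torch.load(buffer, map_location=torch.device('cpu'))

restored = make_net()  # same architecture, freshly allocated on the CPU
result = restored.load_state_dict(state)
print(result)  # <All keys matched successfully>

# The restored weights are identical to the saved ones.
same = all(torch.equal(a, b) for a, b in zip(trained.state_dict().values(),
                                             restored.state_dict().values()))
print(same)
```

The network structure must match exactly; otherwise `load_state_dict` reports missing or unexpected keys, which is why the cell above rebuilds the net with the same `n_hiddens_per_layer` before loading.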

To get the outputs of the units in the middle hidden layer, call the `use_to_middle` method implemented for you in `neuralnetworks_torch`.

```python
Xtrain_reduced = nnet_autoencoder.use_to_middle(Xtrain)
Xtrain_reduced.shape
```

```
(50000, 20)
```

And while we are here, let's get the reduced representation of `Xtest` also.

```python
Xtest_reduced = nnet_autoencoder.use_to_middle(Xtest)
Xtest_reduced.shape
```

```
(10000, 20)
```
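Conceptually, `use_to_middle` runs each image through the encoder half of the autoencoder and stops at the 20-unit bottleneck. The plain-PyTorch sketch below illustrates that idea under stated assumptions: it builds a hypothetical stand-in network from `nn.Linear` layers with tanh activations (the actual layer types and activations inside `neuralnetworks_torch` may differ), and `build_autoencoder` and `forward_to_middle` are illustrative names, not library functions.

```python
import numpy as np
import torch
import torch.nn as nn


def build_autoencoder(n_in, n_hiddens_per_layer):
    # Hypothetical stand-in: Linear + Tanh per hidden layer, then a linear output layer.
    layers, n_prev = [], n_in
    for n_units in n_hiddens_per_layer:
        layers += [nn.Linear(n_prev, n_units), nn.Tanh()]
        n_prev = n_units
    layers.append(nn.Linear(n_prev, n_in))
    return nn.Sequential(*layers)


def forward_to_middle(net, X, n_hiddens_per_layer):
    # Index of the middle hidden layer; 4 (the 20-unit layer) for the list below.
    middle = len(n_hiddens_per_layer) // 2
    # Each hidden layer contributes two modules (Linear, Tanh), so keep the first
    # 2 * (middle + 1) modules: everything up to and including the bottleneck's tanh.
    encoder = net[: 2 * (middle + 1)]
    with torch.no_grad():
        return encoder(torch.as_tensor(X, dtype=torch.float32)).numpy()


n_hiddens_per_layer = [500, 100, 50, 50, 20, 50, 50, 100, 500]
torch.manual_seed(0)
net = build_autoencoder(784, n_hiddens_per_layer)  # untrained, for shapes only
X = np.random.rand(8, 784).astype(np.float32)      # 8 fake "images"
Z = forward_to_middle(net, X, n_hiddens_per_layer)
print(Z.shape)  # (8, 20)
```

Each 784-pixel image becomes 20 numbers, roughly a 39-fold reduction in the number of input features a classifier has to handle.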

## Requirement

Your job is now to
1. train one fully-connected classifier using `Xtrain_reduced` and `Ttrain` and test it with `Xtest_reduced` and `Ttest`, and
2. train a second fully-connected classifier using `Xtrain` and `Ttrain` and test it with `Xtest` and `Ttest`.

Try to find parameters (hidden network structure, number of epochs, and learning rate) for which the classifier given the reduced representation does almost as well as the other classifier with the original data. Discuss your results.
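For readers who want to see what such a classifier involves without `neuralnetworks_torch`, here is a minimal plain-PyTorch sketch of a fully-connected network trained with Adam and a cross-entropy loss. It is not necessarily how `neuralnetworks_torch` is implemented; `train_classifier` and `predict` are hypothetical names, tanh hidden units and full-batch updates are assumptions, and synthetic 20-dimensional clusters stand in for `Xtrain_reduced` and `Ttrain`.

```python
import numpy as np
import torch
import torch.nn as nn


def train_classifier(X, T, n_hiddens, n_classes, n_epochs, learning_rate):
    # Fully-connected net: Linear + Tanh per hidden layer, linear output of class scores.
    layers, n_prev = [], X.shape[1]
    for n_units in n_hiddens:
        layers += [nn.Linear(n_prev, n_units), nn.Tanh()]
        n_prev = n_units
    layers.append(nn.Linear(n_prev, n_classes))
    net = nn.Sequential(*layers)

    Xt = torch.as_tensor(X, dtype=torch.float32)
    Tt = torch.as_tensor(T, dtype=torch.long)
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the scores internally
    for _ in range(n_epochs):  # one full-batch update per epoch
        optimizer.zero_grad()
        loss = loss_fn(net(Xt), Tt)
        loss.backward()
        optimizer.step()
    return net


def predict(net, X):
    with torch.no_grad():
        scores = net(torch.as_tensor(X, dtype=torch.float32))
    return scores.argmax(dim=1).numpy()  # 1-D array of predicted class labels


# Synthetic stand-in for the reduced data: 3 well-separated clusters in 20-D.
rng = np.random.default_rng(0)
centers = rng.normal(0, 3, size=(3, 20))
T = np.repeat(np.arange(3), 100)
X = centers[T] + rng.normal(0, 0.5, size=(300, 20))

torch.manual_seed(0)
net = train_classifier(X, T, [50], n_classes=3, n_epochs=100, learning_rate=0.01)
accuracy = 100 * np.mean(predict(net, X) == T)
print(f'% Correct {accuracy:.2f}')
```

The hidden-layer list, number of epochs, and learning rate are the same three knobs the requirement asks you to tune.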

Here is an example for part of Step 1. It shows a brief training session (a small number of epochs and a simple hidden-layer structure) using the reduced data.

```python
n_in = Xtrain_reduced.shape[1]  # 20 bottleneck features per image
# One hidden layer of 50 units; 10 output classes (digits 0-9).
reduced_classifier = nntorch.NeuralNetwork_Classifier(n_in, [50], 10, device='cpu')

n_epochs = 50
# Learning rate 0.01 with the Adam optimizer.
reduced_classifier.train(Xtrain_reduced, Ttrain, n_epochs, 0.01, method='adam', standardize='')

Classes, _ = reduced_classifier.use(Xtest_reduced)

def percent_correct(Predicted, Target):
    return 100 * np.mean(Predicted == Target)

print(f'% Correct  Ttest {percent_correct(Classes, Ttest):.2f}')
```

```
Epoch 5: RMSE 2.138
Epoch 10: RMSE 1.876
Epoch 15: RMSE 1.555
Epoch 20: RMSE 1.223
Epoch 25: RMSE 0.945
Epoch 30: RMSE 0.746
Epoch 35: RMSE 0.612
Epoch 40: RMSE 0.523
Epoch 45: RMSE 0.463
Epoch 50: RMSE 0.422
% Correct  Ttest 88.82
```
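`percent_correct` relies on `Predicted` and `Target` having the same shape. If a classifier returned its predictions as a column of shape (N, 1) while `Ttest` has shape (N,), NumPy broadcasting would compare every prediction with every target and the percentage would be meaningless. A defensive variant, assuming nothing about which shape the classifier returns, flattens both arrays first:

```python
import numpy as np


def percent_correct(Predicted, Target):
    # Flatten both so (N, 1) and (N,) arrays are compared element by element.
    Predicted = np.asarray(Predicted).reshape(-1)
    Target = np.asarray(Target).reshape(-1)
    if Predicted.shape != Target.shape:
        raise ValueError(f'{Predicted.size} predictions but {Target.size} targets')
    return 100 * np.mean(Predicted == Target)


column = np.array([[1], [2], [3], [4]])  # predictions as an (N, 1) column
targets = np.array([1, 2, 0, 4])         # targets as an (N,) vector
print(percent_correct(column, targets))  # 75.0
print(100 * np.mean(column == targets))  # 18.75: naive version broadcasts to (4, 4)
```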


```python
# Reduced-data experiment 1: three hidden layers of 50 units instead of one.
n_in = Xtrain_reduced.shape[1]
reduced_classifier = nntorch.NeuralNetwork_Classifier(n_in, [50, 50, 50], 10, device='cpu')

n_epochs = 50
reduced_classifier.train(Xtrain_reduced, Ttrain, n_epochs, 0.01, method='adam', standardize='')

Classes, _ = reduced_classifier.use(Xtest_reduced)

# percent_correct was defined in the previous cell.
print(f'% Correct  Ttest {percent_correct(Classes, Ttest):.2f}')
```

```
Epoch 5: RMSE 1.989
Epoch 10: RMSE 1.202
Epoch 15: RMSE 0.768
Epoch 20: RMSE 0.596
Epoch 25: RMSE 0.496
Epoch 30: RMSE 0.432
Epoch 35: RMSE 0.393
Epoch 40: RMSE 0.362
Epoch 45: RMSE 0.338
Epoch 50: RMSE 0.321
% Correct  Ttest 90.88
```


```python
# Reduced-data experiment 2: three hidden layers of 100 units, trained for 100 epochs.
n_in = Xtrain_reduced.shape[1]
reduced_classifier = nntorch.NeuralNetwork_Classifier(n_in, [100, 100, 100], 10, device='cpu')

n_epochs = 100
reduced_classifier.train(Xtrain_reduced, Ttrain, n_epochs, 0.01, method='adam', standardize='')

Classes, _ = reduced_classifier.use(Xtest_reduced)

print(f'% Correct  Ttest {percent_correct(Classes, Ttest):.2f}')
```

```
Epoch 10: RMSE 0.643
Epoch 20: RMSE 0.378
Epoch 30: RMSE 0.316
Epoch 40: RMSE 0.274
Epoch 50: RMSE 0.244
Epoch 60: RMSE 0.221
Epoch 70: RMSE 0.200
Epoch 80: RMSE 0.178
Epoch 90: RMSE 0.156
Epoch 100: RMSE 0.136
% Correct  Ttest 95.71
```


```python
# Part 2: original 784-pixel images, two hidden layers of 50 units, 100 epochs.
n_in = Xtrain.shape[1]
original_classifier = nntorch.NeuralNetwork_Classifier(n_in, [50, 50], 10, device='cpu')

n_epochs = 100
original_classifier.train(Xtrain, Ttrain, n_epochs, 0.01, method='adam', standardize='')

Classes, _ = original_classifier.use(Xtest)

print(f'% Correct  Ttest {percent_correct(Classes, Ttest):.2f}')
```

```
Epoch 10: RMSE 0.637
Epoch 20: RMSE 0.327
Epoch 30: RMSE 0.246
Epoch 40: RMSE 0.196
Epoch 50: RMSE 0.162
Epoch 60: RMSE 0.136
Epoch 70: RMSE 0.115
Epoch 80: RMSE 0.098
Epoch 90: RMSE 0.084
Epoch 100: RMSE 0.072
% Correct  Ttest 96.11
```


## Discussion of Results

To come close to the accuracy of the classifier trained on the original images, the classifier trained on the reduced data needed a considerably larger hidden-layer structure. With the reduced data, a network with hidden layers [100, 100, 100], trained for 100 epochs with a learning rate of 0.01, correctly classified 95.71% of the test images. With the original data, a network with hidden layers [50, 50], trained for the same 100 epochs with the same learning rate, correctly classified 96.11%. Along the way, the reduced-data accuracy rose from 88.82% ([50], 50 epochs) to 90.88% ([50, 50, 50], 50 epochs) to 95.71% ([100, 100, 100], 100 epochs). These results suggest that the 20-dimensional bottleneck representation retains most of the information needed for classification, but that a classifier using it needs more resources, such as more and wider hidden layers or more training epochs, to come within about 0.4 percentage points of a classifier using the original 784 pixels.
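One way to make "network size" concrete is to count trainable parameters. Assuming each layer is fully connected with one bias per unit (as in `torch.nn.Linear`; the internals of `neuralnetworks_torch` are not shown here), the sketch below counts parameters for the two classifiers compared above and for the encoder half of the autoencoder that produces the 20-dimensional input. `n_weights` is a hypothetical helper.

```python
def n_weights(n_in, n_hiddens, n_out):
    # Each fully-connected layer has (inputs + 1 bias) * units parameters.
    sizes = [n_in] + list(n_hiddens) + [n_out]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))


reduced_clf = n_weights(20, [100, 100, 100], 10)  # classifier on bottleneck features
original_clf = n_weights(784, [50, 50], 10)       # classifier on raw pixels
# Encoder half of the autoencoder: 784 -> 500 -> 100 -> 50 -> 50 -> 20.
encoder = n_weights(784, [500, 100, 50, 50], 20)

print(reduced_clf, original_clf, encoder)  # 23310 42310 451220
```

Under these assumptions the [100, 100, 100] classifier has roughly half the parameters of the [50, 50] classifier, because its first layer sees only 20 inputs, so "larger" here means more and wider hidden layers rather than more weights in total. Whether the frozen encoder should also count toward the reduced pipeline's cost depends on which resource you care about.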

## Extra Credit

For 1 point of extra credit, repeat this assignment using a second data set, one that we have not used in class before. This will require you to train a new autoencoder net to use for this part.
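As a possible starting point, here is a minimal plain-PyTorch sketch of training an autoencoder with a 20-unit bottleneck and then extracting the bottleneck activations. It does not use `neuralnetworks_torch`; it assumes the new data set is already a float array of shape (N, D) scaled to [0, 1], random 64-dimensional data stands in for it, and the layer sizes, learning rate, and epoch count are illustrative choices only.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)
X = rng.random((500, 64)).astype(np.float32)  # placeholder for the new data set
Xt = torch.from_numpy(X)
n_in, n_bottleneck = X.shape[1], 20

# Encoder and decoder kept separate so the bottleneck is easy to read out.
encoder = nn.Sequential(nn.Linear(n_in, 50), nn.Tanh(),
                        nn.Linear(50, n_bottleneck), nn.Tanh())
decoder = nn.Sequential(nn.Linear(n_bottleneck, 50), nn.Tanh(),
                        nn.Linear(50, n_in))
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.01)
loss_fn = nn.MSELoss()  # reconstruction error: the output should match the input
losses = []
for epoch in range(200):  # full-batch training
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(Xt), Xt)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    X_reduced = encoder(Xt).numpy()  # new representation to feed a classifier

print(f'MSE {losses[0]:.3f} -> {losses[-1]:.3f}; reduced shape {X_reduced.shape}')
```

`X_reduced` then plays the role that `Xtrain_reduced` plays for MNIST; for a fair comparison you would compute it separately for the training and testing partitions of the new data set.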