# (E6) Autoencoders
In this exercise, you will be given an example of [autoencoders](https://en.wikipedia.org/wiki/Autoencoder). 
You should be able to replicate the results given here if you have completed (E2)-(E5) correctly.

It would be best to have a Python IDE (integrated development environment) such as [PyCharm](https://www.jetbrains.com/pycharm/), together with [Anaconda](https://www.anaconda.com), installed, because they will make your life easier. If not, you may want to work on the assignment using Google Colab. In either case, what you need to do is 1) fill in the blanks in the .py files; and 2) import the files that you have completed (e.g., layer.py, optim.py, model.py, etc.). Here are some scenarios for how you could go about the assignment:

#### Without Google Colab: Python IDE + Anaconda 
If you have a Python IDE and Anaconda installed, you can do one of the following:
- Edit the .py files in the IDE. Then simply open the .ipynb file in the IDE as well, where you can edit and run code.
- Your IDE might not support running .ipynb files. However, since you have installed Anaconda, you can just open this notebook using Jupyter Notebook.

In both of these cases, you can simply import .py files in this .ipynb file:
```python
from model import NeuralNetwork
```
 
#### With Google Colab
- Google Colab has an embedded code editor, so you can simply upload all .py files to Google Colab and edit them there. Once the files are uploaded, double-click the file that you want to edit. Please **make sure that you download up-to-date copies of the files frequently**; otherwise, if the Colab runtime restarts, all your files may be lost.
- If that feels cumbersome, you could instead use an online Python editor to complete the .py files (e.g., see [repl.it](https://repl.it/languages/python3)). You can also edit the files in a plain text editor, but without Python syntax checking you will be more prone to mistakes. Once you are done editing, you can either upload the files to Colab or follow the instructions below.
 
- If you have *git clone*d the assignment repository to a directory in your Google Drive (or you have the files stored in the Drive anyway), you can do the following:
```jupyterpython
from google.colab import drive
drive.mount('/content/drive/')          # this will direct you to a link where you can get an authorization key
import sys
sys.path.append('/content/drive/My Drive/your-directory-where-the-python-files-exist')
```
Then you are good to go. When you change a .py file, make sure it is synced to the Drive, then re-run the lines above and re-import the module (restarting the runtime if the old version is still being used) to get access to the latest version of the file. Note that you must pass the correct path to *sys.path.append*.
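
Note that Python caches imported modules, so re-running an `import` statement does not pick up edits by itself. A minimal sketch of reloading without restarting, using only the standard-library `importlib`, might look like the following; the helper name `fresh_import` and its arguments are hypothetical and not part of the assignment code.

```python
import importlib
import sys

def fresh_import(module_name, search_dir=None):
    """Import `module_name`, reloading it if this session already imported it.

    `search_dir` is an optional directory to add to sys.path first
    (for example, the Drive folder that holds the .py files).
    """
    if search_dir is not None and search_dir not in sys.path:
        sys.path.append(search_dir)
    if module_name in sys.modules:
        # Re-execute the module's source so edits on disk take effect
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)

# Demonstrated on a standard-library module so the snippet runs anywhere;
# in the notebook one would pass e.g. "model" instead.
json_module = fresh_import("json")
print(json_module.dumps({"reloaded": True}))
```

Names bound with `from model import NeuralNetwork` still point at the old class after a reload, so that import line would need to be run again as well.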

Now, let's get started!
## Autoencoder
### Input and Target
An autoencoder learns latent embeddings of its inputs in an unsupervised way: we do not need separate target values associated with the inputs, because the input data themselves act as the targets. 

To see this more concretely, let's look at the code below, which prepares the data for training an autoencoder. 

In [1]:
from model import NeuralNetwork

In [2]:
import numpy as np
def generate_data(num=8):
    """ Generate 'num' number of one-hot encoded integers. """ 
    x_train = np.eye(num)[np.arange(num)]                       # This is a simple way to one-hot encode integers
    
    # Repeat x_train multiple times for training
    x_train = np.repeat(x_train, 100, axis=0)
    
    # The target is x_train itself!
    x_target = x_train.copy()
    return x_train, x_target    

Clearly, *x_target* is the same as *x_train*. What we want to do is encode the 8-dimensional one-hot inputs using 3 hidden nodes, whose activations are then decoded back to the original 8-dimensional input by the decoder. Three hidden nodes can suffice in principle because three near-binary activations can distinguish 2^3 = 8 inputs. Learning an autoencoder therefore means training both the encoder weight and the decoder weight. In our example, since we have 3 hidden nodes in a single layer, the encoder weight has shape *[8, 3]*, whereas the decoder weight has shape *[3, 8]*. 
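
To make the shapes concrete, here is a small NumPy-only sketch of one forward pass through an 8-3-8 network. The random weights and the variable names (`W_enc`, `b_enc`, `W_dec`, `b_dec`) are illustrative assumptions; they are not the course's `FCLayer` internals.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.eye(8)                         # the eight one-hot inputs, shape (8, 8)

# Illustrative random parameters with the shapes discussed above
W_enc = rng.normal(size=(8, 3))       # encoder weight
b_enc = np.zeros((1, 3))              # encoder bias
W_dec = rng.normal(size=(3, 8))       # decoder weight
b_dec = np.zeros((1, 8))              # decoder bias

hidden = 1.0 / (1.0 + np.exp(-(x @ W_enc + b_enc)))   # sigmoid activations
logits = hidden @ W_dec + b_dec                        # decoder output

print(hidden.shape, logits.shape)     # (8, 3) (8, 8)

# A one-hot input simply selects one row of the encoder weight
print(np.allclose(x @ W_enc, W_enc))  # True
```

Because each input is one-hot, row *i* of the encoder weight (plus the bias) fully determines the hidden code of integer *i*.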

### Training an Autoencoder
Now, let us train an autoencoder with the sigmoid activation function and the cross-entropy loss.
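
For reference, the sigmoid and its derivative can be sketched as below. The course's `utils` module presumably provides its own `sigmoid` and `sigmoid_prime`; the functions here are stand-ins with a `_sketch` suffix so that they do not shadow them, and the `tanh` formulation is just one of several numerically stable choices.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function 1 / (1 + exp(-z)), written via tanh to avoid overflow."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def sigmoid_prime_sketch(z):
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = sigmoid_sketch(z)
    return s * (1.0 - s)

print(sigmoid_sketch(0.0), sigmoid_prime_sketch(0.0))   # 0.5 0.25
```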

In [3]:
from model import NeuralNetwork
from layer import FCLayer
from activation import Activation
from utils import *
from loss import CrossEntropyLoss
from optim import SGD, Adam, RMSProp
# Load data
num = 8
np.random.seed(10)
x_train, x_target = generate_data(num=num)

In [4]:
# Define a model and add fully-connected and activation layers.
nn = NeuralNetwork()
nn.add(FCLayer(x_train.shape[1], 3, initialization='xavier', uniform=False))
nn.add(Activation(sigmoid, sigmoid_prime))
nn.add(FCLayer(3, x_train.shape[1], initialization='xavier', uniform=False))

In [5]:
nn.layers

[<layer.FCLayer at 0x12593f978>,
 <activation.Activation at 0x12593f9e8>,
 <layer.FCLayer at 0x12593fac8>]

## Assigning the Cross-Entropy Loss Function to the Model

In [6]:
# Define loss: note that CrossEntropyLoss applies softmax to the model output internally
loss = CrossEntropyLoss()
nn.set_loss(loss)
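
As a rough illustration of what "softmax internally" means, the sketch below computes a softmax cross-entropy directly from the raw outputs (logits). It is an assumption about the general technique, not the course's `CrossEntropyLoss`: the function name is hypothetical, and whether the real class averages or sums over the batch is not stated here (this sketch averages).

```python
import numpy as np

def softmax_cross_entropy_sketch(logits, targets):
    """Mean cross-entropy between softmax(logits) and one-hot targets."""
    # Subtract the row-wise max for numerical stability (log-sum-exp trick)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(targets * log_probs) / logits.shape[0]

# With all-zero logits the softmax is uniform over 8 classes,
# so the loss equals log(8)
logits = np.zeros((4, 8))
targets = np.eye(8)[:4]
print(np.isclose(softmax_cross_entropy_sketch(logits, targets), np.log(8)))  # True
```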

In [7]:
# Set up hyperparameters
lr = 0.001
epochs = 10
freq = epochs // 10
batch_size = 64

In [8]:
nn.parameters()[3].shape

(1, 8)

In [9]:
for p in nn.parameters():
    print(p.shape)


(8, 3)
(1, 3)
(3, 8)
(1, 8)


In [10]:
# Define optimizer and associate it with the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [11]:
# Training begins
inds = list(range(x_train.shape[0]))
N = x_train.shape[0]

loss_hist = []
for epoch in range(epochs):
    inds = np.random.permutation(inds)
    x_train = x_train[inds]
    x_target = x_target[inds]
    
    loss = 0
    for b in range(0, N, batch_size):
        # Get the mini-batch; b runs 0, 64, ..., 768, so the last batch has 32 rows
        x_batch = x_train[b: b+batch_size]
        x_target_batch = x_target[b: b+batch_size]
        
        # Feed forward: pred is the raw output of the final FCLayer
        pred = nn.predict(x_batch)
        
        # Accumulate the mini-batch loss (a scalar value, not the backpropagated delta)
        loss += nn.loss(pred, x_target_batch)/N
        
        # Back propagation of error
        nn.backward(pred, x_target_batch)
        
        # Update parameters
        nn.optimizer.step()
    
    # Record loss per epoch
    loss_hist.append(loss)
    
    if epoch % freq == 0:
        print()
        print("Epoch %d/%d\tloss=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)
        
        # Test with the training data
        pred = nn.predict(x_train, mode=False)
        l = nn.loss(pred, x_target)
        print("Test loss: {:.5f}".format(l), end='')

print("\nTraining finished!")
print("Print prediction results:")
x_test = np.eye(num)[np.arange(num)]                        # Test data (one-hot encoded)
np.set_printoptions(2)
for x in x_test:
    print("\tInput: {}\tOutput: {}".format(x, softmax(nn.predict(x[None, :], mode=False))))

loss:17.012284697811733
loss:17.025722381630224
loss:17.040211069076747
loss:17.045639830005275
loss:17.04049323911932
loss:17.045497038424994
loss:17.038706562398943
loss:17.040458330734317
loss:17.05661757723084
loss:17.07049025673983
loss:17.075437447779166
loss:17.076932306958767
loss:17.088827853052436

Epoch 1/10	loss=221.65732	loss:17.074350833208328
Test loss: 17.07435loss:17.065354047362618
loss:17.074928823662
loss:17.07697847495637
loss:17.090620449508307
loss:17.085634802623566
loss:17.10159987583182
loss:17.092164149464928
loss:17.111104286473825
loss:17.109469444735495
loss:17.117163397843377
loss:17.111190766118813
loss:17.092874711198952
loss:17.101628080137218

Epoch 2/10	loss=222.23071	loss:17.12048039408945
Test loss: 17.12048loss:17.12516757798049
loss:17.117202671050975
loss:17.117832003532094
loss:17.11869216526406
loss:17.133506691277514
loss:17.141539937482776
loss:17.127454538628786
loss:17.132588383859925
loss:17.16473966867975
loss:17.155871752471835
loss:17.

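If you look at the output values of the network and each output row puts nearly all of its probability on the position of the 1 in the corresponding input, then the autoencoder has been trained successfully to encode and decode the eight one-hot encoded integers.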

## (E7) Your Turn:  Explain the autoencoder
Given the trained model that can encode the 0-7 integers, explain how the NN model learned to encode/compress the numbers. Rather than just stating your reasoning in words, do explore the model closely to see what it has learned. 
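
One way to start exploring is to feed each one-hot input through the encoder alone and look at the three hidden activations. The sketch below assumes the encoder weight and bias are the first two entries of `nn.parameters()`, which is consistent with the shapes printed earlier but is still an assumption. The hand-made weights are purely illustrative stand-ins, not values that a trained model would necessarily produce.

```python
import numpy as np

def hidden_codes(W_enc, b_enc, num=8):
    """Return the sigmoid hidden activations for each of the `num` one-hot inputs."""
    x = np.eye(num)
    z = x @ W_enc + b_enc
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative stand-in weights (NOT trained values): row i pushes the three
# hidden units towards the 3-bit binary representation of the integer i.
bits = np.array([[(i >> k) & 1 for k in (2, 1, 0)] for i in range(8)], dtype=float)
W_demo = 10.0 * (2.0 * bits - 1.0)
b_demo = np.zeros((1, 3))

codes = hidden_codes(W_demo, b_demo)
print(np.round(codes))

# With a trained model one might instead call (assumed parameter order):
# codes = hidden_codes(nn.parameters()[0], nn.parameters()[1])
```

If the rounded codes of a trained model are all distinct, the network has found some assignment of the eight inputs to separate corners of the hidden cube; it need not be the ordinary binary encoding used in this toy example.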

$$
\mathrm{loss}(x, y) = -\frac{1}{C} \sum_i \left[ y[i] \log\left(\frac{1}{1 + \exp(-x[i])}\right) + (1 - y[i]) \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right) \right]
$$
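
The formula above can be evaluated stably with `np.logaddexp`, using the identities log(1 / (1 + exp(-x))) = -logaddexp(0, -x) and log(exp(-x) / (1 + exp(-x))) = -logaddexp(0, x). The following is a minimal sketch for a batch of rows; the function name is hypothetical, and returning one loss value per row is a choice made here for illustration.

```python
import numpy as np

def sigmoid_bce_per_row(x, y):
    """Evaluate the formula above for each row of logits x and targets y.

    x, y: arrays of shape (n, C). Returns an array of shape (n,).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    log_sig = -np.logaddexp(0.0, -x)           # log(1 / (1 + exp(-x)))
    log_one_minus_sig = -np.logaddexp(0.0, x)  # log(exp(-x) / (1 + exp(-x)))
    # -1/C * sum_i [...] is the negative mean over the C outputs
    return -np.mean(y * log_sig + (1.0 - y) * log_one_minus_sig, axis=-1)

# With all-zero logits every sigmoid is 0.5, so each row's loss is log(2)
x_demo = np.zeros((2, 8))
y_demo = np.eye(8)[:2]
print(np.allclose(sigmoid_bce_per_row(x_demo, y_demo), np.log(2)))  # True
```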

In [12]:
0.34922997 +0.13353835-0.54969662+0.20205826-0.36256657+0.5011707-0.4363535+0.45998743

0.29736802

In [13]:
def softmax(U):
    # A numerically-stable implementation of the softmax function
    exp = np.exp(U - np.max(U, axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

In [14]:
pred.shape

(800, 8)

In [15]:
x_target_batch.shape

(32, 8)

In [16]:
np.log(softmax(pred)).shape

(800, 8)

In [17]:
np.logaddexp(0, pred).shape

(800, 8)

In [18]:
np.dot((1-x_target_batch).T, np.logaddexp(0, pred))

ValueError: shapes (8,32) and (800,8) not aligned: 32 (dim 1) != 800 (dim 0)

In [None]:
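# Element-wise version of the formula above (summed, not divided by C);
# x_target and pred both have shape (800, 8).
# No leading minus: logaddexp(0, -x) already equals -log(sigmoid(x)).
np.sum(x_target * np.logaddexp(0, -pred) + (1 - x_target) * np.logaddexp(0, pred))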
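
The `ValueError` above arises because `pred` holds predictions for all 800 training rows, while `x_target_batch` is the last mini-batch, which has only 32 rows. Even with matching shapes, a matrix product of the form `np.dot(y.T, L)` also sums cross terms between different output positions, whereas the loss formula multiplies matching entries only. A small sketch with random stand-in data (not model output) illustrates both points:

```python
import numpy as np

# Mini-batch sizes for N = 800 rows and batch_size = 64
N, batch_size = 800, 64
sizes = [min(batch_size, N - b) for b in range(0, N, batch_size)]
print(len(sizes), sizes[-1])   # 13 32

# Random stand-ins for the predictions and the one-hot targets
rng = np.random.default_rng(0)
pred_demo = rng.normal(size=(N, 8))
target_demo = np.eye(8)[rng.integers(0, 8, size=N)]

L = np.logaddexp(0, -pred_demo)           # -log(sigmoid(pred)), always positive
elementwise = np.sum(target_demo * L)     # matches entries position by position
M = np.dot(target_demo.T, L)              # (8, 8) matrix that includes cross terms

# Only the diagonal of M corresponds to the element-wise sum
print(np.isclose(np.trace(M), elementwise))   # True
print(np.sum(M) > elementwise)                # True
```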