<a target="_blank" href="https://colab.research.google.com/github/ArtificialIntelligenceToolkit/aitk/blob/master/notebooks/Advanced/EncoderNetwork.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autoencoders

An autoencoder is a network that can be used to learn compressed representations of data. You start by creating a data set where the input data is duplicated as the target data. Thus the goal of the network is simply to reproduce the current input on the output layer. However, the network's hidden layer is smaller than the input, forcing the network to compress the data in the process of reproducing it.

* The front end of the network (input to hidden) is called the **encoder**. It will compress the data.
* The back end of the network (hidden to output) is called the **decoder**. It will decompress the data.
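
The two halves can be sketched in plain NumPy. This is a minimal illustration, not how aitk builds its networks: the layer sizes (8 inputs, 3 hidden units), the random untrained weights, the sigmoid activations, and the names `encode` and `decode` are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights for an 8 -> 3 -> 8 autoencoder.
W_enc = rng.normal(size=(8, 3))  # input-to-hidden weights (encoder)
b_enc = np.zeros(3)
W_dec = rng.normal(size=(3, 8))  # hidden-to-output weights (decoder)
b_dec = np.zeros(8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """Compress an 8-value input into 3 hidden values."""
    return sigmoid(x @ W_enc + b_enc)

def decode(h):
    """Expand 3 hidden values back into 8 output values."""
    return sigmoid(h @ W_dec + b_dec)

x = np.array([0, 0, 1, 0, 0, 0, 0, 0])
code = encode(x)
reconstruction = decode(code)
print(code.shape, reconstruction.shape)  # (3,) (8,)
```

With untrained weights the reconstruction will not resemble the input; training is what makes the 3 hidden values carry enough information to recover all 8 inputs.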


In [1]:
%pip install aitk --quiet

In [2]:
from aitk.networks import SimpleNetwork
from time import sleep
import numpy as np

## Create a data set

We will use a simple data set of all one-hot vectors of length 8.

In [3]:
def one_hot(n):
    data = []
    for i in range(n):
        pattern = [0]*n
        pattern[i] = 1
        data.append(pattern)
    return np.array(data)

In [4]:
patterns = one_hot(8)

In [5]:
patterns

array([[1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 1]])
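
As a quick cross-check, the same set of patterns is the 8x8 identity matrix, which NumPy can build directly with `np.eye`. The variable name `identity_patterns` is only for illustration.

```python
import numpy as np

# Each row of the identity matrix is a one-hot vector.
identity_patterns = np.eye(8, dtype=int)
print(identity_patterns.shape)  # (8, 8)
```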

## Create the network

The network will take inputs of size 8, pass them through a hidden layer of size 3, and attempt to reproduce them at the output layer, also of size 8. Because these are one-hot vectors, we know that only one of the outputs should be on, so we will use the **softmax** activation function combined with the **categorical crossentropy** loss function.
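
To see why this pairing suits one-hot targets, here is a small NumPy sketch of both functions. It is a textbook formulation under assumed, made-up output scores, not the code the library runs internally.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into positive values that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def categorical_crossentropy(target, output, eps=1e-12):
    """Small when the output puts high probability on the target's 'on' unit."""
    return -np.sum(target * np.log(output + eps))

target = np.array([0, 0, 1, 0, 0, 0, 0, 0])
scores = np.array([0.1, -0.3, 4.0, 0.2, -1.0, 0.0, 0.5, -0.2])  # made-up scores

output = softmax(scores)
print(round(float(output.sum()), 6))  # 1.0
print(int(np.argmax(output)))         # 2
print(categorical_crossentropy(target, output))
```

Because softmax outputs always sum to 1, raising the probability of the correct unit necessarily lowers the others, which matches the "only one output on" structure of the data.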

In [6]:
net = SimpleNetwork(
    8,
    (3, "sigmoid"),
    (8, "softmax"),
    loss="categorical_crossentropy",
)

## Train the network

Notice that in the **fit** command below, both the inputs and the targets are the same patterns.

We do not expect the network to reproduce each input exactly on the output layer. The **tolerance** indicates that any output value within 0.15 of the corresponding target value will be considered correct. With **accuracy** set to 1.0, training stops early once that accuracy goal is reached.
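
One plausible reading of this criterion can be sketched as follows. This is an assumption about how such a measure might work, not aitk's actual implementation, and the helper name `fraction_within_tolerance` is hypothetical.

```python
import numpy as np

def fraction_within_tolerance(targets, outputs, tolerance):
    """Fraction of patterns whose every output is within `tolerance` of its target."""
    within = np.abs(np.asarray(targets) - np.asarray(outputs)) <= tolerance
    return float(np.mean(np.all(within, axis=1)))

targets = np.array([[1, 0, 0], [0, 1, 0]])
outputs = np.array([[0.90, 0.05, 0.05],   # every value within 0.15 of its target
                    [0.30, 0.60, 0.10]])  # 0.30 and 0.60 miss by more than 0.15
print(fraction_within_tolerance(targets, outputs, tolerance=0.15))  # 0.5
```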

In [7]:
history = net.fit(
    patterns,
    patterns,
    batch_size=8,
    epochs=1000,
    accuracy=1.0,
    tolerance=0.15,
    patience=20,
    report_rate=50,
)

Stopped because accuracy beat goal of 1.0
Epoch 694/1000 loss: 0.06934851408004761 - tolerance_accuracy: 1.0


## Test the network

Let's visualize the network's activations. Notice that the network is successfully reproducing the input pattern on the output layer.

In order to accomplish this, the network had to learn a unique compressed representation for each of the input patterns.

In [8]:
for i in range(len(patterns)):
    net.display(patterns[i])
    sleep(1.0)

Test the network again (by re-running the cell above), and this time focus on the hidden layer representations.

Our data consists of 8 patterns, each with one "on" bit. One possible compressed three-bit representation would be similar to binary:

* 000
* 100
* 010
* 001
* 110
* 101
* 011
* 111
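
Three binary values give exactly 2**3 = 8 combinations, which is why a hidden layer of size 3 is just large enough for 8 patterns. A short sketch that enumerates them (in counting order rather than the order listed above):

```python
from itertools import product

# All combinations of three bits, written as strings such as "010".
codes = ["".join(map(str, bits)) for bits in product([0, 1], repeat=3)]
print(len(codes))  # 8
print(codes)
```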

It's hard to keep track of all of the hidden representations in this animation. Let's print them out instead.

In [9]:
hidden_activations = net.predict_to(patterns, "hidden")
for row in hidden_activations:
    print(" ".join(f"{f:.2f}" for f in row))

0.03 0.05 0.02
0.02 0.97 0.94
0.96 0.03 0.80
0.76 0.93 0.30
0.97 0.90 0.98
0.05 0.04 0.96
0.97 0.14 0.02
0.06 0.98 0.02


Now let's simplify them even more by rounding them up to 1 if they are greater than 0.6 and down to 0 if they are less than 0.4. We'll put a ? in for numbers that are in between.

In [10]:
for i in range(len(patterns)):
    print([1 if x > 0.6 else (0 if x < 0.4 else '?') for x in hidden_activations[i]])

[0, 0, 0]
[0, 1, 1]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]
[0, 0, 1]
[1, 0, 0]
[0, 1, 0]


Did the network learn a binary-like compression scheme? The order of the patterns is not important. We are just interested in whether all of the 3-bit binary patterns are represented.

NOTE: Any representation that lets the network distinguish the 8 different patterns using its 3 hidden values is sufficient. However, the network does tend to find hidden representations that are similar to a binary coding.
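
Rather than comparing the rows by eye, you can check distinctness programmatically. The sketch below copies the hidden activations from the printed output above (your own run will produce different numbers; in the notebook you would use `hidden_activations` directly) and, as a simplification, uses a single 0.5 threshold instead of the 0.4/0.6 band.

```python
import numpy as np

# Hidden activations copied from the printed output above; your run will differ.
hidden = np.array([
    [0.03, 0.05, 0.02],
    [0.02, 0.97, 0.94],
    [0.96, 0.03, 0.80],
    [0.76, 0.93, 0.30],
    [0.97, 0.90, 0.98],
    [0.05, 0.04, 0.96],
    [0.97, 0.14, 0.02],
    [0.06, 0.98, 0.02],
])

# Threshold each value at 0.5 and collect the resulting 3-bit codes in a set.
codes = {tuple(int(v) for v in row) for row in (hidden > 0.5).astype(int)}
print(len(codes) == 8)  # True
```

A set drops duplicates, so finding 8 distinct codes means every input pattern received its own 3-bit representation.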

If your particular experiment did **not** reproduce the 3-bit binary patterns, you can try again by going back to the "Create the network" section and re-running all of the cells from there.