Our autoencoder layers for our goofy exercise:

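```python
import random

import pandas as pd

# Fixed seed so the shuffle (and therefore the seating order) is reproducible
random.seed(42)

names = [
    "Adam P.",
    "Emily",
    "Heather",
    "Hunter",
    "Jon",
    "Michael",
    "Shobair",
    "Wyatt",
]
random.shuffle(names)

# Adam S. is always the input layer; six of the shuffled names fill layers 2-7
input_layer = ["Adam S."]
remaining_layers = names[:6]

layers = input_layer + remaining_layers
layer_type = ["Encoder"] * 3 + ["Code"] + ["Decoder"] * 3
# Word budget per layer: shrinks to the 3-word code, then grows back to 10
words_allowed = [10, 8, 6, 3, 6, 8, 10]

df = pd.DataFrame(
    {"Layer Type": layer_type, "Person": layers, "Words Allowed": words_allowed}
)
df.index += 1  # number layers from 1 instead of 0
df.index.name = "Layer Number"
df
```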

| Layer Number | Layer Type | Person | Words Allowed |
|---|---|---|---|
| 1 | Encoder | Adam S. | 10 |
| 2 | Encoder | Hunter | 8 |
| 3 | Encoder | Jon | 6 |
| 4 | Code | Shobair | 3 |
| 5 | Decoder | Wyatt | 6 |
| 6 | Decoder | Heather | 8 |
| 7 | Decoder | Michael | 10 |



Round 1:

```
* Input Phrase (Layer 1):
    * "Why fit in when you were born to stand out"
* Encoded Phrase (Layer 4):
    * "Fit Stand Out"
* Output Phrase (Layer 7):  
    * "Please fit a good stand out model very fast now"
```

Round 2:

```
* Input Phrase (Layer 1):
    * "A way to a man's heart is through his stomach"
* Encoded Phrase (Layer 4):
    * "man's heart stomach"
* Output Phrase (Layer 7):  
    * "the way to a man's heart is through his stomach"
```

The game:

* Adam S. will be our input layer; he will pass a 10-word phrase to the 2nd layer.
* The 2nd layer must summarize the 10-word phrase in 8 words, retaining the true meaning of the phrase as well as possible.
    * Repeat this process up to the code layer (layer 4).
* The code layer will pass their 3-word summary to layer 5, who must try to expand ("unsummarize") it into a 6-word phrase.
    * Repeat this process up to the last layer.
* The final layer will output a 10-word phrase, and we'll compare the output phrase to the input phrase.
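The rules fix each layer's word budget to the same 10-8-6-3-6-8-10 widths as the `Words Allowed` column. Below is a minimal sketch of a referee check. It assumes each layer's message is a plain string, that words are separated by whitespace, and that a layer must hit its budget exactly. `check_round` is a hypothetical helper and is not part of the original notebook:

```python
# Word budgets per layer, matching the "Words Allowed" column above.
WORDS_ALLOWED = [10, 8, 6, 3, 6, 8, 10]


def check_round(phrases, budgets=WORDS_ALLOWED):
    """Return (layer_number, word_count, allowed) for every layer that broke its budget.

    Hypothetical referee helper: expects one phrase per layer, in order,
    and counts words by splitting on whitespace.
    """
    if len(phrases) != len(budgets):
        raise ValueError(f"expected {len(budgets)} phrases, got {len(phrases)}")
    violations = []
    for layer_number, (phrase, allowed) in enumerate(zip(phrases, budgets), start=1):
        count = len(phrase.split())
        if count != allowed:
            violations.append((layer_number, count, allowed))
    return violations


# The "later iteration" from the toy example further down
round_phrases = [
    "the rain in spain falls very neatly on the plain",
    "rain in spain falls neatly on the plain",
    "rain in spain falls on plain",
    "rain spain plain",
    "rain spain falls on the plain",
    "rain in spain falls neatly on the plain",
    "rain in spain falls neatly on the great spain plain",
]
print(check_round(round_phrases))  # → []
```

An empty list means every (p)layer stayed within their budget. Any violation is reported together with its layer number, so the offending player is easy to identify.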

How to play with Slack:

* If you are a (p)layer, wait for the person before you to send you a phrase.
* Translate the phrase into your specified number of words, and send it as a direct message to the (p)layer after you.

# Autoencoders

<img src='https://miro.medium.com/max/3148/1*44eDEuZBEsmG_TCAKRI3Kw@2x.png' width=70%>

To demo what an autoencoder is doing, let's play a game of telephone that mirrors the encoding/decoding shown above.

Person 1 is the first layer (colored blue), Person 2 is the second layer (colored green), ..., and Person 7 is the last layer (colored blue).

## Encoding layer

Person 1 will be our input layer. Let's say our input data is a 10-word phrase (to match the number of blocks in the image). Person 1 tells Person 2 this 10-word phrase, and Person 2's job is to figure out how to compress it down to 8 words.

Person 2 then tells Person 3 this 8-word summary of the original phrase. Person 3 now has to summarize it in 6 words.

Person 3 passes this message to Person 4, who has to summarize it down to 3 words.

This 3-word summary is our encoded representation of the data. It is a reduced-dimension version of the data; the rest of the process (the decoder) tests whether it is a good summary.
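In neural-network terms, each person is a dense layer and their word budget is the layer's width. Here is a minimal encoder sketch in plain Python, with no framework so it runs anywhere. It assumes a tanh activation and random, untrained weights. The helper names `make_layer`, `dense`, and `encode` are illustrative and do not come from the original:

```python
import math
import random

random.seed(0)

# Person 1 -> Person 2 -> Person 3 -> Person 4 (the code layer)
ENCODER_WIDTHS = [10, 8, 6, 3]


def make_layer(n_in, n_out):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """tanh(W @ x + b) for a single input vector x."""
    weights, biases = layer
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


encoder = [
    make_layer(n_in, n_out)
    for n_in, n_out in zip(ENCODER_WIDTHS, ENCODER_WIDTHS[1:])
]


def encode(x):
    """Pass x through each encoder layer in turn, shrinking it to 3 numbers."""
    for layer in encoder:
        x = dense(x, layer)
    return x


x = [random.random() for _ in range(10)]  # numeric stand-in for a 10-word phrase
code = encode(x)
print(len(code))  # → 3
```

The 3 numbers in `code` play the role of Person 4's 3-word summary. Because the weights are untrained, the summary carries no useful meaning yet.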

## Decoding layer

Person 4 (the code layer) tells their 3-word summary to Person 5. Instead of summarizing, Person 5's job is to expand this summary into 6 words.

Person 5 passes this expanded summary to Person 6, who expands it to 8 words and passes it along to our final Person 7 (the output layer).

Person 7 expands the summary back to the original 10 words, and now we compare how close this output is to the input.  There's likely to be some loss of information.
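The decoder mirrors the encoder's widths in reverse. "How close the output is to the input" is usually measured with a reconstruction loss such as mean squared error. The sketch below stays in plain Python and assumes tanh hidden layers, a linear output layer, and untrained random weights. All names are illustrative:

```python
import math
import random

random.seed(0)

# Layer widths from the game: encoder 10 -> 8 -> 6 -> 3, decoder 3 -> 6 -> 8 -> 10
WIDTHS = [10, 8, 6, 3, 6, 8, 10]


def identity(v):
    return v


def make_layer(n_in, n_out):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=math.tanh):
    """activation(W @ x + b) for a single input vector x."""
    weights, biases = layer
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


layers = [make_layer(n_in, n_out) for n_in, n_out in zip(WIDTHS, WIDTHS[1:])]
encoder, decoder = layers[:3], layers[3:]


def autoencode(x):
    """Return (code, reconstruction) for one input vector."""
    for layer in encoder:
        x = dense(x, layer)
    code = x
    for i, layer in enumerate(decoder):
        # Linear output layer so the reconstruction is not squashed into (-1, 1)
        is_output = i == len(decoder) - 1
        x = dense(x, layer, activation=identity if is_output else math.tanh)
    return code, x


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


x = [random.random() for _ in range(10)]  # numeric stand-in for a 10-word input
code, x_hat = autoencode(x)
print(len(code), len(x_hat))  # → 3 10
print(mse(x, x_hat))  # reconstruction loss of this untrained network
```

A loss of 0 would mean Person 7 reproduced the input exactly. Any positive value is the "loss of information" described above.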

Our goal is to iterate on this game of telephone until Person 7 can recreate the input phrase as closely as possible. If our output is very similar to our input, then the code layer holds a very good summary of the input data. We can then use the lower-dimensional representation found in the code layer for visualization or supervised learning.
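"Iterating on the game" corresponds to training: adjusting the weights to shrink the reconstruction loss. Below is a minimal sketch under several simplifying assumptions. It uses a single linear hidden layer (10 → 3 → 10) instead of the full 10-8-6-3-6-8-10 stack, and it trains with plain stochastic gradient descent on mean squared error. The synthetic data secretly lives in 3 dimensions, so a 3-number code can in principle capture it. The data, learning rate, and epoch count are all illustrative choices:

```python
import random

random.seed(1)
N, K = 10, 3  # input width and code width
LR, EPOCHS = 0.01, 200

# Synthetic data: 10-dimensional points generated from 3 hidden factors
mixing = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(N)]
data = []
for _ in range(50):
    factors = [random.uniform(-1, 1) for _ in range(K)]
    data.append([sum(m * f for m, f in zip(row, factors)) for row in mixing])

# Encoder W_e (K x N) and decoder W_d (N x K); no biases or activations
W_e = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(K)]
W_d = [[random.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(N)]


def forward(x):
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W_e]  # code layer
    x_hat = [sum(w * zj for w, zj in zip(row, z)) for row in W_d]  # reconstruction
    return z, x_hat


def mean_loss():
    """Average per-sample mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = forward(x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x)) / N
    return total / len(data)


losses = [mean_loss()]
for epoch in range(EPOCHS):
    random.shuffle(data)
    for x in data:
        z, x_hat = forward(x)
        g = [2 * (a - b) / N for a, b in zip(x_hat, x)]  # dLoss/dx_hat
        # dLoss/dz, computed before W_d is updated
        g_z = [sum(W_d[i][j] * g[i] for i in range(N)) for j in range(K)]
        for i in range(N):
            for j in range(K):
                W_d[i][j] -= LR * g[i] * z[j]
        for j in range(K):
            for l in range(N):
                W_e[j][l] -= LR * g_z[j] * x[l]
    losses.append(mean_loss())

print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Each epoch is one more round of the game. A framework such as Keras or PyTorch would compute these gradients automatically, but writing them out shows that "getting better at telephone" just means nudging every weight in the direction that lowers the reconstruction error.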

## Toy Example

For example, an early iteration of this game of telephone might be very bad at recreating the input:

```
Layer 1 / Input layer:  "the rain in spain falls very neatly on the plain"
Layer 2:                "the in falls very neatly on the plain"
Layer 3:                "the in very neatly on the"
Layer 4 / Code layer:   "the on the"
Layer 5:                "puts the lotion on the skin"
Layer 6:                "it puts the lotion on the skin or"
Layer 7 / Output layer: "it puts the lotion on the skin or it gets"
```

Our input phrase reads like an ESL drill, and our output is pure Buffalo Bill. This indicates that the code layer is a bad summary of the data: it has still performed dimension reduction, but it doesn't contain useful information.

A later iteration will hopefully be better at learning the data.

```
Layer 1 / Input layer:  "the rain in spain falls very neatly on the plain"
Layer 2:                "rain in spain falls neatly on the plain"
Layer 3:                "rain in spain falls on plain"
Layer 4 / Code layer:   "rain spain plain"
Layer 5:                "rain spain falls on the plain"
Layer 6:                "rain in spain falls neatly on the plain"
Layer 7 / Output layer: "rain in spain falls neatly on the great spain plain"
```

We could now use the code layer (`"rain spain plain"`) as a lower-dimensional representation of our input data.
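The toy example judges the two iterations by eye. One way to put a number on the comparison is a bag-of-words score. The sketch below uses Jaccard similarity of the word sets, which ignores word order and repeated words. This is a stand-in for the numeric reconstruction loss a real autoencoder would use, and `word_overlap` is a hypothetical helper:

```python
def word_overlap(a, b):
    """Jaccard similarity of the word sets of two phrases (1.0 means identical sets)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


input_phrase = "the rain in spain falls very neatly on the plain"
early_output = "it puts the lotion on the skin or it gets"
later_output = "rain in spain falls neatly on the great spain plain"

print(round(word_overlap(input_phrase, early_output), 3))  # → 0.133
print(word_overlap(input_phrase, later_output))  # → 0.8
```

The early iteration shares only "the" and "on" with the input (2 of 15 distinct words). The later iteration shares 8 of 10, matching the intuition that its code layer is the better summary.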