<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

## *Data Science Unit 4 Sprint 3 Lesson 3*
# Autoencoders

__Problem:__ Is it possible to automatically represent an image as a fixed-sized vector even if it isn’t labeled?

__Solution:__ Use an autoencoder

Why would we want to represent an image as a fixed-size vector?

* __Information Retrieval__
    - [Reverse Image Search](https://en.wikipedia.org/wiki/Reverse_image_search)
    - [Recommendation Systems - Content Based Filtering](https://en.wikipedia.org/wiki/Recommender_system#Content-based_filtering)
* __Dimensionality Reduction__
    - [Feature Extraction](https://www.kaggle.com/c/vsb-power-line-fault-detection/discussion/78285)
    - [Manifold Learning](https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction)

We've already seen *representation learning* when we covered word embedding models during our NLP week. Today we're going to achieve a similar goal on images using *autoencoders*. An autoencoder is a neural network trained to copy its input to its output. Usually it is restricted in ways that allow it to copy only approximately. Because the model is forced to prioritize which aspects of the input to copy, it often learns useful properties of the data. These properties have made autoencoders an important part of modern generative modeling approaches. Consider autoencoders a special case of the feed-forward networks we've been studying; backpropagation and gradient descent still work.

## Learning Objectives
*At the end of the lecture you should be able to*:
* <a href="#p1">Part 1</a>: Describe the components of an autoencoder
* <a href="#p2">Part 2</a>: Train an autoencoder
* <a href="#p3">Part 3</a>: Apply an autoencoder to a basic information retrieval problem

<a id="p1"></a>

## Autoencoder Architecture

The *encoder* compresses the input data, and the *decoder* does the reverse, expanding the compressed version into a reconstruction of the input that is as accurate as possible:

<img src='https://miro.medium.com/max/1400/1*44eDEuZBEsmG_TCAKRI3Kw@2x.png' width=800/>

The learning process is described simply as minimizing a loss function:

$$L(x, g(f(x)))$$

- $L$ is a loss function penalizing $g(f(x))$ for being dissimilar from $x$ (such as mean squared error)
- $f$ is the encoder function
- $g$ is the decoder function
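
To make the notation concrete, here is a minimal NumPy sketch of $L(x, g(f(x)))$ with mean squared error. The linear `f` and `g`, the random weights `W_enc` and `W_dec`, and the sizes (8 inputs, a 3-value code) are all assumptions chosen for illustration; they are not the network trained later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy weights: 8 input features squeezed into a 3-value code.
W_enc = rng.normal(size=(8, 3))
W_dec = rng.normal(size=(3, 8))

def f(x):
    """Encoder: project the input down to the code."""
    return x @ W_enc

def g(h):
    """Decoder: project the code back up to the input size."""
    return h @ W_dec

def L(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return np.mean((x - x_hat) ** 2)

x = rng.normal(size=(5, 8))   # a batch of 5 inputs
code = f(x)
loss = L(x, g(code))          # the input itself is the target
print(code.shape, loss)
```

Training would adjust `W_enc` and `W_dec` to push `loss` down; a perfect reconstruction gives a loss of exactly zero.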


<a id="p2"></a>
## Training an Autoencoder

In [2]:
# The encoder compresses the input into a tiny representation (e.g. 128x128 down to 4x4)
# The loss is calculated using the original inputs as the true output,
# so the original image itself serves as the label
# The decoder reconstitutes the compressed representation to generate the output

# Learning means minimizing the loss (here MSE):
# the autoencoder's loss compares the input x with decoder(encoder(x))

# Architecture: the input is the label, and the network is shaped like an hourglass
# that squeezes everything through a narrow middle

In [3]:
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D, Reshape, Concatenate, Flatten, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator
from tensorflow.keras.losses import binary_crossentropy, kullback_leibler_divergence
from tensorflow.keras import backend as K
from tensorflow.keras.utils import get_file
from tensorflow.keras.optimizers import Adam

from struct import unpack
import json
import glob

from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np

from io import BytesIO
import PIL
from PIL import ImageDraw

from IPython.display import clear_output, Image, display, HTML

In [4]:
BASE_PATH = 'https://storage.googleapis.com/quickdraw_dataset/full/binary/'
path = get_file('cat', BASE_PATH + 'cat.bin')

Downloading data from https://storage.googleapis.com/quickdraw_dataset/full/binary/cat.bin


A drawing is a list of strokes, each made up of a series of x and y coordinates. The x and y coordinates are stored separately, so we need to zip them into a list to feed into the ImageDraw object we just created:

In [5]:
def load_icons(path, train_size=0.85):
    """Render each Quick Draw record as a 32x32 grayscale image, then split train/test."""
    x = []
    with open(path, 'rb') as f:
        while True:
            # Blank white 32x32 canvas; strokes are drawn on it in black
            img = PIL.Image.new('L', (32, 32), 'white')
            draw = ImageDraw.Draw(img)
            header = f.read(15)  # fixed-size record header, specific to this dataset
            if len(header) != 15:  # a short read means end of file
                break
            strokes, = unpack('H', f.read(2))  # number of strokes in this drawing
            for i in range(strokes):
                n_points, = unpack('H', f.read(2))  # number of points in this stroke
                fmt = str(n_points) + 'B'
                # Coordinates are unsigned bytes (0..255); // 8 scales them to 0..31
                read_scaled = lambda: (p // 8 for 
                                       p in unpack(fmt, f.read(n_points)))
                # The first call reads all x values, the second all y values
                points = [*zip(read_scaled(), read_scaled())]
                draw.line(points, fill=0, width=2)
            img = img_to_array(img)
            x.append(img)  # collect the rendered image
    x = np.asarray(x) / 255  # scale pixel values to [0, 1]
    return train_test_split(x, train_size=train_size)


x_train, x_test = load_icons(path)
x_train.shape, x_test.shape



((104721, 32, 32, 1), (18481, 32, 32, 1))
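
The zipping step in `load_icons` can be seen in isolation with a hand-built record. This is a sketch under assumptions: `fake_record` and `read_record` are hypothetical helpers written for this example, the 15 header bytes are left as zeros, and the byte layout simply mirrors what `load_icons` reads. It uses only the standard library, so no image is drawn.

```python
from io import BytesIO
from struct import pack, unpack

def fake_record(strokes):
    """Build one record in the layout load_icons reads (header left as zero bytes)."""
    out = bytes(15) + pack('H', len(strokes))
    for xs, ys in strokes:
        # Each stroke: point count, then all x bytes, then all y bytes.
        out += pack('H', len(xs)) + bytes(xs) + bytes(ys)
    return out

def read_record(f):
    """Return the strokes of one record as lists of (x, y) points scaled to 0..31."""
    f.read(15)                                   # skip the header
    n_strokes, = unpack('H', f.read(2))
    strokes = []
    for _ in range(n_strokes):
        n_points, = unpack('H', f.read(2))
        fmt = str(n_points) + 'B'
        xs = [p // 8 for p in unpack(fmt, f.read(n_points))]
        ys = [p // 8 for p in unpack(fmt, f.read(n_points))]
        strokes.append(list(zip(xs, ys)))        # pair the separately stored coordinates
    return strokes

record = fake_record([([0, 128, 255], [0, 64, 255]), ([16, 24], [200, 208])])
strokes = read_record(BytesIO(record))
print(strokes)  # [[(0, 0), (16, 8), (31, 31)], [(2, 25), (3, 26)]]
```

Each inner list is what `draw.line` receives for one stroke.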

In [6]:
x_train[0]  # most of the image is white (1.0)

array([[[1.],
        [0.],
        [0.],
        ...,
        [1.],
        [1.],
        [1.]],

       [[1.],
        [0.],
        [0.],
        ...,
        [1.],
        [1.],
        [1.]],

       [[0.],
        [0.],
        [1.],
        ...,
        [1.],
        [1.],
        [1.]],

       ...,

       [[1.],
        [1.],
        [1.],
        ...,
        [1.],
        [1.],
        [1.]],

       [[1.],
        [1.],
        [1.],
        ...,
        [1.],
        [1.],
        [1.]],

       [[1.],
        [1.],
        [1.],
        ...,
        [1.],
        [1.],
        [1.]]], dtype=float32)

In [12]:
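# Bug: the function calls itself with no base case, so Python recurses
# until it raises RecursionError. The working definition is in the next cell.
def create_autoencoder():
    autoencoder = create_autoencoder()
    autoencoder.summary()
create_autoencoder()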

RecursionError: maximum recursion depth exceeded

In [None]:
def create_autoencoder():
    # 32x32 grayscale input; the single channel holds the pixel intensity
    input_img = Input(shape=(32, 32, 1))

    channels = 2  # base filter count, doubled at the start of every encoder block
    x = input_img
    # Encoder: four blocks, each halving the height and width
    for i in range(4):
        channels *= 2
        # Two parallel convolutions with different kernel sizes read the same input
        left = Conv2D(channels, (3, 3), activation='relu', padding='same')(x)
        right = Conv2D(channels, (2, 2), activation='relu', padding='same')(x)
        conc = Concatenate()([left, right])  # stack both feature maps along the channel axis
        # Max pooling keeps the largest value in each 2x2 window (not the average)
        x = MaxPooling2D((2, 2), padding='same')(conc)

    # Bottleneck: the compressed representation we want from the model;
    # Dense acts on the channel axis only, giving a 2x2x32 code
    x = Dense(channels)(x)
    # Decoder: four blocks of convolution plus upsampling to reconstitute the image
    for i in range(4):
        x = Conv2D(channels, (3, 3), activation='relu', padding='same')(x)
        x = UpSampling2D((2, 2))(x)  # doubles the height and width
        channels //= 2
    # Sigmoid keeps every output pixel in [0, 1], matching the scaled inputs
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

    autoencoder = Model(input_img, decoded)
    # The optimizer is the algorithm that updates the weights
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
    return autoencoder

autoencoder = create_autoencoder()
autoencoder.summary()

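# Encoder
#   each block convolves its input twice (left and right branches)
#   and concatenates the results: 32x32x8 after the first block
#   max pooling condenses that to 16x16x8
#   repeat (convolve, concatenate, pool) for four blocks in total
#   the third block leaves 32 feature maps of 4x4
#   the last block leaves 2x2x64, and the Dense layer reduces it to 2x2x32
# Decoder
#   four rounds of convolution plus upsampling grow 2x2 back to 32x32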
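
The shape bookkeeping in these notes can be checked without TensorFlow. The sketch below is a plain-Python trace, assuming the layer order of `create_autoencoder` above; `trace_shapes` is a hypothetical helper, not a Keras function, and it relies on `padding='same'` keeping convolutions from changing height and width.

```python
def trace_shapes(size=32, start_channels=2, blocks=4):
    """Follow (height, width, channels) through the layers of create_autoencoder."""
    shapes = [(size, size, 1)]
    channels = start_channels
    # Encoder: two parallel convolutions, concatenated, then 2x2 max pooling.
    for _ in range(blocks):
        channels *= 2
        shapes.append((size, size, 2 * channels))   # Concatenate doubles the channels
        size //= 2                                   # pooling halves height and width
        shapes.append((size, size, 2 * channels))
    # Bottleneck: Dense acts on the last axis only.
    shapes.append((size, size, channels))
    # Decoder: convolution, then 2x2 upsampling.
    for _ in range(blocks):
        size *= 2
        shapes.append((size, size, channels))
        channels //= 2
    shapes.append((size, size, 1))                   # final sigmoid convolution
    return shapes

shapes = trace_shapes()
for shape in shapes:
    print(shape)
```

The trace passes through 32x32x8, 16x16x8, and 2x2x64 before the 2x2x32 bottleneck, which holds 128 values once flattened. `autoencoder.summary()` should report the same shapes with a leading batch dimension.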

In [None]:
from tensorflow.keras.callbacks import TensorBoard

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

In [None]:
cols = 25
idx = np.random.randint(x_test.shape[0], size=cols)
sample = x_test[idx]
decoded_imgs = autoencoder.predict(sample)
decoded_imgs.shape

In [None]:
# Convert a (32, 32, 1) array with values in [0, 1] into a grayscale PIL image
def decode_img(tile, factor=1.0):
    tile = tile.reshape(tile.shape[:-1])  # drop the trailing channel axis
    tile = np.clip(tile * 255, 0, 255)
    return PIL.Image.fromarray(tile.astype('uint8'))  # uint8 data gives an 8-bit 'L' image
    

# Gray canvas wide enough for `cols` tiles and tall enough for two rows
overview = PIL.Image.new('RGB', (cols * 32, 64 + 20), (128, 128, 128))
for idx in range(cols):
    overview.paste(decode_img(sample[idx]), (idx * 32, 5))         # top row: originals
    overview.paste(decode_img(decoded_imgs[idx]), (idx * 32, 42))  # bottom row: reconstructions
f = BytesIO()
overview.save(f, 'png')  # write the comparison image to an in-memory PNG
display(Image(data=f.getvalue()))

In [None]:
# Possible improvement: sample from the true distribution
# - take the average over the existing images
# - create a fixed-length vector representation with 64 values
# - z_log represents the log variance of the cat images

# Applications:
# - reverse image search
# - content-based recommendation systems such as Pandora, which extract features from music
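
The notes above about sampling, a 64-value code, and a log variance point toward a variational autoencoder. As a hedged illustration only, this NumPy sketch shows the usual sampling step, where a code is drawn from a mean and a log variance, together with the KL term that pulls codes toward a standard normal. The names `z_mean`, `z_log_var`, `sample_z`, and `kl_to_standard_normal` are assumptions, and the encoder outputs are random stand-ins rather than values from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64   # length of the fixed-size code mentioned in the notes

# Hypothetical encoder outputs for a batch of 4 images.
z_mean = rng.normal(size=(4, latent_dim))
z_log_var = rng.normal(scale=0.1, size=(4, latent_dim))

def sample_z(z_mean, z_log_var, rng):
    """Draw z = mean + sigma * epsilon, with sigma = exp(log_var / 2)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

def kl_to_standard_normal(z_mean, z_log_var):
    """KL divergence from N(mean, sigma^2) to N(0, 1), summed over the code."""
    return -0.5 * np.sum(1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=-1)

z = sample_z(z_mean, z_log_var, rng)
kl = kl_to_standard_normal(z_mean, z_log_var)
print(z.shape, kl.shape)
```

Predicting the log variance rather than the variance lets the network output any real number while `exp` keeps the resulting variance positive.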

<a id="p3"></a>
## Part 3: Information Retrieval with Autoencoders

Let's slice our autoencoder in half to extract our reduced features. :)

In [None]:


# Flattened, the bottleneck output has 128 values (2 * 2 * 32)
from tensorflow.keras.models import Model

layer_name = 'dense'  # default name Keras gives the first Dense layer created in a session

# New model: same input as the autoencoder, output taken at the bottleneck layer
intermediate_layer_model = Model(inputs=autoencoder.input,
                                 outputs=autoencoder.get_layer(layer_name).output)

intermediate_output = intermediate_layer_model.predict(x_train)  # encode the training images

In [None]:
# One encoded representation per training image
intermediate_output.shape

In [None]:
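intermediate_output[0].T
# the encoded form of the first image; .T reverses the axes so the channels come first
# for recommendation, flatten this down to a single vector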

In [None]:
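# k-nearest-neighbor search over the encoded vectors
from sklearn.neighbors import NearestNeighbors

# Assumption: `vectors` is the encoder output flattened to one row per image
vectors = intermediate_output.reshape(len(intermediate_output), -1)

nn = NearestNeighbors(n_neighbors=10, algorithm='ball_tree')
nn.fit(vectors)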
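
To show what the search returns, here is a brute-force NumPy stand-in for the nearest-neighbor lookup. It is a sketch under assumptions: the codes are random numbers rather than real encoder output, `nearest` is a hypothetical helper, and plain Euclidean distance over every row replaces the ball tree that scikit-learn builds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the encoder output: 1000 codes of shape (2, 2, 32).
codes = rng.normal(size=(1000, 2, 2, 32))
vectors = codes.reshape(len(codes), -1)   # one 128-value row per image

def nearest(query, vectors, k=10):
    """Indices and distances of the k rows closest to query (Euclidean)."""
    dists = np.linalg.norm(vectors - query, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

idx, dists = nearest(vectors[42], vectors)
print(idx[0], dists[0])   # the query image is its own nearest neighbor
```

With the fitted `NearestNeighbors` model from the cell above, `nn.kneighbors(query)` gives the same kind of result as a `(distances, indices)` pair, where `query` is a 2D array with one row per lookup.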

In [None]:
# We can cut the model in half:
# call predict on the new model to get the intermediate output,
# then take that intermediate output and feed it into the nearest-neighbor search

# Prepare for tomorrow's lecture: read the OpenAI papers

# The first application of this technique is feature extraction

In [None]:
# A GAN is two networks combined into one

# The input is z

<a id="p4"></a>
## Part 4: Generative Adversarial Networks (GANs)

* __Generator__: generates samples
* __Discriminator__: decides what is real or fake. The loss feeds back to both the generator and the discriminator; the setup is modeled on the prisoner's dilemma.

Building and training the network:

* Train the generator first and the discriminator second
* Create the GAN input, get x, pass x through the discriminator, and return the GAN network
* Load MNIST, set the batch size, instantiate the generator, and loop over the epochs

Overfitting: the autoencoder's output is exactly like its input. A penalty can be added to counter this.

Inside the training loop:

* The inputs are random noise
* Create the labels
* Train the discriminator, then train the generator
* Plot images every 20 epochs, then retrain the whole thing
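
The "random noise" and "create labels" steps can be sketched with NumPy alone. Everything here is an assumption made for illustration: the sizes, the `generator_stub` placeholder that stands in for a real generator network, and the random array standing in for MNIST rows. No model is trained; the block only shows which inputs and labels each training step would receive.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, noise_dim, image_dim = 16, 100, 784   # hypothetical sizes; 784 = 28 * 28

def generator_stub(noise):
    """Placeholder for a generator network: maps noise to fake 'images'."""
    W = np.full((noise.shape[1], image_dim), 0.01)
    return np.tanh(noise @ W)

real_images = rng.uniform(-1, 1, size=(batch_size, image_dim))  # stand-in for MNIST rows
noise = rng.normal(size=(batch_size, noise_dim))
fake_images = generator_stub(noise)

# Discriminator step: real images are labeled 1, generated images 0.
x_disc = np.concatenate([real_images, fake_images])
y_disc = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])

# Generator step: fresh noise labeled 1, because the generator is rewarded
# when the discriminator judges its output to be real.
x_gen = rng.normal(size=(batch_size, noise_dim))
y_gen = np.ones(batch_size)

print(x_disc.shape, y_disc.shape, x_gen.shape, y_gen.shape)
```

The labels come from knowing which rows were generated, not from human annotation.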

Autoencoders compared with GANs:

* GANs give better results for images
* Autoencoders compress information
* Autoencoders can also be used for generative purposes
* GANs include an adversarial component
* Autoencoders use the inputs as labels, so there is just one network
* In a GAN there are two networks competing against each other
* The generator in a GAN takes noise as its input
* It is hard to get good results from GANs

The loss is determined by what we already know about which samples are real and which are fake. The labels are internal, so this is an unsupervised approach.


* Image classification: is something present or not
    - Subdomain of image classification: object detection, which draws a bounding box around an object and attaches a probability
* Keras and PyTorch involve customization, whereas AWS or GCP AutoML could involve a consultant when there is less infrastructure
* What matters are the techniques and their applications to real problems