See `dataAnalysis/main.ipynb` for the preceding and subsequent stages.

This file implements an unsupervised deep embedding process for clustering analysis, in four stages:
<br>
> A. Constructing an unsupervised deep learning model (e.g. autoencoders).
<br>
> B. Training the deep embedding model to minimise a reconstruction error.
<br>
> C. Embedding generation. 
<br>
> D. Clustering in the embedding space.


Pre-requisite knowledge: simple feedforward perceptrons.

In [4]:
# Importing NN libraries
import tensorflow as tf
from tensorflow import keras
import numpy as np

__A. Constructing An Autoencoder.__
<br>
Autoencoders consist of two parts: an `encoder` and a `decoder`.
<br>
The autoencoder is trained to minimise a reconstruction loss

> `L(x, g(f(x)))`,

for an input `x` and non-linear encoder and decoder functions `f` and `g`. The lower-dimensional latent representation is often denoted `h = f(x)`.
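To make the notation concrete, the following is a minimal NumPy sketch of `f`, `g` and `L` with untrained, randomly initialised weights. It is an illustration only, not the Keras model built in this notebook: the mean squared error used for `L`, the random inputs, and the names `W_enc`, `W_dec`, `b_enc`, `b_dec` and `relu` are all assumptions. The sizes 690 and 86 match the layer widths reported in the model summary in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths as reported in the model summary: 30 * 23 = 690 inputs, 690 // 8 = 86 latent units.
n_input, n_latent = 690, 86

# Randomly initialised (untrained) weights, for illustration only.
W_enc = rng.normal(scale=0.05, size=(n_input, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(scale=0.05, size=(n_latent, n_input))
b_dec = np.zeros(n_input)

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

def f(x):
    """Encoder: map inputs x to the latent representation h."""
    return relu(x @ W_enc + b_enc)

def g(h):
    """Decoder: map the latent representation h back to the input space."""
    return relu(h @ W_dec + b_dec)

def L(x, x_hat):
    """Mean squared reconstruction error (an assumed choice for L)."""
    return float(np.mean((x - x_hat) ** 2))

x = rng.random((4, n_input))  # a batch of 4 made-up inputs
h = f(x)                      # latent representation, shape (4, 86)
x_hat = g(h)                  # reconstruction, shape (4, 690)
loss = L(x, x_hat)            # the quantity training would minimise

print(h.shape, x_hat.shape)
```

Because `h` has fewer dimensions than `x`, the decoder cannot simply copy its input, which is what makes the autoencoder undercomplete.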

In [5]:
from DataAnalysis import DataAnalysis

class undercompleteAE():
    def __init__(self):
        data = DataAnalysis("data/star_data.fits")
        # Take the dimensions from DataAnalysis: numRow vectors, each numCol-dimensional. This is the input space.
        inputLayer = keras.Input(shape=(data.numRow(),data.numCol(),1), name="rawInput")
        encoderInputLayer = keras.layers.Flatten(name="encoderStart")(inputLayer)
        # Compress this input space into a lower-dimensional latent space, i.e. h.
        encoderOutputLayer = keras.layers.Dense((data.numRow()*data.numCol())//8, activation="relu", name="bottleNeck")(encoderInputLayer)

        # Similarly for the decoder, we take the latent space h and (try to) reconstruct to the input space x
        decoderInputLayer = keras.layers.Dense(data.numRow()*data.numCol(), activation="relu",name="decoderStart")(encoderOutputLayer)
        # Hence giving the reconstruction g(f(x)), which the loss L(x, g(f(x))) compares against x
        decoderOutputLayer = keras.layers.Reshape((data.numRow(),data.numCol(),1), name="reconstruction")(decoderInputLayer)

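        # Adam optimiser intended for training; note that it is not attached to the model in this cell (no compile call).
        opt = keras.optimizers.Adam(learning_rate=0.001)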
        self.autoencoder = keras.Model(inputLayer, decoderOutputLayer, name="autoencoder")

    def summary(self):
        self.autoencoder.summary()


In [6]:
e = undercompleteAE()
e.summary()



Model: "autoencoder"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 rawInput (InputLayer)       [(None, 30, 23, 1)]       0         
                                                                 
 encoderStart (Flatten)      (None, 690)               0         
                                                                 
 bottleNeck (Dense)          (None, 86)                59426     
                                                                 
 decoderStart (Dense)        (None, 690)               60030     
                                                                 
 reconstruction (Reshape)    (None, 30, 23, 1)         0         
                                                                 
Total params: 119456 (466.62 KB)
Trainable params: 119456 (466.62 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
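The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The short sketch below reproduces the figures above; the helper name `dense_params` is hypothetical, and the size estimate assumes 4-byte (`float32`) parameters.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_input = 30 * 23        # flattened input size from the summary: 690
n_latent = n_input // 8  # bottleneck width: 86

bottleneck = dense_params(n_input, n_latent)  # 690 * 86 + 86
decoder = dense_params(n_latent, n_input)     # 86 * 690 + 690
total = bottleneck + decoder
size_kb = total * 4 / 1024                    # assuming 4 bytes per parameter

print(bottleneck, decoder, total, size_kb)    # 59426 60030 119456 466.625
```

The `Flatten` and `Reshape` layers only rearrange values, so they contribute no parameters.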
