See `dataAnalysis/main.ipynb` for the preceding and subsequent stages.

This file implements an unsupervised deep embedding process for clustering analysis.
<br>
> A. Constructing an unsupervised deep learning model (e.g. autoencoders).
<br>
> B. Training the deep embedding model to minimise a reconstruction error.
<br>
> C. Embedding generation. 
<br>
> D. Clustering in the embedding space.


Pre-requisite knowledge: simple feedforward perceptrons.
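
Stages C and D listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual method: the notebook does not say which clustering algorithm is used, so plain k-means is swapped in here, and synthetic vectors stand in for the embeddings that would come from the trained encoder. The names `kmeans` and `embeddings` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Squared distance from every point to every centroid: shape (n, k)
        dists = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its points (keep it if it has none)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids

# Stand-in for stage C (assumption): two synthetic, well-separated blobs
# in place of embeddings produced by a trained encoder.
rng = np.random.default_rng(42)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 86)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 86)),
])

# Stage D: cluster in the embedding space
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

In practice the number of clusters `k` would have to be chosen for the data at hand; it is fixed at 2 here only because the synthetic data has two blobs.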

In [10]:
# Importing NN libraries
import tensorflow as tf
from tensorflow import keras
import numpy as np

__A. Constructing An Autoencoder.__
<br>
Autoencoders consist of two parts: an `encoder` and a `decoder`.
<br>
The reconstruction loss that the autoencoder minimises can be expressed as

> `L(x, g(f(x)))`,

for an input `x` and non-linear encoder and decoder functions `f` and `g`. The lower-dimensional latent representation is often denoted `h = f(x)`.
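
To make the notation concrete, the loss can be evaluated by hand with NumPy. This is only a sketch under assumptions: the layer sizes, the random untrained weights `W_enc` and `W_dec`, and the choice of mean squared error for `L` are all illustrative and are not taken from the model defined in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 8, 2  # illustrative sizes (assumption)

# Randomly initialised, untrained weights, for illustration only
W_enc = rng.normal(size=(input_dim, latent_dim))
W_dec = rng.normal(size=(latent_dim, input_dim))

def relu(z):
    return np.maximum(z, 0.0)

def f(x):
    """Encoder: maps the input x to the latent code h."""
    return relu(x @ W_enc)

def g(h):
    """Decoder: maps the latent code h back to the input space."""
    return relu(h @ W_dec)

def L(x, x_hat):
    """Mean squared reconstruction error between x and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

x = rng.random(size=(4, input_dim))  # a batch of 4 inputs
h = f(x)                             # latent codes, shape (4, 2)
x_hat = g(h)                         # reconstructions, shape (4, 8)
loss = L(x, x_hat)
print(h.shape, x_hat.shape, loss)
```

Training would adjust the encoder and decoder weights to make this value small. Because `latent_dim < input_dim`, the encoder is forced to compress the input.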

In [11]:
from DataAnalysis import DataAnalysis

class undercompleteAE():
    def __init__(self):
        data = DataAnalysis("data/star_data.fits")
        # Input space: numRow rows of numCol-dimensional vectors from DataAnalysis, plus a trailing channel axis.
        inputLayer = keras.Input(shape=(data.numRow(),data.numCol(),1), name="rawInput")
        encoderInputLayer = keras.layers.Flatten(name="encoderIn")(inputLayer)
        # Compress this input space into a lower-dimensional latent space, i.e. h.
        # Integer division keeps the unit count an int (Dense expects a whole number of units).
        encoderOutputLayer = keras.layers.Dense((data.numRow()*data.numCol())//8, activation="relu", name="bottleneckAsOut")(encoderInputLayer)

        # Save the encoder
        self.encoder = keras.Model(inputLayer, encoderOutputLayer, name="encoder")

        # Similarly for the decoder, we take the latent space h and (try to) reconstruct to the input space x
        decoderInputLayer = keras.layers.Dense((data.numRow()*data.numCol())//8, activation="relu",name="bottleneckAsIn")(encoderOutputLayer)
        decoderOutputLayer = keras.layers.Dense(data.numRow()*data.numCol(), activation="relu",name="decoderOut")(decoderInputLayer)
        # Hence giving L(x, g(f(x)))
        outputLayer = keras.layers.Reshape((data.numRow(),data.numCol(),1), name="reconstruction")(decoderOutputLayer)

        # Model class using Keras' Model class
        self.autoencoder = keras.Model(inputLayer, outputLayer, name="autoencoder")

        opt = keras.optimizers.Adam(learning_rate=0.001)        
        # self.autoencoder.compile(opt, loss="mse")
        # self.autoencoder.fit(data.getData(300), data.getData(300), epochs=3, batch_size=3, validation_split=0.1)

    def summary(self):
        self.autoencoder.summary()

    def getLatentRep(self):
        # Placeholder: does not yet return the latent representation
        return 3
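
The layer widths and parameter counts implied by this architecture can be checked by hand. The sketch below assumes `numRow() = 30` and `numCol() = 23`, the values implied by the input shape in the printed model summary. The helper `dense_params` is hypothetical and simply counts one weight per input-output pair plus one bias per output unit.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_in * n_out + n_out

num_row, num_col = 30, 23   # assumed from the summary's input shape
flat = num_row * num_col    # width after Flatten
bottleneck = flat // 8      # integer division; flat / 8 would give 86.25

counts = {
    "bottleneckAsOut": dense_params(flat, bottleneck),
    "bottleneckAsIn": dense_params(bottleneck, bottleneck),
    "decoderOut": dense_params(bottleneck, flat),
}
total = sum(counts.values())
print(flat, bottleneck, counts, total)
```

The Flatten and Reshape layers only rearrange values, so they contribute no parameters.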


In [12]:
e = undercompleteAE()
e.summary()



Model: "autoencoder"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 rawInput (InputLayer)       [(None, 30, 23, 1)]       0         
                                                                 
 encoderIn (Flatten)         (None, 690)               0         
                                                                 
 bottleneckAsOut (Dense)     (None, 86)                59426     
                                                                 
 bottleneckAsIn (Dense)      (None, 86)                7482      
                                                                 
 decoderOut (Dense)          (None, 690)               60030     
                                                                 
 reconstruction (Reshape)    (None, 30, 23, 1)         0         
                                                                 
Total params: 126938 (495.85 KB)
Trainable params: 1269