[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Machine Learning Methods

## Neural Networks - UnSupervised Learning

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 10/03/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/0046AnomalyDetectionIsolationForest.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

MNIST_IMG_SIZE = (28, 28)


In [None]:
# Fixel Algorithms Packages


## UnSupervised Learning with Neural Networks

In this notebook we'll explore the use of Neural Networks in the _UnSupervised Learning_ context.  
We'll use an _Auto Encoder_ in order to reduce the dimensionality of the data.

![](https://i.imgur.com/dRKxhMw.png)

The concept of the _auto encoder_ is to use the data itself as a reference.   
Using the reconstruction error, one optimizes both the _encoder_ and the _decoder_ to generate a low dimensional representation of the data.


* <font color='brown'>(**#**)</font> The low dimensionality section is also called the _bottleneck_ section of the net.
* <font color='brown'>(**#**)</font> In this notebook we'll use an MLP based encoder / decoder, yet in practice, any building block of the DNN ecosystem can be used.
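
The reconstruction objective described above can be written compactly. The notation below is an assumption made for illustration and does not appear in the notebook: $f_{\theta}$ is the encoder, $g_{\phi}$ is the decoder, $\boldsymbol{x}_{i} \in \mathbb{R}^{D}$ is a sample and $\boldsymbol{z}_{i} \in \mathbb{R}^{d}$ is its latent (bottleneck) representation with $d \ll D$:

$$
\arg \min_{\theta, \phi} \frac{1}{N} \sum_{i = 1}^{N} {\left\| \boldsymbol{x}_{i} - g_{\phi} \left( f_{\theta} \left( \boldsymbol{x}_{i} \right) \right) \right\|}_{2}^{2}, \qquad \boldsymbol{z}_{i} = f_{\theta} \left( \boldsymbol{x}_{i} \right)
$$

For the MNIST case in this notebook, $D = 784$ and $d = 2$.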

In this notebook we'll use the MNIST data. We'll classify it using a kernel SVM classifier (`SVC`) fed with only the 2 outputs of the encoder.

In [None]:
# Parameters

# Data
numSamplesTrain = 60_000
numSamplesVal   = 10_000


# Model
lHiddenLayers   = [500, 300, 30, 2, 30, 300, 500]
activationFun   = 'logistic'
solverType      = 'sgd'
regFctr         = 0.0001 #<! L2 Regularization
batchSize       = 5000
lrPolicy        = 'adaptive' #<! Works only if `solverType = sgd`
lrInit          = 0.001 #<! Works only if `solverType = sgd` or `solverType = adam`
numEpoch        = 100
stopTol         = 1e-6
earlyStopping   = True #<! Works only if `solverType = sgd` or `solverType = adam`
valRatio        = 0.15

numKFold = 5

# Visualization
numRows     = 3
numCols     = 3
numImgRec   = numRows * numCols


In [None]:
# Auxiliary Functions

def PlotMnistImages(mX: np.ndarray, vY: np.ndarray = None, numRows: int = 1, numCols: int = 1, imgSize = MNIST_IMG_SIZE, randomChoice = True, hF = None):

    numSamples  = mX.shape[0]

    numImg = numRows * numCols

    # tFigSize = (numRows * 3, numCols * 3)
    tFigSize = (numCols * 3, numRows * 3)

    if hF is None:
        hF, hA = plt.subplots(numRows, numCols, figsize = tFigSize)
    else:
        hA = hF.axes #<! List of the axes of the given figure
    
    hA = np.atleast_1d(hA) #<! To support numImg = 1
    hA = hA.flat

    if randomChoice:
        vIdx = np.random.choice(numSamples, numImg, replace = False)
    else:
        vIdx = range(numImg)

    
    for kk in range(numImg):
        
        idx = vIdx[kk]
        mI  = np.reshape(mX[idx, :], imgSize)
    
        hA[kk].imshow(mI, cmap = 'gray')
        hA[kk].tick_params(axis = 'both', left = False, top = False, right = False, bottom = False, labelleft = False, labeltop = False, labelright = False, labelbottom = False)
        labelStr = f', Label = {vY[idx]}' if vY is not None else ''
        hA[kk].set_title(f'Index = {idx}' + labelStr)
    
    plt.show()

def PlotScatterData(mX: np.ndarray, vL: np.ndarray = None, hA: plt.Axes = None, figSize: Tuple[int, int] = FIG_SIZE_DEF, markerSize: int = MARKER_SIZE_DEF, edgeColor: str = EDGE_COLOR, axisTitle: str = None) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = figSize)
    else:
        hF = hA.get_figure()
    
    numSamples = mX.shape[0]

    if vL is None:
        vL = np.zeros(numSamples)
    
    vU = np.unique(vL)
    numClusters = len(vU)

    for ii in range(numClusters):
        vIdx = vL == vU[ii]
        hA.scatter(mX[vIdx, 0], mX[vIdx, 1], s = markerSize, edgecolor = edgeColor, label = vU[ii]) #<! Use the actual label value in the legend
    
    hA.set_xlabel('${{x}}_{{1}}$')
    hA.set_ylabel('${{x}}_{{2}}$')
    if axisTitle is not None:
        hA.set_title(axisTitle)
    hA.grid()
    hA.legend()

    return hA


def PlotLabelsHistogram(vY: np.ndarray, hA = None, lClass = None, xLabelRot: int = None) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = (8, 6))
    
    vLabels, vCounts = np.unique(vY, return_counts = True)

    hA.bar(vLabels, vCounts, width = 0.9, align = 'center')
    hA.set_title('Histogram of Classes / Labels')
    hA.set_xlabel('Class')
    hA.set_ylabel('Number of Samples')
    hA.set_xticks(vLabels)
    if lClass is not None:
        hA.set_xticklabels(lClass)
    
    if xLabelRot is not None:
        for xLabel in hA.get_xticklabels():
            xLabel.set_rotation(xLabelRot)

    return hA



## Generate / Load Data

In this notebook we'll use the [`MNIST`](https://en.wikipedia.org/wiki/MNIST_database) data set.

It features 60,000 train images and 10,000 test images of size `28x28`.  
The images are `INT64` with values in the range `{0, 1, 2, ..., 255}` (Like `UINT8`).

* <font color='brown'>(**#**)</font> MNIST is the data set used by Yann LeCun in the 1990s to show the ability of Neural Networks.
* <font color='brown'>(**#**)</font> In the [MNIST WebSite](http://yann.lecun.com/exdb/mnist/) one can watch the performance improvement over the years using different approaches.


In [None]:
# Loading / Generating Data
mX, vY = fetch_openml('mnist_784', version = 1, return_X_y = True, as_frame = False, parser = 'auto')

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')
print(f'The unique values of the labels: {np.unique(vY)}')

In [None]:
# Pre Processing

# The image is in the range {0, 1, ..., 255}
# We scale it into [0, 1]

mX = mX / 255.0

### Plot the Data

In [None]:
# Display the Data

PlotMnistImages(mX, vY, numRows, numCols)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY)

In [None]:
# Train / Test Split

mXTrain, mXTest, vYTrain, vYTest =  train_test_split(mX, vY, train_size = numSamplesTrain, test_size = numSamplesVal, stratify = vY)

print(f'The number of training data samples: {mXTrain.shape[0]}')
print(f'The number of training features per sample: {mXTrain.shape[1]}') 


print(f'The number of test data samples: {mXTest.shape[0]}')
print(f'The number of test features per sample: {mXTest.shape[1]}') 

## Auto Encoder

In this section we'll build the Auto Encoder for the MNIST data set.

![](https://i.imgur.com/F6RE6XP.png)

The idea is to push the image on one end and reconstruct it on the other side.  
In the middle, we'll create a bottleneck of size 2. This means the decoder, given only 2 numbers, will have to reconstruct the correct digit.


* <font color='brown'>(**#**)</font> Chaining MLPs of SciKit Learn in a pipeline won't make them share the same loss!

* <font color='red'>(**?**)</font> What kind of an MLP should we use? Regressor or Classifier?
* <font color='red'>(**?**)</font> How many outputs will the model have?
* <font color='red'>(**?**)</font> What would be the labels in the `fit()` method?
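
The questions above can be explored on a small scale before training on MNIST. The following is a minimal sketch on hypothetical random toy data (`mXToy`, `oToyAe` and the layer sizes are illustrative names and values, not part of the notebook). It fits an `MLPRegressor` with the data as both input and target, then inspects the number of outputs and the shapes of the weight matrices:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Hypothetical toy data: 200 samples with 16 features each
rng   = np.random.default_rng(0)
mXToy = rng.random((200, 16))

# A tiny auto encoder like model: 16 -> 8 -> 2 -> 8 -> 16
oToyAe = MLPRegressor(hidden_layer_sizes = [8, 2, 8], activation = 'tanh', max_iter = 50, random_state = 0)

with warnings.catch_warnings():
    # The short training is not expected to converge, hence the warning is silenced
    warnings.simplefilter('ignore', category = ConvergenceWarning)
    oToyAe = oToyAe.fit(mXToy, mXToy) #<! The data is its own target

print(f'The number of outputs: {oToyAe.n_outputs_}')
print(f'The shapes of the weights: {[mW.shape for mW in oToyAe.coefs_]}')
```

The number of outputs matches the number of input features, and the second weight matrix maps into the 2 units of the bottleneck.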

In [None]:
# Constructing the Model

# oMlpReg = MLPRegressor(hidden_layer_sizes = lHiddenLayers, activation = activationFun, solver = solverType, alpha = regFctr, learning_rate = lrPolicy, learning_rate_init = 0.001, random_state = seedNum)
oMlpReg = MLPRegressor(hidden_layer_sizes = [300, 100, 50, 2, 50, 100, 300], activation = 'relu', solver = 'adam', alpha = regFctr, learning_rate_init = 0.0005, random_state = seedNum)
# Batch = 500,  Epochs = 100
# learning_rate_init = 0.00015 -> 73.38%
# learning_rate_init = 0.00025 -> 78.79%
oMlpReg = MLPRegressor(hidden_layer_sizes = [300, 100, 50, 2, 50, 100, 300], activation = 'tanh', solver = 'adam', alpha = regFctr, learning_rate_init = 0.00025, random_state = seedNum)

# Batch = 5000,  Epochs = 50
# learning_rate_init = 0.00015 -> xxx
# oMlpReg = MLPRegressor(hidden_layer_sizes = [100, 50, 2, 50, 100], activation = 'tanh', solver = 'adam', alpha = regFctr, learning_rate_init = 0.00075, random_state = seedNum)


In [None]:
# Epoch Loop

lTrainLoss  = [] #<! Train set loss
lTrainScore = [] #<! Train set score
lValScore   = [] #<! Validation set score

for ii in range(numEpoch):
    print(f'Processing epoch #{(ii + 1):03d} out of {numEpoch} epochs.')
    
    oMlpReg = oMlpReg.partial_fit(mXTrain, mXTrain) 

    # R2 Score (The default score of a regressor)
    trainScore  = oMlpReg.score(mXTrain, mXTrain)
    valScore    = oMlpReg.score(mXTest, mXTest)

    lTrainLoss.append(oMlpReg.loss_)
    lTrainScore.append(trainScore)
    lValScore.append(valScore)
    print(f'The train loss (MSE)     : {oMlpReg.loss_:0.4f}')
    print(f'The train score (R2)     : {trainScore:0.4f}')
    print(f'The validation score (R2): {valScore:0.4f}')

* <font color='green'>(**@**)</font> Add an early stopping feature to the loop above based on the validation score and the loss.
* <font color='green'>(**@**)</font> Create an adaptive learning rate policy. In case of many epochs with a non decreasing loss, reduce the learning rate by a factor of 2.
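
One possible shape for the early stopping exercise is sketched below. It is not a reference solution: it runs on hypothetical random toy data, and `patience`, `maxEpoch` and the `Toy` suffixed names are assumptions made for the illustration. The loop stops once the validation score has not improved for `patience` consecutive epochs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical toy data instead of MNIST
rng        = np.random.default_rng(0)
mXTrainToy = rng.random((300, 16))
mXValToy   = rng.random((100, 16))

oToyAe = MLPRegressor(hidden_layer_sizes = [8, 2, 8], activation = 'tanh', random_state = 0)

maxEpoch     = 50
patience     = 5       #<! Assumed: Number of epochs to wait for an improvement
bestScore    = -np.inf #<! Best validation score so far
numBadEpochs = 0       #<! Consecutive epochs with no improvement
lValScoreToy = []

for ii in range(maxEpoch):
    oToyAe.partial_fit(mXTrainToy, mXTrainToy) #<! A single pass over the train data
    valScore = oToyAe.score(mXValToy, mXValToy) #<! R2 on the validation data
    lValScoreToy.append(valScore)

    if valScore > bestScore:
        bestScore    = valScore
        numBadEpochs = 0
    else:
        numBadEpochs += 1

    if numBadEpochs >= patience:
        print(f'Early stopping after {ii + 1} epochs.')
        break

print(f'The best validation score (R2): {bestScore:0.4f}')
```

The same counter could also drive a learning rate policy, though `MLPRegressor` does not expose a public way to change the learning rate between calls to `partial_fit()`.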

In [None]:
# Plot the Results

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)
hAT = hA.twinx()

hP1, = hAT.plot(range(numEpoch), lTrainLoss, color = 'C0', label = 'Train Loss')
hAT.set_ylabel('Loss')

hP2, = hA.plot(range(numEpoch), lTrainScore, color = 'C1', label = 'Train Score')
hP3, = hA.plot(range(numEpoch), lValScore, color = 'C2', label = 'Validation Score')

hA.set_title('Score and Loss of the Training Loop')

hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score')

hA.legend(handles = [hP1, hP2, hP3])
# hA.legend()

plt.show()

In [None]:
# oReg = MLPRegressor(hidden_layer_sizes = [500, 300, 2, 300, 500], activation = 'tanh', solver = 'adam', max_iter = 20, learning_rate_init = 0.0005, tol = 0.0000001, verbose = True)
# oReg = oReg.fit(mXTrain, mXTrain)
# oMlpReg = oReg

In [None]:
# Plot a Reconstruction

mXRec = np.zeros(shape = (numImgRec, MNIST_IMG_SIZE[0] * MNIST_IMG_SIZE[1]))
vImgIdx = np.random.choice(mXTest.shape[0], numImgRec, replace = False)

for ii in range(numImgRec):
    mXRec[ii] = oMlpReg.predict(np.atleast_2d(mXTest[vImgIdx[ii]]))

PlotMnistImages(mXRec, vYTest[vImgIdx], numRows, numCols, randomChoice = False)

plt.show()



In [None]:
# Train on Test Only
# We want a fair comparison to the vanilla SVM Classifier.
# Hence we retrain the dimensionality reduction model on the test set.

# Batch = 5000,  Epochs = 50
# learning_rate_init = 0.00015 -> xxx
# oMlpReg = MLPRegressor(hidden_layer_sizes = [200, 100, 50, 25, 10, 2, 10, 25, 50, 100, 200], activation = 'tanh', solver = 'adam', alpha = regFctr, batch_size = 1000, learning_rate_init = 0.00005, random_state = seedNum)

In [None]:
# Epoch Loop
# numEpoch = 250

# lTrainLoss  = [] #<! Train set loss
# lTrainScore = [] #<! Train set score

# for ii in range(numEpoch):
#     print(f'Processing epoch #{(ii + 1):03d} out of {numEpoch} epochs.')
    
#     oMlpReg = oMlpReg.partial_fit(mXTest, mXTest) 

#     # Accuracy Score
#     trainScore = oMlpReg.score(mXTest, mXTest)

#     lTrainLoss.append(oMlpReg.loss_)
#     lTrainScore.append(trainScore)
#     print(f'The train loss (MSE)     : {oMlpReg.loss_:0.4f}')
#     print(f'The train score (R2)     : {trainScore:0.4f}')

## Latent Space Analysis (Dimensionality Reduction)

In order to analyze the latent space (Which has 2 features), we need to recreate the encoder part of the model.  
Since we have the attributes `coefs_` and `intercepts_` with the knowledge about the activation type we can reproduce the forward pass of the _encoder_.

Once we have access to the _encoder_ we can analyze the latent space.  
This is basically a dimensionality reduction step by _auto encoder_.

* <font color='brown'>(**#**)</font> In a dedicated Deep Learning framework we would build the encoder and the decoder as 2 models and chain them under the same loss.  
  Then the forward pass of each part becomes trivial.
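
Before trusting a manual forward pass, it can be checked against `predict()`. The following is a minimal sketch on hypothetical random toy data (the names `mXToy` and `oToyAe` are illustrative). It assumes the `tanh` activation for the hidden layers and relies on the output layer of `MLPRegressor` being linear:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Hypothetical toy data: 100 samples with 12 features each
rng   = np.random.default_rng(0)
mXToy = rng.random((100, 12))

oToyAe = MLPRegressor(hidden_layer_sizes = [6, 2, 6], activation = 'tanh', max_iter = 30, random_state = 0)

with warnings.catch_warnings():
    # The short training is not expected to converge, hence the warning is silenced
    warnings.simplefilter('ignore', category = ConvergenceWarning)
    oToyAe.fit(mXToy, mXToy)

# Manual forward pass through all layers
mA        = mXToy
numLayers = len(oToyAe.coefs_)
for ii in range(numLayers):
    mA = (mA @ oToyAe.coefs_[ii]) + oToyAe.intercepts_[ii]
    if ii < (numLayers - 1):
        mA = np.tanh(mA) #<! Hidden layers use the model's activation
    # The last (Output) layer is left linear

isMatch = np.allclose(mA, oToyAe.predict(mXToy))
print(f'The manual forward pass matches `predict()`: {isMatch}')
```

Stopping the same loop at the bottleneck layer yields the encoder output.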

In [None]:
# Encoder Function
# The input is a fitted Auto Encoder model and the samples to apply the encoder on.
# It assumes the model is symmetric or the index of the bottleneck is given.

def ApplyEncoder( oAutoEnc: MLPRegressor, mX: np.ndarray, latentSpaceIdx: int = None ) -> np.ndarray:
    
    dModelParams = oAutoEnc.get_params()

    if dModelParams['activation'] == 'identity':
        hActLayer = lambda x: x
    elif dModelParams['activation'] == 'logistic':
        hActLayer = lambda x: sp.special.expit(x)
    elif dModelParams['activation'] == 'tanh':
        hActLayer = lambda x: np.tanh(x)
    elif dModelParams['activation'] == 'relu':
        hActLayer = lambda x: np.maximum(x, 0)
    
    lC = oAutoEnc.coefs_
    lI = oAutoEnc.intercepts_
    if latentSpaceIdx is None:
        latentSpaceIdx = len(lC) // 2

    mZ = mX.copy()

    for ii in range(latentSpaceIdx):
        mZ = hActLayer((mZ @ lC[ii]) + lI[ii])
    
    return mZ


In [None]:
# Apply the encoder on the test data
mZ = ApplyEncoder(oMlpReg, mXTest)

In [None]:
hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)

hA = PlotScatterData(mZ, vL = vYTest, hA = hA)

plt.show()

## Applying a Classifier

Given the encoding of the data, we can apply a classifier on the 2 numbers of the latent space in order to classify the digit.
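
The mechanics of classifying from 2 features can be tried in isolation. The sketch below uses hypothetical, well separated synthetic blobs in place of the latent codes (`mZToy`, `vYToy` and the blob centers are assumptions made for the illustration), so its accuracy says nothing about MNIST:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Hypothetical 2D "latent" features: 3 well separated clusters
mZToy, vYToy = make_blobs(n_samples = 300, centers = [[-5, -5], [0, 5], [5, -5]], cluster_std = 0.5, random_state = 0)

# Out of fold predictions using 5 folds cross validation
vYPredToy = cross_val_predict(SVC(kernel = 'rbf'), mZToy, vYToy, cv = 5)

accToy = np.mean(vYPredToy == vYToy)
print(f'The toy latent space accuracy in 5 folds cross validation is {accToy:0.2%}.')
```

With real latent codes the classes overlap, so the accuracy depends on how well the encoder separates the digits.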

In [None]:
# Creating the Features

mZ = ApplyEncoder(oMlpReg, mXTest)

In [None]:
# Model Score on the RAW Data

vYPred = cross_val_predict(SVC(kernel = 'rbf'), mXTest, vYTest, cv = numKFold)

accScore = np.mean(vYPred == vYTest)
print(f'The raw data model accuracy in {numKFold} cross validation is {accScore:0.2%}.')

In [None]:
# Model Score on Latent Data

vYPred = cross_val_predict(SVC(kernel = 'rbf'), mZ, vYTest, cv = numKFold)

accScore = np.mean(vYPred == vYTest)
print(f'The latent space data model accuracy in {numKFold} cross validation is {accScore:0.2%}.')

* <font color='green'>(**@**)</font> Try to optimize the hyper parameters of the _Auto Encoder_ model to improve the accuracy.