[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Machine Learning Methods

## Neural Networks - Supervised Learning

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 09/03/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/0046AnomalyDetectionIsolationForest.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Miscellaneous
import os
import math
from platform import python_version
import random

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
from matplotlib.colors import LogNorm, Normalize, PowerNorm
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

DATA_FILE_URL = r'https://raw.githubusercontent.com/nsethi31/Kaggle-Data-Credit-Card-Fraud-Detection/master/creditcard.csv'


In [None]:
# Fixel Algorithms Packages


## Supervised Learning with Neural Networks

The concept of Neural Networks has been developed since ~1960.  
The basic idea is chaining basic building blocks (_atoms_).  
The most popular _atom_ is based on linear regression followed by a non-linear activation.  
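The _atom_ described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the logistic function as the non-linear activation (the same activation used later via `activation = 'logistic'`). The function names `Logistic()` and `NeuronAtom()` and the random data are hypothetical.

```python
import numpy as np

def Logistic(vZ: np.ndarray) -> np.ndarray:
    # The logistic (sigmoid) activation: 1 / (1 + exp(-z)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-vZ))

def NeuronAtom(mX: np.ndarray, vW: np.ndarray, b: float) -> np.ndarray:
    # Linear regression part: z = X w + b (one value per sample)
    vZ = mX @ vW + b
    # Non-linear activation applied element wise
    return Logistic(vZ)

rngDemo = np.random.default_rng(512)
mXDemo  = rngDemo.standard_normal((5, 3)) #<! 5 samples, 3 features (hypothetical data)
vWDemo  = rngDemo.standard_normal(3)      #<! Weights of the atom
bDemo   = 0.1                             #<! Bias of the atom

vADemo = NeuronAtom(mXDemo, vWDemo, bDemo)
print(vADemo)
```

A neural network is built by feeding the outputs of many such atoms (a layer) as the inputs of the next layer.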

* <font color='brown'>(**#**)</font> [3Blue1Brown - But What Is a Neural Network](https://www.youtube.com/watch?v=aircAruvnKk).
* <font color='brown'>(**#**)</font> The [TensorFlow Play Ground](https://playground.tensorflow.org/).

In this notebook we'll reproduce the _Hello World_ of Neural Networks: classifying the MNIST data set.

### Disclaimer: Neural Networks  $\ne$  Deep Learning

To start off, I'd like to emphasize that neural networks and deep learning are not one and the same.  
Neural networks are a general and powerful family of machine learning models, while deep learning is the practice of designing and training them in a specific form: deep architectures with many stacked layers.

> **Note:** **NN** stands for Neural Network, but you will see many specific NN types:
> * **ANN** stands for **Artificial Neural Network** and is basically a synonym of NN.
> * **RNN** and **CNN** stand for **Recurrent Neural Networks** and **Convolutional Neural Networks**, which are networks with specific kinds of layers.

* <font color='brown'>(**#**)</font> **DNN** stands for **Deep Neural Network**. _Deep_ means the architecture has many layers stacked one after the other.

### Motivation

As we already know from our experience with machine learning challenges, it is easier to fit (and also to overfit) a model to the data when the model has many degrees of freedom.
**Neural networks have many degrees of freedom.** In fact, they have so many degrees of freedom that, given enough units, data and training time, they can approximate a very wide class of functions.

* <font color='brown'>(**#**)</font> One might want to read about the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem).
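To get a feeling for the number of degrees of freedom, one can count the parameters of a fully connected network. The sketch below assumes every layer has a full weight matrix plus one bias per unit, and uses the MNIST setting of this notebook: 784 input features, hidden layers of sizes `[60, 30, 10]` and 10 output units (one per class). The helper `CountMlpParams()` is hypothetical.

```python
from typing import List

def CountMlpParams(numFeatures: int, lHidden: List[int], numClasses: int) -> int:
    # Layer widths: input -> hidden layers -> output
    lUnits = [numFeatures] + list(lHidden) + [numClasses]
    numParams = 0
    for numIn, numOut in zip(lUnits[:-1], lUnits[1:]):
        # Weight matrix (numIn x numOut) plus one bias per output unit
        numParams += numIn * numOut + numOut
    return numParams

numParamsMnist = CountMlpParams(784, [60, 30, 10], 10)
print(numParamsMnist) # 49350
```

Even this rather small network has ~50,000 parameters, most of them in the first layer (`784 * 60 + 60 = 47,100`).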

In [None]:
# Parameters

# Data
numSamplesTrain = 20_000
numSamplesVal   = 5_000


# Model
lHiddenLayers   = [60, 30, 10]
activationFun   = 'logistic'
solverType      = 'adam'
regFctr         = 0.0001 #<! L2 Regularization
batchSize       = 100
lrPolicy        = 'adaptive' #<! Works only if `solverType = sgd`
lrInit          = 0.001 #<! Works only if `solverType = sgd` or `solverType = adam`
numEpoch        = 100
stopTol         = 1e-4
earlyStopping   = True #<! Works only if `solverType = sgd` or `solverType = adam`
valRatio        = 0.15


# Visualization
numRows = 3
numCols = 3


In [None]:
# Auxiliary Functions

def PlotMnistImages(mX: np.ndarray, vY: np.ndarray = None, numRows: int = 1, numCols: int = 1, imgSize = (28, 28), randomChoice = True, hF = None):

    numSamples  = mX.shape[0]

    numImg = numRows * numCols

    # tFigSize = (numRows * 3, numCols * 3)
    tFigSize = (numCols * 3, numRows * 3)

    if hF is None:
        hF, hA = plt.subplots(numRows, numCols, figsize = tFigSize)
    else:
        hA = hF.axes #<! The list of axes of the given figure
    
    hA = np.atleast_1d(hA) #<! To support numImg = 1
    hA = hA.flat

    
    for kk in range(numImg):
        if randomChoice:
            idx = np.random.choice(numSamples)
        else:
            idx = kk
        mI  = np.reshape(mX[idx, :], imgSize)
    
        hA[kk].imshow(mI, cmap = 'gray')
        hA[kk].tick_params(axis = 'both', left = False, top = False, right = False, bottom = False, labelleft = False, labeltop = False, labelright = False, labelbottom = False)
        labelStr = f', Label = {vY[idx]}' if vY is not None else ''
        hA[kk].set_title(f'Index = {idx}' + labelStr)
    
    plt.show()

def PlotScatterData(mX: np.ndarray, vL: np.ndarray = None, hA:plt.Axes = None, figSize: Tuple[int, int] = FIG_SIZE_DEF, markerSize: int = MARKER_SIZE_DEF, edgeColor: str = EDGE_COLOR, axisTitle: str = None):

    if hA is None:
        hF, hA = plt.subplots(figsize = figSize)
    else:
        hF = hA.get_figure()
    
    numSamples = mX.shape[0]

    if vL is None:
        vL = np.zeros(numSamples)
    
    vU = np.unique(vL)
    numClusters = len(vU)

    for ii in range(numClusters):
        vIdx = vL == vU[ii]
        hA.scatter(mX[vIdx, 0], mX[vIdx, 1], s = markerSize, edgecolor = edgeColor, label = ii)
    
    hA.set_xlabel('${{x}}_{{1}}$')
    hA.set_ylabel('${{x}}_{{2}}$')
    if axisTitle is not None:
        hA.set_title(axisTitle)
    hA.grid()
    hA.legend()

    return hA


def PlotLabelsHistogram(vY: np.ndarray, hA = None, lClass = None, xLabelRot: int = None) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = (8, 6))
    
    vLabels, vCounts = np.unique(vY, return_counts = True)

    hA.bar(vLabels, vCounts, width = 0.9, align = 'center')
    hA.set_title('Histogram of Classes / Labels')
    hA.set_xlabel('Class')
    hA.set_ylabel('Number of Samples')
    hA.set_xticks(vLabels)
    if lClass is not None:
        hA.set_xticklabels(lClass)
    
    if xLabelRot is not None:
        for xLabel in hA.get_xticklabels():
            xLabel.set_rotation(xLabelRot)

    return hA



## Generate / Load Data

In this notebook we'll use the [`MNIST`](https://en.wikipedia.org/wiki/MNIST_database) data set.

It features 60,000 training images and 10,000 test images (70,000 images in total), each of size `28x28`.  
The pixels are stored as `INT64` with values in the range `{0, 1, 2, ..., 255}` (the range of `UINT8`).

* <font color='brown'>(**#**)</font> MNIST is the data set used by Yann LeCun in the 1990s to demonstrate the ability of Neural Networks.
* <font color='brown'>(**#**)</font> On the [MNIST WebSite](http://yann.lecun.com/exdb/mnist/) one can follow the performance improvement over the years using different approaches.


In [None]:
# Loading / Generating Data
mX, vY = fetch_openml('mnist_784', version = 1, return_X_y = True, as_frame = False, parser = 'auto')

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')
print(f'The unique values of the labels: {np.unique(vY)}')

In [None]:
# Pre Processing

# The pixel values are in the range {0, 1, ..., 255}
# We scale them into [0, 1]

mX = mX / 255.0

### Plot the Data

In [None]:
# Display the Data

PlotMnistImages(mX, vY, numRows, numCols)

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY)

In [None]:
# Train / Test Split

mXTrain, mXTest, vYTrain, vYTest =  train_test_split(mX, vY, train_size = numSamplesTrain, test_size = numSamplesVal, stratify = vY)

print(f'The number of training data samples: {mXTrain.shape[0]}')
print(f'The number of training features per sample: {mXTrain.shape[1]}') 


print(f'The number of test data samples: {mXTest.shape[0]}')
print(f'The number of test features per sample: {mXTest.shape[1]}') 

## Neural Network Classifier

As a classifier we'll use SciKit Learn's [`MLPClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html).  
It basically implements the Multi Layer Perceptron architecture:

![](https://i.imgur.com/FATOA17.png)



Some notes about the configuration:

 * The model allows us to define the depth and width of the network (`hidden_layer_sizes`) and the activation function (`activation`).
 * The model allows us to select the optimizer using `solver`.
 * The model allows us to set the learning rate scheduling policy using `learning_rate` (used only when `solver = 'sgd'`).
 * The model has the option of _early stopping_ based on a validation subset of the data.  
   This helps prevent overfitting of the model.
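The inference (forward pass) of such a Multi Layer Perceptron can be sketched with NumPy. This is a minimal illustration under the following assumptions: logistic activation in the hidden layers (as in `activationFun = 'logistic'`), a softmax output layer for the multi class case, and random (hypothetical) weights instead of trained ones. The names `Softmax()`, `MlpForward()`, `lWDemo` and `lBDemo` are illustrative only.

```python
import numpy as np

def Logistic(mZ: np.ndarray) -> np.ndarray:
    # Element wise logistic activation
    return 1.0 / (1.0 + np.exp(-mZ))

def Softmax(mZ: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability, then normalize each row to sum to 1
    mZ = mZ - np.max(mZ, axis = 1, keepdims = True)
    mE = np.exp(mZ)
    return mE / np.sum(mE, axis = 1, keepdims = True)

def MlpForward(mX: np.ndarray, lW: list, lB: list) -> np.ndarray:
    # Hidden layers: linear map followed by the logistic activation
    mA = mX
    for mW, vB in zip(lW[:-1], lB[:-1]):
        mA = Logistic(mA @ mW + vB)
    # Output layer: linear map followed by softmax (class probabilities)
    return Softmax(mA @ lW[-1] + lB[-1])

rngDemo = np.random.default_rng(512)
lUnits  = [784, 60, 30, 10, 10] #<! Input, hidden layers, output
lWDemo  = [0.1 * rngDemo.standard_normal((numIn, numOut)) for numIn, numOut in zip(lUnits[:-1], lUnits[1:])]
lBDemo  = [np.zeros(numOut) for numOut in lUnits[1:]]

mXDemo    = rngDemo.random((4, 784))           #<! 4 hypothetical "images"
mPDemo    = MlpForward(mXDemo, lWDemo, lBDemo) #<! Probability per class
vYHatDemo = np.argmax(mPDemo, axis = 1)        #<! Predicted class index
```

Training is the process of tuning `lW` and `lB` so that the predicted probabilities match the labels.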


To have control over each epoch, we'll use the `partial_fit()` method to apply a single epoch each time.  
Per epoch we'll compare the performance of the model on the train data vs. the test data.
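A minimal, self-contained sketch of the `partial_fit()` pattern on a small synthetic data set (generated by `make_classification()`, not MNIST, so it runs in seconds). The data set and the names `mXToy`, `vYToy` and `oToyCls` are hypothetical and chosen so they don't overwrite the notebook's variables. Each call is expected to run a single epoch, so after 3 calls `n_iter_` should be 3.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical toy data: 300 samples, 8 features, 3 classes
mXToy, vYToy = make_classification(n_samples = 300, n_features = 8, n_informative = 5, n_classes = 3, random_state = 0)
vClassesToy  = np.unique(vYToy)

oToyCls = MLPClassifier(hidden_layer_sizes = [16], solver = 'adam', batch_size = 50, random_state = 0)

lToyLoss = []
for ii in range(3):
    # Each call runs a single epoch (one pass over the data in mini batches)
    # `classes` is required on the first call, since a single call may not see all labels
    oToyCls.partial_fit(mXToy, vYToy, classes = vClassesToy)
    lToyLoss.append(oToyCls.loss_)

print(oToyCls.n_iter_, lToyLoss)
```

Since the loop is ours, we can evaluate any metric (here only the loss) between epochs, which is exactly what the training loop below does.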

* <font color='red'>(**?**)</font> One of the options for `activation` is `identity`, namely no activation. In the case of a multi-layer model, what effect will it have?
* <font color='red'>(**?**)</font> What kind of parameter is the number of hidden layers?

In [None]:
# Constructing the Model

oMlpCls = MLPClassifier(hidden_layer_sizes = lHiddenLayers, activation = activationFun, solver = solverType, alpha = regFctr, batch_size = batchSize)

* <font color='red'>(**?**)</font> Why is the length of `lHiddenLayers` equal to `n_layers - 2`? What are the missing layers?
* <font color='red'>(**?**)</font> In the context of classification, what would happen if the test set contains a label which is not in the train set?

In [None]:
# Epoch Loop

lTrainLoss  = [] #<! Train set loss
lTrainScore = [] #<! Train set score
lValScore   = [] #<! Validation set score

for ii in range(numEpoch):
    print(f'Processing epoch #{ii:03d} out of {numEpoch} epochs.')
    
    oMlpCls = oMlpCls.partial_fit(mXTrain, vYTrain, np.unique(vY)) #<! The method `partial_fit()` requires the classes on first call

    # Accuracy Score
    trainScore  = oMlpCls.score(mXTrain, vYTrain)
    valScore    = oMlpCls.score(mXTest, vYTest)

    lTrainLoss.append(oMlpCls.loss_)
    lTrainScore.append(trainScore)
    lValScore.append(valScore)
    print(f'The train loss (Log Loss)      : {oMlpCls.loss_:0.2f}')
    print(f'The train score (Accuracy)     : {trainScore:0.2%}')
    print(f'The validation score (Accuracy): {valScore:0.2%}')

In [None]:
# Plot the Results

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)

hA.plot(range(numEpoch), lTrainLoss, label = 'Train Loss')
hA.plot(range(numEpoch), lTrainScore, label = 'Train Score')
hA.plot(range(numEpoch), lValScore, label = 'Validation Score')

hA.set_title('Score and Loss of the Training Loop')

hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score / Loss')

hA.legend()

plt.show()

### Early Stopping

Using a simple policy, called _early stopping_, one can prevent overfitting by stopping the training process at (roughly) the optimal phase.  
The idea is to stop training once the validation score stops improving.

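This is achieved using the `tol`, `early_stopping`, `validation_fraction` and `n_iter_no_change` parameters of the model.
With `early_stopping = True`, a `validation_fraction` of the training data is set aside, and training stops once the validation score has not improved by at least `tol` for `n_iter_no_change` consecutive epochs.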
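The policy can be sketched as a simple patience counter. This is a simplified illustration of the idea, not SciKit Learn's exact implementation: `tol` plays the role of the minimal improvement, `patience` plays the role of `n_iter_no_change`, and the list of validation scores is hypothetical.

```python
import math
from typing import List, Tuple

def EarlyStoppingEpoch(lValScore: List[float], tol: float, patience: int) -> Tuple[int, int]:
    # Returns (index of the epoch where training stops, index of the best epoch)
    bestScore          = -math.inf
    bestEpoch          = 0
    noImprovementCount = 0
    for ii, valScore in enumerate(lValScore):
        if valScore > bestScore + tol:
            # Improvement larger than `tol`: record it and reset the counter
            bestScore          = valScore
            bestEpoch          = ii
            noImprovementCount = 0
        else:
            noImprovementCount += 1
        if noImprovementCount >= patience:
            return ii, bestEpoch
    return len(lValScore) - 1, bestEpoch

lValScoreDemo = [0.50, 0.60, 0.65, 0.655, 0.658, 0.64, 0.70] #<! Hypothetical validation scores
stopEpoch, bestEpoch = EarlyStoppingEpoch(lValScoreDemo, tol = 0.01, patience = 3)
print(stopEpoch, bestEpoch) # 5 2
```

Note that the score of `0.70` at the last epoch is never reached: the policy stops at epoch 5, since epochs 3 to 5 did not improve on epoch 2 by more than `tol`.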

* <font color='brown'>(**#**)</font> _Early stopping_ is a regularization method to prevent overfitting.
* <font color='brown'>(**#**)</font> _Early stopping_ is optimal only under the assumption that once the stopping criterion is met, the validation score won't improve anymore. This is not always true.

In [None]:
# Constructing the Model
# Setting `n_iter_no_change = numEpoch` to prevent early stopping.

oMlpCls = MLPClassifier(hidden_layer_sizes = lHiddenLayers, activation = activationFun, solver = solverType, alpha = regFctr, batch_size = batchSize, max_iter = numEpoch, random_state = seedNum, tol = stopTol, early_stopping = earlyStopping, validation_fraction = valRatio, n_iter_no_change = numEpoch)


In [None]:
# Training the Model

oMlpCls = oMlpCls.fit(mX, vY)

In [None]:
# Plot Results

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)

hA.plot(range(oMlpCls.n_iter_), oMlpCls.loss_curve_, label = 'Train Loss')
# hA.plot(range(oMlpCls.n_iter), lTrainScore, label = 'Train Score')
hA.plot(range(oMlpCls.n_iter_), oMlpCls.validation_scores_, label = 'Validation Score')

hA.set_title('Score and Loss of the Training Loop - Without Early Stopping')

hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score / Loss')

hA.legend()

plt.show()

In [None]:
# Constructing the Model
# Using the default `n_iter_no_change` (10 epochs), so early stopping is active.

oMlpCls = MLPClassifier(hidden_layer_sizes = lHiddenLayers, activation = activationFun, solver = solverType, alpha = regFctr, batch_size = batchSize, max_iter = numEpoch, random_state = seedNum, tol = stopTol, early_stopping = earlyStopping, validation_fraction = valRatio)

In [None]:
# Training the Model

oMlpCls = oMlpCls.fit(mX, vY)

In [None]:
# Plot Results

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)

hA.plot(range(oMlpCls.n_iter_), oMlpCls.loss_curve_, label = 'Train Loss')
# hA.plot(range(oMlpCls.n_iter), lTrainScore, label = 'Train Score')
hA.plot(range(oMlpCls.n_iter_), oMlpCls.validation_scores_, label = 'Validation Score')

hA.set_title('Score and Loss of the Training Loop - With Early Stopping')

hA.set_xlabel('Epoch Index')
hA.set_ylabel('Score / Loss')

hA.legend()

plt.show()

As can be seen, the training indeed stopped earlier, once the validation score stopped improving.

* <font color='green'>(**@**)</font> Manually recreate the _inference_ model using the model's `coefs_` and `intercepts_` attributes.