[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Machine Learning Methods

## Neural Network - Regression

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 03/12/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0040ClassifierKernelSVM.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Miscellaneous
from platform import python_version
import random

# Typing
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply Seaborn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

In [None]:
# Courses Packages

from DataVisualization import PlotRegressionResults

In [None]:
# General Auxiliary Functions


## Multi Layer Perceptron (MLP) for Regression

In this exercise we'll apply Cross Validation automatically to find the optimal hyper parameters for the model.  
In order to achieve this we'll use a [Grid Search for Hyper Parameters Optimization](https://en.wikipedia.org/wiki/Hyperparameter_optimization).

1. Load the _UCI Concrete Compressive Strength_ data set (Done by the notebook).
2. Train a baseline _Ridge Regression_ model (With polynomial features).
3. Find the optimal hyper parameters of an _MLP Regressor_ using Grid Search.
4. Extract the optimal model.
5. Plot the regression results of the best model on the validation data.

### The MLP Regressor Model

![](https://i.imgur.com/qNvZJX8.png)
<!-- ![](https://i.postimg.cc/hjNXppkW/Diagrams-Multi-Layer-Perceptron-(MLP)-Regression.png) -->

The above model is given by:

$$
{\color{red}{\boldsymbol{y}}} = {\color{orange} \boldsymbol{W}_{3}} \sigma \left( {\color{orange} \boldsymbol{W}_{2}} \sigma \left( {\color{orange} \boldsymbol{W}_{1}} {\color{cyan} \boldsymbol{x}} + {\color{959516} \boldsymbol{b}_{1}} \right) + {\color{959516} \boldsymbol{b}_{2}} \right)
$$

Where 
 - The vector ${\color{cyan} \boldsymbol{x}}$ is the input layer.
 - The values ${\color{00B050} \boldsymbol{z}_{1}} = \sigma \left( {\color{orange} \boldsymbol{W}_{1}} {\color{cyan} \boldsymbol{x}} + {\color{959516} \boldsymbol{b}_{1}} \right), \; {\color{00B050} \boldsymbol{z}_{2}} = \sigma \left( {\color{orange} \boldsymbol{W}_{2}} {\color{00B050} \boldsymbol{z}_{1}} + {\color{959516} \boldsymbol{b}_{2}} \right)$ are the _Hidden Layers_.
 - The function $\sigma \left( \cdot \right)$ is the _Activation Layer_.
 - The vector ${\color{red}{\boldsymbol{y}}}$ is the output layer.
 - $ \left( {\color{orange} \boldsymbol{W}_{1}}, {\color{959516} \boldsymbol{b}_{1}} \right), \left( {\color{orange} \boldsymbol{W}_{2}}, {\color{959516} \boldsymbol{b}_{2}} \right), \left( {\color{orange} \boldsymbol{W}_{3}} \right) $ are the _Model Parameters_.
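A minimal NumPy sketch of the forward pass of the equation above, assuming a ReLU activation and column vectors. The helper `MlpForward` and the layer sizes (8 inputs, matching the 8 features of the data below, then 16, 6 and 1) are illustrative choices, not part of any library.

```python
import numpy as np

def ReLU(vX: np.ndarray) -> np.ndarray:
    # Element wise Rectified Linear Unit
    return np.maximum(vX, 0.0)

def MlpForward(vX: np.ndarray, mW1: np.ndarray, vB1: np.ndarray, mW2: np.ndarray, vB2: np.ndarray, mW3: np.ndarray, hσ = ReLU) -> np.ndarray:
    # Hypothetical helper: y = W3 σ(W2 σ(W1 x + b1) + b2)
    vZ1 = hσ(mW1 @ vX + vB1) #<! 1st hidden layer
    vZ2 = hσ(mW2 @ vZ1 + vB2) #<! 2nd hidden layer
    return mW3 @ vZ2 #<! Linear output layer (No bias, as in the diagram)

# Random parameters: 8 inputs -> 16 -> 6 -> 1 output
oRng     = np.random.default_rng(0)
vX       = oRng.standard_normal(8)
mW1, vB1 = oRng.standard_normal((16, 8)), oRng.standard_normal(16)
mW2, vB2 = oRng.standard_normal((6, 16)), oRng.standard_normal(6)
mW3      = oRng.standard_normal((1, 6))

vY = MlpForward(vX, mW1, vB1, mW2, vB2, mW3)
print(vY.shape) # (1,)
```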

</br>

* <font color='brown'>(**#**)</font> The MLP can be scaled with more hidden layers.  
  Commonly, the first hidden layers expand the number of features and the later ones reduce it toward the number of outputs.
* <font color='brown'>(**#**)</font> In the case above, the output layer is a single element vector (Scalar). Yet it may be a vector in order to approximate a _Vector Function_.
* <font color='brown'>(**#**)</font> In case the output should be bounded, one may use a _Sigmoid_ / _Hyperbolic Tangent_ as the last activation layer.  
  In such case, it is common to add a bias term for the output layer as well.
* <font color='brown'>(**#**)</font> Without _Deep Learning_ techniques (E.g. careful initialization, normalization layers, residual connections), deep MLPs are hard to train, which limits their practical depth.
* <font color='brown'>(**#**)</font> The control over the training phase in scikit-learn is limited. For more control one may use [Skorch](https://github.com/skorch-dev/skorch) or directly use [PyTorch](https://github.com/pytorch/pytorch).
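scikit-learn's `MLPRegressor` exposes the learned parameters through its `coefs_` and `intercepts_` attributes. The sketch below reproduces `predict()` by hand on synthetic data (an assumption, for illustration only). It highlights two differences from the diagram: scikit-learn uses the row vector convention (`X @ W + b`), and it also learns a bias for the output layer, whose activation is the identity (`out_activation_`). The helper `ManualPredict` is hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data (For illustration only)
oRng = np.random.default_rng(0)
mX   = oRng.standard_normal((200, 3))
vY   = mX @ np.array([1.0, -2.0, 0.5]) + 0.1 * oRng.standard_normal(200)

oMlp = MLPRegressor(hidden_layer_sizes = (8, 4), activation = 'relu', max_iter = 2_000, random_state = 0)
oMlp = oMlp.fit(mX, vY)

def ManualPredict(oMlp: MLPRegressor, mX: np.ndarray) -> np.ndarray:
    # Forward pass with the learned weights (Row vector convention: Z @ W + b)
    mZ        = mX
    numLayers = len(oMlp.coefs_)
    for ii, (mW, vB) in enumerate(zip(oMlp.coefs_, oMlp.intercepts_)):
        mZ = mZ @ mW + vB
        if ii < numLayers - 1:
            mZ = np.maximum(mZ, 0.0) #<! ReLU on the hidden layers only
    return mZ.ravel() #<! Identity activation on the output layer

print([mW.shape for mW in oMlp.coefs_]) # [(3, 8), (8, 4), (4, 1)]
print(oMlp.out_activation_) # identity
print(np.allclose(ManualPredict(oMlp, mX), oMlp.predict(mX))) # True
```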

In [None]:
# Parameters

# Data
csvFileUrl    = r'https://github.com/FixelAlgorithmsTeam/FixelCourses/raw/refs/heads/master/DataSets/UCIConcreteCompressiveStrength.csv'
trainSetRatio = 0.85

# Linear Regression (Baseline Model)
polynomDeg = 2 #<! Baseline
α          = 0.05

# MLP Regressor
numFold = 5

## Generate / Load Data

The data description (Features and target):

| Variable Name                 | Role    | Type       | Units  | Missing Values |
|-------------------------------|---------|------------|--------|----------------|
| Cement                        | Feature | Continuous | kg/m³  | No             |
| Blast Furnace Slag            | Feature | Integer    | kg/m³  | No             |
| Fly Ash                       | Feature | Continuous | kg/m³  | No             |
| Water                         | Feature | Continuous | kg/m³  | No             |
| Superplasticizer              | Feature | Continuous | kg/m³  | No             |
| Coarse Aggregate              | Feature | Continuous | kg/m³  | No             |
| Fine Aggregate                | Feature | Continuous | kg/m³  | No             |
| Age                           | Feature | Integer    | Day    | No             |
| Concrete Compressive Strength | Target  | Continuous | MPa    | No             |

The target variable is the _Concrete Compressive Strength_.

In [None]:
# Load Data 

dfData = pd.read_csv(csvFileUrl)

dfData.head(10)

In [None]:
# Data Summary

dfData.info()

In [None]:
# Data Description

dfData.describe()

### Plot Data

In [None]:
# Pair Plot

sns.pairplot(data = dfData)

In [None]:
# Correlation Matrix
mCorr = np.abs(dfData.corr())

hF, hA = plt.subplots(figsize = (6, 4))

sns.heatmap(mCorr, annot = True, fmt = '0.2f', cmap = 'coolwarm', ax = hA)
hA.xaxis.set_tick_params(rotation = 90)

* <font color='red'>(**?**)</font> Which feature is the most important?
* <font color='red'>(**?**)</font> If one feature must be dropped, which one would you drop?
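One way to approach the questions above is to rank the features by their absolute correlation with the target. A minimal pandas sketch on a small synthetic frame (the columns `A`, `B`, `C` and `Target` are placeholders). Keep in mind the Pearson correlation only captures linear relations, so a low value does not prove a feature is useless.

```python
import numpy as np
import pandas as pd

# Synthetic data: `Target` depends strongly on `A`, weakly on `B`, not on `C`
oRng   = np.random.default_rng(0)
dfDemo = pd.DataFrame(oRng.standard_normal((500, 3)), columns = ['A', 'B', 'C'])
dfDemo['Target'] = 3.0 * dfDemo['A'] - 1.0 * dfDemo['B'] + 0.1 * oRng.standard_normal(500)

# Absolute correlation of each feature with the target, sorted descending
dsCorrTarget = dfDemo.corr()['Target'].drop('Target').abs().sort_values(ascending = False)
print(dsCorrTarget)
```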

In [None]:
# The Data

dfX = dfData.copy()
dfX = dfX.drop(columns = ['Compressive Strength'])
dsY = dfData['Compressive Strength'].copy()

print(f'The features data shape: {dfX.shape}')
print(f'The labels data shape: {dsY.shape}')

In [None]:
# Train / Validation Data Split

dfXTrain, dfXVal, dsYTrain, dsYVal = train_test_split(dfX, dsY, train_size = trainSetRatio, random_state = seedNum, shuffle = True)

print(f'The training features data shape  : {dfXTrain.shape}')
print(f'The training labels data shape    : {dsYTrain.shape}')
print(f'The validation features data shape: {dfXVal.shape}')
print(f'The validation labels data shape  : {dsYVal.shape}')

## Train a Ridge Regression Regressor

The _Ridge Regression_ model will function as the baseline regressor.
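Polynomial features grow quickly with the number of inputs. For $d$ features and degree $p$ (Including the bias term), `PolynomialFeatures` produces $\binom{d + p}{p}$ features. Hence the 8 features of this data set expand to $\binom{10}{2} = 45$ features for `polynomDeg = 2`. A quick check using placeholder data (only the shape matters):

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

numFeatures = 8 #<! Number of features in the data set
polynomDeg  = 2

oPolyFeat = PolynomialFeatures(degree = polynomDeg)
mZ        = oPolyFeat.fit_transform(np.zeros((1, numFeatures))) #<! Placeholder data

print(mZ.shape[1]) # 45
print(comb(numFeatures + polynomDeg, polynomDeg)) # 45
```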

In [None]:
# Ridge Regression Linear Model
#===========================Fill This===========================#
# 1. Construct a baseline model pipeline using the _Hyper Parameters_ defined above:
#   - Data Scaler: `StandardScaler()`.
#   - Polynomial Features: `PolynomialFeatures()`.
#   - Regressor: `Ridge()`.
# 2. Train the model (Training set).
# 3. Score the model using the R2 score (Validation set). Keep result in a variable named `modelScore`.

# Pipeline

oLinReg    = Pipeline([('DataScaler', StandardScaler()), ('PolyFeatures', PolynomialFeatures(degree = polynomDeg)), ('Regressor', Ridge(alpha= α))])
oLinReg    = oLinReg.fit(dfXTrain, dsYTrain)
modelScore = oLinReg.score(dfXVal, dsYVal)
#===============================================================#

print(f'The model score (R2) on the validation data: {modelScore:0.2f}') #<! R2

## Train Multi Layer Perceptron (MLP) Regressor

This section trains an MLP model for regression.  
Using _Grid Search_ it tunes the hyper parameters of the model.  

In case of the MLP the main _Hyper Parameters_ are:

 - Number of Hidden Layers.
 - Number of neurons (Units) in each Hidden Layer.
 - Activation Function.
 - Regularization Factor.

In order to use the Grid Search we need to define:
 - The Model (`estimator`) - Which model is used.
 - The Parameters Grid (`param_grid`) - The set of parameters to try.
 - The Scoring (`scoring`) - The score used to define the best model. When `None`, the estimator's `score()` method is used (R2 for regressors).
 - The Cross Validation Iterator (`cv`) - The splitting strategy used to validate the model.
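The run time of the search scales with the number of parameter combinations times the number of folds. A sketch using `ParameterGrid` on the grid used in this notebook (2 layer configurations, 2 activations and 3 `alpha` values) with `numFold = 5`: 12 combinations give 60 fits, plus one final refit of the best combination when `refit = True` (The default).

```python
from sklearn.model_selection import ParameterGrid

dParams = {'Regressor__hidden_layer_sizes': [(50,), (25, 15)], 'Regressor__activation': ['relu', 'logistic'], 'Regressor__alpha': [0.001, 0.01, 0.1]}
numFold = 5

numComb = len(ParameterGrid(dParams)) #<! Number of parameter combinations
print(f'Number of combinations: {numComb}') # Number of combinations: 12
print(f'Number of fits: {numComb * numFold} (+1 refit)') # Number of fits: 60 (+1 refit)
```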


* <font color='brown'>(**#**)</font> Pay attention to the expected run time. Using `verbose` is useful.
* <font color='brown'>(**#**)</font> For a large number of combinations, one may try [`RandomizedSearchCV`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html) which in many cases is more efficient.
* <font color='brown'>(**#**)</font> The `GridSearchCV()` works on a single estimator instance.  
  Yet using a Pipeline, where a step may itself be a grid parameter, one may test different types of estimators.
* <font color='brown'>(**#**)</font> In production one would visualize the effect of each parameter on the model result. Then use it to fine tune the parameters further.
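Regarding testing different estimator types: a Pipeline step can itself be a grid parameter, and `param_grid` may be a list of dictionaries so each estimator gets its own sub grid. A minimal sketch on synthetic data; the step name `Regressor` matches the notebook, while the data, the grids and the use of `KNeighborsRegressor` (instead of an MLP, to keep it fast) are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data (For illustration only)
oRng = np.random.default_rng(0)
mX   = oRng.standard_normal((120, 4))
vY   = mX @ np.array([1.0, 0.5, -1.0, 2.0]) + 0.1 * oRng.standard_normal(120)

oPipe = Pipeline([('DataScaler', StandardScaler()), ('Regressor', Ridge())])

# Each dictionary replaces the `Regressor` step and tunes its own parameters
lParams = [
    {'Regressor': [Ridge()], 'Regressor__alpha': [0.1, 1.0, 10.0]},
    {'Regressor': [KNeighborsRegressor()], 'Regressor__n_neighbors': [3, 5, 9]},
]

oGs = GridSearchCV(estimator = oPipe, param_grid = lParams, cv = 5)
oGs = oGs.fit(mX, vY)
print(oGs.best_params_)
```

Since the synthetic data is linear, the linear model is expected to win here; on real data the comparison is what the search is for.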

In [None]:
# Set the Pipeline for MLP Regressor

oMlpReg = Pipeline([('DataScaler', StandardScaler()), ('Regressor', MLPRegressor(solver = 'adam', max_iter = 15_000))])

In [None]:
# Construct the Grid Search object 

#===========================Fill This===========================#
# 1. Set the parameters to iterate over and their values.
# !! The parameters are of Pipeline format: Prefixed by the step name and a double underscore `__`.
# !! One may use `n_jobs = 4` to speed up the search.
# dParams = {'Regressor__hidden_layer_sizes': [(40,), (20, 15), (10, 6, 4)], 'Regressor__activation': ['relu', 'logistic'], 'Regressor__alpha': [0.001, 0.01, 0.1]} #<! Larger grid (Longer run time)
dParams = {'Regressor__hidden_layer_sizes': [(50,), (25, 15)], 'Regressor__activation': ['relu', 'logistic'], 'Regressor__alpha': [0.001, 0.01, 0.1]}
#===============================================================#

oGsSvc = GridSearchCV(estimator = oMlpReg, param_grid = dParams, scoring = None, n_jobs = 4, cv = numFold, verbose = 4)

* <font color='brown'>(**#**)</font> Better results may be achieved with a wider model (More neurons / layers) at the cost of a longer run time.

In [None]:
# Hyper Parameter Optimization
# Training the model with each combination of hyper parameters.
# Should take ~3 minutes on a decent machine.

#===========================Fill This===========================#
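# 1. Train the model on the training data using K Fold cross validation.
# !! Since the estimator is a regressor and `cv` is an integer, `KFold` (No shuffling) is used, not `StratifiedKFold`.
oGsSvc = oGsSvc.fit(dfXTrain, dsYTrain) #<! It may take a few minutes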
#===============================================================#
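When `cv` is an integer, `GridSearchCV` uses `StratifiedKFold` only for classifiers; for a regressor it uses a plain `KFold` without shuffling. This can be verified with `check_cv()`, the utility that resolves the `cv` argument (The targets below are synthetic):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

vYReg = np.random.default_rng(0).standard_normal(100) #<! Continuous target (Regression)
vYCls = np.tile([0, 1], 50) #<! Binary target (Classification)

oCvReg = check_cv(5, y = vYReg, classifier = False)
oCvCls = check_cv(5, y = vYCls, classifier = True)

print(type(oCvReg).__name__, oCvReg.shuffle) # KFold False
print(type(oCvCls).__name__) # StratifiedKFold
```

The training data was already shuffled by `train_test_split()`, so the lack of shuffling in `KFold` is not an issue here.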

In [None]:
# Best Model
# Extract the attributes of the best model.

#===========================Fill This===========================#
# 1. Extract the best score.
# 2. Extract a dictionary of the parameters.
# !! Use the attributes of the `oGsSvc` object.
bestScore   = oGsSvc.best_score_
dBestParams = oGsSvc.best_params_
#===============================================================#

print(f'The best model had the following parameters: {dBestParams} with the CV score (R2): {bestScore:0.2f}')
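Beyond `best_params_`, the full search results are stored in `cv_results_`, which converts directly into a DataFrame. It is a convenient starting point for visualizing the effect of each parameter. A sketch with a small `Ridge` grid on synthetic data (Assumptions, used instead of the MLP to keep the run time short):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (For illustration only)
oRng = np.random.default_rng(0)
mX   = oRng.standard_normal((150, 5))
vY   = mX @ oRng.standard_normal(5) + 0.5 * oRng.standard_normal(150)

oGs = GridSearchCV(estimator = Ridge(), param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}, cv = 5)
oGs = oGs.fit(mX, vY)

# One row per parameter combination
dfResults = pd.DataFrame(oGs.cv_results_)
dfResults = dfResults[['param_alpha', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(dfResults.sort_values('rank_test_score'))
```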


In [None]:
# The Best Model

#===========================Fill This===========================#
# 1. Extract the best model.
# 2. Score the best model on the validation set.
oBestModel = oGsSvc.best_estimator_
modelScore = oBestModel.score(dfXVal, dsYVal)
#===============================================================#

print(f'The model score (R2) on the validation data: {modelScore:0.2f}') #<! R2

* <font color='red'>(**?**)</font> Is the value above the same as the score of the best model in the grid search (`bestScore`)? Why? Look at the `refit` parameter of `GridSearchCV`.
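With the default `refit = True`, after the search `GridSearchCV` retrains the best configuration on all the data passed to `fit()` (Here, the whole training set). Hence `best_estimator_` is none of the fold models, and `best_score_` (The mean CV score) is not a score of `best_estimator_`. A sketch on synthetic data using `Ridge` (An assumption, chosen since its fit is deterministic):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data (For illustration only)
oRng = np.random.default_rng(0)
mX   = oRng.standard_normal((100, 3))
vY   = mX @ np.array([2.0, -1.0, 0.5]) + 0.3 * oRng.standard_normal(100)

dGrid = {'alpha': [0.1, 1.0, 10.0]}
oGs   = GridSearchCV(estimator = Ridge(), param_grid = dGrid, cv = 5)
oGs   = oGs.fit(mX, vY)

# Manually refit the best configuration on all the data
oManual = Ridge(**oGs.best_params_).fit(mX, vY)
print(np.allclose(oManual.coef_, oGs.best_estimator_.coef_)) # True

# With `refit = False` no final model is trained
oGsNoRefit = GridSearchCV(estimator = Ridge(), param_grid = dGrid, cv = 5, refit = False).fit(mX, vY)
print(hasattr(oGsNoRefit, 'best_estimator_')) # False
```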

## Performance Metrics / Scores

This section analyzes the performance of the best model using regression scores.

### The Regression Plot

In [None]:
# Plot the Regression Plot

#===========================Fill This===========================#
# 1. Plot the Regression Plot for the best model.
# Results on the Validation Set

hF, hA = plt.subplots(figsize = (8, 6))
hA = PlotRegressionResults(dsYVal.to_numpy(), oBestModel.predict(dfXVal), hA = hA)
hA.set_title(f'Validation Set Score: {modelScore:0.2f}, CV Score: {bestScore:0.2f}');
#===============================================================#

* <font color='red'>(**?**)</font> Explain the graph. Specifically, how can the results of a multivariate regression be displayed?
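For a multivariate input the regression function itself cannot be drawn directly, yet the predictions can always be compared to the ground truth: a scatter of predicted versus true values, where a perfect model lies on the line $\hat{y} = y$. A minimal matplotlib sketch of such a plot. The function `PlotPredVsTrue` is hypothetical; it is not the course's `PlotRegressionResults`, whose implementation is not shown here. The data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import r2_score

def PlotPredVsTrue(vY: np.ndarray, vYPred: np.ndarray, hA: plt.Axes) -> plt.Axes:
    # Scatter of predictions vs. ground truth with the ideal identity line
    hA.scatter(vY, vYPred, s = 20, edgecolor = 'k', label = 'Samples')
    vLim = [min(vY.min(), vYPred.min()), max(vY.max(), vYPred.max())]
    hA.plot(vLim, vLim, 'r--', lw = 2, label = 'Ideal')
    hA.set_xlabel('Ground Truth')
    hA.set_ylabel('Prediction')
    hA.set_title(f'R2 Score: {r2_score(vY, vYPred):0.2f}')
    hA.legend()
    return hA

oRng   = np.random.default_rng(0)
vY     = oRng.uniform(10, 80, 100) #<! Synthetic targets
vYPred = vY + 5 * oRng.standard_normal(100) #<! Synthetic noisy predictions

hF, hA = plt.subplots(figsize = (6, 6))
hA = PlotPredVsTrue(vY, vYPred, hA)
```

Points far from the dashed line are samples with a large error, regardless of how many features the model uses.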