[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## Exercise 0004 - Classification

Optimizing the hyper parameters of a _Multi Class_ classifier.

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 15/03/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/Exercise0004.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


# Miscellaneous
import os
from platform import python_version
import random
import timeit

# Typing
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

L_CLASSES = ['Setosa', 'Versicolor', 'Virginica']



In [None]:
# Course Packages

from DataVisualization import PlotConfusionMatrix, PlotLabelsHistogram


In [None]:
# General Auxiliary Functions



## Exercise

This exercise introduces:

 - The [Iris Flower Data Set](https://en.wikipedia.org/wiki/Iris_flower_data_set).  
   We'll use it to practice the concepts taught in the slides.  
 - The concept of a Data Frame by utilizing [Pandas](https://pandas.pydata.org/) (`pandas`).
 - Utilizing [SeaBorn](https://seaborn.pydata.org/) for data visualization and analysis.

<!-- ![Iris Flowers](https://www.pngkey.com/png/full/82-826789_iris-iris-sepal-and-petal.png) -->
![Iris Flowers](https://i.imgur.com/zLsKxI7.png)

In this exercise we'll apply Cross Validation in the form of _Leave One Out_.  
We'll use the _cross validation_ process to optimize the models' hyper parameters (See [Hyper Parameters Optimization](https://en.wikipedia.org/wiki/Hyperparameter_optimization)).  

1. Load the [Iris Flower Data Set](https://en.wikipedia.org/wiki/Iris_flower_data_set) using `load_iris()`.
2. Apply different classification models:
    - K-NN.
    - Linear SVM.
    - Kernel SVM (Use `rbf` in `kernel` parameter of `SVC`).
3. Optimize the hyper parameters of each model
    - The parameter `n_neighbors` for the `KNeighborsClassifier` model.
    - The parameter `C` for the `SVC` model (Both for `kernel = linear` and `kernel = rbf`).
4. For optimization evaluate the score (_Accuracy_) of each model using `cross_val_score()`.  
   This function calculates the score on each fold.
5. Plot the _confusion matrix_ of the best model.
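
As a minimal sketch of step 4, the snippet below scores a single K-NN model with `cross_val_score()`. The names `oKFold` and `oKnnCls`, the choice of 3 neighbors and the 5 shuffled folds are assumptions made for illustration only; the exercise itself uses _Leave One Out_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

mX, vY = load_iris(return_X_y = True)

# The Iris samples are ordered by class, hence shuffle before splitting into folds
oKFold  = KFold(n_splits = 5, shuffle = True, random_state = 512) #<! Illustrative choice of folds
oKnnCls = KNeighborsClassifier(n_neighbors = 3) #<! Illustrative choice of K

vScore    = cross_val_score(oKnnCls, mX, vY, cv = oKFold, scoring = 'accuracy') #<! One score per fold
meanScore = np.mean(vScore) #<! The value to keep per model variant

print(f'Score per fold: {vScore}')
print(f'Mean accuracy : {meanScore:0.2%}')
```

The mean over the folds is the single number used to compare one model variant against another.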

### Pandas

The `pandas` package is the _go to_ data manipulation and analysis library in Python.  
It has optimized methods to work on a _data series_ (1D) and a _data frame_ (2D).  
It relies on NumPy for most of its numeric operations and integrates well with SeaBorn as the visualization tool.

![](https://i.imgur.com/tFl2Tob.png)
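
A short sketch of the two main objects, assuming a made up toy table (the column names and values below are not part of the Iris data):

```python
import numpy as np
import pandas as pd

dsLength  = pd.Series([5.1, 4.9, 6.3], name = 'Length') #<! Data series: 1D labeled array
dfFlowers = pd.DataFrame({'Length': [5.1, 4.9, 6.3], 'Class': ['A', 'A', 'B']}) #<! Data frame: 2D table

mValues        = dfFlowers[['Length']].to_numpy() #<! The underlying NumPy array of a sub frame
dsMeanPerClass = dfFlowers.groupby('Class')['Length'].mean() #<! Mean length per class

print(dsLength)
print(dsMeanPerClass)
```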

### SeaBorn

The Python package `seaborn` is a statistical data visualization library.  
It wraps up _Matplotlib_ with beautiful recipes and useful tools.  
It works closely with the _Pandas_ data frame object.

In [None]:
# Parameters

trainRatio = 0.8

#===========================Fill This===========================#
# 1. Think of the parameters to optimize per model (See above).
# 2. Select the set to optimize over.
# 3. Set the number of folds in the cross validation.
lK = [1, 3, 5, 7] #<! Parameters of the K-NN Model
lC = [0.1, 0.5, 1, 3] #<! Parameters of the SVM (SVC) Model
numFold = 5
#===============================================================#

## Generate / Load Data

Load the [Iris Flower Data Set](https://en.wikipedia.org/wiki/Iris_flower_data_set) using `load_iris()`.

In [None]:
# Load Data

dfX, dsY = load_iris(return_X_y = True, as_frame = True) #<! Data Frame and Data Series

print(f'The number of data samples: {dfX.shape[0]}')
print(f'The number of features per sample: {dfX.shape[1]}') 
print(f'The labels: {dsY.unique()}')

In [None]:
# Data Frame for the Whole Data

dfData = pd.concat((dfX, dsY), axis = 1)
# dfData['target'] = dfData['target'].apply(lambda x: L_CLASSES[x]) #<! Mapping from Integer -> Name
# dfData['target'] = dfData['target'].map(lambda x: L_CLASSES[x]) #<! Mapping from Integer -> Name
dfData['target'] = dfData['target'].map(L_CLASSES.__getitem__) #<! Mapping from Integer -> Name
dfData.rename(columns = {'target': 'Class'}, inplace = True) #<! Many methods have the `inplace` parameter

In [None]:
# The DF Head

dfData.head()

### Plot Data

A useful plot for multi feature data is the _pair plot_ (See `SeaBorn`'s [`pairplot()`](https://seaborn.pydata.org/generated/seaborn.pairplot.html)).  
The pair plot easily gives a view on the:

1. Relation between each pair of the features.
2. Distribution of each feature.

It is an important tool for observation of the features and their interrelation.

* <font color='brown'>(**#**)</font> You may read about it in [Data Exploration and Visualization with SeaBorn Pair Plots](https://scribe.rip/40e6d3450f6d).
* <font color='brown'>(**#**)</font> The plot matrix is $n \times n$ where $n$ is the number of features. Hence it is not practical for $n \gg 1$.



In [None]:
# Plot the Data

sns.pairplot(data = dfData, hue = 'Class')

In [None]:
# Histogram of Classes
hF, hA = plt.subplots(figsize = (8, 6))
sns.countplot(data = dfData, x = 'Class', ax = hA)
hA.set_title('Counts of Each Class')

### Data to Train

Usually we split the data into train and test sets. In this case we'll use Cross Validation, hence we'll use the whole data.  
When the data set is large, it is better to set test data aside in addition to the data used for validation.
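
For reference, a sketch of how such a hold out split could be made with `train_test_split()`, assuming a train ratio of 0.8 and a stratified split. This is not used in the rest of the exercise, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

mX, vY = load_iris(return_X_y = True)

# Stratification keeps the class proportions equal in the train and test sets
mXTrain, mXTest, vYTrain, vYTest = train_test_split(mX, vY, train_size = 0.8, stratify = vY, random_state = 512)

print(f'Train samples: {mXTrain.shape[0]}, Test samples: {mXTest.shape[0]}')
print(f'Test samples per class: {np.bincount(vYTest)}')
```

With such a split, cross validation runs on the train set only and the test set is touched once, for the final estimate.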

In [None]:
# Convert Data to NumPy
# SciKit Learn fully supports Data Frames as input (In some cases they are even required), yet here we use NumPy arrays
mX, vY = dfX.to_numpy(), dsY.to_numpy()

## Optimize Classifiers

In this section we'll train the different classifier variants and find the best of them.
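
The exercise builds the search by hand with a data frame. As an alternative, not the method used in this notebook, the same kind of search can be sketched with SciKit Learn's `GridSearchCV`. The grid values below mirror the `C` values in the parameters cell and the names `dParams` and `oGridSearch` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

mX, vY = load_iris(return_X_y = True)

# Every combination of `C` and `kernel` is evaluated using Leave One Out
dParams     = {'C': [0.1, 0.5, 1, 3], 'kernel': ['linear', 'rbf']}
oGridSearch = GridSearchCV(SVC(), param_grid = dParams, cv = LeaveOneOut(), scoring = 'accuracy')
oGridSearch = oGridSearch.fit(mX, vY)

print(f'Best parameters: {oGridSearch.best_params_}')
print(f'Best cross validation accuracy: {oGridSearch.best_score_:0.2%}')
```

The manual loop is kept in the exercise since it exposes each step of the process.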

In [None]:
# Create a Data Frame to Collect 

#===========================Fill This===========================#
# 1. Calculate the number of variants.
# 2. Create a Data Frame with 4 columns: Type, K, C, Accuracy.
# 3. Fill each column with the relevant values.
# 4. Make sure the column of `K` has integer type. 
numVariants = len(lK) + (2 * len(lC))
vType       = np.concatenate((np.full(len(lK), 'K-NN'), np.full(len(lC), 'Linear SVM'), np.full(len(lC), 'Kernel SVM'))) #<! NumPy Vector of the names of the model
# vK          = np.concatenate((np.array(lK), pd.array(np.full(len(lC), np.nan), dtype = 'UInt8'), pd.array(np.full(len(lC), np.nan), dtype = 'UInt8'))) #<! We must keep the data type as Integer
vK          = np.concatenate((np.array(lK), np.full(len(lC), 0), np.full(len(lC), 0))) #<! NumPy Vector of the K parameter (When applicable, otherwise set to 0). We must keep the data type as Integer
vC          = np.concatenate((np.full(len(lK), np.nan), np.array(lC), np.array(lC))) #<! NumPy Vector of the C parameter (When applicable, otherwise set to `np.nan`)
vA          = np.full(numVariants, np.nan) #<! Numpy Vector of the initial accuracy (Any value, will be overwritten)
#===============================================================#

dfAnalysis  = pd.DataFrame(data = {'Type': vType, 'K': vK, 'C': vC, 'Accuracy': vA})

In [None]:
# Displays the Data Frame
dfAnalysis

## Train Models

In this section we'll optimize the _hyper parameters_ of the 3 models.  
Given all those variants, we'll choose the best model.

1. Create a _data frame_ to hold the models' hyper parameters and the score (Accuracy).  
   The _data frame_ should have 4 columns:
     - Model Type: `Type`.
     - Parameter `K`: `K` (When applicable).
     - Parameter `C`: `C` (When applicable).
     - The accuracy score: `Accuracy`.
2. Loop over the models in the _data frame_ (Each row):
    - Construct the model using the parameters.
    - Evaluate the model using `cross_val_score()` where the cross validation is _Leave One Out_.
    - Average over the array of scores returned from `cross_val_score()` and keep the result in the data frame.
3. Extract the best model.


* <font color='brown'>(**#**)</font> Pay attention to the expected run time. Start with a small number of values and increase it when it makes sense.
* <font color='brown'>(**#**)</font> The _Leave One Out_ method is commonly used when the number of training samples is small.
* <font color='brown'>(**#**)</font> Setting the $k$ parameter in _K-Fold_ cross validation is a _bias variance_ tradeoff.  
   See [Cross Validated - 10 Fold Cross Validation vs. Leave One Out Cross Validation](https://stats.stackexchange.com/questions/154830).
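
_Leave One Out_ is the special case of _K-Fold_ where the number of folds equals the number of samples. A small check on a toy array (the array and the variable names are illustrative) shows that `KFold` without shuffling and `LeaveOneOut` generate the same test indices:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

mX = np.arange(12).reshape(6, 2) #<! Toy data: 6 samples, 2 features

# Collect the test indices of each fold for both splitters
lKFold = [vTestIdx.tolist() for _, vTestIdx in KFold(n_splits = mX.shape[0], shuffle = False).split(mX)]
lLoo   = [vTestIdx.tolist() for _, vTestIdx in LeaveOneOut().split(mX)]

print(f'K-Fold test indices       : {lKFold}')
print(f'Leave One Out test indices: {lLoo}')
```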

In [None]:
# Measuring the Accuracy Using K-Fold with Leave One Out

#===========================Fill This===========================#
# 1. Loop over the Data Frame.
# 2. Per row:
#       - Extract the type and parameters.
#       - Construct the model.
#       - Train it using `cross_val_score()` with 'Leave One Out' policy.
#       - Keep the average accuracy 
# 
for ii in range(numVariants):
    modelType = dfAnalysis['Type'].loc[ii]
    if modelType == 'K-NN':
        modelCls = KNeighborsClassifier(n_neighbors = dfAnalysis['K'].loc[ii])
    elif modelType == 'Linear SVM':
        modelCls = SVC(C = dfAnalysis['C'].loc[ii], kernel = 'linear')
    elif modelType == 'Kernel SVM':
        modelCls = SVC(C = dfAnalysis['C'].loc[ii], kernel = 'rbf')
    
    vAccuracy = cross_val_score(modelCls, mX, vY, cv = KFold(mX.shape[0], shuffle = False)) #<! Leave One Out
    dfAnalysis.loc[ii, 'Accuracy'] = np.mean(vAccuracy)
#===============================================================#

In [None]:
dfAnalysis

* <font color='red'>(**?**)</font> How many elements in the array `cross_val_score()` returns?
* <font color='red'>(**?**)</font> Why can't we use a stratified K-Fold in the case above?
* <font color='red'>(**?**)</font> Compare `cross_val_score()` to `cross_val_predict()`. Which one should you use? When can't we use `cross_val_score()`?
* <font color='brown'>(**#**)</font> You should get accuracy above `97%` with a proper tuning.
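
To explore the comparison, the sketch below runs both functions with _Leave One Out_ on a K-NN model (5 neighbors is an arbitrary choice and the names are illustrative). `cross_val_score()` returns one score per fold while `cross_val_predict()` returns one out of fold prediction per sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

mX, vY = load_iris(return_X_y = True)
oKnnCls = KNeighborsClassifier(n_neighbors = 5) #<! Illustrative choice of K

vScore = cross_val_score(oKnnCls, mX, vY, cv = LeaveOneOut()) #<! One score per fold
vYPred = cross_val_predict(oKnnCls, mX, vY, cv = LeaveOneOut()) #<! One prediction per sample

print(f'Number of scores     : {vScore.shape[0]}')
print(f'Number of predictions: {vYPred.shape[0]}')
print(f'Mean of the scores   : {np.mean(vScore):0.2%}')
print(f'Accuracy of the out of fold predictions: {np.mean(vYPred == vY):0.2%}')
```

With a single test sample per fold, the two accuracy estimates coincide. The predictions also allow computing scores which are not an average over folds.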

In [None]:
# Display Results
# Sort by accuracy so the best result is in the first row.

dfAnalysis.sort_values(by = 'Accuracy', ascending = False, inplace = True)
dfAnalysis

In [None]:
# Extract the Best Model

#===========================Fill This===========================#
# 1. Extract the best model type.
# 2. Extract its optimal hyper parameter: Set `paramName` for the name {'K' or 'C'} and `paramValue` as its value.
# 3. Construct the best model as `bestModel` using the above.
modelType = dfAnalysis.iloc[0, 0]
if modelType == 'K-NN':
    paramName = 'K'
    paramValue = dfAnalysis.iloc[0, 1]
    bestModel = KNeighborsClassifier(n_neighbors = paramValue)
elif modelType == 'Linear SVM':
    paramName = 'C'
    paramValue = dfAnalysis.iloc[0, 2]
    bestModel = SVC(C = paramValue, kernel = 'linear')
elif modelType == 'Kernel SVM':
    paramName = 'C'
    paramValue = dfAnalysis.iloc[0, 2]
    bestModel = SVC(C = paramValue, kernel = 'rbf')

#===============================================================#

print(f'The best model is of type {modelType} with parameter {paramName} = {paramValue}')


In [None]:
# The Best Model

#===========================Fill This===========================#
# 1. Train the best model on the whole data.
# 2. Score (Accuracy) it on the whole data.
bestModel   = bestModel.fit(mX, vY)
modelScore  = bestModel.score(mX, vY)
#===============================================================#

print(f'The model score (Accuracy) on the data: {modelScore:0.2%}') #<! Accuracy

* <font color='red'>(**?**)</font> Is the score above lower than the CV result? Why?

## Performance Metrics / Scores

In this section we'll analyze the model using the _confusion matrix_.
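
As a reminder of what the matrix holds, here is a minimal sketch using SciKit Learn's `confusion_matrix()` on made up labels (not the course's `PlotConfusionMatrix()` helper). Rows are the true classes and columns are the predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

vY     = np.array([0, 0, 1, 1, 2, 2]) #<! Made up true labels
vYPred = np.array([0, 0, 1, 2, 2, 2]) #<! Made up predictions (One sample of class 1 is predicted as class 2)

mConfMat = confusion_matrix(vY, vYPred) #<! Element (i, j): Samples of class i predicted as class j
accuracy = np.trace(mConfMat) / np.sum(mConfMat) #<! The diagonal holds the correct predictions

print(mConfMat)
print(f'Accuracy: {accuracy:0.2%}')
```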

### Display the Confusion Matrix

In [None]:
# Plot the Confusion Matrix
hF, hA = plt.subplots(figsize = (10, 10))

#===========================Fill This===========================#
# 1. Plot the confusion matrix using `PlotConfusionMatrix()`.
hA, mConfMat = PlotConfusionMatrix(vY, bestModel.predict(mX), lLabels = L_CLASSES, hA = hA)
#===============================================================#

plt.show()