[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## Machine Learning - Classification - Linear Classifier Support Vector Machine (SVM) - Exercise

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.001 | 13/03/2024 | Royi Avital | Added explanation on the `LinearSVC` class parameters              |
| 1.0.000 | 09/03/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0030LinearClassifierSVM.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning

# Image Processing


# Miscellaneous
import os
from platform import python_version
import random
import timeit

# Typing
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2 #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi line fill (at least one line)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

In [None]:
# Courses Packages

from DataVisualization import Plot2DLinearClassifier, PlotBinaryClassData


In [None]:
# General Auxiliary Functions



In [None]:
# Parameters

# Data Generation


# Data Visualization
numGridPts = 250

## Exercise

In this exercise we'll do the following:

1. Apply SVM Classifier on the [_Breast Cancer Wisconsin (Diagnostic) Data Set_](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
2. Use the `predict()` method of the SVM object.
3. Implement our own score function: `ClsAccuracy()`.
4. Compare it to the method `score()` of the SVM object.
5. Find the value of the parameter `C` which maximizes the accuracy.

* <font color='brown'>(**#**)</font> This notebook uses the `SVC` class. When only the `linear` kernel is needed, the `LinearSVC` class may be a better choice, as it is designed to scale to larger data sets.
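
A minimal sketch comparing the two classes on this notebook's data set, assuming `C = 1.0` (an arbitrary value chosen for illustration). Keep in mind that `LinearSVC` minimizes the squared hinge loss by default and, being based on liblinear, also regularizes the intercept, so the two models are expected to be close but not identical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC, LinearSVC

# Load and standardize the data (Same pre processing as in this notebook)
mX, vY = load_breast_cancer(return_X_y = True)
mX     = (mX - np.mean(mX, axis = 0)) / np.std(mX, axis = 0)
vY     = np.where(vY == 0, -1, 1) #<! Labels in {-1, 1}

# Kernel based solver (libsvm), restricted to the linear kernel
oSvc = SVC(C = 1.0, kernel = 'linear').fit(mX, vY)

# Linear only solver (liblinear)
# `dual = False` is set explicitly since the default of `dual` differs between SciKit Learn versions
oLinSvc = LinearSVC(C = 1.0, dual = False).fit(mX, vY)

accSvc    = oSvc.score(mX, vY)
accLinSvc = oLinSvc.score(mX, vY)
agreeRate = np.mean(oSvc.predict(mX) == oLinSvc.predict(mX)) #<! Fraction of samples with the same prediction

print(f'SVC (linear kernel) training accuracy: {accSvc:0.2%}')
print(f'LinearSVC training accuracy          : {accLinSvc:0.2%}')
print(f'Prediction agreement                 : {agreeRate:0.2%}')
```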

## Generate / Load Data


In [None]:
# Load Modules

#===========================Fill This===========================#
# 1. Load the `load_breast_cancer` function from the `sklearn.datasets` module.
# 2. Load the `SVC` class from the `sklearn.svm` module.

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
#===============================================================#

In [None]:
# Load Data 

dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')

In [None]:
# Pre Process Data
# Standardize the features to zero mean and unit variance and map the labels into {-1, 1}.

#===========================Fill This===========================#
# 1. Standardize the data (features): each column should have zero mean and unit standard deviation.
# 2. Transform the labels into {-1, 1}.

mX = mX - np.mean(mX, axis = 0)
mX = mX / np.std(mX, axis = 0)

vY[vY == 0] = -1
#===============================================================#

* <font color='brown'>(**#**)</font> The term _normalization_ is ambiguous in this context. In some cases it refers to rescaling the data using its minimum and maximum values (e.g. into the range $[0, 1]$) rather than to standardization.
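
To make the distinction concrete, a small sketch on toy data (illustration only) contrasting standardization, as done in the cell above, with Min-Max scaling, and checking both against their SciKit Learn counterparts (`StandardScaler` and `MinMaxScaler`). Note how the single large value in the second column compresses the other values toward $0$ under Min-Max scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

mX = np.array([[1.0, 10.0],
               [2.0, 20.0],
               [3.0, 60.0]]) #<! Toy data: 3 samples, 2 features

# Standardization (z-score): zero mean and unit standard deviation per column
mXStd = (mX - np.mean(mX, axis = 0)) / np.std(mX, axis = 0)

# Min-Max scaling: maps each column into the range [0, 1]
vMin     = np.min(mX, axis = 0)
vMax     = np.max(mX, axis = 0)
mXMinMax = (mX - vMin) / (vMax - vMin) #<! [[0, 0], [0.5, 0.2], [1, 1]]

# SciKit Learn equivalents (`StandardScaler` uses the population standard deviation, as `np.std()` does by default)
print(np.allclose(mXStd, StandardScaler().fit_transform(mX)))   #<! True
print(np.allclose(mXMinMax, MinMaxScaler().fit_transform(mX)))  #<! True
```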

In [None]:
# Data Dimensions

numSamples  = mX.shape[0]
print(f'The features data shape: {mX.shape}') #>! Should be (569, 30)

* <font color='red'>(**?**)</font> Does the data have a constant column of $1$ or $-1$?  
* <font color='red'>(**?**)</font> Should we add a constant column? Look at the [mathematical formulation of the SVC in SciKit Learn](https://scikit-learn.org/stable/modules/svm.html#mathematical-formulation).  
* <font color='red'>(**?**)</font> What's the `intercept_` attribute of the object?  
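
A minimal sketch touching the questions above, assuming `C = 1.0` for illustration. It checks the raw features for a constant column and verifies that, for the linear kernel, the decision function is $\boldsymbol{w}^{T}\boldsymbol{x}+b$, where $\boldsymbol{w}$ is stored in `coef_` and $b$ in `intercept_`. SciKit Learn uses the $+b$ sign convention, while the formulation at the end of this notebook uses $-b$; the two differ only in the sign of the learned $b$.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

mXRaw, vY = load_breast_cancer(return_X_y = True)

# A constant column has equal minimum and maximum values (Checked on the raw data, before standardization)
hasConstCol = bool(np.any(np.max(mXRaw, axis = 0) == np.min(mXRaw, axis = 0)))
print(f'Has a constant column: {hasConstCol}')

mX = (mXRaw - np.mean(mXRaw, axis = 0)) / np.std(mXRaw, axis = 0)
vY = np.where(vY == 0, -1, 1)

oSvc = SVC(C = 1.0, kernel = 'linear').fit(mX, vY)

vW   = oSvc.coef_.ravel() #<! Weights, shape (30,) (The `coef_` attribute exists only for the linear kernel)
valB = oSvc.intercept_[0] #<! The bias term, learned by the solver

# For a binary problem the decision function is w^T x + b
vDecision = oSvc.decision_function(mX)
vManual   = mX @ vW + valB

# Positive values map to `classes_[1]` (Here 1), the rest to `classes_[0]` (Here -1)
vYHat = oSvc.classes_[(vManual > 0).astype(int)]

print(f'Max deviation from `decision_function()`: {np.max(np.abs(vDecision - vManual)):0.2e}')
print(f'Agreement with `predict()`: {np.mean(vYHat == oSvc.predict(mX)):0.2%}')
```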

## Train an SVM Classifier

This section trains an SVM Classifier using SciKit Learn.

### The SciKit Learn Package

In the course, from now on, we'll mostly use modules and functions from the [SciKit Learn](https://scikit-learn.org) package.  
It is best known for its uniform API of `<model>.fit()` and `<model>.predict()`.  
This simple convention makes models composable: they can be chained into pipelines, where several steps (e.g. pre processing and a classifier) form a single larger model.
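
A minimal sketch of such chaining, using `make_pipeline()` to combine the standardization and the classifier into a single model, assuming `C = 1.0` for illustration. Transformers share `fit()` / `transform()` and models share `fit()` / `predict()`, so the pipeline itself exposes `fit()`, `predict()` and `score()`. One benefit of this design: the scaling statistics are learned inside `fit()`, only from the data passed to it.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

mX, vY = load_breast_cancer(return_X_y = True) #<! Raw (Not standardized) features

# Chain the pre processing and the classifier into a single model
oPipe = make_pipeline(StandardScaler(), SVC(C = 1.0, kernel = 'linear'))

oPipe.fit(mX, vY)            #<! Fits the scaler, transforms the data and fits the SVC
vYHat    = oPipe.predict(mX) #<! Applies the fitted scaler, then the SVC
modelAcc = oPipe.score(mX, vY)

print(f'Pipeline training accuracy: {modelAcc:0.2%}')
```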

### SVM Classifier Class

In [None]:
# Construct the SVC Object
# Use the SVC constructor and the parameters below.

paramC     = 0.0001
kernelType = 'linear'

#===========================Fill This===========================#
# 1. Create a realization of the `SVC` class using the `C` and `kernel` parameters.
oSvmClassifier = SVC(C = paramC, kernel = kernelType)
#===============================================================#

In [None]:
# Train the Model

#===========================Fill This===========================#
# 1. Train the model using the `fit()` method.
oSvmClassifier.fit(mX, vY)
#===============================================================#


* <font color='blue'>(**!**)</font> Create a function called `ClsAccuracy( oCls, mX, vY )`.  
  The function inputs are a model with a `predict()` method, the data and the labels.  
  The function output is the accuracy of the model (in the range $[0, 1]$).

In [None]:
# Scoring (Accuracy) Function

#===========================Fill This===========================#
# 1. Implement the function `ClsAccuracy()` as defined.

def ClsAccuracy( oCls, mX: np.ndarray, vY: np.ndarray ) -> np.floating:
    '''
    Calculates the accuracy (Fraction) of a model.
        oCls - A fitted classifier with a `predict()` method.
        mX   - The input data  mX.shape = (N, d)
        vY   - The true labels vY.shape = (N,)
    '''

    return np.mean(oCls.predict(mX) == vY)

#===============================================================#
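
One subtle pitfall with the implementation above: if the labels arrive as a column vector of shape `(N, 1)` while `predict()` returns shape `(N,)`, NumPy broadcasting turns `==` into an `(N, N)` comparison and the mean is silently wrong. A minimal sketch, using a hypothetical toy classifier (`SignClassifier`) and a hypothetical defensive variant (`ClsAccuracyRobust()`), illustrates the issue and one way to guard against it.

```python
import numpy as np

class SignClassifier:
    '''Hypothetical toy classifier: predicts the sign of the first feature.'''
    def predict(self, mX: np.ndarray) -> np.ndarray:
        return np.where(mX[:, 0] >= 0, 1, -1) #<! Shape (N,)

def ClsAccuracyRobust( oCls, mX: np.ndarray, vY: np.ndarray ) -> np.floating:
    '''Accuracy which flattens both arrays before comparing them.'''
    vYHat = np.ravel(oCls.predict(mX))
    vY    = np.ravel(vY)
    if vYHat.size != vY.size:
        raise ValueError('The number of predictions and labels must match')
    return np.mean(vYHat == vY)

mX    = np.array([[1.0, 0.0], [-2.0, 1.0], [0.5, -1.0], [-0.1, 2.0]])
vY    = np.array([1, -1, -1, -1])
vYCol = vY[:, None] #<! Same labels as a column vector, shape (4, 1)
oCls  = SignClassifier()

accNaive  = np.mean(oCls.predict(mX) == vYCol) #<! Broadcasts into a (4, 4) array: 0.5 (Wrong)
accRobust = ClsAccuracyRobust(oCls, mX, vYCol) #<! 0.75 (3 out of 4 correct)

print(f'Naive: {accNaive}, Robust: {accRobust}')
```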

In [None]:
# Score the Model

modelAcc = ClsAccuracy(oSvmClassifier, mX, vY)

print(f'The model accuracy on the training data is: {modelAcc:0.2%}')

* <font color='blue'>(**!**)</font> Compare the manual scoring function to the `score()` method of the classifier.

In [None]:
# Comparing the Score

#===========================Fill This===========================#
# 1. Use the model's method `score()` to evaluate the accuracy.
modelAccRef = oSvmClassifier.score(mX, vY)
#===============================================================#

print(f'The model accuracy (Based on the `score()` method) on the training data is: {modelAccRef:0.2%}')
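
For classifiers, `score()` is documented as returning the mean accuracy, and SciKit Learn also exposes the same metric as the function `accuracy_score()` in `sklearn.metrics`. A minimal sketch confirming the three computations agree; it re-creates the model with `C = 0.0001` as above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

mX, vY = load_breast_cancer(return_X_y = True)
mX     = (mX - np.mean(mX, axis = 0)) / np.std(mX, axis = 0)
vY     = np.where(vY == 0, -1, 1)

oSvc  = SVC(C = 0.0001, kernel = 'linear').fit(mX, vY) #<! Same `C` as in this notebook
vYHat = oSvc.predict(mX)

accManual = np.mean(vYHat == vY)      #<! What `ClsAccuracy()` computes
accScore  = oSvc.score(mX, vY)        #<! The `score()` method of the classifier
accMetric = accuracy_score(vY, vYHat) #<! The metric function (Note the order: true labels first)

print(accManual, accScore, accMetric)
```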

### Optimizing the Parameter `C` of the Model

 * <font color='blue'>(**!**)</font> Create an array of values of the parameter `C`.
 * <font color='blue'>(**!**)</font> Create a loop which checks the score for each `C` value.
 * <font color='blue'>(**!**)</font> Keep the `C` value which maximizes the score.

In [None]:
#===========================Fill This===========================#
numParams = 100 #<! Number of different values of `C`
lC = np.linspace(0.001, 5, numParams) #<! The list of `C` values to optimize over

dBestScore = {'Accuracy': 0, 'C': 0} #<! Dictionary to keep the highest score and the corresponding `C`

for ii, paramC in enumerate(lC):
    oSvmClassifier = SVC(C = paramC, kernel = 'linear') #<! Construct the SVC object
    oSvmClassifier = oSvmClassifier.fit(mX, vY) #<! Train on the data
    modelScore     = oSvmClassifier.score(mX, vY) #<! Calculate the score (Accuracy)

    if (modelScore > dBestScore['Accuracy']):
        dBestScore['Accuracy'] = modelScore #<! Update the new best score
        dBestScore['C'] = paramC #<! Update the corresponding `C` hyper parameter
    
#===============================================================#

print(f'The best model has accuracy of {dBestScore["Accuracy"]:0.2%} with `C = {dBestScore["C"]}`')
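
The loop above keeps only the best score, while plotting the score as a function of `C` requires the score of every value. A minimal sketch that stores all scores in an array and plots them. It swaps the linear grid for a logarithmic one (`np.logspace()`), an assumption motivated by the effect of `C` typically varying over orders of magnitude; the grid bounds are arbitrary. As in the cell above, the scores are measured on the training data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

mX, vY = load_breast_cancer(return_X_y = True)
mX     = (mX - np.mean(mX, axis = 0)) / np.std(mX, axis = 0)
vY     = np.where(vY == 0, -1, 1)

vC     = np.logspace(-4, 1, 11) #<! 11 values from 1e-4 to 1e1, evenly spaced on a log scale
vScore = np.zeros(len(vC))      #<! Score per value of `C`

for ii, paramC in enumerate(vC):
    oSvc       = SVC(C = paramC, kernel = 'linear').fit(mX, vY)
    vScore[ii] = oSvc.score(mX, vY)

bestIdx = np.argmax(vScore) #<! First index of the maximum score
print(f'Best accuracy: {vScore[bestIdx]:0.2%} with `C = {vC[bestIdx]:0.4g}`')

hF, hA = plt.subplots(figsize = (8, 5))
hA.semilogx(vC, vScore, marker = 'o')
hA.set_xlabel('C')
hA.set_ylabel('Accuracy (Training Data)')
hA.set_title('SVC Training Accuracy vs. C')
plt.show()
```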

* <font color='blue'>(**!**)</font> Plot the score of the model as a function of the parameter `C`.  
* <font color='red'>(**?**)</font> Is the above a good strategy to optimize the model?  
* <font color='green'>(**@**)</font> Read the documentation of the [`LinearSVC`](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html) class.   
  Pay attention to the effect of `penalty` on the ${\left\| {\color{orange}\boldsymbol{w}} \right\|}_{p}^{p}$ term and `loss` on the ${\color{magenta}\xi_{i}}:=\max\left\{ 0,1-{\color{green}y_{i}}\left({\color{orange}\boldsymbol{w}^{T}}{\color{green}\boldsymbol{x}_{i}}-{\color{orange}b}\right)\right\}$ term.  
  See explanation on [Meaning of `penalty` and `loss` in `LinearSVC`](https://stackoverflow.com/questions/68819288).
* <font color='green'>(**@**)</font> Read the documentation of the [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) class. Try other values of `kernel`.
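
A minimal sketch of the effect of `penalty` in `LinearSVC`, assuming `C = 0.01` (an arbitrary, strongly regularizing value): the $\ell_{1}$ penalty tends to zero out weights, producing a sparse ${\color{orange}\boldsymbol{w}}$, while the $\ell_{2}$ penalty shrinks them without zeroing them. The exact number of non zero weights depends on `C`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC

mX, vY = load_breast_cancer(return_X_y = True)
mX     = (mX - np.mean(mX, axis = 0)) / np.std(mX, axis = 0)
vY     = np.where(vY == 0, -1, 1)

paramC = 0.01 #<! Arbitrary small value (Strong regularization)

# `penalty` selects the norm on `w`: 'l2' -> ||w||_2^2, 'l1' -> ||w||_1
# `loss` selects the per sample term: 'hinge' -> xi_i, 'squared_hinge' -> xi_i^2
# `penalty = 'l1'` is supported only with `loss = 'squared_hinge'` and `dual = False`
oSvcL2 = LinearSVC(C = paramC, penalty = 'l2', loss = 'squared_hinge', dual = False).fit(mX, vY)
oSvcL1 = LinearSVC(C = paramC, penalty = 'l1', loss = 'squared_hinge', dual = False).fit(mX, vY)

numNnzL2 = np.count_nonzero(oSvcL2.coef_)
numNnzL1 = np.count_nonzero(oSvcL1.coef_)

print(f'Non zero weights (L2 penalty): {numNnzL2} / {mX.shape[1]}')
print(f'Non zero weights (L1 penalty): {numNnzL1} / {mX.shape[1]}')
```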