[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Machine Learning Methods

## Supervised Learning - Features Transform

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 26/01/2023 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/0015FeaturesTransform.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import make_moons
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import auc, confusion_matrix, precision_recall_fscore_support, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Miscellaneous
import os
from platform import python_version
import random

# Typing
from typing import Tuple

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
#%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF = (8, 8)
ELM_SIZE_DEF = 50
CLASS_COLOR = ('b', 'r')
EDGE_COLOR  = 'k'


In [None]:
# Fixel Algorithms Packages


## Kernel SVM by Feature Transform

In this notebook we'll imitate the effect of the _Kernel Trick_ using a feature transform.  
We'll use an _XOR Data Set_, where the samples are located in the 4 quadrants of $\mathbb{R}^{2}$ and diagonally opposite quadrants share the same label.

* <font color='brown'>(**#**)</font> Some useful tutorials on Feature Engineering are given in: [Feature Engine](https://github.com/feature-engine/feature_engine), [Feature Engine Examples](https://github.com/feature-engine/feature-engine-examples), [Python Feature Engineering Cookbook - Jupyter Notebooks](https://github.com/PacktPublishing/Python-Feature-Engineering-Cookbook).

In [None]:
# Parameters

# Data Generation
numSamples = 250 #<! Per Quadrant

# Model
paramC      = 1
kernelType  = 'linear'
lC          = [0.1, 0.25, 0.75, 1, 1.5, 2, 3]

# Data Visualization
numGridPts = 500

In [None]:
# Auxiliary Functions

def PlotBinaryClassData( mX: np.ndarray, vY: np.ndarray, hA:plt.Axes = None, figSize: Tuple[int, int] = FIG_SIZE_DEF, elmSize: int = ELM_SIZE_DEF, classColor: Tuple[str, str] = CLASS_COLOR, axisTitle: str = None ) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = figSize)
    else:
        hF = hA.get_figure()
    
    vC, vN = np.unique(vY, return_counts = True)

    numClass = len(vC)
    if (numClass != 2):
        raise ValueError(f'The input data is not binary, the number of classes is: {numClass}')

    vIdx0 = vY == vC[0]
    vIdx1 = vY == vC[1] #<! Basically ~vIdx0

    hA.scatter(mX[vIdx0, 0], mX[vIdx0, 1], s = elmSize, color = classColor[0], edgecolor = EDGE_COLOR, label = f'$C_\u007b {vC[0]} \u007d$')
    hA.scatter(mX[vIdx1, 0], mX[vIdx1, 1], s = elmSize, color = classColor[1], edgecolor = EDGE_COLOR, label = f'$C_\u007b {vC[1]} \u007d$')
    hA.axvline(x = 0, color = 'k')
    hA.axhline(y = 0, color = 'k')
    hA.axis('equal')
    if axisTitle is not None:
        hA.set_title(axisTitle)
    hA.legend()
    
    return hA

def PlotLabelsHistogram(vY: np.ndarray, hA = None):

    if hA is None:
        hF, hA = plt.subplots(figsize = (8, 6))
    
    vLabels, vCounts = np.unique(vY, return_counts = True)

    hA.bar(vLabels, vCounts, width = 0.9, align = 'center')
    hA.set_xticks(vLabels)
    hA.set_title('Histogram of Classes / Labels')
    hA.set_xlabel('Class')
    hA.set_ylabel('Number of Samples')

    return hA

def PlotConfusionMatrix(vY: np.ndarray, vYPred: np.ndarray, hA: plt.Axes = None, lLabels: list = None, dScore: dict = None, titleStr: str = 'Confusion Matrix') -> plt.Axes:

    # Calculation of Confusion Matrix
    mConfMat = confusion_matrix(vY, vYPred)
    oConfMat = ConfusionMatrixDisplay(mConfMat, display_labels = lLabels)
    oConfMat = oConfMat.plot(ax = hA)
    hA = oConfMat.ax_
    if dScore is not None:
        titleStr += ':'
        for scoreName, scoreVal in  dScore.items():
            titleStr += f' {scoreName} = {scoreVal:0.2},'
        titleStr = titleStr[:-1]
    hA.set_title(titleStr)
    hA.grid(False)

    return hA


def PlotDecisionBoundaryClosure( numGridPts, gridXMin, gridXMax, gridYMin, gridYMax, numDigits = 1 ):

    # v0       = np.linspace(gridXMin, gridXMax, numGridPts)
    # v1       = np.linspace(gridYMin, gridYMax, numGridPts)
    roundFctr = 10 ** numDigits
    
    # For equal axis
    minVal = np.floor(roundFctr * min(gridXMin, gridYMin)) / roundFctr
    maxVal = np.ceil(roundFctr * max(gridXMax, gridYMax)) / roundFctr
    v0     = np.linspace(minVal, maxVal, numGridPts)
    v1     = np.linspace(minVal, maxVal, numGridPts)
    
    XX0, XX1 = np.meshgrid(v0, v1)
    XX       = np.c_[XX0.ravel(), XX1.ravel()]

    def PlotDecisionBoundary(hDecFun, hA = None):
        
        if hA is None:
            hF, hA = plt.subplots(figsize = (8, 6))

        Z = hDecFun(XX)
        Z = Z.reshape(XX0.shape)
            
        hA.contourf(XX0, XX1, Z, colors = CLASS_COLOR, alpha = 0.3, levels = [-0.5, 0.5, 1.5])

        return hA

    return PlotDecisionBoundary
    




## Generate / Load Data


In [None]:
# Loading / Generating Data

mX1  = np.random.rand(numSamples, 2) - 0.5 + np.array([ 1,  1]).T
mX2  = np.random.rand(numSamples, 2) - 0.5 + np.array([-1, -1]).T
mX3  = np.random.rand(numSamples, 2) - 0.5 + np.array([-1,  1]).T
mX4  = np.random.rand(numSamples, 2) - 0.5 + np.array([ 1, -1]).T

mX = np.concatenate((mX1, mX2, mX3, mX4), axis = 0)
vY = np.concatenate((np.full(2 * numSamples, 1), np.full(2 * numSamples, 0)))


PlotDecisionBoundary = PlotDecisionBoundaryClosure(numGridPts, -1.5, 1.5, -1.5, 1.5)

### Plot Data

In [None]:
# Display the Data
hA = PlotBinaryClassData(mX, vY, axisTitle = 'Samples Data')

## Solution by a Linear SVM Classifier

In this section we'll search for the best Linear SVM model for the problem by trying several values of `C` and keeping the one with the highest accuracy on the training data.

* <font color='red'>(**?**)</font> What do you think the decision boundary will be? Think about symmetry.

In [None]:
# SVM Linear Model

vAcc = np.zeros(shape = len(lC))

for ii, C in enumerate(lC):
    oLinSvc  = SVC(C = C, kernel = kernelType).fit(mX, vY)
    vAcc[ii] = oLinSvc.score(mX, vY)

bestModelIdx    = np.argmax(vAcc)
bestC           = lC[bestModelIdx]

oLinSvc = SVC(C = bestC, kernel = kernelType).fit(mX, vY)

print(f'The best model with C = {bestC:0.2f} achieved accuracy of {vAcc[bestModelIdx]:0.2%}')


In [None]:
# Plot the Decision Boundary

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)
hA = PlotDecisionBoundary(oLinSvc.predict, hA)
hA = PlotBinaryClassData(mX, vY, hA = hA, axisTitle = 'Classifier Decision Boundary')
plt.show()



## Feature Transform

In this section we'll add a new feature: ${x}_{3} = {x}_{1} \cdot {x}_{2}$.
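
As a quick sanity check of why this feature may help, the sketch below rebuilds a small XOR like set and verifies that the sign of ${x}_{3}$ alone matches the class label. The names `mC`, `vL`, `mXs`, `vYs` and the sample count are illustrative assumptions which only mirror the layout of the data above; they are not part of the notebook's flow.

```python
import numpy as np

# Illustrative XOR like data: Uniform squares of side 1 around 4 centers (Assumed layout)
oRng   = np.random.default_rng(0)
mC     = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]]) #<! Cluster centers
vL     = np.array([1, 1, 0, 0])                         #<! Label per cluster
numPts = 100                                            #<! Per cluster (Arbitrary)

mXs = np.concatenate([oRng.random((numPts, 2)) - 0.5 + vC for vC in mC], axis = 0)
vYs = np.repeat(vL, numPts)

vX3     = mXs[:, 0] * mXs[:, 1]     #<! The new feature: x3 = x1 * x2
vPred   = (vX3 > 0).astype(int)     #<! Classify by the sign of x3 only
signAcc = np.mean(vPred == vYs)

print(f'Accuracy of thresholding x3 at zero: {signAcc:0.2%}')
```

Since each sample keeps the signs of its cluster center, the product is positive when both coordinates share a sign and negative otherwise. Hence a single threshold on ${x}_{3}$ separates the classes, which is a linear rule in the extended feature space.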

In [None]:
# Generate a set of features with the new feature
mXX = np.column_stack((mX, mX[:, 0] * mX[:, 1]))

## Solution by Linear SVM Classifier

In this section we'll again search for the best Linear SVM model for the problem.  
This time we'll train it on the features with the additional transformed one.

Then we'll show the decision boundary of the best model.

* <font color='red'>(**?**)</font> What do you expect the decision boundary to look like this time?
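
One possible way to keep the transform and the classifier together is a scikit-learn pipeline, so that `predict()` accepts the original 2D samples directly. The sketch below is an alternative to building the extended features manually, not the method used in this notebook. It uses `PolynomialFeatures` with `interaction_only = True`, which for two input features yields $[{x}_{1}, {x}_{2}, {x}_{1} {x}_{2}]$. The data generation and the names `oPipe`, `mXs`, `vYs` are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Illustrative XOR like data (Assumed layout, 100 samples per quadrant)
oRng = np.random.default_rng(0)
mC   = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]])
mXs  = np.concatenate([oRng.random((100, 2)) - 0.5 + vC for vC in mC], axis = 0)
vYs  = np.repeat([1, 1, 0, 0], 100)

# The transform generates [x1, x2, x1 * x2], the classifier is linear on those features
oFeat = PolynomialFeatures(degree = 2, interaction_only = True, include_bias = False)
oPipe = make_pipeline(oFeat, SVC(C = 1, kernel = 'linear'))
oPipe.fit(mXs, vYs)

numFeat  = oPipe[0].transform(mXs[:5]).shape[1] #<! Number of features after the transform
trainAcc = oPipe.score(mXs, vYs)                #<! Accuracy on the training data

print(f'Number of features: {numFeat}, Train accuracy: {trainAcc:0.2%}')
```

With such a pipeline, `oPipe.predict` can be evaluated on a grid of raw 2D points without wrapping the transform in a separate function.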

In [None]:
# SVM Linear Model

vAcc = np.zeros(shape = len(lC))

for ii, C in enumerate(lC):
    oLinSvc  = SVC(C = C, kernel = kernelType).fit(mXX, vY) #<! Pay attention we train on `mXX`
    vAcc[ii] = oLinSvc.score(mXX, vY)

bestModelIdx    = np.argmax(vAcc)
bestC           = lC[bestModelIdx]

oLinSvc = SVC(C = bestC, kernel = kernelType).fit(mXX, vY)

print(f'The best model with C = {bestC:0.2f} achieved accuracy of {vAcc[bestModelIdx]:0.2%}')


* <font color='red'>(**?**)</font> Why did the above `C` give the best results?
* <font color='red'>(**?**)</font> What's the accuracy of all other models?

In [None]:
# Plot the Decision Boundary

hPredict = lambda mX: oLinSvc.predict(np.column_stack((mX, mX[:, 0] * mX[:, 1])))

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)
hA = PlotDecisionBoundary(hPredict, hA)
hA = PlotBinaryClassData(mX, vY, hA = hA, axisTitle = 'Classifier Decision Boundary')
plt.show()

## Solution by Kernel SVM - Polynomial

In this section we'll apply a Kernel SVM with a Polynomial kernel.

* <font color='red'>(**?**)</font> What feature transform is needed for this model?
* <font color='red'>(**?**)</font> What's the minimum degree of the polynomial to solve this problem?

In [None]:
# SVM Polynomial Model

pDegree     = 4
kernelType  = 'poly'

vAcc = np.zeros(shape = len(lC))

for ii, C in enumerate(lC):
    oSvc     = SVC(C = C, kernel = kernelType, degree = pDegree).fit(mX, vY)
    vAcc[ii] = oSvc.score(mX, vY)

bestModelIdx    = np.argmax(vAcc)
bestC           = lC[bestModelIdx]

oSvc = SVC(C = bestC, kernel = kernelType, degree = pDegree).fit(mX, vY)

print(f'The best model with C = {bestC:0.2f} achieved accuracy of {vAcc[bestModelIdx]:0.2%}')


In [None]:
# Plot the Decision Boundary

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)
hA = PlotDecisionBoundary(oSvc.predict, hA)
hA = PlotBinaryClassData(mX, vY, hA = hA, axisTitle = 'Classifier Decision Boundary')
plt.show()

* <font color='green'>(**@**)</font> Do the above with the `rbf` and `sigmoid` kernels.
* <font color='blue'>(**!**)</font> Run the above with the kernel `poly` and set `degree` to 100. What happened?
* <font color='red'>(**?**)</font> How will the complexity of the calculation grow with the polynomial degree? 
* <font color='brown'>(**#**)</font> The issues above are the motivation for the _Kernel Trick_.
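
To make this motivation a bit more concrete, the sketch below checks numerically that the homogeneous polynomial kernel of degree 2, $k \left( \boldsymbol{x}, \boldsymbol{z} \right) = {\left( \boldsymbol{x}^{T} \boldsymbol{z} \right)}^{2}$, equals the inner product of the explicit features $\phi \left( \boldsymbol{x} \right) = [{x}_{1}^{2}, \sqrt{2} {x}_{1} {x}_{2}, {x}_{2}^{2}]$. It also counts the number of monomials of degree up to $p$ in $d$ variables, $\binom{d + p}{p}$, which is what an explicit transform would have to generate. The function names are illustrative. The `poly` kernel of scikit-learn also has the `gamma` and `coef0` parameters, which are left out here for simplicity.

```python
import math
import numpy as np

def PolyKernel2( vX: np.ndarray, vZ: np.ndarray ) -> float:
    # Homogeneous polynomial kernel of degree 2, calculated in the original space
    return np.dot(vX, vZ) ** 2

def FeatureMap2( vX: np.ndarray ) -> np.ndarray:
    # Explicit feature map matching `PolyKernel2()` for 2D inputs
    return np.array([vX[0] ** 2, np.sqrt(2) * vX[0] * vX[1], vX[1] ** 2])

def NumPolyFeatures( numDim: int, pDeg: int ) -> int:
    # Number of monomials of degree up to `pDeg` in `numDim` variables
    return math.comb(numDim + pDeg, pDeg)

oRng = np.random.default_rng(0)
vX   = oRng.standard_normal(2)
vZ   = oRng.standard_normal(2)

kernelVal   = PolyKernel2(vX, vZ)               #<! A single inner product in 2D
explicitVal = FeatureMap2(vX) @ FeatureMap2(vZ) #<! Inner product in the feature space

print(f'Kernel matches explicit features: {np.isclose(kernelVal, explicitVal)}')

for pDeg in (2, 4, 100):
    print(f'Degree {pDeg:3d}: {NumPolyFeatures(2, pDeg)} explicit features for 2D input')
```

The kernel evaluation costs a single inner product in the original space regardless of the degree, while the explicit feature vector grows with both the degree and the input dimension.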