[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io/)

# Machine Learning Methods

## Supervised Learning - Classification - Binary Classification on Breast Cancer Data - Exercise Solution

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 17/09/2022 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/0006ClassifierBinaryExerciseSolution.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import load_breast_cancer

# Misc
import datetime
import os
from platform import python_version
import random
import warnings
import yaml

# Typing
from typing import Tuple

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from bokeh.plotting import figure, show

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
%matplotlib inline

warnings.filterwarnings("ignore")

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF = (8, 8)
ELM_SIZE_DEF = 50
CLASS_COLOR = ('b', 'r')


In [None]:
# Fixel Algorithms Packages


In [None]:
# Parameters


# Data Visualization
figSize     = (8, 8)
elmSize     = 50
classColor0 = 'b'
classColor1 = 'r'

numGridPts = 250

In [None]:
# Auxiliary Functions



## Generate / Load Data

We'll use the [_Breast Cancer Wisconsin (Diagnostic) Data Set_](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).



In [None]:
# Load Data 

dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')

In [None]:
# Print Data Description

print(dData.DESCR)

In [None]:
# Features Description

print(dData.feature_names)


<font color='brown'>(**#**)</font> [Fractal Dimension](https://en.wikipedia.org/wiki/Fractal_dimension) in this context measures how curvy and jagged the perimeter of the object is (the features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass).

In [None]:
print(f'The unique values of the labels: {np.unique(vY)}')

In [None]:
# Pre Process Data

# Normalize Data (Features)
mX = mX - np.mean(mX, axis = 0)
mX = mX / np.std (mX, axis = 0)

# Transforming the Labels into {-1, 1}
vY[vY == 0] = -1

### Matrix Form of the Data (Parameterization)

We want to prepend a constant column of $-1$ to the features (the bias term):

$$
\boldsymbol{X} = \begin{bmatrix}
-1 & - & x_{1} & - \\
-1 & - & x_{2} & - \\
\vdots & & \vdots & \\
-1 & - & x_{N} & -
\end{bmatrix} \in \mathbb{R}^{N \times 31} $$



Tasks:

* <font color='blue'>(**!**)</font> Set `numSamples` to be the number of samples (use `len` or `np.shape`).
* <font color='blue'>(**!**)</font> Redefine `mX` with the constant column, as above.

Make sure that `mX.shape == (569, 31)`.
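
The effect of the constant column can be illustrated with a minimal sketch on made up toy data. The names `mXToy`, `mXAug` and `vWToy` are hypothetical and not part of the exercise. With the $-1$ column in place, the first weight acts as a threshold subtracted from the score of the remaining features.

```python
import numpy as np

# Toy data (made up for illustration): 4 samples, 3 features
mXToy = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [1.0, 0.0, 1.0]])

numSamplesToy = mXToy.shape[0]

# Prepend the constant column of -1
mXAug = np.column_stack((-np.ones(numSamplesToy), mXToy))

# Hypothetical weights: the first element multiplies the constant column
vWToy = np.array([0.5, 1.0, -1.0, 2.0])

# Score with the constant column vs. feature score minus the threshold
vScoreAug = mXAug @ vWToy
vScoreRef = (mXToy @ vWToy[1:]) - vWToy[0]

print(f'The augmented data shape: {mXAug.shape}')
print(f'Scores match: {np.allclose(vScoreAug, vScoreRef)}')
```

The same `np.column_stack()` pattern applies to the real data, where the 30 features become 31 columns.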

In [None]:
#===========================Fill This===========================#
numSamples  = mX.shape[0]
mX          = np.column_stack((-np.ones(numSamples), mX))
#===============================================================#

print(f'The features data shape: {mX.shape}') #>! Should be (569, 31)

<font color='red'>(**?**)</font> Why can't we plot the data?

### Calculation Building Blocks

 * The [Sigmoid Function](https://en.wikipedia.org/wiki/Sigmoid_function) (a member of the _S Shaped_ function family), scaled here to the range $\left( -1, 1 \right)$ to match the labels $\left\{ -1, 1 \right\}$:

$$ \sigma \left( x \right) = 2 \frac{ \exp \left( x \right) }{ 1 + \exp \left( x \right) } - 1 = 2 \frac{ 1 }{ 1 + \exp \left( -x \right) } - 1 $$

<font color='brown'>(**#**)</font> In practice such a function requires a numerically stable implementation. Use professionally made implementations if available.  
<font color='brown'>(**#**)</font> See [`scipy.special.expit()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.expit.html) for $\frac{ 1 }{ 1 + \exp \left( -x \right) }$.

 * The gradient of the Sigmoid function:

$$ \frac{\mathrm{d} \sigma \left( x \right) }{\mathrm{d} x} = 2 \frac{ \exp \left( x \right)}{\left( 1 + \exp \left( x \right) \right)^{2}} = 2 \left( \frac{ 1 }{ 1 + \exp \left( -x \right) } \right) \left( 1 - \frac{ 1 }{ 1 + \exp \left( -x \right) } \right) $$

<font color='brown'>(**#**)</font> For derivation of the last step, see https://math.stackexchange.com/questions/78575.

 * The loss function:

$$ J \left( \boldsymbol{w} \right) = \frac{1}{4 N} {\left\| \sigma \left( X \boldsymbol{w} \right) - \boldsymbol{y} \right\|}_{2}^{2} $$

 * The gradient of the loss function:

$$ \nabla_{\boldsymbol{w}} J \left( \boldsymbol{w} \right) = \frac{1}{2N} {X}^{T} \operatorname{diag} \left( \sigma' \left( X \boldsymbol{w} \right) \right) \left( \sigma \left( X \boldsymbol{w}\right) - \boldsymbol{y} \right) $$

 * The accuracy function:

$$ \text{Accuracy} = \frac{1}{N} \sum_{i = 1}^{N} \mathbb{I} \left\{ \hat{y}_{i} = y_{i} \right\}, \; \text{where} \; \hat{y}_{i} = \operatorname{sign} \left( \boldsymbol{w}^{T} \boldsymbol{x}_{i} \right) $$

 * The Gradient Descent step:

$$ \boldsymbol{w}_{k + 1} = \boldsymbol{w}_{k} - \mu \nabla_{\boldsymbol{w}} J \left( \boldsymbol{w}_{k} \right) $$
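
Before building the loss on top of the scaled sigmoid, its closed form derivative can be sanity checked against a central finite difference. The following is a minimal sketch: the names `ScaledSigmoid` and `ScaledSigmoidGrad`, the grid, and the step size `h` are assumptions made for illustration only.

```python
import numpy as np
from scipy.special import expit

def ScaledSigmoid(vX):
    # Sigmoid scaled to (-1, 1): 2 * expit(x) - 1
    return (2 * expit(vX)) - 1

def ScaledSigmoidGrad(vX):
    # Closed form derivative: 2 * expit(x) * (1 - expit(x))
    vS = expit(vX)
    return 2 * vS * (1 - vS)

vGrid = np.linspace(-5, 5, 101) #<! Evaluation grid (arbitrary choice)
h     = 1e-5                    #<! Step size (arbitrary choice)

# Central finite difference approximation of the derivative
vGradNum = (ScaledSigmoid(vGrid + h) - ScaledSigmoid(vGrid - h)) / (2 * h)
maxErr   = np.max(np.abs(ScaledSigmoidGrad(vGrid) - vGradNum))

print(f'The maximum absolute deviation of the derivative: {maxErr}')
```

A small deviation suggests the closed form matches the function. It does not replace the full gradient check of the loss.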

In [None]:
# Defining the Functions

def SigmoidFun( vX: np.ndarray ) -> np.ndarray:
    
    return (2 * sp.special.expit(vX)) - 1

def GradSigmoidFun(vX: np.ndarray) -> np.ndarray:

    vExpit = sp.special.expit(vX)
    
    return 2 * vExpit * (1 - vExpit)

def LossFun(mX: np.ndarray, vW: np.ndarray, vY: np.ndarray):

    numSamples = mX.shape[0]

    vR = SigmoidFun(mX @ vW) - vY
    
    return np.sum(np.square(vR)) / (4 * numSamples)

def GradLossFun(mX: np.ndarray, vW: np.ndarray, vY: np.ndarray) -> np.ndarray:

    numSamples = mX.shape[0]
    
    return (mX.T * GradSigmoidFun(mX @ vW).T) @ (SigmoidFun(mX @ vW) - vY) / (2 * numSamples)

def CalcAccuracy(mX: np.ndarray, vW: np.ndarray, vY: np.ndarray):
    
    vHatY = np.sign(mX @ vW)
    
    return np.mean(vHatY == vY)

## Training the Model (Linear Classifier for Binary Classification)

In this section we'll implement the training phase using Gradient Descent.

**Remark**: You should get an accuracy of `~98%`.


In [None]:
# Parameters

#===========================Fill This===========================#
K   = 1000 #<! Num Steps
µ   = 0.05 #<! Step Size
vW  = np.zeros(mX.shape[1]) #<! Initial w
#===============================================================#

mW = np.zeros(shape = (vW.shape[0], K)) #<! Model Parameters (Weights)
vE = np.full(shape = K, fill_value = np.nan) #<! Errors (Float array, NaN until filled)
vL = np.full(shape = K, fill_value = np.nan) #<! Loss (Float array, NaN until filled)

mW[:, 0]    = vW
vE[0]       = 1 - CalcAccuracy(mX, vW, vY)
vL[0]       = LossFun(mX, vW, vY)

#===========================Fill This===========================#
for kk in range(1, K):
    vW -= µ * GradLossFun(mX, vW, vY)

    mW[:, kk] = vW #<! Store the updated weights
    
    vE[kk] = 1 - CalcAccuracy(mX, vW, vY)
    vL[kk] = LossFun(mX, vW, vY)
#===============================================================#

In [None]:
# Plot the Results

accFinal = CalcAccuracy(mX, vW, vY)

hF, hA = plt.subplots(figsize = FIG_SIZE_DEF)

hA.plot(vE, color = 'k', lw = 2, label = 'Classification Error')
hA.plot(vL, color = 'm', lw = 2, label = r'Loss $J \left( w \right)$')
hA.set_title(f'Loss Functions\nFinal Iteration Accuracy: {accFinal:0.2%}')
hA.set_xlabel('Iteration Index')
hA.set_xlim((0, K - 1))
hA.set_ylim((0, 1))
hA.grid()
hA.legend()
    
plt.show()

## Validate the Gradient Calculation

To verify the gradient calculation, one may compare it to a numerical approximation of the gradient.  
This is usually done using the classic [Finite Difference Method](https://en.wikipedia.org/wiki/Finite_difference_method).  
Yet this method requires setting the step size parameter (the `h` parameter in Wikipedia), whose optimal value depends on $x$ and on the function itself.

A nice alternative is a trick called _Complex Step Differentiation_, which goes like:

$$ \frac{\mathrm{d} f \left( x \right) }{\mathrm{d} x} \approxeq \frac{1}{\varepsilon} \Im \left[ f \left( x + i \varepsilon \right) \right] $$

This approximation is less sensitive to the choice of the step size $\varepsilon$.

 * <font color='brown'>(**#**)</font> The tricky part of this method is the complex extension of the functions.  
   For instance, instead of `np.sum(np.abs(vX))` use `np.sum(np.sqrt(vX ** 2))`.
 * <font color='brown'>(**#**)</font> Usually setting `ε = 1e-8` works well.
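
The sensitivity to the step size can be illustrated on a scalar function with a known derivative. This is a minimal sketch under assumptions: the test function `TestFun`, the point `x0` and the list of step sizes are arbitrary choices, not part of the exercise. It compares the forward finite difference with the complex step approximation.

```python
import numpy as np

def TestFun(x):
    # Analytic function which also accepts complex input
    return np.exp(x) * np.sin(x)

def TestFunGrad(x):
    # Closed form derivative, used as the reference
    return np.exp(x) * (np.sin(x) + np.cos(x))

x0      = 1.0
gradRef = TestFunGrad(x0)

lStep   = [1e-4, 1e-8, 1e-12, 1e-16] #<! Step sizes to compare
lErrFwd = [] #<! Forward difference errors
lErrCs  = [] #<! Complex step errors

for stepSize in lStep:
    gradFwd = (TestFun(x0 + stepSize) - TestFun(x0)) / stepSize
    gradCs  = np.imag(TestFun(x0 + 1j * stepSize)) / stepSize
    lErrFwd.append(abs(gradFwd - gradRef))
    lErrCs.append(abs(gradCs - gradRef))
    print(f'Step: {stepSize:0.0e}, Forward error: {lErrFwd[-1]:0.3e}, Complex step error: {lErrCs[-1]:0.3e}')
```

The forward difference subtracts two nearly equal numbers, so its error grows once the step becomes too small. The complex step involves no subtraction, hence it tolerates very small steps.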

In [None]:
# Numerical Calculation of the Gradient by the Complex Step Trick

def CalcFunGrad( hF, vX, ε ):

    numElements = vX.shape[0]
    
    vG = np.zeros(numElements) #<! Gradient
    vP = np.zeros(numElements) #<! Perturbation
    vZ = np.array(vX, dtype = complex)

    for ii in range(numElements):
        vP[ii]  = ε
        vZ.imag = vP
        vG[ii]  = np.imag(hF(vZ)) / ε
        vP[ii]  = 0
    
    return vG


In [None]:
# Updating Functions to Support Complex Input

def SigFunComplex( vX: np.ndarray ) -> np.ndarray:

    return 1 / (1 + np.exp(-vX))


def SigmoidFunComplex( vX: np.ndarray ) -> np.ndarray:
    
    return (2 * SigFunComplex(vX)) - 1


def LossFunComplex(mX: np.ndarray, vW: np.ndarray, vY: np.ndarray):

    numSamples = mX.shape[0]

    vR = SigmoidFunComplex(mX @ vW) - vY
    
    return np.sum(np.square(vR)) / (4 * numSamples)

In [None]:
# Calculating the Gradient Numerically

ε = 1e-8
 
hL = lambda vW: LossFunComplex(mX, vW, vY)

vW = np.random.rand(mX.shape[1])
vG = CalcFunGrad(hL, vW, ε) #<! Numerical gradient

In [None]:
# Verifying the complex variant matches the reference

maxError = np.max(np.abs(LossFunComplex(mX, vW, vY) - LossFun(mX, vW, vY)))
print(f'The maximum absolute deviation of the complex variant: {maxError}')

In [None]:
maxError = np.max(np.abs(GradLossFun(mX, vW, vY) - vG))
print(f'The maximum absolute deviation of the numerical gradient: {maxError}') #<! We expect it to be less than 1e-8 for Float64