[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io/)

# Machine Learning Methods

## Supervised Learning - Classification - K-NN Classifier - Exercise

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 20/01/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/0010ClassifierKnnExercise.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Misc
import os
from platform import python_version
import random

# Typing
from typing import Tuple

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF = (8, 8)
ELM_SIZE_DEF = 50
CLASS_COLOR = ('b', 'r')


In [None]:
# Fixel Algorithms Packages


In [None]:
# Parameters

# Data Visualization
figSize     = (8, 8)
elmSize     = 50
classColor0 = CLASS_COLOR[0]
classColor1 = CLASS_COLOR[1]

numGridPts = 250

In [None]:
# Auxiliary Functions

def PlotBinaryClassData( mX: np.ndarray, vY: np.ndarray, hA: plt.Axes = None, figSize: Tuple[int, int] = FIG_SIZE_DEF, elmSize: int = ELM_SIZE_DEF, classColor: Tuple[str, str] = CLASS_COLOR, axisTitle: str = None ) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = figSize)
    else:
        hF = hA.get_figure()
    
    vC, vN = np.unique(vY, return_counts = True)

    numClass = len(vC)
    if (len(vC) != 2):
        raise ValueError(f'The input data is not binary, the number of classes is: {numClass}')

    vIdx0 = vY == vC[0]
    vIdx1 = vY == vC[1] #<! Basically ~vIdx0

    hA.scatter(mX[vIdx0, 0], mX[vIdx0, 1], s = elmSize, color = classColor[0], edgecolor = 'k', label = f'$C_\u007b {vC[0]} \u007d$')
    hA.scatter(mX[vIdx1, 0], mX[vIdx1, 1], s = elmSize, color = classColor[1], edgecolor = 'k', label = f'$C_\u007b {vC[1]} \u007d$')
    hA.axvline(x = 0, color = 'k')
    hA.axhline(y = 0, color = 'k')
    hA.axis('equal')
    if axisTitle is not None:
        hA.set_title(axisTitle)
    hA.legend()
    
    return hA

## Exercise

In this exercise we'll do the following:

1. Apply a K-NN Classifier on the [_Breast Cancer Wisconsin (Diagnostic) Data Set_](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
2. Visualize pairs of features.

## Generate / Load Data


In [None]:
# Load Data 

dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target

lFeatName = dData.feature_names

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')

In [None]:
# Pre Process Data

# Normalize Data (Features) into [0, 1]

#===========================Fill This===========================#
mX = ???
#===============================================================#

In [None]:
numSamples  = mX.shape[0]
print(f'The features data shape: {mX.shape}') #>! Should be (569, 30)
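
A common way to map each feature into $[0, 1]$ is min-max scaling: subtract the per column minimum and divide by the per column range. The sketch below shows the idea on a small made-up array (`mA` is hypothetical and not part of the exercise data). It is one possible approach, not necessarily the intended solution, and it assumes no column is constant (a zero range would cause a division by zero).

```python
import numpy as np

# Toy data: 4 samples, 2 features (hypothetical values for illustration only)
mA = np.array([[1.0, 10.0],
               [2.0, 20.0],
               [3.0, 40.0],
               [5.0, 30.0]])

vMin = mA.min(axis = 0) #<! Per feature (column) minimum
vMax = mA.max(axis = 0) #<! Per feature (column) maximum

# Broadcasting applies the per column shift and scale to every row
mAScaled = (mA - vMin) / (vMax - vMin)

print(mAScaled.min(axis = 0)) #<! Each column minimum is 0
print(mAScaled.max(axis = 0)) #<! Each column maximum is 1
```

Since K-NN relies on distances between samples, features with a large numeric range can dominate the distance unless the features are brought to a comparable scale.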

<font color='red'>(**?**)</font> Should we add the constant column for this classifier? What effect will it have?

## Train a K-NN Classifier

### Visualizing High Dimensional Data

We're limited to displaying low dimensional data (usually 2 or 3 dimensions, a bit more with creativity).  
In this case the data is $\boldsymbol{x}_{i} \in \mathbb{R}^{30}$. 

One way to still work with the data is to show a subset of the features and their behaviour.  
In the next example we'll explore the scatter plot of 2 features with their labels and predictions.
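
As a rough sketch of this workflow, the block below builds a tiny synthetic two-class set (the arrays `mZ` and `vL` are made up for illustration and are not the exercise data), keeps two of its columns, then fits, predicts and scores a `KNeighborsClassifier` on those same samples. The choice of `n_neighbors = 1` and `metric = 'l2'` is an assumption for the example only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 6 samples, 3 features, 2 classes (hypothetical values)
mZ = np.array([[0.0, 0.1, 5.0],
               [0.2, 0.0, 3.0],
               [0.1, 0.3, 4.0],
               [1.0, 0.9, 5.0],
               [0.8, 1.0, 3.0],
               [0.9, 0.7, 4.0]])
vL = np.array([0, 0, 0, 1, 1, 1])

lSlc = [0, 1]      #<! Indices of the two features to keep
mZ2  = mZ[:, lSlc] #<! Shape: (6, 2)

oKnn = KNeighborsClassifier(n_neighbors = 1, metric = 'l2')
oKnn = oKnn.fit(mZ2, vL)       #<! `fit()` returns the estimator itself
vPred    = oKnn.predict(mZ2)   #<! Predicted labels on the same samples
scoreAcc = oKnn.score(mZ2, vL) #<! Mean accuracy on the same samples

print(f'Accuracy: {scoreAcc:0.2%}')
```

With $K = 1$ and scoring on the training samples, each sample is its own nearest neighbor, so the accuracy in this sketch is 100%. That figure says little about performance on unseen data.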

In [None]:
# Plotting Function

dMetric = {'l1': 'L1', 'l2': 'L2', 'cosine': 'Cosine'}
dFeaturesByIdx  = {}
dFeaturesByName = {}

for ii, featName in enumerate(lFeatName):
    dFeaturesByIdx[ii]          = featName
    dFeaturesByName[featName]   = ii

def PlotKnn( K, metricChoice, mX, vY, featXName, featYName ):
    lSlcFeature = [dFeaturesByName[featXName], dFeaturesByName[featYName]]
    
    # Train the K-NN classifier
    #===========================Fill This===========================#
    oKnnClassifier = ??? #<! Creating the object
    oKnnClassifier = ??? #<! Training on the data
    #===============================================================#
    
    # Predict
    #===========================Fill This===========================#
    vYY = ??? #<! Prediction
    #===============================================================#

    # Score (Accuracy)
    #===========================Fill This===========================#
    scoreAcc = ??? #<! Score
    #===============================================================#

    # Plot classification
    hF, hA = plt.subplots(figsize = (8, 8))
    hA = PlotBinaryClassData(mX[:, lSlcFeature], vYY, hA = hA, elmSize = 4 * ELM_SIZE_DEF, classColor = ('c', 'm'), axisTitle = f'K-NN Classifier: $K = {K}$, Metric: {dMetric[metricChoice]}, Accuracy: {scoreAcc:0.2%}')
    hA = PlotBinaryClassData(mX[:, lSlcFeature], vY, hA = hA, elmSize = ELM_SIZE_DEF)

    tLegend = hA.get_legend_handles_labels()
    lLegendLabels = tLegend[1]
    for ii, labelTxt in enumerate(lLegendLabels):
        if ii < 2:
            labelTxt += ' Predictor'
        else:
            labelTxt += ' Ground Truth'
        
        lLegendLabels[ii] = labelTxt
    
    hA.set_xlabel(featXName)
    hA.set_ylabel(featYName)
    
    hA.legend(handles = tLegend[0], labels = lLegendLabels)
    

In [None]:
# Interaction Elements

kSlider                 = IntSlider(min = 1, max = 21, step = 2, value = 1, layout = Layout(width = '30%'))
metricDropdown          = Dropdown(options = ['l1', 'l2', 'cosine'], value = 'l2', description = 'Metric')
featXSelectionSlider    = SelectionSlider(options = lFeatName, value = dFeaturesByIdx[0], description = 'Feature 1 (x)', layout = Layout(width = '30%'))
featYSelectionSlider    = SelectionSlider(options = lFeatName, value = dFeaturesByIdx[1], description = 'Feature 2 (y)', layout = Layout(width = '30%'))
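
The metric dropdown offers `'l1'`, `'l2'` and `'cosine'`. To give a feel for how they differ, the sketch below computes the three distances between a pair of toy vectors with plain NumPy (`vA` and `vB` are hypothetical). The cosine distance is taken here as one minus the cosine similarity.

```python
import numpy as np

# Two toy vectors (hypothetical values for illustration only)
vA = np.array([1.0, 0.0])
vB = np.array([0.0, 2.0])

distL1  = np.sum(np.abs(vA - vB))         #<! Sum of absolute differences
distL2  = np.sqrt(np.sum((vA - vB) ** 2)) #<! Euclidean distance
cosSim  = (vA @ vB) / (np.linalg.norm(vA) * np.linalg.norm(vB))
distCos = 1.0 - cosSim                    #<! Depends only on the angle between the vectors

print(f'L1: {distL1}, L2: {distL2}, Cosine: {distCos}')
```

Scaling `vB` by a positive factor changes the L1 and L2 distances but leaves the cosine distance unchanged, which is one reason the neighbors, and hence the predictions, may differ between metrics.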

In [None]:
# Display the Geometry of the Classifier

hPlotKnn = lambda K, metricChoice, featXName, featYName: PlotKnn(K, metricChoice, mX, vY, featXName, featYName)
interact(hPlotKnn, K = kSlider, metricChoice = metricDropdown, featXName = featXSelectionSlider, featYName = featYSelectionSlider)

plt.show()