[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## Machine Learning - Classification - K Nearest Neighbors (K-NN) Classifier

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 09/03/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0032ClassifierKnn.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Image Processing

# Miscellaneous
import math
import os
from platform import python_version
import random
import timeit

# Typing
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

In [None]:
# Courses Packages

from DataVisualization import PlotBinaryClassData


In [None]:
# General Auxiliary Functions



In [None]:
# Parameters

# Data Generation
numCircles0 = 250
numCircles1 = 250
numSwaps    = 50 #<! Number of samples to swap between inner circle and outer circle
noiseLevel  = 0.03


# Data Visualization
elmSize     = ELM_SIZE_DEF
classColor0 = CLASS_COLOR[0]
classColor1 = CLASS_COLOR[1]

numGridPts = 250

## Exercise

In this exercise we'll do the following:

1. Apply a K-NN Classifier on the [_Breast Cancer Wisconsin (Diagnostic) Data Set_](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
2. Visualize pairs of features.

## Generate / Load Data


In [None]:
# Load Data 

dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target

lFeatName = dData.feature_names

print(f'The features data shape: {mX.shape}')
print(f'The labels data shape: {vY.shape}')

In [None]:
# Pre Process Data
# Scale each feature so its minimum value maps to 0 and its maximum value maps to 1.


#===========================Fill This===========================#
# 1. Normalize Data (Features) into [0, 1].
mX = ???
#===============================================================#
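
One possible approach to the normalization step is per feature min-max scaling. The sketch below is an illustration rather than the notebook's reference solution: `NormalizeMinMax` is a hypothetical helper name, and the synthetic `mData` matrix is an assumption standing in for the features matrix.

```python
import numpy as np

def NormalizeMinMax( mX: np.ndarray ) -> np.ndarray:
    # Hypothetical helper (not part of the notebook): scales each column into [0, 1]
    vMin   = np.min(mX, axis = 0) #<! Minimum of each feature
    vMax   = np.max(mX, axis = 0) #<! Maximum of each feature
    vRange = np.where(vMax > vMin, vMax - vMin, 1.0) #<! Avoid division by zero for constant columns
    return (mX - vMin) / vRange

# Synthetic stand-in for the features matrix (assumption, for illustration only)
oRng      = np.random.default_rng(512)
mData     = oRng.normal(loc = 5.0, scale = 3.0, size = (100, 4))
mDataNorm = NormalizeMinMax(mData)

print(f'Minimum per feature: {mDataNorm.min(axis = 0)}')
print(f'Maximum per feature: {mDataNorm.max(axis = 0)}')
```

In the notebook the same idea would be applied to `mX`. Scaling matters for K-NN since features with a large numeric range would otherwise dominate the distance.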

In [None]:
# Data Dimensions 
numSamples  = mX.shape[0]
print(f'The features data shape: {mX.shape}') #<! Should be (569, 30)

<font color='red'>(**?**)</font> Should we add a constant column (a column of ones) to the features for this classifier? What effect would it have?

## Train a K-NN Classifier

### Visualizing High Dimensional Data

We're limited to displaying low dimensional data (usually 2 or 3 dimensions, a bit more with creativity).  
In this case the data is $\boldsymbol{x}_{i} \in \mathbb{R}^{30}$. 

One way to still work with the data is to show a subset of the features and their behavior.  
In the next example we'll explore the scatter plot of 2 features with their labels and predictions.
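
As a rough illustration of working with a feature pair, the sketch below fits a K-NN classifier on two selected features and reports the training accuracy. It is not the notebook's reference solution. It assumes the classifier is trained only on the two selected columns, and the values of `paramK`, `metricChoice` and `lSlcFeature` are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Load the data and scale each feature into [0, 1]
dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target
mX    = (mX - mX.min(axis = 0)) / (mX.max(axis = 0) - mX.min(axis = 0))

# Assumed settings for the example (not taken from the notebook)
lSlcFeature  = [0, 1] #<! Indices of the 2 features to use
paramK       = 5      #<! Number of neighbors
metricChoice = 'l2'   #<! Distance metric

oKnnClassifier = KNeighborsClassifier(n_neighbors = paramK, metric = metricChoice) #<! Creating the object
oKnnClassifier = oKnnClassifier.fit(mX[:, lSlcFeature], vY) #<! Training on the 2 selected features
vYY            = oKnnClassifier.predict(mX[:, lSlcFeature]) #<! Prediction on the training data
scoreAcc       = oKnnClassifier.score(mX[:, lSlcFeature], vY) #<! Accuracy on the training data

print(f'Features: {dData.feature_names[lSlcFeature[0]]}, {dData.feature_names[lSlcFeature[1]]}')
print(f'Training accuracy: {scoreAcc:0.2%}')
```

Note that this is the accuracy on the training data itself, so it is an optimistic estimate. With `paramK = 1` each sample is its own nearest neighbor and the score is typically perfect unless the data contains duplicated points with different labels.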

In [None]:
# Plotting Function

dMetric = {'l1': 'L1', 'l2': 'L2', 'cosine': 'Cosine'}
dFeaturesByIdx  = {}
dFeaturesByName = {}

for ii, featName in enumerate(lFeatName):
    dFeaturesByIdx[ii]          = featName
    dFeaturesByName[featName]   = ii

def PlotKnn( K, metricChoice, mX, vY, featXName, featYName ):
    lSlcFeature = [dFeaturesByName[featXName], dFeaturesByName[featYName]]
    
    # Train a K-NN classifier
    #===========================Fill This===========================#
    oKnnClassifier = ??? #<! Creating the object
    oKnnClassifier = ??? #<! Training on the data
    #===============================================================#
    
    # Predict
    #===========================Fill This===========================#
    vYY = ??? #<! Prediction
    #===============================================================#

    # Score (Accuracy)
    #===========================Fill This===========================#
    scoreAcc = ??? #<! Score
    #===============================================================#

    # Plot classification
    hF, hA = plt.subplots(figsize = (8, 8))
    hA = PlotBinaryClassData(mX[:, lSlcFeature], vYY, hA = hA, elmSize = 4 * ELM_SIZE_DEF, classColor = ('c', 'm'), axisTitle = f'K-NN Classifier: $K = {K}$, Metric: {dMetric[metricChoice]}, Accuracy: {scoreAcc:0.2%}')
    hA = PlotBinaryClassData(mX[:, lSlcFeature], vY, hA = hA, elmSize = ELM_SIZE_DEF)

    tLegend = hA.get_legend_handles_labels()
    lLegendLabels = tLegend[1]
    for ii, labelTxt in enumerate(lLegendLabels):
        if ii < 2:
            labelTxt += ' Predictor'
        else:
            labelTxt += ' Ground Truth'
        
        lLegendLabels[ii] = labelTxt
    
    hA.set_xlabel(featXName)
    hA.set_ylabel(featYName)
    
    hA.legend(handles = tLegend[0], labels = lLegendLabels)
    

In [None]:
# Interaction Elements

kSlider                 = IntSlider(min = 1, max = 21, step = 2, value = 1, layout = Layout(width = '30%'))
metricDropdown          = Dropdown(options = ['l1', 'l2', 'cosine'], value = 'l2', description = 'Metric')
featXSelectionSlider    = SelectionSlider(options = lFeatName, value = dFeaturesByIdx[0], description = 'Feature 1 (x)', layout = Layout(width = '30%'))
featYSelectionSlider    = SelectionSlider(options = lFeatName, value = dFeaturesByIdx[1], description = 'Feature 2 (y)', layout = Layout(width = '30%'))

In [None]:
# Display the Geometry of the Classifier

hPlotKnn = lambda K, metricChoice, featXName, featYName: PlotKnn(K, metricChoice, mX, vY, featXName, featYName)
interact(hPlotKnn, K = kSlider, metricChoice = metricDropdown, featXName = featXSelectionSlider, featYName = featYSelectionSlider)

plt.show()

## Curse of Dimensionality

The [Curse of Dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) implies that the distribution of distances behaves differently as the dimension grows.  

* <font color='green'>(**@**)</font> Show a graph of the ratio between the mean distance between points in a cube and the maximum distance as a function of `d`.
* <font color='green'>(**@**)</font> Given the volume of a ball in $\mathbb{R}^{d}$ (See [Volume of an $n$ Ball](https://en.wikipedia.org/wiki/Volume_of_an_n-ball)), show the relation between the volume of the ball inscribed within the _unit cube_ in $\mathbb{R}^{d}$ and the cube itself.
* <font color='red'>(**?**)</font> Since the ratio between the volume of the ball and the unit cube goes to zero, what does it mean about the interior of the cube if points are uniformly drawn in the cube?
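
The two optional tasks above can be explored numerically with the sketch below. It is an illustration under stated assumptions, not a reference solution: `BallToCubeVolumeRatio` and `MeanToMaxDistRatio` are hypothetical helper names, the inscribed ball is taken to have radius $0.5$ with volume $\frac{\pi^{d / 2}}{\Gamma \left( \frac{d}{2} + 1 \right)} {0.5}^{d}$, and the maximum distance is taken as the largest distance among the sampled pairs rather than the cube diagonal $\sqrt{d}$.

```python
import math
import numpy as np
from scipy.spatial.distance import pdist

def BallToCubeVolumeRatio( d: int ) -> float:
    # Volume of the ball of radius 0.5 in R^d (the unit cube has volume 1)
    return (math.pi ** (d / 2) / math.gamma(d / 2 + 1)) * (0.5 ** d)

def MeanToMaxDistRatio( d: int, numPts: int = 200, seedNum: int = 512 ) -> float:
    # Empirical ratio of the mean to the maximum pairwise Euclidean distance
    # for points drawn uniformly in the unit cube in R^d
    oRng  = np.random.default_rng(seedNum)
    mP    = oRng.random((numPts, d)) #<! Uniform samples in [0, 1]^d
    vDist = pdist(mP) #<! All pairwise distances
    return float(np.mean(vDist) / np.max(vDist))

for d in [2, 3, 5, 10, 20, 50, 100]:
    print(f'd = {d:3d}: Volume ratio = {BallToCubeVolumeRatio(d):0.3e}, Mean / Max distance = {MeanToMaxDistRatio(d):0.3f}')
```

Plotting both quantities against `d` (for instance with `plt.plot()`) gives the requested graphs. The volume ratio decays quickly toward zero, while the mean distance approaches the maximum distance, which is the sense in which distances become less informative in high dimensions.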