[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io/)

# Machine Learning Methods

## Exercise 007 - Dimensionality Reduction

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 03/03/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2023_01/Exercise0007DimensionalityReductionSolution.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.manifold import Isomap, MDS, SpectralEmbedding, TSNE
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Miscellaneous
import itertools
import json
import os
from platform import python_version
import random
import urllib.request
import re

# Typing
from typing import Callable, List, Tuple

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

DATA_FILE_URL   = r'https://drive.google.com/uc?export=download&confirm=9iBg&id=1UXLdZgXwClgwZVszRq88UaaN2nvgMFiC'
DATA_FILE_NAME  = r'BciData.npz'

In [None]:
# Fixel Algorithms Packages


## Exercise

In this exercise we'll use _Dimensionality Reduction_ as a feature engineering process.  
We'll try 3 different approaches to apply it: Linear, Non Linear and with manual feature engineering.

In this exercise:

1. We'll process real world EEG signals to identify the (imagined) movement of body parts.
2. We'll use dimensionality reduction with a classifier to identify the part of the body.
3. We'll optimize the combination of the dimensionality reduction and the classifier.
4. We'll visualize the features in a low dimensional (2D) space.

* <font color='brown'>(**#**)</font> This exercise won't have single line completions. It will require coding the whole block according to instructions.

In [None]:
# Parameters

numRowsPlot = 3
numColsPlot = 2


In [None]:
# Auxiliary Functions

def PlotLabelsHistogram(vY: np.ndarray, hA = None, lClass = None, xLabelRot: int = None) -> plt.Axes:

    if hA is None:
        hF, hA = plt.subplots(figsize = (8, 6))
    
    vLabels, vCounts = np.unique(vY, return_counts = True)

    hA.bar(vLabels, vCounts, width = 0.9, align = 'center')
    hA.set_title('Histogram of Classes / Labels')
    hA.set_xlabel('Class')
    hA.set_ylabel('Number of Samples')
    hA.set_xticks(vLabels)
    if lClass is not None:
        hA.set_xticklabels(lClass)
    
    if xLabelRot is not None:
        for xLabel in hA.get_xticklabels():
            xLabel.set_rotation(xLabelRot)

    return hA



## Generate / Load Data

In this exercise we'll use the Brain Computer Interfaces (BCI) Data from [BCI Competition IV](https://www.bbci.de/competition/iv/).  
Specifically we'll use [data set 2a](https://www.bbci.de/competition/iv/#dataset2a) provided by the Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology (Clemens Brunner, Robert Leeb, Gernot Müller-Putz, Alois Schlögl, Gert Pfurtscheller).  
This is a recording of EEG signals while the subject performs cued motor imagery, namely imagining the movement of the left hand, right hand, feet or tongue.

The data is composed of:

 * 22 EEG channels (0.5-100Hz; notch filtered).
 * 4 classes: left hand, right hand, feet, tongue.
 * 9 subjects.


In [None]:
# Download Data
# This section downloads data from the given URL if needed.

if not os.path.exists(DATA_FILE_NAME):
    urllib.request.urlretrieve(DATA_FILE_URL, DATA_FILE_NAME)

In [None]:
# Loading / Generating Data

dData = np.load(DATA_FILE_NAME)
mX    = dData['mX1']
vY    = dData['vY1']

# Sample: An observation of a subject.
# Measurement: Sample in time of the EEG signal.
# Channel: EEG Channel.
numSamples, numMeasurements, numChannels = mX.shape 
lLabel = ['Left Hand', 'Right Hand', 'Foot', 'Tongue'] #<! The labels

print(f'The data shape: {mX.shape}')


### Pre Process Data

The EEG signals are used as is; no scaling is applied in this solution.

In [None]:
# Pre Process Data


### Plot the Data

In [None]:
# Plot the EEG Signals

numPlots = numRowsPlot * numColsPlot

hF, hAs = plt.subplots(nrows = numRowsPlot, ncols = numColsPlot, figsize = (18, 10))
hAs = hAs.flat

vIdx = np.random.choice(numSamples, numPlots, replace = False)

for sampleIdx, hA in zip(vIdx, hAs):
    mXX = mX[sampleIdx, :, :]
    hA.plot(mXX)
    hA.set_title(f'EEG Signals of Sample {sampleIdx:04d} with Label {lLabel[vY[sampleIdx]]}')
    hA.set_xlabel('Measurement Index')
    hA.set_ylabel('Measurement Value')

hF.tight_layout()

plt.show()

In [None]:
# Histogram of Labels

hA = PlotLabelsHistogram(vY, lClass = lLabel)

* <font color='red'>(**?**)</font> What score would you use in the case above?

## Model 001

In this model we'll use a linear combination of the features with a classifier.  
The cross validation is _Leave One Out_, implemented with `cross_val_predict()` and `KFold(numSamples)`.

1. Transform data into `(numSamples, numMeasurements x numChannels)`.
2. Apply PCA with the optimal number of components.
3. Use a **non ensemble** classifier of your choice.

Objective: above 35% accuracy in _Leave One Out_ cross validation.
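A minimal sketch of the recipe above (flatten, PCA, non ensemble classifier) wrapped in a `Pipeline`, so the PCA is refit inside every _Leave One Out_ fold. It runs on small synthetic data shaped like `mX`; the sizes, the random data, the balanced labels and the choice of `LogisticRegression` are assumptions for illustration only. It also checks that `KFold(numSamples)`, as used in this notebook, yields the same splits as `LeaveOneOut()`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand in for the EEG tensor: (numSamples, numMeasurements, numChannels)
numSamples, numMeasurements, numChannels = 40, 50, 4
mX = rng.standard_normal((numSamples, numMeasurements, numChannels))
vY = np.repeat(np.arange(4), numSamples // 4)  # 4 balanced classes (assumed)

# Step 1: flatten each sample into a single feature vector
mXX = np.reshape(mX, (numSamples, -1))  # (numSamples, numMeasurements * numChannels)

# Steps 2 and 3: PCA followed by a non ensemble classifier, refit together in every fold
oPipeCls = Pipeline([('PCA', PCA(n_components = 10)), ('Classifier', LogisticRegression(max_iter = 1000))])

# KFold with numSamples folds (no shuffling) produces the same splits as LeaveOneOut
lKFoldTest = [vTestIdx.tolist() for _, vTestIdx in KFold(numSamples).split(mXX)]
lLooTest   = [vTestIdx.tolist() for _, vTestIdx in LeaveOneOut().split(mXX)]

vYPred   = cross_val_predict(oPipeCls, mXX, vY, cv = LeaveOneOut())
accScore = np.mean(vY == vYPred)
print(f'LOO Accuracy: {accScore:0.2%}')
```

Compared to fitting the PCA once on all samples, this keeps the left out sample out of the projection, at the cost of refitting the PCA `numSamples` times.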

* <font color='red'>(**?**)</font> What's the maximum number of components of the PCA you may use?
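One way to check the bound empirically: with `n_components` unset and `svd_solver = 'full'`, scikit-learn's `PCA` keeps `min(n_samples, n_features)` components, and requesting more raises a `ValueError`. The array sizes below are arbitrary assumptions. Keep in mind that inside a _Leave One Out_ fold the PCA only sees `numSamples - 1` samples.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
mXX = rng.standard_normal((30, 200))  # (numSamples, numFeatures), arbitrary sizes

# With n_components unset, all min(n_samples, n_features) components are kept
oPca    = PCA(svd_solver = 'full').fit(mXX)
maxComp = oPca.n_components_
print(maxComp)  # 30

# Requesting more components than that is rejected
try:
    PCA(n_components = maxComp + 1, svd_solver = 'full').fit(mXX)
    tooManyFailed = False
except ValueError:
    tooManyFailed = True
```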

In [None]:
# Reshape Data
mXX = np.reshape(mX, (numSamples, -1))

In [None]:
# Optimization Parameters
lModels     = [DecisionTreeClassifier(criterion = 'gini', max_leaf_nodes = 12), LogisticRegression(penalty = 'l2', C = 0.5, fit_intercept = True), SVC(C = 0.9, kernel = 'rbf')] #<! We may optimize the Model Hyper Parameters as well
lNumComp    = list(range(10, 51))
# lNumComp    = list(range(10, 12)) #<! For fast testing
lModelsStr  = ['Decision Tree', 'Logistic Regression', 'SVC']


In [None]:
# Creating the Data Frame

numComb = len(lModels) * len(lNumComp)
dData   = {'Model': [], 'Number of Components': [], 'Accuracy': [0.0] * numComb}

for ii, numComp in enumerate(lNumComp):
    for jj, modelStr in enumerate(lModelsStr):
        dData['Model'].append(modelStr)
        dData['Number of Components'].append(numComp)

dfModelScore = pd.DataFrame(data = dData)
dfModelScore

In [None]:
# Optimize the Model
currNumComp = 0

for ii in range(numComb):
    modelStr   = dfModelScore.loc[ii, 'Model']
    numComp    = dfModelScore.loc[ii, 'Number of Components']

    if numComp != currNumComp:
        mZ = PCA(n_components = numComp).fit_transform(mXX)
        currNumComp = numComp


    print(f'Processing model {ii + 1:03d} out of {numComb} with `Model` = {modelStr} and `Number of Components` = {numComp}.')

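    # A Pipeline refits the PCA in every fold, which is very slow, hence the PCA above is computed once per number of components.
    # Note: this way the PCA also sees the left out sample (a mild, unsupervised, leakage).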
    # oPipeCls = Pipeline([('PCA', PCA(n_components = numComp)), ('Classifier', lModels[lModelsStr.index(modelStr)])])
    # vYPred = cross_val_predict(oPipeCls, mXX, vY, cv = KFold(numSamples), n_jobs = 6)
    
    vYPred = cross_val_predict(lModels[lModelsStr.index(modelStr)], mZ, vY, cv = KFold(numSamples), n_jobs = 6)

    accScore = np.mean(vY == vYPred)
    dfModelScore.loc[ii, 'Accuracy'] = accScore
    print(f'Finished processing model {ii + 1:03d} with `Accuracy` = {accScore:0.2%}.')

In [None]:
# Display Sorted Results (Descending)

dfModelScore.sort_values(by = ['Accuracy'], ascending = False).head(10)

## Model 002

In this model we'll use the covariance matrix of the data as a feature.  
Basically we want to take advantage of the interaction between the different channels.

Do the following steps:

1. For each sample $\boldsymbol{X}_{i}\in\mathbb{R}^{1000 \times 22}$, compute the covariance matrix $\boldsymbol{C}_{i}\in\mathbb{R}^{22\times22}$.  
   You may use [`np.cov()`](https://numpy.org/doc/stable/reference/generated/numpy.cov.html).
2. Set $\boldsymbol{c}_{i}\in\mathbb{R}^{22^{2}}$ as the column stacked version of each $\boldsymbol{C}_{i}\in\mathbb{R}^{22 \times 22}$.  
   You may use [`np.reshape()`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) (as $\boldsymbol{C}_{i}$ is symmetric, its row major default order yields the same vector).
3. Build the feature matrix based on $\boldsymbol{c}_{i}$.
4. Analyze the separation using a non linear dimensionality reduction model (use `vY` to color the data points).
5. Use an ensemble based classifier.
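The feature construction of steps 1 to 3 can also be written without a Python loop. This is a minimal sketch on random data (the array sizes are assumptions): it computes all per sample covariance matrices at once with `np.einsum()`, checks them against `np.cov()`, and shows that for symmetric matrices the row major `np.reshape()` and the column stacked (`order = 'F'`) vectors coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
numSamples, numMeasurements, numChannels = 5, 1000, 22
mX = rng.standard_normal((numSamples, numMeasurements, numChannels))

# Center each channel of each sample (mean over the measurements axis)
mXc = mX - np.mean(mX, axis = 1, keepdims = True)

# Unbiased covariance per sample: C_i = X_i^T X_i / (N - 1), shape (numSamples, numChannels, numChannels)
mC = np.einsum('smc,smd->scd', mXc, mXc) / (numMeasurements - 1)

# Reference: np.cov() with columns as variables (equivalent to np.cov(mXi.T))
mCRef = np.stack([np.cov(mX[ii], rowvar = False) for ii in range(numSamples)])

# Feature matrix: each row is a flattened 22 x 22 covariance matrix (row major)
mXX = np.reshape(mC, (numSamples, -1))
# Column stacking (Fortran order) gives the same vectors, up to round off, as the matrices are symmetric
mXXF = np.stack([np.reshape(mC[ii], -1, order = 'F') for ii in range(numSamples)])
```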

You should optimize the combination of the classification model and the dimensionality reduction model.
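An alternative to the manual loop is a `Pipeline` whose steps are swapped by `GridSearchCV`, where each grid value is an estimator (or `'passthrough'` for no dimensionality reduction). This is a sketch under assumptions: synthetic data from `make_classification()` replaces the covariance features, and light non ensemble models stand in for the boosted trees. Since the DR step is refit per fold, t-SNE cannot be used this way, as it has no `transform()` method for unseen samples.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand in for the covariance features
mXX, vY = make_classification(n_samples = 40, n_features = 30, n_informative = 5, n_classes = 2, random_state = 0)

# Placeholder steps, replaced by the values of the grid below
oPipe = Pipeline([('DR', 'passthrough'), ('CLS', SVC())])

dParamGrid = {
    'DR' : ['passthrough', PCA(n_components = 5), Isomap(n_components = 5, n_neighbors = 8)],
    'CLS': [SVC(kernel = 'rbf'), DecisionTreeClassifier(max_leaf_nodes = 8, random_state = 0)],
}

# Every (DR, CLS) pair is scored with Leave One Out, the DR step is refit in every fold
oGridSearch = GridSearchCV(oPipe, dParamGrid, cv = LeaveOneOut(), scoring = 'accuracy')
oGridSearch.fit(mXX, vY)

print(oGridSearch.best_params_)
print(f'Best LOO Accuracy: {oGridSearch.best_score_:0.2%}')
```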

Objective: above 50% accuracy in _Leave One Out_ cross validation.

* <font color='brown'>(**#**)</font> You should also try the classifiers without any dimensionality reduction.
* <font color='brown'>(**#**)</font> If you use `t-SNE`, you may want to first apply `PCA` to reduce the data to ~50 dimensions.
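A minimal sketch of the PCA then t-SNE tip, on random data with assumed sizes: PCA first compresses the 484 covariance features to 50 dimensions, then t-SNE maps them to 2D. Note that the default Barnes-Hut t-SNE (`method = 'barnes_hut'`) supports only `n_components < 4`, which is why a 10 dimensional t-SNE embedding requires `method = 'exact'`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
mXX = rng.standard_normal((100, 22 * 22))  # (numSamples, numFeatures), random stand in

# Step 1: linear reduction to ~50 dimensions (drops the low variance directions)
mZ50 = PCA(n_components = 50).fit_transform(mXX)

# Step 2: non linear embedding of the PCA output into 2D for visualization
mZ = TSNE(n_components = 2, perplexity = 30, random_state = 0).fit_transform(mZ50)
```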

In [None]:
# Generate the Features

mC = np.full((numSamples, numChannels, numChannels), np.nan)
for ii in range(numSamples):
    mXi          = mX[ii, :, :]
    mC[ii, :, :] = np.cov(mXi.T) #<! `np.cov()` treats rows as variables, hence the transpose (channels x channels)


mXX = np.reshape(mC, (numSamples, -1))


In [None]:
# Plot the Low Dimensional Model

lPlotModels     = [Isomap(), TSNE()]
lPlotModelsStr  = ['IsoMap', 't-SNE']

hF, hAs = plt.subplots(nrows = 1, ncols = len(lPlotModels))
hAs = hAs.flat

for ii, (plotModel, hA) in enumerate(zip(lPlotModels, hAs)):
    plotModel = plotModel.set_params(n_components = 2)
    mZ = plotModel.fit_transform(mXX)

    hA.scatter(mZ[:, 0], mZ[:, 1], s = 50, c = vY, edgecolor = 'k')
    hA.set_title(lPlotModelsStr[ii])
    hA.axis('equal')


plt.show()


In [None]:
# Optimization Parameters
lModels     = [LGBMClassifier(num_leaves = 15, n_estimators = 25), XGBClassifier(n_estimators = 25, max_leaves = 15)] #<! We may optimize the Model Hyper Parameters as well
lModelsStr  = ['LightGBM', 'XGBoost']
lDrModel    = [ColumnTransformer(transformers = [('Pass', 'passthrough', list(range(numChannels * numChannels)))]), Isomap(n_components = 2), Isomap(n_components = 10), TSNE(n_components = 2), TSNE(n_components = 10, method = 'exact')]
lDrModelStr = ['Pass', 'Isomap02', 'Isomap10', 'TSNE02', 'TSNE10']


In [None]:
# Creating the Data Frame

numComb = len(lModels) * len(lDrModel)
dData   = {'CLS Model': [], 'DR Model': [], 'Accuracy': [0.0] * numComb}

for ii, drModelStr in enumerate(lDrModelStr):
    for jj, clsModelStr in enumerate(lModelsStr):
        dData['CLS Model'].append(clsModelStr)
        dData['DR Model'].append(drModelStr)

dfModelScore = pd.DataFrame(data = dData)
dfModelScore

In [None]:
# Optimize the Model
currDrModelStr = 'Default'
vYPred = np.full(shape = numSamples, fill_value = -1, dtype = np.int8)
vF = np.full(shape = numSamples, fill_value = True)

for ii in range(numComb):
    clsModelStr   = dfModelScore.loc[ii, 'CLS Model']
    drModelStr    = dfModelScore.loc[ii, 'DR Model']

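    if drModelStr != currDrModelStr:
        # The DR model is fit once on all samples (t-SNE has no `transform()`), hence the embedding is transductive
        mZ = lDrModel[lDrModelStr.index(drModelStr)].fit_transform(mXX)
        currDrModelStr = drModelStr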

    print(f'Processing model {ii + 1:03d} out of {numComb} with `CLS Model` = {clsModelStr} and `DR Model` = {drModelStr}.')
    
    vYPred = cross_val_predict(lModels[lModelsStr.index(clsModelStr)], mZ, vY, cv = KFold(numSamples))

    # Manual Leave One Out
    # modelCls = lModels[lModelsStr.index(clsModelStr)]

    # for jj in range(numSamples):
    #     vF[jj] = False
    #     modelCls = modelCls.fit(mZ[vF, :], vY[vF])
    #     vYPred[jj] = modelCls.predict(np.atleast_2d(mZ[jj]).copy())
    #     vF[jj] = True

    accScore = np.mean(vY == vYPred)
    dfModelScore.loc[ii, 'Accuracy'] = accScore
    print(f'Finished processing model {ii + 1:03d} with `Accuracy` = {accScore:0.2%}.')

In [None]:
# Display Sorted Results (Descending)

dfModelScore.sort_values(by = ['Accuracy'], ascending = False).head(10)

## Model 003

In this model we'll use the covariance matrix of the data as a feature.  
This time we'll also use a metric tailored to covariance (Symmetric Positive Definite, SPD) matrices, the SPD metric (`SpdMetric`):

$$d\left(\boldsymbol{C}_{i},\boldsymbol{C}_{j}\right)=\sqrt{\sum_{k=1}^{d}\log^{2}\left(\lambda_{k}\left(\boldsymbol{C}_{i}^{-1}\boldsymbol{C}_{j}\right)\right)}$$

Where $\lambda_{k}\left(\cdot\right)$ extracts the $k$-th eigenvalue of the matrix and $d$ is the dimension of the matrices (the number of channels, $d = 22$).
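To get a feel for the metric, the sketch below evaluates it on random SPD matrices (the sizes and the random construction are assumptions; `RandomSpd()` and `SpdDistance()` are hypothetical helpers). It computes the eigenvalues of $\boldsymbol{C}_{i}^{-1}\boldsymbol{C}_{j}$ directly and checks that the distance of a matrix to itself is zero, that it is symmetric, and that it is invariant to a congruence $\boldsymbol{C}\to\boldsymbol{A}\boldsymbol{C}\boldsymbol{A}^{T}$ with an invertible $\boldsymbol{A}$ (affine invariance).

```python
import numpy as np

def RandomSpd(numRows: int, rng: np.random.Generator) -> np.ndarray:
    # Hypothetical helper: M M^T + numRows * I is symmetric positive definite
    mM = rng.standard_normal((numRows, numRows))
    return mM @ mM.T + numRows * np.eye(numRows)

def SpdDistance(mCi: np.ndarray, mCj: np.ndarray) -> float:
    # Hypothetical helper: eigenvalues of C_i^{-1} C_j (real and positive for SPD inputs, up to round off)
    vλ = np.real(np.linalg.eigvals(np.linalg.solve(mCi, mCj)))
    # d(C_i, C_j) = sqrt(sum_k log^2(lambda_k))
    return float(np.sqrt(np.sum(np.square(np.log(vλ)))))

rng = np.random.default_rng(0)
numRows = 5
mCi = RandomSpd(numRows, rng)
mCj = RandomSpd(numRows, rng)
mA  = rng.standard_normal((numRows, numRows))  # Invertible with probability 1

dii  = SpdDistance(mCi, mCi)  # Zero (up to round off)
dij  = SpdDistance(mCi, mCj)
dji  = SpdDistance(mCj, mCi)  # Symmetric: the eigenvalues are the reciprocals
dijA = SpdDistance(mA @ mCi @ mA.T, mA @ mCj @ mA.T)  # Affine invariant
```

The invariance holds since $\left(\boldsymbol{A}\boldsymbol{C}_{i}\boldsymbol{A}^{T}\right)^{-1}\boldsymbol{A}\boldsymbol{C}_{j}\boldsymbol{A}^{T}=\boldsymbol{A}^{-T}\boldsymbol{C}_{i}^{-1}\boldsymbol{C}_{j}\boldsymbol{A}^{T}$ is similar to $\boldsymbol{C}_{i}^{-1}\boldsymbol{C}_{j}$, hence has the same eigenvalues.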

We'll use this metric to apply a metric aware dimensionality reduction.

Do the following steps:

1. Implement the `SpdMetric` metric.  
   The inputs of `SpdMetric()` are 2 column stacked covariance matrices, $\boldsymbol{c}_{i}$ and $\boldsymbol{c}_{j}$.  
   The function reshapes them back into the matrices $\boldsymbol{C}_{i}$ and $\boldsymbol{C}_{j}$ and returns $d\left(\boldsymbol{C}_{i},\boldsymbol{C}_{j}\right)$.
2. Based on the covariance features from _Model 002_, build a distance matrix `mD` using the metric above.  
   Namely, `mD[ii, jj]` should be the distance between the covariance matrix of the `ii`-th sample and that of the `jj`-th sample.
3. Analyze the separation using a non linear dimensionality reduction model (use `vY` to color the data points).  
   Make sure to utilize the distance function above.
4. Use an ensemble based classifier applied on the dimensionality reduced data.
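Steps 1 and 2 can be sketched with `sklearn.metrics.pairwise_distances()`, which accepts a callable metric applied to each pair of rows. The sketch uses a hypothetical `SpdMetricFlat()` (same computation as the notebook's `SpdMetric()`, based on `sp.linalg.eigvalsh()`) and compares the result to an explicit double loop over the lower triangle, on a few random SPD matrices (the sizes and the data are assumptions).

```python
import numpy as np
import scipy as sp
import scipy.linalg
from sklearn.metrics import pairwise_distances

def SpdMetricFlat(vC1: np.ndarray, vC2: np.ndarray) -> float:
    # Hypothetical helper: inputs are flattened square matrices, recover the matrices
    numRows = round(np.sqrt(vC1.size))
    mC1 = np.reshape(vC1, (numRows, numRows))
    mC2 = np.reshape(vC2, (numRows, numRows))
    # Generalized eigenvalues of C1 v = λ C2 v (the eigenvalues of C2^{-1} C1)
    vλ = sp.linalg.eigvalsh(mC1, mC2)
    return float(np.linalg.norm(np.log(vλ)))

rng = np.random.default_rng(0)
numSamples, numRows = 8, 4
lC = []
for _ in range(numSamples):
    mM = rng.standard_normal((numRows, numRows))
    lC.append(mM @ mM.T + numRows * np.eye(numRows))  # SPD by construction
mXX = np.stack([np.reshape(mC, -1) for mC in lC])  # (numSamples, numRows ** 2)

# Explicit loop: compute the lower triangle and mirror it
mD = np.zeros((numSamples, numSamples))
for ii in range(numSamples):
    for jj in range(ii):
        mD[ii, jj] = SpdMetricFlat(mXX[ii], mXX[jj])
        mD[jj, ii] = mD[ii, jj]

# Same matrix via scikit-learn with a callable metric
mDSk = pairwise_distances(mXX, metric = SpdMetricFlat)
```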

You should optimize the combination of the classification model and the dimensionality reduction model.

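* <font color='brown'>(**#**)</font> You may use `sp.linalg.eigvalsh(mCi, mCj)`. It solves the generalized problem $\boldsymbol{C}_{i}\boldsymbol{v}=\lambda\boldsymbol{C}_{j}\boldsymbol{v}$, namely it returns the eigenvalues of $\boldsymbol{C}_{j}^{-1}\boldsymbol{C}_{i}$, which are the reciprocals of the eigenvalues of $\boldsymbol{C}_{i}^{-1}\boldsymbol{C}_{j}$. Since $\log^{2}\left(1/\lambda\right)=\log^{2}\left(\lambda\right)$, the distance is unchanged.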

Objective: above 75% accuracy in _Leave One Out_ cross validation.

In [None]:
# The Covariance Matrix Metric

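def SpdMetric(vC1: np.ndarray, vC2: np.ndarray) -> float:

    # Recover the square covariance matrices from the flattened vectors
    numRows = int(np.sqrt(len(vC1)))
    
    mC1 = np.reshape(vC1, (numRows, numRows))
    mC2 = np.reshape(vC2, (numRows, numRows))
    
    # Generalized eigenvalues of `mC1 v = λ mC2 v`, the reciprocals of the eigenvalues of `mC1^{-1} mC2`
    vλ = sp.linalg.eigvalsh(mC1, mC2)
    
    # sqrt(sum(log^2(λ))), the reciprocals leave the value unchanged as log^2(1 / λ) = log^2(λ)
    return np.linalg.norm(np.log(vλ))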

In [None]:
# Distance Matrix to Affinity Matrix

def ConvertDistanceMatAffinityMat(mD: np.ndarray) -> np.ndarray:

    # The affinity must decay with the distance: 1 at zero distance, towards 0 for far samples
    return np.exp(-mD / np.std(mD))
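`SpectralEmbedding(affinity = 'precomputed')` expects similarities, which should decay as the distance grows. A common choice is the Gaussian (heat) kernel $\exp\left(-d^{2}/\left(2\sigma^{2}\right)\right)$ with the bandwidth $\sigma$ set by the median of the distances. The sketch below is an assumption based alternative (the helper `DistanceToGaussianAffinity()` and the median heuristic are not part of this notebook), checked on a small random point cloud.

```python
import numpy as np

def DistanceToGaussianAffinity(mD: np.ndarray) -> np.ndarray:
    # Hypothetical helper: bandwidth set to the median of the off diagonal distances
    vOffDiag = mD[~np.eye(mD.shape[0], dtype = bool)]
    sigmaVal = np.median(vOffDiag)
    # Affinity is 1 at zero distance and decays towards 0 as the distance grows
    return np.exp(-np.square(mD) / (2 * sigmaVal * sigmaVal))

rng = np.random.default_rng(0)
mP = rng.standard_normal((6, 3))
# Euclidean distance matrix of 6 random points
mD = np.linalg.norm(mP[:, None, :] - mP[None, :, :], axis = -1)

mA = DistanceToGaussianAffinity(mD)
```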

In [None]:
# Generating the Distance Matrix

mD = np.zeros(shape = (numSamples, numSamples))
for ii in range(numSamples):
    vCi = mXX[ii, :]
    for jj in range(ii):            
        vCj        = mXX[jj, :]
        mD[ii, jj] = SpdMetric(vCi, vCj)
        mD[jj, ii] = mD[ii, jj] #<! Symmetric

In [None]:
mA = ConvertDistanceMatAffinityMat(mD)

In [None]:
# Plot the Low Dimensional Model

lPlotModels     = [MDS(dissimilarity = 'precomputed', normalized_stress = False), SpectralEmbedding(affinity = 'precomputed'), TSNE(metric = 'precomputed', init = 'random')]
lPlotModelsStr  = ['MDS', 'SpectralEmbedding', 't-SNE']

hF, hAs = plt.subplots(nrows = 1, ncols = len(lPlotModels), figsize = (15, 8))
hAs = hAs.flat

for ii, (plotModel, hA) in enumerate(zip(lPlotModels, hAs)):
    plotModel.set_params(n_components = 2)
    if lPlotModelsStr[ii] == 'SpectralEmbedding':
        mZ = plotModel.fit_transform(mA) #<! Uses the affinity matrix
    else:
        mZ = plotModel.fit_transform(mD) #<! Uses the distance matrix

    hA.scatter(mZ[:, 0], mZ[:, 1], s = 50, c = vY, edgecolor = 'k')
    hA.set_title(lPlotModelsStr[ii])
    hA.axis('equal')


plt.show()

In [None]:
# Optimization Parameters
lModels     = [SVC(kernel = 'rbf'), LGBMClassifier(num_leaves = 15, n_estimators = 25), XGBClassifier(n_estimators = 25, max_leaves = 15)] #<! We may optimize the Model Hyper Parameters as well
lModelsStr  = ['SVC', 'LightGBM', 'XGBoost']
lDrModel    = [MDS(n_components = 2, dissimilarity = 'precomputed', normalized_stress = False), MDS(n_components = 10, dissimilarity = 'precomputed', normalized_stress = False), TSNE(n_components = 2, metric = 'precomputed', init = 'random'), TSNE(n_components = 10, metric = 'precomputed', init = 'random', method = 'exact')]
lDrModelStr = ['MDS02', 'MDS10', 'TSNE02', 'TSNE10']

In [None]:
# Creating the Data Frame

numComb = len(lModels) * len(lDrModel)
dData   = {'CLS Model': [], 'DR Model': [], 'Accuracy': [0.0] * numComb}

for ii, drModelStr in enumerate(lDrModelStr):
    for jj, clsModelStr in enumerate(lModelsStr):
        dData['CLS Model'].append(clsModelStr)
        dData['DR Model'].append(drModelStr)

dfModelScore = pd.DataFrame(data = dData)
dfModelScore

In [None]:
# Optimize the Model
currDrModelStr = 'Default'
vYPred = np.full(shape = numSamples, fill_value = -1, dtype = np.int8)
vF = np.full(shape = numSamples, fill_value = True)

for ii in range(numComb):
    clsModelStr   = dfModelScore.loc[ii, 'CLS Model']
    drModelStr    = dfModelScore.loc[ii, 'DR Model']

    if drModelStr != currDrModelStr:
        mZ = lDrModel[lDrModelStr.index(drModelStr)].fit_transform(mD)
        currDrModelStr = drModelStr

    print(f'Processing model {ii + 1:03d} out of {numComb} with `CLS Model` = {clsModelStr} and `DR Model` = {drModelStr}.')
    
    vYPred = cross_val_predict(lModels[lModelsStr.index(clsModelStr)], mZ, vY, cv = KFold(numSamples))

    accScore = np.mean(vY == vYPred)
    dfModelScore.loc[ii, 'Accuracy'] = accScore
    print(f'Finished processing model {ii + 1:03d} with `Accuracy` = {accScore:0.2%}.')

In [None]:
# Display Sorted Results (Descending)

dfModelScore.sort_values(by = ['Accuracy'], ascending = False).head(10)

* <font color='red'>(**?**)</font> Why does the SVC classifier perform as well as the tree based ensembles?