[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io/)

# AI Program

## Machine Learning - UnSupervised Learning - Manifold Learning - Exercise

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.002 | 18/05/2024 | Royi Avital | Added notes on local and global methods (`PaCMAP`)                 |
| 1.0.001 | 12/05/2024 | Royi Avital | Changed `Data()` into `DataLoader()`                               |
| 1.0.000 | 13/04/2024 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0069ManifoldLearning.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

# Machine Learning
from sklearn.datasets import load_wine, load_breast_cancer, load_digits
from sklearn.manifold import TSNE, SpectralEmbedding, Isomap
from sklearn.preprocessing import StandardScaler
from umap import UMAP

# Miscellaneous
import math
import os
from platform import python_version
import random
import timeit

# Typing
from typing import Callable, Dict, List, Optional, Self, Set, Tuple, Union

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2


In [None]:
# Courses Packages


In [None]:
# General Auxiliary Functions


## Dimensionality Reduction

In this exercise we'll compare different methods for dimensionality reduction on real world data sets.

In this notebook:

 - We'll compare 4 methods:
   * [`IsoMap`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html).
   * `t-SNE` (Implemented as [`TSNE`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)).
   * `Laplacian Eigenmaps` (Implemented as [`SpectralEmbedding`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html)).
   * `UMAP` (Implemented in the [UMAP Package](https://github.com/lmcinnes/umap)).
 - We'll use 3 different datasets: `load_wine`, `load_breast_cancer`, `load_digits`.

The `SpectralEmbedding` method is based on the eigen decomposition of the Laplacian Matrix of the data graph.  
Similar to `t-SNE`, it keeps the local structure of the high dimensional data in the low dimensional embedding.
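
The idea can be sketched by hand in a few lines. This is a minimal illustration under stated assumptions, not the exact `SpectralEmbedding` implementation: the binary k-NN graph, the unnormalized Laplacian and the values of `numNeighbors` and `lowDim` are choices made here for the example.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_wine
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

# Assumed settings for the illustration (Not taken from the exercise)
numNeighbors = 10
lowDim       = 2

mX, vY = load_wine(return_X_y = True)
mX     = StandardScaler().fit_transform(mX)

# Symmetric k-NN adjacency matrix with binary weights
mW = kneighbors_graph(mX, n_neighbors = numNeighbors, mode = 'connectivity').toarray()
mW = np.maximum(mW, mW.T)

# Unnormalized graph Laplacian: L = D - W
mD = np.diag(np.sum(mW, axis = 1))
mL = mD - mW

# Generalized eigen problem L v = lambda D v (Eigenvalues in ascending order)
vE, mV = eigh(mL, mD)

# Skip the first eigenvector (Eigenvalue 0, constant for a connected graph)
# and keep the next `lowDim` eigenvectors as the embedding
mZ = mV[:, 1:(lowDim + 1)]
print(mZ.shape)
```

The smallest eigenvalue of the Laplacian is always zero, so the informative coordinates start at the second eigenvector.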


* <font color='brown'>(**#**)</font> The [UMAP](https://github.com/lmcinnes/umap) algorithm is considered to address some of the main pitfalls of `t-SNE`, such as the lack of support for _Out of Sample Extension_.  
  Yet it still shares the concept of preserving the local metric.
* <font color='brown'>(**#**)</font> Some newer algorithms try to preserve both the local and the global structure.  
  One of the best known is [PaCMAP: Large Scale Dimension Reduction Technique Preserving Both Global and Local Structure](https://github.com/YingfanWang/PaCMAP).
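
_Out of Sample Extension_ means a fitted model can embed samples it has not seen during training. The sketch below illustrates the concept with `Isomap` as a stand in, since it ships with SciKit Learn. `UMAP` exposes the same `fit()` / `transform()` interface. The train / test split sizes and `n_neighbors` are assumptions made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, Isomap

mX, vY = load_digits(return_X_y = True)

# Assumed split for the illustration: fit on 500 samples, embed 100 new ones
mXTrain, mXTest = mX[:500], mX[500:600]

# `Isomap` exposes `transform()`, so new samples are embedded after fitting
oIsomap = Isomap(n_neighbors = 10, n_components = 2)
mZTrain = oIsomap.fit_transform(mXTrain)
mZTest  = oIsomap.transform(mXTest)

# The SciKit Learn `TSNE` class offers only `fit_transform()`
hasTsneTransform = hasattr(TSNE(), 'transform')

print(mZTest.shape)
print(hasTsneTransform)
```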

In [None]:
# Parameters

#===========================Fill This===========================#
# 1. Create a list of the data sets loaders / generators.
# 2. Create a list of the data sets names.
lData      = ???
lDataStr   = ???
#===============================================================#

#===========================Fill This===========================#
# 1. Create a list of the dimensionality reduction operators.
# 2. Create a list of the dimensionality reduction operators names.
#!! Set the parameters of the models in this phase.
lMethod    = ???
lMethodStr = ???
#===============================================================#

# Colors
lC         = ['r', 'g', 'b', 'c']


## Analyze the Methods

In [None]:
hF, hA = plt.subplots(nrows = len(lData), ncols = len(lMethod), figsize = (12, 8))

for ii, DataLoader in enumerate(lData):
    mX, vY = DataLoader(return_X_y = True)
    
    if lDataStr[ii] != 'Digits':
        mX = StandardScaler().fit_transform(mX)
    
    for jj, oMethod in enumerate(lMethod):
        mZ = oMethod.fit_transform(mX)
        
        hA[ii, jj].scatter(*mZ.T, s = 25, c = vY, edgecolor = 'k', cmap = 'tab10', vmin = -1/2, vmax = 9.5)                         
        hA[ii, jj].set_title(f'{lDataStr[ii]} - {lMethodStr[jj]}', color = lC[ii])
    
plt.tight_layout()
plt.show()

* <font color='red'>(**?**)</font> If we want to optimize the hyper parameters of the models, how would you define the score?
* <font color='blue'>(**!**)</font> Try to optimize the model parameters.
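
One possible score, among others, is `trustworthiness()` from `sklearn.manifold`, which measures how well local neighborhoods of the input are kept in the embedding (Values in `[0, 1]`, higher is better). The sketch below is only a starting point: the choice of `Isomap`, the data set and the grid of `n_neighbors` values are assumptions made for the example. A label based score (For instance the accuracy of a classifier on the embedding) is another option.

```python
from sklearn.datasets import load_wine
from sklearn.manifold import Isomap, trustworthiness
from sklearn.preprocessing import StandardScaler

mX, vY = load_wine(return_X_y = True)
mX     = StandardScaler().fit_transform(mX)

# Assumed grid of the hyper parameter to scan
lNumNeighbors = [5, 10, 20, 40]

dScore = {}
for numNeighbors in lNumNeighbors:
    mZ = Isomap(n_neighbors = numNeighbors, n_components = 2).fit_transform(mX)
    # Score the embedding by the preservation of the 5 nearest neighbors
    dScore[numNeighbors] = trustworthiness(mX, mZ, n_neighbors = 5)

# Keep the value with the highest score
bestNumNeighbors = max(dScore, key = dScore.get)
print(dScore)
print(bestNumNeighbors)
```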