[![Fixel Algorithms](https://i.imgur.com/AqKHVZ0.png)](https://fixelalgorithms.gitlab.io)

# AI Program

## Machine Learning - UnSupervised Learning - Manifold Learning - UMAP

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        |Content / Changes                                                   |
|---------|------------|-------------|--------------------------------------------------------------------|
| 1.0.000 | 13/09/2025 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/AIProgram/2024_02/0067ManifoldLearningIsoMap.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
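import scipy as sp
import scipy.io #<! Ensures `sp.io` is available (Also on older SciPy versions)
import scipy.spatial #<! Ensures `sp.spatial` is available (Also on older SciPy versions)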
import pandas as pd

# Machine Learning
from sklearn.manifold import Isomap
from umap import UMAP

# Miscellaneous
import os
from platform import python_version
import random
import urllib.request #<! `import urllib` alone does not guarantee `urllib.request` is loaded

# Typing
from typing import Callable, Dict, List, Optional, Self, Set, Tuple, Union
from numpy.typing import NDArray

# Visualization
import matplotlib.pyplot as plt

# Jupyter
from IPython import get_ipython

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

Code Notations:

```python
someVar    = 2; #<! Notation for a variable
vVector    = np.random.rand(4) #<! Notation for 1D array
mMatrix    = np.random.rand(4, 3) #<! Notation for 2D array
tTensor    = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple    = (1, 2, 3) #<! Notation for a tuple
lList      = [1, 2, 3] #<! Notation for a list
dDict      = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj       = MyClass() #<! Notation for an object
dfData     = pd.DataFrame() #<! Notation for a data frame
dsData     = pd.Series() #<! Notation for a series
hObj       = plt.Axes() #<! Notation for an object / handler / function handler
```

### Code Exercise

 - Single line fill

```python
valToFill = ???
```

 - Multi Line to Fill (At least one)

```python
# You need to start writing
?????
```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

?????
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #<! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())


In [None]:
# Constants

FIG_SIZE_DEF    = (8, 8)
ELM_SIZE_DEF    = 50
CLASS_COLOR     = ('b', 'r')
EDGE_COLOR      = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF  = 2

# Mirrors of `face_data.mat` found by searching GitHub (Only the last assignment is in use)
DATA_FILE_URL = r'https://github.com/Mashimo/datascience/raw/master/datasets/face_data.mat'
DATA_FILE_URL = r'https://github.com/SpencerKoevering/DRCapstone/raw/main/Isomap_face_data.mat'
DATA_FILE_URL = r'https://github.com/jasonfilippou/DimReduce/raw/master/ISOMAP/face_data.mat'

DATA_FILE_NAME = r'IsoMapFaceData.mat'

In [None]:
# Courses Packages


In [None]:
# General Auxiliary Functions

def DownloadUrl( fileUrl: str, fileName: str ) -> str:
    
    if not os.path.exists(fileName):
        urllib.request.urlretrieve(fileUrl, fileName)

    return fileName

def PlotMnistImages( mX: NDArray, vY: NDArray, numRows: int, numCols: Optional[int] = None, tuImgSize: Tuple = (28, 28), randomChoice: bool = True, lClasses: Optional[List] = None, hF: Optional[plt.Figure] = None ) -> plt.Figure:

    numSamples  = mX.shape[0]
    numPx       = mX.shape[1]

    if numCols is None:
        numCols = numRows

    tFigSize = (numCols * 3, numRows * 3)

    if hF is None:
        hF, hA = plt.subplots(numRows, numCols, figsize = tFigSize)
    else:
        hA = hF.axes
    
    hA = np.atleast_1d(hA) #<! To support numImg = 1
    hA = hA.flat
    
    for kk in range(numRows * numCols):
        idx = np.random.choice(numSamples) if randomChoice else kk
        mI  = np.reshape(mX[idx, :], tuImgSize)
    
        # hA[kk].imshow(mI.clip(0, 1), cmap = 'gray')
        if len(tuImgSize) == 2:
            hA[kk].imshow(mI, cmap = 'gray')
        elif len(tuImgSize) == 3:
            hA[kk].imshow(mI)
        else:
            raise ValueError(f'The length of the image size tuple is {len(tuImgSize)} which is not supported')
        hA[kk].tick_params(axis = 'both', left = False, top = False, right = False, bottom = False, 
                           labelleft = False, labeltop = False, labelright = False, labelbottom = False)
        if lClasses is None:
            hA[kk].set_title(f'Index = {idx}, Label = {vY[idx]}')
        else:
            hA[kk].set_title(f'Index = {idx}, Label = {lClasses[vY[idx]]}')
    
    return hF

## Dimensionality Reduction by IsoMap

IsoMap is a special case of the MDS approach in which the geodesic distance is approximated by the shortest path distance on a neighborhood graph.  
The geodesic distance is the distance measured along the low dimensional surface (Manifold) the data is assumed to lie on.  
Hence, by approximating it we work with the native metric of the data rather than with the ambient Euclidean metric.
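
The idea above can be sketched on a toy example. This is a minimal illustration and not the notebook's method: the helper name `ApproxGeodesicDist` and the half circle data are assumptions made for the example. The graph connects each sample to its nearest neighbors and the geodesic distance is approximated by the shortest path on that graph.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def ApproxGeodesicDist( mX: np.ndarray, numNeighbors: int ) -> np.ndarray:
    # Hypothetical helper: Shortest path distances on a k nearest neighbors graph
    numSamples = mX.shape[0]
    mD   = cdist(mX, mX) #<! Pairwise Euclidean distances
    mIdx = np.argsort(mD, axis = 1)[:, 1:(numNeighbors + 1)] #<! Neighbors (Column 0 is the sample itself)
    vRow = np.repeat(np.arange(numSamples), numNeighbors)
    mG   = np.zeros_like(mD) #<! Dense graph, 0 means no edge
    mG[vRow, mIdx.ravel()] = mD[vRow, mIdx.ravel()]
    mG   = np.maximum(mG, mG.T) #<! Symmetric graph

    return shortest_path(mG, method = 'D', directed = False)

# Toy data (Assumption): Samples on a half circle of radius 1
vT   = np.linspace(0, np.pi, 100)
mArc = np.column_stack((np.cos(vT), np.sin(vT)))

mGeo    = ApproxGeodesicDist(mArc, numNeighbors = 2)
eucDist = np.linalg.norm(mArc[0] - mArc[-1])

print(f'Euclidean distance between the end points: {eucDist:0.3f}')
print(f'Approximated geodesic distance           : {mGeo[0, -1]:0.3f}')
```

The Euclidean distance between the end points is the diameter, while the shortest path follows the arc and hence approaches its length, $\pi$.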

In this notebook:

 - We'll use the IsoMap algorithm to reduce the dimensionality of the data set.
 - We'll apply the UMAP algorithm to the same data set and embed it in the same low dimension.  

In [None]:
# Parameters

# Data
numRows  = 4
numCols  = 4
tImgSize = (64, 64)

# Model
numNeighbors    = 6
lowDim          = 2
metricType      = 'l2'

# Visualization
imgShift        = 5
numImgScatter   = 70

## Generate / Load Data

In this notebook we'll use [IsoMap Face Data Set](https://web.archive.org/web/20160913051505/http://isomap.stanford.edu/datasets.html).    
This data set is composed of 698 images of size `64 x 64` of the same face.  
Each image is taken from a different angle: Vertical and Horizontal.

![](https://i.imgur.com/cNz811Y.png)

We'll download the data from GitHub (There are 3 URLs above, one should work :-)).

* <font color='red'>(**?**)</font> What's the dimension of the underlying manifold of the data?

In [None]:
# Download Data
# This section downloads data from the given URL if needed.

dataFileName = DownloadUrl(DATA_FILE_URL, DATA_FILE_NAME)

In [None]:
# Load Data

# Dictionary of the data
# 'images' - The images.
# 'poses' - The angles.
dFaceData = sp.io.loadmat(dataFileName)
mX        = dFaceData['images'].T #<! Loading from MATLAB

numSamples, dataDim = mX.shape

print(f'The features data shape: {mX.shape}')
print(f'The features data type: {mX.dtype}')

* <font color='red'>(**?**)</font> Do we need to scale the data?
* <font color='blue'>(**!**)</font> Check the dynamic range of the data (Images).

In [None]:
# Transpose each image (MATLAB -> Python)

for vX in mX:
    vX[:] = np.reshape(np.reshape(vX, tImgSize), (-1, ), order = 'F')

### Plot Data

In [None]:
# Plot the Data

hF = PlotMnistImages(mX, range(mX.shape[0]), numRows = numRows, numCols = numCols, tuImgSize = tImgSize)

plt.show()

## Applying Dimensionality Reduction - IsoMap

We'll use the IsoMap algorithm to approximate the data native manifold.  

One of the earliest (In ~2000 by Joshua B. Tenenbaum) approaches to manifold learning is the IsoMap algorithm, short for _Isometric Mapping_.  
IsoMap can be viewed as an extension of Multi Dimensional Scaling (MDS) or Kernel PCA.  
IsoMap seeks a lower dimensional embedding which maintains geodesic distances between all points. 

![Isomap](https://github.com/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethods/2022_02/19_DimensionalityReduction/Isomap.png?raw=true)

We'll use SciKit Learn's [`Isomap`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html).

* <font color='brown'>(**#**)</font> The method is based on MDS, which means the solution is not unique (The embedding is defined up to rotation and reflection).
* <font color='brown'>(**#**)</font> The complexity of the algorithm is rather high, hence practical implementations approximate several of its steps.
* <font color='brown'>(**#**)</font> Behind the scenes, the SciKit Learn implementation computes the embedding by applying Kernel PCA to a kernel built from the geodesic distances (So the solution is equivalent to K-PCA).
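
The MDS step which IsoMap relies on can be sketched as classical MDS: double centering of the squared distances followed by an eigen decomposition. This is an illustrative sketch only (the helper name `ClassicalMds` and the random toy data are assumptions, and it is not the SciKit Learn code). In IsoMap the input matrix would hold the approximated geodesic distances; here Euclidean distances are used so the result can be verified.

```python
import numpy as np

def ClassicalMds( mD: np.ndarray, lowDim: int ) -> np.ndarray:
    # Hypothetical helper: Classical MDS from a matrix of pairwise distances
    numSamples = mD.shape[0]
    mJ = np.eye(numSamples) - (np.ones((numSamples, numSamples)) / numSamples) #<! Centering matrix
    mK = -0.5 * (mJ @ (mD ** 2) @ mJ) #<! Double centered kernel
    vE, mV = np.linalg.eigh(mK) #<! Eigen values in ascending order
    vIdx   = np.argsort(vE)[::-1][:lowDim] #<! Largest eigen values
    
    return mV[:, vIdx] * np.sqrt(np.maximum(vE[vIdx], 0.0))

# Toy data (Assumption): Random samples in 2D
oRng = np.random.default_rng(0)
mP   = oRng.standard_normal((20, 2))
mD   = np.linalg.norm(mP[:, None, :] - mP[None, :, :], axis = 2)

mZ  = ClassicalMds(mD, lowDim = 2)
mDz = np.linalg.norm(mZ[:, None, :] - mZ[None, :, :], axis = 2)

print(f'Distances are preserved: {np.allclose(mD, mDz)}')
```

The embedding `mZ` reproduces the pairwise distances although it generally differs from `mP` by a rotation / reflection, which is the non uniqueness mentioned above.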

* <font color='red'>(**?**)</font> What do we need to ship to production from this model?

In [None]:
# Apply the IsoMap

# Construct the object
oIsoMapDr = Isomap(n_neighbors = numNeighbors, n_components = lowDim, metric = metricType)
# Build the model
oIsoMapDr = oIsoMapDr.fit(mX)

* <font color='red'>(**?**)</font> Does this method support out of sample data? Look for `transform()` method.
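
As a hedged illustration on synthetic data (the S curve toy set, the split sizes and the parameters are assumptions, not part of the notebook), the `fit()` and `transform()` methods of `Isomap` can be applied to different sets of samples:

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# Toy data (Assumption): 3D S curve split into train / test
mData, _ = make_s_curve(n_samples = 300, random_state = 0)
mTrain   = mData[:250]
mTest    = mData[250:]

oIsoMap = Isomap(n_neighbors = 10, n_components = 2)
oIsoMap = oIsoMap.fit(mTrain) #<! Build the model on the train samples only
mZTest  = oIsoMap.transform(mTest) #<! Embed samples not used in `fit()`

print(f'The shape of the embedded test data: {mZTest.shape}') #<! (50, 2)
```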

## Applying Dimensionality Reduction - UMAP

This section uses the UMAP algorithm to approximate the data native manifold.  

```mermaid
flowchart LR
  A["Input data: $$\boldsymbol{X} \in \mathbb{R}^{m \times n}$$"]:::data --> B[Choose hyperparameters<br/>• n_neighbors • min_dist • metric]:::param
  B --> C{k-Nearest Neighbors per Point}:::proc
  C --> D["`Compute Local Radii: <br/> (ρᵢ, σᵢ) → smooth distances`"]:::proc
  D --> E["Build Fuzzy High Dimensional Graph: <br/> $${p}_{i, j} = \exp \left( - \left( d \left( i, j \right) - {\rho}_{i} \right) / {\sigma}_{i} \right)$$"]:::agraph
  E --> F["Symmetrize Weights:<br/>$$p = 1 - \left( 1 - {p}_{i, j} \right) \left( 1 - {p}_{j, i} \right)$$"]:::agraph

  F --> G["Initialize Low Dimensional Embedding:<br/> $$\boldsymbol{Y} \in \mathbb{R}^{m \times d}, \; d \ll n$$<br/>By Spectral / Random / PCA"]:::init

  G --> H{"Optimize $$\boldsymbol{Y}$$ with SGD to Minimize Cross Entropy"}:::proc
  H --> I[[Attractive Edges: Pull Together]]:::force
  H --> J[[Repulsive Edges: Push Apart]]:::force
  I --> K["Update $$\boldsymbol{Y}$$"]:::update
  J --> K
  K -->|Repeat| H
  H --> L[Low Dimensional Similarities Q]:::agraph

  K --> M[Output Embedding<br/>Local neighborhoods preserved, Global structure respected]:::result

  %% Intuition callout
  N["`**Intuition**<br/>1) Build a weighted NN graph in high-D.<br/>2) Find a low-D layout whose graph is similar.<br/><br/>• **n_neighbors**: local vs global balance<br/>• **min_dist**: cluster tightness<br/>• **metric**: geometry of distances`"]:::note -.-> B

  %% Styles
  classDef data fill:#e3f2fd,stroke:#1565c0,color:#0d47a1;
  classDef param fill:#fff3e0,stroke:#ef6c00,color:#e65100;
  classDef proc fill:#ede7f6,stroke:#7b1fa2,color:#4a148c;
  classDef agraph fill:#e8f5e9,stroke:#1b5e20,color:#1b5e20;
  classDef init fill:#f1f8e9,stroke:#558b2f,color:#2e7d32;
  classDef force fill:#ffe0e0,stroke:#c62828,color:#8e0000;
  classDef update fill:#e0f7fa,stroke:#00838f,color:#006064;
  classDef result fill:#e1f5fe,stroke:#0277bd,color:#01579b;
  classDef note fill:#f5f5f5,stroke:#9e9e9e,color:#424242;
```

![](https://mermaid.ink/img/pako:eNptVW1u20YQvcqACYpYkBySEiWaaAPYkj_jOKmdok2jNFiRQ5EJySV2l3FkQUCLAkV_5wi5Q8-RM9Qn6Swpri3VJmzyzc6bfTM7O15aIY_QCqw449dhwoSC15NpAfSz_3ZqnRZlpSBiigXw_Uw8g8ePp9MZzyK5yOm1_GUF02la0J-cqWQ2W16uflvmBFWao4RiRf7WO-j1nsEBhRsnnEuEZFGiKJlgOSoUUge-_f0rFO8LTOfJjAsJGudp8T5KpWoAKpGGFKwRd1DHHC-n1sfeBTKB5HZh2BQeXvG0UFNr1fiPa_-J1sBzygnhnIcsg0sWpek6tyff_vj3n69d-PYnvXbg9q8vIHPOVQJaBStClGb_SR3vkOIdVGkWwVF1c7OAExIAE0q9kCkvKPyxYGViSrcsV--XaRc-rOAHwM8l1SnDWD2B27-_mO_IfGlPAoKCqp3aZ0ko4TrI6m7hqTbLdJ6zzYW69I3aw1rtEam9WuR1JW8QftblUtKoK0mUsyGlQfdUmz0f8PjQhYc3P6o3P667KVUpy_Tm5_x6o1CH-QyjKC3mgWmzdUe9eajFCtNiObWYphws4KrEUAkK9pSOtYh4Th-vxvtGyHEt5IR65mVJZC1jq53fUDC4TunEr44noDi8SIvGcSy4lHBYKMHLhemqkzri6VvKbV_R1qFKPyEcRnOksr6qsgxe8zmqBAWJeAffwZl2vcSyyuSGp0xgny6EWrudk9d2ha5ISMYEVZCy_tEkdVpLeE6En0q6pw-mZJzPGucGPCcApAWZagpzZ34GLyjgy0rp22-ORpe5uTXtPaXrHNFto9uH4hNGXTjO-IzWpRJVqCqBQEv6VDAyGi4ocqdzWqiKUuFFp6PDOjvQ3CMG13VbYgQXFzDX1wfo8BOy9Sa72tXdgaO00J40sXoTyNiCk8rrRE-WNUGCbKpVM9r50uncmzCdTkABdDKfJMwb2fSrb_mdfzuBtHOYVZKmFSgtr0Ap77nVs0k7zZFrsAAebwwN6O3qEdhUAPaDINAzdQ0PCNbTcI3HGgseruFkEx4SZHWea8PRtuGYDNS3ag1PNvmnBGMuQlzjsy18vh3uORmqurfWhhdkoHOtsnaHCzIUvF0PMyblBOP63wbEaZYFj7Afu3HUpb7gHzF45HhDL7S7Ic-4CB7Z0WDEnC1yXZA1O47jPtqGjfEwtA0bh55j29tsyrfdOsJRPDTk0cyJmduSB8wZ-OEWucm-pfuxh3t3ymceumbvBm3Rdelb4U7s3yN7nj9z45bs4ijqu1vk-iRM2mjfSzscur7rt2wfbft_aTfH1Cq341HMDN22_b5vNrftoT0cbNGbQ23pTuzFeEd3R6NZZOiON9qbbdF1C7TSPf0Y8h7qxxTd1Y_VteYijayAZgV2rRxFzjS0ljrs1KKpmePUCugzwpjpbrOmxYpoJSt-5TxvmYJX88QKYpZJQk0JJqk-xNxYBRYRijGvCmUF7mjg1lGsYGl9tgJn5OwO9gae5w4dt28PRl1rYQW9gdff9fv2kNwHzoDe3qpr3dQbO7t9f-D4Xt-1R3v9vm97q_8AkH4BZw?type=svg)
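
The high dimensional part of the flow chart (Local radii, fuzzy graph, symmetrization) can be sketched in a few lines. This is a simplified sketch and not the `umap-learn` code: the helper name `FuzzyGraphWeights`, the dense distance matrix and the choice of $\log_2 \left( k \right)$ as the target sum of each row's weights are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def FuzzyGraphWeights( mX: np.ndarray, numNeighbors: int, numIter: int = 64 ):
    # Hypothetical helper (Simplified sketch, not the `umap-learn` implementation)
    numSamples = mX.shape[0]
    mD     = cdist(mX, mX) #<! Pairwise Euclidean distances
    mIdx   = np.argsort(mD, axis = 1)[:, 1:(numNeighbors + 1)] #<! k nearest neighbors (Self excluded)
    mDk    = np.take_along_axis(mD, mIdx, axis = 1) #<! Distances to the neighbors
    vRho   = mDk[:, 0] #<! rho_i: Distance to the nearest neighbor
    target = np.log2(numNeighbors) #<! Assumed target for the sum of weights per sample

    mP = np.zeros((numSamples, numSamples)) #<! Directed weights p_ij
    for ii in range(numSamples):
        vDelta = np.maximum(mDk[ii] - vRho[ii], 0.0)
        loVal, hiVal, sigma = 0.0, np.inf, 1.0
        for _ in range(numIter): #<! Bisection for sigma_i
            if np.sum(np.exp(-vDelta / sigma)) > target:
                hiVal = sigma
                sigma = 0.5 * (loVal + hiVal)
            else:
                loVal = sigma
                sigma = (2.0 * sigma) if np.isinf(hiVal) else (0.5 * (loVal + hiVal))
        mP[ii, mIdx[ii]] = np.exp(-vDelta / sigma)

    mS = mP + mP.T - (mP * mP.T) #<! Equals 1 - (1 - p_ij)(1 - p_ji)

    return mP, mS

# Toy data (Assumption): Random samples in 2D
oRng  = np.random.default_rng(0)
mData = oRng.standard_normal((50, 2))

mP, mS = FuzzyGraphWeights(mData, numNeighbors = 5)
print(f'Mean sum of directed weights per sample: {np.mean(np.sum(mP, axis = 1)):0.3f}')
```

Each sample gets its own scale $\sigma_i$, so dense and sparse regions end up with comparable neighborhood weights. The symmetrized matrix `mS` is the graph the low dimensional layout tries to match.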

We'll use UMAP Learn's [`UMAP`](https://umap-learn.readthedocs.io/en/latest/api.html#umap).

* <font color='brown'>(**#**)</font> While the original paper, [UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction](https://arxiv.org/abs/1802.03426), describes the method using many advanced concepts from [Category Theory](https://en.wikipedia.org/wiki/Category_theory), in practice it is very similar to [t-SNE](https://www.jmlr.org/papers/v9/vandermaaten08a.html) and [LargeVis](https://arxiv.org/abs/1602.00370).
* <font color='brown'>(**#**)</font> Main advantages of UMAP over t-SNE: Speed, easier tuning and _Out of Sample_ support.   
  Modern implementations of t-SNE (See [FFT Accelerated t-SNE](https://github.com/KlugerLab/FIt-SNE), [OpenTSNE](https://github.com/pavlin-policar/openTSNE)) have basically closed the gap on all of those.
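
To make the role of the `min_dist` and `spread` parameters more concrete: the low dimensional similarity in UMAP is commonly written as $q \left( d \right) = \frac{1}{1 + a {d}^{2 b}}$, where $a$ and $b$ are fitted so the curve approximates a function which equals 1 up to `min_dist` and decays exponentially, at a scale set by `spread`, beyond it. The sketch below illustrates this fit. It is an assumption about the mechanics rather than a copy of the library code, and the names `LowDimSimilarity` and `FitAbParams` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def LowDimSimilarity( vD: np.ndarray, a: float, b: float ) -> np.ndarray:
    # Smooth similarity as a function of the low dimensional distance
    return 1.0 / (1.0 + a * (vD ** (2.0 * b)))

def FitAbParams( minDist: float, spread: float ):
    # Hypothetical helper: Fits (a, b) to the piecewise target curve
    vD = np.linspace(0.0, 3.0 * spread, 300)
    vQ = np.where(vD < minDist, 1.0, np.exp(-(vD - minDist) / spread)) #<! Target curve
    (a, b), _ = curve_fit(LowDimSimilarity, vD, vQ)

    return a, b

a, b = FitAbParams(minDist = 0.1, spread = 1.0)
print(f'a = {a:0.3f}, b = {b:0.3f}')
```

A larger `min_dist` widens the region where the similarity stays close to 1, so neighboring samples are allowed to spread out instead of being packed tightly.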

In [None]:
# Apply the UMAP
# Parameters tuned to replicate the IsoMap results

# Construct the object
oUmapDr = UMAP(n_neighbors = 7, n_components = lowDim, metric = 'cosine', n_epochs = None, learning_rate = 0.95, min_dist = 1.5, spread = 20)
# Build the model
oUmapDr = oUmapDr.fit(mX)

* <font color='red'>(**?**)</font> Does this method support out of sample data? Look for `transform()` method.

In [None]:
# Apply the Transform
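mZ = oUmapDr.transform(mX) #<! UMAP embedding (The model of this section)
# mZ = oIsoMapDr.transform(mX) #<! IsoMap embedding (For comparison)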

In [None]:
# Plot the Low Dimensional Data (With the Faces)

# Compute Images which are far apart

lSet = list(range(1, numSamples))
lIdx = [0] #<! First image
for ii in range(numSamples - 1):
    mDi  = sp.spatial.distance.cdist(mZ[lIdx, :], mZ[lSet, :])
    vMin = np.min(mDi, axis = 0)
    idx  = np.argmax(vMin) #<! Farthest image
    lIdx.append(lSet[idx])
    lSet.remove(lSet[idx])
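
The loop above is a greedy farthest point sampling: at each step it adds the sample whose distance to the already selected set is the largest, so the first `numImgScatter` indices are well spread over the embedding. A minimal sketch of the same selection, up to ties (the helper name `FarthestPointSampling` and the toy data are assumptions), keeps a running vector of minimum distances instead of recomputing all pairwise distances at each step:

```python
import numpy as np

def FarthestPointSampling( mZ: np.ndarray, numPts: int, firstIdx: int = 0 ) -> list:
    # Hypothetical helper: Greedy farthest point sampling
    lIdx     = [firstIdx]
    vMinDist = np.linalg.norm(mZ - mZ[firstIdx], axis = 1) #<! Distance to the selected set
    for _ in range(numPts - 1):
        idx = int(np.argmax(vMinDist)) #<! Farthest sample from the selected set
        lIdx.append(idx)
        vMinDist = np.minimum(vMinDist, np.linalg.norm(mZ - mZ[idx], axis = 1))

    return lIdx

# Toy data (Assumption): 4 samples on a line
mPts = np.array([[0.0], [1.0], [2.0], [10.0]])
lSel = FarthestPointSampling(mPts, numPts = 4)

print(lSel) #<! [0, 3, 2, 1]
```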

In [None]:
# Plot the Embedding with Images

hF, hA = plt.subplots(figsize = (10, 8))

imgShift = 5
for ii in range(numImgScatter):
    idx = lIdx[ii]
    x0  = mZ[idx, 0] - imgShift
    x1  = mZ[idx, 0] + imgShift
    y0  = mZ[idx, 1] - imgShift
    y1  = mZ[idx, 1] + imgShift
    mI  = np.reshape(mX[idx, :], tImgSize)
    hA.imshow(mI, aspect = 'auto', cmap = 'gray', zorder = 2, extent = (x0, x1, y0, y1))

hA.scatter(mZ[:, 0], mZ[:, 1], s = 50, c = 'lime', edgecolor = 'k')
hA.set_xlabel('$z_1$')
hA.set_ylabel('$z_2$')

plt.show()

* <font color='red'>(**?**)</font> What is the interpretation of ${z}_{1}$? What about ${z}_{2}$?
* <font color='blue'>(**!**)</font> Use Linear PCA to do the above and compare results.