![](https://i.imgur.com/qkg2E2D.png)

# UnSupervised Learning Methods

## Exercise 004 - Part II

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 16/06/2023 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/UnSupervisedLearningMethods/2023_03/Exercise0004Part002.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

from scipy.spatial.distance import cdist

# Machine Learning
from sklearn.datasets import make_s_curve, make_swiss_roll

# Computer Vision

# Miscellaneous
import os
import math
from platform import python_version
import random
import time
import urllib.request

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

DATA_FILE_URL   = r'None'
DATA_FILE_NAME  = r'None'


In [None]:
# Auxiliary Functions

def GetData(MakeData: Callable, Nx: int, Ny: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    mX, vCx = MakeData(Nx)
    mY, vCy = MakeData(Ny)

    return mX, mY, vCx, vCy

def PlotTrainTestData(mX: np.ndarray, mY: np.ndarray, vCx: np.ndarray, vCy: np.ndarray, hA: plt.Axes, Is3DData: bool = False) -> None:
    m = mX.min()
    M = mX.max()

    hA.scatter(*mX.T, s = 25, c = vCx, edgecolor = 'k', alpha = 1, label = 'Train', vmin = vCx.min(), vmax = vCx.max())
    hA.scatter(*mY.T, s = 100, c = 'r', marker = '*', alpha = 1, label = 'Test')
    hA.set_xlim([m, M])
    hA.set_ylim([m, M])
    if Is3DData:
        hA.set_zlim([m, M])
    hA.set_xlabel('$x_1$')
    hA.set_ylabel('$x_2$')
    if Is3DData:
        hA.set_zlabel('$x_3$')
    hA.legend()

## Guidelines

 - Fill the full names and ID's of the team members in the `Team Members` section.
 - Answer all questions / tasks within the Jupyter Notebook.
 - Use MarkDown + MathJaX + Code to answer.
 - Verify the rendering on VS Code.
 - Submission in groups (Single submission per group).
 - You may and _should_ use the forums for questions.
 - Good Luck!

* <font color='brown'>(**#**)</font> The `Import Packages` section above imports most of the tools needed for the work. Please use it.
* <font color='brown'>(**#**)</font> You may replace the suggested functions with functions from other packages.
* <font color='brown'>(**#**)</font> Anything you are not explicitly asked to implement may be taken from 3rd party packages.
* <font color='brown'>(**#**)</font> The total run time of this notebook must be **lower than 30 [Sec]**.

## Team Members

- Nadav_Talmon_203663950
- Nadav_Shaked_312494925
- Adi_Rosenthal_316550797

## Generate / Load Data

In [None]:
# Download Data
# This section downloads data from the given URL if needed.

if (DATA_FILE_NAME != 'None') and (not os.path.exists(DATA_FILE_NAME)):
    urllib.request.urlretrieve(DATA_FILE_URL, DATA_FILE_NAME)

## 7. MDS

### 7.1. Classic MDS Algorithm

In this section we'll implement a SciKit Learn API compatible class for the Classic MDS.

The class should implement the following methods:

1. `__init__()` - The object constructor by the encoder dimension.
2. `fit()` - Given a data set ($\boldsymbol{D}_{xx}$) builds the encoder.
3. `transform()` - Applies the encoding on the input data ($\boldsymbol{D}_{xy}$) in an out of sample manner.
4. `fit_transform()` - Given a data set ($\boldsymbol{D}_{xx}$) builds the encoder and applies the encoding.

* <font color='brown'>(**#**)</font> Pay attention to data structure (`Nx x Nx` / `Nx x Ny`).
* <font color='brown'>(**#**)</font> Do not use any loops in your implementation.
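
As a reference for the steps involved, here is a minimal sketch of the Classic MDS embedding applied to a squared Euclidean distance matrix. It is not the exercise class: the function name `ClassicMdsEmbed`, the toy data, and the use of `np.linalg.eigh()` on the full matrix (instead of a truncated decomposition) are assumptions made for illustration.

```python
import numpy as np

def ClassicMdsEmbed(mD: np.ndarray, d: int = 2) -> np.ndarray:
    # mD: Squared Euclidean distance matrix with shape (N, N).
    N = mD.shape[0]
    mJ = np.eye(N) - np.ones((N, N)) / N  # Centering matrix
    mK = -0.5 * mJ @ mD @ mJ              # Double centered kernel
    vL, mV = np.linalg.eigh(mK)           # Eigenvalues in ascending order
    vIdx = np.argsort(vL)[::-1][:d]       # Indices of the `d` largest eigenvalues
    vL = np.maximum(vL[vIdx], 0)          # Guard against tiny negative values
    return mV[:, vIdx] * np.sqrt(vL)      # Embedding with shape (N, d)

# Toy data (assumption): 2D points, so a 2D embedding can preserve all distances.
rng = np.random.default_rng(0)
mX = rng.standard_normal((50, 2))
mD = np.sum((mX[:, None, :] - mX[None, :, :]) ** 2, axis=2)

mZ = ClassicMdsEmbed(mD, d=2)
mDz = np.sum((mZ[:, None, :] - mZ[None, :, :]) ** 2, axis=2)
```

In this toy case `mDz` reproduces `mD` up to floating point error, although `mZ` itself may differ from `mX` by a rotation, reflection, and translation.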

In [None]:
class CMDS():
    def __init__(self, d: int = 2):
        '''
        Constructing the object.
        Args:
            d - Number of dimensions of the encoder output.
        '''
        # ===========================Fill This===========================#
        # 1. Keep the model parameters.

        self.d = d
        self.Σ_d = None
        self.v_d = None
        self.mDxx_row_mean = None
        self.encoder = None
        self.mDxx = None

        # ===============================================================#

    def fit(self, mDxx: np.ndarray):
        '''
        Fitting model parameters to the input.
        Args:
            mDxx - Input data (Distance matrix) with shape Nx x Nx.
        Output:
            self
        '''
        # ===========================Fill This===========================#
        # 1. Build the model encoder.

        mDxxJ = mDxx - np.mean(mDxx, axis=0).reshape(-1, 1)
        mJDxxJ = mDxxJ - np.mean(mDxxJ, axis=0).reshape(1, -1)
        mKxx_centered = -0.5 * mJDxxJ

        v_d, Σ_d_power2, _ = sp.sparse.linalg.svds(mKxx_centered, k=self.d)
        self.Σ_d = np.diag(np.sqrt(Σ_d_power2))
        self.v_d = v_d
        self.mDxx_row_mean = np.mean(mDxx, axis=1).reshape(-1,1)
        self.encoder = (self.Σ_d @ self.v_d.T).T
        self.mDxx = mDxx

        # ===============================================================#
        return self

    def transform(self, mDxy: np.ndarray) -> np.ndarray:
        '''
        Applies (Out of sample) encoding.
        Args:
            mDxy - Input data (Distance matrix) with shape Nx x Ny.
        Output:
            mZ - Low dimensional representation (embeddings) with shape Ny x d.
        '''
        # ===========================Fill This===========================#
        # 1. Encode data using the model encoder.

        k_xy_centered = mDxy - self.mDxx_row_mean
        k_xy_centered = -0.5 * (k_xy_centered - np.mean(k_xy_centered, axis=0).reshape(1, -1))

        mZ = (np.linalg.inv(self.Σ_d) @ self.v_d.T @ k_xy_centered).T

        # ===============================================================#

        return mZ

    def fit_transform(self, mDxx: np.ndarray) -> np.ndarray:
        '''
        Applies encoding on the input.
        Args:
            mDxx - Input data (Distance matrix) with shape Nx x Nx.
        Output:
            mZ - Low dimensional representation (embeddings) with shape Nx x d.
        '''
        # ===========================Fill This===========================#
        # 1. Encode data using the model encoder.

        self.fit(mDxx)
        mZ = self.encoder

        # ===============================================================#

        return mZ

* <font color='red'>(**?**)</font> Will `fit()` followed by `transform()` match the result of `fit_transform()`?
  Make sure you understand this before proceeding.
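
One way to examine this question numerically is sketched below. It assumes a standalone implementation (the helper names `SqDist`, `FitCmds` and `TransformCmds` are hypothetical, not part of the exercise class) and feeds the training distance matrix itself into the out of sample formula.

```python
import numpy as np

def SqDist(mA: np.ndarray, mB: np.ndarray) -> np.ndarray:
    # Squared Euclidean distances with shape (Na, Nb).
    return np.sum((mA[:, None, :] - mB[None, :, :]) ** 2, axis=2)

def FitCmds(mDxx: np.ndarray, d: int):
    # Returns the top `d` eigenvectors / eigenvalues of the centered kernel
    # and the row means of `mDxx` (needed for the out of sample step).
    N = mDxx.shape[0]
    mJ = np.eye(N) - np.ones((N, N)) / N
    mK = -0.5 * mJ @ mDxx @ mJ
    vL, mV = np.linalg.eigh(mK)
    vIdx = np.argsort(vL)[::-1][:d]
    return mV[:, vIdx], vL[vIdx], np.mean(mDxx, axis=1, keepdims=True)

def TransformCmds(mDxy, mV, vL, vRowMean):
    # mDxy has shape (Nx, Ny). The output has shape (Ny, d).
    mK = mDxy - vRowMean                                   # Remove the train row means
    mK = -0.5 * (mK - np.mean(mK, axis=0, keepdims=True))  # Remove the column means
    return (mV.T @ mK).T / np.sqrt(vL)

# Toy data (assumption): 3D points embedded into 2D.
rng = np.random.default_rng(1)
mX = rng.standard_normal((40, 3))
mY = rng.standard_normal((5, 3))

mDxx = SqDist(mX, mX)
mV, vL, vRowMean = FitCmds(mDxx, 2)

mZFit = mV * np.sqrt(vL)                                # Embedding of the train data
mZOos = TransformCmds(mDxx, mV, vL, vRowMean)           # Train data via the out of sample path
mZy = TransformCmds(SqDist(mX, mY), mV, vL, vRowMean)   # Test data
```

For this toy data `mZFit` and `mZOos` coincide up to floating point error, since centering `mDxx` through the out of sample path reproduces the double centered kernel.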

### 7.2. Metric MDS Algorithm

In this section we'll implement a SciKit Learn API compatible class for the Metric MDS.
The implementation will assume the distance matrix is generated using the _Euclidean_ distance (**Not _Squared Euclidean_**).
The solver will use the Majorization Minimization algorithm.

The class should implement the following methods:

1. `__init__()` - The object constructor by the encoder dimension.
2. `fit()` - Given a data set ($\boldsymbol{D}_{xx}$) initializes the data structures.
3. `fit_transform()` - Given a data set ($\boldsymbol{D}_{xx}$) builds the encoder and applies the encoding.

* <font color='brown'>(**#**)</font> Pay attention to data structure (`Nx x Nx` / `Nx x Ny`).
* <font color='brown'>(**#**)</font> Do not use any loops in your implementation besides the main MM loop.
* <font color='brown'>(**#**)</font> Think about the difference in `transform()` and `fit_transform()` compared to `CMDS()` above.
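
To make the Majorization Minimization update concrete, here is a minimal sketch of a single iteration together with the raw stress it is meant to reduce. The function names `PairDist`, `Stress` and `MmStep`, the unit weights, and the toy data are assumptions for illustration only.

```python
import numpy as np

def PairDist(mZ: np.ndarray) -> np.ndarray:
    # Euclidean (not squared) pairwise distances with shape (N, N).
    return np.sqrt(np.sum((mZ[:, None, :] - mZ[None, :, :]) ** 2, axis=2))

def Stress(mD: np.ndarray, mZ: np.ndarray) -> float:
    # Sum over the pairs i < j of (D_ij - ||z_i - z_j||)^2.
    return 0.5 * np.sum((mD - PairDist(mZ)) ** 2)

def MmStep(mD: np.ndarray, mZ: np.ndarray) -> np.ndarray:
    # A single MM update: Z <- (1 / N) * B(Z) @ Z.
    N = mD.shape[0]
    mDz = PairDist(mZ)
    mC = np.zeros_like(mD)
    mMask = mDz > 0                         # Skip coinciding points and the diagonal
    mC[mMask] = -mD[mMask] / mDz[mMask]     # Off diagonal entries of B
    mB = mC - np.diag(np.sum(mC, axis=1))   # Diagonal makes each row sum to zero
    return (mB @ mZ) / N

# Toy data (assumption): distances of 3D points, random 2D initialization.
rng = np.random.default_rng(2)
mX = rng.standard_normal((30, 3))
mD = PairDist(mX)
mZ0 = rng.standard_normal((30, 2))

mZ1 = MmStep(mD, mZ0)
stress0 = Stress(mD, mZ0)
stress1 = Stress(mD, mZ1)
```

Each such step should not increase the stress, which is why the loop may stop once consecutive iterates barely change.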

In [None]:
class MMDS():
    def __init__(self, d: int = 2, maxIter=500, ε=1e-3):
        '''
        Constructing the object.
        Args:
            d       - Number of dimensions of the encoder output.
            maxIter - Maximum number of iterations for the Majorization Minimization.
            ε       - Convergence threshold.
        '''
        # ===========================Fill This===========================#
        # 1. Keep the model parameters.

        self.d = d
        self.ε = ε
        self.maxIter = maxIter
        self.mZ = None

        # ===============================================================#

    def fit(self, mDxx: np.ndarray):
        '''
        Fitting model parameters to the input.
        Args:
            mDxx - Input data (Distance matrix) with shape Nx x Nx.
        Output:
            self
        '''
        # ===========================Fill This===========================#
        # 1. Build the model encoder.

        Nx = mDxx.shape[0]
        mZ_t_next = np.random.rand(Nx, self.d)
        for i in range(self.maxIter):
            mZ_t = mZ_t_next.copy()
            mDzz = cdist(mZ_t, mZ_t)

            mC = np.zeros(mDzz.shape)
            mC[mDzz != 0] = -mDxx[mDzz != 0] / (mDzz[mDzz != 0])
            mC[mDzz == 0] = 0

            mB = mC - np.diag(np.sum(mC, axis=1))
            mZ_t_next = (1 / Nx) * mB @ mZ_t

            curr_dist = np.linalg.norm(mZ_t_next - mZ_t, ord='fro')
            if curr_dist <= self.ε:
                break

        self.mZ = mZ_t_next

        # ===============================================================#
        return self

    def fit_transform(self, mDxx: np.ndarray) -> np.ndarray:
        '''
        Applies encoding on input data.
        Args:
            mDxx - Input data (Distance matrix) with shape Nx x Nx.
        Output:
            mZ - Low dimensional representation (embeddings) with shape Nx x d.
        '''
        # ===========================Fill This===========================#
        # 1. Apply the `fit()` method.
        # 2. Applies the Majorization Minimization.
        # 3. Encode data using the model encoder.
        # !! Use no loops beside the main loop (`maxIter`).

        self.fit(mDxx)
        mZ = self.mZ

        # ===============================================================#
        return mZ

* <font color='red'>(**?**)</font> Why is the `transform()` method not asked to be implemented?
  Make sure you understand this before proceeding.

### 7.3. Apply MDS on Data

In this section the MDS (Using the above classes) will be applied on several data sets:

 * Swiss Roll - Generated using `make_swiss_roll()`.
 * S Curve - Generated using `make_s_curve()`.

For each data set:

1. Plot the Data Set
   Plot the Data set in 3D.
   **This is implemented**.
2. Calculate the Distance Matrix
   Calculate the distance matrix of the training data (`mX1`, `mX2`).
   For _Classic MDS_ use the _Squared Euclidean_ distance.
   For _Metric MDS_ use the _Euclidean_ distance.
3. Apply the MDS
   On each data set, apply both the _Metric MDS_ and _Classic MDS_.
4. Plot Low Dimensional Data
   Make a scatter plot of $\boldsymbol{Z} \in \mathbb{R}^{d \times N}$ and color the data points according to `vCx1` and `vCx2`.
   Use `d = 2`.

* <font color='brown'>(**#**)</font> Pay attention to the difference in dimensions of the data to the derived Math formulations.
* <font color='brown'>(**#**)</font> The output should be 2 figures for each data set. You may show them in a single plot using sub plots.
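
The two distance matrices requested in step 2 differ only in the `metric` argument of `cdist()`. The small sketch below uses random toy data (an assumption, not the exercise data) to show the relation between them.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Toy data (assumption): 6 points in 3D.
rng = np.random.default_rng(3)
mX = rng.standard_normal((6, 3))

mDSq = cdist(mX, mX, metric='sqeuclidean')  # Input for Classic MDS
mDEu = cdist(mX, mX, metric='euclidean')    # Input for Metric MDS
```

Here `mDSq` equals the element wise square of `mDEu`, so passing the wrong one changes the problem each method solves.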

In [None]:
# Generate Data

Nx = 1000 #<! Train Data
Ny = 10 #<! Test Data (Out of Sample)

mX1, mY1, vCx1, vCy1 = GetData(make_s_curve, Nx, Ny)
mX2, mY2, vCx2, vCy2 = GetData(make_swiss_roll, Nx, Ny)

# Centering Data
vμX1 = np.mean(mX1, axis = 0)
vμX2 = np.mean(mX2, axis = 0)

mX1 -= np.reshape(vμX1, (1, -1))
mY1 -= np.reshape(vμX1, (1, -1))
mX2 -= np.reshape(vμX2, (1, -1))
mY2 -= np.reshape(vμX2, (1, -1))

In [None]:
# Plot Data
# Pay attention to how the test (Out of Sample) data is displayed

hF = plt.figure(figsize = (16, 8))
hA1 = hF.add_subplot(1, 2, 1, projection = '3d')
hA2 = hF.add_subplot(1, 2, 2, projection = '3d')
hA1.view_init(elev = 15, azim = 300)
hA2.view_init(elev = 5, azim = 285)

PlotTrainTestData(mX1, mY1, vCx1, vCy1, hA1, Is3DData = True)
PlotTrainTestData(mX2, mY2, vCx2, vCy2, hA2, Is3DData = True)

In [None]:
mX1.shape, mY1.shape, vCx1.shape, vCx2.shape, vCy1.shape, vCy2.shape

In [None]:
#===========================Fill This===========================#
# 1. Set parameter `d`.
# 2. Calculate the distance matrices of the training data per data set.
# 3. Apply Classic MDS and Metric MDS to each data set.
# 4. Display results as scattered data.
# !! The output should be a figure of 2 x 2 (Row: Method, Column: Data Set).

d = 2
methods = ['Classic MDS', 'Metric MDS']
datas = [
    {
        'dataset_name': 'Curve',
        'mX': mX1,
        'mY': mY1,
        'vCx': vCx1,
    },
    {
        'dataset_name': 'Swiss roll',
        'mX': mX2,
        'mY': mY2,
        'vCx': vCx2
    }
]

fig, axs = plt.subplots(2, 2, figsize=(10, 10))

for i, method in enumerate(methods):
    for j, data in enumerate(datas):
        mX = data['mX']
        # Classic MDS expects the Squared Euclidean distance, Metric MDS the Euclidean distance
        mDxx = cdist(mX, mX, metric='sqeuclidean' if method == 'Classic MDS' else 'euclidean')

        mds_alg = CMDS(d) if method == 'Classic MDS' else MMDS(d)
        mZ = mds_alg.fit_transform(mDxx)

        # Plot the embedding of the current method and data set
        axs[i, j].scatter(mZ[:, 0], mZ[:, 1], c= data['vCx'])
        axs[i, j].set_title(f'{method} - {data["dataset_name"]}')

# Adjust the spacing between subplots
fig.tight_layout()

# Show the figure
plt.show()

#===============================================================#

### 7.4. Question

1. Explain the differences / similarities between results.
2. Describe the distance function which should be used for such data.
3. What results would you expect if the distance for the Metric MDS was the _Squared Euclidean_?
   Assume the optimal solver for this distance.

### 7.4. Solution

1. The results of classical MDS and metric MDS will generally aim to be similar in terms of the overall structure and relationships between points.
    However, there can be differences in terms of rotation, mirroring, and scale.

2. The choice of distance function in classical MDS and metric MDS depends on the nature of the data and the desired
    representation of similarity or dissimilarity. In both methods, a distance or dissimilarity matrix is required as input.
    
    Commonly used distance functions for classical MDS:
    Euclidean distance is appropriate when the dissimilarity data can be reasonably assumed to satisfy the triangle
    inequality and can be represented accurately in a Euclidean space.
    Manhattan distance is suitable when the dissimilarity data reflects differences along different dimensions or when the
    data does not satisfy the Euclidean assumptions.

    Commonly used distance functions for metric MDS:
    Correlation-based distance measures dissimilarity based on the correlation coefficient between variables.
    It captures the similarity or dissimilarity of patterns rather than the absolute differences in values.
    Non-metric distance measures can also be used. These measures do not necessarily satisfy the triangle inequality
    but provide a way to handle dissimilarity data that may violate Euclidean assumptions.

3. If the squared Euclidean distance is used as the distance function in metric MDS, the results would be similar to
    using the regular Euclidean distance, but with some differences.
    The configuration of points obtained through metric MDS with squared Euclidean distance would aim to minimize the
    stress function based on the squared dissimilarities.
    This may lead to a greater separation or spread of points in the low-dimensional space, emphasizing larger dissimilarities.

> Royi: ❌, (1) Why would Classical and Metric be similar?  
> Royi: ❌, (2) You didn't address the above data and what would suit it.  
> Royi: ❌, (3) What's the connection between the Classical MDS and the Squared Euclidean?  
> Royi: <font color='red'>-5</font>.


---

### 7.5. Out of Sample Extension

In this section the _out of sample extension_ of the _Classic MDS_ (Using the above class) will be applied.
In this section the calculation of the out of sample extension will be done without using the training data samples (`mX1`, `mX2`)!

For `mY1` and `mY2`:

1. Calculate the Distance Matrix
   Calculate `Dxy1` and `Dxy2` **without using `mX1` and `mX2`**.
   You may use `Dxx1` and `Dxx2` in any way suitable.
   For _Classic MDS_ use the _Squared Euclidean_ distance.
   For _Metric MDS_ use the _Euclidean_ distance.
2. Apply the Out of Sample Extension for Classic MDS
   On each data set, apply the Classic MDS in _out of sample extension_ mode on `mDxy1` and `mDxy2`.
3. Plot Low Dimensional Data
   Make a scatter plot of $\boldsymbol{Z} \in \mathbb{R}^{d \times N}$ and color the data points according to `vCx1`, `vCx2`.
   You should plot both the training data and the test data.
   Use `d = 2`.

* <font color='brown'>(**#**)</font> Pay attention to the difference in dimensions of the data to the derived Math formulations.
* <font color='brown'>(**#**)</font> You may use the knowledge about the dimensions of `mX1`, `mX2`.
* <font color='brown'>(**#**)</font> In case one fails on (1) one may calculate `mDxy` using `mX` (Points will be reduced).
* <font color='brown'>(**#**)</font> The output should be 2 figures for each data set. You may show them in a single plot using sub plots.

In [None]:
#===========================Fill This===========================#
# 1. Set parameter `d`.
# 2. Calculate the distance matrices of the test data per data set from `mDxx1` and `mDxx2`.
# 3. Apply Classic MDS to each data set.
#    Apply `fit()` then `transform()` on `mDxx1` and `mDxx2`.
#    Apply `transform()` on `mDxy1` and `mDxy2`.
# 4. Display results as scattered data.
#    Display both the train and test data on the same axes (See above).
# !! The output should be a figure of 1 x 2 (Row: Method, Column: Data Set).
# !! Hint: You should recover the data from `mDxx`.

d = 2

fig, axs = plt.subplots(2, 2, figsize=(16, 8))

isUsing_mXs = [False, True]

for i, isUsing_mX in enumerate(isUsing_mXs):
    for j, data in enumerate(datas):
        mX = data['mX']
        mY = data['mY']
        vCx = data['vCx']

        mDxx = cdist(mX, mX, metric='sqeuclidean')

        cmds = CMDS(d)
        mZx = cmds.fit_transform(mDxx)

        if isUsing_mX:
            mDxy = cdist(mX, mY, metric='sqeuclidean')
        else:
            # Recover the training points (up to rotation / reflection) from `mDxx`
            # by applying Classic MDS with the original data dimension.
            _, d_origin = mY.shape
            mXRec = CMDS(d_origin).fit_transform(mDxx)
            mDxy = cdist(mXRec, mY, metric='sqeuclidean')

        mZy = cmds.transform(mDxy)

        axs[i][j].scatter(mZx[:, 0], mZx[:, 1], c = vCx, edgecolor = 'k', alpha = 1, label = 'Train')
        axs[i][j].scatter(mZy[:, 0], mZy[:, 1], s = 100, c = 'r', marker = '*', alpha = 1, label = 'Test')
        axs[i][j].set_title(f'Classic MDS - {data["dataset_name"]} [{"Using" if isUsing_mX else "Without using"} mX]')

plt.show()

#===============================================================#

### 7.6. Question

Are the results above good?
Will they match the results if one would calculate `mDxy` from `mX` and `mY`?

### 7.6. Solution

Recovering `mX1` using Classic MDS and `mDxx` may not be good due to the translation and rotation ambiguity that arises during the recovery process. This ambiguity is caused by the fact that Classic MDS cannot uniquely determine the translation, rotation, and reflection of the recovered configuration.

On the other hand, if we calculate `mDxy` directly from `mX` and `mY`, the results are likely to be good. By using `mX` and `mY` to calculate the dissimilarity matrix `mDxy`, we eliminate the translation and rotation ambiguity present when recovering `mX` from `mDxx`.

Calculating `mDxy` from `mX` and `mY` allows us to preserve the inherent relationships and structure of the data, resulting in a more accurate representation in the low-dimensional space.
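
The ambiguity can be illustrated with a small sketch. It assumes toy 3D data and a hypothetical random orthogonal matrix `mR` standing in for the unknown rotation / reflection of a configuration recovered from `mDxx`; the helper `SqDist` is also an assumption.

```python
import numpy as np

def SqDist(mA: np.ndarray, mB: np.ndarray) -> np.ndarray:
    # Squared Euclidean distances with shape (Na, Nb).
    return np.sum((mA[:, None, :] - mB[None, :, :]) ** 2, axis=2)

# Toy data (assumption): train and test points in 3D.
rng = np.random.default_rng(4)
mX = rng.standard_normal((20, 3))
mY = rng.standard_normal((4, 3))

# A random orthogonal matrix (rotation, possibly with a reflection) from a QR decomposition.
mR, _ = np.linalg.qr(rng.standard_normal((3, 3)))
mXRot = mX @ mR  # Stand in for a recovered configuration

# Distances within the train set are not affected by the transformation.
sameDxx = np.allclose(SqDist(mX, mX), SqDist(mXRot, mXRot))
# Distances to the test set (still in the original coordinates) are affected.
sameDxy = np.allclose(SqDist(mX, mY), SqDist(mXRot, mY))
```

In this sketch `sameDxx` holds while `sameDxy` does not: a configuration that reproduces `mDxx` exactly can still yield a different `mDxy` once it is compared against `mY` in the original coordinates.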

---