![](https://i.imgur.com/qkg2E2D.png)

# UnSupervised Learning Methods

## Exercise 002 - Part III

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 01/04/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/UnSupervisedLearningMethods/2023_03/Exercise0002Part003.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp

# Machine Learning

# Computer Vision

# Statistics
from scipy.stats import multivariate_normal as MVN

# Miscellaneous
import os
import math
from platform import python_version
import random
import time
import urllib.request

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
#%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

DATA_FILE_URL   = r''
DATA_FILE_NAME  = r''


## Guidelines

 - Fill in the full names and IDs of the team members in the `Team Members` section.
 - Answer all questions / tasks within the Jupyter Notebook.
 - Use Markdown + MathJax + Code to answer.
 - Verify the rendering in VS Code.
 - Submission in groups (Single submission per group).
 - You may and _should_ use the forums for questions.
 - Good Luck!

* <font color='brown'>(**#**)</font> The `Import Packages` section above imports most of the tools needed for this work. Please use it.
* <font color='brown'>(**#**)</font> You may replace the suggested functions with functions from other packages.
* <font color='brown'>(**#**)</font> Anything you are not explicitly asked to implement may be taken from 3rd party packages.

## Team Members

 - `<FULL>_<NAME>_<ID001>`.
 - `<FULL>_<NAME>_<ID002>`.

## Generate / Load Data

In [None]:
# Generate / Load Data

N1    = 250
N2    = 150
N3    = 200

vMu1  = np.array([0, 0  ])
vMu2  = np.array([2, 0.5])
vMu3  = np.array([4, 1  ])

mSig1 = 0.5 * np.array([[1.00, 1.25],
                       [1.25, 2.00]])

mSig2 = 0.5 * np.array([[ 1.00, -1.25],
                       [-1.25,  2.00]])

mSig3 = 0.5 * np.array([[1.00, 1.25],
                       [1.25, 2.00]])

mX1 = MVN.rvs(mean = vMu1, cov = mSig1, size = N1)
mX2 = MVN.rvs(mean = vMu2, cov = mSig2, size = N2)
mX3 = MVN.rvs(mean = vMu3, cov = mSig3, size = N3)
mX  = np.r_[mX1, mX2, mX3]


In [None]:
# Plot Data
hF, hA = plt.subplots(figsize = (6, 6))

hA.scatter(mX[:, 0], mX[:, 1], s = 50, edgecolors = 'k', color = 'b')
hA.axis('equal')
hA.set_title('Input Data')
hA.set_xlabel('${x}_{1}$')
hA.set_ylabel('${x}_{2}$')

plt.show()

## 7. Clustering by Gaussian Mixture Model (GMM)

### 7.1. GMM Algorithm

The GMM algorithm aims to maximize the (log) likelihood objective:
$$\arg\max_{\left\{ \left(w_{k},\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}\right)\right\} _{k=1}^{K}}f=\arg\max_{\left\{ \left(w_{k},\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}\right)\right\} _{k=1}^{K}}\sum_{i=1}^{N}\log\left(\sum_{k=1}^{K}w_{k}\mathcal{N}\left(\boldsymbol{x}_{i};\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}\right)\right)$$
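
To make the objective concrete, here is a minimal sketch of evaluating it for fixed parameters on toy data. The function name `gmm_log_likelihood` and the ASCII variable names are hypothetical (not part of the exercise interface), and the log-sum-exp trick is one possible way to keep the inner sum numerically stable, not necessarily the intended one.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(mX, mMu, tSigma, vW):
    # Assumed shapes: mX (N, d), mMu (K, d), tSigma (d, d, K), vW (K,)
    K = mMu.shape[0]
    # Column k holds log(w_k) + log N(x_i; mu_k, Sigma_k) for every sample i
    mLogP = np.column_stack([
        np.log(vW[k]) + multivariate_normal.logpdf(mX, mean = mMu[k], cov = tSigma[:, :, k])
        for k in range(K)
    ])
    # Inner sum over k in the log domain, then outer sum over the samples
    return float(np.sum(logsumexp(mLogP, axis = 1)))

# Toy data and parameters (illustrative values only)
rng       = np.random.default_rng(0)
mXToy     = rng.standard_normal((5, 2))
mMuToy    = np.array([[0.0, 0.0], [1.0, 1.0]])
tSigmaToy = np.stack([np.eye(2), 2.0 * np.eye(2)], axis = 2)
vWToy     = np.array([0.5, 0.5])

objVal = gmm_log_likelihood(mXToy, mMuToy, tSigmaToy, vWToy)
```

Working in the log domain avoids underflow when a sample is far from every component, which a direct product of weights and densities would not.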

In this section you should implement:

1. `InitGmm()` - Given a data set and number of clusters it sets the initialization of the `GMM` algorithm:  
   - `mμ` - Should be initialized by the [`K-Means++`](https://en.wikipedia.org/wiki/K-means%2B%2B) algorithm.
   - `tΣ` - Should be initialized with diagonal matrices with the data variance on the diagonal (a scalar matrix).
   - `vW` - Should be initialized uniformly (each weight equals `1 / K`).  
2. `CalcGmmObj()` - Given a data set and set of parameters it calculates the value of the GMM objective function.
3. `GMM()` - Given a data set and initialization applies the GMM algorithm.  
The stopping criterion should be either reaching the maximum number of iterations or a minimal improvement in the objective function.
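
The initialization described in item 1 could be sketched as follows. This is an illustration under stated assumptions, not the reference solution: the names `kmeanspp_seed` and `init_gmm_sketch` are hypothetical, the "data variance" is taken here as the mean of the per-feature variances (the exercise does not pin this down), and the data is assumed to contain at least two distinct points.

```python
import numpy as np

def kmeanspp_seed(mX, K, seedNum = 123):
    # K-Means++ seeding: each new centroid is a data point drawn with
    # probability proportional to its squared distance to the nearest chosen centroid
    rng   = np.random.default_rng(seedNum)
    N, d  = mX.shape
    mC    = np.empty((K, d))
    mC[0] = mX[rng.integers(N)]
    vD2   = np.sum((mX - mC[0]) ** 2, axis = 1)
    for k in range(1, K):
        mC[k] = mX[rng.choice(N, p = vD2 / vD2.sum())]
        vD2   = np.minimum(vD2, np.sum((mX - mC[k]) ** 2, axis = 1))
    return mC

def init_gmm_sketch(mX, K, seedNum = 123):
    d      = mX.shape[1]
    mMu    = kmeanspp_seed(mX, K, seedNum)
    # Assumption: a single scalar variance, the mean of the per-feature variances
    sigma2 = np.mean(np.var(mX, axis = 0))
    tSigma = np.repeat((sigma2 * np.eye(d))[:, :, None], K, axis = 2) # (d, d, K)
    vW     = np.full(K, 1.0 / K) # Uniform weights
    return mMu, tSigma, vW
```

Using a local `np.random.default_rng(seedNum)` instead of the global NumPy state is one way to make the result depend only on the inputs, which is what the reproducibility requirement asks for.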

* <font color='brown'>(**#**)</font> Implementation should be efficient (memory and operations). Total run time is expected to be **less than 60 seconds**.
* <font color='brown'>(**#**)</font> You may use the `scipy.stats.multivariate_normal` class.
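
For reference, the textbook Expectation Maximization (EM) iteration that a `GMM()` implementation is usually built around is given below. The responsibility symbol $P_{ik}$ and the effective count $N_{k}$ are notation introduced here for illustration; the course material may use different symbols.

$$P_{ik}=\frac{w_{k}\mathcal{N}\left(\boldsymbol{x}_{i};\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}\right)}{\sum_{j=1}^{K}w_{j}\mathcal{N}\left(\boldsymbol{x}_{i};\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j}\right)},\qquad N_{k}=\sum_{i=1}^{N}P_{ik}$$

$$w_{k}=\frac{N_{k}}{N},\qquad\boldsymbol{\mu}_{k}=\frac{1}{N_{k}}\sum_{i=1}^{N}P_{ik}\boldsymbol{x}_{i},\qquad\boldsymbol{\Sigma}_{k}=\frac{1}{N_{k}}\sum_{i=1}^{N}P_{ik}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{k}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}_{k}\right)^{T}$$

The first line is the E step (computed with the current parameters) and the second line is the M step, where $\boldsymbol{\Sigma}_{k}$ uses the freshly updated $\boldsymbol{\mu}_{k}$. Each full iteration does not decrease the objective above, which is why a minimal improvement threshold is a sensible stopping rule.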



In [None]:
#===========================Fill This===========================#
def InitGmm(mX: np.ndarray, K: int, seedNum: int = 123) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    '''
    GMM algorithm initialization.
    Args:
        mX          - Input data with shape N x d.
        K           - Number of clusters.
        seedNum     - Seed number used.
    Output:
        mμ          - The initial mean vectors with shape K x d.
        tΣ          - The initial covariance matrices with shape (d x d x K).
        vW          - The initial weights of the GMM with shape K.
    Remarks:
        - Given the same parameters, including the `seedNum` the algorithm must be reproducible.
        - mμ Should be initialized by the K-Means++ algorithm.
    '''

    pass
#===============================================================#

In [None]:
#===========================Fill This===========================#
def CalcGmmObj(mX: np.ndarray, mμ: np.ndarray, tΣ: np.ndarray, vW: np.ndarray) -> float:
    '''
    GMM algorithm objective function.
    Args:
        mX          - The data with shape N x d.
        mμ          - The mean vectors with shape K x d.
        tΣ          - The covariance matrices with shape (d x d x K).
        vW          - The weights of the GMM with shape K.
    Output:
        objVal      - The value of the objective function of the GMM.
    Remarks:
        - A
    '''

    pass
#===============================================================#

In [None]:
#===========================Fill This===========================#
def GMM(mX: np.ndarray, mμ: np.ndarray, tΣ: np.ndarray, vW: np.ndarray, numIter: int = 1000, stopThr: float = 1e-5) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, List[float]]:
    '''
    GMM algorithm.
    Args:
        mX          - Input data with shape N x d.
        mμ          - The initial mean vectors with shape K x d.
        tΣ          - The initial covariance matrices with shape (d x d x K).
        vW          - The initial weights of the GMM with shape K.
        numIter     - Number of iterations.
        stopThr     - Stopping threshold.
    Output:
        mμ          - The final mean vectors with shape K x d.
        tΣ          - The final covariance matrices with shape (d x d x K).
        vW          - The final weights of the GMM with shape K.
        vL          - The labels (0, 1, ..., K - 1) per sample with shape (N, ).
        lO          - The objective function value per iteration (List).
    Remarks:
        - The maximum number of iterations must be `numIter`.
        - If the objective value of the algorithm doesn't improve by at least `stopThr` the iterations should stop.
    '''

    pass
#===============================================================#

### 7.2. Clustering the Data Set

In this section we'll compare the output of the GMM to that of the K-Means on the data set.
The tasks are:

1. Create a file called `CourseAuxFun.py`.  
   Copy the functions related to the GMM and K-Means into it.
2. Import the functions using `from CourseAuxFun import *`.
3. Using **the same** initialization (`mC` and `mμ`), compare the results of the K-Means and GMM algorithms.
4. Plot the objective function of the GMM as a function of the iteration.
5. Write a short explanation of why the results are different.

In [None]:
#===========================Fill This===========================#
# 1. Set the clustering parameters.
# 2. Apply the GMM algorithm.

???
#===============================================================#


In [None]:
#===========================Fill This===========================#
# 1. Plot the clustered data.
# 2. Plot the objective function as a function of the iterations.
# !! You may plot in a single figure (Sub Plots).

???
#===============================================================#

### 7.3. GMM vs. K-Means

K-Means is known to be a special case of GMM.  
The following questions explore the connection between the two.

 1. How should the parameters of the GMM algorithm be set in order to obtain the K-Means?
 2. How should the data in (7.2) be altered so that K-Means performs on it like the GMM?  
    Assume you know exactly how it was generated.

You may use coding to verify and show the results.

### 7.3. Solution

<font color='red'>??? Fill the answer here ???</font>

---