![](https://i.imgur.com/qkg2E2D.png)

# UnSupervised Learning Methods

## Exercise 002 - Part IV

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 03/04/2023 | Royi Avital | First version                                                      |
|         |            |             |                                                                    |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/UnSupervisedLearningMethods/2023_03/Exercise0002Part004.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp

# Machine Learning
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering

# Computer Vision

# Statistics

# Miscellaneous
import os
import math
from platform import python_version
import random
import time
import urllib.request

# Typing
from typing import Callable, List, Tuple, Union

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

In [None]:
# Configuration
#%matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants

DATA_FILE_URL   = r'https://drive.google.com/uc?export=download&confirm=9iBg&id=11YqtdWwZSNE-0KxWAf1ZPINi9-ar56Na'
DATA_FILE_NAME  = r'ClusteringData.npy'


## Guidelines

 - Fill in the full names and IDs of the team members in the `Team Members` section.
 - Answer all questions / tasks within the Jupyter Notebook.
 - Use Markdown + MathJax + Code to answer.
 - Verify the rendering in VS Code.
 - Submission in groups (Single submission per group).
 - You may and _should_ use the forums for questions.
 - Good Luck!

* <font color='brown'>(**#**)</font> The `Import Packages` section above imports most of the tools needed for the work. Please use it.
* <font color='brown'>(**#**)</font> You may replace the suggested functions with functions from other packages.
* <font color='brown'>(**#**)</font> Anything not explicitly required to be implemented may be taken from 3rd party packages.

## Team Members

 - `<FULL>_<NAME>_<ID001>`.
 - `<FULL>_<NAME>_<ID002>`.

## Generate / Load Data

In [None]:
# Download Data
# This section downloads data from the given URL if needed.

if not os.path.exists(DATA_FILE_NAME):
    urllib.request.urlretrieve(DATA_FILE_URL, DATA_FILE_NAME)

In [None]:
# Generate / Load Data

numSamples  = 1000
mA          =  np.array([[0.6, -0.6], [-0.4, 0.8]])

mX1 = datasets.make_circles(n_samples = numSamples, noise = 0.02)[0]
mX2 = datasets.make_moons(n_samples = numSamples, noise = 0.05)[0]
mX3 = datasets.make_blobs(n_samples = numSamples, random_state = 170)[0] @ mA
mX4 = datasets.make_blobs(n_samples = numSamples, random_state = 170, cluster_std = [0.8, 2, 0.4])[0] 
mX5 = np.load(DATA_FILE_NAME)

lDataSet = [mX1, mX2, mX3, mX4, mX5]
numDataSets = len(lDataSet)


In [None]:
# Plot Data
hF, hAs = plt.subplots(nrows = 1, ncols = numDataSets, figsize = (18, 5))
hAs = hAs.flat

for ii, hA in enumerate(hAs):
    mX = lDataSet[ii]
    hA.scatter(mX[:, 0], mX[:, 1], c = 'lime', s = 15, edgecolor = 'k')
    hA.axis('equal')
    
plt.tight_layout()
plt.show()

## 8. Clustering by Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

### 8.1. DBSCAN Algorithm

In this section we'll implement the DBSCAN algorithm:

1. Implement an auxiliary function to compute the connected components (`GetConnectedComponents()`).  
   You may choose any implementation strategy (`DFS` / `BFS`, etc.).
2. Implement the function `DBSCAN()`.  
   The function should label noise points as `-1`.

* <font color='brown'>(**#**)</font> The implementation should be efficient (Memory and operations). The total run time is expected to be **less than 20 seconds**.
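One way to validate a self-written `GetConnectedComponents()` is to compare its output against SciPy's `scipy.sparse.csgraph.connected_components()`. The sketch below assumes `mG` is a symmetric adjacency matrix in which a non-zero entry marks an edge. The toy graph and the helper `SamePartition()` are made up for illustration and are not part of the exercise. Since the label values of the components are arbitrary, the comparison is done on the induced partition rather than on the labels themselves.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def SamePartition(vL1: np.ndarray, vL2: np.ndarray) -> bool:
    # Hypothetical helper: two labelings match if "same label" agrees for every pair of nodes
    mSame1 = vL1[:, None] == vL1[None, :]
    mSame2 = vL2[:, None] == vL2[None, :]
    return bool(np.array_equal(mSame1, mSame2))

# Toy graph (Assumption): nodes {0, 1, 2} form a chain, nodes {3, 4} are linked, node 5 is isolated
mG = np.zeros((6, 6), dtype = int)
for ii, jj in [(0, 1), (1, 2), (3, 4)]:
    mG[ii, jj] = 1
    mG[jj, ii] = 1

# Reference labels by SciPy (Undirected graph)
numComp, vLRef = connected_components(csr_matrix(mG), directed = False)

print(f'Number of components: {numComp}')
print(f'Reference labels: {vLRef}')
```

Calling `SamePartition(GetConnectedComponents(mG), vLRef)` should then return `True` for a correct implementation, regardless of how the components are numbered.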


In [None]:
#===========================Fill This===========================#
def GetConnectedComponents(mG: np.ndarray) -> np.ndarray:
    '''
    Extract the connected components of a graph.
    Args:
        mG          - Graph matrix.
    Output:
        vL          - Label per component.
    Remarks:
        - This is a !!BFS / DFS!! implementation.
    '''

    pass
#===============================================================#

In [None]:
#===========================Fill This===========================#
def DBSCAN(mX: np.ndarray, Z: int, r: float) -> np.ndarray:
    '''
    DBSCAN Algorithm.
    Args:
        mX  - Input data with shape N x d.
        Z   - Number of points required to be a core point.
        r   - Neighborhood radius.
    Output:
        vL  - The labels (-1, 0, 1, .., K - 1) per sample with shape (N, ).
    Remarks:
        - Clusters will have the labels {0, 1, ..., K - 1}.
        - Noise samples will have the label `-1`.
    '''

    # Pre

    # Step 1: Find core points

    # Step 2: Build the graph

    # Step 3: Find connected components

    # Step 4: Assign boundary points

    pass
#===============================================================#

### 8.2. Clustering the Data Set

In this section we'll use the implementation of the DBSCAN algorithm.
The tasks are:

1. Use the data set `mX4`.
2. Tweak the parameters until you have 3 clusters.
3. Display results.
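As a sanity check for the steps above, the result of the self-implemented `DBSCAN()` can be compared with the one from `sklearn.cluster.DBSCAN`. The sketch below is only a reference run, not the solution: it regenerates a data set with the same settings as `mX4`, and the values of `paramZ` and `paramR` are arbitrary, untuned assumptions. In SciKit Learn `eps` plays the role of `r` and `min_samples` plays the role of `Z`. Note that `min_samples` counts the sample itself, which may or may not match the convention chosen for `Z`.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN as SkDBSCAN

# Same generator settings as `mX4` in the notebook
mX = datasets.make_blobs(n_samples = 1000, random_state = 170, cluster_std = [0.8, 2, 0.4])[0]

# Illustrative values only (Assumption), they are not tuned to yield 3 clusters
paramZ = 10
paramR = 0.5

# Labels are {0, 1, ..., K - 1} for clusters and -1 for noise
vL = SkDBSCAN(eps = paramR, min_samples = paramZ).fit_predict(mX)

numClusters = len(set(vL.tolist()) - {-1})
numNoise    = int(np.sum(vL == -1))

print(f'Number of clusters: {numClusters}, Number of noise samples: {numNoise}')
```

Printing the number of clusters and noise samples after each parameter change gives quick feedback while tweaking `Z` and `r`.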

In [None]:
#===========================Fill This===========================#
# 1. Set parameters.
# 2. Apply the algorithm.

???

#===============================================================#


In [None]:
#===========================Fill This===========================#
# 1. Plot the clustered data.
# !! The noise samples should also be labeled.

???

#===============================================================#

### 8.3. An Algorithm to Set the Parameters Automatically Given a Data Set

Can you think of an algorithm to automatically infer optimal parameters of the DBSCAN algorithm given a data set?

1. Sketch the algorithm (Words / Diagram).
2. Implement and test on `mX4`.
3. Plot the results.

* <font color='brown'>(**#**)</font> Run time should be reasonable (A single digit number of seconds).
* <font color='brown'>(**#**)</font> Good answers may earn a bonus of up to 4 points.
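One possible starting point, among many, is the k-distance heuristic: fix `Z`, compute the distance from each sample to its `Z`-th nearest neighbor, sort those distances and take `r` at the "knee" of the sorted curve. The sketch below is an assumption about how this could be done, not the expected answer. The function name `EstimateRadius()` is hypothetical, and the knee is located by a simple rule (the point farthest below the chord connecting the ends of the normalized curve). It assumes the distances are not all equal.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors

def EstimateRadius(mX: np.ndarray, Z: int) -> float:
    # Hypothetical helper: estimates `r` by the knee of the sorted Z-th neighbor distance curve
    # Column 0 of `mD` is the sample itself (Distance 0), hence `Z + 1` neighbors are queried
    oNN   = NearestNeighbors(n_neighbors = Z + 1).fit(mX)
    mD, _ = oNN.kneighbors(mX)
    vD    = np.sort(mD[:, -1])

    # Normalize both axes to [0, 1] (Assumes vD[-1] > vD[0])
    vT  = np.linspace(0, 1, len(vD))
    vDn = (vD - vD[0]) / (vD[-1] - vD[0])

    # The knee is taken as the point with the largest gap below the chord y = t
    kneeIdx = int(np.argmax(vT - vDn))

    return float(vD[kneeIdx])

# Same generator settings as `mX4` in the notebook
mX = datasets.make_blobs(n_samples = 1000, random_state = 170, cluster_std = [0.8, 2, 0.4])[0]

paramZ = 2 * mX.shape[1] #<! Rule of thumb (Assumption): twice the data dimension
paramR = EstimateRadius(mX, paramZ)

print(f'Z = {paramZ}, r = {paramR:0.3f}')
```

A single global `r` tends to struggle when clusters have very different densities, as in `mX4`, so the estimate is better treated as an initial guess to refine than as a final answer.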

### 8.3. Solution

<font color='red'>??? Fill the answer here ???</font>

---

In [None]:
#===========================Fill This===========================#
# Implement a function which gets a data set and outputs the `Z` and `r` parameters of `DBSCAN()`.

#===============================================================#


In [None]:
#===========================Fill This===========================#
# Test your algorithm on `mX4` data set. Show results.

#===============================================================#

### 8.4. Test Methods on the Data Set

In this section we'll compare 4 methods on each data set.  
The 4th method is `AgglomerativeClustering`, which is imported from `SciKit Learn`.

1. Run each method on each data set.
2. Plot a grid of results (Using `plt.subplots()`): Each row is a different method, each column is a different data set.
3. Optimize the parameters per data set per method.

The final result is a grid of `4 x 5` scatter plots.

* <font color='brown'>(**#**)</font> You should use `CourseAuxFun.py` and import your self-implemented functions from the module.
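The grid layout described above (rows are methods, columns are data sets) could be organized along the following lines. This is only a layout sketch: `PlotClusterGrid()` is a hypothetical helper, the two methods are SciKit Learn stand-ins rather than the self-implemented functions from `CourseAuxFun.py`, and all parameter values and the two small data sets are arbitrary assumptions. Each method is wrapped as a callable which maps a data set to a vector of labels, so that parameters can be set per method.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering, DBSCAN as SkDBSCAN

def PlotClusterGrid(lData: list, dMethods: dict):
    # Hypothetical helper: one row per method, one column per data set
    numRows, numCols = len(dMethods), len(lData)
    hF, hAs = plt.subplots(nrows = numRows, ncols = numCols, figsize = (4 * numCols, 4 * numRows), squeeze = False)

    for ii, (methodName, hMethod) in enumerate(dMethods.items()):
        for jj, mX in enumerate(lData):
            vL     = hMethod(mX) #<! Labels, -1 marks noise (If the method supports it)
            vNoise = vL == -1
            hA     = hAs[ii, jj]
            hA.scatter(mX[~vNoise, 0], mX[~vNoise, 1], c = vL[~vNoise], s = 15, cmap = 'tab10')
            if np.any(vNoise):
                # Noise samples get their own marker so they are visible in the plot
                hA.scatter(mX[vNoise, 0], mX[vNoise, 1], c = 'k', marker = 'x', s = 15)
            hA.set_title(f'{methodName}, Data Set {jj + 1}')
            hA.axis('equal')

    hF.tight_layout()

    return hF, hAs

# Stand-in data sets and methods (Assumptions, values are not tuned)
lData = [datasets.make_moons(n_samples = 300, noise = 0.05, random_state = 0)[0],
         datasets.make_blobs(n_samples = 300, random_state = 170)[0]]

dMethods = {
    'Agglomerative': lambda mX: AgglomerativeClustering(n_clusters = 2, linkage = 'single').fit_predict(mX),
    'DBSCAN (sklearn)': lambda mX: SkDBSCAN(eps = 0.3, min_samples = 5).fit_predict(mX),
}

hF, hAs = PlotClusterGrid(lData, dMethods)
```

Since the parameters should be optimized per data set per method, the callables could also accept the data set index and look up the parameters from a table. Call `plt.show()` afterwards when running outside a notebook.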

In [None]:
#===========================Fill This===========================#
# Display the results of each method

#===============================================================#