# LESSON 6: K-MEANS CLUSTERING - DBSCAN
<table><tr>
<td> <img src="../images/clustering_logo.png" width="500px"/> </td>
</tr></table>

*The K-means clustering part of this lecture is adapted from [machinelearningcoban.com](https://machinelearningcoban.com/2017/01/01/kmeans/)*

## 1. K-means clustering introduction

<img src="../images/kmeans_example.png" width="400px"/>

Given a dataset of many unlabeled data points, we want to group the points into K clusters so that samples in the same cluster have similar features. An algorithm that solves this problem is ***K-MEANS CLUSTERING***.

<img src="../images/kmeans_example_5_clusters.png" width="400px"/>

In 2D space, the region of each cluster is a polygon bounded by straight line segments. Each segment lies on the perpendicular bisector of the line connecting two cluster centers.
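A short derivation makes the perpendicular bisector claim concrete. This is a sketch for two centers only, written here as $m_1$ and $m_2$ (the subscripts are chosen for illustration). A point $x$ on the border between their regions is equally close to both centers:

<center>
\[
\|x - m_1\|^2 = \|x - m_2\|^2 \\
\Leftrightarrow -2 m_1^T x + \|m_1\|^2 = -2 m_2^T x + \|m_2\|^2 \\
\Leftrightarrow (m_2 - m_1)^T \left( x - \frac{m_1 + m_2}{2} \right) = 0
\]
</center>

The last line says that $x$ lies on the line through the midpoint $\frac{m_1 + m_2}{2}$ that is perpendicular to $m_2 - m_1$, which is the perpendicular bisector of the segment joining the two centers.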

Assume that we have N data points $X = [x_1, x_2, \dots, x_N]$ and that K < N is the number of clusters. We have to find $M = [m_1, m_2, \dots, m_K]$, the centers (or representatives) of the K clusters, and $Y = [y_1, y_2, \dots, y_N]$, the labels of the N data points.

The label $y_i$ of each sample is one-hot encoded: $y_i = [y_{i1}, y_{i2}, \dots, y_{iK}]$, where $y_{ij} = 1$ if $x_i$ is assigned to cluster $j$ and $y_{ij} = 0$ otherwise.

Therefore, each label satisfies

<center>
\[
 y_{ik} \in \{0, 1\} \\
 \sum_{k = 1}^K y_{ik} = 1
\]
</center>
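As a small illustration of the one-hot labels, the sketch below builds $Y$ from integer cluster indices with NumPy and checks both constraints. The cluster indices and the variable name `cluster_index` are made up for this example.

```python
import numpy as np

# Hypothetical cluster indices for N = 5 points and K = 3 clusters.
cluster_index = np.array([0, 2, 1, 2, 0])
K = 3

# Row i of Y is the one-hot label y_i: Y[i, j] = 1 iff x_i is in cluster j.
Y = np.eye(K, dtype=int)[cluster_index]
print(Y)

# Both constraints hold: every entry is 0 or 1, and every row sums to 1.
print(np.isin(Y, [0, 1]).all(), (Y.sum(axis=1) == 1).all())
```

In practice it is often more convenient to store the integer index of each point and build the one-hot matrix only when a formula needs it.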

## 2. Loss function and Optimizer for K-means clustering
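The goal of K-means clustering is to minimize the distance between each data point and the center of its cluster.

If data point $x_i$ belongs to the cluster with center $m_k$, then $y_{ik} = 1$ and $y_{ij} = 0$ for all $j \neq k$. Writing $(x_i - m_k)^2$ for the squared Euclidean distance $\|x_i - m_k\|_2^2$, the distance from $x_i$ to its center is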

<center>
    \[
    D(x_i, m_k) = (x_i - m_k)^2 \\
    D(x_i, m_k) = \sum_{j=1}^K y_{ij}(x_i - m_j)^2
    \]
</center>

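Summing over all data points gives the loss function for the whole dataset. This is the total squared distance from each point to its assigned center; dividing it by $N$ would give the mean squared error, which has the same minimizer.

<center>
    \[
    \mathcal{L}(Y, M)
    = \sum_{i=1}^{N}\sum_{j=1}^{K}y_{ij}(x_i - m_j)^2
    \]
</center>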
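The loss can be evaluated directly from its definition. The following is a minimal NumPy sketch, assuming `X` is an (N, d) array, `Y` an (N, K) one-hot array, and `M` a (K, d) array; the function name `kmeans_loss` and the toy data are chosen only for this example.

```python
import numpy as np

def kmeans_loss(X, Y, M):
    """Sum over i, j of y_ij * ||x_i - m_j||^2.

    X: (N, d) data, Y: (N, K) one-hot labels, M: (K, d) centers.
    """
    # sq_dist[i, j] = squared Euclidean distance between x_i and m_j
    sq_dist = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    # The one-hot mask keeps only the distance to the assigned center.
    return float((Y * sq_dist).sum())

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
M = np.array([[0.0, 1.0], [10.0, 0.0]])
Y = np.array([[1, 0], [1, 0], [0, 1]])
print(kmeans_loss(X, Y, M))  # 2.0
```

The first two points are each at squared distance 1 from the first center and the third point sits exactly on the second center, so the loss is 1 + 1 + 0 = 2.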

The loss has two sets of variables, $Y$ and $M$. To minimize it, we alternate: fix one variable and optimize the other, then swap. Specifically, we solve two subproblems in turn: fix $M$ and optimize $Y$, then fix $Y$ and optimize $M$.

### Fixed M, optimize Y
For each data point $x_i$,

<center>
    \[
    y_i = \arg\min_{y_i} \mathcal{L}(y_i) \\
    = \arg\min_{y_i} \sum_{j=1}^{K}y_{ij}(x_i - m_j)^2
    \]
</center>

Because $y_i$ is one-hot, exactly one term of the sum is non-zero, so the loss for $x_i$ is the squared distance to a single center. It is minimized by assigning $x_i$ to its **nearest center**.
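The assignment step can be sketched in a few lines of NumPy. The function name `assign_labels` is an assumption for this example, and it returns integer cluster indices rather than one-hot vectors.

```python
import numpy as np

def assign_labels(X, M):
    """Return, for each row of X, the index of the nearest row of M."""
    # sq_dist[i, j] = squared Euclidean distance between x_i and m_j
    sq_dist = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    # argmin picks the closest center; ties go to the lowest index.
    return sq_dist.argmin(axis=1)

X = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
M = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = assign_labels(X, M)
print(labels)  # [0 0 1 1]
```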

### Fixed Y, optimize M
For each existing cluster,

<center>
    \[
    m_j = \arg\min_{m_j} \mathcal{L}(m_j) \\
    = \arg\min_{m_j} \sum_{i=1}^{N}y_{ij}(x_i - m_j)^2
    \]
</center>

Each cluster has exactly one center $m_j$, and this objective is a convex quadratic function of $m_j$. We therefore find the minimizer by taking the derivative of $\mathcal{L}(m_j)$ with respect to $m_j$ and setting it to zero:

<center>
    \[
    \frac{\partial \mathcal{L}(m_j)}{\partial m_j} = 2 \sum_{i=1}^{N}y_{ij}(m_j - x_i) = 0 \\
    m_j \sum_{i=1}^{N}y_{ij} = \sum_{i=1}^{N}y_{ij}x_i \\
    m_j = \frac{\sum_{i=1}^{N}y_{ij}x_i}{\sum_{i=1}^{N}y_{ij}} \\
    m_j = \frac{\text{sum of all data points in cluster } j}{\text{number of data points in cluster } j}
    \]
</center>

Each center is the mean of the data points in its cluster, which is why this algorithm is called ***k-means clustering***.
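The update rule $m_j = \frac{\sum_i y_{ij} x_i}{\sum_i y_{ij}}$ translates almost literally into NumPy. This is a sketch that assumes every cluster contains at least one point; the function name `update_centers` is hypothetical.

```python
import numpy as np

def update_centers(X, labels, K):
    """Return the (K, d) array of cluster means.

    Assumes every cluster has at least one point; an empty cluster
    would cause a division by zero here.
    """
    Y = np.eye(K)[labels]               # (N, K) one-hot labels
    counts = Y.sum(axis=0)              # number of points in each cluster
    return (Y.T @ X) / counts[:, None]  # sum of points / number of points

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
centers = update_centers(X, labels, K=2)
print(centers)  # rows are (1, 0) and (10, 10)
```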

### Overall algorithm
***Input:*** A dataset of N samples and the number of clusters K. <br>
***Output:*** A label y for each of the N samples and a center m for each of the K clusters.

***Step 1:*** Randomly choose K data points as the initial cluster centers. <br>
***Step 2:*** Assign each data point to its nearest center. <br>
***Step 3:*** If the assignments from ***Step 2*** are the same as in the previous iteration, stop; otherwise continue to the next step. <br>
***Step 4:*** Update each cluster center to the mean of the data points assigned to it. <br>
***Step 5:*** Go back to ***Step 2***.
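The five steps above can be sketched as a single function. This is a minimal illustration, not a reference implementation: the names `kmeans`, `max_iter`, and `seed` are assumptions, the iteration cap is a safety limit that the steps do not mention, and an empty cluster simply keeps its previous center.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Cluster the rows of X into K groups; return (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as the initial centers.
    M = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        sq_dist = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        # Step 3: stop when the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its points.
        for j in range(K):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                M[j] = members.mean(axis=0)
        # Step 5: the loop goes back to Step 2.
    return labels, M

# Toy data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, M = kmeans(X, K=2)
print(labels)
print(M)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.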



## 3. Weaknesses of K-means clustering
### We need to define the number of clusters K
In some cases we don't know the number of clusters in advance, which is an obstacle to using K-means clustering.

### Clustering results depend heavily on initialization
#### Slow convergence

<img src="../images/kmeans_slow_converge.gif" width="500px"/>

#### Bad results
<img src="../images/kmeans_bad_result.gif" width="500px"/>

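We can mitigate this problem by running K-means clustering multiple times with different initializations and keeping the best result (the one with the lowest loss). There are also improved variants of K-means clustering, such as K-means++, which chooses the initial centers more carefully.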
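To give an idea of how K-means++ differs, here is a rough sketch of its seeding step only: the first center is drawn uniformly from the data, and each later center is drawn with probability proportional to the squared distance to the nearest center chosen so far. The function name `kmeans_pp_init` and its parameters are assumptions for this example, and the data is assumed to contain at least K distinct points.

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    """Pick K initial centers from the rows of X using k-means++ seeding.

    Assumes X contains at least K distinct points.
    """
    rng = np.random.default_rng(seed)
    # First center: a data point chosen uniformly at random.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, K):
        # Squared distance from each point to its nearest chosen center.
        diff = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        # Far-away points are more likely to become the next center;
        # already chosen points have probability zero.
        next_idx = rng.choice(len(X), p=d2 / d2.sum())
        centers.append(X[next_idx])
    return np.array(centers)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
M0 = kmeans_pp_init(X, K=2)
print(M0)
```

The returned centers would replace the uniformly random choice in Step 1; the remaining steps stay the same.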

### Clusters are assumed to be roughly round in shape

<img src="../images/kmeans_diff_cov.gif" width="500px"/>


### K-means clustering doesn't work well with non-convex datasets

<img src="../images/kmeans_smile_face.png" width="500px"/>

## 4. DBSCAN introduction

## 5. Loss function and Optimizer for DBSCAN

## 6. Implementation example
### 6.1. Implement from scratch

### 6.2. Use `sklearn`

## 7. Homework