### 1. Model
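- Assume the sample set is $X = \lbrace x_1, x_2, ..., x_n \rbrace$, where each sample $x_i$ is an $m$-dimensional vector. The model $C(i)$ maps the index $i$ of a sample to the index $l$ of its cluster, where $k$ is the number of clusters:
$$ l = C(i), \quad i \in \lbrace 1,2,...,n \rbrace, \quad l \in \lbrace 1,2,...,k \rbrace $$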

### 2. Strategy
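- The strategy for selecting the optimal model is to minimize the loss function. Assume we use the squared Euclidean distance to measure the distance between any two samples, where $x_{ti}$ is the $t$-th component of sample $x_i$ (the index $t$ is used here so that it does not clash with the number of clusters $k$):
$$ d(x_{i}, x_{j}) = \sum_{t=1}^m (x_{ti} - x_{tj})^2 $$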
- The loss function is the sum of the distances between each sample and the centroid $c_l$ of the cluster it belongs to, that is, the aggregate intra-cluster distance:
$$ W(C) = \sum_{l=1}^k \sum_{C(i)=l} d(x_{i}, c_{l}) $$
$$ W(C) = \sum_{l=1}^k \sum_{C(i)=l} ||x_{i} - c_{l}||^2 $$
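
The distance and the loss above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`squared_euclidean`, `centroid`, `within_cluster_loss`) and the toy data are assumptions made for this example, and the centroid of each cluster is taken to be the mean of its members.

```python
def squared_euclidean(x_i, x_j):
    """Squared Euclidean distance d(x_i, x_j) between two m-dimensional samples."""
    if len(x_i) != len(x_j):
        raise ValueError("samples must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(x_i, x_j))


def centroid(points):
    """Component-wise mean of a non-empty list of samples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def within_cluster_loss(samples, assignment):
    """W(C): total squared distance from each sample to its cluster centroid."""
    # Group the samples by cluster label l = C(i).
    clusters = {}
    for x, l in zip(samples, assignment):
        clusters.setdefault(l, []).append(x)
    total = 0.0
    for members in clusters.values():
        c_l = centroid(members)
        total += sum(squared_euclidean(x, c_l) for x in members)
    return total


# Toy data made up for this sketch: two nearby points and one far away.
samples = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
assignment = [1, 1, 2]
print(within_cluster_loss(samples, assignment))  # 2.0
```

A smaller value of `within_cluster_loss` means the samples sit closer to their centroids, which is what the strategy in this section tries to achieve.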

### 3. Algorithm
- In practice, an iterative method is often used to minimize the loss function:
- ```Step 1:``` Choose $k$ centroids and assign each sample to its nearest centroid to get a clustering result. Given the centroids ($c_{1}, c_{2}, c_{3},..., c_{k}$), we minimize the loss function over the assignments to get the optimal model $C(i)$:
$$ \min_{C} \sum_{l=1}^k \sum_{C(i)=l} ||x_i - c_l||^2 $$
With the model $C(i)$, we can assign each sample point to its cluster.
- ```Step 2:``` Update the centroid of each cluster to the mean of the samples in the cluster (the new centroid $c_l$ is the mean of all points $x_i$ assigned to cluster $l$ in the previous step, and $n_l$ is the number of those points):
$$ c_l = \frac{1}{n_l} \sum_{C(i)=l} x_i $$
- Repeat the two steps above until none of the cluster assignments change.
<p align="center">
<img src="images/3.1.1.png" width="400" height="300" alt="5" align="center">
</p>
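
The two alternating steps can be sketched as follows. This is a minimal illustration in plain Python, not a reference implementation: the function names, the `max_iter` safety cap, and the choice to leave the centroid of an empty cluster unchanged are all assumptions made for this sketch. Clusters are indexed from 0 in the code, while the text indexes them from 1.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between a sample x and a centroid c."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def assign(samples, centroids):
    """Step 1: for each sample, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda l: squared_distance(x, centroids[l]))
        for x in samples
    ]


def update(samples, assignment, centroids):
    """Step 2: move each centroid to the mean of the samples assigned to it."""
    new_centroids = []
    for l, old in enumerate(centroids):
        members = [x for x, c in zip(samples, assignment) if c == l]
        if not members:
            # Assumption of this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        n_l = len(members)
        new_centroids.append(tuple(sum(col) / n_l for col in zip(*members)))
    return new_centroids


def k_means(samples, initial_centroids, max_iter=100):
    """Alternate Step 1 and Step 2 until the assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = assign(samples, centroids)
        if new_assignment == assignment:
            break  # no sample changed cluster
        assignment = new_assignment
        centroids = update(samples, assignment, centroids)
    return assignment, centroids


# Toy one-dimensional data made up for this sketch.
samples = [(1.0,), (2.0,), (9.0,), (10.0,)]
assignment, centroids = k_means(samples, [(1.0,), (10.0,)])
print(assignment)  # [0, 0, 1, 1]
print(centroids)   # [(1.5,), (9.5,)]
```

The result depends on the initial centroids, so different starting points may lead to different clusterings.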

### Example:
$$
X = \left[
\begin{matrix}
0 & 0 & 1 & 5 & 5 \\
2 & 0 & 0 & 0 & 2 \\
\end{matrix}
\right]
$$
Each column of $X$ is one sample. Assume $k=2$ and cluster these samples.

- Step 1: initialize two centroids and assign each sample to the cluster of its nearest centroid.
  - Assume the two initial centroids are $c_{1} = (0,0) = x_{2}$ and $c_{2} = (5,0) = x_{4}$;
  - Calculate the distances between each sample and these centroids:
    - $d(x_{1}, c_{1}) = 4, d(x_{1}, c_{2}) = 29$, assigned to cluster 1
    - $d(x_{2}, c_{1}) = 0, d(x_{2}, c_{2}) = 25$, assigned to cluster 1
    - $d(x_{3}, c_{1}) = 1, d(x_{3}, c_{2}) = 16$, assigned to cluster 1
    - $d(x_{4}, c_{1}) = 25, d(x_{4}, c_{2}) = 0$, assigned to cluster 2
    - $d(x_{5}, c_{1}) = 29, d(x_{5}, c_{2}) = 4$, assigned to cluster 2
  - Get the new cluster result: $G_{1} = \lbrace 1,2,3 \rbrace, G_{2} = \lbrace 4,5 \rbrace$
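- Step 2: update the centroids and reassign each sample to the cluster of its nearest centroid.
  - New centroids: $c_{1} = (\frac{1}{3},\frac{2}{3})$ and $c_{2} = (5,1)$;
  - Calculate the distances and assign the samples again;
  - Stop when no cluster assignment changes. Here every sample stays in its cluster after the update, so the algorithm stops with $G_{1} = \lbrace 1,2,3 \rbrace$ and $G_{2} = \lbrace 4,5 \rbrace$.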
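
The numbers in this example can be checked with a short script. The sketch below uses only the Python standard library, with `fractions.Fraction` so that the new centroids stay exact. The helper names `d` and `nearest` are made up for this illustration, and clusters are indexed from 0 in the code.

```python
from fractions import Fraction

# The samples are the columns of X.
X = [(0, 2), (0, 0), (1, 0), (5, 0), (5, 2)]
centroids = [(0, 0), (5, 0)]  # initial centroids: c_1 = x_2, c_2 = x_4


def d(x, c):
    """Squared Euclidean distance between sample x and centroid c."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda l: d(x, centroids[l]))


# Step 1: distances to the initial centroids and the first assignment.
for i, x in enumerate(X, start=1):
    print(f"x_{i}: d to c_1 = {d(x, centroids[0])}, d to c_2 = {d(x, centroids[1])}")
first = [nearest(x, centroids) for x in X]

# Step 2: each new centroid is the exact mean of its cluster members.
new_centroids = []
for l in range(len(centroids)):
    members = [x for x, c in zip(X, first) if c == l]
    new_centroids.append(
        tuple(Fraction(sum(col), len(members)) for col in zip(*members))
    )

# Reassign with the new centroids and compare with the first assignment.
second = [nearest(x, new_centroids) for x in X]
print(new_centroids)
print(first == second)  # True when no assignment changed
```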