# Clustering

### Introduction to Unsupervised Learning

- An unsupervised learning problem consists of data without any labels associated with it, i.e., a set of data with no positive or negative labels, just a training set of $x^{(1)}, x^{(2)}, x^{(3)},\dots, x^{(m)}$ values (a training set with no $y$ value corresponding to each $x$ value)
- An unsupervised learning algorithm is used to find patterns/structure in an unlabeled dataset
    - An algorithm that discovers groups of similar data points (distinct clusters) is called a clustering algorithm
    - Applications of clustering:
        - Market segmentation
        - Social network analysis
        - Organize computing clusters
        - Astronomical data analysis
        
### K-means Algorithm

1. When running the K-means algorithm, the first step is to randomly initialize the **cluster centroids**. The number of cluster centroids, K, is equal to the number of clusters you want to separate your data into.
2. The next step is the cluster assignment step: each training example is assigned to the cluster centroid closest to it.
3. The move centroid step moves each centroid to the average (mean) of the points assigned to it.
4. Repeat steps 2 and 3 until convergence (the assignments, and therefore the centroids, stop changing)

- Formal definition of K-means algorithm
    - Input:
        - K (number of clusters)
        - Training set {$x^{(1)}, x^{(2)}, ..., x^{(m)}$}
        
    - $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)
    
    - Randomly initialize K cluster centroids $\mu_1,\mu_2,\dots,\mu_K \in \mathbb{R}^n$
    - Repeat:
        - For i = 1 to m
            - $c^{(i)}:=$ index (from 1 to K) of the cluster centroid closest to $x^{(i)}$
            - Equivalently, $c^{(i)} := \arg\min_k \left\Vert x^{(i)} - \mu_k \right\Vert^2$, where $\left\Vert x^{(i)} - \mu_k \right\Vert^2$ is the squared distance between training example $x^{(i)}$ and cluster centroid $\mu_k$: find the value of k that minimizes this distance and assign $x^{(i)}$ to cluster centroid k.
        - For k = 1 to K
            - $\mu_k:=$ average (mean) of points assigned to cluster k
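The formal definition above can be sketched in plain Python with no third-party libraries. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, cluster indices are 0-based rather than 1 to K, the starting centroids are passed in by the caller, and a centroid with no assigned points is simply left where it is (one of several possible choices).

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(X, centroids):
    """Cluster assignment step: c[i] = argmin_k ||x_i - mu_k||^2 (0-based k)."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in X
    ]


def move_centroids(X, c, centroids):
    """Move centroid step: mu_k = mean of the points assigned to cluster k.

    Assumption: a centroid with no assigned points keeps its previous position.
    """
    new_centroids = []
    for k, mu in enumerate(centroids):
        members = [x for x, ck in zip(X, c) if ck == k]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(mu))
    return new_centroids


def kmeans(X, initial_centroids, max_iters=100):
    """Alternate the two steps until the centroids stop moving (or max_iters is hit)."""
    centroids = [list(mu) for mu in initial_centroids]
    for _ in range(max_iters):
        c = assign_clusters(X, centroids)
        new_centroids = move_centroids(X, c, centroids)
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return c, centroids


# Usage: two well-separated groups in R^2.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
c, centroids = kmeans(X, initial_centroids=[[0, 0], [10, 10]])
print(c)          # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

Stopping when the centroids no longer move is one way to detect convergence; the `max_iters` cap is an extra safeguard in case convergence is slow.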
            
### Optimization Objective

- Notation:
    - $c^{(i)}=$ index of cluster to which example $x^{(i)}$ is currently assigned
    - $\mu_k=$ cluster centroid $k$
    - $\mu_{c^{(i)}}=$ cluster centroid of cluster to which example $x^{(i)}$ has been assigned
    
- Optimization Objective:
    - $J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K) = \frac{1}{m}\sum \limits_{i=1}^m\left\Vert x^{(i)} - \mu_{c^{(i)}} \right\Vert^2$
    - Also called the Distortion Cost Function, or the Distortion of the k-means algorithm
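The distortion $J$ translates directly into code. A minimal plain-Python sketch, assuming the cluster indices in `c` are 0-based and that `centroids[k]` holds $\mu_k$; the function name `distortion` is hypothetical.

```python
def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2, with 0-based cluster indices in c."""
    m = len(X)
    total = 0.0
    for x, ci in zip(X, c):
        mu = centroids[ci]  # mu_{c^(i)}: the centroid x^(i) is assigned to
        total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
    return total / m


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(distortion(X, [0, 0, 1, 1], [[0, 0.5], [10, 10.5]]))  # 0.25
```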
    
### Random Initialization

- Randomly choose K training examples (K = number of desired cluster centroids, with K < m) and set $\mu_1,\dots,\mu_K$ equal to those K examples.
- K-means can converge to different local optima depending on where the cluster centroids are randomly initialized
    - To avoid getting stuck in a poor local optimum, try multiple random initializations and keep the best result
    - Example:
        - For i = 1 to 100:
            1. Randomly initialize K-means
            2. Run K-means. Get $(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)$
            3. Compute the cost function (distortion) $J$
        - Select the run with the lowest cost $J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)$
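The multiple-restart procedure can be sketched as follows. This is a self-contained illustration under stated assumptions, not a definitive implementation: all function names are hypothetical, cluster indices are 0-based, each initialization picks K distinct training examples with `random.Random.sample`, and the seed is fixed only so the result is reproducible.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(X, centroids, max_iters=100):
    """Plain K-means from the given starting centroids; returns (c, centroids)."""
    for _ in range(max_iters):
        # Cluster assignment step.
        c = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        # Move centroid step (assumption: an empty cluster keeps its old centroid).
        new_centroids = []
        for k, mu in enumerate(centroids):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(list(mu))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return c, centroids


def distortion(X, c, centroids):
    """Cost J: mean squared distance from each example to its assigned centroid."""
    return sum(sq_dist(x, centroids[ci]) for x, ci in zip(X, c)) / len(X)


def kmeans_best_of(X, K, n_init=100, seed=0):
    """Run K-means n_init times from random starts; return (J, c, centroids) with lowest J."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Random initialization: K distinct training examples become the starting centroids.
        init = [list(x) for x in rng.sample(X, K)]
        c, centroids = run_kmeans(X, init)
        J = distortion(X, c, centroids)
        if best is None or J < best[0]:
            best = (J, c, centroids)
    return best


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
J, c, centroids = kmeans_best_of(X, K=2, n_init=10)
print(J)  # 0.25
```

Each restart is independent, so comparing $J$ across runs is all that is needed to pick a clustering; more restarts raise the chance of reaching a good optimum at the cost of extra runtime.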
         
### Choosing the Number of Clusters

- 