# Course Outline

Unsupervised learning is all about understanding how to group our data when we either:

1. Do not have a label to predict. An example of this is using an algorithm to look at brain scans to find areas that may be cause for concern. You don't have labels on the images to tell you which areas are concerning, but you can understand which areas are most similar to or different from one another.

2. Are not trying to predict a label, but rather group our data together for some other reason! One example of this is when you have tons of data, and you would like to condense it down to a smaller number of features.

# Types of Unsupervised Learning
There are two popular methods for unsupervised machine learning.

* **Clustering** - which groups data together based on similarities

* **Dimensionality Reduction** - which condenses a large number of features into a (usually much) smaller set of features.

# K-Means
The K-Means algorithm is used to cluster all sorts of data.

It can group together:

* Books of similar genres or written by the same authors.
* Similar movies.
* Similar music.
* Similar groups of customers.

This clustering can lead to product, movie, music and other types of recommendations.

In the K-means algorithm, 'k' represents the number of clusters you have in your dataset. In this video, you saw that a k value of two makes a lot of sense. There is one cluster of points with shorter distances for when I travel to work. A second cluster is created when I travel to my parents' house.

![km](pics/k_m_1.png)

Visually inspecting your data easily shows these two clusters. On the next page, you will have an opportunity to check that you have mastered this technique for finding clusters.

## Choosing K

So far, you have identified k by visually inspecting your data to count the clusters. However, in practice, you often have tons of data with many features. This can make visualizing your clusters impossible.

In these cases, choosing k is often an art and a science. Often researchers have an idea of what k should be ahead of time. In other cases, no one has any idea what k should be! How do we choose k in these cases? Don't worry, there is a general method for these cases.

## Elbow Method

When you have no idea how many clusters exist in your dataset, a common strategy for determining k is the elbow method. In the elbow method, you plot the number of clusters (on the x-axis) against the average distance from each point to its cluster center (on the y-axis). This plot is called a **scree plot**.

The average distance will always decrease with each additional cluster center. However, with fewer clusters, those decreases will be more substantial. At some point, adding new clusters will no longer create a substantial decrease in the average distance. **This point is known as the elbow.**
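As a rough illustration of the elbow method, the sketch below computes the average point-to-center distance for k = 1 through 6 on a small made-up dataset. Everything here is an assumption for illustration: the toy `points`, the helper names, and in particular the deterministic `farthest_point_init` starting rule, which is swapped in for random placement only so that the numbers are repeatable.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start (an assumption of this sketch, not part of standard k-means):
    take the first point, then repeatedly add the point farthest from those already chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def k_means(points, k, max_iter=100):
    """Basic k-means: alternate assignment and centroid updates until nothing moves."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Each centroid moves to the mean of its points; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distance(points, centroids):
    """Average distance from each point to its closest centroid (the y-axis of the plot)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


# Hypothetical toy data: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 8), (5, 9), (6, 8), (6, 9),
]

for k in range(1, 7):
    print(k, round(average_distance(points, k_means(points, k)), 3))
```

For this toy data the average distance falls sharply up to k = 3 and only slightly afterwards, so the elbow sits at k = 3. Plotting the printed pairs gives the scree plot described above.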

## How Does K-Means Work?

You choose k as the number of clusters you believe to be in your dataset, or you use the elbow method to determine k for your data. Then this number of clusters is created within your dataset, with each point assigned to exactly one group.

However, to understand what edge cases might occur when grouping points together, it is necessary to understand exactly what the k-means algorithm is doing. Here is one method for computing k-means:

1. Randomly place k centroids amongst your data.

Then, in a loop, repeat the following two steps until convergence:

2. Assign each point to the closest centroid.

3. Move the centroid to the center of the points assigned to it.

At the end of this process, you should have k clusters of points.
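The three steps above can be sketched in plain Python as follows. This is a minimal sketch, not a production implementation: the function name `k_means`, the `seed` and `max_iter` parameters, the choice to start centroids on randomly chosen data points, and the rule that an empty cluster keeps its previous centroid are all assumptions made for illustration. The `trips` values are hypothetical trip distances in the spirit of the work / parents' house example.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: place the k centroids on k distinct, randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of the points assigned to it.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid (an assumption of this sketch).
                updated.append(centroids[i])
        # Convergence: stop once no centroid moved.
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical one-feature data: short trips and long trips.
trips = [(1,), (2,), (3,), (10,), (11,), (12,)]
centroids, labels = k_means(trips, k=2)
print(sorted(centroids))  # [(2.0,), (11.0,)]
```

With groups this well separated, every random start ends at the same two centroids. On less tidy data, different starts can end in different groupings.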

## Feature Scaling

For any machine learning algorithm that uses distances as a part of its optimization, it is important to scale your features.

You saw this earlier in regularized forms of regression like Ridge and Lasso, but it is also true for k-means. In future sections on PCA and ICA, feature scaling will again be important for the successful optimization of your machine learning algorithms.

Though there are a large number of ways that you can go about scaling your features, there are two ways that are most common:

1. Normalizing or Max-Min Scaling - this type of scaling transforms variable values to between 0 and 1.
2. Standardizing or Z-Score Scaling - this type of scaling transforms variable values so they have a mean of 0 and standard deviation of 1.
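Both kinds of scaling can be sketched for a single feature as shown below. The function names are made up for illustration, and the sketch assumes the feature is not constant (otherwise the denominators would be zero). It also assumes the population standard deviation for standardizing; using the sample standard deviation instead would give slightly different values.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift and rescale values to have mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


feature = [10, 20, 30, 50]  # hypothetical raw values of one feature
print(min_max_scale(feature))  # [0.0, 0.25, 0.5, 1.0]
print(standardize(feature))
```

In practice each feature (column) is scaled separately, so that no single feature dominates the distance calculation just because of its units.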

# Clustering Recap
We just covered a bunch of information! Here is a quick recap!

## I. Clustering
You learned about clustering, a popular method for unsupervised machine learning. We looked at three ways to identify the number of clusters in your dataset.

1. **Visual Inspection** of your data.
2. **Pre-conceived** ideas of the number of clusters.
3. **The elbow method**, which compares the average distance of each point to the cluster center for different numbers of centers.

## II. K-Means

You saw the k-means algorithm for clustering data, which has 3 steps:

1. Randomly place k centroids amongst your data.

Then repeat the following two steps until convergence (the centroids don't change):

2. Look at the distance from each centroid to each point. Assign each point to the closest centroid.

3. Move the centroid to the center of the points assigned to it.

## III. Concerns with K-Means
Finally, we discussed some concerns with the k-means algorithm. These concerns included:

1. Concern: The random placement of the centroids may lead to non-optimal solutions.

Solution: Run the algorithm multiple times and keep the run whose centroids give the smallest average distance from each point to its closest centroid.

2. Concern: Depending on the scale of the features, you may end up with different groupings of your points.

Solution: Standardize the features before running the k-means algorithm, so that each feature has mean 0 and standard deviation 1.
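The first solution (several runs from different random starts) might look like the sketch below. The names `best_of_n_runs`, `n_runs`, and `seed` are hypothetical, the compact `k_means` helper is included only so the example runs on its own, and the data is made up.

```python
import math
import random


def k_means(points, k, rng, max_iter=100):
    """One k-means run whose starting centroids are k random data points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their old centroid (an assumption of this sketch).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distance(points, centroids):
    """Average distance from each point to its closest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def best_of_n_runs(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times and keep the centroids with the smallest average distance."""
    rng = random.Random(seed)
    runs = [k_means(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda centroids: average_distance(points, centroids))


# Hypothetical data: two square groups of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
best = best_of_n_runs(points, k=2)
print(sorted(best), round(average_distance(points, best), 3))
```

Because each run draws different starting centroids from the same random generator, a poor start in one run is unlikely to be repeated in all of them.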