# Fundamentals of Unsupervised Machine Learning

## Intro

<br /><br />

### Motivation

So far we have studied algorithms that work with labeled (annotated) data, and we divided them into two classes: regression algorithms, where the label is a continuous variable, and classification algorithms, where the label is a discrete (categorical) variable. The part of machine learning that deals with these kinds of algorithms, which require labeled data points, is called <b>Supervised Machine Learning</b>.

However, there are many cases where data labels are not available. Moreover, most real-world data does not come with labels. Our goal is still to extract insights from such datasets without hiring people to do manual annotation, which can be expensive and is sometimes impossible. The part of machine learning that deals with unlabeled data is called <b>Unsupervised Machine Learning</b>.

Many people believe that unsupervised learning is the key to general Artificial Intelligence, as human babies do most of their learning without explicit supervision.

The goal of today's class is to introduce some of the most popular unsupervised learning algorithms and go through some use cases where these algorithms are applicable.


### Learning Objectives

At the end of this class, listeners will be able to:

<ul>
    <li>Understand conceptual differences between Supervised and Unsupervised machine learning.</li>
    <li>Get familiar with several essential unsupervised learning algorithms.</li>
    <li>Build intuition about when these algorithms should be used.</li>
</ul>

### Reading Material

<ul>
    <li>K-Means algorithm: https://youtu.be/J0A_tkIgutw?list=PLnZuxOufsXnvftwTB1HL6mel1V32w0ThI <b>(mandatory)</b></li> 
    <li>Gaussian Mixture Models: https://youtu.be/I9dfOMAhsug?list=PLnZuxOufsXnvftwTB1HL6mel1V32w0ThI <b>(mandatory)</b></li>
    <li>EM algorithm for Gaussian Mixture Models: https://youtu.be/lMShR1vjbUo?list=PLnZuxOufsXnvftwTB1HL6mel1V32w0ThI</li>
    <li>DBSCAN algorithm: https://en.wikipedia.org/wiki/DBSCAN</li>
    <li>AMLD 2020 workshop material on unsupervised fraud detection: https://github.com/amld/workshop-unsupervised-fraud</li>
</ul>


<br /><br /><br /><br />

## Clustering

Clustering is one of the most important sub-topics of unsupervised machine learning. Clustering algorithms separate data into logical groups (clusters) without requiring any labels.  
  
The example below compares unsupervised ML (namely, a clustering algorithm) with supervised ML.

<img src="https://assets.extrahop.com/images/blogart/supervised-vs-unsupervised-ml.png" />

### K-Means Clustering

The simplest clustering algorithm is K-means clustering, which groups similar data points around cluster centers (centroids). The resulting clusters are roughly <b>circular</b>, as depicted in the picture above.  

For details of the algorithm, refer to the reading material in the top section (~20 minute video).
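
As a rough illustration of the idea, here is a minimal sketch of the standard K-means iteration (often called Lloyd's algorithm), written with the Python standard library only. The function names (`squared_distance`, `kmeans`), the toy points, and the seed are assumptions made for this example; they do not come from the reading material.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k of the data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Toy data (made up): two well-separated blobs in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3), (8.1, 8.2)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

In practice a library implementation such as scikit-learn's `KMeans` would normally be preferred; the sketch is only meant to show the two alternating steps.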
  
<br /><br />  
  
<b>Pros:</b>  

<ul><li>Very simple and intuitive</li></ul>

<b>Cons:</b>  

<ul><li>Non-circular cluster shapes are hard to deal with.</li>
    <li>The number of clusters must be set manually; a wrong number can lead to poor results.</li>
    <li>Can get stuck in local minima (a common remedy is to re-run several times and select the best run).</li>
    <li>Performs poorly on high-dimensional data due to the "curse of dimensionality". Running a dimensionality reduction algorithm (refer below) in advance is usually beneficial.</li>
</ul>
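
The local-minima remedy mentioned above (re-run and keep the best run) can be sketched as follows. This is a simplified 1-D version under stated assumptions: the helper `kmeans_1d`, the toy values, and the choice of ten seeds are all made up for illustration. "Best" here means the lowest inertia, i.e. the sum of squared distances from each point to its nearest centroid.

```python
import random

def kmeans_1d(values, k, seed, n_iter=100):
    """One K-means run on 1-D data; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # random initialisation from the data
    for _ in range(n_iter):
        # Assignment step: group values by nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, inertia

# Toy data (made up): three tight pairs, so k=3 has an obvious best solution.
values = [1.0, 1.2, 5.0, 5.2, 9.0, 9.2]

# Different seeds give different initialisations; some may end in a poor local minimum.
runs = [kmeans_1d(values, k=3, seed=s) for s in range(10)]
best_centroids, best_inertia = min(runs, key=lambda run: run[1])
print(sorted(best_centroids), best_inertia)
```

For example, a run that starts with two centroids inside the first pair can leave the remaining four points sharing a single centroid, which gives a much higher inertia than the run selected here. scikit-learn's `KMeans` exposes the same idea through its `n_init` parameter.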

One use case of K-means is vector quantization (information compression): https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a
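
To make the compression idea concrete, here is a small sketch of vector quantization given an already-learned codebook. The codebook below is hypothetical (it only pretends to be a set of K-means centroids over RGB colors), and `nearest_index` is a helper name introduced for this example.

```python
def nearest_index(pixel, codebook):
    """Index of the codeword closest to `pixel` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, codebook[i])),
    )

# Hypothetical codebook: pretend these 3 RGB centroids were found by K-means.
codebook = [(10, 10, 10), (200, 50, 50), (240, 240, 240)]

# Made-up "image" of four RGB pixels.
pixels = [(12, 9, 11), (198, 55, 47), (238, 241, 239), (8, 14, 10)]

# Encoding: store one small integer per pixel instead of three color values.
codes = [nearest_index(p, codebook) for p in pixels]

# Decoding: look each code up in the codebook (lossy reconstruction).
reconstructed = [codebook[c] for c in codes]
print(codes, reconstructed)
```

The compressed representation is the list of codes plus the codebook itself; the reconstruction is lossy because every pixel is replaced by its nearest centroid.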

### Gaussian Mixture Models (GMMs) for Density Estimation and Clustering

Gaussian Mixture Models are a powerful unsupervised learning technique used for: 1) clustering; 2) density estimation, i.e. learning the distribution of the data in order to be able to sample (generate) new data points from it. The latter is the most important use case of GMMs in practice.

For in-depth details of the algorithm, refer to the reading material in the top section.
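
As a hedged sketch of both uses (soft clustering and sampling), the code below fits a two-component, one-dimensional Gaussian mixture with the EM algorithm and then draws new points from it. It is a simplified illustration, not the formulation from the videos: the function names, the deterministic initialisation at the data minimum and maximum, the small variance floor, and the toy data are all assumptions made for this example.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]          # simple deterministic initialisation
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point (soft assignment).
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj + 1e-6  # variance floor
    return weights, means, variances

def sample_gmm(weights, means, variances, n, seed=0):
    """Generate n new points: pick a component by weight, then sample from it."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return samples

# Toy data (made up): two groups centred at 0 and 6.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 5.0, 5.5, 6.0, 6.5, 7.0]
weights, means, variances = fit_gmm_1d(data)
new_points = sample_gmm(weights, means, variances, n=5)
print(weights, means, variances)
```

Unlike K-means, the E-step assigns each point a probability of belonging to each component rather than a hard label. For real work, a library implementation such as scikit-learn's `GaussianMixture` is the usual choice.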

<br /><br /><br />

<img src="images/unsupervised_ml/gmm.png" />
  
<b>Pros:</b>  

<ul>
    <li>If the number of components (clusters) is large enough, a GMM can approximate practically any density.</li>
    <li>Good probabilistic interpretation.</li>
</ul>

<b>Cons:</b>  

<ul>
    <li>The number of clusters must be set manually; a wrong number can lead to poor results.</li>
    <li>Can get stuck in local minima (a common remedy is to re-run several times and select the best run).</li>
    <li>Performs poorly on high-dimensional data due to the "curse of dimensionality". Running a dimensionality reduction algorithm (refer below) in advance is usually beneficial.</li>
</ul>

One practical use case apart from clustering is <b>Anomaly/Outlier Detection</b>: points to which the fitted model assigns a very low density can be flagged as outliers.
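
A minimal sketch of density-based anomaly detection with a GMM follows. The mixture parameters are hypothetical (they stand in for a model that has already been fitted), and the threshold of -6.0 on the log-density is an arbitrary choice for this example; in practice it is often set from a low percentile of the scores on training data.

```python
import math

def gmm_log_density(x, weights, means, variances):
    """Log of the mixture density at x for a 1-D Gaussian mixture."""
    density = sum(
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )
    return math.log(density) if density > 0 else float("-inf")

# Hypothetical fitted model: two equally weighted unit-variance components.
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]

# Assumed threshold: points with log-density below it are flagged as anomalies.
threshold = -6.0

points = [0.1, 5.8, 3.0, 12.0]
scores = [gmm_log_density(x, weights, means, variances) for x in points]
is_anomaly = [s < threshold for s in scores]
print(list(zip(points, is_anomaly)))
```

Here only the point at 12.0, which lies far from both components, falls below the threshold. The point at 3.0 sits between the two components and has a lower score than the points near the means, but it is still above this particular threshold.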