# Exercises

# 1. How would you define clustering? Can you name a few clustering algorithms?

Clustering is the unsupervised task of identifying similar instances and grouping them together into clusters. A few clustering algorithms: K-Means, DBSCAN, Agglomerative Clustering, Gaussian Mixtures, and so on.

# 2. What are some of the main applications of clustering algorithms?

Some of the main applications of clustering algorithms:
* Customer segmentation (create groups of customers with similar behavior).
* Semi-supervised learning (propagate the known labels to all the instances in the same cluster).
* Anomaly detection (detect a new instance that does not belong to any cluster).
* Data analysis.
* Image segmentation (by clustering pixels according to their color).
* Dimensionality reduction.
* Density estimation.

# 3. Describe two techniques to select the right number of clusters when using K-Means.

When using K-Means, we have to choose the number of clusters beforehand. To do that, we can use:
* Elbow method, which uses the inertia (the sum of squared distances between each instance and its closest centroid). The method consists of plotting the inertia as a function of the number of clusters and picking the elbow of the curve, the point past which the inertia stops dropping quickly, as the number of clusters to use.
* Silhouette score, which is the mean silhouette coefficient over all the instances. A silhouette coefficient near +1 indicates that the instance is well inside its own cluster and far from the neighboring clusters. A value near 0 indicates that the instance is on or very close to the boundary between two neighboring clusters, and a negative value indicates that the instance may have been assigned to the wrong cluster. A silhouette diagram plots every instance's coefficient, sorted by cluster, so it also shows the size of each cluster.
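Both techniques can be sketched in a few lines with scikit-learn. This is a minimal illustration on synthetic blobs (the data, the range of `k`, and the variable names are assumptions made for the example, not part of the exercise):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated blobs (assumption, for illustration only).
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=0.6, random_state=42)

ks = list(range(2, 9))
inertias, silhouettes = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)                         # elbow method: plot this against k
    silhouettes.append(silhouette_score(X, km.labels_))  # mean silhouette coefficient

best_k = ks[int(np.argmax(silhouettes))]
print("inertia per k:", np.round(inertias, 1))
print("k with the highest silhouette score:", best_k)
```

Plotting `inertias` against `ks` gives the elbow curve; the silhouette score needs no visual inspection, since the best `k` is simply the one with the highest value.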

# 4. What is label propagation? Why would you implement it, and how?

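Label propagation is the process of assigning labels to previously unlabeled data points. At the start, only a subset of the data points have labels (or classifications). These labels are propagated to the unlabeled points over the course of the algorithm.

You would implement it because labeling data is usually costly, while most supervised algorithms benefit from having more labeled instances. One way to implement it is to cluster all the instances, then, for each cluster, propagate the most common label (or the label of its most representative instance) to the unlabeled instances in that cluster.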

Label propagation denotes a few variations of semi-supervised graph inference algorithms.

A few features available in this model, as implemented in scikit-learn's `LabelPropagation` and `LabelSpreading`:
* Used for classification tasks
* Kernel methods to project data into alternate dimensional spaces
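As a rough sketch of the graph-based variant, the example below uses scikit-learn's `LabelPropagation` with an RBF kernel. The synthetic data, the choice of five labeled instances per class, and `gamma=1.0` are assumptions made for illustration; unlabeled instances are marked with `-1`, which is the convention scikit-learn expects.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Synthetic data: three blobs (assumption, for illustration only).
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [8, 8], [0, 8]],
                       cluster_std=0.7, random_state=0)

# Pretend only 5 instances per class are labeled; -1 means "unlabeled".
rng = np.random.default_rng(0)
y_partial = np.full_like(y_true, -1)
for cls in np.unique(y_true):
    known = rng.choice(np.flatnonzero(y_true == cls), size=5, replace=False)
    y_partial[known] = cls

model = LabelPropagation(kernel="rbf", gamma=1.0)
model.fit(X, y_partial)

propagated = model.transduction_            # labels assigned to every instance
accuracy = (propagated == y_true).mean()
print(f"{(y_partial != -1).sum()} labels propagated to {len(X)} instances, "
      f"accuracy {accuracy:.2f}")
```

The propagated labels can then be used to train any supervised classifier on the full dataset.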

# 5. Can you name two clustering algorithms that can scale to large datasets? And two that look for regions of high density?

Clustering algorithms that can scale to large datasets:
* Mini-batch K-Means
* BIRCH, if the number of features is not too large (< 20).
* Agglomerative clustering, but only if you provide a connectivity matrix; without one it does not scale well to large datasets.

Two clustering algorithms that can look for regions of high density:
* DBSCAN
* Gaussian Mixtures
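A minimal sketch contrasting one scalable algorithm (Mini-batch K-Means) with one density-based algorithm (DBSCAN). The moons dataset and the hyperparameters (`batch_size`, `eps`, `min_samples`) are assumptions chosen for illustration:

```python
from sklearn.cluster import DBSCAN, MiniBatchKMeans
from sklearn.datasets import make_moons

# Two interleaved half-circles (assumption, for illustration only).
X, _ = make_moons(n_samples=2000, noise=0.05, random_state=42)

# Mini-batch K-Means updates the centroids from small random batches,
# so it never needs the whole dataset in memory at once.
mbk = MiniBatchKMeans(n_clusters=2, batch_size=256, n_init=3, random_state=42).fit(X)

# DBSCAN grows clusters from dense regions; label -1 marks noise instances.
db = DBSCAN(eps=0.15, min_samples=5).fit(X)
n_dbscan_clusters = len(set(db.labels_) - {-1})

print("Mini-batch K-Means centroids:\n", mbk.cluster_centers_)
print("DBSCAN clusters found:", n_dbscan_clusters)
```

Note that DBSCAN is not told how many clusters to find: the count follows from the density thresholds.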

# 6. Can you think of a use case where active learning would be useful? How would you implement it?

Active learning is when a human expert interacts with the learning algorithm, providing labels for specific instances when the algorithm requests them. It is useful whenever you have plenty of unlabeled instances but labeling is costly. There are many different strategies for active learning, but one of the most common ones is called *uncertainty sampling*.
* 1: The model is trained on the labeled instances gathered so far, and this model is used to make predictions on all the unlabeled instances.
* 2: The instances for which the model is most uncertain (i.e., those whose estimated probability for the predicted class is lowest) are given to the expert to be labeled.
* 3: The process is repeated until the performance improvement stops being worth the labeling effort.

Other strategies include labeling the instances that would result in the largest model change, or the largest drop in the model's validation error, or the instances that different models disagree on (e.g., an SVM and a Random Forest).
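The uncertainty sampling steps above can be sketched as follows. This is only an illustration: the synthetic pool, the helper `query_most_uncertain`, the batch size, and the fixed number of rounds are all assumptions, and the array `y_oracle` stands in for the human expert.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic pool; y_oracle plays the role of the human expert (assumption).
X_pool, y_oracle = make_classification(n_samples=600, n_features=10,
                                       n_informative=5, random_state=0)

def query_most_uncertain(model, X_pool, labeled_idx, batch_size=10):
    """Return indices of the unlabeled instances the model is least sure about."""
    unlabeled_idx = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    proba = model.predict_proba(X_pool[unlabeled_idx])
    confidence = proba.max(axis=1)  # estimated probability of the predicted class
    return unlabeled_idx[np.argsort(confidence)[:batch_size]]

# Start with 10 labeled instances per class.
labeled_idx = np.concatenate([np.flatnonzero(y_oracle == c)[:10] for c in (0, 1)])

model = LogisticRegression(max_iter=1000)
for _ in range(5):  # in practice, stop once the gain is not worth the labeling effort
    model.fit(X_pool[labeled_idx], y_oracle[labeled_idx])         # step 1: train
    query_idx = query_most_uncertain(model, X_pool, labeled_idx)  # step 2: pick uncertain ones
    labeled_idx = np.concatenate([labeled_idx, query_idx])        # the "expert" labels them

print("labeled instances after 5 rounds:", len(labeled_idx))
```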

# 7. What is the difference between anomaly detection and novelty detection?

*Anomaly detection*, also called *outlier detection*, is the task of detecting instances that deviate strongly from the norm. These instances are called *anomalies* or *outliers*, while the normal instances are called *inliers*.

Outlier detection
* The training data contains outliers which are defined as observations that are far from the others. Outlier detection estimators thus try to fit the regions where the training data is the most concentrated, ignoring the deviant observations.

Novelty detection
* The training data is not polluted by outliers and we are interested in detecting whether a new observation is an outlier. In this context an outlier is also called a novelty.
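The difference shows up directly in how an estimator is used. The sketch below relies on scikit-learn's `LocalOutlierFactor`, which supports both modes through its `novelty` parameter; the synthetic data and `n_neighbors=20` are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X_clean = rng.normal(0, 1, size=(200, 2))      # normal instances
X_outliers = rng.uniform(6, 8, size=(5, 2))    # far away from the main cloud

# Outlier detection: the training set is polluted, and we flag the
# training instances themselves (-1 = outlier, +1 = inlier).
X_polluted = np.vstack([X_clean, X_outliers])
train_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X_polluted)

# Novelty detection: fit on clean data only, then judge new observations.
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_clean)
new_points = np.array([[0.1, -0.2], [7.0, 7.0]])
new_flags = lof_novelty.predict(new_points)

print("flags for the 5 injected outliers:", train_flags[-5:])
print("flags for the new points:", new_flags)
```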

# 8. What is a Gaussian mixture? What tasks can you use it for?

A *Gaussian Mixture model* (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of several Gaussian distributions whose parameters are unknown. All the instances generated from a single Gaussian distribution form a cluster that typically looks like an ellipsoid. Each cluster can have a different ellipsoidal shape, size, density, and orientation. When we observe an instance, we know it was generated from one of the Gaussian distributions, but we are not told which one, and we do not know what the parameters of these distributions are.

We can use it for the following tasks:
* Anomaly detection
* Novelty detection
* Clustering

# 9. Can you name two techniques to find the right number of clusters when using a Gaussian mixture model?

The two techniques are:
* Bayesian Information Criterion (BIC)
* Akaike Information Criterion (AIC)

Both of them are theoretical information criteria.

Both the BIC and the AIC penalize models that have more parameters to learn (e.g., more clusters) and reward models that fit the data well. They often end up selecting the same model. When they differ, the model selected by the BIC tends to be simpler (fewer parameters) than the one selected by the AIC, but tends to not fit the data quite as well (this is especially true for large datasets).
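In scikit-learn, `GaussianMixture` exposes both criteria through its `bic()` and `aic()` methods, so the search amounts to fitting one model per candidate number of components and keeping the one with the lowest value. A minimal sketch, assuming synthetic data with three blobs and a candidate range of 1 to 6:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data: three blobs (assumption, for illustration only).
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.8, random_state=42)

ks = list(range(1, 7))
models = [GaussianMixture(n_components=k, n_init=5, random_state=42).fit(X) for k in ks]
bics = [m.bic(X) for m in models]   # lower is better
aics = [m.aic(X) for m in models]   # lower is better

best_k_bic = ks[int(np.argmin(bics))]
best_k_aic = ks[int(np.argmin(aics))]
print("best k by BIC:", best_k_bic, "| best k by AIC:", best_k_aic)
```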

# 10. The classic Olivetti faces dataset contains 400 grayscale 64 × 64–pixel images of faces. Each image is flattened to a 1D vector of size 4,096. 40 different people were photographed (10 times each), and the usual task is to train a model that can predict which person is represented in each picture. Load the dataset using the `sklearn.datasets.fetch_olivetti_faces()` function, then split it into a training set, a validation set, and a test set (note that the dataset is already scaled between 0 and 1). Since the dataset is quite small, you probably want to use stratified sampling to ensure that there are the same number of images per person in each set. Next, cluster the images using K-Means, and ensure that you have a good number of clusters (using one of the techniques discussed in this chapter). Visualize the clusters: do you see similar faces in each cluster?
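A possible starting point for the stratified split is sketched below. To keep the example runnable without downloading anything, it uses random stand-in data with the same shape as the Olivetti faces (400 × 4,096, 40 people, 10 images each); the helper `stratified_three_way_split` and the split sizes (80 validation, 40 test) are assumptions, not prescribed by the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_three_way_split(X, y, n_valid, n_test, seed=42):
    """Split into train/validation/test sets, keeping class proportions in each."""
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, stratify=y, random_state=seed)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_rest, y_rest, test_size=n_valid, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_valid, y_valid), (X_test, y_test)

# Stand-in data with the Olivetti shape (assumption). With the real dataset,
# X and y would come from fetch_olivetti_faces() as its .data and .target.
rng = np.random.default_rng(0)
X_fake = rng.random((400, 4096))
y_fake = np.repeat(np.arange(40), 10)

train, valid, test = stratified_three_way_split(X_fake, y_fake, n_valid=80, n_test=40)
print("images per person:",
      np.bincount(train[1])[0], "train,",
      np.bincount(valid[1])[0], "validation,",
      np.bincount(test[1])[0], "test")
```

K-Means can then be fitted on the training set only, using the inertia or the silhouette score to choose the number of clusters.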

# 11. Continuing with the Olivetti faces dataset, train a classifier to predict which person is represented in each picture, and evaluate it on the validation set. Next, use K-Means as a dimensionality reduction tool, and train a classifier on the reduced set. Search for the number of clusters that allows the classifier to get the best performance: what performance can you reach? What if you append the features from the reduced set to the original features (again, searching for the best number of clusters)?
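One way to use K-Means as a dimensionality reduction step is to put it in a pipeline before the classifier: its `transform()` method replaces each instance with its distances to the centroids. The sketch below illustrates the idea on scikit-learn's bundled digits dataset rather than the Olivetti faces (a stand-in chosen so that nothing has to be downloaded); the classifier and the candidate values of `k` are also assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in dataset (assumption): 8x8 digit images instead of 64x64 faces.
X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, stratify=y, random_state=42)

scores = {}
for k in (20, 50, 100):
    pipe = make_pipeline(
        KMeans(n_clusters=k, n_init=3, random_state=42),  # features = distances to centroids
        LogisticRegression(max_iter=5000),
    )
    pipe.fit(X_train, y_train)
    scores[k] = pipe.score(X_valid, y_valid)  # accuracy on the validation set

best_k = max(scores, key=scores.get)
print("validation accuracy per k:", scores)
print("best k:", best_k)
```

To append the reduced features to the original ones instead of replacing them, the K-Means output can be concatenated with `X` (for example with `numpy.hstack`) before training the classifier.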

# 12. Train a Gaussian mixture model on the Olivetti faces dataset. To speed up the algorithm, you should probably reduce the dataset’s dimensionality (e.g., use PCA, preserving 99% of the variance). Use the model to generate some new faces (using the method), and visualize them (if you used PCA, you will need to use its method). Try to modify some images (e.g., rotate, flip, darken) and see if the model can detect the anomalies (i.e., compare the output of the method for normal images and for anomalies).
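The workflow can be sketched as follows, assuming the methods the exercise has in mind are `GaussianMixture.sample()` for generation, `PCA.inverse_transform()` to map back to pixel space, and `GaussianMixture.score_samples()` for anomaly scoring. The example runs on scikit-learn's bundled digits dataset as a stand-in for the Olivetti faces, and the number of components and the rotation used as a modification are assumptions as well.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Stand-in dataset (assumption): 8x8 digit images scaled to [0, 1].
X, _ = load_digits(return_X_y=True)
X = X / 16.0

pca = PCA(n_components=0.99)            # keep 99% of the variance
X_reduced = pca.fit_transform(X)

gmm = GaussianMixture(n_components=10, random_state=42).fit(X_reduced)

# Generate new instances in the reduced space, then map them back to pixels.
generated_reduced, _ = gmm.sample(5)
generated_images = pca.inverse_transform(generated_reduced)

# Modify some images (here: rotate by 90 degrees) and compare log-densities.
normal = X[:20]
rotated = np.rot90(normal.reshape(-1, 8, 8), axes=(1, 2)).reshape(-1, 64)
score_normal = gmm.score_samples(pca.transform(normal))
score_rotated = gmm.score_samples(pca.transform(rotated))

print("mean log-density, normal images: ", score_normal.mean())
print("mean log-density, rotated images:", score_rotated.mean())
```

A much lower log-density for the modified images is what flags them as anomalies; a threshold can be set from a low percentile of the scores on the training set.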

# 13. Some dimensionality reduction techniques can also be used for anomaly detection. For example, take the Olivetti faces dataset and reduce it with PCA, preserving 99% of the variance. Then compute the reconstruction error for each image. Next, take some of the modified images you built in the previous exercise, and look at their reconstruction error: notice how much larger the reconstruction error is. If you plot a reconstructed image, you will see why: it tries to reconstruct a normal face.
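The reconstruction error is the mean squared difference between an image and its version projected down with PCA and then mapped back. A minimal sketch, again using the bundled digits dataset as a stand-in for the Olivetti faces and a 90-degree rotation as the modification (both assumptions; the helper name `reconstruction_errors` is hypothetical):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

def reconstruction_errors(pca, X):
    """Per-image mean squared error between X and its PCA reconstruction."""
    X_reconstructed = pca.inverse_transform(pca.transform(X))
    return np.square(X_reconstructed - X).mean(axis=1)

# Stand-in dataset (assumption): 8x8 digit images scaled to [0, 1].
X, _ = load_digits(return_X_y=True)
X = X / 16.0

pca = PCA(n_components=0.99).fit(X)     # keep 99% of the variance

normal = X[:20]
rotated = np.rot90(normal.reshape(-1, 8, 8), axes=(1, 2)).reshape(-1, 64)

err_normal = reconstruction_errors(pca, normal)
err_rotated = reconstruction_errors(pca, rotated)
print("mean reconstruction error, normal images: ", err_normal.mean())
print("mean reconstruction error, rotated images:", err_rotated.mean())
```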