* Supervised learning uses **historical labelled data** to make predictions on new data (regression or classification).

* Unsupervised learning uses **unlabelled data** to discover patterns, clusters, or significant components (clustering or dimensionality reduction). There are no known right answers to learn from.

* Supervised metrics don't apply to unsupervised algorithms, because there are no true labels to compare the output against.

# **1. Clustering:**

Using the features, group data rows into distinct clusters.
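
As a minimal sketch of this idea, the snippet below groups six made-up 2-D points with k-means. It assumes scikit-learn is available; the toy data and the choice of `n_clusters=2` are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up for illustration): two well-separated groups of 2-D points.
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # rows near (1, 1)
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],   # rows near (8, 8)
])

# n_clusters=2 is an assumption; in real data the number of clusters is unknown.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)  # one cluster id per row

print(labels)                  # the ids themselves are arbitrary
print(model.cluster_centers_)  # one centroid per cluster
```

The cluster ids carry no meaning on their own; only which rows share an id matters.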

# **2. Dimensionality Reduction:**

  * Using the features, discover how to combine them and reduce them to fewer components.

  * The reduction is done by combining and transforming the original features into a smaller set of new features or components. These new components are often linear combinations of the original features (as in PCA), although some methods, such as t-SNE, are non-linear.

  * In dimensionality reduction, the goal is to reduce the number of input features or variables while preserving the most important information in the data.

  * Dimensionality reduction can be combined with other machine learning algorithms, for example as a preprocessing step before clustering or before a supervised model.
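
The points above can be sketched with PCA. This example assumes scikit-learn and uses synthetic data in which the third feature is almost the sum of the first two, so two components should keep nearly all the variance. It also checks by hand that each component is a linear combination of the centred original features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 100 rows, 3 features,
# where the third feature is nearly a + b and so adds little new information.
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=100)])

# Reduce 3 features to 2 components.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Each component is a linear combination of the centred original features:
# the weights are the rows of pca.components_.
Z_manual = (X - pca.mean_) @ pca.components_.T

print(Z.shape)                        # (100, 2)
print(np.allclose(Z, Z_manual))       # True
print(pca.explained_variance_ratio_)  # share of variance kept per component
```

The reduced matrix `Z` can then be passed to another algorithm, such as a clustering model, in place of `X`.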

# **Cheatsheet:**

## **Definition:**

Unsupervised learning is a type of machine learning where the model learns patterns and relationships in the data without any explicit labels or target outputs.

## **Main Goals:**

  1. Discovering hidden patterns in data.
  2. Grouping similar instances together.
  3. Reducing the dimensionality of data.
  4. Anomaly detection.

## **Common Unsupervised Learning Algorithms:**

**Clustering:** Grouping similar data points together.

  1. K-means clustering.
  2. Hierarchical clustering.
  3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

**Dimensionality Reduction:** Reducing the number of input features.

  1. Principal Component Analysis (PCA).
  2. t-SNE (t-Distributed Stochastic Neighbor Embedding).

**Anomaly Detection:** Identifying abnormal instances in the data.

  1. Isolation Forest.
  2. One-Class SVM (Support Vector Machines).
  3. Autoencoders.
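
As a hedged illustration of anomaly detection, the sketch below runs an Isolation Forest on synthetic data with one planted outlier. It assumes scikit-learn; the data and the `contamination=0.01` setting (the expected share of anomalies) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 100 ordinary points around the origin
# plus one planted outlier far away, stored as the last row.
normal = rng.normal(size=(100, 2))
outlier = np.array([[10.0, 10.0]])
X = np.vstack([normal, outlier])

# contamination is the assumed fraction of anomalies in the data.
forest = IsolationForest(contamination=0.01, random_state=0)
pred = forest.fit_predict(X)      # +1 = inlier, -1 = anomaly
scores = forest.score_samples(X)  # lower score = more anomalous

print(pred[-1])         # label given to the planted outlier
print(scores.argmin())  # index of the most anomalous row
```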

## **Evaluation Metrics:**

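  1. Silhouette Coefficient: Measures how close each point is to its own cluster compared with the nearest other cluster. It ranges from -1 to 1; higher is better.
  2. Davies-Bouldin Index: Measures the average similarity between each cluster and the cluster most similar to it. Lower is better.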
  3. Explained Variance Ratio: Measures the amount of variance explained by each principal component in PCA.
  4. Reconstruction Error: Measures the difference between input data and the reconstructed output in autoencoders.
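
The two clustering metrics above can be computed without true labels. The sketch below assumes scikit-learn and synthetic data; it compares k-means labels with random labels to show how the scores move, and the random baseline exists only for contrast.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)

# Synthetic data (an assumption): two blobs centred at (0, 0) and (10, 10).
X = np.vstack([
    rng.normal(loc=0.0, size=(50, 2)),
    rng.normal(loc=10.0, size=(50, 2)),
])

# Labels from an actual clustering, and random labels as a poor baseline.
good_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
random_labels = rng.integers(0, 2, size=len(X))

sil_good = silhouette_score(X, good_labels)     # closer to 1 is better
sil_rand = silhouette_score(X, random_labels)
db_good = davies_bouldin_score(X, good_labels)  # closer to 0 is better
db_rand = davies_bouldin_score(X, random_labels)

print(f"silhouette:     kmeans={sil_good:.2f}  random={sil_rand:.2f}")
print(f"davies-bouldin: kmeans={db_good:.2f}  random={db_rand:.2f}")
```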

## **Preprocessing Techniques:**

  1. Feature Scaling: Normalizing input features to have similar scales.
  2. Missing Data Imputation: Filling in missing values in the dataset.
  3. Feature Selection: Selecting relevant features for analysis.
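
A minimal sketch of the first two steps, assuming scikit-learn: missing values are filled with the column mean, then every feature is standardized to zero mean and unit variance. The tiny array and the mean-imputation strategy are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up): two features on very different scales, one missing value.
X = np.array([
    [1.0, 1000.0],
    [2.0, np.nan],
    [3.0, 3000.0],
    [4.0, 4000.0],
])

# Step 1: replace NaN with the column mean. Step 2: standardize each column.
prep = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_ready = prep.fit_transform(X)

print(X_ready)
print(X_ready.mean(axis=0))  # approximately 0 for each column
print(X_ready.std(axis=0))   # approximately 1 for each column
```

Scaling matters for distance-based methods such as k-means, where a feature measured in thousands would otherwise dominate one measured in units.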

## **Challenges and Considerations:**

  1. Determining the optimal number of clusters.
  2. Dealing with high-dimensional data.
  3. Handling outliers and noisy data.
  4. Interpreting and validating unsupervised learning results.
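
For the first challenge, one common heuristic (not the only one) is to fit the clustering for several candidate values of k and keep the one with the best silhouette score. The sketch below assumes scikit-learn and synthetic data with three planted blobs; the candidate range 2 to 6 is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic data (an assumption): three tight, well-separated blobs.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Score each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the k with the highest silhouette score.
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

On real data the scores are rarely this clear-cut, so the result is best treated as one piece of evidence alongside domain knowledge.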