# Unsupervised Machine Learning

In the previous topic, we covered supervised machine learning, in which models are trained on labeled data. In many cases, however, we have no labeled data and need to find hidden patterns in the dataset itself. Unsupervised learning techniques address these cases.

## What is Unsupervised Learning?

- As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It is loosely comparable to the way people learn new things by observation, without being told the right answer. It can be defined as:

- *Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to act on that data without any supervision.*

- Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike in supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to **find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.**

- **Example**: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never given labels for these images, so it has no prior knowledge of which features distinguish a cat from a dog. Its task is to identify the image features on its own. It does this by clustering the images into groups according to the similarities between them.

![unsupervised-machine-learning.png](attachment:286af016-621c-43f7-bcbb-e43d2c5bdd2b.png)

## Why use Unsupervised Learning?

Below are some of the main reasons why unsupervised learning is important:

- Unsupervised learning is helpful for finding useful insights in data.

- Unsupervised learning resembles the way humans learn from their own experiences, which is why it is often described as closer to real AI.

- Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.

- In the real world, we do not always have input data with corresponding outputs, and unsupervised learning is needed to handle such cases.

## Working of Unsupervised Learning

The working of unsupervised learning can be understood from the diagram below:

![unsupervised-machine-learning-1.png](attachment:66214372-fb9b-4a71-b01a-95176aa054e5.png)

Here, we have taken unlabeled input data, which means it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. The model first interprets the raw data to find hidden patterns, and then a suitable algorithm is applied, such as k-means clustering or hierarchical clustering.

Once applied, the algorithm divides the data objects into groups according to the similarities and differences between the objects.

## Types of Unsupervised Learning Algorithms

Unsupervised learning algorithms can be further categorized into two types of problems:



![unsupervised-machine-learning-2.png](attachment:78949d7f-7bbb-4966-bfd7-421af16bf715.png)

- **Clustering**: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence or absence of those commonalities.

- **Association**: Association rule learning is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules can make marketing strategy more effective: for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical application of association rules is Market Basket Analysis.

# Unsupervised Learning

Unsupervised learning is a type of machine learning where the model identifies patterns or structures in data without using labels or predefined outcomes. It is used when we want to explore the data and find hidden relationships.

Two common types of unsupervised learning are **Clustering** and **Association**.

---

## 1. Clustering

Clustering is about grouping similar data points into clusters. Data points in the same cluster are more similar to each other than to those in other clusters.

### Example:
Imagine you have a group of students in a school, and you want to divide them into study groups based on their exam scores. Students with similar scores will be grouped together:

- **Cluster 1:** High scorers  
- **Cluster 2:** Average scorers  
- **Cluster 3:** Low scorers  

By clustering, you can identify these groups and tailor teaching strategies for each group.
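
The study-group example can be sketched with a one-dimensional k-means routine. This is only an illustrative sketch in plain Python: the scores, the choice of three clusters, and the helper name `kmeans_1d` are assumptions made for the example, and in practice a library implementation would normally be used instead.

```python
# Minimal 1-D k-means sketch (pure Python, illustrative only).
# The scores and k = 3 are made-up assumptions for this example.

def kmeans_1d(values, k, iterations=100):
    """Cluster numbers into k groups; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels

scores = [92, 88, 95, 61, 58, 65, 30, 35, 28]
centroids, labels = kmeans_1d(scores, k=3)

# Collect the scores that ended up in each cluster.
groups = {}
for score, label in zip(scores, labels):
    groups.setdefault(label, []).append(score)

# Print clusters from highest to lowest average score.
for label in sorted(groups, key=lambda c: centroids[c], reverse=True):
    print(round(centroids[label], 1), sorted(groups[label]))
```

Note that no score is ever tagged "high", "average", or "low" in the input. The three groups emerge only from how close the scores are to one another.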

### Real-Life Use Case:
In marketing, businesses use clustering to segment customers. For instance:
- **Group 1:** Young adults with low income but high spending.  
- **Group 2:** Middle-aged customers with high income and moderate spending.

This helps companies create targeted marketing campaigns.

---

## 2. Association

Association is about finding relationships between items in a dataset. It identifies patterns, such as "If X happens, Y is likely to happen."

### Example:
Imagine you run a supermarket and want to find out which products are frequently bought together:

- Customers who buy **bread** often buy **butter**.  
- Customers who buy **milk** often buy **cookies**.

From this information, you can:
- Place bread and butter together on shelves.  
- Offer discounts on cookies when customers buy milk.
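
How strongly two products go together is usually measured with *support* (how often the items appear together) and *confidence* (how often the second item appears given the first). The sketch below computes both for a handful of made-up baskets. The transactions and the helper names `count`, `support`, and `confidence` are assumptions for illustration, not data from a real store.

```python
# Support and confidence over a small, made-up list of shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cookies"},
    {"bread", "jam"},
    {"milk", "cookies", "bread", "butter"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"bread", "butter"}, transactions))                 # 0.6
print(confidence({"bread"}, {"butter"}, transactions))            # 0.75
print(round(confidence({"milk"}, {"cookies"}, transactions), 2))  # 0.67
```

In this toy data, three of the four baskets that contain bread also contain butter, which is the kind of pattern that would justify shelving the two together.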

### Real-Life Use Case:
E-commerce websites like Amazon use association to show recommendations:
- "Customers who bought this item also bought that item."

---

## Key Differences:

| Feature         | **Clustering**                           | **Association**                                |
|------------------|------------------------------------------|-----------------------------------------------|
| **Purpose**      | Group similar data into clusters         | Find relationships between items              |
| **Example**      | Grouping students by exam scores         | Bread and butter are often bought together    |
| **Use Case**     | Customer segmentation, anomaly detection | Market basket analysis, product recommendations |

---

These techniques help in understanding data better and making informed decisions in areas like marketing, retail, and education.


## Unsupervised Learning Algorithms

Unsupervised learning algorithms identify patterns in unlabeled data. Here are some common types:


### 1. Clustering Algorithms

Clustering algorithms group data points into clusters based on similarity. Examples include:

- **K-Means Clustering**  
  Partitions data into $k$ clusters, minimizing the variance within each cluster.

- **Hierarchical Clustering**  
  Builds a tree of clusters either by merging smaller clusters (agglomerative) or splitting larger clusters (divisive).

- **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**  
  Groups data points based on density, identifying clusters of varying shapes and sizes. It also marks noise points as outliers.
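
To make the density-based idea concrete, here is a minimal DBSCAN sketch in plain Python. It is a simplified illustration rather than a production implementation: the sample points and the parameter values `eps=0.5` and `min_pts=3` are assumptions chosen for the example, and the neighbour search is a brute-force scan.

```python
import math

NOISE = -1  # label used for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    def neighbours(i):
        # Brute-force search: every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # j is also a core point, so keep expanding
                queue.extend(nearby)
    return labels

# Two dense blobs plus one isolated point (made-up data).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not supplied up front. It follows from the density parameters, and the isolated point is reported as noise instead of being forced into a cluster.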

---

### 2. Dimensionality Reduction Algorithms

Dimensionality reduction algorithms reduce the number of features in data while preserving the most important information. Examples include:

- **Principal Component Analysis (PCA)**  
  Reduces dimensionality by finding the principal axes of variance in the data. Useful for visualization and speeding up computations.
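
As a rough illustration of what "principal axis of variance" means, the sketch below finds the first principal component of a small 2-D dataset using power iteration on the covariance matrix, then projects the data onto that axis to obtain a 1-D representation. The data points and the function name `first_principal_component` are assumptions for the example. Real projects would typically rely on a numerical library, which also handles edge cases (such as zero-variance data) that this sketch ignores.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit vector along the direction of greatest variance, column means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiplying by the covariance matrix and
    # normalising converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means

# Made-up points lying close to the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
axis, means = first_principal_component(data)

# Project each centred 2-D point onto the axis: two features become one.
projected = [sum((x - m) * a for x, m, a in zip(row, means, axis)) for row in data]

print([round(a, 3) for a in axis])
print([round(p, 2) for p in projected])
```

Because the points lie almost on a straight line, a single coordinate along that line preserves nearly all of the variation in the original two features.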

---

These algorithms are widely used for customer segmentation, anomaly detection, data visualization, and more.


## Advantages of Unsupervised Learning

- Unsupervised learning can be applied to more open-ended tasks than supervised learning, because it does not require labeled input data.

- Unsupervised learning is often preferable because unlabeled data is easier to obtain than labeled data.

## Disadvantages of Unsupervised Learning

- Unsupervised learning is intrinsically more difficult than supervised learning because there is no corresponding output to learn from.

- The results of an unsupervised learning algorithm may be less accurate because the input data is not labeled and the algorithm does not know the correct output in advance.