# Unsupervised Machine Learning

- Unsupervised learning is a type of machine learning that learns from unlabeled data.
- This means that the data does not have any pre-existing labels or categories.
- The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.

### What is Unsupervised Machine Learning?
Unsupervised Learning is a type of Machine Learning where:
- The model learns from unlabeled data.
- No predefined output or target (y) is provided.
- The goal is to discover patterns, groupings, or structure in the data.

🧠 It is like giving a machine a box of mixed puzzle pieces without the picture on the box and asking it to organize or group them.

### Key Goals of Unsupervised Learning

| Task                      | Goal                                  |
| ------------------------- | ------------------------------------- |
| Clustering                | Group similar data points             |
| Dimensionality Reduction  | Reduce the number of input variables  |
| Anomaly Detection         | Identify unusual data points          |
| Association Rule Learning | Discover relationships among features |


![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

| Feature  | Supervised Learning        | Unsupervised Learning        |
| -------- | -------------------------- | ---------------------------- |
| Data     | Labeled                    | Unlabeled                    |
| Goal     | Predict outcome            | Discover hidden patterns     |
| Examples | Classification, Regression | Clustering, PCA              |
| Output   | y (known)                  | Structure or group (unknown) |


![image.png](attachment:image.png)![image-2.png](attachment:image-2.png)

# Types of Unsupervised Machine Learning

![image.png](attachment:image.png)

##  Clustering

### What is Clustering?
- Clustering is an unsupervised learning technique that groups similar data points together without using labels.
- Goal: Find natural groupings (clusters) in the data.
- Real-Life Analogy: Organizing books in a library into shelves based on content without knowing their genres beforehand.

### Types of Clustering Algorithms

#####  A. Partition-Based Clustering

| Type                       | Description                                                                 |
| -------------------------- | --------------------------------------------------------------------------- |
| **K-Means**                | Divides data into `k` clusters by minimizing squared distance to centroids. |
| **K-Medoids (PAM)**        | Uses actual points (medoids) instead of average centroids.                  |
| **K-Modes / K-Prototypes** | Used for categorical or mixed data types.                                   |

- ✅ When to Use: You know the approximate number of clusters.
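
To make the K-Means row concrete, here is a minimal pure-Python sketch of the assign-then-update loop (Lloyd's algorithm). The helper names `sq_dist` and `kmeans` are hypothetical, and the deterministic farthest-first initialization is an assumption made for reproducibility; libraries typically use random or k-means++ seeding instead.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Sketch of Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption: farthest-first initialization. Start from the first point,
    # then repeatedly pick the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Two well-separated toy groups (made-up data for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Note that `k` has to be supplied up front, which is why this family suits cases where the approximate number of clusters is known.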

#####  B. Density-Based Clustering

| Type        | Description                                                         |
| ----------- | ------------------------------------------------------------------- |
| **DBSCAN**  | Forms clusters based on high-density areas. Detects noise/outliers. |
| **OPTICS**  | Extends DBSCAN to handle clusters of varying density.               |
| **HDBSCAN** | Hierarchical version of DBSCAN; automatic cluster selection.        |

- ✅ When to Use: Irregular cluster shapes, noise, unknown k.
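
The density idea behind DBSCAN can be sketched in a few lines of pure Python. This is an illustrative sketch rather than a library implementation: the function name `dbscan` and the parameter names `eps` and `min_samples` are assumptions chosen for readability, and the neighbor search is a brute-force scan, which only suits small datasets.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1

    def neighbors(i):
        # Brute-force radius query; the point itself counts as a neighbor.
        return [j for j, q in enumerate(points) if sq_dist(points[i], q) <= eps ** 2]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Two dense groups plus one isolated point (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never passed in; it falls out of the density parameters, and the isolated point is reported as noise.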

##### C. Hierarchical Clustering

| Type                          | Description                                               |
| ----------------------------- | --------------------------------------------------------- |
| **Agglomerative** (Bottom-Up) | Starts with individual points, merges them into clusters. |
| **Divisive** (Top-Down)       | Starts with all points as one big cluster, splits them.   |
| **BIRCH**                     | Efficient for large datasets using a tree structure.      |

- ✅ When to Use: You want a dendrogram (tree-like structure) or multilevel clustering.
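
As a rough illustration of the bottom-up (agglomerative) approach, the sketch below repeatedly merges the two closest clusters. It assumes single linkage (the distance between clusters is the distance between their closest members), which is only one of several linkage choices. The function name `agglomerative` is hypothetical, and the nested loops are far too slow for large datasets.

```python
import math


def agglomerative(points, n_clusters):
    """Single-linkage bottom-up clustering.

    Returns (clusters, merges): clusters are lists of point indices and
    merges records (cluster_a, cluster_b, distance) in merge order, which
    is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges


# Made-up data: two tight pairs and one distant point.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.2, 5.0), (9.0, 9.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Stopping at a different `n_clusters` corresponds to cutting the dendrogram at a different height.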

##### D. Graph-Based Clustering

| Type                    | Description                                                                           |
| ----------------------- | ------------------------------------------------------------------------------------- |
| **Spectral Clustering** | Embeds points using eigenvectors of the graph Laplacian, then clusters the embedding. |

- ✅ When to Use: Complex non-convex shapes or similarity graphs.

##### E. Grid-Based Clustering

| Type              | Description                                           |
| ----------------- | ----------------------------------------------------- |
| **CLIQUE, STING** | Divides space into grids and groups based on density. |

- ✅ When to Use: Large datasets, spatial data.

##### Use Cases

| Domain           | Application           |
| ---------------- | --------------------- |
| Marketing        | Customer Segmentation |
| Healthcare       | Patient Clustering    |
| Image Processing | Image Segmentation    |
| Finance          | Fraud Detection       |
| Social Media     | Community Detection   |


---

## Association

### What is Association?
- Association rule learning is about discovering interesting relationships or patterns among variables in large datasets.
- Goal: Find items that often occur together.
- Real-Life Analogy: If a customer buys bread 🍞 and butter 🧈, they are also likely to buy jam 🍓.

### Types of Association Algorithms

##### A. Apriori Algorithm
- Uses support, confidence, and lift.
- Builds frequent itemsets using level-wise search.

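Terms:
- Support: Fraction of transactions that contain the itemset.
- Confidence: Estimated probability of B given A, i.e. support(A ∪ B) / support(A).
- Lift: Confidence of A → B divided by support(B). Values above 1 mean A and B occur together more often than expected if they were independent.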
```
from mlxtend.frequent_patterns import apriori, association_rules
```
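
To show what the level-wise search and the three metrics compute, here is a small pure-Python sketch that does not depend on `mlxtend`. The function names `support`, `apriori_sketch`, and `rule_metrics`, as well as the toy transactions, are made up for illustration; a real library implementation is considerably more optimized.

```python
from itertools import combinations

# Toy market-basket data (made up for illustration).
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_sketch(transactions, min_support):
    """Level-wise search: frequent k-itemsets are built from frequent (k-1)-itemsets."""
    items = sorted({item for t in transactions for item in t})
    level = [
        frozenset([i]) for i in items
        if support(frozenset([i]), transactions) >= min_support
    ]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_both = support(antecedent | consequent, transactions)
    confidence = s_both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return s_both, confidence, lift


frequent = apriori_sketch(transactions, min_support=0.4)
s, conf, lift = rule_metrics(frozenset({"bread", "butter"}), frozenset({"jam"}), transactions)
print(len(frequent))                             # 8 frequent itemsets
print(s, round(conf, 3), round(lift, 3))         # 0.4 0.667 1.667
```

In this toy data the rule {bread, butter} → {jam} has a lift above 1, so jam appears with bread and butter more often than its overall frequency would suggest.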

#####  B. Eclat Algorithm
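- Uses a vertical layout (each item maps to the set of transaction IDs that contain it).
- Computes support by intersecting these ID sets, which is often faster than rescanning the transactions.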
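
The vertical layout can be sketched as follows: each item maps to the set of transaction IDs containing it, and the support count of a larger itemset is the size of the intersection of those sets. This is a simplified depth-first sketch; the function name `eclat` and the `min_count` parameter (an absolute count rather than a fraction) are assumptions made for illustration.

```python
def eclat(transactions, min_count):
    """Return {itemset_tuple: support_count} for all frequent itemsets."""
    # Vertical layout: item -> set of transaction ids (a "tidset").
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset of prefix + item), all frequent.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of later items to grow the itemset.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Toy market-basket data (made up for illustration).
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = eclat(transactions, min_count=2)
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```

After the vertical layout is built, the original transactions are never scanned again; all counting happens through set intersections.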

#####  C. FP-Growth Algorithm
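- Builds a compressed FP-tree (frequent-pattern tree) from the transactions.
- Usually faster than Apriori because it avoids candidate generation and needs only two passes over the data.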

![image.png](attachment:image.png)

---

# Summary

| Category        | Type                    | Algorithms                | Use Cases                        |
| --------------- | ----------------------- | ------------------------- | -------------------------------- |
| **Clustering**  | Partition-Based         | K-Means, K-Medoids        | Customer segmentation            |
|                 | Density-Based           | DBSCAN, HDBSCAN           | Anomaly detection                |
|                 | Hierarchical            | Agglomerative, BIRCH      | Gene data clustering             |
|                 | Graph-Based             | Spectral                  | Complex cluster shapes           |
|                 | Grid-Based              | CLIQUE, STING             | Spatial data clustering          |
| **Association** | Frequent Pattern Mining | Apriori, Eclat, FP-Growth | Basket analysis, recommendations |
