
## 🔍 **Unsupervised Machine Learning Algorithms**

Unsupervised learning works with **unlabeled data**: the model **finds patterns, clusters, or structures** hidden in the data **without predefined outputs**.

---

### **1. K-Means Clustering** 🎯

> **Purpose:** Group data into **K clusters** based on similarity.
> **Key Details:**

* **Type:** Clustering
* **Working:** Assigns each data point to the nearest cluster centroid, recomputes each centroid as the mean of its assigned points, and repeats until the assignments stop changing.
* **Use Case:** Customer segmentation, document clustering.
* **Pros:** Simple, fast, scalable.
* **Cons:** Needs to specify K, sensitive to initial centroids and outliers.
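
The assign-then-update loop described under **Working** can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the `seed` parameter, and the toy points are assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy data (hypothetical): two well-separated groups.
points = [(0, 0), (1, 1), (2, 2), (10, 10), (11, 11), (12, 12)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The first three points end up sharing one label and the last three the other. Which group gets label `0` depends on the random initial centroids, which is the sensitivity noted under **Cons**.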

---

### **2. Hierarchical Clustering** 🌿

> **Purpose:** Build a **tree of clusters** (dendrogram).
> **Key Details:**

* **Type:** Clustering
* **Working:**

  * **Agglomerative (bottom-up):** Merge closest clusters.
  * **Divisive (top-down):** Split clusters recursively.
* **Use Case:** Gene classification, market segmentation.
* **Pros:** Doesn’t need K beforehand, gives cluster hierarchy.
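* **Cons:** Slow and memory-hungry on large datasets (standard agglomerative methods compare all pairs of points); merges or splits cannot be undone once made.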
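
A rough sketch of the agglomerative (bottom-up) variant follows. It assumes single linkage (distance between clusters = distance between their closest members), which is only one of several possible linkage rules; the function name and sample points are made up for illustration.

```python
def agglomerative(points, n_clusters):
    """Single-linkage bottom-up clustering.

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a dendrogram is drawn from.
    """
    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    clusters = [[i] for i in range(len(points))]  # start: every point alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((list(clusters[x]), list(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

The nested loops over all cluster pairs are what make this approach expensive on large datasets.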

---

### **3. DBSCAN (Density-Based Spatial Clustering)** 🧩

> **Purpose:** Find clusters based on **density** and **distance**.
> **Key Details:**

* **Type:** Clustering
* **Working:** Groups closely packed points; points in low-density regions are labeled as noise.
* **Use Case:** Anomaly detection, geographic data clustering.
* **Pros:** Detects arbitrary shaped clusters, handles noise well.
* **Cons:** Results are sensitive to the parameters (`eps`, `minPts`); struggles when clusters have very different densities.
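
The density idea can be illustrated with a small pure-Python sketch. It assumes Euclidean distance and counts a point as its own neighbor; the names `dbscan`, `eps`, and `min_pts` mirror the parameters above, and the data is hypothetical.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # not a core point
            labels[i] = NOISE
            continue
        cluster += 1                  # start a new cluster from this core point
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:    # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:    # j is a core point too: keep expanding
                queue.extend(nb)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at `(50, 50)` has no neighbors within `eps`, so it is reported as noise rather than forced into a cluster.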

---

### **4. PCA (Principal Component Analysis)** 🔄

> **Purpose:** Reduce dimensionality while keeping most variance.
> **Key Details:**

* **Type:** Dimensionality Reduction
* **Working:** Transforms correlated features into a smaller set of **uncorrelated** components (principal components).
* **Use Case:** Data visualization, noise reduction, speeding up algorithms.
* **Pros:** Improves efficiency, removes multicollinearity.
* **Cons:** Components are hard to interpret, assumes linearity.
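
As a small illustration of the transformation, the sketch below finds only the first principal component, using power iteration on the covariance matrix. That choice is an assumption made to keep the example dependency-free; libraries typically use an SVD or full eigendecomposition instead. The function name and data are hypothetical.

```python
def first_principal_component(rows, iterations=100):
    """Return (unit direction, 1-D scores, feature means) for the top component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the start vector is not orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project every centered row onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores, means

# Two perfectly correlated features (second = 2 * first).
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
component, scores, means = first_principal_component(rows)
print(component, scores)
```

Because the two features are perfectly correlated here, a single component pointing along `(1, 2)` captures all of the variance, and each 2-D row can be rebuilt from its single score.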

---

### **5. t-SNE (t-Distributed Stochastic Neighbor Embedding)** 🎨

> **Purpose:** Visualize high-dimensional data in 2D/3D.
> **Key Details:**

* **Type:** Dimensionality Reduction / Visualization
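* **Working:** Converts pairwise distances into neighbor probabilities, then arranges points in 2D/3D so that those probabilities are matched as closely as possible. Local neighborhoods are preserved; distances between far-apart clusters are not reliable.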
* **Use Case:** Image clustering, visualizing word embeddings.
* **Pros:** Great for visualizing patterns and groups.
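* **Cons:** Computationally expensive; the standard algorithm cannot place new/unseen points without re-running; results depend on the perplexity setting and random initialization.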
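
A full t-SNE implementation is too long for a cheat sheet, but the quantity it minimizes can be sketched. The code below computes Gaussian neighbor probabilities in the original space, Student-t probabilities in the 2D layout, and the KL divergence between them. It deliberately skips the gradient-descent optimization and uses one fixed `sigma` for every point, whereas real t-SNE tunes it per point from the perplexity. All names and data are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def high_dim_affinities(X, sigma=1.0):
    """Symmetric neighbor probabilities P from Gaussian kernels."""
    n = len(X)
    cond = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])   # p(j | i)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def low_dim_affinities(Y):
    """Neighbor probabilities Q from a heavy-tailed Student-t kernel."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(P, Q):
    """Cost that t-SNE minimizes: how badly the layout Q matches P."""
    return sum(p * math.log(p / q)
               for rp, rq in zip(P, Q) for p, q in zip(rp, rq) if p > 0)

X = [(0, 0, 0), (0, 1, 0), (3, 3, 3), (3, 4, 3)]        # two tight pairs in 3D
P = high_dim_affinities(X)
Q_good = low_dim_affinities([(0, 0), (0, 1), (3, 3), (3, 4)])  # pairs kept together
Q_bad = low_dim_affinities([(0, 0), (3, 3), (0, 1), (3, 4)])   # pairs torn apart
print(kl_divergence(P, Q_good), kl_divergence(P, Q_bad))
```

The layout that keeps original neighbors next to each other gets the lower cost, which is what "preserving local structure" means in practice.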

---

### **6. Autoencoders** 🧠

> **Purpose:** Learn a compressed representation of the data by training a network to reconstruct its own input, so no labels are needed.
> **Key Details:**

* **Type:** Neural Network / Dimensionality Reduction
* **Working:**

  * Encoder compresses data
  * Decoder reconstructs original data
* **Use Case:** Anomaly detection, noise reduction.
* **Pros:** Learns non-linear patterns, powerful for image and signal data.
* **Cons:** Requires lots of data, hard to interpret.
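
The encoder/decoder idea can be shown with a deliberately tiny sketch: a linear autoencoder with a one-number bottleneck, trained by full-batch gradient descent in plain Python. Real autoencoders use non-linear activations, more layers, and a deep learning framework; the function name, learning rate, starting weights, and data below are all assumptions chosen for illustration.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=500):
    """Train a d -> 1 -> d linear autoencoder; return weights and loss history."""
    d = len(data[0])
    enc = [0.1] * d   # encoder weights: input -> 1-D code
    dec = [0.1] * d   # decoder weights: 1-D code -> reconstruction
    history = []
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec, loss = [0.0] * d, [0.0] * d, 0.0
        for x in data:
            z = sum(w * xi for w, xi in zip(enc, x))         # encode
            err = [w * z - xi for w, xi in zip(dec, x)]      # decode minus input
            loss += sum(e * e for e in err)
            dz = sum(2 * e * w for e, w in zip(err, dec))    # d(loss)/d(code)
            for j in range(d):
                g_dec[j] += 2 * err[j] * z
                g_enc[j] += dz * x[j]
        history.append(loss / n)   # mean squared reconstruction error
        enc = [w - lr * g / n for w, g in zip(enc, g_enc)]
        dec = [w - lr * g / n for w, g in zip(dec, g_dec)]
    return enc, dec, history

# 2-D points that all lie on one line, so a 1-D code is enough.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
enc, dec, history = train_linear_autoencoder(data)

code = sum(w * xi for w, xi in zip(enc, data[3]))   # compress
reconstruction = [w * code for w in dec]            # reconstruct
print(history[0], history[-1], reconstruction)
```

For anomaly detection, the same reconstruction error is used as a score: inputs unlike the training data tend to reconstruct poorly.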

---

### **7. Apriori Algorithm** 🛒

> **Purpose:** Discover **frequent itemsets** and **association rules**.
> **Key Details:**

* **Type:** Association Rule Learning
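* **Working:** Finds itemsets whose **support** meets a minimum threshold, growing them one item at a time and pruning any candidate that has an infrequent subset. Rules are then ranked by **confidence** and **lift**.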
* **Use Case:** Market basket analysis (e.g., "People who buy bread also buy butter").
* **Pros:** Easy to implement, useful insights for cross-selling.
* **Cons:** Can generate a very large number of candidates and rules; slow for large datasets because it scans the data repeatedly.
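
A compact sketch of the level-wise search and the three rule metrics is shown below. The basket data, thresholds, and function names are hypothetical, and the code favors readability over speed.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # level 1 candidates
    frequent, k = {}, 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Yield (lhs, rhs, support, confidence, lift) for rules lhs -> rhs."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                confidence = sup / frequent[lhs]
                lift = confidence / frequent[rhs]
                if confidence >= min_confidence:
                    out.append((set(lhs), set(rhs), sup, confidence, lift))
    return out

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
frequent = apriori(transactions, min_support=0.4)
found = rules(frequent, min_confidence=0.7)
for rule in found:
    print(rule)
```

In this toy data, every basket containing jam also contains bread, so the rule jam -> bread has confidence 1.0 and a lift above 1.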

---

### **8. FP-Growth (Frequent Pattern Growth)** 🌱

> **Purpose:** Faster alternative to Apriori for mining frequent patterns.
> **Key Details:**

* **Type:** Association Rule Learning
* **Working:** Builds a compact tree (FP-tree) to find frequent patterns without generating candidate sets.
* **Use Case:** Retail analytics, recommendation systems.
* **Pros:** Efficient, handles large data well.
* **Cons:** Complex implementation.
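
To give a feel for the FP-tree approach, here is a condensed sketch. It builds a prefix tree from transactions sorted by item frequency, then mines each item's prefix paths recursively instead of generating candidates. Class and function names are made up, and a real implementation would add optimizations such as linked header tables and single-path shortcuts.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header: item -> nodes, item counts)."""
    counts = defaultdict(int)
    for items, c in weighted_transactions:
        for i in items:
            counts[i] += c
    counts = {i: c for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, c in weighted_transactions:
        # Keep frequent items only, most frequent first, so paths share prefixes.
        ordered = sorted((i for i in items if i in counts),
                         key=lambda i: (-counts[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, counts

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): count} for all frequent itemsets."""
    header, counts = build_tree(weighted_transactions, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = counts[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))
    return result

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in patterns.items():
    print(sorted(itemset), count)
```

Note that the data is read only while building trees; no candidate itemsets are ever enumerated and tested against the full transaction list.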

---

### **9. Isolation Forest** 🌲

> **Purpose:** Detect anomalies by isolating observations.
> **Key Details:**

* **Type:** Anomaly Detection
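* **Working:** Builds trees from random feature and threshold splits. Anomalies are isolated in fewer splits than normal points, so a short average path length indicates an anomaly.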
* **Use Case:** Fraud detection, system health monitoring.
* **Pros:** Fast, works with high-dimensional data.
* **Cons:** Scores do not explain why a point is anomalous; the expected share of anomalies usually has to be chosen by hand.
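
The isolation idea can be demonstrated with a simplified sketch. Instead of building and storing trees, it follows only the branch containing the query point and draws fresh random splits each time. It also omits the subsampling and path-length normalization of the published algorithm, so treat it as an illustration of the principle. Names and data are hypothetical.

```python
import random

def path_length(x, data, rng, depth=0, max_depth=10):
    """Number of random axis-aligned splits needed to isolate x within data."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    j = rng.randrange(len(x))                        # pick a random feature
    lo, hi = min(r[j] for r in data), max(r[j] for r in data)
    if lo == hi:                                     # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)                      # pick a random threshold
    # Keep only the points that fall on the same side of the split as x.
    same_side = [r for r in data if (r[j] < split) == (x[j] < split)]
    return path_length(x, same_side, rng, depth + 1, max_depth)

def average_path_length(x, data, n_trees=200, seed=0):
    rng = random.Random(seed)
    return sum(path_length(x, data, rng) for _ in range(n_trees)) / n_trees

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.05, 0.1),
        (0.1, 0.05), (0.2, 0.2), (0.0, 0.15), (10.0, 10.0)]
inlier_path = average_path_length((0.15, 0.15), data)
outlier_path = average_path_length((10.0, 10.0), data)
print(inlier_path, outlier_path)
```

The far-away point `(10.0, 10.0)` is almost always cut off by the very first split, so its average path is much shorter than that of a point inside the dense group.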



## 📦 Recommendation System (Recommender System)
> 📘 A system that **suggests relevant items** (like movies, products, music) to users based on **preferences, behavior, or similarity**.

---

### 🔹 **Types of Recommendation Systems**

---

### 1️⃣ **Content-Based Filtering** 🧾

> 📌 Recommends items **similar to what a user liked in the past**.

**How it works:**

* Uses item features (genre, category, brand).
* Builds a profile for the user.
* Recommends similar items.

**Example:**

> If you liked a sci-fi movie, it suggests more sci-fi movies.

**Pros:**
✅ Personalized
✅ No need for other users' data

**Cons:**
❌ Limited to known preferences
❌ Over-specializes: rarely suggests items outside the user's existing interests
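
The three steps under **How it works** can be sketched as follows. The movie titles, the three genre features, and the use of cosine similarity are assumptions for illustration; real systems often derive features from text or metadata.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical item features: [sci-fi, romance, action]
items = {
    "Space Saga":   [1, 0, 1],
    "Star Voyage":  [1, 0, 0],
    "Love Letters": [0, 1, 0],
    "Robot Wars":   [1, 0, 1],
    "City Chase":   [0, 1, 1],
}

def recommend(liked, items, top_n=2):
    dims = len(next(iter(items.values())))
    # User profile: average feature vector of the items the user liked.
    profile = [sum(items[t][d] for t in liked) / len(liked) for d in range(dims)]
    # Score every unseen item by similarity to the profile.
    scored = [(cosine(profile, f), t) for t, f in items.items() if t not in liked]
    return [t for _, t in sorted(scored, reverse=True)[:top_n]]

print(recommend(["Space Saga"], items))  # ['Robot Wars', 'Star Voyage']
```

No other users appear anywhere in this code, which is why content-based filtering works without other users' data.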

---

### 2️⃣ **Collaborative Filtering** 🤝

> 📌 Recommends items based on **what similar users liked**.

**How it works:**

* Finds users with similar tastes.
* Recommends items liked by those users.

**Example:**

> "Users who watched this also watched..."

**Types:**
🔸 **User-based** – Similar users
🔸 **Item-based** – Similar items

**Pros:**
✅ Learns from others
✅ More surprising suggestions

**Cons:**
❌ Cold start: struggles with new users or items that have no interaction history
❌ Needs lots of data
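
A user-based version of this idea can be sketched in a few lines. The ratings, user names, and the choice of cosine similarity over co-rated items are all assumptions for the example; production systems usually rely on matrix factorization or learned embeddings.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "alice": {"Matrix": 5, "Inception": 4, "Titanic": 1},
    "bob":   {"Matrix": 5, "Inception": 5, "Titanic": 1,
              "Interstellar": 5, "Notebook": 1},
    "carol": {"Matrix": 1, "Titanic": 5, "Notebook": 5, "Interstellar": 2},
}

def similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)

def recommend(user, ratings, top_n=1):
    mine = ratings[user]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        for item, r in theirs.items():
            if item not in mine:   # only score items the user has not rated
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    # Predicted rating: similarity-weighted average of neighbors' ratings.
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend("alice", ratings))  # ['Interstellar']
```

Alice's tastes match Bob's far more closely than Carol's, so Bob's high rating for "Interstellar" outweighs Carol's low one. A brand-new user with no ratings would have zero similarity to everyone, which is the cold-start problem.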

---

### 3️⃣ **Hybrid System** ⚙️

> 📌 Combines **content-based + collaborative** methods.

**How it works:**

* Mixes or blends results from both techniques.
* Can be weighted or switched based on context.

**Example:**

> Large platforms such as Netflix and Amazon are widely reported to use hybrid approaches.

**Pros:**
✅ Balanced and accurate
✅ Reduces limitations of individual methods

**Cons:**
❌ Complex to build and tune
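
Both blending strategies mentioned above, weighting and switching, fit in one small function. The sketch assumes each method has already produced scores on a comparable 0 to 1 scale; the item names, scores, and the `weight` parameter are hypothetical.

```python
def hybrid_rank(content_scores, collab_scores, weight=0.5):
    """Rank items by a weighted blend of two score dictionaries.

    weight is the share given to the content-based score.
    """
    if not collab_scores:
        # Switching: a cold-start user has no collaborative signal,
        # so fall back to content-based scores only.
        weight = 1.0
    items = set(content_scores) | set(collab_scores)
    blended = {
        i: weight * content_scores.get(i, 0.0)
           + (1 - weight) * collab_scores.get(i, 0.0)
        for i in items
    }
    return sorted(blended, key=blended.get, reverse=True)

content = {"A": 0.9, "B": 0.2, "C": 0.5}
collab = {"A": 0.1, "B": 0.8, "C": 0.6}

print(hybrid_rank(content, collab)[0])              # C
print(hybrid_rank(content, collab, weight=0.8)[0])  # A
print(hybrid_rank(content, {}))                     # ['A', 'C', 'B']
```

Choosing `weight` is part of the tuning effort listed under **Cons**: the top recommendation changes from `C` to `A` just by shifting the balance toward the content-based score.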

