# **1. K-Means Clustering**

---

## **1. Introduction**

* **K-Means** is an **unsupervised learning algorithm** used for **clustering**.
* It groups similar data points into **K clusters**, where each cluster is represented by its **centroid** (mean of points).
* Goal: Minimize the sum of squared distances between data points and their assigned cluster centroid.

📌 Example:
Segmenting customers into groups based on their **purchasing behavior**.

---

## **2. How K-Means Works (Step by Step)**

1. **Choose number of clusters (K).**

   * Example: If K=3, we want to split data into 3 groups.

2. **Initialize centroids.**

   * Randomly select K points from the dataset as initial centroids.

3. **Assign each point to the nearest centroid.**

   * Compute distance (e.g., Euclidean) from each point to centroids.
   * Assign each point to the cluster with the closest centroid.

4. **Update centroids.**

   * For each cluster, compute the **mean of all assigned points**.
   * This mean becomes the new centroid.

5. **Repeat steps 3 & 4** until convergence.

   * Stop when centroids no longer move significantly (or max iterations reached).
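
The five steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the parameters `max_iter`, `tol`, and `seed`, the toy `points`, and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-Means on a list of equal-length tuples of floats."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 5: stop once no centroid moved more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


# Two visually obvious groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)
```

Which group ends up as cluster 0 or 1 depends on the random initialization, so only the grouping is meaningful, not the label numbers.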

---

## **3. Mathematical Formulation**

Objective: Minimize the **within-cluster sum of squares (WCSS)**

$$
J = \sum_{i=1}^K \sum_{x \in C_i} \|x - \mu_i\|^2
$$

Where:

* $C_i$ = set of points in cluster $i$
* $\mu_i$ = centroid of cluster $i$
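
To make the objective concrete, here is a small sketch that evaluates $J$ for a hand-made assignment. The helper name `wcss` and the toy data are assumptions for illustration only.

```python
def wcss(points, labels, centroids):
    """Within-cluster sum of squares: sum of ||x - mu_i||^2 over all points."""
    total = 0.0
    for p, lab in zip(points, labels):
        mu = centroids[lab]
        total += sum((a - b) ** 2 for a, b in zip(p, mu))
    return total


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]

# Centroids placed at the cluster means: (1 + 1) + (4 + 4) = 10.
at_means = wcss(points, labels, [(1.0, 0.0), (10.0, 2.0)])

# Moving the first centroid off the mean raises J: (0 + 4) + (4 + 4) = 12.
off_mean = wcss(points, labels, [(0.0, 0.0), (10.0, 2.0)])

print(at_means, off_mean)
```

For a fixed assignment, placing each centroid at the mean of its cluster gives the smallest $J$, which is why the update step uses the mean.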

---

## **4. Choosing K (Number of Clusters)**

### 🔹 Elbow Method

* Plot **WCSS vs K**.
* Look for the "elbow" point beyond which adding more clusters yields only a small further drop in WCSS.

### 🔹 Silhouette Score

* Measures how close each point is to its own cluster compared with the nearest other cluster.
* Ranges from **-1 to 1** (higher is better).
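
A rough sketch of how the silhouette score can be computed by hand, assuming Euclidean distance. For each point, `a` is the mean distance to the other points in its cluster, `b` is the smallest mean distance to any other cluster, and the score is `(b - a) / max(a, b)`. The function name and the convention of scoring single-point clusters as 0 are choices made for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # Mean distance to the other members (the point's own 0 is harmless).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping, close to 1
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed grouping, negative
print(good, bad)
```

In practice one would compute this score for several values of K and prefer the K with the highest value.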

---

## **5. Pros & Cons**

### ✅ Pros

* Simple & easy to implement.
* Scales well to large datasets (each iteration costs roughly O(n · K · d) for n points in d dimensions).
* Works well when clusters are spherical & well-separated.

### ❌ Cons

* Need to predefine **K**.
* Sensitive to initialization (different results on different runs).
* Sensitive to outliers (can distort centroids).
* Works poorly with non-spherical clusters.

---

## **6. Assumptions**

* Clusters are convex, isotropic (roughly spherical).
* Each data point is closer to its own cluster centroid than to any other centroid.

---

## **7. Variants**

* **K-Means++:** Smarter centroid initialization → improves stability.
* **Mini-Batch K-Means:** Uses batches for faster training on large datasets.
* **Fuzzy C-Means:** Allows soft assignments (each point gets a degree of membership in every cluster).
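
As an illustration of the K-Means++ idea, the sketch below picks the first centroid uniformly at random and each further centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The function name `kmeans_pp_init` and the `seed` parameter are assumptions for this example; it also assumes the data contains at least K distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """K-Means++ style seeding: spread the initial centroids apart."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points are more likely to be picked; already chosen
        # points have weight 0 and cannot be picked again.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
init = kmeans_pp_init(points, k=2)
print(init)
```

With two tight, distant groups like these, the second centroid almost always lands in the group the first one missed, which is the stabilizing effect the variant is known for.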

---

## **8. Real-Life Applications**

* **Customer Segmentation:** Grouping customers by spending habits.
* **Market Basket Analysis:** Grouping similar products.
* **Image Compression:** Reducing colors by clustering similar pixels.
* **Anomaly Detection:** Flagging points that lie far from every centroid as outliers.
* **Document Clustering:** Grouping similar news/articles.

---

## **9. Visualization**

```
Step 1: Initialize centroids (random points)
Step 2: Assign points to nearest centroid
Step 3: Recompute centroids
Step 4: Repeat until convergence
```

Example (K=3):

```
 ● ● ● ○ ○ ○ ▲ ▲ ▲
     ↓ Clustering ↓
 Cluster 1   Cluster 2   Cluster 3
```

---

## **10. Key Takeaways**

* K-Means = **distance-based clustering**.
* Objective = minimize within-cluster variance.
* Needs good choice of **K** and **initialization**.
* Works best on spherical, well-separated clusters.

---
---
---

# **2. Hierarchical Clustering**

---

## **1. Introduction**

* **Hierarchical Clustering** builds a **hierarchy of clusters**.
* Instead of partitioning data into K clusters directly (like K-Means), it creates a **tree-like structure (dendrogram)**.
* You can "cut" the dendrogram at a certain level to decide how many clusters to form.

📌 Example:
Grouping species based on DNA similarity: closely related species join early in the hierarchy, and distant species join later.

---

## **2. Types of Hierarchical Clustering**

### 🔹 **Agglomerative (Bottom-Up)**

* Start with each point as its own cluster.
* Iteratively merge the closest clusters.
* Stop when all points are in one big cluster.
* Most commonly used.

### 🔹 **Divisive (Top-Down)**

* Start with one big cluster (all points).
* Iteratively split into smaller clusters.
* Less common (more computationally expensive).

---

## **3. Distance Metrics (to measure closeness between points)**

* **Euclidean Distance:**

  $$
  d(x,y) = \sqrt{\sum (x_i - y_i)^2}
  $$
* **Manhattan Distance:**

  $$
  d(x,y) = \sum |x_i - y_i|
  $$
* **Cosine Distance:** Based on angle between vectors (good for text).
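
The three metrics can be written directly from their definitions. In this sketch, cosine distance is taken as 1 minus the cosine similarity, which is a common convention; the function names are chosen for the example, and `cosine_distance` assumes neither vector is all zeros.

```python
import math


def euclidean(x, y):
    """Straight-line distance: sqrt of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 - cos(angle between x and y); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


x, y = (1.0, 0.0), (4.0, 4.0)
print(euclidean(x, y))        # 5.0
print(manhattan(x, y))        # 7.0
print(cosine_distance(x, y))  # about 0.29 (45 degree angle)
```

Note that cosine distance ignores vector length: `(1, 1)` and `(2, 2)` point in the same direction, so their cosine distance is (up to rounding) zero even though their Euclidean distance is not.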

---

## **4. Linkage Criteria (to measure closeness between clusters)**

When merging clusters, we need a rule to decide "closest".

* **Single Linkage:**
  Minimum distance between a point in one cluster and a point in the other.
  (Tends to form "chains").

* **Complete Linkage:**
  Maximum distance between a point in one cluster and a point in the other.
  (Tends to form compact clusters).

* **Average Linkage:**
  Average distance over all cross-cluster pairs of points.
  (Balance between single & complete).

* **Ward’s Method:**
  Merge clusters that result in the smallest increase in total variance.
  (Very popular for compact clusters).
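
A small sketch comparing the four criteria on the same pair of clusters, assuming Euclidean distance. The function names are illustrative. For Ward's method, the code uses the standard closed form for the increase in within-cluster sum of squares when clusters A and B merge: `|A|·|B| / (|A| + |B|) · ||mean(A) − mean(B)||²`.

```python
import math


def cross_distances(a, b):
    """All distances between a point in cluster a and a point in cluster b."""
    return [math.dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(cross_distances(a, b))


def complete_linkage(a, b):
    return max(cross_distances(a, b))


def average_linkage(a, b):
    d = cross_distances(a, b)
    return sum(d) / len(d)


def centroid(cluster):
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def ward_increase(a, b):
    """Increase in total within-cluster sum of squares if a and b merge."""
    n_a, n_b = len(a), len(b)
    gap = math.dist(centroid(a), centroid(b))
    return n_a * n_b / (n_a + n_b) * gap ** 2


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(a, b))    # 2.0  (from (1,0) to (3,0))
print(complete_linkage(a, b))  # 5.0  (from (0,0) to (5,0))
print(average_linkage(a, b))   # 3.5  (mean of 3, 5, 2, 4)
print(ward_increase(a, b))     # 12.25
```

The same two clusters look 2.0 apart under single linkage and 5.0 apart under complete linkage, which is why the choice of linkage can change the merge order.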

---

## **5. Workflow (Agglomerative Clustering)**

1. Treat each point as its own cluster.
2. Compute pairwise distances between all clusters.
3. Merge the two closest clusters.
4. Recompute distances between clusters.
5. Repeat until one cluster remains.
6. Use **dendrogram** to decide final number of clusters.
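
The workflow can be sketched as a naive loop that recomputes all cluster distances at every step. This is far slower than library implementations and is only meant to mirror the steps above. The function name `agglomerative`, the `linkage` parameter (pass `min` for single linkage or `max` for complete linkage), and the merge-history format are assumptions made for this example.

```python
import math


def agglomerative(points, linkage=min):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Clusters are tuples of point indices. `linkage` reduces the
    cross-cluster point distances to one number (min = single linkage,
    max = complete linkage).
    """
    # Step 1: every point starts as its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Steps 2 and 4: (re)compute distances between all cluster pairs.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step 3: merge the two closest clusters.
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (7.0,)]
single = agglomerative(points, linkage=min)
complete = agglomerative(points, linkage=max)
for a, b, d in single:
    print(a, "+", b, "at distance", d)
```

The merge distances are the heights at which joins would be drawn in a dendrogram. Here both linkages merge the same pairs first, but the final join happens at 4.0 for single linkage and 7.0 for complete linkage.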

---

## **6. Dendrogram**

A dendrogram is a **tree diagram** that shows how clusters are merged step by step.

* Height of the join = distance between clusters.
* Cutting horizontally across the dendrogram = choosing number of clusters.

📌 Example:
If a horizontal cut at a given height crosses 3 vertical lines → you have 3 clusters.
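
Cutting a dendrogram amounts to applying only the merges that happen at or below the chosen height. The sketch below assumes a hypothetical merge history given as `(cluster_a, cluster_b, distance)` tuples, where clusters are tuples of point indices and distances never decrease from one merge to the next (true for single, complete, average, and Ward linkage). The function name and data are made up for illustration.

```python
def cut_dendrogram(n_points, merges, height):
    """Clusters obtained by applying only the merges at or below `height`."""
    clusters = {(i,) for i in range(n_points)}
    for a, b, dist in merges:
        if dist <= height:
            clusters -= {a, b}   # the two children disappear...
            clusters.add(a + b)  # ...and their union takes their place
    return sorted(clusters)


# Hypothetical merge history for 4 points.
merges = [
    ((0,), (1,), 1.0),
    ((2,), (3,), 2.0),
    ((0, 1), (2, 3), 4.0),
]

print(cut_dendrogram(4, merges, height=1.5))  # [(0, 1), (2,), (3,)]
print(cut_dendrogram(4, merges, height=3.0))  # [(0, 1), (2, 3)]
print(cut_dendrogram(4, merges, height=5.0))  # [(0, 1, 2, 3)]
```

Raising the cut height can only merge clusters, so the number of clusters never increases as the cut moves up.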

---

## **7. Pros & Cons**

### ✅ Pros

* No need to pre-specify number of clusters (can decide using dendrogram).
* Can capture non-spherical clusters of different sizes, depending on the linkage used.
* Provides hierarchy (multi-level clustering).

### ❌ Cons

* Computationally expensive for large datasets (typically **O(n² log n)** time or worse, plus **O(n²)** memory for the distance matrix).
* Once merged/split, decisions can’t be undone.
* Sensitive to noise and outliers.

---

## **8. Real-Life Applications**

* **Biology:** Phylogenetic trees, DNA sequence grouping.
* **Marketing:** Customer segmentation.
* **Text Mining:** Document clustering.
* **Image Processing:** Grouping pixels for segmentation.
* **Sociology:** Grouping people based on survey responses.

---

## **9. Visualization**

```
Data Points → Compute Distances → Merge Closest Clusters → Dendrogram

Example (dendrogram):

|          ______ Cluster A
|      ___|
|     |   |______ Cluster B
|  ___|
| |   |_________ Cluster C
| |
| |_____________ Cluster D
|
+--------------------------------
```

---

## **10. Key Takeaways**

* Hierarchical Clustering = builds a **tree of clusters**.
* Two main types: **Agglomerative (bottom-up)**, **Divisive (top-down)**.
* Uses **distance metrics** + **linkage criteria** to merge/split.
* Visualized using a **dendrogram**.
* Works well for small/medium datasets, not great for very large ones.

---
---
---

# **3. DBSCAN (Density-Based Clustering)**

---

## **1. Introduction**

* **DBSCAN** is a **density-based clustering algorithm**.
* Instead of partitioning data into K groups (like K-Means), it identifies **dense regions** of points as clusters and labels low-density points as **noise/outliers**.
* It is great for discovering **arbitrarily shaped clusters** (not just spherical ones).

📌 Example:
Separating urban regions (dense data) from rural regions (sparse data) in geographical mapping.

---

## **2. Key Concepts**

DBSCAN relies on two parameters:

1. **ε (epsilon):** Radius defining the neighborhood around a point.
2. **MinPts (minimum points):** Minimum number of points required to form a dense region.

Based on these, points are classified as:

* **Core Point:** Has at least MinPts points within ε (commonly counting the point itself).
* **Border Point:** Lies within ε of a core point but has fewer than MinPts points in its own ε-neighborhood.
* **Noise (Outlier):** Not a core point and not within ε of any core point.
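
The three point types can be computed directly from these definitions. This sketch assumes Euclidean distance and counts a point as part of its own ε-neighborhood; the function name `classify_points` and the toy data are illustrative.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # Indices of all points within eps of each point (including itself).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
kinds = classify_points(points, eps=1.5, min_pts=3)
print(kinds)  # ['border', 'core', 'core', 'border', 'noise']
```

The two end points of the chain have only two points in their neighborhood, so they are border points; the isolated point at 10 is noise.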

---

## **3. DBSCAN Algorithm (Step by Step)**

1. Pick an unvisited point and find all points within **ε** of it.
2. If it has fewer than **MinPts** neighbors → mark it as **noise** for now (it can later become a border point if a cluster reaches it).
3. Otherwise → mark it as a **core point** and start a new cluster.
4. Expand the cluster: add every point within ε of the core point, and keep expanding from each added point that is itself a core point.
5. Repeat until all points are visited.
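
A compact sketch of this procedure, assuming Euclidean distance, a brute-force neighbor search, and neighborhoods that include the point itself. The names `dbscan`, `region`, and `NOISE` are choices for this example; real implementations typically use a spatial index instead of scanning all points.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled as a border point later
            continue
        # i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, nothing to expand
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core: keep expanding
    return labels


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,),
          (20.0,), (21.0,), (22.0,)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, -1, 1, 1, 1]
```

The point at 0 is first marked as noise (only two points in its neighborhood) and is then absorbed as a border point when the cluster grown from the point at 1 reaches it.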

---

## **4. Mathematical Definition**

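Three key definitions:

* **Directly Density-Reachable:** A point $p$ is directly density-reachable from $q$ if $p$ lies within ε of $q$ and $q$ is a core point.
* **Density-Reachable:** A point $p$ is density-reachable from $q$ if there is a chain of points from $q$ to $p$ in which each point is directly density-reachable from the previous one.
* **Density-Connected:** Two points $p$ and $q$ are density-connected if there exists a point $o$ from which both $p$ and $q$ are density-reachable.

A cluster is then a maximal set of density-connected points.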

---

## **5. Pros & Cons**

### ✅ Pros

* No need to specify number of clusters (unlike K-Means).
* Can find **arbitrarily shaped clusters**.
* Identifies **outliers** naturally.
* Works well with spatial/geographic data.

### ❌ Cons

* Choice of **ε** and **MinPts** is tricky.
* Struggles with clusters of **varying densities**.
* Degrades on very high-dimensional data: distances become less informative and neighbor searches become more expensive.

---

## **6. Choosing Parameters**

* **MinPts:** Typically ≥ dimension + 1 (e.g., for 2D data, MinPts ≥ 3; 4 is a common choice).
* **ε:** Use a **k-distance plot** → sort the distances from each point to its k-th nearest neighbor, plot them, and look for the “elbow.”
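
The values behind a k-distance plot can be computed with a brute-force sketch like the one below, assuming Euclidean distance. The function name `k_distances` is illustrative, and how k is tied to MinPts (for example k = MinPts − 1 when a point counts toward its own neighborhood) is a convention rather than a fixed rule.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        result.append(dists[k - 1])
    return sorted(result)


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
kd = k_distances(points, k=2)
print(kd)  # [1.0, 1.0, 2.0, 2.0, 8.0]
```

Plotted in order, these values stay low for the four clustered points and jump sharply for the isolated one. A value of ε just past the flat part (here somewhere a little above 2) would keep the dense points together and leave the isolated point as noise.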

---

## **7. Real-Life Applications**

* **Geography:** Identifying urban vs rural areas from satellite data.
* **Fraud Detection:** Outliers in transaction data.
* **Astronomy:** Finding star clusters in noisy space data.
* **Biology:** Clustering gene expression data.
* **Market Analysis:** Finding customer groups with irregular purchase behavior.

---

## **8. Visualization Example**

```
Dense regions → Clusters
Sparse points → Noise

● ● ● ● ○   ○         ▲ ▲ ▲ ▲ ▲
● ● ● ● ○             ▲ ▲ ▲ ▲
● ● ● ○               ○ (noise)

Clusters = (● group, ▲ group), Noise = ○
```

---

## **9. Comparison with K-Means & Hierarchical**

| Feature             | K-Means   | Hierarchical             | DBSCAN               |
| ------------------- | --------- | ------------------------ | -------------------- |
| Need K?             | Yes       | No (use dendrogram)      | No (density decides) |
| Cluster Shape       | Spherical | Any (depends on linkage) | Arbitrary (best)     |
| Outlier Handling    | Poor      | Poor                     | Excellent            |
| Large Data Handling | Good      | Poor                     | Moderate             |

---

## **10. Key Takeaways**

* DBSCAN = **density-based clustering**.
* Identifies **arbitrarily shaped clusters** and **outliers**.
* Needs careful selection of **ε** and **MinPts**.
* Works best when clusters have similar density and are separated by sparser regions.

---
---
---