# Individual Assignment: Advanced Clustering & Density Methods


## 1. Dataset Selection & Exploration  
You will load **four real-world** datasets with different structure and cluster assumptions:  
1. **Iris** (compact Gaussian-like clusters)  
2. **Wine** (skewed feature distributions)  
3. **Breast Cancer** (non-spherical, varying density)  
4. **Digits** (high-dimensional, non-convex clusters)  

For each dataset, in a markdown cell describe its source, number of samples and features, any class labels available (for evaluation only), and whether you apply scaling or normalization. Visualize each with a 2D PCA or t-SNE scatterplot to reveal cluster shapes.
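
As a starting point, the loading, scaling, and projection steps described above might look like the following sketch. It assumes the four datasets are taken from `sklearn.datasets` and that standardization is applied to all of them; the names `LOADERS`, `load_and_project`, and `plot_projections` are illustrative, not required.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: all four datasets come from scikit-learn's bundled loaders.
LOADERS = {
    "Iris": load_iris,
    "Wine": load_wine,
    "Breast Cancer": load_breast_cancer,
    "Digits": load_digits,
}


def load_and_project(name, random_state=0):
    """Return standardized features, reference labels, and a 2D PCA projection."""
    data = LOADERS[name]()
    X = StandardScaler().fit_transform(data.data)
    X_2d = PCA(n_components=2, random_state=random_state).fit_transform(X)
    return X, data.target, X_2d


def plot_projections(datasets):
    """Draw one PCA scatterplot per dataset, colored by the reference labels."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, axes = plt.subplots(
        1, len(datasets), figsize=(4 * len(datasets), 4), squeeze=False
    )
    for ax, (name, (_, y, X_2d)) in zip(axes[0], datasets.items()):
        ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=8, cmap="tab10")
        ax.set_title(name)
        ax.set_xlabel("PC 1")
        ax.set_ylabel("PC 2")
    fig.tight_layout()
    return fig


datasets = {name: load_and_project(name) for name in LOADERS}
for name, (X, y, _) in datasets.items():
    print(f"{name}: {X.shape[0]} samples, {X.shape[1]} features, "
          f"{len(np.unique(y))} classes")
```

Calling `plot_projections(datasets)` in a notebook cell draws the four panels. Swapping `PCA` for `sklearn.manifold.TSNE` gives the t-SNE view instead.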

---

## 2. Algorithm Implementations & Baselines  
### 2.1 K-Means Variants  
- **Classic (random init)**  
- **K-Means++**  
- **Bisecting K-Means** (iteratively split largest cluster)  

Implement each variant so that it returns the centroids, labels, inertia, number of iterations, and runtime.
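
One possible shape for these implementations is sketched below in NumPy. It is only a minimal outline under stated assumptions: the function names, the dictionary return format, the convergence tolerance, and the choice to split the cluster with the most points in the bisecting variant are all illustrative, and the small synthetic `X_demo` array exists only to exercise the code.

```python
import time

import numpy as np


def _sq_dists(X, centroids):
    """Squared Euclidean distance from every point to every centroid."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def _init_centroids(X, k, init, rng):
    if init == "random":
        # Classic initialization: k distinct data points chosen uniformly.
        return X[rng.choice(len(X), size=k, replace=False)]
    # k-means++: each new centroid is sampled with probability proportional
    # to its squared distance from the nearest centroid chosen so far.
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        d2 = _sq_dists(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, init="k-means++", max_iter=300, tol=1e-6, seed=0):
    """Lloyd's algorithm with random or k-means++ initialization."""
    rng = np.random.default_rng(seed)
    start = time.perf_counter()
    centroids = _init_centroids(X, k, init, rng)
    for iteration in range(1, max_iter + 1):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    d2 = _sq_dists(X, centroids)
    labels = d2.argmin(axis=1)
    return {
        "centroids": centroids,
        "labels": labels,
        "inertia": float(d2[np.arange(len(X)), labels].sum()),
        "iterations": iteration,
        "runtime": time.perf_counter() - start,
    }


def bisecting_kmeans(X, k, seed=0):
    """Repeatedly split the cluster with the most points using 2-means."""
    start = time.perf_counter()
    labels = np.zeros(len(X), dtype=int)
    iterations = 0
    for new_label in range(1, k):
        largest = np.bincount(labels).argmax()
        idx = np.flatnonzero(labels == largest)
        split = kmeans(X[idx], 2, seed=seed + new_label)
        labels[idx[split["labels"] == 1]] = new_label
        iterations += split["iterations"]
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return {
        "centroids": centroids,
        "labels": labels,
        "inertia": float(((X - centroids[labels]) ** 2).sum()),
        "iterations": iterations,
        "runtime": time.perf_counter() - start,
    }


# Synthetic demo data (three separated blobs), used only to exercise the code.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ((0, 0), (4, 4), (0, 4))
])
for init in ("random", "k-means++"):
    out = kmeans(X_demo, k=3, init=init)
    print(init, round(out["inertia"], 2), out["iterations"])
print("bisecting", round(bisecting_kmeans(X_demo, k=3)["inertia"], 2))
```

The distance computation broadcasts to an array of shape `(n_samples, k, n_features)`, which is fine for the four datasets here but would need chunking on much larger data.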

### 2.2 Density & Hierarchical Methods  
- **DBSCAN** (tune `eps` and `min_samples`)  
- **Agglomerative Clustering** (Ward’s linkage and complete linkage)  

Use scikit-learn implementations; record labels, runtime, and core sample counts (for DBSCAN).
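
A hedged sketch of how the scikit-learn calls could be wrapped to record the requested quantities follows. The helper names `run_dbscan` and `run_agglomerative`, the use of standardized Iris as the example input, and the values `eps=0.6` and `min_samples=5` are placeholders for illustration, not tuned settings.

```python
import time

from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Example input only: standardized Iris features.
X = StandardScaler().fit_transform(load_iris().data)


def run_dbscan(X, eps, min_samples):
    """Fit DBSCAN and record labels, runtime, and core sample count."""
    start = time.perf_counter()
    model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    runtime = time.perf_counter() - start
    labels = model.labels_  # noise points carry the label -1
    return {
        "labels": labels,
        "runtime": runtime,
        "n_core_samples": len(model.core_sample_indices_),
        "n_clusters": len(set(labels)) - (1 if -1 in labels else 0),
        "n_noise": int((labels == -1).sum()),
    }


def run_agglomerative(X, n_clusters, linkage):
    """Fit agglomerative clustering with the given linkage."""
    start = time.perf_counter()
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage).fit(X)
    return {"labels": model.labels_, "runtime": time.perf_counter() - start}


# Placeholder hyperparameters; eps and min_samples still need tuning.
dbscan_out = run_dbscan(X, eps=0.6, min_samples=5)
print("DBSCAN:", dbscan_out["n_clusters"], "clusters,",
      dbscan_out["n_core_samples"], "core samples,",
      dbscan_out["n_noise"], "noise points")

for linkage in ("ward", "complete"):
    agg_out = run_agglomerative(X, n_clusters=3, linkage=linkage)
    print(f"Agglomerative ({linkage}): {agg_out['runtime']:.4f} s")
```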

---

## 3. Evaluation & Visualization  
For each dataset and each algorithm:  
1. Compute **inertia** (where defined), **silhouette score**, and, if true labels exist, the **Adjusted Rand Index (ARI)**.  
2. Tabulate results in a DataFrame and display it.  
3. For a representative `k` (e.g. 3 for Iris and Wine, 2 for Breast Cancer, 10 for Digits), produce side-by-side 2D scatterplots (PCA or t-SNE) colored by cluster labels for:  
   - Random-init K-Means  
   - K-Means++  
   - Bisecting K-Means  
   - DBSCAN  
   - Agglomerative (Ward)  
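
The metric table from steps 1 and 2 could be assembled roughly as follows. This sketch makes several assumptions: the helper `evaluate` is hypothetical, scikit-learn's `KMeans` stands in for the hand-written variants purely to keep the example short, Iris is the only dataset shown, and the silhouette score for DBSCAN is computed on non-noise points only (one of several defensible conventions).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate(dataset, algorithm, X, labels, y_true=None, inertia=np.nan):
    """Return one row of the results table for a single clustering."""
    mask = labels != -1  # drop DBSCAN noise before computing the silhouette
    n_clusters = len(np.unique(labels[mask]))
    # The silhouette score needs at least 2 clusters and fewer clusters than points.
    if 2 <= n_clusters < mask.sum():
        silhouette = silhouette_score(X[mask], labels[mask])
    else:
        silhouette = np.nan
    ari = adjusted_rand_score(y_true, labels) if y_true is not None else np.nan
    return {
        "dataset": dataset,
        "algorithm": algorithm,
        "n_clusters": n_clusters,
        "inertia": inertia,
        "silhouette": silhouette,
        "ARI": ari,
    }


iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target  # used for evaluation only, never for fitting

rows = []
for name, init in (("K-Means (random)", "random"), ("K-Means++", "k-means++")):
    km = KMeans(n_clusters=3, init=init, n_init=10, random_state=0).fit(X)
    rows.append(evaluate("Iris", name, X, km.labels_, y, inertia=km.inertia_))

# Placeholder DBSCAN settings; inertia is left undefined for the last two methods.
db_labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)
rows.append(evaluate("Iris", "DBSCAN", X, db_labels, y))

ward_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
rows.append(evaluate("Iris", "Agglomerative (Ward)", X, ward_labels, y))

results = pd.DataFrame(rows)
print(results.round(3))
```

Passing the label arrays to a scatterplot of the 2D projection, one panel per algorithm, covers the side-by-side figure in step 3.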

---

## 4. Algorithmic Comparison & Failure Modes  
In markdown, for each dataset discuss:  
- Which algorithms found the “true” clusters most accurately (highest ARI) and why.  
- How cluster shape or density affected K-Means (e.g. non-convex Digits clusters).  
- Why DBSCAN succeeded or failed on each dataset (e.g. sensitivity to `eps`).  
- How hierarchical linkage choices change results for skewed data (Wine).

---

## 5. Hyperparameter Sensitivity  
For the **two** algorithms below, choose **one** dataset with a challenging structure (Breast Cancer or Digits) and perform a hyperparameter sweep:  
- For K-Means: vary `k` from 2 to 8 and plot inertia and silhouette vs. `k`.  
- For DBSCAN: vary `eps` and `min_samples` and heatmap silhouette scores.  

Show code that logs and visualizes these sensitivities.
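
A sweep along these lines might be logged as in the sketch below. It assumes the Breast Cancer dataset with standardized features, uses scikit-learn's `KMeans` for brevity, and picks an `eps` grid of 1.5 to 4.0 and `min_samples` values of 3, 5, and 10 purely as illustrative ranges. Grid cells where DBSCAN finds fewer than two clusters are recorded as NaN.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data)

# K-Means: inertia and silhouette for k = 2..8.
kmeans_rows = []
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    kmeans_rows.append({
        "k": k,
        "inertia": model.inertia_,
        "silhouette": silhouette_score(X, model.labels_),
    })
kmeans_sweep = pd.DataFrame(kmeans_rows)
print(kmeans_sweep.round(3))

# DBSCAN: silhouette over an (eps, min_samples) grid; ranges are assumptions.
eps_values = np.round(np.linspace(1.5, 4.0, 6), 2)
min_samples_values = [3, 5, 10]

dbscan_rows = []
for eps in eps_values:
    for min_samples in min_samples_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        mask = labels != -1  # score non-noise points only
        n_clusters = len(np.unique(labels[mask]))
        if 2 <= n_clusters < mask.sum():
            silhouette = silhouette_score(X[mask], labels[mask])
        else:
            silhouette = np.nan
        dbscan_rows.append({
            "eps": eps,
            "min_samples": min_samples,
            "n_clusters": n_clusters,
            "silhouette": silhouette,
        })
dbscan_sweep = pd.DataFrame(dbscan_rows)

# Rows are min_samples, columns are eps: ready to be drawn as a heatmap.
heatmap = dbscan_sweep.pivot(index="min_samples", columns="eps", values="silhouette")
print(heatmap.round(3))
```

In a notebook, `kmeans_sweep.plot(x="k", y=["inertia", "silhouette"], subplots=True)` draws the two K-Means curves, and `matplotlib.pyplot.imshow(heatmap.to_numpy())` with tick labels taken from `heatmap.index` and `heatmap.columns` renders the DBSCAN grid.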

---

## 6. Reflection & Insights  
At the end of each major section (Exploration, K-Means, DBSCAN/Agglomerative, Hyperparameter Sensitivity), answer in 2–3 sentences:  
- “What was the hardest implementation or tuning challenge I faced here, and how did I overcome it?”  
- “What insight about cluster structure or algorithm behavior did I gain that no black-box call could teach me?”

---

## 7. Submission  
Push a single Colab or Jupyter notebook named `clustering_assignment_firstname_lastname.ipynb` to GitHub, ensuring all data-loading cells run without error and that your code, plots, tables, and markdown narrative are clear. In a final markdown cell, summarize your **top three takeaways** in bullet form.  
