# Homework: Clustering Strategy & Insight Generation
**Topic:** Unsupervised Learning Optimization  
**Course:** KPITB UETM - Week 3

---

## Task 1: Distance Logic (Brainstorming)

### 1. Can two data points be far apart in one feature but still belong to the same cluster? Explain.
**Answer:** Yes, two points can be far apart in one dimension but very close in others. Most clustering algorithms (like K-Means) use a multi-dimensional distance metric (e.g., Euclidean distance). If the "closeness" in other features outweighs the distance in a single feature, the points will likely be grouped together. Additionally, in density-based clustering like DBSCAN, points can be far apart but connected via a chain of dense neighboring points.
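
The centroid-based case can be sketched with a few made-up numbers. The centroids and the points `p` and `q` below are invented for illustration (they are not from the homework dataset), and `nearest_centroid` is a hypothetical helper that mimics the K-Means assignment step.

```python
import numpy as np

# Hypothetical centroids and points (values invented for illustration).
centroids = np.array([[0.0, 0.0, 0.0],
                      [10.0, 10.0, 10.0]])
p = np.array([6.0, 0.0, 0.0])   # far from q in feature 0
q = np.array([0.0, 0.5, 0.0])   # close to p in features 1 and 2

def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to `point`."""
    distances = np.linalg.norm(centroids - point, axis=1)
    return int(np.argmin(distances))

label_p = nearest_centroid(p, centroids)
label_q = nearest_centroid(q, centroids)
print("gap in feature 0:", abs(p[0] - q[0]))
print("cluster of p:", label_p, "| cluster of q:", label_q)
```

Both points land in the same cluster even though they differ by 6 units in the first feature, because the other two features keep `p` much nearer to the first centroid than to the second.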

### 2. Why can unscaled features distort distance-based clustering results?
**Answer:** Distance-based algorithms calculate the geometric distance between points. If one feature has a large range (e.g., Salary: $20,000 - $200,000) and another has a small range (e.g., Age: 18 - 80), the "Salary" feature will dominate the distance calculation. The algorithm will treat a difference of 100 units in salary as more significant than a difference of 50 units in age, even if the age difference is more meaningful for segmentation.
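
A quick numeric check of this effect, using two hypothetical customers whose salary differs by 100 units and whose age differs by 50 units (the values are assumptions chosen to match the example above):

```python
import numpy as np

# Hypothetical customers: [salary in dollars, age in years].
x = np.array([50_000.0, 25.0])
y = np.array([50_100.0, 75.0])   # $100 more salary, 50 years older

# Each feature's contribution to the squared Euclidean distance.
squared_gaps = (x - y) ** 2
share = squared_gaps / squared_gaps.sum()
print("squared gaps (salary, age):", squared_gaps)
print("share of squared distance (salary, age):", share.round(2))
```

On raw values the salary gap accounts for most of the squared distance, although a $100 difference is negligible on a $20,000 - $200,000 scale.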

### 3. List two reasons why feature scaling is critical before applying K-Means or Hierarchical Clustering.
**Answer:**
1. **Uniform Weighting:** Scaling ensures that no single feature dominates the distance metric simply because of its magnitude, allowing all features to contribute equally to the cluster formation.
2. **Algorithm Convergence:** K-Means iteratively moves centroids to minimize squared distances. With scaled features, the updates are not driven by one large-magnitude feature, which often helps the algorithm converge in fewer iterations and produce more stable clusters across runs.
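
The uniform-weighting point can be sketched on a small made-up table. The data values are assumptions, and the z-score is computed by hand with NumPy; it is the same transformation `StandardScaler` applies with its default settings (column mean removed, division by the population standard deviation).

```python
import numpy as np

# Hypothetical data: columns are [salary in dollars, age in years].
X = np.array([
    [20_000.0, 18.0],
    [50_000.0, 25.0],
    [50_100.0, 75.0],
    [120_000.0, 40.0],
    [200_000.0, 80.0],
])

# Z-score each column: subtract the column mean, divide by the column
# standard deviation (population std, ddof=0).
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

def share_of_squared_distance(u, v):
    """Fraction of the squared Euclidean distance contributed by each feature."""
    gaps = (u - v) ** 2
    return gaps / gaps.sum()

raw_share = share_of_squared_distance(X[1], X[2])
scaled_share = share_of_squared_distance(X_scaled[1], X_scaled[2])
print("raw share (salary, age):   ", raw_share.round(3))
print("scaled share (salary, age):", scaled_share.round(3))
```

For the second and third rows, salary dominates the raw distance, while after scaling the 50-year age gap carries most of the weight.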

## Task 2: Development Challenge

### Setup & Data Preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')

sns.set_palette("viridis")  # Use the viridis palette for seaborn plots

# Load Dataset
url = "https://raw.githubusercontent.com/tirthajyoti/Machine-Learning-with-Python/master/Datasets/Mall_Customers.csv"
df = pd.read_csv(url)
df = df.rename(columns={'Annual Income (k$)': 'Income', 'Spending Score (1-100)': 'Spend_Score'})

# Select Features and Scale
X = df[['Income', 'Spend_Score']]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Data Preparation Complete.")
df.head()

### 1. The “No-Optimization” Test
Applying K-Means with an arbitrary value of **K=2**.

In [None]:
kmeans_no_opt = KMeans(n_clusters=2, random_state=42)
labels_no_opt = kmeans_no_opt.fit_predict(X_scaled)

plt.figure(figsize=(10, 6))
plt.scatter(df['Income'], df['Spend_Score'], c=labels_no_opt, cmap='viridis', edgecolors='k')
plt.title("K-Means Clustering (No Optimization, K=2)")
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.show()

### 2. The “Optimized” Test
Using the **Elbow Method** and **Silhouette Score** to find the best K.

In [None]:
wcss = []
silhouette_scores = []
k_range = range(2, 11)

for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(X_scaled, kmeans.labels_))

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

# Elbow Plot
ax[0].plot(k_range, wcss, marker='o', linestyle='--')
ax[0].set_title('Elbow Method (WCSS)')
ax[0].set_xlabel('Number of Clusters (K)')
ax[0].set_ylabel('WCSS')

# Silhouette Plot
ax[1].plot(k_range, silhouette_scores, marker='o', color='purple')
ax[1].set_title('Silhouette Score')
ax[1].set_xlabel('Number of Clusters (K)')
ax[1].set_ylabel('Score')

plt.show()

Based on the Elbow plot (the bend at K=5) and the highest Silhouette Score, the **optimal K is 5**.
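
Reading K off a plot can be cross-checked in code by taking the K with the highest silhouette score. The sketch below is one possible way to do that; it runs on synthetic `make_blobs` data as a stand-in (an assumption, so the printed K is not the homework result), and `n_init=10` is set explicitly so the behavior does not depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (an assumption, not the Mall Customers dataset).
X_demo, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=0)

k_values = list(range(2, 11))
scores = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_demo)
    scores.append(silhouette_score(X_demo, labels))

# Pick the K whose clustering has the highest mean silhouette score.
best_k = k_values[int(np.argmax(scores))]
print("K with the highest silhouette score:", best_k)
```

In the notebook, the same `np.argmax` step could be applied to the `silhouette_scores` list computed above.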

### Optimized Clustering Comparison (K=5)

In [None]:
kmeans_opt = KMeans(n_clusters=5, random_state=42)
labels_opt = kmeans_opt.fit_predict(X_scaled)

plt.figure(figsize=(10, 6))
plt.scatter(df['Income'], df['Spend_Score'], c=labels_opt, cmap='plasma', edgecolors='k')
plt.title("Optimized K-Means Clustering (K=5)")
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.show()

### 3. Analysis: Compare both approaches

**1. Cluster Quality:**
- **No-Optimization (K=2):** The clusters are too broad. They group people only into roughly "high vs low score" or "low vs high income" depending on the centroid initialization, failing to capture the distinct sub-segments in the middle and corners of the data.
- **Optimized (K=5):** The clusters separate the data into 5 distinct groups: High Income/High Spend, High Income/Low Spend, Low Income/High Spend, Low Income/Low Spend, and an Average group. The Silhouette Score is also highest at K=5, indicating better-defined clusters than at K=2.

**2. Interpretability:**
- The **Optimized approach** produced far more meaningful clusters. Management can now target "Whales" (High Income/High Spend) or "Frugal Customers" (High Income/Low Spend) with different strategies. A K=2 split is too vague for targeted decision-making.

**3. Conclusion:** The **Optimized approach** is superior as it respects the natural geometry of the data and provides actionable business segments.
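
One way to back up segment names like these is to profile each cluster by its mean income and spending score. The sketch below uses a tiny made-up table with pre-assigned labels (all values and the `demo` name are assumptions); in the notebook, the equivalent would be to attach `labels_opt` to `df` as a `Cluster` column first.

```python
import pandas as pd

# Hypothetical customers with already-assigned cluster labels
# (values invented for illustration, not taken from the dataset).
demo = pd.DataFrame({
    "Income":      [15, 20, 80, 90, 85, 95],
    "Spend_Score": [80, 90, 85, 95, 10, 20],
    "Cluster":     [0, 0, 1, 1, 2, 2],
})

# Mean income and spending score per cluster, plus the cluster size.
profile = demo.groupby("Cluster").agg(
    Income=("Income", "mean"),
    Spend_Score=("Spend_Score", "mean"),
    Customers=("Income", "size"),
)
print(profile)
```

A cluster with high mean income and low mean spend would correspond to the "Frugal Customers" segment, and so on for the other groups.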

## Task 3: Algorithm Intuition

**Question:** If K-Means produces different cluster assignments on different runs but Hierarchical Clustering produces the same structure every time, which result would you trust more for business reporting and why?

**Answer:** For **business reporting consistency**, I would trust **Hierarchical Clustering** (specifically agglomerative) more because it is **deterministic**: given the same data and linkage settings, it produces the same structure on every run. K-Means is stochastic; its result depends on the random initialization of centroids, so a "lucky" or "unlucky" seeding can change the assignments. While K-Means is faster on large datasets, results that change every time the script runs can confuse management. However, for **segment quality**, K-Means with a fixed `random_state` is often preferred if it produces better-separated groups. In a strictly reporting context where consistency is king, the deterministic nature of Hierarchical Clustering makes it the more reliable choice.
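
The difference in repeatability can be checked directly. This is a minimal sketch on synthetic `make_blobs` data (an assumption, not the homework dataset); K-Means is deliberately run with `init="random"` and `n_init=1` so that the effect of the seed is visible.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data with overlapping blobs.
X_demo, _ = make_blobs(n_samples=200, centers=4, cluster_std=1.5, random_state=1)

# Agglomerative clustering has no random component: two runs give the same labels.
agg_a = AgglomerativeClustering(n_clusters=4).fit_predict(X_demo)
agg_b = AgglomerativeClustering(n_clusters=4).fit_predict(X_demo)
print("agglomerative runs identical:", np.array_equal(agg_a, agg_b))

# K-Means with a single random initialization per run may differ across seeds.
km_a = KMeans(n_clusters=4, init="random", n_init=1, random_state=0).fit_predict(X_demo)
km_b = KMeans(n_clusters=4, init="random", n_init=1, random_state=7).fit_predict(X_demo)
print("ARI between two K-Means seeds:", adjusted_rand_score(km_a, km_b))

# Fixing random_state makes K-Means repeatable.
km_c = KMeans(n_clusters=4, init="random", n_init=1, random_state=0).fit_predict(X_demo)
print("same seed identical:", np.array_equal(km_a, km_c))
```

An adjusted Rand index (ARI) of 1.0 means the two K-Means runs found the same partition; anything lower means the seed changed the assignments.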

## Final Report Summary

1. **Clustering Approach used:** K-Means Clustering with feature scaling (StandardScaler).
2. **Method for selecting K:** Combined the **Elbow Method** (within-cluster sum of squares) and the **Silhouette Score** (measuring cluster cohesion and separation).
3. **Meaningfulness:** The final clusters (K=5) represent five logical consumer behaviors (Frugal, Rich Spenders, Careful Spenders, Impulsive, and Target Balanced). This is significantly more useful for marketing than the arbitrary K=2 split.
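
For reference, the two selection criteria named in the summary are usually written as follows. The notation is an assumption introduced here: $C_k$ is cluster $k$, $\mu_k$ is its centroid, $a(i)$ is the mean distance from point $i$ to the other points in its own cluster, and $b(i)$ is the mean distance from point $i$ to the points of the nearest other cluster.

$$
\mathrm{WCSS}(K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
$$

The Silhouette Score reported for a given K is the mean of $s(i)$ over all points; it ranges from -1 to 1, with higher values indicating tighter and better-separated clusters.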