# Real-World Use Case: Mall Customer Segmentation

## 1. The Problem
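A mall marketing team wants to target customers with specific campaigns. They have data on Age, Income, and Spending Score; the simulation below uses only Income and Spending Score, which keeps the segments easy to plot in two dimensions.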
*   **Goal**: Identify distinct groups of customers (Segments).

## 2. Why K-Means?
*   **Segmentation**: Customer segmentation is a textbook use case for clustering: there are no labels, and the goal is to group similar customers together.
*   **Actionable**: Discovering groups like "High Income, Low Spending" (Potential Savers) or "Low Income, High Spending" (Careless Spenders) directly informs marketing strategy.
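
For reference, the quantity K-Means minimizes, and the one plotted later as WCSS (within-cluster sum of squares), can be written as follows. The symbols are notation introduced here, not taken from the notebook: $K$ is the number of clusters, $C_j$ is the set of customers assigned to cluster $j$, and $\mu_j$ is that cluster's centroid.

$$
\mathrm{WCSS} = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Each customer $x_i$ here is a point (Annual Income, Spending Score), so a low WCSS means customers sit close to the center of their own segment.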

## 3. Data Simulation
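*   **Annual Income (k$)**: 15 - 137 is the nominal range; the simulated segments below are centered on 25, 55, and 85.
*   **Spending Score (1-100)**: Behavior score assigned by the mall; the simulated segments below are centered on 20, 50, and 80.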

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# 1. Generate Realistic Segments
np.random.seed(42)
# Cluster 1: Low Income, Low Spend (Sensible)
c1 = np.random.multivariate_normal([25, 20], [[20, 0], [0, 20]], 50)
# Cluster 2: Low Income, High Spend (Careless)
c2 = np.random.multivariate_normal([25, 80], [[20, 0], [0, 20]], 50)
# Cluster 3: Med Income, Med Spend (Average)
c3 = np.random.multivariate_normal([55, 50], [[30, 0], [0, 30]], 80)
# Cluster 4: High Income, Low Spend (Savers)
c4 = np.random.multivariate_normal([85, 20], [[20, 0], [0, 20]], 50)
# Cluster 5: High Income, High Spend (Target)
c5 = np.random.multivariate_normal([85, 80], [[20, 0], [0, 20]], 50)

X = np.concatenate([c1, c2, c3, c4, c5])
df = pd.DataFrame(X, columns=['Annual_Income', 'Spending_Score'])

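# 2. Determine K (Elbow)
# inertia_ is scikit-learn's name for the within-cluster sum of squares (WCSS)
wcss = []
for k in range(1, 11):
    # n_init is set explicitly so the result does not depend on the library version's default
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)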

plt.figure(figsize=(8, 4))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
print("Look for the bend in the curve; with five simulated segments it should appear at K=5.")

# 3. Train Model (K=5)
kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)
df['Cluster'] = y_kmeans

# 4. Visualize Segments
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Annual_Income', y='Spending_Score', hue='Cluster', palette='tab10', s=100)
plt.title('Mall Customer Segments')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend(title='Cluster')
plt.show()
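
Reading an elbow plot is a judgment call, so a numeric cross-check can help. The sketch below uses the silhouette score, which is not part of the original notebook; it is swapped in here only as a second opinion on K. The names `X_demo`, `silhouette_by_k`, and `best_k` are illustrative, and the data is regenerated with `np.random.default_rng`, so the individual points differ from the cell above even though the segment centers and sizes are the same.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Rebuild the same five simulated segments so this block runs on its own.
# (Assumption: same centers, variances, and sizes as the notebook cell.)
rng = np.random.default_rng(42)
centers = [(25, 20), (25, 80), (55, 50), (85, 20), (85, 80)]
sizes = [50, 50, 80, 50, 50]
variances = [20, 20, 30, 20, 20]
X_demo = np.vstack([
    rng.multivariate_normal(c, [[v, 0], [0, v]], n)
    for c, n, v in zip(centers, sizes, variances)
])

# The silhouette score is undefined for a single cluster, so start at K=2.
silhouette_by_k = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    labels = model.fit_predict(X_demo)
    silhouette_by_k[k] = silhouette_score(X_demo, labels)

# Higher is better: points are close to their own cluster and far from others.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
for k, score in silhouette_by_k.items():
    print(f"K={k}: silhouette={score:.3f}")
print(f"Highest silhouette at K={best_k}")
```

If the silhouette peak and the elbow disagree on real data, that is usually a sign the segments overlap and K should be chosen with the business use in mind.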

## 5. Business Interpretation
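K-Means numbers its clusters arbitrarily, so X, Y, and Z below stand for whichever labels land on the matching corners of the scatter plot.
*   **Cluster X (High Income, High Spend)**: These are our **VIPs**. Target them with luxury ads.
*   **Cluster Y (High Income, Low Spend)**: **Savers**. Try to convert them with value propositions or sales.
*   **Cluster Z (Low Income, High Spend)**: **Impulse buyers**. Target them with discounts.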
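
One way to turn the X, Y, Z placeholders into actual cluster numbers is to read each segment off its centroid. The sketch below is a minimal illustration under stated assumptions: `name_segment` and its cut-offs of 40 and 70 are hypothetical, chosen only because they separate the simulated centers, and the data is regenerated so the block runs on its own. The "Sensible" and "Average" names come from the comments in the simulation code.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Rebuild the simulated segments (same centers, variances, and sizes as the notebook cell).
rng = np.random.default_rng(42)
centers = [(25, 20), (25, 80), (55, 50), (85, 20), (85, 80)]
sizes = [50, 50, 80, 50, 50]
variances = [20, 20, 30, 20, 20]
X_demo = np.vstack([
    rng.multivariate_normal(c, [[v, 0], [0, v]], n)
    for c, n, v in zip(centers, sizes, variances)
])
df_demo = pd.DataFrame(X_demo, columns=['Annual_Income', 'Spending_Score'])

model = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
df_demo['Cluster'] = model.fit_predict(X_demo)


def name_segment(income, score, low=40, high=70):
    """Hypothetical naming rule; the 40/70 cut-offs are assumptions, not mall policy."""
    def level(value):
        if value < low:
            return 'Low'
        if value > high:
            return 'High'
        return 'Med'

    names = {
        ('High', 'High'): 'VIPs',
        ('High', 'Low'): 'Savers',
        ('Low', 'High'): 'Impulse buyers',
        ('Low', 'Low'): 'Sensible',
        ('Med', 'Med'): 'Average',
    }
    key = (level(income), level(score))
    # Fall back to a descriptive label for combinations not named above.
    return names.get(key, f"{key[0]} income, {key[1]} spend")


# One row per cluster; the row index is the cluster number K-Means assigned.
centroids = pd.DataFrame(model.cluster_centers_, columns=['Annual_Income', 'Spending_Score'])
centroids['Segment'] = [
    name_segment(income, score)
    for income, score in zip(centroids['Annual_Income'], centroids['Spending_Score'])
]

# Attach the segment name to every customer via its cluster number.
df_demo['Segment'] = df_demo['Cluster'].map(centroids['Segment'])
print(centroids.round(1))
print(df_demo['Segment'].value_counts())
```

Naming from centroids keeps the interpretation stable even if a different `random_state` shuffles which number each cluster receives.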