# Real-World Use Case: Wholesale Customer Analysis

## 1. The Problem
A wholesale distributor wants to understand its client base. For each client it has the annual spending in several product categories (Fresh, Milk, Grocery, Frozen, etc.).
*   **Goal**: Discover client segments (e.g., "Restaurants", "Grocery Stores", "Schools").

## 2. Why Hierarchical Clustering?
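*   **Hierarchy**: We may want to see broad groups (e.g., "Retail" vs. "HoReCa", i.e. hotels, restaurants, and cafés) and then subgroups within them. A dendrogram shows this nested structure directly: a high cut gives the broad groups, a lower cut gives the subgroups.
*   **No Fixed K**: We don't have to choose the number of clusters before fitting. The tree is built once, and we pick the level of granularity afterwards by inspecting the dendrogram.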
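Because the tree is built once, moving between levels of granularity is only a question of where to cut it. A minimal sketch, assuming SciPy's `linkage` and `fcluster` and a hypothetical toy dataset of two broad groups that each contain two tighter subgroups:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two broad groups (x = 0 and x = 20),
# each made of two tighter subgroups (y = 0 and y = 3), 25 points per subgroup
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [0, 3], [20, 0], [20, 3]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.3, size=(25, 2)) for c in centers])

# Build the Ward tree once...
Z = linkage(X, method="ward")

# ...then read off two levels of granularity without refitting.
# fcluster labels start at 1, so drop the empty bin 0 from bincount.
broad = fcluster(Z, t=2, criterion="maxclust")  # e.g. "Retail" vs "HoReCa"
fine = fcluster(Z, t=4, criterion="maxclust")   # subgroups within each

print("broad cluster sizes:", np.bincount(broad)[1:])
print("fine cluster sizes: ", np.bincount(fine)[1:])
```

Every fine cluster lies entirely inside one broad cluster, because both cuts come from the same tree. Running a flat method such as k-means separately with k=2 and k=4 gives no such nesting guarantee.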

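## 3. Data (Synthetic Proxy for UCI Wholesale Customers)
The real UCI Wholesale Customers dataset records annual spending in monetary units (m.u.) on six categories: Fresh, Milk, Grocery, Frozen, Detergents_Paper, and Delicassen. To keep the example self-contained, the code below simulates two customer types over the first four categories only.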

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import normalize

# 1. Generate synthetic data: 200 restaurants + 200 retailers (stand-in for the UCI data)
np.random.seed(42)
n = 200
# Group A: Restaurants (High Fresh, High Frozen)
fresh_a = np.random.normal(20000, 5000, n)
milk_a = np.random.normal(5000, 2000, n)
grocery_a = np.random.normal(5000, 2000, n)
frozen_a = np.random.normal(10000, 3000, n)

# Group B: Retailers (High Grocery, High Milk, Low Fresh)
fresh_b = np.random.normal(5000, 2000, n)
milk_b = np.random.normal(15000, 4000, n)
grocery_b = np.random.normal(20000, 5000, n)
frozen_b = np.random.normal(2000, 1000, n)

X = np.column_stack([
    np.concatenate([fresh_a, fresh_b]),
    np.concatenate([milk_a, milk_b]),
    np.concatenate([grocery_a, grocery_b]),
    np.concatenate([frozen_a, frozen_b])
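])
# Normal draws can fall below zero; clip them, since spending cannot be negative
X = np.clip(X, 0, None)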
cols = ['Fresh', 'Milk', 'Grocery', 'Frozen']
df = pd.DataFrame(X, columns=cols)

# 2. Preprocessing
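# Distance-based clustering is sensitive to scale, and monetary values are skewed:
# a few large customers would dominate raw Euclidean distances.
# normalize() rescales each customer's row to unit Euclidean (L2) length, so the
# clustering compares spending mix (ratios between categories), not total volume.
# (Pass norm='l1' instead if you want each row to sum to 1, i.e. spending shares.)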
data_scaled = normalize(df)
df_scaled = pd.DataFrame(data_scaled, columns=cols)

# 3. Dendrogram
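plt.figure(figsize=(12, 7))
# Ward linkage on the row-normalized data; 400 leaf labels are unreadable, so hide them
dend = sch.dendrogram(sch.linkage(df_scaled, method='ward'), no_labels=True)
plt.title("Wholesale Customers Dendrogram")
plt.xlabel("Customers")
plt.ylabel("Ward linkage distance")
plt.axhline(y=6, color='r', linestyle='--')  # candidate cut height, chosen by eye from the plot
plt.show()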

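# 4. Clustering (the dendrogram shows 2 main branches)
# Ward linkage always uses Euclidean distance (the default). The old `affinity`
# argument was renamed to `metric` in scikit-learn 1.2 and removed in 1.4.
cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')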
df['Cluster'] = cluster.fit_predict(df_scaled)

# 5. Interpretation
print("Average Spending by Cluster:")
print(df.groupby('Cluster').mean())
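`sklearn.preprocessing.normalize` defaults to `norm='l2'`, so after the preprocessing step each customer's row has unit Euclidean length; it does not sum to 1. The short check below contrasts the two options on two hypothetical customers with the same spending mix at very different volumes (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical spending rows (Fresh, Milk, Grocery, Frozen):
# customer 1 buys exactly 10x as much as customer 0, with the same mix
X = np.array([[2000.0, 500.0, 500.0, 1000.0],
              [20000.0, 5000.0, 5000.0, 10000.0]])

l2 = normalize(X)             # default norm='l2': each row has Euclidean length 1
l1 = normalize(X, norm="l1")  # each row's absolute values sum to 1 (spending shares)

print("L2 row lengths:", np.linalg.norm(l2, axis=1))
print("L2 row sums:   ", l2.sum(axis=1))   # not 1
print("L1 row sums:   ", l1.sum(axis=1))
print("L1 shares, customer 0:", l1[0])
print("Same mix -> same L2 row:", np.allclose(l2[0], l2[1]))
```

Either choice removes volume and keeps the mix. L1 rows read directly as spending shares, while L2 rows are what the pipeline above actually feeds to Ward linkage, so the cut height on the dendrogram is measured in L2-normalized units.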