# Customer Segmentation using Unsupervised Learning

This notebook performs customer segmentation using **K-Means**, **Hierarchical Clustering**, and **DBSCAN**.

## Step 1: Import Libraries

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.impute import SimpleImputer
from scipy.cluster.hierarchy import dendrogram, linkage


## Step 2: Load Dataset

In [None]:

df = pd.read_csv("customer_segmentation.csv")
df.head()


## Step 3: Data Overview

In [None]:

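df.info()                  # prints dtypes and non-null counts
print(df.describe())       # a cell only displays its last expression, so print explicitly
print(df.isnull().sum())   # missing values per column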


## Step 4: Handle Missing Values

In [None]:

# Fill missing numeric values with the column median, which is robust to outliers.
num_cols = ["Age", "Annual_Income", "Avg_Order_Value"]
imputer = SimpleImputer(strategy="median")
df[num_cols] = imputer.fit_transform(df[num_cols])


## Step 5: Encode Categorical Variables

In [None]:

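cat_cols = ["Gender", "Region"]

# Fit a fresh encoder per column and keep it, so codes can be mapped back to labels later.
# Note: integer codes impose an arbitrary order on nominal categories such as Region.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])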
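Label encoding turns each category into an integer, and distance-based methods such as K-Means then treat those integers as if they were ordered and evenly spaced. One common alternative is one-hot encoding. The following is a minimal sketch on a toy frame, not the notebook's dataset: the column names `Gender` and `Region` come from the cell above, while the category values and the names `toy` and `encoded` are made up for illustration.

```python
import pandas as pd

# Toy frame; the category values are hypothetical.
toy = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Region": ["North", "South", "East", "North"],
    "Age": [25, 40, 31, 52],
})

# One indicator column per category, so no artificial ordering is introduced.
encoded = pd.get_dummies(toy, columns=["Gender", "Region"], dtype=int)
print(encoded.columns.tolist())
```

The trade-off is a wider feature matrix: a column with many distinct categories adds one feature per category, which can dominate the distance computation after scaling.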


## Step 6: Feature Selection & Scaling

In [None]:

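# Drop the identifier and the churn label so that neither influences the clusters.
X = df.drop(columns=["CustomerID", "Is_Churned"])
# Standardize to zero mean and unit variance; the clustering below is distance-based.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)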


## Step 7: K-Means Clustering

In [None]:

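# Fit K-Means for k = 2..10 and record the inertia (within-cluster sum of squares).
k_values = range(2, 11)
inertia = []
for k in k_values:
    # n_init is set explicitly because its default differs across scikit-learn versions.
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    inertia.append(km.inertia_)

plt.plot(k_values, inertia, marker='o')
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()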


In [None]:

# Final model with k = 4; adjust if the elbow plot suggests a different value.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["KMeans_Cluster"] = kmeans.fit_predict(X_scaled)
df["KMeans_Cluster"].value_counts()
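The elbow in an inertia curve is often ambiguous, so a second criterion can help when choosing the number of clusters. The silhouette score is one such check: it ranges from -1 to 1, and higher values indicate tighter, better separated clusters. The sketch below runs on synthetic data from `make_blobs` rather than on the customer file, and the names `X_demo`, `scores`, and `best_k` are assumptions introduced for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled: four well-separated groups (illustrative only).
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=42,
)
X_demo = StandardScaler().fit_transform(X_demo)

# Mean silhouette score for each candidate k.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real data, the same loop could be run on `X_scaled`; if the silhouette peak and the elbow disagree, that is a sign the segments are not sharply separated.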


## Step 8: Hierarchical Clustering

In [None]:

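# Use the first 500 rows only, so the linkage is cheap and the dendrogram stays readable.
sample = X_scaled[:500]
linked = linkage(sample, method="ward")

plt.figure(figsize=(10,5))
dendrogram(linked, no_labels=True)  # 500 leaf labels would overlap on the x-axis
plt.title("Dendrogram (first 500 customers, Ward linkage)")
plt.show()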


In [None]:

# Ward linkage is the scikit-learn default, matching the dendrogram above.
hc = AgglomerativeClustering(n_clusters=4)
df["Hierarchical_Cluster"] = hc.fit_predict(X_scaled)


## Step 9: DBSCAN

In [None]:

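# eps is the neighborhood radius in scaled units; min_samples is the density threshold.
dbscan = DBSCAN(eps=1.5, min_samples=10)
df["DBSCAN_Cluster"] = dbscan.fit_predict(X_scaled)
# The label -1 marks noise points, i.e. customers that belong to no dense region.
df["DBSCAN_Cluster"].value_counts()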
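DBSCAN results depend heavily on `eps`. A common heuristic is to sort each point's distance to its `min_samples`-th nearest neighbor and look for the knee of that curve. The sketch below computes those distances and counts noise points on synthetic data with three planted outliers; the data, the chosen `eps=1.0`, and the names `X_demo`, `k_dist`, `n_noise`, and `n_clusters` are assumptions for illustration, not values tuned for the customer file.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Two dense synthetic groups plus three far-away points acting as outliers.
X_demo, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 6]], cluster_std=0.5, random_state=0
)
outliers = np.array([[20.0, 20.0], [-15.0, 10.0], [12.0, -14.0]])
X_demo = np.vstack([X_demo, outliers])

min_samples = 10

# Distance from each point to its min_samples-th nearest neighbor, counting the
# point itself, which mirrors how DBSCAN counts neighbors.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)
k_dist = np.sort(distances[:, -1])  # plot this curve and look for the knee

labels = DBSCAN(eps=1.0, min_samples=min_samples).fit_predict(X_demo)
n_noise = int((labels == -1).sum())
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, n_noise)
```

Plotting `k_dist` with `plt.plot(k_dist)` gives the usual k-distance graph; an `eps` near the knee tends to separate dense regions from noise.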


## Step 10: Visualization

In [None]:

sns.scatterplot(
    data=df,
    x="Annual_Income",
    y="Spending_Score",
    hue="KMeans_Cluster",
    palette="Set2"
)
plt.title("Customer Segmentation (K-Means)")
plt.show()


## Step 11: Cluster Interpretation

In [None]:

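# Average each clustering feature per segment; CustomerID, Is_Churned, and the
# other cluster labels are left out because their means are not meaningful here.
df.groupby("KMeans_Cluster")[X.columns].mean()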
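Cluster means are easier to interpret next to cluster sizes, since a segment with a striking profile may contain only a handful of customers. The sketch below builds such a profile with named aggregations on a toy frame; the column names follow the notebook, while the values and the names `toy` and `profile` are made up for illustration.

```python
import pandas as pd

# Toy data standing in for df; the numbers are hypothetical.
toy = pd.DataFrame({
    "KMeans_Cluster": [0, 0, 1, 1, 1],
    "Annual_Income": [30, 50, 90, 110, 100],
    "Spending_Score": [20, 40, 80, 60, 70],
})

# One row per cluster: size plus the mean of each feature of interest.
profile = toy.groupby("KMeans_Cluster").agg(
    n_customers=("Annual_Income", "size"),
    mean_income=("Annual_Income", "mean"),
    mean_spending=("Spending_Score", "mean"),
)
print(profile)
```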


## Conclusion
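- K-Means (k = 4) assigns every customer to one of four segments; the per-cluster means in Step 11 describe each segment.
- Hierarchical clustering shows how customers merge into nested groups, as seen in the dendrogram built on the first 500 customers.
- DBSCAN groups customers by density and labels low-density points as noise (cluster `-1`), which makes it useful for flagging outliers.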
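How far the methods agree can be quantified rather than judged by eye. The adjusted Rand index (ARI) compares two labelings and equals 1.0 when they match up to a renaming of clusters, while the silhouette score rates each labeling on its own. The sketch below is a minimal illustration on synthetic data from `make_blobs`; the data and the names `X_demo`, `km_labels`, `hc_labels`, `ari`, `sil_km`, and `sil_hc` are assumptions, and results on the customer data may look quite different.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic data with four well-separated groups (illustrative only).
X_demo, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=1,
)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_demo)
hc_labels = AgglomerativeClustering(n_clusters=4).fit_predict(X_demo)

# Agreement between the two labelings, independent of how clusters are numbered.
ari = adjusted_rand_score(km_labels, hc_labels)

# Quality of each labeling on its own.
sil_km = silhouette_score(X_demo, km_labels)
sil_hc = silhouette_score(X_demo, hc_labels)
print(round(ari, 3), round(sil_km, 3), round(sil_hc, 3))
```

In the notebook itself, the same calls could be applied to `df["KMeans_Cluster"]`, `df["Hierarchical_Cluster"]`, and `X_scaled`.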