
# Customer Segmentation Notebook
## Business & Teaching Purposes

This notebook demonstrates **customer segmentation** using a real dataset.
It is designed for:

### 🎯 Business Purpose
- Identify customer groups
- Support targeted marketing strategies
- Improve customer retention and value

### 🎓 Teaching Purpose
- Demonstrate a full ML workflow
- Explain each step with markdown
- Suitable for undergraduate Machine Learning / Data Science classes

---



## 1. Problem Definition

**Goal:**  
Segment customers into meaningful groups based on their behavior and attributes.

**Approach:**  
We use **unsupervised learning (clustering)** because the data has no labeled target variable.

**Algorithm Used:**  
- K-Means Clustering
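To make the algorithm concrete, here is a minimal from-scratch sketch of K-Means in its classic form (Lloyd's algorithm). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. It uses NumPy only, and the names `assign_to_nearest` and `kmeans_lloyd` are illustrative. The scikit-learn `KMeans` used later adds smarter initialization (k-means++) and multiple restarts.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    # Squared Euclidean distances, shape (n_points, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm), for teaching only."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_to_nearest(X, centroids)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break  # centroids stopped moving
    # Final assignment so labels match the returned centroids
    labels = assign_to_nearest(X, centroids)
    # WCSS (inertia): total squared distance from each point to its centroid
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss


X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids, wcss = kmeans_lloyd(X, k=2)
print(labels, round(wcss, 3))
```

On these two well-separated groups, the three points near (1, 1) and the three near (8, 8) end up in different clusters.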



## 2. Import Libraries


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score



## 3. Load Dataset


In [None]:

# Load dataset
df = pd.read_csv("customer_segmentation.csv")

# Display first rows
df.head()



## 4. Exploratory Data Analysis (EDA)

### Teaching Notes:
- Understand data types
- Check missing values
- Identify numerical features for clustering


In [None]:

df.info()


In [None]:

df.describe()
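`df.info()` reports non-null counts, but the missing-value check from the teaching notes is easier to read as an explicit count and percentage per column. The sketch below runs on a small hypothetical DataFrame (`df_demo`, with made-up columns) so that it is self-contained; in the notebook, the same expressions apply to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the customer data; column names are illustrative only
df_demo = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [25, np.nan, 41, 33],
    "AnnualIncome": [40000.0, 52000.0, np.nan, 61000.0],
    "Gender": ["F", "M", "F", None],
})

# Number and percentage of missing values per column
missing = pd.DataFrame({
    "n_missing": df_demo.isna().sum(),
    "pct_missing": df_demo.isna().mean() * 100,
})
print(missing)

# Data types show which columns are numeric candidates for clustering
print(df_demo.dtypes)
```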



## 5. Data Cleaning & Feature Selection

For clustering:
- Keep numerical features
- Remove IDs or non-informative columns


In [None]:

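# Select numeric columns ('number' also covers int32, float32, ...)
num_df = df.select_dtypes(include='number')
# Drop ID columns: numeric, but not behavioral ('CustomerID' is an assumed name)
num_df = num_df.drop(columns=['CustomerID'], errors='ignore')
# K-Means cannot handle NaN; fill gaps with each column's median
num_df = num_df.fillna(num_df.median())

num_df.head()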



## 6. Feature Scaling

### Why?
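K-Means assigns each customer to the nearest centroid by Euclidean distance, so a feature with a large numeric range (e.g., income in dollars versus age in years) dominates the distance. `StandardScaler` rescales every feature to mean 0 and standard deviation 1 so that all features contribute comparably.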


In [None]:

scaler = StandardScaler()
scaled_data = scaler.fit_transform(num_df)
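A small numeric example of why scaling matters, using two hypothetical features (age in years and annual income in dollars) for three made-up customers. It also checks that `StandardScaler` computes z = (x - mean) / std with the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [age in years, annual income in dollars]
X = np.array([
    [25, 40000.0],
    [60, 41000.0],  # very different age, similar income
    [26, 90000.0],  # similar age, very different income
])

# Unscaled: the income gap swamps a 35-year age gap
d_age_gap = np.linalg.norm(X[0] - X[1])
d_income_gap = np.linalg.norm(X[0] - X[2])
print(f"unscaled: {d_age_gap:.1f} vs {d_income_gap:.1f}")

# StandardScaler: z = (x - mean) / std, with the population std (ddof=0)
Z = StandardScaler().fit_transform(X)
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(Z, Z_manual))  # True

# Scaled: both differences now carry similar weight
print(f"scaled: {np.linalg.norm(Z[0] - Z[1]):.2f} vs {np.linalg.norm(Z[0] - Z[2]):.2f}")
```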



## 7. Choosing Optimal Number of Clusters

### Methods:
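- Elbow Method: plot the within-cluster sum of squares (WCSS, `inertia_` in scikit-learn) against k and look for the "elbow" where adding clusters stops reducing WCSS much. It is an easy visualization to show business stakeholders.
- Silhouette Score: compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster. It ranges from -1 to 1, and higher is better.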


In [None]:

wcss = []
K = range(2, 11)

for k in K:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(scaled_data)
    wcss.append(kmeans.inertia_)

plt.figure()
plt.plot(K, wcss, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.title('Elbow Method')
plt.show()
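The WCSS plotted above is what scikit-learn exposes as `inertia_`: the sum of squared distances from each point to the centroid of its assigned cluster. A quick check on synthetic data from `make_blobs`, used here as a stand-in for `scaled_data` (not the notebook's dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled customer features (assumption)
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# WCSS = sum over points of the squared distance to their own cluster's centroid
wcss_manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(f"{km.inertia_:.3f} vs {wcss_manual:.3f}")
```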


In [None]:

for k in range(2, 8):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = kmeans.fit_predict(scaled_data)
    score = silhouette_score(scaled_data, labels)
    print(f"k={k}, Silhouette Score={score:.3f}")
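The silhouette score printed above is the mean of a per-point value s = (b - a) / max(a, b), where a is the point's mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. A hand-written version (the helper name `silhouette_manual` is hypothetical) can be compared against scikit-learn's `silhouette_samples` on a tiny example:

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score


def silhouette_manual(X, labels):
    """Per-point silhouette s = (b - a) / max(a, b), for teaching only."""
    D = pairwise_distances(X)  # Euclidean distances between all pairs of points
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # convention: a point alone in its cluster gets s = 0
        # a: mean distance to the other members of i's cluster (D[i, i] = 0 is excluded by n - 1)
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
s = silhouette_manual(X, labels)
print(s.mean(), silhouette_score(X, labels))
```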



## 8. Train Final K-Means Model


In [None]:

# Set n_clusters to the k suggested by the elbow and silhouette analysis (3 here)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Segment'] = kmeans.fit_predict(scaled_data)

df.head()
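The final model fixes `n_clusters=3`. If you prefer to pick k from the silhouette results programmatically, one possible rule is to take the k with the highest mean silhouette score. This sketch uses a hypothetical helper `best_k_by_silhouette` on synthetic data with three planted groups. In practice, weigh the result against the elbow plot and against whether the segments are useful for the business.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=42):
    """Hypothetical helper: return the k with the highest mean silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


# Synthetic stand-in for the scaled customer data: three well-separated groups
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=1)
best_k, scores = best_k_by_silhouette(X, range(2, 8))
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```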



## 9. Business Segmentation Interpretation

We analyze each cluster to understand customer behavior.


In [None]:

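# Mean of each clustering feature per segment (numeric columns only; avoids a TypeError in pandas >= 2.0)
segment_summary = df.groupby('Segment')[list(num_df.columns)].mean()
segment_summary['Count'] = df['Segment'].value_counts()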
segment_summary
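Because K-Means ran on standardized data, `kmeans.cluster_centers_` is in z-score units. `scaler.inverse_transform` maps the centroids back to the original units, where (once K-Means has converged) each centroid equals the mean of its assigned points, i.e. the per-segment means. A self-contained check, with made-up columns `Age` and `AnnualSpend`:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features; column names are illustrative only
num_demo = pd.DataFrame({
    "Age": [22, 25, 24, 58, 61, 60],
    "AnnualSpend": [300.0, 350.0, 320.0, 2500.0, 2600.0, 2550.0],
})

scaler_demo = StandardScaler()
Z = scaler_demo.fit_transform(num_demo)
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(Z)

# Centroids mapped from the scaled space back to the original units
centers = pd.DataFrame(scaler_demo.inverse_transform(km.cluster_centers_),
                       columns=num_demo.columns)

# Per-segment means computed directly from the unscaled data (rows sorted by label)
means = num_demo.groupby(km.labels_).mean()
print(centers.round(1))
print(means.round(1))
```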



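### Example Business Labels
Illustrative only: read `segment_summary` before naming segments, since K-Means numbers clusters arbitrarily.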
- Segment 0: High-Value Customers
- Segment 1: Average Customers
- Segment 2: Low Engagement Customers
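A name such as "High-Value Customers" can be attached from the segment profiles with a simple ranking rule. This sketch ranks segments by the mean of a spending column and hands out names in that order; the helper `label_segments`, the column `AnnualSpend`, and the use of spend as a proxy for engagement are all assumptions for illustration.

```python
import pandas as pd


def label_segments(summary, value_col, names):
    """Hypothetical helper: rank segments by mean `value_col` (highest first)
    and pair them with `names` in that order."""
    order = summary[value_col].sort_values(ascending=False).index
    return dict(zip(order, names))


# Illustrative segment means; 'AnnualSpend' is an assumed column name
summary_demo = pd.DataFrame({"AnnualSpend": [450.0, 2600.0, 1200.0]},
                            index=pd.Index([0, 1, 2], name="Segment"))
mapping = label_segments(summary_demo, "AnnualSpend",
                         ["High-Value Customers", "Average Customers",
                          "Low Engagement Customers"])
print(mapping)
# In the notebook, something like: df['SegmentName'] = df['Segment'].map(mapping)
```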



## 10. Visualization of Segments


In [None]:

plt.figure()
sns.scatterplot(
    x=num_df.iloc[:, 0],
    y=num_df.iloc[:, 1],
    hue=df['Segment'],
    palette='Set2'
)
plt.title("Customer Segments Visualization")
plt.show()
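The scatter plot above shows only the first two numeric columns, so segments that differ on other features may look mixed. A common alternative, swapped in here and not part of the original notebook, is to project all scaled features onto two principal components with scikit-learn's `PCA` and color the points by segment. The sketch uses synthetic data; in the notebook you would pass `scaled_data` and `df['Segment']`.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for scaled customer data with 5 features (assumption)
X, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)
segments = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Project all features onto the two directions of largest variance
pca = PCA(n_components=2)
coords = pd.DataFrame(pca.fit_transform(X), columns=["PC1", "PC2"])
coords["Segment"] = segments

# Share of the total variance that the 2-D picture actually shows
print(pca.explained_variance_ratio_.sum())
# Plot, e.g.: sns.scatterplot(data=coords, x="PC1", y="PC2", hue="Segment", palette="Set2")
```

Reporting the explained-variance share alongside the plot tells readers how much of the data the two axes capture.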



## 11. Teaching Summary (For Students)

### What You Learned:
1. Difference between supervised and unsupervised learning
2. Why scaling matters
3. How to choose K in K-Means
4. How to interpret clusters

### Exam Questions:
- Why is K-Means unsuitable without feature scaling?
- Compare the Elbow and Silhouette methods for choosing k.
- How can businesses use clustering results?



## 12. Business Recommendations

- Design personalized marketing per segment
- Allocate budget to high-value segments
- Re-engage low-value customers with promotions



## ✅ End of Notebook
