✅ Project Title:
E-commerce Customer Segmentation and Recommendation Engine
🎯 Objective:

Segment e-commerce customers into meaningful groups and build a simple recommendation engine that suggests products based on customer behavior.

📊 Project Breakdown:
1️⃣ Data Collection:

Dataset Source:
Use publicly available datasets like:
👉 https://www.kaggle.com/datasets/ertugrulesol/online-retail-data?resource=download

Example Dataset:

Online Retail dataset with customer transactions, product details, and customer demographics.

2️⃣ Exploratory Data Analysis (EDA) & Feature Engineering:

Analyze data distributions, missing values, and correlations.
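
A minimal sketch of these checks, assuming a small hand-made sample in place of the real dataset. The name `sample` and its values are illustrative only; the column names follow the dataset preview shown later in this notebook.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample that mimics a few columns of the retail dataset
sample = pd.DataFrame({
    "quantity": [2, 5, 5, 2, 4],
    "price": [373.36, 299.34, 23.00, 230.11, 176.72],
    "review_score": [1.0, np.nan, 5.0, 5.0, 1.0],
    "gender": ["F", "M", "F", None, "F"],
})

# Number of missing values in each column
missing_counts = sample.isna().sum()

# Summary statistics (count, mean, std, quartiles) for the numeric columns
summary = sample.describe()

# Pairwise Pearson correlations, restricted to the numeric columns
correlations = sample.select_dtypes("number").corr()

print(missing_counts)
print(summary)
print(correlations)
```

The same three calls can be pointed at the full DataFrame once it is loaded; the correlation matrix is also the input a heatmap would need.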

Key Features to Engineer:

Customer Lifetime Value (CLV)

Average Order Value (AOV)

Purchase Frequency

Recency (Days since last purchase)

Visualizations:

Histograms of purchase amounts

Correlation heatmaps

Distribution of customer purchases by geography

3️⃣ Customer Segmentation:

Algorithm:
Use K-Means Clustering to segment customers into groups.

Steps:

Normalize features.

Find optimal number of clusters (Elbow Method or Silhouette Score).

Apply clustering and assign customer segments.
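
The three steps above can be sketched as follows. This is an illustration on synthetic data, not the project's actual features: the blob centers, the candidate range of k (2 to 6), and `n_init=10` are all assumptions chosen to keep the example small and reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features: three well-separated groups
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Step 1: standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Step 2: record inertia (for an elbow plot) and silhouette score for each k
inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X_scaled, labels)

# Pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)

# Step 3: fit the final model and assign a segment label to every row
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=42)
segments = final_model.fit_predict(X_scaled)

print(best_k)
```

Plotting `inertias` against k gives the elbow curve; on real customer data the two criteria may disagree, in which case the choice of k is a judgment call.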

Example Segments:

High-value loyal customers

New shoppers

At-risk customers

Visualization:

Cluster scatter plots

Segment summary tables

4️⃣ Recommendation Engine:

Approach:

Simple Content-Based Filtering or Collaborative Filtering approach.

Example:

For content-based: Recommend products similar to customer’s past purchases based on product features (category, price range).

For collaborative filtering: Recommend products popular among customers in the same segment (a segment-level popularity proxy rather than a full user-item similarity model).
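
One possible way to sketch the content-based variant is cosine similarity over a one-hot category plus a min-max scaled price. The product table below is hypothetical ("Laptop" and all prices are made up for illustration), and `similar_products` is an illustrative helper, not part of the project code.

```python
import numpy as np
import pandas as pd

# Hypothetical product catalog with the two features named above
products = pd.DataFrame({
    "product_name": ["Smartphone", "Laptop", "Tent", "Soccer Ball", "Skirt"],
    "category_name": ["Electronics", "Electronics", "Sports & Outdoors",
                      "Sports & Outdoors", "Fashion"],
    "price": [373.36, 900.00, 23.00, 299.34, 176.72],
})

# One-hot encode the category and scale price to the [0, 1] range
category_features = pd.get_dummies(products["category_name"]).astype(float)
price_span = products["price"].max() - products["price"].min()
price_scaled = (products["price"] - products["price"].min()) / price_span
features = np.column_stack([category_features.to_numpy(), price_scaled.to_numpy()])

# Cosine similarity between every pair of products
unit_vectors = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = unit_vectors @ unit_vectors.T

def similar_products(product_name, top_n=2):
    """Return the top_n products most similar to product_name, excluding itself."""
    idx = products.index[products["product_name"] == product_name][0]
    order = np.argsort(-similarity[idx])
    return [products.loc[i, "product_name"] for i in order if i != idx][:top_n]

print(similar_products("Smartphone"))
```

Because the category contributes a full unit to the vector while price contributes at most one, products in the same category rank first and price only breaks ties across categories; reweighting the two is a design choice.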

Output:

Top 5 product recommendations per customer.
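
A sketch of how a top-N list per customer could be produced from segment-level popularity. It assumes each purchase row already carries the customer's segment label; the `purchases` table and the `recommend_for_customer` helper are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical purchase rows with a segment label already attached
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 4],
    "segment": [0, 0, 0, 0, 0, 1],
    "product_name": ["Tent", "Soccer Ball", "Tent", "Skirt", "Smartphone", "Story Book"],
    "quantity": [1, 2, 3, 1, 5, 2],
})

def recommend_for_customer(purchases, customer_id, top_n=5):
    """Rank products by quantity sold in the customer's segment, skipping items already bought."""
    own = purchases[purchases["customer_id"] == customer_id]
    segment = own["segment"].iloc[0]
    already_bought = set(own["product_name"])
    segment_sales = (
        purchases[purchases["segment"] == segment]
        .groupby("product_name")["quantity"]
        .sum()
        .sort_values(ascending=False)
    )
    candidates = [p for p in segment_sales.index if p not in already_bought]
    return candidates[:top_n]

print(recommend_for_customer(purchases, customer_id=1))
```

A customer whose segment contains no unseen products gets an empty list, so a fallback such as overall best sellers may be needed in practice.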

5️⃣ Visualization Dashboard:

Tool: Power BI or Tableau

Dashboard Sections:

Customer Segments Overview (size, average revenue per segment)

Purchase behavior insights

Performance of recommendation engine (e.g., precision or accuracy if evaluated)

Interactive filters (e.g., by geography, segment)
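
If the recommendation engine is evaluated, precision at k is one metric the dashboard could report. The sketch below assumes "relevant" means products the customer actually bought in a held-out period, which is an evaluation setup not described in this notebook; `precision_at_k` is an illustrative helper.

```python
def precision_at_k(recommended, relevant, k=5):
    """Share of the first k recommendations that are in the relevant set.

    Divides by k even when fewer than k items were recommended,
    so short lists are penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

# Hypothetical example: 2 of the 5 recommended products were later purchased
recommended = ["Smartphone", "Tent", "Skirt", "Blanket", "Pants"]
later_purchased = {"Tent", "Pants"}
score = precision_at_k(recommended, later_purchased, k=5)
print(score)
```

Averaging this score over all customers gives a single number per segment or per geography that can be filtered in the dashboard.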

✅ Project Outcome:

Demonstrated ability to process raw e-commerce data into actionable insights.

Built customer segmentation models using unsupervised learning.

Developed a basic recommendation system offering personalized suggestions.

Delivered a dashboard providing clear, interactive insights.

## Data Loading and Overview

In [1]:
import pandas as pd

# Load the dataset
file_path = 'synthetic_online_retail_data.csv'
data = pd.read_csv(file_path)

# Preview dataset
print(data.head())
print(data.info())


   customer_id  order_date  product_id  category_id       category_name  \
0        13542  2024-12-17         784           10         Electronics   
1        23188  2024-06-01         682           50   Sports & Outdoors   
2        55098  2025-02-04         684           50   Sports & Outdoors   
3        65208  2024-10-28         204           40  Books & Stationery   
4        63872  2024-05-10         202           20             Fashion   

  product_name  quantity   price payment_method            city  review_score  \
0   Smartphone         2  373.36    Credit Card  New Oliviaberg           1.0   
1  Soccer Ball         5  299.34    Credit Card    Port Matthew           NaN   
2         Tent         5   23.00    Credit Card      West Sarah           5.0   
3   Story Book         2  230.11  Bank Transfer  Hernandezburgh           5.0   
4        Skirt         4  176.72    Credit Card    Jenkinshaven           1.0   

  gender  age  
0      F   56  
1      M   59  
2      F   64 

## Data Cleaning

### Convert `order_date` to datetime type

In [2]:
data['order_date'] = pd.to_datetime(data['order_date'])

In [4]:
# Fill missing 'gender' values with 'Unknown'
data['gender'] = data['gender'].fillna('Unknown')

# Fill missing 'review_score' with median value
data['review_score'] = data['review_score'].fillna(data['review_score'].median())

## Feature Engineering

### Calculate Total Order Value

In [5]:
data['total_order_value'] = data['quantity'] * data['price']

### Aggregate data per customer

In [7]:
customer_agg = data.groupby('customer_id').agg({
    'order_date': ['max', 'min', 'count'],    # Last order, first order, total orders
    'total_order_value': 'sum',              # Customer Lifetime Value (CLV)
    'age': 'first',
    'gender': 'first',
    'review_score': 'mean'                  # Average review score
}).reset_index()

# Rename columns
customer_agg.columns = ['customer_id', 'last_order_date', 'first_order_date', 'purchase_count',
                        'total_spent', 'age', 'gender', 'avg_review_score']


### Calculate Recency and CLV

In [9]:
from datetime import datetime

current_date = datetime.strptime('2025-09-10', '%Y-%m-%d')

# Recency: Days since last purchase
customer_agg['recency'] = (current_date - customer_agg['last_order_date']).dt.days

# CLV (Customer Lifetime Value) approximated by total spent
customer_agg['clv'] = customer_agg['total_spent']
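
The project brief also lists Average Order Value (AOV), which the cells above do not compute. A minimal sketch, assuming a small hypothetical table with the same column names as `customer_agg`; `aov` and `tenure_days` are new column names introduced here for illustration.

```python
import pandas as pd

# Hypothetical per-customer aggregates mirroring the customer_agg columns
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "purchase_count": [1, 4, 2],
    "total_spent": [624.84, 400.00, 130.00],
    "first_order_date": pd.to_datetime(["2024-10-14", "2024-01-01", "2024-05-01"]),
    "last_order_date": pd.to_datetime(["2024-10-14", "2024-12-31", "2024-05-31"]),
})

# Average Order Value: total spend divided by the number of purchase rows
customers["aov"] = customers["total_spent"] / customers["purchase_count"]

# Tenure: days between the first and the last recorded purchase
customers["tenure_days"] = (
    customers["last_order_date"] - customers["first_order_date"]
).dt.days

print(customers[["customer_id", "aov", "tenure_days"]])
```

Since the dataset has no order identifier, `purchase_count` counts transaction rows, so this AOV is really an average value per row.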


## Customer Segmentation (K-Means Clustering)

### Select Features and Normalize

In [10]:
from sklearn.preprocessing import StandardScaler

features = customer_agg[['purchase_count', 'total_spent', 'recency', 'avg_review_score', 'age']]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

Standardizing each feature to zero mean and unit variance keeps features with large numeric ranges (such as `total_spent`) from dominating the Euclidean distances that K-Means relies on.

### Apply K-Means Clustering

In [11]:
from sklearn.cluster import KMeans

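# n_init is set explicitly so the result does not depend on the installed scikit-learn version's default
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)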
customer_agg['segment'] = kmeans.fit_predict(X_scaled)

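This assigns each customer to one of three clusters based on behavioral patterns. The labels 0, 1, and 2 are arbitrary, so each cluster has to be interpreted from its feature averages.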
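
A segment summary table is one way to give the cluster labels meaning. The sketch below uses a small hypothetical table in place of `customer_agg`; the values and the output column names (`customers`, `avg_spent`, and so on) are illustrative.

```python
import pandas as pd

# Hypothetical segmented customers with the same column names as customer_agg
segmented = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment": [0, 0, 1, 1, 1, 2],
    "total_spent": [100.0, 300.0, 50.0, 70.0, 90.0, 1000.0],
    "recency": [400, 380, 30, 45, 60, 10],
    "purchase_count": [1, 1, 2, 3, 1, 8],
})

# One row per segment: size plus the average of each behavioral feature
segment_summary = (
    segmented.groupby("segment")
    .agg(
        customers=("customer_id", "count"),
        avg_spent=("total_spent", "mean"),
        avg_recency=("recency", "mean"),
        avg_purchases=("purchase_count", "mean"),
    )
    .reset_index()
)

print(segment_summary)
```

In this made-up data, the segment with high spend and low recency would be a candidate for the "high-value loyal" label, and the one with high recency for "at-risk".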

## Simple Recommendation System (Content-Based)

### Build Product Profile (Most Popular Product Per Category)

In [12]:
# For each category, take the product name that appears in the most transaction rows
product_profiles = data.groupby('category_name')['product_name'].agg(lambda x: x.mode()[0]).reset_index()
product_profiles.columns = ['category_name', 'top_product']

### Find Most Purchased Category Per Customer

In [13]:
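# Total quantity bought by each customer in each category
customer_category = data.groupby(['customer_id', 'category_name'])['quantity'].sum().reset_index()

# Keep the category with the largest quantity per customer (idxmax keeps the first row on ties)
most_purchased_category = customer_category.loc[customer_category.groupby('customer_id')['quantity'].idxmax()]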

### Merge Recommendations

In [14]:
recommendations = most_purchased_category.merge(product_profiles, on='category_name')[['customer_id', 'top_product']]

# Add recommendation back to customer aggregation
customer_agg = customer_agg.merge(recommendations, on='customer_id', how='left')

## Final Sample Output

In [15]:
print(customer_agg.head())

   customer_id last_order_date first_order_date  purchase_count  total_spent  \
0        10201      2024-10-14       2024-10-14               1       624.84   
1        10211      2024-07-23       2024-07-23               1        65.02   
2        10254      2024-09-10       2024-09-10               1        70.93   
3        10299      2024-11-27       2024-11-27               1       815.76   
4        10403      2024-05-03       2024-05-03               1      1319.35   

   age   gender  avg_review_score  recency      clv  segment top_product  
0   23  Unknown               4.0      331   624.84        1  Smartphone  
1   25        F               5.0      414    65.02        1     Blanket  
2   73        M               3.0      365    70.93        0     Blanket  
3   33        F               5.0      287   815.76        1       Pants  
4   65        M               4.0      495  1319.35        0  Smartphone  


In [16]:
customer_agg.to_csv('customer_segmentation_recommendations.csv', index=False)