# Customer Segmentation - Testing Notebook

This notebook is designed to **test the segmentation module (`segmentation.py`)**.  
Here, we run each function on an RFM table built from the cleaned sales data and check that it completes:

- `prepare_rfm()` → ensures proper column selection and renaming  
- `scale_rfm()` → standardizes RFM values for clustering  
- `run_kmeans()` → applies KMeans clustering with chosen parameters  
- `profile_clusters()` → generates cluster summaries for interpretation  
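
This notebook does not show how `scale_rfm()` standardizes the values. A minimal sketch, assuming z-score standardization with the population standard deviation (the convention scikit-learn's `StandardScaler` uses), could look like this. `zscore` is a hypothetical helper and is not part of `segmentation.py`:

```python
import pandas as pd

def zscore(df: pd.DataFrame, cols=("Recency", "Frequency", "Monetary")) -> pd.DataFrame:
    """Hypothetical helper: standardize each column to mean 0 and population std 1."""
    out = df.copy()
    for col in cols:
        values = out[col].astype(float)
        std = values.std(ddof=0)  # ddof=0: population std, as in scikit-learn's StandardScaler
        # A constant column has std 0; map it to 0 instead of dividing by zero.
        out[col] = 0.0 if std == 0 else (values - values.mean()) / std
    return out

toy = pd.DataFrame({
    "Recency": [10, 20, 30],
    "Frequency": [1, 2, 3],
    "Monetary": [100.0, 200.0, 300.0],
})
print(zscore(toy).round(4))  # every column becomes about -1.2247, 0.0, 1.2247
```

Standardizing matters for KMeans because it clusters on Euclidean distance. Monetary values here run into the thousands while Frequency stays in the tens, so without scaling the Monetary column would dominate the distances.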

The purpose of this notebook is **verification**, not exploration.  
It ensures that the segmentation pipeline runs end‑to‑end without errors before integration into the main analysis workflow.
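
Running without errors does not, on its own, show that the outputs are sensible. A few explicit assertions can strengthen the verification. The sketch below makes several assumptions that the notebook does not confirm: that `rfm_scaled` holds only the three numeric RFM features, that it was standardized with the population standard deviation, and that `clusters` is an array of integer labels. `check_pipeline_outputs` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def check_pipeline_outputs(prepared, scaled, labels, n_clusters, tol=1e-6):
    """Hypothetical check: raise AssertionError if pipeline outputs break basic invariants."""
    scaled = np.asarray(scaled, dtype=float)
    labels = np.asarray(labels)
    # One row per customer must survive every step.
    assert len(prepared) == scaled.shape[0] == labels.shape[0], "row counts differ"
    # Standardized features: each column has mean ~0 and population std ~1.
    assert np.allclose(scaled.mean(axis=0), 0.0, atol=tol), "scaled means are not ~0"
    assert np.allclose(scaled.std(axis=0), 1.0, atol=tol), "scaled stds are not ~1"
    # Every label is a valid cluster id, and no cluster is empty.
    assert set(np.unique(labels)) == set(range(n_clusters)), "unexpected cluster ids"
    return True

# Synthetic stand-ins for rfm_prepared, rfm_scaled and clusters.
rng = np.random.default_rng(0)
raw = rng.normal(size=(6, 3))
prepared_demo = pd.DataFrame(raw, columns=["Recency", "Frequency", "Monetary"])
scaled_demo = (raw - raw.mean(axis=0)) / raw.std(axis=0)
labels_demo = np.array([0, 1, 2, 0, 1, 2])
print(check_pipeline_outputs(prepared_demo, scaled_demo, labels_demo, n_clusters=3))
```

In this notebook, the helper could be called after the pipeline cell as `check_pipeline_outputs(rfm_prepared, rfm_scaled, clusters, n_clusters=3)`.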


In [1]:
# Import segmentation.py from its file path
import importlib.util

spec = importlib.util.spec_from_file_location("segmentation", "../../src/segmentation.py")
segmentation = importlib.util.module_from_spec(spec)
spec.loader.exec_module(segmentation)
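
The cell above follows the standard `importlib` recipe for importing a source file by path. A slightly more defensive variant is sketched below as a hypothetical `load_module` helper. It fails with a clear message when the path is wrong, and it registers the module in `sys.modules` before executing it, as the `importlib` documentation's recipe does:

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

def load_module(name: str, path: str) -> ModuleType:
    """Hypothetical helper: import a module from an explicit file path."""
    file_path = Path(path).resolve()  # relative paths resolve against the working directory
    if not file_path.is_file():
        raise FileNotFoundError(f"module file not found: {file_path}")
    spec = importlib.util.spec_from_file_location(name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot create an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as in the importlib recipe
    spec.loader.exec_module(module)
    return module
```

Usage in this notebook would be `segmentation = load_module("segmentation", "../../src/segmentation.py")`.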

In [3]:
# Load dataset
import pandas as pd
sales = pd.read_csv("../../data/cleaned/sales_clean.csv")
sales["Order Date"] = pd.to_datetime(sales["Order Date"])
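
The aggregation step reads four columns of `sales_clean.csv`. A hypothetical `load_sales` helper can check for them up front, so that a schema change surfaces as a clear error at load time rather than as a failure inside the aggregation cell. The column names come from this notebook, and the sample CSV in the usage lines is made up:

```python
import io
import pandas as pd

# Columns the RFM aggregation in this notebook reads.
REQUIRED_COLUMNS = ["Customer ID", "Order ID", "Order Date", "Sales"]

def load_sales(source) -> pd.DataFrame:
    """Hypothetical loader: read sales data, check the RFM inputs, parse order dates."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"sales data is missing columns: {missing}")
    # errors="raise" (the default) stops on any date string pandas cannot parse.
    df["Order Date"] = pd.to_datetime(df["Order Date"], errors="raise")
    return df

# Usage with a small in-memory CSV (made-up rows) instead of the real file.
sample = io.StringIO(
    "Order ID,Order Date,Customer ID,Sales\n"
    "A1,2017-01-05,C1,10.0\n"
    "A2,2017-02-01,C2,20.5\n"
)
sales_demo = load_sales(sample)
print(sales_demo.dtypes)
```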

In [4]:
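# Aggregate data to customer level (one row per customer)
snapshot_date = sales["Order Date"].max()  # reference date for Recency: latest order in the data
rfm = sales.groupby("Customer ID").agg({
    "Order Date": lambda x: (snapshot_date - x.max()).days,  # Recency: days since last order
    "Order ID": "count",                                     # Frequency: number of order rows
    "Sales": "sum"                                           # Monetary: total sales
}).reset_index()

rfm.columns = ["Customer ID", "Recency", "Frequency", "Monetary"]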
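
The same aggregation, run on a made-up four-row table, makes the three measures concrete. It also exposes a detail of the aggregation cell: `"count"` on `Order ID` counts non-null rows, so an order with several line items adds several to Frequency. If Frequency is meant to be the number of distinct orders, `"nunique"` gives that instead. The hypothetical `DistinctOrders` column shows the difference. This notebook does not state which definition `segmentation.py` expects.

```python
import pandas as pd

toy = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B"],
    "Order ID": ["o1", "o1", "o2", "o3"],  # order o1 has two line items
    "Order Date": pd.to_datetime(["2017-12-01", "2017-12-01", "2017-12-20", "2017-12-30"]),
    "Sales": [10.0, 5.0, 20.0, 7.5],
})

snapshot = toy["Order Date"].max()  # reference date: latest order in the toy data
rfm_toy = toy.groupby("Customer ID").agg(
    Recency=("Order Date", lambda d: (snapshot - d.max()).days),
    Frequency=("Order ID", "count"),         # counts rows (line items), as in the notebook
    DistinctOrders=("Order ID", "nunique"),  # hypothetical: counts unique orders instead
    Monetary=("Sales", "sum"),
).reset_index()
print(rfm_toy)
# A: Recency 10, Frequency 3, DistinctOrders 2, Monetary 35.0
# B: Recency 0,  Frequency 1, DistinctOrders 1, Monetary 7.5
```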

In [5]:
# Run the pipeline: prepare, scale, cluster (3 clusters), profile
rfm_prepared = segmentation.prepare_rfm(rfm, "Customer ID", "Recency", "Frequency", "Monetary")
rfm_scaled   = segmentation.scale_rfm(rfm_prepared)
clusters, kmeans_model = segmentation.run_kmeans(rfm_scaled, n_clusters=3)
cluster_profile = segmentation.profile_clusters(rfm_prepared, clusters)

In [6]:
print(cluster_profile)

   Cluster     Recency  Frequency     Monetary  CustomerID
0        0   86.426667  19.680000  5568.311782         225
1        1  107.997812  10.210066  1848.001597         457
2        2  547.441441   8.108108  1799.945763         111
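
The printed profile appears to contain one row per cluster with average Recency, Frequency and Monetary values. The `CustomerID` column appears to hold the number of customers in each cluster (225 + 457 + 111 = 793). The notebook does not show what `profile_clusters()` actually computes. Assuming it computes these values, an equivalent pandas sketch with a hypothetical `profile` helper and made-up data is:

```python
import pandas as pd

def profile(prepared: pd.DataFrame, labels) -> pd.DataFrame:
    """Hypothetical helper: mean R, F, M per cluster plus the customer count."""
    df = prepared.assign(Cluster=list(labels))
    return (
        df.groupby("Cluster")
          .agg({"Recency": "mean", "Frequency": "mean", "Monetary": "mean", "CustomerID": "count"})
          .reset_index()
    )

prepared_toy = pd.DataFrame({
    "CustomerID": ["A", "B", "C", "D"],
    "Recency": [10, 20, 300, 500],
    "Frequency": [12, 8, 2, 4],
    "Monetary": [5000.0, 3000.0, 200.0, 400.0],
})
print(profile(prepared_toy, [0, 0, 1, 1]))
```

Read this way, the real profile shows cluster 0 with the most recent activity and the highest frequency and spend. Cluster 2 has gone roughly 550 days without an order on average.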


# Testing Summary

All four segmentation functions ran without errors on the RFM dataset:

- `prepare_rfm()` completed the column selection and renaming.  
- `scale_rfm()` produced standardized values for clustering.  
- `run_kmeans()` with `n_clusters=3` produced three non-empty clusters.  
- `profile_clusters()` returned a per-cluster summary of average Recency, Frequency and Monetary values with customer counts (225, 457 and 111).

This indicates that `segmentation.py` runs end to end and is ready for use in the main analysis notebook (`03_Segmentation.ipynb`).  
Future improvements may include parameter tuning for KMeans and visualization of cluster distributions.
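
One common way to tune the number of clusters is the elbow method. It computes the within-cluster sum of squares (inertia) for several values of k and looks for the point where adding clusters stops reducing inertia sharply. The sketch below uses a plain NumPy version of Lloyd's algorithm, so it does not depend on how `run_kmeans()` is implemented. `kmeans_inertia` is a hypothetical helper; with scikit-learn, the same quantity is available as the `inertia_` attribute of a fitted `KMeans` model. The toy points stand in for `rfm_scaled`:

```python
import numpy as np

def kmeans_inertia(X: np.ndarray, k: int, n_init: int = 5, max_iter: int = 100, seed: int = 0) -> float:
    """Hypothetical helper: best within-cluster sum of squares over n_init Lloyd's runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Initialize centroids at k distinct data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(max_iter):
            # Assign every point to its nearest centroid (squared Euclidean distance).
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = ((X - centroids[labels]) ** 2).sum()
        best = min(best, inertia)
    return float(best)

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
for k in range(1, 5):
    print(k, round(kmeans_inertia(X, k), 3))
```

On the toy data, inertia falls from about 302.7 at k = 1 to about 2.7 at k = 2 and changes little in absolute terms after that, so k = 2 is the elbow. For the real data, the curve would be computed from the scaled RFM matrix.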