## **5. Customer Segmentation Modeling**

### **5.1 Overview**
This notebook segments customers with K-means clustering on their RFM (Recency, Frequency, Monetary) features. The core logic is modularized in `src/clustering.py`, which the cells below import, so the same functions can be reused in production.

**Modeling Steps:**
1. Load customer features
2. Scale RFM features
3. Find the optimal number of clusters (elbow method, silhouette score)
4. Perform K-means clustering
5. Create cluster profiles and assign segment names
6. Visualize results
7. Save segments and trained models

**Production Usage:**
```python
from src.clustering import run_clustering_pipeline
customer_segments = run_clustering_pipeline(customer_features, n_clusters=3)
```
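For readers without the `src/` package at hand, the modeling steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the implementation of `run_clustering_pipeline`: the helper name `sketch_clustering_pipeline`, the column names `Recency`/`Frequency`/`Monetary`, and the `n_init`/`random_state` settings are assumptions, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def sketch_clustering_pipeline(df, rfm_cols=("Recency", "Frequency", "Monetary"),
                               n_clusters=3, random_state=42):
    """Hypothetical stand-in for run_clustering_pipeline: scale, cluster, label."""
    X = df[list(rfm_cols)].to_numpy(dtype=float)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per column
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out = df.copy()
    out["Cluster"] = kmeans.fit_predict(X_scaled)  # one integer label per customer
    return out, kmeans, scaler


# Synthetic customers for illustration only (not project data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Recency": rng.integers(1, 365, size=200),
    "Frequency": rng.integers(1, 50, size=200),
    "Monetary": rng.gamma(2.0, 500.0, size=200),
})
segments, demo_kmeans, demo_scaler = sketch_clustering_pipeline(demo)
print(segments["Cluster"].value_counts().sort_index())
```

Returning the fitted scaler alongside the model matters: new customers must be transformed with the same scaling before `predict`.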

### **5.2 Load Customer Features**

In [None]:
# Import libraries and load customer features
import pandas as pd
import numpy as np
import sys
import os

# Make the project's src/ directory importable; this assumes the notebook's
# working directory is one level below the project root
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'src'))

# Load customer features
df = pd.read_csv('../data/processed/Customer_RFM_Features.csv')
print(f"Customer features shape: {df.shape}")
print("\nFeature columns:")
print(df.columns.tolist())
df.head()
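Before scaling, it can help to confirm that the loaded frame has the columns clustering needs and no missing or negative values; scikit-learn's `KMeans` rejects NaN input. The helper below is hypothetical, and the column names are assumed to match those used in the scatter plots later in this notebook.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Recency", "Frequency", "Monetary"]  # assumed RFM column names


def check_rfm_frame(df, required=REQUIRED_COLUMNS):
    """Hypothetical sanity check before clustering; returns a list of problems."""
    problems = []
    missing = [c for c in required if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # later checks need all columns present
    n_null = int(df[required].isna().sum().sum())
    if n_null:
        problems.append(f"missing values in RFM columns: {n_null}")
    if (df[required] < 0).any().any():
        problems.append("negative RFM values")
    return problems


sample = pd.DataFrame({"Recency": [10, 200], "Frequency": [5, None], "Monetary": [120.0, 80.5]})
print(check_rfm_frame(sample))  # → ['missing values in RFM columns: 1']
```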

### **5.3 Feature Scaling**

In [None]:
# Select and scale RFM features for clustering
from feature_engineering import get_rfm_feature_columns
from sklearn.preprocessing import StandardScaler

# Get RFM features
rfm_features = get_rfm_feature_columns()
print(f"RFM features for clustering: {rfm_features}")

# Extract RFM data
X = df[rfm_features].copy()
print(f"\nRFM data shape: {X.shape}")

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(f"Scaled features shape: {X_scaled.shape}")

# Display scaled features
scaled_df = pd.DataFrame(X_scaled, columns=rfm_features)
print("\nSample scaled features:")
print(scaled_df.head())
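Scaling matters because K-means uses Euclidean distance: unscaled, `Monetary` (currency units) would dominate `Frequency` (a count of orders). `StandardScaler` computes z = (x - mean) / std per column, using the population standard deviation (ddof = 0). The toy check below, with made-up values, reproduces it by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy RFM matrix: columns are Recency (days), Frequency (orders), Monetary (currency)
X_toy = np.array([[10, 2, 150.0],
                  [40, 8, 900.0],
                  [200, 1, 50.0],
                  [5, 20, 4000.0]])

scaled = StandardScaler().fit_transform(X_toy)

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0)
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(scaled, manual))       # → True
print(np.round(scaled.mean(axis=0), 6))  # each column mean is ~0
print(np.round(scaled.std(axis=0), 6))   # each column std is 1
```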

### **5.4 Find Optimal Clusters**

In [None]:
# Find optimal number of clusters using our modular function
from clustering import find_optimal_clusters, plot_elbow_method, plot_silhouette_scores

cluster_range, wcss, silhouette_scores = find_optimal_clusters(X_scaled)
print(f"Cluster range: {cluster_range}")
print(f"WCSS values: {wcss}")
print(f"Silhouette scores: {silhouette_scores}")

# Note: Functions available in src/clustering.py
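`find_optimal_clusters` returns a range of k, the WCSS (within-cluster sum of squares, exposed by scikit-learn as `KMeans.inertia_`) and the silhouette score for each k. A plausible equivalent is sketched below; the helper name, the k range 2 to 8, `n_init=10` and `random_state=42` are assumptions, not the module's actual defaults. The synthetic data have three well-separated groups, so the silhouette should peak at k = 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def sketch_find_optimal_clusters(X, k_min=2, k_max=8, random_state=42):
    """Hypothetical equivalent of find_optimal_clusters: WCSS and silhouette per k."""
    cluster_range = list(range(k_min, k_max + 1))  # silhouette needs k >= 2
    wcss, sil = [], []
    for k in cluster_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        wcss.append(km.inertia_)                     # within-cluster sum of squares
        sil.append(silhouette_score(X, km.labels_))  # in [-1, 1], higher is better
    return cluster_range, wcss, sil


# Synthetic data: three tight, well-separated groups (illustration only)
X_demo, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]],
                       cluster_std=0.5, random_state=0)
ks, wcss_demo, sil_demo = sketch_find_optimal_clusters(X_demo)
print(ks[int(np.argmax(sil_demo))])  # → 3
```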

In [None]:
# Plot Elbow Method
plot_elbow_method(cluster_range, wcss, save_path='../data/processed/images/elbow_method_notebook.png')

# Plot Silhouette Scores
plot_silhouette_scores(cluster_range, silhouette_scores,
                       save_path='../data/processed/images/silhouette_scores_notebook.png')

# Determine optimal k (highest silhouette score)
optimal_k = cluster_range[np.argmax(silhouette_scores)]
print(f"Optimal number of clusters: {optimal_k} (Silhouette Score: {max(silhouette_scores):.3f})")
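The elbow plot is read by eye. A common simple heuristic locates the elbow numerically: normalize both axes and pick the k whose (k, WCSS) point lies farthest from the straight line joining the first and last points. The helper `elbow_by_max_distance` and the WCSS values below are hypothetical; treat the result as a cross-check on the silhouette choice, not a replacement for it.

```python
import numpy as np


def elbow_by_max_distance(ks, wcss):
    """Pick the k whose (k, WCSS) point lies farthest from the chord joining
    the first and last points (hypothetical elbow heuristic)."""
    ks = np.asarray(ks, dtype=float)
    w = np.asarray(wcss, dtype=float)
    # Normalize both axes to [0, 1] so their units do not dominate the distance
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (w - w.min()) / (w.max() - w.min())
    # Perpendicular distance of each point to the line through the endpoints
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])


print(elbow_by_max_distance([2, 3, 4, 5, 6, 7], [1000, 400, 300, 250, 220, 200]))  # → 3
```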

### **5.5 Perform K-means Clustering**

In [None]:
# Perform clustering using our modular function
from clustering import perform_kmeans_clustering

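# Fix k = 3 to match the production pipeline (n_clusters=3). This overrides the
# silhouette-based optimal_k from Section 5.4, so confirm the two agree.
optimal_k = 3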
kmeans = perform_kmeans_clustering(X_scaled, optimal_k)

# Add cluster assignments to dataframe
df['Cluster'] = kmeans.labels_
print(f"Clustering completed with {optimal_k} clusters")
print("\nCluster distribution:")
print(df['Cluster'].value_counts().sort_index())

# Note: Function available in src/clustering.py
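`perform_kmeans_clustering` returns a fitted model whose `labels_` are attached to the dataframe. Assuming it wraps scikit-learn's `KMeans` (an assumption; the module is not shown), the sketch below shows why a fixed `random_state` matters: with it, reruns give identical labels. The integer ids themselves remain arbitrary, which is why segments are named from the cluster profiles rather than from the ids.

```python
import numpy as np
from sklearn.cluster import KMeans


def sketch_perform_kmeans(X_scaled, n_clusters, random_state=42):
    """Hypothetical equivalent of perform_kmeans_clustering: returns a fitted KMeans."""
    # n_init=10 restarts from different centroid seeds and keeps the lowest-inertia run;
    # a fixed random_state makes the result reproducible across notebook reruns.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X_scaled)


# Three synthetic groups in 3-D, standing in for scaled RFM data
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(loc, 0.3, size=(50, 3)) for loc in (-2.0, 0.0, 2.0)])

run_a = sketch_perform_kmeans(X_demo, 3)
run_b = sketch_perform_kmeans(X_demo, 3)
print(np.array_equal(run_a.labels_, run_b.labels_))  # → True (same seed, same labels)
print(run_a.cluster_centers_.shape)                  # → (3, 3): one centroid per cluster
```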

### **5.6 Create Cluster Profiles**

In [None]:
# Create cluster profiles using our modular function
from clustering import create_cluster_profiles

cluster_profile = create_cluster_profiles(df)
print("Cluster Profiles:")
print(cluster_profile)

# Note: Function available in src/clustering.py
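A cluster profile is typically the mean of each RFM feature per cluster plus the cluster size. The sketch below is a hypothetical version of `create_cluster_profiles` built on `DataFrame.groupby`; the real function may compute different statistics, and the toy data are invented.

```python
import pandas as pd


def sketch_cluster_profiles(df, rfm_cols=("Recency", "Frequency", "Monetary")):
    """Hypothetical equivalent of create_cluster_profiles: mean RFM and size per cluster."""
    cols = list(rfm_cols)
    profile = df.groupby("Cluster")[cols].mean()
    profile["Count"] = df.groupby("Cluster").size()  # customers per cluster
    profile["Share"] = profile["Count"] / len(df)    # fraction of all customers
    return profile.round(2)


demo = pd.DataFrame({
    "Recency":   [10, 20, 300, 280, 15],
    "Frequency": [12, 10, 1, 2, 30],
    "Monetary":  [900.0, 700.0, 40.0, 60.0, 5000.0],
    "Cluster":   [0, 0, 1, 1, 2],
})
print(sketch_cluster_profiles(demo))
```

Reading the output: cluster 1 combines high recency (long since last purchase) with low frequency and spend, while cluster 2 is a single frequent, high-spending customer.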

### **5.7 Define Segment Names**

In [None]:
# Define segment names based on RFM characteristics
from clustering import define_segment_names, assign_segments

segment_names = define_segment_names(cluster_profile)
print("Segment Names:")
for cluster_id, segment_name in segment_names.items():
    print(f"Cluster {cluster_id}: {segment_name}")

# Assign segments to customers
df = assign_segments(df, segment_names)
print("\nSegment distribution:")
print(df['Segment'].value_counts())

# Note: Functions available in src/clustering.py
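Because K-means ids carry no meaning, segment names should be derived from the profile values. Below is a minimal rule-based sketch, assuming three clusters ranked by mean `Monetary`; the labels `Low-Value`, `Mid-Value` and `High-Value` are hypothetical and not necessarily what `define_segment_names` produces. Mapping the resulting dict over the `Cluster` column is one plausible reading of what `assign_segments` does.

```python
import pandas as pd


def sketch_segment_names(profile):
    """Hypothetical rule-based naming: rank clusters by mean Monetary.
    Names are keyed to profile values, not to the arbitrary integer ids."""
    labels = ["Low-Value", "Mid-Value", "High-Value"]  # assumed names; only valid for k = 3
    order = profile["Monetary"].sort_values().index     # cluster ids, lowest to highest spend
    return {int(cid): labels[rank] for rank, cid in enumerate(order)}


profile = pd.DataFrame(
    {"Recency": [15.0, 290.0, 15.0], "Frequency": [11.0, 1.5, 30.0],
     "Monetary": [800.0, 50.0, 5000.0]},
    index=pd.Index([0, 1, 2], name="Cluster"),
)
names = sketch_segment_names(profile)
print(names)  # → {1: 'Low-Value', 0: 'Mid-Value', 2: 'High-Value'}

customers = pd.DataFrame({"Cluster": [0, 2, 1, 1]})
customers["Segment"] = customers["Cluster"].map(names)  # attach a name to each customer
print(customers["Segment"].tolist())  # → ['Mid-Value', 'High-Value', 'Low-Value', 'Low-Value']
```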

### **5.8 Visualize Segments**


In [None]:
# Create RFM segment visualizations
from visualization import plot_rfm_segments, plot_cluster_scatter

# Plot RFM characteristics by segment
plot_rfm_segments(df, save_path='../data/processed/images/rfm_segments_notebook.png')

# Plot cluster scatter plots
plot_cluster_scatter(df, 'Frequency', 'Monetary',
                     save_path='../data/processed/images/frequency_monetary_scatter_notebook.png')

plot_cluster_scatter(df, 'Recency', 'Monetary',
                     save_path='../data/processed/images/recency_monetary_scatter_notebook.png')

# Note: Functions available in src/visualization.py
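`plot_cluster_scatter` draws two RFM features against each other, colored by segment. A minimal matplotlib sketch of such a plot follows, assuming a `Segment` column as created in Section 5.7; the function name, styling and toy data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def sketch_cluster_scatter(df, x_col, y_col, hue_col="Segment", save_path=None):
    """Hypothetical stand-in for plot_cluster_scatter: one color per segment."""
    fig, ax = plt.subplots(figsize=(7, 5))
    for name, group in df.groupby(hue_col):  # one scatter layer per segment
        ax.scatter(group[x_col], group[y_col], label=str(name), alpha=0.7, s=20)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(f"{y_col} vs {x_col} by {hue_col}")
    ax.legend(title=hue_col)
    if save_path:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    return fig, ax


demo = pd.DataFrame({
    "Frequency": [12, 10, 1, 2, 30],
    "Monetary": [900.0, 700.0, 40.0, 60.0, 5000.0],
    "Segment": ["Mid-Value", "Mid-Value", "Low-Value", "Low-Value", "High-Value"],
})
fig, ax = sketch_cluster_scatter(demo, "Frequency", "Monetary")
print(len(ax.collections))  # → 3: one scatter layer per segment
```

If `Monetary` is heavy-tailed, calling `ax.set_yscale("log")` on the returned axes keeps the lower-spending segments readable.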

### **5.9 Save Results and Models**

In [None]:
# Save clustering results and models
from clustering import save_model

# Save customer segments
df.to_csv('../data/processed/Customer_Segments.csv', index=False)
print("Customer segments saved to: ../data/processed/Customer_Segments.csv")

# Save trained models
save_model(kmeans, scaler,
           '../models/kmeans_customer_segmentation_notebook.pkl',
           '../models/scaler_customer_segmentation_notebook.pkl')

print("\nModels saved successfully!")
print(f"Final segments shape: {df.shape}")
print(f"Segments created: {df['Segment'].nunique()}")
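To score new customers later, reload both artifacts and apply the fitted scaler's `transform` (not `fit_transform`) before `predict`, so that new data are standardized with the training mean and standard deviation. The sketch assumes `save_model` writes standard pickle files, which the `.pkl` extensions suggest but the module does not confirm; `load_and_assign` and the toy models are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def load_and_assign(kmeans_path, scaler_path, X_new):
    """Hypothetical scoring helper: reuse the fitted scaler, then predict clusters.
    Only unpickle files from a trusted source; pickle can execute arbitrary code."""
    with open(kmeans_path, "rb") as f:
        kmeans = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    # transform (not fit_transform): new customers must use the training mean/std
    return kmeans.predict(scaler.transform(X_new))


# Round trip with toy models; in the notebook the .pkl files come from save_model
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(loc, 1.0, size=(40, 3)) for loc in (0.0, 10.0)])
toy_scaler = StandardScaler().fit(X_train)
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_scaler.transform(X_train))

with tempfile.TemporaryDirectory() as tmp:
    k_path, s_path = os.path.join(tmp, "kmeans.pkl"), os.path.join(tmp, "scaler.pkl")
    for obj, path in ((toy_kmeans, k_path), (toy_scaler, s_path)):
        with open(path, "wb") as f:
            pickle.dump(obj, f)
    preds = load_and_assign(k_path, s_path, X_train[:5])

print(np.array_equal(preds, toy_kmeans.labels_[:5]))  # → True
```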