# 🎯 Customer Segmentation with RFM Analysis & K-Means Clustering

## Objective
Segment 96K+ e-commerce customers into distinct groups using RFM (Recency, Frequency, Monetary) metrics and K-Means clustering to enable targeted marketing strategies.

## Business Value
- Identify high-value customer segments
- Personalize marketing campaigns
- Optimize customer retention strategies
- Increase customer lifetime value

## Technical Approach
1. Extract customer transaction data from data warehouse
2. Calculate RFM metrics
3. Outlier removal & feature standardization
4. Determine optimal number of clusters (elbow method, silhouette score, Davies-Bouldin index)
5. Apply K-Means clustering
6. Analyze and visualize segments
7. Generate business recommendations

---
## 📦 1. Setup & Imports

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import sys
import os

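# Resolve the project root two levels up; falls back to the working directory when __file__ is undefined (e.g. in a notebook)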
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__) if '__file__' in globals() else os.getcwd(), '../..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

from src.utils.db_connection import DatabaseConnection
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
import pickle
import warnings
warnings.filterwarnings('ignore')

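sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Create output folders up front so the savefig, to_csv and pickle calls below do not fail
os.makedirs('outputs/visualizations', exist_ok=True)
os.makedirs('models/saved_models', exist_ok=True)

print("✅ All libraries imported successfully!")
print(f"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
print(f"📁 Project Root: {project_root}")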

✅ All libraries imported successfully!
📅 Analysis Date: 2025-11-05 17:54
📁 Project Root: /Users/rajkaranyp/Documents/streamcommerce-analytics


---
## 📊 2. Data Extraction

Extract customer transaction data from the data warehouse to calculate RFM metrics.

In [None]:
query = """
SELECT 
    c.customer_id,
    c.customer_state,
    c.customer_city,
    o.order_id,
    o.order_purchase_timestamp,
    SUM(oi.price) as order_value
FROM dim_customers c
JOIN fact_orders o ON c.customer_key = o.customer_key
JOIN fact_order_items oi ON o.order_key = oi.order_key
WHERE o.order_status = 'delivered'
GROUP BY c.customer_id, c.customer_state, c.customer_city, o.order_id, o.order_purchase_timestamp
ORDER BY c.customer_id, o.order_purchase_timestamp DESC;
"""

print("🔍 Extracting customer data from data warehouse...")

with DatabaseConnection() as db:
    df_transactions = pd.read_sql(query, db.conn)

print(f"✅ Extracted {len(df_transactions):,} transactions")
print(f"   Unique customers: {df_transactions['customer_id'].nunique():,}")
print(f"   Unique orders: {df_transactions['order_id'].nunique():,}")

display(df_transactions.head(10))

🔍 Extracting customer data from data warehouse...
💻 Detected host environment
🔍 Connecting to PostgreSQL at localhost:5433...
✅ Connected to database: ecommerce_db @ localhost:5433
✅ Database connection closed
✅ Extracted 96,478 transactions
   Unique customers: 96,478

📋 Sample data:

| | customer_id | customer_state | customer_city | order_purchase_timestamp | order_value | num_orders |
|---|---|---|---|---|---|---|
| 0 | 00012a2ce6f8dcda20d059ce98491703 | SP | osasco | 2017-11-14 16:08:26 | 89.8 | 1 |
| 1 | 000161a058600d5901f007fab4c27140 | MG | itapecerica | 2017-07-16 09:40:32 | 54.9 | 1 |
| 2 | 0001fd6190edaaf884bcaf3d49edf079 | ES | nova venecia | 2017-02-28 11:06:43 | 179.99 | 1 |
| 3 | 0002414f95344307404f0ace7a26f1d5 | MG | mendonca | 2017-08-16 13:09:20 | 149.9 | 1 |
| 4 | 000379cdec625522490c315e70c7a9fb | SP | sao paulo | 2018-04-02 13:42:17 | 93.0 | 1 |
| 5 | 0004164d20a9e969af783496f3408652 | SP | valinhos | 2017-04-12 08:35:12 | 59.99 | 1 |
| 6 | 000419c5494106c306a97b5635748086 | RJ | niteroi | 2018-03-02 17:47:40 | 34.3 | 1 |
| 7 | 00046a560d407e99b969756e0b10f282 | RJ | rio de janeiro | 2017-12-18 11:08:30 | 120.9 | 1 |
| 8 | 00050bf6e01e69d5c0fd612f1bcfb69c | RS | ijui | 2017-09-17 16:04:44 | 69.99 | 1 |
| 9 | 000598caf2ef4117407665ac33275130 | MG | oliveira | 2018-08-11 12:14:35 | 1107.0 | 1 |
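
The run above reports 96,478 transactions for 96,478 unique customers, and every sampled row has `num_orders` equal to 1. If `customer_id` is issued per order rather than per person, frequency is 1 for everyone and adds no signal to the clustering. The sketch below is one possible sanity check on toy data; the helper name `repeat_customer_share` is hypothetical and not part of the project.

```python
import pandas as pd


def repeat_customer_share(df, customer_col="customer_id", order_col="order_id"):
    """Return the fraction of customers with more than one distinct order."""
    orders_per_customer = df.groupby(customer_col)[order_col].nunique()
    return float((orders_per_customer > 1).mean())


# Toy data: customer "a" ordered twice, "b" and "c" once each
toy_orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c"],
    "order_id": ["o1", "o2", "o3", "o4"],
})

share = repeat_customer_share(toy_orders)
print(f"Repeat customers: {share:.1%}")
```

A share of 0.0 on the real extract would suggest grouping by a per-person key instead, if the warehouse exposes one.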


---
## 🧮 3. RFM Calculation

Calculate Recency, Frequency, and Monetary metrics for each customer:
- **Recency**: Days since last purchase (lower is better)
- **Frequency**: Number of purchases (higher is better)
- **Monetary**: Total amount spent (higher is better)
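
The three definitions above can be worked through by hand on a toy table. This is a minimal sketch with made-up orders (the data and the name `rfm_toy` are illustrative), using pandas named aggregation and a reference date one day after the latest order.

```python
import pandas as pd

# Made-up orders: customer "a" bought twice, customer "b" once
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "order_id": ["o1", "o2", "o3"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-01", "2018-01-10", "2018-01-05"]
    ),
    "order_value": [50.0, 30.0, 120.0],
})

# Reference date: one day after the most recent order in the data
analysis_date = orders["order_purchase_timestamp"].max() + pd.Timedelta(days=1)

rfm_toy = orders.groupby("customer_id", as_index=False).agg(
    last_purchase=("order_purchase_timestamp", "max"),
    frequency=("order_id", "nunique"),
    monetary=("order_value", "sum"),
)
# Recency is the whole number of days between the reference date and the last order
rfm_toy["recency"] = (analysis_date - rfm_toy["last_purchase"]).dt.days
rfm_toy = rfm_toy[["customer_id", "recency", "frequency", "monetary"]]
print(rfm_toy)
```

Customer "a" ends up with recency 1, frequency 2 and monetary 80.0; customer "b" with recency 6, frequency 1 and monetary 120.0.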

In [None]:
df_transactions['order_purchase_timestamp'] = pd.to_datetime(df_transactions['order_purchase_timestamp'])

analysis_date = df_transactions['order_purchase_timestamp'].max() + timedelta(days=1)
print(f"📅 Analysis Date: {analysis_date.date()}")

print("\n🧮 Calculating RFM metrics...")

# Named aggregation labels every output column explicitly, so no positional rename is needed
rfm = df_transactions.groupby('customer_id', as_index=False).agg(
    last_purchase=('order_purchase_timestamp', 'max'),
    frequency=('order_id', 'nunique'),  # Count unique orders
    monetary=('order_value', 'sum'),
)
# Recency: whole days between the analysis date and the customer's last order
rfm['recency'] = (analysis_date - rfm['last_purchase']).dt.days
rfm = rfm[['customer_id', 'recency', 'frequency', 'monetary']]

customer_location = df_transactions.groupby('customer_id')[['customer_state', 'customer_city']].first().reset_index()
rfm = rfm.merge(customer_location, on='customer_id')

print(f"✅ RFM calculated for {len(rfm):,} customers")
display(rfm[['recency', 'frequency', 'monetary']].describe())
display(rfm.head(10))

📅 Analysis Date: 2018-08-30
   (1 day after last transaction)

🧮 Calculating RFM metrics...


ValueError: cannot insert customer_id, already exists

---
## 📈 4. Exploratory Data Analysis

In [None]:
# Distribution plots
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Recency
axes[0].hist(rfm['recency'], bins=50, color='skyblue', edgecolor='black')
axes[0].set_title('Distribution of Recency (Days)', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Days Since Last Purchase')
axes[0].set_ylabel('Number of Customers')
axes[0].axvline(rfm['recency'].median(), color='red', linestyle='--', label=f'Median: {rfm["recency"].median():.0f} days')
axes[0].legend()

# Frequency
axes[1].hist(rfm['frequency'], bins=30, color='lightgreen', edgecolor='black')
axes[1].set_title('Distribution of Frequency', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Number of Orders')
axes[1].set_ylabel('Number of Customers')
axes[1].axvline(rfm['frequency'].median(), color='red', linestyle='--', label=f'Median: {rfm["frequency"].median():.0f} orders')
axes[1].legend()

# Monetary
axes[2].hist(rfm['monetary'], bins=50, color='salmon', edgecolor='black')
axes[2].set_title('Distribution of Monetary Value', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Total Spending (R$)')
axes[2].set_ylabel('Number of Customers')
axes[2].axvline(rfm['monetary'].median(), color='red', linestyle='--', label=f'Median: R$ {rfm["monetary"].median():.2f}')
axes[2].legend()

plt.tight_layout()
plt.savefig('outputs/visualizations/rfm_distributions.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Distributions visualized")

In [None]:
# RFM Correlation Analysis
plt.figure(figsize=(8, 6))
correlation = rfm[['recency', 'frequency', 'monetary']].corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('RFM Metrics Correlation Matrix', fontsize=14, fontweight='bold')
plt.savefig('outputs/visualizations/rfm_correlation.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Correlation matrix created")
print("\n📊 Key Insights:")
print(f"   Recency ↔ Frequency: {correlation.loc['recency', 'frequency']:.3f}")
print(f"   Recency ↔ Monetary:  {correlation.loc['recency', 'monetary']:.3f}")
print(f"   Frequency ↔ Monetary: {correlation.loc['frequency', 'monetary']:.3f}")

---
## 🔧 5. Feature Engineering & Preprocessing
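
The outlier filter in this section keeps values inside the range from Q1 - m × IQR to Q3 + m × IQR. As a small worked example with made-up spending values (not project data), and NumPy's default linear-interpolation percentiles, the bounds come out as follows:

```python
import numpy as np

# Made-up spending values with one obvious outlier
values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 1000], dtype=float)

q1, q3 = np.percentile(values, [25, 75])  # 30.0 and 70.0 for this array
iqr = q3 - q1                             # 40.0
multiplier = 1.5

lower = q1 - multiplier * iqr             # 30 - 60 = -30.0
upper = q3 + multiplier * iqr             # 70 + 60 = 130.0

kept = values[(values >= lower) & (values <= upper)]
print(f"bounds=({lower}, {upper}), kept {len(kept)} of {len(values)} values")
```

Only the value 1000 falls outside the bounds. A larger multiplier widens the range and removes fewer rows.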

In [None]:
# Handle outliers using the IQR method
def remove_outliers(df, column, multiplier=1.5):
    """Keep rows whose `column` value lies within [Q1 - m*IQR, Q3 + m*IQR]."""
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    
    rows_before = len(df)
    df_filtered = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    outliers_removed = rows_before - len(df_filtered)
    
    print(f"   {column}: Removed {outliers_removed:,} outliers ({outliers_removed/rows_before*100:.2f}%)")
    return df_filtered

print("🧹 Handling outliers...")
rfm_clean = rfm.copy()
initial_count = len(rfm_clean)

rfm_clean = remove_outliers(rfm_clean, 'monetary', multiplier=2.0)  # More lenient for monetary
rfm_clean = remove_outliers(rfm_clean, 'recency', multiplier=1.5)

print(f"\n✅ Cleaned dataset: {len(rfm_clean):,} customers ({len(rfm_clean)/initial_count*100:.1f}% retained)")

In [None]:
# Feature scaling (standardization)
print("\n📏 Standardizing features...")

# Select features for clustering
features = ['recency', 'frequency', 'monetary']
X = rfm_clean[features].values

# Standardize features (mean=0, std=1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("✅ Features standardized")
print(f"   Shape: {X_scaled.shape}")
print(f"   Mean: {X_scaled.mean(axis=0)}")
print(f"   Std: {X_scaled.std(axis=0)}")

---
## 🎯 6. Determine Optimal Number of Clusters

Use the **Elbow Method**, **Silhouette Score**, and **Davies-Bouldin Index** to find the best K.
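
The silhouette score compares pairwise distances, so on tens of thousands of customers it can be slow. A minimal sketch of one way to keep the K sweep cheap, assuming synthetic blobs as a stand-in for the scaled RFM matrix (`X_demo` and `scores` are illustrative names), is to pass `sample_size` to `silhouette_score` so it scores a random subset:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled RFM matrix: three well-separated blobs
X_demo, _ = make_blobs(
    n_samples=3000,
    centers=[[-6, -6, -6], [0, 0, 0], [6, 6, 6]],
    cluster_std=0.5,
    random_state=42,
)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    # sample_size scores a random subset of points instead of the full dataset
    scores[k] = silhouette_score(X_demo, labels, sample_size=1000, random_state=42)

best_k = max(scores, key=scores.get)
print(f"Best K by sampled silhouette: {best_k}")
```

The sampled score is an estimate, so fixing `random_state` keeps the comparison across K values reproducible.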

In [None]:
print("🔍 Finding optimal number of clusters...")

# Test different values of K
k_range = range(2, 11)
inertias = []
silhouette_scores = []
davies_bouldin_scores = []

for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    
    inertias.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(X_scaled, kmeans.labels_))
    davies_bouldin_scores.append(davies_bouldin_score(X_scaled, kmeans.labels_))
    
    print(f"   K={k}: Inertia={kmeans.inertia_:.2f}, Silhouette={silhouette_scores[-1]:.3f}, DB={davies_bouldin_scores[-1]:.3f}")

print("\n✅ Evaluation complete!")

In [None]:
# Visualization of optimal K
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Elbow curve
axes[0].plot(k_range, inertias, 'bo-', linewidth=2, markersize=8)
axes[0].set_title('Elbow Method', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Number of Clusters (K)')
axes[0].set_ylabel('Inertia (Within-Cluster Sum of Squares)')
axes[0].grid(True, alpha=0.3)

# Silhouette score
axes[1].plot(k_range, silhouette_scores, 'go-', linewidth=2, markersize=8)
axes[1].set_title('Silhouette Score', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Number of Clusters (K)')
axes[1].set_ylabel('Silhouette Score (Higher is Better)')
axes[1].grid(True, alpha=0.3)
axes[1].axhline(y=0.5, color='r', linestyle='--', label='Good threshold (0.5)')
axes[1].legend()

# Davies-Bouldin score
axes[2].plot(k_range, davies_bouldin_scores, 'ro-', linewidth=2, markersize=8)
axes[2].set_title('Davies-Bouldin Index', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Number of Clusters (K)')
axes[2].set_ylabel('DB Index (Lower is Better)')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('outputs/visualizations/optimal_k_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

# Recommend optimal K
optimal_k_silhouette = k_range[np.argmax(silhouette_scores)]
print(f"\n🎯 RECOMMENDED K: {optimal_k_silhouette}")
print(f"   Based on highest Silhouette Score: {max(silhouette_scores):.3f}")

---
## 🤖 7. Train K-Means Model

In [None]:
# Train the final model. K is set by hand after reviewing the elbow, silhouette and Davies-Bouldin plots above
optimal_k = 4  # Adjust based on the plots above (usually 3-5)

print(f"🤖 Training K-Means model with K={optimal_k}...")

kmeans_final = KMeans(n_clusters=optimal_k, random_state=42, n_init=20, max_iter=300)
rfm_clean['cluster'] = kmeans_final.fit_predict(X_scaled)

print("✅ Model trained successfully!")
print("\n📊 Cluster Distribution:")
cluster_counts = rfm_clean['cluster'].value_counts().sort_index()
for cluster, count in cluster_counts.items():
    percentage = (count / len(rfm_clean)) * 100
    print(f"   Cluster {cluster}: {count:,} customers ({percentage:.1f}%)")

# Model evaluation
final_silhouette = silhouette_score(X_scaled, rfm_clean['cluster'])
final_db = davies_bouldin_score(X_scaled, rfm_clean['cluster'])

print("\n📈 Model Performance:")
print(f"   Silhouette Score: {final_silhouette:.3f} {'✅ Good' if final_silhouette > 0.5 else '⚠️ Moderate'}")
print(f"   Davies-Bouldin Index: {final_db:.3f} {'✅ Good' if final_db < 1.0 else '⚠️ Moderate'}")

---
## 🔍 8. Cluster Analysis & Profiling
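
K-Means centroids live in standardized space, which makes them hard to read directly. A minimal sketch, on synthetic two-feature data with hypothetical names (`X_demo`, `scaler_demo`, `kmeans_demo`), of mapping centroids back to original units with `StandardScaler.inverse_transform`:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customers: one recent high-spend group, one lapsed low-spend group
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal([20, 500], [2, 20], size=(100, 2)),
    rng.normal([200, 50], [10, 5], size=(100, 2)),
])
features_demo = ["recency", "monetary"]

scaler_demo = StandardScaler()
X_scaled_demo = scaler_demo.fit_transform(X_demo)
kmeans_demo = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X_scaled_demo)

# Undo the standardization so each centroid reads in days and currency units
centers = pd.DataFrame(
    scaler_demo.inverse_transform(kmeans_demo.cluster_centers_),
    columns=features_demo,
).round(1)
print(centers)
```

With the objects defined in this notebook, the corresponding call would presumably be `scaler.inverse_transform(kmeans_final.cluster_centers_)`.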

In [None]:
# Calculate cluster profiles
print("📊 Cluster Profiles:\n")

cluster_profiles = rfm_clean.groupby('cluster').agg({
    'recency': ['mean', 'median'],
    'frequency': ['mean', 'median'],
    'monetary': ['mean', 'median', 'sum'],
    'customer_id': 'count'
}).round(2)

cluster_profiles.columns = ['_'.join(col).strip() for col in cluster_profiles.columns.values]
cluster_profiles = cluster_profiles.rename(columns={'customer_id_count': 'size'})

cluster_profiles

In [None]:
# Assign meaningful names to clusters based on their average recency and spend
# Start from placeholder names for however many clusters were trained
cluster_names = {cluster: f'Cluster {cluster}' for cluster in range(optimal_k)}

# Population-level thresholds, computed once
monetary_q50 = rfm_clean['monetary'].quantile(0.50)
monetary_q75 = rfm_clean['monetary'].quantile(0.75)
recency_q50 = rfm_clean['recency'].quantile(0.50)
recency_q75 = rfm_clean['recency'].quantile(0.75)

# Analyze cluster characteristics to assign names
for cluster in range(optimal_k):
    cluster_data = rfm_clean[rfm_clean['cluster'] == cluster]
    avg_recency = cluster_data['recency'].mean()
    avg_monetary = cluster_data['monetary'].mean()
    
    # Rules are checked in priority order; two clusters can receive the same name
    if avg_monetary > monetary_q75:
        if avg_recency < recency_q50:
            cluster_names[cluster] = '💎 VIP Champions'
        else:
            cluster_names[cluster] = '👑 High Spenders (At Risk)'
    elif avg_monetary > monetary_q50:
        cluster_names[cluster] = '🌟 Loyal Customers'
    elif avg_recency > recency_q75:
        cluster_names[cluster] = '😴 Lost/Churned'
    else:
        cluster_names[cluster] = '🆕 Potential Loyalists'

rfm_clean['segment_name'] = rfm_clean['cluster'].map(cluster_names)

print("✅ Cluster names assigned:")
for cluster, name in cluster_names.items():
    print(f"   Cluster {cluster}: {name}")

---
## 📊 9. Visualization of Clusters

In [None]:
# 2D visualization using PCA
print("🎨 Creating 2D cluster visualization...")

pca_2d = PCA(n_components=2)
X_pca_2d = pca_2d.fit_transform(X_scaled)

plt.figure(figsize=(12, 8))
scatter = plt.scatter(X_pca_2d[:, 0], X_pca_2d[:, 1], 
                     c=rfm_clean['cluster'], cmap='viridis', 
                     alpha=0.6, s=50, edgecolors='black', linewidth=0.5)

# Add cluster centers
centers_pca = pca_2d.transform(kmeans_final.cluster_centers_)
plt.scatter(centers_pca[:, 0], centers_pca[:, 1], 
           c='red', marker='X', s=500, edgecolors='black', linewidth=2,
           label='Cluster Centers')

plt.title('Customer Segments (2D PCA Projection)', fontsize=16, fontweight='bold')
plt.xlabel(f'PC1 ({pca_2d.explained_variance_ratio_[0]*100:.1f}% variance)')
plt.ylabel(f'PC2 ({pca_2d.explained_variance_ratio_[1]*100:.1f}% variance)')
plt.colorbar(scatter, label='Cluster')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('outputs/visualizations/clusters_2d_pca.png', dpi=300, bbox_inches='tight')
plt.show()

print(f"✅ Total variance explained: {pca_2d.explained_variance_ratio_.sum()*100:.1f}%")

In [None]:
# 3D visualization
print("\n🎨 Creating 3D cluster visualization...")

fig = plt.figure(figsize=(14, 10))
ax = fig.add_subplot(111, projection='3d')

scatter = ax.scatter(rfm_clean['recency'], 
                     rfm_clean['frequency'], 
                     rfm_clean['monetary'],
                     c=rfm_clean['cluster'], cmap='viridis',
                     alpha=0.6, s=50, edgecolors='black', linewidth=0.5)

ax.set_title('Customer Segments in RFM Space', fontsize=16, fontweight='bold')
ax.set_xlabel('Recency (days)')
ax.set_ylabel('Frequency (orders)')
ax.set_zlabel('Monetary (R$)')

plt.colorbar(scatter, label='Cluster', pad=0.1)
plt.savefig('outputs/visualizations/clusters_3d_rfm.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ 3D visualization created")

In [None]:
# Segment size and revenue contribution
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Segment sizes
segment_sizes = rfm_clean['segment_name'].value_counts()
colors = plt.cm.viridis(np.linspace(0, 1, len(segment_sizes)))
# Fix one color per segment so both panels show the same segment in the same color
segment_colors = dict(zip(segment_sizes.index, colors))

axes[0].pie(segment_sizes.values, labels=segment_sizes.index, autopct='%1.1f%%',
           colors=colors, startangle=90, textprops={'fontsize': 10})
axes[0].set_title('Customer Distribution by Segment', fontsize=14, fontweight='bold')

# Revenue contribution (sorted by revenue, so colors are looked up by segment name)
segment_revenue = rfm_clean.groupby('segment_name')['monetary'].sum().sort_values(ascending=False)
axes[1].bar(range(len(segment_revenue)), segment_revenue.values,
            color=[segment_colors[name] for name in segment_revenue.index])
axes[1].set_title('Total Revenue by Segment', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Segment')
axes[1].set_ylabel('Total Revenue (R$)')
axes[1].set_xticks(range(len(segment_revenue)))
axes[1].set_xticklabels(segment_revenue.index, rotation=45, ha='right')
axes[1].grid(axis='y', alpha=0.3)

# Add values on bars
for i, v in enumerate(segment_revenue.values):
    axes[1].text(i, v, f'R$ {v:,.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.savefig('outputs/visualizations/segment_distribution_revenue.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Segment distribution visualized")

---
## 💼 10. Business Recommendations

In [None]:
# Generate detailed segment analysis
print("="*80)
print("💼 BUSINESS INSIGHTS & RECOMMENDATIONS")
print("="*80)
print()

for cluster in range(optimal_k):
    segment_data = rfm_clean[rfm_clean['cluster'] == cluster]
    segment_name = cluster_names[cluster]
    
    print(f"\n{'='*80}")
    print(f"  {segment_name}")
    print(f"{'='*80}")
    print(f"\n📊 Segment Size: {len(segment_data):,} customers ({len(segment_data)/len(rfm_clean)*100:.1f}%)")
    print("\n📈 Key Metrics:")
    print(f"   Average Recency:     {segment_data['recency'].mean():.1f} days")
    print(f"   Average Frequency:   {segment_data['frequency'].mean():.2f} orders")
    print(f"   Average Monetary:    R$ {segment_data['monetary'].mean():,.2f}")
    print(f"   Total Revenue:       R$ {segment_data['monetary'].sum():,.2f}")
    print(f"   Revenue Share:       {segment_data['monetary'].sum()/rfm_clean['monetary'].sum()*100:.1f}%")
    
    print("\n🎯 Recommended Actions:")
    
    # Pick recommendations from the segment name assigned in section 8
    if 'VIP' in segment_name or 'Champions' in segment_name:
        print("   • Offer exclusive VIP rewards and early access to new products")
        print("   • Invite to exclusive events and provide premium customer service")
        print("   • Send personalized thank you messages and anniversary gifts")
        print("   • Encourage referrals with generous referral bonuses")
    elif 'Loyal Customers' in segment_name:
        # Match the full name: a bare 'Loyal' would also match 'Potential Loyalists'
        print("   • Implement loyalty points program to encourage repeat purchases")
        print("   • Send targeted promotions on complementary products")
        print("   • Offer volume discounts and bundle deals")
        print("   • Request reviews and testimonials")
    elif 'At Risk' in segment_name or 'Lost' in segment_name or 'Churned' in segment_name:
        print("   • Launch win-back campaigns with special discounts (15-20% off)")
        print("   • Send personalized emails asking for feedback")
        print("   • Offer free shipping on next order")
        print("   • Re-engage with abandoned cart reminders")
    else:  # Potential Loyalists
        print("   • Welcome email series introducing product range")
        print("   • First purchase discount to encourage second order")
        print("   • Educational content about products they viewed")
        print("   • Build relationship with regular engagement campaigns")

print(f"\n{'='*80}")
print("✅ Analysis Complete!")
print(f"{'='*80}")

---
## 💾 11. Save Model & Results

In [None]:
# Save the trained model
print("💾 Saving model and artifacts...")

# Make sure the target folders exist before writing
os.makedirs('models/saved_models', exist_ok=True)
os.makedirs('outputs', exist_ok=True)

model_artifacts = {
    'kmeans_model': kmeans_final,
    'scaler': scaler,
    'cluster_names': cluster_names,
    'optimal_k': optimal_k,
    'silhouette_score': final_silhouette,
    'davies_bouldin_score': final_db,
    'feature_names': features
}

with open('models/saved_models/customer_segmentation_model.pkl', 'wb') as f:
    pickle.dump(model_artifacts, f)

print("   ✅ Model saved to models/saved_models/customer_segmentation_model.pkl")

# Save segmented customer data
rfm_clean.to_csv('outputs/customer_segments.csv', index=False)
print("   ✅ Segmented data saved to outputs/customer_segments.csv")

# Save cluster profiles
cluster_profiles.to_csv('outputs/cluster_profiles.csv')
print("   ✅ Cluster profiles saved to outputs/cluster_profiles.csv")

print("\n✅ All artifacts saved successfully!")
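
A saved artifact dictionary of this shape could later be reloaded to assign segments to new customers. The sketch below is self-contained and illustrative only: it trains a tiny stand-in model on synthetic data, writes it to a temporary file (the `demo_model.pkl` name and all `*_demo` variables are hypothetical), and then scores two new rows with the saved scaler and model.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in training data in (recency, frequency, monetary) order
rng = np.random.default_rng(42)
X_train = np.vstack([
    rng.normal([30, 3, 500], [5, 1, 50], size=(50, 3)),
    rng.normal([300, 1, 60], [20, 0.1, 10], size=(50, 3)),
])
scaler_demo = StandardScaler().fit(X_train)
kmeans_demo = KMeans(n_clusters=2, random_state=42, n_init=10).fit(
    scaler_demo.transform(X_train)
)

# Same dictionary layout as the artifact saved above (subset of keys)
artifacts = {
    "kmeans_model": kmeans_demo,
    "scaler": scaler_demo,
    "feature_names": ["recency", "frequency", "monetary"],
}

path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as f:
    pickle.dump(artifacts, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# New customers are transformed with the saved scaler, which is not refit
new_customers = np.array([[25.0, 4.0, 520.0], [310.0, 1.0, 55.0]])
segments = loaded["kmeans_model"].predict(loaded["scaler"].transform(new_customers))
print(segments)
```

Columns must be passed in the order stored under `feature_names`, and pickle files should only be loaded from trusted sources.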

---
## 📋 12. Summary Report

In [None]:
print("="*80)
print("📋 CUSTOMER SEGMENTATION - EXECUTIVE SUMMARY")
print("="*80)
print()
print("📊 Dataset Overview:")
print(f"   Total Customers Analyzed: {len(rfm_clean):,}")
print(f"   Date Range: {df_transactions['order_purchase_timestamp'].min().date()} to {df_transactions['order_purchase_timestamp'].max().date()}")
print(f"   Total Revenue: R$ {rfm_clean['monetary'].sum():,.2f}")
print()
print("🤖 Model Performance:")
print("   Algorithm: K-Means Clustering")
print(f"   Number of Clusters: {optimal_k}")
print(f"   Silhouette Score: {final_silhouette:.3f}")
print(f"   Davies-Bouldin Index: {final_db:.3f}")
print()
print("🎯 Key Segments Identified:")
for cluster, name in cluster_names.items():
    size = len(rfm_clean[rfm_clean['cluster'] == cluster])
    revenue = rfm_clean[rfm_clean['cluster'] == cluster]['monetary'].sum()
    print(f"   {name}: {size:,} customers (R$ {revenue:,.2f})")
print()
print("💼 Business Impact:")
print("   ✅ Enabled targeted marketing campaigns")
print("   ✅ Identified high-value customer segments")
print("   ✅ Revealed at-risk customers for retention efforts")
print("   ✅ Optimized customer acquisition strategies")
print()
print("📁 Deliverables:")
print("   ✅ Trained K-Means model (saved)")
# Report the actual row count after outlier removal rather than a fixed figure
print(f"   ✅ Customer segmentation dataset ({len(rfm_clean):,} customers)")
print("   ✅ Cluster profiles and characteristics")
print("   ✅ Visualizations (PCA plots, distributions, revenue charts)")
print("   ✅ Actionable business recommendations per segment")
print()
print("="*80)
print("✅ ANALYSIS COMPLETE - MODEL READY FOR DEPLOYMENT")
print("="*80)