# E-commerce Customer Segmentation with Stirling Measure

This notebook demonstrates how to apply the Stirling Measure to e-commerce customer data to discover natural customer segments and optimize marketing strategies.

## Overview

The Stirling Measure provides a mathematical foundation for understanding how customers naturally cluster into segments over time. It is governed by two parameters (a, b): a is read as customer affinity for existing segments, and b as the barrier to forming new segments. Estimating them lets us make informed decisions about marketing strategies and customer targeting.
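The `stirling_measure` module is not shown in this notebook, so its exact definition of the measure is not reproduced here. As background only, and assuming the name derives from the classical Stirling numbers of the second kind, the sketch below counts the ways to split n labelled customers into k non-empty segments. The function name `stirling2` is illustrative and not part of the project's modules.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Count partitions of n labelled items into k non-empty groups."""
    if n == k:
        return 1  # includes the convention S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The n-th item either joins one of the k existing groups
    # or starts a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


print(stirling2(4, 2))  # 7
print(stirling2(5, 3))  # 25
```

The two terms of the recurrence ("join an existing group" versus "start a new group") are the classical counterpart of the affinity and barrier reading given to a and b; how the project's module weights them is an assumption left open here.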

**What we'll accomplish:**
1. Load and preprocess e-commerce transaction data
2. Perform customer segmentation analysis over time
3. Calculate the Stirling Measure from clustering patterns
4. Estimate parameters that govern customer behavior
5. Generate actionable marketing insights

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')

# Import our custom modules
from data_prep import load_and_clean_data, create_customer_features, perform_clustering
from stirling_measure import analyze_customer_segments, interpret_parameters
from visualize import create_comprehensive_visualizations

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

## Step 1: Data Loading and Exploration

First, let's load the e-commerce dataset and explore its structure.
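The loading cell below relies on three columns in `segment_summary.csv`: `month`, `n_customers`, and `n_segments`. A minimal sketch of a sanity check on that layout follows; the helper `validate_summary` and the toy numbers are hypothetical and are not produced by `data_prep.py`.

```python
import pandas as pd

REQUIRED_COLUMNS = ("month", "n_customers", "n_segments")


def validate_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Check the columns the notebook uses and return rows sorted by month."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"segment summary is missing columns: {missing}")
    if (df["n_segments"] <= 0).any():
        raise ValueError("n_segments must be positive in every row")
    if (df["n_segments"] > df["n_customers"]).any():
        raise ValueError("n_segments cannot exceed n_customers")
    # Assumes months are ISO-like strings (YYYY-MM), which sort chronologically.
    return df.sort_values("month").reset_index(drop=True)


# Made-up rows, for illustration only
toy = pd.DataFrame({
    "month": ["2011-02", "2011-01", "2011-03"],
    "n_customers": [820, 750, 910],
    "n_segments": [5, 4, 5],
})
clean = validate_summary(toy)
print(clean)
```

Running a check like this right after `pd.read_csv` surfaces a malformed file before the later plotting and fitting cells fail with less obvious errors.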

In [None]:
# Check if we have preprocessed data available
import os

if os.path.exists('segment_summary.csv'):
    print("Using previously processed data...")
    summary_df = pd.read_csv('segment_summary.csv')
    print(f"Data shape: {summary_df.shape}")
    print("\nFirst few rows:")
    display(summary_df.head())
    
    print(f"\nTime range: {summary_df['month'].min()} to {summary_df['month'].max()}")
    print(f"Customer range: {summary_df['n_customers'].min()} - {summary_df['n_customers'].max()}")
    print(f"Segment range: {summary_df['n_segments'].min()} - {summary_df['n_segments'].max()}")
else:
    print("No preprocessed data found. Please run data_prep.py first.")
    print("You can do this by running: !python data_prep.py")

## Step 2: Visualize Customer and Segment Evolution

Let's examine how customer counts and segment counts change over time.

In [None]:
# Create basic visualization of the data
if 'summary_df' in locals():
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 5))
    
    # Customer evolution
    ax1.plot(summary_df['month'], summary_df['n_customers'], 'o-', linewidth=2, markersize=8)
    ax1.set_title('Customer Count Over Time', fontsize=14, fontweight='bold')
    ax1.set_xlabel('Month')
    ax1.set_ylabel('Number of Customers')
    ax1.tick_params(axis='x', rotation=45)
    ax1.grid(True, alpha=0.3)
    
    # Segment evolution
    ax2.plot(summary_df['month'], summary_df['n_segments'], 's-', 
             linewidth=2, markersize=8, color='orange')
    ax2.set_title('Segment Count Over Time', fontsize=14, fontweight='bold')
    ax2.set_xlabel('Month')
    ax2.set_ylabel('Number of Segments')
    ax2.tick_params(axis='x', rotation=45)
    ax2.grid(True, alpha=0.3)
    
    # Customer-to-segment ratio
    ratio = summary_df['n_customers'] / summary_df['n_segments']
    ax3.plot(summary_df['month'], ratio, '^-', linewidth=2, markersize=8, color='green')
    ax3.set_title('Customers per Segment', fontsize=14, fontweight='bold')
    ax3.set_xlabel('Month')
    ax3.set_ylabel('Customers per Segment')
    ax3.tick_params(axis='x', rotation=45)
    ax3.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print(f"Average customers per month: {summary_df['n_customers'].mean():.1f}")
    print(f"Average segments per month: {summary_df['n_segments'].mean():.1f}")
    print(f"Average customers per segment: {ratio.mean():.1f}")

## Step 3: Calculate the Stirling Measure

Now we'll calculate the Stirling Measure and estimate the underlying parameters (a, b).
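The internals of `analyze_customer_segments` are not shown here. The residual plot in Step 5 predicts the measure as `a * n + b * k`, so one plausible reading is an ordinary least-squares fit of that linear form without an intercept. The sketch below illustrates that reading on synthetic data; `fit_linear_ab` is a hypothetical helper, not the project's function.

```python
import numpy as np


def fit_linear_ab(n_vals, k_vals, measures):
    """Least-squares fit of measure ~ a * n + b * k (no intercept)."""
    n = np.asarray(n_vals, dtype=float)
    k = np.asarray(k_vals, dtype=float)
    y = np.asarray(measures, dtype=float)
    X = np.column_stack([n, k])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    # R-squared relative to the mean; it can be negative for a
    # no-intercept model that fits worse than a constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return float(coef[0]), float(coef[1]), r2


# Synthetic data generated with a = 0.4 and b = 1.5 (illustration only)
n_demo = [100, 150, 200, 260]
k_demo = [3, 4, 4, 6]
y_demo = [0.4 * n + 1.5 * k for n, k in zip(n_demo, k_demo)]

a_hat, b_hat, r2 = fit_linear_ab(n_demo, k_demo, y_demo)
print(round(a_hat, 3), round(b_hat, 3), round(r2, 3))
```

Because customer and segment counts usually grow together, the two columns can be strongly correlated, which makes the individual estimates of a and b less stable than the R-squared value alone suggests.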

In [None]:
# Perform Stirling Measure analysis
if 'summary_df' in locals():
    results = analyze_customer_segments(
        customer_counts=summary_df['n_customers'].tolist(),
        segment_counts=summary_df['n_segments'].tolist(),
        time_periods=summary_df['month'].tolist(),
        plot=True
    )
    
    # Extract key results
    estimated_a = results['estimated_a']
    estimated_b = results['estimated_b']
    r_squared = results['r_squared']
    
    print(f"\n{'='*60}")
    print(f"STIRLING MEASURE ANALYSIS RESULTS")
    print(f"{'='*60}")
    print(f"Estimated Parameter a (Customer Affinity): {estimated_a:.4f}")
    print(f"Estimated Parameter b (Segment Barrier):   {estimated_b:.4f}")
    print(f"Model Fit (R-squared):                     {r_squared:.4f}")
    print(f"Number of data points:                     {len(results['measures'])}")

## Step 4: Interpret the Parameters

Let's understand what these parameters mean for our business.
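What `interpret_parameters` returns is defined in `stirling_measure.py`, which is not shown. As a rough guide, the sketch below mirrors the thresholds that Step 6 of this notebook applies (0.3 and 0.5 for a, 1.0 and 2.0 for b). The function names are hypothetical.

```python
def affinity_level(a: float) -> str:
    """Bucket parameter a using the cut-offs from the Step 6 cell."""
    if a < 0.3:
        return "low"
    if a < 0.5:
        return "moderate"
    return "high"


def barrier_level(b: float) -> str:
    """Bucket parameter b using the cut-offs from the Step 6 cell."""
    if b < 1.0:
        return "low"
    if b < 2.0:
        return "moderate"
    return "high"


print(affinity_level(0.42), barrier_level(1.7))  # moderate moderate
```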

In [None]:
# Get parameter interpretations
if 'results' in locals():
    interpretations = interpret_parameters(estimated_a, estimated_b)
    
    print(f"\n{'='*60}")
    print(f"PARAMETER INTERPRETATION")
    print(f"{'='*60}")
    
    print(f"\n🎯 CUSTOMER AFFINITY (a = {estimated_a:.3f}):")
    print(f"   {interpretations['a_interpretation']}")
    
    print(f"\n🚧 SEGMENT BARRIER (b = {estimated_b:.3f}):")
    print(f"   {interpretations['b_interpretation']}")
    
    print(f"\n📈 RECOMMENDED STRATEGY:")
    print(f"   {interpretations['strategy']}")

## Step 5: Detailed Analysis and Visualization

Let's create more detailed visualizations to understand our customer segmentation patterns.
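The strategy map drawn in the next cell labels four corner regions of the (a, b) plane and leaves the middle band unlabelled. The same lookup can be written as a small function, which is handy for tabulating the region of an estimate without plotting. `strategy_region` is a hypothetical helper, and the `"unlabelled"` return value is a placeholder for the area the map leaves blank.

```python
def strategy_region(a: float, b: float) -> str:
    """Return the strategy-map label for a point (a, b)."""
    if a < 0.3 and b < 1.0:
        return "Highly personalized"
    if a > 0.5 and b > 2.0:
        return "Mass segment"
    if a < 0.3 and b > 2.0:
        return "Micro-segment"
    if a > 0.5 and b < 1.0:
        return "Adaptive mass"
    # Everything between the corner regions has no label on the map.
    return "unlabelled"


print(strategy_region(0.6, 2.5))  # Mass segment
print(strategy_region(0.4, 1.5))  # unlabelled
```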

In [None]:
# Create detailed parameter analysis
if 'results' in locals():
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # 1. Customer-segment scatter, colored by Stirling Measure
    #    (a 2D stand-in for a 3D view of n, k and the measure)
    ax1 = axes[0, 0]
    scatter = ax1.scatter(results['customer_counts'], results['segment_counts'], 
                         c=results['measures'], cmap='viridis', s=100, alpha=0.7)
    ax1.set_xlabel('Number of Customers (n)')
    ax1.set_ylabel('Number of Segments (k)')
    ax1.set_title('Customer-Segment Relationship\n(colored by Stirling Measure)')
    plt.colorbar(scatter, ax=ax1, label='Stirling Measure')
    ax1.grid(True, alpha=0.3)
    
    # 2. Residual analysis
    ax2 = axes[0, 1]
    n_vals = np.array(results['customer_counts'])
    k_vals = np.array(results['segment_counts'])
    predicted = estimated_a * n_vals + estimated_b * k_vals
    residuals = np.array(results['measures']) - predicted
    
    ax2.scatter(predicted, residuals, alpha=0.7, s=100)
    ax2.axhline(y=0, color='red', linestyle='--', alpha=0.8)
    ax2.set_xlabel('Predicted Values')
    ax2.set_ylabel('Residuals')
    ax2.set_title('Residual Analysis')
    ax2.grid(True, alpha=0.3)
    
    # 3. Parameter sensitivity analysis
    ax3 = axes[1, 0]
    a_range = np.linspace(0, 1, 50)
    b_range = np.linspace(0, 3, 50)
    
    # Create strategy regions
    A, B = np.meshgrid(a_range, b_range)
    regions = np.zeros_like(A)
    regions[(A < 0.3) & (B < 1.0)] = 1  # Highly personalized
    regions[(A > 0.5) & (B > 2.0)] = 2  # Mass segment
    regions[(A < 0.3) & (B > 2.0)] = 3  # Micro-segment
    regions[(A > 0.5) & (B < 1.0)] = 4  # Adaptive mass
    
    im = ax3.imshow(regions, extent=[0, 1, 0, 3], origin='lower', 
                    cmap='Set3', alpha=0.6, aspect='auto')
    ax3.scatter(estimated_a, estimated_b, color='red', s=200, marker='*', 
               edgecolors='black', linewidth=2, zorder=5)
    ax3.set_xlabel('Parameter a (Customer Affinity)')
    ax3.set_ylabel('Parameter b (Segment Barrier)')
    ax3.set_title('Marketing Strategy Map')
    
    # 4. Time series of Stirling measures
    ax4 = axes[1, 1]
    ax4.plot(range(len(results['measures'])), results['measures'], 'o-', 
             linewidth=2, markersize=8)
    ax4.set_xlabel('Time Period Index')
    ax4.set_ylabel('Stirling Measure')
    ax4.set_title('Stirling Measure Evolution')
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## Step 6: Business Insights and Recommendations

Based on our analysis, let's generate specific business recommendations.

In [None]:
# Generate business insights
if 'results' in locals():
    print(f"\n{'='*80}")
    print(f"BUSINESS INSIGHTS AND RECOMMENDATIONS")
    print(f"{'='*80}")
    
    # Market characteristics
    print(f"\n📊 MARKET CHARACTERISTICS:")
    if estimated_a < 0.3:
        print(f"   • Customers show LOW affinity for joining existing segments")
        print(f"   • Market has diverse, heterogeneous customer behaviors")
        print(f"   • High potential for personalization")
    elif estimated_a < 0.5:
        print(f"   • Customers show MODERATE affinity for existing segments")
        print(f"   • Balanced market with some clustering tendencies")
        print(f"   • Opportunity for both segment-based and personalized approaches")
    else:
        print(f"   • Customers show HIGH affinity for existing segments")
        print(f"   • Market has strong clustering patterns")
        print(f"   • Segment-based strategies will be most effective")
    
    if estimated_b < 1.0:
        print(f"   • LOW barriers to new segment formation")
        print(f"   • Market segments evolve quickly")
        print(f"   • Need for frequent reassessment")
    elif estimated_b < 2.0:
        print(f"   • MODERATE barriers to new segment formation")
        print(f"   • Segments are somewhat stable")
        print(f"   • Balance between stability and evolution")
    else:
        print(f"   • HIGH barriers to new segment formation")
        print(f"   • Very stable market segments")
        print(f"   • Long-term segment strategies are viable")
    
    # Specific recommendations
    print(f"\n🎯 SPECIFIC RECOMMENDATIONS:")
    
    avg_customers = np.mean(results['customer_counts'])
    avg_segments = np.mean(results['segment_counts'])
    customers_per_segment = avg_customers / avg_segments
    
    print(f"   • Current average: {customers_per_segment:.1f} customers per segment")
    
    if estimated_a > 0.5 and estimated_b > 2.0:
        print(f"   • Focus on 3-5 major customer segments")
        print(f"   • Invest heavily in segment-specific products/services")
        print(f"   • Use mass marketing within each segment")
    elif estimated_a < 0.3 and estimated_b < 1.0:
        print(f"   • Implement micro-segmentation (8-12 segments)")
        print(f"   • Use AI-driven personalization")
        print(f"   • Frequent reassessment (monthly)")
    else:
        print(f"   • Maintain {int(avg_segments)}-{int(avg_segments)+2} segments")
        print(f"   • Combine segment-based and personalized approaches")
        print(f"   • Quarterly reassessment of segment structure")
    
    # Model quality assessment
    print(f"\n📈 MODEL QUALITY:")
    if r_squared > 0.8:
        print(f"   • EXCELLENT model fit (R² = {r_squared:.3f})")
        print(f"   • High confidence in parameter estimates")
    elif r_squared > 0.6:
        print(f"   • GOOD model fit (R² = {r_squared:.3f})")
        print(f"   • Reasonable confidence in recommendations")
    else:
        print(f"   • WEAK model fit (R² = {r_squared:.3f})")
        print(f"   • Treat the parameter estimates with caution; consider collecting more data or alternative approaches")

## Step 7: Create Comprehensive Visualizations

Finally, let's create a comprehensive set of visualizations for reporting.

In [None]:
# Create comprehensive visualizations
try:
    create_comprehensive_visualizations()
    print("✅ All visualizations created successfully!")
    print("\nGenerated files:")
    print("   • comprehensive_analysis.png")
    print("   • stirling_measure_3d.png") 
    print("   • business_insights_dashboard.png")
    print("   • interactive_dashboard.html")
except Exception as e:
    print(f"❌ Error creating visualizations: {e}")
    print("This might be due to missing data. Please run data_prep.py first.")

## Conclusion

This analysis demonstrates how the Stirling Measure can provide valuable insights into customer segmentation dynamics. The key takeaways are:

1. **Mathematical Foundation**: The Stirling Measure provides a rigorous mathematical framework for understanding natural clustering tendencies in customer data.

2. **Parameter Interpretation**: The estimated parameters (a, b) reveal fundamental characteristics about customer behavior and market dynamics.

3. **Actionable Insights**: The analysis translates mathematical results into concrete business recommendations for marketing strategy.

4. **Continuous Monitoring**: The approach can be applied regularly to track changes in customer behavior over time.

### Next Steps

1. **Implement Recommendations**: Apply the suggested marketing strategy based on parameter estimates
2. **Monitor Changes**: Regularly recalculate parameters to detect shifts in customer behavior  
3. **A/B Testing**: Test different approaches on different customer segments
4. **Expand Analysis**: Apply the same framework to other business metrics (product categories, geographic regions, etc.)
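The monitoring step above can be sketched as a rolling re-estimation: refit (a, b) on a sliding window of periods and watch for drift. This assumes the linear form `a * n + b * k` used in the residual analysis; `rolling_ab`, the window length, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def rolling_ab(n_vals, k_vals, measures, window=6):
    """Refit measure ~ a * n + b * k on each sliding window of periods."""
    n = np.asarray(n_vals, dtype=float)
    k = np.asarray(k_vals, dtype=float)
    y = np.asarray(measures, dtype=float)
    if window < 2 or window > len(y):
        raise ValueError("window must be between 2 and the number of periods")
    estimates = []
    for start in range(len(y) - window + 1):
        rows = slice(start, start + window)
        X = np.column_stack([n[rows], k[rows]])
        coef, *_ = np.linalg.lstsq(X, y[rows], rcond=None)
        estimates.append((start, float(coef[0]), float(coef[1])))
    return estimates


# Synthetic series whose parameters shift halfway through (illustration only)
n_demo = [100, 120, 150, 170, 200, 240, 260, 300]
k_demo = [3, 4, 4, 5, 5, 6, 7, 7]
y_demo = [0.3 * n + 1.0 * k for n, k in zip(n_demo[:4], k_demo[:4])]
y_demo += [0.5 * n + 2.0 * k for n, k in zip(n_demo[4:], k_demo[4:])]

drift = rolling_ab(n_demo, k_demo, y_demo, window=4)
for start, a_hat, b_hat in drift:
    print(start, round(a_hat, 3), round(b_hat, 3))
```

Short windows react quickly but give noisy estimates, so the window length is a trade-off to tune against how many periods of data are available.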

The Stirling Measure approach provides a powerful complement to traditional customer segmentation methods, offering both mathematical rigor and practical business value.