# Customer Segmentation - Results Analysis

This notebook focuses on analyzing the results of our customer segmentation:

1. Analyzing segment profiles in detail
2. Comparing the identified segments with expected segments
3. Creating visualizations for business insights
4. Developing strategic recommendations for each segment
5. Evaluating the business impact of the segmentation

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import plotly.express as px
import plotly.graph_objects as go
import os

# Set plotting style
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')

# Increase default figure size
plt.rcParams['figure.figsize'] = [12, 8]

## 1. Load Segmentation Results

In [None]:
# Load the customer data with segments
df = pd.read_csv('./output/customer_data_with_segments.csv')

# Load segment mapping
segment_mapping = pd.read_csv('./output/segment_mapping.csv')

# Display basic information
print(f"Dataset shape: {df.shape}")
df.head()

In [None]:
# Display segment mapping
print("Segment Mapping:")
segment_mapping

## 2. Segment Profiles Analysis

In [None]:
# Calculate segment profiles
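# numeric_only=True skips non-numeric columns, which raise a TypeError in pandas >= 2.0
segment_profiles = df.groupby('Segment').mean(numeric_only=True)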

# Display segment profiles
print("Segment Profiles (Mean Values):")
display(segment_profiles.drop('Cluster', axis=1, errors='ignore'))

# Calculate segment statistics
segment_stats = df.groupby('Segment').describe()

# Display segment sizes
segment_sizes = df['Segment'].value_counts()
segment_percentages = segment_sizes / len(df) * 100

print("\nSegment Sizes:")
for segment, count in segment_sizes.items():  # Series.iteritems() was removed in pandas 2.0
    print(f"{segment}: {count} customers ({segment_percentages[segment]:.2f}%)")

In [None]:
# Visualize segment sizes
plt.figure(figsize=(10, 6))

# Create a pie chart
plt.pie(segment_sizes, labels=segment_sizes.index, autopct='%1.1f%%', startangle=90,
        shadow=True, explode=[0.05] * len(segment_sizes))
plt.title('Customer Distribution by Segment', fontsize=16)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle
plt.tight_layout()
plt.savefig('./output/segment_distribution_pie.png', dpi=300)
plt.show()

In [None]:
# Create a bar chart of segment sizes
plt.figure(figsize=(10, 6))
ax = sns.barplot(x=segment_sizes.index, y=segment_sizes.values, palette='viridis')
plt.title('Customer Distribution by Segment', fontsize=16)
plt.xlabel('Segment', fontsize=12)
plt.ylabel('Number of Customers', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add value labels on top of each bar
for i, v in enumerate(segment_sizes.values):
    ax.text(i, v+10, str(v), ha='center', va='bottom', fontsize=12)
    ax.text(i, v/2, f"{segment_percentages.iloc[i]:.1f}%", ha='center', va='center', fontsize=11, color='white')

plt.tight_layout()
plt.savefig('./output/segment_distribution_bar.png', dpi=300)
plt.show()

## 3. Detailed Segment Comparison

In [None]:
# Compare segments across all numeric features (identifier and label columns excluded)
features = [col for col in df.select_dtypes(include='number').columns
            if col not in ['customer_id', 'Cluster', 'Segment']]

# Create a radar chart to compare segments
# First normalize the values for better comparison
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
segment_profiles_scaled = pd.DataFrame(
    scaler.fit_transform(segment_profiles[features]),
    index=segment_profiles.index,
    columns=features
)

# Create radar chart
fig = go.Figure()

for segment in segment_profiles_scaled.index:
    values = segment_profiles_scaled.loc[segment, features].values.tolist()
    # Close the loop by repeating the first value
    values.append(values[0])
    
    fig.add_trace(go.Scatterpolar(
        r=values,
        theta=features + [features[0]],  # Close the loop
        fill='toself',
        name=segment
    ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 1]
        )
    ),
    title='Segment Comparison (Normalized Features)',
    showlegend=True
)

fig.write_html('./output/segment_comparison_radar.html')
fig.show()
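
The radar chart relies on min-max scaling, which maps each feature to [0, 1] via `(x - min) / (max - min)`. Because the scaler is fit on the segment means rather than on individual customers, the lowest segment for each feature lands at 0 and the highest at 1, so the chart shows relative ordering rather than absolute gaps. A minimal pure-Python sketch of the same arithmetic, using made-up segment means (the segment names and numbers below are illustrative, not taken from the dataset):

```python
# Hypothetical segment means for a single feature (illustrative values only)
segment_means = {"A": 5.0, "B": 12.0, "C": 19.0}

def min_max_scale(values):
    """Scale a dict of numbers to [0, 1]; a constant input maps to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

scaled = min_max_scale(segment_means)
print(scaled)
```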

In [None]:
# Create a bar chart for each feature to compare segments
for feature in features:
    plt.figure(figsize=(10, 6))
    ax = sns.barplot(x=segment_profiles.index, y=segment_profiles[feature], palette='viridis')
    plt.title(f'Average {feature} by Segment', fontsize=16)
    plt.xlabel('Segment', fontsize=12)
    plt.ylabel(feature, fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    
    # Add value labels on top of each bar
    for i, v in enumerate(segment_profiles[feature]):
        ax.text(i, v, f"{v:.2f}", ha='center', va='bottom', fontsize=12)
    
    plt.tight_layout()
    plt.savefig(f'./output/segment_comparison_{feature}.png', dpi=300)
    plt.show()

In [None]:
# Create boxplots to show the distribution of features within each segment
# Scale figure height with the number of features so each boxplot stays readable
plt.figure(figsize=(15, 4 * len(features)))

for i, feature in enumerate(features):
    plt.subplot(len(features), 1, i+1)
    sns.boxplot(x='Segment', y=feature, data=df, palette='viridis')
    plt.title(f'Distribution of {feature} by Segment', fontsize=14)
    plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.savefig('./output/segment_boxplots.png', dpi=300)
plt.show()

## 4. Comparing Identified Segments with Expected Segments

In [None]:
# Create a table comparing expected segment characteristics with identified segments
expected_segments = {
    'Bargain Hunters': {
        'total_purchases': 'High',
        'avg_cart_value': 'Low',
        'total_time_spent': 'Moderate',
        'product_click': 'Moderate',
        'discount_count': 'High',
        'behavior': 'These customers are deal-seekers who make frequent purchases of low-value items and heavily rely on discounts.'
    },
    'High Spenders': {
        'total_purchases': 'Moderate',
        'avg_cart_value': 'High',
        'total_time_spent': 'Moderate',
        'product_click': 'Moderate',
        'discount_count': 'Low',
        'behavior': 'These customers are premium buyers who focus on high-value purchases and are less influenced by discounts.'
    },
    'Window Shoppers': {
        'total_purchases': 'Low',
        'avg_cart_value': 'Moderate',
        'total_time_spent': 'High',
        'product_click': 'High',
        'discount_count': 'Low',
        'behavior': 'These customers spend significant time browsing but rarely make purchases.'
    }
}

In [None]:
# Create a function to describe the level (High/Moderate/Low) based on relative values
def get_level(value, mean, std):
    if value > mean + 0.5 * std:
        return 'High'
    elif value < mean - 0.5 * std:
        return 'Low'
    else:
        return 'Moderate'
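
To make the thresholds concrete, the sketch below repeats the same rule and applies it to a hypothetical feature with mean 10 and standard deviation 4, which puts the cut-offs at 8 and 12. The input values are made up for illustration.

```python
def get_level(value, mean, std):
    """Same rule as above: bands at mean +/- 0.5 * std, with strict inequalities."""
    if value > mean + 0.5 * std:
        return 'High'
    elif value < mean - 0.5 * std:
        return 'Low'
    return 'Moderate'

# Hypothetical feature: mean 10, std 4, so the cut-offs are 8 and 12
examples = {v: get_level(v, mean=10, std=4) for v in (7, 9, 12, 13)}
print(examples)
```

A value that sits exactly on a cut-off (12 here) is labeled `Moderate`, because both comparisons are strict.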

In [None]:
# Calculate overall mean and standard deviation for each feature
feature_means = df[features].mean()
feature_stds = df[features].std()

# Categorize each segment's features as High/Moderate/Low
identified_segments = {}

for segment in segment_profiles.index:
    identified_segments[segment] = {}
    for feature in features:
        value = segment_profiles.loc[segment, feature]
        level = get_level(value, feature_means[feature], feature_stds[feature])
        identified_segments[segment][feature] = level

# Create a DataFrame for comparison
comparison_data = []

for expected_segment, expected_attrs in expected_segments.items():
    row = {
        'Expected Segment': expected_segment,
    }
    
    # Add expected feature levels
    for feature in features:
        if feature in expected_attrs:
            row[f'Expected {feature}'] = expected_attrs[feature]
        else:
            row[f'Expected {feature}'] = '-'
    
    # Find the identified segment that matches best
    best_match = None
    best_match_score = -1
    
    for identified_segment, identified_attrs in identified_segments.items():
        match_score = sum(1 for feature in features 
                         if feature in expected_attrs and 
                         expected_attrs[feature] == identified_attrs[feature])
        
        if match_score > best_match_score:
            best_match_score = match_score
            best_match = identified_segment
    
    # Add identified feature levels. identified_segments is built from the same
    # feature list, so every feature has a level. If the data names a column
    # differently from expected_segments (e.g. 'discount_counts' rather than
    # 'discount_count'), rename the key in expected_segments so it is scored.
    for feature in features:
        row[f'Identified {feature}'] = identified_segments[best_match].get(feature, '-')
    
    # Score against the features that actually have an expected level
    n_expected = sum(1 for feature in features if feature in expected_attrs)
    row['Identified Segment'] = best_match
    row['Match Score'] = f"{best_match_score}/{n_expected}"
    
    comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
comparison_df
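
The loop above picks the best-scoring identified segment independently for each expected segment, so two expected segments can end up mapped to the same identified segment. If a one-to-one mapping is wanted, one option (an alternative, not what this notebook does) is to score every possible assignment and keep the one with the highest total. With three segments that is only six permutations. The sketch below uses hypothetical level tables and hypothetical helper names (`match_score`, `best_one_to_one`):

```python
from itertools import permutations

# Hypothetical level tables (illustrative; not the notebook's actual output)
expected = {
    "Bargain Hunters": {"total_purchases": "High", "avg_cart_value": "Low"},
    "High Spenders": {"total_purchases": "Moderate", "avg_cart_value": "High"},
    "Window Shoppers": {"total_purchases": "Low", "avg_cart_value": "Moderate"},
}
identified = {
    "Segment 0": {"total_purchases": "Low", "avg_cart_value": "Moderate"},
    "Segment 1": {"total_purchases": "High", "avg_cart_value": "Low"},
    "Segment 2": {"total_purchases": "Moderate", "avg_cart_value": "High"},
}

def match_score(exp_attrs, ident_attrs):
    """Count features whose expected level equals the identified level."""
    return sum(1 for f, level in exp_attrs.items() if ident_attrs.get(f) == level)

def best_one_to_one(expected, identified):
    """Try every one-to-one assignment and return the highest-scoring one."""
    exp_names = list(expected)
    best_total, best_map = -1, None
    for perm in permutations(identified, len(exp_names)):
        total = sum(match_score(expected[e], identified[i])
                    for e, i in zip(exp_names, perm))
        if total > best_total:
            best_total, best_map = total, dict(zip(exp_names, perm))
    return best_map, best_total

mapping, total = best_one_to_one(expected, identified)
print(mapping, total)
```

This brute-force search assumes there are at least as many identified segments as expected ones, and it only stays cheap for a handful of segments.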

## 5. Business Insights and Strategic Recommendations

In [None]:
# Calculate spending potential for each segment
# Create a simple estimate of customer value
df['estimated_revenue'] = df['total_purchases'] * df['avg_cart_value']
# Rough CLV proxy: assumes lifetime value is 3x current revenue (placeholder multiplier)
df['estimated_clv'] = df['estimated_revenue'] * 3

# Analyze by segment
segment_revenue = df.groupby('Segment')['estimated_revenue'].agg(['mean', 'sum', 'count'])
segment_revenue['percentage'] = segment_revenue['sum'] / segment_revenue['sum'].sum() * 100
segment_revenue['per_customer'] = segment_revenue['sum'] / segment_revenue['count']

# Display revenue analysis
print("Revenue Analysis by Segment:")
display(segment_revenue)
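
The revenue table is built from simple arithmetic: each customer's estimated revenue is `total_purchases * avg_cart_value`, and each segment's share is its revenue sum divided by the overall sum. The pure-Python sketch below reproduces those steps on three made-up customers (segment names and values are illustrative only). It also shows that `per_customer` (sum divided by count) is the same quantity as the `mean` column.

```python
from collections import defaultdict

# Hypothetical customers: (segment, total_purchases, avg_cart_value)
customers = [
    ("A", 10, 20.0),
    ("A", 5, 40.0),
    ("B", 2, 150.0),
]

revenue_sum = defaultdict(float)
counts = defaultdict(int)
for segment, purchases, cart_value in customers:
    revenue_sum[segment] += purchases * cart_value  # estimated_revenue
    counts[segment] += 1

total = sum(revenue_sum.values())
summary = {
    seg: {
        "sum": revenue_sum[seg],
        "per_customer": revenue_sum[seg] / counts[seg],
        "percentage": revenue_sum[seg] / total * 100,
    }
    for seg in revenue_sum
}
for seg, row in summary.items():
    print(seg, row)
```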

In [None]:
# Visualize revenue contribution by segment
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.pie(segment_revenue['percentage'], labels=segment_revenue.index, 
        autopct='%1.1f%%', startangle=90, shadow=True, 
        explode=[0.05] * len(segment_revenue))
plt.title('Revenue Distribution by Segment', fontsize=16)
plt.axis('equal')

plt.subplot(1, 2, 2)
sns.barplot(x=segment_revenue.index, y=segment_revenue['per_customer'], palette='viridis')
plt.title('Average Revenue per Customer by Segment', fontsize=16)
plt.xlabel('Segment', fontsize=12)
plt.ylabel('Revenue per Customer', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add value labels on top of each bar
for i, v in enumerate(segment_revenue['per_customer']):
    plt.text(i, v, f"${v:.2f}", ha='center', va='bottom', fontsize=12)

plt.tight_layout()
plt.savefig('./output/revenue_analysis.png', dpi=300)
plt.show()

### Strategic Recommendations for Each Segment

In [None]:
# Define strategic recommendations for each segment
strategic_recommendations = {}

for segment, matched_segment in zip(comparison_df['Expected Segment'], comparison_df['Identified Segment']):
    if segment == 'Bargain Hunters':
        strategic_recommendations[matched_segment] = {
            'description': 'These customers are deal-seekers who make frequent purchases of low-value items and heavily rely on discounts.',
            'recommendations': [
                'Implement a tiered loyalty program that rewards frequent purchases',
                'Create limited-time flash sales and daily deals',
                'Send targeted promotions for complementary items to increase cart value',
                'Use product bundling strategies to encourage larger purchases',
                'Highlight value and savings in marketing communications'
            ],
            'kpis_to_track': [
                'Purchase frequency',
                'Average cart value (aim to increase)',
                'Discount redemption rate',
                'Loyalty program engagement'
            ]
        }
    elif segment == 'High Spenders':
        strategic_recommendations[matched_segment] = {
            'description': 'These customers are premium buyers who focus on high-value purchases and are less influenced by discounts.',
            'recommendations': [
                'Develop a premium customer program with exclusive benefits',
                'Focus on product quality and premium features in marketing messages',
                'Offer personalized shopping experiences and concierge services',
                'Create early access to new product releases',
                'Implement cross-selling strategies for complementary premium products'
            ],
            'kpis_to_track': [
                'Customer lifetime value (CLV)',
                'Retention rate',
                'Average order value',
                'Premium service adoption rate'
            ]
        }
    elif segment == 'Window Shoppers':
        strategic_recommendations[matched_segment] = {
            'description': 'These customers spend significant time browsing but rarely make purchases.',
            'recommendations': [
                'Implement targeted abandoned-cart recovery strategies',
                'Create limited-time offers with countdown timers to create urgency',
                'Develop a wish list feature to track items of interest',
                'Use remarketing campaigns to bring them back to the website',
                'Offer first-time purchase incentives to convert browsers to buyers'
            ],
            'kpis_to_track': [
                'Conversion rate',
                'Browse-to-buy ratio',
                'Time between first visit and first purchase',
                'Email campaign click-through and conversion rates'
            ]
        }

# Display strategic recommendations for each segment
for segment, recs in strategic_recommendations.items():
    print(f"\n=== Strategic Recommendations for {segment} Segment ===\n")
    print(f"Description: {recs['description']}\n")
    
    print("Recommendations:")
    for i, rec in enumerate(recs['recommendations'], 1):
        print(f"  {i}. {rec}")
    
    print("\nKPIs to Track:")
    for i, kpi in enumerate(recs['kpis_to_track'], 1):
        print(f"  {i}. {kpi}")
    
    print("\n" + "-"*60)

## 6. Evaluation of Business Impact

In [None]:
# Create a summary of potential business impact
business_impact = pd.DataFrame({
    'Segment': segment_revenue.index,
    'Customer Count': segment_revenue['count'],
    'Customer Percentage': segment_revenue['count'] / segment_revenue['count'].sum() * 100,
    'Revenue Contribution': segment_revenue['sum'],
    'Revenue Percentage': segment_revenue['percentage'],
    'Avg Revenue Per Customer': segment_revenue['per_customer']
})

# Calculate potential impact scenarios
for segment in business_impact['Segment']:
    if segment in strategic_recommendations:
        if segment == comparison_df[comparison_df['Expected Segment'] == 'Bargain Hunters']['Identified Segment'].values[0]:
            # For Bargain Hunters: Scenario of increasing average cart value by 15%
            current_customers = business_impact.loc[business_impact['Segment'] == segment, 'Customer Count'].values[0]
            current_revenue_per_customer = business_impact.loc[business_impact['Segment'] == segment, 'Avg Revenue Per Customer'].values[0]
            potential_increase = current_revenue_per_customer * 0.15 * current_customers
            business_impact.loc[business_impact['Segment'] == segment, 'Potential Impact'] = f"${potential_increase:.2f} by increasing avg cart value by 15%"
        
        elif segment == comparison_df[comparison_df['Expected Segment'] == 'High Spenders']['Identified Segment'].values[0]:
            # For High Spenders: Scenario of increasing retention by 10%
            current_customers = business_impact.loc[business_impact['Segment'] == segment, 'Customer Count'].values[0]
            current_revenue_per_customer = business_impact.loc[business_impact['Segment'] == segment, 'Avg Revenue Per Customer'].values[0]
            potential_increase = current_revenue_per_customer * current_customers * 0.1
            business_impact.loc[business_impact['Segment'] == segment, 'Potential Impact'] = f"${potential_increase:.2f} by increasing retention by 10%"
        
        elif segment == comparison_df[comparison_df['Expected Segment'] == 'Window Shoppers']['Identified Segment'].values[0]:
            # For Window Shoppers: Scenario of increasing conversion rate by 20%
            current_customers = business_impact.loc[business_impact['Segment'] == segment, 'Customer Count'].values[0]
            current_revenue_per_customer = business_impact.loc[business_impact['Segment'] == segment, 'Avg Revenue Per Customer'].values[0]
            # Assume 20% more purchases from each window shopper
            potential_increase = current_revenue_per_customer * 0.2 * current_customers
            business_impact.loc[business_impact['Segment'] == segment, 'Potential Impact'] = f"${potential_increase:.2f} by increasing conversion rate by 20%"

# Format numeric columns
business_impact['Customer Percentage'] = business_impact['Customer Percentage'].apply(lambda x: f"{x:.1f}%")
business_impact['Revenue Percentage'] = business_impact['Revenue Percentage'].apply(lambda x: f"{x:.1f}%")
business_impact['Avg Revenue Per Customer'] = business_impact['Avg Revenue Per Customer'].apply(lambda x: f"${x:.2f}")

business_impact

## 7. Save Results for Reporting

In [None]:
# Save key results to files
comparison_df.to_csv('./output/segment_comparison.csv', index=False)
business_impact.to_csv('./output/business_impact.csv', index=False)

# Save strategic recommendations as JSON for potential use in an application
import json
with open('./output/strategic_recommendations.json', 'w') as f:
    json.dump(strategic_recommendations, f, indent=4)

print("Analysis results saved successfully!")

## Summary of Findings (Continued)

3. **Window Shoppers**: These customers spend significant time browsing but rarely make purchases. They represent a potential opportunity for conversion. Strategic focus should be on converting their browsing behavior into actual purchases through targeted incentives and creating urgency.

Our segmentation analysis has revealed several key insights:

- **Revenue Distribution**: The High Spenders segment, while smaller in size, contributes disproportionately to revenue. This reinforces the importance of customer quality over quantity.

- **Behavioral Patterns**: Each segment shows distinct patterns in how they interact with the platform:
  - Bargain Hunters are highly responsive to discounts and make frequent small purchases
  - High Spenders focus on quality and are willing to pay premium prices without discounts
  - Window Shoppers extensively research products but have a high hesitation factor before purchasing

- **Marketing Implications**: Different messaging strategies are needed for each segment:
  - Bargain Hunters respond to value and savings messaging
  - High Spenders respond to quality and exclusivity messaging
  - Window Shoppers need incentives to overcome purchase hesitation

The business impact analysis models one illustrative scenario per segment. The percentages below are assumptions used to size the opportunity, not measured outcomes:

- Increasing Bargain Hunters' average cart value by 15%
- Improving High Spenders' retention rate by 10% 
- Boosting Window Shoppers' conversion rate by 20%

These improvements could result in substantial revenue growth without acquiring new customers.
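
All three scenarios in Section 6 use the same first-order formula: uplift = revenue per customer × number of customers × assumed rate. The sketch below applies it to hypothetical inputs (the revenue and customer figures are made up; only the 15%, 10%, and 20% rates come from the notebook). Treating a retention or conversion gain as a flat percentage of current revenue is a simplification.

```python
def scenario_uplift(revenue_per_customer, customer_count, rate):
    """First-order uplift: a flat percentage of the segment's current revenue."""
    return revenue_per_customer * customer_count * rate

# Hypothetical inputs: (revenue per customer, customer count, assumed rate)
scenarios = {
    "Bargain Hunters": (120.0, 400, 0.15),   # +15% average cart value
    "High Spenders": (900.0, 150, 0.10),     # +10% retention
    "Window Shoppers": (60.0, 450, 0.20),    # +20% conversion
}

uplifts = {name: scenario_uplift(*args) for name, args in scenarios.items()}
for name, value in uplifts.items():
    print(f"{name}: ${value:,.2f}")
```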

## Next Steps and Implementation

### 1. Operational Implementation

To operationalize these findings, we recommend the following steps:

1. **Customer Tagging System**: Implement a system to tag customers according to their segment in the CRM system

2. **Personalized Communication**: Set up segmented email campaigns and personalized product recommendations based on segment characteristics

3. **Website Personalization**: Customize website experiences based on identified segment (e.g., highlighting discounts for Bargain Hunters, premium features for High Spenders)

4. **Segment-Specific Promotions**: Deploy targeted promotions aligned with the strategic recommendations for each segment

5. **Real-time Segmentation**: Develop a system to classify new customers into segments based on their early browsing and purchasing behaviors
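
As a rough illustration of step 5, the sketch below assigns a new customer to the nearest segment centroid by Euclidean distance. It assumes the segments came from a centroid-based clustering and that features are already scaled to a common range; the centroid values, feature order, and the `assign_segment` helper are all hypothetical.

```python
import math

# Hypothetical centroids in a scaled feature space
# (order: total_purchases, avg_cart_value, total_time_spent)
centroids = {
    "Bargain Hunters": (0.9, 0.2, 0.5),
    "High Spenders": (0.5, 0.9, 0.5),
    "Window Shoppers": (0.1, 0.5, 0.9),
}

def assign_segment(features, centroids):
    """Return the name of the centroid closest to the feature vector."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))

new_customer = (0.15, 0.45, 0.8)  # made-up early-behavior profile
print(assign_segment(new_customer, centroids))
```

In practice the new customer's features would need the same scaling that was applied when the segments were built.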

### 2. Monitoring and Evaluation Framework

To track the effectiveness of segment-specific strategies, we recommend monitoring:

**Overall Metrics:**
- Revenue contribution by segment
- Customer lifetime value by segment
- Segment migration patterns (customers moving between segments)
- Segment stability over time

**Segment-Specific KPIs:**

- **Bargain Hunters**:
  - Average cart value
  - Purchase frequency
  - Discount redemption rates
  - Response to bundle offers

- **High Spenders**:
  - Retention rates
  - Premium service adoption
  - Average order value
  - Response to exclusive offers

- **Window Shoppers**:
  - Conversion rates
  - Browse-to-buy ratio
  - Cart abandonment rate
  - Response to urgency-based promotions
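
Segment migration and stability can be tracked by comparing each customer's segment label across two periods. The sketch below counts transitions and derives a simple stability rate; the customer IDs, labels, and the `migration_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical segment labels for the same customers in two periods
previous = {
    "c1": "Window Shoppers",
    "c2": "Bargain Hunters",
    "c3": "Window Shoppers",
    "c4": "High Spenders",
}
current = {
    "c1": "Bargain Hunters",
    "c2": "Bargain Hunters",
    "c3": "Window Shoppers",
    "c4": "High Spenders",
}

def migration_counts(previous, current):
    """Count (from_segment, to_segment) pairs for customers present in both periods."""
    shared = previous.keys() & current.keys()
    return Counter((previous[c], current[c]) for c in shared)

migrations = migration_counts(previous, current)
stable = sum(n for (src, dst), n in migrations.items() if src == dst)
stability_rate = stable / sum(migrations.values())
print(migrations, stability_rate)
```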

### 3. Future Enhancements

To further refine our customer segmentation approach, we recommend:

1. **Dynamic Segmentation**: Implement a system that updates customer segments periodically as their behavior evolves

2. **Predictive Modeling**: Develop models to predict future customer behavior and value within each segment

3. **Micro-Segmentation**: Further divide main segments into more specific micro-segments for even more targeted approaches

4. **A/B Testing**: Test different marketing approaches for each segment to continuously optimize strategies

5. **Churn Prediction**: Identify at-risk customers within each segment and develop segment-specific retention strategies

6. **Cross-Segment Analysis**: Identify patterns of customers migrating between segments and optimize for favorable migrations
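
For the A/B testing item, one common way to compare conversion rates between a control and a variant is a two-proportion z-test. The sketch below is a minimal standard-library version; the conversion counts are hypothetical, and it assumes samples large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical campaign: control converts 50 of 1000, variant 70 of 1000
z, p_value = two_proportion_z_test(50, 1000, 70, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Running the test separately per segment keeps a strong effect in one segment from being diluted by the others.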

## Conclusion

This customer segmentation analysis has successfully identified three distinct customer segments with unique characteristics and behaviors. The findings align well with our initial expectations of Bargain Hunters, High Spenders, and Window Shoppers in the e-commerce context.

The segment-specific strategic recommendations provide a clear roadmap for targeted marketing approaches that can maximize the value of each customer group. By implementing these strategies, businesses can optimize their marketing spend, improve customer satisfaction through more relevant experiences, and ultimately drive revenue growth.

The true value of this segmentation isn't just in the initial identification of customer groups, but in the ongoing application of these insights across all customer touchpoints. By treating different customers differently based on their demonstrated behaviors and preferences, businesses can create more meaningful relationships with customers while improving their own bottom line.

In summary, the three-segment approach provides a balanced and actionable customer segmentation framework that allows for meaningful differentiation in marketing strategy while remaining simple enough for practical implementation. With the right implementation and ongoing monitoring, these customer segments can form the foundation for a more customer-centric and profitable business strategy.