# Pepper Market Segment Analysis

This notebook analyzes Pepper's effectiveness in serving the small-chested customer segment.

## Business Question
"Is Pepper successfully capturing and retaining the underserved small-chested customer segment?"

## Analysis Approach
1. Size Distribution Analysis
2. Retention Rate Analysis
3. Market Opportunity Analysis

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

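# Set up plotting style
# (matplotlib deprecated the bare 'seaborn' style name in 3.6 and later removed it,
# so the theme and palette are set through seaborn itself)
sns.set_theme(style='darkgrid', palette='deep')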

# Configure pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

## Data Loading

Load the transformed data that matches TheLook's schema:

In [None]:
# Set up data paths
data_dir = Path('../data/pepper')

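# Load latest product data
# (the lexicographically last filename is taken as the latest, which assumes
# the filename suffix sorts chronologically)
product_files = sorted(data_dir.glob('transformed_bra_products_*.csv'))
if not product_files:
    raise FileNotFoundError(f'No product files found in {data_dir}')
latest_product_file = product_files[-1]
products_df = pd.read_csv(latest_product_file)

# Load latest order data (same filename assumption as above)
order_files = sorted(data_dir.glob('transformed_order_items_*.csv'))
if not order_files:
    raise FileNotFoundError(f'No order item files found in {data_dir}')
latest_order_file = order_files[-1]
orders_df = pd.read_csv(latest_order_file)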

print(f'Loaded {len(products_df)} products and {len(orders_df)} order items')
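
Every later step depends on a few specific columns, so it can help to check for them right after loading. The sketch below is one way to do that. The helper name `check_columns` is hypothetical, the column lists are only the columns this notebook references (not a full TheLook schema), and the toy row is made up.

```python
import pandas as pd

# Columns referenced later in this notebook (not a full TheLook schema)
PRODUCT_COLUMNS = ['id', 'sku', 'retail_price']
ORDER_COLUMNS = ['product_id', 'user_id', 'created_at', 'sale_price']


def check_columns(df, required, name):
    """Hypothetical helper: raise a clear error if expected columns are missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f'{name} is missing columns: {missing}')
    return df


# Made-up frame standing in for the loaded product CSV
toy_products = pd.DataFrame({'id': [1], 'sku': ['PEP-32A'], 'retail_price': [54.0]})
check_columns(toy_products, PRODUCT_COLUMNS, 'products_df')
```

In the notebook itself the same call could be applied to `products_df` and `orders_df` before any analysis cell runs.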

## 1. Size Distribution Analysis

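Measure how Pepper's catalog is distributed across size segments (comparing against industry standards requires benchmark data that this notebook does not load):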

In [None]:
def analyze_size_distribution(products_df):
    """Analyze and visualize product size distribution."""
    # Extract size from SKU
    products_df['size'] = products_df['sku'].str[-3:]
    
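    # Define size segments from the cup letter at the end of the size
    def get_segment(size):
        # A trailing 'A' covers both A and AA cups; the original check
        # compared a single character against 'AA', which can never match
        if size.endswith('A'):
            return 'Core (A/AA)'
        elif size.endswith('B'):
            return 'Extended (B)'
        else:
            return 'Other'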
    
    products_df['segment'] = products_df['size'].apply(get_segment)
    
    # Plot distribution
    plt.figure(figsize=(12, 6))
    sns.countplot(data=products_df, x='segment')
    plt.title('Product Distribution by Size Segment')
    plt.xlabel('Size Segment')
    plt.ylabel('Number of Products')
    
    return products_df

# Analyze size distribution
products_with_segments = analyze_size_distribution(products_df)

# Show segment percentages
segment_dist = products_with_segments['segment'].value_counts(normalize=True)
print('\nSegment Distribution:')
print(segment_dist)
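
Slicing the last three characters works only when every size code has the same length. If SKUs end in a two-digit band size followed by cup letters (an assumption about the SKU format that the data here does not confirm), a regular expression can pull the cup out directly. A minimal sketch with made-up SKUs:

```python
import pandas as pd

# Made-up SKUs for illustration; the real SKU format may differ
toy = pd.DataFrame({'sku': ['PEP-001-32AA', 'PEP-001-34A', 'PEP-002-36B', 'PEP-003-XX']})

# Capture the cup letters that follow a two-digit band size at the end of the SKU
toy['cup'] = toy['sku'].str.extract(r'\d{2}([A-Z]+)$', expand=False)

# Map cups to the notebook's segment labels; anything unmatched falls into 'Other'
segment_map = {'AA': 'Core (A/AA)', 'A': 'Core (A/AA)', 'B': 'Extended (B)'}
toy['segment'] = toy['cup'].map(segment_map).fillna('Other')

print(toy)
```

An explicit mapping also makes it easy to see which cups land in each segment, and SKUs that do not match the pattern are labeled `Other` instead of raising an error.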

## 2. Retention Rate Analysis

Calculate and compare retention rates across segments:

In [None]:
def analyze_retention(orders_df, products_df):
    """Calculate retention rates by segment."""
    # Merge order items with products to get segments
    orders_with_segments = orders_df.merge(
        products_df[['id', 'segment']],
        left_on='product_id',
        right_on='id'
    )
    
    # Calculate days between purchases for each customer
    orders_with_segments['created_at'] = pd.to_datetime(orders_with_segments['created_at'])
    orders_with_segments = orders_with_segments.sort_values(['user_id', 'created_at'])
    
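    # Calculate retention (a repeat purchase 1-90 days after the first one)
    def get_retention(purchase_dates):
        if len(purchase_dates) < 2:
            return False
        days_since_first = (purchase_dates - purchase_dates.iloc[0]).dt.days
        return bool(days_since_first.between(1, 90).any())
    
    # Apply to the date column only, one group per customer and segment;
    # cast to float so the per-segment mean is a rate between 0 and 1
    retention = (
        orders_with_segments.groupby(['user_id', 'segment'])['created_at']
        .apply(get_retention)
        .astype(float)
    )
    retention_rates = retention.groupby(level='segment').mean()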
    
    # Plot retention rates
    plt.figure(figsize=(12, 6))
    retention_rates.plot(kind='bar')
    plt.title('90-Day Retention Rate by Segment')
    plt.xlabel('Segment')
    plt.ylabel('Retention Rate')
    
    return retention_rates

# Calculate retention rates
retention_rates = analyze_retention(orders_df, products_with_segments)
print('\nRetention Rates by Segment:')
print(retention_rates)
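
To make the retention rule concrete, the sketch below applies the same 90-day rule to a handful of made-up orders. The data and the helper name `returned_within_90_days` are illustrative assumptions, not Pepper data.

```python
import pandas as pd

# Made-up orders: user 1 returns after 30 days, user 2 after 200 days, user 3 buys once
toy_orders = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'segment': ['Core (A/AA)'] * 5,
    'created_at': pd.to_datetime([
        '2024-01-01', '2024-01-31',
        '2024-01-01', '2024-07-19',
        '2024-02-10',
    ]),
}).sort_values(['user_id', 'created_at'])


def returned_within_90_days(purchase_dates):
    """True if any purchase falls 1-90 days after the customer's first purchase."""
    days_since_first = (purchase_dates - purchase_dates.iloc[0]).dt.days
    return bool(days_since_first.between(1, 90).any())


# One True/False flag per (customer, segment), cast to float for averaging
retained = (
    toy_orders.groupby(['user_id', 'segment'])['created_at']
    .apply(returned_within_90_days)
    .astype(float)
)
toy_rate = retained.groupby(level='segment').mean()
print(toy_rate)  # one of the three customers counts as retained
```

Under this rule a single-purchase customer counts as not retained, and a same-day repeat purchase does not count as a return.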

## 3. Market Opportunity Analysis

Calculate revenue and average order value by segment:

In [None]:
def analyze_market_opportunity(orders_df, products_df):
    """Calculate revenue metrics by segment."""
    # Merge orders with products
    revenue_data = orders_df.merge(
        products_df[['id', 'segment', 'retail_price']],
        left_on='product_id',
        right_on='id'
    )
    
    # Calculate revenue by segment
    segment_revenue = revenue_data.groupby('segment')['sale_price'].sum()
    
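    # Calculate average order value by segment: total each order first, then
    # average the order totals (assumes the order items carry TheLook's
    # order_id column; a plain mean of sale_price is an average item price)
    order_totals = revenue_data.groupby(['segment', 'order_id'])['sale_price'].sum()
    segment_aov = order_totals.groupby(level='segment').mean()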
    
    # Plot revenue distribution
    plt.figure(figsize=(12, 6))
    segment_revenue.plot(kind='pie', autopct='%1.1f%%')
    plt.title('Revenue Distribution by Segment')
    
    return pd.DataFrame({
        'Total Revenue': segment_revenue,
        'Average Order Value': segment_aov
    })

# Calculate market metrics
market_metrics = analyze_market_opportunity(orders_df, products_with_segments)
print('\nMarket Metrics by Segment:')
print(market_metrics)
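
Average order value is conventionally computed per order, which differs from averaging `sale_price` over individual order items. The sketch below shows the difference on made-up rows. It assumes an `order_id` column, as in TheLook's `order_items` table.

```python
import pandas as pd

# Made-up order items: order 100 has two items, order 101 has one
toy_items = pd.DataFrame({
    'order_id': [100, 100, 101],
    'segment': ['Core (A/AA)', 'Core (A/AA)', 'Core (A/AA)'],
    'sale_price': [50.0, 40.0, 60.0],
})

# Average price per item
avg_item_price = toy_items.groupby('segment')['sale_price'].mean()

# Average order value: total each order, then average the order totals
order_totals = toy_items.groupby(['segment', 'order_id'])['sale_price'].sum()
avg_order_value = order_totals.groupby(level='segment').mean()

print(avg_item_price['Core (A/AA)'])   # 50.0
print(avg_order_value['Core (A/AA)'])  # 75.0
```

Grouping by both segment and order means an order that spans several segments contributes a partial total to each, which may or may not be the intended definition.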

## Key Findings

1. Size Distribution
   - X% of products in core segment (A/AA cups)
   - Compared to Y% industry average

2. Retention Rates
   - Z% higher 90-day retention in core segment
   - Suggests product-market fit

3. Market Opportunity
   - $W million revenue in core segment
   - V% higher AOV in target market
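
The placeholders above can be derived from the notebook's outputs. The sketch below shows one possible way, using made-up inputs shaped like `segment_dist`, `retention_rates`, and `market_metrics`. The numbers are illustrative only and the helper `pct_higher` is hypothetical. The industry average (Y) is left out because it requires external data.

```python
import pandas as pd

CORE = 'Core (A/AA)'

# Made-up values shaped like the notebook's outputs (not real results)
segment_dist = pd.Series({CORE: 0.6, 'Extended (B)': 0.3, 'Other': 0.1})
retention_rates = pd.Series({CORE: 0.30, 'Extended (B)': 0.25, 'Other': 0.20})
market_metrics = pd.DataFrame(
    {'Total Revenue': [600.0, 300.0, 100.0], 'Average Order Value': [66.0, 60.0, 60.0]},
    index=[CORE, 'Extended (B)', 'Other'],
)


def pct_higher(series, key):
    """Hypothetical helper: percent by which series[key] exceeds the mean of the other entries."""
    others = series.drop(key).mean()
    return (series[key] / others - 1) * 100


core_share_pct = segment_dist[CORE] * 100                                # X
retention_lift_pct = pct_higher(retention_rates, CORE)                   # Z
core_revenue = market_metrics.loc[CORE, 'Total Revenue']                 # W, in dollars
aov_lift_pct = pct_higher(market_metrics['Average Order Value'], CORE)   # V
```

Comparing the core segment against the unweighted mean of the other segments is a design choice. Weighting by customer or order counts would be a reasonable alternative.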

## Business Recommendations

1. Product Strategy
   - Focus on expanding core size range
   - Maintain price positioning

2. Customer Retention
   - Leverage high retention in marketing
   - Create targeted loyalty program

3. Growth Opportunities
   - Expand in high-performing segments
   - Target look-alike customers