# E-commerce Business Performance Analysis

## Executive Summary

This notebook analyzes e-commerce business performance with a focus on revenue metrics, customer experience, product performance, and geographic distribution. The analysis can be configured for different time periods and is intended to support business decision-making.

## Table of Contents

1. [Business Objectives](#business-objectives)
2. [Data Loading & Configuration](#data-loading--configuration)
3. [Data Dictionary](#data-dictionary)
4. [Data Preparation & Quality Assessment](#data-preparation--quality-assessment)
5. [Revenue Analysis](#revenue-analysis)
6. [Product Performance Analysis](#product-performance-analysis)
7. [Geographic Performance Analysis](#geographic-performance-analysis)
8. [Customer Experience Analysis](#customer-experience-analysis)
9. [Operational Metrics](#operational-metrics)
10. [Summary & Key Insights](#summary--key-insights)

## Business Objectives

This analysis aims to answer key business questions:

- **Revenue Performance**: How is our revenue trending year-over-year and month-over-month?
- **Order Metrics**: What are our average order values and order volume trends?
- **Product Portfolio**: Which product categories drive the most revenue?
- **Geographic Performance**: Which states/regions are our strongest markets?
- **Customer Experience**: How do delivery times impact customer satisfaction?
- **Operational Efficiency**: What is our order fulfillment performance?

## Data Loading & Configuration

### Analysis Configuration

Configure the analysis parameters below. Modify these settings to analyze different time periods or comparison years.

In [None]:
# Analysis Configuration
ANALYSIS_YEAR = 2023          # Primary year for analysis
COMPARISON_YEAR = 2022        # Year to compare against
ANALYSIS_MONTHS = None        # Specific months to analyze (None = all months)
DATA_PATH = 'ecommerce_data'  # Path to data directory

# Display configuration
print("Analysis Configuration:")
print(f"  Primary Analysis Year: {ANALYSIS_YEAR}")
print(f"  Comparison Year: {COMPARISON_YEAR}")
print(f"  Analysis Months: {'All months' if ANALYSIS_MONTHS is None else ANALYSIS_MONTHS}")
print(f"  Data Path: {DATA_PATH}")
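
Since every later cell depends on these settings, it can help to validate them before any data is loaded. The sketch below uses a hypothetical `validate_config` helper (not part of the notebook's modules). It assumes that the comparison year should precede the analysis year and that months are integers from 1 to 12.

```python
def validate_config(analysis_year, comparison_year, analysis_months=None):
    """Hypothetical helper: raise ValueError on inconsistent settings.

    Returns the sorted, de-duplicated month list, or None for "all months".
    """
    # Assumption: comparisons always look back to an earlier year.
    if comparison_year >= analysis_year:
        raise ValueError("COMPARISON_YEAR must be earlier than ANALYSIS_YEAR")
    if analysis_months is None:
        return None
    months = sorted(set(analysis_months))
    if not months or any(m < 1 or m > 12 for m in months):
        raise ValueError("ANALYSIS_MONTHS must contain values from 1 to 12")
    return months


print(validate_config(2023, 2022, [3, 1, 2]))  # [1, 2, 3]
```

In the notebook this could be called as `validate_config(ANALYSIS_YEAR, COMPARISON_YEAR, ANALYSIS_MONTHS)` right after the configuration cell.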

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings

# Import custom modules
from data_loader import load_and_prepare_analysis_data, get_date_range_options, filter_by_date_range
from business_metrics import (
    calculate_revenue_metrics,
    calculate_monthly_growth,
    calculate_order_metrics,
    calculate_product_category_performance,
    calculate_geographic_performance,
    calculate_customer_experience_metrics,
    calculate_order_status_distribution,
    get_monthly_revenue_data
)

# Configure display options
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Business color scheme
BUSINESS_COLORS = {
    'primary': '#2E86AB',
    'secondary': '#A23B72', 
    'success': '#F18F01',
    'warning': '#C73E1D',
    'neutral': '#6C757D'
}

print("Libraries imported successfully")

## Data Dictionary

### Key Data Sources

| Dataset | Description | Key Columns |
|---------|-------------|-------------|
| **orders** | Order-level information | order_id, customer_id, order_status, timestamps |
| **order_items** | Item-level details for each order | order_id, product_id, price, freight_value |
| **products** | Product catalog information | product_id, product_category_name, dimensions |
| **customers** | Customer demographic data | customer_id, customer_state, customer_city |
| **reviews** | Customer feedback and ratings | order_id, review_score, review_creation_date |

### Key Business Terms

- **Revenue**: Total sales amount from delivered orders (excludes freight)
- **AOV (Average Order Value)**: Average total value per order
- **Delivery Speed**: Days between order purchase and customer delivery
- **Review Score**: Customer satisfaction rating (1-5 scale)
- **Order Status**: Current state of order (delivered, shipped, canceled, etc.)
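
To make these definitions concrete, here is a minimal sketch on toy data. It assumes one row per order item and a delivery timestamp column named `order_delivered_customer_date`; that column name is an assumption, and the actual calculations live in `business_metrics`.

```python
import pandas as pd

# Toy item-level data: one row per order item (assumed layout).
items = pd.DataFrame({
    "order_id": ["a", "a", "b"],
    "price": [10.0, 20.0, 30.0],
    "freight_value": [2.0, 2.0, 5.0],
    "order_purchase_timestamp": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-03"]),
    # Assumed column name for the customer delivery timestamp.
    "order_delivered_customer_date": pd.to_datetime(
        ["2023-01-04", "2023-01-04", "2023-01-10"]),
})

# Revenue: sum of item prices, freight excluded.
revenue = items["price"].sum()

# AOV: mean of the per-order totals.
aov = items.groupby("order_id")["price"].sum().mean()

# Delivery speed: whole days between purchase and delivery, one value per order.
orders = items.drop_duplicates("order_id")
delivery_days = (
    orders["order_delivered_customer_date"] - orders["order_purchase_timestamp"]
).dt.days

print(revenue, aov, delivery_days.mean())  # 60.0 30.0 5.0
```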

## Data Preparation & Quality Assessment

In [None]:
# Load and prepare data
sales_data, raw_datasets = load_and_prepare_analysis_data(
    data_path=DATA_PATH,
    filter_delivered=True  # Focus on successfully delivered orders
)

# Get available date ranges
date_options = get_date_range_options(sales_data)
print(f"\nAvailable Data:")
print(f"  Years: {date_options['available_years']}")
print(f"  Date Range: {date_options['date_range']['min_date'].strftime('%Y-%m-%d')} to {date_options['date_range']['max_date'].strftime('%Y-%m-%d')}")

In [None]:
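# Apply analysis filters if specified.
# Note: only the first and last month are passed on, so gaps in a list
# such as [1, 6, 12] are ignored.
if ANALYSIS_MONTHS:
    start_month, end_month = min(ANALYSIS_MONTHS), max(ANALYSIS_MONTHS)
    sales_data = filter_by_date_range(
        sales_data, 
        start_month=start_month,
        end_month=end_month
    )
    print(f"Filtered data to months {start_month}-{end_month}")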

# Display data overview
print(f"\nFinal Dataset Overview:")
print(f"  Total Records: {len(sales_data):,}")
print(f"  Unique Orders: {sales_data['order_id'].nunique():,}")
print(f"  Date Range: {sales_data['order_purchase_timestamp'].min().strftime('%Y-%m-%d')} to {sales_data['order_purchase_timestamp'].max().strftime('%Y-%m-%d')}")
print(f"  Item Price Range: ${sales_data['price'].min():.2f} to ${sales_data['price'].max():.2f}")
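
A quick per-column check for missing values and duplicates is one way to cover the quality-assessment part of this section. The `quality_report` helper below is a hypothetical sketch run on toy data; it is not part of `data_loader`.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing count, missing share (%), and distinct non-null values."""
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(),
    })


# Toy data with one missing price and a repeated order_id.
toy = pd.DataFrame({"order_id": ["a", "b", "b"], "price": [10.0, None, 5.0]})
report = quality_report(toy)
duplicate_rows = int(toy.duplicated().sum())  # fully identical rows only

print(report)
print(f"Duplicate rows: {duplicate_rows}")
```

In the notebook, `quality_report(sales_data)` would give the same overview for the prepared dataset.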

## Revenue Analysis

### Year-over-Year Revenue Performance

In [None]:
# Calculate revenue metrics
revenue_metrics = calculate_revenue_metrics(
    sales_data, 
    target_year=ANALYSIS_YEAR, 
    comparison_year=COMPARISON_YEAR
)

print(f"Revenue Performance Analysis ({ANALYSIS_YEAR} vs {COMPARISON_YEAR})")
print("=" * 60)
print(f"Total Revenue {ANALYSIS_YEAR}: ${revenue_metrics['total_revenue']:,.2f}")
print(f"Total Revenue {COMPARISON_YEAR}: ${revenue_metrics['comparison_revenue']:,.2f}")
print(f"Year-over-Year Growth: {revenue_metrics['revenue_growth']:.2f}%")

if revenue_metrics['revenue_growth'] > 0:
    print(f"Status: Revenue increased by ${revenue_metrics['total_revenue'] - revenue_metrics['comparison_revenue']:,.2f}")
else:
    print(f"Status: Revenue decreased by ${revenue_metrics['comparison_revenue'] - revenue_metrics['total_revenue']:,.2f}")

### Monthly Revenue Trends

In [None]:
# Calculate monthly growth
monthly_growth = calculate_monthly_growth(sales_data, year=ANALYSIS_YEAR)

print(f"Monthly Growth Analysis for {ANALYSIS_YEAR}")
print("=" * 40)
print(f"Average Monthly Growth Rate: {monthly_growth['average_monthly_growth']:.2f}%")

# Get monthly revenue data for visualization
monthly_revenue = get_monthly_revenue_data(sales_data, year=ANALYSIS_YEAR)

# Create monthly revenue trend visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Monthly revenue trend
ax1.plot(monthly_revenue['month'], monthly_revenue['price'], 
         marker='o', linewidth=2, color=BUSINESS_COLORS['primary'])
ax1.set_title(f'Monthly Revenue Trend ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')
ax1.set_xlabel('Month')
ax1.set_ylabel('Revenue ($)')
ax1.grid(True, alpha=0.3)
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

# Monthly growth rate
growth_data = monthly_growth['monthly_growth_series'].dropna() * 100
colors = [BUSINESS_COLORS['success'] if x >= 0 else BUSINESS_COLORS['warning'] for x in growth_data.values]
ax2.bar(growth_data.index, growth_data.values, color=colors, alpha=0.7)
ax2.set_title(f'Month-over-Month Growth Rate ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')
ax2.set_xlabel('Month')
ax2.set_ylabel('Growth Rate (%)')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0, color='black', linestyle='-', alpha=0.5)

plt.tight_layout()
plt.show()
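
Month-over-month growth can be derived from a monthly revenue series with `Series.pct_change`. The sketch below uses toy values and is not the `business_metrics` implementation. Note that `pct_change` returns fractions rather than percentages, so the values need multiplying by 100 for display.

```python
import pandas as pd

# Toy monthly revenue indexed by month number.
monthly_revenue = pd.Series(
    [100.0, 120.0, 90.0],
    index=pd.Index([1, 2, 3], name="month"),
    name="price",
)

# Fractional change versus the previous month; the first month has no
# predecessor and is therefore NaN.
mom_growth = monthly_revenue.pct_change()

# Mean of the available monthly changes, expressed in percent.
avg_growth_pct = mom_growth.mean() * 100

print(mom_growth)
print(f"Average monthly growth: {avg_growth_pct:.2f}%")
```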

### Order Value Analysis

In [None]:
# Calculate order metrics
order_metrics = calculate_order_metrics(
    sales_data, 
    target_year=ANALYSIS_YEAR, 
    comparison_year=COMPARISON_YEAR
)

print(f"Order Performance Analysis ({ANALYSIS_YEAR} vs {COMPARISON_YEAR})")
print("=" * 60)
print(f"Average Order Value {ANALYSIS_YEAR}: ${order_metrics['avg_order_value']:.2f}")
print(f"Total Orders {ANALYSIS_YEAR}: {order_metrics['total_orders']:,}")
print(f"AOV Growth: {order_metrics['aov_growth']:.2f}%")
print(f"Order Count Growth: {order_metrics['order_count_growth']:.2f}%")

# Calculate implied metrics
avg_items_per_order = len(sales_data[sales_data['year'] == ANALYSIS_YEAR]) / order_metrics['total_orders']
print(f"Average Items per Order: {avg_items_per_order:.1f}")

## Product Performance Analysis

### Revenue by Product Category

In [None]:
# Calculate product category performance
category_performance = calculate_product_category_performance(
    sales_data, 
    raw_datasets['products'], 
    year=ANALYSIS_YEAR
)

print(f"Top Product Categories by Revenue ({ANALYSIS_YEAR})")
print("=" * 50)
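# Rank by position rather than by index label, which may not start at 0 after sorting
for rank, (_, row) in enumerate(category_performance.head(5).iterrows(), start=1):
    print(f"{rank}. {row['product_category_name'].replace('_', ' ').title()}: ${row['price']:,.2f}")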

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart for top categories
top_categories = category_performance.head(10)
bars = ax1.bar(range(len(top_categories)), top_categories['price'], 
               color=BUSINESS_COLORS['primary'], alpha=0.8)
ax1.set_title(f'Top 10 Product Categories by Revenue ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')
ax1.set_xlabel('Product Category')
ax1.set_ylabel('Revenue ($)')
ax1.set_xticks(range(len(top_categories)))
ax1.set_xticklabels([cat.replace('_', ' ').title() for cat in top_categories['product_category_name']], 
                    rotation=45, ha='right')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))
ax1.grid(True, alpha=0.3)

# Pie chart for category distribution
top_5 = category_performance.head(5)
others_revenue = category_performance.iloc[5:]['price'].sum()
pie_data = list(top_5['price']) + [others_revenue]
pie_labels = [cat.replace('_', ' ').title() for cat in top_5['product_category_name']] + ['Others']

ax2.pie(pie_data, labels=pie_labels, autopct='%1.1f%%', startangle=90)
ax2.set_title(f'Revenue Distribution by Category ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()
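
Category revenue of this kind is typically obtained by joining item-level sales to the product catalog and aggregating. The following is a minimal sketch on toy data, assuming both tables share a `product_id` column; `calculate_product_category_performance` may differ in detail.

```python
import pandas as pd

sales = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p1"],
    "price": [10.0, 40.0, 5.0, 10.0],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": ["toys", "garden_tools", "toys"],
})

by_category = (
    sales.merge(products, on="product_id", how="left")  # attach category to each item
    .groupby("product_category_name", as_index=False)["price"]
    .sum()
    .sort_values("price", ascending=False)
    .reset_index(drop=True)  # positions 0..n-1 now match the revenue rank
)

for rank, row in enumerate(by_category.itertuples(index=False), start=1):
    print(f"{rank}. {row.product_category_name}: ${row.price:,.2f}")
```

Resetting the index after sorting keeps positional access such as `.iloc[0]` and any rank labels consistent with the revenue order.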

## Geographic Performance Analysis

### Revenue by State

In [None]:
# Calculate geographic performance
state_performance = calculate_geographic_performance(
    sales_data, 
    raw_datasets['orders'], 
    raw_datasets['customers'], 
    year=ANALYSIS_YEAR
)

print(f"Top States by Revenue ({ANALYSIS_YEAR})")
print("=" * 40)
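# Rank by position rather than by index label, which may not start at 0 after sorting
for rank, (_, row) in enumerate(state_performance.head(10).iterrows(), start=1):
    print(f"{rank}. {row['customer_state']}: ${row['price']:,.2f}")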

# Create state revenue map
fig = px.choropleth(
    state_performance,
    locations='customer_state',
    color='price',
    locationmode='USA-states',
    scope='usa',
    title=f'Revenue by State ({ANALYSIS_YEAR})',
    color_continuous_scale='Blues',
    labels={'price': 'Revenue ($)'}
)

fig.update_layout(
    title_font_size=16,
    title_x=0.5,
    height=500
)

fig.show()
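
Plotly's `USA-states` location mode expects two-letter state abbreviations, and rows with other codes are typically not drawn. A pre-check like the one below can surface such rows before plotting. This is a sketch on toy data; `US_STATE_CODES` is a hand-written set (50 states plus DC) introduced here for illustration.

```python
import pandas as pd

# Two-letter abbreviations for the 50 US states plus DC.
US_STATE_CODES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY", "DC",
}

state_revenue = pd.DataFrame({
    "customer_state": ["CA", " ny", "SP"],
    "price": [300.0, 200.0, 50.0],
})

# Normalize whitespace and case before matching.
codes = state_revenue["customer_state"].str.strip().str.upper()
is_valid = codes.isin(US_STATE_CODES)

mappable = state_revenue.assign(customer_state=codes)[is_valid]
unmatched = sorted(codes[~is_valid].unique())

print(mappable)
print(f"Codes that would not be drawn: {unmatched}")
```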

## Customer Experience Analysis

### Delivery Performance and Customer Satisfaction

In [None]:
# Calculate customer experience metrics
cx_metrics = calculate_customer_experience_metrics(
    sales_data, 
    raw_datasets['reviews'], 
    year=ANALYSIS_YEAR
)

print(f"Customer Experience Metrics ({ANALYSIS_YEAR})")
print("=" * 45)
print(f"Average Review Score: {cx_metrics['avg_review_score']:.2f}/5.0")
print(f"Average Delivery Time: {cx_metrics['avg_delivery_days']:.1f} days")

print(f"\nDelivery Speed Impact on Reviews:")
for category, score in cx_metrics['delivery_impact_on_reviews'].items():
    print(f"  {category}: {score:.2f}/5.0")

# Create visualizations
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))

# Review score distribution
review_dist = cx_metrics['review_distribution']
ax1.bar(review_dist.index, review_dist.values, color=BUSINESS_COLORS['primary'], alpha=0.8)
ax1.set_title(f'Review Score Distribution ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')
ax1.set_xlabel('Review Score')
ax1.set_ylabel('Proportion of Reviews')
ax1.set_ylim(0, max(review_dist.values) * 1.1)
for i, v in enumerate(review_dist.values):
    ax1.text(review_dist.index[i], v + 0.01, f'{v:.1%}', ha='center', va='bottom')

# Delivery impact on reviews
delivery_impact = cx_metrics['delivery_impact_on_reviews']
bars = ax2.bar(delivery_impact.index, delivery_impact.values, 
               color=[BUSINESS_COLORS['success'], BUSINESS_COLORS['warning'], BUSINESS_COLORS['secondary']], 
               alpha=0.8)
ax2.set_title('Review Score by Delivery Speed', fontsize=14, fontweight='bold')
ax2.set_xlabel('Delivery Speed Category')
ax2.set_ylabel('Average Review Score')
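# Scale the axis to the data so bars outside a fixed window are not clipped
ax2.set_ylim(max(0, delivery_impact.min() - 0.3), min(5.2, delivery_impact.max() + 0.3))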
for i, v in enumerate(delivery_impact.values):
    ax2.text(i, v + 0.02, f'{v:.2f}', ha='center', va='bottom')

# Average review score as a percentage of the 5-point maximum
# (a proxy gauge, not the share of satisfied customers)
satisfaction_pct = (cx_metrics['avg_review_score'] / 5.0) * 100
ax3.pie([satisfaction_pct, 100-satisfaction_pct], 
        labels=[f'Avg. score\n({satisfaction_pct:.1f}% of max)', ''], 
        colors=[BUSINESS_COLORS['success'], '#f0f0f0'],
        startangle=90,
        counterclock=False)
ax3.set_title(f'Overall Customer Satisfaction\n({cx_metrics["avg_review_score"]:.2f}/5.0)', 
              fontsize=14, fontweight='bold')

# Delivery performance summary
ax4.text(0.5, 0.7, "Average Delivery Time", ha='center', va='center', fontsize=16, fontweight='bold')
ax4.text(0.5, 0.5, f"{cx_metrics['avg_delivery_days']:.1f} days", ha='center', va='center', fontsize=24, 
         color=BUSINESS_COLORS['primary'], fontweight='bold')
ax4.text(0.5, 0.3, f"Customer Satisfaction: {cx_metrics['avg_review_score']:.2f}/5.0", 
         ha='center', va='center', fontsize=14)
ax4.set_xlim(0, 1)
ax4.set_ylim(0, 1)
ax4.axis('off')

plt.tight_layout()
plt.show()
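
The delivery-speed comparison rests on bucketing delivery times and averaging review scores per bucket. The sketch below shows one way to do that with `pd.cut`. The bucket labels and bin edges are assumptions; the actual categories are defined in `business_metrics`.

```python
import pandas as pd

reviews = pd.DataFrame({
    "delivery_days": [2, 3, 5, 7, 12],
    "review_score": [5, 5, 4, 4, 2],
})

# Assumed buckets. Intervals are right-closed: (0, 3], (3, 7], (7, inf).
# A same-day delivery (0 days) would fall outside the first bin and become NaN.
bins = [0, 3, 7, float("inf")]
labels = ["1-3 days", "4-7 days", "8+ days"]

reviews["delivery_category"] = pd.cut(reviews["delivery_days"], bins=bins, labels=labels)

# Mean review score per delivery bucket.
impact = reviews.groupby("delivery_category", observed=True)["review_score"].mean()
print(impact)
```

On this toy data the averages fall from 5.0 to 4.0 to 2.0 as delivery slows, which is the pattern the bar chart is designed to reveal.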

## Operational Metrics

### Order Status Distribution and Fulfillment Performance

In [None]:
# Calculate order status distribution
order_status_dist = calculate_order_status_distribution(raw_datasets['orders'], year=ANALYSIS_YEAR)

print(f"Order Status Distribution ({ANALYSIS_YEAR})")
print("=" * 40)
for status, proportion in order_status_dist.items():
    print(f"{status.title()}: {proportion:.1%}")

# Calculate fulfillment metrics
delivered_rate = order_status_dist.get('delivered', 0)
canceled_rate = order_status_dist.get('canceled', 0)
pending_processing_rate = order_status_dist.get('pending', 0) + order_status_dist.get('processing', 0)

print(f"\nKey Operational Metrics:")
print(f"  Delivery Success Rate: {delivered_rate:.1%}")
print(f"  Cancellation Rate: {canceled_rate:.1%}")
print(f"  Orders in Progress: {pending_processing_rate:.1%}")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Order status pie chart
colors = [BUSINESS_COLORS['success'], BUSINESS_COLORS['secondary'], BUSINESS_COLORS['warning'], 
          BUSINESS_COLORS['neutral'], BUSINESS_COLORS['primary'], '#8B4513']
wedges, texts, autotexts = ax1.pie(order_status_dist.values, 
                                   labels=[s.title() for s in order_status_dist.index], 
                                   autopct='%1.1f%%', 
                                   colors=colors[:len(order_status_dist)],
                                   startangle=90)
ax1.set_title(f'Order Status Distribution ({ANALYSIS_YEAR})', fontsize=14, fontweight='bold')

# Operational KPIs
kpi_metrics = ['Delivery Rate', 'Cancellation Rate', 'In Progress']
kpi_values = [delivered_rate * 100, canceled_rate * 100, pending_processing_rate * 100]
kpi_colors = [BUSINESS_COLORS['success'], BUSINESS_COLORS['warning'], BUSINESS_COLORS['neutral']]

bars = ax2.bar(kpi_metrics, kpi_values, color=kpi_colors, alpha=0.8)
ax2.set_title('Key Operational Metrics', fontsize=14, fontweight='bold')
ax2.set_ylabel('Percentage (%)')
ax2.set_ylim(0, 110)  # headroom for the value labels above tall bars
for i, v in enumerate(kpi_values):
    ax2.text(i, v + 1, f'{v:.1f}%', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()
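
A status distribution like this can be computed with `value_counts(normalize=True)` after restricting orders to the analysis year. The following is a sketch on toy data, assuming the year is taken from `order_purchase_timestamp`; `calculate_order_status_distribution` may differ.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_status": ["delivered"] * 8 + ["canceled", "shipped"],
    "order_purchase_timestamp": pd.to_datetime(["2023-02-01"] * 9 + ["2022-12-31"]),
})

# Keep only orders purchased in the analysis year.
in_year = orders[orders["order_purchase_timestamp"].dt.year == 2023]

# Share of orders per status; the shares sum to 1.
status_dist = in_year["order_status"].value_counts(normalize=True)

delivered_share = status_dist.get("delivered", 0)
shipped_share = status_dist.get("shipped", 0)  # absent in 2023, so the default applies

print(status_dist)
print(f"Delivered: {delivered_share:.1%}, Shipped: {shipped_share:.1%}")
```

Using `.get` with a default of 0 keeps the KPI lines from failing when a status does not occur in the selected year.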

## Summary & Key Insights

### Executive Summary Dashboard

In [None]:
# Compile key metrics for executive summary
print(f"EXECUTIVE SUMMARY - {ANALYSIS_YEAR} PERFORMANCE")
print("=" * 60)
print(f"Analysis Period: {ANALYSIS_YEAR} vs {COMPARISON_YEAR}")
print(f"Date Range: {sales_data['order_purchase_timestamp'].min().strftime('%B %Y')} - {sales_data['order_purchase_timestamp'].max().strftime('%B %Y')}")
print()

print("FINANCIAL PERFORMANCE:")
print(f"  Total Revenue: ${revenue_metrics['total_revenue']:,.0f}")
print(f"  YoY Revenue Growth: {revenue_metrics['revenue_growth']:+.1f}%")
print(f"  Average Order Value: ${order_metrics['avg_order_value']:.0f}")
print(f"  Total Orders: {order_metrics['total_orders']:,}")
print()

print("CUSTOMER EXPERIENCE:")
print(f"  Customer Satisfaction: {cx_metrics['avg_review_score']:.1f}/5.0")
print(f"  Average Delivery Time: {cx_metrics['avg_delivery_days']:.1f} days")
print(f"  Delivery Success Rate: {delivered_rate:.1%}")
print()

print("TOP PERFORMERS:")
print(f"  Best Product Category: {category_performance.iloc[0]['product_category_name'].replace('_', ' ').title()}")
print(f"  Highest Revenue State: {state_performance.iloc[0]['customer_state']}")
print(f"  Monthly Growth Rate: {monthly_growth['average_monthly_growth']:+.1f}%")
print()

print("KEY INSIGHTS:")
if revenue_metrics['revenue_growth'] > 0:
    print(f"  + Revenue growth of {revenue_metrics['revenue_growth']:.1f}% indicates positive business trajectory")
else:
    print(f"  - Revenue declined by {abs(revenue_metrics['revenue_growth']):.1f}%, requiring strategic attention")

if cx_metrics['avg_review_score'] >= 4.0:
    print(f"  + Strong customer satisfaction score of {cx_metrics['avg_review_score']:.1f}/5.0")
else:
    print(f"  - Customer satisfaction at {cx_metrics['avg_review_score']:.1f}/5.0 needs improvement")

if delivered_rate >= 0.9:
    print(f"  + Excellent delivery success rate of {delivered_rate:.1%}")
else:
    print(f"  - Delivery success rate of {delivered_rate:.1%} has room for improvement")

print(f"  + {category_performance.iloc[0]['product_category_name'].replace('_', ' ').title()} leads product portfolio")
print(f"  + {state_performance.iloc[0]['customer_state']} represents strongest geographic market")

### Recommendations for Action

Based on the analysis results, here are key recommendations:

#### Revenue Optimization
- Focus marketing efforts on high-performing product categories
- Develop strategies to improve average order value
- Investigate seasonal trends for better demand forecasting

#### Customer Experience Enhancement
- Optimize delivery processes to reduce shipping times
- Implement quality improvements for low-scoring product categories
- Develop customer retention programs for high-value segments

#### Geographic Expansion
- Increase marketing investment in high-performing states
- Analyze underperforming regions for growth opportunities
- Consider regional preferences in product offerings

#### Operational Excellence
- Improve order fulfillment processes to reduce cancellations
- Streamline inventory management for popular products
- Enhance customer service for better satisfaction scores

---

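**Analysis Configuration Summary:**
- Primary Year: value of `ANALYSIS_YEAR` (2023 in the configuration above)
- Comparison Year: value of `COMPARISON_YEAR` (2022 in the configuration above)
- Generated: timestamp of the notebook run
- Total Records Analyzed: `len(sales_data)`, as printed in the Final Dataset Overview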

*This analysis is configurable for different time periods by modifying the parameters at the beginning of the notebook.*
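
Markdown cells do not evaluate Python expressions, so a summary with live values has to come from a code cell. The sketch below uses a hypothetical `config_summary` helper; the values passed in the example call are placeholders.

```python
from datetime import datetime


def config_summary(analysis_year, comparison_year, n_records, generated_at=None):
    """Hypothetical helper: format the run configuration as a short text block."""
    generated_at = generated_at or datetime.now()
    lines = [
        "Analysis Configuration Summary:",
        f"- Primary Year: {analysis_year}",
        f"- Comparison Year: {comparison_year}",
        f"- Generated: {generated_at.strftime('%Y-%m-%d %H:%M')}",
        f"- Total Records Analyzed: {n_records:,}",
    ]
    return "\n".join(lines)


# Placeholder values; a fixed timestamp keeps the example reproducible.
print(config_summary(2023, 2022, 12345, datetime(2024, 1, 15, 9, 30)))
```

In the notebook the call would be `print(config_summary(ANALYSIS_YEAR, COMPARISON_YEAR, len(sales_data)))`.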