## E-Commerce Inventory & Sales Intelligence Dashboard

### Project Overview
This interactive dashboard analyzes sales performance, customer behavior, and product trends for a large online retailer operating across multiple regions. The system provides actionable insights through advanced SQL analytics and dynamic visualizations.

### Business Context
A growing e-commerce company needs to:
- Track product performance across different geographic regions
- Identify high-value customers and calculate Customer Lifetime Value (CLV)
- Analyze sales trends and regional performance
- Monitor product reviews and ratings

### Key Features
1. **Advanced SQL Analytics:**
   - Customer Lifetime Value (CLV) calculation using aggregations
   - Top-selling categories per region using Window Functions (RANK)
   - Moving averages for sales trends
   - Complex JOINs across multiple tables

2. **Interactive Visualizations:**
   - Regional sales heatmap
   - Daily revenue trend analysis
   - Customer value distribution
   - Product category performance

3. **Data Volume:**
   - 1,000 Customers
   - 500 Products across 5 categories
   - 10,000+ Orders
   - 5,000+ Reviews

In [1]:
# ========================================
# SECTION 1: IMPORTS AND SETUP
# ========================================

# Import required libraries
import sqlite3
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random
from faker import Faker

# Visualization libraries
import plotly.express as px
import plotly.graph_objects as go

# Interactive widgets

# Configure display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Initialize Faker for realistic data generation
fake = Faker()
Faker.seed(42)  # For reproducibility
random.seed(42)
np.random.seed(42)

print("✓ All libraries imported successfully")
print("✓ Random seeds set for reproducibility")

✓ All libraries imported successfully
✓ Random seeds set for reproducibility


In [2]:
# ========================================
# SECTION 2: DATABASE CONNECTION
# ========================================

def create_database_connection(db_name='ecommerce.db'):
    """
    Establish connection to SQLite database.

    Args:
        db_name (str): Name of the database file

    Returns:
        sqlite3.Connection: Database connection object
    """
    try:
        conn = sqlite3.connect(db_name)
        print(f"✓ Connected to database: {db_name}")
        return conn
    except sqlite3.Error as e:
        print(f"✗ Database connection error: {e}")
        return None

# Create database connection
conn = create_database_connection()
cursor = conn.cursor()

✓ Connected to database: ecommerce.db


In [3]:
# ========================================
# SECTION 3: DATABASE SCHEMA CREATION (DDL)
# ========================================

def create_database_schema(conn):
    """
    Create all database tables with proper relationships and constraints.
    Implements a normalized relational schema for e-commerce operations.

    Args:
        conn (sqlite3.Connection): Active database connection
    """
    cursor = conn.cursor()

    # Drop existing tables if they exist (for clean slate)
    cursor.execute("DROP TABLE IF EXISTS reviews")
    cursor.execute("DROP TABLE IF EXISTS orders")
    cursor.execute("DROP TABLE IF EXISTS products")
    cursor.execute("DROP TABLE IF EXISTS customers")

    # TABLE 1: CUSTOMERS
    # Stores customer information including regional data for geographic analysis
    cursor.execute("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            email TEXT UNIQUE NOT NULL,
            region TEXT NOT NULL,
            registration_date DATE NOT NULL,
            customer_segment TEXT NOT NULL
        )
    """)

    # TABLE 2: PRODUCTS
    # Catalog of all products with pricing and inventory information
    cursor.execute("""
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_name TEXT NOT NULL,
            category TEXT NOT NULL,
            price REAL NOT NULL CHECK(price > 0),
            stock_quantity INTEGER NOT NULL CHECK(stock_quantity >= 0),
            supplier TEXT NOT NULL
        )
    """)

    # TABLE 3: ORDERS
    # Transaction records linking customers to products with temporal data
    cursor.execute("""
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id INTEGER NOT NULL,
            product_id INTEGER NOT NULL,
            order_date DATE NOT NULL,
            quantity INTEGER NOT NULL CHECK(quantity > 0),
            total_amount REAL NOT NULL CHECK(total_amount > 0),
            status TEXT NOT NULL,
            FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
            FOREIGN KEY (product_id) REFERENCES products(product_id)
        )
    """)

    # TABLE 4: REVIEWS
    # Customer feedback and ratings for products
    cursor.execute("""
        CREATE TABLE reviews (
            review_id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id INTEGER NOT NULL,
            customer_id INTEGER NOT NULL,
            rating INTEGER NOT NULL CHECK(rating BETWEEN 1 AND 5),
            review_text TEXT,
            review_date DATE NOT NULL,
            FOREIGN KEY (product_id) REFERENCES products(product_id),
            FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        )
    """)

    # Create indexes for performance optimization on frequently queried columns
    cursor.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
    cursor.execute("CREATE INDEX idx_orders_product ON orders(product_id)")
    cursor.execute("CREATE INDEX idx_orders_date ON orders(order_date)")
    cursor.execute("CREATE INDEX idx_reviews_product ON reviews(product_id)")
    cursor.execute("CREATE INDEX idx_customers_region ON customers(region)")

    conn.commit()
    print("✓ Database schema created successfully")
    print("✓ Tables created: customers, products, orders, reviews")
    print("✓ Indexes created for query optimization")

# Execute schema creation
create_database_schema(conn)

✓ Database schema created successfully
✓ Tables created: customers, products, orders, reviews
✓ Indexes created for query optimization


In [4]:
# ========================================
# SECTION 4: BATCH DATA GENERATION
# ========================================

def generate_customers(conn, num_customers=1000):
    """
    Generate realistic customer data in batch.

    Args:
        conn: Database connection
        num_customers: Number of customers to generate
    """
    cursor = conn.cursor()

    # Define business constants
    regions = ['North', 'South', 'East', 'West', 'Central']
    segments = ['Premium', 'Standard', 'Basic']

    customers_data = []

    print(f"Generating {num_customers} customers...")

    for i in range(num_customers):
        # Generate registration date within the last 2 years
        days_ago = random.randint(0, 730)
        reg_date = (datetime.now() - timedelta(days=days_ago)).date()

        customer = (
            fake.name(),
            fake.unique.email(),
            random.choice(regions),
            reg_date,
            random.choice(segments)
        )
        customers_data.append(customer)

    # Batch insert for performance
    cursor.executemany("""
        INSERT INTO customers (name, email, region, registration_date, customer_segment)
        VALUES (?, ?, ?, ?, ?)
    """, customers_data)

    conn.commit()
    print(f"✓ {num_customers} customers inserted successfully")


def generate_products(conn, num_products=500):
    """
    Generate product catalog with realistic pricing and categories.

    Args:
        conn: Database connection
        num_products: Number of products to generate
    """
    cursor = conn.cursor()

    # Product categories and their typical price ranges
    categories = {
        'Electronics': (50, 2000),
        'Clothing': (15, 200),
        'Books': (5, 50),
        'Home': (20, 500),
        'Sports': (10, 300)
    }

    products_data = []

    print(f"Generating {num_products} products...")

    for i in range(num_products):
        category = random.choice(list(categories.keys()))
        min_price, max_price = categories[category]

        product = (
            f"{category} - {fake.catch_phrase()}",
            category,
            round(random.uniform(min_price, max_price), 2),
            random.randint(0, 1000),
            fake.company()
        )
        products_data.append(product)

    # Batch insert
    cursor.executemany("""
        INSERT INTO products (product_name, category, price, stock_quantity, supplier)
        VALUES (?, ?, ?, ?, ?)
    """, products_data)

    conn.commit()
    print(f"✓ {num_products} products inserted successfully")


def generate_orders(conn, num_orders=10000):
    """
    Generate order transactions with realistic patterns.
    Implements business logic for order status and timing.

    Args:
        conn: Database connection
        num_orders: Number of orders to generate
    """
    cursor = conn.cursor()

    # Get valid customer and product IDs
    cursor.execute("SELECT customer_id FROM customers")
    customer_ids = [row[0] for row in cursor.fetchall()]

    cursor.execute("SELECT product_id, price FROM products")
    products = {row[0]: row[1] for row in cursor.fetchall()}
    product_ids = list(products.keys())

    statuses = ['Completed', 'Pending', 'Cancelled', 'Shipped']
    status_weights = [0.7, 0.1, 0.05, 0.15]  # Most orders are completed

    orders_data = []

    print(f"Generating {num_orders} orders...")

    for i in range(num_orders):
        # Generate order date within last 365 days
        days_ago = random.randint(0, 365)
        order_date = (datetime.now() - timedelta(days=days_ago)).date()

        product_id = random.choice(product_ids)
        quantity = random.randint(1, 5)
        total_amount = round(products[product_id] * quantity, 2)

        order = (
            random.choice(customer_ids),
            product_id,
            order_date,
            quantity,
            total_amount,
            random.choices(statuses, weights=status_weights)[0]
        )
        orders_data.append(order)

    # Batch insert
    cursor.executemany("""
        INSERT INTO orders (customer_id, product_id, order_date, quantity, total_amount, status)
        VALUES (?, ?, ?, ?, ?, ?)
    """, orders_data)

    conn.commit()
    print(f"✓ {num_orders} orders inserted successfully")


def generate_reviews(conn, num_reviews=5000):
    """
    Generate product reviews with realistic ratings distribution.

    Args:
        conn: Database connection
        num_reviews: Number of reviews to generate
    """
    cursor = conn.cursor()

    # Get valid IDs
    cursor.execute("SELECT customer_id FROM customers")
    customer_ids = [row[0] for row in cursor.fetchall()]

    cursor.execute("SELECT product_id FROM products")
    product_ids = [row[0] for row in cursor.fetchall()]

    # Rating distribution (skewed towards positive)
    ratings = [1, 2, 3, 4, 5]
    rating_weights = [0.05, 0.10, 0.15, 0.35, 0.35]

    reviews_data = []

    print(f"Generating {num_reviews} reviews...")

    for i in range(num_reviews):
        days_ago = random.randint(0, 365)
        review_date = (datetime.now() - timedelta(days=days_ago)).date()

        rating = random.choices(ratings, weights=rating_weights)[0]

        # Generate review text based on rating
        if rating >= 4:
            review_text = fake.sentence(nb_words=10) + " Great product!"
        elif rating == 3:
            review_text = fake.sentence(nb_words=10) + " It's okay."
        else:
            review_text = fake.sentence(nb_words=10) + " Disappointed."

        review = (
            random.choice(product_ids),
            random.choice(customer_ids),
            rating,
            review_text,
            review_date
        )
        reviews_data.append(review)

    # Batch insert
    cursor.executemany("""
        INSERT INTO reviews (product_id, customer_id, rating, review_text, review_date)
        VALUES (?, ?, ?, ?, ?)
    """, reviews_data)

    conn.commit()
    print(f"✓ {num_reviews} reviews inserted successfully")


# Execute batch data generation
print("\n" + "="*50)
print("STARTING BATCH DATA GENERATION")
print("="*50 + "\n")

generate_customers(conn, 1000)
generate_products(conn, 500)
generate_orders(conn, 10000)
generate_reviews(conn, 5000)

print("\n" + "="*50)
print("✓ ALL DATA GENERATED SUCCESSFULLY")
print("="*50)


STARTING BATCH DATA GENERATION

Generating 1000 customers...
✓ 1000 customers inserted successfully
Generating 500 products...
✓ 500 products inserted successfully
Generating 10000 orders...
✓ 10000 orders inserted successfully
Generating 5000 reviews...
✓ 5000 reviews inserted successfully

✓ ALL DATA GENERATED SUCCESSFULLY


In [5]:
# ========================================
# SECTION 5: ADVANCED SQL QUERIES
# ========================================

def calculate_customer_lifetime_value(conn):
    """
    Calculate Customer Lifetime Value (CLV) for all customers.

    CLV = Total revenue generated by a customer across all their orders

    Uses:
    - JOIN: Link customers with their orders
    - SUM: Aggregate total spending per customer
    - COUNT: Count number of orders per customer
    - GROUP BY: Group results by customer

    Returns:
        DataFrame with customer CLV ranking
    """
    query = """
        SELECT
            c.customer_id,
            c.name,
            c.region,
            c.customer_segment,
            COUNT(o.order_id) as total_orders,
            SUM(o.total_amount) as customer_lifetime_value,
            AVG(o.total_amount) as average_order_value,
            MIN(o.order_date) as first_purchase_date,
            MAX(o.order_date) as last_purchase_date
        FROM customers c
        INNER JOIN orders o ON c.customer_id = o.customer_id
        WHERE o.status = 'Completed'  -- Only count completed orders
        GROUP BY c.customer_id, c.name, c.region, c.customer_segment
        ORDER BY customer_lifetime_value DESC
        LIMIT 50
    """

    df = pd.read_sql_query(query, conn)
    print("✓ Customer Lifetime Value calculated")
    print(f"  Total customers analyzed: {len(df)}")
    print(f"  Top CLV: ${df['customer_lifetime_value'].max():.2f}")
    print(f"  Average CLV: ${df['customer_lifetime_value'].mean():.2f}")
    return df


def get_top_category_per_region(conn):
    """
    Find the top-selling product category in each region using Window Functions.

    Uses:
    - WINDOW FUNCTION (RANK): Rank categories by revenue within each region
    - PARTITION BY: Create separate rankings for each region
    - Multiple JOINs: Connect customers, orders, and products
    - Subquery: Filter to only show top-ranked categories

    Returns:
        DataFrame showing top category per region
    """
    query = """
        WITH regional_sales AS (
            SELECT
                c.region,
                p.category,
                SUM(o.total_amount) as total_revenue,
                COUNT(o.order_id) as total_orders,
                RANK() OVER (
                    PARTITION BY c.region
                    ORDER BY SUM(o.total_amount) DESC
                ) as category_rank
            FROM orders o
            INNER JOIN customers c ON o.customer_id = c.customer_id
            INNER JOIN products p ON o.product_id = p.product_id
            WHERE o.status = 'Completed'
            GROUP BY c.region, p.category
        )
        SELECT
            region,
            category as top_category,
            total_revenue,
            total_orders
        FROM regional_sales
        WHERE category_rank = 1
        ORDER BY total_revenue DESC
    """

    df = pd.read_sql_query(query, conn)
    print("✓ Top-selling category per region calculated (Window Function)")
    return df


def get_daily_revenue_trends(conn, days=90):
    """
    Calculate daily revenue with 7-day moving average using Window Functions.

    Uses:
    - WINDOW FUNCTION (AVG): Calculate rolling average
    - ROWS BETWEEN: Define the moving window (7 days)
    - GROUP BY: Aggregate daily sales
    - DATE functions: Group by date

    Args:
        days: Number of days to analyze

    Returns:
        DataFrame with daily revenue and moving average
    """
    query = f"""
        WITH daily_revenue AS (
            SELECT
                order_date,
                SUM(total_amount) as daily_total,
                COUNT(order_id) as daily_orders
            FROM orders
            WHERE status = 'Completed'
                AND order_date >= date('now', '-{days} days')
            GROUP BY order_date
        )
        SELECT
            order_date,
            daily_total,
            daily_orders,
            AVG(daily_total) OVER (
                ORDER BY order_date
                ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            ) as moving_avg_7day
        FROM daily_revenue
        ORDER BY order_date
    """

    df = pd.read_sql_query(query, conn)
    print(f"✓ Daily revenue trends calculated (last {days} days)")
    print(f"  Total days: {len(df)}")
    return df


def get_regional_category_heatmap_data(conn):
    """
    Get sales data for region × category heatmap.

    Uses:
    - Multiple JOINs: Link customers, orders, and products
    - GROUP BY: Aggregate by region and category
    - SUM: Calculate total revenue

    Returns:
        DataFrame suitable for heatmap visualization
    """
    query = """
        SELECT
            c.region,
            p.category,
            SUM(o.total_amount) as total_revenue,
            COUNT(o.order_id) as order_count,
            AVG(o.total_amount) as avg_order_value
        FROM orders o
        INNER JOIN customers c ON o.customer_id = c.customer_id
        INNER JOIN products p ON o.product_id = p.product_id
        WHERE o.status = 'Completed'
        GROUP BY c.region, p.category
        ORDER BY c.region, total_revenue DESC
    """

    df = pd.read_sql_query(query, conn)
    print("✓ Regional-Category sales data retrieved for heatmap")
    return df


def get_product_performance_with_reviews(conn):
    """
    Analyze product performance combining sales and review data.

    Uses:
    - Multiple JOINs: Link products with orders and reviews
    - LEFT JOIN: Include products even without reviews
    - Subqueries: Calculate aggregated metrics
    - COALESCE: Handle NULL values

    Returns:
        DataFrame with comprehensive product metrics
    """
    query = """
        SELECT
            p.product_id,
            p.product_name,
            p.category,
            p.price,
            COALESCE(SUM(o.quantity), 0) as total_units_sold,
            COALESCE(SUM(o.total_amount), 0) as total_revenue,
            COALESCE(COUNT(DISTINCT o.order_id), 0) as total_orders,
            COALESCE(AVG(r.rating), 0) as avg_rating,
            COALESCE(COUNT(r.review_id), 0) as review_count
        FROM products p
        LEFT JOIN orders o ON p.product_id = o.product_id AND o.status = 'Completed'
        LEFT JOIN reviews r ON p.product_id = r.product_id
        GROUP BY p.product_id, p.product_name, p.category, p.price
        HAVING total_orders > 0
        ORDER BY total_revenue DESC
        LIMIT 30
    """

    df = pd.read_sql_query(query, conn)
    print("✓ Product performance with reviews analyzed")
    return df


# Execute all SQL queries and store results
print("\n" + "="*50)
print("EXECUTING ADVANCED SQL QUERIES")
print("="*50 + "\n")

clv_data = calculate_customer_lifetime_value(conn)
print()
top_categories = get_top_category_per_region(conn)
print()
revenue_trends = get_daily_revenue_trends(conn, 90)
print()
heatmap_data = get_regional_category_heatmap_data(conn)
print()
product_performance = get_product_performance_with_reviews(conn)

print("\n" + "="*50)
print("✓ ALL QUERIES EXECUTED SUCCESSFULLY")
print("="*50)


EXECUTING ADVANCED SQL QUERIES

✓ Customer Lifetime Value calculated
  Total customers analyzed: 50
  Top CLV: $25895.04
  Average CLV: $17118.64

✓ Top-selling category per region calculated (Window Function)

✓ Daily revenue trends calculated (last 90 days)
  Total days: 91

✓ Regional-Category sales data retrieved for heatmap

✓ Product performance with reviews analyzed

✓ ALL QUERIES EXECUTED SUCCESSFULLY


In [6]:
# Display sample results from advanced queries
print("\n📊 TOP 10 CUSTOMERS BY LIFETIME VALUE:")
print(clv_data.head(10))

print("\n\n📊 TOP CATEGORY PER REGION:")
print(top_categories)

print("\n\n📊 PRODUCT PERFORMANCE (Top 10):")
print(product_performance.head(10))


📊 TOP 10 CUSTOMERS BY LIFETIME VALUE:
   customer_id             name   region customer_segment  total_orders  \
0          862    Judith Miller     West         Standard            11   
1          457   Jeremiah Baker    North            Basic            11   
2            3  Cristian Santos    North            Basic             8   
3           77    James Elliott    North          Premium             8   
4          252   Amber Campbell    North            Basic            11   
5          875     Tammy Morgan  Central         Standard             8   
6          203  Vincent Mueller  Central            Basic             8   
7          251   Kathleen Davis     West            Basic            11   
8          163   Donald Schultz     East          Premium            17   
9          411   Makayla Steele     West         Standard            11   

   customer_lifetime_value  average_order_value first_purchase_date  \
0                 25895.04          2354.094545          2025-01

# Interactive Dashboard

The interactive dashboard with dynamic filters has been moved to a separate web application.

To use the interactive dashboard:
1. Run `python dashboard_app.py` in this directory
2. Open your browser to http://localhost:5000
3. Use the dropdown filters to explore the data interactively

The visualizations below show the complete dataset without filters.

In [7]:
# ========================================
# SECTION 6: INTERACTIVE VISUALIZATIONS
# ========================================

def create_regional_sales_heatmap(data):
    """
    Create an interactive heatmap showing sales density by Region × Category.

    Args:
        data: DataFrame with region, category, and total_revenue columns

    Returns:
        Plotly figure object
    """
    # Pivot data for heatmap format
    pivot_data = data.pivot_table(
        index='region',
        columns='category',
        values='total_revenue',
        fill_value=0
    )

    # Create heatmap using Plotly
    fig = go.Figure(data=go.Heatmap(
        z=pivot_data.values,
        x=pivot_data.columns,
        y=pivot_data.index,
        colorscale='YlOrRd',
        text=np.round(pivot_data.values, 0),
        texttemplate='$%{text:,.0f}',
        textfont={"size":10},
        colorbar=dict(title="Revenue ($)")
    ))

    fig.update_layout(
        title='Sales Density Heatmap: Revenue by Region and Product Category',
        xaxis_title='Product Category',
        yaxis_title='Region',
        height=500,
        font=dict(size=12)
    )

    return fig


def create_revenue_trends_chart(data):
    """
    Create interactive line chart for daily revenue trends with moving average.

    Args:
        data: DataFrame with order_date, daily_total, and moving_avg_7day

    Returns:
        Plotly figure object
    """
    fig = go.Figure()

    # Add daily revenue trace
    fig.add_trace(go.Scatter(
        x=data['order_date'],
        y=data['daily_total'],
        mode='lines',
        name='Daily Revenue',
        line=dict(color='lightblue', width=1),
        opacity=0.5
    ))

    # Add 7-day moving average trace
    fig.add_trace(go.Scatter(
        x=data['order_date'],
        y=data['moving_avg_7day'],
        mode='lines',
        name='7-Day Moving Average',
        line=dict(color='darkblue', width=3)
    ))

    fig.update_layout(
        title='Daily Revenue Trends with 7-Day Moving Average',
        xaxis_title='Date',
        yaxis_title='Revenue ($)',
        height=500,
        hovermode='x unified',
        legend=dict(x=0.01, y=0.99)
    )

    return fig


def create_clv_distribution_chart(data):
    """
    Create bar chart showing top customers by lifetime value.

    Args:
        data: DataFrame with customer CLV data

    Returns:
        Plotly figure object
    """
    # Get top 20 customers
    top_customers = data.head(20)

    fig = go.Figure(data=[
        go.Bar(
            x=top_customers['customer_lifetime_value'],
            y=top_customers['name'],
            orientation='h',
            marker=dict(
                color=top_customers['customer_lifetime_value'],
                colorscale='Viridis',
                showscale=True,
                colorbar=dict(title="CLV ($)")
            ),
            text=top_customers['customer_lifetime_value'].round(2),
            texttemplate='$%{text:,.2f}',
            textposition='auto',
        )
    ])

    fig.update_layout(
        title='Top 20 Customers by Lifetime Value (CLV)',
        xaxis_title='Customer Lifetime Value ($)',
        yaxis_title='Customer Name',
        height=600,
        yaxis=dict(autorange="reversed")
    )

    return fig


def create_category_performance_chart(data):
    """
    Create pie chart showing revenue distribution by category.

    Args:
        data: DataFrame with category and revenue data

    Returns:
        Plotly figure object
    """
    # Aggregate by category
    category_totals = data.groupby('category')['total_revenue'].sum().reset_index()
    category_totals = category_totals.sort_values('total_revenue', ascending=False)

    fig = go.Figure(data=[
        go.Pie(
            labels=category_totals['category'],
            values=category_totals['total_revenue'],
            hole=0.4,
            textinfo='label+percent',
            textposition='auto',
            marker=dict(line=dict(color='white', width=2))
        )
    ])

    fig.update_layout(
        title='Revenue Distribution by Product Category',
        height=500,
        showlegend=True
    )

    return fig


def create_product_performance_scatter(data):
    """
    Create scatter plot of product performance: sales vs ratings.

    Args:
        data: DataFrame with product performance metrics

    Returns:
        Plotly figure object
    """
    fig = px.scatter(
        data,
        x='avg_rating',
        y='total_revenue',
        size='total_units_sold',
        color='category',
        hover_data=['product_name', 'review_count'],
        title='Product Performance: Revenue vs Average Rating',
        labels={
            'avg_rating': 'Average Customer Rating',
            'total_revenue': 'Total Revenue ($)',
            'total_units_sold': 'Units Sold'
        }
    )

    fig.update_layout(height=600)

    return fig


# Generate all visualizations
print("\n" + "="*50)
print("GENERATING INTERACTIVE VISUALIZATIONS")
print("="*50 + "\n")

heatmap_fig = create_regional_sales_heatmap(heatmap_data)
revenue_fig = create_revenue_trends_chart(revenue_trends)
clv_fig = create_clv_distribution_chart(clv_data)
category_fig = create_category_performance_chart(heatmap_data)
scatter_fig = create_product_performance_scatter(product_performance)

print("✓ All visualizations created successfully")


GENERATING INTERACTIVE VISUALIZATIONS

✓ All visualizations created successfully


In [8]:
# Display Visualization 1: Regional Sales Heatmap
print("\n📊 VISUALIZATION 1: REGIONAL SALES HEATMAP")
heatmap_fig.show()


📊 VISUALIZATION 1: REGIONAL SALES HEATMAP


In [9]:
# Display Visualization 2: Daily Revenue Trends
print("\n📊 VISUALIZATION 2: DAILY REVENUE TRENDS")
revenue_fig.show()


📊 VISUALIZATION 2: DAILY REVENUE TRENDS


In [10]:
# Display Visualization 3: Customer Lifetime Value
print("\n📊 VISUALIZATION 3: TOP CUSTOMERS BY CLV")
clv_fig.show()


📊 VISUALIZATION 3: TOP CUSTOMERS BY CLV


In [11]:
# Display Visualization 4: Category Revenue Distribution
print("\n📊 VISUALIZATION 4: REVENUE BY CATEGORY")
category_fig.show()


📊 VISUALIZATION 4: REVENUE BY CATEGORY


In [12]:
# Display Visualization 5: Product Performance Scatter
print("\n📊 VISUALIZATION 5: PRODUCT PERFORMANCE ANALYSIS")
scatter_fig.show()


📊 VISUALIZATION 5: PRODUCT PERFORMANCE ANALYSIS


## Interactive Widgets Dashboard

Use the widgets below to filter and explore data dynamically:
- Filter by date range
- Select specific regions
- Choose product categories
- Refresh data with updated filters

## Reflection

### Challenges and Learning Outcomes

#### SQL Complexity Challenges:
The most challenging aspect was implementing window functions, particularly the RANK() function for identifying top-selling categories per region. Understanding the PARTITION BY clause required careful consideration of how data should be grouped before ranking. The moving average calculation using ROWS BETWEEN was also complex, requiring precise window frame specification to calculate 7-day rolling averages correctly.

Another significant challenge was optimizing query performance with large datasets (10,000+ orders). Creating appropriate indexes on frequently joined columns (customer_id, product_id, order_date) was crucial for maintaining query speed.

#### Data Visualization Challenges:
Integrating interactive widgets with Plotly visualizations required understanding the event-driven programming model in Jupyter notebooks. Managing state between widget updates and ensuring visualizations refreshed properly with filtered data took iteration. The heatmap pivot transformation was particularly challenging as it required reshaping data from long to wide format while handling missing region-category combinations.

The most rewarding aspect was seeing how advanced SQL queries could generate actionable business insights, transforming raw transactional data into strategic intelligence about customer value and regional performance patterns.

In [15]:
# ========================================
# SECTION 8: DATABASE SUMMARY & CLEANUP
# ========================================

def get_database_summary(conn):
    """
    Generate comprehensive summary of database contents.

    Returns:
        Dictionary with table statistics
    """
    summary = {}

    tables = ['customers', 'products', 'orders', 'reviews']

    for table in tables:
        cursor = conn.cursor()
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        count = cursor.fetchone()[0]
        summary[table] = count

    return summary


# Display final summary
print("\n" + "="*50)
print("DATABASE SUMMARY")
print("="*50 + "\n")

summary = get_database_summary(conn)

print("📊 Table Record Counts:")
for table, count in summary.items():
    print(f"   {table.upper()}: {count:,} records")

print("Dashboard implementation complete!")
print("\nKey Features Implemented:")
print("4-table normalized relational schema")
print("Batch data generation (16,500+ total records)")
print("Advanced SQL with window functions (RANK, AVG with ROWS BETWEEN)")
print("Complex JOINs across multiple tables")
print("Aggregations (SUM, COUNT, AVG)")
print("5 interactive Plotly visualizations")
print("Interactive widgets for data filtering")
print("Modular, well-commented code")

print("\n" + "="*50)
print("Thank you for exploring the E-Commerce Dashboard!")
print("="*50)


DATABASE SUMMARY

📊 Table Record Counts:
   CUSTOMERS: 1,000 records
   PRODUCTS: 500 records
   ORDERS: 10,000 records
   REVIEWS: 5,000 records
Dashboard implementation complete!

Key Features Implemented:
4-table normalized relational schema
Batch data generation (16,500+ total records)
Advanced SQL with window functions (RANK, AVG with ROWS BETWEEN)
Complex JOINs across multiple tables
Aggregations (SUM, COUNT, AVG)
5 interactive Plotly visualizations
Interactive widgets for data filtering
Modular, well-commented code

Thank you for exploring the E-Commerce Dashboard!


In [16]:
# Optional: Close database connection when done
# Uncomment the line below to close the connection
conn.close()
print("✓ Database connection closed")

✓ Database connection closed
