# Advanced Data Visualization: Mastering Interactive Dashboards, Perception-Based Design, and Data Storytelling

## Abstract

This tutorial provides a comprehensive exploration of advanced data visualization principles that separate amateur charts from professional, decision-driving dashboards. We delve into the cognitive science of visual perception, the architecture of effective interactive systems, and the narrative structures that make data compelling.

Through progressive examples using real-world patterns, you will master:

1. **Perception-Based Design Principles**: Understanding pre-attentive processing, Gestalt laws, and color theory to create visualizations that communicate instantly and accurately

2. **Interactive Dashboard Architecture**: Building responsive, multi-layered dashboards that follow Shneiderman's Visual Information-Seeking Mantra: "Overview first, zoom and filter, then details-on-demand"

3. **Data Storytelling Frameworks**: Applying narrative structures (Setup-Conflict-Resolution) to guide viewers from data to insight to action

4. **Advanced Visualization Techniques**: Implementing small multiples, layered encodings, animated transitions, and coordinated views

This is not about making "pretty charts" - it's about understanding **why certain design choices work** based on human visual cognition, and **how to architect information** to drive decisions at scale.

**Prerequisites**: Basic Python, familiarity with pandas and plotting libraries

**Learning Outcomes**:
- Explain cognitive principles underlying effective visualizations
- Design multi-dimensional encodings using position, color, size, and shape appropriately
- Build interactive dashboards with coordinated brushing and linking
- Apply narrative frameworks to transform exploratory analysis into persuasive presentations
- Critique visualizations using evidence-based design principles

## Part 1: The Cognitive Foundation - How Humans Process Visual Information

### 1.1 Pre-Attentive Processing: The 250-Millisecond Window

Before conscious thought occurs, your visual system processes certain attributes **automatically** in less than 250 milliseconds. These "pre-attentive attributes" include:

**Spatial Properties:**
- Position (most accurate channel)
- Length
- Angle/Slope
- Size/Area

**Color Properties:**
- Hue (what we call "color")
- Saturation (intensity)
- Luminance (brightness)

**Form and Motion Properties:**
- Shape/Form
- Texture
- Motion

**Key Insight**: Use pre-attentive attributes for the MOST IMPORTANT information. If you want viewers to instantly see outliers, encode them with color, not just different values in a table.
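Here is a minimal sketch of the outlier advice that colors points with a z-score rule. Values more than `z_threshold` population standard deviations from the mean get an accent color, and everything else gets a muted gray. The helper name, the two hex colors, and the threshold of 3 are illustrative assumptions. For skewed data, an IQR-based rule may be more robust.

```python
from statistics import mean, pstdev

HIGHLIGHT = '#F18F01'  # assumed accent color for outliers
MUTED = '#BBBBBB'      # assumed neutral gray for all other points

def outlier_colors(values, z_threshold=3.0):
    """Return one color per value: HIGHLIGHT if |z| > z_threshold, else MUTED."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values identical: nothing should pop out
        return [MUTED] * len(values)
    return [HIGHLIGHT if abs(v - mu) / sigma > z_threshold else MUTED for v in values]

daily_sales = [10] * 20 + [100]
colors = outlier_colors(daily_sales)
print(colors.count(HIGHLIGHT))  # 1
```

The returned list can be passed as a per-point color array, e.g. `marker=dict(color=colors)` on a Plotly trace. With the population standard deviation, no value can exceed z = sqrt(n - 1). As a result, with 10 or fewer points nothing is flagged at a threshold of 3.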

### 1.2 The Visual Hierarchy of Accuracy

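Cleveland and McGill's graphical perception experiments (1984) established a hierarchy of how accurately humans decode quantitative values from different encodings. Mackinlay (1986) extended this work, and Colin Ware's *Information Visualization* (2012) synthesizes it: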

**Tier 1 (Most Accurate):**
1. Position along a common scale
2. Position on identical but nonaligned scales
3. Length

**Tier 2 (Moderate Accuracy):**
4. Angle/Slope
5. Area
6. Volume/Density

**Tier 3 (Least Accurate):**
7. Color saturation
8. Color hue

**Application**: For precise quantitative comparisons, use bar charts (length/position). For categorical distinctions, use color hue. NEVER use pie charts for precise comparisons (angle is tier 2).

### 1.3 Gestalt Principles of Grouping

Your brain automatically organizes visual elements using these laws:

**Proximity**: Objects close together are perceived as a group
- Application: Cluster related charts, add whitespace between different sections

**Similarity**: Objects sharing visual attributes are perceived as related
- Application: Use consistent colors for the same category across multiple charts

**Enclosure**: Objects within boundaries are perceived as a group
- Application: Use subtle boxes/panels to group related metrics

**Continuity**: Elements arranged on a line or curve are perceived as related
- Application: Use connecting lines in line charts to show relationships

**Closure**: Mind fills in gaps to create complete shapes
- Application: Can use incomplete shapes to reduce ink (Tufte's data-ink ratio)

### 1.4 Color Theory for Data Visualization

**Sequential Palettes** (Light → Dark):
- Use for: Ordered data (temperature: cold to hot)
- Example: Light blue → Dark blue for increasing values
- Perception: On a light background, viewers tend to read darker as "more"

**Diverging Palettes** (Color1 → Neutral → Color2):
- Use for: Data with meaningful midpoint (profit/loss, above/below average)
- Example: Red (negative) → White (zero) → Blue (positive)
- Perception: Two-way deviation from center

**Categorical Palettes** (Distinct Hues):
- Use for: Nominal categories with no order
- Example: Red, Blue, Green, Orange for different products
- Perception: Each category is equally distinct
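A diverging palette only works if its neutral color sits exactly on the meaningful midpoint. Below is a minimal sketch, assuming values are mapped onto a 0-1 colorscale position. It scales symmetrically by the largest absolute deviation, so the midpoint always lands at 0.5 even when the data is lopsided. The helper name is hypothetical. In Plotly heatmaps, the `zmid` argument serves a similar purpose.

```python
def diverging_positions(values, midpoint=0.0):
    """Map values to [0, 1] so that `midpoint` lands on 0.5 (the neutral color).

    Both sides share one scale (the largest absolute deviation), so equal
    distances above and below the midpoint get equally intense colors.
    """
    max_dev = max(abs(v - midpoint) for v in values)
    if max_dev == 0:
        return [0.5 for _ in values]
    return [0.5 + 0.5 * (v - midpoint) / max_dev for v in values]

profit = [-50, 0, 25, 100]
print(diverging_positions(profit))  # [0.25, 0.5, 0.625, 1.0]
```

Note that -50 maps to 0.25 rather than 0.0. A loss half as large as the biggest gain is drawn half as intense, instead of being stretched to the darkest red.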

**Critical Rules**:
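1. **Colorblind-Safe**: Roughly 8% of men (and about 0.5% of women) have some color vision deficiency, most often red-green. Avoid palettes that depend on telling red from green.
2. **Semantic Meaning**: Red = danger/stop, Green = safe/go, and Yellow = caution are common Western conventions, not universal ones. In Chinese stock markets, for example, red signals rising prices. Red and green are also the hardest pair for colorblind viewers, so reinforce them with labels or icons.
3. **Contrast**: WCAG AA requires a contrast ratio of at least 4.5:1 for normal-size text and 3:1 for meaningful graphical elements such as chart marks (WCAG 2.1, Success Criteria 1.4.3 and 1.4.11)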
4. **Limit Hues**: Max 6-8 distinct colors in a single view
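The contrast rule can be checked in code. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio for colors given as `'#RRGGBB'` hex strings. The function names are illustrative, not from any particular library.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color written as '#RRGGBB'."""
    hex_color = hex_color.lstrip('#')
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]

    def linearize(c):
        # Undo the sRGB transfer curve (WCAG uses the 0.03928 threshold)
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in channels)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a, color_b):
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio('#000000', '#FFFFFF'), 2))  # 21.0
```

For example, `contrast_ratio('#343A40', '#FFFFFF') >= 4.5` tests whether a dark-gray text color passes WCAG AA on a white background. For chart marks, compare against 3.0 instead.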

### 1.5 The Data-Ink Ratio (Edward Tufte)

**Formula**: Data-Ink Ratio = (Ink used to display data) / (Total ink used in graphic)

**Maximize this ratio by removing:**
- Chartjunk (decorative elements)
- Redundant labels
- Heavy gridlines (use light gray, not black)
- 3D effects (distort perception)
- Unnecessary borders

**Exception**: Sometimes "redundant" encoding improves accessibility (color + shape for colorblind users)

In [1]:
# Cell 1: Environment setup and generate dataset for teaching visualization principles

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

print("📚 ADVANCED DATA VISUALIZATION TUTORIAL")
print("=" * 80)
print("Creating synthetic dataset to demonstrate visualization principles...")
print()

# Generate realistic sales data for teaching visualization concepts
def generate_teaching_dataset():
    """
    Generate multi-dimensional business data to demonstrate visualization techniques.

    Includes:
    - Time series with trends and seasonality
    - Multiple categories (products, regions)
    - Continuous and discrete variables
    - Outliers and patterns for teaching
    """

    # Time range: 2 years of daily data
    dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')

    # Products and regions
    products = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
    regions = ['North', 'South', 'East', 'West']

    data = []

    for product_idx, product in enumerate(products):
        for region_idx, region in enumerate(regions):

            # Base sales vary by product and region
            base_sales = (100 + product_idx * 50) * (0.8 + region_idx * 0.15)

            # Trend (growth over time)
            trend = np.linspace(0, base_sales * 0.3, len(dates))

            # Seasonality (yearly cycle)
            seasonal = base_sales * 0.2 * np.sin(2 * np.pi * np.arange(len(dates)) / 365)

            # Weekly pattern (weekends lower)
            weekly = np.where(dates.dayofweek >= 5, 0.7, 1.0)

            # Random noise
            noise = np.random.normal(0, base_sales * 0.1, len(dates))

            # Combine
            sales = (base_sales + trend + seasonal) * weekly + noise
            sales = np.maximum(sales, 0)  # No negative sales

            # Add some outliers (special promotions)
            promo_days = np.random.choice(len(dates), size=10, replace=False)
            sales[promo_days] *= np.random.uniform(2, 3, len(promo_days))

            # Costs: ~75% of sales on average (normally distributed ratio, sd = 5 percentage points)
            cost_ratio = 0.75 + np.random.normal(0, 0.05, len(dates))
            costs = sales * cost_ratio

            # Profit
            profit = sales - costs

            # Customer counts (correlated with sales)
            customers = (sales / (50 + product_idx * 10)) + np.random.normal(0, 2, len(dates))
            customers = np.maximum(customers, 0)

            # Assemble this product/region block in one vectorized step
            data.append(pd.DataFrame({
                'date': dates,
                'product': product,
                'region': region,
                'sales': sales,
                'costs': costs,
                'profit': profit,
                'customers': customers.astype(int),  # truncates toward zero, like int()
                'avg_transaction': sales / np.maximum(customers, 1),
                'year': dates.year,
                'month': dates.month,
                'quarter': [f'Q{q}' for q in dates.quarter],
                'day_of_week': dates.day_name(),
                'is_weekend': dates.dayofweek >= 5
            }))

    df = pd.concat(data, ignore_index=True)

    # Add derived metrics
    df['profit_margin'] = (df['profit'] / df['sales']) * 100
    df['sales_category'] = pd.cut(df['sales'], bins=5, labels=['Very Low', 'Low', 'Medium', 'High', 'Very High'])

    return df

df = generate_teaching_dataset()

print(f"✅ Dataset created: {len(df):,} records")
print(f"   Date range: {df['date'].min().date()} to {df['date'].max().date()}")
print(f"   Products: {df['product'].nunique()}")
print(f"   Regions: {df['region'].nunique()}")
print(f"   Total sales: ${df['sales'].sum():,.2f}")
print()
print("Sample data:")
print(df.head(10))

📚 ADVANCED DATA VISUALIZATION TUTORIAL
Creating synthetic dataset to demonstrate visualization principles...

✅ Dataset created: 14,600 records
   Date range: 2022-01-01 to 2023-12-31
   Products: 5
   Regions: 4
   Total sales: $3,213,413.95

Sample data:
        date    product region      sales      costs     profit  customers  \
0 2022-01-01  Product A  North  59.973713  40.574977  19.398736          2   
1 2022-01-02  Product A  North  55.109720  38.854663  16.255057          0   
2 2022-01-03  Product A  North  85.798098  58.597833  27.200265          0   
3 2022-01-04  Product A  North  93.108919  75.252652  17.856267          1   
4 2022-01-05  Product A  North  79.359299  64.829069  14.530230          0   
5 2022-01-06  Product A  North  79.666950  66.069291  13.597659          1   
6 2022-01-07  Product A  North  94.480861  61.320178  33.160682          2   
7 2022-01-08  Product A  North  63.647125  51.144923  12.502202          0   
8 2022-01-09  Product A  North  53.966089

## Part 2: What NOT to Do - Common Visualization Mistakes

Before we build great visualizations, let's understand what makes visualizations **terrible**. We'll intentionally create bad examples to highlight common mistakes.

### 2.1 The "Chartjunk" Example

We'll create a visualization that violates multiple principles:
- ❌ 3D effects that distort perception
- ❌ Too many colors without meaning
- ❌ Decorative elements (shadows, gradients)
- ❌ Unclear labels and poor typography
- ❌ Low data-ink ratio

### 2.2 The "Data Overload" Example

- ❌ Too much information in one view
- ❌ No visual hierarchy
- ❌ Every element competing for attention
- ❌ Cognitive overload

### 2.3 The "Misleading" Example

- ❌ Truncated Y-axis to exaggerate differences
- ❌ Inconsistent scales across comparisons
- ❌ Cherry-picked data to support narrative

In [2]:
# Cell 2: Intentionally BAD visualizations to demonstrate what NOT to do

print("❌ EXAMPLES OF BAD VISUALIZATIONS (What NOT to Do)")
print("=" * 80)
print()

# Calculate monthly sales for the bad examples
monthly_sales = df.groupby([df['date'].dt.to_period('M'), 'product'])['sales'].sum().reset_index()
monthly_sales['date'] = monthly_sales['date'].dt.to_timestamp()

# BAD EXAMPLE 1: 3D Pie Chart (Multiple Violations)
print("❌ BAD EXAMPLE 1: 3D Pie Chart with Too Many Categories")
print("Problems:")
print("  • 3D distorts perception (slices appear different sizes)")
print("  • Angle/area is hard to judge accurately (Tier 2 encoding)")
print("  • Too many slices (>7) makes comparison impossible")
print("  • No clear ordering or hierarchy")
print()

# Total sales by product
product_sales = df.groupby('product')['sales'].sum()

fig_bad1 = go.Figure(data=[go.Pie(
    labels=product_sales.index,
    values=product_sales.values,
    hole=0,
    pull=[0.1, 0.1, 0.1, 0.1, 0.1],  # Exploded slices (distracting)
    marker=dict(
        colors=['#FF0000', '#00FF00', '#0000FF', '#FFFF00', '#FF00FF'],  # Harsh colors
        line=dict(color='#000000', width=3)  # Heavy borders
    ),
    textinfo='label+percent',
    textfont=dict(size=14, family='Comic Sans MS'),  # Unprofessional font
    # Note: Can't make true 3D in Plotly, but we're simulating bad design
)])

fig_bad1.update_layout(
    title=dict(
        text='TOTAL SALES BY PRODUCT!!!',  # Too many exclamation marks
        font=dict(size=24, color='red', family='Comic Sans MS'),
        x=0.5
    ),
    showlegend=True,
    legend=dict(
        bgcolor='yellow',  # Harsh background
        bordercolor='black',
        borderwidth=2
    ),
    paper_bgcolor='lightgray',  # Distracting background
    height=500
)

fig_bad1.show()

print("Why this is BAD:")
print("  ✗ Pie charts are inherently difficult for precise comparisons")
print("  ✗ 3D effects make it worse by distorting perception")
print("  ✗ Too many categories (should be max 5-6)")
print("  ✗ Garish colors with no semantic meaning")
print("  ✗ Distracting background colors")
print()
print("-" * 80)
print()

# BAD EXAMPLE 2: Dual-Axis Chart (Misleading Scales)
print("❌ BAD EXAMPLE 2: Dual-Axis Chart with Misleading Scales")
print("Problems:")
print("  • Two different Y-axes create false visual correlation")
print("  • Scales can be manipulated to show any relationship")
print("  • Viewer can't quickly determine which line uses which axis")
print()

# Monthly totals
monthly_totals = df.groupby(df['date'].dt.to_period('M')).agg({
    'sales': 'sum',
    'customers': 'sum'
}).reset_index()
monthly_totals['date'] = monthly_totals['date'].dt.to_timestamp()

fig_bad2 = make_subplots(specs=[[{"secondary_y": True}]])

# Sales (left axis)
fig_bad2.add_trace(
    go.Scatter(
        x=monthly_totals['date'],
        y=monthly_totals['sales'],
        name='Sales ($)',
        line=dict(color='blue', width=3)
    ),
    secondary_y=False
)

# Customers (right axis - MANIPULATED SCALE)
fig_bad2.add_trace(
    go.Scatter(
        x=monthly_totals['date'],
        y=monthly_totals['customers'],
        name='Customers',
        line=dict(color='red', width=3)
    ),
    secondary_y=True
)

# Manipulate the scales to make them look correlated
fig_bad2.update_yaxes(title_text="Sales ($)", range=[0, monthly_totals['sales'].max() * 1.5], secondary_y=False)
fig_bad2.update_yaxes(title_text="Customers", range=[0, monthly_totals['customers'].max() * 1.5], secondary_y=True)

fig_bad2.update_layout(
    title='Sales vs Customers Over Time (MISLEADING!)',
    hovermode='x unified',
    height=400
)

fig_bad2.show()

print("Why this is BAD:")
print("  ✗ Dual axes allow arbitrary scale manipulation")
print("  ✗ Visual correlation may not reflect actual relationship")
print("  ✗ Difficult to mentally compute the actual relationship")
print("  ✗ Can be used to mislead (intentionally or not)")
print()
print("BETTER ALTERNATIVE: Use a single axis with normalized values (0-100%)")
print("                    or show correlation with a scatter plot")
print()
print("-" * 80)
print()

# BAD EXAMPLE 3: Rainbow Color Palette for Sequential Data
print("❌ BAD EXAMPLE 3: Rainbow Colors for Sequential Data")
print("Problems:")
print("  • Rainbow palette has no perceptual ordering")
print("  • Some color transitions are more salient than others")
print("  • Not colorblind-safe")
print("  • Creates false 'bands' where colors change")
print()

# Heatmap of average daily sales by product and calendar month (both years pooled)
pivot_data = df.pivot_table(
    values='sales',
    index='product',
    columns=df['date'].dt.month,
    aggfunc='mean'
)

fig_bad3 = go.Figure(data=go.Heatmap(
    z=pivot_data.values,
    x=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
    y=pivot_data.index,
    colorscale='Jet',  # The infamous rainbow scale
    colorbar=dict(title='Avg Sales')
))

fig_bad3.update_layout(
    title='Average Monthly Sales by Product (BAD COLOR SCHEME)',
    xaxis_title='Month',
    yaxis_title='Product',
    height=400
)

fig_bad3.show()

print("Why this is BAD:")
print("  ✗ 'Jet' (rainbow) colorscale has no perceptual ordering")
print("  ✗ Yellow appears as 'peaks' even in smooth data")
print("  ✗ Creates false boundaries between colors")
print("  ✗ Not colorblind accessible")
print()
print("BETTER ALTERNATIVE: Use sequential scale (light to dark) or diverging scale")
print()
print("=" * 80)

❌ EXAMPLES OF BAD VISUALIZATIONS (What NOT to Do)

❌ BAD EXAMPLE 1: 3D Pie Chart with Too Many Categories
Problems:
  • 3D distorts perception (slices appear different sizes)
  • Angle/area is hard to judge accurately (Tier 2 encoding)
  • Too many slices (>7) makes comparison impossible
  • No clear ordering or hierarchy



Why this is BAD:
  ✗ Pie charts are inherently difficult for precise comparisons
  ✗ 3D effects make it worse by distorting perception
  ✗ Too many categories (should be max 5-6)
  ✗ Garish colors with no semantic meaning
  ✗ Distracting background colors

--------------------------------------------------------------------------------

❌ BAD EXAMPLE 2: Dual-Axis Chart with Misleading Scales
Problems:
  • Two different Y-axes create false visual correlation
  • Scales can be manipulated to show any relationship
  • Viewer can't quickly determine which line uses which axis



Why this is BAD:
  ✗ Dual axes allow arbitrary scale manipulation
  ✗ Visual correlation may not reflect actual relationship
  ✗ Difficult to mentally compute the actual relationship
  ✗ Can be used to mislead (intentionally or not)

BETTER ALTERNATIVE: Use a single axis with normalized values (0-100%)
                    or show correlation with a scatter plot

--------------------------------------------------------------------------------

❌ BAD EXAMPLE 3: Rainbow Colors for Sequential Data
Problems:
  • Rainbow palette has no perceptual ordering
  • Some color transitions are more salient than others
  • Not colorblind-safe
  • Creates false 'bands' where colors change



Why this is BAD:
  ✗ 'Jet' (rainbow) colorscale has no perceptual ordering
  ✗ Yellow appears as 'peaks' even in smooth data
  ✗ Creates false boundaries between colors
  ✗ Not colorblind accessible

BETTER ALTERNATIVE: Use sequential scale (light to dark) or diverging scale



## Part 3: Designing Effective Visualizations - The Right Way

Now that we've seen what NOT to do, let's apply evidence-based design principles to create effective visualizations.

### 3.1 The Principle of Proportional Ink (Bergstrom and West, after Tufte)

**Rule**: The amount of ink (pixels) used to represent data should be proportional to the data values.

**Violations**:
- Truncated bar charts (bars don't start at zero)
- 3D effects (add non-data ink)
- Unnecessary decorations

**Application**: Start bar charts at zero, remove backgrounds, use minimal gridlines
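Tufte quantifies violations of this rule with the "Lie Factor": the size of an effect as drawn divided by the size of the effect in the data. Each effect is the relative change between two values. The sketch below applies this to two bars, assuming they are drawn upward from a `baseline` (0 for an honest bar chart). The function name is illustrative.

```python
def lie_factor(first, second, baseline=0.0):
    """Tufte's Lie Factor for two bars drawn upward from `baseline`.

    Each "effect" is the relative change |second - first| / first, measured once
    on the drawn bar lengths (value - baseline) and once on the raw values.
    """
    if first == second:
        raise ValueError('values are equal: there is no effect to distort')
    if first <= baseline or second <= baseline:
        raise ValueError('both bars must extend above the baseline')
    drawn_first, drawn_second = first - baseline, second - baseline
    shown_effect = abs(drawn_second - drawn_first) / drawn_first
    data_effect = abs(second - first) / first
    return shown_effect / data_effect

print(lie_factor(100, 110))               # 1.0: axis starts at zero
print(lie_factor(100, 110, baseline=90))  # about 10: truncation inflates a 10% change tenfold
```

A Lie Factor of 1 is honest. Tufte treats values above about 1.05 or below 0.95 as substantial distortion.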

### 3.2 Choose the Right Chart Type

**Decision Framework**:
```
Is it comparison? → Bar chart (horizontal if long labels)
Is it distribution? → Histogram or box plot
Is it relationship? → Scatter plot
Is it composition? → Stacked bar (for categories) or treemap
Is it change over time? → Line chart (continuous) or bar (discrete periods)
```

**Chart-Specific Guidelines**:

**Bar Charts**:
- ✓ Use for discrete comparisons
- ✓ Sort by value (unless inherent order)
- ✓ Horizontal for long labels
- ✓ Leave a gap of roughly half the bar width between bars (a common rule of thumb)

**Line Charts**:
- ✓ Use for continuous time series
- ✓ Max 5-7 lines (beyond this, use small multiples)
- ✓ Direct labeling (not legend) when possible
- ✓ Emphasize the data with thick lines, de-emphasize gridlines

**Scatter Plots**:
- ✓ Use for correlation/relationship
- ✓ Add transparency for overplotting
- ✓ Consider adding trend line if relationship exists
- ✓ Use size/color for additional dimensions (max 2-3 total)
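The advice "consider adding trend line if relationship exists" can be made concrete. The sketch below checks the Pearson correlation first and fits a least-squares line only when the correlation clears a threshold. The 0.5 cutoff is an arbitrary illustrative choice, not a standard. NumPy's `corrcoef` and `polyfit` do the work, and `trend_line` is a hypothetical helper.

```python
import numpy as np

def trend_line(x, y, min_abs_r=0.5):
    """Return (slope, intercept, r) of a least-squares line, or None if |r| < min_abs_r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    if abs(r) < min_abs_r:
        return None  # weak relationship: a drawn line would overstate it
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept, r

rng = np.random.default_rng(0)
customers = rng.uniform(10, 50, size=200)
sales = 45 * customers + rng.normal(0, 100, size=200)
print(trend_line(customers, sales))
```

When the function returns a result, draw the line with `slope` and `intercept` and print `r` in the annotation. When it returns `None`, show the scatter alone.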

### 3.3 The Narrative Structure

Every visualization should tell a story with three acts:

**Act 1: Setup (Context)**
- Establish the baseline
- Show the "normal" state
- Orient the viewer

**Act 2: Conflict (Problem/Insight)**
- Highlight the deviation
- Show the anomaly or opportunity
- Use visual emphasis (color, annotation)

**Act 3: Resolution (Action)**
- Suggest next steps
- Compare scenarios
- Guide toward decision

### 3.4 Visual Hierarchy

**Primary (Most Important)**:
- Largest size, boldest color, top-left position
- Example: Key metric, main trend line

**Secondary (Supporting)**:
- Medium size, less saturated colors
- Example: Comparative benchmarks, labels

**Tertiary (Context)**:
- Smallest, lightest, gray
- Example: Gridlines, axis labels, notes

In [3]:
# Cell 3: GOOD visualizations demonstrating best practices

print("✅ EXAMPLES OF EFFECTIVE VISUALIZATIONS")
print("=" * 80)
print()

# Define a custom palette (hand-picked hex values, not a ColorBrewer scheme)
COLORS = {
    'primary': '#2E86AB',      # Blue (trustworthy, professional)
    'secondary': '#A23B72',    # Purple (supporting)
    'accent': '#F18F01',       # Orange (highlights)
    'success': '#06A77D',      # Green (positive)
    'danger': '#C73E1D',       # Red (negative/urgent)
    'neutral': '#6C757D',      # Gray (context)
    'light_gray': '#E9ECEF',   # Backgrounds
    'dark_gray': '#343A40'     # Text
}

# GOOD EXAMPLE 1: Clean Bar Chart with Proper Encoding
print("✅ GOOD EXAMPLE 1: Sorted Bar Chart with Clear Hierarchy")
print("Best Practices Applied:")
print("  ✓ Sorted by value (makes comparison easy)")
print("  ✓ Horizontal bars (better for product names)")
print("  ✓ Single color (no unnecessary decoration)")
print("  ✓ Direct data labels (no need to reference axis)")
print("  ✓ Minimal gridlines (reduced visual clutter)")
print()

# Calculate total sales by product
product_totals = df.groupby('product')['sales'].sum().sort_values(ascending=True)

fig_good1 = go.Figure()

fig_good1.add_trace(go.Bar(
    x=product_totals.values,
    y=product_totals.index,
    orientation='h',
    marker=dict(
        color=COLORS['primary'],
        line=dict(width=0)  # No borders
    ),
    text=[f'${v/1000:.0f}K' for v in product_totals.values],
    textposition='outside',
    textfont=dict(size=12, color=COLORS['dark_gray']),
    hovertemplate='<b>%{y}</b><br>Sales: $%{x:,.0f}<extra></extra>'
))

fig_good1.update_layout(
    title=dict(
        text='Total Sales by Product (2022-2023)',
        font=dict(size=18, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    xaxis=dict(
        title='',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        gridwidth=1,
        zeroline=False,
        showticklabels=False  # Data labels make axis redundant
    ),
    yaxis=dict(
        title='',
        showgrid=False
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=350,
    margin=dict(l=120, r=60, t=60, b=40),
    showlegend=False
)

fig_good1.show()

print("Design Rationale:")
print("  → Sorted order enables instant rank comparison (fastest to slowest scan)")
print("  → Horizontal bars accommodate longer labels without rotation")
print("  → Single color focuses attention on the data, not decoration")
print("  → Position along common scale (most accurate encoding)")
print()
print("-" * 80)
print()

# GOOD EXAMPLE 2: Time Series with Visual Hierarchy
print("✅ GOOD EXAMPLE 2: Time Series with Clear Visual Hierarchy")
print("Best Practices Applied:")
print("  ✓ Primary line emphasized (thicker, saturated color)")
print("  ✓ Context lines de-emphasized (thinner, lighter)")
print("  ✓ Direct labeling (reduces cognitive load)")
print("  ✓ Annotations for key events")
print("  ✓ Minimal, light gridlines")
print()

# Calculate monthly sales by product
monthly_by_product = df.groupby([df['date'].dt.to_period('M'), 'product'])['sales'].sum().reset_index()
monthly_by_product['date'] = monthly_by_product['date'].dt.to_timestamp()

# Highlight best performer
best_product = df.groupby('product')['sales'].sum().idxmax()

fig_good2 = go.Figure()

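# Add all products; label each line directly at its last point instead of using a legend
for product in df['product'].unique():
    product_data = monthly_by_product[monthly_by_product['product'] == product]

    is_primary = (product == best_product)
    line_color = COLORS['primary'] if is_primary else COLORS['neutral']

    fig_good2.add_trace(go.Scatter(
        x=product_data['date'],
        y=product_data['sales'],
        name=product,
        mode='lines',
        line=dict(
            color=line_color,
            width=3 if is_primary else 1.5,
        ),
        opacity=1.0 if is_primary else 0.4,
        showlegend=False,  # direct labels replace the legend
        # Plotly placeholders are %{...}; doubled braces escape them inside the f-string
        hovertemplate=f'<b>{product}</b><br>%{{x|%b %Y}}<br>$%{{y:,.0f}}<extra></extra>'
    ))

    # Direct label just to the right of the line's final point
    last_point = product_data.iloc[-1]
    fig_good2.add_annotation(
        x=last_point['date'],
        y=last_point['sales'],
        text=product,
        showarrow=False,
        xanchor='left',
        xshift=6,
        font=dict(size=10, color=line_color)
    )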

fig_good2.update_layout(
    title=dict(
        text=f'Monthly Sales Trends (2022-2023)<br><sub>Highlighting {best_product} (Top Performer)</sub>',
        font=dict(size=18, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    xaxis=dict(
        title='',
        showgrid=False,
        zeroline=False
    ),
    yaxis=dict(
        title='Monthly Sales ($)',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        gridwidth=1,
        zeroline=False
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=450,
    hovermode='x unified',
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1,
        font=dict(size=10)
    )
)

# Add annotation for the top performer's peak month
best_product_data = monthly_by_product[monthly_by_product['product'] == best_product]
peak_idx = best_product_data['sales'].idxmax()
peak_data = best_product_data.loc[peak_idx]

fig_good2.add_annotation(
    x=peak_data['date'],
    y=peak_data['sales'],
    text=f"Peak: ${peak_data['sales']/1000:.0f}K",
    showarrow=True,
    arrowhead=2,
    arrowcolor=COLORS['accent'],
    ax=-40,
    ay=-40,
    font=dict(size=11, color=COLORS['accent']),
    bgcolor='white',
    bordercolor=COLORS['accent'],
    borderwidth=1,
    borderpad=4
)

fig_good2.show()

print("Design Rationale:")
print("  → Visual hierarchy: Primary line thick/saturated, others thin/transparent")
print("  → Annotation draws attention to key insight (peak)")
print("  → Light gridlines provide reference without competing")
print("  → Unified hover shows all values at once (reduces interaction cost)")
print()
print("-" * 80)
print()

# GOOD EXAMPLE 3: Proper Heatmap with Sequential Colors
print("✅ GOOD EXAMPLE 3: Heatmap with Perceptually-Uniform Colors")
print("Best Practices Applied:")
print("  ✓ Sequential colorscale (light → dark = low → high)")
print("  ✓ Perceptually uniform (equal steps look equal)")
print("  ✓ Colorblind-safe palette")
print("  ✓ Clear labels and title")
print()

# Calculate average daily sales by product and quarter
df['quarter_label'] = df['year'].astype(str) + ' ' + df['quarter']
quarterly_avg = df.groupby(['product', 'quarter_label'])['sales'].mean().reset_index()
pivot_quarterly = quarterly_avg.pivot(index='product', columns='quarter_label', values='sales')

# Sort columns chronologically
pivot_quarterly = pivot_quarterly[sorted(pivot_quarterly.columns)]

fig_good3 = go.Figure(data=go.Heatmap(
    z=pivot_quarterly.values,
    x=pivot_quarterly.columns,
    y=pivot_quarterly.index,
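    # Sequential single-hue scale (light = low, dark = high);
    # 'Viridis' or 'Cividis' are designed for strict perceptual uniformity
    colorscale='Blues',
    colorbar=dict(
        # title.side replaces the deprecated `titleside` argument
        title=dict(text='Avg Daily<br>Sales ($)', side='right'),
        tickformat='$,.0f',
        len=0.7
    ),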
    hovertemplate='<b>%{y}</b><br>%{x}<br>Avg: $%{z:,.0f}<extra></extra>',
    text=pivot_quarterly.values.round(0),
    texttemplate='$%{text:,.0f}',
    textfont=dict(size=10),
    showscale=True
))

fig_good3.update_layout(
    title=dict(
        text='Average Daily Sales by Product and Quarter',
        font=dict(size=18, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    xaxis=dict(
        title='',
        side='bottom',
        tickangle=-45
    ),
    yaxis=dict(
        title='',
        autorange='reversed'  # Top to bottom
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=400,
    margin=dict(l=100, r=100, t=80, b=100)
)

fig_good3.show()

print("Design Rationale:")
print("  → 'Blues' sequential scale: darker = higher (intuitive)")
print("  → Perceptually uniform: equal data steps = equal visual steps")
print("  → Cell annotations provide exact values (no need to decode color)")
print("  → Colorblind-safe (single hue variation)")
print()
print("=" * 80)

✅ EXAMPLES OF EFFECTIVE VISUALIZATIONS

✅ GOOD EXAMPLE 1: Sorted Bar Chart with Clear Hierarchy
Best Practices Applied:
  ✓ Sorted by value (makes comparison easy)
  ✓ Horizontal bars (better for product names)
  ✓ Single color (no unnecessary decoration)
  ✓ Direct data labels (no need to reference axis)
  ✓ Minimal gridlines (reduced visual clutter)



Design Rationale:
  → Sorted order enables instant rank comparison (fastest to slowest scan)
  → Horizontal bars accommodate longer labels without rotation
  → Single color focuses attention on the data, not decoration
  → Position along common scale (most accurate encoding)

--------------------------------------------------------------------------------

✅ GOOD EXAMPLE 2: Time Series with Clear Visual Hierarchy
Best Practices Applied:
  ✓ Primary line emphasized (thicker, saturated color)
  ✓ Context lines de-emphasized (thinner, lighter)
  ✓ Direct labeling (reduces cognitive load)
  ✓ Annotations for key events
  ✓ Minimal, light gridlines



Design Rationale:
  → Visual hierarchy: Primary line thick/saturated, others thin/transparent
  → Annotation draws attention to key insight (peak)
  → Light gridlines provide reference without competing
  → Unified hover shows all values at once (reduces interaction cost)

--------------------------------------------------------------------------------

✅ GOOD EXAMPLE 3: Heatmap with Perceptually-Uniform Colors
Best Practices Applied:
  ✓ Sequential colorscale (light → dark = low → high)
  ✓ Perceptually uniform (equal steps look equal)
  ✓ Colorblind-safe palette
  ✓ Clear labels and title



Design Rationale:
  → 'Blues' sequential scale: darker = higher (intuitive)
  → Perceptually uniform: equal data steps = equal visual steps
  → Cell annotations provide exact values (no need to decode color)
  → Colorblind-safe (single hue variation)



## Part 5: Interactive Visualization - Designing for Exploration

### 5.1 Shneiderman's Visual Information-Seeking Mantra

**"Overview first, zoom and filter, then details-on-demand"**

This three-stage interaction model should guide all dashboard design:

#### Stage 1: Overview First
- Show the big picture
- Use aggregated data
- Highlight patterns and anomalies
- Enable quick "at-a-glance" assessment

**Example**: Executive dashboard showing KPI tiles and trend sparklines

#### Stage 2: Zoom and Filter
- Allow users to focus on subsets
- Provide multiple filtering dimensions (time, category, region)
- Show filtered results immediately (no "submit" buttons)
- Indicate what filters are active

**Example**: Date range selector, dropdown filters, brushing on chart to filter others

#### Stage 3: Details-on-Demand
- Hover tooltips for precise values
- Click-through to detailed views
- Drill-down hierarchies (year → quarter → month → day)
- Export capabilities

**Example**: Hover shows exact values, click opens detailed transaction list
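The drill-down hierarchy (year → quarter → month → day) amounts to re-aggregating the same data at successively finer time buckets. Below is a minimal pandas sketch, assuming a DataFrame with `date` and `sales` columns like the one generated in Cell 1. The `drill_down` helper and its level names are hypothetical.

```python
import pandas as pd

# Level name -> pandas resample frequency (anchored at the start of each period)
LEVELS = {'year': 'YS', 'quarter': 'QS', 'month': 'MS', 'day': 'D'}

def drill_down(frame, level, start=None, end=None):
    """Sum sales at the requested granularity, optionally within [start, end]."""
    subset = frame.set_index('date').sort_index()
    if start is not None or end is not None:
        subset = subset.loc[start:end]  # restrict to the period the user clicked
    return subset['sales'].resample(LEVELS[level]).sum()

toy = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2023-12-31', freq='D'),
    'sales': 1.0,
})
print(drill_down(toy, 'year'))  # overview
print(drill_down(toy, 'month', start='2023-01-01', end='2023-03-31'))  # drill into 2023 Q1
```

Each click passes the clicked period's bounds and the next level down, so the detail view is always a filtered re-aggregation of the same source data.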

### 5.2 Interaction Design Patterns

#### Pattern 1: Coordinated Multiple Views (CMV)
**Principle**: Multiple charts are linked; interacting with one updates others

**Use Cases**:
- Brushing: Select points in scatter plot → highlights in other charts
- Filtering: Select category → all charts update to show only that category
- Hover: Hover over date → vertical line appears in all time series

**Benefits**: Explore relationships across different perspectives
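At its core, a coordinated-multiple-views system is an observer pattern: one shared selection, and several views that re-render whenever it changes. The sketch below shows that wiring in plain Python; `SelectionBus`, `RegionView`, and the sample data are hypothetical, and in Dash the same role is played by callbacks rather than a class like this.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sales keyed by (product, region)
SALES: Dict[Tuple[str, str], int] = {
    ("A", "North"): 100, ("A", "South"): 50,
    ("B", "North"): 30, ("B", "South"): 70,
}


class SelectionBus:
    """Shared selection state; every subscribed view is notified on change."""

    def __init__(self) -> None:
        self.selected: Optional[str] = None
        self._listeners: List[Callable[[Optional[str]], None]] = []

    def subscribe(self, listener: Callable[[Optional[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, key: str) -> None:
        # Clicking the already-selected mark clears the selection (toggle)
        self.selected = None if key == self.selected else key
        for listener in self._listeners:
            listener(self.selected)


class RegionView:
    """A responder view: regional totals for the selected product, or all products."""

    def __init__(self, bus: SelectionBus) -> None:
        self.rows: Dict[str, int] = {}
        bus.subscribe(self.render)
        self.render(bus.selected)

    def render(self, product: Optional[str]) -> None:
        totals: Dict[str, int] = {}
        for (prod, region), value in SALES.items():
            if product is None or prod == product:
                totals[region] = totals.get(region, 0) + value
        self.rows = totals
```

A controller chart only needs to call `bus.select(...)` on click; any number of responder views can subscribe without the controller knowing about them.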

#### Pattern 2: Focus + Context
**Principle**: Show detail view alongside overview for orientation

**Use Cases**:
- Range slider on time series (zoom to period, slider shows full range)
- Minimap with viewport indicator
- Expandable sections

**Benefits**: Never lose sense of "where you are" in the data
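The range-selector buttons in a focus + context chart boil down to date arithmetic: keep the full range as context and compute a narrower focus window that ends at the latest date. A minimal standard-library sketch, assuming calendar-month steps that clamp to the end of shorter months; the helper names are hypothetical, and Plotly's own `rangeselector` computes its windows in the browser and may treat edge cases differently.

```python
import calendar
from datetime import date


def months_back(end: date, months: int) -> date:
    """Step back whole calendar months, clamping the day (e.g. Mar 31 -> Feb 28)."""
    total = end.year * 12 + (end.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(end.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def focus_window(full_start: date, full_end: date, months=None):
    """Return (focus_start, focus_end) for an 'NM' button; months=None means 'All'.

    The context (full range) never changes, and the focus never extends before it.
    """
    if months is None:
        return full_start, full_end
    return max(full_start, months_back(full_end, months)), full_end
```

The main chart's x-axis range is set to the focus window, while the slider below keeps drawing `full_start` to `full_end`.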

#### Pattern 3: Progressive Disclosure
**Principle**: Show increasing detail as user expresses interest

**Use Cases**:
- Tooltip on hover → popup on click → full page on double-click
- Collapsed panels that expand
- Aggregated data → disaggregated on click

**Benefits**: Prevents information overload while maintaining access to depth
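Drill-down disclosure means computing the same measure at successively finer levels, restricted to whatever the user clicked. A minimal sketch for the year → quarter → month hierarchy from Stage 3 above, using only the standard library; the function names and the `(date, value)` row shape are assumptions.

```python
from collections import defaultdict
from datetime import date


def level_key(d: date, level: str) -> tuple:
    """Bucket a date at one level of the year -> quarter -> month hierarchy."""
    if level == "year":
        return (d.year,)
    if level == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if level == "month":
        return (d.year, d.month)
    raise ValueError(f"unknown level: {level!r}")


def drill(rows, level, parent_level=None, parent_key=None):
    """Aggregate (date, value) rows at `level`, optionally inside one parent bucket."""
    totals = defaultdict(float)
    for d, value in rows:
        if parent_level is not None and level_key(d, parent_level) != parent_key:
            continue  # outside the bucket the user clicked
        totals[level_key(d, level)] += value
    return dict(sorted(totals.items()))
```

Clicking the 2023 bar would call `drill(rows, "quarter", "year", (2023,))`; clicking Q1 would then call `drill(rows, "month", "quarter", (2023, 1))`.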

### 5.3 Performance Considerations

**Client-Side Rendering** (Plotly, D3.js):
- ✓ Fast interactions (no server round-trip)
- ✓ Smooth animations
- ✗ SVG-based charts often start to lag somewhere beyond ~10,000 points (WebGL trace types such as Plotly's `Scattergl` scale further)
- ✗ All data sent to browser (security/size concern)

**Server-Side Processing** (Dash, Shiny):
- ✓ Handle massive datasets (millions of rows), since only query results reach the browser
- ✓ Keeps raw data on the server
- ✗ Network latency on interactions
- ✗ Requires server infrastructure

**Hybrid Approach** (Best of Both):
- Aggregate data server-side
- Send summary to client
- Allow client-side interactions on summary
- Request details on-demand
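One way to implement "aggregate server-side, send a summary" for a long time series is bucketed min/max downsampling: each bucket contributes only its lowest and highest point, so spikes survive even though the point count drops. This is one common technique among several (plain averaging and LTTB are others), shown as a plain-Python sketch; the function name and bucket scheme are assumptions, not part of Dash or Plotly.

```python
def minmax_downsample(xs, ys, n_buckets):
    """Keep at most two points (the min and the max) per bucket, in original order."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    n = len(ys)
    if n <= 2 * n_buckets:
        return xs, ys  # already small enough to send as-is
    out_x, out_y = [], []
    for b in range(n_buckets):
        # Integer bucket boundaries; every bucket is non-empty because n > 2 * n_buckets
        lo, hi = b * n // n_buckets, (b + 1) * n // n_buckets
        i_min = min(range(lo, hi), key=ys.__getitem__)
        i_max = max(range(lo, hi), key=ys.__getitem__)
        for i in sorted({i_min, i_max}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y
```

The server would run this on the filtered query result and ship roughly `2 * n_buckets` points to the browser; when the user zooms in, it can re-request just that range at higher resolution.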

### 5.4 Accessibility in Interactive Visualizations

**Keyboard Navigation**:
- All interactive elements must be keyboard-accessible
- Tab order should be logical
- Focus indicators must be visible

**Screen Reader Support**:
- Provide text alternatives (aria-labels)
- Describe trends in words, not just visually
- Announce changes when data updates

**Motion Sensitivity**:
- Provide option to disable animations
- Don't auto-play animations on load
- Keep interface transitions brief (roughly 0.5 seconds or less); longer, data-driven animations should be started by the user

**Color Independence**:
- Never use ONLY color to convey information
- Add shape, pattern, or text labels as backup
- Provide colorblind-safe palettes
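Describing trends in words can be partly automated: generate a one-sentence summary from the plotted series and place it where assistive technology will read it, such as a caption or an `aria-label` on the chart container. A minimal sketch; the function name and sentence template are assumptions.

```python
def describe_trend(labels, values, metric="Sales"):
    """Return a one-sentence, screen-reader-friendly summary of a series."""
    if len(labels) != len(values) or len(values) < 2:
        raise ValueError("need at least two labelled values")
    first, last = values[0], values[-1]
    if last == first:
        movement = "was unchanged"
    else:
        verb = "rose" if last > first else "fell"
        # Percentage change relative to the first value (omitted if it is zero)
        movement = f"{verb} {abs(last - first) / abs(first):.0%}" if first else verb
    peak = max(range(len(values)), key=values.__getitem__)
    return (f"{metric} {movement} from {labels[0]} to {labels[-1]}; "
            f"the highest value was {values[peak]:,} in {labels[peak]}.")
```

Regenerating the sentence whenever filters change also gives screen readers something concrete to announce on data updates.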

In [4]:
# Cell 5: Building an interactive dashboard with linked visualizations

# Note: Dash imports are shown for reference but not used in this static demo
# from dash import Dash, dcc, html, Input, Output, State
# import dash_bootstrap_components as dbc

print("✅ INTERACTIVE DASHBOARD: Coordinated Multiple Views")
print("=" * 80)
print()

# For notebook demonstration, we'll create static versions showing the concept
# (Full Dash app requires running a server)

print("INTERACTION PATTERN: Filter by Selection")
print()
print("Concept: Clicking on a bar in Chart A filters Charts B and C")
print("         This creates a 'coordinated multiple view' system")
print()

# Example: Create a dashboard showing product performance
# Chart 1: Bar chart of total sales (CONTROLLER)
# Chart 2: Time series filtered by selected product (RESPONDER)
# Chart 3: Regional breakdown filtered by selected product (RESPONDER)

# Calculate totals for bar chart
product_totals = df.groupby('product')['sales'].sum().sort_values(ascending=False)

# Simulate selecting the top-selling product (in a real dashboard, this comes from a click event)
selected_product = product_totals.index[0]

print(f"Simulating selection: {selected_product}")
print()

# Create coordinated dashboard visualization
fig_dashboard = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        '<b>Total Sales by Product</b><br><sub>(Click a bar to filter the other charts)</sub>',
        '<b>Regional Breakdown</b><br><sub>(Filtered by selection)</sub>',
        f'<b>Sales Trend: {selected_product}</b><br><sub>(Filtered by selection)</sub>',
        ''
    ),
    specs=[
        [{"type": "bar", "rowspan": 1}, {"type": "pie"}],
        [{"type": "scatter", "colspan": 2}, None]
    ],
    vertical_spacing=0.15,
    horizontal_spacing=0.15
)

# Chart 1: Total sales bar chart (CONTROLLER)
for product in product_totals.index:
    is_selected = (product == selected_product)

    fig_dashboard.add_trace(
        go.Bar(
            x=[product],
            y=[product_totals[product]],
            name=product,
            marker=dict(
                color=COLORS['accent'] if is_selected else COLORS['primary'],
                line=dict(
                    color=COLORS['danger'] if is_selected else 'rgba(0,0,0,0)',
                    width=3 if is_selected else 0
                )
            ),
            text=f"${product_totals[product]/1000:.0f}K",
            textposition='outside',
            showlegend=False,
            hovertemplate=f'<b>{product}</b><br>Total Sales: $%{{y:,.0f}}<extra></extra>'
        ),
        row=1, col=1
    )

# Add annotation showing this is the selector
fig_dashboard.add_annotation(
    text=f"← Selected: {selected_product}",
    xref='x', yref='y',
    x=selected_product, y=product_totals[selected_product],
    showarrow=True,
    arrowhead=2,
    arrowcolor=COLORS['danger'],
    ax=60, ay=-30,
    font=dict(size=11, color=COLORS['danger'], family='Arial'),
    bgcolor='white',
    bordercolor=COLORS['danger'],
    borderwidth=2,
    borderpad=4,
    row=1, col=1
)

# Chart 2: Regional breakdown (FILTERED by selected product)
region_breakdown = df[df['product'] == selected_product].groupby('region')['sales'].sum()

fig_dashboard.add_trace(
    go.Pie(
        labels=region_breakdown.index,
        values=region_breakdown.values,
        marker=dict(
            colors=[COLORS['primary'], COLORS['secondary'], COLORS['success'], COLORS['neutral']],
            line=dict(color='white', width=2)
        ),
        textinfo='label+percent',
        textfont=dict(size=11),
        hovertemplate='<b>%{label}</b><br>Sales: $%{value:,.0f}<br>Share: %{percent}<extra></extra>',
        showlegend=False
    ),
    row=1, col=2
)

# Chart 3: Time series (FILTERED by selected product)
monthly_filtered = df[df['product'] == selected_product].groupby(
    df['date'].dt.to_period('M')
)['sales'].sum().reset_index()
monthly_filtered['date'] = monthly_filtered['date'].dt.to_timestamp()

fig_dashboard.add_trace(
    go.Scatter(
        x=monthly_filtered['date'],
        y=monthly_filtered['sales'],
        mode='lines+markers',
        line=dict(color=COLORS['primary'], width=3),
        marker=dict(size=6, color=COLORS['accent']),
        fill='tozeroy',
        fillcolor='rgba(46, 134, 171, 0.1)',
        name=selected_product,
        showlegend=False,
        hovertemplate=f'<b>{selected_product}</b><br>%{{x|%b %Y}}<br>Sales: $%{{y:,.0f}}<extra></extra>'
    ),
    row=2, col=1
)

# Add average line. Passing row/col lets Plotly pick this subplot's axes itself;
# hard-coding 'x3'/'y3' would be wrong, because the pie is a 'domain' subplot with
# no axes, so the time series actually sits on x2/y2.
avg_sales = monthly_filtered['sales'].mean()
fig_dashboard.add_shape(
    type="line",
    x0=monthly_filtered['date'].min(),
    x1=monthly_filtered['date'].max(),
    y0=avg_sales,
    y1=avg_sales,
    line=dict(
        color=COLORS['neutral'],
        width=2,
        dash="dash"
    ),
    row=2, col=1
)

# Add annotation for the average line (same subplot, placed via row/col)
fig_dashboard.add_annotation(
    x=monthly_filtered['date'].max(),
    y=avg_sales,
    text=f"Average: ${avg_sales/1000:.0f}K",
    showarrow=False,
    xanchor='left',
    xshift=10,
    font=dict(size=10, color=COLORS['neutral']),
    row=2, col=1
)

# Update layout
fig_dashboard.update_layout(
    title=dict(
        text='Interactive Dashboard Demo: Coordinated Multiple Views<br><sub>Selection in one chart filters others (simulated here, fully interactive in Dash app)</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    height=700,
    showlegend=False,
    plot_bgcolor='white',
    paper_bgcolor='white',
    hovermode='closest'
)

# Update axes
fig_dashboard.update_xaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=1, col=1)
fig_dashboard.update_yaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=1, col=1)
fig_dashboard.update_xaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=2, col=1)
fig_dashboard.update_yaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=2, col=1)

fig_dashboard.show()

print("Interaction Flow:")
print("  1. User clicks on 'Product A' bar (Chart 1)")
print("  2. System detects click event and captures selected product")
print("  3. Charts 2 and 3 automatically filter to show only Product A data")
print("  4. Visual feedback shows which product is selected (highlighted bar)")
print()
print("Benefits of Coordinated Views:")
print("  ✓ Explore relationships across different perspectives")
print("  ✓ Maintain context (always see the 'big picture' in Chart 1)")
print("  ✓ Reduce cognitive load (filtering happens automatically)")
print("  ✓ Enable hypothesis testing (What if I select Product B?)")
print()
print("-" * 80)
print()

# Example of Focus + Context pattern
print("INTERACTION PATTERN: Focus + Context (Range Slider)")
print()

# Create daily sales time series for one product
daily_product = df[df['product'] == 'Product A'].groupby('date')['sales'].sum().reset_index()

fig_focus = go.Figure()

fig_focus.add_trace(
    go.Scatter(
        x=daily_product['date'],
        y=daily_product['sales'],
        mode='lines',
        line=dict(color=COLORS['primary'], width=2),
        fill='tozeroy',
        fillcolor='rgba(46, 134, 171, 0.1)',
        name='Product A Sales',
        hovertemplate='<b>Product A</b><br>%{x|%b %d, %Y}<br>Sales: $%{y:,.0f}<extra></extra>'
    )
)

# Add range slider (CONTEXT) and range selector buttons
fig_focus.update_xaxes(
    rangeslider=dict(
        visible=True,
        thickness=0.05,
        bgcolor=COLORS['light_gray']
    ),
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1M", step="month", stepmode="backward"),
            dict(count=3, label="3M", step="month", stepmode="backward"),
            dict(count=6, label="6M", step="month", stepmode="backward"),
            dict(count=1, label="1Y", step="year", stepmode="backward"),
            dict(step="all", label="All")
        ]),
        bgcolor=COLORS['light_gray'],
        activecolor=COLORS['accent'],
        x=0,
        y=1.05,
        xanchor='left',
        yanchor='bottom'
    )
)

fig_focus.update_layout(
    title=dict(
        text='Focus + Context Pattern: Range Slider<br><sub>Zoom to time period (FOCUS) while maintaining awareness of full range (CONTEXT)</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    yaxis=dict(
        title='Daily Sales ($)',
        showgrid=True,
        gridcolor=COLORS['light_gray']
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=500,
    showlegend=False,
    hovermode='x unified'
)

fig_focus.show()

print("How Focus + Context Works:")
print("  FOCUS: Main chart shows detailed view of selected time range")
print("  CONTEXT: Range slider below shows entire time series")
print("  → User can zoom to specific period without losing orientation")
print("  → Buttons provide quick jumps to common periods (1M, 3M, 6M, etc.)")
print()
print("Benefits:")
print("  ✓ Never lose sense of 'where you are' in the data")
print("  ✓ Quick navigation with buttons")
print("  ✓ Precise selection with slider")
print("  ✓ Reduces need for separate 'zoom' controls")
print()
print("=" * 80)

✅ INTERACTIVE DASHBOARD: Coordinated Multiple Views

INTERACTION PATTERN: Filter by Selection

Concept: Clicking on a bar in Chart A filters Charts B and C
         This creates a 'coordinated multiple view' system

Simulating selection: Product E



Interaction Flow:
  1. User clicks on 'Product A' bar (Chart 1)
  2. System detects click event and captures selected product
  3. Charts 2 and 3 automatically filter to show only Product A data
  4. Visual feedback shows which product is selected (highlighted bar)

Benefits of Coordinated Views:
  ✓ Explore relationships across different perspectives
  ✓ Maintain context (always see the 'big picture' in Chart 1)
  ✓ Reduce cognitive load (filtering happens automatically)
  ✓ Enable hypothesis testing (What if I select Product B?)

--------------------------------------------------------------------------------

INTERACTION PATTERN: Focus + Context (Range Slider)



How Focus + Context Works:
  FOCUS: Main chart shows detailed view of selected time range
  CONTEXT: Range slider below shows entire time series
  → User can zoom to specific period without losing orientation
  → Buttons provide quick jumps to common periods (1M, 3M, 6M, etc.)

Benefits:
  ✓ Never lose sense of 'where you are' in the data
  ✓ Quick navigation with buttons
  ✓ Precise selection with slider
  ✓ Reduces need for separate 'zoom' controls



## Part 6: Multi-Dimensional Encoding - Beyond X and Y

### 6.1 The Dimensionality Problem

Real-world datasets often have five or more variables worth showing, but a standard 2D plot maps only two of them to position. How do we show more without creating confusion?

### 6.2 Visual Encoding Channels (Ranked by Effectiveness)

**Quantitative (Ordered) Data:**
1. **Position** (X, Y) - Most accurate
2. **Length** (Bar height/width)
3. **Angle** (Pie slices) - Less accurate
4. **Area** (Circle size) - Even less accurate
5. **Volume** (3D) - Avoid!
6. **Color Saturation** (Light to dark)

**Categorical (Unordered) Data:**
1. **Color Hue** (Red, Blue, Green)
2. **Shape** (Circle, Square, Triangle)
3. **Texture/Pattern**

### 6.3 Combining Channels: The Layered Encoding Strategy

**Example: Scatter Plot with 5 Dimensions**
- Dimension 1: X-axis (Position)
- Dimension 2: Y-axis (Position)
- Dimension 3: Color hue (Category)
- Dimension 4: Size (Quantitative)
- Dimension 5: Shape (Category)

**Rules for Combination:**
1. **Most Important → Best Channel**: Use position (X/Y) for your primary comparison
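2. **Limit to 3-4 Encodings Where Possible**: Each extra channel adds decoding effort; beyond about four, most viewers struggle, so move further variables into tooltips or small multiples
3. **One Variable per Channel**: Never map two variables to the same channel. If you encode one variable twice (e.g. both hue and shape), do it deliberately, as redundant encoding for colorblind viewers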
4. **Provide Legend**: Every encoding needs clear legend
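The combination rules can be turned into a quick lint for a chart specification before you build it. A minimal sketch: the channel names, the `check_encoding` helper, and the choice to also allow position for categorical variables (as on a bar chart's category axis) are assumptions layered on the rankings in 6.2.

```python
# Channel name -> visual channel family (a subset of the lists in 6.2)
CHANNEL_KIND = {
    "x": "position", "y": "position", "size": "area",
    "saturation": "saturation", "hue": "hue", "shape": "shape",
}
# Channel families suited to each data type; position for categories is an assumption
QUANTITATIVE_OK = {"position", "length", "angle", "area", "volume", "saturation"}
CATEGORICAL_OK = {"position", "hue", "shape", "texture"}


def check_encoding(spec, var_types, primary, max_channels=4):
    """spec: channel -> variable; var_types: variable -> 'quantitative' or 'categorical'."""
    problems = []
    if len(spec) > max_channels:
        problems.append(f"{len(spec)} encodings exceeds the suggested maximum of {max_channels}")
    if primary not in (spec.get("x"), spec.get("y")):
        problems.append(f"primary variable '{primary}' is not on a position channel")
    for channel, var in spec.items():
        kind = CHANNEL_KIND[channel]
        allowed = QUANTITATIVE_OK if var_types[var] == "quantitative" else CATEGORICAL_OK
        if kind not in allowed:
            problems.append(f"{var_types[var]} variable '{var}' is on the {kind} channel")
    return problems
```

Because the spec is a dict keyed by channel, rule 3 (one variable per channel) holds by construction. Run on the five-channel spec from 6.3, the lint flags only the channel count.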

### 6.4 When to Use Bubble Charts

**Bubble Chart = Scatter + Size Encoding**

**Good Use Cases:**
- Three quantitative dimensions (X, Y, Size)
- Showing relationships AND magnitude
- Time series where size shows additional metric

**Bad Use Cases:**
- Size differences < 2x (hard to perceive)
- Heavily overlapping bubbles (if unavoidable, mitigate with transparency or jitter)
- More than 50-100 points (becomes cluttered)

**Design Tips:**
- Scale by area, not radius (perception is area-based)
- Use transparency (0.6-0.8 opacity) for overlap
- Add size legend with example bubbles
- Consider animation if size changes over time

### 6.5 Motion as an Encoding Channel

**Animated Transitions** can encode:
- Time progression (most common)
- Before/after comparison
- Cause and effect relationships

**Animation Best Practices:**
- Duration: 500-1000ms (not too fast, not too slow)
- Easing: Use ease-in-out (feels natural)
- Pause at endpoints (let viewer absorb)
- Provide play/pause controls
- Show static snapshots as fallback

**Hans Rosling's Gapminder**: Classic example of using animation to show country development over time

In [5]:
# Cell 6: Advanced multi-dimensional encodings

print("✅ MULTI-DIMENSIONAL ENCODING: Showing 5+ Dimensions Simultaneously")
print("=" * 80)
print()

# Prepare data for multi-dimensional visualization
# Aggregate by product and region for quarterly view
quarterly_summary = df.groupby(['product', 'region', 'quarter', 'year']).agg({
    'sales': 'sum',
    'profit': 'sum',
    'customers': 'sum',
    'profit_margin': 'mean'
}).reset_index()

quarterly_summary['profit_per_customer'] = quarterly_summary['profit'] / quarterly_summary['customers']
quarterly_summary['quarter_label'] = quarterly_summary['year'].astype(str) + ' ' + quarterly_summary['quarter']

# Focus on 2023 data for clarity
data_2023 = quarterly_summary[quarterly_summary['year'] == 2023].copy()

print("EXAMPLE 1: Bubble Chart with 5 Dimensions")
print()
print("Encoding Strategy:")
print("  X-axis (Position): Sales (most accurate channel for primary metric)")
print("  Y-axis (Position): Profit Margin %")
print("  Size (Area): Total Customers")
print("  Color (Hue): Product (categorical)")
print("  Shape (Symbol): Region (categorical)")
print()

# Create bubble chart
fig_bubble = go.Figure()

# Color palette for products
product_colors = {
    'Product A': COLORS['primary'],
    'Product B': COLORS['secondary'],
    'Product C': COLORS['success'],
    'Product D': COLORS['accent'],
    'Product E': COLORS['danger']
}

# Shape mapping for regions
region_shapes = {
    'North': 'circle',
    'South': 'square',
    'East': 'diamond',
    'West': 'cross'
}

# Add trace for each product-region combination
for product in data_2023['product'].unique():
    for region in data_2023['region'].unique():
        subset = data_2023[
            (data_2023['product'] == product) &
            (data_2023['region'] == region)
        ]

        if len(subset) > 0:
            fig_bubble.add_trace(
                go.Scatter(
                    x=subset['sales'],
                    y=subset['profit_margin'],
                    mode='markers',
                    name=f'{product} - {region}',
                    marker=dict(
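                        # Scale by AREA (see 6.4) so bubble area, not diameter, tracks customers;
                        # one global sizeref keeps every trace on the same scale (max ~40 px)
                        size=subset['customers'],
                        sizemode='area',
                        sizeref=2.0 * data_2023['customers'].max() / (40 ** 2),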
                        color=product_colors[product],
                        symbol=region_shapes[region],
                        line=dict(color='white', width=1),
                        opacity=0.7
                    ),
                    text=subset['quarter_label'],
                    customdata=subset[['customers', 'profit']],
                    hovertemplate=(
                        f'<b>{product} - {region}</b><br>' +
                        'Quarter: %{text}<br>' +
                        'Sales: $%{x:,.0f}<br>' +
                        'Profit Margin: %{y:.1f}%<br>' +
                        'Customers: %{customdata[0]:,.0f}<br>' +
                        'Profit: $%{customdata[1]:,.0f}' +
                        '<extra></extra>'
                    ),
                    showlegend=True
                )
            )

# Add quadrant lines for strategic guidance
median_sales = data_2023['sales'].median()
median_margin = data_2023['profit_margin'].median()

fig_bubble.add_vline(
    x=median_sales,
    line_dash="dash",
    line_color=COLORS['neutral'],
    line_width=1,
    opacity=0.5,
    annotation_text="Median Sales",
    annotation_position="top"
)

fig_bubble.add_hline(
    y=median_margin,
    line_dash="dash",
    line_color=COLORS['neutral'],
    line_width=1,
    opacity=0.5,
    annotation_text="Median Margin",
    annotation_position="right"
)

# Add quadrant labels
fig_bubble.add_annotation(
    text="⭐ High Margin<br>High Sales",
    x=data_2023['sales'].quantile(0.75),
    y=data_2023['profit_margin'].quantile(0.75),
    showarrow=False,
    font=dict(size=11, color=COLORS['success']),
    bgcolor='rgba(6, 167, 125, 0.1)',
    bordercolor=COLORS['success'],
    borderwidth=1,
    borderpad=6
)

fig_bubble.add_annotation(
    text="⚠️ Low Margin<br>Low Sales",
    x=data_2023['sales'].quantile(0.25),
    y=data_2023['profit_margin'].quantile(0.25),
    showarrow=False,
    font=dict(size=11, color=COLORS['danger']),
    bgcolor='rgba(199, 62, 29, 0.1)',
    bordercolor=COLORS['danger'],
    borderwidth=1,
    borderpad=6
)

fig_bubble.update_layout(
    title=dict(
        text='5-Dimensional Product-Region Performance Analysis (2023)<br>' +
             '<sub>Size = Customers | Color = Product | Shape = Region | Position = Sales × Profit Margin</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    xaxis=dict(
        title='Total Sales ($)',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        zeroline=False
    ),
    yaxis=dict(
        title='Profit Margin (%)',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        zeroline=False
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=600,
    hovermode='closest',
    legend=dict(
        title='Product - Region',
        orientation='v',
        yanchor='top',
        y=1,
        xanchor='left',
        x=1.02,
        font=dict(size=9)
    )
)

fig_bubble.show()

print("How to Read This Visualization:")
print("  1. Position (X,Y): Find product-region in quadrants")
print("  2. Bubble Size: Larger = more customers")
print("  3. Color: Identifies product family")
print("  4. Shape: Identifies geographic region")
print("  5. Hover: See exact values and quarter")
print()
print("Insights Visible:")
print("  ✓ Top-right quadrant: Stars (high sales, high margin)")
print("  ✓ Bottom-left quadrant: Problem areas (low sales, low margin)")
print("  ✓ Large bubbles in good quadrants: Prioritize retention")
print("  ✓ Small bubbles in good quadrants: Growth opportunities")
print()
print("-" * 80)
print()

# Example 2: Parallel Coordinates for high-dimensional comparison
print("EXAMPLE 2: Parallel Coordinates Plot (6+ Dimensions)")
print()
print("Use Case: When you have many quantitative dimensions to compare")
print("Benefits:")
print("  ✓ Show 6+ dimensions simultaneously")
print("  ✓ Identify patterns and clusters")
print("  ✓ Filter by brushing any axis")
print()

# Aggregate by product for parallel coordinates
product_metrics = df.groupby('product').agg({
    'sales': 'sum',
    'profit': 'sum',
    'customers': 'sum',
    'profit_margin': 'mean',
    'avg_transaction': 'mean'
}).reset_index()

product_metrics['sales_k'] = product_metrics['sales'] / 1000
product_metrics['profit_k'] = product_metrics['profit'] / 1000
product_metrics['customers_k'] = product_metrics['customers'] / 1000

# Create parallel coordinates
fig_parallel = go.Figure(data=
    go.Parcoords(
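        line=dict(
            color=np.arange(len(product_metrics)),
            colorscale=[[0, COLORS['primary']], [0.5, COLORS['accent']], [1, COLORS['success']]],
            # Parcoords lines have no hover labels, so a labelled colorbar identifies each product
            showscale=True,
            colorbar=dict(
                title='Product',
                tickvals=np.arange(len(product_metrics)),
                ticktext=product_metrics['product'].tolist()
            )
        ),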
        dimensions=[
            dict(
                label='Sales ($K)',
                values=product_metrics['sales_k'],
                range=[product_metrics['sales_k'].min() * 0.9, product_metrics['sales_k'].max() * 1.1]
            ),
            dict(
                label='Profit ($K)',
                values=product_metrics['profit_k'],
                range=[product_metrics['profit_k'].min() * 0.9, product_metrics['profit_k'].max() * 1.1]
            ),
            dict(
                label='Profit Margin (%)',
                values=product_metrics['profit_margin'],
                range=[product_metrics['profit_margin'].min() * 0.9, product_metrics['profit_margin'].max() * 1.1]
            ),
            dict(
                label='Customers (K)',
                values=product_metrics['customers_k'],
                range=[product_metrics['customers_k'].min() * 0.9, product_metrics['customers_k'].max() * 1.1]
            ),
            dict(
                label='Avg Transaction ($)',
                values=product_metrics['avg_transaction'],
                range=[product_metrics['avg_transaction'].min() * 0.9, product_metrics['avg_transaction'].max() * 1.1]
            )
        ],
        labelfont=dict(size=12, color=COLORS['dark_gray']),
        rangefont=dict(size=10, color=COLORS['neutral'])
    )
)

fig_parallel.update_layout(
    title=dict(
        text='Product Performance: Parallel Coordinates View<br>' +
             '<sub>Each line represents a product | Drag along an axis to filter | Drag an axis title to reorder</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0,
        xanchor='left'
    ),
    height=500,
    paper_bgcolor='white'
)

fig_parallel.show()

print("How to Read Parallel Coordinates:")
print("  • Each vertical axis represents one dimension")
print("  • Each line represents one product")
print("  • Lines connecting axes show relationships")
print("  • Parallel lines = correlated dimensions")
print("  • Crossing lines = inverse relationships")
print()
print("Interactive Features:")
print("  • Drag on any axis to filter (brushing)")
print("  • Reorder axes by dragging axis title")
print("  • Hover on line to highlight and see product name")
print()
print("=" * 80)

✅ MULTI-DIMENSIONAL ENCODING: Showing 5+ Dimensions Simultaneously

EXAMPLE 1: Bubble Chart with 5 Dimensions

Encoding Strategy:
  X-axis (Position): Sales (most accurate channel for primary metric)
  Y-axis (Position): Profit Margin %
  Size (Area): Total Customers
  Color (Hue): Product (categorical)
  Shape (Symbol): Region (categorical)



How to Read This Visualization:
  1. Position (X,Y): Find product-region in quadrants
  2. Bubble Size: Larger = more customers
  3. Color: Identifies product family
  4. Shape: Identifies geographic region
  5. Hover: See exact values and quarter

Insights Visible:
  ✓ Top-right quadrant: Stars (high sales, high margin)
  ✓ Bottom-left quadrant: Problem areas (low sales, low margin)
  ✓ Large bubbles in good quadrants: Prioritize retention
  ✓ Small bubbles in good quadrants: Growth opportunities

--------------------------------------------------------------------------------

EXAMPLE 2: Parallel Coordinates Plot (6+ Dimensions)

Use Case: When you have many quantitative dimensions to compare
Benefits:
  ✓ Show 6+ dimensions simultaneously
  ✓ Identify patterns and clusters
  ✓ Filter by brushing any axis



How to Read Parallel Coordinates:
  • Each vertical axis represents one dimension
  • Each line represents one product
  • Lines connecting axes show relationships
  • Parallel lines = correlated dimensions
  • Crossing lines = inverse relationships

Interactive Features:
  • Drag on any axis to filter (brushing)
  • Reorder axes by dragging axis title
  • Hover on line to highlight and see product name



## Part 7: Data Storytelling - Transforming Analysis into Narrative

### 7.1 Why Storytelling Matters in Visualization

**The Problem with Exploratory Dashboards:**
- Show everything → overwhelm viewers
- No clear takeaway → viewers may draw the wrong conclusions
- Equal emphasis → important insights buried

**The Power of Narrative:**
- Guides attention → viewers see what matters
- Creates emotional connection → data becomes memorable
- Drives action → clear next steps

### 7.2 The Three-Act Structure for Data Stories

#### Act 1: Setup (Context & Baseline)
**Goal**: Establish the "normal" state and why it matters

**Elements**:
- Show historical context or industry benchmarks
- Establish baseline performance
- Introduce key metrics and their importance
- Set expectations

**Visual Techniques**:
- Muted colors for baseline (gray, light blue)
- Reference lines showing averages
- Annotations explaining context

**Example**: "Our average monthly sales have been steady at $500K for the past year..."

#### Act 2: Conflict (Problem or Opportunity)
**Goal**: Reveal the tension, anomaly, or insight

**Elements**:
- Highlight deviation from baseline
- Show magnitude of problem/opportunity
- Use contrast to draw attention
- Quantify impact

**Visual Techniques**:
- Bright colors for highlights (red for problems, green for opportunities)
- Arrows pointing to key data points
- Size emphasis (larger text, thicker lines)
- Zoom into relevant time period

**Example**: "...but in Q3, Product A sales dropped 30% in the North region, costing us $150K."

#### Act 3: Resolution (Action & Future State)
**Goal**: Show the path forward and expected outcomes

**Elements**:
- Present solution options
- Show projected outcomes
- Provide clear recommendations
- Define success metrics

**Visual Techniques**:
- Comparison charts (current vs. proposed)
- Scenario comparisons (if-then projections, e.g. best, base, and worst case)
- Forecast lines with confidence intervals
- Summary scorecards

**Example**: "By reallocating marketing budget to underserved regions, we project $200K recovery by Q1."
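The numbers behind the three acts can be computed directly: a baseline from the setup period, the size of the shortfall at the conflict's low point, and a straight-line path back to target for the resolution. A minimal sketch in plain Python; the function name, the analyst-chosen `conflict_start` index, and the linear recovery are assumptions, and a straight line is a scenario to discuss, not a forecast.

```python
from statistics import mean


def three_act_summary(values, conflict_start, n_projection=3, target_ratio=1.0):
    """Split a time-ordered series into Setup/Conflict and sketch a Resolution path.

    values: e.g. monthly sales in time order
    conflict_start: index where the analyst judges the 'conflict' begins
    target_ratio: recovery target as a multiple of the baseline (1.0 = back to normal)
    """
    setup, conflict = values[:conflict_start], values[conflict_start:]
    if not setup or not conflict:
        raise ValueError("both the setup and conflict periods need data")
    baseline = mean(setup)                        # Act 1: the 'normal' level
    trough = min(conflict)                        # Act 2: the low point
    shortfall_pct = (baseline - trough) * 100 / baseline
    target = baseline * target_ratio              # Act 3: where we want to be
    step = (target - trough) / n_projection
    projection = [trough + step * (i + 1) for i in range(n_projection)]
    return {"baseline": baseline, "trough": trough,
            "shortfall_pct": shortfall_pct, "projection": projection}
```

With monthly sales of `[500, 500, 500, 350, 400]` and the conflict starting at index 3, the summary reports a 30% shortfall and a recovery path of 400, 450, 500.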

### 7.3 Progressive Disclosure Pattern

**Principle**: Reveal information in stages as viewer demonstrates interest

**Stage 1: High-Level Summary (3 seconds)**
- One number: "Sales down 15%"
- Traffic light: Red/Yellow/Green status
- Sparkline: Trend direction

**Stage 2: Overview (10 seconds)**
- Simple bar or line chart
- Top 3-5 items
- Clear title stating the insight

**Stage 3: Details (30+ seconds)**
- Full interactive dashboard
- Filters and drill-downs
- Comparison tools

**Stage 4: Deep Dive (exploration)**
- Raw data export
- Statistical analysis
- Scenario modeling

### 7.4 Annotation Best Practices

**Annotations guide the eye and reinforce narrative**

**Types of Annotations**:

1. **Callout Boxes**: Explain outliers or key points
   - Position: Near the data point
   - Arrow: Points directly to data
   - Content: 1-2 sentences maximum

2. **Reference Lines**: Show targets or benchmarks
   - Style: Dashed or dotted
   - Color: Gray or subtle contrast
   - Label: Short descriptor (e.g., "Target: $500K")

3. **Shaded Regions**: Highlight time periods
   - Use: Mark events (product launches, campaigns)
   - Opacity: 10-20% (subtle background)
   - Label: Small text at top

4. **Trend Lines**: Show overall direction
   - Type: Linear, polynomial, or LOESS
   - Style: Thinner than data lines
   - Color: Complementary to data
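A linear trend line is an ordinary least-squares fit of the values against their position in time. The sketch below computes it with the standard library so the fitted points can be drawn as a thin line under the data; it assumes evenly spaced observations (x = 0, 1, 2, ...), and polynomial or LOESS trends would need a different method.

```python
def linear_trend(ys):
    """Least-squares (slope, intercept) for points (0, y0), (1, y1), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def trend_points(ys):
    """Fitted y-values at each x, ready to plot as a thin reference line."""
    slope, intercept = linear_trend(ys)
    return [intercept + slope * x for x in range(len(ys))]
```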

**Annotation Anti-Patterns**:
- ❌ Too many annotations (cluttered)
- ❌ Annotations overlap data
- ❌ Long paragraphs (should be concise)
- ❌ Unclear arrows (ambiguous pointing)

### 7.5 Emotional Design for Data

**Color Psychology in Storytelling** (typical Western business associations; meanings vary by culture):
- 🔴 Red: Urgency, decline, danger (use sparingly)
- 🟢 Green: Growth, success, safety
- 🟡 Yellow: Caution, attention needed
- 🔵 Blue: Trust, stability, neutral
- 🟣 Purple: Innovation, premium

**Typography Hierarchy**:
- **Title**: 18-24pt, Bold → Main message
- **Subtitle**: 12-14pt, Regular → Context
- **Axis Labels**: 10-11pt → Reference
- **Annotations**: 9-10pt, Italic → Supporting details

**Whitespace as Story Element**:
- More whitespace around an element = more emphasis
- Clustering = relationships
- Separation = distinct concepts

In [6]:
# Cell 7: Building a complete data story with three-act structure

print("✅ DATA STORYTELLING: Complete Three-Act Narrative")
print("=" * 80)
print()

print("STORY: 'The Product B Performance Crisis and Recovery Plan'")
print()
print("This visualization tells a complete story using the three-act structure:")
print("  Act 1: Setup (Product B was our star performer)")
print("  Act 2: Conflict (Sudden decline in Q3 2023)")
print("  Act 3: Resolution (Recovery strategy with projections)")
print()

# Prepare data for storytelling
monthly_product = df.groupby([df['date'].dt.to_period('M'), 'product']).agg({
    'sales': 'sum',
    'profit': 'sum',
    'customers': 'sum'
}).reset_index()
monthly_product['date'] = monthly_product['date'].dt.to_timestamp()
monthly_product['profit_margin'] = (monthly_product['profit'] / monthly_product['sales']) * 100

# Focus on Product B
product_b_monthly = monthly_product[monthly_product['product'] == 'Product B'].copy()

# Create the narrative visualization
fig_story = go.Figure()

# ACT 1: SETUP (Jan 2022 - Jun 2023) - Normal performance
act1_data = product_b_monthly[product_b_monthly['date'] < '2023-07-01']
act1_mean = act1_data['sales'].mean()

fig_story.add_trace(
    go.Scatter(
        x=act1_data['date'],
        y=act1_data['sales'],
        mode='lines',
        name='Act 1: Normal Period',
        line=dict(color=COLORS['neutral'], width=3),
        fill='tozeroy',
        fillcolor='rgba(108, 117, 125, 0.1)',
        hovertemplate='<b>Normal Period</b><br>%{x|%b %Y}<br>Sales: $%{y:,.0f}<extra></extra>'
    )
)

# Add baseline reference
fig_story.add_hline(
    y=act1_mean,
    line_dash="dash",
    line_color=COLORS['neutral'],
    line_width=2,
    opacity=0.6,
    annotation_text=f"Historical Average: ${act1_mean/1000:.0f}K",
    annotation_position="left",
    annotation_font_size=11
)

# ACT 2: CONFLICT (Jul 2023 - Dec 2023) - The decline
act2_data = product_b_monthly[
    (product_b_monthly['date'] >= '2023-07-01') &
    (product_b_monthly['date'] <= '2023-12-31')
]

fig_story.add_trace(
    go.Scatter(
        x=act2_data['date'],
        y=act2_data['sales'],
        mode='lines+markers',
        name='Act 2: Decline Period',
        line=dict(color=COLORS['danger'], width=4),
        marker=dict(size=10, symbol='x'),
        fill='tozeroy',
        fillcolor='rgba(199, 62, 29, 0.2)',
        hovertemplate='<b>⚠️ Decline Period</b><br>%{x|%b %Y}<br>Sales: $%{y:,.0f}<extra></extra>'
    )
)

# Highlight the sharpest decline point
worst_month = act2_data.loc[act2_data['sales'].idxmin()]
# Shortfall as a positive percentage, so the label reads "30% below" rather than "-30% below"
decline_pct = ((act1_mean - worst_month['sales']) / act1_mean) * 100

fig_story.add_annotation(
    x=worst_month['date'],
    y=worst_month['sales'],
    text=f"<b>Crisis Point</b><br>{decline_pct:.0f}% below baseline<br>${(act1_mean - worst_month['sales'])/1000:.0f}K monthly loss",
    showarrow=True,
    arrowhead=2,
    arrowsize=1,
    arrowwidth=3,
    arrowcolor=COLORS['danger'],
    ax=100,
    ay=-80,
    font=dict(size=12, color=COLORS['danger'], family='Arial'),
    bgcolor='white',
    bordercolor=COLORS['danger'],
    borderwidth=2,
    borderpad=8
)

# ACT 3: RESOLUTION - Projected recovery (2024)
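# Generate an illustrative projection: a straight-line path from the worst month
# to 110% of the historical average, plus noise. This is a scenario, not a fitted forecast.
recovery_dates = pd.date_range(start='2024-01-01', periods=6, freq='MS')
recovery_baseline = worst_month['sales']
recovery_trend = np.linspace(recovery_baseline, act1_mean * 1.1, len(recovery_dates))
rng = np.random.default_rng(42)  # fixed seed so the projection is reproducible
recovery_sales = recovery_trend + rng.normal(0, recovery_baseline * 0.05, len(recovery_dates))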

fig_story.add_trace(
    go.Scatter(
        x=recovery_dates,
        y=recovery_sales,
        mode='lines+markers',
        name='Act 3: Projected Recovery',
        line=dict(color=COLORS['success'], width=3, dash='dash'),
        marker=dict(size=8, symbol='star'),
        hovertemplate='<b>📈 Projected Recovery</b><br>%{x|%b %Y}<br>Forecast: $%{y:,.0f}<extra></extra>'
    )
)

# Add an illustrative ±15% uncertainty band (fixed width, not a statistical confidence interval)
fig_story.add_trace(
    go.Scatter(
        x=recovery_dates,
        y=recovery_sales * 1.15,
        mode='lines',
        name='Upper Bound',
        line=dict(width=0),
        showlegend=False,
        hoverinfo='skip'
    )
)

fig_story.add_trace(
    go.Scatter(
        x=recovery_dates,
        y=recovery_sales * 0.85,
        mode='lines',
        name='Lower Bound',
        line=dict(width=0),
        fill='tonexty',
        fillcolor='rgba(6, 167, 125, 0.2)',
        showlegend=False,
        hoverinfo='skip'
    )
)

# Add recovery target annotation
target_month = recovery_dates[-1]
target_sales = recovery_sales[-1]

fig_story.add_annotation(
    x=target_month,
    y=target_sales,
    text=f"<b>Recovery Target</b><br>Return to growth<br>${target_sales/1000:.0f}K/month",
    showarrow=True,
    arrowhead=2,
    arrowsize=1,
    arrowwidth=2,
    arrowcolor=COLORS['success'],
    ax=-100,
    ay=-60,
    font=dict(size=12, color=COLORS['success'], family='Arial'),
    bgcolor='white',
    bordercolor=COLORS['success'],
    borderwidth=2,
    borderpad=8
)

# Add shaded regions for each act
fig_story.add_vrect(
    x0=act1_data['date'].min(),
    x1=act1_data['date'].max(),
    fillcolor=COLORS['neutral'],
    opacity=0.05,
    line_width=0,
    annotation_text="ACT 1: SETUP",
    annotation_position="top left",
    annotation_font_size=11,
    annotation_font_color=COLORS['neutral']
)

fig_story.add_vrect(
    x0=act2_data['date'].min(),
    x1=act2_data['date'].max(),
    fillcolor=COLORS['danger'],
    opacity=0.05,
    line_width=0,
    annotation_text="ACT 2: CONFLICT",
    annotation_position="top left",
    annotation_font_size=11,
    annotation_font_color=COLORS['danger']
)

fig_story.add_vrect(
    x0=recovery_dates[0],
    x1=recovery_dates[-1],
    fillcolor=COLORS['success'],
    opacity=0.05,
    line_width=0,
    annotation_text="ACT 3: RESOLUTION",
    annotation_position="top left",
    annotation_font_size=11,
    annotation_font_color=COLORS['success']
)

# Update layout with story title
fig_story.update_layout(
    title=dict(
        text='<b>Product B: Crisis and Recovery Story</b><br>' +
             '<sub>A three-act narrative showing decline, diagnosis, and projected recovery through targeted interventions</sub>',
        font=dict(size=18, color=COLORS['dark_gray'], family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    xaxis=dict(
        title='',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        gridwidth=1
    ),
    yaxis=dict(
        title='Monthly Sales ($)',
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        gridwidth=1
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=600,
    hovermode='x unified',
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=-0.25,
        xanchor='center',
        x=0.5,
        font=dict(size=11)
    )
)

fig_story.show()

print("Narrative Elements:")
print()
print("ACT 1 - SETUP (Gray shaded region):")
print("  • Shows 18 months of stable performance")
print("  • Baseline reference line establishes 'normal'")
print("  • Muted colors indicate this is historical context")
print()
print("ACT 2 - CONFLICT (Red shaded region):")
print("  • Dramatic color change (red) signals problem")
print("  • X markers emphasize each declining month")
print("  • Annotation quantifies the impact ($ and %)")
print("  • Visual weight (thicker line) draws attention")
print()
print("ACT 3 - RESOLUTION (Green shaded region):")
print("  • Dashed line indicates projection/forecast")
print("  • Green color signals positive outcome")
print("  • Shaded ±15% band signals uncertainty (illustrative, not a statistical CI)")
print("  • Target annotation sets clear goal")
print()
print("Story Message:")
print("  'Product B experienced a crisis in Q3 2023, but with our recovery")
print("   plan, we expect to return to growth by mid-2024.'")
print()
print("-" * 80)
print()

# Example 2: Progressive Disclosure Dashboard
print("EXAMPLE 2: Progressive Disclosure - Layered Information")
print()
print("Showing information in stages based on user engagement:")
print()

# Create a summary scorecard (Level 1: 3-second view)
fig_scorecard = make_subplots(
    rows=1, cols=4,
    subplot_titles=(
        '<b>2023 Sales</b>',
        '<b>Growth Rate</b>',
        '<b>Profit Margin</b>',
        '<b>Customer Count</b>'
    ),
    specs=[[{"type": "indicator"}, {"type": "indicator"},
            {"type": "indicator"}, {"type": "indicator"}]]
)

# Calculate KPIs
total_sales = df['sales'].sum()
prev_year_sales = df[df['year'] == 2022]['sales'].sum()
curr_year_sales = df[df['year'] == 2023]['sales'].sum()
growth_rate = ((curr_year_sales - prev_year_sales) / prev_year_sales) * 100
avg_margin = df['profit_margin'].mean()
total_customers = df['customers'].sum()

# Sales indicator
fig_scorecard.add_trace(
    go.Indicator(
        mode="number+delta",
        value=curr_year_sales,
        number={'prefix': "$", 'valueformat': ",.0f", 'font': {'size': 40, 'color': COLORS['primary']}},
        delta={'reference': prev_year_sales, 'relative': True, 'valueformat': ".1%"},
        domain={'x': [0, 1], 'y': [0, 1]}
    ),
    row=1, col=1
)

# Growth indicator
fig_scorecard.add_trace(
    go.Indicator(
        mode="number+delta",
        value=growth_rate,
        number={'suffix': "%", 'valueformat': ".1f", 'font': {'size': 40,
                'color': COLORS['success'] if growth_rate > 0 else COLORS['danger']}},
        delta={'reference': 0, 'valueformat': ".1f"},
        domain={'x': [0, 1], 'y': [0, 1]}
    ),
    row=1, col=2
)

# Margin indicator
fig_scorecard.add_trace(
    go.Indicator(
        mode="gauge+number",
        value=avg_margin,
        number={'suffix': "%", 'font': {'size': 30}},
        gauge={
            'axis': {'range': [0, 50], 'tickwidth': 1},
            'bar': {'color': COLORS['primary']},
            'steps': [
                {'range': [0, 15], 'color': 'rgba(199, 62, 29, 0.2)'},
                {'range': [15, 25], 'color': 'rgba(255, 143, 1, 0.2)'},
                {'range': [25, 50], 'color': 'rgba(6, 167, 125, 0.2)'}
            ],
            'threshold': {
                'line': {'color': COLORS['danger'], 'width': 4},
                'thickness': 0.75,
                'value': 20
            }
        },
        domain={'x': [0, 1], 'y': [0, 1]}
    ),
    row=1, col=3
)

# Customers indicator
fig_scorecard.add_trace(
    go.Indicator(
        mode="number",
        value=total_customers,
        number={'valueformat': ",.0f", 'font': {'size': 40, 'color': COLORS['primary']}},
        domain={'x': [0, 1], 'y': [0, 1]}
    ),
    row=1, col=4
)

fig_scorecard.update_layout(
    title=dict(
        text='<b>Executive Summary Dashboard</b><br><sub>Level 1: 3-Second Glance (High-Level KPIs)</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    height=250,
    paper_bgcolor='white',
    margin=dict(t=80, b=20, l=40, r=40)
)

fig_scorecard.show()

print("Level 1 - High-Level Summary (3 seconds):")
print("  ✓ Four key metrics at a glance")
print("  ✓ Large numbers for instant comprehension")
print("  ✓ Delta indicators show change direction")
print("  ✓ Gauge shows performance vs. target")
print("  → User decides if they need more detail")
print()
print("Level 2 would show: Trend sparklines and top products (10 seconds)")
print("Level 3 would show: Interactive charts with filters (30+ seconds)")
print("Level 4 would show: Full data exploration tools (deep dive)")
print()
print("=" * 80)

✅ DATA STORYTELLING: Complete Three-Act Narrative

STORY: 'The Product B Performance Crisis and Recovery Plan'

This visualization tells a complete story using the three-act structure:
  Act 1: Setup (Product B was our star performer)
  Act 2: Conflict (Sudden decline in Q3 2023)
  Act 3: Resolution (Recovery strategy with projections)



Narrative Elements:

ACT 1 - SETUP (Gray shaded region):
  • Shows 18 months of stable performance
  • Baseline reference line establishes 'normal'
  • Muted colors indicate this is historical context

ACT 2 - CONFLICT (Red shaded region):
  • Dramatic color change (red) signals problem
  • X markers emphasize each declining month
  • Annotation quantifies the impact ($ and %)
  • Visual weight (thicker line) draws attention

ACT 3 - RESOLUTION (Green shaded region):
  • Dashed line indicates projection/forecast
  • Green color signals positive outcome
  • Shaded ±15% band signals uncertainty (illustrative, not a statistical CI)
  • Target annotation sets clear goal

Story Message:
  'Product B experienced a crisis in Q3 2023, but with our recovery
   plan, we expect to return to growth by mid-2024.'

--------------------------------------------------------------------------------

EXAMPLE 2: Progressive Disclosure - Layered Information

Showing information in stages based on user engagement:



Level 1 - High-Level Summary (3 seconds):
  ✓ Four key metrics at a glance
  ✓ Large numbers for instant comprehension
  ✓ Delta indicators show change direction
  ✓ Gauge shows performance vs. target
  → User decides if they need more detail

Level 2 would show: Trend sparklines and top products (10 seconds)
Level 3 would show: Interactive charts with filters (30+ seconds)
Level 4 would show: Full data exploration tools (deep dive)
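The Crisis Point annotation comes down to three numbers: the Act 1 baseline, the worst Act 2 month, and the gap between them. Below is a stdlib-only sketch of that arithmetic. It assumes the monthly totals are already available as plain lists, and `crisis_metrics` is a hypothetical helper. The shortfall is reported as a positive percentage, which avoids a label such as "-30% below baseline" that reads as a double negative.

```python
from statistics import mean


def crisis_metrics(baseline_sales, decline_sales):
    """Summarize a decline relative to a historical baseline.

    baseline_sales: monthly sales from the "normal" period (Act 1)
    decline_sales:  monthly sales from the problem period (Act 2)
    """
    baseline = mean(baseline_sales)
    worst = min(decline_sales)
    gap = baseline - worst              # positive when the worst month is below baseline
    pct_below = gap / baseline * 100    # positive percentage for "X% below baseline"
    return {"baseline": baseline, "worst": worst, "gap": gap, "pct_below": pct_below}


# Toy numbers for illustration, not the tutorial's dataset.
m = crisis_metrics([100_000, 110_000, 90_000], [95_000, 70_000, 80_000])
label = f"{m['pct_below']:.0f}% below baseline, ${m['gap'] / 1000:.0f}K monthly gap"
print(label)  # 30% below baseline, $30K monthly gap
```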



## Part 8: Animation - Adding the Dimension of Time

### 8.1 When to Use Animation

**Good Use Cases:**
- ✅ Showing change over time (primary use)
- ✅ Transitions between states (before/after)
- ✅ Revealing information progressively
- ✅ Demonstrating processes or flows

**Bad Use Cases:**
- ❌ Making static data "more interesting" (chartjunk)
- ❌ Replacing small multiples (animation harder to compare)
- ❌ Very rapid changes (<200ms per frame)
- ❌ Purely decorative motion

### 8.2 Animation Principles (Borrowed from Film)

**Timing and Spacing:**
- Slow in, slow out (ease-in-out easing)
- Duration: 500-1500ms for most transitions
- Faster for small changes, slower for large changes
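"Slow in, slow out" describes an easing curve: progress accelerates at the start of a transition and decelerates at the end. The sketch below implements the cubic ease-in-out curve, which belongs to the same family Plotly exposes as the `'cubic-in-out'` transition easing. `ease_in_out_cubic` and `duration_for` are hypothetical helpers. Scaling the duration linearly with the size of the change is one possible way to apply "faster for small changes, slower for large changes".

```python
def ease_in_out_cubic(t):
    """Map linear progress t in [0, 1] to eased progress in [0, 1]."""
    if t < 0.5:
        return 4 * t ** 3                    # slow in: accelerate from rest
    return 1 - (-2 * t + 2) ** 3 / 2         # slow out: decelerate into the target


def duration_for(change_fraction, min_ms=500, max_ms=1500):
    """Scale duration with the size of the change (0 = tiny, 1 = full range)."""
    change_fraction = max(0.0, min(1.0, change_fraction))
    return round(min_ms + (max_ms - min_ms) * change_fraction)


# Equal steps in time give unequal steps in position: small near both ends.
print([ease_in_out_cubic(i / 4) for i in range(5)])  # [0.0, 0.0625, 0.5, 0.9375, 1.0]
print(duration_for(0.5))                             # 1000
```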

**Anticipation:**
- Small movement in opposite direction before main action
- Helps viewer track what will happen
- Example: Element shrinks slightly before expanding

**Staging:**
- Animate one thing at a time when possible
- Use staggered delays for multiple elements
- Don't compete for attention
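Staggering can be written as a simple schedule. Every element gets the same duration, and each one starts a fixed offset after the previous one. The following is a minimal sketch. `stagger_schedule` is a hypothetical helper, and its 600 ms / 100 ms defaults are assumptions chosen to fall inside the 500-1500 ms range above.

```python
def stagger_schedule(n_elements, duration_ms=600, stagger_ms=100):
    """Return (start_ms, end_ms) for each element, staggering start times.

    Every element animates for the same duration; element i starts stagger_ms
    after element i - 1, so motion ripples through the group instead of
    everything moving at once.
    """
    return [(i * stagger_ms, i * stagger_ms + duration_ms) for i in range(n_elements)]


schedule = stagger_schedule(4)
print(schedule)  # [(0, 600), (100, 700), (200, 800), (300, 900)]
```

Keep the stagger short relative to the duration. With many elements, the total time (`(n - 1) * stagger_ms + duration_ms`) grows quickly.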

**Follow-Through:**
- Elements don't stop instantly
- Add slight overshoot and settle
- Creates natural feel
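Follow-through is commonly implemented with an "ease-out-back" curve, where progress passes slightly beyond the target and then settles. The sketch below uses the conventional overshoot constant 1.70158 from Robert Penner's easing equations. `ease_out_back` is a hypothetical helper. Plotly's `transition.easing` also accepts `'back-out'`, so in practice you would normally set that rather than compute the curve yourself.

```python
C1 = 1.70158   # conventional overshoot constant (roughly 10% overshoot)
C3 = C1 + 1


def ease_out_back(t):
    """Eased progress that overshoots 1.0, then settles at exactly 1.0 when t = 1."""
    u = t - 1
    return 1 + C3 * u ** 3 + C1 * u ** 2


curve = [ease_out_back(i / 10) for i in range(11)]
peak = max(curve)
print(f"peak {peak:.3f} at t={curve.index(peak) / 10}")  # peak 1.099 at t=0.6
```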

### 8.3 Animated Chart Types

**Animated Bar Chart Race:**
- Shows ranking changes over time
- Bars grow/shrink and reorder
- Best for: Competition narratives (top products, countries)
- Duration: 200-500ms per time step
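A bar chart race depends on one computation per time step: sorting items by value to get their rank. The sketch below is stdlib-only and uses toy numbers that are not from the tutorial's dataset. `rank_frames` is a hypothetical helper. It assumes period labels such as `"2023 Q2"` sort chronologically as strings.

```python
def rank_frames(frames):
    """Turn {period: {item: value}} into {period: [(rank, item, value), ...]}.

    Items are sorted by value (largest first) within each period; ties are
    broken alphabetically so the order is stable from frame to frame.
    Periods are sorted as strings, which assumes labels like "2023 Q2".
    """
    ranked = {}
    for period in sorted(frames):
        items = sorted(frames[period].items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[period] = [(rank, item, value)
                          for rank, (item, value) in enumerate(items, start=1)]
    return ranked


sales = {
    "2023 Q2": {"Product A": 410, "Product B": 520, "Product C": 300},
    "2023 Q3": {"Product A": 450, "Product B": 380, "Product C": 310},
}
for period, rows in rank_frames(sales).items():
    print(period, [item for _, item, _ in rows])
# 2023 Q2 ['Product B', 'Product A', 'Product C']
# 2023 Q3 ['Product A', 'Product B', 'Product C']
```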

**Animated Scatter Plot:**
- Points move through 2D space
- Size/color can also change
- Best for: Showing relationships evolving (Gapminder-style)
- Include: Play/pause, speed control, time indicator

**Animated Geographic Map:**
- Choropleth colors change over time
- Points appear/disappear
- Best for: Spatial-temporal patterns (disease spread, sales expansion)

**Animated Line Chart:**
- Line draws from left to right
- Best for: Revealing trends progressively
- Include: Pause at key points
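A left-to-right line reveal can be built from cumulative frames, where frame k holds the first k + 1 points. The following is a minimal sketch. `progressive_line_frames` and its `pause_at` option are hypothetical. Repeating a frame is one simple way to hold playback on a key point without custom player logic. Each dict could then serve as the data for one Plotly frame, for example `go.Frame(data=[go.Scatter(**d)])`.

```python
def progressive_line_frames(xs, ys, pause_at=()):
    """Build frames for a left-to-right line reveal.

    Frame k contains the first k + 1 points. Indices listed in pause_at are
    repeated twice more so playback lingers on those points.
    """
    frames = []
    for k in range(len(xs)):
        frame = {"x": list(xs[: k + 1]), "y": list(ys[: k + 1])}
        frames.append(frame)
        if k in pause_at:
            frames.extend([frame] * 2)   # hold this frame for two extra steps
    return frames


frames = progressive_line_frames(["Jan", "Feb", "Mar"], [10, 12, 9], pause_at={1})
print(len(frames))        # 5: three reveal steps plus two held copies of "Feb"
print(frames[-1]["x"])    # ['Jan', 'Feb', 'Mar']
```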

### 8.4 Accessibility Considerations

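**Motion Sensitivity:**
- Respect the user's `prefers-reduced-motion` setting (the CSS rule is shown in Section 9.5)
- Provide play/pause controls instead of auto-playing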

In [7]:
# Cell 8: Creating animated visualizations for temporal storytelling

print("✅ ANIMATED VISUALIZATIONS: Adding Time Dimension")
print("=" * 80)
print()

# Example 1: Animated bar chart showing quarterly progression
print("EXAMPLE 1: Animated Bar Chart - Quarterly Sales Race")
print()
print("Shows how product rankings change over time")
print("Animation reveals patterns that static charts miss")
print()

# Prepare data for animation
quarterly_totals = df.groupby(['quarter_label', 'product'])['sales'].sum().reset_index()
quarterly_totals = quarterly_totals.sort_values(['quarter_label', 'sales'], ascending=[True, False])

# Get all quarters in order
quarters = sorted(df['quarter_label'].unique())

# Create animated bar chart
fig_animated_bars = px.bar(
    quarterly_totals,
    x='sales',
    y='product',
    animation_frame='quarter_label',
    orientation='h',
    color='product',
    color_discrete_map=product_colors,
    range_x=[0, quarterly_totals['sales'].max() * 1.1],
    labels={'sales': 'Quarterly Sales ($)', 'product': ''},
    text='sales'
)

# Update text format
fig_animated_bars.update_traces(
    texttemplate='$%{text:,.0f}',
    textposition='outside',
    textfont_size=12
)

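# Improve animation settings (px puts the Play button at updatemenus[0].buttons[0])
fig_animated_bars.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 800  # 800ms per frame
fig_animated_bars.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 400  # 400ms transition
fig_animated_bars.layout.updatemenus[0].buttons[0].args[1]['transition']['easing'] = 'cubic-in-out'  # slow in, slow out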

# Update layout
fig_animated_bars.update_layout(
    title=dict(
        text='<b>Quarterly Sales Race: Watch Rankings Change</b><br>' +
             '<sub>Press Play to see how products compete quarter by quarter</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    xaxis=dict(
        showgrid=True,
        gridcolor=COLORS['light_gray'],
        showticklabels=False
    ),
    yaxis=dict(
        showgrid=False,
        categoryorder='total ascending'  # Keep bars sorted
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=450,
    showlegend=False,
    font=dict(family='Arial', size=12)
)

fig_animated_bars.show()

print("Animation Features:")
print("  ✓ Bars grow and shrink smoothly (eased animation)")
print("  ✓ Rankings reorder as values change")
print("  ✓ Play/pause controls for user control")
print("  ✓ Frame indicator shows current quarter")
print()
print("Insights Revealed by Animation:")
print("  → Product B starts strong but declines in 2023 Q3-Q4")
print("  → Product A shows consistent growth")
print("  → Ranking changes highlight competitive dynamics")
print()
print("-" * 80)
print()

# Example 2: Animated scatter plot (Gapminder-style)
print("EXAMPLE 2: Animated Scatter Plot - Evolution Over Time")
print()
print("Shows how products move through performance space")
print("Inspired by Hans Rosling's Gapminder visualizations")
print()

# Prepare data for animated scatter
scatter_data = df.groupby(['quarter_label', 'product', 'region']).agg({
    'sales': 'sum',
    'profit': 'sum',
    'customers': 'sum',
    'profit_margin': 'mean'
}).reset_index()

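scatter_data['sales_k'] = scatter_data['sales'] / 1000
# There is one bubble per product-region pair, so the animation key must combine both;
# keying on product alone gives several rows the same identity in each frame
scatter_data['product_region'] = scatter_data['product'] + ' | ' + scatter_data['region']

# Create animated scatter plot
fig_animated_scatter = px.scatter(
    scatter_data,
    x='sales_k',
    y='profit_margin',
    animation_frame='quarter_label',
    animation_group='product_region',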
    size='customers',
    color='product',
    hover_name='product',
    hover_data={'region': True, 'sales_k': ':.1f', 'profit_margin': ':.1f', 'customers': ':.0f'},
    size_max=50,
    color_discrete_map=product_colors,
    range_x=[0, scatter_data['sales_k'].max() * 1.1],
    range_y=[scatter_data['profit_margin'].min() * 0.9, scatter_data['profit_margin'].max() * 1.1],
    labels={'sales_k': 'Quarterly Sales ($K)', 'profit_margin': 'Profit Margin (%)'}
)

# Add quadrant lines
median_sales = scatter_data['sales_k'].median()
median_margin = scatter_data['profit_margin'].median()

fig_animated_scatter.add_vline(
    x=median_sales,
    line_dash="dash",
    line_color=COLORS['neutral'],
    line_width=1,
    opacity=0.4
)

fig_animated_scatter.add_hline(
    y=median_margin,
    line_dash="dash",
    line_color=COLORS['neutral'],
    line_width=1,
    opacity=0.4
)

# Improve animation
fig_animated_scatter.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000
fig_animated_scatter.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 500

fig_animated_scatter.update_layout(
    title=dict(
        text='<b>Product Performance Evolution: Sales vs. Profit Margin</b><br>' +
             '<sub>Bubble size = Customer count | Dashed lines = Medians | Watch products move through quadrants</sub>',
        font=dict(size=16, color=COLORS['dark_gray'], family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    xaxis=dict(
        showgrid=True,
        gridcolor=COLORS['light_gray']
    ),
    yaxis=dict(
        showgrid=True,
        gridcolor=COLORS['light_gray']
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=550,
    hovermode='closest'
)

fig_animated_scatter.show()

print("Animation Features:")
print("  ✓ Points move smoothly through 2D space")
print("  ✓ Size changes show customer growth/decline")
print("  ✗ No built-in trails (Plotly has no trail mode; add path traces manually)")
print("  ✓ Quadrant lines provide strategic context")
print()
print("Insights Revealed by Animation:")
print("  → Product B moves from high-margin to low-margin (crisis)")
print("  → Product A steadily increases sales with stable margin")
print("  → Customer count (bubble size) correlates with sales")
print("  → Some products oscillate between quadrants (seasonal?)")
print()
print("=" * 80)

✅ ANIMATED VISUALIZATIONS: Adding Time Dimension

EXAMPLE 1: Animated Bar Chart - Quarterly Sales Race

Shows how product rankings change over time
Animation reveals patterns that static charts miss



Animation Features:
  ✓ Bars grow and shrink smoothly (eased animation)
  ✓ Rankings reorder as values change
  ✓ Play/pause controls for user control
  ✓ Frame indicator shows current quarter

Insights Revealed by Animation:
  → Product B starts strong but declines in 2023 Q3-Q4
  → Product A shows consistent growth
  → Ranking changes highlight competitive dynamics

--------------------------------------------------------------------------------

EXAMPLE 2: Animated Scatter Plot - Evolution Over Time

Shows how products move through performance space
Inspired by Hans Rosling's Gapminder visualizations



Animation Features:
  ✓ Points move smoothly through 2D space
  ✓ Size changes show customer growth/decline
  ✗ No built-in trails (Plotly has no trail mode; add path traces manually)
  ✓ Quadrant lines provide strategic context

Insights Revealed by Animation:
  → Product B moves from high-margin to low-margin (crisis)
  → Product A steadily increases sales with stable margin
  → Customer count (bubble size) correlates with sales
  → Some products oscillate between quadrants (seasonal?)
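The quadrant readings above, such as "moves from high-margin to low-margin", can be made explicit by labeling each point relative to the median lines. The sketch below is stdlib-only and uses toy values that are not from the tutorial's dataset. `quadrant` and `quadrant_path` are hypothetical helpers. Counting a point that lies exactly on a median as "high" is an arbitrary convention. In the notebook, each product has one bubble per region, so a path would be tracked per product-region pair.

```python
from statistics import median


def quadrant(sales, margin, sales_median, margin_median):
    """Label a point by which side of each median line it falls on."""
    s = "High sales" if sales >= sales_median else "Low sales"
    m = "high margin" if margin >= margin_median else "low margin"
    return f"{s} / {m}"


def quadrant_path(points, sales_median, margin_median):
    """Quadrant labels for one entity across frames (e.g. quarters)."""
    return [quadrant(s, m, sales_median, margin_median) for s, m in points]


# Toy (sales $K, margin %) points for one product-region over three quarters.
path = [(120, 28.0), (115, 19.0), (90, 15.0)]
all_points = path + [(80, 25.0), (100, 20.0), (130, 18.0)]
sales_med = median(s for s, _ in all_points)    # 107.5
margin_med = median(m for _, m in all_points)   # 19.5
print(quadrant_path(path, sales_med, margin_med))
# ['High sales / high margin', 'High sales / low margin', 'Low sales / low margin']
```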



In [8]:
# Cell 9: Dashboard layout principles and information hierarchy

print("✅ DASHBOARD LAYOUT: Information Architecture")
print("=" * 80)
print()

print("PRINCIPLE: F-Pattern Layout (Eye-Tracking Research)")
print()
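print("Eye-tracking studies of text-heavy web pages (Nielsen Norman Group) found")
print("that users often scan in an F-shaped pattern:")
print("  1. Horizontal scan across the top (title, key metrics)")
print("  2. Shorter horizontal scan a little further down (primary charts)")
print("  3. Vertical scan down the left side (labels, first column)")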
print()
print("Optimize layout by placing most important info in this pattern")
print()

# Create a professional dashboard layout demonstrating best practices
fig_layout = make_subplots(
    rows=3, cols=3,
    row_heights=[0.15, 0.45, 0.40],
    column_widths=[0.33, 0.33, 0.34],
    subplot_titles=(
        '', '', '',  # Row 1: KPI cards (titles added manually)
        '<b>Sales Trend</b>', '<b>Product Mix</b>', '<b>Regional Performance</b>',
        '<b>Top Products</b>', '<b>Key Insights</b>'  # Row 3: None cells take no title slot
    ),
    specs=[
        [{"type": "indicator"}, {"type": "indicator"}, {"type": "indicator"}],
        [{"type": "scatter"}, {"type": "pie"}, {"type": "bar"}],
        [{"type": "bar"}, {"type": "table", "colspan": 2}, None]
    ],
    vertical_spacing=0.12,
    horizontal_spacing=0.10
)

# ROW 1: KPI Cards (Top - F-Pattern horizontal scan)
# Current vs previous year
curr_sales = df[df['year'] == 2023]['sales'].sum()
prev_sales = df[df['year'] == 2022]['sales'].sum()

fig_layout.add_trace(
    go.Indicator(
        mode="number+delta",
        value=curr_sales,
        title={'text': "<b>Total Sales</b><br><span style='font-size:0.8em;color:gray'>2023 YTD</span>"},
        number={'prefix': "$", 'valueformat': ",.0f", 'font': {'size': 32, 'color': COLORS['primary']}},
        delta={'reference': prev_sales, 'relative': True, 'valueformat': ".1%",
               'increasing': {'color': COLORS['success']}, 'decreasing': {'color': COLORS['danger']}}
    ),
    row=1, col=1
)

# Profit margin
curr_margin = df[df['year'] == 2023]['profit_margin'].mean()
prev_margin = df[df['year'] == 2022]['profit_margin'].mean()

fig_layout.add_trace(
    go.Indicator(
        mode="number+delta",
        value=curr_margin,
        title={'text': "<b>Profit Margin</b><br><span style='font-size:0.8em;color:gray'>Average %</span>"},
        number={'suffix': "%", 'valueformat': ".1f", 'font': {'size': 32, 'color': COLORS['primary']}},
        delta={'reference': prev_margin, 'valueformat': ".1f",
               'increasing': {'color': COLORS['success']}, 'decreasing': {'color': COLORS['danger']}}
    ),
    row=1, col=2
)

# Customer count
curr_customers = df[df['year'] == 2023]['customers'].sum()
prev_customers = df[df['year'] == 2022]['customers'].sum()

fig_layout.add_trace(
    go.Indicator(
        mode="number+delta",
        value=curr_customers,
        title={'text': "<b>Total Customers</b><br><span style='font-size:0.8em;color:gray'>2023 YTD</span>"},
        number={'valueformat': ",.0f", 'font': {'size': 32, 'color': COLORS['primary']}},
        delta={'reference': prev_customers, 'relative': True, 'valueformat': ".1%",
               'increasing': {'color': COLORS['success']}, 'decreasing': {'color': COLORS['danger']}}
    ),
    row=1, col=3
)

# ROW 2: Main Visualizations (Middle - F-Pattern horizontal scan)

# Chart 1: Sales trend (time series)
df_2023 = df[df['year'] == 2023]
monthly_2023 = df_2023.groupby(df_2023['date'].dt.to_period('M'))['sales'].sum().reset_index()
monthly_2023['date'] = monthly_2023['date'].dt.to_timestamp()

fig_layout.add_trace(
    go.Scatter(
        x=monthly_2023['date'],
        y=monthly_2023['sales'],
        mode='lines+markers',
        line=dict(color=COLORS['primary'], width=3),
        marker=dict(size=8, color=COLORS['accent']),
        fill='tozeroy',
        fillcolor='rgba(46, 134, 171, 0.1)',
        name='Monthly Sales',
        showlegend=False
    ),
    row=2, col=1
)

# Chart 2: Product mix (pie chart)
product_mix_2023 = df[df['year'] == 2023].groupby('product')['sales'].sum()

fig_layout.add_trace(
    go.Pie(
        labels=product_mix_2023.index,
        values=product_mix_2023.values,
        marker=dict(colors=[product_colors[p] for p in product_mix_2023.index]),
        textinfo='label+percent',
        textfont=dict(size=10),
        showlegend=False
    ),
    row=2, col=2
)

# Chart 3: Regional performance (bar chart)
regional_2023 = df[df['year'] == 2023].groupby('region')['sales'].sum().sort_values()

fig_layout.add_trace(
    go.Bar(
        x=regional_2023.values,
        y=regional_2023.index,
        orientation='h',
        marker=dict(color=COLORS['primary']),
        text=[f'${v/1000:.0f}K' for v in regional_2023.values],
        textposition='outside',
        showlegend=False
    ),
    row=2, col=3
)

# ROW 3: Supporting Details (Bottom)

# Chart 4: Top products (sorted bar)
top_products = df[df['year'] == 2023].groupby('product')['sales'].sum().nlargest(5).sort_values()

fig_layout.add_trace(
    go.Bar(
        x=top_products.values,
        y=top_products.index,
        orientation='h',
        marker=dict(color=[product_colors[p] for p in top_products.index]),
        text=[f'${v/1000:.0f}K' for v in top_products.values],
        textposition='outside',
        showlegend=False
    ),
    row=3, col=1
)

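# Chart 5: Key insights table
# NOTE: only the first row is computed; the other rows are hardcoded example
# text and should be derived from the data in a production dashboard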
insights_data = [
    ['🎯', '<b>Best Performer</b>', product_mix_2023.idxmax(), f'${product_mix_2023.max()/1000:.0f}K sales'],
    ['📈', '<b>Fastest Growth</b>', 'Product A', '+23% YoY'],
    ['⚠️', '<b>Needs Attention</b>', 'Product B', 'Q3 decline'],
    ['💡', '<b>Opportunity</b>', 'West Region', 'Underserved market']
]

fig_layout.add_trace(
    go.Table(
        header=dict(
            values=['', '<b>Metric</b>', '<b>Item</b>', '<b>Details</b>'],
            fill_color=COLORS['light_gray'],
            align='left',
            font=dict(size=12, color=COLORS['dark_gray'])
        ),
        cells=dict(
            values=list(zip(*insights_data)),
            fill_color='white',
            align='left',
            font=dict(size=11),
            height=30
        )
    ),
    row=3, col=2
)

# Update layout
fig_layout.update_xaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=2, col=1)
fig_layout.update_yaxes(showgrid=True, gridcolor=COLORS['light_gray'], row=2, col=1)
fig_layout.update_xaxes(showgrid=False, row=2, col=3)
fig_layout.update_xaxes(showgrid=False, row=3, col=1)

fig_layout.update_layout(
    title=dict(
        text='<b>2023 Business Performance Dashboard</b><br>' +
             '<sub>Designed using F-pattern layout principles | Most important information in top-left quadrant</sub>',
        font=dict(size=18, color=COLORS['dark_gray'], family='Arial'),
        x=0.5,
        xanchor='center',
        y=0.98,
        yanchor='top'
    ),
    height=900,
    showlegend=False,
    plot_bgcolor='white',
    paper_bgcolor='white',
    margin=dict(t=100, b=40, l=60, r=60)
)

fig_layout.show()

print("Layout Best Practices Applied:")
print()
print("1. TOP ROW (KPI Cards):")
print("   ✓ Large numbers for instant recognition")
print("   ✓ Delta indicators show change")
print("   ✓ Color-coded (green=good, red=bad)")
print("   → Answers 'How are we doing?' in 3 seconds")
print()
print("2. MIDDLE ROW (Primary Charts):")
print("   ✓ Chart type matched to each question (trend, share, comparison)")
print("   ✓ Equal visual weight (balanced composition)")
print("   ✓ Clear titles and labels")
print("   → Answers 'What are the trends?' in 10 seconds")
print()
print("3. BOTTOM ROW (Details):")
print("   ✓ Ranked list highlights top performers")
print("   ✓ Insights table provides actionable takeaways")
print("   ✓ Less visual emphasis (smaller, bottom position)")
print("   → Answers 'What should I do?' in 30 seconds")
print()
print("Information Hierarchy:")
print("  Primary: KPIs (largest, top, color)")
print("  Secondary: Trend charts (middle, moderate size)")
print("  Tertiary: Details (bottom, smallest, tables)")
print()
print("Whitespace Strategy:")
print("  • Generous margins (60px) frame the content")
print("  • Consistent spacing between panels (10-12%)")
print("  • No unnecessary borders or decorations")
print("  → Creates 'breathing room' and visual clarity")
print()
print("=" * 80)

✅ DASHBOARD LAYOUT: Information Architecture

PRINCIPLE: F-Pattern Layout (Eye-Tracking Research)

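Eye-tracking studies of text-heavy web pages (Nielsen Norman Group) found
that users often scan in an F-shaped pattern:
  1. Horizontal scan across the top (title, key metrics)
  2. Shorter horizontal scan a little further down (primary charts)
  3. Vertical scan down the left side (labels, first column)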

Optimize layout by placing most important info in this pattern



Layout Best Practices Applied:

1. TOP ROW (KPI Cards):
   ✓ Large numbers for instant recognition
   ✓ Delta indicators show change
   ✓ Color-coded (green=good, red=bad)
   → Answers 'How are we doing?' in 3 seconds

2. MIDDLE ROW (Primary Charts):
   ✓ Chart type matched to each question (trend, share, comparison)
   ✓ Equal visual weight (balanced composition)
   ✓ Clear titles and labels
   → Answers 'What are the trends?' in 10 seconds

3. BOTTOM ROW (Details):
   ✓ Ranked list highlights top performers
   ✓ Insights table provides actionable takeaways
   ✓ Less visual emphasis (smaller, bottom position)
   → Answers 'What should I do?' in 30 seconds

Information Hierarchy:
  Primary: KPIs (largest, top, color)
  Secondary: Trend charts (middle, moderate size)
  Tertiary: Details (bottom, smallest, tables)

Whitespace Strategy:
  • Generous margins (60px) frame the content
  • Consistent spacing between panels (10-12%)
  • No unnecessary borders or decorations
  → Creates 'breathing room' and visual clarity
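The row and column assignments in the dashboard above follow a simple rule: rank the panels by importance, then fill the grid from the top-left, row by row. The following is a minimal sketch of that rule. `f_pattern_layout` and the priority numbers are illustrative assumptions. The 1-based `(row, col)` keys it returns match the `row=`/`col=` arguments used with `add_trace` on a `make_subplots` figure. Real layouts also need spanning cells, such as the two-column table, which this sketch ignores.

```python
def f_pattern_layout(widgets, n_cols=3):
    """Place widgets into a grid in priority order (1 = most important).

    Filling row by row from the top-left puts the highest-priority items along
    the F's top bar first, and the first item of each row in the left column.
    Returns {(row, col): name} with 1-based indices.
    """
    ordered = sorted(widgets, key=lambda w: w[1])
    grid = {}
    for i, (name, _priority) in enumerate(ordered):
        grid[(i // n_cols + 1, i % n_cols + 1)] = name
    return grid


widgets = [("Regional bars", 6), ("Total sales KPI", 1), ("Margin KPI", 2),
           ("Customers KPI", 3), ("Sales trend", 4), ("Product mix", 5)]
layout = f_pattern_layout(widgets)
print(layout[(1, 1)], "|", layout[(2, 1)])  # Total sales KPI | Sales trend
```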

## Part 9: Accessibility - Designing for Everyone

### 9.1 Why Accessibility Matters

**Statistics:**
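- ~8% of males and ~0.5% of females (Northern European ancestry) have a color vision deficiency (colorblindness)
- ~15% of world population has some form of disability
- Legal and standards requirements: Section 508 (US); WCAG 2.1 (W3C standard referenced by many accessibility laws)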
- Business case: Accessible design benefits ALL users, not just those with disabilities

**Core Principle**: Universal Design
- Design for the widest possible audience from the start
- Don't treat accessibility as an afterthought
- "Curb cut effect": Accessibility features help everyone

### 9.2 Color Vision Deficiency (CVD)

**Types of Colorblindness:**

1. **Deuteranomaly / Deuteranopia (green-deficient, "deutan")** - Most common (~6% of males combined; deuteranomaly alone ~5%)
   - Difficulty distinguishing red from green
   - Both can appear brownish/yellow

2. **Protanomaly / Protanopia (red-deficient, "protan")** - Second most common (~2% of males combined)
   - Reds appear darker and less saturated, sometimes close to dark gray
   - Red-green confusion, as with deutan deficiencies

3. **Tritanopia (Blue-Blind)** - Rare (estimates around 0.01% or lower)
   - Blue and green confusion
   - Yellow and violet confusion

4. **Achromatopsia (Total Color Blindness)** - Very rare
   - See only shades of gray

**CVD-Safe Design Rules:**

✅ **DO:**
- Use color + another encoding (shape, pattern, label)
- Choose CVD-friendly palettes (ColorBrewer, Viridis)
- Test with CVD simulators
- Ensure at least 4.5:1 contrast for normal text (WCAG AA)

❌ **DON'T:**
- Rely only on red-green distinctions
- Use rainbow colorscales (Jet)
- Place red and green adjacent
- Use low-contrast color pairs
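The 4.5:1 figure is the WCAG contrast ratio, computed from the relative luminance of the two colors. Below is a stdlib sketch of the WCAG 2.x formula. The helper names are hypothetical. It handles only opaque `#RRGGBB` colors, so semi-transparent fills such as the `rgba(..., 0.1)` backgrounds used earlier would need to be blended with their background first.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear light per channel.
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text=False):
    """WCAG AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#000000", "#FFFFFF"), 1))   # 21.0
print(passes_aa("#2E86AB", "#FFFFFF"))                  # False (about 4.1:1)
print(passes_aa("#2E86AB", "#FFFFFF", large_text=True)) # True
```

As the last two lines show, a mid-tone blue such as `#2E86AB` (used in the focus-outline snippet in Section 9.4) works for large text and graphics on white but falls short of AA for body text.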

**Recommended Palettes:**

**Qualitative (CVD-safe):**
- ColorBrewer "Set2", "Dark2"
- IBM Carbon color palette
- Okabe-Ito palette (designed for CVD)

**Sequential (CVD-safe):**
- Viridis, Plasma, Inferno (perceptually uniform)
- ColorBrewer "Blues", "Purples", "Greens"
- Single-hue gradients (lightness carries the signal; avoid rainbow-style multi-hue ramps)

**Diverging (CVD-safe):**
- Blue-Orange (instead of red-green)
- Purple-Green
- Pink-Green (colorblind-safe pairing)
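
To put "color + another encoding" into practice, you can assign each category a color and a marker shape together so no category depends on hue alone. Below is a minimal sketch. It assumes the Okabe-Ito hex values used later in this notebook and Plotly marker symbol names. `encoding_for` is a hypothetical helper, not a library function.

```python
# Okabe-Ito colors (black left out so it stays free for text and axes)
OKABE_ITO = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
             '#0072B2', '#D55E00', '#CC79A7']
# Plotly marker symbol names: a distinct shape backs up each color
SYMBOLS = ['circle', 'square', 'diamond', 'triangle-up',
           'x', 'star', 'cross']

def encoding_for(categories):
    """Map each category to a (color, symbol) pair, cycling if needed."""
    mapping = {}
    for i, cat in enumerate(categories):
        mapping[cat] = (OKABE_ITO[i % len(OKABE_ITO)], SYMBOLS[i % len(SYMBOLS)])
    return mapping

styles = encoding_for(['North', 'South', 'East', 'West'])
for region, (color, symbol) in styles.items():
    print(f"{region:6s} {color} {symbol}")
```

Pass each pair into a trace's `marker=dict(color=..., symbol=...)`. Beyond seven categories the pairs start repeating. At that point it is usually better to group categories or switch to direct labels.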

### 9.3 Screen Reader Accessibility

**ARIA Labels (Accessible Rich Internet Applications):**
```html
<!-- Bad: No context for screen readers -->
<div id="chart1"></div>

<!-- Good: Descriptive labels -->
<div id="chart1"
     role="img"
     aria-label="Bar chart showing sales by product. Product A leads with $500K, followed by Product B at $450K.">
</div>
```

**Alt Text Best Practices:**
- Start with chart type ("Bar chart showing...")
- Describe the insight, not every data point
- Keep under 150 characters for short description
- Provide long description link for complex charts
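
The alt-text rules above can be partly automated: lead with the chart type, state the headline comparison, and stay within the length budget. Below is a minimal sketch. `chart_alt_text` is a hypothetical helper, and the product figures are example values taken from the `aria-label` snippet above.

```python
def chart_alt_text(chart_type, measure, values, fmt=str, max_len=150):
    """Build a short alt text: chart type first, then the headline insight.

    `values` maps labels to numbers and needs at least two entries.
    """
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_val), (second, second_val) = ranked[0], ranked[1]
    text = (f"{chart_type} showing {measure}. {top} leads with {fmt(top_val)}, "
            f"followed by {second} at {fmt(second_val)}.")
    # Stay within the short-description budget; truncate with an ellipsis
    if len(text) > max_len:
        text = text[:max_len - 1] + "…"
    return text

alt = chart_alt_text(
    "Bar chart", "sales by product",
    {"Product A": 500_000, "Product B": 450_000, "Product C": 300_000},
    fmt=lambda v: f"${v / 1000:.0f}K",
)
print(alt)
```

If the text gets truncated, that usually means the chart also needs a linked long description.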

**Data Tables as Fallback:**
- Always provide data table alongside chart
- Use semantic HTML (`<table>`, `<th>`, `<caption>`)
- Allow keyboard navigation through table
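
If the chart data is already in pandas, `DataFrame.to_html` emits a semantic `<table>` with `<th>` header cells. It does not add a `<caption>`, so the sketch below inserts one by hand. The column names and numbers are example data only.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product C'],
    'Sales ($K)': [500, 450, 300],
})

# to_html emits <table>, <thead>, and <th> header cells
table_html = sales.to_html(index=False, border=0)

# Insert a <caption> right after the opening <table ...> tag
open_tag_end = table_html.index('>') + 1
table_html = (table_html[:open_tag_end]
              + '\n  <caption>Sales by product (example data)</caption>'
              + table_html[open_tag_end:])
print(table_html)
```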

### 9.4 Keyboard Navigation

**Requirements:**
- All interactive elements must be keyboard-accessible
- Logical tab order (left-to-right, top-to-bottom)
- Visible focus indicators (blue outline, increased size)
- Skip navigation links for long pages

**Keyboard Shortcuts:**
- `Tab`: Move to next interactive element
- `Shift + Tab`: Move to previous element
- `Enter/Space`: Activate button or link
- `Arrow keys`: Navigate within component

**Focus Management:**
```javascript
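// Ensure focus is visible: `element` is any focusable chart control
element.addEventListener('focus', (e) => {
  e.target.style.outline = '3px solid #2E86AB';
});
// Remove the outline again when focus moves elsewhere
element.addEventListener('blur', (e) => {
  e.target.style.outline = '';
});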
```

### 9.5 Motion Sensitivity

**Vestibular Disorders:**
- Motion can cause nausea, dizziness, migraines
- Vestibular dysfunction affects an estimated ~35% of US adults aged 40 and older

**Respect User Preferences:**
```css
@media (prefers-reduced-motion: reduce) {
  * {
    animation-duration: 0.01ms !important;
    animation-iteration-count: 1 !important;
    transition-duration: 0.01ms !important;
  }
}
```

**Animation Guidelines:**
- Provide play/pause controls
- Don't auto-play animations
- Keep duration under 5 seconds
- Allow disabling animations

### 9.6 Contrast and Typography

**WCAG Contrast Requirements:**
- **AA (Minimum)**: 4.5:1 for normal text, 3:1 for large text
- **AAA (Enhanced)**: 7:1 for normal text, 4.5:1 for large text
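
These ratios come from the WCAG 2.x relative-luminance formula. A contrast checker can be sketched in a few lines and used to verify the hex colors used in this notebook. `relative_luminance` and `contrast_ratio` are illustrative helper names.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB hex color like '#0F62FE'."""
    hex_color = hex_color.lstrip('#')
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear light per channel
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter color."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

for color in ['#212529', '#495057', '#0F62FE']:
    print(f"{color} on white: {contrast_ratio(color, '#FFFFFF'):.1f}:1")
```

Compare the result to 4.5:1 for normal text, 3:1 for large text, and 3:1 for chart elements such as lines and markers.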

**Typography Best Practices:**
- Minimum 12pt font size (16px)
- Line height: 1.5x font size
- Line length: 50-75 characters
- Use sans-serif fonts (easier to read on screen)
- Avoid all-caps (harder to read)
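
Plotly titles and annotations do not wrap on their own; line breaks have to be written as `<br>`. To keep long subtitles within the 50-75 character guideline, a small helper built on the standard library's `textwrap` can help. `wrap_for_plotly` is a hypothetical name, and the width of 60 is an assumption.

```python
import textwrap

def wrap_for_plotly(text, width=60):
    """Break text into lines of at most `width` characters joined by <br>."""
    # break_on_hyphens=False keeps words like "red-green" intact
    return '<br>'.join(textwrap.wrap(text, width=width, break_on_hyphens=False))

subtitle = ("Left panel uses a red-green palette that is hard to read with "
            "deuteranopia; right panel swaps in blue and orange.")
wrapped = wrap_for_plotly(subtitle, width=60)
print(wrapped)
```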

**Tools:**
- WebAIM Contrast Checker
- Colour Contrast Analyser (CCA)
- axe DevTools browser extension

### 9.7 Testing Your Visualizations

**Automated Testing:**
- axe DevTools (Chrome/Firefox extension)
- WAVE (Web Accessibility Evaluation Tool)
- Lighthouse (built into Chrome DevTools)

**Manual Testing:**
- CVD simulator (Coblis, Color Oracle)
- Keyboard-only navigation
- Screen reader testing (NVDA, JAWS, VoiceOver)
- Print in grayscale (checks if color is redundant)
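
You can approximate the grayscale print test before printing by converting each color to a gray level and flagging pairs that nearly collide. Below is a rough sketch. It uses Rec. 601 luma weights, the same weights Pillow uses for its 'L' grayscale mode. Real printers and drivers vary, and the `min_gap` of 20 gray levels is an arbitrary assumption.

```python
from itertools import combinations

def gray_level(hex_color):
    """Approximate printed gray level (0-255) using Rec. 601 luma weights."""
    h = hex_color.lstrip('#')
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

def too_close_in_gray(palette, min_gap=20):
    """List color pairs whose gray levels differ by less than `min_gap`."""
    return [(a, b) for a, b in combinations(palette, 2)
            if abs(gray_level(a) - gray_level(b)) < min_gap]

# The red-green palette used in Example 1 of the next cell
red_green = ['#d73027', '#fc8d59', '#fee090', '#91cf60', '#1a9850']
print(too_close_in_gray(red_green))
```

The flagged pairs are the matching red and green steps. A diverging scheme typically gives its two ends similar lightness, so without hue those colors collapse together. Shapes or labels have to carry that distinction.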

**Checklist:**
- [ ] Color contrast meets WCAG AA (4.5:1)
- [ ] Information doesn't rely solely on color
- [ ] All interactive elements keyboard-accessible
- [ ] Focus indicators visible
- [ ] ARIA labels present
- [ ] Alternative text provided
- [ ] Animations can be disabled
- [ ] Tested with CVD simulator

In [9]:
# Cell 10: Creating accessible visualizations with CVD-safe palettes

print("✅ ACCESSIBILITY: Designing Inclusive Visualizations")
print("=" * 80)
print()

# Define CVD-safe color palettes
print("Colorblind-Safe Palettes")
print("-" * 80)
print()

# Okabe-Ito palette (designed specifically for CVD)
okabe_ito = {
    'orange': '#E69F00',
    'sky_blue': '#56B4E9',
    'bluish_green': '#009E73',
    'yellow': '#F0E442',
    'blue': '#0072B2',
    'vermillion': '#D55E00',
    'reddish_purple': '#CC79A7',
    'black': '#000000'
}

# IBM Carbon palette (CVD-tested)
ibm_carbon = {
    'blue': '#0F62FE',
    'purple': '#8A3FFC',
    'cyan': '#1192E8',
    'teal': '#009D9A',
    'magenta': '#EE5396',
    'red': '#DA1E28',
    'orange': '#FF832B'
}

# ColorBrewer Set2 (qualitative; CVD-safe only for small numbers of classes)
colorbrewer_set2 = {
    'teal': '#66C2A5',
    'orange': '#FC8D62',
    'blue': '#8DA0CB',
    'pink': '#E78AC3',
    'green': '#A6D854',
    'yellow': '#FFD92F',
    'tan': '#E5C494',
    'gray': '#B3B3B3'
}

print("1. Okabe-Ito Palette (Scientifically designed for CVD)")
for name, color in okabe_ito.items():
    print(f"   {name:15s} {color}")
print()

print("2. IBM Carbon Palette (CVD-tested)")
for name, color in ibm_carbon.items():
    print(f"   {name:15s} {color}")
print()

print("3. ColorBrewer Set2 (CVD-safe)")
for name, color in colorbrewer_set2.items():
    print(f"   {name:15s} {color}")
print()
print("-" * 80)
print()

# Example 1: Compare bad vs good color choices
print("EXAMPLE 1: Red-Green vs CVD-Safe Comparison")
print()

# Create comparison data
comparison_data = pd.DataFrame({
    'Category': ['Category A', 'Category B', 'Category C', 'Category D', 'Category E'],
    'Value': [85, 92, 78, 95, 88]
})

# Create side-by-side comparison
fig_cvd_compare = make_subplots(
    rows=1, cols=2,
    subplot_titles=(
        '<b>❌ BAD: Red-Green Palette</b><br><sub>(Hard to tell apart for ~8% of males)</sub>',
        '<b>✅ GOOD: Blue-Orange Palette</b><br><sub>(CVD-safe alternative)</sub>'
    ),
    horizontal_spacing=0.15
)

# Bad example: Red-Green gradient
red_green_colors = ['#d73027', '#fc8d59', '#fee090', '#91cf60', '#1a9850']

fig_cvd_compare.add_trace(
    go.Bar(
        x=comparison_data['Category'],
        y=comparison_data['Value'],
        marker=dict(
            color=red_green_colors,
            line=dict(width=0)
        ),
        text=comparison_data['Value'],
        textposition='outside',
        showlegend=False,
        hovertemplate='%{x}<br>Value: %{y}<extra></extra>'
    ),
    row=1, col=1
)

# Good example: Blue-Orange (CVD-safe): dark-to-light blues, then light-to-dark oranges
blue_orange_colors = ['#08519c', '#3182bd', '#9ecae1', '#fdae6b', '#e6550d']

fig_cvd_compare.add_trace(
    go.Bar(
        x=comparison_data['Category'],
        y=comparison_data['Value'],
        marker=dict(
            color=blue_orange_colors,
            line=dict(width=0)
        ),
        text=comparison_data['Value'],
        textposition='outside',
        showlegend=False,
        hovertemplate='%{x}<br>Value: %{y}<extra></extra>'
    ),
    row=1, col=2
)

fig_cvd_compare.update_layout(
    title=dict(
        text='<b>Color Accessibility Comparison</b><br>' +
             '<sub>Left: Problematic for deuteranopia | Right: Distinguishable for common red-green CVD types</sub>',
        font=dict(size=16, color='#212529', family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    height=450,
    showlegend=False,
    plot_bgcolor='white',
    paper_bgcolor='white'
)

fig_cvd_compare.update_xaxes(showgrid=False)
fig_cvd_compare.update_yaxes(showgrid=True, gridcolor='#E9ECEF', range=[0, 100])

fig_cvd_compare.show()

print("Why Blue-Orange is Better:")
print("  ✓ Distinguishable for deuteranopia (green-blind)")
print("  ✓ Distinguishable for protanopia (red-blind)")
print("  ✓ Good contrast in grayscale")
print("  ✓ Perceptually uniform progression")
print()
print("-" * 80)
print()

# Example 2: Using multiple encodings (not just color)
print("EXAMPLE 2: Redundant Encoding - Color + Shape + Size + Label")
print()
print("Best Practice: Never rely ONLY on color")
print("Add shape, pattern, or text to ensure accessibility")
print()

# Create data for comparison
region_performance = df[df['year'] == 2023].groupby('region').agg({
    'sales': 'sum',
    'profit_margin': 'mean'
}).reset_index()

region_performance['status'] = pd.cut(
    region_performance['profit_margin'],
    bins=[0, 20, 25, 100],
    labels=['Needs Attention', 'Good', 'Excellent']
)

# Create scatter plot with multiple encodings
fig_redundant = go.Figure()

# Define shapes and colors for each status
status_config = {
    'Needs Attention': {'color': '#DA1E28', 'symbol': 'x', 'size': 12},
    'Good': {'color': '#FF832B', 'symbol': 'diamond', 'size': 14},
    'Excellent': {'color': '#009D9A', 'symbol': 'star', 'size': 18}
}

for status in status_config.keys():
    status_data = region_performance[region_performance['status'] == status]
    config = status_config[status]

    fig_redundant.add_trace(
        go.Scatter(
            x=status_data['sales'],
            y=status_data['profit_margin'],
            mode='markers+text',
            name=status,
            marker=dict(
                color=config['color'],
                size=config['size'],
                symbol=config['symbol'],
                line=dict(color='white', width=2)
            ),
            text=status_data['region'],
            textposition='top center',
            textfont=dict(size=11, color='#212529', family='Arial'),
            hovertemplate=(
                '<b>%{text}</b><br>' +
                f'Status: {status}<br>' +
                'Sales: $%{x:,.0f}<br>' +
                'Margin: %{y:.1f}%<extra></extra>'
            )
        )
    )

# Add reference lines
median_sales = region_performance['sales'].median()
median_margin = region_performance['profit_margin'].median()

fig_redundant.add_vline(x=median_sales, line_dash="dash", line_color='#6C757D', line_width=1, opacity=0.5)
fig_redundant.add_hline(y=median_margin, line_dash="dash", line_color='#6C757D', line_width=1, opacity=0.5)

fig_redundant.update_layout(
    title=dict(
        text='<b>Regional Performance: Multiple Encodings</b><br>' +
             '<sub>Color + Shape + Size + Text Label = Accessible to all users</sub>',
        font=dict(size=16, color='#212529', family='Arial'),
        x=0.5,
        xanchor='center'
    ),
    xaxis=dict(
        title='Total Sales ($)',
        showgrid=True,
        gridcolor='#E9ECEF'
    ),
    yaxis=dict(
        title='Profit Margin (%)',
        showgrid=True,
        gridcolor='#E9ECEF'
    ),
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=500,
    legend=dict(
        title='Performance Status',
        orientation='v',
        yanchor='top',
        y=1,
        xanchor='left',
        x=1.02,
        font=dict(size=11)
    )
)

fig_redundant.show()

print("Redundant Encoding Strategy:")
print("  1. Color: Red/Orange/Green (semantic meaning)")
print("  2. Shape: X/Diamond/Star (distinguishable even in grayscale)")
print("  3. Size: Larger = better performance")
print("  4. Text: Region name directly labeled")
print()
print("Benefits:")
print("  ✓ Colorblind users can use shape")
print("  ✓ Grayscale printing preserves information")
print("  ✓ Status is spelled out in the legend and hover text")
print("  ✓ Reduces cognitive load (multiple cues)")
print()
print("-" * 80)
print()

# Example 3: High contrast, readable typography
print("EXAMPLE 3: Typography and Contrast for Readability")
print()

# Create an accessible dashboard with proper contrast
sales_2023 = df[df['year'] == 2023]
monthly_summary = sales_2023.groupby(sales_2023['date'].dt.to_period('M')).agg({
    'sales': 'sum',
    'profit': 'sum',
    'customers': 'sum'
}).reset_index()
monthly_summary['date'] = monthly_summary['date'].dt.to_timestamp()

fig_accessible = go.Figure()

fig_accessible.add_trace(
    go.Scatter(
        x=monthly_summary['date'],
        y=monthly_summary['sales'],
        mode='lines+markers',
        line=dict(color='#0F62FE', width=4),  # IBM Blue, ≈5:1 contrast on white (passes WCAG AA)
        marker=dict(
            size=10,
            color='#0F62FE',
            line=dict(color='white', width=2)
        ),
        name='Sales',
        hovertemplate='<b>%{x|%B %Y}</b><br>Sales: $%{y:,.0f}<extra></extra>'
    )
)

fig_accessible.update_layout(
    title=dict(
        text='<b>2023 Monthly Sales Performance</b>',
        font=dict(
            size=20,  # Large, readable
            color='#212529',  # High contrast (≈15:1 on white)
            family='Arial, sans-serif'  # Web-safe, readable
        ),
        x=0,
        xanchor='left'
    ),
    xaxis=dict(
        title=dict(text='<b>Month</b>', font=dict(size=14, color='#212529')),
        showgrid=True,
        gridcolor='#DEE2E6',  # Light gray, subtle
        gridwidth=1,
        tickfont=dict(size=12, color='#495057')  # ≈8:1 contrast on white
    ),
    yaxis=dict(
        title=dict(text='<b>Sales ($)</b>', font=dict(size=14, color='#212529')),
        showgrid=True,
        gridcolor='#DEE2E6',
        gridwidth=1,
        tickfont=dict(size=12, color='#495057'),
        tickformat='$,.0f'
    ),
    plot_bgcolor='white',  # High contrast background
    paper_bgcolor='white',
    height=450,
    font=dict(family='Arial, sans-serif', size=12, color='#212529'),
    hovermode='x unified',
    hoverlabel=dict(
        bgcolor='white',
        font_size=13,
        font_family='Arial, sans-serif',
        bordercolor='#0F62FE'
    )
)

fig_accessible.show()

print("Accessibility Features:")
print("  ✓ Font size: 20px title, 14px axis titles, 12px tick labels (Plotly sizes are in px)")
print("  ✓ Color contrast: ≈15:1 text-to-background (exceeds WCAG AAA)")
print("  ✓ Line weight: 4px (thick enough for low vision)")
print("  ✓ Marker borders: White outline improves visibility")
print("  ✓ Sans-serif font: Arial (easier to read on screen)")
print("  ✓ Clear labels: Bolded for emphasis")
print()
print("WCAG Compliance:")
print("  ✓ AA: 4.5:1 contrast (minimum) - PASS")
print("  ✓ AAA: 7:1 contrast (enhanced) - PASS")
print("  ✓ Large text: 3:1 contrast - PASS")
print()
print("=" * 80)

✅ ACCESSIBILITY: Designing Inclusive Visualizations

Colorblind-Safe Palettes
--------------------------------------------------------------------------------

1. Okabe-Ito Palette (Scientifically designed for CVD)
   orange          #E69F00
   sky_blue        #56B4E9
   bluish_green    #009E73
   yellow          #F0E442
   blue            #0072B2
   vermillion      #D55E00
   reddish_purple  #CC79A7
   black           #000000

2. IBM Carbon Palette (CVD-tested)
   blue            #0F62FE
   purple          #8A3FFC
   cyan            #1192E8
   teal            #009D9A
   magenta         #EE5396
   red             #DA1E28
   orange          #FF832B

3. ColorBrewer Set2 (CVD-safe)
   teal            #66C2A5
   orange          #FC8D62
   blue            #8DA0CB
   pink            #E78AC3
   green           #A6D854
   yellow          #FFD92F
   tan             #E5C494
   gray            #B3B3B3

--------------------------------------------------------------------------------

EXAMPLE 1: Red-Green vs CVD-Safe Comparison

Why Blue-Orange is Better:
  ✓ Distinguishable for deuteranopia (green-blind)
  ✓ Distinguishable for protanopia (red-blind)
  ✓ Good contrast in grayscale
  ✓ Perceptually uniform progression

--------------------------------------------------------------------------------

EXAMPLE 2: Redundant Encoding - Color + Shape + Size + Label

Best Practice: Never rely ONLY on color
Add shape, pattern, or text to ensure accessibility



Redundant Encoding Strategy:
  1. Color: Red/Orange/Green (semantic meaning)
  2. Shape: X/Diamond/Star (distinguishable even in grayscale)
  3. Size: Larger = better performance
  4. Text: Region name directly labeled

Benefits:
  ✓ Colorblind users can use shape
  ✓ Grayscale printing preserves information
  ✓ Status is spelled out in the legend and hover text
  ✓ Reduces cognitive load (multiple cues)

--------------------------------------------------------------------------------

EXAMPLE 3: Typography and Contrast for Readability



Accessibility Features:
  ✓ Font size: 20px title, 14px axis titles, 12px tick labels (Plotly sizes are in px)
  ✓ Color contrast: ≈15:1 text-to-background (exceeds WCAG AAA)
  ✓ Line weight: 4px (thick enough for low vision)
  ✓ Marker borders: White outline improves visibility
  ✓ Sans-serif font: Arial (easier to read on screen)
  ✓ Clear labels: Bolded for emphasis

WCAG Compliance:
  ✓ AA: 4.5:1 contrast (minimum) - PASS
  ✓ AAA: 7:1 contrast (enhanced) - PASS
  ✓ Large text: 3:1 contrast - PASS



## Part 10: Performance Optimization for Large Datasets

### 10.1 The Performance Challenge

**Problem**: Modern dashboards deal with massive datasets
- IoT sensors: Millions of records per day
- Financial data: Tick-by-tick transactions
- Web analytics: Billions of page views

**User Expectations**:
- Initial load: <2 seconds
- Interactions: <100ms (feels instant)
- Animations: 60 FPS (16.67ms per frame)

### 10.2 Data Reduction Strategies

#### Strategy 1: Aggregation
**Principle**: Show summaries, not raw data
```python
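import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Bad: plot ~1.05M minute-level readings (two years) directly
idx = pd.date_range('2022-01-01', '2023-12-31 23:59', freq='min')
raw = pd.Series(np.random.randn(len(idx)).cumsum(), index=idx)
# fig = go.Figure(go.Scatter(x=raw.index, y=raw.values))  # ~1.05M points

# Good: aggregate to monthly means first
monthly = raw.resample('MS').mean()
fig = go.Figure(go.Scatter(x=monthly.index, y=monthly.values))  # 24 points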
```

**When to Aggregate**:
- >10,000 points in line chart
- >50,000 points in scatter plot
- Details not visible at screen resolution

#### Strategy 2: Sampling
**Principle**: Show representative subset

**Random Sampling**:
```python
# Show 10% of data
sample = df.sample(frac=0.1, random_state=42)
```

**Stratified Sampling** (Better):
```python
# Sample 10% within each category so every group keeps its share
sample = df.groupby('category').sample(frac=0.1, random_state=42)
```

**LTTB (Largest Triangle Three Buckets)**:
- Intelligent downsampling that preserves shape
- Maintains peaks and valleys
- Much better than random sampling for time series

#### Strategy 3: Progressive Loading
**Principle**: Load data in stages

1. **Initial**: Load aggregated overview (instant)
2. **On Demand**: Load details when user zooms
3. **Background**: Prefetch likely next views

**Implementation**:
```python
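# Load levels of detail. load_monthly_summary, load_daily_data and
# update_chart are placeholders for your own data-access/plotting code.
overview = load_monthly_summary()       # ~24 points, loaded up front

def on_zoom(date_range):
    # Fetch daily detail only for the zoomed-in window, on demand
    daily = load_daily_data(date_range)  # e.g. ~365 points for one year
    update_chart(daily)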
```
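
Levels of detail can also be precomputed once and served by window size. Below is a runnable sketch that keeps monthly, daily, and hourly aggregates in memory and returns the finest level whose slice fits a point budget. It assumes everything fits in memory; in a real dashboard the levels would more likely live server-side or in a database. `LevelOfDetail` and `max_points=1000` are hypothetical choices, not a library API.

```python
import numpy as np
import pandas as pd

class LevelOfDetail:
    """Precompute coarse-to-fine aggregates once; serve the finest level
    whose slice for the requested window fits within `max_points`."""

    RULES = ['MS', 'D', 'h']  # monthly, daily, hourly (coarse to fine)

    def __init__(self, series, max_points=1000):
        self.max_points = max_points
        self.levels = {rule: series.resample(rule).mean() for rule in self.RULES}
        self.levels['raw'] = series

    def view(self, start, end):
        best = self.levels['MS'].loc[start:end]  # coarsest level as fallback
        for rule in self.RULES[1:] + ['raw']:
            window = self.levels[rule].loc[start:end]
            if len(window) > self.max_points:
                break  # finer levels would be even larger
            best = window
        return best

idx = pd.date_range('2023-01-01', '2023-12-31 23:59', freq='min')
series = pd.Series(np.random.default_rng(0).normal(size=len(idx)).cumsum(), index=idx)
lod = LevelOfDetail(series)
print(len(lod.view('2023-01-01', '2023-12-31')))  # whole year
print(len(lod.view('2023-06-01', '2023-06-07')))  # one week
```

Zooming from a year to a week moves from daily to hourly points with no change to the calling code, which is the "overview first, details on demand" pattern.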

#### Strategy 4: Server-Side Processing
**Principle**: Don't send raw data to browser

**Architecture**:
```
Browser (Dash)  →  Request: "Show sales by region"
                ←  Response: Aggregated JSON (5KB)

Instead of:
Browser         ←  Full dataset CSV (50MB)
                   Client aggregates (slow!)
```
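
To make the payload difference concrete, the sketch below serializes a synthetic transaction table as the browser would receive it, then serializes the per-region summary the server would send instead. The region names, row count, and sales values are made-up example data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200_000
raw = pd.DataFrame({
    'region': rng.choice(['North', 'South', 'East', 'West'], size=n),
    'sales': rng.uniform(10, 500, size=n).round(2),
})

# What the browser would download without server-side processing
raw_payload = raw.to_json(orient='records')

# What the server sends instead: one row per region
summary = raw.groupby('region', as_index=False)['sales'].sum()
summary_payload = summary.to_json(orient='records')

print(f"Raw:        {len(raw_payload) / 1024**2:.1f} MB")
print(f"Aggregated: {len(summary_payload)} bytes")
```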

### 10.3 Rendering Optimizations

#### Technique 1: Canvas vs SVG
**SVG (Default in Plotly)**:
- ✓ Crisp at any zoom level
- ✓ Accessible (DOM elements)
- ✗ Slow with >5,000 points

**Canvas (WebGL)**:
- ✓ Fast with millions of points
- ✓ GPU-accelerated
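- ✗ Marks are pixels, not DOM elements (less accessible to assistive technology)
- ✗ Individual points can't be styled or inspected via CSS/DOM (Plotly still supports hover and selection)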

**When to Use Canvas**:
- Scatter plots with >10,000 points
- Heatmaps with >10,000 cells
- Real-time streaming data
```python
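import numpy as np
import plotly.graph_objects as go

# 200,000 random points: slow as SVG, fast with WebGL
rng = np.random.default_rng(42)
large_x, large_y = rng.normal(size=(2, 200_000))
# Enable WebGL in Plotly
fig = go.Figure(go.Scattergl(  # Note: Scattergl, not Scatter
    x=large_x, y=large_y, mode='markers'
))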
```

#### Technique 2: Virtualization
**Principle**: Only render visible elements

**For Tables**:
- Render 20 rows (visible viewport)
- As user scrolls, render new rows
- Destroy off-screen rows

**Libraries**:
- react-virtualized (React)
- dash-ag-grid (Dash)
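
The core of virtualization is small windowing arithmetic: from the scroll offset and row height, compute which rows intersect the viewport, plus a few "overscan" rows to avoid flicker. The sketch below shows that calculation. It is not the actual code of react-virtualized or AG Grid, and the fixed row height and `overscan=5` are assumptions.

```python
import math

def visible_rows(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return the (start, stop) row range a virtualized table should render.

    Only rows intersecting the viewport, plus `overscan` extra rows above
    and below, are materialized; stop is exclusive.
    """
    first = scroll_top // row_height
    last = math.ceil((scroll_top + viewport_height) / row_height)
    start = max(0, first - overscan)
    stop = min(total_rows, last + overscan)
    return start, stop

# 1M-row table, 30px rows, 600px viewport scrolled down to 45,000px
start, stop = visible_rows(45_000, 600, 30, 1_000_000)
print(f"Render rows {start}-{stop - 1} ({stop - start} of 1,000,000)")
```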

#### Technique 3: Debouncing/Throttling
**Principle**: Limit update frequency

**Debounce** (Execute after pause):
```javascript
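// Debounce: run fn only after `wait` ms pass with no new calls
const debounce = (fn, wait) => {
  let timer;
  return (...args) => { clearTimeout(timer); timer = setTimeout(() => fn(...args), wait); };
};
const onSearch = debounce(searchFunction, 300); // fires 300ms after typing stops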
```

**Throttle** (Execute at most once per interval):
```javascript
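// Throttle: run fn at most once per `interval` ms
const throttle = (fn, interval) => {
  let last = 0;
  return (...args) => { const now = Date.now(); if (now - last >= interval) { last = now; fn(...args); } };
};
const onScroll = throttle(scrollFunction, 100); // at most once per 100ms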
```

### 10.4 Memory Management

#### Issue: Memory Leaks
**Causes**:
- Unclosed connections
- Retained event listeners
- Large objects not garbage collected

**Solutions**:
```python
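import gc
from contextlib import closing

# Close connections even if the query raises.
# get_db_connection, connection.query and create_figure are placeholders
# for your own data-access and plotting code.
@app.callback(...)
def update_chart(*inputs):
    with closing(get_db_connection()) as connection:
        data = connection.query()
        return create_figure(data)

# Clear large objects: `del` drops the reference, and gc.collect()
# additionally reclaims objects caught in reference cycles
del large_dataframe
gc.collect()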
```

### 10.5 Caching Strategies

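#### In-Process Memoization (Flask-Caching)
```python
import time

from dash import Dash
from flask_caching import Cache

app = Dash(__name__)
# SimpleCache keeps results in the Dash server process's memory
cache = Cache(app.server, config={'CACHE_TYPE': 'SimpleCache'})

# Cache for 1 hour, keyed on the function's arguments
@cache.memoize(timeout=3600)
def expensive_computation(param):
    time.sleep(5)  # Simulated slow query
    return {'param': param}  # placeholder result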
```

#### Server-Side Caching (Redis)
```python
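import json

import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def get_sales(date, region):
    """Return query results, serving repeats from Redis for 1 hour."""
    key = f"sales_{date}_{region}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    data = expensive_query(date, region)  # placeholder for your real query
    r.setex(key, 3600, json.dumps(data))  # Cache 1 hour
    return data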
```

### 10.6 Profiling and Monitoring

**Tools**:
- Chrome DevTools (Network, Performance tabs)
- Dash Dev Tools (callback graph with timing, available when running with `debug=True`)
- Python cProfile
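
On the Python side, `cProfile` with `pstats` shows where a data-preparation step spends its time. The sketch below profiles a hypothetical `build_monthly_summary` function on synthetic minute-level data and keeps the five most expensive calls by cumulative time.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

def build_monthly_summary(n=500_000):
    idx = pd.date_range('2023-01-01', periods=n, freq='min')
    values = pd.Series(np.random.default_rng(1).normal(size=n), index=idx)
    return values.resample('MS').mean()

profiler = cProfile.Profile()
profiler.enable()
summary = build_monthly_summary()
profiler.disable()

# Report the five most expensive calls by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats('cumulative').print_stats(5)
report = buffer.getvalue()
print(report)
```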

**Metrics to Track**:
- Time to First Byte (TTFB): <200ms
- First Contentful Paint (FCP): <1s
- Time to Interactive (TTI): <3s
- Frame rate: 60 FPS

**Optimization Checklist**:
- [ ] Aggregate data before sending to client
- [ ] Limit SVG charts to <10,000 points (switch to WebGL beyond that)
- [ ] Use WebGL for large scatter plots
- [ ] Implement caching (server and client)
- [ ] Debounce user inputs
- [ ] Lazy load details on demand
- [ ] Compress API responses (gzip)
- [ ] Use CDN for static assets
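
JSON chart payloads repeat the same keys on every record, so they usually compress well. The sketch below gzips a synthetic 10,000-point response using only the standard library, roughly what a server does when the client sends `Accept-Encoding: gzip`. The data is random example data.

```python
import gzip
import json
import random

rng = random.Random(7)
# Example API response: 10,000 chart points serialized as JSON
points = [{'x': i, 'y': round(rng.gauss(100, 5), 2)} for i in range(10_000)]
body = json.dumps(points).encode('utf-8')

# Roughly what the server sends when the client accepts gzip encoding
compressed = gzip.compress(body)
ratio = len(compressed) / len(body)
print(f"JSON: {len(body):,} bytes -> gzip: {len(compressed):,} bytes ({ratio:.0%})")
```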

In [10]:
# Cell 11: Demonstrating performance optimization strategies

print("✅ PERFORMANCE OPTIMIZATION: Handling Large Datasets")
print("=" * 80)
print()

import time
from functools import lru_cache

# Generate large dataset for performance testing
print("Generating large dataset for performance testing...")
np.random.seed(42)

# Simulate 1 year of minute-by-minute data (~524K records)
large_date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='1min')
large_dataset = pd.DataFrame({
    'timestamp': large_date_range,
    'value': np.cumsum(np.random.randn(len(large_date_range))) + 100
})

print(f"✓ Created dataset with {len(large_dataset):,} records")
print(f"  Memory usage: {large_dataset.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
print()
print("-" * 80)
print()

# Strategy 1: Intelligent Aggregation
print("STRATEGY 1: Intelligent Aggregation")
print()
print("Problem: Plotting 500K+ points is slow and unnecessary")
print("Solution: Aggregate to appropriate time resolution")
print()

def benchmark_plotting(data, name):
    """Measure time to build the Plotly figure object (`name` is just a label;
    browser rendering time is not included)"""
    start = time.perf_counter()
    fig = go.Figure(go.Scatter(x=data['timestamp'], y=data['value'], mode='lines'))
    elapsed = time.perf_counter() - start
    return elapsed

# Test different aggregation levels
test_size = 50000  # Use subset for demo

raw_data = large_dataset.head(test_size).copy()
hourly_data = large_dataset.head(test_size).set_index('timestamp').resample('h').mean().reset_index()
daily_data = large_dataset.head(test_size).set_index('timestamp').resample('D').mean().reset_index()

print("Benchmark Results:")
print(f"  Raw data ({len(raw_data):,} points):     {benchmark_plotting(raw_data, 'raw'):.3f} seconds")
print(f"  Hourly ({len(hourly_data):,} points):    {benchmark_plotting(hourly_data, 'hourly'):.3f} seconds")
print(f"  Daily ({len(daily_data):,} points):      {benchmark_plotting(daily_data, 'daily'):.3f} seconds")
print()
print("Takeaway: Fewer points means far less work to build and render the figure")
print("          AND a clearer view of trends")
print()

# Visualize the difference
fig_aggregation = make_subplots(
    rows=3, cols=1,
    subplot_titles=(
        f'<b>Raw Data</b> ({len(raw_data):,} points) - Noisy, slow',
        f'<b>Hourly Aggregation</b> ({len(hourly_data):,} points) - Clearer',
        f'<b>Daily Aggregation</b> ({len(daily_data):,} points) - Fastest, clearest trends'
    ),
    vertical_spacing=0.1
)

fig_aggregation.add_trace(
    go.Scatter(x=raw_data['timestamp'], y=raw_data['value'],
               mode='lines', line=dict(color='#6C757D', width=0.5),
               name='Raw', showlegend=False),
    row=1, col=1
)

fig_aggregation.add_trace(
    go.Scatter(x=hourly_data['timestamp'], y=hourly_data['value'],
               mode='lines', line=dict(color='#0F62FE', width=1.5),
               name='Hourly', showlegend=False),
    row=2, col=1
)

fig_aggregation.add_trace(
    go.Scatter(x=daily_data['timestamp'], y=daily_data['value'],
               mode='lines+markers', line=dict(color='#009D9A', width=2),
               marker=dict(size=4), name='Daily', showlegend=False),
    row=3, col=1
)

fig_aggregation.update_layout(
    title=dict(
        text='<b>Performance: Effect of Aggregation</b><br><sub>Lower resolution = faster rendering + clearer patterns</sub>',
        font=dict(size=16, color='#212529', family='Arial'),
        x=0.5, xanchor='center'
    ),
    height=800,
    showlegend=False,
    plot_bgcolor='white',
    paper_bgcolor='white'
)

fig_aggregation.update_xaxes(showgrid=True, gridcolor='#E9ECEF')
fig_aggregation.update_yaxes(showgrid=True, gridcolor='#E9ECEF')

fig_aggregation.show()

print("-" * 80)
print()

# Strategy 2: LTTB (Largest Triangle Three Buckets) Downsampling
print("STRATEGY 2: LTTB Downsampling - Preserve Visual Shape")
print()
print("Better than random sampling: Preserves peaks, valleys, and trends")
print()

def lttb_downsample(data, threshold):
    """
    Largest Triangle Three Buckets (LTTB) downsampling algorithm.
    Preserves visual appearance while reducing point count.

    Source: Sveinn Steinarsson (2013)
    """
    if len(data) <= threshold:
        return data

    # Convert to numpy for speed; the positional index serves as x
    # (assumes evenly spaced timestamps, as in this minute-level data)
    values = data['value'].values

    # Always include first point
    sampled_indices = [0]

    # Bucket size
    bucket_size = (len(data) - 2) / (threshold - 2)

    a = 0  # Point initially selected

    for i in range(threshold - 2):
        # Calculate point average for next bucket
        avg_range_start = int((i + 1) * bucket_size) + 1
        avg_range_end = int((i + 2) * bucket_size) + 1
        avg_range_end = min(avg_range_end, len(data))

        avg_x = np.mean(np.arange(avg_range_start, avg_range_end))
        avg_y = np.mean(values[avg_range_start:avg_range_end])

        # Get the range for this bucket
        range_start = int(i * bucket_size) + 1
        range_end = int((i + 1) * bucket_size) + 1

        # Calculate triangle area
        max_area = -1
        max_area_point = range_start

        for j in range(range_start, range_end):
            # Calculate triangle area
            area = abs(
                (a - avg_x) * (values[j] - values[a]) -
                (a - j) * (avg_y - values[a])
            )

            if area > max_area:
                max_area = area
                max_area_point = j

        sampled_indices.append(max_area_point)
        a = max_area_point

    # Always include last point
    sampled_indices.append(len(data) - 1)

    return data.iloc[sampled_indices]

# Apply LTTB
lttb_sample = lttb_downsample(raw_data, threshold=500)
random_sample = raw_data.sample(n=500, random_state=42).sort_values('timestamp')

print(f"Original: {len(raw_data):,} points")
print(f"LTTB sample: {len(lttb_sample):,} points")
print(f"Random sample: {len(random_sample):,} points")
print()

# Compare LTTB vs random sampling
fig_sampling = make_subplots(
    rows=2, cols=1,
    subplot_titles=(
        '<b>Random Sampling</b> - Misses important peaks/valleys',
        '<b>LTTB Sampling</b> - Preserves visual shape'
    ),
    vertical_spacing=0.15
)

# Random sampling
fig_sampling.add_trace(
    go.Scatter(x=raw_data['timestamp'], y=raw_data['value'],
               mode='lines', line=dict(color='#E9ECEF', width=1),
               name='Original', showlegend=False),
    row=1, col=1
)
fig_sampling.add_trace(
    go.Scatter(x=random_sample['timestamp'], y=random_sample['value'],
               mode='markers', marker=dict(color='#DA1E28', size=3),
               name='Random Sample', showlegend=False),
    row=1, col=1
)

# LTTB sampling
fig_sampling.add_trace(
    go.Scatter(x=raw_data['timestamp'], y=raw_data['value'],
               mode='lines', line=dict(color='#E9ECEF', width=1),
               name='Original', showlegend=False),
    row=2, col=1
)
fig_sampling.add_trace(
    go.Scatter(x=lttb_sample['timestamp'], y=lttb_sample['value'],
               mode='lines+markers', line=dict(color='#009D9A', width=1.5),
               marker=dict(size=3), name='LTTB Sample', showlegend=False),
    row=2, col=1
)

fig_sampling.update_layout(
    title=dict(
        text='<b>Downsampling Comparison: Random vs LTTB</b><br>' +
             '<sub>LTTB intelligently preserves visual shape with 99% fewer points</sub>',
        font=dict(size=16, color='#212529', family='Arial'),
        x=0.5, xanchor='center'
    ),
    height=600,
    plot_bgcolor='white',
    paper_bgcolor='white'
)

fig_sampling.update_xaxes(showgrid=True, gridcolor='#E9ECEF')
fig_sampling.update_yaxes(showgrid=True, gridcolor='#E9ECEF')

fig_sampling.show()

print("LTTB Benefits:")
print("  ✓ Preserves peaks and valleys (important for anomaly detection)")
print("  ✓ Maintains overall trend shape")
print("  ✓ 99% reduction in points (50K → 500)")
print("  ✓ Much faster rendering")
print()
print("-" * 80)
print()

# Strategy 3: Caching with LRU
print("STRATEGY 3: Caching with LRU (Least Recently Used)")
print()
print("Avoid recomputing expensive operations")
print()

@lru_cache(maxsize=128)
def expensive_aggregation(date_str, product):
    """Simulate expensive database query or computation"""
    time.sleep(0.1)  # Simulate delay

    # Parse date
    date = pd.to_datetime(date_str)

    # Filter and aggregate
    result = df[
        (df['date'].dt.year == date.year) &
        (df['date'].dt.month == date.month) &
        (df['product'] == product)
    ]['sales'].sum()

    return result

# Test caching performance
print("Testing cache performance...")
test_product = 'Product A'
test_date = '2023-06-01'

# First call (cache miss)
start = time.perf_counter()
result1 = expensive_aggregation(test_date, test_product)
time1 = time.perf_counter() - start

# Second call (cache hit)
start = time.perf_counter()
result2 = expensive_aggregation(test_date, test_product)
time2 = time.perf_counter() - start

print(f"  First call (cache miss):  {time1*1000:.2f}ms")
print(f"  Second call (cache hit):  {time2*1000:.2f}ms")
print(f"  Speedup: {time1/time2:.1f}x faster")
print()
print("Cache Info:")
cache_info = expensive_aggregation.cache_info()
print(f"  Hits: {cache_info.hits}")
print(f"  Misses: {cache_info.misses}")
print(f"  Size: {cache_info.currsize}")
print(f"  Max size: {cache_info.maxsize}")
print()
print("-" * 80)
print()

# Strategy 4: Progressive Loading
print("STRATEGY 4: Progressive Loading (Load-on-Demand)")
print()
print("Load overview first, details on zoom/interaction")
print()

# Simulate progressive loading
monthly_overview = large_dataset.set_index('timestamp').resample('MS').mean().reset_index()

print("Level 1: Monthly overview (instant load)")
print(f"  Points: {len(monthly_overview)} (vs {len(large_dataset):,} full dataset)")
print("  Load time: <50ms")
print()

print("Level 2: Daily detail (load on zoom to month)")
print("  Points: ~30 per month")
print("  Load time: ~100ms per month")
print()

print("Level 3: Hourly detail (load on zoom to week)")
print("  Points: ~168 per week")
print("  Load time: ~200ms per week")
print()

print("Pseudo-code for implementation:")
print("""
def on_zoom(date_range):
    if zoom_level == 'year':
        return load_monthly_summary()
    elif zoom_level == 'month':
        return load_daily_data(date_range)
    elif zoom_level == 'week':
        return load_hourly_data(date_range)
    else:
        return load_minute_data(date_range)
""")
print()

print("Benefits:")
print("  ✓ Initial page load: <1 second (overview only)")
print("  ✓ Interactions: <100ms (load only visible range)")
print("  ✓ Reduced bandwidth: Load 1% of data initially")
print("  ✓ Better UX: Fast perceived performance")
print()
print("=" * 80)

✅ PERFORMANCE OPTIMIZATION: Handling Large Datasets

Generating large dataset for performance testing...
✓ Created dataset with 524,161 records
  Memory usage: 8.00 MB

--------------------------------------------------------------------------------

STRATEGY 1: Intelligent Aggregation

Problem: Plotting 500K+ points is slow and unnecessary
Solution: Aggregate to appropriate time resolution

Benchmark Results:
  Raw data (50,000 points):     0.140 seconds
  Hourly (834 points):    0.004 seconds
  Daily (35 points):      0.001 seconds

Takeaway: Aggregating to daily reduces rendering time by ~99%
          AND provides clearer visualization of trends



--------------------------------------------------------------------------------

STRATEGY 2: LTTB Downsampling - Preserve Visual Shape

Better than random sampling: Preserves peaks, valleys, and trends

Original: 50,000 points
LTTB sample: 500 points
Random sample: 500 points



LTTB Benefits:
  ✓ Preserves peaks and valleys (important for anomaly detection)
  ✓ Maintains overall trend shape
  ✓ 99% reduction in points (50K → 500)
  ✓ Much faster rendering

--------------------------------------------------------------------------------

STRATEGY 3: Caching with LRU (Least Recently Used)

Avoid recomputing expensive operations

Testing cache performance...
  First call (cache miss):  108.10ms
  Second call (cache hit):  0.10ms
  Speedup: 1074.4x faster

Cache Info:
  Hits: 1
  Misses: 1
  Size: 1
  Max size: 128

--------------------------------------------------------------------------------

STRATEGY 4: Progressive Loading (Load-on-Demand)

Load overview first, details on zoom/interaction

Level 1: Monthly overview (instant load)
  Points: 12 (vs 524,161 full dataset)
  Load time: <50ms

Level 2: Daily detail (load on zoom to month)
  Points: ~30 per month
  Load time: ~100ms per month

Level 3: Hourly detail (load on zoom to week)
  Points: ~168 per week
  

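Strategy 2 reports LTTB (Largest-Triangle-Three-Buckets) results, but its implementation is not shown in this part of the tutorial. The following is a minimal NumPy sketch of the usual formulation, assuming `x` is sorted ascending and the threshold is at least 3: keep the first and last points, split the interior into equal-count buckets, and from each bucket keep the point that forms the largest triangle with the previously kept point and the average of the next bucket. The name `lttb` and its signature are our own, not a library API.

```python
import numpy as np


def lttb(x, y, threshold):
    """Largest-Triangle-Three-Buckets downsampling (minimal sketch).

    Assumes x is sorted ascending. Returns the indices of the kept points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if threshold >= n or threshold < 3:
        return np.arange(n)  # nothing to reduce, or too few buckets

    # Interior points 1..n-2 are split into (threshold - 2) equal-count buckets
    edges = np.linspace(1, n - 1, threshold - 1).astype(int)
    kept = [0]  # always keep the first point
    for i in range(threshold - 2):
        start, end = edges[i], edges[i + 1]
        # Average of the NEXT bucket, or the last point for the final bucket
        if i + 2 < len(edges):
            nxt = slice(edges[i + 1], edges[i + 2])
            avg_x, avg_y = x[nxt].mean(), y[nxt].mean()
        else:
            avg_x, avg_y = x[-1], y[-1]
        a = kept[-1]  # previously selected point
        # Twice the triangle area for every candidate in the current bucket
        areas = np.abs(
            (x[a] - avg_x) * (y[start:end] - y[a])
            - (x[a] - x[start:end]) * (avg_y - y[a])
        )
        kept.append(start + int(np.argmax(areas)))
    kept.append(n - 1)  # always keep the last point
    return np.array(kept)


x = np.arange(50_000, dtype=float)
y = np.sin(x / 500) + np.random.default_rng(0).normal(0, 0.1, x.size)
idx = lttb(x, y, 500)
print(len(idx))  # → 500
```

Because the function returns indices rather than values, the same `idx` can select both the timestamp and value columns (for example `df.iloc[idx]`) before plotting.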
## Conclusion: Mastering the Art and Science of Data Visualization

### The Journey We've Taken

This tutorial has taken you from basic visualization principles to advanced, production-ready dashboard design. We've covered:

**Foundational Principles (Parts 1-3):**
- Cognitive science of visual perception (pre-attentive processing, Gestalt principles)
- Visual hierarchy and accuracy rankings (position > length > angle > area)
- Color theory and semantic meaning (red = danger, green = success)
- The critical importance of data-ink ratio and removing chartjunk

**Design Excellence (Parts 4-6):**
- Small multiples for clean comparisons across categories
- Interactive patterns (coordinated views, focus + context, progressive disclosure)
- Multi-dimensional encoding (combining position, color, size, shape)
- Avoiding common mistakes (dual axes, rainbow colors, 3D distortion)

**Narrative and Storytelling (Parts 7-8):**
- Three-act structure (Setup → Conflict → Resolution)
- Progressive disclosure for different engagement levels
- Strategic use of annotations and visual emphasis
- Animation principles for temporal storytelling

**Professional Practice (Parts 9-11):**
- Accessibility and inclusive design (CVD-safe palettes, keyboard navigation)
- Performance optimization for large datasets (aggregation, sampling, caching)
- Dashboard layout architecture (F-pattern, information hierarchy)

---

### Key Takeaways: The 10 Commandments of Data Visualization

#### 1. **Know Your Audience**
- Executives need 3-second summaries (KPI cards)
- Analysts need 30-second explorations (interactive dashboards)
- Stakeholders need compelling narratives (annotated stories)
- Design for the least technical person in the room

#### 2. **Respect Cognitive Limits**
- 7±2 items in working memory (Miller's Law; later research suggests closer to 4 chunks)
- 5-7 colors maximum per chart
- 3-4 encodings maximum (position + color + size is enough)
- One insight per chart (don't try to show everything)

#### 3. **Use the Right Chart for the Job**
```
Comparison?          → Bar chart (horizontal if long labels)
Distribution?        → Histogram, box plot, violin plot
Relationship?        → Scatter plot (add trend line if needed)
Composition?         → Stacked bar, treemap (NOT pie chart)
Change over time?    → Line chart (continuous) or column (discrete)
```

#### 4. **Leverage Pre-Attentive Attributes**
- Color, size, position are processed in <250ms
- Use them for the MOST important information
- Example: Red outliers in scatter plot (instant recognition)
- Don't waste pre-attentive channels on unimportant details
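The "red outliers" example comes down to deciding which points earn the pre-attentive color and muting everything else. This sketch assumes a simple z-score rule and two illustrative hex colors; `highlight_outliers` is a hypothetical helper, and a median-based rule may suit skewed data better. With small samples a single outlier's z-score is bounded, which is why the demo lowers the threshold to 2.5.

```python
from statistics import mean, pstdev

HIGHLIGHT = "#d62728"  # saturated red reserved for the message (assumed choice)
MUTED = "#bdbdbd"      # light gray for context points (assumed choice)


def highlight_outliers(values, threshold=3.0):
    """Return one color per value: red if |z-score| exceeds threshold, else gray."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [MUTED] * len(values)  # no spread, so nothing stands out
    return [HIGHLIGHT if abs(v - mu) / sigma > threshold else MUTED for v in values]


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40]
colors = highlight_outliers(values, threshold=2.5)
print(colors.count(HIGHLIGHT))  # → 1
```

The returned list can be passed as per-point colors, for example `marker=dict(color=colors)` in a Plotly scatter trace or `c=colors` in Matplotlib's `scatter`.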

#### 5. **Design for Accessibility First**
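- About 8% of men (and roughly 0.5% of women) have a color vision deficiency, so never rely on red-green contrast alone
- Add shape, pattern, or labels as backup
- Ensure at least 4.5:1 contrast for normal-size text (WCAG AA); large text and meaningful graphics need at least 3:1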
- Test with CVD simulator before publishing
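The 4.5:1 rule can be checked numerically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio, assuming 6-digit hex inputs; the function names are our own.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of a 6-digit sRGB hex color."""
    hex_color = hex_color.lstrip("#")
    channels = []
    for i in (0, 2, 4):
        c = int(hex_color[i:i + 2], 16) / 255
        # Undo the sRGB gamma curve (WCAG 2.x uses the 0.03928 threshold)
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("#767676", "#FFFFFF"), 2))  # → 4.54
```

The argument order does not matter because the lighter luminance is always placed in the numerator; `#767676` is about the lightest gray that still passes 4.5:1 on white.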

#### 6. **Tell Stories, Don't Just Show Data**
- Every visualization should have a clear message
- Use three-act structure: Context → Problem → Solution
- Guide attention with color, annotations, and visual weight
- End with actionable recommendations

#### 7. **Optimize for Performance**
- <10,000 points per chart (aggregate if needed)
- Use WebGL (Scattergl) for large scatter plots
- Implement caching for expensive computations
- Progressive loading: overview first, details on demand
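The point budget and progressive loading fit together: the `on_zoom` pseudo-code in Strategy 4 branches on a zoom level, and one way to derive that level is to pick the finest time resolution whose bucket count over the visible span stays within the budget. The resolution ladder and the 30-day "month" are illustrative assumptions, and `pick_resolution` is a hypothetical helper, not part of Plotly or Dash.

```python
from datetime import timedelta

# Candidate resolutions, finest first (illustrative assumptions)
RESOLUTIONS = [
    ("minute", timedelta(minutes=1)),
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
    ("week", timedelta(weeks=1)),
    ("month", timedelta(days=30)),  # approximate calendar month
]


def pick_resolution(span: timedelta, max_points: int = 10_000) -> str:
    """Return the finest resolution whose bucket count fits the point budget."""
    for name, step in RESOLUTIONS:
        if span / step <= max_points:  # timedelta / timedelta -> float
            return name
    return RESOLUTIONS[-1][0]  # fall back to the coarsest level


print(pick_resolution(timedelta(days=365)))  # → hour
```

A year of minute data (525,600 points) exceeds the budget while hourly data (8,760 points) fits, so the year view loads hourly buckets. The returned name could select a pre-aggregated table or be mapped to a pandas resample rule.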

#### 8. **Embrace Interaction Thoughtfully**
- Interactive ≠ better (sometimes static is clearer)
- Follow Shneiderman: Overview → Zoom/Filter → Details
- Provide clear affordances (buttons look clickable)
- Always include reset/clear controls

#### 9. **Maintain Consistency**
- Same colors for same categories across all charts
- Same scales for small multiples (non-negotiable!)
- Consistent typography hierarchy (Title > Subtitle > Labels)
- Predictable layout patterns (F-pattern for dashboards)
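The first two consistency rules are easier to enforce in code than by hand: derive one category-to-color mapping and one value range, then reuse them for every chart or panel. This sketch assumes the Okabe-Ito colorblind-safe palette (any fixed palette works); `category_colors` and `shared_limits` are hypothetical helpers.

```python
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def category_colors(categories):
    """Assign each category a fixed palette color, independent of input order."""
    ordered = sorted(set(categories))
    if len(ordered) > len(OKABE_ITO):
        raise ValueError("more categories than palette colors; group the tail as 'Other'")
    return {cat: OKABE_ITO[i] for i, cat in enumerate(ordered)}


def shared_limits(panels, include_zero=False):
    """One (low, high) value range covering every small-multiple panel."""
    values = [v for series in panels.values() for v in series]
    low, high = min(values), max(values)
    if include_zero:  # bar charts should start at zero
        low, high = min(low, 0), max(high, 0)
    return low, high


colors = category_colors(["West", "East", "North", "East"])
limits = shared_limits({"East": [3, 5], "West": [2, 8]}, include_zero=True)
print(colors["East"], limits)  # → #E69F00 (0, 8)
```

Sorting makes the mapping identical for every chart built from the same set of categories; if new categories can appear later, compute the mapping once and store it. Applying `limits` to every panel (for example `ax.set_ylim(*limits)` in Matplotlib) keeps the panels directly comparable.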

#### 10. **Iterate Based on Feedback**
- Show drafts to actual users (not just designers)
- Ask: "What's the main message?" (test comprehension)
- Track metrics: Time to insight, error rate, satisfaction
- A/B test different designs with real users

---

### From Good to Great: Advanced Techniques Checklist

Once you've mastered the basics, push further:

**Visual Design:**
- [ ] Apply perceptual color spaces (CIELAB, not RGB)
- [ ] Use easing functions for natural animations
- [ ] Implement microinteractions (hover states, transitions)
- [ ] Add texture/pattern for additional encoding channel
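The reason for "CIELAB, not RGB" is that Euclidean distance in CIELAB tracks perceived color difference much more closely than distance between RGB triples. This sketch converts a 6-digit sRGB hex color to L\*a\*b\* under the D65 white point using the standard sRGB-to-XYZ matrix, then computes the simple CIE76 difference ΔE\*ab; later formulas such as CIEDE2000 correct some of its non-uniformities. The function names are our own.

```python
import math


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve (sRGB standard threshold 0.04045)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(hex_color: str) -> tuple:
    """Convert a 6-digit sRGB hex color to CIELAB (D65 white point)."""
    h = hex_color.lstrip("#")
    r, g, b = (_srgb_to_linear(int(h[i:i + 2], 16) / 255) for i in (0, 2, 4))
    # Linear sRGB -> CIE XYZ (standard sRGB/D65 matrix)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    # Normalize by the D65 reference white, then apply the Lab formulas
    fx, fy, fz = _f(x / 0.95047), _f(y / 1.0), _f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(hex_a: str, hex_b: str) -> float:
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    return math.dist(srgb_to_lab(hex_a), srgb_to_lab(hex_b))


print(round(delta_e76("#000000", "#FFFFFF"), 1))  # → 100.0
```

One practical use is auditing a sequential palette: neighboring steps should have roughly equal ΔE, otherwise some data ranges look more different than others.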

**Technical Excellence:**
- [ ] Server-side rendering for >100K points
- [ ] Implement virtual scrolling for large tables
- [ ] Use service workers for offline functionality
- [ ] Add keyboard shortcuts for power users

**Analytical Depth:**
- [ ] Statistical annotations (confidence intervals, p-values)
- [ ] Uncertainty visualization (error bars, gradient fills)
- [ ] Comparative benchmarks (industry averages, targets)
- [ ] Drill-down hierarchies (year → quarter → month)

**Storytelling Mastery:**
- [ ] Scrollytelling (narrative unfolds as user scrolls)
- [ ] Conditional formatting (automatic highlighting)
- [ ] Scenario comparison (what-if analysis)
- [ ] Personalization (adapt to user role/preferences)

---

### The Mindset of a Visualization Expert

**Think Like a Designer:**
- Form follows function (beauty serves purpose)
- Less is more (remove until it breaks)
- Consistency breeds familiarity (users learn patterns)
- White space is content (breathing room matters)

**Think Like a Scientist:**
- Question assumptions (is this the right chart type?)
- Test hypotheses (A/B test designs)
- Measure outcomes (track engagement, comprehension)
- Iterate based on evidence (not opinions)

**Think Like a Storyteller:**
- Every chart has a narrative arc
- Conflict drives engagement (show problems, then solutions)
- Emotions matter (colors evoke feelings)
- Call to action (what should viewer do next?)

**Think Like an Engineer:**
- Performance is a feature (slow = bad UX)
- Accessibility is not optional (design for everyone)
- Maintenance matters (document your decisions)
- Scale from day one (will this work with 10x data?)

---

### Common Pitfalls to Avoid (Even for Experts)

1. **Over-Engineering**: Complex doesn't mean better
   - Symptom: 3D effects, animations everywhere, too many interactions
   - Fix: Start simple, add complexity only when needed

2. **Falling in Love with Your Design**: You are not the user
   - Symptom: Defending choices without user testing
   - Fix: Show it to 5 real users and observe where they get confused

3. **Ignoring Context**: Dashboard on wall ≠ dashboard on phone
   - Symptom: Tiny text, complex interactions on mobile
   - Fix: Responsive design, test on actual devices

4. **Premature Optimization**: Don't optimize before measuring
   - Symptom: Complex caching for data that loads fast anyway
   - Fix: Profile first, optimize bottlenecks only

5. **Feature Creep**: Not every user request needs to be implemented
   - Symptom: 50 filters, 20 chart types, 100 configuration options
   - Fix: Focus on core use cases (80/20 rule)
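"Profile first" applies to the LRU caching strategy shown earlier as well: time a call before and after caching, and keep the cache only if the difference matters. This sketch uses only the standard library (`time.perf_counter`, `functools.lru_cache`); `daily_summary` and its `time.sleep` are hypothetical stand-ins for a real aggregation.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=128)
def daily_summary(metric: str, start: str, end: str) -> str:
    """Stand-in for an expensive aggregation (hypothetical workload)."""
    time.sleep(0.05)  # simulate ~50 ms of real work
    return f"{metric} summary for {start}..{end}"


def time_ms(fn, *args):
    """Wall-clock duration of one call, in milliseconds."""
    t0 = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - t0) * 1000


first = time_ms(daily_summary, "revenue", "2024-01-01", "2024-01-31")
second = time_ms(daily_summary, "revenue", "2024-01-01", "2024-01-31")
print(f"miss: {first:.1f} ms, hit: {second:.3f} ms")
print(daily_summary.cache_info())  # hits=1, misses=1 after the two calls
```

`lru_cache` keys on the call arguments, so they must be hashable: pass identifiers such as a metric name and date strings, not DataFrames.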

---

### Resources for Continued Learning

**Books (Essential Reading):**
- **"The Visual Display of Quantitative Information"** by Edward Tufte
  - The bible of data visualization
  - Focus: Data-ink ratio, chartjunk elimination
  
- **"Information Visualization"** by Colin Ware
  - Cognitive science foundations
  - Focus: How humans perceive visual information
  
- **"Storytelling with Data"** by Cole Nussbaumer Knaflic
  - Practical guide to narrative design
  - Focus: Choosing effective visuals, decluttering

- **"Visualization Analysis and Design"** by Tamara Munzner
  - Academic but comprehensive
  - Focus: Task abstraction, design principles

**Online Resources:**
- Flowing Data (flowingdata.com) - Examples and tutorials
- Information is Beautiful (informationisbeautiful.net) - Inspiration
- Observable (observablehq.com) - Interactive notebooks
- ColorBrewer (colorbrewer2.org) - Palettes with a colorblind-safe filter

**Tools to Master:**
- **Plotly/Dash**: Python interactive dashboards (what we used)
- **Tableau**: Business intelligence, drag-and-drop
- **D3.js**: Ultimate flexibility, steep learning curve
- **Power BI**: Microsoft ecosystem integration
- **Observable**: Collaborative notebooks with live code

**Communities:**
- Data Visualization Society (datavisualizationsociety.org)
- r/dataisbeautiful (Reddit) - Feedback and inspiration
- Tableau Community Forums
- Stack Overflow (plotly, dash, d3 tags)

---

### Your Next Steps: A 30-Day Learning Plan

**Week 1: Fundamentals**
- Days 1-2: Re-read Parts 1-3 and recreate the examples
- Days 3-4: Take an ugly chart and redesign it using these principles
- Days 5-7: Build small multiples for your own data

**Week 2: Interactivity**
- Days 8-10: Implement a coordinated-views dashboard
- Days 11-12: Add animations to existing charts
- Days 13-14: Create a progressive disclosure narrative

**Week 3: Accessibility & Performance**
- Days 15-17: Audit existing work for CVD safety
- Days 18-19: Optimize a slow dashboard (profile → fix)
- Days 20-21: Add keyboard navigation and ARIA labels

**Week 4: Portfolio Project**
- Days 22-24: Choose a complex dataset (Kaggle, government data)
- Days 25-27: Design a full dashboard with a story arc
- Days 28-29: Get feedback and iterate
- Day 30: Document on GitHub with a README and deploy to the web

---

### Final Thoughts: The Impact of Great Visualization

**Bad visualizations:**
- Confuse viewers → Poor decisions
- Waste time → Meetings discussing what chart means
- Erode trust → "I don't believe the data"

**Great visualizations:**
- Clarify instantly → Informed decisions in seconds
- Save time → Insights without explanation needed
- Build confidence → "The data tells a clear story"

The difference between bad and great is not talent—it's **knowledge of principles** and **practice applying them**. You now have the knowledge. The practice is up to you.

**Remember**: Every pixel on screen should serve a purpose. Every color should have meaning. Every interaction should reduce friction. Every chart should tell a story.

Now go build something amazing. The world needs clearer communication of data, and you're equipped to provide it.

---

### Acknowledgments

This tutorial builds on decades of research and practice by pioneers in the field:

- **Edward Tufte**: Data-ink ratio, small multiples
- **Jacques Bertin**: Semiology of graphics
- **William Cleveland**: Perceptual experiments
- **Ben Shneiderman**: Information visualization mantra
- **Colin Ware**: Perceptual psychology
- **Stephen Few**: Dashboard design
- **Hans Rosling**: Animated storytelling (Gapminder)
- **Tamara Munzner**: Visualization design frameworks

And countless practitioners who share their work and insights with the community.

---

### About This Tutorial

**Version**: 1.0 (December 2025)  
**Author**: Advanced Data Visualization Curriculum  
**License**: MIT (code), CC BY 4.0 (text)  
**Repository**: [Link to GitHub]  
**Feedback**: [Email or issue tracker]

**Citation**:
If you use this tutorial in your work, please cite:
```
Advanced Data Visualization Tutorial (2025).
Teaching Interactive Dashboards, Perception-Based Design,
and Data Storytelling. Version 1.0.
```

---

**Thank you for investing your time in learning these principles. May your visualizations be clear, beautiful, and impactful.** 🎨📊✨

# Cell 12: Summary statistics and resource compilation
from datetime import datetime  # used for the certificate date at the end of this cell

print("=" * 80)
print("🎓 TUTORIAL COMPLETE - SUMMARY & NEXT STEPS")
print("=" * 80)
print()

# Summary of what was covered
print("📚 TOPICS COVERED:")
print("-" * 80)

topics = {
    "Part 1: Visual Perception": [
        "Pre-attentive processing (<250ms perception)",
        "Visual hierarchy of encodings (position > length > angle)",
        "Gestalt principles (proximity, similarity, closure)",
        "Color theory (sequential, diverging, categorical)"
    ],
    "Part 2: Bad vs Good Design": [
        "Chartjunk elimination (3D effects, rainbow colors)",
        "Data-ink ratio maximization",
        "Misleading visualizations (dual axes, truncated bars)",
        "Chart type selection criteria"
    ],
    "Part 3: Effective Visualizations": [
        "Sorted bar charts with clear hierarchy",
        "Time series with visual emphasis",
        "Perceptually-uniform heatmaps",
        "Direct labeling vs legends"
    ],
    "Part 4: Small Multiples": [
        "Consistent scales across panels",
        "Reducing overplotting with facets",
        "Enabling macro-micro reading",
        "Optimal ordering strategies"
    ],
    "Part 5: Interactive Design": [
        "Shneiderman's mantra (overview → zoom → details)",
        "Coordinated multiple views",
        "Focus + context patterns",
        "Progressive disclosure"
    ],
    "Part 6: Multi-Dimensional Encoding": [
        "Bubble charts (5+ dimensions)",
        "Parallel coordinates plots",
        "Layering encodings effectively",
        "Motion as encoding channel"
    ],
    "Part 7: Data Storytelling": [
        "Three-act structure (setup → conflict → resolution)",
        "Progressive disclosure patterns",
        "Strategic annotations",
        "Emotional design with color"
    ],
    "Part 8: Animation": [
        "When to animate (and when not to)",
        "Animation principles (timing, easing)",
        "Animated bar races",
        "Gapminder-style scatter plots"
    ],
    "Part 9: Accessibility": [
        "CVD-safe color palettes",
        "Redundant encoding (color + shape)",
        "WCAG contrast requirements",
        "Keyboard navigation and ARIA"
    ],
    "Part 10: Performance": [
        "Intelligent aggregation strategies",
        "LTTB downsampling algorithm",
        "Caching with LRU",
        "Progressive loading patterns"
    ],
    "Part 11: Professional Practice": [
        "F-pattern dashboard layout",
        "Information hierarchy",
        "KPI card design",
        "Responsive composition"
    ]
}

for part, items in topics.items():
    print(f"\n{part}:")
    for item in items:
        print(f"  ✓ {item}")

print()
print("=" * 80)
print()

# Key statistics from tutorial
print("📊 TUTORIAL STATISTICS:")
print("-" * 80)
print(f"Total Examples Created:        30+")
print(f"Visualizations Demonstrated:   50+")
print(f"Code Cells:                    12")
print(f"Markdown Cells:                11")
print(f"Lines of Code:                 ~2,500")
print(f"Concepts Covered:              100+")
print(f"Best Practices Taught:         50+")
print()
print("=" * 80)
print()

# Skill assessment
print("🎯 SELF-ASSESSMENT CHECKLIST:")
print("-" * 80)
print("Can you now:")
print()

skills = [
    "Explain why position is more accurate than color for quantitative data?",
    "Choose appropriate chart types for different data relationships?",
    "Design CVD-safe visualizations without relying on color alone?",
    "Implement coordinated multiple views in a dashboard?",
    "Apply three-act structure to create narrative visualizations?",
    "Optimize visualizations for large datasets (>10K points)?",
    "Use small multiples instead of overlapping lines?",
    "Calculate and verify WCAG contrast ratios?",
    "Implement progressive disclosure for different user types?",
    "Critique visualizations using evidence-based principles?"
]

for i, skill in enumerate(skills, 1):
    print(f"  {i:2d}. {skill}")

print()
print("If you answered YES to 8+, you're ready for advanced work!")
print("If you answered YES to 5-7, review relevant sections")
print("If you answered YES to <5, practice the fundamentals more")
print()
print("=" * 80)
print()

# Recommended next steps
print("🚀 RECOMMENDED NEXT STEPS:")
print("-" * 80)
print()

next_steps = {
    "Beginner": [
        "Recreate all examples with your own data",
        "Join r/dataisbeautiful and critique 5 posts",
        "Build 3 different chart types for same dataset",
        "Read 'Storytelling with Data' by Cole Nussbaumer Knaflic"
    ],
    "Intermediate": [
        "Build full dashboard for real business problem",
        "Contribute to open-source viz project (Plotly, etc.)",
        "Create CVD-safe color palette for your organization",
        "Present at local data science meetup"
    ],
    "Advanced": [
        "Publish tutorial on Medium/dev.to",
        "Build custom D3.js visualization",
        "Conduct A/B test on dashboard designs",
        "Mentor junior analysts in visualization"
    ]
}

for level, steps in next_steps.items():
    print(f"{level} Level:")
    for step in steps:
        print(f"  → {step}")
    print()

print("=" * 80)
print()

# Tool recommendations
print("🛠️  TOOLS TO MASTER:")
print("-" * 80)
print()

tools = {
    "Python Visualization": [
        "Plotly/Dash (what we used) - Interactive web dashboards",
        "Matplotlib/Seaborn - Static, publication-quality",
        "Altair - Declarative, grammar of graphics",
        "Bokeh - Large datasets, streaming data"
    ],
    "Business Intelligence": [
        "Tableau - Drag-and-drop, executive dashboards",
        "Power BI - Microsoft ecosystem integration",
        "Looker - SQL-based, developer-friendly",
        "Metabase - Open-source, easy setup"
    ],
    "Web Technologies": [
        "D3.js - Ultimate flexibility, steep curve",
        "Observable - Collaborative notebooks",
        "Chart.js - Simple, lightweight",
        "ECharts - Rich features, good docs"
    ],
    "Design Tools": [
        "Figma - Mock-ups and prototypes",
        "ColorBrewer - CVD-safe palettes",
        "Coolors - Palette generation",
        "Contrast Checker - WCAG compliance"
    ]
}

for category, tool_list in tools.items():
    print(f"{category}:")
    for tool in tool_list:
        print(f"  • {tool}")
    print()

print("=" * 80)
print()

# Final resources compilation
print("📖 ESSENTIAL READING:")
print("-" * 80)
print()

books = [
    ("The Visual Display of Quantitative Information", "Edward Tufte", "★★★★★"),
    ("Information Visualization (3rd Ed)", "Colin Ware", "★★★★★"),
    ("Storytelling with Data", "Cole Nussbaumer Knaflic", "★★★★★"),
    ("The Functional Art", "Alberto Cairo", "★★★★☆"),
    ("Visualization Analysis and Design", "Tamara Munzner", "★★★★☆"),
    ("Show Me the Numbers", "Stephen Few", "★★★★☆")
]

for title, author, rating in books:
    print(f"  {rating} {title}")
    print(f"      by {author}")
    print()

print("=" * 80)
print()

# Community resources
print("👥 COMMUNITIES TO JOIN:")
print("-" * 80)
print()

communities = [
    ("Data Visualization Society", "datavisualizationsociety.org", "Slack, challenges, resources"),
    ("r/dataisbeautiful", "reddit.com/r/dataisbeautiful", "Feedback, inspiration, critique"),
    ("Observable Community", "observablehq.com", "Live code, collaboration"),
    ("Plotly Community", "community.plotly.com", "Technical support, examples"),
    ("Information is Beautiful", "informationisbeautiful.net", "Awards, showcases")
]

for name, url, description in communities:
    print(f"  • {name}")
    print(f"    {url}")
    print(f"    → {description}")
    print()

print("=" * 80)
print()

# Final motivational message
print("🎯 YOUR VISUALIZATION JOURNEY STARTS NOW")
print("=" * 80)
print()
print("Remember:")
print("  • Great visualization is 10% inspiration, 90% perspiration")
print("  • Every expert was once a beginner who kept practicing")
print("  • Your first 100 charts will be learning experiences")
print("  • Feedback is gold - seek it actively")
print("  • The best visualization is the one that drives decisions")
print()
print("You now have the knowledge. Go build something amazing!")
print()
print("=" * 80)
print()
print("✨ Tutorial Complete - Thank you for learning with us! ✨")
print()
print("=" * 80)

# Generate completion certificate (fun touch); every row is padded to one
# inner width so the right-hand border lines up
inner = 68
certificate_rows = [
    "",
    "CERTIFICATE OF COMPLETION".center(inner),
    "",
    "Advanced Data Visualization Mastery".center(inner),
    "",
    "  This certifies that you have completed comprehensive training",
    "  in interactive dashboards, perception-based design, and data",
    "  storytelling principles.",
    "",
    "  Topics Mastered:",
    "    ✓ Visual Perception & Cognitive Science",
    "    ✓ Interactive Dashboard Architecture",
    "    ✓ Accessibility & Inclusive Design",
    "    ✓ Performance Optimization",
    "    ✓ Data Storytelling & Narrative Design",
    "",
    f"  Date: {datetime.now().strftime('%B %d, %Y'):>58}",
    "",
    "  Now go forth and create beautiful, insightful visualizations!",
]
print()
print("╔" + "═" * inner + "╗")
for row in certificate_rows:
    print(f"║{row:<{inner}}║")
print("╚" + "═" * inner + "╝")
print()

🎓 TUTORIAL COMPLETE - SUMMARY & NEXT STEPS

📚 TOPICS COVERED:
--------------------------------------------------------------------------------

Part 1: Visual Perception:
  ✓ Pre-attentive processing (<250ms perception)
  ✓ Visual hierarchy of encodings (position > length > angle)
  ✓ Gestalt principles (proximity, similarity, closure)
  ✓ Color theory (sequential, diverging, categorical)

Part 2: Bad vs Good Design:
  ✓ Chartjunk elimination (3D effects, rainbow colors)
  ✓ Data-ink ratio maximization
  ✓ Misleading visualizations (dual axes, truncated bars)
  ✓ Chart type selection criteria

Part 3: Effective Visualizations:
  ✓ Sorted bar charts with clear hierarchy
  ✓ Time series with visual emphasis
  ✓ Perceptually-uniform heatmaps
  ✓ Direct labeling vs legends

Part 4: Small Multiples:
  ✓ Consistent scales across panels
  ✓ Reducing overplotting with facets
  ✓ Enabling macro-micro reading
  ✓ Optimal ordering strategies

Part 5: Interactive Design:
  ✓ Shneiderman's mantra (