# Data Visualisation Practice (Demonstration)

_This notebook demonstrates applying data visualisation techniques from Weeks 7-8 to a complete project analysis. Use this workflow as a template for creating visualisations for your group project._

**Important**: This demonstration uses a sample retail sales dataset. The visualisation techniques and workflow demonstrated here can be adapted to your group project dataset.

Note: This Jupyter Notebook was originally compiled by Alex Reppel (AR) based on conversations with [ClaudeAI](https://claude.ai/) *(version 3.5 Sonnet)*. For this year's materials, further revisions were made using [Claude Code](https://www.anthropic.com/claude-code) *(Sonnet 4.5)*, including updated documentation and git commit messages.

## Building on previous sessions

This demonstration integrates visualisation concepts from:

- **Week 07**: Basic plotting (line, bar, scatter, histogram), Pandas/Matplotlib/Seaborn
- **Week 08**: Advanced techniques (time series, small multiples, interactive plots)

We'll apply these skills to a complete visualisation workflow for a data analysis project.

## Visualisation workflow overview

This demonstration follows a typical data visualisation workflow:

1. **Setup and data loading** - Import libraries and load your dataset
2. **Exploratory visualisations** - Quick plots to understand your data
3. **Analytical visualisations** - Focused plots to answer specific questions
4. **Multi-panel figures** - Combine related visualisations
5. **Polish and finalise** - Create publication-ready figures

By the end, you'll have a complete suite of visualisations ready for your project report.

***

## 🎯 CORE CONTENT (Essential for Group Project)

*(Estimated time: 60-75 minutes.)*

The sections below demonstrate the essential visualisation workflow for your group project:

- Exploratory visualisations for understanding your data
- Analytical visualisations for answering research questions
- Creating professional multi-panel figures
- Best practices for report-ready visualisations

Work through all examples and adapt these techniques to your own dataset.

***

## Part 1: Setup and data loading

### Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualisation style
sns.set_style('whitegrid')
sns.set_palette('colorblind')  # Accessible color palette

# Set figure quality
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300  # High resolution for saving

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

### Load the dataset

We'll use a retail sales dataset as an example. For your group project, replace this with your own dataset.

In [None]:
# Load data (adapt path for your project)
df = pd.read_csv('assets/data/week09/retail_sales.csv')

print(f"Dataset loaded: {len(df):,} rows, {len(df.columns)} columns")
print(f"\nColumns: {list(df.columns)}")

### Initial data exploration

Always explore your data before visualising it.

In [None]:
# View first few rows
print("First 10 rows:")
df.head(10)

In [None]:
# Dataset information
print("Dataset information:")
df.info()

In [None]:
# Summary statistics
print("Summary statistics:")
df.describe()
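
Summary statistics do not reveal missing values or duplicated rows, both of which can distort a plot. The following is a minimal sketch of a data-quality check. It uses a small made-up frame (`sample`) in place of the retail dataset; the column names mirror those used in this notebook, but the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the retail data (values are invented)
sample = pd.DataFrame({
    'product_category': ['Books', 'Toys', 'Toys', None, 'Books'],
    'total_amount': [12.5, 40.0, 40.0, 8.0, np.nan],
})

# Missing values per column, as counts and as a percentage of rows
missing = pd.DataFrame({
    'n_missing': sample.isna().sum(),
    'pct_missing': (sample.isna().mean() * 100).round(1),
})
print(missing)

# Fully duplicated rows (every column identical to an earlier row)
n_duplicates = int(sample.duplicated().sum())
print(f"Duplicated rows: {n_duplicates}")
```

Running the same two checks on `df` before plotting shows whether any columns need cleaning first.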

### Data preparation for visualisation

Prepare data types and create any necessary derived columns.

In [None]:
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

# Extract temporal features
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['month_name'] = df['date'].dt.month_name()
df['quarter'] = df['date'].dt.quarter

print("Date processing complete:")
print(df[['date', 'year', 'month', 'month_name', 'quarter']].head())
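
Real date columns sometimes contain entries that cannot be parsed. The sketch below shows one way to handle this, assuming you prefer to flag bad entries rather than stop with an error. The frame `raw` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical raw date strings, including one that cannot be parsed
raw = pd.DataFrame({'date': ['2024-01-15', '2024-02-03', 'not a date']})

# errors='coerce' turns unparseable entries into NaT instead of raising
raw['date'] = pd.to_datetime(raw['date'], errors='coerce')

n_bad = int(raw['date'].isna().sum())
print(f"Unparseable dates: {n_bad}")

# Temporal features are missing (NaN) wherever the date is NaT
raw['month'] = raw['date'].dt.month
print(raw)
```

Counting the `NaT` values straight after parsing makes it clear how many rows would silently drop out of any time-based plot.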

## Part 2: Exploratory visualisations

### Purpose of exploratory visualisations

Exploratory plots help you:
- Understand data distributions
- Identify outliers and anomalies
- Discover patterns and relationships
- Generate hypotheses for analysis

**These are quick, working visualisations - not final figures for your report.**

### Section 1: Distributions

Understand the distribution of key numerical variables.

In [None]:
# Distribution of sales amounts
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Histogram
axes[0].hist(df['total_amount'], bins=30, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Sales Amount (£)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Distribution of Sales Amounts')
axes[0].grid(axis='y', alpha=0.3)

# Box plot
axes[1].boxplot(df['total_amount'], vert=True)
axes[1].set_ylabel('Sales Amount (£)')
axes[1].set_title('Sales Amount Box Plot')
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\nSales statistics:")
print(f"Mean: £{df['total_amount'].mean():.2f}")
print(f"Median: £{df['total_amount'].median():.2f}")
print(f"Std Dev: £{df['total_amount'].std():.2f}")

**Interpretation**: 
- The histogram shows the shape of the distribution (normal, skewed, bimodal?)
- The box plot reveals median, quartiles, and potential outliers
- Together, these help you understand if your data needs transformation or special handling

**For your group project**: Create similar plots for your key numerical variables.
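
To back up a visual impression of skew or outliers with numbers, you can compute the skewness and apply the 1.5 x IQR rule. This is a minimal sketch on an invented series (`amounts`); the thresholds are the conventional ones, not something specific to this dataset.

```python
import pandas as pd

# Hypothetical sales amounts with one unusually large value
amounts = pd.Series([10, 12, 11, 13, 12, 14, 11, 95], name='total_amount')

# Skewness: roughly 0 for symmetric data, positive for a long right tail
skewness = amounts.skew()

# The 1.5 x IQR rule, which a default Matplotlib box plot also uses for its fliers
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = amounts[(amounts < lower) | (amounts > upper)]

print(f"Skewness: {skewness:.2f}")
print(f"Outlier bounds: {lower:.2f} to {upper:.2f}")
print(outliers)
```

A strongly positive skewness is one common reason to consider a log scale or a log transformation before plotting.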

### Section 2: Categorical distributions

Explore the distribution of categorical variables.

In [None]:
# Count by category
category_counts = df['product_category'].value_counts()

plt.figure(figsize=(10, 6))
category_counts.plot(kind='barh', color='skyblue')
plt.xlabel('Number of Sales')
plt.ylabel('Product Category')
plt.title('Sales Volume by Product Category')
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nCategory distribution:")
print(category_counts)

**For your group project**: Visualise your categorical variables (e.g., customer segments, regions, product types) to understand their distribution and identify dominant categories.
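
When a categorical variable has many small categories, a bar chart can become hard to read. One possible approach, sketched here with invented labels and an arbitrary 5% threshold, is to report shares and lump rare categories into "Other".

```python
import pandas as pd

# Hypothetical category labels (invented for illustration)
categories = pd.Series(
    ['Books'] * 50 + ['Toys'] * 30 + ['Garden'] * 15 + ['Pets'] * 3 + ['Music'] * 2,
    name='product_category',
)

# Share of all rows per category
shares = categories.value_counts(normalize=True)

# Replace categories below an (arbitrary) 5% threshold with 'Other'
rare = shares[shares < 0.05].index
lumped = categories.where(~categories.isin(rare), 'Other')

print(lumped.value_counts())
```

If you lump categories like this, state the threshold in the figure caption so readers know what "Other" contains.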

### Section 3: Time series trends

Visualise how metrics change over time.

In [None]:
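# Monthly sales trend: total sales per calendar month
monthly_sales = df.groupby(df['date'].dt.to_period('M'))['total_amount'].sum()
monthly_sales.index = monthly_sales.index.to_timestamp()  # month-start dates for plotting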

plt.figure(figsize=(12, 6))
plt.plot(monthly_sales.index, monthly_sales.values, marker='o', linewidth=2)
plt.xlabel('Date')
plt.ylabel('Total Sales (£)')
plt.title('Sales Trend Over Time')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

**For your group project**: If your data has a time component, always visualise trends. This often reveals:
- Seasonal patterns
- Growth or decline trends
- Anomalies or special events
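
Noisy series can hide the underlying trend. A rolling mean is one common way to smooth them. The sketch below uses simulated daily totals (an assumption, not the retail data) and a 7-day window chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales totals: an upward trend plus random noise
rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', periods=90, freq='D')
daily = pd.Series(100 + 0.5 * np.arange(90) + rng.normal(0, 15, 90), index=dates)

# 7-day centred rolling mean; the first and last 3 days have no full window
smoothed = daily.rolling(window=7, center=True).mean()

print(f"Std of raw series:      {daily.std():.1f}")
print(f"Std of smoothed series: {smoothed.std():.1f}")
```

Plotting the raw series with a low `alpha` and the smoothed series on top usually shows both the noise and the trend in one figure.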

### Section 4: Relationships between variables

Explore how variables relate to each other.

In [None]:
# Scatter plot: Price vs Sales Amount
plt.figure(figsize=(10, 6))
plt.scatter(df['unit_price'], df['total_amount'], alpha=0.5)
plt.xlabel('Unit Price (£)')
plt.ylabel('Sales Amount (£)')
plt.title('Relationship Between Unit Price and Sales Amount')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate correlation
correlation = df['unit_price'].corr(df['total_amount'])
print(f"\nCorrelation: {correlation:.3f}")

**For your group project**: Use scatter plots to explore relationships. Look for:
- Positive/negative correlations
- Clusters or groupings
- Outliers that don't fit the pattern
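
`Series.corr` computes the Pearson correlation by default, which only measures linear association. If a scatter plot shows a curved but steadily rising pattern, a rank-based (Spearman) correlation may describe it better. The sketch below uses invented data and computes Spearman as the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y rises with x, but along a curve rather than a line
x = pd.Series(np.arange(1, 21), dtype=float)
y = x ** 3

# Pearson measures linear association (the default for Series.corr)
pearson = x.corr(y)

# Spearman is the Pearson correlation of the ranks, so it captures
# any monotonic relationship, linear or not
spearman = x.rank().corr(y.rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A large gap between the two values is a hint that the relationship is monotonic but not linear.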

### Section 5: Correlation heatmap

Visualise correlations between multiple numerical variables.

In [None]:
# Select numerical columns
numerical_cols = df.select_dtypes(include=[np.number]).columns
correlation_matrix = df[numerical_cols].corr()

# Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
            center=0, square=True, linewidths=1)
plt.title('Correlation Matrix of Numerical Variables')
plt.tight_layout()
plt.show()

**Interpretation**:
- Dark red = strong positive correlation
- Dark blue = strong negative correlation
- Pale grey (the centre of the `coolwarm` scale) = little or no linear correlation

**For your group project**: This helps you identify which variables are related and might be worth analysing together.
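
With many variables, a heatmap can be hard to scan. One option is to list the variable pairs ordered by the strength of their correlation. This is a sketch on simulated columns; the names `quantity` and `noise` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical columns (invented for illustration)
rng = np.random.default_rng(0)
quantity = rng.integers(1, 10, 200)
unit_price = rng.uniform(5, 50, 200)
nums = pd.DataFrame({
    'quantity': quantity,
    'unit_price': unit_price,
    'total_amount': quantity * unit_price,
    'noise': rng.normal(size=200),
})

corr = nums.corr()

# Keep only the cells above the diagonal so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Long format, ordered by absolute correlation (strongest first)
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs.round(2))
```

The top few rows of `pairs` are natural candidates for the scatter plots described in the previous section.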

## Part 3: Analytical visualisations

### Purpose of analytical visualisations

Analytical plots:
- Answer specific research questions
- Support findings in your report
- Compare groups or categories
- Show evidence for conclusions

**These visualisations should be more polished and will likely appear in your final report.**

### Research Question 1: Which product categories generate the most revenue?

Create a clear, professional visualisation to answer this question.

In [None]:
# Calculate total revenue by category
revenue_by_category = df.groupby('product_category')['total_amount'].sum().sort_values(ascending=True)

# Create professional visualisation
fig, ax = plt.subplots(figsize=(10, 6))
revenue_by_category.plot(kind='barh', ax=ax, color='steelblue', edgecolor='black')

# Customization
ax.set_xlabel('Total Revenue (£)', fontsize=12, fontweight='bold')
ax.set_ylabel('Product Category', fontsize=12, fontweight='bold')
ax.set_title('Total Revenue by Product Category', fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3, linestyle='--')

# Add value labels
for i, v in enumerate(revenue_by_category.values):
    ax.text(v, i, f' £{v:,.0f}', va='center', fontsize=9)

plt.tight_layout()
plt.show()

print("\nRevenue by category:")
print(revenue_by_category.apply(lambda x: f"£{x:,.2f}"))

**Design choices explained**:
- **Horizontal bars**: Easier to read category labels
- **Sorted order**: Shows ranking clearly
- **Value labels**: Exact numbers for reference
- **Bold titles/labels**: Professional appearance
- **Grid lines**: Help read values

**For your group project**: When answering research questions, make deliberate design choices that enhance clarity.
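
Two further design choices can help with currency axes: formatting tick labels as money, and leaving room so value labels are not clipped. The sketch below assumes invented revenue figures and a 15% margin picked by eye.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical revenue totals (invented for illustration)
revenue = pd.Series({'Garden': 48_200, 'Toys': 112_750, 'Books': 236_400}).sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(revenue.index, revenue.values, color='steelblue', edgecolor='black')

# Show x ticks as pounds in thousands, e.g. 150000 -> £150k
ax.xaxis.set_major_formatter(FuncFormatter(lambda value, pos: f'£{value / 1000:,.0f}k'))

# Extend the axis by 15% so value labels are not clipped at the right edge
ax.set_xlim(0, revenue.max() * 1.15)
for i, v in enumerate(revenue.values):
    ax.text(v, i, f' £{v:,.0f}', va='center', fontsize=9)

ax.set_xlabel('Total Revenue (£)')
fig.tight_layout()
```

Because the formatter is attached to the axis, the tick labels stay correct if the axis limits change later.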

### Research Question 2: How do sales vary by customer segment and category?

Use a grouped bar chart to compare multiple dimensions.

In [None]:
# Calculate average sales by customer_segment and category
sales_by_customer_segment_category = df.groupby(['customer_segment', 'product_category'])['total_amount'].mean().unstack()

# Grouped bar chart
fig, ax = plt.subplots(figsize=(12, 6))
sales_by_customer_segment_category.plot(kind='bar', ax=ax, width=0.8, edgecolor='black')

# Customization
ax.set_xlabel('Customer Segment', fontsize=12, fontweight='bold')
ax.set_ylabel('Average Sales Amount (£)', fontsize=12, fontweight='bold')
ax.set_title('Average Sales by Customer Segment and Product Category', fontsize=14, fontweight='bold', pad=20)
ax.legend(title='Product Category', title_fontsize=10, fontsize=9,
          loc='upper left', bbox_to_anchor=(1.05, 1))  # anchor the legend's top-left corner outside the axes
ax.grid(axis='y', alpha=0.3, linestyle='--')
plt.xticks(rotation=0)

plt.tight_layout()
plt.show()

**For your group project**: Grouped bar charts are excellent for comparing:
- Multiple categories across groups
- Performance metrics by region/segment
- Time periods side by side
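
A grouped bar chart needs data in wide format: one row per group on the x-axis and one column per bar within each group. The sketch below, on invented transactions, shows how `groupby(...).unstack()` produces that shape and that `pivot_table` gives the same result.

```python
import pandas as pd

# Hypothetical transactions (invented for illustration)
tx = pd.DataFrame({
    'customer_segment': ['Retail', 'Retail', 'Retail', 'Corporate', 'Corporate'],
    'product_category': ['Books', 'Books', 'Toys', 'Books', 'Toys'],
    'total_amount': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Long format: one row per (segment, category) pair
long_means = tx.groupby(['customer_segment', 'product_category'])['total_amount'].mean()

# Wide format: segments as rows, categories as columns (one bar group per row)
wide_means = long_means.unstack()
print(wide_means)

# pivot_table builds the same wide table in a single call
pivoted = tx.pivot_table(index='customer_segment', columns='product_category',
                         values='total_amount', aggfunc='mean')
```

Swapping the `index` and `columns` arguments (or calling `.T` on the wide table) changes which variable appears on the x-axis and which in the legend.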

### Research Question 3: Are there seasonal patterns in sales?

Visualise patterns over time periods.

In [None]:
# Average sales by month
sales_by_month = df.groupby('month_name')['total_amount'].mean()

# Reorder months chronologically
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
sales_by_month = sales_by_month.reindex([m for m in month_order if m in sales_by_month.index])

# Line plot with markers
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(sales_by_month.index, sales_by_month.values, 
        marker='o', linewidth=2.5, markersize=8, color='darkgreen')

# Customization
ax.set_xlabel('Month', fontsize=12, fontweight='bold')
ax.set_ylabel('Average Sales Amount (£)', fontsize=12, fontweight='bold')
ax.set_title('Seasonal Sales Patterns: Average Sales by Month', fontsize=14, fontweight='bold', pad=20)
ax.grid(True, alpha=0.3, linestyle='--')
plt.xticks(rotation=45, ha='right')

plt.tight_layout()
plt.show()

# Identify peak and low months
peak_month = sales_by_month.idxmax()
low_month = sales_by_month.idxmin()
print(f"\nPeak month: {peak_month} (£{sales_by_month.max():.2f})")
print(f"Lowest month: {low_month} (£{sales_by_month.min():.2f})")

**For your group project**: Time-based patterns are important for:
- Identifying seasonal trends
- Supporting business decisions
- Understanding cyclical behaviour
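
Month names sort alphabetically by default, which scrambles a seasonal plot. As an alternative to reindexing by hand, an ordered categorical index sorts chronologically. This sketch uses invented monthly averages.

```python
import pandas as pd

# Hypothetical monthly averages, deliberately out of order
avg_sales = pd.Series({'March': 130.0, 'January': 95.0, 'December': 210.0, 'July': 120.0})

# English month names in calendar order, January to December
month_order = list(pd.date_range('2024-01-01', periods=12, freq='MS').month_name())

# An ordered categorical index sorts chronologically, not alphabetically
avg_sales.index = pd.CategoricalIndex(avg_sales.index, categories=month_order, ordered=True)
avg_sales = avg_sales.sort_index()

print(avg_sales)
```

The same idea works for weekdays or any other labels with a natural order.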

## Part 4: Multi-panel figures (Week 08 techniques)

### Purpose of multi-panel figures

Combine related visualisations to:
- Tell a comprehensive story
- Show multiple perspectives on the same data
- Compare different metrics side by side
- Create publication-quality integrated figures

### Example 1: Comprehensive category analysis

Create a 2x2 grid showing different aspects of category performance.

In [None]:
# Prepare data
category_stats = df.groupby('product_category').agg({
    'total_amount': ['sum', 'mean', 'count']
}).round(2)
category_stats.columns = ['total_revenue', 'avg_sale', 'num_sales']
category_stats = category_stats.sort_values('total_revenue', ascending=False)

# Create 2x2 subplot
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Comprehensive Product Category Analysis', fontsize=16, fontweight='bold', y=1.00)

# Panel A: Total Revenue
category_stats['total_revenue'].plot(kind='bar', ax=axes[0, 0], color='steelblue', edgecolor='black')
axes[0, 0].set_title('A) Total Revenue by Category', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Product Category', fontsize=10)
axes[0, 0].set_ylabel('Total Revenue (£)', fontsize=10)
axes[0, 0].grid(axis='y', alpha=0.3)
axes[0, 0].tick_params(axis='x', rotation=45)

# Panel B: Average Sale
category_stats['avg_sale'].plot(kind='bar', ax=axes[0, 1], color='coral', edgecolor='black')
axes[0, 1].set_title('B) Average Sale Amount by Category', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Product Category', fontsize=10)
axes[0, 1].set_ylabel('Average Sale (£)', fontsize=10)
axes[0, 1].grid(axis='y', alpha=0.3)
axes[0, 1].tick_params(axis='x', rotation=45)

# Panel C: Number of Sales
category_stats['num_sales'].plot(kind='bar', ax=axes[1, 0], color='lightgreen', edgecolor='black')
axes[1, 0].set_title('C) Sales Volume by Category', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Product Category', fontsize=10)
axes[1, 0].set_ylabel('Number of Sales', fontsize=10)
axes[1, 0].grid(axis='y', alpha=0.3)
axes[1, 0].tick_params(axis='x', rotation=45)

# Panel D: Revenue vs Volume scatter
axes[1, 1].scatter(category_stats['num_sales'], category_stats['total_revenue'], 
                   s=200, alpha=0.6, color='purple', edgecolor='black')
axes[1, 1].set_title('D) Revenue vs Sales Volume', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Number of Sales', fontsize=10)
axes[1, 1].set_ylabel('Total Revenue (£)', fontsize=10)
axes[1, 1].grid(True, alpha=0.3)

# Add category labels to scatter plot
for idx, row in category_stats.iterrows():
    axes[1, 1].annotate(idx, (row['num_sales'], row['total_revenue']), 
                        fontsize=8, ha='center', va='bottom')

plt.tight_layout()
plt.show()

**Why this works**:
- **Four perspectives**: Revenue, average, volume, and relationship
- **Consistent styling**: Same font sizes, grid style across panels
- **Panel labels (A-D)**: Easy to reference in text
- **Comprehensive story**: Tells the complete category performance story

**For your group project**: Create similar multi-panel figures for your key findings. This is an excellent way to maximise information density while maintaining clarity.
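
When several panels share the same styling, a loop over panel specifications keeps them consistent and shortens the code. The sketch below is one possible arrangement with invented summary values; the `panels` list is a hypothetical structure, not a Matplotlib feature.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-category summary (invented for illustration)
stats = pd.DataFrame(
    {'total_revenue': [900.0, 650.0, 400.0], 'avg_sale': [45.0, 65.0, 20.0]},
    index=['Books', 'Toys', 'Garden'],
)

# One spec per panel: (column, title, y-axis label, colour)
panels = [
    ('total_revenue', 'A) Total Revenue by Category', 'Total Revenue (£)', 'steelblue'),
    ('avg_sale', 'B) Average Sale Amount by Category', 'Average Sale (£)', 'coral'),
]

fig, axes = plt.subplots(1, len(panels), figsize=(12, 4))
for ax, (column, title, ylabel, colour) in zip(axes, panels):
    stats[column].plot(kind='bar', ax=ax, color=colour, edgecolor='black')
    # Shared styling lives in one place, so panels cannot drift apart
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.set_xlabel('Product Category', fontsize=10)
    ax.set_ylabel(ylabel, fontsize=10)
    ax.grid(axis='y', alpha=0.3)
    ax.tick_params(axis='x', rotation=45)

fig.tight_layout()
```

Adding a panel then means adding one line to `panels` rather than copying a block of styling calls.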

### Example 2: Small multiples (FacetGrid)

Show the same metric across multiple groups using small multiples.

In [None]:
# Sales trend by category - small multiples
monthly_category_sales = df.groupby(['month', 'product_category'])['total_amount'].mean().reset_index()

# Create FacetGrid
g = sns.FacetGrid(monthly_category_sales, col='product_category', col_wrap=2, height=4, aspect=1.5)
g.map(plt.plot, 'month', 'total_amount', marker='o', linewidth=2)
g.set_axis_labels('Month', 'Average Sales (£)')
g.set_titles('{col_name}', fontsize=12, fontweight='bold')
g.fig.suptitle('Monthly Sales Trends by Product Category', fontsize=14, fontweight='bold', y=1.02)

# Add grid to each subplot
for ax in g.axes.flat:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

**Why small multiples are powerful**:
- **Easy comparison**: Same scale across all panels
- **Pattern identification**: Spot differences and similarities quickly
- **Reduces clutter**: Better than overlapping lines on one plot
- **Professional appearance**: Clean, organised presentation

**For your group project**: Use small multiples when comparing:
- Trends across different categories
- Distributions across groups
- Regional or segment performance
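
Small multiples can also be built with plain Matplotlib, which is useful if you want full control over each panel. The sketch below uses invented monthly averages and relies on `sharex`/`sharey` to give every panel the same scale, as `FacetGrid` does by default.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly averages per category (invented for illustration)
monthly = pd.DataFrame({
    'month': [1, 2, 3] * 2,
    'product_category': ['Books'] * 3 + ['Toys'] * 3,
    'total_amount': [20.0, 22.0, 25.0, 40.0, 35.0, 60.0],
})

groups = list(monthly.groupby('product_category'))

# sharex/sharey give every panel identical axis limits
fig, axes = plt.subplots(1, len(groups), figsize=(10, 4), sharex=True, sharey=True)
for ax, (name, group) in zip(axes, groups):
    ax.plot(group['month'], group['total_amount'], marker='o', linewidth=2)
    ax.set_title(name, fontsize=12, fontweight='bold')
    ax.set_xlabel('Month')
    ax.grid(True, alpha=0.3)

axes[0].set_ylabel('Average Sales (£)')
fig.tight_layout()
```

Shared axes make differences in level obvious. If you care more about the shape of each trend than its level, independent y-axes may be the better choice, but say so in the caption.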

## Part 5: Polish and finalise

### Creating publication-ready figures

Key considerations for final figures:
1. **Size and resolution**: Appropriate dimensions and DPI
2. **Colour scheme**: Consistent and accessible
3. **Labels and titles**: Clear and informative
4. **White space**: Not too crowded
5. **File format**: Vector (PDF) or high-res raster (PNG)

### Best practices checklist

Before including a figure in your report, verify:

**Content**:
- ✅ Answers a specific research question
- ✅ Appropriate chart type for the data and message
- ✅ Data is accurate and properly aggregated

**Design**:
- ✅ Clear, informative title
- ✅ Axis labels with units
- ✅ Legend (if needed) is clear and positioned well
- ✅ Font sizes are readable (not too small)
- ✅ Colours are colourblind-friendly
- ✅ Grid lines enhance (not distract)

**Technical**:
- ✅ High resolution (300 DPI for print)
- ✅ Appropriate file format (PNG, PDF)
- ✅ Proper aspect ratio (not stretched)
- ✅ White space trimmed (`bbox_inches='tight'`)

### Saving figures properly

In [None]:
# Example: Create and save a publication-ready figure
fig, ax = plt.subplots(figsize=(10, 6))

# Create your visualisation
revenue_by_category.plot(kind='barh', ax=ax, color='steelblue', edgecolor='black')
ax.set_xlabel('Total Revenue (£)', fontsize=12, fontweight='bold')
ax.set_ylabel('Product Category', fontsize=12, fontweight='bold')
ax.set_title('Total Revenue by Product Category', fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3, linestyle='--')

# Apply the layout before saving so the saved files match what is displayed
plt.tight_layout()

# Save in multiple formats
# Uncomment these lines when you want to save:
# plt.savefig('figure1_revenue_by_category.png', dpi=300, bbox_inches='tight')
# plt.savefig('figure1_revenue_by_category.pdf', bbox_inches='tight')

plt.show()

print("\nTo save figures, uncomment the plt.savefig() lines above.")
print("PNG format: Good for reports, presentations (300 DPI recommended)")
print("PDF format: Vector format, perfect for publications (scalable)")
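
If your report has several figures, a small helper can keep the save settings identical for all of them. The function `save_figure` below is a hypothetical helper written for this example, not part of Matplotlib; it writes to a temporary directory here, which you would replace with your project's figures folder.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt


def save_figure(fig, stem, out_dir, formats=('png', 'pdf'), dpi=300):
    """Save `fig` once per format and return the written paths.

    `save_figure` is a hypothetical helper, not part of Matplotlib.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt in formats:
        path = out_dir / f'{stem}.{fmt}'
        # dpi matters for raster formats such as PNG; PDF output is vector
        fig.savefig(path, dpi=dpi, bbox_inches='tight')
        paths.append(path)
    return paths


fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [2, 4, 3], marker='o')
ax.set_title('Example figure')

# Written to a temporary directory here; use your project's figures folder instead
saved = save_figure(fig, 'figure1_example', tempfile.mkdtemp())
print([p.name for p in saved])
plt.close(fig)
```

Calling `fig.savefig` on the figure object, rather than `plt.savefig`, avoids accidentally saving whichever figure happens to be current.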

***

## 📚 SUPPLEMENTARY CONTENT (Interactive & Advanced)

*(Estimated time: 20-30 minutes.)*

This section introduces interactive visualisations using hvPlot and advanced customisation techniques. These are optional enhancements that can make your analysis more engaging, especially for presentations.

***

## Part 6: Interactive visualisations (Week 08)

### Why interactive visualisations?

Interactive plots are excellent for:
- **Presentations**: Engage your audience with dynamic exploration
- **Exploratory analysis**: Zoom, pan, hover to understand your data
- **Stakeholder meetings**: Allow others to explore data themselves

**Note**: Your final report must use static figures (PNG/PDF). Interactive plots are bonus enhancements for presentations or online dashboards.

### Setup for interactive plotting

In [None]:
# Import hvPlot
import hvplot.pandas

# Optional: Set hvPlot defaults
import holoviews as hv
hv.extension('bokeh')

### Example 1: Interactive time series

In [None]:
# Interactive time series with hover information
monthly_sales_df = df.groupby('date')['total_amount'].sum().reset_index()

monthly_sales_df.hvplot.line(
    x='date', 
    y='total_amount',
    title='Interactive Sales Trend',
    xlabel='Date',
    ylabel='Total Sales (£)',
    width=800,
    height=400,
    grid=True,
    hover_cols=['date', 'total_amount']
)

**Interactive features**:
- Hover to see exact values
- Zoom in to specific time periods
- Pan across the timeline
- Reset view to original

### Example 2: Interactive scatter with categories

In [None]:
# Interactive scatter plot with category coloring
df.hvplot.scatter(
    x='unit_price',
    y='total_amount',
    by='product_category',
    title='Price vs Sales Amount by Category (Interactive)',
    xlabel='Unit Price (£)',
    ylabel='Sales Amount (£)',
    width=800,
    height=500,
    legend='top_right',
    alpha=0.6,
    size=100
)

**Why this is useful for your group project**:
- Hover to identify individual points
- Toggle categories on/off in legend
- Zoom to investigate clusters
- Better exploration than static scatter plot

### Example 3: Interactive bar chart with sorting

In [None]:
# Interactive bar chart
category_revenue = df.groupby('product_category')['total_amount'].sum().reset_index()
category_revenue = category_revenue.sort_values('total_amount', ascending=False)

category_revenue.hvplot.bar(
    x='product_category',
    y='total_amount',
    title='Total Revenue by Category (Interactive)',
    xlabel='Product Category',
    ylabel='Total Revenue (£)',
    width=800,
    height=500,
    rot=45,
    hover_cols=['product_category', 'total_amount']
)

## Part 7: Creating an integrated dashboard figure

### Final group project figure: Comprehensive analysis

Create one integrated figure that tells your complete story. This is ideal for your group project report's main findings section.

In [None]:
# Create comprehensive dashboard-style figure
fig = plt.figure(figsize=(16, 12))
fig.suptitle('Retail Sales Analysis: Complete Overview', fontsize=18, fontweight='bold', y=0.98)

# Create grid for subplots
gs = fig.add_gridspec(3, 3, hspace=0.5, wspace=0.5)

# Top row: Time series (spans 2 columns)
ax1 = fig.add_subplot(gs[0, :2])
monthly_sales.plot(ax=ax1, marker='o', linewidth=2, color='darkblue')
ax1.set_title('A) Sales Trend Over Time', fontsize=12, fontweight='bold', loc='left')
ax1.set_xlabel('Date', fontsize=10)
ax1.set_ylabel('Total Sales (£)', fontsize=10)
ax1.grid(True, alpha=0.3)

# Top right: Key metrics box
ax2 = fig.add_subplot(gs[0, 2])
ax2.axis('off')

# Calculate all metrics outside any string to avoid nested quotes
date_min_str = df['date'].min().strftime('%Y-%m-%d')
date_max_str = df['date'].max().strftime('%Y-%m-%d')
total_sales = df['total_amount'].sum()
avg_sale = df['total_amount'].mean()
num_sales = len(df)
num_categories = df['product_category'].nunique()
num_segments = df['customer_segment'].nunique()
separator_line = '=' * 25

# Build metrics text with f-string using only pre-calculated values
metrics_text = f"""
KEY METRICS
{separator_line}
Total Sales: £{total_sales:,.2f}
Average Sale: £{avg_sale:.2f}
Number of Sales: {num_sales:,}
Categories: {num_categories}
Segments: {num_segments}
Date Range: {date_min_str}
         to {date_max_str}
"""

ax2.text(0.1, 0.5, metrics_text, fontsize=10, family='monospace', 
         verticalalignment='center', bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))

# Middle left: Revenue by category
ax3 = fig.add_subplot(gs[1, 0])
top_5_revenue = revenue_by_category.tail(5)
top_5_revenue.plot(kind='barh', ax=ax3, color='steelblue', edgecolor='black')
ax3.set_title('B) Top 5 Categories by Revenue', fontsize=12, fontweight='bold', loc='left')
ax3.set_xlabel('Revenue (£)', fontsize=10)
ax3.set_ylabel('')
ax3.grid(axis='x', alpha=0.3)

# Middle center: Sales distribution
ax4 = fig.add_subplot(gs[1, 1])
ax4.hist(df['total_amount'], bins=30, edgecolor='black', alpha=0.7, color='coral')
ax4.set_title('C) Sales Amount Distribution', fontsize=12, fontweight='bold', loc='left')
ax4.set_xlabel('Sales Amount (£)', fontsize=10)
ax4.set_ylabel('Frequency', fontsize=10)
ax4.grid(axis='y', alpha=0.3)

# Middle right: Seasonal pattern
ax5 = fig.add_subplot(gs[1, 2])
sales_by_month.plot(kind='line', ax=ax5, marker='o', linewidth=2, color='darkgreen')
ax5.set_title('D) Seasonal Pattern', fontsize=12, fontweight='bold', loc='left')
ax5.set_xlabel('Month', fontsize=10)
ax5.set_ylabel('Avg Sales (£)', fontsize=10)
ax5.grid(True, alpha=0.3)
ax5.tick_params(axis='x', rotation=45, labelsize=8)

# Bottom row: Segment comparison (spans 3 columns)
ax6 = fig.add_subplot(gs[2, :])
sales_by_customer_segment_category.plot(kind='bar', ax=ax6, width=0.8, edgecolor='black')
ax6.set_title(
    'E) Sales Performance by Customer Segment and Category',
    fontsize=12,
    fontweight='bold',
    loc='left')
ax6.set_xlabel('Segment', fontsize=10)
ax6.set_ylabel('Average Sales (£)', fontsize=10)
ax6.legend(title='Category', fontsize=8, title_fontsize=9)
ax6.grid(axis='y', alpha=0.3)
ax6.tick_params(axis='x', rotation=0)

# Save the integrated figure
# plt.savefig('retail_sales_complete_analysis.png', dpi=300, bbox_inches='tight')
# plt.savefig('retail_sales_complete_analysis.pdf', bbox_inches='tight')

plt.show()

print("\nIntegrated dashboard figure created successfully!")
print("This figure combines multiple perspectives into one comprehensive visualisation.")

**Why this dashboard works**:
- **Tells complete story**: Multiple perspectives in one figure
- **Logical flow**: Time series → distributions → comparisons
- **Key metrics**: Summary statistics readily visible
- **Panel labels (A-E)**: Easy to reference in report text
- **Consistent styling**: Unified colour scheme and design
- **Publication-ready**: High quality for reports or presentations

**For your group project**: Use this approach to create your main findings figure. One comprehensive, well-designed figure can be worth several separate plots.

## Summary

This demonstration covered the complete visualisation workflow:

### Part 1: Setup and data loading
- Import visualisation libraries
- Configure styles and settings
- Load and prepare data

### Part 2: Exploratory visualisations
- Distributions (histograms, box plots)
- Categorical distributions (bar charts)
- Time series trends (line plots)
- Relationships (scatter plots)
- Correlation heatmaps

### Part 3: Analytical visualisations
- Research question-focused plots
- Professional styling and customisation
- Grouped comparisons
- Seasonal pattern analysis

### Part 4: Multi-panel figures
- 2x2 comprehensive analysis grids
- Small multiples with FacetGrid
- Consistent styling across panels

### Part 5: Polish and finalise
- Publication-ready figures checklist
- Proper file saving (PNG, PDF)
- High resolution and formatting

### Part 6: Interactive visualisations (Supplementary)
- hvPlot for interactive exploration
- Hover tooltips and zooming
- Enhanced engagement for presentations

### Part 7: Integrated dashboard
- Comprehensive multi-panel figure
- Complete story in one visualisation
- Report-ready integrated analysis

## Adapting to your group project

To use these techniques for your group project:

1. **Load your data**: Replace the retail sales dataset with your own
2. **Identify research questions**: What do you want to discover?
3. **Explore first**: Create quick plots to understand your data
4. **Answer questions**: Build focused analytical visualisations
5. **Combine insights**: Create multi-panel figures for key findings
6. **Polish**: Apply best practices and save high-quality versions
7. **Optional**: Add interactivity for presentations

## Key principles for success

- ✅ **Start simple** - Basic plots first, complexity later
- ✅ **Choose appropriate chart types** - Match visualisation to data and message
- ✅ **Be consistent** - Use same colours, fonts, styles throughout
- ✅ **Label clearly** - Titles, axes, units, legends
- ✅ **Consider accessibility** - Colourblind-friendly palettes
- ✅ **Tell a story** - Each visualisation should have a purpose
- ✅ **Iterate** - Refine based on feedback

*Good luck creating impactful visualisations for your group project!*