# Advanced Data Visualisation (Demonstration)

_This notebook introduces more advanced concepts around data visualisation._

Note: This Jupyter Notebook was originally compiled by Alex Reppel (AR) based on conversations with [ClaudeAI](https://claude.ai/) *(version 3.5 Sonnet)*. For this year's materials, further revisions were made using [Claude Code](https://www.anthropic.com/claude-code) *(Sonnet 4.5)*, including updated documentation and git commit messages.

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # We need this for example 2!
from matplotlib.gridspec import GridSpec

In [None]:
plt.style.use("bmh")

## Data processing

In [None]:
# Load our retail data
df = pd.read_csv("assets/data/data.csv")  # Using local data file
df['date'] = pd.to_datetime(df['date'])

---
## 🎯 CORE CONTENT (Essential for Exercises)

**Estimated time**: 60-75 minutes

This section covers advanced visualisation techniques for creating complex, multi-dimensional plots.

---

In [None]:
# Calculate daily metrics we'll use throughout
daily_metrics = df.groupby(df['date'].dt.date).agg({
    "total_amount": "sum",
    "transaction_id": "count",
    "satisfaction_score": "mean"
}).reset_index()

## Example 1: Building a complex time series visualisation

Each step builds upon the previous one, demonstrating how we can progressively enhance a visualisation by:

1. Starting with basic data representation
2. Incorporating additional metrics with dual axes
3. Finishing with legends and final formatting

### Step 1: Basic revenue line plot

Let's start with a simple line plot of daily revenue:

In [None]:
plt.figure(figsize=(12, 6))

plt.plot(
    daily_metrics['date'], 
    daily_metrics['total_amount'],
    color="#2c3e50",
    linewidth=2,
    label="Revenue")

plt.title("Daily Revenue", pad=20)
plt.xlabel("Date")
plt.ylabel("Revenue (£)")

# Add grid with transparency
plt.grid(True, alpha=0.3)

# Format y-axis labels
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f"£{x:,.0f}"))

# Add legend
plt.legend()

plt.tight_layout()
plt.show()
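
The `lambda` passed to `FuncFormatter` in the cell above receives each tick value and its position and returns the label text. As a minimal sketch of the same idea, the formatter can be written as a named function, which is easier to reuse across several plots. Here `format_pounds` is a hypothetical helper name that does not appear in the original notebook, and the plotted numbers are made up.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def format_pounds(value, _position=None):
    """Return a tick value as whole pounds with thousands separators."""
    return f"£{value:,.0f}"


# Illustrative values only, not taken from the retail data set
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1, 2], [12000, 18500, 15250])
ax.yaxis.set_major_formatter(FuncFormatter(format_pounds))

print(format_pounds(1234567))  # £1,234,567
```

The second argument is the tick position, which matplotlib always passes but which this formatter does not need.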

### Step 2: Adding a second metric

Let's add transaction count on a secondary axis:

In [None]:
# Create figure and primary axis
fig, ax1 = plt.subplots(figsize=(12, 6))

# Plot revenue on primary axis
color1 = "#2c3e50"
ax1.set_xlabel("Date")
ax1.set_ylabel("Revenue (£)", color=color1)

line1 = ax1.plot(
    daily_metrics['date'], 
    daily_metrics['total_amount'],
    color=color1,
    linewidth=2,
    label="Revenue")

ax1.tick_params(axis="y", labelcolor=color1)

# Create secondary axis for transaction count
ax2 = ax1.twinx()
color2 = "#e74c3c"
ax2.set_ylabel("Number of Transactions", color=color2)

line2 = ax2.plot(
    daily_metrics['date'], 
    daily_metrics['transaction_id'],
    color=color2,
    linestyle="--",
    linewidth=2,
    label="Transactions")

ax2.tick_params(axis="y", labelcolor=color2)

# Add title and grid
plt.title("Daily Revenue and Transaction Count", pad=20)
ax1.grid(True, alpha=0.3)

# Format revenue axis labels
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f"£{x:,.0f}"))

plt.tight_layout()
plt.show()

### Step 3: Enhancing with legend and final touches

Finally, let's add a combined legend and final styling touches:

In [None]:
# Create figure and primary axis
fig, ax1 = plt.subplots(figsize=(12, 6))

# Plot revenue on primary axis
color1 = "#2c3e50"
ax1.set_xlabel("Date")
ax1.set_ylabel("Daily Revenue (£)", color=color1)

line1 = ax1.plot(
    daily_metrics['date'], 
    daily_metrics['total_amount'],
    color=color1,
    linewidth=2,
    label="Revenue")

ax1.tick_params(axis="y", labelcolor=color1)

# Create secondary axis for transaction count
ax2 = ax1.twinx()
color2 = "#e74c3c"
ax2.set_ylabel("Number of Transactions", color=color2)

line2 = ax2.plot(
    daily_metrics['date'], 
    daily_metrics['transaction_id'],
    color=color2,
    linestyle="--",
    linewidth=2,
    label="Transactions")

ax2.tick_params(axis="y", labelcolor=color2)

# Add title and grid
plt.title("Daily Revenue and Transaction Count", pad=20)
ax1.grid(True, alpha=0.3)

# Combine legends
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax1.legend(lines, labels, loc="upper left")

# Format revenue axis labels
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f"£{x:,.0f}"))

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()
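
The legend step above works because `ax.plot` returns a list of `Line2D` objects, so `line1 + line2` is plain list concatenation. A sketch of an alternative, using made-up data, asks each axis for its handles and labels via `get_legend_handles_labels()`, which avoids having to keep the return values of `plot`:

```python
import matplotlib.pyplot as plt

# Made-up values purely for illustration
days = [1, 2, 3, 4]
revenue = [1200, 1500, 1100, 1700]
transactions = [30, 42, 28, 45]

fig, ax1 = plt.subplots(figsize=(6, 3))
ax1.plot(days, revenue, color="#2c3e50", label="Revenue")

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(days, transactions, color="#e74c3c", linestyle="--", label="Transactions")

# Collect legend entries from both axes and draw a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(handles1 + handles2, labels1 + labels2, loc="upper left")
```

Either approach gives one legend box covering both axes; calling `legend()` separately on each axis would produce two boxes that can overlap.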

## Example 2: Enhanced visualisation using seaborn

Key improvements over Example 1:

1. Uses [seaborn](https://seaborn.pydata.org/)'s aesthetically pleasing default style with enhanced font scaling
2. Splits the visualisation into two related plots for clearer data presentation
3. Adds a 7-day moving average to show the trend more clearly
4. Includes summary statistics in a text box
5. Uses a more sophisticated colour palette
6. Adds proper spacing and padding between elements
7. Enhances the grid style for better readability
8. Uses consistent styling across both plots

In [None]:
# Set the style with seaborn
sns.set_style("whitegrid")
sns.set_palette("deep")
sns.set_context("notebook", font_scale=1.2)

# Load and prepare data
df = pd.read_csv("assets/data/data.csv")
df['date'] = pd.to_datetime(df['date'])

# Calculate daily metrics
daily_metrics = df.groupby(df['date'].dt.date).agg({
    "total_amount": "sum",
    "transaction_id": "count",
    "satisfaction_score": "mean"
}).reset_index()

# Create figure and axes
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), height_ratios=[2, 1])

# Upper plot: Revenue trend
sns.lineplot(
    data=daily_metrics,
    x="date",
    y="total_amount",
    ax=ax1,
    color="#2c3e50",
    linewidth=2.5,
    label="Daily Revenue"
)

# Customize upper plot
ax1.set_title("Daily Revenue Trend", pad=10)
ax1.set_xlabel("")  # Remove x-label as it's repeated in lower plot
ax1.set_ylabel("Revenue (£)")
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f"£{x:,.0f}"))

# Add rolling average to upper plot
rolling_avg = daily_metrics['total_amount'].rolling(window=7).mean()
sns.lineplot(
    x=daily_metrics['date'],
    y=rolling_avg,
    ax=ax1,
    color="#e74c3c",
    linewidth=2,
    label="7-day Moving Average"
)

# Lower plot: Transaction count
sns.lineplot(
    data=daily_metrics,
    x="date",
    y="transaction_id",
    ax=ax2,
    color="#2ecc71",
    linewidth=2.5,
    label="Daily Transactions"
)

# Customize lower plot
ax2.set_title("Daily Transaction Count", pad=10)
ax2.set_xlabel("Date")
ax2.set_ylabel("Number of Transactions")

# Rotate x-axis labels for both plots
for ax in [ax1, ax2]:
    ax.tick_params(axis="x", rotation=45)
    # Add subtle background grid
    ax.grid(True, alpha=0.3)
    # Add legend with explicit location
    ax.legend(loc="upper left")

# Add a text box with summary statistics
stats_text = f"""
Summary Statistics:
Average Daily Revenue: £{daily_metrics['total_amount'].mean():,.0f}
Average Daily Transactions: {daily_metrics['transaction_id'].mean():.0f}
"""
fig.text(0.02, 0.02, stats_text, fontsize=10, 
         bbox=dict(facecolor="white", alpha=0.8, edgecolor="none"))

# Adjust layout to prevent overlapping
plt.tight_layout()
fig.subplots_adjust(top=0.92)  # Leave extra space above the upper plot

plt.show()
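
One detail of the 7-day moving average above: `rolling(window=7).mean()` returns `NaN` until seven observations are available, so the moving-average line starts six days after the revenue line. The sketch below, using a made-up series, shows this alongside the `min_periods` option, which averages over however many days are available instead.

```python
import pandas as pd

# Ten days of made-up revenue figures
revenue = pd.Series([100, 120, 90, 110, 130, 95, 105, 115, 125, 100], dtype="float64")

strict = revenue.rolling(window=7).mean()                  # needs a full 7-day window
lenient = revenue.rolling(window=7, min_periods=1).mean()  # uses partial windows too

print(strict.isna().sum())   # 6 leading NaN values
print(lenient.isna().sum())  # 0
```

Averages over partial windows are noisier at the start of the series, so which behaviour is preferable depends on what the plot is meant to show.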

---
## 📚 SUPPLEMENTARY CONTENT (Interactive & Advanced)

**Estimated time**: 20-30 minutes

This section introduces interactive visualisation with hvPlot, enabling dynamic exploration of your data.

---

## Example 3: Interactive visualisation using hvPlot

We import pandas for data manipulation and `hvplot.pandas`, the pandas integration of [hvPlot](https://hvplot.holoviz.org/). Importing it lets us use hvPlot's functionality directly on pandas DataFrames through the `.hvplot` accessor.

### Features

The resulting visualisation includes several interactive features:

- Zoom: Use the mouse wheel or zoom tool
- Pan: Click and drag to move around
- Hover: Move the mouse over lines to see exact values
- Reset: Return to original view
- Save: Export the current view


### Setup

In [None]:
import pandas as pd
import hvplot.pandas

In [None]:
# Load and prepare data
df = pd.read_csv("assets/data/data.csv")
df['date'] = pd.to_datetime(df['date'])

### Data processing

We aggregate our transaction data to daily totals and calculate a 7-day moving average to smooth out daily fluctuations.

In [None]:
# Calculate daily metrics
daily_metrics = df.groupby(df['date'].dt.date).agg({
    "total_amount": "sum",
    "transaction_id": "count",
    "satisfaction_score": "mean"
}).reset_index()

In [None]:
# Calculate rolling average
daily_metrics['rolling_avg'] = daily_metrics['total_amount'].rolling(window=7).mean()
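
As a rough illustration of what the aggregation step produces, the sketch below applies the same `groupby`/`agg` pattern to a tiny made-up table (the values are invented; only the column names follow the notebook). Each calendar day collapses to one row: amounts are summed, transaction IDs are counted, and satisfaction scores are averaged.

```python
import pandas as pd

# Invented transactions: two on 1 January, one on 2 January
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01 09:15", "2024-01-01 16:40", "2024-01-02 11:05"]),
    "transaction_id": [1, 2, 3],
    "total_amount": [20.0, 30.0, 15.0],
    "satisfaction_score": [4.0, 5.0, 3.0],
})

# Group by calendar day, dropping the time of day
toy_daily = toy.groupby(toy["date"].dt.date).agg({
    "total_amount": "sum",
    "transaction_id": "count",
    "satisfaction_score": "mean",
}).reset_index()

print(toy_daily)
```

After aggregation the `transaction_id` column holds the number of transactions per day rather than an identifier, which is why the later plots label it as a count.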

### Creating interactive plots

The visualisation consists of two main components:

1. Plot 1: Revenue plot with a 7-day moving average overlay
2. Plot 2: Transaction count plot

The next two cells create the line plots. The parameters control:

- Dimensions (`height`, `width`)
- Labels and titles
- Visual styling (`color`, `line_dash`)
- Legend position

In [None]:
# Create individual plots for revenue and rolling average
revenue_plot = daily_metrics.hvplot.line(
    x="date",
    y="total_amount",
    height=400,
    width=800,
    title="Daily Revenue with Rolling Average",
    xlabel="Date",
    ylabel="Revenue (£)",
    legend="top",
    label="Daily Revenue",
    color="#2c3e50"
)

rolling_avg_plot = daily_metrics.hvplot.line(
    x="date",
    y="rolling_avg",
    height=400,
    width=800,
    xlabel="Date",
    ylabel="Revenue (£)",
    legend="top",
    label="7-day Moving Average",
    line_dash="dashed",
    color="#e74c3c"
)

In [None]:
# Now create a third plot for transaction count
transaction_plot = daily_metrics.hvplot.line(
    x="date",
    y="transaction_id",
    height=200,
    width=800,
    title="Daily Transaction Count",
    xlabel="Date",
    ylabel="Number of Transactions",
    legend="top",
    label="Transactions",
    color="#2ecc71"
)

### Combining the plots

The next cell combines our plots using the composition operators that hvPlot inherits from HoloViews:

- `*` overlays the revenue and rolling average plots on shared axes
- `+` places the transaction plot next to that overlay in a layout
- `.cols(1)` arranges the layout in a single column, so the plots stack vertically

In [None]:
((revenue_plot * rolling_avg_plot) + transaction_plot).cols(1)

## Example 4: Small multiples

### What are small multiples?

Small multiples _(also known as trellis plots, lattice charts, or faceted plots)_ are a series of similar graphs or charts using the same scale and axes, allowing them to be easily compared. They're particularly useful for showing how relationships differ across categories or segments.

### Why use small multiples?

Small multiples are excellent for:

1. Comparing patterns across categories
2. Identifying differences and similarities
3. Maintaining context while examining details
4. Reducing the need for complex legend systems
5. Making it easier to spot trends and outliers

### Customisation

The small multiple we'll be creating includes several customisation elements:

- Currency formatting for price labels
- Clear titles for each subplot
- Overall figure title
- Consistent scaling across all plots
- Grid lines for easier comparison

### Setup

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Set the style (seaborn's grid style is applied last so that it is not overridden)
plt.style.use("classic")
sns.set_style("whitegrid")

### Data processing

In [None]:
# Load data
df = pd.read_csv("assets/data/data.csv")
df['date'] = pd.to_datetime(df['date'])

In [None]:
# Filter out Electronics
df = df[df['product_category'] != "Electronics"]

### Create a small multiple

In [None]:
# No separate plt.figure() call is needed here: FacetGrid (below) creates
# its own figure, sized through its `height` and `aspect` parameters.

We use seaborn's `FacetGrid`, which is built on top of matplotlib, to create the small multiples.

In [None]:
# Create a 2x2 grid of plots
g = sns.FacetGrid(data=df, 
                  col="product_category",
                  row="customer_segment",
                  height=4,
                  aspect=1.5)

# Add the plots
g.map_dataframe(sns.scatterplot, 
                x="unit_price",
                y="satisfaction_score",
                alpha=0.5,
                size="quantity",
                sizes=(20, 200),
                color="#2c3e50")

# Customise the plots
g.set_axis_labels("Unit Price (£)", "Satisfaction Score")
g.set_titles(col_template="{col_name}",
            row_template="{row_name}")

# Add a title to the overall figure
g.fig.suptitle("Price vs Satisfaction by Product Category and Customer Segment", 
               y=1.02, 
               fontsize=16)

# Format x-axis labels to show currency
for ax in g.axes.flat:
    ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f"£{x:,.0f}"))

# Adjust layout
plt.tight_layout()

plt.show()

### Explanation

#### Creating the grid

This creates a grid where:

- Each column represents a different product category
- Each row represents a different customer segment
- `height` sets the height of each subplot
- `aspect` determines the width/height ratio

#### Adding the plots

We map a scatter plot to each grid cell, showing:

- Unit price on the x-axis
- Satisfaction score on the y-axis
- Point size indicating quantity purchased
- Transparency (alpha) to handle overlapping points
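
To make the grid structure concrete, here is a small sketch on synthetic data (the category and segment names and all values are invented for illustration). With two categories and two segments, `FacetGrid` produces a 2x2 array of axes, and because axes are shared by default, every panel ends up with the same limits.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(seed=0)
n = 80

# Synthetic stand-in for the retail data; names and values are invented
toy = pd.DataFrame({
    "product_category": rng.choice(["Books", "Clothing"], size=n),
    "customer_segment": rng.choice(["Regular", "Premium"], size=n),
    "unit_price": rng.uniform(5, 100, size=n),
    "satisfaction_score": rng.uniform(1, 5, size=n),
})

g = sns.FacetGrid(data=toy, col="product_category", row="customer_segment",
                  height=2.5, aspect=1.5)
g.map_dataframe(sns.scatterplot, x="unit_price", y="satisfaction_score", alpha=0.5)

print(g.axes.shape)  # (2, 2): rows are segments, columns are categories
```

Shared limits are what make the panels directly comparable; passing `sharex=False` or `sharey=False` to `FacetGrid` gives each panel its own scale at the cost of that comparability.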