# Introduction to Data Visualisation (Introduction)

_This notebook introduces Week 7's learning objectives and key concepts, building on the data analysis skills from Weeks 4 and 5._

Note: This Jupyter Notebook was originally compiled by Alex Reppel (AR) based on conversations with [ClaudeAI](https://claude.ai/) *(version 3.5 Sonnet)*. For this year's materials, further revisions were made using [Claude Code](https://www.anthropic.com/claude-code) *(Sonnet 4.5)*, including updated documentation and git commit messages.

## Week 7 overview

Welcome to Week 7! This week introduces data visualisation, a critical skill for communicating insights from your analysis. While Pandas helps you analyse data, visualisation helps you **understand patterns** and **communicate findings** effectively to stakeholders.

## Learning objectives

By the end of this week, you will be able to:

1. **Choose appropriate visualisation types** - select the right chart for your data and message
2. **Create plots with Pandas** - use the `.plot()` method for quick visualisations
3. **Customise with Matplotlib** - control titles, labels, colours, and layouts
4. **Use Seaborn for statistical plots** - create distribution and relationship visualisations
5. **Design effective subplots** - combine multiple visualisations in one figure
6. **Apply best practices** - create clear, accessible, and professional visualisations

## The visualisation learning block

This week begins a 3-week block on data visualisation:

- **Week 07 (this week)**: Introduction to Data Visualisation - basic plots with Pandas, Matplotlib, Seaborn
- **Week 08**: Advanced Visualisation - time series, small multiples, interactive plots with hvPlot
- **Week 09**: Application week - create visualisations for your group project

Just as Weeks 04-06 formed the Pandas block (basic → advanced → application), Weeks 07-09 form the visualisation block. Each week builds on the previous, culminating in Week 09 where you'll apply visualisation techniques to your group project. For details about the assessment, see **[Week 03](../Week03/Introduction.ipynb)**.

## Prerequisites

Before starting this week's materials, ensure you're comfortable with:
- Pandas DataFrames and Series (Week 4)
- Data aggregation and grouping (Weeks 4-5)
- Basic data cleaning (Week 5)

## Resources

### Literature

- [My personal reading list](https://zbib.org/d7003aee35484ccf9cb4e2657b919dd3) *(optional!)*

### Recommended tutorials

- [Matplotlib Official Tutorial](https://matplotlib.org/stable/tutorials/index.html) - Comprehensive guide to Matplotlib
- [Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html) - Statistical visualisation guide
- [Pandas Visualisation](https://pandas.pydata.org/docs/user_guide/visualization.html) - Using Pandas plotting methods

### Colour and accessibility

- [ColorBrewer](https://colorbrewer2.org/) - Selecting appropriate colour palettes *(super helpful!)*
- [Data Viz Catalogue](https://datavizcatalogue.com/) - Choosing the right chart type *(no pie charts!)*
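
As a small illustration of the colour advice above, the sketch below draws two series using colours from `Dark2`, one of the ColorBrewer qualitative schemes that ships with Matplotlib. It also varies the markers and line styles so the series stay distinguishable without relying on colour alone. The choice of `Dark2` is an assumption made for this example; check a scheme on the ColorBrewer site before relying on it for a particular audience.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative sales figures used elsewhere in this notebook
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

# 'Dark2' is a ColorBrewer qualitative scheme bundled with Matplotlib
palette = plt.get_cmap('Dark2').colors

fig, ax = plt.subplots(figsize=(8, 4))
# Different markers and line styles back up the colour difference
ax.plot(sales['Month'], sales['Revenue'], color=palette[0],
        marker='o', linestyle='-', label='Revenue')
ax.plot(sales['Month'], sales['Costs'], color=palette[1],
        marker='s', linestyle='--', label='Costs')
ax.set_xlabel('Month')
ax.set_ylabel('Amount (£)')
ax.set_title('Revenue vs Costs with a ColorBrewer palette')
ax.legend()
plt.tight_layout()
plt.show()
```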

## Why data visualisation matters

### The challenge with numbers alone

Consider this sales data summary:

In [None]:
import pandas as pd

# Sales data
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

print(sales)

### The visualisation advantage

Plotting the same data reveals the trends immediately:

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot(sales['Month'], sales['Revenue'], marker='o', label='Revenue', linewidth=2)
plt.plot(sales['Month'], sales['Costs'], marker='s', label='Costs', linewidth=2)
plt.title('Monthly Revenue vs Costs', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Amount (£)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

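Now imagine the table had six thousand rows instead of six: reading the numbers line by line would no longer be practical, while a chart would still show the overall pattern at a glance.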
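
To make the six-thousand-row point concrete, here is a minimal sketch using synthetic data. The daily revenue figures are randomly generated for illustration only (they are not real sales), and the names `daily` and `yearly` are hypothetical. Printing shows only a few rows, whereas aggregating and plotting shows the trend.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 6,000 synthetic daily revenue values (not real sales)
rng = np.random.default_rng(seed=42)
daily = pd.DataFrame({
    'Date': pd.date_range('2008-01-01', periods=6000, freq='D'),
    'Revenue': 1500 + np.linspace(0, 700, 6000) + rng.normal(0, 200, 6000),
})

# Printing shows only the first handful of the 6,000 rows
print(daily.head())

# Averaging by year and plotting makes the upward trend visible
yearly = daily.groupby(daily['Date'].dt.year)['Revenue'].mean()
ax = yearly.plot(kind='line', marker='o', figsize=(10, 5),
                 title='Average daily revenue per year (synthetic data)')
ax.set_xlabel('Year')
ax.set_ylabel('Revenue (£)')
plt.tight_layout()
plt.show()
```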

## This week's structure

### Introduction (this notebook)
- Overview of visualisation importance
- Understanding when to use different chart types
- Preview of key libraries and techniques

### Demonstration
- Comprehensive walkthrough of visualisation tools
- Pandas plotting for quick visualisations
- Matplotlib for customisation
- Seaborn for statistical graphics
- Creating and arranging subplots
- Best practices and accessibility

### Exercises (90 minutes)
- Progressive visualisation tasks
- Real-world business scenarios
- Focus on effective communication

### Solutions
- Complete solutions with explanations
- Alternative visualisation approaches
- Design principles highlighted

## Key concepts (preview)

### 1. Choosing the right visualisation

Different questions require different chart types:

- **Trends over time?** → Line plot
- **Comparing categories?** → Bar chart
- **Distribution of values?** → Histogram or box plot
- **Relationship between variables?** → Scatter plot
- **Parts of a whole?** → Pie chart *(use sparingly, or ... not at all!)*
- **Correlation between many variables?** → Heatmap
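
The mapping above can be sketched with the small sales table from earlier in this notebook. The example below places four of the chart types side by side in a 2×2 grid, so you can compare how each one answers a different question. With only six rows the histogram is purely illustrative, and the panel titles are just labels chosen for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative sales figures used elsewhere in this notebook
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

fig, axes = plt.subplots(2, 2, figsize=(10, 7))

# Trend over time -> line plot
sales.plot(x='Month', y='Revenue', kind='line', marker='o',
           ax=axes[0, 0], title='Trend: line plot', legend=False)

# Comparing categories -> bar chart
sales.plot(x='Month', y='Costs', kind='bar', rot=0,
           ax=axes[0, 1], title='Comparison: bar chart', legend=False)

# Distribution of values -> histogram
sales['Revenue'].plot(kind='hist', bins=4,
                      ax=axes[1, 0], title='Distribution: histogram')

# Relationship between two variables -> scatter plot
sales.plot(x='Revenue', y='Costs', kind='scatter',
           ax=axes[1, 1], title='Relationship: scatter plot')

plt.tight_layout()
plt.show()
```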

### 2. Pandas plotting basics

Pandas DataFrames have built-in plotting methods for quick visualisations:

In [None]:
# Quick line plot
sales.plot(x='Month', y='Revenue', kind='line', title='Revenue Trend', figsize=(8, 4))
plt.show()
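
A related sketch: `.plot()` also accepts a list of columns for `y`, and it returns a Matplotlib `Axes` object that you can keep adjusting. The example below is one possible way to compare revenue and costs per month as grouped bars; the title and axis label are choices made for this illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative sales figures used elsewhere in this notebook
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

# Passing a list to y draws one group of bars per month
ax = sales.plot(x='Month', y=['Revenue', 'Costs'], kind='bar',
                figsize=(8, 4), rot=0, title='Revenue and Costs by Month')

# The returned Axes can be customised with ordinary Matplotlib calls
ax.set_ylabel('Amount (£)')
plt.tight_layout()
plt.show()
```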

### 3. Matplotlib customisation

Matplotlib gives you full control over every aspect of your plots:

In [None]:
# Customised scatter plot
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(sales['Revenue'], sales['Costs'], s=100, alpha=0.6, c='steelblue', edgecolor='black')
ax.set_xlabel('Revenue (£)', fontsize=12)
ax.set_ylabel('Costs (£)', fontsize=12)
ax.set_title('Revenue vs Costs Relationship', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()
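
As a further sketch of this kind of control, the example below annotates the highest revenue month on a line plot using `ax.annotate`. The label text and its position (`xytext`) are arbitrary choices for this illustration, and the variable names `peak_pos` and `peak_value` are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative sales figures used elsewhere in this notebook
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(sales['Month'], sales['Revenue'], marker='o', color='steelblue')

# Position and value of the highest revenue figure
peak_pos = int(sales['Revenue'].argmax())   # month labels sit at x = 0, 1, 2, ...
peak_value = int(sales['Revenue'].max())

# Point an arrow at the peak and label it
ax.annotate(f'Peak: £{peak_value:,}',
            xy=(peak_pos, peak_value),
            xytext=(2.5, 65000),
            arrowprops=dict(arrowstyle='->'))

ax.set_xlabel('Month')
ax.set_ylabel('Revenue (£)')
ax.set_title('Revenue with Peak Month Highlighted')
plt.tight_layout()
plt.show()
```

To export a figure like this for a report, `fig.savefig('revenue_trend.png', dpi=300)` writes it to an image file (the filename here is only an example).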

### 4. Seaborn statistical plots

Seaborn excels at statistical visualisations with attractive defaults:

In [None]:
import seaborn as sns

# Distribution plot with Seaborn
sns.set_style('whitegrid')
plt.figure(figsize=(8, 5))
sns.histplot(data=sales, x='Revenue', kde=True, bins=6, color='skyblue')
plt.title('Revenue Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Revenue (£)')
plt.tight_layout()
plt.show()
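
Seaborn also covers relationship plots. As a minimal sketch, `sns.regplot` draws a scatter plot with a fitted regression line and a confidence band. With only six data points the band will be wide, so treat this as an illustration of the call and not as a meaningful analysis.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Same illustrative sales figures used elsewhere in this notebook
sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [45000, 52000, 48000, 61000, 58000, 67000],
    'Costs': [32000, 35000, 33000, 38000, 36000, 39000]
})

sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(8, 5))

# Scatter plot of costs against revenue with a fitted regression line
sns.regplot(data=sales, x='Revenue', y='Costs', color='steelblue', ax=ax)

ax.set_xlabel('Revenue (£)')
ax.set_ylabel('Costs (£)')
ax.set_title('Costs vs Revenue with Regression Line')
plt.tight_layout()
plt.show()
```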

## Relevance for your assessments

### Visualisation is essential for:

- Exploring your dataset's characteristics
- Identifying patterns and relationships
- Supporting your analytical narrative
- Creating professional figures for your report
- Demonstrating data understanding

### Your report should include:

- Exploratory visualisations showing data distributions
- Analytical visualisations supporting your findings
- Clear, well-labelled, professional figures
- Appropriate chart types for each insight

## Recommended approach

1. **Review this Introduction** *(10 minutes)*
   - Understand the week's objectives
   - Run the preview examples

2. **Work through the Demonstration** *(90 minutes)*
   - Run every code cell
   - Experiment with parameters
   - Try different chart types

3. **Complete Exercises** *(90 minutes)*
   - Start with simple plots
   - Progress to customisation
   - Focus on clarity over complexity

4. **Review Solutions** *(30 minutes)*
   - Compare your designs
   - Note alternative approaches
   - Learn from design choices

## Ready to begin?

If you're comfortable with:

- Pandas DataFrames from Week 4
- Data aggregation from Week 5
- The assessment requirements from Week 3

Then you're ready to dive into data visualisation!

**What's next?**

- **This week (Week 07)**: Master basic visualisation fundamentals
- **Week 08**: Learn advanced techniques (time series, small multiples, interactive plots)
- **Week 09**: Apply everything to create visualisations for your group project

### Remember

- **Visualisation tells stories** - focus on clarity over complexity
- **Practice builds judgement** - experiment with different approaches
- **Simple is often better** - don't over-design
- **Accessibility matters** - consider all audiences

Proceed to the [Demonstration](./Demonstration.ipynb) notebook when ready.

*Good luck with your visualisations!*
