# Module 2, Activity 1: Python for Data Visualisation


---

## Getting Started with Jupyter Notebook

Jupyter Notebook is an interactive environment where you can write and execute Python code in small sections called "cells". 

### How to Use This Notebook:
- **Running a cell**: Click on a cell and press `Shift + Enter` to execute it. Alternatively, hover over the cell and select the 'Run' button (▶).

- **Adding new cells**: Click on `+` in the toolbar to add a new cell.

- **Editing a cell**: Click inside a cell to start typing.

---

## The components of a Matplotlib figure

Before creating visualisations, we need to understand how Matplotlib works. Matplotlib builds figures using an object-oriented structure, meaning it organises different components (such as axes, labels, and plots) into a hierarchy.

Think of it like building a car—each part (wheels, windows, engine) comes together in a structured way to form the final product. Similarly, in object-oriented programming, code objects take data (what we want to visualise) and procedures (plotting commands) to produce a visual output.

Higher-level tools, such as the Seaborn library and Matplotlib's own `pyplot` interface, build on this structure and handle much of it for you. Using these tools is like customising a car instead of building one from scratch.

We'll start by understanding how Matplotlib creates visualisations from the ground up. This will give you a solid foundation when using higher-level libraries, helping you make customisations and troubleshoot issues confidently.

For detailed guidance, the official Matplotlib website is the best resource, and we'll refer to it throughout this course.

---

### Step 1: Importing Libraries

Let's import Matplotlib and any other libraries we will need.


In [None]:
# Note: To run this code, hover over this cell and select the 'Run' button (▶).

# Step 1: Import pyplot, the main plotting interface of Matplotlib,
# the most common visualisation library
import matplotlib.pyplot as plt

# Step 2: Import Seaborn for advanced visualisations
import seaborn as sns

# Step 3: Import Pandas for working with datasets. This allows us to:
#   - Load and manipulate tabular data.
#   - Perform data analysis and transformations.
import pandas as pd

# Step 4: Import NumPy for numerical computations. This is for:
#   - Handling arrays and matrices.
#   - Performing mathematical computations.
import numpy as np

# Step 5: Import datetime for working with dates and times
import datetime as dt

# Step 6: Import tick formatting tools from Matplotlib
#   - AutoMinorLocator adds minor ticks between major ones.
#   - MultipleLocator controls tick intervals.
#   - FuncFormatter customises tick labels (e.g., adding currency symbols).
from matplotlib.ticker import AutoMinorLocator, MultipleLocator, FuncFormatter
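
To give a feel for what the three tick tools from Step 6 do, here is a minimal sketch. The `months` and `sales` values and the `as_dollars` helper are made up for illustration and are not part of the course dataset.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator, MultipleLocator, FuncFormatter

# Hypothetical values, made up purely to have something to plot
months = [1, 2, 3, 4, 5, 6]
sales = [1200, 1850, 1600, 2400, 2100, 2900]

fig, ax = plt.subplots()
ax.plot(months, sales)

# MultipleLocator: place a major tick at every multiple of 500 on the y-axis
ax.yaxis.set_major_locator(MultipleLocator(500))

# AutoMinorLocator: subdivide each major interval with minor ticks
ax.yaxis.set_minor_locator(AutoMinorLocator())

# FuncFormatter: turn each tick value into a label with a currency symbol
def as_dollars(value, position):
    return f"${value:,.0f}"

currency_formatter = FuncFormatter(as_dollars)
ax.yaxis.set_major_formatter(currency_formatter)

plt.show()
```

The locators decide where ticks go, while the formatter decides what text appears next to each tick.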

---
### Step 2: Figure - your canvas
A Figure in Matplotlib is the top-level container for all visual elements of a plot. Think of it as a blank canvas where you can place one or multiple subplots (also called Axes). It does not contain any actual data or plots by itself—it simply provides a space to hold them.



In [None]:
# Hover over this cell and select the 'Run' button (▶).

fig = plt.figure()  # an empty figure with no Axes

`fig = plt.figure()` creates an empty figure (a blank canvas). The default size of this canvas is 6.4 × 4.8 inches, which is 640 × 480 pixels at the default resolution of 100 dots per inch.

Note: Figure (fig)

* This is the overall container for the visualisation.
* It can hold multiple plots (subplots).
* It does not contain subplots by default.
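
If you would like to check these points yourself, the following sketch asks an empty figure for its size and its list of Axes. The numbers printed depend on your Matplotlib settings, so treat them as something to inspect rather than fixed values.

```python
import matplotlib.pyplot as plt

fig = plt.figure()  # an empty figure with no Axes

width_in, height_in = fig.get_size_inches()  # figure size in inches
dpi = fig.dpi                                # dots (pixels) per inch

print(f"Size in inches: {width_in} x {height_in}")
print(f"Size in pixels: {width_in * dpi:.0f} x {height_in * dpi:.0f}")
print(f"Number of Axes on the figure: {len(fig.axes)}")
```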

---

### Step 3: Axes - your plot/s
We have no visualisation yet. That's because we need to specify the region on the figure where we're going to visualise.

These regions are called axes objects. This might seem redundant - isn't the figure itself the region we're visualising in? - but it's a necessary generalisation if we want to plot multiple visualisations (subplots) on the same figure, or annotate outside the axis of our plot. More broadly, the figure object is the container in which all the nested 'Artists' (axes, legends, colourbars, subplots etc) are kept together.

**NOTE:** One source of confusion here is the name 'Axes object': an Axes actually translates into what we think of as an individual plot or graph, rather than the plural of axis. Our x and y-axis (and z-axis in the case of 3D visualisations) are the Artist objects that give the boundaries of our plot, and they're further down the object hierarchy. 
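
The difference between an Axes and an axis can be seen by inspecting the objects directly. This is a small sketch of the hierarchy described above, using only attributes that Matplotlib provides on every figure and Axes.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# The Figure keeps a list of the Axes it contains
print(fig.axes)            # a list holding our single Axes
print(fig.axes[0] is ax)   # the same object that plt.subplots() returned

# Each Axes owns its own x-axis and y-axis objects (note the singular 'axis')
print(type(ax.xaxis).__name__)
print(type(ax.yaxis).__name__)

# An Axes also knows which Figure it belongs to
print(ax.figure is fig)
```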

Let's make a figure, with a single axes:

In [None]:
# Hover over this cell and select the 'Run' button (▶).

fig, ax = plt.subplots()  # a figure with a single Axes

Note that since we didn't pass any arguments to plt.subplots(), it creates a figure with a single axes. This is the default operation. But if we wanted a figure with multiple axes, we can specify:

In [None]:
# Hover over this cell and select the 'Run' button (▶).

fig, ax = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes
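
When you ask for a grid, the second value returned by `plt.subplots()` is no longer a single Axes but an array of them. The sketch below shows one way to reach the individual Axes in a 2x2 grid. The name `axs` and the titles are chosen here for illustration.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes

print(type(axs).__name__)  # axs is a NumPy array rather than a single Axes
print(axs.shape)           # (rows, columns)

# Pick out one Axes with [row, column] indexing (counting from 0)
axs[0, 1].set_title("top right")
axs[1, 0].set_title("bottom left")

# axs.flat visits every Axes in turn, row by row
for number, ax in enumerate(axs.flat):
    ax.set_xlabel(f"Axes {number}")

fig.tight_layout()  # add spacing so titles and labels do not overlap
plt.show()
```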

---

## Exercise:

Complete the following task: Create a figure object containing a 3x4 grid of Axes objects.

In [None]:
# Use this space to create your own Axes object


---

## Our first visualisation

Let's make our first visualisation. 

---

### Step 1: Loading the data 
We'll start by loading a dataset containing some information about the physiology and exercise of 30 individuals.

In [None]:
# Hover over this cell and select the 'Run' button (▶).

df = pd.read_csv("data/exercise.csv")

The `pd.read_csv` command returns a Pandas DataFrame.

Remember:
* Pandas allows us to load, manipulate, and analyse tabular data.
* A DataFrame is a table of rows and named columns.


To confirm that our data is stored as a DataFrame, we can use this command:


In [None]:
# Hover over this cell and select the 'Run' button (▶).

type(df)

If the output says `<class 'pandas.core.frame.DataFrame'>`, that means our data is in the correct format.

---
### Step 2: Understanding the Dataset

Before we plot, let's have a quick look at the dataset. There are 90 rows and five columns of information.
We can see that some columns hold numbers, while others hold words (string objects). Note that the first column is just an index of the rows,
so we can ignore it.

In [None]:
# Hover over this cell and select the 'Run' button (▶).

df # The command df displays the dataset. 
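
Displaying the whole table is not the only way to inspect a DataFrame. The sketch below uses a tiny stand-in table called `sample`, with made-up values, so that it runs without the CSV file. The same three commands should work on `df`.

```python
import pandas as pd

# A tiny stand-in table with made-up values; the real exercise.csv is larger
sample = pd.DataFrame({
    "pulse": [85, 88, 93, 97],
    "time": [1, 15, 1, 15],
    "kind": ["rest", "rest", "walking", "walking"],
})

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # the data type stored in each column
print(sample.head(2))  # the first two rows only
```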

---
### Step 3: Plotting "time" Against "pulse"

We can use the variable (column) names in the dataframe to create figures with Matplotlib. For example,
if we want to see the relationship between time spent exercising and
pulse rate we can use a scatter plot:

In [None]:
# Hover over this cell and select the 'Run' button (▶).

fig, ax = plt.subplots()  # Create a figure and axes (plot area)
ax.scatter(x = df.time, y = df.pulse)  # Create a scatter plot
ax.set_xlabel('time')  # Label the x-axis
ax.set_ylabel('pulse')  # Label the y-axis
plt.show()  # Display the plot

What’s Happening Here?
1. We use Matplotlib, which we imported earlier as `plt`, to create the plot.
2. We create a figure and axes (a blank canvas).
3. We plot the time (x-axis) against pulse (y-axis).
4. We label the axes so the chart makes sense.
5. We display the plot with `plt.show()`.

This is a good first figure, but we can easily do more with this dataset. For example, there are three different kinds of exercise in the dataset.

---

### Step 4: Adding Colors for Different Exercise Types

We can make the plot even clearer by coloring each type of exercise differently.

Before plotting, let's see the different types of exercise in our dataset:


In [None]:
df.kind.unique() # We're asking for an array of the unique values in the "kind" column of the "df" dataframe.

We can show the separate relationships between "time" and "pulse" for each exercise type by adding individual scatter plots to our axes. Each scatter plot is given its own color through the `c` argument.

In [None]:
fig, ax = plt.subplots()  # a figure with a single Axes
ax.scatter(x = df.time[df.kind == "rest"], y = df.pulse[df.kind == "rest"], c = "r", label = "rest") 
ax.scatter(x = df.time[df.kind == "walking"], y = df.pulse[df.kind == "walking"], c = "y", label = "walking")
ax.scatter(x = df.time[df.kind == "running"], y = df.pulse[df.kind == "running"], c = "c", label = "running") 
ax.set_xlabel('time')
ax.set_ylabel('pulse')
ax.legend() # Adds a legend to explain the colors
plt.show()

We filter the data so that:
* "Rest" is plotted in red (r).
* "Walking" is plotted in yellow (y).
* "Running" is plotted in cyan (c).

We label the axes and add a legend to explain what each color represents.

Here we have created a simple scatterplot with multiple exercise groups distinguished by colour. First, we created a figure with a single Axes, then we drew three scatterplots on that Axes. Each point is coloured according to the kind of exercise it is associated with.
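
The three scatter calls differ only in the exercise type and the colour, so they could also be written as a loop. This is one possible sketch rather than the only way to do it. The `sample` table holds made-up values so the code runs without the CSV file, and the `colours` dictionary is a name introduced here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in values so the sketch runs without exercise.csv
sample = pd.DataFrame({
    "time": [1, 15, 30, 1, 15, 30, 1, 15, 30],
    "pulse": [85, 85, 88, 90, 95, 99, 98, 120, 140],
    "kind": ["rest"] * 3 + ["walking"] * 3 + ["running"] * 3,
})

# One colour per exercise type, using the same colour codes as above
colours = {"rest": "r", "walking": "y", "running": "c"}

fig, ax = plt.subplots()
for kind, colour in colours.items():
    group = sample[sample.kind == kind]  # rows for this exercise type only
    ax.scatter(x=group.time, y=group.pulse, c=colour, label=kind)

ax.set_xlabel("time")
ax.set_ylabel("pulse")
ax.legend()  # one legend entry per scatter call
plt.show()
```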
 
---

## Exercise:

Complete the following task: Swap the colours of the **rest** group and the **walking** group in the visualisation above.

Change **rest** to yellow `(y)`.
Change **walking** to red `(r)`.

---
### Using seaborn
Matplotlib is very flexible, but it requires a lot of code to create detailed visualizations.

If you want a simpler way to make the same scatter plot, you can use Seaborn, which is built on top of Matplotlib.

In [None]:
sns.scatterplot(data = df, x = "time", y = "pulse", hue = "kind")
plt.show()

What’s Different?
* Less code: Seaborn automatically creates the figure and axes.
* Automatic colors: It assigns different colors to each exercise type.
* Legend included: No need to manually add a legend—it’s done for you!

More information about Seaborn, and the figures it can make, is available [here](https://seaborn.pydata.org/). 

We'll use Seaborn and other plotting libraries as needed, but our goal is effective visualisation, not library preference. Many Python plotting tools, Seaborn included, are built on Matplotlib and simplify the process.

Using the car analogy—this course focuses on driving (clear data communication) rather than building. Still, understanding what happens under the hood helps when creating visualisations in Python.
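
One place where knowing what is under the hood pays off: `sns.scatterplot` returns a Matplotlib Axes, so the Matplotlib methods covered earlier still apply to a Seaborn plot. The sketch below uses a small `sample` table of made-up values, and the title and labels are chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in values so the sketch runs without exercise.csv
sample = pd.DataFrame({
    "time": [1, 15, 30, 1, 15, 30, 1, 15, 30],
    "pulse": [85, 85, 88, 90, 95, 99, 98, 120, 140],
    "kind": ["rest"] * 3 + ["walking"] * 3 + ["running"] * 3,
})

# Seaborn draws the plot and hands back the Axes it drew on
ax = sns.scatterplot(data=sample, x="time", y="pulse", hue="kind")

# From here on, ordinary Matplotlib Axes methods can customise the plot
ax.set_title("Pulse by time and exercise type")
ax.set_xlabel("Time")
ax.set_ylabel("Pulse")
plt.show()
```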

---
### Understanding Data Subsetting

In the multi-colour Matplotlib scatterplot, we used the following command:

`df.time[df.kind == "rest"]`

This command gives us a subset of the time variable, where the kind of exercise is rest. If this subsetting command feels over your head, don't worry. In the next section,
we're going to learn how to index, slice, subset and query our dataset using the Pandas library, before we visualise.
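
As a brief preview, the subsetting command works in two steps, shown in the sketch below. The `sample` table contains made-up values, and `is_rest` and `rest_times` are names introduced here for illustration.

```python
import pandas as pd

# A tiny stand-in table with made-up values
sample = pd.DataFrame({
    "time": [1, 15, 1, 15],
    "kind": ["rest", "rest", "walking", "walking"],
})

# Step 1: the comparison gives one True/False value per row
is_rest = sample.kind == "rest"
print(is_rest.tolist())  # [True, True, False, False]

# Step 2: indexing with those values keeps only the rows marked True
rest_times = sample.time[is_rest]
print(rest_times.tolist())  # [1, 15]
```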

---

## Exercise:

1. Choose one of the Basic plot types on the Matplotlib website [here](https://matplotlib.org/stable/plot_types/index.html). 
2. In your own words, concisely explain step-by-step what the example code to generate your selected plot is doing. Feel free to use the rest of the Matplotlib website, or Google, if you're unsure about any parts of the code. 

**Note**: these examples generate their own data (under the 'make data' section of the code).