# Lab 5 - Heatmaps and Legends

## Goal

Create a calendar-style heatmap by aggregating total distance by Year and Week.

## Importing and loading data

In [1]:
import pandas as pd
file_id = "1ymbNqfv9s6YGZzN93HFKAhjg0Z5xZXV1"
url = f"https://drive.google.com/uc?id={file_id}"
df = pd.read_csv(url)

### Preprocess Data

#### 1. Extract required fields

Our objective is to create a checkerboard-style heatmap.  
A heatmap typically requires three variables:

- x-axis: Week number  
- y-axis: Year  
- z-axis (colour): Distance

The first step is to collate the required information into a DataFrame containing:

- `Year`
- `Week`
- `Distance`

One option is the pandas datetime accessor, which exposes the ISO week number:

```
.dt.isocalendar().week
```

ISO dates follow an international standard (ISO-8601) for representing dates and weeks.

Under this system:
- Weeks start on Monday
- Week 1 is the week containing the first Thursday of the year
- Some years contain 53 weeks instead of 52

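In pandas, `.dt.isocalendar()` returns the ISO year, week, and day for each date. Take `Year` from `.dt.isocalendar().year` rather than `.dt.year`, because dates around 1 January can belong to week 52 or 53 of the previous ISO year, or to week 1 of the next.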
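
To see why the ISO year matters, here is a minimal sketch that compares the calendar year with the ISO year and week. The three dates are hand-picked for illustration and are not taken from the lab dataset.

```python
import pandas as pd

# Hand-picked dates for illustration (not from the lab dataset)
dates = pd.Series(pd.to_datetime(["2021-01-01", "2024-12-30", "2025-07-21"]))

# isocalendar() returns a DataFrame with the columns: year, week, day
iso = dates.dt.isocalendar()

# Put the calendar year next to the ISO year and week for comparison
comparison = pd.DataFrame({
    "date": dates.dt.date,
    "calendar_year": dates.dt.year,
    "iso_year": iso["year"],
    "iso_week": iso["week"],
})
print(comparison)
```

1 January 2021 falls in week 53 of ISO year 2020, and 30 December 2024 falls in week 1 of ISO year 2025. Pairing `.dt.year` with the ISO week would place those activities in the wrong row of the heatmap.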


In [4]:
#Your code here

|   | Year | Week | Distance_km |
|---|------|------|-------------|
| 0 | 2025 | 30 | 8.702 |
| 1 | 2025 | 29 | 8.3593 |
| 2 | 2025 | 29 | 8.3725 |
| 3 | 2025 | 29 | 6.82 |
| 4 | 2025 | 28 | 2.7581 |


#### 2. Prepare for the chart

Plotly heatmaps require the colour values (`z`) to be provided as a matrix, for example:

```python
z = [
    [1, 2, 3],
    [3, 1, 1]
]
```
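To build this structure, first sum the distance for each (`Year`, `Week`) pair, because `.pivot()` raises an error when the same index/column pair appears more than once. Then call `.pivot()` on the aggregated DataFrame so that: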

- Year becomes the index

- Week number provides the columns

- the values represent distance (km)

> Recommendation: Use `.fillna(0)` to replace missing (NaN) values with 0, so that weeks with no recorded distance appear as zero in the matrix.
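
The aggregate-then-pivot pattern can be sketched on a small hand-made DataFrame. The values and the names `toy`, `totals`, and `grid` are illustrative assumptions, not part of the lab data.

```python
import pandas as pd

# Toy activity log (illustrative values); 2024 week 1 appears twice
toy = pd.DataFrame({
    "Year": [2024, 2024, 2024, 2025],
    "Week": [1, 1, 3, 2],
    "Distance_km": [5.0, 2.5, 4.0, 10.0],
})

# Sum distance per (Year, Week) so each pair occurs exactly once
totals = toy.groupby(["Year", "Week"], as_index=False)["Distance_km"].sum()

# Reshape to a Year x Week grid; missing combinations become NaN, then 0
grid = totals.pivot(index="Year", columns="Week", values="Distance_km").fillna(0)
print(grid)

# The three pieces a heatmap needs: colour matrix, column labels, row labels
z = grid.to_numpy()
x = grid.columns.tolist()
y = grid.index.tolist()
```

Calling `.pivot()` directly on `toy` would fail because (2024, 1) is duplicated, which is why the `groupby` step comes first.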

In [7]:
#Your code here

| Year (rows) / Week (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 2.1306 | 2.205 | 4.6336 | 2.2818 | 2.4244 | 3.1116 | 1.6251 | 1.6498 | 0.0 | 1.4403 | ... | 13.0627 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2012 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 12.3169 | 7.4609 | 0.0 | 0.0 | 0.0 | 5.1443 | 0.0 | 7.6417 | 0.0 | 0.0 |
| 2013 | 11.5807 | 7.1071 | 13.2534 | 5.0 | 0.0 | 3.45 | 5.01 | 8.4002 | 10.5711 | 10.2385 | ... | 9.1767 | 0.0 | 15.3563 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2014 | 9.0773 | 11.066 | 0.0 | 5.1474 | 6.0925 | 0.0 | 0.0 | 0.0 | 2.6449 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2015 | 0.0 | 0.0 | 0.0 | 0.0 | 2.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 5.765 | 10.6715 | 13.0627 | 22.7434 | 25.4735 | 6.1083 |


#### 3. Save

In [None]:
#Your code here
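
For the save step, a minimal sketch of writing a pivoted matrix to CSV and reading it back. It uses toy values and an in-memory buffer instead of a file on disk; the names `toy_matrix` and `restored` are illustrative assumptions.

```python
import io
import pandas as pd

# Toy Year x Week matrix (illustrative values only)
toy_matrix = pd.DataFrame(
    [[7.5, 0.0, 4.0], [0.0, 10.0, 0.0]],
    index=pd.Index([2024, 2025], name="Year"),
    columns=pd.Index([1, 2, 3], name="Week"),
)

# With no path, to_csv() returns the CSV text; the index is written by default
csv_text = toy_matrix.to_csv()

# Read it back, restoring Year as the index
restored = pd.read_csv(io.StringIO(csv_text), index_col="Year")

# Column labels come back as strings, so convert them to integers
restored.columns = restored.columns.astype(int)
print(restored)
```

Writing the index keeps the `Year` labels in the file. Passing `index=False` would drop them, leaving rows that can no longer be matched to a year.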

### Solution

In [8]:
# Parse timestamps and drop any timezone information
df["Date"] = pd.to_datetime(df["start_date"]).dt.tz_localize(None)
# Extract the ISO year and week together so they stay consistent at year boundaries
iso = df["Date"].dt.isocalendar()
weekly = pd.DataFrame({
    "Year": iso["year"],
    "Week": iso["week"],
    "Distance_km": df["distance"] / 1000,  # convert to km (assumes distance is in metres)
})
# Aggregate: total distance per (Year, Week)
weekly = weekly.groupby(["Year", "Week"]).agg(TotalDistance=("Distance_km", "sum")).reset_index()
# Pivot to a Year x Week matrix; weeks with no activity become 0
matrix = weekly.pivot(index="Year", columns="Week", values="TotalDistance").fillna(0)
# Keep the index (the default) so the Year labels are written to the file
matrix.to_csv("matrix.csv")