## Week 2 - Plotly

### Goal
Create a Scatter Plot to analyze the relationship between Distance and Pace.

## Imports

In [1]:
import pandas as pd
import plotly.express as px  # Preinstalled in Google Colab
file_id = "1ymbNqfv9s6YGZzN93HFKAhjg0Z5xZXV1"
url = f"https://drive.google.com/uc?id={file_id}"
df = pd.read_csv(url)

## Preprocessing

### 1. Create Distance and Pace Columns

Create a new DataFrame that contains:
- **Distance in kilometres (km)**
- **Pace in minutes per kilometre (min/km)**

> **Reminder:**  
> Convert units before calculating pace.
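
As a worked example of the reminder above, the sketch below converts two made-up runs. It assumes that distance is recorded in metres and speed in metres per second; that is an assumption about the dataset, so check the units of the columns in your own file. The toy values are invented for illustration.

```python
import pandas as pd

# Toy data; the values and units are assumptions for illustration only.
toy = pd.DataFrame({
    "distance": [5000.0, 10000.0],   # metres (assumed)
    "average_speed": [2.5, 4.0],     # metres per second (assumed)
})

# metres -> kilometres
toy["Distance_km"] = toy["distance"] / 1000.0

# m/s -> min/km: seconds needed to cover 1000 m, then seconds -> minutes
toy["Pace_min_km"] = (1000.0 / toy["average_speed"]) / 60.0

print(toy[["Distance_km", "Pace_min_km"]])
```

Note that pace is the inverse of speed, so a faster run has a smaller pace value.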


In [6]:
#Your code here
#distancevpace.head()

Unnamed: 0,Distance_km,Pace_min_km
0,8.702,5.455537
1,8.3593,6.830601
2,8.3725,6.83901
3,6.82,5.590965
4,2.7581,6.485084


### 2. Create a Function

Create a function called `distancevpace(link)`.

The function should:
- take a file link as an input
- load the data into a DataFrame
- calculate **distance (km)** and **pace (min/km)**
- return a new DataFrame containing these values


In [8]:
# Link to CSV file
file_id = "1ymbNqfv9s6YGZzN93HFKAhjg0Z5xZXV1"
url = f"https://drive.google.com/uc?id={file_id}"

def distancevpace(link):
  # Read the CSV into a DataFrame
  df = pd.read_csv(link)
  #your code here
  return #your new dataframe

Distance_Pace = distancevpace(url)
Distance_Pace.head()

Unnamed: 0,Distance_km,Pace_min_km
0,8.702,5.455537
1,8.3593,6.830601
2,8.3725,6.83901
3,6.82,5.590965
4,2.7581,6.485084


In [None]:
#Solution
# Link to CSV file
file_id = "1ymbNqfv9s6YGZzN93HFKAhjg0Z5xZXV1"
url = f"https://drive.google.com/uc?id={file_id}"

def distancevpace(link):
  # Read the CSV into a DataFrame
  df = pd.read_csv(link)
  # Build the result under a separate name so it does not shadow the function
  result = pd.DataFrame()
  # Convert distance to km (assumes the "distance" column is in metres)
  result["Distance_km"] = df["distance"] / 1000.0
  # Convert speed to pace in min/km (assumes "average_speed" is in m/s)
  result["Pace_min_km"] = (1000.0 / df["average_speed"]) / 60.0
  return result

Distance_Pace = distancevpace(url)
Distance_Pace.head()
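
One way to check a function like this without a network connection is to pass it a small in-memory CSV, since `pd.read_csv` accepts file-like objects as well as URLs. The sketch below repeats the solution's logic so that it runs on its own; the two-row CSV is hypothetical and its units (metres, metres per second) are assumed.

```python
import io
import pandas as pd

def distancevpace(link):
    # Same logic as the solution, repeated so this block runs standalone
    df = pd.read_csv(link)
    result = pd.DataFrame()
    result["Distance_km"] = df["distance"] / 1000.0
    result["Pace_min_km"] = (1000.0 / df["average_speed"]) / 60.0
    return result

# Hypothetical two-row CSV held in memory instead of downloaded
sample_csv = io.StringIO("distance,average_speed\n5000,2.5\n12000,4.0\n")
check = distancevpace(sample_csv)

# Inspect the derived columns
print(check.columns.tolist())
print(check)
```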

## Visualisations

### 1. Distance vs Pace (Scatter Plot)

Using Plotly Express, create a scatter plot to investigate the relationship between distance and pace.

In Week 1, we introduced bar charts using `matplotlib` (`plt.bar`).  
A rough comparison of the syntax between matplotlib and Plotly Express is shown below:

| Matplotlib        | Plotly Express              |
|-------------------|-----------------------------|
| `plt.bar(x, y)`   | `fig = px.bar(data, x, y)`  |
| `plt.show()`      | `fig.show()`                |
| arrays            | DataFrame + column names    |

For a scatter plot, use:

```python
px.scatter()
```


In [12]:
#Your code here

### 2. Format the Chart

Using `fig.update_traces()` and `fig.update_layout()` (or other appropriate methods), format the chart by:

- setting the colour of the markers  
- adjusting opacity
- defining the chart size  
- adding clear and descriptive axis titles

> **Recommendation:**  
> Always include **units** in axis labels unless there is a good reason not to.

A useful reference for formatting Plotly figures can be found here:  
https://plotly.com/python/figure-labels/


In [13]:
# Your code here.

### 3. Change the Tooltip

Use `fig.update_traces()` (or another appropriate method) to customise the **tooltip** that appears when hovering over points.

You can control the tooltip format using `hovertemplate`.

> **Tip:**  
> Tooltips should be clear, readable, and include units where appropriate.
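
Placeholders in a `hovertemplate`, such as `%{x:.2f}`, format numbers with d3-format specifiers, which are modelled on Python's format mini-language. As a rough preview, the effect of a specifier can therefore be tried in plain Python first. The sketch below uses values from the first rows of the sample output; the `|` separator only stands in for the `<br>` line break a real template would use.

```python
# Preview what common format specifiers do, using Python's own format()
distance_km = 8.3593
pace_min_km = 6.830601

print(format(distance_km, ".2f"))  # two decimal places
print(format(pace_min_km, ".1f"))  # one decimal place

# Roughly what "Distance: %{x:.2f} km<br>Pace: %{y:.2f} min/km" would display
preview = f"Distance: {distance_km:.2f} km | Pace: {pace_min_km:.2f} min/km"
print(preview)
```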


In [16]:
#Your code here

## Background and Tips

### Plotly

The Plotly Python library is a wrapper around Plotly.js, a JavaScript visualisation library built on top of D3.js, so plots are rendered by a web browser (or a browser-like viewer such as a notebook output cell). Because of this, Plotly visualisations are interactive by default, supporting features such as zooming, panning, and tooltips. Almost every part of a figure (traces, axes, layout, tooltips) can also be customised, which makes Plotly well suited to interactive data exploration.

### Plotly Express

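Plotly Express is a high-level interface built on top of Plotly. It creates complete figures from a DataFrame and column names in a single function call, which makes it quicker and simpler to use.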

In [None]:
# An example plotly plot
# Example DataFrame
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, 15, 13, 17]
})

# Create an interactive scatter plot
fig = px.scatter(
    df,
    x="x",
    y="y",
    title="Simple Plotly Scatter Example"
)

fig.show()

In [15]:
#Solution
fig = px.scatter(
    Distance_Pace, #Dataframe
    x="Distance_km", #Column
    y="Pace_min_km",
    title="Distance vs Pace" #Chart Title
)

fig.update_traces(
    marker=dict(
        color="rebeccapurple",
        opacity=0.5
    ),
    # hovertemplate sets the tooltip text. Numbers use d3-format specifiers,
    # and a limited set of HTML tags (such as <br>) is supported.
    hovertemplate=(
      "Distance: %{x:.2f} km<br>"  # %{x:.2f}: the x value (Distance_km) to 2 decimal places; <br> is an HTML line break
      "Pace: %{y:.2f} min/km"      # %{y:.2f}: the y value (Pace_min_km) to 2 decimal places
      "<extra></extra>"            # hides the secondary box that would show the trace name
    )
)

fig.update_layout(
    xaxis_title="Distance (km)",
    yaxis_title="Pace (min/km)",
    width=800,
    height=500
)

fig.show()