# GPX Cycling Data Analysis

This notebook analyzes cycling data from a GPX file, which contains GPS tracking information from a ride.

We'll explore:

1. Basic ride statistics (distance, duration, elevation)
2. Speed analysis over the course of the ride
3. Visualization of the route on a map
4. Elevation profile analysis
5. Segments of differing intensity


In [None]:
# Install required packages
!pip install gpxpy pandas matplotlib folium geopy haversine numpy

In [2]:
# Import required libraries
import gpxpy
import gpxpy.gpx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
from datetime import datetime
from haversine import haversine, Unit
import os

# Set plot style and default figure size (applies to all later figures)
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["figure.dpi"] = 80

In [None]:
# Load and parse the GPX file
gpx_file_path = "../Morning_Ride.gpx"

# Fail early with a clear message if the file is missing
if not os.path.exists(gpx_file_path):
    raise FileNotFoundError(f"File not found: {gpx_file_path}")
print(f"File found: {gpx_file_path}")

# Parse the GPX file
with open(gpx_file_path, "r") as gpx_file:
    gpx = gpxpy.parse(gpx_file)

print(f"Number of tracks: {len(gpx.tracks)}")
print(f"Number of segments in first track: {len(gpx.tracks[0].segments)}")
print(f"Number of points in first segment: {len(gpx.tracks[0].segments[0].points)}")
# gpxpy always defines `type` and `name` on a track; they are None when the file omits them
track = gpx.tracks[0]
print(f"Ride type: {track.type or 'not found in GPX data'}")
print(f"Ride name: {track.name or 'not found in GPX data'}")

File found: ../Morning_Ride.gpx
Number of tracks: 1
Number of segments in first track: 1
Number of points in first segment: 7918
Ride type: cycling
Ride name: Morning Ride
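
Hard-coding a single file location makes the notebook fragile when it is moved to another machine. A more portable pattern, sketched here with the hypothetical helper `find_first_existing` (not part of gpxpy), tries a list of candidate paths and raises if none exists, so the parse step never runs against a missing file.

```python
from pathlib import Path


def find_first_existing(candidates):
    """Return the first candidate path that is an existing file.

    Hypothetical helper for illustration; raises FileNotFoundError
    when none of the candidates exists.
    """
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"No GPX file found; tried: {tried}")
```

In this notebook it could be called as `find_first_existing(["../Morning_Ride.gpx", "Morning_Ride.gpx"])`, with the result passed to `open`.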


In [4]:
# Convert GPX data to a pandas DataFrame
def gpx_to_dataframe(gpx):
    """Convert GPX data to a pandas DataFrame"""
    points = []

    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                point_dict = {
                    "latitude": point.latitude,
                    "longitude": point.longitude,
                    "elevation": point.elevation,
                    "time": point.time,
                }

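                # Add extension data (heart rate, cadence, power, temperature)
                # when present. iter() visits the extension element itself as
                # well as its descendants, so values stored directly under
                # <extensions> are picked up too.
                for extension in point.extensions or []:
                    for child in extension.iter():
                        if not child.text or not child.text.strip():
                            continue  # container element without a value
                        if child.tag.endswith("hr"):
                            point_dict["heart_rate"] = float(child.text)
                        elif child.tag.endswith("cad"):
                            point_dict["cadence"] = float(child.text)
                        elif child.tag.endswith("power"):
                            point_dict["power"] = float(child.text)
                        elif child.tag.endswith("temp"):
                            point_dict["temperature"] = float(child.text)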

                points.append(point_dict)

    return pd.DataFrame(points)


# Convert to DataFrame
df = gpx_to_dataframe(gpx)

# Display the first few rows
print(f"Number of data points: {len(df)}")
df.head()

Number of data points: 7918


index,latitude,longitude,elevation,time
0,47.579221,-122.000199,128.3,2024-12-01 18:05:07+00:00
1,47.579163,-122.000361,128.4,2024-12-01 18:05:08+00:00
2,47.579192,-122.000404,128.5,2024-12-01 18:05:09+00:00
3,47.57922,-122.000462,128.5,2024-12-01 18:05:10+00:00
4,47.579246,-122.000518,128.5,2024-12-01 18:05:11+00:00
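
For readers unfamiliar with the format, GPX is plain XML: each `<trkpt>` carries `lat` and `lon` attributes plus `<ele>` and `<time>` children, and sensor data such as heart rate sits in vendor-specific `<extensions>`. The sketch below parses a small hand-written sample with the standard library only. The coordinates match the first two rows above, but the `hr` value and the Garmin-style `TrackPointExtension` wrapper are illustrative assumptions, not data from this ride.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the heart rate value is made up for illustration
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><name>Sample</name><trkseg>
    <trkpt lat="47.579221" lon="-122.000199">
      <ele>128.3</ele><time>2024-12-01T18:05:07Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:hr>120</gpxtpx:hr>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
    <trkpt lat="47.579163" lon="-122.000361">
      <ele>128.4</ele><time>2024-12-01T18:05:08Z</time>
    </trkpt>
  </trkseg></trk>
</gpx>"""

GPX_NS = "{http://www.topografix.com/GPX/1/1}"


def parse_trackpoints(gpx_text):
    """Return one dict per <trkpt>, adding heart_rate when an hr element exists."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iter(f"{GPX_NS}trkpt"):
        point = {
            "latitude": float(trkpt.get("lat")),
            "longitude": float(trkpt.get("lon")),
            "elevation": float(trkpt.findtext(f"{GPX_NS}ele")),
            "time": trkpt.findtext(f"{GPX_NS}time"),
        }
        # Extension tags are namespaced, so match on the tag suffix
        for element in trkpt.iter():
            if element.tag.endswith("}hr"):
                point["heart_rate"] = float(element.text)
        points.append(point)
    return points


points = parse_trackpoints(SAMPLE_GPX)
print(points)
```

The second point has no `heart_rate` key, which is why the DataFrame above only gains sensor columns when the file actually contains them.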


In [5]:
# Check data quality and add calculated fields
print("Basic data quality check:")
print(f"Missing values in DataFrame:\n{df.isnull().sum()}")

# Add calculated columns for time differences and distances
df["prev_lat"] = df["latitude"].shift(1)
df["prev_lon"] = df["longitude"].shift(1)
df["prev_time"] = df["time"].shift(1)
df["prev_elevation"] = df["elevation"].shift(1)

# Calculate time difference between points in seconds
df["time_diff_sec"] = (df["time"] - df["prev_time"]).dt.total_seconds()

# Calculate distance between consecutive points in kilometers
df["distance_km"] = df.apply(
    lambda row: haversine(
        (row["prev_lat"], row["prev_lon"]),
        (row["latitude"], row["longitude"]),
        unit=Unit.KILOMETERS,
    )
    if not pd.isna(row["prev_lat"])
    else 0,
    axis=1,
)

# Calculate elevation change
df["elevation_change"] = df["elevation"] - df["prev_elevation"]

# Calculate speed in km/h
df["speed_km_h"] = df.apply(
    lambda row: (row["distance_km"] / row["time_diff_sec"]) * 3600
    if row["time_diff_sec"] > 0
    else 0,
    axis=1,
)

# Calculate cumulative distance
df["cumulative_distance_km"] = df["distance_km"].cumsum()

# Calculate grade (elevation change / distance) in percent
df["grade_pct"] = df.apply(
    lambda row: (row["elevation_change"] / (row["distance_km"] * 1000)) * 100
    if row["distance_km"] > 0
    else 0,
    axis=1,
)

# Clean up unrealistic values (from GPS errors)
# Filter out unrealistic speeds (e.g., > 100 km/h for a bicycle)
speed_mask = df["speed_km_h"] > 100
if speed_mask.any():
    print(f"Found {speed_mask.sum()} points with unrealistic speeds (>100 km/h)")
    df.loc[speed_mask, "speed_km_h"] = df["speed_km_h"].median()

# Filter out unrealistic grades (e.g., > 40% or < -40%)
grade_mask = (df["grade_pct"] > 40) | (df["grade_pct"] < -40)
if grade_mask.any():
    print(f"Found {grade_mask.sum()} points with unrealistic grades (>40% or <-40%)")
    df.loc[grade_mask, "grade_pct"] = 0

# Calculate moving time (time when speed > 1 km/h)
moving_mask = df["speed_km_h"] > 1
moving_time_seconds = df.loc[moving_mask, "time_diff_sec"].sum()
moving_time_minutes = moving_time_seconds / 60

# Display basic statistics
print("\nBasic ride statistics:")
print(f"Total distance: {df['cumulative_distance_km'].max():.2f} km")
print(
    f"Total elevation gain: {df.loc[df['elevation_change'] > 0, 'elevation_change'].sum():.1f} m"
)
print(
    f"Total elevation loss: {abs(df.loc[df['elevation_change'] < 0, 'elevation_change'].sum()):.1f} m"
)
print(f"Start time: {df['time'].min()}")
print(f"End time: {df['time'].max()}")
total_time_seconds = (df["time"].max() - df["time"].min()).total_seconds()
print(f"Total time: {total_time_seconds / 60:.1f} minutes")
print(f"Moving time: {moving_time_minutes:.1f} minutes")
print(
    f"Average speed (moving): {(df['cumulative_distance_km'].max() / (moving_time_seconds / 3600)):.2f} km/h"
)
print(
    f"Average speed (total): {(df['cumulative_distance_km'].max() / (total_time_seconds / 3600)):.2f} km/h"
)
print(f"Maximum speed: {df['speed_km_h'].max():.2f} km/h")
print(f"Maximum grade: {df['grade_pct'].max():.1f}%")
print(f"Minimum grade: {df['grade_pct'].min():.1f}%")

# Show processed data
df.head()

Basic data quality check:
Missing values in DataFrame:
latitude     0
longitude    0
elevation    0
time         0
dtype: int64
Found 4 points with unrealistic grades (>40% or <-40%)

Basic ride statistics:
Total distance: 39.81 km
Total elevation gain: 386.6 m
Total elevation loss: 378.0 m
Start time: 2024-12-01 18:05:07+00:00
End time: 2024-12-01 20:29:04+00:00
Total time: 143.9 minutes
Moving time: 136.8 minutes
Average speed (moving): 17.47 km/h
Average speed (total): 16.59 km/h
Maximum speed: 84.68 km/h
Maximum grade: 39.8%
Minimum grade: -28.6%


index,latitude,longitude,elevation,time,prev_lat,prev_lon,prev_time,prev_elevation,time_diff_sec,distance_km,elevation_change,speed_km_h,cumulative_distance_km,grade_pct
0,47.579221,-122.000199,128.3,2024-12-01 18:05:07+00:00,,,NaT,,,0.0,,0.0,0.0,0.0
1,47.579163,-122.000361,128.4,2024-12-01 18:05:08+00:00,47.579221,-122.000199,2024-12-01 18:05:07+00:00,128.3,1.0,0.013757,0.1,49.524708,0.013757,0.72691
2,47.579192,-122.000404,128.5,2024-12-01 18:05:09+00:00,47.579163,-122.000361,2024-12-01 18:05:08+00:00,128.4,1.0,0.004561,0.1,16.419126,0.018318,2.192565
3,47.57922,-122.000462,128.5,2024-12-01 18:05:10+00:00,47.579192,-122.000404,2024-12-01 18:05:09+00:00,128.5,1.0,0.00535,0.0,19.259374,0.023668,0.0
4,47.579246,-122.000518,128.5,2024-12-01 18:05:11+00:00,47.57922,-122.000462,2024-12-01 18:05:10+00:00,128.5,1.0,0.005099,0.0,18.357342,0.028767,0.0
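
The `distance_km`, `speed_km_h`, and `grade_pct` columns can be checked by hand. The sketch below reimplements the haversine formula with the standard library and recomputes row 1 of the table above from the coordinates, elevations, and timestamps of rows 0 and 1. The mean Earth radius of 6371.0088 km is an assumption chosen to match what the `haversine` package appears to use by default; the results should land close to the tabulated 0.013757 km, 49.52 km/h, and 0.73%.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Rows 0 and 1 of the DataFrame, recorded one second apart
distance_km = haversine_km(47.579221, -122.000199, 47.579163, -122.000361)
speed_km_h = distance_km / 1.0 * 3600  # km per second -> km/h
grade_pct = (128.4 - 128.3) / (distance_km * 1000) * 100  # rise over run

print(f"{distance_km:.6f} km, {speed_km_h:.2f} km/h, {grade_pct:.2f}%")
```

A speed near 50 km/h in the first second of a ride is more plausibly GPS settling than real motion, which shows how sensitive per-point speeds are to position noise.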


In [6]:
# Visualize the route on a map
def create_route_map(df):
    """Create a map with the route, colored by speed"""
    # Calculate center point for the map
    center_lat = df["latitude"].mean()
    center_lon = df["longitude"].mean()

    # Create a map
    route_map = folium.Map(location=[center_lat, center_lon], zoom_start=13)

    # Add a route line colored by speed
    # Create speed categories for coloring
    speed_bins = [0, 10, 15, 20, 25, 30, 100]
    speed_labels = [
        "0-10 km/h",
        "10-15 km/h",
        "15-20 km/h",
        "20-25 km/h",
        "25-30 km/h",
        ">30 km/h",
    ]
    speed_colors = ["blue", "green", "yellow", "orange", "red", "purple"]

    # Assign each point to a speed bin; include_lowest=True keeps points
    # at exactly 0 km/h in the first bin
    speed_bin = pd.cut(
        df["speed_km_h"], bins=speed_bins, labels=False, include_lowest=True
    )

    # Draw one line per run of consecutive points in the same bin, so lines
    # only connect points that are adjacent in the ride
    # (assumes the default integer index produced by gpx_to_dataframe)
    run_id = (speed_bin != speed_bin.shift()).cumsum()
    for _, run in df.groupby(run_id):
        bin_idx = speed_bin.loc[run.index[0]]
        if pd.isna(bin_idx):
            continue  # speed outside the defined bins
        # Start at the previous point so adjacent runs join without gaps
        start = max(run.index[0] - 1, df.index[0])
        coords = df.loc[start : run.index[-1], ["latitude", "longitude"]]
        if len(coords) < 2:
            continue
        folium.PolyLine(
            locations=coords.values.tolist(),
            color=speed_colors[int(bin_idx)],
            weight=4,
            opacity=0.8,
            tooltip=speed_labels[int(bin_idx)],
        ).add_to(route_map)

    # Add markers for start and end
    folium.Marker(
        location=[df.iloc[0]["latitude"], df.iloc[0]["longitude"]],
        popup="Start",
        icon=folium.Icon(color="green", icon="play"),
    ).add_to(route_map)

    folium.Marker(
        location=[df.iloc[-1]["latitude"], df.iloc[-1]["longitude"]],
        popup="End",
        icon=folium.Icon(color="red", icon="stop"),
    ).add_to(route_map)

    # Add a legend
    legend_html = """
    <div style="position: fixed;
        bottom: 50px; left: 50px; width: 170px; height: 210px;
        border:2px solid grey; z-index:9999; font-size:14px;
        background-color:white;
        padding: 10px;
        border-radius: 5px;">
    <p style="margin-top: 0; margin-bottom: 10px;"><b>Speed Legend</b></p>
    """

    for i, category in enumerate(speed_labels):
        legend_html += f"""
        <div style="display:flex;align-items:center;margin-bottom:5px;">
            <div style="background-color:{speed_colors[i]};width:20px;height:10px;margin-right:5px;"></div>
            <span>{category}</span>
        </div>
        """

    legend_html += """</div>"""

    route_map.get_root().html.add_child(folium.Element(legend_html))

    return route_map


# Create and display the map
route_map = create_route_map(df)
route_map
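
When a route is colored by speed, each line should only join points that are adjacent in the ride. One way to guarantee that, sketched below with the standard library, is to split the per-point speed bins into runs of consecutive equal bins and draw one polyline per run. The bin edges mirror `speed_bins` above; `bin_index` and `consecutive_runs` are hypothetical helper names, the speeds are made-up sample values, and speeds are assumed not to exceed 100 km/h.

```python
from bisect import bisect_left
from itertools import groupby

SPEED_BINS = [0, 10, 15, 20, 25, 30, 100]  # same edges as speed_bins above


def bin_index(speed):
    """Index of the right-closed bin (lo, hi] containing speed; 0 maps to bin 0."""
    return max(bisect_left(SPEED_BINS, speed) - 1, 0)


def consecutive_runs(speeds):
    """Yield (bin, first_index, last_index) for each run of equal bins."""
    indexed = enumerate(bin_index(s) for s in speeds)
    for bin_id, group in groupby(indexed, key=lambda pair: pair[1]):
        positions = [i for i, _ in group]
        yield bin_id, positions[0], positions[-1]


sample_speeds = [8.0, 9.5, 12.0, 14.0, 9.0, 22.0, 23.5]  # made-up values
runs = list(consecutive_runs(sample_speeds))
print(runs)  # [(0, 0, 1), (1, 2, 3), (0, 4, 4), (3, 5, 6)]
```

Each run can then become a single `folium.PolyLine` spanning points `first_index - 1` through `last_index`, so neighboring runs share an endpoint and the route has no gaps.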

In [None]:
# Elevation analysis and visualization
plt.figure(figsize=(14, 8))

# Create a distance-based x-axis for smoother plots (every point)
x_distance = df["cumulative_distance_km"]

# Plot elevation vs distance
ax1 = plt.subplot(2, 1, 1)
plt.plot(x_distance, df["elevation"], "b-", linewidth=2)
plt.fill_between(
    x_distance, df["elevation"].min(), df["elevation"], alpha=0.2, color="skyblue"
)
plt.title("Elevation Profile", fontsize=14)
plt.ylabel("Elevation (m)", fontsize=12)
plt.grid(True, alpha=0.3)

# Add annotations for highest and lowest points
max_ele_idx = df["elevation"].idxmax()
min_ele_idx = df["elevation"].idxmin()

plt.scatter(
    df.loc[max_ele_idx, "cumulative_distance_km"],
    df.loc[max_ele_idx, "elevation"],
    color="red",
    zorder=5,
    s=80,
)
plt.annotate(
    f"Highest: {df.loc[max_ele_idx, 'elevation']:.1f}m",
    (df.loc[max_ele_idx, "cumulative_distance_km"], df.loc[max_ele_idx, "elevation"]),
    xytext=(10, 10),
    textcoords="offset points",
    fontsize=10,
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
)

plt.scatter(
    df.loc[min_ele_idx, "cumulative_distance_km"],
    df.loc[min_ele_idx, "elevation"],
    color="green",
    zorder=5,
    s=80,
)
plt.annotate(
    f"Lowest: {df.loc[min_ele_idx, 'elevation']:.1f}m",
    (df.loc[min_ele_idx, "cumulative_distance_km"], df.loc[min_ele_idx, "elevation"]),
    xytext=(10, 10),
    textcoords="offset points",
    fontsize=10,
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
)

# Plot grade vs distance
ax2 = plt.subplot(2, 1, 2, sharex=ax1)
plt.plot(x_distance, df["grade_pct"], "g-", linewidth=1.5, alpha=0.7)
plt.axhline(y=0, color="k", linestyle="-", alpha=0.2)
plt.fill_between(
    x_distance,
    0,
    df["grade_pct"],
    where=(df["grade_pct"] > 0),
    color="red",
    alpha=0.3,
    label="Uphill",
)
plt.fill_between(
    x_distance,
    0,
    df["grade_pct"],
    where=(df["grade_pct"] <= 0),
    color="green",
    alpha=0.3,
    label="Downhill",
)

plt.title("Grade Profile", fontsize=14)
plt.xlabel("Distance (km)", fontsize=12)
plt.ylabel("Grade (%)", fontsize=12)
plt.grid(True, alpha=0.3)
plt.legend()

# Add annotations for steepest uphill and downhill
max_grade_idx = df["grade_pct"].idxmax()
min_grade_idx = df["grade_pct"].idxmin()

plt.scatter(
    df.loc[max_grade_idx, "cumulative_distance_km"],
    df.loc[max_grade_idx, "grade_pct"],
    color="darkred",
    zorder=5,
    s=80,
)
plt.annotate(
    f"Steepest uphill: {df.loc[max_grade_idx, 'grade_pct']:.1f}%",
    (
        df.loc[max_grade_idx, "cumulative_distance_km"],
        df.loc[max_grade_idx, "grade_pct"],
    ),
    xytext=(10, -20),
    textcoords="offset points",
    fontsize=10,
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
)

plt.scatter(
    df.loc[min_grade_idx, "cumulative_distance_km"],
    df.loc[min_grade_idx, "grade_pct"],
    color="darkgreen",
    zorder=5,
    s=80,
)
plt.annotate(
    f"Steepest downhill: {df.loc[min_grade_idx, 'grade_pct']:.1f}%",
    (
        df.loc[min_grade_idx, "cumulative_distance_km"],
        df.loc[min_grade_idx, "grade_pct"],
    ),
    xytext=(10, 20),
    textcoords="offset points",
    fontsize=10,
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
)

plt.tight_layout()
plt.show()

# Calculate climb statistics
total_climbing_meters = df.loc[df["elevation_change"] > 0, "elevation_change"].sum()
total_descent_meters = abs(df.loc[df["elevation_change"] < 0, "elevation_change"].sum())

print("\nDetailed climbing statistics:")
print(f"Total climbing: {total_climbing_meters:.1f}m")
print(f"Total descent: {total_descent_meters:.1f}m")

# Climbing segments analysis (climbs > 20m vertical)
min_climb_threshold = 20  # meters of elevation gain
climbing = False
climb_start_idx = 0
climbs = []

# Process data to identify significant climbs
for i in range(1, len(df)):
    current_ele = df.iloc[i]["elevation"]

    if not climbing and i > 0 and df.iloc[i]["elevation_change"] > 0:
        # Start a new potential climb
        climbing = True
        climb_start_idx = i - 1
        climb_start_ele = df.iloc[climb_start_idx]["elevation"]

    elif climbing:
        # Check if still climbing or reached top
        if current_ele < df.iloc[i - 1]["elevation"] or i == len(df) - 1:
            # Climbing segment ended
            climbing = False
            climb_end_idx = i - 1 if i < len(df) - 1 else i
            climb_end_ele = df.iloc[climb_end_idx]["elevation"]

            # Calculate climb metrics
            elevation_gain = climb_end_ele - climb_start_ele

            # Only record significant climbs
            if elevation_gain > min_climb_threshold:
                start_distance = df.iloc[climb_start_idx]["cumulative_distance_km"]
                end_distance = df.iloc[climb_end_idx]["cumulative_distance_km"]
                distance = end_distance - start_distance

                # Skip if distance is too small (likely GPS error)
                if distance < 0.05:  # 50 meters
                    continue

                avg_grade = (elevation_gain / (distance * 1000)) * 100

                # Record the climb
                climbs.append(
                    {
                        "start_idx": climb_start_idx,
                        "end_idx": climb_end_idx,
                        "start_distance": start_distance,
                        "end_distance": end_distance,
                        "distance": distance,
                        "start_elevation": climb_start_ele,
                        "end_elevation": climb_end_ele,
                        "elevation_gain": elevation_gain,
                        "avg_grade": avg_grade,
                    }
                )

# Sort climbs by elevation gain
climbs.sort(key=lambda x: x["elevation_gain"], reverse=True)

# Display significant climbs
if climbs:
    print("\nSignificant climbs (>20m elevation gain):")
    for i, climb in enumerate(climbs[:5]):  # Show top 5 climbs
        print(
            f"Climb {i + 1}: {climb['elevation_gain']:.1f}m gain over {climb['distance']:.2f}km "
            + f"({climb['avg_grade']:.1f}% avg grade) at {climb['start_distance']:.2f}km into the ride"
        )
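
Summing every positive `elevation_change` counts sensor noise as climbing, which tends to overstate the total. A common mitigation, sketched here as an alternative rather than the method this notebook uses, is to count a gain only once the elevation has moved at least a threshold away from the last reference point. The function name, the 2 m threshold, and the sample profile are all illustrative assumptions.

```python
def thresholded_gain(elevations, threshold_m=2.0):
    """Total ascent in meters, ignoring rises and dips smaller than threshold_m."""
    if not elevations:
        return 0.0
    gain = 0.0
    reference = elevations[0]
    for elevation in elevations[1:]:
        delta = elevation - reference
        if delta >= threshold_m:
            # A rise large enough to count: add it and move the reference up
            gain += delta
            reference = elevation
        elif delta <= -threshold_m:
            # A real descent: move the reference down without adding gain
            reference = elevation
    return gain


# Made-up profile: small jitter around 100 m, then two real 3 m rises
noisy = [100.0, 100.4, 100.1, 100.5, 100.2, 103.0, 102.8, 106.0]
raw_gain = sum(max(b - a, 0.0) for a, b in zip(noisy, noisy[1:]))

print(f"raw: {raw_gain:.1f} m, thresholded: {thresholded_gain(noisy):.1f} m")
```

On this sample the raw sum includes the jitter while the thresholded total matches the net 6 m rise. The right threshold depends on the device and on whether elevation comes from GPS or a barometer.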

In [None]:
# Speed and heart rate analysis
plt.figure(figsize=(14, 8))

# Plot speed vs distance
ax1 = plt.subplot(2, 1, 1)
plt.plot(df["cumulative_distance_km"], df["speed_km_h"], "b-", linewidth=1.5, alpha=0.7)
plt.title("Speed Profile", fontsize=14)
plt.ylabel("Speed (km/h)", fontsize=12)
plt.grid(True, alpha=0.3)

# Highlight sections by speed
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] > 30),
    color="purple",
    alpha=0.3,
    label=">30 km/h",
)
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] > 25) & (df["speed_km_h"] <= 30),
    color="red",
    alpha=0.3,
    label="25-30 km/h",
)
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] > 20) & (df["speed_km_h"] <= 25),
    color="orange",
    alpha=0.3,
    label="20-25 km/h",
)
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] > 15) & (df["speed_km_h"] <= 20),
    color="yellow",
    alpha=0.3,
    label="15-20 km/h",
)
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] > 10) & (df["speed_km_h"] <= 15),
    color="green",
    alpha=0.3,
    label="10-15 km/h",
)
plt.fill_between(
    df["cumulative_distance_km"],
    0,
    df["speed_km_h"],
    where=(df["speed_km_h"] <= 10),
    color="blue",
    alpha=0.3,
    label="0-10 km/h",
)

# Annotate the fastest point
# Exclude stopped points (speed <= 1 km/h)
moving_df = df[df["speed_km_h"] > 1]
max_speed_idx = moving_df["speed_km_h"].idxmax()

plt.scatter(
    df.loc[max_speed_idx, "cumulative_distance_km"],
    df.loc[max_speed_idx, "speed_km_h"],
    color="red",
    zorder=5,
    s=80,
)
plt.annotate(
    f"Max: {df.loc[max_speed_idx, 'speed_km_h']:.1f} km/h",
    (
        df.loc[max_speed_idx, "cumulative_distance_km"],
        df.loc[max_speed_idx, "speed_km_h"],
    ),
    xytext=(10, 10),
    textcoords="offset points",
    fontsize=10,
    bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
)

plt.legend(loc="upper right")

# Check if heart rate data is available
if "heart_rate" in df.columns and not df["heart_rate"].isnull().all():
    # Plot heart rate vs distance
    ax2 = plt.subplot(2, 1, 2, sharex=ax1)
    plt.plot(df["cumulative_distance_km"], df["heart_rate"], "r-", linewidth=1.5)
    plt.title("Heart Rate Profile", fontsize=14)
    plt.xlabel("Distance (km)", fontsize=12)
    plt.ylabel("Heart Rate (bpm)", fontsize=12)
    plt.grid(True, alpha=0.3)

    # Define heart rate zones as percentages of an estimated maximum heart rate.
    # Ideally these come from a measured max HR or a threshold test; the common
    # "220 - age" estimate is used here for visualization.
    estimated_max_hr = 220 - 30  # Assuming 30-year-old rider, adjust as needed

    # Zone boundaries (approximate). Each value is the boundary between two
    # adjacent zones, which is how the plotting and time-in-zone code uses it.
    z2_upper = int(estimated_max_hr * 0.6)  # Zone 1 (<60%) / Zone 2 (60-70%)
    z3_upper = int(estimated_max_hr * 0.7)  # Zone 2 / Zone 3 (70-80%)
    z4_upper = int(estimated_max_hr * 0.8)  # Zone 3 / Zone 4 (80-90%)
    z5_lower = int(estimated_max_hr * 0.9)  # Zone 4 / Zone 5 (90-100%)

    # Fill heart rate zones
    plt.fill_between(
        df["cumulative_distance_km"],
        0,
        df["heart_rate"],
        where=(df["heart_rate"] >= z5_lower),
        color="darkred",
        alpha=0.3,
        label=f"Zone 5 (>{z5_lower} bpm)",
    )
    plt.fill_between(
        df["cumulative_distance_km"],
        0,
        df["heart_rate"],
        where=(df["heart_rate"] >= z4_upper) & (df["heart_rate"] < z5_lower),
        color="red",
        alpha=0.3,
        label=f"Zone 4 ({z4_upper}-{z5_lower} bpm)",
    )
    plt.fill_between(
        df["cumulative_distance_km"],
        0,
        df["heart_rate"],
        where=(df["heart_rate"] >= z3_upper) & (df["heart_rate"] < z4_upper),
        color="orange",
        alpha=0.3,
        label=f"Zone 3 ({z3_upper}-{z4_upper} bpm)",
    )
    plt.fill_between(
        df["cumulative_distance_km"],
        0,
        df["heart_rate"],
        where=(df["heart_rate"] >= z2_upper) & (df["heart_rate"] < z3_upper),
        color="yellow",
        alpha=0.3,
        label=f"Zone 2 ({z2_upper}-{z3_upper} bpm)",
    )
    plt.fill_between(
        df["cumulative_distance_km"],
        0,
        df["heart_rate"],
        where=(df["heart_rate"] < z2_upper),
        color="green",
        alpha=0.3,
        label=f"Zone 1 (<{z2_upper} bpm)",
    )

    # Add annotations for max heart rate
    max_hr_idx = df["heart_rate"].idxmax()
    plt.scatter(
        df.loc[max_hr_idx, "cumulative_distance_km"],
        df.loc[max_hr_idx, "heart_rate"],
        color="darkred",
        zorder=5,
        s=80,
    )
    plt.annotate(
        f"Max HR: {df.loc[max_hr_idx, 'heart_rate']:.0f} bpm",
        (
            df.loc[max_hr_idx, "cumulative_distance_km"],
            df.loc[max_hr_idx, "heart_rate"],
        ),
        xytext=(10, 10),
        textcoords="offset points",
        fontsize=10,
        bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8),
    )

    plt.legend(loc="upper right")

    # Calculate heart rate statistics
    avg_hr = df["heart_rate"].mean()
    max_hr = df["heart_rate"].max()

    # Calculate time in each zone
    time_in_z1 = df[df["heart_rate"] < z2_upper]["time_diff_sec"].sum() / 60  # minutes
    time_in_z2 = (
        df[(df["heart_rate"] >= z2_upper) & (df["heart_rate"] < z3_upper)][
            "time_diff_sec"
        ].sum()
        / 60
    )
    time_in_z3 = (
        df[(df["heart_rate"] >= z3_upper) & (df["heart_rate"] < z4_upper)][
            "time_diff_sec"
        ].sum()
        / 60
    )
    time_in_z4 = (
        df[(df["heart_rate"] >= z4_upper) & (df["heart_rate"] < z5_lower)][
            "time_diff_sec"
        ].sum()
        / 60
    )
    time_in_z5 = df[df["heart_rate"] >= z5_lower]["time_diff_sec"].sum() / 60

    total_time_mins = df["time_diff_sec"].sum() / 60

    print("\nHeart Rate Analysis:")
    print(f"Average Heart Rate: {avg_hr:.0f} bpm")
    print(f"Maximum Heart Rate: {max_hr:.0f} bpm")
    print(f"Estimated Max Heart Rate (age 30): {estimated_max_hr} bpm")
    print("\nTime in Heart Rate Zones:")
    print(
        f"Zone 1 (<{z2_upper} bpm): {time_in_z1:.1f} minutes ({time_in_z1 / total_time_mins * 100:.1f}%)"
    )
    print(
        f"Zone 2 ({z2_upper}-{z3_upper} bpm): {time_in_z2:.1f} minutes ({time_in_z2 / total_time_mins * 100:.1f}%)"
    )
    print(
        f"Zone 3 ({z3_upper}-{z4_upper} bpm): {time_in_z3:.1f} minutes ({time_in_z3 / total_time_mins * 100:.1f}%)"
    )
    print(
        f"Zone 4 ({z4_upper}-{z5_lower} bpm): {time_in_z4:.1f} minutes ({time_in_z4 / total_time_mins * 100:.1f}%)"
    )
    print(
        f"Zone 5 (>{z5_lower} bpm): {time_in_z5:.1f} minutes ({time_in_z5 / total_time_mins * 100:.1f}%)"
    )

else:
    # If no heart rate data, show speed distribution
    ax2 = plt.subplot(2, 1, 2)

    # Calculate speed zones (include_lowest=True keeps points at exactly 0 km/h)
    speed_counts = pd.cut(
        df["speed_km_h"],
        bins=[0, 5, 10, 15, 20, 25, 30, 100],
        labels=["0-5", "5-10", "10-15", "15-20", "20-25", "25-30", ">30"],
        include_lowest=True,
    )

    speed_distribution = speed_counts.value_counts().sort_index()
    speed_distribution = speed_distribution / len(df) * 100  # Convert to percentage

    # Create bar chart
    bars = plt.bar(speed_distribution.index, speed_distribution.values, color="skyblue")

    # Add data labels
    for bar in bars:
        height = bar.get_height()
        plt.text(
            bar.get_x() + bar.get_width() / 2.0,
            height + 0.5,
            f"{height:.1f}%",
            ha="center",
            va="bottom",
        )

    plt.title("Speed Distribution", fontsize=14)
    plt.xlabel("Speed Range (km/h)", fontsize=12)
    plt.ylabel("Percentage of Data Points (%)", fontsize=12)
    plt.grid(True, alpha=0.3, axis="y")

plt.tight_layout()
plt.show()

# Calculate speed statistics
avg_speed = df["speed_km_h"].mean()
avg_moving_speed = df.loc[df["speed_km_h"] > 1, "speed_km_h"].mean()
median_speed = df["speed_km_h"].median()
max_speed = df["speed_km_h"].max()

print("\nSpeed Analysis:")
print(f"Average Speed (including stops): {avg_speed:.1f} km/h")
print(f"Average Moving Speed (>1 km/h): {avg_moving_speed:.1f} km/h")
print(f"Median Speed: {median_speed:.1f} km/h")
print(f"Maximum Speed: {max_speed:.1f} km/h")

# Calculate time spent in different speed ranges
time_stopped = df[df["speed_km_h"] <= 1]["time_diff_sec"].sum() / 60  # minutes
time_slow = (
    df[(df["speed_km_h"] > 1) & (df["speed_km_h"] <= 10)]["time_diff_sec"].sum() / 60
)
time_medium = (
    df[(df["speed_km_h"] > 10) & (df["speed_km_h"] <= 20)]["time_diff_sec"].sum() / 60
)
time_fast = (
    df[(df["speed_km_h"] > 20) & (df["speed_km_h"] <= 30)]["time_diff_sec"].sum() / 60
)
time_very_fast = df[df["speed_km_h"] > 30]["time_diff_sec"].sum() / 60

total_time_mins = df["time_diff_sec"].sum() / 60

print("\nTime in Speed Ranges:")
print(
    f"Stopped (≤1 km/h): {time_stopped:.1f} minutes ({time_stopped / total_time_mins * 100:.1f}%)"
)
print(
    f"Slow (1-10 km/h): {time_slow:.1f} minutes ({time_slow / total_time_mins * 100:.1f}%)"
)
print(
    f"Medium (10-20 km/h): {time_medium:.1f} minutes ({time_medium / total_time_mins * 100:.1f}%)"
)
print(
    f"Fast (20-30 km/h): {time_fast:.1f} minutes ({time_fast / total_time_mins * 100:.1f}%)"
)
print(
    f"Very Fast (>30 km/h): {time_very_fast:.1f} minutes ({time_very_fast / total_time_mins * 100:.1f}%)"
)
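
Heart rate zones are consecutive bands of the estimated maximum heart rate, so classifying a reading is a lookup against four boundaries. The sketch below uses the 60/70/80/90% boundaries and the `220 - age` estimate from the cell above. `zone_boundaries`, `hr_zone`, and `time_in_zones` are hypothetical helpers, and the sample readings are made up because this ride contains no heart rate data.

```python
from bisect import bisect_right
from collections import Counter


def zone_boundaries(max_hr):
    """Lower bounds of zones 2-5 at 60/70/80/90% of max_hr (integer bpm)."""
    return [max_hr * pct // 100 for pct in (60, 70, 80, 90)]


def hr_zone(heart_rate, boundaries):
    """Return the zone number (1-5); a reading on a boundary joins the higher zone."""
    return bisect_right(boundaries, heart_rate) + 1


def time_in_zones(samples, boundaries):
    """Sum seconds per zone from (heart_rate, seconds) samples."""
    totals = Counter()
    for heart_rate, seconds in samples:
        totals[hr_zone(heart_rate, boundaries)] += seconds
    return dict(totals)


boundaries = zone_boundaries(220 - 30)  # assumes a 30-year-old rider
samples = [(105, 60), (120, 120), (140, 300), (155, 90), (175, 30)]  # made-up data

print(boundaries)                          # [114, 133, 152, 171]
print(time_in_zones(samples, boundaries))  # {1: 60, 2: 120, 3: 300, 4: 90, 5: 30}
```

Keeping the boundaries in one sorted list means the plot shading, the legend labels, and the time-in-zone totals cannot drift apart.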

# Ride Summary and Insights

Based on the analysis of the cycling data, here are the key insights:

## Basic Statistics

-   **Total Distance**: 39.81 km
-   **Elevation**: Gained 386.6 m and descended 378.0 m
-   **Duration**: Total ride time was 143.9 minutes, with 136.8 minutes of active riding
-   **Average Speed**: 17.47 km/h while moving (16.59 km/h including stops)

## Performance Highlights

-   **Maximum Speed**: 84.68 km/h at the fastest point. This figure comes from a single interval between GPS points and may be inflated by position noise
-   **Steepest Climb**: 39.8% grade at the steepest uphill section (grades beyond ±40% were treated as GPS errors and set to 0)
-   **Steepest Descent**: -28.6% grade at the steepest downhill section
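
Markdown cells do not evaluate Python expressions, so figures like these have to be formatted in code. As a sketch, the summary bullets can be built with f-strings from the computed statistics. The `stats` dictionary and its keys are hypothetical, and the numbers are copied from the output of the statistics cell.

```python
# Hypothetical container; values copied from the "Basic ride statistics" output
stats = {
    "distance_km": 39.81,
    "gain_m": 386.6,
    "loss_m": 378.0,
    "total_min": 143.9,
    "moving_min": 136.8,
}

# Average moving speed = distance / moving time in hours
avg_moving_speed = stats["distance_km"] / (stats["moving_min"] / 60)

summary_md = "\n".join(
    [
        f"-   **Total Distance**: {stats['distance_km']:.2f} km",
        f"-   **Elevation**: Gained {stats['gain_m']:.1f} m "
        f"and descended {stats['loss_m']:.1f} m",
        f"-   **Duration**: {stats['total_min']:.1f} minutes total, "
        f"{stats['moving_min']:.1f} minutes moving",
        f"-   **Average Speed**: {avg_moving_speed:.2f} km/h while moving",
    ]
)
print(summary_md)
```

In a notebook, passing `summary_md` to `IPython.display.Markdown` renders it as formatted text. Because the inputs here are already rounded, the computed average can differ from the notebook's 17.47 km/h in the last digit.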

## Route Characteristics

The route mixed flat sections, climbs, and descents: 386.6 m of climbing over 39.81 km works out to roughly 9.7 m of gain per kilometer. The elevation and grade profiles show where the hardest sections fall and where the most effort was required.

## Next Steps for Analysis

For future rides, we could:

1. Compare performance on specific segments over time
2. Analyze power output if power meter data becomes available
3. Correlate heart rate with climbing efforts to assess fitness
4. Examine cadence patterns to optimize pedaling efficiency
5. Track seasonal progress by comparing similar routes throughout the year
