# Strava data processing and exploration

This notebook loads the data fetched by *fetch_data.ipynb* and explores it.

The data consists of:

- Athlete profile
- All athlete activities, with:
    - Base activity data such as distance, average pace, and average HR
    - Advanced data (streams) for HR, pace, etc.


---



## Dependencies and configuration

Import the required dependencies and configure the environment.

In [1]:
import json
import math
import time
import os
from pathlib import Path
from typing import Dict, Any
import threading
import requests
import polars as pl
import numpy as np

from utils import config

Loaded STRAVA_CLIENT_ID: 176459
Loaded STRAVA_CLIENT_SECRET: [REDACTED]
Loaded STRAVA_REFRESH_TOKEN: [REDACTED]
Loaded STRAVA_ACCESS_TOKEN: [REDACTED]
Loaded STRAVA_USER_AUTHORIZATION_CODE: [REDACTED]
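
Because the configuration step reports which credentials were loaded, a notebook that is shared or committed may be safer if those values are masked before printing. The following is a minimal sketch, not the behavior of `utils.config`: `mask_secret` is a hypothetical helper, and the value passed to it is a placeholder rather than a real credential.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters of `value` with '*'."""
    if visible <= 0 or len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Placeholder value for illustration only (assumption, not a real secret)
print(f"Loaded STRAVA_CLIENT_SECRET: {mask_secret('example-secret-value')}")
```

Keeping the last few characters visible is often enough to tell which credential was loaded without exposing it.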


In [2]:
def load_file(path: Path) -> pl.DataFrame:
    """Load a JSON array, NDJSON, or single-object JSON file into a DataFrame."""
    p = str(path)
    # Sniff the first non-whitespace character to choose the parser
    try:
        with open(p, 'r', encoding='utf-8') as f:
            head = f.read(2048).lstrip()
    except Exception:
        # Unreadable file: leave head empty and let the parsers below raise
        head = ""

    if head.startswith("["):
        # JSON array of objects
        return pl.read_json(p)

    # Try NDJSON (one JSON object per line) first; fall back to standard JSON
    try:
        return pl.read_ndjson(p)
    except Exception:
        return pl.read_json(p)
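
To make the sniffing step concrete, the sketch below writes the same records in both layouts and applies the same first-character check as `load_file`. It uses only the standard library; `sniff_json_layout`, the file names, and the sample records are made up for illustration and are not part of the notebook's data.

```python
import json
import tempfile
from pathlib import Path


def sniff_json_layout(path: Path) -> str:
    """Return 'array' if the file starts with '[', else 'ndjson_or_object'."""
    with open(path, "r", encoding="utf-8") as f:
        head = f.read(2048).lstrip()
    return "array" if head.startswith("[") else "ndjson_or_object"


# Made-up sample records (assumption, not real Strava data)
records = [{"id": 1, "distance": 5000.0}, {"id": 2, "distance": 10250.5}]

with tempfile.TemporaryDirectory() as tmp:
    array_path = Path(tmp) / "activities_array.json"
    ndjson_path = Path(tmp) / "activities.ndjson"

    # Layout 1: a single JSON array holding every record
    array_path.write_text(json.dumps(records), encoding="utf-8")
    # Layout 2: NDJSON, one JSON object per line
    ndjson_path.write_text(
        "\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
    )

    array_layout = sniff_json_layout(array_path)
    ndjson_layout = sniff_json_layout(ndjson_path)

print(array_layout, ndjson_layout)
```

A file that starts with `{` could be either NDJSON or a single JSON object, which is why `load_file` tries the NDJSON parser first and only then falls back to the standard JSON parser.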

## Data Exploration

### 1. Basic athlete data

In [3]:
athlete_profile = load_file(config.ATHLETE_DATA)

print(athlete_profile)

shape: (1, 33)
┌─────┬──────────────┬──────────────┬──────────┬───┬─────────┬─────────────┬─────────┬─────────────┐
│ sex ┆ profile_medi ┆ postable_clu ┆ id       ┆ … ┆ country ┆ athlete_typ ┆ premium ┆ badge_type_ │
│ --- ┆ um           ┆ bs_count     ┆ ---      ┆   ┆ ---     ┆ e           ┆ ---     ┆ id          │
│ str ┆ ---          ┆ ---          ┆ i64      ┆   ┆ str     ┆ ---         ┆ bool    ┆ ---         │
│     ┆ str          ┆ i64          ┆          ┆   ┆         ┆ i64         ┆         ┆ i64         │
╞═════╪══════════════╪══════════════╪══════════╪═══╪═════════╪═════════════╪═════════╪═════════════╡
│ M   ┆ https://dgal ┆ 2            ┆ 10097604 ┆ … ┆ France  ┆ 1           ┆ true    ┆ 1           │
│     ┆ ywyr863hv.cl ┆              ┆          ┆   ┆         ┆             ┆         ┆             │
│     ┆ oudfro…      ┆              ┆          ┆   ┆         ┆             ┆         ┆             │
└─────┴──────────────┴──────────────┴──────────┴───┴─────────┴─────────────┴─────────┴─────────────┘

### 2. Base activity statistics

Some statistics about the athlete's activities over time.

In [6]:
activities = load_file(config.ATHLETE_ACTIVITIES)

print(activities.shape)
print(activities.head())

(709, 57)
shape: (5, 57)
┌────────────┬────────────┬───────────┬───────────┬───┬───────────┬─────────┬───────────┬──────────┐
│ kilojoules ┆ from_accep ┆ moving_ti ┆ average_t ┆ … ┆ elev_high ┆ flagged ┆ comment_c ┆ elev_low │
│ ---        ┆ ted_tag    ┆ me        ┆ emp       ┆   ┆ ---       ┆ ---     ┆ ount      ┆ ---      │
│ f64        ┆ ---        ┆ ---       ┆ ---       ┆   ┆ f64       ┆ bool    ┆ ---       ┆ f64      │
│            ┆ bool       ┆ i64       ┆ i64       ┆   ┆           ┆         ┆ i64       ┆          │
╞════════════╪════════════╪═══════════╪═══════════╪═══╪═══════════╪═════════╪═══════════╪══════════╡
│ null       ┆ false      ┆ 10228     ┆ null      ┆ … ┆ 227.2     ┆ false   ┆ 0         ┆ 58.8     │
│ null       ┆ false      ┆ 23740     ┆ null      ┆ … ┆ 1219.2    ┆ false   ┆ 0         ┆ 809.8    │
│ null       ┆ false      ┆ 2529      ┆ null      ┆ … ┆ 103.2     ┆ false   ┆ 0         ┆ 11.6     │
│ 1355.8     ┆ false      ┆ 6284      ┆ 19        ┆ … ┆ 270.2     