# Strava data WIP

## Purpose

The purpose of this notebook is to visualize data imported from Strava.

## Table of contents
1. [Methodology](#Methodology)
2. [WIP - Improvements](#WIP-Improvements)
3. [Results](#Results)
4. [Library Import](#Library_Import)
5. [Data Import](#Data_Import)
6. [Data Processing](#Data_Processing)

## Methodology <a name="Methodology"></a>

Collection of the data using Strava API.

Creation of the dataframe.

Exploratory Data Analysis (EDA)

## WIP - improvements <a name="WIP-Improvements"></a>

Use this section only if the notebook is not final.

Notable TODOs:

TODO 1: Cleaning data (rows and columns)

TODO 2: Fix the pace calculation

TODO 3: Half Cooper test estimation using the fastest 6 minutes of a run

## Results <a name="Results"></a>

Describe and comment on the most important results. TODO

## Library import <a name="Library_Import"></a>

In [2]:
%%capture
# discards the output of the cell (a cell magic must be the first line of the cell)

# install packages
!pip install plotly-express
!pip install plotly-calplot

In [3]:
# strava API
import requests
import urllib3

# data analysis and wrangling
import pandas as pd
import numpy as np
import random as rnd

# visualization
import plotly_express as px
import plotly.io as pio #for templates
pio.templates.default = "plotly_white" #set a default plotly templates
from plotly_calplot import calplot

# widgets
from ipywidgets import interact

## Data import <a name="Data_Import"></a>

#### Replace in the cell below: client_id, client_secret, refresh_token

In [4]:
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

auth_url = "https://www.strava.com/oauth/token"
activites_url = "https://www.strava.com/api/v3/athlete/activities"

payload = {
    'client_id': "XXXX",
    'client_secret': 'XXXX',
    'refresh_token': 'XXXX',
    'grant_type': "refresh_token",
    'f': 'json'
}

print("Requesting Token...\n")
res = requests.post(auth_url, data=payload, verify=False)
res.raise_for_status()  # stop early if the token request is rejected
access_token = res.json()['access_token']
#print("Access Token = {}\n".format(access_token))
header = {'Authorization': 'Bearer ' + access_token}


print('Downloading data, this can take a few seconds......\n')
requests_page_number = 1
all_activities = []
while True:
    param = {'per_page': 200, 'page': requests_page_number}
    my_dataset = requests.get(activites_url, headers=header, params=param).json()

    # stop when a page comes back empty: there are no more activities
    if len(my_dataset) == 0:
        break

    # append this page's activities (extend also works on an empty list)
    all_activities.extend(my_dataset)
    requests_page_number += 1

print('DONE !!\n', len(all_activities), ' activities downloaded.')

activities = pd.json_normalize(all_activities)
activities.to_csv('activities.csv')

Requesting Token...

Downloading data, this can take a few seconds......

DONE !!
 685  activities downloaded.
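
The page-by-page download above can be separated from the HTTP call, which makes the stopping rule easy to check without credentials. This is a minimal sketch, not the notebook's own code: `iter_pages`, `fetch_page` and `fake_pages` are hypothetical names, and the fake pages stand in for the API responses.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[dict]], start: int = 1) -> Iterator[dict]:
    """Yield items page by page until fetch_page returns an empty list."""
    page = start
    while True:
        items = fetch_page(page)
        if not items:
            break
        yield from items
        page += 1


# Stand-in for the API (assumption): two pages of fake activities, then an empty page.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}], 3: []}
all_items = list(iter_pages(lambda page: fake_pages.get(page, [])))
print(len(all_items), "activities collected")
```

With the real API, `fetch_page` could wrap the `requests.get(...)` call from the cell above, passing `{'per_page': 200, 'page': page}` as parameters and calling `raise_for_status()` on the response before returning `.json()`.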


## Data Processing <a name="Data_Processing"></a>

### Data Check

In [5]:
df = pd.read_csv('activities.csv')

In [6]:
df.columns

Index(['Unnamed: 0', 'resource_state', 'name', 'distance', 'moving_time',
       'elapsed_time', 'total_elevation_gain', 'type', 'sport_type',
       'workout_type', 'id', 'start_date', 'start_date_local', 'timezone',
       'utc_offset', 'location_city', 'location_state', 'location_country',
       'achievement_count', 'kudos_count', 'comment_count', 'athlete_count',
       'photo_count', 'trainer', 'commute', 'manual', 'private', 'visibility',
       'flagged', 'gear_id', 'start_latlng', 'end_latlng', 'average_speed',
       'max_speed', 'average_temp', 'has_heartrate', 'heartrate_opt_out',
       'display_hide_heartrate_option', 'elev_high', 'elev_low', 'upload_id',
       'upload_id_str', 'external_id', 'from_accepted_tag', 'pr_count',
       'total_photo_count', 'has_kudoed', 'athlete.id',
       'athlete.resource_state', 'map.id', 'map.summary_polyline',
       'map.resource_state'],
      dtype='object')

In [7]:
df.head()

Unnamed: 0.1,Unnamed: 0,resource_state,name,distance,moving_time,elapsed_time,total_elevation_gain,type,sport_type,workout_type,...,external_id,from_accepted_tag,pr_count,total_photo_count,has_kudoed,athlete.id,athlete.resource_state,map.id,map.summary_polyline,map.resource_state
0,0,2,Sortie vélo en soirée,9636.5,1392,1421,27.0,Ride,Ride,,...,garmin_ping_254634430544,False,0,0,False,45535233,1,a8364559713,_rntHuceRx@{@vAcBNGTR`FnJV`@hA|BJDPf@pAbC|AnCj...,2
1,1,2,Course à pied du midi,8508.7,2682,2761,48.7,Run,Run,0.0,...,a669831f-e97f-4bdf-8a35-af156b173903-activity.fit,False,1,0,False,45535233,1,a8363046289,,2
2,2,2,Sortie vélo le matin,8698.6,1296,1559,53.0,Ride,Ride,,...,garmin_ping_254634428603,False,1,0,False,45535233,1,a8364559632,ibdtH{cyQsBmDuAwBaHiLyCyEgB_Dm@y@We@aBgCQ_@K]G...,2
3,3,2,Sortie vélo dans l'après-midi,9739.3,1457,1562,22.0,Ride,Ride,,...,garmin_ping_254032765411,False,0,0,False,45535233,1,a8342922887,qvntHs~dRbHeITGLFnD~G~B~Eh@`AL`@PJ~A|C|@|AhAvA...,2
4,4,2,Sortie vélo le matin,9512.8,1430,1550,49.0,Ride,Ride,,...,garmin_ping_253975251879,False,1,0,False,45535233,1,a8340571349,kwbtHkrwQAGFKBk@CUKQ_B_BeHyG_AgAyAuAeAgAc@YwCs...,2


In [8]:
df.describe()

Unnamed: 0.1,Unnamed: 0,resource_state,distance,moving_time,elapsed_time,total_elevation_gain,workout_type,id,utc_offset,location_city,...,average_temp,elev_high,elev_low,upload_id,upload_id_str,pr_count,total_photo_count,athlete.id,athlete.resource_state,map.resource_state
count,685.0,685.0,685.0,685.0,685.0,685.0,344.0,685.0,685.0,0.0,...,349.0,681.0,681.0,681.0,681.0,685.0,685.0,685.0,685.0,685.0
mean,342.0,2.0,12807.505401,2303.159124,3924.543066,53.592701,7.616279,6599827000.0,5717.956204,,...,10.532951,51.635389,20.096035,7034213000.0,7034213000.0,0.927007,0.018978,45535233.0,1.0,2.0
std,197.886752,0.0,13004.269966,2262.554287,13356.887869,74.474964,4.267085,1512274000.0,1772.989817,,...,6.728493,26.351672,19.384323,1628316000.0,1628316000.0,2.912052,0.146864,0.0,0.0,0.0
min,0.0,2.0,48.7,25.0,97.0,0.0,0.0,2631713000.0,3600.0,,...,-5.0,-82.0,-88.0,2791704000.0,2791704000.0,0.0,0.0,45535233.0,1.0,2.0
25%,171.0,2.0,8738.4,1358.0,1492.0,27.0,10.0,6209339000.0,3600.0,,...,5.0,42.4,18.2,6596373000.0,6596373000.0,0.0,0.0,45535233.0,1.0,2.0
50%,342.0,2.0,9777.9,1724.0,1993.0,48.8,10.0,6868216000.0,7200.0,,...,10.0,55.8,21.4,7307050000.0,7307050000.0,0.0,0.0,45535233.0,1.0,2.0
75%,513.0,2.0,14662.8,2533.0,2825.0,61.0,10.0,7723532000.0,7200.0,,...,15.0,57.4,25.2,8248689000.0,8248689000.0,0.0,0.0,45535233.0,1.0,2.0
max,684.0,2.0,158904.0,20179.0,315466.0,1218.0,10.0,8364560000.0,7200.0,,...,33.0,202.4,157.4,8971132000.0,8971132000.0,38.0,2.0,45535233.0,1.0,2.0


In [9]:
# drop unwanted columns
drop_columns = ['workout_type', 'utc_offset', 'location_city', 'location_state', 'location_country', 'trainer', 'commute', 'manual', 'gear_id', 'has_heartrate', 'heartrate_opt_out', 'display_hide_heartrate_option', 'from_accepted_tag']

df = df.drop(columns=drop_columns)

In [10]:
# set option to display all the columns of the dataframe
pd.set_option('display.max_columns', None)
df.describe()

Unnamed: 0.1,Unnamed: 0,resource_state,distance,moving_time,elapsed_time,total_elevation_gain,id,achievement_count,kudos_count,comment_count,athlete_count,photo_count,average_speed,max_speed,average_temp,elev_high,elev_low,upload_id,upload_id_str,pr_count,total_photo_count,athlete.id,athlete.resource_state,map.resource_state
count,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,685.0,349.0,681.0,681.0,681.0,681.0,685.0,685.0,685.0,685.0,685.0
mean,342.0,2.0,12807.505401,2303.159124,3924.543066,53.592701,6599827000.0,2.433577,2.341606,0.042336,1.351825,0.0,5.829232,10.047669,10.532951,51.635389,20.096035,7034213000.0,7034213000.0,0.927007,0.018978,45535233.0,1.0,2.0
std,197.886752,0.0,13004.269966,2262.554287,13356.887869,74.474964,1512274000.0,5.916355,1.572825,0.269753,0.65807,0.0,2.811245,2.691388,6.728493,26.351672,19.384323,1628316000.0,1628316000.0,2.912052,0.146864,0.0,0.0,0.0
min,0.0,2.0,48.7,25.0,97.0,0.0,2631713000.0,0.0,0.0,0.0,1.0,0.0,0.919,0.0,-5.0,-82.0,-88.0,2791704000.0,2791704000.0,0.0,0.0,45535233.0,1.0,2.0
25%,171.0,2.0,8738.4,1358.0,1492.0,27.0,6209339000.0,0.0,2.0,0.0,1.0,0.0,5.233,9.159,5.0,42.4,18.2,6596373000.0,6596373000.0,0.0,0.0,45535233.0,1.0,2.0
50%,342.0,2.0,9777.9,1724.0,1993.0,48.8,6868216000.0,0.0,2.0,0.0,1.0,0.0,6.114,10.2,10.0,55.8,21.4,7307050000.0,7307050000.0,0.0,0.0,45535233.0,1.0,2.0
75%,513.0,2.0,14662.8,2533.0,2825.0,61.0,7723532000.0,2.0,3.0,0.0,2.0,0.0,6.648,11.277,15.0,57.4,25.2,8248689000.0,8248689000.0,0.0,0.0,45535233.0,1.0,2.0
max,684.0,2.0,158904.0,20179.0,315466.0,1218.0,8364560000.0,69.0,9.0,3.0,5.0,0.0,59.0,25.3,33.0,202.4,157.4,8971132000.0,8971132000.0,38.0,2.0,45535233.0,1.0,2.0


In [11]:
df.dtypes

Unnamed: 0                  int64
resource_state              int64
name                       object
distance                  float64
moving_time                 int64
elapsed_time                int64
total_elevation_gain      float64
type                       object
sport_type                 object
id                          int64
start_date                 object
start_date_local           object
timezone                   object
achievement_count           int64
kudos_count                 int64
comment_count               int64
athlete_count               int64
photo_count                 int64
private                      bool
visibility                 object
flagged                      bool
start_latlng               object
end_latlng                 object
average_speed             float64
max_speed                 float64
average_temp              float64
elev_high                 float64
elev_low                  float64
upload_id                 float64
upload_id_str 

In [12]:
# Keep rows where 'distance' >= 4000 and 'type' == 'Ride'
# df = df[(df['distance'] >= 4000) & (df['type'] == 'Ride')]

### Date & time

In [13]:
df['start_date_local'].head()

0    2023-01-09T18:23:35Z
1    2023-01-09T12:23:47Z
2    2023-01-09T08:24:59Z
3    2023-01-05T17:37:05Z
4    2023-01-05T08:38:22Z
Name: start_date_local, dtype: object

In [14]:
# converts the start_date_local column to a datetime format
df['start_date_local'] = pd.to_datetime(df['start_date_local'])
# creates 2 new columns for date and time
df['start_date'] = df['start_date_local'].dt.date
df['start_time'] = df['start_date_local'].dt.time

# converts start_date back to a datetime format (start_time stays as datetime.time objects)
df['start_date'] = pd.to_datetime(df['start_date'])
#df['start_time'] = pd.to_datetime(df['start_time'])

df[['start_date', 'start_time']].head()

Unnamed: 0,start_date,start_time
0,2023-01-09,18:23:35
1,2023-01-09,12:23:47
2,2023-01-09,08:24:59
3,2023-01-05,17:37:05
4,2023-01-05,08:38:22
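
The date and time split can also be done without leaving the datetime type. The sketch below is one possible alternative, shown on two sample values taken from the output above. It assumes that the trailing `Z` should be ignored because `start_date_local` appears to hold local wall-clock time; `raw`, `local`, `start_day` and `start_clock` are illustrative names.

```python
import pandas as pd

# Two sample timestamps in the same format as start_date_local
raw = pd.Series(["2023-01-09T18:23:35Z", "2023-01-05T08:38:22Z"])

# Parse, then drop the timezone while keeping the wall-clock time (assumption: Z is not meaningful here)
local = pd.to_datetime(raw, utc=True).dt.tz_localize(None)

# Midnight-normalized date, still datetime64, so .dt accessors keep working
start_day = local.dt.normalize()

# Time of day as text, convenient for display
start_clock = local.dt.strftime("%H:%M:%S")

print(pd.DataFrame({"start_date": start_day, "start_time": start_clock}))
```

Keeping `start_date` as `datetime64` from the start avoids the round trip through `.dt.date` and a second `pd.to_datetime` call.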


### Speed conversion (km/h & min/km)

In [15]:
df[['average_speed', 'max_speed']].head()

Unnamed: 0,average_speed,max_speed
0,6.923,10.369
1,3.173,7.04
2,6.712,10.29
3,6.684,10.767
4,6.652,11.162


In [16]:
# Conversion of average speed and max speed from m/s to km/h
df['average_speed'] = df['average_speed'] * 3.6
df['max_speed'] = df['max_speed'] * 3.6

In [17]:
# Delete 'Ride' rows where the value in the 'average_speed' column is greater than 100 km/h
df.drop(df[(df['average_speed'] > 100) & (df['type'] == 'Ride')].index, inplace=True)

In [18]:
# New column for average speed in min/km
df['asp'] = round(60 / df['average_speed'], 2)

# New column with the decimal part of the pace multiplied by 60, i.e. the seconds of min.sec/km
df['asp_deci'] = (df['asp'] - np.fix(df['asp'])) * 60

# Converts the two columns as integer
df['asp_deci'] = df['asp_deci'].astype(int)
df['asp'] = df['asp'].astype(int)

# New column asp + asp deci to get the final pace value
df['average_speed_pace'] = df['asp'].astype(str) + '.' + df['asp_deci'].astype(str)
df['average_speed_pace'] = df['average_speed_pace'].astype(float)

# Delete the asp and asp_deci column
df.drop(['asp', 'asp_deci'], axis=1, inplace=True)
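
The cell above joins minutes and seconds with a dot, so a pace of 3 min 07 s is written as `3.7` and reads like 3 min 70 s. This is likely the issue behind TODO 2. A possible correction, sketched here under the assumption that a zero-padded `m:ss` string is an acceptable output format, converts the speed to whole seconds per km first; `pace_min_per_km` is a hypothetical helper name.

```python
import pandas as pd


def pace_min_per_km(speed_kmh: float) -> str:
    """Format a speed in km/h as a pace string 'm:ss' per km."""
    # 3600 s per hour divided by km per hour gives seconds per km
    total_seconds = round(3600 / speed_kmh)
    minutes, seconds = divmod(total_seconds, 60)
    # Zero-pad the seconds so that 7 s is shown as ':07'
    return f"{minutes}:{seconds:02d}"


# Three average speeds (km/h) taken from the notebook's data
speeds = pd.Series([24.9228, 11.4228, 19.2024])
paces = speeds.apply(pace_min_per_km)
print(paces.tolist())  # ['2:24', '5:15', '3:07']
```

The function raises `ZeroDivisionError` for a speed of 0, so activities without movement would need to be filtered out beforehand. If a numeric column is needed for plotting, seconds per km (`3600 / speed_kmh`) may be easier to work with than a `min.sec` float.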

In [19]:
df[['start_date', 'type', 'average_speed', 'average_speed_pace']]

Unnamed: 0,start_date,type,average_speed,average_speed_pace
0,2023-01-09,Ride,24.9228,2.24
1,2023-01-09,Run,11.4228,5.15
2,2023-01-09,Ride,24.1632,2.28
3,2023-01-05,Ride,24.0624,2.29
4,2023-01-05,Ride,23.9472,2.30
...,...,...,...,...
680,2019-08-20,Ride,19.2024,3.70
681,2019-08-19,Ride,20.1996,2.58
682,2019-08-19,Ride,17.9964,3.19
683,2019-08-19,Ride,20.8476,2.52


In [20]:
fig = px.scatter(df, x='start_date', y='average_speed', color='type', title='Strava activities average speed (km/h)')

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)

#fig.write_html("Plots/Strava activities average speed (kmh).html")

### Distance check

In [21]:
# Delete 'Ride' rows where the value in the 'distance' column is less than 4000 m
df.drop(df[(df['distance'] < 4000) & (df['type'] == 'Ride')].index, inplace=True)
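
The same filter can be written as a boolean mask, which avoids `inplace=True` and keeps the condition readable. This is a small sketch on made-up rows; `rides`, `too_short` and `kept` are illustrative names, and the 4000 m threshold is the one used in the cell above.

```python
import pandas as pd

# Made-up sample: two long rides, one short ride, one short run
rides = pd.DataFrame({
    "type": ["Ride", "Ride", "Run", "Ride"],
    "distance": [9636.5, 1200.0, 850.0, 4000.0],
})

# True for the rows to discard: rides shorter than 4000 m
too_short = (rides["type"] == "Ride") & (rides["distance"] < 4000)

# Keep everything else; runs are untouched whatever their distance
kept = rides[~too_short]
print(kept)
```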

In [22]:
fig = px.scatter(df, x='start_date', y='distance', color='type', title='Strava activities distance (m)')

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)

#fig.write_html("Plots/Strava activities distance (m).html")

### Activities counter

#### Daily

In [23]:
# Create the activities count df with the start_date and type columns
# (.copy() makes it independent of df, which avoids pandas' SettingWithCopyWarning)
df_activities_count = df[['start_date', 'type']].copy()

# Add a counts column: number of activities per (start_date, type) pair
df_activities_count['counts'] = df_activities_count.groupby(['start_date', 'type'])['start_date'].transform('count')

# Drop the duplicate rows so that each (start_date, type) pair appears once
df_activities_count = df_activities_count.drop_duplicates()

df_activities_count



Unnamed: 0,start_date,type,counts
0,2023-01-09,Ride,2
1,2023-01-09,Run,1
3,2023-01-05,Ride,2
5,2023-01-03,Run,1
6,2022-12-29,Ride,2
...,...,...,...
644,2019-09-04,Ride,2
646,2019-09-03,Ride,1
647,2019-09-02,Ride,1
648,2019-08-31,Ride,1
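
The count per day and type can also be obtained in one step with `groupby(...).size()`. The sketch below uses a made-up `sample` frame; note that, unlike the cell above, the result is sorted by date in ascending order and gets a fresh index.

```python
import pandas as pd

# Made-up sample: three activities on one day, one on another
sample = pd.DataFrame({
    "start_date": pd.to_datetime(["2023-01-09", "2023-01-09", "2023-01-09", "2023-01-05"]),
    "type": ["Ride", "Run", "Ride", "Ride"],
})

# One row per (start_date, type) pair with the number of activities
daily_counts = (
    sample.groupby(["start_date", "type"])
    .size()
    .reset_index(name="counts")
)
print(daily_counts)
```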


In [24]:
fig = px.bar(df_activities_count, x="start_date", y="counts", color="type", title="Activities Counter")

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)

#fig.write_html("Plots/Activities Counter Daily.html")

In [25]:
df_activities_count.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 362 entries, 0 to 672
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   start_date  362 non-null    datetime64[ns]
 1   type        362 non-null    object        
 2   counts      362 non-null    int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 11.3+ KB


#### Calendar Heatmap

In [26]:
df['start_date'].value_counts()

2022-02-02    4
2021-12-13    4
2022-05-31    4
2023-01-09    3
2022-01-10    3
             ..
2022-02-18    1
2021-11-11    1
2022-10-01    1
2022-07-08    1
2022-04-11    1
Name: start_date, Length: 312, dtype: int64

In [27]:
# Create the df_cal dataframe with the start_date and counts column from df values
df_cal = df['start_date'].value_counts().rename_axis('start_date').reset_index(name='counts')

# Sort the dataframe by the 'start_date' column in ascending order and update the dataframe in place
df_cal.sort_values(by='start_date', inplace=True)

df_cal.head()

Unnamed: 0,start_date,counts
147,2019-08-21,2
247,2019-08-31,1
236,2019-09-02,1
242,2019-09-03,1
78,2019-09-04,2


In [28]:
# calendar heatmap
fig = calplot(
    df_cal,
    x="start_date", y="counts",
    years_title=True,
    colorscale="blues",
    #color="type",
    gap=4
)
fig.show()

#fig.write_html("Plots/Calendar Heatmap.html")

#### Calendar heatmap for a specific year

In [29]:
year = 2022

# Select only the rows of a specific year
df_cal[df_cal['start_date'].dt.year == year].head()

Unnamed: 0,start_date,counts
13,2022-01-03,3
284,2022-01-04,1
135,2022-01-05,2
125,2022-01-06,2
287,2022-01-08,1


In [30]:
# calendar heatmap
fig = calplot(
    df_cal[df_cal['start_date'].dt.year == year],
    x="start_date", y="counts",
    years_title=True,
    colorscale="blues",
    #color="type",
    gap=4
)
fig.show()

#fig.write_html("Plots/Calendar Heatmap 2022.html")

#### Dropdown menu to select the year

In [31]:
# Extract the unique years from the dataframe
years = df_cal['start_date'].dt.year.unique()

# Define a function that filters the dataframe by year and plots the calendar heatmaps
def plot_by_year(year):
  fig = calplot(
      df_cal[df_cal['start_date'].dt.year == year],
      x="start_date", y="counts",
      years_title=True,
      colorscale="blues",
      #color="type",
      gap=4
  )
  fig.show()

# Create a dropdown menu with the years
interact(plot_by_year, year=years)

interactive(children=(Dropdown(description='year', options=(2019, 2020, 2021, 2022, 2023), value=2019), Output…

<function __main__.plot_by_year(year)>

#### Monthly

In [32]:
# Create the monthly activities count df with the start_date and type columns
# (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
df_monthly = df[['start_date', 'type']].copy()

# Make sure start_date is in datetime format
df_monthly['start_date'] = pd.to_datetime(df_monthly['start_date'])

# Get the month and year of the start date
df_monthly['month'] = df_monthly['start_date'].dt.to_period('M')

# Remove the start_date column
df_monthly = df_monthly.drop(['start_date'], axis=1)

# Add a counts column based on the month and type columns
df_monthly['counts'] = df_monthly.groupby(['month', 'type'])['month'].transform('count')

# Drop the duplicate rows of the df
df_monthly = df_monthly.drop_duplicates()

#df_monthly.reset_index(drop=True, inplace=True)
df_monthly['ConvertedMonth'] = df_monthly['month'].astype(str)

df_monthly.head()



Unnamed: 0,type,month,counts,ConvertedMonth
0,Ride,2023-01,4,2023-01
1,Run,2023-01,2,2023-01
6,Ride,2022-12,31,2022-12
14,Run,2022-12,8,2022-12
46,Ride,2022-11,38,2022-11


In [33]:
fig = px.bar(df_monthly, x='ConvertedMonth', y='counts', color="type", 
            labels={
                "ConvertedMonth": "Month - Year",
                "counts": "Counter",
                "type": "Types"
                 },
             title="STRAVA Activities Counter Monthly")

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)

#fig.write_html("Plots/Count Month.html")

#### Monthly 2

In [34]:
df_monthly_2 = df
df_monthly_2.groupby(df_monthly_2['start_date'].dt.strftime('%m-%y')).head()
#df_monthly_2.agg (total_interviews = ('num_interviews' , 'sum')))
#df_monthly_2.head()

Unnamed: 0.1,Unnamed: 0,resource_state,name,distance,moving_time,elapsed_time,total_elevation_gain,type,sport_type,id,start_date,start_date_local,timezone,achievement_count,kudos_count,comment_count,athlete_count,photo_count,private,visibility,flagged,start_latlng,end_latlng,average_speed,max_speed,average_temp,elev_high,elev_low,upload_id,upload_id_str,external_id,pr_count,total_photo_count,has_kudoed,athlete.id,athlete.resource_state,map.id,map.summary_polyline,map.resource_state,start_time,average_speed_pace
0,0,2,Sortie vélo en soirée,9636.5,1392,1421,27.0,Ride,Ride,8364559713,2023-01-09,2023-01-09 18:23:35+00:00,(GMT+01:00) Europe/Paris,0,2,0,1,0,False,everyone,False,"[50.708782989531755, 3.141603022813797]","[50.64082478173077, 3.0678680911660194]",24.9228,37.3284,5.0,53.2,11.4,8.971132e+09,8.971132e+09,garmin_ping_254634430544,0,0,False,45535233,1,a8364559713,_rntHuceRx@{@vAcBNGTR`FnJV`@hA|BJDPf@pAbC|AnCj...,2,18:23:35,2.24
1,1,2,Course à pied du midi,8508.7,2682,2761,48.7,Run,Run,8363046289,2023-01-09,2023-01-09 12:23:47+00:00,(GMT+01:00) Europe/Paris,2,2,0,1,0,False,everyone,False,[],[],11.4228,25.3440,,35.5,32.4,8.969448e+09,8.969448e+09,a669831f-e97f-4bdf-8a35-af156b173903-activity.fit,1,0,False,45535233,1,a8363046289,,2,12:23:47,5.15
2,2,2,Sortie vélo le matin,8698.6,1296,1559,53.0,Ride,Ride,8364559632,2023-01-09,2023-01-09 08:24:59+00:00,(GMT+01:00) Europe/Paris,2,2,0,1,0,False,everyone,False,"[50.64832616597414, 3.0784226674586535]","[50.709917061030865, 3.1410851888358593]",24.1632,37.0440,4.0,191.2,157.4,8.971132e+09,8.971132e+09,garmin_ping_254634428603,1,0,False,45535233,1,a8364559632,ibdtH{cyQsBmDuAwBaHiLyCyEgB_Dm@y@We@aBgCQ_@K]G...,2,08:24:59,2.28
3,3,2,Sortie vélo dans l'après-midi,9739.3,1457,1562,22.0,Ride,Ride,8342922887,2023-01-05,2023-01-05 17:37:05+00:00,(GMT+01:00) Europe/Paris,0,3,0,1,0,False,everyone,False,"[50.70959544740617, 3.140655616298318]","[50.64087473787367, 3.0678227450698614]",24.0624,38.7612,8.0,59.8,24.2,8.946835e+09,8.946835e+09,garmin_ping_254032765411,0,0,False,45535233,1,a8342922887,qvntHs~dRbHeITGLFnD~G~B~Eh@`AL`@PJ~A|C|@|AhAvA...,2,17:37:05,2.29
4,4,2,Sortie vélo le matin,9512.8,1430,1550,49.0,Ride,Ride,8340571349,2023-01-05,2023-01-05 08:38:22+00:00,(GMT+01:00) Europe/Paris,1,2,0,1,0,False,everyone,False,"[50.642509292811155, 3.071436183527112]","[50.70981706492603, 3.141084937378764]",23.9472,40.1832,7.0,-16.2,-50.0,8.944195e+09,8.944195e+09,garmin_ping_253975251879,1,0,False,45535233,1,a8340571349,kwbtHkrwQAGFKBk@CUKQ_B_BeHyG_AgAyAuAeAgAc@YwCs...,2,08:38:22,2.30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
641,641,2,Sortie à vélo dans l'après-midi,18233.6,3154,3911,65.9,Ride,Ride,2716898886,2019-09-17,2019-09-17 17:30:36+00:00,(GMT+01:00) Europe/Paris,1,0,0,1,0,False,everyone,False,"[50.725521, 3.155747]","[50.636926, 3.046704]",20.8116,44.6400,,49.4,14.9,2.880246e+09,2.880246e+09,,0,0,False,45535233,1,a2716898886,oirtHkjgRx@sBPGGg@Tw@r@o@n@QlAaBv@oJ?kCx@C^g@J...,2,17:30:36,2.52
642,642,2,Sortie à vélo dans l'après-midi,35115.7,6641,8010,253.0,Ride,Ride,2682304315,2019-09-05,2019-09-05 15:50:59+00:00,(GMT+01:00) Europe/Paris,0,0,0,1,0,False,everyone,False,"[43.642892, -1.433693]","[43.464089, -1.532163]",19.0368,70.5600,,75.4,-0.2,2.844332e+09,2.844332e+09,,0,0,False,45535233,1,a2682304315,a_kiGr_wG~BZfDW`CmAxCiCj@Kp@~AhAlFL|CbC~GKf@jE...,2,15:50:59,3.80
648,648,2,Sortie à vélo en soirée,61173.6,12574,17244,158.5,Ride,Ride,2668704543,2019-08-31,2019-08-31 19:14:41+00:00,(GMT+01:00) Europe/Paris,2,0,0,2,0,False,everyone,False,"[44.889099, -0.591568]","[45.008844, -1.190976]",17.5140,44.6400,,52.3,2.2,2.830218e+09,2.830218e+09,,2,0,False,45535233,1,a2668704543,yk~pGhprB^bDo@fUi@jIa@xAkDnF[xAMvBBdCVdAzApCd@...,2,19:14:41,3.25
672,672,2,Sortie à vélo dans l'après-midi,5451.6,996,2775,34.6,Ride,Ride,2638681881,2019-08-21,2019-08-21 17:15:46+00:00,(GMT+01:00) Europe/Paris,1,0,0,1,0,False,everyone,False,"[50.736803, 3.150903]","[50.624839, 3.053461]",19.7028,39.2400,,49.4,25.1,2.798962e+09,2.798962e+09,,1,0,False,45535233,1,a2638681881,_pttHclfRTGLKN_@EQDCPED?LHR@tAa@x@Wd@GLBJEJIDM...,2,17:15:46,3.20
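
The commented-out `agg` call above suggests that a monthly aggregate was the goal here, whereas `groupby(...).head()` only returns the first five rows of each month. A possible version using pandas named aggregation is sketched below on a made-up `sample` frame; the column names `activities`, `total_m` and `total_km` are illustrative.

```python
import pandas as pd

# Made-up sample spanning two months
sample = pd.DataFrame({
    "start_date": pd.to_datetime(["2023-01-09", "2023-01-05", "2022-12-29", "2022-12-20"]),
    "type": ["Ride", "Run", "Ride", "Ride"],
    "distance": [9636.5, 8508.7, 9512.8, 10000.0],  # meters
})

# One row per (month, type): number of activities and total distance
monthly = (
    sample.assign(month=sample["start_date"].dt.to_period("M").astype(str))
    .groupby(["month", "type"])
    .agg(activities=("distance", "count"), total_m=("distance", "sum"))
    .reset_index()
)
monthly["total_km"] = monthly["total_m"] / 1000
print(monthly)
```

The resulting frame has the same shape as `df_monthly` plus a distance column, so it could feed the same kind of bar chart.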


### Distance information

In [35]:
# Total distance over all activities (in meters)
df['distance'].sum()

8528811.7

In [36]:
# Number of rides for a specific year
year = 2022

# create sub df
df_d_year = df[(df['start_date'].dt.year == year) & (df['type'] == 'Ride')]

len(df_d_year)
#df[(df['distance'] < 4000) & (df['type'] == 'Ride')]

print('Number of bike runs during', year, ':', len(df_d_year))

Number of bike runs during 2022 : 370


In [37]:
distance = df_d_year['distance'].sum()
print('Number of km during', year, ':', round(distance/1000), 'km')

Number of km during 2022 : 5081 km


In [38]:
round(df[(df['type'] == 'Ride')]['distance'].sum()/1000)

7975

### Activities Counter

In [39]:
len(df[df['type'] == 'Ride'])

523
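
The per-year and per-type figures computed one by one in the cells above could be gathered in a single table. This is only a sketch on made-up data; `summary`, `activities`, `total_m` and `total_km` are illustrative names.

```python
import pandas as pd

# Made-up sample: three activities in 2022, one in 2023
sample = pd.DataFrame({
    "start_date": pd.to_datetime(["2022-03-01", "2022-07-15", "2022-09-10", "2023-01-09"]),
    "type": ["Ride", "Ride", "Run", "Ride"],
    "distance": [10000.0, 15000.0, 8000.0, 9636.5],  # meters
})

# Group by calendar year and activity type, then count and sum the distances
year = sample["start_date"].dt.year.rename("year")
summary = sample.groupby([year, "type"])["distance"].agg(activities="count", total_m="sum")
summary["total_km"] = summary["total_m"] / 1000
print(summary)
```

A single row can then be read with `summary.loc[(2022, "Ride")]`, which replaces the separate filters on year and type.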