# Python for Cyclists 1. Basics

## Opening a .fit file, reading it into a pandas dataframe and showing some graphs


## Contents

0. Install fitparse and import the Python libraries
1. Import the .fit file and read into a pandas dataframe
2. Check the data
3. Clean the data
4. Show data in graphs
5. Some more data analysis, such as average values and total duration
6. Store the data as a .json file
7. Some more data analysis: rolling and expanding means

The .fit format (FIT: Flexible and Interoperable Data Transfer) was originally developed by Garmin and is now widely used across sports devices and platforms.
You can find the .fit file of your activity on your device or download it from your online account. 

Upload this .fit file to a Jupyter environment. In step 1 we will use this file.

## 0. Install fitparse and import the python libraries

We use the fitparse library to parse the .fit file.

The documentation of the fitparse project is here: http://dtcooper.github.io/python-fitparse/

In [2]:
%pip install fitparse 

Collecting fitparse
  Downloading fitparse-1.2.0.tar.gz (65 kB)
Building wheels for collected packages: fitparse
  Building wheel for fitparse (setup.py): started
  Building wheel for fitparse (setup.py): finished with status 'done'
  Created wheel for fitparse: filename=fitparse-1.2.0-py3-none-any.whl size=68205 sha256=29e3b235e04d15452955d7b626f5611820ef4f8ef4fb32cf73e77664d5b962d6
  Stored in directory: c:\users\31653\appdata\local\pip\cache\wheels\49\df\4b\3fa676f0a6dd88074d9e09dbaa428e32f43a59a3c01bb95fe8
Successfully built fitparse
Installing collected packages: fitparse
Successfully installed fitparse-1.2.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
#import the modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import fitparse

## 1. Import and parse the .fit file and read it into a pandas dataframe

Source: http://johannesjacob.com/

#### Instruction: Upload the .fit file to Jupyter and place it in the same folder as this notebook.

In [3]:
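from fitparse import FitFile
import pandas as pd

fitfile = FitFile('3919172E.fit')  # enter the name of your .fit file here!

# Accessing .messages makes fitparse parse the file.
# Workaround taken from the source: try again when parsing raises a KeyError.
while True:
    try:
        fitfile.messages
        break
    except KeyError:
        continue

# Collect every 'record' message as one dict of field name -> value
workout = []
for record in fitfile.get_messages('record'):
    r = {}
    for record_data in record:
        r[record_data.name] = record_data.value
    workout.append(r)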
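
The loop above turns each 'record' message into a dictionary keyed by field name. The sketch below shows the same idea as a small reusable function. It runs on stand-in objects instead of a real file: `FakeField` and `records_to_rows` are hypothetical names used only for illustration, and the only assumption about fitparse is the one the loop above already makes (each field has a `.name` and a `.value`).

```python
from collections import namedtuple

# Hypothetical stand-in for a fitparse field: only .name and .value are used
FakeField = namedtuple("FakeField", ["name", "value"])

def records_to_rows(records):
    """Turn an iterable of records (each an iterable of fields) into a list of dicts."""
    return [{field.name: field.value for field in record} for record in records]

# Two made-up records; the second one has no power value
fake_records = [
    [FakeField("heart_rate", 115), FakeField("power", 111.0)],
    [FakeField("heart_rate", 117)],
]
rows = records_to_rows(fake_records)
print(rows)
```

When such a list is passed to `pd.DataFrame`, a field that is missing from a record becomes NaN in that row. This is one likely reason why some columns have fewer valid values than others.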

In [4]:
fietsdata = pd.DataFrame(workout) # read the workout into a pandas dataframe 
fietsdata.head() # print the first five rows of the dataframe

,accumulated_power,altitude,cadence,calories,distance,enhanced_altitude,enhanced_speed,grade,heart_rate,left_pedal_smoothness,left_right_balance,left_torque_effectiveness,position_lat,position_long,power,right_pedal_smoothness,right_torque_effectiveness,speed,temperature,timestamp
0,9241.0,108.0,64.0,0,0.0,108.0,5.926,,115,,,,624663697.0,57981701.0,111.0,,,5.926,23,2020-05-09 08:00:47
1,9370.0,108.0,65.0,0,12.0,108.0,6.471,,117,,,,624663458.0,57980727.0,129.0,,,6.471,23,2020-05-09 08:00:48
2,9370.0,107.0,65.0,0,12.0,107.0,6.327,,119,,,,624663200.0,57979693.0,112.0,,,6.327,23,2020-05-09 08:00:49
3,9482.0,107.0,65.0,0,24.0,107.0,6.214,,120,,,,624662921.0,57978719.0,132.0,,,6.214,23,2020-05-09 08:00:50
4,9674.0,106.0,66.0,1,24.0,106.0,6.049,,121,,,,624662723.0,57977665.0,60.0,,,6.049,23,2020-05-09 08:00:51


## 2. Check the data

In [5]:
# see the datatypes
print(fietsdata.dtypes)

accumulated_power                    float64
altitude                             float64
cadence                              float64
calories                               int64
distance                             float64
enhanced_altitude                    float64
enhanced_speed                       float64
grade                                 object
heart_rate                             int64
left_pedal_smoothness                 object
left_right_balance                    object
left_torque_effectiveness             object
position_lat                         float64
position_long                        float64
power                                float64
right_pedal_smoothness                object
right_torque_effectiveness            object
speed                                float64
temperature                            int64
timestamp                     datetime64[ns]
dtype: object


In [6]:
# do basic analysis of some of the data 
print(fietsdata[['power', 'heart_rate', 'enhanced_speed', 'cadence']].describe())

              power    heart_rate  enhanced_speed       cadence
count  14905.000000  14911.000000    14911.000000  14905.000000
mean     165.952835    150.404869        6.640665     75.244280
std       72.539232      8.063259        1.536287     18.334591
min        0.000000     98.000000        0.000000      0.000000
25%      139.000000    146.000000        6.317000     73.000000
50%      165.000000    151.000000        7.011000     80.000000
75%      194.000000    155.000000        7.557000     85.000000
max     1194.000000    170.000000        9.296000    248.000000
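
The dtypes and counts above hint at gaps in the data: some columns have dtype `object` and look empty in the first rows, and `power` has fewer valid values than `heart_rate`. A minimal sketch of how to count missing values per column is shown below. It uses a small made-up dataframe (`sample`) instead of the real ride, so the numbers are for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up sample with the same kind of gaps as the ride data
sample = pd.DataFrame({
    "power": [111.0, np.nan, 112.0, 132.0],
    "heart_rate": [115, 117, 119, 120],
    "grade": [None, None, None, None],
})

missing = sample.isna().sum()  # number of missing values per column
empty_columns = missing[missing == len(sample)].index.tolist()  # columns without any data
print(missing)
print("Columns without any data:", empty_columns)
```

The same two lines can be run on `fietsdata` to see which columns are worth keeping.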


## 3. Clean the data

### 3a. Convert m/s to km/h

In [7]:
# convert m/s to km/h: 1 m/s = 3600 s per hour / 1000 m per km = 3.6 km/h
fietsdata['speed_kmh'] = fietsdata['enhanced_speed'] * 3.6
fietsdata.head() # check the new column speed_kmh on the right

,accumulated_power,altitude,cadence,calories,distance,enhanced_altitude,enhanced_speed,grade,heart_rate,left_pedal_smoothness,...,left_torque_effectiveness,position_lat,position_long,power,right_pedal_smoothness,right_torque_effectiveness,speed,temperature,timestamp,speed_kmh
0,9241.0,108.0,64.0,0,0.0,108.0,5.926,,115,,...,,624663697.0,57981701.0,111.0,,,5.926,23,2020-05-09 08:00:47,21.3336
1,9370.0,108.0,65.0,0,12.0,108.0,6.471,,117,,...,,624663458.0,57980727.0,129.0,,,6.471,23,2020-05-09 08:00:48,23.2956
2,9370.0,107.0,65.0,0,12.0,107.0,6.327,,119,,...,,624663200.0,57979693.0,112.0,,,6.327,23,2020-05-09 08:00:49,22.7772
3,9482.0,107.0,65.0,0,24.0,107.0,6.214,,120,,...,,624662921.0,57978719.0,132.0,,,6.214,23,2020-05-09 08:00:50,22.3704
4,9674.0,106.0,66.0,1,24.0,106.0,6.049,,121,,...,,624662723.0,57977665.0,60.0,,,6.049,23,2020-05-09 08:00:51,21.7764


### 3b. Convert semicircles to degrees

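The columns position_lat and position_long are stored in 'semicircles', where 2^31 semicircles equal 180 degrees. To get degrees we multiply by 180 / 2^31 and store the result in the new columns pos_lat and pos_long.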

In [8]:
fietsdata['pos_lat'] = fietsdata['position_lat'] * (180 / 2**31)
fietsdata['pos_long'] = fietsdata['position_long'] * (180 / 2**31)
fietsdata.pos_lat.head()

0    52.358706
1    52.358686
2    52.358665
3    52.358641
4    52.358625
Name: pos_lat, dtype: float64
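
As a quick check of the conversion, the formula can be written as a small function and applied to single values. This is only a sketch: `semicircles_to_degrees` is a hypothetical helper name, and the second input is the first position_lat value from the table in step 1.

```python
def semicircles_to_degrees(semicircles):
    """Convert a position in semicircles to degrees (2**31 semicircles = 180 degrees)."""
    return semicircles * (180 / 2**31)

print(semicircles_to_degrees(2**31))      # half a circle
print(semicircles_to_degrees(624663697))  # first position_lat value of the ride
```

The second result should agree with the first row of pos_lat shown above.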

### 3c. Replace 0's with Not a Number and remove outliers

In [9]:
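# Replace 0's with Not a Number
import numpy as np

# Assign the result back to the column: replace(..., inplace=True) on a selected
# column may not update the dataframe in recent pandas versions
fietsdata['cadence'] = fietsdata['cadence'].replace(0, np.nan) # cadence
fietsdata['power'] = fietsdata['power'].replace(0, np.nan) # power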


In [10]:
# remove high outliers: values that fail the condition become NaN
fietsdata['power'] = fietsdata['power'].where(fietsdata['power'] < 1200) # keep only power values < 1200
fietsdata['cadence'] = fietsdata['cadence'].where(fietsdata['cadence'] < 150) # keep only cadence < 150
print(fietsdata[['power', 'cadence']].describe())

              power       cadence
count  14206.000000  14650.000000
mean     174.118471     76.391809
std       64.023290     15.182021
min        1.000000      2.000000
25%      143.000000     74.000000
50%      167.000000     80.000000
75%      196.000000     85.000000
max     1194.000000    145.000000


In [11]:
# remove low outliers: values that fail the condition become NaN
fietsdata['power'] = fietsdata['power'].where(fietsdata['power'] > 80) # keep only values > 80
fietsdata['cadence'] = fietsdata['cadence'].where(fietsdata['cadence'] > 50) # keep only values > 50
fietsdata[['power', 'cadence']].describe()

              power       cadence
count  13592.000000  13561.000000
mean     180.233225     79.862400
std       58.283830      8.531873
min       81.000000     51.000000
25%      147.000000     76.000000
50%      169.000000     80.000000
75%      198.000000     85.000000
max     1194.000000    145.000000
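
The low and high limits can also be applied in one step. The sketch below is one possible way to do that, not the method used in this notebook: `mask_outside` is a hypothetical helper, and the sample values are made up. Values outside the limits become NaN, so the row stays in the dataframe and the other columns of that row are kept.

```python
import pandas as pd

def mask_outside(series, low, high):
    """Return a copy in which values outside the open interval (low, high) are NaN."""
    keep = (series > low) & (series < high)
    return series.where(keep)

# Made-up power values: a stop, a very low value, a normal value and a spike
sample = pd.Series([0.0, 60.0, 180.0, 1500.0])
cleaned = mask_outside(sample, 80, 1200)
print(cleaned)
```

Because 0 is below the lower limit, it is masked as well, so a separate step for replacing zeros is not needed with this approach.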


## 4. Show data in graphs

### 4a. Basic scatter plot of speed vs distance

In [12]:
%matplotlib notebook
import matplotlib.pyplot as plt
# We use matplotlib to make a scatter plot of speed against distance

plt.figure(figsize=(10,5)) # set the size of the figure

### Set the variables ####
x = fietsdata.distance # x-values: column distance (metres)
s = fietsdata.speed_kmh # y-values: column speed_kmh

### Plot the scatter with labels and title ###
plt.plot(x, s, color='lightgreen', marker='.', linestyle='none') # markers only, so the points are not joined by a line
plt.xlabel('Distance (m)')
plt.ylabel('Speed (km/h)')
plt.title('Speed vs Distance')
#plt.legend() 


### 4b. Speed, heart rate, cadence and power

In [13]:
%matplotlib notebook

import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(10,7))#use this to set size of figure

######## write the data to a variable ###########

s = fietsdata.speed_kmh
h = fietsdata.heart_rate
c = fietsdata.cadence
p = fietsdata.power

####### make the subplots ##########
plt.subplot(4,1,1)
plt.plot(s, color='red', linewidth=1.5)
plt.ylabel(r'$Speed$') # y label

plt.subplot(4,1,2)
plt.plot(h, color='magenta', linewidth=1.5)
plt.ylabel(r'$Heartrate$') # y label

plt.subplot(4,1,3) 
plt.plot(c, color='grey', linewidth=1.5)
plt.ylabel(r'$Cadence$') # y label

plt.subplot(4,1,4)
plt.plot(p, color='orange', linewidth=1.5)
plt.ylabel(r'$Power$') # y label


### 4c. Histogram of cadence, heart rate and power

In [14]:
fig, ax = plt.subplots()
fietsdata[['power', 'heart_rate', 'cadence']].plot.hist(bins=200, alpha=0.5, range=(0, 400), ax=ax)
ax.legend()
ax.set_axisbelow(True)
ax.minorticks_on()
ax.grid(which='major', linestyle='-', linewidth=0.5, color='red')
ax.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
plt.show()


### 4d. Speed vs power

In [15]:
%matplotlib notebook
import matplotlib.pyplot as plt
# We use matplotlib to make a scatter plot of speed against power

plt.figure(figsize=(8,8)) # set the size of the figure

### Set the variables ####
s = fietsdata.speed_kmh # y-values: column speed_kmh
p = fietsdata.power # x-values: column power

### Plot the scatter with labels and title ###
plt.plot(p, s, marker='.', linestyle='none') # markers only, so the points are not joined by a line
plt.xlabel('Power (W)')
plt.ylabel('Speed (km/h)')
plt.title('Speed vs Power')
#plt.legend() 


## 5. Some more data analysis

### 5a. Average values

In [16]:
avg_speed= fietsdata["speed_kmh"].mean()
text_speed = "Your average speed was {:.1f} km/h."
print(text_speed.format(avg_speed))

avg_cadence = fietsdata['cadence'].mean()
text_cad = "Your average cadence was {:.0f}."
print(text_cad.format(avg_cadence))

avg_power = fietsdata['power'].mean()
text_power = "Your average power was {:.0f} watt."
print(text_power.format(avg_power))

avg_heart_rate = fietsdata['heart_rate'].mean()
text_heart_rate = "Your average heart rate was {:.0f} beats/minute."
print(text_heart_rate.format(avg_heart_rate))

avg_temp = fietsdata['temperature'].mean()
text_temp = "The average temperature was {:.0f} degrees Celsius."
print(text_temp.format(avg_temp))

Your average speed was 23.9 km/h.
Your average cadence was 80.
Your average power was 180 watt.
Your average heart rate was 150 beats/minute.
The average temperature was 27 degrees Celsius.


In [17]:
med_speed= fietsdata["speed_kmh"].median()
text_speed = "Your median speed was {:.1f} km/h."
print(text_speed.format(med_speed))

med_cadence = fietsdata['cadence'].median()
text_cad = "Your median cadence was {:.0f}."
print(text_cad.format(med_cadence))

med_power = fietsdata['power'].median()
text_power = "Your median power was {:.0f} watt."
print(text_power.format(med_power))

med_heart_rate = fietsdata['heart_rate'].median()
text_heart_rate = "Your median heartrate was {:.0f} beats/minute."
print(text_heart_rate.format(med_heart_rate))

med_temp = fietsdata['temperature'].median()
text_temp = "The median temperature was {:.0f} degrees Celsius."
print(text_temp.format(med_temp))

Your median speed was 25.2 km/h.
Your median cadence was 80.
Your median power was 169 watt.
Your median heartrate was 151 beats/minute.
The median temperature was 27 degrees Celsius.


In [19]:
max_speed= fietsdata["speed_kmh"].max()
text_speed = "Your max speed was {:.1f} km/h."
print(text_speed.format(max_speed))

max_cadence= fietsdata["cadence"].max()
text_cadence = "Your max cadence was {:.1f}."
print(text_cadence.format(max_cadence))

max_power = fietsdata['power'].max()
text_power = "Your max power was {:.0f} watt."
print(text_power.format(max_power))

max_heart_rate = fietsdata['heart_rate'].max()
text_heart_rate = "Your max heartrate was {:.0f} beats/minute."
print(text_heart_rate.format(max_heart_rate))

max_temp = fietsdata['temperature'].max()
text_temp = "The max temperature was {:.0f} degrees Celsius."
print(text_temp.format(max_temp))

Your max speed was 33.5 km/h.
Your max cadence was 145.0.
Your max power was 1194 watt.
Your max heartrate was 170 beats/minute.
The max temperature was 36 degrees Celsius.


In [20]:
# Show the data in one table

import pandas as pd

tabel2 = pd.DataFrame([ [' ', 'Speed', 'Cadence', 'Power', 'Heart rate', 'Temperature'],
                      ["Average", avg_speed, avg_cadence, avg_power, avg_heart_rate, avg_temp],
                      ["Median", med_speed, med_cadence, med_power, med_heart_rate, med_temp],
                      ["Max", max_speed, '-', max_power, max_heart_rate, max_temp],
                      ])
tabel2.round(1)

         0        1        2        3           4            5
0             Speed  Cadence    Power  Heart rate  Temperature
1  Average  23.9064  79.8624  180.233     150.405      27.1665
2   Median  25.2396       80      169         151           27
3      Max  33.4656        -     1194         170           36
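
The same kind of overview can be built without computing each value by hand. The sketch below is an alternative under the assumption that all columns of interest are numeric: it uses `DataFrame.agg` with a list of statistics on a small made-up dataframe (`sample`), so the numbers are for illustration only.

```python
import pandas as pd

# Made-up values; in the notebook this would be a selection of columns of fietsdata
sample = pd.DataFrame({
    "speed_kmh": [21.0, 23.0, 25.0],
    "power": [100.0, 150.0, 350.0],
})

# One row per statistic, one column per measurement
summary = sample.agg(["mean", "median", "max"]).round(1)
print(summary)
```

Because every cell is a number, `round(1)` applies to the whole table and the column names serve as the header.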


### 5b. Total duration

In [21]:
timetable = fietsdata['timestamp']
starttime = timetable.iloc[0] # timestamp of the first record
endtime = timetable.iloc[-1] # timestamp of the last record

duration = endtime - starttime

print("Total duration was: " + str(duration))

Total duration was: 0 days 06:03:49
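
The duration is simply the difference between the last and the first timestamp, so it includes any pauses during the ride. A minimal sketch of printing such a difference as hours, minutes and seconds is shown below. `format_duration` is a hypothetical helper, and the end time in the example is chosen for illustration so that it lies 6:03:49 after the first timestamp of the ride.

```python
from datetime import datetime

def format_duration(start, end):
    """Format the time between two datetimes as HH:MM:SS."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

start = datetime(2020, 5, 9, 8, 0, 47)   # first timestamp of the ride
end = datetime(2020, 5, 9, 14, 4, 36)    # illustrative end time
print(format_duration(start, end))
```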


## 6. Store the data in json format with date in filename

In [22]:
# Store the dataframe in a json
# source: https://datatofish.com/export-pandas-dataframe-json/
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

import pandas as pd
import datetime

filename = datetime.datetime.now().strftime("%Y%m%d")+"fiets.json"
fietsdata.to_json(filename) # creates the .json file in the current working directory

#fietsdata.to_json(r'fietsdata.json')

# result: it works but json file is about 8x larger than .fit file.
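
By default `to_json` writes a dataframe column by column and stores timestamps as epoch milliseconds. The sketch below shows another layout that can be easier to read back or inspect: one JSON object per record, one record per line, with ISO 8601 timestamps. It uses a small made-up dataframe (`sample`) and writes to a string instead of a file; it is meant as an illustration of the options, not as a claim about file size.

```python
import io
import pandas as pd

# Made-up sample with two records
sample = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-05-09 08:00:47", "2020-05-09 08:00:48"]),
    "heart_rate": [115, 117],
})

# Without a path, to_json returns the JSON text instead of writing a file
text = sample.to_json(orient="records", lines=True, date_format="iso")
print(text)

# Read the text back into a dataframe
restored = pd.read_json(io.StringIO(text), lines=True)
print(restored)
```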

## 7. Some more data analysis: rolling and expanding means

In [23]:
%matplotlib notebook
import matplotlib.pyplot as plt

plt.figure(figsize=(10,7)) # set the size of the figure

p = fietsdata.power
p_roll = fietsdata.power.rolling(60).mean() # mean over a moving window of 60 records
p_exp = fietsdata.power.expanding(7).mean() # mean of all records so far, needs at least 7 valid values

####### make the subplots ##########
plt.subplot(3,1,1)
plt.plot(p, color='red', linewidth=1.5)
plt.ylabel('Power') # y label

plt.subplot(3,1,2)
plt.plot(p_roll, color='magenta', linewidth=1.5)
plt.ylabel('Power (rolling)') # y label

plt.subplot(3,1,3)
plt.plot(p_exp, color='grey', linewidth=1.5)
plt.ylabel('Power (expanding)') # y label

plt.show()
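
A rolling mean only gives a value when the window holds enough valid values. After the cleaning in step 3 the power column contains NaN values, so a strict window can produce long gaps in the rolling line. The sketch below shows the effect on a small made-up series and how the `min_periods` argument relaxes it; the window of 3 is chosen only to keep the example short.

```python
import numpy as np
import pandas as pd

# Made-up power values with one missing value
sample = pd.Series([100.0, 200.0, np.nan, 400.0, 500.0])

roll_strict = sample.rolling(3).mean()                  # needs 3 valid values per window
roll_lenient = sample.rolling(3, min_periods=1).mean()  # 1 valid value is enough
expanding = sample.expanding().mean()                   # mean of all valid values so far

print(pd.DataFrame({"strict": roll_strict, "lenient": roll_lenient, "expanding": expanding}))
```

In this example every window of three contains the missing value or is still too short, so the strict rolling mean stays empty, while the lenient version averages whatever valid values the window holds.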
