In [1]:
import pandas as pd
import numpy as np 
from gpxutils import parse_gpx 
import matplotlib.pyplot as plt
%matplotlib inline

# Analysis of Cycling Data

We are provided with four files containing recordings of cycling activities that include GPS location data as
well as some measurements related to cycling performance, such as heart rate and power.  The goal is to perform
some exploration and analysis of this data. 

The data represents four races.  Two are time trials, where the rider rides alone on a set course.  Two are 
road races, where the rider rides with a peloton.  All were held on the same course, but the road races include
two laps whereas the time trials include just one. 

Questions to explore with the data:
* What is the overall distance travelled for each of the rides? What are the average speeds, etc.? Provide a summary for each ride.
* Compare the range of speeds for each ride. Are time trials faster than road races?
* Compare the speeds achieved in the two time trials (three years apart). As well as looking at the averages, can you see where in the ride one or the other is faster?
* From the elevation_gain field you can see whether the rider is _climbing_, _descending_ or on the _flat_. Use this to calculate the average speeds in those three cases (climbing, flat or descending). Note that _flat_ need not mean exactly zero elevation_gain; it can allow for slight climbs and falls.
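
One possible way to approach the last question is sketched below. The function name `speed_by_gradient` and the tolerance `flat_tol` are hypothetical and not part of the provided code, and the small frame is synthetic data used only to show the call. The result is a plain mean of the per-observation speeds, so it is not weighted by time.

```python
import numpy as np
import pandas as pd

def speed_by_gradient(ride, flat_tol=0.1):
    """Mean speed while climbing, on the flat and descending.

    flat_tol is an assumed tolerance in metres: an elevation_gain within
    +/- flat_tol between two observations is treated as flat.
    """
    gain = ride["elevation_gain"]
    terrain = np.where(
        gain > flat_tol, "climbing",
        np.where(gain < -flat_tol, "descending", "flat"),
    )
    # Group the speed column by the terrain label of each observation.
    return ride["speed"].groupby(terrain).mean()

# Synthetic observations; real rides would come from parse_gpx.
demo = pd.DataFrame({
    "elevation_gain": [0.0, 0.5, 0.8, -0.6, -0.4, 0.05],
    "speed": [30.0, 20.0, 22.0, 45.0, 47.0, 32.0],
})
terrain_speed = speed_by_gradient(demo)
print(terrain_speed)
```

The value of `flat_tol` is a judgment call; it is worth trying a few values and checking how many observations fall into each group.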

For time-varying data like this it is often useful to _smooth_ the data using e.g. a [rolling mean](https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html).  You might want to experiment with smoothing in some of your analysis (not required, but may be of interest).
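
The linked page documents `pandas.rolling_mean` from pandas 0.17; in current pandas the same operation is written with the `.rolling(...)` method. The sketch below uses a synthetic speed series with one sample per second, and the window sizes are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic speed trace (km/h) standing in for a ride's speed column.
index = pd.date_range("2019-01-01 00:00:00", periods=10,
                      freq=pd.Timedelta(seconds=1), tz="UTC")
speed = pd.Series([30, 34, 29, 35, 31, 33, 28, 36, 30, 34],
                  index=index, dtype=float)

# Rolling mean over a fixed number of observations.
# The first window - 1 values are NaN because the window is not yet full.
smooth_count = speed.rolling(window=3).mean()

# Rolling mean over a fixed time span, which needs a datetime index.
# This may suit GPS data better when samples are unevenly spaced.
smooth_time = speed.rolling("3s").mean()

print(smooth_count.head())
print(smooth_time.head())
```

A larger window gives a smoother curve but hides short efforts such as sprints, so the right size depends on the question being asked.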

## Description of Fields

* __index__ is a datetime showing the time that the observation was made (I wasn't riding at night; the timestamps are converted to UTC)
* __latitude, longitude, elevation__ from the GPS, the position of the rider at each timepoint, elevation in m
* __temperature__ the current ambient temperature in degrees Celsius
* __power__ the power being generated by the rider in Watts
* __cadence__ the rotational speed of the pedals in revolutions per minute
* __hr__ heart rate in beats per minute
* __elevation_gain__ the change in elevation in m between two observations
* __distance__ distance travelled between observations in km
* __speed__ speed measured in km/h

You are provided with code in [gpxutils.py](gpxutils.py) to read the GPX XML format files that are exported by cycling computers and applications.  The sample files were exported from [Strava](https://strava.com/) and represent four races by Steve Cassidy.


In [2]:
# read the four data files
rr_2016 = parse_gpx('files/Calga_RR_2016.gpx')
tt_2016 = parse_gpx('files/Calga_TT_2016.gpx')
rr_2019 = parse_gpx('files/Calga_RR_2019.gpx')
tt_2019 = parse_gpx('files/Calga_TT_2019.gpx')

In [3]:
rr_2016.head()

| | latitude | longitude | elevation | temperature | power | cadence | hr | distance | elevation_gain | speed | timedelta |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-05-14 04:02:41+00:00 | -33.415561 | 151.222303 | 208.6 | 29.0 | 0.0 | 40.0 | 102.0 | 0.0 | 0.0 | 0.0 | |
| 2016-05-14 04:02:42+00:00 | -33.415534 | 151.222289 | 208.6 | 29.0 | 0.0 | 40.0 | 102.0 | 0.003271 | 0.0 | 11.77702 | 1.0 |
| 2016-05-14 04:02:46+00:00 | -33.415398 | 151.22218 | 208.6 | 29.0 | 0.0 | 40.0 | 103.0 | 0.018194 | 0.0 | 16.375033 | 4.0 |
| 2016-05-14 04:02:49+00:00 | -33.415264 | 151.222077 | 208.6 | 29.0 | 0.0 | 55.0 | 106.0 | 0.017703 | 0.0 | 21.243901 | 3.0 |
| 2016-05-14 04:02:51+00:00 | -33.41516 | 151.222013 | 208.6 | 29.0 | 0.0 | 61.0 | 109.0 | 0.013001 | 0.0 | 23.401217 | 2.0 |
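
The `timedelta` column is not described in the field list. The rows above are consistent with it being the number of seconds since the previous observation, and with `speed` being `distance` divided by that time. The check below is a sketch under that assumption, using values copied from the output above.

```python
import pandas as pd

# Values copied from the rr_2016.head() output.
head = pd.DataFrame({
    "distance": [0.0, 0.003271, 0.018194, 0.017703, 0.013001],   # km
    "speed": [0.0, 11.77702, 16.375033, 21.243901, 23.401217],   # km/h
    # Assumed to be seconds; the first row has no previous observation.
    "timedelta": [float("nan"), 1.0, 4.0, 3.0, 2.0],
})

# km per second converted to km per hour.
derived_speed = head["distance"] / head["timedelta"] * 3600

# Largest absolute difference between the derived and the recorded speed.
max_diff = (derived_speed - head["speed"]).abs().max()
print(max_diff)
```

The small remaining difference is what one would expect from the distances being rounded in the printed output.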


# Race Data Exploration

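This section prints a basic summary of each ride: total distance, average speed, elevation range, cadence and heart rate.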


In [4]:
# Dataset functions

# Basic exploration

def explorBasic(data):
    """Print summary statistics for one ride.

    `data` is a DataFrame returned by parse_gpx. Note that the average
    speed is the mean of the per-observation speeds, not distance / time.
    """
    print('Total Distance(Km): ' , data.distance.sum())
    print('Average speed(Km/h): ', data.speed.mean())
    print('Minimum Elevation(m): ', data.elevation.min())
    print('Maxium Elevation(m): ', data.elevation.max())
    print('Cadence Minimum(rpm): ', data.cadence.min())
    print('Cadence Maximum(rpm): ', data.cadence.max())
    print('Cadence Average(rpm): ', data.cadence.mean())
    print('Heart Rate Minimum(bpm): ', data.hr.min())
    print('Heart Rate Maximum(bpm): ', data.hr.max())
    print('Heart Rate Average(bpm): ', data.hr.mean())
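
Because the observations are not evenly spaced in time, the mean of the `speed` column can differ from total distance divided by total time. The sketch below returns both, collected in a table so rides can be compared side by side. The function name `summarise` is hypothetical, `timedelta` is assumed to be in seconds, and the two small frames are synthetic stand-ins for the parsed rides.

```python
import pandas as pd

def summarise(ride):
    """Return headline numbers for one ride as a dict.

    Assumes `distance` is in km and `timedelta` is the number of
    seconds between consecutive observations.
    """
    total_km = ride["distance"].sum()
    hours = ride["timedelta"].sum() / 3600   # NaN in the first row is skipped
    return {
        "distance_km": total_km,
        "mean_sample_speed_kmh": ride["speed"].mean(),
        "overall_speed_kmh": total_km / hours,
        "mean_hr_bpm": ride["hr"].mean(),
    }

# Synthetic rides, only to show the call.
demo_a = pd.DataFrame({"distance": [0.01, 0.04], "timedelta": [1.0, 4.0],
                       "speed": [36.0, 36.0], "hr": [150.0, 160.0]})
demo_b = pd.DataFrame({"distance": [0.01, 0.02], "timedelta": [1.0, 4.0],
                       "speed": [36.0, 18.0], "hr": [140.0, 150.0]})

# One row per ride, one column per statistic.
summary = pd.DataFrame({"ride_a": summarise(demo_a),
                        "ride_b": summarise(demo_b)}).T
print(summary)
```

For `ride_b` the two speed figures disagree because the slow observation covers four times as long as the fast one; the same effect can appear in real rides with irregular sampling.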

## Calga 2016

### Road Race

In [5]:
explorBasic(rr_2016)

Total Distance(Km):  49.04858574628638
Average speed(Km/h):  34.93308475482947
Minimum Elevation(m):  176.0
Maxium Elevation(m):  295.8
Cadence Minimum(rpm):  0.0
Cadence Maximum(rpm):  117.0
Cadence Average(rpm):  65.98795180722891
Heart Rate Minimum(bpm):  102.0
Heart Rate Maximum(bpm):  205.0
Heart Rate Average(bpm):  158.39440113394755


### Time Trial

In [6]:
explorBasic(tt_2016)

Total Distance(Km):  24.80288703130808
Average speed(Km/h):  33.52996304869014
Minimum Elevation(m):  85.0
Maxium Elevation(m):  202.6
Cadence Minimum(rpm):  0.0
Cadence Maximum(rpm):  118.0
Cadence Average(rpm):  83.27709279688514
Heart Rate Minimum(bpm):  100.0
Heart Rate Maximum(bpm):  251.0
Heart Rate Average(bpm):  170.93964957819597


## Calga 2019

### Road Race

In [7]:
explorBasic(rr_2019)

Total Distance(Km):  51.78913253596059
Average speed(Km/h):  33.87986137188044
Minimum Elevation(m):  185.2
Maxium Elevation(m):  310.4
Cadence Minimum(rpm):  0.0
Cadence Maximum(rpm):  120.0
Cadence Average(rpm):  70.0049064146829
Heart Rate Minimum(bpm):  71.0
Heart Rate Maximum(bpm):  170.0
Heart Rate Average(bpm):  138.99854624750137


### Time Trial

In [8]:
explorBasic(tt_2019)

Total Distance(Km):  24.38014504376575
Average speed(Km/h):  33.05782378815691
Minimum Elevation(m):  195.8
Maxium Elevation(m):  312.2
Cadence Minimum(rpm):  0.0
Cadence Maximum(rpm):  111.0
Cadence Average(rpm):  89.97966101694915
Heart Rate Minimum(bpm):  88.0
Heart Rate Maximum(bpm):  166.0
Heart Rate Average(bpm):  152.74124293785312


# Race Data Comparison

## Speed Comparison 

In [112]:
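# Smooth the speed of each 2016 ride with a rolling mean over 150 observations.
# Wrapping the raw values in a new DataFrame replaces the datetime index
# with a plain integer index.
tt_2016_speed = pd.DataFrame(tt_2016.speed.array).rolling(150).mean()
rr_2016_speed = pd.DataFrame(rr_2016.speed.array).rolling(150).mean()
# Number of observations in each ride
print(tt_2016_speed.size)
print(rr_2016_speed.size)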




1541
2822
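
The two rides have different numbers of observations, so their smoothed speeds cannot be compared row by row. One option, sketched here, is to express each ride as speed against cumulative distance and interpolate both onto a shared distance grid. The helper `speed_on_grid` is hypothetical, and the two frames are synthetic rides over the same 1 km.

```python
import numpy as np
import pandas as pd

def speed_on_grid(ride, grid_km):
    """Interpolate a ride's speed onto a shared grid of distances (km)."""
    # Cumulative distance travelled at each observation.
    travelled = ride["distance"].cumsum().to_numpy()
    return np.interp(grid_km, travelled, ride["speed"].to_numpy())

# Synthetic rides with different numbers of samples.
ride_a = pd.DataFrame({"distance": [0.0, 0.25, 0.25, 0.25, 0.25],
                       "speed": [30.0, 32.0, 34.0, 36.0, 38.0]})
ride_b = pd.DataFrame({"distance": [0.0, 0.5, 0.5],
                       "speed": [35.0, 35.0, 35.0]})

grid = np.linspace(0.0, 1.0, 5)
# Positive where ride_a is faster, negative where ride_b is faster.
diff = speed_on_grid(ride_a, grid) - speed_on_grid(ride_b, grid)
print(diff)
```

`np.interp` expects the cumulative distance to be increasing, so stretches where the rider is stationary may need to be dropped first. For a two-lap road race, each lap would also have to be compared separately against a one-lap time trial.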


## Challenge: Gear Usage

A modern race bike has up to 22 different gears, with two chainrings at the front (attached to the pedals) and 10 or 11 sprockets at the back (attached to the wheel).   The ratio of the number of teeth on the front and rear cogs determines the distance travelled with one revolution of the pedals (often called __development__, measured in metres).  Low development is good for climbing hills, while high development is for going fast downhill or in the final sprint. 

We have a measure of the number of rotations of the pedals per minute (__cadence__) and a measure of __speed__.  Using these two variables we should be able to derive a measure of __development__, which would effectively tell us which gear the rider was using at the time.   Development will normally range between __2m__ and __10m__.  Due to errors in GPS and cadence measurements you will see many points outside this range, and you should just discard them as outliers. 

Write code to calculate __development__ in _metres_ for each row in a ride.  Plot the result in a _histogram_ and compare the plots for the four rides.   Comment on what you observe in the histograms.
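
As a starting point, development can be derived by converting speed from km/h to metres per minute and dividing by cadence in revolutions per minute. The sketch below is one possible reading of the task: the function name `development` is hypothetical, the frame is synthetic, and the bin counts are computed with `np.histogram` rather than plotted.

```python
import numpy as np
import pandas as pd

def development(ride, low=2.0, high=10.0):
    """Metres travelled per pedal revolution, keeping only low..high."""
    metres_per_minute = ride["speed"] * 1000 / 60          # km/h -> m/min
    # Treat zero cadence (coasting) as missing to avoid dividing by zero.
    cadence = ride["cadence"].where(ride["cadence"] > 0)
    dev = metres_per_minute / cadence
    # Discard outliers and missing values outside the expected range.
    return dev[dev.between(low, high)]

# Synthetic observations; the last two rows should be discarded.
demo = pd.DataFrame({
    "speed": [36.0, 18.0, 54.0, 30.0, 40.0],
    "cadence": [90.0, 75.0, 100.0, 0.0, 20.0],
})
dev = development(demo)

# Counts per 1 m bin between 2 m and 10 m.
counts, edges = np.histogram(dev, bins=8, range=(2.0, 10.0))
print(dev)
print(counts)
```

With matplotlib available, `dev.hist(bins=...)` on each ride would draw the histograms directly; using the same bins for all four rides makes them easier to compare.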



