# UCI MTB DH Data Retrieval

## Setup
#### Import Libraries

If you do not have these libraries available, install them using `pip`:

```
pip install requests
pip install bs4
pip install pandas
pip install numpy
```

In [26]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import datetime as dt
import os

In [27]:
def calculate_age(born):
    today = dt.date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

Widen the display area to prevent column wrapping, and always show all columns to aid debugging.

In [28]:
pd.set_option('display.width', 2000)
pd.set_option('display.max_columns', None)

## Config

Which race data are we collecting?

1. Losinj
1. Fort William
1. Leogang
1. Val di Sole
1. Vallnord
1. Mont-Sainte-Anne
1. La Bresse

In [57]:
year = 2018
race = 4
gender = 'm'
event = 'dh'
rnrSpeed = False

#### Data Sources

The UCI Live Timing API contains a lot of data points, but not all the ones we want (speed being the main one missing), and not even all the ones the UCI includes in its own PDF, which is frustrating.

Similarly, Roots & Rain also has a lot of the data points, but again not all of them; most notably it's missing timing splits 4 and 5.

Therefore we need to pull from both sources and combine the sets.

We specify the URLs for both sources from which we will extract our data. The UCI API URL can be found by loading the Live Timing page and then using your browser's inspector on the Network tab (in Chrome at least) to see the data feed. As the UCI seems to be using a Single Page Application (SPA) here, it's not straightforward to extract this link automagically.

**Note:** The race list is now maintained in an external Python config file, `config.py`, imported below.

In [58]:
from config import races

racename = races[year][race]['name']
urlUci = races[year][race]['urls']['uci'] + str(( 3 if 'm' == gender else 6 )) + '/'
urlUciQ = races[year][race]['urls']['uci'] + str(( 2 if 'm' == gender else 5 )) + '/'
urlRoots = races[year][race]['urls']['rnr'] + gender + '/'
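
The contents of `config.py` are not shown in this notebook. Judging only from how `races` is indexed above, it might look something like the sketch below. The race name and both URLs are placeholders invented for illustration, not real endpoints.

```python
# Hypothetical sketch of config.py; the real file is not shown here.
# The name and URLs are placeholders, not real endpoints.
races = {
    2018: {
        4: {
            'name': 'valdisole',
            'urls': {
                'uci': 'https://example.com/uci/race-4/',
                'rnr': 'https://example.com/rnr/race-4/',
            },
        },
    },
}

year, race, gender = 2018, 4, 'm'

# Same lookups as the cell above: the UCI URL gets a numeric suffix
# (3 for the men's race, 6 for the women's), Roots & Rain gets the gender.
entry = races[year][race]
url_uci = entry['urls']['uci'] + str(3 if gender == 'm' else 6) + '/'
url_roots = entry['urls']['rnr'] + gender + '/'
print(url_uci)
print(url_roots)
```

Keeping the race list in a nested dict keyed by year and race number means the notebook only needs the three config values (`year`, `race`, `gender`) to resolve every URL.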

File handling setup

In [59]:
directory = event + str(race) + '_' + racename
if not os.path.exists(directory):
    os.makedirs(directory)

file_prefix = str(year) + '_' + event + str(race) + '_' + racename + '_' + gender
file_prefix = os.path.join( directory, file_prefix )

# UCI API
### Load Data

These two lines make the actual requests to the server, and then convert each JSON string response into a usable Python structure (deserialization).

In [60]:
r = requests.get( urlUci ).json()
q = requests.get( urlUciQ ).json()

The API response has three main sections:

1. `Last Finisher`
   - Racers in order of start time
2. `Results`
   - Racers in finishing rank order
3. `Riders`
   - Personal details on all racers
 
Each contains many data points. To see all the contained data, you can un-comment and execute any of the lines in the next section to explore more.
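
To make the shape of the response easier to picture without hitting the server, here is a hand-made stand-in trimmed to the fields this notebook reads later on. It is only a sketch: the real feed has many more keys, the `Last Finisher` section is left out, and the value types are assumptions.

```python
# Hand-made stand-in for one API response, trimmed to fields used in this
# notebook. Values are illustrative; types in the real feed may differ.
sample = {
    'Results': [
        {'RaceNr': 1001, 'Status': 'Finished', 'Position': 1,
         'Times': [{'RaceTime': 54943}, {'RaceTime': 117167}]},
    ],
    'Riders': {
        '1001': {'PrintName': 'PIERRON Amaury', 'RaceNr': 1,
                 'UciRiderId': 10008827283, 'StartOrder': 66},
    },
}

print(sorted(sample.keys()))                # top-level sections
print(sorted(sample['Results'][0].keys()))  # fields on one result row

# A result row only carries a reference number; the name lives in Riders,
# which is keyed by that number as a string.
first = sample['Results'][0]
rider = sample['Riders'][str(first['RaceNr'])]
print(rider['PrintName'], first['Times'][0]['RaceTime'] / 1000)
```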

In [61]:
# display( q )
# display( r['Results'][7] )
# display( r['Riders']['1001'] )
# display( r['Results'][61] )

### Extract Data

Here we iterate over the `Results` sub-set of data to extract the information we care about: basically some identifying info, and their splits.

There is a loop within a loop here: the outer loop iterates over the two result sets (race, then qualifying), and within it we extract the necessary stats for each rider.

If you looked at the detail of the returned data set in the last step, you might have noticed the rider's name is not stored next to their result; riders are only identified by a reference number. To facilitate our analysis later on, it is useful to import each rider's name at this stage by cross-referencing the `Riders` sub-set.

We start with an empty list `lst` and in each loop iteration add an entry (actually a dict) to that list for each rider.

In [62]:
dat = {}
for i, d in enumerate( [ r, q ] ):
    lastStart = d['Riders'][list(d['Riders'].keys())[-1]]['StartOrder']
    pfx = 'q_' if 1 == i else ''

    splits = len(d['Results'][0]['Times'] )
    lst = []
    for idx, row in enumerate( d['Results'] ):
        fin = "Finished" == row['Status']
        res = {
            'name': d['Riders'][str(row['RaceNr'])]['PrintName'],
            'id': row['RaceNr'],
            'uci': d['Riders'][str(row['RaceNr'])]['UciRiderId'],
            'bib': d['Riders'][str(row['RaceNr'])]['RaceNr'],
            pfx + 'status': row['Status'],
            pfx + 'rank': row['Position'] if fin else idx+1,
            pfx + 'start': d['Riders'][str(row['RaceNr'])]['StartOrder'],
            pfx + 'start_rev': lastStart - d['Riders'][str(row['RaceNr'])]['StartOrder'] +1
        }
        if rnrSpeed:
            res[pfx + 'speed'] = np.nan

        # Add all splits to result set
        for split in range( 0, splits ):
            head = pfx + 'split'
            res[head + str(split+1)] = row['Times'][split]['RaceTime']/1000 if fin else np.nan

        # Append result set to list
        lst.append(res)

    dat[i] = lst

Here we load the completed lists into Pandas DataFrames to facilitate working with the data moving forward.

In [63]:
df = pd.DataFrame( dat[0] )
dq = pd.DataFrame( dat[1] )

# Points

Neither data set contains the points awarded, so we use a reference file and merge it in.

The merge type here must be `outer` so that riders who finished outside the top 60 men or top 15 women, or who DNF, don't get trimmed from the dataset.

In [64]:
df_qp = pd.read_csv( event + '_points_qual_' + gender + '.csv', index_col=0 )
df_qp = df_qp.reset_index(drop=False)
dq = dq.merge( df_qp, left_index=True, right_index=True, how="outer")

df_rp = pd.read_csv( event + '_points_race_' + gender + '.csv', index_col=0 )
df_rp = df_rp.reset_index(drop=False)
df = df.merge( df_rp, left_index=True, right_index=True, how="outer")
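
As a small illustration of why the merge type matters, the sketch below merges a made-up three-rider result list with a points table that only covers the first two places. The frames and values are invented for the example. Note that merging on the index pairs rows by position, so it relies on the results already being in rank order.

```python
import pandas as pd

# Three finishers, but points are only defined for the first two places.
results = pd.DataFrame({'name': ['A', 'B', 'C'], 'rank': [1, 2, 3]})
points = pd.DataFrame({'r_points': [200, 160]})

outer = results.merge(points, left_index=True, right_index=True, how='outer')
inner = results.merge(points, left_index=True, right_index=True, how='inner')

print(outer)  # rider C is kept, with NaN points
print(inner)  # rider C is dropped
```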

In [65]:
display( df.head(), dq.head() )

Unnamed: 0,bib,id,name,rank,split1,split2,split3,split4,split5,start,start_rev,status,uci,r_points
0,1,1001,PIERRON Amaury,1,54.943,117.167,147.615,185.895,216.788,66,1,Finished,10008827283,200.0
1,5,1005,GREENLAND Laurie,2,55.155,116.308,146.631,185.919,217.312,62,5,Finished,10009404738,160.0
2,8,1008,HART Danny,3,54.322,116.386,146.865,185.938,217.448,65,2,Finished,10005470073,140.0
3,7,1007,SHAW Luca,4,56.937,120.157,150.066,188.198,219.036,64,3,Finished,10008813442,125.0
4,39,1039,ESTAQUE Thomas,5,54.382,117.274,148.581,187.674,219.254,45,22,Finished,10008848505,110.0


Unnamed: 0,bib,id,name,q_rank,q_split1,q_split2,q_split3,q_split4,q_split5,q_start,q_start_rev,q_status,uci,q_points
0,1,1001,PIERRON Amaury,1,61.533,135.196,170.755,213.596,247.25,1,145,Finished,10008827283,50.0
1,32,1032,ILES Finn,2,60.176,133.796,170.543,215.391,250.682,30,116,Finished,10076111537,40.0
2,128,1128,FRIXTALON Hugo,3,63.873,139.02,175.673,220.846,255.582,111,35,Finished,10016018118,30.0
3,8,1008,HART Danny,4,61.699,137.952,176.32,221.415,257.581,8,138,Finished,10005470073,25.0
4,7,1007,SHAW Luca,5,62.461,139.053,177.406,221.969,257.846,7,139,Finished,10008813442,22.0


# Merge and Expand

This code merges the qualifying and race data into a single DataFrame, bringing across only the race columns that do not already exist in the qualifying data to avoid duplicates. This allows us to do more in-depth analysis later on.

As we merged race into quali, we re-sort the resulting dataset by race rank.

In [66]:
dfq = dq.merge( df[['id'] + list(df.columns.difference( dq.columns ))], left_on='id', right_on='id', how='outer' )
dfq = dfq.sort_values( 'rank', ascending=True )
dfq = dfq.reset_index( drop=True )
dfq['points'] = dfq['r_points'].fillna(0) + dfq['q_points'].fillna(0)

# Time difference between race and quali
dfq['qr_diff'] = dfq['split5'] - dfq['q_split5']
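
The `columns.difference` trick is easier to see on a toy example. In this sketch (made-up frames, not the real data) both frames share `id` and `name`, so only the join key and the race-only column are taken from the race frame, which avoids ending up with `name_x` and `name_y`.

```python
import pandas as pd

quali = pd.DataFrame({'id': [1001, 1005], 'name': ['P', 'G'], 'q_rank': [1, 10]})
race = pd.DataFrame({'id': [1005, 1001], 'name': ['G', 'P'], 'rank': [2, 1]})

# Columns that exist only in the race frame, plus the join key.
race_only = ['id'] + list(race.columns.difference(quali.columns))
merged = quali.merge(race[race_only], on='id', how='outer')

# Re-sort by race rank, as in the cell above.
merged = merged.sort_values('rank').reset_index(drop=True)
print(merged)
```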

#### Expand Dataset

Calculate and add all the extra columns we need for split and sector differences and their rankings. Splits are cumulative times, so a sector time is the difference between two consecutive splits.

In [67]:
for pfx in [ 'q_', '' ]:
    for i in range( 1, splits+1 ):
        split = pfx + 'split' + str(i)
        sector = split + '_sector'
        dfq[split + '_rank'] = dfq[split].rank(method='dense')
        dfq[split + '_vs_best'] = (dfq[split] - dfq[split].min())
        dfq[split + '_vs_winner'] = (dfq[split] - dfq[split][0])

        if i > 1:
            dfq[split + '_sector'] = dfq[split] - dfq[pfx + 'split' + str(i-1)]
            dfq[split + '_sector_rank'] = dfq[sector].rank(method='dense')
            dfq[split + '_sector_vs_best'] = (dfq[sector] - dfq[sector].min())
            dfq[split + '_sector_vs_winner'] = (dfq[sector] - dfq[sector][0])
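
To show what the loop produces for a single sector, here is a minimal sketch using the first two race splits of the top three riders from the table above. It assumes, as the loop does, that row 0 is the winner.

```python
import pandas as pd

# Cumulative split times in seconds (top three riders, race table above).
times = pd.DataFrame({
    'split1': [54.943, 55.155, 54.322],
    'split2': [117.167, 116.308, 116.386],
})

# A sector is the time between two consecutive splits.
times['split2_sector'] = times['split2'] - times['split1']
# Dense ranking: ties share a rank and no rank numbers are skipped.
times['split2_sector_rank'] = times['split2_sector'].rank(method='dense')
# Gap to the fastest sector, and to the rider in row 0 (the winner).
times['split2_sector_vs_best'] = times['split2_sector'] - times['split2_sector'].min()
times['split2_sector_vs_winner'] = times['split2_sector'] - times['split2_sector'][0]
print(times.round(3))
```

A negative `_vs_winner` value means the rider was faster than the eventual winner through that sector.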

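We can take a peek at our data at this point to make sure it looks how we expect.

If `rnrSpeed` is enabled, the `speed` column exists at this point but is NaN (Not a Number) for all racers; it will be filled in below. Otherwise the speed columns are added later from the CSV import.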

In [68]:
display( dfq.head(10) )

Unnamed: 0,bib,id,name,q_rank,q_split1,q_split2,q_split3,q_split4,q_split5,q_start,q_start_rev,q_status,uci,q_points,r_points,rank,split1,split2,split3,split4,split5,start,start_rev,status,points,qr_diff,q_split1_rank,q_split1_vs_best,q_split1_vs_winner,q_split2_rank,q_split2_vs_best,q_split2_vs_winner,q_split2_sector,q_split2_sector_rank,q_split2_sector_vs_best,q_split2_sector_vs_winner,q_split3_rank,q_split3_vs_best,q_split3_vs_winner,q_split3_sector,q_split3_sector_rank,q_split3_sector_vs_best,q_split3_sector_vs_winner,q_split4_rank,q_split4_vs_best,q_split4_vs_winner,q_split4_sector,q_split4_sector_rank,q_split4_sector_vs_best,q_split4_sector_vs_winner,q_split5_rank,q_split5_vs_best,q_split5_vs_winner,q_split5_sector,q_split5_sector_rank,q_split5_sector_vs_best,q_split5_sector_vs_winner,split1_rank,split1_vs_best,split1_vs_winner,split2_rank,split2_vs_best,split2_vs_winner,split2_sector,split2_sector_rank,split2_sector_vs_best,split2_sector_vs_winner,split3_rank,split3_vs_best,split3_vs_winner,split3_sector,split3_sector_rank,split3_sector_vs_best,split3_sector_vs_winner,split4_rank,split4_vs_best,split4_vs_winner,split4_sector,split4_sector_rank,split4_sector_vs_best,split4_sector_vs_winner,split5_rank,split5_vs_best,split5_vs_winner,split5_sector,split5_sector_rank,split5_sector_vs_best,split5_sector_vs_winner
0,1,1001,PIERRON Amaury,1,61.533,135.196,170.755,213.596,247.25,1,145,Finished,10008827283,50.0,200.0,1.0,54.943,117.167,147.615,185.895,216.788,66.0,1.0,Finished,250.0,-30.462,2.0,1.357,0.0,2.0,1.4,0.0,73.663,2.0,0.043,0.0,2.0,0.212,0.0,35.559,1.0,0.0,0.0,1.0,0.0,0.0,42.841,1.0,0.0,0.0,1.0,0.0,0.0,33.654,1.0,0.0,0.0,5.0,0.621,0.0,4.0,0.859,0.0,62.224,4.0,1.071,0.0,4.0,0.984,0.0,30.448,6.0,0.539,0.0,1.0,0.0,0.0,38.28,2.0,0.148,0.0,1.0,0.0,0.0,30.893,4.0,0.086,0.0
1,5,1005,GREENLAND Laurie,10,64.201,141.328,178.669,224.338,259.234,5,141,Finished,10009404738,15.0,160.0,2.0,55.155,116.308,146.631,185.919,217.312,62.0,5.0,Finished,175.0,-41.922,36.0,4.025,2.668,20.0,7.532,6.132,77.127,14.0,3.507,3.464,11.0,8.126,7.914,37.341,5.0,1.782,1.782,12.0,10.742,10.742,45.669,15.0,2.828,2.828,10.0,11.984,11.984,34.896,7.0,1.242,1.242,6.0,0.833,0.212,1.0,0.0,-0.859,61.153,1.0,0.0,-1.071,1.0,0.0,-0.984,30.323,3.0,0.414,-0.125,2.0,0.024,0.024,39.288,5.0,1.156,1.008,2.0,0.524,0.524,31.393,7.0,0.586,0.5
2,8,1008,HART Danny,4,61.699,137.952,176.32,221.415,257.581,8,138,Finished,10005470073,25.0,140.0,3.0,54.322,116.386,146.865,185.938,217.448,65.0,2.0,Finished,165.0,-40.133,4.0,1.523,0.166,3.0,4.156,2.756,76.253,6.0,2.633,2.59,4.0,5.777,5.565,38.368,18.0,2.809,2.809,4.0,7.819,7.819,45.095,9.0,2.254,2.254,4.0,10.331,10.331,36.166,48.0,2.512,2.512,1.0,0.0,-0.621,2.0,0.078,-0.781,62.064,2.0,0.911,-0.16,2.0,0.234,-0.75,30.479,7.0,0.57,0.031,3.0,0.043,0.043,39.073,3.0,0.941,0.793,3.0,0.66,0.66,31.51,8.0,0.703,0.617
3,7,1007,SHAW Luca,5,62.461,139.053,177.406,221.969,257.846,7,139,Finished,10008813442,22.0,125.0,4.0,56.937,120.157,150.066,188.198,219.036,64.0,3.0,Finished,147.0,-38.81,8.0,2.285,0.928,7.0,5.257,3.857,76.592,8.0,2.972,2.929,5.0,6.863,6.651,38.353,17.0,2.794,2.794,5.0,8.373,8.373,44.563,3.0,1.722,1.722,5.0,10.596,10.596,35.877,33.0,2.223,2.223,27.0,2.615,1.994,14.0,3.849,2.99,63.22,9.0,2.067,0.996,8.0,3.435,2.451,29.909,1.0,0.0,-0.539,5.0,2.303,2.303,38.132,1.0,0.0,-0.148,4.0,2.248,2.248,30.838,2.0,0.031,-0.055
4,39,1039,ESTAQUE Thomas,7,62.669,140.237,178.399,224.052,258.186,36,110,Finished,10008848505,18.0,110.0,5.0,54.382,117.274,148.581,187.674,219.254,45.0,22.0,Finished,128.0,-38.932,9.0,2.493,1.136,12.0,6.441,5.041,77.568,18.0,3.948,3.905,9.0,7.856,7.644,38.162,14.0,2.603,2.603,11.0,10.456,10.456,45.653,14.0,2.812,2.812,7.0,10.936,10.936,34.134,3.0,0.48,0.48,2.0,0.06,-0.561,5.0,0.966,0.107,62.892,5.0,1.739,0.668,5.0,1.95,0.966,31.307,15.0,1.398,0.859,4.0,1.779,1.779,39.093,4.0,0.961,0.813,5.0,2.466,2.466,31.58,10.0,0.773,0.687
5,16,1016,WALLACE Mark,61,61.984,144.803,192.331,238.855,275.623,16,130,Finished,10008172636,,95.0,6.0,56.01,119.331,149.494,188.922,220.768,54.0,13.0,Finished,95.0,-54.855,6.0,1.808,0.451,41.0,11.007,9.607,82.819,62.0,9.199,9.156,67.0,21.788,21.576,47.528,104.0,11.969,11.969,60.0,25.259,25.259,46.524,25.0,3.683,3.683,61.0,28.373,28.373,36.768,71.0,3.114,3.114,14.0,1.688,1.067,11.0,3.023,2.164,63.321,10.0,2.168,1.097,6.0,2.863,1.879,30.163,2.0,0.254,-0.285,6.0,3.027,3.027,39.428,7.0,1.296,1.148,6.0,3.98,3.98,31.846,13.0,1.039,0.953
6,18,1018,WILSON Reece,8,63.383,138.94,179.129,223.458,258.362,18,128,Finished,10009563271,17.0,90.0,7.0,55.955,119.1,150.224,189.992,220.858,44.0,23.0,Finished,107.0,-37.504,19.0,3.207,1.85,5.0,5.144,3.744,75.557,4.0,1.937,1.894,16.0,8.586,8.374,40.189,47.0,4.63,4.63,8.0,9.862,9.862,44.329,2.0,1.488,1.488,8.0,11.112,11.112,34.904,8.0,1.25,1.25,13.0,1.633,1.012,8.0,2.792,1.933,63.145,7.0,1.992,0.921,10.0,3.593,2.609,31.124,11.0,1.215,0.676,8.0,4.097,4.097,39.768,12.0,1.636,1.488,7.0,4.07,4.07,30.866,3.0,0.059,-0.027
7,32,1032,ILES Finn,2,60.176,133.796,170.543,215.391,250.682,30,116,Finished,10076111537,40.0,85.0,8.0,55.787,118.733,149.998,189.66,221.502,48.0,19.0,Finished,125.0,-29.18,1.0,0.0,-1.357,1.0,0.0,-1.4,73.62,1.0,0.0,-0.043,1.0,0.0,-0.212,36.747,3.0,1.188,1.188,2.0,1.795,1.795,44.848,5.0,2.007,2.007,2.0,3.432,3.432,35.291,14.0,1.637,1.637,9.0,1.465,0.844,7.0,2.425,1.566,62.946,6.0,1.793,0.722,7.0,3.367,2.383,31.265,13.0,1.356,0.817,7.0,3.765,3.765,39.662,10.0,1.53,1.382,8.0,4.714,4.714,31.842,12.0,1.035,0.949
8,10,1010,MACDONALD Brook,25,64.139,140.731,178.564,223.557,266.114,10,136,Finished,10006429969,,80.0,9.0,56.118,119.319,151.048,190.64,221.916,57.0,10.0,Finished,80.0,-44.198,35.0,3.963,2.606,16.0,6.935,5.535,76.592,8.0,2.972,2.929,10.0,8.021,7.809,37.833,10.0,2.274,2.274,10.0,9.961,9.961,44.993,6.0,2.152,2.152,25.0,18.864,18.864,42.557,118.0,8.903,8.903,17.0,1.796,1.175,10.0,3.011,2.152,63.201,8.0,2.048,0.977,12.0,4.417,3.433,31.729,24.0,1.82,1.281,11.0,4.745,4.745,39.592,9.0,1.46,1.312,9.0,5.128,5.128,31.276,5.0,0.469,0.383
9,3,1003,BROSNAN Troy,53,63.72,144.695,185.993,235.963,273.015,3,143,Finished,10007307417,,75.0,10.0,54.593,118.625,150.077,190.06,222.379,55.0,12.0,Finished,75.0,-50.636,30.0,3.544,2.187,39.0,10.899,9.499,80.975,50.0,7.355,7.312,45.0,15.45,15.238,41.298,60.0,5.739,5.739,51.0,22.367,22.367,49.97,82.0,7.129,7.129,53.0,25.765,25.765,37.052,80.0,3.398,3.398,3.0,0.271,-0.35,6.0,2.317,1.458,64.032,15.0,2.879,1.808,9.0,3.446,2.462,31.452,18.0,1.543,1.004,9.0,4.165,4.165,39.983,14.0,1.851,1.703,10.0,5.591,5.591,32.319,21.0,1.512,1.426


#### Rider Data

Saving the personal information about each racer is much easier, as we can just export the entire `Riders` dataset. However, the rows and columns are the wrong way round, so we use the `.T` attribute to *transpose* the information, which basically flips the axes.

In [69]:
# Use the qualifying response (this is what `d` held after the loop above)
df2 = pd.DataFrame( q['Riders'] )
df2 = df2.T
df2['Age'] = [ calculate_age( dt.datetime.strptime( dob[:10], "%Y-%m-%d" ) ) for dob in df2['BirthDate'] ]
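
The sketch below shows the transpose and the age calculation on two riders from the table in this section. `age_on` is a hypothetical variant of `calculate_age` that takes the reference date as a parameter so the result is reproducible; the date used is picked purely for the example.

```python
import datetime as dt
import pandas as pd

riders = {
    '1001': {'PrintName': 'PIERRON Amaury', 'BirthDate': '1996-03-04T00:00:00'},
    '1002': {'PrintName': 'GWIN Aaron', 'BirthDate': '1987-12-24T00:00:00'},
}

wide = pd.DataFrame(riders)  # one column per rider, one row per field
tall = wide.T                # one row per rider, one column per field

def age_on(born, today):
    # Subtract one year if this year's birthday has not happened yet;
    # the tuple comparison yields True (1) or False (0).
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

ref_day = dt.date(2018, 7, 8)  # fixed date chosen for this example only
tall['Age'] = [age_on(dt.datetime.strptime(b[:10], '%Y-%m-%d'), ref_day)
               for b in tall['BirthDate']]
print(tall)
```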

Here we can glimpse the first few rows of our `DataFrame` and check that the data looks as we expect.

In [70]:
display( df2.head() )

Unnamed: 0,BirthDate,CategoryCode,FamilyName,GivenName,Id,Nation,Outfit,PrintName,Protected,RaceId,RaceNr,ScoreboardName,StartOrder,StartTime,UciCode,UciRank,UciRiderId,UciTeamCode,UciTeamId,UciTeamName,WorldCupRank,Age
1001,1996-03-04T00:00:00,ME,PIERRON,Amaury,1197084694808582,FRA,WCL,PIERRON Amaury,False,0,1,PIERRON A,1,50400000,FRA19960304,3,10008827283,CVN,1590,COMMENCAL / VALLNORD,1,22
1002,1987-12-24T00:00:00,ME,GWIN,Aaron,1197084694808583,USA,NCh,GWIN Aaron,False,0,2,GWIN A,2,50430000,USA19871224,1,10006516663,YTM,1531,THE YT MOB,2,30
1003,1993-07-13T00:00:00,ME,BROSNAN,Troy,1197084694808584,AUS,NCh,BROSNAN Troy,False,0,3,BROSNAN T,3,50460000,AUS19930713,2,10007307417,CFT,2162,CANYON FACTORY DOWNHILL TEAM,3,25
1004,1996-05-07T00:00:00,ME,VERGIER,Loris,1197084694808585,FRA,,VERGIER Loris,False,0,4,VERGIER L,4,50490000,FRA19960507,7,10008723112,SCB,1307,SANTA CRUZ SYNDICATE,4,22
1005,1997-02-18T00:00:00,ME,GREENLAND,Laurie,1197084694808586,GBR,,GREENLAND Laurie,False,0,5,GREENLAND L,5,50520000,GBR19970218,9,10009404738,MSM,1009,MS MONDRAKER TEAM,5,21


# Speed Data

Roots and Rain seem to take about 3 days to get their results online. Given that all UCI data is available immediately, I have added a second method for getting speed data. There is a boolean (`rnrSpeed`) in the config at the top of this notebook to decide whether we pull data from RnR or use an imported CSV file.

## Roots and Rain

### Load Data

Similar to the UCI API, we make a request to the server with the previously declared `urlRoots` variable. This time, however, we simply load the content of the response as text, which is actually the HTML code of the web page. We do not have a nice JSON API to read here, which means there is nothing to deserialize.

Next we invoke a utility called `BeautifulSoup` to help us extract the data from this messy HTML code.

In [71]:
if rnrSpeed:
    c = requests.post( urlRoots ).content
    soup = BeautifulSoup( c, "html.parser" )

### Extract Data

If you look at the Roots and Rain page you'll see it listed in a tabular format. What we do here is find all the rows of that table so we can extract the information we need.

Specifically, we are looking for instances of `tr` (table row) with a class that contains `c-`, as this is a common denominator I discovered when looking through the code with the browser inspector.

In [72]:
if rnrSpeed:
    rows = soup.find_all( "tr", class_=lambda x: x and 'c-' in x )
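
Here is a small offline sketch of that row filter, run against made-up HTML. The class names and table contents are invented for illustration and are not copied from the Roots and Rain page.

```python
from bs4 import BeautifulSoup

# Made-up HTML; the class names are illustrative, not copied from the site.
html = """
<table>
  <tr class="header"><td>Pos</td><td>Name</td></tr>
  <tr class="c-elite"><td>1</td><td>Rider A</td></tr>
  <tr class="c-elite odd"><td>2</td><td>Rider B</td></tr>
  <tr><td>no class on this row</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# The filter receives each class value, or None when the tag has no class,
# hence the `x and ...` guard.
rows = soup.find_all('tr', class_=lambda x: x and 'c-' in x)
for row in rows:
    print([td.text for td in row.find_all('td')])
```

Because the test is `'c-' in x`, any class containing `c-` anywhere will match; `x.startswith('c-')` would be the stricter alternative if that ever picks up unwanted rows.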

Similar to the UCI data set, here we will iterate over each row in our data set--basically each table row from the web page--and extract the bits we need.

Racer speed is the metric we're interested in, but in order to match that to our existing data set we need a corresponding identifier. We therefore also extract the racer licence number, as it exists in both sets and lets us match them together: it is the *intersect* between both sets of data.

To summarise:
1. Extract licence number and corresponding speed
2. Import speed to existing DataFrame matching racers by licence

The `if` condition in the middle will exit this block of code once we hit the end of the Elite finishers, as that's all we have in our existing data set and we can't match anyone else.

In [73]:
if rnrSpeed:
    for row in rows:
        cells = row.find_all( "td" )
        qspd = cells[7].text[:5]
        spd = cells[12].text[:5]
        qspeed = float( qspd if 0 < len(qspd) else 0 )
        speed = float( spd if 0 < len(spd) else 0 )
        licence = cells[4].text
        bib = int( cells[1].text )
        pos = cells[0].text[8:]
        if "" == pos: break

        # Match rider by UCI licence if present, otherwise fallback to bib
        if len(df2.loc[df2['UciRiderId'] == licence].index.values ):
            rid = int(df2.loc[df2['UciRiderId'] == licence].index.values[0])
        else:
            rid = int( df2.loc[df2['RaceNr'] == bib].index.values[0] )

        # Add speed, and other associated metrics
        dfq.loc[dfq['id'] == rid, 'speed'] = speed
        dfq.loc[dfq['id'] == rid, 'q_speed'] = qspeed

As before, we can take another look at our data, with the `speed` column now containing values.

## UCI PDF Converted Speed

Despite the UCI having a speed field in the splits data of their API, it is always 0. Thanks. They do make that data available in their PDFs, but it is not easy to extract and all regular converters fail. However, I did have good success with some OCR engines, the best of which is https://convertio.co/ocr/. I take the converted file, strip it down to UCI# and speed, save it as CSV, and then import and merge it here.

Regex for removing the (X) rank from OCR-converted files:

> Find: `(,[0-9\.]+).*`
>
> Replace: `$1`

The files are matched to riders on a `uci` column. The speed column header in the `.qspeeds.csv` file must be `q_speed`, and in the `.speeds.csv` file it must be just `speed`.
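
The same find/replace can be scripted rather than done in an editor. This is a minimal sketch using Python's `re` module, where the `$1` back-reference is written `\1`. The line format `<UCI id>,<speed> (<rank>)` and the rank values are assumptions made for the example, since the OCR output itself is not shown here.

```python
import re

# Assumed shape of the OCR output: "<UCI id>,<speed> (<rank>)".
# The ranks in brackets are made up for this example.
lines = ['uci,speed', '10008827283,64.759 (12)', '10009404738,66.807 (3)']

# Keep the comma and the number that follows it, drop the rest of the line.
pattern = re.compile(r'(,[0-9\.]+).*')
cleaned = [pattern.sub(r'\1', line) for line in lines]
print(cleaned)
```

The header line passes through untouched because the pattern needs a digit or dot straight after the comma.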

In [74]:
if not rnrSpeed:
    dfs = pd.read_csv( file_prefix + '.speeds.csv' )
    dfsq = pd.read_csv( file_prefix + '.qspeeds.csv' )
    dfq.uci = dfq.uci.astype(str)
    dfs.uci = dfs.uci.astype(str)
    dfsq.uci = dfsq.uci.astype(str)

    dfq = dfq.merge( dfs, left_on='uci', right_on='uci', how='left' )
    dfq = dfq.merge( dfsq, left_on='uci', right_on='uci', how='left' )
    # dfq[['name', 'uci', 'q_speed', 'speed']]
    # dfq.columns

In [75]:
display( dfq.head() )

Unnamed: 0,bib,id,name,q_rank,q_split1,q_split2,q_split3,q_split4,q_split5,q_start,q_start_rev,q_status,uci,q_points,r_points,rank,split1,split2,split3,split4,split5,start,start_rev,status,points,qr_diff,q_split1_rank,q_split1_vs_best,q_split1_vs_winner,q_split2_rank,q_split2_vs_best,q_split2_vs_winner,q_split2_sector,q_split2_sector_rank,q_split2_sector_vs_best,q_split2_sector_vs_winner,q_split3_rank,q_split3_vs_best,q_split3_vs_winner,q_split3_sector,q_split3_sector_rank,q_split3_sector_vs_best,q_split3_sector_vs_winner,q_split4_rank,q_split4_vs_best,q_split4_vs_winner,q_split4_sector,q_split4_sector_rank,q_split4_sector_vs_best,q_split4_sector_vs_winner,q_split5_rank,q_split5_vs_best,q_split5_vs_winner,q_split5_sector,q_split5_sector_rank,q_split5_sector_vs_best,q_split5_sector_vs_winner,split1_rank,split1_vs_best,split1_vs_winner,split2_rank,split2_vs_best,split2_vs_winner,split2_sector,split2_sector_rank,split2_sector_vs_best,split2_sector_vs_winner,split3_rank,split3_vs_best,split3_vs_winner,split3_sector,split3_sector_rank,split3_sector_vs_best,split3_sector_vs_winner,split4_rank,split4_vs_best,split4_vs_winner,split4_sector,split4_sector_rank,split4_sector_vs_best,split4_sector_vs_winner,split5_rank,split5_vs_best,split5_vs_winner,split5_sector,split5_sector_rank,split5_sector_vs_best,split5_sector_vs_winner,speed,q_speed
0,1,1001,PIERRON Amaury,1,61.533,135.196,170.755,213.596,247.25,1,145,Finished,10008827283,50.0,200.0,1.0,54.943,117.167,147.615,185.895,216.788,66.0,1.0,Finished,250.0,-30.462,2.0,1.357,0.0,2.0,1.4,0.0,73.663,2.0,0.043,0.0,2.0,0.212,0.0,35.559,1.0,0.0,0.0,1.0,0.0,0.0,42.841,1.0,0.0,0.0,1.0,0.0,0.0,33.654,1.0,0.0,0.0,5.0,0.621,0.0,4.0,0.859,0.0,62.224,4.0,1.071,0.0,4.0,0.984,0.0,30.448,6.0,0.539,0.0,1.0,0.0,0.0,38.28,2.0,0.148,0.0,1.0,0.0,0.0,30.893,4.0,0.086,0.0,64.759,57.371
1,5,1005,GREENLAND Laurie,10,64.201,141.328,178.669,224.338,259.234,5,141,Finished,10009404738,15.0,160.0,2.0,55.155,116.308,146.631,185.919,217.312,62.0,5.0,Finished,175.0,-41.922,36.0,4.025,2.668,20.0,7.532,6.132,77.127,14.0,3.507,3.464,11.0,8.126,7.914,37.341,5.0,1.782,1.782,12.0,10.742,10.742,45.669,15.0,2.828,2.828,10.0,11.984,11.984,34.896,7.0,1.242,1.242,6.0,0.833,0.212,1.0,0.0,-0.859,61.153,1.0,0.0,-1.071,1.0,0.0,-0.984,30.323,3.0,0.414,-0.125,2.0,0.024,0.024,39.288,5.0,1.156,1.008,2.0,0.524,0.524,31.393,7.0,0.586,0.5,66.807,55.775
2,8,1008,HART Danny,4,61.699,137.952,176.32,221.415,257.581,8,138,Finished,10005470073,25.0,140.0,3.0,54.322,116.386,146.865,185.938,217.448,65.0,2.0,Finished,165.0,-40.133,4.0,1.523,0.166,3.0,4.156,2.756,76.253,6.0,2.633,2.59,4.0,5.777,5.565,38.368,18.0,2.809,2.809,4.0,7.819,7.819,45.095,9.0,2.254,2.254,4.0,10.331,10.331,36.166,48.0,2.512,2.512,1.0,0.0,-0.621,2.0,0.078,-0.781,62.064,2.0,0.911,-0.16,2.0,0.234,-0.75,30.479,7.0,0.57,0.031,3.0,0.043,0.043,39.073,3.0,0.941,0.793,3.0,0.66,0.66,31.51,8.0,0.703,0.617,63.259,53.786
3,7,1007,SHAW Luca,5,62.461,139.053,177.406,221.969,257.846,7,139,Finished,10008813442,22.0,125.0,4.0,56.937,120.157,150.066,188.198,219.036,64.0,3.0,Finished,147.0,-38.81,8.0,2.285,0.928,7.0,5.257,3.857,76.592,8.0,2.972,2.929,5.0,6.863,6.651,38.353,17.0,2.794,2.794,5.0,8.373,8.373,44.563,3.0,1.722,1.722,5.0,10.596,10.596,35.877,33.0,2.223,2.223,27.0,2.615,1.994,14.0,3.849,2.99,63.22,9.0,2.067,0.996,8.0,3.435,2.451,29.909,1.0,0.0,-0.539,5.0,2.303,2.303,38.132,1.0,0.0,-0.148,4.0,2.248,2.248,30.838,2.0,0.031,-0.055,62.485,54.564
4,39,1039,ESTAQUE Thomas,7,62.669,140.237,178.399,224.052,258.186,36,110,Finished,10008848505,18.0,110.0,5.0,54.382,117.274,148.581,187.674,219.254,45.0,22.0,Finished,128.0,-38.932,9.0,2.493,1.136,12.0,6.441,5.041,77.568,18.0,3.948,3.905,9.0,7.856,7.644,38.162,14.0,2.603,2.603,11.0,10.456,10.456,45.653,14.0,2.812,2.812,7.0,10.936,10.936,34.134,3.0,0.48,0.48,2.0,0.06,-0.561,5.0,0.966,0.107,62.892,5.0,1.739,0.668,5.0,1.95,0.966,31.307,15.0,1.398,0.859,4.0,1.779,1.779,39.093,4.0,0.961,0.813,5.0,2.466,2.466,31.58,10.0,0.773,0.687,60.343,52.122


Now that we have speed info either way, we can expand the data set.

In [76]:
dfq['speed_ms'] = dfq['speed'] * (1000/60/60)
dfq['speed_ms_vs_best'] = dfq['speed_ms'].max() - dfq.speed_ms
dfq['speed_rank'] = dfq.speed.rank(method='dense', ascending=False)
dfq['q_speed_rank'] = dfq['q_speed'].rank(method='dense', ascending=False)
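
As a quick check of the arithmetic, here is the same expansion on three speeds taken from the table above. The `1000/60/60` factor implies the source speeds are in km/h and `speed_ms` is in metres per second; that unit is inferred from the factor rather than stated by either data source.

```python
import pandas as pd

speeds = pd.DataFrame({'name': ['P', 'G', 'H'],
                       'speed': [64.759, 66.807, 63.259]})

# km/h -> m/s: multiply by 1000 metres and divide by 3600 seconds.
speeds['speed_ms'] = speeds['speed'] * (1000 / 60 / 60)
# Gap to the fastest rider (0 for the fastest, positive for everyone else).
speeds['speed_ms_vs_best'] = speeds['speed_ms'].max() - speeds['speed_ms']
# Highest speed gets rank 1, hence ascending=False.
speeds['speed_rank'] = speeds['speed'].rank(method='dense', ascending=False)
print(speeds.round(3))
```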

# Data Export

All that's left is to save our data to CSV files so we can quickly import it again for analysis and visualization without making constant requests to the online servers. This reduces load on the services providing the data, lets us work on our analysis "offline", and gives us a local copy in case the results are ever taken down.

In [77]:
dfq.id = dfq.id.astype(str)
dfm = dfq.merge( df2, left_on='id', right_index=True, how='inner' )

In [78]:
df.to_csv( file_prefix + '.results.csv' )
dq.to_csv( file_prefix + '.quali.csv' )
df2.to_csv( file_prefix + '.racers.csv' )
dfm.to_csv( file_prefix + '.merged.csv' )
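
For reference, loading one of these files back in later might look like the sketch below. It writes a tiny made-up frame to a temporary directory so it can run on its own; the file name is hypothetical.

```python
import os
import tempfile
import pandas as pd

frame = pd.DataFrame({'id': ['1001', '1005'], 'rank': [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.merged.csv')  # hypothetical file name
    frame.to_csv(path)                       # index is written as the first column
    loaded = pd.read_csv(path, index_col=0)  # so read it back as the index

print(loaded.dtypes)
```

Column types are inferred again on load, so the string `id` values come back as integers here. Passing `dtype={'id': str}` to `read_csv` keeps them as strings if that matters for later merges.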

---

## Credits

### Author: Dominic Wrapson


> **@domwrap**
<br>
<img src="https://png.icons8.com/material/24/000000/github-2.png">
<img src="https://png.icons8.com/material/24/000000/stackoverflow.png">
<img src="https://png.icons8.com/material/24/000000/linkedin.png">
<img src="https://png.icons8.com/material/24/000000/windows8.png">
<img src="https://png.icons8.com/ios-glyphs/24/000000/instagram-new.png">
<img src="https://png.icons8.com/material/24/000000/twitter.png">
<a href="https://medium.com/@domwrap"><img src="https://png.icons8.com/material/24/000000/medium-logo.png"></a>
>
> <img src="https://png.icons8.com/material/24/000000/home.png"> http://domwrap.me
>
><img src="https://png.icons8.com/material/24/000000/cycling-mountain-bike.png"> [Hwulex](https://www.pinkbike.com/u/Hwulex/)


---

#### Special Thanks

Mark Shilton for the inspiration
- http://lookatthestats.blogspot.ca
- https://plus.google.com/+MarkShilton
- https://dirtmountainbike.com/author/mrgeekstats


<a href="https://icons8.com">Icon pack by Icons8</a>