In [22]:
%%HTML
<style>
    body {
        --vscode-font-family: "Inter";
        font-size: 15px;
    }
</style>

# **Dependencies**
* Pandas
* Plotly
* nbformat

# **Running Predictions**

The objective of this project is to build a model that predicts race time from the race distance and a runner's performance history.

Let's start by plotting average heart rate (HR) on the z-axis against distance and pace on the x- and y-axes for a few individuals. The prediction may end up using more than these three variables, but three is the most we can visualise easily. The plot gives a first idea of how strongly the variables are correlated.
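
The raw dataset shown further down has no pace column, so pace has to be derived before plotting. The snippet below is a minimal sketch of one way to do that, assuming pace is expressed in minutes per kilometre. The `add_pace` helper and the `pace (min/km)` column name are hypothetical; the sample rows are copied from the table in this notebook.

```python
import pandas as pd

# A few rows copied from the dataset shown in this notebook.
runs = pd.DataFrame({
    "athlete": [18042525, 18042525, 27950722],
    "distance (m)": [2965.8, 10020.8, 5790.2],
    "elapsed time (s)": [812, 3290, 2242],
    "average heart rate (bpm)": [150.3, 160.8, 151.0],
})

def add_pace(df):
    """Return a copy of df with a hypothetical 'pace (min/km)' column."""
    out = df.copy()
    minutes = out["elapsed time (s)"] / 60
    kilometres = out["distance (m)"] / 1000
    out["pace (min/km)"] = minutes / kilometres
    return out

runs = add_pace(runs)

# Keep only the individuals we want to plot.
selected = runs[runs["athlete"].isin([18042525])]
```

The `distance (m)`, `pace (min/km)` and `average heart rate (bpm)` columns of `selected` can then serve as the x, y and z inputs of the scatter plot.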

In [57]:
import pandas as pd  # my_scatter wraps its inputs in pd.Series
import plotly.graph_objects as go

def my_scatter(x,y,z,
               c,
               cmap='viridis',
               aspectratio=dict(x=1,y=1,z=1),
               height=None,
               width=None,
               ):
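    '''
    Helper function that draws a 3D scatter plot with plotly

    x, y, z: list or np.ndarray or pd.Series - coordinates of each point
    c: list or np.ndarray or pd.Series - the numerical value associated with the colour map
    cmap: string - colour map
    aspectratio: dict - relative lengths of the x, y and z axes
    height, width: int or None - figure size in pixels; None lets plotly autosize

    Returns a plotly.graph_objects.Figure
    '''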

    x = pd.Series(x)
    y = pd.Series(y)
    z = pd.Series(z)


    trace_data = go.Scatter3d(
        x=x,
        y=y,
        z=z,
        mode='markers',
        marker=dict(
            size=3,
            color=c,
            colorscale=cmap,
            opacity=0.8
        )
    )

    fig = go.Figure(data = trace_data)
    fig.update_layout(
        autosize=(height is None and width is None),
        width=width,
        height=height,
        margin = dict(l=0,r=0,b=0,t=0),
        paper_bgcolor='#192227',
        font_color = '#a3b3b5',
        scene = dict(
            xaxis = dict(
                backgroundcolor='#192227',
                gridcolor='#a3b3b5',
                showbackground=True,
                zerolinecolor='#a3b3b5',
                # range=
            ),
            yaxis = dict(
                backgroundcolor='#192227',
                gridcolor='#a3b3b5',
                showbackground=True,
                zerolinecolor='#a3b3b5',
                # range=
            ),
            zaxis = dict(
                backgroundcolor='#192227',
                gridcolor='#a3b3b5',
                showbackground=True,
                zerolinecolor='#a3b3b5',
                # range=
            ),
            aspectratio=aspectratio,
            camera = dict(projection_type="orthographic")
        )
    )

    return fig

In [103]:
import pandas as pd
import numpy as np
from datetime import datetime

data = pd.read_csv("data/raw-data-kaggle.csv", delimiter=";")


data["timestamp_abs"] = data["timestamp"].apply(lambda x: (datetime.strptime(x, "%d/%m/%Y %H:%M") - datetime(1970, 1, 1)).total_seconds())

data

Unnamed: 0,athlete,gender,timestamp,distance (m),elapsed time (s),elevation gain (m),average heart rate (bpm),timestamp_abs
0,18042525,M,15/12/2019 09:08,2965.8,812,17.4,150.3,1.576401e+09
1,18042525,M,10/12/2019 19:27,10020.8,3290,52.2,160.8,1.576006e+09
2,18042525,M,03/12/2019 19:46,12132.2,4027,249.0,148.9,1.575402e+09
3,18042525,M,26/11/2019 19:46,11631.5,4442,194.0,136.2,1.574798e+09
4,18042525,M,19/11/2019 19:45,11708.1,4022,250.7,146.0,1.574193e+09
...,...,...,...,...,...,...,...,...
42111,27950722,F,17/11/2017 17:48,5790.2,2242,19.5,151.0,1.510941e+09
42112,27950722,F,14/11/2017 18:02,6452.9,2398,19.2,142.0,1.510683e+09
42113,27950722,F,12/11/2017 09:48,12271.2,5334,203.0,153.5,1.510480e+09
42114,27950722,F,10/11/2017 18:06,7057.4,2592,25.1,138.8,1.510337e+09
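
The `timestamp_abs` column above is built row by row with `datetime.strptime`. As a possible alternative, the sketch below parses the whole column at once with `pd.to_datetime`, assuming the timestamps are timezone-naive and always follow the `%d/%m/%Y %H:%M` format. The two sample strings are taken from the table; the final lines cross-check the result against the row-by-row approach.

```python
import pandas as pd
from datetime import datetime

timestamps = pd.Series(["15/12/2019 09:08", "10/11/2017 18:06"])

# Parse the whole column at once; the format string matches the one used above.
parsed = pd.to_datetime(timestamps, format="%d/%m/%Y %H:%M")

# Seconds since 1970-01-01, treating the timestamps as timezone-naive.
timestamp_abs = (parsed - pd.Timestamp("1970-01-01")).dt.total_seconds()

# Cross-check against the row-by-row version used in the notebook.
expected = timestamps.apply(
    lambda x: (datetime.strptime(x, "%d/%m/%Y %H:%M") - datetime(1970, 1, 1)).total_seconds()
)
```

Both approaches should give identical values; the vectorised one mainly saves time on a table of this size.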


In [104]:
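def normalise(series):
    '''
    Min-max scales the values of a pd.Series object to the range [0, 1]
    '''

    return (series - series.min()) / (series.max() - series.min())

# Select the numeric columns by name; a positional slice breaks silently if the column order changes
numeric_cols = ["distance (m)", "elapsed time (s)", "elevation gain (m)", "average heart rate (bpm)", "timestamp_abs"]
data[numeric_cols] = data[numeric_cols].apply(normalise)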

data

Unnamed: 0,athlete,gender,timestamp,distance (m),elapsed time (s),elevation gain (m),average heart rate (bpm),timestamp_abs
0,18042525,M,15/12/2019 09:08,0.013546,0.000273,0.001441,0.634177,0.997010
1,18042525,M,10/12/2019 19:27,0.045768,0.001107,0.004322,0.678481,0.996385
2,18042525,M,03/12/2019 19:46,0.055411,0.001355,0.020615,0.628270,0.995429
3,18042525,M,26/11/2019 19:46,0.053124,0.001495,0.016062,0.574684,0.994471
4,18042525,M,19/11/2019 19:45,0.053474,0.001354,0.020756,0.616034,0.993513
...,...,...,...,...,...,...,...,...
42111,27950722,F,17/11/2017 17:48,0.026445,0.000754,0.001614,0.637131,0.893320
42112,27950722,F,14/11/2017 18:02,0.029472,0.000807,0.001590,0.599156,0.892910
42113,27950722,F,12/11/2017 09:48,0.056046,0.001795,0.016807,0.647679,0.892590
42114,27950722,F,10/11/2017 18:06,0.032233,0.000872,0.002078,0.585654,0.892363
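
Because the model is meant to predict race time, a prediction made on the normalised scale eventually has to be mapped back to seconds, and `normalise` as written discards the minimum and maximum needed for that. The sketch below shows one way to keep them. The `fit_min_max`, `scale` and `unscale` helpers are hypothetical names, not part of the notebook, and the sample values are the first five elapsed times from the table.

```python
import pandas as pd

def fit_min_max(series):
    """Return the (min, max) pair needed to scale and unscale a series."""
    return series.min(), series.max()

def scale(series, lo, hi):
    """Min-max scale to [0, 1] using a stored minimum and maximum."""
    return (series - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale(): map values in [0, 1] back to the original units."""
    return scaled * (hi - lo) + lo

elapsed = pd.Series([812.0, 3290.0, 4027.0, 4442.0, 4022.0])
lo, hi = fit_min_max(elapsed)
scaled = scale(elapsed, lo, hi)
restored = unscale(scaled, lo, hi)
```

Storing `lo` and `hi` per column also means new runs can be scaled with the same constants as the training data.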


Once the data is processed and normalised, I'll train an initial neural network (NN) to find the minimum number of nodes needed to capture the input/output map with at least 90% accuracy for a given user. This will tell me whether a helper model for fine-tuning is feasible.

The possible outcomes are:
* Low number of nodes: the helper fine-tuning model can output the changes to the weights and biases of each node
* High number of nodes: there are too many weights and biases to use them as the outputs of another neural network, so I'll have to fine-tune base models directly, and probably use a greater number of clusters
* 90% accuracy can't be reached: this indicates that the inputs are poorly correlated with the output
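
To get a feel for how quickly the number of weights and biases grows with the number of nodes, the sketch below counts the parameters of a fully connected network. The layer sizes are hypothetical (3 inputs, one hidden layer, 1 output for race time) and are not the architecture of this project, which has not been chosen yet.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, inputs first and outputs last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per node in the receiving layer
    return total

# Hypothetical architectures: 3 inputs, one hidden layer, 1 output.
small = count_parameters([3, 8, 1])
large = count_parameters([3, 256, 1])
```

With these assumed sizes, the 8-node network has 41 parameters and the 256-node network has 1281, which is the number of outputs the helper model would need in each case.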