In [22]:
%%HTML
<style>
    body {
        --vscode-font-family: "Inter";
        font-size: 15px;
    }
</style>

# **Dependencies**
* Pandas
* NumPy
* Plotly
* nbformat

# **Running Predictions**

The objective of this project is to create a model that predicts race time from the race distance and a runner's history of running performance.

Let's start by plotting average heart rate (HR) on the z-axis against distance and pace on the x- and y-axes for a few individuals. We may use more than these three variables for the prediction, but three variables are easy to visualise, and the plot will give us an idea of how strongly they are correlated.
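A 3D scatter shows correlation qualitatively; it can be complemented by a number. The sketch below computes pairwise Pearson correlation coefficients with pandas' `DataFrame.corr`. It uses synthetic data: the column names `distance_km`, `pace_min_per_km` and `avg_hr` and the relationships between them are assumptions made for illustration, not the dataset's actual columns or values.

```python
import numpy as np
import pandas as pd

# Synthetic activities; the column names are hypothetical placeholders for the
# dataset's real distance, pace and heart-rate columns
rng = np.random.default_rng(0)
n = 200
distance_km = rng.uniform(3, 42, n)
# assumed relationship: pace (min/km) slows slightly with distance
pace_min_per_km = 4.5 + 0.02 * distance_km + rng.normal(0, 0.2, n)
# assumed relationship: heart rate is higher when pace is faster (fewer min/km)
avg_hr = 200 - 12 * pace_min_per_km + rng.normal(0, 3, n)

df = pd.DataFrame({
    "distance_km": distance_km,
    "pace_min_per_km": pace_min_per_km,
    "avg_hr": avg_hr,
})

# Pairwise Pearson correlation coefficients (values in [-1, 1])
corr = df.corr(method="pearson")
print(corr.round(2))
```

Pearson only measures linear association. If the relationship looks monotonic but curved in the scatter plot, `method="spearman"` (rank correlation) may be the better summary.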

In [57]:
import pandas as pd
import plotly.graph_objects as go

# Shared style for the three scene axes: dark background, light grid and zero lines
AXIS_STYLE = dict(
    backgroundcolor='#192227',
    gridcolor='#a3b3b5',
    showbackground=True,
    zerolinecolor='#a3b3b5',
    # add range=[min, max] here to fix the axis limits
)

def my_scatter(x, y, z,
               c,
               cmap='viridis',
               aspectratio=None,
               height=None,
               width=None,
               ):
    '''
    Helper function that plots a 3D scatter using plotly

    x, y, z: list or np.ndarray or pd.Series - coordinates of the points
    c: list or np.ndarray or pd.Series - the numerical value associated with the colour map
    cmap: string - colour map (a plotly colorscale name)
    aspectratio: dict - aspect ratio of the scene; defaults to dict(x=1, y=1, z=1)
    height, width: int or None - figure size in pixels; the figure autosizes when both are None
    '''
    if aspectratio is None:
        aspectratio = dict(x=1, y=1, z=1)

    x = pd.Series(x)
    y = pd.Series(y)
    z = pd.Series(z)

    trace_data = go.Scatter3d(
        x=x,
        y=y,
        z=z,
        mode='markers',
        marker=dict(
            size=3,
            color=c,
            colorscale=cmap,
            opacity=0.8
        )
    )

    fig = go.Figure(data=trace_data)
    fig.update_layout(
        autosize=(height is None and width is None),
        width=width,
        height=height,
        margin=dict(l=0, r=0, b=0, t=0),
        paper_bgcolor='#192227',
        font_color='#a3b3b5',
        scene=dict(
            xaxis=AXIS_STYLE,
            yaxis=AXIS_STYLE,
            zaxis=AXIS_STYLE,
            aspectratio=aspectratio,
            camera=dict(projection_type="orthographic")
        )
    )

    return fig

In [78]:
import pandas as pd
import numpy as np

data = pd.read_csv("data/raw-data-kaggle.csv", delimiter=";")

# column of distance rounded to the nearest km
# (pandas rounds halves to even, e.g. 2.5 km -> 2 and 3.5 km -> 4)
data["distance_km"] = (data["distance (m)"] / 1000).round().astype(int)

# parse the timestamps (day/month/year hour:minute) and convert them to seconds since the Unix epoch
timestamps = pd.to_datetime(data["timestamp"], format="%d/%m/%Y %H:%M")
seconds = (timestamps - pd.Timestamp("1970-01-01")).dt.total_seconds()

# min-max normalise to [0, 1]: 0 = earliest activity, 1 = latest
data["timestamp_abs"] = (seconds - seconds.min()) / (seconds.max() - seconds.min())
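The min-max scaling applied to the timestamps can be reused for the other input features. Below is a minimal sketch of a reusable version. The helper name `min_max_normalise` is hypothetical. It adds one guard the inline version lacks: a constant column has max equal to min, which would otherwise divide by zero.

```python
import pandas as pd

def min_max_normalise(s: pd.Series) -> pd.Series:
    """Scale a numeric Series linearly to [0, 1] (hypothetical helper, not from the original notebook).

    A constant column has max == min, which would divide by zero, so it is mapped to 0.0 instead.
    """
    lo, hi = s.min(), s.max()
    if hi == lo:
        return pd.Series(0.0, index=s.index)
    return (s - lo) / (hi - lo)

# Example on made-up values (e.g. seconds since the epoch)
print(min_max_normalise(pd.Series([100.0, 150.0, 200.0])).tolist())  # [0.0, 0.5, 1.0]
print(min_max_normalise(pd.Series([7.0, 7.0])).tolist())             # [0.0, 0.0]
```

To scale several columns at once, `df[cols].apply(min_max_normalise)` applies it column by column. When the scaled features feed a model, the min and max would normally be taken from the training rows only, so the validation data does not leak into the scaling.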



Once the data is processed and normalised, I'll build an initial neural network (NN) to find the minimum number of nodes needed to capture the input/output mapping with at least 90% accuracy for a given user. This will tell me whether a helper model for fine-tuning is feasible.
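The search for the smallest sufficient network can be sketched as a sweep over hidden-layer widths that stops at the first width reaching the target on held-out data. The sketch makes several assumptions. To stay short and NumPy-only, it swaps full backpropagation training for a random-feature shortcut: the hidden weights are fixed and random, and only the output layer is fitted by least squares. "90% accuracy" is interpreted as 90% of validation predictions falling within 10% of the true race time. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def within_tolerance_accuracy(y_true, y_pred, tol=0.10):
    """Fraction of predictions within +/- tol (relative) of the true value.

    ASSUMPTION: one possible definition of "accuracy" for a regression target like race time.
    """
    return float(np.mean(np.abs(y_pred - y_true) <= tol * np.abs(y_true)))


def fit_random_feature_net(X, y, n_hidden, rng):
    """Single hidden tanh layer with fixed random weights; only the output layer
    (weights + bias) is fitted, by linear least squares (not backpropagation)."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)

    def hidden(X_in):
        H = np.tanh(X_in @ W + b)
        return np.hstack([H, np.ones((H.shape[0], 1))])  # extra column = output bias

    beta, *_ = np.linalg.lstsq(hidden(X), y, rcond=None)
    return lambda X_new: hidden(X_new) @ beta


def smallest_sufficient_width(X_tr, y_tr, X_val, y_val, widths, threshold=0.9, seed=0):
    """Return (n_hidden, accuracy) for the smallest width reaching the threshold,
    or (None, best_accuracy) if none of the widths does."""
    best = 0.0
    for n_hidden in sorted(widths):
        predict = fit_random_feature_net(X_tr, y_tr, n_hidden, np.random.default_rng(seed))
        acc = within_tolerance_accuracy(y_val, predict(X_val))
        best = max(best, acc)
        if acc >= threshold:
            return n_hidden, acc
    return None, best


# Synthetic data (hypothetical): race time in minutes from distance and a recent average pace
rng = np.random.default_rng(42)
n = 400
distance = rng.uniform(5, 42, n)  # km
pace = rng.uniform(4.0, 6.5, n)   # min/km
race_time = distance * pace * (1 + 0.03 * np.log(distance)) + rng.normal(0, 1, n)

X = np.column_stack([distance, pace])
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max normalise each feature
width, acc = smallest_sufficient_width(X[:300], race_time[:300], X[300:], race_time[300:],
                                       widths=[1, 2, 4, 8, 16, 32])
print(width, acc)
```

Each width here is tried with a single random initialisation. A more robust version would average the validation accuracy over several seeds before comparing widths, since one lucky or unlucky draw can shift the selected width.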

The possible outcomes are:
* Low number of nodes: the helper fine-tuning model can output the changes to the weights and biases of each node
* High number of nodes: there are too many weights and biases for a neural network to output them all, so I'll have to fine-tune the base models directly, and probably use a greater number of clusters
* The mapping can't be captured with 90% accuracy: this indicates poor correlation between the inputs and the output
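The difference between the first two outcomes comes down to how many weights and biases the helper model would have to output. Below is a minimal sketch that counts them for a fully connected network. The layer sizes are hypothetical examples, not the project's architecture. Once there are two or more hidden layers, the count grows quadratically with the hidden width. A small base network therefore yields tens of values to predict, while a modest two-hidden-layer network already yields thousands.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected network.

    layer_sizes: [n_inputs, hidden_1, ..., n_outputs]
    Each layer mapping n_in -> n_out has n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical architectures with 3 inputs and 1 output (race time)
for sizes in ([3, 8, 1], [3, 64, 64, 1]):
    print(sizes, count_parameters(sizes))
# [3, 8, 1] 41
# [3, 64, 64, 1] 4481
```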