# Marlborough Running Club - 2024 Ridgeway Relay

## Preface

### Introduction
This notebook reproduces the graph produced by MRC for the 2024 Ridgeway Relay but makes it interactive.  The data was taken from the MRC 2024 results pdf for the race and put into a csv file.  This is the 2nd iteration.  The first iteration was to reproduce the graph as shown in the results pdf.  This version will use the same data but will show the elapsed time for each leg by each team.

### Motive
The graph on the pdf is difficult to follow as it is a static image and has to squeeze onto an A4 page.  By making it interactive, it is easier to see the details of the team/runner at each leg.  You can also hide teams to de-clutter the graph and focus on the teams you are interested in.

### Author Details
- Name: Irfan Akram
- Club: Handy Cross Runners
- Github: IrfanAkram5
- Participated in the 2024 Ridgeway Relay (Leg 5)

## This version
The program `marborough_hover.ipynb` is the original version, which replicates the plot shown in the results. This version takes the same data but uses a datetime x-axis to show each team's elapsed time at the end of each leg. That requires some data manipulation in the pandas dataframe to calculate each team's cumulative elapsed time, which is then used to plot the graph. The output is a mix of line and scatter glyphs.
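Plotting elapsed times on a datetime axis works by adding each duration to an arbitrary midnight timestamp, so that `%H:%M:%S` formatting shows the duration rather than a clock time. The minimal sketch below shows that idea with pandas alone; the helper `as_clock_time` and the sample durations are hypothetical, not part of the notebook. It also makes one caveat visible: anything of 24 hours or more wraps around, which should not matter for a single-day relay.

```python
import pandas as pd

# Arbitrary anchor date; only the time-of-day part is ever displayed
MIDNIGHT = pd.Timestamp("2024-06-23 00:00:00")


def as_clock_time(duration: str) -> str:
    """Hypothetical helper: render a duration the way the %H:%M:%S axis labels would."""
    stamp = MIDNIGHT + pd.to_timedelta(duration)
    return stamp.strftime("%H:%M:%S")


print(as_clock_time("01:13:33"))         # a leg time shown as an elapsed time
print(as_clock_time("1 days 01:00:00"))  # 25 hours rolls into the next day and wraps
```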

### Acknowledgements
All the data was taken from the MRC 2024 Ridgeway Relay results PDF. I am not a member of the MRC; this was just my second time taking part in their annual race, as a member of HXR. This is a personal project for self-learning purposes.

## Code

### Import Libraries
The two main libraries are Bokeh and Pandas: Bokeh for plotting and Pandas for data manipulation.

```python
from bokeh import __version__ as bkv
from bokeh.plotting import figure, show
from bokeh.models import (
    ColumnDataSource, HoverTool, PanTool, WheelZoomTool, BoxZoomTool, SaveTool,
    ResetTool, HelpTool, LinearAxis, BasicTicker,
    DatetimeTicker, DatetimeTickFormatter, DatetimeAxis, Label,
)
from bokeh.io import output_notebook
from bokeh.palettes import Category20
from bokeh.core.enums import MarkerType

from pathlib import Path
import math
from itertools import cycle
import pandas as pd
import sys, platform
from IPython.display import display

# Display plots inline in the notebook instead of opening a browser window (MS Edge here)
output_notebook()

# Fit all columns on one line when printing dataframes to the notebook
pd.set_option('display.width', 1000)

# Show floats with two decimals and thousands separators (no scientific notation)
pd.options.display.float_format = '{:,.2f}'.format

# Suppress SettingWithCopyWarning - we do want to modify the copied data!
pd.options.mode.chained_assignment = None  # default='warn'

# Display working versions of the main core packages for reference
print(f'{"Python version:":<20}', sys.version)
print(f'{"Pandas version:":<20}', pd.__version__)
print(f'{"Bokeh version:":<20}', bkv)
print(f'{"Platform info:":<20}', platform.platform())
```

```
Python version:      3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
Pandas version:      1.4.4
Bokeh version:       2.4.3
Platform info:       Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
```


### Read and Clean Up the Data

Some minor clean-up of the original data was required, e.g. one of the clubs was misspelled between legs. This was corrected in the CSV file.
The code below strips unnecessary space characters, fills in blanks with actual values, and converts the clock times to datetime objects and the leg times to timedeltas.
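The clean-up can be sketched on a tiny hand-made frame. The values below are made up for illustration, not taken from the results: stray spaces are stripped, the `#N/A` placeholder becomes a zero duration, and the strings are then parsed with `pd.to_timedelta`.

```python
import pandas as pd

# Hypothetical sample: padded strings and an '#N/A' placeholder, as in the raw file
col = "Time(including penalties)"
raw = pd.DataFrame({col: [" 01:13:33", "#N/A ", " 00:44:15 "]})

cleaned = (
    raw[col]
    .str.strip()                  # remove leading/trailing spaces
    .replace("#N/A", "00:00:00")  # placeholder -> zero duration
)
raw[col] = pd.to_timedelta(cleaned)  # strings -> timedelta64 values

print(raw[col].tolist())
```

Stripping has to come first: `" #N/A"` with a leading space would not match the `replace` target.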

```python
path = Path()
filename = 'marlborough24_data.txt'
# combine the filename and path into a single object
datafile = path / filename

# Read the data into a dataframe
df = pd.read_csv(datafile)

# Race date used to turn clock times into full timestamps (ISO format avoids day/month ambiguity)
RACE_DATE = '2024-06-23'

# Fill in missing values for the mass-start flag
df['Mass Start?'] = df['Mass Start?'].fillna('No')

# Strip leading/trailing spaces from the text columns
text_cols = ['Start Time', 'Finish Time', 'Time(including penalties)',
             'Pace(including penalties)', 'Penalties', 'Team Name']
for col in text_cols:
    df[col] = df[col].str.strip()

df['Team Number'] = df['Team Number'].astype(int)

# Replace the string '#N/A' with 00:00:00 for the timing columns.
# fillna does not work here because the value is a string, not NaN
time_cols = ['Start Time', 'Finish Time', 'Time(including penalties)',
             'Pace(including penalties)', 'Penalties']
for col in time_cols:
    df[col] = df[col].replace('#N/A', '00:00:00')

# Fill any remaining NaN values with 0
df = df.fillna(0)

# Convert the two placing columns from float to int
df['Team placing at the end of this Leg'] = df['Team placing at the end of this Leg'].astype(int)
df['Individual placing this Leg'] = df['Individual placing this Leg'].astype(int)

# Clock times -> datetimes, leg time -> timedelta, pace and penalties -> time of day
df['Start Time'] = pd.to_datetime(RACE_DATE + ' ' + df['Start Time'])
df['Finish Time'] = pd.to_datetime(RACE_DATE + ' ' + df['Finish Time'])
df['Time(including penalties)'] = pd.to_timedelta(df['Time(including penalties)'])
df['Pace(including penalties)'] = pd.to_datetime(RACE_DATE + ' ' + df['Pace(including penalties)']).dt.time
df['Penalties'] = pd.to_datetime(RACE_DATE + ' ' + df['Penalties']).dt.time

# Add a new datetime column to the dataframe and set it to midnight
df['Elapsed Time'] = pd.to_datetime(RACE_DATE + ' 00:00:00')

# Sort by team number and leg number so each team's legs are consecutive rows
df.sort_values(by=['Team Number', 'Leg'], inplace=True)
df = df.reset_index(drop=True)

# Update the elapsed time column. If the leg number is 1, elapsed time = midnight + time(including penalties)
# of the current row. Otherwise, elapsed time = elapsed time of the previous row + time(including penalties)
# of the current row. This relies on the sort above: the previous row is the same team's previous leg.
for row in df.itertuples():
    if row.Leg == 1:
        df.loc[row.Index, 'Elapsed Time'] = df.loc[row.Index, 'Elapsed Time'] + df.loc[row.Index, 'Time(including penalties)']
    else:
        df.loc[row.Index, 'Elapsed Time'] = df.loc[row.Index - 1, 'Elapsed Time'] + df.loc[row.Index, 'Time(including penalties)']

with pd.option_context('display.max_rows', 20, 'display.max_columns', None):
    display(df)
```


| | Leg | Team Number | Team Name | Runner | Category | Start Time | Finish Time | Time(including penalties) | Pace(including penalties) | Penalties | Mass Start? | Team placing at the end of this Leg | Individual placing this Leg | Elapsed Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Abingdon AC 1 | James Clayton | M | 2024-06-23 07:30:00 | 2024-06-23 08:43:33 | 0 days 01:13:33 | 00:06:41 | 00:00:00 | No | 9 | 9 | 2024-06-23 01:13:33 |
| 1 | 2 | 1 | Abingdon AC 1 | Rob Howlin | M | 2024-06-23 08:43:33 | 2024-06-23 09:27:48 | 0 days 00:44:15 | 00:07:22 | 00:00:00 | No | 3 | 1 | 2024-06-23 01:57:48 |
| 2 | 3 | 1 | Abingdon AC 1 | Liz Mcgil | F | 2024-06-23 09:27:48 | 2024-06-23 10:53:17 | 0 days 01:25:29 | 00:09:06 | 00:00:00 | No | 9 | 22 | 2024-06-23 03:23:17 |
| 3 | 4 | 1 | Abingdon AC 1 | Kim Sutherland | F | 2024-06-23 10:53:17 | 2024-06-23 11:33:36 | 0 days 00:40:19 | 00:07:28 | 00:00:00 | No | 9 | 8 | 2024-06-23 04:03:36 |
| 4 | 5 | 1 | Abingdon AC 1 | Joe Evans-Murray | M | 2024-06-23 11:33:36 | 2024-06-23 12:40:53 | 0 days 01:07:17 | 00:06:40 | 00:00:00 | No | 8 | 2 | 2024-06-23 05:10:53 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | 6 | 40 | Buckinghamshire Runners | Nathan Jones | F | 2024-06-23 13:45:00 | 2024-06-23 15:45:57 | 0 days 02:00:57 | 00:11:38 | 00:00:00 | Yes | 38 | 39 | 2024-06-23 10:11:01 |
| 396 | 7 | 40 | Buckinghamshire Runners | Kirstie Elliot | F | 2024-06-23 14:00:00 | 2024-06-23 15:22:43 | 0 days 01:22:43 | 00:09:05 | 00:00:00 | Yes | 38 | 25 | 2024-06-23 11:33:44 |
| 397 | 8 | 40 | Buckinghamshire Runners | Jez Vibert | MV | 2024-06-23 15:22:43 | 2024-06-23 16:35:30 | 0 days 01:12:47 | 00:09:20 | 00:00:00 | No | 37 | 25 | 2024-06-23 12:46:31 |
| 398 | 9 | 40 | Buckinghamshire Runners | Keith Harding | M | 2024-06-23 16:35:30 | 2024-06-23 18:36:10 | 0 days 02:00:40 | 00:11:17 | 00:00:00 | No | 36 | 35 | 2024-06-23 14:47:11 |
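The row-by-row loop depends on the frame being sorted by team and leg. A vectorised alternative, offered as a sketch rather than what the notebook actually runs, takes a per-team cumulative sum of the leg times with `groupby(...).cumsum()` and adds it to the midnight anchor. The mini-frame is hypothetical; team 1's leg times are copied from the Abingdon AC 1 rows above, so its results can be checked against the table.

```python
import pandas as pd

# Hypothetical mini-frame: two teams with the notebook's column names
df = pd.DataFrame({
    "Team Number": [1, 1, 1, 2, 2],
    "Leg": [1, 2, 3, 1, 2],
    "Time(including penalties)": pd.to_timedelta(
        ["01:13:33", "00:44:15", "01:25:29", "01:00:00", "00:30:00"]
    ),
})

midnight = pd.Timestamp("2024-06-23")

# Sort so the running total goes leg by leg; groupby restarts it for each team
df = df.sort_values(["Team Number", "Leg"])
df["Elapsed Time"] = midnight + df.groupby("Team Number")["Time(including penalties)"].cumsum()

print(df[["Team Number", "Leg", "Elapsed Time"]])
```

Unlike the loop, this version does not read the previous row, so it cannot pick up another team's total even if a team's first row is not leg 1.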


### Generate Additional Data Structures

These are per-team lookups (colour, marker and team number) that feed into the plot. They are needed because Bokeh's line glyph cannot be used with a filtered view, so each team is drawn as its own line with its own styling.

```python
# sort dataframe by Team Number and Leg
df.sort_values(by=["Team Number", "Leg"], inplace=True)
num_of_teams = df["Team Name"].nunique()

team_names = df["Team Name"].unique().tolist()

# Note: Category20 is a dictionary. The key is the number of colours in the palette
# and the value is the list of colours. Colours repeat once all 20 have been used.
cycle_colours = cycle(Category20[20])
team_colour = dict(zip(team_names, cycle_colours))

# Note: there are fewer markers available than teams, so markers repeat too.
# Exclude the "dot" and "dash" markers as they are too hard to discern
list_markers = [marker for marker in MarkerType if marker not in ("dot", "dash")]

# Create an iterator that cycles through the list of markers, restarting at
# the beginning when it reaches the end
cycle_markers = cycle(list_markers)

# zip team names with markers to create a dictionary
team_marker = dict(zip(team_names, cycle_markers))

# Get unique list of team names alongside the team number from df and convert to a dictionary
df_team_numbers = df[["Team Name", "Team Number"]].drop_duplicates()
team_numbers = dict(zip(df_team_numbers["Team Name"], df_team_numbers["Team Number"]))
```
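The colour and marker lookups rely on `itertools.cycle` wrapping around when there are more teams than styles. The pattern can be sketched without Bokeh; the team, colour and marker lists below are placeholders, not the real ones.

```python
from itertools import cycle

# Placeholder lists: fewer styles than teams, as in the real notebook
team_names = ["Team A", "Team B", "Team C", "Team D", "Team E"]
colours = ["red", "green", "blue"]
markers = ["circle", "square"]

# zip stops at the shortest input, so the infinite cycles are safe here
team_colour = dict(zip(team_names, cycle(colours)))
team_marker = dict(zip(team_names, cycle(markers)))

for name in team_names:
    print(name, team_colour[name], team_marker[name])
```

Because colours and markers cycle independently, a (colour, marker) pair only repeats after lcm(len(colours), len(markers)) teams. Here that is 6, so all five placeholder teams look different even though each list repeats.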


### Create the actual plot

```python
hover_tool = HoverTool(
    tooltips=[
        ("Leg", "@Leg"),
        ("Team Number", "@{Team Number}"),
        ("Team Name", "@{Team Name}"),
        ("Runner", "@{Runner}"),
        ("Team placing", "@{Team placing at the end of this Leg}"),
        ("Individual placing", "@{Individual placing this Leg}"),
        ("Actual Start Time", "@{Start Time}{%F %T}"),
        ("Actual Finish Time", "@{Finish Time}{%F %T}"),
        ("Time(including penalties)", "@{Time(including penalties)}{%T}"),
        ("Pace(including penalties)", "@{Pace(including penalties)}{%T}"),
        ("Penalties", "@{Penalties}{%T}"),
        ("Team running time thus far", "@{Elapsed Time}{%T}"),
    ],
    # Durations and times of day reach the browser as milliseconds, so the
    # datetime formatter with %T displays them as HH:MM:SS
    formatters={
        "@{Start Time}": "datetime",
        "@{Finish Time}": "datetime",
        "@{Time(including penalties)}": "datetime",
        "@{Pace(including penalties)}": "datetime",
        "@{Penalties}": "datetime",
        "@{Elapsed Time}": "datetime",
    },
    mode="mouse",
)

x_axis_ticker = DatetimeTicker(desired_num_ticks=24, num_minor_ticks=4)
y_axis_ticker = BasicTicker(desired_num_ticks=40, num_minor_ticks=0, max_interval=1)

# Create a new figure
p = figure(
    title="MRC Ridgeway Relay 2024",
    height=800,
    x_axis_type="datetime",
    x_axis_label="Elapsed Time (HH:MM:SS)",
    y_axis_label="Team Number",
    sizing_mode="stretch_width",  # Use the maximum screen width
    above=[DatetimeAxis(ticker=x_axis_ticker)],
    right=[LinearAxis(ticker=y_axis_ticker)],
    tools=[PanTool(), hover_tool, BoxZoomTool(), WheelZoomTool(), SaveTool(), ResetTool(), HelpTool()],
)

p.xaxis.ticker = x_axis_ticker
p.yaxis.ticker = y_axis_ticker
p.xaxis.formatter = DatetimeTickFormatter(hours="%H:%M:%S")
p.xaxis.major_label_orientation = math.pi / 8

# Add a Label acknowledging the data source
attributions = Label(
    x=1,
    y=1,
    x_units="screen",
    y_units="screen",
    text="Data: https://www.marlboroughrunningclub.org.uk/races/ridgeway-relay \nCreated by Irfan Akram of Handy Cross Runners \nGithub: IrfanAkram5",
    text_font_size="8pt",
    text_color="black",
    render_mode="css",
)

p.add_layout(attributions)

# In Bokeh 2.x a line glyph cannot be used with a filtered CDSView (lines are connected),
# so subset the dataframe for each team and give each line its own data source
for team, colour in team_colour.items():
    source = ColumnDataSource(df.loc[df["Team Name"] == team])
    label = f"T{team_numbers[team]}-{team}"

    p.line(
        x="Elapsed Time",
        y="Team Number",
        source=source,
        legend_label=label,
        line_color=colour,
    )

    # Add in the markers for each team; the shared legend label hides line and markers together
    p.scatter(
        x="Elapsed Time",
        y="Team Number",
        source=source,
        fill_color=colour,
        line_color=colour,
        legend_label=label,
        marker=team_marker[team],
        size=10,
    )

# Make each line clickable to hide it and add a title
p.legend.click_policy = "hide"
p.legend.title = "Teams - click to hide"
p.legend.title_text_font_size = "12pt"
p.legend.title_text_font_style = "bold"
p.add_layout(p.legend[0], "right")

show(p)
```
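The loop above builds each team's source with a boolean mask, which scans the whole frame once per team. A sketch of an alternative, assuming the same column names: `DataFrame.groupby` splits the frame in a single pass, and each group could then be wrapped in its own `ColumnDataSource`. Bokeh is left out so the snippet runs with pandas alone, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names
df = pd.DataFrame({
    "Team Name": ["Abingdon AC 1", "Abingdon AC 1", "Buckinghamshire Runners"],
    "Team Number": [1, 1, 40],
    "Leg": [1, 2, 1],
})

# sort=False keeps teams in order of first appearance (team-number order here)
per_team = {name: group for name, group in df.groupby("Team Name", sort=False)}

for name, group in per_team.items():
    # In the notebook, each `group` would become ColumnDataSource(group)
    print(name, group["Leg"].tolist())
```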


## Comment

Although cleaner than the original graph, it is still not easy to see who came 1st, 2nd, 3rd and so on, because the teams are ordered by team number rather than by finish time. Ordering the teams by finish time, however, would jumble the team numbers on the y-axis. It is also not easy to distinguish the individual legs visually. See the `marlborough24_hstacked.ipynb` output, where I try to address these shortfalls. Finally, at the current plot height, I cannot fit in all the labels!

## Possible To-dos (further enhancements)

- Show the labels outside of the plot area.
- Order the teams by their finish time.
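For the second to-do, one possible starting point (a sketch, not something the notebook implements) is to take each team's final elapsed time, sort the teams on it, and use the resulting finishing position as the y-coordinate instead of the team number. The mapping name `finish_position` and the three-team sample are hypothetical.

```python
import pandas as pd

# Hypothetical rows: three teams, two legs each, with cumulative elapsed times
df = pd.DataFrame({
    "Team Number": [1, 1, 2, 2, 3, 3],
    "Leg": [1, 2, 1, 2, 1, 2],
    "Elapsed Time": pd.to_datetime([
        "2024-06-23 01:10:00", "2024-06-23 02:30:00",
        "2024-06-23 01:05:00", "2024-06-23 02:10:00",
        "2024-06-23 01:20:00", "2024-06-23 02:50:00",
    ]),
})

# A team's finishing time is its largest cumulative elapsed time (its last leg)
finish = df.groupby("Team Number")["Elapsed Time"].max()

# Sort fastest first and number the teams 1, 2, 3, ...
finish_order = finish.sort_values().index
finish_position = {team: pos for pos, team in enumerate(finish_order, start=1)}

# New y-coordinate for every row: the team's finishing position
df["Finish Position"] = df["Team Number"].map(finish_position)
print(finish_position)
```

Bokeh's axis `major_label_overrides` property could then map each position back to a team label, so the y-axis still says which team is which.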