## Mean SpeedNorm by Track, Distance, Box (Full Field only)

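Here we compute the average SpeedNorm for each (Track, Distance, Box), keeping only (Track, Distance) combinations where every box has at least 1,000 runs. These averages can then be used as a feature. The minimum sample size is deliberately large so that each average is stable and the feature does not overfit to noise from small samples.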
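As a rough check on why a large minimum sample helps: SpeedNorm is a z-score, so its standard deviation across all runs at a (Track, Distance) is 1 by construction, and the per-box standard deviations in the output later in this section stay close to 1. The standard error of a box mean is then roughly 1/√n. The sketch below assumes a standard deviation of exactly 1, and `standard_error` is a hypothetical helper written for illustration.

```python
import math


def standard_error(std: float, n: int) -> float:
    """Standard error of a sample mean: std / sqrt(n)."""
    return std / math.sqrt(n)


# Assumption: SpeedNorm has a standard deviation of about 1 within each (Track, Distance)
for n in (50, 200, 1000, 5000):
    print(f"n={n:>5}: standard error of the box mean is about {standard_error(1.0, n):.3f}")
```

At n = 1,000 the standard error is about 0.03, which is small next to the spread of box means shown for Albion Park331 (roughly -0.08 to 0.20), so differences between boxes at that size are unlikely to be sampling noise.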

From here on, we limit our training, testing, and predictions to races at these (Track, Distance) combinations.
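A minimal sketch of that restriction, assuming a runner-level DataFrame with `TrackDist` and `Box` columns (like `df_raw`) and a feature table shaped like the `df` produced later in this section. The toy frames `races` and `features`, and the track name `Other Track300`, are hypothetical stand-ins. Filtering on `TrackDist` first keeps whole races together; a left merge on `(TrackDist, Box)` then attaches the box features to every runner.

```python
import pandas as pd

# Hypothetical toy data standing in for df_raw and the feature table built below
races = pd.DataFrame({
    "FasttrackRaceId": [1, 1, 2, 2],
    "TrackDist": ["Albion Park331", "Albion Park331", "Other Track300", "Other Track300"],
    "Box": [1, 2, 1, 2],
})
features = pd.DataFrame({
    "TrackDist": ["Albion Park331", "Albion Park331"],
    "Box": [1, 2],
    "TrackDistBox_mean": [0.200973, 0.117516],
    "TrackDistBox_std": [0.925296, 0.944033],
})

# Keep only races at (Track, Distance) combinations that passed the sample-size filter
kept = races[races["TrackDist"].isin(features["TrackDist"])]

# Attach the box-level features; a left merge keeps every runner in the kept races
kept = kept.merge(features, on=["TrackDist", "Box"], how="left")
print(kept)
```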

----

Import libraries, packages, and greyhound data

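```python
# Data handling, statistics, and plotting
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import os
import sys
import decouple
from IPython.display import display

# Move to the project root (ROOT_DIRECTORY, read via python-decouple) so relative paths resolve
config = decouple.AutoConfig(' ')
os.chdir(config('ROOT_DIRECTORY'))
sys.path.insert(0, '')

from scipy.stats import zscore
from multielo import MultiElo, Player, Tracker
from multielo.multielo import defaults

# Read in the cleaned greyhound results (one row per dog per race)
df_raw = pd.read_csv('./data/clean/dog_results.csv')

display(df_raw)
```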

|  | FasttrackDogId | Place | DogName | Box | Rug | Weight | StartPrice | Margin1 | Margin2 | PIR | ... | FasttrackRaceId | TrainerId | TrainerName | Distance | RaceGrade | Track | RaceNum | TrackDist | RaceDate | FieldSize |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 157500927 | 1 | RAINE ALLEN | 1 | 1 | 27.4 | 2.4 | 2.30 |  | Q/111 | ... | 335811282 | 7683 | C GRENFELL | 500.0 | Restricted Win | Bendigo | 1.0 | Bendigo500 | 2018-07-01 | 6 |
| 1 | 1820620018 | 2 | SURF A LOT | 2 | 2 | 32.8 | 6.3 | 2.30 | 2.30 | M/332 | ... | 335811282 | 137227 | C TYLEY | 500.0 | Restricted Win | Bendigo | 1.0 | Bendigo500 | 2018-07-01 | 6 |
| 2 | 1950680026 | 3 | PINGIN' BEE | 6 | 6 | 25.5 | 9.3 | 3.84 | 1.54 | S/443 | ... | 335811282 | 132763 | P DAPIRAN | 500.0 | Restricted Win | Bendigo | 1.0 | Bendigo500 | 2018-07-01 | 6 |
| 3 | 1524380048 | 4 | LUCAS THE GREAT | 7 | 7 | 32.2 | 9.1 | 5.27 | 1.43 | M/655 | ... | 335811282 | 116605 | E HAMILTON | 500.0 | Restricted Win | Bendigo | 1.0 | Bendigo500 | 2018-07-01 | 6 |
| 4 | 124225458 | 5 | QUAVO | 4 | 4 | 28.9 | 3.4 | 5.56 | 0.29 | M/766 | ... | 335811282 | 132763 | P DAPIRAN | 500.0 | Restricted Win | Bendigo | 1.0 | Bendigo500 | 2018-07-01 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 782997 | 491585906 | 3 | GLORIOUS GUNN | 8 | 8 | 27.1 | 3.8 | 3.75 | 2.43 | 6644 | ... | 745616339 | 87891 | G HORNE | 520.0 | Grade 5 | Cannington | 12.0 | Cannington520 | 2021-12-31 | 7 |
| 782998 | 485659451 | 4 | WOOD FIRE | 3 | 3 | 32.1 | 4.1 | 3.75 | 0.14 | 3233 | ... | 745616339 | 68549 | C HALSE | 520.0 | Grade 5 | Cannington | 12.0 | Cannington520 | 2021-12-31 | 7 |
| 782999 | 528381655 | 5 | TRENDING QUARTER | 6 | 6 | 31.8 | 16.2 | 5.25 | 1.43 | 4566 | ... | 745616339 | 83581 | J DAILLY | 520.0 | Grade 5 | Cannington | 12.0 | Cannington520 | 2021-12-31 | 7 |
| 783000 | 537992387 | 6 | ELITE WEAPON | 1 | 1 | 26.7 | 2.9 | 5.25 | 0.00 | 1455 | ... | 745616339 | 293372 | S WILLIAMS | 520.0 | Grade 5 | Cannington | 12.0 | Cannington520 | 2021-12-31 | 7 |


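Normalise all speeds within each (Track, Distance) for full fields only (8 greyhounds competing): each speed (distance divided by run time) is converted to a z-score within its (Track, Distance), and we call the result 'SpeedNorm'. Then determine the mean 'SpeedNorm' for each box at each (Track, Distance).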
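Written out, this is how we read the calculation in the next cell. Let $t$ be a (Track, Distance) combination (the `TrackDist` column), with sums running over full-field runs only, $n_t$ the number of such runs at $t$, and $n_{t,b}$ the number of those runs from box $b$:

$$
v_i = \frac{\text{Distance}_i}{\text{RunTime}_i}, \qquad
\text{SpeedNorm}_i = \frac{v_i - \bar{v}_t}{\sigma_t},
$$

$$
\bar{v}_t = \frac{1}{n_t}\sum_{j \in t} v_j, \qquad
\sigma_t = \sqrt{\frac{1}{n_t}\sum_{j \in t}\left(v_j - \bar{v}_t\right)^2},
$$

$$
\text{TrackDistBox\_mean}_{t,b} = \frac{1}{n_{t,b}} \sum_{i \in t,\ \text{Box}_i = b} \text{SpeedNorm}_i .
$$

Here $\sigma_t$ divides by $n_t$ because `scipy.stats.zscore` uses `ddof=0` by default, whereas the per-box standard deviation (`TrackDistBox_std`) comes from pandas `std`, which defaults to `ddof=1`.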

```python
# Copy dataframe so df_raw stays untouched
df = df_raw.copy()

# Take only full fields (8 greyhounds); .copy() avoids a SettingWithCopyWarning below
df = df[df['FieldSize'] == 8].copy()

# Calculate the average speed of each greyhound over the race
df["Speed"] = df["Distance"] / df["RunTime"]

# Normalise speed into a z-score within each (Track, Distance): 'SpeedNorm'
df["SpeedNorm"] = df.groupby("TrackDist")["Speed"].transform(lambda x: zscore(x))

# Per (Track, Distance, Box): mean and std of SpeedNorm, plus the number of runs from that box
df = df.groupby(["TrackDist", "Box"], as_index=False).agg(
    SpeedNorm_mean=('SpeedNorm', 'mean'),
    SpeedNorm_std=('SpeedNorm', 'std'),
    SampleSize=('Box', 'count'),
)

# Keep a (Track, Distance) only if every box has at least 1,000 runs
df["MinSampleSize"] = df.groupby("TrackDist")["SampleSize"].transform('min')
df = df[df["MinSampleSize"] >= 1000]

# Take only the columns needed for merging, and rename them
df = df[['TrackDist', 'Box', 'SpeedNorm_mean', 'SpeedNorm_std']]
df = df.rename(columns={'SpeedNorm_mean': 'TrackDistBox_mean',
                        'SpeedNorm_std': 'TrackDistBox_std'})

display(df)
```

|  | TrackDist | Box | TrackDistBox_mean | TrackDistBox_std |
|---|---|---|---|---|
| 0 | Albion Park331 | 1 | 0.200973 | 0.925296 |
| 1 | Albion Park331 | 2 | 0.117516 | 0.944033 |
| 2 | Albion Park331 | 3 | -0.049992 | 0.984974 |
| 3 | Albion Park331 | 4 | -0.083871 | 1.069797 |
| 4 | Albion Park331 | 5 | -0.061317 | 0.939514 |
| ... | ... | ... | ... | ... |
| 859 | Warragul460 | 4 | -0.055152 | 0.936235 |
| 860 | Warragul460 | 5 | -0.088801 | 1.000726 |
| 861 | Warragul460 | 6 | -0.001669 | 0.987699 |
| 862 | Warragul460 | 7 | -0.034274 | 0.981504 |
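One property worth knowing when using these features: SpeedNorm has mean zero across all full-field runs at a (Track, Distance), so the box means, weighted by their run counts, average to zero. Each `TrackDistBox_mean` therefore reads as a box's speed advantage (positive) or disadvantage (negative) relative to that track and distance. The sketch below checks this on synthetic data by repeating the steps of the cell above. The random data, the track names `TrackA400` and `TrackB500`, the per-box run-time effect, and the lowered `min_sample` threshold are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

rng = np.random.default_rng(0)
n = 4000

# Synthetic full-field runs at two hypothetical (Track, Distance) combinations
track = rng.choice(["TrackA400", "TrackB500"], size=n)
distance = np.where(track == "TrackA400", 400.0, 500.0)
box = rng.integers(1, 9, size=n)  # boxes 1 to 8
# Assumed effect: each box number adds 0.03 s to the run time, plus random noise
run_time = distance / 17.0 + 0.03 * box + rng.normal(0.0, 0.4, size=n)
runs = pd.DataFrame({"TrackDist": track, "Box": box,
                     "Distance": distance, "RunTime": run_time})

# Same steps as the cell above: speed, z-score within TrackDist, box aggregates
runs["Speed"] = runs["Distance"] / runs["RunTime"]
runs["SpeedNorm"] = runs.groupby("TrackDist")["Speed"].transform(lambda x: zscore(x))
boxes = runs.groupby(["TrackDist", "Box"], as_index=False).agg(
    SpeedNorm_mean=("SpeedNorm", "mean"),
    SampleSize=("SpeedNorm", "size"),
)

# Hypothetical threshold, far below 1,000 because this toy data set is small
min_sample = 50
boxes = boxes[boxes.groupby("TrackDist")["SampleSize"].transform("min") >= min_sample]

# SampleSize-weighted average of the box means within each TrackDist
boxes = boxes.assign(Weighted=boxes["SpeedNorm_mean"] * boxes["SampleSize"])
totals = boxes.groupby("TrackDist")[["Weighted", "SampleSize"]].sum()
weighted_mean = totals["Weighted"] / totals["SampleSize"]
print(weighted_mean)  # both values are zero up to floating-point error
```

Because the weighted average is pinned at zero, a box mean is only meaningful relative to the other boxes at the same (Track, Distance), not as an absolute speed.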
