## SpeedNorm by Greyhound

Here we infer/predict a greyhound's SpeedNorm from the previous race data provided. For now we use an exponential moving average (EMA), but we may also want to scale the previous SpeedNorm values based on how many times the greyhound was checked.

There are some strange outlier run times, e.g. FasttrackDogId 253088673. I've clipped the SpeedNorm values to (-3, +3) standard deviations to prevent these from distorting the 'SpeedNorm_EMA' feature, but it is worth taking some time to correct for these values properly.
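
As a rough illustration of why the clip matters, the sketch below z-scores a made-up group of speeds containing one extreme value and then clips the result to (-3, 3). The `toy` frame, its `A500` label, and the speeds are invented for the example; the z-score is written out by hand and should match `scipy.stats.zscore` with its default `ddof=0`.

```python
import pandas as pd

# Toy data (made up): 19 ordinary speeds plus one extreme outlier at the same TrackDist
toy = pd.DataFrame({
    "TrackDist": ["A500"] * 20,
    "Speed": [16.0 + 0.05 * i for i in range(19)] + [2.0],
})

# Population z-score per TrackDist (ddof=0, the default used by scipy.stats.zscore)
toy["SpeedNorm"] = toy.groupby("TrackDist")["Speed"].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0)
)

# Clip to (-3, 3) so a single extreme run cannot dominate later averages
toy["SpeedNorm_clipped"] = toy["SpeedNorm"].clip(-3, 3)

print(toy.tail(3))
```

The outlier's raw z-score falls well below -3, while the clipped column caps it at -3. Clipping only bounds the damage; it does not explain or repair the underlying run time.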

----

Import libraries, packages, and greyhound data

In [13]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import os
import decouple
import sys
config = decouple.AutoConfig(' ')
os.chdir(config('ROOT_DIRECTORY'))
sys.path.insert(0, '')

from scipy.stats import zscore
from multielo import MultiElo, Player, Tracker
from multielo.multielo import defaults

# Read in data
df_raw = pd.read_csv('./data/clean/dog_results.csv')

display(df_raw)

Unnamed: 0,FasttrackDogId,Place,DogName,Box,Rug,Weight,StartPrice,Margin1,Margin2,PIR,...,FasttrackRaceId,TrainerId,TrainerName,Distance,RaceGrade,Track,RaceNum,TrackDist,RaceDate,FieldSize
0,157500927,1,RAINE ALLEN,1,1,27.4,2.4,2.30,,Q/111,...,335811282,7683,C GRENFELL,500.0,Restricted Win,Bendigo,1.0,Bendigo500,2018-07-01,6
1,1820620018,2,SURF A LOT,2,2,32.8,6.3,2.30,2.30,M/332,...,335811282,137227,C TYLEY,500.0,Restricted Win,Bendigo,1.0,Bendigo500,2018-07-01,6
2,1950680026,3,PINGIN' BEE,6,6,25.5,9.3,3.84,1.54,S/443,...,335811282,132763,P DAPIRAN,500.0,Restricted Win,Bendigo,1.0,Bendigo500,2018-07-01,6
3,1524380048,4,LUCAS THE GREAT,7,7,32.2,9.1,5.27,1.43,M/655,...,335811282,116605,E HAMILTON,500.0,Restricted Win,Bendigo,1.0,Bendigo500,2018-07-01,6
4,124225458,5,QUAVO,4,4,28.9,3.4,5.56,0.29,M/766,...,335811282,132763,P DAPIRAN,500.0,Restricted Win,Bendigo,1.0,Bendigo500,2018-07-01,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
782997,491585906,3,GLORIOUS GUNN,8,8,27.1,3.8,3.75,2.43,6644,...,745616339,87891,G HORNE,520.0,Grade 5,Cannington,12.0,Cannington520,2021-12-31,7
782998,485659451,4,WOOD FIRE,3,3,32.1,4.1,3.75,0.14,3233,...,745616339,68549,C HALSE,520.0,Grade 5,Cannington,12.0,Cannington520,2021-12-31,7
782999,528381655,5,TRENDING QUARTER,6,6,31.8,16.2,5.25,1.43,4566,...,745616339,83581,J DAILLY,520.0,Grade 5,Cannington,12.0,Cannington520,2021-12-31,7
783000,537992387,6,ELITE WEAPON,1,1,26.7,2.9,5.25,0.00,1455,...,745616339,293372,S WILLIAMS,520.0,Grade 5,Cannington,12.0,Cannington520,2021-12-31,7


Create a SpeedNorm column, take the EMA (favouring more recent races) with an alpha of 0.2, and shift by one race (to prevent data leakage). Also compute a 3-race moving standard deviation, shifted in the same way.

In [14]:
# Copy raw dataframe
df = df_raw.copy()

# Create SpeedNorm column - Normalise speed by Track and Distance
df['Speed'] = df['Distance']/df['RunTime']
df['SpeedNorm'] = df.groupby('TrackDist')['Speed'].transform(lambda x: zscore(x))

# Limit SpeedNorm between (-3, 3) due to large outliers (injuries? small sample sizes?)
df['SpeedNorm'] = df['SpeedNorm'].clip(-3, 3)

# Take the EMA SpeedNorm for each greyhound 
alpha_ = 0.2
df = df.sort_values(by=['RaceDate', 'FasttrackDogId'], ascending=True)
df["SpeedNorm_EMA"] = df.groupby("FasttrackDogId")["SpeedNorm"].transform(lambda x: x.ewm(alpha=alpha_).mean().shift(1))

# Take the moving standard deviation over a 3-race window, shifted by one race
# (df is already sorted by RaceDate and FasttrackDogId above)
df["SpeedNorm_MSTD"] = df.groupby("FasttrackDogId")["SpeedNorm"].transform(lambda x: x.rolling(3).std().shift(1))

# Take only columns of interest for merging
df = df[['FasttrackDogId', 'FasttrackRaceId', 'RaceDate', 'SpeedNorm_EMA', 'SpeedNorm_MSTD']]

display(df)

Unnamed: 0,FasttrackDogId,FasttrackRaceId,RaceDate,SpeedNorm_EMA,SpeedNorm_MSTD
48,-750768,335811289,2018-07-01,,
351,109032131,334311905,2018-07-01,,
329,109032145,334309959,2018-07-01,,
364,109032152,334311907,2018-07-01,,
37,109032166,335811287,2018-07-01,,
...,...,...,...,...,...
782504,580794426,747018056,2021-12-31,1.247492,0.468723
782683,581314812,747048622,2021-12-31,0.702261,0.234112
782762,587948053,745289885,2021-12-31,0.061423,
782759,591659457,745289884,2021-12-31,-1.421817,
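
To make the NaN pattern in the output above easier to read, here is a small sketch of the same EMA and moving standard deviation on invented data. The `toy` frame and its values are assumptions for illustration only; the `ewm`, `rolling`, and `shift` calls mirror the cell above.

```python
import pandas as pd

# Toy data (made up): dog 1 has four races, dog 2 has two
toy = pd.DataFrame({
    "FasttrackDogId": [1, 1, 1, 1, 2, 2],
    "RaceDate": ["2021-01-01", "2021-01-08", "2021-01-15", "2021-01-22",
                 "2021-01-01", "2021-01-08"],
    "SpeedNorm": [1.0, 0.0, 1.0, 0.0, -1.0, -1.0],
})

alpha_ = 0.2
toy = toy.sort_values(by=["RaceDate", "FasttrackDogId"], ascending=True)

# EMA of past races only: shift(1) drops the current race from its own feature
toy["SpeedNorm_EMA"] = toy.groupby("FasttrackDogId")["SpeedNorm"].transform(
    lambda x: x.ewm(alpha=alpha_).mean().shift(1)
)

# 3-race moving std, also shifted, so it needs three earlier races to be defined
toy["SpeedNorm_MSTD"] = toy.groupby("FasttrackDogId")["SpeedNorm"].transform(
    lambda x: x.rolling(3).std().shift(1)
)

print(toy.sort_values(["FasttrackDogId", "RaceDate"]))
```

A dog's first race has no EMA, and its first three races have no moving standard deviation. This would be consistent with rows in the real output that carry a `SpeedNorm_EMA` but an empty `SpeedNorm_MSTD`.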


Save to ./data/features as a .csv

In [15]:
df.to_csv('./data/features/ema-speednorm-by-greyhound.csv', index=False)
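
The saved columns are kept "for merging", so a downstream join might look roughly like the sketch below. Both frames here are hypothetical stand-ins, and joining on `FasttrackDogId` plus `FasttrackRaceId` is an assumption about how the feature file is meant to be used, not something this notebook confirms.

```python
import pandas as pd

# Hypothetical feature rows, shaped like the CSV saved above
features = pd.DataFrame({
    "FasttrackDogId": [1, 2],
    "FasttrackRaceId": [100, 100],
    "RaceDate": ["2021-12-31", "2021-12-31"],
    "SpeedNorm_EMA": [0.70, -1.42],
    "SpeedNorm_MSTD": [0.23, None],
})

# Hypothetical results table to attach the features to
results = pd.DataFrame({
    "FasttrackDogId": [1, 2, 3],
    "FasttrackRaceId": [100, 100, 100],
    "Place": [1, 2, 3],
})

# Left join keeps every result row; validate raises if a (dog, race) key repeats
merged = results.merge(
    features.drop(columns="RaceDate"),
    on=["FasttrackDogId", "FasttrackRaceId"],
    how="left",
    validate="one_to_one",
)

print(merged)
```

A left join leaves the feature columns empty for any runner missing from the feature file, which keeps such gaps visible rather than silently dropping rows.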