# Data Classification Assignment

## Step 1: Problem Statement

This assignment aims to predict the winner of a horse race given the race track condition and characteristics of the horses. This is a part of our larger project on horse gambling prediction.

## Step 2: Data Collection

We use two separate CSV files from a Kaggle dataset and merge them on the race ID. The first file, races.csv, contains information about each race. The second file, runs.csv, contains information about the individual horses in each race. The data comes from https://www.kaggle.com/datasets/lantanacamara/hong-kong-horse-racing and covers 6,349 horse races in Hong Kong. We chose it because the more comprehensive datasets we found are behind paywalls. The documentation for this dataset can be found on the Kaggle page.

## Step 3: Preprocessing (and Feature Engineering)

### Import

In [2]:
import pandas as pd
import numpy as np


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb

from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score

import matplotlib.pyplot as plt

from tqdm import tqdm

pd.set_option('display.max_columns', 500)

### Preprocessing

In [3]:
data_races = pd.read_csv('races.csv')
data_runs = pd.read_csv('runs.csv')

In [4]:
data_races.head()

Unnamed: 0,race_id,date,venue,race_no,config,surface,distance,going,horse_ratings,prize,race_class,sec_time1,sec_time2,sec_time3,sec_time4,sec_time5,sec_time6,sec_time7,time1,time2,time3,time4,time5,time6,time7,place_combination1,place_combination2,place_combination3,place_combination4,place_dividend1,place_dividend2,place_dividend3,place_dividend4,win_combination1,win_dividend1,win_combination2,win_dividend2
0,0,1997-06-02,ST,1,A,0,1400,GOOD TO FIRM,40-15,485000.0,5,13.53,21.59,23.94,23.58,,,,13.53,35.12,59.06,82.64,,,,8,11,6.0,,36.5,25.5,18.0,,8,121.0,,
1,1,1997-06-02,ST,2,A,0,1200,GOOD TO FIRM,40-15,485000.0,5,24.05,22.64,23.7,,,,,24.05,46.69,70.39,,,,,5,13,4.0,,12.5,47.0,33.5,,5,23.5,,
2,2,1997-06-02,ST,3,A,0,1400,GOOD TO FIRM,60-40,625000.0,4,13.77,22.22,24.88,22.82,,,,13.77,35.99,60.87,83.69,,,,11,1,13.0,,23.0,23.0,59.5,,11,70.0,,
3,3,1997-06-02,ST,4,A,0,1200,GOOD TO FIRM,120-95,1750000.0,1,24.33,22.47,22.09,,,,,24.33,46.8,68.89,,,,,5,3,10.0,,14.0,24.5,16.0,,5,52.0,,
4,4,1997-06-02,ST,5,A,0,1600,GOOD TO FIRM,60-40,625000.0,4,25.45,23.52,23.31,23.56,,,,25.45,48.97,72.28,95.84,,,,2,10,1.0,,15.5,28.0,17.5,,2,36.5,,


In [5]:
# Check for outliers and missing values
data_races.describe()

Unnamed: 0,race_id,race_no,surface,distance,prize,race_class,sec_time1,sec_time2,sec_time3,sec_time4,sec_time5,sec_time6,sec_time7,time1,time2,time3,time4,time5,time6,time7,place_combination1,place_combination2,place_combination3,place_combination4,place_dividend1,place_dividend2,place_dividend3,place_dividend4,win_combination1,win_dividend1,win_combination2,win_dividend2
count,6349.0,6349.0,6349.0,6349.0,5887.0,6349.0,6349.0,6349.0,6349.0,3634.0,821.0,115.0,0.0,6349.0,6349.0,6349.0,3634.0,821.0,115.0,0.0,6349.0,6349.0,6324.0,23.0,6349.0,6349.0,6324.0,23.0,6349.0,6349.0,12.0,12.0
mean,3174.0,5.226807,0.109151,1419.113246,1134790.0,3.893684,20.699466,22.826749,23.830743,23.852854,23.868685,23.912261,,20.699466,43.526215,67.356959,91.735793,112.479695,140.349739,,6.23862,6.366357,6.525933,8.434783,27.778469,32.802536,38.969292,22.013043,6.23862,96.096944,8.166667,101.416667
std,1832.942761,2.795019,0.311853,281.468745,1749156.0,1.992868,5.880319,1.044998,0.870355,0.820277,0.75886,0.667664,,5.880319,6.657225,6.978638,7.814997,5.452576,4.910272,,3.61858,3.662594,3.731685,2.98216,25.175925,30.275423,37.015938,16.757802,3.620321,131.221259,3.325749,101.672566
min,0.0,1.0,0.0,1000.0,485000.0,0.0,12.39,20.06,21.2,21.4,21.81,21.77,,12.39,33.11,55.16,80.43,105.83,132.84,,1.0,1.0,1.0,4.0,10.1,10.1,10.1,10.1,1.0,10.5,3.0,12.0
25%,1587.0,3.0,0.0,1200.0,675000.0,3.0,13.69,22.14,23.21,23.3,23.41,23.55,,13.69,35.95,59.83,83.41,109.01,137.29,,3.0,3.0,3.0,6.5,15.0,17.0,18.5,10.1,3.0,34.5,5.5,25.25
50%,3174.0,5.0,0.0,1400.0,840000.0,4.0,23.72,22.8,23.72,23.77,23.83,24.02,,23.72,46.41,69.72,94.3,110.33,138.36,,6.0,6.0,6.0,8.0,20.0,24.0,27.5,15.5,6.0,58.5,8.0,50.25
75%,4761.0,7.0,0.0,1650.0,1060000.0,4.0,24.7,23.45,24.38,24.33,24.28,24.295,,24.7,48.07,72.1,100.03,113.15,140.205,,9.0,9.0,10.0,11.0,31.0,36.5,44.0,24.75,9.0,105.5,11.25,178.625
max,6348.0,11.0,1.0,2400.0,25000000.0,13.0,30.03,27.41,27.58,28.92,26.5,25.92,,30.03,56.22,81.29,107.81,134.31,158.49,,14.0,14.0,14.0,14.0,410.5,627.0,420.5,68.0,14.0,2687.5,12.0,282.5


In [6]:
data_races.isna().sum()

race_id                  0
date                     0
venue                    0
race_no                  0
config                   0
surface                  0
distance                 0
going                    0
horse_ratings            0
prize                  462
race_class               0
sec_time1                0
sec_time2                0
sec_time3                0
sec_time4             2715
sec_time5             5528
sec_time6             6234
sec_time7             6349
time1                    0
time2                    0
time3                    0
time4                 2715
time5                 5528
time6                 6234
time7                 6349
place_combination1       0
place_combination2       0
place_combination3      25
place_combination4    6326
place_dividend1          0
place_dividend2          0
place_dividend3         25
place_dividend4       6326
win_combination1         0
win_dividend1            0
win_combination2      6337
win_dividend2         6337
dtype: int64

In [7]:
data_runs.head()

Unnamed: 0,race_id,horse_no,horse_id,result,won,lengths_behind,horse_age,horse_country,horse_type,horse_rating,horse_gear,declared_weight,actual_weight,draw,position_sec1,position_sec2,position_sec3,position_sec4,position_sec5,position_sec6,behind_sec1,behind_sec2,behind_sec3,behind_sec4,behind_sec5,behind_sec6,time1,time2,time3,time4,time5,time6,finish_time,win_odds,place_odds,trainer_id,jockey_id
0,0,1,3917,10,0.0,8.0,3,AUS,Gelding,60,--,1020.0,133,7,6,4,6,10.0,,,2.0,2.0,1.5,8.0,,,13.85,21.59,23.86,24.62,,,83.92,9.7,3.7,118,2
1,0,2,2157,8,0.0,5.75,3,NZ,Gelding,60,--,980.0,133,12,12,13,13,8.0,,,6.5,9.0,5.0,5.75,,,14.57,21.99,23.3,23.7,,,83.56,16.0,4.9,164,57
2,0,3,858,7,0.0,4.75,3,NZ,Gelding,60,--,1082.0,132,8,3,2,2,7.0,,,1.0,1.0,0.75,4.75,,,13.69,21.59,23.9,24.22,,,83.4,3.5,1.5,137,18
3,0,4,1853,9,0.0,6.25,3,SAF,Gelding,60,--,1118.0,127,13,8,8,11,9.0,,,3.5,5.0,3.5,6.25,,,14.09,21.83,23.7,24.0,,,83.62,39.0,11.0,80,59
4,0,5,2796,6,0.0,3.75,3,GB,Gelding,60,--,972.0,131,14,13,12,12,6.0,,,7.75,8.75,4.25,3.75,,,14.77,21.75,23.22,23.5,,,83.24,50.0,14.0,9,154


In [8]:
# Check for outliers and missing values
data_runs.describe()

Unnamed: 0,race_id,horse_no,horse_id,result,won,lengths_behind,horse_age,horse_rating,declared_weight,actual_weight,draw,position_sec1,position_sec2,position_sec3,position_sec4,position_sec5,position_sec6,behind_sec1,behind_sec2,behind_sec3,behind_sec4,behind_sec5,behind_sec6,time1,time2,time3,time4,time5,time6,finish_time,win_odds,place_odds,trainer_id,jockey_id
count,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,79447.0,46226.0,10079.0,1296.0,79447.0,79447.0,79447.0,46226.0,10079.0,1296.0,79447.0,79447.0,79447.0,46226.0,10079.0,1296.0,79447.0,79447.0,75712.0,79447.0,79447.0
mean,3173.352814,6.905623,2204.410525,6.838597,0.080053,6.108901,3.339346,61.034904,1104.953568,122.729656,6.876005,6.849837,6.846791,6.843443,6.94555,6.748388,6.253086,3.378768,4.083972,4.509457,5.992076,6.026654,10.638735,21.135438,22.928985,23.864054,24.039662,24.105221,24.350216,85.322914,28.812977,7.423177,79.793007,85.832341
std,1833.101494,3.760711,1275.049375,3.730498,0.271378,33.636209,0.876763,11.748788,62.347597,6.305496,3.747589,3.734348,3.733014,3.732055,3.792138,3.70236,3.426108,4.282529,2.691107,16.541538,33.991084,31.754623,67.791252,6.930518,3.599727,3.571163,4.663367,1.127963,1.314755,18.512883,30.097375,8.82343,45.118874,54.338105
min,0.0,1.0,0.0,1.0,0.0,-0.5,2.0,10.0,693.0,103.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.15,0.15,0.0,0.0,0.1,0.0,12.39,19.99,21.0,21.2,21.42,21.5,55.16,1.0,1.0,0.0,0.0
25%,1586.0,4.0,1085.0,4.0,0.0,1.75,3.0,60.0,1062.0,118.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,1.5,1.75,1.75,1.75,1.75,1.75,14.12,22.29,23.23,23.32,23.41,23.57,70.59,7.7,2.4,47.0,39.0
50%,3174.0,7.0,2209.0,7.0,0.0,4.0,3.0,60.0,1102.0,123.0,7.0,7.0,7.0,7.0,7.0,7.0,6.0,3.0,3.75,3.75,3.75,3.75,4.25,24.18,22.87,23.76,23.89,23.96,24.12,83.35,15.0,4.1,75.0,76.0
75%,4764.5,10.0,3308.0,10.0,0.0,6.75,3.0,60.0,1146.0,128.0,10.0,10.0,10.0,10.0,10.0,10.0,9.0,5.0,5.75,5.75,6.25,6.5,7.75,25.36,23.52,24.41,24.56,24.63,24.82,100.78,38.0,8.6,118.0,138.0
max,6348.0,14.0,4404.0,14.0,1.0,999.0,10.0,138.0,1369.0,133.0,15.0,14.0,14.0,14.0,14.0,14.0,14.0,999.0,60.25,999.0,999.0,999.0,999.0,999.0,999.0,999.0,999.0,49.57,34.15,163.58,99.0,101.0,175.0,185.0


In [9]:
data_runs.isna().sum()

race_id                0
horse_no               0
horse_id               0
result                 0
won                    0
lengths_behind         0
horse_age              0
horse_country          2
horse_type             2
horse_rating           0
horse_gear             0
declared_weight        0
actual_weight          0
draw                   0
position_sec1          0
position_sec2          0
position_sec3          0
position_sec4      33221
position_sec5      69368
position_sec6      78151
behind_sec1            0
behind_sec2            0
behind_sec3            0
behind_sec4        33221
behind_sec5        69368
behind_sec6        78151
time1                  0
time2                  0
time3                  0
time4              33221
time5              69368
time6              78151
finish_time            0
win_odds               0
place_odds          3735
trainer_id             0
jockey_id              0
dtype: int64
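
The `data_runs.describe()` output above shows a maximum of exactly 999.0 in several columns (for example `lengths_behind`, `behind_sec1` and `time1`), which may be a placeholder rather than a real measurement. That reading is an assumption, not something this notebook's outputs confirm. A minimal sketch for counting such values per column, using a hypothetical helper `count_sentinel` and toy data:

```python
import pandas as pd

def count_sentinel(df: pd.DataFrame, sentinel: float = 999.0) -> pd.Series:
    """Count cells equal to `sentinel` in each numeric column, omitting zero counts."""
    numeric = df.select_dtypes(include="number")
    counts = numeric.eq(sentinel).sum()
    return counts[counts > 0]

# Toy stand-in for data_runs (values are illustrative only).
toy = pd.DataFrame({
    "lengths_behind": [0.5, 999.0, 2.0],
    "time1": [13.85, 14.57, 999.0],
    "win_odds": [9.7, 16.0, 3.5],
    "horse_country": ["AUS", "NZ", "NZ"],
})
sentinel_counts = count_sentinel(toy)
print(sentinel_counts)
```

The affected columns are not among those kept for modelling in the next cell, so this check matters mainly if they are reintroduced later.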

In [10]:
# Only include variables that are relevant to the prediction
data_races = data_races[['race_id', 'venue', 'config', 'surface', 'distance',
       'going', 'horse_ratings', 'prize', 'race_class', 'place_combination1']].copy()

# Impute missing prize values with the column mean;
# prize is the only kept column with missing values
data_races['prize'] = data_races['prize'].fillna(data_races['prize'].mean())

# Only include variables that are relevant to the prediction
data_runs = data_runs[['race_id', 'horse_no', 'horse_id', 'won',
       'horse_age', 'horse_country', 'horse_type', 'horse_rating',
       'declared_weight', 'actual_weight', 'win_odds','place_odds']]

data_runs = data_runs.dropna()

### Feature Engineering

In [11]:
# Binning horse country and horse type

print(data_runs.horse_country.value_counts())
print(data_runs.horse_type.value_counts())

AUS    28404
NZ     26390
IRE     9947
GB      6029
USA     2382
FR      1155
SAF      655
GER      336
ARG      125
CAN       91
JPN       77
ITY       56
GR        33
BRZ       18
ZIM       12
Name: horse_country, dtype: int64
Gelding    71909
Brown       1977
Horse       1068
Colt         275
Mare         230
Rig          150
Roan          42
Filly         42
Grey          17
Name: horse_type, dtype: int64


In [12]:
# Binning horse country and horse type


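def helper_country(country):
    # Keep the six most frequent countries; group the rest as 'other'
    if country not in ['AUS', 'NZ', 'IRE', 'GB', 'USA', 'FR']:
        return 'other'
    return country

def helper_type(horse_type):
    # Keep the two most frequent types; group the rest as 'other'
    if horse_type not in ['Gelding', 'Brown']:
        return 'other'
    return horse_type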

data_runs.horse_country = data_runs.horse_country.apply(helper_country)
data_runs.horse_type = data_runs.horse_type.apply(helper_type)
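
The binning above hard-codes which categories to keep. As a hedged alternative, the same idea can be driven by a frequency threshold. The helper `bin_rare_categories` and the `min_count` value below are hypothetical and not part of the original notebook:

```python
import pandas as pd

def bin_rare_categories(series: pd.Series, min_count: int,
                        other_label: str = "other") -> pd.Series:
    """Replace categories seen fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    frequent = counts[counts >= min_count].index
    # Series.where keeps values where the condition holds and substitutes elsewhere.
    return series.where(series.isin(frequent), other_label)

# Toy data standing in for data_runs.horse_country (values are illustrative only).
toy_country = pd.Series(["AUS", "AUS", "AUS", "NZ", "NZ", "ZIM", "GR"])
binned = bin_rare_categories(toy_country, min_count=2)
print(binned.value_counts())
```

A threshold makes the cut-off explicit, but the right value would have to be chosen separately for `horse_country` and `horse_type`, since the notebook keeps six countries and only two types.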

In [13]:
# New features: number of previous races and previous wins for each horse

old_data_runs = data_runs.copy()

data_runs['previous_races'] = 0
data_runs['previous_wins'] = 0

def previous_records(horse_id, race_id):

    races = 0
    wins = 0

    df_horse = old_data_runs[(old_data_runs['horse_id'] == horse_id) 
                             & (old_data_runs['race_id'] < race_id)]
    
    races = len(df_horse)
    wins = df_horse['won'].sum()
    
    return races, wins
    

In [14]:
for index, row in tqdm(old_data_runs.iterrows()):

    races, wins = previous_records(row['horse_id'],row['race_id'])
    data_runs.loc[index, 'previous_races'] = races
    data_runs.loc[index, 'previous_wins'] = wins

75710it [00:30, 2505.10it/s]
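
The row-by-row loop filters the whole table once per run. A vectorized sketch of the same counts is shown here on toy data. It assumes each horse appears at most once per `race_id` and that the index is unique. The function name `add_previous_records` is hypothetical:

```python
import pandas as pd

def add_previous_records(runs: pd.DataFrame) -> pd.DataFrame:
    """Add previous_races / previous_wins per horse, counting only earlier race_ids."""
    out = runs.sort_values("race_id", kind="stable").copy()
    grouped = out.groupby("horse_id")
    # cumcount numbers each horse's rows 0, 1, 2, ... in race order,
    # which equals the number of strictly earlier races for that horse.
    previous_races = grouped.cumcount()
    # Cumulative wins including the current race, minus the current result.
    previous_wins = grouped["won"].cumsum() - out["won"]
    out["previous_races"] = previous_races
    out["previous_wins"] = previous_wins
    # Restore the original row order (requires a unique index).
    return out.loc[runs.index]

# Toy stand-in for data_runs (values are illustrative only).
toy_runs = pd.DataFrame({
    "race_id":  [0, 0, 1, 1, 2],
    "horse_id": [10, 11, 10, 12, 10],
    "won":      [1, 0, 0, 1, 1],
})
result = add_previous_records(toy_runs)
print(result)
```

Both versions count only races with a smaller `race_id`, so they rely on `race_id` increasing with time.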


In [15]:
# New Feature: number of horses in a race

old_data_runs = data_runs.copy()

race_count = old_data_runs.groupby(by = 'race_id').count()['horse_no']

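# Map by race_id rather than relying on the row index happening to equal race_id
data_races['horse_count'] = data_races['race_id'].map(race_count)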

### Merging and Further Preprocessing

In [16]:
# Merging the two dataset

df_horse_racing = pd.merge(data_runs, data_races, on="race_id", how="left")
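
A left merge like this one silently multiplies rows if `race_id` is duplicated on the race side. A small sketch on toy frames of how `pd.merge`'s `validate` argument can guard against that (the toy frames and values are illustrative only):

```python
import pandas as pd

# Toy stand-ins for data_runs and data_races.
toy_runs = pd.DataFrame({"race_id": [0, 0, 1], "horse_no": [1, 2, 1]})
toy_races = pd.DataFrame({"race_id": [0, 1], "distance": [1400, 1200]})

# validate="many_to_one" raises pandas.errors.MergeError if race_id
# is not unique in the right-hand frame.
merged = pd.merge(toy_runs, toy_races, on="race_id", how="left",
                  validate="many_to_one")

# With unique keys on the right, a left merge keeps exactly one row per run.
assert len(merged) == len(toy_runs)
print(merged)
```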

In [17]:
# Encode categorical variables

categorical_variables = ['horse_ratings','horse_country', 'horse_type',
                         'venue', 'config', 'going', 'race_class']
horse_racing_predictors = df_horse_racing
horse_racing_predictors = pd.get_dummies(horse_racing_predictors, columns = categorical_variables)


In [18]:
# Selecting variables for horses and race

horsestats = [
              #'race_id','horse_id',
              'horse_no','horse_age',
              'horse_rating','declared_weight','actual_weight', 'win_odds','place_odds']
for col in horse_racing_predictors.columns:
  if('horse_country' in col):
    horsestats.append(col)
  if('horse_type' in col):
    horsestats.append(col)
#horsestats.append('won')

racestats = [
             'race_id',
             'distance',
             'place_combination1',
             'horse_count'
             ]
for col in horse_racing_predictors.columns:
  if('horse_ratings' in col):
    racestats.append(col)
  if('venue' in col):
    racestats.append(col)
  if('config' in col):
    racestats.append(col)
  if('going' in col):
    racestats.append(col)
  if('race_class' in col):
    racestats.append(col)

In [19]:
'''
Xy = horse_racing_predictors[racestats+horsestats]

Xy = Xy[Xy['horse_count'] == 14].drop(columns = ['horse_count','race_id'])
'''

# Group by everything at race level

race_predictors = horse_racing_predictors[racestats].groupby('race_id').min()

horsestats_temp = horsestats[2:]

for j in range(1,15):

    new_cols = [i + '_' + str(j) for i in horsestats_temp]
    race_predictors[new_cols] = -1

index_list = list(race_predictors.index)

for index in tqdm(range(0,len(horse_racing_predictors))):
    row = horse_racing_predictors.iloc[index,:]
    race_id = row['race_id']
    horse_no = row['horse_no']
    info = row[horsestats_temp]
    racestats_len = len(racestats)-1
    index_id = index_list.index(race_id)
    race_predictors.iloc[index_id
        ,int(racestats_len+len(horsestats_temp)*(horse_no-1)):int(racestats_len+len(horsestats_temp)*horse_no)] = info
    
# Only Keeping races with exactly 14 horses

race_predictors_14 = race_predictors.replace(-1, np.nan)  # np.NaN was removed in NumPy 2.0
race_predictors_14 = race_predictors_14.dropna()
Xy = race_predictors_14
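
The loop above fills one block of columns per horse number and then drops races with any empty slot. The same long-to-wide reshape can be sketched with `DataFrame.pivot`. This is an illustration on toy data, not the notebook's method. It assumes each (`race_id`, `horse_no`) pair occurs once, and the helper name `to_wide` is hypothetical:

```python
import pandas as pd

def to_wide(runs: pd.DataFrame, feature_cols: list, n_horses: int) -> pd.DataFrame:
    """One row per race, one block of feature columns per horse number."""
    wide = runs.pivot(index="race_id", columns="horse_no", values=feature_cols)
    # Order columns horse by horse (all features of horse 1, then horse 2, ...);
    # reindex also creates all-NaN columns for horse numbers that never occur.
    ordered = [(feat, no) for no in range(1, n_horses + 1) for feat in feature_cols]
    wide = wide.reindex(columns=pd.MultiIndex.from_tuples(ordered))
    # Flatten to names like "win_odds_2", matching the naming used above.
    wide.columns = [f"{feat}_{no}" for feat, no in ordered]
    # Races with fewer than n_horses runners have NaN slots and are dropped.
    return wide.dropna()

# Toy stand-in for the per-horse columns (values are illustrative only).
toy = pd.DataFrame({
    "race_id":      [0, 0, 1],
    "horse_no":     [1, 2, 1],
    "horse_rating": [60, 62, 55],
    "win_odds":     [3.5, 9.7, 2.3],
})
wide = to_wide(toy, ["horse_rating", "win_odds"], n_horses=2)
print(wide)
```

In the notebook's terms this would correspond roughly to `to_wide(horse_racing_predictors, horsestats_temp, 14)`, with the race-level columns joined back on `race_id` afterwards. Using NaN for empty slots also avoids treating a genuine value of -1 as missing.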



## Step 4: Modelling

### Dataset

In [20]:
Xy

race_id,distance,place_combination1,horse_count,horse_ratings_100+,horse_ratings_100-75,horse_ratings_100-80,horse_ratings_105-80,horse_ratings_105-85,horse_ratings_110-80,horse_ratings_110-85,horse_ratings_110-90,horse_ratings_115-90,horse_ratings_115-95,horse_ratings_120-100,horse_ratings_120-95,horse_ratings_40-0,horse_ratings_40-10,horse_ratings_40-15,horse_ratings_40-20,horse_ratings_60-35,horse_ratings_60-40,horse_ratings_65-40,horse_ratings_75-55,horse_ratings_80+,horse_ratings_80-55,horse_ratings_80-60,horse_ratings_85+,horse_ratings_85-60,horse_ratings_90+,horse_ratings_90-70,horse_ratings_95+,horse_ratings_95-75,horse_ratings_G,venue_HV,venue_ST,config_A,config_A+3,config_B,config_B+2,config_C,config_C+3,going_FAST,going_GOOD,going_GOOD TO FIRM,going_GOOD TO YIELDING,going_SLOW,going_SOFT,going_WET FAST,going_WET SLOW,going_YIELDING,going_YIELDING TO SOFT,race_class_0,race_class_1,race_class_2,race_class_3,race_class_4,race_class_5,race_class_6,race_class_11,race_class_12,race_class_13,horse_rating_1,declared_weight_1,actual_weight_1,win_odds_1,place_odds_1,horse_country_AUS_1,horse_country_FR_1,horse_country_GB_1,horse_country_IRE_1,horse_country_NZ_1,horse_country_USA_1,horse_country_other_1,horse_type_Brown_1,horse_type_Gelding_1,horse_type_other_1,horse_rating_2,declared_weight_2,actual_weight_2,win_odds_2,place_odds_2,horse_country_AUS_2,horse_country_FR_2,horse_country_GB_2,horse_country_IRE_2,horse_country_NZ_2,horse_country_USA_2,horse_country_other_2,horse_type_Brown_2,horse_type_Gelding_2,horse_type_other_2,horse_rating_3,declared_weight_3,actual_weight_3,win_odds_3,place_odds_3,horse_country_AUS_3,horse_country_FR_3,horse_country_GB_3,horse_country_IRE_3,horse_country_NZ_3,horse_country_USA_3,horse_country_other_3,horse_type_Brown_3,horse_type_Gelding_3,horse_type_other_3,horse_rating_4,declared_weight_4,actual_weight_4,win_odds_4,place_odds_4,horse_country_AUS_4,horse_country_FR_4,horse_country_GB_4,horse_country_IRE_4,horse_country_NZ_4,horse_country_USA_4,horse_country_other_4,horse_type_Brown_4,horse_type_Gelding_4,horse_type_other_4,horse_rating_5,declared_weight_5,actual_weight_5,win_odds_5,place_odds_5,horse_country_AUS_5,horse_country_FR_5,horse_country_GB_5,horse_country_IRE_5,horse_country_NZ_5,horse_country_USA_5,horse_country_other_5,horse_type_Brown_5,horse_type_Gelding_5,horse_type_other_5,horse_rating_6,declared_weight_6,actual_weight_6,win_odds_6,place_odds_6,horse_country_AUS_6,horse_country_FR_6,horse_country_GB_6,horse_country_IRE_6,horse_country_NZ_6,horse_country_USA_6,horse_country_other_6,horse_type_Brown_6,horse_type_Gelding_6,horse_type_other_6,horse_rating_7,declared_weight_7,actual_weight_7,win_odds_7,place_odds_7,horse_country_AUS_7,horse_country_FR_7,horse_country_GB_7,horse_country_IRE_7,horse_country_NZ_7,horse_country_USA_7,horse_country_other_7,horse_type_Brown_7,horse_type_Gelding_7,horse_type_other_7,horse_rating_8,declared_weight_8,actual_weight_8,win_odds_8,place_odds_8,horse_country_AUS_8,horse_country_FR_8,horse_country_GB_8,horse_country_IRE_8,horse_country_NZ_8,horse_country_USA_8,horse_country_other_8,horse_type_Brown_8,horse_type_Gelding_8,horse_type_other_8,horse_rating_9,declared_weight_9,actual_weight_9,win_odds_9,place_odds_9,horse_country_AUS_9,horse_country_FR_9,horse_country_GB_9,horse_country_IRE_9,horse_country_NZ_9,horse_country_USA_9,horse_country_other_9,horse_type_Brown_9,horse_type_Gelding_9,horse_type_other_9,horse_rating_10,declared_weight_10,actual_weight_10,win_odds_10,place_odds_10,horse_country_AUS_10,horse_country_FR_10,horse_country_GB_10,horse_country_IRE_10,horse_country_NZ_10,horse_country_USA_10,horse_country_other_10,horse_type_Brown_10,horse_type_Gelding_10,horse_type_other_10,horse_rating_11,declared_weight_11,actual_weight_11,win_odds_11,place_odds_11,horse_country_AUS_11,horse_country_FR_11,horse_country_GB_11,horse_country_IRE_11,horse_country_NZ_11,horse_country_USA_11,horse_country_other_11,horse_type_Brown_11,horse_type_Gelding_11,horse_type_other_11,horse_rating_12,declared_weight_12,actual_weight_12,win_odds_12,place_odds_12,horse_country_AUS_12,horse_country_FR_12,horse_country_GB_12,horse_country_IRE_12,horse_country_NZ_12,horse_country_USA_12,horse_country_other_12,horse_type_Brown_12,horse_type_Gelding_12,horse_type_other_12,horse_rating_13,declared_weight_13,actual_weight_13,win_odds_13,place_odds_13,horse_country_AUS_13,horse_country_FR_13,horse_country_GB_13,horse_country_IRE_13,horse_country_NZ_13,horse_country_USA_13,horse_country_other_13,horse_type_Brown_13,horse_type_Gelding_13,horse_type_other_13,horse_rating_14,declared_weight_14,actual_weight_14,win_odds_14,place_odds_14,horse_country_AUS_14,horse_country_FR_14,horse_country_GB_14,horse_country_IRE_14,horse_country_NZ_14,horse_country_USA_14,horse_country_other_14,horse_type_Brown_14,horse_type_Gelding_14,horse_type_other_14
0,1400,8,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,60.0,1020.0,133.0,9.7,3.7,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,980.0,133.0,16.0,4.9,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1082.0,132.0,3.5,1.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1118.0,127.0,39.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,60.0,972.0,131.0,50.0,14.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1114.0,127.0,7.0,1.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,978.0,123.0,99.0,28.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1170.0,128.0,12.0,3.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1126.0,123.0,38.0,13.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1072.0,125.0,39.0,12.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,60.0,1135.0,123.0,8.6,2.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1018.0,123.0,23.0,8.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1089.0,120.0,5.4,1.7,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,60.0,1027.0,113.0,11.0,3.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,1200,5,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,60.0,1078.0,128.0,14.0,4.2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1257.0,132.0,28.0,8.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1037.0,130.0,7.0,1.7,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,60.0,1168.0,126.0,12.0,3.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1148.0,125.0,2.3,1.2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1057.0,121.0,28.0,6.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1064.0,122.0,13.0,3.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1132.0,117.0,43.0,11.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1081.0,121.0,14.0,4.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1059.0,121.0,10.0,2.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1106.0,118.0,50.0,15.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,60.0,1095.0,103.0,21.0,7.4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1060.0,113.0,13.0,4.7,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1110.0,108.0,47.0,16.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,1400,11,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,60.0,1115.0,133.0,9.2,2.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1091.0,126.0,99.0,36.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1000.0,128.0,81.0,22.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1106.0,127.0,3.5,1.5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,60.0,1031.0,126.0,4.4,1.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1068.0,125.0,65.0,23.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1151.0,123.0,99.0,25.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1052.0,125.0,6.7,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1122.0,125.0,7.9,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1016.0,124.0,16.0,5.2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1149.0,122.0,7.0,2.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1028.0,116.0,45.0,11.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1029.0,110.0,22.0,5.9,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1157.0,110.0,99.0,32.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,1600,2,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,60.0,1076.0,133.0,5.2,1.7,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1044.0,132.0,3.6,1.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1065.0,132.0,8.2,2.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1161.0,126.0,28.0,8.1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1040.0,128.0,33.0,10.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1106.0,120.0,43.0,18.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1019.0,125.0,12.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1009.0,120.0,13.0,4.9,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,60.0,1011.0,120.0,6.1,1.9,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,999.0,114.0,11.0,2.8,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1011.0,117.0,19.0,6.4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,988.0,106.0,31.0,9.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1157.0,113.0,99.0,26.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1152.0,103.0,25.0,7.7,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
5,1200,9,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,60.0,1143.0,133.0,17.0,4.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1096.0,133.0,1.8,1.1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1169.0,132.0,28.0,6.9,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1120.0,130.0,11.0,2.2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,60.0,1136.0,126.0,13.0,2.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1235.0,125.0,99.0,24.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1015.0,120.0,99.0,23.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1114.0,124.0,32.0,7.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1170.0,124.0,6.1,1.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,60.0,1015.0,123.0,8.5,2.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1087.0,121.0,37.0,5.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1098.0,119.0,99.0,40.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,60.0,1064.0,115.0,22.0,5.3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,60.0,1155.0,110.0,99.0,40.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6123,1400,5,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,60.0,1056.0,133.0,13.0,3.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,59.0,1084.0,132.0,3.1,1.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,57.0,1109.0,130.0,29.0,8.2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,56.0,1106.0,127.0,4.2,1.6,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,56.0,1127.0,129.0,16.0,4.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,54.0,1007.0,125.0,7.7,2.2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,54.0,1104.0,127.0,99.0,22.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,52.0,1061.0,125.0,86.0,14.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,50.0,1150.0,123.0,99.0,27.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,48.0,1037.0,121.0,17.0,3.9,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,47.0,1045.0,120.0,12.0,3.4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,46.0,1197.0,119.0,8.4,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,46.0,1133.0,117.0,53.0,12.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,44.0,1113.0,110.0,18.0,5.2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
6125,1000,7,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,80.0,1118.0,133.0,7.5,2.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,75.0,1163.0,121.0,11.0,4.1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,74.0,1063.0,127.0,95.0,23.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,72.0,1249.0,125.0,67.0,14.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,71.0,999.0,122.0,72.0,17.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,71.0,1146.0,117.0,49.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,70.0,1079.0,123.0,2.9,1.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,70.0,1076.0,123.0,14.0,3.2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,68.0,1186.0,121.0,95.0,20.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,67.0,1111.0,118.0,3.9,1.4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,67.0,1110.0,120.0,15.0,3.8,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,66.0,1196.0,117.0,26.0,5.8,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,63.0,1196.0,116.0,19.0,4.7,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,62.0,1122.0,115.0,9.6,2.7,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
6126,1400,7,14.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,80.0,1216.0,133.0,26.0,6.6,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,76.0,1121.0,122.0,21.0,5.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,76.0,1030.0,129.0,9.5,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,75.0,1073.0,128.0,2.7,1.4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,71.0,1059.0,124.0,99.0,40.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,71.0,1285.0,124.0,11.0,3.7,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,69.0,1270.0,122.0,4.8,1.6,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,68.0,1098.0,121.0,40.0,8.8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,68.0,1050.0,121.0,93.0,16.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,66.0,1193.0,119.0,7.5,2.1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,65.0,993.0,111.0,33.0,7.3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,63.0,1100.0,116.0,99.0,36.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,63.0,1168.0,116.0,11.0,3.2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,61.0,1084.0,114.0,14.0,3.4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
6127,1000,7,14.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,94.0,1231.0,131.0,6.5,2.1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,94.0,1089.0,131.0,39.0,7.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,90.0,1142.0,127.0,14.0,3.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,88.0,1084.0,125.0,32.0,5.5,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,87.0,1192.0,122.0,29.0,6.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,86.0,1103.0,123.0,10.0,2.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,85.0,1128.0,122.0,2.8,1.4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,85.0,1101.0,122.0,99.0,16.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,84.0,1105.0,119.0,55.0,8.7,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,84.0,1115.0,121.0,99.0,26.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,84.0,1167.0,121.0,85.0,13.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,83.0,1097.0,113.0,14.0,3.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,83.0,1092.0,118.0,99.0,24.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,81.0,1098.0,111.0,3.1,1.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [21]:
y = Xy['place_combination1'].to_numpy()-1
X = Xy.drop(columns = ['place_combination1'])

# Split according to time since races are in chronological order

train_size = int(len(Xy)*0.8)

X_train, X_test, y_train, y_test = [X[0:train_size],X[train_size:],y[0:train_size],y[train_size:]]

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
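
The split and scaling above can be sketched in plain NumPy to make the two design choices explicit: rows are cut in their existing (chronological) order rather than shuffled, and the scaling statistics come from the training rows only. The helper names and the synthetic data below are hypothetical and are not part of the notebook.

```python
import numpy as np

def chronological_split(X, y, train_frac=0.8):
    """Keep row order: the first train_frac of rows train, the rest test."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

def standardize(train, test):
    """Scale both sets with the mean and std of the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (train - mean) / std, (test - mean) / std

# Synthetic stand-in data (hypothetical): 10 "races" with 3 features each
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 3))
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
X_tr_s, X_te_s = standardize(X_tr, X_te)
print(X_tr_s.shape, X_te_s.shape)
```

Fitting the scaler on the training rows alone keeps information about later races out of the features used for training.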



### Naive classifier based on win_odds

In [22]:
win_odds_name = [i for i in X.columns if 'win_odds' in i]
win_odds_index = [X.columns.to_list().index(i) for i in win_odds_name]

In [23]:
X.to_numpy()[:,win_odds_index]

array([[ 9.7, 16. ,  3.5, ..., 23. ,  5.4, 11. ],
       [14. , 28. ,  7. , ..., 21. , 13. , 47. ],
       [ 9.2, 99. , 81. , ..., 45. , 22. , 99. ],
       ...,
       [26. , 21. ,  9.5, ..., 99. , 11. , 14. ],
       [ 6.5, 39. , 14. , ..., 14. , 99. ,  3.1],
       [99. , 13. ,  7.9, ...,  8.2, 59. , 99. ]])

In [24]:
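# NOTE: argmax picks the horse with the longest odds (the outsider), not the favourite (lowest odds)
np.mean(X.to_numpy()[:,win_odds_index].argmax(axis = 1) == y)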

0.004954954954954955
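
A favourite-based baseline would pick the horse with the lowest win odds in each race, i.e. `argmin` rather than `argmax`. The sketch below uses made-up odds and winners, and it assumes that the column position of each `win_odds` entry lines up with the 0-based winner label, which the notebook does not confirm.

```python
import numpy as np

def favourite_baseline_accuracy(win_odds, winners):
    """Fraction of races won by the horse with the lowest odds."""
    favourite = win_odds.argmin(axis=1)
    return float(np.mean(favourite == winners))

# Hypothetical odds for 4 races with 3 runners each
odds_demo = np.array([
    [3.5, 9.7, 16.0],
    [28.0, 7.0, 14.0],
    [99.0, 81.0, 9.2],
    [2.1, 5.0, 40.0],
])
winners_demo = np.array([0, 1, 2, 1])  # 0-based index of the winning runner

acc = favourite_baseline_accuracy(odds_demo, winners_demo)
outsider_acc = float(np.mean(odds_demo.argmax(axis=1) == winners_demo))
print(acc, outsider_acc)
```

On real data this baseline is best scored on the test rows only, so that it is comparable with the model accuracies reported later.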

### KNN

In [25]:
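# helper functions:

def most_frequent(List):
    # Majority vote: return the label that occurs most often in the list
    return max(set(List), key = List.count)

def dist(point1, point2, option = 'manhattan'):
    # Distance between two feature vectors
    if(option == 'manhattan'):
        return np.sum(np.abs(point1-point2))
    if(option == 'euclidean'):
        return np.sqrt(np.sum((point1-point2)**2))
    # Fail loudly instead of silently returning None
    raise ValueError("unknown distance option: {}".format(option))
    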
# main KNN:
    
def pred(X_test, X_train, y_train, k = 10):

    y_pred = []

    for x in tqdm(X_test):

        distances = [dist(x, i) for i in X_train]
        top_k = np.argsort(distances)[0:k]
        labels = [y_train[i] for i in top_k]
        y_pred.append(most_frequent(labels))
    
    return y_pred
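
The same k-nearest-neighbour vote can be written with vectorised distance computations, which avoids the inner Python loop over training rows. This is a sketch rather than a drop-in replacement: `knn_predict` is a hypothetical name, labels are assumed to be non-negative integers, and ties are resolved toward the smallest label, which may differ from `most_frequent`.

```python
import numpy as np

def knn_predict(X_test, X_train, y_train, k=10, metric="manhattan"):
    """Predict each test row's label by majority vote among its k nearest training rows."""
    preds = []
    for x in X_test:
        diff = X_train - x  # broadcast against every training row at once
        if metric == "manhattan":
            d = np.abs(diff).sum(axis=1)
        elif metric == "euclidean":
            d = np.sqrt((diff ** 2).sum(axis=1))
        else:
            raise ValueError("unknown metric: {}".format(metric))
        nearest = np.argsort(d, kind="stable")[:k]
        votes = np.bincount(y_train[nearest])  # count votes per label
        preds.append(int(votes.argmax()))
    return np.array(preds)

# Tiny hypothetical example with two well-separated clusters
X_train_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train_demo = np.array([0, 0, 1, 1])
X_test_demo = np.array([[0.0, 0.4], [5.0, 5.5]])

demo_pred = knn_predict(X_test_demo, X_train_demo, y_train_demo, k=2)
print(demo_pred)
```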


In [26]:
np.random.seed(42)

# KNN hyperparameter tuning (K)

sample = np.random.choice(len(X_test), size = 100, replace=False)
X_test_sample = X_test[sample]
y_test_sample = y_test[sample]

accs = []

for k in [4,6,8,10,12]:

    y_pred_k = pred(X_test_sample, X_train, y_train, k)
    accs.append(np.mean(y_pred_k == y_test_sample))

print(accs)

  0%|          | 0/100 [00:00<?, ?it/s]

100%|██████████| 100/100 [00:00<00:00, 191.73it/s]
100%|██████████| 100/100 [00:00<00:00, 192.47it/s]
100%|██████████| 100/100 [00:00<00:00, 192.93it/s]
100%|██████████| 100/100 [00:00<00:00, 191.54it/s]
100%|██████████| 100/100 [00:00<00:00, 192.65it/s]

[0.11, 0.07, 0.09, 0.1, 0.1]





In [27]:
# KNN prediction

y_pred_knn = pred(X_test, X_train, y_train, 8)

  0%|          | 0/444 [00:00<?, ?it/s]

100%|██████████| 444/444 [00:02<00:00, 185.09it/s]


In [28]:
acc_knn = np.mean(y_test == y_pred_knn)
print("Base accuracy: 0.071")
print("KNN accuracy: {:.4f}".format(acc_knn))

Base accuracy: 0.071
KNN accuracy: 0.0788
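
The hard-coded base accuracy of 0.071 corresponds to guessing uniformly among 14 classes (1/14). A minimal sketch of that figure, together with a majority-class baseline as a possible alternative reference point, follows. The label arrays are made up and `majority_baseline` is a hypothetical helper.

```python
import numpy as np

n_classes = 14  # one class per possible winning horse number
uniform_baseline = 1 / n_classes

def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = np.bincount(y_train).argmax()
    return float(np.mean(y_test == majority))

# Hypothetical labels for illustration only
y_train_demo = np.array([0, 0, 1, 2, 0])
y_test_demo = np.array([0, 1, 0, 3])

print("uniform: {:.3f}".format(uniform_baseline))
print("majority: {:.3f}".format(majority_baseline(y_train_demo, y_test_demo)))
```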


### SVM

In [29]:
model_svm = SVC(verbose = True, random_state=42)

model_svm.fit(X_train, y_train)

y_pred_svm = model_svm.predict(X_test)

[LibSVM]

In [30]:
acc_svm = np.mean(y_test == y_pred_svm)
print("Base accuracy: 0.071")
print("svm accuracy: {:.4f}".format(acc_svm))
y_train_pred_svm = model_svm.predict(X_train)
acc_svm_train = np.mean(y_train_pred_svm == y_train)
print("svm train accuracy: {:.4f}".format(acc_svm_train))

Base accuracy: 0.071
svm accuracy: 0.1374
svm train accuracy: 0.8074


### Decision Tree

In [31]:
model_dt = DecisionTreeClassifier(random_state=42)

model_dt.fit(X_train, y_train)

y_pred_dt = model_dt.predict(X_test)

In [32]:
acc_dt = np.mean(y_test == y_pred_dt)
print("Base accuracy: 0.071")
print("Decision Tree accuracy: {:.4f}".format(acc_dt))
y_train_pred_dt = model_dt.predict(X_train)
acc_dt_train = np.mean(y_train_pred_dt == y_train)
print("Decision Tree train accuracy: {:.4f}".format(acc_dt_train))


Base accuracy: 0.071
Decision Tree accuracy: 0.1509
Decision Tree train accuracy: 1.0000


### Random Forest

In [33]:
model_rf = RandomForestClassifier(random_state=42)

model_rf.fit(X_train, y_train)

y_pred_rf = model_rf.predict(X_test)

In [34]:
acc_rf = np.mean(y_test == y_pred_rf)
print("Base accuracy: 0.071")
print("Random Forest accuracy: {:.4f}".format(acc_rf))
y_train_pred_rf = model_rf.predict(X_train)
acc_rf_train = np.mean(y_train_pred_rf == y_train)
print("Random Forest train accuracy: {:.4f}".format(acc_rf_train))


Base accuracy: 0.071
Random Forest accuracy: 0.2410
Random Forest train accuracy: 1.0000


### Gradient Boosting

In [35]:
# Using xgboost since it is the most popular gradient boosting method

train_set = xgb.DMatrix(X_train,y_train)
val_set = xgb.DMatrix(X_test,y_test)

config = {
    'max_depth': 10,
    'eta': 0.1,
    'objective': 'multi:softprob',  
    'num_class': 14
}

model_xgb = xgb.train(
    config,
    dtrain = train_set,
    num_boost_round = 50,
    evals = [(val_set, 'eval')],
    early_stopping_rounds=2,
)
y_pred_xgb = model_xgb.predict(xgb.DMatrix(X_test)).argmax(axis = 1)

[0]	eval-mlogloss:2.60217
[1]	eval-mlogloss:2.57604
[2]	eval-mlogloss:2.56038
[3]	eval-mlogloss:2.53301
[4]	eval-mlogloss:2.51597
[5]	eval-mlogloss:2.50664
[6]	eval-mlogloss:2.49898
[7]	eval-mlogloss:2.48886
[8]	eval-mlogloss:2.47697
[9]	eval-mlogloss:2.46991
[10]	eval-mlogloss:2.46260
[11]	eval-mlogloss:2.45423
[12]	eval-mlogloss:2.44510
[13]	eval-mlogloss:2.43702
[14]	eval-mlogloss:2.43297
[15]	eval-mlogloss:2.42642
[16]	eval-mlogloss:2.42884
[17]	eval-mlogloss:2.42656


In [36]:
acc_xgb = np.mean(y_test == y_pred_xgb)
print("Base accuracy: 0.071")
print("X Gradient Boost accuracy: {:.4f}".format(acc_xgb))
y_train_pred_xgb = model_xgb.predict(xgb.DMatrix(X_train)).argmax(axis = 1)
acc_xgb_train = np.mean(y_train_pred_xgb == y_train)
print("X Gradient Boost train accuracy: {:.4f}".format(acc_xgb_train))


Base accuracy: 0.071
X Gradient Boost accuracy: 0.1802
X Gradient Boost train accuracy: 1.0000
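
For context on the `eval-mlogloss` values logged above: a model that assigns equal probability to all 14 classes scores ln(14) ≈ 2.639, so the logged loss, which starts near 2.60 and bottoms out around 2.43, sits only modestly below that reference. The sketch below computes multiclass log loss by hand on made-up labels. `multiclass_log_loss` is a hypothetical helper, not an XGBoost function.

```python
import math
import numpy as np

def multiclass_log_loss(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-15  # guard against log(0)
    p_true = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(-np.mean(np.log(p_true)))

n_classes = 14
labels_demo = np.array([0, 3, 13, 7])  # hypothetical winners
uniform = np.full((len(labels_demo), n_classes), 1 / n_classes)

loss = multiclass_log_loss(uniform, labels_demo)
print("uniform mlogloss: {:.4f}".format(loss))
print("ln(14):           {:.4f}".format(math.log(n_classes)))
```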


## Hyperparameter Tuning

### Random Forest

In [37]:
# hyperparameter search with tune-sklearn (BOHB search, median-stopping early termination)

from tune_sklearn import TuneSearchCV

params = {
    'max_depth': [20,30,40],
    'min_samples_leaf' : [10,20,30],
    'min_samples_split': [10,20,30],
    'max_features' : [60,100,140],
}

tune_search = TuneSearchCV(
    RandomForestClassifier(
        n_estimators = 100,
        random_state = 42, 
        criterion = 'gini'
        ),
    params,
    scoring = 'accuracy',
    verbose=2,
    n_jobs = -1,
    early_stopping="MedianStoppingRule",
    n_trials=20,
    max_iters=10,
    search_optimization="bohb"
)

result = tune_search.fit(X_train, y_train)

2023-11-11 19:55:53,480	INFO worker.py:1538 -- Started a local Ray instance.


0,1
Current time:,2023-11-11 19:56:19
Running for:,00:00:24.32
Memory:,15.5/63.8 GiB

Trial name,status,loc,max_depth,max_features,min_samples_leaf,min_samples_split,iter,total time (s),split0_test_score,split1_test_score,split2_test_score
_Trainable_67f630e5,TERMINATED,127.0.0.1:38324,30,100,10,30,1,0.771686,0.255618,0.270423,0.256338
_Trainable_90d9c4b2,TERMINATED,127.0.0.1:41776,40,140,30,30,3,2.23906,0.280899,0.304225,0.253521
_Trainable_52193e05,TERMINATED,127.0.0.1:39092,20,60,20,30,1,0.472615,0.269663,0.276056,0.233803
_Trainable_4363807b,TERMINATED,127.0.0.1:41776,40,60,20,20,1,0.511613,0.269663,0.276056,0.233803
_Trainable_0f4a375f,TERMINATED,127.0.0.1:42032,20,140,10,20,1,1.12829,0.27809,0.230986,0.239437
_Trainable_d25cce99,TERMINATED,127.0.0.1:28984,40,100,30,10,3,1.48732,0.272472,0.290141,0.256338
_Trainable_970eefb2,TERMINATED,127.0.0.1:14868,30,60,30,30,1,0.417099,0.269663,0.270423,0.247887
_Trainable_be3240a8,TERMINATED,127.0.0.1:15540,40,140,10,20,1,1.26658,0.27809,0.230986,0.239437
_Trainable_36d96126,TERMINATED,127.0.0.1:39332,40,60,30,10,7,2.14894,0.280899,0.292958,0.264789
_Trainable_06a299dd,TERMINATED,127.0.0.1:38488,20,60,20,20,3,1.29571,0.272472,0.292958,0.250704


Trial name,average_test_score,objective,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score
_Trainable_06a299dd,0.260693,0.260693,0.269663,0.276056,0.233803,0.256338,0.267606
_Trainable_0f4a375f,0.251111,0.251111,0.27809,0.230986,0.239437,0.239437,0.267606
_Trainable_11e12abd,0.285476,0.285476,0.280899,0.315493,0.267606,0.253521,0.309859
_Trainable_1368811e,0.282662,0.282662,0.275281,0.323944,0.261972,0.24507,0.307042
_Trainable_36d96126,0.279278,0.279278,0.280899,0.292958,0.264789,0.253521,0.304225
_Trainable_3d87978f,0.26746,0.26746,0.258427,0.276056,0.273239,0.24507,0.284507
_Trainable_4363807b,0.260693,0.260693,0.269663,0.276056,0.233803,0.256338,0.267606
_Trainable_44ebbe5c,0.281533,0.281533,0.27809,0.307042,0.28169,0.24507,0.295775
_Trainable_52193e05,0.260693,0.260693,0.269663,0.276056,0.233803,0.256338,0.267606
_Trainable_67f630e5,0.260701,0.260701,0.255618,0.270423,0.256338,0.216901,0.304225


[2m[36m(_Trainable pid=41776)[0m 2023-11-11 19:56:07,138	INFO trainable.py:790 -- Restored on 127.0.0.1 from checkpoint: C:\Users\andyw\AppData\Local\Temp\checkpoint_tmp_2a68334353a042c6a7abb220df8e5ef7
[2m[36m(_Trainable pid=41776)[0m 2023-11-11 19:56:07,138	INFO trainable.py:799 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 0.8771958351135254, '_episodes_total': None}
[2m[36m(_Trainable pid=38476)[0m 2023-11-11 19:56:08,980	INFO trainable.py:790 -- Restored on 127.0.0.1 from checkpoint: C:\Users\andyw\AppData\Local\Temp\checkpoint_tmp_e022942bd2f7410f81decfd55598faf3
[2m[36m(_Trainable pid=38476)[0m 2023-11-11 19:56:08,980	INFO trainable.py:799 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 2.0835771560668945, '_episodes_total': None}
[2m[36m(_Trainable pid=39332)[0m 2023-11-11 19:56:10,524	INFO trainable.py:790 -- Restored on 127.0.0.1 from checkpoint: C:\Users\andyw\AppData\L

In [None]:
tune_search.best_params

In [24]:
# Train and evaluate RF using the best set of hyperparameters

model_rf_tuned = RandomForestClassifier(
    max_depth=30,
    max_features=80, 
    min_samples_leaf=15,
    min_samples_split=20,
    n_estimators=100,
    random_state=42)

model_rf_tuned.fit(X_train, y_train)

y_pred_rf_tuned = model_rf_tuned.predict(X_test)

In [25]:
acc_rf_tuned = np.mean(y_test == y_pred_rf_tuned)
print("Base accuracy: 0.071")
print("Random Forest accuracy: {:.4f}".format(acc_rf_tuned))
y_train_pred_rf_tuned = model_rf_tuned.predict(X_train)
acc_rf_train_tuned = np.mean(y_train_pred_rf_tuned == y_train)
print("Random Forest train accuracy: {:.4f}".format(acc_rf_train_tuned))


Base accuracy: 0.071
Random Forest accuracy: 0.2883
Random Forest train accuracy: 0.4724
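
One way to read the train and test accuracies printed so far is through the gap between them. The snippet below only rearranges numbers already reported in the outputs above and computes nothing new from the data.

```python
# (train accuracy, test accuracy) copied from the printed outputs above
reported = {
    "SVM": (0.8074, 0.1374),
    "Decision Tree": (1.0000, 0.1509),
    "Random Forest": (1.0000, 0.2410),
    "Random Forest (tuned)": (0.4724, 0.2883),
}

# A large gap means the model fits the training races far better than unseen ones
gaps = {name: round(train - test, 4) for name, (train, test) in reported.items()}

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print("{:<24s} train-test gap: {:.4f}".format(name, gap))
```

The tuned random forest has both the smallest gap and the highest test accuracy of the four, which is consistent with the constraints on depth and leaf size acting as regularisation.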


### XGBoost

In [41]:
# random search for hyperparameters
# additional function to support xgboost 

from ray import tune
from ray.tune.integration.xgboost import TuneReportCheckpointCallback

def data_loader():
    return (X_train,y_train), (X_test,y_test)

def train_data(config,data):
    t1, t2 = data
    train_set = xgb.DMatrix(t1[0], label = t1[1])
    val_set = xgb.DMatrix(t2[0], label = t2[1])
    results = {}
    bst = xgb.train(
        config,
        train_set,
        num_boost_round = 100,
        evals = [(val_set, 'mlogloss')],
        evals_result = results,
        verbose_eval = False,
        callbacks=[TuneReportCheckpointCallback(filename="model.xgb")],
        early_stopping_rounds=5,
    )

config = {
          "objective": "multi:softprob",
          "tree_method": "gpu_hist",
          "eval_metric": ["mlogloss"],
          "max_depth": tune.randint(5,15),
          "min_child_weight": tune.randint(1,5),
          "colsample_bytree": tune.uniform(0.5, 1.0),
          "eta": tune.loguniform(1e-3, 1e-1),
          "reg_lambda": tune.uniform(0.1, 5),
          "reg_alpha": tune.uniform(0.1, 5),
          "num_class": 14,
          "seed": 42
}
t1,t2 = data_loader()

analysis = tune.run(
    tune.with_parameters(train_data, data = (t1,t2)),
    resources_per_trial = {"gpu":1},
    config = config,
    num_samples = 50,
    metric='mlogloss-mlogloss',
    mode="min",  # mlogloss is a loss, so lower is better
    stop={
        "training_iteration": 500
    },
)

0,1
Current time:,2023-11-11 19:58:38
Running for:,00:02:16.55
Memory:,15.7/63.8 GiB

Trial name,status,loc,colsample_bytree,eta,max_depth,min_child_weight,reg_alpha,reg_lambda,iter,total time (s),mlogloss-mlogloss
train_data_4bb4b_00019,RUNNING,127.0.0.1:39076,0.571496,0.0333443,6,3,0.595501,3.12927,,,
train_data_4bb4b_00020,PENDING,,0.850485,0.00139806,11,2,3.56059,4.12711,,,
train_data_4bb4b_00021,PENDING,,0.542419,0.0940328,9,1,1.91615,1.93393,,,
train_data_4bb4b_00022,PENDING,,0.973624,0.0937567,8,1,1.94367,3.79155,,,
train_data_4bb4b_00023,PENDING,,0.688994,0.0257139,7,1,1.71824,1.31875,,,
train_data_4bb4b_00024,PENDING,,0.505677,0.00865608,7,1,0.682208,0.375886,,,
train_data_4bb4b_00025,PENDING,,0.873022,0.0146804,13,3,1.93687,4.81465,,,
train_data_4bb4b_00026,PENDING,,0.9343,0.00280022,5,1,0.159557,4.81979,,,
train_data_4bb4b_00027,PENDING,,0.763851,0.0968121,9,4,2.81389,0.461603,,,
train_data_4bb4b_00028,PENDING,,0.761549,0.0181467,8,1,2.32725,3.50917,,,


Trial name,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,mlogloss-mlogloss,node_ip,pid,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
train_data_4bb4b_00000,2023-11-11_19-56-28,True,,40b3bb3ff34f456090b92488e608e34d,"0_colsample_bytree=0.8650,eta=0.0012,max_depth=9,min_child_weight=2,reg_alpha=0.8774,reg_lambda=2.8736",WDesktop,100,2.59101,127.0.0.1,38292,4.96598,0.0472159,4.96598,1699750588,0,,100,4bb4b_00000,0.00299978
train_data_4bb4b_00001,2023-11-11_19-56-37,True,,7655aeed66e348928569fedbd354dea0,"1_colsample_bytree=0.6709,eta=0.0015,max_depth=10,min_child_weight=1,reg_alpha=1.6259,reg_lambda=0.5614",WDesktop,100,2.58419,127.0.0.1,29628,5.49375,0.05,5.49375,1699750597,0,,100,4bb4b_00001,0.00199986
train_data_4bb4b_00002,2023-11-11_19-56-44,True,,bc63d37c3ae94b8e84c0a07b07a85c30,"2_colsample_bytree=0.7371,eta=0.0016,max_depth=8,min_child_weight=1,reg_alpha=2.4200,reg_lambda=2.5089",WDesktop,100,2.58413,127.0.0.1,13696,4.54388,0.0440032,4.54388,1699750604,0,,100,4bb4b_00002,0.00300002
train_data_4bb4b_00003,2023-11-11_19-56-49,True,,c66053bc703d4bc2a7421e9fbdb3b7f1,"3_colsample_bytree=0.7169,eta=0.0063,max_depth=5,min_child_weight=2,reg_alpha=3.2120,reg_lambda=3.1177",WDesktop,100,2.4568,127.0.0.1,21268,2.85425,0.0250032,2.85425,1699750609,0,,100,4bb4b_00003,0.00299954
train_data_4bb4b_00004,2023-11-11_19-56-56,True,,bdb5952d8d884af08e875ee140ed7cc3,"4_colsample_bytree=0.6873,eta=0.0179,max_depth=6,min_child_weight=4,reg_alpha=4.2968,reg_lambda=2.5654",WDesktop,100,2.32995,127.0.0.1,20488,3.36206,0.0311198,3.36206,1699750616,0,,100,4bb4b_00004,0.00200081
train_data_4bb4b_00005,2023-11-11_19-57-04,True,,bdbcc73da5f940ff9d64fcd393e69da7,"5_colsample_bytree=0.5815,eta=0.0014,max_depth=9,min_child_weight=1,reg_alpha=0.2299,reg_lambda=3.2479",WDesktop,100,2.59469,127.0.0.1,36208,5.05052,0.053,5.05052,1699750624,0,,100,4bb4b_00005,0.00300002
train_data_4bb4b_00006,2023-11-11_19-57-11,True,,4d1fb40333a44100b87b9d5d6ef0d9c9,"6_colsample_bytree=0.8386,eta=0.0140,max_depth=8,min_child_weight=4,reg_alpha=4.0749,reg_lambda=0.7297",WDesktop,100,2.35352,127.0.0.1,17340,4.16337,0.0390017,4.16337,1699750631,0,,100,4bb4b_00006,0.00199962
train_data_4bb4b_00007,2023-11-11_19-57-16,True,,999fb960e89e41cdba842993f6b0e291,"7_colsample_bytree=0.6931,eta=0.0836,max_depth=9,min_child_weight=4,reg_alpha=1.0594,reg_lambda=4.5362",WDesktop,49,2.27743,127.0.0.1,39772,2.38452,0.0440099,2.38452,1699750636,0,,49,4bb4b_00007,0.00199986
train_data_4bb4b_00008,2023-11-11_19-57-22,True,,23e58d6910a7401ba7e27f43ca4376a3,"8_colsample_bytree=0.5472,eta=0.0232,max_depth=6,min_child_weight=1,reg_alpha=1.6630,reg_lambda=0.4488",WDesktop,100,2.30197,127.0.0.1,23276,3.51616,0.033,3.51616,1699750642,0,,100,4bb4b_00008,0.00299954
train_data_4bb4b_00009,2023-11-11_19-57-31,True,,59d2cfe99c744214844999d461ef4598,"9_colsample_bytree=0.5116,eta=0.0426,max_depth=14,min_child_weight=2,reg_alpha=0.6790,reg_lambda=1.4811",WDesktop,100,2.29272,127.0.0.1,32028,6.05185,0.0579998,6.05185,1699750651,0,,100,4bb4b_00009,0.00300121


2023-11-11 19:58:38,552	ERROR tune.py:758 -- Trials did not complete: [train_data_4bb4b_00019, train_data_4bb4b_00020, train_data_4bb4b_00021, train_data_4bb4b_00022, train_data_4bb4b_00023, train_data_4bb4b_00024, train_data_4bb4b_00025, train_data_4bb4b_00026, train_data_4bb4b_00027, train_data_4bb4b_00028, train_data_4bb4b_00029, train_data_4bb4b_00030, train_data_4bb4b_00031, train_data_4bb4b_00032, train_data_4bb4b_00033, train_data_4bb4b_00034, train_data_4bb4b_00035, train_data_4bb4b_00036, train_data_4bb4b_00037, train_data_4bb4b_00038, train_data_4bb4b_00039, train_data_4bb4b_00040, train_data_4bb4b_00041, train_data_4bb4b_00042, train_data_4bb4b_00043, train_data_4bb4b_00044]
2023-11-11 19:58:38,552	INFO tune.py:762 -- Total run time: 137.00 seconds (136.54 seconds for the tuning loop).


In [None]:
analysis.get_best_trial(metric = 'mlogloss-mlogloss', mode = 'min').last_result

{'mlogloss-mlogloss': 2.257413484760233,
 'time_this_iter_s': 0.02700018882751465,
 'done': True,
 'timesteps_total': None,
 'episodes_total': None,
 'training_iteration': 100,
 'trial_id': '53c4e_00037',
 'experiment_id': 'e36a056782d14317ac83031dbc0f4ef0',
 'date': '2023-11-11_17-59-07',
 'timestamp': 1699743547,
 'time_total_s': 2.8465235233306885,
 'pid': 29084,
 'hostname': 'WDesktop',
 'node_ip': '127.0.0.1',
 'config': {'objective': 'multi:softprob',
  'tree_method': 'gpu_hist',
  'eval_metric': ['mlogloss'],
  'max_depth': 5,
  'min_child_weight': 1,
  'colsample_bytree': 0.5281877483254636,
  'eta': 0.05363456374740404,
  'reg_lambda': 4.083214944737381,
  'reg_alpha': 4.99861659910204,
  'num_class': 14,
  'seed': 42},
 'time_since_restore': 2.8465235233306885,
 'timesteps_since_restore': 0,
 'iterations_since_restore': 100,
 'warmup_time': 0.0019998550415039062,
 'experiment_tag': '37_colsample_bytree=0.5282,eta=0.0536,max_depth=5,min_child_weight=1,reg_alpha=4.9986,reg_lamb

In [47]:
# Train and evaluate xgboost using the best set of hyperparameters

config = {'objective': 'multi:softprob',
  'tree_method': 'gpu_hist',
  'eval_metric': ['mlogloss'],
  'max_depth': 5,
  'min_child_weight': 1,
  'colsample_bytree': 0.5281877483254636,
  'eta': 0.05363456374740404,
  'reg_lambda': 4.083214944737381,
  'reg_alpha': 4.99861659910204,
  'num_class': 14,
  'seed': 42}

train_set = xgb.DMatrix(X_train,y_train)
val_set = xgb.DMatrix(X_test,y_test)
results = {}

model_xgb = xgb.train(
    config,
    dtrain = train_set,
    num_boost_round = 500,
    evals = [(val_set, 'eval')],
    early_stopping_rounds=5,
)
y_pred_xgb = model_xgb.predict(xgb.DMatrix(X_test)).argmax(axis = 1)

[0]	eval-mlogloss:2.61659
[1]	eval-mlogloss:2.59550
[2]	eval-mlogloss:2.57767
[3]	eval-mlogloss:2.56522
[4]	eval-mlogloss:2.54924
[5]	eval-mlogloss:2.53673
[6]	eval-mlogloss:2.52453
[7]	eval-mlogloss:2.51401
[8]	eval-mlogloss:2.50283
[9]	eval-mlogloss:2.49195
[10]	eval-mlogloss:2.47925
[11]	eval-mlogloss:2.46801
[12]	eval-mlogloss:2.45908
[13]	eval-mlogloss:2.44953
[14]	eval-mlogloss:2.44293
[15]	eval-mlogloss:2.43521
[16]	eval-mlogloss:2.42767
[17]	eval-mlogloss:2.42079
[18]	eval-mlogloss:2.41502
[19]	eval-mlogloss:2.40690
[20]	eval-mlogloss:2.39817
[21]	eval-mlogloss:2.39146
[22]	eval-mlogloss:2.38713
[23]	eval-mlogloss:2.38085
[24]	eval-mlogloss:2.37612
[25]	eval-mlogloss:2.37111
[26]	eval-mlogloss:2.36523
[27]	eval-mlogloss:2.36168
[28]	eval-mlogloss:2.35825
[29]	eval-mlogloss:2.35295
[30]	eval-mlogloss:2.34763
[31]	eval-mlogloss:2.34398
[32]	eval-mlogloss:2.34148
[33]	eval-mlogloss:2.33760
[34]	eval-mlogloss:2.33563
[35]	eval-mlogloss:2.33214
[36]	eval-mlogloss:2.32851
[37]	eval-m

In [48]:
acc_xgb = np.mean(y_test == y_pred_xgb)
print("Base accuracy: 0.071")
print("X Gradient Boost accuracy: {:.4f}".format(acc_xgb))
y_train_pred_xgb = model_xgb.predict(xgb.DMatrix(X_train)).argmax(axis = 1)
acc_xgb_train = np.mean(y_train_pred_xgb == y_train)
print("X Gradient Boost train accuracy: {:.4f}".format(acc_xgb_train))


Base accuracy: 0.071
X Gradient Boost accuracy: 0.2455
X Gradient Boost train accuracy: 0.7523


### Dense Neural Network

In [31]:
# Hyperparameter tuning for DNN

import tensorflow as tf
import math
from ray import tune

from keras.models import Sequential
from keras.layers import Dense
from keras import initializers
from keras.callbacks import LearningRateScheduler
from keras.callbacks import EarlyStopping

from ray.tune.integration.keras import TuneReportCallback
from ray.tune.schedulers import HyperBandScheduler

def data_loader():
    return (X_train,pd.get_dummies(y_train)), (X_test,pd.get_dummies(y_test))

def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 5.0
    return 0.01 * math.pow(drop_rate, math.floor(epoch/epochs_drop))

def train_data(config,data):

    t1, t2 = data

    X_train = t1[0]
    y_train = t1[1]
    X_test = t2[0]
    y_test = t2[1]

    n_units = config['units']
    n_layers = config['layers']
    activation = config['activation']
    if(activation == 'tanh' or activation == 'sigmoid'):
        initializer = 'glorot_uniform'
    else:
        initializer = 'he_normal'

    model = Sequential()

    model.add(Dense(units = n_units, 
                    input_dim=X_train.shape[1], 
                    activation= activation,
                    kernel_initializer= initializer, 
                    name='h1'))
    
    for i in range(2, n_layers + 1):
        model.add(Dense(units= n_units, 
                        activation= activation,
                        kernel_initializer= initializer,  
                        name='h{}'.format(i)))
        
    model.add(Dense(units=14, activation='softmax', kernel_initializer=initializer, name='o'))

    model.compile(
        loss="categorical_crossentropy", 
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=0.01), 
        metrics=['accuracy'])

    model.fit(
        X_train,
        y_train,
        batch_size=128,
        epochs=50,
        verbose=0,
        validation_data=(X_test, y_test),
        callbacks=[TuneReportCallback({
            "val_accuracy": "val_accuracy"
            }),
            LearningRateScheduler(
                lr_step_decay, verbose=0
            ),
            EarlyStopping(
                monitor='val_accuracy', 
                patience=6
            )
        ])
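
The step-decay schedule used in `train_data` halves the learning rate every 5 epochs, starting from 0.01. The sketch below restates it with the constants exposed as keyword arguments so a few values can be printed. The extra parameters are an illustrative assumption, while the formula is the one in the cell above.

```python
import math

def lr_step_decay(epoch, lr=None, initial_lr=0.01, drop_rate=0.5, epochs_drop=5.0):
    """Step decay: multiply initial_lr by drop_rate once every epochs_drop epochs."""
    return initial_lr * math.pow(drop_rate, math.floor(epoch / epochs_drop))

# Learning rate at a few selected epochs
schedule = {epoch: lr_step_decay(epoch) for epoch in (0, 4, 5, 10, 15)}
for epoch, rate in schedule.items():
    print("epoch {:>2d}: lr = {:.5f}".format(epoch, rate))
```

Because the function ignores the incoming `lr` argument, the schedule depends only on the epoch index and overrides whatever learning rate the optimizer was compiled with.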


In [32]:
t1,t2 = data_loader()

config = {
          'units': tune.grid_search([32,64,128,256,512]),
          'layers': tune.grid_search([4,8,16,32,64]),
          'activation': tune.grid_search(['ReLU','tanh','sigmoid'])
}

analysis = tune.run(
    tune.with_parameters(train_data, data = (t1,t2)),
    resources_per_trial = {"cpu":24},
    config = config,
    metric='val_accuracy',
    mode="max"
)

[2m[36m(train_data pid=39400)[0m 2023-11-11 20:42:52.816687: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
[2m[36m(train_data pid=39400)[0m 2023-11-11 20:42:52.817559: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
[2m[36m(train_data pid=39400)[0m Skipping registering GPU devices...


Trial name,date,done,episodes_total,experiment_id,hostname,iterations_since_restore,node_ip,pid,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,val_accuracy,warmup_time
train_data_c8fc6_00000,2023-11-11_20-42-53,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,8,127.0.0.1,39400,0.978253,0.0280032,0.978253,1699753373,0,,8,c8fc6_00000,0.103604,0.00300026
train_data_c8fc6_00001,2023-11-11_20-42-54,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,10,127.0.0.1,39400,0.651096,0.0279999,0.651096,1699753374,0,,10,c8fc6_00001,0.103604,0.00300026
train_data_c8fc6_00002,2023-11-11_20-42-54,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,11,127.0.0.1,39400,0.678026,0.0310972,0.678026,1699753374,0,,11,c8fc6_00002,0.108108,0.00300026
train_data_c8fc6_00003,2023-11-11_20-42-55,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,8,127.0.0.1,39400,0.700553,0.03,0.700553,1699753375,0,,8,c8fc6_00003,0.0923423,0.00300026
train_data_c8fc6_00004,2023-11-11_20-42-56,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,12,127.0.0.1,39400,0.862578,0.0305054,0.862578,1699753376,0,,12,c8fc6_00004,0.0990991,0.00300026
train_data_c8fc6_00005,2023-11-11_20-42-57,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,8,127.0.0.1,39400,0.666596,0.0290024,0.666596,1699753377,0,,8,c8fc6_00005,0.0990991,0.00300026
train_data_c8fc6_00006,2023-11-11_20-42-58,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,9,127.0.0.1,39400,0.915334,0.0343461,0.915334,1699753378,0,,9,c8fc6_00006,0.0833333,0.00300026
train_data_c8fc6_00007,2023-11-11_20-42-59,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,10,127.0.0.1,39400,0.971732,0.0330026,0.971732,1699753379,0,,10,c8fc6_00007,0.0878378,0.00300026
train_data_c8fc6_00008,2023-11-11_20-43-00,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,8,127.0.0.1,39400,0.8883,0.037003,0.8883,1699753380,0,,8,c8fc6_00008,0.0990991,0.00300026
train_data_c8fc6_00009,2023-11-11_20-43-01,True,,acc5a5c233f745a4ab10547268e0918f,WDesktop,12,127.0.0.1,39400,1.54378,0.0379992,1.54378,1699753381,0,,12,c8fc6_00009,0.0878378,0.00300026


2023-11-11 20:44:50,413	ERROR tune.py:758 -- Trials did not complete: [train_data_c8fc6_00059, train_data_c8fc6_00060, train_data_c8fc6_00061, train_data_c8fc6_00062, train_data_c8fc6_00063, train_data_c8fc6_00064, train_data_c8fc6_00065, train_data_c8fc6_00066, train_data_c8fc6_00067, train_data_c8fc6_00068, train_data_c8fc6_00069, train_data_c8fc6_00070, train_data_c8fc6_00071, train_data_c8fc6_00072, train_data_c8fc6_00073, train_data_c8fc6_00074]
2023-11-11 20:44:50,413	INFO tune.py:762 -- Total run time: 121.70 seconds (121.45 seconds for the tuning loop).
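
The three `tune.grid_search` axes in the config combine as a Cartesian product, so the search space holds 5 × 5 × 3 = 75 configurations, which matches the trial ids running up to `00074` in the log above. The sketch below enumerates that grid with the standard library only. It does not call Ray.

```python
from itertools import product

# Same axes as the tuning config above
grid = {
    "units": [32, 64, 128, 256, 512],
    "layers": [4, 8, 16, 32, 64],
    "activation": ["ReLU", "tanh", "sigmoid"],
}

# One dict per combination, in the order the axes are listed
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combos))
print(combos[0])
```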


In [36]:
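# Neural network
import math

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

config = {'units': 16, 'layers': 4, 'activation': 'ReLU'}

n_units = config['units']
n_layers = config['layers']
activation = config['activation']

# Glorot initialization for saturating activations, He initialization otherwise.
if activation in ('tanh', 'sigmoid'):
    initializer = 'glorot_uniform'
else:
    initializer = 'he_normal'

model = Sequential()

# First hidden layer; it also declares the input width.
model.add(Dense(units=n_units,
                input_dim=X_train.shape[1],
                activation=activation,
                kernel_initializer=initializer,
                name='h1'))

# Remaining hidden layers h2 .. h{n_layers}.
for i in range(2, n_layers + 1):
    model.add(Dense(units=n_units,
                    activation=activation,
                    kernel_initializer=initializer,
                    name='h{}'.format(i)))

# Output layer: 14 softmax units, one per class.
model.add(Dense(units=14, activation='softmax', kernel_initializer=initializer, name='o'))

model.compile(
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=['accuracy'])

# Step decay: start at 0.001 and halve every 10 epochs.
# Note: defined but not used, because the callbacks list passed to fit() is empty.
def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 10
    return 0.001 * math.pow(drop_rate, math.floor(epoch / epochs_drop))

# One-hot encode the targets. This assumes both splits contain every class;
# otherwise the two get_dummies calls produce different column sets.
model.fit(
    X_train,
    pd.get_dummies(y_train),
    batch_size=128,
    epochs=1000,
    verbose=1,
    validation_data=(X_test, pd.get_dummies(y_test)),
    callbacks=[])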

Epoch 1/1000
Epoch 2/1000
...
Epoch 72/1000

KeyboardInterrupt: 
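
The `lr_step_decay` function in the cell above is defined but never attached, since `callbacks` is empty, and the run was stopped by hand. Below is a minimal sketch of the learning rates that schedule would produce, written in plain Python so it runs without TensorFlow. The commented lines show one way it could be wired in with Keras's `LearningRateScheduler` and `EarlyStopping` callbacks; that wiring and the `patience=20` value are assumptions for illustration, not what the notebook did.

```python
import math


def lr_step_decay(epoch, lr=None):
    """Step decay as defined in the notebook: start at 0.001, halve every 10 epochs.

    `lr` (the current rate Keras would pass in) is ignored, as in the original.
    """
    drop_rate = 0.5
    epochs_drop = 10
    return 0.001 * math.pow(drop_rate, math.floor(epoch / epochs_drop))


# Print the rate at a few epochs to show where the steps fall.
for epoch in (0, 9, 10, 20, 30, 70):
    print(f"epoch {epoch:>2}: lr = {lr_step_decay(epoch):.8f}")

# Hypothetical wiring (not run here; requires TensorFlow):
# callbacks = [
#     tf.keras.callbacks.LearningRateScheduler(lr_step_decay),
#     tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20,
#                                      restore_best_weights=True),
# ]
# ...then pass callbacks=callbacks to model.fit.
```

Attached this way, the schedule would override the `learning_rate=0.01` given to Adam from the first epoch on, because the function ignores its `lr` argument and always starts from 0.001.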

## Step 6: Results

### Performance

| Metric         | KNN    | SVM    | Decision Tree | Random Forest | Gradient Boosting |
|----------------|--------|--------|---------------|---------------|-------------------|
| Accuracy       | 0.0788 | 0.1374 | 0.1509        | 0.2410        | 0.1802            |
| Tuned Accuracy |        |        |               | 0.2883        | 0.2455            |
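
For context, these accuracies can be compared against a uniform random guess. The sketch below assumes 14 equally likely classes, a count taken from the 14-unit softmax output layer; real fields vary in size, so the true chance level may differ.

```python
# Accuracies copied from the table above.
accuracy = {
    "KNN": 0.0788,
    "SVM": 0.1374,
    "Decision Tree": 0.1509,
    "Random Forest": 0.2410,
    "Gradient Boosting": 0.1802,
    "Random Forest (tuned)": 0.2883,
    "Gradient Boosting (tuned)": 0.2455,
}

N_CLASSES = 14  # assumption: taken from the 14-unit softmax output layer
chance = 1 / N_CLASSES  # accuracy of a uniform random guess

# Ratio of each model's accuracy to the chance level.
lift = {name: acc / chance for name, acc in accuracy.items()}

# Print models from best to worst.
for name, value in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {accuracy[name]:.4f}  {value:.2f}x chance")
```

Under that assumption the chance level is about 0.071, so KNN sits only slightly above it while the tuned random forest is roughly four times better.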

### Feature Importance

In [26]:
feature_importances = model_rf_tuned.feature_importances_

feature_importances_index = np.argsort(-feature_importances)

for i in feature_importances_index[0:100]:

    name = X.columns[i]
    print(f"Feature {name} importance: {feature_importances[i]}")


Feature win_odds_1 importance: 0.04369203281101722
Feature win_odds_7 importance: 0.04029833396998381
Feature win_odds_5 importance: 0.03811723473875967
Feature win_odds_6 importance: 0.03783301617170432
Feature win_odds_2 importance: 0.037018343962721356
Feature win_odds_10 importance: 0.03421853575262265
Feature place_odds_7 importance: 0.034173789055530285
Feature place_odds_1 importance: 0.033826291912774864
Feature win_odds_4 importance: 0.030627278658140047
Feature win_odds_3 importance: 0.029765814323145744
Feature win_odds_9 importance: 0.02744598093301855
Feature place_odds_5 importance: 0.02631526144972493
Feature place_odds_6 importance: 0.025295941999975494
Feature place_odds_4 importance: 0.024486717466408847
Feature place_odds_10 importance: 0.024241156103418683
Feature win_odds_12 importance: 0.023989994410590035
Feature place_odds_12 importance: 0.022697216793748413
Feature place_odds_3 importance: 0.020209184055063906
Feature place_odds_9 importance: 0.0196050830471314